Investigation of COSMO-SAC model for solubility and cocrystal formation of pharmaceutical compounds

Mahmoudabadi, Samane Zarei; Pazuki, Gholamreza

doi:10.1038/s41598-020-76986-3


Article
Open access
Published: 16 November 2020


Samane Zarei Mahmoudabadi¹ &
Gholamreza Pazuki¹

Scientific Reports volume 10, Article number: 19879 (2020)


Abstract

In this study, the predictive COSMO-SAC model was investigated for solid/liquid equilibria of pharmaceutical compounds. The examined properties were the solubility of drugs in pure and mixed solvents, the octanol/water partition coefficient, and cocrystal formation. The results of the original COSMO-SAC model (COSMO-SAC (2002)) were compared with those of a semi-predictive model, the Flory–Huggins model, and a revised version of the COSMO-SAC (COSMO-SAC (2010)). The results indicated acceptable accuracy of the COSMO-SAC (2002) within the considered scope. They also highlighted the suitability of the COSMO-SAC model for simple molecules containing C, H, and O with covalent and hydrogen-bonding interactions. Applying the COSMO-SAC to more complicated molecules containing various functional groups, such as COO and COOH, requires further modification of the model.


Introduction

Knowledge of phase equilibria and of thermodynamic properties such as solubility and partition coefficient of pharmaceutical compounds has wide applications in the design, development, and optimization of their manufacturing at laboratory or industrial scale. Besides the experimental approach, which is time-consuming and expensive, mathematical modeling has attracted attention because of its lower cost and wide working range, without further limitations from the substance type or ambient conditions. Thermodynamic models are generally grouped into three categories: (1) semi-empirical models, (2) semi-predictive models, and (3) predictive models, which differ in accuracy and range of reliability. The most significant differences between groups (2) and (3) are the use of theoretical quantum chemistry in the model formulation and the need for experimental data. Semi-empirical models, in contrast, are often correlations without theoretical meaning, fitted to experiments for particular species.

Among these, predictive models estimate the desired properties entirely from the molecular structure, without requiring experimental data. The UNIFAC¹, the NRTL-SAC², the COSMO-RS^3,4,5, and the COSMO-SAC⁶ are a few examples. Some predictive models, such as the UNIFAC, are defined primarily in terms of functional groups and several adjustable parameters. In contrast, the COSMO-RS and the COSMO-SAC are based on the conductor-like screening model and compute the activity coefficient from computational quantum mechanics, using the molecular structure and fewer adjustable parameters than the UNIFAC. The COSMO-RS was developed first, by extending a dielectric continuum-solvation model to liquid-phase thermodynamics, and the COSMO-SAC is a modified version of the COSMO-RS⁷.

Several researchers have studied the COSMO-SAC and the COSMO-RS. Tung et al.⁸ compared the NRTL-SAC and the COSMO-SAC for predicting the pharmaceutical solubilities of Lovastatin, Simvastatin, Rofecoxib, and Etoricoxib. Zhou et al.⁹ applied the COSMO-SAC to the separation of thioglycolic acid from its aqueous solution by ionic liquids. Paese et al.¹⁰ used the COSMO-SAC to predict phase equilibria of aqueous sugar solutions and industrial juices. Xavier et al.¹¹ studied vapor–liquid equilibria (VLE) of systems containing fragrances using the COSMO-SAC. Bouillot et al.¹² investigated drug solubilities with the COSMO-SAC. Shu and Lin¹³ predicted drug solubility in mixed solvent systems using the COSMO-SAC activity coefficient model. Buggert et al.¹⁴ applied the COSMO-RS to partition coefficient calculations. Hsieh et al.¹⁵ considered the original COSMO-SAC (COSMO-SAC 2002) and the revised COSMO-SAC (COSMO-SAC 2010) models for the solubility and octanol/water partition coefficient of pharmaceutical compounds. They reported a 388% error for solubility prediction with the original COSMO-SAC (COSMO-SAC 2002).

In contrast to researchers who focused on the predictive ability of the COSMO-SAC for different systems, some authors studied the underlying quantum mechanical calculations used in the COSMO-SAC and developed various databanks. Mullins et al.¹⁶ developed a database consisting of 1432 COSMO files and provided FORTRAN code for sigma profile and activity coefficient computations. Bell et al.¹⁷ assembled an extensive database of COSMO files for 2261 compounds. Ferrarini et al.¹⁸ distributed a sigma-profile database for a wide range of molecules using the GAMESS software. They also tested different quantum chemistry theories for the calculation of the electronic structure. Mu et al.¹⁹ examined the performance of the COSMO-RS with sigma profiles from different theories.

Some authors modified the COSMO-SAC model in order to increase its accuracy. Lee and Lin²⁰ added the Peng–Robinson EOS to the COSMO-SAC. Lin et al.²¹ first introduced the concept of modifying the sigma profile to enhance model precision. Hsieh et al.²² improved the COSMO-SAC for vapor–liquid and liquid–liquid equilibrium calculations by separating the sigma profile into HB-OH, HB-nonOH, and non-HB parts. Afterward, Paulechka et al.²³ revised the COSMO-SAC model by splitting the sigma profile into OH and non-OH parts, and Islam and Chen²⁴ proposed a method for generating the sigma profile used as input to the COSMO-SAC.

The objective of this study is to investigate the performance of two existing predictive models based on COSMO calculations, the COSMO-SAC (2002) and the COSMO-SAC (2010), for pharmaceutical compounds and to compare them with another widely applicable model, the Flory–Huggins model²⁵. This comparison helps to establish the standing of the COSMO-SAC among predictive models. The examined pharmaceutical compounds contain H, C, O, N, S, F, and Cl atoms and include at least one hydrogen bond or double bond between atoms. The solubility in binary and ternary systems, the octanol/water partition coefficient, and cocrystal formation are of interest in the current study. For solubility in binary systems, 918 data points for 110 systems and 35 pharmaceutical compounds are considered, covering temperatures of 262–360 K and mole fractions from $1 \times 10^{ - 7}$ to 0.7. Afterward, two cocrystal-forming systems, sulfamethazine-salicylic acid in methanol and carbamazepine-acetyl salicylic acid in ethanol, are investigated with the COSMO-SAC (2002) model, which has not been done before.

Methods

COSMO file and sigma profile

As described before, the COSMO-SAC model is based on quantum mechanics through density functional theory calculations. Several commercial or free software packages provide the preliminary information for the COSMO-SAC in the form of a text file called the COSMO file. The DMol³ module in Materials Studio and the free academic software GAMESS are a few examples. In COSMO calculations, the molecule is divided into several parts called segments, and the charge distribution over all segments is calculated so as to neutralize the whole molecule. The locations of the segments, the segment areas, and the charge densities are the properties reported in the COSMO file. To perform COSMO-SAC calculations, the following data must be obtained from the COSMO file: (1) the surface area ($A$) and cavity volume ($V$) of the molecule, and (2) the location of each segment (a vector of x, y, and z coordinates), its charge density ($\sigma_{n}^{ * }$), and its area ($A_{n} (\sigma )$). This information is processed to produce the sigma profile ($p(\sigma )$) required for COSMO-SAC calculations. Klamt et al.⁴ introduced the following equation to average the charge densities from the COSMO file:

$$\sigma_{m} = \frac{{\sum\nolimits_{n} {\sigma_{n}^{ * } \frac{{r_{ave}^{2} r_{n}^{2} }}{{r_{ave}^{2} + r_{n}^{2} }}\exp \left( { - \frac{{d_{mn}^{2} }}{{r_{ave}^{2} + r_{n}^{2} }}} \right)} }}{{\sum\nolimits_{n} {\frac{{r_{ave}^{2} r_{n}^{2} }}{{r_{ave}^{2} + r_{n}^{2} }}\exp \left( { - \frac{{d_{mn}^{2} }}{{r_{ave}^{2} + r_{n}^{2} }}} \right)} }}$$

(1)

In the above equation, $d_{mn}$ is the distance between segments $n$ and $m$. The segment radius $r_{n}$ is obtained from the segment area as follows:

$$r_{n} = \left( {\frac{{A_{n} }}{\pi }} \right)^{0.5}$$

(2)
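As an illustration, Eqs. (1) and (2) can be sketched in plain Python as below. The function name `average_sigma`, the list-based inputs, and the choice to let the caller supply $r_{ave}$ are assumptions of this sketch and are not taken from the original work.

```python
import math


def average_sigma(sigma_star, areas, coords, r_ave):
    """Average raw COSMO charge densities following Eq. (1).

    sigma_star: raw segment charge densities (e/A^2)
    areas:      segment areas (A^2)
    coords:     (x, y, z) position of each segment (A)
    r_ave:      averaging radius (A), supplied by the caller
    """
    r_sq = [a / math.pi for a in areas]  # r_n^2 from Eq. (2)
    r_ave_sq = r_ave ** 2
    averaged = []
    for m in range(len(sigma_star)):
        numerator = 0.0
        denominator = 0.0
        for n in range(len(sigma_star)):
            # squared distance d_mn^2 between segments m and n
            d_sq = sum((cm - cn) ** 2 for cm, cn in zip(coords[m], coords[n]))
            scale = r_ave_sq + r_sq[n]
            weight = (r_ave_sq * r_sq[n] / scale) * math.exp(-d_sq / scale)
            numerator += sigma_star[n] * weight
            denominator += weight
        averaged.append(numerator / denominator)
    return averaged
```

Because the weights decay exponentially with $d_{mn}^{2}$, distant segments barely influence each other, while close neighbours are smoothed toward a common value.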

Mullins et al.¹⁶ reported the value of $r_{ave}$. The sigma profile is defined as the probability of finding a segment with charge density $\sigma_{m}$:

$$p(\sigma_{m} ) = \frac{{n(\sigma_{m} )}}{{\sum\nolimits_{m} {n(\sigma_{m} )} }} = \frac{{A(\sigma_{m} )}}{{\sum\nolimits_{m} {A(\sigma_{m} )} }}$$

(3)

where $n(\sigma_{m} )$ is the number of segments with charge density $\sigma_{m}$ and $A(\sigma_{m} )$ is the surface area with charge density $\sigma_{m}$.

Generally, for most molecules, charge density values range from −0.025 to 0.025 e/Å². The sigma profile is generated in four steps:

1. Consider 50 intervals with increments of 0.001 over the charge density range −0.025 to 0.025.
2. Each interval $i$ is defined by its lower and upper bounds, $\sigma_{i,left}$ and $\sigma_{i,right}$. First, find the charge densities that fall in interval $i$ and calculate their contributions according to:
$$w_{i} (\sigma ) = \frac{{\sigma - \sigma_{i,left} }}{0.001}$$
(4)
3. Afterward, calculate the probabilities at the lower and upper bounds of interval $i$ as below:
$$A(\sigma_{i,left} )p(\sigma_{i,left} ) = \sum\limits_{{\sigma_{i,left} }}^{{\sigma_{i,right} }} {w_{i} (\sigma )A(\sigma )}$$
(5)
$$A(\sigma_{i,right} )p(\sigma_{i,right} ) = \sum\limits_{{\sigma_{i,left} }}^{{\sigma_{i,right} }} {[1 - w_{i} (\sigma )]A(\sigma )}$$
(6)
4. The sigma profile is generated by plotting the calculated probabilities against the sigma values.
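The four steps above can be sketched as a short binning routine. This is a minimal illustration, not the program used in the study: the function name `sigma_profile` and its arguments are hypothetical. The sketch also assumes a particular weighting convention: the lower node of an interval receives the weight $1 - w_{i}$ and the upper node receives $w_{i}$, so that a segment lying exactly on a grid node contributes only to that node. Whether this matches the convention intended in Eqs. (5) and (6) is an assumption.

```python
def sigma_profile(sigmas, areas, lo=-0.025, hi=0.025, n_intervals=50):
    """Distribute segment areas onto a fixed sigma grid by linear interpolation.

    Returns (grid, area_profile, p), where area_profile[k] is A(sigma_k) p(sigma_k)
    and p is the normalized sigma profile.
    """
    width = (hi - lo) / n_intervals          # 0.001 for the default grid
    grid = [lo + k * width for k in range(n_intervals + 1)]
    area_profile = [0.0] * (n_intervals + 1)
    for sigma, area in zip(sigmas, areas):
        position = (sigma - lo) / width
        # index of the interval containing sigma, clamped to the grid
        i = min(max(int(position), 0), n_intervals - 1)
        # w as in Eq. (4), clamped for values outside the grid
        w = min(max(position - i, 0.0), 1.0)
        area_profile[i] += (1.0 - w) * area      # lower bound (assumed convention)
        area_profile[i + 1] += w * area          # upper bound (assumed convention)
    total_area = sum(area_profile)
    p = [a / total_area for a in area_profile]
    return grid, area_profile, p
```

The interpolation conserves the total surface area, so the normalized profile sums to one regardless of where the segments fall within their intervals.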

As described in the literature review, some authors divided the sigma profile into parts to obtain a better description of hydrogen-bonding (hb) interactions. Hsieh et al.²² proposed separating the sigma profile into non-hydrogen-bonding, hydroxyl (OH), and non-hydroxyl contributions according to the following equation (COSMO-SAC (2010)):

$$p(\sigma_{m} ) = p^{NHB} (\sigma_{m} ) + p^{OH} (\sigma_{m} ) + p^{OT} (\sigma_{m} )$$

(7)

where $p^{NHB} (\sigma_{m} )$ denotes the probabilities of all non-hydrogen-bonding atoms, $p^{OH} (\sigma_{m} )$ denotes the probabilities of OH hydrogen-bonding segments, and $p^{OT} (\sigma_{m} )$ covers F and N atoms and hydrogen atoms connected to F and N atoms. These contributions were determined as follows:

$$p^{OH} (\sigma_{m} ) = \frac{{A^{OH} (\sigma_{m} )}}{{A^{OH} (\sigma_{m} ) + A^{OT} (\sigma_{m} )}}p(\sigma_{m} )\left( {1 - \exp \left( { - \frac{{\sigma^{2} }}{{\sigma_{o}^{2} }}} \right)} \right)$$

(8)

$$p^{OT} (\sigma_{m} ) = \frac{{A^{OT} (\sigma_{m} )}}{{A^{OH} (\sigma_{m} ) + A^{OT} (\sigma_{m} )}}p(\sigma_{m} )\left( {1 - \exp \left( { - \frac{{\sigma^{2} }}{{\sigma_{o}^{2} }}} \right)} \right)$$

(9)

where $\sigma_{o}$ is the threshold for hydrogen-bonding determination and its value is 0.007 e/Å².
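Equations (7) to (9) can be sketched as follows. The function name `split_sigma_profile` and the per-node area lists `area_oh` and `area_ot` are hypothetical inputs introduced for illustration. The sketch obtains the NHB part as the remainder required by Eq. (7), and it assigns zero hydrogen-bonding probability to nodes that have no OH or OT area, which is an assumption made to avoid division by zero.

```python
import math


def split_sigma_profile(sigma_grid, p, area_oh, area_ot, sigma_0=0.007):
    """Split p(sigma) into NHB, OH and OT parts following Eqs. (7)-(9)."""
    p_nhb, p_oh, p_ot = [], [], []
    for sigma, p_m, a_oh, a_ot in zip(sigma_grid, p, area_oh, area_ot):
        hb_area = a_oh + a_ot
        # Gaussian-type factor 1 - exp(-sigma^2 / sigma_0^2) from Eqs. (8)-(9)
        hb_factor = 1.0 - math.exp(-(sigma ** 2) / sigma_0 ** 2)
        if hb_area > 0.0:
            oh = a_oh / hb_area * p_m * hb_factor
            ot = a_ot / hb_area * p_m * hb_factor
        else:
            oh = ot = 0.0  # assumption: no OH/OT area means no hb contribution
        p_oh.append(oh)
        p_ot.append(ot)
        p_nhb.append(p_m - oh - ot)  # remainder, so that Eq. (7) holds
    return p_nhb, p_oh, p_ot
```

The factor vanishes at $\sigma = 0$, so weakly charged segments stay in the non-hydrogen-bonding part even when they belong to an OH or OT atom.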

COSMO-SAC model

COSMO-SAC (2002)

In the COSMO-SAC model, activity coefficients are computed from the solvation energy obtained from ab initio solvation calculations in two steps: (1) dissolution of the solute in a conductor, and (2) conversion of the conductor into a real solvent. The activity coefficient of component i in solvent S ($\gamma_{i,S}$) is obtained from two contributions, the combinatorial part ($\gamma_{i,s}^{C}$) and the residual part ($\gamma_{i,s}^{R}$), as follows⁶:

$$\ln \gamma_{i,S} = \ln \gamma_{i,s}^{C} + \ln \gamma_{i,s}^{R}$$

(10)

The size and shape differences of the molecules are accounted for in the combinatorial part, which is calculated with the Staverman–Guggenheim term as follows²⁶:

$$\ln \gamma_{i,s}^{C} = \ln \frac{{\phi_{i} }}{{x_{i} }} + \frac{z}{2}q_{i} \ln \frac{{\theta_{i} }}{{\phi_{i} }} + l_{i} - \frac{{\phi_{i} }}{{x_{i} }}\sum\limits_{j} {x_{j} l_{j} }$$

(11)

where $\theta_{i}$, $\phi_{i}$ and $l_{i}$ are defined as follows:

$$\theta_{i} = \frac{{x_{i} q_{i} }}{{\sum\nolimits_{i} {x_{i} q_{i} } }};\phi_{i} = \frac{{x_{i} r_{i} }}{{\sum\limits_{i} {x_{i} r_{i} } }};l_{i} = \frac{z}{2}\left( {r_{i} - q_{i} } \right) - (r_{i} - {1})$$

(12)

In the above expressions, $r_{i}$ and $q_{i}$ are related to the cavity volume of component i ($V_{i}$) and the total surface area of molecule i ($A_{i}$), respectively, both obtained from the COSMO file, and are defined as follows:

$$r_{i} = \frac{{V_{i} }}{{r_{o} }};q_{i} = \frac{{A_{i} }}{{q_{o} }}$$

(13)

where $r_{o}$ and $q_{o}$ are the normalization volume and the normalization surface area. The residual part of the COSMO-SAC (2002) is defined as follows^6,17:

$$\ln \gamma_{i,s}^{R} = n_{i} \sum\limits_{{\sigma_{m} }} {p_{i} (\sigma_{m} )\left[ {\ln (\Gamma_{S} (\sigma_{m} )) - \ln (\Gamma_{i} (\sigma_{m} ))} \right]}$$

(14)

where $n_{i}$, the effective segment number of molecule i, is related to the effective segment surface area ($a_{eff}$) and the surface area of molecule i ($A_{i}$) by the expression below:

$$n_{i} = \frac{{A_{i} }}{{a_{eff} }}$$

(15)

The segment activity coefficient $\Gamma (\sigma_{m} )$ is calculated from:

$$\ln (\Gamma_{S} (\sigma_{m} )) = - \ln \left\{ {\sum\limits_{{\sigma_{n} }} {p_{S} (\sigma_{n} )\Gamma_{S} (\sigma_{n} )\exp \left[ { - \frac{{\Delta W(\sigma_{m} ,\sigma_{n} )}}{RT}} \right]} } \right\}$$

(16)

$$\ln (\Gamma_{i} (\sigma_{m} )) = - \ln \left\{ {\sum\limits_{{\sigma_{n} }} {p_{i} (\sigma_{n} )\Gamma_{i} (\sigma_{n} )\exp \left[ { - \frac{{\Delta W(\sigma_{m} ,\sigma_{n} )}}{RT}} \right]} } \right\}$$

(17)

The exchange energy $\Delta W(\sigma_{m} ,\sigma_{n} )$ is defined as:

$$\Delta W(\sigma_{m} ,\sigma_{n} ) = \left( {\frac{{\alpha^{\prime}}}{2}} \right)\left( {\sigma_{m} + \sigma_{n} } \right)^{2} + c_{hb} \max \left[ {0,\sigma_{acc} - \sigma_{hb} } \right]\min \left[ {0,\sigma_{don} + \sigma_{hb} } \right]$$

(18)

The $c_{hb}$ and $\sigma_{hb}$ are the energy-type constant and the cutoff value for the hydrogen-bonding interaction¹⁶. The $\sigma_{acc}$ and $\sigma_{don}$ are the maximum and minimum of $\sigma_{m}$ and $\sigma_{n}$. $\alpha^{\prime}$ is the misfit energy constant, and T and R are the system temperature and the universal gas constant. The values of these parameters are reported in Mullins et al.¹⁶. In Eq. (16), the sigma profile of the mixture ($p_{S} (\sigma )$) is obtained from:

$$p_{S} (\sigma ) = \frac{{\sum\nolimits_{i} {x_{i} A_{i} p_{i} (\sigma )} }}{{\sum\nolimits_{i} {x_{i} A_{i} } }}.$$

(19)
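The COSMO-SAC (2002) equations can be combined into a compact sketch. The code below is illustrative only and is not the program used in this study. All function names are hypothetical, the coordination number `z=10` is an assumed default, and every model constant ($r_{o}$, $q_{o}$, $a_{eff}$, $\alpha^{\prime}$, $c_{hb}$, $\sigma_{hb}$) must be supplied by the caller, for instance from Mullins et al.¹⁶. The mixture profile is weighted by the total molecular areas $A_{i}$. Damped successive substitution is one common way to solve the implicit Eqs. (16) and (17); it is an assumption here, not a statement about the authors' solver.

```python
import math


def ln_gamma_combinatorial(x, volumes, areas, r_0, q_0, z=10.0):
    """Staverman-Guggenheim term, Eqs. (11)-(13); z = 10 is an assumed default."""
    r = [v / r_0 for v in volumes]
    q = [a / q_0 for a in areas]
    sum_xr = sum(xi * ri for xi, ri in zip(x, r))
    sum_xq = sum(xi * qi for xi, qi in zip(x, q))
    l = [z / 2.0 * (ri - qi) - (ri - 1.0) for ri, qi in zip(r, q)]
    sum_xl = sum(xi * li for xi, li in zip(x, l))
    result = []
    for ri, qi, li in zip(r, q, l):
        phi_over_x = ri / sum_xr                      # phi_i / x_i
        theta_over_phi = (qi / sum_xq) / phi_over_x   # theta_i / phi_i
        result.append(math.log(phi_over_x) + z / 2.0 * qi * math.log(theta_over_phi)
                      + li - phi_over_x * sum_xl)
    return result


def exchange_energy(sigma_m, sigma_n, alpha_prime, c_hb, sigma_hb):
    """Eq. (18): misfit term plus hydrogen-bonding term."""
    sigma_acc, sigma_don = max(sigma_m, sigma_n), min(sigma_m, sigma_n)
    hb = c_hb * max(0.0, sigma_acc - sigma_hb) * min(0.0, sigma_don + sigma_hb)
    return 0.5 * alpha_prime * (sigma_m + sigma_n) ** 2 + hb


def ln_segment_gamma(profile, boltzmann, tol=1e-10, max_iter=10000):
    """Eqs. (16)-(17) solved by damped successive substitution."""
    n = len(profile)
    gamma = [1.0] * n
    for _ in range(max_iter):
        new = []
        for m in range(n):
            total = sum(profile[k] * gamma[k] * boltzmann[m][k] for k in range(n))
            new.append(0.5 * (gamma[m] + 1.0 / total))
        change = max(abs(a - b) / a for a, b in zip(new, gamma))
        gamma = new
        if change < tol:
            break
    return [math.log(g) for g in gamma]


def mixture_profile(x, areas, profiles):
    """Eq. (19): mixture sigma profile weighted by mole fraction and area."""
    norm = sum(xi * ai for xi, ai in zip(x, areas))
    return [sum(xi * ai * p[k] for xi, ai, p in zip(x, areas, profiles)) / norm
            for k in range(len(profiles[0]))]


def ln_gamma_residual(i, x, areas, profiles, grid, temperature, gas_constant,
                      a_eff, alpha_prime, c_hb, sigma_hb):
    """Eqs. (14)-(15): residual contribution for component i."""
    rt = gas_constant * temperature
    boltzmann = [[math.exp(-exchange_energy(sm, sn, alpha_prime, c_hb, sigma_hb) / rt)
                  for sn in grid] for sm in grid]
    ln_g_mix = ln_segment_gamma(mixture_profile(x, areas, profiles), boltzmann)
    ln_g_pure = ln_segment_gamma(profiles[i], boltzmann)
    n_i = areas[i] / a_eff
    return n_i * sum(p * (gm - gp) for p, gm, gp in zip(profiles[i], ln_g_mix, ln_g_pure))
```

Following Eq. (10), the total $\ln \gamma_{i,S}$ is the sum of the two contributions. A useful sanity check is that both parts vanish for a pure component, since the mixture profile then coincides with the pure-component profile. The gas constant must be given in units consistent with those of $\alpha^{\prime}$ and $c_{hb}$.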

COSMO-SAC (2010)

After establishing the NHB, OH, and OT sigma profiles, the segment activity coefficient is calculated as follows:

$$\ln \Gamma_{j}^{t} (\sigma_{m}^{t} ) = - \ln \left[ {\sum\limits_{s}^{NHB,OH,OT} {\sum\limits_{{\sigma_{n} }}^{{}} {p^{s} \left( {\sigma_{n}^{s} } \right)\Gamma_{j}^{s} \left( {\sigma_{n}^{s} } \right)\exp \left( {\frac{{ - \Delta W\left( {\sigma_{m}^{t} ,\sigma_{n}^{s} } \right)}}{RT}} \right)} } } \right]$$

(20)

where subscript $j$ denotes the pure liquid or the mixture and superscript $t$ denotes the NHB, OH, and OT sites. The exchange energy is defined on the basis of the interaction between segments of different types and is given by:

$$\Delta W(\sigma_{m} ,\sigma_{n} ) = \left( {A_{ES} + \frac{{B_{ES} }}{{T^{2} }}} \right)\left( {\sigma_{m} + \sigma_{n} } \right)^{2} + c_{hb} (\sigma_{m} ,\sigma_{n} )\left( {\sigma_{m} - \sigma_{n} } \right)^{2}$$

(21)

In contrast to the COSMO-SAC (2002), the hydrogen-bonding interaction coefficient $c_{hb}$ takes different values for the OH and OT contributions:

$$c_{hb} (\sigma_{m}^{t} ,\sigma_{n}^{s} ) = \left\{ {\begin{array}{ll} {c_{OH - OH} } & {t = s = OH,\ \sigma_{m}^{t} \cdot \sigma_{n}^{s} < 0} \\ {c_{OT - OT} } & {t = s = OT,\ \sigma_{m}^{t} \cdot \sigma_{n}^{s} < 0} \\ {c_{OH - OT} } & {t = OH,\ s = OT,\ \sigma_{m}^{t} \cdot \sigma_{n}^{s} < 0} \\ 0 & {\text{otherwise}} \\ \end{array} } \right.$$

(22)

The three hydrogen-bonding interaction parameters ($c_{OH - OH}$, $c_{OT - OT}$, and $c_{OH - OT}$), $A_{ES}$, and $B_{ES}$ are adjustable parameters, and their values are given in Hsieh et al.²². Afterward, the activity coefficient of component i in mixture S is determined from:

$$\ln \gamma_{i} = n_{i} \sum\limits_{t}^{NHB,OH,OT} {\sum\limits_{{\sigma_{n} }} {p_{i}^{t} \left( {\sigma_{n}^{t} } \right)\left( {\ln \Gamma_{S}^{t} \left( {\sigma_{n}^{t} } \right) - \ln \Gamma_{i}^{t} \left( {\sigma_{n}^{t} } \right)} \right)} } .$$

(23)
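The case distinction in Eq. (22) and the exchange energy of Eq. (21) can be sketched as below. The function names and the string labels `"NHB"`, `"OH"`, and `"OT"` are hypothetical. The sketch assumes that the OH-OT coefficient applies symmetrically, that is, also for $t = OT$, $s = OH$; Eq. (22) lists only one ordering. The parameter values must be taken from Hsieh et al.²² and are not reproduced here.

```python
def hb_coefficient(type_t, type_s, sigma_m, sigma_n, c_oh_oh, c_ot_ot, c_oh_ot):
    """Select c_hb according to Eq. (22)."""
    if sigma_m * sigma_n >= 0.0:
        return 0.0  # hydrogen bonding requires charges of opposite sign
    pair = {type_t, type_s}
    if pair == {"OH"}:
        return c_oh_oh
    if pair == {"OT"}:
        return c_ot_ot
    if pair == {"OH", "OT"}:
        return c_oh_ot  # assumed to be symmetric in t and s
    return 0.0  # any pair involving an NHB segment


def exchange_energy_2010(type_t, type_s, sigma_m, sigma_n, temperature,
                         a_es, b_es, c_oh_oh, c_ot_ot, c_oh_ot):
    """Exchange energy written exactly as in Eq. (21) of the text."""
    c_hb = hb_coefficient(type_t, type_s, sigma_m, sigma_n, c_oh_oh, c_ot_ot, c_oh_ot)
    electrostatic = (a_es + b_es / temperature ** 2) * (sigma_m + sigma_n) ** 2
    return electrostatic + c_hb * (sigma_m - sigma_n) ** 2
```

With this selector, the 2010 segment activity coefficients of Eq. (20) follow the same successive-substitution pattern as in the 2002 model, with the sum running over the three segment types.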

Flory–Huggins theory

In this study, a semi-predictive version of the Flory–Huggins model based on the Hansen solubility parameters was used. In Flory–Huggins theory, the activity coefficient of component i in a mixture is obtained from²⁵:

$$\ln \gamma_{i} = \ln \frac{{\phi_{i} }}{{x_{i} }} + 1 - \frac{{\phi_{i} }}{{x_{i} }} + 2V_{i} \sum\limits_{j} {\chi_{ij} \phi_{j}^{2} } - V_{i} \sum\limits_{j} {\sum\limits_{k} {\phi_{j} \phi_{k} \chi_{jk} } }$$

(24)

In the above equation, $\phi$ is the volume fraction ($\phi_{i} = \frac{{x_{i} V_{i} }}{{\sum\nolimits_{i} {x_{i} V_{i} } }}$) and V is the molar volume. $\chi$ is the Flory–Huggins interaction parameter, obtained from the Hansen solubility parameter ($\delta$) contributions of non-polar (dispersion) forces (d), polar forces (p), and hydrogen-bonding effects (h) as follows²⁷:

$$\chi_{ij} = \frac{{V_{i} }}{RT}\left( {\left( {\delta_{d,i} - \delta_{d,j} } \right)^{2} + 0.25\left( {\delta_{p,i} - \delta_{p,j} } \right)^{2} + 0.25\left( {\delta_{h,i} - \delta_{h,j} } \right)^{2} } \right)$$

(25)

The Hansen solubility parameters and their contributions were obtained by group contribution methods according to the following equations²⁸:

$$\delta_{d} = \frac{{\sum\nolimits_{i} {F_{d,i} } }}{{\sum\nolimits_{i} {V_{i} } }}; \, \delta_{p} = \frac{{\left( {\sum\nolimits_{i} {F_{p,i}^{2} } } \right)^{0.5} }}{{\sum\nolimits_{i} {V_{i} } }}; \, \delta_{h} = \frac{{\left( {\sum\nolimits_{i} {E_{h,i} } } \right)^{0.5} }}{{\sum\nolimits_{i} {V_{i} } }}; \, \delta_{t}^{2} = \, \delta_{d}^{2} + \, \delta_{p}^{2} + \, \delta_{h}^{2}$$

(26)

The $F_{d,i}$, $F_{p,i}$, $E_{h,i}$, and $V_{i}$ values were extracted from Barton²⁸.
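As a small illustration, the interaction parameter of Eq. (25) and the volume fractions used in Eq. (24) can be computed as below. The function names are hypothetical, and the unit choice is an assumption of this sketch: with molar volumes in cm³/mol, solubility parameters in MPa^0.5, and R in J/(mol K), $\chi_{ij}$ is dimensionless.

```python
R_GAS = 8.314462618  # J/(mol K)


def flory_huggins_chi(v_i, hansen_i, hansen_j, temperature):
    """Eq. (25): chi_ij from Hansen parameters given as (delta_d, delta_p, delta_h)."""
    d_i, p_i, h_i = hansen_i
    d_j, p_j, h_j = hansen_j
    distance_sq = (d_i - d_j) ** 2 + 0.25 * (p_i - p_j) ** 2 + 0.25 * (h_i - h_j) ** 2
    return v_i / (R_GAS * temperature) * distance_sq


def volume_fractions(x, molar_volumes):
    """phi_i = x_i V_i / sum_j x_j V_j."""
    total = sum(xi * vi for xi, vi in zip(x, molar_volumes))
    return [xi * vi / total for xi, vi in zip(x, molar_volumes)]
```

Because $\chi_{ij}$ carries the molar volume of component i, it is not symmetric in i and j unless the two molar volumes are equal.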

Solid–liquid equilibria

In solid–liquid equilibria, the solid solubility in liquid phase is calculated according to the following expression:

$$\ln x_{i} = \frac{{\Delta H_{m} }}{R}\left( {\frac{1}{{T_{m} }} - \frac{1}{T}} \right) - \frac{{\Delta C_{P} }}{R}\left( {1 - \frac{{T_{m} }}{T} - \ln \frac{T}{{T_{m} }}} \right) - \ln \gamma_{i}$$

(27)

where $x_{i}$ and $\gamma_{i}$ are the solubility and the activity coefficient of compound i. The activity coefficient in the above expression is computed from the considered models as described before. $\Delta H_{m}$, $\Delta C_{P}$, and $T_{m}$ represent the enthalpy of fusion, the heat capacity change between the solid and liquid phases, and the melting temperature, respectively. In the current study, the second term of Eq. (27) was neglected ($\Delta C_{P} = 0$).
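Because the activity coefficient itself depends on the solubility, Eq. (27) is implicit in $x_{i}$. The sketch below solves it by damped fixed-point iteration with $\Delta C_{P} = 0$. The function name `solubility` and the callback `ln_gamma_of_x`, which returns $\ln \gamma_{i}$ at a given solute mole fraction, are hypothetical. The iteration scheme is an assumption and not necessarily the solver used in this study.

```python
import math

R_GAS = 8.314462618  # J/(mol K)


def solubility(delta_h_m, t_m, temperature, ln_gamma_of_x, tol=1e-12, max_iter=500):
    """Solve Eq. (27) with Delta C_P = 0 for the solute mole fraction.

    delta_h_m:     enthalpy of fusion (J/mol)
    t_m:           melting temperature (K)
    ln_gamma_of_x: callable returning ln(gamma_i) at solute mole fraction x
    """
    ln_x_ideal = delta_h_m / R_GAS * (1.0 / t_m - 1.0 / temperature)
    x = math.exp(ln_x_ideal)  # ideal solubility as the starting guess
    for _ in range(max_iter):
        x_new = min(math.exp(ln_x_ideal - ln_gamma_of_x(x)), 1.0)
        if abs(x_new - x) <= tol * x_new:
            return x_new
        x = 0.5 * (x + x_new)  # damping stabilizes strongly non-ideal cases
    return x
```

For an ideal solution ($\ln \gamma_{i} = 0$) the loop returns after the first pass with the ideal solubility, which depends only on $\Delta H_{m}$, $T_{m}$, and $T$.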

Partition coefficient

When equilibrium is established between two immiscible liquid phases, the components distribute between the two phases. The distribution of component i between phases α and β is measured by the partition coefficient as follows¹⁵:

$$K_{i}^{\alpha ,\beta } = \frac{{x_{i}^{\alpha } }}{{x_{i}^{\beta } }} = \frac{{\gamma_{i}^{\beta } }}{{\gamma_{i}^{\alpha } }}$$

(28)

where $x_{i}^{\alpha }$ and $x_{i}^{\beta }$ are the mole fractions of component i in phases α and β, and $\gamma_{i}^{\alpha }$ and $\gamma_{i}^{\beta }$ are the corresponding activity coefficients. Therefore, the octanol/water partition coefficient of component i ($K_{OW,i}$) is calculated from¹⁵:

$$\log K_{OW,i} = \log \left( {\frac{{C_{o,W} \gamma_{i}^{W,\infty } }}{{C_{o,O} \gamma_{i}^{O,\infty } }}} \right)$$

(29)

where $C_{o,O}$ and $C_{o,W}$ are the total concentrations in the octanol-rich and water-rich phases. The $\gamma_{i}^{O,\infty }$ and $\gamma_{i}^{W,\infty }$ are the activity coefficients of component i at infinite dilution in the octanol-rich and water-rich phases. The default value of $\frac{{C_{o,W} }}{{C_{o,O} }}$ is 0.151. The octanol-rich phase is composed of 27.5 mol% water and 72.5 mol% octanol. The water-rich phase is free of octanol.
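Equation (29) reduces to a one-line calculation once the two infinite-dilution activity coefficients are available. In the sketch below, the function name `log_kow` is hypothetical, and the default concentration ratio of 0.151 is the value stated in the text.

```python
import math


def log_kow(gamma_inf_water, gamma_inf_octanol, conc_ratio=0.151):
    """Eq. (29): log10 of the octanol/water partition coefficient.

    gamma_inf_water:   infinite-dilution activity coefficient in the water-rich phase
    gamma_inf_octanol: infinite-dilution activity coefficient in the octanol-rich phase
    conc_ratio:        C_{o,W} / C_{o,O}
    """
    return math.log10(conc_ratio * gamma_inf_water / gamma_inf_octanol)
```

Since only the ratio of the two activity coefficients enters, a model error that affects both phases by the same factor cancels in $\log K_{OW,i}$.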

Cocrystal formation

The three-phase diagram for a drug and an API with cocrystal (CC) formation includes three lines: the two solubility lines, API/solvent and drug/solvent, and the cocrystal line. The solubility lines of the drug and the API in the solvent are determined from solubility calculations of the drug/API in the mixture according to Eq. (27) in combination with the considered models. Cocrystal formation is described by a chemical reaction between the drug (A) and the API (B) as follows^29,30:

$$aA + bB\overset {K_{CC} } \longleftrightarrow A_{a} B_{b}$$

(30)

where a and b are the stoichiometric coefficients of substances A and B in the cocrystal. In the above equation, $K_{CC}$ is the solubility product, which is computed by the following equation:

$$K_{CC} = (x_{A} \gamma_{A} )^{a} \times (x_{B} \gamma_{B} )^{b}$$

(31)

The activity coefficients in Eq. (31) are computed from the examined model. The solubility product ($K_{CC}$) depends only on temperature and is independent of the solvent type. Once the solubility product is known at a single point, it can be applied to other conditions. After obtaining the solubility product for the desired system, the invariant points, which are the intersections of the cocrystal line and the solubility lines, are computed by simultaneous solution of Eqs. (27) and (31). Afterward, the cocrystal region is determined by varying the drug mole fraction between the two invariant points and obtaining the API mole fraction from Eq. (31).
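The cocrystal line can be traced with a short routine built on Eq. (31). The function names and the callback `gammas`, which returns the pair $(\gamma_{A}, \gamma_{B})$ at a given composition, are hypothetical. The damped fixed-point iteration is an assumption of this sketch, not a description of the solver used in this study.

```python
def solubility_product(x_a, x_b, gamma_a, gamma_b, a=1, b=1):
    """Eq. (31): K_CC from one known point on the cocrystal line."""
    return (x_a * gamma_a) ** a * (x_b * gamma_b) ** b


def cocrystal_line_point(k_cc, x_a, gammas, a=1, b=1, tol=1e-12, max_iter=500):
    """Mole fraction x_B on the cocrystal line for a given x_A and K_CC.

    gammas: callable (x_a, x_b) -> (gamma_a, gamma_b) from the chosen model
    """
    x_b = (k_cc / x_a ** a) ** (1.0 / b)  # ideal-solution starting guess
    for _ in range(max_iter):
        gamma_a, gamma_b = gammas(x_a, x_b)
        x_new = (k_cc / (x_a * gamma_a) ** a) ** (1.0 / b) / gamma_b
        if abs(x_new - x_b) <= tol * x_new:
            return x_new
        x_b = 0.5 * (x_b + x_new)
    return x_b
```

Calling `cocrystal_line_point` for a series of $x_{A}$ values between the two invariant points yields the cocrystal line of the ternary diagram.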

Statistical analysis

In order to assess model precision against experimental data, several statistics were applied: the absolute average percentage deviation (% AAD), root mean square error (RMSE), mean square error (MSE), normalized root mean square error (NRMSE), and normalized mean square error (NMSE). MSE, NRMSE, and NMSE were obtained from the goodnessOfFit function in MATLAB. The absolute average percentage deviation was calculated with the following equation:

$$\% AAD = \frac{1}{n}\sum\limits_{i} {\left| {\frac{{\Omega_{i,cal} - \Omega_{i,\exp } }}{{\Omega_{i,\exp } }}} \right|} \times 100$$

(33)

where $\Omega_{cal}$ and $\Omega_{\exp }$ are the calculated and experimental values of the desired property and n is the number of experimental data points. The root mean square error (RMSE) was obtained as follows:

$$RMSE = \sqrt {\left| {\frac{{\sum\nolimits_{i} {\left( {\Omega_{i,cal} - \Omega_{i,\exp } } \right)^{2} } }}{n}} \right|} .$$

(34)
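The two explicitly defined statistics are straightforward to reproduce. The sketch below uses the hypothetical function names `aad_percent` and `rmse` and plain Python lists.

```python
import math


def aad_percent(calculated, experimental):
    """Absolute average percentage deviation, Eq. (33)."""
    n = len(experimental)
    return 100.0 / n * sum(abs((c - e) / e) for c, e in zip(calculated, experimental))


def rmse(calculated, experimental):
    """Root mean square error, Eq. (34)."""
    n = len(experimental)
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calculated, experimental)) / n)
```

Note that % AAD is a relative measure and therefore weights errors at very small mole fractions heavily, whereas RMSE is dominated by the largest absolute deviations.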

Results and discussion

The objective of this section is to evaluate the performance of the COSMO-SAC (2002), the COSMO-SAC (2010), and the Flory–Huggins models for pharmaceutical compounds, which are mostly complicated, large molecules containing electronegative atoms such as N, O, and S and exhibiting complex interactions such as hydrogen bonding. The considered properties are the solubilities of pharmaceutical compounds in pure solvents and solvent mixtures. The octanol/water partition coefficient and cocrystal formation of pharmaceutical compounds are the other examined properties. To conduct the study, the COSMO files were first required. The COSMO files were therefore prepared for 15 solvents and 35 pharmaceutical compounds with the DMol³ module in Materials Studio 2017. In generating the COSMO files, the density functional was set to GGA (VWN-BP) with fine quality. In the electronic options, the multipolar expansion was set to octupole. The calculations were run on four parallel cores. Other options were left at their default values in DMol³.

After generating the COSMO files, the sigma profiles obtained in the current study were tested against sigma profiles reported in other studies. Figures 1 and 2 compare the sigma profiles generated in the current study for ibuprofen and acetyl salicylic acid with those in the database provided by Mullins et al.¹⁶. Figures 1 and 2 show the same trends for this study and for Mullins et al.¹⁶. The small departures between the two curves originate from the software version and the sigma profile generation program.

After generating the sigma profiles and preparing the COSMO-SAC computation program for the activity coefficient, the solubilities in the binary and ternary systems were calculated and compared with experimental data obtained from the literature.

Figure 3 shows parity plots of the experimental solubilities in pure solvents against the solubilities calculated with the COSMO-SAC (2002), the COSMO-SAC (2010), and the Flory–Huggins models. The mean square error (MSE), normalized root mean square error (NRMSE), and normalized mean square error (NMSE) for the COSMO-SAC (2002) model are 0.0136, 0.0349, and 0.0685. The MSE, NMSE, and NRMSE for the COSMO-SAC (2010) are 0.0187, −0.2718, and −0.1277, while the MSE, NMSE, and NRMSE for the Flory–Huggins model are 0.0360, −1.2337, and −0.4946. According to Fig. 3, the Flory–Huggins model underpredicts the solubility data. The examined pharmaceutical compounds cover a wide variety of species, from small to long-chain molecules. They are composed of C, H, N, O, S, F, and Cl atoms, which are joined by covalent bonds and interact through hydrogen bonding. The reported statistics indicate that the COSMO-SAC (2002) performs acceptably relative to the COSMO-SAC (2010). This ranking of the COSMO-SAC (2002) and the COSMO-SAC (2010) appears inconsistent with that reported in the literature¹⁵. The accuracy of these two COSMO-SAC models has been comprehensively examined with a very large dataset containing 29,173 data points of infinite dilution activity coefficients and 139,921 VLE data points of 6940 binary mixtures³¹. The inconsistency arises first from the different universal constants used in sigma profile generation. Differences in the investigated systems are the second reason for the observed inconsistency.

It is notable that the COSMO-SAC (2002) results were obtained with only eight universal constants, without any further modification. A list of the considered pharmaceutical compounds, their physical properties, and the references for the experimental data is presented in the supplementary materials (Table S1).

The Hansen solubility parameters and molar volumes used in the Flory–Huggins model, together with the COSMO molar volumes of the examined pharmaceutical compounds and solvents, are presented in Table 1. According to Table 1, the molar volumes obtained from the group contribution method of Barton²⁸ differ somewhat from those obtained from the COSMO calculations.

Table 1 Hansen solubility parameters and molar volumes from group contribution method in comparison to molar volumes obtained from the COSMO calculations.


Table 2 reports the COSMO-SAC (2002), COSMO-SAC (2010), and Flory–Huggins results for some pharmaceutical compounds, categorized by solvent type and sorted according to the absolute average deviation (AAD%). The RMSE results for the three models are also reported in Table 2. According to Table 2, the predictive COSMO-SAC (2002) model shows a wide range of errors, in agreement with the errors reported by Hsieh et al.¹⁵. The COSMO-SAC (2010) and the Flory–Huggins models have larger errors than the COSMO-SAC (2002).

Table 2 The results of solubility from the COSMO-SAC model (2002) for some considered pharmaceutical compounds in comparison to Flory–Huggins model and the COSMO-SAC (2010).


According to Table 2, pharmaceutical compounds containing only H, C, and O and the fewest hydrogen bonds have the lowest errors. The structure of the molecule also has a remarkable influence on accuracy. For acetaminophen and acetyl salicylic acid, replacing ethanol with acetone as the solvent deteriorates the model prediction. The effect of removing the F atom from flurbiprofen is seen in the lower error reported for ibuprofen. Although borneol and isoborneol have the same chemical formula, the accuracy of the COSMO-SAC (2002) for them is entirely different. These observations imply that molecular structure, constituent atoms, and intermolecular interactions must be incorporated more fully into the COSMO-SAC model. Since the COSMO-SAC (2002) provides better approximations of solubility in the examined systems, the original COSMO-SAC (2002) is used in the further investigation of the binary and ternary systems. Afterward, two models, the COSMO-SAC (2002) and the Flory–Huggins model, were considered for the octanol/water partition coefficient and cocrystal formation.

Afterward, ternary systems of pharmaceutical compounds in binary solvents were also examined. On the basis of Table 2, two pharmaceutical compounds, acetaminophen and salicylic acid, were selected. Acetaminophen consists of 20 atoms (H, C, N, and O) and two functional groups, OH and NH. Salicylic acid consists of 16 atoms (H, C, and O) and two functional groups, OH and COOH. Figure 4 compares the experimental and calculated solubilities of acetaminophen in ethanol/water mixtures as a function of ethanol mole fraction at two temperatures, 293.15 and 303.15 K. According to Fig. 4, good agreement is observed between the experimental data and the COSMO-SAC calculations. The trends of the COSMO-SAC predictions with temperature match the reported experiments.

Figure 5 shows the calculated solubility of salicylic acid in ethanol/ethyl acetate mixtures compared with experimental data. According to Fig. 5, a departure from the experimental data is observed at higher ethyl acetate mole fractions. Ethyl acetate has a COO functional group, whose interaction with the COOH group of salicylic acid is ignored in the COSMO-SAC (2002).

The octanol/water partition coefficients of some pharmaceutical compounds were obtained from the COSMO-SAC model. In Table 3, the octanol/water partition coefficients from the COSMO-SAC model are compared with experimental data from the National Library of Medicine³⁴. The MSE, NMSE, and NRMSE are 2.36, 0.1416, and 0.0735. The RMSEs for the COSMO-SAC and the Flory–Huggins models are 1.25 and 4.45. According to Table 3, the accuracy varies with the ratio of activity coefficients that enters the octanol/water partition coefficient. If the errors in the numerator and denominator cancel each other out, good agreement between the COSMO-SAC computation and experiment is obtained. Otherwise, discrepancies appear. It is therefore possible for the COSMO-SAC model to fail in solubility prediction (as for dapsone) and still give a reasonable estimate of the octanol/water partition coefficient. As observed from Table 3, the COSMO-SAC predictions are better for simple molecules made of H, C, and O with only hydrogen bonding. According to Table 3, the octanol/water partition coefficients obtained from the Flory–Huggins model are far from the experimental data.

Table 3 The calculated and experimental octanol/water partition coefficient for some pharmaceutical compounds.


In order to investigate a more complex system, the three-phase diagram of a ternary system was explored by considering the sulfamethazine (SM)/salicylic acid (SA) cocrystal formation in methanol (ME) at 283.15 K, which was studied by Ahuja et al.³⁵. Details of the calculations and methods are described in the “Cocrystal formation” section. After the computation by the COSMO-SAC (2002), a triangular diagram of the considered system was plotted with a free software package named ProSim Ternary Diagram. On the basis of Fig. 6 and the experimental plots in Ahuja et al.³⁵, some differences between the experiments and the COSMO-SAC calculations are observed. The cocrystal region for SM/SA predicted by the COSMO-SAC is wider, while the experimental data imply a narrow region. The solubility line of SM in the SA + ME mixture is expanded in the COSMO-SAC model in comparison with the experiments, which is attributed to the ability of the COSMO-SAC to describe the considered system. The predicted solubility line of SA in the SM + SA is closer to the reported experimental data, which indicates the good performance of the COSMO-SAC for SA. The reported inconsistencies originate from the molecular structure, the constituent atoms, and their interactions. The electronegative atoms S and N in sulfamethazine give rise to the observed discrepancies, since their contributions are not considered in the COSMO-SAC (2002) model. The ternary phase diagram of carbamazepine (CBZ)/acetylsalicylic acid (ASA) in ethanol (ET) at 298.15 K was computed by the COSMO-SAC (2002) and is plotted in Fig. 7. Veith et al.²⁹ studied the CBZ/ASA/ET system with the PC-SAFT EOS. According to Veith et al.²⁹, the PC-SAFT EOS without binary interaction parameters estimated a narrow cocrystal region and low solubilities, whereas the COSMO-SAC (2002) predicts higher solubilities and a wider cocrystal region. Comparing the COSMO-SAC (2002) calculations with the PC-SAFT EOS including binary interaction parameters and with the experimental data of Veith et al.²⁹ shows reasonable agreement between the COSMO-SAC (2002) and the reported data.
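To make the notion of a wider or narrower cocrystal region concrete, the sketch below traces the solubility line of a 1:1 cocrystal between its two eutectic points using a solubility product, K_sp = x_A x_B. It assumes an ideal solution (all activity coefficients equal to one), which is a simplification; the calculations in this work use COSMO-SAC activity coefficients instead. The function name and all numerical values are hypothetical.

```python
def cocrystal_line(k_sp, x_a_sat, x_b_sat, n_points=5):
    """Ideal-solution solubility line of a 1:1 cocrystal AB in a solvent.

    k_sp     : solubility product in mole fractions, x_A * x_B
    x_a_sat  : solubility (mole fraction) of pure solid A in the solvent
    x_b_sat  : solubility (mole fraction) of pure solid B in the solvent
    Returns (x_A, x_B, x_solvent) tuples between the two eutectic points,
    or an empty list if the cocrystal is never the stable solid.
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if k_sp >= x_a_sat * x_b_sat:
        return []  # pure A or pure B precipitates before the cocrystal does

    x_a_low = k_sp / x_b_sat  # eutectic point shared with solid B
    x_a_high = x_a_sat        # eutectic point shared with solid A
    points = []
    for i in range(n_points):
        x_a = x_a_low + (x_a_high - x_a_low) * i / (n_points - 1)
        x_b = k_sp / x_a
        points.append((x_a, x_b, 1.0 - x_a - x_b))
    return points


# Hypothetical inputs, for illustration only.
line = cocrystal_line(k_sp=5e-4, x_a_sat=0.02, x_b_sat=0.10)
for x_a, x_b, x_s in line:
    print(f"x_A={x_a:.4f}  x_B={x_b:.4f}  x_solvent={x_s:.4f}")
```

In this simplified picture, a smaller solubility product relative to the product of the pure-component solubilities moves the eutectic points further apart, which corresponds to a wider cocrystal region in the triangular diagram.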

Conclusions

The COSMO-SAC, as a predictive model, has gained great attention in thermodynamic modeling and phase equilibria calculations. Eight universal parameters and predefined atomic radii for C, H, O, S, N, F, and Cl form the general basis of the COSMO-SAC model. In the current study, the COSMO-SAC model was applied to solid–liquid phase equilibria in the form of solubility data in binary and ternary systems, octanol/water partition coefficients, and cocrystal studies. For further comparison, the COSMO-SAC model was also compared with the Flory–Huggins model. The results imply that molecular structure, constituent atoms, functional groups, and their interactions have a remarkable impact on the predictions. In general, simple molecules made of the atoms H, C, and O and, under special conditions, N, with simple covalent and hydrogen bonding interactions, can be treated by the COSMO-SAC model. The presence of other atoms such as F and S and of other functional groups such as COO and COOH makes the systems complex. This complexity provides opportunities to modify the original COSMO-SAC model.

References

1. Jakob, A., Grensemann, H., Lohmann, J. & Gmehling, J. Further development of modified UNIFAC (Dortmund): revision and extension 5. Ind. Eng. Chem. Res. 45(23), 7924–7933 (2006).
2. Chen, C.-C. & Song, Y. Solubility modeling with a nonrandom two-liquid segment activity coefficient model. Ind. Eng. Chem. Res. 43(26), 8354–8362 (2004).
3. Klamt, A. Conductor-like screening model for real solvents: a new approach to the quantitative calculation of solvation phenomena. J. Phys. Chem. 99(7), 2224–2235 (1995).
4. Klamt, A., Jonas, V., Bürger, T. & Lohrenz, J. C. Refinement and parametrization of COSMO-RS. J. Phys. Chem. A 102(26), 5074–5085 (1998).
5. Klamt, A. & Eckert, F. COSMO-RS: a novel and efficient method for the a priori prediction of thermophysical data of liquids. Fluid Phase Equilib. 172(1), 43–72 (2000).
6. Lin, S.-T. & Sandler, S. I. A priori phase equilibrium prediction from a segment contribution solvation model. Ind. Eng. Chem. Res. 41(5), 899–913 (2002).
7. Mullins, E. et al. Sigma-profile database for using COSMO-based thermodynamic methods. Ind. Eng. Chem. Res. 45(12), 4389–4415 (2006).
8. Tung, H. H., Tabora, J., Variankaval, N., Bakken, D. & Chen, C. C. Prediction of pharmaceutical solubility via NRTL-SAC and COSMO-SAC. J. Pharm. Sci. 97(5), 1813–1820. https://doi.org/10.1002/jps.21032 (2008).
9. Zhou, Y. et al. Separation of thioglycolic acid from its aqueous solution by ionic liquids: ionic liquids selection by the COSMO-SAC model and liquid–liquid phase equilibrium. J. Chem. Thermodyn. 118, 263–273 (2018).
10. Paese, L. T., Spengler, R. L., Soares, R. D. P. & Staudt, P. B. Predicting phase equilibrium of aqueous sugar solutions and industrial juices using COSMO-SAC. J. Food Eng. https://doi.org/10.1016/j.jfoodeng.2019.109836 (2020).
11. Xavier, V. B., Staudt, P. B. & de Soares, R. P. Predicting VLE and odor intensity of mixtures containing fragrances with COSMO-SAC. Ind. Eng. Chem. Res. 59(5), 2145–2154 (2020).
12. Bouillot, B., Teychené, S. & Biscans, B. An evaluation of COSMO-SAC model and its evolutions for the prediction of drug-like molecule solubility: part 1. Ind. Eng. Chem. Res. 52(26), 9276–9284 (2013).
13. Shu, C.-C. & Lin, S.-T. Prediction of drug solubility in mixed solvent systems using the COSMO-SAC activity coefficient model. Ind. Eng. Chem. Res. 50(1), 142–147 (2011).
14. Buggert, M. et al. COSMO-RS calculations of partition coefficients: different tools for conformation search. Chem. Eng. Technol. Ind. Chem.-Plant Equip.-Process Eng.-Biotechnol. 32(6), 977–986 (2009).
15. Hsieh, C.-M., Wang, S., Lin, S.-T. & Sandler, S. I. A predictive model for the solubility and octanol–water partition coefficient of pharmaceuticals. J. Chem. Eng. Data 56(4), 936–945 (2011).
16. Mullins, E., Liu, Y., Ghaderi, A. & Fast, S. D. Sigma profile database for predicting solid solubility in pure and mixed solvent mixtures for organic pharmacological compounds with COSMO-based thermodynamic methods. Ind. Eng. Chem. Res. 47(5), 1707–1725 (2008).
17. Bell, I. H. et al. A benchmark open-source implementation of COSMO-SAC. J. Chem. Theory Comput. 16(4), 2635–2646 (2020).
18. Ferrarini, F., Flôres, G., Muniz, A. & de Soares, R. An open and extensible sigma-profile database for COSMO-based models. AIChE J. 64(9), 3443–3455 (2018).
19. Mu, T., Rarey, J. & Gmehling, J. Performance of COSMO-RS with sigma profiles from different model chemistries. Ind. Eng. Chem. Res. 46(20), 6612–6629 (2007).
20. Lee, M.-T. & Lin, S.-T. Prediction of mixture vapor–liquid equilibrium from the combined use of Peng–Robinson equation of state and COSMO-SAC activity coefficient model through the Wong-Sandler mixing rule. Fluid Phase Equilib. 254(1–2), 28–34 (2007).
21. Lin, S.-T., Chang, J., Wang, S., Goddard, W. A. & Sandler, S. I. Prediction of vapor pressures and enthalpies of vaporization using a COSMO solvation model. J. Phys. Chem. A 108(36), 7429–7439 (2004).
22. Hsieh, C.-M., Sandler, S. I. & Lin, S.-T. Improvements of COSMO-SAC for vapor–liquid and liquid–liquid equilibrium predictions. Fluid Phase Equilib. 297(1), 90–97 (2010).
23. Paulechka, E., Diky, V., Kazakov, A., Kroenlein, K. & Frenkel, M. Reparameterization of COSMO-SAC for phase equilibrium properties based on critically evaluated data. J. Chem. Eng. Data 60(12), 3554–3561 (2015).
24. Islam, M. R. & Chen, C.-C. COSMO-SAC sigma profile generation with conceptual segment concept. Ind. Eng. Chem. Res. 54(16), 4441–4454 (2015).
25. Lindvig, T., Michelsen, M. L. & Kontogeorgis, G. M. A Flory-Huggins model based on the Hansen solubility parameters. Fluid Phase Equilib. 203(1–2), 247–260 (2002).
26. Staverman, A. The entropy of high polymer solutions. Generalization of formulae. Recl. Trav. Chim. Pays-Bas 69(2), 163–174 (1950).
27. Kurada, K. V. & De, S. Modeling of solution thermodynamics: A method for tuning the properties of blend polymeric membranes. J. Membr. Sci. 540, 485–495 (2017).
28. Barton, A. F. Handbook of Polymer–Liquid Interaction Parameters and Solubility Parameters (CRC Press, New York, 1990).
29. Veith, H., Schleinitz, M., Schauerte, C. & Sadowski, G. Thermodynamic approach for co-crystal screening. Cryst. Growth Des. 19(6), 3253–3264. https://doi.org/10.1021/acs.cgd.9b00103 (2019).
30. Ainouz, A., Authelin, J. R., Billot, P. & Lieberman, H. Modeling and prediction of cocrystal phase diagrams. Int. J. Pharm. 374(1–2), 82–89. https://doi.org/10.1016/j.ijpharm.2009.03.016 (2009).
31. Fingerhut, R. et al. Comprehensive assessment of COSMO-SAC models for predictions of fluid-phase equilibria. Ind. Eng. Chem. Res. 56(35), 9868–9884. https://doi.org/10.1021/acs.iecr.7b01360 (2017).
32. Jiménez, J. A. & Martínez, F. Thermodynamic magnitudes of mixing and solvation of acetaminophen in ethanol + water cosolvent mixtures. Rev. Acad. Colomb Cienc 30(114), 87–99 (2006).
33. Matsuda, H. et al. Solubilities of salicylic acid in pure solvents and binary mixtures containing cosolvent. J. Chem. Eng. Data 54(2), 480–484. https://doi.org/10.1021/je800475d (2009).
34. National Library of Medicine, National Center for Biotechnology Information. Accessed 15 July 2020. https://pubchem.ncbi.nlm.nih.gov/.
35. Ahuja, D., Svärd, M. & Rasmuson, Å. C. Investigation of solid–liquid phase diagrams of the sulfamethazine–salicylic acid co-crystal. CrystEngComm 21(18), 2863–2874 (2019).


Acknowledgements

The authors gratefully acknowledge financial support (grant number: 98017343) from the Iran National Science Foundation (INSF).

Author information

Authors and Affiliations

Department of Chemical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
Samane Zarei Mahmoudabadi & Gholamreza Pazuki


Contributions

S.Z.M.: Conceptualization, Methodology, Software, Writing. G.P.: Writing, Methodology, Supervision.

Corresponding author

Correspondence to Gholamreza Pazuki.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Mahmoudabadi, S.Z., Pazuki, G. Investigation of COSMO-SAC model for solubility and cocrystal formation of pharmaceutical compounds. Sci Rep 10, 19879 (2020). https://doi.org/10.1038/s41598-020-76986-3


Received: 20 August 2020
Accepted: 04 November 2020
Published: 16 November 2020
Version of record: 16 November 2020
DOI: https://doi.org/10.1038/s41598-020-76986-3
