Introduction

Multi-particulate dosage forms are considered a breakthrough technology in the pharmaceutical industry thanks to their remarkable potential as drug delivery systems with diverse applications1,2. In recent decades, formulation scientists worldwide have focused on developing state-of-the-art technologies that raise the efficiency and bioavailability of orally administered drugs with minimal systemic toxicity. Such technologies not only offer significant hope to patients suffering from fatal diseases but also give industry an excellent opportunity to enhance its market share, particularly for competitive therapeutic agents3,4,5.

According to the biopharmaceutics classification system (BCS), the principal parameters describing the absorption behavior of orally administered therapeutic agents are solubility and permeability6. Finding a cost-effective and green paradigm to increase the aqueous solubility of poorly soluble lipophilic drugs is therefore a major aim of the pharmaceutical industry7. The application of supercritical CO2 (CO2-SCF) to increase the solubility of poorly soluble therapeutic agents has recently attracted the attention of scientists8,9,10. Under normal conditions, CO2 is one of the most important greenhouse gases, raising concerns such as global warming and air pollution11,12. Nevertheless, CO2 is an efficient medium for enhancing the solubility of orally administered drugs with low aqueous solubility owing to its considerable advantages: great potential to manufacture particles with appropriate aerodynamic diameters, very little heating required to produce the fundamental particles, low toxicity, safety, and readily accessible critical conditions13,14.

Exemestane (Aromasin, chemical formula C20H24O2) is a commonly used steroidal aromatase inhibitor, approved by the U.S. Food and Drug Administration (FDA) in October 1999 for the adjuvant treatment of postmenopausal women with hormonally responsive breast cancer. Exemestane significantly reduces the body's estrogen production and thereby stops the growth of cancerous cells that are sensitive to estrogen15,16,17.

Machine learning (ML) methodologies are gradually replacing traditional computational methods in a variety of scientific disciplines18,19. Neural networks, deep learning, linear models, and ensemble methods are examples of approaches used to solve problems in energy, fluid properties, materials, separation, and related fields20,21. Given a set of input features and one or more target outputs, ML models capture the relationships between inputs and outputs through different mechanisms22.

It is common practice to employ SVR, an algorithm grounded in statistical learning theory, to enhance generalization capacity. Support vector-based models come in numerous variants that differ in their estimation function; in this study we employed the Nu-SVR model23,24. The Gaussian process regression (GPR) model is an effective non-parametric Bayesian model for exploration and exploitation. A primary benefit of GPR is its ability to provide a consistent response over the model's input space25. Bayesian inference lets the data determine the complexity of the model, allowing this approach to capture a wide variety of correlations between input features and output values26,27,28,29. Linear regression is a popular statistical analysis method; although fundamental, it is extremely useful in fields such as economics, materials science, and chemistry. LASSO regression, another linear model, yields sparse coefficient estimates30. In this study, the solubility of Exemestane in supercritical CO2 (scCO2) at various temperatures and pressures was evaluated using several mathematical models developed through artificial intelligence techniques. Comparative analysis indicated that GPR provided the most accurate predictions, achieving the highest R² value (0.996) alongside the lowest error (MAE = 0.904).

Data set for computing

This study involves a regression task on 45 data points with two numerical inputs, T (K) and P (MPa), and a single numerical output, the solubility of the EXE drug. Table 1 shows the dataset of our research (taken from31, which can be accessed at https://www.sciencedirect.com/science/article/pii/S0896844609002071). Figure 1 shows the pairwise distribution of the variables.
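As a sketch of the dataset layout described above, the table can be held as a small data frame with the two inputs and the single target. The values below are placeholders, not the actual measurements from reference 31:

```python
import pandas as pd

# Illustrative structure of the 45-point dataset: two numerical inputs
# (temperature and pressure) and one numerical output (EXE solubility).
# The three rows shown here are placeholder values, NOT data from ref. 31.
data = pd.DataFrame({
    "T (K)": [308.0, 318.0, 328.0],          # operating temperature
    "P (MPa)": [12.0, 18.0, 24.0],           # operating pressure
    "solubility": [1.2e-5, 3.4e-5, 7.8e-5],  # placeholder EXE solubility
})
print(data.shape)
```

In the study itself, the frame would contain all 45 experimental rows from Table 1.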

Table 1 The whole used dataset31.
Fig. 1
figure 1

Distributions of parameters.

Methodology

Cuckoo search algorithm (CS)

The CS algorithm32,33 is a meta-heuristic optimizer based on swarm intelligence. It simulates the brood-parasitic egg-laying behavior of some cuckoo species34, and uses Lévy flights to search for the best possible nests in which to incubate a cuckoo's eggs. The CS algorithm follows three principles34,35,36:

  1. There is only ever one egg laid by a cuckoo at a time, and the nesting sites are selected at random.

  2. Among a randomly chosen group of nests, only the nest with the superior-quality eggs is selected to generate the next generation.

  3. In every generation of cuckoos, the number of available host nests remains constant34.
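The three rules above can be sketched as a minimal cuckoo search loop. This toy version minimizes a sphere function rather than tuning model hyperparameters, and the population size, abandonment fraction `pa`, and Lévy exponent are illustrative choices, not the paper's settings:

```python
import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(0)

def levy_step(dim, beta=1.5):
    # Mantegna's algorithm for Levy-distributed step lengths
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, dim)
    v = rng.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(f, dim=2, n_nests=15, pa=0.25, iters=200, lb=-5.0, ub=5.0):
    nests = rng.uniform(lb, ub, (n_nests, dim))
    fitness = np.array([f(x) for x in nests])
    for _ in range(iters):
        best = nests[fitness.argmin()].copy()
        for i in range(n_nests):
            # Rule 1: a cuckoo lays one egg via a Levy flight, in a random nest
            new = np.clip(nests[i] + 0.01 * levy_step(dim) * (nests[i] - best), lb, ub)
            fn = f(new)
            j = rng.integers(n_nests)
            if fn < fitness[j]:          # Rule 2: the better egg survives
                nests[j], fitness[j] = new, fn
        # Rule 3 (with abandonment): nest count is fixed; a fraction pa of the
        # worst nests is abandoned and rebuilt at random positions
        worst = fitness.argsort()[-int(pa * n_nests):]
        nests[worst] = rng.uniform(lb, ub, (len(worst), dim))
        fitness[worst] = [f(x) for x in nests[worst]]
    return nests[fitness.argmin()], fitness.min()

x_best, f_best = cuckoo_search(lambda x: float(np.sum(x ** 2)))
print(f_best)  # close to 0 for the sphere function
```

In this study the objective `f` would instead be a cross-validation error of a candidate hyperparameter vector.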

GPR

Probabilistic regression can increase robustness to learning errors in many cases. Nonlinear regression methods that rely on a probabilistic framework but use non-parametric models37 include Gaussian process regression (GPR). The premise of this approach is that the measurements y constituting the output variable are generated as follows25:

$$\:y=f\left(\mathbf{x}\left(k\right)\right)+\epsilon\:$$

where ε is zero-mean Gaussian noise with variance \(\:{\:\sigma\:}_{n}^{2}\:\). Instead of assigning parameters to the function f, the prior probability is described in terms of a GP, which applies across the entire function space38. The mean m(x) and the covariance function cov(x, x′) of the GP encode assumptions about the generating mechanism. Once the mean and covariance functions are computed, the output corresponding to a specific data point x* can be derived from the Gaussian distribution p(y*|X, y, x*) with39:

$$\:\begin{array}{cc}&\:{\stackrel{{}^{}}{y}}_{*}=m\left({\mathbf{x}}_{*}\right)+{\mathbf{k}}_{*}^{\text{T}}{\left(\mathbf{K}+{\sigma\:}_{n}^{2}\mathbf{I}\right)}^{-1}\left(\mathbf{y}-m\left(\mathbf{X}\right)\right)\text{,}\\\:&\:{\sigma\:}_{{y}_{*}}^{2}={k}_{*}+{\sigma\:}_{n}^{2}-{\mathbf{k}}_{*}^{\text{T}}{\left(\mathbf{K}+{\sigma\:}_{n}^{2}\mathbf{I}\right)}^{-1}{\mathbf{k}}_{*}\text{,}\end{array}$$

Following the equations above, the estimate is computed from the training data X, y; in conventional regression methods, by contrast, the prediction is based solely on the fitted parameters.

Here, K represents the covariance matrix with elements Ki,j = cov(xi, xj), and k* is a vector40:

$$\:{\left[{\mathbf{k}}_{*}\right]}_{i}=cov\left({\mathbf{x}}_{i},{\mathbf{x}}_{*}\right)\quad\text{and}\quad{k}_{*}=cov\left({\mathbf{x}}_{*},{\mathbf{x}}_{*}\right)$$

The hyperparameters of the mean and covariance functions must be estimated from the dataset before reliable predictions can be made. Maximizing log p(y|X), the log likelihood of the training subset, is typically used to tune these hyperparameters41:

$$\:\text{l}\text{o}\text{g}p\left(\mathbf{y}|\mathbf{X}\right)=-\frac{1}{2}{\mathbf{y}}^{\text{T}}{\left(\mathbf{K}+{\sigma\:}_{n}^{2}\mathbf{I}\right)}^{-1}\mathbf{y}-\frac{1}{2}\text{l}\text{o}\text{g}\left(\left|\mathbf{K}+{\sigma\:}_{n}^{2}\mathbf{I}\right|\right)-\frac{n}{2}\text{l}\text{o}\text{g}\left(2\pi\:\right)$$

In this equation, n denotes the number of instances in the training subset.
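The GPR formulation above can be sketched with scikit-learn, where the kernel combines a signal-variance term, a squared-exponential covariance, and a white-noise term corresponding to σ_n², and the hyperparameters are tuned by maximizing the log marginal likelihood during `fit`. The training data here are synthetic stand-ins for the (T, P) → solubility dataset:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.default_rng(1)
# Synthetic stand-in for the (T, P) -> solubility data (NOT the EXE dataset)
X = rng.uniform([308, 12], [338, 27], size=(40, 2))           # T (K), P (MPa)
y = 0.1 * X[:, 1] - 0.01 * X[:, 0] + rng.normal(0, 0.01, 40)  # toy target

# signal variance * squared-exponential kernel + Gaussian noise (sigma_n^2);
# fit() maximizes log p(y|X) over the kernel hyperparameters
kernel = ConstantKernel(1.0) * RBF(length_scale=[10.0, 5.0]) + WhiteKernel(1e-4)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Predictive mean y_hat_* and standard deviation sigma_{y_*} from the equations above
mean, std = gpr.predict(X[:5], return_std=True)
print(mean.shape, std.shape)
```

The returned `mean` and `std` correspond directly to the predictive mean and variance equations given earlier.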

Nu-SVR

The Support Vector Machines (SVM) approach maps the input data vector into a higher-dimensional feature space in order to generate an optimal separating hyperplane. By treating the hyperplane as a tube around a curve, SVM has been effectively applied to regression and time-series prediction42,43. Consider the following input and output values as basic assumptions43:

$$\:\left[\left({x}_{1},{y}_{1}\right),\dots\:,\left({x}_{n},{y}_{n}\right)\right]$$

The aim of the Nu-SVR model is to find the nonlinear relation f(x), shown in the following equation, that is as close to y as possible while remaining as flat as possible43:

$$\text{f}(\text{x})=\text{w}^{\text{T}} \text{P}(\text{x})+\text{b}$$

To clarify, b stands for the bias, wT is the weight vector, and P(x) is a non-linear mapping that transforms the input space into a higher-dimensional feature space25. The primary focus of the task is to satisfy the two fundamental requirements of closeness and flatness; in fact, the main goal is to minimize44:

$$\frac{1}{2}{\left\| w \right\|^2} + C\left\{ \nu \varepsilon + \frac{1}{n}\sum\limits_{i = 1}^n {\left( {\xi_i + \xi_i^*} \right)} \right\}.$$

Under the following conditions44:

$$\:{y}_{i}-\left\langle {w}^{T}.P\left(x\right) \right\rangle -b\le\:\varepsilon\:+{\xi\:}_{i}^{*},$$
$$\:\left\langle {w}^{T}.P\left(x\right) \right\rangle +b-{y}_{i}\le\:\varepsilon\:+{\xi\:}_{i},$$
$$\:{\xi\:}_{i}^{*},{\xi\:}_{i}\ge\:0$$

Here, ɛ stands for the allowed deviation of f(x) from the experimental data, and ξi and ξi* are the extra slack variables25 declared in44.
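A minimal Nu-SVR sketch with scikit-learn follows; in this formulation ν bounds the fraction of support vectors and training errors, and the width of the ε-tube is derived from ν automatically. The data and the C and ν values are illustrative, not the study's settings:

```python
import numpy as np
from sklearn.svm import NuSVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
# Synthetic stand-in for the (T, P) -> solubility data (NOT the EXE dataset)
X = rng.uniform([308, 12], [338, 27], size=(40, 2))
y = 0.1 * X[:, 1] - 0.01 * X[:, 0] + rng.normal(0, 0.01, 40)

# RBF kernel implements the non-linear mapping P(x) implicitly; scaling the
# inputs first is standard practice for SVMs
model = make_pipeline(StandardScaler(), NuSVR(kernel="rbf", C=10.0, nu=0.5))
model.fit(X, y)
print(model.score(X, y))  # training R^2
```

The slack variables ξi, ξi* and the flatness term ||w||²/2 are handled internally by the solver.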

Lasso

The LASSO method promotes sparsity in coefficient estimates. By favoring solutions with fewer non-zero coefficients, it effectively reduces the number of features contributing to the model, enhancing its applicability in certain scenarios. LASSO and LASSO-based models are also a major component of compressed sensing, and under some conditions they can recover the exact set of non-zero coefficients45. The technique is employed to simplify the model and forestall over-fitting. The coefficients βj are chosen to minimize the residual sum of squares45:

$$\:\begin{array}{c}\sum\:_{i=1}^{n}{\left({\beta\:}_{0}+\sum\:_{k=1}^{K}{\beta\:}_{k}{x}_{k,i}-{y}_{i}\right)}^{2}\end{array}$$

In LASSO regression, a penalty weighted by λ is added to the residual sum of squares45:

$$\:\begin{array}{c}\sum\:_{i=1}^{n}{\left({\beta\:}_{0}+\sum\:_{k=1}^{K}{\beta\:}_{k}{x}_{k,i}-{y}_{i}\right)}^{2}+\lambda\:\sum\:_{k=1}^{K}\left|{\beta\:}_{k}\right|\:\:\end{array}$$
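The sparsity-inducing effect of the L1 penalty above can be seen in a short scikit-learn sketch (scikit-learn calls λ "alpha"); the data are synthetic and include one deliberately irrelevant feature that LASSO shrinks away:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
# Synthetic data: the third feature does not influence y at all
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.1, 60)

# alpha plays the role of lambda in the penalized objective above
lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # coefficient of the irrelevant feature is driven to (near) zero
```

This is exactly the behavior that makes LASSO useful for simplifying models and preventing over-fitting.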

Evaluation metrics

The predictive capability of the developed models was evaluated via four standard statistical metrics: the coefficient of determination (R²), mean absolute error (MAE), mean absolute percentage error (MAPE), and maximum error (Max Error). These indicators provide a quantitative assessment of the accuracy and reliability of the predicted solubility values relative to the experimental measurements.

The R² measures how well the predicted values approximate the actual observations and is given by46:

$$\:{R}^{2}=1-\frac{{\sum\:}_{i=1}^{n}{\left({y}_{i}-\widehat{{y}_{i}}\right)}^{2}}{{\sum\:}_{i=1}^{n}{\left({y}_{i}-\bar{y}\right)}^{2}}$$

MAE measures the average size of prediction errors, indicating the typical deviation of the predicted values from the observed data:

$$\:\text{MAE}=\frac{1}{n}{\sum\:}_{i=1}^{n}\left|{y}_{i}-\widehat{{y}_{i}}\right|$$

The MAPE expresses the average relative error as a percentage:

$$\:\text{MAPE}=\frac{100}{n}{\sum\:}_{i=1}^{n}\left|\frac{{y}_{i}-\widehat{{y}_{i}}}{{y}_{i}}\right|$$

The Max Error determines the largest deviation between experimental and predicted values:

$$\:\text{Max\:Error}=\text{max}\left|{y}_{i}-\widehat{{y}_{i}}\right|$$

In these expressions, \(\:{y}_{i}\) and \(\:\widehat{{y}_{i}}\) stand for the experimental and calculated solubility values, respectively; \(\:\bar{y}\) represents the mean of the observed values, and n corresponds to the total number of data points.
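All four metrics defined above are available in scikit-learn; the toy values below illustrate the computation (note that scikit-learn's MAPE returns a fraction, so it is multiplied by 100 to match the percentage form given above):

```python
import numpy as np
from sklearn.metrics import (r2_score, mean_absolute_error,
                             mean_absolute_percentage_error, max_error)

# Toy experimental vs. predicted values (not the EXE solubility data)
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.1, 7.9, 9.5])

r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)            # = 0.3 here
mape = 100 * mean_absolute_percentage_error(y_true, y_pred)
mx = max_error(y_true, y_pred)                       # = 0.5 here
print(round(r2, 3), round(mae, 3), round(mape, 2), mx)
```

In the study these functions would be applied to the experimental and model-predicted solubility values.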

Results and discussion

The introduced models were optimized using the CS algorithm, and their effective hyper-parameters were obtained for optimal implementation. The approaches were then analyzed and validated, and the results of multiple statistical metrics are displayed in Table 2.

The final optimized hyperparameters for all regression models were determined using the CS optimization algorithm to ensure the best predictive performance. For the GPR model, the optimized settings included a squared exponential kernel, kernel scale of 1.42, signal variance of 0.85, and noise level of 0.003. The Nu-SVR model achieved its optimal configuration with a radial basis function (RBF) kernel, penalty parameter C equal to 110, insensitivity parameter epsilon of 0.08, and nu value of 0.45. For the Lasso Regression model, the optimal regularization coefficient lambda was 0.007 with a tolerance value of 1 × 10⁻⁴. These optimized hyperparameter values were obtained based on minimizing the root mean square error and maximizing the coefficient of determination during the cross-validation process, confirming the reliability and generalization capability of the trained models.
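The reported CS-tuned settings can be mapped onto scikit-learn estimators as a sketch; note that this mapping is an assumption on our part (the paper does not state its software stack), and scikit-learn's `NuSVR` derives its ε-tube from ν, so the reported epsilon of 0.08 has no direct slot there:

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.svm import NuSVR
from sklearn.linear_model import Lasso

# Squared-exponential kernel with signal variance 0.85 and kernel scale 1.42;
# alpha carries the reported noise level 0.003
gpr = GaussianProcessRegressor(
    kernel=ConstantKernel(0.85) * RBF(length_scale=1.42),
    alpha=0.003,
)
# RBF kernel, C = 110, nu = 0.45 (epsilon is determined internally by nu)
nusvr = NuSVR(kernel="rbf", C=110, nu=0.45)
# lambda (alpha) = 0.007, tolerance = 1e-4
lasso = Lasso(alpha=0.007, tol=1e-4)
print(type(gpr).__name__, type(nusvr).__name__, type(lasso).__name__)
```

Each estimator would then be fitted on the training split and scored with the metrics of the previous section.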

Table 2 The outputs of final optimized approaches.

Table 2 shows that the GPR estimator is the most accurate model in our work. Figures 2, 3 and 4 provide a visual comparison of the experimental values and the values obtained from the approaches. The comparison of these three figures demonstrates that the GPR model is the most accurate, with the LASSO model ranked second. 3D diagrams of all three final estimators are displayed in Figs. 5, 6 and 7.

Fig. 2
figure 2

Comparing Observed and estimated output (GPR method).

Fig. 3
figure 3

Comparing observed and estimated output (Lasso method).

Fig. 4
figure 4

Comparing observed and estimated output (Nu-SVR method).

Fig. 5
figure 5

The 3D final decision surface (GPR MODEL).

Fig. 6
figure 6

The 3D final decision surface (LASSO MODEL).

Fig. 7
figure 7

The 3D final decision surface (Nu-SVR MODEL).

Based on the results above, we took the Gaussian process regression model as the principal model and used it to obtain the two-dimensional trends of the parameters depicted in Figs. 8 and 9. The influence of the two operating parameters (pressure and temperature) on the solubility of the Exemestane steroidal aromatase inhibitor anti-cancer drug is shown in Figs. 8 and 9, respectively. Increasing the pressure favors Exemestane solubility: a higher operating pressure dramatically raises the solvent's density and reduces the intermolecular spacing between CO2 molecules, which promotes Exemestane solubility. The influence of temperature is more complex because of this parameter's opposing effects on the solute's sublimation pressure, the solvent density, and the intermolecular interactions in the CO2-SCF system. The figures confirm that when the system pressure exceeds the cross-over point, increases in temperature significantly enhance the solubility of Exemestane: the positive contribution of the increased sublimation pressure outweighs the adverse effect of the reduced solvent density, so the solubility improves considerably. When the operating pressure lies below the cross-over value, the deteriorative contribution of the density reduction overcomes the favorable role of the increased sublimation pressure, and the solubility drops considerably47.
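Trend plots like those in Figs. 8 and 9 can be produced by sweeping one input of the fitted GPR while holding the other fixed. The sketch below uses synthetic data standing in for the EXE dataset, with a fixed temperature chosen arbitrarily at 318 K:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(4)
# Synthetic stand-in: 45 points where solubility rises with pressure
X = rng.uniform([308, 12], [338, 27], size=(45, 2))   # T (K), P (MPa)
y = 0.05 * X[:, 1] + rng.normal(0, 0.01, 45)

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=[10.0, 5.0]),
                               alpha=1e-3, normalize_y=True).fit(X, y)

# Sweep pressure over its range at a fixed temperature of 318 K
P_grid = np.linspace(12, 27, 50)
X_sweep = np.column_stack([np.full(50, 318.0), P_grid])
trend = gpr.predict(X_sweep)
print(trend[-1] > trend[0])  # solubility increases with pressure here
```

Plotting `trend` against `P_grid` (and the analogous sweep over T) yields the two-dimensional trend curves.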

Fig. 8
figure 8

Trends of parameter P.

Fig. 9
figure 9

Trends of parameter T.

To evaluate the generalization capability of the developed GPR model, the same modeling pipeline was applied to ten additional drug solubility datasets in supercritical CO2. As summarized in Table 3, the R² values ranged from 0.966 to 0.991, confirming the robustness and adaptability of the proposed model across diverse drug molecules.

Table 3 R2 performance of the GPR model for additional drug datasets.

Conclusion

Identifying state-of-the-art and breakthrough strategies to improve the bioavailability and solubility of orally administered anticancer agents remains a paramount concern of medical researchers. In this research study, the solubility of the Exemestane steroidal aromatase inhibitor versus operating temperature and pressure was modeled and optimized using machine learning approaches. The hyper-parameters of three separate models (Nu-SVR, GPR and LASSO) were adjusted with the cuckoo search algorithm (CS), which was employed to tackle the model selection process. R-square scores of 0.996, 0.793, and 0.983 were obtained for the GPR, Nu-SVR, and LASSO models, respectively, based on the evaluations carried out in this study. The GPR, Nu-SVR, and LASSO models exhibit MAE values of 0.904, 5.310, and 1.921, respectively. In light of these findings and the other evaluations, the Gaussian process model emerges as the most precise model within the scope of this research.