Introduction

Deep eutectic solvents (DESs) are mixtures of a hydrogen bond donor (HBD) and a hydrogen bond acceptor (HBA), which combine to form a solvent with a melting point lower than that of either individual component. Their advantageous properties, including low toxicity and low vapor pressure, have led to significant interest in DESs across a variety of fields1. These types of solvents are considered an alternative to traditional organic solvents. Similar to ionic liquid (IL), DES physicochemical properties can be tuned by the suitable combination of HBD–HBA compounds. It must be noted that ILs and DESs are different due to the nature of starting materials and the methods for their formation. When considering starting materials and formation methods, ILs are mixtures of cations and anions, requiring synthesis with reagents and solvents. In contrast, DESs consist of HBAs and HBDs, and they can be prepared using single components through a simple heat treatment2. DESs are widely used in pharmacology3, gas-capturing processes (especially CO2 capture)4, extraction operations5,6,7, and water treatment8. The low vapor pressure (almost negligible vapor pressure) of DESs is one of their main properties9. In industry, environmentally friendlier and greener solvent alternatives for the manufacturing of their products are required10. Bowen et al. have demonstrated the utility of DESs in protein extraction and purification10 highlighting their potential as a promising, alternative solvent for protein extraction from diverse raw biomass sources. Furthermore, recent research11 has focused on designing green, highly efficient, and recyclable DESs for separating EVA films from end-of-life (EOL) photovoltaic (PV) modules. The inherent high-temperature stability and acidic nature of DESs can effectively facilitate the separation of EVA film layers within these modules11. Jahanbakhsh-Bonab et al. used molecular dynamics (MD) simulations to examine the physicochemical and structural characteristics of novel DESs12. Research indicates that methyl-β-cyclodextrin (MBCD)-based DESs can provide predictive insights for their applications in extraction processes. Furthermore, studies have explored the structural and physicochemical properties of chiral DESs, composed of racemic mixtures of menthol with acetic acid, menthol with lauric acid, and menthol with pyruvic acid, specifically for enantioselective extraction processes13. Jahanbakhsh-Bonab et al. utilized the MD simulations to examine the structural and dynamical properties of DESs-based boron nitride nanotube (BNNT) nanofluid14. The impact of nanotube diameter on the physicochemical parameters of DES-based systems has been investigated. Results indicate that adding Boron Nitride Nanotubes (BNNTs) to DESs increases viscosity due to a reduction in the diffusion coefficient of the DES components. Understanding the structure of DES-based nanofluids is crucial for comprehending the properties of these novel solvents in chemical processes. Moreover, MD simulations have been employed to examine the effects of external electric fields (EEFs) on the structural and transport properties of DESs composed of a 2:1 molar ratio of glycerol (Gly) and choline chloride (ChCl)15. They calculated several key physicochemical properties of Gly/ChCl DESs in the absence of external electric fields (EEFs), including viscosity, self-diffusion coefficient, isothermal compressibility, and density. They also employed the radial distribution function (RDF), coordination number, and number of hydrogen bonds to analyze the arrangement of the DES components. Their findings indicated that the correlation of movement between glycerol (Gly) and chloride (Cl) ions decreases as the strength of the EEF increases. Furthermore, in recent years, MD simulations have been increasingly used to investigate the performance of DESs in separating acid gases from natural gas16. The performance of DESs was compared with that of methyl diethanolamine (MDEA) system. The results show that, the diffusion coefficient of H2S and CO2 follow the trends Nano-DES < DES-MDEA < DES < MDEA < aqueous MDEA, and DES < DES-MDEA < MDEA < Nano-DES < aqueous MDEA in the liquid phase. Also, the performance of an amine-based DES composed of a 6:1 molar ratio of methyl diethanolamine (MDEA) and choline chloride (ChCl) for the natural gas (NG) sweetening process was investigated using MD simulations17. The effect of pressure on the performance of amine-based DESs for natural gas sweetening has been examined. The results suggest that DESs based on N-methyldiethanolamine (MDEA) can compete with conventional amine solvents in natural gas sweetening processes. P. Jahanbakhsh-Bonab et al. simulated the absorption capability of DESs based on MDEA using MD simulations18. Also, the effect of temperature on absorption capability was studied. The thermophysical properties of monoethanolamine (MEA)-based DESs for H2S extraction from biogas were investigated19. The impact of pressure on the performance of amine-based DESs for the extraction of H2S and CO2 was studied. Esfahani et al. used choline chloride-based DESs for the extraction of 1-butanol or 2-butanol from azeotropic n-heptane + butanol mixtures. This shows a difference in application and the type of DES used7. In summary, the prediction of thermodynamic properties of DESs plays an important role in chemical processes. The first-order derivative thermodynamic properties such as density, and second-order derivative thermodynamic properties such as heat capacity and speed of sound of the system, must be estimated in the pre-design of a new chemical process. Predictive models are more cost-effective than experimentally measuring the properties of a large number of potential designs (DESs) for a chemical process. M. Taherzadeh et al. proposed a correlation-based model to correlate/predict the heat capacity of 28 DESs over the wide temperature range from 278 to 363 K20. The proposed correlation was developed based on molecular weight, temperature, acentric factor, and critical pressure. The absolute average relative deviation (AARD%) of the model for all of the studied data points was 4.7%. Leron and Li created a model specifically for estimating the heat capacity of DES21. Naser et al. developed a model for the specific heat capacity calculation of 15 DESs22. Zhang et al. proposed an empirical equation for the heat capacity estimation of two DESs23. H. Peyrovedin et al. proposed a correlation-based model aimed at estimating the speed of sound in 39 different deep eutectic solvents (DESs) across a broad temperature range. This model likely utilizes empirical correlations derived from experimental data to provide accurate predictions of sound speed in these solvents, which is essential for understanding their thermophysical properties and potential applications in various fields24. Their results indicated that the AARD% of the model is about 5.4%. Lapeña et al. correlated the speed of sound, heat capacity, density, isentropic compressibility, and viscosity of two DESs using a linear correlation25. Peng and Minceva utilized the perturbed chain polar statistical associating fluid theory (PCP-SAFT) model to predict the density and viscosity of DESs26. Lashkarbolooki et al. utilized the ANN to predict the heat capacity of binary ionic liquids mixtures27. They collected 1571 binary heat capacity data points for ILs. A neural network with one hidden layer containing 16 neurons successfully predicted IL binary heat capacities27. Thermodynamic models are a primary approach for predicting phase equilibrium and calculating second-order derivative thermodynamic properties

Theory and methodology

Data collection

As mentioned in the introduction section, the experimental data on the speed of sound are scarce.

In this study, 415 experimental data points of 38 DESs over a wide range of temperatures have been selected in the literature. The data points have been randomly divided into two sets containing 300 training and 115 testing data points. The most common train-test splits in the literature are 70:30 and 80:20, offering a good balance by providing enough data for both training and testing, and are often selected for their robustness and reliability in various contexts44. The training data points have been considered to develop the model, and the test data points have been utilized to check the model performance.

The input layer is critical in machine learning models. Predicting the speed of sound in DESs requires appropriate input features, likely drawn from various thermophysical properties. However, a challenge arises with newly introduced DESs, as their thermophysical properties are often unknown. This presents a problem for training accurate machine learning models to predict the speed of sound in these novel solvents. The group contribution methods can be utilized to estimate the thermo-physical properties of DESs such as critical pressure, critical temperature and critical volume, and acentric factor. In this work, the modified Lydersen and Joback–Reid GC method45,46 is used to estimate the thermo-physical properties of DESs. In Table 1 the groups parameters of the Lydersen and Joback–Reid method have been presented.

Table 1 Groups considered in the Joback–Reid method.

Valderrama et al. estimated the critical points, normal boiling temperature, and acentric factor of ILs using the Joback method45. As shown in Table 1, the ion parameters have been considered based on the Valderrama et al. method. The normal boiling temperature (Tb), critical temperature (Tc), critical pressure (Pc), critical volume (Vc), and acentric factor (ω) are estimated using Eqs. (1)–(5) as follows:

$$T_{b} \left( K \right) = 198 + \mathop \sum \limits_{k} N_{k} tb_{k}$$
(1)
$$T_{c} \left( K \right) = T_{b} \left[ {0.584 + 0.965\left\{ {\mathop \sum \limits_{k} N_{k} tc_{k} } \right\} - \left\{ {\mathop \sum \limits_{k} N_{k} tb_{k} } \right\}^{2} } \right]^{ - 1}$$
(2)
$$P_{c} \left( {bar} \right) = \left[ {0.113 + 0.0032N_{atoms} - \mathop \sum \limits_{k} N_{k} pc_{k} } \right]^{ - 2}$$
(3)
$$V_{c} \left( {\frac{{cm^{3} }}{mol}} \right) = 17.5 + \mathop \sum \limits_{k} N_{k} vc_{k}$$
(4)

where Natoms is the total number of atoms in the molecule. The acentric factors of ILs were estimated as follows46:

$$\omega = \frac{{\left( {T_{b} - 43} \right)\left( {T_{c} - 43} \right)}}{{\left( {T_{c} - T_{b} } \right)\left( {0.7T_{c} - 43} \right)}}\log \left( {\frac{{P_{c} }}{{P_{b} }}} \right) - \left( {\frac{{T_{c} - 43}}{{T_{c} - T_{b} }}} \right)\log \left( {\frac{{P_{c} }}{{P_{b} }}} \right) + \log \left( {\frac{{P_{c} }}{{P_{b} }}} \right) - 1$$
(5)

In Table 2 critical properties and acentric factor of DESs have been reported.

Table 2 The list of studied DESs and their thermo-physical properties that estimated by the modified Lydersen and Joback–Reid45,46, and references of speed of sound experiential data.

The GC method serves as an effective approach to estimate the thermophysical properties of DESs based on their molecular structures. This is particularly useful when experimental data is lacking, allowing researchers to generate necessary property estimates that can inform the modeling processes. In this work, the GC approach has been integrated with ANNs to predict the speed of sound in DESs.

Thoroughly analyzing and preparing input data is vital for building robust machine learning models. Each of these steps contributes to improving the quality of the input data, which in turn enhances the model’s ability to learn and generalize from the training data. The input data have been analyzed and reported in Table 358.

Table 3 Statistical description of the data set used for modeling.

In the next section, the ANN + GC method has been presented.

ANN methodology

The ANN can be likened to a black box with multiple inputs and outputs. The number of neurons can vary widely, ranging from fewer than ten to tens of thousands. These neurons can be organized into one, two, three, or more layers, with the two-layer network being the most commonly used ANN design for chemical applications. In the ANN methodology, inputs are fed into the input layer and then transmitted to the hidden layer using a specific transfer function. The transformed inputs are subsequently relayed to the output layer to estimate the desired properties. The estimated values are then compared to experimental data to analyze the error using the objective function (OF). Finally, the results are fed back into the system, and this process is repeated using a trial-and-error approach to minimize the objective function (OF) error. The number of neurons in the hidden layer is adjusted during these trials to achieve the best possible outputs with the lowest OF. This adjustment is guided by error analysis. Initially, one neuron is used to estimate the error with the training subset. Next, two neurons are evaluated, and this process continues, incrementing the number of neurons until the optimal configuration is found based on the minimum error value obtained from the testing subsets. Consequently, if the desired error is not achieved, the number of neurons in the hidden layer is increased. It must be noted that all input data are divided into two subsets namely training and testing. In Fig. 1 the schematic diagram of the neural network has been shown.

Fig. 1
Fig. 1
Full size image

The neural network diagram.

It must be noted that the weight parameters between neurons in hidden layers are utilized to develop the network. Summations of outputs of the previous layer in hidden layers must be weighted and added with bias. Hyperbolic tangent sigmoid transfer function is utilized in the hidden layer as follows:

$$n_{j} = \frac{2}{{1 + {\text{exp}}\left( { - 2Z} \right)}} - 1$$
(6)
$$Z = \mathop \sum \limits_{i = 1}^{r} w_{ij} p_{i} + b_{j}$$
(7)

where, \(n_{j}\) is the jth neuron output, \(w_{ij}\) refers to weights of ith neuron in the previous layer to the jth neuron, \(b_{j}\) is the bias, and \(p_{i}\) is output. In this work, the weight and Bias of output and hidden layer have been reported in Table 3. The speed of sound of new-designed DESs can be predicted using the weight, Bias, and input values (obtained by the GC approach). The results of the ANN + GC model have been described in section “ANN+GC method”.

Machine learning model

Machine learning is a subset of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to perform tasks without explicit instructions. Instead of being programmed with specific rules, machines learn from data and improve their performance over time. In this study the Categorical Boosting (CatBoost) machine learning approach has been used59,60. It excels at handling categorical features and often achieves state-of-the-art results in various machine learning tasks, particularly those involving tabular data. It’s designed to be easy to use, robust, and provides excellent accuracy. CatBoost directly handles categorical features without needing extensive preprocessing like one-hot encoding (though it can still benefit from good feature engineering)61. It uses a special method to encode categorical features called Target Statistics (also known as Ordered Target Encoding). This method reduces overfitting compared to naive target encoding. For each categorical feature, CatBoost calculates the average value of the target variable for each category. However, to prevent overfitting (a common problem with target encoding), it uses a more sophisticated approach to calculate these statistics. It avoids using the target value of the current row when calculating the target statistic for that row. This is done by calculating the target statistic based on the rows that came before the current row in the dataset’s order. Random permutations of the data are used to further reduce overfitting. CatBoost implements a variation of gradient boosting known as ordered boosting. This helps to address gradient bias that can occur in traditional gradient boosting algorithms, especially when dealing with categorical features. Gradient bias arises because the model is trained using the same data that was used to calculate the gradients, leading to an overestimation of the model’s performance. Ordered boosting aims to correct this bias. Like other gradient boosting methods, CatBoost iteratively builds an ensemble of decision trees. Each tree is trained to correct the errors made by the previous trees. The optimization process is driven by gradients calculated from a loss function. Within CatBoost, categorical boosting entails the utilization of categorical columns, incorporating permutation techniques such as one hot max size (OHMS) and target-based statistics. This method employs a greedy approach for each new split of the current tree, enabling CatBoost to unveil the exponential expansion of feature combinations61. The CatBoost method follows these steps:

  • Formation of a random subset of the records

  • Converting labels to integers

  • Transforming categorical features into numeric values, as outlined below:

    $${\text{Average}}\,{\text{Target}} = \frac{{{\text{Count in Class }} + {\text{ Prior}}}}{{{\text{Total Count }} + { }1}}$$
    (8)

In the equation provided, the parameter “Count in Class” aggregates the targets. Furthermore, each target is assigned a value of one and linked to specific categorical features, while the term “Total Count” tallies all previous instances62. In Fig. 2, schematic of the CatBoost tree construction has been depicted.

Fig. 2
Fig. 2
Full size image

Schematic of the CatBoost tree construction.

Statistical error analysis

Statistical error analysis involves examining the errors and uncertainties in statistical measurements and data analysis. This is a crucial process in many fields, including science, engineering, economics, and social sciences, as it helps in assessing the reliability and validity of conclusions drawn from data. In this work the model results have been evaluated using the average relative deviation percent (ARD%), standard deviation (SD), mean absolute error (MAE), root mean square deviation (RMSE), and R2 values as follows:

$${\text{ARD}}\left( \% \right) = \frac{100}{N}\mathop \sum \limits_{i = 1}^{N} \left| {\frac{{U_{i}^{exp} - U_{i}^{calc} }}{{U_{i}^{exp} }}} \right|$$
(9)
$${\text{RMSE}} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{N} (U_{i}^{exp} - U_{i}^{calc} )^{2} }}{N}}$$
(10)
$${\text{SD}} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{N} \frac{{(U_{i}^{exp} - U_{i}^{calc} )^{2} }}{{U_{i}^{exp} }}}}{N - 1}}$$
(11)
$${\text{MAE}} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left| {U_{i}^{exp} - U_{i}^{calc} } \right|$$
(12)
$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{N} \left( {U_{i}^{exp} - \overline{U}_{i}^{calc} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{N} (U_{i}^{exp} - \overline{U}^{exp} )^{2} }}$$
(13)

where \(U_{i}^{exp}\) and \(U_{i}^{calc}\) are experimental and estimated speed of sound of DESs. \(\overline{U}^{exp}\) is the average value of the experimental speed of sound. In the next section the results of ANN + GC and ML + GC approaches have been presented.

Results and discussion

In the previous section the ANN and ML methodologies for the prediction of the speed of sound of DESs has been described and depicted in Figs. 1 and 2. In this section the ANN + GC and ML + GC approaches have been described separately.

ANN + GC method

The number of independent variables (input layer) plays a crucial role in ANN and ML methods63. In Table 1, thermodynamic properties of DESs containing Tc, Vc, MW, and ω have been reported. In the case of ANN method, different input properties and different numbers of neurons in one and two hidden layers have been considered, because one hidden layer only does not lead to adequate results64,65,66,67. The number of the hidden layer and neuron in the hidden layer and output are obtained using a trial and error algorithm. In this work the Levenberg–Marquardt algorithm68,69 is used to optimize the aforementioned parameters. The results show that, one hidden layer containing 16 neurons and four input properties containing the critical volume (Vc), molecular weight (Mw), temperature, and acentric factor (ω) are the optimum values. As described in section “Theory and methodology”, 300 training and 115 testing data points of the speed of sound have been considered.

As mentioned in Eq. (7), the weight parameters between neurons in hidden layers are essential to develop the network. In Table 4 the weight parameters of neurons have been reported.

Table 4 The weight of hidden and output layers for 16 neurons.

The optimum values of neurons in the hidden layer are evaluated using the average relative deviation percent (ARD%). When the optimum network architecture was determined, the input data of the ten DESs were fed to the network to predict their speed of sound. In Fig. 3 the flowchart of the proposed ANN model has been depicted.

Fig. 3
Fig. 3
Full size image

Flowchart of proposed model.

As shown in Fig. 3, the model can predict the speed of sound of DESs using independent variables containing T, Vc, ω, and MW. The inputs containing Vc, and ω can be estimated using GC approaches45,46. Experimental literature data is used as the training dataset for sound speed. An ANN with one hidden layer containing sixteen neurons is employed to develop the model. Four input variables can be fed into the saved file to generate predicted sound speed. Using the “saved network” and these four inputs, the sound speed of DESs can be accurately predicted. The complete MATLAB code, including all source files used in the programming, is provided in the Supplementary Material. The correlated and predicted results of the ANN + GC approach have been shown in Fig. 4.

Fig. 4
Fig. 4
Full size image

The results for the correlated (a) and predicted (b) speed of sound using the ANN + GC approach.

Figure 4 shows that the ANN + GC approach can correlate the speed of sound of DESs over a wide range of temperatures, satisfactory. The ARD% and R2 of the correlated speed of sound have been obtained 0.032% and 0.9988, respectively. Figure 4b shows the prediction of the speed of sound of ten DESs using the ANN + GC approach. The results are in good agreement with experimental data. Figure 5 shows a simultaneous comparison between the experimental data and ANN + GC data.

Fig. 5
Fig. 5
Full size image

The speed of sound of DESs obtained using the ANN + GC. () Experimental data and (∆) ANN + GC.

As shown in Fig. 5, the ANN + GC method can correlate the experimental data accurately. Distributions of the deviation points for the ANN + GC method are shown in Fig. 6.

Fig. 6
Fig. 6
Full size image

Deviations between calculations from ANN + GC and experimental speed of sound data at different temperatures.

As shown in Fig. 6, the deviations between the ANN + GC predictions and experimental data do not exceed 4 m/s. Error analysis indicates that the proposed network is suitable for engineering calculations. In this study, the predictive performance of the ANN + GC model was assessed using R2, ARD%, SD, MAE, and RMSE metrics; see Table 5.

Table 5 Statistical error analysis for the ANN + GC model.

As shown in Table 5, the total ARD%, MAE, SD, RMSE, and R2 values have been obtained 0.032%, 1.5656, 0.0549, 2.227, and 0.9988, respectively. The results of the ANN + GC approach show good agreement with experimental data. Models with high R2 values nearing 1 and low values for ARD%, RMSE, MAE, and SD are considered more accurate in predicting the speed of sound. In this study, the ARD% for the training and testing phases of the ANN + GC model were 0.024% and 0.053%, respectively. The overall R2 value approached unity at 0.9988. These results indicate that the ANN + GC model can accurately correlate the speed of sound in DESs across a wide temperature range. In the next section, the ML + GC model has been studied.

ML + GC method

Similar to the ANN + GC method, the inputs for the ML + GC model included Vc, ω, Mw, and T. Additionally, 300 training and 115 testing data points of the speed of sound were used to develop the machine learning approach. The statistical metrics for the CatBoost model are summarized in Table 6, which presents evaluations on the training and testing subsets (300 and 115 data points, respectively), as well as the complete dataset consisting of 415 points. In this study, the predictive performance of the models was assessed using R2, ARD%, SD, MAE, and RMSE metrics. Comparing model predictions with experimental data across both training and testing datasets provides valuable insights into the models’ accuracy and generalization capability; see Table 6.

Table 6 Statistical error analysis for the ML + GC model using CatBoot approach.

The greater the alignment between the predicted value and the experimental data, the higher the accuracy of the predictive model. In Figs. 7 and 8, the error distribution plot of the presented model vs the predicted speed of sound has been depicted. This visual representation demonstrates the robust agreement between the experimental data and the forecasts produced by the CatBoost ML method.

Fig. 7
Fig. 7
Full size image

Error distribution plot of the ML + GC model to predict speed of sound.

Fig. 8
Fig. 8
Full size image

Cross-plot of the ML + GC model to predict speed of sound.

Figures 7 and 8 illustrate a strong correlation between the model-predicted data and the experimental data across both the training and testing datasets. These figures demonstrate a very close alignment between the model predictions and experimental points. In this research, graphical analysis complemented statistical methods to provide a more comprehensive evaluation of the models’ performance. These visual representations played a vital role in assessing the accuracy and reliability of the models. The percentage distribution of the relative error against the experimental values is presented in Fig. 7. In this type of error evaluation, relative error values are plotted against experimental output values. The closer the data points are to the zero-error line, the model is the more accurate. When the data points are scattered around the zero line, it indicates a significant difference between the predicted values and the experimental data, which proves the high error of the model. As a result, the proximity of the data points to the zero line for the ML + GC model indicates the high accuracy of this model. In Fig. 8, the cross-plot has been depicted. The cross-plot visually represents the comparison between predicted and experimental values. A closer alignment of data points with the unit slope line (Y = X) in the cross plot signifies higher accuracy and effectiveness of the model. The ML + GC model shows significant performance with most of the data points lying around the Y = X line. In the next section, a comparison between ANN + GC, ML + GC, and the correlation-based models has been investigated.

Comparison between ANN + GC, ML + GC, and the correlation-based models

The ANN + GC and ML + GC results have been compared to five correlation-based models24,70,71,72,73. Singh and Singh proposed a correlation for speed of sound based on the surface tension and density70. Hekayati and Esmaeilzadeh suggested a novel interrelationship between surface tension (σ), density (ρ), and speed of sound (u) of ILs71. Gardas and Coutinho proposed a relationship between surface tension (σ), density (ρ), and speed of sound (u) for imidazolium based ILs, covering wide ranges of temperature, 278.15–343.15 K73. The aforementioned models are correlation-based. In Table 7 the ARD% of five correlation-based, ML + GC, and ANN + GC models have been reported and compared.

Table 7 ARD% values of ANN + GC, ML + GC, and five correlation-based models.

The average ARD% value of Peyrovedin et al.24 model was obtained 5.67%. ARD% values of Haghbakhsh et al.’s model72, Hekayati and Esmaeilzadeh’s model71, and Gardas and Coutinho’s model73 for 38 DESs have been obtained 9.52%, 9.38%, and 9.45%, respectively. The average ARD% value of Singh and Singh’s model70 was obtained about 39%. They correlated the speed of sound of ILs using surface tension and density data. As shown in Table 7, the Peyrovedin et al.24 model gives a lower ARD% value compared to other correlation-based models. The ANN + GC and the ML + GC models give lower error values compared to correlation-based models. The ARD% of the ANN + GC model is slightly lower than the ML + GC model. In Fig. 9 the speed of sound of some DESs using the ANN + GC approach have been compared to experimental data.

Fig. 9
Fig. 9
Full size image

Prediction of speed of sound of DESs using ANN + GC approach. Lines are model prediction and symbols are experimental data. Standard uncertainty of DES speed of sound is 1.0 m/s.

As shown in Fig. 9, the ANN + GC correlates the speed of sound of DESs satisfactory. The average ARD% of the ANN + GC model was obtained 0.032%. In Fig. 10, the ANN + GC model results have been compared to experimental data and H. Peyrovedin et al. model.

Fig. 10
Fig. 10
Full size image

Correlation of speed of sound of DESs using ANN + GC approach (lines) and Peyrovedin et al. model (dashed-lines). Symbols are experimental data. Standard uncertainty of DES speed of sound is 1.0 m/s.

As depicted in Fig. 10, the ANN + GC approach correlates the speed of sound of four DESs at various temperatures accurately. In the case of DES1, the ARD value of H. Peyrovedin et al. model is higher than ANN + GC, nevertheless, their obtained results are acceptable. Figure 10 shows that, their proposed correlation is accurate at lower temperatures, and the model deviations are increased by increasing temperature. As reported in Table 7, and Figs. 9 and 10, the average ARD% value of the testing and training results of the ANN + GC are acceptable. In Fig. 11, the ANN + GC model has been compared to the ML + GC and five correlation-based models.

Fig. 11
Fig. 11
Full size image

Comparison of the behavior of the speed of sound of DES 4 versus the temperature for the ANN + GC model, ML + GC model, and literature models. (-) ANN + GC, (--) ML + GC, (- - -) H. Peyrovedin et al.24, (-..-) Haghbakhsh et al.’s model72, (…) Hekayati and Esmaeilzadeh’s model 71, (-.-) Gardas and Coutinho’s model73, and (=.=) Singh and Singh’s model70. Symbols are experimental data.

As shown in Fig. 11, the average ARD% values of the ANN + GC approach are lower than the correlation-based models. The average error values of ANN + GC and ML + GC models are comparable. In Fig. 12, the error distribution plot for ten DESs has been depicted.

Fig. 12
Fig. 12
Full size image

Error distribution plot for ten DESs. Corr_1, Corr_2, Corr_3, Corr_4, and Corr_5 refer to H. Peyrovedin et al.24, Haghbakhsh et al.’s model72, Hekayati and Esmaeilzadeh’s model71, Gardas and Coutinho’s model73, and Singh and Singh’s model70.

Cumulative frequency diagrams are one of the graphical methods used for evaluating model performance74. Figure 13a and b illustrate the cumulative frequency diagrams of the ANN + GC and ML + GC models, along with five correlations (as reported in Table 7).

Fig. 13
Fig. 13
Full size image

Cumulative frequency plot for all studied DESs. (a) ANN + GC and ML + GC methods, (b) Corr_1, Corr_2, Corr_3, Corr_4, and Corr_5 refer to H. Peyrovedin et al.24, Haghbakhsh et al.’s model72, Hekayati and Esmaeilzadeh’s model71, Gardas and Coutinho’s model73, and Singh and Singh’s model70, respectively.

As shown in Fig. 13a, approximately 90% of the values estimated by the ANN + GC model exhibited an ARD% of less than 0.07%. In the case of ML + GC model, 90% of the ARD% values are less than 0.1%. In Fig. 13b, the cumulative frequency of five correlation-based models has been depicted. The correlation developed by Singh and Singh’s70 demonstrated poor performance. The results show that, the ANN + GC model achieves high precision in forecasting speed of sound of DESs compared to the five correlation-based model.

The leverage approach for model analysis

The leverage approach is a valuable tool for ensuring the quality and reliability of statistical models. Identifying and addressing high-leverage points, can improve model accuracy, enhance data understanding, and lead to more informed decision-making75. Leverage values help identify observations that have a disproportionate influence on the regression coefficients. Points with high leverage and large residuals are particularly problematic, as they can significantly distort the model fit. Leverage diagnostics are used during model validation to assess the stability and generalizability of the model. If the model is highly sensitive to a few high-leverage points, it may not perform well on new data. High-leverage points often indicate data errors or unusual events. Identifying these points allows for a targeted investigation of the data to identify and correct errors or to understand the underlying causes of the unusual observations. High leverage points can sometimes indicate the need to include additional predictor variables in the model. In some cases, transforming the predictor or response variables can reduce the influence of high-leverage points and improve the model fit. As well, investigating high-leverage points can provide valuable insights into the data and the underlying processes that generated it. In this study the leverage approach has been utilized to study the ANN + GC model. In this regard, standardized residuals (SR) and Leverage values, derived from the diagonal elements of the hat matrix have been calculated. The hat matrix was given by:

$$H = X\left( {X^{t} X} \right)^{ - 1} X^{t}$$
(14)

where \(X^{t}\) refers to the transpose of matrix X. The critical leverage was calculated as 3(n + 1)/m. where m and n represent the number of data points and model input variables, respectively. The applicability domain of the ANN + GC model can be assessed by plotting standardized residuals against leverage values (Williams plot). The Williams plot is the most common and direct way to do this. The applicability domain (AD) of a model is the region where the model is considered reliable for making predictions. In simpler terms, it’s the set of conditions under which you can trust the model’s output. Extrapolating beyond the AD can lead to inaccurate or unreliable predictions.

By plotting standardized residuals against leverage values against each other, the Williams plot allows you to identify observations that:

  • Are outliers (large standardized residuals)

  • Have high leverage (unusual predictor values)

  • Are both outliers and have high leverage (potentially very influential and problematic)

If the majority of data points were situated within the boundaries of the 0 ≤ H ≤ critical leverage, and—3 ≤ SR ≤ 3, the established model is deemed reliable, and its predictions are confined within the applicability domain75. In Fig. 14, the William’s plot is illustrated.

Fig. 14
Fig. 14
Full size image

The Williams plots for outlier detection using the ANN + GC model.

As shown in Fig. 14, the critical leverage value has been obtained about 0.0545. As depicted in Fig. 14, most of the data point falls between 0 ≤ H ≤ 0.0545, and—3 ≤ SR ≤ 3. The results indicated that, the ANN + GC model is highly reliable. There are some suspicious data (SR > 3 or SR <—3). Figure 14 shows that, only five data points have an SR-value outside the range of—3 to 3, classifying them as questionable data. On the other hand, all data points have H values lower than 0.0545. This result indicated that all data points have satisfactory leverage. The Leverage approach confirms the accuracy of databank and the high reliability of ANN + GC model in estimating speed of sound of DESs.

In the next section, the sensitivity analysis (SA) of input variables in the ANN + GC model has been studied.

Sensitivity analysis

Sensitivity analysis in ANNs involves determining how much each input variable influences the network’s output. It helps you understand which inputs are most important and how changes in those inputs affect the model’s predictions. Sensitivity analysis using weight-based methods involves evaluating the influence of input variables on the output by analyzing the weights within the network. These methods are generally more straightforward and computationally less expensive than perturbation-based methods. Garson suggested an equation based on partitioning of connection weights for sensitivity analysis of input variables as follows76:

$$IF_{j} = \frac{{\mathop \sum \nolimits_{m = 1}^{Nh} \left( {\left( {\frac{{\left| {w_{jm}^{ih} } \right|}}{{\mathop \sum \nolimits_{k = 1}^{Ni} \left| {w_{km}^{ih} } \right|}}} \right).w_{mn}^{ho} } \right)}}{{\mathop \sum \nolimits_{k = 1}^{Nh} \left\{ {\mathop \sum \nolimits_{m = 1}^{Nh} \left( {\left( {\frac{{\left| {w_{km}^{ih} } \right|}}{{\mathop \sum \nolimits_{k = 1}^{Ni} \left| {w_{km}^{ih} } \right|}}} \right).w_{mn}^{ho} } \right)} \right\}}}$$
(15)

where IFj is the relative importance of the jth input variable on output variable; Ni and Nh refer to the number of input and hidden neurons, respectively. The superscripts i, h and o refer to input, hidden and output layers, respectively. The subscripts k, m and n refer to input, hidden and output layers, respectively. w is connection weights. The relative importance of input variables (IFj) were calculated by Eq. (15). This approach expands on Garson’s method by considering the direct and indirect paths from inputs to outputs. It involves calculating the influence of each input across the network layers into the final output. In Fig. 15 the importance of input variables based on normalized percentage has been depicted.

Fig. 15
Fig. 15
Full size image

Relative importance (%) of input variables on the value of the speed of sound of DESs.

It is evident that all selected input variables have a strong influence on the speed of sound values, with importance levels ranging from 21 to 29%. However, it is important to note that highly nonlinear models or coupled input variables can complicate sensitivity analysis. These results highlight which inputs have the most significant impact on the output, aiding in model refinement, feature selection, or providing insight into the underlying process. As shown in Fig. 15, the contributions are typically normalized to sum to 1 (or 100%) to facilitate easier interpretation of the results.

In summary, ANN methods have several key advantages and disadvantages77. They can model complex nonlinear relationships by selecting an appropriate architecture through trial and error. Once the input layer, the number of neurons, and hidden layers are established, ANNs can predict values beyond those considered during training without reprogramming. However, acquiring large datasets is often challenging and time-consuming. Additionally, the complexity of ANNs can make their implementation difficult. Another drawback is that ANNs require robust central processing units (CPUs) or hardware, which can be resource-intensive. This study demonstrates the strong performance of ANN models in predicting second-order derivative thermodynamic properties, such as the speed of sound in DESs, despite the aforementioned limitations. Traditionally, equations of state (EoS) models have been widely used to estimate the thermo-physical properties of complex systems like ILs and DESs. However, predicting the speed of sound using EoS-based models requires the ideal gas heat capacity of the pure components. Estimating this property using GC models often results in significant deviations in some cases. Consequently, researchers are seeking alternative approaches to predict the speed of sound and specific heat capacity without relying on ideal gas heat capacity estimations. This work shows that the ANN + GC method can be considered a robust and efficient alternative, particularly for predicting second-order derivative thermodynamic properties, such as the speed of sound.

Conclusions

In this work, the Group Contribution (GC) approach was employed to estimate the input variables for the ANN model. Critical properties and the acentric factor were determined based on the molecular structure of DESs, utilizing the Lydersen and Joback–Reid GC methods. The combined ANN + GC model was developed to predict the speed of sound in DESs across a wide temperature range. For model development, 415 data points from 38 DESs were selected. The results indicate that a single hidden layer with sixteen neurons provides optimal values for ARD% and R2. The model’s performance was evaluated using several metrics: average relative deviation percent (ARD%), standard deviation (SD), mean absolute error (MAE), root mean square deviation (RMSE), and R2. The findings demonstrate that the ANN + GC model can accurately estimate the speed of sound in DESs over a wide temperature range. The obtained values for the metrics were ARD% = 0.032%, SD = 0.0549, MAE = 1.5656, RMSE = 2.227, and R2 = 0.9988. The model results were compared to five correlation-based models and a Machine Learning (ML) model. Similar to the ANN + GC approach, the GC method was combined with the ML model. The results indicated that the ML + GC and ANN + GC models were comparable; however, the ANN + GC model showed slightly lower error values. Overall, the error metrics for the ANN + GC approach were lower than those of the five correlation-based models. The cumulative frequency diagrams and the leverage approach were implemented to validate the quality and reliability of the proposed model. The leverage analysis confirmed the accuracy of the data used and the high reliability of the ANN + GC model for estimating the speed of sound in DESs. The primary goal of this study was to predict the speed of sound in DESs based solely on their molecular structure, without relying on any experimental data. This work demonstrates that the ANN + GC method can serve as a robust model for predicting second-order derivative thermodynamic properties of DESs in the absence of experimental measurements. The proposed model could be employed in future studies, particularly in the pre-design of new solvents.