Introduction

Encapsulating small gas molecules within an aqueous solution results in the formation of solid compounds known as clathrate hydrates. In such compounds, gaseous molecules, e.g., CO2, N2, CH4 and H2, act as guest components, while the water molecules host them1,2,3,4,5,6,7. These solid compounds appear at low temperatures and high pressures. Under such conditions, water molecules form 3D cage-like networks that readily trap gaseous components8,9. Several factors, such as the interactions between the trapped gas and the water, the presence of inhibitive materials (e.g., salts) in the water, and the physical size of the gas molecules, can affect the stability of clathrate hydrates. Clathrate hydrates can form in various configurations, including sH, sI, and sII, according to the sizes of the constituent molecules10. The sH structure features three types of cages for capturing gas molecules, while the sI and sII structures each have two types of cages of different sizes. If the gas molecules are too large to fit inside these cages, hydrate formation cannot occur11,12.

Clathrate hydrates pose considerable challenges across diverse industries. Accordingly, several strategies have been implemented to mitigate their harmful effects13. In particular, they can obstruct gas pipelines, thereby compromising the safety and integrity of industrial processes14. However, in recent decades, these compounds have attracted considerable interest owing to their exceptional features15,16,17,18,19,20. For instance, their capability to encapsulate methane, with an approximate capacity of 170 times their own volume21, positions them as a sustainable energy resource, which is often observed in permafrost regions22,23,24,25. Furthermore, they perform well in gas purification, enabling the selective separation of diverse gaseous components26,27,28,29,30,31,32,33,34. Moreover, clathrate hydrate formation incorporates only water molecules and excludes dissolved salts. Hence, this process can be considered a promising method for water desalination27,35. Additionally, recent studies have indicated that hydrate-based suspensions show high potential as working fluids in refrigeration systems36,37.

A thorough understanding of the clathrate hydrate equilibrium in saline water solutions is of special importance for various reasons38,39,40,41,42,43. The injection of saline water, as a practical matter, is a well-known industrial strategy to avoid hydrate formation within pipelines44,45,46. In addition, it facilitates efficient exploration in permafrost regions, where drilling operations can be particularly challenging. Furthermore, the recent efforts on natural gas exploration are focused on subsalt formations, environments exhibiting elevated salinity and pressure21,47,48. Consequently, extensive theoretical and experimental evaluations have been undertaken to determine the specific pressure and temperature conditions under which clathrate hydrates will be formed49,50,51.

Various predictive frameworks for hydrate formation equilibrium have been developed based on thermodynamic models. Among statistical thermodynamic approaches, the van der Waals-Platteeuw (vdW-P) model is widely used for this purpose. Although this model is capable of accurately estimating the hydrate formation condition and giving valuable insight into the equilibrium, it requires a complicated process to determine the Langmuir adsorption constants52,53,54,55,56. Additionally, previous studies have reported errors for this model at high pressures and high salt concentrations57,58,59,60,61. A comprehensive review of the limitations of the vdW-P model has been presented by Medeiros et al.55. Well-known equations of state are also utilized for describing the hydrate equilibrium condition. However, these methods are also associated with some complexities, especially in the calculation of intermolecular interaction factors62,63. While empirical correlations have shown some success in modeling clathrate hydrate equilibrium, they should not be considered general modeling approaches because their application is limited to particular operating conditions.

The predictive capability and robustness of machine learning algorithms have established them as widely used approaches for predicting the conditions under which clathrate hydrates form64,65,66,67,68. In a relevant study, Eslamimanesh et al.69 utilized the support vector machine (SVM) to model the equilibrium conditions for hydrate formation across different gas blends. This approach exhibited strong performance, achieving relative deviations often below 10%. Similarly, Baghban et al.70 applied the SVM technique to estimate the temperature at which natural gas hydrates form. Their computational model, built and validated using 710 experimental data points collected from the Katz diagram71, demonstrated a high degree of accuracy, with an average error of just 0.14% during the training step. A comparative analysis of various intelligent techniques for describing the methane hydrate equilibrium in four saline water solutions, i.e., CaCl2, MgCl2, KCl and NaCl, was undertaken by Xu et al.21. The most precise outcomes were produced by the model based on the gradient boosting regression (GBR) method. By employing genetic programming (GP), Amar et al.72 derived a mathematical equation to calculate the hydrate formation equilibrium temperatures in natural gas blends. This model yielded an average deviation of 0.14% for all 279 analyzed data points. The potential of the extremely randomized trees (ERT) algorithm in modeling the equilibrium of clathrate hydrates in both saline water and alcohol-based solutions was studied by Yarveicy and Ghiasi73. Their results indicated a high level of agreement between predicted and actual data, with a total R2 value greater than 96%. Hosseini and Leonenko56 designed intelligent models that allowed the accurate determination of methane hydrate equilibrium in saline waters. However, they defined 13 independent features, including pressure and the concentrations of cationic and anionic species, in the models. Although this concept enhanced the models’ predictive accuracy, it also increased the complexity of the designed models. Additionally, the models are only applicable to saline water solutions containing the ionic compounds analyzed in their study.

The growing availability of methane in recent years has led to an increased interest in gas hydrate-based technologies. Hence, a vast body of research has focused on the application of this methodology for methane storage. The optimal design of the relevant processes necessitates the development of reliable predictive tools for methane hydrate phase equilibrium. However, research on the application of intelligent approaches in this field is underdeveloped, and the existing models suffer from several limitations in their applicability range. In addition, the identification of operational parameters that have significant impacts on methane hydrate equilibrium is crucial for practical applications. This study seeks to address the aforementioned limitations. To this end, 1051 experimentally derived data points were assembled from various studies. Specifically, the hydrate formation temperature of methane (HFTM) in 26 distinct saline water solutions is analyzed across a broad range of conditions. Two advanced computational methods, namely SVM and decision tree (DT), are employed to link the HFTM with pertinent influencing factors. A suite of rigorous assessments is conducted based on visual representations and statistical indices to demonstrate the robustness and validity of the established models. Additionally, the outputs of the models are used to study the impact of operational variables on HFTM, and a sensitivity analysis is performed to elucidate the factors with the most significant influence.

Methodology

Experimental data gathering

In this research, a comprehensive review of existing studies in the field was performed to prepare an extensive dataset concerning the equilibrium temperatures of methane hydrates in saline water solutions. This step is vital for the development of reliable and broadly applicable models using data-driven techniques. Table 1 summarizes the relevant experimental studies available in the literature. To construct an integrated database, all reported measurements from these studies were assembled. Large databanks collected from various sources are susceptible to inaccurate data for reasons such as instrument malfunctions and errors made during data recording. The creation of robust and widely applicable predictive tools necessitates precise datasets. Consequently, a thorough evaluation was performed to identify and address any inconsistency within the collected data. An HFTM value of 72.22 K, reported by Dholabhai et al.74, appeared anomalous. Given that the authors declared an experimental HFTM range of 264 to 284 K, this value was deemed suspect and subsequently excluded from the analyzed data. Furthermore, duplicate data can negatively impact the learning process of intelligent models. Therefore, the dataset was analyzed to detect such duplicates. The observed similarity between the duplicate values suggested that these discrepancies arose from variations in measurement methodologies or repeated measurement attempts. For the purposes of this study, only a single instance from each set of duplicate data points was incorporated into the final databank. Overall, 1051 experimental observations from 23 sources are analyzed. These data points include the equilibrium temperatures of methane hydrate in 26 diverse saline water solutions, across a broad range of pressures and salinities.

Table 1 Details of the analyzed sources for HFTM in saline water solutions.

Intelligent methods

DT

Decision tree (DT), as a class of nonparametric supervised learning algorithms, offers a versatile framework for addressing both regression and classification problems93,94,95,96,97. A regression tree is characterized by its hierarchical organization, consisting of nodes interconnected by branches, as shown in Fig. 1. The root node, situated at the apex of the tree, encompasses the entirety of the data space. Internal nodes, distinguished by a single incoming branch and multiple outgoing edges, represent decision points predicated on specific attributes98,99. Terminal nodes, also known as leaves, mark the final outcome of the decision process.

The core process of constructing a regression tree involves three crucial steps: splitting, stopping, and pruning. Splitting entails partitioning the dataset into distinct subsets, guided by the identification of the most relevant attribute100,101. This selection process often relies on metrics such as classification error, Gini index, information gain, and gain ratio, as elucidated by Patel and Upadhyay102. To prevent overfitting and maintain model generalizability, stopping criteria are judiciously employed. These criteria regulate the complexity of the tree by imposing constraints on the minimum number of data points within a node or leaf before splitting, and by limiting the depth of the tree103,104. Without these constraints, the tree could potentially become overly complex, resulting in perfect classification of the training data but poor performance on unseen data. Pruning, a further technique for mitigating overfitting, is implemented when stopping criteria alone prove insufficient105. This process entails the generation of a complete decision tree, followed by the selective removal of nodes that offer minimal information gain or contain inadequate validation data106. This pruning step results in a more parsimonious and robust model with enhanced generalization capabilities.
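The splitting and stopping steps described above can be illustrated with a minimal sketch. It assumes a single input feature and uses the reduction in the sum of squared errors as the split criterion, which is a common choice for regression trees; the source does not state which criterion or stopping limits were actually used. The function names and the toy pressure-temperature values are hypothetical and serve only as an illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_samples_leaf=2):
    """Return (threshold, sse_after_split) for the best split on one feature.

    Candidate thresholds are midpoints between consecutive distinct x values.
    A split that would leave fewer than `min_samples_leaf` points on either
    side is rejected, which acts as a stopping criterion. Returns None when
    no admissible split exists.
    """
    pairs = sorted(zip(x, y))
    best = None
    for k in range(min_samples_leaf, len(pairs) - min_samples_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical feature values
        left = [p[1] for p in pairs[:k]]
        right = [p[1] for p in pairs[k:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            threshold = 0.5 * (pairs[k - 1][0] + pairs[k][0])
            best = (threshold, total)
    return best


# Hypothetical toy data (not taken from the databank of this study).
pressure = [3.0, 4.0, 5.0, 9.0, 10.0, 11.0]              # MPa
temperature = [274.0, 275.0, 276.0, 284.0, 285.0, 286.0]  # K
split = best_split(pressure, temperature, min_samples_leaf=2)
```

Applying `best_split` recursively to each resulting subset, until it returns `None` or a depth limit is reached, grows the tree; raising `min_samples_leaf` yields a shallower and less flexible model.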

Fig. 1 A schematic illustration of the DT algorithm.

SVM

SVM is a machine learning tool that is broadly utilized in regression and classification problems107,108. This approach has been developed based on the structural risk minimization (SRM) concept, which reduces the risk in the learning process and improves the prediction capability of the model109,110,111,112,113. The principal idea of SVM is to map the input samples into a high-dimensional space and then approximate the training data by a linear regression function in that space. The training data are defined as \(Z=\left\{\left({x}_{i},{y}_{i}\right)\,|\,i=1,2,\dots,n\right\}\), where \({x}_{i}\) denotes an m-dimensional vector that represents the values of the input variables, \({y}_{i}\) stands for the corresponding value of the output function, and \(n\) is the number of training samples. Hence, the following regression function is established by the SVM method114,

$$\:y=b+{W}^{T}\theta\:\left(x\right)\:$$
(1)

where \(b\) and \(W\) denote the bias term and the weight vector, respectively. Also, \(\theta\left(x\right)\) is a function that nonlinearly maps \(x\) into a high-dimensional space. The optimal values of \(W\) and \(b\) are determined through the minimization of the following function,

$$H=\frac{1}{2}{\left\|W\right\|}^{2}+C\sum_{i=1}^{n}\left({\zeta}_{i}+{\zeta}_{i}^{*}\right)$$
(2)

with the constraints defined as115,

$${y}_{i}-\left\{b+{W}^{T}\theta\left({x}_{i}\right)\right\}\le{\zeta}_{i}+\psi,\qquad i=1,2,\dots,n$$
(3)
$$\left\{b+{W}^{T}\theta\left({x}_{i}\right)\right\}-{y}_{i}\le{\zeta}_{i}^{*}+\psi,\qquad i=1,2,\dots,n$$
(4)

where \({\zeta}_{i}\) and \({\zeta}_{i}^{*}\) denote the slack variables, and \(\psi\) is the accuracy (tolerance) of the regression in the training phase. In addition, the penalization parameter \(C\) controls the trade-off between the training deviation and the complexity of the model116. After defining the constraints and applying the Lagrange multipliers, the final solution of SVM can be expressed as follows,

$$f\left(x\right)=\sum_{i=1}^{n}\left({\mu}_{i}-{\mu}_{i}^{*}\right)G\left({x}_{i},x\right)+b$$
(5)

where \({\mu}_{i}\) and \({\mu}_{i}^{*}\) stand for non-negative Lagrange multipliers, and \(G\left({x}_{i},x\right)\) is the kernel function117. The performance of a model developed by the SVM method strongly depends on the selected kernel function. In this context, the Gaussian kernel function has been employed,

$$G\left({x}_{i},x\right)=\exp\left(-\frac{{\left\|x-{x}_{i}\right\|}^{2}}{{\sigma}^{2}}\right)$$
(6)

where \(\sigma\) is the width parameter of the Gaussian function, which is tuned during the training phase.
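As an illustration of how Eqs. (5) and (6) are evaluated once training is complete, the following minimal sketch computes a prediction from a set of support vectors. It assumes the squared-distance form of the Gaussian kernel, exp(-||x - x_i||^2 / sigma^2); the function names and all numerical values are hypothetical, and the sketch does not reproduce the training step or the fitted parameters of this study.

```python
import math


def gaussian_kernel(xi, xj, sigma):
    """Gaussian kernel exp(-||xi - xj||^2 / sigma^2) (assumed form of Eq. 6)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / sigma ** 2)


def svr_predict(x, support_vectors, coeffs, bias, sigma):
    """Evaluate Eq. (5): f(x) = sum_i (mu_i - mu_i*) G(x_i, x) + b.

    `coeffs` holds the differences (mu_i - mu_i*) obtained from training.
    """
    return bias + sum(
        c * gaussian_kernel(sv, x, sigma)
        for sv, c in zip(support_vectors, coeffs)
    )


# Hypothetical values for illustration only.
support_vectors = [[0.0, 0.0], [1.0, 0.0]]
coeffs = [0.5, -0.25]
prediction = svr_predict([0.0, 0.0], support_vectors, coeffs, bias=2.0, sigma=1.0)
```

A smaller `sigma` makes each support vector influence only its immediate neighborhood, whereas a larger value produces a smoother regression surface; this is why the kernel width is tuned together with the penalization parameter.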

Input features of the models

Experimental studies reveal that pressure and salt composition significantly influence the HFTM. Consequently, these factors are of particular importance and must be considered when constructing predictive models. Furthermore, as noted earlier, the analyzed data points encompass the HFTM in 26 saline water solutions. Thus, to develop robust models that can differentiate the HFTM across varying solutions, the input features must include parameters reflecting the physical properties of the salts. Herein, the melting temperature and the molar mass of the salts were chosen to serve this purpose. Consequently, the correlation between the HFTM and the input features can be defined as follows,

$${T}_{E}=f\left({Mw}_{s},{T}_{ms},P,W\right)$$
(7)

where \({T}_{E}\) is the equilibrium temperature (HFTM), \({Mw}_{s}\) and \({T}_{ms}\) are the molar mass and melting temperature of the salt, respectively, \(P\) is the pressure, and \(W\) is the salt concentration (salinity) of the solution.

Statistical evaluation

The error metrics defined below were used as standard metrics to evaluate the performance of the predictive tools developed in this study,

$$\:MAPE\:\left(\%\right)=\frac{1}{N}\sum\:\left|{\sigma\:}_{i}\right|\times\:100$$
(8)
$$\:RRMSE\:\left(\%\right)=\frac{\sqrt{\frac{1}{N}\sum\:{\left({HFTM}_{pre}-{HFTM}_{exp}\right)}^{2}}}{\frac{1}{N}\sum\:{HFTM}_{exp}}\times\:100$$
(9)
$$\:SD\:\left(\%\right)=\sqrt{\frac{\sum\:{\left({\sigma\:}_{i}-\overline{{\sigma\:}_{i}}\right)}^{2}}{N}}\times\:100$$
(10)

where \(\:{\sigma\:}_{i}\) is the relative deviation of ith predicted value,

$$\:{\sigma\:}_{i}=\frac{{HFTM}_{pre,i}-{HFTM}_{exp,i}}{{HFTM}_{exp,i}}$$
(11)
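The metrics of Eqs. (8) to (11) can be computed directly from paired predicted and experimental temperatures. The sketch below is a plain transcription of those definitions; the function names and the two-point example are hypothetical and are not data from this study.

```python
import math


def relative_deviations(pred, exp):
    """Eq. (11): (HFTM_pre - HFTM_exp) / HFTM_exp for each sample."""
    return [(p - e) / e for p, e in zip(pred, exp)]


def mape(pred, exp):
    """Eq. (8): mean absolute relative deviation, in percent."""
    dev = relative_deviations(pred, exp)
    return 100.0 * sum(abs(d) for d in dev) / len(dev)


def rrmse(pred, exp):
    """Eq. (9): root mean square error divided by the mean experimental value, in percent."""
    n = len(exp)
    rmse = math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / n)
    return 100.0 * rmse / (sum(exp) / n)


def sd(pred, exp):
    """Eq. (10): standard deviation of the relative deviations, in percent."""
    dev = relative_deviations(pred, exp)
    mean = sum(dev) / len(dev)
    return 100.0 * math.sqrt(sum((d - mean) ** 2 for d in dev) / len(dev))


# Hypothetical example values (K), for illustration only.
exp_vals = [200.0, 400.0]
pred_vals = [202.0, 396.0]
metrics = (mape(pred_vals, exp_vals), rrmse(pred_vals, exp_vals), sd(pred_vals, exp_vals))
```

Because MAPE and SD are built on relative deviations while RRMSE normalizes an absolute error by the mean temperature, the three indices need not rank two models identically.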

Results and discussions

Development of the novel models for HFTM

Following the framework suggested in Eq. (7), the collected experimental samples were employed to build reliable models for determining the HFTM. The models were designed using the SVM and DT machine learning methods. Initially, the dominant portion (80%) of the dataset, termed the training subset, was dedicated to establishing the predictive models. Then, the effectiveness of the models was assessed using the remaining 20% of the dataset, known as the testing subset. The error metrics of the novel models for both the training and testing phases are presented in Fig. 2. Clearly, during the training stage, the differences between the measured values and those calculated by the intelligent models are extremely low, with all error indices falling below 1%. This observation confirms the capability of the employed intelligent techniques to effectively learn the problem. Regarding the testing phase, both models demonstrate high precision; however, the SVM methodology exhibits slightly better agreement with the measured data, with MAPE, SD and RRMSE values of 0.26%, 0.78% and 1.95%, respectively. Such results highlight the accuracy and reliability of this technique for estimating the HFTM. The DT model also performs well in the testing stage, yielding a MAPE of 0.45% and an SD of 0.66%. Accordingly, it is another robust predictive technique for the HFTM. In conclusion, the newly established machine learning models perform effectively within the scope of this research.

Fig. 2 Error metrics of the intelligent techniques for predicting the HFTM.

Assessment of the new models

In the current section, two visual techniques, namely cross-plots and cumulative frequency diagrams, are employed to examine the performance of the proposed models.

The precision of the suggested intelligent models in estimating the HFTM has been assessed by comparing their outputs against actual data in the cross-plots of Fig. 3. The more closely the estimated values cluster along the diagonal, the better the predictive capability. Clearly, a large proportion of the values calculated by both intelligent techniques agree closely with the experimental data. This is more evident for the SVM model, whose predicted values are generally close to the best-fit line. This observation demonstrates the robust capability of this model for predicting the HFTM. The other intelligent model, designed based on the DT method, presents reasonable performance for all analyzed data, and its outcomes, across both the training and testing subsets, fall within \(\pm\)2% error bounds. Accordingly, this analysis shows that both machine learning algorithms utilized in this study provide strong performance in describing the HFTM in saline water solutions. Overall, the SVM model yields superior predictions, owing to its ability to capture the complicated relationships between factors. This ability arises from the inclusion of various hyperparameters, such as the penalization parameter and the kernel function, which contribute to the versatility and accuracy of the model110,118,119. However, this method often lacks interpretability, making it harder to understand the modeling procedure. The DT model, despite slightly lower accuracy, offers the advantage of understandable rules, which enable users to gain valuable insights into the data and the behavior of the model98,120,121.

Fig. 3 Comparison between the experimental values of HFTM and the outputs of the intelligent techniques.

Figure 4 represents a comparative assessment of the proposed HFTM models, utilizing the cumulative frequency plot. This visual representation shows the fractions of data points that are within specific levels of relative error for various models. A highly accurate model should exhibit an ascending trend in its cumulative frequency curve at lower error margins. As is evident, both newly developed models display considerable cumulative frequencies at small error bounds, affirming their high precision in estimating the HFTM. The SVM model’s curve consistently lies above that of the DT model. This observation signifies the greater accuracy of the predictive approach developed by the SVM technique. In fact, this model can estimate 84.30% of the dataset within a 0.10% error threshold; this rises to 93.24%, 99.90%, 98.76% and 98.95% within 0.20%, 0.50%, 0.70%, and 1% margins of error, respectively. The corresponding cumulative frequency values for the DT are 33.11%, 55.57%, 79.83%, 86.68%, and 93.72%, respectively. Consequently, the DT model demonstrates a reduced precision level when compared with its SVM counterpart. Overall, these results highlight the high level of accuracy attainable by the novel intelligent models and confirm the elevated predictive capability of the SVM-based methodology.
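A cumulative frequency curve of this kind can be built by counting, for each error threshold, the share of samples whose absolute relative error does not exceed it. The sketch below shows the calculation; the function name and the five error values are hypothetical and are not results of this study.

```python
def cumulative_frequency(abs_rel_errors_pct, thresholds_pct):
    """Percentage of samples with absolute relative error <= each threshold.

    Both arguments are expressed in percent. The returned list has one
    entry per threshold, in the same order.
    """
    n = len(abs_rel_errors_pct)
    return [
        100.0 * sum(1 for e in abs_rel_errors_pct if e <= t) / n
        for t in thresholds_pct
    ]


# Hypothetical absolute relative errors (%) for five samples.
errors = [0.05, 0.15, 0.30, 0.60, 2.00]
thresholds = [0.10, 0.20, 0.50, 0.70, 1.00]
curve = cumulative_frequency(errors, thresholds)
```

By construction the curve is non-decreasing in the threshold, since every sample counted at a given error margin is also counted at any larger margin.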

Fig. 4 Cumulative frequency curves of the SVM and DT models.

Detection of suspected data

The reliability of models built on experimental data can be compromised by problematic data points, called outliers. Outliers are data samples that deviate markedly from the typical patterns observed in the dataset. They can arise from several sources, such as human error, measurement uncertainty, and the complicated nature of the system’s behavior. Examining the impact of outliers on the integrity of the models is crucial to confirm their robustness. In this investigation, the Williams plot, a well-known technique for identifying suspected data, has been utilized. This method evaluates the validity of the data by plotting the standardized residual (SR) of each data point versus the corresponding hat value (H). Based on the values of SR and H, the Williams plot includes the following zones,

  i. \(H\le{H}^{*}\) and \(\left|SR\right|\le 3\)

  ii. \(H>{H}^{*}\) and \(\left|SR\right|\le 3\)

  iii. \(\left|SR\right|>3\)

Samples falling into zones i, ii, and iii are designated as reliable data, high-leverage points, and outliers, respectively. The symbol \(\:{H}^{*}\) denotes the warning leverage limit, calculated as,

$$\:{H}^{*}=\frac{3\left(S+1\right)}{N}$$
(12)

where S and N symbolize the number of input variables and the total size of the dataset, respectively.
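The zoning rule and Eq. (12) translate into a short classification routine. The sketch below takes S = 4 input variables and N = 1051 samples from the text; the function names are hypothetical, and the calculation of the hat values and standardized residuals themselves is not shown.

```python
def warning_leverage(n_inputs, n_samples):
    """Eq. (12): H* = 3 (S + 1) / N."""
    return 3.0 * (n_inputs + 1) / n_samples


def classify_sample(h, sr, h_star, sr_limit=3.0):
    """Assign a sample to one of the three zones of the Williams plot."""
    if abs(sr) > sr_limit:
        return "outlier"        # zone iii
    if h > h_star:
        return "high-leverage"  # zone ii
    return "reliable"           # zone i


# S = 4 inputs and N = 1051 samples, as stated in the text.
h_star = warning_leverage(4, 1051)
labels = [
    classify_sample(0.005, 1.2, h_star),
    classify_sample(0.030, -0.4, h_star),
    classify_sample(0.005, 3.5, h_star),
]
```

Note that the residual test is applied first, so a point with both a large residual and a large hat value is reported as an outlier, in line with zone iii being defined by the residual alone.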

The results of applying the Williams plot to the outcomes of the SVM model are shown in Fig. 5. A large majority of the analyzed dataset (1010 data points) falls within the valid zone, demonstrating the robustness of both the acquired data and the developed model. Furthermore, 30 data points are classified as high-leverage samples. This shows that even HFTM data deviating considerably from the typical experimental conditions have been accurately predicted by the SVM model. Only a negligible part of the data (less than 2%) is identified as outliers, which does not significantly affect the overall validity of the proposed models. Therefore, the experimental data are highly reliable, which in turn enables the application of the suggested models with high assurance.

Fig. 5 Williams plot of the HFTM data based on the outcomes of the SVM model.

Trend analysis

In this section, the capability of the newly established models to capture the physical behaviors is evaluated through the analysis of the impacts of pressure, salinity, and salt properties on the HFTM. This evaluation is carried out via the outcomes of the SVM model, which this research has established as the superior predictive method.

Figure 6 illustrates the effects of both pressure and the nature of the dissolved salts on the HFTM, with the mass fractions of all evaluated salts kept constant at 0.1. The figure reveals a consistent upward trend in HFTM as pressure increases, regardless of the saline water solution utilized. In contrast, the use of saline water solutions instead of pure water depresses the HFTM. This reduction originates from the presence of salts, which, through a mechanism termed “salting-out”, diminishes the concentration of free water molecules within the aqueous phase. This disturbs the close arrangement of water molecules and weakens their hydrogen bonds. Therefore, the gas-water interactions are weakened, which inhibits clathrate hydrate formation. The SVM model represents the observed trends in HFTM with a high level of precision, and its predicted values are closely consistent with the experimental data.

Fig. 6 Effects of pressure and salt characteristics on the HFTM according to the outputs of the SVM model.

Figure 7 depicts the correlation between ZnBr2 concentration and HFTM. Even small additions of this salt to the aqueous solution lead to a decrease in the HFTM. This effect becomes more pronounced at higher ZnBr2 concentrations. Increasing the amount of dissolved salt disrupts the hydrogen-bonding network and reduces the activity of water. This makes it more difficult for water molecules to form the cage-like structures required for hydrate formation. To compensate for this, the temperature must be lowered to provide a greater thermodynamic driving force for hydrate formation. The SVM model predicts the above behaviors well, showing excellent agreement with the experiments.

Fig. 7 Effect of salinity on the HFTM according to the outputs of the SVM model.

Sensitivity analysis

To examine how each input variable affects the performance of the new predictive tools, herein, a sensitivity analysis based on the Pearson’s correlation coefficient (PCC) is presented,

$$\:PCC\left({T}_{E},X\right)=\frac{\sum\:_{i=1}^{n}\left({X}_{i}-\overline{{X}_{i}}\right)\left({T}_{E,i}-\overline{{T}_{E,i}}\right)}{\sqrt{\sum\:_{i=1}^{n}{\left({X}_{i}-\overline{{X}_{i}}\right)}^{2}\sum\:_{i=1}^{n}{\left({T}_{E,i}-\overline{{T}_{E,i}}\right)}^{2}}}$$
(13)

where \(\:X\) signifies a given input variable. Also, \(\:\overline{{X}_{i}}\) and \(\:\overline{{T}_{E,i}}\) stand for the average values of the input variable and equilibrium temperature, respectively.

The value of PCC, ranging from −1 to +1, indicates the degree of significance. A value of −1 indicates a strong inverse correlation, while +1 indicates a strong direct correlation between an input variable and the HFTM. Additionally, a PCC value approaching zero shows a minimal influence of that particular parameter on the HFTM. It should be noted that PCC quantifies the strength of linear associations between variables, so its utility is diminished when the relationship between factors is highly non-linear. As demonstrated in “Trend analysis”, the relationships observed between the HFTM and its input factors approximate a linear pattern, which can be properly described by PCC.
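Eq. (13) and the caveat about non-linear relationships can be illustrated with a short sketch. The function name and the small data sets are hypothetical and are chosen only to show the limiting cases; they are not taken from the databank.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences (Eq. 13)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Perfectly linear relationships give +1 or -1.
direct = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
inverse = pearson([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])

# A symmetric, purely quadratic dependence gives zero even though
# y is fully determined by x.
quadratic = pearson([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
```

The last case shows why a PCC near zero should be read as the absence of a linear trend rather than the absence of any dependence.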

Figure 8 presents the importance of each input feature in controlling the HFTM. This figure reveals a positive correlation between pressure and HFTM. On the other hand, salinity and the salt’s properties (melting temperature and molecular weight) display negative correlations with HFTM. These findings are in accordance with the previous discussions in “Trend analysis”. Figure 8 also shows that operational pressure is the most influential factor affecting the HFTM. Moreover, salinity, the salt’s melting point, and its molecular weight are identified as the second, third, and fourth most influential factors, respectively. Hence, the present analysis suggests that lowering the pressure and increasing the water salinity are the most effective ways to inhibit methane hydrate formation.

Fig. 8 Relevancy factors between HFTM and input features.

Comparison of the novel models with literature predictive frameworks

Previous studies, as indicated in the Introduction, have used thermodynamic models to correlate their HFTM data. The strategies implemented in those studies, along with their error metrics for the analyzed data, are summarized in Table 2. For comparative purposes, the performance of the SVM model under each condition is also presented in the table. Despite achieving acceptable outcomes, the conventional methodologies have been verified using only limited datasets, which restricts their applicability to certain saline waters and operational conditions. Moreover, the calculation of HFTM using these correlations can be quite laborious, which limits their usage in engineering applications. The newly developed models, however, can be regarded as efficient computational techniques that enable HFTM prediction across an extensive range of saline water solutions and operational conditions using only four straightforward parameters. Furthermore, the MAPE values achieved by the SVM model remain below 0.2% for all cases analyzed in Table 2 and are lower than those of the literature models. Consequently, the novel machine learning tools represent considerable advancements in the field of HFTM prediction from the standpoints of comprehensiveness, precision and simplicity.

Table 2 Accuracy of the literature models for their analyzed HFTM data, and the results of the SVM model for the same data.

Summary and conclusions

The main objective of this research was to develop intelligent computational methodologies for estimating the HFTM within saline water solutions. A comprehensive set of experimental measurements was assembled for this purpose, featuring 1051 samples from 23 independent studies. The collected databank covered the HFTM in 26 unique saline water solutions across a spectrum of operating conditions. The modeling process employed two machine learning approaches, i.e., SVM and DT. The operating pressure, the salinity and the characteristics of the salts were employed as input features to model the HFTM. A performance assessment based on statistical metrics indicated that the developed models have excellent predictive capabilities. During the testing stage, the SVM methodology yielded a MAPE of 0.26% and an SD of 0.78%, while the corresponding values for the DT model were 0.45% and 0.66%, respectively. A suite of visual comparative analyses demonstrated that the SVM model has better predictive capabilities than the DT approach, and gives relative deviations below 0.20% for a predominant part of the HFTM data. Furthermore, the presented intelligent techniques properly depicted the physical relationships between HFTM and various operating parameters. The integrity of the collected data and the credibility of the suggested models were verified through the Williams plot analysis, in which less than 2% of all data were identified as outliers. A sensitivity analysis identified operational pressure and salinity as the most influential factors affecting the HFTM. Finally, a comparison between the conventional predictive frameworks and the novel models demonstrated that the latter allow the straightforward calculation of HFTM in diverse saline water solutions with higher accuracy.

In conclusion, this study introduced advanced computational techniques with excellent performance in describing the HFTM in saline water solutions. The findings contribute to the efficient design of the pertinent large-scale industrial processes.