Introduction

Besides hydrocarbons, crude oil contains other constituents, notably organic sulfur-containing compounds such as benzothiophene (BT), dibenzothiophene (DBT), and other thiophenic compounds, in significant concentrations. Burning sulfur-containing fuels in industrial boilers and car engines releases sulfur oxide gases (SOx) into the atmosphere1. The presence of SOx in the air leads to the degradation of the ozone layer, the formation of acid rain, and a decrease in soil fertility. In addition, sulfur compounds in petroleum products cause corrosion of refinery equipment and poison catalysts. Therefore, removing sulfur from crude oil and its derived products is crucial for safeguarding the environment and human health by mitigating air pollution2.

Five promising techniques have been widely used for desulfurization8: hydrodesulfurization (HDS)3, biodesulfurization (BDS)4, extractive desulfurization (EDS)5, oxidative desulfurization (ODS)6, and adsorptive desulfurization (ADS)7. Although HDS is the most commonly used technique for removing sulfur from fuels, its applicability is limited by several drawbacks: high cost due to the required high temperature and pressure, hydrogen consumption, and catalyst fouling. Among these strategies, ADS is considered a promising alternative that addresses these problems, owing to its operation under mild conditions, low operating cost, high desulfurization efficiency, and excellent preservation of fuel quality8. Many adsorbents, such as porous carbon materials9,10, metal oxides11,12, zeolites13, and metal-organic frameworks (MOFs)14,15,16,17, have been investigated for adsorption applications. Among them, MOFs are an emerging class of reticular nanoporous crystalline materials composed of organic ligands (e.g., nitrogen-containing ligands, carboxylates, or chains) coordinated to inorganic structural units (e.g., metal ions, chains, or clusters). MOFs show great potential for desulfurization and other adsorption applications because of their extremely high porosity, vast range of possible pore structures, and tunable pore size18. Considerable progress has been made in removing sulfur compounds with a wide range of MOFs. For example, Khan et al.19 employed MIL-47 and MIL-53 to remove benzothiophene (BT) and found that the metal clusters inside the MOFs act as acidic sites that adsorb BT through an acid-base interaction. Peralta et al. found that CPO-27-Ni, which possesses coordinatively unsaturated sites, performed favorably in thiophene removal owing to the interaction between the metal centers of the MOF and the delocalized π electrons of the aromatic ring20. A comprehensive analysis of π-complexation was presented by Khan et al.21. For dibenzothiophene removal, Qin et al. synthesized an HKUST-1@Al2O3 composite using nanosized HKUST-122. The composite outperformed bare HKUST-1 because of its shortened diffusion channels and more effective utilization of active sites. In another study, Li et al. investigated the desulfurization efficiency of HKUST-1 toward the adsorption of H2S, CH3SCH3, and C2H5SH23. Similarly, Ahmed et al.7 examined the sulfur removal capability of several MOF materials, including HKUST-1, MOF-505, MOF-177, UMCM-150, MIL-53, and MOF-74.

Generally, the selectivity of an adsorbent is determined by the physical interactions (van der Waals forces) or chemical bonds (σ and π bonds) formed between the sulfur-containing compounds and the active sites on the adsorbent's surface. Chemisorption usually occurs through a combination of the following mechanisms8,24:

  (i) π-complexation: either the π orbitals of the organic sulfur compounds donate electrons to the vacant s-orbitals of the adsorbent's metal atoms, or the d-orbitals of the metal atoms in the MOF skeleton back-donate electrons to the antibonding π orbitals of the sulfur-containing compounds.

  (ii) Acid-base interaction: the metallic sites of the adsorbent (e.g., Fe, Cr, Al, Cu, Zn, Co, Ag) act as Lewis acids and interact with the basic sulfur atoms of the sulfur-containing compounds.

  (iii) Direct sulfur-metal σ bonding: a lone pair of electrons is transferred from the sulfur atom of the thiophenic compound to a metal atom in the MOF structure, forming a strong sulfur-metal σ bond.

Physisorption by van der Waals forces is largely nonselective and therefore does not govern the selectivity of sulfur compound adsorption. π-complexation bonds are stronger than van der Waals interactions, giving greater selectivity for sulfur compounds over other compounds. Adsorbents containing metals such as Cu, Ag, Pd, and Pt effectively adsorb thiophene (TH), BT, and DBT through π-complexation. Direct sulfur-metal bonds likewise favor the selective adsorption of sulfur compounds8.

A considerable number of MOFs have been synthesized thus far, and further MOFs could be designed by exploring other combinations of building blocks. Nevertheless, identifying the most suitable MOFs from this vast collection, and creating new high-performance MOFs through conventional experimental synthesis and trial-and-error approaches, can be both time-consuming and expensive25. Machine learning (ML), which has been applied in fields such as medicine, economics, engineering, and environmental management, offers a way to tackle this issue. The primary objective of ML is to analyze, design, and improve mathematical models that can be trained on context-specific data to predict outcomes and support decisions even without comprehensive information about all the influencing factors26. ML algorithms use existing data to identify patterns that connect the input variables (independent variables) with the output variable(s) (dependent variables) without explicit programming. The learned patterns form the basis of a model that predicts the outcome for novel, unseen inputs, and this strong learning ability makes the approach highly valued27. For instance, Boyd et al. mined data on over 300,000 MOFs and devised a method for finding tailored MOFs for CO2 capture28. Teng and Shan highlighted the value of an extreme gradient boosting (XGBoost) model in elucidating the key parameters affecting CO2 uptake, identifying molecular structure and pore size as pivotal drivers across different operating conditions29. Ercakir et al. combined different ML techniques with Grand Canonical Monte Carlo (GCMC) molecular simulation to assess CO adsorption in hypothetical MOFs, pinpointing small pores and particular metal clusters as indicators of high adsorption30. Xu et al. combined high-throughput GCMC simulation and ML methods to evaluate the CF4/N2 separation performance of about 690 computation-ready, experimental metal-organic frameworks (CoRE-MOFs)31 and concluded that the highest adsorption capacity and selectivity could be obtained by focusing on Zn-rich frameworks. Lee et al. combined GCMC simulation results with an artificial neural network to develop a predictive model of the methane adsorption capacity of more than 100 trillion MOF samples32. A genetic algorithm was then applied to identify the set of MOFs with the highest methane uptake capacity; the results indicated that about 96 MOF samples can adsorb over 208 cm3/g of methane. Anderson et al. showed that density functional theory (DFT) and GCMC simulation data enable the training of deep neural network (DNN) models that forecast the CO2 adsorption capacity of hypothetical MOFs (hMOFs) across diverse process conditions33. Ma et al. constructed a DNN model with two hidden layers on 13,506 hypothetical MOFs to predict H2 adsorption at 100 bar and 243 K34. The resulting R2 of 0.998 indicates that deep learning is a very promising method for investigating the H2 adsorption capabilities of MOFs. Fernandez et al. built a quantitative structure-property relationship (QSPR) classifier from only 10% of the whole library and used it to select candidate MOFs with superior CO2 adsorption capacity (higher than 1 mmol/g at 0.15 bar and more than 4 mmol/g at 1 bar)35. Table 1 summarizes related studies on ML-based modeling of MOF materials for adsorption applications.

Table 1 Summary of previous studies on developing ML-based models for the adsorption capabilities of MOF materials

Notwithstanding the significant advances in applying ML to develop predictive models for MOF materials, most current research depends predominantly on theoretical data obtained from hypothetical MOFs generated via molecular dynamics (MD), DFT, and GCMC simulations. Although these methods are valuable for investigating novel MOF characteristics and understanding fundamental interactions, they have considerable limitations. Hypothetical datasets often fail to capture the intrinsic complexity and variability of real MOFs, including defects, the effects of synthesis conditions, and departures from idealized structures. Furthermore, these datasets lack direct experimental validation, which raises questions about their prediction accuracy and practical usefulness in real-world contexts. For example, features predicted from theoretical data may overlook contextual variables or dynamic interactions that arise in real-world applications48. In the case of thiophenic compound adsorption, reliance on hypothetical data may lead to overestimating or underestimating adsorption capacities and selectivities. This shortcoming highlights the need for ML models developed from experimental data that reconcile theoretical predictions with empirical observations. Moreover, in the field of sulfur adsorption, particularly the removal of TH, BT, and DBT from petroleum oil using MOF materials, to the best of our knowledge no predictive model exists that forecasts the desulfurization efficacy of MOFs while accounting for operating conditions such as process temperature and the initial sulfur concentration in the oil.
Therefore, this study seeks to improve the robustness and usability of ML models by employing experimentally validated data from the literature, thereby offering more trustworthy insights into the adsorptive desulfurization performance of MOFs for removing thiophenic compounds in practical applications.

In this work, a data-mining process was conducted to compile a dataset of 676 rows from literature studies of the adsorption capabilities of MOF materials for removing thiophenic compounds, including BT, DBT, and 4,6-dimethyldibenzothiophene (4,6-DMDBT). Features including the metal ion type, average pore diameter, total pore volume (TPV), BET surface area, adsorption temperature, oil/adsorbent ratio, adsorbate kinetic diameter (KD), initial concentration of the sulfur compound (C0), adsorbate dipole moment (Dm), solvent molecular weight (MWs), adsorbate molecular weight (MWa), and contact time were used as input parameters to relate the adsorbent features, adsorption conditions, and adsorbate characteristics to the sulfur adsorption capacity of the MOF samples. The gathered data were used to develop five ML models: support vector machine (SVM) regression, random forest, extra trees, extreme gradient boosting (XGBoost) regression, and multilayer perceptron (MLP). The performance of these models was then evaluated using the coefficient of determination (R2), mean squared error (MSE), and mean relative error (MRE) to identify the most accurate model. Finally, feature importance analysis and genetic algorithm optimization were performed on the best ML model to determine the most significant factors and their optimal values.

Data processing and methods

Data collection

Adsorption capacity data for 40 different MOF samples toward sulfur compounds, namely 4,6-dimethyldibenzothiophene (4,6-DMDBT), dibenzothiophene (DBT), and benzothiophene (BT), were collected from 14 research articles. Although textural characteristics of MOFs such as specific surface area, pore volume, and average pore size are considered the most influential factors affecting their desulfurization efficiency, some studies have demonstrated that the type of metal cluster used as the secondary building unit in MOF synthesis has a major impact on the adsorption capability of MOFs49,50,51. Metal clusters with mixed or higher valence play a key role in strengthening the acid-base interaction between thiophenic compounds and metal ions52. Therefore, the dataset of 676 rows compiled in this study includes the MOF metal cluster type, MOF structural characteristics, operating conditions, and oil/solvent features. Specifically, the features are the MOF metal ion type, average pore diameter, total pore volume (TPV), BET surface area, adsorption temperature, oil/adsorbent ratio, adsorbate kinetic diameter (KD), initial sulfur concentration in the model fuel (C0), adsorbate dipole moment (Dm), solvent molecular weight (MWs), adsorbate molecular weight (MWa), and contact time. Data were carefully extracted from the text, tables, and figures of the articles using OriginLab's Digitizer tool, with emphasis on ensuring data integrity and minimizing missing values.

Data preprocessing

Although removing outliers is common practice in data analysis to improve model performance and robustness, no outlier removal was applied in the current study, in order to preserve the integrity and completeness of the dataset. Retaining outliers preserves all available information on the importance of each feature and its impact, with the aim of developing a comprehensive understanding of MOF properties and their behavior in sulfur adsorption. The dataset was analyzed in detail using the pandas library in Python, starting with statistical summaries of each feature such as the mean, standard deviation, median, and range. Data visualization was another important part of the preprocessing step: boxplots, which display the data distribution through a five-number summary (minimum, first quartile (Q1), median, third quartile (Q3), and maximum), were used to analyze the collected data, and Pearson correlation heatmaps drawn with the Seaborn library provided a visual depiction of the correlation matrix.
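A minimal pandas sketch of the summary and correlation steps, run on a small hypothetical table (column names and values are illustrative, not the study's dataset). `DataFrame.describe()` returns the count, mean, standard deviation, quartiles, and extremes of each column, and `DataFrame.corr()` returns the Pearson matrix that a heatmap such as `seaborn.heatmap(corr, annot=True)` would display; the plotting call is left out so the block runs without a display.

```python
import pandas as pd

# Hypothetical toy rows; names and values are illustrative only
df = pd.DataFrame({
    "BET_m2_g": [1500.0, 1100.0, 2800.0, 1800.0, 950.0, 3000.0],
    "TPV_cm3_g": [0.70, 0.45, 1.20, 0.85, 0.40, 1.35],
    "time_h": [2.0, 1.0, 8.0, 4.0, 0.5, 12.0],
    "q_mg_g": [20.0, 12.0, 55.0, 30.0, 9.0, 61.0],
})

# count, mean, std, min, 25%, 50% (median), 75%, max for every column
summary = df.describe()

# Range (max - min) of each feature
value_range = df.max() - df.min()

# Pearson correlation matrix: the values a correlation heatmap displays
corr = df.corr(method="pearson")
print(summary.round(2))
print(corr.round(2))
```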

To use the categorical feature, namely the metal cluster type, as an input for training the machine learning algorithms, it was one-hot encoded following the procedure of Bailey et al.53. Table 2 reports descriptive statistics of the categorical feature, and Table 3 gives detailed information, including central tendencies and measures of dispersion, for the numerical features. Numerical features were normalized using MinMaxScaler, and the target variable was standardized using StandardScaler from the scikit-learn preprocessing module54, as shown in Eqs. 1 and 2, respectively55. Scaling puts the features on a common range so that each contributes to the model regardless of its original magnitude. After preprocessing, the data were divided into training and testing sets using the train_test_split function from scikit-learn. To assess model performance on data not seen during training, test_size was set to 0.2; that is, 80% of the data were used for training and the remaining 20% were held out as test data for evaluating the learning models.

$$x_{scaled}=\frac{x-x_{min}}{x_{max}-x_{min}}$$
(1)
$$z_{i}=\frac{y_{i}-y_{mean}}{y_{std}}$$
(2)
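The encoding, scaling, and splitting steps can be sketched with pandas and scikit-learn as follows. The toy rows and column names are hypothetical, and `pd.get_dummies` is used for the one-hot step as one convenient option (scikit-learn's `OneHotEncoder` would serve equally well). One design choice may differ from the study: the sketch splits first and fits `MinMaxScaler` (Eq. 1) and `StandardScaler` (Eq. 2) on the training rows only, so that no test-set statistics leak into training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy rows; column names are illustrative, not the study's schema
df = pd.DataFrame({
    "metal": ["Cu", "Zr", "Cr", "Zn", "Cu", "Fe", "Cr", "Al", "Co", "V"],
    "BET_m2_g": [1500.0, 1100.0, 2800.0, 1800.0, 1450.0, 900.0, 3000.0, 1200.0, 1300.0, 1000.0],
    "temp_K": [298.0, 303.0, 298.0, 313.0, 298.0, 298.0, 323.0, 298.0, 303.0, 298.0],
    "q_mg_g": [20.1, 12.4, 55.0, 30.2, 18.7, 9.8, 60.3, 11.1, 25.5, 22.0],
})

# One-hot encoding: one 0/1 column per metal (metal_Al, metal_Co, ...)
X = pd.get_dummies(df.drop(columns="q_mg_g"), columns=["metal"], dtype=float)
y = df[["q_mg_g"]].to_numpy()

# 80/20 split, matching test_size=0.2 in the text
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_test = X_train.copy(), X_test.copy()

num_cols = ["BET_m2_g", "temp_K"]
x_scaler = MinMaxScaler().fit(X_train[num_cols])   # Eq. (1), fitted on training rows
X_train[num_cols] = x_scaler.transform(X_train[num_cols])
X_test[num_cols] = x_scaler.transform(X_test[num_cols])

y_scaler = StandardScaler().fit(y_train)            # Eq. (2), applied to the target
y_train_z = y_scaler.transform(y_train)
y_test_z = y_scaler.transform(y_test)
```

Predictions made in the standardized target space can be mapped back to mg/g with `y_scaler.inverse_transform`.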
Table 2 Categorical data description
Table 3 Numerical data description

Machine learning modeling approaches

Diverse machine learning algorithms can be employed for a wide range of classification and regression tasks. The challenge here is to determine which model and hyperparameter configuration yield the best performance on this specific dataset. The search therefore spans both the choice of learning algorithm and its hyperparameters, and a large number of hyperparameter combinations must be evaluated to reach the highest prediction accuracy and identify the ideal hyperparameter set.

This study uses five models, each briefly described below: random forest regression, support vector machine (SVM), extra trees, extreme gradient boosting (XGBoost), and an artificial neural network (ANN) with a multilayer perceptron (MLP) architecture. The hyperparameters of all models except the ANN were optimized using Optuna56, an open-source hyperparameter optimization framework in Python, combined with five-fold cross-validation57 to identify the most efficient hyperparameter set and mitigate overfitting during training. In k-fold cross-validation, the training dataset is divided into k folds of roughly equal size; each fold serves once as the validation set while the remaining k−1 folds are used for training, and the evaluation metrics are then averaged across the k runs58. A flowchart of the steps of this research is illustrated in Fig. 1.
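A minimal scikit-learn sketch of five-fold cross-validated hyperparameter search. It deliberately swaps Optuna for a plain random search so the block has no extra dependency; the synthetic `make_regression` data, the random-forest search ranges, and the number of trials are illustrative assumptions. `cross_val_score` trains on four folds and scores on the held-out fifth, five times, and the mean R2 ranks each candidate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data, not the study's dataset
X, y = make_regression(n_samples=120, n_features=6, noise=5.0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)   # five-fold CV
rng = np.random.default_rng(0)

best_score, best_params = -np.inf, None
for _ in range(8):   # random search: a simple stand-in for Optuna's sampler
    params = {
        "n_estimators": int(rng.integers(20, 80)),
        "max_depth": int(rng.integers(2, 10)),
    }
    model = RandomForestRegressor(random_state=0, **params)
    # Train on 4 folds, score on the 5th, rotate; average the 5 R^2 values
    score = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    if score > best_score:
        best_score, best_params = score, params

print(best_params, round(best_score, 3))
```

With Optuna, the loop body would instead sit inside an objective function that draws values with `trial.suggest_int(...)` and returns the mean score, and `optuna.create_study(direction="maximize").optimize(objective, n_trials=...)` would drive the search.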

Fig. 1 Flowchart of the various steps of this research

Support Vector Machine (SVM)

SVM is a machine learning approach that can be applied to both regression and classification problems, covering linear and nonlinear variants of each. In classification, rather than simply minimizing the training error, SVM seeks the separating hyperplane that maximizes the margin to the nearest training points, the support vectors59. For regression (support vector regression, SVR), the objective is reversed: instead of keeping samples outside the margin, SVR fits a function such that as many samples as possible lie within a margin (an ε-wide tube) around it, and only deviations larger than ε are penalized55. During training, the SVM learns how much each data point contributes to defining the decision boundary or, in regression, the fitted function.

The training samples that lie on or outside the margin boundary are referred to as support vectors. In the prediction stage, a new data point is predicted from a weighted sum of its kernel similarities to the support vectors. The Gaussian kernel, given by Eq. (3), is used to measure the similarity between data points60.

$$K\left(x_{i},x_{j}\right)=\exp\left(-\gamma\left\|x_{i}-x_{j}\right\|^{2}\right)$$
(3)

where xi and xj are data points, ||xi − xj||² is the squared Euclidean distance between them, and the parameter \(\gamma>0\) controls the width of the Gaussian kernel. In the present study, SVM regression was performed with the SVR class of scikit-learn's svm module.
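A short sketch tying the Gaussian kernel (written with the usual negative exponent) to scikit-learn's `SVR` on synthetic data. `gaussian_kernel` is a hypothetical helper that implements the kernel directly, and the last lines rebuild the SVR prediction as a weighted sum of kernel values over the fitted support vectors (`support_vectors_`, `dual_coef_`, `intercept_`). The values of `gamma`, `C`, and `epsilon` are arbitrary assumptions, not the study's tuned hyperparameters.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for every row pair of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 3))        # synthetic, already min-max scaled
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2     # synthetic target

gamma = 2.0
svr = SVR(kernel="rbf", gamma=gamma, C=10.0, epsilon=0.05).fit(X, y)

X_new = rng.uniform(0, 1, size=(5, 3))
# SVR prediction = sum over support vectors of coef * K(sv, x) + bias
K = gaussian_kernel(svr.support_vectors_, X_new, gamma)
manual = svr.dual_coef_ @ K + svr.intercept_
print(np.allclose(manual.ravel(), svr.predict(X_new)))  # True
```

Only samples on or outside the ε-tube get non-zero coefficients, so a larger `epsilon` typically yields fewer support vectors and a cheaper prediction.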

Random Forest

The random forest regressor is a popular and versatile machine learning technique for regression. It builds a large number of decision trees during training and averages the predictions of the individual trees, an ensemble strategy that yields more accurate and reliable forecasts than a single decision tree. The approach begins with bootstrapping, in which the training dataset is randomly sampled with replacement to generate different data subsets. A decision tree is built on each bootstrap sample, and a random subset of features is considered at each split, which increases the diversity of the trees. Once all trees have been built, the random forest aggregates their forecasts; for regression, this is usually done by averaging the outputs of all trees. A key property of the random forest regressor is its ability to assess feature relevance: it estimates how much each feature contributes to the model's accuracy by measuring how much the error increases when the values of that feature are randomly permuted while all other features are left unchanged. This is extremely useful for understanding the driving forces behind the model's predictions, which aids interpretability and subsequent feature selection55,61. In this study, random forest regression was implemented using the ensemble module of scikit-learn.
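A minimal scikit-learn sketch of the two properties described above, on synthetic data with assumed settings: the forest's regression output equals the mean of its trees' predictions, and `permutation_importance` scores each feature by how much the model's score (R2 by default for a regressor) drops when that feature's column is shuffled.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
# Synthetic target: feature 0 matters most, feature 1 a little, feature 2 is noise
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# Each tree sees a bootstrap sample; 2 of the 3 features are tried at each split
rf = RandomForestRegressor(n_estimators=100, max_features=2, bootstrap=True,
                           random_state=0).fit(X, y)

# The forest's regression output is the mean of its trees' predictions
tree_mean = np.mean([tree.predict(X[:5]) for tree in rf.estimators_], axis=0)

# Permutation importance: drop in R^2 when one feature's column is shuffled
result = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
print(np.round(result.importances_mean, 3))
```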

Extra Trees

The extra trees regressor, also known as the extremely randomized trees regressor, is an important machine learning approach for regression. Like random forest, extra trees builds an ensemble of decision trees from the dataset; the principal difference lies in how the tree splits are chosen. Whereas random forest searches a random subset of features for the best split, extra trees goes a step further and also randomizes the split thresholds: instead of searching for the optimal cut-point of each candidate feature, it draws a cut-point at random and then keeps the best of these random splits. After the trees are built, the algorithm forecasts by averaging the outputs of all the trees in the ensemble62,63.
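The difference between the two split rules can be shown in a few lines of NumPy. This is an illustrative sketch with hypothetical helper names and synthetic data, not scikit-learn's implementation: `best_threshold` scans every midpoint of one feature (the random-forest rule), while `extra_trees_split` draws one uniform cut-point per candidate feature and keeps the lowest-error one.

```python
import numpy as np

def sse_after_split(x, y, threshold):
    """Total squared error of the two child nodes created by x <= threshold."""
    parts = (y[x <= threshold], y[x > threshold])
    return sum(((p - p.mean()) ** 2).sum() for p in parts if p.size)

def best_threshold(x, y):
    """Exhaustive search over midpoints, as a random-forest tree does per feature."""
    xs = np.unique(x)
    candidates = (xs[:-1] + xs[1:]) / 2.0
    return min(candidates, key=lambda t: sse_after_split(x, y, t))

def extra_trees_split(X, y, features, rng):
    """Extra-trees rule: one uniform random cut-point per candidate feature,
    then keep whichever of those random splits has the lowest error."""
    best = None
    for f in features:
        t = rng.uniform(X[:, f].min(), X[:, f].max())
        err = sse_after_split(X[:, f], y, t)
        if best is None or err < best[2]:
            best = (f, t, err)
    return best

rng = np.random.default_rng(3)
X = rng.uniform(size=(50, 3))
y = 4 * X[:, 0] + rng.normal(scale=0.1, size=50)   # synthetic: feature 0 drives y

f, t, err_random = extra_trees_split(X, y, features=[0, 1, 2], rng=rng)
err_best = sse_after_split(X[:, f], y, best_threshold(X[:, f], y))
print(f, round(t, 3), round(err_random, 3), round(err_best, 3))
```

A random cut-point can never beat the exhaustive one on the same feature; extra trees accepts this per-split loss in exchange for cheaper training and more decorrelated trees. scikit-learn's `ExtraTreesRegressor` also trains each tree on the full training set by default (`bootstrap=False`).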

Extreme gradient boosting (XGBoost) regressor

The XGBoost regressor is an optimized, fast implementation of gradient boosting, a robust machine learning technique commonly employed in regression applications. XGBoost is well known for its speed, performance, and ability to handle large and complex datasets efficiently. In the gradient boosting framework, new models are added sequentially to correct the errors made by earlier models. The algorithm starts from an initial prediction (usually a constant) and iteratively adds new models, typically decision trees; at each step, XGBoost fits a new tree to the residuals (errors) of the current ensemble. This amounts to a gradient descent on the loss function, which measures the difference between predicted and actual values. A crucial aspect of XGBoost is its built-in L1 and L2 regularization, which helps prevent overfitting. This model was constructed with the XGBRegressor class of the xgboost package64.
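To make the boosting loop concrete, here is a from-scratch NumPy sketch for squared-error loss. It is not the xgboost library: it uses single-split trees (stumps) and a learning rate `eta`, and it borrows XGBoost's L2-regularized leaf weight for squared error, setting each leaf value to the residual sum divided by (leaf count + `lam`). The data and settings are hypothetical, and `fit_stump` and `boost` are illustrative names. In xgboost's `XGBRegressor`, the corresponding controls are `learning_rate`, `reg_lambda` (L2), and `reg_alpha` (L1).

```python
import numpy as np

def fit_stump(x, residual, lam):
    """Best single threshold split; each leaf weight is the L2-regularized
    value sum(residual) / (count + lam) used by XGBoost for squared error."""
    best = None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:
            mask = x[:, j] <= t
            w_l = residual[mask].sum() / (mask.sum() + lam)
            w_r = residual[~mask].sum() / ((~mask).sum() + lam)
            loss = ((residual - np.where(mask, w_l, w_r)) ** 2).sum()
            if best is None or loss < best[0]:
                best = (loss, j, t, w_l, w_r)
    _, j, t, w_l, w_r = best
    return lambda z: np.where(z[:, j] <= t, w_l, w_r)

def boost(x, y, n_rounds=50, eta=0.3, lam=1.0):
    base = y.mean()                      # starting prediction: a constant
    pred = np.full_like(y, base, dtype=float)
    trees = []
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient of 0.5 * (y - pred)^2
        tree = fit_stump(x, residual, lam)
        pred += eta * tree(x)            # shrink each new tree by the learning rate
        trees.append(tree)
    return lambda z: base + eta * sum(tr(z) for tr in trees)

rng = np.random.default_rng(0)
x = rng.uniform(size=(100, 2))
y = 3 * x[:, 0] + np.where(x[:, 1] > 0.5, 1.0, 0.0)   # synthetic target
model = boost(x, y)
mse = np.mean((model(x) - y) ** 2)
print(round(mse, 4), round(np.var(y), 4))
```

Raising `lam` pulls every leaf weight toward zero, which is the sense in which the L2 term restrains each new tree.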

Artificial neural network-multilayer perceptron (ANN-MLP)

The multilayer perceptron (MLP) is an artificial neural network (ANN) widely used for prediction, classification, and optimization tasks. A major advantage of MLP networks is their capacity to represent intricate nonlinear relationships, which makes them highly valuable where traditional linear models are inadequate. An MLP is composed of several layers of nodes (neurons), each fully connected to the preceding and following layers, and the number of hidden layers and of neurons per layer can vary with the complexity of the problem65. The MLP learns complex nonlinear relationships between input and output variables by adjusting the weights and biases of the inter-neuron connections. During training, the network is presented with a collection of input-output pairs, and the weights and biases are updated by a gradient-based optimizer, using gradients computed by backpropagation, to reduce the discrepancy between the predicted and the actual outputs. Each neuron computes the weighted sum of its inputs plus a bias and passes the result through an activation function to produce its output66; with a step activation, this reduces to the neuron firing only when the weighted sum exceeds a threshold. Equation (4) gives the standard form of this computation for a network with one hidden layer67.

$$Y=f_{2}\left(\sum_{j=1}^{m}w_{j}\,f_{1}\left(\sum_{i=1}^{n}h_{ij}\,x_{i}+b_{j}\right)+b_{0}\right)$$
(4)

Here, hij, bj, and f1 denote the weights, biases, and activation function of the hidden-layer neurons, respectively, and xi are the inputs; similarly, wj, b0, and f2 denote the weights, bias, and activation function of the output neuron67. Nonlinearity is introduced by activation functions such as the sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU). By arranging neurons in layers and performing these calculations, MLPs learn complex patterns during training68. Important hyperparameters of MLP networks include the number of hidden layers and neurons per layer, the activation functions, the learning rate, the batch size, and the number of training epochs57. In this study, the MLP was built with the Keras framework using fully connected Dense layers. Dropout was used as a regularization technique to mitigate overfitting, a common problem in which a model performs well on training data but poorly on new, unseen data. By randomly deactivating a fraction of the neurons during training, dropout prevents complex co-adaptations on the training data and thereby improves the model's generalization ability69. Hyperparameters were tuned with methods such as grid search or random search, which examine different combinations of hyperparameters, and the best-performing architecture on the dataset was identified.
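A minimal NumPy sketch of the one-hidden-layer forward pass in Eq. (4), with optional inverted dropout on the hidden layer. The helper names and the tiny hand-picked weights are hypothetical; this is not the Keras model used in the study, although Keras's `Dropout` layer behaves the same way, scaling kept units by 1/(1 − rate) during training and doing nothing at inference.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

def mlp_forward(x, H, b, w, b0, f1=relu, f2=identity, dropout=0.0, rng=None):
    """One-hidden-layer MLP in the form of Eq. (4).

    x: (n,) inputs x_i; H: (n, m) hidden weights h_ij; b: (m,) hidden biases b_j;
    w: (m,) output weights w_j; b0: output bias. f2 = identity suits regression.
    """
    hidden = f1(x @ H + b)            # f1(sum_i h_ij * x_i + b_j) for each hidden neuron j
    if dropout > 0.0:                 # inverted dropout: used during training only
        keep = rng.random(hidden.shape) >= dropout
        hidden = hidden * keep / (1.0 - dropout)
    return f2(hidden @ w + b0)        # f2(sum_j w_j * hidden_j + b0)

x = np.array([1.0, 2.0])
H = np.array([[1.0, -1.0], [0.5, 1.0]])   # h_ij: input i -> hidden neuron j
b = np.array([0.0, -1.0])
w = np.array([2.0, 3.0])
b0 = 0.5
y_hat = mlp_forward(x, H, b, w, b0)       # hidden = relu([2, 0]); 2*2 + 3*0 + 0.5
print(y_hat)  # 4.5
```

Because the kept activations are rescaled, the dropout output equals the deterministic output on average, so no correction is needed when dropout is switched off at prediction time.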

Genetic algorithm principles

The genetic algorithm (GA) is a stochastic global optimization technique based on an iterative process that emulates biological evolution. In contrast to gradient-based methods for nonlinear parameter identification, the GA does not require gradient computation and is less prone to becoming trapped in local minima, which makes it more likely to find the global optimum. The GA starts with a population of randomly generated individuals (initial parameter estimates), each evaluated for its fitness on the specified optimization problem. Each iteration, referred to as a generation, involves a competitive selection process that eliminates inferior individuals. After selection, crossover and mutation are applied to the superior individuals to generate offspring, and these updated parameter estimates form the next generation. The process is repeated until the population converges. The selection scheme used combines tournament selection with elitism. Tournament selection randomly picks two individuals from the population and advances the fitter one to the next generation. With elitism, the best individuals survive into the next generation without being altered by crossover or mutation70,71. Several user-defined settings govern the behavior of the GA: the population size, number of generations, crossover probability, and mutation probability. The choice of these tuning parameters is problem-specific, although preliminary experiments showed that the GA was robust to changes in them70.
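A compact NumPy sketch of a real-coded GA with the two ingredients named above, binary tournament selection and elitism. The blend crossover, Gaussian mutation, default settings, and toy objective are assumptions chosen for illustration; in an application like this study's, the fitness would presumably be the trained ML model's predicted adsorption capacity evaluated within the feature bounds.

```python
import numpy as np

def genetic_maximize(fitness, bounds, pop_size=40, n_gen=60, p_cx=0.9,
                     p_mut=0.1, n_elite=2, seed=0):
    """Real-coded GA: binary tournament selection, elitism, blend crossover,
    Gaussian mutation. Returns the best individual and its fitness."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, lo.size))
    for _ in range(n_gen):
        fit = np.array([fitness(ind) for ind in pop])
        # Elitism: the n_elite fittest individuals pass on unchanged
        elite = pop[np.argsort(fit)[::-1][:n_elite]]
        children = []
        while len(children) < pop_size - n_elite:
            parents = []
            for _ in range(2):
                # Binary tournament: draw two at random, keep the fitter one
                i, j = rng.integers(pop_size, size=2)
                parents.append(pop[i] if fit[i] >= fit[j] else pop[j])
            a, b = parents
            if rng.random() < p_cx:
                alpha = rng.random(lo.size)      # blend crossover, gene by gene
                child = alpha * a + (1.0 - alpha) * b
            else:
                child = a.copy()
            # Gaussian mutation on each gene with probability p_mut
            mutate = rng.random(lo.size) < p_mut
            child = child + mutate * rng.normal(0.0, 0.1 * (hi - lo))
            children.append(np.clip(child, lo, hi))
        pop = np.vstack([elite, np.array(children)])
    fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(fit)], fit.max()

# Toy objective standing in for a trained model's predicted uptake (hypothetical)
def toy_fitness(v):
    return -((v[0] - 0.3) ** 2 + (v[1] - 0.7) ** 2)

best, best_fit = genetic_maximize(toy_fitness, bounds=[(0.0, 1.0), (0.0, 1.0)])
print(np.round(best, 3), round(float(best_fit), 5))
```

Elitism guarantees that the best fitness never decreases from one generation to the next, while tournament selection keeps some diversity because weaker individuals can still win against an even weaker opponent.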

Evaluation metrics of the ML-based models

Model performance was evaluated using R2, MSE, and MRE, and the MRE was ultimately used as the criterion for selecting the optimal model.

Mean relative error (MRE)

The relative error of each prediction is the absolute difference between the predicted and actual values divided by the actual value; the MRE averages these relative errors over all n samples and expresses the result as a percentage:

$$\:MRE=\frac{1}{n}{\sum\:}_{i=1}^{n}\frac{\left|y_{i}-{\hat{ y_{i}}}\right|}{y_{i}}\times\:100$$
(5)

Mean squared error (MSE)

MSE can also serve as a loss function to be minimized. It is commonly used in practical machine learning applications because squaring the errors penalizes large errors more heavily than small ones55:

$$MSE=\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}$$
(6)

Coefficient of determination (R2)

R2 evaluates how adequately the model reproduces the experimental data: it measures the proportion of the variance in the actual values that is explained by the predictions, and a value closer to 1 indicates a closer fit. R2 is calculated as follows72:

$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(Y_{actual,i}-Y_{predicted,i}\right)^{2}}{\sum_{i=1}^{n}\left(Y_{actual,i}-Y_{mean}\right)^{2}}$$
(7)

where Ymean is the mean of the actual quantities.
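The three metrics can be written as short NumPy functions and checked on a hand-computable, hypothetical example. R2 is implemented in its standard form, one minus the residual sum of squares over the total sum of squares; the function names are illustrative.

```python
import numpy as np

def mre(y, y_hat):
    """Eq. (5): mean of |y - y_hat| / y, expressed in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs(y - y_hat) / y)

def mse(y, y_hat):
    """Eq. (6): mean of the squared errors."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean((y - y_hat) ** 2)

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

y_true = [2.0, 4.0, 5.0]   # hypothetical measured capacities
y_pred = [2.5, 3.5, 5.0]   # hypothetical model predictions
print(mre(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

By hand: the relative errors are 0.25, 0.125, and 0, so MRE = 12.5%; the squared errors sum to 0.5, so MSE = 0.5/3 ≈ 0.167; and with SS_tot = 14/3, R2 = 1 − 0.5/(14/3) = 25/28 ≈ 0.893. Note that MRE is undefined when an actual value is zero.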

Results and discussion

Quantitative analysis of the collected dataset

A boxplot is a graphical representation of the distribution and variability of data. It helps in understanding the dispersion and central values of the features used in the machine learning models, providing a comprehensive view of the dataset. Figure 2 shows boxplots for the numerical features with the highest data dispersion. In this figure, dispersion is measured by the interquartile range (IQR = Q3 − Q1), obtained by dividing the data into quartiles. In each boxplot, the minimum, first quartile (Q1), median, third quartile (Q3), and maximum are denoted by five lines arranged from bottom to top. The whiskers cover the data lying between Q1 − 1.5×IQR and Q3 + 1.5×IQR; data outside this range are plotted as individual points. As Fig. 2 shows, the large standard deviation (SD) of 687.78 m2/g for the BET surface area reflects considerable variation among the MOF materials used as solid sorbents. Conversely, the total pore volume has much lower variability, with an SD of 0.37 cm3/g. The pore size shows moderate variability (SD of 2.68 nm), and the SD of 265.68 g/g for the oil/adsorbent ratio indicates a notable degree of variability in the experimental conditions. The comparatively high SD of 5.75 h for the adsorption time reflects significant variation in process duration. Furthermore, the moderate SD of 35.86 mg/g for the sulfur adsorption capacity indicates the extent of variation in the sulfur uptake of the MOFs studied.
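The 1.5 × IQR whisker rule can be made concrete with a short NumPy sketch on hypothetical contact times. `np.percentile` with its default linear interpolation is one of several quartile conventions, so plotting libraries may draw slightly different boxes; the point flagged here would only be drawn separately, not dropped, consistent with the decision to keep outliers in the dataset.

```python
import numpy as np

def boxplot_stats(values):
    """Quartiles, 1.5*IQR fences, whisker ends, and flagged outliers."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = v[(v >= lo_fence) & (v <= hi_fence)]
    return {
        "whisker_low": inside.min(), "q1": q1, "median": median, "q3": q3,
        "whisker_high": inside.max(),
        "outliers": v[(v < lo_fence) | (v > hi_fence)],   # flagged, not removed
    }

# Hypothetical contact times in hours, with one unusually long experiment
stats = boxplot_stats([0.5, 1, 2, 2, 3, 4, 4, 6, 24])
print(stats)
```

Here Q1 = 2, the median is 3, and Q3 = 4, so IQR = 2 and the fences sit at −1 and 7; the 24 h run is the only point drawn beyond the upper whisker, which ends at 6 h.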

The Pearson correlation coefficients (r) between the numerical features were also examined; the results are shown in Fig. 3. According to Fig. 3, the contact time is among the features most strongly correlated with the adsorption capacity of the MOFs, with a moderate positive correlation (r = 0.43). This reflects the significant impact of the contact time between the MOFs and the sulfur-containing oil on adsorption performance. The initial sulfur concentration is also positively correlated with the adsorption capacity (r = 0.39), consistent with the well-established increase in uptake at higher initial concentrations. Significant relationships were likewise identified between the adsorption capacity and structural properties of the MOFs such as TPV and BET surface area. As shown in Fig. 3, both the total pore volume and the BET surface area show a moderate, statistically significant positive correlation (r = 0.44) with the adsorption capacity, pointing to the positive influence of these features on adsorptive desulfurization. In contrast, the adsorbate kinetic diameter and the temperature are negatively correlated with the sulfur uptake of the MOFs (r = −0.34 and −0.23, respectively).
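Pearson's r is signed, which is how negative values such as −0.34 arise in Fig. 3; its square r², unlike r, is never negative and so cannot show direction. A minimal NumPy sketch of the definition, checked against `np.corrcoef`, on hypothetical data (`pearson_r` is an illustrative name):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r = cov(x, y) / (std(x) * std(y)); the sign gives the direction."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

# Hypothetical toy data: uptake tends to fall as adsorbate kinetic diameter grows
kd_nm = np.array([0.60, 0.70, 0.80, 0.90, 1.00])
q_mg_g = np.array([40.0, 35.0, 37.0, 25.0, 20.0])
r = pearson_r(kd_nm, q_mg_g)
print(round(r, 3), round(r ** 2, 3))   # r is negative; r**2 is not
```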

The Pearson correlation coefficient matrix shown in Fig. 4 reveals several notable relationships between the metal ions and the sulfur adsorption capacity of the MOF materials. According to this figure, copper (Cu) and zirconium (Zr) are weakly and negatively correlated with the sulfur adsorption capacity (r = −0.016 and −0.14, respectively), indicating that the presence of Cu and Zr in the MOF structure may not significantly improve sulfur adsorption performance. Similarly, iron (Fe) and aluminum (Al) show moderate negative correlations with the sulfur adsorption capacity, suggesting that their presence in the MOF structure may adversely affect sulfur adsorption performance. In contrast, chromium (Cr) shows a moderate positive correlation (r = 0.42), implying that including Cr in the MOF design may help to improve sulfur adsorption. Zinc (Zn), vanadium (V), and cobalt (Co) also show weak positive correlations with the adsorption capacity (r ≈ 0.28, 0.11, and 0.11, respectively). Therefore, the presence of these metal ions in the MOF structure may be beneficial for improving sulfur adsorption73.
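Since the metal type is categorical, a coefficient such as r = 0.42 for Cr presumably comes from a 0/1 indicator column (one-hot encoding) correlated with the capacity, which is the point-biserial correlation. The sketch below assumes that encoding; the records are hypothetical toy values, not entries from the dataset.

```python
import math


def pearson_r(x, y):
    """Pearson r; returns NaN when either column is constant (r undefined)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def one_hot(labels):
    """Encode a categorical column (e.g. the MOF metal) as 0/1 indicator columns."""
    return {c: [1 if lab == c else 0 for lab in labels] for c in sorted(set(labels))}


def metal_correlations(metals, capacity):
    """Point-biserial correlation: Pearson r of each metal indicator with capacity."""
    return {m: pearson_r(ind, capacity) for m, ind in one_hot(metals).items()}


# Hypothetical toy records (metal, capacity in mg/g), not data from the study.
metals = ["Cu", "Cr", "Cr", "Zn"]
capacity = [10.0, 30.0, 32.0, 12.0]
print(metal_correlations(metals, capacity))
```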

Fig. 2

Boxplots summarizing the statistical characteristics of the numerical features

Fig. 3

Pearson correlation coefficient matrix showing the combined effects of process conditions and MOF structural properties on adsorptive desulfurization efficiency

Fig. 4

Pearson correlation coefficient matrix showing the combined effects of metal ion type and MOF structural properties on adsorptive desulfurization performance

Hyperparameter tuning of the models

SVM The hyperparameters considered during the optimization of the SVM were C, gamma, epsilon, and the kernel type. Their optimal values were obtained using the Optuna optimizer with five-fold cross-validation. The optimization was run for 60 trials, and the tuned values of C, gamma, epsilon, and the kernel were approximately 97.17, 0.22, 0.037, and the radial basis function (RBF), respectively.
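The tuning loop described for the SVM (60 trials, each scored by five-fold cross-validation) can be sketched without external dependencies. Two substitutions are made and should be read as assumptions: candidate hyperparameters are drawn by plain random search instead of Optuna's default Tree-structured Parzen Estimator (TPE) sampler, and a toy k-nearest-neighbour regressor (hypothetical `knn_predict`, tuned over `n_neighbors`) stands in for the SVR so that the example stays self-contained. With Optuna itself, the same loop corresponds to an objective function passed to `study.optimize(objective, n_trials=60)`.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle row indices once and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors):
    """Toy stand-in model: mean target of the n_neighbors closest training rows."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = ranked[:n_neighbors]
    return sum(target for _, target in nearest) / len(nearest)


def cv_mse(X, y, n_neighbors, k=5):
    """Mean squared error averaged over k cross-validation folds."""
    fold_errors = []
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        tr_X, tr_y = [X[i] for i in train], [y[i] for i in train]
        sq = [(knn_predict(tr_X, tr_y, X[i], n_neighbors) - y[i]) ** 2 for i in fold]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / k


def random_search(X, y, n_trials=60, seed=1):
    """Draw one candidate per trial and keep the lowest cross-validated error."""
    rng = random.Random(seed)
    best_score, best_params = float("inf"), None
    for _ in range(n_trials):
        params = {"n_neighbors": rng.randint(1, 10)}  # hypothetical search space
        score = cv_mse(X, y, **params)
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params


# Hypothetical 1-D data with a linear trend.
X = [[i / 10] for i in range(50)]
y = [2.0 * row[0] + 1.0 for row in X]
print(random_search(X, y))
```

The same skeleton applies to the tree-based models and the MLP; only the model and the sampled search space change.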

Random Forest To optimize this model, the hyperparameters n_estimators, min_samples_leaf, and min_samples_split were considered. The optimization procedure was run for 60 trials, and the tuned values of n_estimators, min_samples_leaf, and min_samples_split were approximately 240, 1, and 2, respectively.

Extra Trees The hyperparameters of the extra trees model were tuned using the Optuna optimizer with five-fold cross-validation. After 60 trials, the best-performing hyperparameters were 143 for n_estimators, 1 for min_samples_leaf, 2 for min_samples_split, 20 for max_depth, and ‘auto’ for max_features.

XGBoost To optimize this model, the hyperparameters ‘n_estimators’, ‘learning_rate’, ‘colsample_bytree’, ‘gamma’, ‘reg_alpha’, ‘reg_lambda’, ‘subsample’, and ‘max_depth’ were considered. After 60 trials, the best-performing values of these hyperparameters were 1500, 0.0587, 0.66415, 0, 0.3, 0.3, 0.35, and 11, respectively.
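For reference, the tuned values reported above can be collected under the keyword names used by scikit-learn's `SVR`, `RandomForestRegressor`, and `ExtraTreesRegressor` and by xgboost's `XGBRegressor`. The mapping follows the order given in the text and is not a configuration file from the study. One adjustment is an assumption to note: `max_features='auto'` was deprecated in scikit-learn 1.1 and removed in 1.3; for regressors it meant using all features, which is written here as `max_features=1.0`.

```python
# Tuned hyperparameters as reported in the text, keyed by the keyword names of
# sklearn.svm.SVR, sklearn.ensemble.RandomForestRegressor / ExtraTreesRegressor
# and xgboost.XGBRegressor. Settings not reported are left at library defaults.
TUNED_PARAMS = {
    "SVR": {"kernel": "rbf", "C": 97.17, "gamma": 0.22, "epsilon": 0.037},
    "RandomForestRegressor": {
        "n_estimators": 240,
        "min_samples_leaf": 1,
        "min_samples_split": 2,
    },
    "ExtraTreesRegressor": {
        "n_estimators": 143,
        "min_samples_leaf": 1,
        "min_samples_split": 2,
        "max_depth": 20,
        # Reported as 'auto'; for regressors that meant all features, and the
        # 'auto' option was removed in scikit-learn 1.3, so 1.0 is used here.
        "max_features": 1.0,
    },
    "XGBRegressor": {
        "n_estimators": 1500,
        "learning_rate": 0.0587,
        "colsample_bytree": 0.66415,
        "gamma": 0,
        "reg_alpha": 0.3,
        "reg_lambda": 0.3,
        "subsample": 0.35,
        "max_depth": 11,
    },
}

# Usage (requires scikit-learn / xgboost, not imported here):
#   SVR(**TUNED_PARAMS["SVR"]), XGBRegressor(**TUNED_PARAMS["XGBRegressor"])
```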

MLP network After 60 trials of the optimization procedure, an MLP network with three hidden layers was identified as the optimal structure, with 394, 140, and 170 neurons in the first, second, and third hidden layers, respectively. The activation functions that yielded the highest performance in these layers were ReLU, ReLU, and tanh, respectively. The optimal values of the remaining hyperparameters, namely the optimization algorithm, learning rate, and dropout rate, were Adam, 0.0021, and 0.15, respectively. The structure of the optimized MLP model is shown in Fig. 5.
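The optimized architecture can be written out as a plain forward pass, which also makes the model size explicit. This is a sketch under stated assumptions: the number of inputs is taken as 19 (one per feature named in the SHAP discussion; hypothetical), the output layer is assumed to be a single linear neuron for regression, the weights are random rather than trained, and the Adam training loop (learning rate 0.0021) is omitted. Dropout (rate 0.15) adds no parameters and is inactive at prediction time, so it does not appear in the forward pass.

```python
import math
import random

# Hidden layers of the optimized MLP as reported in the text: (units, activation).
HIDDEN = [(394, "relu"), (140, "relu"), (170, "tanh")]

ACTIVATIONS = {
    "relu": lambda z: max(0.0, z),
    "tanh": math.tanh,
    "linear": lambda z: z,
}


def count_parameters(n_inputs, hidden=HIDDEN, n_outputs=1):
    """Weights plus biases of the dense stack (dropout adds no parameters)."""
    sizes = [n_inputs] + [units for units, _ in hidden] + [n_outputs]
    return sum((a + 1) * b for a, b in zip(sizes, sizes[1:]))


def init_layers(n_inputs, hidden=HIDDEN, seed=0):
    """Random, untrained weights; output layer assumed to be one linear neuron."""
    rng = random.Random(seed)
    layers, fan_in = [], n_inputs
    for units, act in hidden + [(1, "linear")]:
        scale = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)] for _ in range(units)]
        layers.append((weights, [0.0] * units, act))
        fan_in = units
    return layers


def forward(layers, x):
    """Inference pass: dense layer, then activation, layer by layer (no dropout)."""
    h = x
    for weights, biases, act in layers:
        f = ACTIVATIONS[act]
        h = [f(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(weights, biases)]
    return h[0]


N_INPUTS = 19  # hypothetical: one input per feature listed in the SHAP discussion
print(count_parameters(N_INPUTS))  # 87321
print(forward(init_layers(N_INPUTS), [0.5] * N_INPUTS))
```

With 19 inputs, the count is (19 + 1)·394 + (394 + 1)·140 + (140 + 1)·170 + (170 + 1)·1 = 87,321 trainable parameters.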

Fig. 5

Structure of the optimized MLP

Comparison of model performance

To assess the accuracy of the developed models in predicting the adsorptive desulfurization capacity of the various MOF samples, the coefficient of determination (R²), mean squared error (MSE), and mean relative error (MRE) were calculated. The performance of the models on the training and test datasets is summarized in Table 4, which indicates that the MLP is the most effective ML model for capturing the relationships between the variables in the dataset. The MLP model shows the smallest difference in MSE between the test and training sets, and it also achieves the best MRE on the test data (15.26%). With its finely tuned hyperparameters, this model exhibited excellent generalization. This result is expected: a correlation coefficient close to 1 or −1 indicates a strong linear relationship between two variables, whereas a value close to 0 indicates a weak linear relationship, which is consistent with a nonlinear (or absent) dependence. The feature–target correlations in Figs. 3 and 4 are mostly weak to moderate, suggesting that these relationships are predominantly nonlinear; all the features exhibit a nonlinear or semi-linear relationship with the sulfur adsorption capacity of the MOFs60. When its hyperparameters are well tuned, a neural network can learn such relationships very effectively.
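The three criteria can be computed as follows. The sketch assumes the usual definitions: R² = 1 − SS_res/SS_tot, MSE as the mean squared residual, and MRE as the mean absolute relative deviation expressed in percent (an assumption about the exact MRE formula used for Table 4).

```python
def regression_metrics(y_true, y_pred):
    """R², MSE and MRE (%) for paired experimental and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    # MRE divides by each experimental value, so none of them may be zero.
    mre = 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n
    return {"R2": 1.0 - ss_res / ss_tot, "MSE": ss_res / n, "MRE_%": mre}


# Hypothetical capacities in mg/g, not values from Table 4.
print(regression_metrics([10.0, 20.0, 40.0], [12.0, 18.0, 40.0]))
```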

To further analyze the models’ performance in forecasting the sulfur uptake capacity of various MOFs, five studies were selected at random, and their operating conditions and MOF features were used as inputs to the models. The experimental sulfur uptake capacities and the model predictions are presented in Table 5. Among the developed models, the MLP model predicts the sulfur uptake capacity of the MOF samples most accurately in most of the cases listed in Table 5, in good agreement with the results in Table 4. Figure 6 plots the predicted values for the test data against the experimental values to assess the models’ predictive and generalization performance. The closer the points lie to the 45° line, the smaller the discrepancy between the predicted and experimental values and the higher the prediction accuracy. The MLP neural network achieves the highest R², indicating superior generalization ability. In contrast, the RandomForest model shows the largest scatter and the weakest performance on the test data, with a high average relative error of 45.77% and the lowest R² (0.95) among the models. This large error suggests overfitting to the training data, which limits the model’s ability to generalize beyond the training set.

Table 4 Performance of the models on the training data and the test data
Fig. 6

Predicted versus experimental adsorption capacity for the test data using the models: a RandomForest, b SVR, c XGBoost, d Extra Trees, e MLP

Detailed analysis of the MLP model as the best performer

To interpret the behavior of the best-performing model, a detailed analysis of the MLP artificial neural network was conducted, focusing on which features exert the most significant influence on its predictions of the MOFs’ sulfur adsorption capacity. Unlike tree-based models, the MLP model implemented in Keras does not provide a built-in feature importance measure. To overcome this limitation, the SHapley Additive exPlanations (SHAP) method75 was used. SHAP explains the model’s output by measuring the contribution of each feature to each individual prediction, quantifying both its direction (positive or negative) and its magnitude. As shown in Fig. 7, the SHAP analysis indicates that the initial sulfur content is the most influential parameter, and contact time also plays a vital role in the efficiency of the sulfur adsorption process. Figure 8 compares feature relevance as the share of each feature group in the total importance. In this figure, the feature importances derived from the MLP model are classified into three main groups: adsorbent properties, adsorption conditions, and adsorbate molecule characteristics. According to Fig. 8, adsorption conditions emerged as the most important group, mirroring the findings of a study on zeolites73. The MLP model places the greatest weight on adsorption conditions (around 51%), particularly the initial sulfur concentration, contact time, and oil/adsorbent ratio. The next most important group is the MOF’s structural characteristics (about 36%), especially the metal ion type. Therefore, the desulfurization efficiency of MOFs with different metal clusters is strongly affected by the metal ion type.
This may be related to the different characteristics of the metal ions, such as ionic charge and electronegativity, which directly influence the physicochemical properties of the MOF samples. Based on the MLP modeling outcomes, the molecular properties of the adsorbates, namely BT, DBT, and DMDBT, are the least important group (about 13%) in the adsorptive desulfurization process. A more complete rationale for the role of the most critical factors, derived from the SHAP analysis and the MLP model, is provided in the next section.
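The quantity SHAP estimates is the Shapley value: each feature's average marginal contribution to the prediction over all orders in which features can be added. For a handful of features it can be computed exactly, as in the sketch below, by replacing "absent" features with a baseline vector (a single reference point here; SHAP explainers average over a background dataset and use approximations such as `shap.KernelExplainer` for larger models). The second function shows one way group shares like those in Fig. 8 could be obtained, by summing mean |SHAP| values per feature group and normalizing to 100%; this aggregation rule and all names are assumptions.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at point x.

    A feature outside the coalition takes its baseline value. All 2**(n-1)
    coalitions per feature are enumerated, so this is only practical for small n.
    """
    n = len(x)

    def value(coalition):
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def group_shares(mean_abs_shap, groups):
    """Share (%) of total mean |SHAP| held by each feature group."""
    totals = {}
    for feature, importance in mean_abs_shap.items():
        totals[groups[feature]] = totals.get(groups[feature], 0.0) + importance
    grand_total = sum(totals.values())
    return {g: 100.0 * v / grand_total for g, v in totals.items()}


# Hypothetical linear model: phi_i = w_i * (x_i - baseline_i), and the values
# add up to f(x) - f(baseline), the additivity property SHAP relies on.
def model(z):
    return 2.0 * z[0] - 3.0 * z[1] + 0.5 * z[2] + 1.0


print(exact_shapley(model, [1.0, 2.0, 4.0], [0.0, 0.0, 0.0]))
```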

Table 5 Experimental and model-predicted sulfur uptake capacities of the various MOFs
Fig. 7

The importance of the features based on the MLP model

Fig. 8

Overall relative importance of the feature groups in the MLP model

Dominant factors affecting adsorption capacity

Initial concentration of the thiophenic compounds

According to the SHAP analysis, the sulfur uptake of the MOF materials depends strongly on the initial concentration of the sulfur-containing pollutants. Figures 9(a) and (b) show the dependence of the DMDBT, DBT, and BT adsorption capacities on the initial sulfur concentration for a Zn-based MOF and an Al-based MOF, respectively. These curves were generated with the MLP model by holding the BET surface area, TPV, pore size, oil/adsorbent ratio, temperature, MWs, and contact time constant at 1588 m²/g, 0.83 cm³/g, 1.05 nm, 70 g/g, 30 °C, 114.23 g/mol, and 2 h, respectively. According to Fig. 9, a higher concentration of thiophenic compounds increases the adsorption capacity of the MOF samples. This increase is attributed to the greater concentration driving force, which helps overcome the mass-transfer resistance to the diffusion of thiophenic compounds into the MOF structure76.

Contact time

Based on the SHAP analysis, contact time (adsorption time) is also identified as one of the most influential factors, with longer contact times leading to higher predicted adsorption capacities. Figures 9(c) and (d) show the dependence of the DMDBT, DBT, and BT uptake capacities on the adsorption time for the Zn-based MOF and the Al-based MOF, respectively. These curves were generated with the MLP model by holding the BET surface area, TPV, pore size, oil/adsorbent ratio, temperature, MWs, and initial concentration constant at 833 m²/g, 1.20 cm³/g, 5.78 nm, 100 g/g, 25 °C, 114.23 g/mol, and 700 ppm, respectively. According to Fig. 9(c) and (d), the highest sulfur uptake is reached at longer contact times. This enhancement is attributed to the greater opportunity for the thiophenic compounds to diffuse into the MOF pores and reach less accessible active sites within the adsorbent framework76.
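The curves in Fig. 9 are one-feature sweeps of the trained model around a fixed reference point. The sketch below shows that procedure with the reference settings reported for the concentration and contact-time curves; the feature keys, the omission of the metal and adsorbate descriptors, and `toy_model` (a saturating function standing in for the trained MLP) are hypothetical.

```python
import math


def sweep_feature(model, reference, feature, values):
    """Predict capacity while varying one feature, others held at `reference`."""
    curve = []
    for v in values:
        point = dict(reference)  # copy so the reference point is not modified
        point[feature] = v
        curve.append((v, model(point)))
    return curve


# Constant settings reported in the text (feature keys are hypothetical names;
# metal and adsorbate descriptors, also model inputs, are omitted here).
REF_C0_SWEEP = {"BET": 1588.0, "TPV": 0.83, "pore_nm": 1.05, "oil_ads": 70.0,
                "temp_C": 30.0, "MWs": 114.23, "time_h": 2.0, "C0_ppm": None}
REF_TIME_SWEEP = {"BET": 833.0, "TPV": 1.20, "pore_nm": 5.78, "oil_ads": 100.0,
                  "temp_C": 25.0, "MWs": 114.23, "C0_ppm": 700.0, "time_h": None}


def toy_model(p):
    """Hypothetical saturating stand-in for the trained MLP (mg/g), not the paper's model."""
    return 0.05 * p["C0_ppm"] * (1.0 - math.exp(-p["time_h"]))


c0_curve = sweep_feature(toy_model, REF_C0_SWEEP, "C0_ppm", [100, 300, 500, 700, 900])
time_curve = sweep_feature(toy_model, REF_TIME_SWEEP, "time_h", [0.5, 1, 2, 4, 8])
print(c0_curve)
print(time_curve)
```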

Oil/adsorbent ratio

One of the significant findings of the SHAP analysis is the importance of the oil/adsorbent ratio (g/g) in predicting adsorption capacity. In this model, a larger oil/adsorbent ratio typically increases the predicted adsorption capacity. However, the SHAP values for this feature vary widely, so the actual effect of the oil/adsorbent ratio on adsorption capacity may depend on complex interactions with other variables, such as the adsorbent’s physical properties, the type of oil, and the presence of competing species. This variability shows that the ratio should be adjusted with care, since the balance that maximizes adsorption capacity may differ with the specific conditions of each system.
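Part of this positive trend has a simple mass-balance explanation. For a batch experiment, the capacity is q = (C₀ − C_e)·(m_oil/m_ads), so at the same concentration drop a larger oil/adsorbent ratio gives a proportionally larger q. The sketch assumes concentrations in ppmw (mg S per kg oil); in practice C_e also rises as the ratio grows, so q does not scale proportionally, which is consistent with the spread of SHAP values described above.

```python
def adsorption_capacity(c0_ppmw, ce_ppmw, oil_to_adsorbent):
    """Batch mass balance q = (C0 - Ce) * m_oil / m_ads, in mg S per g adsorbent.

    Assumes concentrations in ppmw (mg S per kg oil) and the ratio in g oil per
    g adsorbent; dividing by 1000 converts the oil mass from g to kg.
    """
    return (c0_ppmw - ce_ppmw) * oil_to_adsorbent / 1000.0


# Same concentration drop (700 -> 500 ppmw) at two hypothetical ratios.
print(adsorption_capacity(700, 500, 50), adsorption_capacity(700, 500, 100))  # 10.0 20.0
```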

Metal ions type

The metal-ion clusters in MOFs can play a crucial role in the adsorption of thiophenic compounds, which may be removed from oils via π-complexation or via direct sulfur–metal interaction. According to Pearson’s hard–soft acid–base principle, sulfur-containing compounds tend to behave as intermediate to soft bases. Accordingly, soft sulfur-containing compounds preferentially interact with intermediate or soft Lewis acid sites, such as Cu²⁺ and Zn²⁺, whereas their interaction with hard Lewis acid sites, such as Cr³⁺, Al³⁺, and Fe³⁺, is relatively weak77. Some metals may therefore facilitate the adsorption process, highlighting the potential of metal ions to act as active sites for chemisorption.

Adsorbate molecule features

The physicochemical properties of thiophenic compounds, such as the dipole moment, are key factors affecting the tendency of the adsorbate molecule to adsorb on the surface of a solid adsorbent via physisorption. For studies conducted at various temperatures, the adsorption of thiophenic compounds has been attributed to a dual mechanism of physisorption and chemisorption. If both mechanisms operate and chemisorption predominates, the uptake capacity for BT is expected to be lower than that for DBT. Indeed, the additional aromatic rings in BT and DBT may increase the π-electron count, thereby increasing the likelihood of π-complexation with exposed metal sites. Furthermore, the electron density on the sulfur atom increases in the order BT (5.739) < DBT (5.758) < DMDBT (5.76), which strengthens the interaction between the metal centers and the sulfur atoms77,78. These findings are reflected in Fig. 9, which shows that the adsorption of the thiophenic compounds under the different conditions follows the order DMDBT > DBT > BT.

Adsorbent porous characteristics

The Brunauer–Emmett–Teller (BET) surface area, pore size, and total pore volume (TPV) contribute to varying degrees, which underlines the role of adsorbent properties in adsorption. However, the SHAP values for these features are centered near zero (Fig. 10), indicating that their influence is strongly context dependent and may be moderated by other factors.

Fig. 9

Effect of sulfur initial concentration on the DMDBT, DBT, and BT uptake capacity of a Zn-based MOF, b Al-based MOF. Effect of contact time on the sulfur uptake capability of the c Zn-based MOF, and d Al-based MOF

Implications for adsorption process optimization

The insight provided by the SHAP analysis is valuable for optimizing the adsorption process: knowing which features have the most impact, the adsorption system can be adjusted to increase capacity. For example, adsorption efficiency can be increased by optimizing the contact time and controlling the oil/adsorbent ratio. The SHAP summary plot provides a fine-grained view of each feature’s contribution to the MLP model predictions. The feature “Time (hr)” shows a wide range of SHAP values, implying that, depending on interactions with other features, contact time can sharply increase the predicted adsorption capacity. Similarly, “Oil/Adsorbent (g/g)” and “Cu” show wide SHAP value distributions, indicating their critical roles in the adsorption process. These features likely have nonlinear effects, as indicated by the spread of their SHAP values on both the positive and negative sides of the x-axis.

In contrast, features lower in the list, such as “Zn,” “Zr,” and “V,” have minimal influence, as indicated by SHAP values concentrated around zero; variations in these features have little impact on the predicted adsorption capacity. Features such as “Temp (°C)” and “MWs” have some impact, although less than the top features. The model’s dependence on “Time,” “Oil/Adsorbent,” and “Cu” suggests that these play a primary role in adsorption, so optimizing them may be more effective for enhancing adsorption capacity than focusing on less influential features.

Oil/Adsorbent (g/g): This feature has a broad range, with high feature values (red dots) mostly giving positive SHAP values, implying that a higher oil-to-adsorbent ratio generally increases the predicted adsorption. Blue dots (lower values) are concentrated around zero, indicating minimal impact at lower ratios.

Cu (Copper): Copper shows a notable positive shift when present (red dots, high feature value) and a neutral to negative impact when absent (blue dots). This suggests that Cu-containing MOFs are associated with higher predicted adsorption capacity.

C0 (ppm): The SHAP values for initial concentration (C0) are mostly positive, with red dots clustered towards the right. Higher pollutant concentrations enhance adsorption, likely by creating a stronger concentration gradient that drives adsorption.

Al (Aluminum): Aluminum shows both positive and negative SHAP values with a moderate spread, indicating a nuanced impact where both high and low values can variably affect adsorption capacity.

DM (Dipole moment): The dots for DM span a smaller range with a slight positive skew, suggesting that a higher dipole moment contributes positively to adsorption, but with limited overall impact.

BET (m²/g): BET surface area has a narrower range of mostly positive SHAP values. Red dots are on the positive side, implying that larger surface areas enhance adsorption by offering more binding sites.

MWs (Solvent’s molecular weight): The spread is relatively narrow and slightly negative for higher molecular weights (red dots), suggesting that larger solvent molecules might reduce adsorption efficiency, possibly through limited diffusion.

Cr (Chromium): Chromium shows mixed SHAP values, with a small spread around zero, indicating a less consistent impact, with some cases showing a slight positive effect and others negative.

Pore Size (nm): The SHAP values for pore size are mostly close to zero, showing a minimal effect, although larger pore sizes (red dots) sometimes slightly increase adsorption.

KD (Kinetic diameter): The adsorbate’s kinetic diameter shows a positive skew, with higher values (red dots) slightly improving adsorption. This may reflect the stronger interaction of the larger thiophenic molecules (DBT and DMDBT) with the adsorbent rather than a direct effect of molecular size.

TPV (cm³/g): Total pore volume has a narrow range around zero, with slightly positive values for higher volumes, suggesting only a marginal improvement in adsorption with more pore space.

Temp (°C): Temperature’s impact is minimal, with dots close to zero. Both high (red) and low (blue) values have limited influence, implying temperature has little effect within the tested range.

Zn (Zinc): Zinc’s dots are tightly centered around zero, showing that the presence or absence of Zn has no significant effect on the predicted adsorption capacity.

MWa (Molar mass of adsorbate): As with the solvent molecular weight (MWs), this feature’s SHAP values lie close to zero, suggesting little impact on adsorption regardless of its value.

Co (Cobalt): Cobalt’s SHAP values are tightly clustered around zero, indicating minimal influence whether or not Co is present.

Zr (Zirconium): Zirconium shows almost no variation, with SHAP values close to zero, meaning that its presence does not affect the predicted adsorption capacity in this model.

Fe (Iron): Iron also has SHAP values centered around zero, showing little impact on adsorption whether or not it is present.

V (Vanadium): Vanadium has a negligible effect, with dots concentrated near zero, indicating no significant impact on adsorption capacity.

Fig. 10

SHAP analysis of the MLP model

Genetic algorithm optimization outcomes

To achieve the highest sulfur uptake capacity in the adsorptive desulfurization process, the GA optimization technique was applied using the trained MLP model as the objective function, within the feature ranges reported in Tables 2 and 3. The GA results indicated that the optimal conditions for desulfurization involve zirconium (Zr) as the metal in the metal–organic framework (MOF), a BET surface area of 756 m²/g, a total pore volume of 0.955 cm³/g, and an average pore size of 5.96 nm, with an oil/adsorbent ratio of 449.85 g/g at a temperature of 20.1 °C, giving a maximum adsorption capacity of 161.6 mg/g. The optimal values of the adsorbate features indicated that the highest sulfur uptake capacity is achievable for DBT adsorption.
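The GA settings (population size, operators, number of generations) are not reported here, so the sketch below uses common choices as assumptions: binary tournament selection, blend crossover, Gaussian mutation, elitism, and clipping of each gene to its (low, high) bounds, mirroring the restriction of the search to the feature ranges in Tables 2 and 3. A toy quadratic stands in for the trained MLP objective; a categorical input such as the metal type would need separate handling, for example by running the search once per metal.

```python
import random


def genetic_maximize(objective, bounds, pop_size=40, generations=80,
                     mutation_scale=0.05, elite=2, seed=0):
    """Maximize objective over a box with a simple real-coded genetic algorithm.

    bounds: list of (low, high) per variable. Operators: binary tournament
    selection, blend (arithmetic) crossover, Gaussian mutation, elitism.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]

    def tournament(scored):
        (fa, a), (fb, b) = rng.sample(scored, 2)
        return a if fa >= fb else b

    for _ in range(generations):
        scored = sorted(((objective(ind), ind) for ind in pop),
                        key=lambda t: t[0], reverse=True)
        next_pop = [ind for _, ind in scored[:elite]]  # carry the best over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = []
            for (lo, hi), g1, g2 in zip(bounds, p1, p2):
                alpha = rng.random()
                gene = alpha * g1 + (1.0 - alpha) * g2               # blend crossover
                gene += rng.gauss(0.0, mutation_scale * (hi - lo))   # Gaussian mutation
                child.append(min(hi, max(lo, gene)))                 # stay within bounds
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=objective)
    return best, objective(best)


# Hypothetical objective standing in for the MLP: peak value 0 at (3, -1).
def toy_objective(v):
    return -((v[0] - 3.0) ** 2) - (v[1] + 1.0) ** 2


print(genetic_maximize(toy_objective, [(-10.0, 10.0), (-10.0, 10.0)]))
```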

Conclusion

This work comprehensively analyzed different ML models for predicting the sulfur adsorption capacity of MOF materials. In terms of performance metrics, the MLP model performed best, with a low MSE of 0.0032 on the test set versus 0.0045 for RandomForest, the worst model; on the training set, the MLP’s MSE was 0.0021. Crucially, the MLP achieved a superior MRE of 15.26% on the test set compared with 17.83% for RandomForest, indicating better generalization. For the MLP model, SHAP analysis revealed the “initial concentration of sulfur compound” (0.51) and “contact time” (0.37) as the most significant features. Feature classification showed adsorption conditions to be the most influential group for both models. However, the MLP placed more emphasis than the RandomForest model on adsorption conditions such as contact time, initial sulfur content, and oil/adsorbent ratio, and it also captured the effects of the various metal ions. Conversely, the RandomForest model prioritized total pore volume while giving less weight to the metal ions. Overall, the MLP model’s low MSE on the training and test sets, together with its low MRE of 15.26% on the test set, demonstrates excellent predictive performance and generalization ability with finely tuned hyperparameters. The SHAP and feature importance analyses provide valuable insight into the key drivers of the model predictions, facilitating further optimization of sulfur adsorption capacity forecasting for MOF applications. Considering the feature importances and their corresponding constraints, the GA optimization revealed the highest DBT uptake capacity of 161.6 mg/g for a Zr-based MOF with a BET surface area of 756 m²/g, TPV of 0.955 cm³/g, pore size of 5.96 nm, oil/adsorbent ratio of 449.85 g/g, and temperature of 20.1 °C.