Introduction

In recent years, the rapid growth of the global economy has increased emissions of sulfur oxides (SOx) from the combustion of sulfur-containing compounds in fuel oil. These emissions contribute significantly to air pollution, causing environmental problems such as acid rain and forest damage. Sulfur also corrodes process equipment, such as pumps, pipelines, and refinery units, posing significant challenges for fuel processing and reforming systems1,2. Current regulations limit sulfur content to 10 ppmw in gasoline and 15 ppmw in diesel3. Consequently, ultra-deep desulfurization of liquid fuels has become a global research priority, with numerous studies dedicated to reducing fuel sulfur levels4. Sulfur occurs in fuel in various forms, such as sulfides and disulfides, as well as organosulfur compounds including mercaptans, thiophene (Th), benzothiophene (BT), benzonaphthothiophene (BNT), dibenzothiophene (DBT), and 4,6-dimethyl dibenzothiophene (4,6-DMDBT)5. Several desulfurization techniques have been proposed, such as hydrodesulfurization (HDS)6, extractive desulfurization (EDS)7, biodesulfurization (BDS)8, oxidative desulfurization (ODS)9, and adsorptive desulfurization (ADS)10. Because it removes mercaptans and sulfides effectively, HDS is currently the most widely adopted technology for lowering sulfur levels in fuels at refineries worldwide11. However, despite its high hydrogen consumption and its operation at high temperature and pressure, HDS has a limited effect on refractory sulfur compounds, including thiophene, benzothiophene, and their derivatives, which account for most of the sulfur compounds in fuel12,13. Moreover, HDS produces hydrogen sulfide, another sulfur-containing compound that must be separated14.

Among these methods, ADS is a well-known technique10, particularly attractive for reaching ultralow sulfur levels because of its mild process conditions, low operating cost, environmental friendliness, and excellent preservation of fuel quality15. Figure 1 is a simplified illustration of the ADS process. The system includes a feed tank, air compressor, ADS reactor, and condenser. Sulfur-containing fuel flows through a reactor packed with a porous zeolite adsorbent, where sulfur impurities are removed. The magnified section shows the internal structure of the zeolite, with particular focus on its porosity and surface activity, both of which are critical for adsorption efficiency. The effectiveness and cost-efficiency of ADS in removing sulfur from oil depend largely on choosing the right adsorbent, as it determines the overall efficiency and flexibility of the process16. Porous materials, including zeolite-based materials17, activated carbons (AC)18, aluminas19, mesoporous silicates20, metal–organic frameworks (MOFs)21, and metal oxides22, are used for adsorptive desulfurization. Zeolites are particularly suitable adsorbents because of their physical and chemical properties, such as high adsorption capacity, selectivity, specific surface area, and regenerability22,23. Faujasite (FAU) zeolites with different Si/Al ratios, such as NaX and NaY, are widely studied for adsorptive desulfurization because of their high porosity and surface area. Their ion-exchange capability and tunable structures give zeolites great flexibility in adsorption capacity and selectivity24. The schematic structure of the FAU zeolite with metal ion exchange is shown in Fig. 2.

Fig. 1

Schematic of ADS process25.

Fig. 2

Metal-containing FAU-type zeolite.

Numerous investigations have explored using zeolites for desulfurizing liquid hydrocarbon fuels13,14,15. The zeolite’s porous structure, characterized by microcavities, constrains the adsorption of sulfur compounds26. Creating mesoporous zeolites through structural modification processes, including desilication and dealumination, mitigates diffusion limitations, allowing refractory sulfur compounds to access internal adsorption sites without compromising the zeolite’s structure27. Dealumination refers to removing aluminum species from the framework, typically by steaming, acid treatment, or chemical treatment28. Numerous attributes have been cited as crucial for the adsorption efficiency of zeolites, but there is no clear consensus on which are most impactful. This disparity likely arises because operating conditions and zeolite properties differ between studies, so the factor that dominates the adsorption rate varies accordingly29. According to Akopyan et al., the Si/Al ratio affects the catalytic performance of Fe-ZSM-12: activity increased with the Si/Al ratio, which they linked to a decrease in weak acid sites and a minor increase in strong acid sites following iron deposition30.

On the other hand, Mahmoudi and Falamaki reported an increase in activity when the Si/Al ratio was lower31. Various metal ions, such as copper, nickel, and cerium, are applied as single metals or in bimetallic combinations to modify adsorbents through ion exchange or impregnation and thereby increase adsorption capacity15,16,32. Adsorption performance also depends on the sulfur compound being removed. According to Zhou et al., BT was the most selectively adsorbed compound, followed by Th and DBT33. Akopyan et al. found that organosulfur compounds differ in activity: DBT is the most active, followed by 4,6-DMDBT, and BT is the least active. They concluded that steric hindrance and the electron density around the sulfur atom are the main factors behind this trend, consistent with earlier research30. The link between zeolite properties and adsorption capacity during desulfurization is well documented. However, the relative importance and interplay of these properties in adsorption processes are poorly understood and have not been specifically studied. Identifying the key process parameters is crucial for enhancing the ADS process. Although experiments can determine the effects of zeolite properties and process conditions on adsorption capacity, they are often complex and resource-intensive, which makes them impractical for many researchers. Consequently, many questions remain about how zeolite properties, such as Brunauer–Emmett–Teller (BET) surface area and pore volume, affect the capacity to adsorb sulfur. Response Surface Methodology (RSM) has attracted significant interest because it handles several variables effectively with limited data and excels at identifying interactions between independent variables34.

As a statistical approach, RSM allows effective empirical models to be built. The choice of response surface design depends on the experimental aims and conditions35. Although RSM has limitations in handling nonlinear behavior in multicomponent systems, more advanced techniques can address these challenges. Artificial neural networks (ANNs) can model complex, nonlinear problems36. Research in this field has been limited, with few studies to date. A recent study by Mguni et al. investigated ADS using zeolite-based adsorbents, noting the difficulty of screening zeolites and the lack of consensus on key parameters. They applied machine learning techniques, specifically multiple linear regression (MLR) and random forest (RF) regression, to analyze ADS processes. The RF model showed better predictive performance (R² = 0.9300) than the MLR model (R² = 0.8800)29. Despite growing interest in using ANN and RSM to simulate adsorption processes, relatively little effort has gone into applying them to ADS on zeolites, particularly with a combined focus on fuel properties, operating conditions, and adsorbent structure. In this paper, we set up a modeling framework that includes operational parameters (reaction time and temperature), a fuel-specific descriptor (sulfur compound molecular weight), and zeolite structural parameters (micropore volume and BET surface area) to predict sulfur adsorption capacity. In addition to comparing the performance of the optimized multilayer perceptron (MLP) and radial basis function (RBF) neural networks, we enhanced model interpretability through 3D surface plots and global sensitivity analysis (GSA), which were used to investigate variable interactions and identify the factors that most affect sulfur uptake. The approach provides improved insight into adsorption behavior and allows for more targeted experimental design. 
An uncertainty analysis was also performed on the optimized MLP model using a Monte Carlo-based approach, calculating mean predictions and 95% confidence intervals to further validate the model’s reliability. In this work, a neural network model for predicting sulfur adsorption capacity on modified zeolites is developed, and its results are compared with those of the RSM model. The accuracy of both models is assessed statistically using the mean square error (MSE) and the coefficient of determination (R²). This approach can potentially reduce the need for extensive experimental screening, saving time and resources while offering valuable insight into the mechanisms of adsorptive desulfurization.

Materials and methods

Data collection

The dataset of 317 data points was taken from earlier experimental studies and used to train and test the models. To reduce inconsistency, only studies conducted in batch mode at atmospheric pressure were included, and the data were restricted to similar ranges of temperature and time and to similar model fuel types. Physical properties of the zeolites, namely micropore volume and BET surface area, were also used as model inputs so that different zeolites could be compared consistently. Table 1 summarizes the collected data: the types of zeolites, their structural characteristics and pore properties, the operating conditions (e.g., temperature and time), and the specific sulfur compounds used in the model fuels.

Table 1 Summary of experimental data collected from the literature for various zeolites used in ADS.

Statistical properties of data

Data normalization was performed to ensure precise neural network outcomes. All data were scaled to a range of −1 to +1 using Eq. (1).

$$x_{{{\text{norm}}}} = \frac{{2X - X_{{{\text{Max}}}} - X_{{{\text{Min}}}} }}{{X_{{{\text{Max}}}} - X_{{{\text{Min}}}} }}$$
(1)

Here, \(x_{\text{norm}}\) is the normalized value, \(X\) is the input variable, and \(X_{\text{Max}}\) and \(X_{\text{Min}}\) are the maximum and minimum values of that variable in the data, respectively. Minimizing the network’s prediction error at each iteration is essential for obtaining good parameter values during training. The criteria used for this purpose are the MSE, R², and the average absolute relative deviation (AARD%), given by the following equations45,46:

$$MSE = \frac{1}{n}\sum_{i = 1}^{n} \left( {Y_{{{\text{predicted}}}} - Y_{{{\text{actual}}}} } \right)^{2}$$
(2)
$$R^{2} = 1 - \frac{{\sum_{i = 1}^{n} \left( {Y_{{{\text{predicted}}}} - Y_{{{\text{actual}}}} } \right)^{2} }}{{\sum_{i = 1}^{n} \left( {Y_{{{\text{actual}}}} - Y_{{{\text{mean}}}} } \right)^{2} }}$$
(3)
$${\text{AARD}}\left( {\text{\% }} \right) = \frac{100}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\frac{{Y_{{{\text{predicted}}}} - Y_{{{\text{actual}}}} }}{{Y_{{{\text{actual}}}} }}} \right|$$
(4)

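Here, \(Y_{\text{actual}}\) is the experimental value, \(Y_{\text{predicted}}\) is the value predicted by the model, \(Y_{\text{mean}}\) is the mean of the experimental values, and \(n\) is the number of data points.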
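As an illustration, the scaling of Eq. (1) and the three error criteria can be sketched in plain Python. This is a minimal sketch, not the code used in the study: the function names are hypothetical, and R² and AARD follow their conventional definitions (one minus the ratio of residual to total sum of squares, and deviation relative to the experimental value).

```python
def normalize(values):
    """Scale a list of numbers to [-1, 1] as in Eq. (1)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot scale a constant column")
    return [(2 * x - x_max - x_min) / (x_max - x_min) for x in values]


def mse(actual, predicted):
    """Mean square error, Eq. (2)."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (conventional form)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def aard_percent(actual, predicted):
    """Average absolute relative deviation in percent (relative to actual values)."""
    return 100 / len(actual) * sum(
        abs((p - a) / a) for a, p in zip(actual, predicted)
    )
```

For example, `normalize([20.0, 40.0, 60.0])` maps the smallest value to −1, the midpoint to 0, and the largest value to +1.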

Response surface methodology (RSM)

RSM is a widely used statistical method for identifying the relationships between process variables and a desired response, such as sulfur removal efficiency. It models a complex system with a minimum number of experiments by combining experimental design with regression modeling47. Its wide use owes much to the work of Box and Behnken, whose design is known for its effectiveness and economy and for its ability to test both the individual and the combined effects of factors47. The RSM process has three main stages: designing reliable experiments, building accurate mathematical models, and finding the conditions that maximize or minimize the response. RSM plays a crucial role in process improvement by supporting product quality, reducing costs, and facilitating a thorough understanding of process interactions through its visual and multivariable analysis capabilities34,48. The quadratic polynomial is the most frequently used model for fitting experimental data34. After reviewing existing models, this study adopted the quadratic polynomial model shown in Eq. (5)49.

$$Y = \beta_{0} + \mathop \sum \limits_{i = 1}^{k} \beta_{i} X_{i} + \mathop \sum \limits_{i = 1}^{k} \beta_{ii} X_{i}^{2} + \mathop \sum \limits_{i = 1}^{k} \mathop \sum \limits_{j = i + 1}^{k} \beta_{ij} X_{i} X_{j} + \varepsilon$$
(5)

where \(Y\) is the predicted response (adsorption capacity), \({\beta }_{0}\) is the intercept or constant term, \({\beta }_{i}\), \({\beta }_{ii}\), and \({\beta }_{ij}\) are the coefficients of the linear, quadratic, and interaction terms, respectively, \({X}_{i}\) and \({X}_{j}\) are the input parameters, \(k\) is the number of input parameters, and \(\varepsilon\) is the residual error. Table 2 lists the minimum, maximum, and average values of the independent variables.
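To make the structure of Eq. (5) concrete, the following minimal sketch evaluates the quadratic model for a given set of coefficients, leaving out the residual error. The function names and the coefficient values in the usage note are hypothetical and are not the fitted values from this study.

```python
from itertools import combinations


def quadratic_response(x, b0, linear, quadratic, interaction):
    """Evaluate the quadratic polynomial of Eq. (5) without the residual term.

    x           : list of k factor values
    linear      : list of k linear coefficients (beta_i)
    quadratic   : list of k quadratic coefficients (beta_ii)
    interaction : dict mapping (i, j) with i < j to beta_ij; missing pairs count as 0
    """
    k = len(x)
    y = b0
    y += sum(linear[i] * x[i] for i in range(k))
    y += sum(quadratic[i] * x[i] ** 2 for i in range(k))
    y += sum(
        interaction.get((i, j), 0.0) * x[i] * x[j]
        for i, j in combinations(range(k), 2)
    )
    return y


def n_model_terms(k):
    """Number of coefficients in the full quadratic model with k factors."""
    return 1 + 2 * k + k * (k - 1) // 2
```

With two factors, `quadratic_response([1.0, 2.0], 1.0, [2.0, 3.0], [0.5, -1.0], {(0, 1): 4.0})` sums the intercept, the linear terms, the squared terms, and the single interaction term. For the five inputs considered here, `n_model_terms(5)` gives 21 coefficients in the full model.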

Table 2 Process factors in RSM modeling.

In RSM modeling, the curvature of the response surface is captured by the coefficients of the quadratic terms, which are computed using the central points of the experimental design. The coefficients of the linear terms, in contrast, are derived from the factorial points, which show the direct linear relationships between the input variables and the response. Building a reliable predictive model requires accurate calculation of these coefficients50. Table 2 lists the process factors examined in the RSM framework. These factors were analyzed to determine how they affect adsorption capacity and to find the conditions that maximize it (see Table 3). Although the predicted optimum lies within the limits of the input variables, it slightly exceeds the maximum value observed in the experimental data. This is common in RSM-based models and points to a potentially improved combination of parameters: the prediction is derived mathematically from the quadratic regression surface, which can identify optimum regions not directly represented by the experimental points. Nonetheless, further experimental testing may be considered in future research to confirm the empirical validity of this prediction.

Table 3 Optimization range of sulfur adsorption by RSM-CCD.

Artificial neural network (ANN) theory

ANNs are computational models inspired by the way the human brain operates. They mimic the architecture of biological neural systems through layers of connected nodes: an input layer, one or more hidden layers, and an output layer. Such models are applied chiefly to complex, nonlinear systems, where they excel at extracting patterns and relationships from data. By connecting these layers to mimic the activity of biological neurons, ANNs can accomplish complicated tasks such as language translation and image processing. Their ability to model complex systems and forecast outcomes in fields such as technology, healthcare, and finance makes them essential in artificial intelligence and machine learning. In particular, feed-forward ANNs approximate smooth functions very well when given enough neurons and suitable training conditions. These networks are frequently used in chemical engineering for regression and classification problems, supporting tasks such as system optimization, process modeling, and surrogate model creation34,51,52.

During training, a neural network determines the optimal weights (w) for its transfer functions (f). Each neuron multiplies every input (xi) by its associated weight, sums these products, and adds a bias term (b), as shown in Eq. (6)53.

$$net = \left( {\sum_{i = 1}^{n} w_{i} x_{i} } \right) + b$$
(6)

The neuron’s output is then obtained by applying the transfer function (f), which can take various forms, as shown in Eq. (7). The data are divided randomly, with 70% used for training, 15% for validation, and 15% for testing. The work cycle of an ANN is illustrated schematically in Fig. 3.

$$y = f\left( {\text{net }} \right)$$
(7)
Fig. 3

Schematic representation of the work cycle of an ANN.
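The computation of a single neuron in Eqs. (6) and (7) can be sketched as follows. The function name is hypothetical, and the hyperbolic tangent is used as the default transfer function purely for illustration.

```python
import math


def neuron_output(inputs, weights, bias, transfer=math.tanh):
    """Compute y = f(net), where net = sum(w_i * x_i) + b as in Eq. (6)."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(net)
```

Passing a different callable as `transfer` (for example `lambda v: v` for a linear neuron) changes the form of f without changing the weighted sum.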

Multilayer perceptron (MLP)

MLP networks are feed-forward networks trained by supervised learning. An MLP has an input layer that accepts the data, an output layer that provides the final outputs, and one or more hidden layers that process the data in between. To capture the relationships in the data, the hidden neurons use nonlinear activation functions such as the hyperbolic tangent and the sigmoid. During training, the MLP uses back-propagation to adjust the weights and reduce the error function. The output is a nonlinear function of the inputs because each neuron weights its inputs, adds a bias term, and applies an activation function, and the contributions of all hidden neurons are then summed. This systematic procedure allows MLP networks to perform consistently on tasks ranging from prediction to classification54,55. The MLP network structure is shown in Fig. 4.

Fig. 4

MLP network structure.

In Eq. (8), the output vector \(g\) of the MLP neural network is defined, with \({x}_{i}\) representing the input vector, \(w\) the weight vector, and \(\theta\) the threshold. These variables are fundamental in describing the MLP neural network’s output.

$$g = f\left( {wx_{i}^{k} + \theta } \right)$$
(8)
$$\gamma_{jk} = F_{k} \left( {\mathop \sum \limits_{i = 1}^{{N_{k - 1} }} w_{ij} \gamma_{{i\left( {k - 1} \right)}} + \beta_{jk} } \right)$$
(9)

In Eq. (9), \({\gamma }_{jk}\) and \({\beta }_{jk}\) represent the output of neuron \(j\) in layer \(k\) and its corresponding bias weight, respectively. The sum runs over the \(N_{k - 1}\) neurons of the previous layer, the weights of the connections are denoted by \({w}_{ij}\), and \({F}_{k}\) is the nonlinear activation transfer function.
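The layer-by-layer propagation described by Eq. (9) can be sketched as a short forward pass. This is a minimal illustration with hypothetical names and hand-picked weights, not the trained network of this study; the weights are stored row-wise, so `weights[j][i]` connects neuron `i` of the previous layer to neuron `j` of the current one.

```python
import math


def layer_forward(prev_outputs, weights, biases, activation):
    """Outputs of one layer: activation(sum_i w[j][i] * prev[i] + bias[j])."""
    return [
        activation(sum(w * g for w, g in zip(row, prev_outputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(x, layers):
    """Propagate input x through a list of (weights, biases, activation) layers."""
    out = x
    for weights, biases, activation in layers:
        out = layer_forward(out, weights, biases, activation)
    return out


# Hypothetical 2-3-1 network: tanh hidden layer, linear output layer.
example_layers = [
    ([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]], [0.0, 0.1, -0.1], math.tanh),
    ([[0.7, -0.3, 0.5]], [0.05], lambda v: v),
]
prediction = mlp_forward([0.5, -0.5], example_layers)
```

The tanh hidden layer with a linear output layer is one common choice for regression; other activation functions can be substituted per layer.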

Radial basis function (RBF)

The RBF network is a feed-forward network with a single hidden layer that uses radial basis functions as activation functions. The standard configuration, shown in Fig. 5, consists of an input layer, a hidden layer with a nonlinear RBF activation function, typically a Gaussian, and an output layer that forms linear combinations of the hidden-layer outputs. Each RBF unit has a center that serves as a reference point, and the unit responds according to the distance, usually Euclidean, between the input and that center. The final response is a linear combination of the outputs of the RBF units56,57.

$$f\left( x \right) = \mathop \sum \limits_{i = 1}^{N} w_{ij} G\left( {\left\| {x - c_{i} } \right\|{*}b} \right)$$
(10)
Fig. 5

RBF network structure.

In Eq. (10), the RBF network’s output layer functions through a linear combination. Here, \(N\) refers to the total number of training samples, \({w}_{ij}\) represents the weight applied to each hidden layer neuron, \(x\) stands for the input variable, \({c}_{i}\) indicates the center points, and \(b\) is the bias term. The Gaussian function is then applied to derive the centralized response from the hidden neurons, as shown in Eq. (11).

$$G\left( {\left\| {x - c_{i} } \right\|{*}b} \right) = {\text{exp}}\left( { - \frac{1}{{2\sigma_{i}^{2} }}\left( {\left\| {x - c_{i} } \right\|{*}b} \right)^{2} } \right)$$
(11)

The parameter \({\sigma }_{i}\) determines the width of the Gaussian function, while \(b\) scales the distance \(\left\| {x - c_{i} } \right\|\) and thus sets the extent of the input space that triggers the RBF neuron’s response. This setup ensures that the network’s neurons are tuned to react within specific regions of the input space.
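Equations (10) and (11) can be illustrated with a short forward pass through an RBF network. This is a sketch under simplifying assumptions: a single scalar output, one shared scaling factor `b`, and hypothetical centers, widths, and weights.

```python
import math


def gaussian_rbf(x, center, b, sigma):
    """Gaussian response of Eq. (11): exp(-(||x - c|| * b)^2 / (2 * sigma^2))."""
    scaled_distance = math.dist(x, center) * b
    return math.exp(-(scaled_distance ** 2) / (2 * sigma ** 2))


def rbf_forward(x, centers, weights, b, sigmas):
    """Network output of Eq. (10): weighted sum of the hidden-unit responses."""
    return sum(
        w * gaussian_rbf(x, c, b, s)
        for w, c, s in zip(weights, centers, sigmas)
    )
```

A unit responds with 1 when the input coincides with its center and decays toward 0 as the scaled distance grows, so each hidden neuron is active only in its own region of the input space.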

ANN model design

Figure 3 outlines the proposed algorithm. The first step compiles all the experimental data, with BET surface area (a), micropore volume (Vp), temperature (T), time (t), and sulfur compound molecular weight (MW) as inputs and adsorption capacity (qe) as the output. In the second step, the inputs and outputs are normalized, and a learning algorithm is chosen to build the network structure. For the ANN model, 70% of the dataset is used for training to optimize network parameters such as weights, biases, and thresholds. A further 15% is used for validation, and the remaining 15% for testing. The model’s accuracy is evaluated by comparing predicted and actual data and calculating statistical measures such as R² and MSE. The optimal MLP configuration is found by experimenting with different numbers of hidden layers, different numbers of neurons in each layer, and different training algorithms. For the RBF network, the number of neurons is typically determined by trial and error, starting with a large number and reducing it to the level that yields the lowest MSE. Training is terminated when the optimal error is reached.
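The random 70/15/15 division can be sketched as follows. The function name, the fixed seed, and the rounding rule are assumptions made for illustration; the exact subset sizes used in the study are not stated.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly split sample indices into training, validation, and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains after train and val
    return train, val, test


train_idx, val_idx, test_idx = split_indices(317)
```

With 317 data points and this rounding rule, the split yields 222 training, 48 validation, and 47 test samples.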

Results and discussion

RSM results

Variance analysis (ANOVA)

Table 5 shows the ANOVA results obtained from the experimental data. The F-value indicates the overall significance of the model, and the P-value gives the probability associated with each term. P-values below 0.05 indicate that a term is statistically significant within the model, whereas P-values above 0.1 suggest that it is not46,58. The model’s F-value is 404.99, indicating a high degree of significance: there is only a 0.01% chance that such a large F-value could occur due to random noise. The P-values for variables A, B, C, D, and E are all below 0.05, so each has a significant influence on the dependent variable. Variables A and E have the lowest P-values and the highest F-values among the individual parameters, suggesting they have the strongest effect on qe. The Adjusted R² is 0.9502 and the Predicted R² is 0.9475; their difference is well below 0.2, indicating that the model predicts new data effectively while fitting the existing data well. Adequate precision, which measures the signal-to-noise ratio, is 90.5336, far above the desirable threshold of 4, demonstrating a strong signal. The fit statistics for the response, obtained from the 317 observations, are shown in Table 4.

Table 4 Fit statistics of the quadratic model used for the response.
Table 5 Variance Analysis results for adsorption capacity response.

Perturbation plots

The perturbation plot is an important tool for assessing how the process parameters affect qe around the central point, because it shows the impact of each factor in a single figure. Figure 6 shows the perturbation plot for qe. Surface area exhibits a direct, linear relationship with adsorption capacity and a relatively steep slope: a larger surface area gives a higher adsorption capacity. In contrast, micropore volume shows an inverse relationship, where increased volume leads to decreased adsorption capacity. Temperature follows a parabolic trend, initially enhancing adsorption as it rises but eventually reducing capacity at higher values. Time behaves similarly: adsorption first increases with time, but the effect tapers off and even slightly reverses, giving a parabolic shape. This trend likely occurs because the adsorption sites on the surface become saturated after a while, which slows further adsorption. At this stage, slight desorption can also occur due to repulsive interactions or shifts in surface stability. The molecular weight of the sulfur compound, on the other hand, has a direct, linear relationship with adsorption capacity. Moving from thiophene to benzothiophene, the additional benzene ring in the heavier compound promotes π-π interactions with the adsorbent, leading to improved adsorption.

Fig. 6

Variation of adsorption capacity with each coded factor.

Pearson correlation matrix

Figure 7 shows the correlation heat map, a square matrix of the Pearson correlation coefficients between pairs of features in the dataset. These coefficients range from −1 to +1 and indicate the strength and direction of linear relationships: 0 indicates no linear correlation, −1 a perfect negative correlation, and +1 a perfect positive correlation. Diagonal elements are always 1, representing a feature’s correlation with itself. The matrix provides insight into the dataset’s structure, as the sign and magnitude of the coefficients reveal the nature of the relationships between variables.

Fig. 7

Correlation heat map for each pair of variables.
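The entries of such a matrix can be computed as in the following minimal sketch. The function names and the dictionary-of-columns layout are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of {name: list of values}."""
    names = list(columns)
    return {
        a: {b: pearson(columns[a], columns[b]) for b in names}
        for a in names
    }
```

The result is symmetric, and each diagonal entry equals 1 up to floating-point rounding because a variable is perfectly correlated with itself.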

Interaction of factors

This study used Design-Expert software, version 13.0, to analyze the data and create three-dimensional (3D) response surfaces. The 3D plots were used to assess how the zeolite surface area and pore volume, the temperature, the time, and the MW of the sulfur compound affect adsorption capacity, and to determine the parameter ranges that maximize it. Figure 8 illustrates sulfur adsorption capacity using color codes and labeled lines.

Fig. 8

Response surface plots of sulfur adsorption capacity as a function of (a) a and Vp, (b) a and T, (c) a and t, (d) a and S-compound MW, (e) S-compound MW and Vp, (f) S-compound MW and T, and (g) S-compound MW and t.

As shown in Fig. 8a–d, increasing surface area enhances adsorption capacity, benefiting sulfur removal by providing more active sites for sulfur molecules. Figure 8a indicates that reducing micropore volume can enhance adsorption, likely due to metal ions introduced into the zeolite structure. These ions occupy micropores and promote sulfur adsorption through strong interactions, including π-complexes or metal-sulfur bonds. Modifications such as dealumination or metal impregnation can reduce micropore volume while creating mesopores, thereby increasing surface area and enhancing sulfur capture, consistent with previous studies. Figure 8b shows that while an initial temperature increase boosts adsorption by enhancing molecular mobility, further increases reduce adsorption since it is an exothermic process. As seen in Fig. 8c, increasing adsorption time initially improves capacity until saturation is reached, after which further time yields diminishing returns due to potential desorption. Finally, Fig. 8d–g illustrates that the molecular weight of sulfur compounds positively affects desulfurization. Higher molecular weight, as seen with benzothiophene, enhances adsorption efficiency compared to thiophene, likely due to its additional aromatic ring facilitating stronger π-complex interactions with metal ions.

ANN results

Prediction and optimization

The neural network was built from the 317 data points: 70% were used for training, and the remaining 30% were split equally between validation (15%) and testing (15%). The number of neurons and layers, the activation functions, the number of training epochs, and the training algorithm were all varied to obtain the best test-set performance from the MLP architecture. A comprehensive evaluation was conducted to identify the most effective configuration and activation functions for each model. Twelve backpropagation algorithms were investigated, including Bayesian Regularization (trainbr)59, Scaled Conjugate Gradient (trainscg)60, and Levenberg–Marquardt (trainlm)61. The study considered MLP designs with two or three hidden layers and 15 to 30 neurons. Adding neurons or layers beyond this range did not enhance performance but considerably increased training time and the risk of overfitting, primarily because of the limited dataset size. The hidden layers used the hyperbolic tangent sigmoid function (tansig), and the output layer used the linear function (purelin). This choice was also motivated by the literature on adsorption modeling. For instance, Kolbadinejad et al.62 employed a two-hidden-layer MLP with tansig and purelin functions to predict gas adsorption on zeolites and activated carbon and achieved an R² of 0.9998. Given the similar adsorption mechanism and the use of zeolite-based adsorbents, this ANN configuration was considered appropriate for modeling adsorptive desulfurization of model fuels in the present research. Initial weights and biases were randomly initialized in MATLAB. 
Each network architecture was trained at least three times to account for variability due to these random initializations, and the best result from these repetitions was selected for further analysis. This approach reduced the influence of the initial weights and biases on the outcomes and improved the reliability of the results. The results of the network evaluation are summarized in Table 6, which lists, for each tested algorithm, the algorithm name, the optimal network architecture, and the MSE and R² for the training, validation, test, and overall datasets. It also reports the training time (in seconds) and the number of epochs required for convergence.

Table 6 The outcomes of employing various algorithms and optimal network architectures.

Based on the results presented in Table 6, the Levenberg–Marquardt (LM) method stands out as the top performer, achieving the highest accuracy (R2 = 0.9919) and the lowest mean squared error (MSE = 0.0025) across the training, validation, and test datasets. This performance highlights LM’s effectiveness in optimizing neural networks, especially given its ability to converge within just 15 epochs. In comparison, Bayesian Regularization (BR) also performs well, reaching an overall R2 of 0.9910 and an MSE of 0.0030. However, BR requires far more epochs (300) and a longer runtime, indicating a heavier computational demand. Meanwhile, the Scaled Conjugate Gradient (SCG) algorithm emerges as the most time-efficient approach, delivering solid performance with an R2 of 0.9824 and an MSE of 0.0058 while completing training in 1.5940 s and requiring only 68 epochs. This makes SCG particularly attractive for applications where time efficiency is essential. On the other hand, BFG has a notably longer runtime (2.7350 s) yet achieves high test accuracy (R2 = 0.9900), reflecting the complexity and computational intensity of its optimization strategy. Gradient Descent with Momentum and an Adaptive Learning Rate (GDX) benefits from adaptive learning and momentum, achieving a reasonable R2 of 0.9711. However, it is less efficient than the other algorithms, taking 3.0290 s to complete. Regarding convergence, GD and GDA (Gradient Descent with Adaptive Learning Rate) stand out for a less desirable reason: both reach the maximum allowed number of epochs (300) without achieving satisfactory accuracy, underlining their limitations in navigating the optimization landscape effectively. In contrast, LM’s rapid convergence within a limited number of epochs reaffirms its stability and robustness in training neural networks. 
LM is the preferred choice due to its high accuracy, low MSE, and rapid convergence, making it especially suitable for applications that demand precision and stable performance. Although SCG offers a faster runtime, LM’s superior accuracy and consistent convergence make it the better option for applications focused on reliability. Figure 9(a) shows how the mean squared error changes with the number of epochs; the best MLP model reaches its best validation performance of 0.0023 at 15 epochs. The regression outcomes of the best MLP network are shown in Fig. 10.

Fig. 9

MSE as a function of epochs for adsorption capacity datasets in (a) MLP, (b) RBF.

Fig. 10

Comparing the experimental data with the results of the MLP neural network model with the (a) training, (b) validation, (c) test, and (d) total data.

The optimization of the RBF neural network is critical for accurate predictions, requiring adjustments to parameters such as the spread value, training functions, and number of neurons. The optimal RBF model, with 228 neurons in its hidden layer, achieved a mean square error of 0.0015 (Fig. 9b) and a strong correlation with experimental data (R2 = 0.9951, Fig. 11). As shown in Fig. 12, the MSE varies with different values of spread and neuron counts. The MSE generally decreases as the number of neurons increases up to around 50, and the best performance is observed with a lower spread value (e.g., 0.1) and 228 neurons. Beyond this point, the MSE tends to stabilize and sometimes slightly increase, likely due to overfitting. To prevent overtraining and improve generalizability, the network’s runtime was reduced, and its accuracy was validated using data not included in the training.
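To make the roles of the spread value and the neuron count concrete, the following is a minimal sketch of a Gaussian RBF network in Python with NumPy. It is not the authors' implementation: it places one basis function on every training point and solves for the linear output weights by least squares, and its Gaussian width parameterization is an assumption that may differ from the toolbox definition of spread. The function names and the toy data are hypothetical.

```python
import numpy as np

def gaussian_design(x, centers, spread):
    """Matrix of Gaussian basis responses, one column per center."""
    dist_sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-dist_sq / (2.0 * spread ** 2))

def fit_rbf(x_train, y_train, spread):
    """Use every training point as a center and solve for linear output weights."""
    phi = gaussian_design(x_train, x_train, spread)
    weights, *_ = np.linalg.lstsq(phi, y_train, rcond=None)
    return weights

def predict_rbf(x_new, x_train, weights, spread):
    """Evaluate the fitted network at new input points."""
    return gaussian_design(x_new, x_train, spread) @ weights

# Toy 1-D data standing in for normalized inputs (not the study's data).
x = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(2.0 * np.pi * x[:, 0])
w = fit_rbf(x, y, spread=0.1)
train_mse = np.mean((predict_rbf(x, x, w, 0.1) - y) ** 2)
```

With as many basis functions as training points, the training error is driven to essentially zero, which is why the error on data excluded from training is the more informative check against overfitting.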

Fig. 11

Comparing the experimental data with the results of the RBF neural network model.

Fig. 12

MSE variation across different neuron counts and spread values in the RBF neural network.

Global sensitivity analysis

To investigate the influence of the individual input parameters on the predicted sulfur adsorption capacity, a global sensitivity analysis (GSA) was performed on the optimized MLP neural network model trained with the trainlm algorithm63,64. The GSA used a variance-based method with Monte Carlo sampling over the normalized input ranges. Micropore volume accounted for the largest share of the output variation (sensitivity index = 0.5918), demonstrating its predominant role in adsorption. The other parameters, namely BET surface area, reaction time, temperature, and molecular weight of the sulfur compounds, had much smaller sensitivity indices, which identifies the structural characteristics of the zeolite as the dominant factor in adsorption-based desulfurization. The detailed results of this analysis are illustrated in Fig. 13.
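The source does not state which variance-based estimator was used. As one common possibility, the sketch below estimates first-order Sobol indices with a Saltelli-type Monte Carlo scheme over inputs sampled uniformly on [0, 1]. It is written in Python with NumPy; `toy_model` is a hypothetical stand-in for the trained network, chosen because its exact indices (16/17 and 1/17) are known.

```python
import numpy as np

def first_order_indices(model, n_inputs, n_samples=200_000, seed=0):
    """Monte Carlo estimate of first-order variance-based sensitivity indices.

    Inputs are sampled uniformly on [0, 1], i.e. over normalized ranges.
    """
    rng = np.random.default_rng(seed)
    a = rng.random((n_samples, n_inputs))
    b = rng.random((n_samples, n_inputs))
    y_a, y_b = model(a), model(b)
    total_var = np.var(np.concatenate([y_a, y_b]))
    indices = np.empty(n_inputs)
    for i in range(n_inputs):
        ab = a.copy()
        ab[:, i] = b[:, i]                 # only input i takes its values from B
        indices[i] = np.mean(y_b * (model(ab) - y_a)) / total_var
    return indices

# Hypothetical linear stand-in for the trained MLP (not the study's model).
def toy_model(x):
    return 4.0 * x[:, 0] + 1.0 * x[:, 1]

s = first_order_indices(toy_model, n_inputs=2)
```

Each index is the fraction of output variance explained by one input alone, so a value such as the 0.5918 reported for micropore volume would mean that this input by itself accounts for roughly 59% of the variance of the predicted capacity.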

Fig. 13

Variance-based global sensitivity analysis showing the influence of input variables on sulfur adsorption capacity.

Uncertainty analysis

An uncertainty analysis was performed on the optimized MLP model to determine the robustness and predictive stability of the trained neural network. The analysis quantifies the sensitivity of the model predictions to random variability in the initial weight settings and constructs a statistical confidence interval around the predicted values65. In the comparison of training algorithms (see Section “ANN results”), the LM method performed best and was therefore selected as the training method, so the uncertainty analysis was carried out on the MLP model trained with LM. The model was trained 20 times with different random initializations, and the mean predicted adsorption capacity and 95% confidence interval were computed. A Monte Carlo-based method was used to capture the statistical variability of the model. The results for the whole dataset are presented in Fig. 14, which shows that the model gives stable predictions with narrow uncertainty bounds for most of the data points.
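The source does not specify how the 95% interval was constructed. One plausible reading is a Student-t interval on the mean prediction across the 20 retrained networks, sketched below in Python with NumPy. The function name, the t-based construction, and the synthetic predictions are assumptions for illustration only.

```python
import numpy as np

# Two-sided 95% Student-t quantile for 20 runs (19 degrees of freedom).
T_CRIT_19_DOF = 2.093

def mean_and_ci(run_predictions, t_crit=T_CRIT_19_DOF):
    """run_predictions has shape (n_runs, n_points): one row per retrained network."""
    runs = np.asarray(run_predictions, dtype=float)
    n_runs = runs.shape[0]
    mean = runs.mean(axis=0)
    half_width = t_crit * runs.std(axis=0, ddof=1) / np.sqrt(n_runs)
    return mean, mean - half_width, mean + half_width

# Synthetic stand-in for 20 retrained networks predicting 5 data points.
rng = np.random.default_rng(0)
fake_runs = np.linspace(0.2, 1.0, 5) + 0.01 * rng.standard_normal((20, 5))
mean, lower, upper = mean_and_ci(fake_runs)
```

A narrow band between `lower` and `upper` at a given data point indicates that the prediction there changes little with the random initialization.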

Fig. 14

Uncertainty analysis of optimized MLP model.

3D response surfaces

The response surfaces of the MLP-based ANN model for predicting adsorption capacity are shown as three-dimensional plots in Fig. 15. Each plot shows the effect of varying two of the five factors while the other three are kept fixed, so that their combined influence on the output can be evaluated. A comparison between the RSM and ANN results reveals that, while both exhibit similar trends, the ANN model predicts the interactions among the parameters more effectively and provides more accurate insights. Figure 15a shows that an increase in surface area enhances adsorption because more active sites are available for sulfur compounds. However, reducing the micropore volume can also improve adsorption efficiency, likely due to the incorporation of metal ions within the zeolite structure. These ions occupy micropores and facilitate sulfur uptake through mechanisms such as π-complex interactions and metal-sulfur bonding. Introducing mesopores, often achieved through dealumination or metal impregnation, further improves adsorption by increasing surface area. As shown in Fig. 15b and f, an initial increase in temperature promotes adsorption by enhancing molecular mobility; however, since the adsorption process is typically exothermic, a further temperature increase leads to decreased adsorption due to desorption effects. Figure 15c reveals that extending the adsorption time initially increases adsorption capacity, but this effect plateaus once saturation is reached, and prolonged exposure may result in desorption. Figure 15d–g also show that sulfur compounds with higher molecular weights exhibit better adsorption performance. Compounds like benzothiophene, with an additional aromatic ring, interact more strongly with metal sites than lighter compounds like thiophene, thus enhancing adsorption through more robust π-complexation.
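A surface of this kind can be produced by sweeping a pair of inputs over a grid while the remaining inputs are held constant, as the panel pairs in the Fig. 15 caption suggest. The Python sketch below shows the idea with NumPy. `toy_model` is a hypothetical stand-in for the trained MLP, and the [0, 1] input range and mid-range base point are assumptions.

```python
import numpy as np

def response_surface(model, base_point, i, j, n_grid=25):
    """Sweep inputs i and j over [0, 1] while all other inputs stay at base_point."""
    axis = np.linspace(0.0, 1.0, n_grid)
    grid_i, grid_j = np.meshgrid(axis, axis, indexing="ij")
    points = np.tile(np.asarray(base_point, dtype=float), (n_grid * n_grid, 1))
    points[:, i] = grid_i.ravel()
    points[:, j] = grid_j.ravel()
    return grid_i, grid_j, model(points).reshape(n_grid, n_grid)

# Hypothetical stand-in for the trained MLP, with five normalized inputs.
def toy_model(x):
    return x[:, 0] * (1.0 - x[:, 1]) + 0.1 * x[:, 2:].sum(axis=1)

gi, gj, z = response_surface(toy_model, base_point=[0.5] * 5, i=0, j=1)
```

The returned arrays `gi`, `gj`, and `z` have matching shapes and can be passed directly to a 3D surface plotting routine.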

Fig. 15

3D response surface plots of adsorption capacity generated by the MLP model, showing the influence of (a) a and Vp, (b) a and T, (c) a and t, (d) a and S-compound MW, (e) S-compound MW and Vp, (f) S-compound MW and T, and (g) S-compound MW and t.

Prediction of adsorption capacity with new data

To evaluate the performance of the developed neural network models, the MLP and RBF networks were tested on a separate set of 17 experimental data points that had been removed from the original dataset. The adsorption capacities predicted by these models were then compared with the actual experimental measurements. For further evaluation, an additional random subset of experimental data was used to compare the accuracy of the MLP and RBF networks. This accuracy was quantified by comparing the predicted outputs with the experimental values and calculating the %AARD. As highlighted in Table 7 and Fig. 16, the RBF model achieved an R2 of 0.9803, outperforming the MLP network, which had an R2 of 0.9775. This indicates that the RBF network predicted adsorption capacities slightly more accurately and captured the underlying patterns in the dataset better. The 17 data points were selected at approximately equal intervals across the entire dataset and were excluded from all training, validation, and internal testing phases. This sampling strategy ensured that the final evaluation was based on unseen data spread across the whole dataset, offering a reliable assessment of the models’ generalization performance.
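As an illustration of the two steps described here, the Python sketch below computes the percentage average absolute relative deviation in its usual form, %AARD = (100/N) Σ |(predicted − experimental)/experimental|, and picks hold-out indices at roughly equal intervals. The function names and the rounding rule for the indices are assumptions; the source does not give the exact selection procedure.

```python
import numpy as np

def aard_percent(experimental, predicted):
    """Average absolute relative deviation, in percent."""
    experimental = np.asarray(experimental, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((predicted - experimental) / experimental))

def evenly_spaced_holdout(n_samples, n_holdout):
    """Indices spread at roughly equal intervals across an ordered dataset."""
    return np.unique(np.linspace(0, n_samples - 1, n_holdout).round().astype(int))

# Toy values only: two measurements, each predicted with a 10% relative error.
example_aard = aard_percent([1.0, 2.0], [1.1, 1.8])
example_idx = evenly_spaced_holdout(100, 17)
```

Because %AARD divides by the experimental value, it is undefined for zero measurements and weights errors on small capacities more heavily than MSE does.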

Table 7 Prediction of adsorption capacity using MLP and RBF.
Fig. 16

Linear regression between new experimental data and (a) RBF and (b) MLP outputs.

Conclusion

Sulfur removal from fuels remains a critical challenge in mitigating environmental damage, and zeolites, with their tailored adsorption properties and high surface area, play a vital role in achieving effective desulfurization. This study employed RSM to examine the impact of surface area, micropore volume, temperature, time, and sulfur compound molecular weight on adsorption capacity. The quadratic model achieved high predictive accuracy, with an adjusted R2 of 0.9502 and a predicted R2 of 0.9475, indicating excellent agreement with the experimental data. Perturbation plots and Pearson correlation analysis further highlighted the significant effects of individual parameters. ANN models were developed as robust predictive tools to overcome RSM’s limitations in addressing nonlinear interactions. Among the twelve learning algorithms tested for the MLP network, the trainlm algorithm emerged as the most effective, achieving an R2 of 0.9919 and an MSE of 0.0023 after just 15 epochs. The optimal MLP model incorporated two hidden layers with 45 neurons each, using the tansig activation function in the hidden layers and purelin for the output. The RBF model surpassed the MLP in accuracy, delivering an R2 of 0.9951 and an MSE of 0.0015. Validation with new data confirmed the reliability of both networks, with the RBF model achieving an R2 of 0.9803 compared to the MLP’s 0.9775. These results underline the ANN’s superior capability to handle complex nonlinear relationships, making it a valuable alternative to RSM. Additionally, GSA identified micropore volume as the most influential factor governing sulfur adsorption, while uncertainty analysis confirmed the stability and robustness of the optimized MLP model through narrow confidence intervals across repeated runs. In conclusion, this study demonstrates the potential of ANN, particularly the RBF network, in optimizing adsorptive desulfurization processes. 
By reducing experimental requirements and enhancing predictive capabilities, these models open pathways for advancing environmental technologies and achieving ultra-low sulfur fuels with greater efficiency.