Introduction

The combustion of sulfides in transportation fuel produces toxic sulfur components, prompting strict regulations worldwide to limit their emissions. Within the petroleum industry, organic sulfur compounds like disulfides, mercaptans, and thiophenes contribute to these emissions, impacting the environment, human health, and refinery equipment. Recent mandates from authorities such as the US Environmental Protection Agency and the European Union highlight the urgency in addressing this issue1,2,3. Over the past 2 decades, researchers have identified several alternative methods to hydrodesulfurization (HDS)4,5 for sulfur removal that are effective under mild conditions. The leading technologies among these are6,7: bio-desulfurization (BDS)8, adsorptive desulfurization (ADS)9,10,11, extractive desulfurization (EDS)12, and oxidative desulfurization (ODS)13. The global challenge of removing organic sulfur from liquid fuel prompts the exploration of cost-effective solutions. Adsorptive desulfurization emerges as a promising technology to tackle this issue. With its competitiveness, ADS has garnered considerable interest from both academia and industry14,15. Porous solid sorbents, including activated carbon (AC)14,16,17, zeolites9,18 and MOF19,20 have gained attention for their cost-effectiveness, low energy consumption, and ease of regeneration. In air pollution control, they are extensively utilized for desulfurization due to their ability to achieve ultra-low sulfur levels and high selectivity for sulfur compounds. Additionally, adsorptive desulfurization with porous adsorbents is gaining traction as an alternative for ultra-deep desulfurization. This method offers advantages such as easy accessibility and renewability of adsorbents, moderate operating conditions, low energy consumption, thermal stability, and desirable textural features21,22. Activated carbon is a highly effective material for removing pollutants due to its extensive microporosity and surface area23. 
This makes it a valuable adsorbent for capturing sulfur oxides (SO2) from various sources. Studies confirm that AC’s superior performance stems from its large surface area, functional surface groups, and spacious pores24. Notably, AC can be derived from various renewable precursors like cow dung, rice husk, or wood. However, raw AC may require modification for optimal sulfur removal. Through a process involving impregnation with transition metals (like copper or nickel), AC’s efficiency in capturing aromatic sulfur compounds from fuel oils is significantly enhanced7. The understanding of adsorption equilibria is facilitated through mathematical formulations known as adsorption isotherms. Additionally, adsorption kinetics models are utilized to describe the temporal evolution of the adsorption process through rate equations25,26. Artificial intelligence (AI) algorithms have demonstrated their efficacy in addressing diverse real-world challenges across engineering and industrial sectors, with notable applications in the petroleum industry27. Given the time-consuming and expensive nature of laboratory experiments, the utilization of predictive mathematical models becomes imperative. Artificial neural networks have emerged as powerful tools in various fields, offering superior accuracy in predicting nonlinear multivariate simulations compared to conventional linear methods. ANN, comprising input, hidden, and output layers, facilitate modeling without prior knowledge of the underlying physical or chemical processes, making them applicable in diverse areas such as adsorption and reaction kinetics28. In the context of desulfurization capacity forecasting, ANN have proven beneficial, offering a viable alternative where traditional insights are lacking. Utilizing experimental data, both multilayer perceptron and radial basis function networks are developed, along with a quadratic model, to establish semi-empirical predictive models. 
A comparative analysis between ANN and response surface methodology models highlights the predictive capabilities of each approach, with ANN showing promise in optimizing the adsorption process to maximize desulfurization uptake capacity. The versatility of ANN models further extends to facilitating desulfurization adsorption plant design, underscoring their significance in modern nonlinear statistical modeling29,30. Noora Naif Darwish31 conducted a study investigating the adsorptive desulfurization of diesel oil using two commercial powdered activated carbons (PAC1 and PAC2) and one granular activated carbon (GAC). Darwish’s research evaluated the impact of adsorbent amount, temperature, and contact time on sulfur removal and the ignition quality of diesel fuel. Adsorption isotherms were analyzed using Langmuir and Freundlich models, with the Freundlich model fitting PAC1 and PAC2 well. Additionally, a feed-forward neural network was used to correlate experimental data, demonstrating accurate predictions of sulfur removal capacities. This study aims to develop advanced neural networks for the adsorption desulfurization of liquid fuels, a critical process in reducing harmful sulfur emissions. We will explore the adsorption desulfurization using both RSM and ANN, with a focus on the MLP and RBF approaches. Key independent factors in this study include surface area, temperature, concentration, time, and fuel/adsorbent. These variables will serve as inputs, while the removal percentage of sulfur will be the response variable for the modeling approaches. The RSM technique will be employed to identify ideal parameters and design a semi empirical model that accurately reflects the impact of these independent variables on the removal percentage. This method is particularly valuable for understanding the interactions between variables and optimizing the desulfurization process under various conditions. 
Similarly, the primary goal of the artificial neural networks approach is to establish the optimal network architecture and ascertain the most effective network biases and weights. This involves a detailed exploration of the relationship between the removal percentage and independent variables For the MLP network, this includes determining the ideal size of the hidden layer, selecting the appropriate number of neurons and the most suitable training functions. In the radial basis function network, the focus will be on optimizing the number of neurons, the spread value, and the total count of epochs. In addition to the core modeling objectives, this study will also compare the predictive accuracy and robustness of the RSM and ANN approaches. By leveraging the strengths of both methods, we aim to develop a comprehensive modeling framework that can be applied to various desulfurization scenarios. The insights gained from this study will not only advance the field of fuel desulfurization but also contribute to the broader application of neural network modeling in environmental engineering. Furthermore, this research will address the scalability and practical implementation of the optimized models in industrial settings. By conducting sensitivity analyses and validation with experimental data, we aim to ensure that the developed models are both reliable and applicable to real-world desulfurization processes. The ultimate goal is to enhance the efficiency and effectiveness of adsorption desulfurization, thereby contributing to cleaner fuel production and reduced environmental impact. Mazen Khaled32 conducted a detailed investigation into the adsorption isotherms and kinetics of thiophene and dibenzothiophene (DBT) in hexane, serving as a model diesel fuel. This study compared the adsorption efficiency of thiophene and DBT on multi-walled carbon nanotubes (MWCNT) and graphene oxide (GO) with that of activated carbon, which was used as a benchmark. 
Results indicated that MWCNT achieved a DBT removal efficiency of 68.8%, almost twice the efficiency of GO, though it was about 25% less efficient than AC. Furthermore, the research demonstrated that these adsorbents were significantly more proficient in removing DBT from the model fuel than thiophene. Olawumi Oluwafolakemi Sadare33 research centered on the creation and assessment of a dual process combining adsorption and bio-desulfurization (AD/BDS) to eliminate sulfur compounds from South African petroleum distillates. The study utilized adsorbents like pomegranate and neem leaf powder, activated carbon, carbon nanotubes (CNTs) and functionalized carbon nanotubes (FCNT) to test their efficiency in removing dibenzothiophene from both model and actual diesel. The effectiveness of adsorption was measured through comprehensive characterization and kinetic analyses. The results demonstrated that the integrated AD/BDS process is highly effective in desulfurizing diesel, ensuring compliance with strict environmental standards. Ibrahim and Aljanabi34 conducted a study on batch adsorption desulfurization of diesel fuel with an initial sulfur content of 580 ppm, utilizing activated carbon for the physical adsorption of refractory sulfur compounds. They examined the effects of various parameters such as temperature, time, ratio of diesel to activated carbon, particle size of activated carbon, mixing speed and initial sulfur concentration on desulfurization efficiency. The process reduced the sulfur concentration to 247 ppm, achieving a 57% desulfurization efficiency. Kinetic analysis revealed that the pseudo-second order model accurately predicted the equilibrium sorption capacity, while the Freundlich isotherm provided the best fit for the adsorption data, indicating physical adsorption. Nunthaprechachan et al.35 investigated the removal of dibenzothiophene from n-octane using activated carbons derived from sewage sludge (S-ACs). 
They investigated how various activating agents, ratios, carbonization temperatures, and durations affect the properties of S-ACs and their ability to adsorb DBT. The S-AC activated with KOH demonstrated the most effective DBT removal, achieving a rate of 70.6%, which exceeded that of commercially available activated carbon. The adsorption process adhered to the Langmuir isotherm model. Al-Khodor et al.36 studied sulfur removal from crude oil (2.5 wt.% sulfur) from Iraq’s Halfaya Oil Field using activated carbon adsorption. They investigated the effects of AC dosage, time, and temperature, applying the Langmuir, Freundlich, and Temkin isotherm models; the Temkin model best fit the data. Thermodynamic assessments indicated spontaneous, endothermic adsorption, and the process achieved a desulfurization efficiency of 28% for the crude oil.

A summary of experimental adsorption desulfurization processes is presented in Table 1. The results of previous research show the need to identify the influential factors, and the interactions among them, that govern the removal efficiency of sulfur compounds. Therefore, in this research, both RSM and neural network models were used to evaluate and analyze the behavior of the adsorptive desulfurization process. By combining the two methods, the factors affecting the removal percentage of sulfur compounds were determined.

Table 1 Summary of experimental works on adsorption desulfurization processes.

The novelty of this study lies in its use of both artificial neural networks and response surface methodology to simulate and optimize the removal of sulfur from liquid fuels using carbon-based adsorbents. ANN models built with both radial basis function (RBF) and multilayer perceptron (MLP) algorithms are thoroughly compared with RSM, with the ANNs exhibiting better prediction accuracy. In addition to outperforming RSM, the ANN models improved the determination coefficient (R2) and reduced the prediction error (MSE), providing higher accuracy than conventional experimental techniques. Identifying important variables such as temperature and concentration also supports the optimization process and provides a more reliable and effective option for process improvement than traditional experimental methods. Mohammad Shokouhi et al. used artificial intelligence techniques to create a neural network model optimized with a genetic algorithm (GA) to forecast the solubility of hydrogen sulfide (H2S) in N-methylpyrrolidone (NMP). The model was configured using experimental data, with solubility as the output and temperature and pressure as inputs. The GA tuned neural network parameters such as learning rate, momentum, and neuron count to optimize the model’s architecture. The predictions of the GA-ANN model were evaluated against experimental data and against thermodynamic results from the Peng–Robinson–Stryjek–Vera (PRSV) cubic equation of state. The GA-ANN model showed remarkable accuracy and was a dependable and effective method of predicting H2S solubility in NMP, at least in comparison to the PRSV model, which had an absolute relative deviation (ARD%) of 1.91%37. Morteza Esfandyari et al. used particle swarm optimization (PSO) to develop an adaptive neuro-fuzzy inference system (ANFIS) and artificial neural networks (ANNs) for predicting important performance parameters in a double-pipe counter-flow heat exchanger.
The hot flow rates (113–257 l/h), inlet fluid temperatures (40–60 °C), ultrasonic excitation power levels (0 and 60 W), and nanoparticle volume percentages (0–0.8%) were all varied in their studies. The Nusselt number, number of transfer units (NTU), heat transfer rate, and system effectiveness were all predicted using the models. With correlation coefficients above 94.84%, their results showed that the ANN-PSO and ANFIS-PSO models were both quite accurate38 (Table 2).

Table 2 Overview of studies on adsorption and prediction models.

Process description

Adsorption desulfurization is a pivotal technique in liquid fuel treatment, known for its effective removal of sulfur contaminants under relatively mild operating conditions. It also exhibits high selectivity towards thiophenic compounds and achieves high desulfurization efficiency. Adsorption fundamentally involves a mass-transfer process in which molecules adhere to a surface through intermolecular forces. This phenomenon plays a crucial role in the effectiveness of adsorption-based desulfurization methods36. The adsorption quantity is determined using the equation below:

$$q_{e} = \frac{{V\left( {C_{i} - C_{e} } \right)}}{M}$$
(1)

Here, qe is the amount of sulfur adsorbed per unit mass of adsorbent at equilibrium, Ci represents the initial concentration, and Ce denotes the equilibrium concentration. V stands for the volume of the fuel, while M denotes the mass of the adsorbent. Adsorption is a promising method for desulfurizing liquid fuels, offering a selective approach to removing sulfur compounds. The core principle involves extracting sulfur contaminants, which constitute less than 1% of the fuel mass, through targeted adsorption. This method effectively purifies the fuel while preserving the remaining 99%, which is sulfur-free43. Adsorption desulfurization uses an adsorbent to extract sulfur compounds from liquid fuels, with temperature and pressure chosen to ensure effective interaction between the adsorbent and the sulfur-containing fuel. During the desulfurization process, functional groups within the adsorbent bind sulfur compounds in the liquid through covalent bonds or π–π complexes44.
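
As an illustration of Eq. (1), the following minimal Python sketch computes the equilibrium adsorption capacity from batch data. The function names and the numerical values are hypothetical and chosen only for illustration. The units (L, mg/L, g) are assumptions, and the removal percentage uses the common definition 100 (Ci − Ce)/Ci, which is assumed here rather than taken from the text.

```python
def adsorption_capacity(volume_l, c_initial, c_equilibrium, mass_g):
    """Eq. (1): q_e = V (C_i - C_e) / M, in mg of sulfur per g of adsorbent."""
    if mass_g <= 0:
        raise ValueError("adsorbent mass must be positive")
    return volume_l * (c_initial - c_equilibrium) / mass_g


def removal_percentage(c_initial, c_equilibrium):
    """Assumed definition of the response: 100 (C_i - C_e) / C_i."""
    return 100.0 * (c_initial - c_equilibrium) / c_initial


# Hypothetical batch test: 0.05 L of fuel, 2 g of adsorbent,
# sulfur concentration falling from 580 to 247 mg/L.
q_e = adsorption_capacity(0.05, 580.0, 247.0, 2.0)
removal = removal_percentage(580.0, 247.0)
print(f"q_e = {q_e:.3f} mg/g, removal = {removal:.1f} %")
```

Because qe scales with V/M, the fuel/adsorbent ratio directly sets how much sulfur each gram of adsorbent must take up for a given concentration drop.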

Adsorption desulfurization factors

The adsorbent’s specific surface area and pore structure are pivotal in determining its capacity to capture sulfur. Temperature and the contact time between the adsorbent and the liquid fuel are critical parameters affecting adsorption, and proper optimization of these conditions is essential for achieving maximum sulfur removal efficiency. The type and concentration of sulfur compounds in the liquid fuel directly affect the kinetics of adsorption and the overall desulfurization performance. Thus, three kinds of factors affect the adsorption capacity of desulfurization adsorbents: adsorbent characteristics, operating conditions, and fuel properties.

ADS can achieve profound sulfur removal, effectively reducing sulfur content to very low levels (Fig. 1). ADS applies to a wide array of liquid fuels, encompassing petroleum-based fuels, biofuels, and synthetic fuels. ADS processes are generally regarded as environmentally friendly, generating minimal waste and emissions. ADS systems can be tailored to function under diverse conditions, encompassing both continuous and batch processes44,45.

Fig. 1 Flowchart diagrams of an experimental setup for a fixed-bed system in adsorptive desulfurization17.

Data collection

In this study, the dataset used for training, testing, and validation was compiled from earlier experimental research, yielding 274 data points from five separate references32,33,34,35,36. The experimental data, detailed in Table 3, pertain to the adsorption desulfurization of liquid fuel and cover a variety of conditions. Combining these sources supports robust model training and validation, thereby enhancing the reliability and accuracy of the predictive models.

Table 3 The information and specifics of the data utilized in adsorptive desulfurization.

Response surface methodology

Response surface methodology is a statistical method employed to explore the relationships between input parameters and sulfur removal efficiency using preprocessed data. RSM, rooted in experimental design and analysis, aims to identify the optimal conditions for various processes. This methodology offers numerous advantages, including high efficiency, low cost, robust optimization and data analysis capabilities, and broad applicability. For researchers and engineers, RSM is an invaluable tool for optimizing process parameters, thereby enhancing production efficiency and product quality while simultaneously reducing costs and boosting competitiveness46. RSM is characterized by its visual, nonlinear, and multivariable optimization capabilities, leveraging multiple regression methods to address complex nonlinear and multivariable problems. It also provides visual analysis results that facilitate a clearer understanding of the data47. By empirically analyzing the effects of input variables, RSM aids in system optimization. The three fundamental processes in RSM are the systematic design of experiments, regression modeling approach, and optimization technique. Implementing RSM minimizes the total number of experiments required and allows for monitoring the synergistic influence of each independent variable throughout the process48. The two-factor interaction (2FI) model is a tool employed in response surface methodology to examine the linear effects of two factors and their interactions. In this model, it is assumed that the factors X1 and X2 have a direct linear impact on the response variable Y, while the interaction between these two factors is also taken into account. The regression equation for the 2FI model can be expressed as:

$${\text{Y}} =\upbeta _{0} + \upbeta _{1} {\text{X}}_{1} + \upbeta _{2} {\text{X}}_{2} + \upbeta _{12} {\text{X}}_{1} {\text{X}}_{2} +\upvarepsilon$$
(2)

In this equation, β0 is the intercept, indicating the value of the response Y when both X1 and X2 are zero; β1 and β2 are the regression coefficients for factors X1 and X2, reflecting their respective linear effects on Y; β12 is the interaction coefficient between X1 and X2, showing how the combination of the two factors influences the response; and ε accounts for the random error within the model49,50,51. The design of experiments is a systematic approach that incorporates various input variables into a process to gain insights and establish optimal input-output relationships. This approach provides multiple benefits, including identifying crucial factors impacting the process, reducing costs, and developing process models. By examining how independent variables interact and affect the dependent variable, experimental design can yield a semi-empirical model that pinpoints influential factors and optimizes the process. Response surface methodology is a statistical technique used in experimental design to describe the relationship between independent and dependent variables through a surface model. This approach includes a variety of mathematical and statistical models aimed at improving, optimizing, and analyzing different processes by investigating the links between input variables and output responses52. Table 2 shows the input and output parameters of the process.
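
To make the 2FI model of Eq. (2) concrete, the sketch below estimates its four coefficients for a balanced two-level full factorial design with coded levels of −1 and +1. For such a design the columns of the model matrix are orthogonal, so the least-squares estimates reduce to simple averages. The function names and the response values are hypothetical; this is not the fitting procedure or the data of the present study, which involves five factors.

```python
def fit_2fi(runs):
    """Least-squares 2FI coefficients for a balanced 2^2 design.

    runs: list of (x1, x2, y) with x1 and x2 coded as -1 or +1.
    Valid only because the coded columns are mutually orthogonal.
    """
    n = len(runs)
    b0 = sum(y for _, _, y in runs) / n              # intercept
    b1 = sum(x1 * y for x1, _, y in runs) / n        # linear effect of X1
    b2 = sum(x2 * y for _, x2, y in runs) / n        # linear effect of X2
    b12 = sum(x1 * x2 * y for x1, x2, y in runs) / n  # X1-X2 interaction
    return b0, b1, b2, b12


def predict_2fi(coeffs, x1, x2):
    """Evaluate Eq. (2) without the error term."""
    b0, b1, b2, b12 = coeffs
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2


# Hypothetical removal percentages at the four corner points.
runs = [(-1, -1, 40.0), (1, -1, 50.0), (-1, 1, 60.0), (1, 1, 90.0)]
coeffs = fit_2fi(runs)
print(coeffs)
```

A nonzero interaction coefficient means that the effect of one factor depends on the level of the other, which is exactly what a purely linear model would miss.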

Artificial neural network

The first study on artificial neural networks traces back to 1943, when McCulloch and Pitts53 devised a rudimentary neural network computational model. In 1949, Hebb54 proposed learning principles for neural networks. Since then, ANNs have undergone rapid development and have found extensive application across diverse domains55. Artificial neural networks offer an alternative approach to tackling complex formulation problems by managing intricate data and establishing mathematical links between input and output variables, all without requiring a comprehensive theoretical understanding of the underlying phenomenon. Moreover, ANNs possess predictive capabilities, enabling them to forecast outcomes based on the input data provided56. Artificial neural networks are widely applied across numerous scientific and engineering domains. In chemical engineering, ANNs find utility in areas such as process dynamics, process modeling, and the optimization of industrial chemical processes57. The ANN is a computational model that takes inspiration from the human brain, consisting of interconnected nodes called neurons. Typically, an ANN is structured with an input layer, one or more hidden layers, and an output layer. The input layer corresponds to the predictor variables in regression analysis. Within the hidden layers, complex computations occur via weighted connections. The output layer finalizes the computation, producing the network’s outputs58. The essence of an artificial neural network lies in its representation of a process through mathematical modeling, developed empirically rather than through conventional mass and energy balances. Mimicking the intricate network of neurons in the human brain, an ANN comprises interconnected processing elements or nodes, organized into layers. These layers are linked through interconnections, forming a complex web of information processing. As illustrated in Fig. 2, the fundamental structure of a single neuron or node within an ANN model encompasses inputs, an activation function, and a single output. The connections between nodes are quantified by calculated values known as weights, which denote the “strength” of inter-neuronal connections, ultimately influencing the output, denoted by y59.

Fig. 2 A schematic depiction of a node within a neural network.

Within the hidden layer of an ANN, each neuron processes weighted inputs and biases from every neuron in the previous layer, as shown in Eq. (3):

$$Z_{i} = \mathop \sum \limits_{k = 1}^{{N_{j - 1} }} X_{k}^{j - 1} W_{k,i} - b_{i}$$
(3)

where \(X_{k}^{j - 1}\) signifies the input originating from the k-th node within layer j − 1, \(W_{k,i}\) stands for the weight of the connection linking node k in layer j − 1 with node i, while \(b_{i}\) represents the bias associated with node i. \(N_{j - 1}\) denotes the total count of nodes within layer j − 1.

Following the summation, an activation function is applied to compute the output of each node, which is computed as \(Y_{i} = f\left( {Z_{i} } \right)\). Among the various activation functions, the sigmoid function is the most frequently employed, which is defined as follows59:

$$f\left( {Z_{i} } \right) = \frac{1}{{1 + e^{{ - Z_{i} }} }}$$
(4)
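
The computation performed by a single node can be sketched in a few lines of Python. The example follows the sign convention of Eq. (3), in which the bias is subtracted from the weighted sum, and uses the standard logistic sigmoid 1/(1 + e^(−z)). The function names, weights, and inputs are hypothetical and serve only as an illustration.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs minus the bias (Eq. 3), then the sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) - bias
    return sigmoid(z)


# Hypothetical node receiving three inputs from the previous layer.
y = neuron_output([0.2, 0.5, 0.1], [0.4, -0.3, 0.8], bias=0.1)
print(y)
```

A zero net input gives an output of exactly 0.5, and the output approaches 0 or 1 as the net input becomes strongly negative or positive.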

The mean square error function is also utilized to compute the network output error. This can be represented by the following equation:

$$MSE = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {Y_{i} - T_{i} } \right)^{2}$$
(5)

where \(Y_{i}\) and \(T_{i}\) represent the network output and the target value, respectively. N represents the quantity of experimental data points used in the investigation. The MSE is instrumental in determining the adjustments to the network’s weights and biases. This process continues, with the MSE being recalculated iteratively, until it falls below a predefined threshold or another stopping condition is met60.

In this study, the average absolute relative deviation (AARD) is used to compare the neural networks with the RSM model, and the performance of the resulting networks is assessed through R2 and MSE.

$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{actual,i} - Y_{predicted,i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Y_{actual,i} - Y_{mean} } \right)^{2} }}$$
(6)

$$AARD\left( \% \right) = \frac{100}{n}\mathop \sum \limits_{i = 1}^{n} \left| {\frac{{Y_{predicted,i} - Y_{actual,i} }}{{Y_{actual,i} }}} \right|$$
(7)

In this context, Ypredicted denotes the anticipated value of Y derived from both ANN and RSM, whereas Yactual signifies the empirical value of Y. Ymean corresponds to the average value of Y61.
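
The three performance measures can be computed as in the following sketch. It uses the conventional forms of the metrics: R2 as one minus the ratio of the residual to the total sum of squares, and AARD with deviations taken relative to the experimental value. The function names and the sample values are hypothetical.

```python
def mse(actual, predicted):
    """Mean squared error between predictions and targets (Eq. 5)."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def aard_percent(actual, predicted):
    """Average absolute relative deviation, in percent of the actual value."""
    total = sum(abs((p - a) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical removal percentages: experimental versus model.
y_actual = [10.0, 20.0, 30.0]
y_predicted = [11.0, 19.0, 30.0]
print(mse(y_actual, y_predicted),
      r_squared(y_actual, y_predicted),
      aard_percent(y_actual, y_predicted))
```

Note that AARD is undefined when an experimental value is zero, so data points with 0% removal would need separate handling.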

Multilayer perceptron neural networks

The perceptron algorithm, pioneered by Rosenblatt in the late 1950s, is widely acknowledged as a prevalent and extensively used model in supervised machine learning62. The MLP is a widely recognized type of ANN and is considered a reliable technique for addressing modeling tasks across various fields63. The term “ANN model” refers to the specific arrangement and interconnections of neurons within the network. Over time, numerous ANN models have been proposed, each tailored to different applications and advancements in the field. The MLP is a specific case of the feedforward neural network (FNN), characterized by full connections between the neurons of each layer and those of the next. When the number of hidden layers increases, the MLP is also referred to as a deep neural network (DNN). In the training phase, the connection weights are updated in a backward manner based on the error magnitude, leading to the alternative name “feedforward backpropagation networks” (FFBN). These terms are often used interchangeably in the literature64. A typical MLP is composed of three types of layers: the input layer, one or more hidden layers, and the output layer. Each hidden layer contains processing units known as neurons. The dimensions of the input and output data dictate the number of neurons in the input and output layers. Various methods are employed to determine the optimal number of neurons and hidden layers, including the trial-and-error approach and more sophisticated intelligent methods65. The MLP’s capability to approximate any nonlinear function is attributed to its ability to tune weights and activation functions effectively66. One of the most commonly used methods for training MLPs is the backpropagation algorithm, which is based on the error-correction learning rule. This algorithm adjusts the model parameters so as to minimize the error function, thereby reducing the overall model error.
The backpropagation training process can be summarized in two main steps:

  • Forward pass: The input vector is fed into the multilayer network, propagating through the hidden layers until it reaches the output layer. The output layer generates a response vector, which represents the MLP’s final output.

  • Backward pass: During this phase, the MLP’s parameters are updated and refined. This involves adjusting the weights of the neurons in the hidden layers according to the error-correction rule. The adjustments aim to minimize the discrepancy between the predicted outputs of the neural network and the actual target values.

Through these iterative passes, the MLP continuously improves its performance, making it a powerful tool for modeling complex, nonlinear relationships in data67. The performance of the MLP model can fluctuate depending on the initial weights, which is considered a significant drawback. To overcome this issue, the model is trained multiple times, and the most accurate run is chosen as the final version68. Equation (8) gives the output of any perceptron69.

$$\gamma_{i} \left( {x^{\left( j \right)} } \right) = \varphi \left( {\mathop \sum \limits_{k = 1}^{N} w_{ik} x_{k}^{\left( j \right)} + b_{i} } \right)$$
(8)

Here, φ denotes a non-linear activation function, γi represents the output of the ith neuron, x(j) denotes the input for the jth layer, xk(j) signifies the value of the kth neuron within the jth layer, N is the number of such neurons, wik stands for the weight between the ith and kth neuron, and bi corresponds to the bias value linked to the ith neuron70.

Figure 3 depicts the structure of an MLP featuring three hidden layers, in which each neuron in one layer is fully connected to every neuron in the preceding layer. The main goal during training is to reduce the mean squared error. This is achieved by creating an error signal from the activation function, which is then propagated backward through the layers to update the weights. Adjustments that effectively reduce the overall MSE are prioritized.
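
The forward and backward passes described above can be illustrated with a deliberately small network. The sketch below assumes one input, a single hidden layer of two sigmoid neurons, a linear output neuron, and plain full-batch gradient descent on the MSE. These choices, the data, the initial parameters, and all function names are hypothetical simplifications; they do not reproduce the architecture in Fig. 3 or the training functions used in this study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Forward pass: input -> sigmoid hidden layer -> linear output."""
    w, b, v, c = params
    hidden = [sigmoid(wi * x + bi) for wi, bi in zip(w, b)]
    output = sum(vi * hi for vi, hi in zip(v, hidden)) + c
    return hidden, output


def mean_squared_error(data, params):
    return sum((forward(x, params)[1] - t) ** 2 for x, t in data) / len(data)


def backward_step(data, params, lr):
    """Backward pass: one full-batch gradient-descent update of all parameters."""
    w, b, v, c = params
    m = len(w)
    grad_w, grad_b, grad_v, grad_c = [0.0] * m, [0.0] * m, [0.0] * m, 0.0
    for x, t in data:
        hidden, output = forward(x, params)
        err = 2.0 * (output - t) / len(data)  # derivative of the MSE w.r.t. this output
        grad_c += err
        for i in range(m):
            grad_v[i] += err * hidden[i]
            # Error signal propagated back through the sigmoid of hidden neuron i.
            delta = err * v[i] * hidden[i] * (1.0 - hidden[i])
            grad_w[i] += delta * x
            grad_b[i] += delta
    return (
        [w[i] - lr * grad_w[i] for i in range(m)],
        [b[i] - lr * grad_b[i] for i in range(m)],
        [v[i] - lr * grad_v[i] for i in range(m)],
        c - lr * grad_c,
    )


# Hypothetical normalized data (input, target) and fixed initial parameters:
# hidden weights, hidden biases, output weights, output bias.
data = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
params = ([0.5, -0.5], [0.0, 0.0], [0.3, -0.3], 0.0)

initial_mse = mean_squared_error(data, params)
for _ in range(500):
    params = backward_step(data, params, lr=0.1)
final_mse = mean_squared_error(data, params)
print(initial_mse, final_mse)
```

In practice the loop would stop once the MSE falls below a chosen threshold or the validation error stops improving, rather than after a fixed number of iterations.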

Fig. 3 Schematic diagram of MLP network with three hidden layers.

Radial basis function (RBF)

In the late 1980s, Moody and Darken introduced the RBF neural network, which is known for its self-learning ability, efficient storage, rapid search speed, and short computation time71. Radial basis function neural networks (RBF-NNs) are feedforward artificial neural networks that use radial basis functions as activation functions. They are well suited to function approximation and are widely used in pattern recognition, classification, and interpolation problems. An RBF-NN typically consists of three layers: an input layer, a single hidden layer whose neurons (hidden units) each apply a non-linear radial basis function, and a linear output layer. The performance of an RBF-NN depends strongly on its weight terms, which govern the network’s ability to learn and generalize from the data. These weights are optimized by gradient descent, which iteratively adjusts them to minimize the error between the network’s predictions and the actual target values, thereby improving the network’s accuracy65. The RBF network approximates non-linear functions with a high degree of accuracy and achieves global approximation without encountering local minima. Additionally, owing to its compact topological structure, the RBF network learns rapidly, which makes it efficient for a wide range of applications72. Although numerous activation functions exist for radial basis neurons, the Gaussian function is the most widely used73. It is given by Eq. (9)74:

$$\varphi_{i} \left( {\left\|x - c_{i}\right\| *b} \right) = \exp \left( { - \frac{1}{{2\sigma_{i}^{2} }}\left( {\left\|x - c_{i}\right\| *b} \right)^{2} } \right)$$
(9)

Here, x signifies the input, φi represents the output of the ith hidden neuron, and ci and σi denote the center and spread parameters of the Gaussian function, respectively. The variable b is the bias term. The network’s output (y) is obtained as a linear combination of the activation functions and the weight vector w of the output layer, as given by Eq. (10):

$$y = \mathop \sum \limits_{i = 0}^{n} \varphi_{i} w_{i}$$
(10)

In this context, wi represents the weight assigned to the ith basis function. Figure 4 shows the configuration of the RBF network employed in this investigation.
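The forward pass described by Eqs. (9) and (10) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the default bias value b = 1, and the example centers, spreads, and weights are assumptions made for the illustration and are not taken from the study.

```python
import numpy as np

def gaussian_rbf(x, centers, sigmas, b=1.0):
    """Eq. (9): phi_i = exp(-(||x - c_i|| * b)^2 / (2 * sigma_i^2))."""
    dist = np.linalg.norm(x - centers, axis=1)  # ||x - c_i|| for every center
    return np.exp(-((dist * b) ** 2) / (2.0 * sigmas ** 2))

def rbf_output(x, centers, sigmas, weights, b=1.0):
    """Eq. (10): y = sum_i phi_i * w_i (linear output layer)."""
    return float(np.dot(gaussian_rbf(x, centers, sigmas, b), weights))

# Hypothetical two-neuron network (values chosen for illustration only)
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas = np.array([1.0, 1.0])
weights = np.array([0.5, 2.0])

y = rbf_output(np.array([0.0, 0.0]), centers, sigmas, weights)
```

For the input placed exactly on the first center, the first neuron responds with 1 and the second with exp(-1), so the output is the weighted sum of those two activations.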

Fig. 4

Architecture of a typical RBF network model.

ANN model design

As outlined in the algorithm in Fig. 5, the first step was to consolidate the experimental data. The input variables were surface area, fuel/adsorbent ratio, temperature, sulfur concentration, and time, and the removal percentage served as the output. In the second step, both input and output data were normalized. The artificial neural network model was trained on 70% of the complete dataset, with parameters such as weights, biases, and thresholds adjusted to improve the model’s effectiveness. The network was validated on a dedicated 15% subset and tested on another 15%. The accuracy of the trained model was evaluated with statistical measures such as the R2 value and MSE by comparing predictions against observed data. The best configuration for the multilayer perceptron model was determined by experimenting with different parameters, including hidden layer structures and training algorithms, to achieve optimal predictive performance. In contrast, the ideal neuron count for the RBF network is typically determined by iterative experimentation, starting with a surplus of neurons and gradually reducing their number until the MSE reaches a minimum. Training stops once the optimized error level is reached during testing.
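The data preparation steps above (normalization followed by a 70/15/15 split) can be sketched as follows. The text does not state which normalization was used, so min-max scaling to [0, 1] is an assumption here, as are the function names, the random seed, and the synthetic data that stand in for the experimental dataset.

```python
import numpy as np

def min_max_normalize(data):
    """Scale each column to [0, 1] (assumed scheme; the paper does not specify one)."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    return (data - lo) / (hi - lo), lo, hi

def split_70_15_15(n_samples, seed=0):
    """Return shuffled index arrays for training (70%), validation (15%), and testing (rest)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = round(0.70 * n_samples)
    n_val = round(0.15 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Synthetic stand-in: 5 input columns plus 1 output column (not the study's data)
rng = np.random.default_rng(42)
data = rng.random((274, 6))

scaled, lo, hi = min_max_normalize(data)
train_idx, val_idx, test_idx = split_70_15_15(len(scaled))
```

Keeping the column minima and maxima makes it possible to map network predictions back to the original removal percentage scale after training.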

Fig. 5

Flow chart of ANN model.

Results and discussion

Both MLP and RBF models demonstrated satisfactory performance in predicting sulfur removal efficiency. Performance evaluation metrics, such as mean squared error and coefficient of determination (R2), were employed to assess model accuracy. The RBF network exhibited superior performance compared to the MLP network, achieving a higher R2 value and lower MSE, indicating a more accurate prediction of sulfur removal efficiency. Sensitivity analysis was conducted to investigate the influence of each input parameter on sulfur removal. The analysis revealed that temperature and fuel-to-adsorbent ratio were the most significant factors affecting sulfur removal efficiency.

RSM results

Analysis of variance (ANOVA)

The results of the analysis of variance are presented in Table 4, which reports the significance of the model through the F-value and the associated probability (P value)75. Model terms are deemed significant if their P values are below 0.05, while P values exceeding 0.1 suggest that the terms are not significant29. The model exhibited a significant F-value of 194.32. Among the independent variables, only parameter A had a P value slightly above 0.05, indicating that nearly all parameters are significant. Term C had the highest F-value and the lowest P value, signifying that temperature has the greatest influence on the removal percentage, followed by concentration and the fuel-to-adsorbent ratio. The model also showed a strong signal, with a signal-to-noise ratio (adequate precision) of 52.5912. Table 5 presents the statistical parameters of the fit for the responses based on 274 observations.

Table 4 ANOVA results.
Table 5 Statistical parameters of the 2FI model fit for the responses.

Correlations

The 2FI equation presented in Eq. (11) was used to model the experimental results; it captures the influence of the variables and the interactions between them.

$$\begin{aligned} Removal & = - 1883.55111 + 0.221732 \times A + 10.81465 \times B + 6.24666 \times C + 1.52885 \times D \\ & \quad - 0.346199 \times E - 0.000141 \times AB - 0.000621 \times AC - 8.28035 \times 10^{-6} \times AD \\ & \quad - 5.88762 \times 10^{-6} \times AE - 0.035731 \times BC - 0.000366 \times BD + 0.002749 \times BE \\ & \quad - 0.004976 \times CD - 0.000143 \times CE + 0.000293 \times DE \\ \end{aligned}$$
(11)
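For readers who want to evaluate the 2FI correlation numerically, a minimal sketch is given below. The coefficients are copied from Eq. (11); the function and dictionary names are assumptions made for the illustration. The factors keep their coded labels A to E because this section only identifies term C as temperature, and the units expected for each factor are those of the original design.

```python
# Coefficients of the 2FI model, transcribed from Eq. (11)
INTERCEPT = -1883.55111
LINEAR = {"A": 0.221732, "B": 10.81465, "C": 6.24666, "D": 1.52885, "E": -0.346199}
INTERACTION = {
    ("A", "B"): -0.000141, ("A", "C"): -0.000621,
    ("A", "D"): -8.28035e-06, ("A", "E"): -5.88762e-06,
    ("B", "C"): -0.035731, ("B", "D"): -0.000366, ("B", "E"): 0.002749,
    ("C", "D"): -0.004976, ("C", "E"): -0.000143, ("D", "E"): 0.000293,
}

def removal_2fi(A, B, C, D, E):
    """Evaluate Eq. (11): intercept + linear terms + two-factor interaction terms."""
    x = {"A": A, "B": B, "C": C, "D": D, "E": E}
    y = INTERCEPT
    y += sum(coef * x[name] for name, coef in LINEAR.items())
    y += sum(coef * x[i] * x[j] for (i, j), coef in INTERACTION.items())
    return y
```

Storing the coefficients in dictionaries keyed by factor label keeps the code aligned term by term with the printed equation, which makes transcription errors easier to spot.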

Analysis of parameter interdependencies using correlation matrix

Figure 6 shows a correlation matrix that indicates how strongly each pair of parameters is related. Positive values indicate a positive correlation between two parameters, while negative values indicate a negative correlation. The intensity of the colors represents the strength of the correlation; darker colors signify stronger correlations.
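A correlation matrix of this kind can be computed as sketched below. The text does not say which correlation coefficient was used, so the Pearson coefficient is an assumption here, and the random data merely stand in for the five inputs and the removal percentage.

```python
import numpy as np

# Synthetic stand-in: 50 rows, 5 input columns plus 1 output column (not the study's data)
rng = np.random.default_rng(0)
data = rng.random((50, 6))

# Pearson correlation between columns; rowvar=False treats each column as one variable
corr = np.corrcoef(data, rowvar=False)
```

Each entry of the resulting matrix lies between -1 and 1, the diagonal equals 1, and the matrix is symmetric, which is why such figures are often drawn as a single triangle or a color-coded heat map.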

Fig. 6

Correlation matrix illustrating the interdependencies among various parameters.

Perturbation plots

The perturbation plot allows the effects of all five process parameters on the removal percentage to be compared at the center point within a single plot. Figure 7 shows the perturbation plot for removal percentage. The removal percentage increases significantly as the sulfur concentration rises, while it decreases significantly with higher fuel/adsorbent ratios and elevated temperatures. In contrast, surface area and contact time have relatively moderate effects.

Fig. 7

Perturbation plots for removal percentage.

Effect of adsorptive desulfurization parameters on removal percentage

In this study, 3D response surface plots were generated using Design-Expert version 13.0 to illustrate the interactions between the important independent variables and their impact on desulfurization efficiency (Fig. 8a–e). Figure 8a shows the removal percentage as a function of the fuel/adsorbent ratio and desulfurization time. The removal percentage increases with longer desulfurization time and a lower fuel/adsorbent ratio, which suggests that longer processing times and a larger amount of adsorbent relative to fuel enhance the desulfurization efficiency. Figure 8b illustrates the removal percentage as a function of sulfur concentration and desulfurization time. The removal percentage increases with both variables, indicating that higher initial sulfur concentrations and extended treatment durations lead to more effective sulfur removal. Figure 8c depicts the removal percentage as a function of sulfur concentration and BET surface area. The removal percentage increases with both variables, suggesting that a larger surface area available for adsorption, along with higher sulfur concentrations, improves the efficiency of sulfur removal. Figure 8d shows the removal percentage as a function of the fuel/adsorbent ratio and sulfur concentration. The removal percentage increases with increasing sulfur concentration and decreasing fuel/adsorbent ratio, implying that higher sulfur concentrations and greater amounts of adsorbent relative to fuel enhance the desulfurization process. Figure 8e illustrates the removal percentage as a function of the fuel/adsorbent ratio and BET surface area. The removal percentage increases with an increase in BET surface area and a decrease in the fuel/adsorbent ratio, which suggests that maximizing the adsorbent’s surface area and optimizing the fuel/adsorbent ratio are crucial for achieving higher desulfurization efficiency.

Fig. 8

(a) Fuel/adsorbent ratios and time on removal percentage; (b) time and sulfur concentration on removal percentage; (c) surface area and sulfur concentration on removal percentage; (d) sulfur concentration and fuel/adsorbent ratios on removal percentage; and (e) fuel/adsorbent ratios and surface area on removal percentage.

These observations provide valuable insights into the optimization of desulfurization processes, highlighting the importance of key parameters such as desulfurization time, sulfur concentration, fuel/adsorbent ratio, and surface area (BET) in enhancing removal efficiency.

Results of ANN

Optimization and prediction

To achieve the best results on the test data, the architecture of the multilayer perceptron network was tuned with respect to the number of layers and neurons in each layer, the activation function of each layer, the number of epochs, and the training function. Similarly, the RBF network was optimized by adjusting the number of neurons, the number of epochs, and the training function. To avoid overtraining and obtain reliable predictions, it is essential to limit the network’s training time and to assess the model’s accuracy using data that were not used in the training phase. Overtraining occurs when the network fits the training data well but performs poorly on new data, which results in low accuracy and a lack of generalizability. Hence, it is crucial to validate the network’s performance by comparing predicted values with test data. The models’ performance was assessed using the criteria defined in the preceding equations, specifically the MAE, MSE, RMSE, and R2. The results are shown in Table 6.
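The four criteria named above can be computed as in the following sketch. It uses the common textbook definitions, with R2 taken as one minus the ratio of residual to total sum of squares; whether these match the paper's earlier equations in every detail is an assumption. The function name and the sample values are illustrative only.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R2 for observed versus predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "MAE": np.mean(np.abs(err)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }

# Illustrative values only (not data from the study)
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

Computing the metrics on the held-out test subset, not on the training subset, is what reveals the overtraining discussed above.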

Table 6 Analytical standards for evaluating various models.

Three training algorithms were compared to determine the best one for the ANN model: Levenberg–Marquardt (trainlm)46, Bayesian Regularization (trainbr)47, and Scaled Conjugate Gradient (trainscg)48. Various activation functions were also explored in this study. The hyperbolic tangent sigmoid function (tansig) was selected for the neurons in the hidden layers, and the linear function (purelin) was chosen for the output layer, as detailed in Table 7.

Table 7 The tansig and purelin activation functions used, where n represents the input signal and an represents the output.
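The two activation functions can be written out explicitly as below. The formulas follow the standard definitions of MATLAB's tansig and purelin transfer functions; the Python function names simply mirror the MATLAB ones and are not part of the study's code.

```python
import numpy as np

def tansig(n):
    """Hyperbolic tangent sigmoid: a = 2 / (1 + exp(-2n)) - 1, equivalent to tanh(n)."""
    n = np.asarray(n, dtype=float)
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def purelin(n):
    """Linear transfer function: a = n."""
    return np.asarray(n, dtype=float)

hidden_out = tansig([-1.0, 0.0, 1.0])   # bounded in (-1, 1)
final_out = purelin([-1.0, 0.0, 1.0])   # unbounded, passes the signal through
```

The bounded tansig output suits the hidden layers, while the unbounded purelin output lets the network reproduce the full range of the normalized target.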

Various performance metrics, such as MSE, R2, and the number of epochs, were assessed to identify the optimal neural network configuration, as shown in Table 8. The MLP model was tested with neuron counts ranging from 1 to 46. The data clearly show that the trainlm algorithm performs better than the other algorithms on all performance metrics. Therefore, the trainlm algorithm was chosen for training the network.

Table 8 The outcome of training algorithms for Backpropagation in Multilayer Perceptron.

The study observed that the mean squared error for the multilayer perceptron network increased with the addition of more hidden layers and neurons. However, beyond two hidden layers, the increase in MSE was not significant. Consequently, the number of hidden layers was capped at three to avoid unnecessary computational slowdowns. For a comparative analysis with the MLP network, the parameters of the Radial Basis Function network needed optimization. The optimization of the spread parameter was carried out using network data, as depicted in Fig. 9. The Radial Basis Function network, producing two outcomes, employed the average MSE to identify the optimal spread. Furthermore, Fig. 10 illustrates that increasing the number of neurons led to a further decrease in MSE and improved the results. The Gaussian function-based RBF model achieved optimal performance with 43 neurons. Figure 11a and b demonstrate that the multilayer perceptron and radial basis function networks achieved the most favorable MSE validation outcomes following 32 and 40 epochs, respectively. The optimal MSE value for the MLP network’s removal percentage was 0.0028, while for the RBF network, it was 0.0026. Figure 12 demonstrates that the regression values R2 for training, testing, and the overall selected structure were nearly identical for both MLP and RBF networks, with R2 = 0.980 for MLP and R2 = 0.981 for RBF. The optimal MLP network configuration consisted of three hidden layers with 9, 17, and 20 neurons, respectively. This configuration was found to be the most effective in achieving low MSE and high regression accuracy, underscoring the importance of careful network architecture design in optimizing desulfurization processes.

Fig. 9

The relationship between the spread and MSE in RBF neural network.

Fig. 10

The relationship between the number of neurons and mean squared error in RBF neural network.

Fig. 11

Validation MSE of the MLP and RBF models, used to assess network performance for removal percentage.

Fig. 12

MLP network regression status of (a) training data, (b) validation data, (c) testing data, and (d) all data.

3D response surfaces

The response surface of the ANN model is depicted in the three-dimensional plots in Fig. 13a–c. In each plot, two factors are varied while the other three are held constant to emphasize the interaction effects between the factors. Compared with the results from the response surface methodology, the ANN surfaces show similar behaviors, trends, and magnitudes for the parameters influencing the response.

Fig. 13

3D response surface plots using an artificial neural network with a multilayer perceptron model for (a) fuel/adsorbent ratios and time on removal percentage, (b) time and sulfur concentration on removal percentage, (c) surface area and sulfur concentration on removal percentage.

Comparison of results between the RSM and ANN

To assess the efficiency of the RSM model and the trained neural networks, a random set of experimental data was selected and fed into the RSM model and the MLP and RBF networks for comparison. The accuracy of these models was evaluated by comparing their predicted response values with the experimental data and calculating the average absolute relative deviation (%AARD). The results of this comparison are shown in Table 9. The neural networks yielded lower %AARD values than the RSM model, suggesting better predictive accuracy. In addition, the RBF network was more accurate than the MLP network in predicting the output parameter.
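The %AARD criterion can be computed as sketched below, using its usual definition: the mean of the absolute relative deviations between predicted and experimental values, multiplied by 100. The function name and the sample values are assumptions made for the illustration and are not data from Table 9.

```python
import numpy as np

def aard_percent(y_exp, y_pred):
    """Average absolute relative deviation in percent: 100/N * sum(|(pred - exp) / exp|)."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_pred - y_exp) / y_exp))

# Illustrative removal percentages only (not from the study)
aard = aard_percent([80.0, 50.0], [84.0, 49.0])
```

Because each deviation is divided by the experimental value, %AARD is undefined when an experimental value is zero and weighs errors on small removal percentages more heavily than errors on large ones.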

Table 9 Comparison of results from several case studies using ANN and RSM.

Conclusion

The growing need for process analysis and optimization, coupled with the increased availability of statistical software and computing power, has contributed significantly to the widespread adoption of response surface methodology and artificial neural network modeling tools. This study focuses on adsorptive desulfurization. Key parameters, namely temperature, concentration, surface area, fuel-to-adsorbent ratio, and time, were used to predict removal efficiency through both response surface methodology and artificial neural network modeling. The response surface methodology approach enabled a detailed examination of the interactions among these factors. Predictions from both modeling techniques were closely aligned with the experimental data. Moreover, response surface methodology was used to create two correlations for the response with the 2FI model. ANOVA results indicated that temperature and concentration were the most influential factors affecting removal efficiency. In the ANN modeling, both MLP and RBF networks were assessed, with minimal differences observed between them. The trainlm algorithm was the most effective of the MLP training algorithms, using the tansig function in the three hidden layers (20, 17, and 9 neurons) and the purelin function in the output layer. The MLP network achieved its optimal performance with an MSE of 0.0028 for removal percentage over 30 epochs, whereas the Radial Basis Function network achieved a lower MSE of 0.0026 for removal percentage over 40 epochs. The ANN and RSM models exhibited average R2 values of 0.981 and 0.919, respectively. ANN models are very useful for optimizing complicated processes such as adsorptive desulfurization because they offer significant advantages over experimental methods in prediction speed, cost efficiency, and adaptability to a wide range of situations. Overall, ANN outperformed RSM in terms of efficiency, offering a dependable and powerful tool for future developments in sulfur removal optimization. This study shows not only how effective ANN is in adsorptive desulfurization but also how it can improve predictive accuracy, reduce the amount of testing required, and expedite process improvements in environmental applications.