Modeling crude oil pyrolysis process using advanced white-box and black-box machine learning techniques

Hadavimoghaddam, Fahimeh; Rozhenko, Alexei; Mohammadi, Mohammad-Reza; Mostajeran Gortani, Masoud; Pourafshary, Peyman; Hemmati-Sarapardeh, Abdolhossein

doi:10.1038/s41598-023-49349-x

Article
Open access
Published: 19 December 2023

Fahimeh Hadavimoghaddam^1,2,
Alexei Rozhenko³,
Mohammad-Reza Mohammadi⁴,
Masoud Mostajeran Gortani⁵,
Peyman Pourafshary⁶ &
…
Abdolhossein Hemmati-Sarapardeh^4,7

Scientific Reports volume 13, Article number: 22649 (2023)

Abstract

Accurate prediction of fuel deposition during crude oil pyrolysis is pivotal for sustaining the combustion front and ensuring the effectiveness of in-situ combustion enhanced oil recovery (ISC EOR). Employing 2071 experimental TGA datasets from 13 diverse crude oil samples extracted from the literature, this study sought to precisely model crude oil pyrolysis. A suite of robust machine learning techniques, encompassing three black-box approaches (Categorical Gradient Boosting—CatBoost, Gaussian Process Regression—GPR, Extreme Gradient Boosting—XGBoost), and a white-box approach (Genetic Programming—GP), was employed to estimate crude oil residue at varying temperature intervals during TGA runs. Notably, the XGBoost model emerged as the most accurate, boasting a mean absolute percentage error (MAPE) of 0.7796% and a determination coefficient (R²) of 0.9999. Subsequently, the GPR, CatBoost, and GP models demonstrated commendable performance. The GP model, while displaying slightly higher error in comparison to the black-box models, yielded acceptable results and proved suitable for swift estimation of crude oil residue during pyrolysis. Furthermore, a sensitivity analysis was conducted to reveal the varying influence of input parameters on residual crude oil during pyrolysis. Among the inputs, temperature and asphaltenes were identified as the most influential factors in the crude oil pyrolysis process. Higher temperatures and oil °API gravity were associated with a negative impact, leading to a decrease in fuel deposition. On the other hand, increased values of asphaltenes, resins, and heating rates showed a positive impact, resulting in an increase in fuel deposition. These findings underscore the importance of precise modeling for fuel deposition during crude oil pyrolysis, offering insights that can significantly benefit ISC EOR practices.

Introduction

In-situ combustion (ISC) is a challenging thermal enhanced oil recovery (EOR) technique, defined as the recovery of oil by burning part of the heavy oil in the reservoir¹. In this technique, pure oxygen or oxygen-enriched gas is injected into the reservoir to combust a portion of the crude. In other words, a portion of the oil-in-place is oxidized and used as fuel to generate heat². The combustion front is initiated by either artificial or spontaneous ignition of the oil. Artificial methods, such as gas/air burners, steam/hot fluid injection, or electric ignition, can be employed to ignite the oil deliberately. Alternatively, spontaneous ignition occurs at or near the injection well, often facilitated by downhole igniters³. As oxygen-enriched gas is continuously injected, the combustion front propagates toward the production well. This releases a large amount of heat within the reservoir, reducing oil viscosity and enabling oil recovery. One prerequisite for ISC is that enough fuel is available in the reservoir to sustain the combustion front. The fuel consumed during ISC consists of carbonaceous residues (mainly coke) deposited around the combustion front as a result of thermal cracking, pyrolysis, and distillation of crude oil. Eventually, the recovery of unburned oil is enhanced by the displacing action of the gases and heat released from combustion, along with changes in the physical and chemical properties of the reservoir oil^1,4,5. During ISC, oxidation and pyrolysis of hydrocarbons take place, which strongly affect the quantity and quality of the fuel required to sustain the combustion front. Pyrolysis is a chemical reaction in which crude oil is exposed to heat in the absence of an oxidizing medium^5,6,7. Pyrolysis, cracking, vaporization, condensation, and dehydrogenation may occur during ISC; these affect the physical and chemical properties of the carbonaceous residue and are important for oil production^5,7.

Over time, thermogravimetric analysis (TGA) and differential thermogravimetry (DTG) have been employed as investigative tools for studying ISC processes^5,8. Specifically, TGA can monitor weight changes during the combustion of fuels or residues, although it does not fully simulate the intricate phenomena that occur during ISC. Ciajolo and Barbella⁶ investigated the oxidation and pyrolysis of several heavy oils and their fractions using DTG profiles. They found low-temperature (< 400 °C) and high-temperature phases in the thermal behavior of the fuel: the first involves the volatilization of paraffinic and aromatic fractions, and the second the pyrolysis of polar and asphaltene fractions, which produces a particulate carbon residue. Ranjbar and Pusch⁹ showed experimentally that the heat transfer and transferability characteristics of the pyrolysis medium, as well as the colloidal composition of the oil (such as asphaltenes and resins), have a noticeable impact on fuel formation and composition. In another study, Ranjbar¹⁰ showed that clay minerals in the matrix increase fuel deposition during pyrolysis and catalyze the oxidation of fuel. Kok¹¹ studied two heavy crudes by differential scanning calorimetry (DSC) and TGA and showed that the heavier oil deposited larger quantities of residue/fuel after distillation was complete. Karacan and Kok¹² analyzed the pyrolysis of crude oils and their fractions using TGA and DSC and showed that asphaltenes and resins, in that order, contribute most to coke formation. In another laboratory study, Kok and Karacan¹³ showed that the cracking activation energy increases as the crude oil's °API decreases. They also identified two main mechanisms of mass loss along with their temperature ranges: distillation (20–400 °C), and thermal cracking and vis-breaking (400–600 °C).
Ambalae et al.¹⁴ experimentally indicated that asphaltenes have the largest role in the formation of coke (fuel) among other fractions of crude oil. Kok¹⁵ showed that the heating rate influenced the reaction region peak, intervals, and burn-out temperatures in TG-DTG experiments of crude oil combustion. Li et al.¹⁶ showed that pyrolyzed and oxidized cokes are the main types of coke in the ISC process, releasing more heat than crude oil under similar conditions. A lot of research has been done on the catalytic impact of different compounds on oxidation and pyrolysis of various crudes and cokes^{17,18,19,20,21,22}. The kinetics of combustion and pyrolysis of crudes and their fractions and the deposited coke have been investigated in some studies^{13,22,23,24,25,26,27,28}.

Despite extensive laboratory research on crude oil pyrolysis and oxidation, recent attention has shifted to modeling approaches, particularly machine learning regression. This technique is valuable for understanding the complex relationships within crude oil pyrolysis and oxidation, given the multitude of influencing parameters in ISC. Rasouli et al.²⁹ investigated the pyrolysis of six crudes and presented a multilayer perceptron model that predicts the crude oil residue from TGA data with a 3.5% error. Norouzpour et al.³⁰ modeled crude oil pyrolysis using a radial basis function neural network based on TGA of six crudes, with a 5.8% error. Mohammadi et al.³¹ collected experimental TGA data on the oxidation of nine crude oils and presented a generalized regression neural network model with an error of 2.3%. In another study, Mohammadi et al.³² modeled the pyrolysis of 11 crude oils from TGA data by applying a cascade forward neural network with an error of 1.04%. Although several models for predicting crude oil pyrolysis exist, gathering more experimental data and applying state-of-the-art black-box and white-box machine learning techniques can produce streamlined mathematical correlations and more precise models.

In this study, 2071 experimental TGA data points for 13 different crude oils are gathered from the literature in order to precisely represent crude oil pyrolysis, which is a crucial reaction in the ISC EOR process. Four machine learning techniques, comprising three black-box approaches (Categorical Gradient Boosting, CatBoost; Gaussian Process Regression, GPR; Extreme Gradient Boosting, XGBoost) and one white-box approach (Genetic Programming, GP), are used to model the residual mass of crude oils at various temperatures obtained from TGA. Statistical and graphical error analyses are used to validate the developed models and the mathematical correlation. Finally, a sensitivity analysis is carried out to quantify the relative effect of each input on the crude oil residue obtained during pyrolysis.

Data gathering and preparation

In this study, 2071 experimental TGA findings related to 13 distinct crude oils were gathered from the literature^{12,13,24,29,30,33,34,35} in order to precisely represent crude oil pyrolysis, which is a crucial reaction in the ISC EOR process. The database used in this work is more comprehensive than the one used in Mohammadi et al.’s study³¹ (i.e. 2015 TGA data for 11 distinct crude oils). Since the kind of crude oil affects how it is pyrolyzed, a variety of crude oils with the characteristics mentioned in Table 1 were chosen to serve as input data for our models.

Table 1 Parameters of different crude oils utilized in the research.

For model training, factors identified in the literature as important during the pyrolysis of crude oil^{9,13,33,36,37} were taken into consideration. The models' input parameters were the temperature, heating rate, weight percentages of asphaltenes and resins, and oil °API gravity. Since these values are commonly reported, a sufficiently large database is available for training the models. The model output was the residual mass of crude oil at various temperatures. Table 1 lists the characteristics of the crudes and the heating rates used in this study. Additionally, Table 2 gives the statistical description of the output parameter and of every model input variable, and Fig. 1 depicts the distribution of all the variables.

Table 2 The statistical specifications of the input and target parameters of models.

An asymmetrical distribution, unlike a symmetrical one, deviates from the regular, balanced pattern typically illustrated by a bell curve. Skewness quantifies this asymmetry: it is positive when most of the data lie on the left side of the probability function, and negative in the opposite case. Kurtosis, in turn, describes the shape of the distribution relative to the normal distribution. For instance, a positive kurtosis means that the distribution has a sharper peak than the normal distribution³⁸. According to the data in Table 2 and Fig. 1, the distribution and range of the input variables are broad enough to support a general model for predicting crude oil pyrolysis. Oil °API gravity, heating rate, and especially asphaltene content have a number of outliers, which can influence the precision of the models. However, the vast majority of observations lie within the box borders, so the impact of these points on the error is small. A thorough examination confirms that the outliers are valid measurements that simply differ statistically from the majority of the data, and the modeling results show that they do not significantly skew the errors. Residual crude oil and temperature are effectively continuous, with small spacing between observations, whereas oil °API gravity, heating rate, asphaltene, and resin take values separated by considerable gaps. The medians of the temperature and the residual crude oil used to develop the models are 385.71 and 40.22, respectively. The medians of resin, asphaltene, heating rate, and oil °API gravity are 14, 9.66, 8, and 20.26, respectively.
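As a minimal sketch of how skewness and kurtosis can be computed, the following standard-library Python uses the population (moment-based) definitions and reports excess kurtosis, so a normal distribution scores zero. The sample values are invented for illustration and are not taken from Table 2; the exact estimator behind Table 2 is not stated in the text, so this choice is an assumption.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; positive when the long tail is on the right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical samples, for illustration only.
right_tailed = [1, 1, 1, 2, 2, 3, 10]   # most values on the left, long right tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed), skewness(symmetric))
print(excess_kurtosis(symmetric))
```

A sample whose mass sits on the left with a long right tail gives positive skewness, which matches the sign convention described above.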

Figure 2 shows the correlation matrix of the input data. Temperature has by far the greatest influence on the residual mass, accounting for around 92% of its behavior. The correlation is negative: the higher the temperature, the lower the mass, and vice versa. The other parameters have a much smaller effect on the target than temperature, but they are essential for distinguishing between crude oils and for modeling crude pyrolysis.

The dataset was randomly partitioned into a training set comprising 80% of the data and a test set comprising the remaining 20%. The models were fitted on the training subset, while the test subset was used to assess predictive performance. In addition, 6-fold cross-validation was applied so that each training observation had an equal chance of being used for fitting and validation: the training data were randomly split into 6 folds, the model was fitted on 5 folds and validated on the remaining fold, a choice suited to the dataset size.
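The partitioning scheme can be sketched with index bookkeeping alone. The snippet below is an illustrative standard-library version, not the authors' code: the function names and the random seed are assumptions, and only the 80/20 ratio, the 6 folds, and the 2071 data points come from the text.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train_indices, k=6, seed=0):
    """Yield (fitting, validation) index lists for k-fold cross-validation."""
    shuffled = list(train_indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        fitting = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fitting, folds[i]


train_idx, test_idx = train_test_split_indices(2071)
splits = list(k_fold_indices(train_idx, k=6))
print(len(train_idx), len(test_idx), len(splits))
```

Each training observation appears in exactly one validation fold, which is the "equal chance" property described above.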

Model development

In this study, four machine learning approaches were used to calculate the residual crude oil during pyrolysis. One of these techniques is a white-box method, and the other three are black-box methods. The flowchart in Fig. 3 gives a general schematic of the research, showing the main steps of each stage.

Gaussian process regression (GPR)

GPR is a widely used nonparametric modeling approach that places a Gaussian process prior over functions before performing regression³⁹. It combines a Gaussian process prior, updated through Bayesian inference, with a regression residual term.

A Gaussian process is a collection of random variables that describes a distribution over functions. Since a mean function $m(x)$ and a covariance function $k(x, x')$ fully specify a real process $f(x)$, it can be expressed as⁴⁰:

$$f\left(x\right)\sim GP\left(m\left(x\right), k\left(x, x'\right)\right)$$

(1)

The objective of GPR is to determine the mapping correlation between the input vector x and the observable y for a specific training dataset $D={\left\{{x}_{i}, {y}_{i} \right\}}_{i=1}^{n}$, where ${x}_{i}$ is the input vector of the ith sample and ${y}_{i}$ is the observation value of the ith sample⁴¹:

$$y=f\left(x\right)+ \zeta $$

(2)

where $\zeta$ is additive noise that follows a Gaussian distribution with zero mean and variance ${\sigma }_{n}^{2}$. The covariance of the noisy measurements y is $K\left(X, X\right)+ {\sigma }_{n}^{2}{I}_{n}$, where $y={[{y}_{1}, {y}_{2}, \dots ,{y}_{n}]}^{T}$, $X={[{x}_{1}, {x}_{2}, \dots ,{x}_{n}]}^{T}$, K is the covariance matrix, and ${I}_{n}$ is the n-dimensional identity matrix. Therefore, the joint distribution of the observations and the function value at a test sample ${x}_{*}$ under the prior may be written as⁴¹:

$$\left[\begin{array}{c}y\\ f\left({x}_{*}\right)\end{array}\right]\sim N\left(0,\left[\begin{array}{cc}K(X,X)+{\sigma }_{n}^{2}{I}_{n}& K\left(X,{x}_{*}\right)\\ K\left({x}_{*},X\right)& K\left({x}_{*},{x}_{*}\right)\end{array}\right]\right)$$

(3)

According to Eq. (3), the mean of $f\left({x}_{*}\right)$ and covariance of $f\left({x}_{*}\right)$ may be written as⁴¹:

$$m\left(f\left({x}_{*}\right)\right)=E\left[f\left({x}_{*}\right)\mid X,y,{x}_{*}\right]=K\left({x}_{*},X\right){\left[K(X,X)+{\sigma }_{n}^{2}{I}_{n}\right]}^{-1}y$$

(4)

$${\text{cov}}\left(f\left({x}_{*}\right)\right)=K\left({x}_{*},{x}_{*}\right)-K\left({x}_{*},X\right){\left[K(X,X)+{\sigma }_{n}^{2}{I}_{n}\right]}^{-1}K\left(X,{x}_{*}\right)$$

(5)

In conventional GPR (CGPR), the entire training dataset $D={\left\{{x}_{i}, {y}_{i} \right\}}_{i=1}^{n}$ is used to build the nonparametric model and to compute the prediction for a given test sample. The dimension of the covariance matrix K in CGPR is $n\times n$.
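Equations (4) and (5) can be illustrated with a small one-dimensional example. The sketch below uses only the standard library and makes several assumptions that are not from the paper: a squared-exponential kernel with unit length scale, a zero prior mean as in Eq. (3), a tiny noise variance, and invented training points. The helper `solve` is a plain Gaussian elimination written for this example.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance k(a, b); the kernel choice is an assumption."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gpr_predict(x_train, y_train, x_star, noise_variance=1e-6):
    """Posterior mean (Eq. 4) and variance (Eq. 5) at a single test input."""
    n = len(x_train)
    # K(X, X) + sigma_n^2 * I_n
    k_noisy = [[rbf_kernel(x_train[i], x_train[j]) + (noise_variance if i == j else 0.0)
                for j in range(n)] for i in range(n)]
    k_star = [rbf_kernel(x_star, xi) for xi in x_train]        # K(x*, X)
    alpha = solve(k_noisy, y_train)                            # [K + sigma_n^2 I]^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    weights = solve(k_noisy, k_star)                           # [K + sigma_n^2 I]^-1 K(X, x*)
    variance = rbf_kernel(x_star, x_star) - sum(k * w for k, w in zip(k_star, weights))
    return mean, variance


# Invented training data, for illustration only.
x_train, y_train = [0.0, 1.0, 2.0], [1.0, 0.5, -0.2]
print(gpr_predict(x_train, y_train, 1.0))   # at a training input
print(gpr_predict(x_train, y_train, 10.0))  # far from the data
```

At a training input the posterior mean stays close to the observed value and the variance is nearly zero, while far from the data the prediction falls back to the prior. The two linear solves over the $n\times n$ matrix are the reason the cost of CGPR grows quickly with the number of training samples.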

Extreme gradient boosting (XGBoost)

Boosting refers to a family of learning techniques that improve the fit of the final model by combining base models built from simple functions⁴². Combining base models of fairly low precision⁴³ creates a scalable solution that can identify deep interactions and is less susceptible to anomalies⁴⁴. The model is developed with the gradient boosting approach, which comprises an efficient linear model solver and a tree learning algorithm. The boosting method supports several objective functions, including regression, classification, and ranking. XGBoost, an open-source software package, delivers state-of-the-art solutions to a variety of problems, notably climate projections^45,46. With its scalable tree-boosting method, XGBoost runs more than 10 times faster than existing popular solutions on a single machine⁴⁴. XGBoost has many parameters, which makes it a complicated model, and its hyperparameters must be tuned to limit the risk of over-fitting and the variability of predictions⁴⁷. The number of iterations (n estimators) and the learning rate are the two key hyperparameters for controlling overfitting in XGBoost. The n estimators parameter relates to the complexity of the model; raising it may yield a more powerful model, but one that can overfit. The number of iterations governs the degree of fit and therefore influences the optimal learning rate, and vice versa. Reducing the learning rate often improves generalization and may significantly enhance predictive accuracy⁴⁸. The regularization term, proposed by Friedman⁴³, helps avoid overfitting and controls the model's complexity. During tuning, regularization factors such as lambda and alpha should be adjusted to the required regularization weight in order to improve the quality of the model.
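The interplay between the number of iterations and the learning rate can be seen in a bare-bones gradient boosting loop. The sketch below is plain gradient boosting with one-split regression stumps and squared-error loss; it is not XGBoost itself (there is no regularization term and no second-order information), and the toy temperature/residue values are invented for illustration.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump on a 1-D feature (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the maximum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(xs, ys, n_estimators=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals (negative gradient of squared loss)."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [p + learning_rate * (left_value if x <= threshold else right_value)
                       for x, p in zip(xs, predictions)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    """Sum the shrunken stump outputs on top of the base prediction."""
    total = model["base"]
    for threshold, left_value, right_value in model["stumps"]:
        total += model["learning_rate"] * (left_value if x <= threshold else right_value)
    return total


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical TGA-like curve: temperature (deg C) vs residual mass (%).
xs = [50, 100, 150, 200, 250, 300, 350, 400, 450, 500]
ys = [99, 97, 93, 85, 70, 52, 35, 22, 15, 12]
for n in (0, 10, 200):
    print(n, mse(fit_boosting(xs, ys, n_estimators=n), xs, ys))
```

With a small learning rate, many iterations are needed before the training error becomes small, which is why the two hyperparameters are tuned together. The training error alone says nothing about overfitting, so a held-out set or cross-validation is still needed to choose them.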

Categorical gradient boosting (CatBoost)

CatBoost is a machine learning technique based on gradient boosting decision trees (GBDT) that was developed by Yandex researchers in 2017^49,50. Through ordered boosting, it improves on GBDT, ensures that the whole dataset can be used for training, and reduces over-fitting⁵¹. Owing to its effectiveness, CatBoost has been employed in various fields, notably driving style identification⁵² and diabetes diagnosis⁵³. The traditional GBDT method replaces a categorical feature with the average label value of that category. In a decision tree, the mean label value is then used as the criterion for splitting nodes. This technique is referred to as greedy target-based statistics (greedy TBS) and is described as follows⁴⁹:

$$\frac{\sum_{j=1}^{n} \left[{x}_{j,k}={x}_{i,k}\right]{Y}_{j}}{\sum_{j=1}^{n} \left[{x}_{j,k}={x}_{i,k}\right]}$$

(6)

In general, though, features contain more information than labels. When the mean label value is forced to represent a feature, a conditional shift occurs. Given the set of observations $D = \{(X_{i}, Y_{i})\}_{i = 1,\ldots, n}$ and a random permutation $\sigma=(\sigma_{1},\ldots,\sigma_{n})$, ${x}_{{\sigma }_{p},k}$ may be replaced with⁴⁹:

$$\frac{\sum_{j=1}^{p-1} \left[{x}_{{\sigma }_{j},k}={x}_{{\sigma }_{p},k}\right]{Y}_{{\sigma }_{j}}+aP}{\sum_{j=1}^{p-1} \left[{x}_{{\sigma }_{j},k}={x}_{{\sigma }_{p},k}\right]+a}$$

(7)

Here, P is the prior value and a is its weight (a > 0). Adding the prior reduces the noise produced by low-frequency categories.
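Equation (7) can be sketched as a single pass over the permuted samples, where each sample is encoded using only the labels that precede it in the permutation. The code below is an illustrative standard-library version, not CatBoost's implementation; the function name, the optional `permutation` argument, and the toy categories are assumptions. The inputs in this study are all numeric, so the categorical feature here is purely hypothetical.

```python
import random


def ordered_target_statistic(categories, targets, prior, prior_weight=1.0,
                             permutation=None, seed=0):
    """Encode each category value from the labels that precede it in a permutation."""
    n = len(categories)
    if permutation is None:
        permutation = list(range(n))
        random.Random(seed).shuffle(permutation)
    label_sums, counts = {}, {}
    encoded = [0.0] * n
    for index in permutation:
        category = categories[index]
        running_sum = label_sums.get(category, 0.0)
        running_count = counts.get(category, 0)
        # (sum of earlier labels in this category + a * P) / (earlier count + a)
        encoded[index] = (running_sum + prior_weight * prior) / (running_count + prior_weight)
        label_sums[category] = running_sum + targets[index]
        counts[category] = running_count + 1
    return encoded


# Hypothetical categorical feature and labels, processed in their given order.
example = ordered_target_statistic(["a", "b", "a", "a"], [1.0, 0.0, 3.0, 5.0],
                                   prior=2.0, prior_weight=1.0,
                                   permutation=[0, 1, 2, 3])
print(example)
```

The first occurrence of each category receives the prior alone, and a sample's own label never enters its encoding, which is how the ordered scheme avoids the leakage present in the greedy statistic of Eq. (6).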

Genetic programming (GP)

GP is a frequently used method in evolutionary computing^54,55. It may be used to locate global optimum solutions in a wrapped search space, and it can generate optimization algorithms inspired by Darwin's theory of evolution⁵⁶. GP follows an evolutionary path that includes selection, crossover, mutation, and cloning to search for symbolic expressions that best relate a set of independent (input) variables to the dependent (output) variables⁵⁷. GP can optimize the model structure on its own, its results are symbolic in nature, and its representation is flexible. These qualities make GP an excellent method for symbolic regression. In contrast to black-box models, GP-evolved solutions offer strong interpretability in terms of how features are learned or extracted from the signals and how they influence the result⁵⁸.

Model optimization and tuning

Hyperparameter selection is crucial for algorithm performance. Tuning these parameters adapts the model to the specific characteristics of the data, which significantly affects accuracy and improves predictive capability⁵⁹. For each model, grid search was used to optimize the hyperparameters and to limit overfitting. The hyperparameters tuned differed from model to model and were chosen on both theoretical and practical grounds. Table 3 provides an overview of the selected hyperparameters for the algorithms implemented in this work.

Table 3 Optimal features for implemented models.

Evaluation of models

The accuracy of the proposed models was evaluated using seven statistical indicators: MAPE, SD, RMSE, R², MAE, MBE, and NSE. These indicators were selected because they are commonly regarded as among the most representative and effective in statistics and machine learning. They are defined as follows⁶⁰:

Mean absolute percentage error (MAPE, %):

$${E}_{r}=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{{y}_{exp}-{y}_{pred}}{{y}_{exp}}\right|\times 100$$

(8)

Standard deviation (SD):

$${\text{SD}}=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}{\left(\frac{{y}_{exp}-{y}_{pred}}{{y}_{exp}}\right)}^{2}}$$

(9)

Root mean square error (RMSE):

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({y}_{exp}-{y}_{pred})}^{2}}$$

(10)

Determination coefficient (R²):

$$R^{2}=1-\frac{\sum_{i=1}^{n}{({y}_{exp}-{y}_{pred})}^{2}}{\sum_{i=1}^{n}{({y}_{exp}-\overline{{y}_{exp}})}^{2}}$$

(11)

where n is the number of data points, y_exp refers to the experimental data, and y_pred stands for the data predicted by the presented models.

Mean absolute error (MAE):

This metric is a risk measure corresponding to the expected value of the absolute error loss, or $\ell_{1}$-norm loss. If $\hat{y}_{i}$ is the predicted value of the ith sample and ${y}_{i}$ is the corresponding true value, then the MAE over ${n}_{\text{samples}}$ is given by:

$${\text{MAE}}\left(y,\hat{y}\right)=\frac{1}{{n}_{\text{samples}}}\sum_{i=0}^{{n}_{\text{samples}}-1}\left|{y}_{i}-\hat{y}_{i}\right|$$

(12)

Mean bias error (MBE):

This parameter quantifies the average bias of the predictions, where $\hat{y}_{i}$ is the predicted value and ${y}_{i}$ is the corresponding experimental value, and is computed as:

$$MBE=\frac{1}{n}\sum_{i=1}^{n} \left({\hat{y}}_{i}-{y}_{i}\right)$$

(13)

The Nash–Sutcliffe efficiency (NSE):

It is a normalized measurement that compares the residual variation (or "noise") to the variation of the observed data.

$$NSE=1-\frac{{\sum }_{t=1}^{T}{\left({y}_{o}^{t}-{y}_{m}^{t}\right)}^{2}}{{\sum }_{t=1}^{T}{\left({y}_{o}^{t}-{\overline{y} }_{o}\right)}^{2}}$$

(14)

Here, ${\overline{y} }_{o}$ is the mean of the observed data, ${y}_{m}^{t}$ is the modeled (predicted) value for observation t, and ${y}_{o}^{t}$ is the observed value for observation t.
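The seven indicators of Eqs. (8) to (14) can be computed together in a few lines. The following standard-library sketch follows the equations as written above; the function name and the four data points are illustrative assumptions, not results from Table 4. Note that, as written, Eqs. (11) and (14) have the same form, so R² and NSE coincide here.

```python
import math


def regression_metrics(y_exp, y_pred):
    """Return MAPE, SD, RMSE, R2, MAE, MBE and NSE for paired observations."""
    n = len(y_exp)
    relative = [(e - p) / e for e, p in zip(y_exp, y_pred)]
    squared = [(e - p) ** 2 for e, p in zip(y_exp, y_pred)]
    mean_exp = sum(y_exp) / n
    total_variation = sum((e - mean_exp) ** 2 for e in y_exp)
    r2 = 1.0 - sum(squared) / total_variation
    return {
        "MAPE": 100.0 / n * sum(abs(r) for r in relative),          # Eq. (8)
        "SD": math.sqrt(sum(r ** 2 for r in relative) / (n - 1)),   # Eq. (9)
        "RMSE": math.sqrt(sum(squared) / n),                        # Eq. (10)
        "R2": r2,                                                   # Eq. (11)
        "MAE": sum(abs(e - p) for e, p in zip(y_exp, y_pred)) / n,  # Eq. (12)
        "MBE": sum(p - e for e, p in zip(y_exp, y_pred)) / n,       # Eq. (13)
        "NSE": r2,                                                  # Eq. (14), same form as Eq. (11)
    }


# Invented residual-mass values (%), for illustration only.
metrics = regression_metrics([100.0, 80.0, 50.0, 20.0], [98.0, 82.0, 50.0, 21.0])
for name, value in metrics.items():
    print(name, round(value, 4))
```

Because MAPE and SD divide by the experimental value, they become unstable when the residual mass approaches zero, which is worth keeping in mind for the high-temperature end of a TGA run.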

In combination with the statistical method, graphical analysis was utilized to verify the accuracy of the models. The following is a brief summary of what these graphical analyses imply⁶¹:

The error distribution plot shows the percent relative error (E_i), computed with the equation below, against the experimental values or an input variable. This graph illustrates the error pattern and how the E_i values are distributed around the zero-error line.

$${E}_{i}=\left[\frac{{y}_{i,exp}- {y}_{i,pred}}{{y}_{i,exp}}\right]\times 100, \quad i=1,2,3,\dots ,n$$

(15)

In a cross-plot, the predicted values are plotted against the experimental values; the more closely the points cluster along the Y = X line, the more accurate the model.

A graph of cumulative frequency versus absolute relative error (E_a) displays the accuracy of the model in predicting any given percentage of the data. E_a is computed using the following equation:

$${E}_{a}=\left|\frac{{y}_{i,exp}- {y}_{i,pred}}{{y}_{i,exp}}\right|\times 100, \quad i=1,2,3,\dots ,n$$

(16)
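The cumulative frequency curve built from Eq. (16) can be sketched as follows. This is an illustrative standard-library example; the function names, thresholds, and data points are assumptions chosen for readability.

```python
def absolute_relative_errors(y_exp, y_pred):
    """E_a of Eq. (16), in percent, for each data point."""
    return [abs((e - p) / e) * 100.0 for e, p in zip(y_exp, y_pred)]


def cumulative_frequency(errors, thresholds):
    """Percentage of data whose absolute relative error is at or below each threshold."""
    return [100.0 * sum(1 for e in errors if e <= t) / len(errors) for t in thresholds]


# Invented values, for illustration only.
errors = absolute_relative_errors([100.0, 80.0, 50.0, 20.0], [98.0, 82.0, 50.0, 21.0])
curve = cumulative_frequency(errors, thresholds=[1.0, 3.0, 10.0])
print(curve)
```

Reading the curve at a given error level gives the share of data predicted at least that accurately, which is how statements such as "90% of the data have an error below a given value" are obtained.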

Results and discussion

Developed correlation

The GP algorithm produced the following correlation for predicting the target parameter. To optimize the model, a thorough grid search was carried out over the population size, tree depth, tree length, maximum number of generations, and related settings. The resulting explicit equation uses four input parameters (temperature, heating rate, asphaltene content, and oil °API gravity) and 11 coefficients ($c_0$ to $c_{10}$).

$$\text{Residual Crude Oil}=\left(\left[\left({c}_{0}-{c}_{1}\cdot \text{Temperature}\right)\cdot {c}_{2}\cdot \text{Temperature}+\left({c}_{3}+{c}_{4}\cdot \text{Temperature}\right)\right]\cdot \left[\left({c}_{5}\cdot \text{HeatingRate}+{c}_{6}\cdot \text{Asphaltene}\right)-{c}_{7}\cdot \text{Temperature}\right]-{c}_{8}\cdot \text{API}\right)\cdot {c}_{9}+{c}_{10}$$

(17)

$$\begin{aligned} {c}_{0}&=22.115\\ {c}_{1}&=0.030446\\ {c}_{2}&=0.002202\\ {c}_{3}&=18.099\\ {c}_{4}&=0.0057179\\ {c}_{5}&=3.2189\\ {c}_{6}&=2.8308\\ {c}_{7}&=1.1226\\ {c}_{8}&=1.9776\\ {c}_{9}&=0.0055491\\ {c}_{10}&=105.47\end{aligned}$$
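Equation (17) is simple enough to evaluate directly, which is its main practical appeal. The sketch below transcribes the correlation and its coefficients into a Python function. The function name is hypothetical, and the input units (temperature in °C, heating rate in °C/min, asphaltene in wt%) are assumed from the data description rather than stated alongside the equation. The example call uses the median input values reported in the data section.

```python
# Coefficients c0..c10 as listed for Eq. (17).
COEFFICIENTS = (22.115, 0.030446, 0.002202, 18.099, 0.0057179,
                3.2189, 2.8308, 1.1226, 1.9776, 0.0055491, 105.47)


def residual_crude_oil(temperature, heating_rate, asphaltene, api):
    """Evaluate the GP correlation of Eq. (17); units are assumed, see the lead-in."""
    c = COEFFICIENTS
    thermal_term = (c[0] - c[1] * temperature) * c[2] * temperature + (c[3] + c[4] * temperature)
    composition_term = (c[5] * heating_rate + c[6] * asphaltene) - c[7] * temperature
    return (thermal_term * composition_term - c[8] * api) * c[9] + c[10]


# Median inputs from the dataset description: T = 385.71, rate = 8, asphaltene = 9.66, API = 20.26.
at_median = residual_crude_oil(385.71, 8.0, 9.66, 20.26)
at_higher_temperature = residual_crude_oil(500.0, 8.0, 9.66, 20.26)
print(at_median, at_higher_temperature)
```

For these inputs the predicted residue decreases as temperature rises, in line with the negative temperature effect described in the text. Resin content does not appear in the evolved expression, and the correlation should only be trusted within the ranges of the training data.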

Figure 4 represents the schematic of the GP employed for estimating the residual crude oil during pyrolysis.

Statistical evaluation of models

According to the statistical analysis in Table 4, the XGBoost model provides the highest accuracy and reliability on all indicators. Its R² and NSE are almost equal to 1, and its RMSE, SD, MAPE, MBE, and MAE are extremely small for the training, testing, and total datasets. The GP approach is the least accurate of the four techniques, yet its precision is still high, with an R² of 0.9820 for all portions of the data. GPR and CatBoost occupy the middle positions: both perform well, with RMSE, SD, MBE, and MAE close to zero and R² above 0.99, but GPR is better on every indicator except MBE. All of these models outperform previously published models and correlations in terms of the precision of the forecasts. In summary, the ranking from best to weakest performance is XGBoost, GPR, CatBoost, and GP.

Table 4 The statistical errors of the proposed models in train, test, and total data sets.

Graphical evaluation of models

The graphical assessment began with cross-plots of model predictions versus experimental data points, shown in Fig. 5. In these diagrams, the data points fall along a line of unit slope, indicating that the predicted and experimental values agree closely for all models except GP. Although XGBoost, GPR, and CatBoost all follow the unit-slope line, they differ from one another: GPR and CatBoost show a few minor outliers, whereas the XGBoost points lie most tightly along the line.

A residual plot shows residuals on the vertical axis and the independent variable on the horizontal axis. A residual is the difference between the experimental and predicted values. According to Fig. 6, XGBoost performs best, with the smallest y-axis range (from approximately −1 to 1) and the fewest outliers. GP is the least accurate, with residuals spread from −20 to 20. GPR and CatBoost are similar in the range that contains most observations, yet the outliers of CatBoost make it less precise than GPR.

A histogram of the error distribution shows how frequently errors of each magnitude occur. Based on Fig. 7, the distributions for all approaches are tightly centered with little deviation. Most observations lie at zero relative error for both training and testing. XGBoost again leads, with the narrowest relative error spread, from roughly −0.14 to 0.08 for both training and testing.

Figure 8 illustrates the relative deviation of the models' results. The horizontal axis represents experimental values, while the vertical axis represents the relative deviation of model results from experimental values. The relative deviations of the proposed models are generally spread around the zero-deviation line, indicating that the models predict the target data with tolerable error. As in the previous cases, XGBoost is the most effective technique, with the smallest relative error range, from about −0.14 to 0.08. GP has the widest range, from about −0.5 to 0.5, which indicates its lowest level of accuracy.

A cumulative frequency plot shows the share of data whose absolute relative error falls at or below a given value. As depicted in Fig. 9, XGBoost, GPR, and CatBoost are the most effective methods for predicting the target parameter, as the relative error of 90% of the data does not exceed 10–15%. XGBoost is the most precise, with around 99% of the data having a relative error of at most roughly 5–7%. The graph shows that GP performs worse than the black-box models. Although the lower accuracy of the developed correlation is evident and could be expected from the outset, its advantage is fast prediction without the artificial intelligence expertise that is usually required to use black-box models.

Compared with previous research, this study uses a dataset of 2071 experimental TGA data points for 13 distinct crude oil samples, which provides a comprehensive foundation for the modeling approach. This dataset is the most extensive compilation used for modeling crude oil pyrolysis to date. The application of advanced machine learning techniques led to models with high accuracy. Specifically, the XGBoost model achieved an overall MAPE of 0.7796% and an R² of 0.9999. This result compares favorably with prior investigations that also sought to model crude oil pyrolysis and predict fuel deposition. Notably, Rasouli et al.²⁹ developed a multilayer perceptron model with a 3.5% error for the pyrolysis of 6 crudes, Norouzpour et al.³⁰ employed a radial basis function neural network with a 5.8% error for the pyrolysis of 6 crudes, and Mohammadi et al.³² used a cascade forward neural network with a 1.04% error for the pyrolysis of 11 crudes. While these models performed respectably, the current study both extends the dataset and applies a variety of machine learning techniques, improving accuracy and robustness. Furthermore, this study introduces a straightforward mathematical correlation with a 9.73% error. Formulating an explicit correlation between input and output data is difficult with opaque methods. Black-box models also demand suitable computing resources and specialized expertise, which limits their accessibility. Consequently, user-friendly mathematical correlations developed with white-box algorithms can streamline the prediction of fuel formation during crude oil pyrolysis, offering rapid predictions without the need for specialized tools.

Trend analysis

Lastly, Figs. 10, 11 and 12 illustrate how the XGBoost model predicts residual crude oil during pyrolysis as a function of temperature for various crudes and heating rates. The oil samples in each graph were taken from the same study to guarantee identical TG experimental settings. Table 1 provides a summary of the heating rates and characteristics of these crude oils. The TG curves of several crudes (Oil 1, 2, 3, 5, and 6) with respect to temperature are shown in Fig. 10a,b. As shown in Fig. 10, the XGBoost model successfully reproduces the experimental trend for various heating rates and oils. Because crude oils differ in composition, their TG curves are likewise distinct. Heavy crudes often leave more residue because they contain more asphaltenes. Here, the XGBoost model accurately captures the TG curve trend and forecasts the quantity of residue for each crude oil sample at various temperatures.

Figure 11 displays the TG curves for crude sample #5 at three different heating rates. As shown in Fig. 11, when the heating rate decreases, the crude oil's TG curve shifts to the left as a consequence of the longer exposure to heat. The XGBoost model successfully predicts the experimental trend and tracks the influence of the heating rate.
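One simple way to quantify the left or right shift of a TG curve is to compare the temperature at which each curve falls to a fixed residue level. The sketch below does this by linear interpolation; it is an illustrative helper, not part of the study, and the function name `temperature_at_residue` and both curves are hypothetical.

```python
def temperature_at_residue(temps, residues, level):
    """Temperature at which a falling TG curve first reaches `level` (wt%).

    Interpolates linearly between neighbouring points and returns None
    if the curve never drops to that level.
    """
    points = list(zip(temps, residues))
    for (t0, r0), (t1, r1) in zip(points, points[1:]):
        if r0 >= level >= r1 and r0 != r1:
            return t0 + (r0 - level) / (r0 - r1) * (t1 - t0)
    return None


# Hypothetical TG curves for one crude at two heating rates (illustration only).
temps = [100.0, 200.0, 300.0, 400.0, 500.0]   # temperature, deg C
slow_rate = [100.0, 80.0, 40.0, 10.0, 5.0]    # residue at the lower heating rate
fast_rate = [100.0, 90.0, 60.0, 20.0, 8.0]    # residue at the higher heating rate

t_slow = temperature_at_residue(temps, slow_rate, level=60.0)
t_fast = temperature_at_residue(temps, fast_rate, level=60.0)
print(t_slow, t_fast)
print("slow curve lies to the left:", t_slow < t_fast)
```

Applying the same helper to both the measured and the predicted curves would show whether a model reproduces the size of the shift as well as its direction.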

Figure 12 displays the TG curves for crude samples #4 and #7 at the same heating rate, which is 10 °C/min. As shown in Fig. 12, the XGBoost model successfully predicts the experimental trend.

Sensitivity analysis

To assess the relative significance of the input variables for residual crude oil, the relevance factor (r) is computed from the XGBoost model results. The r value of each input parameter is calculated as follows³²,⁶²:

$$r\left(inp_{i},\sigma\right)=\frac{\sum_{j=1}^{n}\left(inp_{i,j}-inp_{m,i}\right)\left(\sigma_{j}-\sigma_{m}\right)}{\left(\sum_{j=1}^{n}\left(inp_{i,j}-inp_{m,i}\right)^{2}\sum_{j=1}^{n}\left(\sigma_{j}-\sigma_{m}\right)^{2}\right)^{0.5}}$$

(18)

where $\sigma_{m}$ is the average of the calculated residual crude oil values and $\sigma_{j}$ is the jth calculated value; $inp_{i,j}$ and $inp_{m,i}$ are the jth and average values of the ith input parameter, respectively, where the inputs are oil °API gravity, heating rate, resins, asphaltenes, and temperature. Figure 13 depicts the relative effect and relevance of the input parameters on residual crude oil. As can be seen, temperature has the greatest impact in the XGBoost model, with a relevance factor of approximately −0.92. The other parameters (resins, asphaltenes, heating rate, and oil °API gravity) are far less influential, each with an absolute relevance factor below 0.16. Overall, among the inputs, temperature and asphaltenes have the greatest influence on the crude oil pyrolysis process. In addition, temperature and oil °API gravity have negative impacts on fuel deposition, while asphaltenes, resins, and heating rate have positive impacts. This means that the higher the asphaltene and resin content of a crude, the greater the fuel (coke) formation.
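Equation (18) is the Pearson correlation between one input and the model output, so it can be evaluated in a few lines. The following is a minimal sketch under that reading; the function name `relevance_factor` and the small dataset are assumptions made for illustration and do not reproduce the values in Fig. 13.

```python
import math


def relevance_factor(inputs, outputs):
    """Relevance factor r of Eq. (18) for one input parameter."""
    n = len(inputs)
    inp_mean = sum(inputs) / n
    out_mean = sum(outputs) / n
    numerator = sum((x - inp_mean) * (y - out_mean) for x, y in zip(inputs, outputs))
    sq_inp = sum((x - inp_mean) ** 2 for x in inputs)
    sq_out = sum((y - out_mean) ** 2 for y in outputs)
    return numerator / math.sqrt(sq_inp * sq_out)


# Hypothetical samples: two inputs and the predicted residue (wt%).
samples = {
    "temperature": [100.0, 300.0, 500.0, 100.0, 300.0, 500.0],
    "heating_rate": [5.0, 5.0, 5.0, 20.0, 20.0, 20.0],
}
residue = [99.0, 60.0, 10.0, 99.5, 70.0, 15.0]

factors = {name: relevance_factor(values, residue) for name, values in samples.items()}
for name, r in sorted(factors.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: r = {r:+.3f}")
```

The sign of r gives the direction of the effect and its magnitude gives the strength, which is how the negative role of temperature and the positive role of heating rate are read from Fig. 13.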

The strong negative impact of temperature on fuel deposition follows from the fundamental principles of pyrolysis. Elevated temperatures promote the thermal cracking and vaporization of hydrocarbons in crude oil, leading to a large reduction in the mass of residual crude oil. This behavior is consistent with the well-established pyrolysis process. Asphaltenes and resins are complex, high molecular weight components of crude oil that tend to break down and contribute to coke formation during pyrolysis. Their positive impact on fuel deposition can be attributed to their transformation into solid carbonaceous residues, which increase the fuel available for sustaining the combustion front in ISC. Overall, an increase in asphaltene and resin content reduces mass loss during the pyrolysis of crude oil and consequently increases fuel deposition. Although the heating rate governs how quickly the temperature rises, its impact is relatively low in this model. As the heating rate increases, the TG curve for crude oil shifts to the right, signifying an increase in the mass of residual crude oil at a given temperature. This result is linked to the reduced exposure time of the crude oil to heat. Finally, the lower significance of oil °API gravity implies that its effect on fuel deposition is less pronounced. Typically, heavier crude oils, characterized by lower °API gravity, tend to leave more residue, primarily due to a higher concentration of asphaltenes. In summary, these sensitivity analysis outcomes are rooted in the complex chemistry of crude oil pyrolysis. Understanding the behavior of these parameters can aid in optimizing ISC processes and improving the recovery of unburned oil.

Conclusions

Crude oil pyrolysis analysis through TGA runs offers insights into fuel deposition during ISC EOR. This study aimed to precisely model crude oil pyrolysis by leveraging 2071 experimental TGA datasets obtained from literature sources. A suite of robust machine learning techniques, encompassing three black-box approaches (CatBoost, GPR, and XGBoost) and a white-box approach (GP), was employed to estimate crude oil residue at varying temperature intervals during TGA runs. Among the developed models and the mathematical correlation, the XGBoost model was the most precise, achieving an overall MAPE of 0.7796% and an R² of 0.9999. The GPR, CatBoost, and GP models followed, in that order. Notably, the GP model, despite a higher error than the black-box models, provided satisfactory results, making it a viable option for rapid estimation of crude oil residue during pyrolysis. Moreover, a sensitivity analysis was conducted to explore the relative impact and significance of the inputs on residual crude oil during pyrolysis. Among these inputs, temperature and asphaltenes were identified as the most influential factors in the crude oil pyrolysis process. Higher temperatures and oil °API gravity were associated with a negative impact, leading to a decrease in fuel deposition. On the other hand, increased values of asphaltenes, resins, and heating rate showed a positive impact, resulting in an increase in fuel deposition.

Data availability

The datasets used during the current study are available from the corresponding author on reasonable request.

References

Green, D. W. & Willhite, G. P. Enhanced oil recovery. Vol. 6 (Henry L. Doherty Memorial Fund of AIME, Society of Petroleum Engineers, 1998).
Tarek, A. & Nathan, M. Advanced reservoir management and engineering (Gulf Professional Pub, 2012).
Fazlyeva, R. et al. In situ combustion. Thermal Methods, 155–215 (2023).
Sarathi, P. S. In-situ combustion handbook--principles and practices (National Petroleum Technology Office, Tulsa, OK (US), 1999).
Mahinpey, N., Ambalae, A. & Asghari, K. In situ combustion in enhanced oil recovery (EOR): A review. Chem. Eng. Commun. 194, 995–1021 (2007).
Ciajolo, A. & Barbella, R. Pyrolysis and oxidation of heavy fuel oils and their fractions in a thermogravimetric apparatus. Fuel 63, 657–661 (1984).
Ramey, H. (Gulf Publishing Company, Texas, 1985).
Vossoughi, S. TGA/DSC techniques as research tools for the study of the in-situ combustion process. Thermochim. Acta 106, 63–69 (1986).
Ranjbar, M. & Pusch, G. Pyrolysis and combustion kinetics of crude oils, asphaltenes and resins in relation to thermal recovery processes. J. Anal. Appl. Pyrolysis 20, 185–196 (1991).
Ranjbar, M. Influence of reservoir rock composition on crude oil pyrolysis and combustion. J. Anal. Appl. Pyrolysis 27, 87–95 (1993).
Kok, M. V. Use of thermal equipment to evaluate crude oils. Thermochim. Acta 214, 315–324 (1993).
Karacan, O. & Kok, M. V. Pyrolysis analysis of crude oils and their fractions. Energy Fuels 11, 385–391 (1997).
Kök, M. & Karacan, O. Pyrolysis analysis and kinetics of crude oils. J. Thermal Anal. Calorimetry 52, 781–788 (1998).
Ambalae, A., Mahinpey, N. & Freitag, N. Thermogravimetric studies on pyrolysis and combustion behavior of a heavy oil and its asphaltenes. Energy Fuels 20, 560–565 (2006).
Kok, M. V. Clay concentration and heating rate effect on crude oil combustion by thermogravimetry. Fuel Process. Technol. 96, 134–139 (2012).
Li, Y.-B. et al. Characteristics and properties of coke formed by low-temperature oxidation and thermal pyrolysis during in situ combustion. Ind. Eng. Chem. Res. 59, 2171–2180 (2020).
Kök, M. & Iscan, A. Catalytic effects of metallic additives on the combustion properties of crude oils by thermal analysis techniques. J. Thermal Anal. Calorimetry 64, 1311–1318 (2001).
Rezaei, M., Schaffie, M. & Ranjbar, M. Thermocatalytic in situ combustion: Influence of nanoparticles on crude oil pyrolysis and oxidation. Fuel 113, 516–521 (2013).
Zhang, X., Liu, Q. & Fan, Z. Enhanced in situ combustion of heavy crude oil by nickel oxide nanoparticles. Int. J. Energy Res. 43, 3399–3412 (2019).
Li, Y.-B. et al. Study of the catalytic effect of copper oxide on the low-temperature oxidation of Tahe ultra-heavy oil. J. Thermal Anal. Calorimetry 135, 3353–3362 (2019).
Abaas, M., Yuan, C., Emelianov, D. A., Varfolomeev, M. A. & Ariskina, K. A. Effect of calcite on crude oil combustion characterized by high-pressure differential scanning calorimetry (HP-DSC). Pet. Sci. Technol. 37, 1216–1221 (2019).
Li, Y.-B. et al. A comprehensive investigation of the influence of clay minerals on oxidized and pyrolyzed cokes in in situ combustion for heavy oil reservoirs. Fuel 302, 121168 (2021).
Ren, Y., Freitag, N. & Mahinpey, N. A simple kinetic model for coke combustion during an in-situ combustion (ISC) process. J. Can. Pet. Technol. 46 (2007).
Murugan, P., Mahinpey, N., Mani, T. & Freitag, N. Pyrolysis and combustion kinetics of Fosterton oil using thermogravimetric analysis. Fuel 88, 1708–1713 (2009).
Gundogar, A. S. & Kok, M. V. Thermal characterization, combustion and kinetics of different origin crude oils. Fuel 123, 59–65 (2014).
Karimian, M., Schaffie, M. & Fazaelipoor, M. H. A kinetic investigation into the in situ combustion reactions of Iranian heavy oil from Kuh-E-Mond reservoir. Iran. J. Oil Gas Sci. Technol. 6, 18–33 (2017).
Zhao, S., Pu, W., Sun, B., Gu, F. & Wang, L. Comparative evaluation on the thermal behaviors and kinetics of combustion of heavy crude oil and its SARA fractions. Fuel 239, 117–125 (2019).
Wang, J.-X., Wang, L.-L., Wang, T.-F. & Peng, X.-Q. Effects of SARA fractions on pyrolysis behavior and kinetics of heavy crude oil. Pet. Sci. Technol. 38, 945–954 (2020).
Rasouli, A., Dabiri, A. & Nezamabadi-pour, H. A multi-layer perceptron-based approach for prediction of the crude oil pyrolysis process. Energy Sour. Part A Recov. Util. Environ. Effects 37, 1464–1472 (2015).
Norouzpour, M., Rasouli, A. R., Dabiri, A., Azdarpour, A. & Karaei, M. A. Prediction of crude oil pyrolysis process using radial basis function networks. Revista QUID, 567–576 (2017).
Mohammadi, M.-R. et al. On the evaluation of crude oil oxidation during thermogravimetry by generalised regression neural network and gene expression programming: Application to thermal enhanced oil recovery. Combust. Theory Model. 25, 1268–1295 (2021).
Mohammadi, M.-R., Hemmati-Sarapardeh, A., Schaffie, M., Husein, M. M. & Ranjbar, M. Application of cascade forward neural network and group method of data handling to modeling crude oil pyrolysis during thermal enhanced oil recovery. J. Pet. Sci. Eng. 205, 108836 (2021).
Alvarez, E. et al. Pyrolysis kinetics of atmospheric residue and its SARA fractions. Fuel 90, 3602–3607 (2011).
Coriolano, A. C., Oliveira, A. A., Bandeira, R. A., Fernandes, V. J. & Araujo, A. S. Kinetic study of thermal and catalytic pyrolysis of Brazilian heavy crude oil over mesoporous Al-MCM-41 materials. J. Thermal Anal. Calorimetry 119, 2151–2157 (2015).
Wang, Y. et al. New insights into the oxidation behaviors of crude oils and their exothermic characteristics: Experimental study via simultaneous TGA/DSC. Fuel 219, 141–150 (2018).
Coriolano, A. C., Oliveira, A. A., Bandeira, R. A., Fernandes, V. J. & Araujo, A. S. Kinetic study of thermal and catalytic pyrolysis of Brazilian heavy crude oil over mesoporous Al-MCM-41 materials. J. Therm. Anal. Calorimetry 119, 2151–2157 (2015).
Bae, J. Characterization of crude oil for fireflooding using thermal analysis methods. Soc. Pet. Eng. J. 17, 211–218 (1977).
Hemmati-Sarapardeh, A., Varamesh, A., Husein, M. M. & Karan, K. On the evaluation of the viscosity of nanofluid systems: Modeling and data assessment. Renew. Sustain. Energy Rev. 81, 313–329 (2018).
Rasmussen, C. E. & Williams, C. K. Gaussian processes in machine learning. Lect. Notes Comput. Sci. 3176, 63–71 (2004).
Rasmussen, C. E. & Williams, C. K. Gaussian processes for machine learning. Vol. 1 (Springer, 2006).
Ouyang, Z.-L., Chen, G. & Zou, Z.-J. Identification modeling of ship maneuvering motion based on local Gaussian process regression. Ocean Eng. 267, 113251 (2023).
Schapire, R. E. & Freund, Y. Boosting: Foundations and algorithms. Kybernetes 42, 164–166 (2013).
Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 1189–1232 (2001).
Chen, T. & Guestrin, C. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 785–794.
Zheng, H. & Wu, Y. A xgboost model with weather similarity analysis and feature engineering for short-term wind power forecasting. Appl. Sci. 9, 3019 (2019).
Ma, X., Fang, C. & Ji, J. in IOP Conference Series: Earth and Environmental Science. 012013 (IOP Publishing).
Madani, S. A. et al. Modeling of nitrogen solubility in normal alkanes using machine learning methods compared with cubic and PC-SAFT equations of state. Sci. Rep. 11, 24403 (2021).
Shi, Y., Li, J. & Li, Z. Gradient boosting with piece-wise linear regression trees. arXiv preprint arXiv:1802.05640 (2018).
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 31 (2018).
Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363 (2018).
Pham, T. D. et al. Comparison of machine learning methods for estimating mangrove above-ground biomass using multiple source remote sensing data in the red river delta biosphere reserve Vietnam. Remote Sens. 12, 1334 (2020).
Liu, W. et al. A semi-supervised tri-catboost method for driving style recognition. Symmetry 12, 336 (2020).
Fengshun, M., Yan, L., Cen, G., Meiji, W. & Dongmei, L. Diabetes prediction method based on CatBoost algorithm [J]. Comput. Syst. Appl. 28, 215–218 (2019).
Al-Sahaf, H. et al. A survey on evolutionary machine learning. J. R. Soc. N. Zeal. 49, 205–228 (2019).
Poli, R., Langdon, W. B., McPhee, N. F. & Koza, J. R. A Field guide to genetic programming. lulu. com. With contributions by JR Koza (2008).
Koza, J. R. Genetic programming: On the programming of computers by means of natural selection (complex adaptive systems). A Bradford Book 1, 18 (1993).
Emigdio, Z. et al. Modeling the adsorption of phenols and nitrophenols by activated carbon using genetic programming. J. Clean. Prod. 161, 860–870 (2017).
Bi, Y., Xue, B. & Zhang, M. Genetic programming for image classification: An automated approach to feature learning. Vol. 24 (Springer Nature, 2021).
Mohammadi, M.-R. et al. Modeling hydrogen solubility in hydrocarbons using extreme gradient boosting and equations of state. Sci. Rep. 11, 17911 (2021).
Liu, B. et al. Pore structure characterization of solvent extracted shale containing kerogen type III during artificial maturation: Experiments and tree-based machine learning modeling. Energy 283, 128885 (2023).
Rashidi-Khaniabadi, A., Rashidi-Khaniabadi, E., Amiri-Ramsheh, B., Mohammadi, M.-R. & Hemmati-Sarapardeh, A. Modeling interfacial tension of surfactant–hydrocarbon systems using robust tree-based machine learning algorithms. Sci. Rep. 13, 10836 (2023).
Ansari, S. et al. Experimental measurement and modeling of asphaltene adsorption onto iron oxide and lime nanoparticles in the presence and absence of water. Sci. Rep. 13, 122 (2023).

Author information

Authors and Affiliations

Key Laboratory of Continental Shale Hydrocarbon Accumulation and Efficient Development, Ministry of Education, Northeast Petroleum University, Daqing, 163318, China
Fahimeh Hadavimoghaddam
Ufa State Petroleum Technological University, Ufa, 450064, Russia
Fahimeh Hadavimoghaddam
Plekhanov Russian University of Economics, Moscow, 117997, Russia
Alexei Rozhenko
Department of Petroleum Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
Mohammad-Reza Mohammadi & Abdolhossein Hemmati-Sarapardeh
National Iranian Oil Company, Tehran, Iran
Masoud Mostajeran Gortani
School of Mining and Geosciences, Nazarbayev University, Astana, Kazakhstan
Peyman Pourafshary
State Key Laboratory of Petroleum Resources and Prospecting, China University of Petroleum (Beijing), Beijing, China
Abdolhossein Hemmati-Sarapardeh

Authors

Fahimeh Hadavimoghaddam
Alexei Rozhenko
Mohammad-Reza Mohammadi
Masoud Mostajeran Gortani
Peyman Pourafshary
Abdolhossein Hemmati-Sarapardeh

Contributions

F.H.: writing (original draft), methodology, visualization. A.R.: writing (original draft), validation, methodology. M.-R.M.: data curation, writing (original draft). M.M.G.: conceptualization, methodology, visualization. P.P.: validation, methodology, reviewing and editing. A.H.-S.: supervision, conceptualization, methodology, reviewing and editing.

Corresponding author

Correspondence to Abdolhossein Hemmati-Sarapardeh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Hadavimoghaddam, F., Rozhenko, A., Mohammadi, MR. et al. Modeling crude oil pyrolysis process using advanced white-box and black-box machine learning techniques. Sci Rep 13, 22649 (2023). https://doi.org/10.1038/s41598-023-49349-x

Received: 21 March 2023
Accepted: 07 December 2023
Published: 19 December 2023
Version of record: 19 December 2023
DOI: https://doi.org/10.1038/s41598-023-49349-x
