Statistical and machine learning analysis of diesel engines fueled with Moringa oleifera biodiesel doped with 1-hexanol and Zr2O3 nanoparticles

Kumar, K. Sunil; Razak, Abdul; Ramis, M. K.; Irshad, Shaik Mohammad; Islam, Saiful; Wodajo, Anteneh Wogasso

doi:10.1038/s41598-025-87818-7

Article
Open access
Published: 01 March 2025

Scientific Reports volume 15, Article number: 7269 (2025)

Abstract

This design-of-experiments (DOE) based experimental study explores the performance and emission characteristics of Moringa oleifera biodiesel blends enhanced with zirconium oxide (ZrO₂) and 1-hexanol as boosting agents in a slow-speed diesel engine operating at 1500 rpm. The novelty lies in the synergistic use of these additives to improve fuel efficiency and reduce emissions, combined with advanced statistical and machine learning models for optimization and prediction. Four test blends were analyzed: 90D5MO5H + 25 ppm ZrO₂, 80D10MO10H + 50 ppm ZrO₂, 70D15MO15H + 75 ppm ZrO₂, and 100MO + 100 ppm ZrO₂. The methodology combined experimental testing with statistical and machine learning modelling using Gradient Boosting (GBoost), Extreme Learning Machine (ELM), and Response Surface Methodology (RSM). Key findings include a brake thermal efficiency (BTE) 8.63% higher than that of diesel and a 46.13% reduction in fuel consumption (0.14 kg/kWh) for the 90D5MO5H + 25 ppm ZrO₂ blend. This blend also showed superior combustion characteristics, including a peak cylinder pressure of 70 bar and a heat release rate (HRR) of 45 J/°CA. Emission analysis revealed significantly reduced hydrocarbon emissions (0.020%) for 100MO + 100 ppm ZrO₂ and the lowest carbon monoxide emissions (10.1%) for 90D5MO5H + 25 ppm ZrO₂. Among the predictive models, ELM was the most accurate, with an R² value of 0.9604. The findings suggest that optimized Moringa oleifera blends with zirconium oxide and 1-hexanol offer a promising route to sustainable, cleaner diesel engine operation, with potential applications in the transportation and energy sectors aiming for reduced environmental impact.

Introduction

Petroleum fuels are easily stored and transported and have a high energy content, so they are the primary energy source for internal combustion engines¹. The importance of renewable fuels, which can substitute for fossil fuels, is increasing steadily. This is driven by the dwindling availability of natural deposits in certain regions, their ongoing depletion, and the environmental problems they cause, including air pollution and harm to surrounding ecosystems². Reducing the use of petroleum diesel in internal combustion engines (ICEs), and thereby lowering emissions of harmful exhaust pollutants, has long been a focus of researchers³,⁴. The demand for petroleum and coal is rising rapidly owing to the growing number of vehicles on the road, and as the balance between supply and demand tightens, oil and gas prices also rise swiftly. Introducing alternative fuels in internal combustion engines is a reasonable strategy to meet these growing needs⁵. The term “alternative fuel” refers to any substance capable of replacing conventional fossil fuels in internal combustion engines. The feasibility of such a substitute depends on characteristics such as cost-effectiveness, ethical acceptability, and availability⁶.

Moringa oleifera seeds, known for their high ratio of mono-unsaturated to saturated fatty acids (MUFA/SFA) and their rich content of sterols, tocopherols, and sulfur-containing amino acid peptides, are increasingly recognized as a valuable resource for a variety of food and non-food applications⁷. Moringa plants proliferate rapidly in tropical and subtropical regions, even during extended dry periods. This makes the species a dependable asset for improving the nutrition of local communities and, if improved farming methods are adopted, their financial situation, since it can yield biodiesel from a source that does not compete with human food crops. Despite the seed's widespread use in traditional medicine, only a single investigation of the pharmacological activity of the seed and oil in humans has been reported. Some positive reports nevertheless warrant new efforts to obtain precise and conclusive evidence of any human health benefits of seed consumption⁸. An extensive review of the scientific literature on the composition of Moringa oil has prompted a systematic strategy for further research. These studies, covering both the seeds and the oil, will focus on cultivation methods to increase plant yield and will examine the health effects on people who consume the seeds and the oil produced from them⁹.

One comparative study examined the engine efficiency and emission characteristics of diesel engines fuelled with moringa biodiesel, palm biodiesel, jatropha biodiesel, and diesel fuel. In that work, blends containing 20% of each biodiesel (MB20, PB20, and JB20, respectively) were evaluated in a diesel engine. This choice reflects the commonly cited limit of about 20% biodiesel for use in diesel engines without modification. The chemical and physical properties of all fuel samples were also reported and compared with the requirements of ASTM D6751. All biodiesel fuels showed lower brake power (BP) and higher brake-specific fuel consumption (BSFC) than diesel fuel. An experimental blend of RMA10 marine residual fuel and FAME biodiesel was also tested: nitrogen dioxide emissions in the exhaust decreased by 24.1% and carbon dioxide emissions by 25.1%. Nevertheless, the economy of diesel engine operation declined with biofuel, as the specific effective fuel consumption rose by 9.7%. The best compromise for sustainable operation was obtained with a blend of 10 to 15% renewable fuel at 80% engine load, which reduced nitrogen oxide emissions by 25.1% and carbon dioxide emissions by 19.81% relative to conventional fuel¹⁰.

Artificial neural networks (ANNs) have been developed for different biodiesels to predict performance parameters from inputs such as injection timing, and the test results showed that the percentage errors in predicting the various exhaust gas parameters were very low relative to the experimental measurements. Such models are therefore useful for researchers seeking to minimize emissions without further engine modifications¹¹. In an experimental study¹², crude palm oil was tested in a diesel engine to evaluate performance across different trials, including blends of 20% crude palm oil + 80% diesel, 30% crude palm oil + 70% diesel, and 40% crude palm oil + 60% diesel, over a total operating time of 380 h. These tests showed that the lowest emissions were achieved with 100% crude palm oil, owing to its higher oxidation capacity compared with the blended fuels.

The use of nickel nanoparticles in a diesel engine fuelled with Neem biodiesel, blended with varying concentrations of methyl ester reactants, has been shown to enhance combustion, resulting in improved performance and reduced emissions: hydrocarbon emissions decreased by 2.2%, carbon monoxide emissions by 3.8%, and nitrogen oxide emissions by 11.3%. This improvement is attributed to the effectiveness of the nickel nanocatalyst as a reactive agent that enhances the physical properties of the fuel blends, as demonstrated in prior studies¹³,¹⁴. Recent studies¹⁵,¹⁶ examined methanol-biodiesel blends for Azadirachta indica biodiesel-diesel mixtures, focusing on two formulations: 10% methanol + 90% biodiesel and 20% methanol + 80% biodiesel. Kinematic viscosity, flash point, fire point, and cetane number were evaluated in accordance with marine protocol standards. The 20% methanol + 80% biodiesel blend performed best, offering a calorific value comparable to that of diesel; relative to pure diesel, it showed a 7.2% increase in calorific value, a 40% increase in kinematic viscosity, and a 23% improvement in cetane index. These findings indicate that biodiesel blends with 20% methanol can improve the fuel properties and performance of marine diesel engines. Studies¹⁷,¹⁸ evaluated hone biodiesel blends at compression ratios of 17.5 to 18 and injection pressures of 180, 210, and 240 bar. The blend with 20% hone biodiesel performed best at an injection pressure of 210 bar, achieving a higher calorific value and reducing brake-specific fuel consumption (BSFC) by 0.05 kg/kWh compared with diesel. Similarly, another study¹⁹ highlighted the superior performance of hone biodiesel across varying compression ratios and injection pressures, demonstrating its potential as an efficient and sustainable alternative fuel.

While numerous studies have applied Moringa-based biofuels in diesel engines, they typically report only minor improvements in engine performance and slight reductions in emissions, often requiring engine modifications to achieve these results. The potential of hexanol as a boosting fuel in biodiesel blends remains largely unexplored, with limited testing and little attention to its ability to enhance combustion and overall engine efficiency. Previous studies have also lacked comprehensive insight into the synergistic role of hexanol with catalysts such as zirconium oxide (ZrO₂) in biodiesel applications and into their optimization through advanced computational techniques. Motivated by the need for sustainable and environmentally friendly fuel alternatives, this work integrates experimental testing with machine learning models, namely Gradient Boosting (GBoost) and Extreme Learning Machine (ELM), and with Response Surface Methodology (RSM) to predict and optimize engine performance and emissions. The novelty lies in this dual approach, which combines innovative fuel formulations with state-of-the-art predictive tools. The aim is to evaluate the performance, combustion, and emission characteristics of these biodiesel blends, offering a sustainable solution with real-world applications in the transportation and energy sectors while contributing to reduced environmental impact.

Materials and Methods

Materials

The materials used in this analysis are the nanoparticle additive Zr₂O₃ and the ignition improver 1-hexanol. The Zr₂O₃ was purchased from Alridtch Enterprises at Rs 200 per kg, and the 1-hexanol from a local supplier in Ambattur, Chennai, at Rs 400 per litre. The raw Moringa oleifera biodiesel was purchased in Andhra Pradesh at Rs 1200 per litre. The physical properties of the Zr₂O₃ catalyst are given in Table 1.

Table 1 Physical properties of the Zr₂O₃ catalyst.

Methods

The blends were prepared in the following proportions: 90D5MO5H + 25 ppm Zr₂O₃, 80D10MO10H + 50 ppm Zr₂O₃, 70D15MO15H + 75 ppm Zr₂O₃, and 100MO + 100 ppm Zr₂O₃, where, for example, 90D5MO5H denotes 90% diesel, 5% Moringa oleifera biodiesel, and 5% 1-hexanol. All engine results were compared with those of standard diesel. The physical and chemical properties of the blends were tested at the local testing centre of the EETA laboratory, while some properties, such as kinematic viscosity and cetane number, were measured at the Indian Institute of Technology Madras, Chennai. The corresponding ASTM standards are listed in the table of blend properties to aid interpretation of the measured values.
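The blend labels follow a compact naming scheme. The sketch below decodes them into volume percentages and estimates the nanoparticle mass needed per batch. It assumes, for illustration, that the numbers preceding D, MO and H are volume percentages of diesel, Moringa oleifera biodiesel and 1-hexanol, that ppm doses are by mass (mg of additive per kg of fuel), and that the user supplies a fuel density; the function names and the placeholder density are hypothetical, not values from Table 3.

```python
import re

# Hypothetical helper: each component is "<number><code>", where the codes are
# assumed to be D = diesel, MO = Moringa oleifera biodiesel, H = 1-hexanol (vol%).
_COMPONENT = re.compile(r"(\d+(?:\.\d+)?)(MO|D|H)")


def decode_blend(label: str) -> dict:
    """Decode a label such as '80D10MO10H' into volume percentages."""
    compact = label.replace(" ", "")
    parts = _COMPONENT.findall(compact)
    # Reject labels containing anything the pattern did not consume.
    if not parts or "".join(num + code for num, code in parts) != compact:
        raise ValueError(f"unrecognised blend label: {label!r}")
    fractions = {"D": 0.0, "MO": 0.0, "H": 0.0}
    for num, code in parts:
        fractions[code] += float(num)
    total = sum(fractions.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"{label!r} sums to {total}%, not 100%")
    return fractions


def additive_mass_mg(dose_ppm: float, fuel_volume_l: float, density_kg_per_l: float) -> float:
    """Nanoparticle mass (mg) for a dose in ppm by mass, i.e. mg per kg of fuel (assumed)."""
    fuel_mass_kg = fuel_volume_l * density_kg_per_l
    return dose_ppm * fuel_mass_kg


if __name__ == "__main__":
    for label, ppm in [("90D5MO5H", 25), ("80D10MO10H", 50), ("70D15MO15H", 75), ("100MO", 100)]:
        # 0.85 kg/L is a placeholder density, not a measured value.
        print(label, decode_blend(label), f"{additive_mass_mg(ppm, 1.0, 0.85):.1f} mg per litre")
```

Validating that the three fractions sum to 100% catches mistyped labels before any dosing is computed.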

Experimental Setup

The experimental setup used in this study is shown schematically in Fig. 1. The engine instrumentation comprised a thermocouple for temperature measurement, load cells, a tachometer, a Froude hydraulic dynamometer, and a fuel consumption measuring vessel with a precision of ± 0.5. Combustion performance was assessed with various fuel blends ranging from 0 to 30% biodiesel by volume in diesel¹⁹. The combustion characteristics were analyzed, and the results are presented as plots of key parameters such as pressure, temperature, and heat release rate over the engine cycle in Section “Combustion characteristics”. To investigate how different blend ratios affect engine performance and emissions, and to determine whether higher biofuel concentrations can be used in diesel-powered vehicles without significant adverse effects, a higher percentage of biofuel was tested in this experiment. Initial trials were conducted to establish a controlled water flow to the dynamometer and to maintain the required torque. Every test was run at constant torque, from 20 N m to 100 N m in increments of 20 N m. At each torque condition, the speed, fuel consumption (FC), brake power (BP), brake-specific fuel consumption (BSFC), fuel equivalent power (FEP), and brake thermal efficiency (BTE) were measured or calculated, and the procedure was repeated for every torque tested. An increasing number of studies test higher biodiesel blends in diesel engines to determine their performance and emission quality, even though ASTM guidelines permit blends of only 5 to 20%.
The engine torque was varied by regulating the water flow directed into the dynamometer, and the resulting torques were read directly from the instrumentation panel²⁰. Table 2 presents the engine specifications, and Table 3 provides the properties of the fuel blends.
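The performance quantities listed above follow from the measured torque, speed and fuel flow through the standard relations BP = 2πNT/60000 (kW), FEP = ṁ_f × LHV, BTE = BP/FEP and BSFC = ṁ_f/BP. The sketch below applies them across the 20 to 100 N m torque steps; the 1500 rpm speed is taken from the Abstract, while the fuel flow rates and the LHV of 42,500 kJ/kg are placeholder assumptions, not the study's data.

```python
import math


def performance(torque_nm: float, speed_rpm: float, fuel_kg_per_h: float, lhv_kj_per_kg: float) -> dict:
    """Brake power, fuel equivalent power, BTE and BSFC for one operating point."""
    bp_kw = 2 * math.pi * speed_rpm * torque_nm / 60_000   # 2*pi*N*T/60 in W, /1000 -> kW
    fep_kw = fuel_kg_per_h / 3600 * lhv_kj_per_kg          # (kg/s)*(kJ/kg) = kJ/s = kW
    return {
        "BP_kW": bp_kw,
        "FEP_kW": fep_kw,
        "BTE_pct": 100 * bp_kw / fep_kw,                    # brake thermal efficiency
        "BSFC_kg_per_kWh": fuel_kg_per_h / bp_kw,           # brake-specific fuel consumption
    }


if __name__ == "__main__":
    # Placeholder fuel flows (kg/h) for each torque step; replace with measured FC.
    fuel_flow = {20: 1.1, 40: 1.6, 60: 2.2, 80: 2.8, 100: 3.5}
    for torque in range(20, 101, 20):
        result = performance(torque, 1500, fuel_flow[torque], 42_500)
        print(torque, {k: round(v, 3) for k, v in result.items()})
```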

Table 2 Engine specifications.

Table 3 Properties of the fuel blends.

Uncertainty analysis

Errors and uncertainties may arise from several sources, such as instrument design and measurement, fluctuating surroundings, and observation. Broadly, uncertainty can be divided into two components: systematic errors and random errors²¹. The former relate to the consistency (bias) of a measurement, whereas the latter are characterized by statistical measures. The uncertainties associated with the instruments used in this study and with the observed values are given in Table 4. The present study evaluates the uncertainty of each observed variable (ΔX) assuming a Gaussian distribution, as in Eq. (2), with a confidence limit of ±2σ, which is the range about the mean within which about 95% of the measurements fall. Equation (1) is taken from²².

$$\Delta P = \sqrt{\left(\frac{\partial P}{\partial p_{1}}\Delta d_{1}\right)^{2} + \left(\frac{\partial P}{\partial p_{2}}\Delta d_{2}\right)^{2} + \left(\frac{\partial P}{\partial p_{3}}\Delta d_{3}\right)^{2} + \cdots + \left(\frac{\partial P}{\partial p_{n}}\Delta d_{n}\right)^{2}}$$

(1)

where ∂P/∂p₁, ∂P/∂p₂, …, ∂P/∂pₙ are the sensitivities of the computed result P to each measured variable pᵢ, and Δd₁, Δd₂, Δd₃, …, Δdₙ are the uncertainties of those variables obtained from repeated readings.
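Equation (1) is the root-sum-square propagation rule. As a sketch, it can be evaluated numerically by estimating each partial derivative ∂P/∂pᵢ with a central difference; the function name and the example quantity (a simple product P = p₁p₂ with made-up uncertainties) are illustrative assumptions, not values from Table 4.

```python
import math
from typing import Callable, Sequence


def propagate_uncertainty(
    f: Callable[[Sequence[float]], float],
    values: Sequence[float],
    uncertainties: Sequence[float],
    rel_step: float = 1e-6,
) -> float:
    """Root-sum-square uncertainty of f(values), as in Eq. (1)."""
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, uncertainties)):
        h = rel_step * max(abs(x), 1.0)
        up = list(values)
        down = list(values)
        up[i] = x + h
        down[i] = x - h
        dfdx = (f(up) - f(down)) / (2 * h)   # central-difference estimate of dP/dp_i
        total += (dfdx * dx) ** 2            # (dP/dp_i * delta d_i)^2
    return math.sqrt(total)


if __name__ == "__main__":
    # Illustrative only: P = p1 * p2 with p1 = 2.0 +/- 0.1 and p2 = 3.0 +/- 0.2.
    def product(p):
        return p[0] * p[1]

    print(propagate_uncertainty(product, [2.0, 3.0], [0.1, 0.2]))
```

For the product, the analytical result is √((3 × 0.1)² + (2 × 0.2)²) = 0.5, which the numerical version reproduces to within rounding.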

Table 4 Uncertainty analysis.

The percentage uncertainty of the readings, which deviate from the actual values because of atmospheric and other disturbances, can be found using Eq. (2), as done by²⁰:

$$P_{a} = \frac{2\sigma_{p}}{\overline{x}_{np}} \times 100$$

(2)

where P_a is the percentage uncertainty of a measured quantity, σ_p is the standard deviation of the repeated readings, and x̄_np is the mean of the n repeated readings.
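Equation (2) can be applied directly to a set of repeated readings. The sketch below assumes that σ_p is the sample standard deviation of the readings (the text does not state whether the sample or population form is used); the readings in the usage example are hypothetical.

```python
import statistics
from typing import Sequence


def percent_uncertainty(readings: Sequence[float]) -> float:
    """Percentage uncertainty 2*sigma/mean*100 of repeated readings, per Eq. (2)."""
    if len(readings) < 2:
        raise ValueError("need at least two repeated readings")
    mean = statistics.fmean(readings)
    sigma = statistics.stdev(readings)   # sample standard deviation (assumption)
    return 2 * sigma / mean * 100


if __name__ == "__main__":
    # Hypothetical repeated torque readings (N m) at one operating point.
    print(round(percent_uncertainty([59.6, 60.1, 60.3, 59.8, 60.2]), 3))
```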

Response surface methodology

Response Surface Methodology (RSM) is a powerful statistical tool for optimizing complex systems and processes. It uses mathematical and statistical models to analyse and improve the relationships between multiple input variables and an output response. The main goal of RSM is to identify the conditions that give the desired response by studying the interactions between variables over a sequence of experimental runs. RSM usually relies on design-of-experiments principles to generate the data used to build empirical models. These models, often quadratic, define a surface that represents the response as a function of the input variables. By analysing this surface, researchers can pinpoint the parameters that enhance or limit the response⁵.
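The second-order model described here has the same form as Eqs. (3) to (8): y = β₀ + β₁A + β₂B + β₁₂AB + β₁₁A² + β₂₂B². A minimal ordinary-least-squares sketch with NumPy follows; the data are synthetic, generated from known coefficients so the fit can be checked, and do not come from the study.

```python
import numpy as np


def quadratic_terms(a, b):
    """Design matrix with columns 1, A, B, AB, A^2, B^2 for a two-factor model."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])


def fit_quadratic(a, b, y):
    """Ordinary least-squares estimates of the six second-order coefficients."""
    X = quadratic_terms(a, b)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


if __name__ == "__main__":
    # Synthetic 5 x 3 grid of coded factor levels (not experimental data).
    A, B = np.meshgrid(np.linspace(-1, 1, 5), np.array([-1.0, 0.0, 1.0]))
    true = np.array([30.0, 4.0, -1.5, 0.3, -0.8, 0.2])
    y = quadratic_terms(A.ravel(), B.ravel()) @ true
    print(np.round(fit_quadratic(A.ravel(), B.ravel(), y), 4))
```

Working in coded units (−1 to +1) keeps the columns on comparable scales; with raw LHV values near 40,000, the squared term dominates numerically, which is why the printed coefficients in Eqs. (3) to (8) span many orders of magnitude.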

Common RSM designs include central composite designs (CCD) and Box-Behnken designs. CCD is especially useful for fitting second-order models and efficiently captures curvature in the response surface. Box-Behnken designs, by contrast, reduce the number of experimental runs while still giving a full picture of the interactions among the factors³. Because RSM is iterative, experiments can be refined on the basis of initial results, increasing precision and efficiency. The approach is widely applied in engineering, chemical processing, and product development to improve performance, lower costs, and raise quality.
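To illustrate the CCD structure, the sketch below builds a design in coded units: 2ᵏ factorial points at ±1, 2k axial points at ±α, and replicated centre points. It assumes the rotatable choice α = (2ᵏ)^¼ and five centre points by default; these are textbook conventions, not settings reported for this study. The `to_natural` helper and the factor ranges in the usage example are likewise hypothetical.

```python
import itertools
from typing import Optional

import numpy as np


def central_composite(k: int, n_center: int = 5, alpha: Optional[float] = None) -> np.ndarray:
    """Central composite design in coded units (rows = runs, columns = factors)."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25                    # rotatable design
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    axial = np.zeros((2 * k, k))
    for i in range(k):
        axial[2 * i, i] = -alpha                    # star point below the centre
        axial[2 * i + 1, i] = alpha                 # star point above the centre
    center = np.zeros((n_center, k))
    return np.vstack([factorial, axial, center])


def to_natural(coded: np.ndarray, low, high) -> np.ndarray:
    """Map coded levels (-1 = low, +1 = high) to natural units."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return (low + high) / 2 + coded * (high - low) / 2


if __name__ == "__main__":
    design = central_composite(2)
    print(design.shape)   # 4 factorial + 4 axial + 5 centre runs
    # Hypothetical factor ranges, e.g. BP in kW and LHV in kJ/kg.
    print(to_natural(design, low=[1.0, 38_000.0], high=[5.0, 42_000.0]))
```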

Soft computing approaches

Gradient boosting

In machine learning, Gradient Boosting (GBoost) regression builds a predictive model iteratively by aggregating the predictions of weak learners, typically decision trees, into a strong learner. It is based on the boosting idea: models are trained sequentially, and each new model concentrates on correcting the errors left by the previous ones. The fundamental concept of GBoost is to use gradient descent to minimize a loss function, here the mean squared error. At every iteration a new weak learner, usually a shallow decision tree, is added to the model. This learner is fitted to the residuals (errors) of the current model, which for squared-error loss are proportional to the negative gradient of the loss with respect to the predictions. In this way each subsequent learner concentrates on the cases that are hardest to predict. For regression problems, boosting usually starts from an initial prediction equal to the mean of the target values. A weak learner is then fitted to the residual errors, and the prediction is updated by adding a weighted form of the new learner's output to the current prediction, lowering the total error. The learning rate controls how much each learner contributes to the overall prediction, letting the model adjust its performance and avoid overfitting.

Although GBoost supports several loss functions and is quite flexible, for regression it usually uses the squared error loss. Regularization techniques such as shrinkage and subsampling are typically employed to improve generalization and reduce overfitting. In essence, Gradient Boosting regression uses weak models to improve performance progressively while remaining computationally efficient and scalable.
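The boosting loop described above can be written out in a few lines. The NumPy sketch below assumes squared-error loss, the target mean as the initial prediction, depth-one regression trees (stumps) as weak learners, and a fixed learning rate (shrinkage). It is a teaching sketch under those assumptions rather than the implementation used in the study, and its exhaustive split search is only practical for small datasets.

```python
import numpy as np


def fit_stump(X: np.ndarray, r: np.ndarray):
    """Best single-split regression tree (stump) for residuals r."""
    best_sse = np.inf
    best = (0, np.inf, r.mean(), r.mean())      # fallback: constant prediction
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        for i in range(1, len(xs)):
            if xs[i] == xs[i - 1]:
                continue                        # no threshold between equal values
            left, right = rs[:i], rs[i:]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best_sse:
                best_sse = sse
                best = (j, (xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    return best


def predict_stump(stump, X: np.ndarray) -> np.ndarray:
    feature, threshold, left_value, right_value = stump
    return np.where(X[:, feature] <= threshold, left_value, right_value)


def gradient_boost(X, y, n_rounds=200, learning_rate=0.1):
    """Fit F(x) = mean(y) + lr * sum of stumps under squared-error loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    f0 = y.mean()                               # initial prediction
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred                    # negative gradient of 0.5 * (y - F)^2
        stump = fit_stump(X, residuals)
        pred += learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return {"f0": f0, "lr": learning_rate, "stumps": stumps}


def boost_predict(model, X) -> np.ndarray:
    X = np.asarray(X, dtype=float)
    pred = np.full(len(X), model["f0"])
    for stump in model["stumps"]:
        pred += model["lr"] * predict_stump(stump, X)
    return pred


if __name__ == "__main__":
    x = np.linspace(0, 1, 50).reshape(-1, 1)    # synthetic single-feature example
    y = x.ravel() ** 2
    model = gradient_boost(x, y)
    mse = np.mean((boost_predict(model, x) - y) ** 2)
    print(f"training MSE {mse:.2e} vs variance {y.var():.2e}")
```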

Gradient Boosting is an ensemble technique to improve the prediction accuracy by combining the output of multiple weak learners (typically decision trees) in a sequential manner. Each new learner focuses on the errors made by the previous ones, thereby reducing the model’s overall bias and variance.

The procedure for applying it to the dataset is as follows (Fig. 2):

1. Pre-processing the dataset:
- Input Features: The dataset is split into training, validation, and test sets.
- Normalization/Standardization: Data are cleaned and normalized to ensure consistent scales across features.
2. Training the gradient boosting model:
- Initial Prediction: A simple model (e.g., predicting the mean value of the target) is created as a baseline.
- Error Computation: The residuals (errors) between the baseline prediction and the actual target values are computed.
- Weak Learner Addition: Decision trees are trained sequentially on these residuals, each correcting the errors of the previous model.
- Learning Rate: A learning rate scales the contribution of each tree, preventing overfitting.
3. Hyperparameter tuning:
- Parameters such as the number of trees, tree depth, and learning rate are tuned by cross-validation for optimal performance.
4. Model evaluation:
- The trained model is evaluated on the validation/test datasets using regression metrics such as the mean squared error and the coefficient of determination (R²).
5. Final prediction:
- The final model aggregates the predictions of all weak learners for robust results.
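A compact version of these steps with scikit-learn's `GradientBoostingRegressor` and `GridSearchCV` is sketched below. The data are synthetic placeholders; the two inputs are merely named after the RSM factors (BP and LHV), and the grid values are assumptions, not the study's settings. Standardization is omitted here because tree-based learners are insensitive to monotonic feature scaling, and `subsample=0.8` enables the subsampling regularization mentioned above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 120
# Synthetic placeholder data, NOT measurements from this study.
bp = rng.uniform(1.0, 5.0, n)                  # brake power, kW (assumed range)
lhv = rng.uniform(37_000.0, 43_000.0, n)       # lower heating value, kJ/kg (assumed range)
X = np.column_stack([bp, lhv])
y = 20 + 3 * bp - 0.3 * bp**2 + 1e-4 * (lhv - 40_000) + rng.normal(0, 0.3, n)

# Step 1: hold out a test set; the CV folds inside GridSearchCV act as validation sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Steps 2-3: boosted trees with shrinkage and subsampling, tuned by 5-fold CV.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingRegressor(subsample=0.8, random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# Steps 4-5: evaluate the refitted best model on unseen data.
y_pred = search.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(search.best_params_, f"RMSE={rmse:.3f}", f"R2={r2:.3f}")
```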

Extreme learning machine

Extreme Learning Machine (ELM) is a fast and efficient machine learning algorithm used mainly for regression and classification. Based on a single-hidden-layer feedforward neural network (SLFN), it has a clear advantage over conventional neural networks in that it dispenses with iterative training. In ELM, the weights between the input and hidden layers and the biases of the hidden neurons are generated randomly and remain fixed throughout training; only the weights between the hidden and output layers are trained. ELM rests on universal approximation theory: the input data are mapped into a high-dimensional space, and the hidden neurons act as feature-mapping units. The random initialization of input weights and biases is important because, combined with the nonlinear activation, it makes the hidden neurons produce diverse nonlinear transformations of the input data and hence more robust feature representations. Once the hidden-layer outputs have been generated, the output weights are determined analytically, usually by least squares, to minimize the error between predicted and actual values. Because this architecture avoids backpropagation and gradient descent, ELM is much faster than conventional neural networks. Nevertheless, the number of hidden neurons and their associated weights must be chosen carefully, as they affect the generalization capacity and performance of the model. Various studies have also used ANOVA to examine the efficiency and pollutant emissions of biofuel-powered engines. One study, for instance, examined the performance and emission characteristics of biodiesel blends containing Moringa; according to its analysis of variance, blending such oils with fuel oil improved efficiency metrics while cutting pollutants by thirty per cent. Table 5 presents the ANOVA results for the engine performance data.
Other studies have analysed vehicle emission data using ANOVA. One of them used the analysis of variance method to optimize the combustion and pollutant emissions of a light-duty diesel engine, and nitrogen oxides, smoke, and specific energy consumption were all significantly reduced. Table 6 presents the ANOVA results for the engine emission data.

Table 5 ANOVA results for engine performance data.

Table 6 ANOVA results for engine emission data.

The Extreme Learning Machine is a fast, single-hidden-layer feedforward neural network. It randomly assigns the input weights and biases and trains only the output weights, making it computationally efficient. It is applied as follows (Fig. 2):

1. Pre-processing the dataset:
- Input features are prepared as for Gradient Boosting, ensuring data consistency and feature scaling.
2. ELM training process:
- Input Weights Initialization: Random weights and biases are assigned to the connections between the input and hidden layers.
- Hidden Layer Activation: The input features are transformed using a nonlinear activation function (e.g., sigmoid or ReLU).
- Output Weights Calculation: Using the transformed data, the output weights are computed analytically (e.g., using the Moore–Penrose pseudoinverse) to minimize the error.
3. Hyperparameter selection:
- Parameters such as the number of hidden neurons and the type of activation function are selected based on the dataset.
4. Model evaluation:
- The ELM model is evaluated using the same performance metrics as Gradient Boosting.
5. Final prediction:
- The trained model is used to make predictions on unseen test data.
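The ELM steps above can be sketched in NumPy: random input weights and biases are drawn once, the hidden layer applies a sigmoid, and the output weights are obtained with the Moore–Penrose pseudoinverse (`numpy.linalg.pinv`). The class name, the uniform(−1, 1) weight range, the 50 hidden neurons, the input standardization, and the synthetic example data are assumptions for illustration only.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer ELM: random fixed hidden layer, least-squares output layer."""

    def __init__(self, n_hidden: int = 50, seed: int = 0):
        self.n_hidden = n_hidden
        self.seed = seed

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        Z = (X - self.mu) / self.sd                              # standardize with training statistics
        return 1.0 / (1.0 + np.exp(-(Z @ self.W + self.b)))     # sigmoid activation

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.mu = X.mean(axis=0)
        self.sd = X.std(axis=0) + 1e-12
        rng = np.random.default_rng(self.seed)
        # Random input weights and biases, never updated afterwards.
        self.W = rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Hidden-layer outputs, then output weights via the pseudoinverse.
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X) -> np.ndarray:
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform([1.0, 37_000.0], [5.0, 43_000.0], size=(200, 2))  # synthetic inputs
    y = 3 * X[:, 0] - 0.3 * X[:, 0] ** 2 + 1e-4 * (X[:, 1] - 40_000)
    model = ELMRegressor().fit(X[:150], y[:150])
    pred = model.predict(X[150:])
    r2 = 1 - np.sum((y[150:] - pred) ** 2) / np.sum((y[150:] - y[150:].mean()) ** 2)
    print(f"test R2 = {r2:.4f}")
```

Only `beta` is learned, in a single linear solve, which is the source of ELM's speed; changing `seed` or `n_hidden` changes the random feature map and is the main tuning lever.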

Development of RSM models

Statistical interpretation of model data

To analyse the engine performance data, an analysis of variance (ANOVA) was conducted to evaluate the statistical significance of the results. Table 5 presents the ANOVA results for the engine performance parameters, including the effects of different fuel blends and operating conditions on key performance indicators.

Similarly, ANOVA was applied to the engine emission data. Table 6 presents the ANOVA results for engine emissions, highlighting the impact of the various fuel blends and operating factors on emission levels.

$$\begin{gathered} {\text{BTE}} = 308.23 + 3.66*{\text{BP}} - 0.016*{\text{LHV}} + 0.00012 \hfill \\ \quad \quad \quad \;\;*{\text{BP}}*{\text{LHV}} - 0.78*{\text{BP}}^{2} + 0.00000022*{\text{LHV}}^{2} \hfill \\ \end{gathered}$$

(3)

$$\begin{gathered} {\text{BSFC}} = 6.23{-}0.43*{\text{BP}}{-}0.00022*{\text{LHV}} + 0.0000064 \hfill \\ \quad \quad \quad \;\;*{\text{BP}}*{\text{LHV}} + 0.0185*{\text{BP}}^{2} + 2.16228{\text{E}} - 009*{\text{LHV}}^{2} \hfill \\ \end{gathered}$$

(4)

$$\begin{gathered} {\text{CO}} = - 0.66{-}0.025*{\text{BP}} + 0.000037*{\text{LHV}} + 0.000003 \hfill \\ \quad \quad \quad \;\;*{\text{BP}}*{\text{LHV}} + 0.00099*{\text{BP}}^{2} - 4.83622{\text{E}} - 010*{\text{LHV}}^{2} \hfill \\ \end{gathered}$$

(5)

$$\begin{gathered} {\text{HC}} = - 2789.95{-}3.98*{\text{BP}} + 0.136*{\text{LHV}}{-}0.000084 \hfill \\ \quad \quad \quad *{\text{BP}}*{\text{LHV}} + 2.29*{\text{BP}}^{2} - 1.63246{\text{E}} - 00*{\text{LHV}}^{2} \hfill \\ \end{gathered}$$

(6)

$$\begin{gathered} {\text{CO}}_{2} = 191.11 + 3.24*{\text{BP}}{-}0.0094*{\text{LHV}}{-}0.000058 \hfill \\ \quad \quad \quad *{\text{BP}}*{\text{LHV}} + 0.135*{\text{BP}}^{2} + 1.17720{\text{E}} - 007*{\text{LHV}}^{2} \hfill \\ \end{gathered}$$

(7)

$$\begin{gathered} {\text{NOx}} = 23425.89 + 95.19*{\text{BP}} - 1.165*{\text{LHV}} + 0.0125 \hfill \\ \quad \quad \quad *{\text{BP}}*{\text{LHV}} - 42.83*{\text{BP}}^{2} + 0.000015*{\text{LHV}}^{2} \hfill \\ \end{gathered}$$

(8)
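Once fitted, these equations can be evaluated directly. The sketch below encodes the coefficients of Eqs. (3) and (4) exactly as printed and evaluates them over a grid. It assumes BP is in kW and LHV in kJ/kg, since the units are not stated alongside the equations; because the printed coefficients are rounded and the LHV² term multiplies a number of order 10⁹, predictions are sensitive to that rounding.

```python
import numpy as np

# Coefficients as printed in Eqs. (3) and (4), ordered as
# (intercept, BP, LHV, BP*LHV, BP^2, LHV^2).
RSM_COEFFS = {
    "BTE": (308.23, 3.66, -0.016, 0.00012, -0.78, 0.00000022),
    "BSFC": (6.23, -0.43, -0.00022, 0.0000064, 0.0185, 2.16228e-9),
}


def rsm_predict(response: str, bp, lhv):
    """Evaluate a two-factor quadratic RSM equation (BP assumed kW, LHV assumed kJ/kg)."""
    b0, b1, b2, b12, b11, b22 = RSM_COEFFS[response]
    bp = np.asarray(bp, dtype=float)
    lhv = np.asarray(lhv, dtype=float)
    return b0 + b1 * bp + b2 * lhv + b12 * bp * lhv + b11 * bp**2 + b22 * lhv**2


if __name__ == "__main__":
    bp_grid = np.linspace(1.0, 5.0, 5)
    for lhv in (38_000.0, 40_000.0, 42_000.0):
        print(lhv, np.round(rsm_predict("BTE", bp_grid, lhv), 2),
              np.round(rsm_predict("BSFC", bp_grid, lhv), 3))
```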

BTE model

The BTE model was developed using ANOVA and is given in mathematical form as Eq. (3). The ANOVA results are listed in Table 5, and the surface diagram for the BTE model is depicted in Fig. 3a. The analysis of variance (ANOVA) for the response surface quadratic model yields noteworthy results for the response "BTE". The model has an F-value of 714.85 with a very low probability (p < 0.0001), indicating that it is significant and that an F-value this large is very unlikely to arise from noise. The model sum of squares is 451.87 over 5 degrees of freedom, giving a mean square of 90.37. The terms A-BP, B-LHV, A², and B² are the main contributors to the model's significance. The factor A-BP makes a particularly large individual contribution, with an F-value of 2944.76 and p < 0.0001; likewise, B-LHV has an F-value of 399.34 (p < 0.0001). The interaction term AB has an F-value of 4.77 and p = 0.0538, suggesting marginal significance. Both quadratic terms, A² and B², are also significant, with F-values of 219.27 (p < 0.0001) and 7.66 (p = 0.0199), respectively. The residual error, which reflects unexplained variance, is small, with a sum of squares of 1.26 over 10 degrees of freedom. Of the total sum of squares of 453.13, the model accounts for 451.87 (R² ≈ 0.997), underlining its good fit to the data. Since nearly all terms are significant, further refinement or reduction of the model does not appear necessary. The surface diagram shows that peak BTE occurs with low biodiesel blends at full engine load, as described in the experimental section.
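The ANOVA quantities quoted above are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, F = MS_model/MS_residual, and R² = SS_model/SS_total. The short check below recomputes them from the BTE figures given in the text (451.87 on 5 df, 1.26 on 10 df). It gives F ≈ 717 rather than the reported 714.85, a gap presumably due to rounding of the printed residual sum of squares, and R² ≈ 0.997.

```python
def anova_summary(ss_model: float, df_model: int, ss_resid: float, df_resid: int) -> dict:
    """Mean squares, F-value and R^2 for a regression ANOVA table."""
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    ss_total = ss_model + ss_resid
    return {
        "MS_model": ms_model,
        "MS_resid": ms_resid,
        "F": ms_model / ms_resid,
        "R2": ss_model / ss_total,
    }


if __name__ == "__main__":
    bte = anova_summary(451.87, 5, 1.26, 10)   # BTE values quoted in the text
    print({k: round(v, 4) for k, v in bte.items()})
```

The same function applies to the other responses, for example `anova_summary(2867.41, 5, 182.34, 10)` for the HC model.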

BSFC model

The BSFC model was developed using ANOVA and is given as an algebraic expression in Eq. (4). The ANOVA results are listed in Table 5, and the surface diagram for the BSFC model is depicted in Fig. 3b. The lowest BSFC is attained in the same region of the surface as the peak BTE, i.e. at higher LHV and full engine load. With an F-value of 125.14 and p < 0.0001, the ANOVA for the response surface quadratic model for "BSFC" shows that the model is highly significant, with a very low probability that such a result could arise from noise. The model sum of squares is 0.13 over 5 degrees of freedom, with a mean square of 0.026. A-BP and B-LHV are highly significant model terms, with F-values of 424.22 and 100.41, respectively, and p < 0.0001. The interaction term AB is also significant, with an F-value of 8.57 (p = 0.0151). The quadratic term A² has an F-value of 75.61 (p < 0.0001), whereas B² is not significant (p = 0.5135). The small residual sum of squares of 2.067E-003 indicates low unexplained variance. With the significant terms A, B, AB, and A² defining the model, the fit to the data is solid.

CO emission model

ANOVA was used to develop the CO emission model, presented as an algebraic equation in Eq. (5). Figure 3c shows the surface diagram for the CO emission model; the lowest CO occurs at higher engine loads, where combustion improves. The ANOVA results are listed in Table 6. With an F-value of 190.52 and p < 0.0001, the ANOVA for the response surface quadratic model for "CO" shows that the model is highly significant, with a very low probability that this outcome arises from noise. The model sum of squares is 1.543E-003 over 5 degrees of freedom, giving a mean square of 3.085E-004. Among the model terms, A-BP has the largest impact, with an F-value of 914.33 (p = 0.0001); the quadratic term A² is also significant, with an F-value of 28.13 (p = 0.0003). The other terms, B-LHV, AB, and B², are not significant, with p-values of 0.5959, 0.1446, and 0.1177, respectively. The residual sum of squares of 1.619E-005 is negligible, indicating little variance left unexplained by the model. Overall, the model fits the data well, with A-BP and A² as the most important factors; the non-significant terms suggest that the model could be reduced for parsimony without compromising accuracy.

HC emission model

The HC emission model was developed using ANOVA and is shown as the algebraic equation in Eq. (6). The surface diagram for the HC emission model is shown in Fig. 3d, and the ANOVA results are listed in Table 6. The lowest HC emission was observed at low engine load with higher biodiesel blend ratios. With an F-value of 31.45 and a p-value of less than 0.0001, the ANOVA for the response surface quadratic model for HC shows a highly significant model, with a minimal probability that the results are due to noise. The model sum of squares is 2867.41 over 5 degrees of freedom, giving a mean square of 573.48. The main effects A-BP and B-LHV have a strong influence, with F-values of 106.20 (p = 0.0001) and 35.18 (p = 0.0001), respectively, and account for much of the model's significance. The quadratic term A² is also significant, with an F-value of 13.18 (p = 0.0046). The interaction term AB and the quadratic term B², however, are not significant, with p-values of 0.9000 and 0.1158, respectively. The residual sum of squares of 182.34 is relatively small, suggesting a reasonable fit. Given the small contributions of AB and B², model reduction could simplify the model without sacrificing its explanatory power.
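The reduction suggested for the HC model can be sketched as a simple screening rule: keep terms whose p-value falls below a significance level (0.05 here, a conventional choice that the text does not state) and, to preserve model hierarchy, keep any main effect that appears in a retained AB, A², or B² term. This is a hypothetical one-pass simplification; proper backward elimination refits the model after each removal, which changes the remaining p-values.

```python
def screen_terms(p_values, alpha=0.05):
    """One-pass screen: keep terms with p < alpha, then restore parent main effects."""
    keep = {term for term, p in p_values.items() if p < alpha}
    # Hierarchy: a retained interaction or quadratic term requires its main effect(s).
    parents = {"AB": {"A", "B"}, "A2": {"A"}, "B2": {"B"}}
    for term in list(keep):
        keep |= parents.get(term, set())
    return sorted(keep)


# p-values for the HC model as reported in the text ("A2" stands for A², "B2" for B²)
hc_p = {"A": 0.0001, "B": 0.0001, "AB": 0.9000, "A2": 0.0046, "B2": 0.1158}
print(screen_terms(hc_p))  # → ['A', 'A2', 'B']
```

For HC this retains A, B, and A² and drops AB and B², which matches the reduction suggested in the text.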

CO₂ emission model

The CO₂ emission model was developed from the CO₂ emission data using ANOVA and is shown in Eq. (7) as an algebraic equation. The surface diagram for the CO₂ emission model is shown in Fig. 3e, and the ANOVA results are listed in Table 6. Lower LHV and full engine load produce higher CO₂ emission levels, while low CO₂ emission levels are observed at lower engine loads. The ANOVA for the response surface quadratic model for CO₂ shows a highly significant model, with an F-value of 1679.69 and a p-value of less than 0.0001, indicating a negligible likelihood that the results are due to noise. The model sum of squares is 110.97, with a mean square of 22.19 across 5 degrees of freedom. The term A-BP is the most influential, with an exceptionally high F-value of 8195.78 (p < 0.0001), indicating its dominant effect on CO₂ levels. B-LHV is also significant, with an F-value of 79.79 (p < 0.0001), as is the interaction term AB, with an F-value of 10.99 (p = 0.0078). Both quadratic terms, A² and B², are significant, with F-values of 62.33 (p < 0.0001) and 21.28 (p = 0.0010), respectively. The residual sum of squares is small at 0.13, indicating a good fit of the model to the data. Because all five terms (A, B, AB, A², and B²) are significant, reflecting both interaction and nonlinear effects, model reduction is not warranted for CO₂.
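Every term in these quadratic models carries one degree of freedom, and all terms share the same residual mean square. Each term's (partial) sum of squares therefore equals its F-value times MS_residual, so the relative size of the F-values shows how the explained variation splits among the terms. The sketch below applies this to the F-values reported for CO₂. The function name is hypothetical. Partial sums of squares need not add up to the model sum of squares unless the design is orthogonal, so the shares describe relative importance rather than an exact decomposition.

```python
def f_value_shares(f_values):
    """Share of each 1-df term, using SS_term = F_term * MS_residual (MS_residual cancels)."""
    total = sum(f_values.values())
    return {term: f / total for term, f in f_values.items()}


# F-values for the CO2 model as reported in the text
co2_f = {"A": 8195.78, "B": 79.79, "AB": 10.99, "A2": 62.33, "B2": 21.28}
for term, share in sorted(f_value_shares(co2_f).items(), key=lambda kv: -kv[1]):
    print(f"{term:>2}: {100 * share:5.1f} %")
```

A-BP accounts for roughly 98 % of the combined F-values, which puts a number on the dominant effect of brake power described above.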

NOx emission model

The NOx emission model was developed from the NOx emission data using ANOVA and is given as the algebraic equation in Eq. (8). Figure 3f shows the surface diagram for the NOx emission model. Full engine load and high LHV produce higher NOx emission levels, which is attributed to high combustion chamber temperatures. The ANOVA results are listed in Table 6. With an F-value of 589.50 and a p-value of less than 0.0001, the ANOVA for the NOx emission model shows a highly significant model, indicating that it is a strong predictor and unlikely to have arisen by chance. The model sum of squares is 3.407E+006 over 5 degrees of freedom, giving a mean square of 6.814E+005. Among the model terms, A-BP is the most important for NOx emissions, with an extraordinarily high F-value of 2864.54 (p < 0.0001). B-LHV, with an F-value of 6.26 (p = 0.0314), and the interaction term AB, with an F-value of 5.76 (p = 0.0273), are also statistically significant, although their effects are much smaller than that of A-BP. The quadratic term A² is significant, with an F-value of 72.51 (p < 0.0001), reflecting non-linear effects of A-BP on NOx emissions. The term B², with a p-value of 0.0868, is not significant, indicating a weaker non-linear effect of B-LHV. The residual sum of squares of 11,559.05, with a mean square of 1155.91 across 10 degrees of freedom, is small relative to the model sum of squares, implying that the model captures most of the variability in NOx emissions. Given the significant terms (A, B, AB, and A²) and the small residual error, the model explains NOx emissions well, although it could be simplified by eliminating the non-significant B² term.
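The overall F-value reported for each model is the ratio of the model mean square to the residual mean square. The sketch below (function name hypothetical) recomputes it from the NOx sums of squares and degrees of freedom given in the text, which is a quick way to check an ANOVA table for internal consistency.

```python
def anova_f(ss_model, df_model, ss_resid, df_resid):
    """Return (MS_model, MS_residual, F) for an overall regression F-test."""
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    return ms_model, ms_resid, ms_model / ms_resid


# NOx model figures as reported in the text
ms_model, ms_resid, f_value = anova_f(3.407e6, 5, 11559.05, 10)
print(f"MS_model = {ms_model:.4g}, MS_resid = {ms_resid:.2f}, F = {f_value:.2f}")
```

This reproduces the reported F of about 589.5. Applying the same function to the rounded BSFC figures (0.13 and 2.067E-003, again assuming 10 residual degrees of freedom) gives about 125.8 rather than the reported 125.14, a gap consistent with the BSFC model sum of squares having been rounded to two decimals.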

Desirability approach for optimization

The desirability method is a widely used optimization tool in Response Surface Methodology (RSM) for identifying the best combination of factor levels when several responses must be considered together. Based on predefined goals such as maximizing, minimizing, or reaching a target value, it converts each response into a desirability function ranging from 0 (undesirable) to 1 (fully desirable). These individual desirabilities are then aggregated, usually by the geometric mean, into an overall desirability value. The optimization balances the trade-offs between conflicting responses to maximize this overall desirability. The approach is widely applied in industrial processes, product development, and multi-objective optimization. The desirability bar plots are depicted in Fig. 4, and the optimized levels of each parameter are listed in Table 7.
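The desirability approach can be sketched in a few lines, assuming the common one-sided forms associated with Derringer and Suich: a response to be maximized scores 0 below a lower limit and 1 above an upper limit, a response to be minimized does the reverse, and the individual scores are combined with a geometric mean. The function names, limits, weights, and response values below are hypothetical and are not the settings behind Table 7.

```python
import math


def d_maximize(y, low, high, weight=1.0):
    """Desirability for a larger-is-better response (e.g. BTE)."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def d_minimize(y, low, high, weight=1.0):
    """Desirability for a smaller-is-better response (e.g. BSFC, NOx)."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - low)) ** weight


def overall_desirability(ds):
    """Geometric mean of individual desirabilities; any zero makes the result zero."""
    return math.prod(ds) ** (1.0 / len(ds))


# Hypothetical operating point and limits, for illustration only
d = [
    d_maximize(30.0, 20.0, 35.0),      # BTE in %
    d_minimize(0.30, 0.25, 0.45),      # BSFC in kg/kWh
    d_minimize(800.0, 400.0, 1200.0),  # NOx in ppm
]
print(round(overall_desirability(d), 4))
```

Here the individual desirabilities are about 0.67, 0.75, and 0.5, giving an overall value of about 0.63. A weight above 1 makes a response harder to satisfy, and because of the geometric mean a single fully undesirable response (d = 0) rules out an operating point regardless of the others.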

Table 7 Optimization results.


ML modelling

Correlation analysis

The correlation matrix in Table 8 shows the relationships among the performance and emission parameters, including brake power (BP), lower heating value (LHV), brake thermal efficiency (BTE), brake specific fuel consumption (BSFC), and emissions such as CO, HC, CO₂, and NOx (Fig. 5). These relationships help clarify the interactions and trade-offs between engine performance and emissions. Beginning with brake power (BP), CO₂ (0.99) and NOx (0.98) show strong positive associations, indicating that CO₂ and NOx emissions rise as brake power increases. This implies that the more complete combustion brought about by higher power output generates more CO₂ and NOx. The similarly strong positive association with BTE (0.91) indicates that brake power and efficiency increase together, i.e., the engine operates more efficiently at higher power output. Conversely, BP shows strong negative associations with CO (-0.98) and BSFC (-0.83): as power rises, CO emissions and fuel consumption per unit of brake power output drop, indicating more effective combustion.
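The text does not state which correlation coefficient Table 8 uses. Assuming the usual Pearson coefficient, a correlation matrix of this kind can be computed as sketched below; the function names and the short data series are hypothetical and are not the study's measurements.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """Map each (name_i, name_j) pair to the Pearson r between the two columns."""
    names = list(data)
    return {(p, q): pearson(data[p], data[q]) for p in names for q in names}


# Hypothetical measurements at four load steps (not the study's data)
data = {
    "BP": [1.0, 2.0, 3.0, 4.0],
    "BSFC": [0.60, 0.42, 0.35, 0.31],
    "CO": [0.09, 0.07, 0.05, 0.04],
}
r = correlation_matrix(data)
print(round(r[("BP", "BSFC")], 2), round(r[("BP", "CO")], 2))
```

In Python 3.10 and later, `statistics.correlation` returns the same Pearson coefficient for a pair of sequences; the explicit version above just makes the formula visible.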

Table 8 Correlation matrix.


Lower heating value (LHV) shows a modest positive association with hydrocarbon (HC) emissions (0.46), suggesting that fuels with higher energy content tend to generate more unburned hydrocarbons. However, LHV shows no appreciable correlation with the other parameters, including brake power and CO₂, CO, and NOx emissions, which suggests that within the tested range LHV has little influence on overall combustion efficiency or emission trends. BTE correlates positively with BP (0.91), NOx (0.95), and CO₂ (0.84), showing that CO₂ and NOx emissions rise as BTE improves. This is consistent with higher engine efficiency coinciding with higher combustion temperatures, which favour more complete oxidation (CO₂) and greater NOx formation. Conversely, BTE displays strong negative associations with BSFC (-0.97) and CO (-0.93), implying that as efficiency increases the engine consumes less fuel and emits less carbon monoxide, which underlines the importance of effective combustion.

Brake-specific fuel consumption (BSFC) has a very strong negative association with BTE (-0.97), since lower BSFC means that less fuel is needed to generate a given amount of power, i.e., greater thermal efficiency. BSFC is also negatively correlated with NOx (-0.88) and CO (-0.88); for NOx, this means that operating points with lower fuel consumption per unit of power tend to show higher NOx emissions. CO emissions show negative relationships with BP (-0.98) and BTE (-0.93), as expected, because higher power and efficiency correspond to more complete combustion, which produces fewer incomplete-combustion byproducts such as CO. The strong inverse relationship between CO and NOx (-0.99) reflects the fact that more complete combustion lowers CO but raises NOx, because the higher combustion temperatures favour NOx formation. Hydrocarbon (HC) emissions exhibit a modest positive association with LHV (0.46), suggesting that some fuels with higher energy density may produce more unburned hydrocarbons, perhaps owing to incomplete combustion at lower engine efficiency. HC also correlates, to a lesser degree than the other factors, with CO₂ (0.76) and NOx (0.75), implying that higher HC emissions tend to accompany increases in CO₂ and NOx. CO₂ emissions correlate strongly with BP (0.99), BTE (0.84), and NOx (0.95), because higher brake power and efficiency lead to more complete combustion that converts more carbon into CO₂. The fact that higher NOx levels accompany high CO₂ emissions suggests that efficient combustion and higher temperatures promote both CO₂ and NOx formation.

Positive correlations between NOx emissions and BP (0.98), BTE (0.95), and CO₂ (0.95) indicate that rising power and efficiency likely produce higher combustion temperatures, which promote NOx formation. The strong inverse link between NOx and CO (-0.99) further highlights the usual trade-off in combustion: lower CO emissions usually come with higher NOx emissions, since more complete combustion reduces CO while increasing NOx formation. Overall, although higher brake power and thermal efficiency generally translate into lower fuel consumption and reduced CO emissions, they also increase NOx and CO₂ emissions, which highlights the difficulty of maximizing engine performance while minimizing pollutants.
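To see at a glance which of the coefficients quoted in this section dominate, they can be ranked by absolute value while keeping their signs. The sketch below uses only the values given in the text (a subset of Table 8, not the full matrix), and the helper name is hypothetical.

```python
# Correlation coefficients as quoted in the text above (a subset of Table 8)
reported = {
    ("BP", "CO2"): 0.99, ("BP", "NOx"): 0.98, ("BP", "BTE"): 0.91,
    ("BP", "CO"): -0.98, ("BP", "BSFC"): -0.83, ("LHV", "HC"): 0.46,
    ("BTE", "NOx"): 0.95, ("BTE", "CO2"): 0.84, ("BTE", "BSFC"): -0.97,
    ("BTE", "CO"): -0.93, ("BSFC", "NOx"): -0.88, ("BSFC", "CO"): -0.88,
    ("CO", "NOx"): -0.99, ("HC", "CO2"): 0.76, ("HC", "NOx"): 0.75,
    ("CO2", "NOx"): 0.95,
}


def strongest_pairs(r, k=5):
    """Return the k pairs with the largest |r|, keeping the sign of each coefficient."""
    # sorted() is stable, so ties in |r| keep their order of insertion
    return sorted(r.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


for (p, q), value in strongest_pairs(reported):
    direction = "rise together" if value > 0 else "move in opposite directions"
    print(f"{p}-{q}: r = {value:+.2f} ({direction})")
```

The top of the ranking is shared by the BP-CO₂ and CO-NOx pairs (|r| = 0.99), the second of which is the CO-NOx trade-off discussed above.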