Introduction

Unsaturated soil behavior, also referred to as unsaturated soil mechanics, is a field within geotechnical engineering dedicated to studying soils that are not fully saturated with water1. While traditional soil mechanics has primarily focused on saturated soils, unsaturated soils are critical in various engineering and environmental contexts1,2,3. These soils contain both air and water within their pore spaces1,4, with saturation levels that can range from partial to full1,3. Factors such as suction, matric potential, and water content significantly influence the behavior of unsaturated soils1,2,3,4,5,6. The integration of biochar with a plant microbial fuel cell (PMFC) offers a promising, comprehensive approach to tackling soil degradation, boosting agricultural productivity, and supporting sustainable energy solutions in unsaturated soils1. Continued research and experimentation, leveraging machine learning techniques, are essential to fully explore these synergies and to optimize this combined system for practical applications. Figure 1 presents an illustrative mechanism of the PMFC system.

Fig. 1
figure 1

Mechanism of the biochar catalytic system.

Chen et al.1 examined how plant microbial fuel cells (PMFCs) behave in terms of bio-hydrology and suggested using biochar amendment to enhance their bioelectricity production under drought. Biochar-enhanced PMFCs were created by cultivating green-roof plants on sandy lean clay mixed with biochar at various mass ratios. Biochar significantly boosted the electrical power production of PMFCs by up to 30 times under unsaturated conditions, with bioelectricity closely linked to suction. The study proposed utilizing bioelectricity to monitor soil-water properties and improve the drought resilience of PMFCs. Hussain et al.2 examined how plant microbial fuel cells behave in terms of bio-hydrology and suggested using biochar amendment to enhance their bioelectricity production under drought. Biochar-enhanced PMFCs were created by cultivating green-roof plants in sandy lean clay mixed with biochar at various mass ratios. The study found that biochar significantly boosted the electrical power generated by PMFCs by up to 30 times under dry conditions, and bioelectricity was closely linked to suction. The study proposed utilizing bioelectricity to monitor soil-water properties and improve the drought resilience of PMFCs. Also, Cai et al.3 examined how biochar affects the soil shrinkage and water retention properties of kaolin and bentonite in salty environments. The findings indicated that biochar effectively reduced the shrinkage of the two clays under salt stress by 6-14% and 50-107%, respectively. Biochar's porous structure and hydrophilic properties trap sodium ions through ion exchange and protonation processes. The research indicated that biochar-modified clays may aid in desalination and enhance resistance to shrinkage-induced damage in hydro-chemical barriers. Zhou et al.4 examined biochar synthesis, engineering techniques, uses, and future possibilities.
The review emphasized the advancement of physical, chemical, and bio-engineering methods for producing biochar. Potential applications of biochar include the creation of carbon-based products, wastewater treatment, energy storage, and supercapacitors. The report also explored upcoming strategies and technologies for the circular bio-economy. Guo et al.5 discovered that the use of peanut shell biochar can enhance the interaction between soil and grass in grassed plots. The use of biochar increased the grass leaf area index by 38% and root length density by 200%. The biochar decreased peak suction during evapotranspiration by 54%, thereby decreasing excessive water loss. The study suggested using a 5% peanut shell biochar content for the long-term maintenance of vegetated earthen infrastructures to mitigate adverse effects on plant development. Chen et al.6 investigated how freezing-thawing cycles affect the saturated permeability (Ksat) of biochar-amended clay. Clay samples were compressed at different moisture levels, and the amount of biochar applied was recorded. The study found that the hydraulic conductivity (Ksat) decreased significantly when the biochar application rate exceeded 4%. This indicated that biochar can be utilized as an environmentally acceptable additive to control clay permeability, particularly in cold regions, to reduce Ksat in geo-environmental constructions. Keeffe7 aimed to integrate biochar into precision agriculture technology to enhance soil health, water retention, and crop output. A finite element analysis model was created and applied to a conventional Palouse silt loam soil at various concentrations. The model demonstrated higher retention for both types of amendments and concentrations, resulting in increased moisture retention in and around the amendment zone.
The study investigated the impact of biochar amendment on the hydrologic processes in the Palouse region, specifically targeting the argillic and fragipan layers. Chen et al.8 examined the molecular composition changes in paddy topsoil following biochar amendments using untreated, manured, and burned maize straw. Research findings indicate that biochar improves the retention of carbon in the soil and the durability of organic matter, although there is limited investigation into alterations in the molecular structure of organic compounds. The research revealed that biochar raised the soil organic matter content by 12% and 36% in comparison to no amendment. Yet, biochar also reduced the quantity of n-alkanes, fatty acids, and free lipids in the study. The study proposed that reintroducing agricultural residue as biochar could be a sustainable method to improve soil organic matter and molecular diversity. In another research report, Quinn9 examined the movement of pathogenic Escherichia coli and nonpathogenic E. coli K12 strains in water-saturated sand columns treated with magnesium- and nitrogen-doped biochar. Bacterial cell retention was improved by adding biochar at a 2% weight ratio. The study determined that the biochar surface is hydrophobic, whereas the sand and bacterial surfaces are hydrophilic. The optimal biochar was created at 600 °C, indicating that all factors influencing bacterial movement should be taken into account when developing efficient biochar filters. In other studies, Geuder10 developed management strategies for farmers and producers to mitigate detrimental environmental effects. Two components are discussed: a soil amendment experiment comparing the greenhouse gas (GHG) reduction of various biochar application methods, and the development and construction of a passive multi-component bioreactor. Biochar additions have been proven to enhance soil health and crop yield, as well as reduce greenhouse gas emissions when used in conjunction with fertilizer on farmland.
The study investigated the release of CO2 and N2O from four small areas in Winchester, southern Ontario, Canada, after applying varying amounts of liquid swine manure and biochar. The bioreactors were created to reduce N2O gas emissions and eliminate veterinary and pharmaceutical substances present in field amendments. Additional research is required to establish the ideal biochar co-amendment quantities and timing, as well as to evaluate bioreactor efficiency. Graves11 evaluated the effectiveness of various biochar types derived from Miscanthus grass for controlling ammonia levels in chicken houses. Acid-activated biochar reduced the biochar pH and raised overall acidity levels. During a two-week lab-scale experiment, biochar produced at 400 °C, processed with acetic acid, and administered at a high addition rate decreased ammonia emissions by 19.7%. Future studies should investigate novel biochar activation techniques that improve the preservation of acid functional groups to boost ammonia adsorption.

Methodology

Preamble

The preparation of materials for a biochar-enhanced plant microbial fuel cell experiment in unsaturated soil involved assembling the components required for a controlled and reproducible setup1. Unsaturated soil samples were collected from the study site, and the moisture content was adjusted to maintain unsaturated conditions12,13,14. Containers were prepared to hold the soil and selected vegetation. Biochar was produced via a controlled pyrolysis process following the under-pressure-controlled heating method outlined by Onyelowe et al.15. Soil and biochar particle distributions are shown in Fig. 2. The soil was composed of 65% silt and 35% sand, with a plasticity index of 15%, a specific gravity of 2.66, and a saturated hydraulic permeability (Ks) of 1.5 × 10−6 m/s, classified as CL in the Unified Soil Classification System1. The biochar feedstock was Prunus persica, processed at 600 °C, with a cation exchange capacity (CEC) of 85.0 cmol/kg, an ash content of 21.6%, a pH of 9.1, and a water absorption capacity (WAC) of 3.53 g/g1. Comprehensive characteristics of both the soil and the biochar are detailed in the literature1.

Fig. 2
figure 2

Particle-size distribution curves of the silty soil and the biochar.

Treatment process and data collection

The biochar was incorporated into the PMFC system at mass ratios of 0%, 5%, and 10% relative to soil mass1,2. To ensure uniform mixing, variations in particle size and biochar source were taken into account, with shovels or mixing tools used to distribute the biochar evenly within the soil. Hydrocotyle vulgaris plants, suitable for the experiment, were selected for planting1. A watering system using watering cans provided consistent moisture control throughout the experiment. Carbon-based materials were selected for the anode and cathode electrodes of the PMFC setup, which were securely placed in the containers housing the plants16. Electrical connections were established with wires and connectors linked to measurement instruments, including a multimeter or data logger, to record the voltage, current, and power output of the PMFC. Soil moisture sensors were installed for real-time monitoring, and data recording tools such as notebooks, spreadsheets, or data loggers were used to track all measurements. Safety protocols were followed, with appropriate personal protective equipment (PPE) such as gloves and safety glasses, depending on the materials and procedures. Soil samples were either collected from the study area or selected from a standard soil type, and the moisture content was adjusted by air-drying or adding water to achieve unsaturated conditions. Containers or pots were filled with the prepared soil, and the biochar was mixed into the soil in precise amounts for each treatment level, ensuring even distribution16,17,18. The selected vegetation was planted according to recommended spacing and depth guidelines, ensuring consistency in plant type, growth stage, and health1. The PMFC system was completed by securely installing the chosen electrode materials into the soil containers19,20,21,22. Wires connected the electrodes to the measuring devices, and soil moisture sensors were placed at different depths.
Data recording tools were configured to continuously track soil moisture levels, PMFC performance, and other relevant data1. Experimental conditions such as temperature and lighting were maintained at consistent levels, with replicates set up for each treatment to support reliable results. The complete setup is illustrated in Fig. 3.

Fig. 3
figure 3

Experimental setup with (a) soil plots of PMFC samples, (b) devices for monitoring and measuring unsaturated soil properties and bioelectricity, and (c) illustrative scheme of the PMFC model.

A total of ninety (90) records were collected from the experimentally tested silty sand samples mixed with different amounts of biochar. Each record contains the following data:

  • Bio: biochar ratio;

  • I: electric current (µA);

  • U: electrical potential (mV);

  • θ: volumetric water content;

  • T: temperature (°C);

  • γb: bulk density (g/cm3);

  • Log ψ: base-10 logarithm of suction (kPa).

The collected records were divided into a training set (70 records ≈ 78%) and a validation set (20 records ≈ 22%). Tables 1 and 2 summarize their statistical characteristics and the Pearson correlation matrix. Finally, Fig. 4 shows the histograms for both inputs and outputs and the relations between the inputs and the outputs.
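The split described above can be sketched in a few lines. This is an illustrative sketch only: the records are integer placeholders standing in for the 90 experimental records, and the random seed is an arbitrary assumption.

```python
import random

# Placeholder records standing in for the 90 collected experimental records.
random.seed(42)                       # arbitrary seed for reproducibility
records = list(range(90))
random.shuffle(records)

# 70 records (~78%) for training, the remaining 20 (~22%) for validation.
train, validation = records[:70], records[70:]
```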

Table 1 Statistical analysis of collected database.
Fig. 4
figure 4

Correlation, distribution and interpretation chart.

Sensitivity analysis

The suction of unsaturated granular soil treated with biochar plays a critical role in the performance of Plant Microbial Fuel Cells (PMFCs) for bioelectricity generation. Suction controls the soil moisture retention, hydraulic conductivity, microbial activity, and electrochemical interactions that influence both soil behavior and bioelectricity output. Conducting a sensitivity analysis helps to identify the most influential parameters affecting suction and their impact on the system's efficiency. A preliminary sensitivity analysis was carried out on the collected database to estimate the impact of each input on the (Log Ψ) values. A "one variable at a time" technique was used to determine the "Sensitivity Index" (SI) for each input using the Hoffman and Gardner formula as follows:

$$\:SI\:\left({X}_{n}\right)=\:\frac{Y\left({X}_{max}\right)-Y\left({X}_{min}\right)}{Y\left({X}_{max}\right)}$$
(1)

A sensitivity index of 1.0 indicates complete sensitivity, whereas a sensitivity index of less than 0.01 indicates that the model is insensitive to changes in the parameter. Figure 5 shows the sensitivity analysis with respect to (Log Ψ). It can be seen that U, the electrical potential, has the highest influence on the suction of the unsaturated soil, followed closely by θ, the volumetric water content, and γb, the bulk density (g/cm3).
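Equation (1) can be sketched directly. The response function for log-suction versus the electric potential U and the parameter bounds below are hypothetical illustrations, not the fitted relationship from this study.

```python
def sensitivity_index(response, x_min, x_max):
    """One-variable-at-a-time SI, Eq. (1): (Y(x_max) - Y(x_min)) / Y(x_max)."""
    y_max, y_min = response(x_max), response(x_min)
    return (y_max - y_min) / y_max

# Hypothetical monotonic response of Log(psi) to electric potential U (mV).
response_u = lambda u: 0.002 * u + 1.0

si_u = sensitivity_index(response_u, x_min=100.0, x_max=900.0)
```

An SI near 1.0 would flag U as highly influential; values below 0.01 would flag the input as negligible.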

Fig. 5
figure 5

Sensitivity analysis with respect to LogΨ.

Research program

Eight different ML classification techniques were used to predict the suction of soil with biochar incorporated into the PMFC system at mass ratios of 0%, 5%, and 10% relative to soil mass, using the collected database. These techniques are "Gradient Boosting (GB)", "CN2 Rule Induction (CN2)", "Naive Bayes (NB)", "Support Vector Machine (SVM)", "Stochastic Gradient Descent (SGD)", "K-Nearest Neighbors (KNN)", "Decision Tree (Tree)" and "Random Forest (RF)". The developed models were used to predict (Log Ψ) using the inputs (Bio, I, U, θ, T, γb). All the developed models were created using the "Orange Data Mining" software, version 3.36. The considered data flow diagram is shown in Fig. 6. The following section discusses the results of each model. The accuracies of the developed models were evaluated by comparing the SSE, MAE, MSE, RMSE, Error (%), Accuracy (%) and R2 between the predicted and calculated suction values. The definition of each measurement is presented in Eqs. (2) to (7).

$$\:MAE=\:\frac{1}{N}\sum\:_{i=1}^{N}\left|{y}_{i}-{\widehat{y}}_{i}\right|$$
(2)
$$\:MSE=\:\frac{1}{N}\sum\:_{i=1}^{N}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}$$
(3)
$$\:RMSE=\:\sqrt{MSE}$$
(4)
$$\:Error\:\%=\frac{RMSE}{\widehat{y}}$$
(5)
$$\:Accuracy\:\%=1-Error\:\%$$
(6)
$$\:{R}^{2}=1-\:\frac{\sum\:{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}{\sum\:{\left({y}_{i}-\stackrel{-}{y}\right)}^{2}}$$
(7)
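As an illustration, the error measures of Eqs. (2), (3), (4) and (7) can be computed directly from paired value lists. The short vectors below are toy values, not the study's suction data.

```python
import math

def metrics(y_true, y_pred):
    """Compute SSE, MAE (Eq. 2), MSE (Eq. 3), RMSE (Eq. 4) and R2 (Eq. 7)."""
    n = len(y_true)
    residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)
    mae = sum(abs(r) for r in residuals) / n
    mse = sse / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    r2 = 1.0 - sse / ss_tot
    return {"SSE": sse, "MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy calculated vs. predicted values.
m = metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```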
Fig. 6
figure 6

The considered data flow in “Orange” software.

Theoretical frameworks for the selected machine learning techniques used in this study

Gradient boosting

Gradient boosting is an ensemble machine learning technique that generates a strong predictive model by combining a series of decision trees5. Its framework iteratively minimizes a loss function using gradient descent, making it effective for both classification and regression analysis. It starts with an initial model, f0(x), often a simple prediction such as the mean of the target values for regression or a uniform class probability for classification. In each step, the model is improved by training a new weak learner to focus on the remaining errors, or residuals, from previous predictions. At each iteration, the algorithm calculates the negative gradient of the loss function with respect to the model's predictions, essentially finding the direction in which the model should adjust to reduce error. This gradient guides the training of a new weak learner, hm(x), which is then added to the existing model. The updated model can be written as:

$$\:{f}_{m+1}\left(x\right)={f}_{m}\left(x\right)+\alpha\:{h}_{m}\left(x\right)$$
(8)

where α is the learning rate, which controls the influence of each weak learner on the overall model.
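A minimal sketch of the boosting loop described above, using piecewise-constant "stumps" as the weak learners and squared loss, for which the negative gradient is simply the residual. The one-dimensional toy data are illustrative, not the study's PMFC dataset.

```python
def fit_stump(xs, residuals):
    """Best single-split stump minimising squared error (brute force)."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, split, lm, rm)
    _, s, lm, rm = best
    return lambda x, s=s, lm=lm, rm=rm: lm if x <= s else rm

def boost(xs, ys, n_stages=10, alpha=0.5):
    f0 = sum(ys) / len(ys)                     # initial model f0: the mean
    learners, preds = [], [f0] * len(xs)
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, preds)]   # negative gradient
        h = fit_stump(xs, residuals)
        learners.append(h)
        # f_{m+1}(x) = f_m(x) + alpha * h_m(x), Eq. (8)
        preds = [p + alpha * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + alpha * sum(h(x) for h in learners)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2]
model = boost(xs, ys)
```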

CN2 rule induction

CN2 Rule Induction is a rule-based classification algorithm designed to generate a set of if-then rules that differentiate between classes within a dataset24. It starts from a set of instances, each containing a class label and a collection of attribute values. The algorithm's objective is to iteratively create simple, interpretable rules that classify data accurately by sequentially optimizing rules for maximum coverage and accuracy. Each rule in CN2 takes the general form IF Condition THEN Class, where the Condition is a conjunction of attribute-value pairs that defines a subset of instances for which the rule is valid, and Class is the predicted class label for instances satisfying the condition. CN2 evaluates candidate rules using a heuristic measure, commonly based on the entropy or likelihood ratio of the rule's coverage and accuracy in differentiating a specific class. For a given rule R, the information gain can be calculated using entropy to measure the quality of the rule. The entropy H for a rule's distribution over classes is defined as:

$$\:H\left(R\right)=-\sum\:_{c\in\:C}P\left(c|R\right){\log}_{2}P\left(c|R\right)$$
(9)

where P(c|R) is the conditional probability of class c given that an instance matches the conditions of rule R. The information gain IG of a rule is then:

$$\:IG\left(R\right)=H\left(C\right)-H\left(R\right)$$
(10)

where H(C) represents the entropy of the class distribution in the dataset, and H(R) is the entropy of instances covered by the rule R. A higher information gain implies a more effective rule in separating instances of different classes. The coverage of a rule R, denoted as Cov(R), refers to the proportion of instances in the dataset that satisfy the rule's conditions. This is mathematically represented as:

$$\:Cov\left(R\right)=\frac{\left|\left\{x\in\:X\:\mid\:x\:\text{satisfies}\:R\right\}\right|}{\left|X\right|}$$
(11)

where |X| is the total number of instances in the dataset X. Higher coverage indicates that the rule applies to a larger portion of the dataset, though there may be a trade-off between coverage and precision.
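Equations (9) to (11) can be illustrated on a toy dataset with a single hypothetical rule; neither the data nor the rule reflects the 54 rules actually induced in this study.

```python
import math

def entropy(labels):
    """Class-distribution entropy, as in Eq. (9)."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

# Toy instances: (attribute dict, class label).
data = [({"colour": "red"}, "A"), ({"colour": "red"}, "A"),
        ({"colour": "blue"}, "B"), ({"colour": "blue"}, "A")]

# Hypothetical rule: IF colour = red THEN class A.
rule = lambda x: x["colour"] == "red"
covered = [c for x, c in data if rule(x)]

h_c = entropy([c for _, c in data])   # H(C): entropy of the whole dataset
h_r = entropy(covered)                # H(R): entropy of covered instances
ig = h_c - h_r                        # Eq. (10)
coverage = len(covered) / len(data)   # Eq. (11)
```

Here the rule is "pure" (all covered instances are class A), so H(R) is zero and the information gain equals the dataset entropy.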

Naive Bayes

Naive Bayes is a probabilistic classifier based on Bayes' theorem, leveraging the assumption that the features are conditionally independent given the class. For a given class C and feature vector X = (x1,x2,…,xn), the posterior probability is as follows:

$$\:P\left(C|X\right)\propto\:P\left(C\right)\prod\:_{i=1}^{n}P\left({x}_{i}|C\right)$$
(12)

Thus the class \(\:\widehat{C}\) that maximizes this posterior probability is predicted by the classifier, such that:

$$\:\widehat{C}=\underset{C}{\text{arg}\:\text{max}}\left(P\left(C\right)\prod\:_{i=1}^{n}P\left({x}_{i}|C\right)\right)$$
(13)

where P(C) is the prior probability of each class, estimated as the relative frequency of instances in that class, and P(xi|C) is the conditional probability of each feature given the class, which can be calculated from frequency counts for categorical features or approximated by a Gaussian distribution for continuous features.
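A minimal Gaussian Naive Bayes sketch of Eqs. (12) and (13), working in log space for numerical stability. The two-class toy data and class names are illustrative assumptions, not the study's records.

```python
import math

def fit_gnb(X, y):
    """Estimate per-class prior, feature means and variances."""
    params = {}
    for c in set(y):
        rows = [x for x, yc in zip(X, y) if yc == c]
        prior = len(rows) / len(X)                          # P(C)
        means = [sum(col) / len(col) for col in zip(*rows)]
        varis = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                 for col, m in zip(zip(*rows), means)]      # variance floor
        params[c] = (prior, means, varis)
    return params

def predict(params, x):
    """arg max over classes of log P(C) + sum_i log P(x_i | C), Eq. (13)."""
    def log_post(c):
        prior, means, varis = params[c]
        ll = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                 for xi, m, v in zip(x, means, varis))
        return math.log(prior) + ll
    return max(params, key=log_post)

# Toy one-feature data with two hypothetical suction classes.
X = [[1.0], [1.2], [0.9], [5.0], [5.2], [4.8]]
y = ["low", "low", "low", "high", "high", "high"]
model = fit_gnb(X, y)
```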

Support vector machine

Support Vector Machines (SVMs) are supervised machine learning techniques mainly used for classification tasks25. An SVM finds the optimal hyperplane that maximally separates data points from different classes. For linearly separable data, SVM identifies this hyperplane by maximizing the distance, or margin, between the hyperplane and the closest data points of each class, known as the support vectors. Considering a dataset of labeled instances (xi, yi), where xi ∈ Rn and yi ∈ {−1, 1}, the decision boundary is a hyperplane w·x + b = 0, where w is the weight vector perpendicular to the hyperplane and b is the bias term. The optimization problem to maximize the margin is formulated as:

$$\:\underset{w,b}{\text{min}}\:{\frac{1}{2}\left|\left|w\right|\right|}^{2}$$
(14)

Subject to the constraints:

$$\:{y}_{i}\left(w.{x}_{i}+b\right)\ge\:1\:\:\:\forall\:i$$
(15)

In the case of non-linearly separable data, SVM applies kernel functions to project the data into a higher-dimensional space where a linear separation is possible. Common kernels include the linear, polynomial, and radial basis function (RBF) kernels. The decision function for classification is then:

$$\:f\left(x\right)=sign\left(\sum\:_{i=1}^{n}{\alpha\:}_{i}{y}_{i}K\left(x,{x}_{i}\right)+b\right)$$
(16)

where the αi are the Lagrange multipliers and K(x, xi) is the chosen kernel function.
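The decision function of Eq. (16) can be sketched with an RBF kernel. The support vectors, multipliers αi and bias b below are assumed illustrative values, not the output of a trained solver, which would normally determine them.

```python
import math

def rbf_kernel(x, x_sv, gamma=1.0):
    """RBF kernel K(x, x_sv) = exp(-gamma * ||x - x_sv||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, x_sv)))

def decision(x, support_vectors, alphas, labels, b=0.0):
    """f(x) = sign( sum_i alpha_i * y_i * K(x, x_i) + b ), Eq. (16)."""
    s = sum(a * yi * rbf_kernel(x, sv)
            for a, yi, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if s >= 0 else -1

# Assumed (untrained) support vectors, one per class.
svs = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
labels = [-1, 1]
```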

Stochastic gradient descent

Stochastic Gradient Descent (SGD) is an iterative optimization technique widely used for training machine learning models26. SGD minimizes a given objective function, J(θ), which typically measures the model error with parameters θ, and is particularly useful in high-dimensional spaces. In each iteration, SGD computes the gradient using a single randomly chosen instance or a small batch rather than computing the gradient over the entire dataset. This technique speeds up convergence by updating the parameters as follows:

$$\:\theta\:=\theta\:-\eta{\nabla\:}_{\theta\:}J(\theta\:;{x}^{\left(i\right)},{y}^{\left(i\right)})$$
(17)

where η is the learning rate, controlling the step size, and \(\:{\nabla\:}_{\theta\:}J(\theta\:;{x}^{\left(i\right)},{y}^{\left(i\right)})\) is the gradient of the objective function with respect to θ, evaluated at the training example \(\:({x}^{\left(i\right)},{y}^{\left(i\right)})\).
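The update rule of Eq. (17) can be sketched for a least-squares linear model on a hypothetical one-dimensional dataset generated from y = 2x; the learning rate and iteration count are arbitrary illustrative choices.

```python
import random

# Toy noise-free data from y = 2x, so the optimum is theta = 2.
random.seed(0)
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5]]

theta, eta = 0.0, 0.05
for _ in range(500):
    x, y = random.choice(data)          # one randomly chosen instance
    grad = 2.0 * (theta * x - y) * x    # gradient of (theta*x - y)^2 w.r.t. theta
    theta -= eta * grad                 # Eq. (17): theta = theta - eta * grad
```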

k-Nearest neighbours

The k-Nearest Neighbors algorithm, also denoted k-NN, is a non-parametric, instance-based classification technique that predicts the class of a query instance based on the majority class among its k closest neighbors in the training data27. It operates by estimating the distance between the query instance and all other points in the dataset, commonly using the Euclidean distance for continuous variables:

$$\:d\left(x,{x}^{{\prime\:}}\right)=\sqrt{\sum\:_{i=1}^{n}{({x}_{i}-{x}_{i}^{{\prime\:}})}^{2}}$$
(18)

where x and x′ are two instances in n-dimensional space.
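A minimal k-NN classifier using the Euclidean distance of Eq. (18) with majority voting; the labelled toy points and class names are illustrative, not the study's records.

```python
import math
from collections import Counter

def euclidean(x, xp):
    """Eq. (18): Euclidean distance in n-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xp)))

def knn_predict(train, query, k=3):
    """Majority class among the k nearest training points."""
    nearest = sorted(train, key=lambda t: euclidean(t[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy labelled points (feature vector, class).
train = [([0.0, 0.0], "dry"), ([0.1, 0.2], "dry"), ([0.2, 0.1], "dry"),
         ([1.0, 1.0], "wet"), ([0.9, 1.1], "wet"), ([1.1, 0.9], "wet")]
```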

Decision tree

Decision Trees are supervised learning algorithms used for classification and regression tasks. They split the data recursively on feature values to create a tree structure in which each internal node represents a feature test, each branch an outcome, and each leaf node a predicted value. For a dataset D with classes C, the tree grows by selecting features that maximize the information gain or minimize the impurity. The information gain IG for a split on feature X is expressed as:

$$\:IG\left(D,X\right)=H\left(D\right)-\sum\:_{v\in\:values\left(X\right)}\frac{\left|{D}_{v}\right|}{\left|D\right|}H\left({D}_{v}\right)$$
(19)

where H(D) is the entropy or impurity of dataset D, and Dv is the subset of D for each value v of feature X.
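Equation (19) can be computed directly on a toy dataset with one categorical feature; the data and feature name are illustrative assumptions.

```python
import math

def entropy(labels):
    """H(D): entropy of a class-label list."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(rows, feature):
    """Eq. (19): IG(D, X) = H(D) - sum_v |D_v|/|D| * H(D_v)."""
    labels = [c for _, c in rows]
    total = entropy(labels)
    weighted = 0.0
    for v in {x[feature] for x, _ in rows}:
        subset = [c for x, c in rows if x[feature] == v]
        weighted += len(subset) / len(rows) * entropy(subset)
    return total - weighted

# Toy rows: (feature dict, class label).
rows = [({"wet": True}, "grow"), ({"wet": True}, "grow"),
        ({"wet": False}, "wilt"), ({"wet": False}, "grow")]
```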

Random forest

The random forest algorithm is an ensemble learning approach that builds multiple decision trees for regression or classification tasks, improving robustness and accuracy by reducing the overfitting of single trees25. Each tree in the forest is trained on a different bootstrap sample of the dataset, with random subsets of features selected at each split, introducing diversity among trees. For a training dataset D with n samples, Random Forest constructs m decision trees T1, T2, …, Tm. Each tree is trained on a bootstrap sample Di (a random sample with replacement) from D, and at each node a random subset of k features is selected to find the best split. For classification, the output is determined by a majority vote across all trees:

$$\:\widehat{y}=mode({T}_{1}\left(x\right),{T}_{2}\left(x\right),\dots\:,{T}_{m}\left(x\right))$$
(20)

For regression, the output is the average prediction from all trees:

$$\:\widehat{y}=\frac{1}{m}\sum\:_{i=1}^{m}{T}_{i}\left(x\right)$$
(21)
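The aggregation rules of Eqs. (20) and (21) can be sketched with stand-in "trees" (constant functions here); a real forest would instead train each tree on a bootstrap sample with random feature subsets, which this fragment does not do.

```python
from collections import Counter

def forest_classify(trees, x):
    """Eq. (20): majority vote (mode) across tree predictions."""
    return Counter(t(x) for t in trees).most_common(1)[0][0]

def forest_regress(trees, x):
    """Eq. (21): mean of tree predictions."""
    return sum(t(x) for t in trees) / len(trees)

# Stand-in trees: constant predictors used purely to exercise the aggregation.
trees_cls = [lambda x: "A", lambda x: "B", lambda x: "A"]
trees_reg = [lambda x: 1.0, lambda x: 2.0, lambda x: 3.0]
```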

Response surface methodology

Response Surface Methodology (RSM) comprises mathematical and statistical methods for modelling and optimizing processes by exploring the relationships between response variables and multiple input factors28. RSM approximates the underlying process using, most commonly, a second-order polynomial equation, which is suitable for capturing the curvature of the response surface. For a process with input variables x1,x2,…,xk and response y, the second-order response surface model is:

$$\:y={\beta\:}_{0}+\sum\:_{i=1}^{k}{\beta\:}_{i}{x}_{i}+\sum\:_{i=1}^{k}{\beta\:}_{ii}{x}_{i}^{2}+\sum\:_{i<j}{\beta\:}_{ij}{x}_{i}{x}_{j}+\in\:$$
(22)

where β0 is the intercept; βi, βii, and βij are the coefficients for the linear, quadratic, and interaction terms, respectively; and \(\epsilon\) is the error term. RSM typically utilizes a Box-Behnken Design or a Central Composite Design (CCD) to gather data and fit the model efficiently. RSM then identifies the optimal settings of the input factors by analyzing the fitted response surface, often using gradient-based methods to locate the maximum, minimum, or other desired regions of the response.
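As a minimal illustration of Eq. (22), the one-factor case y = β0 + β1x + β11x² can be fitted exactly through three design points, and its stationary point (a candidate optimum) located from the fitted coefficients. The design points below are hypothetical, not from this study's experiments.

```python
def fit_quadratic(p0, p1, p2):
    """Exact quadratic y = b0 + b1*x + b11*x^2 through three (x, y) points,
    via Newton divided differences (k = 1 case of Eq. 22, no error term)."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    f01 = (y1 - y0) / (x1 - x0)                     # first divided difference
    b11 = ((y2 - y0) / (x2 - x0) - f01) / (x2 - x1)  # second divided difference
    b1 = f01 - b11 * (x0 + x1)
    b0 = y0 - b1 * x0 - b11 * x0 ** 2
    return b0, b1, b11

# Hypothetical design points: response vs. biochar ratio (%).
b0, b1, b11 = fit_quadratic((0.0, 2.0), (5.0, 3.5), (10.0, 3.0))

# Stationary point of the fitted surface, where dy/dx = b1 + 2*b11*x = 0.
x_opt = -b1 / (2.0 * b11)
```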

Results presentation and discussion

GB model

The developed (GB) model was based on the (Scikit-learn) method with a learning rate of 0.1 and a minimum split subset of 2. Six trials were conducted for each model, starting with two trees and three tree levels and increasing to five tree levels, although the four- and five-tree configurations produced negative errors, introducing distortions. The reduction of the prediction Error (%) for each trial is presented in Fig. 7. Accordingly, the models with four trees and four tree levels are considered the optimum ones. Performance metrics of the developed model for both the training and validation datasets are listed in Table 2. The average achieved accuracy was (98%). The relations between calculated and predicted values are shown in Fig. 8.

Fig. 7
figure 7

Reduction in Error % with increasing the number of trees and levels.

Fig. 8
figure 8

Relation between predicted and calculated suction using (GB).

CN2 model

Similarly, five (CN2) models were developed considering "Laplace accuracy" as the evaluation measure, with a beam width of 1.0 and a minimum rule coverage of 1.0. The maximum rule length started at 1.0 and was increased up to 5.0. Figure 9 shows the reduction in Error % with increasing rule length. Accordingly, a rule length of 5.0 is considered. The developed model contains 54 "IF condition" rules; Fig. 10 presents a sample of these rules. Performance metrics of the developed model for both the training and validation datasets are listed in Table 2. The average achieved accuracy was (95%). The relations between calculated and predicted values are shown in Fig. 11.

Fig. 9
figure 9

Reduction in error % with increasing the rule length.

Fig. 10
figure 10

Sample of the developed CN2 “If condition”.

Fig. 11
figure 11

Relation between predicted and calculated suction using (CN2).

NB model

A traditional Naive Bayes classifier based on the "maximum likelihood" concept was used to develop the nine models. Although this type of classifier is highly scalable and widely used, it showed low performance, as shown in Table 2. The relations between calculated and predicted values are shown in Fig. 12. The achieved average accuracy was 76%.

Fig. 12
figure 12

Relation between predicted and calculated suction using (NB).

SVM model

The developed (SVM) model was based on a "polynomial" kernel with a cost value of 100, a regression loss of 0.10 and a numerical tolerance of 1.0. The kernel started with a one-degree polynomial (linear) and was increased to a two-degree polynomial (quadratic). The reduction in Error % with increasing polynomial degree is illustrated in Fig. 13. Performance metrics of the three developed models for both the training and validation datasets are listed in Table 2. The average achieved accuracy was (97%). The relations between calculated and predicted values are shown in Fig. 14.

Fig. 13
figure 13

Reduction in error % with increasing the polynomial degree.

Fig. 14
figure 14

Relation between predicted and calculated suction using (SVM).

SGD model

These three models were developed considering the modified Huber classification function and the "Elastic net" regularization technique with a mixing factor of 0.01 and a strength factor of 0.001. The learning rate started at 0.01 and was gradually decreased to 0.001. The reduction in Error (%) with decreasing learning rate is presented in Fig. 15. Performance metrics of the three developed models for both the training and validation datasets are listed in Table 2. The average achieved accuracy was (85%). The relations between calculated and predicted values are shown in Fig. 16.

Fig. 15
figure 15

Reduction in error % with reducing the learning rate.

Fig. 16
figure 16

Relation between predicted and calculated suction using (SGD).

KNN model

Considering a single neighbor (k = 1), the Euclidean distance metric, and distance-based weighting, the developed (KNN) models showed good accuracy; the average achieved accuracy was (95%). The relations between calculated and predicted values are shown in Fig. 17.

Fig. 17
figure 17

Relation between predicted and calculated suction using (KNN).

Tree model

These four models were developed considering a minimum number of instances in leaves of 2.0 and a minimum split subset of 5.0. The models began with only two tree levels and gradually increased to five levels. Figure 18 illustrates the reduction in error with increasing number of levels. The generated tree layouts are shown in Fig. 19. Performance metrics of the last developed model for both the training and validation datasets are listed in Table 2. The average achieved accuracy was (95%). The relations between calculated and predicted values are shown in Fig. 20.

Fig. 18
figure 18

Reduction in error % with increasing the no. of layers.

Fig. 19
figure 19

The layout of the developed (Tree).

Fig. 20
figure 20

Relation between predicted and calculated suction using (Tree).

RF model

Finally, six (RF) models were generated. The models began with only two trees and three levels and increased up to three trees and five levels. Figure 21 shows the reduction in Error (%) with increasing number of trees and levels. Accordingly, the models with four trees and four levels are considered. The developed models are graphically presented using a Pythagorean forest in Fig. 22. These arrangements led to a good average accuracy of (91%). The relations between calculated and predicted values are shown in Fig. 23.

Fig. 21
figure 21

Reduction in error (%) with increasing the no. of trees and layers.

Fig. 22
figure 22

Pythagorean forest diagram for the developed (RF) models.

Fig. 23
figure 23

Relation between predicted and calculated suction using (RF).

Overall, the performance summary of the suction models is presented in Table 2, showing the selected performance-evaluation indices (SSE, MAE, MSE, RMSE, Error, Accuracy and R2) utilized in this research paper. Figure 24 presents the Taylor charts for comparing the accuracies of the developed models.

Table 2 Performance measurements of developed models.
Fig. 24
figure 24

Comparing the accuracies of the developed models for (Log Ψ) using Taylor charts, (a) training dataset, (b) validation dataset.

RSM model

The application used actual factor coding with Type III (partial) sums of squares. The model F-value of 18.04 implies that the model is significant; there is only a 0.01% chance that an F-value this large could occur due to noise. P-values below 0.0500 indicate significant model terms; in this case B, C, AB, AC, BF, CF, A2 and C2 are significant. Values greater than 0.1000 indicate that model terms are not significant. These results are shown in Tables 3 and 4. If there are many insignificant model terms (not counting those required to support hierarchy), model reduction may improve the model. The predicted R2 of 0.9812 is not as close to the adjusted R2 of 0.8289 as one might normally expect, i.e. the difference is more than 0.2. This may indicate a large block effect or a possible problem with the model and/or data; points to consider are model reduction, response transformation, outliers, etc. All empirical models should be tested with confirmation runs. Adeq Precision measures the signal-to-noise ratio, and a ratio greater than 4 is desirable; the ratio of 19.797 indicates an adequate signal, so the model can be used to navigate the design space. The optimized model plots for the RSM prediction of the suction pressure are presented in Figs. 25, 26, 27 and 28. Equation (23), expressed in terms of actual factors, can be used to predict the response for given levels of each factor; here, the levels should be specified in the original units of each factor. This equation should not be used to determine the relative impact of each factor, because the coefficients are scaled to accommodate the units of each factor and the intercept is not at the center of the design space.

Table 3 ANOVA for quadratic + extra terms model (aliased) for response y.
Table 4 The statistical fit values.
$$\begin{aligned} y= & \;187.57142+1.35376\times 10^{5}\,\text{Bio}-452.17902\,I+48.38180\,U-36667.07910\,q+173.94143\,T \\ & -1348.35129\,\gamma_{b}+777.35903\,G-78.75812\,\text{Bio}\cdot I+76731.93900\,\text{Bio}\cdot U-523.91064\,\text{Bio}\cdot q \\ & -59610.96739\,\text{Bio}\cdot T-0.034816\,\text{Bio}\cdot G-136.89295\,I\cdot U-1.86523\,I\cdot q+234.06554\,I\cdot T \\ & +15.41467\,I\cdot G+0.378524\,U\cdot q-28.53141\,U\cdot T+152.24491\,U\cdot G+15512.76376\,q\cdot T \\ & -299.12222\,q\cdot G-1.47921\times 10^{5}\,\gamma_{b}\cdot G-0.133403\,\text{Bio}^{2}+0.009499\,I^{2}-8927.01175\,U^{2} \\ & +8.78913\,q^{2}+1990.93268\,T^{2} \end{aligned}$$
(23)
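Equation (23) can be evaluated directly for given factor levels (in their original units). A sketch, with the coefficients transcribed from the fitted model and γb written as `gb` (the function name is illustrative):

```python
# Direct evaluation of the RSM equation (Eq. 23) in actual factor units.
# As noted in the text, these coefficients must not be used to rank the
# relative importance of the factors.
def rsm_suction(Bio, I, U, q, T, gb, G):
    return (187.57142
            + 1.35376e5 * Bio - 452.17902 * I + 48.38180 * U
            - 36667.07910 * q + 173.94143 * T - 1348.35129 * gb
            + 777.35903 * G
            - 78.75812 * Bio * I + 76731.93900 * Bio * U
            - 523.91064 * Bio * q - 59610.96739 * Bio * T
            - 0.034816 * Bio * G
            - 136.89295 * I * U - 1.86523 * I * q
            + 234.06554 * I * T + 15.41467 * I * G
            + 0.378524 * U * q - 28.53141 * U * T + 152.24491 * U * G
            + 15512.76376 * q * T - 299.12222 * q * G
            - 1.47921e5 * gb * G
            - 0.133403 * Bio**2 + 0.009499 * I**2
            - 8927.01175 * U**2 + 8.78913 * q**2 + 1990.93268 * T**2)

# With all factors at zero the prediction reduces to the intercept.
print(rsm_suction(0, 0, 0, 0, 0, 0, 0))  # 187.57142
```

In practice, the factor levels passed in must lie within the design ranges used to fit the model, since the polynomial extrapolates poorly outside them.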
Fig. 25
figure 25

Optimized plots of (a) residuals, (b) residuals versus predicted, (c) residuals versus experimental runs.

Fig. 26
figure 26

Plots of (a) Cook’s distance and (b) Box-Cox curves for power transforms.

Fig. 27
figure 27

Scatter plots of (a) leverage versus runs, (b) DFFITS versus runs and (c) DFBETAS for intercept versus runs.

Fig. 28
figure 28

Scatter plots for the suction pressure versus the input variables.

Conclusions

This research aims to predict the logarithm of the suction pressure (Log Ψ) of granular soil treated with biochar using the biochar ratio (Bio), electric current (I), electrical potential (U), matric water content (θ), temperature (T) and bulk density (γb). Eight ML classification techniques, namely GB, CN2, NB, SVM, SGD, KNN, Tree and RF, and one symbolic regression technique, RSM, were considered in this research. The outcomes of this study can be summarized as follows:

  • The GB, SVM and CN2 models showed excellent accuracies of about 97%; the KNN and Tree models showed very good accuracies of about 93%; the SGD and RF models showed fair accuracies of about 83–90%; and the NB model showed a poor accuracy of 74%.

  • Sensitivity analysis indicated that all inputs had almost the same level of influence on the suction pressure (20–24%), except temperature, which showed a lower influence (about 10%).

  • All the developed models are too complicated to be applied manually, which may be considered the main disadvantage of ML classification techniques compared with symbolic regression ML techniques such as GP and EPR.

  • The RSM symbolic model produced an R2 of 0.9812 with an Adeq Precision of 19.797, which indicates an adequate signal from the model.

  • The developed models are valid within the considered range of parameter values; beyond this range, the prediction accuracy should be verified.