Introduction

Stabilization techniques for sulfate sand subgrade materials are becoming increasingly crucial for transportation infrastructure in desert and arid regions across Belt and Road countries such as China, Iran, and Saudi Arabia1,2. In sulfate sands, traditional calcium-based stabilizers (cement and lime) react with sulfates to generate ettringite minerals, causing significant volumetric expansion during hydration and thus heaving and cracking of subgrades3,4,5,6. Additionally, cement production raises growing concerns over high energy consumption, carbon emissions, and environmental risk7,8. Microsilica, a by-product of silicon metal production, poses significant risks to human health and the environment if not handled properly. Recently, the combination of microsilica and lime as a mixed stabilizer has shown promise for sulfate soil stabilization9,10,11,12,13,14,15,16. When microsilica and lime are mixed in the presence of water, the environmental pH value increases and the active silica reacts with calcium hydroxide to form calcium silicate hydrate gels17,18, resulting in better performance of microsilica-lime stabilized sulfate sand (MSLSS), with higher strength and less hydration expansion.

Despite this potential, accurately modeling the relationships between the strength of MSLSS and its multiple influencing factors remains a challenge hindering its broader application. The mechanical properties of MSLSS are governed by several key factors, including the proportions of lime and microsilica, curing age, and compaction conditions19,20. A thorough understanding of the relationships among them is essential for advancing the utilization of MSLSS. Conventional laboratory testing provides a direct approach but is time-intensive and costly, especially for this multi-variable system. On the other hand, the accuracy of conventional empirical formulas obtained by regression analysis becomes doubtful as more regression coefficients are introduced to mimic the highly nonlinear relationship. As a result, it is imperative to propose a method with high robustness, accuracy, and generalization capability for predicting the mechanical properties of MSLSS.

In recent years, machine learning (ML) has emerged as a transformative tool in civil engineering, enabling data-driven modeling of complex material behaviors and geotechnical properties that are otherwise challenging to characterize using traditional empirical or physical approaches21,22. The application of ML techniques has demonstrated remarkable success in predicting mechanical properties such as unconfined compressive strength (UCS), stiffness, permeability, and liquefaction potential of soils and construction materials23,24,25,26,27,28,29. For instance, Jas et al.30 developed an explainable ML model using LightGBM and SHAP analysis to evaluate liquefaction potential in gravelly soils, achieving a balance between accuracy and interpretability, while Kang et al.31 integrated Lattice Boltzmann Methods with ML algorithms to predict permeability in porous media, reporting R2 values exceeding 0.9. In the realm of strength prediction, Nasiri et al.32 employed SHAP-based explainable AI to model UCS and Young’s modulus of rocks, emphasizing model transparency, and Nawaz et al.33,34 used Gene Expression Programming (GEP) and ANN to estimate UCS and stiffness modulus in clayey soils, achieving high accuracy (R2 up to 0.98) through sensitivity analysis. Similarly, Zhang et al.35 applied Extreme Gradient Boosting (XGBoost) to predict compressive strength of cement-stabilized soft soil, identifying key features such as cement content and curing age. Further advancing the field, Khawaja et al.36 compared GEP, ANN, and Multi-Expression Programming for predicting resilient modulus of subgrade soil, while Jafri et al.37 used GEP to model rock cutting performance. Moreover, recent works by Sun et al.38 and Yao et al.39 have optimized alkali-activated concrete and solid waste-cement stabilized soils using Random Forest and XGBoost models, respectively, further reinforcing the potential of ML in sustainable construction practices. 
These advancements collectively illustrate the superiority of single ML models over conventional empirical formulas, particularly in handling nonlinearity and high-dimensional interactions, paving the way for more reliable and efficient engineering solutions.

A common drawback of ML approaches is that researchers often either dedicate substantial time to manually adjusting hyperparameters or rely on default settings, which may be far from optimal40. To address this, scholars are increasingly leveraging intelligent optimization algorithms (sparrow search algorithm41, genetic algorithm42, whale optimization algorithm43, particle swarm optimization44, grey wolf optimizer45, artificial bee colony46, etc.) to optimize ML model hyperparameters during training and improve prediction accuracy. Among them, the sparrow search algorithm (SSA), a biologically inspired heuristic approach, offers a simple structure, fast convergence, and strong search ability, leading to its extensive application in optimization tasks47,48. Recently, some scholars have successfully applied SSA-optimized models to hydrological management and disaster prevention. Liu et al.49, Hu et al.50, and Song et al.51 achieved groundwater potentiality prediction and water quality prediction with hybrid artificial intelligence methods that integrate SSA with ML techniques. Zheng et al.52, Shui et al.53, and Wang et al.54 adopted SSA to optimize ML models for slope stability prediction.

Despite these advances, the prediction of strength in stabilized sulfate-rich soils, a critical concern for infrastructure development in arid regions, remains underexplored. The complex physicochemical interactions between lime, microsilica, and sulfate sands present challenges for conventional modeling approaches. Notably, the potential of SSA-optimized hybrid machine learning models for predicting geotechnical properties of MSLSS represents a significant research gap. This study addresses this void by pioneering the integration of metaheuristic optimization with interpretable ML frameworks, ultimately contributing to intelligent, data-informed geotechnical design for sustainable infrastructure.

To achieve this goal, hybrid prediction models with hyperparameters tuned by the sparrow search algorithm are developed for predicting the unconfined compressive strength (UCS) of MSLSS. Firstly, a dataset containing the results of compaction test and UCS test of MSLSS was collected. Secondly, six ML models were proposed and evaluated: three hybrid models (XGB-SSA, RF-SSA, DT-SSA) and their corresponding single models (XGB, RF, DT). During model training, the hyperparameters of the hybrid models were optimized using SSA, while the single models used default settings. Model performance was assessed using indicators including R2, MAE, MSE, and MRE. This study represents the first application of hybrid ML models optimized by SSA to forecast the UCS of MSLSS, contributing to the advancement of ML applications in soil modification.

Methodology

Methodology conception

Figure 1 illustrates the methodology flowchart for developing both hybrid ML models integrated with SSA and single ML models. The main steps of the prediction model development are as follows:

  1. A dataset containing experimental UCS data for MSLSS was compiled from the literature9,20. Initial exploratory data analysis included generating scatter plots and calculating Pearson correlation coefficients to investigate relationships between variables.

  2. Data splitting and normalization. For hybrid prediction models, the dataset was partitioned into a training set (70%), a validation set (10%), and a testing set (20%). For single models, the dataset was partitioned into a training set (80%) and a testing set (20%).

  3. Model training. For the hybrid models, the base ML models (XGB, RF, DT) were initially trained using the training set. Subsequently, the validation set was input into these preliminarily trained models under different hyperparameter configurations. For the single models, the models were trained directly using the training set with their default hyperparameter settings.

  4. Model validation. The testing set was used to evaluate the predictive accuracy and generalization capabilities of the final models.

Fig. 1

Methodology flowchart.

Data collection and analysis

The experimental dataset utilized in this study, comprising a total of 96 data points, was sourced from the work of Ghorbani et al.9 and Karimi et al.20, specifically focusing on the UCS of MSLSS. The dataset encompasses key parameters influencing UCS, including lime content (L), microsilica content (MS), curing days (CD), curing condition (CC), optimum moisture content (OMC), and maximum dry density (MDD).

In this work, these six parameters are considered independent variables, while UCS is the response variable. Table 1 outlines the statistical distribution of each input and output variable within the compiled databank. Scatter plots depicting the relationships between each input variable and UCS, along with linear fit lines, are presented in Fig. 2. The scatter plots indicate a positive correlation between UCS and L, MS, and OMC, while a negative correlation is observed with MDD. Notably, all variables exhibit considerable dispersion, suggesting inherent randomness and complex, potentially nonlinear, interrelationships. This complexity precludes accurate description using conventional polynomial models, motivating the use of ML approaches.
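As a sketch of this exploratory step, the snippet below computes a Pearson correlation coefficient on synthetic stand-in data; the variable ranges and the coupling between L and OMC are invented for illustration and are not the actual MSLSS databank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two input variables (NOT the real MSLSS data):
# lime content L and optimum moisture content OMC, loosely coupled so that
# OMC rises with L, as observed in the compiled databank.
L = rng.uniform(0.0, 8.0, 96)
OMC = 10.0 + 1.2 * L + rng.normal(0.0, 1.0, 96)

def pearson(x, y):
    """Pearson correlation coefficient (PCC) between two 1-D arrays."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

pcc = pearson(L, OMC)  # strongly positive for this synthetic coupling
```

The same function applied pairwise over all six inputs and UCS reproduces the kind of correlation matrix shown in Fig. 3.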

Table 1 Description of variables.
Fig. 2

Scatter plots of input variables versus output UCS: (a) L-UCS; (b) MS-UCS; (c) CC-UCS; (d) CD-UCS; (e) OMC-UCS; (f) MDD-UCS.

Figure 3 depicts the Pearson correlation matrix, with blue and red hues denoting negative and positive coefficients, respectively. The input variables exhibit two striking relationships: OMC-L (PCC = 0.91) and MDD-MS (PCC = −0.91). The simultaneous rise in OMC and fall in MDD originate from the intrinsically lower bulk densities and higher hydrophilicity of microsilica and lime. The incorporation of microsilica and lime dilutes the overall matrix density, directly reducing MDD, while their affinity for water elevates the water demand during compaction, increasing OMC. Despite reducing density and increasing water content, microsilica and lime act as effective cementitious agents: after compaction, their pozzolanic reactions stiffen the soil skeleton55,56. This results in a counterintuitive negative relationship between MDD and UCS, alongside significant positive correlations of UCS with OMC (PCC = 0.65) and with L (PCC = 0.62).

Fig. 3

Matrix of Pearson correlation coefficient for all variables.

Data splitting and normalization

To mitigate overfitting risks, we employed a combination of cross-validation and learning-curve analysis. The dataset was split into five folds; each fold in turn served as the testing set while the remaining four folds were used for training and validation, yielding five training-validation runs. Specifically, the dataset was partitioned into training (70%), validation (10%), and testing (20%) sets for the hybrid models, and into training (80%) and testing (20%) sets for the single models, using systematic sampling to maintain a consistent data distribution. The final performance is reported as the average of the five validation results.

Prior to model training, all input variables were normalized to the interval [0, 1]. This preprocessing step ensures uniform influence of each feature on the predicted response and significantly accelerates the convergence and enhances the numerical stability of the subsequent iterative optimization process.
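The split-and-normalize step can be sketched as follows. The toy arrays, the feature ranges, and the use of random shuffling in place of the systematic sampling described above are assumptions for illustration only; the scaler is fitted on the training set so that testing-set statistics do not leak into training.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 96
X = rng.uniform(0.0, 10.0, size=(n, 6))  # toy stand-in for the six input variables
y = rng.uniform(2.0, 40.0, size=n)       # toy stand-in for UCS

# 70/10/20 split used for the hybrid models; random shuffling stands in
# for the systematic sampling of the paper.
idx = rng.permutation(n)
n_train, n_val = int(0.7 * n), int(0.1 * n)
train, val, test = np.split(idx, [n_train, n_train + n_val])

# Min-max normalization to [0, 1], fitted on the training set only.
lo, hi = X[train].min(axis=0), X[train].max(axis=0)
X_scaled = (X - lo) / (hi - lo)
```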

Machine learning models

The DT model is a supervised ML model commonly used for classification and regression. It is structured as a tree, where leaves represent outcomes (predicted values) and branches represent the decision rules leading to those outcomes57. The DT model is inherently nonlinear and is easy to interpret and visualize. Its main drawback is proneness to overfitting, especially when tree growth is unconstrained.

The RF model is a widely used ensemble ML method that integrates multiple DTs. Figure 4 presents a schematic of the RF model. To enhance precision and efficiency, each DT is trained independently on a distinct bootstrap sample (drawn with replacement) of the training data58. The final outcome is determined by averaging (for regression) or majority voting (for classification) over the individual DTs, and overfitting as well as the influence of noisy or outlier data points is mitigated by aggregating their results. RF also excels at handling high-dimensional data without requiring dimensionality reduction. Owing to its robustness and versatility, RF is one of the most commonly used ML methods.

Fig. 4

Schematic representation of RF model.

The XGB model is a typical ensemble ML method derived from the traditional GBDT methodology, integrating gradient-boosting techniques to achieve more precise final predictions59. The schematic of the XGB model is illustrated in Fig. 5. As shown in this figure, XGB minimizes the loss function by sequentially training DTs on the residuals between the predicted and measured values of the previous DT. Through multiple iterations, the accuracy of the XGB model is progressively enhanced, and the results of the individual DTs are weighted to form the final prediction. This sequential learning approach enables XGB to effectively model complex data relationships, enhancing both performance and generalizability.
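The three base learners with default hyperparameters can be sketched as below. The data are synthetic stand-ins, and scikit-learn's GradientBoostingRegressor is used as a stand-in for XGBoost so that the sketch needs no third-party xgboost installation; it illustrates the same sequential residual-fitting idea, not the paper's exact implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(96, 6))
# Toy nonlinear target depending on two of the six features plus noise.
y = 30.0 * X[:, 0] + 10.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.5, 96)

# Default-hyperparameter baselines, mirroring the single models in this study.
models = {
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(random_state=0),
    "GB": GradientBoostingRegressor(random_state=0),  # XGB stand-in
}
# Fit on the first 80 points, score (R^2) on the held-out 16.
r2 = {name: m.fit(X[:80], y[:80]).score(X[80:], y[80:]) for name, m in models.items()}
```

Even with default settings, the ensemble learners capture the nonlinear term far better than a single unconstrained tree, which motivates the hyperparameter tuning that follows.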

Fig. 5

Schematic representation of XGB model.

Optimization algorithm

The SSA is a novel meta-heuristic optimization algorithm inspired by the foraging and anti-predation behavior of sparrows40. It operates independently of gradient information, boasts excellent parallel processing capabilities, and converges rapidly. In this study, SSA was used to optimize the hyperparameters of the ML models.

The process of constructing the optimization model based on SSA is as follows:

  1. Initialize the sparrow population, the number of iterations, and the proportions of finders and followers. Calculate and sort the fitness of all sparrows.

  2. Update the positions of the finders. Finders (typically 10-20% of the population), responsible for scouting food sources, update their positions according to:

    $$X_{nd}^{g + 1} = \left\{ {\begin{array}{*{20}c} {X_{nd}^{g} \cdot \exp (\frac{ - n}{{\alpha \cdot g_{\max } }}), \, R_{2} < ST} \\ {X_{nd}^{g} + Q \cdot L, \, R_{2} \ge ST} \\ \end{array} } \right.$$
    (1)

where g and \(g_{\max }\) denote the present iteration number and the maximum number of iterations, respectively; \(X_{nd}^{g}\) represents the d-th dimension of the n-th sparrow at the present iteration; \(\alpha\) is a random number in (0, 1]; Q is a random number drawn from a normal distribution; L is a 1 × d matrix of ones; and \(R_{2}\) and ST denote the alarm value and the alarm threshold, respectively.

Update the positions of the followers. Followers (the remaining population) update their positions by moving toward the finders:

$$X_{nd}^{g + 1} = \left\{ {\begin{array}{*{20}c} {Q \cdot \exp (\frac{{X_{worst}^{g} - X_{nd}^{g} }}{{n^{2} }}), \, n > \frac{N}{2}} \\ {X_{p}^{g + 1} + \left| {X_{nd}^{g} - X_{p}^{g + 1} } \right| \cdot A^{ + } \cdot L, \, n \le \frac{N}{2}} \\ \end{array} } \right.$$
(2)

where \(X_{p}^{g + 1}\) denotes the optimal position occupied by the finder; \(X_{{{\text{worst}}}}^{g}\) indicates the global worst position; N is the population size; and \(A^{ + }\) is a matrix in which each element is randomly assigned a value of 1 or −1.

The formula for the sparrows that are randomly chosen to exhibit scouting and warning behaviors is:

$$X_{nd}^{g + 1} = \left\{ {\begin{array}{*{20}c} {X_{best}^{g} + \beta \left( {X_{nd}^{g} - X_{best}^{g} } \right), \, f_{n} \ne f_{g} } \\ {X_{nd}^{g} + P \cdot \left( {\frac{{X_{nd}^{g} - X_{best}^{g} }}{{\left| {f_{n} - f_{w} } \right| + e}}} \right), \, f_{n} = f_{g} } \\ \end{array} } \right.$$
(3)

where \(X_{best}^{g}\) signifies the global best position; β is a step-size control parameter; P is a random number in [−1, 1]; \(f_{n}\) refers to the present fitness value; \(f_{g}\) and \(f_{w}\) refer to the fitness values of the global best and worst positions, respectively; and e is a small constant that avoids division by zero.
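The position-update rules of Eqs. (1)-(3) can be sketched numerically. The snippet below is a simplified, library-free illustration, not the paper's exact implementation: the population size, alarm threshold, and the reduction of the \(A^{+}\) term to a random ±1 step are assumptions, and a sphere function serves as a toy objective.

```python
import numpy as np

def ssa_minimize(f, dim=2, pop=20, iters=50, lb=-5.0, ub=5.0, seed=0):
    """Minimal SSA sketch following the spirit of Eqs. (1)-(3)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (pop, dim))
    fit = np.apply_along_axis(f, 1, X)
    n_find = max(1, int(0.2 * pop))  # 20% finders
    ST = 0.8                         # alarm threshold
    best_x, best_f = X[fit.argmin()].copy(), float(fit.min())
    for g in range(iters):
        order = fit.argsort()
        X, fit = X[order], fit[order]
        worst = X[-1].copy()
        R2 = rng.random()            # alarm value, Eq. (1)
        for n in range(n_find):      # finder update
            if R2 < ST:
                X[n] = X[n] * np.exp(-n / (rng.random() * iters + 1e-12))
            else:
                X[n] = X[n] + rng.normal() * np.ones(dim)
        Xp = X[0]                    # best finder position
        for n in range(n_find, pop): # follower update, Eq. (2)
            if n > pop / 2:
                X[n] = rng.normal() * np.exp((worst - X[n]) / (n ** 2))
            else:
                X[n] = Xp + np.abs(X[n] - Xp) * rng.choice([-1.0, 1.0], dim)
        # Scouts (~10% of the population), Eq. (3)
        for n in rng.choice(pop, max(1, pop // 10), replace=False):
            if fit[n] > best_f:
                X[n] = best_x + rng.normal(size=dim) * (X[n] - best_x)
            else:
                X[n] = X[n] + rng.uniform(-1, 1) * (X[n] - worst) / (abs(fit[n] - fit[-1]) + 1e-12)
        X = np.clip(X, lb, ub)
        fit = np.apply_along_axis(f, 1, X)
        if fit.min() < best_f:
            best_f, best_x = float(fit.min()), X[fit.argmin()].copy()
    return best_x, best_f

xb, fb = ssa_minimize(lambda x: float(np.sum(x ** 2)))  # sphere function
```

In the hybrid models, the toy objective is replaced by the cross-validated MAE of a candidate hyperparameter configuration.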

Performance evaluation

To evaluate the performance of the hybrid model and other models, several statistical metrics were utilized, including mean squared error (MSE), R-squared (R2), mean absolute error (MAE), and mean relative error (MRE). In regression analysis, R2 quantifies the proportion of variance explained by the model, while metrics such as MAE, MSE, and MRE provide complementary measures of prediction error magnitude60. The formulas for these metrics are:

$$R^{2} = 1 - \frac{\sum\limits_{i = 1}^{n} (y_{i} - y_{ip})^{2}}{\sum\limits_{i = 1}^{n} (y_{i} - \overline{y})^{2}}$$
(4)
$$MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} (y_{i} - y_{ip})^{2}$$
(5)
$$MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| y_{i} - y_{ip} \right|$$
(6)
$$MRE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| \frac{y_{i} - y_{ip}}{y_{i}} \right|$$
(7)

where \(y_{i}\), \(y_{ip}\), and \(\overline{y}\) represent the tested, predicted, and mean values of the UCS of MSLSS, respectively. In the present work, a comprehensive ranking method was utilized for model performance evaluation61,62.

$$Score = \sum\limits_{i = 1}^{m} {Rank_{i} }$$
(8)

where Score represents the overall performance of the model, m represents the number of performance indicators, and \(Rank_{i}\) is the ranking of the model under the i-th performance indicator.
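Equations (4)-(8) can be implemented directly. The snippet below computes the four metrics and the rank-sum score for two hypothetical models on invented data; the convention that a higher rank (and hence score) indicates better performance follows Eq. (8).

```python
import numpy as np

def metrics(y, yp):
    """R2, MSE, MAE, MRE as in Eqs. (4)-(7)."""
    y, yp = np.asarray(y, float), np.asarray(yp, float)
    ss_res = np.sum((y - yp) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return {
        "R2": 1 - ss_res / ss_tot,
        "MSE": np.mean((y - yp) ** 2),
        "MAE": np.mean(np.abs(y - yp)),
        "MRE": np.mean(np.abs((y - yp) / y)),
    }

def score_rank(model_metrics):
    """Eq. (8): sum each model's per-indicator rank; higher score = better."""
    names = list(model_metrics)
    scores = dict.fromkeys(names, 0)
    for ind in ["R2", "MSE", "MAE", "MRE"]:
        # For R2 larger is better; for the error metrics smaller is better,
        # so rank ascending on R2 and descending on the errors.
        key = (lambda n: model_metrics[n][ind]) if ind == "R2" \
            else (lambda n: -model_metrics[n][ind])
        for rank, name in enumerate(sorted(names, key=key), start=1):
            scores[name] += rank
    return scores

y_true = [10.0, 20.0, 30.0, 40.0]               # invented observations
good = metrics(y_true, [11.0, 19.0, 31.0, 39.0])  # accurate model
bad = metrics(y_true, [18.0, 14.0, 38.0, 30.0])   # inaccurate model
ranks = score_rank({"good": good, "bad": bad})
```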

Results and discussion

Three hybrid ML models and their single counterparts were utilized to capture the intricate, nonlinear correlations between the UCS of MSLSS and its significant influencing factors (i.e., L, MS, CD, CC, OMC, MDD). To assess the efficacy of the hybrid models, the predictions of the hybrid models (XGB-SSA, DT-SSA, RF-SSA), the models without hyperparameter optimization (XGB, DT, RF), and conventional empirical models were thoroughly compared and evaluated. Two empirical formulas (EF) were also evaluated. Similar to ML model training, 80% of the experimental data were used as the training set for regression analysis, yielding the following formulas:

Empirical formula (I):

$$UCS = - 52.56 + 3.07L + 0.19MS - 2.17CC + 0.20CD + 4.29OMC + 4.38MDD$$
(9)

Empirical formula (II):

$$\begin{aligned} UCS = & 1897.43 + 331.80L - 39.74MS - 0.03CD - 212.13OMC - 991.32MDD \\ & + 0.13L \cdot CD - 4.28L \cdot OMC - 138.05L \cdot MDD + 21.28MS \cdot MDD + 109.63OMC \cdot MDD \\ \end{aligned}$$
(10)
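An EF(I)-style linear formula is obtained by ordinary least squares over the training portion of the data. The sketch below fits such a model on synthetic stand-in data; the feature ranges and the generating coefficients are invented for illustration and are not those of Eq. (9).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 77  # roughly 80% of the 96 points, as used for the regression

# Toy stand-ins for the six inputs (L, MS, CC, CD, OMC, MDD) and UCS;
# the true generating coefficients below are illustrative only.
X = rng.uniform(0.0, 10.0, size=(n, 6))
y = 5.0 + 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.normal(0.0, 0.1, n)

# EF(I)-style model: UCS = b0 + b1*L + ... + b6*MDD, fitted by least squares.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Because such a formula is linear in its inputs, it cannot capture interaction or saturation effects, which is the limitation the ML models are meant to overcome.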

Construction of hybrid models

ML models are often initially trained using default hyperparameter values, which frequently lead to suboptimal performance. To enhance results, SSA was employed to iteratively update the models by minimizing the mean MAE from fivefold cross-validation, thereby searching for optimal hyperparameters. The iteration curves for the SSA-hybridized ML models are shown in Fig. 6. Table 2 presents the optimal parameters for each hybrid ML model (XGB-SSA, DT-SSA, RF-SSA) after 250 iterations. It is observed that different ML models exhibit distinct optimization speeds and fitness values, indicating variations in SSA effectiveness and overall hybrid model performance. The XGB-SSA model achieved the lowest fitness value, demonstrating the best validation set performance. Over iterations, the fitness value decreased from 1.06 to 1.04 for XGB-SSA, from 2.97 to 1.29 for RF-SSA, and from 3.83 to 3.57 for DT-SSA. Consequently, the validation set performance ranking (strongest to weakest) is: XGB-SSA > RF-SSA > DT-SSA. Although XGB achieved the best overall performance, it showed the smallest improvement when integrated with SSA. In contrast, the RF model exhibited the most substantial performance enhancement through SSA optimization.
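The fitness function minimized by SSA, the mean MAE from fivefold cross-validation, can be sketched as below. The data are synthetic, RF stands in for all three base learners, and the two hyperparameters shown are illustrative; the actual search spaces are those of Table 2.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(96, 6))       # toy stand-in inputs
y = 20.0 * X[:, 0] + 5.0 * X[:, 1] + rng.normal(0.0, 0.2, 96)  # toy target

def fitness(n_estimators, max_depth):
    """SSA objective: mean MAE across fivefold cross-validation."""
    model = RandomForestRegressor(n_estimators=int(n_estimators),
                                  max_depth=int(max_depth), random_state=0)
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    mae = -cross_val_score(model, X, y, cv=cv,
                           scoring="neg_mean_absolute_error")
    return float(mae.mean())

f_small = fitness(10, 3)    # a weak candidate configuration
f_large = fitness(100, 8)   # a stronger candidate configuration
```

SSA proposes candidate configurations, evaluates each with this fitness, and keeps the configuration with the lowest mean MAE.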

Fig. 6

Identifying the optimum parameters.

Table 2 Optimum values of the parameters of algorithms.

Indicators of model performance

To develop high-performance prediction models, both ML models and empirical models were trained and evaluated using four performance indicators: R2, MAE, MSE, and MRE. Their predictive capabilities were compared across both training and testing datasets. The results are visually summarized in a radar chart (Fig. 7) and detailed numerically in Table 3. A model is considered to perform better in predicting the unconfined compressive strength (UCS) of MSLSS as the R2 value approaches 1.0, and the MAE, MSE, and MRE values approach zero.

Fig. 7

Radar chart of evaluation indexes for models: (a) R2; (b) MAE; (c) MSE; (d) MRE.

Table 3 Performance indices of eight models.

As illustrated, all models except EF(I) and EF(II) demonstrate strong performance on the training set, with R2 values exceeding 0.90. The inferior performance of the empirical formulas arises from their limited capacity to capture the nonlinear relationships between input variables and the target UCS. Ranked in ascending order of training set performance, the models are: EF(II), RF, DT-SSA, DT, RF-SSA, XGB, and XGB-SSA. The hybrid XGB-SSA model consistently outperformed all others across every evaluation metric and dataset. On the training set, XGB-SSA achieved the lowest MAE (0.491), MSE (0.557), and MRE (0.167), alongside the highest R2 (0.998). This superior performance extended to the testing set, where XGB-SSA again yielded the best results: MAE (1.358), MSE (3.846), R2 (0.982), and MRE (1.046). The standard XGB model ranked second overall, exhibiting strong and competitive results close to those of XGB-SSA, particularly on the testing set (MAE = 1.581, MSE = 5.494, R2 = 0.974).

SSA optimization significantly enhanced the predictive performance of the base ML models (XGB, RF, DT). The improvement was most pronounced for RF-SSA, which significantly reduced the testing MSE from 40.819 to 8.876 and increased R2 from 0.813 to 0.959. Moreover, the SSA-optimized models exhibited improved generalization ability, showing less overfitting compared to their non-optimized counterparts. For example, the RF-SSA model maintained robust performance on the testing set (R2 = 0.959), despite an increase in MAE from training to testing. In contrast, the base RF model suffered from substantial overfitting, with a notable decline in R2 from 0.903 on the training set to 0.813 on the testing set. These results demonstrate that SSA optimization not only boosts predictive accuracy but also effectively mitigates overfitting. Conversely, the empirical models fundamentally underfit, failing to capture the underlying complexities of the dataset in both training and testing phases.

Prediction performance and comparative analysis

A score ranking method was employed to quantitatively evaluate the prediction capabilities of the ML models. As illustrated in Fig. 8, models were compared based on this scoring approach, with higher scores indicating better predictive performance. The overall ranking of model performance is as follows: XGB-SSA > XGB > RF-SSA > DT-SSA > DT > RF > EF(II) > EF(I).

Fig. 8

Score ranking of models.

The XGB-SSA model consistently achieved the best performance across all datasets, while the standard XGB model followed closely, ranking second. This comparable performance can be attributed to XGB’s inherent ensemble learning mechanism and advanced optimization algorithms, which reduce the necessity for additional hyperparameter tuning via SSA. Therefore, depending on practical constraints, the direct application of XGB may serve as an efficient and competitive alternative. In contrast, the RF model demonstrated relatively inferior performance, likely due to overfitting. However, its hybrid version, RF-SSA, ranked third overall, indicating that SSA optimization significantly enhances RF’s prediction capability by improving hyperparameter selection. This observation aligns with the empirical findings of Wang & Zhao63, who reported that metaheuristic optimization via SSA led to the most substantial improvement in RF model efficacy compared to conventional tuning methods. It is also noteworthy that all ML models consistently outperformed the empirical models, underscoring the strength of ML approaches in capturing nonlinear and complex relationships within data.

Scatter plots in Fig. 9 compare the experimental UCS of MSLSS against predictions from all eight models. Five reference lines indicate error margins of 0%, ± 10%, and ± 20%. Predictions from the EF(I) model show considerable volatility, with approximately 78% of the training data points lying outside the ± 20% error bounds. Although EF(II) and the RF model exhibit tighter clustering around the ideal line (y = x) on the training set, their testing errors increase significantly, reflecting poor generalization. In comparison, the XGB and hybrid ML models (especially XGB-SSA) achieve higher accuracy with lower error rates. During training, about 85% of the data points generated by XGB-SSA and XGB fall within the ± 20% error range. More importantly, during testing, the majority of predictions lie close to the ideal 45° line, with the remainder confined within the ± 20% error boundaries. These results emphasize the superior generalization ability of the hybrid XGB-SSA model over both standalone ML and empirical models.

Fig. 9

Scatter plots exhibition: (a) XGB-SSA; (b) RF-SSA; (c) DT-SSA; (d) XGB; (e) RF; (f) DT; (g) EF(I); (h) EF(II).

The Taylor diagram64 offers a concise and integrated visualization of model accuracy by summarizing performance statistics (R, RMSE, standard deviations) through the spatial positioning of model markers. In such diagrams, the proximity of a point to the reference point reflects the model’s predictive idealness. As illustrated in Fig. 10, which is based on error analyses from Table 3 for both training and testing datasets, the XGB-SSA model is positioned markedly closer to the reference point than all other models. It is closely followed by XGB, RF-SSA, and DT-SSA, thereby robustly confirming its superior predictive performance among the evaluated approaches.

Fig. 10

Evaluation of model performance by Taylor diagram: (a) training set; (b) testing set.

Figure 11 further displays the residual error distributions of all models during the testing phase. The RF and empirical models demonstrated notably poor predictive accuracy, with EF(I) exhibiting the widest residual range. The residual spans of EF(II) and the RF model were only exceeded by EF(I), underscoring their substantial instability in predicting UCS. By contrast, the residuals of the XGB-SSA and XGB models were tightly clustered near zero. The relative frequencies of residuals within the interval [− 2, 2] reached 0.63 for XGB-SSA and 0.47 for XGB, indicating significantly higher stability and reliability compared to other models.

Fig. 11

Residual error distribution of models: (a) curve plots; (b) violin plots.

Feature importance

Although the XGB-SSA model demonstrates superior predictive performance for UCS estimation of MSLSS, the inherent complexity of XGB’s multi-layered decision tree architecture limits the interpretability of feature-outcome relationships. To gain deeper insight into global feature importance and the specific contributions of features to UCS predictions, Shapley Additive Explanations (SHAP) analysis was employed. For each specimen, SHAP values quantify the contribution of each input feature relative to a baseline average prediction.
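The additive-attribution idea behind SHAP can be illustrated without the shap library by computing exact Shapley values for a tiny linear model, where "missing" features are replaced by their baseline (mean) values. The weights, baseline, and specimen below are invented for demonstration, not fitted to the MSLSS data.

```python
import numpy as np
from itertools import combinations
from math import factorial

w = np.array([2.0, -1.0, 0.5])        # illustrative model weights
baseline = np.array([1.0, 1.0, 1.0])  # baseline (average) feature values
x = np.array([3.0, 0.0, 5.0])         # the specimen being explained

def f(z):
    return float(w @ z)               # the model: f(z) = w . z

def shapley(i, x, baseline):
    """Exact Shapley value of feature i via enumeration of all coalitions."""
    d = len(x)
    others = [j for j in range(d) if j != i]
    phi = 0.0
    for k in range(d):
        for S in combinations(others, k):
            z_without = baseline.copy()
            z_without[list(S)] = x[list(S)]   # coalition S present, i absent
            z_with = z_without.copy()
            z_with[i] = x[i]                  # coalition S plus feature i
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            phi += weight * (f(z_with) - f(z_without))
    return phi

phi = np.array([shapley(i, x, baseline) for i in range(3)])
# For a linear model, phi_i = w_i * (x_i - baseline_i), and the values
# sum to f(x) - f(baseline): the attributions are additive.
```

Libraries such as shap apply the same idea efficiently to tree ensembles like XGB; the mean absolute Shapley value over all specimens gives the global importance ranking reported below.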

Figure 12a and b present the SHAP-based feature importance analysis and the summary plot, respectively, for the six input variables used in predicting the UCS of MSLSS. Feature importance is represented by the mean absolute SHAP value, with higher values indicating greater predictive influence. As illustrated in Fig. 12a, a clear feature hierarchy emerges: OMC, MS, and CD are the dominant parameters, with SHAP contributions of 0.196, 0.095, and 0.055, respectively. The variable L exhibits a secondary yet statistically significant impact (0.021), while the influences of CC and MDD are minimal. The SHAP analysis identified OMC as the most influential parameter, which aligns with geotechnical principles whereby moisture content critically affects compaction efficiency and strength development in stabilized soils. This is attributable to the close correlation between OMC and lime content (PCC ≈ 0.91), reflecting the role of lime in altering the soil’s water-retention properties and facilitating pozzolanic reactions with microsilica.

Fig. 12

SHAP values of different feature values: (a) feature importance; (b) summary plot.

The SHAP summary plot in Fig. 12b visualizes the distribution of feature effects across individual specimens. Features are categorized along the vertical axis, and the horizontal alignment corresponds to the magnitude of SHAP values. A color gradient reflects feature values, with red indicating high values and blue denoting low values.

Figure 12b clearly indicates a positive correlation between OMC and UCS. Pearson correlation analysis (Fig. 3) further reveals that OMC exhibits a strong positive correlation with L (PCC ≈ 0.91) and a weaker positive correlation with MS (PCC ≈ 0.32). This increase in OMC, typically associated with the addition of lime and microsilica, enhances hygroscopicity through pozzolanic reactions and modifies the pore structure65. These mechanisms collectively contribute to the improvement of UCS. In contrast, the relationship between MS and UCS exhibits a parabolic trend, peaking near the median value. Initial strength gains are attributed to high reactivity and pore-filling effects of microsilica. However, beyond an optimal threshold, excessive MS content leads to over-densification and increased viscosity, which hinders uniform binder distribution and may induce internal stresses and micro-cracking due to uncontrolled hydration, ultimately reducing soil strength66.

Conclusions

In this study, machine learning (ML) models integrated with the intelligent Sparrow Search Algorithm (SSA) were employed to predict the strength of MSLSS. Three hybrid models (XGB-SSA, RF-SSA, DT-SSA) were developed by coupling SSA with the XGB, RF, and DT algorithms, respectively. Their performance was evaluated and compared using four metrics (R2, MAE, MSE and MRE) to identify the most effective predictive model. Finally, the SHAP method was applied to interpret the influence of input variables. The superiority of the proposed models was established through comparisons with both standalone ML models and empirical formulas. The main conclusions are summarized as follows:

  1. (1)

    Integrating SSA with base ML models markedly improved their predictive accuracy for the UCS of MSLSS. The hybrid models (XGB-SSA, RF-SSA, DT-SSA) consistently outperformed their non-optimized counterparts (XGB, RF, DT) as well as conventional empirical models (EF(I) and EF(II)) across all evaluation metrics (R2, MAE, MSE, MRE) on both training and testing sets. This underscores the vital role of intelligent hyperparameter optimization in enhancing the accuracy and generalization ability of ML models for complex geotechnical engineering problems.

  2. (2)

    Among all evaluated models, the XGB-SSA hybrid model achieved the highest predictive accuracy and robustness. It delivered exceptional performance on the testing set (R2 = 0.982, MAE = 1.358, MSE = 3.846, MRE = 1.046), demonstrating a strong capability to capture complex nonlinear relationships. The standard XGB model also exhibited high performance, ranking second overall. In contrast, conventional empirical regression models displayed fundamental limitations due to underfitting, resulting in the poorest predictive outcomes.

  3. (3)

    Based on the SHAP analysis of the optimal XGB-SSA model, OMC, MS, and CD are identified as the most influential parameters affecting the prediction of UCS. Among these, OMC exhibits the strongest positive contribution, which can be attributed to its association with increased lime and microsilica content. These components enhance pozzolanic reactivity and improve microstructural density, thereby strengthening the material. In contrast, the relationship between MS content and UCS is nonlinear: while moderate amounts of MS improve strength through pore-filling and heightened reactivity, exceeding an optimal threshold leads to excessive densification and disrupted binder distribution, ultimately reducing overall strength.

In summary, the developed hybrid predictive model offers significant potential for saving resources and time by enabling preliminary strength assessments, thereby reducing the reliance on extensive experimental work while maintaining scientific rigor. However, the reliance on a limited dataset from specific regions may constrain generalizability across diverse soil types and environmental conditions. Therefore, it should be used as a complementary tool alongside laboratory testing for critical geotechnical designs, particularly in preliminary stages and mix optimization processes. Furthermore, the current model does not fully incorporate long-term durability factors such as wet-dry cycles and sulfate attack resistance. Future research should focus on expanding the database to include more diverse soil conditions and chemical compositions.