Curtain grouting volume prediction using a Bayesian-optimized stacking ensemble model with SHAP analysis

Ma, Yahui; Yuan, Zhanquan; Xiong, Bo; Lei, Hongwei; Zhao, Weiquan

doi:10.1038/s41598-026-45538-6

Download PDF

Article
Open access
Published: 01 April 2026

Curtain grouting volume prediction using a Bayesian-optimized stacking ensemble model with SHAP analysis

Yahui Ma¹,
Zhanquan Yuan²,
Bo Xiong³,
Hongwei Lei¹ &
…
Weiquan Zhao¹

Scientific Reports volume 16, Article number: 15374 (2026) Cite this article

931 Accesses
Metrics details

Subjects

Abstract

Curtain grouting is widely used to control seepage in large-scale water conservancy and hydropower projects, and accurate prediction of grouting volume is essential for ensuring construction quality and cost control. This study proposes a grouting volume prediction model combining Bayesian optimization (BO) with a stacking ensemble learning framework. The model was developed using a dataset of 778 valid grouting records and incorporated seven key input features: hole sequence, hole depth, section length, hole diameter, pre-grouting permeability, initial water-cement ratio, and grouting pressure. Within this framework, XGBoost, LightGBM, and Random Forest were employed as base learners, with BO applied for global hyperparameter optimization. Ridge regression served as the meta-learner to construct the Bayesian-optimized stacking ensemble (BO-Stacking) model. SHapley Additive exPlanations (SHAP) analysis was used to quantify feature contributions and enhance model interpretability. The results show that the BO-Stacking model outperformed the benchmark models, achieving a coefficient of determination (R²) of 0.92, a mean absolute error (MAE) of 70.19 L, and a root mean square error (RMSE) of 187.07 L. Scatter analysis further indicated strong agreement between predicted and measured values. SHAP analysis quantified the relative contributions of geological conditions, construction parameters, and slurry properties to grouting volume. Overall, the proposed approach improves predictive performance of grouting volume under complex geological conditions and provides support for construction planning and quality management.

Estimating seepage in heterogeneous earthfill dams on permeable foundations using explainable machine learning

Article Open access 10 April 2026

Hybrid machine learning for SCC strength prediction using metaheuristic optimization

Article Open access 20 May 2026

Enhancing wellbore stability through machine learning for sustainable hydrocarbon exploitation

Article Open access 09 October 2025

Introduction

Grouting is a critical seepage-control technique widely used in major infrastructure projects, including hydraulic and hydropower engineering, transportation tunnels, and mining engineering¹. Its primary objective is to form continuous and dense seepage curtains or reinforcement zones within rock and soil masses, thereby blocking groundwater flow paths and improving foundation bearing capacity². In practice, grouting is a concealed process characterized by substantial uncertainty. Its effectiveness is governed by the coupled effects of geological conditions, grout rheology, and construction parameters³. Complex subsurface conditions further increase the uncertainty of grouting performance and the difficulty of construction control^4,5. Under these conditions, accurate prediction of grouting volume has become a key scientific and engineering challenge for design optimization, refined construction control, and quality assessment. Recent studies in geotechnical and underground engineering have focused on slope stability, rock fracture evolution, and anchorage mechanisms under complex geological conditions^6,7,8,9. However, quantitative prediction of grouting volume in curtain grouting remains insufficiently explored relative to these well-studied topics,

Early approaches to grouting volume prediction primarily relied on empirical relationships, engineering analogy, and numerical simulation. Empirical methods estimate grouting volume by combining simplified theoretical assumptions with historical experience and project-specific parameters¹⁰, whereas analogy-based methods extrapolate predictions from comparable projects¹¹. Numerical simulation approaches attempt to reproduce grout migration behavior using mechanistic models under prescribed boundary conditions¹². Although these methods have contributed to engineering practice, they are often limited in predictive accuracy and transferability under complex field conditions.

Unlike conventional model-driven approaches that rely on explicit governing equations, machine learning methods learn relationships directly from historical data, thereby capturing complex nonlinear patterns and interactions among high-dimensional variables^{13,14,15,16,17,18}. With advances in artificial intelligence, big data technology, and engineering monitoring systems¹⁹, machine learning has emerged as a promising data-driven approach for grouting volume prediction^{20,21,22,23,24}. However, the predictive performance of a single model is typically constrained by algorithmic bias and sensitivity to data heterogeneity. In practice, grouting datasets often vary across sites, strata, and construction stages, and measurement noise and response imbalance are common. Consequently, single models may exhibit unstable performance and limited generalizability when applied to conditions beyond those represented in the training data.

To address these limitations, ensemble learning has attracted increasing attention as an effective strategy for improving predictive accuracy and stability in engineering applications^25,26,27. Representative ensemble paradigms include bagging, boosting, and stacking²⁸. Bagging reduces variance primarily through bootstrap sampling and parallel training, whereas boosting enhances accuracy by iteratively correcting prediction errors and thereby reducing bias. In contrast, stacking improves predictive performance by integrating predictions from multiple base learners. Its core mechanism involves leveraging a meta-learner to combine the outputs of base learners, thereby integrating complementary model strengths and improving overall predictive accuracy and generalization²⁹.

In constructing the stacking model, base learners were selected according to the principles of both accuracy and diversity. Specifically, XGBoost³⁰ achieves strong predictive accuracy and generalization through regularization and precise loss approximation, whereas LightGBM³¹ demonstrates high training efficiency and low memory consumption in large-scale and high-dimensional settings. Random Forest (RF)³², as a representing bagging-based algorithm, is highly robust to noise and effective in reducing variance. These three algorithms process data through distinct mechanisms, thereby meeting the fundamental requirements for base learners. Because base-learner predictions often exhibit high correlation, which may lead to multicollinearity, ridge regression was selected as the meta-learner. By introducing an L2 regularization penalty, ridge regression suppresses coefficient instability and enhances the stability of the final prediction³³. Accordingly, this study used XGBoost, LightGBM, and RF as base learners, with ridge regression as the meta-learner to construct the stacking ensemble model for grouting volume prediction.

Because ensemble models involve numerous hyperparameters and their performance is highly sensitive to parameter configuration, empirical tuning is often insufficient to identify optimal solutions³⁴. Although grid search (GS) and random search (RS)³⁵ are commonly used for hyperparameter tuning, both methods show clear limitations when applied to high-dimensional parameter spaces. In contrast, Bayesian optimization (BO), as an efficient global optimization strategy, employs surrogate models to approximate the objective function based on historical evaluations, thereby guiding the search process more precisely and converging to the global optimum³⁶. Compared with metaheuristic algorithms such as particle swarm optimization (PSO)³⁷ and grey wolf optimizer (GWO)³⁸, BO offers advantages in search efficiency while also demonstrating greater capacity to handle noise and uncertainty. Therefore, this study employed BO for hyperparameter optimization to improve model predictive performance under a limited computational budget.

Although ensemble learning methods can improve predictive accuracy, their complex internal structures often reduce model interpretability³⁹. To address this limitation, explainable artificial intelligence (XAI) has emerged as an important research direction in intelligent engineering. Among the available techniques, SHapley Additive exPlanations (SHAP), grounded in cooperative game theory, provides an effective means of quantifying feature contributions to model predictions, owing to its sound theoretical foundation and consistent explanatory framework⁴⁰. SHAP quantify the marginal contribution of each input feature to the model output and provide both global feature-importance rankings and local prediction explanations, thereby helping to interpret the nonlinear and interactive patterns learned by the model. Although some studies have applied SHAP in civil engineering^41,42,43, research on model interpretability for curtain grouting volume prediction remains limited.

To address the challenges outlined above, this study develops a curtain grouting volume prediction model based on a Bayesian-optimized stacking ensemble (BO-Stacking) model with SHAP-based interpretability analysis. The model was validated using field data collected from a large-scale water conservancy project in Xinjiang via an intelligent grouting monitoring system. The overall workflow is illustrated in Fig. 1 and consists of three main steps: feature selection and data acquisition, construction of the BO-Stacking model, and model evaluation with SHAP analysis.

Research framework

Using engineering data from a large-scale water conservancy project in Xinjiang, this study constructed and validated a BO-Stacking model for curtain grouting volume prediction. The technical framework is illustrated in Fig. 1, and the specific implementation steps are detailed as follows:

(1) Feature selection and data acquisition. The primary factors influencing grouting volume include geological attributes, construction parameters, and slurry properties²⁵. Based on grouting mechanisms and engineering experience, seven key parameters were selected as input features: hole sequence, hole depth, section length, hole diameter, pre-grouting permeability (Lugeon), initial water–cement ratio, and grouting pressure. Grouting volume served as the target variable. The data were obtained from real-time monitoring records of the project’s intelligent grouting system. The dataset covered the period from April to July 2025 and comprised 778 valid grouting records across five curtain grouting units. These five units encompassed distinct construction zones and local geological conditions within the project, thereby improving the diversity of the dataset. Table 1 presents the descriptive statistics of the feature parameters in the raw dataset. Following preprocessing, the dataset was divided into training and test sets at an 8:2 ratio for model construction and performance evaluation.

(2) Construction of the BO-Stacking model. XGBoost, LightGBM, and RF were used as base learners. To mitigate the risk of overfitting, the base learners were trained using five–fold cross-validation. Ridge regression was served as the meta-learner to integrate base-learner predictions. The out-of-fold predictions generated during five-fold cross-validation were used as inputs to train the meta-learner, thereby preventing data leakage in the stacking procedure. The BO algorithm was applied to optimize hyperparameters of all base learners.

(3) Model evaluation with SHAP analysis. To evaluate the predictive performance of the proposed BO-Stacking model, its predictive results on the test set were compared against those of the benchmark models. The coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE) were selected as evaluation metrics to quantify predictive performance across multiple dimensions. Furthermore, SHAP analysis was incorporated to enhance model interpretability by quantifying the marginal contributions of each input features to grouting volume.

Table 1 Descriptive statistics of parameters.

Full size table

Methodology

Stacking ensemble

Stacking is an ensemble learning strategy designed to enhance model generalizability. Its core mechanism is to construct a multilevel prediction structure that leverages the complementary strengths of different base models in data representation and pattern recognition⁴⁴. In the first layer, multiple base learners are trained independently and generate predictions on the training data. In the second layer, a meta-learner is trained to combine the predictions of the base learners through learned weighting or nonlinear mapping. By compensating for the biases of the base learners, the meta-learner produces a more accurate and robust final prediction. The overall architecture of the stacking ensemble is illustrated in Fig. 2.

Base learners

XGBoost

XGBoost is an ensemble learning framework widely applied in various predictive modeling tasks. Its core mechanism is based on an additive model and a forward stagewise optimization strategy, in which decision trees are iteratively constructed to correct the residuals of the preceding iteration³⁰. Unlike conventional gradient boosting methods, the objective function of XGBoost considers not only training loss term but also a regularization term to balance predictive accuracy and model complexity. The regularization term penalizes model complexity, thereby mitigating overfitting and improving generalization. The objective function is given by Eqs. (1) and (2).

$$\:Obj={\sum\:}_{i=1}^{n}l({y}_{i},{\widehat{y}}_{i})+{\sum\:}_{k=1}^{K}{\Omega\:}\left({f}_{k}\right)$$

(1)

$$\:{\Omega\:}\left({f}_{k}\right)=\gamma\:T+\frac{1}{2}\lambda\:{\sum\:}_{j=1}^{T}({\omega\:}_{j}{)}^{2}$$

(2)

where Obj is the objective function; l is the loss function measuring the discrepancy between the predicted $\:{\widehat{y}}_{i}$ and actual values $\:{y}_{i}$; $\:{f}_{k}$ is the k-th decision tree; K is the total number of decision trees; $\:{\Omega\:}\left({f}_{k}\right)$ is the regularization term of the k-th tree; $\:\gamma\:$ and $\:\lambda\:$ are regularization hyperparameters; T is the total number of leaf nodes; and $\:{\omega\:}_{j}$ is the weight of the j-th leaf node.

LightGBM

LightGBM is a distributed ensemble learning framework designed for handling large-scale data and high-dimensional features. Its core mechanism lies in the adoption of two algorithmic techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB)³¹. The overall prediction of LightGBM is obtained by accumulating the outputs of all constituent trees, as given by Eq. (3).

$$\:{F}_{m}\left(x\right)={F}_{m-1}\left(x\right)+{\rho\:}_{m}{h}_{m}\left(x\right)$$

(3)

where $\:{F}_{m}\left(x\right)$ is the final prediction; $\:{F}_{m-1}\left(x\right)$ is the cumulative prediction of the previous m − 1 trees; $\:{h}_{m}\left(x\right)$ is the prediction result of the m-th tree for input x; and $\:{\rho\:}_{m}$ is the learning rate or weight assigned of the m-th tree.

Random forest

Random Forest is a parallel ensemble learning algorithm based on the bagging methodology³². It constructs and aggregates multiple independent decision trees to achieve robust predictive performance. First, bootstrap sampling with replacement is applied to the original training set to generate independent training subsets for each decision tree. Second, at each node during tree construction, a random subset of features is selected as candidates, and the optimal split feature is identified from this subset. Finally, for regression tasks, the final prediction is obtained by averaging the outputs of all decision trees. The model formulation for regression is given by Eq. (4).

$$\:F\left(x\right)=\frac{1}{n}{\sum\:}_{i=1}^{n}{h}_{i}\left(x\right)\:$$

(4)

where n is the number of decision trees; $\:{h}_{i}\left(x\right)$ is the prediction of the i-th tree for input x; and $\:F\left(x\right)$ is the final prediction of the RF model.

Since XGBoost, LightGBM, and RF are all tree-based ensemble algorithms that partition the feature space based on rank ordering rather than absolute feature magnitudes, no feature normalization was required prior to model training⁴⁵. These models are inherently invariant to feature scaling and monotonic transformations.

Meta-learner

As a regularized least squares method designed to address multicollinearity, ridge regression offers notable robustness when employed as a meta-learner. Its core mechanism involves introducing an L2 regularization term into the ordinary least squares loss function to constrain the magnitude of model coefficients, thereby achieving a controlled trade-off between bias and variance. Rather than minimizing the residual sum of squares (RSS) alone, the objective function simultaneously penalizes the magnitude of the regression coefficients, thereby incorporating structural risk into the optimization. The resulting loss function, comprising the RSS and the L2 regularization term, is given by Eq. (5).

$$\:L\left(\theta\:\right)={\sum\:}_{i=1}^{n}({y}_{i}-{\sum\:}_{j=1}^{m}{\theta\:}_{j}{x}_{ij}{)}^{2}+\lambda\:{\sum\:}_{j=1}^{m}{\theta\:}_{j}^{2}$$

(5)

where n is the number of samples; m is the number of features; $\:{y}_{i}$ is the target value of the i-th sample; $\:{x}_{ij}$ is the j-th feature value of the i-th sample; $\:{\theta\:}_{j}$ is the weight of the j-th feature; and $\:\lambda\:$ is the regularization parameter controlling the strength of the penalty.

Because ridge regression is sensitive to feature scale, the meta-features were standardized using Z-score normalization prior to meta-learner training. The scaler was fitted exclusively on the training meta-features and subsequently was applied to the validation and test meta-features to prevent data leakage.

Bayesian optimization

BO is a sequential global optimization method based on probabilistic surrogate models. It is particularly suited for black-box optimization problems in which the objective function is computationally expensive, non-differentiable, or stochastic. Its core mechanism involves employing a Gaussian process (GP) to construct a probabilistic surrogate model of the objective function. By optimizing an acquisition function, the algorithm balances exploration and exploitation, thereby converging toward the global optimum within a limited evaluation budget. The overall optimization workflow is illustrated in Fig. 3.

The BO algorithm operates as a closed-loop sequential optimization procedure. In the initialization phase, methods such as Latin hypercube sampling are used to generate an initial sample set, which is subsequently used to train the GP surrogate model and establish a prior approximation of the objective function. The algorithm then enters the iterative phase. In each iteration, the next candidate point is selected by maximizing the acquisition function based on the posterior distribution of the current GP model. The objective function is then evaluated at this candidate point, and the resulting observation is added to the dataset. The GP surrogate model is subsequently refitted using the updated dataset to refine the posterior approximation of the objective function. This procedure is repeated until a predefined convergence criterion or computational budget is reached, at which point the optimal hyperparameter configuration is returned.

SHAP

SHAP is a feature attribution method based on cooperative game theory, designed to provide consistent and interpretable explanations for the predictions of complex models. Its theoretical foundation derives from the Shapley value concept in cooperative game theory, which treats model prediction as a cooperative game⁴⁶, which all features contribute jointly to the output. The contribution of each feature is quantified as the weighted average of its marginal contributions across all possible feature coalition orderings.

The Shapley attribution value of feature i for sample x quantifies its marginal contribution to the model output. Based on this attribution mechanism, SHAP decomposes the model output into a base value and the sum of individual feature contributions, thereby forming an additive explanation model. The mathematical formulations are given by Eqs. (6) and (7).

$$\:{\varphi\:}_{i}\left(x\right)={\sum\:}_{S\subseteq\:N\backslash\:\left\{i\right\}}\frac{\left|S\right|!\left(M-\left|S\right|-1\right)!}{M!}\left[{f}_{S\cup\:\left\{i\right\}}\left(x\right)-{f}_{S}\left(x\right)\right]$$

(6)

$$\:f\left(x\right)={\varphi\:}_{0}+{\sum\:}_{i=1}^{M}{\varphi\:}_{i}$$

(7)

where N is the set of all features; S is a subset excluding feature i; |S| is the cardinality of subset S; M is the total number of features; $\:{f}_{S}\left(x\right)$ is the model prediction based on feature subset S; $\:{f}_{S\cup\:\left\{i\right\}}\left(x\right)$ is the model prediction after adding feature i to subset S; $\:f\left(x\right)$ is the model prediction for sample x; and $\:{\varphi\:}_{0}$ is the base value.

Performance evaluation

To evaluate the predictive performance of the proposed model, the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE) were selected as evaluation metrics^47,48. R² measures the extent to which the model explains the total variance in the observed data; values closer to 1 indicates better model fit. MAE quantifies the average absolute deviation between predicted and observed values. RMSE reflects the root mean square deviation and assigns greater weight to larger errors. Smaller values of both MAE and RMSE indicate lower prediction error and higher predictive accuracy. The corresponding calculation formulas are given by Eqs. (8)– (10).

$$\:{R}^{2}=1-\frac{{\sum\:}_{i=1}^{n}({\widehat{y}}_{i}-{y}_{i}{)}^{2}}{{\sum\:}_{i=1}^{n}(\overline{y}-{y}_{i}{)}^{2}}$$

(8)

$$\:RMSE=\sqrt{\frac{1}{n}{\sum\:}_{i=1}^{n}({y}_{i}-{\widehat{y}}_{i}{)}^{2}}$$

(9)

$$\:MAE=\frac{1}{n}{\sum\:}_{i=1}^{n}|{y}_{i}-{\widehat{y}}_{i}|$$

(10)

where n is the number of samples; $\:{\widehat{y}}_{i}\:$and$\:\:{y}_{i}\:$are the predicted and observed values of the i-th sample, respectively; and$\overline{y}$is the mean of the observed values.

Results

Feature correlation analysis

The Pearson correlation coefficient⁴⁹ was used to examine the linear relationships among the input features and the target variable, as illustrated in Fig. 4. In the figure, red and blue indicate positive and negative correlations, respectively, and color intensity reflects the magnitude of the correlation coefficient. The analysis focused on the pairwise correlations between the input features and grouting volume to provide a preliminary assessment of each feature’s linear association with the target variable. As shown in the heatmap, pre-grouting permeability exhibited the strongest positive correlation with grouting volume, with a Pearson coefficient of 0.79. Hole sequence showed a moderate negative correlation with grouting volume, with a Pearson coefficient of − 0.43. Section length and hole depth showed weak to moderate positive correlations with grouting volume, with Pearson coefficients of 0.21 and 0.20, respectively. Grouting pressure and hole diameter showed weaker positive correlations with grouting volume, with Pearson coefficients of 0.17 and 0.10, respectively. The initial water–cement ratio showed the weakest linear correlation with grouting volume, with a Pearson coefficient of only 0.05, suggesting a limited linear association under the current dataset. It should be noted that the Pearson correlation coefficient quantifies only the degree of linear association between variables. It is sensitive to outliers and cannot capture nonlinear relationships or multivariate interaction effects. Therefore, the correlation analysis presented here served primarily as a preliminary quantitative reference for variable relationships. Feature importance and complex interaction effects should be further elucidated through more comprehensive analysis.

Model training

Following model construction, systematic hyperparameter tuning was conducted to mitigate the risk of overfitting and enhance model generalization performance. Given the complexity of the hyperparameter space, BO was adopted as the primary optimization strategy. The key hyperparameters and their corresponding search ranges for each base learner are summarized in Table 2. All hyperparameters not listed in Table 2 were retained at their default values.

Table 2 Hyperparameter configuration of the model.

Full size table

Model performance evaluation

Table 3; Fig. 5 present a comparison of evaluation metrics of the seven models on the test set. In terms of overall performance, the three BO-optimized models showed clear improvements across all metrics relative to their corresponding baseline models. Compared with their unoptimized models, the R² values for BO-XGBoost, BO-LightGBM, and BO-RF increased by 19.18%, 10.26%, and 4.94%, respectively; MAE values decreased by 38.00%, 25.06%, and 8.96%, respectively; and RMSE values decreased by 27.37%, 19.61%, and 4.93%, respectively. These results suggested that systematic hyperparameter search via BO yielded parameter configurations more closely aligned with the data distribution. Consequently, the optimized models demonstrated improved ability to capture variations in grouting volume, thereby reducing overall prediction error on the test set.

Building on these results, the BO-Stacking model achieved better predictive performance than the three BO-optimized base learners under the current evaluation setting. Compared with BO-XGBoost, BO-LightGBM, and BO-RF, the R² values of BO-Stacking increased by 5.75%, 6.98%, and 8.24%, respectively; MAE values decreased by 11.87%, 20.45%, and 26.38%, respectively; and RMSE values decreased by 14.51%, 19.04%, and 25.01%, respectively. These results suggested that a single learner might be insufficient to fully capture the complex nonlinear relationships between grouting volume and its multiple influencing factors. In contrast, the stacking framework used the predictions of the three BO-optimized base learners as meta-features, enabling the meta-learner to integrate the complementary strengths of different algorithms in characterizing the feature space and error distribution. Consequently, the BO-Stacking model improved the predictive performance relative to the individual base learners.

Table 3 Evaluation table of the predictive performance of the models.

Full size table

Visual assessment of prediction accuracy

To provide a sample-level assessment of predictive accuracy and systematic bias, Fig. 6 presents scatter plots of measured versus predicted grouting volumes for the seven models on the test set. Each point represents one test sample; the horizontal axis denotes the measured grouting volume, and the vertical axis denotes the corresponding model prediction. The purple dashed line represents the ideal line (y = x); the blue solid line indicates the fitted regression line; and the orange shaded band denotes the 95% confidence interval of the fitted regression line. This band reflects the uncertainty in the estimated trend between measured and predicted values.

As shown in Fig. 6, the baseline models without BO exhibited greater scatter dispersion. Especially in the medium-to-high grouting-volume range, many samples deviated substantially from the ideal line, indicating relatively large prediction errors in this range. The overall pattern also suggested the presence of potential systematic bias, along with pronounced deviations for some individual samples. Furthermore, the slope and intercept of the fitted regression line deviated noticeably from those of the ideal line. This suggested that the baseline models have an overall tendency toward systematic underestimation or overestimation of grouting volume changes.

Following BO optimization, the optimized models showed a clear improvement in scatter distribution. Most samples were more tightly clustered around the ideal line, with reduced dispersion and fewer apparent outliers. This suggested that the predictions were more consistent with the ideal agreement trend and that the fit across different grouting volume levels became more reasonable. Meanwhile, the fitted line was closer to the ideal line. This suggested that systematic bias was mitigated, and that the characterization of medium-to-high grouting-volume samples was improved.

Among all evaluated models, the BO-Stacking model achieved the best overall performance. Its sample points were closely concentrated near the ideal line, with a compact overall scatter pattern, indicating smaller deviations for most test samples. The fitted line was the closest to the ideal line, suggesting minimal systematic bias. Considering the compactness of the scatter distribution, the magnitude of sample-wise deviations, and the consistency between the fitted and ideal lines, the BO-Stacking model demonstrated favorable predictive accuracy and robustness on the test set. Consequently, it may be regarded as the best-performing model under the current evaluation setting.

SHAP analysis

To address this interpretability limitation, SHAP analysis was applied to interpret the BO-Stacking model. SHAP quantified the marginal contribution of each input feature to grouting volume through an additive attribution framework. For clarity, all SHAP values reported in this section were expressed in liters (L). Figure 7 presents the global feature importance ranking based on mean absolute SHAP values, reflecting the overall contribution of each feature to grouting volume. Figure 8 further illustrates the direction and magnitude of the association between feature values and grouting volume across the feature value range. Figure 9 presents SHAP dependence plots to examine the conditional relationships between individual features and their attributed contributions to the predicted grouting volume.

As shown in Fig. 7, pre-grouting permeability had the largest mean absolute SHAP value of 279.88 L, substantially exceeding those of all other features. This finding was consistent with engineering knowledge, suggesting that pre-grouting permeability is the dominant feature associated with grouting volume under the current dataset. Among the construction parameters, hole sequence had the second largest mean absolute SHAP value of 69.99 L, suggesting that grouting sequence might notably influence stratum conditions. Grouting pressure and hole depth also exhibited moderate feature importance, with mean absolute SHAP values of 31.68 L and 25.24 L, respectively. By contrast, hole diameter, section length, and initial water–cement ratio had relatively small mean absolute SHAP values, indicating comparatively limited contributions to grouting volume under the current model and dataset.

As shown in Fig. 8, pre-grouting permeability showed a strong positive association with grouting volume. From the perspective of seepage mechanics, pre-grouting permeability may be regarded as a macroscopic indicator of fracture aperture and network interconnectivity. Samples with high permeability values were predominantly associated with positive SHAP values. This was consistent with the geological expectation that higher permeability may be associated with more developed fracture networks and correspondingly larger grouting volumes. Hole sequence showed a negative association with grouting volume in the SHAP analysis. As hole sequence number increased, the corresponding SHAP values tended to decrease, suggesting that grouting volume in later-sequence holes might be reduced following treatment of earlier holes. This pattern was broadly consistent with the principles of staged curtain grouting practice. Grouting pressure showed a positive association with grouting volume in the SHAP analysis. In grouting theory, cement slurry is generally characterized as a Bingham fluid with a finite yield stress. Grouting pressure acts as the driving force required to overcome this yield stress and drive slurry flow. Accordingly, higher grouting pressure may expand the diffusion radius of the slurry, potentially contributing to larger grouting volumes. This finding was broadly consistent with a previous grouting volume prediction study⁴², in which permeability-related variables were identified as the most influential features and process-related variables showed distinct contributions to model output.

Figure 9 presents SHAP dependence plots to examine the conditional relationships between selected engineering parameters and their attributed contributions to the predicted grouting volume. For grouting pressure, samples with higher permeability tended to exhibit larger positive SHAP values, suggesting that the contribution of grouting pressure to grouting volume might be more pronounced in highly transmissive strata. Similarly, the hole sequence dependence plot indicated that samples with higher permeability tended to exhibit larger SHAP magnitudes across sequence levels, suggesting that permeability might modulate the influence of hole sequence on grouting volume in transmissive strata. These findings were consistent with the view that grouting volume was influenced by multiple interacting factors rather than any single input variable.

Discussion

Comparative analysis with existing studies

Several previous studies have addressed grouting volume prediction using a range of machine learning and modeling approaches. Zhe et al.⁴² employed borehole image features combined with explainable machine learning methods, achieving an R² of 0.848 for grouting volume prediction. In comparison, the BO-Stacking model proposed in the present study achieved better predictive performance under the current evaluation setting, with an R² of 0.92. Although Tong et al.²² reported a higher R² of 0.9675 for the ICEEMDAN-BO-GRU model, this result should be interpreted in the context of a different prediction setting. Their study focused on post-grouting prediction rather than on pre-grouting prediction. Variables such as pure irrigation time and end of injection rate may provide strong posterior information that improves model accuracy but are not available for advance prediction prior to grouting. Compared with the hybrid optimized stacking study by Zhu et al.²⁵, the present study additionally introduced SHAP analysis to enhance model interpretability. Collectively, these studies were selected for comparison because they encompassed diverse modeling strategies, prediction settings, and levels of interpretability, thereby providing a representative benchmark for evaluating the proposed model.

Research limitations

The data used in this study were collected from a single engineering project, in which the geological conditions, structural characteristics, and construction practices showed clear locality. Accordingly, the generalizability of the proposed model to other engineering contexts remains to be verified. It should be noted that this limitation is not unique to the present study, as grouting prediction studies have commonly relied on project-specific datasets from a single engineering project, such as the Dongzhuang hydraulic project dataset used by Sun et al.²⁴ and the Huanggou Pumped Storage Power Station dataset used by Tong et al.²². Previous studies have suggested that grouting prediction may be improved by introducing more detailed fracture-related parameters²⁵ or borehole image data⁴². Compared with those studies, the present study relied primarily on pre-grouting permeability as the main geological indicator. Additionally, the deep learning frameworks have been shown to improve grouting-related prediction in some contexts²⁰, but the present study used relatively conventional machine learning algorithms. Furthermore, as this study was primarily motivated by practical engineering prediction rather than probabilistic statistical inference, it did not include statistical significance testing or systematic uncertainty quantification for the reported performance improvements.

Conclusions

Based on field grouting data from a large-scale hydraulic project in Xinjiang, this study developed a BO-Stacking model for curtain grouting volume prediction and conducted a systematic evaluation of predictive accuracy, robustness, and interpretability. The main conclusions are as follows:

1.
Bayesian optimization was applied to search complex hyperparameter spaces and identify improved configurations for the base learners. The BO-optimized XGBoost, LightGBM, and Random Forest models achieved better performance than their default baseline counterparts across all evaluated metrics. Compare to their default baseline counterparts, the BO-optimized XGBoost, LightGBM, and Random Forest models achieved consistent performance improvements across all evaluated metrics. BO provided a highly reliable and efficient approach for enhancing the predictive capabilities of base learners.
2.
The BO-Stacking model demonstrated favorable predictive accuracy and robustness under complex geological conditions. Under the current evaluation setting, the model achieved the best performance across key metrics among all evaluated models, with an R² of 0.92, an MAE of 70.19 L, and an RMSE of 187.07 L. In contrast to the benchmark models, which exhibited greater scatter dispersion and systematic bias in the medium-to-high grouting volume range, the predictions of the BO-Stacking model were more closely concentrated near the ideal line. The proposed model addressed certain limitations of conventional grouting volume prediction methods and provided robust methodological support for pre-grouting volume estimation under complex geological conditions, demonstrating significant engineering application value.
3.
The SHAP analysis was broadly consistent with engineering knowledge. Pre-grouting permeability was the key feature associated with variation in grouting volume. Hole sequence was identified as the most influential construction parameter, while grouting pressure and hole depth also exhibited relatively important contributions to grouting volume. In contrast, hole diameter, section length, and initial water–cement ratio showed comparatively limited contributions to the predicted grouting volume. Grouting volume was influenced by multiple interacting factors spanning geological conditions, construction parameters, and slurry properties, rather than any single input variable. Overall, the SHAP analysis enhanced the interpretability of the BO-Stacking model and provided feature-contribution patterns broadly consistent with the engineering characteristics of curtain grouting.

Data availability

The data presented in this study are available on request to the corresponding author. The data are not publicly available due to privacy.

References

Xu, Z. H. et al. Latest research progress and future development direction of grouting numerical simulation in underground engineering. J. Basic. Sci. Eng. 33, 903–922 (2025).
Google Scholar
Guo, P. et al. Discrete element analysis of grouting reinforcement and slurry diffusion in overburden strata. Appl. Sci. 15, 8464 (2025).
Article CAS Google Scholar
Xia, K. Grouting Technology Collection (China Water & Power Press, 2015).
Google Scholar
Arash, M., Dolatshahi, A. & Shahriar, K. Three-dimensional simulation analysis of the impact of excavation on non-level crossing tunnels. Sci. Rep. https://doi.org/10.1038/s41598-025-33078-4 (2025).
Article PubMed PubMed Central Google Scholar
Lan, F. et al. Research on surrounding rock deformation characteristics and support optimization measures for tunnel TBM crossing through fault fracture zones. Sci. Rep. https://doi.org/10.1038/s41598-026-35748-3 (2026).
Article PubMed PubMed Central Google Scholar
Zhu, D. et al. Probabilistic stability analyses of two-layer undrained slopes. Comput. Geotech. 182, 107178 (2025).
Article Google Scholar
Wang, L. et al. Fracture evolution of granite under cyclic thermal shocks: effects of liquid nitrogen cooling on strength, toughness, and acoustic emission characteristics. Therm. Sci. Eng. Prog. 70, 104507 (2026).
Article CAS Google Scholar
Liu, M. et al. Experimental study on the structural failure characteristics and load-bearing mechanism of anchored fractured rock mass. Sci. Rep. 16, 4537 (2026).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhang, H. et al. Mechanism and application of reaming anchorage of inverted wedge-shaped hole bottom in argillaceous cemented roadway. Sci. Rep. 16, 5094 (2026).
Article ADS CAS PubMed PubMed Central Google Scholar
Sohrabi-Bidar, A., Rastegar-Nia, A. & Zolfaghari, A. Estimation of the grout take using empirical relationships (case study: Bakhtiari dam site). Bull. Eng. Geol. Environ. 75, 425–438 (2016).
Article CAS Google Scholar
Li, X. et al. Nonlinear flow characteristics of cement grout in fractures with varying geometries. Sci. Rep. https://doi.org/10.1038/s41598-025-30580-7 (2025).
Article PubMed PubMed Central Google Scholar
Rastegarnia, A. et al. Assessment of relationship between grouted values and calculated values in the Bazoft Dam site. Geotech. Geol. Eng. 35, 1299–1310 (2017).
Article Google Scholar
Rouhani, M. M. et al. Cerchar abrasiveness index prediction based on rock properties leveraging hybrid soft computing techniques. Sci. Rep. 15, 37499 (2025).
Article ADS CAS PubMed PubMed Central Google Scholar
Thangavel, P. et al. Investigation of cold formed steel angle compression through high throughput design FEA and machine learning. Sci. Rep. 15, 19222 (2025).
Article ADS CAS PubMed PubMed Central Google Scholar
Rouhani, M. M. & Dolatshahi, A. Predicting mode I fracture toughness of CCNBD quasi-brittle specimens using hybrid gradient boosting algorithms. Theor. Appl. Fract. Mech. 140, 104966 (2025).
Article Google Scholar
Rouhani, M. M. et al. Intelligent prediction of flyrock hazards in surface mining using optimized gradient boosting models. Nat. Resour. Res. 35, 1–23 (2025).
Google Scholar
Sunkpho, J. et al. Sustainable use of red mud in concrete: Assessing mechanical strength, durability, and performance through machine learning models. Green. Technol. Sustain. 4 (2), 100283 (2025).
Article Google Scholar
Katuwal, T. B., Panthi, K. K. & Basnet, C. B. Machine learning approach for prediction of TBM performance and risk of jamming in Himalayan geology using a cross-project tunnelling database. Sci. Rep. https://doi.org/10.1038/s41598-025-34273-z (2026).
Article PubMed PubMed Central Google Scholar
Zhong, D. H. et al. Advancements in intelligent dam engineering construction in the intelligence era. J. Hydraul Eng. 56, 1–19 (2025).
Google Scholar
Li, K. et al. Grouting flow hybrid prediction model based on CEEMDAN–Transformer. J. Hydraul Eng. 54, 806–817 (2023).
Google Scholar
Ouyang, S. M. et al. Grouting volume prediction for underground water-sealed caverns based on TPE–GBT model. Mod. Tunn. Technol. 61, 138–145 (2024).
Google Scholar
Tong, F. et al. A hybrid ICEEMDAN-BO-GRU model for real-time grouting parameter prediction. Appl. Soft Comput. 185, 113912 (2025).
Article Google Scholar
Shi, Z. Z. et al. Surrogate model for dam foundation grouting volume prediction based on improved multiple kernel extreme learning machine. Water Resour. Hydropower Eng. 52, 57–66 (2021).
Google Scholar
Sun, D. et al. Intelligent prediction of grouting in fractured rock masses. Intelligent Geoengineering 2, 66–79 (2025).
Article Google Scholar
Zhu, Y. S. et al. Research on an ISSA-Stacking ensemble learning surrogate model of dam foundation grouting volume prediction. J. Tianjin Univ. Sci. Technol. 57, 174–185 (2024).
Google Scholar
Li, R. et al. Explainable rock mass classification under imbalanced TBM data using ensemble learning. Eng. Appl. Artif. Intell. 162, 112641 (2025).
Article Google Scholar
Abdulrahman, S. et al. Hybrid-optimized ensemble models for predicting compressive strength of fly ash and slag concrete. Mater. Today Commun. 38, 113762 (2025).
Article Google Scholar
Sagi, O. & Rokach, L. Ensemble learning: A survey. WIREs Data Min. Knowl. Discov. 8, e1249 (2018).
Article Google Scholar
Tang, Y. et al. Multi-output prediction of TBM operation parameters using stacking ensemble algorithms. Tunn. Undergr. Space Technol. 152, 105952 (2024).
Article Google Scholar
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 785–794ACM, (2016).
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30, 3146–3154 (2017).
Google Scholar
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Article Google Scholar
McDonald, G. C. Ridge regression. WIREs Comput. Stat. 1, 93–100 (2009).
Article Google Scholar
Kumar, D. R. et al. Application of random forest and optimization techniques in forecasting the axial load capacity of lipped channel cold-formed steel sections. J. Soft Comput. Civ. Eng. 10, e1950 (2026).
Liashchynskyi, P. Grid search, random search, and genetic algorithm for hyperparameter optimization. Preprint at (2019). https://arxiv.org/abs/1912.06059
Wang, X. et al. Recent advances in Bayesian optimization. ACM Comput. Surv. 55, 1–36 (2023).
Google Scholar
Jabar, A. B. & Thangavel, P. ANN-PSO modelling for predicting buckling of self-compacting concrete column containing RHA properties. Matéria 28, e20230102 (2023).
Article Google Scholar
Negi, G. et al. Grey wolf optimizer: a review and applications. Int. J. Syst. Assur. Eng. Manag. 12, 1–8 (2021).
Article Google Scholar
McGovern, A. et al. Making the black box more transparent. Bull. Am. Meteorol. Soc. 100, 2175–2199 (2019).
Article ADS Google Scholar
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 4765–4774 (2017).
Google Scholar
Rouhani, M. M. & Dolatshahi, A. Advanced hybrid models for predicting dynamic compressive strength of rocks: Exploring various input scenarios and introducing novel empirical approaches. Results Eng. 28, 107947 (2025).
Article Google Scholar
Zhe, Y. et al. Prediction model for grouting volume using borehole image features and explainable artificial intelligence. Constr. Build. Mater. 470, 140626 (2025).
Article Google Scholar
Rouhani, M. M., Namin, F. S. & Chakeri, H. Screw conveyor speed prediction for EPB-TBM excavations using hybrid deep learning models. Results Eng. 28, 108064 (2025).
Article Google Scholar
Mishra, D. et al. A review of ensemble learning methods. Int. J. Res. Publ Rev. 6, 3795 (2025).
Article Google Scholar
Pinheiro, J. M. H. et al. The impact of feature scaling in machine learning: Effects on regression and classification tasks. IEEE Access 13, 199903–199931 (2025).
Article Google Scholar
Winter, E. The Shapley value. In Handbook of Game Theory with Economic Applications Vol. 3 (eds Aumann, R. J. & Hart, S.) 2025–2054 (Elsevier, 2002).
Nagelkerke, N. J. D. A note on a general definition of the coefficient of determination. Biometrika 78, 691–692 (1991).
Article ADS MathSciNet Google Scholar
Willmott, C. J. & Matsuura, K. Advantages of the mean absolute error over the root mean square error. Clim. Res. 30, 79–82 (2005).
Article Google Scholar
Sedgwick, P. Pearson’s correlation coefficient. BMJ https://doi.org/10.1136/bmj.e4483 (2012).
Article PubMed Google Scholar

Download references

Acknowledgements

This work has been supported by China Gezhouba Group Co., Ltd. (CGGC).

Funding

This work is funded by China Energy Engineering Corporation Limited (CEEC) through the Intelligent Grouting System Research and Development Program (CEEC2024-KJZX-03).

Author information

Authors and Affiliations

State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research, Beijing, 100038, China
Yahui Ma, Hongwei Lei & Weiquan Zhao
China Gezhouba Group Co., Ltd, Yichang, 443000, China
Zhanquan Yuan
China Gezhouba Group Municipal Engineering Co., Ltd, Yichang, 443000, China
Bo Xiong

Authors

Yahui Ma
View author publications
Search author on:PubMed Google Scholar
Zhanquan Yuan
View author publications
Search author on:PubMed Google Scholar
Bo Xiong
View author publications
Search author on:PubMed Google Scholar
Hongwei Lei
View author publications
Search author on:PubMed Google Scholar
Weiquan Zhao
View author publications
Search author on:PubMed Google Scholar

Contributions

Yahui Ma: Conceptualization, formal analysis, methodology, software, validation, writing—original draft. Zhanquan Yuan: Project administration, supervision, writing—review and editing. Bo Xiong: Resources, supervision. Hongwei Lei: Methodology, software. Weiquan Zhao: Funding acquisition, investigation, supervision, writing—review and editing.

Corresponding author

Correspondence to Weiquan Zhao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Ma, Y., Yuan, Z., Xiong, B. et al. Curtain grouting volume prediction using a Bayesian-optimized stacking ensemble model with SHAP analysis. Sci Rep 16, 15374 (2026). https://doi.org/10.1038/s41598-026-45538-6

Download citation

Received: 13 January 2026
Accepted: 19 March 2026
Published: 01 April 2026
Version of record: 19 May 2026
DOI: https://doi.org/10.1038/s41598-026-45538-6

Subjects

Abstract

Similar content being viewed by others

Estimating seepage in heterogeneous earthfill dams on permeable foundations using explainable machine learning

Hybrid machine learning for SCC strength prediction using metaheuristic optimization

Enhancing wellbore stability through machine learning for sustainable hydrocarbon exploitation

Introduction

Research framework

Methodology

Stacking ensemble

Base learners

XGBoost

LightGBM

Random forest

Meta-learner

Bayesian optimization

SHAP

Performance evaluation

Results

Feature correlation analysis

Model training

Model performance evaluation

Visual assessment of prediction accuracy

SHAP analysis

Discussion

Comparative analysis with existing studies

Research limitations

Conclusions

Data availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Quick links