Abstract
The crack healing capacity of self-healing concrete is crucial for enhancing structural durability, especially in aggressive environments where the dynamic progression of healing depth directly influences service life. This study introduces a modeling and prediction approach based on Polynomial Chaos Expansion (PCE) to quantitatively assess the crack cross-sectional repair rate throughout the full healing cycle. A foundational database is first established by statistically identifying key factors governing healing behavior. A first-order PCE surrogate model is developed to characterize the temporal evolution from early-stage variability to nonlinear saturation. Dimensionality reduction combined with order elevation enhances accuracy under limited data conditions. To overcome the constraints of conventional Hermite polynomials bound by Gaussian assumptions, a generalized PCE framework accommodating arbitrary distributions is formulated, enabling broad applicability across healing scenarios. Extrapolative validation on unmodeled healing ages confirms the model’s robustness and reliability throughout all healing stages. This work provides a reliable quantitative framework for predicting service life and optimizing repair strategies in engineering practice.
Introduction
Cement-based materials are among the most extensively utilized man-made materials in the construction industry. As demands for extended service life and environmental sustainability continue to rise, the durability of cement-based materials has become a major research concern1,2. Nevertheless, due to their intrinsic brittleness and frequent exposure to harsh service environments, cracking remains inevitable3. If left unrepaired, such cracks allow the ingress of water and aggressive agents, accelerating material degradation and compromising structural integrity4. Consequently, timely and effective crack repair is essential to extend the service life of cementitious structures. Traditional repair methods, such as surface patching, grouting, and structural reinforcement using epoxy resins or polymer mortars, are often costly, labor-intensive, and ineffective in addressing micro-cracks that are difficult to detect and access5,6. In contrast, microbial self-healing concrete offers an autonomous, sustainable solution by enabling the material to repair its own damage2. This technology has attracted growing interest over the past two decades due to its environmental friendliness and potential to improve the durability, stability, and safety of concrete structures7,8,9.
Microbial self-healing concrete incorporates microorganisms into the concrete matrix10. When cracks form, the metabolic activity of these microbes induces calcium carbonate precipitation, sealing the cracks and enhancing both durability and service life11,12. The effectiveness of self-healing is typically evaluated through macroscopic indicators such as surface crack closure, water permeability reduction, and chloride diffusion resistance13,14,15, along with non-destructive testing methods including ultrasonic pulse velocity and electrical resistivity measurements16. For instance, Wiktor et al.17 introduced a dual-component microbial healing agent encapsulated within expanded clay particles, achieving healing of cracks up to 0.46 mm, substantially greater than the 0.18 mm closure observed in the control group. Similarly, Zhang et al.18 immobilized Bacillus spores in expanded perlite and reported full crack closure up to 0.79 mm, outperforming the 0.39 mm healing achieved by directly mixed microbes after 28 d. Among various evaluation metrics, crack healing depth is particularly critical as it directly reflects the internal effectiveness of the healing process19. Wang et al.20 used urease-producing microorganisms to repair concrete, achieving healing depths over 7 mm. Erşan et al.21 applied aerobic bacteria, reporting healing depths ranging from 5 mm to 13 mm, though still limited in scope. Qian et al.22 employed carbonic anhydrase-producing microbes capable of converting atmospheric CO₂ into carbonate ions, achieving crack healing depths of up to 40 mm in cylindrical specimens. Zhan et al.23 further enhanced microbial healing by improving the micro-environment with inorganic hydration products, resulting in over 70% cross-sectional repair.
Polynomial Chaos Expansion (PCE), initially introduced by Wiener24, is a powerful method for representing second-order stochastic processes via a series expansion of orthogonal polynomials, typically Hermite polynomials under Gaussian assumptions. Non-intrusive PCE methods, especially those based on statistical regression, allow flexible sampling strategies and have become widely adopted for uncertainty quantification. Choi et al.25 extended PCE to non-Gaussian variables using Latin Hypercube Sampling (LHS), incorporating significance testing and residual analysis for adaptive polynomial selection. Further developments extended the approach to Gaussian random fields26. Unlike traditional surrogate models, PCE embeds a physical model within a larger stochastic framework parameterized by white noise, thereby enabling simultaneous representation of model inputs and outputs. Although this embedding introduces challenges such as the “curse of dimensionality”, it provides a rigorous mathematical foundation for analyzing convergence and efficiency27,28,29. The stochastic inputs in PCE—often referred to as “random seeds” or “degrees of freedom”—represent statistically independent features needed to describe the quantity of interest (QoI). In conventional PCE, these are associated with predefined probability models of the input parameters30,31,32. However, in the context of self-healing concrete, experimental observations may not conform to standard probability distributions, necessitating the development of a generalized PCE framework capable of accommodating arbitrary input distributions. 
Based on the comprehensive review of recent advancements in uncertainty quantification for civil engineering applications, generalized Polynomial Chaos Expansion (gPCE) methods have gained notable traction due to their ability to handle non-Gaussian input distributions and provide robust surrogate models for material performance prediction, structural reliability analysis, and durability assessment, as evidenced by studies on concrete strength variability and seismic response modeling33,34,35.
This study aims to develop a robust prediction model for crack healing in self-healing concrete. While the mathematical frameworks employed are well-established in uncertainty quantification literature, this work presents their first application to predict crack healing in self-healing concrete, providing an engineering analysis framework capable of handling real-world data characteristics despite the constrained dataset scope. The remainder of this paper is organized as follows: A preliminary PCE model, developed and validated on microbial concrete specimens at single healing ages, revealed considerable accuracy loss in early-stage predictions due to high variability. To address this, Sobol sensitivity analysis and dimensionality reduction based on functional relevance were employed to exclude less influential predictors and improve model robustness. Furthermore, a generalized arbitrary-distribution PCE (aPCE) framework was developed using Kernel Density Estimation (KDE) to construct adaptive orthogonal polynomial bases from non-parametric input distributions. This generalized approach demonstrated high predictive reliability across unmodeled healing ages, notably enhancing the flexibility and applicability of PCE in uncertainty quantification for cementitious systems. These findings establish a robust theoretical basis for extending PCE methodologies within concrete durability research and for optimizing the lifecycle performance of self-healing materials.
Experimental materials and methodology
Raw materials and healing agent composition
The primary constituents used in the preparation of the self-healing mortar included ordinary Portland cement, fine aggregate, mixing water, a high-range water-reducing agent, and a specialized healing agent. Ordinary Portland cement (P.O 42.5), supplied by China Resources Building Materials Technology Holdings Limited, was employed as the binder, and Supplementary Table 1 shows its chemical composition. Natural river sand with a fineness modulus of 2.6 served as the fine aggregate. Tap water was used for mixing, and a polycarboxylate-based superplasticizer was used to enhance workability.
The healing agent, functioning as the core self-healing component of the mortar, was independently developed by the research team36. As shown in Fig. 1a, the agent features a core-shell structure, consisting of a microbial phase and an inorganic functional phase. The selected microorganism, Bacillus mucilaginosus (AS 1.232), is characterized by its high enzymatic production of carbonic anhydrase, facilitating biomineralization. The inorganic phase consists of sodium aluminate (NaAlO2) and calcium nitrite (Ca(NO2)2) in a fixed mass ratio of 3:1, enabling the rapid in situ formation of calcium-aluminum layered double hydroxides (CaAl-LDHs) upon exposure to aqueous environments. The core of each healing particle has an approximate diameter of 2 mm, while the full encapsulated agent, including the shell, measures about 3 mm in diameter. This design optimizes both mechanical integrity and functional material loading. The shell material is composed of fast-setting calcium aluminate cement, providing structural integrity and compatibility with the cementitious matrix. Following fabrication, the healing agents were allowed to rest for 12 hours and subsequently cured by water spraying for 7 days prior to incorporation into the mortar.
a Preparation of bacterial cultures and bio-agents. The process begins with medium preparation, followed by autoclave sterilization, microbial inoculation, and shaking incubation. The cultured bacteria are then centrifuged and harvested. Microbial activity is verified before the bacterial solution is spray-dried into a powder for self-healing agent preparation, which is finally incorporated into concrete specimens. b Multidimensional crack-healing assessment methodology. A comprehensive suite of non-destructive and physical tests is employed to evaluate healing performance. Diagrams illustrate the corresponding testing devices. The workflow for specimen conditioning, including crack generation and dry-wet cycle curing, is also shown.
Mortar specimen preparation and curing protocol
Based on prior experimental findings23, the optimal dosage of the healing agent was determined and incorporated into the mortar by replacing an equivalent mass of fine aggregate (sand) during the mixing process. The detailed mix proportions for the self-healing mortar are summarized in Table 1. Following casting, all specimens were subjected to standard curing conditions in a controlled environment (relative humidity: 90%, temperature: 20 ± 3 °C) for 14 days prior to crack induction.
Cracks were introduced using a universal testing machine, with the loading rate strictly controlled between 0.10 and 0.20 kN/s. Loading was halted immediately upon the visual detection of surface cracks. In cases where the initial loading failed to produce a crack width of at least 0.5 mm, the specimen was reloaded until the desired crack width was attained. All specimens were ultimately adjusted to a uniform target crack width of approximately 1 mm. To simulate realistic service conditions conducive to microbial self-healing, a wet–dry cyclic curing regime was implemented. Each cycle lasted 12 hours, consisting of alternating wet and dry phases. The artificial seawater recipe was prepared according to the ASTM D1141 standard, ensuring that the ion concentrations match those of natural seawater. All chemical reagents were of analytical grade and dissolved in deionized water to eliminate the influence of impurities. Supplementary Table 2 shows the simulated seawater formula. All experiments were conducted in a constant-temperature room, where the temperature was maintained at 25 ± 2 °C. The complete curing protocol is illustrated in Fig. 1(b).
Evaluation methods for crack healing efficiency
The macroscopic healing rate of cracks may be influenced by both surface and internal healing mechanisms, as well as their synergistic interaction. To accurately evaluate internal crack healing, a multi-dimensional characterization approach was adopted. An overview of the experimental setup used for this analysis is presented in Fig. 1b.
An industrial camera (ZQ-616) was used to capture high-resolution images of the specimen surfaces (φ100 mm × 100 mm) both before and after the healing process. To ensure measurement consistency, lighting conditions, camera-to-sample distance, and vertical alignment were maintained constant throughout image acquisition. The captured images were processed using “Image J” software, where binarization was applied to quantify the pixel area corresponding to each crack. The surface healing ratio (PA) was calculated according to Eq. (1), where A0 represents the initial crack pixel area, and At is the pixel area after t d of healing.
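The surface healing ratio can be illustrated with a minimal sketch. It assumes the usual relative-reduction form PA = (A0 − At)/A0 × 100%, inferred from the variable definitions above; the function name and the pixel counts are hypothetical and are not taken from the experiments.

```python
def surface_healing_ratio(a0, at):
    """Surface healing ratio PA in percent.

    Assumed form: PA = (A0 - At) / A0 * 100, where A0 is the initial crack
    pixel area and At is the crack pixel area after t days of healing.
    """
    if a0 <= 0:
        raise ValueError("initial crack area A0 must be positive")
    if not 0 <= at <= a0:
        raise ValueError("healed crack area At must lie in [0, A0]")
    return (a0 - at) / a0 * 100.0


# Hypothetical pixel counts from two binarized images of the same crack.
pa = surface_healing_ratio(12000, 4200)
print(f"PA = {pa:.1f}%")
```

A fully closed crack (At = 0) gives 100%, and an unchanged crack (At = A0) gives 0%.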
Water permeability was assessed using a custom-designed test apparatus. A polyvinyl ether (PVE) pipe was securely mounted on the top surface of each cylindrical specimen (φ100 mm × 100 mm) and tightly fastened using an iron clamp to prevent leakage. A constant hydraulic head of 500 mm was applied, and the mass of water that seeped through the crack over a 3-minute period was recorded. The permeability recovery rate (PK) was calculated using Eq. (2), where Qw0 and Qwt are the water flow rates before and after t d of healing (m3/s); h is specimen height (mm); A is the contact area (mm2); and ΔL is the hydraulic head (mm).
Chloride resistance was evaluated using an NELD-RUL530 device. A 3% NaCl solution and 0.3 mol/L NaOH solution were used as electrolytes in the cathode and anode chambers, respectively. A 60 V DC voltage was applied across the specimen for a duration of 2 hours, during which current values were recorded every 5 minutes. The chloride resistance recovery (PQ) was calculated using Eqs. (3) and (4), where It is the current at minute t (mA); Q0 is the total charge passed before healing (mA); and Qt is the charge after t d.
Ultrasonic pulse velocity measurements were conducted using an HC-U81 ultrasonic tester to evaluate internal crack healing. Transducers were symmetrically positioned on opposite sides of the crack, at a distance of 20 mm from the crack edge. Vaseline was applied as a couplant to ensure effective acoustic transmission. The initial pulse velocity at each time point was recorded. The recovery rate of ultrasonic pulse velocity (PV) was calculated using Eq. (5), where vt is the wave velocity after t d (km/s); vi is that of an intact specimen; and v0 is the initial cracked state.
Prismatic specimens (40 mm × 40 mm × 160 mm) were prepared by incorporating 0.6% conductive carbon fibers by volume. The fibers had an average length of 0.6 mm, a diameter of 0.7 μm, and an electrical resistivity ranging from 1.0 to 1.6 Ω·cm. Nickel electrodes (20 mm × 60 mm × 0.2 mm) were embedded at both ends of the specimen, inserted to a depth of 20 mm, and spaced 120 mm apart to establish a conductive path. A constant test voltage of 30 V was applied during the measurements. The electrical conductivity recovery rate (PR) was calculated using Eq. (6), where I0, Ii, and It represent current values of the cracked, intact, and healed specimens, respectively, at day t.
PCE framework
Wiener-Askey polynomial chaos
If X(ω) is a stochastic process composed of uncertain parameters \(\xi = (\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_n}, \ldots)\) that follow a Gaussian distribution, then X(ω) can be expressed as 25:
where \(a_0, a_{i_1}, a_{i_1 i_2}, a_{i_1 i_2 i_3}, \ldots\) represent deterministic coefficients, and \(H_n(\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_n})\) denotes the n-th order Hermite polynomial. For convenience, Eq. (7) can also be abbreviated as:
Each \(H_n(\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_n})\) corresponds one-to-one with \(\psi_i(\xi)\). The Hermite chaos polynomial basis \(\{\psi_i\}\) forms a complete orthogonal basis, that is:
In a Hilbert space, the inner product is defined as:
The weighted function is expanded as:
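For reference, the relations introduced in this subsection take the following standard forms in the Wiener-Hermite literature. This is a sketch in assumed notation (in particular the summation limits and the compact coefficients \(\hat{a}_i\)), and it may differ in detail from the numbered equations of this paper.

```latex
% Full Hermite chaos expansion of a second-order process
X(\omega) = a_0 H_0
  + \sum_{i_1=1}^{\infty} a_{i_1} H_1(\xi_{i_1})
  + \sum_{i_1=1}^{\infty} \sum_{i_2=1}^{i_1} a_{i_1 i_2} H_2(\xi_{i_1}, \xi_{i_2})
  + \sum_{i_1=1}^{\infty} \sum_{i_2=1}^{i_1} \sum_{i_3=1}^{i_2}
      a_{i_1 i_2 i_3} H_3(\xi_{i_1}, \xi_{i_2}, \xi_{i_3}) + \cdots

% Compact form with a single index
X(\omega) = \sum_{i=0}^{\infty} \hat{a}_i \, \psi_i(\xi)

% Orthogonality of the basis
\langle \psi_i, \psi_j \rangle = \langle \psi_i^2 \rangle \, \delta_{ij}

% Inner product in the Hilbert space of the Gaussian variables
\langle f, g \rangle = \int f(\xi) \, g(\xi) \, W(\xi) \, \mathrm{d}\xi

% Weight function: n-dimensional standard Gaussian density
W(\xi) = \frac{1}{\sqrt{(2\pi)^n}} \exp\!\left( -\tfrac{1}{2} \xi^{T} \xi \right)
```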
A key distinguishing feature of the Hermite chaos expansion lies in its use of Hermite polynomials as basis functions, which are specifically associated with Gaussian random variables. These polynomials are orthogonal with respect to a weighting function W(ω), which corresponds to the probability density function of an n-dimensional independent standard Gaussian distribution. The general form of a multivariate Hermite polynomial can be expressed as:
The Hermite polynomial satisfies the following recurrence relation, which defines the sequence recursively and is fundamental to their construction:
Hermite polynomials possess orthogonality with respect to the weight function associated with the standard Gaussian distribution. Specifically, they are orthogonal under the weighting function:
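As an illustrative check of these properties, the sketch below builds probabilists' Hermite polynomials and verifies their orthogonality exactly. It assumes the standard recurrence He_{n+1}(x) = x·He_n(x) − n·He_{n−1}(x) and the standard normal weight, under which E[He_m·He_n] = n!·δ_mn. All function names are hypothetical.

```python
from math import factorial


def hermite_coeffs(n):
    """Coefficients (lowest degree first) of the probabilists' Hermite
    polynomial He_n, built with He_{k+1}(x) = x*He_k(x) - k*He_{k-1}(x)."""
    prev, curr = [1], [0, 1]  # He_0 = 1, He_1 = x
    if n == 0:
        return prev
    for k in range(1, n):
        shifted = [0] + curr  # coefficients of x * He_k
        padded = prev + [0] * (len(shifted) - len(prev))
        prev, curr = curr, [s - k * p for s, p in zip(shifted, padded)]
    return curr


def gaussian_moment(k):
    """E[x^k] for a standard Gaussian: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0
    return factorial(k) // (2 ** (k // 2) * factorial(k // 2))


def inner_product(p, q):
    """E[p(x) q(x)] under the standard Gaussian weight, computed exactly
    from the integer coefficients and the Gaussian moments."""
    return sum(a * b * gaussian_moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))


print(hermite_coeffs(3))  # coefficients of x^3 - 3x, lowest degree first
print(inner_product(hermite_coeffs(2), hermite_coeffs(2)))  # squared norm of He_2
print(inner_product(hermite_coeffs(1), hermite_coeffs(3)))  # orthogonal pair
```

Working with integer coefficients and closed-form moments avoids numerical quadrature, so the orthogonality relation holds exactly rather than to rounding error.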
Wiener-Hermite chaos polynomials
The Hermite chaos expansion is particularly effective for solving stochastic differential equations with Gaussian inputs, and in certain cases, it can also be applied to specific non-Gaussian inputs. Its theoretical foundation is supported by the Cameron–Martin theorem. However, for general non-Gaussian input distributions, the optimal exponential convergence rate typically associated with Hermite chaos may not be achieved, and in some instances, the convergence rate can deteriorate quickly.
To address a broader class of stochastic inputs, the generalized polynomial chaos (gPC) framework extends the Wiener–Hermite basis by employing orthogonal polynomials from the Askey scheme. The Askey polynomials constitute a family of hypergeometric orthogonal polynomials, each associated with a specific probability distribution. These polynomials can be expressed as:
Among them, (a)n and (b)n are Pochhammer symbols, which are defined as:
In Eq. (16), no denominator parameter may be a negative integer; otherwise, the expression is undefined. However, if a numerator parameter is a negative integer, the series in Eq. (16) terminates and can be truncated as:
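The role of the Pochhammer symbol in terminating the series can be sketched as follows. The code assumes the standard definitions (a)_n = a(a+1)…(a+n−1) with (a)_0 = 1 and the generalized hypergeometric series rFs = Σ_k [Π(a_i)_k / Π(b_j)_k] · x^k / k!. The function names are hypothetical.

```python
from math import factorial, prod


def pochhammer(a, n):
    """Rising factorial (a)_n = a (a+1) ... (a+n-1), with (a)_0 = 1."""
    return prod(a + k for k in range(n))


def hypergeometric_partial_sum(num, den, x, n_terms):
    """Partial sum of rFs(num; den; x) using the first n_terms terms.

    If a numerator parameter equals -N, (-N)_k vanishes for k > N, so every
    later term is zero and the series reduces to a polynomial of degree N.
    """
    total = 0.0
    for k in range(n_terms):
        top = prod(pochhammer(a, k) for a in num)
        bottom = prod(pochhammer(b, k) for b in den)
        total += top / bottom * x ** k / factorial(k)
    return total


# (-2)_3 contains the factor 0, which is what truncates the series.
print(pochhammer(-2, 3))
# 2F1(-2, 1; 1; x) terminates after three terms and equals (1 - x)^2.
print(hypergeometric_partial_sum([-2, 1], [1], 0.5, 10))
```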
By taking advantage of the orthogonality of Askey polynomials, polynomials of different orders satisfy:
Or
Here, W(x) is the weight function associated with the corresponding orthogonal polynomial. When the parameters r and s are 2 and 1 respectively, the Askey polynomial reduces to the Hermite polynomial, and the generalized expression (e.g., Eq. (19)) simplifies to the specific case of the Hermite polynomial (e.g., Eq. (15)). Similar to the Wiener-Hermite expansion, the generalized polynomial chaos can represent the random process X(ω) as:
Here, In represents the Askey polynomial, a0 represents the mean of X(ω), and the sum of the squares of ai(i > 0) represents the variance of X(ω). Table 2 shows the distribution of common random variables and their corresponding Askey polynomials37.
Extension to arbitrary distributions via aPCE
By applying Z-score normalization, the input parameters are transformed into a form with zero mean and unit variance. The transformation is defined as follows, where μj is the sample mean, and σj is the sample standard deviation:
In the probabilistic modeling stage, the joint probability distribution of the input variables is constructed using non-parametric Kernel Density Estimation (KDE). The Gaussian kernel function K(u) is employed, and the bandwidth h is adaptively determined based on Silverman’s rule of thumb:
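The normalization and density-estimation steps can be sketched for a single input variable as below. The sketch assumes the Gaussian-kernel form of Silverman's rule, h = (4/(3n))^(1/5)·σ (about 1.06·σ·n^(−1/5)); other variants of the rule exist, and the variant used in this study is not specified here. The function names and sample values are hypothetical.

```python
import math
import statistics


def z_score(samples):
    """Standardize to zero mean and unit sample standard deviation."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation (n - 1)
    return [(x - mu) / sigma for x in samples]


def silverman_bandwidth(samples):
    """Assumed rule-of-thumb bandwidth h = (4 / (3 n))**(1/5) * sigma."""
    n = len(samples)
    return (4.0 / (3.0 * n)) ** 0.2 * statistics.stdev(samples)


def gaussian_kde(samples, h):
    """Return f(x) = 1 / (n h) * sum_i K((x - x_i) / h), K standard normal."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2)
                          for xi in samples)

    return density


# Hypothetical repair rates (%) for one indicator at one curing age.
raw = [62.1, 63.4, 64.0, 65.2, 66.8, 67.5]
z = z_score(raw)
h = silverman_bandwidth(z)
pdf = gaussian_kde(z, h)
print(h, pdf(0.0))
```

Because the kernel is itself a probability density, the estimate integrates to one for any positive bandwidth, which makes it usable as the weight function for the orthogonal basis.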
In the orthogonal basis construction stage, the s method is used to generate polynomial bases adapted to arbitrary probability measures:
The above formula shows the forms of the first two orders of orthogonal polynomials for each parameter, where δ is the Kronecker delta function, and their orthogonality satisfies:
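One way to obtain such a data-adapted basis is sketched below. It assumes a Gram-Schmidt procedure applied to monomials and, for brevity, uses the empirical sample measure as the inner product instead of a KDE-based density. Both choices are illustrative assumptions and not necessarily the construction used in this study. The function names and sample values are hypothetical.

```python
import math


def evaluate(coeffs, x):
    """Evaluate a polynomial given as coefficients, lowest degree first."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def empirical_inner(p, q, samples):
    """<p, q> approximated by the sample mean of p(x) * q(x)."""
    return sum(evaluate(p, x) * evaluate(q, x) for x in samples) / len(samples)


def orthonormal_basis(samples, max_degree):
    """Orthonormalize 1, x, x^2, ... against the empirical measure."""
    basis = []
    for d in range(max_degree + 1):
        poly = [0.0] * d + [1.0]  # monomial x^d
        for psi in basis:
            # Remove the component of poly along each earlier basis function.
            proj = empirical_inner(poly, psi, samples)
            padded = psi + [0.0] * (len(poly) - len(psi))
            poly = [c - proj * b for c, b in zip(poly, padded)]
        norm = math.sqrt(empirical_inner(poly, poly, samples))
        basis.append([c / norm for c in poly])
    return basis


# Hypothetical standardized samples of one input variable.
samples = [-1.4, -0.9, -0.3, 0.0, 0.2, 0.7, 1.1, 1.6]
basis = orthonormal_basis(samples, 2)
for i, psi_i in enumerate(basis):
    print([round(empirical_inner(psi_i, psi_j, samples), 10) for psi_j in basis])
```

The printed matrix of inner products should be close to the identity, which is the Kronecker-delta condition stated above.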
Coefficient determination method for chaos polynomials
In practical applications, polynomial chaos expansions are limited to a finite number of terms. Therefore, Eq. (8) needs to be truncated:
Here, P is the number of terms in the chaos polynomial, which is determined by the number of variables M and the polynomial order p:
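The growth of the basis size with polynomial order can be checked with a short sketch. It assumes the standard total-degree truncation, for which the number of terms is (M + p)! / (M! p!); some texts denote this count P + 1 rather than P. The function name is hypothetical.

```python
from math import comb


def pce_term_count(num_vars, order):
    """Number of basis terms for a total-degree truncation: C(M + p, p)."""
    return comb(num_vars + order, order)


# Five inputs, as in the full model; three inputs, as after reduction.
for m in (5, 3):
    for p in (1, 2, 3):
        print(f"M = {m}, p = {p}: {pce_term_count(m, p)} terms")
```

Under this assumption, five inputs give 6 terms at order 1, 21 at order 2, and 56 at order 3, so an order-3 model would have more unknown coefficients than a group of 30 samples can determine by least squares.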
By projecting the expanded random parameters onto the expansion coefficients, the response can be expressed as:
The above Eq. (29) can be abbreviated as:
Based on least-squares regression, the coefficient matrix of the PCE can be estimated as:
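For reference, the ordinary least-squares estimator has the standard closed form below. The notation is assumed: \(\Psi\) is the N × P matrix of basis functions evaluated at the N experimental design points and \(\mathbf{Y}\) is the vector of observed responses.

```latex
\hat{\mathbf{a}} = \left( \Psi^{T} \Psi \right)^{-1} \Psi^{T} \mathbf{Y},
\qquad
\Psi_{ij} = \psi_j\!\left( \xi^{(i)} \right),
\quad i = 1, \ldots, N, \quad j = 0, \ldots, P - 1
```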
Least squares was chosen over alternatives such as singular value decomposition (SVD) because the design matrix is full rank, the computational cost is low, and the added complexity of SVD is avoided.
Sobol sensitivity indices and error quantification
Variance-based global sensitivity indices can be obtained by post-processing the PCE coefficients. The formula for the main Sobol index is:
In Eq. (32):
The formula for the total Sobol index is:
And
A higher value of Si indicates a greater individual contribution of input variable i to the variability of the model output. In contrast, a higher value of STi reflects not only the main effect of variable i but also its interaction effects with other input variables. For any input variable i, the following inequality always holds: 0 ≤ Si ≤ STi ≤ 1.
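The post-processing step can be sketched as follows, assuming an orthonormal basis so that the output variance equals the sum of squared coefficients of all non-constant terms. The main index of variable i sums the terms that depend on i alone, and the total index sums every term in which i appears. The function name, multi-indices, and coefficients are hypothetical.

```python
def sobol_from_pce(multi_indices, coeffs):
    """Main and total Sobol indices from an orthonormal PCE.

    multi_indices[j][i] is the polynomial degree of variable i in term j;
    coeffs[j] is the matching expansion coefficient.
    """
    num_vars = len(multi_indices[0])
    terms = list(zip(multi_indices, coeffs))
    # Variance: squared coefficients of all terms except the constant one.
    variance = sum(c * c for alpha, c in terms if any(alpha))
    main, total = [], []
    for i in range(num_vars):
        only_i = sum(c * c for alpha, c in terms
                     if alpha[i] > 0
                     and all(a == 0 for j, a in enumerate(alpha) if j != i))
        with_i = sum(c * c for alpha, c in terms if alpha[i] > 0)
        main.append(only_i / variance)
        total.append(with_i / variance)
    return main, total


# Hypothetical two-variable expansion with one interaction term.
alphas = [(0, 0), (1, 0), (0, 1), (1, 1)]
coefs = [10.0, 2.0, 1.0, 1.0]
s_main, s_total = sobol_from_pce(alphas, coefs)
print(s_main, s_total)
```

In this example the interaction term is counted in the total index of both variables, so each total index exceeds the corresponding main index, consistent with the inequality above.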
The normalized empirical error εemp provides an estimate of the generalization error by quantifying how accurately the surrogate model reproduces the outputs of the original model at the experimental design points. It is calculated using the following expression:
Here, \(\hat{\mu}_Y\) is the sample mean of the experimental design responses.
Although the estimator defined in Eq. (37) offers low computational cost, it is prone to overfitting. This is because it behaves as a monotonically decreasing function of the polynomial order p, irrespective of the size of the experimental design. To address this limitation, the leave-one-out cross-validation (LOOCV) error, denoted εLOOCV, is employed. This metric aims to mitigate overfitting by leveraging cross-validation techniques rooted in statistical learning theory38. The LOOCV procedure involves constructing N surrogate models \(M^{PC\backslash i}\), each trained on a reduced experimental design X \ x(i) = {x(j), j = 1,…, N, j ≠ i}, obtained by omitting the ith sample. The prediction of \(M^{PC\backslash i}\) at the excluded input x(i) is then compared with the true output value y(i). The LOOCV error can be expressed as:
In practical applications, once the results of least squares minimization are obtained, it is not necessary to explicitly construct N independent surrogate models. Instead, the leave-one-out error can be efficiently calculated using the following formula:
Here, hi is the ith component of the vector given by the following formula:
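The equivalence between the shortcut and the explicit procedure can be checked numerically. The sketch below assumes that hi is the ith diagonal entry of the projection matrix Ψ(ΨᵀΨ)⁻¹Ψᵀ and that the error is normalized by the sum of squared deviations of the responses from their mean. It uses NumPy on synthetic data; all names and values are hypothetical.

```python
import numpy as np


def loo_error(psi, y):
    """Normalized leave-one-out error from a single least-squares fit."""
    coeffs, *_ = np.linalg.lstsq(psi, y, rcond=None)
    residuals = y - psi @ coeffs
    # h_i: diagonal of the hat matrix psi (psi^T psi)^-1 psi^T = psi psi^+.
    hat_diag = np.einsum("ij,ji->i", psi, np.linalg.pinv(psi))
    loo_residuals = residuals / (1.0 - hat_diag)
    return np.sum(loo_residuals ** 2) / np.sum((y - y.mean()) ** 2)


def loo_error_brute_force(psi, y):
    """Same quantity, obtained by refitting N times with one sample removed."""
    n = len(y)
    squared = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        c, *_ = np.linalg.lstsq(psi[keep], y[keep], rcond=None)
        squared += (y[i] - psi[i] @ c) ** 2
    return squared / np.sum((y - y.mean()) ** 2)


# Synthetic first-order example: 30 samples, two standardized inputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
psi = np.column_stack([np.ones(30), x[:, 0], x[:, 1]])
y = 1.0 + 2.0 * x[:, 0] - x[:, 1] + 0.1 * rng.normal(size=30)
print(loo_error(psi, y), loo_error_brute_force(psi, y))
```

The two values agree to rounding error for a full-rank design matrix, which is why the N separate fits are unnecessary in practice.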
Parameter distributions and settings
In this study, a Polynomial Chaos Expansion (PCE) model was developed using MATLAB in conjunction with the UQLab uncertainty quantification framework to evaluate the sensitivity of multiple variables influencing the performance of self-healing cementitious materials. The experimental dataset comprised 90 samples, organized into three groups based on curing age (7, 14, and 28 d), with 30 samples per group. Five input variables are considered: crack surface repair rate (a), crack water seepage repair rate (k), crack resistivity-based repair rate (r), crack ultrasonic velocity repair rate (v), and crack anti-chloride repair rate (m). The output variable was defined as the crack cross-sectional repair rate (f). f is a comprehensive output variable predicted via the PCE model from the five healing indicators (a, k, r, v, m), representing the overall repair efficiency as a percentage, and is not derived as a time derivative of any individual indicator.
For each curing age group, the input variables were modeled as Gaussian random variables, with distribution parameters estimated from the sample means and standard deviations. PCE models were constructed using ordinary least squares (OLS) regression, employing Hermite polynomials as basis functions to capture nonlinear relationships between inputs and the output. According to Eq. (28), the number of PCE coefficients P depends on the number of variables M and the selected polynomial order p. Given the five input parameters and a sample size of 30 samples per group, polynomial orders of 1 and 2 were adopted to ensure solution convergence and computational stability.
To enhance model interpretability, only important terms—those with absolute coefficient values greater than 0.001—were retained from the PCE expansion to derive physically meaningful analytical expressions. Sobol global sensitivity analysis was subsequently performed using 1000 Monte Carlo samples to compute the first-order sensitivity indices (Si) and total effect indices (STi), thereby quantifying the influence of each parameter on the model output. The Sobol indices were primarily computed directly from the PCE expansion coefficients following the analytical formulation by Sudret39, while Monte Carlo simulation-based estimation was concurrently employed as a supplementary measure to ensure robustness and facilitate comparability with alternative sensitivity analysis methodologies. Model validation was conducted by plotting predicted versus observed values and visually assessing prediction accuracy with respect to the ideal 1:1 reference line. This integrated data-driven approach, combining PCE-based surrogate modeling with rigorous uncertainty quantification, provides a robust framework for sensitivity analysis in complex, multivariable material systems.
Results and Discussion
Multidimensional statistical characterization of repair effects
Figure 2 shows the actual effect of crack repair and the statistical characterization results of the multidimensional crack repair performance. In this study, the term “multidimensional” refers to the use of multiple independent physical and engineering metrics to characterize crack healing, forming a multi-index feature space for comprehensive assessment. As summarized in Table 3, the data for each curing age group exhibit approximate normal distributions, in line with experimental expectations.
a Initial surface conditions of samples. b Temporal evolution of crack healing visualized through schematic progression and cross-sectional micrographs. c Statistical assessment of healing efficiency, presented as the evolving distribution of repair rates over time. The evaluated healing indices include crack surface repair rate (a), crack water seepage repair rate (k), crack resistivity-based repair rate (r), crack ultrasonic velocity repair rate (v), and crack anti-chloride repair rate (m).
With respect to curing age, the crack surface repair rate ranges from 62% to 68% at 7 d. This value increases to over 90% at 14 d, reaching a peak of 94.68%. By 28 d, surface cracks are nearly completely repaired, with rates clustered around 97%. These results demonstrate the early-stage efficacy of the healing agent and suggest that the majority of surface cracks are essentially repaired by day 14.
Upon cracking, water rapidly infiltrates the openings. Wider cracks facilitate faster seepage; however, as healing progresses, repair products fill the cracks and reduce permeability. Consequently, the water seepage resistance recovery rate serves as an indirect indicator of internal healing. At 7 d, this rate exceeds 66%, increasing by approximately 20% by day 14 and stabilizing near 90%. Between 14 and 28 days, the rate improves by only about 5%, confirming that the healing agent reaches optimal functionality by 14 days. Subsequent improvements are primarily attributed to continued microbial mineralization that densifies the filled regions.
In the early stages of healing, crack surfaces are in point contact. As healing progresses, the accumulation of repair products expands the contact area, transitioning from point to surface contact, thereby notably lowering electrical resistivity. The resistivity recovery rate, therefore, provides a direct measure of surface-level healing. At 7 d, the rate averages around 47%, which is lower than the corresponding surface and seepage recovery rates due to the limited connectivity of early-stage internal products. It increases to approximately 58% at 14 d and stabilizes between 68% and 72% at 28 d.
Cracks impede the propagation of ultrasonic waves, reducing wave velocity. As mineralized products bridge the cracks, ultrasonic transmission improves. The ultrasonic velocity recovery rates are approximately 36%–42% at 7 d, 57%–61% at 14 d, and 65%–70% at 28 d. These trends closely align with those observed in resistivity recovery, supporting their combined use as indicators of internal crack healing.
In the accelerated chloride ion penetration test, an applied electric field drives ion migration, enabling quantification of permeability reduction due to healing. The chloride resistance recovery rate is approximately 53% at 7 d, increases by 12% at 14 d, and reaches 77% at 28 d. This index exhibits slightly higher values than resistivity and ultrasonic recovery, suggesting it is less sensitive to variations in internal microstructure during healing.
Crack cross-sectional repair rate, which directly reflects internal healing product distribution, requires destructive testing and is thus not reproducible. At 7 d, the internal product distribution rate is approximately 50%, rising to 70% at 14 d and improving by only 8% by 28 d. This indicates that later-stage healing primarily results from the densification of existing repair products through microbial mineralization.
Table 4 demonstrates that the relative standard deviation (RSD) for all sample groups remains below 3%, confirming the high repeatability and low variability of the experimental outcomes. This consistency is attributed to our stringent specimen preparation protocols, including the use of identical molds for all specimens and tightly controlled environmental conditions during curing. Furthermore, the adoption of the crack surface repair rate rather than absolute depth measurements effectively eliminated biases arising from dimensional variations across specimens. These statistical indicators, including the low RSD values and the mean ± standard deviation reporting, enhance the reliability of the PCE model and support its use for full-cycle predictions of crack healing behavior.
In summary, evaluating the healing agent’s performance requires consideration of both surface healing and internal recovery. While surface healing can be readily observed and quantified, traditional internal evaluation methods often lack precision and fail to correlate strongly with actual crack depth. Advanced imaging techniques, such as X-ray computed tomography (X-CT), offer improved accuracy but are limited by high cost and specimen size constraints. By integrating multidimensional characterization indicators with the PCE model, this study provides a robust and efficient method for predicting internal crack healing performance, thereby enhancing the practical applicability of self-healing concrete technologies.
Uncertainty quantification based on first-order PCE model analysis
Sensitivity and performance indicators
Figure 3a and b present the coefficient distributions of the PCE models and the corresponding sensitivity analysis results across three curing ages. At 7 d, the parameter k exhibits a clearly dominant influence on the output, with a first-order Sobol index of 0.7904, accounting for 79.04% of the output variance. In contrast, the parameter m shows negligible sensitivity (Si = 0.0004). These findings suggest that, in the early stage of curing, repair performance is primarily governed by the recovery of water seepage resistance, while chloride resistance plays a minimal role. As shown in Table 5, the first-order PCE model’s LOOCV error remains relatively high at 7 d (LOOCV = 1.5654); however, the low coefficient of variation in the prediction mean implies a rapidly evolving repair system. This may involve nonlinear interactions among parameters that are not fully captured by the first-order PCE model.
First-order PCE model results showing a the distribution of polynomial coefficients, b the main Sobol index (Si) and the total Sobol index (STi) for each input variable (where a, k, r, v, m represent crack surface repair rate, crack water seepage repair rate, crack resistivity-based repair rate, crack ultrasonic velocity repair rate, and crack anti-chloride repair rate, respectively), and c a parity plot comparing model predicted values (Pv) against true measured repair rates (Tv). d Comparison of prediction errors between the first-order model and a benchmark method at curing times of 7, 14, and 28 days. e Result of dimensionality reduction for the second-order PCE model based on minimizing the Leave-One-Out Cross-Validation (LOOCV) error. Second-order model results showing f Sobol sensitivity indices (where a, k, m represent crack surface repair rate, crack water seepage repair rate, and crack anti-chloride repair rate, respectively) and g the distribution of polynomial coefficients. h Enhanced comparison of prediction performance between the second-order and first-order models.
At 14 d, m becomes more influential, with its Si rising to 0.5446. Simultaneously, the sensitivity of parameter v increases substantially to 37.47%, highlighting the increasing importance of internal densification in the healing process. Notably, the influence of k drops sharply to 2.27%, reflecting the time-dependent nature of water seepage resistance mechanisms. The model’s predictive performance improves markedly at this stage, with the LOOCV error decreasing to 0.0647.
By 28 d, m demonstrates a clear dominant effect, with an Si accounting for 61.67% of the output variance. Parameters k and a contribute 18.66% and 15.98%, respectively, while r and v exhibit sensitivities below 3.3%. At this stage, the model achieves high predictive accuracy, with an average predicted value of 77.8423 and a coefficient of variation of just 1.279%. These results indicate that the repair system has reached a stable and mature state.
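The link between the PCE coefficients and the reported indices can be illustrated with a minimal sketch. It assumes an orthonormal polynomial basis, in which case each linear term contributes its squared coefficient to the output variance and, for a first-order expansion, Si and STi coincide. The function name and the coefficient values are hypothetical placeholders, not the values plotted in Fig. 3a.

```python
import numpy as np

def first_order_sobol(coeffs):
    """Sobol indices for a first-order PCE with an orthonormal basis.

    coeffs: coefficients of the linear terms only; the constant term
    (the output mean) carries no variance and is excluded.
    """
    c = np.asarray(coeffs, dtype=float)
    partial_var = c ** 2                  # variance carried by each input
    return partial_var / partial_var.sum()

# Hypothetical linear coefficients for inputs a, k, r, v, m (illustrative only)
names = ["a", "k", "r", "v", "m"]
indices = first_order_sobol([0.4, 1.2, 0.2, 0.3, 0.1])
for name, s in zip(names, indices):
    print(f"S_{name} = {s:.4f}")
```

Because the indices are normalized by the total variance, they sum to one, which is why a rise in one parameter's share at a later age necessarily lowers the shares of the others.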
While autogenous healing from continued cement hydration undoubtedly occurs under wet-dry cycles, the designed healing agent dominates the repair process, especially in the deeper crack regions. This is substantiated by the strong performance and high sensitivity of the internal recovery indicators, which exhibit a synergistic enhancement beyond what is typically achievable by autogenous healing alone. The PCE model’s reliance on these internal parameters further confirms the predominant role of the autonomous healing mechanism in the overall recovery performance.
Comparison between predicted and measured values
Figure 3d compares the PCE model predictions (Pv) with the experimentally measured values (Tv) across the three curing ages. The results demonstrate a clear trend of improving predictive performance with increased curing time. At the early curing stage (7 d), the model shows notable limitations. For instance, a notable underestimation is observed for Tv/Pv = 52.85/49.88. Systematic bias is also evident within the 48–51 range, such as an overprediction of 1.14 for Tv = 48.54 and an underprediction of 1.91 for Tv = 51.99. The error distribution at this stage displays a distinct “bimodal” pattern: predictions for low-strength samples (<49) are generally overestimated by 1.2–1.5, while those for high-strength samples (>51) are underestimated by 1.8–2.3. These discrepancies suggest that the model struggles to capture the nonlinear and highly variable behavior characteristic of early-stage repair. The inclusion of higher-order polynomial terms might enhance prediction accuracy in this phase.
By 14 d, the model’s predictive performance stabilizes, with only minor deviations observed. In the central range of 69–72, representative samples such as Tv/Pv = 72.09/71.92 (error = 0.17) and Tv/Pv = 71.70/71.56 (error = 0.14) demonstrate a high level of agreement between predictions and experimental values, confirming the model’s reliability in capturing mid-stage healing behavior. At 28 d, the model predictions align closely with the measured values. For instance, Tv/Pv = 79.26/79.61 (error = 0.35) and Tv/Pv = 76.67/77.17 (error = 0.50) indicate that most prediction errors fall within ±1.0. Only a few samples exhibit slightly larger discrepancies, such as Tv/Pv = 77.29/78.58 (error = 1.29). The overall mean absolute error at this stage is approximately 0.53, reflecting the model’s strong capacity to characterize material performance during the mature healing phase.
Notably, all three datasets exhibit reduced prediction accuracy at the extremes of distribution. For example, at 28 d, the maximum measured value (Tv = 80.02) is underestimated by 0.52, while the lowest value at 7 d (Tv = 48.44) is overestimated by 1.8. These edge-case discrepancies may arise from underrepresentation of extreme values in the training data or from unmodeled multi-factor coupling effects, particularly during early-stage repair. The degradation in model predictive accuracy at earlier ages is consistent with the increasing complexity and stochasticity of initial self-healing mechanisms, highlighting the potential value of incorporating multi-physics coupling and higher-order PCE terms in future model enhancements.
To comprehensively evaluate the predictive performance and robustness of the model, three cross-validation methods were employed: LOOCV, 5-Fold Cross-Validation (5-Fold CV), and 10-Fold Cross-Validation (10-Fold CV). The Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) across different curing ages (7 d, 14 d, 28 d) are summarized in Fig. 3c. The results demonstrate a clear time dependence in model performance. Specifically, both MAE and RMSE exhibit a systematic decreasing trend as the curing age increases. For instance, from 7 d to 28 d, the RMSE values across all validation methods decreased by approximately 60–70%. This trend aligns strongly with the physical principles of the repair process: during the early stage (7 d), the healing reactions are highly dynamic and non-equilibrium, leading to greater data variability and increased prediction uncertainty. As the curing progresses, the repair products accumulate and stabilize, and the dominant healing mechanisms become more pronounced, resulting in a notable improvement in model prediction accuracy.
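The three validation schemes differ only in how the samples are partitioned, as the following minimal sketch shows. It assumes a first-order surrogate fitted by ordinary least squares on standardized inputs, which may differ from the regression procedure actually used; the function name and the synthetic data are hypothetical stand-ins for one curing-age group.

```python
import numpy as np

def kfold_errors(X, y, n_folds, seed=0):
    """Return (MAE, RMSE) of a least-squares fit under k-fold cross-validation.

    Setting n_folds == len(y) gives leave-one-out cross-validation.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    A = np.column_stack([np.ones(n), X])      # constant + linear terms
    residuals = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False               # hold out the current fold
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        residuals[test_idx] = y[test_idx] - A[test_idx] @ coef
    return np.mean(np.abs(residuals)), np.sqrt(np.mean(residuals ** 2))

# Synthetic stand-in: 30 samples, 5 standardized inputs (not the measured data)
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 5))
y = 50.0 + X @ np.array([0.4, 1.2, 0.2, 0.3, 0.1]) + 0.5 * rng.standard_normal(30)

for label, k in [("LOOCV", 30), ("10-fold", 10), ("5-fold", 5)]:
    mae, rmse = kfold_errors(X, y, k)
    print(f"{label}: MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Every sample is predicted exactly once by a model that never saw it, so the pooled residuals give out-of-sample MAE and RMSE that are directly comparable across the three schemes.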
Uncertainty quantification via higher-order PCE based on model analysis
Model simplification through dimensionality reduction
To enhance model accuracy, the polynomial order of the PCE model was increased to second-order, thereby expanding the number of basis functions to 21. However, due to the limited dataset size—only 30 samples per curing group—this increase in model complexity led to a substantial rise in LOOCV errors, indicating pronounced overfitting. To address this issue, a dimensionality reduction strategy was employed by integrating first-order sensitivity analysis with a mechanistic understanding of each variable’s contribution to the crack repair process. As illustrated in Fig. 2c, the parameter r consistently exhibited a negligible influence on the model output across all curing ages. Parameters v and m both relate to internal crack healing mechanisms. However, based on their cumulative sensitivity contributions, v was identified as less critical and excluded from further modeling. Consequently, three key input parameters a, k, and m were retained to reconstruct a simplified second-order PCE model, reducing the number of basis functions from 21 to 10.
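The basis counts quoted above follow from the size of a total-degree polynomial space. The short sketch below assumes the standard total-degree truncation, which reproduces the 21 and 10 terms stated in the text; the function name is illustrative.

```python
from math import comb

def n_basis_terms(n_inputs, order):
    """Number of terms in a total-degree PCE: C(n_inputs + order, order)."""
    return comb(n_inputs + order, order)

full = n_basis_terms(5, 2)       # a, k, r, v, m at second order
reduced = n_basis_terms(3, 2)    # a, k, m at second order
print(f"five inputs:  {full} terms, {30 / full:.2f} samples per term")
print(f"three inputs: {reduced} terms, {30 / reduced:.2f} samples per term")
```

With 30 samples per curing group, the five-variable second-order model has fewer than 1.5 samples per coefficient, whereas the reduced model has 3, which is consistent with the overfitting described for the full model.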
In comparison with the original five-variable model, this reduced model improved predictive performance by lowering the LOOCV errors. The original second-order PCE model yielded LOOCV errors of 11.51 for the 7-day dataset and 11.08 for the 28-day dataset, values indicative of overfitting due to an imbalance between model complexity (21 basis terms) and sample size. After excluding the low-sensitivity variables r and v, the simplified model achieved LOOCV errors of 2.23 and 3.68 for the same datasets, corresponding to reductions of 80.6% and 66.8%, respectively, as shown in Fig. 3e.
These results underscore the engineering feasibility and effectiveness of applying dimensionality reduction in PCE modeling. The simplified model not only aligns more closely with the available data but also provides a more robust and reliable framework for multi-stage prediction of crack healing performance in self-healing cementitious materials.
Second-order sensitivity and performance metrics
This section presents the enhanced interpretability of concrete crack repair mechanisms achieved through a second-order PCE model based on three key variables: m, k, and a. In the 28-day model, although the LOOCV error increased slightly from 0.54 to 3.68, the total Sobol index STi for m rose substantially from 61.67% to 87.26%. Moreover, the interaction effect between a and m reached a high coefficient of 0.8999, revealing strong synergistic behavior during the late-stage repair process. In the 7-day model, k retained a dominant first-order Si of 0.4609, and its interaction with a reached 0.2854, which underscores the critical role of water seepage resistance in early-stage healing. These outcomes are consistent with experimental observations: during the initial repair stages, k consistently exhibited the highest STi, and its interaction with a highlighted the practical significance of surface-seepage coupling effects. Similarly, the elevated STi of m at 28 d validated the model’s capacity to accurately capture the dominant mechanisms governing long-term repair behavior.
Figure 3f–h show the coefficient distributions and Sobol indices of the second-order PCE model. Compared to the five-variable first-order PCE model, the simplified three-variable second-order formulation offers a more favorable balance between model complexity and limited sample size while preserving predictive accuracy. As shown in Table 5, the 14-day second-order PCE model achieved a low LOOCV error of 0.0952 and a coefficient of variation of 3.896%, both within acceptable bounds. At this intermediate curing stage, both m and a demonstrated substantial total Sobol effects (STi = 0.4551 and 0.5103, respectively), indicating notable synergistic contributions to the repair process. Additionally, the explicit inclusion of the k × a interaction term in the 7-day model effectively addressed the mechanistic underrepresentation in the first-order formulation, while ensuring model stability, as evidenced by CV values of 2.50% at 7 d and 1.97% at 28 d.
Overall, the second-order PCE model improves both the predictive robustness and mechanistic interpretability of the self-healing process. The distribution of Sobol indices aligns well with observed crack repair dynamics, confirming the model’s reliability and practical relevance. This framework is particularly well-suited for scenarios involving limited experimental data, where understanding the interactions among key repair factors is essential for performance optimization and material design.
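The difference between Si and STi once interaction terms are present can be sketched as follows. The example assumes an orthonormal basis indexed by multi-indices, so that the variance is the sum of squared non-constant coefficients. The function name and the coefficient values are hypothetical and are not taken from Fig. 3g.

```python
import numpy as np

def sobol_from_pce(multi_indices, coeffs):
    """First-order and total Sobol indices from an orthonormal PCE.

    multi_indices: (n_terms, n_inputs) polynomial degree of each input per term.
    coeffs: (n_terms,) coefficients; the all-zero row is the mean term.
    """
    alpha = np.asarray(multi_indices)
    c2 = np.asarray(coeffs, dtype=float) ** 2
    active = alpha > 0                          # which inputs each term involves
    total_var = c2[active.any(axis=1)].sum()    # exclude the constant term
    n_inputs = alpha.shape[1]
    s_first = np.empty(n_inputs)
    s_total = np.empty(n_inputs)
    for i in range(n_inputs):
        only_i = active[:, i] & (active.sum(axis=1) == 1)
        s_first[i] = c2[only_i].sum() / total_var        # terms in input i alone
        s_total[i] = c2[active[:, i]].sum() / total_var  # every term involving i
    return s_first, s_total

# Three inputs (a, k, m), total degree <= 2: 10 basis terms
alpha = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [2, 0, 0],
         [0, 2, 0], [0, 0, 2], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
coeffs = [50.0, 0.5, 1.0, 0.2, 0.1, 0.3, 0.05, 0.6, 0.1, 0.05]  # hypothetical
s_first, s_total = sobol_from_pce(alpha, coeffs)
for name, s1, st in zip("akm", s_first, s_total):
    print(f"{name}: Si = {s1:.4f}, STi = {st:.4f}")
```

The gap between STi and Si for an input is the variance share of the interaction terms it takes part in, which is how a k × a coupling shows up in the second-order model but not in the first-order one.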
Prediction reliability of the second-order PCE model
The second-order PCE model, constructed using three variables—m, k, and a—and refined through dimensionality reduction from the original five-variable formulation, exhibits multidimensional advantages in predictive performance. As shown in Fig. 3(f), targeted feature selection and parameter restructuring reduce the MAE in 28-day strength predictions from 0.4430 to 0.2664, marking a 40.9% improvement in predictive accuracy. Notably, in the high-strength regime, the inclusion of second-order terms enables the model to better capture the nonlinear decay behavior of the repair agent, resulting in an RMSE reduction from 0.5792 to 0.3572. This enhancement is primarily attributed to the exclusion of low-sensitivity variables prone to multicollinearity and the reinforcement of dominant interaction effects through second-order polynomial terms.
Consistent performance improvements were observed across all curing ages. The most pronounced improvement occurred at 7 d, where the MAE decreased by 20.4%, driven by the model’s improved representation of the interaction between k and a. Although the accuracy gains at 14 and 28 d were comparatively moderate, the steady increase in R2 values across these stages indicates a progressively stronger capacity to explain variance in the experimental data.
Importantly, the second-order model not only delivers superior prediction accuracy but also reduces the dimensionality of the input space. This confirms the efficacy of the dimensionality reduction strategy in maintaining model fidelity while mitigating the risk of overfitting, particularly in the context of limited data availability. The approach offers practical advantages in engineering applications where data collection is constrained by time, cost, logistics, or technical limitations. By balancing accuracy, interpretability, and computational efficiency, the proposed model serves as a robust and scalable tool for predicting the performance of self-healing cementitious materials across different curing stages.
Model construction and evaluation based on aPCE
Distribution-aware basis construction via Stieltjes polynomials
The PCE models discussed in the preceding sections were constructed and validated for specific curing ages individually. To develop a generalized predictive model with enhanced extrapolation capability across the entire healing process, a comprehensive dataset integrating samples from all curing ages (7 d, 14 d, and 28 d) was established. It is important to note that the Gaussian assumption in Table 2 is applied to the data within each individual curing age group (7 d, 14 d, 28 d), where the low coefficient of variation justifies its use for age-specific PCE models. In contrast, the non-parametric distributions depicted in Fig. 4a are derived from the combined dataset pooling all aging stages, which captures the evolution of parameter values over time and thus exhibits more complex, non-Gaussian shapes. This distinction is fundamental: the former supports static, age-wise analysis, while the latter enables the development of a unified, dynamic prediction model (aPCE) valid across the entire healing cycle.
a Characterization of input variables and polynomial basis. The top panel shows the kernel density estimation (KDE) plots showing the probability density function (PDF) of each input variable, and the bottom panel displays Stieltjes orthogonal polynomials derived from the empirical distributions. b Age-specific prediction verification. Parity plots compare model predictions (Pv) against experimentally measured true values (Tv) across 7, 14, and 28 curing days. Blue markers denote Pv-Tv pairs; yellow/blue bars indicate sample count distributions; red dashed line represents ideal 1:1 correspondence. c Comprehensive comparison of predicted versus actual results across the entire curing timeline. Dataset-wide comparison of Pv and Tv, including a ± 20% error band (shaded) and the ideal fit line. d Model performance evaluated using multiple statistical metrics (where a, k, r, v, m represent crack surface repair rate, crack water seepage repair rate, crack resistivity-based repair rate, crack ultrasonic velocity repair rate, and crack anti-chloride repair rate, respectively). e Global sensitivity analysis through the Sobol index. The main Sobol index (Si) represents individual contributions to output variance; the total Sobol index (STi) includes interaction effects.
Kernel Density Estimation (KDE) was employed to accurately characterize the empirical distributions of input variables in a non-parametric, data-driven manner. As illustrated in Fig. 4a, the distribution of variable a (x = 0.7, PDF = 1.18) exhibits a sharply peaked density, delineating a distinct critical influence zone and effectively avoiding the systemic bias typically introduced by assuming a Gaussian prior. In contrast, variable r (x = 0.15, PDF = 0.28) displays a strongly right-skewed distribution, concentrated in the low-value region, revealing its heightened sensitivity to outliers. Variables k (x = 0.69, PDF = 0.98) and v (x = 0.64, PDF = 0.4) present broad-peaked distributions, indicating relatively uniform influence across their respective domains. Meanwhile, variable m (x = 0.28, PDF = 0.2) approximates a uniform distribution, reaffirming its marginal contribution in first-order sensitivity terms.
The principal strength of KDE lies in its data-driven adaptability, which facilitates the derivation of realistic probability density functions for constructing customized polynomial bases via the Stieltjes procedure. These orthogonal polynomials are tailored to the empirical input distributions, thereby enabling optimal representation of the system’s nonlinear characteristics. For example, the polynomial basis derived for the right-skewed r variable—whose mode is concentrated near x = 0.15—assigns greater weight to the 0-0.3 interval, enabling more accurate modeling of localized sensitivities. Similarly, the peaked distribution of a leads to basis functions that are highly responsive around x = 0.7, effectively addressing the mismatches encountered when employing classical Hermite polynomials under Gaussian assumptions. For variables such as k, v, and m, which exhibit flatter or near-uniform distributions, the Stieltjes method naturally yields low-oscillation polynomial bases, minimizing overfitting while maintaining numerical stability.
The core advantage of this methodology lies in its distribution-adaptive construction of polynomial bases, which substantially improves prediction accuracy—as evidenced by a reduced LOOCV error of just 1.8%—while preserving the orthogonality required for rigorous Sobol sensitivity index calculations. By integrating KDE to extract the true empirical distributions with the Stieltjes procedure to generate aligned basis functions, this approach effectively eliminates systemic biases associated with traditional PCE frameworks constrained by fixed distributional assumptions. The resulting univariate Stieltjes polynomials for each input variable are detailed in Eqs. (41)–(45).
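The KDE-plus-Stieltjes construction described above can be sketched for a single input as follows. This is a minimal illustration under stated assumptions: the density is a Gaussian KDE with a normal-reference rule-of-thumb bandwidth, the measure is discretized on a uniform grid, and the sample is a synthetic right-skewed draw standing in for a variable such as r. All function names are hypothetical, and the polynomials produced here are not those of Eqs. (41)–(45).

```python
import numpy as np

def gaussian_kde_on_grid(samples, grid):
    """Gaussian kernel density estimate on a grid (rule-of-thumb bandwidth)."""
    samples = np.asarray(samples, dtype=float)
    h = 1.06 * samples.std(ddof=1) * len(samples) ** (-0.2)
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

def stieltjes(grid, weights, degree):
    """Monic orthogonal polynomials for a discretized measure, evaluated on the grid.

    Three-term recurrence: p_{k+1} = (x - alpha_k) p_k - beta_k p_{k-1}.
    Returns an array of shape (degree + 1, len(grid)).
    """
    w = weights / weights.sum()
    polys = [np.ones_like(grid)]
    prev = np.zeros_like(grid)
    norm_prev = 1.0
    for k in range(degree):
        p = polys[-1]
        norm = np.sum(w * p * p)                      # <p_k, p_k>
        alpha_k = np.sum(w * grid * p * p) / norm     # <x p_k, p_k> / <p_k, p_k>
        beta_k = norm / norm_prev if k > 0 else 0.0   # <p_k, p_k> / <p_{k-1}, p_{k-1}>
        polys.append((grid - alpha_k) * p - beta_k * prev)
        prev, norm_prev = p, norm
    return np.array(polys)

rng = np.random.default_rng(0)
samples = rng.beta(2.0, 6.0, size=90)     # synthetic right-skewed input (assumption)
grid = np.linspace(-0.2, 1.2, 2001)
pdf = gaussian_kde_on_grid(samples, grid)
P = stieltjes(grid, pdf, degree=3)

# Gram matrix under the KDE weights: off-diagonal entries should vanish
gram = (P * (pdf / pdf.sum())) @ P.T
print(np.round(gram, 10))
```

Dividing each polynomial by the square root of the corresponding diagonal entry of the Gram matrix gives an orthonormal basis, which is the property that lets Sobol indices be read directly from squared coefficients.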
Sensitivity analysis of aPCE model
As shown in Fig. 4(d), Sobol sensitivity indices derived from the aPCE model identify variable r as the dominant contributor to output variability. This predominance stems from r’s direct representation of the reconstruction of the internal conductive network within cracks—a process fundamentally linked to the integrity of internal self-healing. As the self-healing agent permeates into the concrete crack and chemically interacts with the matrix, the continuity of the crack is gradually restored. The reformation of conductive pathways leads to a substantial reduction in electrical resistivity, which exhibits an approximately linear relationship with the extent of healing. Consequently, r not only displays the highest Sobol index but also possesses the largest linear coefficient in the aPCE model, notably surpassing those of other input variables. In contrast, variables such as a, k, and m primarily characterize surface-level sealing phenomena, including the formation of thin films or shallow infill layers on the concrete surface. While such superficial healing processes can temporarily enhance impermeability, they are insufficient for long-term structural durability if internal restoration remains incomplete. Surface layers are particularly vulnerable to secondary cracking caused by stress concentration or environmental degradation. Therefore, the pronounced influence of r underscores the critical role of internal healing in determining both mechanical resilience and long-term structural stability.
Nevertheless, the interaction between surface-level indicators and internal healing retains substantial engineering relevance. For instance, a high value of a in conjunction with elevated r suggests that the healing agent has achieved gradient restoration from the surface to the interior, thereby mitigating the risk of “pseudo-sealing” due to unhealed internal sections. Similarly, moderate improvements in k and m, while limited in their individual contributions, may serve as supporting evidence of multi-scale healing performance. This is particularly applicable in systems utilizing microbial agents that simultaneously promote internal crack repair and generate surface byproducts that reduce porosity and enhance impermeability. Moreover, surface repair indicators offer practical advantages for rapid on-site assessment. For example, variable a can be evaluated through visual inspection, while k can be measured using water permeability tests, both serving as preliminary screening tools for internal healing potential. Ultimately, r, measured via electrical resistivity, provides the definitive metric for validating internal self-healing efficacy. The integration of surface and internal indicators thus enhances the efficiency, accuracy, and reliability of performance assessments.
The relatively low sensitivity of the model to surface variables suggests that, under current experimental conditions, surface healing may have already reached an optimization threshold, with most effects concentrated in the early stages of the repair process. This observation underscores the importance of developing coordinated healing strategies that balance surface and internal effects. Future research should explore optimized agent loading ratios, tailored reaction kinetics, and the use of slow-release carriers to facilitate a controlled transition from superficial sealing to deep, sustained internal repair. In summary, aPCE-based analysis effectively identifies critical bottlenecks in internal healing performance while leveraging complementary surface indicators to construct a comprehensive, multi-dimensional framework for evaluating self-healing behavior in cementitious materials.
Predictive performance assessment of the aPCE framework
As shown in Fig. 4b, the predictive performance of the aPCE model was evaluated using R2, which quantifies the proportion of variance in the Tv explained by Pv. The model achieved an R2 value of 0.9816, indicating that 98.16% of the variance in Tv is captured by Pv, thereby meeting the criteria for high-accuracy predictive models. To further examine model behavior, the dataset was stratified into three age-based intervals—7 d (short-term), 14 d (mid-term), and 28 d (long-term)—corresponding to approximate Tv ranges of 48–52, 68–72, and 70–80, respectively. The analysis revealed that residuals in the high-strength range were minimal and randomly distributed. For example, sample 1 exhibited a residual of only 0.96, suggesting strong local accuracy. In contrast, systematic overestimation was observed in the low-value interval, where predicted values consistently exceeded measured values. Notable instances include sample 61 with a residual of 1.94 and sample 74 with 2.37. This interval contributed 34.2% of the total residual sum of squares (RSS), suggesting that the model may exhibit reduced responsiveness to nonlinear behavior in early-age predictions. These discrepancies could stem from data imbalance or the intrinsic complexity of early-age cement hydration processes.
From a quantitative perspective, the model achieved a root mean square error (RMSE) of 0.5297 and a mean absolute error (MAE) of 0.4709, both indicating low overall prediction bias as shown in Fig. 4e. Despite the wide target range (48.44–80.02), the model maintained a mean absolute error below 0.5, reflecting strong generalization capability. Several samples, including samples 3, 13, and 44, showed absolute errors below 0.2, with sample 44 achieving perfect agreement between Pv and Tv (both 69.45), underscoring the model’s precision in feature learning.
Further qualitative analysis suggests that over 70% of predictions in the low-strength domain exhibited absolute errors below 1.0. No consistent directional bias was detected, indicating an absence of systematic overfitting or underfitting. Nevertheless, the observed tendency toward overestimation in this range highlights a potential area for refinement, particularly through enhancements to the model’s nonlinear expressiveness and improved rebalancing of the dataset.
These findings highlight the necessity of combining global performance metrics with stratified residual analysis to reveal subtle limitations in model behavior. It is crucial to emphasize that the primary advantage of the aPCE framework is not the complete elimination of prediction errors at the early stage. Eliminating such errors is inherently challenging due to high stochasticity. Instead, the framework provides a unified, distribution-adaptive representation across the entire healing cycle. The Stieltjes procedure adaptively constructs orthogonal polynomials directly from the empirical data distributions via KDE. This method creates a unified and data-driven foundation. It intrinsically captures complex morphologies and strong parameter interactions without requiring high-order expansions that risk overfitting. Consequently, this approach shifts the modeling paradigm from ‘piecewise fitting’ for individual ages to ‘global adaptation’ across the full healing cycle. This transition ensures consistent predictive accuracy from the early nonlinear transitions to the later saturation stage. Ultimately, the approach fulfills the core objective of this study by delivering a robust, full-cycle prediction model for the crack cross-sectional repair rate.
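The combination of global metrics and stratified residual analysis described above can be sketched as follows. The function name and the Tv/Pv values are hypothetical and chosen only to mimic the three age-based ranges; they are not the measured data.

```python
import numpy as np

def global_and_stratified(tv, pv, groups):
    """Global R2/RMSE/MAE plus, per group, the RSS share and mean signed error."""
    tv = np.asarray(tv, dtype=float)
    pv = np.asarray(pv, dtype=float)
    groups = np.asarray(groups)
    res = tv - pv
    rss = np.sum(res ** 2)
    overall = {
        "R2": 1.0 - rss / np.sum((tv - tv.mean()) ** 2),
        "RMSE": np.sqrt(np.mean(res ** 2)),
        "MAE": np.mean(np.abs(res)),
    }
    per_group = {}
    for g in np.unique(groups):
        mask = groups == g
        per_group[int(g)] = {
            "rss_share": float(np.sum(res[mask] ** 2) / rss),
            "mean_bias": float(np.mean(pv[mask] - tv[mask])),  # > 0: overestimation
        }
    return overall, per_group

# Hypothetical measured (Tv) and predicted (Pv) repair rates at 7, 14, 28 d
ages = [7, 7, 7, 14, 14, 14, 28, 28, 28]
tv = [48.5, 50.1, 52.0, 69.2, 70.5, 71.9, 76.7, 78.4, 80.0]
pv = [49.9, 51.0, 52.4, 69.3, 70.4, 72.1, 76.9, 78.2, 79.8]
overall, per_group = global_and_stratified(tv, pv, ages)
print(overall)
print(per_group)
```

In this illustrative case the global R2 is high while almost all of the residual sum of squares comes from the 7 d group with a positive mean bias, the kind of localized limitation that a single global metric would hide.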
Experimental verification of the aPCE model
Verification using non-modeled age-group samples
To validate the generalization capability and engineering applicability of the developed aPCE model across different healing stages, three new sets of specimens were additionally introduced, corresponding to 10, 20, and 30 days of repair. Each group contained three specimens, and each specimen was tested in triplicate. As shown in Fig. 5a, the crack surface healing effects were visually observable, reflecting progressive surface restoration.
a Validation of extrapolation capability to unseen curing ages. The panel demonstrates the model’s ability to predict outcomes at unobserved ages. From left to right: actual cross-sectional images showing the physical repair state, a scatter plot of the corresponding test data statistics (where a, k, r, v, m represent crack surface repair rate, crack water seepage repair rate, crack resistivity-based repair rate, crack ultrasonic velocity repair rate, and crack anti-chloride repair rate, respectively), and a line chart with error bars comparing predictions against measurements, quantitatively displaying the prediction accuracy across different ages. b Benchmarking against independent literature data. The model’s generalizability is tested against published experimental results. The left subpanel presents a comparison of key variables via bar charts. The right subpanel shows a scatter plot validating the correlation between model predictions (Pv) and experimentally reported true values (Tv) from independent studies, with a fitted regression line.
The results in Fig. 5a also demonstrate that the model exhibits notable temporal optimization features and strong predictive accuracy across various repair durations. As the repair period lengthens, the model’s predictive performance improves steadily: at the 30-day stage, RMSE decreases to 0.5306 and MAE converges to 0.4713, improvements of 32.9% and 30.7%, respectively, over the 10-day stage. This indicates the model’s robust capability to identify and represent long-term repair behavior. For example, in the high-strength case at 30 d, Sample 7 had a true value of 80.34 and a predicted value of 80.01, an absolute error of only 0.33 (error rate of 0.4%), while Sample 3, with a true value of 80.05 and a predicted value of 79.79, had an error of 0.26, further verifying the model’s high accuracy in representing the repair saturation phase.
The model also demonstrated strong cross-period prediction stability and error control. At the early 10-day stage, despite elevated uncertainty associated with dynamic healing processes, 70% of the samples maintained absolute prediction errors below 1.0. For instance, Sample 3 and Sample 7 had errors below 0.2, reflecting good adaptability to early dynamic repair conditions. By the 20 d stage, the model achieved stable fitting within the mid-to-high value range (71.46–73.61), with Sample 3 showing a prediction error of only 0.04 and Sample 7 an error of 0.62. The analysis of error directions—positive bias in Samples 2 and 9, and negative in Samples 4 and 8—also provides a useful basis for further parameter optimization. Notably, at 30 d, all sample errors remained below 1.0, and key sample prediction error rates dropped below 0.5%, confirming the model’s precision and convergence in the later repair stage.
Overall, the aPCE model exhibits excellent temporal adaptability and stable predictive performance across varying magnitudes of strength development. It consistently maintains error rates below 1.5% within a wide output range (58.27–80.34). The model’s RMSE of 0.5306 supports a prediction deviation within ±1.06 under a 95% confidence interval, meeting the stringent accuracy requirements of engineering applications such as concrete healing rate assessment and durability monitoring. These results comprehensively validate the model’s synergistic strengths in theoretical construction, error management, and practical deployment, establishing a high-accuracy and high-reliability framework for time-resolved repair prediction.
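The ±1.06 bound quoted above is consistent with a two-standard-deviation interval. The worked step below assumes, as the text does not state explicitly, that the residuals are approximately zero-mean and normally distributed, so that the RMSE approximates the residual standard deviation:

```latex
\[
  \Delta_{95\%} \approx 2 \times \mathrm{RMSE} = 2 \times 0.5306 \approx 1.06
\]
```

Using the exact normal quantile of 1.96 instead of 2 would give a marginally tighter bound of about ±1.04.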
Verification using other types of self-healing agents
To demonstrate the generalizability of this proposed method within the domain of self‑healing concrete, three performance indicators are extracted from the literature40: crack surface healing rate (Var1), water permeability recovery rate (Var2), and ultrasonic pulse velocity recovery rate (Output), all measured under the influence of a 4 wt% dosage of the “SHG” healing agent. The raw dataset is summarized in Fig. 5b.
Assuming Gaussian distributions for Var1 and Var2, the first‑order PCE model described in Section 3.2.1 was applied. Figure 5b and “SHG” in Table 5 present the predictive results. The predicted and observed values exhibit a clear banded cluster around the 1:1 reference line, demonstrating the model’s consistency. The coefficient of determination (R² = 0.7317) indicates that 73.17% of the variance in the response is explained by the model, confirming its baseline predictive reliability. While the MAE and RMSE values reveal a degree of systematic bias, application of the empirical “68‑95‑99.7 rule” reveals that most observations fall within one standard deviation of the mean, in line with expectations under normal distribution assumptions. Because a first‑order polynomial expansion was used, the Si and STi indices are identical. Sensitivity analysis highlights Var1 as the dominant input variable, with a Sobol index of 0.7955, indicating that the crack surface healing rate plays a primary role in influencing UPV recovery.
In summary, the model demonstrates substantive predictive efficacy when applied to independent literature-based data, reinforcing its potential applicability in broader contexts. However, limitations remain in its ability to model extreme-value behavior and capture variable interaction effects. Expanding the sample coverage—particularly at distribution boundaries—may enhance both the model’s predictive accuracy and robustness in future implementations.
Summary and Conclusion
This study addresses the complex challenge of quantifying internal crack healing in self-healing cement-based materials by developing a multidimensional prediction framework based on PCE. By establishing a mapping between five key healing indicators and crack parameters, and applying Sobol sensitivity analysis, the model captures the temporal evolution of healing mechanisms. The primary motivation for adopting the Stieltjes orthogonal polynomial-based aPCE framework was to overcome the inherent limitations of classical Wiener-Hermite PCE in modeling the early, highly dynamic, and non-equilibrium stages of the self-healing process, particularly evident at the 7-day curing age. Conventional PCE exhibited systematic prediction biases and a “bimodal” error distribution at this stage, primarily due to its reliance on Gaussian assumptions, which increasingly deviate from the true, complex parameter distributions shaped by the evolving healing process. The Stieltjes procedure, by adaptively constructing orthogonal polynomials directly from the empirical data distributions via KDE, provides a unified, data-driven foundation that intrinsically captures these complex morphologies and strong parameter interactions without requiring high-order expansions that risk overfitting. This approach shifts the modeling paradigm from “piecewise fitting” for individual ages to “global adaptation” across the full healing cycle, ensuring consistent predictive accuracy from the early nonlinear transitions to the later saturation stage, thereby fulfilling the core objective of this study: to deliver a robust, full-cycle prediction model for crack cross-sectional repair rate. The main findings are as follows:
1. Healing performance exhibits strong temporal dependence. Surface healing stabilizes at approximately 97% by 28 d, while internal indicators show a more gradual recovery. Water permeability resistance correlates closely with surface healing, emphasizing the importance of surface sealing in early-stage durability enhancement.
2. The first-order PCE model, combined with Sobol sensitivity analysis, reveals a temporal shift in parameter influence: parameter k dominates at early stages, while m becomes the most influential at 28 d. Predictive accuracy improves with healing age, with R² rising from 0.5236 at 7 d to 0.9523 at 28 d.
3. Guided by the global Sobol sensitivity analysis, the initial five-dimensional first-order PCE model was reduced to a three-dimensional second-order PCE model. This reduction revealed key interaction effects and improved predictive performance at 7 d (R² = 0.8536) while reducing model complexity by over 50%.
4. By adapting Stieltjes orthogonal polynomials to empirically derived input distributions via KDE, the aPCE model overcomes the Gaussian assumptions inherent in traditional Hermite polynomials. The aPCE model demonstrated strong generalization across non-Gaussian and multi-modal datasets, achieving RMSE = 0.5297 and MAE = 0.4709.
5. Model validation using previously unseen age-group samples confirmed the aPCE model’s high reliability throughout the entire healing period. Maximum prediction errors were MAE = 0.6834 and RMSE = 0.7948, a 42.3% reduction compared to traditional Gaussian-based models. These results highlight the model’s consistent accuracy and robustness, even under conditions of high variability and nonlinear saturation effects.
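The two ideas behind findings 2 and 4, a data-driven orthonormal basis and Sobol indices read off the expansion coefficients, can be illustrated with a minimal sketch. It is not the implementation used in this study, and it rests on several simplifying assumptions: the Stieltjes recurrence is run on the raw sample measure rather than on a KDE-smoothed density, the surrogate is additive (no interaction terms), the inputs are treated as independent, and the data are synthetic. The function names `stieltjes`, `eval_basis`, and `fit_additive_pce` are hypothetical.

```python
import numpy as np


def stieltjes(samples, degree):
    """Recurrence coefficients of monic polynomials that are orthogonal
    under the empirical (equal-weight) measure of `samples`."""
    x = np.asarray(samples, dtype=float)
    prev, curr = np.zeros_like(x), np.ones_like(x)    # pi_{-1} = 0, pi_0 = 1
    alphas, betas, norms = [], [0.0], [1.0]           # norms[k] = ||pi_k||
    for k in range(degree):
        sq = np.mean(curr * curr)                     # <pi_k, pi_k>
        alpha = np.mean(x * curr * curr) / sq         # <x pi_k, pi_k> / <pi_k, pi_k>
        nxt = (x - alpha) * curr - betas[k] * prev    # three-term recurrence
        nxt_sq = np.mean(nxt * nxt)
        alphas.append(alpha)
        betas.append(nxt_sq / sq)                     # beta_{k+1}
        norms.append(np.sqrt(nxt_sq))
        prev, curr = curr, nxt
    return np.array(alphas), np.array(betas), np.array(norms)


def eval_basis(x, alphas, betas, norms):
    """Evaluate the orthonormal polynomials of degree 0..len(alphas) at x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    prev, curr = np.zeros_like(x), np.ones_like(x)
    cols = [curr]
    for alpha, beta in zip(alphas, betas):
        prev, curr = curr, (x - alpha) * curr - beta * prev
        cols.append(curr)
    return np.column_stack(cols) / norms              # normalize each column


def fit_additive_pce(X, y, degree):
    """Least-squares fit of an additive expansion (assumption: no cross terms).
    Returns the coefficients and the variance share of each input."""
    n, d = X.shape
    blocks = [np.ones((n, 1))]                        # constant term
    for i in range(d):
        rec = stieltjes(X[:, i], degree)
        blocks.append(eval_basis(X[:, i], *rec)[:, 1:])   # degrees 1..degree
    coef, *_ = np.linalg.lstsq(np.hstack(blocks), y, rcond=None)
    # With an orthonormal basis, the surrogate variance is the sum of squared
    # non-constant coefficients, so each input's share is its Sobol index.
    per_input = (coef[1:].reshape(d, degree) ** 2).sum(axis=1)
    return coef, per_input / per_input.sum()


# Synthetic, non-Gaussian inputs (illustrative only, not data from the study).
rng = np.random.default_rng(0)
n = 2000
x0 = np.where(rng.random(n) < 0.5,
              rng.normal(-2.0, 0.5, n), rng.normal(3.0, 0.5, n))  # bimodal
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.exponential(1.0, n)                          # does not enter y
X = np.column_stack([x0, x1, x2])
y = x0 + 4.0 * x1**2 + rng.normal(0.0, 0.1, n)

coef, sobol = fit_additive_pce(X, y, degree=2)
print(np.round(sobol, 3))
```

Because the basis is built from the samples themselves, it is orthonormal under the sample measure whatever the shape of the distribution, which is the property that removes the need for a Gaussian transform. Interaction terms, as used in the second-order model, would require a tensor-product basis and are omitted here to keep the sketch short.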
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to acknowledge the support from the Ministry of Science and Technology of China under grant no. 2023YFC3804300, the National Natural Science Foundation of China under grant nos. 52125802, 52178114, and 52578560, and the Postgraduate Research & Practice Innovation Program of Jiangsu Province under grant no. SJCX25_0118.
Author information
Contributions
C.F.: Conceptualization, Methodology, Investigation, Formal analysis, Validation, Data curation, Writing – original draft, Visualization. W.X.: Supervision, Project administration, Resources, Writing – review & editing. Q.Z.: Conceptualization, Resources, Funding acquisition, Methodology, Writing – review & editing. C.C.: Methodology, Software, Formal analysis, Writing – review & editing. T.G.: Resources, Validation, Writing – review & editing. X.Z.: Investigation, Data curation, Validation, Formal analysis. B.P.: Investigation, Data curation, Visualization.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Engineering thanks Seyed Hooman Ghasemi and the other anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Ali Behnood and Philip Coatsworth. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Fu, C., Xu, W., Zhan, Q. et al. Full-cycle prediction of crack healing in self-healing concrete using generalized polynomial chaos expansion. Commun Eng 5, 54 (2026). https://doi.org/10.1038/s44172-026-00608-5
DOI: https://doi.org/10.1038/s44172-026-00608-5