Abstract
Accurate description of the condition of engineering structures is important for ensuring structural safety. Traditional analysis methods based on simplified physical mechanisms cannot accurately characterize the structural condition and neglect the value of the large amount of data generated during the construction process. This paper proposes a data-driven analysis framework that combines physical principles, dimensionality reduction techniques and ensemble learning models to trace back the deep-seated connections between data, achieving multi-factor analysis of structural defects. Using concrete structural cracks in a certain project as an example, the framework considers full life-cycle data, including material, environment, and construction processes, to construct an assessment model. The results show that by establishing a mapping relationship between construction data and structural condition, and integrating cumulative indicators from different construction stages, a reference for describing the structural safety condition can be provided to some extent, along with optimization suggestions, offering an analytical perspective for solving complex structural problems in engineering.
Introduction
The integrity and safety of structures are paramount in the construction of engineering projects. Recently, the focus has shifted towards prefabricated modular structures in infrastructure development, predominantly utilizing concrete—a material that transitions from a fluid to a solid state, acquiring strength progressively. Despite rigorous management and quality controls, defects such as concrete cracking still occur1, threatening the long-term reliability of these structures. Common defects are typically addressed through established engineering practices and preventive standards, yet complex defects can persist2, underscoring the need for a comprehensive analysis of structural defects and their origins.
Traditional research on construction-related defects has primarily centered on identification techniques, employing tools like computer vision3,4, structural modal analysis5, ultrasonic testing, and infrared imaging5 for defect detection. However, these methods fall short of explaining the underlying causes of defects. A thorough diagnostic approach necessitates an understanding of defect origins and development processes through empirical experiments6,7, numerical simulations8,9,10, and data-driven methods11,12. Despite their utility, empirical methods and simulations are limited by their idealized conditions and lack of comprehensive real-world applicability, prompting a shift towards integrating big data and machine learning for a deeper analysis.
Although machine learning methods have been widely applied in the engineering field13,14,15,16,17, purely data-driven approaches struggle to handle complex engineering data and to integrate it into standardized datasets. Moreover, traditional machine learning methods focus on data correlations, often overlooking the intrinsic mechanisms of structural defects and lacking domain knowledge, which leaves the predictions with little transparency or interpretability18. One way to address this is to first standardize and integrate the complex engineering data based on physical principles, and then use physical models to define and interpret the results of the machine learning analysis. At the same time, data-driven techniques such as principal component analysis (PCA), t-SNE, and UMAP19,20,21,22 can be used in reverse to refine and quantify the physical models, elucidating the complex relationships and identifying the key influencing factors18. Ensemble tree models23,24,25 also offer good interpretability, since tree-based performance evaluation objectively assesses the importance of variables26, and they are more suitable than neural networks for engineering problems with limited data, enabling better study of defect evolution mechanisms27.
For instance, ensemble learning models like XGBoost (eXtreme Gradient Boosting) have demonstrated superior performance in predicting concrete mechanical properties and offer a structured approach to understanding the influence of different materials. Such models are adept at handling the dynamic and complex nature of engineering practices where multiple factors—material properties25,28,29,30, environmental conditions, and structural loads31,32,33,34—interplay to affect the outcome. This capability is particularly vital in large-scale constructions35,36 and the management of complex equipment37, where the dynamic interplay of various factors is crucial.
The generation of voluminous construction data during the building of structures provides a rich source of time-series information that is invaluable for defect analysis. This study introduces a multifaceted framework to interpret the coupled causes of defects from the perspective of construction data. Figure 1 provides an overview of the workflow of this framework. By integrating domain knowledge with data models, we obtained physics-enhanced integrated data, and employed data dimensionality reduction and data mining techniques to construct a comprehensive structural defect assessment model. Using the example of mass-produced concrete structures with longitudinal cracks that appeared in a precast small box girder project, we describe the process of defect analysis and propose an evaluation system: First, we perform physical enhancement and integration of the raw data. Then, the integrated data is input into the constructed model to obtain interpretable analysis results. By combining these results with the characteristics of the engineering structure, we propose a comprehensive evaluation model that incorporates data information. Finally, the model’s validity is verified using auxiliary indicators. This system considers structural performance, environmental influences, and construction methods to assess their impact on structural quality.
Conventional structural safety assessments often rely on field testing and visual inspections to determine the health of the structure. This approach describes specific state indicators at a given time, and sometimes fails to accurately identify the causes of complex damage, as the damage may have been initiated earlier. The method proposed in this paper attempts to incorporate data from the entire construction process into the analytical framework, utilizing techniques such as machine learning to include construction process information, thereby enhancing the applicability of structural condition assessments.
This multifactorial analysis framework, based on rigorous construction data analysis, provides new tools and perspectives for understanding and mitigating defects in concrete structures. The findings will assist in improving construction strategies, processes, and safety management in civil engineering, offering a systematic approach to classifying and addressing the quality of prefabricated structures.
Background
The engineering background of this study involves 472 standardized prefabricated concrete box girders, among which 96 girders exhibited longitudinal cracks in the web during the storage phase. This batch of girders is intended for an urban elevated road project, with a structural system consisting of simply supported girders and a continuous bridge deck. The prefabricated box girders are post-tensioned prestressed concrete box girders, designed using widely applied standardized drawings ranging from 25.62 m to 30.62 m. The cracks in the affected girders extend along the longitudinal (traffic) direction of the girders, which is significantly different from typical vertical cracks in the web or transverse cracks in the bottom slab caused by self-weight or external loads. During the preliminary inspection, the construction process was found to be consistent with the established procedures, the materials were supplied in uniform specifications, and the construction control metrics met the standards, making it difficult to determine the cause of the defects. Detailed information on the engineering project can be found in Appendix I.
Results
Processing and physical enhancement of construction process data
Table 1 presents a comprehensive dataset recorded during the construction of prefabricated concrete small box girders, encompassing both objective and measured data. Objective data includes design parameters, construction timelines, and environmental temperatures, while measured data captures indicators at critical construction stages such as the strength of various concrete and grout specimens, the elongation of prestress tendons, and details on crack dimensions.
Although the dataset is comprehensive, key measurement data was subject to errors, incompleteness or unavailability, and the large volume of data was difficult to effectively integrate. Therefore, the data was supplemented and integrated through existing material property research and theoretical studies. The construction process and data collection procedures can be found in Appendix I, and the raw data can be obtained by contacting the corresponding author.
The solidification of concrete from a liquid to a solid state involves a hydration reaction, which is significantly influenced by time and ambient temperature. Variations in environmental temperature can impede the hydration process, affecting strength development. Traditional reliance on curing time and surface strength tests during construction is insufficient for a comprehensive assessment of concrete strength development. This study leverages the concepts of equivalent age and maturity, which incorporate both time and temperature, to describe concrete strength development more precisely38. Appendix II provides detailed descriptions of the methods used for supplementing the data.
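The two concepts can be illustrated with a minimal sketch. It assumes the commonly used Nurse-Saul temperature-time factor for maturity and the Arrhenius-type expression for equivalent age; the datum temperature (-10 °C), reference temperature (20 °C), and activation energy (33.5 kJ/mol) are illustrative assumptions and are not taken from this study, whose actual procedure is described in Appendix II.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def nurse_saul_maturity(temps_c, dt_hours, datum_c=-10.0):
    """Temperature-time factor in degC*hours (Nurse-Saul form).

    datum_c is an assumed datum temperature below which no strength gain
    is credited; intervals colder than the datum contribute zero.
    """
    return sum(max(t - datum_c, 0.0) * dt_hours for t in temps_c)


def equivalent_age(temps_c, dt_hours, ref_c=20.0, activation_j_mol=33500.0):
    """Equivalent age in hours at the reference temperature (Arrhenius form).

    ref_c and activation_j_mol are assumed illustrative values.
    """
    ref_k = ref_c + 273.15
    total = 0.0
    for t in temps_c:
        t_k = t + 273.15
        rate = math.exp(-activation_j_mol / GAS_CONSTANT * (1.0 / t_k - 1.0 / ref_k))
        total += rate * dt_hours
    return total


# Two hypothetical 48-hour curing histories sampled hourly.
warm = [20.0] * 48
cold = [5.0] * 48
age_warm = equivalent_age(warm, 1.0)
age_cold = equivalent_age(cold, 1.0)
maturity_warm = nurse_saul_maturity(warm, 1.0)
```

At the reference temperature the equivalent age equals the elapsed time, while the colder history yields a much smaller equivalent age for the same 48 hours, which is the effect that a purely calendar-based curing time would miss.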
The enhanced dataset now includes 44 categories and 23,600 records, covering every critical construction stage as shown in Table 1. The enhanced dataset consists of multi-degree of freedom data, which contains information on multiple variables as well as the associated physical principles. It features maturity and equivalent age metrics at times of stripping, demoulding, prestressing, and grouting, alongside calculated concrete strength and estimated elastic properties at these stages. This extensive dataset supports a thorough analysis of potential causes behind longitudinal cracks observed in the small box girders, employing advanced data analysis techniques to uncover underlying patterns.
Dataset integration and data selection
The dataset was developed based on three principles: (1) Removing redundant data with high correlations; (2) Retaining data that includes both engineering measurements and physical principles; (3) Maintaining consistent data dimensions, i.e., ensuring the degrees of freedom are coordinated among the retained variables.
A pairwise correlation analysis was conducted using the Spearman correlation coefficient to evaluate relationships within the dataset. The results are visually represented in Fig. 1, with reference numbers corresponding to Table 1. This heatmap highlights connections between various factors and structural cracking through lines, where thicker lines indicate stronger correlations. The intensity of the correlations between pairs of factors is represented by the size of the squares, with larger squares denoting stronger correlations. Notably, aside from the expected high self-correlation near the diagonal, two distinct regions displayed significant correlations.
The first region indicates a strong correlation between date and temperature, attributable to the seasonal construction phases of small box girders from November to June, covering winter through summer, leading to substantial temperature fluctuations. The second region reveals a significant correlation among maturity, equivalent age, and strength—all derived variables—as well as time and temperature. These variables collectively reflect the age and temperature data, suggesting that retaining just one of these variables (equivalent age) in the dataset could suffice due to their overlapping information content.
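This kind of redundancy screening can be sketched in a few lines. The example below is a simplified illustration rather than the procedure used in this study: the column names, the toy values, the 0.9 threshold, and the greedy keep-in-priority-order rule are all assumptions.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman coefficient computed as the Pearson correlation of ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def prune_redundant(columns, priority, threshold=0.9):
    """Keep a column only if |rho| with every already-kept column <= threshold."""
    kept = []
    for name in priority:
        if all(abs(spearman(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: maturity and strength rise monotonically with age.
columns = {
    "equivalent_age": [5, 6, 7, 8, 9, 10],
    "maturity": [2400, 2900, 3300, 3800, 4200, 4700],
    "strength": [30, 34, 37, 39, 40, 41],
    "relative_elongation": [0.5, -1.0, 2.0, -3.0, 1.0, 0.0],
}
priority = ["equivalent_age", "maturity", "strength", "relative_elongation"]
kept = prune_redundant(columns, priority)
```

Listing the equivalent age first in the priority order mirrors the choice described above: the derived variables that carry the same age and temperature information are dropped, and only the equivalent age is retained.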
Consequently, the dataset was streamlined by excluding less critical variables such as temperature, time, and calculated strength, while keeping the equivalent age. Adjustments were made to the data related to prestress tendons elongation by retaining relative elongation measures and discarding absolute values to account for variations in girder length. This optimization eliminated redundant data, including design elongation specifications.
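As a small illustration of why relative measures are comparable across girders of different lengths, the sketch below assumes the usual definition of relative elongation as the percentage deviation of the measured elongation from the design elongation; the function name and the numbers are hypothetical.

```python
def relative_elongation_percent(measured_mm, design_mm):
    """Percentage deviation of measured tendon elongation from the design value."""
    if design_mm <= 0:
        raise ValueError("design elongation must be positive")
    return (measured_mm - design_mm) / design_mm * 100.0


# Two hypothetical tendons in girders of different length: the absolute
# shortfalls differ (6 mm vs 5.1 mm), but the relative deviation is the same.
short_girder = relative_elongation_percent(164.9, 170.0)
long_girder = relative_elongation_percent(194.0, 200.0)
```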
The refined dataset was reduced from 44 to 13 distinct groups, forming the core analysis dataset, and these data all have multiple degrees of freedom. This dataset contains representative data from the key milestones of the construction process, spanning material supply, concrete casting, concrete demoulding, tendon tensioning, and finally grouting and girder storage. It thus encompasses the full life-cycle information of the structure. The set includes metrics such as the 28-day compressive strength of concrete, equivalent ages at various stages (demoulding, tensioning, grouting), and multiple measurements of relative elongation (L1 to R4). The verification dataset, essential for confirming findings, comprises the number of cracks and the widths of those cracks, as detailed in Table 2.
Data mining and results interpretation
The integrated dataset contains 13 influential factors that cover the key stages of the structural life cycle. The values of these factors are influenced by environmental and other factors, and may deviate from the designed values. These deviations are reflected in the final state of the structure, and excessively large deviations may lead to cracking and other types of defects. After the data has been well-integrated, it can be processed using a standardized data model to quantify this bias. The XGBoost model was utilized to analyze the dataset and interpret the influence of each factor on structural cracking using the SHAP (SHapley Additive exPlanations) method. This analysis highlighted the specific impact of changes in each factor’s value as well as their overall significance in contributing to structural cracking. Figure 2 displays a bar chart of the SHAP values for influential factors, organized by decreasing importance. The horizontal axis measures the absolute SHAP values, with larger values indicating a greater impact of the factor on cracking.
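The attribution idea behind SHAP can be illustrated with an exact Shapley computation on a toy model. This is a sketch of the underlying definition only: it is not the tree-specific algorithm applied to the XGBoost model, and the toy risk function, the single baseline point, and the feature values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_risk(z):
    """Hypothetical cracking score with an interaction between the two inputs."""
    age, elongation = z
    early = 1.0 if age < 9.0 else 0.0
    return 0.4 * early + 0.1 * abs(elongation) + 0.2 * early * abs(elongation)


x = [7.0, -3.0]          # equivalent age at tensioning (days), relative elongation (%)
baseline = [10.0, 0.0]   # assumed reference girder
phis = shapley_values(toy_risk, x, baseline)
```

The attributions sum to the difference between the prediction for the girder and the prediction for the baseline, which is the additivity property that lets per-factor contributions be read off and ranked by their mean absolute value.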
The most significant influence was observed from the equivalent age at tensioning, which is indicative of the development level of structural strength. Following this, the influence of prestress elongation was notable, reflecting the effects of construction control on structural integrity. Although the materials used, such as concrete and grout, do exert some influence, they are not primary determinants of structural performance. Given that the absolute age at demoulding was consistent at 2 days across all samples, resulting in minimal variation, its impact on structural cracking is considered negligible.
Figures 3 and 4 display scatter plots that detail how changes in the SHAP values of these factors affect structural cracking. In these figures, the blue curve and histogram represent the data distribution. The x-axis denotes the value of the influencing factor, and the y-axis measures its impact on structural cracking, referred to as “Cracking Impact”. Positive values on this scale correlate with an increased likelihood of cracking, whereas negative values suggest a mitigating effect, with larger absolute values indicating a stronger influence.
The relationship between each factor’s value and its impact is encapsulated by the function fi(xi), depicted by the red curve in the scatter plots and derived from generalized additive model (GAM) fitting. Because engineering structure data behave as random variables, they are assumed to follow a normal distribution under standard construction conditions. This study incorporates a 90% confidence interval to account for statistical and measurement errors, represented by the shaded red area in the plots. This approach supports the calibration of the model and clarifies how each factor contributes to structural defects.
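Under the normality assumption, a 90% band can be sketched as the sample mean plus or minus the 95th-percentile standard normal quantile times the sample standard deviation. The snippet below illustrates that arithmetic for the impacts observed near one factor value; the sample values are hypothetical, and the band construction actually used around the GAM curve may differ.

```python
from statistics import NormalDist, mean, stdev


def normal_band(samples, level=0.90):
    """Two-sided normal-theory band: mean +/- z * sample standard deviation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.645 for level=0.90
    centre = mean(samples)
    half_width = z * stdev(samples)
    return centre - half_width, centre + half_width


# Hypothetical cracking-impact values observed near one factor value.
impacts = [0.10, 0.14, 0.08, 0.12, 0.16]
low, high = normal_band(impacts)
```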
Establishing the structural condition assessment model
During our data analysis, the characteristic data of each girder was linked to a unique SHAP value, representing the total potential for cracking. This value was then incorporated into Formula (4) to calculate the cumulative effect of the 12 influencing factors on structural cracking, yielding a comprehensive cracking impact degree Qdefect for each girder. A scatter plot, with Qdefect on the vertical axis, visually differentiates the girders based on the presence of cracks: red data points indicate cracked girders and blue points indicate those without cracks, as shown in Fig. 5. The red and blue curves demonstrate the distribution of Qdefect values among cracked and uncracked girders respectively. This plot is divided into three regions: an easily cracked region (orange), a cracked region (red), and an uncracked region (blue).
The cracked region, characterized by a high Qdefect, correlates with a greater likelihood of cracking. Girders within this region were indeed found to be cracked upon inspection, including two girders that underwent destructive testing and revealed extensive cracking. In contrast, the easily cracked region contains a mix of cracked and uncracked girders, accounting for 66.7% of all samples. This area represents a critical threshold; about 40% of these girders exhibited external surface cracking, emphasizing the variability and critical condition of girder performance due to inconsistent material properties and construction practices.
Conversely, the uncracked region indicates a low Qdefect and signifies girders that are in good condition, meeting all design and construction standards with minimal risk of cracking. Inspection confirmed the absence of surface cracks on girders in this region, suggesting a low likelihood of internal cracking as well.
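The three-region classification can be sketched as follows, assuming Qdefect is the sum of a girder's per-factor SHAP contributions. The region boundaries and the SHAP rows below are hypothetical placeholders, not values from this study.

```python
def q_defect(shap_row):
    """Comprehensive cracking impact: sum of per-factor SHAP contributions."""
    return sum(shap_row)


def classify(q, lower, upper):
    """Assign a girder to a region using two assumed thresholds on Q_defect."""
    if q >= upper:
        return "cracked"
    if q <= lower:
        return "uncracked"
    return "easily cracked"


# Hypothetical thresholds and per-girder SHAP rows (three factors shown).
LOWER, UPPER = -0.2, 1.0
girders = {
    "G1": [0.5, 0.4, 0.3],
    "G2": [0.1, -0.05, 0.2],
    "G3": [-0.3, -0.2, 0.1],
}
regions = {name: classify(q_defect(row), LOWER, UPPER) for name, row in girders.items()}
```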
To summarize, extensive data collection and analysis during construction, alongside theoretical calculations to track material performance at critical construction stages, have elucidated the causes of cracking in prestressed small box girders. The analysis identifies two pivotal stages in construction process control: the tensioning of prestress tendons and the grouting of prestressing ducts. Material performance variability has led to misjudgments in the evolution of material properties, contributing to premature tensioning based on incomplete property development. The use of tensioning force as a control measure often resulted in inadequate elongation of prestress tendons and asymmetrical tensioning. Additionally, the expansion of grouting material during hardening exerted radial forces, contributing to significant cracking along the outer web. Despite design redundancies intended to mitigate cracking, inconsistent control at various construction stages resulted in a significant dispersion of outcomes and a cumulative increase in cracking potential, affecting the structural integrity of several girders.
Validation of model effectiveness
The proposed model simplifies complex, nonlinear interactions into manageable linear or simple nonlinear relations. However, the validity of these simplifications requires empirical verification. Parameters such as crack number, width, and depth, not initially included in the model, serve as independent indicators of structural defect and are used for model validation. Existing research shows that crack width is positively correlated with the degree of defects39,40,41. Our analysis, shown in Fig. 6, reveals a strong linear correlation between crack width and the comprehensive evaluation index, with a Pearson correlation coefficient of 0.65, affirming the model’s reliability in reflecting the degree of structural defect.
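The validation step amounts to computing a Pearson coefficient between the evaluation index and an indicator that was held out of the model. A minimal sketch, with hypothetical values standing in for the per-girder index and measured crack widths:

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical evaluation index and crack widths (mm) for five cracked girders.
q_values = [0.2, 0.5, 0.9, 1.4, 1.8]
crack_widths = [0.05, 0.10, 0.08, 0.20, 0.18]
r = pearson(q_values, crack_widths)
```

Because crack width plays no part in fitting the model, a clearly positive coefficient indicates that the index tracks defect severity rather than merely reproducing its own inputs.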
This validation underscores the model’s effectiveness in capturing the nuances of structural integrity, providing a robust framework for predicting and managing cracking in prestressed small box girders.
Discussion
Variability in physical properties of slurry materials is the main cause of poor quality control
Our data mining efforts categorized the factors influencing structural cracking into three main groups: material factors (such as concrete strength), construction operation factors (including the relative elongation of prestress tendons), and construction process control factors (notably the equivalent age). As depicted in Fig. 3, the impact of material strength—specifically the strength of concrete and grouting materials—on structural cracking exhibits a significant trend. Within specific strength thresholds, dramatic changes in their influence on cracking are evident. For instance, a notable division occurs at 62 MPa for grouting materials; below this threshold, the cracking impact peaks at 0.5, yet it falls below zero when this strength is surpassed. Conversely, concrete strength has a relatively subdued effect on cracking, peaking at 0.3 and diminishing as the strength exceeds 55 MPa, dropping below zero after 60 MPa.
The performance of these materials—both concrete and grouting—varies due to different environmental conditions such as temperature, humidity, and curing practices during construction, affecting their strength and expansion characteristics. Generally, there is an inverse relationship between strength and expansion rate; higher strength correlates with lower expansion, influenced by factors such as water-to-cement ratio, cement type, aggregate properties, and admixtures. Despite uniform material supply, significant variations were observed in the performance of batches on-site, with grouting material strength averaging between 60 and 70 MPa and concrete strength around 60 MPa, both surpassing designed values significantly. These variations highlight the potential risks and inaccuracies in on-site testing indicators, which could lead to cracking in precast box girder webs, warranting meticulous attention.
Theoretical calculations need to be strengthened to enhance product quality in prefabrication projects
Figure 3 explores the development of material strength at critical construction stages: the tensioning of prestress and grouting of prestress ducts. The calculated equivalent age of concrete at the time of tensioning is particularly sensitive to web cracking, with impacts ranging significantly. Notably, at an equivalent age of 9 days, there is a stark drop in the influence on cracking. This timing coincides with rapid initial strength gain in concrete, as hydration progresses to peak strength and modulus development. If the concrete has not adequately matured by the time of prestress tensioning, the induced stresses can lead to undesirable deformations and heightened cracking risks. Despite pre-testing under similar conditions to ascertain a feasible tensioning schedule, discrepancies in actual concrete age due to environmental and curing variations led to some instances of web cracking. The reliance solely on rebound measurements for assessing surface strength further complicates this, as earlier tests may satisfy strength requirements despite insufficient actual maturation, prompting premature tensioning activities.
The impact of grouting timing also shows a significant correlation with web cracking, with earlier grouting generally beneficial in mitigating crack formation. The synchronous expansion and hardening of the structural concrete and grouting materials can effectively manage stress concentrations and minimize cracking risks. However, as the concrete matures during the grouting process, the expanding grout exerts increasing compression on the concrete web, potentially exacerbating cracking. This sequential process of tensioning followed by grouting underscores the importance of stabilizing concrete material performance prior to any prestress activities to prevent cracking.
The sequence of prestress tensioning critically alters the stress conditions within prestressed structures, with each tendon’s tensioning impacting web cracking differently. Precise control of tensioning forces and subsequent verification of relative elongation are crucial; deviations in elongation, particularly around −3%, markedly increase cracking risks. Sequential tensioning from one side to the other can also introduce asymmetry in stress distribution, amplifying potential issues. Given the inherent reliability of industrially produced prestress tendons over on-site concrete, ensuring the concrete’s maturity and compatibility with expected performance standards before tensioning is pivotal in averting structural issues.
Conclusions and outlook
(1) Through data physical enhancement, correlation analysis, and dimensionality reduction algorithms, material factors, construction operation factors, and construction process control factors that lead to the cracking of the webs of precast box girders were summarized.
(2) Poor control at various construction stages of the precast prestressed small box girders resulted in an accumulation of cracking defects, ultimately leading to the cracking of some girders.
(3) The framework constructed using complete construction process data explained the process and reasons for the formation of cracks and classified the quality grades of precast box girders.
The current model still has limitations in handling temporal sequences and causal relationships. Future directions include: obtaining more precise, high-quality data; transcoding data into formats that are easier for computers to process; and achieving the forecasting and prediction of defects.
Methods
Framework of analysis
The analysis framework, depicted in Fig. 7, encompasses several sequential steps. Before the typical data analysis workflow is carried out, the data are first enhanced through physical formulas, which coordinates the information contained in each data category and incorporates physical principles into the data. (The detailed data physical enhancement process is provided in Appendix II.) Then, correlation analysis is performed for data cleaning: redundant data are removed, and the dataset is integrated into an analysis dataset and a validation dataset. Next, integrated models are applied to quantitatively analyze the data, extracting features and classifying samples. Subsequently, domain knowledge and physical models are integrated to interpret the results from the data mining stage. The final step involves calibrating the physical model of structural defects, establishing a comprehensive evaluation model, and employing the validation data to confirm the model’s accuracy.
Correlation analysis
The Spearman correlation coefficient is utilized to analyze the relationships between pairs of data features, as given by Eq. (1). It ranks the data samples into levels and then studies the correlation between the ranked variables, and can therefore better describe non-linear relationships between them. High correlation coefficients between data groups indicate substantial information overlap. This redundancy can be mitigated by removing redundant data, ensuring that the remaining features are relatively independent.
Where ρS represents the Spearman correlation coefficient, Ri and Si are the respective levels of the observed value i, and \(\bar{R}\) and \(\bar{S}\) are the respective mean levels of the variables x and y.
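For reference, the standard rank-based form of the Spearman coefficient that is consistent with these symbols can be written as follows. It is the textbook expression, assumed here to correspond to Eq. (1), with n denoting the number of observations (a symbol introduced for this illustration).

\[
\rho_S = \frac{\sum_{i=1}^{n}\left(R_i-\bar{R}\right)\left(S_i-\bar{S}\right)}{\sqrt{\sum_{i=1}^{n}\left(R_i-\bar{R}\right)^{2}\,\sum_{i=1}^{n}\left(S_i-\bar{S}\right)^{2}}}
\]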
Integrated models and data fitting
The SHAP algorithm, following the classification output from the XGBoost model, is employed to conduct attribution analysis on each category of features according to Eq. (2). Concurrently, GAM processes the data. As defined in Eq. (3), the fitted basis function pi is identified as a spline basis function. The process involves enumerating all potential forms of pi, exploring every combination, and calculating residuals. This iterative approach continues until the residuals converge, resulting in the optimal fitted function.
Where \(\hat{y}_{i}^{(t)}\) represents the predicted result of sample i after t iterations, \(\hat{y}_{i}^{(t-1)}\) represents the predicted result of the first t-1 trees, and ft(xi) represents the model of the t-th tree.
Where xi represents the i-th sample, xij represents the j-th feature of the i-th sample, yi represents the predicted value of the model for that sample, ybase represents the baseline of the entire model (usually the mean of the target variable for all samples), and f(xij) represents the SHAP value of xij.
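The two definitions above match the standard additive forms of gradient-boosted trees and of SHAP attributions. As a sketch consistent with the stated symbols, and assumed rather than confirmed to be the expressions used in this study, they can be written as follows, where the sum over j runs over the features of sample i.

\[
\hat{y}_{i}^{(t)} = \hat{y}_{i}^{(t-1)} + f_t(x_i)
\]

\[
y_i = y_{base} + \sum_{j} f(x_{ij})
\]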
Where mi is the independent variable, pi(mi) is the i-th spline basis function, and q(mi) is the fitting function.
Calibration and interpretation of results
The influences of construction process factors on the structural condition are highly coupled, with multiple factors jointly affecting the structural state. Data mining has decoupled the numerous influencing factors, such that the impact of each influencing factor on the structure can be treated as independent. (The detailed discussion on the complex influence of construction process factors on the structure is provided in Appendix II.) The data analysis has led to the creation of a simplified path diagram of structural defects, as shown in Fig. 8. This diagram illustrates the progression of engineering structures from raw materials to completion, emphasizing the accumulation of defects throughout this process. The dataset, whose information content is unevenly distributed, is integrated into a dataset organized around several key nodes. This allows the complex structural state function to be represented by a simple additive formula. Data mining techniques have facilitated the integration of various influences on structural integrity into a unified metric, captured and expressed in Eq. (5).
The factors, denoted as xi and detailed in Table 2, represent the twelve most significant variables influencing structural integrity. The relationship between each factor’s contribution and its effect on structural defects is nonlinear. Calibration and interpretation of these factors utilize SHAP values to understand their specific impacts.
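Based on the additive structure described in this section, the unified metric can be sketched as the sum of the calibrated per-factor impact functions. The form below is inferred from the text, with fi(xi) denoting the fitted impact of factor xi and the sum running over the twelve factors; it is an assumption about the shape of the formula rather than a reproduction of it.

\[
Q_{defect} = \sum_{i=1}^{12} f_i(x_i)
\]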
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
Trindade, J. C., Garcia, S. L. G., Lacerda, T. N. & Resende, T. L. Analysis of the shear behavior of reinforced recycled aggregate concrete beams based on shear transfer mechanisms. Eng. Struct. 293, 116616 (2023).
Maruyama, I. & Lura, P. Properties of early-age concrete relevant to cracking in massive concrete. Cem. Concr Res. 123, 105770 (2019).
Paneru, S. & Jeelani, I. Computer vision applications in construction: current state, opportunities & challenges. Autom. Constr. 132, 103940 (2021).
Ali, R., Chuah, J. H., Talip, M. S. A., Mokhtar, N. & Shoaib, M. A. Structural crack detection using deep convolutional neural networks. Autom. Constr. 133, 103989 (2022).
He, Z. et al. Integrated structural health monitoring in bridge engineering. Autom. Constr. 136, 104168 (2022).
Çakır, Ö. Experimental analysis of properties of recycled coarse aggregate (RCA) concrete with mineral additives. Constr. Build. Mater. 68, 17–25 (2014).
Wu, C., Hwang, H. J., Shi, C., Li, N. & Du, Y. Shear tests on reinforced slag-based geopolymer concrete beams with transverse reinforcement. Eng. Struct. 219, 110966 (2020).
Mehta, P. K. & Monteiro, P. J. M. Concrete: Microstructure, Properties, and Materials 4th edn (McGraw-Hill, 2013).
Yazid, A., Abdelkader, N. & Abdelmadjid, H. A state-of-the-art review of the X-FEM for computational fracture mechanics. Appl. Math. Model. 33, 4269–4282 (2009).
Liu, S. et al. Mechanical strength model of engineered cementitious composites with freeze–thaw damage based on pore structure evolution. Cem. Concr Compos. 134, 104706 (2022).
Ben Chaabene, W., Flah, M. & Nehdi, M. L. Machine learning prediction of mechanical properties of concrete: critical review. Constr. Build. Mater. 260, 119889 (2020).
Akiyama, M., Frangopol, D. M. & Takenaka, K. Reliability-based durability design and service life assessment of reinforced concrete deck slab of jetty structures. Struct. Infrastruct. Eng. 13, 468–477 (2017).
Dantas, A. T. A., Batista Leite, M. & de Jesus Nagahama, K. Prediction of compressive strength of concrete containing construction and demolition waste using artificial neural networks. Constr. Build. Mater. 38, 717–722 (2013).
Golafshani, E. M., Behnood, A. & Arashpour, M. Predicting the compressive strength of normal and high-performance concretes using ANN and ANFIS hybridized with Grey Wolf Optimizer. Constr. Build. Mater. 232, 117266 (2020).
Salami, B. A., Olayiwola, T., Oyehan, T. A. & Raji, I. A. Data-driven model for ternary-blend concrete compressive strength prediction using machine learning approach. Constr. Build. Mater. 301, 124152 (2021).
Mangalathu, S. & Jeon, J. S. Classification of failure mode and prediction of shear strength for reinforced concrete beam-column joints using machine learning techniques. Eng. Struct. 160, 85–94 (2018).
Sarveghadi, M., Gandomi, A. H., Bolandi, H. & Alavi, A. H. Development of prediction models for shear strength of SFRCB using a machine learning approach. Neural Comput. Applic. 31, 2085–2094 (2019).
Zhao, X., Liang, J. & Dang, C. Clustering ensemble selection for categorical data based on internal validity indices. Pattern Recognit. 69, 150–168 (2017).
Yang, D., Wei, V., Jin, Z., Yang, Z. & Chen, X. A UMAP-based clustering method for multi-scale damage analysis of laminates. Appl. Math. Model. 111, 78–93 (2022).
Chen, X. et al. Health diagnosis of concrete dams with continuous missing data for assessing structural deformation based on tSNE–AHC algorithm and deep transfer learning. Structures 57, 105134 (2023).
Milošević, D. et al. The application of Uniform Manifold Approximation and Projection (UMAP) for unconstrained ordination and classification of biological indicators in aquatic ecology. Sci. Total Environ. 815, 152365 (2022).
Stolarek, I., Samelak-Czajka, A., Figlerowicz, M. & Jackowiak, P. Dimensionality reduction by UMAP for visualizing and aiding in classification of imaging flow cytometry data. iScience 25, 105142 (2022).
Han, T., Siddique, A., Khayat, K., Huang, J. & Kumar, A. An ensemble machine learning approach for prediction and optimization of modulus of elasticity of recycled aggregate concrete. Constr. Build. Mater. 244, 118271 (2020).
Zhang, S., Xu, J., Lai, T., Yu, Y. & Xiong, W. Bond stress estimation of profiled steel-concrete in steel reinforced concrete composite structures using ensemble machine learning approaches. Eng. Struct. 294, 116725 (2023).
Zhao, X. Y., Hong, M. Y. & Wu, B. Chemistry-informed multi-objective mix design optimization of self-compacting concrete incorporating recycled aggregates. Case Stud. Constr. Mater. 19, e02485 (2023).
Kuncheva, L. I. & Whitaker, C. J. Measures of diversity in classifier ensembles and their relationship with the Ensemble Accuracy. Mach. Learn. 51, 181–207 (2003).
Mian, Z. et al. A literature review of fault diagnosis based on ensemble learning. Eng. Appl. Artif. Intell. 127, 107357 (2024).
Shah, M. I., Javed, M. F., Aslam, F. & Alabduljabbar, H. Machine learning modeling integrating experimental analysis for predicting the properties of sugarcane bagasse ash concrete. Constr. Build. Mater. 314, 125634 (2022).
Zhang, L. V., Marani, A. & Nehdi, M. L. Chemistry-informed machine learning prediction of compressive strength for alkali-activated materials. Constr. Build. Mater. 316, 126103 (2022).
Abdulalim Alabdullah, A. et al. Prediction of rapid chloride penetration resistance of metakaolin based high strength concrete using light GBM and XGBoost models by incorporating SHAP analysis. Constr. Build. Mater. 345, 128296 (2022).
Feng, D. C., Wang, W. J., Mangalathu, S., Hu, G. & Wu, T. Implementing ensemble learning methods to predict the shear strength of RC deep beams with/without web reinforcements. Eng. Struct. 235, 111979 (2021).
Rahman, J., Ahmed, K. S., Khan, N. I., Islam, K. & Mangalathu, S. Data-driven shear strength prediction of steel fiber reinforced concrete beams using machine learning approach. Eng. Struct. 233, 111743 (2021).
Huu Nguyen, M., Nguyen, T. A. & Ly, H. B. Ensemble XGBoost schemes for improved compressive strength prediction of UHPC. Structures 57, 105062 (2023).
Liang, M. et al. Interpretable ensemble-machine-learning models for predicting creep behavior of concrete. Cem. Concr Compos. 125, 104295 (2022).
Jas, K. & Dodagoudar, G. R. Explainable machine learning model for liquefaction potential assessment of soils using XGBoost-SHAP. Soil. Dyn. Earthq. Eng. 165, 107662 (2023).
Tabesh, M., Mahmoudzadeh, A. & Arezoumand, S. A reliability-base method for thermal cracking prediction in asphalt concrete. Constr. Build. Mater. 409, 133912 (2023).
Hu, M., Zhang, H., Wu, B., Gang, L. & Li, Z. Interpretable predictive model for shield attitude control performance based on XGboost and SHAP. Sci. Rep. 12, 18226 (2022).
Elvery, R. H. & Evans, E. P. Principles underlying the steam curing of concrete at atmospheric pressure. Mag Concr Res. 2, 127–140 (1951).
Zhang, X. et al. Characteristics of crack spacing and crack width movement of early-age partially continuous reinforced concrete pavement under environmental loading: a full-scale field investigation. Constr. Build. Mater. 422, 135832 (2024).
Wang, Y., Chen, M., Shao, X., Deng, S. & Li, C. Flexural cracking behaviour and crack width calculation of steel-plate-reinforced ribbed UHPFRC deck panels. Eng. Struct. 315, 118478 (2024).
Jia, S., Akiyama, M., Frangopol, D. M. & Xu, Z. Bayesian inference of the spatial distribution of steel corrosion in reinforced concrete structures using corrosion-induced crack width. Struct. Saf. 111, 102518 (2024).
Acknowledgements
This research was supported by the Fundamental Research Funds for the Central Universities of China (Grant Number 2022-5-5YB-12). Appreciation is also extended to the National Natural Science Foundation of China (Grant Numbers 51678435 and 52078367), the Shanghai Pujiang Program (Grant Number 23PJ1413300) and the China Postdoctoral Science Foundation (Grant Number 2024M752416). The views and opinions expressed in this paper are solely those of the authors and do not necessarily represent those of the funding agencies.
Author information
Contributions
Han Si: Methodology, Data Curation, Software, Visualization, Writing - Original Draft; Qidi Wang: Methodology, Validation, Supervision, Writing - Review & Editing; Xin Ruan: Conceptualization, Methodology, Funding Acquisition, Resources, Supervision, Writing - Review; Xingpo Fang: Resources, Investigation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Si, H., Wang, Q., Ruan, X. et al. Framework for investigating structure cracking using real engineering data combined with physics constraints. Sci Rep 15, 6344 (2025). https://doi.org/10.1038/s41598-024-85079-4
DOI: https://doi.org/10.1038/s41598-024-85079-4










