Introduction

Concrete is the most widely produced and used construction material worldwide1. Lightweight concrete (LWC) was originally developed to lower density by introducing voids into conventional concrete, which reduces the self-weight of modern structures and the dead loads considered in design while preserving acceptable strength and durability. This is accomplished by using lightweight aggregates (LWA) that contain voids, by producing voids in the cement paste (cellular or foamed concrete), by eliminating fine aggregates, or by combining these techniques2. Researchers tend to favor LWA over the other approaches. Significant efforts have been made in recent decades to improve the characteristics of concrete for a variety of applications, including enhancing its strength and durability while minimizing the environmental impact of its manufacture. Pumice3, expanded slate4, expanded clay5, expanded polystyrene beads6, perlite7, and cenosphere8 are among the LWAs that have been studied intensively to produce LWC for several applications9. LWC has a lower density because it contains more voids. For the same reason, it is a better option for thermal insulation and has a lower production cost and greater fire resistance than normal-weight concrete (NWC)10. As a result, it has been used in both structural and nonstructural applications, including offshore and floating structures, high-rise buildings, and long-span bridges11,12,13. The Romans used natural lightweight pumice to make LWC two thousand years ago when building the 44-meter-diameter dome of the Pantheon14. Historically, LWA consisted mostly of raw natural materials such as tuff, pumice, and scoria. Artificial LWA, such as expanded clay, perlite, shale, and slate and sintered fly ash (FA), have more recently been developed on a massive scale to meet the growing worldwide energy demand and offset the shortage of natural LWA15.

Because of the voids in the material, LWC usually has a lower compressive strength than NWC in addition to a lower density14. Its use is therefore limited in applications that demand greater strength and improved durability. In light of this, researchers have spent over a decade working to preserve the distinctive characteristics of LWC while enhancing its mechanical properties, most notably its compressive strength. This effort led to the development of lightweight high-strength concrete (LWHSC), which combines the advantages of high-strength concrete (HSC) and LWC. ACI 213 (ref. 16) defines HSC as concrete with a compressive strength above 40 MPa. Although Eurocode 2 (ref. 17) does not define HSC explicitly, the term is commonly taken to mean concrete with a compressive strength above 40–50 MPa. Most researchers regard LWHSC as concrete with a strength above 40 MPa and a density below 2000 kg/m3 (refs. 18, 19). Because LWHSC can be more economical than ordinary concrete, it is favored for high-rise buildings, modular construction, precast concrete structures, and marine structures. It suits modular construction because its main purpose is to reduce the self-weight of modular elements while preserving their high strength. Its low self-weight also allows precast members for roof framing and long-span bridge girders to be cast without the handling of heavy concrete elements or the production-related challenges encountered on-site14,20.

Cement, which accounts for 8% of global greenhouse gas emissions, is being replaced in the construction industry by sustainable by-products known as supplementary cementitious materials (SCMs)21,22. Economically, these materials are often more affordable than cement, which lowers the total cost of construction. Using them also improves the mechanical properties and durability of concrete, which may reduce maintenance costs over a structure’s lifetime. From an environmental standpoint, using these by-products in place of cement reduces the need for clinker, the primary component of cement and the one associated with high carbon dioxide emissions23,24. By lowering greenhouse gas emissions, this replacement supports international efforts to combat climate change25,26. Using these industrial by-products also diverts them from landfills, which reduces environmental pollution and promotes environmentally friendly waste management25,27. According to several investigations28,29, high-strength lightweight concrete (HSLWC) produced with silica fume (SF) exhibited strong bond strength and chloride permeability comparable to NWC. Klic et al.30 produced HSLWC using SF and FA to lower density and boost strength. According to earlier research, 25-year-old LWC with pozzolanic materials such as SF, FA, and blast furnace slag (BS) had less chloride ion penetration than reference concrete without SCMs31,32. Wilson et al.27 found that adding 5–10% SF by weight of cement increased the resistance of HSLWC to freezing and thawing. The mechanical properties and durability of HSLWC were improved by the use of SF and metakaolin (MK)33, with even greater benefits at a lower water-to-binder ratio of 0.2534. By decreasing permeability and shrinkage, the addition of MK and BS improved the early-age strength and durability of HSLWC35,36. Bleeding and up-floating of LWA in HSLWC are successfully prevented by using an FA-BS-SF blend37. To counteract the negative effects of LWA, nano-silica has been added to HSLWC in addition to micro-size SCMs38. Results indicated a significant improvement in strength and durability with the addition of these by-products39.

Despite these advances, earlier research has substantial shortcomings that motivate the use of machine learning (ML) techniques. Traditional empirical models often struggle to capture the intricate, nonlinear interactions among SCM combinations, LWA types, and mix parameters. Moreover, many prior experimental investigations suffer from limited generalizability across material systems, inconsistent testing conditions, and small datasets. Recent studies have shown that ML techniques, including artificial neural networks (ANN), random forests (RF), and gradient boosting, can outperform traditional regression techniques in strength prediction and sustainability assessment40,41,42.

So far, only a few studies have examined the prediction of mechanical properties of LWHSC using sophisticated ML approaches, and even fewer have assessed the role of recycled concrete powder (RCP), SCMs, or recycled aggregates in a sustainability context43,44. Because experimental work can be laborious and costly, ML methods such as multi-expression programming (MEP) and RF offer an efficient alternative: they use existing data to predict the mechanical properties of LWHSC incorporating SCMs.

ML models are increasingly surpassing conventional methods and transforming the development of constitutive models. Their large capacity for processing information, their ability to identify interactions at several scales, and their versatility in handling different datasets all help to improve accuracy and efficiency45. Work on multi-expression programming (MEP) and RF for concrete prediction and analysis shows important developments in predictive modeling. MEP is an evolutionary algorithm that builds mathematical models from data, which makes it well suited to predicting complex material behavior, and it has been used in several studies to predict the mechanical characteristics of concrete. For example, Chen et al.46 created MEP models to predict the split tensile strength (TS) and elastic modulus (E) of waste-foundry-sand concrete (WFSC) and achieved strong correlations of R = 0.892 for TS and R = 0.996 for E. Using MEP to predict the 28-day compressive strength of fiber-reinforced self-compacting concrete (FR-SCC), Inqiad et al.47 demonstrated the predictive reliability of the method with an objective function (OF) value of 0.031. M. Khan et al.48 also used MEP to predict the slump, compressive strength, and elastic modulus of bentonite plastic concrete (BPC) and reported correlations of R = 0.9999, R = 0.9831, and R = 0.9300, respectively. Beyond concrete, Jalal et al.49 used MEP to model the compaction properties of expansive soils, including maximum dry density and optimum moisture content, using a dataset of 195 cases. They showed dependable predictive accuracy, confirmed by several statistical indicators (MAE, RMSE, NSE, R). Furthermore, Farooq et al.50 used the RF approach to estimate the compressive strength of high-strength concrete (HSC) and outperformed traditional algorithms, obtaining a coefficient of determination of R2 = 0.96 with low prediction errors. Although MEP and RF have shown promise in modeling concrete and geotechnical properties, more research is needed to determine how effectively they can be used together to predict the mechanical properties of LWHSC containing SCMs. To address this gap, the current study applies both MEP and RF models to provide robust predictive frameworks for LWHSC with SCMs, improving the interpretability and accuracy of predictions for mix design optimization.

Although LWHSC has been studied extensively, comprehensive studies that apply advanced machine learning algorithms to predict its mechanical properties are still lacking. In particular, few studies have used machine learning to investigate the mechanical behavior of LWHSC containing supplementary cementitious materials (SCMs). The inability of traditional empirical and statistical models to capture the intricate relationships between material composition and mechanical characteristics impedes the development of more effective formulations for advanced structural applications. To address these gaps, this study collected an extensive dataset from the existing literature to develop constitutive models for predicting the mechanical properties of LWHSC. The findings have the potential to advance sustainable construction by providing practical solutions for the construction industry’s transition to greener materials and practices. Table 1 summarizes major studies to give a more complete picture of recent machine learning applications in sustainable concrete.

Table 1 A summary of recent studies on ML-based prediction of concrete mechanical characteristics.

Research methodology

Multi expression programming

MEP is a robust linear variant of genetic programming (GP) that stores solutions in linear chromosomes. Like gene expression programming (GEP), MEP encodes multiple solutions within a single chromosome, which improves efficiency55. The chromosome is represented by its fittest expression, identified by comparing fitness function values. Parents are selected by binary tournament and recombined to create two distinct offspring56. The offspring are then mutated, and the cycle is repeated until the best program is identified and the predefined termination criteria are met, as illustrated in Fig. 1, while Fig. 2 shows the architecture of MEP. Calibrating an MEP model requires adjusting key parameters, including code length, subpopulation size, number of subpopulations, function set, and crossover probability57. Notably, increasing the number of subpopulations extends processing time, while the complexity of the resulting MEP formulation is strongly influenced by code length.

MEP offers several advantages over other genetic techniques, such as genetic programming (GP). GP uses tree crossover to evolve a large number of parse trees, which increases processing time and storage needs58. Furthermore, because the parse tree in GP serves as both genotype and phenotype, it is difficult to obtain a simple formulation for the intended task. MEP allows a broad variety of expressions and exhibits implicit parallelism. MEP can also store many solutions to a problem on a single chromosome56,59, and its linear representation lets it distinguish between phenotype and genotype60. Because it encodes many candidate solutions on a single chromosome, MEP is expected to search for a viable solution more effectively than comparable algorithms. Unlike earlier GP algorithms, MEP provides clear decoding operations and pays special attention to cases where the complexity of the target expression is not known in advance59. MEP can also handle difficulties such as division by zero and invalid expressions61. Multi-gene genetic programming (MGGP) and MEP are both extensions of traditional GP designed to address difficult optimization problems. While their methodologies are similar, they differ significantly in how solutions are represented and evolved. In MGGP, an individual is represented by a collection of genes, each of which may encode a distinct subcomponent or module of the solution48,62. These genes can be trees or other structures relevant to the problem domain. In MEP, an individual is represented as a set of expressions, which are often linear or matrix-based63. Each expression contributes to the overall result and can be examined independently. MGGP typically applies genetic operators such as crossover and mutation at the gene level64, meaning that crossover and mutation can take place inside individual genes, so that subcomponents of the solution are swapped or changed. MEP, on the other hand, commonly uses expression-level mutation operators, which modify individual expressions or parts of expressions to generate new candidate solutions. Crossover in MEP may involve exchanging complete expressions between individuals56.
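The idea that one MEP chromosome encodes several candidate expressions can be sketched in a few lines of Python. This is a minimal illustration only, not the MEPX implementation: the gene layout, the protected operators, and the toy data are assumptions made for the example. Each gene is either an input variable or an operator that refers to earlier genes, so decoding the chromosome once yields one expression per gene, and the chromosome is scored by its best-fitting gene.

```python
import math
import operator

# Protected operators are an assumption of this sketch: they keep invalid
# operations (division by zero, log of zero) finite.
def protected_div(a, b):
    return a / b if abs(b) > 1e-12 else 1.0

def protected_ln(a):
    return math.log(abs(a)) if abs(a) > 1e-12 else 0.0

BINARY_OPS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": protected_div}
UNARY_OPS = {"ln": protected_ln}

def decode(chromosome, inputs):
    """Evaluate every gene; gene k may only reference genes with index < k."""
    values = []
    for gene in chromosome:
        if gene[0] == "x":                       # terminal: input variable
            values.append(inputs[gene[1]])
        elif gene[0] in UNARY_OPS:               # unary operator on an earlier gene
            values.append(UNARY_OPS[gene[0]](values[gene[1]]))
        else:                                    # binary operator on two earlier genes
            values.append(BINARY_OPS[gene[0]](values[gene[1]], values[gene[2]]))
    return values

def best_expression(chromosome, samples, targets):
    """Return (gene index, mean absolute error) of the best-fitting gene."""
    errors = [0.0] * len(chromosome)
    for inputs, target in zip(samples, targets):
        for k, value in enumerate(decode(chromosome, inputs)):
            errors[k] += abs(value - target)
    best = min(range(len(chromosome)), key=lambda k: errors[k])
    return best, errors[best] / len(samples)

# Toy chromosome over two inputs x0 and x1 (illustrative only).
chromosome = [
    ("x", 0),       # gene 0: x0
    ("x", 1),       # gene 1: x1
    ("+", 0, 1),    # gene 2: x0 + x1
    ("*", 2, 0),    # gene 3: (x0 + x1) * x0
    ("/", 3, 1),    # gene 4: (x0 + x1) * x0 / x1
]
samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
targets = [(a + b) * a for a, b in samples]      # data generated by gene 3
best_gene, best_mae = best_expression(chromosome, samples, targets)
print(best_gene, best_mae)
```

In a full MEP run, selection, crossover, and mutation would act on a population of such chromosomes; only the decoding and fitness assignment are shown here.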

Fig. 1 Flowchart of MEP algorithm.

Fig. 2 MEP algorithm architecture.

Random forest regression

Breiman proposed RF in 2001 (ref. 65); it is regarded as an enhanced classification and regression method. Its fundamental characteristics include speed and flexibility in establishing a relationship between inputs and outputs. RF also tends to perform well on large datasets. RF builds numerous decision trees during the training phase and combines their predictions to achieve more accurate and dependable results, as shown in Fig. 3 (ref. 66). RF has been employed in a variety of fields, including banking to forecast client responses67, stock market price direction68, medicine and pharmaceuticals69, and e-commerce70. An RF is formed by combining many tree predictors, with a random vector governing the construction of each tree. This vector is sampled independently and with the same distribution for all trees in the forest65. Each tree in the forest is trained on its own training set, so the trees may give somewhat different results71. During the training phase, each tree is constructed from a randomly chosen subset of the training data, known as a bootstrap sample, which means that some data points may be repeated while others are left out. Furthermore, RF employs feature randomization, which selects a random subset of features at each node split. This technique reduces overfitting and increases the diversity of the trees. RF evaluates its training performance using the out-of-bag (OOB) error72, obtained by evaluating each tree’s predictions on the data points not included in its bootstrap sample. The OOB error provides a good estimate of the model’s generalization error. For regression tasks, the final output is produced by averaging the predictions of the individual trees73,74.

The RF approach is well suited to regression tasks for several reasons. It is known for its robustness to overfitting and its high accuracy. It requires little tuning and performs well on a range of datasets75. Additionally, since the individual trees can be built independently, RF can be parallelized. This makes RF suitable for training in distributed computing environments and for handling large datasets76.
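The bootstrap sampling, prediction averaging, and OOB error described above can be illustrated with a deliberately small sketch. It is not the Scikit-Learn implementation: the base learner is a one-split regression tree (a stump) on a single feature, so per-split feature randomization is omitted, and the data are synthetic. All function names are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree on one feature by minimizing squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                                  # no threshold between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:                                  # all x identical: predict the mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]                                   # (threshold, left mean, right mean)

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample and remember which rows it saw."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # sampling with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        forest.append((stump, set(idx)))
    return forest

def predict_forest(forest, x):
    """Regression output: average of the individual tree predictions."""
    return sum(predict_stump(s, x) for s, _ in forest) / len(forest)

def oob_mse(forest, xs, ys):
    """Score each row using only the trees whose bootstrap sample excluded it."""
    sq_errors = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        preds = [predict_stump(s, x) for s, used in forest if i not in used]
        if preds:
            sq_errors.append((sum(preds) / len(preds) - y) ** 2)
    return sum(sq_errors) / len(sq_errors)

# Synthetic step-shaped data (illustrative only).
xs = [i / 10 for i in range(40)]
ys = [10.0 if x < 2.0 else 30.0 for x in xs]
forest = bagged_stumps(xs, ys)
oob = oob_mse(forest, xs, ys)
print(oob, predict_forest(forest, 0.5), predict_forest(forest, 3.5))
```

Because every row is scored only by trees that never saw it, the OOB error behaves like a built-in validation estimate and needs no separate hold-out set.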

Fig. 3 RF algorithm flowchart.

Datasets used in modeling

This study uses MEP and RF machine learning methods. To train and test the models, a comprehensive database was compiled from experimental work reported in the literature worldwide. The database contains 170 data points for compressive strength (CS)18,77,78,79,80,81,82,83,84,85,86,87, 161 for flexural strength (FS)19,79,80,81,83,86,87,88,89,90,91,92,93,94,95, and 186 for split tensile strength (TS)19,33,78,79,81,82,83,84,86,87,89,92,95,96,97, all expressed in megapascals (MPa). The supplementary material contains the dataset used for the modelling. Eight input variables were considered: water-to-binder ratio (W/B), cement (C), fly ash (FA), blast furnace slag (BS), silica fume (SF), natural fine aggregate (NFA), lightweight aggregate (LWA), and basalt fiber (BF) content. Table 2 presents descriptive statistics that give additional insight into the database. CS, based on 170 observations, ranges from 19.65 MPa to 86.10 MPa with a mean of 46.79 MPa, reflecting substantial variability in structural performance. FS, based on 161 data points, ranges from 0.73 MPa to 13.35 MPa with a mean of 5.57 MPa, while TS, based on 186 observations, ranges from 0.46 MPa to 12.82 MPa with a mean of 3.72 MPa. The skewness values for all variables fall within the acceptable interval of −3 to +3, indicating appropriate distributional symmetry98, and the kurtosis values lie within the recommended range of −10 to +10, indicating an absence of pronounced outliers or extreme deviations from normality. Collectively, these descriptive statistics underscore the diversity and balance of the dataset and support its suitability for subsequent statistical modeling and in-depth analysis99,100.
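As a rough illustration of the screening described above, the sketch below computes the descriptive statistics and checks them against the ±3 skewness and ±10 kurtosis limits. The source does not say which estimators were used, so the population (biased) moment formulas and excess kurtosis are assumptions, and the sample values are made up for the example rather than taken from the database.

```python
import math

def describe(values):
    """Basic descriptive statistics using population (biased) central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,   # excess kurtosis (normal distribution = 0)
    }

def within_limits(stats, skew_limit=3.0, kurt_limit=10.0):
    """Apply the screening limits quoted in the text."""
    return abs(stats["skewness"]) <= skew_limit and abs(stats["kurtosis"]) <= kurt_limit

# Hypothetical CS values in MPa (not the study's data).
cs_values = [19.65, 32.4, 41.0, 46.8, 52.3, 60.1, 86.10]
stats = describe(cs_values)
print(stats, within_limits(stats))
```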

Table 2 Detailed statistics about the dataset.

Furthermore, Pearson correlation heat maps were generated to assess the interrelationships among input variables for CS, TS, and FS, as shown in Fig. 4. Understanding these correlations is essential, as strong associations between predictors can lead to multicollinearity, which complicates the interpretation of regression and machine learning models101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118. The analysis indicated that most pairwise correlation coefficients (r-values) remained below the recommended threshold of 0.8, suggesting a low risk of multicollinearity and ensuring more reliable model estimation49,119,120. These results confirm that the dataset maintains an acceptable level of independence among its variables, thereby supporting the development of robust and interpretable predictive models.
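The multicollinearity check described above amounts to computing Pearson's r for every pair of inputs and flagging pairs whose absolute value exceeds 0.8. A minimal sketch follows; the column values are hypothetical and the helper names are not from the source.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def flag_collinear(columns, threshold=0.8):
    """Return input pairs whose |r| exceeds the multicollinearity threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical mix-design columns (illustrative values only).
columns = {
    "W/B": [0.25, 0.30, 0.35, 0.40, 0.42],
    "C":   [510, 480, 450, 400, 390],
    "SF":  [40, 0, 25, 50, 10],
}
flagged = flag_collinear(columns)
print(flagged)
```

In this toy example only the W/B and cement columns are flagged, because they were constructed to move in opposite directions.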

Fig. 4 Pearson correlation heat maps: (a) CS (b) TS (c) FS.

The performance of a prediction model is known to depend on the pattern and distribution of the input data. Figure 5 illustrates the frequency distribution of the data. The histograms show that the observations are spread over a broad range of values rather than being uniformly distributed, which suggests that the models are applicable over a wide range of data.

The influence of input parameters on the mechanical properties of LWHSC was investigated using Hex contour plots for CS, FS, and TS, as shown in Figs. 6, 7 and 8. The darker regions in these figures indicate areas with a higher concentration of input variables. For CS (Fig. 6), the W/B ratio (Fig. 6a) was most densely populated between two regions, from 0.23 to 0.30 and from 0.37 to 0.42, indicating a preference for lower ratios in effective mixtures. Cement content (Fig. 6b) commonly ranged from 390 to 510 kg/m3, while FA (Fig. 6c) and slag (Fig. 6d) were concentrated between 25 and 50 kg/m3 and 20–50 kg/m3, respectively. SF (Fig. 6e) was primarily clustered between 4% and 10%. NFA and LWA (Fig. 6f, g) showed concentrations in the ranges of 300–1300 kg/m3 and 200–800 kg/m3, respectively. BF content (Fig. 6h) was most commonly used between 0.5% and 1.0%. A similar pattern was observed for FS (Fig. 7), where the W/B ratio (Fig. 7a) was concentrated between 0.26 and 0.42, and cement content (Fig. 7b) ranged from 300 to 520 kg/m3. FA and slag (Fig. 7c, d) were frequently used in the 40–80 kg/m3 and 10–40 kg/m3 ranges, respectively, while silica fume (Fig. 7e) remained between 6% and 60%. NFA and LWA (Fig. 7f, g) showed similar distributions to those in CS, and BF (Fig. 7h) was densely populated between 0.8% and 1.0%. For TS (Fig. 8), the W/B ratio (Fig. 8a) was mainly between 0.20 and 0.43, cement content (Fig. 8b) was found between 380 and 500 kg/m3, and both FA and slag (Fig. 8c and d) showed concentrations in the 40–100 kg/m3 and 20–40 kg/m3 intervals, respectively. SF (Fig. 8e) was again centered around 25–80%, with NFA and LWA aggregates (Fig. 8f, g) following previous patterns. BF (Fig. 8h) remained most frequent between 0.8% and 1.0%. With these input parameters, the values of CS generally ranged from 25 to 70 MPa, with the majority concentrated between 35 and 50 MPa. 
FS values ranged approximately from 2.5 to 7.5 MPa, predominantly clustering around 5 to 6 MPa, while TS values fell between 2 and 5 MPa, with a dense concentration near 4 MPa. These trends suggest that optimized input ranges consistently support enhanced mechanical performance across all strength categories.

Fig. 5 Frequency histograms of variables: CS, TS, and FS.

Fig. 6 Hex contour plot of input parameters for CS; (a) W/B; (b) Cement; (c) Fly ash; (d) Slag; (e) Silica Fume; (f) NFA; (g) LWA; (h) Basalt Fiber.

Fig. 7 Hex contour plot of input parameters for FS; (a) W/B; (b) Cement; (c) Fly ash; (d) Slag; (e) Silica Fume; (f) NFA; (g) LWA; (h) Basalt Fiber.

Fig. 8 Hex contour plot of input parameters for TS; (a) W/B; (b) Cement; (c) Fly ash; (d) Slag; (e) Silica Fume; (f) NFA; (g) LWA; (h) Basalt Fiber.

MEP and RF models development

Before creating a model, it is critical to select carefully the input variables that have a major impact on the properties of LWHSC77,81. The most influential factors reported in the literature were therefore examined. The eight factors selected were water-to-binder ratio (W/B), cement (C), fly ash (FA), blast furnace slag (BS), silica fume (SF), natural fine aggregate (NFA), lightweight aggregate (LWA), and basalt fiber (BF) content. The mechanical strength prediction models were thus created as functions of the variables given in Eq. (1).

$$\text{CS},\ \text{FS},\ \text{TS}=f\left(\text{W/B},\ \text{C},\ \text{FA},\ \text{BS},\ \text{SF},\ \text{NFA},\ \text{LWA},\ \text{BF}\right)$$
(1)

Figure 9 shows the methodology used in this investigation. Several MEP setup parameters must be chosen before a valid and adaptive model can be developed. They were determined from earlier recommendations and by trial and error121. The population size determines how many programs are generated. A model evolved with a large population may be more complex, but it is generally more precise and reliable, at the cost of a longer time to convergence. However, if the size exceeds a certain range, the model may overfit. Table 3 lists the setup parameters used for the models built in this study. The function set uses simple mathematical operators (ln, exp, -, ×, ÷, +) to keep the final formulations simple. The number of generations controls how far the search is refined before it is terminated; running more generations generally yields a model with lower error. Various parameter combinations were tried, and the combination that produced the model with the fewest errors was chosen. The primary difficulty in ML prediction is overfitting: the model performs very well on the data used to build it but much worse on unseen data. To guard against overfitting, it has been recommended that the model be tested on previously unseen data121,122. The data were therefore divided into two subsets, 70% for training and 30% for testing, and after training each model was tested on the subset that was not used to train it. The resulting models perform well on both subsets. The MEPX tool (version 2023.3.5) was used in this work to perform MEP modeling.
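The 70/30 division described above can be sketched as follows. The source does not state how records were assigned to each subset, so the random shuffle, the fixed seed, and the function name are assumptions of this example; the integers stand in for the 170 CS records.

```python
import random

def train_test_split(records, test_fraction=0.30, seed=42):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(170))             # stand-ins for the 170 CS mixes
train, test = train_test_split(records)
print(len(train), len(test))
```

Keeping the test records entirely out of training is what allows the testing-phase statistics to be read as an estimate of performance on unseen mixes.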

Initially, the modeling process produces the best solutions for the starting population. The procedure is then repeated, with each iteration moving closer to the final result, and the fitness of each successive generation is evaluated. MEP modeling continues until the fitness value no longer changes. If the results are not sufficiently accurate, the run is repeated while gradually increasing the population size and adjusting the other hyperparameters. After the fitness function of each model has been evaluated, the model with the lowest fitness value (that is, the lowest error) is selected.

The Random Forest (RF) model was developed using Python’s Scikit-Learn library, with hyperparameters optimized through an iterative tuning process to enhance predictive accuracy. The number of estimators, each representing a decision tree, was balanced between performance improvements and computational cost. This aligns with findings from Liu et al., where RF was successfully employed to predict the compressive strength of high-performance concrete, yielding robust results123. Moreover, RF was shown to outperform other regression techniques in predicting compressive strength of self-compacting concrete, demonstrating its effectiveness in modeling complex mix designs124. In eco-friendly concrete incorporating rice husk ash, RF achieved strong predictive capabilities (with accuracy often surpassing Gaussian Process Regression and Decision Tree models)125. Additionally, RF demonstrated high reliability in estimating the compressive strength of sustainable self-consolidating concrete containing SCMs like fly ash and GGBS126. These studies reinforce RF’s suitability and that of its tuned hyperparameters for predicting mechanical behaviors in SCM-based concrete systems. Table 3 summarizes the final optimized hyperparameters used in this study.

Table 3 MEP and RF algorithms setup parameters.
Fig. 9 Flowchart of research methodology adopted in the present study.

MEP and RF models evaluation

Several measures were used to assess the viability and performance of the models, each of which quantifies performance in its own way. The developed models were assessed using three statistical error metrics that are widely used in machine learning: mean absolute error (MAE), mean squared error (MSE), and root mean square error (RMSE)46,127,128. A lower RMSE, MSE, or MAE value indicates a more precise and accurate model. Equations (2)–(4) define these metrics. In addition, the coefficient of determination R2 was calculated for each model.

$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|e_{i}-m_{i}\right|$$
(2)
$$MSE=\frac{1}{n}\sum_{i=1}^{n}{\left(e_{i}-m_{i}\right)}^{2}$$
(3)
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left(e_{i}-m_{i}\right)}^{2}}$$
(4)

The symbols “m_i” and “e_i” represent the model and experimental values for sample i, respectively, and “n” denotes the total number of samples. R2 was calculated between the experimental and predicted values as an overall measure of fit; values approaching 1 indicate a better-performing model129. RMSE penalizes larger errors more heavily, so a lower RMSE indicates fewer large errors and better model performance130. MAE, on the other hand, is more informative when applied to smooth and continuous data131.
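Equations (2)–(4) translate directly into code. The sketch below also includes R2 in its common form of one minus the ratio of residual to total sum of squares; the source does not give its R2 formula, so that definition is an assumption, and the sample values are illustrative only.

```python
import math

def mae(experimental, model):
    """Mean absolute error, Eq. (2)."""
    return sum(abs(e - m) for e, m in zip(experimental, model)) / len(experimental)

def mse(experimental, model):
    """Mean squared error, Eq. (3)."""
    return sum((e - m) ** 2 for e, m in zip(experimental, model)) / len(experimental)

def rmse(experimental, model):
    """Root mean square error, Eq. (4)."""
    return math.sqrt(mse(experimental, model))

def r_squared(experimental, model):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_e = sum(experimental) / len(experimental)
    ss_res = sum((e - m) ** 2 for e, m in zip(experimental, model))
    ss_tot = sum((e - mean_e) ** 2 for e in experimental)
    return 1.0 - ss_res / ss_tot

# Illustrative strengths in MPa (not the study's data).
experimental = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 48.0, 61.0, 69.0]
print(mae(experimental, predicted), mse(experimental, predicted),
      rmse(experimental, predicted), r_squared(experimental, predicted))
```

For these four points the errors are 2, 2, 1, and 1 MPa, so MAE is 1.5 MPa, MSE is 2.5 MPa², and RMSE is about 1.58 MPa; the gap between MAE and RMSE reflects the extra weight RMSE gives to the larger errors.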

Interpretation of models with SHAP analysis

Lundberg and Lee132 introduced SHAP (SHapley Additive exPlanations), a method for interpreting machine learning models. The capacity of ML models to learn from observed data and predict outcomes for new data has driven many efforts to create robust estimation tools. However, most ML modeling approaches suffer from high complexity and limited interpretability, and the SHAP technique improves the explainability of prediction models. Although much ML research on construction materials has achieved high accuracy in predicting its targets, insufficient attention has been paid to interpreting the ML models. Many studies compute feature importance in ML models using the decision path and other intuitive techniques133. SHAP was developed to evaluate the relative importance of every input variable to the output and to check whether each input variable influences the output positively or negatively132,136,137,138,139,140,141,142,143. References134,135 provide a full discussion of the technique. The Shapley value indicates the contribution of each independent variable to the response. The approach is analogous to a parametric analysis in which one variable is altered while the other parameters are held fixed to evaluate how changes in that input affect the output. In the present study, SHAP is used to assess both the importance of each input parameter and the direction of its influence on the response. Based on cooperative game theory, the SHAP value represents the average marginal contribution of a feature value across all possible combinations of features. It measures how much the prediction for an observation with that feature value departs, on average, from the mean prediction over all observations. To enable interpretability, the output of an ML model is then expressed as a base value plus a linear sum of the SHAP values of its input features.
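The game-theoretic definition can be made concrete with an exact Shapley computation for a tiny model. This is a sketch under stated assumptions, not the SHAP library: "missing" features are replaced by a fixed baseline value (SHAP itself averages over background data), and the three-input strength model, its coefficients, and the input values are invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical strength model (MPa) in W/B, cement, and silica fume."""
    wb, cement, sf = z
    return 80.0 - 100.0 * wb + 0.02 * cement + 0.1 * sf + 0.001 * cement * sf

x = [0.30, 450.0, 40.0]          # mix to explain (illustrative)
baseline = [0.40, 400.0, 0.0]    # reference mix (illustrative)
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The contributions sum to the difference between the prediction for the explained mix and the prediction for the baseline, which is the additive property that makes the decomposition interpretable. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.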

Results and discussion

Outcomes of MEP modeling

Regression analysis

Figures 10 and 11 compare the experimental and predicted CS of LWHSC using the MEP model, showing strong agreement with R2 values of 0.98378 (training) and 0.98383 (testing) and Pearson’s r of 0.99186 and 0.99188, respectively. Similarly, Figs. 12 and 13 present the TS results, with R2, adjusted R2, and Pearson’s r of 0.99307, 0.99302, and 0.99653 for training and 0.99212, 0.99197, and 0.99605 for testing, indicating excellent predictive accuracy. Figures 14 and 15 show the FS comparisons, where R2, adjusted R2, and Pearson’s r were 0.94331, 0.9428, and 0.97124 for training and 0.95681, 0.95587, and 0.97817 for testing, confirming strong model reliability.

These findings are consistent with prior studies that used machine learning methods to predict concrete characteristics. Inqiad et al.47 found that MEP was highly accurate in predicting the 28-day compressive strength of fiber-reinforced self-compacting concrete, with R2 values exceeding 0.97. Similarly, Khan et al.48 demonstrated the ability of MEP to predict the workability and mechanical properties of bentonite plastic concrete with R values above 0.98, highlighting its robustness in capturing nonlinear relationships. Furthermore, predictive studies on expansive soils using MEP49 confirmed its high dependability and narrow error margins, which are consistent with the accuracy found in the current study for LWHSC. These comparisons indicate that MEP not only yields accurate predictions in the present work but also performs consistently well across other cementitious and geotechnical materials, supporting its usefulness in real-world engineering applications.

Fig. 10

Comparing the MEP training prediction model to CS experimental data.

Fig. 11

Comparing the MEP testing prediction model to CS experimental data.

Fig. 12

Comparing the MEP training prediction model to TS experimental data.

Fig. 13

Comparing the MEP testing prediction model to TS experimental data.

Fig. 14

Comparing the MEP training prediction model to FS experimental data.

Fig. 15

Comparing the MEP testing prediction model to FS experimental data.

Error analysis

This section shows the absolute error that exists between the actual and model values for the given datasets. These plots provide a glimpse of the maximum error occurrence in the MEP model. Figures 16 and 17 illustrate the 3D error distributions for MEP-predicted CS, TS, and FS of LWHSC during the training and testing phases, respectively. In the training phase (Fig. 16), CS errors range from 0.002 to 7.8 MPa, averaging 1.2 MPa, with most values under 3 MPa. TS errors fall between 0.0005 and 0.68 MPa (average 0.25 MPa), while FS errors range from 0.001 to 2.4 MPa (average 0.8 MPa). In the testing phase (Fig. 17), CS errors vary from 0.25 to 6.0 MPa, generally staying below 3.0 MPa. TS errors range from 0.00 to 0.50 MPa, mostly under 0.30 MPa, and FS errors lie between 0.00 and 2.0 MPa, with most values below 1.0 MPa. These results demonstrate that prediction errors remain consistently low in both phases, highlighting the MEP model’s high accuracy and reliable generalization across all mechanical properties.
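The error ranges quoted above are summaries of per-sample absolute errors. A minimal sketch of such a summary is given below; the function name, the threshold argument, and the sample values are hypothetical and are not taken from the study's dataset.

```python
import numpy as np


def absolute_error_summary(y_true, y_pred, threshold):
    """Minimum, maximum, and mean absolute error, plus the share of errors below a threshold."""
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return {
        "min": float(err.min()),
        "max": float(err.max()),
        "mean": float(err.mean()),
        "share_below": float(np.mean(err < threshold)),
    }


# Hypothetical compressive strengths in MPa, for illustration only.
y_true = [42.0, 55.5, 61.0, 48.2]
y_pred = [41.0, 57.0, 61.5, 48.2]
summary = absolute_error_summary(y_true, y_pred, threshold=1.0)
```

Reporting the share of samples below a stated threshold makes statements such as "most values under 3 MPa" quantitative.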

Fig. 16

Error representation in MEP (a) CS, (b) TS and (c) FS predicted and actual of training data set.

Fig. 17

Error representation in MEP (a) CS, (b) TS and (c) FS predicted and actual of testing data set.

Outcomes of RF model

Regression analysis

Figures 18 and 19 illustrate the comparison between experimental and predicted CS of lightweight high-strength concrete (LWHSC) using the RF model, showing strong agreement with R2 values of 0.9065 (training) and 0.87558 (testing) and Pearson’s r of 0.9521 and 0.93572, respectively. Figures 20 and 21 present the TS results, where the model achieved R2, adjusted R2, and Pearson’s r values of 0.85115, 0.8500, and 0.92258 for training, and 0.80237, 0.79864, and 0.89575 for testing, indicating substantial predictive accuracy. Figures 22 and 23 compare predicted and experimental FS, with R2, adjusted R2, and Pearson’s r values of 0.92445, 0.92377, and 0.96148 for training, and 0.94441, 0.9432, and 0.97181 for testing, confirming the model’s strong reliability in estimating LWHSC properties.

Notably, the performance of our RF model is supported by recent literature, which has demonstrated that RF can provide good prediction accuracy for concrete strength estimation. Xu et al.144 employed RF to estimate compressive strength of high-performance concrete and found R2 = 0.93, indicating reliable prediction capacity. Khan et al.145 found R2 = 0.9922 when using RF to estimate compressive strength in recycled coarse aggregate concrete, indicating strong alignment with experimental results in similar circumstances. Zhang et al.146 employed RF and other ML models with hyperparameter tuning and found robust generalization for recycled aggregate concrete, which is consistent with the significant prediction success seen in our investigation. These findings collectively reinforce that, while RF has slightly poorer precision than MEP in our results, its performance is nevertheless comparable to top-tier RF applications in the existing literature.

Fig. 18

RF prediction model comparison with corresponding experimental CS data of training set.

Fig. 19

RF prediction model comparison with corresponding experimental CS data of testing set.

Fig. 20

RF prediction model comparison with corresponding experimental TS data of training set.

Fig. 21

RF prediction model comparison with corresponding experimental TS data of testing set.

Fig. 22

RF prediction model comparison with corresponding experimental FS data of training set.

Fig. 23

RF prediction model comparison with corresponding experimental FS data of testing set.

Error analysis

Figures 24 and 25 present the error distributions of the RF model for predicting the compressive strength (CS), split tensile strength (TS), and flexural strength (FS) of LWHSC in both training and testing phases. During training (Fig. 24), the RF model showed high accuracy, with CS errors ranging from 0.00 to around 5.2 MPa, most of which were below 2.0 MPa. TS errors were between 0.00 and 0.40 MPa, while FS errors fell within 0.00 to 2.0 MPa. The predictions closely matched the actual values, indicating a precise fit. In the testing phase (Fig. 25), the model continued to perform well, with CS errors ranging from 0.00 to 6.0 MPa and generally staying under 3.0 MPa. TS errors varied between 0.00 and 0.45 MPa, and FS errors ranged from 0.00 to 2.2 MPa. Although testing errors showed a slightly wider distribution, especially for FS, most values still remained near the actual results. Overall, the RF model delivered strong predictive performance in both phases, maintaining low errors and demonstrating its reliability in estimating all key strength properties.

Fig. 24

Error representation in RF (a) CS, (b) TS, and (c) FS predicted and actual of training data set.

Fig. 25

Error representation in RF (a) CS, (b) TS, and (c) FS predicted and actual of testing data set.

Performance evaluation of MEP and RF models

The number of data points used to build a model is important because it influences the model’s validity. A satisfactory model requires the ratio of data points to input variables to be at least 3, with a ratio of 5 preferred147,148,149. This work maintains a ratio of 21 for CS, 23 for TS, and 20 for FS. As previously stated, the performance of all models is evaluated using multiple statistical measures (R2, MAE, MSE, and RMSE), computed on both the training and testing datasets to measure the effectiveness of the developed models. The values of all error measures for the MEP and RF algorithms are provided in Table 4 and illustrated by the radar plots in Figs. 26 and 27, respectively. The table indicates a good correlation between the model-estimated and actual values, as the R2 values are close to 1 (the ideal condition). The MAE, MSE, and RMSE values for all datasets and both algorithms are low, which indicates the good precision and generalization ability of the MEP and RF models. All the statistical checks were within the limits specified by earlier studies87.
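For reference, the error measures and the data-to-input ratio discussed above can be computed as in the following sketch. The function names and sample values are hypothetical, and the record count of 168 is chosen only so that eight inputs reproduce a ratio of 21; it is not a figure reported in this study.

```python
import numpy as np


def error_metrics(y_true, y_pred):
    """Mean absolute error, mean squared error, and root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = float(np.mean(np.abs(diff)))
    mse = float(np.mean(diff ** 2))
    rmse = float(np.sqrt(mse))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse}


def data_to_input_ratio(n_records, n_inputs):
    """Number of data points per input variable."""
    return n_records / n_inputs


# Hypothetical strengths in MPa, for illustration only.
metrics = error_metrics([42.0, 55.5, 61.0, 48.2], [41.0, 57.0, 61.5, 48.2])

# Hypothetical record count: 168 records with 8 inputs give a ratio of 21.
ratio = data_to_input_ratio(168, 8)
```

Because MSE is expressed in squared units, RMSE and MAE are usually the easier quantities to compare against the strength values themselves.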

Fig. 26

Radar plots presenting the performance of MEP testing and training models: (a) CS, (b) TS, (c) FS.

Fig. 27

Radar plots presenting the performance of RF testing and training models: (a) CS, (b) TS, (c) FS.

Table 4 Various statistical calculations of the MEP and RF models.

Comparative analysis of the performance of established models

Figure 28 depicts the Taylor diagram, which is useful for a more in-depth comparison of the established models’ performance. Taylor150 invented the Taylor diagram, which is a visual representation that aids in determining the accuracy of several models by demonstrating which model is most realistic and best aligns with the actual data. In a Taylor diagram, different models or datasets are shown on the same graph. Their relative performance can be measured based on how well they align with the reference data (actual dataset)151. Furthermore, this diagram provides a comprehensive method for comparing multiple models based on their correlation, variance, and RMSE from reference data. It provides a thorough analysis of the models’ correlation and standard deviation with the observed data152,153. Figure 28 illustrates each model as a point, with its performance represented by the distance from the reference point. Higher correlation and fewer errors indicate that a model is closer to the reference point, and hence more accurate in predicting outcomes154,155.
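The three quantities summarized in a Taylor diagram are linked by a law-of-cosines relation, which is what allows them to be drawn on a single plot. The sketch below computes them for one model; it assumes population standard deviations and the centered (bias-removed) RMS difference, and the function name and sample data are hypothetical.

```python
import numpy as np


def taylor_statistics(reference, model):
    """Standard deviations, correlation, and centered RMS difference for a Taylor diagram."""
    ref = np.asarray(reference, dtype=float)
    mod = np.asarray(model, dtype=float)
    sigma_ref = float(ref.std())  # population standard deviation (ddof=0)
    sigma_mod = float(mod.std())
    corr = float(np.corrcoef(ref, mod)[0, 1])
    # Centered RMS difference: the means are removed before comparing.
    crmsd = float(np.sqrt(np.mean(((mod - mod.mean()) - (ref - ref.mean())) ** 2)))
    return sigma_ref, sigma_mod, corr, crmsd


# Hypothetical experimental and predicted strengths in MPa.
ref = [30.0, 45.0, 52.0, 61.0, 70.0, 38.0]
mod = [31.5, 44.0, 50.5, 62.0, 68.0, 39.5]
sigma_ref, sigma_mod, corr, crmsd = taylor_statistics(ref, mod)

# Law-of-cosines identity underlying the diagram geometry.
lhs = crmsd ** 2
rhs = sigma_ref ** 2 + sigma_mod ** 2 - 2.0 * sigma_ref * sigma_mod * corr
```

In the diagram, the distance from a model's point to the reference point equals this centered RMS difference, so a model with high correlation and a standard deviation close to the reference plots nearest to it.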

Figure 28 presents Taylor diagrams comparing the MEP and RF models for CS, TS, and FS. In each case, the MEP model demonstrates a stronger match with the experimental data. For CS, MEP achieves a high correlation coefficient of approximately 0.99 and a standard deviation of about 13.95, which closely matches the experimental standard deviation (14). In contrast, the RF model shows a lower correlation of around 0.92 and a smaller standard deviation of 13.1, indicating weaker alignment with the data. A similar trend is observed in TS prediction, where MEP attains a correlation of 0.985 and a standard deviation of 1.525, both very close to the actual values, while RF lags slightly with a correlation of 0.96 and a standard deviation of 1.475. For FS, the MEP model again performs better, with a correlation near 0.99 and a standard deviation of 2.625, closely matching the experimental 2.64, whereas RF records a lower correlation of 0.94 and a standard deviation of 2.57. These results confirm that MEP consistently provides higher accuracy, better variance matching, and improved predictive reliability. Therefore, it can be concluded that the MEP model significantly outperforms the RF model across all strength parameters, making it the more robust and dependable choice for modeling the mechanical properties of LWHSC.

Fig. 28

Taylor diagram comparing the performance of the models.

Comparing MEP and RF models with related work

The MEP and RF models developed in the present work are compared with similar models previously reported in the literature for estimating the properties of LWHSC, as presented in Table 5. Sifan et al.156 predicted the compressive strength of LWHSC using the Gradient Boosting Regression (GBR) ML technique. Their work reported an R2 value of 0.95, which is lower than that of the present study. Similarly, Kumar et al.157 forecasted the CS of LWHSC using Gaussian Process Regression (GPR), Ensemble Learning (EL), Support Vector Machine Regression (SVMR), and optimized GPR, SVMR, and EL. Their study showed lower performance than the present study. Furthermore, Yaser et al.158 developed a deep learning model that identified important variables such as the water-to-cement ratio, aggregate ratios, and superplasticizer content in order to predict the compressive strength (UCS) of lightweight pumice concrete. The model outperformed the Multilayer Perceptron (MLP), Support Vector Machine (SVM), and Decision Tree (DT) models with R2 = 0.914, accuracy = 0.97, and AUC = 0.971. Our MEP model, in contrast, demonstrated greater accuracy, particularly for compressive strength (R2 = 0.98–0.99). Additionally, using 2,568 samples, NS Alghrairi et al.159 developed nine models to forecast the compressive strength (CS) of lightweight concrete (LWC) with and without nanomaterials. Density, water content, cement, water-to-binder ratio, and nanomaterial content were among the input factors. The nine models were Gradient-Boosted Trees (GBT), Random Forest (RF), Tree Ensemble (TE), Extreme Gradient Boosting (XGB), Keras Neural Network (KNN), Simple Regression (SR), Probabilistic Neural Network (PNN), Multilayer Perceptron (MLP), and Linear Regression (LR), with the best reported performance being R2 = 0.90 and RMSE = 5.286. Our MEP model, by comparison, demonstrated better prediction performance for LWC strength, achieving higher accuracy (R2 = 0.98–0.99).
In a comparable fashion, Fazal Hussain et al.160 used 420 data points from 43 investigations to build a machine learning (ML)-based method for mix design optimization of lightweight aggregate concrete (LWAC). Eleven input parameters pertaining to mix and aggregate features were used to train five machine learning algorithms: Support Vector Machine (SVM), Artificial Neural Network (ANN), Decision Tree (DT), Gaussian Process Regression (GPR), and Extreme Gradient Boosting (XGBoost). The output was compressive strength. The GPR model outperformed the others (R2 = 0.99, RMSE = 1.34, MSE = 1.79, MAE = 0.69). By comparison, our MEP model achieved similarly high accuracy (R2 = 0.98–0.99) with good trend capture.

In addition to LWC and LWHSC, several researchers have used different machine learning methods to forecast the mechanical characteristics of other kinds of concrete, like rubberized concrete and high-strength concrete (HSC). Here is a summary of these studies for comparison as well.

Using inputs including cement, aggregate ratio, water, and superplasticizer, Farooq et al.50 applied supervised machine learning (ML) techniques, namely Random Forest (RF) and Gene Expression Programming (GEP), to estimate the compressive strength of high-strength concrete (HSC). The RF model demonstrated strong performance with R2 = 0.96, and GEP also generated good predictions along with an empirical relation between actual and estimated values. Their models showed good accuracy, but they were not as predictive as the MEP model created in this study. Similarly, David Sinkhonde and Destine Mashava et al.161 estimated the compressive strength of rubberized concrete incorporating brick powder using an ANN model with 6 input parameters. With an R2 value of 0.83, the ANN model performed significantly worse than the models in the present study. Senthil Vadivel et al.162 utilized XGBoost, CatBoost, Extra Trees regressor, and Bagging regressor algorithms to predict the 28-day compressive, split tensile, and flexural strength of rubberized concrete in which fine aggregate was replaced with fine rubber (FR) and coarse aggregate with coarse rubber (CR). With ML models using 5 input parameters, the best R2 value achieved was 0.98. Furthermore, Musa Adamu et al.163 employed the ANN algorithm to forecast the CS, STS, FS, and E properties of rubberized concrete incorporating fly ash and nano-silica, and achieved an R2 value of 0.99. Additionally, Jingkui Zhang et al.164 developed an ELM-based predictive model using 6 input parameters, whose performance was poorer than that achieved in the present study. Overall, the MEP model in this study matched or outperformed the models reported in the literature in predicting the compressive, flexural, and split tensile strength of LWHSC, while using more input variables. More specifically, the MEP model exhibited better statistical error values (MAE, RMSE, and MSE) than the ML algorithms in the previous literature. Hence, it can be concluded that MEP and RF machine learning algorithms can be used with confidence for the prediction of various mechanical properties of LWHSC.

Table 5 Previous modeling techniques used for LWHSC.

Analysis of the influence of input parameters on compressive strength

To understand how the various mixture components affect concrete strength, a SHAP (SHapley Additive exPlanations) study is crucial. It provides a thorough understanding of the average contribution, variability, and direction of influence of each input parameter, enabling enhanced mix design optimization to improve mechanical performance92,168. In this work, average SHAP values were used to assess the importance of input variables in predicting compressive strength (CS).

The mean SHAP values of the features obtained from the MEP and RF models are displayed in Fig. 29. In decreasing order of importance, the parameters are the W/B ratio, LWA, NFA, Cement, SF, FA, BF, and Slag. The W/B ratio has the greatest mean SHAP value (≈ 8), suggesting a major influence on model predictions, while LWA and NFA show moderate effects (≈ 4.3 and above). Cement, SF, and FA make smaller contributions (1–2.5), and BF and Slag contribute very little, likely because the dataset contains inadequate data for them.
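A feature ranking of this kind is typically obtained by averaging the absolute SHAP values of each feature over all samples. The sketch below assumes that convention, which may differ from the averaging used for Fig. 29; the function name and the small SHAP matrix are hypothetical and do not reproduce the study's values.

```python
import numpy as np


def rank_by_mean_abs_shap(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value (rows are samples, columns are features)."""
    mean_abs = np.abs(np.asarray(shap_matrix, dtype=float)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]  # largest mean |SHAP| first
    return [(feature_names[i], float(mean_abs[i])) for i in order]


# Hypothetical SHAP values for three samples and three features.
shap_matrix = [
    [-8.0, 4.0, 1.0],
    [9.0, -5.0, -2.0],
    [-7.0, 4.0, 0.0],
]
ranking = rank_by_mean_abs_shap(shap_matrix, ["W/B", "LWA", "Cement"])
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so the ranking reflects the strength of influence rather than its direction.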

The SHAP summary plot (Fig. 30) further demonstrates how each feature influences predicted CS. Each dot represents a feature’s SHAP value for a data point, with color denoting magnitude (red = high, blue = low). Dots on the right indicate an increase in CS, while those on the left indicate a decrease. The W/B ratio exhibits a strong positive relationship, confirming its major role in strength prediction; LWA typically exerts a smaller influence, whereas NFA has a mostly neutral but occasionally variable effect; Cement, SF, and FA show both positive and negative effects, depending on their proportions; BF tends to increase strength, while Slag shows negative SHAP values, suggesting a possible decrease in CS at higher contents.

Fig. 29

Mean SHAP values of input parameters.

Fig. 30

SHAP summary plot illustrating the influence of input parameters on compressive strength (CS).

Analysis of the influence of input parameters on tensile strength

Figure 31 illustrates the mean SHAP values for various mix design parameters affecting the tensile strength (TS) of lightweight high strength concrete. The results, derived from the MEP and RF models, reveal that cement has the highest mean SHAP value, confirming its dominant role in determining TS. This aligns with established knowledge of cement’s critical contribution to strength development. The W/B ratio ranks second, emphasizing its substantial effect on the mechanical behavior of the mixture. FA also demonstrates a considerable positive influence, though less pronounced than cement and W/B. LWA and SF show moderate impacts, while NFA, BF, and Slag exhibit minimal mean SHAP values, suggesting relatively limited but non-negligible contributions to the model’s TS predictions.

The SHAP bee swarm plot (Fig. 32) further explains how input parameters influence TS predictions. Cement displays a wide spread of both positive and negative SHAP values, indicating strong variability in its impact across samples. The W/B ratio and FA show mixed effects centered near the zero axis, reflecting their context-dependent influence on tensile strength. LWA also demonstrates both positive and negative SHAP distributions, implying that its effect varies depending on the combination of other mix parameters. In contrast, SF shows predominantly negative SHAP values, suggesting a reduction in tensile strength beyond optimal content levels, consistent with experimental observations. NFA, BF, and Slag cluster closely around zero, indicating largely neutral or minor effects on TS overall.

Fig. 31

Mean SHAP values of input parameters.

Fig. 32

SHAP summary plot illustrating the influence of input parameters on tensile strength (TS).

Analysis of the influence of input parameters on flexural strength

Similar to compressive strength, SHAP analysis was employed to interpret the influence of each input parameter on the flexural strength (FS) of lightweight high strength concrete. Figure 33 presents the mean SHAP values, showing the average contribution of each variable to the model output. Among all parameters, LWA exhibits the highest mean SHAP value, confirming its dominant influence on FS prediction. The W/B ratio ranks second, exerting a significant yet smaller effect. NFA and Cement have moderate impacts, whereas SF, FA, and Slag contribute minimally, indicating limited average influence on flexural strength estimation.

The SHAP bee swarm plot (Fig. 34) further illustrates the relative effects of input parameters on FS. Positive SHAP values for LWA, W/B ratio, and NFA indicate that increasing these parameters enhances flexural strength, with the W/B ratio and NFA showing consistently favorable impacts across the dataset. In contrast, SF exhibits predominantly negative SHAP values, suggesting that higher SF content tends to reduce FS. FA, Slag, and BF show near-zero SHAP distributions, implying generally neutral effects, though some BF data points display positive contributions under specific conditions. Cement shows a mild positive trend, reinforcing its known but limited role in improving strength in LWA-based concrete.

Fig. 33

Mean SHAP values of input parameters.

Fig. 34

SHAP summary plot illustrating the influence of input parameters on flexural strength (FS).

Conclusion

This study developed machine learning models based on multi-expression programming (MEP) and random forest (RF) to predict the compressive strength (CS), split tensile strength (TS), and flexural strength (FS) of lightweight high-strength concrete (LWHSC) incorporating supplementary cementitious materials (SCMs). The following conclusions can be drawn:

  • Performance of model: Both MEP and RF models successfully predicted the mechanical properties of LWHSC, with MEP consistently outperforming RF. The MEP model achieved very high coefficients of determination (R2 = 0.98–0.99 for CS, ≈ 0.99 for TS, and 0.94–0.96 for FS), whereas RF showed comparatively lower but still acceptable performance (R2 = 0.88–0.91 for CS, 0.80–0.85 for TS, and 0.92–0.94 for FS). These findings suggest that MEP’s symbolic regression feature helps it better represent the intricate nonlinear relationships between SCM content and strength development in LWHSC.

  • Statistical evaluation: Evaluation of the statistical indicators (MSE, MAE, RMSE) confirmed the superiority of MEP, which produced much lower errors across all strength parameters compared to RF. While RF occasionally showed reasonable accuracy, its error margins were several times higher than those of MEP, reinforcing the reliability of MEP in predicting concrete strength.

  • Error evaluation: Error analysis demonstrated that most prediction deviations remained below 3 MPa for CS, 0.5 MPa for TS, and 2 MPa for FS in both training and testing phases, underscoring the robustness and reliability of the developed models.

  • Comparative validation: Taylor diagram comparisons confirmed that MEP predictions closely matched experimental variance and distribution, establishing it as the more accurate and dependable approach compared to RF. In contrast to earlier research employing GEP, ANN, SVM, DT, KNN, PNN, MLP, GPR, or hybrid tree-based models, the current MEP and RF techniques, especially MEP, provided equivalent or superior accuracy.

  • Engineering and practical implications: MEP’s exceptional performance demonstrates its capacity to capture LWHSC material behavior while retaining high prediction precision, which is especially useful for optimizing sustainable design. By measuring the impact of SCM combinations on mechanical performance, the developed RF and especially MEP models can assist mix proportioning decisions and minimize cement use and experimental effort.

  • SHAP analysis: The SHAP analysis offered insightful information about how various mixture components affect LWHSC mechanical performance. The most important factors influencing strength development were found to be cement, LWA, and the W/B ratio. Slag and silica fume (SF) contributed relatively little, but fly ash (FA) had a substantial impact, especially on tensile strength. Notably, under some circumstances, basalt fiber (BF) showed a favorable impact on strength.

This study is novel in its comprehensive application of MEP and RF to model LWHSC incorporating SCMs, offering a systematic alternative to time-consuming and resource-intensive laboratory testing. The predictive framework developed herein can be directly applied in practice to optimize mix design, contributing to the development of sustainable, high-performance concretes that align with global efforts to reduce carbon emissions in the construction industry.

Overall, this study shows that MEP offers a reliable, comprehensible, and effective framework for modeling LWHSC properties utilizing SCMs. Beyond its prediction accuracy, the suggested method advances sustainable concrete technology by supporting carbon reduction objectives in the construction sector, enabling data-driven mix design, and reducing the need for intensive laboratory testing. For more thorough sustainability evaluations, future studies may expand this framework to include durability characteristics and hybrid ML–mechanistic models.

Study limitations and potential future research directions

The current work developed and evaluated machine learning models (MEP and RF) for predicting CS, TS, and FS of LWHSC with cementitious materials. While the results indicate that both models have high predictive potential, several limitations should be considered.

  • Dataset size and variability: The dataset was compiled from previously published studies, resulting in variations in experimental conditions. Although it was sufficient for model training, adding larger and more diverse experimental records, particularly from controlled laboratory testing, would improve the models’ reliability and generalizability.

  • Input variables: Only eight input variables were considered: W/B, cement, FA, BS (slag), SF, NFA, LWA, and BF. Other factors that may influence mechanical performance include the curing regime, admixtures, temperature, and humidity, which should be investigated in future studies.

  • Modeling approach: The study used standalone models. Future research could include hybrid and ensemble methods, such as Genetic Algorithm Particle Swarm Optimization (GA-PSO) and deep learning frameworks, which have the potential to improve predictive accuracy and stability.

  • Model validation: Although the present study demonstrated strong predictive reliability through multiple statistical indicators and comparative analyses, future work should employ k-fold cross-validation or repeated resampling techniques to further verify model robustness and ensure generalizability across diverse datasets and experimental conditions.

  • Sustainability assessment: To better illustrate the sustainability effects of the optimized mix designs, future studies should include thorough analyses of CO2 savings, embodied-carbon reductions, and cement-replacement benefits using life-cycle assessment or standardized emission factors.

  • Research scope: This research focuses on mechanical properties. Extending the modeling framework to include LWHSC’s durability, long-term behavior, and microstructural performance would provide a more complete understanding of its usefulness in sustainable construction.
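
The k-fold cross-validation suggested among the limitations above could be organized as in the following sketch. It is illustrative only: an ordinary least-squares linear model stands in for MEP or RF, and the function name, the synthetic data, and the choice of five folds are assumptions rather than part of this study.

```python
import numpy as np


def k_fold_rmse(X, y, k=5, seed=0):
    """Per-fold test RMSE of a least-squares linear model under k-fold cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the k-1 training folds (a column of ones provides the intercept).
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Evaluate on the held-out fold.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        pred = A_test @ coef
        scores.append(float(np.sqrt(np.mean((y[test_idx] - pred) ** 2))))
    return scores


# Synthetic, noise-free data: the linear stand-in should fit every fold almost exactly.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(40, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]
scores = k_fold_rmse(X, y, k=5)
```

Reporting the mean and spread of the per-fold scores, rather than a single train/test split, gives a more stable estimate of how a model generalizes.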