Introduction

The widespread use of concrete in civil engineering has brought growing attention to the challenges of sourcing its raw materials. The use of waste materials and industrial by-products plays an important role in producing green and sustainable concrete1,2,3,4,5,6,7,8. Civil engineers therefore aim to partially replace conventional concrete components with such materials. Many researchers have studied the use of fly ash, rice husk ash, tire rubber, recycled aggregates, and waste foundry sand (WFS) for producing green concrete. Among these materials, WFS appears to be a promising option because of the continuous growth of the foundry industry. The cost of landfill disposal for WFS is relatively high, ranging from about USD 135 to 675 per ton. Moreover, WFS contains heavy metals such as cadmium, zinc, and lead, which can raise environmental concerns when released into the environment. Using WFS as a partial replacement for fine aggregates in concrete production can help reduce both the environmental impact and the economic cost. Laboratory studies have reported that the optimal replacement level of WFS as fine aggregate is around 10–20%9,10,11,12.

Determining the mechanical properties of concrete through laboratory testing is expensive, time-consuming, and requires extensive experimental procedures for each mix design. Therefore, civil engineers are looking for methods to estimate the mechanical properties of different types of material without performing physical tests13,14,15,16,17,18,19,20. Soft computing techniques can predict these properties by learning the hidden relationships within datasets21,22,23,24,25,26. Several studies have applied different soft computing approaches to predict the mechanical parameters of green concrete27,28,29,30,31,32,33,34,35. Javed et al.36 used machine learning methods, including support vector regression (SVR), decision tree (DT), and AdaBoost regressor (AR), to predict the mechanical properties of green concrete. Their results showed that SVR was the most suitable method for this purpose. Although these methods achieved adequate accuracy, they are black-box methods, meaning they do not provide explicit mathematical relationships for predicting mechanical parameters. Instead, their application requires access to the implemented software code and the corresponding dataset. Iqbal et al.37 used gene expression programming (GEP) to predict the mechanical parameters of concrete containing WFS. The R-values obtained from the GEP model for the elastic modulus and splitting tensile strength were 0.812 and 0.818, respectively. Jakubowski and Tomczak38 predicted the self-healing process of concrete using a convolutional neural network (CNN). Their model predicted the crack width at different stages of self-healing with reasonable accuracy, with an MAE of 14.1 μm and an RMSE of 26.7 μm. Kumar et al.39 predicted the compressive strength of lightweight concrete using soft computing methods. All the models used in this study were black-box methods. 
The results showed that the Gaussian Process Regression (GPR) and Support Vector Machine Regression (SVMR) methods predicted the compressive strength of lightweight concrete with R-values of 0.9740 and 0.9777, respectively. Shahrokhishahraki et al.40 used various machine learning methods to predict the compressive strength of structural concrete. The results showed that the Elastic Net algorithm performed best in determining the optimal mix design to achieve maximum compressive strength. This mix design resulted in a 10% reduction in cement consumption while maintaining the compressive strength of the concrete.

Research significance

The utilization of recycled materials and industrial by-products, such as waste foundry sand, as a partial replacement for natural aggregates represents an effective step toward the development of sustainable and eco-friendly concretes. Although numerous studies have been conducted, accurately predicting the mechanical properties of WFS-containing concrete remains a key challenge in civil engineering. Most previous studies have employed soft computing approaches with a black-box nature. Although these models often exhibit satisfactory predictive accuracy, their complex and non-transparent internal structures make it difficult to interpret the relationships among variables and to apply the results in engineering design. An approach that provides both high prediction accuracy and interpretable relationships has therefore not yet been fully achieved.

In the present study, three modeling techniques with varying levels of interpretability, namely the response surface methodology (RSM), the group method of data handling (GMDH), and gene expression programming (GEP), have been employed to develop accurate and reliable relationships for predicting the elastic modulus (E) and splitting tensile strength (STS) of WFS-based concrete. These three methods were selected because they balance high predictive accuracy with interpretable mathematical relationships, so that the influence of the input variables can be analyzed alongside the predictions.

The main innovation of this research lies in a comprehensive analytical framework that combines interpretable and semi-transparent methods. This framework enables both a comparative evaluation of model performance in predicting the mechanical behavior of WFS-based concrete and the development of robust empirical relationships applicable to engineering design. Furthermore, the sensitivity analysis based on the developed models enables the identification of the most influential variables affecting E and STS. It leads to a deeper understanding of the interaction between the mixture composition and the mechanical behavior of WFS-based concrete. This approach can serve as a practical step toward optimizing the mix design and advancing the development of sustainable concrete in civil engineering.

Data collection

This study considers the splitting tensile strength (STS) and elastic modulus (E) of concrete as output variables. To model these properties, seven input variables have been selected, including the ratio of waste foundry sand to cement (WFS/C), the ratio of waste foundry sand to fine aggregate (WFS/FA), the ratio of fine aggregate to total aggregate (FA/TA), the ratio of water to cement (W/C), the ratio of coarse aggregate to cement (CA/C), the ratio of superplasticizer to cement (1000SP/C) and the age of the sample. These parameters were chosen because of their significant influence on the mechanical properties of concrete and their critical role in designing the optimal mix. A brief description of the rationale behind selecting each input parameter is provided in the following section.

Waste foundry sand to cement ratio (WFS/C)

This ratio represents the amount of waste foundry sand used as a cement substitute. An increase in this ratio may reduce the elastic modulus and splitting tensile strength, because WFS generally lacks the mechanical properties of cement. This input variable was included because of the importance of waste management and of reducing cement consumption.

Waste foundry sand to fine aggregate ratio (WFS/FA)

This ratio indicates the proportion of waste sand used to replace part of the fine aggregate. An increasing ratio may influence the mechanical properties of concrete by affecting the bond quality between the cement paste and the aggregates. This input parameter was selected to examine the effect of using alternative materials as fine aggregates.

Fine aggregate to total aggregate ratio (FA/TA)

This ratio represents the distribution of fine and coarse aggregate within the concrete mix. Variations in this ratio can influence the compressibility and density of the mix and, consequently, affect mechanical properties such as splitting tensile strength. Moreover, a higher proportion of fine aggregates reduces porosity and increases the elastic modulus by making the mix more compact. This input parameter was included to optimize the concrete mix design.

Water to cement ratio (W/C)

A higher W/C ratio generally reduces the elastic modulus as excess water increases the porosity of the concrete matrix. Conversely, a lower W/C ratio tends to enhance the splitting tensile strength due to improved bonding between the cement paste and aggregates. However, with excessive W/C, the splitting tensile strength decreases. This input is selected based on concrete mix design standards.

Coarse aggregate to cement ratio (CA/C)

This ratio reflects the influence of coarse aggregates on the strength and elastic modulus of concrete. Increasing this ratio enhances the elastic modulus but may adversely affect the workability of the mixture. A higher CA/C ratio can also reduce the splitting tensile strength because coarse aggregates tend to create discontinuities within the cementitious matrix. This input was selected due to the role of coarse aggregates in stress distribution and cracking reduction.

Superplasticizer to cement ratio (1000SP/C)

The use of superplasticizers improves the workability of concrete while reducing the water-to-cement ratio. An increase in this ratio generally enhances the splitting tensile strength and elastic modulus by reducing porosity and strengthening internal bonds. This input parameter was selected because of the important role of superplasticizers in concrete quality.

Age of the sample (age)

The age of concrete directly affects its mechanical properties. As the age increases, the splitting tensile strength and elastic modulus of concrete typically improve due to continued cement hydration. This input was selected because of the natural effect of time on the properties of concrete.

In this study, two separate laboratory data sets were used to analyze the mechanical properties of concrete: 146 data points for elastic modulus and 242 data points for splitting tensile strength41,42,43,44. Prior to modeling, data preprocessing was carried out to ensure consistency and reliability. Outliers were identified and removed using the Modified Z-score method with a threshold value of 3.5, applied to all input variables. After data cleaning, 242 valid data points remained for STS and 146 for E. In the following sections, the effect of input variables on each of these outputs is examined separately, and the modeling results are presented. Tables 1 and 2 provide statistical information on the input and output variables for STS and E, respectively. Additionally, Figs. 1 and 2 illustrate the histograms of the input and output data for STS and E and their distribution curves.
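
The outlier screening described above can be sketched as follows. This is a minimal illustration, assuming the commonly used form of the Modified Z-score, 0.6745 × (x − median) / MAD, and assuming that a row is discarded when any of its input variables exceeds the threshold. The function names and the sample values are hypothetical and are not taken from the study's data.

```python
from statistics import median


def modified_z_scores(values):
    """Return the Modified Z-score of each value: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(x - med) for x in values)  # median absolute deviation
    if mad == 0:
        # Degenerate column (at least half the values identical): flag nothing.
        return [0.0 for _ in values]
    return [0.6745 * (x - med) / mad for x in values]


def remove_outlier_rows(rows, threshold=3.5):
    """Keep rows in which no column has |Modified Z-score| above the threshold."""
    columns = list(zip(*rows))
    scores = [modified_z_scores(col) for col in columns]
    kept = []
    for i, row in enumerate(rows):
        if all(abs(scores[j][i]) <= threshold for j in range(len(columns))):
            kept.append(row)
    return kept


# Hypothetical (W/C, Age) sample; the last row has an extreme W/C value.
sample = [(0.40, 7), (0.42, 28), (0.45, 28), (0.44, 56), (1.50, 28)]
cleaned = remove_outlier_rows(sample)
```

Because the score is built on the median rather than the mean, a single extreme value does not inflate the scale estimate and hide itself, which is the usual reason for preferring it on small laboratory data sets.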

Table 1 Dataset used in the current study for STS.
Table 2 Dataset used in the current study for E.
Fig. 1 Histogram and fitted normal distribution of the input and output data related to splitting tensile strength (STS).

Fig. 2 Histogram and fitted normal distribution of the input and output data related to elastic modulus (E).

Modeling

GMDH

An architecture with three hidden layers and up to 20 neurons in each layer was used to predict E and STS with the GMDH method. The network structure was automatically optimized to achieve the best model performance. In the model training process, 80% of the data was used for model development, and the remaining 20% was reserved for validation. This split allowed the predictive accuracy of the model to be assessed on data not used for training. Figure 3 illustrates the structure of the GMDH networks used for STS and E prediction.

Fig. 3 Structure of GMDH neural network for predicting (a) STS and (b) E.

Equations 1 and 2 present the polynomial expressions derived from the GMDH method for predicting STS and E, respectively.

$$\left\{ \begin{aligned} {y_1} & =3.473+0.121 \times {\text{CA/C}}+0.66 \times {\text{1000SP/C}} - 0.091 \times {\text{CA/}}{{\text{C}}^2} \hfill \\ & \quad - 6.63 \times {10^{ - 3}} \times {\text{1000SP/}}{{\text{C}}^2} - 0.17 \times {\text{CA/C}} \times {\text{1000SP/C}} \hfill \\ {y_2} & =3.427 - 0.889 \times WFS/C+0.165 \times WFS/FA+0.215 \times WFS/{C^2} \hfill \\ & \quad - 5.51 \times {10^{ - 3}} \times WFS/F{A^2} - 0.192 \times WFS/C \times WFS/FA \hfill \\ {y_3} & =2.928 - 0.053 \times {\text{1000SP/C}}+0.008 \times Age+0.004 \times {\text{1000SP/}}{{\text{C}}^2} \hfill \\ & \quad - 1.52 \times {10^{ - 5}} \times Ag{e^2} - 8.92 \times {10^{ - 5}} \times {\text{1000SP/C}} \times Age \hfill \\ {y_4} & =3.333 - 0.712 \times WFS/FA - 0.012 \times {\text{1000SP/C}}+0.182 \hfill \\ & \quad \times WFS/F{A^2}+1.17 \times {10^{ - 3}} \times {\text{1000SP/}}{{\text{C}}^2} - 0.016 \times WFS/FA \times {\text{1000SP/C}} \hfill \\ \end{aligned} \right.$$
(1)
$$\left\{ \begin{aligned} {Y_1} & = - 33.908 - 2.212 \times {y_1}+22.624 \times {y_3}+0.434 \hfill \\ & \quad \times y_{1}^{2} - 3.317 \times y_{3}^{2}+0.147 \times {y_1} \times {y_3} \hfill \\ {Y_2} & =1.903+0.690 \times {y_2} - 0.991 \times {y_4} - 0.28 \hfill \\ & \quad \times y_{2}^{2} - 0.095 \times y_{4}^{2}+0.595 \times {y_2} \times {y_4} \hfill \\ \end{aligned} \right.$$
$$\begin{aligned} STS & = - 13.191+0.507 \times {Y_1}+8.675 \times {Y_2} \hfill \\ & \quad - 0.352 \times Y_{1}^{2} - 1.803 \times Y_{2}^{2}+0.903 \times {Y_1} \times {Y_2} \hfill \\ \end{aligned}$$
$$\left\{ \begin{aligned} {y_1} & =353.89 - 1311.69 \times W/C - 5.076 \times 1000SP/C+1309.14 \hfill \\ & \quad \times W/{C^2}+0.052 \times 1000SP/{C^2}+8.859 \times W/C \times 1000SP/C \hfill \\ {y_2} & = - 27.11+64.051 \times CA/C - 5.818 \times 1000SP/C \hfill \\ & \quad - 15.254 \times CA/{C^2}+0.095 \times 1000SP/{C^2}+1.506 \times CA/C \times 1000SP/C \hfill \\ {y_3} & =236.64 - 725.085 \times W/C - 171.24 \times FA/TA+697.37 \hfill \\ & \quad \times W/{C^2}+234.43 \times FA/T{A^2}+45.94 \times W/C \times FA/TA \hfill \\ \end{aligned} \right.$$
(2)
$$\left\{ \begin{aligned} {Y_1} & = - 35.558+5.353 \times {y_1} - 2.105 \times {y_2} \hfill \\ & \quad - 0.038 \times y_{1}^{2}+0.084 \times y_{2}^{2} - 0.08 \times {y_1} \times {y_2} \hfill \\ {Y_2} & =12.313+0.846 \times {y_2} - 0.638 \times {y_3} \hfill \\ & \quad +0.081 \times y_{2}^{2}+0.104 \times y_{3}^{2} - 0.173 \times {y_2} \times {y_3} \hfill \\ \end{aligned} \right.$$
$$\begin{aligned} E & =1.391+4.86 \times {Y_1} - 3.653 \times {Y_2} - 0.174 \hfill \\ & \quad \times Y_{1}^{2} - 0.029 \times Y_{2}^{2}+0.204 \times {Y_1} \times {Y_2} \end{aligned}$$
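
As an illustration of how the layered polynomials are evaluated, the sketch below transcribes the STS network of Eq. 1 into Python. It assumes that a term written as CA/C² denotes the square of the ratio, (CA/C)², and that each neuron follows the coefficient order shown in the equations (constant, first input, second input, first squared, second squared, product). The function and constant names are hypothetical.

```python
def neuron(c, a, b):
    """Quadratic two-input neuron: c0 + c1*a + c2*b + c3*a^2 + c4*b^2 + c5*a*b."""
    return c[0] + c[1] * a + c[2] * b + c[3] * a * a + c[4] * b * b + c[5] * a * b


# Coefficients transcribed from Eq. 1, ordered (const, a, b, a^2, b^2, a*b).
L1_Y1 = (3.473, 0.121, 0.66, -0.091, -6.63e-3, -0.17)      # a = CA/C,   b = 1000SP/C
L1_Y2 = (3.427, -0.889, 0.165, 0.215, -5.51e-3, -0.192)    # a = WFS/C,  b = WFS/FA
L1_Y3 = (2.928, -0.053, 0.008, 0.004, -1.52e-5, -8.92e-5)  # a = 1000SP/C, b = Age
L1_Y4 = (3.333, -0.712, -0.012, 0.182, 1.17e-3, -0.016)    # a = WFS/FA, b = 1000SP/C
L2_Y1 = (-33.908, -2.212, 22.624, 0.434, -3.317, 0.147)    # a = y1, b = y3
L2_Y2 = (1.903, 0.690, -0.991, -0.28, -0.095, 0.595)       # a = y2, b = y4
OUT = (-13.191, 0.507, 8.675, -0.352, -1.803, 0.903)       # a = Y1, b = Y2


def predict_sts(wfs_c, wfs_fa, ca_c, sp_c, age):
    """Evaluate the three-stage GMDH polynomial of Eq. 1 for one mix."""
    y1 = neuron(L1_Y1, ca_c, sp_c)
    y2 = neuron(L1_Y2, wfs_c, wfs_fa)
    y3 = neuron(L1_Y3, sp_c, age)
    y4 = neuron(L1_Y4, wfs_fa, sp_c)
    big_y1 = neuron(L2_Y1, y1, y3)
    big_y2 = neuron(L2_Y2, y2, y4)
    return neuron(OUT, big_y1, big_y2)
```

The same `neuron` helper could be reused for Eq. 2 by swapping in the coefficients for E. Note that Eq. 1 does not use W/C or FA/TA, so they are not arguments of `predict_sts`.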

RSM

In this section, the RSM method was employed to model and predict STS and E. As a practical statistical approach, RSM enables the analysis of variable interactions and the development of mathematical relationships among them. The final regression equations were derived to describe the relationships between input and output variables and to assess the significance of the RSM model parameters. Equations 3 and 4 present the mathematical relationship between the input variables with STS and E, respectively.

$$\begin{aligned} STS & =4.0817 - 0.642 \times WFS/C - 1.526 \times W/C \hfill \\ & \quad - 0.120 \times CA/C+0.1844 \times WFS/FA \hfill \\ & \quad +0.7824 \times 1000SP/C - 4.334 \times {10^{ - 3}} \times Age \hfill \\ & \quad +0.0205 \times WFS/C \times Age - 0.6672 \times W/C \times 1000SP/C \hfill \\ & \quad +0.0775 \times W/C \times Age - 0.1392 \times CA/C \hfill \\ & \quad \times 1000SP/C - 8.2998 \times {10^{ - 3}} \times CA/C \times Age \hfill \\ & \quad - 0.0158 \times WFS/FA \times Age \hfill \\ & \quad - 1.73690507445996 \times {10^{ - 5}} \times Ag{e^2} \hfill \\ \end{aligned}$$
(3)
$$\begin{aligned} E & =2299.95 - 45.9427 \times WFS/C - 4321.21 \times W/C - 604.924 \times CA/C \\ & \quad - 2041.52 \times FA/TA - 435.29 \times WFS/FA - 59.6731 \times 1000SP/C \\ & \quad - 0.388477 \times Age - 287.085 \times WFS/C \times W/C+96.8323 \times WFS/C \times CA/C \\ & \quad - 167.266 \times WFS/C \times FA/TA+197.768 \times WFS/C \times WFS/FA \\ & \quad +0.340552 \times WFS/C \times Age - 959.453 \times W/C \times CA/C+207.258 \times W/C \times WFS/FA \\ & \quad +232.165 \times W/C \times 1000SP/C - 0.762129 \times W/C \times Age+562.072 \times CA/C \times FA/TA \\ & \quad +41.872 \times CA/C \times WFS/FA - 17.2479 \times CA/C \times 1000SP/C+0.112909 \times CA/C \times Age \\ & \quad +516.433 \times FA/TA \times WFS/FA+1.47503 \times FA/TA \times Age \\ & \quad - 141.834 \times WFS/{C^2}+7168.61 \times W/{C^2}+154.541 \times CA/{C^2} \\ & \quad +572.575 \times FA/T{A^2} - 16.6941 \times WFS/F{A^2} - 1.1044 \times {10^{ - 4}} \times Ag{e^2} \end{aligned}$$
(4)

Table 3 presents the analysis of variance (ANOVA) results for Eq. 3 in predicting STS. This table provides detailed statistical information, including the F-value, P-value, and the contribution of each factor to the total variance. Analysis of these data is essential for evaluating the validity of the model and identifying influential variables.

Table 3 ANOVA evaluation of the quadratic model for STS prediction.

The results in Table 3 indicate that the quadratic model used to predict STS of concrete is statistically significant, identifying the general trend and the relationship between the input and response variables. The model F value of 13.21 and a p-value of less than 0.0001 confirm significant relationships between the main parameters and the response with high confidence. However, the Lack of Fit test (p = 0.0014) reveals that the model does not perfectly reproduce the experimental data. This means that although the model is statistically valid and can describe the general patterns between the variables, it cannot fully explain all the variability observed in the experimental data. Such discrepancies may be attributed to the complex behavior of concrete, insufficient data at certain critical points, or the omission of influential variables.

Next, the quadratic equation for predicting the elastic modulus with the RSM method and its ANOVA table are examined. Table 4 shows the ANOVA results of Eq. 4 for predicting E.

Table 4 ANOVA evaluation of the quadratic model for E prediction.

According to Table 4, the quadratic model developed for predicting E is statistically significant, with an F-value of 95.74 and a p-value < 0.0001. This demonstrates that the model effectively explains the variance in the experimental data. The primary factors, such as W/C, CA/C, FA/TA, and Age, are statistically significant, along with their interactions and second-order terms, indicating a substantial influence on the response. In contrast, the variable WFS/FA, with a p-value of 0.755, is insignificant and has no notable impact on the model. The residual value and the mean square error (2.35) indicate that the variation left unexplained by the model is relatively low and that the model has acceptable accuracy. Moreover, the Lack of Fit had an F-value of 1.41 and a p-value of 0.26 with a pure error of 1.72, indicating that it is insignificant. The insignificance of the Lack of Fit further confirms that the model provides an acceptable level of accuracy in predicting E.

GEP

Next, the GEP method was employed to predict STS and E. This approach was selected because of its high ability to discover complex nonlinear relationships among variables. It is also capable of producing accurate and interpretable models. For model development, the available dataset was divided into two categories:

  • 80% of the data for the model training phase to learn the relationships between inputs and outputs.

  • 20% of the data for testing the model and evaluating its prediction accuracy.

Table 5 summarizes the parameters used in the GEP models. These parameters include model inputs and related settings used to optimize and increase the accuracy of STS and E predictions.

Table 5 Summary of parameters used in GEP models.

The equations obtained for predicting STS and E by the GEP method are as follows:

$$\begin{aligned} STS & =\left[ {\cos \left( {\left( {CA/C} \right)+\left( {\left( {\left( {0.58 - FA/TA} \right) \times W/C} \right)+\cos \left( {9.11} \right)} \right)} \right)} \right] \hfill \\ & \quad +\left[ {\cos \left( {\cos \left( {W/C - Age} \right)} \right)} \right] \hfill \\ & \quad + \left[ {\cos \left( {\cos \left( {CA/C+\left( {\left( {0.58 - FA/TA} \right) \times W/C} \right)+\cos \left( {9.33} \right)} \right)} \right)} \right] \hfill \\ & \quad + \left[ {\cos \left( {\cos \left( {\left( {\left( {FA/TA - 1000SP/C} \right) \times \cos \left( {CA/C} \right)} \right)} \right)+CA/C} \right)} \right] \hfill \\ \hfill \\ \end{aligned}$$
(5)
$$\begin{aligned} E & =\left[ \frac{CA/C}{CA/C - WFS/FA} \right]+\left[ WFS/F{A^3} \right]+\left[ WFS/FA \right] \\ & \quad +\left[ \left( \left( W/C - FA/TA \right) - WFS/{C^3} \right)+\left( \left( WFS/C \times FA/TA \right)+FA/TA \right) \right] \\ & \quad +\left[ \left( - 3.912 - 3.64 \right) - \left( FA/TA - CA/C \right) \times \left( \frac{ - 1.93}{W/C} - 2.201 \right) \right] \end{aligned}$$
(6)
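
For readers who want to evaluate the GEP expression numerically, Eq. 5 can be transcribed as the short function below. This is only a sketch: it assumes the cosine arguments are in radians, and the function and argument names are hypothetical. The constants 0.58, 9.11, and 9.33 are those printed in Eq. 5.

```python
import math


def gep_sts(fa_ta, w_c, ca_c, sp_c, age):
    """Evaluate Eq. 5 as the sum of its four bracketed sub-expressions."""
    shared = (0.58 - fa_ta) * w_c  # sub-term that appears in brackets 1 and 3
    term1 = math.cos(ca_c + (shared + math.cos(9.11)))
    term2 = math.cos(math.cos(w_c - age))
    term3 = math.cos(math.cos(ca_c + shared + math.cos(9.33)))
    term4 = math.cos(math.cos((fa_ta - sp_c) * math.cos(ca_c)) + ca_c)
    return term1 + term2 + term3 + term4
```

Since the second and third brackets are cosines of a cosine, each lies between cos(1) ≈ 0.54 and 1, so the whole expression is bounded between roughly −0.92 and 4 regardless of the inputs. Eq. 5 does not contain WFS/C or WFS/FA, which is why they are not arguments here.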

Figures 4 and 5 illustrate the expression trees corresponding to Eqs. (5) and (6), which describe the relationships between the input variables and the output parameters, STS and E. In these models, variables d₀ to d₆ represent WFS/C, WFS/FA, FA/TA, W/C, CA/C, 1000SP/C, and Age, respectively, while c₀ to c₆ denote the constants automatically generated during the model development process.

Fig. 4 Expression trees (Sub-ET1 to Sub-ET4) of the GEP model for predicting STS.

Fig. 5 Expression trees (Sub-ET1 to Sub-ET5) of the GEP model for predicting E.

Performance of methods in predicting STS and E and sensitivity analysis of input parameters

Several statistical indices, R, Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Standard Deviation (StD), and Scatter Index (SI), were used to evaluate the performance of different methods in predicting STS and E. The equations for calculating these statistical parameters are presented below.

$$R=\frac{{\sum {({y_i} - \bar {y})} ({{\hat {y}}_i} - \bar {\hat {y}})}}{{\sqrt {\sum {{{({y_i} - \bar {y})}^2}} } \sqrt {\sum {{{({{\hat {y}}_i} - \bar {\hat {y}})}^2}} } }}$$
(7)
$$MAE=\frac{{\sum\limits_{{i=1}}^{n} {\left| {{Y_{i(Act)}} - {Y_{i(Pre)}}} \right|} }}{n}$$
(8)
$$RMSE=\sqrt {\frac{{\sum\limits_{{i=1}}^{n} {{{\left( {{Y_{i(Act)}} - {Y_{i(Pre)}}} \right)}^2}} }}{n}}$$
(9)
$$StD=\sqrt {\frac{{\sum\limits_{{i=1}}^{n} {{{\left( {{y_i} - \overline {y} } \right)}^2}} }}{{n - 1}}}$$
(10)
$$SI=\frac{{RMSE}}{{\frac{1}{n}\sum\nolimits_{{i=1}}^{n} {{Y_{i(Act)}}} }}$$
(11)

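In Eqs. (7) to (11), \({y_i}\) and \({Y_{i(Act)}}\) denote the actual value and \({\hat {y}_i}\) and \({Y_{i(Pre)}}\) the predicted value of the ith sample of the data set, \(\overline {y}\) and \(\bar {\hat {y}}\) are the means of the actual and predicted values, and n is the total number of data points.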
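
The indices in Eqs. (7) to (11) can be computed with a few lines of standard-library Python, as sketched below. The function name and dictionary keys are hypothetical, and because the text does not state which series Eq. (10) is applied to, the sketch assumes the standard deviation is taken over the actual values.

```python
import math


def evaluate(actual, predicted):
    """Return R, MAE, RMSE, StD and SI for paired actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return {
        "R": cov / math.sqrt(ss_a * ss_p),                                 # Eq. (7)
        "MAE": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,     # Eq. (8)
        "RMSE": rmse,                                                      # Eq. (9)
        "StD": math.sqrt(ss_a / (n - 1)),  # Eq. (10), assumed over actual values
        "SI": rmse / mean_a,                                               # Eq. (11)
    }


# Hypothetical example: every prediction is exactly one unit too high.
metrics = evaluate([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In this example the predictions are perfectly correlated with the measurements (R = 1) even though every one of them is wrong by one unit, which is why R should be read together with the error-based indices.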

Figures 6 and 7 illustrate the relationship between the predicted and actual values of STS and E for the GMDH, RSM, and GEP models. In both cases, the closer the data points are to the 45-degree line, the higher the model accuracy. In the STS prediction, the GMDH model, with a correlation coefficient of R = 0.731, provides the best fit to the actual data. The data are mainly within the ±20% error range, indicating satisfactory accuracy. The RSM model has an acceptable but slightly weaker performance, and the GEP model exhibits the widest error dispersion.

In the prediction of E, the RSM model with R = 0.978 provided the highest correlation between the actual and predicted values and the closest fit to the ideal line. The GMDH and GEP models, with R = 0.855 and R = 0.744, demonstrated weaker predictive capability and greater deviation from the measured values. Therefore, the RSM model in estimating E and the GMDH model in predicting STS were identified as the most accurate models.

Fig. 6 Correlation between the predicted and actual STS values using the GEP, RSM, and GMDH models.

Fig. 7 Correlation between the predicted and actual E values using the GEP, RSM, and GMDH models.

In selecting the most appropriate model, in addition to the correlation coefficient, other statistical indices such as MSE, MAE, and RMSE should also be considered to achieve a more accurate evaluation of the performance of each model. Figure 8 compares different statistical indices to assess the performance of GMDH, GEP, and RSM models in predicting STS. These indices provide a basis for evaluating the accuracy and predictive capability of each model.

Fig. 8 Comparison of statistical indices of GMDH, GEP and RSM models for STS prediction.

Figure 8 shows that the GMDH model achieved the highest R-value, indicating the strongest correlation between the actual and predicted values among the three models. It also has the lowest errors (RMSE = 0.533 and MAE = 0.434), indicating superior predictive accuracy compared with the other methods. In contrast, the GEP model has the lowest R-value and the highest errors (RMSE = 0.648 and MAE = 0.5406), reflecting the weakest predictive performance among the models. The RSM model performed between the other two models, with an R-value of 0.655, demonstrating a moderate correlation between the actual and predicted data.

To assess the statistical significance of the differences in model performance, the Wilcoxon signed-rank test was performed on the absolute error values between the actual and predicted data. The results revealed that the difference in performance between the RSM and GMDH models (p = 0.0066) and between the RSM and GEP models (p = 0.0125) was statistically significant. The difference between the GMDH and GEP models was also significant (p = 4.74 × 10⁻⁵). Figure 9 presents the residual plots for the compared models. As shown, the GMDH model has a relatively symmetrical distribution of residuals without systematic patterns and fewer outliers than the other models. In contrast, the GEP model has a wider dispersion and several large errors in areas with high STS values. The RSM model, although generally consistent, displays more fluctuations in residuals for boundary data. Accordingly, the GMDH model provides the most accurate and stable performance in predicting the STS of concrete and was selected for further analyses, including sensitivity analysis.
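
The paired comparison described above can be sketched as a two-sided Wilcoxon signed-rank test on the absolute errors of two models. The version below is a simplified standard-library implementation that uses the normal approximation with a tie correction and no continuity correction, so its p-values are approximate for small samples. It is not the routine used in the study. In practice a statistics library such as SciPy (`scipy.stats.wilcoxon`) would normally be used. The function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W+, two-sided p) for paired samples x and y, normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return w_plus, p_value
```

To mirror the comparison in the text, `x` and `y` would hold the absolute errors of two models on the same samples, for example `[abs(a - p) for a, p in zip(actual, predicted_gmdh)]` and the corresponding list for another model (variable names assumed).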

Fig. 9 Residual plots of GMDH, RSM, and GEP models for predicting the STS of concrete.

Various statistical indices are shown in Fig. 10 to evaluate the performance of different models in predicting E.

Fig. 10 Comparison of statistical indices of GMDH, GEP and RSM models for E prediction.

Figure 10 demonstrates that, for predicting E, the RSM method had the best performance, with the highest correlation coefficient (R = 0.978) and the lowest errors (RMSE = 1.372 and MAE = 1.088). The GMDH method also performed well but with lower performance than RSM, with R = 0.855 and moderate errors (RMSE = 3.478 and MAE = 2.702). In contrast, the GEP method showed the weakest predictive capability, with the lowest R-value (R = 0.744) and the highest errors (RMSE = 4.48 and MAE = 3.346).

Furthermore, the results of the Wilcoxon signed-rank test based on the absolute error values revealed that the difference between the RSM model and the other models is statistically significant (RSM vs. GMDH: p = 2.62 × 10⁻¹⁶, RSM vs. GEP: p = 9.05 × 10⁻¹⁵). In contrast, the difference between GMDH and GEP is not statistically significant (p = 0.8276). These findings indicate that the RSM model is the most accurate method for predicting E, whereas the GMDH and GEP models have comparable performance. The residual plots shown in Fig. 11 support this conclusion. The residuals of the RSM model are distributed more symmetrically around the zero axis without any systematic pattern, while the GEP model shows wider dispersion and a greater number of outliers.

Fig. 11 Residual plots of GMDH, RSM, and GEP models for predicting the E of concrete.

The selected GMDH and RSM models were used to perform sensitivity analysis for STS and E, respectively. Sensitivity analysis evaluates the effect of each input variable on the model output and plays a key role in understanding the relationships among variables. This process involves controlled variations in the values of input parameters such as WFS/C, WFS/FA, FA/TA, W/C, CA/C, 1000SP/C, and Age while keeping the other factors constant. Each input variable is changed by relative amounts at several levels to determine the response of the model to these variations. Then, the effect of each parameter on the model output is discussed, and its importance in predicting STS and E is determined. For this purpose, the method presented by Liong et al.45 has been used to assess the relative sensitivity of each variable.

$$SL\ \text{of}\ {X_i}\ (\% )=\frac{1}{M}\sum\limits_{{j=1}}^{M} {{{\left( {\frac{{\% \ \text{change in output}}}{{\text{change in input}}}} \right)}_j} \times 100}$$
(12)
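
A one-at-a-time evaluation of Eq. (12) might look like the sketch below. It assumes that the change in the input is also expressed as a percentage of its baseline value, that the ratio keeps its sign, and that the M perturbation levels are ±10% and ±20%. These levels, the function name, and the toy model are illustrative assumptions, not values from the study.

```python
def sensitivity_level(model, baseline, index, pct_changes=(-20, -10, 10, 20)):
    """Average (% change in output / % change in input) over M levels, times 100."""
    base_out = model(baseline)
    ratios = []
    for pct in pct_changes:
        perturbed = list(baseline)
        perturbed[index] = baseline[index] * (1 + pct / 100)  # vary one input only
        out_pct = (model(perturbed) - base_out) / base_out * 100
        ratios.append(out_pct / pct)
    return sum(ratios) / len(ratios) * 100


# Toy model (hypothetical): output proportional to the first input only.
def toy_model(x):
    return 2.0 * x[0] + 0.0 * x[1]


sl_first = sensitivity_level(toy_model, [1.0, 5.0], index=0)
sl_second = sensitivity_level(toy_model, [1.0, 5.0], index=1)
```

For this toy model the first input yields a sensitivity level of about 100% and the second yields 0%, since the output is proportional to the first input and independent of the second. The `model` argument could be any of the fitted relationships, for example a function wrapping Eq. 1 or Eq. 4.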

Figures 12 and 13 show the sensitivity analysis of the GMDH model for STS prediction and the RSM model for E prediction, respectively.

Fig. 12 Sensitivity analysis of input parameters based on the GMDH model for STS prediction.

The analysis results indicate that the CA/C ratio has the most significant influence on STS, with the model exhibiting high sensitivity to changes in this parameter across all ranges. This finding highlights the strong dependence of STS on the aggregate-to-cement ratio. The WFS/C ratio was identified as the second most influential variable, with higher values leading to greater model sensitivity. Age showed the lowest sensitivity and was therefore considered the least significant factor. These results suggest that optimizing the aggregate-to-cement and recycled sand-to-cement ratios can significantly enhance the prediction and control of splitting tensile strength. In contrast, the effects of other parameters are relatively limited. Overall, this analysis provides valuable insight for the optimal design of concrete mixtures and the enhancement of their mechanical performance.

The sensitivity analysis of the RSM model for predicting the E indicates that the W/C ratio significantly influences the model output. In contrast, other variables, such as WFS/C, CA/C, and FA/TA, exhibit a comparatively lower impact. Additionally, minor variations in the value of 1000SP/C and Age have a negligible effect on the model. Overall, this analysis confirms the importance of accurately controlling the water-cement ratio in designing concrete mixtures to achieve optimal E values.

Fig. 13 Sensitivity analysis of input parameters based on the RSM model for E prediction.

Conclusion

In this study, interpretable and semi-interpretable soft computing methods, including GMDH, GEP, and RSM, were employed to develop predictive relationships for estimating E and STS of concrete containing WFS. In addition, a sensitivity analysis was conducted to evaluate the influence of input parameters on these mechanical properties. The key findings of this research are summarized as follows:

  • The GMDH model achieved the highest R-value in predicting STS. This model had the lowest error values, with RMSE = 0.533 and MAE = 0.434. Based on all statistical indices, it demonstrated the best overall performance among the evaluated methods.

  • Among the three methods used to predict E, the RSM method performed the best with the highest correlation coefficient R = 0.978 and the lowest errors (RMSE = 1.372 and MAE = 1.088). The GMDH method had an acceptable performance but with lower accuracy than RSM, with R = 0.855 and moderate errors (RMSE = 3.478 and MAE = 2.702).

  • The sensitivity analysis revealed that the CA/C ratio had the most significant influence on STS. The WFS/C ratio was identified as the second most influential parameter, with model sensitivity increasing as its value increased. Other parameters, such as WFS/FA and 1000SP/C, showed comparatively minor effects on STS.

  • Sensitivity analysis of the RSM model for E indicated that the W/C ratio had the most significant influence on E. Other variables, including WFS/C, CA/C, and FA/TA, showed comparatively lower effects.