Introduction

The construction industry’s environmental impact has spurred research into sustainable practices, focusing on integrating industrial byproducts like waste foundry sand (WFS) into construction materials. While WFS shows promise as a natural sand alternative, its application in cement-based materials is hindered by composition variability concerns. Despite extensive research on WFS and cement strength class (CSC) individually, their combined effect on compressive strength remains underexplored1,2. Traditional empirical models often fail to capture the complex interactions among material components in predicting compressive strength. To overcome this limitation, the present study utilizes Gene Expression Programming (GEP), an advanced machine learning technique, to develop a predictive model based on varying levels of waste foundry sand (WFS) and cement strength classes (CSC). This approach supports the optimization of mortar mix designs and promotes sustainable construction through resource efficiency and waste valorization. The study also examines the life cycle of WFS, which originates from silica sand used in metal casting. After repeated use in molds, the sand loses its functional properties and becomes a byproduct, contributing to environmental challenges due to landfill disposal. Through washing, sieving, and chemical treatment, WFS can be processed for reuse. Once treated, it serves as an effective replacement for natural aggregates in cementitious materials, enhancing sustainability in construction applications.

The incorporation of WFS into structural applications promotes waste valorization, conserves natural resources, and aligns with circular economy principles by reducing reliance on virgin materials. Growing demand for sustainable, high-performance construction materials has driven research toward innovative mix design optimization and advanced predictive modeling methods3. The compressive strength of cement mortar4 is significantly influenced by key factors such as the cement strength class (CSC)5 and alternative aggregate materials6 such as waste foundry sand (WFS)7, which makes them critical elements in construction materials research. While prior studies have examined the individual impacts of CSC and WFS, as well as their predictive modeling through methods such as gene expression programming (GEP)8, no research has examined their combined effect on mortar compressive strength. This gap motivates the current study, which explores the simultaneous influence of WFS and CSC on the strength of cement mortar using GEP. The objective is to develop a more comprehensive predictive model, thereby improving material selection and performance optimization for sustainable construction practices.

Reis et al.9 studied the environmental effect of using different cements in concrete. Their results showed that the energy consumed in cement production depends on the grade: the higher the grade of cement, the finer the grains and the more electrical energy is consumed. However, because better hydration occurs and fewer polluting gases are produced, using cements with higher fineness is more beneficial for the environment and more economical. Kazemi et al.10 evaluated the impact of CSC on the compressive strength of cement mortar. Their experimental framework incorporated six W/C ratios (0.5, 0.45, 0.4, 0.35, 0.3, and 0.25), three S/C ratios (3, 2.75, and 2.5), and three CSCs (52.5, 42.5, and 32.5 MPa). The findings revealed that the ideal mix design is contingent upon the CSC. For example, the 28-day compressive strength of specimens made with 32.5 MPa cement and a W/C ratio of 0.25 reached its maximum at an S/C ratio of 2.75, while for 52.5 MPa cement with the same W/C ratio, the peak strength was observed at an S/C ratio of 2.5. In a related study, Ghaemifard et al.11 explored the effects of freeze–thaw cycles (200, 150, 100, 50, and 0 cycles) on mortar samples made with various CSCs. Their results demonstrated that samples containing 52.5 MPa cement experienced a slower rate of strength degradation during freeze–thaw cycles than those with 42.5 and 32.5 MPa cement. Specifically, after 50 and 100 freeze–thaw cycles, the strength reduction was markedly smaller for samples made with 52.5 MPa cement than for those with lower strength classes. These investigations highlight the need to understand the interactions between CSC, mix design, and environmental factors in order to enhance the performance of cement mortar.

The influence of waste materials on the compressive strength of concrete and mortar has been extensively researched, with numerous scholars investigating their viability as sustainable substitutes for natural aggregates12,13,14,15,16. Prabo et al.17 noted that substituting up to 20% of natural aggregate with waste fines did not significantly affect the strength of concrete, whereas exceeding this threshold diminished performance. Likewise, Arumathi et al.18 explored the replacement of natural aggregate with waste aggregate and found that, at a W/C ratio of 0.4, a 30% substitution of natural aggregate resulted in only a 1.9% reduction in compressive strength compared to samples composed entirely of natural aggregate. At a W/C ratio of 0.5, this reduction increased to 6.1%. These findings suggest that waste aggregate can serve as an effective alternative to natural aggregate, thereby aiding resource conservation. In a separate investigation, Çevik et al.19 analyzed cement mortar with varying ratios of waste fine aggregate and found that replacing more than 15% of natural aggregate significantly reduces compressive strength compared to mixes with only natural aggregate. These findings highlight the potential of incorporating waste materials into cementitious composites to improve sustainability, provided replacement levels are carefully managed to maintain structural performance.

Although many studies have been conducted focusing on manufacturing samples with different cements20,21, aggregates22,23,24, and additives25,26,27,28,29,30,31,32, studies have shown that the process of constructing and curing cement-based samples imposes a lot of cost and time on the projects33. Therefore, to solve these problems, researchers were looking for alternative solutions to achieve the desired results34. Since the results obtained from the production of mortar and concrete samples depend on several input parameters35,36, including the water/cement ratio37,38, sand/cement ratio39,40, type of cement41, type of aggregate42,43, and additives44,45,46, the solution provided must be nonlinear47,48, simplified, and practical49. Therefore, the presentation of meta-heuristic algorithms became important50,51.

In recent years, prediction methods52,53 such as fuzzy logic (FL)54,55, artificial neural networks (ANN)56,57, multiple linear regression (MLR)58,59, multi expression programming (MEP)60,61, genetic programming (GP)62,63,64,65,66,67, genetic algorithms (GA)68,69, and gene expression programming (GEP)70,71 have been among the most widely used methods to reduce time and cost while maintaining the accuracy of results. The use of GEP for forecasting the compressive strength of cement-based materials has been investigated in numerous studies, which demonstrate its efficacy as a predictive tool. Mahdinia et al.72 used GEP to model the compressive strength of cement mortar, emphasizing the critical influence of software parameters, such as the linking function and the number of inputs, on prediction accuracy. Similarly, Iqbal et al.73 applied GEP to predict the mechanical properties of green concrete incorporating WFS, including its compressive strength. Their research was based on a dataset of 234 concrete mix designs obtained from earlier studies. The GEP model incorporated four input variables: the W/C ratio, %WFS, %WFS/C, and the fineness modulus. The resulting predictive equation exhibited a high degree of accuracy, attaining a correlation coefficient (R) of 0.85 and a root mean square error (RMSE) of 4. These results highlight the promise of GEP as a dependable modeling technique for forecasting mechanical characteristics in sustainable construction. Behnood et al.74 investigated the effect of WFS on concrete properties, collecting 234 compressive strength data points, 163 flexural strength data points, and 85 elastic modulus data points from various articles. The W/C ratio, %WFS, %WFS/C, and modulus of elasticity of WFS were used as the GEP inputs, and the modulus of elasticity, compressive strength, and flexural strength were the outputs. The results show that the presented GEP model has a high R2, indicating an accurate model that can be used in future research.

Extensive research has been conducted on waste foundry sand (WFS) and cement strength class (CSC) separately; however, there is a notable lack of understanding regarding their joint impact on the compressive strength of cement mortar, particularly when utilizing advanced predictive methodologies such as gene expression programming (GEP). This study aims to explore the simultaneous effects of varying proportions of WFS and different CSCs on the compressive strength of cement mortar samples. The research involves the preparation of mix designs in two distinct approaches: random and sorted. The datasets are evaluated under two conditions—one that incorporates CSC as an input variable and another that omits it—to examine the influence of including CSC in the GEP model. By concurrently introducing WFS percentage and CSC as input parameters, the study seeks to ascertain their collective contribution to predictive accuracy. The anticipated results are expected to yield significant insights into the interaction between these variables, thereby facilitating the creation of more precise predictive models and informing sustainable and optimized mortar mix designs for practical use.

Materials, preparation of specimens and curing

In this study, 36 mix designs were formulated, comprising three cement strength classes (CSC) of 52.5, 42.5, and 32.5 MPa, a single sand/cement (S/C) ratio of 2.75, six waste foundry sand (WFS) replacement levels of 50, 40, 30, 20, 10, and 0%, and two water/cement (W/C) ratios of 0.5 and 0.4, as listed in Table 1. A total of 1,260 specimens were prepared: for each of the 36 mixes, five specimens were tested at each of seven curing ages (3, 7, 14, 21, 28, 56, and 91 days).
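The counts quoted above can be checked with a short enumeration. This is only an illustrative sketch: the variable names are hypothetical, the ordering of mixes is not meant to reproduce Table 1, and the 252 records assume one record per mix and curing age.

```python
from itertools import product

# Factor levels taken from the text above
csc_classes = [32.5, 42.5, 52.5]           # cement strength class, MPa
wc_ratios = [0.4, 0.5]                     # water/cement ratio
wfs_levels = [0, 10, 20, 30, 40, 50]       # WFS replacement, percent
curing_ages = [3, 7, 14, 21, 28, 56, 91]   # days
specimens_per_age = 5

# Full factorial combination of the three varied factors (S/C is fixed at 2.75)
mixes = list(product(csc_classes, wc_ratios, wfs_levels))

n_mixes = len(mixes)                           # 3 * 2 * 6
n_records = n_mixes * len(curing_ages)         # assumed: one record per mix and age
n_specimens = n_records * specimens_per_age    # five specimens per record

print(n_mixes, n_records, n_specimens)  # 36 252 1260
```

The intermediate figure of 252 matches the number of data points later passed to the GEP models.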

Table 1 Mixture designs of cement mortar.

The dosage of high-range water reducer (HRWR) in each mixture was sufficient to achieve a flow value of 110 ± 5 within 25 drops of the flow table75. The prismatic specimens measuring 160 \(\times\) 40 \(\times\) 40 mm were compacted by tamping in two layers for each composition, following the guidelines of ASTM C348-0276.

The characteristics of the cement utilized in this investigation are detailed in Tables 2 and 3, encompassing both chemical and physical attributes.

Table 2 The chemical characteristics of cement mortar samples.
Table 3 The physical characteristics of cement mortar samples.

In alignment with ASTM C-30577, the preparation of the cement paste commenced with the mixing of potable water, maintained at a temperature of 20 ± 2 °C, with different types of cementitious materials. Subsequently, this cement paste was combined with fine aggregates sourced from Mashhad, as illustrated in Fig. 1, along with waste foundry sand obtained from the Mubarake steel facility in Isfahan. The characteristics of these materials are detailed in Table 4.

Fig. 1 Sample of fine aggregates: (a) natural sand, (b) waste foundry sand.

Table 4 The Physical properties of fine aggregate.

The particle size distribution of the aggregates is depicted in Fig. 2. The cementitious mortar was then cast into prismatic molds with the conventional dimensions of 160 \(\times\) 40 \(\times\) 40 mm and demolded after 24 h. All samples were subsequently placed in a water tank held at a specified temperature for curing.

Fig. 2 The distribution of particle sizes in fine aggregates and waste foundry sand (ASTM C33)78.

The ultimate compressive strength of the samples was assessed after curing for 3, 7, 14, 21, 28, 56 and 91 days. In order to enhance both the compressive strength and workability, a high range water reducing (HRWR) agent, characterized by its distinct carboxylic ether polymer composition, was employed.

Figure 3 illustrates the sample manufacturing and testing process step by step. In the first step, the materials are combined to create the cement mortar. In the second step, the prepared sample is positioned within the flow table test apparatus. The third step presents the workability results obtained from this test. The fourth step displays the molds used in sample production, followed by the fifth step, in which the samples are submerged in a water tank until they reach the appropriate age for testing. Steps six and seven show the testing of the samples, and step eight presents a sample after completion of the compressive strength test.

Fig. 3 Sample structure stages.

Figure 4 presents six samples from three mix designs, shown both before and after compressive strength testing. The samples correspond to mix designs 6, 18, and 30 in Table 1 and were tested at 28 days of age. The sole variable among the samples is the cement strength class. Mix design 6, which uses CSC 32.5 MPa, exhibits the lowest compressive strength of the three. In contrast, mix design 30, which uses CSC 52.5 MPa, demonstrates the highest compressive strength and the least damage.

Fig. 4 Samples before and after the compressive strength test of cement mortar.

The variables for both input and output of the experimental samples are presented in Table 5.

Table 5 Ranges of the input and output variables used in the GEP models.

Figure 5 demonstrates how the compressive strength of the mortar (Fc) is influenced by the percentage of waste foundry sand (WFS), the cement strength class (CSC) and the water/cement ratio (W/C). The color gradient represents changes in Fc: red regions indicate higher compressive strength values, while blue and green shades represent lower values. The trend suggests that increasing CSC increases Fc, particularly at lower WFS values. At higher WFS percentages, however, Fc declines, indicating that excessive WFS may negatively impact strength. This observation highlights the importance of optimizing the WFS content to balance sustainability and mechanical performance in mortar mixtures.

Fig. 5 Contour plots illustrating the relationship between the compressive strength of mortar (Fc), the percentage of waste foundry sand (WFS), and (a) the cement strength class (CSC), (b) the water/cement ratio (W/C), with a color gradient representing variations in Fc (MPa).

The Pearson correlation matrix (Table 6) reveals key relationships among the mortar mix components and compressive strength (Fc). The strongest positive correlation with Fc is curing age (r = 0.54579, p < 0.0001), confirming that longer curing improves strength, followed by cement strength class (CSC, r = 0.46572, p < 0.0001) and sand (S, r = 0.36025, p < 0.0001), indicating their contribution to strength. Conversely, HRWR (r = −0.50857, p < 0.0001), waste foundry sand (WFS, r = −0.36025, p < 0.0001), and water-to-cement ratio (W/C, r = −0.22062, p = 0.000418) show negative correlations, suggesting that excessive water content, HRWR dosage, and WFS content reduce strength. Water (W) and W/C exhibit a perfect inverse correlation (r = −1), as expected in mix design, while sand (S) and WFS are also perfectly negatively correlated (r = −1), confirming their trade-off in the mixture. The strong correlation between HRWR and WFS (r = 0.91791, p < 0.0001) suggests HRWR is often used to counteract workability issues caused by WFS. Overall, cement strength class, natural sand content, and curing time enhance strength, while excessive admixtures and water content weaken it, in line with mix design principles.

Table 6 Pearson correlation coefficients (r), ranging from −1 to 1, measuring the linear relationship between the mix design variables and compressive strength (Fc).
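As an illustration of how such coefficients arise, the sketch below computes Pearson's r with NumPy and shows why natural sand and WFS are perfectly negatively correlated when one replaces the other at constant total fine aggregate. The mass of 1000 g is a hypothetical value chosen for the example, not a quantity from this study.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

# WFS replacement levels used in the study (percent)
wfs_percent = np.array([0, 10, 20, 30, 40, 50], dtype=float)

total_fine = 1000.0                        # hypothetical total fine aggregate, g
wfs_mass = total_fine * wfs_percent / 100  # WFS portion
sand_mass = total_fine - wfs_mass          # natural sand makes up the remainder

r_sand_wfs = pearson_r(sand_mass, wfs_mass)
print(round(r_sand_wfs, 6))  # -1.0
```

Because the sand mass is an exact linear function of the WFS mass with negative slope, r equals −1 regardless of the total chosen.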

Figure 6 presents the Pearson correlation matrix as a heatmap, in which positive and negative correlations among the key mix components and the compressive strength (Fc) are represented by red and blue hues, respectively. Age (r = 0.54579) and cement strength class (CSC, r = 0.46572) exhibit the strongest positive correlations with Fc, confirming that longer curing and a higher cement class enhance mortar strength. Sand (S, r = 0.36025) also contributes positively. Conversely, HRWR (r = −0.50857), waste foundry sand (WFS, r = −0.36025), and water-to-cement ratio (W/C, r = −0.22062) negatively impact Fc, suggesting that excessive admixture, WFS, and water content weaken the mixture. The inverse correlation between water (W) and W/C (r = −1), as well as between sand (S) and WFS (r = −1), highlights their trade-offs in mix design. Additionally, HRWR and WFS (r = 0.91791) show a strong positive correlation, indicating that HRWR is often added to counterbalance workability issues introduced by WFS. Overall, the heatmap underscores the importance of optimizing mix proportions to maximize compressive strength while minimizing detrimental effects from excessive additives and water content.

Fig. 6 Heatmap of the Pearson correlation matrix for mortar mix components and compressive strength (Fc), with color intensity representing correlation strength (red for positive and blue for negative correlations).

Figure 7 is a pair plot that visualizes the relationships between the mortar mix parameters, including CSC, W, S, HRWR, W/C, WFS, Age and compressive strength (Fc). The diagonal elements of the plot display the distribution of each individual variable, revealing their spread and potential skewness. The off-diagonal scatter plots illustrate the pairwise relationships between variables, where trends and correlations can be assessed visually. For instance, a noticeable negative correlation is observed between HRWR and W/C, indicating that as the high-range water reducer dosage increases, the water-to-cement ratio decreases. Similarly, there is a visible trend between Age and Fc, suggesting that mortar strength increases over time. The plot also helps identify potential non-linear relationships and clusters within the data, which is useful for further statistical analysis and modeling.

Fig. 7 Pair plot showing the relationships between mortar mix parameters, including CSC, W, S, HRWR, W/C, WFS, Age, and Fc, with diagonal elements representing individual variable distributions and off-diagonal elements depicting pairwise scatter plots.

The distribution of these variables is illustrated through frequency histograms in Fig. 8. The figure presents histograms of the key variables influencing mortar compressive strength (Fc), including CSC (MPa), water/cement ratio (W/C), percentage of WFS, high-range water reducer (HRWR) dosage (ml), curing age (days), and Fc (MPa). Each subplot shows the frequency (green bars) and cumulative percentage (orange line) distributions to illustrate the range and distribution of these parameters in the dataset. The narrow ranges of CSC (32.5–52.5 MPa) and W/C (0.4–0.5) indicate a controlled mix design, while WFS (0–50%) and age (3–91 days) show wider variability, reflecting diverse experimental conditions. The Fc distribution (0–60 MPa) reflects the combined influence of these factors on strength development.

Fig. 8 Histograms of the variables, (a) CSC (MPa), (b) W/C, (c) %WFS, (d) HRWR (ml), (e) Age (day), (f) Fc (MPa).
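The frequency and cumulative-percentage series shown in such histograms can be computed as in the following sketch. It uses the curing-age variable, assuming each of the seven ages appears once per mix design (36 times); the bin edges are hypothetical and are not the ones used in Fig. 8.

```python
import numpy as np

# Curing ages in days; assumed to occur once for each of the 36 mix designs
ages = np.repeat([3, 7, 14, 21, 28, 56, 91], 36)

# Hypothetical bin edges, chosen only for illustration
bin_edges = [0, 10, 20, 30, 60, 100]

# Frequency per bin (the bars) and running share of all records (the line)
counts, edges = np.histogram(ages, bins=bin_edges)
cum_pct = 100 * np.cumsum(counts) / counts.sum()

for lo, hi, c, p in zip(edges[:-1], edges[1:], counts, cum_pct):
    print(f"{lo:>3}-{hi:<3} days: n={c:3d}, cumulative={p:5.1f}%")
```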

Evolutionary methodologies: GA, GP and GEP

Gene expression programming (GEP) is a more recent development of genetic programming (GP) and was introduced by Ferreira79. Broadly speaking, the procedural stages of GEP resemble those of GP80. Both methodologies incorporate genetic algorithm concepts and follow the principles of natural selection from Darwinian theory81. The fundamental steps of the genetic methodology are as follows82:

  1. Define the input variables and the parameters of the genetic algorithm.

  2. Create an initial population of individuals, which act as parents and set the foundation for the next generation.

  3. Assign a fitness value to each individual of the initial population.

  4. Apply genetic operations so that the population of the current generation evolves.

    (a) During the crossover operation, the genetic algorithm alters the strings of two parents, generating two offspring. This involves genetically recombining substrings at a randomly selected crossover point. The offspring are then placed in the population based on the minimum fitness criterion.

    (b) The mutation operation randomly replaces a string belonging to one of a pair of selected parents with an arbitrary string. The resulting offspring is then introduced into the population in place of one of the selected parents.

  5. Repeat step 4 for a sufficient number of generations until the stipulated termination criteria are met.
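The five steps can be sketched as a minimal genetic algorithm. The example below is a toy illustration, not the configuration used in this study: the objective (counting ones in a bit string), tournament selection, elitism, and all parameter values are assumptions made for the sake of a runnable example.

```python
import random

def run_ga(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Toy GA maximizing the number of ones in a bit string."""
    # Step 1: inputs and GA parameters are the function arguments
    rng = random.Random(seed)

    def fitness(individual):
        return sum(individual)  # toy objective, assumed for illustration

    # Step 2: initial population of random parents
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []

    for _ in range(generations):              # Step 5: repeat step 4
        # Step 3: evaluate fitness and rank the population
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = pop[:2]                     # keep the two best (elitism, assumed)

        while len(next_pop) < pop_size:
            # Tournament selection of two parents (assumed selection scheme)
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)

            # Step 4a: one-point crossover at a random cut
            cut = rng.randint(1, n_bits - 1)
            children = [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]

            # Step 4b: bit-flip mutation
            for child in children:
                for i in range(n_bits):
                    if rng.random() < p_mut:
                        child[i] = 1 - child[i]
                next_pop.append(child)

        pop = next_pop[:pop_size]

    best = max(pop, key=fitness)
    return best, history

best, history = run_ga()
print(sum(best), history[0], history[-1])
```

Because the two best individuals are copied unchanged into each new generation, the best fitness recorded in `history` never decreases.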

Individuals within genetic programming (GP) are typically represented by parse trees, as in List Processing (LISP) programs. In GEP, each gene consists of a head, which encodes the parse tree, followed by a tail, whose cells are filled during generation. An illustration of a chromosome, along with its corresponding LISP code and mathematical representation, is depicted in Fig. 9. Chromosomes are decoded into expression trees (ETs) using rules that also govern the interaction among ETs. GEP therefore involves two distinct languages: the language of the genes and the language of the ETs. The interplay between the gene sequence and the phenotype is expressed through the Karva language83. To produce offspring, two parents are selected based on their fitness and their chosen branches are exchanged. The genetic programming crossover operation, as demonstrated in the example, closely resembles the techniques of pruning and grafting in arboriculture. Its structure conforms to that of LISP programs84.

Fig. 9 A tree representation for computer program in LISP.
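The decoding of a Karva-language gene into an expression tree can be sketched as follows. The gene is read left to right and the tree is filled level by level, with each function symbol taking as many children as its arity. The symbol set, the use of `Q` for square root, and the example gene are chosen for illustration and are not the function set of this study.

```python
import math
import operator

# Function symbols with (arity, implementation); Q denotes square root
FUNCS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
    "Q": (1, math.sqrt),
}

def decode_karva(gene):
    """Build an expression tree from a K-expression, breadth-first."""
    nodes = [(sym, []) for sym in gene]   # (symbol, children) per position
    queue = [nodes[0]]                    # the first symbol is the root
    next_free = 1                         # next unread position in the gene
    while queue:
        sym, children = queue.pop(0)
        arity = FUNCS[sym][0] if sym in FUNCS else 0
        for _ in range(arity):
            child = nodes[next_free]
            next_free += 1
            children.append(child)
            queue.append(child)
    return nodes[0]                       # unread trailing symbols are non-coding

def evaluate(node, env):
    """Evaluate an expression tree given terminal values in env."""
    sym, children = node
    if sym not in FUNCS:
        return env[sym]
    args = [evaluate(child, env) for child in children]
    return FUNCS[sym][1](*args)

# The gene Q*+-abcd decodes to sqrt((a + b) * (c - d))
tree = decode_karva("Q*+-abcd")
value = evaluate(tree, {"a": 1, "b": 3, "c": 6, "d": 2})
print(value)  # 4.0
```

Symbols beyond those needed to complete the tree are simply ignored, which is what allows a fixed-length gene to encode trees of different sizes.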

An illustration of genetic programming crossover can be found in Fig. 10, depicting the identification of branches to be altered.

Fig. 10 Example of gene expression programming crossover.

The mutation operator selects a node within the parse tree and modifies it, either by replacing its element with a different randomly chosen element or by substituting a randomly generated branch. Figure 11 provides an example of mutation, showcasing the random alteration of elements. In addition, the GEP process, including gene substitution, is illustrated in the flowchart of Fig. 12.

Fig. 11 Example of gene expression programming mutation.
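While the description above is in terms of trees, GEP applies its operators to the linear chromosome. The sketch below shows point mutation and one-point crossover on a single gene, assuming the usual GEP rule that the tail has length h(n − 1) + 1 for head length h and maximum arity n and may contain only terminals. The symbol sets and rates are hypothetical.

```python
import random

FUNCTIONS = ["+", "-", "*", "/"]   # all binary, so maximum arity n = 2
TERMINALS = ["a", "b", "c"]        # hypothetical input variables

def tail_length(head_len, max_arity=2):
    """Tail size that guarantees every gene decodes to a complete tree."""
    return head_len * (max_arity - 1) + 1

def random_gene(head_len, rng):
    """Head may hold functions or terminals; tail holds terminals only."""
    head = [rng.choice(FUNCTIONS + TERMINALS) for _ in range(head_len)]
    tail = [rng.choice(TERMINALS) for _ in range(tail_length(head_len))]
    return head + tail

def mutate(gene, head_len, rate, rng):
    """Point mutation that respects the head/tail constraint."""
    out = list(gene)
    for i in range(len(out)):
        if rng.random() < rate:
            pool = FUNCTIONS + TERMINALS if i < head_len else TERMINALS
            out[i] = rng.choice(pool)
    return out

def one_point_crossover(g1, g2, rng):
    """Swap everything after a random cut; positions stay aligned."""
    cut = rng.randint(1, len(g1) - 1)
    return g1[:cut] + g2[cut:], g2[:cut] + g1[cut:]

rng = random.Random(1)
HEAD = 4
parent_a = random_gene(HEAD, rng)
parent_b = random_gene(HEAD, rng)
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
mutant = mutate(child_a, HEAD, rate=0.3, rng=rng)
print("".join(parent_a), "".join(parent_b), "".join(mutant))
```

Since both operators keep every position in its original region, the offspring always have terminal-only tails and therefore always decode to valid expressions.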

Fig. 12 GEP flowchart.

Development of the prediction model

In this research, four GEP models were implemented to identify the most suitable model. All four share the same GEP settings (function set, genetic operators, RNC, and numerical constants) and use addition as the linking function (Table 7).

Table 7 Parameters setting for GEP model.

The four implemented models differ from one another in pairs. GEP1 and GEP2 have five input parameters (WFS, W/C, S/C, HRWR, and age), whereas GEP3 and GEP4 have six (the same five plus the influential CSC parameter) (Table 8). GEP1 and GEP3 use random data distribution (RDD), while GEP2 and GEP4 use sorted data distribution (SDD). The difference between models 1 and 2 therefore lies in the way the inputs are called to the GEP. In GEP1, all 252 mix designs are given to the GEP, which randomly selects 200 for training and 52 for testing. In GEP2, by contrast, the 200 training mix designs and the remaining 52 testing mix designs are separated before the data are entered into the GEP.

Table 8 GEP setting.
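The two calling schemes can be contrasted in code. In the sketch below, the random split stands in for RDD; the text does not specify how the SDD subsets were chosen, so the sorted split shown here (sorting by strength and taking evenly spaced records for testing) is only one plausible reading and is labeled as an assumption. The records are synthetic placeholders, not data from this study.

```python
import random

def random_split(records, n_train, seed=0):
    """RDD-style split: shuffle all records, then cut."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def sorted_split(records, n_test, key):
    """SDD-style split (assumed): sort, then take evenly spaced ranks as test."""
    ordered = sorted(records, key=key)
    n = len(ordered)
    test_idx = {round(i * (n - 1) / (n_test - 1)) for i in range(n_test)}
    train = [r for i, r in enumerate(ordered) if i not in test_idx]
    test = [r for i, r in enumerate(ordered) if i in test_idx]
    return train, test

# Synthetic stand-in for the 252 records; the fc values are made up
records = [{"id": i, "fc": 5.0 + (i * 37) % 56} for i in range(252)]

rdd_train, rdd_test = random_split(records, n_train=200)
sdd_train, sdd_test = sorted_split(records, n_test=52, key=lambda r: r["fc"])

print(len(rdd_train), len(rdd_test), len(sdd_train), len(sdd_test))  # 200 52 200 52
```

Under this reading, a pre-sorted split guarantees that the test set spans the full range of the target, which a single random draw does not.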

In terms of how the input data are called, GEP1 and GEP3 are alike, as are GEP2 and GEP4. In all models, 80% of the data are assigned to training and the remaining 20% to testing. The fitness functions used in both the training and testing phases are the coefficient of determination \(({R}^{2})\), root mean squared error \((RMSE)\), mean absolute percentage error \((MAPE)\), relative absolute error \((RAE)\), performance index \((PI)\), relative root mean square error \((RRMSE)\) and correlation coefficient \((R)\), as given in Eqs. (1)–(7). In these equations, n is the total number of data points, Pi is the predicted value and Ai is the actual value for the ith data point (Fig. 13).

$${R}^{2}=1-\left(\frac{\sum_{i=1}^{n}{\left({A}_{i}-{P}_{i}\right)}^{2}}{\sum_{i=1}^{n}{({P}_{i})}^{2}}\right) \tag{1}$$
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({A}_{i}-{P}_{i})}^{2}} \tag{2}$$
$$MAPE=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{{A}_{i}-{P}_{i}}{{A}_{i}}\right|\times 100 \tag{3}$$
$$RAE=\frac{\sum_{i=1}^{n}\left|{A}_{i}-{P}_{i}\right|}{\sum_{i=1}^{n}\left|{A}_{i}-\bar{A}\right|} \tag{4}$$
$$PI=\frac{RRMSE}{R+1} \tag{5}$$
$$RRMSE=\frac{1}{\left|\bar{A}\right|}\sqrt{\frac{\sum_{i=1}^{n}{({A}_{i}-{P}_{i})}^{2}}{n}} \tag{6}$$
$$R=\frac{\sum_{i=1}^{n}({A}_{i}-\bar{A})({P}_{i}-\bar{P})}{\sqrt{\sum_{i=1}^{n}{({A}_{i}-\bar{A})}^{2}\sum_{i=1}^{n}{({P}_{i}-\bar{P})}^{2}}} \tag{7}$$

Here \(\bar{A}=\frac{1}{n}\sum_{i=1}^{n}{A}_{i}\) and \(\bar{P}=\frac{1}{n}\sum_{i=1}^{n}{P}_{i}\) denote the means of the actual and predicted values, respectively.
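For reference, the seven measures can be computed as in the sketch below, which follows the equations as printed. The function name and the sample values are hypothetical. Note that Eq. (1) normalizes by the sum of squared predictions, which differs from the more common definition of R² based on the variance of the actual values; the code keeps the printed form.

```python
import numpy as np

def gep_metrics(actual, predicted):
    """Evaluate Eqs. (1)-(7) for paired actual (A) and predicted (P) values."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = a - p
    a_mean, p_mean = a.mean(), p.mean()

    rmse = np.sqrt(np.mean(err**2))                               # Eq. (2)
    rrmse = rmse / abs(a_mean)                                    # Eq. (6)
    r = np.sum((a - a_mean) * (p - p_mean)) / np.sqrt(
        np.sum((a - a_mean) ** 2) * np.sum((p - p_mean) ** 2))    # Eq. (7)

    return {
        "R2": float(1 - np.sum(err**2) / np.sum(p**2)),           # Eq. (1), as printed
        "RMSE": float(rmse),
        "MAPE": float(np.mean(np.abs(err / a)) * 100),            # Eq. (3)
        "RAE": float(np.sum(np.abs(err)) / np.sum(np.abs(a - a_mean))),  # Eq. (4)
        "PI": float(rrmse / (r + 1)),                             # Eq. (5)
        "RRMSE": float(rrmse),
        "R": float(r),
    }

# Made-up strengths in MPa, only to exercise the function
sample_actual = [10.0, 20.0, 30.0, 40.0]
sample_predicted = [12.0, 18.0, 33.0, 39.0]
metrics = gep_metrics(sample_actual, sample_predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower values of RMSE, MAPE, RAE, RRMSE and PI, and values of R and R² closer to 1, indicate a better fit.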

Analysis and insights

Experimental observations

Figures 13 and 14 show the variation of the compressive strength of the cement mortar samples with the percentage of WFS for W/C ratios of 0.4 and 0.5, respectively. Each figure includes three parts, (a) CSC 32.5 MPa, (b) CSC 42.5 MPa and (c) CSC 52.5 MPa, and each part shows the different sample ages. As expected, at a fixed WFS percentage, age, W/C ratio and CSC are the main factors governing the compressive strength of cement mortar. However, the optimal WFS ratio differs between mix designs. For example, the optimal amount of WFS for mortar with CSC 32.5 MPa is 20% at both W/C ratios, while the optimal amounts for 52.5 MPa and 42.5 MPa cement are 10% and 0%, respectively. In both Figs. 13 and 14, the compressive strength at all ages therefore rises until the optimal point, which differs between the parts of each figure, and declines beyond it. Overall, the highest compressive strength occurs at 91 days with 52.5 MPa cement, 0% WFS and a W/C ratio of 0.4, while the lowest occurs at 3 days with 32.5 MPa cement, 50% WFS and a W/C ratio of 0.5.

Fig. 13 The relationship between compressive strength and age for S/C ratio of 2.75 and W/C ratio of 0.4 and CSC is (a) 32.5 MPa, (b) 42.5 MPa, and (c) 52.5 MPa.

Fig. 14 The relationship between compressive strength and age for S/C ratio of 2.75 and W/C ratio of 0.5 and CSC is (a) 32.5 MPa, (b) 42.5 MPa, and (c) 52.5 MPa.

Figure 15 shows the compressive strength ratios of samples made with different cements at different percentages of WFS. The figure consists of two parts: part (a) is for samples made with W/C = 0.4 and part (b) is for samples made with W/C = 0.5. Each part contains three curves: the compressive strength ratio of samples made with CSC 42.5 MPa to those made with CSC 32.5 MPa, of CSC 52.5 MPa to CSC 32.5 MPa, and of CSC 52.5 MPa to CSC 42.5 MPa. The ratios plotted at each point are incremental ratios. For example, in part (a), a ratio of 0.2 on the CSC 52.5 MPa to CSC 42.5 MPa curve at 0% WFS means that the compressive strength of the sample made with CSC 52.5 MPa is 1.2 times that of the same sample made with CSC 42.5 MPa. The same interpretation applies to part (b), which corresponds to W/C = 0.5.
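Written out, the incremental ratio in the example above takes the following form, where the symbol \(\rho\) and the subscripted strengths are notation introduced here for illustration only:

$$\rho_{52.5/42.5}=\frac{F_{c,52.5}-F_{c,42.5}}{F_{c,42.5}}=0.2\quad\Rightarrow\quad F_{c,52.5}=\left(1+0.2\right)F_{c,42.5}=1.2\,F_{c,42.5}$$

A value of zero on any curve would therefore indicate equal strengths for the two cement classes being compared.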

Fig. 15 Variation rate of compressive strength versus WFS percentage for W/C of 0.4 and 0.5.

Predictive modeling outcomes and analysis

Figures 16, 17, 18 and 19 show the experimental and predicted results for all 252 cement mortar mix designs. In each figure, parts a, c, e and g present the training results and parts b, d, f and h present the testing results. Comparing parts a and b of Figs. 16 and 17, and likewise of Figs. 18 and 19, shows the importance of the way the input data are called to the GEP when the number of inputs and all GEP settings are held fixed. Comparing Fig. 16 with Fig. 18, and Fig. 17 with Fig. 19, shows the simultaneous effect of CSC and %WFS on the prediction of the compressive strength of cement mortar. Together, these figures establish the importance of the correct calling model and of the cement strength class for cement mortar samples containing waste foundry sand. After comparing Figs. 16, 17, 18 and 19, we conclude that GEP4, with the SDD calling model and six inputs (percentage of WFS, CSC (MPa), S/C ratio, W/C ratio, HRWR (ml) and sample age (days)), is the best model, with R2 = 0.98, which indicates its good performance.

Fig. 16 The correlation of the experimental and predicted Fc values for GEP1 model: (a) shows train R2, (b) shows test R2, (c) shows train prediction, (d) shows test prediction, (e) shows train ratio of Fc, (f) shows test ratio of Fc, (g) train values of residual versus predicted Fc and (h) test values of residual versus predicted Fc.

Fig. 17

The correlation of the experimental and predicted Fc values for GEP2 model: (a) shows train R2, (b) shows test R2, (c) shows train prediction, (d) shows test prediction, (e) shows train ratio of Fc, (f) shows test ratio of Fc, (g) train values of residual versus predicted Fc and (h) test values of residual versus predicted Fc.

Fig. 18

The correlation of the experimental and predicted Fc values for GEP3 model: (a) shows train R2, (b) shows test R2, (c) shows train prediction, (d) shows test prediction, (e) shows train ratio of Fc, (f) shows test ratio of Fc, (g) train values of residual versus predicted Fc and (h) test values of residual versus predicted Fc.

Fig. 19

The correlation of the experimental and predicted Fc values for GEP4 model: (a) shows train R2, (b) shows test R2, (c) shows train prediction, (d) shows test prediction, (e) shows train ratio of Fc, (f) shows test ratio of Fc, (g) train values of residual versus predicted Fc and (h) test values of residual versus predicted Fc.

The comparison of \({R}^{2}\), \(RMSE, MAPE\), \(RAE\), \(PI\), \(RRMSE\) and \(R\) for all GEP models is shown in Table 9. For GEP4, an expression tree is also presented in Fig. 20 and the corresponding expression in Eq. (8).

Table 9 Results of the GEP models compared with the experimental results used as validation sets.
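The statistics compared in Table 9 can be computed along the following lines. This is a minimal sketch that assumes the definitions commonly used in GEP studies (R2 as the coefficient of determination, RRMSE as RMSE divided by the mean observed value, and PI as RRMSE / (1 + R)); the definitions actually used for Table 9 may differ. The function name and the sample values are hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    """Return R2, RMSE, MAPE, RAE, RRMSE, PI and R for paired values.

    Assumed (commonly used) definitions, not confirmed by the paper:
      RRMSE = RMSE / |mean(observed)|,  PI = RRMSE / (1 + R).
    Observed values must be non-zero (MAPE) and not all equal (R2, RAE, R).
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((o - mean_o) ** 2 for o in observed)    # total sum of squares
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))

    rmse = math.sqrt(ss_res / n)
    mape = 100.0 / n * sum(abs(e / o) for e, o in zip(errors, observed))
    rae = sum(abs(e) for e in errors) / sum(abs(o - mean_o) for o in observed)
    r = cov / math.sqrt(ss_tot * var_p)                  # Pearson correlation
    rrmse = rmse / abs(mean_o)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "MAPE": mape,
        "RAE": rae,
        "RRMSE": rrmse,
        "PI": rrmse / (1.0 + r),
        "R": r,
    }

# Hypothetical strengths in MPa, for illustration only (not data from this study).
example = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

Lower RMSE, MAPE, RAE, RRMSE and PI, together with R2 and R closer to 1, indicate a better fit under these definitions.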
Fig. 20

Expression tree for GEP4 model.

Figure 21 presents the box plot analysis of the residuals for four Gene Expression Programming (GEP) models, comparing their performance on (a) training data and (b) test data. The vertical axis represents the residual values (difference between predicted and actual values), while the horizontal axis lists the different GEP models. The red diamonds indicate the mean residual along with ± 1 standard deviation (SD), while the blue vertical bars represent the mean ± 1.96 SD, capturing approximately 95% of the data distribution. For the training data (Fig. 21a), GEP models exhibit different levels of dispersion in residuals. GEP1 and GEP2 show larger variability compared to GEP3 and GEP4, indicating potential overfitting or underfitting issues. GEP3 and GEP4 display more compact residual distributions, suggesting relatively stable predictions. For the test data (Fig. 21b), a similar pattern is observed. The variability in residuals is more pronounced in GEP1 and GEP2, whereas GEP3 and GEP4 maintain tighter distributions, implying better generalization performance. A comparison between training and test residuals suggests that GEP3 and GEP4 might be better candidates for predictive modeling due to their lower residual spread, while GEP1 and GEP2 may require further refinement to improve their accuracy and robustness.

Fig. 21

Box plot comparing the performance (residual) of four GEP models for (a) Training data, (b) Test data.
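The summary markers described for Fig. 21 (mean residual, mean ± 1 SD and mean ± 1.96 SD) can be sketched as below. The residual is taken as predicted minus actual, as stated in the text; the use of the sample standard deviation and the function name are assumptions. The ± 1.96 SD band covers roughly 95% of the residuals only if they are approximately normally distributed.

```python
import statistics

def residual_summary(observed, predicted):
    """Summarize residuals (predicted - observed) as in a mean +/- SD box plot."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)  # sample standard deviation (assumption)
    return {
        "mean": mean,
        "sd": sd,
        "band_1sd": (mean - sd, mean + sd),
        "band_1_96sd": (mean - 1.96 * sd, mean + 1.96 * sd),
    }

# Hypothetical values in MPa, for illustration only.
summary = residual_summary([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

A model with a mean close to zero and narrow bands on both training and test data corresponds to the compact distributions described for GEP3 and GEP4.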

The error distribution of the four Gene Expression Programming (GEP) models is analyzed using histograms, as illustrated in Fig. 22. This figure presents the residuals (error values) for both the training and test datasets across different models. Subfigures (a)–(d) show individual histograms for each GEP model, while subfigure (e) provides a combined view of all models, allowing for direct comparison. The histograms in Fig. 22 reveal key characteristics of the model errors. The error distributions for both training and test datasets generally resemble a normal distribution centered around zero, though slight asymmetries and long tails suggest the presence of outliers or systematic bias in some models. Across all models, the training errors (blue bars) are more concentrated around zero compared to the test errors (red bars), indicating better performance on training data, while the wider spread of test errors suggests some degree of overfitting. Variations in error values show that GEP3 and GEP4 exhibit narrower distributions, implying better generalization and lower error variance, whereas GEP1 and GEP2 have broader distributions with larger residual values, indicating potential challenges in model accuracy. The combined histogram in subfigure (e) provides an overview of all models, confirming that most errors are concentrated near zero, though differences in spread highlight variations in model robustness.

Fig. 22

Histogram of error values (residuals) for four GEP models, comparing training and test datasets: (a) GEP1, (b) GEP2, (c) GEP3, (d) GEP4, and (e) combined histogram of all models.

Overall, models with smaller variance and a symmetric error distribution are generally preferred for predictive stability. This makes GEP3 and GEP4 the most reliable, owing to their balanced error distributions and lower spread, while GEP1 and GEP2 may require further optimization to improve predictive performance on test data. The best model presented is GEP4, which has 4 genes. It includes 6 input parameters (CSC, %WFS, W/C, S/C, age and HRWR), which appear in different genes to predict the compressive strength of cement mortar, as shown in the expression tree of Fig. 20 and in Eq. (8). As Eq. (8) makes clear, %WFS, CSC and age are repeated more often in the genes because of their importance, whereas HRWR, which has less effect, appears less often in the expression tree and the equation. Eq. (8) consists of 4 functions joined by plus signs: the 4 functions represent the 4 genes, and addition is the linking function. If CSC, WFS, W/C, S/C, age and HRWR are entered in MPa, %, ratio, ratio, days and ml, respectively, the output of the equation is the compressive strength of the cement mortar in MPa.

$$\begin{aligned} Fc = & \sqrt[3]{{\left( {wfs + \frac{s}{c}} \right)\left( {12.84 - \frac{s}{c}} \right)\left( { - 2.98Age} \right) + HRWR}} \\ & + \frac{{{\text{ln}}\left( {Age} \right)\left( {\frac{csc + 10.81}{2}} \right)}}{{\left( \frac{s}{c} \right)^{2} \left( {7.236\frac{w}{c}} \right)}} \\ & + \frac{3.89}{\frac{w}{c}}\frac{{\sqrt[3]{wfs} + 2.58\frac{w}{c}}}{2} \\ & + \frac{{max\left( {\tan^{ - 1} \left( {wfs} \right),\tan^{ - 1} \left( {Age} \right)} \right)\sqrt[3]{csc} + max\left( {2.58, wfs} \right)}}{2} \\ \end{aligned}$$
(8)
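Eq. (8) can be transcribed directly into code, one term per gene, which makes it easy to evaluate for a candidate mix. The following is a minimal sketch: the function and argument names are hypothetical, the cube root is taken as the real (sign-preserving) root because the first gene's argument is usually negative, and the input values in the example are illustrative only and are not a mix design from this study.

```python
import math

def signed_cbrt(x):
    """Real cube root that also handles negative arguments."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def gep4_fc(wfs, csc, w_c, s_c, age, hrwr):
    """Compressive strength (MPa) from Eq. (8).

    Units follow the text: wfs in %, csc in MPa (as written in the paper,
    e.g. 325, 425, 525), w_c and s_c as ratios, age in days, hrwr in ml.
    Requires age > 0, w_c != 0 and s_c != 0.
    """
    gene1 = signed_cbrt((wfs + s_c) * (12.84 - s_c) * (-2.98 * age) + hrwr)
    gene2 = (math.log(age) * ((csc + 10.81) / 2.0)) / ((s_c ** 2) * (7.236 * w_c))
    gene3 = (3.89 / w_c) * ((signed_cbrt(wfs) + 2.58 * w_c) / 2.0)
    gene4 = (max(math.atan(wfs), math.atan(age)) * signed_cbrt(csc)
             + max(2.58, wfs)) / 2.0
    # The four genes are linked by addition.
    return gene1 + gene2 + gene3 + gene4

# Illustrative inputs only (s_c = 3 and hrwr = 0 are assumed values).
fc_example = gep4_fc(wfs=0.0, csc=425.0, w_c=0.5, s_c=3.0, age=28.0, hrwr=0.0)
```

Keeping each gene as a separate variable mirrors the expression tree and makes it straightforward to inspect how much each gene contributes to the predicted strength.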

Figure 23a shows the frequency of the predicted compressive strength of the different mix designs in 6 intervals (0–10, 11–20, 21–30, 31–40, 41–50 and 51–60), and Fig. 23b compares the frequency of compressive strength before and after prediction. Given the very high overlap of the data in part b, it can be concluded that the presented model has very high accuracy.

Fig. 23

Comparison of compressive strength frequency in the experimental and prediction, (a) frequency of compressive strength and (b) cumulative percentage of compressive strength before and after prediction.
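The interval frequencies and cumulative percentages of Fig. 23 can be tabulated as sketched below. The bin edges are an assumption: the six reported intervals are interpreted here as [0, 10], (10, 20], ..., (50, 60] MPa, and values outside 0 to 60 MPa are ignored. The function name is hypothetical.

```python
def strength_frequency(values, edges=(0, 10, 20, 30, 40, 50, 60)):
    """Count values per interval and return (counts, cumulative percentages).

    The first interval is closed on both sides; later intervals are
    open on the left and closed on the right (assumed convention).
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            above_lower = v >= edges[i] if i == 0 else v > edges[i]
            if above_lower and v <= edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    cumulative = []
    running = 0
    for c in counts:
        running += c
        cumulative.append(100.0 * running / total if total else 0.0)
    return counts, cumulative

# Hypothetical strengths in MPa, for illustration only.
counts, cumulative = strength_frequency([5.0, 10.0, 10.5, 25.0, 60.0])
```

Running the function once on the experimental strengths and once on the predicted strengths gives the two series whose overlap is compared in part (b).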

In Fig. 24, the Probability Density Functions (PDFs) illustrate the error distributions for the four Gene Expression Programming (GEP) models, distinguishing between training and test datasets.

Fig. 24

Probability Density Function (PDF) of the error values for the four GEP models. Subplots (a)–(h) represent the error distributions for training and test datasets, respectively, with kernel smoothing or normal distribution fitting applied.

The red curves represent either kernel density estimation (KDE) or normal distribution fits, providing insight into the spread and concentration of errors. GEP3 and GEP4 exhibit a more symmetric, narrow error distribution in both training and test datasets, indicating a more reliable predictive performance. In contrast, GEP1 and GEP2 display wider error spreads, particularly in the test sets, suggesting greater variability and potential overfitting. The higher standard deviation in test errors compared to training errors for GEP1 and GEP2 also implies a degradation in generalization capability. The analysis suggests that while all models capture trends within the dataset, GEP3 and GEP4 may offer superior generalization due to their more stable and centralized error distributions.

Figure 25 illustrates the cumulative probability distribution of prediction errors for the four GEP models, comparing their performance on training and test datasets. In the training data (Fig. 25a), the error distributions of all models show a relatively smooth transition, with GEP3 (red) and GEP4 (green) having the steepest slopes, indicating lower error variance and more concentrated errors around zero. Conversely, in the test data (Fig. 25b), the distributions appear more spread out, particularly for GEP2 (blue) and GEP4 (green), suggesting that these models exhibit a wider range of error values and potential overfitting to the training data.

Fig. 25

Cumulative Distribution Function (CDF) of the error values for different GEP models, (a) CDF of the training data, (b) the CDF of the test data.
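Cumulative error curves of the kind shown in Fig. 25 can be built from an empirical CDF of the errors. The sketch below is one plain way to do this and is not necessarily how the figure was produced; the function names and sample errors are hypothetical.

```python
def empirical_cdf(errors):
    """Return (error, cumulative probability) pairs sorted by error value."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]

def fraction_within(errors, tolerance):
    """Share of errors whose absolute value does not exceed the tolerance."""
    return sum(1 for e in errors if abs(e) <= tolerance) / len(errors)

# Hypothetical prediction errors in MPa, for illustration only.
errors = [2.0, -2.0, 3.0, -1.0]
cdf_points = empirical_cdf(errors)
share_within_2 = fraction_within(errors, 2.0)
```

A steeper curve around zero corresponds to a larger share of errors within a small tolerance, which is the sense in which a steep slope indicates concentrated errors.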

The shift in the curves between training and test sets highlights the generalization capability of each model, where GEP1 (black) and GEP3 (red) maintain relatively consistent distributions across both datasets, indicating better robustness. The deviations observed in GEP2 and GEP4 suggest higher sensitivity to unseen data, potentially leading to reduced predictive accuracy. This analysis confirms that while all models exhibit reasonable error distributions, GEP1 and GEP3 demonstrate better generalization performance, making them more reliable choices for predictive modeling.

Figure 26 illustrates a Taylor Diagram, which evaluates the performance of four different GEP models (Train & Test) by comparing their standard deviation (STD), correlation coefficient (R), and RMSE against the experimental data (cyan point). The angular position represents the correlation coefficient, where points closer to the 0° axis indicate higher correlation with the reference data. The radial distance corresponds to the standard deviation, meaning models closer to the experimental data have a similar spread.

Fig. 26

Taylor Diagram comparing the performance of different Gene Expression Programming (GEP) models for train and test datasets based on standard deviation (STD), correlation coefficient (R), and root mean square error (RMSE).

The distance from the experimental data point reflects the RMSE, with smaller distances indicating better predictive accuracy. Observing the diagram, GEP models exhibit varying levels of agreement with the reference data, where models positioned closer to the cyan point demonstrate superior performance. The test data points (red, blue, pink, and purple) are generally farther from the reference compared to train data points (black, brown, yellow, and cyan), indicating that some models may experience overfitting, leading to reduced generalization performance. Among the models, GEP1 and GEP3 (Train data) appear to have the closest agreement with the experimental data, suggesting a more reliable predictive capability.
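The three quantities read from a Taylor diagram are tied together geometrically. In the standard construction, the distance between a model point and the reference point is the centered RMS difference E', which satisfies E'^2 = STD_model^2 + STD_ref^2 - 2 · STD_model · STD_ref · R. The sketch below computes these statistics with population standard deviations; the function name and sample values are hypothetical.

```python
import math

def taylor_statistics(observed, predicted):
    """Return STD of both series, correlation R and centered RMS difference."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]

    std_o = math.sqrt(sum(d * d for d in dev_o) / n)   # population STD
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    r = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * std_o * std_p)
    # Centered RMS difference: mean bias is removed before differencing.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return {"std_ref": std_o, "std_model": std_p, "R": r, "crmsd": crmsd}

# Hypothetical strengths in MPa, for illustration only.
stats = taylor_statistics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

Because the centered difference removes the mean bias, it equals the ordinary RMSE only when the mean prediction matches the mean of the experimental data.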

Synergistic influence of CSC and WFS on strength prediction

This section examines the combined influence of CSC and the incorporation of WFS on the predictive modeling of compressive strength in cement mortar. The analysis uses three data sets collected from prior research: Gi Ryu et al. (16 mix designs)85, BoYeol et al. (24 mix designs)86 and Young Moon et al. (36 mix designs)87. The collected data were first modeled with the model proposed by Iqbal, using its five input parameters (WFS, W/C, S/C, HRWR and Age) and all of its functional and general settings (Fig. 27), and then with the GEP4 model presented in this research (Fig. 28).

Fig. 27

The correlation of the experimental and predicted Fc values for data collection by Iqbal model, (a) shows train R2, (b) shows test R2, (c) shows train prediction, (d) shows test prediction, (e) shows train ratio of Fc and (f) shows test ratio of Fc.

Fig. 28

The correlation of the experimental and predicted Fc values for data collection by GEP4 model, (a) R2 for train data, (b) R2 for test data, (c) train prediction, (d) test prediction, (e) train ratio of Fc and (f) test ratio of Fc.

Figure 29 shows the predicted compressive strength of concrete samples obtained with the ANN method. To compare GEP with the ANN, the data underlying Fig. 29 were also modeled with the GEP4 model presented in this research, and the results are shown in Fig. 30. The comparison of Figs. 27 and 28 shows the better performance of GEP4 compared to the model presented by Iqbal, which demonstrates the importance of including the simultaneous effect of CSC and WFS as input parameters in the prediction model.

Fig. 29

The correlation of the experimental and predicted Fc values in ANN model.

Fig. 30

The correlation of the experimental and predicted Fc values for Silva collection 88 by GEP4 model: part (a) shows train R2, part (b) shows test R2, part (c) shows train prediction, part (d) shows test prediction, part (e) shows train ratio of Fc and part (f) shows test ratio of Fc.

In addition to showing the effect of CSC as a main input parameter, Fig. 30 clearly shows the superiority of the GEP method over the ANN. Since cement is one of the main materials in cement mortar design, omitting it as an input parameter can have a large negative effect on the prediction results. Based on the results in the graphs above, it is recommended that research on waste foundry sand use the simultaneous effect of CSC and WFS in its predictions.

The comparison of Figs. 27, 29, and 30 highlights the progressive improvement in predictive modeling of the compressive strength (Fc) of cement mortar when advanced AI-based models are used. Figure 27, representing the Iqbal model, shows the weakest performance, with train R2 = 0.7744 and test R2 = 0.7162, indicating moderate but unreliable predictions. The significant deviations between experimental and predicted values, along with large fluctuations in the ratio analysis, reveal high instability and poor generalization of this model. In contrast, Fig. 29, based on the Artificial Neural Network (ANN) model, improves prediction accuracy (R2 = 0.8802) and shows a stronger correlation between predicted and experimental values. However, some discrepancies remain, suggesting that ANN alone does not fully capture the complex interactions between input parameters. Finally, Fig. 30, showing the GEP4 model, achieves the highest prediction accuracy with train R2 = 0.9759 and test R2 = 0.9756, demonstrating excellent generalization, minimal error, and stable predictions. The consistent alignment between predicted and experimental values, along with a nearly constant Fc ratio, confirms the superiority of the GEP4 model over both the ANN and Iqbal models. These findings emphasize the importance of incorporating comprehensive input parameters (CSC and WFS) and advanced AI models (GEP4) for accurate and reliable predictive modeling in cement mortar research.

Table 10 presents a numerical comparison of the predictive performance of the Iqbal empirical model, the Artificial Neural Network (ANN), and the optimized GEP4 model. As shown, the GEP4 model outperforms the other models, achieving the highest coefficient of determination (R2 = 0.981) and the lowest root mean square error (RMSE = 1.827). These results demonstrate the superior generalization ability and accuracy of the GEP4 model in capturing the complex, nonlinear relationships between input parameters and compressive strength. In contrast, the Iqbal and ANN models exhibit lower predictive power, confirming the advantages of symbolic regression approaches like GEP in this application.

Table 10 Numerical comparison of model performance using R2 and RMSE for the Iqbal, ANN, and GEP4 models in predicting compressive strength. GEP4 demonstrates the highest predictive accuracy with the lowest RMSE.

Conclusion

This study explored the influence of waste foundry sand (WFS) and cement strength class (CSC) on the compressive strength of cement mortar using Gene Expression Programming (GEP) as a predictive tool. The experimental results confirmed that both WFS content and CSC significantly affect mortar strength, with specific combinations yielding optimal performance. Incorporating CSC as a model input improved predictive accuracy, demonstrating the importance of material quality in data-driven mix design optimization. The integration of WFS, a sustainable industrial by-product, into mortar mixtures presents a viable path toward eco-efficient construction without compromising mechanical performance. Overall, the study offers a robust framework that combines experimental insight with machine learning to support sustainable material usage and enhance mortar mix design.

  • Model GEP4, considering the simultaneous effect of CSC and WFS as input parameters and random selection of data as software settings, provides the highest prediction performance.

  • If all input parameters and GEP setting are constant, a model whose input data is entered randomly has better performance.

  • Comparing the results of the model presented in this study with the ANN model proposed by Silva shows the strong combined effect of the GEP model and the CSC input parameter.

  • Comparing the results of Iqbal’s proposed method with the method presented in this study shows the significant impact of GEP settings and selection of appropriate input parameters.

  • The effect of CSC observed in the present study, in the comparisons with other prediction methods such as neural networks, with data collected from other studies, and under changes to the GEP settings, indicates that this parameter directly affects the output results. It therefore appears that this input parameter should be considered in all research that depends on cement.

This study showed that WFS and CSC significantly affect mortar strength. Using GEP, it identified optimal mixes and improved prediction accuracy, supporting sustainable, high-performance mortar design. Future investigations could broaden this methodology to include other waste materials and examine their applicability across various structural and environmental contexts.