Introduction

Concrete demand across all construction sectors has risen due to urbanization and economic growth 1,2. The substantial energy requirements of cement production are projected to intensify environmental concerns by 2030, with global cement consumption potentially reaching 5.54 billion tons. The Portland cement industry emits over 1.65 billion tons of CO2 annually, representing approximately 5–8% of total anthropogenic greenhouse gas emissions and significantly contributing to global warming 3,4. India ranks second among cement producers, accounting for about 8% of global capacity. Ongoing infrastructure development and increasing housing demand are projected to raise cement consumption in India from 445 MMT in FY24 to 670 MMT by 2030 5. The depletion of natural river sediment, excessive extraction of natural aggregates, and the environmental impacts of cement production pose substantial challenges to sustainable concrete. These practices not only damage the environment but also increase construction costs.

To mitigate these challenges, the construction industry is increasingly utilizing industrial residues and alternative fine materials to produce concrete with lower carbon and resource footprints. Potential materials such as fly ash 6,7,8,9, silica fume 10,11,12,13,14, and manufactured sand 15,16,17 have been examined to improve concrete sustainability 18,19. In this context, some of the studies on SCMs and M-sand are as follows. Kaushik et al. (2025) 13 investigated traditional (MCS) and modified (MFSS and MFSHS) mortar compositions incorporating fly ash, silica fume, and human hair. Mixtures containing these materials showed considerably higher workability; the mix with the greatest improvement, MFSS-2 (Portland cement, Yamuna sand, 10% FA, and 6% SF), exhibited a 42.50% gain. Yergol et al. 14 studied the main parameters affecting the strength of composite concrete made with recycled aggregate (RA) and silica fume. Concrete with 60% RA had lower shear strengths after 7 and 28 days of drying. The mix with 10% silica fume and 45% RA had a higher compressive strength than the other silica fume-based composite concretes and also showed better split tensile strength. Narendra Kumar and Vinod Kumar 20 examined Portland cement with fly ash and GGBS (20% and 30%) in SCC mixes and reported optimal contents of nano-silica (NS), nano-alumina (NA), and graphene oxide (GO) of 6%, 4.5%, and 0.04% of cement, respectively. Somasri and Narendra Kumar (2021) 21 examined the pore structure and rheological properties of HSSCC and observed that, compared with the control mix, performance improved when silica fume (10 wt% of cement) was replaced with fly ash (30 wt% of cement), which also improved the dispersion of graphene oxide (GO) in the concrete. Aswin et al. 22 examined replacing cement with FA at up to 60%, with compression tests at 1, 7, 28, 60, and 180 days. The compressive strength of the ECC mortar samples (ECC-0) varied, but all mixes reached 20 MPa after 1 day, indicating significant early compressive strength. After 28 days, the compressive strength was 111.28 MPa; the optimal performance was at 15% FA, although the 30% FA mix remained strong. ECC-0 samples with more than 30% FA had lower compressive strengths than MF0.

Singh et al. 23 studied high-volume fly ash (HVFA) concrete and the function of silica nanoparticles (SNPs) in the cementitious system. In that research, 40% (40 FA) and 50% (50 FA) of the cement was substituted with fly ash (FA). Incorporating 3% SNPs increased compressive strength and allowed faster construction, since the maximum compressive strength was reached in 7 days instead of 28 days. Specimens containing SNPs also exhibited significantly higher resistance to sulphate attack than the control specimens, by 41% for the 40 FA samples and 34% for the 50 FA samples. Lekhya and Kumar (2024) 6 observed that silica fume combined with fly ash enhanced compressive strength by 145%, with the maximum strength at 10 wt% SF. Scanning electron micrographs showed that the combination of fly ash and silica fume increased compressive strength by densifying the microstructure. SCMs made of fly ash and silica fume (SF) are also being used to improve HPC. After 90 days of curing, the ternary mix U10S15 showed an improved compressive strength of 104.28 MPa and a water absorption of 1.26%, a 44.9% reduction in water absorption with extended curing. Sha and Liu 24 studied a high-performance microfine cementitious grout (HPMCG) and found that the optimal composition was 40% MFA relative to the matrix, with 10% SF and 30% MBFS. The optimized HPMCG had high mechanical strength, good anti-permeability, and an advantageous mineral composition and microstructure.

Predicting concrete compressive strength is challenging due to the interplay of chemical composition, mix proportions, curing times, and hydration processes 25,26. This complexity increases with sustainable concretes that use alternative binders and ingredients, where standard models often underperform. Differences in reactivity, particle size, and chemical composition introduce further uncertainty. Although machine learning (ML) can identify hidden patterns, its effectiveness is often limited by insufficient datasets and a lack of physical insights, such as microstructural analysis27,28,29,30. As a result, ML algorithms frequently struggle to predict the strength of sustainable concrete due to inconsistent data. Table 1 summarizes the ML-based prediction approaches examined.

Table 1 Comparison of ML models from the literature for strength prediction of SCM concrete with the present work.

Based on the comprehensive literature review, several critical research gaps have been identified: (1) limited investigations on the combined effect of fly ash and silica fume as binder with M-sand as fine aggregate in concrete; (2) insufficient correlation between microstructural characteristics and mechanical properties; and (3) a lack of validated machine learning models to forecast the compressive strength of ternary blends with M-sand.

This study addresses these gaps and aims to establish a direct relationship between hydration product formation and the mechanical strength of sustainable ternary concrete mixes using 100% M-sand as fine aggregate, with 10% fly ash and varying silica fume contents (0%, 6%, 12%, 18%, and 24%) as partial cement replacement. Compressive strength, split tensile strength, flexural strength, and ultrasonic pulse velocity (UPV) were measured at 7, 28, and 90 days. Microstructural analysis at 28 days identified and quantified hydration products using scanning electron microscopy with energy-dispersive spectroscopy (SEM–EDS) and thermogravimetric analysis (TGA). To enhance prediction accuracy and reduce experimental workload, advanced machine learning models, including LASSO, Random Forest, Gradient Boosting, XGBoost, AdaBoost, and Artificial Neural Networks, were used to predict the compressive strength of ternary blends with M-sand.

Materials and methods

Materials

Ordinary Portland cement (OPC) of 53 Grade conforming to IS 12269:2015 41 was used in the study. Figure 1 shows the particle size distribution (PSD) curves measured with a Horiba laser diffractometer. The finer particles, here SF and FA rather than cement, have more influence on the microstructure and strength of the material. The D50 values indicate a median particle size of about 44.08 µm for fly ash (FA), about 89.97 µm for cement, and about 86.05 µm for silica fume. The aggregates were classified as fine or coarse according to the requirements of IS 383:2016 42. The coarse aggregate was acquired locally and has a size range of 12.5 mm to 20 mm. The aggregates were cleaned to remove dirt and dust and then dried to a surface-dry state. The coarse aggregate has a specific gravity of 2.7, a bulk density of 1600 kg/m3, and a fineness modulus of 7.17. M-Sand, a locally sourced fine aggregate, has a specific gravity of 2.38, a fineness modulus of 2.5, and a bulk density of 1450 kg/m3. A polycarboxylate ether (PCE)-based superplasticizer (SP) conforming to IS 9103:1999 43 was used. The methodology of the study is shown in Fig. 2.

Fig. 1

Particle size distribution curve of binder materials.

Fig. 2

Flow chart of the methodology for the study.

Mix proportions and preparation

The mix proportions were determined in accordance with IS 10262:2019 44. Six mixtures were made, including the control mixture. The variables of the concrete matrix were fly ash (10%), silica fume (0%, 6%, 12%, 18%, and 24%), and M-Sand (100%); the quantities and mix proportions are shown in Table 2. Mixing was carried out in the laboratory in a pan mixer. The aggregates, binders, water, and superplasticizer were combined for a total of 5 to 10 min, until a homogeneous mixture was achieved. For each mix, the concrete was then poured into adequately lubricated moulds, vibrated, struck off to remove excess concrete, and levelled. The moulds were left for a maximum of 24 h, after which the samples were immersed in the curing tank for periods of 7 days and 28 days.

Table 2 Mix proportions of the mixes (kg/m3).

Data

Normalization of the data features is essential for accurate evaluation of the output parameter. An initial statistical review identified the key input characteristics and the corresponding compressive strength. In the present study, six concrete mixes were evaluated at three curing ages (7, 28, and 90 days) with three experimental replicates, resulting in 54 strength data points. Table 3 summarizes this analysis, presenting the mean, maximum, minimum, and their respective deviations. The compressive strength of the samples ranges from 25.68 to 53.57 MPa. The input parameters, namely cement (C), fly ash (FA), river sand (RS), M-Sand, coarse aggregate (20 mm and 12 mm), water (W), superplasticizer (SP), and curing age, range from 270.6 to 410, 0 to 41, 0 to 73.8, 0 to 624.29, (766.85 and 511.23), 180, 4.1, and 7 to 90 days, respectively. A power transformation using the Yeo–Johnson method 45 was applied to the dataset to reduce skewness and stabilize variance across features. The box plots before and after transformation are shown in Fig. 3 and Fig. 4, respectively. This pre-processing improves feature consistency and the robustness and performance of the predictive models. After processing, the correlation between the eight input features and compressive strength was calculated.
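As an illustration of the pre-processing step, the Yeo–Johnson transform for a fixed exponent λ can be sketched in a few lines of Python. This is a minimal sketch rather than the pipeline used in the study: the function name `yeo_johnson`, the exponent 0.5, and the sample values are assumptions, and in practice λ is usually estimated per feature by maximum likelihood (for example with scikit-learn's `PowerTransformer`).

```python
import math

def yeo_johnson(x, lam):
    """Yeo-Johnson transform of a single value x for a fixed exponent lam."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)                 # lam == 0 branch
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)                   # lam == 2 branch
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)

# Illustrative right-skewed feature values; not taken from Table 3.
sample = [0.0, 24.6, 49.2, 73.8, 410.0]
LAMBDA = 0.5  # assumed exponent; normally estimated from the data
transformed = [yeo_johnson(v, LAMBDA) for v in sample]
```

With λ = 1 the transform reduces to the identity, which is a convenient sanity check; exponents below 1 compress large positive values and so reduce right skew.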

Table 3 Statistical analysis of the data.
Fig. 3

Box plot of the data before the skewness transformation.

Fig. 4

Box plot of the data after the skewness transformation.

Figure 5 shows the Pearson correlation matrix, with numerical coefficients in each cell for clear interpretation of relationships among mixture constituents, curing age, and compressive strength. Strong inverse correlations are found between cement and supplementary materials, including cement–fly ash (r = − 0.95) and cement–silica fume (r = − 0.96), indicating proportional substitution in the mix design. The perfect negative correlation between river sand and M-sand (r = − 1.00) confirms their direct replacement. Compressive strength has only weak correlations with individual constituents, such as fly ash (r = 0.23), silica fume (r = 0.17), river sand (r = − 0.16), and M-sand (r = 0.16), suggesting that material proportions alone do not determine strength development. In contrast, curing age shows a strong positive correlation with compressive strength (r = 0.76), indicating that age is the main factor in strength gain. Including numerical coefficients makes the heatmap a more effective quantitative tool, clarifying the strength of relationships and improving the robustness of the analysis.
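The Pearson coefficient behind each heatmap cell can be sketched as follows. The helper name `pearson_r` and the fine-aggregate quantities are illustrative assumptions; the example only shows why a one-for-one replacement of river sand by M-sand gives r = −1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical fine-aggregate contents (kg/m3): one mix uses river sand only,
# the other five replace it completely with M-sand.
TOTAL_FINE = 600.0
river_sand = [TOTAL_FINE, 0.0, 0.0, 0.0, 0.0, 0.0]
m_sand = [TOTAL_FINE - rs for rs in river_sand]
r_sand = pearson_r(river_sand, m_sand)
```

Because the two columns sum to a constant in every mix, they are exact linear functions of each other, so the coefficient is −1 up to floating-point error.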

Fig. 5

Correlation analysis of different features of the data used.

Experimental investigation

Test procedure

Nine 100 × 100 × 100 mm cubes, nine 100/200 mm cylinders, and six 500 × 100 × 100 mm beams were cast for the mixtures, and a 100-tonne UTM was used to determine the compressive, split tensile, and flexural strengths as per IS 516:2021 46. The UPV test was conducted with a Proceq Pundit Lab instrument (Switzerland) according to IS 13311-1:1992 47. Each mix was cured for 7, 28, and 90 days as required by the tests. Subsequently, microstructural analysis and sensitivity analysis were carried out on the mixes to support the reliability of the study.

TGA analysis

The Hitachi STA-7200, a TGA and DTG instrument, was used to measure the weight loss of the samples during heating. Samples weighing 10 to 20 mg were heated from 27 to 800 °C at 20 °C/min under a constant nitrogen flow of 20 mL/min. DTG analysis was used to determine the peak weight-loss temperatures of the composites.

SEM/EDS analysis

The main goal of the microstructural investigation is to identify the crystalline structures in the materials. After compressive strength testing, scanning electron microscopy (SEM) and energy-dispersive spectroscopy (EDS) are used to analyse the composition of each sample. SEM uses a focused electron beam to produce detailed surface images, while backscattered electron imaging (SBHTESCAN) and visual electron beam (VEG3) help identify specific microstructural features in each sample.

Sensitivity analysis

This study evaluated the effects of cement, silica fume, normal sand, and M-sand on compressive strength, split tensile strength, flexural strength, and ultrasonic pulse velocity. Fly ash, coarse particles, and water were excluded since their quantities were constant in all samples. As shown in Eq. (1), all input variables were normalised using the z-score transformation to ensure comparability across different units and scales.

$$z = \frac{x - \mu }{\sigma }$$
(1)

where x is the observed input value, μ is the mean of the input, and σ is its standard deviation. A multiple linear regression model was then fitted separately for each output using Eq. (2), where y is the output (e.g., compressive strength), α is the intercept, βi are the standardized coefficients for each standardized input zi, n is the number of inputs, and ε is the residual error. Along with the standardized coefficients (unitless, effect per one standard deviation), unstandardized coefficients (effect per one unit increase) were also calculated. Model fit was assessed with the coefficient of determination (Eq. 3), where SSres is the residual sum of squares and SStot is the total sum of squares. Finally, a one-way sensitivity analysis was carried out (Eq. 4): starting from the mean mix design, each input was varied by ± 10% while keeping the others constant, where ŷ is the predicted value and xi is the mean of feature i. This approach highlights which variables have the highest influence on each property.

$$y = \alpha + \mathop \sum \limits_{i = 1}^{n} \beta_{i} z_{i} + \varepsilon$$
(2)
$$R^{2} = 1 - \frac{{SS_{res} }}{{SS_{tot} }}$$
(3)
$$Impact_{i} = {\hat{\text{y}}}\left( {x_{i} \times 1.1} \right) - {\hat{\text{y}}}\left( {x_{i} \times 0.9} \right)$$
(4)
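The standardisation, goodness-of-fit, and one-way sensitivity steps can be sketched in Python as below, following the ± 10% variation described in the text. This is a minimal sketch under stated assumptions: the linear model coefficients, the intercept, and the mean mix are made-up values standing in for a fitted regression, the function names are hypothetical, and the population standard deviation is assumed for the z-score because the text does not say which form was used.

```python
import statistics

def z_scores(values):
    """Eq. (1): standardise values with their mean and (population) std."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def r_squared(y_true, y_pred):
    """Eq. (3): 1 - SS_res / SS_tot."""
    mean_y = statistics.mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def one_way_impact(predict, mean_mix, feature, delta=0.10):
    """Prediction change when one input moves from -delta to +delta."""
    high = dict(mean_mix, **{feature: mean_mix[feature] * (1.0 + delta)})
    low = dict(mean_mix, **{feature: mean_mix[feature] * (1.0 - delta)})
    return predict(high) - predict(low)

# Hypothetical unstandardised linear model; these numbers are NOT fitted values.
COEFS = {"cement": 0.05, "silica_fume": 0.10, "m_sand": 0.02}
INTERCEPT = 5.0

def predict(mix):
    return INTERCEPT + sum(COEFS[name] * mix[name] for name in COEFS)

# Illustrative mean mix design (kg/m3).
mean_mix = {"cement": 340.0, "silica_fume": 40.0, "m_sand": 520.0}
impacts = {name: one_way_impact(predict, mean_mix, name) for name in COEFS}
```

For a linear model the impact of feature i is simply 0.2 × (unstandardised coefficient) × (mean of the feature), so ranking inputs by impact combines the size of the coefficient with the typical magnitude of the input.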

Machine learning

In the present study, a total of six concrete mixes were evaluated at three curing ages (7, 28, and 90 days), with three experimental replicates, resulting in 54 strength data points. Machine learning (ML) models are applied to the experimental dataset to predict compressive strength and identify nonlinear relationships among mix parameters. The input variables included cement, fly ash, river sand, manufactured sand, two sizes of coarse aggregate, water, superplasticiser, and curing time. Before modelling, we checked the data for completeness, accuracy, outliers, and distribution patterns. To prevent overfitting and data leakage, we applied appropriate data splitting and regularisation methods during training and validation (70/30). For consistency, we randomly split the dataset into training and test sets with a fixed seed, ensuring equal counts of 7, 28, and 90-day observations. We tested several regression techniques, including lasso regression (Lasso), random forest (RF), adaptive boosting (Adaboost), gradient boosting (GB), extreme gradient boosting (XGBoost), and artificial neural networks (ANN). Model training was done on a standard workstation using Python and the scikit-learn (v1.3) module. We used the following metrics to measure prediction accuracy: coefficient of determination (R2), mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE).
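The age-balanced split and the four error metrics can be sketched with the Python standard library alone. This is an illustrative sketch, not the study's code: the function names, the seed value 42, and the placeholder records are assumptions, and the actual work used scikit-learn.

```python
import math
import random

def stratified_split(records, key, test_fraction=0.30, seed=42):
    """Split records so every group of `key` contributes the same test share."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    groups = {}
    for rec in records:
        groups.setdefault(rec[key], []).append(rec)
    train, test = [], []
    for group_key in sorted(groups):
        members = groups[group_key][:]
        rng.shuffle(members)
        n_test = round(test_fraction * len(members))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

def regression_metrics(y_true, y_pred):
    """Return R2, MSE, MAE and RMSE for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"R2": r2, "MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse)}

# Placeholder records: 6 mixes x 3 ages x 3 replicates = 54 rows (no strengths).
records = [{"mix": m, "age": a, "rep": r}
           for m in range(1, 7) for a in (7, 28, 90) for r in range(3)]
train, test = stratified_split(records, key="age")
```

With 18 observations per curing age, rounding 30% of each group gives 5 test rows per age (15 in total), so the realised split is close to, but not exactly, 70/30.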

Lasso regression

Lasso regression, or least absolute shrinkage and selection operator, is a regularisation technique in machine learning that enhances linear models for high-dimensional data by performing feature selection and reducing overfitting.
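For reference, the lasso estimate is commonly written as the minimiser of a penalised least-squares objective. The notation below is generic and not taken from the study: m is the number of samples, p the number of features, x_ij the value of feature j in sample i, and λ ≥ 0 an assumed penalty weight that controls how strongly coefficients are shrunk towards zero.

$$\hat{\beta } = \mathop {\arg \min }\limits_{\alpha ,\beta } \left\{ {\frac{1}{2m}\mathop \sum \limits_{i = 1}^{m} \left( {y_{i} - \alpha - \mathop \sum \limits_{j = 1}^{p} \beta_{j} x_{ij} } \right)^{2} + \lambda \mathop \sum \limits_{j = 1}^{p} \left| {\beta_{j} } \right|} \right\}$$

The absolute-value penalty can set some coefficients exactly to zero, which is how the method performs feature selection.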

Random forest (RF)

Random forest bagging trains multiple decision trees using n-size bootstrap samples from the original dataset. Bootstrap uses random sampling with replacement to ensure tree diversity. Each node split evaluates a random selection of input features, adding decorrelation to the trees. The final regression prediction is the average of all tree predictions.

Gradient boosting machines

Gradient boosting weaves together a series of simple models, most often decision trees, to build a powerful predictor. With each step, the model learns from past mistakes, steadily refining its accuracy by minimizing errors.
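The idea of learning from past mistakes can be sketched for squared-error loss, where each new tree is fitted to the residuals of the current ensemble. The code below is a toy illustration with one input and depth-one trees (stumps); the function names, the learning rate, and the data are assumptions and do not reproduce the study's models.

```python
def fit_stump(xs, residuals):
    """Least-squares stump on one feature: (threshold, left_value, right_value)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Squared-error boosting: every stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)             # initial prediction: the mean
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [p + learning_rate * (left_value if x <= threshold else right_value)
                 for x, p in zip(xs, preds)]
    return base, stumps, preds

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Made-up data: curing age (days) against compressive strength (MPa).
ages = [7, 7, 28, 28, 90, 90]
strengths = [27.0, 31.0, 40.0, 46.0, 44.0, 52.0]
_, _, preds_1 = gradient_boost(ages, strengths, n_rounds=1)
_, _, preds_20 = gradient_boost(ages, strengths, n_rounds=20)
```

On this toy data the training error falls as rounds are added, which is the behaviour the paragraph describes; on real data the number of rounds and the learning rate are tuned on held-out data to avoid overfitting.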

Extreme gradient boosting (XGBoost)

In engineering and experimental research, where data is often restricted, extreme gradient boosting (XGBoost) stands out as an advanced and effective ensemble learning approach that thrives in these conditions. To avoid the problems encountered by bagging approaches, XGBoost employs boosting, building decision trees sequentially while gaining insight from the errors made by earlier models. Even with fewer data, XGBoost can discover important patterns because of its innovative method.

Artificial neural networks (ANNs)

An artificial neural network (ANN) is a computational system that mimics the way the brain processes information and makes decisions by modelling the behaviour of neurons.

Adaptive boosting method

Adaptive boosting (AdaBoost) is another powerful ensemble ML method. It combines several simple "weak" models, such as decision stumps, into an accurate "strong" model. AdaBoost iteratively raises the weights of the samples that were poorly predicted by the previous models, so that each new model concentrates on correcting those errors.

Results and discussions

Examination of compressive strength (CSM) features

Figure 6 illustrates the progression of compressive strengths for the mixtures. All mixtures exhibit increased strength with extended curing periods. The compressive strength measurement (CSM) for mixtures M1 to M6 ranges from approximately 26.75 to 32.87 MPa at 7 days, 39.62 to 47.65 MPa at 28 days, and 42.95 to 53.57 MPa at 90 days. While CSM improves notably at 28 days, no significant enhancement is observed at 90 days. The data show that mixtures cured for 7, 28, and 90 days generally achieve higher strengths than the M1 samples. After 7 days, the strength increases are approximately 2.62%, 6.56%, 18.61%, 16.03%, and 3.5%. At 28 days, the increases are 2.72%, 6.80%, 16.85%, 13.7%, and 2.50%, respectively. At 90 days, the percentages for M2-M6 relative to the control sample M1 are about 2%, 5.9%, 20%, 16.2%, and 6.2%. The M4 mixture achieves the highest strength, incorporating 12% silica fume (SF) and 10% fly ash (FA) as cement replacements and manufactured sand (MS) as fine aggregate, resulting in compressive strengths of approximately 32.87 MPa, 47.65 MPa, and 53.57 MPa at 7, 28, and 90 days, respectively.
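The percentage improvements quoted above are relative changes with respect to a reference strength. As a small worked example of the same arithmetic, the sketch below computes the age-to-age gain of M4 from the strengths stated in the text; the function name `percent_gain` is an assumption, and the gains relative to M1 are not recomputed because the M1 values are not listed here.

```python
def percent_gain(value, reference):
    """Relative change of value with respect to reference, in percent."""
    return (value - reference) / reference * 100.0

# M4 compressive strengths reported in the text (MPa), keyed by curing age.
m4 = {7: 32.87, 28: 47.65, 90: 53.57}
gain_7_to_28 = percent_gain(m4[28], m4[7])
gain_28_to_90 = percent_gain(m4[90], m4[28])
```

For M4 this gives a gain of roughly 45% between 7 and 28 days and roughly 12% between 28 and 90 days.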

Fig. 6

CSM properties of the mixes at 7, 28, and 90 days of curing.

Khan and Ali 7 reported that the mechanical properties of concrete are improved when it contains 2% coconut fiber, 10% fly ash, and 15% silica fume. This improvement is a result of the coconut fiber material’s overall pore structure and continuity. Yergol et al. 14 reported a reduction in compressive and split tensile strength, with values approximately 5% below the intended mean strength, due to the incorporation of 10% silica fume and 60% recycled aggregate. This nominal loss of strength suggests the formation of a denser, more compact concrete matrix, attributed to the cementitious properties of silica fume, as confirmed by SEM analysis. Zada et al. 12 evaluated the effects of silica fume (SF) and metakaolin (MK) on steel-fiber-reinforced concrete. Concrete mixtures containing 10% silica fume and 1% steel fiber exhibited a 19.11% increase in compressive strength compared to the control group.

Examination of split tensile strength (STS) features

Split tensile strength is a critical parameter for evaluating the mechanical performance of concrete, particularly in relation to reinforcement shear and anchoring. Figure 7 shows that each mixture exhibits a consistent increase in strength during curing, with values increasing from M1 to M6. At 7 days, split tensile strength (STS) ranges from 2.35 to 2.94 MPa, increases to 4.67 to 7.65 MPa at 28 days, and reaches 5.25 to 7.85 MPa at 90 days. These mixtures display significant strength enhancements, with improvements ranging from 1.26 to 20% at 7 days, 13.37% to 38.95% at 28 days, and 5.45% to 33.32% at 90 days. Among all mixtures, M4 (FA10SF12MS100) achieves the highest STS, whereas M5 and M6 exhibit a slight reduction compared to M4.

Fig. 7

STS properties of the mixes at 7, 28, and 90 days of curing.

In previous studies, Bheel et al. 10 used silica fume (SF), rice husk ash (RHA), and marble dust powder as SCMs, both individually and as ternary cement mixtures, increasing compressive strength by 10.44% and split tensile strength by 9.54%. Incorporating 9% TCM in concrete produced strong results and reduced cement consumption. Zada et al. 12 explored the effects of SF and MK on steel-fiber-reinforced concrete. Compared with a control mix, concrete with 10% SF and 1% steel fiber had a 17.23% increase in split tensile strength (STS).

Examination of flexure strength (FST) features

The flexural strength (FST) of the concrete at curing ages of 28 and 90 days is shown in Fig. 8. FST values range from 4.61 to 6.83 MPa at 28 days and from 4.98 to 7.02 MPa at 90 days. The mixtures show improved FST values after partial replacement of cement with FA and SF, along with substitution of river sand with MS. The improvement may be attributed to a strong bond at the ITZ between the aggregate and paste phases, associated with the formation of primary and secondary C–S–H gels in the matrix. Mix M4 (FA10SF12MS100) shows FST improvements of about 32.5% and 29% compared with the control mixture (M1), whereas M5 and M6 show a decrease in FST compared with M4.

Fig. 8

FST properties of the mixes at 28 and 90 days of curing.

Examination of the quality of concrete features

The quality of the concrete is assessed non-destructively using UPV for the mixes at each curing age before compressive strength testing. Figure 9 shows an increasing trend in UPV, which indicates that the quality of the mixes is excellent as per IS 13311-1:1992 47. The UPV values range from 4.56 to 4.98 km/s at 7 days, 4.75 to 5.32 km/s at 28 days, and 4.85 to 5.56 km/s at 90 days. The quality of the concrete samples is excellent with the replacement of cement with FA and SF and of fine aggregate with M-Sand. Zada et al. 12 examined the influence of SF and MK with steel-fiber inclusion compared with the control mix; concrete mixes containing 10% SF and 1% steel fiber increased the ultrasonic pulse velocity (UPV) by 10.06%. In this study, the UPV of the mixes shows an increasing trend, and mix M4 (FA10SF12MS100) shows improvements of about 8.43%, 10.7%, and 12.76% at 7, 28, and 90 days, respectively.

Fig. 9

UPV of the mixes at 7, 28 and 90 days of curing.

Sensitivity analysis

The regression and sensitivity analysis were carried out for all four outputs, namely compressive strength, split tensile strength, flexural strength, and ultrasonic pulse velocity (UPV). The results showed that the type of sand used in the mix played the most important role across multiple properties, with M-Sand consistently having a positive influence and Normal Sand showing a negative influence. For compressive strength, increasing the proportion of M-Sand improved strength, while higher amounts of Normal Sand reduced it, and the sensitivity analysis confirmed that a ± 10% change in M-Sand had the largest impact on predicted strength. For split tensile strength, silica fume showed a positive effect, meaning it enhanced tensile performance, while cement appeared to have a negative effect in this dataset, and sensitivity analysis indicated that silica fume had the most influence on this property. The results indicated that flexural strength increased with the use of manufactured sand (M-sand) and decreased with river sand, with similar trends observed in compressive strength. Sensitivity analysis demonstrated that variations in river sand content had the most significant impact on material properties. Ultrasonic pulse velocity (UPV) was primarily influenced by cement content; river sand reduced predicted UPV values, whereas higher cement content enhanced UPV. The ratio of M-sand to river sand affected multiple mechanical properties. Increased silica fume content improved tensile strength, while the effect of cement was inconsistent. Due to the small sample size (n = 6), the regression models accounted for only a portion of the total variance. Therefore, these findings should be interpreted as preliminary trends, and further testing with larger sample sizes is necessary to provide stronger evidence and develop more reliable models.

Assessment of TGA analysis

TGA and DTG were used to evaluate the thermal stability of the ternary mixes with M-Sand. TGA is an invaluable technique for understanding how materials react to heat; it allows thermal stability, decomposition, deterioration, the impact of SF and M-Sand, and moisture content to be examined. The results are shown in Fig. 10. The thermograms show that the reference sample (M1) and the binder mixes (M2–M6) lose mass over the whole temperature range. Figure 10a shows a total mass loss of 8.91% at 800 °C, with notable peaks at 75, 330, 450, and 700 °C. The initial mass loss, less than 2% between 0 and 200 °C, is attributed to the dehydration of water-bound substances.

Fig. 10

TGA analysis of the mixes. (a) M2 (FA10SF0MS100), (b) M3 (FA10SF6MS100), (c) M4 (FA10SF12MS100), (d) M5 (FA10SF18MS100).

The mass loss between 200 and 400 °C is about 2.38%, and that in the major peak region of 400–600 °C is about 2.17%. The final mass-loss peak is observed at 600–800 °C and is attributed to the release of carbon associated with the decomposition of calcite and other carbonates 48. The TGA curve indicates a mass loss of about 2.79% at 600–800 °C. Figure 10b shows the TGA behaviour of the M2 sample. The first mass loss, up to 200 °C, is less than about 1.2% and is due to the loss of water bound in the sample. The main peak, observed between 400 and 600 °C, is about 3.66% and is due to dehydration of (OH) groups. Figure 10c–f shows the three major peaks for the samples from M2–M6. In general, the mass loss from 0 to 200 °C is less than about 1.5% (M3–M6). Apart from M5, which is < 5%, the sample shows the slowest rate of weight loss, suggesting it has the lowest water content compared to the others. This may be associated with the effect of FA and SF, which make the matrix denser and fill gaps. The evaporation of physically bonded water was the main cause of the slow weight loss of all samples, which was less than about 5.5% within the temperature range of 200–500 °C. A further process contributing to the gradual decrease in sample weight was the dehydroxylation of chemically bonded silicon-hydroxyl groups, which formed silicon-oxygen groups and released water 49. In addition, the combustion of residual coal content from the FA and SF mainly explained the minimal weight loss between 600 and 800 °C. Beyond 600 °C, the DTG curves show a major hump, indicating a slight alteration in the weight-loss pattern.
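Stage-wise mass losses of this kind are read off a TGA curve as differences in residual mass between window limits. The sketch below illustrates that bookkeeping; the helper names are assumptions, and the five-point curve is not measured data but was constructed to agree with the stage losses reported for the reference sample (2.38%, 2.17%, 2.79%, and 8.91% in total), with the first-stage loss of 1.57% inferred by subtraction.

```python
def mass_at(curve, temperature):
    """Linearly interpolate residual mass (%) from sorted (T, mass) points."""
    for (t0, m0), (t1, m1) in zip(curve, curve[1:]):
        if t0 <= temperature <= t1:
            return m0 + (m1 - m0) * (temperature - t0) / (t1 - t0)
    raise ValueError("temperature outside the measured range")

def window_loss(curve, t_low, t_high):
    """Mass loss (%) between two temperatures."""
    return mass_at(curve, t_low) - mass_at(curve, t_high)

# Illustrative residual-mass curve (temperature in deg C, mass in %).
curve = [(27, 100.00), (200, 98.43), (400, 96.05), (600, 93.88), (800, 91.09)]
windows = [(27, 200), (200, 400), (400, 600), (600, 800)]
losses = {w: round(window_loss(curve, *w), 2) for w in windows}
total_loss = round(window_loss(curve, 27, 800), 2)
```

The window losses add up to the total loss, which provides a quick consistency check when stage values are quoted separately from the overall figure.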

Figure 10c–f demonstrates that elevated temperatures increase the complexity observed in the DTG analysis of the mixtures. At 300, 500, and 700 °C, corresponding to the most prominent peaks, about 10% of the mass is lost. This mass loss is attributed to the dehydroxylation and degradation of hydroxyl groups present in fly ash and silica fume. The increased peak intensity suggests a higher concentration of carbonated products released as hydrates, such as calcite, calcium silicate hydrate, Ca(OH)2, and C–A–S–H, which subsequently decompose 48.

SEM/EDS observations

The morphology and crystalline structure of the mixes were examined by microstructural analysis, as shown in Fig. 11. Mix M2 (FA 10%, SF 0%, and M-Sand 100%) is shown in Fig. 11a; the portlandite structure, fractures, holes, and a rich C–S–H matrix are all visible in the scanning electron micrograph. Mix M3 (FA 10%, SF 6%, and M-Sand 100%) is shown in Fig. 11b, where void patterns and thick C–S–H gel are seen. Mix M4 (FA 10%, SF 12%, and M-Sand 100%) is shown in Fig. 11c; portlandite and C–S–H nanostructures are clearly visible within the thick C–S–H gel, and the hydration process fills the tiny gaps and fissures with growing nanostructures. Mix M5 (FA 10%, SF 18%, and M-Sand 100%) is shown in Fig. 11d; the micrograph reveals a number of holes and the development of C–S–H gel.

Fig. 11

SEM images of the mixes. (a) M2 (FA10SF0MS100), (b) M3 (FA10SF6MS100), (c) M4 (FA10SF12MS100), (d) M5 (FA10SF18MS100).

The scanning electron microscopy (SEM) images show that in M4 the aggregate-paste interface is much less porous and has better bonding, and the hydration products and aggregate surfaces are in almost continuous contact. Small particles of FA and SF enhance the microstructure by reacting with CH to produce more C–S–H gel, which is visible at the microscopic level. This gel occupies the interstitial spaces between cement grains, resulting in a denser matrix and notable pore refinement 50,51. The combined filler action and pozzolanic reactivity of FA and SF make the material stronger. This improvement is linked to the formation of nanostructures through the aggregation of C–S–H nanostructures and portlandite crystals 11,12,51,52. As a result, the interfacial transition zone (ITZ) between the aggregate and cement paste is improved 51,52.

EDS analysis

Energy dispersive spectroscopy (EDS) was used to determine the elemental composition and the C–S–H gel formation in the concrete, and it revealed a systematic progression in elemental composition across the mixtures. The relationship between the Ca/Si ratio and the cement hydration process was also examined. The EDS spectra show peaks of Si, Ca, C, and O (Fig. 12). The Ca/Si ratio varies from 1.46 to 7.04: Fig. 12a gives Ca/Si = 3.36 for M2, Fig. 12b gives Ca/Si = 7.04 for M3, Fig. 12c gives Ca/Si = 1.46 for M4, and Fig. 12d gives Ca/Si = 2.97 for M5. The oxygen concentration was reasonably constant (53.14–66.84%) across all mixes, indicating a consistent hydration environment. A gradual rise in silicon content (from 4.46 to 11.03% of the total) and a systematic drop in calcium concentration (from 25.49% in M2 to 16.8% in M4) suggest that pozzolanic reactions are producing stronger C–S–H gel structures.

Fig. 12

EDS analysis of the mixes. (a) M2 (FA10SF0MS100), (b) M3 (FA10SF6MS100), (c) M4 (FA10SF12MS100), (d) M5 (FA10SF18MS100).

At 28 days, the (Al + Fe)/Ca ratio was 0.170% in M4 but only 0.081% in M2, suggesting that aluminium and iron were partially substituted into the C–S–H gel phases. Optimizing the composition in M4 (10% fly ash, 12% silica fume, and 100% M-Sand) allows a direct correlation to be drawn between compositional parameters, mechanical performance, and microstructural properties. According to Zhang et al.53, strength decreases at larger Ca/Si ratios, whereas lower ratios are associated with more stable C–S–H gel structures that have improved mechanical characteristics. An increase in the atomic Ca/Si ratio therefore reduces the compressive strength of concrete, because less calcium silicate hydrate (C–S–H) gel is synthesized 54,55,56. On the other hand, the presence of Si and Ca in suitable proportions causes a dense accumulation of C–S–H gel in the ITZ of the concrete matrix, leading to a tightly packed, homogeneous structure. The results correspond with recognized values for ideal C–S–H gel composition in cementitious systems.
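The two compositional ratios discussed above can be computed directly from an EDS elemental table. The following is a minimal sketch, assuming the percentages are supplied as a plain dictionary; the function name `eds_ratios` and the example spectrum are hypothetical and are not taken from Fig. 12.

```python
def eds_ratios(element_pct):
    """Return Ca/Si and (Al + Fe)/Ca from a dict of elemental percentages.

    All values must be on the same basis (all atomic % or all weight %).
    """
    ca = element_pct["Ca"]
    si = element_pct["Si"]
    if ca <= 0 or si <= 0:
        raise ValueError("Ca and Si percentages must be positive")
    # Al and Fe may be absent from a spectrum; treat missing peaks as zero.
    al = element_pct.get("Al", 0.0)
    fe = element_pct.get("Fe", 0.0)
    return {"Ca/Si": ca / si, "(Al+Fe)/Ca": (al + fe) / ca}


# Hypothetical spectrum for illustration only (not measured data).
example = {"O": 60.0, "Ca": 18.0, "Si": 12.0, "Al": 2.0, "Fe": 1.0, "C": 7.0}
ratios = eds_ratios(example)
print(ratios)
```

A lower Ca/Si value from such a calculation corresponds to the silica-rich C–S–H that the text associates with higher strength.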

Machine learning models

The LASSO, RF, GB, AB, XGB, and ANN prediction models were trained and validated on the aforementioned dataset. Five-fold cross-validation was used to ensure that the machine learning models were robust and reliable. Model performance was evaluated using R2, MSE, RMSE, and MAE, with consistent results across folds indicating trustworthy predictions and minimal overfitting. Hyperparameters were tuned to balance accuracy and complexity. Ensemble models used 300 to 400 estimators, while XGBoost applied a low learning rate and shallow trees to promote generalisation. The artificial neural network was configured with a fixed number of iterations and two hidden layers. Model performance was assessed by comparing predicted and actual compressive strengths at 7, 28, and 90 days. Although accuracy varied due to differences in architecture and training, all models identified the key patterns in compressive strength.
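The five-fold procedure can be illustrated with a small index-splitting routine. This is a sketch in plain Python, not the authors' pipeline: the text does not state which library was used, and the helper name `k_fold_indices`, the seed, and the sample count of 23 are assumptions chosen for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k (train, validation) index pairs."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Every k-th index goes to the same fold, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


# Hypothetical dataset size, for illustration only.
splits = k_fold_indices(23, k=5)
for fold_no, (train, validation) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train)} training rows, {len(validation)} validation rows")
```

Each sample appears in exactly one validation fold, so every mix is used once for testing and four times for training.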

Performance of ML models

The performance of the machine learning models, assessed using root mean square error (RMSE), mean absolute error (MAE), mean squared error (MSE), and the coefficient of determination (R2), is presented in Table 4. To predict the compressive strength of the concrete mixes at 7, 28, and 90 days, the study employed LASSO, RF, GB, XGBoost, AdaBoost, and ANN models. Among these, the GB and XGBoost models demonstrated the highest accuracy and robustness, achieving R2 = 0.99, RMSE values of 0.688 MPa and 0.471 MPa, and MAE values of 0.55 MPa and 0.5565 MPa. These models effectively captured complex nonlinear relationships among the input parameters. Similarly, ANN (0.99), AdaBoost (0.99), and RF (0.810 MPa, 0.49 MPa, and 0.90 MPa, respectively) modelled the complex relationships accurately in terms of R2, RMSE, and MAE despite the very small dataset.
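For reference, the four metrics can be computed from paired actual and predicted strengths as sketched below. The function name `regression_metrics` and the three sample values are hypothetical and are not drawn from Table 4.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE and R2 for paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),                    # same units as strength (MPa)
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical strengths in MPa, for illustration only.
metrics = regression_metrics([30.0, 40.0, 50.0], [31.0, 39.0, 52.0])
print(metrics)
```

Because RMSE is the square root of MSE and can never be smaller than MAE on the same predictions, these relationships offer a quick consistency check when reading a table of reported metrics.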

Table 4 Performance of ML models developed.

On the training samples, the LASSO model showed a larger root-mean-square error (RMSE) of 25.01 MPa and poorer accuracy (R2 = 0.60). Ensemble learning techniques outperform single predictive models in capturing the behaviour of concrete matrices. On this basis, the models rank in performance as follows: GB > AdaBoost > XGBoost > RF > ANN > LASSO. For engineering-scale strength prediction, the minimum RMSE fell within an acceptable range: ± 0.69 MPa for GB, ± 0.9 MPa for AdaBoost, and ± 1.1 MPa for the XGBoost, RF, and ANN models.

The six machine learning models tested for their predictive capability were LASSO, RF, GB, XGBoost, AdaBoost, and ANN. The models were compared using plots of predicted versus actual compressive strength and the coefficient of determination (R2). The differences in R2 show that, in contrast to more basic models, ensemble models are able to represent complicated interactions between mix parameters and nonlinear relationships. Additional visual evidence for these quantitative results may be seen in Fig. 13, which shows scatter plots of the actual and predicted compressive strength values.

Fig. 13

Actual versus predicted compressive strength for the different ML models.

The models exhibit distinct trends when predicted and actual compressive strengths are compared within the 28–52 MPa range. LASSO regression is frequently inaccurate, with prediction errors exceeding 20% when compressive strength falls between 30 and 45 MPa. Its linear formulation fails to capture the complex relationships among mix design factors, including binder content and curing age. In contrast, the Random Forest model achieves an R2 score of 0.98, indicating strong alignment with the experimental results. The model’s decision trees effectively identify relevant feature interactions, maintaining prediction errors within ± 20% across all strength levels. This finding suggests that Random Forest can detect nonlinear and indirect effects. Gradient Boosting performs even better, with an R2 of 0.99. Its predictions for strengths between 30 and 50 MPa closely align with the ideal reference line and remain tightly clustered, within ± 20% of the mean, due to the iterative learning process that minimises variation in feature contributions.

AdaBoost produces predictions that are within ± 20% of the actual values in most cases, indicating high accuracy (R2 = 0.98). However, its consistency decreases at compressive strengths between 48 and 52 MPa, limiting its ability to capture complex feature effects at higher strength levels. XGBoost also yields predictions closely aligned with observed data (R2 = 0.98–0.99), particularly in the 28–50 MPa range. It maintains prediction errors below 30% through regularisation and robust tree boosting, and effectively models nonlinear interactions among features. The artificial neural network (ANN) similarly demonstrates strong predictive performance in the 28–52 MPa range, achieving an R2 of 0.98.
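The ± 20% error band used in the comparison above can be checked numerically as well as visually. The sketch below counts the share of predictions that fall inside a relative tolerance of the actual value; the helper name `fraction_within_band` and the sample strengths are hypothetical.

```python
def fraction_within_band(actual, predicted, tolerance=0.20):
    """Share of predictions lying within +/- tolerance of the actual value."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(
        1
        for a, p in zip(actual, predicted)
        if abs(p - a) <= tolerance * abs(a)  # relative error against the measured strength
    )
    return inside / len(actual)


# Hypothetical strengths in MPa, for illustration only.
actual_mpa = [30.0, 40.0, 50.0, 52.0]
predicted_mpa = [37.0, 41.0, 49.0, 40.0]
share = fraction_within_band(actual_mpa, predicted_mpa)
print(f"{share:.0%} of predictions fall within the 20% band")
```

Reporting this share alongside R2 makes it clearer how many individual predictions, rather than the average fit, meet the stated tolerance.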

Interpretation of model behaviour

Analysis of model performance and feature importance reveals that compressive strength is predicted differently across models. Figure 14 demonstrates significant variation in input weights and feature combinations across linear, ensemble, and neural network approaches. The LASSO model emphasises specific features, assigning approximately 40% importance to cement, 25% to superplasticiser, 20% to fly ash, and 10% to curing age, with minimal weight on other variables. Due to its linear structure and L1 regularisation, LASSO reduces the influence of correlated and less significant variables, resulting in this uneven distribution. Consequently, LASSO fails to capture complex nonlinear interactions among aggregates, water, and other cementitious materials, which leads to less accurate and more variable predictions. In contrast, tree-based ensemble models provide clearer physical interpretations. For example, Random Forest attributes about 85% importance to curing age, highlighting the critical role of hydration time in strength development. Superplasticiser (8%), cement (4%), and fly ash (2%) also contribute, indicating that these models capture nonlinear relationships among mixture components more effectively. Cement and admixtures enhance predictions by correcting errors over time.

Fig. 14

Relative importance of input features in predicting the compressive strength for ML models.

Gradient Boosting exhibits a similar trend, with curing age accounting for 90% of total importance. These patterns align with the established understanding that strength increases as hydration progresses. AdaBoost distributes importance more evenly, assigning around 70% to curing age, 15% to silica fume, and 5% to cement. This broader allocation, however, reduces its robustness at higher strength levels since secondary attributes contribute less consistently. XGBoost achieves a good balance, with curing age dominant at 90%, aided by consistent contributions from cement and admixtures. Its regularisation strategy supports balanced learning of both major and secondary features, resulting in accurate predictions across the entire compressive strength range.

Within the Artificial Neural Network (ANN) approach, fly ash, superplasticiser, cement, aggregates, and age each contribute between 12 and 17% to overall relevance. This balanced distribution underscores the ANN’s capacity to identify complex, continuous, and interrelated nonlinear interactions among mixture components without dependence on a single dominant factor. Feature-importance analysis demonstrates that ensemble and neural network models effectively explain the intricate relationships that influence compressive strength, particularly those driven by curing age and binder properties. In contrast, linear models do not adequately represent these complex processes.

Figure 15 shows the feature-importance heat map. Across the majority of ensemble models, age (days) is the most important predictor: the high weights given to age by XGBoost (0.92), Random Forest (0.89), Gradient Boosting (0.86), and AdaBoost (0.77) show that curing time is the main factor determining compressive strength. Silica fume (SF; 0.18 in AdaBoost, 0.10 in Gradient Boosting) and cement (C; around 0.03–0.04) are the secondary contributors, whereas fly ash (FA), river sand (RS), and manufactured sand (M-Sand) have a negligible effect (around 0.00–0.01). LASSO, on the other hand, reflects linear sensitivity by distributing importance among cement (0.41), silica fume (0.25), fly ash (0.21), and age (0.11). The ANN assigns nearly uniform importance (~ 0.16–0.20) across all features, indicating balanced feature use.
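The heat-map values for each model read as shares of a total, which is what results when raw importance scores are scaled to sum to one. The sketch below shows one common way to do this, assuming absolute coefficient magnitudes are used for a linear model such as LASSO; the text does not state how the importances were actually derived, and the function names and coefficient values here are hypothetical.

```python
def normalise_importance(raw_scores):
    """Scale absolute raw scores so that the importances sum to 1."""
    magnitudes = {name: abs(value) for name, value in raw_scores.items()}
    total = sum(magnitudes.values())
    if total == 0:
        raise ValueError("at least one score must be non-zero")
    return {name: m / total for name, m in magnitudes.items()}


def rank_features(importance):
    """Return (feature, importance) pairs, most important first."""
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical linear-model coefficients, for illustration only.
coefficients = {"cement": 4.0, "silica_fume": -2.0, "fly_ash": 2.0, "age": 2.0}
importance = normalise_importance(coefficients)
for feature, share in rank_features(importance):
    print(f"{feature}: {share:.2f}")
```

Putting every model on the same 0 to 1 scale in this way is what allows the rows of a heat map to be compared directly.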

Fig. 15

Feature importance heat map for ML models.

Conclusion

This study investigated various concrete mixes by evaluating their mechanical strength, microstructure, and the predictive performance of machine learning models. Experimental blends incorporated 100% manufactured sand (M-sand), 10% fly ash (FA), and silica fume (SF) at levels of 0%, 6%, 12%, 18%, and 24%. The following key conclusions were drawn from the research:

  1. Partial replacement of cement with SF, together with M-Sand as fine aggregate, improved the mechanical properties of the mixtures at the curing ages tested compared with the reference mix (M1).

  2. Replacing cement with SF12% and FA10%, with M-sand as fine aggregate (M4), improved CSM by about 16.85%, STS by 38.95%, and FTS by 32.50% at 28 days of curing compared with the reference mix (M1). With FA and SF incorporated in the cement and M-sand used as fine aggregate, the concrete quality is excellent, with a velocity > 4.7 km/s in all the mixtures.

  3. TGA results show that FA–SF mixtures exhibit less than 2.5% weight loss at 250 °C, increasing to ~ 7% between 400 and 600 °C and ~ 10% between 600 and 800 °C. The major losses above 600 °C are attributed to carbonation loss, portlandite decomposition, and C–S–H bond disruption, which lead to increased porosity and reduced compressive strength.

  4. The statistical models developed for the mixes demonstrated superior accuracy, providing reliable tools for practical implementation, and sensitivity analysis indicates a strong positive influence of the binders and M-Sand on the output parameters.

  5. SEM–EDS results indicate that the incorporation of FA, SF, and M-sand promotes the formation of calcium-rich phases and secondary ettringite, resulting in a denser and more refined interfacial transition zone (ITZ). The improved chemical integrity of the ITZ enhances overall concrete performance. EDS analysis further revealed an optimized Ca/Si ratio of 1.46 for mix M4 at 28 days, confirming the formation of stable C–S–H gel with superior binding capacity and mechanical performance.

  6. Gradient Boosting achieved the best predictive performance (R2 = 0.9929) with the lowest errors, followed by AdaBoost and Random Forest (R2 > 0.98), while XGBoost and ANN showed moderate accuracy; LASSO performed poorly due to the limitations of linear modeling.