Introduction

Self-consolidating concrete (SCC) is a highly flowable concrete that spreads under its own weight, filling all cavities in the formwork without vibration. SCC has drawn considerable attention in modern construction because it improves the efficiency and quality of concrete placement, particularly in complex formwork and congested reinforcement zones1. The rheological properties of SCC include slump flow (mm), V-funnel time (s), yield stress, viscosity, and flowability. They are particularly relevant to the construction sector, since concrete is placed in a structure while it is still in the plastic state2.

Conventionally, the rheological properties of SCC have been determined by experimental testing, which is time-consuming and comparatively expensive. Recent years have seen the adoption of sophisticated computational methods, including Gene Expression Programming (GEP), Deep Neural Networks (DNN), Decision Trees (DT), Support Vector Machines (SVM), and Random Forests (RF), for predicting these properties from mix design parameters and material properties. GEP, an evolutionary algorithm, can capture intricate relationships and provide readily interpretable closed-form solutions3. DNN, a sub-branch of machine learning, handles large datasets well and learns nonlinear patterns, making it a promising tool for the prediction of concrete properties4.

Several machine learning techniques have been used to model SCC workability and rheology. For instance, Nunes et al.5 predicted SCC’s flowability using artificial neural networks, while Zhang et al.6 modeled the influence of mix proportions on SCC’s yield stress using GEP. Although promising, further studies are needed that compare the performance of GEP and DNN models for predicting SCC’s rheological properties, since each approach has its own advantages.

The main goal of this study is to develop and compare the performance of different machine learning (ML) models, namely Gene Expression Programming (GEP), Deep Neural Networks (DNN), Decision Trees (DT), Support Vector Machines (SVM), and Random Forests (RF), for predicting the rheological properties of self-consolidating concrete (SCC) in terms of slump flow (mm) and V-funnel time (s). An extensive dataset of 348 SCC mixtures was gathered from the published literature, covering a broad variety of mix design parameters.

Rheometer

A rheometer remains a valuable instrument in rheological testing, where the shear resistance of concrete is measured. More importantly, it is useful for assessing shear stress as well as flowability under different loading conditions, making it applicable to the study of concrete under stress conditions such as those that occur during placement and compaction.

The RILEM technical committee 266-MRP has carried out round-robin testing to examine the performance of different rheometers in measuring yield stress and viscosity values, together with the flow behavior of cementitious materials7. Although the operating mechanisms and measurement procedures differ, the overall outcome of this round-robin program confirmed that the suitability of a rheometer depends on the test conditions and the type of material being tested. Therefore, understanding the strengths and weaknesses of each instrument allows the fresh concrete properties to be characterized with greater accuracy and uniformity in any testing situation. Different types of rheometers are given in Table 1.

Table 1 Different types of rheometers.

Factors influencing rheological properties as determined by rheometers

The measurement of rheological properties in cement-based materials depends, among other things, on factors inherent to the rheometer in use. These include calibration, geometric shape, the spacing between the inner and outer cylinders, and the choice between vane and coaxial geometries, as well as alternative geometries such as cone systems. Understanding the effect of these rheometer-specific variables is important for the correct characterization of cement-based materials8,9.

Several factors affect the measurement of rheological properties, and these are often associated with the design and capabilities of the rheometer used. Some of the important considerations that can affect the reliability and accuracy of rheological measurements include:

The importance of calibration in ensuring accurate rheological measurements

For rheological measurements to be reliable, calibration is essential. Calibration essentially ensures that the rheometer’s sensors and software are correctly correlated so that data are acquired properly. The process can be affected by factors such as sensor resolution, long-term stability, and the calibration methodology used. In practice, differences in the calibration procedures and standards applied may lead to inconsistencies in the measured rheological properties among different rheometers. Therefore, standard calibration procedures are crucial for minimizing inconsistencies and maximizing the comparability of measurements.

Impact of rheometer geometric design on rheological measurements

Rheological measurements are strongly influenced by the geometric configuration of a rheometer, including the inner and outer cylindrical arrangement and other structural elements. The height, shape, and radius of these components are important factors to consider. Geometrical variations in these factors can affect the shear rate, shear stress, and flow behavior of the examined materials. The gap between the inner and outer cylinders, or the equivalent spacing in other configurations, is particularly crucial because it affects the velocity gradient, which in turn affects the measured rheological properties. A larger gap, for instance, leads to lower shear stress and possible underestimation of the yield stress (YS). Therefore, it is crucial to standardize gap dimensions and their reporting to ensure consistency and comparability of rheological data among different rheometers.

Influence of inner cylinder geometry on rheological measurements: Vane vs. coaxial configurations

Vane and coaxial geometries are the principal options for the inner cylinder, and since each has unique advantages and disadvantages, the most suitable one must be chosen for the application. Vane configurations, attached to the inner cylinder, provide high torque sensitivity and are well suited for measuring yield stress (YS) and plastic viscosity (PV). However, with vane designs the flow near the rheometer walls may be nonuniform, which can limit their accuracy, particularly for materials containing large particles. Coaxial geometries, on the other hand, guarantee more reliable flow conditions by maintaining a consistent gap between the inner and outer cylinders. Compared with vane geometries, this design usually offers lower torque sensitivity, but it is well suited for evaluating the rheological characteristics of highly flowable materials. For more specific applications there are other arrangements, such as helical screws or cone systems, but they typically introduce complications that make it more difficult to determine fundamental rheological characteristics; some results may require specific calibration procedures or be reported in relative units. When selecting a geometry, one must consider the characteristics of the material and the rheological parameters required, always weighing the advantages and disadvantages of the various configurations.

To enable precise and reliable measurement of rheological characteristics in cement-based materials, considerations pertaining to rheometer calibration, geometry, and inner cylinder configuration must be made. The comparability of data between various rheometers will be improved by standardizing calibration procedures with accurate reporting of geometric characteristics, enabling insightful comparisons and a deeper comprehension of the behavior of such materials.

Rheological property prediction

Accurate predictions allow mix designs to be optimized while minimizing costs and achieving the desired performance. Traditional prediction methods are based on empirical relations extracted from extensive laboratory testing. However, the complex and nonlinear factors governing the behavior of self-consolidating concrete demand more advanced predictive methods. This is where ML comes into play, with high potential because it can uncover complex relationships hidden in large datasets. Although promising, few studies have investigated the use of ML to predict rheological parameters.

To predict the interface rheological characteristics of fresh concrete, a combined ML technique integrating LSSVM with PSO was employed. This work used 142 experimental designs in conjunction with a tribometer for recording the rheological properties. Furthermore, the fresh properties of the concrete mixtures were not reported. Although the PSO-LSSVM model showed strong predictability, one of its major drawbacks was the limited amount of available experimental data. The authors suggested that better generalizability of the model would be achieved through the accumulation of more experimental data10.

A comparative study was carried out with evolutionary artificial intelligence techniques using Decision Tree (DT) and Bagging Regressor (BR) models to predict the rheological properties of fresh concrete9. The use of 140 experimental points could cause some inconsistencies, as the experiments were performed using different instruments to determine the properties. The input variables used included cement, water, FA, CA, and total powder. The authors reported that predicting rheological properties using ML algorithms is highly promising. Of the BR and DT methods tested, BR outperformed DT in terms of predictive accuracy. The statistical sensitivity analysis showed that the major variables influencing PV predictions were related to cement and medium-to-coarse aggregates, whereas for yield stress the greatest impact was identified for small and medium coarse gravels9,10,11. In subsequent work, the same database was used and similar limitations to those noticed in the earlier investigation were observed12.

Several machine learning techniques, such as MLR, RF, DT, and SVM, have been used to predict SCC’s rheological properties10. The 100 SCC mixture results were taken from a prior study. However, it is unclear how the 100 mixtures were gathered, since the referenced study only worked with 59 self-consolidating concrete mixtures unless mortar mixtures were included. The slump flow (mm), V-funnel time (s), and H2/H1 values were the input parameters used in that study. It was determined that the DT algorithm was the most effective for predicting the yield stress of SCC, while the RF model proved to be the most accurate in predicting the plastic viscosity13,14,15.

A database containing data from several studies examining 170 SCC mixtures was used. A major weakness is that the database is not uniform, as data were aggregated from sources with different rheological measurement methods. One study addressed mortar mixtures, which further adds to the heterogeneity. Four advanced machine learning techniques were used, namely LightGBM, XGBoost, RF, and CatBoost. Using multiple algorithms helps to reduce some of the limitations imposed by inconsistent sources, but the findings should be analyzed critically given the inconsistencies reported alongside a relatively high R213,15,16,17,18,19.

A total of 348 self-consolidating concrete mixtures were analyzed from 19 published articles using various rheometers. Two methods, SHAP and PDP, were used to investigate how the different SCC compositions influence the rheological properties of SCC. The model presented very good accuracy for prediction of the rheological characteristics of SCC, with R² values ranging between 0.93 and 0.98 and an Index of Agreement between 0.92 and 0.99. The investigation using both SHAP and PDP indicated that YS and plastic viscosity (PV) were inversely proportional to slump flow (mm), ratio, and segregation rate, whereas a direct proportionality was observed between those factors and the V-funnel time (s)14,15,18,21,22,23,24,25,26,27,28,29,30,31,32,33,34.

Several gaps are evident in the existing research. The main one concerns inconsistent source data, since most of the work used to compile these datasets has come from different groups that sometimes used different measurement techniques and instruments. This causes inconsistencies and thus reduces the generalizability of the results. Secondly, small sample sizes limit the applicability of the findings; larger datasets are required to ensure improved reliability in the conclusions drawn. There is also high measurement error, particularly with tribometers, which can yield unusually high YS and PV values whose accuracy is therefore doubtful. In addition, the unclear origin of some data could undermine the integrity of the results obtained15. Some studies have already compared different methods, but it would be valuable to conduct a more comprehensive analysis that compares machine learning techniques with conventional statistical techniques and hybrid methods using reliable datasets. More detailed sensitivity analyses would help in better understanding the reliability of the models when applied to various types of concrete and under other conditions. The present study addresses these issues to provide more precise, trustworthy, and broadly applicable findings in the prediction of the rheological properties of self-consolidating concrete.

Research significance

Various studies have shown that ML models consistently demonstrate strong predictive ability for the rheological properties of self-compacting concrete, confirming the viability of advanced computational methods in this field. The effectiveness of the adopted predictive models can vary depending on the selected algorithm: BR outperformed the DT model in one study, while XGBoost performed exceptionally well in another10,16. Moreover, input features such as cement type and aggregate characteristics can affect models differently, and understanding which features have the greatest impact on the outcome can guide the design of future experiments. This research marks a shift in the approach to rheological predictions through the use of five ML algorithms. The article aims to generate more precise, efficient, and holistic predictions that will enhance the modeling and application of self-consolidating concrete. The database created here, drawn from various concrete studies and various rheometers, has considerable potential beyond its academic interest: it could reshape industry practices and standards, making SCC in construction a source of innovative and optimized change.

Methodology

Figure 1 illustrates the overall methodology adopted for predicting the fresh properties of self-consolidating concrete (SCC) using machine learning techniques. The process begins with data collection, after which the finalized data were divided into an 85% training set and a 15% test set, followed by a thorough data analysis phase, which includes both descriptive analysis to understand the distribution of variables and a correlation matrix to examine inter-variable relationships. The insights from this analysis guide the model development stage, where five machine learning models, Support Vector Machine (SVM), Decision Tree (DT), Gene Expression Programming (GEP), Deep Neural Network (DNN), and Random Forest (RF), are trained on the input data. These models are then assessed using various performance metrics, such as R2, Relative Absolute Error (RAE), Mean Percentage Error (MPE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The final output is a comparative analysis of model performance, providing results that inform optimal model selection for accurate and efficient SCC property prediction.

Fig. 1
figure 1

Methodology flowchart.

Data collection and cleaning

To build proper prediction models with machine learning algorithms, data collection is the first step. The collection and generation of reliable data is the most tedious task in any machine learning study75. A total of 2500 SCC mixes were gathered from 176 published studies between 2014 and 2024 to create a complete dataset35,63. These 2500 raw data points were initially collected from the published literature, online databases, and experimental sources. Several filtering procedures were carried out to ensure data quality and consistency. Physically unrealistic entries (e.g., water < 50 kg/m3) were dropped, and entries from 154 published studies were removed. This was followed by the deletion of 197 records with more than 20% of the key mix parameters missing, while records with only minor gaps were imputed with the median of the corresponding variable. Outliers were detected using the Interquartile Range (IQR) technique, removing values beyond 1.5 × IQR from the first (Q1) or third quartile (Q3). Special attention was paid to highly skewed variables, i.e., admixture and TP, pruning 281 outlier records. Duplicates and entries in non-standard units were also standardized. After this cleaning sequence, the final dataset used for modeling consisted of 348 highly compatible mix records and was statistically sound and free of missing and extreme values.
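A minimal sketch of the IQR-based cleaning step described above is given below, assuming a pandas DataFrame df with hypothetical column names such as "Water", "Admixture", and "TP"; the exact schema and thresholds other than 1.5 × IQR are assumptions, not the authors' code.

```python
# Sketch of the IQR outlier filter and basic cleaning, assuming a DataFrame `df`
# with illustrative column names (not the paper's exact schema).
import pandas as pd

def iqr_filter(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose value in any given column lies beyond k*IQR from Q1/Q3."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

# Example pipeline (illustrative only):
# df = df[df["Water"] >= 50]                      # drop physically unrealistic entries
# df = df.fillna(df.median(numeric_only=True))    # median-impute minor gaps
# df = iqr_filter(df, ["Admixture", "TP"])        # prune highly skewed variables
```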

Of the 176 identified studies, 34 were screened at full text, 19 met the inclusion criteria, and 348 unique SCC mixes remained after unit harmonization and duplicate elimination. Figure 2 shows the PRISMA flow diagram for the inclusion and exclusion of the 19 studies listed in Table 2. Rheological characteristics obtained only through empirical testing were not included. Inconsistent entries were identified through data cleaning and removed to ensure dataset integrity. Missing mix components were handled using mean imputation where appropriate. Outliers were detected using the interquartile range (IQR) method and were either removed or capped based on their influence on model performance. The rheological data from each rheometer were normalized using min-max scaling to reduce inter-device variability. Additionally, the rheometer type was included as a categorical covariate in the model to account for instrument-specific effects.
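A brief sketch of the per-rheometer min-max normalization and the categorical covariate described above follows; df, the "Rheometer" column, and the target names are assumed placeholders.

```python
# Per-rheometer min-max scaling and one-hot encoding of the rheometer type;
# column names are hypothetical.
import pandas as pd

targets = ["SlumpFlow_mm", "VFunnel_s"]  # assumed target column names

# Min-max scale the rheological outputs within each rheometer group.
df[targets] = df.groupby("Rheometer")[targets].transform(
    lambda s: (s - s.min()) / (s.max() - s.min())
)

# Encode the rheometer type as a categorical covariate via one-hot columns.
df = pd.get_dummies(df, columns=["Rheometer"], prefix="rheo")
```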

Fig. 2
figure 2

PRISMA flow diagram of study selection.

Table 2 Properties and references of the developed database.

Descriptive analysis

Rigorous data standardization and normalization procedures ensured that all features were on a consistent scale79. Important mixture design factors, such as the ingredients’ specific gravity, were recorded to calculate derived quantities like paste volume. Rheological and other important fresh properties were also collected. Table 3 and Fig. 3 summarize these mixing parameters, displaying the mean, standard deviation, range, minimum, maximum, skewness, and kurtosis values for each parameter.

Table 3 Descriptive analysis.
Fig. 3
figure 3

Variation and distribution of data. The dot represents the mean in each box.

Correlation matrix and data distribution for input parameters

Examining the linear and non-linear correlations between features and outputs is crucial for understanding the interplay within the dataset76. The Pearson correlation matrix provides valuable insights into the linear relationships between the various parameters in the dataset74. The two heatmaps in Fig. 4a and b illustrate the correlation relationships among features in the dataset. In Fig. 4a, cement and TP exhibit a moderately strong positive correlation (r = 0.53), while water and CA show a negative correlation (r = -0.31), indicating an inverse relationship. V-funnel time (s) has weak correlations (r < 0.2) with most features, suggesting limited interdependence. In Fig. 4b, cement and TP maintain a positive correlation (r = 0.45), consistent with Fig. 4a, and water and CA again show a negative correlation (r = -0.31). Slump flow (mm) has weak correlations, such as with MGS (r = 0.07), reflecting minimal dependency. Across both heatmaps, pairwise correlations remained below 0.80, suggesting limited risk of severe multicollinearity, although residual multicollinearity among interacting predictors cannot be fully excluded. Cement and TP emerge as influential variables, while V-funnel time (s) and slump flow (mm) exhibit weaker associations within the dataset.

Multicollinearity and interdependencies among input features can present challenges in ML modeling. When two or more features are strongly correlated, it becomes difficult for the model to disentangle their individual impacts, potentially leading to overfitting and reduced generalizability. Such issues undermine model accuracy and interpretability, making it harder to assess the unique contribution of each feature. To mitigate these challenges, it is generally recommended that correlations between features remain below 0.80. In this study, the correlations between all feature pairs are below this threshold, minimizing the risk of multicollinearity and enhancing the robustness and interpretability of the models23,25,26,40,41.
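As a minimal sketch of the screening described above, the snippet below computes the Pearson matrix and flags any feature pair whose correlation exceeds 0.80; df and the feature names are assumptions carried over from the earlier cleaning sketch.

```python
# Compute the Pearson correlation matrix and flag pairs above |r| = 0.80.
import pandas as pd

features = ["Cement", "TP", "FA", "CA", "Water", "Adm", "MGS"]  # assumed names
corr = df[features].corr(method="pearson")

high_pairs = [
    (a, b, round(corr.loc[a, b], 2))
    for i, a in enumerate(features)
    for b in features[i + 1:]
    if abs(corr.loc[a, b]) >= 0.80
]
print(high_pairs or "No feature pair exceeds |r| = 0.80")
```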

The scatterplot matrices visualize pairwise relationships among variables such as cement (kg/m3), total powder (TP) (kg/m3), fine aggregate (FA) (kg/m3), coarse aggregate (CA) (kg/m3), water (kg/m3), admixture (Adm, %), and maximum grain size (MGS) (mm), and their influence on response variables such as V-funnel time (s) (Fig. 5a) and slump flow (mm) (Fig. 5b). The diagonal plots display the distribution of each variable, while the off-diagonal scatterplots show how each pair of variables is related. Linear trends in the plots indicate strong correlations, while scattered points suggest weak or no relationships. These matrices are valuable for identifying correlations, detecting outliers, and selecting key variables that influence the response variables, providing a quick and comprehensive overview of the dataset’s structure.
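A scatterplot matrix of this kind can be produced, for example, with seaborn's pairplot; the sketch below assumes the same hypothetical column names as before and is not the authors' plotting code.

```python
# Pairwise scatterplot matrix of mix-design variables and one response variable.
import seaborn as sns
import matplotlib.pyplot as plt

cols = ["Cement", "TP", "FA", "CA", "Water", "Adm", "MGS", "VFunnel_s"]  # assumed
sns.pairplot(df[cols], diag_kind="hist", plot_kws={"s": 12, "alpha": 0.6})
plt.show()
```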

Fig. 4
figure 4

a Heatmap of V-funnel time (s) and b Heatmap of Slump flow (mm).

Fig. 5
figure 5

a Scatter plot for V-funnel time (s) and b scatter plot of Slump flow (mm).

Model development

The predictive models for the rheological properties of SCC are developed in this research using GEP, DNN, DT, SVM, and RF. These methods offer distinct strengths for analyzing intricate, non-linear relationships between input parameters and output properties.

Gene expression programming (GEP)

GEP was used because it combines the strengths of genetic algorithms and symbolic regression to identify explicit mathematical expressions representing the relationship between the mixture design parameters and the target rheological properties. The dataset was split into 85% for training and 15% for testing, and the Gene Expression Programming (GEP) model was created with the function set {+, −, ×, ÷, √, log, exp}. The linking function was addition (+), and the training process was stopped when there was no significant improvement (i.e., less than a 0.001 reduction in error) over 50 consecutive generations or when training reached 1000 generations. The numbers of chromosomes and genes in the current study were determined using a trial-and-error approach and suggestions from earlier research65,66. These parameters were adjusted until the highest model accuracy was achieved. A schematic sketch of the GEP model is shown in Fig. 6.
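The paper does not name its GEP software; as a rough, hedged stand-in, the sketch below uses gplearn's symbolic regression with a comparable function set and stopping budget. X_train, y_train, and the population size are assumptions, not reported settings.

```python
# Symbolic-regression sketch approximating the GEP setup (gplearn stand-in).
from gplearn.genetic import SymbolicRegressor

gep_like = SymbolicRegressor(
    population_size=500,                                # assumed; tuned by trial and error
    generations=1000,                                   # upper limit on generations
    stopping_criteria=0.001,                            # stop once error improvement is negligible
    function_set=("add", "sub", "mul", "div", "sqrt", "log"),
    parsimony_coefficient=0.001,                        # discourages overly long expressions
    random_state=42,
)
gep_like.fit(X_train, y_train)
print(gep_like._program)                                # the evolved closed-form expression
```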

Fig. 6
figure 6

Gene expression programming representation.

Key GEP parameters, such as gene length, number of chromosomes, and mutation rates, were optimized to maximize predictive performance for both slump flow (mm) and V-funnel time (s). The algorithm was run on a preprocessed dataset with outliers removed and missing values imputed, supporting robust predictions.

Deep neural networks (DNN)

The DNN model, capable of capturing high-level abstractions from large datasets, was built from multiple layers of interconnected neurons with nonlinear activation functions. The number of hidden layers and neurons was optimized through trial and error to achieve maximum accuracy67. The architecture was as follows:

The Deep Neural Network (DNN) consisted of three hidden layers with 128, 64, and 32 units, ReLU activation, a dropout rate of 0.2, and a final linear output layer. The model was trained in batches of 32 for up to 500 epochs using the Adam optimizer (learning rate = 0.001) with a learning-rate reduction schedule, and early stopping with a patience of 30 epochs was used to prevent overfitting. The random seed was fixed (42) for reproducibility, and hyperparameter optimization was performed on a fixed validation set that was fully independent of the test set. The DNN was trained with the Adam optimizer and mean squared error (MSE) as the loss function; early stopping and dropout help avoid overfitting. The dataset was split into 85% for training and 15% for testing, and other hyperparameters, such as the number of neurons per layer, batch size, and learning rate, were tuned using grid search. A schematic sketch of the DNN is shown in Fig. 7.
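A hedged Keras sketch of this architecture is shown below (128/64/32 ReLU units, dropout 0.2, Adam at lr = 0.001, early stopping with patience 30, seed 42); the framework choice, n_features, and the learning-rate schedule factor are assumptions.

```python
# Keras sketch of the DNN described above; exact framework/callbacks are assumed.
import tensorflow as tf

tf.keras.utils.set_random_seed(42)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),      # n_features: number of mix-design inputs
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="linear"),   # single rheological target
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")

callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=30, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=10),  # assumed schedule
]
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=500, batch_size=32, callbacks=callbacks, verbose=0)
```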

Fig. 7
figure 7

Schematic sketch of DNN Model.

Decision trees

Decision Trees are among the most popular machine learning models for classification and regression tasks. They operate by splitting the dataset into subsets based on feature values in a tree-like structure. Each internal node represents a decision based on a specific feature, and each leaf node corresponds to an outcome or prediction. Decision Trees can be easily interpreted and visualized, which aids the understanding of complex datasets. Nonetheless, they are prone to overfitting, especially as the tree becomes very deep, in which case generalization can suffer.

Techniques such as pruning or setting a maximum depth must commonly be applied to control overfitting. Although simple to use, Decision Trees do not necessarily outperform more complex models on problems with high-dimensional or noisy data. Nevertheless, they form the basis of many advanced models, such as Random Forests and Gradient Boosted Trees, which boost accuracy and robustness by combining multiple trees. They can handle both categorical and numerical data, making them versatile across applications. The dataset was split into 85% for training and 15% for testing. A schematic sketch of the DT model is shown in Fig. 8.
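A minimal sketch of a depth-limited regression tree follows; the 85/15 split matches the text, while the max_depth and min_samples_leaf values are assumed pruning controls rather than reported settings.

```python
# Depth-limited regression tree for one rheological target.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

dt = DecisionTreeRegressor(max_depth=8, min_samples_leaf=3, random_state=42)  # assumed values
dt.fit(X_train, y_train)
print("Test R^2:", dt.score(X_test, y_test))
```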

Fig. 8
figure 8

Schematic sketch of DT model.

Support vector machine (SVM)

A support vector machine (SVM) is a supervised machine learning algorithm designed for classification and regression tasks73. SVMs work by finding an optimal hyperplane that separates data points from different classes while maximizing the margin to the support vectors, the data points closest to the boundary. SVMs are very effective in handling high-dimensional data and can model nonlinear relationships using kernel functions such as polynomial, radial basis function (RBF), or sigmoid kernels.

Although SVMs are powerful tools, they tend to be computationally expensive, especially for large datasets. Their performance depends strongly on the choice of kernel and hyperparameters, including the regularization parameter (C) and the kernel parameters. Despite these challenges, SVMs perform well in applications that require high precision, such as text categorization, bioinformatics, and image recognition. The dataset was split into 85% for training and 15% for testing. A schematic sketch of the SVM model is shown in Fig. 9.
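The sketch below illustrates support vector regression with an RBF kernel and a small grid search over C and gamma; the scaler and the grid values are assumptions, not the paper's reported hyperparameters.

```python
# SVR with an RBF kernel and a small, illustrative hyperparameter grid.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
param_grid = {"svr__C": [1, 10, 100], "svr__gamma": ["scale", 0.01, 0.1]}  # assumed grid

search = GridSearchCV(svr, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)
print(search.best_params_, "test R^2:", search.score(X_test, y_test))
```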

Fig. 9
figure 9

Schematic sketch of SVM model.

Random forest (RF)

Random Forest is an ensemble learning method that combines the predictions of multiple Decision Trees so that the model predicts more accurately and avoids overfitting. Each tree in a Random Forest is trained on a random subset of the data and features, so the ensemble of trees is diverse. A Random Forest aggregates the individual predictions of all the trees using majority voting for classification or averaging for regression. It provides a balance between bias and variance and can effectively handle large datasets and high-dimensional data. However, compared with individual decision trees, it can be less interpretable and computationally more costly53.

Random Forests are versatile models that can be used effectively for both classification and regression problems without severe overfitting during training, because randomness is built into the method. Although the computational requirements may increase for very large feature spaces, they remain useful, especially for problems of moderate complexity such as medical diagnostics, financial modeling, or certain environmental science applications. The dataset was split into 85% for training and 15% for testing. A schematic sketch of the RF model is shown in Fig. 10.
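A hedged Random Forest sketch is given below; the number of trees and other hyperparameters are assumptions, and feature_names is an assumed list of the input column names.

```python
# Random Forest regressor with assumed hyperparameters and a quick feature ranking.
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=300, max_features="sqrt",
                           min_samples_leaf=2, random_state=42, n_jobs=-1)
rf.fit(X_train, y_train)
print("Test R^2:", rf.score(X_test, y_test))

# Impurity-based feature importances (complementary to the SHAP analysis later).
for name, imp in zip(feature_names, rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```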

Fig. 10
figure 10

Schematic sketch of RF model.

Performance metrics

The accuracy of the GEP and DNN models was assessed using the R², Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) criteria shown in Table 4. These metrics measure how well the applied model generalizes and how reliable its predictions are. The comparison between models showed that they achieved very high prediction performance.

Performance assessment

To ensure that robust models are created that forecast the output with low error, it is important to evaluate the estimation capacity of machine learning models using a variety of statistical metrics63. To evaluate the effectiveness of the models in both the training and testing stages, this work employs several of the most popular error metrics. Table 4 lists the applied error metrics, their mathematical formulas, their relevance, and the ideal values at which models should be accepted.
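The sketch below computes the metrics named earlier (R², RMSE, MAE, RAE, MPE) using common textbook definitions; these definitions are assumptions and may differ in detail from the formulas in Table 4.

```python
# Error metrics for a fitted model; RAE and MPE use common definitions.
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

def evaluate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return {
        "R2":   r2_score(y_true, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "MAE":  mean_absolute_error(y_true, y_pred),
        "RAE":  np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true - y_true.mean())),
        "MPE":  np.mean((y_true - y_pred) / y_true) * 100,   # targets are strictly positive here
    }

# Example: evaluate(y_test, rf.predict(X_test))
```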

Table 4 Error evaluation criteria.

Results and discussions

Regression slope analysis

The regression plots for training and testing are shown in Figs. 11 and 12, where the estimated data are shown on the y-axis and the actual records on the x-axis. The more closely the data follow the line, demonstrating a high degree of agreement between the actual and predicted values, the better the model performs. Furthermore, there is excellent alignment between the model’s predictions and the actual values when the regression slope (RS) is greater than 0.868,69. As seen in Fig. 11, all the generated V-funnel time (s) models showed acceptable regression slopes. The high prediction accuracy of the proposed ML models was demonstrated by the training regression slopes for GEP and DNN, which were both above the suggested cutoff of 0.80. The GEP and DNN models demonstrated a strong ability to predict this fresh property, with exceptional R-squared values of 0.957 and 0.950, respectively, on the training dataset.

In comparison, the Decision Tree model exhibits significantly reduced performance, with a lower R-squared of 0.818. The SVM and RF models are the least effective on this dataset, with predicted values deviating most from the actual values and R-squared values of 0.781 and 0.643, respectively. Overall, the GEP and DNN models are decisively better, whereas the SVM and RF models are considerably less stable.

Fig. 11
figure 11

Observed vs. predicted V-Funnel time (s) using Linear Regression for training and testing datasets.

Similarly, Fig. 12 indicates the predictive performance of the Gene Expression Programming (GEP), Deep Neural Network (DNN), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF) models for slump flow (mm); the slump prediction range is 426.136–890 mm for the training dataset and 380–810 mm for testing. Both the GEP and DNN models exhibit high accuracy, predicting values close to the real ones, as shown by high training R-squared values of 0.915 and 0.911 and testing values of 0.901 and 0.894, respectively. In comparison, the Decision Tree model exhibits reduced performance, with a lower R-squared of 0.885. The SVM and RF models are the least effective on this dataset, with predicted values deviating most from the actual values and R-squared values of 0.599 and 0.435, respectively. Overall, the GEP and DNN models are decisively better, whereas the SVM and RF models are considerably less stable.

Fig. 12
figure 12

Observed vs. predicted slump flow (mm) using Linear Regression for training and testing datasets.

Residual analysis

Residual analysis for slump flow (mm)

Figure 13 shows residual plots of the slump predictions from the five models, i.e., GEP, DNN, DT, SVM, and RF, which help depict the distribution and normality of the residuals and thus assess model performance. The GEP and DNN models perform excellently, with Q-Q plots close to the diagonal line and residual histograms that are symmetric and sharply peaked with the mode at zero, indicating normally distributed residuals and minimal errors. Random Forest (RF) is also satisfactory, exhibiting only slight deviations in the tails. Conversely, the Q-Q plots of DT and SVM deviate more from normality, and their histograms are wider and slightly skewed, meaning that the predictions are less consistent and the residual variance is greater. In general, GEP and DNN are the most successful models for predicting slump based on the residual behavior, with RF a close second.

Fig. 13
figure 13

Residual analysis for slump.

Residual analysis for V-Funnel

Figure 14 presents the residual analysis of the five models, GEP, DNN, DT, SVM, and RF, in predicting V-funnel time (s). The Q-Q plots and residual histograms are used to determine the distribution of the prediction errors and their dispersion for each model. The Q-Q plots of GEP and DNN closely follow the reference line, showing that their residuals are approximately normally distributed, and their histograms are sharp and well centered on zero, implying high prediction accuracy and low error variability. By contrast, DT and SVM show larger deviations in the Q-Q plots and broader, more scattered histograms, indicating non-normal residuals and less reliable predictions. RF exhibits moderate ability; its residual patterns are better than those of SVM and DT but not as consistent as those of GEP and DNN. All in all, GEP and DNN perform better in modeling V-funnel time (s), as can be seen from the residual behavior.

Fig. 14
figure 14

Residual analysis for V-Funnel.

Model performance using Taylor diagram

The Taylor diagram provides a comprehensive graphical representation of model performance by simultaneously displaying the correlation coefficient, standard deviation, and centered root mean square error (RMSE) between predicted and actual values. In this study, Taylor diagrams were used to compare the predictive accuracy of five models (GEP, DNN, DT, SVM, and RF) for slump flow (mm) and V-funnel time (s). Models positioned closer to the red reference point indicate higher correlation, better match in standard deviation, and lower error. For both properties, GEP and DNN models showed the best agreement with experimental values, while RF and SVM exhibited relatively lower performance, as indicated by their greater distance from the reference point.

Figure 15a shows that the GEP and DNN models exhibited better accuracy for slump flow (mm) prediction, and Fig. 15b shows that they also demonstrated high accuracy for V-funnel time (s) prediction. Overall, the GEP and DNN models showed better accuracy for both V-funnel time and slump flow (mm).

Fig. 15
figure 15

Taylor diagrams comparing the predictive performance for (a) slump flow (mm) and (b) V-funnel time (s).

Statistical assessment of the models

Low mean squared error (MSE), Root Mean squared error (RMSE), Mean absolute error (MAE), and mean absolute percentage error (MAPE) values underscore the model’s proficiency in minimizing prediction errors77.

Table 5 summarizes the predictive performance of all models for V-funnel time (s) using grouped cross-validation, with studies and rheometers defining the groups so that there was no overlap between training and test folds. Results over the grouped splits are reported as the mean and standard deviation of repeated runs. In summary, GEP and DNN consistently outperformed the conventional machine learning baselines, with R2 values of 0.957 and 0.950, respectively.
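A minimal sketch of such grouped cross-validation is shown below, assuming a groups array that encodes the source study/rheometer of each mixture; the fold count and the estimator used here are assumptions.

```python
# Grouped cross-validation so that mixtures from the same study/rheometer
# never appear in both the training and test folds.
from sklearn.model_selection import GroupKFold, cross_val_score

gkf = GroupKFold(n_splits=5)                        # number of folds assumed
scores = cross_val_score(rf, X, y, cv=gkf, groups=groups, scoring="r2")
print(f"Grouped CV R^2: {scores.mean():.3f} ± {scores.std():.3f}")
```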

Table 5 Error metrics for V funnel.

Table 6 presents the performance of the models in predicting slump flow (mm), evaluated with grouped cross-validation to prevent train–test overlap at the study/rheometer level. GEP and DNN clearly provided the most reliable predictions, achieving R² = 0.915 ± 0.04 and 0.911 ± 0.05, respectively.

Table 6 Error metrics for slump.

Explainable artificial intelligence (XAI)

Explaining machine learning (ML) predictions requires mathematical computations, supporting hypotheses, and an understanding of the underlying mechanisms71. Interpretability approaches are commonly employed to make ML models more accessible to non-technical audiences. In this paper we used SHAP and PDP to better interpret the model.

SHAP violin analysis

Feature attribution was performed with the TreeSHAP variant of SHAP, which gives exact Shapley values for tree-based models. Expected values were computed using the training data as the background dataset and were therefore consistent with the distribution used for model fitting. Because SHAP values may be sensitive to multicollinearity, we also assessed pairwise correlations among the input variables; in the presence of strong correlations (|r| > 0.8), SHAP rankings were reported jointly instead of assigning importance to a single feature. Finally, to ensure the robustness of the attributions, we contrasted the SHAP rankings with permutation importance and found good agreement.
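A hedged sketch of this TreeSHAP workflow follows, applied here to the Random Forest as the tree-based model with the training data as background; rf, X_train, X_test, and feature_names are assumed to come from the earlier sketches.

```python
# TreeSHAP attribution with the training set as the background dataset.
import shap

explainer = shap.TreeExplainer(rf, data=X_train)    # background = training data
shap_values = explainer.shap_values(X_test)

# Beeswarm/violin-style summary of per-feature contributions.
shap.summary_plot(shap_values, X_test, feature_names=feature_names)

# Global ranking by mean |SHAP| value per feature.
mean_abs_shap = abs(shap_values).mean(axis=0)
```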

Post-training, SHAP (SHapley Additive exPlanations) analysis determines the contribution of each feature to the predictive outcomes of the model78.

The generated models are interpreted in this work using SHAP, which offers insights into feature relevance and how each feature affects the output predictions. Multicollinearity and potentially synergistic effects among variables are satisfactorily addressed by the SHAP methodology72. The SHAP values for each parameter are displayed in Fig. 16a. With the greatest SHAP score for slump (about 10), total powder (TP) had a notable effect on slump flow (mm), followed by MGS, FA, cement, Adm, water, and CA. Water had the greatest mean absolute SHAP score in the V-funnel modeling, followed by cement content, FA, CA, TP, and MGS (Fig. 16b). As seen in Fig. 17a and b, the influence of the selected factors on the model output was examined using the SHAP violin summary plots, which display the SHAP violin score of each parameter. The output target is positively impacted by a feature when its SHAP value is positive, and higher values signify a stronger influence. It is observable that slump is negatively impacted by high TP. Table 7 presents the global mean absolute SHAP values for each target, in descending order of feature contribution to the model predictions. For slump flow (mm), total powder and maximum grain size were the predominant variables, whereas for V-funnel time (s), water and fine aggregate were the predominant variables. This demonstrates that different rheological properties are controlled by different mix design parameters, with powder-related properties affecting one target and water-binder ratios affecting the other.

Table 7 Global mean |SHAP| feature importance ranking per target.
Fig. 16
figure 16

SHAP Analysis (a) For Slump (b) For V-funnel time (s).

Fig. 17
figure 17

SHAP violin plot: (a) V-funnel time (s), (b) slump flow (mm).

PDP explanation

Figures 18 and 19 show the PDP plots for slump flow (mm) and V-funnel time (s), respectively. For interpretation, the partial dependence plots were trimmed to realistic mix-design domains. The partial dependence plots for slump flow show interpretable feature-response trends. Cement content (230–550 kg/m3) increases slump flow up to about 450 kg/m3 (about +60 mm), after which the effect becomes constant. Total powder (350–650 kg/m3) exhibits a non-monotonic behavior: slump flow decreases by about 40 mm up to 450 kg/m3 and then increases again by approximately 50 mm towards 600 kg/m3, probably due to interactions with the admixture dosage. Coarse aggregate (550–950 kg/m3) has a moderate effect (about ±20 mm). Water content has a very significant influence on slump flow, producing an increase of about 50 mm between 160 and 200 kg/m3 (with the other parameters held at their medians). Across all PDPs, 95% CI bands and data-density rug plots are presented, and regions of non-monotonicity are explicitly attributed to feature interactions (e.g., high TP with low admixture) or to data sparsity rather than model instability.
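Partial dependence curves of this kind can be generated, for instance, with scikit-learn's PartialDependenceDisplay on the fitted model, as in the sketch below; the model (rf), X_train as a DataFrame, and the feature names are assumptions.

```python
# Average partial dependence of the prediction on selected mix-design features.
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

pdp_features = ["Cement", "TP", "CA", "Water"]      # assumed column names
PartialDependenceDisplay.from_estimator(rf, X_train, features=pdp_features, kind="average")
plt.tight_layout()
plt.show()
```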

Figure 19 shows the partial dependence plots for the V-funnel, which reveal clear quantitative patterns. Increasing cement from 200 to 550 kg/m3 increases the V-funnel time (s) by approximately +5 s. The non-linear effect is strongest for coarse aggregate: V-funnel time decreases by about 4 s between 550 and 750 kg/m3 and then increases again by about 5 s up to 950 kg/m3. A U-shaped relation was observed for water, with V-funnel time decreasing by about 6 s up to a W/P ratio of 1.0 before rising by about 8 s as W/P neared 3.5. By comparison, total powder causes less than 1 s of variation, and maximum grain size only affects V-funnel time up to 0.8 mm (approximately −5 s), after which the effect becomes constant.

Fig. 18
figure 18figure 18

PDP Plot for Slump flow (mm).

Fig. 19
figure 19figure 19

PDP Plot for V Funnel time (s).

Conclusion

This study showed how state-of-the-art machine learning models can be used to predict the fresh properties of self-consolidating concrete (SCC). Five ML algorithms, GEP, DNN, DT, SVM, and RF, were compared based on their ability to predict slump flow and V-funnel time using a strictly filtered dataset of 348 SCC mixtures. Of these, GEP and DNN performed better than the other models, with higher R2 values and better error measures, while explainable AI techniques such as SHAP and PDP provided valuable insights into the impact of the mix design parameters. The results confirm that combining ML with explainability techniques is an effective way to achieve high prediction accuracy, reduce dependence on tedious laboratory testing, and support the optimization of SCC mix design in practice.

  • GEP and Deep Neural Networks (DNN) performed best, achieving R2 values of up to 0.93 and 0.82 for V-funnel time and slump flow, respectively.

  • Taylor diagrams, Q-Q plots, and residual analysis confirmed the strength of GEP and DNN, with the predictions in close agreement with the experimental findings.

  • SHAP and PDP analyses showed that different mix design factors dominate different rheological characteristics: total powder and maximum grain size have a pronounced effect on slump flow, while water and fine aggregate govern V-funnel behavior.

  • Altogether, this study highlights the promise of explainable ML models for SCC mix design optimization, enabling faster, more affordable, and more reliable construction.