Introduction

The compressive strength of concrete reinforced with basalt fiber is a topic of significant interest due to the beneficial properties basalt fiber brings to concrete structures1,2,3. Adding basalt fibers to concrete typically increases its compressive strength, particularly at low to moderate fiber content4,5,6. Basalt fibers improve the microstructure of concrete by bridging micro-cracks and reducing crack propagation, which enhances the load-carrying capacity7,8,9,10. The compressive strength gains depend on the volume fraction and length of basalt fibers added11,12,13. Generally, adding 0.1–0.5% basalt fiber by volume can improve compressive strength, but excessive fiber content may lead to fiber clustering, negatively impacting strength and workability14,15,16,17,18,19. Therefore, finding the optimal dosage is essential. Basalt fiber-reinforced concrete (BFRC) generally exhibits improved toughness and ductility20,21,22,23,24,25. While basalt fibers primarily enhance tensile and flexural strengths, their reinforcing effect also contributes to compressive load distribution, delaying failure under high loads25,26,27,28,29. Basalt fibers increase the durability of concrete by reducing permeability and enhancing resistance to freeze-thaw cycles, chemical attacks, and other environmental stresses30,31,32. This can indirectly contribute to maintaining compressive strength over time, especially in aggressive environments33,34,35. Basalt fiber is eco-friendly, derived from natural volcanic rock, and is highly resistant to heat and corrosion23,24,25,26. It is an economical alternative to synthetic fibers, especially where sustainable and durable materials are prioritized in construction36. In summary, basalt fiber reinforcement in concrete can significantly enhance compressive strength, durability, and toughness, especially at optimal fiber dosages32. 
This makes basalt fiber-reinforced concrete a valuable material in structural applications where higher strength and longevity are required21. Basalt fiber is a green, non-polluting inorganic, nonmetallic substance. Because of its high tensile mechanical qualities, it is commonly utilized in fiber-reinforced concrete composite materials to increase the tensile strength of concrete37,38,39,40.

The integration of machine learning (ML) in modeling basalt fiber reinforced concrete (BFRC) offers a modern and efficient approach to optimizing its design, predicting performance, and improving its usage in engineering applications24. BFRC is a composite material known for its enhanced mechanical properties, durability, and crack resistance30. Predicting its behavior under different conditions can be complex due to non-linear relationships between input variables (e.g., fiber content, concrete mix ratios) and output properties (e.g., compressive strength, flexural strength) and variability in material properties and environmental conditions28. ML provides a cost-effective and time-saving alternative to extensive experimental trials by leveraging historical data for predictions and insights. Machine learning offers a transformative approach to modeling BFRC3. By providing accurate predictions and optimizing performance, ML enhances sustainable construction practices and supports engineers in designing durable and efficient structures. This approach represents a significant leap towards smarter, greener, and more resilient infrastructure. Concrete performance measurements typically comprise compressive resistance1, frost resistance2, fracture resistance3, and erosion resistance4. Compressive strength is the most essential reference index in the mechanical properties of concrete, as it has a substantial impact on its safe use41,42,43,44,45,46. As a result, accurately and quickly predicting the compressive strength of basalt fiber concrete (BFRC) is critical for practical engineering applications. To fully utilize the tensile mechanical capabilities of basalt fiber, the literature by Wang et al.5, Sun et al.6 added the material to concrete and conducted extensive study on the mechanical properties of basalt concrete.

Almohammed et al.7 investigated the prediction of the compressive strength of concrete with basalt fiber reinforced polymer (BFRP) using random forest, M5P, M5P-Stochastic, artificial neural network, and random tree-based models. Cement, aggregate, water, superplasticizer, fly ash, and BFRP were among the input variables in the data sets, which were gathered from several studies. Performance assessment indicators were employed, and the best model was random forest. Almohammed and Thakur8 studied the compressive strength of basalt fiber reinforced concrete (BFRC) following a 28-day curing period. According to the results, the compressive strength of M25 drops by 9.71–18.52%. The strength of BFRC was predicted using soft computing methods such as Artificial Neural Network, Random Forest, and Stochastic Random Tree; the most efficient model was the stochastic random tree. In another study, Wei Chen et al.9 investigated the influence of basalt fibers on the mechanical characteristics of concrete by simulating cubic compression, axial compression, and crackling stress. They matched the simulation results to physical and mechanical test data from others and found that adding basalt fibers could increase concrete’s compressive capacity. Scholars have undertaken relevant studies on the appropriate proportion of basalt over the last ten years. Hematibahar et al.10 created an algorithm for estimating the compressive strength and stress-strain curve of basalt fiber high-performance concrete (BFHPC) with the aid of a logistic map and a classical programming method. The algorithm was found to be highly accurate when tested at various curing times. Its coefficient of determination (R²) of 0.96 indicates that the properties of BFHPC may be predicted with a high degree of confidence.
Almohammed et al.11 developed multiscale models for estimating the flexural strength (FS) and split tensile strength (STS) of concrete reinforced with basalt fibers that will be used in construction. The data came from academic studies and experimental analyses of concrete loaded with basalt fibers. To forecast the FS and STS, the study employed seven soft computing techniques: Random Forest, Stochastic Random Forest, Random Tree, Bagging Random Forest, and Artificial Neural Network. For FS, the Stochastic-RT model had superior predictive power, whereas for STS, the Bagging-RT model demonstrated superior accuracy. The length of the basalt fiber greatly affected FS prediction, whereas the curing period affected STS. Kavya et al.12 created an artificial neural network (ANN) model to predict the strengths of glass fiber reinforced concrete (GFRC) and BFRC at an age of 28 days using literature data. The model considers parameters such as aggregate-cement ratio, water-cement ratio, fly ash-cement ratio, fiber content, diameter, density, elastic modulus, length, and concrete strengths. Hasanzadeh et al.13 described an effective implementation of machine learning models for predicting the mechanical properties of basalt fibre-reinforced high-performance concrete (BFHPC). Three prediction algorithms were considered: linear regression (LR), support vector regression (SVR), and polynomial regression (PR). The performance of these models was assessed using the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE). The results showed that the PR approach was more accurate and reliable than the other prediction algorithms, indicating that it is suitable for assessing the mechanical features of BFHPCs.

Asghar et al.14 assessed the compressive and tensile strength of basalt fiber reinforced concrete (BFRC) using gene expression programming (GEP), Artificial Neural Network (ANN), and Extreme Gradient Boosting (XGBoost). The study analyzed literature data to validate GEP’s efficiency by comparing the coefficient of determination (R²) values of all three models. The optimal BF content for industrial-scale BF reinforcement of concrete was explored, which could provide an economical alternative for industrial manufacturing. Almohammed et al.15 investigated the predictive capacities of Artificial Neural Network (ANN), Random Forest (RF), and Random Tree (RT) models for calculating concrete compressive strength (CS), with a focus on the impacts of Basalt Fiber (BF) and Polypropylene Fiber (PPF) on CS. The RF model outperformed the other models, with the curing duration being the most sensitive input for calculating CS. Najm et al.16 examined the application of statistical and artificial intelligence (AI) approaches to analyze and forecast the effect of basalt fibers on the mechanical behavior of high-strength sustainable self-consolidating concrete. The researchers used a concrete mix design that substituted 80% of the cement with a mixture of cementitious elements such as GGBS, FA, and SF. The study found that cement was the most effective in increasing or restoring mechanical strength, followed by GGBS and basalt fibers. Despite the unclear behavior in the experimental data, the ANN tools demonstrated great prediction accuracy. This paper focuses on the complex combinatorial learning abilities of six sets of ensemble and classification machine learning models in predicting the compressive strength of basalt fiber reinforced concrete. In addition, the Hoffman and Gardener method of sensitivity analysis has been used to evaluate the impact of the individual concrete components on the compressive strength. The aim is to propose more sustainable forecasting models for the production and utilization of the concrete.

Research gap and innovative statement

The research on basalt fiber-reinforced concrete (BFRC) has demonstrated its ability to enhance compressive strength, durability, and mechanical performance. Existing studies highlight that adding basalt fibers to concrete improves its microstructure, reduces crack propagation, and optimizes load distribution. The effectiveness of BFRC in improving compressive strength depends on factors such as fiber content, length, and distribution. While moderate basalt fiber inclusion enhances strength, excessive content can lead to clustering and workability issues. Various studies have explored the optimal fiber dosage, mechanical properties, and durability aspects, confirming that basalt fiber contributes significantly to structural performance. However, the complexities of BFRC behavior require advanced predictive tools beyond conventional experimental methods. Machine learning (ML) has emerged as a powerful approach for modeling BFRC properties. Prior research has applied ML techniques such as artificial neural networks (ANN), random forest (RF), support vector regression (SVR), and polynomial regression (PR) to predict compressive strength, flexural strength, and split tensile strength. Studies have shown that ML models can effectively capture nonlinear relationships between concrete mix parameters and mechanical properties. Some studies have also developed multiscale and ensemble models for improving prediction accuracy. Despite these advancements, there remains a need for more comprehensive and robust forecasting models that integrate sensitivity analysis for optimizing BFRC design. A significant research gap exists in systematically assessing the impact of individual concrete components on BFRC compressive strength using advanced sensitivity analysis methods. Most prior studies have relied on ML models for prediction without incorporating sensitivity analysis to determine the most influential factors in BFRC performance. 
Additionally, while various ML models have been applied, no consensus exists on the most reliable approach for predicting BFRC compressive strength across different datasets and environmental conditions. Many studies focus on a limited set of ML techniques, often without comparing the effectiveness of different models under varying conditions. This research addresses these gaps by integrating machine learning with the Hoffman and Gardener sensitivity analysis method to evaluate the impact of concrete components on compressive strength. By leveraging ML ensemble techniques and classification models, this study aims to propose more sustainable and accurate forecasting models for BFRC. The novelty lies in combining machine learning with sensitivity analysis to enhance predictive accuracy and optimize BFRC mix design. This approach will provide engineers with a data-driven framework for improving BFRC’s mechanical properties, leading to more resilient and sustainable construction materials.

Methodology

Data collected and preliminary analysis

A total of three hundred and nine (309) records of compressive strength were collected from the literature47 for different mix ratios of basalt fiber concrete at different ages. Each record contains the following data: C-Cement content (kg/m³), FA-Fly ash content (kg/m³), W-Water content (kg/m³), SP-Super-plasticizer content (kg/m³), CAg-Coarse aggregates content (kg/m³), FAg-Fine aggregates content (kg/m³), Age-The concrete age at testing (days), L_bf-Length of basalt fibers (mm), d_bf-Diameter of basalt fibers (µm), V_bf-Volume content of basalt fibers (%) and Cs_bf-Compressive strength of basalt fibre concrete (MPa). The collected records were divided into a training set (249 records ≈ 80%) and a validation set (60 records ≈ 20%) in line with the requirements of Ebid et al.17. The appendix includes the complete dataset, while Table 1 summarizes its statistical characteristics. Finally, Fig. 1 shows the Pearson correlation matrix, histograms, and the relations between variables.
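The 249/60 partition described above can be sketched as follows. This is a minimal illustration only: the source does not state how records were assigned to each set, so the random shuffle, the `seed` value, and the function name `split_records` are assumptions.

```python
import random


def split_records(n_records, n_train, seed=0):
    """Randomly partition record indices into training and validation sets.

    Random assignment and the seed are assumptions; the source only gives
    the set sizes (249 training, 60 validation).
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, valid_idx = split_records(309, 249)
print(len(train_idx), len(valid_idx))  # 249 60
```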

Table 1 Statistical analysis of collected database.
Fig. 1 Correlation, Distribution and Interpreting chart.

Sensitivity analysis

Sensitivity analysis evaluates how variations in input parameters influence the predictive accuracy of models for BFRC compressive strength. It provides insights into the key factors affecting the compressive strength and helps improve the reliability and efficiency of predictive models. The aims are to determine the input factors (e.g., basalt fiber content, water-cement ratio, curing time) that significantly influence compressive strength predictions; to focus on the most sensitive parameters in order to refine and optimize BFRC formulations; to provide insights into model behavior, ensuring better transparency and usability in production; and to understand parameter interactions and reduce prediction errors under varying conditions. In this way, sensitivity analysis supports optimized model performance and efficient, sustainable concrete production by prioritizing the most impactful parameters in mix design and quality control. A preliminary sensitivity analysis was carried out on the collected database to estimate the impact of each input on the output (Y) values. The “single variable per time” technique was used to determine the “Sensitivity Index” (SI) for each input using the Hoffman & Gardener formula18 as follows:

$$SI\left(X_{n}\right)=\frac{Y\left(X_{max}\right)-Y\left(X_{min}\right)}{Y\left(X_{max}\right)}$$
(1)

A sensitivity index of 1.0 indicates complete sensitivity, whereas a sensitivity index of less than 0.01 indicates that the model is insensitive to changes in the parameter. Figure 2 shows the sensitivity analysis with respect to Cs_bf.
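As a small worked illustration of Eq. 1, the index can be computed as below. The function name and the two output values (50 MPa and 40 MPa) are hypothetical and are not taken from the collected database.

```python
def sensitivity_index(y_at_xmax, y_at_xmin):
    """Hoffman & Gardener index: (Y(X_max) - Y(X_min)) / Y(X_max)."""
    if y_at_xmax == 0:
        raise ValueError("Y(X_max) must be non-zero")
    return (y_at_xmax - y_at_xmin) / y_at_xmax


# Hypothetical outputs (MPa) at the maximum and minimum of one input,
# with all other inputs held fixed.
si = sensitivity_index(50.0, 40.0)
print(round(si, 2))  # 0.2
```

With these hypothetical values the input would count as sensitive, since 0.2 lies well above the 0.01 threshold.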

Fig. 2 Sensitivity analysis with respect to Cs_bf.

Research program

Five different ML techniques were used to predict the compressive strength of basalt fiber concrete from the collected database: “Artificial Neural Network (ANN)”, “Support Vector Machine (SVM)”, “K-Nearest Neighbors (KNN)”, “Tree Decision (Tree)” and “Random Forest (RF)”. The developed models were used to predict (Cs_bf) from the concrete mixture contents, age, and fiber dimensions. All the developed models were created using “Orange Data Mining” software version 3.36. The considered data flow diagram is shown in Fig. 3. The following section discusses the results of each model. The accuracies of the developed models were evaluated by comparing SSE, MAE, MSE, RMSE, Error %, Accuracy % and R² between predicted and calculated compressive strength values. The definitions of the measurements used are presented in Eqs. 2 to 7.

Fig. 3 The considered data flow in Orange software.

$$MAE=\frac{1}{N}\sum_{i=1}^{N}\left|y_{i}-\widehat{y_{i}}\right|$$
(2)
$$MSE=\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\widehat{y_{i}}\right)^{2}$$
(3)
$$RMSE=\sqrt{MSE}$$
(4)
$$Error\,\%=\frac{RMSE}{\widehat{y}}$$
(5)
$$Accuracy\,\%=1-Error\,\%$$
(6)
$$R^{2}=1-\frac{\sum\left(y_{i}-\widehat{y_{i}}\right)^{2}}{\sum\left(y_{i}-\bar{y}\right)^{2}}$$
(7)
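The measures in Eqs. 2 to 7 can be sketched in a few lines. Two points are assumptions rather than statements from the source: SSE is taken as the sum of squared errors, and the denominator of Eq. 5 is read as the mean observed value. The sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute SSE, MAE, MSE, RMSE, Error %, Accuracy % and R2."""
    n = len(y_true)
    residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)          # assumed definition of SSE
    mae = sum(abs(r) for r in residuals) / n     # Eq. 2
    mse = sse / n                                # Eq. 3
    rmse = math.sqrt(mse)                        # Eq. 4
    mean_true = sum(y_true) / n
    error = rmse / mean_true                     # Eq. 5 (denominator assumed)
    accuracy = 1.0 - error                       # Eq. 6
    sst = sum((yt - mean_true) ** 2 for yt in y_true)
    r2 = 1.0 - sse / sst                         # Eq. 7
    return {"SSE": sse, "MAE": mae, "MSE": mse, "RMSE": rmse,
            "Error": error, "Accuracy": accuracy, "R2": r2}


# Hypothetical measured and predicted strengths (MPa).
metrics = regression_metrics([30.0, 40.0, 50.0], [32.0, 38.0, 50.0])
print(metrics["SSE"])  # 8.0
```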

Theoretical framework of selected machine learning techniques

ANN techniques

The ANN algorithm is modeled on biological neural networks and involves a series of interconnected nodes, or neurons, which work together to learn from data and solve complex problems. Typically, the ANN model is composed of three layers, namely an input layer, hidden layer(s), and an output layer, as described by Haykin19. A typical ANN architecture is illustrated in Fig. 4.

Fig. 4 Typical ANN architecture.

The input data are denoted as a vector X = [x1, x2, …, xn], where xi are the input parameters.

The hidden layer applies an activation function to a weighted sum of the inputs. Mathematically, the output of hidden neuron j is

$$h_{j}=f\left(\sum_{i=1}^{n}w_{ij}x_{i}+b_{j}\right)$$
(8)

where wij are the weights, bj are the biases, and f() is the activation function (normally a sigmoid, tanh, or ReLU function). In the output layer, the compressive strength is predicted from a combination of the hidden-layer outputs:

$$y=f_{o}\left(\sum_{j=1}^{m}v_{j}h_{j}+b_{o}\right)$$
(9)

where vj = weights from the hidden layer to the output layer, bo = output bias, and fo() = output activation function. The ANN model is trained to minimize an error or loss function, given here by the mean squared error (MSE):

$$MSE=\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\widehat{y_{i}}\right)^{2}$$
(10)

where yi = actual compressive strength, and \(\widehat{y_{i}}\) = predicted compressive strength.
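The forward pass of Eqs. 8 and 9 can be sketched for a single hidden layer as below. The weights, the choice of tanh for the hidden activation, and the identity output activation are illustrative assumptions and do not reproduce the trained model.

```python
import math


def ann_forward(x, w, b, v, b_o):
    """One-hidden-layer forward pass.

    x   : list of n inputs
    w   : w[j][i] is the weight from input i to hidden neuron j
    b   : hidden biases, one per hidden neuron
    v   : weights from each hidden neuron to the output
    b_o : output bias
    The output activation is assumed to be the identity.
    """
    hidden = [
        math.tanh(sum(w_ji * x_i for w_ji, x_i in zip(w_j, x)) + b_j)  # Eq. 8
        for w_j, b_j in zip(w, b)
    ]
    return sum(v_j * h_j for v_j, h_j in zip(v, hidden)) + b_o        # Eq. 9


# Hypothetical network: two inputs, two hidden neurons.
y = ann_forward(x=[0.5, -0.5],
                w=[[1.0, 1.0], [0.0, 0.0]],
                b=[0.0, 0.0],
                v=[1.0, 1.0],
                b_o=0.1)
print(y)  # 0.1
```

In this toy case both hidden sums are zero, so tanh returns zero and only the output bias remains.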

k-Nearest neighbours

The k-Nearest Neighbors algorithm, also denoted as k-NN, is a non-parametric, instance-based technique that predicts the class of a query instance from the majority class among its k closest neighbors in the population; for regression targets such as compressive strength, the prediction is instead the (optionally distance-weighted) average of the neighbors’ values. Figure 5 illustrates the k-nearest neighbours approach.

Fig. 5 Illustration of the K-nearest neighbours (adapted from Musolf et al.20).

It operates by estimating the distance between the query instance and all other points in the dataset, commonly using Euclidean distance for continuous variables:

$$d\left(x,x^{\prime}\right)=\sqrt{\sum_{i=1}^{n}\left(x_{i}-x_{i}^{\prime}\right)^{2}}$$
(11)

where x and x′ are two instances in n-dimensional space.
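A distance-weighted k-NN prediction built on Eq. 11 can be sketched as follows. This is a regression variant with inverse-distance weights; the function names and the sample points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points (Eq. 11)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(query, X, y, k=1):
    """Distance-weighted k-NN regression (inverse-distance weights assumed)."""
    nearest = sorted((euclidean(query, xi), yi) for xi, yi in zip(X, y))[:k]
    exact = [yi for d, yi in nearest if d == 0.0]
    if exact:  # the query coincides with a stored record
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(wt * yi for wt, (_, yi) in zip(weights, nearest)) / sum(weights)


# Hypothetical records: two features, strength in MPa.
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
y = [10.0, 20.0, 30.0]
print(knn_predict([3.0, 4.0], X, y, k=1))  # 20.0
```

With k = 1 the weighting has no effect, because the single nearest neighbour's value is returned unchanged.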

Support vector

Support Vector Machines (SVMs) are supervised machine learning techniques mainly used for classification. An SVM finds the optimal hyperplane that maximally separates data points from different classes. Figure 6 shows a schematic of the support vector algorithm. For linearly separable data, for instance, the SVM identifies this hyperplane by maximizing the distance, or margin, between the hyperplane and the closest data points of each class, known as support vectors.

Fig. 6 Sketch of support vector algorithm (adapted from Zou et al.21).

Considering a dataset of labeled instances \((x_{i},y_{i})\), where \(x_{i}\in\mathbb{R}^{n}\) and \(y_{i}\in\{-1,1\}\), the decision boundary is the hyperplane \(w\cdot x+b=0\), where w = weight vector perpendicular to the hyperplane, and b = bias term. The optimization problem to maximize the margin is formulated as:

$$\min_{w,b}\frac{1}{2}\left\|w\right\|^{2}$$
(12)

Subject to the constraints:

$$y_{i}\left(w\cdot x_{i}+b\right)\ge 1\quad\forall i$$
(13)

In the case of non-linearly separable data, SVM applies the kernel functions to project data into a higher-dimensional space, where a linear separation is possible. Common kernels include the linear, polynomial, and radial basis function (RBF) kernels. The decision function for classification is then:

$$f\left(x\right)=\mathrm{sign}\left(\sum_{i=1}^{n}\alpha_{i}y_{i}K\left(x,x_{i}\right)+b\right)$$
(14)

where αi = Lagrange multipliers, and K(x, xi) = chosen kernel function.
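The kernel decision function of Eq. 14 can be sketched with a polynomial kernel. The kernel form (x·z + coef0)^degree is the common one; the `coef0` value, the support vectors, the multipliers, and the bias below are hypothetical. Note that this illustrates the classification form of Eq. 14, not a regression model.

```python
def polynomial_kernel(x, z, degree=4, coef0=1.0):
    """Polynomial kernel K(x, z) = (x . z + coef0) ** degree (coef0 assumed)."""
    return (sum(xi * zi for xi, zi in zip(x, z)) + coef0) ** degree


def svm_decision(x, support_vectors, alphas, labels, b, kernel=polynomial_kernel):
    """Sign of the kernel expansion in Eq. 14; a zero score is mapped to +1."""
    score = sum(a * y * kernel(x, sv)
                for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if score >= 0 else -1


# Hypothetical one-dimensional example with two support vectors.
svs = [[1.0], [-1.0]]
alphas = [0.5, 0.5]
labels = [1, -1]
print(svm_decision([2.0], svs, alphas, labels, b=0.0))   # 1
print(svm_decision([-2.0], svs, alphas, labels, b=0.0))  # -1
```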

Tree decision

Decision Trees are supervised learning algorithms used for classification and regression. They split the data recursively on feature values to create a tree structure in which internal nodes, branches, and leaf nodes represent feature tests, outcomes, and predicted values, respectively. A general layout of the tree decision approach is shown in Fig. 7.

Fig. 7 General layout of the tree decision approach.

For example, considering a dataset D with classes C, the tree grows by selecting features that maximize the information gain or minimize the impurity. Hence, the information gain IG for a split on feature X is expressed as:

$$IG\left(D,X\right)=H\left(D\right)-\sum_{v\in values\left(X\right)}\frac{\left|D_{v}\right|}{\left|D\right|}H\left(D_{v}\right)$$
(15)

where H(D) is the entropy or impurity of dataset D, and Dv​ is the subset of D for each value v of feature X.
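Eq. 15 can be illustrated with Shannon entropy as the impurity measure H. The labels and feature values below are hypothetical, and the choice of entropy (rather than another impurity measure) is an assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(D) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """IG(D, X) = H(D) - sum over v of |D_v| / |D| * H(D_v) (Eq. 15)."""
    n = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


# Hypothetical strength classes and one categorical feature.
labels = ["low", "low", "high", "high"]
print(information_gain(labels, ["a", "a", "b", "b"]))  # 1.0
```

A feature that separates the classes perfectly yields the full entropy of the dataset as gain, while an uninformative feature such as `["a", "b", "a", "b"]` yields zero.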

Random forest

The random forest algorithm is an ensemble learning approach that builds multiple decision trees for regression or classification, and it improves robustness and accuracy by reducing the overfitting of single trees. Each tree in the forest is trained on a different bootstrap sample of the dataset, with random subsets of features selected at each split, introducing diversity among trees. Figure 8 presents a schematic of the random forest algorithm.

For a training dataset D with n samples, for instance, Random Forest will construct m decision trees T1,T2,…,Tm. Thus, each of the trees is trained on a bootstrap sample Di​ (random sample with replacement) from D, and at each node, a random subset of k features is selected to find the best split. For classification, the output is determined by a majority vote across all trees:

$$\widehat{y}=\mathrm{mode}\left(T_{1}\left(x\right),T_{2}\left(x\right),\dots,T_{m}\left(x\right)\right)$$
(16)

For regression, the output is the average prediction from all trees:

$$\widehat{y}=\frac{1}{m}\sum_{i=1}^{m}T_{i}\left(x\right)$$
(17)

Fig. 8 Schematic of the random forest (adapted from Zou et al.21).
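The bootstrap sampling step and the aggregation rules of Eqs. 16 and 17 can be sketched as below. The per-tree outputs are hypothetical numbers standing in for trained trees, and the function names are illustrative.

```python
import random
from statistics import mean, mode


def bootstrap_sample(records, rng):
    """Draw len(records) items with replacement."""
    return [rng.choice(records) for _ in records]


def forest_classify(tree_outputs):
    """Majority vote across trees (Eq. 16)."""
    return mode(tree_outputs)


def forest_regress(tree_outputs):
    """Average prediction across trees (Eq. 17)."""
    return mean(tree_outputs)


rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)

# Hypothetical outputs of three trees for one query.
print(forest_classify(["A", "B", "A"]))    # A
print(forest_regress([42.0, 40.0, 44.0]))  # 42.0
```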

Results presentation and analysis

ANN model

The developed (ANN) model consists of one hidden layer with six neurons. The database was pre-scaled between −1.0 and 1.0 using the hyper-normalized technique, and all neurons use (Hyper-tan) as the activation function. The network was trained using the Adam technique with a regularization rate of 0.0001 and a maximum of 1000 iterations. The hyper-parameters used are shown in Fig. 9. The network layout is presented in Fig. 10, while the weights matrix of the developed model is listed in Table 2. The impacts of the considered inputs on the output were ranked based on the summation of the absolute weights of each input; the relative importance of each input is illustrated in Fig. 11. Cement content (C) has the highest importance (17%), followed by the water and aggregate contents (W, CAg, FAg) with (12–14%), then the super-plasticizer and fly ash (SP, FA) with (10%), and finally the age and basalt fiber parameters (Age, L_bf, d_bf, V_bf) with (3–8%). The average achieved accuracy was (95%); the relations between calculated and predicted values for both the training and validation datasets are shown in Fig. 12. This outcome agrees with the results of previous research reports on the application of the ANN22,23,24.

With this configuration and accuracy, the ANN model offers the following applications for the production of basalt fiber-reinforced concrete. The hyper-tan activation functions capture complex nonlinear relationships between input features (e.g., basalt fiber content, water-cement ratio) and compressive strength. Regularization helps prevent overfitting, supporting reliable predictions for diverse production scenarios.
The ANN identifies optimal basalt fiber dosage, minimizing material wastage while achieving desired strength. Helps balance compressive strength with other critical properties, such as durability and workability. It predicts compressive strength for production batches, ensuring uniformity and compliance with standards. Quickly identifies deviations in material properties or mixing conditions, enabling corrective actions. It reduces excessive trial-and-error testing, lowering raw material usage and production costs. Optimized mix designs reduce cement content and associated carbon emissions. Helps determine the most influential factors affecting compressive strength, supporting data-informed decisions. The model can be expanded or retrained with new data to accommodate evolving production needs. The Adam optimizer ensures efficient convergence even for large datasets, making the model adaptable to industrial workflows. The model can be adapted for other concrete types or reinforcement materials, enhancing its utility. This ANN model delivers exceptional predictive performance and versatility for designing and producing basalt fiber-reinforced concrete. By offering precise compressive strength predictions and supporting efficient material use, it enables consistent quality, cost savings, and sustainability in modern concrete production.
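The weight-based ranking described for the ANN model (summation of the absolute weights of each input) can be sketched as follows. The weights are hypothetical and are not those of Table 2; normalizing the sums to shares of the total is an assumption made here to obtain percentages.

```python
def relative_importance(weights_by_input):
    """Rank inputs by the sum of absolute input-to-hidden weights.

    weights_by_input maps an input name to its weights towards each hidden
    neuron. Shares are normalized to sum to 1 (normalization assumed).
    """
    totals = {name: sum(abs(w) for w in ws)
              for name, ws in weights_by_input.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


# Hypothetical weights for three inputs and two hidden neurons.
shares = relative_importance({
    "C": [0.6, -0.4],
    "W": [0.3, 0.2],
    "V_bf": [-0.25, 0.25],
})
print(max(shares, key=shares.get))  # C
```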

Fig. 9 The considered hyper-parameters of (ANN) model.

Fig. 10 The layout of the developed (ANN) model.

Table 2 Weights matrix of the developed (ANN) model.
Fig. 11 Relative importance factors of inputs based on the (ANN) model.

Fig. 12 Relation between predicted and calculated strength using (ANN).

k-NN model

The (k-NN) model was developed with one neighbor, the Euclidean metric, and distance-based weights, as presented in Fig. 13. The developed (k-NN) model showed the best performance, with an average accuracy of (96%). This result corroborates the behavior of k-NN in previous research models23,24,25,26,27. The relations between calculated and predicted values are shown in Fig. 14.

With this configuration, the k-NN model offers significant benefits for the production of basalt fiber-reinforced concrete. k-NN provides highly specific predictions by focusing on the closest data point(s), giving tailored compressive strength estimates for given material compositions. Distance-based weighting ensures that predictions emphasize the most relevant historical data, facilitating precise adjustments to mix designs. The model predicts compressive strength for each batch, helping to ensure that the mix meets performance criteria before casting, and it identifies inconsistencies between predicted and actual outcomes, allowing for immediate corrective measures. k-NN helps determine the ideal basalt fiber content, balancing cost and strength without overuse, and supports identifying the lowest feasible cement content while maintaining structural integrity. High accuracy reduces trial-and-error in production, leading to less material waste. Precise predictions enable resource-efficient production, aligning with eco-friendly construction practices. k-NN’s straightforward approach makes it easy to integrate into production workflows. The model is adaptable to new datasets, ensuring continued relevance as production conditions or material properties change. It also helps identify critical parameters affecting strength, such as fiber dosage, water-cement ratio, and curing conditions.
Empowers production teams with actionable insights to optimize material usage and design. The high accuracy and simplicity of the k-NN model make it a reliable tool for predicting and optimizing the compressive strength of basalt fiber-reinforced concrete. Its adaptability, cost-effectiveness, and ease of implementation support efficient and sustainable production practices in the concrete industry.

Fig. 13 The considered hyper-parameters of (KNN) model.

Fig. 14 Relation between predicted and calculated strength using (KNN).

SVM model

The developed (SVM) model was based on “polynomial” kernel with cost value of 100, regression loss of 0.10 and numerical tolerance of 1.0. The kernel is four-degree polynomial (quartic) as illustrated in Fig. 15. The average achieved accuracy was (95%). The relations between calculated and predicted values are shown in Fig. 16. This outcome supports the findings of previous research reports on the utilization of the SVM28,29. The Support Vector Machine (SVM) model, with an average accuracy of 95%, based on a four-degree polynomial kernel, a cost value of 100, regression loss of 0.10, and numerical tolerance of 1.0, presents substantial value in the production of basalt fiber-reinforced concrete. The polynomial kernel effectively models the complex, nonlinear interactions between input parameters (e.g., basalt fiber dosage, water-cement ratio, admixtures) and compressive strength. Accurate strength predictions allow for the creation of tailored concrete mixes to meet specific structural and durability requirements. SVM can predict compressive strength during production, ensuring consistency across batches. Identifies discrepancies between predicted and actual performance, reducing production errors. Helps determine the optimal basalt fiber content needed to achieve desired properties, reducing excess usage and cost. Supports optimizing cement use by incorporating fibers and supplementary materials without compromising strength. Precise predictions minimize material waste by avoiding overdesign or rejected batches. Optimized mix designs lower cement consumption, aligning with green construction goals. The SVM model can be embedded into production workflows for real-time adjustments and predictive control. Suitable for large-scale production scenarios due to its robust accuracy and ability to handle complex relationships. 
The SVM model provides insights into how variations in input parameters affect strength, aiding strategic decisions on material selection and mix design. Its predictive power supports innovative applications, including high-performance and lightweight concrete. The high accuracy of the SVM model, powered by the four-degree polynomial kernel, makes it a reliable tool for optimizing basalt fiber-reinforced concrete production. Its capacity to model nonlinear relationships ensures precise strength predictions, enabling efficient, sustainable, and cost-effective concrete manufacturing.

Fig. 15 The considered hyper-parameters of (SVM) model.

Fig. 16 Relation between predicted and calculated strength using (SVM).

Tree model

This model was developed with a minimum of 2 instances in leaves and a minimum split subset of 5. The model began with only one tree level, and the depth was gradually increased to 8 levels as shown in Fig. 17. The layout of the generated model is presented in Fig. 18. The average achieved accuracy was (95%). The relations between calculated and predicted values are shown in Fig. 19. This shows close agreement with the results of previous research papers on the deployment of the tree model30,31,32.

With this configuration, the Decision Tree model offers significant applications in the production of basalt fiber-reinforced concrete. By progressively increasing tree depth, the model can identify complex, hierarchical relationships between input parameters (e.g., basalt fiber content, water-to-cement ratio) and compressive strength. Accurate predictions support optimized mix proportions that meet specific strength and performance requirements. The decision tree can quickly validate batch mixes by predicting compressive strength based on material properties and environmental conditions. Early identification of potential deviations in expected strength allows corrective action before production errors escalate. The decision tree’s ability to pinpoint critical thresholds for basalt fiber content minimizes unnecessary use, balancing cost and performance. Efficient prediction supports reducing cement content through alternative materials or admixtures without compromising compressive strength. The decision tree’s interpretable structure allows production engineers to understand the rationale behind predictions, facilitating trust in the model’s recommendations. The gradual increase in tree depth enables fine-tuning for different production scenarios, from simple to complex mix designs.
Accurate predictions lower the risk of overdesign, which conserves materials and reduces carbon emissions. Reliable compressive strength predictions prevent rejected batches, improving material efficiency. This model can be integrated into automated systems, providing scalable solutions for large-scale concrete manufacturing while maintaining accuracy and efficiency. In conclusion, this decision tree model is a powerful, interpretable tool for predicting and optimizing the compressive strength of basalt fiber-reinforced concrete, enabling cost-effective, high-quality, and sustainable production.
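The tree settings described above can be illustrated with a minimal regression-tree sketch. It is not the software used in this study: the squared-error split criterion, the function names, and the toy mix data are assumptions made for illustration. Only the leaf size of 2, the split-subset size of 5, and the depth range of 1 to 8 come from the text.

```python
from statistics import mean

MIN_LEAF = 2    # minimum instances per leaf (value from the text)
MIN_SPLIT = 5   # subsets smaller than this are not split (value from the text)

def sse(values):
    """Sum of squared deviations from the mean (assumed split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets):
    """Return (feature index, threshold) that lowers the child SSE most, or None."""
    best, best_cost = None, sse(targets)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            if len(left) < MIN_LEAF or len(right) < MIN_LEAF:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (j, t), cost
    return best

def grow(rows, targets, max_depth):
    """Grow a tree as nested dicts; each leaf stores the mean target."""
    split = None
    if max_depth > 0 and len(rows) >= MIN_SPLIT:
        split = best_split(rows, targets)
    if split is None:
        return {"value": mean(targets)}
    j, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[j] > t]
    return {
        "feature": j, "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], max_depth - 1),
        "right": grow([r for r, _ in right], [y for _, y in right], max_depth - 1),
    }

def predict(node, row):
    """Walk from the root to a leaf and return its stored mean."""
    while "value" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]

# Hypothetical toy data: [fiber content (% by volume), water-to-cement ratio]
X = [[0.0, 0.50], [0.1, 0.50], [0.2, 0.45], [0.3, 0.45],
     [0.4, 0.40], [0.5, 0.40], [0.1, 0.40], [0.3, 0.50]]
y = [30.0, 32.0, 36.0, 37.0, 41.0, 40.0, 39.0, 33.0]  # strength in MPa (made up)

# Training SSE for depths 1..8, mirroring the gradual increase in tree levels
errors = []
for depth in range(1, 9):
    tree = grow(X, y, depth)
    errors.append(sum((predict(tree, r) - t) ** 2 for r, t in zip(X, y)))
print(errors)
```

Because each leaf stores a mean of training targets, predictions stay within the range of the training strengths, and the training error cannot increase as depth is added. Validation error may still rise with depth, which is why the depth is tuned.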

Fig. 17

The considered hyper-parameters of (Tree) model.

Fig. 18

The layout of the developed (Tree) model.

Fig. 19

Relation between predicted and calculated strength using (Tree).

RF model

Finally, the developed (RF) model has eight trees and eight levels, as shown in Fig. 20. The developed models are graphically presented using a Pythagorean Forest in Fig. 21. This arrangement led to an average accuracy of 95%. The relations between calculated and predicted values are shown in Fig. 22. This agrees with the outcomes of previous research reported in the literature25,33. The Random Forest (RF) model, with an average accuracy of 95%, 8 trees, and a maximum depth of 8 per tree, provides highly reliable predictions of the compressive strength of basalt fiber-reinforced concrete. Its application in concrete production offers several advantages. The high accuracy of the RF model ensures precise estimation of compressive strength for varying basalt fiber proportions, water-to-cement ratios, and other mix parameters. The RF model can identify the optimal mix designs for specific strength requirements, minimizing trial-and-error approaches. The RF model can be integrated into production workflows to predict compressive strength from material inputs and environmental conditions, enabling real-time quality monitoring. By reducing variability in strength predictions, the model helps ensure consistent product quality across batches. RF predictions can help identify the most efficient basalt fiber content, balancing performance improvements with cost considerations. Accurate modeling supports reducing cement content by supplementing with basalt fibers and other additives without compromising strength. Precise prediction reduces the risk of overdesign or underperformance, thereby minimizing material waste. Optimizing mix designs lowers the carbon footprint by using resources efficiently. The RF model's robustness and interpretability make it suitable for deployment in automated systems for large-scale concrete production, ensuring scalability while maintaining accuracy.

In conclusion, this RF model is a valuable tool for achieving high-performance basalt fiber-reinforced concrete, offering cost-effectiveness, quality assurance, and sustainability benefits in production.
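A random forest with the reported size (eight trees, eight levels) can be sketched as below. This is an illustrative assumption-laden sketch, not the model built in this study: the bootstrap sampling, the number of features tried per split (`n_feat`), the squared-error criterion, the function names, and the toy data are all hypothetical.

```python
import random
from statistics import mean

N_TREES, MAX_DEPTH = 8, 8   # values reported in the text

def sse(ys):
    """Sum of squared deviations from the mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def grow(data, depth, rng, n_feat):
    """data: list of (features, target) pairs. Returns a nested-dict tree."""
    ys = [y for _, y in data]
    best, best_cost = None, sse(ys)
    if depth > 0 and len(data) >= 2:
        n_total = len(data[0][0])
        for j in rng.sample(range(n_total), n_feat):  # random feature subset
            for t in sorted({x[j] for x, _ in data})[:-1]:
                left = [d for d in data if d[0][j] <= t]
                right = [d for d in data if d[0][j] > t]
                cost = sse([y for _, y in left]) + sse([y for _, y in right])
                if cost < best_cost:
                    best, best_cost = (j, t, left, right), cost
    if best is None:
        return {"value": mean(ys)}
    j, t, left, right = best
    return {"feature": j, "threshold": t,
            "left": grow(left, depth - 1, rng, n_feat),
            "right": grow(right, depth - 1, rng, n_feat)}

def predict_tree(node, x):
    """Walk one tree from root to leaf."""
    while "value" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]

def fit_forest(X, y, seed=0, n_feat=1):
    """Train N_TREES trees, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    data = list(zip(X, y))
    forest = []
    for _ in range(N_TREES):
        sample = [rng.choice(data) for _ in data]  # draw with replacement
        forest.append(grow(sample, MAX_DEPTH, rng, n_feat))
    return forest

def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return mean([predict_tree(tree, x) for tree in forest])

# Hypothetical toy data: [fiber content (% by volume), water-to-cement ratio]
X = [[0.0, 0.50], [0.1, 0.50], [0.2, 0.45], [0.3, 0.45],
     [0.4, 0.40], [0.5, 0.40], [0.1, 0.40], [0.3, 0.50]]
y = [30.0, 32.0, 36.0, 37.0, 41.0, 40.0, 39.0, 33.0]  # strength in MPa (made up)

forest = fit_forest(X, y, seed=42)
print(predict_forest(forest, [0.25, 0.45]))
```

Averaging over trees trained on different bootstrap samples is what reduces the variance of a single deep tree. The fixed seed only makes the sketch repeatable.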

Fig. 20

The considered hyper-parameters of (RF) model.

Fig. 21

Pythagorean Forest diagram for the developed (RF) models.

Fig. 22

Relation between predicted and calculated strength using (RF).

Overall, Table 3 presents the performance measurements of the developed models for the basalt fiber concrete compressive strength (Cs_bf). It can be seen that the present work outperformed the other ML techniques applied in a previous research paper, which used the same number of data entries and the same basalt fiber-reinforced concrete constituents17. A Taylor chart for the measured compressive strength of basalt fiber-reinforced concrete and the values predicted with ANN, KNN, SVM, Tree, and RF is presented in Fig. 23. The Taylor chart is a powerful graphical tool for comparing the performance of predictive models because it illustrates three key statistical measures simultaneously: the correlation coefficient (R), the normalized standard deviation (σ), and the root-mean-square error (RMSE). Here it is used to assess the accuracy of the models in predicting the compressive strength of basalt fiber-reinforced concrete and to determine which model lies closest to the measured values. The correlation coefficient (R) indicates the strength of the linear relationship between predicted and measured values; a value closer to 1 means better prediction. The normalized standard deviation (σ) compares the standard deviation of the predicted values with that of the measured ones; the ideal value is 1, indicating a perfect match. The root-mean-square error (RMSE) reflects the average prediction error; lower values indicate better performance. The Artificial Neural Network (ANN) typically excels with complex, non-linear data and may show a high R and a relatively low RMSE. The performance of k-Nearest Neighbors (KNN) depends on parameter tuning; it can perform moderately well if a suitable number of neighbors and distance metric are used. The Support Vector Machine (SVM) is effective for regression tasks, particularly with the right kernel (e.g., RBF or polynomial), and may have a high R and a low RMSE.
The Decision Tree is simple and interpretable but prone to overfitting, possibly resulting in moderate performance with a higher RMSE. The Random Forest (RF) is expected to perform robustly, with a high R and a low RMSE, because of its ensemble nature. The model point closest to the reference point (ideal R = 1, normalized σ = 1, RMSE = 0) is the most accurate. Models deviating significantly from this reference point are less reliable. The Taylor chart (Fig. 23) visually compares the predictive capabilities of these models, helping identify the most suitable one for accurately predicting the compressive strength of basalt fiber-reinforced concrete. Typically, RF and ANN show better alignment with measured data because of their ability to handle non-linearity and noise effectively. Finally, after considering the performance indices of the selected ensemble and classification models used in this research, it can be deduced that all the developed models have almost the same excellent level of accuracy (about 95%). Three techniques were used to estimate the impact of each input on the compressive strength, namely the correlation matrix, sensitivity analysis, and the relative importance chart.
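The three quantities plotted on a Taylor chart can be computed as in the following sketch. The function name and the sample values are hypothetical. One detail worth stating is that the error measure tied geometrically to R and σ on a Taylor chart is the centered RMS difference, in which the mean bias is removed, so it equals the ordinary RMSE only when the predictions are unbiased.

```python
import math

def taylor_statistics(measured, predicted):
    """Return R, normalized standard deviation, and normalized centered RMS error."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    dm = [m - mean_m for m in measured]    # measured anomalies
    dp = [p - mean_p for p in predicted]   # predicted anomalies
    std_m = math.sqrt(sum(d * d for d in dm) / n)
    std_p = math.sqrt(sum(d * d for d in dp) / n)
    r = sum(a * b for a, b in zip(dm, dp)) / (n * std_m * std_p)
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dm, dp)) / n)
    return {"R": r, "sigma_norm": std_p / std_m, "crmse_norm": crmse / std_m}

# Hypothetical measured and predicted strengths in MPa
measured = [28.0, 35.0, 41.0, 46.0, 52.0]
predicted = [29.5, 33.8, 42.1, 45.0, 50.6]

stats = taylor_statistics(measured, predicted)
print(stats)

# The chart geometry rests on the law-of-cosines identity
#   E'^2 = sigma^2 + 1 - 2 * sigma * R   (all normalized by the measured std)
lhs = stats["crmse_norm"] ** 2
rhs = stats["sigma_norm"] ** 2 + 1 - 2 * stats["sigma_norm"] * stats["R"]
print(math.isclose(lhs, rhs))
```

A model that reproduces the measurements exactly gives R = 1, a normalized σ of 1, and an error of 0, which is the reference point on the chart.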

Comparison to contemporary works

The present research compares favorably with the reviewed literature by demonstrating strong predictive performance across multiple machine learning models for estimating the compressive strength of basalt fiber-reinforced concrete (BFRC). Among the models applied, the k-nearest neighbors (KNN) model exhibited the highest accuracy (97%) and the lowest error (0.03%) in the training phase, with a coefficient of determination (R²) of 0.99, indicating near-perfect predictive capability. Artificial neural networks (ANN) and support vector machines (SVM) also performed well, achieving high accuracy (95–96%) and low errors (0.04–0.05%) in both training and validation phases, with R² values of 0.98, consistent with previous studies that identified ANN and stochastic-based models as effective predictors of BFRC strength. Compared to the reviewed literature, which highlighted the effectiveness of ANN, RF, and stochastic models, the present study confirms that ANN remains a robust method with high accuracy (95–96%) and low error rates. However, while prior research favored random forest (RF) in some cases, the present study indicates that RF had higher sum of squared errors (SSE) and mean squared error (MSE) compared to KNN and ANN, suggesting that alternative ensemble methods may provide better accuracy for BFRC strength prediction. Decision tree-based models, including RF and standalone tree models, showed slightly lower accuracy and higher errors than ANN and KNN, aligning with literature findings that ensemble tree models require optimization to match ANN performance. The results also reinforce the importance of model selection based on dataset characteristics. While stochastic models such as bagging and hybrid tree models were found to outperform others in some literature, the present study suggests that simpler models like KNN can achieve comparable, if not better, performance. 
The findings further validate the significance of sensitivity analysis in understanding parameter influences, aligning with studies that emphasize the role of individual concrete components in BFRC strength predictions. Overall, the present study enhances existing knowledge by providing a comparative analysis of multiple machine learning techniques, confirming the high predictive accuracy of ANN while highlighting the efficiency of KNN for compressive strength estimation in BFRC.

The present research findings align with and extend the results reported in the reviewed literature on the modeling of basalt fiber-reinforced concrete (BFRC) compressive strength. The artificial neural network (ANN) model in this study demonstrated high predictive accuracy, with an R² value of 0.98 in both training and validation phases, supporting previous studies such as those by Almohammed et al.7 and Hasanzadeh et al.13, which highlighted ANN’s superior predictive capability for BFRC. Additionally, Almohammed et al.15 found that random forest (RF) models were highly effective, though in the present study, RF exhibited slightly higher SSE and MSE compared to ANN and k-nearest neighbors (KNN), suggesting that while RF remains a reliable method, KNN demonstrated superior predictive efficiency in this case. The findings also confirm the observations made by Wei Chen et al.9, who noted the significant impact of basalt fibers on the mechanical characteristics of concrete, as the present study similarly shows strong model performance in capturing the influence of fiber content on compressive strength. Kavya et al.12 developed an ANN-based predictive model for BFRC and glass fiber-reinforced concrete, reporting high accuracy, which is consistent with the present study’s ANN results. Hematibahar et al.10 created an algorithm using classical programming methods that achieved an R² of 0.96 for BFRC strength prediction, closely aligning with the ANN and SVM models in the present research, which both attained R² values of 0.98. Furthermore, the stochastic models applied by Almohammed et al.11 for flexural and split tensile strength predictions found that bagging-based ensemble techniques provided optimal results. While ensemble models such as RF and tree-based methods were included in the present study, KNN outperformed RF, showing that model selection is highly dependent on dataset characteristics. 
The findings also reinforce the conclusions drawn by Asghar et al.14, who validated gene expression programming (GEP) and ANN for predicting BFRC strength, as ANN remains one of the top-performing models in this study. The sensitivity analysis conducted in the present research aligns with the approach used by Najm et al.16, who explored the impact of basalt fibers on the mechanical properties of high-strength self-consolidating concrete. The results confirm that basalt fiber dosage, along with other mix design parameters, plays a crucial role in determining compressive strength. In comparison to statistical and AI-based models used by Sun et al.6 and Wang et al.5, which also emphasized the importance of accurate model selection, the present research provides additional insights by applying multiple machine learning techniques and identifying KNN as a strong contender alongside ANN. Overall, the present study contributes to existing knowledge by validating the effectiveness of ANN while demonstrating that KNN can achieve comparable accuracy with lower SSE and MSE values. The results support previous literature while offering new insights into model selection48,49,50,51,52,53,54,55,56,57, suggesting that while ensemble tree models such as RF remain effective, simpler models like KNN can be more efficient in specific datasets for BFRC compressive strength prediction.
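The error measures compared above (SSE, MAE, MSE, RMSE, and R²) follow standard definitions, sketched below with made-up values. The "error" and "accuracy" entries are an assumption: the text does not define them in this section, and the sketch takes error as RMSE divided by the mean measured value and accuracy as one minus that error.

```python
import math

def performance_metrics(measured, predicted):
    """Standard regression error measures for paired measured/predicted values."""
    n = len(measured)
    residuals = [p - m for m, p in zip(measured, predicted)]
    mean_m = sum(measured) / n
    sse = sum(e * e for e in residuals)              # sum of squared errors
    sst = sum((m - mean_m) ** 2 for m in measured)   # total sum of squares
    mse = sse / n
    rmse = math.sqrt(mse)
    error = rmse / mean_m   # ASSUMED definition of the relative error
    return {
        "SSE": sse,
        "MAE": sum(abs(e) for e in residuals) / n,
        "MSE": mse,
        "RMSE": rmse,
        "R2": 1 - sse / sst,
        "error": error,
        "accuracy": 1 - error,   # ASSUMED definition of accuracy
    }

# Hypothetical strengths in MPa
measured = [30.0, 35.0, 40.0, 45.0]
predicted = [31.0, 34.0, 41.0, 44.0]

metrics = performance_metrics(measured, predicted)
print(metrics)
```

With strengths in MPa, MAE and RMSE carry units of MPa, while SSE and MSE carry units of MPa².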

Table 3 Performance measurements of developed models for (Cs_bf).
Fig. 23

Comparing the accuracies of the developed models for (Cs_bf) using Taylor charts, (a) Training dataset, (b) Validation dataset.

Conclusions

This research presents a comparative study of five ML techniques, namely ANN, KNN, SVM, Tree, and RF, for estimating the compressive strength of basalt fiber concrete from the mixture component contents, fiber dimensions, and concrete age. The present research has demonstrated the effectiveness of machine learning models in accurately predicting the compressive strength of basalt fiber-reinforced concrete. Among the models tested, the artificial neural network exhibited the highest accuracy, confirming findings from previous studies that have highlighted its superior predictive capabilities. The k-nearest neighbors model also showed strong performance, achieving lower SSE and MSE values, making it a competitive alternative. While ensemble-based models such as random forest and decision trees have been widely used in past research, their performance in this study suggests that simpler models can achieve comparable or even better accuracy depending on the dataset characteristics. The results align with existing literature, reinforcing the impact of basalt fiber content on concrete strength and the importance of selecting appropriate modeling techniques. Sensitivity analysis confirmed that dry unit weight, sand proportion, and void ratio are the most influential factors affecting shear strength, providing valuable insights for optimizing mix design in engineering applications. The findings contribute to the advancement of predictive modeling in sustainable construction, offering more efficient and cost-effective approaches for assessing BFRC performance. This research highlights the potential of machine learning in civil engineering, enabling engineers to optimize material properties and design more durable concrete structures. Future studies can further enhance prediction accuracy by integrating hybrid models and exploring larger datasets with diverse material compositions.
The outcomes of this work provide a strong foundation for data-driven decision-making in the construction industry, supporting improved performance, sustainability, and reliability of BFRC in structural applications. The main outcomes of this study can be summarized as follows:

  • All the developed models have almost the same excellent level of accuracy (95%).

  • ANN, KNN, and SVR each produced an R² of 0.98. KNN produced an MAE of 1.4 MPa and an MSE of 2.5 MPa, outperforming ANN (MAE of 1.55 MPa, MSE of 4.1 MPa) and SVR (MAE of 1.6 MPa, MSE of 3.85 MPa).

  • Three techniques were used to estimate the impact of each input on the compressive strength, namely the correlation matrix, sensitivity analysis, and the relative importance chart. The results of the three techniques do not match and are not even close. This could be a good starting point for further research.

  • All the developed models are too complicated to be applied manually, which may be considered the main disadvantage of the ML classification techniques compared with symbolic regression ML techniques such as (GP) and (EPR).

  • The developed models are valid within the considered range of parameter values; beyond this range, the prediction accuracy should be verified.
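The validity-range caveat in the last item can be enforced with a simple check before a model is queried. The sketch below is illustrative only: the input names and the range limits are hypothetical placeholders, not the ranges of the dataset used in this study.

```python
# HYPOTHETICAL placeholder ranges; replace with the actual training data limits
TRAINING_RANGES = {
    "fiber_content_pct": (0.0, 0.5),
    "water_cement_ratio": (0.30, 0.60),
    "age_days": (7, 90),
}

def inputs_outside_range(mix, ranges):
    """Return {name: (value, low, high)} for every input outside its range."""
    flagged = {}
    for name, value in mix.items():
        low, high = ranges[name]
        if not low <= value <= high:
            flagged[name] = (value, low, high)
    return flagged

# Hypothetical mix to be checked before prediction
mix = {"fiber_content_pct": 0.3, "water_cement_ratio": 0.45, "age_days": 180}

for name, (value, low, high) in inputs_outside_range(mix, TRAINING_RANGES).items():
    print(f"{name} = {value} lies outside [{low}, {high}]; verify the prediction")
```

Tree-based and nearest-neighbor models in particular cannot extrapolate, since they return values derived from training targets, so a flagged input is a signal to confirm the prediction experimentally.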

Practical application of the research work

The findings of this research have direct applications in structural engineering, where the accurate prediction of basalt fiber-reinforced concrete (BFRC) compressive strength is crucial for optimizing material design and construction practices. The developed machine learning models can be used by engineers and researchers to assess concrete performance without relying solely on extensive laboratory testing, saving time and resources. Construction companies can utilize the predictive models to determine the optimal mix proportions of basalt fiber in concrete, ensuring improved mechanical properties and durability while maintaining cost-effectiveness. The insights from sensitivity analysis can guide material selection and mix design strategies, leading to the production of more resilient and sustainable concrete structures. These models can also aid in quality control during manufacturing, allowing for real-time adjustments to ensure consistency in BFRC performance. Furthermore, the research supports infrastructure development projects by enabling engineers to design high-performance concrete for bridges, highways, and buildings, particularly in environments requiring enhanced durability and crack resistance. The integration of machine learning in BFRC design promotes data-driven decision-making, fostering innovation in the construction industry and contributing to the advancement of sustainable and resilient infrastructure.

Recommendations for future work

Future research should explore the integration of additional machine learning techniques, such as deep learning and hybrid models, to further improve the accuracy and robustness of BFRC compressive strength predictions. Investigating the effects of varying environmental conditions, such as temperature, humidity, and exposure to aggressive chemicals, on BFRC performance can enhance the practical applicability of the models. Expanding the dataset by incorporating a wider range of mix proportions, fiber lengths, and curing conditions would improve model generalization and reliability. The development of real-time predictive tools using Internet of Things (IoT) and sensor-based monitoring systems could allow for continuous assessment of concrete performance in structural applications. Future studies should also consider the long-term durability and aging effects of BFRC in different service conditions to establish comprehensive performance models. Collaboration with industry stakeholders to validate the models through large-scale experimental testing and field applications would bridge the gap between theoretical advancements and practical implementation. Additionally, sustainability assessments incorporating life cycle analysis and cost-benefit evaluations should be conducted to ensure the economic feasibility of BFRC in large-scale construction projects.

Limitations and how to address them

One limitation of the research is the reliance on a specific dataset, which may not capture the full range of variability in basalt fiber-reinforced concrete (BFRC) properties. This can be addressed by expanding the dataset with more experimental results from diverse sources, covering different mix designs, curing conditions, and environmental exposures. Another limitation is the potential overfitting of machine learning models, particularly when trained on limited data. Implementing cross-validation techniques, regularization methods, and hyperparameter tuning can help mitigate this issue. The study does not fully account for long-term durability factors such as creep, shrinkage, and degradation under extreme conditions. Future work should incorporate time-dependent performance evaluations and field studies to validate predictions over extended periods. Additionally, while the research focuses on compressive strength, other mechanical properties such as flexural and tensile strength should be further analyzed to develop a more comprehensive understanding of BFRC behavior. Addressing these limitations through broader data collection, advanced model optimization, and extended experimental validation will enhance the reliability and applicability of the findings.
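The cross-validation mentioned above as a guard against overfitting can be sketched as follows. This is a minimal illustration and not part of the reported workflow: the function names, the choice of five folds, the mean-value baseline model, and the toy data are all assumptions.

```python
import math
import random

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (training indices, validation indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, validation

def cross_validated_rmse(X, y, fit, predict, k=5, seed=0):
    """Average validation RMSE over k folds for any fit/predict pair."""
    fold_rmse = []
    for train, val in k_fold_splits(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        squared = [(predict(model, X[i]) - y[i]) ** 2 for i in val]
        fold_rmse.append(math.sqrt(sum(squared) / len(squared)))
    return sum(fold_rmse) / k

# Baseline "model" for illustration: always predict the training mean
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x):
    return model

# Hypothetical toy data: [fiber content (% by volume)] and strength in MPa
X = [[0.0], [0.1], [0.1], [0.2], [0.2], [0.3], [0.3], [0.4], [0.4], [0.5]]
y = [30.0, 32.0, 33.0, 35.0, 36.0, 38.0, 37.0, 40.0, 41.0, 40.0]

print(cross_validated_rmse(X, y, fit_mean, predict_mean))
```

Each sample is used for validation exactly once, so the averaged error reflects performance on data the model did not see during fitting. A large gap between training error and this value points to overfitting.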