Introduction

In recent years, with the continuous development of high-rise buildings and large-span structures, the performance requirements of structural members are increasing. Concrete-encased steel (CES) columns have been widely used in modern building structures due to their excellent seismic performance, stable mechanical properties and good fire resistance1,2. The steel within concrete-encased steel columns is advantageous over traditional steel structures due to the encapsulating concrete, which prevents local buckling of the embedded steel when subjected to compression, thereby optimizing the material’s performance3. Compared with the traditional reinforced concrete structure, due to the high bearing capacity of steel, it can be used as a supporting member to bear the construction load before concrete pouring, which greatly shortens the construction period4. The constraint provided by the steel enhances the ductility and strength of the core concrete even more. In addition, with the rapid development of metallurgical manufacturing process and the rapid progress of science and technology, the use of high strength building materials in practical engineering is increasing5.

Currently, numerous researchers have extensively conducted experiments, numerical analyses, and theoretical investigations into the axial compression mechanical behavior of CES columns6,7,8,9,10,11. However, under the action of axial load, the interaction relationship between the material components of the CES column is extremely complex, showing a high nonlinearity. Of note is that the prevailing computational approach for determining the axial compression bearing capacity of CES columns relies on mathematical formulations, failing to adequately account for the material nonlinearity and the intricate interactions among the column’s constituent parts. Consequently, the precision of predicting the axial compression bearing capacity of CES columns necessitates further examination and refinement.

Machine learning (ML) has emerged as a potent data processing technique and is recognized as the most triumphant division within the field of artificial intelligence12. It can explore learning in a given database and find the influence of the internal variation of various parameters on performance13, especially for problems with strong uncertainty or limited by experimental research. ML is learned from the data characteristics of the database itself, and does not need to be defined by assumed mathematical expressions. The roots of ML in engineering extend to 1989, when Adeli et al.14 employed ML for the design of steel beams. As computer science and technology advance rapidly and artificial intelligence technology matures, the application of machine learning in structural engineering is increasingly gaining prominence. ML has found application in a multitude of areas, including structural analysis and design, structural health monitoring and damage detection, predicting the load-bearing capacity of structural members under diverse loading conditions, as well as in studying the mechanical properties of concrete and mix design optimization15,16,17,18,19. Miao et al.20 employed a machine learning algorithm to develop a prediction model for the ultimate strength of circular fiber-reinforced polymer concrete-steel tubular composite columns. The research findings indicate that the prediction model grounded in machine learning exhibits superior accuracy in its predictions. Wang et al.21 established a database of rectangular concrete-filled steel tubular composite columns, and carried out statistical analysis on the database. The prediction model for the ultimate bearing capacity of concrete-filled steel tubular columns subjected to eccentric loads was developed using three machine learning algorithms. The results show that the prediction accuracy is significantly improved compared with the standard design method. Zhou et al.22 proposed a prediction method combining data-driven and physical mechanisms. Based on the verification of physical mechanisms, a database for machine learning model training was constructed. The findings reveal that utilizing combined parameters in place of basic parameters significantly enhances the computational efficiency and operability of the model. The prediction accuracy of the ML model surpasses that of international design standards and overcomes the limitations of conventional methods. It can be seen that ML has achieved a series of results in other fields and has great application potential. However, the axial compression bearing capacity is affected by many factors and the development mechanism is complex. The complexity of the ML model makes it difficult to effectively explain its output results, which limits its application in the optimization design of axial compression bearing capacity. Shapley Additive Explanations (SHAP) facilitate the interpretation of machine learning models by quantifying the impact of input variables on the model’s output. This is achieved through the analysis of the marginal contribution of each individual input variable to the overall prediction. SHAP can identify the characteristic factors that play a key role in model prediction and further explore the potential factors that affect the output results23,24. SHAP quantifies the importance of characteristic factors and reveals their influence on bearing capacity. This transforms model output from “uninterpretable values” to “laws that can be used in engineering design,” enhancing trust in ML for structural engineering.

Considering the aforementioned analysis, employing the machine learning (ML) method is an efficacious strategy for investigating the mechanical behavior of concrete-encased steel composite columns and understanding the nonlinear interactions among various materials. This study compiled an extensive dataset, resulting in a database that encompasses 116 groups of axial compression test results for concrete-encased steel composite columns, which was subsequently subjected to statistical analysis. Secondly, the paper provides a concise introduction to three machine learning algorithms: Artificial Neural Network (ANN), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Based on the three algorithms, the prediction model of axial compression bearing capacity of concrete-encased steel composite columns is established. An evaluation index system is formulated to assess the predictive performance of the model, with the most suitable model identified through comparative analysis. SHAP is utilized to elucidate both the global and instance-specific predictions of the model, meticulously analyzing the impact—both positive and negative—of individual features on the prediction outcomes for each sample25. It can not only provide the importance ranking of each feature, but also provide the influence law of features on the prediction results. It provides solid theoretical and technical support for designing and engineering section concrete-encased steel composite columns.

Establishment of CES column database

This study gathered 116 datasets of axial compression tests on concrete-encased steel composite columns and constructed a database to train machine learning algorithms26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48. In order to reduce the dimension of data in the database, improve the computational efficiency of machine learning, and strengthen the generalization ability of the prediction model, the variable parameters affecting the structural performance of composite columns were found based on the analysis results. Select cross-sectional width B (120–600 mm), section height D (120–600 mm), specimen height H (360–4100 mm), concrete compressive strength fc (16.77–130.2 MPa), built-in steel yield strength fa (240–769.6 MPa), steel yield strength fs (0–578 MPa), built-in steel cross-sectional area Aa (630-10380mm, stirrup ratio (0–2.79%), built-in steel cross-sectional form F (H-shaped and cross-shaped), longitudinal steel cross-sectional area As (0–7922.2mm, a total of ten input characteristics. The ultimate bearing capacity (Nu) of the composite column is the output parameter. When training the model, the steel section form is input into the model by the assignment method. When the steel section form is ‘H-shaped’, it is defined as ‘1’, and when the steel section form is ‘cross-shaped’, it is defined as ‘2’. Table 1 summarizes and describes the statistical distribution of input features \({\rho _{{\text{sv}}}}\)and output parameters in the database.

Table 1 Feature statistical distribution.

Figure 1 illustrates the distribution of the database, offering a clear and direct visualization of the intervals for input features and output variables. As depicted in Fig. 1, the material strength range in the database is more comprehensive, from low strength to high strength. To design the CES column effectively, one must assess the correlation between sample parameters to prevent multicollinearity issues from arising. Figure 2 displays the matrix of Pearson correlation coefficients for the sample parameters. In the figure, r is the Pearson correlation coefficient, and the mathematical expression is shown in Eq. (1). Observations indicate that the relationship between input parameters and the axial compression load capacity is intricate and nonlinear.

$$r=\frac{{n\sum\limits_{{i=1}}^{n} {{x_i}{y_i}} - \sum\limits_{{i=1}}^{n} {{x_i}} \sum\limits_{{i=1}}^{n} {{y_i}} }}{{\sqrt {n\sum\limits_{{i=1}}^{n} {x_{i}^{2}} - {{\left( {\sum\limits_{{i=1}}^{n} {{x_i}} } \right)}^2}\sqrt {n\sum\limits_{{i=1}}^{n} {y_{i}^{2}} - {{\left( {\sum\limits_{{i=1}}^{n} {{y_i}} } \right)}^2}} } }}$$
(1)

Note: x, y are two different variables.

Fig. 1
Fig. 1The alternative text for this image may have been generated using AI.Fig. 1The alternative text for this image may have been generated using AI.
Full size image

Scatter plots of input features against the ultimate strength.

Fig. 2
Fig. 2The alternative text for this image may have been generated using AI.
Full size image

Correlation matrix of the database.

Furthermore, the variability in data sources leads to inconsistencies in the measurement methods for concrete strength across various literary works. The compressive strength of concrete in the established database is measured based on the size of three concrete test blocks, which are 150 mm cube test block (fcu,150), 100 mm diameter, 200 mm height cylinder test block (fcy,100), 150 mm diameter, 300 mm height cylinder test block (fcy,150). In order to reduce the error of the model prediction and improve the prediction accuracy of the model, it is necessary to convert the concrete strength, that is, all the test data of concrete strength is expressed by the same concrete strength through the Equation. In this paper, fcy,150 is used as the characteristic compressive strength of concrete, that is, fc = fcy,150. The compressive strength of other types of concrete is converted into characteristic compressive strength by using Eq. (2)49,50.

$${f_{{\text{cy}},150}}=\left\{ \begin{gathered} 1.67 \times {150^{ - 0.112}}{f_{{\text{cy}},100}} \hfill \\ \left[ {0.76+0.2{{\log }_{10}}\left( {\frac{{{f_{{\text{cu}},150}}}}{{19.6}}} \right)} \right]{f_{{\text{cu}},150}} \hfill \\ \end{gathered} \right.$$
(2)

The standardization of features can unify the value range of different features to the same scale, so that the mean value is 0 and the variance is 1. The standardized data will be more consistent with the Gaussian distribution, which can reduce the weight difference between different features and enhance the discrimination of each feature, so that the model can better capture the correlation and difference between data. 80% of the standardized data is randomly selected for machine learning model training, and 20% of the data is used for model verification. The data normalization conversion Equation is as follows:

$${x_{i,{\text{norm}}}}=\frac{{2\left( {{x_i} - {x_{i,\hbox{min} }}} \right)}}{{{x_{i,\hbox{max} }} - {x_{i,\hbox{min} }}}} - 1$$
(3)

Note

xi,norm is the normalized value of the i-th variable; xi,min is the original minimum value of the i-th parameter; xi,max is the original maximum value of the i-th parameter; xi is the original value of the i-th parameter.

Machine learning of axial compression bearing capacity of CES column

This section aims to select a black box model suitable for predicting the axial compression bearing capacity of CES columns. Initially, black box models for predicting the axial bearing capacity of CES columns are developed using ANN, RF, and XGBoost algorithms. Subsequently, the suitability of the aforementioned machine learning models for predicting the axial bearing capacity of CES columns is ascertained using a range of evaluation metrics.

Overview of machine learning models

Artificial neural network (ANN)

ANN is a computational model designed to mimic the architecture of biological neural networks. It consists of a multitude of interconnected units, known as ‘neurons’12. These units are structured into several layers: an input layer, one or more hidden layers, and an output layer. Each neuron receives signals from other neurons, and according to the weighting and bias of these signals, the activation function determines whether and how much to transmit the signal. The input layer is responsible for receiving and interpreting external signals or data. Each input node represents a feature of the data, and the output layer is where the network produces the final result. The number of nodes in the output layer is determined by the specific requirements of the network’s task, ensuring the production of the desired output results. The hidden layer, situated between the input and output layers, encompasses the weight parameters, bias terms, and the application of the activation function. The weight is the parameter connecting each neuron, which plays a decisive role in the importance of the input signal. The bias is the threshold of each neuron, which directly affects the activation state of the neuron, and the activation function determines the output of the neuron. The process of forward propagation can be articulated as follows:

$$y=f\left( {\sum\limits_{{i=1}}^{n} {{w_i}{x_{i,{\text{norm}}}}+b} } \right)$$
(4)

Note

wi and b are weight and neuron deviation respectively; f () is the activation function; y is the output value;

The loss function serves to quantify the discrepancy between the predicted and actual values. If the error between the two is unacceptable, the back propagation process is performed. Each connection’s weight is fine-tuned using the gradient descent method, and the iterative training process continues until either the specified number of iterations is completed or the predefined loss threshold is met, at which point the model training is concluded. Figure 3 shows the structure diagram of the artificial neural network model in this paper. The weight matrix and deviation matrix of the model are shown in Eqs. (57):

$${w_1}=\left( \begin{gathered} {\text{-0}}{\text{.6279 0}}{\text{.4807 -0}}{\text{.1532 0}}{\text{.3412 -0}}{\text{.5704 0}}{\text{.3246}} \hfill \\ {\text{ 0}}{\text{.4825 0}}{\text{.4348 0}}{\text{.2258 -0}}{\text{.1699 -0}}{\text{.6154 0}}{\text{.2187}} \hfill \\ {\text{-0}}{\text{.2886 -0}}{\text{.3901 0}}{\text{.6618 2}}{\text{.3719 -0}}{\text{.6102 0}}{\text{.6698}} \hfill \\ {\text{-0}}{\text{.1924 -0}}{\text{.0755 -0}}{\text{.1133 4}}{\text{.4103 5}}{\text{.0266 3}}{\text{.7356}} \hfill \\ {\text{ 0}}{\text{.1434 0}}{\text{.0107 1}}{\text{.7019 4}}{\text{.2794 -1}}{\text{.6281 1}}{\text{.6334}} \hfill \\ {\text{ 0}}{\text{.3152 0}}{\text{.1203 1}}{\text{.7569 1}}{\text{.7497 -1}}{\text{.713 0}}{\text{.2819}} \hfill \\ {\text{-0}}{\text{.3571 -0}}{\text{.5437 -0}}{\text{.4718 -1}}{\text{.2515 0}}{\text{.5103 -0}}{\text{.2385}} \hfill \\ {\text{-0}}{\text{.2425 0}}{\text{.4038 -1}}{\text{.0599 1}}{\text{.3246 -0}}{\text{.1527 0}}{\text{.2882}} \hfill \\ {\text{-0}}{\text{.0126 0}}{\text{.2191 0}}{\text{.1914 0}}{\text{.3526 -0}}{\text{.2815 0}}{\text{.1046}} \hfill \\ {\text{ 0}}{\text{.4316 0}}{\text{.1678 -0}}{\text{.1019 0}}{\text{.1250 -0}}{\text{.4711 -0}}{\text{.4779}} \hfill \\ \end{gathered} \right) {b_1}=\left( \begin{gathered} {\text{-}}0.5817 \hfill \\ {\text{ }}0.569 \hfill \\ {\text{-}}0.3388 \hfill \\ {\text{-}}0.5443 \hfill \\ {\text{ 0}}{\text{.4024}} \hfill \\ {\text{-}}0.0308{\text{ }} \hfill \\ \end{gathered} \right)$$
(5)
$${w_2}=\left( \begin{gathered} {\text{ 0}}{\text{.3735 0}}{\text{.7023 0}}{\text{.0889 -0}}{\text{.1639 -0}}{\text{.3773 0}}{\text{.5463}} \hfill \\ {\text{-0}}{\text{.5823 0}}{\text{.5796 0}}{\text{.0265 -0}}{\text{.2370 0}}{\text{.3860 -0}}{\text{.4378}} \hfill \\ {\text{ 0}}{\text{.1543 0}}{\text{.0562 -0}}{\text{.9245 -0}}{\text{.4493 -0}}{\text{.4592 0}}{\text{.1161}} \hfill \\ {\text{-0}}{\text{.2945 -0}}{\text{.6465 -0}}{\text{.4204 -0}}{\text{.4712 -0}}{\text{.2427 -0}}{\text{.0950}} \hfill \\ {\text{-0}}{\text{.3506 2}}{\text{.5710 4}}{\text{.7012 -0}}{\text{.4436 -0}}{\text{.2984 -0}}{\text{.2161}} \hfill \\ {\text{-0}}{\text{.0556 -0}}{\text{.0517 2}}{\text{.2405 -0}}{\text{.4646 0}}{\text{.0435 -0}}{\text{.0655}} \hfill \\ \end{gathered} \right){\text{ }}{b_2}=\left( \begin{gathered} {\text{ }}0.4149 \hfill \\ {\text{ }}0.6412 \hfill \\ {\text{-}}0.6552 \hfill \\ {\text{ }}0.2862 \hfill \\ {\text{-}}0.2524 \hfill \\ {\text{ }}0.1008 \hfill \\ \end{gathered} \right)$$
(6)
$${w_3}=\left( \begin{gathered} {\text{-0}}{\text{.4631}} \hfill \\ {\text{-2}}{\text{.5}} \hfill \\ {\text{2}}{\text{.3011}} \hfill \\ {\text{0}}{\text{.6287}} \hfill \\ {\text{-0}}{\text{.0089}} \hfill \\ {\text{-0}}{\text{.2223}} \hfill \\ \end{gathered} \right){\text{ }}{b_3}=\left( {0.8012} \right)$$
(7)
Fig. 3
Fig. 3The alternative text for this image may have been generated using AI.
Full size image

Structure diagram of ANN algorithm.

Random forest (RF)

Random forest51 is an ensemble learning method evolved from decision tree52. By constructing multiple decision trees, it is the underlying logic of operation, and each decision tree is independent in the training process. The RF algorithm uses bootstrap sampling method to select samples and features, and constructs multiple independent parallel decision trees, which all produce prediction results based on samples and features. By combining the results of multiple decision trees, a more accurate and stable overall model is obtained. The random forest algorithm has strong robustness, good resistance to common overfitting problems, and good generalization ability. The specific implementation steps are as follows: self-service sampling method is used to randomly select multiple subsets of the training data set, and each subset is used to train a decision tree; when each node splits, some features are randomly selected to determine the best splitting point, which is called feature random selection. Once the decision trees have been fully trained, each tree is capable of generating predictions independently for incoming samples. Subsequently, the predictions from all the decision trees are aggregated by calculating the average, which yields the ultimate prediction outcome. Figure 4 shows the structure of RF algorithm.

Distributed gradient boosting library (XGBoost)

The XGBoost algorithm was integrated utilizing the Scikit-learn and XGBoost packages within the Python 3.7 environment. XGBoost is an extensible end-to-end tree lifting system, which is enhanced in terms of loss function and loss optimization process, so that the lifting tree can exceed its own computational limit, so as to achieve fast operation and superior model performance, so as to achieve the purpose of engineering application53.XGBoost is an additive model composed of K base models, which has the characteristics of high operating efficiency and high precision. The basic model is classification and regression tree (CART), and Fig. 5 is the structure diagram of XGBoost algorithm.

$${\hat {y}_i}=\sum\limits_{{k=1}}^{K} {{f_k}} \left( {{x_i}} \right),{\text{ }}{f_k} \in F$$
(8)

Where xi is the i-th sample, \({\hat {y}_i}\) is the predicted value of the i-th sample, \({f_k}\left( {{x_i}} \right)\) is the calculated score of the k-th tree for the i-th sample in the dataset, and F is the set of all trees. Define the XGBoost objective function as

$$o=\sum\limits_{{i=1}}^{N} {l\left( {{y_i},{{\hat {y}}_i}} \right)} +\sum\limits_{{k=1}}^{K} {\Omega \left( {{f_k}} \right)}$$
(9)

Note

N is the number of samples, \(l\left( {{y_i},{{\hat {y}}_i}} \right)\) is the loss function and \(\Omega \left( {{f_k}} \right)\) is the regularization term. The loss function quantifies the alignment between the model’s predictions and the actual data, while the regularization term assesses the intricacy of the model’s structure. The objective function’s Taylor expansion is manipulated by integration and rearrangement, converting it into a polynomial expression that is connected to the prediction error. The optimal weight of the leaf node \(w_{j}^{*}\) and the optimal solution of the target value are obtained as follows:

$$w_{j}^{*}= - \frac{{{G_j}}}{{{H_j}+\lambda }}$$
(10)
$${o^*}= - \frac{1}{2}\sum\limits_{{j=1}}^{T} {\frac{{G_{j}^{2}}}{{{H_j}+\lambda }}+\gamma T}$$
(11)

Where \({G_j}=\sum {_{{i \in {I_j}}}} {g_i}\) and \({H_j}=\sum {_{{i \in {I_j}}}} {h_i}\), gi and hi are the first derivative and the second derivative of the loss function, respectively, \({I_j}=\left\{ {i|q\left( {{x_i}} \right)=j} \right\}\) is the sample group of the leaf j, and \(q\left( {{x_i}} \right)\) is the tree structure function; \({o^*}\)is the smaller the structural score of the tree, the smaller the the \({o^*}\), the smaller the total loss is; T is the number of leaf nodes; \(\gamma\) is the penalty coefficient of the model. In order to solve the target value, the greedy algorithm is used to divide the subtree. Greedy algorithm is an algorithm that achieves global optimization by controlling local optimum. Each time, it tries to add a partition point to the existing leaf node, enumerates the feasible partition points, and selects the partition with the largest gain54. The expression of the gain Equation is

$$G=\frac{1}{2}\left[ {\frac{{G_{L}^{2}}}{{{H_L}+\lambda }}+\frac{{G_{R}^{2}}}{{{H_R}+\lambda }} - \frac{{{{\left( {{G_L}+{G_R}} \right)}^2}}}{{{H_L}+{H_R}+\lambda }}} \right] - \gamma$$
(12)
$${G_L}=\sum\limits_{{i \in {I_L}}} {{g_i}} ,{\text{ }}{G_R}=\sum\limits_{{i \in {I_R}}} {{g_i}} ,{\text{ }}{H_L}=\sum\limits_{{i \in {I_L}}} {{h_i}}$$
(13)

Note

IL and IR are the sample sets of the left subtree and the right subtree after the tree classification, which \(G_{L}^{2}/\left( {{H_L}+\lambda} \right)\) is the information scores of the left subtree, \(G_{R}^{2}/\left( {{H_R}+\lambda } \right)\) is the information scores of the right subtree, and \({\left( {{G_L}+{G_R}} \right)^2}/\left( {{H_L}+{H_R}+\lambda } \right)\) is the current undivided information scores.

Fig. 4
Fig. 4The alternative text for this image may have been generated using AI.
Full size image

RF algorithm structure diagram.

Fig. 5
Fig. 5The alternative text for this image may have been generated using AI.
Full size image

XGBoost algorithm structure diagram.

Hyperparameter optimization and result display

Hyperparameters refer to the parameters that control the training and learning process of the model. By optimizing the hyperparameters of the model, the model can obtain the best prediction effect. Hyperparameter optimization aims to identify the best combination of hyperparameters, allowing the model to achieve the lowest possible value of the specified loss function for the provided dataset55. In order to improve the generalization performance of the model, this paper uses ten-fold cross-validation, and Fig. 6 is a cross-validation diagram. 80% of the data (training set) in the database is further divided into ten parts, of which nine parts are used as training sets for model training, and the remaining part is used as verification sets. The procedure is carried out a total of ten times, with a unique fold serving as the validation set in each iteration.

Fig. 6
Fig. 6The alternative text for this image may have been generated using AI.
Full size image

Ten-fold cross validation diagram.

A two-step strategy is used to optimize the hyperparameters. The first step is to select a wide range of values for each hyperparameter, create a larger space for the combination of hyperparameters, and select the hyperparameter combination in the space to perform ten-fold cross-validation on each model. The second step is to select a smaller range of values near the optimization value of the initial hyperparameters obtained in the first step to combine and cross-validate the model again. Finally, the hyperparameters with the strongest generalization ability of the model are determined as the optimal hyperparameters. Table 2 displays the optimal hyperparameters for the three respective models.

Table 2 Optimal hyperparameters adopted for the ANN, RF, XGBoost models.

Performance comparison of models

Evaluating the performance of machine learning models is a crucial step in assessing their accuracy and dependability. The evaluation index system is used to assess the accuracy and reliability of the ML prediction model, which is based on the three aforementioned algorithms. Through comparison and analysis of the evaluation metrics for the three algorithms, the most suitable algorithm for predicting the axial compression bearing capacity of concrete-encased steel composite columns is identified.

Evaluation index system

In the performance evaluation of machine learning models, there are various performance indicators that can be used to evaluate the advantages and disadvantages of the model. The mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE) and determination coefficient (R2) were selected to form the evaluation index system, and then the prediction performance of the model was evaluated. Among them, R2 is a statistical measure that assesses the model’s fit, with values ranging from 0 to 1. The closer the R2 is to 1, the better the fitting degree between the predicted results of the model and the experimental results, that is, the higher the accuracy of the predicted results of the model; MAE is an index to measure the accuracy of the prediction model. It is computed by taking the absolute differences between the model’s predicted values and the corresponding experimental values, and then averaging these differences. MAPE is an index to measure the relative size of the deviation between the predicted value and the actual value of the model. It provides a standardized error measure by calculating the percentage of prediction error; RMSE is a commonly used indicator to measure the accuracy of prediction models, especially in regression analysis. The calculation method involves first determining the sum of the squares of the differences between the model’s predicted values and the experimental values. Subsequently, the square root of the average of this sum of squares is computed to yield the final result. The smaller the MAE, MAPE and RMSE, the smaller the error of the model prediction results. The calculation expressions of the four performance evaluation indexes are summarized in Table 3.

Table 3 Calculating expression of evaluating indicators.

Analysis of evaluation results

The performance of the ML model is assessed using the established evaluation index system. Table 4 summarizes the performance evaluation metrics of the models under the three algorithms in the training and test sets56. Figure 7 illustrates the prediction performance of the ML model across the three algorithms, distinctly showcasing the model’s predictive capabilities within both the training and test datasets.

Table 4 Performance metrics for the training and testing sets of ML model.
Fig. 7
Fig. 7The alternative text for this image may have been generated using AI.
Full size image

ML model prediction performance. (a) ANN, (b) RF, (c) XGBoost.

As shown in Fig. 7, for the training set, the R2 of the RF-based prediction model is the highest, which can reach 0.987, and the three errors of RMSE, MAPE and MAE are the smallest, indicating that the RF-based model has the best fitting effect on the training data. The second is the model based on XGBoost, and the worst fitting effect is the model based on ANN, indicating that RF shows the strongest learning ability. For the test set, the model based on XGBoost achieves the highest R2 value, which is 0.98, and it also exhibits the smallest values for the three errors metrics. In addition, the R2 of the XGBoost-based prediction model on the training set and the test set is very close, indicating that the XGBoost model exhibits the best generalization ability and can accurately predict unknown data. However, the R2 of the RF model on the test set is only 0.922, and the prediction performance is not ideal, which is quite different from the good fitting effect shown in the training set. It shows that the generalization ability of the RF model is weak, and there is an over-fitting phenomenon, which can not accurately predict the unknown data. In conclusion, the XGBoost model exhibits the highest prediction accuracy for the axial compression bearing capacity of concrete-encased steel composite columns, with the ANN model ranking second in terms of accuracy. The RF model, on the other hand, demonstrates the least effective prediction, indicating that the XGBoost model is the optimal choice for this prediction task.

To bolster the reliability of the prediction model and to further validate its accuracy, three different specifications are employed to compute and compare the bearing capacities of the specimens within the database. Table 5 collates the predicted values by the model alongside the calculated values.

Table 5 Comparison of test values with ML model predictions and codes calculations.

The data is visually expressed to visually display the comparison between the specification and the prediction model. Figure 8 depicts the comparison between the actual values and the predicted values of the ML model.

Fig. 8
Fig. 8The alternative text for this image may have been generated using AI.
Full size image

Comparison between the true value and the predicted value of ML model. (a) ANN, (b) RF, (c) XGBoost.

Figure 8 (a), (b) and (c) show the comparison between the predicted value and the true value of the model based on the three algorithms. It can be clearly seen from the figure that the overall determination coefficient of the three models is 0.972, indicating that the three models can accurately predict the bearing capacity of concrete-encased steel composite columns. The reason is that the calculation Equation does not consider the specific parameters to express the interaction between the influencing factors, but only by considering the influence of less specific parameters to get the final calculation results. In fact, not only there are complex nonlinearities between materials, but also there are interactions between various materials that cannot be ignored. Compared with the standard calculation Equation, the ML model learns from a large amount of data, which can capture the complex interaction between parameters and predict the bearing capacity of components with better accuracy to a certain extent.

Analysis of interpretability based on SHAP

Poor interpretability is an important factor hindering the application of machine learning algorithms in practical engineering57. Understand why the model predicts the axial bearing capacity of concrete-encased steel composite columns according to the eigenvalue, and the credibility of the model prediction results. Compared with the simple single decision tree, the prediction process of the complex integrated decision tree model (represented by XGBoost) is more difficult to be understood intuitively, so it is necessary to analyze the interpretability of the prediction results. Lundberg et al.58 proposed a unified interpretable machine learning method, Shapley additive explanations (SHAP). SHAP constructs an additive interpretation model, regards all features as contributors, and calculates their contribution values. The sum of the contribution values of all features is the final prediction of the model:

$$f\left( x \right)=g\left( {{z^*}} \right)={\phi _0}+\sum\limits_{{i=1}}^{M} {{\phi _i}} z_{i}^{*}$$
(14)

Note

f(x) is a machine learning model, and this study is an XGBoost model; \({z^*}\)= {0, 1}, when the feature i is observed, \({z^*}\)= 1, otherwise 0; if i is involved in the prediction process, M is the number of features; \({\phi _i}\)is the contribution of feature i, the expression is

$${\phi _i}=\sum\limits_{{s \subseteq N\backslash i}} {\frac{{\left| {S\left| {!(M - } \right|\left. S \right| - 1)!} \right.}}{{M!}}} \left[ {f(S \cup \left\{ {\left. i \right\}) - f\left( S \right)} \right.} \right]$$
(15)

Note

N is the set of all input features, and S is the set of non-zero indexes59. SHAP has different kernels, such as Kernel SHAP, Deep SHAP, Tree SHAP53, where Tree SHAP matches the XGBoost model in this study.

Global explanatory analysis

The same machine learning model can have different feature importance evaluation methods. For most tree-based models, different techniques can be used to output feature importance. One of the commonly used techniques is based on the statistical information of the number of node splittings in the tree, that is, the frequency of feature usage in the tree; another common technique is based on the average reduction of node impureness by features, that is, the contribution of features to model prediction. Due to the use of different model structures and different feature importance assessment methods, the feature importance ranking may change significantly according to these factors. In contrast, the SHAP value can more accurately evaluate causal inference, and can better identify important features and identify influencing factors. Therefore, the SHAP value can reflect the influence of features on the model output, and the average absolute SHAP value can be used as the basis for analyzing the importance ranking of features23.

Figure 9 is the SHAP summary diagram, where the abscissa represents the SHAP value of each feature, the right ordinate is the eigenvalue size of each feature, the red represents the value of the feature is relatively large, and the blue represents the value of the feature is relatively small. From the diagram, it can be seen that Aa has a positive effect on the axial bearing capacity. With the increase of Aa, the SHAP value increases, and the corresponding axial bearing capacity also increases accordingly. Similar fa and fc have a positive effect on the axial bearing capacity; h has a negative effect on the bearing capacity of axial compression. With the increase of H, the SHAP value decreases gradually, which has a negative effect on the bearing capacity of axial compression. The other input parameters have little effect on the SHAP value.This finding can provide data to support future revisions to the code, adapting it to the actual force mechanism of concrete-encased steel composite columns.

The importance ranking of axial compression bearing capacity characteristics based on SHAP values is shown in Fig. 10. SHAP not only explains the influence of input characteristics on the output results, but also calculates its contribution value. Therefore, SHAP can be used to rank the important characteristics of the black box model with high reliability. Figure 10 shows the important characteristics of the input parameters calculated by the XGBoost model on the axial compression bearing capacity. The results show that Aa, As, fa, B and fc are important characteristic parameters affecting the CES axial compression column. In engineering practice, the axial compressive load capacity of concrete-encased steel composite columns can be prioritized to increase Aa, As, fa, B and fc.

Fig. 9
Fig. 9The alternative text for this image may have been generated using AI.
Full size image

SHAP summary of input parameters.

Fig. 10
Fig. 10The alternative text for this image may have been generated using AI.
Full size image

Importance ranking of input parameters.

Explanatory analysis of single sample

In order to understand the influence of the specific characteristics of a single sample on the prediction, Fig. 11 shows the interpretation analysis of the compressive strength prediction of the sample instance. The E[f(X)] in the diagram is the reference value of the model, that is, the average value of all the predicted values of axial bearing capacity, and f(X) is the actual predicted value for a specific sample. Both of them are converted from the actual output of the model. The influencing factors of axial bearing capacity are sorted from large to small according to the absolute value of SHAP value. The predicted value is based on the benchmark value and is obtained under the influence of the SHAP value of each influencing factor. The SHAP value reflects the contribution of each feature to the prediction of the axial compression bearing capacity of the model, and its absolute value is expressed by the length of the arrow. Red and blue represent the positive and negative effects of the characteristic value on the axial bearing capacity predicted by the model, respectively. If the Aa of the sample is smaller than the average Aa of the database sample as a whole, and Aa is positively correlated with the prediction of axial bearing capacity, the smaller Aa has a negative effect on the prediction of axial bearing capacity, and the analysis of other variables is the same. It should be noted that since the graph shows the prediction results of a single sample, the feature importance ranking of each sample is not necessarily the same, and may be different from the feature ranking of the overall sample.

Fig. 11
Fig. 11The alternative text for this image may have been generated using AI.
Full size image

SHAP’s predictive interpretation of a single sample.

Characteristic dependence analysis

The SHAP method can quantify the dependence between different characteristic factors, and further explore the influence mechanism of each characteristic factor on the axial compression bearing capacity60. Figure 12 is the characteristic dependence diagram between the steel area and the yield strength of the steel and the width of the specimen section. The feature dependence graph shows the relationship between the predicted output and each feature, while taking into account the influence of other features. As shown in Fig. 12 (a), when Aa≤0, fs has no significant effect on Aa; when 0 ≤ Aa≤0.3, fs has a positive effect on Aa; when 0.3 ≤ Aa≤1, fs has a negative effect on Aa. As shown in Fig. 12 (b), when Aa≤0, B has no significant effect on Aa; when Aa≥0, B has a positive effect on Aa. The design of the concrete-encased steel composite columns should optimize the relationship between the steel section and cross-section size to avoid unnecessarily increasing the cross-section and wasting materials.

Fig. 12
Fig. 12The alternative text for this image may have been generated using AI.
Full size image

SHAP feature dependence graph of input parameters. (a) The interaction between Aa and fs characteristics, (b) The interaction between Aa and B characteristics.

Conclusion

In this paper, 116 sets of test data were collected through the established database standard, of which 80% data was used for model training and 20% data were used for model verification. ANN, RF and XGBoost were used to establish prediction models to predict the axial compression bearing capacity of concrete-encased steel composite columns. The SHAP method is used to effectively reveal the coupling relationship between the relevant variables, which can be explained by a single sample, and provides a good basis for the rapid prediction of the axial bearing capacity of concrete-encased steel composite columns.

(1) In the established black box model of CES composite columns, the comprehensive performance of XGBoost model is the best, followed by ANN model, and RF is the worst.

(2) Based on the global interpretation analysis of SHAP, it can be found that Aa, fs, B, fa and fc are the most important factors affecting the axial compression bearing capacity of the CES column. In addition, through the global interpretation analysis, it can be seen that the characteristic variables fa has a positive synergistic effect on the axial compression bearing capacity, while H has a negative effect on the axial compression bearing capacity.

(3) The feature-dependent interpretation of SHAP explores the tendency of the cross-sectional area of the built-in steel, the cross-sectional width and the yield strength of the steel bar to the axial compression bearing capacity of the CES column. Therefore, the interpretability mechanism of SHAP can provide a reference for the design and application of CES columns.