Introduction

Slope stability has recently been a focal point in geotechnical and geological engineering. Slopes constitute a fundamental geological environment for various human engineering activities, and numerous slopes are formed through railway, mining, road, and construction projects. Once these slopes become unstable, they can trigger significant geological disasters such as landslides, collapses, and mudflows, leading to substantial loss of life, property damage, and adverse impacts on socio-economic development1,2. By the end of 2020, there were 328,654 geological hazard points in China, with landslides, collapses, mudflows, and unstable slopes accounting for 96.16% of these points3. Consequently, developing a reliable and effective predictive warning model for slope instability holds significant theoretical and practical importance.

Scholars both domestically and internationally have conducted research of varying depths on the analysis and evaluation of slope stability. Traditional methods for analyzing slope stability include model testing, theoretical analysis, numerical calculation, and statistical analysis4. However, as dynamic open systems, slopes are influenced by numerous factors, many of which are random and variable. This leads to complex nonlinear relationships between these factors and slope stability, thereby imposing certain limitations on traditional analysis methods5. In recent years, with the development of emerging technologies such as artificial intelligence, machine learning methods have been widely applied to slope stability prediction6. These predictions primarily focus on classification and regression, making supervised learning models in machine learning the commonly adopted approach. Supervised learning models can be categorized into single models and ensemble models7.

Common single models include Support Vector Machines (SVM), Neural Networks (NN), Logistic Regression (LR), and Decision Trees. Single models are straightforward to implement and do not demand excessive computational resources or time; however, they tend to overlook the inherent characteristics of the data during training, which may result in inaccurate predictions. Although SVM generalizes well on high-dimensional data, it faces high computational complexity on large datasets and can be insensitive to the characteristics of the data8,9. NN can capture complex nonlinear relationships but require a substantial volume of training samples and are difficult to interpret10,11. LR trains quickly and is well suited to binary classification, but it struggles with imbalanced data12,13. Decision Trees require minimal data cleaning and run quickly, but they are often unstable and tend to converge to locally optimal solutions14,15.

Ensemble models combine multiple single models into a stronger predictor, effectively reducing the bias and variance of individual models and thereby achieving relatively optimal performance. Common ensemble models include eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and Categorical Boosting (CatBoost). Wu et al. compared the XGBoost and RF algorithms for evaluating slope safety and stability, concluding that XGBoost was the better evaluation model16. Zhang and Zhang employed the LGBM algorithm for slope stability prediction, proposing a model that adeptly captures the complex nonlinear relationships between influencing factors and slope stability17. XGBoost provides feature importance evaluation and thus higher interpretability, but it involves complex parameter tuning and high memory consumption. LightGBM uses a histogram-based algorithm, which reduces time complexity and computational cost, but it is prone to overfitting and sensitive to noise and outliers.

In contemporary slope stability prediction research, traditional predictive methods generally exhibit limitations, including poor universality, long run times, and suboptimal model performance. CatBoost offers higher accuracy and shorter training times than XGBoost, and its enhanced operations better address the overfitting issues present in LightGBM. CatBoost demonstrates exceptional performance, robustness, and generalizability, and is user-friendly and practical. Its predictive capabilities have been applied in fields such as automation technology18, medical health19, energy and environment20, and transportation21, but it is seldom employed in geotechnical engineering and slope stability analysis22,23. CatBoost enriches feature dimensions and combats noise points in the training set. Compared to other models, it reduces the need for hyperparameter tuning and lowers the probability of overfitting, making it particularly suitable for geological disaster prediction.

Based on the aforementioned studies, this paper uses the slope instance data from24 to establish a slope prediction database. A slope stability prediction model is built using CatBoost, and its performance is compared with common machine learning models: SVM, LGBM, RF, and LR. The effectiveness of the CatBoost model's predictions is validated through these comparisons. Subsequently, a slope instability early warning model is developed on this foundation, providing corresponding warning levels. This approach aims to offer new solutions for related research and practical applications.

CatBoost

CatBoost is an ensemble learning algorithm that enhances the Gradient Boosting Decision Tree (GBDT) method and was developed by Yandex in 201725. The algorithm uses symmetric trees as its base structure, employs a feature-combination preprocessing method for feature splitting, and incorporates an ordered boosting strategy to mitigate the gradient bias and prediction shift issues frequently encountered in gradient boosting algorithms. This yields efficient classification with a high level of robustness.

GBDT

The fundamental procedure of CatBoost entails utilizing the gradient boosting algorithm to compute the loss function in the function space. The process involves iteratively combining weighted weak learners to minimize the loss function and achieve an optimal decision tree structure as shown in Fig. 126. The basic process is as follows27:

Fig. 1

Principle of the gradient boosting algorithm.

When provided with a training dataset \(T = \left\{ {(x_{1},y_{1}),(x_{2},y_{2}), \ldots ,(x_{N},y_{N})} \right\}\) and a loss function \(L(y_{i},f(x))\), the weak learner is initialized with the constant value that minimizes the loss function, as given in Eq. (1).

$$f_{0}(x) = \arg \mathop {\min }\limits_{\gamma } \sum\limits_{i = 1}^{N} {L(y_{i},\gamma)}$$
(1)

The training process is iterated for \(m = 1,2, \cdot \cdot \cdot ,M\), calculating residuals for each sample \(i = 1,2, \ldots,N\). The residual \(r_{im}\), defined as the negative gradient of the loss function evaluated at the current model, is determined by Eq. (2).

$$r_{im} = - \left[ {\frac{\partial L(y_{i},f(x_{i}))}{{\partial f(x_{i})}}} \right]_{{f = f_{m - 1} }}$$
(2)

The residual \(r_{im}\) is then used as the new ground truth for sample \(i\), making \((x_{i},r_{im})\) the input for fitting the next tree. \(f_{m}(x)\) denotes the new tree, and \(R_{jm}(j = 1,2, \ldots ,J_{m})\) denotes its leaf node regions. The optimal fitted value that minimizes the loss function over each leaf region is determined by Eq. (3).

$$ \gamma_{jm} = \arg \mathop {\min }\limits_{\gamma } \sum\limits_{{x_{i} \in R_{jm}}} {L(y_{i}} ,f_{m - 1} (x_{i}) + \gamma )$$
(3)

The final strong learner is represented as Eq. (4).

$$f_{m} (x) = f_{m - 1} (x) + \sum\limits_{j = 1}^{{J_{m} }} {\gamma_{jm} } I(x \in R_{jm} )$$
(4)

where \(I\) is the indicator function. If \(x \in R_{jm}\), \(I = 1\). Otherwise, \(I = 0\).
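The loop defined by Eqs. (1)-(4) can be sketched in a few lines. This is a minimal illustration using squared-error loss, for which the negative gradient of Eq. (2) is simply the residual; the data, depth, and learning rate are hypothetical and this is not CatBoost's actual implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

# Eq. (1): initialize with the constant minimizing squared error (the mean)
f = np.full_like(y, y.mean())
learning_rate = 0.1

for m in range(50):
    r = y - f                                            # Eq. (2): negative gradient
    tree = DecisionTreeRegressor(max_depth=2).fit(X, r)  # fit a tree to the residuals
    f += learning_rate * tree.predict(X)                 # Eqs. (3)-(4): update ensemble

print(np.mean((y - f) ** 2))  # training MSE shrinks as trees are added
```

Each iteration adds one weak learner fitted to the current residuals, so the ensemble's training loss decreases monotonically for a small enough learning rate.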

Symmetric tree

CatBoost employs a symmetric tree as the foundational model, wherein it encodes the index of each leaf node in a binary vector of the same length as the tree depth. This process guarantees uniformity in the division of features at every iteration, thereby preserving full symmetry and equilibrium between the left and right subtrees. This design reduces the complexity of the model, thereby improving the accuracy of slope stability prediction, training speed, and memory utilization. At each iteration, the leaves of the preceding tree are divided using identical conditions. Selecting the feature split that results in the lowest loss is of paramount importance. The balanced tree structure facilitates efficient CPU implementation, diminishes prediction time, and functions as a form of regularization to mitigate overfitting28.

Feature combination partition

CatBoost groups and combines the 6 slope features for storage. However, an excessive number of feature combinations can substantially increase the final model size, and the required storage depends on the number of values each feature takes. By partitioning the decision tree model to reduce the final model size and taking the potential weights of features into account, the algorithm chooses the optimal partition29. The scoring formula for the various partition scenarios30 is provided in Eq. (5).

$$s_{new} = s_{old}\left( {1 + \frac{u}{U}} \right)^{Z}$$
(5)

where \(s_{new}\) is the new score for the slope feature partition; \(s_{old}\) is the old score for the feature partition; u is the number of features; U is the maximum value of u; and Z is the model size coefficient. Candidate partition scenarios are evaluated and compared based on their respective scores, and the optimal scenario is selected.
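As a worked instance of Eq. (5), the snippet below evaluates the score update for one candidate partition; all numeric values are illustrative placeholders, not taken from the paper.

```python
def partition_score(s_old: float, u: int, u_max: int, z: float) -> float:
    """Score update of Eq. (5): s_new = s_old * (1 + u/U)^Z."""
    return s_old * (1 + u / u_max) ** z

# Example: old score 0.80, 3 of at most 6 features used, size coefficient 2
print(partition_score(s_old=0.80, u=3, u_max=6, z=2))  # 0.8 * 1.5**2 = 1.8
```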

Ordered boosting

While conventional gradient boosting fits each new tree to gradients computed from the current model, these pointwise gradient estimates are biased, which makes many models susceptible to overfitting, particularly on small and noisy datasets. To obtain unbiased gradient estimates, CatBoost trains a separate model \(M_i\) for each sample \(x_i\) using all samples except \(x_i\), and then applies the traditional GBDT framework31,32. The model \(M_i\) is used to estimate the gradient for sample \(x_i\), and the base learners trained on these gradients are combined into the final model. This ordered boosting strategy replaces the gradient estimation of traditional gradient-boosting algorithms: by incorporating prior terms and weight coefficients, relatively independent models are established for different slope samples, base learners are continuously trained on gradient values, the bias of the gradient estimate is reduced, and the model's generalization capability is enhanced33.
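The ordered-boosting idea can be illustrated with a toy loop: the gradient for each sample is estimated by a model trained only on samples preceding it in a random permutation, so no sample "sees" its own label. This is a hypothetical sketch of the principle with synthetic data, not CatBoost's internals.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 100)

perm = rng.permutation(len(y))   # ordering of the samples
grads = np.zeros(len(y))
for pos, i in enumerate(perm):
    if pos < 5:                  # too few preceding samples: leave gradient at 0
        continue
    prev = perm[:pos]            # only samples that come before i in the permutation
    model = DecisionTreeRegressor(max_depth=3).fit(X[prev], y[prev])
    # squared-error loss: residual equals the negative gradient, as in Eq. (2)
    grads[i] = y[i] - model.predict(X[i:i + 1])[0]

print(grads[:5])
```

Because the model used for sample i never includes i in its training set, the resulting gradient estimate avoids the target leakage that biases the conventional scheme.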

Establishment of the model

Slope stability database

A database was created by compiling 221 sets of slope instance data gathered from more than 60 locations, as depicted in Table 1. The dataset includes 115 stable slopes and 106 failure slopes. Slope stability is affected by various factors, including the slope's geometry, its geological characteristics, and the impact of surface water and groundwater; these can be broadly grouped into slope geometry, geotechnical physical parameters, and the presence of water within the geotechnical body. The majority of slope failures documented in the database display near-circular sliding, a prevalent failure type in engineering projects34. Key factors influencing the stability of such slopes, as demonstrated by various studies35,36,37, are slope height (H), slope angle (β), unit weight (γ), cohesion (C), internal friction angle (Φ), and pore pressure ratio (ru). Slope height and slope angle characterize the fundamental geometry of a slope, while unit weight, cohesion, and internal friction angle represent the mechanical properties of the rock and soil bodies. Cohesion and internal friction angle are critical parameters for slope stability, and the strength reduction method is applied through their iterative adjustment. The pore pressure ratio describes the water content of the rock and soil, defined as the ratio of pore water pressure to overburden pressure; rainfall increases the pore pressure ratio. The database was constructed from these 6 variables and the slope state.

Table 1 Slope Stability Prediction Model Database.

Distribution characteristics

The feature variables vary across slopes in different states, and understanding their characteristics is essential for qualitative analysis. Figure 2 displays box plots and normal distribution curves for the 221 slope instances, with specific numerical values indicated. Skewness and kurtosis were calculated in Table 2, and each feature variable was analyzed using the combination of "box plots + normal distribution curves." Skewness and kurtosis are frequently employed statistical measures of the asymmetry and peakedness of a distribution, serving as tools for evaluating its properties. In general, if the absolute values of the skewness and kurtosis coefficients are less than 1.96 times their standard errors, the data can be regarded as conforming to a normal distribution38.

Fig. 2

Box plot of characteristic parameters of slope stability + normal distribution curve: (a) Slope Height (H), (b) Slope Angle (β), (c) Unit Weight (γ), (d) Cohesion (C), (e) Internal Friction Angle (Φ), (f) Pore Pressure Ratio (ru).

Table 2 Skewness and kurtosis statistics of each characteristic variable of slope stability.

The variable H exhibits a skewness of 1.942 and a kurtosis of 3.267, indicating a departure from normality. Its values are predominantly clustered between 40 m and 50 m, with a maximum of 565 m and a minimum of 3.66 m, so low, medium-high, high, and ultra-high slopes are all represented. In Fig. 2, the normal distribution curves for β and γ exhibit relatively low kurtosis, with medians close to the means, suggesting uniform, symmetric distributions with minimal deviation. The 25th and 75th percentiles of slope angle are 25° and 45°, indicating the presence of steep, moderate, and gentle slopes. C exhibits a skewness of 4.156 and a kurtosis of 22.902, indicating a strong departure from normality. The skewness of Φ is negative, indicating a left-skewed distribution in which most data points have relatively large internal friction angles. The skewness of ru is greater than 0 while its kurtosis is less than 0, suggesting a right-skewed distribution dominated by relatively small pore pressure ratios. A comprehensive evaluation of γ, C, and Φ indicates that only 15 slopes are rocky, constituting less than 10% of the dataset; the instance data consist predominantly of soil slopes, with cohesive soil slopes accounting for over 95% of the total39.

In summary, the "box plots + normal distribution curves" illustrate the data distribution for each slope stability feature variable. Out of the 6 feature variables, only the variable γ conforms to a normal distribution. To conduct a more in-depth analysis of the relationship between the feature variables, it is necessary to normalize the data.
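The skewness and kurtosis statistics of the kind reported in Table 2 can be computed with scipy. The array below is synthetic stand-in data for a right-skewed feature such as slope height, not an actual column from the database.

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Synthetic right-skewed sample (one large value pulls the tail to the right)
h = np.array([8.0, 12.0, 21.0, 30.5, 45.0, 46.0, 50.0, 61.0, 88.0, 214.0])

print(skew(h))      # > 0: right-skewed, as reported for H in Table 2
print(kurtosis(h))  # excess kurtosis (a normal distribution gives 0)
```

Note that `scipy.stats.kurtosis` returns excess kurtosis by default (Fisher's definition), so values should be compared against 0 rather than 3.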

Correlation analysis

Due to the presence of outliers in the slope feature variables and their non-normal distribution, which could potentially affect predictive accuracy in subsequent stages, all variables were subjected to transformation using the Box-Cox method40. Following the transformation, all the data closely approximated a normal distribution, which allowed for further correlation analysis.
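A Box-Cox transformation of the kind applied here is available in scipy, which fits the transformation parameter λ by maximum likelihood; the data must be strictly positive. The lognormal sample below is a synthetic stand-in for a skewed feature such as C.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
c = rng.lognormal(mean=3.0, sigma=0.8, size=221)  # right-skewed, strictly positive

c_bc, lam = stats.boxcox(c)        # transformed data and fitted lambda
print(lam)
print(stats.skew(c), stats.skew(c_bc))  # skewness drops toward 0 after transform
```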

The Pearson correlation coefficient, a statistical measure of the strength and direction of the linear relationship between two variables, was used for the correlation analysis. The absolute value of the coefficient indicates the strength of correlation: 0 to 0.2 suggests an extremely weak or no correlation, 0.2 to 0.4 a weak correlation, 0.4 to 0.6 a moderate correlation, 0.6 to 0.8 a strong correlation, and 0.8 to 1 an extremely strong correlation41. Figure 3 depicts the distribution and interrelationships of the slope feature variables: the diagonal shows the histogram of each feature, the upper triangle gives the correlation coefficient between each pair of features, and the lower triangle shows their distribution density and linear fitting curves. The figure indicates that the absolute correlation coefficients between ru and the other features, and between H and C, are all below 0.2; the loosely scattered data points around the fitting curves in the lower triangle confirm these very weak correlations. The highest correlation coefficient, 0.46, occurs between γ and Φ, whose data points concentrate more tightly around the fitting curve, indicating a moderate correlation.

Fig. 3

Correlation matrix of slope characteristic parameters.

In brief, the correlation coefficients between the features are all below 0.6, suggesting a weak correlation. The features exhibit a complex non-linear relationship, and all 6 slope features can be regarded as parameters for forecasting slope stability, demonstrating the model’s capacity to accommodate such intricate relationship data.
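For reference, the pairwise coefficients of the kind shown in Fig. 3 reduce to a single numpy call; the two series below are synthetic stand-ins constructed to be moderately correlated, roughly mimicking the γ-Φ pair, not the actual database columns.

```python
import numpy as np

rng = np.random.default_rng(3)
gamma = rng.normal(20, 3, 221)             # synthetic unit-weight-like series
phi = 1.2 * gamma + rng.normal(0, 6, 221)  # friction-angle-like series with noise

r = np.corrcoef(gamma, phi)[0, 1]          # Pearson correlation coefficient
print(r)  # falls in the moderate band (0.4-0.6) described in the text
```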

CatBoost slope stability prediction model

A set of 221 data points, comprising 6 variables (H, β, γ, C, Φ, and ru), served as the input to the CatBoost slope stability prediction model. The input data were normalized using min-max standardization. Owing to the limited sample size, and after comparing alternatives to the traditional 6:2:2 split of training, validation, and test sets, the data were partitioned randomly into training and testing sets at an 8:2 ratio, with fivefold cross-validation applied to the training set. The CatBoost algorithm iterated through the 6 slope features, splitting them to construct symmetric decision trees; potential feature weights were taken into account, and an ordered boosting method was used to select the optimal partition. Different models were developed for the various slope samples, with base learners trained continuously on gradient values. The optimal parameters were identified through grid search combined with cross-validation, yielding a relatively optimal model that was then validated on the testing set. To illustrate the predictive abilities of CatBoost, its performance was compared with SVM, LGBM, RF, and LR models; grid search combined with cross-validation (GSCV) was employed to determine the optimal parameters for each of the five models. To guarantee the reliability and applicability of the findings, five performance evaluation metrics were used to evaluate the predictive capability of the models. The detailed process is depicted in Fig. 4.
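The workflow above (min-max scaling, an 8:2 random split, and grid search with fivefold cross-validation) can be sketched as follows. sklearn's GradientBoostingClassifier stands in for CatBoostClassifier so the sketch runs without the catboost package, the dataset is synthetic, and the parameter grid is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 221-sample, 6-feature slope database
X, y = make_classification(n_samples=221, n_features=6, random_state=0)
X = MinMaxScaler().fit_transform(X)                     # min-max normalization

# 8:2 random split into training and testing sets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Grid search with fivefold cross-validation on the training set (GSCV)
grid = {"n_estimators": [100, 200], "max_depth": [2, 3]}
gscv = GridSearchCV(GradientBoostingClassifier(random_state=0), grid, cv=5)
gscv.fit(X_tr, y_tr)

print(gscv.best_params_)
print(gscv.score(X_te, y_te))  # held-out test accuracy
```

Swapping in `catboost.CatBoostClassifier` with a CatBoost-specific grid (e.g. over `depth` and `iterations`) would reproduce the paper's setup on the real database.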

Fig. 4

CatBoost-based slope stability prediction model.

Slope instability warning model

Previous studies have shown that the application of machine learning in geological disaster early warning research is limited, primarily focusing on earthquakes and ground surface deformation42,43. In contrast, slope instability issues face challenges due to difficulties in data measurement and constraints related to time and space, resulting in a predominance of qualitative analyses44. The CatBoost-based early warning system for slope instability can effectively predict such instabilities and accurately classify warning levels when provided with known data. This offers valuable support for early warnings and preventive measures in geotechnical engineering applications, including foundation pit construction and mining operations.

Once the CatBoost prediction model was established, probability values for correct and incorrect predictions were obtained, and the frequencies of correctly predicted stable and failure slopes were tallied. For the correctly predicted stable slopes, the instability probability was computed and its distribution intervals and frequencies determined. This information was used to develop the slope instability warning model and to define its warning levels, which support assessment through manual examination and other techniques for identifying hazardous zones and implementing appropriate preventive measures, as shown in Fig. 5.

Fig. 5

Slope instability early warning model.

Results and discussion

Comparative analysis

Model evaluation metrics

Machine learning algorithms require distinct evaluation criteria for different problems. In this study, slope stability classification was evaluated using 5 metrics: accuracy, precision, recall, F1-Score, and Area Under Curve (AUC). The confusion matrix, a standard format for presenting classification results, is a matrix with n rows and n columns, as depicted in Table 3.

Table 3 Confusion matrix.

Where TP represents the number of stable samples predicted as stable; FP, failure samples predicted as stable; FN, stable samples predicted as failure; and TN, failure samples predicted as failure. Accuracy is the most direct evaluation metric for classification problems. Precision is the ratio of correctly predicted stable samples to all samples predicted as stable; in general, greater precision signifies a stronger ability to distinguish failure samples and therefore better outcomes. Both accuracy and precision range from 0 to 1, with larger values indicating stronger classification and prediction ability. F1-Score is the harmonic mean of precision and recall, also ranging from 0 to 1; performance is better as it approaches 1 and worse as it approaches 0. Recall is the proportion of correctly identified stable samples among all stable samples, and its interpretation aligns with that of F1-Score. AUC denotes the area under the receiver operating characteristic (ROC) curve and typically falls between 0.5 and 1; as a probability value, a higher AUC signifies a model closer to perfection with superior classification performance. The formulas for each metric and the coordinates of the ROC curve are given as Eqs. (6)-(10)16:

$$Accuracy = \frac{TP + TN}{{TP + TN + FP + FN}}$$
(6)
$$Precision = \frac{TP}{{TP + FP}}$$
(7)
$$Recall = \frac{TP}{{TP + FN}}$$
(8)
$$F_{1} - Score = \frac{2 \times Recall \times Precision}{{Recall + Precision}}$$
(9)
$$\begin{gathered} ROC: \, X = \frac{FP}{{FP + TN}} \hfill \\ \, Y = \frac{TP}{{TP + FN}} \hfill \\ \end{gathered}$$
(10)
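Equations (6)-(9) can be computed directly from the confusion-matrix counts. The counts below are illustrative placeholders, not the paper's actual test-set tallies.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1-Score from Eqs. (6)-(9)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)     # Eq. (6)
    precision = tp / (tp + fp)                     # Eq. (7)
    recall = tp / (tp + fn)                        # Eq. (8)
    f1 = 2 * recall * precision / (recall + precision)  # Eq. (9)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a 50-sample test set
print(classification_metrics(tp=20, fp=2, fn=3, tn=25))
```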

Comparative validation

To assess the effectiveness of the CatBoost prediction model, four other models (SVM, LGBM, RF, and LR) were compared with the CatBoost model in relation to accuracy, precision, recall, F1-Score, and AUC on both the training and testing sets.

Figure 6 depicts the accuracy of each model on the training and testing sets. The CatBoost and RF models both attained a training set accuracy of 99.43%, with LGBM close behind at 98.86%; the SVM and LR models achieved lower training accuracies of 81.82% and 73.86%, respectively. Although CatBoost and RF exhibited similar classification abilities on the training set, their performance on the testing set diverged. CatBoost achieved a testing accuracy of 93.18%, a deviation of 6.25% from its training accuracy, whereas RF achieved 86.36%, a deviation of 13.07%. High training accuracy coupled with markedly poorer testing performance was also observed for the SVM, LGBM, and LR models; the training-testing accuracy gaps for these 4 models all exceeded 10%, indicating inadequate classification on the testing set and overfitting. Consequently, in terms of accuracy on both the training and testing sets, the CatBoost model generalized better than the other models.

Fig. 6
figure 6

Accuracy of training set and test set under different models.

Figures 7 and 8 depict three-dimensional geometric shapes illustrating precision, recall, and F1-Score, as well as ROC curves for each model. The results suggest that the CatBoost model demonstrates superior performance across all performance metrics compared to other models, as indicated by specific values highlighted in the figures.

Fig. 7

Three-dimensional schematic diagrams of precision, recall and F1-score for each model.

Fig. 8

ROC curve of each model.

In Fig. 7, the sphere corresponding to CatBoost lies farther from the origin than those of the other 4 models. The precision projections of each model on the test set are shown as squares, the recall projections as circles, and the F1-Score projections as triangles. The CatBoost model exhibits a precision of 100%, surpassing the other 4 algorithms and effectively distinguishing failure samples: every sample it predicts as stable is in fact stable. Precision complements accuracy in illustrating predictive performance: accuracy denotes the likelihood of hitting the target at all, whereas precision refers to hitting a particular point within the target. The CatBoost model exhibits a recall of 85.71%, the percentage of stable slope samples correctly identified, showcasing its strong classification capability within the dataset categories. Among the 5 models, CatBoost attains the highest F1-Score, 61.02%. F1-Score is affected by both precision and recall, being their harmonic mean: when the two values are close and high, F1-Score is high, whereas a large gap between them pulls F1-Score toward the lower value.

ROC curves for each model are depicted in Fig. 8. The AUC values for the CatBoost, SVM, LGBM, RF, and LR models are 0.95, 0.69, 0.89, 0.93, and 0.62, respectively. AUC is a widely used evaluation metric for binary classification models, with values approaching 1 signifying superior classification performance. The CatBoost model's higher AUC indicates a strong ability to rank slope samples. In conclusion, the CatBoost model exhibits significantly superior slope stability prediction capabilities compared to the other models and possesses effective sample ranking capabilities.

Warning level division

Among the 221 slope instance data, 44 sets were used as the test set to validate the CatBoost model. The test set achieved an accuracy of 93.18%, with 41 sets predicted correctly and 3 incorrectly. The CatBoost prediction is determined by comparing the predicted probability of stability with a threshold of 0.5: if the predicted probability of stability exceeds 0.5, the slope is predicted as stable; otherwise, it is predicted as unstable. Of the 41 correct predictions, 23 were stable slopes and 18 were failure slopes.

Figure 9 illustrates the line graph of the probability distribution for all accurately predicted slopes and for accurately predicted stable slopes. The two lines differ little in distribution frequency and follow similar trends in each interval. The purple line denotes all accurately predicted slopes; its distribution frequency for predicted probabilities below 0.9 does not exceed 0.25. The green line denotes accurately predicted stable slopes, with a distribution frequency not exceeding 0.1 for predicted probabilities below 0.9. The CatBoost model thus demonstrates a high level of precision and stability in predicting slope instability.

Fig. 9

Prediction probability distribution diagram.

The 23 accurately predicted stable slopes are categorized into 5 intervals based on their predicted probabilities: 0.5-0.6, 0.6-0.7, 0.7-0.8, 0.8-0.9, and 0.9-1.0, with corresponding distribution frequencies of 0, 0.043, 0, 0.043, and 0.91. In the study by He et al.45 on a remote monitoring and early warning system for landslide geological disasters, a 4-level warning system was established, comprising stable, sub-stable, near-sliding, and imminent-sliding warnings. Based on the frequency distribution of the 23 accurately predicted stable slopes and the relevant warning regulations46, the slope instability early warning system here is categorized into 3 levels. When the predicted probability of slope instability falls below 0.1, the slope is in a stable warning state, indicated by blue. When it falls between 0.1 and 0.3, the slope is in a sub-stable warning state, denoted by yellow. When it falls between 0.3 and 0.5, the slope is in a near-sliding warning state, denoted by red. The precise categorization is presented in Table 4.

Table 4 Early warning level of slope instability.
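The three-level scheme reduces to a simple threshold mapping from predicted instability probability to a warning color. The function below is an illustrative sketch of that mapping using the thresholds (0.1, 0.3, 0.5) stated in the text; the level naming in the comments follows the text's terminology.

```python
def warning_level(p_instability: float) -> str:
    """Map a predicted instability probability to a warning color (cf. Table 4)."""
    if p_instability < 0.1:
        return "blue"      # stable warning state
    if p_instability < 0.3:
        return "yellow"    # sub-stable warning state
    if p_instability < 0.5:
        return "red"       # near-sliding warning state
    return "unstable"      # above 0.5 the model predicts failure outright

print(warning_level(0.05), warning_level(0.2), warning_level(0.45))
```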

The frequency of predicted stable-slope probabilities exceeding 0.9 is 0.91, together with a high test accuracy of 93.18%. These results suggest that the CatBoost model exhibits robust classification capability in predicting the stable state of slopes, albeit with a certain degree of inaccuracy; the warning model applies only to slopes that are accurately predicted as stable. In application, the CatBoost slope stability prediction model provides early warnings for slope instability with a high level of accuracy.

Limitations

The comparative analysis above indicates that the predictive performance of the slope stability model based on CatBoost is significantly superior to other models. The early warning model established on this basis demonstrates certain applicability, but there are still some limitations in this study that can be addressed in future research.

Firstly, this study considered only slope failures dominated by circular sliding and did not address other failure types. When establishing the database and the final slope instability warning model, rainfall, earthquakes, and human activities were not considered, because collecting such parameter data across the various slope types is difficult. Future research could therefore incorporate as many slope types and characteristic variables as possible to further improve the model's universality. Secondly, this study did not calculate the slope stability coefficient but simply divided slope stability into stable and unstable states, establishing a binary classification model; in the future, the stability coefficient could be divided into intervals for finer prediction of the slope stability state. Thirdly, the early warning model lacks big-data support; data from multiple sources could be collected for training to improve its robustness.

Conclusions

The following conclusions can be drawn from the experimental results:

  1.

    The CatBoost stability prediction model developed here effectively predicts slope stability, achieving a test accuracy of 93.18%, with precision, recall, F1-Score, and AUC of 100%, 85.71%, 61.02%, and 0.95, respectively. The model demonstrates high accuracy, robust classification performance, and the capability to address gradient bias and prediction shift when applied to slope stability prediction problems.

  2.

    In comparison to SVM, LGBM, RF, and LR models, the CatBoost model demonstrates an improvement in accuracy ranging from 6.82% to 29.54%, precision from 10.53% to 38.1%, recall from 4.76% to 23.81%, and F1-Score from 1.37% to 9.96%. The findings suggest that the CatBoost model effectively mitigates challenges related to low test accuracy resulting from overfitting, and exhibits a degree of superiority, thereby offering valuable contributions to the field of slope stability prediction research.

  3.

    An instability early warning model is established by obtaining the predicted probability of instability for correctly predicted stable slopes. The warning system is categorized into three levels according to probability values: stable, sub-stable, and near-sliding warnings. This classification offers guidance for early warning and prevention efforts related to slope instability.

  4.

    The accuracy and precision of the slope stability prediction model based on CatBoost are notably high. Furthermore, the early warning model demonstrates a significant accuracy rate with stable distribution. This further indicates that the CatBoost model effectively characterizes the dynamics of slope instability changes. Consequently, it can be practically applied to various slope and embankment projects, including foundation pit support, open-pit excavation, and geotechnical reinforcement. Thus, it provides a scientific reference for geotechnical engineering.