Abstract
Rockburst is a typical dynamic disaster in deep underground engineering, and its accurate prediction is of great significance for engineering safety. To address key problems in rockburst prediction, such as insufficient analysis of nonlinear correlation characteristics, the significant discreteness of small-sample data and the limited generalization ability of traditional models, this study constructs a novel LOF-ENN-KNN coupled prediction model. The model improves prediction performance through a three-level progressive processing architecture: noise samples are effectively eliminated by LOF (Local Outlier Factor), the category distribution is optimized by ENN (Edited Nearest Neighbour), and high-precision prediction is finally achieved by KNN (K-Nearest Neighbors). On a rockburst database of 299 sets of measured data collected worldwide, the LOF-ENN-KNN model shows significant advantages: its overall accuracy is 98.93%, far exceeding traditional models. A comparison of different sampling methods and different combinations with the LOF algorithm shows that single resampling techniques (SMOTE, ADASYN) or simple technique stacking (LOF-SMOTE, LOF-ADASYN) easily introduce overfitting or negative coupling effects, whereas LOF-ENN-KNN significantly improves robustness and generalization ability through its modular design. In addition, compared with the LR, SVM, DT, NBs and DNN intelligent algorithms, LOF-ENN-KNN shows clear advantages. Engineering examples further confirm that high-precision prediction is preserved under complex nonlinear parameters, providing an efficient and reliable technical scheme for rockburst early warning in deep engineering.
Introduction
With the growing demand for resources, shallow mineral resources can no longer meet society's needs, and mining at greater depth is an inevitable trend. As one of the representative deep disasters, rockburst seriously threatens the construction of underground engineering. Rockburst is a dynamic instability phenomenon in which surrounding rock spalls, bursts, is ejected or thrown due to the sudden and sharp release of elastic strain energy in the rock1,2; its essence is that the surrounding rock acquires kinetic energy3. Rock particles can be ejected at high speed, causing severe damage to infrastructure and equipment, and may even cause fatal injuries4,5. In 1640, a rockburst at the Altenberg tin mine in Germany caused serious damage, which may be the earliest rockburst disaster on record. Many hard-rock mines and deep-buried tunnels in China, Switzerland and other countries have suffered rockburst hazards of varying degrees. Taking China's data as an example, it is reported that from 2001 to 2007 there were 13,000 rockburst accidents in metal mines, with more than 16,000 casualties6. In recent years, although many countries have devoted considerable research to rockburst, the phenomenon has increased rather than decreased as underground engineering goes deeper7,8.
In the field of rockburst prediction, although empirical methods and on-site monitoring have achieved good results, in practical engineering they often face the challenges of demanding monitoring conditions, data errors and long cycles. In contrast, machine learning shows significant advantages: low requirements on environmental adaptability, a relatively short prediction cycle, and effective capture of the complex nonlinear interactions between rockburst influencing factors. These features make machine learning particularly suitable for different types of rockburst prediction requirements9. According to the characteristics of the influencing factors and the prediction requirements, rockburst prediction is divided into long-term prediction and short-term prediction10. Long-term rockburst prediction estimates the possibility of rockburst for different rock types under different stress conditions based on the inherent mechanical parameters of the rock11, and mainly serves long-term planning. These parameters are relatively stable, spatially representative and easy to obtain. Machine learning models are particularly good at mining potential and complex association patterns from such static data and can handle high-dimensional parameter spaces; therefore, long-term prediction results are relatively stable and provide a wide range of spatial risk assessment. For this reason, researchers have developed many efficient machine learning models. Liu et al.12 developed the NGO-CNN-BiGRU-Attention model, which innovatively combines the Northern Goshawk Optimization algorithm with a convolutional neural network and a bidirectional gated recurrent unit. The model extracts the spatial distribution characteristics of rock mechanical parameters through CNN, captures the temporal evolution of microseismic activity with BiGRU, and dynamically weights key features with the attention mechanism.
The model achieved an accuracy of 98% on 287 test samples, 45-60% higher than traditional SVM and KNN models. Jia et al.13 combined a meta-heuristic algorithm with the Voting-Soft model to construct a comprehensive model for rockburst intensity prediction. Armaghani et al.10 combined the Wild Dog Optimization Algorithm (DOA), Osprey Optimization Algorithm (OOA) and Rime Ice Optimization Algorithm (RIME) with a Support Vector Machine (SVM) to construct hybrid models for long-term rockburst prediction with high accuracy and strong generalization ability. The PCA-PSO-RVM model proposed by Kuang et al.14 offers high precision and simple calculation, and shows high accuracy and strong adaptability in engineering verification. The K-means-CART and K-means-GEP-LR models proposed by Faradonbeh et al.15 show high robustness and classification performance in long-term rockburst monitoring. The KMSORF model developed by Rao et al.16, which uses the Optuna framework for efficient hyperparameter tuning, accurately predicts rockburst levels in tungsten mine and tunnel projects. Short-term rockburst prediction uses in-situ monitoring techniques or data, such as microseismic monitoring, acoustic emission, microgravity and electromagnetic radiation, to predict near-term rockburst risks, focusing on real-time early warning. Sun et al.11 used six microseismic parameters with the RF-CRITIC, ECICM, CM and CG algorithm models on 105 groups of data for short-term prediction of slight and moderate rockbursts in the Ashele copper mine in Xinjiang, the Neelum-Jhelum hydropower project in Pakistan, and two moderate rockbursts in the Qinling water conveyance tunnel in Shaanxi. In the same year, Qiu and Zhou17 used the same input parameters for short-term rockburst prediction.
Liang et al.18 added an incubation-time parameter to the six microseismic parameters and divided 91 datasets into training and validation sets at 7:3, establishing multiple single models for engineering practice at the Jinping II Hydropower Station. Compared with short-term prediction, long-term prediction models therefore have stronger universality and transferability.
With the deepening of research, many scholars have achieved good prediction results19,20,21. Li and Jimenez (2018)22 proposed an empirical rockburst prediction method using LR on 135 datasets. Li et al. (2022)23, Zhang et al. (2024)24, Xue et al. (2022)25, Qiu et al. (2023)17 and Yang et al. (2024)26 used SMOTE and ADASYN oversampling techniques to optimize imbalanced sample sets, improving prediction accuracy. Zhang et al. (2024)24 used LOF to eliminate outliers, making rockburst prediction more accurate. Although machine learning has achieved good prediction results, limitations remain: oversampling techniques blur class boundaries; anomaly detection alone cannot resolve category overlap; and complex models are constrained by small samples. In view of these shortcomings, this paper uses 299 groups of rockburst data, eliminates noise samples with LOF (Local Outlier Factor), optimizes the category distribution with ENN (Edited Nearest Neighbour), and finally designs the LOF-ENN-KNN rockburst prediction model with KNN (K-Nearest Neighbors) as the predictor, solving boundary overlap and outlier interference simultaneously while the KNN classifier provides nonlinear mapping ability.
Data Preparation and characteristic engineering
Rockburst characteristic index system construction
Because the empirical criteria for rockburst are mainly based on rock strength, in-situ stress, brittleness and energy4, this paper selects the following six variables as characteristic variables: σθ, σc, σt, σθ/σc, σc/σt and Wet, defined in Table 1. During underground excavation, when the tangential stress of the boundary rock mass increases, the radial stress decreases; high tangential stress may therefore lead to strain bursts on the surface of the surrounding rock, so σθ is selected. The uniaxial compressive strength σc and the uniaxial tensile strength σt reflect the intrinsic properties of the rock; because rock accumulates energy before fracture, high-strength rock is more prone to bursting. In addition, σθ/σc reflects the stress concentration coefficient, σc/σt is a form of rock brittleness index, and Wet reflects the ratio of stored elastic strain energy to dissipated elastic strain energy in the hysteresis-loop test27. These six characteristic variables are commonly used as empirical indicators for rockburst evaluation in previous literature10,13,28 and are widely considered closely related to rockburst.
Data source and analysis
The real rockburst data were collected from literature published at home and abroad. A total of 771 groups of data were gathered, and 299 groups remained after eliminating duplicate samples: 122 groups from Guo et al. (2022)29, 103 groups from Afraei et al. (2019)30, 36 groups from Dong et al. (2013)31, 11 groups from Liu et al. (2019)32, seven groups from Wu et al.33, six groups from Li et al. (2022)23 and two groups from Zhou et al. (2012)6. The distribution of the six characteristic variables is shown in Fig. 1, where the rockburst levels are abbreviated as S (Strong Rockburst), M (Moderate Rockburst), L (Light Rockburst) and N (None Rockburst). The red dots in Fig. 1 mark the centre value (i.e., median) of each characteristic variable for each rockburst grade. Fig. 1 shows that σθ and Wet are right-skewed, with most samples concentrated in the low-value region; σt shows a bimodal distribution, reflecting the biclustering of rock-mass tensile strength; σc shows a multi-peak distribution, possibly due to differences in the compressive strength of different lithologies; σθ/σc shows a strong right-skewed distribution; and the distribution of σc/σt is highly discrete, with extreme values. Overall the data are discrete, asymmetric and volatile, so the input variables need to be preprocessed.
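The skewness described above can be checked numerically. The sketch below (pure NumPy; the sample values are invented placeholders, not rows of the actual database) computes the sample skewness of a feature column, where a clearly positive value indicates the right-skewed shape described for σθ and Wet.

```python
import numpy as np

def skewness(x):
    """Sample skewness: the third standardized moment.
    Positive values indicate a right-skewed distribution (long high-value tail)."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 3).mean() / s ** 3

# Invented sigma_theta-like values: concentrated at low magnitudes
# with a long right tail, mimicking the shape seen in Fig. 1.
sigma_theta = [18.3, 26.9, 30.1, 34.0, 38.2, 43.4, 55.4, 62.1, 89.0, 157.0]
print(skewness(sigma_theta))  # clearly positive, i.e. right-skewed
```

A symmetric sample gives a skewness of zero, which makes the statistic an easy sanity check before deciding which features need preprocessing.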
According to the general rockburst grade classification standard shown in Table 2 below6, a total of 55 groups of no-rockburst data (grade 1), 78 groups of mild rockburst data (grade 2), 115 groups of moderate rockburst data (grade 3) and 51 groups of severe rockburst data (grade 4) were collected, as shown in Fig. 2.
Characteristic correlation analysis
The Pearson correlation coefficient is commonly used in statistics to describe the strength and direction of the correlation between two variables. Its value lies between -1 and 1: a coefficient of 0 means no correlation, and a negative coefficient means negative correlation28. As shown in Fig. 3, most parameters are weakly correlated and only a few show correlation, indicating that rockburst intensity is affected by multiple variables rather than a single one. Data preprocessing is therefore performed on the original data to eliminate the influence of variable correlation.
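As a concrete illustration, a Pearson matrix like the one behind Fig. 3 can be computed directly; the feature values below are invented placeholders, not rows of the actual database.

```python
import numpy as np

# Hypothetical cases: columns are the six indicators
# (sigma_theta, sigma_c, sigma_t, sigma_theta/sigma_c, sigma_c/sigma_t, Wet).
X = np.array([
    [55.4, 110.0,  4.5, 0.50, 24.4, 5.0],
    [89.0, 170.0, 11.3, 0.52, 15.0, 9.0],
    [30.1,  88.0,  3.7, 0.34, 23.8, 3.2],
    [98.6, 120.0,  6.5, 0.82, 18.5, 7.4],
    [43.6, 115.0,  5.1, 0.38, 22.5, 4.6],
])

# Pearson correlation matrix between the columns (features);
# entries near 0 mean weak correlation, negative entries mean negative correlation.
R = np.corrcoef(X, rowvar=False)
```

The diagonal is exactly 1 (each feature with itself) and the matrix is symmetric, which is a quick check that the computation is set up over columns rather than rows.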
The data characteristics of the rockburst database established in this paper are listed in Table 3, and a scatter-matrix diagram of the parameters is drawn in Fig. 4. Fig. 4 shows discrete values, possibly caused by manual-entry errors or by errors in measuring instruments and methods, which make the data deviate from the true values. Such outliers can distort the data distribution during model training, bias the data analysis and reduce the accuracy and reliability of the model; it is therefore necessary to eliminate them.
Construction of rockburst prediction model
LOF anomaly detection algorithm
Because the scatter-matrix diagram (Fig. 4) in the data-analysis stage reveals outliers in the original data, the LOF (Local Outlier Factor) algorithm is introduced. Outliers are samples that differ significantly from most samples in the data set; they may result from measurement errors, data-entry errors or genuine anomalies, and they distort the data distribution and degrade the generalization ability of the model.
The LOF (Local Outlier Factor) algorithm is a density-based outlier detection method proposed by Breunig et al. (2000)34. Its core idea is to identify outliers by comparing the local density of a sample with the local densities of its neighbouring samples.
LOF determines outliers through five steps:
(1) Definition of the k-distance neighbourhood.
For a data point p, its k-distance neighbourhood is
Nk(p) = { q | d(p, q) ≤ k-distance(p) },
where q is a nearest-neighbour sample of p, k is the number of nearest-neighbour samples of p, and k-distance(p) is the distance between p and its k-th nearest neighbour.
(2) Reachability distance between data objects p and o:
rd_k(p, o) = max( k-distance(o), d(p, o) ),
where d(p, o) is the Euclidean distance between objects p and o, and MinPts (here equal to k) is the number of neighbours defining the local neighbourhood of object p. Figure 5 illustrates rd_k(p2, o) and k-distance(o). This definition avoids distance distortion when p lies within the k-distance neighbourhood of o, while preserving the true distance when p is far from o.
(3) Local reachable density:
lrd_k(p) = 1 / ( (1 / |Nk(p)|) Σ_{o ∈ Nk(p)} rd_k(p, o) ),
i.e., the reciprocal of the average reachability distance between object p and its neighbourhood points.
(4) Local outlier factor:
LOF_k(p) = (1 / |Nk(p)|) Σ_{o ∈ Nk(p)} lrd_k(o) / lrd_k(p),
i.e., the average ratio of the local reachable density of the neighbours of p to the local reachable density of p itself.
(5) Decision rule.
If the local outlier factor is greater than 1, the density of p is lower than that of its neighbourhood points, so p is likely an outlier.
Key parameter optimization:
A grid search driven by the silhouette coefficient finally fixes the k value at 7. Fig. 6 shows that the silhouette coefficient generally rises with k and then plateaus: the plateau begins at k ≥ 11, with the highest peak at k = 15 and a local sub-peak at k = 7. Considering the local anomaly sensitivity of rockburst data and the greater sensitivity of smaller k values to local outliers, the plateau maximum k = 15 is abandoned in favour of k = 7 (the local sub-peak).
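The five steps above can be condensed into a short sketch. This is an illustrative NumPy reimplementation of the standard LOF definition, not the study's code (in practice a library routine such as scikit-learn's `LocalOutlierFactor` performs the same computation).

```python
import numpy as np

def lof_scores(X, k=7):
    """Local Outlier Factor, following steps (1)-(4) above.
    Returns one LOF score per row of X; scores well above 1 flag outliers."""
    # Step (1): pairwise Euclidean distances and k-nearest neighbourhoods.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbour
    knn_idx = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    k_dist = np.sort(d, axis=1)[:, k - 1]     # k-distance of each point
    # Step (2): reachability distance rd_k(p, o) = max(k-distance(o), d(p, o)).
    reach = np.maximum(k_dist[knn_idx], np.take_along_axis(d, knn_idx, axis=1))
    # Step (3): local reachable density = 1 / mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)
    # Step (4): LOF(p) = mean over neighbours o of lrd(o) / lrd(p).
    return (lrd[knn_idx] / lrd[:, None]).mean(axis=1)
```

Applied to the rockburst table, rows whose score is clearly greater than 1 (step (5)) are the candidates removed before training.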
ENN sample optimization algorithm
For the class-imbalance problem of rockburst data, hybrid methods (DBNR-Tomek) and traditional resampling techniques (SMOTE, ENN, ADASYN, etc.) can be used. DBNR-Tomek is good at adjusting global proportions and enhancing data balance35; SMOTE/ADASYN suit scenarios where minority classes are absolutely scarce, but may introduce noise; ENN is better at handling boundary blur and improving boundary clarity. Because the data set in this study contains many overlapping boundary samples, ENN is selected as the data-balancing algorithm. The ENN (Edited Nearest Neighbour) algorithm, proposed by Wilson, is an undersampling method based on the nearest-neighbour rule. Its basic principle is that for each sample in the majority class, if most of its nearest-neighbour samples belong to the minority class, the majority-class sample is eliminated. ENN aims to retain the majority-class samples that matter for the classification boundary while eliminating noise or redundant samples, thereby improving classification performance36. The schematic diagram is shown in Fig. 7. The specific steps of the ENN algorithm are as follows:
(1) Calculate the nearest neighbours.
For each sample xi in the majority class, compute the Euclidean distance between xi and all other samples, and determine the k nearest-neighbour samples Nk(xi) of xi by distance ranking.
(2) Count the number of minority-class samples in Nk(xi) as nminority. If nminority ≥ k/2 (i.e., minority-class samples form the majority of xi's nearest neighbours), xi is considered to lie near the classification boundary and may be noise or a redundant sample.
(3) Delete boundary samples.
Delete every sample xi that meets the above condition, that is: if Σ_{xj ∈ Nk(xi)} I(yj ≠ yi) ≥ k/2, delete xi, where yi is the true label of xi and I(·) is the indicator function.
(4) Start the implicit-oversampling compensation mechanism.
This mechanism, inspired by the future research directions for imbalanced data in the literature37, is designed to prevent the already scarce minority samples from becoming even rarer after LOF and ENN cleaning, which would reduce model performance and prediction accuracy. If the sample size of a category after ENN cleaning falls below a preset threshold, SMOTE oversampling toward a target threshold is automatically triggered, where the preset threshold is α = 0.6 × max(Ni) and the target threshold is β = 0.9 × max(Ni); α and β were determined by a grid-search sensitivity analysis with the F1-score as the evaluation index.
(5) Build a balanced data set.
The retained majority-class samples, the oversampled synthetic samples and the original minority-class samples are combined into a new sample set.
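Steps (1)-(3) and the threshold trigger in step (4) can be sketched as follows. This is an illustrative NumPy reimplementation, not the study's code: the standard ENN editing rule is applied to every sample, and the α-threshold check only flags the classes that would need SMOTE compensation.

```python
import numpy as np

def enn_edit(X, y, k=3):
    """Edited Nearest Neighbour: remove a sample when the majority of its
    k nearest neighbours carry a different label (boundary/noise cleaning)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # exclude the sample itself
    nn = np.argsort(d, axis=1)[:, :k]
    keep = np.array([(y[nn[i]] != y[i]).sum() < k / 2 for i in range(len(X))])
    return X[keep], y[keep]

def classes_needing_compensation(y, alpha=0.6):
    """Step (4) trigger: classes whose post-ENN count falls below
    alpha * (largest class count) and therefore need oversampling."""
    counts = np.bincount(y)
    return np.where(counts < alpha * counts.max())[0]
```

A mislabelled point sitting inside the opposite cluster is removed by `enn_edit`, while clean cluster interiors are kept; `classes_needing_compensation` then lists the grades to pass to an oversampler.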
KNN classification algorithm
KNN (K-Nearest Neighbors) is an instance-based lazy learning method used mainly for classification and regression tasks; here we focus on classification. Its core idea is to predict the category of a new sample from the categories of its K nearest neighbours, measured by the distance between samples38. The flow chart is shown in Fig. 8. The specific steps are as follows:
(1) Calculate the distance between samples.
Select an appropriate distance metric, such as the Euclidean or Manhattan distance, and compute the distance between the new sample and all training samples.
(2) Select the k nearest neighbours.
According to the computed distances, select the K training samples closest to the new sample as its nearest neighbours.
(3) Make the classification decision.
Determine the category of the new sample by majority voting among the K nearest neighbours.
(4) Determine the K value of the KNN model.
Cross-validation was performed on both the original data set and the new training set processed by LOF + ENN to systematically evaluate a series of K values (K = 1, 3, ..., 27, 29) for the original KNN model and the LOF-ENN-KNN model. The original KNN model achieved its lowest error rate at K = 7, and the LOF-ENN-KNN model at K = 1. It is worth noting that the theoretical optimal K is usually greater than 1; however, after LOF and ENN processing, the noise and inconsistency of the data set are significantly reduced and the data distribution is optimized. In this case the model's susceptibility to overfitting is reduced and the demand on generalization ability is relatively relaxed, so K = 1 better reflects the true category information of the local region.
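A minimal version of the classifier and the K scan can be sketched as below (an illustrative NumPy sketch under invented data; the study's actual implementation, folds and data differ).

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=1):
    """Steps (1)-(3): Euclidean distances, pick the k nearest training
    samples, majority vote (ties resolved toward the lowest label)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(y_train[row]).argmax() for row in nn])

def scan_k(X_train, y_train, X_val, y_val, ks=range(1, 30, 2)):
    """Step (4): validation error rate for each candidate odd K;
    the K with the lowest error is kept."""
    return {k: float((knn_predict(X_train, y_train, X_val, k) != y_val).mean())
            for k in ks}
```

On well-separated clusters both small and moderate K give zero error; on noisier data the dictionary returned by `scan_k` makes the error-versus-K trade-off explicit.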
Implementation of LOF-ENN-KNN integrated model
Firstly, the LOF algorithm eliminates outliers from the original data set: 4 groups of non-rockburst data, 6 groups of mild rockburst data, 6 groups of moderate rockburst data and 14 groups of severe rockburst data are removed, leaving 269 groups of new data. The ENN algorithm then undersamples the data, with the class-size threshold triggering implicit oversampling; 934 groups of data are finally obtained. The distribution of each parameter is shown in Figs. 9, 10 and 11, and the characteristics are shown in Table 4 below.
By comparing Fig. 1 (raw data) with Fig. 10 (processed data) and Fig. 4 (raw data) with Fig. 11 (processed data), it can be clearly observed that LOF and ENN preprocessing significantly optimize the data distribution. Specifically: (a) the multi-peak distributions of σc and σc/σt in the original data (Fig. 1) tend toward single peaks after preprocessing (Fig. 10); (b) the right-skewed long tail of σθ/σc (Fig. 1) is significantly shortened (Fig. 10); (c) the aggregation of sample points within each rockburst grade is enhanced (for example, the distribution of the Wet index within the strong rockburst grade is more concentrated, Fig. 10); (d) the overlap between classes is reduced (such as the separation of mild/moderate rockburst in the σθ dimension). Compared with the dispersed sample points and high inter-class overlap before processing, the data sets optimized by LOF, ENN and the other algorithms show tighter clustering, and the decision boundaries between categories in the feature space are clearer. This structural optimization effectively reduces noise interference, narrows within-class sample distances and improves between-class separation. The densification of the data distribution not only enhances the identifiability of the feature representation, but also improves the stability and predictive confidence of the classification system by reducing the risk of local optima during model training.
The processed data set was stratified-sampled, with 70% of the data used as the training set and 30% as the test set to evaluate the reliability and generalization ability of the model. With five-fold cross-validation, the data are fed to the KNN algorithm for prediction and verification and compared with the BNs, SVM, LR, DT and DNN algorithms; classification performance is evaluated to find the optimal model. The specific flow chart is shown in Fig. 12.
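The stratified 70/30 split can be sketched as follows (an illustrative sketch; the random seed and per-class rounding are assumptions, and the paper additionally applies five-fold cross-validation on top of the split).

```python
import numpy as np

def stratified_split(y, test_frac=0.3, seed=0):
    """Split indices per class so each rockburst grade keeps its
    proportion in both the training set and the test set."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])   # shuffle within the class
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)
```

Stratification matters here because the grades are imbalanced: a plain random split could leave a rare grade underrepresented in the test set and distort the macro metrics.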
Results and analysis of precision
Evaluation index system
Macro Accuracy, Macro Precision, Macro Recall, Macro F1-Score (hereinafter Accuracy, Precision, Recall and F1-Score) and the Kappa coefficient are often used to evaluate classification models17,39,40,41. The specific calculation formulas are as follows (where C = 4 is the number of categories and TPc is the number of true positives of category c):
Accuracy = (Σc TPc) / N
Precision = (1/C) Σc TPc / (TPc + FPc)
Recall = (1/C) Σc TPc / (TPc + FNc)
F1-Score = (1/C) Σc 2 · Precisionc · Recallc / (Precisionc + Recallc)
Kappa = (po - pe) / (1 - pe)
where N is the total number of samples, FPc and FNc are the false positives and false negatives of category c, po is the observed agreement (equal to Accuracy) and pe is the chance agreement.
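The macro metrics and the Kappa coefficient translate directly into code. The NumPy sketch below computes all five from a confusion matrix, assuming the four grades are encoded as labels 0-3 (an illustrative helper, not the study's evaluation script).

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes=4):
    """Macro Accuracy/Precision/Recall/F1 and Cohen's kappa
    from integer class labels (0 .. n_classes - 1)."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                      # rows: true class, cols: predicted
    tp = np.diag(cm)
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)   # TP / (TP + FP)
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)      # TP / (TP + FN)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / cm.sum()
    # Kappa: observed agreement p_o versus chance agreement p_e.
    p_o = accuracy
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / cm.sum() ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return accuracy, precision.mean(), recall.mean(), f1.mean(), kappa
```

Perfect predictions return 1.0 for every metric, and any misclassification pulls Kappa below Accuracy because of the chance-agreement correction.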
The algorithms are combined into eight models: KNN (K = 7); LOF-KNN (K = 7); SMOTE-KNN (K = 7); ADASYN-KNN (K = 7); ENN-KNN (K = 7); LOF-SMOTE-KNN (K = 7); LOF-ENN-KNN (K = 1); and LOF-ADASYN-KNN (K = 7). The results in Fig. 13 show that the LOF-ENN-KNN model achieves an overwhelming advantage on all evaluation indicators. Compared with the original model and the single-resampling models, the integrated model achieves a qualitative leap in prediction performance through its three-stage processing framework. On the test set, LOF-ENN-KNN achieves Accuracy = 0.9893, Precision = 0.9935, Recall = 0.9812, F1-score = 0.9870 and Kappa = 0.9847, a significant improvement over the benchmark model. Notably, the model optimizes precision and recall while maintaining high accuracy, with an accuracy 28% higher than the single SMOTE model.
The horizontal comparison (Fig. 13) shows that although the LOF-ADASYN model raises the recall to 0.6688 through adaptive oversampling, an overfitting risk remains. LOF-SMOTE reaches a Kappa coefficient of 0.6019 and removes some noise via the LOF algorithm, but its boundary-blurring strategy degrades test-set performance.
Further exploration of the confusion matrix diagram (Fig. 14) shows that the performance differences of different combination methods show a systematic law. The KNN benchmark model shows a serious category imbalance in the classification: the recall rate of None rockburst and Moderate rockburst is as high as 80%, while that of weak rockburst and strong rockburst is less than 50%. This phenomenon confirms the inherent defects of traditional methods on imbalanced data.
The improvement from a single method is markedly differentiated. The LOF algorithm raises the recall of weak rockburst above 60%, but the recall of strong rockburst remains below 50%. Due to the distribution deviation of its synthetic samples, the SMOTE method misjudges nearly half of the moderate rockburst cases as other types. Notably, the ADASYN model attains a higher recall thanks to its adaptive sampling. The nonlinear effects of the hybrid models are also significant: although LOF-ADASYN produces only 2 false positives for no rockburst, the other categories are seriously misjudged, while LOF-SMOTE yields relatively even recall across categories but significant false positives. These findings reveal the negative coupling effect that simple technique stacking can cause.
In contrast, the LOF-ENN-KNN model shows excellent equalization performance in the confusion matrix. The recall rates of none rockburst and weak rockburst are both 100%, and the recall rates of moderate rockburst and strong rockburst are both above 90%. The model not only achieves a leap in accuracy on the basic model, but also establishes an overwhelming advantage in classification balance. Its ability to overcome over-fitting shows that it is a high-quality rock burst classification solution.
Comparison of model selection and result analysis
The LOF-ENN-KNN model is compared with LOF + ENN combined with logistic regression (LR), support vector machine (SVM), decision tree (DT), naive Bayes (BNs) and deep neural network (DNN). Rockburst prediction is carried out on the same data set and the models are evaluated by five-fold cross-validation. As shown in Table 5, the LOF-ENN-KNN model shows significant advantages: its accuracy (98.93%) and Kappa coefficient (0.9847) are clearly higher than those of the other models, indicating the best comprehensive performance in the rockburst classification task. The decision-tree model (LOF-ENN-DT) ranks second with an accuracy of 97.51%, but its precision (97.47%) and F1-score (97.68%) are 2.88 and 1.02 percentage points lower than KNN's, respectively, reflecting subtle overfitting in the construction of its decision boundaries. The intermediate performance of the deep neural network (LOF-ENN-DNN, 83.96% accuracy) reveals the relative advantage of traditional machine learning methods under limited sample sizes.
The overall disadvantage (accuracy < 60%) of the linear models (LOF-ENN-LR/SVM) and naive Bayes (LOF-ENN-NBs) indicates the nonlinear nature of rockburst parameters. In particular, the large gap between the recall (53.19%) and precision (62.86%) of LOF-ENN-LR shows that a linear decision boundary struggles to reconcile the classification requirements of the majority and minority classes. Notably, although all models adopt the same preprocessing framework, the recall of KNN (98.12%) is 15.96 percentage points higher than that of DNN (82.16%), confirming that local similarity measures suit the locally aggregated characteristics of geological parameters better than global feature mapping.
The core of the performance difference between the models is the classifier's ability to adapt to the feature space. KNN effectively captures the local structural features of the cleaned data through dynamic neighbourhood adjustment (k-value optimization); the hard-splitting strategy of the decision tree maintains high accuracy but tends to accumulate errors in regions of complex feature interaction; and the DNN, limited by the training-data size and the distribution shift caused by preprocessing, cannot fully exploit its deep nonlinear capacity. The comparison confirms that classifiers based on geometric neighbourhood relationships are more practical than complex models on medium-scale rockburst data sets.
Engineering application verification
To facilitate engineering application, this study collected 15 groups of data from the literature41 (Table 6), expanded the sample size to 46 groups through the LOF-ENN algorithm, and predicted with the different models. The results (Fig. 15) show that the LOF-ENN-KNN algorithm predicts every case correctly; LOF-ENN-LR, LOF-ENN-SVM and LOF-ENN-DT achieve accuracies above 90%; and the BNs and DNN algorithms fall below 85%. The model can therefore effectively predict the occurrence of rockburst in engineering practice and meets practical engineering accuracy requirements.
Discussion
The LOF-ENN-KNN model proposed in this study improves rockburst prediction performance by addressing the limitations of traditional models. It effectively removes the data discreteness caused by errors or anomalies through the LOF algorithm, and uses ENN to correct the category distribution, balance the samples and resolve boundary ambiguity. In addition, the KNN classifier, which uses local feature similarity to capture the complex clustering patterns of rock-mechanics data, outperforms global models and achieves better accuracy. However, the model depends heavily on hyperparameter optimization and must be re-tuned for data sets from different regions. It also depends on complete parameter availability: missing parameters or ultra-deep excavation conditions are likely to reduce prediction reliability. Future work should therefore integrate Bayesian optimization and meta-heuristic algorithms for automatic hyperparameter tuning, and develop transfer learning or embedding layers to handle incomplete data, using the global rockburst database for unsupervised pre-training.
Conclusion
In this study, six characteristic variables (σθ, σc, σt, σθ/σc, σc/σt, Wet) are selected as input variables and the rockburst grade as the output variable, and an innovative rockburst prediction model based on LOF-ENN-KNN is proposed. The conclusions are as follows.
(1) Noise samples are effectively eliminated by local outlier factor detection (LOF), the category distribution is optimized by the edited nearest neighbour rule (ENN), and high-precision prediction is finally realized by the K-nearest neighbors algorithm (KNN). The prediction accuracy is 98.93%, with good stability, applicability and versatility.
(2) A comparative study of different sampling methods and different combinations with the LOF algorithm shows that a single resampling technique (SMOTE, ADASYN) or a simple superposition of techniques (LOF-SMOTE, LOF-ADASYN) readily introduces over-fitting or negative coupling effects, whereas LOF-ENN-KNN significantly improves the robustness and generalization ability of the model through its modular design.
(3) Compared with the LR, SVM, DT, NBs and DNN algorithms, the prediction performance of this model is clearly better, demonstrating that the local similarity measure suits the local aggregation characteristics of rock mechanics parameters better than global feature mapping.
(4) The prediction results are consistent with the field situation, with a prediction accuracy of 100% on the engineering example, showing significant engineering application value.
Data availability
Data is provided within the manuscript.
References
Li, P. & Cai, M. F. Challenges and new insights for exploitation of deep underground metal mineral resources. Trans. Nonferrous Met. Soc. China. 31, 3478–3505. https://doi.org/10.1016/s1003-6326(21)65744-8 (2021).
Wang, C., Xu, J. H., Li, Y. F., Wang, T. H. & Wang, Q. W. Optimization of BP neural network model for rockburst prediction under multiple influence factors. Appl. Sciences-Basel. 13 https://doi.org/10.3390/app13042741 (2023).
Qian, Q. H. Definition, mechanism, classification and quantitative forecast model for rockburst and pressure bump. Rock Soil Mech. 35, 1–6 (2014).
Zhou, J., Li, X. & Mitri, H. S. Classification of rockburst in underground projects: comparison of ten supervised learning methods. J. Comput. Civil Eng. 30 https://doi.org/10.1061/(ASCE)CP.1943-5487.0000553 (2016).
Cai, M. Principles of rock support in burst-prone ground. Tunn. Undergr. Space Technol. 36, 46–56. https://doi.org/10.1016/j.tust.2013.02.003 (2013).
Zhou, J., Li, X. & Shi, X. Long-term prediction model of rockburst in underground openings using heuristic algorithms and support vector machines. Saf. Sci. 50, 629–644. https://doi.org/10.1016/j.ssci.2011.08.065 (2012).
Chen, B. R., Feng, X. T., Li, Q. P., Luo, R. Z. & Li, S. J. Rock burst intensity classification based on the radiated energy with damage intensity at Jinping II hydropower station, China. Rock Mech. Rock Eng. 48, 289–303. https://doi.org/10.1007/s00603-013-0524-2 (2015).
Xie, H. P., Zhu, J. B., Zhou, T., Zhang, K. & Zhou, C. T. Conceptualization and preliminary study of engineering disturbed rock dynamics. Geomech. Geophys. Geo-Energy Geo-Resources. 6 https://doi.org/10.1007/s40948-020-00157-x (2020).
Liu, R. et al. Research on rock fracture evolution prediction model based on Adam-ConvLSTM and transfer learning. Discov Appl. Sci. 7 https://doi.org/10.1007/s42452-025-06661-7 (2025).
Armaghani, D. J. et al. Toward precise Long-Term rockburst forecasting: A fusion of SVM and Cutting-Edge Meta-heuristic algorithms. Nat. Resour. Res. 33, 2037–2062. https://doi.org/10.1007/s11053-024-10371-z (2024).
Sun, J., Wang, W. & Xie, L. Predicting Short-Term rockburst using RF–CRITIC and improved cloud model. Nat. Resour. Res. 33, 471–494. https://doi.org/10.1007/s11053-023-10275-4 (2024).
Liu, H. et al. Deep learning in rockburst intensity level prediction: performance evaluation and comparison of the NGO-CNN-BiGRU-Attention model. Appl. Sci. (Switzerland). 14. https://doi.org/10.3390/app14135719 (2024).
Jia, Z. C., Wang, Y., Wang, J. H., Pei, Q. Y. & Zhang, Y. Q. Rockburst intensity grade prediction based on data preprocessing techniques and multi-model ensemble learning algorithms. Rock Mech. Rock Eng. 57, 5207–5227. https://doi.org/10.1007/s00603-024-03811-y (2024).
Kuang, H. W., Ai, Z. Y. & Gu, G. L. A novel identification model for rock burst grades-taking Jinping II hydropower station hub engineering as an example. Comput. Geotech. 172 https://doi.org/10.1016/j.compgeo.2024.106440 (2024).
Shirani Faradonbeh, R., Vaisey, W., Sharifzadeh, M. & Zhou, J. Hybridized intelligent multi-class classifiers for rockburst risk assessment in deep underground mines. Neural Comput. Appl. 36, 1681–1698. https://doi.org/10.1007/s00521-023-09189-2 (2024).
Rao, G. et al. Long-term prediction modeling of shallow rockburst with small dataset based on machine learning. Sci. Rep. 14 https://doi.org/10.1038/s41598-024-64107-3 (2024).
Qiu, Y. & Zhou, J. Short-term rockburst prediction in underground project: insights from an explainable and interpretable ensemble learning model. Acta Geotech. 18, 6655–6685. https://doi.org/10.1007/s11440-023-01988-0 (2023).
Liang, W., Sari, Y. A., Zhao, G., McKinnon, S. D. & Wu, H. Probability estimates of Short-Term rockburst risk with ensemble classifiers. Rock Mech. Rock Eng. 54, 1799–1814. https://doi.org/10.1007/s00603-021-02369-3 (2021).
Guo, D., Chen, H., Tang, L., Chen, Z. & Samui, P. Assessment of rockburst risk using multivariate adaptive regression splines and deep forest model. Acta Geotech. 17, 1183–1205. https://doi.org/10.1007/s11440-021-01299-2 (2022).
Wang, H. et al. An intelligent rockburst prediction model based on scorecard methodology. Minerals 11 https://doi.org/10.3390/min11111294 (2021).
Rao, G. et al. Impact of a multiple oversampling technique-based assessment framework on shallow rockburst prediction models. Front. Earth Sci. 12, 1514591. https://doi.org/10.3389/feart.2024.1514591 (2025).
Li, N. & Jimenez, R. A logistic regression classifier for long-term probabilistic prediction of rock burst hazard. Nat. Hazards. 90, 197–215. https://doi.org/10.1007/s11069-017-3044-7 (2018).
Li, D., Liu, Z., Xiao, P., Zhou, J. & Jahed Armaghani, D. Intelligent rockburst prediction model with sample category balance using feedforward neural network and bayesian optimization. Undergr. Space (China). 7, 833–846. https://doi.org/10.1016/j.undsp.2021.12.009 (2022).
Zhang, H., Xia, Y., Lin, M., Huang, J. & Yan, Y. A three-step rockburst prediction model based on data preprocessing combined with clustering and classification algorithms. Bull. Eng. Geol. Environ. 83 https://doi.org/10.1007/s10064-024-03774-y (2024).
Xue, Y. et al. Intelligent prediction of rockburst based on Copula-MC oversampling architecture. Bull. Eng. Geol. Environ. 81 https://doi.org/10.1007/s10064-022-02659-2 (2022).
Yang, T. J. et al. Comparative analysis and application of rockburst prediction model based on secretary bird optimization algorithm. Front. Earth Sci. 12 https://doi.org/10.3389/feart.2024.1487968 (2024).
Zhang, J., Wang, Y., Sun, Y. & Li, G. Strength of ensemble learning in multiclass classification of rockburst intensity. Int. J. Numer. Anal. Meth. Geomech. 44, 1833–1853. https://doi.org/10.1002/nag.3111 (2020).
Zhang, Q. et al. A semi-Naïve bayesian rock burst intensity prediction model based on average one-dependent estimator and incremental learning. Tunn. Undergr. Space Technol. 146 https://doi.org/10.1016/j.tust.2024.105666 (2024).
Guo, J., Guo, J., Zhang, Q. & Huang, M. Research on rockburst classification prediction based on BP-SVM model. IEEE Access. 10, 50427–50447. https://doi.org/10.1109/ACCESS.2022.3173059 (2022).
Afraei, S., Shahriar, K. & Madani, S. H. Developing intelligent classification models for rock burst prediction after recognizing significant predictor variables, sect. 1: literature review and data preprocessing procedure. Tunn. Undergr. Space Technol. 83, 324–353. https://doi.org/10.1016/j.tust.2018.09.022 (2019).
Dong, L. J., Li, X. B. & Peng, K. Prediction of rockburst classification using random forest. Trans. Nonferrous Met. Soc. China. 23, 472–477. https://doi.org/10.1016/S1003-6326(13)62487-5 (2013).
Liu, R., Ye, Y., Hu, N., Chen, H. & Wang, X. Classified prediction model of rockburst using rough sets-normal cloud. Neural Comput. Appl. 31, 8185–8193. https://doi.org/10.1007/s00521-018-3859-5 (2019).
Wu, S., Wu, Z. & Zhang, C. Rock burst prediction probability model based on case analysis. Tunn. Undergr. Space Technol. 93 https://doi.org/10.1016/j.tust.2019.103069 (2019).
Breunig, M. M., Kriegel, H. P., Ng, R. T. & Sander, J. LOF: identifying density-based local outliers. Sigmod Record. 29, 93–104 (2000).
Yan, S. et al. Investigation and application of data balancing and combined discriminant model in rock burst severity prediction. Sci. Rep. 14 https://doi.org/10.1038/s41598-024-81307-z (2024).
Wilson, D. L. Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man. Cybernetics. 2, 408–421. https://doi.org/10.1109/TSMC.1972.4309137 (1972).
He, H. B. & Garcia, E. A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21, 1263–1284. https://doi.org/10.1109/tkde.2008.239 (2009).
Cover, T. M. & Hart, P. E. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory. 13, 21–27. https://doi.org/10.1109/TIT.1967.1053964 (1967).
Wang, J., Ma, H. & Yan, X. Rockburst intensity classification prediction based on Multi-Model ensemble learning algorithms. Mathematics 11 https://doi.org/10.3390/math11040838 (2023).
Yin, X. et al. Strength of stacking technique of ensemble learning in rockburst prediction with imbalanced data: comparison of eight single and ensemble models. Nat. Resour. Res. 30, 1795–1815. https://doi.org/10.1007/s11053-020-09787-0 (2021).
Xue, G. L., Yilmaz, E., Song, W. D. & Cao, S. Fiber length effect on strength properties of polypropylene fiber reinforced cemented tailings backfill specimens with different sizes. Constr. Build. Mater. 241 https://doi.org/10.1016/j.conbuildmat.2020.118113 (2020).
Acknowledgements
This work was supported by the 2023 Jiangxi Province "Science and Technology + Emergency Response" Joint Program Project: Research on Safe and Efficient Mining Technology and Application Demonstration in Gannan Tungsten Mine [2023KYG01002].
Funding
This work was supported by the 2023 Jiangxi Province "Science and Technology + Emergency Response" Joint Program Project: Research on Safe and Efficient Mining Technology and Application Demonstration in Gannan Tungsten Mine [2023KYG01002].
Author information
Authors and Affiliations
Contributions
H.G. carried out the modeling design and theoretical analysis of this study and wrote the first draft. J.Z., C.M. and K.H. completed the data analysis and guided the writing and revision of the paper. Y.L. and Z.W. participated in the modeling process and results analysis. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ge, H., Zhang, J., Ma, C. et al. Rockburst intensity grading prediction based on the LOF-ENN-KNN model. Sci Rep 15, 29385 (2025). https://doi.org/10.1038/s41598-025-15603-7