Introduction

With the growing demand for resources, shallow mineral resources can no longer meet society's needs, and mining at greater depth has become an inevitable trend. As one of the representative deep-mining hazards, rockburst seriously threatens underground engineering construction. Rockburst is a dynamic instability phenomenon in which the surrounding rock spalls, buckles, ejects or is thrown out due to the sudden and sharp release of elastic strain energy1,2, and its essence is that the surrounding rock acquires kinetic energy3. Rock fragments can be ejected at high velocity, causing severe damage to infrastructure and equipment and even fatalities4,5. The rockburst disaster at the Altenberg tin mine in Germany in 1640 caused serious damage and may be the earliest rockburst on record. Many hard-rock mines and deep-buried tunnels in China, Switzerland and other countries have suffered rockburst hazards of varying severity. Taking China's data as an example, it is reported that from 2001 to 2007 there were 13,000 rockburst accidents in metal mines, with more than 16,000 casualties6. In recent years, although many countries have devoted substantial research to rockburst, rockburst occurrences have increased rather than decreased as underground engineering goes deeper7,8.

In the field of rockburst prediction, although empirical methods and on-site monitoring have achieved good results, in practical engineering they often face challenges of monitoring conditions, data errors and long cycles. In contrast, machine learning shows significant advantages: low requirements on the working environment, a relatively short prediction cycle, and the ability to capture the complex nonlinear interactions between rockburst influencing factors. These features make machine learning particularly suitable for different types of rockburst prediction9. According to the characteristics of the influencing factors and the prediction requirements, rockburst prediction is divided into long-term and short-term prediction10. Long-term prediction estimates the possibility of rockburst for different rock types under different stress conditions from the inherent mechanical parameters of the rock11 and is mainly oriented toward long-term planning. These parameters are relatively stable, spatially representative and easy to obtain. Machine learning models are particularly good at mining latent, complex association patterns from such static data and can handle high-dimensional parameter spaces, so long-term prediction results are relatively stable and provide a broad spatial risk assessment. For this reason, researchers have developed many efficient machine learning models. Liu et al.12 developed the NGO-CNN-BiGRU-Attention model, which combines the Northern Goshawk Optimization (NGO) algorithm with a convolutional neural network and a bidirectional gated recurrent unit: the CNN extracts the spatial distribution characteristics of the rock mechanical parameters, the BiGRU captures the temporal evolution of microseismic activity, and the attention mechanism dynamically weights key features; the model achieved 98% accuracy on 287 test samples, 45-60% higher than traditional SVM and KNN models. Jia et al.13 combined meta-heuristic algorithms with the Voting-Soft model to construct a comprehensive model for rockburst intensity prediction. Armaghani et al.10 combined the wild dog optimization algorithm (DOA), the osprey optimization algorithm (OOA) and the rime optimization algorithm (RIME) with a support vector machine (SVM) to construct hybrid models for long-term rockburst prediction with high accuracy and strong generalization ability. The PCA-PSO-RVM model proposed by Kuang et al.14 offers high precision and simple computation, and shows strong adaptability in engineering verification. The K-means-based CART and GEP-LR classification models proposed by Faradonbeh et al.15 exhibit high robustness in long-term rockburst assessment. Rao et al.16 used the Optuna framework for efficient hyperparameter tuning to build the KMSORF model, whose application in tungsten mine and tunnel projects demonstrates its ability to accurately predict rockburst levels. Short-term rockburst prediction uses in-situ monitoring techniques and data, such as microseismic monitoring, acoustic emission, microgravity and electromagnetic radiation, to predict near-term rockburst risk, focusing on 'real-time early warning'.
Sun et al.11 used six microseismic parameters with RF-CRITIC, ECICM, CM and CG algorithm models on a 105-group data set for short-term prediction of slight and moderate rockbursts in the Ashele copper mine in Xinjiang, the Neelum-Jhelum hydropower project in Pakistan, and two moderate rockbursts in the Qinling water conveyance tunnel in Shaanxi. In the same year, Qiu and Zhou17 used the same input parameters for short-term rockburst prediction. Liang et al.18 added an incubation-time parameter to the six microseismic parameters and divided 91 datasets into training and validation sets at a 7:3 ratio to establish multiple single models for engineering practice at the Jinping II Hydropower Station. Compared with short-term prediction, therefore, long-term prediction models have stronger universality and transferability.

With the deepening of research, many scholars have achieved promising prediction results19,20,21. Li and Jimenez (2018)22 proposed an empirical rockburst prediction method based on logistic regression using 135 datasets. Li et al. (2022)23, Zhang et al. (2024)24, Xue et al. (2022)25, Qiu et al. (2023)17 and Yang et al. (2024)26 used SMOTE and ADASYN oversampling techniques to rebalance imbalanced sample sets, which improved prediction accuracy. Zhang et al. (2024)24 used LOF to eliminate outliers, making rockburst prediction more accurate. Although machine learning has achieved good prediction results, limitations remain: oversampling can blur class boundaries; anomaly detection alone cannot resolve class overlap; and complex models are constrained by small samples. In view of these shortcomings, this paper uses 299 groups of rockburst data, effectively eliminates noise samples with LOF (Local Outlier Factor), optimizes the class distribution with ENN (Edited Nearest Neighbour), and finally performs prediction with KNN (K-Nearest Neighbors). The resulting LOF-ENN-KNN rockburst prediction model addresses boundary overlap and outlier interference simultaneously, while the KNN classifier provides nonlinear mapping capability.

Data preparation and feature engineering

Rockburst characteristic index system construction

Because the empirical criteria for rockburst are mainly based on rock strength, in-situ stress, brittleness and energy4, this paper selects the following six characteristic variables: σθ, σc, σt, σθc, σct and Wet, defined in Table 1. During underground excavation, as the tangential stress of the boundary rock mass increases, the radial stress decreases; high tangential stress may therefore lead to strain bursts on the surface of the surrounding rock, so σθ is selected. The uniaxial compressive strength σc and the uniaxial tensile strength σt reflect the intrinsic properties of the rock: because rock accumulates energy before fracture, high-strength rock is more prone to bursting. In addition, σθc (the ratio σθ/σc) reflects the stress concentration coefficient, σct (the ratio σc/σt) is a form of rock brittleness index, and Wet is the ratio of the stored elastic strain energy to the dissipated elastic strain energy in a loading-unloading (hysteresis) test27. These six characteristic variables are commonly used as empirical indicators for rockburst evaluation in previous literature10,13,28 and are widely considered to be closely related to rockburst.

Table 1 Definition of characteristic variables28.
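As a small illustration of the indices in Table 1, the two ratio variables can be computed directly from the measured quantities; the sketch below uses hypothetical column names and purely illustrative values, not data from this study.

```python
import pandas as pd

# Hypothetical column names and illustrative values (not data from this study).
df = pd.DataFrame({
    "sigma_theta": [55.4, 89.0],   # maximum tangential stress of surrounding rock (MPa)
    "sigma_c": [110.0, 236.0],     # uniaxial compressive strength (MPa)
    "sigma_t": [4.5, 8.3],         # uniaxial tensile strength (MPa)
    "Wet": [3.2, 7.4],             # elastic strain energy index (dimensionless)
})

# Derived indices used as model inputs.
df["stress_ratio"] = df["sigma_theta"] / df["sigma_c"]  # stress concentration factor (sigma_theta / sigma_c)
df["brittleness"] = df["sigma_c"] / df["sigma_t"]       # brittleness index (sigma_c / sigma_t)
print(df.round(3))
```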

Data source and analysis

The rockburst data were collected from published literature at home and abroad. A total of 771 groups of data were collected, and 299 groups remained after eliminating duplicate samples: 122 groups from Guo et al. (2022)29, 103 groups from Afraei et al. (2019)30, 36 groups from Dong et al. (2013)31, 11 groups from Liu et al. (2019)32, seven groups from Wu et al.33, six groups from Li et al. (2022)23 and two groups from Zhou et al. (2012)6. The distribution of the six characteristic variables is shown in Fig. 1, where the rockburst grades are abbreviated as S (strong rockburst), M (moderate rockburst), L (light rockburst) and N (no rockburst), and the red dots mark the median of each characteristic variable for each grade. Figure 1 shows that σθ and Wet are right-skewed, with most samples concentrated in the low-value region; σt is bimodal, reflecting a biclustering of rock mass tensile strength; σc is multi-peaked, which may be due to differences in compressive strength among lithologies; σθc is strongly right-skewed; and σct is highly discrete with extreme values. Overall, the data are dispersed, asymmetric and highly variable, so preprocessing of the input variables is necessary.

Fig. 1

Characteristic variable distribution rain cloud map. (“N” is “None rockburst”; “L” is “Light rockburst”; “M” is “Moderate Rockburst”; “S” is “Strong Rockburst”.)

According to the general classification standard of rockburst grades shown in Table 2 below6, a total of 55 groups of no-rockburst data (grade 1), 78 groups of light rockburst data (grade 2), 115 groups of moderate rockburst data (grade 3) and 51 groups of strong rockburst data (grade 4) were collected, as shown in Fig. 2.

Table 2 The common classification standard for rockburst intensities6.
Fig. 2

Pie chart of the proportion of each rockburst grade in the database.

Characteristic correlation analysis

The Pearson correlation coefficient is a commonly used statistic for describing the strength and direction of the linear correlation between two variables. Its value lies between -1 and 1: a value of 0 indicates no correlation, and negative values indicate negative correlation28. As shown in Fig. 3, most parameter pairs are weakly correlated and only a few show notable correlation, indicating that rockburst intensity is affected by multiple variables rather than a single one. Data preprocessing is therefore performed on the original data to mitigate the influence of variable correlation.
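For reference, a correlation matrix like that in Fig. 3 can be computed with pandas; a minimal sketch, assuming the 299 samples are held in the DataFrame `df` from the earlier sketch (column names are hypothetical):

```python
import matplotlib.pyplot as plt

features = ["sigma_theta", "sigma_c", "sigma_t", "stress_ratio", "brittleness", "Wet"]

# Pairwise Pearson correlation coefficients between the six characteristic variables.
corr = df[features].corr(method="pearson")
print(corr.round(2))

# Simple heatmap of the correlation matrix (values range from -1 to 1).
fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(features)))
ax.set_xticklabels(features, rotation=45, ha="right")
ax.set_yticks(range(len(features)))
ax.set_yticklabels(features)
fig.colorbar(im, ax=ax, label="Pearson r")
plt.tight_layout()
plt.show()
```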

Fig. 3

Correlation coefficient diagram of characteristic variables.

Based on the rockburst database established in this paper, the data characteristics are listed in Table 3, and the scatter matrix of the parameters is drawn in Fig. 4. Figure 4 shows that discrete values exist, possibly due to manual input errors or errors of measuring instruments and methods, causing data to deviate from their true values. Such outliers can distort the data distribution during model training, biasing the analysis and reducing the accuracy and reliability of the model. It is therefore necessary to eliminate outliers.
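The scatter matrix of Fig. 4 can be reproduced in the same way; a minimal sketch reusing `df` and `features` from the previous sketches:

```python
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Pairwise scatter plots of the six characteristic variables;
# points far from the main clouds are candidate outliers for the LOF step below.
scatter_matrix(df[features], figsize=(10, 10), diagonal="hist", alpha=0.6)
plt.show()
```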

Table 3 Data characteristics of each parameter in database.
Fig. 4

Database variable distribution scatter matrix diagram. (“RL” is “Rockburst Level”).

Construction of rockburst prediction model

LOF anomaly detection algorithm

The scatter matrix (Fig. 4) in the data analysis stage shows that outliers exist in the original data. Outliers are samples that differ significantly from the majority of samples in the data set; they may arise from measurement errors, data entry errors or genuine anomalies, and they distort the data distribution and degrade the generalization ability of the model. The LOF (Local Outlier Factor) algorithm is therefore introduced.

The LOF (Local Outlier Factor) algorithm is a density-based outlier detection method proposed by Breunig et al. (2000)34. Its core idea is to identify outliers by comparing the local density of a sample with the local densities of its neighboring samples.

LOF determines outliers through five steps:

(1) Definition of the k-distance neighborhood.

For a data point p, its k-distance neighborhood is:

$$N_k(p)=\{\,q\in D\setminus\{p\}\mid d(p,q)\le k\text{-}distance(p)\,\}$$
(1)

where q is a sample within the k-distance neighborhood of p, k is the number of nearest-neighbor samples of p, and k-distance(p) is the distance between object p and its k-th nearest neighbor.

(2) Calculate the reachable distance of data objects p and o.

$$rd_{MinPts}(p,o)=\max\{k\text{-}distance(o),\,d(p,o)\}$$
(2)

In the formula, MinPts is the number of nearest neighbors defining the local neighborhood of object p. Figure 5 illustrates rdk(p2, o) and k-distance(o).

The distance used in the algorithm is the Euclidean distance, and the distance between objects p and o is d(p, o). This design avoids distance distortion when p lies within the neighborhood of o and preserves the true distance when p is far from o.

$$d(p,o)=\sqrt{(x_p-x_o)^2+(y_p-y_o)^2}$$
(3)
Fig. 5

Reachable distances of objects p1 and p2 with respect to o when k = 4.

(3) Calculate the local reachable density:

$$lrd_{MinPts}(p)=\left(\frac{\sum_{o\in N_{MinPts}(p)}rd_{MinPts}(p,o)}{\left|N_{MinPts}(p)\right|}\right)^{-1}$$
(4)

where lrd is the local reachable density, expressed as the reciprocal of the average reachable distance between the object p and its neighborhood points.

(4) Calculate the local outlier factor:

$$LOF_{MinPts}(p)=\frac{\sum_{o\in N_{MinPts}(p)}\frac{lrd_{MinPts}(o)}{lrd_{MinPts}(p)}}{\left|N_{MinPts}(p)\right|}$$
(5)

where LOF is the local outlier factor, which is the average of the ratio of the local reachable density of the neighborhood point of object p to the local reachable density of object p.

(5) Decision rule.

If the local outlier factor is greater than 1, the density of p is lower than that of its neighborhood points, and the point may be an outlier.
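In practice these five steps are implemented in scikit-learn's LocalOutlierFactor. Below is a minimal sketch of the decision rule above, assuming X is the (n_samples × 6) feature matrix and y the grade labels as NumPy arrays; the cut-off of 1 follows the stated rule, although the exact threshold used in the study is not specified beyond it.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

lof = LocalOutlierFactor(n_neighbors=7)   # k = 7, as selected in the next subsection
lof.fit(X)

# scikit-learn stores the negated LOF score, so LOF > 1 corresponds to scores below -1.
lof_scores = -lof.negative_outlier_factor_
inlier_mask = lof_scores <= 1.0           # keep samples whose density matches their neighbours

X_clean, y_clean = X[inlier_mask], y[inlier_mask]
print(f"removed {int((~inlier_mask).sum())} suspected outliers out of {len(X)} samples")
```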

Key parameter optimization:

The grid search is driven by the silhouette coefficient, and the value k = 7 is finally selected. As Fig. 6 shows, the silhouette coefficient generally rises with increasing k and then reaches a plateau: the plateau begins at k ≥ 11, with the global maximum at k = 15 and a local sub-peak at k = 7. Considering the local anomaly sensitivity of rockburst data and the greater sensitivity of smaller k values to local outliers, the plateau maximum k = 15 is abandoned and the local sub-peak k = 7 is selected.
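A sketch of one plausible implementation of this grid search is given below, under the assumption that the silhouette coefficient is computed on the samples retained after LOF cleaning, grouped by their rockburst grade labels; this interpretation is an assumption, since the clustering underlying the silhouette score is not spelled out here.

```python
from sklearn.metrics import silhouette_score
from sklearn.neighbors import LocalOutlierFactor

scores = {}
for k in range(2, 21):
    lof = LocalOutlierFactor(n_neighbors=k)
    lof.fit(X)
    keep = -lof.negative_outlier_factor_ <= 1.0   # same decision rule as above
    # Silhouette of the retained samples with respect to the rockburst grades.
    scores[k] = silhouette_score(X[keep], y[keep])

print(scores)   # Fig. 6 reports a plateau for k >= 11 and a local sub-peak at k = 7
```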

Fig. 6

K-value analysis of the LOF algorithm based on the silhouette coefficient.

ENN sample optimization algorithm

For the class imbalance problem of rockburst data, hybrid methods (DBNR-Tomek) and traditional resampling techniques (SMOTE, ENN, ADASYN, etc.) can be used. DBNR-Tomek is good at maintaining global proportions and enhancing data balance35; SMOTE and ADASYN are better suited to scenarios in which minority classes are absolutely scarce, but they may introduce noise; ENN is better suited to handling blurred boundaries and improving boundary clarity. Because the data set selected in this study contains a large number of overlapping boundary samples, ENN is chosen as the data balancing algorithm. The ENN (Edited Nearest Neighbour) algorithm is an undersampling method based on the nearest-neighbor rule proposed by Wilson. Its basic principle is that, for each sample in the majority class, if most of its nearest neighbors belong to the minority class, the majority-class sample is eliminated. ENN aims to retain the majority-class samples that matter for the classification boundary while eliminating noise or redundant samples, thereby improving classification performance36. The schematic diagram is shown in Fig. 7. The specific steps of the ENN algorithm are as follows (a code sketch is given after the steps):

Fig. 7

(a) ENN algorithm diagram; (b) Sample distribution map after ENN processing.

(1) Calculate the nearest neighbors.

For each sample xi in the majority class, the Euclidean distances between it and all other samples are calculated, and the k nearest-neighbor samples Nk(xi) of xi are determined by distance ranking.

(2) Count the minority-class neighbors. The number of samples in Nk(xi) belonging to the minority class is counted as nminority. If nminority ≥ k/2 (that is, minority-class samples form the majority among the nearest neighbors of xi), xi is considered to lie near the classification boundary and may be a noise or redundant sample.

(3) Delete boundary samples.

All samples xi that meet the above condition are deleted, i.e., xi is removed if $\sum_{x_j\in N_k(x_i)} I(y_j\ne y_i)\ge k/2$, where yi is the true label of xi and I(·) is the indicator function.

(4) Start the compensation mechanism of implicit oversampling.

This mechanism is inspired by the future research directions on imbalanced data in the literature37 and is designed to prevent the already scarce minority samples from becoming even rarer after ENN and LOF cleaning, which would degrade model performance and reduce prediction accuracy. If the sample size of a class after ENN cleaning falls below a preset threshold, SMOTE oversampling toward a target size is automatically triggered, where the preset threshold is α = 0.6 × max(Ni) and the target is β = 0.9 × max(Ni); α and β were determined by grid-search sensitivity analysis using the F1-Score as the evaluation index.

(5) Build the balanced data set.

The retained majority-class samples, the synthetic samples generated by oversampling and the original minority-class samples are combined to form a new sample set.
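A minimal sketch of steps (1)-(5) using imbalanced-learn's EditedNearestNeighbours and SMOTE; the per-class handling of the α = 0.6·max(Ni) trigger and β = 0.9·max(Ni) target follows the text, but the exact implementation details are assumptions.

```python
from collections import Counter
from imblearn.under_sampling import EditedNearestNeighbours
from imblearn.over_sampling import SMOTE

# Steps (1)-(3): remove samples whose k nearest neighbours are dominated by other classes.
enn = EditedNearestNeighbours(n_neighbors=3)
X_enn, y_enn = enn.fit_resample(X_clean, y_clean)

# Steps (4)-(5): if a class fell below alpha * max class size, oversample it up to beta * max.
counts = Counter(y_enn)
n_max = max(counts.values())
alpha, beta = 0.6, 0.9
targets = {c: int(beta * n_max) for c, n in counts.items() if n < alpha * n_max}

if targets:
    smote = SMOTE(sampling_strategy=targets, k_neighbors=5, random_state=42)
    X_bal, y_bal = smote.fit_resample(X_enn, y_enn)
else:
    X_bal, y_bal = X_enn, y_enn

print("class counts after ENN + compensation:", Counter(y_bal))
```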

KNN classification algorithm

KNN (K-Nearest Neighbors) is an instance-based lazy learning method, used mainly for classification and regression tasks; here we focus on classification. The core idea of KNN is to predict the category of a new sample from the categories of its K nearest neighbors, as measured by inter-sample distance38. The flow chart is shown in Fig. 8. The specific steps are as follows:

(1) Calculate the distance between samples.

Select an appropriate distance metric, such as the Euclidean distance or the Manhattan distance, and calculate the distance between the new sample and all training samples.

(2) Select k nearest neighbors.

According to the calculated distance, K training samples closest to the new sample are selected as the nearest neighbors.

(3) Classification decision-making.

According to the category of K nearest neighbors, the category of new samples is determined by majority voting.

(4) Determine the K-value selected by the KNN model.

Cross-validation was performed on both the original data set and the new training set processed by LOF + ENN to systematically evaluate a series of K values (K = 1, 3, …, 27, 29) for the original KNN model and the LOF-ENN-KNN model. The lowest error rate for the original KNN model was obtained at K = 7, and the lowest error rate for the LOF-ENN-KNN model at K = 1. It is worth noting that the theoretically optimal K value is usually greater than 1; however, because LOF and ENN processing markedly reduces the noise and inconsistency of the data set and optimizes the data distribution, the model's susceptibility to overfitting is reduced and the demand on generalization ability is relatively relaxed. In this case, K = 1 better reflects the true category information of the local region.
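A minimal sketch of this cross-validated K search on the LOF-ENN processed data (using X_bal and y_bal from the previous sketch):

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

error_rates = {}
for k in range(1, 30, 2):                       # odd K from 1 to 29
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    acc = cross_val_score(model, X_bal, y_bal, cv=5, scoring="accuracy")
    error_rates[k] = 1.0 - acc.mean()           # cross-validated misclassification rate

best_k = min(error_rates, key=error_rates.get)
print(f"lowest cross-validated error rate at K = {best_k}")
```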

Fig. 8

KNN algorithm diagram.

Implementation of LOF-ENN-KNN integrated model

First, the LOF algorithm is used to eliminate outliers from the original data set: 4 groups of no-rockburst data, 6 groups of light rockburst data, 6 groups of moderate rockburst data and 14 groups of strong rockburst data are removed, leaving 269 groups. The ENN algorithm is then used for undersampling, with the class-size threshold triggering implicit oversampling. Finally, 934 groups of data are obtained. The distribution of each parameter is shown in Figs. 9, 10 and 11, and the data characteristics are listed in Table 4 below.

Fig. 9

Pie chart of rockburst grade distribution after data processing.

Table 4 Data characteristics of each parameter after database processing.
Fig. 10

Rain cloud map of the characteristic variables after sample processing.

Fig. 11

The scatter matrix diagram between parameters after data processing (“RL” is “Rockburst Level”).

Comparing Fig. 1 (raw data) with Fig. 10 (processed data), and Fig. 4 (raw data) with Fig. 11 (processed data), it can be clearly observed that LOF and ENN preprocessing markedly improve the data distribution. Specifically: (a) the multi-peak distributions of σc and σct in the original data (Fig. 1) become close to single peaks after preprocessing (Fig. 10); (b) the right-skewed long tail of σθc (Fig. 1) is significantly shortened (Fig. 10); (c) the aggregation of sample points within each rockburst grade is enhanced (for example, the distribution of the Wet index for the strong rockburst grade is more concentrated, Fig. 10); (d) the overlap between classes is reduced (for example, light and moderate rockburst are better separated in the σθ dimension). Compared with the dispersed sample points and high inter-class overlap before processing, the data set optimized by LOF and ENN shows tighter clustering, and the decision boundaries between categories in the feature space are clearer. This structural optimization effectively reduces noise interference, narrows the within-class sample distances and improves between-class separation. The more compact data distribution not only enhances the discriminability of the feature representation but also improves the stability and predictive confidence of the classification system by reducing the risk of local optima during model training.

The processed data set was stratified-sampled, with 70% used as the training set and 30% as the test set to evaluate the reliability and generalization ability of the model. Using five-fold cross-validation, the data are fed into the KNN algorithm for prediction and verification and compared with the BNs, SVM, LR, DT and DNN algorithms; classification performance is evaluated to find the optimal model. The specific flow chart is shown in Fig. 12.
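A minimal sketch of this evaluation protocol; the BNs and DNN models are approximated here with scikit-learn's GaussianNB and MLPClassifier, which is an assumption about the implementations used in the study.

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stratified 70/30 split of the LOF-ENN processed data.
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.30, stratify=y_bal, random_state=42)

candidates = {
    "LOF-ENN-KNN": KNeighborsClassifier(n_neighbors=1),
    "LOF-ENN-SVM": SVC(),
    "LOF-ENN-LR": LogisticRegression(max_iter=1000),
    "LOF-ENN-DT": DecisionTreeClassifier(random_state=42),
    "LOF-ENN-NBs": GaussianNB(),
    "LOF-ENN-DNN": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=42),
}

for name, clf in candidates.items():
    model = make_pipeline(StandardScaler(), clf)
    cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()
    test_acc = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: 5-fold CV accuracy = {cv_acc:.4f}, test accuracy = {test_acc:.4f}")

# Keep the KNN pipeline for the metric and engineering-verification sketches below.
best_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1)).fit(X_train, y_train)
```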

Fig. 12

Model implementation flow chart.

Results and accuracy analysis

Evaluation index system

Macro Accuracy, Macro Precision, Macro Recall, Macro F1-Score (hereinafter Accuracy, Precision, Recall and F1-Score) and the Kappa coefficient are commonly used to evaluate the classification performance of a model17,39,40,41. The specific calculation formulas are as follows (where C = 4 is the number of categories and TPc denotes the number of true positives for class c):

$$Accuracy=\frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{TP_c+FP_c+FN_c}$$
(6)
$$Precision=\frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{TP_c+FP_c}$$
(7)
$$Recall=\frac{1}{C}\sum_{c=1}^{C}\frac{TP_c}{TP_c+FN_c}$$
(8)
$$F1\text{-}Score=\frac{1}{C}\sum_{c=1}^{C}\frac{2\times Precision_c\times Recall_c}{Precision_c+Recall_c}$$
(9)
$$Kappa=\frac{P_0-P_e}{1-P_e}$$
(10)
$$P_0=\frac{TP+TN}{TP+TN+FN+FP},\qquad P_e=\frac{(TP+FP)(TP+FN)+(FN+TN)(FP+TN)}{(TP+FN+FP+TN)^2}$$
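These indices can be computed with scikit-learn; a minimal sketch for the test-set predictions of `best_model` from the previous sketch. Note that Eq. (6) as written averages TPc/(TPc + FPc + FNc), which scikit-learn exposes as the macro-averaged Jaccard score, so the usual overall accuracy is also reported for reference.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, jaccard_score)

y_pred = best_model.predict(X_test)

metrics = {
    "Accuracy (Eq. 6, macro Jaccard)": jaccard_score(y_test, y_pred, average="macro"),
    "Overall accuracy": accuracy_score(y_test, y_pred),
    "Precision (macro)": precision_score(y_test, y_pred, average="macro"),
    "Recall (macro)": recall_score(y_test, y_pred, average="macro"),
    "F1-Score (macro)": f1_score(y_test, y_pred, average="macro"),
    "Kappa": cohen_kappa_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```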

The different algorithms are combined into eight models: KNN (K = 7), LOF-KNN (K = 7), SMOTE-KNN (K = 7), ADASYN-KNN (K = 7), ENN-KNN (K = 7), LOF-SMOTE-KNN (K = 7), LOF-ENN-KNN (K = 1) and LOF-ADASYN-KNN (K = 7). The results in Fig. 13 show that the LOF-ENN-KNN model achieves an overwhelming advantage on all evaluation indicators. Compared with the original model and the single-resampling models, the integrated model achieves a qualitative leap in prediction performance through its three-stage processing framework. On the test set, LOF-ENN-KNN obtains Accuracy = 0.9893, Precision = 0.9935, Recall = 0.9812, F1-Score = 0.9870 and Kappa = 0.9847, a marked improvement over the benchmark models. Notably, the model optimizes precision and recall while maintaining high accuracy, and its accuracy is 28% higher than that of the single SMOTE model.

The horizontal comparison (Fig. 13) shows that, although the LOF-ADASYN model raises the recall to 0.6688 through adaptive oversampling, a risk of overfitting remains. LOF-SMOTE achieves a Kappa coefficient of 0.6019 and removes some noise via the LOF algorithm, but its boundary-blurring synthesis strategy degrades performance on the test set.

Fig. 13

Prediction performance index of each model.

Further examination of the confusion matrices (Fig. 14) shows that the performance differences among the combination methods follow a systematic pattern. The KNN benchmark model shows serious class imbalance in its classifications: the recall of no rockburst and moderate rockburst is as high as 80%, while that of light and strong rockburst is below 50%. This confirms the inherent weakness of traditional methods on imbalanced data.

The improvement from individual methods is clearly differentiated. The LOF algorithm raises the recall of light rockburst to more than 60%, but the recall of strong rockburst remains below 50%. The SMOTE method, because of the distribution bias of its synthetic samples, misclassifies nearly half of the moderate rockburst cases as other grades. Notably, the ADASYN model achieves a relatively high recall thanks to its adaptive sampling. The nonlinear effects of the hybrid models are also significant: although LOF-ADASYN produces only 2 false positives for no rockburst, the other grades are seriously misjudged, while LOF-SMOTE has relatively even recall across grades but a significant number of false positives. These findings reveal the negative coupling effects that simple technique stacking may cause.

In contrast, the LOF-ENN-KNN model shows excellent balance in the confusion matrix: the recalls of no rockburst and light rockburst are both 100%, and the recalls of moderate and strong rockburst are both above 90%. The model not only leaps ahead of the basic model in accuracy but also establishes an overwhelming advantage in classification balance. Its resistance to overfitting shows it to be a high-quality rockburst classification solution.

Fig. 14

Model confusion matrix diagram.

Comparison of models and analysis of results

The LOF-ENN-KNN model is compared with LOF + ENN combined with logistic regression (LR), support vector machine (SVM), decision tree (DT), Bayesian classifier (BNs) and deep neural network (DNN). Rockburst prediction is carried out on the same data set, and model performance is evaluated by five-fold cross-validation. As shown in Table 5, the LOF-ENN-KNN model shows significant advantages: its accuracy (98.93%) and Kappa coefficient (0.9847) are clearly higher than those of the other models, indicating the best overall performance on the rockburst classification task. The decision tree model (LOF-ENN-DT) ranks second with an accuracy of 97.51%, but its precision (97.47%) and F1-score (97.68%) are 2.88 and 1.02 percentage points lower than those of KNN, respectively, reflecting mild overfitting during decision-boundary construction. The intermediate performance of the deep neural network (LOF-ENN-DNN, 83.96% accuracy) reveals the relative advantage of traditional machine learning methods under limited sample sizes.

The overall disadvantage (accuracy < 60%) of the linear models (LOF-ENN-LR/SVM) and naive Bayes (LOF-ENN-NBs) indicates the nonlinear nature of the rockburst parameters. In particular, the significant gap between the recall (53.19%) and precision (62.86%) of LOF-ENN-LR indicates that a linear decision boundary struggles to reconcile the classification requirements of the majority and minority classes. Notably, although all models adopt the same preprocessing framework, the recall of KNN (98.12%) is 15.96 percentage points higher than that of the DNN (82.16%), confirming that local similarity measures fit the locally aggregated character of geological parameters better than global feature mappings.

The core of the performance differences lies in how well each classifier adapts to the feature space. KNN effectively captures the local structural features of the cleaned data through neighborhood adjustment (K-value optimization), whereas the hard-partitioning strategy of the decision tree maintains high accuracy but tends to accumulate errors in regions with complex feature interactions. The DNN is limited by the size of the training data and by the change in data distribution caused by preprocessing, so its deep nonlinear advantages are not fully realized. The comparison confirms that classifiers based on geometric neighborhood relations are more practical than complex models for medium-scale rockburst data sets.

Table 5 Different model classification evaluation performance table.

Engineering application verification

To facilitate engineering application, this study collected 15 groups of data from the literature41 (Table 6), expanded the sample size to 46 groups through the LOF-ENN algorithm, and used the different models for prediction. The results (Fig. 15) show that the LOF-ENN-KNN algorithm predicts all cases correctly; the accuracies of LOF-ENN-LR, LOF-ENN-SVM and LOF-ENN-DT are above 90%, while the accuracies of the BNs and DNN algorithms are below 85%. This indicates that the model can effectively predict rockburst occurrence in engineering practice and meets practical engineering accuracy requirements.
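A minimal sketch of this verification step, assuming the 15 cases of Table 6 are stored in a CSV file with the same six feature columns (the file name and column names are hypothetical); for simplicity the sketch predicts the raw cases directly and omits the LOF-ENN expansion described above.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

features = ["sigma_theta", "sigma_c", "sigma_t", "stress_ratio", "brittleness", "Wet"]

# Hypothetical file holding the 15 engineering cases from Table 6.
site = pd.read_csv("engineering_cases.csv")
X_site = site[features].to_numpy()
y_site = site["rockburst_level"].to_numpy()

y_site_pred = best_model.predict(X_site)
print("engineering-case accuracy:", accuracy_score(y_site, y_site_pred))
```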

Fig. 15

The accuracy of each model to predict engineering examples.

Table 6 Specific data of data group used in engineering verification.

Discussion

The LOF-ENN-KNN model proposed in this study improves rockburst prediction by addressing the limitations of traditional models. The LOF algorithm effectively removes the data dispersion caused by errors or anomalies, and ENN corrects the class distribution, balances the samples and resolves boundary ambiguity. In addition, the KNN classifier, which uses local feature similarity to capture the complex clustering patterns of rock mechanics data, outperforms global models and achieves higher accuracy. However, the model depends heavily on hyperparameter optimization and needs to be adjusted for data sets from different regions. Moreover, the model relies on complete parameter availability: missing parameters or ultra-deep excavation conditions are likely to reduce prediction reliability. Future work should therefore focus on integrating Bayesian optimization and meta-heuristic algorithms for automatic hyperparameter optimization, and on developing transfer learning or embedding layers to handle incomplete data, using a global rockburst database for unsupervised pre-training.

Conclusion

In this study, six characteristic variables (σθ, σc, σt, σθc, σct, Wet) are selected as input variables and the rockburst grade as the output variable, and an innovative rockburst prediction model based on LOF-ENN-KNN is proposed. The conclusions are as follows.

(1) Noise samples are effectively eliminated by local outlier factor detection (LOF), the class distribution is optimized by the edited nearest neighbour rule (ENN), and high-precision prediction is finally achieved by the K-nearest neighbor algorithm (KNN). The prediction accuracy is 98.93%, with good stability, applicability and versatility.

(2) A comparative study of different sampling methods and different combinations with the LOF algorithm shows that single resampling techniques (SMOTE, ADASYN) or simple technique stacking (LOF-SMOTE, LOF-ADASYN) tend to introduce overfitting or negative coupling effects, whereas LOF-ENN-KNN significantly improves the robustness and generalization ability of the model through its modular design.

(3) Compared with the LR, SVM, DT, NBs and DNN algorithms, the prediction performance of this model is clearly superior, demonstrating that local similarity measures fit the locally aggregated character of rock mechanics parameters better than global feature mappings.

(4) The prediction results are consistent with field observations. The prediction accuracy of the model on the engineering cases is 100%, showing significant engineering application value.