Introduction

Mine safety is a critical issue within the global coal industry, particularly in coal-rich nations such as China. As an essential component of coal mine safety management, the prediction of mine water inflow plays a vital role in preventing water-related accidents, safeguarding the lives of miners, and ensuring the economic viability of coal mining operations1. In karst regions such as southwest China, coal mining frequently encounters complex groundwater conditions that pose a dual threat: they jeopardize the stability of the mine, and they risk flooding when the volume of water entering the mine surpasses the drainage system’s capacity. This risk is closely linked to the rationality of the mining plan and the design of the mine’s drainage capacity, both of which significantly influence safe production and contribute to serious environmental and safety challenges2,3,4. For example, a coal mine in Shanxi, China, suffered five deaths and a direct economic loss of 13.82 million yuan due to a roof water surge resulting from the improper arrangement of the comprehensive mining face5. Similarly, an accident at a coal mine in western Ghana resulted in four fatalities and a direct economic loss of 17.11 million yuan6. Additionally, a mine in Sudan, Africa, experienced a severe water surge accident, leading to the deaths of 21 individuals and an economic loss of up to 70.672 million yuan7,8. Therefore, accurate prediction of mine water inflow is crucial for safe production in coal mines, the formulation of reasonable drainage measures, and the prevention of water damage accidents9,10.

Currently, various technical methods have been developed to predict mine water inflow. These methods can be broadly classified into two categories: deterministic analysis methods and uncertainty analysis methods11,12. Deterministic methods necessitate precise information regarding the geological conditions of the mine, as well as accurate hydrogeological parameters. Representative methods include the water balance method, numerical simulation method, analytical method, and support vector machine, among others13,14,15,16,17,18,19,20,21,22,23,24,25.

The contribution of these methods to safe mining is undeniable. For instance, Yang et al.26 employed the singular spectrum technique to analyze the time series data of mine water flow and developed a Matlab-based prediction model for mine water flow. Similarly, Chen et al.27 established a Monte Carlo-based discrete fractured rock media model to predict coal bed water inflow, while Xu et al.28 created an optimized GM(1,1) model for predicting mine water inflow. However, each method has its own limitations. For instance, the water balance method faces challenges in determining the balance equation and generalizing the boundary conditions29; similarly, the regression analysis method struggles with identifying relevant influencing factors and requires a substantial amount of computational data30,31. Additionally, the time series method relies on analyzing the patterns of event development and predicting data randomness over a specified period; however, it often exhibits low predictive accuracy when applied to long sequences of data32,33,34,35,36.

As global demand for mineral resources continues to rise and the depth and scale of underground mining expand, traditional single mine water inflow prediction methods frequently fall short of providing effective reference values. This inadequacy stems from their simplified assumptions and challenges in capturing the complex dynamic characteristics of mine water inflow. Consequently, these methods are often unable to accurately predict actual inflow data, leading to biased or even erroneous results37,38,39. In karst areas characterized by complex geological conditions, developed underground water systems, intricate karst conduits, and easily penetrable water-conducting fractures, traditional methods exhibit significant limitations and uncertainties in practical application. Consequently, in recent years, numerous scholars worldwide have undertaken research to enhance the prediction accuracy and time efficiency of inflow prediction models. This research often involves the integration of two or more prediction methods to leverage their respective advantages and mitigate their weaknesses. However, despite these advancements, new inflow prediction methods continue to encounter various challenges40,41,42.

For example, Wu et al.43 and Zhang et al.44 established a neural network grey model. However, this model requires large sample sizes and stable time series data, and the large number of parameters in its nonlinear data relationships can be difficult to determine. Meenal et al.45 developed a support vector regression (SVR) method based on machine learning algorithms, yet its practical application is highly dependent on the setting of hyperparameters46. Subasi et al.47 introduced a particle swarm optimization algorithm as an alternative to traditional methods for optimizing support vector machines, resulting in the PSO-SVR prediction model. Nevertheless, this model requires the specification of initial parameters, and the selection of these parameters can significantly influence the model’s performance while also presenting high computational complexity48,49.

On the other hand, the non-stationarity of the data presents a significant challenge to these combined prediction models when applied in real-world environments50,51. Indeed, when confronted with fundamentally non-stationary data, there are considerable drawbacks to employing these methods to process the entire dataset. The decomposition of temporal data that informs the model training process can result in boundary effects, which diminish the model’s accuracy52,53,54. Furthermore, many wave prediction models struggle to address the extreme non-linear challenges that emerge as the distance and depth of coal mining increase55,56. Consequently, current prediction models fail to meet the demand for finer-grained predictions, leading to a scarcity of effective tools for leveraging the observations.

To address the challenges of parameter selection in nonlinear data, as well as the data instability and extreme dynamic changes characteristic of time-series data in water inflow prediction, we integrate the nonlinear fitting capabilities of the BP neural network with the time-series analysis strengths of the ARIMA model. This approach results in a reliable, widely applicable, and high-precision dynamic combination model for predicting water inflow. We validate the model using water inflow data obtained from underground monitoring at the Longfeng coal mine, a typical mine in Guizhou Province. Furthermore, to verify the accuracy of the proposed model, we compare its performance against traditional prediction models, with the goal of enhancing the accuracy and reliability of water inflow predictions. This research provides a more scientific and precise tool for predicting coal mine water inflow, thereby offering substantial technical support for coal mine safety operations and water damage prevention. Such advancements are crucial for improving coal mine safety management, reducing water-related accidents, safeguarding miners’ lives, and promoting the sustainable development of the coal industry.

Materials and methods

Overview of the study area

The coal mine is situated in the southwestern part of Jinsha County, where coal seam 9 is extracted, with a designed production capacity of 1.2 million tonnes per year. The mine is approximately 14.5 km from the county town in a straight line and falls under the administrative jurisdiction of Anluo Township, Xinhua Township, and Chengguan Township in Jinsha County, as well as Qianxi County Re-town. A traffic location map of the study area is presented in Fig. 1. The geographical coordinates are as follows: east longitude 106°07′30″ to 106°15′00″ and north latitude 27°14′45″ to 27°24′45″. The field measures approximately 7.0 to 11.5 km in length and 2.5 to 9.5 km in width, covering an area of 91.6633 km². The Longfeng Field is located on the northwest flank of the Pingba syncline, characterized by generally monoclinal tectonics. The strata predominantly dip toward 120° to 160° (dip direction), locally toward 40° to 70°, with an overall dip angle of 4° to 12°. The region experiences a northern subtropical humid monsoon climate, with the highest average monthly temperature reaching 22.9 °C and the lowest average monthly temperature dropping to 3.8 °C. The annual average temperature ranges from 12.5 °C to 16.5 °C, and the average frost-free period spans approximately 275 days. The maximum average monthly rainfall is 169.2 mm, while the minimum average monthly rainfall is 20.2 mm, resulting in an average annual rainfall of 1050 mm. Part of the atmospheric precipitation replenishes surface waters as surface runoff, while another portion infiltrates to recharge various aquifers, thereby forming underground runoff.

In the process of mining and extracting coal, the primary water sources include the thin-seam tuff located on the roof of the coal seam, the Changxing aquifer, which predominantly yields water in the form of fissure water, and locally, the Yulong Mountain tuff. The water richness of the thin-seam tuff and the Changxing tuff is relatively low, whereas the Yulong tuff exhibits a significantly higher water richness; however, there is no direct hydraulic connection with the strong aquifer. The average water inflow into the mine is 30.37 m³/h, with a maximum monthly inflow of 76 m³/h.

Fig. 1 Traffic location map.

Data source

The observation records of mine water influx at Longfeng coal mine from January 2020 to February 2023, analyzed together with atmospheric precipitation data, mine return, and mine inflow statistics, reveal a notable relationship (Fig. 2). Initially, the mine water influx increased with the length of roadway development. However, after reaching a certain threshold, the influx began to stabilize despite ongoing roadway development, indicating that the relationship between these variables had become less pronounced. Notably, in November 2021, when mining at the 1903 working face commenced, there was a significant increase in normal water inflow, suggesting a correlation between mine water influx and the mining activities at the working face. It is anticipated that once the mine water influx reaches a certain level, it will stabilize again as mining at the working face continues. Furthermore, mine water inflow is influenced by atmospheric rainfall, with seasonal variations playing a critical role. Specifically, mine water inflow is markedly higher during the wet season, when heavy rainfall and the maximum daily precipitation occur, than during the dry season. Generally, the mine water inflow is responsive to changes in atmospheric precipitation, typically increasing in tandem with or slightly lagging behind precipitation events.

Fig. 2 Relationship between influencing factors of surge volume.

In this study, the observed data on mine flow from January 2020 to August 2022 (Table 1), influenced by various factors such as rainfall, reworking, and seasonal variations, were utilized as training samples for model prediction. In contrast, the measured data on mine water inflow from September 2022 to February 2023 were used as test samples to compare with the model prediction results.

Table 1 Monthly mine water inflow from January 2020 to February 2023.

BP neural network model

The BP neural network, or Back-Propagation Network, typically refers to the back-propagation neural network algorithm57. The standard BP neural network model consists of a fixed input layer, an output layer, and a variable number of hidden layers, with the latter adjusted based on the network’s error. This design aims to enhance the model’s accuracy. The core algorithmic process of the model can be divided into two main stages: forward propagation of the signal and back propagation of the error. Through the iterative adjustment of connection weights between the input nodes and the hidden layer nodes, as well as between the hidden layer nodes and the output nodes, along with modifications to the threshold values, the network learns and trains until it identifies the connection weights and threshold values that minimize the error. This process leads to a gradient reduction in the error58. Neural networks possess robust multi-input and multi-output capabilities, parallel computing capabilities, non-linear fitting capabilities, and high fault tolerance, making them widely applicable across various fields for managing both qualitative and quantitative knowledge59,60,61. The flow of the BP neural network model is illustrated in Fig. 3.

The input and output of each input-layer node of the BP model satisfy \(O_j = X_j\). The hidden and output layers satisfy the following relationship (Eq. 1):

$$O_j = f_j\left(Net_j\right) = f_j\left(\sum_i W_{ij} X_i + \theta_j\right)$$
(1)

where \(f_j\) is the activation (excitation) function of neuron j; the sigmoid function is the most commonly used choice (Eq. 2):

$$f\left(x\right)=\frac{1}{1+{e}^{-x}}$$
(2)

Here \(\theta_j\) is the threshold of neuron j, \(X_i\) is the i-th input to neuron j, and \(W_{ij}\) is the connection weight between neuron j and input i.
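
The forward-propagation stage described by Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the network used in the study: the function names, the single hidden layer, and the `(weights, theta)` layer representation are assumptions made for the example, and no training (error back propagation) is shown.

```python
import math

def sigmoid(x):
    """Logistic activation of Eq. (2)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, theta):
    """Eq. (1): weighted sum of the inputs plus the threshold, passed through the sigmoid."""
    net = sum(w * x for w, x in zip(weights, inputs)) + theta
    return sigmoid(net)

def forward(inputs, hidden_layer, output_layer):
    """Forward pass through one hidden layer (an assumed architecture).

    Each layer is a list of (weights, theta) pairs, one pair per neuron.
    """
    hidden = [neuron_output(inputs, w, t) for w, t in hidden_layer]
    return [neuron_output(hidden, w, t) for w, t in output_layer]
```

During training, the weights and thresholds passed to `forward` would be adjusted iteratively to reduce the output error; that step is omitted here.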

Fig. 3 Flow chart of BP neural network model.

ARIMA(p, d, q) model

ARIMA(p, d, q) (Autoregressive Integrated Moving Average) is a widely utilized time series forecasting model, noted for its broad applicability and high forecasting accuracy. This model can effectively predict trends based on a given dataset62,63. The ARIMA model integrates the autoregressive moving average (ARMA) model with differencing. The process begins with a d-order differencing operation applied to the non-stationary time series, resulting in a stationary series. Subsequently, the ARMA model is employed to fit this stationary series, thereby addressing the challenges associated with forecasting non-stationary time series. Here, ‘AR’ refers to the autoregressive component, ‘I’ denotes differencing (integration), ‘MA’ signifies the moving average component, ‘p’ represents the number of autoregressive terms, ‘q’ indicates the number of moving average terms, and ‘d’ denotes the number of differencing operations performed to convert the non-stationary time series into a stationary one64. The model is fitted iteratively, and an information criterion function is used to determine the orders p and q. The general form of the model is presented in Eq. (3):

$$Y_x=c+\alpha_1 Y_{x-1}+\dots+\alpha_p Y_{x-p}+e_x+\beta_1 e_{x-1}+\dots+\beta_q e_{x-q}$$
(3)

where \(Y_x\) is the stationary (differenced) time series; c is a constant; \(\alpha_1,\dots,\alpha_p\) and \(\beta_1,\dots,\beta_q\) are the autoregressive and moving average coefficients, respectively; and \(e_x\) is the white noise series.
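
To make the two ingredients of Eq. (3) concrete, the sketch below shows d-order differencing and a one-step-ahead evaluation of the ARMA recursion with given coefficients. It is an illustration under stated assumptions: the function names are hypothetical, the coefficients are supplied by the caller rather than estimated, and the future shock is set to its expected value of zero.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing to a series."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def arma_one_step(y, e, c, alpha, beta):
    """One-step-ahead value of Eq. (3), with the unknown future shock taken as zero.

    y     : past values of the stationary series, most recent last
    e     : past residuals (shocks), most recent last
    alpha : autoregressive coefficients alpha_1..alpha_p
    beta  : moving average coefficients beta_1..beta_q
    """
    ar = sum(a * y[-k] for k, a in enumerate(alpha, start=1))
    ma = sum(b * e[-k] for k, b in enumerate(beta, start=1))
    return c + ar + ma
```

In practice the coefficients would be estimated by a fitting routine, with the orders chosen through an information criterion; the functions above only evaluate the model form.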

Basic idea of the BP-ARIMA model

The BP-ARIMA model is a forecasting approach that integrates the BP (Backpropagation) neural network model with the ARIMA (Autoregressive Integrated Moving Average) model, leveraging the strengths of both methodologies. The BP neural network is capable of capturing complex nonlinear relationships due to its extensive mapping capabilities, while the ARIMA model effectively addresses linear relationships and seasonality within time series data. In this study, the BP neural network model is employed to model and predict the normal inflow measured data from January 2020 to June 2022, resulting in the generation of a residual sequence. Subsequently, the ARIMA model is applied to refine the residual sequence, thereby enhancing the accuracy of the overall model predictions. The flowchart illustrating the integrated BP-ARIMA model is presented in Fig. 4, and the specific steps involved are outlined as follows:

Step 1 The measured surge data is taken as the original data series \(\:{x}^{\left(0\right)}\).

$$\:{x}^{\left(0\right)}=\{{x}^{\left(0\right)}\left(1\right),{x}^{\left(0\right)}\left(2\right),\cdots\:,{x}^{\left(0\right)}\left(n\right)\}$$
(4)

Step 2 Perform the prediction of the BP neural network model on the original data \(\:{x}^{\left(0\right)}\) to obtain the predicted sequence under the BP neural network model.

$$\:{\widehat{x}}^{\left(0\right)}=\{\widehat{x}{\:}^{\left(0\right)}\left(1\right),\widehat{x}{\:}^{\left(0\right)}\left(2\right),\cdots\:,{\widehat{x}}^{\left(0\right)}\left(n\right)\}$$
(5)

Step 3 Generate a relative residual sequence \(\:e\left(i\right)\) from the predicted sequence \(\:{\widehat{x}}^{\left(0\right)}\) and the original sequence \(\:{x}^{\left(0\right)}\).

$$e\left(i\right)=\frac{{x}^{\left(0\right)}\left(i\right)-{\widehat{x}}^{\left(0\right)}\left(i\right)}{{x}^{\left(0\right)}\left(i\right)},\quad i=1,2,\cdots,n$$
(6)

Step 4 Perform ARIMA(p, d, q) model prediction on the relative residual \(e\left(i\right)\) to obtain the predicted relative residual \(\widehat{e}\left(i\right)\).

Step 5 Set the surge prediction value \(\bar{x}^{\left(0\right)}\left(i\right)\) to:

$$\bar{x}^{\left(0\right)}\left(i\right)=\frac{{\widehat{x}}^{\left(0\right)}\left(i\right)}{1-\widehat{e}\left(i\right)}$$
(7)
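
Steps 3 and 5 are simple element-wise operations, and a short Python sketch may help show that Eq. (7) is the algebraic inverse of Eq. (6). The function names are hypothetical and the numbers in the usage note are illustrative values, not data from the mine. The BP and ARIMA predictions themselves are assumed to come from elsewhere.

```python
def relative_residuals(measured, bp_pred):
    """Eq. (6): e(i) = (x(i) - x_hat(i)) / x(i). Measured values must be non-zero."""
    return [(x - xh) / x for x, xh in zip(measured, bp_pred)]

def combine(bp_pred, residual_pred):
    """Eq. (7): corrected inflow x_bar(i) = x_hat(i) / (1 - e_hat(i))."""
    return [xh / (1.0 - eh) for xh, eh in zip(bp_pred, residual_pred)]
```

For example, with hypothetical measured values `[30.0, 40.0]` and BP predictions `[27.0, 44.0]`, the relative residuals are approximately `[0.1, -0.1]`. If the ARIMA step predicted those residuals exactly, `combine` would return the measured values again, so the quality of the combined forecast depends on how well the residual series can be forecast.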

Fig. 4 Flow chart of the combined idea of the BP-ARIMA model.

Large well method

As the water level in the mine gradually decreases and stabilizes, a cone of depression centered on a large well will form. Furthermore, the flow field resulting from mine dewatering and drainage is analogous to the well flow field generated by pumping tests65. The water influx caused by coal seam mining can therefore be regarded as a steady flow that is initially confined (under pressure) during the pumping phase and subsequently becomes unconfined following depressurization and drainage. Consequently, the confined-to-unconfined (pressure-to-no-pressure) formula of the large well method is considered the most appropriate for estimating the mine water influx Q.

$$\:Q=1.866\frac{\text{K}\left(2\text{S}\text{M}-{\text{M}}^{2}-{\text{h}}_{0}^{2}\right)}{\text{l}\text{g}\frac{{\text{R}}_{0}}{{\text{r}}_{0}}}$$
(8)

In practice, the mining dynamic water level tends to fall to the bottom of the roadway, which can be considered as h0 = 0. Therefore, the formula can be simplified as:

$$Q=1.866\frac{K\left(2SM-{M}^{2}\right)}{\lg\frac{{R}_{0}}{{r}_{0}}}$$
(9)
$$\:R_{0} = r_{0} + 10 S\sqrt K$$
(10)
$$\:{r}_{0}=\sqrt{\frac{A}{\pi\:}}$$
(11)

In this context, K represents the permeability coefficient (m/h); S denotes the head height (m), which is also taken as the maximum water level difference in the mine; M indicates the thickness of the aquifer (m); R0 is the reference radius of influence of mine drainage (m); r0 is the reference radius of the mine (m); and A is the area of the mine (m²).
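
Equations (9) to (11) chain together directly, which the following Python sketch illustrates. The function name and the argument values in the usage note are hypothetical. The leading coefficient is exposed as a parameter whose default follows Eq. (9) as printed; textbook statements of this confined-to-unconfined formula usually write the constant as 1.366 (approximately π/ln 10), so the value is worth checking against the source used. Units follow the text and are not converted.

```python
import math

def large_well_inflow(K, S, M, A, coeff=1.866):
    """Mine water inflow Q from Eqs. (9)-(11).

    K     : permeability coefficient
    S     : head height / drawdown (m)
    M     : aquifer thickness (m)
    A     : mine area (m^2)
    coeff : leading constant, defaulting to the value printed in Eq. (9)
    """
    r0 = math.sqrt(A / math.pi)          # Eq. (11): reference radius of the mine
    R0 = r0 + 10.0 * S * math.sqrt(K)    # Eq. (10): reference radius of influence
    return coeff * K * (2.0 * S * M - M ** 2) / math.log10(R0 / r0)  # Eq. (9)
```

As a check with made-up inputs, `K = 0.01`, `S = 100`, `M = 20` and an area giving `r0 = 100` m yield `R0 = 200` m, so the denominator is lg 2 and the numerator is `coeff * 0.01 * 3600`.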

GM(1,1) grey model

The grey forecasting model serves as a tool for predicting continuous changes in variables within a system. By cumulatively processing data and treating the process as a time-dependent grey process, the model effectively reduces randomness. It elucidates the underlying principles of system development and facilitates quantitative predictions66. In the context of mine influx prediction, mine influx can be analyzed as a grey system due to the inherent uncertainty of influencing factors and the diversity of filling channels. The accuracy grades used to check the model are shown in Table 2, and the specific forecasting steps are as follows:

Table 2 GM(1,1) Grey Model Accuracy Checklist.

Step 1 Take the measured surge volume data as the raw data sequence \(\:{x}^{\left(0\right)}\).

$$\:{x}^{\left(0\right)}=\{{x}^{\left(0\right)}\left(1\right),{x}^{\left(0\right)}\left(2\right),\cdots\:,{x}^{\left(0\right)}\left(n\right)\}$$
(12)

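Step 2 Accumulate the raw sequence once to obtain the cumulative sequence \(\:{\widehat{x}}^{\left(0\right)}\), where \({\widehat{x}}^{\left(0\right)}\left(k\right)=\sum_{i=1}^{k}{x}^{\left(0\right)}\left(i\right)\).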

$$\:{\widehat{x}}^{\left(0\right)}=\{\widehat{x}{\:}^{\left(0\right)}\left(1\right),\widehat{x}{\:}^{\left(0\right)}\left(2\right),\cdots\:,{\widehat{x}}^{\left(0\right)}\left(n\right)\}$$
(13)

GM(1,1) model equation:

$$\:\frac{\text{d}{\widehat{x}}^{\left(0\right)}}{\text{d}t}+a{\widehat{x}}^{\left(0\right)}=u$$
(14)

Step 3 Solve for \(a\) and \(u\) by least squares:

$$\:C={\left[\begin{array}{c}a,u\end{array}\right]}^{\text{T}}={\left({B}^{\text{T}}B\right)}^{-1}{B}^{\text{T}}Y$$
(15)
$$\:B = \left[ {\begin{array}{*{20}l} { - 1/2\left( {\hat{x}^{{\left( 0 \right)}} \left( 1 \right) + \hat{x}^{{\left( 0 \right)}} \left( 2 \right)} \right)} & 1 \\ { - 1/2\left( {\hat{x}^{{\left( 0 \right)}} \left( 2 \right) + \hat{x}^{{\left( 0 \right)}} \left( 3 \right)} \right)} & 1 \\ {\: \vdots } & \vdots \\ { - 1/2\left( {\hat{x}^{{\left( 0 \right)}} \left( {n - 1} \right) + \hat{x}^{{\left( 0 \right)}} \left( n \right)} \right)} & 1 \\ \end{array} } \right];\:Y = \left[ {\begin{array}{*{20}l} {x^{{\left( 0 \right)}} \left( 2 \right)} \\ {x^{{\left( 0 \right)}} \left( 3 \right)} \\ \vdots \\ {x^{{\left( 0 \right)}} \left( n \right)} \\ \end{array} } \right]$$
(16)

Step 4 Solve the differential equation to obtain the predicted value of the cumulative sequence:

$${\widehat{x}}^{\left(0\right)}\left(k+1\right)=\left[{\widehat{x}}^{\left(0\right)}\left(1\right)-\frac{\widehat{u}}{\widehat{a}}\right]{e}^{-\widehat{a}k}+\frac{\widehat{u}}{\widehat{a}}$$
(17)

Step 5 Restore the prediction for the raw data by differencing consecutive cumulative predictions:

$$\:{x}^{\left(0\right)}\left(k+1\right)={\widehat{x}}^{\left(0\right)}\left(k+1\right)-{\widehat{x}}^{\left(0\right)}\left(k\right),k=1,2,\cdots\:,n-1.$$
(18)
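
The five steps above can be sketched compactly in Python. This is a minimal illustration rather than the implementation used in the study: the function names are hypothetical, the least-squares solution of Eqs. (15) and (16) is written out through the 2×2 normal equations instead of a matrix library, and the time response uses the standard solution of Eq. (14), whose constant term is u/a.

```python
import math

def gm11_fit(x0):
    """Estimate a and u of Eq. (14) by least squares (Eqs. 15-16)."""
    # Step 2: one-time accumulation of the raw sequence
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Rows of B are (-z_k, 1), with z_k the mean of consecutive cumulative values
    z = [0.5 * (x1[k] + x1[k + 1]) for k in range(len(x1) - 1)]
    y = list(x0[1:])
    n = len(z)
    szz = sum(v * v for v in z)
    sz = sum(z)
    sy = sum(y)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = n * szz - sz * sz
    # Closed-form solution of (B^T B) [a, u]^T = B^T Y
    a = (sz * sy - n * szy) / det
    u = (szz * sy - sz * szy) / det
    return a, u

def gm11_predict(x0, steps):
    """Fitted values for the observed points followed by `steps` forecasts."""
    a, u = gm11_fit(x0)

    def cumulative(k):
        # Time response of the cumulative sequence; k = 0 is the first point
        return (x0[0] - u / a) * math.exp(-a * k) + u / a

    n = len(x0)
    # Step 5: difference the cumulative predictions to restore the raw scale
    return [x0[0]] + [cumulative(k) - cumulative(k - 1) for k in range(1, n + steps)]
```

Calling `gm11_predict(series, 6)` on a monthly inflow series returns the in-sample fit followed by six forecast values. The model assumes a roughly exponential trend, so it degrades when a is close to zero or when the series fluctuates strongly.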

Calculate model accuracy

To evaluate the performance of the models more objectively, along with their advantages and disadvantages, the Absolute Relative Error (ARE) index was selected as a measure of model accuracy. This metric quantifies the discrepancy between the predicted and true values by expressing the absolute difference as a percentage of the true value.

The formula is as follows:

$$\text{ARE}=\left|\frac{{\widehat{y}}_{t}-{y}_{t}}{{y}_{t}}\right|\times 100\%$$
(19)

where \({\widehat{y}}_{t}\) is the predicted value and \({y}_{t}\) is the true (measured) value. The absolute relative error is easy to interpret, is expressed relative to the measured value, and is sensitive to large errors, which allows for an assessment of the overall level of error of a predictive model across different samples.
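
The error measure of Eq. (19) can be computed together with the two summary statistics reported later in the comparison, the mean square error (MSE) and the goodness of fit R². The sketch below is an illustration with hypothetical function names; the text does not spell out its MSE and R² formulas, so the usual definitions are assumed here.

```python
def absolute_relative_errors(measured, predicted):
    """Eq. (19): |predicted - measured| / measured, in percent, per sample."""
    return [abs((p - m) / m) * 100.0 for m, p in zip(measured, predicted)]

def mean_square_error(measured, predicted):
    """Mean of squared prediction errors (assumed standard definition)."""
    return sum((p - m) ** 2 for m, p in zip(measured, predicted)) / len(measured)

def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed standard definition)."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot
```

Averaging the list returned by `absolute_relative_errors` gives the mean absolute relative error used to rank the models.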

Surge prediction results and discussion

BP-ARIMA model prediction results

Utilizing the nonlinear mapping characteristics of the BP neural network model, we predicted the inflow data over a total of 38 months, spanning from January 2020 to June 2022. The fitting results are presented in Table 6. However, the accuracy of the BP neural network model alone is not sufficiently high, so the residuals of the predicted values require further processing. Subsequently, we employed the ARIMA model to forecast the residual sequence, with the parameter list for the ARIMA(3,1,1) model given in Table 3.

Table 3 Parameters of the ARIMA(3,1,1) model.

For the relative residual series, the AIC information criterion identifies ARIMA(3,1,1) as the optimal model. The detailed formula of the model is:

$$y\left(x\right)=0.337-0.862\,y\left(x-1\right)-0.682\,y\left(x-2\right)-0.724\,y\left(x-3\right)+0.649\,e\left(x-1\right)$$
(20)

The predicted inflow values were then obtained from Eq. (7) using the forecast relative residuals, and are shown in Table 4:

Table 4 BP-ARIMA model prediction results.

BP neural network model prediction results

The measured data of the normal inflow from January 2020 to June 2022 were modeled using the BP neural network model. The parameter settings are presented in Table 5.

Table 5 Table of model parameter settings.

The prediction accuracy of the BP neural network model for the mine influx is presented in Table 6.

Table 6 BP neural network model prediction accuracy values.

As shown in Table 6, the goodness of fit (R²) for the time series is 0.88, while the Mean Square Error (MSE) is 8.05. These results indicate that the prediction outcomes of the BP model are relatively accurate, though the overall prediction accuracy remains moderate.

ARIMA model prediction results

The ARIMA model necessitates that the model residuals exhibit white noise characteristics, indicating the absence of autocorrelation. This can be assessed using the Q-statistic test. For the residuals to be considered white noise, the corresponding p-value should exceed 0.1; conversely, a p-value below this threshold suggests that the residuals do not conform to white noise. The model calculates the Q statistic, as presented in Table 7.

Table 7 Model calculation of Q-statistics.

The results of the Q statistic indicate that the p-value of Q exceeds 0.1. Consequently, the null hypothesis cannot be rejected at a significance level of 0.1, suggesting that the model’s residuals are white noise. Therefore, the model essentially satisfies the required conditions.
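
For readers who want to reproduce the white-noise check, the sketch below computes a Q statistic from a residual series. The text does not say which variant was used; the Ljung-Box form is assumed here, and the function name is hypothetical. The p-value is not computed: it would come from a chi-square distribution (for instance via `scipy.stats.chi2.sf`), with degrees of freedom reduced by the number of fitted ARMA parameters.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag (assumed variant of the Q test)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    acc = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        acc += rho_k ** 2 / (n - k)
    return n * (n + 2) * acc
```

A large Q (small p-value) indicates remaining autocorrelation in the residuals, in which case the model orders would need to be revisited.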

As shown in Table 8, the minimum accuracy of water inflow prediction using the ARIMA model is 70.65%, while the average accuracy is 88.56%. Additionally, the goodness of fit, represented by R², is 0.87, and the Mean Square Error (MSE) is 9.10.

Table 8 ARIMA model prediction accuracy values.

Results of the Big Well forecasting method

The analysis of hydrogeological conditions and water-filling factors in the mining area indicates that the fissure water within the Changxing tuff and the thin layer of tuff at the top of the Longtan formation serve as the primary water-filling aquifers for the 9-coal mining operation. Consequently, the water inflow parameters used in the estimate are derived from the pumping tests conducted in the Changxing tuff and the 9-coal roof aquifer. Five boreholes (B3909, B3509, BE1101, BE1302, and BE1401) were drilled to test the 9-coal roof aquifer, and the results of the pumping tests are detailed in Table 9.

Table 9 9-Coal roof pumping test table.

The depth of the water level drop in the aquifer, denoted as s, is calculated as the average of the results obtained from the pumping tests, in accordance with Eq. (21). The permeability coefficient, K, is determined as a weighted average based on the methodology outlined in Eq. (22).

$$\:s=\frac{{s}_{1}+{s}_{2}+{s}_{3}+{s}_{4}+{s}_{5}}{5}$$
(21)
$$\:K=\frac{{s}_{1}{K}_{1}+{s}_{2}{K}_{2}+{s}_{3}{K}_{3}+{s}_{4}{K}_{4}+{s}_{5}{K}_{5}}{{s}_{1}+{s}_{2}+{s}_{3}+{s}_{4}+{s}_{5}}$$
(22)

Where s is the mean water level drop depth of the aquifer, m; K is the mean permeability coefficient of the aquifer, m/d; si is the water level drop depth of different boreholes, m; Ki is the permeability coefficient of different boreholes, m/d.
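
Equations (21) and (22) amount to a plain mean and a drawdown-weighted mean, as the following sketch shows. The function names are hypothetical, and the functions accept any number of boreholes rather than exactly five.

```python
def mean_drawdown(s):
    """Eq. (21): arithmetic mean of the borehole drawdowns."""
    return sum(s) / len(s)

def weighted_permeability(s, K):
    """Eq. (22): permeability coefficient weighted by each borehole's drawdown."""
    return sum(si * ki for si, ki in zip(s, K)) / sum(s)
```

Weighting by drawdown gives boreholes with larger water level drops more influence on the averaged K than a simple mean would.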

From September 2022 to February 2023, during the mining of Coal 9, the data regarding the mining area, aquifer thickness, and associated mining height are presented in Table 10.

Table 10 Statistics of relevant parameters.

The mine water inflow predicted by the large well method is shown in Table 11.

Table 11 Prediction results of water inflow in the mine shaft using the big well method.

GM(1,1) grey model prediction results

According to the principles of the grey model and Eqs. (12)–(18), a prediction model for mine inflow has been established using Python, based on the measured data of normal mine inflow from January 2020 to June 2022. The calculation parameters of the model are presented in Table 12.

Table 12 Parameter values of the GM(1,1) grey model.

According to Table 12, the a posteriori difference ratio C-value is 0.244, which is less than 0.35, indicating that the model’s accuracy class is very good. Furthermore, the small error probability p-value is 0.895, which is less than 0.95, suggesting that the model’s accuracy is acceptable. The results of the mine influx prediction for the period from September 2022 to February 2023 are presented in Table 13.

Table 13 GM(1,1) grey model mine influx prediction results.
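
The posterior difference ratio C and the small error probability p quoted above can be computed from the measured and fitted series. The sketch below uses the definitions commonly given in the grey-model literature, which the text does not state explicitly, so they are assumptions: C is the ratio of the residual standard deviation to the data standard deviation, and p is the share of residuals whose deviation from the mean residual is below 0.6745 times the data standard deviation. Population standard deviations (dividing by n) and the function name are also assumptions.

```python
import math

def posterior_check(measured, fitted):
    """Return (C, p) for a fitted grey model, under the assumed standard definitions."""
    n = len(measured)
    resid = [m - f for m, f in zip(measured, fitted)]
    mean_x = sum(measured) / n
    mean_e = sum(resid) / n
    s1 = math.sqrt(sum((m - mean_x) ** 2 for m in measured) / n)  # spread of the data
    s2 = math.sqrt(sum((e - mean_e) ** 2 for e in resid) / n)     # spread of the residuals
    c = s2 / s1
    p = sum(1 for e in resid if abs(e - mean_e) < 0.6745 * s1) / n
    return c, p
```

The resulting values would then be compared with the grade thresholds, such as C below 0.35 and p above 0.95 for the best accuracy class mentioned in the text.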

Discussion of projected results

Based on the BP-ARIMA model established above, the prediction results were obtained and compared with those derived from the BP neural network model, the ARIMA model, the large well method, and the GM(1,1) grey model individually. The comparison is illustrated in Fig. 5. Additionally, the model’s goodness-of-fit, represented by R², and the mean square error (MSE) are presented in Fig. 6.

Fig. 5 Comparison of model prediction results.

Fig. 6 Comparison of model goodness of fit and mean square error plots.

The absolute relative errors between the predicted and measured mine water inflow sequences are compared for each model in Fig. 7.

Fig. 7 Absolute relative error values for each model.

Based on the mine inflow prediction results presented in Fig. 7 and the model accuracy comparison in Table 14, the following findings emerged:

Table 14 Longfeng Coal Mine January-December 2024 mine influx forecast values.
(1) A comparison of the BP-ARIMA model with the other prediction methods yielded the following mean absolute relative error, goodness of fit (R²), and mean square error (MSE) values: BP model, 5.662%/0.88/8.05; ARIMA model, 13.998%/0.87/9.10; BP-ARIMA model, 1.022%/0.93/0.49; large well method, 11.359%/0.81/9.96; and GM(1,1) model, 13.229%/0.69/11.06. Consequently, the BP-ARIMA model is the most effective at predicting mine influx, followed by the BP model, the ARIMA model, and the large well method, while the GM(1,1) grey model is the least effective.

(2) The BP-ARIMA model demonstrated superior accuracy compared to the standalone BP neural network model and the standalone ARIMA model. Its average absolute relative error is 1.022%, and its goodness-of-fit R² value reaches 0.93, a significant reduction in prediction error. Additionally, the predicted dynamic trend aligns closely with the actual data. These results indicate that the BP-ARIMA model effectively addresses both linear and nonlinear relationships, showing high flexibility and predictive capability. Moreover, it can perform dynamic predictions in the context of mine influx forecasting, thereby enhancing the overall accuracy of the model.

(3) The prediction results of the large well method generally align with the dynamic changes of the measured values; however, the calculated results tend to be higher than the measured values. This discrepancy can be attributed to several factors. First, the large well method is better suited to mining areas with simple hydrogeological conditions, while the hydrogeological conditions at Longfeng Coal Mine are complex, which limits the accuracy of this method. Second, the large well method relies on an idealized model that assumes the aquifer is homogeneous and isotropic. In contrast, the rock layers of the Longtan Formation and Changxing Greywacke are unlikely to be entirely homogeneous and isotropic. Because the permeability coefficient is calculated under these simplified conditions, the calculated inflow volumes deviate from the actual volumes of water inflow. Third, this method overlooks the hydraulic connection between the aquifer and the mine, which matters particularly at Longfeng Coal Mine, where the hydrogeological conditions are more intricate. Finally, the values of the permeability coefficient and aquifer thickness used in the prediction formula of the large well method are subject to human error, which can introduce parameter errors and generalization errors, ultimately reducing the accuracy of water inflow predictions.

  4. (4)

    The grey theory GM(1,1) prediction model builds a dynamic model, described by a differential equation, from the original mine water inflow data and thereby reveals the intrinsic pattern in the data. However, when the GM(1,1) model was used to predict water inflow, the results deviated significantly from the measured values. This discrepancy arises mainly from the nonlinear and highly complex structure of the water inflow data. Although the GM(1,1) model is effective for fitting and predicting trends, particularly when few data points are available, its accuracy depends strongly on the model parameters, the model structure, and the original data series, and it performs poorly when extreme values or nonlinear characteristics are present. The GM(1,1) model is therefore better suited to short-term prediction; as the prediction horizon lengthens, its error tends to increase.
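
To make the GM(1,1) procedure in point (4) and the error measures used in the comparison more concrete, a minimal Python sketch is given here. It is an illustration only, not the implementation used in this study: the function names are hypothetical, the demonstration series is synthetic rather than Longfeng Coal Mine data, and R² is assumed to be the usual coefficient of determination.

```python
import math
from typing import List, Sequence, Tuple


def gm11_fit(series: Sequence[float]) -> Tuple[float, float]:
    """Estimate the GM(1,1) development coefficient a and grey input b."""
    n = len(series)
    if n < 4 or any(v <= 0 for v in series):
        raise ValueError("GM(1,1) needs at least 4 strictly positive values")
    # First-order accumulated generating operation (1-AGO).
    x1: List[float] = []
    total = 0.0
    for v in series:
        total += v
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(series[1:])
    # Ordinary least squares for the grey equation x0(k) = -a * z(k) + b.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(p * q for p, q in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    intercept = (sy - slope * sz) / m
    return -slope, intercept


def gm11_predict(series: Sequence[float], steps: int) -> List[float]:
    """Return fitted values for the series followed by `steps` forecasts."""
    a, b = gm11_fit(series)
    n = len(series)
    if abs(a) < 1e-12:
        # Degenerate case (no growth or decay): the model reduces to a constant.
        return [series[0]] + [b] * (n + steps - 1)

    def x1_hat(k: int) -> float:
        # Time-response function of the whitened differential equation.
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse accumulation restores values on the original scale.
    return [series[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, n + steps)]


def mean_absolute_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic, smoothly growing series used purely for demonstration.
    demo = [100 * 1.05 ** k for k in range(8)]
    fitted = gm11_predict(demo, steps=3)
    print("forecasts:", fitted[len(demo):])
    print("in-sample MARE:", mean_absolute_relative_error(demo, fitted[:len(demo)]))
```

Because the time-response function is a single exponential, the sketch reproduces a smooth growth trend well but cannot follow the seasonal and nonlinear fluctuations described above, which is consistent with the large errors observed for GM(1,1) on the measured inflow series.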

Modelling applications

The established BP-ARIMA model was used to predict the mine water inflow of Longfeng Coal Mine from January to December 2024, and the prediction results are shown in Table 14.

Table 14 shows that the predicted mine water inflow for January to December 2024 has a mean of 88.54 m³/h and a maximum of 97.96 m³/h. These values exceed the existing measured inflow. As roadway excavation and working-face mining progress, the coal-bearing strata and their overlying layers crack, forming water-conducting fissure zones and increasing the permeability of the strata, so the normal water inflow into the mine gradually increases. Atmospheric precipitation, the primary indirect water source, also significantly influences the inflow, which varies seasonally: it is higher in the rainy season than in the dry season and generally rises in response to precipitation with a slight delay. To address the increase in water inflow, we integrated the BP-ARIMA model with the mine’s online monitoring system for continuous monitoring. This integration supports real-time data collection and analysis, allowing the model parameters to be adjusted and optimized continuously on the basis of the prediction results and actual events, which improves the accuracy and practicality of the predictions. We also improved the drainage system and monitoring protocols to strengthen mine water management and the ability to predict and respond to mine water disasters.

These measures include upgrading the mine drainage system with additional, more powerful pumps so that the mine workings remain dry and secure despite the rising inflow. The mine’s waterproofing facilities were comprehensively inspected and reinforced, with particular attention to the roadways and working faces, and the waterproofing layer was re-laid to reduce groundwater infiltration. A more rigorous hydrogeological monitoring program was also introduced, in which supplementary water level and flow monitoring equipment provides real-time monitoring of hydrological changes both inside and outside the mine. The collected data will be used to further optimize the mine’s drainage and contingency plans and to ensure a rapid and effective response in an emergency. Together, these measures have significantly improved the safety and stability of the mine and support its continued production.
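
The continuous adjustment described above, in which predictions are compared with newly monitored inflow and the model is revisited when errors grow, can be sketched as a walk-forward loop. The sketch below is an assumption-laden illustration rather than the system deployed at the mine: the moving-average forecaster is only a placeholder standing in for the BP-ARIMA model, and the function names, window sizes, and error threshold are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A forecaster maps the inflow history seen so far to a one-step-ahead prediction.
Forecaster = Callable[[Sequence[float]], float]


def moving_average_forecaster(window: int) -> Forecaster:
    """Placeholder model (hypothetical); a real system would call BP-ARIMA here."""
    def forecast(history: Sequence[float]) -> float:
        recent = history[-window:]
        return sum(recent) / len(recent)
    return forecast


def walk_forward(
    observations: Sequence[float],
    forecaster: Forecaster,
    min_history: int,
) -> List[Tuple[float, float, float]]:
    """Predict each new observation from earlier data only.

    Returns (predicted, actual, relative_error) for every step after the
    initial `min_history` observations.
    """
    history = list(observations[:min_history])
    results = []
    for actual in observations[min_history:]:
        predicted = forecaster(history)
        relative_error = abs(predicted - actual) / abs(actual)
        results.append((predicted, actual, relative_error))
        # The new measurement becomes available to the next forecast.
        history.append(actual)
    return results


def needs_retuning(
    results: Sequence[Tuple[float, float, float]],
    threshold: float,
    last_n: int,
) -> bool:
    """Flag the model for re-tuning when the recent mean relative error is too high."""
    recent = results[-last_n:]
    if not recent:
        return False
    return sum(r[2] for r in recent) / len(recent) > threshold
```

In use, each new reading from the monitoring system would be appended to the observation series, and a `True` result from `needs_retuning` would trigger re-estimation of the model parameters; the threshold would have to be chosen from the mine’s own tolerance for prediction error.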

Conclusion

  1. (1)

    This study combined the nonlinear mapping capability of a BP neural network with an autoregressive integrated moving average (ARIMA) model, which captures the linear and seasonal characteristics of a time series, to build a BP-ARIMA model from the water inflow data observed at Longfeng Coal Mine from September 2020 to August 2022. The accuracy of the model was then validated against the mine’s water inflow data from September 2022 to February 2023.

  2. (2)

    The BP-ARIMA model was used to forecast the water inflow of Longfeng Coal Mine. Its predictions gave a mean absolute relative error of 1.02%, a goodness of fit of 0.93, and a mean square error (MSE) of 0.49. Compared with the BP model, the ARIMA model, the big well method, and the grey theory GM(1,1) model applied to the same mine, the BP-ARIMA model showed lower errors and a higher goodness of fit. This improvement in prediction accuracy provides a scientific basis for the prevention and control of water damage in the mine.

  3. (3)

    The BP-ARIMA model was used to forecast the water inflow of Longfeng Coal Mine from January to December 2024. The average inflow is projected to reach 88.54 m³/h, with a maximum of 97.96 m³/h. Compared with the water inflow data from September 2020 to February 2023, the predictions indicate a gradual increase in water inflow. This increase is attributed primarily to the continuous expansion of the mine development area and the increasing depth of mining operations, although atmospheric precipitation and surface water are also significant factors.

  4. (4)

    As a method for predicting mine water inflow, the BP-ARIMA model significantly improves prediction accuracy over traditional approaches. However, it requires the training samples to exhibit particular nonlinear characteristics and seasonal patterns. Further optimization is needed before the model can be applied to a wider range of mine water inflow data and adapted to different inflow scenarios. With such optimization, the BP-ARIMA model can serve as a more robust reference for the prevention and control of mine water hazards, helping mining companies to prevent and respond to water inrush emergencies and thereby safeguarding the lives of miners and the production safety of mines.