Deep learning framework based on ITOC optimization for coal spontaneous combustion temperature prediction: a coupled CNN-BiGRU-CBAM model

Shao, Xuming; Liu, Wenhao; Bai, Gang; Chen, Yan; Liu, Yu; Guang, Jiahe

doi:10.1038/s41598-025-11294-2


Article
Open access
Published: 23 July 2025


Xuming Shao¹,
Wenhao Liu²,
Gang Bai^1,3,4,
Yan Chen¹,
Yu Liu^1,3,4 &
…
Jiahe Guang⁵

Scientific Reports volume 15, Article number: 26700 (2025)


Abstract

Coal spontaneous combustion (CSC) poses a significant safety hazard in coal mines, requiring effective prevention and control strategies. Accurate temperature prediction, crucial for assessing coal oxidation stages and combustion risk, underpins early warning systems. This study analyzes programmed heating experimental data from Dongtan Mine coal samples and integrates the coal oxidation–pyrolysis coupled reaction mechanism. Pearson correlation analysis identified six key gas indicators—O₂, CO, C₂H₄, CO/ΔO₂, C₂H₄/C₂H₆, and C₂H₆—highly correlated with spontaneous combustion temperature. Based on these variables, a deep learning framework combining an Improved Tornado Optimization with Coriolis force (ITOC) strategy and a CNN-BiGRU-CBAM model is proposed. The ITOC algorithm incorporates cubic chaotic mapping initialization, quantum entanglement, and Coriolis force perturbation to enhance global optimization. Comparative experiments with five heuristic algorithms demonstrate ITOC’s superior accuracy and convergence stability. Key CNN-BiGRU-CBAM hyperparameters—learning rate, BiGRU neuron count, and convolutional kernel size—were jointly optimized by ITOC, resulting in optimal values of 0.0093, 108 neurons, and 8.54, respectively. The dataset was split into training, validation, and test sets at an 8:2:1 ratio. Performance evaluation against benchmark models shows the proposed framework achieves a test set R² of 0.9738, MAPE of 4.1254%, MAE of 6.2740, and RMSE of 12.4735. Validation on coal faces in Shandong, Shanxi, and Shaanxi mines confirmed strong generalization and engineering adaptability, with predicted temperature ranges closely matching measurements. The ITOC-CNN-BiGRU-CBAM model offers a promising theoretical and practical approach for intelligent early warning and precise prevention of CSC hazards.


Introduction

As a cornerstone of China’s medium- and long-term energy strategy, coal continues to play an indispensable role in supporting national economic and social development^1,2. However, according to the 2023 National Economic and Social Development Statistical Bulletin released by the National Bureau of Statistics, the safety situation in coal mine production remains critical. In 2023 alone, 443 fatalities were reported due to mining accidents, corresponding to a million-ton mortality rate as high as 0.094. Among the major hazards, coal spontaneous combustion (CSC) poses a particularly severe threat. CSC not only results in substantial resource wastage and increased labor costs but also has the potential to trigger cascading disasters, such as gas explosions and coal dust explosions, ultimately leading to catastrophic casualties and economic losses^1,3,4,5. Against this backdrop, the development of an accurate temperature prediction model for coal spontaneous combustion holds significant theoretical and practical value in enhancing coal mine safety^1,6.

A wide range of methods have been proposed by scholars to predict the spontaneous combustion temperature of coal, including direct temperature measurement⁷, isotope radon detection⁸, functional group characterization^9,10, and gas component analysis¹¹. Among these, temperature measurement techniques are frequently employed due to the thermal characteristics of coal. Specifically, the low thermal conductivity of coal and rock prevents effective heat dissipation from high-temperature zones, often resulting in abnormally elevated surface temperatures. However, conventional temperature measurement methods suffer from limited spatial resolution and data incompleteness, which constrain their ability to accurately capture the oxidation reaction process, thereby limiting their application in early warning systems. Isotope radon detection offers high accuracy and operational convenience, yet it is highly sensitive to environmental disturbances such as fluctuations in temperature and humidity. Additionally, the equipment used is often difficult to optimize in terms of both portability and impact resistance. Functional group characterization provides rich chemical information, but its reliance on a single gas indicator and complex experimental procedures weakens its predictive capacity. In contrast, gas analysis methods are sensitive and easy to operate, but their performance is often hindered by the nonlinear relationships between different indicator gases. Traditional research approaches have employed basic theories and classical algorithms to model the coal spontaneous combustion process. For instance, Yutao et al.¹² applied Pearson correlation coefficient analysis to assess the relationship between coal quality indicators and apparent activation energy, subsequently constructing a multiple linear regression model to evaluate the spontaneous combustion tendency. 
Bo et al.¹³, based on programmed temperature rise experiments, identified the initial temperature and characteristic temperature as early warning thresholds and used grey relational analysis to evaluate the tendency for spontaneous combustion. They also considered CO concentration and carbon oxide ratios, developing a four-level early warning mechanism. Nonetheless, these conventional methods generally rely on manual selection of characteristic indicators and experimental parameter settings, which limits model adaptability and generalization capacity. Furthermore, classical techniques such as multiple linear regression and grey correlation analysis struggle with high-dimensional, complex data and are inadequate for capturing the inherent nonlinear relationships in coal spontaneous combustion processes. As a result, their predictive performance is often inferior to that of modern machine learning and deep learning approaches^14,15.

In recent years, advances in machine learning and deep learning techniques have introduced new perspectives and methodologies for predicting spontaneous coal combustion^16,17. A range of single-model architectures—including back propagation neural networks (BPNN)¹⁸, radial basis function networks (RBF)¹⁹, generalized regression neural networks (GRNN)²⁰, random forests (RF)²¹, support vector regression (SVR)²², convolutional neural networks (CNN)⁶, and gated recurrent units (GRU)¹—have demonstrated significantly superior predictive performance compared to traditional theoretical approaches. Moreover, the integration of metaheuristic optimization algorithms with machine learning and deep learning models has led to the development of various hybrid prediction frameworks for coal spontaneous combustion risk assessment. By optimizing neural network structures and hyperparameters, these hybrid models have substantially enhanced predictive accuracy and generalization capabilities. Representative examples include Particle Swarm Optimization–Back Propagation Neural Network (PSO-BPNN)²³, Modified Sine Whale Optimization Algorithm–Back Propagation Neural Network (MSOWA-BPNN)²⁴, Sparrow Search Algorithm–Convolutional Neural Network (SSA-CNN)⁶, Improved Grey Wolf Optimizer–Gated Recurrent Unit (IGWO-GRU)¹, and Improved Grey Wolf Optimizer–General Regression Neural Network (IGWO-GRNN)²⁰. Despite the impressive performance of these intelligent models in recent years, challenges remain—particularly with regard to model interpretability. In practical engineering applications, decision-makers not only require accurate predictions but also seek to understand the key influencing factors and the underlying mechanisms in order to devise effective, targeted prevention and control strategies. 
Furthermore, the spontaneous combustion process involves complex nonlinear couplings among various gas indicators, and current models continue to face limitations in terms of generalization and robustness. These limitations are particularly evident when models are applied across diverse geological settings, requiring improved adaptability, parameter selection strategies, and tuning mechanisms. Therefore, enhancing the interpretability, stability, and engineering applicability of these models remains a critical direction for future research^1,6.

In the field of coal spontaneous combustion temperature prediction, deep learning frameworks have evolved from early single-model approaches to sophisticated multivariate hybrid architectures. Recent studies illustrate this progression: Wang et al.² employed the SSA-CNN model to extract local features of gas indicators using convolutional neural networks (CNN), but its unidirectional structure limits its ability to capture the bidirectional dependencies inherent in time-series data. Wei et al.²³ adopted a PSO-BPNN model, while Chen et al.¹ utilized the IGWO-GRU model; both leverage meta-heuristic algorithms to optimize network parameters, yet lack dynamic mechanisms for assessing and filtering feature importance²⁵. Some models integrate the Convolutional Block Attention Module (CBAM)²⁶ but fail to achieve synergy with optimization algorithms, resulting in suboptimal allocation of key feature weights. In contrast, the ITOC-CNN-BiGRU-CBAM model proposed in this study addresses these limitations. It enables bidirectional time-series modeling through BiGRU, enhances feature selection via hierarchical coupling of CNN and CBAM, and achieves global collaborative optimization through the chaotic initialization and quantum entanglement strategies embedded in the ITOC algorithm. Despite advancements, existing hybrid models still face significant challenges. In terms of feature coupling, most models simply concatenate network modules (e.g., CNN + GRU), which limits their ability to capture complex nonlinear interactions between indicators such as CO and C₂H₄¹⁰. For cross-scenario applications, models trained solely on data from a single mine (e.g., the SSA-CNN^1,6 based on Dongtan Coal Mine) often struggle to generalize across varying coal qualities. In terms of computational complexity, models such as IGWO-GRNN are difficult to deploy in real-time monitoring scenarios due to their large parameter scale. 
Additionally, the interpretability of black-box models (e.g., PSO-BPNN) remains limited, making it difficult to quantify the individual contribution of gas indicators and thereby hindering mechanistic analysis¹⁴. The synergy within the proposed model is realized through several coordinated components. CNN, with an ITOC-optimized 8.54-dimensional convolutional kernel, captures abrupt changes in gas concentration (e.g., CO spikes at 220 °C) and applies MaxPooling to reduce dimensionality and denoise the signal. The 108-neuron BiGRU processes time-series data bidirectionally, simultaneously learning the forward causal relationship between temperature rise and gas generation, and the backward dependency linking indicator lag to temperature inflection points. Within the CBAM module, channel attention suppresses the influence of negatively correlated indicators such as O₂ (by approximately 30%), thereby enhancing key ratios like CO/ΔO₂; spatial attention further localizes composite gas features in high-temperature regions. The ITOC algorithm—via cubic chaotic mapping and a quantum entanglement strategy (entanglement probability of 0.4)—jointly optimizes learning rate (0.0093), neuron count, and other critical parameters to match the temporal resolution required for spontaneous combustion prediction²⁷. Collectively, these components form a closed-loop system of feature extraction → time-series modeling → weight allocation → parameter optimization, achieving a prediction accuracy of R² = 0.9738, which represents a 5%–10% improvement over traditional hybrid models.

Accordingly, this study proposes a novel deep learning-based prediction framework for spontaneous coal combustion temperature. Utilizing experimental data from programmed temperature rise tests conducted at the Dongtan Coal Mine, six key gas indicators—O₂, CO, C₂H₄, CO/ΔO₂, C₂H₄/C₂H₆, and C₂H₆—were identified through Pearson correlation coefficient analysis and grounded in the theoretical understanding of the coal oxidation–pyrolysis composite reaction mechanism. These indicators exhibited strong correlations with the spontaneous combustion temperature and were selected as characteristic input features for model development. Building on this foundation, a deep learning prediction framework was constructed, integrating an Improved Tornado Optimization with Coriolis force (ITOC) algorithm with a hybrid CNN-BiGRU-CBAM model. The ITOC algorithm enhances global search capability through the incorporation of a cubic chaos initialization mechanism and a quantum entanglement strategy, which collectively facilitate efficient exploration of the solution space. The algorithm is employed to optimize key hyperparameters of the prediction model, including learning rate, the number of BiGRU neurons, and convolutional kernel size. The proposed ITOC-optimized CNN-BiGRU-CBAM model was benchmarked against a series of representative prediction algorithms, and its generalization ability was further validated through application in multiple coal mine working faces. The results demonstrate the model’s strong predictive performance and robustness, providing a solid technical foundation and theoretical reference for the intelligent early warning of spontaneous coal combustion and its practical implementation in engineering applications.

Theoretical basic research

Improved Tornado Optimizer with Coriolis force (ITOC)

Tornado Optimizer with Coriolis force (TOC)

The Tornado Optimizer with Coriolis Force (TOC), proposed in 2025, is a novel population-based intelligent optimization algorithm inspired by the dynamic motion characteristics of natural tornadoes. The algorithm emulates the spiral ascent and core-attraction mechanisms of tornado airflow, achieving a balance between global exploration and local exploitation through the synergistic interaction of two main phases: spiral search and centripetal convergence. TOC is characterized by a minimal number of control parameters, rapid convergence speed, and a strong capability to escape local optima. These attributes make it particularly well-suited for solving nonlinear, multimodal, and high-dimensional optimization problems. The algorithm has demonstrated promising performance across various domains, including engineering optimization, image processing, and machine learning²⁷.

Modalities for improvement

Improvement of initialization based on Cubic chaotic mapping

Cubic chaotic mapping is a nonlinear iterative mapping method capable of generating chaotic sequences characterized by randomness, ergodicity, and high sensitivity to initial conditions^26,28. Its mathematical expression is presented in Eq. (1). By incorporating Cubic chaotic mapping into the initialization process—specifically as shown in Eq. (2)—the generated chaotic sequences are used to initialize the positions of individuals in the population. This approach promotes a more uniform and diverse distribution of the initial population across the solution space, thereby reducing the risk of premature convergence to local optima. As a result, it significantly enhances the algorithm’s global search capability and improves its optimization efficiency and accuracy when solving complex, high-dimensional problems.

$$z_{k + 1} = \mu z_{k} (1 - z_{k}^{2} )$$

(1)

where z_k denotes the chaotic value generated by the k-th iteration; k is the number of iterations; the initial value z₀ is the seed value given in the interval (0,1), which is set to 0.5 in this study; μ is the control parameter, and μ = 2.5 is selected to generate the chaotic sequence.

$$x_{ij} (0) = l_{j} + z_{j \times d + j} \times (u_{j} - l_{j} )$$

(2)

where x_ij(0) is the initial position of the i-th individual (i = 1, 2, …, n, where n is the population size) in the j-th dimension (j = 1, 2, …, d, where d is the dimension of the search space); l_j and u_j are the lower and upper bounds, respectively, of the j-th dimension of the search space; and $z_{j \times d + j}$ is the corresponding chaotic value generated by the Cubic chaotic mapping, which determines where the individual's initial position in that dimension falls between the lower and upper bounds.
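As an illustration of Eqs. (1) and (2), the following minimal Python sketch generates a cubic chaotic sequence with z₀ = 0.5 and μ = 2.5 and maps it onto the search bounds. The function names, the example bounds, and the flat indexing of the chaotic sequence (individual i and dimension j, both counted from zero, take the value at index i × d + j) are assumptions made for this sketch and are not taken from the original implementation.

```python
def cubic_chaos_sequence(length, z0=0.5, mu=2.5):
    """Iterate z_{k+1} = mu * z_k * (1 - z_k**2), Eq. (1); return z_1..z_length."""
    values = []
    z = z0
    for _ in range(length):
        z = mu * z * (1.0 - z * z)
        values.append(z)
    return values


def chaotic_init(n, lower, upper, z0=0.5, mu=2.5):
    """Map chaotic values onto [l_j, u_j] in the spirit of Eq. (2).

    Assumption: individual i (0-based) in dimension j (0-based) uses the
    chaotic value stored at flat index i * d + j.
    """
    d = len(lower)
    z = cubic_chaos_sequence(n * d, z0, mu)
    return [
        [lower[j] + z[i * d + j] * (upper[j] - lower[j]) for j in range(d)]
        for i in range(n)
    ]


# Hypothetical two-dimensional search space, used only for illustration.
lower = [0.0, -5.0]
upper = [1.0, 5.0]
population = chaotic_init(n=5, lower=lower, upper=upper)
for individual in population:
    print(individual)
```

Because every chaotic value produced with these settings lies strictly between 0 and 1, each initial position falls inside its bounds without additional clipping.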

Individual position update based on quantum entanglement

Quantum entanglement is a quantum mechanical phenomenon wherein a special form of correlation exists between multiple quantum particles (e.g., photons, electrons), such that the measurement of one particle’s state instantaneously influences the state of another, regardless of the spatial separation between them. This phenomenon, which challenges the classical deterministic view of physics, serves as an inspiration for enhancing population diversity in the TOC algorithm. Specifically, a quantum entanglement mechanism is introduced to establish associative relationships between certain storm and thunderstorm individuals within the population.

At the beginning of the optimization process, each individual is assigned an entanglement marker vector, as defined in Eq. (3). During the search process, when some individuals begin to converge toward a local optimum, the entanglement mechanism allows other linked individuals to detect this trend. In response, these entangled individuals can adaptively adjust their search trajectories, thereby preventing premature convergence of the entire population to a local optimum. For instance, when a thunderstorm individual approaches a local optimal region, its entangled storm counterparts can randomly alter their search directions to explore alternative regions of the solution space. This interaction mimics the instantaneous influence seen in quantum entanglement, and effectively sustains population diversity, thus enhancing the algorithm’s global exploration capability^29,30.

$$E_{i} = [e_{i1} ,e_{i2} , \cdots ,e_{im} ](e_{ik} \in \{ 0,1\} ,k = 1,2, \cdots ,m)$$

(3)

where E_i is the entanglement marker vector of the i-th individual, which records the entanglement relationships between this individual and other individuals; e_ik denotes the entanglement marker between the i-th individual and the k-th potential entangled individual (at most m potential entangled individuals). A value of e_ik = 1 indicates that the entanglement relationship is established, and e_ik = 0 indicates that it is not; e_ik is initially set to 0.

The position adjustment term introduced by quantum entanglement, which modifies the update of each individual, is given in Eq. (4):

$${\Delta }x_{{w_{ij} }}^{entangle} = \mathop \sum \limits_{k = 1}^{m} e_{ik} \times \xi_{k} \times (rand_{entangle} \times (u_{j} - l_{j} ))$$

(4)

where ${\Delta }x_{{w_{ij} }}^{entangle}$ is the position adjustment term due to quantum entanglement; $w_{ij}$ is the j-th dimension of individual i to be updated; $\xi_{k}$ is the weight coefficient (value in the interval), which is used to measure the influence of different entangled individuals on the position adjustment of the current storm individual, and is set to 0.7; $rand_{entangle}$ is the random number in the interval of (-1,1), which is used to randomly adjust the direction and amplitude of the position change due to entanglement.
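A minimal Python sketch of the adjustment term in Eq. (4) is given below. It assumes that the random factor is redrawn from (-1, 1) for every entangled partner, which Eq. (4) does not state explicitly; the function name, the `rng` argument, and the numerical values in the usage lines are illustrative only.

```python
import random


def entangle_adjustment(markers, xi, lower_j, upper_j, rng=random):
    """Eq. (4): sum over partners k of e_ik * xi_k * rand * (u_j - l_j).

    Assumption: rand_entangle is drawn independently for each partner k.
    """
    span = upper_j - lower_j
    total = 0.0
    for e_ik, xi_k in zip(markers, xi):
        if e_ik:  # only established entanglements (e_ik = 1) contribute
            total += xi_k * rng.uniform(-1.0, 1.0) * span
    return total


# Illustrative call: three potential partners, two of them entangled.
markers = [1, 0, 1]
weights = [0.7, 0.7, 0.7]
delta = entangle_adjustment(markers, weights, lower_j=0.0, upper_j=10.0)
print(delta)
```

With no established entanglement the term is exactly zero, so the update reduces to the original TOC rule for that individual.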

To find candidates for entanglement, each individual i in the population is examined in turn. The tornado individual is excluded because, as the globally optimal guiding individual, it does not actively take part in establishing entanglement. For every other individual $j(j \ne i)$, the sum of squared differences between the two current position vectors over all dimensions is used as the similarity metric, as shown in Eq. (5):

$$s_{ij} = \mathop \sum \limits_{k = 1}^{d} (x_{ik} - x_{jk} )^{2}$$

(5)

where $s_{ij}$ is the degree of similarity between the i-th individual and the j-th individual; a smaller value means that the two individuals are closer in every dimension and are therefore more likely to converge toward the same region; $x_{ik}$ is the position of the i-th individual in the k-th dimension; $x_{jk}$ is the position of the j-th individual in the k-th dimension; and d is the dimensionality of the search space.
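The similarity metric of Eq. (5) is a squared Euclidean distance, which the short Python sketch below computes and uses to rank entanglement candidates for one individual. The helper `rank_candidates` and the sample positions are hypothetical and serve only to illustrate the traversal described above.

```python
def similarity(x_i, x_j):
    """Eq. (5): sum of squared coordinate differences (smaller = more similar)."""
    return sum((a - b) ** 2 for a, b in zip(x_i, x_j))


def rank_candidates(positions, i, tornado_index):
    """Order all individuals except i and the tornado by similarity to i."""
    others = [j for j in range(len(positions)) if j not in (i, tornado_index)]
    return sorted(others, key=lambda j: similarity(positions[i], positions[j]))


# Hypothetical population of four individuals; index 1 plays the tornado.
positions = [[0.0, 0.0], [3.0, 4.0], [0.5, 0.5], [1.0, 0.0]]
order = rank_candidates(positions, i=0, tornado_index=1)
print(order)
```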

The positions of storm and thunderstorm individuals are updated by adding the position adjustment term brought about by quantum entanglement, and the storm position updating formula is shown in Eq. (6), and the thunderstorm position updating formula is shown in Eq. (7).

$$x_{{w_{ij} }}^{t + 1} = x_{{w_{ij} }}^{t} + 2 \times \alpha \times (x_{{o_{ij} }}^{t} - rand_{w} ) + {\Delta }x_{{w_{ij} }}^{entangle} + v_{ij}^{t + 1}$$

(6)

where $x_{{w_{ij} }}^{t + 1}$ is the position of the storm individual $w_{i}$ in the j-th dimension at the (t + 1)-st iteration; $x_{{w_{ij} }}^{t}$ is its position in the j-th dimension at the t-th iteration; $\alpha$ is the parameter that controls the step size of the position update; $x_{{o_{ij} }}^{t}$ is the relevant reference position (e.g., the position of the corresponding thunderstorm individual in this dimension) at the t-th iteration; $rand_{w}$ is a random number generated in the interval (0,1), which introduces a degree of randomness; and $v_{ij}^{t + 1}$ is the velocity term of the original tornado algorithm, related to quantities such as the gradient wind speed, which guides storm individuals to move and explore according to the underlying physical laws.
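The storm update of Eq. (6) can be sketched in a few lines of Python. The velocity term and the entanglement adjustment are treated here as precomputed inputs, since their full definitions belong to the original TOC algorithm and to Eq. (4); the function name, the `rng` argument, and the sample numbers are assumptions of this sketch.

```python
import random


def update_storm(x_w, x_o, alpha, delta_entangle, velocity, rng=random):
    """Eq. (6), applied dimension by dimension.

    x_w: current storm position, x_o: reference position,
    delta_entangle: entanglement adjustment per dimension (Eq. 4),
    velocity: velocity term v^{t+1} per dimension (supplied by the caller).
    """
    return [
        xw + 2.0 * alpha * (xo - rng.random()) + de + v
        for xw, xo, de, v in zip(x_w, x_o, delta_entangle, velocity)
    ]


# Illustrative two-dimensional update with made-up values.
new_position = update_storm(
    x_w=[1.0, -2.0],
    x_o=[2.0, 0.5],
    alpha=0.1,
    delta_entangle=[0.3, 0.0],
    velocity=[0.2, -0.1],
)
print(new_position)
```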

$$x_{{t_{ij} }}^{t + 1} = x_{{t_{ij} }}^{t} + 2 \times rand \times (x_{{t_{ij} }}^{t} - x_{{w_{{j + \sum {_{i = 1}^{{n\mathop {wk}\limits^{.} }} } }}^{t} }} ) + 2 \times rand \times (x_{{o_{ij} }}^{t} - x_{{t_{ij} }}^{t} ) + \Delta x_{{t_{ij} }}^{entangle}$$

(7)

The meaning of each parameter is similar to that in the storm position update equation, which here corresponds to the position update of thunderstorm individual $t_{i}$ in the j-th dimension at the (t + 1)-st iteration.

The tornado individual, which represents the global optimal solution, has a comparatively stable position update that still relies mainly on its own convergence properties and on its role in guiding the other individuals. No direct entanglement adjustment term is added for it. It is nevertheless affected indirectly: entanglement changes how the other individuals explore the search space, which in turn influences the tornado's subsequent position. Its position is updated as shown in Eq. (8):

$$x_{{t_{ij} }}^{t + 1} = x_{{t_{ij} }}^{t} + 2 \times \alpha \times (x_{{t_{ij} }}^{t} - x_{{o_{\zeta j} }}^{t} ) + 2 \times \alpha \times (x_{{t_{pj} }}^{t} - x_{{t_{ij} }}^{t} )$$

(8)

where $x_{{t_{ij} }}^{t + 1}$ is the position of the tornado individual in the j-th dimension at the (t + 1)-st iteration; $x_{{t_{ij} }}^{t}$ is the position of the tornado individual in the j-th dimension at the t-th iteration; $\alpha$ is the step-size control parameter; and $x_{{o_{\zeta j} }}^{t}$ and $x_{{t_{pj} }}^{t}$ are the other reference positions associated with the tornado individual in the j-th dimension at the t-th iteration.

The algorithm terminates when t ≥ max_it. In the experiments of this study, the entanglement establishment probability p_entangle is set to 0.4, the similarity threshold th_similarity is set to 0.7, the entanglement weight coefficient $\xi_{k}$ is set to 0.4, and the upper limit m on the number of individuals that can be entangled with a given individual is set to 10.
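The text does not spell out exactly how p_entangle, th_similarity, and m interact, so the Python sketch below shows one plausible reading under stated assumptions: a pair is eligible when its Eq. (5) similarity falls below th_similarity, an eligible pair becomes entangled with probability p_entangle, and each individual keeps at most m partners. Partner lists are used in place of the binary marker vector of Eq. (3). All names are hypothetical.

```python
import random


def establish_entanglement(positions, tornado_index, p_entangle=0.4,
                           th_similarity=0.7, m=10, rng=random):
    """Return {individual: [entangled partners]} under the assumed rule."""
    n = len(positions)
    # The tornado individual does not take part in entanglement.
    partners = {i: [] for i in range(n) if i != tornado_index}
    for i in partners:
        for j in partners:
            if j == i or len(partners[i]) >= m:
                continue
            # Eq. (5): squared distance as the similarity measure.
            s_ij = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if s_ij < th_similarity and rng.random() < p_entangle:
                partners[i].append(j)
    return partners


# Hypothetical population; index 2 plays the tornado individual.
positions = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1]]
links = establish_entanglement(positions, tornado_index=2)
print(links)
```

Other readings are possible, for example treating th_similarity as a normalized quantity, so the rule above should be regarded as a working assumption rather than the published procedure.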

Convolutional Neural Network–Bidirectional Gated Recurrent Unit–Attention Mechanism Model (CNN-BiGRU-CBAM)

Convolutional Neural Network (CNN)

Convolutional Neural Network (CNN) is a highly influential model in deep learning, primarily used in fields such as image, speech, and natural language processing. It consists of several layers, including a convolutional layer, a pooling layer, and a fully connected layer. The CNN is capable of automatically learning feature representations by extracting features through the convolutional kernel, reducing dimensionality in the pooling layer, and integrating the output in the fully connected layer^6,31. The primary role of the convolutional layer is to extract local features from the input data and generate a feature map by applying the convolutional kernel to the input data. Its mathematical expression is shown in Eq. (9):

$$Y_{i,j,k} = f(\sum\limits_{c = 1}^{{C_{in} }} {\sum\limits_{m = 1}^{{H_{k} }} {\sum\limits_{n = 1}^{{W_{k} }} {W_{m,n,c,k} } } } \cdot x_{i + m - 1,j + n - 1,c} + b_{k} )$$

(9)

The pooling operation downsamples the feature map, reducing its size while retaining important features and making the network more robust to changes in feature position. The pooling layer is calculated as shown in Eq. (10):

$$X_{j}^{l} = f(\beta_{j}^{l} \cdot down(X_{j}^{l - 1} ) + b_{j}^{l} )$$

(10)

where β is the weight matrix; $down( \cdot )$ is the downsampling function.

The role of the fully connected layer is to completely connect every neuron of the input data to the output data, which can synthesize and non-linearly combine the features extracted from the previous layer to achieve a high-level feature representation, as shown in Fig. 1a.

Bidirectional Gated Recurrent Unit (BiGRU)

Bidirectional Gated Recurrent Unit (BiGRU) is a deep learning architecture derived from the expansion of the Gated Recurrent Unit (GRU). It processes data simultaneously from both the forward and reverse directions of the sequence by utilizing two independent GRUs (forward and reverse). These GRUs perform computations and combine their hidden states, avoiding the problem of excessive additional parameters through parameter sharing. The bidirectional structure enables the model to consider both past and future information in the sequence at the same time. In the forward direction, the input data is processed in time steps, while in the reverse direction, the input data is processed in reverse time steps^32,33. The structure of the bidirectional gated recurrent unit is shown in Fig. 1b. This bidirectional setup helps the model capture complex patterns and the intricate relationships in the input sequences more effectively.

$$\overrightarrow {h}_{ft} = {\text{GRU}} (\overrightarrow {h}_{f(t - 1)} ,x_{t} )$$

(11)

$$\overleftarrow {h}_{bt} = {\text{GRU}} (\overleftarrow {h}_{b(t + 1)} ,x_{t} )$$

(12)

where $\overrightarrow {h}_{ft}$ and $\overleftarrow {h}_{bt}$ are the left-to-right and right-to-left hidden states, respectively. GRU denotes the GRU unit and $x_{t}$ denotes the t-th element in the input sequence. The final hidden state is the concatenation of the hidden states from both directions.
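To make Eqs. (11) and (12) concrete, the following pure-Python sketch runs a small GRU over a sequence in both directions and concatenates the two hidden states at every time step. The gate equations follow one common GRU convention, h' = (1 - z) ⊙ h + z ⊙ h̃, the two directions use separate randomly initialized weights, and the sizes (6 input features, 3 hidden units) are arbitrary; none of these details are taken from the authors' implementation.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, U, h, b):
    """Return W @ x + U @ h + b for plain nested lists."""
    return [
        sum(w * xi for w, xi in zip(W[k], x))
        + sum(u * hi for u, hi in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def gru_step(p, h, x):
    """One GRU update with update gate z, reset gate r and candidate state."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], rh, p["bh"])]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def init_params(input_dim, hidden, rng):
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for g in "zrh":
        p["W" + g] = mat(hidden, input_dim)
        p["U" + g] = mat(hidden, hidden)
        p["b" + g] = [0.0] * hidden
    return p


def bigru(seq, fwd, bwd, hidden):
    h = [0.0] * hidden
    forward = []
    for x in seq:                          # Eq. (11): left to right
        h = gru_step(fwd, h, x)
        forward.append(h)
    h = [0.0] * hidden
    backward = [None] * len(seq)
    for t in range(len(seq) - 1, -1, -1):  # Eq. (12): right to left
        h = gru_step(bwd, h, seq[t])
        backward[t] = h
    # Concatenate both directions at every time step.
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
seq = [[rng.random() for _ in range(6)] for _ in range(4)]  # 4 steps, 6 features
fwd = init_params(6, 3, rng)
bwd = init_params(6, 3, rng)
out = bigru(seq, fwd, bwd, hidden=3)
print(len(out), len(out[0]))
```

Each output vector has twice the hidden size, because the forward and backward states are joined rather than averaged.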

Attention mechanism (CBAM)

The Convolutional Block Attention Module (CBAM) is a lightweight attention mechanism widely used in computer vision tasks. CBAM consists of two key components: the Channel Attention Module and the Spatial Attention Module. These components focus on answering two questions: “What are the important features?” and “Where are the critical regions?” respectively. In the Channel Attention Module, CBAM generates two spatial context descriptors through average pooling and maximum pooling along the spatial dimensions of the input feature maps. These descriptors are then passed through a shared multilayer perceptron (MLP) to create a channel attention map, which helps evaluate the importance of each channel²⁵. In the Spatial Attention Module, CBAM extracts spatial information by applying average pooling and maximum pooling over the channel dimensions. The two resulting descriptors are concatenated and then passed through a convolutional layer to generate a spatial attention map, which highlights key regions in the image. By linking channel attention and spatial attention in tandem, CBAM directs the model to focus more efficiently on the essential information, thus enhancing feature representation and improving model performance. The schematic diagram of its structure is shown in Fig. 1c.
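The two attention stages can be sketched for sequence features arranged as time steps × channels. This is a simplified one-dimensional adaptation written for illustration: the shared MLP has no biases, the spatial stage uses a kernel of length 7 with zero padding along the time axis, and all weights are random. These choices, and every name in the code, are assumptions rather than details of the published model.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mlp(vec, W1, W2):
    """Shared two-layer perceptron: ReLU hidden layer, linear output."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def channel_attention(x, W1, W2):
    """x[t][c]: pool over time, score each channel, rescale the channels."""
    T, C = len(x), len(x[0])
    avg = [sum(x[t][c] for t in range(T)) / T for c in range(C)]
    mx = [max(x[t][c] for t in range(T)) for c in range(C)]
    scores = [sigmoid(a + m) for a, m in zip(mlp(avg, W1, W2), mlp(mx, W1, W2))]
    return [[x[t][c] * scores[c] for c in range(C)] for t in range(T)], scores


def spatial_attention(x, kernel, bias=0.0):
    """Pool over channels, convolve the two descriptors, rescale each step."""
    T, C = len(x), len(x[0])
    desc = [(sum(row) / C, max(row)) for row in x]  # (avg, max) per time step
    half = len(kernel) // 2
    weights = []
    for t in range(T):
        s = bias
        for k, (w_avg, w_max) in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < T:  # zero padding at the sequence edges
                s += w_avg * desc[idx][0] + w_max * desc[idx][1]
        weights.append(sigmoid(s))
    return [[v * weights[t] for v in x[t]] for t in range(T)], weights


def cbam(x, W1, W2, kernel):
    """Channel attention followed by spatial attention, applied in sequence."""
    x, _ = channel_attention(x, W1, W2)
    x, _ = spatial_attention(x, kernel)
    return x


rng = random.Random(1)
T, C, reduced = 10, 4, 2
features = [[rng.random() for _ in range(C)] for _ in range(T)]
W1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(reduced)]
W2 = [[rng.uniform(-1, 1) for _ in range(reduced)] for _ in range(C)]
kernel = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(7)]
refined = cbam(features, W1, W2, kernel)
print(len(refined), len(refined[0]))
```

Both stages multiply the features by weights between 0 and 1, so the module rescales the feature map without changing its shape.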

ITOC-CNN-BiGRU-CBAM-based predictive model for spontaneous coal combustion

Model framework construction

In this study, the data are input as slices containing both known and labeled time series. These slices first pass through the convolutional neural network (CNN) layer, where local features are extracted using a one-dimensional convolutional layer (Conv1D) with 64 filters and a kernel size of 2. The ReLU activation function is introduced to increase nonlinearity. Afterward, the features undergo downsampling via a one-dimensional maximum pooling layer (MaxPooling1D) with a pooling window size of 2, reducing the feature dimension and expanding the receptive field.
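The convolution, activation, and pooling steps described above can be illustrated with a small pure-Python sketch. It uses a single filter instead of 64 for readability, assumes 'valid' padding for the convolution, and assumes a pooling stride equal to the window size; these settings and the toy input are illustrative choices that the text does not specify.

```python
def conv1d_relu(x, kernels, biases):
    """x: list of T time steps, each a list of C channel values.

    kernels[f][m][c] is the weight of filter f at tap m for channel c.
    'Valid' padding is assumed, so the output has T - width + 1 steps.
    """
    width = len(kernels[0])
    channels = len(x[0])
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for kernel, b in zip(kernels, biases):
            s = b + sum(
                kernel[m][c] * x[t + m][c]
                for m in range(width)
                for c in range(channels)
            )
            row.append(max(0.0, s))  # ReLU activation
        out.append(row)
    return out


def max_pool1d(x, window=2):
    """Non-overlapping max pooling along time (stride assumed equal to window)."""
    return [
        [max(x[t + k][f] for k in range(window)) for f in range(len(x[0]))]
        for t in range(0, len(x) - window + 1, window)
    ]


# Toy input: 5 time steps, 1 channel; one filter with kernel size 2.
signal = [[1.0], [2.0], [-5.0], [4.0], [0.0]]
kernels = [[[1.0], [1.0]]]
biases = [0.0]
features = conv1d_relu(signal, kernels, biases)
pooled = max_pool1d(features, window=2)
print(features, pooled)
```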

The output from the CNN layer, after MaxPooling1D and a Reshape layer, is then processed by the bidirectional gated recurrent unit (BiGRU) layer. To optimize model performance, the Improved Tornado Optimization with Coriolis force (ITOC) algorithm is employed to identify the optimal hyperparameters, including the learning rate (ranging from 0.001 to 0.1), the number of BiGRU neurons (ranging from 32 to 256), and the convolution kernel size (ranging from 2 to 8). The initial population is constructed according to ITOC’s rules and iterated to find the optimal hyperparameter combination. During each iteration, a model instance is built from the current hyperparameter set, trained on the training data, and evaluated on the validation data using metrics such as mean square error or accuracy. ITOC generates new individuals through its update strategies and iterates until convergence, which yields the optimal hyperparameters.

Next, the BiGRU layer processes the CNN output features using a bidirectional structure, capturing complex dependencies within the time series. The output is then passed to the Convolutional Block Attention Module (CBAM) layer. In the CBAM layer, the channel attention mechanism operates first: global average pooling and global maximum pooling generate two vectors, each of which is processed by a shared multilayer perceptron (MLP) containing hidden and output layers. The sum of the two MLP outputs is passed through a Sigmoid function to obtain the attention weights, which are multiplied by the feature map to apply channel attention, emphasizing key channel features. The spatial attention mechanism is applied next. The channel-attended feature maps are average-pooled and max-pooled along the channel axis, the two results are concatenated, and the concatenated map is passed through a 7 × 7 convolutional layer and a Sigmoid function to generate spatial attention weights, which are then multiplied with the feature maps to highlight key spatial regions.

Finally, the data processed by CBAM are flattened into one-dimensional vectors through a Flatten layer, and the final prediction is produced by the Dense fully connected layer. At this point, all modules (CNN, BiGRU, and CBAM) work together under the ITOC-optimized hyperparameters to complete the task. The specific flowchart of the model is shown in Fig. 1d.
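
The channel and spatial attention steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a feature map stored as a list of C channels with T time steps each, replaces the 7 × 7 convolution with a one-dimensional analogue of width 7 (an assumption made here because the BiGRU output is a sequence), and uses random placeholder weights. The names `cbam`, `channel_attention`, `spatial_attention`, `w1`, `w2`, and `kernel` are hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mlp(vec, w1, w2):
    """Shared two-layer perceptron: ReLU hidden layer, linear output layer."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def channel_attention(fmap, w1, w2):
    """fmap is a list of C channels, each a list of T values."""
    avg = [sum(ch) / len(ch) for ch in fmap]      # global average pooling
    mx = [max(ch) for ch in fmap]                 # global max pooling
    summed = [a + m for a, m in zip(mlp(avg, w1, w2), mlp(mx, w1, w2))]
    weights = [sigmoid(s) for s in summed]        # one weight per channel
    return [[w * v for v in ch] for w, ch in zip(weights, fmap)]

def spatial_attention(fmap, kernel):
    """kernel has 2 rows (avg row, max row) of odd width; zero padding."""
    steps = len(fmap[0])
    avg = [sum(ch[t] for ch in fmap) / len(fmap) for t in range(steps)]
    mx = [max(ch[t] for ch in fmap) for t in range(steps)]
    half = len(kernel[0]) // 2
    weights = []
    for t in range(steps):
        acc = 0.0
        for row, pooled in zip(kernel, (avg, mx)):
            for j, k in enumerate(row):
                idx = t + j - half
                if 0 <= idx < steps:              # skip zero-padded positions
                    acc += k * pooled[idx]
        weights.append(sigmoid(acc))              # one weight per time step
    return [[w * v for w, v in zip(weights, ch)] for ch in fmap]

def cbam(fmap, w1, w2, kernel):
    """Channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(fmap, w1, w2), kernel)

# Toy feature map with C = 4 channels and T = 8 time steps (made-up numbers).
rng = random.Random(0)
fmap = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]  # 4 -> 2
w2 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(4)]  # 2 -> 4
kernel = [[rng.uniform(-0.5, 0.5) for _ in range(7)] for _ in range(2)]
out = cbam(fmap, w1, w2, kernel)
```

Because both attention maps pass through a Sigmoid, every weight lies between 0 and 1, so the module rescales features without changing the shape of the feature map.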

Sample data-driven characterization of gas evolution

Key gas reaction mechanisms in coal pyrolysis processes

Tromp³⁴ conducted an in-depth study on the mechanism of coal pyrolysis, systematically describing the main chemical reactions involved in the pyrolysis process and the characteristics of its products. The specific reaction pathway of coal pyrolysis is shown in Fig. 2. The pyrolysis process follows the general law of thermal cracking of organic matter and is influenced by various factors. Based on the characteristics of the pyrolysis reactions and their stage-by-stage evolution, the process can be divided into primary cracking, secondary reactions, and condensation reactions. The composition and evolution of the pyrolysis products are determined collectively by these stages^1,6.

The pyrolysis process of coal is a complex multiphase reaction system, where the core mechanism involves the step-by-step deconstruction of the network structure of coal macromolecules and the evolution of free radical-mediated reaction pathways. In the initial pyrolysis stage, weak chemical bonds (such as methylene bridges, ether bonds, and thioether bonds) in the coal matrix preferentially undergo homolytic cleavage, generating highly reactive radical intermediates. This is accompanied by β-scission of aliphatic side chains, releasing light hydrocarbons, primarily CH₄, C₂H₆, and C₂H₄. The decomposition behavior of oxygen-containing functional groups exhibits significant thermodynamic differentiation: phenolic hydroxyl groups undergo dehydration reactions to generate H₂O, while carboxyl groups undergo decarboxylation/decarbonylation, releasing CO₂ and CO, respectively. Notably, oxygen-containing heterocyclic compounds (e.g., furan and pyran structures) exhibit a unique mechanism during pyrolysis, where the release of ring strain influences the path of the ring-opening reaction, significantly affecting the distribution of gas-phase product compositions.

The evolution of low-molecular-weight compounds is controlled by the phase transition behavior of the coal matrix. When the temperature exceeds the glass transition point, aliphatic microcrystals melt to form viscoelastic matrices with mass-transfer channels, which significantly facilitates the diffusive release of volatile components. These volatile components undergo a complex network of secondary reactions during high-temperature retention, including: (i) free radical-induced C–C bond breaking leading to further molecular weight reduction, (ii) aryl ring dehydrocondensation reactions generating polycyclic aromatic precursors, and (iii) redistribution of reactive hydrogens driving the hydrostabilization pathway to form methylated aromatic structures. Experimental data suggest that the kinetic competition between Diels–Alder cycloaddition and radical recombination determines the evolution of the aromaticity index of the tar in the 400–500 °C temperature range.

Interestingly, at temperatures above 500 °C, a significant shift in the reaction pathway occurs: condensation reactions dominate, forming highly condensed aromatic clusters through aromatic ring thickening and radical recombination. This process not only leads to a decrease in tar yield but also significantly alters the chemical composition of the tar (e.g., by increasing the content of heavier fractions). Since the temperatures used in the experiments were below 500 °C, they are not analyzed in detail in this text^1,6.

Acquisition of sample data

The ITOC-CNN-BiGRU-CBAM coal auto-ignition prediction model uses experimental data published in Ref.³⁵, which originates from the coal auto-ignition experiment conducted at the Dongtan Coal Mine. This experiment aims to simulate the auto-ignition process of coal under actual production conditions. The experiment systematically monitored the temperature distribution of coal samples and the evolution characteristics of gas products, while also calculating key parameters of the coal auto-ignition process.

In the initial phase of the experiment, 1000 g of mixed coal samples were placed in a program-controlled heating device, with adequate space left at both the top and bottom of the samples to ensure smooth gas flow. A uniform gas flow was then introduced into the experimental device, and the temperature was gradually increased by the program-controlled heating system, with preheated air delivered inside the device to simulate the conditions of spontaneous combustion. Throughout the heating process, the concentration changes of the gas products were monitored in real time, and relevant parameters were recorded. When the temperature reached the set threshold, heating was immediately halted to analyze the spontaneous combustion characteristics of coal under different temperature conditions. Part of the measurement data obtained from the experiment is shown in Table 1.

Table 1 Experimental data (partial)

Preliminary analysis of the correlation between gas indicators and temperature

To screen the key gas indicators that are highly correlated with the spontaneous combustion temperature of coal, the Pearson correlation coefficient is used for correlation analysis in this paper^36,37. The screening criteria were |r| > 0.6 and a significance level of p < 0.05, which ensures that the selected indicators have statistically significant linear correlations with temperature. Pearson correlation analysis is applicable when the data approximately follow a normal distribution and the relationship between the variables is linear. According to the normality test and scatter plot analysis of the experimental data, the data in this paper basically satisfy these assumptions. The Pearson correlation coefficient is therefore a simple and effective parametric measure of the linear relationship between temperature and each gas indicator, and it provides a quantitative basis for the subsequent model construction. Six key gas indicators (O₂, CO, C₂H₄, CO/ΔO₂, C₂H₄/C₂H₆, and C₂H₆) met the screening criteria above, and the subsequent models were constructed from them. The Pearson correlation coefficient r is calculated as shown in Eq. (13):

$$r = \frac{\sum_{i=1}^{n} \left( X_{i} - \overline{X} \right)\left( Y_{i} - \overline{Y} \right)}{\sqrt{\sum_{i=1}^{n} \left( X_{i} - \overline{X} \right)^{2} \sum_{i=1}^{n} \left( Y_{i} - \overline{Y} \right)^{2}}}$$

(13)

where X_i and Y_i are the observed values of the temperature and gas indicators, respectively, $\overline{X}$ and $\overline{Y}$ are their mean values. The value of this coefficient ranges from -1 to 1. When r tends to 1, it indicates a strong positive correlation between the temperature and the gas indicator; when r tends to -1, it reflects a strong negative correlation; and if r is approximately equal to 0, it indicates that the linear correlation between the two is weak.
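
As an illustration of Eq. (13) and the |r| > 0.6 screening rule, the following sketch computes r for a few indicators and keeps those above the threshold. The numbers are made up for demonstration and are not the Dongtan measurements. The helper names `pearson_r` and `screen_indicators` are hypothetical, and the p < 0.05 significance test is omitted for brevity.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Eq. 13)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    if denom == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / denom

def screen_indicators(temperature, indicators, threshold=0.6):
    """Return {name: r} for indicators whose |r| with temperature exceeds threshold."""
    kept = {}
    for name, values in indicators.items():
        r = pearson_r(temperature, values)
        if abs(r) > threshold:
            kept[name] = r
    return kept

# Illustrative values only (assumed for this sketch).
temperature = [30, 60, 90, 120, 150, 180]
indicators = {
    "O2": [20.9, 20.5, 19.8, 18.6, 17.0, 15.1],  # falls as temperature rises
    "CO": [5, 12, 40, 110, 260, 520],            # rises as temperature rises
    "noise": [3, 1, 4, 1, 5, 2],                 # weakly related to temperature
}
selected = screen_indicators(temperature, indicators)
```

In this toy example the O₂ series is retained with a negative coefficient, the CO series with a positive one, and the unrelated series is discarded.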

Figure 3 shows the strength of the Pearson correlation between temperature and six gas indicators measured during the coal spontaneous combustion experiment, while Table 2 summarizes the corresponding statistical test results. The findings indicate that temperature exhibits significant positive correlations with CO, C₂H₄, CO/∆O₂, and C₂H₆ when each gas metric is considered individually, whereas a negative correlation is observed with O₂. This pattern can be attributed to the nature of coal reactions under elevated temperatures: as temperature rises, incomplete combustion and pyrolysis reactions are intensified, leading to increased emissions of gases such as CO and C₂H₄. In contrast, O₂, a key oxidizing agent in these reactions, is consumed at an accelerated rate, resulting in a noticeable decline in its concentration.

Table 2 Correlation Kendall’s W analysis results

Furthermore, statistical testing confirmed that all Pearson correlation results were highly significant (e.g., the chi-square statistics and corresponding p-values were far below the 0.001 threshold), reinforcing the conclusion that the observed linear correlations are not attributable to random variation. In summary, the Pearson correlation analysis conducted in this study reveals robust linear relationships between temperature and the gas indicators CO, C₂H₄, CO/∆O₂, C₂H₄/C₂H₆, C₂H₆, and O₂ during the coal spontaneous combustion process. These statistical findings provide a valuable quantitative foundation for further elucidating the mechanisms underlying chemical reactions in coal spontaneous combustion.

Comprehensive analysis of gas evolution laws based on coal pyrolysis reaction mechanism and experimental data

Figure 4 shows the relationship between gas concentrations or concentration ratios and coal temperature based on the experimental data. As shown in Fig. 4a, the O₂ concentration exhibits a clear downward trend as the coal temperature increases, primarily due to the desorption and decomposition effects of various gases during the heating process. In contrast, the CO concentration shows a steady upward trend. Specifically, when the temperature is below 220 °C, the gradual increase in CO concentration is attributed to coal oxidation reactions and the thermal breakdown of side chains and bridge bonds in the molecular structure. Upon reaching approximately 220 °C, secondary reactions, such as the formation of methylene groups, become more pronounced, resulting in a rapid rise in CO emissions¹. Figure 4a also reveals that C₂H₄ concentration increases significantly with temperature. Around 80 °C, the cleavage of aliphatic side chains leads to the release of hydrocarbon gases, including CH₄, C₂H₆, and C₂H₄. As the temperature exceeds 220 °C, the direct cracking reactions become dominant, triggering a sharp surge in C₂H₄ concentration¹.

In Fig. 4b, the CO/ΔO₂ ratio demonstrates a general increasing trend with rising coal temperature. Below 50 °C, the coal remains in a slow low-temperature oxidation stage characterized by weak physical desorption, resulting in minor fluctuations in the ratio. As the temperature approaches 50 °C, accelerated coal oxidation leads to a notable reduction in O₂ concentration, thereby elevating the CO/ΔO₂ ratio^1,6. At approximately 85 °C, the breaking of bridging bonds results in the generation of a large number of free radicals, causing the CO/ΔO₂ ratio to rise rapidly. Between 85 °C and 130 °C, the ratio displays significant oscillations and multiple local extrema, reflecting the dynamic changes in radical concentration. Additionally, the sharp increases in the CO/ΔO₂ ratio observed around 220 °C and 415 °C align well with the upward trend in CO concentration.

Similarly, the C₂H₄/C₂H₆ ratio, also depicted in Fig. 4b, increases overall with coal temperature. Around 170 °C, the direct cleavage reactions of secondary coenzymes further contribute to this trend. The concentration profiles of C₂H₆ and C₂H₄ both support the close relationship between coal temperature variations and the generation and consumption of gaseous products.

These dynamic changes across the six gas indicators reflect not only the stage-specific characteristics but also the nonlinear evolution of the coal–oxygen complex reaction. Specifically, the O₂ concentration serves as a useful early warning signal in the initial stage of coal spontaneous combustion. Meanwhile, CO concentration and the CO/ΔO₂ ratio more accurately characterize the heating and spontaneous combustion stages. Furthermore, as key pyrolysis-indicative gases, the concentration variations of C₂H₄ and C₂H₆, along with the C₂H₄/C₂H₆ ratio, offer critical insights for early detection and risk assessment of coal spontaneous combustion^1,6.

ITOC validation

Summary of current status and research progress of coal spontaneous combustion optimization algorithms

Table 3 summarizes the optimization algorithms currently applied in the field of coal spontaneous combustion engineering, highlighting the methods adopted by various scholars in constructing mathematical models and the corresponding data sources. Building upon these existing approaches, the ITOC algorithm proposed in this study integrates the strengths of previous optimization techniques while achieving notable enhancements in global search capability and convergence efficiency.

Table 3 Summary of relevant optimization algorithms in the field of coal auto-ignition engineering

Numerical experimental conditions and test function selection

To validate the effectiveness of the improved ITOC algorithm, eight widely used standard benchmark functions from Ref.²² were selected for numerical experiments. These functions encompass unimodal, multimodal, and fixed-dimension multimodal categories, thereby providing a comprehensive evaluation framework for optimization algorithm performance. For comparative analysis, the ITOC algorithm was benchmarked against the optimization algorithms listed in Table 3, including the basic Tornado Optimization Algorithm (TOC), Modified Whale Optimization Algorithm (MSWOA), Particle Swarm Optimization (PSO), Improved Gray Wolf Optimization (IGWO), Sparrow Search Algorithm (SSA), and Simulated Annealing (SA), resulting in a total of seven heuristic optimization methods being assessed.

To ensure fairness and accuracy in the experimental evaluation, all algorithms were configured with identical parameters: a population size of 30 and a maximum of 500 iterations. Each algorithm was independently executed 20 times on each test function, and the optimal solutions from each run were recorded. This setup facilitates a robust comparison of algorithmic stability, convergence speed, and solution quality.
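
The evaluation protocol above (population of 30, 500 iterations, 20 independent runs, best value recorded per run) can be sketched as follows. The optimizer here is a deliberately simple placeholder, not ITOC or any of the compared algorithms, and the sphere function is used only as a familiar unimodal example; the dimension, search bounds, and all function names are assumptions of this sketch.

```python
import random
import statistics

def sphere(x):
    """Classic unimodal test function with global minimum 0 at the origin."""
    return sum(v * v for v in x)

def placeholder_optimizer(objective, dim, bounds, pop_size, iterations, rng):
    """Toy population search: sample around the current best, keep improvements."""
    lo, hi = bounds
    population = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(p) for p in population]
    for it in range(iterations):
        sigma = 0.1 * (hi - lo) * (1 - it / iterations)   # shrinking step size
        best = population[scores.index(min(scores))]
        for i in range(pop_size):
            cand = [min(max(b + rng.gauss(0.0, sigma), lo), hi) for b in best]
            score = objective(cand)
            if score < scores[i]:
                population[i], scores[i] = cand, score
    return min(scores)

def run_protocol(optimizer, objective, runs=20, pop_size=30, iterations=500,
                 dim=5, bounds=(-100.0, 100.0)):
    """Run the optimizer independently `runs` times and summarize the best values."""
    results = []
    for run in range(runs):
        rng = random.Random(run)          # a different seed for every run
        results.append(optimizer(objective, dim, bounds, pop_size, iterations, rng))
    return {
        "best": min(results),
        "mean": statistics.mean(results),
        "std": statistics.stdev(results),
        "runs": results,
    }

summary = run_protocol(placeholder_optimizer, sphere)
```

Reporting the best, mean, and standard deviation over the independent runs is one common way to compare solution quality and stability across algorithms under identical settings.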

Analysis of test results

Figure 5 presents the distribution clouds and fitness evolution curves of various algorithms across different test functions. The results clearly demonstrate that the ITOC algorithm exhibits superior optimization performance compared to the six other algorithms. It performs consistently well in solving both unimodal and multimodal functions. Notably, ITOC not only achieves the best fitness values but also converges to the optimal solution within 200 iterations in most cases.

A comparative analysis of the vertical axes across the function plots reveals that the fitness values obtained by ITOC are markedly lower than those of the other algorithms—frequently by several orders of magnitude. For the first three unimodal functions, ITOC exhibits a significantly faster convergence rate and achieves solution accuracy more than five times higher than that of the original algorithm. In the case of the three multimodal functions, ITOC maintains a clear advantage in both convergence speed and solution precision. Furthermore, for the fixed-dimension multimodal functions, ITOC accurately and rapidly locates the global optimum.

In summary, the ITOC algorithm demonstrates substantial improvements in both convergence efficiency and solution accuracy, thereby validating its enhanced global search capability and overall effectiveness.

CNN-BiGRU-CBAM coal spontaneous combustion prediction modeling

Based on the experimental data of coal spontaneous combustion temperature rise in the Dongtan Coal Mine, the original dataset was first partitioned into training, validation, and test sets in an 8:2:1 ratio to facilitate the development of a high-performance prediction model. To ensure representativeness and consistency across subsets, a hybrid strategy combining random sampling and stratified sampling was employed. Specifically, stratification was conducted based on critical attributes such as the experimental temperature intervals of coal samples and coal quality parameters, thereby maintaining a balanced distribution of features across all subsets. This approach enhances both the robustness of model training and the reliability of subsequent evaluations. The different activation function accuracies are shown in Fig. 6.
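
A stratified 8:2:1 split of this kind might look like the sketch below. It is illustrative only: the records are synthetic, the stratification key (a fixed-width temperature interval) is an assumption, and the function name `stratified_split` is hypothetical. Coal quality parameters could be added to the key in the same way.

```python
import random
from collections import defaultdict

def stratified_split(samples, stratum_of, ratios=(8, 2, 1), seed=0):
    """Split samples into train/val/test so that each stratum follows the ratios."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample in samples:
        strata[stratum_of(sample)].append(sample)

    total = sum(ratios)
    train, val, test = [], [], []
    for key in sorted(strata):
        group = strata[key]
        rng.shuffle(group)                      # random sampling inside a stratum
        n = len(group)
        n_train = round(n * ratios[0] / total)
        n_val = round(n * ratios[1] / total)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]         # remainder goes to the test set
    return train, val, test

# Synthetic records: (temperature in °C, placeholder feature value).
records = [(t, t * 0.1) for t in range(30, 360, 3)]   # 110 records

# Assumed stratification key: 110 °C-wide temperature intervals.
train, val, test = stratified_split(records, lambda rec: (rec[0] - 30) // 110)
```

Splitting within each temperature interval keeps low-, mid-, and high-temperature samples represented in all three subsets, which is the purpose of the stratification described above.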

In the model training stage, time-series data were input in the form of sliding window slices containing sequential features and corresponding target labels. Local feature extraction was initially performed using a one-dimensional convolutional layer (Conv1D), configured with 64 convolutional filters of kernel size 2, and employing the ReLU activation function to improve non-linear representation capacity. Subsequently, MaxPooling1D was applied to reduce feature dimensionality and expand the receptive field.
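
The sliding-window slicing can be sketched as below. The window length and the choice to label each slice with the temperature at its last time step are assumptions of this example, since the text does not specify them; `make_windows` is a hypothetical helper and the values are made up.

```python
def make_windows(features, targets, window):
    """Pair each run of `window` consecutive feature rows with one target label.

    Assumption of this sketch: the label is the target at the last row of the slice.
    """
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    if not 1 <= window <= len(features):
        raise ValueError("window must be between 1 and the series length")
    slices, labels = [], []
    for start in range(len(features) - window + 1):
        end = start + window
        slices.append(features[start:end])
        labels.append(targets[end - 1])
    return slices, labels

# Made-up rows of [O2 (%), CO (ppm)] with temperatures in °C as targets.
features = [[20.9, 5], [20.5, 12], [19.8, 40], [18.6, 110], [17.0, 260]]
targets = [30, 60, 90, 120, 150]
slices, labels = make_windows(features, targets, window=3)
```

With five rows and a window of three, this produces three slices, each of shape (3, 2), which is the kind of input a Conv1D layer consumes.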

The pooled features were reshaped and passed to a bidirectional gated recurrent unit (BiGRU) layer, which captured bidirectional temporal dependencies, thereby enhancing the model’s ability to learn complex temporal structures. The output of the BiGRU layer was then fed into a Convolutional Block Attention Module (CBAM), which sequentially applies channel and spatial attention mechanisms. Channel attention utilizes both average and max pooling operations to generate channel descriptors, which are passed through a shared multi-layer perceptron (MLP) to produce a channel attention map that emphasizes critical feature channels. Spatial attention, on the other hand, aggregates spatial information across channels and constructs spatial attention maps via a 7 × 7 convolution, thereby focusing on key spatial regions.

The attention-enhanced features were then flattened and input into a fully connected (Dense) layer, which integrates the multi-scale features and outputs the predicted temperature associated with coal spontaneous combustion. During training, model parameters were optimized using backpropagation until convergence. Upon completion, the model was evaluated on the test set, using metrics such as mean squared error (MSE) and prediction accuracy to assess its predictive performance and generalization capability. If the evaluation results were suboptimal, the model architecture and hyperparameters were further fine-tuned until satisfactory performance was achieved.

Hyperparameter optimization of CNN-BiGRU-CBAM model by ITOC algorithm

To enhance model performance, this study focuses on the optimization of three critical hyperparameters: learning rate (range: [0.001, 0.1]), the number of BiGRU neurons (range: [32, 256]), and the convolutional kernel size (range: [2, 8]). These hyperparameters are jointly tuned using the improved optimization algorithm ITOC (Improved Tornado Optimization with Coriolis force). In the optimization process, the ITOC algorithm initializes a population in which each individual represents a unique combination of the three hyperparameters. For each iteration, a CNN-BiGRU-CBAM model is instantiated using the parameters defined by an individual. This model is trained on the training set, and its performance is evaluated on the validation set using a loss function such as mean squared error (MSE), which serves as the individual’s fitness score. The algorithm iteratively updates the population based on these fitness evaluations, gradually converging towards the global optimum. Figure 7 shows the hyperparameter optimization process.
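
The loop described above (decode an individual into hyperparameters, score it, update the population) can be sketched as follows. Several parts are assumptions of this sketch rather than the paper's method: the update rule is a generic move-toward-best perturbation and not the ITOC update, `surrogate_fitness` is a cheap stand-in for training the network and returning the validation MSE, and rounding the neuron count and kernel size to integers is one possible way of handling continuous positions.

```python
import random

# Search ranges taken from the text.
BOUNDS = {
    "learning_rate": (0.001, 0.1),
    "bigru_units": (32, 256),
    "kernel_size": (2, 8),
}

def decode(position):
    """Map a point in [0, 1]^3 to a hyperparameter dict inside BOUNDS."""
    hp = {}
    for u, (name, (lo, hi)) in zip(position, BOUNDS.items()):
        u = min(max(u, 0.0), 1.0)
        hp[name] = lo + u * (hi - lo)
    hp["bigru_units"] = int(round(hp["bigru_units"]))   # assumed integer handling
    hp["kernel_size"] = int(round(hp["kernel_size"]))   # assumed integer handling
    return hp

def surrogate_fitness(hp):
    """Hypothetical stand-in for: build model, train, return validation MSE."""
    return ((hp["learning_rate"] - 0.01) ** 2 * 1e4
            + ((hp["bigru_units"] - 100) / 100) ** 2
            + ((hp["kernel_size"] - 4) / 4) ** 2)

def search(fitness, pop_size=30, iterations=50, seed=0):
    """Generic population search with greedy replacement (not the ITOC rule)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(3)] for _ in range(pop_size)]
    scores = [fitness(decode(p)) for p in population]
    history = [min(scores)]
    for _ in range(iterations):
        best = population[scores.index(min(scores))]
        for i, pos in enumerate(population):
            candidate = [
                min(max(x + 0.5 * (b - x) + rng.gauss(0.0, 0.05), 0.0), 1.0)
                for x, b in zip(pos, best)
            ]
            score = fitness(decode(candidate))
            if score < scores[i]:
                population[i], scores[i] = candidate, score
        history.append(min(scores))
    best_index = scores.index(min(scores))
    return decode(population[best_index]), scores[best_index], history

best_hp, best_score, history = search(surrogate_fitness)
```

In a real run, the fitness call dominates the cost because each evaluation trains a network, so the population size and iteration budget directly set the total training time.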

The rationale for optimizing these specific hyperparameters is as follows: the learning rate determines the step size in parameter updates during training—values that are too large may cause divergence, while values that are too small may result in excessively slow convergence. The number of neurons in the BiGRU layer influences the model’s ability to capture temporal dependencies, but must be balanced against the risk of overfitting. The convolutional kernel size affects the receptive field for local feature extraction, requiring a trade-off between representational capacity and computational cost.

The training environment is configured as follows: Windows 11 operating system, Python 3.8, and PyTorch 1.12. The Adam optimizer is employed with an initial learning rate of 0.001, a batch size of 16, and a maximum of 500 training epochs. The hardware setup includes an Intel Core i9-10900K (10-core CPU) and an NVIDIA GeForce RTX 3090 GPU (24 GB GDDR6X, 35,580 GFLOPS at FP16 precision), with CUDA 11.1 and cuDNN 8 for deep learning acceleration.

Following the optimization process, the optimal hyperparameter configuration is determined as: a learning rate of 0.0093, 108 neurons in the BiGRU layer, and a convolutional kernel size of 8.54. This configuration, which balances model training stability and predictive accuracy, significantly improves the model’s ability to extract and learn temporal features relevant to coal spontaneous combustion temperature prediction. As a result, it provides a robust foundation for high-precision forecasting in coal spontaneous combustion risk assessment.

Prediction results based on ITOC-CNN-BiGRU-CBAM models

The prediction process utilizing the ITOC-CNN-BiGRU-CBAM model involves dividing the coal spontaneous combustion dataset into training, validation, and test sets in a ratio of 8:2:1. This division enables a comprehensive evaluation of the model’s performance across different learning stages. Specifically, the training set is used to fit model parameters, the validation set assists in hyperparameter tuning and structural optimization during training, and the test set is reserved for an unbiased evaluation of the model’s generalization capability on unseen data. By strictly isolating these subsets, the risk of data leakage is effectively avoided, thereby ensuring the reliability and objectivity of the model evaluation.

Figure 8 presents the prediction results on the training, validation, and test sets, both before and after hyperparameter optimization. The predicted values closely match the actual values across all three datasets, demonstrating a strong fit and indicating that the model maintains high predictive performance during both the fitting and generalization phases. Notably, after hyperparameter optimization via the ITOC algorithm, the overall prediction accuracy of the model is further enhanced, as evidenced by the increase in the coefficient of determination (R²) from 0.9782 to 0.9901. This significant improvement confirms the effectiveness and high precision of the proposed ITOC-CNN-BiGRU-CBAM model in forecasting coal spontaneous combustion temperature.

Discussion

Comparative analysis of models

To systematically evaluate the performance of different models in predicting the temperature variation during coal spontaneous combustion, several representative approaches from the existing literature were implemented and compared. These include PSO-BPNN, MSOWA-BPNN, SSA-CNN, IGWO-GRU, IGWO-GRNN, and the proposed integrated deep learning architecture ITOC-CNN-BiGRU-CBAM. All models were trained and validated on a unified dataset to ensure fairness in performance evaluation.

Table 4 presents the key performance metrics for each model across the training, validation, and test sets, including the coefficient of determination (R²), mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE)²⁰. These metrics provide a comprehensive assessment of the models’ predictive capabilities, capturing aspects such as goodness of fit, relative and absolute prediction errors, and error variability.
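
For reference, the four metrics can be computed as in the sketch below, which uses their standard textbook definitions. The sample values are invented and are unrelated to Table 4, and `regression_metrics` is a hypothetical helper name.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R², MAPE (%), MAE, and RMSE using their standard definitions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when a true value is zero; temperatures here are non-zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "MAPE": mape, "MAE": mae, "RMSE": rmse}

# Invented temperatures (°C) for illustration.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(y_true, y_pred)
```

MAE and RMSE carry the unit of the target (°C here), whereas MAPE is relative and R² is dimensionless, which is why the four are usually reported together.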

Table 4 Comparison results of different model predictions

The proposed ITOC-CNN-BiGRU-CBAM model demonstrates clear superiority across all major evaluation indicators. Specifically, it achieves R² values of 0.9931, 0.9803, and 0.9738 on the training, validation, and test sets, respectively, outperforming all other compared models and indicating strong fitting and generalization capabilities. Moreover, the model records a test set MAPE of only 4.1254% and an RMSE of 12.4735, which are significantly lower than those of traditional models such as PSO-BPNN (MAPE: 10.2025%, RMSE: 21.6402) and MSOWA-BPNN (MAPE: 9.4768%, RMSE: 20.5193), illustrating a notable improvement in both accuracy and robustness.

Although some optimized deep learning models like SSA-CNN, IGWO-GRU, and IGWO-GRNN demonstrate performance enhancements over classical approaches, they remain inferior to the proposed model in both predictive accuracy and generalization ability. For instance, SSA-CNN attains an R² of only 0.9264 on the test set, highlighting its limitations in capturing temporal dependencies. Similarly, although IGWO-GRU and IGWO-GRNN show improvements in error metrics, they fall short in overall performance and fail to address the influence disparity among input features.

By contrast, the proposed model effectively integrates the local feature extraction capabilities of Convolutional Neural Networks (CNN), the bidirectional temporal dependency modeling of BiGRU, and the attention mechanism of CBAM, which dynamically recalibrates feature importance. This architecture significantly enhances the model’s ability to extract essential patterns from complex inputs, mitigates information redundancy, and reduces the omission of critical features—issues that are prevalent in traditional models.

In conclusion, the ITOC-CNN-BiGRU-CBAM model exhibits superior comprehensive performance in high-precision prediction of coal spontaneous combustion temperature, demonstrating strong practical application potential in coal safety monitoring and early warning systems.

Analysis of model application in different coal mines

To further validate the applicability and generalization capability of the ITOC-CNN-BiGRU-CBAM model in predicting the spontaneous combustion temperature of actual coal, the 3302 working face from a coal mine in Shandong and the 3301 working face from a coal mine in Shanxi were selected as the research subjects. Coal samples were collected from the central regions of these working faces, and programmed temperature rise experiments were conducted based on the spontaneous combustion characteristics of the coal. Subsequently, input data sets were constructed based on the experimental results. The collected data were divided into training, validation, and test sets at a ratio of 8:2:1. Using the established prediction model, the spontaneous combustion temperature was predicted for each set. Model performance was quantitatively assessed using two key evaluation metrics: the coefficient of determination (R²) and the root mean square error (RMSE). The prediction results under varying working conditions are presented in Fig. 9, further demonstrating the model’s effectiveness and stability in real-world applications.

The application results in different coal mine working faces show that the proposed ITOC-CNN-BiGRU-CBAM model exhibits good adaptability and stability in the task of coal spontaneous combustion temperature prediction. Figure 9a and b correspond to the test results of the Shandong 3302 working face and the Shanxi 3301 working face, respectively. Under these two actual working conditions, which differ markedly in region and coal quality, the model achieves high R² values (both above 0.97) and low RMSE values (12.4735 and 11.7261, respectively), which indicates that the model accurately captures the variation pattern of the spontaneous combustion temperature of coal samples, with excellent fitting and generalization capabilities.

Field engineering application results

To evaluate the applicability and accuracy of the ITOC-CNN-BiGRU-CBAM coal spontaneous combustion temperature prediction model under real-world mining conditions, the 4507 working face of a coal mine in Shaanxi was chosen as the application scenario. Gas samples were collected continuously for 16 days from the working face using a beam tube monitoring system. Data on various index gas concentrations, including CO, C₂H₄, C₂H₆, and others, were obtained during this period. The gas concentration parameters were used as inputs to the model to predict coal seam temperature. These predicted results were then compared with the measured temperature data from the site, as shown in Table 5.

Table 5 Beam tube monitoring system field monitoring gas sample data and temperature prediction results

From Table 5, it can be observed that the predicted temperature range for the coal seam in the 4507 working face, based on the ITOC-CNN-BiGRU-CBAM model, is predominantly between 30 °C and 45 °C, aligning closely with the measured data. Over the entire 16-day monitoring cycle, the concentration of various index gases remained low, and the temperature stayed within a relatively stable, low range without any significant increase. No signs of spontaneous combustion were detected, indicating that the coal seam in this area is in a safe condition, with no risk of spontaneous combustion. These findings fully demonstrate the robustness and practical applicability of the proposed model in complex mining environments, highlighting its potential for field implementation and widespread application.

Conclusions

In this study, using experimental data from programmed heating of coal samples from the Dongtan coal mine and considering the mechanism of the coal oxidation-pyrolysis composite reaction, Pearson’s correlation coefficient method is applied to identify six gas characteristic variables strongly correlated with the spontaneous combustion temperature of coal. Based on these variables, a deep learning prediction model incorporating the ITOC optimization strategy, ITOC-CNN-BiGRU-CBAM, is constructed. Its key hyperparameters are jointly tuned by the improved tornado optimization algorithm. To verify its prediction performance, the model is compared with mainstream coupled models from the literature. Additionally, the model is applied to different coal mine working faces using field-measured data to systematically evaluate its performance across four key indicators: R², MAPE, MAE, and RMSE. The evaluation focuses on prediction accuracy, generalization ability, and engineering adaptability. The main conclusions are as follows:

(1)
Based on the mechanism of the coal oxidation-pyrolysis composite reaction, Pearson correlation coefficient analysis was used to identify six key characteristic variables that are highly correlated with the spontaneous combustion temperature of coal. These variables, including O₂, CO, C₂H₄, CO/ΔO₂, C₂H₄/C₂H₆, and C₂H₆, were selected from multiple gas indicators. A comprehensive prediction index system for coal spontaneous combustion temperature was then constructed, centered on these six key indices.
(2)
By introducing a Cubic chaotic mapping initialization mechanism and quantum entanglement strategy, the individual position updating method was improved. This enhanced the global optimization-seeking ability of the Improved Tornado Optimization with Coriolis Force (ITOC) algorithm, which includes a Coriolis force perturbation mechanism. A deep learning prediction framework, incorporating ITOC optimization strategies, was constructed. The coupled CNN-BiGRU-CBAM model for ITOC optimization was developed. Numerical experimental results comparing five existing mainstream heuristic optimization algorithms demonstrate that the ITOC algorithm offers significant advantages in both search accuracy and convergence stability. The key hyperparameters of the CNN-BiGRU-CBAM model—learning rate, number of BiGRU neurons, and convolutional kernel size—were jointly optimized using the ITOC algorithm, with optimal configurations found to be: learning rate = 0.0093, number of BiGRU neurons = 108, and convolutional kernel size = 8.54. These optimizations improved the model’s prediction accuracy.
(3) The experimental data for coal spontaneous combustion were divided into training, validation, and test sets at a ratio of 8:2:1. The ITOC-CNN-BiGRU-CBAM model was compared with five representative prediction models from the literature (PSO-BPNN, MSWOA-BPNN, SSA-CNN, IGWO-GRU, and IGWO-GRNN). On the test set, the ITOC-CNN-BiGRU-CBAM model achieved a coefficient of determination (R²) of 0.9738, a mean absolute percentage error (MAPE) of 4.1254%, a mean absolute error (MAE) of 6.2740, and a root mean square error (RMSE) of 12.4735. These results represent a significant improvement in overall performance over the other models and demonstrate superior prediction accuracy.
(4) In validation experiments at the 3302 working face of a mine in Shandong and the 3301 working face of a mine in Shanxi, the prediction model maintained a good fit and generalization ability (R² > 0.97, RMSE < 12.5), demonstrating strong adaptability to different coal quality and geological conditions. In the on-site engineering application at the 4507 working face of a mine in Shaanxi, the model’s predictions closely matched the measured temperatures, and the coal seam was correctly identified as being in a stable state with no risk of spontaneous combustion. This further supports the robustness and practical engineering value of the model in an actual mine environment.
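
The Pearson screening step summarized in conclusion (1) can be illustrated with a short sketch. This is not the authors' code: the function name `rank_by_pearson`, the cut-off of 0.8, the column labels, and the synthetic data are all assumptions made for illustration, since the text only states that the selected indicators are "highly correlated" with temperature.

```python
import numpy as np

def rank_by_pearson(features, temperature, names, threshold=0.8):
    """Return (name, r) pairs with |r| >= threshold, sorted by |r| descending.

    `threshold` is a hypothetical cut-off chosen for this sketch only.
    """
    features = np.asarray(features, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    selected = []
    for j, name in enumerate(names):
        # Pearson r between one gas indicator column and temperature
        r = np.corrcoef(features[:, j], temperature)[0, 1]
        if abs(r) >= threshold:
            selected.append((name, float(r)))
    return sorted(selected, key=lambda item: abs(item[1]), reverse=True)

# Synthetic stand-in data (not measurements from the paper)
temperature = np.linspace(30.0, 200.0, 50)
names = ["O2", "CO", "unrelated"]
features = np.column_stack([
    21.0 - 0.05 * temperature,    # O2-like: decreases linearly with temperature
    1e-3 * temperature ** 2,      # CO-like: increases nonlinearly with temperature
    np.tile([1.0, -1.0], 25),     # alternating signal with no temperature trend
])

for name, r in rank_by_pearson(features, temperature, names):
    print(f"{name}: r = {r:+.3f}")
```

Ranking by the absolute value of r keeps indicators that fall with temperature (such as O₂) alongside those that rise with it. Pearson's r only measures linear association, so a strongly nonlinear indicator can score lower than its actual predictive value would suggest.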

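The four indicators reported in conclusions (3) and (4) can be computed as in the sketch below. It uses the standard textbook definitions of R², MAPE, MAE, and RMSE, which the paper is assumed to follow; the function name `regression_metrics` and the sample temperatures are hypothetical and are not data from the study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute R2, MAPE (in percent), MAE and RMSE for a regression result."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # MAPE divides by the measured value, so it assumes no zero targets
    mape = 100.0 * np.mean(np.abs(err / y_true))
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": float(r2), "MAPE": float(mape),
            "MAE": float(mae), "RMSE": float(rmse)}

# Hypothetical measured and predicted temperatures, for illustration only
measured = [50.0, 100.0, 150.0, 200.0]
predicted = [55.0, 95.0, 150.0, 210.0]
metrics = regression_metrics(measured, predicted)
for key, value in metrics.items():
    print(f"{key}: {value:.4f}")
```

RMSE is never smaller than MAE, and the gap widens when a few predictions have large errors. MAE and RMSE share the unit of the predicted temperature, whereas MAPE and R² are dimensionless.
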
Data availability

The datasets generated and analyzed during the current study are not publicly available due to institutional data-sharing policies but are available from the corresponding author upon reasonable request. For data access inquiries, please contact Xuming Shao at shaoxuming66@163.com.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 52274204 and 52104195), the Excellent Young Scientists Fund of the Natural Science Foundation of Liaoning Province (Grant No. 2024JH3/10200042), and the “Xing-Liao Talent Program” of Liaoning Province (Grant No. XLYC2403031).

Author information

Authors and Affiliations

Safety Science and Engineering College, Liaoning Technical University, Huludao, 125105, Liaoning, China
Xuming Shao, Gang Bai, Yan Chen & Yu Liu
Guangxi Technological College of Machinery and Electricity, Nanning, 530007, Guangxi, China
Wenhao Liu
Key Laboratory of Mine Thermodynamic Disasters and Control of Ministry of Education, Liaoning Technical University, Huludao, 125105, Liaoning, China
Gang Bai & Yu Liu
Ordos Research Institute, Liaoning Technical University, Ordos, 017004, Neimenggu, China
Gang Bai & Yu Liu
School of Electronic and Information Engineering, Liaoning Technical University, Huludao, 125105, Liaoning, China
Jiahe Guang

Contributions

Xuming Shao: Supervision, Writing – original draft, Investigation, Project administration. Wenhao Liu: Resources, Software, Validation, Writing – review & editing. Gang Bai: Conceptualization, Methodology, Funding acquisition. Yan Chen: Investigation, Writing – review & editing, Visualization. Yu Liu: Data curation, Formal analysis, Validation. Jiahe Guang: Software, Model development.

Corresponding author

Correspondence to Xuming Shao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Shao, X., Liu, W., Bai, G. et al. Deep learning framework based on ITOC optimization for coal spontaneous combustion temperature prediction: a coupled CNN-BiGRU-CBAM model. Sci Rep 15, 26700 (2025). https://doi.org/10.1038/s41598-025-11294-2

Received: 05 June 2025
Accepted: 09 July 2025
Published: 23 July 2025
DOI: https://doi.org/10.1038/s41598-025-11294-2
