Introduction

With the acceleration of urbanization and the growing demand for transportation, rail transit systems such as subways and high-speed railways have become integral components of modern urban transportation networks. In the context of urban planning and rail transit construction, subways tunneling under high-speed railway subgrades are becoming increasingly common. Because subway tunnel construction requires extensive underground excavation and support, the stability of high-speed railway subgrades can be compromised, leading to settlement or even structural deformation. The stability of high-speed railway subgrades is crucial for the safe operation of high-speed trains: uneven subgrade settlement can cause track deformation, affecting the safety of train travel and potentially leading to serious incidents such as derailments. Therefore, when a subway is constructed beneath a high-speed railway subgrade, real-time monitoring of subgrade settlement and timely warning of potential risks are key measures to ensure the safe operation of the high-speed railway.

Currently, high-speed railway subgrade settlement monitoring typically relies on a high-precision sensor network deployed on the subgrade. These sensors collect real-time data on settlement, strain, and tilt1. Effectively processing and analyzing this vast amount of sequential data, and providing timely warnings, has become a critical research focus. With the development of big data technology and artificial intelligence, sequence data analysis methods based on machine learning2 and deep learning3 offer new solutions for high-speed railway subgrade settlement early warning. For new railway projects built near an operating high-speed railway, it is crucial to assess the potential impact of additional settlement and deformation on the existing infrastructure during the design phase, and effective mitigation measures must be implemented during both the design and construction phases. Liang et al.4 proposed a study on high-speed railway subgrade settlement early warning mechanisms based on artificial neural networks (ANNs), utilizing ANNs to address challenges related to subgrade settlement prevention.

However, most existing methods do not take into account the unique characteristics of high-speed railway subgrade settlement data, namely:

1. Blurred Features of High-Speed Railway Subgrade Settlement Data: Settlement data for similar high-speed railway subgrades come from a variety of sensors, and their intrinsic characteristics are not pronounced, making effective feature extraction difficult.

2. Long-Term Dependency in Settlement Sequence Data: Long-term dependency refers to the phenomenon where the current value in a sequence not only depends on recent data points but also has significant associations with data points further in the past. Capturing these long-term dependencies is crucial for accurate prediction and early warning of high-speed railway subgrade settlement.

In recent years, Transformers have gained increasing recognition, attracting many prominent scholars to the field of Transformer research5. Transformers are now widely applied across various domains, including data mining6, time series prediction7, and other related fields, achieving significant results8.

To address the aforementioned issues, this paper takes surface settlement data collected during shield construction as its basis and, in combination with the actual engineering situation, proposes a Transformer-based high-speed railway subgrade settlement early warning method, the TD Transformer. This method incorporates Temporal-Spatial Enhanced Attention (TSEA) and Dynamic Global Temporal Attention (DGTA) to effectively extract key features of settlement data and dynamically capture long-term dependencies. The effectiveness of the method is validated on a practical shield tunneling project in which Xi’an Metro Line 1 tunnels beneath the Xu Lan High-Speed Railway.

Related work

Settlement warning method

Settlement early warning for high-speed railway subgrades is a crucial research direction for ensuring the safe operation and structural stability of high-speed railways. With the rapid expansion of the high-speed rail network, effectively monitoring and predicting settlement conditions has become a focus for both academia and the engineering community. Existing high-speed railway settlement methods mainly include surface monitoring9, underground monitoring10, numerical simulation11, and intelligent early warning12. Although the first three are grounded in soil mechanics theory, they typically depend on accurate parameter calibration and encounter difficulties under complex geological conditions. As science and technology advance, intelligent early warning systems have become mainstream. These systems combine big data and artificial intelligence technologies, using machine learning algorithms to analyze and process monitoring data and establish high-speed railway settlement early warning models. He et al.13 proposed a comprehensive risk assessment and early warning method. By integrating expert consultation, finite element model analysis, a combination of quantitative and qualitative risk assessment, and the ARIMA method for deformation prediction, they established a settlement early warning system for the underground passage of Cuiling Road beneath the Beijing-Tianjin Intercity Railway. The system effectively reduces risks and ensures structural safety, showing potential for application in similar projects. Jehanzaib et al.14, while achieving certain results using Decision Tree (DT), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM) classifiers for prediction, faced shortcomings in feature extraction: they neglected the temporal dependency information in time series data and failed to fully exploit its temporal dynamics.
Cai et al.15 used XGBoost to construct a prediction model. Although experiments showed the method’s effectiveness in early warning, its feature extraction produced blurred features, making it difficult to clearly reflect important patterns in the data. Zhao et al.16 designed a hazard source identification and early warning system based on the RF algorithm, effectively addressing the large errors and long response times of traditional systems. Simulation results showed that the average identification error of the system was only 4.1%, and the early warning response time could be controlled within 9 s. However, the system failed to capture long-term dependencies in the data, limiting its ability to fully utilize historical information for prediction. More recently, machine learning methods such as support vector machines (SVMs) and random forests have improved prediction accuracy, yet they often lack interpretability and may underperform on small datasets. These limitations highlight the need for a more robust, data-driven approach capable of modeling the complex spatiotemporal dependencies in settlement behavior, a gap our Transformer-based framework aims to address by leveraging self-attention mechanisms for enhanced feature extraction and early warning performance.

Transformer

Since its introduction in 2017, the Transformer model has achieved significant results, greatly enhancing computational efficiency and parallel processing capability through its self-attention mechanism. It has seen notable success in both NLP and CV, and Transformer models have recently been applied to fault early warning systems. Yan et al.17 proposed a hybrid prediction and early warning model based on time series analysis, utilizing adaptive normalization, GRU, EEMD, and the Optuna framework for optimization. This approach achieved highly accurate predictions and early warnings, significantly improving the early warning capability and predictive performance of smart operations. Song et al.18 introduced a model that integrates LSTM and Self-Attention; although it improved the monitoring and prediction accuracy of financial market risks by capturing long-term correlations and trends, it still relied on traditional time series techniques for feature extraction, resulting in blurred features. Li et al.19 proposed a Transformer-based prediction framework that accurately captures time series features through multi-layer encoders and attention modules. Data validation demonstrated that this framework outperformed an LSTM framework, reducing prediction errors by 50%. However, the model still struggled to dynamically capture long-term dependencies in time series data, affecting accurate diagnosis and early warning. Chen et al.20 developed a framework named TCN-Transformer. To overcome the limitations of RNNs, it combines Temporal Convolutional Networks (TCNs) and Transformers to extract both local and global features. Additionally, a loss function was designed to ensure classification performance while focusing more on early features.

Attention mechanism

The Attention Mechanism plays a crucial role in the field of deep learning, especially in processing time series data21 and natural language processing22. This mechanism enhances the efficiency and effectiveness of information processing by allowing models to focus on the most important parts when handling information.

Attention mechanisms based on Convolutional Neural Networks (CNNs) are typically used for image and video processing but can also be applied to time series data23. This type of attention mechanism is usually realized by adding a weight layer that weights the importance of different parts of the input data. Xuan et al.24 improved Long Short-Term Memory networks (MCA-LSTM) by employing a multi-channel attention mechanism, which constructs memory channels through weighted attention to enhance cross-channel feature interaction. This approach allows wind farm faults to be detected approximately 10 hours earlier than traditional records. Zhou et al.25 proposed several models for time series prediction and classification, including an LSTM with an autoencoder and temporal attention mechanism, and TCN models with a temporal attention mechanism26. Chang et al.27 introduced the Channel Attention Mechanism to enhance prediction accuracy and stability in time series forecasting. This mechanism captures key features and contextual information in the data, automatically adjusting the weights of feature channels to focus on the impact of critical features.

Self-attention is currently one of the most popular attention models, especially in time series and natural language processing. This mechanism does not rely on traditional sequence alignment but instead calculates the relationships between different parts of the sequence to determine the focus area, allowing the model to handle longer sequences and achieve a deeper understanding of context. The Transformer model proposed by Vaswani et al.28 is a significant milestone in the application of the Attention Mechanism. Its core idea is to use the Masked Multi-head Self-Attention Mechanism to capture dependencies between any two positions in a sequence29. Because CNNs are limited in establishing dependencies over long sequences, they face challenges in processing hyperspectral sequence features. To overcome these limitations, inspired by the Transformer model, Peng et al.30 proposed a Cross-Attention Spatial-Spectral Transformer (CASST) method.

Methods

To address the vague features of existing high-speed railway settlement data and the difficulty of capturing long-term dependencies, this paper proposes a Settlement Early Warning Method for High-Speed Railway Subgrades Based on TD Transformer. We utilize Temporal-Spatial Enhanced Attention (TSEA) and Dynamic Global Temporal Attention (DGTA) to construct feature representations for settlement early warning. First, we use the Transformer model to predict the settlement conditions of high-speed railway subgrades (as shown in Fig. 1); the Transformer model is well-suited to time series data. Next, we apply TSEA to the input part of the model, obtaining weighted representations of sensor values based on their attention scores. These learned attention scores indicate the contribution of each settlement influencing factor to the feature representations used in subsequent layers, enhancing the model’s feature extraction capability. Finally, DGTA aggregates information from all time steps in the sequence through the attention mechanism, generating a dynamic global representation. This approach is highly effective at capturing long-term dependencies across the entire time series. Detailed discussions of TSEA and DGTA are provided in the following sections.

Fig. 1
figure 1

TD transformer network structure.

Transformer

Compared to traditional time series processing models, the Transformer excels in handling more complex features, including nonlinear and hierarchical characteristics, largely due to its multi-head self-attention mechanism. This mechanism allows the model to simultaneously focus on different parts of the input sequence, thereby enhancing its ability to capture intricate relationships within the sequence.

The original Transformer model we use consists of three parts: input, encoder, and output. Firstly, the input part of the Transformer is the result of adding the input embedding and the positional embedding. Secondly, the Transformer’s encoder primarily employs Multi-Headed Self-Attention (MHSA), which assigns a weight to each position in the input sequence and then uses these weighted positional vectors as the output for the next part. Finally, the output part of the Transformer passes the feature information processed by the encoder to the softmax layer for final classification.
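The input construction described above (input embedding plus positional embedding) can be illustrated with the standard sinusoidal positional encoding from Vaswani et al.28; the sequence length and model dimension below are arbitrary, and the random embedding is a stand-in since the paper does not specify its embedding scheme:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions: cosine
    return pe

# Transformer input = input embedding + positional embedding
seq_len, d_model = 48, 64                              # hypothetical sizes
embeddings = np.random.randn(seq_len, d_model)         # e.g. projected sensor readings
model_input = embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

The encoder's multi-headed self-attention and the final softmax classification layer then operate on `model_input` as described above.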

TSEA

TSEA (Temporal-Spatial Enhanced Attention) is a convolution-based attention mechanism. In neural networks, attention mechanisms allow the model to focus on the most important parts of the input data. For time series data such as settlement factors, this helps the original Transformer model identify and emphasize the most informative time steps or influencing factors for the final task. TSEA is implemented through the following steps, as shown in Fig. 2. First, dilated convolution is used to extract features, capturing long-term dependencies in the time series data. Next, a 1x1 convolution aggregates the features into a single channel, reducing dimensionality and preparing for the calculation of attention scores. Then, separable convolution further processes the features, reducing the number of parameters and optimizing the computation of attention weights. Finally, the Softmax function computes the attention scores, allowing the model to assign different importance to various features or time steps. This process produces a weighted version of the input, in which important information from each sensor’s data is highlighted through learned features. TSEA is therefore not just a conventional convolutional layer; it integrates an attention mechanism to enhance the model’s feature extraction capability and provide more salient feature information.
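The four TSEA steps can be sketched in plain NumPy as follows. This is only a minimal illustration: the weights are random stand-ins for learned parameters, the channel and sequence sizes are hypothetical, and the separable convolution is collapsed to the single-channel case.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_conv1d(x, w, dilation):
    """x: (channels, time); w: (out_ch, in_ch, k). 'Same' zero-padding."""
    out_ch, in_ch, k = w.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    out = np.zeros((out_ch, T))
    for t in range(T):
        taps = xp[:, t : t + dilation * (k - 1) + 1 : dilation]  # (in_ch, k)
        out[:, t] = np.tensordot(w, taps, axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(0)
C, T = 4, 32                                   # sensors (channels) x time steps
x = rng.normal(size=(C, T))                    # raw settlement-factor series

# 1) dilated convolution: widen the receptive field over time
h = np.tanh(dilated_conv1d(x, rng.normal(size=(8, C, 3)) * 0.1, dilation=2))
# 2) 1x1 convolution: aggregate the 8 feature maps into a single channel
h = np.tensordot(rng.normal(size=(1, 8)) * 0.1, h, axes=([1], [0]))   # (1, T)
# 3) separable convolution (here trivially single-channel) refines features
h = dilated_conv1d(h, rng.normal(size=(1, 1, 3)) * 0.1, dilation=1)
# 4) softmax over time -> attention scores, then reweight the input
scores = softmax(h, axis=-1)                   # (1, T), sums to 1 over time
weighted = x * scores                          # broadcast: emphasize key steps
```

In a real implementation these convolutions would be learned layers in a deep-learning framework, but the data flow (dilated conv, 1x1 aggregation, separable conv, softmax weighting) is the same.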

Fig. 2
figure 2

TSEA module network structure.

DGTA

We utilize DGTA to extract feature representations at each time point and evaluate their relative importance for predicting the subsidence warning level within a given time window, while capturing their long-term dependencies. Using the attention scores obtained from Eq. (2), we compute a dynamically weighted average feature representation that encompasses information from all time points within the observation window. The attention scores are dynamically adjusted through a learnable scaling factor, which enhances the Transformer’s expressive power, particularly for subsidence data with long-term dependencies. The resulting weighted average feature vector is then used by the feed-forward network layers to determine the current subsidence status.

$$\begin{aligned} & g^{\left( t_i\right) }=\tanh \left( W_{g a} \cdot s^{\left( t_i\right) }+b_{g a}\right) \end{aligned}$$
(1)
$$\begin{aligned} & a^{\left( t_i\right) }=\frac{\exp \left( s \cdot \left( g^{\left( t_i\right) }\right) ^T \cdot g_s\right) }{\sum _t \exp \left( s \cdot \left( g^{\left( t\right) }\right) ^T \cdot g_s\right) } \end{aligned}$$
(2)
$$\begin{aligned} & c_i=\sum _t a^{\left( t\right) } \cdot s^{\left( t\right) } \end{aligned}$$
(3)

The parameters \(W_{g a}\) and \(b_{g a}\) in Eq. (1) are learned during training and are used to extract hidden representations from each vector \(s^{\left( t_i\right) }\) generated by the self-attention module. The parameter \(g_s\) in Eq. (2) captures temporal context and assists in computing attention scores during learning. The factor s is a learnable dynamic scaling factor that adjusts the overall attention scores, allowing the model to adapt more flexibly to the classification of subsidence factors. Finally, in Eq. (3) a weighted sum is computed based on the relative importance of each time step, producing a weighted aggregate that serves as the input feature vector for the subsequent fully connected layers.
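Equations (1)–(3) can be reproduced directly in NumPy. The parameters below are randomly initialised stand-ins for the learned ones, and the window length and feature size are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 10, 16                       # time steps in the window, feature size
S = rng.normal(size=(T, d))         # s^(t): self-attention outputs per step

# Learnable parameters (random stand-ins for illustration)
W_ga = rng.normal(size=(d, d)) * 0.1
b_ga = np.zeros(d)
g_s = rng.normal(size=d)            # temporal-context vector
s_scale = 1.0                       # dynamic scaling factor s

# Eq. (1): hidden representation of each time step
G = np.tanh(S @ W_ga.T + b_ga)      # (T, d)

# Eq. (2): softmax attention scores over the window
logits = s_scale * (G @ g_s)        # (T,)
a = np.exp(logits - logits.max())
a = a / a.sum()

# Eq. (3): dynamically weighted average feature vector
c = (a[:, None] * S).sum(axis=0)    # (d,), input to the feed-forward layers
```

The vector `c` is the weighted aggregate passed to the subsequent fully connected layers.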

Experiment

Data preparation

The data used in this paper were collected from subgrade settlement monitoring along Xi’an Metro Line 1 between Qin Du Station and Bao Quan Road Station. This line tunnels beneath two railway lines: the Xu Lan High-Speed Railway and the Long Hai Railway. We studied the settlement data of these two real datasets from July 2022 to March 2025, a monitoring period of 1005 days with an hourly acquisition cycle, totaling 24120 data points. The training set consists of 14472 data points, the test set of 4824, and the validation set of 4824 (a 60/20/20 split).

Monitoring data were collected through IoT smart remote control terminals and transmitted wirelessly in real time to the monitoring data platform. Initial deformation values were uploaded to the platform via acquisition software, and subsequent monitoring values were collected in real time. Because the dataset is strongly continuous, polynomial interpolation is used to infer reasonable missing values from the existing data, reducing information loss. Table 1 presents the high-speed railway subgrade settlement warning levels based on settlement distance and settlement velocity.

Table 1 High-speed railway subgrades settlement warning levels.
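The polynomial interpolation step used to fill missing monitoring values can be sketched as follows. The global quadratic fit is an assumption for illustration: the paper does not state the polynomial degree or the fitting window, and a production pipeline might fit locally around each gap instead.

```python
import numpy as np

def fill_missing_poly(t, y, degree=2):
    """Fill NaN gaps by fitting a low-degree polynomial to observed points."""
    y = np.asarray(y, dtype=float)
    mask = np.isnan(y)
    if not mask.any():
        return y
    coeffs = np.polyfit(t[~mask], y[~mask], deg=degree)  # fit observed data
    y = y.copy()
    y[mask] = np.polyval(coeffs, t[mask])                # evaluate at the gaps
    return y

t = np.arange(8, dtype=float)
# Hypothetical hourly settlement series with two missing readings
y = np.array([0.0, 1.0, 4.0, np.nan, 16.0, 25.0, np.nan, 49.0])
filled = fill_missing_poly(t, y, degree=2)
```

Since the toy series above is exactly quadratic, the gaps at t = 3 and t = 6 are recovered as 9 and 36.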

Based on long-term industry verification, the settlement warning level for high-speed railway subgrades is determined by two main factors: settlement distance and settlement velocity. Table 2 presents partial settlement data for the Xu Lan High-Speed Railway, including settlement distance, distance warning level, settlement velocity, velocity warning level, and the comprehensive warning level. The comprehensive warning level is the final result of settlement warning for high-speed railway subgrades. In Table 2, the settlement distance and settlement velocity are monitored through IoT smart remote control terminals; the distance and velocity warning levels are obtained from the thresholds in Table 1; and the comprehensive warning level is labeled by engineers in actual scenarios.

Table 2 Settlement data in the Xu Lan high-speed railway.
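The mapping from the two indicators to per-indicator warning levels can be sketched as below. The threshold values are hypothetical placeholders (the real boundaries are those of Table 1), and taking the maximum of the two levels is only one common convention for combining them; in this paper the comprehensive level is labeled by engineers.

```python
def level_from_thresholds(value: float, thresholds) -> int:
    """Map a measurement to a warning level: 0 = normal, rising with severity.
    `thresholds` is an ascending list of level boundaries (hypothetical)."""
    level = 0
    for bound in thresholds:
        if value >= bound:
            level += 1
    return level

# Hypothetical boundaries -- the real ones come from Table 1 in the paper.
DIST_BOUNDS = [2.0, 5.0, 10.0]      # settlement distance, mm
VEL_BOUNDS = [0.5, 1.0, 2.0]        # settlement velocity, mm/day

def comprehensive_level(distance_mm: float, velocity_mm_day: float) -> int:
    """One simple convention: the stricter (higher) of the two indicator levels."""
    return max(level_from_thresholds(distance_mm, DIST_BOUNDS),
               level_from_thresholds(velocity_mm_day, VEL_BOUNDS))
```

For example, a 6 mm settlement at a negligible velocity would map to level 2 under these placeholder thresholds.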

When using the TD Transformer model for the warning task, the data in Table 2 serve as the model’s training set. The input features primarily consist of settlement distance values, settlement velocity, distance warning level, velocity warning level, and comprehensive warning level. The model outputs predicted settlement distance values, settlement velocity, and the associated comprehensive warning levels. The common evaluation metrics are Accuracy, Precision, Recall, and F1-Score, which assess the model’s performance from different perspectives:

Accuracy: Represents the proportion of correctly predicted samples out of the total number of samples. High accuracy indicates that the model performs well overall; however, it may be biased in datasets with class imbalances.

$$\begin{aligned} \text{ Accuracy } =\frac{T P+T N}{N} \end{aligned}$$
(4)

Precision: Measures the accuracy of the model in predicting positive samples, i.e., the proportion of true positive samples among all samples predicted as positive.

$$\begin{aligned} \text{ Precision } =\frac{T P}{T P+F P} \end{aligned}$$
(5)

Recall: Measures the model’s ability to identify positive samples, i.e., the proportion of true positive samples correctly predicted among all actual positive samples. High recall indicates that the model has few false negatives.

$$\begin{aligned} \text{ Recall } =\frac{T P}{T P+F N} \end{aligned}$$
(6)

F1-Score: The harmonic mean of Precision and Recall, it considers both false positives and false negatives. A high F1-Score indicates that the model performs more reliably when dealing with class imbalance.

$$\begin{aligned} F 1=\frac{2 \times \text{ Precision } \times \text{ Recall } }{ \text{ Precision } + \text{ Recall } } \end{aligned}$$
(7)
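Equations (4)–(7) can be computed from the confusion-matrix counts as follows. The binary formulation matches the equations above; the note on macro-averaging is an assumption about how they would extend to multiple warning levels.

```python
import numpy as np

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, Precision, Recall, F1 (Eqs. 4-7) for one positive class.
    For multi-class warning levels, compute per class and average (e.g. macro)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # true positives
    tn = np.sum((y_pred != positive) & (y_true != positive))  # true negatives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false positives
    fn = np.sum((y_pred != positive) & (y_true == positive))  # false negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

On this toy example, accuracy is 0.6 and precision, recall, and F1 are each 2/3.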

Ablation studies

Result of TSEA

To validate the effectiveness of the proposed method, we conducted ablation experiments. By comparing TSEA and DGTA with other attention mechanisms, we obtained results that further demonstrate the superiority of our attention mechanisms. As shown in Table 3, we compared TSEA with other CNN-based attention mechanisms. The results indicate that TSEA effectively extracts temporal and spatial features from ground subsidence data, enhancing the model’s ability to capture subsidence changes.

Table 3 Results of ablation experiments on the Xu Lan High Speed Railway dataset.
Fig. 3
figure 3

Comparison of model training accuracy and loss.

Figure 3 demonstrates that all enhanced Transformer models achieve better training performance compared to the baseline. The Transformer + TSEA model stands out with the fastest convergence and the highest training accuracy, suggesting that the integration of Temporal-Spatial Enhanced Attention significantly improves the model’s learning efficiency. Although the Transformer + Channel Attention and Transformer + TCN models also show substantial reductions in training loss, their accuracy improvements are more gradual. The baseline Transformer, in contrast, converges more slowly and exhibits greater fluctuation in loss. Throughout training, none of the models show obvious signs of overfitting, as indicated by the steady increase in accuracy and consistent decline in loss. Among them, the TSEA-based model maintains strong performance and stability, indicating promising generalization capability.

Result of DGTA

In the experimental results shown in Table 4, we compared DGTA with other self-attention-based mechanisms such as Cross-Attention and Masked Multi-head Self-Attention (MMSA), verifying the performance improvement of the model with the introduction of DGTA. DGTA uses a dynamically weighted average feature representation that effectively captures the long-term dependencies in time series data, enhancing the model’s stability and accuracy in handling complex subsidence patterns. By dynamically adjusting the attention scores through learned parameters, the model achieves greater accuracy in predicting subsidence warning levels.

Table 4 Results of ablation experiments on the Xu Lan High Speed Railway dataset.

The results in Table 4 demonstrate the effectiveness of the proposed TD Transformer-based subsidence early warning method for high-speed railway embankments. The Transformer model performs well across various metrics, indicating its high predictive accuracy and stability in processing and analyzing ground subsidence data. From the table, it is evident that the proposed method’s effectiveness has been fully validated. The Transformer model’s strong performance on all indicators shows its capability in accurately predicting and analyzing ground subsidence data. Figure 4 presents the training accuracy and loss of the baseline Transformer and its improved variants. All enhanced models demonstrate faster convergence compared to the baseline, with the Transformer + DGTA achieving the highest accuracy and the lowest, most stable loss throughout training. This suggests that the Dynamic Global Temporal Attention (DGTA) mechanism not only accelerates learning but also enhances model robustness. The Transformer + Cross-Attention and Transformer + MMSA models also contribute to performance gains, showing clear reductions in training loss, though their accuracy improvement is more gradual. Importantly, none of the models exhibit signs of overfitting, as the loss continues to decline steadily and no sharp fluctuations appear in later epochs, indicating good generalization behavior during training.

Fig. 4
figure 4

Comparison of model training accuracy and loss.

Comprehensive experiment

Table 5 shows that the TSEA mechanism effectively resolves feature blurring in high-speed railway subsidence data extraction. The Transformer + TSEA model achieves 93.10% accuracy, 92.85% precision, 93.05% recall, and 92.97% F1-score, with 87.1M parameters and 720 samples/s inference speed. Compared to the baseline Transformer, TSEA improves performance with only a 0.8% parameter increase and a minimal 7% speed reduction. The DGTA module further enhances long-term dependency modeling, boosting accuracy to 93.20%, precision to 93.00%, recall to 93.10%, and F1-score to 93.05%, while using 89.3M parameters and maintaining 650 samples/s throughput. This demonstrates that DGTA’s added complexity is justified by its performance gains. Our TD Transformer, combining both mechanisms, achieves the best results: 93.39% accuracy, 93.10% precision, 93.40% recall, and 93.24% F1-score. With 90.2M parameters and 520 samples/s speed, it balances computational cost and accuracy, improving over the baseline by 1.24 percentage points while maintaining efficient inference. These results confirm that TSEA and DGTA synergistically enhance the Transformer for subsidence early warning, with accuracy gains outweighing computational overhead. The TD Transformer’s optimal performance makes it suitable for real-world deployment in railway monitoring systems.

Table 5 Experimental results of ablation on the Xu Lan High Speed Railway dataset.
Table 6 Experimental results of ablation on the Long Hai Railway dataset.

Table 6 presents the ablation results on the Long Hai Railway dataset, demonstrating the effectiveness of the proposed TSEA and DGTA modules. The baseline Transformer achieves 91.67% accuracy, 91.20% precision, 91.50% recall, and a 91.35% F1-score with 86.4M parameters while processing 810 samples per second. Integrating TSEA improves performance to 92.48% accuracy, 92.10% precision, 92.30% recall, and a 92.20% F1-score using 87.1M parameters at 690 samples per second, confirming its ability to refine local spatial and temporal representations in noisy railway subsidence data. The addition of the DGTA module further improves results to 92.95% accuracy, 92.60% precision, 92.80% recall, and a 92.70% F1-score with 89.3M parameters at 620 samples per second, demonstrating superior capability in capturing long-term temporal dependencies for subsidence trend analysis. The complete TD Transformer, combining both modules, achieves the best performance with 93.35% accuracy, 93.10% precision, 93.20% recall, and a 93.10% F1-score using 90.2M parameters while maintaining efficient inference at 570 samples per second, establishing an optimal balance between feature refinement, temporal modeling, and computational efficiency for railway subsidence prediction.

Sequential experiment

As shown in Table 7, the experimental results indicate that applying TSEA before DGTA yields better evaluation metrics. Starting with TSEA may better capture dependencies and structural information, providing more detailed features, which in turn enhances DGTA’s ability to dynamically capture the characteristics of subsidence data. This sequential synergy optimizes overall model performance.

Table 7 Results of sequential experiments on the Xu Lan High Speed Railway dataset.

Experimental result

This study establishes a high-speed railway subsidence early warning model using two indicators: Settlement Distance and Settlement Velocity. These indicators are jointly used to determine the subsidence warning level. The warning level for each indicator is calculated based on the warning range, and the overall warning level is computed by combining the weights and warning levels of each indicator. The experimental results of the TD Transformer algorithm are shown in Table 8 .

Table 8 TD transformer experimental results.

Table 8 reveals significant uncertainty in high-speed railway subsidence and its rate of change. The data indicate that regions with larger settlement values also correspond to higher settlement velocities, suggesting more pronounced subsidence changes in these areas. This is fully reflected in the subsidence early warning mechanism. The TD Transformer model not only effectively captures both short-term and long-term subsidence changes but also weights features based on the importance of different sensor data, thereby enhancing its ability to warn about subsidence conditions. Table 9 presents the experimental results of SVM, XGBoost, RF, Transformer, and TD Transformer across four evaluation metrics: Accuracy, Precision, Recall, and F1-Score.

Table 9 Experimental results with different models on the Xu Lan High Speed Railway dataset.

As observed from Table 9 and Fig. 5, the evaluation metrics of the TD Transformer surpass those of the traditional Transformer. The TD Transformer achieves an accuracy of 93.39%, a 1.24 percentage point improvement over the Transformer; a precision of 93.10%, a 1.3 point increase; a recall of 93.40%, also a 1.3 point increase; and an F1-score of 93.24%, a 1.27 point improvement. These results indicate that the TD Transformer excels in all aspects, particularly precision and F1-score, and demonstrate its significant superiority over the other methods in the high-speed railway subsidence early warning task.

Fig. 5
figure 5

Experimental results of SVM, XGBoost, RF, Transformer and TD transformer.

Conclusion

High-speed railway subgrade settlement is crucial to the safety of high-speed trains. Existing monitoring data features are often vague, and long-term dependencies are difficult to capture, increasing the difficulty of early warning. To address these issues, this paper proposes a Settlement Early Warning Method for High-Speed Railway Subgrades Based on TD Transformer. The method first employs Temporal-Spatial Enhanced Attention (TSEA) for feature extraction, effectively improving the model’s feature extraction capability and resolving the problem of feature vagueness. Second, Dynamic Global Temporal Attention (DGTA) dynamically captures the long-term dependencies of the settlement data. Experimental results show that TD Transformer outperforms traditional high-speed railway subgrade settlement early warning methods across all metrics, demonstrating significant advantages in Accuracy, Precision, Recall, and F1-Score. However, despite its strong performance, the TD Transformer has some limitations. First, its high computational complexity significantly increases hardware resource requirements, limiting its application in resource-constrained environments. Second, the model’s early warning effectiveness may be affected by unforeseen factors such as extreme weather or sudden events: extreme weather can cause rapid changes in subgrade settlement, and sudden events can trigger abnormal settlement, reducing the accuracy of short-term warnings. Future research should focus on optimizing the model’s computational efficiency and incorporating more external factors to enhance the robustness of the early warning system, thereby coping more effectively with unforeseen settlement changes.