Abstract
During high-speed railway construction, shield-tunnel undercrossing frequently induces subgrade settlement, which threatens project safety and progress. Existing settlement monitoring methods struggle to provide timely early warnings due to unclear data features and inadequate long-term dependency modeling. To address this, we propose a settlement early warning method for high-speed railway subgrades based on the TD Transformer. First, we utilize temporal-spatial enhanced attention (TSEA) to extract features from high-speed railway settlement data, effectively resolving the problem of vague features after extraction. Second, dynamic global temporal attention (DGTA) is employed to dynamically capture and represent the long-term dependencies of settlement data. Experimental results demonstrate that the TD Transformer achieves Accuracy, Precision, Recall, and F1-Score of 93.39%, 93.10%, 93.40%, and 93.24%, respectively, outperforming other advanced settlement early warning methods for high-speed railway subgrades with relative improvements of 1.24%, 1.3%, 1.3%, and 1.27%. This method effectively forecasts subgrade settlement and exhibits significant superiority in the task of multi-factor settlement early warning for high-speed railway subgrades.
Introduction
With the acceleration of urbanization and the growing demand for transportation, rail transit systems such as subways and high-speed railways have become integral components of modern urban transportation networks. In urban planning and rail transit construction, subways tunneling under high-speed railway subgrades are becoming increasingly common. Because subway tunnel construction requires extensive underground excavation and support, the stability of high-speed railway subgrades can be compromised, leading to settlement or even deformation. The stability of high-speed railway subgrades is crucial for the safe operation of high-speed trains: uneven subgrade settlement can cause track deformation, affecting travel safety and potentially leading to serious incidents such as derailments. Therefore, during the construction of subway tunnels underpassing high-speed railway subgrades, real-time monitoring of subgrade settlement and timely warning of potential risks are key measures to ensure the safe operation of high-speed railways.
Currently, high-speed railway subgrade settlement monitoring typically relies on a high-precision sensor network deployed on the subgrade. These sensors collect real-time data on settlement, strain, and tilt1. Effectively processing and analyzing this vast amount of sequential data, and providing timely warnings, has become a critical research focus. With the development of big data technology and artificial intelligence, sequence data analysis methods based on machine learning2 and deep learning3 offer new solutions for high-speed railway subgrade settlement early warning. For new railway projects near operating high-speed railways, it is crucial to assess the potential impact of additional settlement and deformation on the existing infrastructure during the design phase, and effective countermeasures must be implemented during both design and construction. Liang et al.4 studied high-speed railway subgrade settlement early warning mechanisms based on artificial neural networks (ANNs), using them to address challenges related to subgrade settlement prevention.
However, most methods do not take into account the unique characteristics of high-speed railway subgrade settlement data, namely:
1. Blurred Features of High-Speed Railway Subgrade Settlement Data: Settlement data for high-speed railway subgrades come from a variety of sensors, and their intrinsic characteristics are not pronounced, making effective feature extraction difficult.
2. Long-Term Dependency in Settlement Sequence Data: Long-term dependency refers to the phenomenon where the current value in a sequence not only depends on recent data points but also has significant associations with data points further in the past. Capturing these long-term dependencies is crucial for accurate prediction and early warning of high-speed railway subgrade settlement.
In recent years, Transformers have gained increasing recognition, attracting many prominent scholars to the field of Transformer research5. Transformers are now widely applied across various domains, including data mining6, time series prediction7, and other related fields, achieving significant results8.
To address the aforementioned issues, this paper takes the surface settlement data collected during shield construction as its basis and, in combination with the actual engineering conditions, proposes a high-speed railway subgrade settlement early warning method based on the TD Transformer. This method incorporates Temporal-Spatial Enhanced Attention (TSEA) and Dynamic Global Temporal Attention (DGTA) to effectively extract key features of settlement data and dynamically capture long-term dependencies. The effectiveness of this method is validated through a practical shield tunneling project in which Xi’an Metro Line 1 tunnels beneath the Xulan high-speed railway.
Related work
Settlement warning method
Settlement early warning for high-speed railway subgrades is a crucial research direction to ensure the safe operation and structural stability of high-speed railways. With the rapid expansion of the high-speed rail network, effectively monitoring and predicting settlement conditions has become a focus for both academia and the engineering community. Existing high-speed railway settlement methods mainly include surface monitoring9, underground monitoring10, numerical simulation11, and intelligent early warning12. Although these methods are grounded in soil mechanics theory, they typically depend on accurate parameter calibration and encounter difficulties when applied to complex geological conditions. As science and technology advance, intelligent early warning systems have become mainstream. These systems combine big data and artificial intelligence technologies, using machine learning algorithms to analyze and process monitoring data to establish high-speed railway settlement early warning models. He et al.13 proposed a comprehensive risk assessment and early warning method. By integrating expert consultation, finite element model analysis, a combination of quantitative and qualitative risk assessment, and the ARIMA method for deformation prediction, they established a settlement early warning system for the underground passage of Cuiling Road in the Beijing-Tianjin Intercity Railway. This system effectively reduces risks and ensures structural safety, showing potential for application in similar projects. Jehanzaib et al.14, while achieving certain results using Decision Tree (DT), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM) classifiers for prediction, faced shortcomings in feature extraction: they neglected the temporal dependency information in time series data and failed to fully exploit its temporal dynamics.
Cai et al.15 used XGBoost to construct a prediction model. Although experiments showed the method’s effectiveness in early warning, the extracted features were blurred, making it difficult to clearly reflect important patterns in the data. Zhao et al.16 designed a hazard source identification and early warning system based on the RF algorithm, effectively addressing the large errors and long response times of traditional systems. Simulation results showed that the system’s average identification error was only 4.1%, and the early warning response time could be controlled within 9 s. However, the system failed to effectively capture long-term dependencies in the data, limiting the model’s ability to fully utilize historical information for prediction. More recently, machine learning methods such as support vector machines (SVMs) and random forests have improved prediction accuracy, yet they often lack interpretability and may underperform on small datasets. These limitations highlight the need for a more robust, data-driven approach capable of modeling the complex spatiotemporal dependencies in settlement behavior, a gap our Transformer-based framework aims to address by leveraging self-attention mechanisms for enhanced feature extraction and early warning performance.
Transformer
Since its introduction in 2017, the Transformer model has achieved significant results, greatly enhancing computational efficiency and parallel processing capabilities through its self-attention mechanism. It has seen notable success in both NLP and CV. Recently, Transformer models have also been applied to fault early warning systems. Yan et al.17 proposed a hybrid prediction and early warning model based on time series analysis, utilizing adaptive normalization, GRU, EEMD, and the Optuna framework for optimization. This approach achieved highly accurate predictions and early warnings, significantly improving the early warning capabilities and predictive performance of smart operations. Song et al.18 introduced a model that integrates LSTM and Self-Attention which, despite improving the monitoring and prediction accuracy of financial market risks by capturing long-term correlations and trends, still relied on traditional time series techniques for feature extraction, resulting in blurred features. Li et al.19 proposed a Transformer-based prediction framework that accurately captures time series features through multi-layer encoders and attention modules. Data validation demonstrated that this framework outperformed the LSTM framework, reducing prediction errors by 50%. However, the model still struggled to dynamically capture long-term dependencies in time series data, affecting accurate diagnosis and early warning. Chen et al.20 developed a framework named TCN-Transformer. To overcome the limitations of RNNs, this framework combines Temporal Convolutional Networks (TCNs) and Transformers to extract both local and global features. Additionally, a loss function was designed to ensure classification performance while focusing more on early features.
Attention mechanism
The Attention Mechanism plays a crucial role in the field of deep learning, especially in processing time series data21 and natural language processing22. This mechanism enhances the efficiency and effectiveness of information processing by allowing models to focus on the most important parts when handling information.
Attention mechanisms based on Convolutional Neural Networks (CNNs) are typically used for image and video processing but can also be applied to time series data23. This type of attention mechanism is usually implemented by adding a weight layer that weights the importance of different parts of the input data. Xuan Y. et al.24 improved Long Short-Term Memory networks (MCA-LSTM) with a multi-channel attention mechanism, which constructs memory channels through weighted attention to enhance cross-channel feature interaction. This approach allows wind farm faults to be detected approximately 10 hours earlier than traditional records. Zhou et al.25 proposed several models for time series prediction and classification, including LSTM with an autoencoder and temporal attention mechanism, and TCN models with a temporal attention mechanism26. Chang et al.27 introduced the Channel Attention Mechanism to enhance prediction accuracy and stability in time series forecasting. This mechanism captures key features and contextual information in the data, automatically adjusting the weights of feature channels to focus on the impact of critical features.
Self-attention is currently one of the most popular attention models, especially in the fields of time series and natural language processing. This mechanism does not rely on traditional sequence alignment but instead calculates the relationships between different parts of the sequence to determine the focus area. This allows the model to handle longer sequences and achieve a deeper understanding of context. The Transformer model proposed by Vaswani et al.28 is a significant milestone in the application of the Attention Mechanism. Its core idea is to use the Masked Multi-head Self-Attention Mechanism to capture dependencies between any two positions in a sequence29. Due to CNNs’ limitations in establishing dependencies over long sequences, they face challenges in processing hyperspectral sequence features. To overcome these limitations, inspired by the Transformer model, Peng et al.30 proposed a Cross-Attention Spatial-Spectral Transformer (CASST) method.
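As a concrete illustration, the scaled dot-product self-attention at the heart of this mechanism can be sketched in a few lines of NumPy; the projection matrices below are random stand-ins for learned weights, not values from any trained model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) pairwise dependencies
    A = softmax(scores, axis=-1)             # each row is a distribution over positions
    return A @ V, A

rng = np.random.default_rng(0)
T, d = 6, 8
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

Each row of the attention matrix A weights every position in the sequence, which is what lets the model relate any two time steps directly regardless of their distance.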
Methods
To address the vague features of existing high-speed railway settlement data and the difficulty of capturing long-term dependencies, this paper proposes a Settlement Early Warning Method for High-Speed Railway Subgrades Based on the TD Transformer. We utilize Temporal-Spatial Enhanced Attention (TSEA) and Dynamic Global Temporal Attention (DGTA) to construct feature representations for settlement early warning. First, we use the Transformer model to predict the settlement conditions of high-speed railway subgrades, as shown in Fig. 1; the Transformer model is well-suited to handling time series data. Next, we apply TSEA to the input part of the model, obtaining weighted representations of sensor values based on their attention scores. These learned attention scores indicate the contribution of each settlement influencing factor to the feature representations used in subsequent layers, enhancing the model’s feature extraction capability. Finally, DGTA aggregates information from all time steps in the sequence through the attention mechanism, generating a dynamic global representation. This approach is highly effective in capturing the overall long-term dependencies within the entire time series. Detailed discussions on TSEA and DGTA are provided in the following sections.
Transformer
Compared to traditional time series processing models, the Transformer excels in handling more complex features, including nonlinear and hierarchical characteristics, largely due to its multi-head self-attention mechanism. This mechanism allows the model to simultaneously focus on different parts of the input sequence, thereby enhancing its ability to capture intricate relationships within the sequence.
The original Transformer model we use consists of three parts: input, encoder, and output. Firstly, the input part of the Transformer is the result of adding the input embedding and the positional embedding. Secondly, the Transformer’s encoder primarily employs Multi-Headed Self-Attention (MHSA), which assigns a weight to each position in the input sequence and then uses these weighted positional vectors as the output for the next part. Finally, the output part of the Transformer passes the feature information processed by the encoder to the softmax layer for final classification.
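The positional embedding added to the input embedding can be sketched as the sinusoidal encoding of the original Transformer; a minimal NumPy version follows, with the sequence length and model dimension chosen purely for illustration:

```python
import numpy as np

def positional_encoding(T, d):
    """Sinusoidal positional embedding as in the original Transformer:
    sin on even dimensions, cos on odd dimensions."""
    pos = np.arange(T)[:, None]                      # (T, 1) positions
    i = np.arange(d)[None, :]                        # (1, d) dimension indices
    angles = pos / np.power(10000, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

rng = np.random.default_rng(1)
emb = rng.standard_normal((96, 32))                  # stand-in input embedding
encoder_input = emb + positional_encoding(96, 32)    # sum fed to the encoder
```

The sum injects absolute position information, since self-attention itself is order-agnostic.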
TSEA
TSEA (Temporal-Spatial Enhanced Attention) is a convolution-based attention mechanism. In neural networks, attention mechanisms allow the model to focus effectively on the most important parts of the input data. For time series data such as settlement factors, this helps the original Transformer model identify and emphasize the most informative time steps or influencing factors for the final task. TSEA is implemented through the following steps, as shown in Fig. 2. First, dilated convolution is used to extract features, capturing long-term dependencies in the time series data. Next, a 1×1 convolution aggregates the features into a single channel, reducing dimensionality and preparing for the calculation of attention scores. Then, separable convolution further processes the features, reducing the number of parameters and optimizing the computation of attention weights. Finally, the Softmax function computes the attention scores, allowing the model to assign different importance to various features or time steps. This process produces a weighted version of the input in which the important information in each sensor’s data is highlighted through learned features. TSEA is therefore not just a conventional convolutional layer; it also integrates an attention mechanism to enhance the model’s feature extraction capability, providing more salient feature information.
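The steps above can be sketched in NumPy as follows. This is an illustration of the described pipeline (dilated convolution, 1×1 channel aggregation, a depthwise stage standing in for separable convolution, then Softmax over time), not the paper’s exact implementation; all weights are random stand-ins for learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_conv1d(x, w, dilation):
    """'Same'-padded dilated 1-D convolution per channel. x: (C, T), w: (C, k)."""
    C, T = x.shape
    k = w.shape[1]
    span = (k - 1) * dilation
    xp = np.pad(x, ((0, 0), (span // 2, span - span // 2)))
    out = np.zeros((C, T))
    for j in range(k):
        out += w[:, j:j + 1] * xp[:, j * dilation : j * dilation + T]
    return out

def tsea(x, rng):
    """Sketch of TSEA: dilated conv -> 1x1 channel aggregation ->
    depthwise stage (separable conv) -> softmax scores -> weighted input."""
    C, T = x.shape
    h = dilated_conv1d(x, rng.standard_normal((C, 3)), dilation=2)
    h = rng.standard_normal((1, C)) @ h                         # 1x1 conv to one channel
    h = dilated_conv1d(h, rng.standard_normal((1, 3)), dilation=1)  # depthwise stage
    scores = softmax(h, axis=-1)                                # attention over time steps
    return x * scores, scores                                   # weighted sensor sequence

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 50))   # 4 sensors, 50 time steps
y, s = tsea(x, rng)
```

The output keeps the input’s shape, with each time step rescaled by its learned importance, so it can feed the Transformer input unchanged.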
DGTA
We utilize DGTA to extract feature representations at each time point and evaluate their relative importance in predicting the subsidence warning level within a given time window, as well as capturing their long-term dependencies. Using the attention scores obtained from Eq. (1), we compute a dynamically weighted average feature representation that encompasses information from all time points within the observation window. This representation is achieved by dynamically adjusting the attention scores through learnable scaling factors, which enhances the Transformer’s expressive power, particularly in handling subsidence data with long-term dependencies. This weighted average feature vector is subsequently used by the feed forward network layers to determine the current subsidence status.
The parameters \(W_{g a}\) and \(b_{g a}\) in Eq. (1) are learned during training and are used to extract hidden representations from each vector \(s^{\left( t_i\right) }\) generated by the self-attention module. The parameter \(g_s\) in Eq. (2) captures temporal context and assists in computing attention scores during learning. The factor s is a learnable dynamic scaling factor that adjusts the overall attention scores, allowing the model to adapt more flexibly to the classification task of settlement factors. Finally, in Eq. (3), a weighted sum is computed based on the relative importance of each time step, producing a weighted aggregate that serves as the input feature vector for the subsequent fully connected layers.
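Since Eqs. (1)–(3) are not reproduced in this excerpt, the NumPy sketch below shows one standard attention-pooling reading consistent with the description: a hidden layer parameterized by \(W_{g a}\) and \(b_{g a}\), scores computed with \(g_s\) and scaled by the learnable factor s, and a softmax-weighted sum over time steps. The exact functional forms (e.g. the tanh nonlinearity) are assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dgta(S, W_ga, b_ga, g_s, s):
    """Sketch of DGTA under assumed equation forms:
    h_t = tanh(W_ga @ s_t + b_ga)     (Eq. 1, assumed)
    e_t = s * (g_s . h_t)             (Eq. 2, learnable scale s)
    c   = sum_t softmax(e)_t * s_t    (Eq. 3, weighted aggregate)
    S: (T, d) outputs of the self-attention module."""
    H = np.tanh(S @ W_ga.T + b_ga)    # (T, h) hidden representations
    e = s * (H @ g_s)                 # (T,) dynamically scaled scores
    a = softmax(e)                    # attention over time steps
    return a @ S, a                   # (d,) dynamic global representation

rng = np.random.default_rng(0)
T, d, h = 12, 16, 8
S = rng.standard_normal((T, d))
c, a = dgta(S, rng.standard_normal((h, d)), rng.standard_normal(h),
            rng.standard_normal(h), 1.5)
```

The pooled vector c is the weighted average feature representation passed to the feed-forward layers for classification.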
Experiment
Data preparation
The data used in this paper were collected from subgrade settlement monitoring of Xi’an Metro Line 1 between Qin Du Station and Bao Quan Road Station. This subway requires tunneling beneath two railway lines, the Xu Lan High-Speed Railway and the Long Hai Railway. We studied the settlement data of these two real datasets from July 2022 to March 2025, covering a monitoring period of 1005 days with an hourly acquisition cycle, totaling 24120 data points. The training set consists of 14472 data points, the test set of 4824 data points, and the validation set of 4824 data points.
Monitoring data were collected through IoT smart remote-control terminals and transmitted wirelessly in real time to the monitoring data platform. Initial deformation values were transmitted to the platform via acquisition software, and subsequent monitoring values were collected in real time. Because the dataset has strong continuity, polynomial interpolation is used to infer reasonable missing values from existing data, thereby reducing information loss. Table 1 presents the high-speed railway subgrade settlement warning levels based on settlement distance and settlement velocity.
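The missing-value treatment can be sketched as follows: fitting a low-order polynomial to the observed points and evaluating it at the gaps is one standard form of polynomial interpolation. The polynomial order and sample series below are assumptions for illustration, not values from the paper:

```python
import numpy as np

def fill_missing_poly(y, order=2):
    """Fill NaN gaps by fitting a low-order polynomial to the observed
    points and evaluating it at the missing time indices."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    observed = ~np.isnan(y)
    coeffs = np.polyfit(t[observed], y[observed], order)  # least-squares fit
    filled = y.copy()
    filled[~observed] = np.polyval(coeffs, t[~observed])  # infer the gaps
    return filled

series = [0.0, 0.9, np.nan, 4.2, 7.8, np.nan, 17.5]  # hypothetical settlement (mm)
filled = fill_missing_poly(series)
```

Observed values are left untouched; only the gaps are inferred, which limits the information loss the paper mentions.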
Based on long-term industry background verification, the settlement warning level for high-speed railway subgrades is determined by two main factors: settlement distance and settlement velocity. Table 2 presents partial settlement data for the Xu Lan high-speed railway, including settlement distance, distance warning level, settlement velocity, velocity warning level, and the comprehensive warning level. The comprehensive warning level is the final result of settlement warning for high-speed railway subgrades. In Table 2, the settlement distance and settlement velocity are monitored through IoT smart remote-control terminals. The distance and velocity warning levels are obtained from the thresholds in Table 1, and the comprehensive warning level is labeled by engineers in actual scenarios.
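The mapping from monitored values to warning levels can be sketched as a threshold lookup per indicator followed by a weighted combination. The thresholds and equal weights below are purely illustrative, since Table 1’s actual values are not reproduced in this excerpt:

```python
def indicator_level(value, thresholds):
    """Map a monitored value to a warning level (0..len(thresholds)) via
    ascending thresholds. Threshold values are hypothetical, not Table 1's."""
    level = 0
    for t in thresholds:
        if value >= t:
            level += 1
    return level

def comprehensive_level(distance_mm, velocity_mm_per_h,
                        dist_thresholds=(5, 10, 20),
                        vel_thresholds=(0.5, 1.0, 2.0),
                        w_dist=0.5, w_vel=0.5):
    """Weighted combination of the two indicator levels (weights assumed)."""
    ld = indicator_level(distance_mm, dist_thresholds)
    lv = indicator_level(velocity_mm_per_h, vel_thresholds)
    return round(w_dist * ld + w_vel * lv)

lvl = comprehensive_level(12.0, 1.4)  # both indicators at level 2 here
```

In practice the labeled comprehensive level in Table 2 comes from engineers, so a learned model replaces this hand-set rule; the sketch only shows how the two indicators combine.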
When using the TD Transformer model for the warning task, the data in Table 2 serve as the training set. The input features primarily consist of settlement distance values, settlement velocity, distance warning level, velocity warning level, and comprehensive warning level. The model outputs the predicted settlement distance values, settlement velocity, and the associated comprehensive warning levels. The common evaluation metrics are Accuracy, Precision, Recall, and F1-Score, which assess the model’s performance from different perspectives:
Accuracy: Represents the proportion of correctly predicted samples out of the total number of samples. High accuracy indicates that the model performs well overall; however, it may be biased in datasets with class imbalances.
Precision: Measures the accuracy of the model in predicting positive samples, i.e., the proportion of true positive samples among all samples predicted as positive.
Recall: Measures the model’s ability to identify positive samples, i.e., the proportion of true positive samples correctly predicted among all actual positive samples. High recall indicates that the model has few false negatives.
F1-Score: The harmonic mean of Precision and Recall; it accounts for both false positives and false negatives. A high F1-Score indicates that the model performs more reliably when dealing with class imbalance.
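For the multi-class warning levels, the four metrics can be computed as below; macro averaging across classes is an assumption here, as the paper does not state its averaging scheme:

```python
import numpy as np

def warning_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged Precision, Recall, and F1 for
    multi-class warning levels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = (y_true == y_pred).mean()
    ps, rs, fs = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        ps.append(p); rs.append(r); fs.append(f)
    return acc, np.mean(ps), np.mean(rs), np.mean(fs)

# Hypothetical labels over three warning levels
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, prec, rec, f1 = warning_metrics(y_true, y_pred, 3)
```

Per-class counts keep each warning level’s contribution visible, which matters when severe levels are rare in the data.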
Ablation studies
Result of TSEA
To validate the effectiveness of the proposed method, we conducted ablation experiments. By comparing TSEA and DGTA with other attention mechanisms, we obtained results that further demonstrate the superiority of our attention mechanisms. As shown in Table 3, we compared TSEA with other CNN-based attention mechanisms. The results indicate that TSEA effectively extracts temporal and spatial features from ground subsidence data, enhancing the model’s ability to capture subsidence changes.
Figure 3 demonstrates that all enhanced Transformer models achieve better training performance compared to the baseline. The Transformer + TSEA model stands out with the fastest convergence and the highest training accuracy, suggesting that the integration of Temporal-Spatial Enhanced Attention significantly improves the model’s learning efficiency. Although the Transformer + Channel Attention and Transformer + TCN models also show substantial reductions in training loss, their accuracy improvements are more gradual. The baseline Transformer, in contrast, converges more slowly and exhibits greater fluctuation in loss. Throughout training, none of the models show obvious signs of overfitting, as indicated by the steady increase in accuracy and consistent decline in loss. Among them, the TSEA-based model maintains strong performance and stability, indicating promising generalization capability.
Result of DGTA
In the experimental results shown in Table 4, we compared DGTA with other self-attention-based mechanisms such as Cross-Attention and Masked Multi-head Self-Attention (MMSA), verifying the performance improvement of the model with the introduction of DGTA. DGTA uses a dynamically weighted average feature representation that effectively captures the long-term dependencies in time series data, enhancing the model’s stability and accuracy in handling complex subsidence patterns. By dynamically adjusting the attention scores through learned parameters, the model achieves greater accuracy in predicting subsidence warning levels.
The results in Table 4 demonstrate the effectiveness of the proposed TD Transformer-based subsidence early warning method for high-speed railway embankments. The Transformer model performs well across the metrics, indicating high predictive accuracy and stability in processing and analyzing ground subsidence data. Figure 4 presents the training accuracy and loss of the baseline Transformer and its improved variants. All enhanced models converge faster than the baseline, with Transformer + DGTA achieving the highest accuracy and the lowest, most stable loss throughout training. This suggests that the Dynamic Global Temporal Attention (DGTA) mechanism not only accelerates learning but also enhances model robustness. The Transformer + Cross-Attention and Transformer + MMSA models also contribute performance gains, showing clear reductions in training loss, though their accuracy improvements are more gradual. Importantly, none of the models exhibit signs of overfitting: the loss continues to decline steadily and no sharp fluctuations appear in later epochs, indicating good generalization behavior during training.
Comprehensive experiment
Table 5 shows that the TSEA mechanism effectively resolves feature blurring in high-speed railway subsidence data extraction. The Transformer + TSEA model achieves 93.10% accuracy, 92.85% precision, 93.05% recall, and 92.97% F1-score, with 87.1M parameters and 720 samples/s inference speed. Compared to the baseline Transformer, TSEA improves performance with only a 0.8% parameter increase and a minimal 7% speed reduction. The DGTA module further enhances long-term dependency modeling, boosting accuracy to 93.20%, precision to 93.00%, recall to 93.10%, and F1-score to 93.05%, while using 89.3M parameters and maintaining 650 samples/s throughput. This demonstrates that DGTA’s added complexity is justified by its performance gains. Our TD Transformer, combining both mechanisms, achieves the best results: 93.39% accuracy, 93.10% precision, 93.40% recall, and 93.24% F1-score. With 90.2M parameters and 520 samples/s speed, it balances computational cost and accuracy, improving over the baseline by 1.24 percentage points while maintaining efficient inference. These results confirm that TSEA and DGTA synergistically enhance the Transformer for subsidence early warning, with accuracy gains outweighing computational overhead. The TD Transformer’s optimal performance makes it suitable for real-world deployment in railway monitoring systems.
Table 6 presents the ablation results on the Long Hai Railway dataset, demonstrating the effectiveness of the proposed TSEA and DGTA modules. The baseline Transformer achieves 91.67% accuracy, 91.20% precision, 91.50% recall, and a 91.35% F1-score with 86.4M parameters while processing 810 samples per second. Integrating the TSEA mechanism improves performance to 92.48% accuracy, 92.10% precision, 92.30% recall, and a 92.20% F1-score using 87.1M parameters at 690 samples per second, confirming its ability to refine local spatial and temporal representations in noisy railway subsidence data. Adding the DGTA module further raises results to 92.95% accuracy, 92.60% precision, 92.80% recall, and a 92.70% F1-score with 89.3M parameters running at 620 samples per second, demonstrating superior capability in capturing long-term temporal dependencies for subsidence trend analysis. The complete TD Transformer, combining both modules, achieves the best performance with 93.35% accuracy, 93.10% precision, 93.20% recall, and a 93.10% F1-score using 90.2M parameters while maintaining efficient inference at 570 samples per second, establishing an optimal balance between feature refinement, temporal modeling, and computational efficiency for railway subsidence prediction.
Sequential experiment
As shown in Table 7, the experimental results indicate that applying TSEA before DGTA yields better evaluation metrics. Starting with TSEA may better capture dependencies and structural information, providing more detailed features that then enhance DGTA’s effectiveness in dynamically capturing the characteristics of subsidence data. This sequential synergy optimizes overall model performance.
Experimental result
This study establishes a high-speed railway subsidence early warning model using two indicators: settlement distance and settlement velocity. These indicators are jointly used to determine the subsidence warning level. The warning level for each indicator is calculated based on its warning range, and the overall warning level is computed by combining the weights and warning levels of each indicator. The experimental results of the TD Transformer algorithm are shown in Table 8.
Table 8 reveals significant uncertainty in high-speed railway subsidence and its rate of change. The data indicate that regions with larger settlement values also correspond to higher settlement velocities, suggesting more pronounced subsidence changes in these areas. This is fully reflected in the subsidence early warning mechanism. The TD Transformer model not only effectively captures both short-term and long-term subsidence changes but also weights features based on the importance of different sensor data, thereby enhancing its ability to warn about subsidence conditions. Table 9 presents the experimental results of SVM, XGBoost, RF, Transformer, and TD Transformer across four evaluation metrics: Accuracy, Precision, Recall, and F1-Score.
As observed from Table 9 and Fig. 5, the evaluation metrics of the TD Transformer surpass those of the traditional Transformer. The TD Transformer achieves an accuracy of 93.39%, a 1.24% improvement over the Transformer. Its precision is 93.10%, a 1.3% increase; recall is 93.40%, also a 1.3% increase; and the F1-score is 93.24%, a 1.27% improvement. These results indicate that the TD Transformer excels in all aspects, particularly in precision and F1-score, demonstrating its advantages in this task and its significant superiority over other methods in handling the high-speed railway subsidence early warning task.
Conclusion
High-speed railway subgrade settlement is crucial to the safety of high-speed trains. Existing monitoring data features are often vague, and long-term dependencies are difficult to capture, increasing the difficulty of early warning. To address these issues, this paper proposes a Settlement Early Warning Method for High-Speed Railway Subgrades Based on the TD Transformer. This method first employs Temporal-Spatial Enhanced Attention (TSEA) for feature extraction, effectively improving the model’s feature extraction capability and solving the problem of feature vagueness. Second, Dynamic Global Temporal Attention (DGTA) is used to dynamically capture the long-term dependencies of the settlement data. Experimental results show that the TD Transformer outperforms traditional high-speed railway subgrade settlement early warning methods on all metrics, demonstrating significant advantages in Accuracy, Precision, Recall, and F1-Score. However, despite its strong performance, the TD Transformer has some limitations. First, its high computational complexity leads to a significant increase in hardware resource requirements, limiting its application in resource-constrained environments. Second, the model’s early warning effectiveness may be affected by unforeseen factors such as extreme weather or sudden events: extreme weather can cause rapid changes in subgrade settlement, and sudden events can trigger abnormal settlement, reducing the accuracy of short-term warnings. Future research should focus on optimizing the model’s computational efficiency and incorporating more external factors to enhance the robustness of the early warning system, thereby coping more effectively with unforeseen settlement changes.
Data availability
Due to concerns related to privacy information, project confidentiality, data security, etc., the dataset generated and/or analyzed during the current research period is not publicly available, but can be obtained from the corresponding author upon reasonable request.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 42072319) and in part by the Key Research and Development Program of Shaanxi Province (Grant No. 2023-YBSF-487).
Author information
Authors and Affiliations
Contributions
W.K. and L.Q. wrote the main manuscript text. W.K. conceived and conducted the experiments. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Kebing, W., Qinghuai, L. Settlement early warning method for high speed railway subgrades based on TD Transformer. Sci Rep 15, 19746 (2025). https://doi.org/10.1038/s41598-025-05067-0