Introduction

With the acceleration of urbanization and the growing demand for transportation, rail transit systems such as subways and high-speed railways have become integral components of modern urban transportation networks. In the context of urban planning and rail transit construction, subways tunneling under high-speed railway subgrades are becoming increasingly common. Because subway tunnel construction requires extensive underground excavation and support, the stability of high-speed railway subgrades can be compromised, leading to settlement or even structural deformation. The stability of high-speed railway subgrades is crucial for the safe operation of high-speed trains: uneven subgrade settlement can cause track deformation, affecting the safety of train travel and potentially leading to serious incidents such as derailments. Therefore, when a subway is constructed beneath a high-speed railway subgrade, real-time monitoring of subgrade settlement and timely warning of potential risks are key measures to ensure the safe operation of the high-speed railway.

Currently, high-speed railway subgrade settlement monitoring typically relies on a high-precision sensor network deployed on the subgrade. These sensors collect real-time data on settlement, strain, and tilt1. Effectively processing and analyzing this vast amount of sequential data, and providing timely warnings, has become a critical research focus. With the development of big data technology and artificial intelligence, sequence data analysis methods based on machine learning2 and deep learning3 offer new solutions for high-speed railway subgrade settlement early warning. For new railway projects built near an operating high-speed railway, it is crucial to assess the potential impact of additional settlement and deformation on the existing infrastructure during the design phase, and effective mitigation measures must be implemented during both the design and construction phases. Liang et al.4 proposed a study on high-speed railway subgrade settlement early warning mechanisms based on artificial neural networks (ANNs), utilizing ANNs to address challenges related to subgrade settlement prevention.

However, most existing methods do not take into account the unique characteristics of high-speed railway subgrade settlement data, namely:

1. Blurred Features of High-Speed Railway Subgrade Settlement Data: Settlement data for similar high-speed railway subgrades come from a variety of sensors, and their intrinsic characteristics are not pronounced, making effective feature extraction difficult.

2. Long-Term Dependency in Settlement Sequence Data: Long-term dependency refers to the phenomenon where the current value in a sequence not only depends on recent data points but also has significant associations with data points further in the past. Capturing these long-term dependencies is crucial for accurate prediction and early warning of high-speed railway subgrade settlement.

In recent years, Transformers have gained increasing recognition, attracting many prominent scholars to the field of Transformer research5. Transformers are now widely applied across various domains, including data mining6, time series prediction7, and other related fields, achieving significant results8.

To address the aforementioned issues, this paper takes surface settlement data collected during shield construction as its basis and, in combination with the actual engineering situation, proposes a Transformer-based high-speed railway subgrade settlement early warning method, the TD Transformer. This method incorporates Temporal-Spatial Enhanced Attention (TSEA) and Dynamic Global Temporal Attention (DGTA) to effectively extract key features of settlement data and dynamically capture long-term dependencies. The effectiveness of the method is validated on a practical shield tunneling project in which Xi’an Metro Line 1 tunnels beneath the Xu Lan High-Speed Railway.

Related work

Settlement warning method

Settlement early warning for high-speed railway subgrades is a crucial research direction for ensuring the safe operation and structural stability of high-speed railways. With the rapid expansion of the high-speed rail network, effectively monitoring and predicting settlement conditions has become a focus for both academia and the engineering community. Existing high-speed railway settlement methods mainly include surface monitoring9, underground monitoring10, numerical simulation11, and intelligent early warning12. Although the first three are grounded in soil mechanics theory, they typically depend on accurate parameter calibration and encounter difficulties under complex geological conditions. As science and technology advance, intelligent early warning systems have become mainstream. These systems combine big data and artificial intelligence technologies, using machine learning algorithms to analyze and process monitoring data and establish high-speed railway settlement early warning models. He et al.13 proposed a comprehensive risk assessment and early warning method. By integrating expert consultation, finite element model analysis, a combination of quantitative and qualitative risk assessment, and the ARIMA method for deformation prediction, they established a settlement early warning system for the underground passage of Cuiling Road beneath the Beijing-Tianjin Intercity Railway. The system effectively reduces risks and ensures structural safety, showing potential for application in similar projects. Jehanzaib et al.14, while achieving certain results using Decision Tree (DT), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM) classifiers for prediction, faced shortcomings in feature extraction: they neglected the temporal dependency information in time series data and failed to fully exploit its temporal dynamics.
Cai et al.15 used XGBoost to construct a prediction model. Although experiments showed the method’s effectiveness in early warning, its feature extraction produced blurred features, making it difficult to clearly reflect important patterns in the data. Zhao et al.16 designed a hazard source identification and early warning system based on the RF algorithm, effectively addressing the large errors and long response times of traditional systems. Simulation results showed that the average identification error of the system was only 4.1%, and the early warning response time could be controlled within 9 s. However, the system failed to capture long-term dependencies in the data, limiting its ability to fully utilize historical information for prediction. More recently, machine learning methods such as support vector machines (SVMs) and random forests have improved prediction accuracy, yet they often lack interpretability and may underperform on small datasets. These limitations highlight the need for a more robust, data-driven approach capable of modeling the complex spatiotemporal dependencies in settlement behavior, a gap our Transformer-based framework aims to address by leveraging self-attention mechanisms for enhanced feature extraction and early warning performance.

Transformer

Since its introduction in 2017, the Transformer model has achieved significant results, greatly enhancing computational efficiency and parallel processing capability through its self-attention mechanism. It has seen notable success in both NLP and CV, and Transformer models have recently been applied to fault early warning systems. Yan et al.17 proposed a hybrid prediction and early warning model based on time series analysis, utilizing adaptive normalization, GRU, EEMD, and the Optuna framework for optimization. This approach achieved highly accurate predictions and early warnings, significantly improving the early warning capability and predictive performance of smart operations. Song et al.18 introduced a model that integrates LSTM and Self-Attention; although it improved the monitoring and prediction accuracy of financial market risks by capturing long-term correlations and trends, it still relied on traditional time series techniques for feature extraction, resulting in blurred features. Li et al.19 proposed a Transformer-based prediction framework that accurately captures time series features through multi-layer encoders and attention modules. Data validation demonstrated that this framework outperformed an LSTM framework, reducing prediction errors by 50%. However, the model still struggled to dynamically capture long-term dependencies in time series data, affecting accurate diagnosis and early warning. Chen et al.20 developed a framework named TCN-Transformer. To overcome the limitations of RNNs, it combines Temporal Convolutional Networks (TCNs) and Transformers to extract both local and global features. Additionally, a loss function was designed to ensure classification performance while focusing more on early features.

Attention mechanism

The Attention Mechanism plays a crucial role in the field of deep learning, especially in processing time series data21 and natural language processing22. This mechanism enhances the efficiency and effectiveness of information processing by allowing models to focus on the most important parts when handling information.

Attention mechanisms based on Convolutional Neural Networks (CNNs) are typically used for image and video processing but can also be applied to time series data23. This type of attention mechanism is usually realized by adding a weight layer that weights the importance of different parts of the input data. Xuan et al.24 improved Long Short-Term Memory networks (MCA-LSTM) by employing a multi-channel attention mechanism, which constructs memory channels through weighted attention to enhance cross-channel feature interaction. This approach allows wind farm faults to be detected approximately 10 hours earlier than traditional records. Zhou et al.25 proposed several models for time series prediction and classification, including an LSTM with an autoencoder and temporal attention mechanism, and TCN models with a temporal attention mechanism26. Chang et al.27 introduced the Channel Attention Mechanism to enhance prediction accuracy and stability in time series forecasting. This mechanism captures key features and contextual information in the data, automatically adjusting the weights of feature channels to focus on the impact of critical features.

Self-attention is currently one of the most popular attention models, especially in time series and natural language processing. This mechanism does not rely on traditional sequence alignment but instead calculates the relationships between different parts of the sequence to determine the focus area, allowing the model to handle longer sequences and achieve a deeper understanding of context. The Transformer model proposed by Vaswani et al.28 is a significant milestone in the application of the Attention Mechanism. Its core idea is to use the Masked Multi-head Self-Attention Mechanism to capture dependencies between any two positions in a sequence29. Because CNNs are limited in establishing dependencies over long sequences, they face challenges in processing hyperspectral sequence features. To overcome these limitations, inspired by the Transformer model, Peng et al.30 proposed a Cross-Attention Spatial-Spectral Transformer (CASST) method.

Methods

To address the vague features of existing high-speed railway settlement data and the difficulty of capturing long-term dependencies, this paper proposes a Settlement Early Warning Method for High-Speed Railway Subgrades Based on TD Transformer. We utilize Temporal-Spatial Enhanced Attention (TSEA) and Dynamic Global Temporal Attention (DGTA) to construct feature representations for settlement early warning. First, we use the Transformer model to predict the settlement conditions of high-speed railway subgrades (as shown in Fig. 1); the Transformer model is well-suited to time series data. Next, we apply TSEA to the input part of the model, obtaining weighted representations of sensor values based on their attention scores. These learned attention scores indicate the contribution of each settlement influencing factor to the feature representations used in subsequent layers, enhancing the model’s feature extraction capability. Finally, DGTA aggregates information from all time steps in the sequence through the attention mechanism, generating a dynamic global representation. This approach is highly effective at capturing long-term dependencies across the entire time series. Detailed discussions of TSEA and DGTA are provided in the following sections.

Fig. 1
figure 1

TD transformer network structure.

Transformer

Compared to traditional time series processing models, the Transformer excels in handling more complex features, including nonlinear and hierarchical characteristics, largely due to its multi-head self-attention mechanism. This mechanism allows the model to simultaneously focus on different parts of the input sequence, thereby enhancing its ability to capture intricate relationships within the sequence.

The original Transformer model we use consists of three parts: input, encoder, and output. Firstly, the input part of the Transformer is the result of adding the input embedding and the positional embedding. Secondly, the Transformer’s encoder primarily employs Multi-Headed Self-Attention (MHSA), which assigns a weight to each position in the input sequence and then uses these weighted positional vectors as the output for the next part. Finally, the output part of the Transformer passes the feature information processed by the encoder to the softmax layer for final classification.
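The input construction described above (input embedding plus positional embedding) can be illustrated with the standard sinusoidal positional encoding from Vaswani et al.28; the sequence length and model dimension below are arbitrary, and the random embedding is a stand-in since the paper does not specify its embedding scheme:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions: cosine
    return pe

# Transformer input = input embedding + positional embedding
seq_len, d_model = 48, 64                              # hypothetical sizes
embeddings = np.random.randn(seq_len, d_model)         # e.g. projected sensor readings
model_input = embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

The encoder's multi-headed self-attention and the final softmax classification layer then operate on `model_input` as described above.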

TSEA

TSEA (Temporal-Spatial Enhanced Attention) is a convolution-based attention mechanism. In neural networks, attention mechanisms allow the model to focus on the most important parts of the input data. For time series data such as settlement factors, this helps the original Transformer model identify and emphasize the most informative time steps or influencing factors for the final task. TSEA is implemented through the following steps, as shown in Fig. 2. First, dilated convolution is used to extract features, capturing long-term dependencies in the time series data. Next, a 1x1 convolution aggregates the features into a single channel, reducing dimensionality and preparing for the calculation of attention scores. Then, separable convolution further processes the features, reducing the number of parameters and optimizing the computation of attention weights. Finally, the Softmax function computes the attention scores, allowing the model to assign different importance to various features or time steps. This process produces a weighted version of the input, in which important information from each sensor’s data is highlighted through learned features. TSEA is therefore not just a conventional convolutional layer; it integrates an attention mechanism to enhance the model’s feature extraction capability and provide more salient feature information.
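The four TSEA steps can be sketched in plain NumPy as follows. This is only a minimal illustration: the weights are random stand-ins for learned parameters, the channel and sequence sizes are hypothetical, and the separable convolution is collapsed to the single-channel case.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_conv1d(x, w, dilation):
    """x: (channels, time); w: (out_ch, in_ch, k). 'Same' zero-padding."""
    out_ch, in_ch, k = w.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    out = np.zeros((out_ch, T))
    for t in range(T):
        taps = xp[:, t : t + dilation * (k - 1) + 1 : dilation]  # (in_ch, k)
        out[:, t] = np.tensordot(w, taps, axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(0)
C, T = 4, 32                                   # sensors (channels) x time steps
x = rng.normal(size=(C, T))                    # raw settlement-factor series

# 1) dilated convolution: widen the receptive field over time
h = np.tanh(dilated_conv1d(x, rng.normal(size=(8, C, 3)) * 0.1, dilation=2))
# 2) 1x1 convolution: aggregate the 8 feature maps into a single channel
h = np.tensordot(rng.normal(size=(1, 8)) * 0.1, h, axes=([1], [0]))   # (1, T)
# 3) separable convolution (here trivially single-channel) refines features
h = dilated_conv1d(h, rng.normal(size=(1, 1, 3)) * 0.1, dilation=1)
# 4) softmax over time -> attention scores, then reweight the input
scores = softmax(h, axis=-1)                   # (1, T), sums to 1 over time
weighted = x * scores                          # broadcast: emphasize key steps
```

In a real implementation these convolutions would be learned layers in a deep-learning framework, but the data flow (dilated conv, 1x1 aggregation, separable conv, softmax weighting) is the same.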

Fig. 2
figure 2

TSEA module network structure.

DGTA

We utilize DGTA to extract feature representations at each time point and evaluate their relative importance for predicting the subsidence warning level within a given time window, while capturing their long-term dependencies. Using the attention scores obtained from Eq. (2), we compute a dynamically weighted average feature representation that encompasses information from all time points within the observation window. The attention scores are dynamically adjusted through a learnable scaling factor, which enhances the Transformer’s expressive power, particularly for subsidence data with long-term dependencies. The resulting weighted average feature vector is then used by the feed-forward network layers to determine the current subsidence status.

$$\begin{aligned} & g^{\left( t_i\right) }=\tanh \left( W_{g a} \cdot s^{\left( t_i\right) }+b_{g a}\right) \end{aligned}$$
(1)
$$\begin{aligned} & a^{\left( t_i\right) }=\frac{\exp \left( s \cdot \left( g^{\left( t_i\right) }\right) ^T \cdot g_s\right) }{\sum _t \exp \left( s \cdot \left( g^{\left( t\right) }\right) ^T \cdot g_s\right) } \end{aligned}$$
(2)
$$\begin{aligned} & c_i=\sum _t a^{\left( t\right) } \cdot s^{\left( t\right) } \end{aligned}$$
(3)

The parameters \(W_{g a}\) and \(b_{g a}\) in Eq. (1) are learned during training and are used to extract hidden representations from each vector \(s^{\left( t_i\right) }\) generated by the self-attention module. The parameter \(g_s\) in Eq. (2) captures temporal context and assists in computing attention scores during learning. The factor s is a learnable dynamic scaling factor that adjusts the overall attention scores, allowing the model to adapt more flexibly to the classification of subsidence factors. Finally, in Eq. (3) a weighted sum is computed based on the relative importance of each time step, producing a weighted aggregate that serves as the input feature vector for the subsequent fully connected layers.
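Equations (1)–(3) can be reproduced directly in NumPy. The parameters below are randomly initialised stand-ins for the learned ones, and the window length and feature size are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 10, 16                       # time steps in the window, feature size
S = rng.normal(size=(T, d))         # s^(t): self-attention outputs per step

# Learnable parameters (random stand-ins for illustration)
W_ga = rng.normal(size=(d, d)) * 0.1
b_ga = np.zeros(d)
g_s = rng.normal(size=d)            # temporal-context vector
s_scale = 1.0                       # dynamic scaling factor s

# Eq. (1): hidden representation of each time step
G = np.tanh(S @ W_ga.T + b_ga)      # (T, d)

# Eq. (2): softmax attention scores over the window
logits = s_scale * (G @ g_s)        # (T,)
a = np.exp(logits - logits.max())
a = a / a.sum()

# Eq. (3): dynamically weighted average feature vector
c = (a[:, None] * S).sum(axis=0)    # (d,), input to the feed-forward layers
```

The vector `c` is the weighted aggregate passed to the subsequent fully connected layers.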

Experiment

Data preparation

The data used in this paper were collected from subgrade settlement monitoring along Xi’an Metro Line 1 between Qin Du Station and Bao Quan Road Station. This line tunnels beneath two railway lines: the Xu Lan High-Speed Railway and the Long Hai Railway. We studied the settlement data of these two real datasets from July 2022 to March 2025, a monitoring period of 1005 days with an hourly acquisition cycle, totaling 24120 data points. The training set consists of 14472 data points, the test set of 4824, and the validation set of 4824 (a 60/20/20 split).

Monitoring data were collected through IoT smart remote control terminals and transmitted wirelessly in real time to the monitoring data platform. Initial deformation values were uploaded to the platform via acquisition software, and subsequent monitoring values were collected in real time. Because the dataset is strongly continuous, polynomial interpolation is used to infer reasonable missing values from the existing data, reducing information loss. Table 1 presents the high-speed railway subgrade settlement warning levels based on settlement distance and settlement velocity.

Table 1 High-speed railway subgrades settlement warning levels.
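The polynomial interpolation step used to fill missing monitoring values can be sketched as follows. The global quadratic fit is an assumption for illustration: the paper does not state the polynomial degree or the fitting window, and a production pipeline might fit locally around each gap instead.

```python
import numpy as np

def fill_missing_poly(t, y, degree=2):
    """Fill NaN gaps by fitting a low-degree polynomial to observed points."""
    y = np.asarray(y, dtype=float)
    mask = np.isnan(y)
    if not mask.any():
        return y
    coeffs = np.polyfit(t[~mask], y[~mask], deg=degree)  # fit observed data
    y = y.copy()
    y[mask] = np.polyval(coeffs, t[mask])                # evaluate at the gaps
    return y

t = np.arange(8, dtype=float)
# Hypothetical hourly settlement series with two missing readings
y = np.array([0.0, 1.0, 4.0, np.nan, 16.0, 25.0, np.nan, 49.0])
filled = fill_missing_poly(t, y, degree=2)
```

Since the toy series above is exactly quadratic, the gaps at t = 3 and t = 6 are recovered as 9 and 36.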

Based on long-term industry verification, the settlement warning level for high-speed railway subgrades is determined by two main factors: settlement distance and settlement velocity. Table 2 presents partial settlement data for the Xu Lan High-Speed Railway, including settlement distance, distance warning level, settlement velocity, velocity warning level, and the comprehensive warning level. The comprehensive warning level is the final result of settlement warning for high-speed railway subgrades. In Table 2, the settlement distance and settlement velocity are monitored through IoT smart remote control terminals; the distance and velocity warning levels are obtained from the thresholds in Table 1; and the comprehensive warning level is labeled by engineers in actual scenarios.

Table 2 Settlement data in the Xu Lan high-speed railway.
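The mapping from the two indicators to per-indicator warning levels can be sketched as below. The threshold values are hypothetical placeholders (the real boundaries are those of Table 1), and taking the maximum of the two levels is only one common convention for combining them; in this paper the comprehensive level is labeled by engineers.

```python
def level_from_thresholds(value: float, thresholds) -> int:
    """Map a measurement to a warning level: 0 = normal, rising with severity.
    `thresholds` is an ascending list of level boundaries (hypothetical)."""
    level = 0
    for bound in thresholds:
        if value >= bound:
            level += 1
    return level

# Hypothetical boundaries -- the real ones come from Table 1 in the paper.
DIST_BOUNDS = [2.0, 5.0, 10.0]      # settlement distance, mm
VEL_BOUNDS = [0.5, 1.0, 2.0]        # settlement velocity, mm/day

def comprehensive_level(distance_mm: float, velocity_mm_day: float) -> int:
    """One simple convention: the stricter (higher) of the two indicator levels."""
    return max(level_from_thresholds(distance_mm, DIST_BOUNDS),
               level_from_thresholds(velocity_mm_day, VEL_BOUNDS))
```

For example, a 6 mm settlement at a negligible velocity would map to level 2 under these placeholder thresholds.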

When using the TD Transformer model for the warning task, the data in Table 2 serve as the model’s training set. The input features primarily consist of settlement distance values, settlement velocity, distance warning level, velocity warning level, and comprehensive warning level. The model outputs predicted settlement distance values, settlement velocity, and the associated comprehensive warning levels. The common evaluation metrics are Accuracy, Precision, Recall, and F1-Score, which assess the model’s performance from different perspectives:

Accuracy: Represents the proportion of correctly predicted samples out of the total number of samples. High accuracy indicates that the model performs well overall; however, it may be biased in datasets with class imbalances.

$$\begin{aligned} \text{ Accuracy } =\frac{T P+T N}{N} \end{aligned}$$
(4)

Precision: Measures the accuracy of the model in predicting positive samples, i.e., the proportion of true positive samples among all samples predicted as positive.

$$\begin{aligned} \text{ Precision } =\frac{T P}{T P+F P} \end{aligned}$$
(5)

Recall: Measures the model’s ability to identify positive samples, i.e., the proportion of true positive samples correctly predicted among all actual positive samples. High recall indicates that the model has few false negatives.

$$\begin{aligned} \text{ Recall } =\frac{T P}{T P+F N} \end{aligned}$$
(6)

F1-Score: The harmonic mean of Precision and Recall, it considers both false positives and false negatives. A high F1-Score indicates that the model performs more reliably when dealing with class imbalance.

$$\begin{aligned} F 1=\frac{2 \times \text{ Precision } \times \text{ Recall } }{ \text{ Precision } + \text{ Recall } } \end{aligned}$$
(7)
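Equations (4)–(7) can be computed from the confusion-matrix counts as follows. The binary formulation matches the equations above; the note on macro-averaging is an assumption about how they would extend to multiple warning levels.

```python
import numpy as np

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, Precision, Recall, F1 (Eqs. 4-7) for one positive class.
    For multi-class warning levels, compute per class and average (e.g. macro)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # true positives
    tn = np.sum((y_pred != positive) & (y_true != positive))  # true negatives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false positives
    fn = np.sum((y_pred != positive) & (y_true == positive))  # false negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

On this toy example, accuracy is 0.6 and precision, recall, and F1 are each 2/3.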

Ablation studies

Result of TSEA

To validate the effectiveness of the proposed method, we conducted ablation experiments. By comparing TSEA and DGTA with other attention mechanisms, we obtained results that further demonstrate the superiority of our attention mechanisms. As shown in Table 3, we compared TSEA with other CNN-based attention mechanisms. The results indicate that TSEA effectively extracts temporal and spatial features from ground subsidence data, enhancing the model’s ability to capture subsidence changes.

Table 3 Results of ablation experiments on the Xu Lan High Speed Railway dataset.
Fig. 3
figure 3

Comparison of model training accuracy and loss.

Figure 3 demonstrates that all enhanced Transformer models achieve better training performance compared to the baseline. The Transformer + TSEA model stands out with the fastest convergence and the highest training accuracy, suggesting that the integration of Temporal-Spatial Enhanced Attention significantly improves the model’s learning efficiency. Although the Transformer + Channel Attention and Transformer + TCN models also show substantial reductions in training loss, their accuracy improvements are more gradual. The baseline Transformer, in contrast, converges more slowly and exhibits greater fluctuation in loss. Throughout training, none of the models show obvious signs of overfitting, as indicated by the steady increase in accuracy and consistent decline in loss. Among them, the TSEA-based model maintains strong performance and stability, indicating promising generalization capability.

Result of DGTA

In the experimental results shown in Table 4, we compared DGTA with other self-attention-based mechanisms such as Cross-Attention and Masked Multi-head Self-Attention (MMSA), verifying the performance improvement of the model with the introduction of DGTA. DGTA uses a dynamically weighted average feature representation that effectively captures the long-term dependencies in time series data, enhancing the model’s stability and accuracy in handling complex subsidence patterns. By dynamically adjusting the attention scores through learned parameters, the model achieves greater accuracy in predicting subsidence warning levels.

Table 4 Results of ablation experiments on the Xu Lan High Speed Railway dataset.

The results in Table 4 demonstrate the effectiveness of the proposed TD Transformer-based subsidence early warning method for high-speed railway embankments. The Transformer model performs well across various metrics, indicating its high predictive accuracy and stability in processing and analyzing ground subsidence data. From the table, it is evident that the proposed method’s effectiveness has been fully validated. The Transformer model’s strong performance on all indicators shows its capability in accurately predicting and analyzing ground subsidence data. Figure 4 presents the training accuracy and loss of the baseline Transformer and its improved variants. All enhanced models demonstrate faster convergence compared to the baseline, with the Transformer + DGTA achieving the highest accuracy and the lowest, most stable loss throughout training. This suggests that the Dynamic Global Temporal Attention (DGTA) mechanism not only accelerates learning but also enhances model robustness. The Transformer + Cross-Attention and Transformer + MMSA models also contribute to performance gains, showing clear reductions in training loss, though their accuracy improvement is more gradual. Importantly, none of the models exhibit signs of overfitting, as the loss continues to decline steadily and no sharp fluctuations appear in later epochs, indicating good generalization behavior during training.

Fig. 4
figure 4

Comparison of model training accuracy and loss.

Comprehensive experiment

Table 5 shows that the TSEA mechanism effectively resolves feature blurring in high-speed railway subsidence data extraction. The Transformer + TSEA model achieves 93.10% accuracy, 92.85% precision, 93.05% recall, and 92.97% F1-score, with 87.1M parameters and 720 samples/s inference speed. Compared to the baseline Transformer, TSEA improves performance with only a 0.8% parameter increase and a minimal 7% speed reduction. The DGTA module further enhances long-term dependency modeling, boosting accuracy to 93.20%, precision to 93.00%, recall to 93.10%, and F1-score to 93.05%, while using 89.3M parameters and maintaining 650 samples/s throughput. This demonstrates that DGTA’s added complexity is justified by its performance gains. Our TD Transformer, combining both mechanisms, achieves the best results: 93.39% accuracy, 93.10% precision, 93.40% recall, and 93.24% F1-score. With 90.2M parameters and 520 samples/s speed, it balances computational cost and accuracy, improving over the baseline by 1.24 percentage points while maintaining efficient inference. These results confirm that TSEA and DGTA synergistically enhance the Transformer for subsidence early warning, with accuracy gains outweighing computational overhead. The TD Transformer’s optimal performance makes it suitable for real-world deployment in railway monitoring systems.

Table 5 Experimental results of ablation on the Xu Lan High Speed Railway dataset.
Table 6 Experimental results of ablation on the Long Hai Railway dataset.

Table 6 presents the ablation results on the Long Hai Railway dataset, demonstrating the effectiveness of the proposed TSEA and DGTA modules. The baseline Transformer achieves 91.67% accuracy, 91.20% precision, 91.50% recall, and a 91.35% F1-score with 86.4M parameters while processing 810 samples per second. Integrating TSEA improves performance to 92.48% accuracy, 92.10% precision, 92.30% recall, and a 92.20% F1-score using 87.1M parameters at 690 samples per second, confirming its ability to refine local spatial and temporal representations in noisy railway subsidence data. The addition of the DGTA module further improves results to 92.95% accuracy, 92.60% precision, 92.80% recall, and a 92.70% F1-score with 89.3M parameters at 620 samples per second, demonstrating superior capability in capturing long-term temporal dependencies for subsidence trend analysis. The complete TD Transformer, combining both modules, achieves the best performance with 93.35% accuracy, 93.10% precision, 93.20% recall, and a 93.10% F1-score using 90.2M parameters while maintaining efficient inference at 570 samples per second, establishing an optimal balance between feature refinement, temporal modeling, and computational efficiency for railway subsidence prediction.

Sequential experiment

As shown in Table 7, the experimental results indicate that applying TSEA before DGTA yields better evaluation metrics. Starting with TSEA may better capture dependencies and structural information, providing more detailed features, which in turn enhances DGTA’s ability to dynamically capture the characteristics of subsidence data. This sequential synergy optimizes overall model performance.

Table 7 Results of sequential experiments on the Xu Lan High Speed Railway dataset.

Experimental result

This study establishes a high-speed railway subsidence early warning model using two indicators: Settlement Distance and Settlement Velocity. These indicators are jointly used to determine the subsidence warning level. The warning level for each indicator is calculated based on the warning range, and the overall warning level is computed by combining the weights and warning levels of each indicator. The experimental results of the TD Transformer algorithm are shown in Table 8 .

Table 8 TD transformer experimental results.

Table 8 reveals significant uncertainty in high-speed railway subsidence and its rate of change. The data indicate that regions with larger settlement values also correspond to higher settlement velocities, suggesting more pronounced subsidence changes in these areas. This is fully reflected in the subsidence early warning mechanism. The TD Transformer model not only effectively captures both short-term and long-term subsidence changes but also weights features based on the importance of different sensor data, thereby enhancing its ability to warn about subsidence conditions. Table 9 presents the experimental results of SVM, XGBoost, RF, Transformer, and TD Transformer across four evaluation metrics: Accuracy, Precision, Recall, and F1-Score.

Table 9 Experimental results with different models on the Xu Lan High Speed Railway dataset.

As observed from Table 9 and Fig. 5, the evaluation metrics of the TD Transformer surpass those of the traditional Transformer. The TD Transformer achieves an accuracy of 93.39%, a 1.24 percentage point improvement over the Transformer; a precision of 93.10%, a 1.3 point increase; a recall of 93.40%, also a 1.3 point increase; and an F1-score of 93.24%, a 1.27 point improvement. These results indicate that the TD Transformer excels in all aspects, particularly precision and F1-score, and demonstrate its significant superiority over the other methods in the high-speed railway subsidence early warning task.

Fig. 5
figure 5

Experimental results of SVM, XGBoost, RF, Transformer and TD transformer.

Conclusion

High-speed railway subgrade settlement is crucial to the safety of high-speed trains. Existing monitoring data features are often vague, and long-term dependencies are difficult to capture, increasing the difficulty of early warning. To address these issues, this paper proposes a Settlement Early Warning Method for High-Speed Railway Subgrades Based on TD Transformer. The method first employs Temporal-Spatial Enhanced Attention (TSEA) for feature extraction, effectively improving the model’s feature extraction capability and resolving the problem of feature vagueness. Second, Dynamic Global Temporal Attention (DGTA) dynamically captures the long-term dependencies of the settlement data. Experimental results show that TD Transformer outperforms traditional high-speed railway subgrade settlement early warning methods across all metrics, demonstrating significant advantages in Accuracy, Precision, Recall, and F1-Score. However, despite its strong performance, the TD Transformer has some limitations. First, its high computational complexity significantly increases hardware resource requirements, limiting its application in resource-constrained environments. Second, the model’s early warning effectiveness may be affected by unforeseen factors such as extreme weather or sudden events: extreme weather can cause rapid changes in subgrade settlement, and sudden events can trigger abnormal settlement, reducing the accuracy of short-term warnings. Future research should focus on optimizing the model’s computational efficiency and incorporating more external factors to enhance the robustness of the early warning system, thereby coping more effectively with unforeseen settlement changes.