Evaluation of geological hazards susceptibility along a key railway based on machine learning

Liang, Jiarong; Qi, Wenwen; Xu, Chong; Wang, Peng; Sun, Jingjing; Zhang, Xuewei; Xue, Zhiwen; Chen, Jingyu; Cui, Yulong; Pan, Junwei; Wang, Juanling; Sun, Qihao

doi:10.1038/s41598-025-26496-x

Article
Open access
Published: 27 November 2025

Jiarong Liang^1,2,
Wenwen Qi ORCID: orcid.org/0000-0002-8439-4339^1,2,
Chong Xu^1,3,
Peng Wang⁵,
Jingjing Sun^1,2,
Xuewei Zhang^1,2,
Zhiwen Xue^1,4,
Jingyu Chen^6,11,
Yulong Cui⁷,
Junwei Pan⁸,
Juanling Wang⁹ &
…
Qihao Sun¹⁰

Scientific Reports volume 15, Article number: 42497 (2025)

Abstract

Geological hazards, such as landslides, pose significant risks to railway infrastructure. This study assesses landslide susceptibility along a critical section of the Hefei-Fuzhou High-Speed Railway, which traverses a geologically complex region. We developed a landslide susceptibility model using the Random Forest (RF) algorithm based on a historical landslide inventory. The model demonstrates excellent predictive performance, and the resulting susceptibility map highlights that a significant portion of the study area falls within the high and extremely high susceptibility zones. These high-risk areas are spatially correlated with steep slopes and high annual precipitation, providing crucial guidance for regional disaster prevention. Furthermore, an assessment of influencing factor importance using the RF algorithm revealed that slope is the dominant factor affecting landslide susceptibility. The varying importance rankings of other factors between the northern and southern sub-regions highlight the complex and diverse landslide mechanisms driven by different geological conditions. This research fills a key gap in geological hazard mapping for a major high-speed railway corridor in southeastern China. The findings and methodological approach offer a valuable framework for regional infrastructure planning and geological risk management in similar developing regions.

Introduction

Geological hazards, particularly landslides, are significant natural processes that pose a major threat to human lives and infrastructure globally^1,2. Geological hazards cause extensive economic losses and casualties each year³, often triggered by a complex interplay of environmental factors (such as topography, geology, climate) and triggering factors (e.g., seismic activity, heavy rainfall)⁴. To mitigate these risks, landslide susceptibility mapping (LSM) has become an indispensable tool for spatially predicting landslide occurrence and guiding effective disaster management and land use planning⁵.

Traditionally, LSM relied on heuristic and statistical methods⁶. In recent years, the field has been transformed by the emergence of machine learning (ML) models. These models are increasingly favored due to their superior ability to capture complex, non-linear relationships within high-dimensional data⁷, leading to more accurate and reliable predictions. Among the various ML algorithms, the random forest (RF) model has gained widespread recognition^8,9. It is an ensemble learning algorithm that combines multiple decision trees, offering high stability, accuracy, and computational efficiency¹⁰. Furthermore, a key advantage of the RF model is its ability to quantify the importance of different environmental factors¹¹, providing critical insights into the underlying mechanisms of landslide development. Other algorithms have also been applied: support vector machines (SVM) are known for their effectiveness in high-dimensional spaces¹², and XGBoost¹³ and deep learning models¹⁴ have been explored to further enhance prediction performance.

Despite the proliferation of ML-based LSM studies, a significant gap remains in applying these advanced techniques to specific, high-stakes linear infrastructure, such as high-speed railways¹⁵. Notably, infrastructure projects, particularly railway construction, are critically influenced by geological hazards¹⁶. The construction and use of roads can damage surface vegetation, leading to slope instability that threatens the safety of the infrastructure¹⁷. Slope failures not only disrupt transportation and damage property but can also cause casualties and adversely affect socio-economic stability and regional development. Therefore, it is particularly important to evaluate the risk of geological hazards along major transport routes, whether under construction or in operation.

The Hefei-Fuzhou High-Speed Railway is a critical transportation artery connecting the central and southeastern regions of China. The railway traverses a diverse and complex geomorphological landscape, including plains, hills and mountains, making it highly vulnerable to geological hazards¹⁸. In recent years, several catastrophic geological hazards have been reported in nearby regions. For example, an embankment collapse disrupted the Meida Highway and killed 48 travelers in May 2024¹⁹. A disastrous avalanche in Lishui City claimed 38 lives in November 2015²⁰. A major landslide in Chengtian village, Zhejiang destroyed a section of a provincial highway and blocked local traffic for 17 days²¹. These catastrophic events in nearby regions with similar climatic and topographic characteristics highlight the necessity of mapping regional landslide susceptibility for this crucial linear infrastructure²². Zhou et al.²³ and Huang et al.²⁴ mapped landslide susceptibility for a few counties in Jiangxi Province, and Zhou et al.²⁵ did so for Fujian Province, but no work has been done for Anhui Province, the northern part of our study area. Although these regional-scale LSM studies have been conducted in provinces along this corridor, they often lack the fine-scale resolution required for infrastructure-specific risk management. Specifically, previous studies have not provided a comprehensive and tailored landslide susceptibility assessment for this crucial railway corridor, nor have they systematically addressed the distinct landslide-driving mechanisms in its geologically different northern and southern sections.

To address these critical knowledge gaps, this study provides a comprehensive landslide susceptibility assessment for the Hefei-Fuzhou High-Speed Railway corridor using the Random Forest model. We aim to offer robust scientific support for disaster prevention and mitigation along this major transportation artery. The central objectives of this study are threefold:

To develop a robust landslide susceptibility model using the Random Forest algorithm and generate a high-resolution susceptibility map for the railway corridor.
To evaluate the importance of various landslide influencing factors to identify the dominant drivers of slope instability in the study area.
To conduct a comparative analysis of the northern and southern sub-regions to reveal segment-specific controlling mechanisms and provide a foundation for tailored risk mitigation strategies.

This study will fill a key gap in geological hazard mapping for one of China’s most important high-speed railway corridors, offering a valuable methodological framework for similar infrastructure projects in developing regions.

Study area

In this study, the Huangshan-Fuzhou section of Hefei-Fuzhou High-Speed Railway and its vicinity is selected as the study area. This crucial transportation artery, located in southeastern China, traverses a complex geological and geomorphological landscape, passing through five cities in Anhui Province, one city in Jiangxi Province, and two cities in Fujian Province before reaching Fuzhou City (Fig. 1).

The study area exhibits significant topographic variability, with elevation ranging from approximately −8 m to over 2000 m. The terrain is dominated by low-to-medium mountains and hills, characterized by steep slopes and deeply incised valleys, which are conducive to slope instability. The region spans multiple tectonic units, including the Yangtze Plate and South China Plate, and exhibits well-developed fault structures and relatively active neotectonic movements. The main exposed strata consist predominantly of Presinian mica-quartz schist and the third and fourth intrusive granites from the Yanshanian period, with developed folds and relatively fractured rock mass, resulting in a complex geological hazard background²⁶.

The region is characterized by a subtropical monsoon climate, with a long rainy season extending from April to September and an average annual rainfall of 1283–2308 mm. The area is also frequently affected by typhoons in the summer and autumn, which can bring short-duration, high-intensity rainfall events. Such extreme precipitation acts as a key triggering factor, leading to soil saturation, increased pore water pressure, and a subsequent decrease in slope stability, thereby significantly increasing landslide risk²⁷.

The study area has a history of significant landslide activity. As of 2020, more than 9,000 landslides of varying scales had been recorded within Jiangxi Province alone²⁸. There are many landslides in the northeast and southwest regions of the corridor, and the distribution is concentrated (the red dots in Fig. 1 represent large-scale landslide traces). These large-scale landslide traces are closely associated with areas of high relief and intense human activity, highlighting the need for a systematic and comprehensive susceptibility assessment.

Methodology

Data

Landslide and non-landslide samples

A comprehensive landslide inventory for the study area was compiled through a multi-step process. Using high-resolution satellite imagery from Google Earth, we employed visual interpretation to identify and delineate the boundaries of landslide traces along the Hefei-Fuzhou High-Speed Railway corridor^30,31. This was systematically verified and supplemented by field investigations and a review of existing literature, ensuring the precise spatial positioning and accuracy of the inventory. This process yielded a dataset of 2,299 landslide locations, which were used as our positive samples (landslides distribution shown in Fig. 1).

To create a balanced dataset, an equal number of non-landslide points (negative samples) were generated. We first identified the non-landslide areas by taking the spatial complement of the landslide polygons within the study region. Subsequently, 2,299 random points were generated within these non-landslide areas using GIS software. To ensure spatial representativeness and avoid sampling bias, a minimum distance of 1 km was maintained between any two sample points. The final dataset, consisting of 4,598 landslide and non-landslide points, was carefully screened for invalid values and served as the sample set for training and validating the machine learning model.
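The negative-sampling step described above can be sketched as simple rejection sampling. This is only an illustration under stated assumptions: the source used unnamed GIS software, and the function name `sample_points_min_distance`, the rectangular bounds, and the circular "landslide footprint" below are all hypothetical stand-ins for real polygons in a projected (metre-based) coordinate system.

```python
import math
import random


def sample_points_min_distance(n_points, bounds, min_dist, is_excluded,
                               existing=(), seed=42, max_tries=100_000):
    """Draw n_points random points inside bounds (xmin, ymin, xmax, ymax).

    A candidate is rejected if it falls in an excluded (landslide) area or
    lies closer than min_dist to any existing or previously accepted point.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    placed = list(existing)   # points that new samples must keep away from
    new_points = []
    tries = 0
    while len(new_points) < n_points:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all points; relax min_dist")
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if is_excluded(x, y):
            continue
        if all(math.hypot(x - px, y - py) >= min_dist for px, py in placed):
            placed.append((x, y))
            new_points.append((x, y))
    return new_points


def in_landslide(x, y):
    """Hypothetical landslide footprint: a 500 m circle (assumption)."""
    return math.hypot(x - 5000.0, y - 5000.0) < 500.0


# Toy run: 20 negatives in a 10 km x 10 km window with 1 km spacing.
points = sample_points_min_distance(
    20, (0.0, 0.0, 10_000.0, 10_000.0), 1000.0, in_landslide)
```

Passing the landslide locations through `existing` would enforce the spacing against positive samples as well, which matches the "any two sample points" wording more closely.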

Environmental factors

The occurrence of landslides is a complex phenomenon influenced by the interaction of multiple environmental factors. A scientifically grounded selection of these factors is crucial for developing an accurate landslide susceptibility model. Based on a comprehensive review of existing literature^32,33 and the specific characteristics of the study area, we selected 13 key factors, which were categorized into four groups: topographic, geological, hydrological, and land cover factors (Table 1; Fig. 2).

Topographic factors

Topography plays a crucial role in controlling landslide occurrence by influencing slope stability, water flow and weathering processes³⁴. Six topographic factors were derived from a high-resolution Digital Elevation Model (DEM): Slope, Aspect, Curvature, Topographic Position Index (TPI), Topographic Relief, and Terrain Wetness Index (TWI). Slope angle is widely recognized as the most critical factor influencing landslide susceptibility. Steeper slopes are more susceptible to failure as they experience higher shear stress, reducing overall stability. Aspect affects the duration and intensity of solar radiation and rainfall received by a slope. This in turn influences soil moisture, weathering processes, and vegetation distribution, leading to variations in slope stability³⁵. Curvature affects the convergence or divergence of surface runoff and subsurface water flow, which affects soil moisture and pore water pressure, thereby impacting slope stability³⁶. TPI reflects the relative position of a point on a slope compared to its surroundings, which helps to identify different landforms (e.g., valleys, ridges, and slopes) with distinct susceptibility characteristics. Relief measures the complexity of the terrain by quantifying local elevation variation. TWI represents the spatial distribution of soil moisture, which is a critical hydrological factor³⁷.
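The source does not state the formulas used to derive these layers. For orientation only, the conventional definitions of slope angle and the wetness index are given below, assuming z is the DEM elevation, β the local slope angle, and a the specific catchment area (upslope contributing area per unit contour width). Whether the authors used exactly these forms is an assumption.

```latex
\beta = \arctan\sqrt{\left(\frac{\partial z}{\partial x}\right)^{2}
                    + \left(\frac{\partial z}{\partial y}\right)^{2}},
\qquad
\mathrm{TWI} = \ln\!\left(\frac{a}{\tan\beta}\right)
```

Under this definition, TWI is large where a large upslope area drains through gently sloping ground, which is where soil moisture tends to accumulate.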

Geological factors

Among the geological factors, we selected geology and distance to faults as evaluation indicators. The geological map of the study area provides a basis for assessing susceptibility based on the lithological characteristics of a region. Faults are zones of structural weakness and fracturing. The closer a location is to a fault, the higher the likelihood of rock mass fracturing and weathering, making it more susceptible to landslide³⁸.

Hydrological and land cover factors

Hydrological conditions and human activities are also key factors affecting slope stability. River erosion at the toe of a slope can reduce its lateral support, leading to instability. Therefore, proximity to rivers is often a significant factor in landslide susceptibility. Precipitation is a primary triggering factor for landslides. Prolonged and intense rainfall increases soil saturation, raises groundwater levels, and elevates pore water pressure, reducing the shear strength of the slope material. Land cover reflects both natural conditions and human activities, which can disturb slope equilibrium and significantly increase landslide risk. Fractional vegetation cover (FVC) reflects the density of vegetation in a region, directly influencing slope stability by affecting the efficiency of rainfall infiltration and the intensity of surface runoff. Higher FVC generally correlates with greater slope stability.

All these factors were processed using GIS software, and their data sources are detailed in Table 1. The spatial distribution of these factors across the study area is shown in Fig. 2.

Table 1 Environmental factors and data sources.

Methods

The overall workflow of this study is depicted in the flow chart (Fig. 3). First, a landslide inventory was created by visual interpretation of high-resolution satellite images, followed by field investigation. The landslide samples and the randomly generated non-landslide points constitute the model training samples, which were then integrated with the environmental factor data. The RF algorithm was used to train the susceptibility model, which was subsequently applied to the entire study area to predict landslide susceptibility values. Finally, a landslide susceptibility map was generated and classified into five susceptibility levels (Very Low, Low, Medium, High and Very High) using the Jenks Natural Breaks classification method.

Building landslide susceptibility model

As an ensemble learning algorithm, RF utilizes decision trees as base classifiers. It repeatedly performs bootstrap sampling from the original dataset to generate multiple subsets, and constructs a decision tree for each subset. These individual trees are then combined into an ensemble. During prediction, each tree votes on the classification outcome, and the final result is determined by majority voting across all trees in the forest.
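The bootstrap-and-vote mechanism described above can be illustrated with a deliberately reduced sketch. It assumes a single feature (slope in degrees) and replaces each full decision tree with a one-threshold "stump"; the per-split feature subsampling of a real random forest is omitted. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement from the original data."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(sample):
    """Fit a one-split 'tree': the midpoint between the two class means.

    If the bootstrap sample happens to contain one class only, fall back
    to the mean of all feature values.
    """
    pos = [x for x, label in sample if label == 1]
    neg = [x for x, label in sample if label == 0]
    if not pos or not neg:
        return sum(x for x, _ in sample) / len(sample)
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def fit_forest(rows, n_trees=25, seed=42):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def predict(forest, x):
    """Each stump votes 1 (landslide) or 0; the majority label wins."""
    votes = [int(x > threshold) for threshold in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy data: (slope in degrees, landslide label). Values are illustrative.
rows = [(5, 0), (8, 0), (10, 0), (12, 0), (15, 0),
        (30, 1), (35, 1), (40, 1), (45, 1), (50, 1)]
forest = fit_forest(rows)
steep_label = predict(forest, 45)
gentle_label = predict(forest, 6)
```

An odd number of stumps is used so that a binary vote cannot tie.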

For the model training, we configured the hyperparameters as follows: the forest was composed of 100 decision trees, each with a maximum depth of 8. The minimum number of samples required to split an internal node was set to 10, while the minimum for a leaf node was 5. At each split, the number of features considered was the square root of the total feature count. The model was trained using the 13 environmental factors described previously. To prepare the data, the landslide and non-landslide samples were divided into training and validation sets at an 80/20 ratio, with a fixed random seed (random_state = 42) to ensure reproducibility.
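The configuration above maps naturally onto scikit-learn's `RandomForestClassifier`. The source does not name its library, so treating it as scikit-learn is an assumption (the `random_state = 42` notation merely suggests it). The synthetic 13-column matrix below stands in for the real factor table, and reusing seed 42 for the forest itself is also an assumption, since the source only fixes the seed for the split.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 13 environmental factors (assumption).
rng = np.random.default_rng(0)
n_samples, n_features = 400, 13
X = rng.normal(size=(n_samples, n_features))
noise = rng.normal(scale=0.5, size=n_samples)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

# 80/20 split with the fixed seed reported in the text.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(
    n_estimators=100,        # 100 decision trees
    max_depth=8,             # maximum tree depth
    min_samples_split=10,    # minimum samples to split an internal node
    min_samples_leaf=5,      # minimum samples in a leaf
    max_features="sqrt",     # features considered at each split
    random_state=42,         # assumption: same seed reused for the forest
)
model.fit(X_train, y_train)

# Predicted probability of the landslide class, used as susceptibility.
susceptibility = model.predict_proba(X_val)[:, 1]
```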

The choice of 100 trees for our RF model was based on a trade-off between computational efficiency and predictive performance. Although increasing the number of trees (e.g., to 500 or 1,000) could theoretically enhance model stability, preliminary tests indicated that the model's accuracy and stability plateau at around 100 trees, while using more trees significantly increases computation time without a meaningful improvement in predictive power.

Evaluating model performance

Model accuracy

The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are used to comprehensively evaluate the classification performance of the model⁴⁵. The ROC curve is plotted from the True Positive Rate and False Positive Rate of the model at different thresholds. The AUC value, which ranges from 0 to 1, is calculated from the ROC curve. Generally, an AUC value greater than 0.7 indicates good discriminatory ability and a more accurate model⁴⁶.
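As a worked illustration of how the ROC curve and AUC relate, the following sketch sweeps the decision threshold over a handful of scores and integrates the curve with the trapezoidal rule. The labels and scores are toy values, not results from the study, and the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs from sweeping the threshold downwards."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Samples with tied scores cross the threshold together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if y_true[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: 3 landslide and 3 non-landslide samples.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(roc_points(y_true, scores))
```

Here 8 of the 9 landslide/non-landslide pairs are ranked correctly, so the AUC equals 8/9, which illustrates the interpretation of AUC as the probability that a random positive outranks a random negative.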

Model reliability

A calibration curve, or reliability diagram, is frequently used to assess model uncertainty⁴⁷. The Brier score (the mean squared error between predicted probabilities and observed outcomes), the Expected Calibration Error (ECE), and the Maximum Calibration Error (MCE) are three important indicators in model reliability evaluation⁴⁸. They are commonly used to measure the gap between the predicted probability and the actual probability. The lower the values of these indicators, the more reliable the model.
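The three reliability indicators can be sketched as follows. This assumes equal-width probability bins (ten by default); the source does not report its binning scheme, so the bin count and the function names are assumptions, and the four samples are toy values.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)


def calibration_errors(y_true, probs, n_bins=10):
    """Return (ECE, MCE) using equal-width probability bins.

    ECE is the sample-weighted mean gap between observed frequency and
    mean predicted probability per bin; MCE is the largest such gap.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        gap = abs(observed - predicted)
        ece += len(members) / n * gap
        mce = max(mce, gap)
    return ece, mce


# Toy example with four samples, each landing in its own bin.
y_true = [0, 0, 1, 1]
probs = [0.1, 0.3, 0.7, 0.9]
toy_brier = brier_score(y_true, probs)
toy_ece, toy_mce = calibration_errors(y_true, probs)
```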

Quantifying the importance of environmental factors

To quantify the importance of the environmental factors, this study employed a dual-approach analysis. First, we used the Gini Impurity metric, an intrinsic feature importance method of the RF model, to conduct a global assessment⁴⁹. This method calculates a feature’s overall importance by measuring the average reduction in node impurity it provides when used for splitting, aggregated across all trees in the forest. This allowed us to identify the dominant factors influencing landslide susceptibility across the entire study area. Second, to compare the controlling factors within the distinct sub-regions, we utilized Permutation Importance⁵⁰. This method assesses a feature’s significance by measuring the decrease in model performance when its values are randomly shuffled. A substantial drop in prediction accuracy indicates that the feature is critical for the specific sub-region. By training the same RF model on data from different sub-regions, we were able to conduct a fine-grained analysis of the varying factor importances. This dual approach allowed us to analyze factor importance at both a macro- and micro-level, providing a more robust and nuanced understanding of the landslide-driving mechanisms.
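Both importance measures can be illustrated without a full forest. The sketch below computes the Gini impurity decrease of a single split (the quantity that is averaged over all splits and trees in the intrinsic RF measure) and a model-agnostic permutation importance based on accuracy. The `predict` callable, the scoring by accuracy, and the two-feature toy data are assumptions made for illustration.

```python
import random


def gini_impurity(labels):
    """Gini impurity of a set of binary (0/1) labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def impurity_decrease(parent, left, right):
    """Reduction in Gini impurity achieved by splitting parent in two."""
    n = len(parent)
    return (gini_impurity(parent)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))


def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=42):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:]
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that only looks at feature 0 (slope); feature 1 is ignored.
def toy_predict(row):
    return int(row[0] > 25)


X = [[10, 3], [15, 7], [20, 1], [30, 9], [35, 2], [40, 5]]
y = [0, 0, 0, 1, 1, 1]
importances = permutation_importance(toy_predict, X, y)
perfect_split_gain = impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1])
```

Because the toy model ignores the second feature, shuffling it cannot change any prediction, so its permutation importance is exactly zero.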

Results

Inter-factor correlation analysis

To assess the potential for multicollinearity among the continuous environmental factors (DEM, Slope, Relief, TWI, Curvature, River, Rain, and FVC), the Pearson's Correlation Coefficient (PCC) was used in this study. The resulting correlation matrix is shown in Fig. 4. Previous work suggests that PCC > 0.7 can be used as a threshold indicating a high degree of correlation between variables⁵¹. As shown in Fig. 4, the highest PCC in our study area is 0.49 and the mean absolute PCC is 0.17, both of which are far lower than 0.7. Therefore, all 13 above-mentioned influencing factors were retained to establish the landslide susceptibility model. The categorical factors (Distance to Fault, Aspect, Geology, Landcover, and TPI) were not included in this analysis.
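The screening step can be sketched as a pairwise Pearson check against the 0.7 threshold. The factor values below are invented toy numbers chosen so that one pair is clearly collinear; they are not data from the study, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def high_correlation_pairs(factors, threshold=0.7):
    """List factor pairs whose absolute PCC exceeds the threshold."""
    names = list(factors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(factors[a], factors[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged


# Toy factor samples (illustrative values only).
factors = {
    "slope": [5, 10, 20, 30, 40],
    "relief": [12, 18, 45, 60, 85],
    "rain": [1500, 1300, 1600, 1400, 1550],
}
flagged = high_correlation_pairs(factors)
```

In this toy set only the slope and relief pair is flagged; in the study itself no pair exceeded the threshold, which is why no factor was dropped.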

Model performances and reliability

The performance of our Random Forest model was evaluated using both accuracy and reliability metrics. The Receiver Operating Characteristic (ROC) curve and its corresponding Area Under the Curve (AUC) value were used to assess the model’s predictive accuracy. As shown in Fig. 5, the AUC values of the training dataset and the validation dataset were 0.93 and 0.85, respectively. While a slight performance difference exists between the two datasets, the model maintains a high discriminant efficiency, confirming its strong prediction ability and capacity to effectively distinguish between landslide-prone and non-prone areas.

Furthermore, we evaluated the model’s reliability using the Calibration curve to assess the alignment between predicted probabilities and actual outcomes. A perfect Calibration curve lies on the 45-degree line of the plot, indicating a perfect match between predicted and actual probabilities. As shown in Fig. 6, our model exhibits high reliability, as evidenced by low values for key metrics: a Brier score of 0.1589, an Expected Calibration Error (ECE) of 0.0572, and a Maximum Calibration Error (MCE) of 0.1178. These results collectively demonstrate the model’s robustness and its suitability for landslide susceptibility assessment.

The landslide susceptibility map and factor importance

To identify the key environmental factors that significantly affect the landslide susceptibility model, the importance of each evaluation factor was calculated (Fig. 7). Slope (0.31), Relief (0.18) and TWI (0.11) are the top three environmental factors contributing to landslide susceptibility across the entire study area.

By applying the trained RF model to the entire study area, we generated a landslide susceptibility map of the study area. With the Jenks Natural Breaks Classification Method, the results were categorized into five categories: Very Low (< 0.13), Low [0.13–0.31), Medium [0.31–0.50), High [0.50–0.69), and Very High (> 0.69) susceptibility (Fig. 8). The Very High and High susceptibility categories account for a combined 31.87% of the study area (14.36% and 17.51%, respectively). These high-risk areas are mainly distributed in mountainous areas with high elevation, high annual precipitation, and steep slope, spatially correlating well with the distribution of the landslide inventory. In contrast, the Low and Very Low susceptibility zones, which comprise 50.97% of the study area, are typically found in relatively flatter terrain dominated by farmland, paddy land, and urban areas.
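Once the breakpoints are known, assigning classes and computing area shares is a simple lookup. The sketch below applies the breakpoints reported above; it does not re-derive them with the Jenks algorithm. The reported intervals leave a value of exactly 0.69 unassigned, so placing it in Very High is an assumption, as are the function names and the toy cell values.

```python
from bisect import bisect_right
from collections import Counter

# Class boundaries as reported in the text.
BREAKS = [0.13, 0.31, 0.50, 0.69]
LABELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def classify(value):
    """Map a susceptibility value in [0, 1] to one of the five classes.

    Lower bounds are inclusive, so 0.13 is Low and 0.50 is High.
    Assumption: a value of exactly 0.69 is treated as Very High.
    """
    return LABELS[bisect_right(BREAKS, value)]


def class_shares(values):
    """Percentage of cells in each class (equal-area cells assumed)."""
    counts = Counter(classify(v) for v in values)
    return {label: 100.0 * counts[label] / len(values) for label in LABELS}


# Toy susceptibility values for eight raster cells.
cells = [0.05, 0.20, 0.40, 0.60, 0.60, 0.80, 0.90, 0.95]
shares = class_shares(cells)
```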

Comparison of the South and North sub-regions

Our results show that high and very high susceptible areas are mainly located at the north and south ends of the study area (red rectangular box in Fig. 7). The results, as shown in Fig. 9, confirm that the dominant factors differ between these two areas. In the northern sub-region, the top three factors are Slope (0.091), Relief (0.040) and Curvature (0.033). This highlights that topography remains the primary driver of landslide susceptibility in this section. However, in the southern sub-region, while Slope (0.135) remains the most important factor, Rain (0.026) ranks as the third most significant factor after Curvature (0.031). The increased importance of precipitation in the south is likely due to its proximity to the coast, which leads to higher annual rainfall and more frequent typhoon events. This comparative analysis demonstrates the segment-specific controls on slope instability and the need for tailored mitigation strategies.

Discussion

This study provides a comprehensive landslide susceptibility assessment along the Hefei-Fuzhou High-Speed Railway, a critical infrastructure corridor in southeastern China, using a RF model. Our discussion is structured to provide a deeper interpretation of our findings beyond mere results, focusing on three key areas: the regional variability of landslide-controlling factors, the robustness of our model, and a critical analysis of its limitations for future research.

Regional variability of landslide susceptibility

A primary contribution of this study is the identification of distinct regional drivers of landslide susceptibility along the railway corridor, which for the first time provides a targeted hazard assessment for this key transportation artery. While topographic factors are universally important, their specific nature and interaction with hydroclimatic conditions vary significantly between the northern and southern sections. This finding underscores the necessity of considering regional environmental characteristics when assessing risks along linear infrastructure that traverses diverse geographic zones.

In the mountainous and hilly terrain of the northern section, the most influential factors are Slope, Relief, and Curvature, which collectively point to a topography-dominated landslide regime. The high susceptibility in this region is primarily driven by steep slopes that generate greater shear stress, and high relief which is indicative of vigorous geomorphic processes and active uplift. The strong correlation with curvature, especially in areas of high relief, suggests that local terrain morphology is key to stress redistribution and the concentration of both surface and subsurface water flow, which ultimately controls slope stability. This aligns with previous studies in similar mountainous terrains where landslides are controlled by stress and structural configurations.

In contrast, the southern section, which lies in Fujian Province, is governed by Slope, Curvature, and Rain. The heightened importance of precipitation in this sub-region highlights a hydroclimatic-driven landslide mechanism, consistent with studies identifying intense, short-duration rainfall as the primary trigger for catastrophic landslides in Fujian⁵². This difference is rooted in the region's distinct geological and climatic setting. The southern sub-region is characterized by a warm, humid climate with abundant rainfall and frequent typhoons, which significantly exacerbates slope instability. Intense rainfall events facilitate water infiltration into fractured bedrock and deep weathering layers, leading to elevated pore water pressure that drastically reduces the shear strength of the slope material, particularly along weak lithological layers⁵³.

In addition, the significant differences between the northern and southern sub-regions are further explained by their distinct lithology and land use patterns. The northern sub-region is underlain by Proterozoic metamorphic rocks (such as schist and gneiss) and Mesozoic Yanshanian granites that weather into strong, coarse-grained soils. In such environments, topographic factors become the primary control on slope stability. Conversely, the southern sub-region is dominated by red sedimentary beds that weather into clay-rich, expansive soils. This lithology, combined with heavy rainfall, creates ideal conditions for the development of high pore water pressure above weak and slippery layers, leading to rapid destabilization. Land use practices amplify these disparities. The northern sub-region benefits from well-preserved forests, which provide root reinforcement, whereas the southern sub-region, with its dense population, experiences intensive agricultural use and deforestation, which disrupt soil structure and increase infiltration capacity, thereby exacerbating landslide susceptibility.

Model reliability and uncertainty analysis

The robustness of our findings is supported by a rigorous data and modeling methodology. The RF model’s high predictive accuracy (AUC = 0.85) and reliability (low Brier score, ECE, and MCE) confirm its suitability for this analysis. Crucially, we extended our reliability assessment by analyzing the model’s predictive standard deviation (SD), which provides a quantitative measure of uncertainty^54,55.

As shown in Fig. 10, the spatial distribution of SD reveals that the model’s predictions are highly confident across most of the study area, with over 90% of the region showing an SD of less than 0.3. This low uncertainty indicates a high level of consensus among the individual decision trees within the forest, suggesting that the selected environmental factors effectively and unambiguously distinguish between stable and unstable slope conditions⁵⁶. It also indirectly reflects the high quality and representativeness of our training inventory⁵⁷, which sufficiently captures the diverse environmental conditions along the railway corridor.

However, areas with higher uncertainty (SD > 0.3) are primarily located within transition zones between different susceptibility classes (e.g., between “low” and “high” susceptibility) and in regions with complex geological conditions. Within these transition zones, the values of environmental factors fall near the decision boundaries of many trees, leading to lower inter-tree consensus and, consequently, higher SD. This finding is critical for practical application. For hazard management, the SD map acts as a vital prioritization tool⁵⁸. Areas classified as “high susceptibility” with “low uncertainty” should be assigned the highest priority for immediate mitigation measures. In contrast, areas with both “high susceptibility” and “high uncertainty” require further field investigation or expert evaluation before committing significant resources⁵⁹. This approach transforms a static susceptibility map into a more dynamic and risk-informed decision-support system, ensuring the efficient allocation of limited resources.
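The prioritization logic described above can be sketched as a small decision rule. It assumes that the predictive SD is the standard deviation of the per-tree landslide probabilities for a cell (the source does not spell out the computation), uses the High class lower bound of 0.50 and the SD threshold of 0.3 from the text, and introduces the hypothetical label "routine monitoring" for cells below the High class.

```python
from statistics import mean, pstdev


def summarize_tree_predictions(tree_probs):
    """Return (mean susceptibility, SD) across the trees for one cell."""
    return mean(tree_probs), pstdev(tree_probs)


def triage(susceptibility, sd, high_threshold=0.50, sd_threshold=0.3):
    """Combine susceptibility and uncertainty into a management action."""
    if susceptibility < high_threshold:
        return "routine monitoring"          # hypothetical label
    if sd > sd_threshold:
        return "field investigation first"   # high susceptibility, uncertain
    return "priority mitigation"             # high susceptibility, confident


# Toy per-tree probabilities for three cells (illustrative values only).
confident_cell = summarize_tree_predictions([0.80, 0.85, 0.90, 0.75])
uncertain_cell = summarize_tree_predictions([0.10, 0.95, 1.00, 0.90])
stable_cell = summarize_tree_predictions([0.10, 0.20, 0.05])

actions = [triage(*confident_cell),
           triage(*uncertain_cell),
           triage(*stable_cell)]
```

The second cell shows the case the text warns about: its mean susceptibility is high, but one dissenting tree pushes the SD above 0.3, so it is routed to field verification rather than immediate mitigation.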

Limitations and future research

While this study provides valuable insights, we acknowledge several limitations that offer avenues for future research. First, while the 13 factors used in our model represent a comprehensive suite of common landslide triggers, region-specific phenomena may have been overlooked. For instance, future models could incorporate variables like wind speed and typhoon tracks to better capture the localized dynamics in the southern sub-region⁶⁰. Second, this study focuses on static susceptibility mapping. A more advanced approach would be to incorporate active slope deformation data, such as from InSAR, to enable a transition from static susceptibility maps to dynamic hazard warnings. This would provide more effective support for risk management and emergency response along the Hefei-Fuzhou High-Speed Railway.

Conclusion

This study successfully developed and validated a Random Forest model to map landslide susceptibility along the Hefei-Fuzhou High-Speed Railway, a critical transportation artery in southeastern China. The model demonstrated high predictive accuracy, providing a robust quantitative basis for landslide risk management.

Our findings reveal two major implications for hazard assessment of linear infrastructure. First, while topographic factors (Slope, Relief, and TWI) are the dominant drivers of landslide susceptibility across the entire corridor, their influence varies significantly between sub-regions. We found a clear distinction between a topography-dominated landslide regime in the northern section and a hydroclimatic-driven regime in the southern section, where rainfall emerges as a critical factor. This pronounced regional variability underscores the necessity of moving beyond a uniform approach and adopting tailored, location-specific strategies for disaster prevention.

Second, this research provides not only a crucial susceptibility map to guide hazard mitigation but also a vital methodological framework. By demonstrating how machine learning can be used to identify and explain complex regional differences in landslide drivers, our work offers a valuable scientific basis for assessing geological hazards for similar large-scale projects in diverse environments. The study’s systematic analysis of model uncertainty, using the standard deviation of RF predictions, provides a practical tool for prioritizing mitigation efforts and allocating resources effectively.

Despite its contributions, the study has certain limitations. The model’s accuracy is affected by uncertainties in multi-source data and by data gaps in some coastal areas. Future research could enhance predictive performance and regional applicability by integrating more precise remote sensing data. Incorporating dynamic factors such as slope deformation monitoring would further enable a shift from static susceptibility analysis to a real-time early warning system for slope instability.

Data availability

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

References

Kirschbaum, D., Stanley, T. & Zhou, Y. Spatial and temporal analysis of a global landslide catalog [J]. Geomorphology 249, 4–15. https://doi.org/10.1016/j.geomorph.2015.03.016 (2015).
Nadim, F. et al. Global landslide and avalanche hotspots [J]. Landslides 3 (2), 159–173. https://doi.org/10.1007/s10346-006-0036-1 (2006).
Haque, U. et al. The human cost of global warming: deadly landslides and their triggers (1995–2014) [J]. Sci. Total Environ. 682, 673–684. https://doi.org/10.1016/j.scitotenv.2019.03.415 (2019).
Fell, R. et al. Guidelines for landslide susceptibility, hazard and risk zoning for land use planning [J]. Eng. Geol. 102 (3), 85–98. https://doi.org/10.1016/j.enggeo.2008.03.022 (2008).
Chang, Z. et al. Landslide susceptibility prediction using slope unit-based machine learning models considering the heterogeneity of conditioning factors [J]. J. Rock. Mech. Geotech. Eng. 15 (5), 1127–1143. https://doi.org/10.1016/j.jrmge.2022.07.009 (2023).
Youssef, K. et al. Landslide susceptibility modeling by interpretable neural network [J]. Commun. Earth Environ. 4, 162. https://doi.org/10.1038/s43247-023-00806-5 (2023).
Tehrani, F. S. et al. Machine learning and landslide studies: recent advances and applications [J]. Nat. Hazard. 114 (2), 1197–1245. https://doi.org/10.1007/s11069-022-05423-7 (2022).
Ishwaran, H. & Malley, J. D. Synthetic learning machines [J]. Biodata Min. 7 (1), 1–12. https://doi.org/10.1186/s13040-014-0028-y (2014).
Qi, W., Xu, C. & Xu, X. AutoGluon: A revolutionary framework for landslide hazard analysis [J]. Nat. Hazard. Res. 1 (3), 103–108. https://doi.org/10.1016/j.nhres.2021.07.002 (2021).
Thi Ngo, P. T. et al. Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran [J]. Geosci. Front. 12 (2), 505–519. https://doi.org/10.1016/j.gsf.2020.06.013 (2021).
Reichenbach, P. et al. A review of statistically-based landslide susceptibility models [J]. Earth-Sci. Rev. 180, 60–91. https://doi.org/10.1016/j.earscirev.2018.03.001 (2018).
Bammou, Y. et al. Improving landslide susceptibility mapping in semi-arid regions using machine learning and geospatial techniques [J]. Dysona Appl. Sci. 6 (2), 269–290. https://doi.org/10.30493/das.2025.484839 (2025).
He, K., Shen, X. & Anagnostou, E. Global burn severity in forest ecoregions: trends, climate drivers, and predictive insights [J]. Npj Nat. Hazards. 2 (1), 61. https://doi.org/10.1038/s44304-025-00113-3 (2025).
Pham, B. T. et al. A comparative study of different machine learning methods for landslide susceptibility assessment: A case study of Uttarakhand area (India) [J]. Environ. Modell Softw. 84, 240–250. https://doi.org/10.1016/j.envsoft.2016.07.005 (2016).
Shi, W. & Lu, P. Intelligent perception of coseismic landslide migration areas along Sichuan–Tibet railway [J]. IEEE J. Sel. Top. Appl. Earth Obs Remote Sens. 14, 8876–8883. https://doi.org/10.1109/JSTARS.2021.3105671 (2021).
Marchesini, I. et al. National-scale assessment of railways exposure to rapid flow-like landslides [J]. Eng. Geol. 332, 107474. https://doi.org/10.1016/j.enggeo.2024.107474 (2024).
Panchal, S. & Shrivastava, A. Landslide hazard assessment using analytic hierarchy process (AHP): A case study of National highway 5 in India [J]. Ain Shams Eng. J. 13. https://doi.org/10.1016/j.asej.2021.10.021 (2021).
Pu, H. et al. Railway alignment optimization in mountainous regions considering spatial geological hazards: A sustainable safety perspective [J]. Sustainability 13 (4), 1661. https://doi.org/10.3390/su13041661 (2021).
Xue, Y. et al. Slope failure mechanism of the 5·1 Meida highway collapse in Guangdong, China: interaction between multi-source water and weathered granite soil [J]. Landslides 22 (4), 1199–1212. https://doi.org/10.1007/s10346-025-02455-5 (2025).
Chen, X. et al. The catastrophic 13 November 2015 rock-debris slide in Lidong, south-western Zhejiang (China): a landslide triggered by a combination of antecedent rainfall and triggering rainfall [J]. Geomatics Nat. Hazards Risk. 9 (1), 608–623. https://doi.org/10.1080/19475705.2018.1455750 (2018).
Wang, F. et al. The fault-controlled Chengtian landslide triggered by rainfall on 20 May 2021 in Songyang County, Zhejiang Province, China [J]. Landslides 19 (7), 1751–1765. https://doi.org/10.1007/s10346-022-01891-x (2022).
Feng Wenkai, Z. J. et al. Characteristics and drivers of clustered landslides induced by the extreme rainstorm on June 16 in the Fujian-Guangdong-Jiangxi junction area [J]. Earth Sci. https://doi.org/10.3799/dqkx.2025.087 (2025).
Zhou, X. et al. Geoinformation-based landslide susceptibility mapping in subtropical area [J]. Sci. Rep. 11 (1), 24325. https://doi.org/10.1038/s41598-021-03743-5 (2021).
Huang, F. et al. Landslide susceptibility prediction based on a semi-supervised multiple-layer perceptron model [J]. Landslides 17 (12), 2919–2930. https://doi.org/10.1007/s10346-020-01473-9 (2020).
Zhou, S. et al. Exploring bayesian network model with noise filtering for rainfall-induced landslide susceptibility assessment in Fujian, China [J]. Front. Earth Sci. 12. https://doi.org/10.3389/feart.2024.1444882 (2024).
Lin, J. et al. Risk assessment and its influencing factors analysis of geological hazards in typical mountain environment [J]. J. Clean. Prod. 309, 127077. https://doi.org/10.1016/j.jclepro.2021.127077 (2021).
Zhou, X. et al. Hydrogeochemical Evolution, isotopic Insights, and genetic models of geothermal water in Anhui Province, China [J]. Water. 17 (2), 236. https://doi.org/10.3390/w17020236 (2025).
Ou, P. et al. Assessment of landslide hazard in Jiangxi using Geo-information [J]. Front. Earth Sci. 9, 648342. https://doi.org/10.3389/feart.2021.648342 (2021).
QGIS Geographic Information System 3.10. (2019). https://www.qgis.org
Wang, W. et al. Identification and distribution of 13003 landslides in the Northwest margin of Qinghai-Tibet plateau based on human-computer interaction remote sensing interpretation [J]. China Geol. 7 (2), 171–187. https://doi.org/10.31035/cg2023140 (2024).
Gao, H. et al. Landslides triggered by the July 2023 extreme rainstorm in the Haihe river Basin, China [J]. Landslides 21 (11), 2885–2890. https://doi.org/10.1007/s10346-024-02322-9 (2024).
Ma, H. & Wang, F. Factors controlling the formation and movement of clustered shallow landslides triggered by the extreme rainstorm in July 2023 in Beijing, China [J]. Geomorphology 478, 109728. https://doi.org/10.1016/j.geomorph.2025.109728 (2025).
Ersayin, K. & Uzun, A. A comprehensive analysis of landslide susceptibility in Iyidere basin (NE, Turkey) using machine learning techniques and statistical bivariate methods [J]. Nat. Hazard. https://doi.org/10.1007/s11069-025-07354-5 (2025).
Dahal, A. et al. Quantifying the influence of topographic amplification on the landslides triggered by the 2015 Gorkha earthquake [J]. Commun. Earth Environ. 5 (1), 678. https://doi.org/10.1038/s43247-024-01822-9 (2024).
Nguyen, H-H-D. et al. A hybrid approach combining physics-based model with extreme value analysis for temporal probability of rainfall-triggered landslide [J]. Landslides 22 (1), 149–168. https://doi.org/10.1007/s10346-024-02366-x (2025).
Kumari, K. et al. Spatial footprints of moisture-driven landslides in Western Himalayas from 2007 to 2022 [J]. Nat. Hazard. 121 (6), 7325–7345. https://doi.org/10.1007/s11069-024-07086-y (2025).
Achu, A. L. et al. Redefining landslide susceptibility under extreme rainfall events using deep learning [J]. Geomorphology 448, 109033. https://doi.org/10.1016/j.geomorph.2023.109033 (2024).
Yacine et al. How do machine learning techniques help in increasing accuracy of landslide susceptibility maps? [J]. Geosci. Front. 11 (3), 13. CNKI:SUN:GSFT.0.2020-03-012 (2020).
Jarvis, A. et al. Hole-filled SRTM for the globe Version 4 [J]. (2008).
Xu, X. et al. Seismotectonic map in China and its adjacent regions [Z]. Beijing: Seismological Press. activefault.china.250.2016.db (2016).
Wu, X. et al. China active faults database and its web system [J]. Earth Syst. Sci. Data Discuss. https://doi.org/10.5194/essd-2023-119 (2023).
Fick, S. E. & Hijmans, R. J. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas [J]. Int. J. Climatol. 37 (12), 4302–4315. https://doi.org/10.1002/joc.5086 (2017).
Tateishi, R. et al. Production of global land cover data - GLCNMO [J]. Int. J. Digit. Earth 4 (1). https://doi.org/10.1080/17538941003777521 (2011).
Tianditu http://www.tianditu.gov.cn/
Fawcett, T. An introduction to ROC analysis [J]. Pattern Recognit. Lett. 27 (8), 861–874. https://doi.org/10.1016/j.patrec.2005.10.010 (2006).
Bradley, A. P. The use of the area under the ROC curve in the evaluation of machine learning algorithms [J]. Pattern Recognit. 30 (7), 1145–1159. https://doi.org/10.1016/S0031-3203(96)00142-2 (1997).
Degroot, M. H. & Fienberg, S. E. The comparison and evaluation of forecasters [J]. J. R Stat. Soc. D. 32 (1), 12–22. https://doi.org/10.2307/2987588 (1983).
Guo, C. et al. On Calibration of Modern Neural Networks. Proceedings of the 34th International Conference on Machine Learning (ICML 2017). 1321–1330.
Breiman, L. Random forests [J]. Mach. Learn. 45 (1), 5–32. https://doi.org/10.1023/A:1010933404324 (2001).
Fisher, A., Rudin, C. & Dominici, F. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously [J]. J. Mach. Learn. Res. 20. https://doi.org/10.48550/arXiv.1801.01489 (2019).
Bui, D. T. et al. Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree [J]. Landslides 13 (2), 361–378. https://doi.org/10.1007/s10346-015-0557-6 (2016).
Ma, S., Shao, X. & Xu, C. Landslides triggered by the 2016 heavy rainfall event in Sanming, Fujian province: distribution pattern analysis and spatio-temporal susceptibility assessment [J]. Remote Sens. 15 (11), 2738. https://doi.org/10.3390/rs15112738 (2023).
Luo, X., Li, C. & Zhou, J. GIS-based prediction method of shallow landslides induced by heavy rainfall in large mountainous areas [J]. J. Mountain Sci. 21 (5), 1534–1548. https://doi.org/10.1007/s11629-023-8535-2 (2024).
Huang, F. et al. Uncertainties of landslide susceptibility prediction considering different landslide types [J]. J. Rock. Mech. Geotech. Eng. 15 (11), 2954–2972. https://doi.org/10.1016/j.jrmge.2023.03.001 (2023).
Huang, F. et al. Comparisons of heuristic, general statistical and machine learning models for landslide susceptibility prediction and mapping [J]. CATENA 191, 104580. https://doi.org/10.1016/j.catena.2020.104580 (2020).
Takoutsing, B. & Heuvelink, G. B. M. Comparing the prediction performance, uncertainty quantification and extrapolation potential of regression kriging and random forest while accounting for soil measurement errors [J]. Geoderma 428, 116192. https://doi.org/10.1016/j.geoderma.2022.116192 (2022).
Kuhn, S. et al. Identification of intrusive lithologies in volcanic terrains in British Columbia by machine learning using random forests: the value of using a soft classifier [J]. Geophysics 85 (6), B249–B258. https://doi.org/10.1190/geo2019-0461.1 (2020).
Das, R. et al. Machine learning-based assessment of regional-scale variation of landslide susceptibility in central Vietnam [J]. PLOS ONE. 19 (10), e0308494. https://doi.org/10.1371/journal.pone.0308494 (2024).
Abbasi, M. et al. A hybrid of random forest and deep Auto-Encoder with support vector regression methods for accuracy improvement and uncertainty reduction of long-term streamflow prediction [J]. J. Hydrol. 597, 125717. https://doi.org/10.1016/j.jhydrol.2020.125717 (2021).
Lin, Y-C. et al. Strong wind is one of the important factors that trigger landslides [J]. Npj Nat. Hazards. 2 (1), 12. https://doi.org/10.1038/s44304-025-00062-x (2025).

Acknowledgements

This work was supported by the National Key Research and Development Program of China (Grant Nos. 2024YFC3012603 and 2024YFC3012604), the National Natural Science Foundation of China (Grant No. 42407257), the Chongqing Water Resources Bureau, China (Project No. CQS24C00836) and the Key Project of China Railway Design Corporation (Project No. 2023A0226409). Special thanks go to Dr. Wentao Yang for his help with the model uncertainty assessment during the revision stage, and to the editors and anonymous reviewers for their invaluable assistance in refining this work.

Author information

Authors and Affiliations

National Institute of Natural Hazards, Ministry of Emergency Management of China, Beijing, 100085, China
Jiarong Liang, Wenwen Qi, Chong Xu, Jingjing Sun, Xuewei Zhang & Zhiwen Xue
Key Laboratory of Compound and Chained Natural Hazards Dynamics, Ministry of Emergency Management of China, Beijing, 100085, China
Jiarong Liang, Wenwen Qi, Jingjing Sun & Xuewei Zhang
School of Geology and Mining Engineering, Xinjiang University, Urumqi, 830047, China
Chong Xu
School of Emergency Management Science and Engineering, University of Chinese Academy of Sciences, Beijing, 100049, China
Zhiwen Xue
Beijing Engineering Corporation Limited, Beijing, 100024, China
Peng Wang
School of Engineering and Technology, China University of Geosciences (Beijing), Beijing, 100083, China
Jingyu Chen
School of Civil Engineering and Architecture, Anhui University of Science and Technology, Anhui, 232001, China
Yulong Cui
China MCC5 Group Co., Ltd, Chengdu, 610063, China
Junwei Pan
China Railway Xi’an Group Company Limited, Xi’an, Shaanxi Province, 710048, China
Juanling Wang
China Railway Design Corporation, Tianjin, 300143, China
Qihao Sun
Beijing Aidi Geological Engineering Technology Co., Ltd, Beijing, 100144, China
Jingyu Chen

Contributions

J. L. and W. Q.: Methodology, Investigation, Writing – original draft. J. L., W. Q., C. X. and Z. X.: Writing – review and editing, Validation. W. Q. and C. X.: Project administration. P. W., J. S., X. Z., J. C., Y. C., J. P., J. W. and Q. S.: Data acquisition, Investigation.

Corresponding author

Correspondence to Wenwen Qi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This study did not involve any human participants, animal subjects, or sensitive data requiring ethical approval. No experiments, interventions, or interactions were conducted that necessitate formal review or approval by an ethics committee.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Liang, J., Qi, W., Xu, C. et al. Evaluation of geological hazards susceptibility along a key railway based on machine learning. Sci Rep 15, 42497 (2025). https://doi.org/10.1038/s41598-025-26496-x

Received: 22 April 2025
Accepted: 29 October 2025
Published: 27 November 2025
Version of record: 27 November 2025
DOI: https://doi.org/10.1038/s41598-025-26496-x