Introduction

Landslides are among the most destructive natural hazards, causing severe impacts on human life, infrastructure, and ecosystems1. They involve the gravity-driven downslope movement of soil, rock, or debris and commonly occur in steep terrains under specific geological or climatic conditions such as intense rainfall, snowmelt, or seismic activity2. Human activities, including mining, construction, and water management modifications, can further increase landslide occurrence3. Globally, landslides result in substantial economic losses each year and pose serious risks to communities, particularly in regions characterized by complex geological and climatic settings4. In recent decades, their increasing frequency and intensity have been widely associated with climate change, rapid urbanization, deforestation, and unregulated land-use practices5. These factors collectively enhance slope instability, highlighting the urgent need for reliable and accurate methods for landslide risk assessment and prediction6.

Effective landslide susceptibility assessment requires a clear understanding of the factors controlling slope instability to support early warning systems, land-use planning, and disaster preparedness7,8,9. Traditional approaches, such as field surveys and empirical models, rely on site-specific observations and simplified assumptions, often failing to capture the complex and nonlinear interactions governing landslide processes10,11. Advances in remote sensing have enhanced susceptibility analysis by providing high-resolution, multi-temporal spatial data and enabling integration of diverse environmental variables12,13,14. Satellite imagery (e.g., Sentinel-2 and Landsat) and DEM-derived indicators, including slope, aspect, and elevation, offer essential terrain information15, while GIS integration improves spatial analysis of landslide-prone areas16. However, the multidimensional and dynamic nature of landslide processes requires analytical frameworks capable of handling large and complex datasets17. Traditional statistical models can identify relationships among conditioning factors but often struggle to represent nonlinear dependencies and interactions13. Consequently, machine learning and deep learning approaches have gained prominence for their ability to extract complex spatial and temporal patterns and improve susceptibility prediction accuracy16,17.

Modern susceptibility studies increasingly employ machine learning algorithms, including Support Vector Machines (SVMs), Decision Trees (DTs), Random Forests (RFs), and Artificial Neural Networks (ANNs), which have demonstrated strong performance in integrating topographic, geological, hydrological, and meteorological variables to delineate high-risk zones14,15. However, many of these models primarily emphasize either spatial or temporal information, limiting their ability to represent dynamic triggering mechanisms such as rainfall sequences or reservoir level fluctuations. To address this limitation, recent research has increasingly shifted toward deep learning techniques that explicitly model spatiotemporal dependencies inherent in landslide processes18.

Deep learning, a subset of machine learning, has become a key tool in natural hazard research due to its ability to automatically extract high-level features from large and complex datasets. In landslide susceptibility studies, Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks are widely used architectures that address different data characteristics19. CNNs are particularly effective for spatial data such as satellite imagery and DEMs, as they capture hierarchical spatial patterns through convolutional operations18 and can identify critical indicators such as slope features, vegetation variability, and land-use patterns19. In contrast, LSTM networks are designed for sequential data20 and can learn long-term dependencies in time series, making them suitable for modeling temporal triggers such as rainfall variability and reservoir-level fluctuations12. To exploit their complementary strengths, recent studies increasingly employ hybrid CNN–LSTM frameworks that integrate spatial and temporal information within a unified structure14, enabling improved predictive performance and a more comprehensive representation of landslide processes16.

Despite the global advancement of deep learning in hazard research, hybrid spatiotemporal applications in Iran remain limited. Most existing landslide susceptibility studies in Iran rely on traditional machine learning models that separately consider spatial predictors or temporal triggers. To address this gap, this study develops a hybrid CNN–LSTM framework that integrates spatial features derived from multi-source remote sensing data with temporal environmental variables. The CNN component extracts terrain- and land-cover–related features, while the LSTM component models hydrological and seasonal temporal dynamics. This end-to-end architecture reduces manual feature engineering and enables joint spatiotemporal learning. The main objectives are to construct a comprehensive geospatial–temporal database, design and train the CNN–LSTM model, and evaluate its performance using classification metrics. The resulting susceptibility maps support disaster risk management and land-use planning. Although hybrid deep learning models have been applied elsewhere, their use for landslide susceptibility mapping in Kerman remains limited. The contribution of this study lies in its region-specific model configuration, systematic data integration, and quantitative performance improvement over conventional machine learning and standalone deep learning approaches in the Kerman region.

Background

The application of artificial intelligence and deep learning in landslide susceptibility mapping has expanded rapidly, with numerous studies demonstrating improved prediction accuracy and reliability19. In Iran, increasing attention has been given to machine learning–based susceptibility assessments across diverse geographical settings20. Table 1 summarizes recent studies conducted in Iran, highlighting study areas, data sources, modeling approaches, and key findings. The variety of applied models reflects the adaptability of machine learning techniques to Iran’s heterogeneous geological and environmental conditions and their ability to capture key conditioning factors21,22,23,24. By integrating topographic, meteorological, and land-use variables, these approaches enhance the representation of landslide-prone areas7 and support disaster risk management and land-use planning19.

Table 1 Summary of recent machine learning–based landslide susceptibility studies.

This study advances previous research by implementing a hybrid CNN–LSTM framework that explicitly integrates spatial and temporal information within a unified architecture. Most prior studies in Iran and elsewhere have relied on traditional machine learning models such as SVM, DT, RF, and Multilayer Perceptron (MLP), which primarily emphasize spatial predictors and often overlook temporal triggering dynamics13. Similarly, many GIS- and statistical-based approaches depend on historical datasets and underutilize high-resolution remote sensing data14. By combining multi-source spatial features with temporal environmental variables through deep learning, the proposed framework provides a more comprehensive susceptibility assessment17. Although hybrid deep learning models have been applied in other regions, their use in Iran remains limited. The contribution of this study lies in its spatiotemporal model design and its quantitative improvement over conventional machine learning and standalone deep learning approaches in the Kerman region12,13.

Studied region

Kerman province

Kerman Province, located in southeastern Iran, exhibits diverse geological, climatic, and morphological conditions. Its landscape consists of mountain ranges, valleys, and plains shaped by long-term tectonic and sedimentary processes36. Positioned along the margin of the Arabian Plate, the region contains extensive folded and faulted formations37 composed of sedimentary, igneous, and metamorphic units that host significant mineral resources36. The geographical location and geo-structural setting of the province are presented in Figs. 1 and 2. Major tectonic structures, including the Tali Shear Zone and active faults such as the Kerman and Bafgh faults, have significantly influenced regional morphology and seismic activity38. Climatically, the province is predominantly arid to semi-arid, characterized by hot summers, cold winters, and low annual precipitation (approximately 150–300 mm), mainly occurring in winter and spring. These conditions promote desert landscapes, seasonal rivers, and extensive salt flats. Topography ranges from high mountains exceeding 4,000 m to broad desert plains. The Zagros mountain range strongly influences regional relief, hydrology, and microclimate36. Fluvial processes, particularly along major rivers such as the Gavkhuni River, drive erosion and sediment transport, while widespread salt flats (“kavirs”) reflect intense evaporation37. Human activities, including valley-based agriculture and large-scale mining, have further altered the landscape, increasing pressures related to soil degradation, groundwater depletion, and environmental stability38.

Fig. 1

Location of Kerman Province in Iran and in a global context (the world and Iran maps were generated using Microsoft Excel’s map chart feature, and the elevation map was derived from DEM data).

Fig. 2

Geo-structural map of Kerman province40 (the map was generated using geological data and ArcGIS software version 10.4.1, Esri, https://www.esri.com/arcgis).

Selection of triggering factors

Selecting appropriate conditioning factors is essential for accurate landslide susceptibility assessment7. Rainfall, seismic activity, land use, slope characteristics, and soil or rock properties play key roles in landslide initiation12,17. This is particularly relevant in Kerman Province, where complex topography, variable lithology, and active tectonics favor both rainfall- and earthquake-induced landslides25. Integrating these factors supports land-use planning, early warning systems, and risk mitigation strategies21,31. In this study, multi-source spatial and temporal datasets were compiled to characterize the environmental controls on landslide occurrence. Sentinel-2 and Landsat imagery were used to derive vegetation indices and land-use classes, while a DEM provided topographic parameters such as elevation, slope, and aspect. Hydrological variables, including rainfall intensity, cumulative precipitation, and reservoir-level fluctuations, were incorporated to represent dynamic triggers. Additional layers describing distances to rivers, roads, and faults were extracted to account for hydrological, anthropogenic, and tectonic influences38. Historical landslide records were used for model training and validation. For the temporal component, rainfall and reservoir-level data were organized as daily time-series sequences. A fixed 30-day window preceding each landslide or non-landslide instance was used as LSTM input to capture short-term and cumulative hydrological effects. The final set of predictors included elevation (EV), slope angle (SA), slope aspect (SAP), Topographic Wetness Index (TWI), Stream Power Index (SPI), engineering rock group (ERG), distance to fault (DF), distance to rivers (DRV), distance to roads (DRD), land use, and average annual precipitation (AAP). Their spatial distribution is shown in Fig. 3. 
EV, SA, and SAP describe terrain morphology; TWI and SPI represent moisture accumulation and runoff energy; ERG reflects lithological resistance; DF, DRV, and DRD capture structural and anthropogenic influences; and land use and AAP represent surface conditions and long-term climatic forcing. Together, these variables provide a comprehensive basis for landslide susceptibility analysis in Kerman Province.
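As an illustration of the two hydrological indices, the sketch below uses their standard definitions, TWI = ln(a / tan β) and SPI = a · tan β, where a is the specific catchment area and β is the local slope angle. The text does not state which formulation or software tool was used to derive these layers, so the function names, the `eps` guard for flat cells, and the example values are assumptions made only for illustration.

```python
import math


def twi(specific_catchment_area: float, slope_deg: float, eps: float = 1e-6) -> float:
    """Topographic Wetness Index, ln(a / tan(beta)).

    `eps` (an assumption) prevents division by zero on flat cells.
    """
    tan_beta = max(math.tan(math.radians(slope_deg)), eps)
    return math.log(specific_catchment_area / tan_beta)


def spi(specific_catchment_area: float, slope_deg: float) -> float:
    """Stream Power Index, a * tan(beta)."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))


# Hypothetical cell: 500 m^2/m specific catchment area on a 20 degree slope.
example_twi = twi(500.0, 20.0)
example_spi = spi(500.0, 20.0)
```

For a fixed catchment area, TWI decreases and SPI increases as the slope steepens, which is why the two indices are usually read together as moisture accumulation versus runoff energy.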

Fig. 3

Landslide influencing factors: (a) EV, (b) SA, (c) SAP, (d) TWI, (e) SPI, (f) ERG, (g) DF, (h) DRV, (i) DRD, (j) land use, (k) AAP (the maps were generated using ArcGIS software version 10.4.1, Esri, https://www.esri.com/arcgis).

Historical landslide inventory and validation strategy

The landslide inventory used in this study comprised 719 verified landslide events, which were defined as positive samples (label = 1). These events were compiled from official geological reports, field investigations, and visual interpretation of high-resolution satellite imagery. Because consistent polygon boundaries or rasterized landslide masks were not available across the entire study area, each landslide was represented as a point corresponding to its mapped centroid or initiation location. This representation is commonly adopted in regional-scale landslide susceptibility studies to ensure spatial consistency and computational feasibility. To construct a balanced binary classification dataset, an equal number of non-landslide samples (719 points; label = 0) were randomly generated from areas with no recorded landslides. To reduce potential spatial dependence and sampling bias, non-landslide points were selected with sufficient spatial separation from documented landslide locations. The final dataset therefore consisted of 1,438 samples with an exact 1:1 class ratio. For each sample location, all spatial and temporal conditioning factors were extracted and organized into feature vectors used as model inputs. The dataset was partitioned into training, validation, and test subsets using stratified random sampling while preserving class balance within each subset. A 70%–15%–15% split was applied, resulting in 503 landslide and 503 non-landslide samples (1,006 total) in the training set, 108 landslide and 108 non-landslide samples (216 total) in the validation set, and 108 landslide and 108 non-landslide samples (216 total) in the independent test set. The test subset was strictly excluded from model training and hyperparameter tuning and was used solely for final performance evaluation to ensure unbiased assessment. All sample counts reported in the manuscript, tables, and figures have been cross-checked for internal consistency. 
Because class balance was maintained across all subsets, no additional imbalance-handling techniques (e.g., class weighting, oversampling, or focal loss) were applied. All quantitative performance metrics reported in Sect. 5 were computed exclusively using this independent test subset. No training or validation samples were included in the final evaluation statistics. This separation prevents data leakage between training and evaluation stages and ensures that reported metrics reflect out-of-sample predictive performance.
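The 70%–15%–15% stratified partition described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the random seed, the rounding rule, and the function name `stratified_split` are assumptions. With 719 samples per class, rounding 70% and 15% and assigning the remainder to the test subset reproduces the reported 503/108/108 counts per class.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that preserve class ratios.

    The seed and the rounding rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


# 719 landslide (1) and 719 non-landslide (0) samples, as in the inventory.
labels = [1] * 719 + [0] * 719
train_idx, val_idx, test_idx = stratified_split(labels)
```

Because the split is done per class, each subset keeps the exact 1:1 ratio, and the three index lists are disjoint by construction.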

The CNN–LSTM model produces a binary probability of landslide occurrence for each sample. For susceptibility mapping, these continuous probabilities were reclassified into categorical susceptibility levels (e.g., low, moderate, high, and very high) using consistent threshold criteria. As detailed information on landslide size or volume was not uniformly available, all events were assigned equal weight. This is consistent with the objective of susceptibility mapping, which focuses on the spatial likelihood of occurrence rather than event magnitude. The spatial positional accuracy of the landslide inventory is estimated to be within ± 30 m, corresponding to the spatial resolution of the DEM and satellite imagery used. To ensure spatial consistency, all input datasets were resampled to a common resolution and projected into a unified coordinate reference system. Landslide point locations were visually cross-validated against topographic features to confirm positional reliability. Model validation was performed using standard classification metrics (accuracy, precision, recall, F1-score, and AUC) computed on the independent test set, as well as spatial overlay analysis between predicted susceptibility classes and historical landslide locations.
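The reclassification of continuous probabilities into susceptibility levels can be sketched as follows. The text does not report the threshold values, so the equal-interval breaks (0.25, 0.50, 0.75) used here are purely hypothetical placeholders, as is the function name.

```python
from bisect import bisect_right

# Hypothetical equal-interval breaks; the thresholds actually used are not reported.
THRESHOLDS = (0.25, 0.50, 0.75)
CLASSES = ("low", "moderate", "high", "very high")


def susceptibility_class(probability: float) -> str:
    """Map a landslide probability in [0, 1] to a categorical level."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_right counts how many thresholds are <= probability.
    return CLASSES[bisect_right(THRESHOLDS, probability)]


example_levels = [susceptibility_class(p) for p in (0.10, 0.40, 0.60, 0.90)]
```

Other break schemes (for example natural breaks or quantiles) would only change the `THRESHOLDS` tuple, provided the same breaks are applied to every map that is compared.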

Methodology

The methodology of this study was designed to address the challenges of landslide susceptibility prediction by integrating remote sensing data with advanced deep learning techniques. Data collection involved the acquisition of high-resolution satellite imagery from multiple remote sensing missions, complemented by geospatial and environmental datasets. These datasets included key conditioning factors influencing landslide occurrence, such as topographic attributes, land-use and vegetation characteristics, and temporal variables (Fig. 3). In addition, a historical landslide inventory was incorporated to support model training, validation, and performance evaluation.

Data sources and preprocessing

After data compilation, several preprocessing steps were applied to ensure data quality, consistency, and compatibility across datasets. These steps included image georeferencing, feature extraction, and normalization of all conditioning factors. To standardize variables with different units and value ranges, the Min–Max normalization method was employed41. Each factor was rescaled to the range [0, 1] using the difference between its minimum and maximum values, ensuring that no single variable dominated the learning process due to scale effects42. This normalization was particularly important in Kerman Province, where landslide susceptibility is influenced by heterogeneous topographic conditions, variable precipitation patterns, seismic activity, and diverse land-use practices. Applying Min–Max normalization enabled balanced integration of environmental, geological, and anthropogenic factors, thereby improving model stability and predictive reliability. The core analytical framework of this study is a hybrid CNN–LSTM model designed to jointly capture spatial and temporal characteristics of landslide processes. The CNN component was employed to extract spatial features from remote sensing–derived inputs, including terrain morphology, vegetation patterns, and land-use characteristics19. In parallel, the LSTM component was used to model temporal dependencies by learning from rainfall and reservoir-level time-series, which represent key dynamic triggers of landslide initiation42. Model performance was evaluated using standard classification metrics, including accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC), and was compared against conventional machine learning models such as SVM, DT, RF, and MLP.

To ensure reproducibility, the CNN–LSTM architecture is summarized as follows. The CNN branch comprises three convolutional blocks with 32, 64, and 128 filters (3 × 3 kernel), each followed by ReLU activation and 2 × 2 max-pooling. Batch normalization is applied after each block to improve training stability and generalization. The extracted spatial feature maps are then flattened and forwarded to the fusion stage. The LSTM branch includes two stacked layers with 128 and 64 units, respectively. The first layer returns sequences to enable hierarchical temporal feature learning, while the second captures deeper temporal dependencies. Both layers use tanh activation and a recurrent dropout rate of 0.2 to reduce overfitting. Spatial and temporal features are concatenated and passed through two fully connected layers (128 and 64 units, ReLU activation), followed by a sigmoid output layer that generates the binary probability of landslide occurrence. This integrated architecture enables joint learning of spatial patterns and temporal dynamics for landslide susceptibility prediction.

Model architecture and hyperparameters

The overall workflow of the proposed landslide susceptibility analysis follows a systematic framework consisting of data collection, preprocessing, time-series preparation, model development, training, and evaluation (Fig. 4). This structured approach ensures consistency, reproducibility, and reliable model performance for the study area.

Fig. 4

CNN-LSTM model implementation and evaluation flowchart

Data collection

Multiple spatial and temporal datasets were collected to characterize landslide conditioning factors. Topographic variables (elevation, slope, and aspect) were derived from a 30 m resolution DEM. Land use and land cover information was extracted from classified satellite imagery with a spatial resolution of 10 m. Vegetation conditions were represented using the Normalized Difference Vegetation Index (NDVI), calculated from Sentinel-2 multispectral imagery (10 m resolution) using the red (Band 4) and near-infrared (Band 8) bands. Daily rainfall data (mm) were obtained from meteorological ground stations operated by the Iranian Meteorological Organization (IRIMO)39 and supplemented by satellite-based precipitation products where required. Reservoir water level data were collected from official records provided by regional water authorities, reported in meters (m) at a daily temporal resolution. Additional geospatial layers describing distances to faults, rivers, and roads were derived using GIS techniques. All spatial datasets were resampled to a common resolution of 30 m and projected to a unified coordinate reference system.
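For reference, NDVI is the normalized difference between near-infrared and red reflectance, NDVI = (NIR − Red) / (NIR + Red), which for Sentinel-2 corresponds to Band 8 and Band 4. The per-pixel sketch below assumes reflectance values are already available as plain numbers; returning 0.0 when both bands are zero is an illustrative choice for no-data pixels, not something stated in the text.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index for one pixel.

    `red` is Sentinel-2 Band 4 reflectance and `nir` is Band 8 reflectance.
    """
    denominator = nir + red
    if denominator == 0:
        return 0.0  # assumption: treat empty pixels as neutral
    return (nir - red) / denominator


# Hypothetical reflectances for a vegetated and a bare-soil pixel.
vegetated = ndvi(red=0.05, nir=0.45)
bare_soil = ndvi(red=0.25, nir=0.30)
```

Values close to 1 indicate dense vegetation, while values near 0 indicate bare or sparsely vegetated surfaces.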

Data preprocessing

Preprocessing steps included data cleaning, removal of duplicate records, and interpolation of limited missing values using statistical methods. Feature extraction was performed to derive slope, NDVI, and distance-based variables. Continuous variables were normalized using Min–Max normalization, while categorical variables were converted to numerical form using one-hot encoding. These steps ensured numerical compatibility across inputs and prevented scale-related bias during training.
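The two encoding steps named above can be sketched as follows. Min–Max normalization rescales a variable as x' = (x − x_min) / (x_max − x_min), and one-hot encoding turns each category into a binary indicator vector. The function names and the handling of constant columns are assumptions for illustration.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # assumption: constant columns map to 0
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Return (vocabulary, rows) with one binary indicator column per category."""
    vocabulary = sorted(set(categories))
    position = {c: i for i, c in enumerate(vocabulary)}
    rows = [
        [1 if position[c] == j else 0 for j in range(len(vocabulary))]
        for c in categories
    ]
    return vocabulary, rows


# Hypothetical slope angles (degrees) and land-use classes.
scaled_slope = min_max_scale([10.0, 20.0, 30.0])
vocab, encoded = one_hot(["forest", "urban", "forest"])
```

In practice the minimum and maximum would typically be taken from the training subset only and then reused for the validation and test subsets, so that no information from held-out samples enters the scaling.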

Time-series preparation

Temporal variables, including rainfall and reservoir levels, were organized into fixed-length time-series sequences and provided as inputs to the LSTM component. A temporal window of 30 consecutive days preceding each sample was used to capture short-term and cumulative hydrological effects relevant to landslide initiation. Missing temporal observations were minimal and were addressed using linear interpolation to preserve continuity.
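A minimal sketch of the time-series preparation is given below: gaps are filled by linear interpolation and the 30 days preceding a sample date are extracted as a fixed-length sequence. Representing missing values as `None`, filling leading or trailing gaps with the nearest observation, and the helper names are assumptions made for this example.

```python
def interpolate_gaps(series):
    """Fill None entries by linear interpolation between known neighbours."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observations")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:          # leading gap: copy nearest value (assumption)
            filled[i] = series[right]
        elif right is None:       # trailing gap: copy nearest value (assumption)
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def antecedent_window(series, event_index, window=30):
    """Return the `window` values immediately preceding `event_index`."""
    start = event_index - window
    if start < 0:
        raise ValueError("not enough history before the event")
    return series[start:event_index]


# Hypothetical daily rainfall (mm) and reservoir level (m) over 40 days.
rain = interpolate_gaps([0.0, None, 4.0] + [1.0] * 37)
level = [100.0 + 0.1 * d for d in range(40)]
# One LSTM input of shape (30 time steps, 2 temporal variables).
sequence = list(zip(antecedent_window(rain, 35), antecedent_window(level, 35)))
```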

Model architecture

The proposed hybrid CNN–LSTM model integrates spatial and temporal information at the sample level. Spatial conditioning factors were provided to the CNN as structured multi-channel feature tensors derived from a multi-band raster stack. The input does not consist of raw image patches or texture-based imagery. Instead, for each labeled sample location, a tensor of shape (C × 1) was constructed, where C represents the number of normalized spatial variables (e.g., slope, elevation, TWI, SPI, land use). In this configuration, convolutional filters operate across stacked geospatial predictor layers rather than natural image textures, enabling the CNN to learn cross-variable spatial interactions while remaining compatible with sample-based classification. Temporal inputs (rainfall and reservoir-level fluctuations) were supplied to the LSTM as localized time-series sequences with tensor shape (N × T), where N = 30 represents the number of preceding time steps and T denotes the number of temporal variables. For each spatial sample, daily rainfall and reservoir-level values for the 30 days prior to the landslide or non-landslide instance were extracted and organized into a fixed-length sequence. These temporal sequences were not spatially gridded over the entire raster domain; instead, they were directly linked to each labeled sample, ensuring that temporal dynamics correspond to the same spatial location processed by the CNN branch. Spatial features extracted by the CNN were flattened into a one-dimensional embedding vector, while the final hidden representation of the second LSTM layer served as the temporal embedding. These two representations were concatenated through a late-fusion layer and passed to fully connected layers for binary classification (landslide vs. non-landslide). The CNN branch comprises three convolutional blocks with 32, 64, and 128 filters (kernel size 3 × 3), each followed by ReLU activation, 2 × 2 max-pooling, and batch normalization. 
The LSTM branch consists of two stacked layers with 128 and 64 units, respectively, using tanh activation and a recurrent dropout rate of 0.2. The fused feature vector is processed by two dense layers (128 and 64 units, ReLU activation) and a final sigmoid output layer that produces the probability of landslide occurrence.
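To make the tensor dimensions concrete, the sketch below traces shapes through the two branches up to the fusion layer. It assumes a patch-based H × W × C spatial input, 'same'-padded 3 × 3 convolutions (so only the 2 × 2 pooling reduces height and width), and a hypothetical 32 × 32 patch; none of these three choices is reported in the text. The filter counts and the 64-unit final LSTM layer are taken from the description above.

```python
def cnn_block_shapes(height, width, filters=(32, 64, 128), pool=2):
    """Output shape after each conv block, assuming 'same' padding.

    With 'same' padding the convolution keeps H and W, so only the
    pooling step divides them by `pool`.
    """
    shapes = []
    for n_filters in filters:
        height, width = height // pool, width // pool
        shapes.append((height, width, n_filters))
    return shapes


PATCH = 32          # hypothetical patch size (not reported)
LSTM_UNITS = 64     # units of the second LSTM layer (from the text)

block_shapes = cnn_block_shapes(PATCH, PATCH)
h, w, c = block_shapes[-1]
spatial_dim = h * w * c                  # length of the flattened CNN embedding
fused_dim = spatial_dim + LSTM_UNITS     # input size of the first dense layer
```

Under these assumptions the patch shrinks from 32 to 16, 8, and 4 cells per side, giving a 2,048-element spatial embedding and a 2,112-element fused vector. Three successive 2 × 2 pooling steps also imply that a patch needs at least 8 cells per side to retain a non-empty feature map.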

Model implementation and training

The CNN–LSTM model was implemented using custom Python code (Python 3.10) with the TensorFlow (v2.x) and Keras libraries. Geospatial preprocessing was performed using NumPy, Pandas, GDAL, and ArcGIS 10.4.1. Model training employed the Adam optimizer with a learning rate of 0.001, a batch size of 32, and 50 epochs. Dropout and L2 regularization were applied to reduce overfitting. Given the training dataset size (1,006 samples) and the batch size (32), each epoch comprises 31 full mini-batches plus one partial batch of 14 samples, so 50 epochs correspond to approximately 1,550 to 1,600 gradient update steps in total, depending on whether the partial batch is processed. Training and validation losses and accuracies were monitored simultaneously to assess convergence and generalization.
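The number of gradient updates follows directly from the dataset size, batch size, and epoch count. The short calculation below shows both conventions: keeping the final partial batch of each epoch, or dropping it. The function name and the `drop_remainder` flag are illustrative.

```python
import math


def update_steps(n_samples: int, batch_size: int, epochs: int,
                 drop_remainder: bool = False) -> int:
    """Total number of mini-batch gradient updates over all epochs."""
    if drop_remainder:
        per_epoch = n_samples // batch_size
    else:
        per_epoch = math.ceil(n_samples / batch_size)
    return per_epoch * epochs


steps_with_partial = update_steps(1006, 32, 50)                          # 32 batches per epoch
steps_full_only = update_steps(1006, 32, 50, drop_remainder=True)        # 31 batches per epoch
last_batch_size = 1006 - (1006 // 32) * 32                               # size of the partial batch
```

With 1,006 training samples this gives 1,600 updates when the 14-sample partial batch is processed and 1,550 when it is dropped.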

Model workflow

In the proposed hybrid framework, CNNs are used to extract spatial features from remote sensing data, while LSTM networks model temporal dependencies in environmental time-series data. For the CNN component, the convolution operation applied to an input feature map X with a kernel W and bias b is defined as:

$$Z_{i,j}^{k}=\sum\limits_{m,n} X_{i+m,\,j+n}\, W_{m,n}^{k}+b^{k}$$
(1)

where \(Z_{{i,j}}^{k}\) represents the output feature map of the k-th filter. A nonlinear activation function (ReLU) is then applied:

$$A_{i,j}^{k}=\max \left( 0,\,Z_{i,j}^{k} \right)$$
(2)
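Equations (1) and (2) can be traced on a small example. The sketch below implements a single-filter, single-channel convolution followed by ReLU in plain Python. It assumes 'valid' borders (no padding) and a toy 3 × 3 input with a 2 × 2 kernel, chosen only to keep the arithmetic easy to follow; the model itself uses 3 × 3 kernels.

```python
def conv2d_valid(x, w, b=0.0):
    """Z[i][j] = sum over m, n of X[i+m][j+n] * W[m][n] + b (no padding)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + m][j + n] * w[m][n] for m in range(kh) for n in range(kw)) + b
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(z):
    """A[i][j] = max(0, Z[i][j])."""
    return [[max(0.0, v) for v in row] for row in z]


# Toy input and a diagonal-difference kernel (both hypothetical).
x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, -1]]

z = conv2d_valid(x, w)            # every entry is x[i][j] - x[i+1][j+1] = -4
a = relu(z)                       # negative responses are clipped to 0
a_biased = relu(conv2d_valid(x, w, b=5.0))
```

The bias shifts the pre-activation from −4 to 1, so the same filter response passes through ReLU only in the biased case.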

Max-pooling is subsequently used to reduce spatial dimensionality and retain dominant features. The temporal dependencies are modeled using LSTM units. At each time step t the LSTM cell operations are defined as:

$$f_t = \sigma \left( W_f \cdot \left[ x_t,\, h_{t-1} \right] + b_f \right)$$
(3)
$$i_t = \sigma \left( W_i \cdot \left[ x_t,\, h_{t-1} \right] + b_i \right)$$
(4)
$$\tilde{C}_t = \tanh \left( W_c \cdot \left[ x_t,\, h_{t-1} \right] + b_c \right)$$
(5)
$$o_t = \sigma \left( W_o \cdot \left[ x_t,\, h_{t-1} \right] + b_o \right)$$
(6)
$$C_t = f_t \otimes C_{t-1} + i_t \otimes \tilde{C}_t$$
(7)
$$h_t = o_t \otimes \tanh \left( C_t \right)$$
(8)

where \(f_t\), \(i_t\), and \(o_t\) denote the forget, input, and output gates, respectively; \(\tilde{C}_t\) is the candidate cell state; \(C_t\) is the cell state; \(h_t\) is the hidden state; W and b are the weights and biases of each gate; \(\sigma\) is the sigmoid activation function; and \(\otimes\) denotes element-wise multiplication. The spatial features extracted by the CNN and the temporal features learned by the LSTM are concatenated and passed through fully connected layers to generate the final landslide susceptibility prediction. For the temporal component of the model, rainfall and reservoir level data were structured as fixed-length sequences and used as inputs to the LSTM layers. Specifically, a temporal window of N = 30 consecutive time steps preceding each landslide (or non-landslide) instance was employed. This window length was selected to capture short- to medium-term hydrological conditions influencing slope stability, while maintaining computational efficiency.
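The gate updates in Eqs. (3) to (8) can be followed step by step with a small plain-Python LSTM cell. This is a didactic sketch, not the TensorFlow implementation used in the study: the parameter layout (a dictionary keyed by gate name) and the all-zero example weights are assumptions chosen so that the result can be checked by hand.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """Compute W . v + b for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    `params` maps each gate name ("f", "i", "c", "o") to a (W, b) pair,
    where W has shape (units, len(x_t) + units).
    """
    z = list(x_t) + list(h_prev)                                  # [x_t, h_{t-1}]
    f_t = [sigmoid(a) for a in affine(*params["f"][:1], z, params["f"][1])]
    i_t = [sigmoid(a) for a in affine(params["i"][0], z, params["i"][1])]
    c_hat = [math.tanh(a) for a in affine(params["c"][0], z, params["c"][1])]
    o_t = [sigmoid(a) for a in affine(params["o"][0], z, params["o"][1])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# One unit, two temporal inputs (e.g. rainfall and reservoir level), zero weights.
zero_params = {gate: ([[0.0, 0.0, 0.0]], [0.0]) for gate in "fico"}
h_t, c_t = lstm_step(x_t=[1.0, 2.0], h_prev=[0.0], c_prev=[1.0], params=zero_params)
```

With all weights at zero, every gate equals 0.5 and the candidate state is 0, so the cell state simply halves (from 1.0 to 0.5) and the hidden state becomes 0.5 · tanh(0.5).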

The proposed CNN–LSTM framework follows a structured and reproducible workflow that ensures consistency between data preparation, model training, and province-wide susceptibility mapping. First, multi-source spatial conditioning factors were harmonized by resampling all raster layers to a common spatial resolution and coordinate reference system, followed by Min–Max normalization. For each labeled landslide and non-landslide sample, a fixed-size local patch (H × W) was extracted from the stacked spatial predictor layers, resulting in an input tensor of shape (H × W × C), where C denotes the number of conditioning factors. This patch-based representation provides a valid two-dimensional spatial grid for the 3 × 3 convolution and 2 × 2 max-pooling operations implemented in the CNN branch. The CNN therefore learns neighborhood-scale spatial patterns and cross-factor interactions among conditioning variables, rather than texture features from raw imagery. Temporal variables, including daily rainfall and reservoir-level fluctuations, were organized as fixed-length sequences for each labeled sample. Specifically, for every spatial sample, a 30-day antecedent time window was extracted and aligned with the corresponding geographic location. These temporal sequences were directly associated with each sample point and served as input to the LSTM branch, which models short- to medium-term hydrological memory effects. The spatial embedding generated by the CNN and the temporal embedding derived from the final LSTM hidden state were concatenated through a late-fusion layer and subsequently passed through fully connected layers for binary classification (landslide vs. non-landslide).

For province-wide susceptibility mapping, the trained CNN–LSTM model was applied to all grid cells across the study area. Spatial patches (H × W × C) were extracted for each grid location from the stacked raster layers using the same patch size employed during training to maintain input consistency. Temporal variables were first spatially interpolated across the study domain based on station observations and official reservoir records, producing continuous daily hydrological surfaces. For each grid cell, a 30-day antecedent temporal sequence was assigned using the interpolated rainfall and reservoir values corresponding to that location. This ensured that spatial and temporal inputs used for full-domain inference were constructed in the same format and dimensional structure as those used during model training. The trained model generated continuous probability estimates of landslide occurrence for each grid cell. These probabilities were subsequently reclassified into categorical susceptibility levels to produce the final landslide susceptibility maps. Model performance was evaluated using the independent test dataset and validated through spatial overlay analysis with the historical landslide inventory. This workflow guarantees methodological coherence between preprocessing, feature construction, model training, and map generation, thereby improving transparency and reproducibility of the proposed spatiotemporal framework.

Results

Performance metrics

The performance of the proposed CNN–LSTM model was evaluated using standard classification metrics and compared with traditional machine learning and standalone deep learning models. Accuracy, precision, recall, F1-score, and area under the ROC curve (AUC) were computed using the independent test dataset described in Sect. 3.3. The quantitative results for all models are summarized in Table 2, while the comparative performance across metrics is illustrated in Fig. 5. According to Table 2, the CNN–LSTM model achieved an accuracy of 0.956, precision of 0.928, recall of 0.943, F1-score of 0.935, and an AUC of 0.980 on the test set. As shown in Fig. 5, these values are consistently higher than those obtained by SVM, DT, RF, MLP, as well as the CNN-only and LSTM-only architectures. Confusion matrix analysis further confirmed a high true positive rate and a low false positive rate, indicating stable discrimination between landslide and non-landslide samples.
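The threshold-based metrics listed above are all functions of the four confusion-matrix counts. The sketch below computes them from hypothetical counts (the study's confusion matrix is not reproduced here) and also shows that the F1-score is the harmonic mean of precision and recall, which recovers the reported 0.935 from the reported precision of 0.928 and recall of 0.943.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts for a balanced 200-sample test set.
example = classification_metrics(tp=90, fp=10, fn=10, tn=90)

# F1 implied by the precision and recall reported for the CNN-LSTM model.
reported_f1 = round(f1_from(0.928, 0.943), 3)
```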

Table 2 Hyperparameters and evaluation metrics of the CNN–LSTM model and benchmark classifiers
Fig. 5 Performance comparison of the proposed approach with other algorithms

To evaluate robustness, the hybrid CNN–LSTM was compared with its single-stream counterparts. The CNN-only model effectively captured spatial conditioning patterns but lacked temporal context, whereas the LSTM-only model represented temporal dynamics without fully accounting for spatial heterogeneity. Their integration within a unified framework resulted in improved overall performance, demonstrating the benefit of joint spatiotemporal feature learning. Model discrimination capability was further assessed using ROC analysis, with curves presented in Fig. 6. ROC curves were generated from probabilistic outputs on the independent test dataset. For traditional classifiers (SVM, RF, DT, and MLP), probability estimates were obtained using native prediction functions, while CNN-based architectures used sigmoid activation outputs. The CNN–LSTM model achieved the highest AUC (0.98), followed by Random Forest (0.92), MLP (0.90), SVM (0.88), and DT (0.86), as reported in Table 2. This represents an absolute AUC improvement of 6 percentage points over the best-performing traditional benchmark (RF), indicating the advantage of integrated spatial–temporal modeling. To reduce overfitting, dropout (rate = 0.2) and L2 regularization (λ = 0.01) were applied during training. Model selection was based on minimum validation loss, and the training curves (Fig. 7) demonstrate stable convergence without divergence between training and validation performance.
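The AUC values compared above can be read as the probability that a randomly chosen landslide sample receives a higher score than a randomly chosen non-landslide sample. The sketch below computes AUC with this pairwise (Mann–Whitney) formulation; it is an illustration of the metric, not the authors' evaluation code, and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, so it is suited to illustration rather than to large test sets; it gives the same value as the area under the empirical ROC curve.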

Fig. 6 ROC curves provided for this analysis

Fig. 7 Training and validation loss and accuracy curves of the CNN–LSTM model

Outputs and error analysis

Although the CNN–LSTM model incorporates temporal sequences of rainfall and reservoir-level data, its output is a static landslide susceptibility map rather than an explicit time-forward or pixel-wise forecast. Temporal information is used to characterize antecedent environmental conditions preceding each landslide or non-landslide instance, thereby improving the discrimination between stable and unstable locations. No future-time prediction or dynamic spatiotemporal simulation was performed. Accordingly, model validation was based on spatial agreement between predicted susceptibility classes and historical landslide occurrences, rather than on forward temporal ground-truth forecasting. Figure 7 illustrates the training and validation loss and accuracy curves of the CNN–LSTM model. A gradual decrease in loss and a corresponding increase in accuracy across epochs indicate stable convergence and effective learning without evidence of severe overfitting. Figure 8 presents the trends of MAE and RMSE during training and validation. Although the model addresses a classification problem, these error metrics were used to evaluate the stability of probability outputs during training. The consistent decline of MAE and RMSE suggests improved prediction stability and supports the robustness of the learning process. Together, these results indicate that the CNN–LSTM model converges reliably and generalizes well within the scope of static landslide susceptibility assessment.
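When MAE and RMSE are applied to a classifier in this way, the errors are taken between the predicted probability and the binary label (1 for landslide, 0 for non-landslide). The following minimal sketch shows that computation under this assumption; the function names are illustrative.

```python
import math


def mae(labels, probs):
    """Mean absolute difference between binary labels and predicted probabilities."""
    return sum(abs(y - p) for y, p in zip(labels, probs)) / len(labels)


def rmse(labels, probs):
    """Root mean squared difference between binary labels and predicted probabilities."""
    mse = sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels)
    return math.sqrt(mse)
```

In this setting the squared RMSE is the Brier score, so a declining RMSE indicates probabilities moving closer to the observed labels.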

Fig. 8 MAE and RMSE trends during training and validation

Landslide susceptibility maps were generated with each predictive model (Figs. 9, 10 and 11), classifying the study area into low, medium, and high susceptibility zones. These maps support land-use planning, infrastructure management, and disaster risk reduction. The CNN–LSTM map shows strong spatial agreement with historical landslide records, supporting the reliability of the proposed framework. Feature importance was evaluated using permutation-based sensitivity analysis, where each factor was independently perturbed and the resulting performance decrease was measured. Slope gradient was the most influential variable (28.4%), followed by cumulative rainfall (24.7%) and reservoir-level fluctuations (18.9%). Elevation contributed indirectly (12.3%), reflecting its association with steep terrain. Distance to fault and land use together accounted for approximately 15.7% of total importance. These findings emphasize the dominant role of slope and hydrological dynamics in regional landslide susceptibility. Validation of the final CNN–LSTM map was performed by overlaying susceptibility classes with the historical inventory. Approximately 86.4% of the 719 recorded landslides fell within the high and very high susceptibility zones, demonstrating strong spatial consistency and supporting the applicability of the model for regional-scale susceptibility assessment. The same proportion (approximately 86.4%) was obtained for the independent test-set landslide locations, indicating strong spatial agreement between predicted hazard zones and observed out-of-sample landslide occurrences.
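The permutation-based sensitivity analysis described above can be sketched as follows. This is a generic illustration under stated assumptions: `model` is any callable that returns a class label for one feature row, accuracy is used as the performance measure, and the conversion to percentages that sum to 100 is assumed from the way the importances are reported, not taken from the authors' code.

```python
import random


def accuracy(model, X, y):
    """Share of rows for which the model's label matches the reference label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = []
            for row, value in zip(X, column):
                new_row = list(row)  # copy so the input data is not modified
                new_row[j] = value
                X_perm.append(new_row)
            total += baseline - accuracy(model, X_perm, y)
        drops.append(total / n_repeats)
    return drops


def to_percent(drops):
    """Normalise non-negative drops so that they sum to 100 (assumed convention)."""
    clipped = [max(d, 0.0) for d in drops]
    total = sum(clipped)
    return [100.0 * d / total if total else 0.0 for d in clipped]
```

A feature the model does not rely on yields a drop of zero, while a feature that carries most of the signal yields the largest drop and therefore the largest share.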

Fig. 9 Landslide susceptibility map provided based on CNN–LSTM (the map was generated using ArcGIS software version 4.10.1, Esri, https://www.esri.com/arcgis).

Fig. 10 Landslide susceptibility maps provided based on deep learning: (a) CNN-only, (b) LSTM-only (the maps were generated using ArcGIS software version 4.10.1, Esri, https://www.esri.com/arcgis).

Fig. 11 Landslide susceptibility maps provided based on benchmark models: (a) MLP, (b) SVM, (c) DT, (d) RF (the maps were generated using ArcGIS software version 4.10.1, Esri, https://www.esri.com/arcgis).

Discussion

Interpretation of results

The CNN–LSTM model achieved strong predictive performance for landslide susceptibility assessment in Kerman Province, with an accuracy of 95.6%, precision of 92.8%, recall of 94.3%, F1-score of 93.5%, and an AUC of 0.98. These results indicate high discriminative capability between landslide and non-landslide locations. Compared with traditional machine learning models (SVM, DT, RF, and MLP) and single-stream deep learning architectures (CNN-only and LSTM-only), the hybrid model consistently performed better across all evaluation metrics. This improvement reflects its capacity to integrate spatial variability and temporal triggering mechanisms within a unified framework. The CNN component effectively captured terrain-related patterns such as slope configuration and land-cover variability, while the LSTM modeled temporal dependencies associated with rainfall accumulation and reservoir-level fluctuations. When used independently, each component provided only partial representation of landslide controls. Their integration reduced prediction errors and produced more stable susceptibility estimates, supported by lower MAE and RMSE values.

Importantly, the architecture was configured according to the environmental characteristics of Kerman Province rather than applied as a generic hybrid model. Landslides in this region result from interactions between spatial factors (e.g., steep terrain, lithology, fault proximity) and delayed hydrological triggers. Accordingly, the CNN depth was limited to three convolutional blocks to balance feature extraction and overfitting risk, and a 30-day temporal window was selected to capture short- to medium-term hydrological memory typical of arid and semi-arid conditions. The need for a hybrid approach arises from the interacting but partially independent nature of spatial and temporal controls. Spatial predictors alone cannot explain failures under anomalous hydrological conditions, while purely temporal models neglect terrain heterogeneity. The adopted late-fusion strategy, which concatenates spatial and temporal representations before classification, enabled complementary learning without unnecessary model complexity. Comparative experiments confirmed that deeper CNN structures or single-stream models did not achieve similar performance, indicating that the improvement stems from context-specific architectural design rather than generic model complexity.
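The late-fusion idea described above, in which spatial and temporal representations are concatenated before classification, can be illustrated with a deliberately simplified sketch. Here the two feature vectors stand in for the outputs of the CNN and LSTM branches, and a single sigmoid unit stands in for the classification head; the function name, weights, and vector sizes are all hypothetical and do not reproduce the authors' network.

```python
import math


def late_fusion_probability(spatial_features, temporal_features, weights, bias=0.0):
    """Concatenate two branch outputs and apply one sigmoid unit.

    spatial_features  -- stand-in for the CNN branch embedding
    temporal_features -- stand-in for the LSTM branch embedding
    weights, bias     -- parameters of a hypothetical single-unit classifier
    """
    fused = list(spatial_features) + list(temporal_features)  # late fusion
    if len(fused) != len(weights):
        raise ValueError("one weight is required per fused feature")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

The point of the design is that each branch learns its own representation independently, and only the fused vector is seen by the classifier.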

Limitations of the study

Despite its strong performance, several limitations should be noted. First, model accuracy depends on the quality and spatial resolution of remote sensing data, which may be affected by cloud cover, sensor noise, or DEM limitations. Second, the temporal component included only rainfall and reservoir-level fluctuations. The absence of additional time-series variables, such as soil moisture, ground deformation, or seismic activity, may limit representation of all triggering mechanisms. Third, the landslide inventory may be incomplete or biased toward accessible areas, leading to higher uncertainty in regions with sparse records. Fourth, the framework was evaluated only in Kerman Province, and its transferability to regions with different environmental conditions remains untested. In addition, statistical uncertainty analysis and repeated cross-validation were not performed; therefore, performance differences should be interpreted as indicative rather than statistically conclusive. Finally, random dataset splitting may place spatially autocorrelated samples in both the training and test subsets, potentially resulting in optimistic performance estimates. Future studies should apply spatially independent validation strategies, test the framework in multiple regions, and incorporate additional spatiotemporal variables to enhance robustness and generalizability.
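One common form of spatially independent validation is a block split, in which whole spatial blocks are held out so that neighbouring samples never straddle the train/test boundary. The sketch below is one possible implementation and was not part of this study; the block size, test fraction, and function name are assumptions, and coordinates are assumed to be in a projected system with consistent units.

```python
import random


def spatial_block_split(points, block_size, test_fraction=0.2, seed=0):
    """Split sample indices so that each spatial block is entirely train or test.

    points     -- list of (x, y) coordinates, one per sample
    block_size -- edge length of the square blocks, in coordinate units
    """
    blocks = {}
    for i, (x, y) in enumerate(points):
        key = (int(x // block_size), int(y // block_size))
        blocks.setdefault(key, []).append(i)
    if len(blocks) < 2:
        raise ValueError("need at least two blocks; reduce block_size")

    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    # Hold out at least one block, but always leave at least one for training.
    n_test = min(len(keys) - 1, max(1, round(test_fraction * len(keys))))
    test_keys = set(keys[:n_test])

    train, test = [], []
    for key, indices in blocks.items():
        (test if key in test_keys else train).extend(indices)
    return sorted(train), sorted(test)
```

The block size would normally be chosen to exceed the range of spatial autocorrelation in the conditioning factors, so that held-out blocks are approximately independent of the training data.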

Implications for hazard management

The CNN–LSTM outputs provide clear insight into regional susceptibility patterns. The resulting maps show that 24.7% of the area falls within high-susceptibility zones, mainly associated with steep slopes, high cumulative rainfall, and significant reservoir-level fluctuations. Medium-susceptibility areas cover 41.3%, while 34.0% of the region is classified as low susceptibility. These patterns support prioritization of mitigation measures, infrastructure planning, and emergency preparedness. Although temporal rainfall and reservoir sequences were incorporated, the model produces static susceptibility maps rather than future forecasts. Temporal inputs represent antecedent conditions, improving discrimination between stable and unstable areas. The strong spatial agreement with historical landslides indicates that the model effectively captures combined spatial and recent hydrological influences. Feature importance analysis emphasizes the dominant role of slope gradient, cumulative rainfall, and reservoir-level fluctuations, confirming the importance of both topographic and hydrological controls. The CNN–LSTM consistently outperformed traditional machine learning models (SVM, DT, RF, and MLP) and single-stream deep learning models (CNN-only and LSTM-only) across all evaluation metrics. These results demonstrate that integrating spatial remote sensing predictors with temporal environmental sequences enhances representation of complex triggering processes. Overall, the hybrid framework provides a more comprehensive and reliable susceptibility assessment, supporting evidence-based landslide risk management.

Conclusion

This study developed a data-driven framework for landslide susceptibility assessment using a hybrid Convolutional Neural Network–Long Short-Term Memory (CNN–LSTM) model applied to Kerman Province, Iran. By integrating spatial predictors derived from remote sensing data (e.g., slope, elevation, vegetation indices) with temporal variables (rainfall and reservoir-level fluctuations), the model captures the combined effects of terrain conditions and antecedent hydrological processes. On the independent test dataset, the CNN–LSTM achieved an accuracy of 95.6%, precision of 92.8%, recall of 94.3%, F1-score of 93.5%, and an AUC of 0.98. Compared with the best traditional benchmark (Random Forest, AUC = 0.92), the hybrid model improved AUC by 6 percentage points, demonstrating the advantage of integrated spatiotemporal modeling. The resulting susceptibility maps show that 24.7% of the area is classified as high susceptibility, primarily in zones with steep slopes and significant hydrological variability, while 41.3% and 34.0% fall into medium and low susceptibility classes, respectively. Although temporal sequences were incorporated, the outputs represent static susceptibility patterns rather than future forecasts. The strong spatial correspondence between high-susceptibility zones and historical landslides supports the reliability of the results. Feature importance analysis identified slope gradient, cumulative rainfall, and reservoir-level fluctuations as dominant controls, highlighting the importance of integrating topographic and hydrological factors. Comparative experiments confirmed that the CNN–LSTM consistently outperformed traditional machine learning methods and single-stream deep learning models, emphasizing the value of hybrid spatiotemporal learning. Overall, the proposed framework provides a robust and practical approach for regional-scale landslide susceptibility mapping and supports land-use planning and disaster risk management.
Future work should assess model transferability to other regions, incorporate additional dynamic variables (e.g., soil moisture or seismic indicators), and apply spatially independent validation strategies to further evaluate robustness and generalization.