Introduction

Landslides result from complex, interacting environmental forces, which are commonly separated into triggers and underlying causes. Weathering, earthquakes, precipitation, and snowmelt are among the factors that trigger landslides1. The phenomenon may also be caused by human activity, such as leakage from water supply and sewage systems and the construction of roads and structures on steep slopes2. Significant underlying factors include geomorphic and geologic characteristics, rock outcrops, rock types, and vegetation cover. Other potential primary causes include flow accumulation, topography, wetness, distance to roads, stream-power indices, land use, prior landslides, and human activities. For landslide risk assessment to be successful, research on the dynamics and interconnections of the various components determining landslide activity is crucial3,4,5. Because the relationships between landslide incidence and its contributing factors are complex and nonlinear, a sophisticated modeling strategy is required to achieve more precise forecasts.

Recently, artificial neural networks (ANNs) have been used as analytical tools for a variety of purposes in the natural sciences, including speech recognition, face recognition, satellite image classification, and shape and texture recognition6,7. ANNs are well suited to qualitative modeling of natural events because they can handle data on any measurement scale, including nominal, ordinal, interval, and ratio scales, as well as any data partitioning8. Moreover, because they can handle qualitative variables, ANNs are commonly used to predict and classify spatial data from various sources. Many authors have discussed the fundamental concepts and applications of ANNs for pattern recognition9,10,11,12,13. ANNs are data-driven methods that approximate nonlinear functions globally, which has made them valuable simulation tools for delineating landslide susceptibility zones. Their capacity to learn nonlinear functions from data makes them well suited to the difficult task of classifying landslide-prone locations14,15.

To assess susceptibility to debris flows, Xu et al.16 combined a data-value model with seven environmental variables. Machine learning is once again receiving attention due to the development of deep learning17. Peng et al.18 constructed a hybrid model using multisource data and support vector machines (SVMs) to evaluate regional landslide vulnerability. In their study of landslide susceptibility in the Uttarakhand region of India, Binh et al.19 contrasted the SVM with other models. Bui et al.20 likewise examined the ANN, the SVM, and a framework for assessing shallow landslide susceptibility. Pradhan et al.21 used a backpropagation ANN model to evaluate landslide susceptibility in the Klang Valley region of Malaysia. Conforti et al.22 developed an ANN-based model to assess landslide susceptibility.

In addition, Feng et al.'s mapping of rainfall-triggered landslide susceptibility included logistic regression (LR), the information value method (IVM), the SVM, and the ANN. Many comparative studies have weighed the benefits and drawbacks of various landslide susceptibility mapping (LSM) methods, but surprisingly few have analyzed LSM in mining areas. Su et al.23 used SVM, LR, and ANN models to evaluate LSM in a coal mining area; however, only the area under the receiver operating characteristic (ROC) curve (AUC) and a few straightforward decision criteria were used to compare the three fitted models. Model uncertainty, which is increasingly important nowadays, was rarely investigated in these studies, and how the selection of landslide-determining variables affects study outcomes has received little discussion. Machine learning algorithms can handle high-dimensional spaces effectively and thereby achieve high classification accuracy, but they do not provide a direct method for analyzing the relevance of contributing attributes24. By combining feature selection with machine learning techniques, it is possible to remove unnecessary features, create simpler, lower-dimensional models, and maintain high classification accuracy. To create landslide susceptibility maps and quantitatively model the link between geology, geomorphic elements, and landslide occurrence, this study combines four metaheuristic algorithms, the Black Hole Algorithm (BHA), the Cuckoo Optimization Algorithm (COA), the Multiverse Optimizer (MVO), and the Vortex Search (VS) algorithm, with an ANN. The study uses landslide data collected in East Azerbaijan province, Iran. Furthermore, it examines whether the four ANN-optimizing algorithms can serve as an effective basis for producing landslide susceptibility mapping systems. This research can therefore support risk-management planning.

Description of study area

East Azerbaijan province, one of the most significant and populous provinces in Iran, lies in the country's northwest corner between 45°30′ and 47°43′ east longitude and 36°47′ and 38°42′ north latitude (Fig. 1). The province covers 45,491 square kilometers, about 2.8% of Iran's total area, and sits at the confluence of the Alborz and Zagros mountain ranges on the Iranian plateau. About 40% of East Azerbaijan is mountainous. The region is characterized by cold weather and mountainous terrain and falls under the semi-arid climate classification, with an average yearly rainfall of 250 to 300 millimeters. Landslides occur frequently here due to the mountainous terrain, the climatic conditions, and the expansion of human infrastructure. Identifying and zoning landslide-sensitive areas using new models is therefore an essential management measure.

Fig. 1 The study area in East Azerbaijan: (a) administrative boundaries and (b) elevation classes used in the landslide susceptibility analysis.

Methodology

Artificial neural network

The study of networks of flexible nodes is the basis of neural computing, which stores and retrieves experiential knowledge from task examples through a learning process25. An ANN is a computational model that learns how data are generated, enabling it to forecast outputs from inputs. Neurons are arranged in consecutive layers, one or more neurons per layer: an input layer, one or more hidden layers, and an output layer. In feed-forward networks, each connection carries a weight that determines the strength of the relationship between neurons, and the distribution of these weights, which can be positive or negative, determines the network's information-processing behavior. The input layer receives a set of patterns, with various output and input paths for each neuron \(N_{i}\). The hidden (middle) layers process the results from the input layer, with multiple output and input paths for each neuron \(N_{j}\); the hidden layer provides the ANN's internal representation of the problem. In the output layer, each neuron \(N_{k}\) has several input paths and one output path, and this layer transforms the hidden layer's findings into the target pattern. ANNs model information patterns in two steps: the training step adjusts the connection weights between neurons, and the classification step uses input patterns to forecast the target (Fig. 2).
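As a minimal illustration of the feed-forward pass described above, the sketch below propagates one input pattern through a single hidden layer. The layer sizes and the sigmoid activation are illustrative assumptions rather than the configuration used in this study, and Python/NumPy is used for compactness even though the original analyses were performed in MATLAB.

```python
# A minimal feed-forward pass sketch; sizes and activation are assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate one input pattern through input -> hidden -> output."""
    h = sigmoid(w_hidden @ x + b_hidden)   # hidden-layer activations
    y = sigmoid(w_out @ h + b_out)         # output (susceptibility score)
    return y

rng = np.random.default_rng(0)
n_inputs, n_hidden = 16, 6                 # 16 conditioning factors; assumed hidden size
x = rng.random(n_inputs)                   # one normalized input pattern
w_h = rng.normal(size=(n_hidden, n_inputs))
b_h = np.zeros(n_hidden)
w_o = rng.normal(size=(1, n_hidden))
b_o = np.zeros(1)
print(forward(x, w_h, b_h, w_o, b_o))      # scalar in (0, 1)
```

During training, the connection weights (w_h, b_h, w_o, b_o here) are the quantities being adjusted; during classification, they are held fixed while new patterns are pushed through.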

Fig. 2 The structure of the ANN, showing the input variables, hidden-layer neurons, and output layer.

Hybrid model development

Each model is evaluated against three criteria. Equations (1) and (2) define two error indices, the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). We selected the indices most frequently used in previous research26,27 so that our technique can be compared with earlier ones.

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}}$$
(1)
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|y_{i}-\hat{y}_{i}\right|$$
(2)

In these equations, \(\hat{y}_{i}\) represents the forecasted target, \(y_{i}\) represents the actual target, and N denotes the total number of samples. In the current study, we use RMSE and MAE as accuracy indices to compare several strategies on the same database. Since the MAE is a linear index, all individual differences are weighted equally in the average. Because the RMSE squares errors before averaging, it assigns a high weight to large errors; stated differently, the RMSE is most useful where significant errors are particularly undesirable. Reporting MAE and RMSE together therefore gives a fuller picture of the variability of the forecast errors.

The third assessment criterion, the coefficient of determination (R²), was implemented to provide a comprehensive performance measure:

$$R^{2}=1-\frac{\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}}{\sum_{i=1}^{N}\left(y_{i}-\bar{y}\right)^{2}}$$
(3)

The term \(\bar{y}\) denotes the average of the actual landslide values.
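For concreteness, the following sketch implements Eqs. (1)–(3) directly; the y_true and y_pred arrays are placeholder values, not data from this study.

```python
# Direct NumPy implementations of Eqs. (1)-(3) on placeholder arrays.
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))     # Eq. (1)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))             # Eq. (2)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot                        # Eq. (3)

y_true = np.array([0.0, 1.0, 1.0, 0.0, 1.0])            # placeholder targets
y_pred = np.array([0.1, 0.8, 0.9, 0.2, 0.7])            # placeholder forecasts
print(rmse(y_true, y_pred), mae(y_true, y_pred), r_squared(y_true, y_pred))
```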

Black hole algorithm (BHA)

The black hole algorithm was proposed in28 using the black hole analogy: a black hole is a region of space with so great a concentration of mass that no object can escape it. The optimization algorithm uses the same concept. In each iteration, the best candidate solution is designated the black hole, and, following the analogy, it attracts the other candidate solutions. A solution that moves too close to the black hole (i.e., crosses its event horizon) is absorbed and replaced by a newly generated random candidate. The black hole's effect on each solution's movement through the search space is described by Eq. (4).

$$x_{i}^{d}\left(t+1\right)=x_{i}^{d}\left(t\right)+r\left[x_{BH}^{d}-x_{i}^{d}\left(t\right)\right],\quad \forall i\in \left\{1,\dots ,N\right\}$$
(4)

Here N is the total number of solutions, \(x_{i}^{d}\left(t\right)\) and \(x_{i}^{d}\left(t+1\right)\) denote the d-th component of the i-th solution at iterations t and t + 1, r is a random number in [0, 1], and \(x_{BH}^{d}\) is the d-th component of the best solution found (the black hole).
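A hedged sketch of one BHA iteration follows, applying the update of Eq. (4) to a toy sphere objective. The event-horizon replacement rule follows the standard description of the algorithm28; the bounds, population size, and objective are made-up examples.

```python
# One BHA iteration on a toy problem; bounds and objective are assumptions.
import numpy as np

rng = np.random.default_rng(1)

def bha_step(pop, fitness):
    """Move all candidates toward the current black hole (Eq. 4)."""
    bh = pop[np.argmin(fitness)].copy()          # best candidate = black hole
    r = rng.random((pop.shape[0], 1))            # random factor in [0, 1]
    pop = pop + r * (bh - pop)                   # Eq. (4)
    # Event horizon: absorbed candidates are re-initialized randomly
    # (standard BHA rule, not a setting reported in this study).
    radius = fitness.min() / (fitness.sum() + 1e-12)
    absorbed = np.linalg.norm(pop - bh, axis=1) < radius
    pop[absorbed] = rng.uniform(-5, 5, size=(absorbed.sum(), pop.shape[1]))
    return pop

pop = rng.uniform(-5, 5, size=(30, 4))           # 30 candidates, 4 dimensions
for _ in range(100):
    fit = np.sum(pop ** 2, axis=1)               # toy sphere objective
    pop = bha_step(pop, fit)
print(pop[np.argmin(np.sum(pop ** 2, axis=1))])  # converges near the origin
```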

Cuckoo optimization algorithm (COA)

Yang and Deb29 developed the cuckoo search algorithm in 2009, and Rajabioun30 introduced the closely related Cuckoo Optimization Algorithm (COA) in 2011. The algorithm's design is mainly inspired by the distinctive lifestyle and egg-laying behavior of cuckoos, birds capable of brilliantly deceiving other species into helping them survive. Instead of building nests or protecting their eggs, cuckoos lay their eggs in the nests of other birds, where they are raised alongside the host's own eggs.

Similar to other evolutionary techniques, COA starts with a randomly generated population of cuckoos, whose positions are called "habitats." The habitat of a cuckoo is expressed as a 1 × \(N_{var}\) array representing the bird's current location, as identified in Eq. (5).

$$habitat=\left[x_{1},x_{2},\dots ,x_{N_{var}}\right]$$
(5)

where \(N_{var}\) is the dimension of the problem.

Each cuckoo is assigned a random number of eggs to lay in the nests of various host birds, and it lays them within a maximum distance from its habitat known as the Egg Laying Radius (ELR), specified by Eq. (6).

$$ELR=\alpha \times \frac{Number\ of\ current\ cuckoo's\ eggs}{Total\ number\ of\ eggs}\times \left(var_{hi}-var_{low}\right)$$
(6)

where \(\alpha\) is a constant that controls the maximum possible ELR, and \(var_{low}\) and \(var_{hi}\) are the respective lower and upper limits of each parameter in the optimization problem.

The more a cuckoo's eggs resemble the host's eggs, the more likely the chick is to mature; about 10% of the eggs are recognized by the host bird and destroyed. More survivors in a region mean more benefit for that region; in other words, COA seeks the situation in which the maximum number of eggs survives. The surviving chicks are raised in the host's nest and grow into mature cuckoos. When egg-laying time arrives, cuckoos migrate to better habitats, where eggs have a higher chance of hatching successfully. During migration, after traversing a portion of the path (λ% of the total), a cuckoo deviates φ degrees from the target direction, which lets the flock search additional zones. Upon reaching the goal location, each cuckoo's ELR is calculated and egg-laying begins. In nature, food shortages and predation, among other factors, keep bird populations in balance, so the Cuckoo Algorithm likewise uses a limit \(N_{max}\) to control the maximum number of cuckoos. After several iterations, the cuckoo population converges on the location with the best food resources and the greatest similarity between cuckoo and host eggs; the place where the fewest eggs are destroyed is the place with the highest benefit.
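The following sketch illustrates the ELR computation of Eq. (6) and one egg-laying step. The value of α, the bounds, and the egg counts are invented example values, not parameters from this study.

```python
# ELR (Eq. 6) and one egg-laying step; alpha, bounds, egg counts are examples.
import numpy as np

rng = np.random.default_rng(2)

def egg_laying_radius(n_cuckoo_eggs, n_total_eggs, var_low, var_hi, alpha=5.0):
    """Eq. (6): radius within which one cuckoo distributes its eggs."""
    return alpha * (n_cuckoo_eggs / n_total_eggs) * (var_hi - var_low)

habitat = rng.uniform(0.0, 10.0, size=3)         # current cuckoo position (N_var = 3)
elr = egg_laying_radius(n_cuckoo_eggs=8, n_total_eggs=40, var_low=0.0, var_hi=10.0)
# Lay eggs uniformly inside the ELR around the habitat, clipped to the bounds.
eggs = habitat + rng.uniform(-elr, elr, size=(8, habitat.size))
eggs = np.clip(eggs, 0.0, 10.0)
print(elr, eggs.shape)
```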

Multiverse optimizer (MVO)

The multiverse optimizer (MVO), introduced by Mirjalili in 201631, draws on a theory from astrophysics. In the multiverse theory, multiple big bangs created countless universes that interact through several types of holes: black holes, white holes, and wormholes. Black holes attract everything, white holes emit everything, and wormholes act as tunnels that move objects between pairs of universes. In MVO, universes with high inflation rates are assumed to possess white holes, while universes with low inflation rates possess black holes. Through black-hole/white-hole tunnels, objects move from universes of high inflation to universes of low inflation, so that the average inflation rate of all universes improves over time. A roulette-wheel scheme models this mechanism: at every iteration, the universes are sorted by their inflation rates, and the roulette wheel selects one universe to act as the white hole, as follows:

$$x_{i}^{j}=\begin{cases}x_{k}^{j}, & r_{1}<U_{i}\\ x_{i}^{j}, & r_{1}\ge U_{i}\end{cases}$$
(7)

where \(x_{i}^{j}\) denotes the j-th variable of the i-th universe, \(x_{k}^{j}\) is the j-th variable of the k-th universe (selected by the roulette wheel), \(r_{1}\) is a random number in the [0, 1] interval, and \(U_{i}\) is the normalized inflation rate of the i-th universe. The updating procedure in MVO can be written as follows:

$$x_{i}^{j}=\begin{cases}\begin{cases}X_{j}+TDR\times \left(\left(ub-lb\right)\times r_{4}+lb\right), & r_{3}<0.5\\ X_{j}-TDR\times \left(\left(ub-lb\right)\times r_{4}+lb\right), & r_{3}\ge 0.5\end{cases} & \text{if } r_{2}<WEP\\ x_{i}^{j} & \text{if } r_{2}\ge WEP\end{cases}$$
(8)

where \(X_{j}\) is the j-th variable of the best universe found so far, lb and ub are the lower and upper limits, respectively, \(r_{2}\), \(r_{3}\), and \(r_{4}\) are random numbers in the [0, 1] interval, TDR is the travelling distance rate, and WEP is the wormhole existence probability. A high TDR and a low WEP, adjusted over iterations using Eqs. (9) and (10), respectively, emphasize exploration and the avoidance of local optima, whereas a high WEP and a low TDR improve the exploitation procedure and provide a reliable assessment of the global optimum. TDR and WEP can be expressed as follows:

$$TDR=1-\frac{Iter^{\left(1/P\right)}}{Iter_{max}^{\left(1/P\right)}}$$
(9)
$$WEP=W_{min}+Iter\times \left(\frac{W_{max}-W_{min}}{Iter_{max}}\right)$$
(10)

where \(W_{max}\) and \(W_{min}\) are the maximum and minimum WEP values, \(Iter_{max}\) is the maximum number of iterations, Iter is the current iteration, and P defines the precision of exploitation over the iterations. The optimization process begins by initializing the random variables representing the universes. At each iteration, objects in universes with high fitness scores migrate through black/white-hole tunnels toward universes with low fitness scores, and each universe exchanges variables with the finest universe through random wormhole transitions. The process repeats until a defined maximum number of iterations is reached. A further advantage of MVO is that the algorithm retains the best solution found so far to guide the other candidates.
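The WEP and TDR schedules of Eqs. (9) and (10) can be sketched as follows; the values of \(W_{min}\), \(W_{max}\), and P are common defaults from the MVO literature, not settings reported in this study.

```python
# WEP/TDR schedules (Eqs. 9-10); w_min, w_max, p are assumed defaults.
import numpy as np

def wep(it, it_max, w_min=0.2, w_max=1.0):
    """Eq. (10): wormhole existence probability grows linearly."""
    return w_min + it * (w_max - w_min) / it_max

def tdr(it, it_max, p=6.0):
    """Eq. (9): travelling distance rate shrinks over iterations."""
    return 1.0 - it ** (1.0 / p) / it_max ** (1.0 / p)

it_max = 100
for it in (1, 50, 100):
    print(it, round(wep(it, it_max), 3), round(tdr(it, it_max), 3))
# Early iterations: low WEP, high TDR (exploration);
# late iterations: the reverse (exploitation).
```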

Vortex search algorithm (VS)

The vortex search algorithm (VSA) was developed by Doğan and Ölmez32 based on the vortex patterns observed in stirred fluids. Like many other schemes, the algorithm maintains a balance between exploratory and exploitative behavior, which it achieves through an adaptive step-size adjustment technique. The VSA begins with exploratory behavior, yielding a stronger global search, and then applies an exploitative design in the proximity of the solutions found to reach the optimal response32.

In a two-dimensional problem, the vortex can be modeled as a set of nested circles. Given the upper and lower bounds U and L of the search space, Eq. (11) yields the initial center \(\lambda_{0}\) of the outer circle:

$$\lambda_{0}=\frac{U+L}{2}$$
(11)

In the next stage, a set of neighbor solutions \(C_{t}\left(s\right)\) is generated randomly around the current center using a Gaussian distribution.

$$C_{0}\left(s\right)=\left\{s_{1},s_{2},\dots ,s_{z}\right\},\quad g=1,2,\dots ,z$$
(12)

Here z is the total number of candidate solutions, t is the iteration number, x is a random parameter vector, and Σ is the covariance matrix. The multivariate Gaussian distribution is expressed by Eq. (13):

$$p\left(x\mid \lambda ,\Sigma \right)=\frac{1}{\sqrt{\left(2\pi \right)^{D}\left|\Sigma \right|}}\exp \left\{-\frac{1}{2}\left(x-\lambda \right)^{T}\Sigma^{-1}\left(x-\lambda \right)\right\}$$
(13)

where λ is the sample mean (center) vector and D is the problem dimension.

If the variables are uncorrelated (zero off-diagonal elements) and the variances along the diagonal of the covariance matrix are equal, the distribution is spherical. With I a D × D identity matrix and \(\sigma^{2}\) the variance of the distribution, Σ can be written as follows:

$$\Sigma =\sigma^{2}\times \left[I\right]_{D\times D}$$
(14)

The initial standard deviation \(\sigma_{0}\) of the distribution is determined by Eq. (15); notably, this value can serve as the initial radius \(r_{0}\)33:

$$\sigma_{0}=\frac{\mathrm{maximum}\left(U\right)-\mathrm{minimum}\left(L\right)}{2}$$
(15)

The main idea of metaheuristic algorithms is to improve the final solution by upgrading the solutions discovered so far. During the selection phase of the VSA, the current \(\lambda_{0}\) is replaced with the best solution found, which requires the candidate solution to lie within the search space; this condition is enforced using Eq. (16).

$$\begin{cases}s_{g}^{i}=rand\cdot \left(U^{i}-L^{i}\right)+L^{i}, & \text{if } s_{g}^{i}<L^{i}\\ s_{g}^{i}=rand\cdot \left(U^{i}-L^{i}\right)+L^{i}, & \text{if } s_{g}^{i}>U^{i}\end{cases}$$
(16)

where rand is a random value drawn from a uniform distribution.

Next, the best solution found so far becomes the center of the second (i.e., inner) circle. The effective radius is reduced, and a new group of solutions \(C_{1}\left(s\right)\) is created; repeating this strategy progressively yields more promising results33. Earlier studies have also described the VSA34.
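The following compact sketch traces the VSA loop: Gaussian sampling around the current center (Eqs. 12 and 13, with a spherical Σ), random re-initialization of out-of-bound candidates (Eq. 16), and a shrinking radius. The original algorithm shrinks the radius with an inverse incomplete-gamma schedule33; a simple geometric decay is substituted here for brevity, and the sphere objective is a placeholder.

```python
# Compact VSA loop sketch; geometric radius decay replaces the original
# inverse incomplete-gamma schedule, and the objective is a toy example.
import numpy as np

rng = np.random.default_rng(3)
D = 2
L, U = np.full(D, -5.0), np.full(D, 5.0)       # lower/upper bounds
center = (U + L) / 2.0                         # Eq. (11): initial center
radius = (U.max() - L.min()) / 2.0             # Eq. (15): initial sigma_0
best, best_fit = center.copy(), np.inf
z = 50                                         # candidates per iteration

for t in range(200):
    samples = rng.normal(center, radius, size=(z, D))  # Eqs. (12)-(13), spherical Sigma
    out = (samples < L) | (samples > U)                # Eq. (16): re-draw out-of-bound
    samples[out] = rng.random(out.sum()) * (U[0] - L[0]) + L[0]
    fits = np.sum(samples ** 2, axis=1)                # toy sphere objective
    if fits.min() < best_fit:
        best_fit = fits.min()
        best = samples[np.argmin(fits)].copy()
    center = best                                      # inner circle's center
    radius *= 0.97                                     # shrink the vortex
print(best, best_fit)
```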

Landslide conditioning variables and landslide inventory map (LIM)

Based on earlier studies and expert knowledge, a total of 16 factors affecting the occurrence of slope movements were considered. Slope, aspect, plan (transverse) curvature, and profile (longitudinal) curvature layers were derived from the digital elevation model (DEM) in ArcGIS 10.3. Landsat ETM imagery acquired on July 8, 2019, and downloaded from the United States Geological Survey (USGS) website, was used to create the Normalized Difference Vegetation Index (NDVI) layer in ArcGIS 10.3. Fault distances and lithological strata were likewise identified from the 1:100,000-scale geological maps in the geodatabase35. Rainfall layers were created with the Inverse Distance Weighting (IDW) algorithm from long-term mean rainfall data recorded by rain gauges in East Azerbaijan, Iran. Accordingly36,37,38,39,40,41,42, the parameters used in constructing the maps were (a) aspect, (b) rainfall, (c) elevation, (d) land use, (e) geology, (f) distance from the fault, (g) profile curvature, (h) plan curvature, (i) NDVI, (j) Stream Power Index (SPI), (k) distance from the road, (l) distance from the river, (m) Sediment Transport Index (STI), (n) Terrain Roughness Index (TRI), (o) slope degree, and (p) Topographic Wetness Index (TWI)43. After pooling the information from these layers, data sets were created covering the specifications affecting landslide onset in East Azerbaijan (Fig. 3). The landslide indicators and their types are shown in Table 1. The approaches were analyzed and implemented using MATLAB, SPSS 20, and ArcGIS 10.3.

Feature selection refers to approaches that assign scores to input attributes based on their relevance to forecasting an output parameter44; when creating a predictive model, it is the procedure of minimizing the number of input parameters45,46,47,48. Reducing the number of input variables is preferable, as it lowers the computational cost of modeling and, in some situations, enhances the model's performance24,49. Figure 4 shows the feature-selection plot: rainfall has the highest importance ranking (near 0.5) and SPI the lowest (almost 0). Figure 5 visualizes all 16 input variables used in this study, showing their distribution patterns and ranges; in this figure, red bars indicate samples with landslide occurrence and blue bars indicate samples without landslide occurrence. This consistent color coding across all subplots clearly illustrates the distribution of landslide and non-landslide cases among the different classes of conditioning factors. Figure 6 highlights the variation of normalized values for the six highest-ranked variables, rainfall, elevation, slope, land use, river, and TWI, emphasizing their relative influence on landslide susceptibility. All continuous variables originally expressed in physical units, including precipitation (mm), elevation (m), slope (degrees), and distances to rivers or faults (m), were normalized and reclassified into discrete factor classes before model implementation.
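The paper does not state which scoring method produced the rankings in Fig. 4, so the sketch below is only illustrative: it ranks stand-in conditioning factors with a random-forest importance score. The factor names match the study's inputs, but the data are randomly generated.

```python
# Illustrative feature ranking with a random forest; data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
factors = ["rainfall", "elevation", "slope", "land_use", "river", "TWI",
           "aspect", "geology", "fault", "profile", "plan", "NDVI",
           "SPI", "road", "STI", "TRI"]
X = rng.random((500, len(factors)))            # 500 synthetic samples
# Synthetic target driven mostly by the first two columns.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.2, 500) > 1.0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(model.feature_importances_, factors), reverse=True)
for score, name in ranking[:6]:
    print(f"{name}: {score:.3f}")              # top-ranked factors
```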

Fig. 3 Detailed maps of the landslide-prone environmental conditions prepared for East Azerbaijan: (a) aspect, (b) rainfall, (c) elevation, (d) land use, (e) geology, (f) distance from the fault, (g) profile curvature, (h) plan curvature, (i) Normalized Difference Vegetation Index (NDVI), (j) Stream Power Index (SPI), (k) distance from the road, (l) distance from the river, (m) Sediment Transport Index (STI), (n) Terrain Roughness Index (TRI), (o) slope degree, and (p) Topographic Wetness Index (TWI).

Consequently, the x-axes represent reclassified factor classes rather than raw physical values, improving comparability among inputs and enhancing the stability of the optimization-based artificial intelligence models. Index-based variables such as NDVI, SPI, TWI, TRI, and STI are dimensionless, whereas categorical variables such as land use and geology lack numerical units. In statistical analysis, the breadth of a class, known as the class interval, determines how data are grouped into discrete categories. Although class intervals are often equal in width, they may vary with the data distribution, creating a trade-off between data representation and resolution. In this study, the class intervals were specifically designed to separate the dataset into two distinct groups, landslide and non-landslide, which facilitates clearer visualization of patterns and ensures that subsequent modeling accurately captures the relationship between the input variables and landslide occurrence.

Fig. 4 Importance ranking of the features used as input layers for the landslide susceptibility output.

Fig. 5 Data visualization of the sixteen input variables, rainfall (a), road (b), NDVI (c), TWI (d), SPI (e), elevation (f), land use (g), geology (h), profile curvature (i), plan curvature (j), river (k), fault (l), TRI (m), STI (n), slope (o), and aspect (p), together with the target (q), illustrating their distribution patterns and value ranges: red bars denote landslide occurrence, blue bars denote absence of landslide occurrence.

Fig. 6 Normalized range changes for each input layer. Boxplots show the median, interquartile range, and minimum and maximum values after normalization.

Table 1 Landslide indicators and the types they belong to.

Results and discussion

In this study, network architectures were evaluated and simulated using MATLAB to identify the most efficient design for landslide susceptibility mapping. Several architectures with varying numbers of layers and neurons were tested, and their performance was assessed using R² and RMSE. The results presented in Table 2 show that a feed-forward backpropagation network with six hidden layers achieved the best performance among the conventional ANN models, indicating that increasing network depth up to a point enhances the ANN's capacity to capture the nonlinear, complex relationships between conditioning factors and landslide occurrence.

Table 2 Sensitivity of the landslide susceptibility mapping to changes in the number of neurons.

As shown in Fig. 7, RMSE values vary with the number of neurons per hidden layer, confirming that model accuracy is highly sensitive to the chosen architecture. A structure with a lower RMSE demonstrates superior predictive power, meaning that careful adjustment of hidden layers and neurons is crucial for improving model performance. These findings also highlight that ANN models are prone to overfitting when the architecture becomes unnecessarily complex, while underfitting may occur if the architecture is too shallow.

The subsequent optimization procedures (BHA-MLP, COA-MLP, MVO-MLP, and VS-MLP) were built on these baseline results to refine model accuracy further. The results show that the optimization algorithms could effectively search the parameter space and identify more efficient configurations. Specifically, BHA, COA, MVO, and VS models reached their optimal solutions at population sizes of 250, 350, 200, and 150, respectively. These results suggest that each algorithm exhibits distinct convergence behavior: for instance, MVO achieves stability at a smaller population size, suggesting a stronger ability to rapidly exploit optimal solutions, while COA requires larger populations to reach comparable accuracy, reflecting its exploration-oriented mechanism.

Overall, the results demonstrate that combining ANN with metaheuristic algorithms not only reduces error values but also improves stability and convergence speed. This confirms the advantage of hybridized models over conventional ANN in capturing the complexity of landslide susceptibility patterns.
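The hybridization itself can be sketched as follows: a candidate vector proposed by any of the four metaheuristics is decoded into MLP weights and scored by RMSE, so the optimizer searches the weight space directly. The architecture and data here are illustrative assumptions, not the configurations reported in the tables.

```python
# Sketch of the metaheuristic-MLP coupling; architecture and data are assumed.
import numpy as np

n_in, n_hid = 16, 6                            # assumed architecture
n_weights = n_hid * n_in + n_hid + n_hid + 1   # W1, b1, W2, b2

def decode(theta):
    """Split a flat candidate vector into MLP weight matrices."""
    i = 0
    W1 = theta[i:i + n_hid * n_in].reshape(n_hid, n_in); i += n_hid * n_in
    b1 = theta[i:i + n_hid]; i += n_hid
    W2 = theta[i:i + n_hid].reshape(1, n_hid); i += n_hid
    b2 = theta[i:i + 1]
    return W1, b1, W2, b2

def fitness(theta, X, y):
    """Objective handed to BHA/COA/MVO/VS: RMSE of the decoded network."""
    W1, b1, W2, b2 = decode(theta)
    h = 1.0 / (1.0 + np.exp(-(X @ W1.T + b1)))
    pred = 1.0 / (1.0 + np.exp(-(h @ W2.T + b2))).ravel()
    return np.sqrt(np.mean((y - pred) ** 2))

rng = np.random.default_rng(5)
X = rng.random((200, n_in))                    # placeholder conditioning factors
y = rng.integers(0, 2, 200).astype(float)      # placeholder landslide labels
print(fitness(rng.normal(size=n_weights), X, y))  # score of one random candidate
```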

Fig. 7 Best-fit results for the (a) BHA-MLP, (b) COA-MLP, (c) MVO-MLP, and (d) VS-MLP, illustrating the convergence behavior and optimization performance of each method.

Error analysis

In the second evaluation stage, the performance of the hybrid ANN models was assessed using ROC curves and AUC values, which are widely recognized as reliable indicators of classification accuracy. An AUC close to 1.0 reflects excellent predictive power, while values near 0.5 indicate no discriminative ability. Figures 8, 9, 10 and 11 present the ROC curves of the BHA-MLP, COA-MLP, MVO-MLP, and VS-MLP models.
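As a brief illustration of this evaluation, the sketch below computes a ROC curve and its AUC with scikit-learn on placeholder labels and scores; the values are synthetic, not the study's results.

```python
# ROC/AUC computation on synthetic labels and scores (illustration only).
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(6)
y_true = rng.integers(0, 2, 300)                               # 1 = landslide, 0 = none
scores = np.clip(y_true * 0.6 + rng.random(300) * 0.5, 0, 1)   # synthetic model output

fpr, tpr, _ = roc_curve(y_true, scores)        # points on the ROC curve
print("AUC =", round(roc_auc_score(y_true, scores), 3))
```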

The results demonstrated that all four hybrid models achieved very high AUC values, confirming their strong effectiveness in landslide susceptibility modeling. However, comparative analysis revealed meaningful differences. Among them, MVO-MLP achieved the highest performance with training AUC values above 0.99 and testing AUC values near 0.98, highlighting its superior robustness and generalization. This advantage stems from the MVO algorithm’s ability to balance exploration and exploitation. Simulating the motion of multiple objects in a multi-dimensional search space helps avoid premature convergence and captures the complex nonlinear relationships among landslide conditioning factors more effectively. BHA-MLP and COA-MLP also produced strong predictive accuracy (AUC > 0.98), but their performance appeared slightly less stable across different swarm sizes. This instability may be related to higher sensitivity to parameter tuning and a tendency towards overfitting when population size increases. In contrast, VS-MLP produced consistently high AUC values, though marginally lower than MVO-MLP, suggesting that, while effective, its search mechanism is somewhat less adaptive to the data’s complexity (Figs. 8, 9, 10 and 11). The determination of optimal swarm sizes (250 for BHA, 350 for COA, 200 for MVO, and 150 for VS) further demonstrates that algorithmic performance is strongly influenced by parameter calibration. These findings underscore the importance of selecting not only an appropriate optimization strategy but also fine-tuning its parameters for reliable susceptibility modeling (Figs. 12, 13, 14 and 15).

In summary, the comparative discussion confirms that coupling ANNs with metaheuristic algorithms substantially enhances prediction accuracy. The results further demonstrate that MVO-MLP is particularly effective due to its well-balanced exploration-exploitation process and lower sensitivity to parameter variations, making it the most reliable hybrid approach for spatial landslide susceptibility modeling (Tables 3, 4, 5 and 6).

Fig. 8 Training (a) and testing (b) stage outcomes for various BHA-MLP structures, illustrating the performance of the different configurations.

Fig. 9 Training (a) and testing (b) stage outcomes for various COA-MLP structures, illustrating the performance of the different configurations.

Fig. 10 Training (a) and testing (b) stage outcomes for various MVO-MLP structures, illustrating the performance of the different configurations.

Fig. 11 Training (a) and testing (b) stage outcomes for various VS-MLP structures, illustrating the performance of the different configurations.

Table 3 An analysis of network outcomes based on the AUC values for a variety of BHA-MLP swarm sizes.
Table 4 An analysis of network outcomes based on the AUC values for a variety of COA-MLP swarm sizes.
Table 5 An analysis of network outcomes based on the AUC values for a variety of MVO-MLP swarm sizes.
Table 6 An analysis of network outcomes based on the AUC values for a variety of VS-MLP swarm sizes.
Fig. 12 Frequency and MAE error analysis for the best-fit structure proposed by BHA-MLP: training (a) and testing (b) datasets.

Fig. 13 Frequency and MAE error analysis for the best-fit structure proposed by COA-MLP: training (a) and testing (b) datasets.

Fig. 14 Frequency and MAE error analysis for the best-fit structure proposed by MVO-MLP: training (a) and testing (b) datasets.

Fig. 15 Frequency and MAE error analysis for the best-fit structure proposed by VS-MLP: training (a) and testing (b) datasets.

Taylor diagram assessment

During the validation stage, the performance of the optimization-based hybrid ANN models was further examined using Taylor diagrams50. This graphical method provides a comprehensive evaluation of model accuracy by simultaneously displaying three key statistics: the correlation coefficient (R), the standard deviation (SD), and the root-mean-square deviation (RMSD). In Fig. 16, the gray arc, the blue azimuthal line, and the green contour line represent SD, R, and RMSD, respectively, for both the training and test datasets.
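For reference, the three statistics a Taylor diagram displays can be computed as in the sketch below for one model's predictions; the arrays are placeholders and the plotting itself is omitted.

```python
# The three Taylor-diagram statistics (R, SD, centered RMSD) on placeholders.
import numpy as np

def taylor_stats(obs, pred):
    r = np.corrcoef(obs, pred)[0, 1]                       # correlation coefficient
    sd = pred.std()                                        # standard deviation
    rmsd = np.sqrt(np.mean(((pred - pred.mean())
                            - (obs - obs.mean())) ** 2))   # centered RMSD
    return r, sd, rmsd

rng = np.random.default_rng(7)
obs = rng.random(100)                                      # placeholder observations
pred = 0.9 * obs + 0.1 * rng.random(100)                   # a well-correlated model
print([round(v, 3) for v in taylor_stats(obs, pred)])
```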

The results clearly demonstrated the validity of the proposed hybrid algorithms in modeling landslide susceptibility. As shown in Fig. 16a, the correlation coefficients for the training data were 0.95 for MVO-MLP, 0.93 for VS-MLP, and 0.92 for COA-MLP and BHA-MLP. Similarly, in the testing stage (Fig. 16b), the correlation coefficients remained high, with values of 0.90 for BHA-MLP, MVO-MLP, and VS-MLP, and 0.88 for COA-MLP.

These results not only confirm the strong agreement between observed and predicted values but also highlight the robustness of the hybrid optimization methods. Among them, MVO-MLP showed a slightly stronger correlation in both training and testing, consistent with its superior performance in the ROC-AUC analysis. The relatively stable performance of the other algorithms further demonstrates that hybridization with optimization techniques improves the reliability of ANN models for predicting landslide susceptibility, even across different validation metrics.

Fig. 16 Taylor diagram for the best-fit structures of BHA-MLP, COA-MLP, MVO-MLP, and VS-MLP: (a) training dataset and (b) testing dataset.

Discussion

Landslide susceptibility maps are powerful tools for identifying high-risk areas and analyzing the relationship between preexisting landslides and environmental conditions. In this study, hybrid ANN models BHA-MLP, COA-MLP, MVO-MLP, and VS-MLP were employed with sixteen influential factors to generate optimized landslide susceptibility maps. The resulting landslide susceptibility indices (LSIs) were used in ArcGIS to produce standardized maps classified into five categories: very low, low, moderate, high, and very high (Fig. 17).
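The paper does not specify the classification scheme used in ArcGIS, so the sketch below bins placeholder LSI values into the five classes by quantiles as one plausible choice; natural-breaks or equal-interval classification would follow the same pattern.

```python
# Quantile binning of placeholder LSI values into the five named classes.
import numpy as np

rng = np.random.default_rng(8)
lsi = rng.random(10_000)                                   # placeholder LSI raster values
edges = np.quantile(lsi, [0.2, 0.4, 0.6, 0.8])             # four interior breakpoints
labels = np.array(["very low", "low", "moderate", "high", "very high"])
classes = labels[np.digitize(lsi, edges)]                  # one class label per cell
print(dict(zip(*np.unique(classes, return_counts=True))))
```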

All models successfully identified high-concentration landslide zones, particularly in the North, Southeast, and Central regions, while the Western and Southeastern areas exhibited low to very low susceptibility. This spatial pattern aligns with the distribution of active landslides and highlights the significance of these areas for infrastructure, public services, and economic activities.

Comparative analysis revealed distinct differences in model performance. The MVO-MLP consistently achieved the highest performance across accuracy, stability, and generalization. Its main strength lies in the optimal balance between exploration and exploitation, enabling it to capture complex, nonlinear relationships among multiple factors while remaining relatively insensitive to parameter variations.

The BHA-MLP model also performed well and accurately identified high-risk areas; however, it is slightly more sensitive to swarm size and algorithm parameters, occasionally leading to minor fluctuations in predictive accuracy. The COA-MLP demonstrated good accuracy and effectively captured spatial patterns, but it is more sensitive to selected features and may overfit with larger populations, reducing stability in certain conditions. The VS-MLP model offers good flexibility and high accuracy for small to medium datasets. Yet, its primary weakness is reduced adaptability to complex data patterns and relative sensitivity to initial values, which can slightly lower prediction accuracy under specific circumstances. Overall, this analysis demonstrates that all four models can reliably identify high-risk areas. Nevertheless, MVO-MLP stands out as the most robust, accurate, and flexible model for landslide susceptibility mapping. The other models retain practical value for smaller-scale studies or datasets with limited observations. These differences underscore the importance of selecting an appropriate algorithm and fine-tuning its parameters to achieve optimal predictive performance. These findings demonstrate that the ANN optimization algorithm exhibits high accuracy, consistent with the results reported by Doğan and Ölmez51 in large-scale studies. Similarly, ANNs have been applied in another study in Iran52 to assess landslide susceptibility, yielding results comparable to those of the present research. These comparisons further highlight the robustness and reliability of ANN-based models in predicting landslide-prone areas across different geographic and environmental contexts32.

Fig. 17 Landslide susceptibility maps for the training and test data produced by the best-fit BHA (a), COA (b), MVO (c), and VS (d) models (swarm sizes of 250, 350, 200, and 150, respectively).

Conclusion

This study successfully developed and trained four hybrid ANN models, BHA-MLP, COA-MLP, MVO-MLP, and VS-MLP, to generate a precise and comprehensive landslide susceptibility map for Eastern Azerbaijan. By considering 16 geomorphic and geological factors, the study demonstrated that landslide occurrence can be quantitatively modeled, and that integrating neural networks with metaheuristic optimization algorithms significantly improves prediction accuracy. These findings emphasize that landslide prediction relies not only on individual factors but also on understanding their combined and nonlinear interactions.

One of the key scientific contributions of this research is the establishment of a comprehensive and generalizable framework for landslide modeling using hybrid neural networks and optimization algorithms. This framework effectively identifies complex, nonlinear relationships among environmental factors that affect landslides and accurately detects high-risk areas. Comparative performance analysis showed that MVO-MLP achieved the highest accuracy, stability, and generalizability. At the same time, BHA-MLP and COA-MLP also performed well but exhibited higher sensitivity to population size and algorithm parameters. VS-MLP demonstrated flexibility for small-to-medium datasets but was less capable of adapting to complex patterns than MVO-MLP. This comparison highlights the critical role of selecting appropriate algorithms and fine-tuning parameters to achieve reliable, optimal results.

In terms of practical implications, the generated landslide susceptibility maps can inform land-use planning, infrastructure management, early-warning system design, and disaster risk reduction strategies. Identifying areas with high or very high susceptibility enables prioritization of preventive measures, emergency planning, and resource allocation, potentially reducing both human and economic losses. Moreover, these maps provide valuable guidance to decision-makers and disaster management authorities for developing safety policies and enhancing urban and rural resilience.

Despite the models' high accuracy, some areas without a history of landslides may remain vulnerable in the future, underscoring the need to expand the landslide inventory and collect future event data. Additionally, incorporating semi-supervised learning and transfer learning, and accounting for temporal and climate change effects, could further enhance the models' generalizability and prediction accuracy. Future studies could also explore the algorithms' sensitivity to feature selection, parameter tuning, and varying environmental conditions to establish a more robust and reliable framework for landslide prediction.

Overall, this study demonstrates that hybrid neural network models, especially MVO-MLP, are powerful, flexible, and reliable tools for landslide susceptibility mapping. These models not only advance scientific understanding of landslide processes but also provide practical guidance for risk management and decision-making in landslide-prone areas. The study lays a foundation for future research aimed at improving model generalizability, enhancing prediction accuracy, and increasing the resilience of communities and infrastructure in vulnerable regions. Consequently, this research contributes to both scientific knowledge and practical applications for safeguarding human life and reducing economic losses from landslides.