Abstract
Accurate flood susceptibility mapping (FSM) is an important non-structural approach to flood management. This research introduces a novel approach to increase the accuracy of FSM by optimizing the CatBoost algorithm with two swarm-based metaheuristic algorithms: the Zebra optimization algorithm (ZOA) and the Whale optimization algorithm (WOA). It addresses a critical gap in determining optimal hyperparameters for machine learning models in FSM: existing studies often ignore hyperparameter tuning, which leads to suboptimal model performance. The CatBoost-WOA and CatBoost-ZOA models were applied to Shushtar County, located in southwest Iran, using 13 parameters that affect floods and flood occurrence points as inputs. The evaluation results of the flood susceptibility maps showed an accuracy of 84.2% for CatBoost, 85% for CatBoost-WOA, and 87.2% for CatBoost-ZOA. This represents an absolute improvement of 3.0 percentage points for the ZOA-optimized model over the non-optimized CatBoost. These results indicate that integrating swarm-based methods with machine learning boosting algorithms enhanced FSM accuracy. As a non-structural approach, the results of this study can help managers and decision-makers in flood management and control.
Introduction
Floods are among the worst natural disasters and can significantly affect human life, infrastructure, and the environment41. Their consequences include loss of life, displacement of people, damage to property, and economic losses58. In recent years, the frequency and magnitude of flooding have increased, creating a challenge for many countries worldwide. Statistics indicate severe damage to infrastructure worldwide, specifically in the Middle East and Iran18,26. The Middle East experiences several hydrological hazards, of which flooding is one; hence, it is prone to flood impacts34. Iran has varied topographic and climatic conditions, which contribute to the frequency of floods9. Monsoon events have been a major cause of flooding. Monsoon floods, which involve short and intense rainfall, are associated with rapid and excessive runoff that exceeds the capacity of drainage systems and inundates vast areas40. The severity and complexity of monsoon flood impacts underline the need for effective flood management practices9.
In the last few decades, remote sensing (RS) has demonstrated its worth in flood monitoring and management with the support of geographic information systems (GIS)8. RS facilitates detailed and updated information about flood situations, such as extent, water depth, and related variables that can be taken from satellites39. However, RS is not enough for flood management. Information gathered by RS is integrated and provided in a more combined way by GIS technology8. This can also be used in spatial analysis and data visualization to make proper decisions to manage floods.
GIS technology is vital in the preparation of flood susceptibility mapping (FSM)38. Interpreting FSMs helps explain the spatial distribution of areas that floods could affect. This information is essential because it identifies high-risk areas, areas of high value, and the mitigation strategies that should be considered46. With the help of FSMs, decision-makers can plan infrastructure projects more efficiently and take a focused approach toward reducing the potential harm from flooding to communities38.
Accurate modeling of flood susceptibility is essential to develop effective flood management and mitigation strategies. Nevertheless, there is still a difficulty in obtaining higher accuracies when modeling FSMs27. Machine learning algorithms have demonstrated significant potential in FSM by leveraging the correlations between several spatial features and the occurrence of floods53,66. However, achieving higher modeling accuracy requires careful consideration of the hyperparameters in machine learning algorithms71. These settings govern the behavior and performance of machine learning models, determining how a machine learning model learns from training data, generalizes to unseen data, and adapts to specific domains54. The hyperparameters should be chosen carefully because different settings can significantly affect the accuracy of the flood susceptibility modeling49. The selection of inappropriate or suboptimal values for hyperparameters can lead to models that underfit the data, resulting in poor performance with limited predictive ability, while using models with overly complex hyperparameters can result in overfitting models that capture noise and perform poorly in generalization on unseen data54. Through careful tuning and selecting hyperparameters, researchers can improve the modeling accuracy of flood susceptibility predictions49.
Therefore, the purpose of this research is to prepare an FSM for a region in southwest Iran by optimizing the CatBoost machine learning algorithm with two swarm-based metaheuristic algorithms: the Whale optimization algorithm (WOA) and the Zebra optimization algorithm (ZOA). This research addresses several important gaps in the existing literature. First, although the use of machine learning models in FSM has increased, many studies pay no attention to hyperparameter optimization and depend on parameters that are left at default values or chosen manually. Second, although swarm intelligence has proven to be an effective optimization methodology in many applications, its use for optimizing the hyperparameters of machine learning models in FSM is limited. The majority of past studies have only utilized traditional optimization methods, while little has been done with newer, less common metaheuristics like WOA and ZOA. Third, most studies have not combined ensemble boosting algorithms such as CatBoost with swarm-based optimizers for FSM, even though this combination can substantially improve predictive accuracy.
Literature review
Past research
In recent years, FSM has been used as a flood control and mitigation management approach. In terms of time, research in the field of FSM has been increasing since 2006, reaching its peak in 2023, and most research was carried out from 2022 to 2024 (Fig. 1).
Regarding the number of studies conducted in different regions or institutions worldwide, most studies have been conducted in the United States, India, China, and Iran. The results show that most research has been conducted in Asian regions. This could be due to the prevalence of flooding during the monsoon period in Asian countries. In addition, floods have been one of the most critical hazards in Iran, and much research has been conducted there so far (Fig. 2).
Regarding subject matter, most of the research done on FSM is in the fields of geography, geology, environmental science, flood myth, and computer science. Research shows that machine learning and artificial intelligence are among the most popular fields in FSM research (Fig. 3).
Various studies have applied machine learning algorithms to FSM; some examples are mentioned below. Khosravi et al.29 used the logistic model trees (LMT) algorithm to create FSM in a region in northern Iran. Al-Abadi3 compared random forest (RF), rotation forest, and AdaBoost algorithms in an area in Iraq. Rahman et al.45 investigated FSM using artificial neural networks (ANN) and logistic regression (LR) algorithms. Shahabi et al.52 used the combined Bagging algorithm and K-nearest neighbor (KNN) to prepare an FSM in northern Iran. Vojtek et al.65 compared the boosted classification tree (BCT) and boosted regression tree (BRT) algorithms in the Topľa river basin, Slovakia. Kundapura et al. (2022) compared support vector machine (SVM), RF, and Bagging algorithms. Bhattarai et al.10 investigated RF, ANN, and SVM algorithms for FSM. Wedajo et al.68 investigated RF, Linear Regression, SVM, and long short-term memory (LSTM) algorithms to prepare FSM in the Amibara area, Ethiopia. Hitouri et al.25 investigated SVM, RF, extreme gradient boosting (XGBoost), and classification and regression trees (CART) methods in the Metlili watershed, Morocco. Most studies have used RF, SVM, and ANN machine-learning algorithms for FSM.
In this study, the CatBoost algorithm was used as the basic model for the FSM. Although deep learning algorithms are powerful, they typically require large, high-dimensional datasets such as imagery or time-series data. In contrast, CatBoost is highly efficient and well-suited for structured tabular data, such as the environmental and topographic variables used in this study. This algorithm has advantages such as automatic handling of categorical features and missing values. To date, the CatBoost algorithm has been used in various studies in the field of FSM, and some examples are mentioned below. Seydi et al.51 reached an accuracy of 96.5% in preparing FSM in areas of Iran. Van Phong et al.64 achieved 94% accuracy in a region in Vietnam with this algorithm. Kurugama et al.31 achieved 92.5% accuracy with this algorithm. Saravanan et al.50 in the Idukki district in Kerala achieved 79% accuracy with this algorithm in FSM.
In recent years, researchers have started combining metaheuristic algorithms with machine learning algorithms to address the limitations of machine learning in FSM. Wang et al.67 optimized the adaptive neuro-fuzzy inference system (ANFIS) using the biogeography-based optimization (BBO) and imperialist competitive algorithm (ICA) methods. Chou et al.15 optimized the convolutional neural network (CNN) algorithm using the galactic swarm optimization (GSO) algorithm. Ruidas et al.48 optimized support vector regression (SVR) using particle swarm optimization (PSO) and the Grasshopper optimization algorithm (GOA). Plataridis and Mallios44 addressed RF optimization using an artificial bee colony (ABC). Cui et al.17 optimized the back propagation neural network (BPNN) algorithm using the genetic quantum algorithm (GQA). Abba et al.1 optimized the LSTM algorithm using the invasive weed optimization (IWO) and harmony search (HS) algorithms. Tinh et al.62 optimized the TensorFlow deep neural network (TFDNN) using the Harris Hawks Optimization (HHO) algorithm. This study used two swarm-based metaheuristic algorithms, WOA and ZOA, to optimize the CatBoost algorithm for FSM. Both empirical performance characteristics and algorithmic suitability for complex, nonlinear optimization problems, such as hyperparameter tuning in ensemble learning models, drove the selection of the WOA and ZOA. WOA is a metaheuristic algorithm based on the bubble-net hunting strategy of humpback whales. It is well-regarded for its exploration–exploitation balance, low parameter sensitivity, and ability to avoid local minima through adaptive mechanisms (Mirjalili 2023). ZOA, a more recent algorithm inspired by the social behavior and movement patterns of zebras, has shown promise in escaping local optima and maintaining population diversity. It employs a randomized, nonlinear search mechanism that improves its global search capability while retaining convergence efficiency63.
Research gap and innovation
Hyperparameter tuning is an essential step in the performance of machine-learning algorithms. Appropriate hyperparameters can significantly improve the model accuracy, generalization to unseen data, and convergence speed47. Conversely, poor hyperparameter selection can lead to overfitting, where the model memorizes the training data, including its noise, and therefore performs poorly on new data48. At the other extreme, models that are oversimplified by their hyperparameter choices can perform poorly because they fail to capture the underlying patterns in the data1. Hyperparameter tuning involves searching a complex, high-dimensional hyperparameter space to find the optimal configuration for a model. One of the problems with machine learning algorithms in the field of FSM is that their hyperparameters are rarely determined optimally. Determining these hyperparameters optimally in FSM helps reduce overfitting.
In existing FSM literature, hyperparameter tuning is either entirely overlooked or handled through basic grid search techniques, which are computationally expensive and limited in their capacity to escape local optima. Although some studies have applied metaheuristic algorithms for this purpose, these have mainly been used in conjunction with traditional classifiers, like SVM or RF, rather than advanced boosting algorithms like CatBoost. Moreover, to the best of our knowledge, there is no prior study that combines CatBoost with either the ZOA or the WOA for FSM.
The CatBoost algorithm is a robust machine learning algorithm, and one of its disadvantages is the lack of optimal determination of its hyperparameters2. Therefore, this research aimed to improve the modeling accuracy and eliminate overfitting and generalization errors in the CatBoost algorithm in FSM using two metaheuristic algorithms, ZOA and WOA. The primary contribution of this study is the new application of two hybrid models (CatBoost-ZOA and CatBoost-WOA) for FSM, the first application of these algorithmic combinations in the FSM area. Although CatBoost, a gradient boosting algorithm, has gained popularity for its performance with structured data tasks, geospatial flood modeling using the CatBoost algorithm is limited, especially in the context of integrating advanced metaheuristic optimization approaches.
Data and methods
Case study
Shushtar County is located in the southwest of Iran in Khuzestan Province (Fig. 4). This County is situated between 48°34′ to 49°12′ east longitude and 31°36′ to 32°8′ north latitude. The study area has an average rainfall of 294.8 mm and an average temperature of 26.8 °C. The geological structure of the region is a sequence of the Zagros mountain range, which stretches from north to southeast and forms a wide range of mountains in western Iran. The significant formations in this area include the Asmari, Gachsaran, Mishan, Aghajari, and Bakhtiari formations and sediments of the fourth geological period. According to soil science studies, the lands located within the boundaries of Shushtar include mountains, hills, sedimentary plains, and pebble-shaped Babzani alluviums, and most of the land is arable and irrigated based on the usual soil science standards.
Methodology
The study employs a methodology that comprises five primary stages (Fig. 5): (1) gathering flood samples and determining the associated flood conditioning factors, then partitioning the dataset into training and testing subsets using a 70:30 ratio; (2) using the Frequency Ratio (FR) approach to score each class of each factor and running a multicollinearity test to find correlated conditioning factors; (3) creating and optimizing a CatBoost model using swarm-based metaheuristic algorithms (WOA and ZOA); (4) constructing an FSM with each of the three models: CatBoost, CatBoost-WOA, and CatBoost-ZOA; and (5) assessing the prediction capabilities of these models with several performance indicators.
Data
Flood dataset
Satellite imagery has been used to identify the study zone’s flood spots. Sentinel-1 images were used in the Google Earth Engine (GEE) system (https://earthengine.google.com/) to monitor floods between 2017 and 2022. To prepare the flood dataset for modeling, the spatial distribution of flood locations has been represented as individual points. In addition, an equivalent number of non-flood locations have been chosen to train the flood model. The values 1 and 0 represent flood events and non-flood events, respectively. The complete dataset has been partitioned into two segments: the training and testing sets. The training dataset comprises 70% (273) flood and non-flood points for model training, while the remaining 30% (117) points constitute the testing dataset used for model validation (Fig. 4). This 70:30 ratio is a widely accepted convention in supervised learning tasks, particularly when working with moderately sized datasets, as it offers a balanced trade-off between two objectives: (1) providing the model with a sufficient number of samples to learn complex patterns in the training phase, and (2) retaining enough independent data to ensure robust and unbiased model validation during testing.
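The 70:30 partition described above can be illustrated with a minimal sketch. It assumes 195 flood and 195 non-flood points (inferred from the 390 total and the statement that the classes are equal in size); the function name `split_points`, the seed, and the point identifiers are hypothetical, and the authors' actual sampling procedure is not given in the text.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle labelled points and split them into training and testing sets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = points[:]               # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical dataset: (point id, label) with 1 = flood and 0 = non-flood.
points = [(i, 1) for i in range(195)] + [(i, 0) for i in range(195, 390)]
train, test = split_points(points)
print(len(train), len(test))  # 273 117
```

A plain random shuffle is used here; a stratified split that fixes the flood to non-flood ratio in each subset would be an equally plausible reading of the text.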
Flood condition factors
Natural disaster research, including flood susceptibility modeling, considers various factors that might cause or mitigate disasters. Consequently, choosing this data is the most crucial step in developing flood susceptibility models since it will significantly influence the study’s quality and the conclusions’ correctness46. According to flood susceptibility research, many factors impact the occurrence, development, and progression of floods. In this regard, prior research has been considered when choosing the controlling parameters. For this purpose, 13 spatial factors affecting flooding were considered in this study3,47,48 (Fig. 6a–m).
Factors influencing flood occurrence. Map generated using ArcGIS Desktop 10.8 (Esri, https://www.esri.com).
Topographical parameters affect how water flows on the ground, accumulates, and finally drains and significantly impact the occurrence of floods4. This study extracted topography parameters, including Topographic Wetness Index (TWI), slope, elevation, stream power index (SPI), aspect, plan curvature, and profile curvature from Shuttle radar topography mission (SRTM) images with a pixel size of 30 × 30 m in GEE. Then, ArcGIS 10.8 and SAGA GIS 8.2.1 software were used to process and prepare the parameters. Land cover and Normalized Difference Vegetation Index (NDVI) parameters significantly affected flood susceptibility by affecting runoff production, water infiltration and storage, and surface runoff57. These two parameters were prepared using the Landsat-8 image in the GEE system, which had a pixel size of 30 m × 30 m from 2017 to 2022. RF classification method was used for land cover. Rainfall is a crucial factor in flood research as it directly and indirectly impacts other flooding-related variables. Important determinants of flood frequency include the location, severity, and total amount of rainfall11. The rainfall map was prepared using the average rainfall data between 2017 and 2022 at 60 rain gauge stations in Khuzestan Province. These data were obtained from the Iranian Meteorological Organization, and the kriging interpolation method was used in ArcGIS 10.8 to prepare a rainfall raster map. Calculating the distance from rivers to each pixel is essential because waterways and their tributaries serve as the primary routes for flooding69. The Euclidean distance approach has been employed to compute the spatial distance between raster pixels and rivers. The river layer was obtained from the Natural Resources Organization of Iran with a scale of 1:50,000. Lithological factors affect flood susceptibility, including infiltration, runoff, and erosion46. This factor was extracted from the Iranian geological layer on a scale of 1:100,000. 
The soil texture factor affects the occurrence of floods by affecting the infiltration, water-holding capacity, and erodibility55. Two factors, lithology and soil texture, were processed and prepared using ArcGIS 10.8 software.
Methods
Multicollinearity analysis
Multicollinearity is a strong correlation between two or more predictive variables in multivariate regression. This condition can lead to inaccurate statistical inferences, indicating a form of data disorder (Bui et al. 2011). In the regression dataset, the Variance Inflation Factor (VIF) measures multicollinearity (Pradhan et al. 2017; Shogrkhodaei et al. 2021). The values of the VIF index greater than 10 indicate multicollinearity between factors. So, if the values of a factor are greater than 10, that factor should be excluded from modeling (Razavi-Termeh et al. 2020).
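As an illustration of the VIF check, the sketch below computes the VIF of each factor as the corresponding diagonal element of the inverse correlation matrix, which is equivalent to 1/(1 − R²) from regressing that factor on the others. The data are synthetic and the function names are hypothetical; this is not the authors' implementation.

```python
import random

def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF of each factor: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Synthetic example: factor c is almost a copy of factor a, factor b is independent.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(200)]
b = [rng.gauss(0, 1) for _ in range(200)]
c = [x + 0.05 * rng.gauss(0, 1) for x in a]
values = vif([a, b, c])
flagged = [name for name, v in zip("abc", values) if v > 10]  # factors to exclude
print(flagged)
```

The tolerance reported alongside the VIF is simply its reciprocal, so a VIF threshold of 10 corresponds to a tolerance threshold of 0.1.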
Frequency ratio (FR)
FR is a widely employed technique in evaluating flood susceptibility59. FR measures the likelihood of an event happening based on all the factors that influenced a similar event in the past compared to the possibility of it not happening16. The flood susceptibility assessment considers both the locations with high flood severity and the extent of the areas affected by the parameters employed in the research area (Shafapour Tehrany et al. 2019) (Eq. 1).
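In Eq. 1, FR is the ratio of X to Y, \(\text{FR}=\text{X}/\text{Y}\). X represents the proportion of the flood surface area that falls within a subclass of a parameter that affects flooding. In contrast, Y represents the proportion of the area of that parameter occupied by the same subclass. An FR above 1 therefore marks a subclass in which floods are over-represented relative to its extent.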
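The FR calculation can be sketched as follows. The class labels and counts are hypothetical, chosen only to make the arithmetic easy to follow; flood points stand in for flood surface area, and pixel counts stand in for class area.

```python
def frequency_ratio(flood_counts, pixel_counts):
    """FR per class: share of floods in the class divided by share of area in the class."""
    total_flood = sum(flood_counts.values())
    total_pixels = sum(pixel_counts.values())
    ratios = {}
    for cls, pixels in pixel_counts.items():
        x = flood_counts.get(cls, 0) / total_flood   # X: proportion of floods in this class
        y = pixels / total_pixels                    # Y: proportion of area in this class
        ratios[cls] = x / y
    return ratios

# Hypothetical distance-to-river classes (not the values in Table 2).
flood_counts = {"0-100 m": 50, "100-500 m": 30, ">500 m": 20}
pixel_counts = {"0-100 m": 2000, "100-500 m": 3000, ">500 m": 5000}
fr = frequency_ratio(flood_counts, pixel_counts)
print({cls: round(value, 2) for cls, value in fr.items()})
# {'0-100 m': 2.5, '100-500 m': 1.0, '>500 m': 0.4}
```

In this toy example the class nearest the river holds 50% of the floods but only 20% of the area, which gives FR = 2.5.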
CatBoost algorithm
In 2017, the Russian search engine company Yandex introduced CatBoost, a gradient boosting algorithm. Owing to its improved handling of features and its correction of prediction shift, CatBoost outperforms conventional Gradient Boosting Decision Tree (GBDT) methods7. These advances guard against overfitting, strengthen the model’s ability to generalize, and produce more precise prediction outcomes72. Features of the CatBoost algorithm include ordered boosting to overcome target leakage, suitability for small datasets, native handling of categorical features, and support for various data types and formats28,30. The output of the CatBoost algorithm’s estimation is described as follows19 (Eq. 2):
where \(\text{H}\left({\text{x}}_{\text{i}}\right)\) is a decision tree function of explanatory variables \({\text{x}}_{\text{i}}\), and \({\text{R}}_{\text{j}}\) is the disjoint region corresponding to the leaves of the tree.
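The display form of Eq. 2 is not reproduced in this text. A common way to write a decision tree function over disjoint leaf regions, consistent with the symbols defined above, is given here as an assumption; \({\text{c}}_{\text{j}}\) (the value predicted in leaf j) and J (the number of leaves) are symbols introduced for illustration.

```latex
\[
\text{H}\left({\text{x}}_{\text{i}}\right)=\sum_{\text{j}=1}^{\text{J}}{\text{c}}_{\text{j}}\,\mathbb{1}\left({\text{x}}_{\text{i}}\in {\text{R}}_{\text{j}}\right)
\]
```

Here \(\mathbb{1}(\cdot)\) is the indicator function, so each sample receives the value of the single leaf region that contains it.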
The CatBoost algorithm processes samples with random permutations and mean-label value calculation methods. Additionally, it effectively reduces the impact of noise from low-frequency categorical data by employing a prior distribution term. This approach optimizes processing capacity for high-dimensional sparse data using a base model of a fully symmetric tree33.
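The permutation-based mean-label calculation with a prior term can be sketched as an ordered target statistic: each sample's category is encoded using only the labels of samples that precede it in the permutation. This is a simplified illustration of the idea, not CatBoost's internal code; the function name, the prior weight `a`, and the toy soil-texture data are assumptions.

```python
def ordered_target_statistics(categories, labels, prior, a=1.0):
    """Encode each category from the labels of earlier samples only.

    Samples are assumed to be already arranged in a random permutation.
    encoded_i = (sum of earlier labels with the same category + a * prior)
                / (number of earlier samples with the same category + a)
    """
    sums, counts, encoded = {}, {}, []
    for cat, y in zip(categories, labels):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        encoded.append((s + a * prior) / (c + a))  # uses history only, never y itself
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded

# Toy permutation of four samples (1 = flood, 0 = non-flood).
categories = ["loam", "clay", "loam", "loam"]
labels = [1, 0, 0, 1]
print(ordered_target_statistics(categories, labels, prior=0.5))
# [0.5, 0.5, 0.75, 0.5]
```

Because a sample's own label never enters its encoding, the target cannot leak into the feature, and the prior keeps the estimate stable for categories that have been seen only a few times.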
Whale optimization algorithm (WOA)
The WOA, first proposed by Mirjalili and Lewis in 2016, is a swarm intelligence optimization algorithm. It is inspired by the natural hunting mechanism of humpback whales and mimics their shrinking encirclement of prey, spiral position updates, and random search for prey70. In the WOA, each candidate solution is represented as a whale, and each whale attempts to move to a new position in the search space, using the best solution in the group as its reference. Whales use two mechanisms to search for prey and attack: encircling prey and creating bubble nets. In optimization terms, exploration of the search space corresponds to whales searching for prey, and exploitation corresponds to the attack behavior23. The steps of the WOA algorithm are described below5:
a) During the initial hunting phase, whales locate the prey and encircle it. Because the position of the optimum is not known in advance, the algorithm treats the best position found so far as the location of the prey. The whales therefore navigate toward this position using Eqs. 3–6 (ref. 23):
The position of the whale in the next iteration, \(\overrightarrow{\text{X}} \left(\text{t}+1\right)\), is determined by the position of the best solution, \({\overrightarrow{\text{X}}}^{*}\), along with the coefficient vectors \(\overrightarrow{\text{A}}\) and \(\overrightarrow{\text{C}}\). Additionally, a random vector \(\overrightarrow{\text{r}}\) in the range [0,1] and a number \(\overrightarrow{\text{a}}\) that decreases from two to zero over the iterations are used. Adjusting the vectors \(\overrightarrow{\text{A}}\) and \(\overrightarrow{\text{C}}\) in the position update lets the whales relocate to various positions around the best solution, thereby improving the solution23.
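The display equations are not reproduced in this text. For reference, the encircling step is commonly written as follows in the standard WOA formulation; the ordering of these four expressions relative to Eqs. 3–6 is an assumption, and \(\overrightarrow{\text{D}}\) (the distance to the best solution) is a symbol not defined in the surrounding text.

```latex
\[
\overrightarrow{\text{D}}=\left|\overrightarrow{\text{C}}\cdot {\overrightarrow{\text{X}}}^{*}\left(\text{t}\right)-\overrightarrow{\text{X}}\left(\text{t}\right)\right|
\]
\[
\overrightarrow{\text{X}}\left(\text{t}+1\right)={\overrightarrow{\text{X}}}^{*}\left(\text{t}\right)-\overrightarrow{\text{A}}\cdot \overrightarrow{\text{D}}
\]
\[
\overrightarrow{\text{A}}=2\,\overrightarrow{\text{a}}\cdot \overrightarrow{\text{r}}-\overrightarrow{\text{a}}
\]
\[
\overrightarrow{\text{C}}=2\,\overrightarrow{\text{r}}
\]
```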
b) The second stage is called exploration. To converge efficiently toward the global optimum, the algorithm needs both the exploitation and exploration phases. During the exploration phase, the search agents do not move toward the best solution found so far. Instead, each agent randomly chooses another search agent and moves relative to it. To facilitate this movement, vector \(\overrightarrow{\text{A}}\) is utilized (Eqs. 7–8)23:
The search agents continue updating their positions until the termination condition is met, at which point the algorithm returns the best solution found.
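The WOA steps can be summarized in a compact sketch. It follows the standard formulation (encircling when |A| < 1, random exploration otherwise, and a spiral bubble-net update chosen with probability 0.5) and minimizes a simple test function rather than the NRMSE of a CatBoost model. The function names, population size, iteration count, and bounds are illustrative assumptions, not the settings used in this study.

```python
import math
import random

def woa_minimize(objective, bounds, n_whales=20, n_iter=50, seed=0):
    """Minimal Whale optimization sketch; returns (best position, best value, history)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(vec):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(vec, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_fit = objective(best)
    history = [best_fit]
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter                  # decreases linearly from 2 toward 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best solution; otherwise explore around a random whale
                target = best if abs(A) < 1 else rng.choice(whales)
                new = [target[j] - A * abs(C * target[j] - x[j]) for j in range(dim)]
            else:
                # spiral (bubble-net) update around the best solution
                l = rng.uniform(-1, 1)
                new = [abs(best[j] - x[j]) * math.exp(l) * math.cos(2 * math.pi * l) + best[j]
                       for j in range(dim)]
            whales[i] = clip(new)
            fit = objective(whales[i])
            if fit < best_fit:                      # remember the best solution seen so far
                best, best_fit = whales[i][:], fit
        history.append(best_fit)
    return best, best_fit, history

def sphere(v):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(c * c for c in v)

best, best_fit, history = woa_minimize(sphere, [(-5.0, 5.0)] * 3)
print(best_fit <= history[0])  # True
```

For hyperparameter tuning, `objective` would instead train a model with the hyperparameters encoded in the position vector and return its validation error.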
Zebra optimization algorithm (ZOA)
The ZOA, a metaheuristic algorithm that debuted in 2022, takes its cues from how zebras act in the wild63. In the social life of zebras in nature, there are two behaviors: “searching for food and defense tactics against attackers,” which are essential. The zebra leader allows the rest of the pack to follow in their footsteps to get closer to the food source21. The Zebras have two defense techniques against their enemies: the first one is the zigzag flight pattern they use to escape, and the second one, they may occasionally try to confuse or frighten the hunter by gathering63. ZOA mimics the actions of zebras as they forage for food and defend themselves from predators. Finding the right mix between exploring and exploiting could be the key to using the ZOA to solve optimization challenges in the real world21,63. The following presents mathematical simulations of natural zebra behavior for the ZOA model22.
Initialization
Every zebra represents a candidate solution, and the area in which the zebras are situated represents the search space of the problem. A single vector is sufficient to represent each zebra. The ZOA population matrix is constructed according to the following equation22.
P, \({\text{P}}_{\text{i}}\), and \({\text{P}}_{\text{i},\text{j}}\) are the zebra population, the ith zebra candidate, and the value of the jth problem variable suggested by the ith zebra candidate, respectively. N represents the number of population members (zebras), and m represents the number of variables to be set. The values of the fitness function are described by Eq. 10 (ref. 22).
F and \({\text{F}}_{\text{i}}\) are the column vector containing the fitness function values and the fitness function value determined for the ith zebra, respectively.
Foraging activity
The most competent member of the zebra population becomes the leader and guides the other group members toward its position in the search space. The following equations are used to model the zebras’ location update throughout the foraging period22:
\({\text{P}}_{\text{i}}^{\text{new},\text{S}1}\) represents the updated position of the ith zebra according to the first stage, \({\text{P}}_{\text{i},\text{j}}^{\text{new},\text{S}1}\) is the value of its jth dimension, \({\text{F}}_{\text{i}}^{\text{new},\text{S}1}\) represents its fitness function value, ZL represents the zebra leader, \({\text{ZL}}_{\text{j}}\) is its jth dimension, r is a random value between 0 and 1, and \(\text{I}=\text{round}(1+\text{rand})\), so I takes the value 1 or 2.
Anti-Predator defense technique
Here, we update the search space placements of the ZOA population’s individuals by mimicking zebras’ defensive strategies22. This stage includes two techniques: The defensive technique against the lion and the defensive technique against other predators.
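The two ZOA stages can be sketched as below. The foraging update follows the symbols defined above; the defense-stage expressions (a small zigzag perturbation with constant 0.01 that shrinks over the iterations, and a move toward a randomly chosen zebra, each selected with probability 0.5) are taken from the commonly cited ZOA formulation and are an assumption here, since this text does not give those equations. Names, population size, and the test function are illustrative.

```python
import random

def zoa_minimize(objective, bounds, n_zebras=20, n_iter=50, seed=0):
    """Minimal Zebra optimization sketch; returns (best position, best value, history)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(vec):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(vec, bounds)]

    def try_move(i, candidate):
        # Greedy acceptance: keep the new position only if it improves the fitness.
        candidate = clip(candidate)
        f = objective(candidate)
        if f < fit[i]:
            pop[i], fit[i] = candidate, f

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_zebras)]
    fit = [objective(p) for p in pop]
    history = [min(fit)]
    for t in range(1, n_iter + 1):
        leader = pop[fit.index(min(fit))][:]        # ZL: best zebra of this iteration
        for i in range(n_zebras):
            # Stage 1, foraging: move toward the leader.
            I = round(1 + rng.random())             # I is 1 or 2
            try_move(i, [pop[i][j] + rng.random() * (leader[j] - I * pop[i][j])
                         for j in range(dim)])
            # Stage 2, defense: choose one of two strategies at random.
            if rng.random() <= 0.5:
                # Zigzag escape from a lion: small perturbation that shrinks over time.
                try_move(i, [pop[i][j] + 0.01 * (2 * rng.random() - 1)
                             * (1 - t / n_iter) * pop[i][j] for j in range(dim)])
            else:
                # Gathering against other predators: move toward a random zebra.
                other = pop[rng.randrange(n_zebras)][:]
                I = round(1 + rng.random())
                try_move(i, [pop[i][j] + rng.random() * (other[j] - I * pop[i][j])
                             for j in range(dim)])
        history.append(min(fit))
    best_index = fit.index(min(fit))
    return pop[best_index], fit[best_index], history

def sphere(v):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(c * c for c in v)

best, best_fit, history = zoa_minimize(sphere, [(-5.0, 5.0)] * 3)
print(best_fit <= history[0])  # True
```

Because every move is accepted only when it lowers the fitness, the best value in the population can never get worse from one iteration to the next.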
Validation
Several assessment measures were employed to assess and validate the created models and FSMs. These metrics were evaluated in two categories: evaluation of the developed models (Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Square Error (RMSE) indices) and evaluation of flood susceptibility maps (Receiver Operating Characteristic (ROC)). Equations 13–15 can be used to determine the models’ evaluation metrics by comparing the actual values with their predictions12,47.
The predicted value is denoted by \({\text{y}}^{\prime }\), the actual value is represented by y, and the number of samples is denoted by n. The ROC curve and the Area Under the Curve (AUC) are commonly employed in natural disaster research to evaluate the efficacy of susceptibility models created to analyze flood disasters35. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR), which allows the effectiveness of binary classification models to be evaluated (Eq. 16)12.
The AUC is a numerical value that measures the performance of a classification model. It ranges from 0 to 1 and is calculated using Eq. 17 (ref. 47).
P is the total number of flood points, while N is the total number of non-flood points.
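As an illustration, the sketch below builds ROC points from predicted susceptibility scores and integrates them with the trapezoidal rule. This is the standard threshold-sweep definition of the AUC and may differ in form from Eq. 17, which is not reproduced in this text; the labels and scores are made up.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs at every distinct score threshold, starting from (0, 0)."""
    p = sum(y_true)                 # P: number of flood points
    n = len(y_true) - p             # N: number of non-flood points
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n, tp / p))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example: three flood points (1) and three non-flood points (0).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(y_true, scores)), 4))  # 0.8889
```

The value 8/9 matches the pairwise interpretation of the AUC: in eight of the nine flood and non-flood pairs, the flood point receives the higher score.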
Model implementation
Models for flood susceptibility were created in the Google Colab environment using Python. This is accomplished by taking the values of thirteen essential elements at the sites where floods have occurred as input and producing a likelihood prediction for each pixel in the research region. The CatBoost model was optimized using two metaheuristic algorithms (WOA and ZOA) with a population of 100 and 50 iterations. The control parameters of the metaheuristic algorithms were determined through trial and error. These two algorithms minimize the objective function in different iterations. The objective function for optimizing the hyperparameters of the CatBoost model is the NRMSE (Normalized Root Mean Squared Error) index (Eq. 18)32.
where \({\text{y}}_{\text{max}}\) and \({\text{y}}_{\text{min}}\) are the maximum and minimum observed values, respectively. The two swarm-based algorithms adjust the CatBoost hyperparameters over successive iterations to minimize this objective.
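The error metrics and the NRMSE objective can be written compactly as follows. The sketch assumes the usual definitions (NRMSE as RMSE divided by the range of the observed values, as the symbols above suggest); the labels and predictions are hypothetical.

```python
import math

def mae(y, y_pred):
    """Mean Absolute Error."""
    return sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)

def mse(y, y_pred):
    """Mean Squared Error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)

def rmse(y, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(mse(y, y_pred))

def nrmse(y, y_pred):
    """RMSE normalized by the range of the observed values."""
    return rmse(y, y_pred) / (max(y) - min(y))

# Hypothetical observed labels (1 = flood, 0 = non-flood) and predicted probabilities.
y = [1, 0, 1, 0]
y_pred = [0.9, 0.2, 0.6, 0.1]
print(round(mae(y, y_pred), 3), round(mse(y, y_pred), 3), round(rmse(y, y_pred), 3))
# 0.2 0.055 0.235
```

When the observed values are the binary labels 0 and 1, their range is 1, so the NRMSE coincides with the RMSE in this toy example.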
Result
Result of data preprocessing
In the first step, the spatial factors were converted into spatial layers, normalized between 0 and 1, and then tested for multicollinearity using the VIF. The results of the VIF test are summarized in Table 1. All factors have VIF values below the critical value of 10, suggesting that multicollinearity is not a problem and that all factors could be used in the modeling process. The slope had the largest VIF value (5.948), which is still below the threshold, while the aspect (1.065) showed very little multicollinearity. The tolerance value for all factors is also greater than 0.1, confirming the absence of multicollinearity among the conditioning factors.
Using an FR method, the second step assessed the relationship between the flood occurrence points and the probability of occurrence in different influencing factors. The results are detailed in Table 2. For NDVI, the class of 0.21 to 0.35 had the highest FR of 1.13, meaning that areas with these values are more likely to have floods. For altitude, the class of 0–50 had the largest FR of 1.13, showing that floods most likely occur in these areas. Slope 0–2.6 had the highest FR of 1.15, showing that flat regions are more likely to flood. Proximity to the river is a significant factor in showing the flood risk, as the 0–100 m had the highest FR (2.65). For TWI, the class of 6.2–7.7 had the highest FR of 1.41, showing that TWI substantially impacts flood occurrence. The class of SPI values from 200 to 300 had the greatest FR of 1.21, indicating a medium correlation with flooding.
In comparison, for profile curvature, the FR value of 1.05 for values between -0.001 and 0.0006 indicates that profile curvature had a minor impact on flood susceptibility. The class of rainfall between 97 and 105 mm had the greatest FR of 1.28, suggesting this range of rain significantly impacts the occurrence of floods, while the FR for the convex plan curvature class was the greatest (FR of 1.006) and indicates the convex plan curvature class has a minor impact on flood risk. The FR for the loam soil texture class was the greatest at 1.13, indicating a moderate effect on flood events. Meanwhile, the class of NW facing had an FR of 1.29, suggesting a directional significance to flood risk. The lithology class Qft2 had the greatest FR of 1.12, indicating a significant association with flooding. The results of the land cover showed that the water body class with a weight of 1.15 had the greatest impact on the occurrence of floods in the study area.
Result of model optimization
The WOA and ZOA, both swarm-based algorithms, were used to optimize the hyperparameters of the CatBoost model. The hyperparameters were adjusted over successive iterations to minimize the objective function (NRMSE). With a population size of 50 and 40 epochs, the WOA reached an objective function value of 0.54. With a population of 100 and the same number of epochs, the ZOA reached an objective function value of 0.49. The optimized hyperparameters obtained with the WOA and ZOA are summarized in Table 3. Compared with the WOA, the ZOA was therefore more effective at optimizing the CatBoost model.
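The search loop behind this kind of tuning can be sketched as follows. This is a simplified version of the WOA update rules of Mirjalili and Lewis, not the implementation used in the study: the coefficients A and C are drawn once per whale rather than per dimension, the spiral constant is fixed at 1, and a toy quadratic stands in for the NRMSE that would be obtained by training CatBoost with a candidate setting. The two searched parameters and their bounds are hypothetical.

```python
import math
import random


def whale_optimization(objective, bounds, population=50, epochs=40, seed=0):
    """Minimize `objective` inside box `bounds` with a simplified WOA."""
    rng = random.Random(seed)

    def clip(position):
        return [min(max(x, lo), hi) for x, (lo, hi) in zip(position, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(population)]
    scores = [objective(w) for w in whales]
    best_index = min(range(population), key=scores.__getitem__)
    best, best_score = list(whales[best_index]), scores[best_index]
    history = [best_score]

    for epoch in range(epochs):
        a = 2.0 - 2.0 * epoch / epochs  # decreases linearly from 2 towards 0
        for i in range(population):
            x = whales[i]
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale when |A| < 1, otherwise explore around a random whale.
                target = best if abs(A) < 1.0 else whales[rng.randrange(population)]
                new = [t - A * abs(C * t - xi) for t, xi in zip(target, x)]
            else:
                # Logarithmic spiral around the best whale (spiral constant b = 1).
                l = rng.uniform(-1.0, 1.0)
                new = [abs(b - xi) * math.exp(l) * math.cos(2.0 * math.pi * l) + b
                       for b, xi in zip(best, x)]
            whales[i] = clip(new)
        # Re-evaluate the population and keep the best solution found so far.
        for w in whales:
            s = objective(w)
            if s < best_score:
                best, best_score = list(w), s
        history.append(best_score)
    return best, best_score, history


def toy_objective(position):
    """Stand-in for NRMSE (assumption): a bowl with its minimum at (0.1, 6)."""
    rate, depth = position
    return (rate - 0.1) ** 2 + ((depth - 6.0) / 10.0) ** 2


# Hypothetical search ranges for two hyperparameters.
BOUNDS = [(0.01, 0.3), (2.0, 10.0)]
best, best_score, history = whale_optimization(toy_objective, BOUNDS)
print("best position:", best, "objective:", best_score)
```

In an actual tuning run, `toy_objective` would be replaced by a function that trains the model with the candidate hyperparameters and returns the validation NRMSE, which makes each evaluation far more expensive than in this toy setting.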
Comparison of state-of-the-art models
The three developed models, CatBoost, CatBoost-WOA, and CatBoost-ZOA, were trained and tested with the training and test datasets. To assess their performance, error metrics were calculated from the differences between the predicted and actual flooding instances (Table 4). The CatBoost-ZOA model outperformed the other models. It had the lowest MSE on both the training dataset (0.025) and the test dataset (0.2), the lowest RMSE (0.157 for training and 0.448 for testing), and the lowest MAE (0.107 for training and 0.361 for testing). These results indicate that the CatBoost-ZOA model is the most accurate and has the best generalization ability.
Comparing the standalone CatBoost and CatBoost-WOA models on the test dataset, the CatBoost-WOA model achieved better MSE (0.216) and MAE (0.369) values. RMSE improved only marginally, from 0.470 for the standalone CatBoost model to 0.464 for the CatBoost-WOA model. Taken together, these results indicate that swarm-based optimization, and ZOA integration in particular, improves the accuracy and reliability of identifying flood-prone areas using the proposed approach.
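For reference, the three error metrics can be computed as in the short sketch below, which uses their standard definitions. The label and prediction vectors are made up for illustration; the last part only checks that the MSE and RMSE values quoted in the text are mutually consistent (RMSE is the square root of MSE, up to rounding).

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up flood labels (1 = flood, 0 = non-flood) and predicted probabilities.
actual = [1, 0, 1, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.8, 0.1]
print(f"MSE={mse(actual, predicted):.3f}, "
      f"RMSE={rmse(actual, predicted):.3f}, "
      f"MAE={mae(actual, predicted):.3f}")

# (MSE, RMSE) pairs quoted in the text above.
reported = {
    "CatBoost-ZOA, training": (0.025, 0.157),
    "CatBoost-ZOA, test": (0.2, 0.448),
    "CatBoost-WOA, test": (0.216, 0.464),
}
for name, (m, r) in reported.items():
    print(f"{name}: sqrt(MSE)={math.sqrt(m):.3f}, reported RMSE={r}")
```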
Determining the importance of spatial factors
The importance of the factors that affect flooding was evaluated using the CatBoost-ZOA model (Table 5). Aspect was the most important factor, with a value of 14.65, suggesting that it significantly influences flood incidence. Distance to the river (9.7) and TWI (9.51) also had high values, which underlines the importance of proximity to water bodies and topographic wetness in flood estimation. Altitude (9.006) and rainfall (8.77) followed, showing that these two factors also contribute substantially to flood susceptibility. Soil texture (8.18), plan curvature (7.1), NDVI (7.07), land cover (6.96), and profile curvature (6.87) also had considerable importance in the model, suggesting they are helpful for flood risk assessment. Slope (5.16), SPI (3.72), and lithology (3.17) still contributed to the model, but less than the factors mentioned above.
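The ranking can be reproduced directly from the values quoted above. The sketch below only sorts those thirteen numbers and sums them; it does not recompute importances from a model. That the values add up to roughly 100 suggests they are expressed as percentage shares, which is an inference from the numbers rather than something stated in the text.

```python
# Importance values as quoted in the text (Table 5).
importance = {
    "aspect": 14.65,
    "distance to river": 9.7,
    "TWI": 9.51,
    "altitude": 9.006,
    "rainfall": 8.77,
    "soil texture": 8.18,
    "plan curvature": 7.1,
    "NDVI": 7.07,
    "land cover": 6.96,
    "profile curvature": 6.87,
    "slope": 5.16,
    "SPI": 3.72,
    "lithology": 3.17,
}

# Rank factors from most to least important.
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
total = sum(importance.values())

for rank, (factor, value) in enumerate(ranked, start=1):
    print(f"{rank:2d}. {factor:<18} {value:6.3f}  ({100 * value / total:.1f}% of total)")
print(f"sum of importances: {total:.3f}")
```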
Flood-prone areas mapping and validation
To identify flood-prone locations across the whole study area, the three models (CatBoost, CatBoost-WOA, and CatBoost-ZOA) were used to predict the probability of flood occurrence for each pixel. The probability ranges from 0 to 1, where 0 indicates no risk and 1 the highest probability of flooding. The natural breaks technique, which selects class boundaries that follow the natural groupings within the data, was employed to classify flood susceptibility into five levels: very low, low, moderate, high, and very high. This classification makes it easy to visualize and interpret the spatial distribution of flood susceptibility, as shown in Fig. 7.
Flood susceptibility maps generated by three models: (a) CatBoost, (b) CatBoost-WOA, and (c) CatBoost-ZOA models. Map generated using ArcGIS Desktop 10.8 (Esri, https://www.esri.com).
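The natural breaks classification used for these maps can be sketched as a one-dimensional partition that minimizes the within-class sum of squared deviations. The maps themselves were produced in ArcGIS; the dynamic-programming version below is an illustrative stand-in for that tool, and the probability values are made up. It assumes distinct input values, since tied values on a class boundary would need extra handling.

```python
import bisect

LEVELS = ["very low", "low", "moderate", "high", "very high"]


def natural_breaks(values, n_classes):
    """Return the lower bounds of classes 2..n_classes that minimize within-class variance."""
    data = sorted(values)
    n = len(data)
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: minimal total SSD when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the stored split points to recover the class boundaries.
    breaks, j = [], n
    for k in range(n_classes, 1, -1):
        i = split[k][j]
        breaks.append(data[i])
        j = i
    return sorted(breaks)


def classify(probability, breaks):
    """Map a probability to a susceptibility level using the break values."""
    return LEVELS[bisect.bisect_right(breaks, probability)]


# Made-up pixel probabilities (assumption: illustrative only).
probabilities = [0.02, 0.05, 0.08, 0.22, 0.25, 0.28, 0.45, 0.50,
                 0.70, 0.72, 0.75, 0.95, 0.97, 0.99]
breaks = natural_breaks(probabilities, n_classes=5)
print("break values:", breaks)
print("0.50 ->", classify(0.50, breaks))
```

The triple loop is quadratic in the number of values per class, so for a full raster the breaks would normally be estimated from a sample of pixels or computed with a dedicated GIS routine.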
To evaluate the flood susceptibility maps created by the three models, actual flood occurrence points derived from satellite imagery were used. Model performance was measured using the ROC curve, a graphical tool for assessing the diagnostic ability of a model, and the AUC values summarize the predictive performance of each model (Table 6 and Fig. 8). The CatBoost-ZOA model had the highest AUC value of 0.872, with a 95% confidence interval (CI) of 0.822–0.912, showing the best accuracy in predicting flood-prone areas. The CatBoost-WOA model, with an AUC of 0.850 (95% CI 0.797–0.893), showed slightly lower but still strong predictive performance. The baseline CatBoost model had an AUC of 0.842 (95% CI 0.789–0.886), indicating good performance that was nonetheless lower than that of the optimized models. The validation results showed that the CatBoost-ZOA model accurately maps flood-prone zones and can be used as a practical tool for flood risk assessment and management in regions at risk of flooding. The higher AUC of the CatBoost-ZOA model is attributed to the ZOA producing better hyperparameters.
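The AUC reported here has a simple interpretation: it is the probability that a randomly chosen flood point receives a higher predicted susceptibility than a randomly chosen non-flood point, with ties counted as one half. The sketch below computes it that way on made-up labels and scores. It does not reproduce the confidence intervals, which would require an additional procedure such as bootstrapping.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up validation points: 1 = observed flood, 0 = non-flood (assumption).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```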
Summary of result
This study combines swarm-based metaheuristic algorithms with boosting algorithms in flood-prone area mapping. The FR method highlighted the classes that contribute most to flood incidence, including proximity to the river, low altitude, and gentle slope. The WOA and ZOA were applied to optimize the hyperparameters of the CatBoost model. Of the two swarm-based algorithms, the ZOA performed better than the WOA, reaching a lower objective function value (NRMSE of 0.49 versus 0.54) and a higher optimization efficiency. The performance of the three models (CatBoost, CatBoost-WOA, and CatBoost-ZOA) was compared using different performance metrics. The CatBoost-ZOA model demonstrated the best performance in flood-prone area mapping, with the lowest MSE, RMSE, MAE, and MAPE values on the training and test datasets.
In this research, the CatBoost-ZOA model was used to assess the importance of spatial factors. According to the model, the most significant factors, in order of importance, were aspect, distance to the river, TWI, and altitude. The model performance was evaluated using flood occurrence points. The ROC curve analysis showed that the CatBoost-ZOA model had an AUC of 0.872, the best accuracy for classifying floods in this study. Therefore, the CatBoost-ZOA model is a suitable initial model for the spatial assessment of floods and can support the development of flood risk assessment and management tools.
Discussion
Policy implication
The results of this study have significant implications for flood risk management and policy control. Decision-makers can prioritize areas with the highest flood risk for interventions, such as building flood defense systems, strengthening drainage systems, and implementing early warning systems.
The analysis of flood-conditioning factors based on FR offers invaluable data contributing to risk-informed decision-making. For example, flood-risk zones within 100 m of rivers are characterized by the highest FR, making these areas suitable for no-build zones or conservation buffers with strict development approvals. Similarly, flat, low-elevation flood-prone areas that are observed to have < 2.6° slope and elevations < 50 m warrant site-specific planning for flood controls including enhanced drainage infrastructure, stormwater detention basins, and elevated construction.
Areas characterized by high TWI and SPI values indicate zones of water accumulation and concentrated runoff potential where infiltration zones, bio-swales, or engineered wetlands can be implemented. The FR findings related to rainfall justify local early warning or alarm systems, as well as rainfall intensity thresholds that trigger evacuation alerts or local preparedness measures. From a land-use planning perspective, areas with low vegetation cover and water-sensitive soil or lithological types should be gradually converted through restoration, afforestation, or erosion control measures that improve infiltration and permeability.
Urban planners can reduce flood risk by including green spaces and permeable surfaces to increase water absorption and reduce runoff. Accurate flood risk maps will enable the development of more effective evacuation plans and ensure that residents of high-risk areas are quickly moved to safer locations. These models can also support real-time decision-making during flood events.
In the future, artificial intelligence methods could underpin automatic systems that, without user intervention, detect and identify flood-prone areas before a flood occurs based on the environmental characteristics of the region and, through Internet of Things (IoT) systems, issue warnings to users exposed to flooding. In this research, artificial intelligence was used to determine flood-prone areas; in the future, these algorithms could also be used for flood early warning tailored to the context of users.
Insights from spatial factors
Based on these results, areas with low elevations and gentle slopes in the study area were more affected by floods. Water flows downhill, and low-elevation areas are often the final destinations for runoff from uplands13. Water flows more slowly on a shallow slope, allowing more of it to pool73. Areas with low vegetation cover had a more significant impact on flooding. A low NDVI indicates sparse or unhealthy vegetation, which means less water uptake and increased runoff potential74,75,76,77,78,79,80. Lower values of SPI and higher values of TWI also had a more significant impact on floods in the study area. A low SPI indicates a stream with less erosive power and therefore less potential for channel incision, which can reduce the capacity of the channel to handle increased water flow. Consequently, flooding may be more likely36. Areas with high TWI values have more potential for water retention because of factors such as flat terrain, concave slopes, and large upstream contributing areas61.
In this study, distances closer to the river affected the intensification of the occurrence of floods. Rivers can erode their banks, leading to instability and potential breaches, which can cause flooding6. Areas with somewhat more rainfall than the rest of the regions were more likely to flood, given the low average rainfall in the study area. When rainfall exceeds the ability of land to absorb it, excess water flows into rivers, streams, and other bodies of water, causing them to overflow14. According to the curvature criteria results, concave areas were more likely to flood in the region. Concave areas often have higher groundwater levels, which can contribute to flooding during heavy rainfall24.
Based on these results, areas with loam soil texture and lithology unit Qft2 significantly impacted flooding. Loams can help reduce surface runoff compared to clay or sandy soils, which can contribute to flooding46. The NW aspect may retain more moisture, potentially increasing the runoff and flood risk during heavy rainfall events56. However, the effect of aspect may differ across geographic conditions and regions. The land cover results showed that flooding is more common in water body areas. Because they are located in low-altitude regions and lie on the path of floods, water bodies can contribute to the occurrence of floods in these areas43.
Comparison of models
The flood susceptibility modeling results showed that the accuracy of all developed models was greater than 80%, which indicates good modeling accuracy in the study area. Swarm-based metaheuristic algorithms increased the accuracy of the standalone CatBoost algorithm by 0.8–3 percentage points. A swarm’s intelligence and behavior are unique to each individual. Swarm intelligence is a potent instrument for tackling difficult problems because of the coordinated actions of many individuals. Swarm intelligence algorithms often use parallel processing, allowing shorter computation times and more efficient solutions. The best solutions are often stored in memory, and the search space is conserved throughout iterations20.
Swarm-based metaheuristic algorithms combined with machine-learning algorithms have been used successfully to model flood susceptibility. Examples include PSO for ANFIS optimization (with 94.5% accuracy)60, the Grey Wolf Optimizer (GWO) for SVR optimization (with 85.7% accuracy)42, and IWO for RF optimization (with 98.3% accuracy)47. Among the algorithms used in this research, the ZOA was more accurate than the WOA. The disadvantages of the WOA include slow convergence and a tendency to get stuck in local optima when dealing with complex high-dimensional problems37. The ZOA has the advantages of a simple structure and easy implementation, and it can balance exploration (searching for new potential solutions) and exploitation (refining existing solutions).
Limitations and suggestion
This research presents innovations in flood susceptibility modeling, combining boosting and swarm-based metaheuristic algorithms for the first time. This research has demonstrated the ability of GeoAI algorithms to model flood susceptibility. However, it has limitations that future work in this field can address. First, the models examined in this study are tailored to the context of Shushtar County, and their generalizability to other geographic locations is likely to be limited. Flood susceptibility is a localized issue that depends on factors such as local environmental characteristics, hydrological processes, and land use practices, all of which vary from place to place. The model parameters and performance may therefore not transfer outside this context without significant adaptation and retraining on local data. Future research should test and validate the proposed methodology in different geographic or climatic contexts to better establish its applicability and robustness.
This research investigated flood susceptibility only. To reach a comprehensive framework for flood management, it is necessary to examine susceptibility, vulnerability, and hazard simultaneously. This study also uses a traditional approach to displaying flood susceptibility maps. Given recent developments in displaying spatial maps in different forms, future research can use technologies such as augmented reality and virtual reality to improve the visualization of flood susceptibility maps. A comprehensive sensor network for real-time flood monitoring and warning in high-risk areas can also be considered in future research.
Conclusion
This study tackled the challenge of identifying flood-prone areas by applying advanced computational techniques. We developed and tested a novel hybrid framework that combines boosting algorithms with swarm-based metaheuristics to improve the accuracy and efficiency of flood susceptibility modeling. By integrating CatBoost with the ZOA and the WOA, we achieved enhanced predictive performance, with the CatBoost-ZOA model reaching a modeling accuracy of 87.2%. The ZOA improved accuracy by 3% compared to the baseline, confirming the value of hyperparameter optimization through swarm intelligence. Combining boosting and swarm-based algorithms shows superior performance in distinguishing between flood-prone and non-flood-prone areas. In the study area, low altitude and proximity to rivers are the leading drivers of flood spread. Therefore, combining structural flood control measures with the non-structural approach proposed in this research can be considered a practical step in flood management and control in the study area. In addition, considering that most of the land cover of the study area is agricultural, floods can be controlled by creating earthen dams along the flood path and by storing water and recharging groundwater in farming areas.
Data availability
The data that support the findings of this study are available on request from the corresponding author.
References
Abba, S. I., Al-Areeq, A. M., Ghaleb, M., Kawara, A. Q. & Razavi-Termeh, S. V. Flood subsidence susceptibility mapping using persistent scatterer SAR interferometry technique coupled with novel metaheuristic approaches from Jeddah, Saudi Arabia. Neural Comput. Appl. https://doi.org/10.1007/s00521-024-09909-2 (2024).
Ahn, J. M., Kim, J. & Kim, K. Ensemble machine learning of gradient boosting (XGBOOST, LightGBM, CatBoost) and Attention-Based CNN-LSTM for harmful algal blooms forecasting. Toxins 15(10), 608. https://doi.org/10.3390/toxins15100608 (2023).
Al-Abadi, A. M. Mapping flood susceptibility in an arid region of southern Iraq using ensemble machine learning classifiers: A comparative study. Arab. J. Geosci. https://doi.org/10.1007/s12517-018-3584-5 (2018).
Alam, A., Ahmed, B. & Sammonds, P. Flash flood susceptibility assessment using the parameters of drainage basin morphometry in SE Bangladesh. Quatern. Int. 575–576, 295–307. https://doi.org/10.1016/j.quaint.2020.04.047 (2021).
Al-Fugara, A. et al. Wildland fire susceptibility mapping using support vector regression and adaptive Neuro-Fuzzy inference System-Based Whale optimization algorithm and simulated annealing. ISPRS Int. J. Geo Inf. 10(6), 382. https://doi.org/10.3390/ijgi10060382 (2021).
Ali, A., Rana, I. A., Ali, A. & Najam, F. A. Flood risk perception and communication: The role of hazard proximity. J. Environ. Manage. 316, 115309. https://doi.org/10.1016/j.jenvman.2022.115309 (2022).
Alsulamy, S., Kumar, V., Kisi, O., Kedam, N. & Rathnayake, N. Enhancing water level prediction using ensemble machine learning models: A comparative analysis. Water Resour. Manag. 1–20. https://doi.org/10.1007/s11269-025-04142-5 (2025).
Amiri, A., Soltani, K., Ebtehaj, I. & Bonakdari, H. A novel machine learning tool for current and future flood susceptibility mapping by integrating remote sensing and geographic information systems. J. Hydrol. 632, 130936. https://doi.org/10.1016/j.jhydrol.2024.130936 (2024).
Bayat-Afshary, N. & Danesh-Yazdi, M. Are the magnitude and frequency of floods increasing in Iran due to climate change? Implications from a 50-year analysis. Hydrol. Sci. J. 68(15), 2243–2261. https://doi.org/10.1080/02626667.2023.2259904 (2023).
Bhattarai, Y., Duwal, S., Sharma, S. & Talchabhadel, R. Leveraging machine learning and open-source spatial datasets to enhance flood susceptibility mapping in transboundary river basin. Int. J. Digit. Earth https://doi.org/10.1080/17538947.2024.2313857 (2024).
Billi, P., Alemu, Y. T. & Ciampalini, R. Increased frequency of flash floods in Dire Dawa, Ethiopia: Change in rainfall intensity or human impact?. Nat. Hazards 76(2), 1373–1394. https://doi.org/10.1007/s11069-014-1554-0 (2014).
Bui, D. T. et al. A novel hybrid approach based on a swarm intelligence optimized extreme learning machine for flash flood susceptibility mapping. CATENA 179, 184–196. https://doi.org/10.1016/j.catena.2019.04.009 (2019).
Cabrita, P., Montes, J., Duo, E., Brunetta, R. & Ciavola, P. The role of different total water level definitions in coastal flood modelling on a low-elevation dune system. J. Mar. Sci. Eng. 12(6), 1003. https://doi.org/10.3390/jmse12061003 (2024).
Chang, D., Li, S. & Lai, Z. Effects of extreme precipitation intensity and duration on the runoff and nutrient yields. J. Hydrol. 626, 130281. https://doi.org/10.1016/j.jhydrol.2023.130281 (2023).
Chou, T. et al. Swarm-based optimizer for convolutional neural network: An application for flood susceptibility mapping. Trans. GIS 25(2), 1009–1026. https://doi.org/10.1111/tgis.12715 (2020).
Costache, R. et al. Flash-flood susceptibility assessment using multi-criteria decision making and machine learning supported by remote sensing and GIS techniques. Rem. Sens. 12(1), 106. https://doi.org/10.3390/rs12010106 (2019).
Cui, H., Quan, H., Jin, R. & Lin, Z. Flood susceptibility mapping using novel hybrid approach of neural network with genetic quantum ensembles. KSCE J. Civ. Eng. 27(1), 431–441. https://doi.org/10.1007/s12205-022-0559-6 (2022).
Dabbeek, J., Silva, V., Galasso, C. & Smith, A. Probabilistic earthquake and flood loss assessment in the Middle East. Int. J. Disaster Risk Reduct. 49, 101662. https://doi.org/10.1016/j.ijdrr.2020.101662 (2020).
Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: gradient boosting with categorical features support. arXiv.org. https://arxiv.org/abs/1810.11363 (2018).
Dutta, T., Bhattacharyya, S., Dey, S. & Platos, J. Border Collie optimization. IEEE Access 8, 109177–109197. https://doi.org/10.1109/access.2020.2999540 (2020).
Elymany, M. M., Enany, M. A. & Elsonbaty, N. A. Hybrid optimized-ANFIS based MPPT for hybrid microgrid using zebra optimization algorithm and artificial gorilla troops optimizer. Energy Convers. Manage. 299, 117809. https://doi.org/10.1016/j.enconman.2023.117809 (2024).
Ghadi, Y. Y., Neamah, N. M., Hossam-Eldin, A. A., Alqarni, M. & AboRas, K. M. State-of-the-art frequency control strategy based on an optimal fuzzy PI-FOPDFΛ for SMES and UPFC integrated smart grids using Zebra optimization algorithm. IEEE Access 11, 122893–122910. https://doi.org/10.1109/access.2023.3328961 (2023).
Gharehchopogh, F. S. & Gholizadeh, H. A comprehensive survey: Whale optimization algorithm and its applications. Swarm Evol. Comput. 48, 1–24. https://doi.org/10.1016/j.swevo.2019.03.004 (2019).
Gunn, J. & Bradley, C. From recharge to cave to spring: transmission of a flood pulse through a complex karst conduit network, Castleton, Derbyshire (UK). Water 16(9), 1306. https://doi.org/10.3390/w16091306 (2024).
Hitouri, S. et al. Flood susceptibility mapping using SAR data and machine learning algorithms in a small watershed in northwestern Morocco. Rem. Sens. 16(5), 858. https://doi.org/10.3390/rs16050858 (2024).
Hu, P., Zhang, Q., Shi, P., Chen, B. & Fang, J. Flood-induced mortality across the globe: Spatiotemporal pattern and influencing factors. Sci. Total Environ. 643, 171–182. https://doi.org/10.1016/j.scitotenv.2018.06.197 (2018).
Islam, A. R. M. T. et al. Flood susceptibility modelling using advanced ensemble machine learning models. Geosci. Front. 12(3), 101075. https://doi.org/10.1016/j.gsf.2020.09.006 (2021).
Jabeur, S. B., Gharib, C., Mefteh-Wali, S. & Arfi, W. B. CatBoost model and artificial intelligence techniques for corporate failure prediction. Technol. Forecast. Soc. Chang. 166, 120658. https://doi.org/10.1016/j.techfore.2021.120658 (2021).
Khosravi, K. et al. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 627, 744–755. https://doi.org/10.1016/j.scitotenv.2018.01.266 (2018).
Kumar, V. et al. Leveraging level data for accurate downstream flow prediction in the Narmada River Basin with advanced machine learning models. J. Hydroinf. 27(2), 141–158. https://doi.org/10.2166/hydro.2025.131 (2025).
Kurugama, K., Kazama, S. & Chaminda, S. Flood Susceptibility Mapping Using Explainable Machine Learning Models. https://doi.org/10.31705/iserme.2023.12 (2023).
Li, Z. et al. Short-term wind power prediction based on extreme learning machine with error correction. Prot. Control Modern Power Syst. https://doi.org/10.1186/s41601-016-0016-y (2016).
Lin, N., Ma, X., Jiang, R., Wu, M. & Zhang, W. Estimation of maize residue cover using remote sensing based on adaptive threshold segmentation and CatBoost algorithm. Agriculture 14(5), 711. https://doi.org/10.3390/agriculture14050711 (2024).
Mahmoud, S. H. & Gan, T. Y. Multi-criteria approach to develop flood susceptibility maps in arid regions of Middle East. J. Clean. Prod. 196, 216–229. https://doi.org/10.1016/j.jclepro.2018.06.047 (2018).
Masroor, M. et al. Adaptive neuro fuzzy inference system (ANFIS) machine learning algorithm for assessing environmental and socio-economic vulnerability to drought: a study in Godavari middle sub-basin, India. Stoch. Env. Res. Risk Assess. 37(1), 233–259. https://doi.org/10.1007/s00477-022-02292-1 (2022).
Miranda, F. et al. A GIS-based index of physical susceptibility to flooding as a tool for flood risk management. Land 12(7), 1408. https://doi.org/10.3390/land12071408 (2023).
Mirjalili, S. & Lewis, A. The Whale optimization algorithm. Adv. Eng. Softw. 95, 51–67. https://doi.org/10.1016/j.advengsoft.2016.01.008 (2016).
Mudashiru, R. B., Sabtu, N. & Abustan, I. Quantitative and semi-quantitative methods in flood hazard/susceptibility mapping: a review. Arab. J. Geosci. https://doi.org/10.1007/s12517-021-07263-4 (2021).
Munawar, H. S., Hammad, A. W. A. & Waller, S. T. Remote sensing methods for flood prediction: A review. Sensors 22(3), 960. https://doi.org/10.3390/s22030960 (2022).
Nagamani, K., Mishra, A. K., Meer, M. S. & Das, J. Understanding flash flooding in the Himalayan region: a case study. Sci. Rep. https://doi.org/10.1038/s41598-024-53535-w (2024).
Noy, I. Comparing the direct human impact of natural disasters for two cases in 2011: the Christchurch earthquake and the Bangkok flood. Int. J. Disaster Risk Reduct. 13, 61–65. https://doi.org/10.1016/j.ijdrr.2015.03.009 (2015).
Paryani, S. et al. Hybrid-based approaches for the flood susceptibility prediction of Kermanshah province, Iran. Nat. Hazards 116(1), 837–868. https://doi.org/10.1007/s11069-022-05701-4 (2022).
Paul, G. C., Saha, S. & Hembram, T. K. Application of the GIS-based probabilistic models for mapping the flood susceptibility in Bansloi sub-basin of Ganga-Bhagirathi River and their comparison. Rem. Sens. Earth Syst. Sci. 2(2–3), 120–146. https://doi.org/10.1007/s41976-019-00018-6 (2019).
Plataridis, K. & Mallios, Z. Flood susceptibility mapping using hybrid models optimized with Artificial Bee Colony. J. Hydrol. 624, 129961. https://doi.org/10.1016/j.jhydrol.2023.129961 (2023).
Rahman, M. et al. Flood susceptibility assessment in Bangladesh using machine learning and multi-criteria decision analysis. Earth Syst. Environ. 3(3), 585–601. https://doi.org/10.1007/s41748-019-00123-y (2019).
Razavi-Termeh, S. V., Sadeghi-Niaraki, A. & Choi, S. A new approach based on biology-inspired metaheuristic algorithms in combination with random forest to enhance the flood susceptibility mapping. J. Environ. Manage. 345, 118790. https://doi.org/10.1016/j.jenvman.2023.118790 (2023).
Razavi-Termeh, S. V., Seo, M., Sadeghi-Niaraki, A. & Choi, S. Flash flood detection and susceptibility mapping in the Monsoon period by integration of optical and radar satellite imagery using an improvement of a sequential ensemble algorithm. Weather Clim. Extrem. 41, 100595. https://doi.org/10.1016/j.wace.2023.100595 (2023).
Ruidas, D., Chakrabortty, R., Islam, A. R. M. T., Saha, A. & Pal, S. C. A novel hybrid of meta-optimization approach for flash flood-susceptibility assessment in a monsoon-dominated watershed, Eastern India. Environ. Earth Sci. https://doi.org/10.1007/s12665-022-10269-0 (2022).
Salvati, A. et al. Flood susceptibility mapping using support vector regression and hyper-parameter optimization. J. Flood Risk Manag. https://doi.org/10.1111/jfr3.12920 (2023).
Saravanan, S. et al. Flood susceptibility mapping using machine learning boosting algorithms techniques in Idukki district of Kerala India. Urban Clim. 49, 101503. https://doi.org/10.1016/j.uclim.2023.101503 (2023).
Seydi, S. T. et al. Comparison of machine learning algorithms for flood susceptibility mapping. Rem. Sens. 15(1), 192. https://doi.org/10.3390/rs15010192 (2022).
Shahabi, H. et al. Flood detection and susceptibility mapping using Sentinel-1 remote sensing data and a machine learning approach: hybrid intelligence of bagging ensemble based on K-nearest neighbor classifier. Rem. Sens. 12(2), 266. https://doi.org/10.3390/rs12020266 (2020).
Shahabi, H. et al. Flash flood susceptibility mapping using a novel deep learning model based on deep belief network, back propagation and genetic algorithm. Geosci. Front. 12(3), 101100. https://doi.org/10.1016/j.gsf.2020.10.007 (2021).
Shahhosseini, M., Hu, G. & Pham, H. Optimizing ensemble weights and hyperparameters of machine learning models for regression problems. Mach. Learn. Appl. 7, 100251. https://doi.org/10.1016/j.mlwa.2022.100251 (2022).
Shi, D. et al. Effects of disturbed landforms on the soil water retention function during urbanization process in the Three Gorges Reservoir Region, China. CATENA 144, 84–93. https://doi.org/10.1016/j.catena.2016.04.010 (2016).
Sleziak, P., Danko, M., Jančo, M., Parajka, J. & Holko, L. Spatial and temporal variability of saturated areas during rainfall-runoff events. J. Hydrol. Hydromech. 71(4), 439–448. https://doi.org/10.2478/johh-2023-0025 (2023).
Tang, J., Liu, D., Shang, C. & Niu, J. Impacts of land use change on surface infiltration capacity and urban flood risk in a representative karst mountain city over the last two decades. J. Clean. Prod. 454, 142196. https://doi.org/10.1016/j.jclepro.2024.142196 (2024).
Teh, D. & Khan, T. Types, definition and classification of natural disasters and threat level. In Springer eBooks 27–56. https://doi.org/10.1007/978-3-030-61278-8_2 (2021).
Tehrany, M. S., Kumar, L., Jebur, M. N. & Shabani, F. Evaluating the application of the statistical index method in flood susceptibility mapping and its comparison with frequency ratio and logistic regression methods. Geomat. Nat. Haz. Risk 10(1), 79–101. https://doi.org/10.1080/19475705.2018.1506509 (2018).
Termeh, S. V. R., Kornejady, A., Pourghasemi, H. R. & Keesstra, S. Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Sci. Total Environ. 615, 438–451. https://doi.org/10.1016/j.scitotenv.2017.09.262 (2018).
Thannoun, R. G. & Ismaeel, O. A. Flood risk vulnerability detection based on the developing topographic wetness index tool in geographic information system. IOP Conf. Ser. Earth Environ. Sci. 1300(1), 012012. https://doi.org/10.1088/1755-1315/1300/1/012012 (2024).
Tinh, L. D., Thao, D. T. P., Bui, D. T. & Trong, N. G. Integrating Harris Hawks optimization and TensorFlow deep learning for flash flood susceptibility mapping using geospatial data. Earth Sci. Inf. https://doi.org/10.1007/s12145-024-01351-1 (2024).
Trojovska, E., Dehghani, M. & Trojovsky, P. Zebra Optimization Algorithm: a new Bio-Inspired optimization algorithm for solving optimization algorithm. IEEE Access 10, 49445–49473. https://doi.org/10.1109/access.2022.3172789 (2022).
Van Phong, T., Nguyen, D. D. & Pham, B. T. Modeling and mapping of flood susceptibility at Que Son District, Quang Nam Province, Vietnam using CatBoost. IOP Conf. Ser. Mater. Sci. Eng. 1289(1), 012019. https://doi.org/10.1088/1757-899x/1289/1/012019 (2023).
Vojtek, M. et al. Comparison of multi-criteria-analytical hierarchy process and machine learning-boosted tree models for regional flood susceptibility mapping: a case study from Slovakia. Geomat. Nat. Haz. Risk 12(1), 1153–1180. https://doi.org/10.1080/19475705.2021.1912835 (2021).
Wahba, M., Sharaan, M., Elsadek, W. M., Kanae, S. & Hassan, H. S. Examination of the efficacy of machine learning approaches in the generation of flood susceptibility maps. Environ. Earth Sci. https://doi.org/10.1007/s12665-024-11696-x (2024).
Wang, Y. et al. Flood susceptibility mapping in Dingnan County (China) using adaptive neuro-fuzzy inference system with biogeography based optimization and imperialistic competitive algorithm. J. Environ. Manage. 247, 712–729. https://doi.org/10.1016/j.jenvman.2019.06.102 (2019).
Wedajo, G. K., Lemma, T. D., Fufa, T. & Gamba, P. Integrating satellite images and machine learning for flood prediction and susceptibility mapping for the case of Amibara, Awash Basin, Ethiopia. Rem. Sens. 16(12), 2163. https://doi.org/10.3390/rs16122163 (2024).
Wu, T. et al. High-efficient extraction of drainage networks from digital elevation models constrained by enhanced flow enforcement from known river maps. Geomorphology 340, 184–201. https://doi.org/10.1016/j.geomorph.2019.04.022 (2019).
Yan, Z., Sha, J., Liu, B., Tian, W. & Lu, J. An ameliorative whale optimization algorithm for multi-objective optimal allocation of water resources in Handan, China. Water 10(1), 87. https://doi.org/10.3390/w10010087 (2018).
Yang, L. & Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 415, 295–316. https://doi.org/10.1016/j.neucom.2020.07.061 (2020).
Yao, Z. et al. Refined landslide susceptibility mapping by integrating the SHAP-CatBoost model and INSAR observations: A case study of Lishui, southern China. Appl. Sci. 13(23), 12817. https://doi.org/10.3390/app132312817 (2023).
Zhao, L. et al. Multi-method combined analysis of urban flood risks and its influencing factors under low impact development. J. Hydrol. 632, 130927. https://doi.org/10.1016/j.jhydrol.2024.130927 (2024).
Zhao, Q. & Qu, Y. The retrieval of ground NDVI (normalized difference vegetation index) data consistent with remote-sensing observations. Rem. Sens. 16(7), 1212. https://doi.org/10.3390/rs16071212 (2024).
Kundapura, S., Soman, A. & Kuruvilla, E. Machine learning ensemble model for flood susceptibility mapping. In 2022 IEEE 7th International Conference on Recent Advances and Innovations in Engineering (ICRAIE) 7, 126–131 (IEEE, 2022).
Mirjalili, S. (ed.) Handbook of Whale Optimization Algorithm: Variants, Hybrids, Improvements, and Applications (Elsevier, 2023).
Bui, D. T., Lofman, O., Revhaug, I. & Dick, O. Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Nat. Hazards 59(3), 1413–1444 (2011).
Pradhan, B. & Sameen, M. I. Landslide susceptibility modeling: optimization and factor effect analysis. In Laser Scanning Applications in Landslide Assessment 115–132 (Springer International Publishing, Cham, 2017).
Shogrkhodaei, S. Z., Razavi-Termeh, S. V. & Fathnia, A. Spatio-temporal modeling of PM2.5 risk mapping using three machine learning algorithms. Environ. Pollut. 289, 117859 (2021).
Razavi-Termeh, S. V., Khosravi, K., Sadeghi-Niaraki, A., Choi, S. M. & Singh, V. P. Improving groundwater potential mapping using metaheuristic approaches. Hydrol. Sci. J. 65(16), 2729–2749 (2020).
Acknowledgements
This work was supported in part by the ITRC Support Program under Grant IITP-2024-RS-2022-00156354 and in part by the Metaverse Support Program to Nurture the Best Talents under Grant IITP-2024-RS-2023-00254529 funded by the Ministry of Science and ICT of Korea and the Institute of Information and Communications Technology Planning and Evaluation (IITP).
Funding
No funding was received for conducting this study.
Author information
Contributions
Seyed Vahid Razavi-Termeh: Conceptualization, methodology, software, data curation, writing (original draft preparation), investigation, formal analysis, visualization. Abolghasem Sadeghi-Niaraki: Methodology, conceptualization, investigation, formal analysis, visualization, project administration, writing (reviewing and editing), supervision. Sani I. Abba: Validation, resources, writing (reviewing and editing). Jamil Hussain: Writing (reviewing and editing), resources, validation, methodology. Soo-Mi Choi: Writing (reviewing and editing), methodology, investigation, project administration, supervision, funding acquisition.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Razavi-Termeh, S.V., Sadeghi-Niaraki, A., Abba, S.I. et al. Flood-prone area mapping using a synergistic approach with swarm intelligence and gradient boosting algorithms. Sci Rep 15, 27924 (2025). https://doi.org/10.1038/s41598-025-12022-6
DOI: https://doi.org/10.1038/s41598-025-12022-6