Evaluating machine learning efficiency and accuracy for real time flash flood mapping

Mirzapour, Hafez; Haghizadeh, Ali; Motlagh, Mahdi Soleimani

doi:10.1038/s41598-025-34037-9

Published: 30 December 2025

Scientific Reports volume 16, Article number: 3975 (2026)

Abstract

Flash floods endanger communities and ecosystems in rugged regions, but precise prediction is difficult due to environmental complexity. This study evaluates six machine learning algorithms for flash flood mapping in Iran’s Dez Basin, a region growing more vulnerable to climate extremes. We developed an integrated geospatial database incorporating 32 climatic, anthropogenic, and physiographic parameters, validated through extensive field surveys documenting historical flood events. The dataset (70% training, 30% validation) was analyzed using: (1) the H2O Deep Learning framework, (2) Random Forest (RF), and (3) four boosting methods (AdaBoost, XGBoost, LightGBM, CatBoost). The RF model achieved exceptional predictive performance (AUC = 0.89, accuracy = 95%), outperforming other techniques by 6–12% in classification metrics. Sensitivity analysis identified precipitation intensity (β = 0.34, p < 0.01), watershed area (β = 0.28), and slope gradient (β = 0.25) as statistically significant dominant controls. These findings advance flood risk management in three key ways: first, they demonstrate RF’s superiority in handling heterogeneous geospatial data; second, the 30 m-resolution susceptibility map provides actionable insights for land-use planning; third, the methodology offers a transferable framework for arid and semi-arid regions globally. We recommend that policymakers prioritize slope stabilization and early-warning systems in high-risk zones (AUC > 0.85) to enhance community resilience.

Introduction

Flash floods are a major natural hazard worldwide, causing considerable material damage, loss of life, and extensive erosion. Multiple hydrological factors, such as topography, soil composition, vegetation cover, human settlement, and antecedent precipitation, contribute to severe floods¹. In regions with steep slopes, rugged terrain, or dense urban development, even modest rainfall can trigger a flash flood. The term “flash” denotes a rapid response: water levels in the drainage system peak within minutes to hours²,³. Accurate forecasting of flash floods remains difficult because of the complexity of the processes involved. Nonetheless, climatic conditions, soil properties, geomorphological features, and plant cover strongly influence their occurrence, and accurately identifying areas susceptible to flash floods is crucial for mitigating their adverse effects⁴,⁵. Flash flood hazard maps are essential for integrated watershed management, providing critical information on flood events and their effects on floodplains, and cost-effective, rapid modeling approaches are needed to identify susceptible areas. Data mining, the process of discovering novel patterns and models in large datasets, offers one such approach. Prior research has examined the dynamics and consequences of flash floods using models such as Random Forest and Naïve Bayes. Habibi et al.⁶ studied northern Iran using 410 sample points and ten flood-conditioning parameters; the Regularized Random Forest (RRF) model outperformed the Naïve Bayes model, achieving an Area Under the Curve (AUC) of 0.94. Ren et al.⁷ assessed flood susceptibility in Southwest China using ten parameters and found that Random Forest outperformed XGBoost.
Sellami and Rhinane⁸ applied multiple models, including Random Forest, support vector machine, multilayer perceptron, logistic regression, CART, and naïve Bayes, to identify flood-prone regions in Morocco. Elghouat et al.⁹ examined 12 parameters influencing flash floods and compared bivariate and multivariate statistical models with machine learning methods. Proximity to rivers was identified as the most significant factor in flood occurrence in their study area, and Random Forest outperformed all other models with an AUC of 0.86. Al-Kindi and Alabri¹⁰ compared the predictive performance of boosting algorithms (XGBoost and CatBoost) with that of Random Forest in Oman, highlighting their strong discrimination ability. Wahba et al.¹¹ used Random Forest in Japan to analyze 11 environmental factors influencing floods, based on data from 224 flood and non-flood sites. This study develops flash flood susceptibility maps using several machine learning and deep learning techniques in the Dez sub-basins of Lorestan Province, which have experienced numerous floods in recent years.

Study area

The Dez watershed sub-basins in Lorestan Province, covering more than 32% of the province’s total area, are particularly vulnerable to flash floods because of their hilly topography and insufficient water resource management infrastructure. Records from hydrometric, rain gauge, and synoptic meteorological stations repeatedly document flash floods in this region, which have caused substantial damage to infrastructure, homes, natural resources, and businesses. Assessing flash flood hazard is therefore essential for efficient water resource management and for reducing damage in Lorestan Province. Figure 1 shows the study area.

Methodology

Flash floods pose a considerable risk to the Dez sub-basins of Lorestan Province, causing substantial loss of life and economic damage each year. This work evaluates susceptibility to these floods using machine learning models: the H2O deep learning framework, Random Forest, AdaBoost, XGBoost, LightGBM, and CatBoost. The counties of Khorramabad, Borujerd, Doroud, Azna, and Aligudarz experience the highest frequency of flash floods in the region. The study considers numerous factors: watershed area, perimeter, basin length, stream order, stream length, average stream length, maximum and minimum elevation, relief ratio, slope, aspect, distance from river, hydrologic soil group, land use, bifurcation ratio, stream length ratio, stream frequency, drainage density, ruggedness number, length ratio, circularity ratio, form factor, lithology, NDVI, 2-year return period rainfall, 2-year return period discharge, topographic wetness index (TWI), sediment transport index (STI), and stream power index (SPI). We used ArcGIS 10.8.1 and Excel to derive the physiographic characteristics of the watershed. We classified land use from Sentinel-1 and Sentinel-2 satellite images together with field surveys. We assembled hydrologic soil groups from soil texture and assigned a curve number (CN) to each land use category within each hydrologic soil group. We obtained the annual maximum rainfall and discharge for each station, used regression to reconstruct missing records over a 30-year period (1992–2022), and calculated discharge for different return periods using suitable statistical distributions. We used the Soil Conservation Service (SCS) and Kirpich methods to estimate maximum precipitation, runoff depth, and discharge.
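The SCS runoff relation and the Kirpich formula mentioned above can be illustrated with a short sketch. It assumes the standard metric forms, which the text does not spell out: retention S = 25400/CN - 254 (mm), initial abstraction Ia = 0.2S, runoff Q = (P - Ia)^2 / (P - Ia + S), and Kirpich time of concentration t_c = 0.0195 L^0.77 S^-0.385 (minutes, with L in metres and S in m/m). The function names and the example sub-basin values are hypothetical.

```python
# Minimal sketch of the SCS curve-number runoff equation and the Kirpich
# time-of-concentration formula (standard metric forms; assumed, not taken
# from the study's own calculations).

def scs_runoff_depth(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth Q (mm) for a storm rainfall depth P (mm) and curve number CN."""
    s = 25400.0 / cn - 254.0   # potential maximum retention S (mm)
    ia = ia_ratio * s          # initial abstraction Ia, conventionally 0.2 * S
    if p_mm <= ia:
        return 0.0             # all rainfall is abstracted; no direct runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


def kirpich_tc_minutes(length_m: float, slope_m_per_m: float) -> float:
    """Time of concentration (min) from main-channel length L (m) and slope S (m/m)."""
    return 0.0195 * length_m ** 0.77 * slope_m_per_m ** -0.385


if __name__ == "__main__":
    # Hypothetical sub-basin: 60 mm design storm, CN = 80, 12 km channel at 1.5 % slope
    print(round(scs_runoff_depth(60.0, 80.0), 1), "mm of runoff")
    print(round(kirpich_tc_minutes(12000.0, 0.015), 1), "min to concentrate")
```

In a full SCS workflow, the runoff depth and the time of concentration would then feed a peak-discharge estimate for each ungauged sub-basin.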
In the multivariate regression analysis, we used discharge as the dependent variable and stream length, stream slope, area, curve number (CN), maximum rainfall, and rainfall retention as the independent variables. We used the Durbin-Watson statistic to test the regression residuals for autocorrelation, that is, correlation between residuals at successive observations (a lag of one)¹²,¹³. We evaluated the candidate methods with the coefficient of determination (R²), root mean squared error (RMSE), and Nash-Sutcliffe efficiency (NSE). R² and NSE values close to one indicate better model performance, whereas RMSE values close to zero are preferable¹⁴. The aim was to identify the most appropriate method for estimating discharge in watersheds without recorded data. We used Google Earth Engine, EasyFit Professional v5.6, and RStudio to produce the land use and soil texture maps, RStudio for the multivariate regression analysis, and ArcGIS Pro v3.1.5 for the interpolated maps. The input data were subjected to correlation analysis to detect multicollinearity; we excluded the parameters that showed it and fed the remaining parameters into the machine learning models: H2O deep learning, Random Forest (RF), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and CatBoost. The best model for assessing flash flood susceptibility in the study area was selected using RMSE, R², mean absolute error (MAE), accuracy, and area under the curve (AUC).
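A minimal sketch of the multivariate regression step, assuming ordinary least squares with an intercept, solved through the normal equations. The study ran this analysis in RStudio; the pure-Python solver and the names `ols_fit` and `predict` are illustrative stand-ins, and in practice each row of `X` would hold the basin predictors listed above (stream length, slope, area, CN, maximum rainfall, retention).

```python
# Minimal sketch: least-squares regression of discharge on basin predictors.
# Pure Python for self-containment; this is not the study's R code.

def ols_fit(X, y):
    """Return coefficients [b0, b1, ..., bk] minimising squared error (intercept first)."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) b = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        b[c], b[p] = b[p], b[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[c])]
                b[r] -= f * b[c]
    return [b[i] / a[i][i] for i in range(k)]


def predict(coef, x):
    """Predicted discharge for one sub-basin's predictor vector x."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))
```

The residuals `y[i] - predict(coef, X[i])` are the quantities that the Durbin-Watson statistic examines for autocorrelation.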

Selection of parameters influencing flood occurrence

Before modeling, we performed a multiple correlation analysis to determine whether relationships among two or more input variables could bias the results. Two commonly used statistics for detecting multicollinearity among many variables are tolerance (TOL) and the variance inflation factor (VIF)¹⁵. A parameter was considered multicollinear, and was removed from the modeling procedure, if its VIF exceeded 5 or its TOL was below 0.1.
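The VIF/TOL screen can be sketched using the identity that VIF_j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, with TOL_j = 1/VIF_j. This is an illustrative pure-Python version; the function names are hypothetical, and the thresholds are the ones stated above.

```python
# Minimal sketch of the multicollinearity screen: VIF_j is the j-th diagonal
# element of the inverse correlation matrix, and TOL_j = 1 / VIF_j.
import math


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def invert(m):
    """Gauss-Jordan inverse of a small square matrix."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def vif_tol(columns):
    """columns: dict name -> list of values. Returns dict name -> (VIF, TOL)."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: (inv[i][i], 1.0 / inv[i][i]) for i, name in enumerate(names)}


def flag_collinear(stats, vif_max=5.0, tol_min=0.1):
    """Parameters to drop under the stated rule: VIF > 5 or TOL < 0.1."""
    return [name for name, (vif, tol) in stats.items() if vif > vif_max or tol < tol_min]
```

Because TOL = 1/VIF, a VIF above 5 already corresponds to a TOL below 0.2, so in this sketch the TOL < 0.1 condition never flags a parameter that the VIF condition would miss.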

Landuse and land cover

Land use and land cover strongly affect river flow and flood occurrence. Vegetation removal, crop changes, and overgrazing can increase runoff and flood discharges; overgrazing in particular compacts the soil and strips vegetation, whereas detention and retention basins can reduce flood peak discharges¹⁶. This study used Sentinel-2 multispectral imagery and Sentinel-1 radar data from 2022 to map land use and land cover in the Dez sub-basins of Lorestan Province. Sentinel-2 provides 13 spectral bands at spatial resolutions of 10, 20, and 60 m and supports mapping of land cover features such as forests as well as natural disasters. Sentinel-1, a radar satellite, acquires images at a spatial resolution of 5 to 20 m with a 5-day revisit period at the equator. This research used cloud-free, low-atmospheric-reflectance multispectral images from Sentinel-2 and Ground Range Detected (GRD) images from Sentinel-1, accessed through Google Earth Engine. Many researchers have highlighted the effectiveness of vertical-horizontal (VH) polarization for monitoring land cover¹⁷,¹⁸,⁶⁸,⁶⁹; this study therefore used VH polarization for land cover analysis.

Hydrological soil groups

The production of hydrological soil group maps requires the integration of soil texture and vegetation cover maps. To determine soil texture, we used the following methods:

Simple ratio clay index (SRCI)

The physical and chemical properties of soil are closely related to the composition, particle size, strength, and arrangement of its clay minerals, the most common of which are kaolinite, illite, and montmorillonite. Their formation depends largely on the parent material, the climatic conditions, and site-specific factors¹⁹. These minerals are important for swelling behavior, soil fertility, and metal adsorption, among other properties. All three can be identified by diagnostic absorption bands in the shortwave infrared (SWIR) spectrum, which arise from vibrations of hydroxyl groups and structural water molecules²⁰.

Soil texture is a critical determinant of moisture content, and soil moisture lowers reflectance in the visible and infrared regions of the spectrum. Because clay soils hold a large amount of water and absorb radiation strongly, they show lower reflectance in the visible, near-infrared, and mid-infrared wavelengths. In contrast, silt, a mixture of fine mineral particles, correlates more weakly with spectral reflectance²¹.

According to Sabins²²,²³, SRCI is the ratio of reflectance in two SWIR bands, B11 (SWIR 1) and B12 (SWIR 2), computed from Sentinel-2 images (Eq. 1).

$$SRCI=\frac{{SWIR1}}{{SWIR2}}$$

(1)

Brightness index (BI)

The Brightness Index (BI) combines the reflectance information from the red and near-infrared bands, represented by bands B4 and B9 in the Sentinel-2 dataset. This index indicates variations in average reflectance levels²⁴.

Darker soils, primarily due to higher clay content²⁵,²⁶, exhibit lower radiometric values in both bands, while dry soils with low clay content are highly reflective. We express the Brightness Index as follows:

$$BI=\sqrt {{R^2}+NI{R^2}}$$

(2)
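Equations (1) and (2) are simple per-pixel band arithmetic, sketched below. Lists of pixel values stand in for raster bands, and the inputs are assumed to be surface reflectance already scaled to the 0 to 1 range (band scaling is not described in the text). The function names are illustrative.

```python
# Minimal per-pixel sketch of Eq. (1) and Eq. (2). Inputs are assumed to be
# surface reflectance values scaled to 0-1 (an assumption).
import math


def srci(swir1, swir2):
    """Simple Ratio Clay Index, Eq. (1): B11 / B12 reflectance, NaN where B12 is zero."""
    return [a / b if b != 0 else float("nan") for a, b in zip(swir1, swir2)]


def brightness_index(red, nir):
    """Brightness Index, Eq. (2): sqrt(R^2 + NIR^2)."""
    return [math.sqrt(r * r + n * n) for r, n in zip(red, nir)]
```

In Google Earth Engine the same arithmetic would be applied to whole image bands rather than Python lists.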

Soil texture classification and moisture analysis

This study used Sentinel-1 products to estimate soil moisture in the Dez sub-basins of Lorestan Province. We then classified soil texture with a Random Forest algorithm, using the Sentinel-2 SWIR bands (B11 and B12), the Brightness Index (BI), and the Simple Ratio Clay Index (SRCI) from Sentinel-2, together with Sentinel-1 soil moisture; this yielded the clay and sand fractions. The sensitivity of soil moisture to clay and sand content depends on the proportions of clay, silt, and sand in the soil. Sandy soils are porous and act as filters that let water infiltrate, so they dry quickly, whereas clay soils dry out more slowly and can retain moisture for several days. The method we used therefore monitors dynamic changes in soil moisture following precipitation events and relates average soil moisture to texture (clay and sand)²⁷. We then derived hydrologic soil group maps from the percentages of clay, sand, and silt in the study area.

Normalized difference vegetation index (NDVI)

NDVI assesses vegetation health and cover using the normalized difference between near-infrared and red bands²⁸.

Stream power index (SPI)

SPI predicts erosion potential in steep, convergent areas based on watershed area and slope²⁹.

Sediment transport index (STI)

STI represents the erosive power of water and identifies erosion-prone areas³⁰.

Topographic wetness index (TWI)

TWI combines upslope area and slope to indicate soil moisture distribution and drainage potential²⁹.
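The four indices above can be sketched with their commonly cited forms, which are assumed here because the text does not give the equations: NDVI = (NIR - Red)/(NIR + Red), SPI = As·tan β, TWI = ln(As / tan β), and STI = (As/22.13)^0.6 · (sin β / 0.0896)^1.3, where As is the specific catchment area and β the local slope. The small floor on tan β is an illustrative choice to avoid division by zero on flat cells.

```python
# Minimal sketch of the four indices using their commonly cited forms
# (assumed). As = specific catchment area (m^2 per m of contour), beta = slope.
import math


def ndvi(nir, red):
    return (nir - red) / (nir + red)


def terrain_indices(specific_area, slope_deg, min_tan=1e-6):
    beta = math.radians(slope_deg)
    tan_b = max(math.tan(beta), min_tan)   # guard against division by zero on flat cells
    spi = specific_area * tan_b
    twi = math.log(specific_area / tan_b)
    sti = (specific_area / 22.13) ** 0.6 * (math.sin(beta) / 0.0896) ** 1.3
    return {"SPI": spi, "TWI": twi, "STI": sti}
```

Flat, convergent cells get a high TWI (wet) but a near-zero SPI and STI, while steep cells with large contributing areas score high on SPI and STI (erosive).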

Physical characteristics of the basin

This study examines key parameters of the physical characteristics of the basin. Table 1 presents the methods used to derive each parameter.

Table 1 Morphometric parameters analyzed in the Dez sub-basins of Lorestan Province.

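Several of the morphometric parameters can be computed directly from basin measurements. The sketch below uses common textbook definitions (Horton, Strahler, Schumm, Miller), assumed for illustration; Table 1 may use different variants or units. Here lengths and relief are in km and areas in km², and the function name is hypothetical.

```python
# Minimal sketch of common morphometric definitions (assumed textbook forms).
# Lengths and relief in km, areas in km^2.
import math


def morphometrics(area, perimeter, basin_length, relief, streams_by_order, length_by_order):
    """streams_by_order[u] = number of streams of order u+1; length_by_order likewise (km)."""
    dd = sum(length_by_order) / area                  # drainage density (km/km^2)
    mean_len = [length / count for length, count in zip(length_by_order, streams_by_order)]
    return {
        "drainage_density": dd,
        "stream_frequency": sum(streams_by_order) / area,       # streams per km^2
        "form_factor": area / basin_length ** 2,                 # Horton
        "circularity_ratio": 4 * math.pi * area / perimeter ** 2,  # Miller
        "relief_ratio": relief / basin_length,                   # Schumm
        "ruggedness_number": relief * dd,                        # Strahler
        "bifurcation_ratios": [a / b for a, b in zip(streams_by_order, streams_by_order[1:])],
        "stream_length_ratios": [b / a for a, b in zip(mean_len, mean_len[1:])],
    }
```

A perfectly circular basin has a circularity ratio of 1; elongated basins give smaller values and, all else being equal, flatter and later flood peaks.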

Creation of training and validation datasets

In this phase of the research, a sampling method was used to create training and validation datasets. Flood and non-flood points were located during field visits using GPS devices and software tools such as Google Earth, and additional information on flood-prone and flood-free locations was collected from the relevant literature. Of the resulting data, 70% were used for training and 30% were reserved for model validation.
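A 70/30 split of flood and non-flood points might look like the sketch below. Stratifying by class (so both sets keep the same flood/non-flood ratio) and the fixed random seed are assumptions; the text only states the proportions.

```python
# Minimal sketch of a stratified 70/30 split of point labels (1 = flood,
# 0 = non-flood). Stratification and the seed are assumptions.
import random


def stratified_split(labels, train_frac=0.7, seed=42):
    """Return sorted (train_indices, validation_indices)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train, valid = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(train_frac * len(idxs))   # same proportion within each class
        train.extend(idxs[:cut])
        valid.extend(idxs[cut:])
    return sorted(train), sorted(valid)
```

With 100 flood and 100 non-flood points, this yields 140 training points (70 of each class) and 60 validation points.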

Preparation of flash flood sensitivity prediction maps

Flash flood susceptibility maps were prepared using machine learning techniques, namely random forest, deep learning, and boosted trees. These methods can model flood behavior and predict its current and future status. The models applied were Random Forest (RF), deep learning (the H2O model), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), light gradient boosting (LightGBM), and categorical boosting (CatBoost).

Random forest (RF)

Random Forest is an ensemble learning method that combines the predictions of many base learners, in this case decision trees. A large number of decision trees are grown, and their outputs are combined to make the final prediction. Random Forest can learn complex patterns, capture non-linear relationships between explanatory and dependent variables, and integrate different types of data in the analysis; combining many trees yields more accurate and stable predictions than any single tree. For the k-th tree, a random vector θ_k is generated that is independent of the vectors used for the previous trees but drawn from the same distribution. Each tree is grown on the training dataset. For an input vector

$$X=\left\{ {{x_1},{x_2},...,{x_p}} \right\}$$

the k-th tree is the predictor

$${h_k}(x)=h\left( {x,{\theta _k}} \right)$$

and these trees together form the forest

$$K=\left\{ {{h_1}(x),{h_2}(x),...,{h_k}(x)} \right\}$$

The k outputs generated by the individual trees can be expressed as:

$$\widehat {{{y_1}}}={h_1}(x),\widehat {{{y_2}}}={h_2}(x),...,\widehat {{{y_k}}}={h_k}(x)$$

(3)

For classification, the final prediction is the majority vote of these outputs; for regression, it is their average.
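The bagging-and-voting idea behind Eq. (3) can be sketched with a toy forest. For brevity each tree here is a depth-1 stump rather than a fully grown tree, so per-tree feature sampling is equivalent to per-split sampling; `fit_stump`, `fit_forest`, and `predict_forest` are hypothetical names, and this is not the RF implementation used in the study.

```python
# Toy random forest: bootstrap rows + random feature subset (theta_k) per
# tree, depth-1 trees (stumps), majority vote over the k outputs.
import random
from collections import Counter


def fit_stump(X, y, features):
    """Depth-1 tree: the (feature, threshold, left label) with fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for left_label in (0, 1):
                errors = sum((left_label if row[f] <= t else 1 - left_label) != yi
                             for row, yi in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label)
    _, f, t, left_label = best
    return lambda row: left_label if row[f] <= t else 1 - left_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]             # bootstrap sample of rows
        feats = sorted(rng.sample(range(p), max_features))     # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict_forest(trees, row):
    """Majority vote over the k tree outputs y_1 ... y_k."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]
```

Libraries such as scikit-learn or R's randomForest grow deep trees and sample features at every split, but the combination step is the same vote over tree outputs.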

Deep learning (DL)

Deep learning is a subset of machine learning and artificial intelligence techniques. In this data-driven approach, a statistical relationship between input and output data is learned in order to predict the target variable. Each layer learns to represent the input data in a more abstract and compact form, and, crucially, a deep learning model can learn the most useful features at each level on its own³⁸. The term “deep” refers to the number of layers through which the data are transformed into the output. Deep models can extract better features than shallow models because the additional layers support richer feature learning³⁹.
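The layer-by-layer transformation described above can be sketched as the forward pass of a small feedforward network: each hidden layer applies an affine map followed by a ReLU, and a sigmoid output turns the final score into a flood probability. The weights below are hypothetical; in H2O they would be learned during training, and this sketch is not H2O's implementation.

```python
# Minimal sketch of a feedforward network's forward pass (hypothetical weights).
import math


def dense(x, weights, bias):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def forward(x, layers):
    """layers: list of (weights, bias); all but the last are hidden ReLU layers."""
    h = x
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))   # each layer re-represents the previous output
    w_out, b_out = layers[-1]
    z = dense(h, w_out, b_out)[0]
    return 1.0 / (1.0 + math.exp(-z))       # probability of the flood class
```

Adding more `(weights, bias)` pairs to `layers` makes the network deeper without changing the code.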

Boosting machines (BM)

Boosting builds a strong classifier from a given training dataset by aggregating several weak classifiers, and it is one of the most successful approaches to supervised learning⁴⁰. In other words, it transforms a set of weak learners into a strong learner. A weak learner performs only slightly better than random guessing (for binary classification, an error rate below 0.5), whereas a strong learner has an error rate close to zero. A family of weak learners is combined to form a robust classifier⁴¹.

Adaptive boosting (AdaBoost)

AdaBoost, introduced by Freund and Schapire in 1997, is a machine learning algorithm used alongside other learning algorithms to improve their performance and to address imbalanced classes⁴². It strengthens classification by combining an ensemble of weak classifiers, typically shallow decision trees, into a stronger overall classifier; this approach is known as boosting. Each weak classifier fj(x) produces a binary output (here, flood or no flood), and the classifiers are trained in sequence. In each round, the instances misclassified in the previous round are given greater weight, so the next classifier concentrates on them. The sequence stops once a specified threshold is reached, yielding a strong classifier from the ensemble of weak ones. In the final vote, each weak classifier is weighted by its accuracy: classifiers that misclassify many observations receive low weight, while effective classifiers receive greater weight. This adaptive learning sequence improves classification performance⁴³.
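The reweighting loop can be sketched as discrete AdaBoost with decision stumps on labels coded as -1/+1. The classifier weight is alpha = ½ ln((1 - err)/err), and sample weights are multiplied by exp(-alpha·y·h(x)) and renormalized, so misclassified points gain weight. The function names and the stopping rule (a fixed number of rounds, or stopping when no stump beats chance) are illustrative choices.

```python
# Minimal sketch of discrete AdaBoost with decision stumps (labels -1/+1).
import math


def fit_adaboost(X, y, n_rounds=10):
    """Returns a list of (alpha, feature, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n                       # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        best = None
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                for pol in (1, -1):
                    preds = [pol if row[f] > t else -pol for row in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, pol, preds)
        err, f, t, pol, preds = best
        if err >= 0.5:
            break                           # no stump beats chance; stop boosting
        err = max(err, 1e-10)               # keep the log finite for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)   # lower error -> larger say in the vote
        model.append((alpha, f, t, pol))
        # Misclassified samples (y * pred = -1) gain weight; correct ones lose weight
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict_adaboost(model, row):
    score = sum(alpha * (pol if row[f] > t else -pol) for alpha, f, t, pol in model)
    return 1 if score >= 0 else -1
```

On one-dimensional data labelled +1, +1, -1, -1, +1, +1, no single stump classifies every point correctly, but three boosting rounds are enough for the weighted vote to do so.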

Gradient boosting machines (GBM)

Gradient boosting (GB) is another machine learning algorithm, developed by Friedman⁴⁴. Like AdaBoost, GB combines many weak learners into an ensemble rather than selecting a single best classifier. The model builds trees sequentially, with each new tree correcting errors made by the previous trees. The algorithm starts from an initial model, such as a single tree or a constant prediction; each subsequent tree is then fitted to the residuals of the current ensemble, or more generally to the negative gradient of the loss function, so that the observations that are predicted worst have the greatest influence on the next tree. This process is repeated for a fixed number of iterations, with later trees improving predictions for observations that earlier trees handled poorly⁴⁵. The trees are grown by minimizing a cost function, defined as follows:

$$L=\left\langle {\sqrt {{{\left( {Y - {Y_{fid}}} \right)}^2}} } \right\rangle$$

(4)

In Eq. (4), Y is the model’s predicted output for the class in question, and ${Y_{fid}}$ is the actual value of the class to which the input data belong. The model iteratively minimizes this cost function. Ensemble classifiers tend to reduce overfitting and often perform better than single algorithms. The key difference between gradient boosting and Random Forest is that in gradient boosting the trees are trained one after another, each depending on the previous ones, whereas in Random Forest the trees are trained independently and can be built in parallel⁴⁶.
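The residual-fitting loop can be sketched on one-dimensional inputs with regression stumps. This sketch uses the squared-error loss ½(y - F)², whose negative gradient is simply the residual, rather than the cost function of Eq. (4); the learning rate, number of rounds, and function names are illustrative.

```python
# Minimal sketch of gradient boosting with squared-error loss and regression stumps.

def fit_regression_stump(x, r):
    """Single split on 1-D inputs x that minimises squared error on residuals r."""
    best = None
    for t in sorted(set(x))[:-1]:            # thresholds leaving points on both sides
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda xi: lv if xi <= t else rv


def fit_gbm(x, y, n_rounds=50, learning_rate=0.1):
    f0 = sum(y) / len(y)                     # initial model: constant prediction
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_rounds):
        # For loss 0.5 * (y - F)^2 the negative gradient is the residual y - F
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_regression_stump(x, residuals)
        trees.append(tree)
        pred = [pi + learning_rate * tree(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + learning_rate * sum(tree(xi) for tree in trees)
```

Each round shrinks the remaining residuals, so more rounds (or a larger learning rate) fit the training data more closely, which is also why the number of rounds must be tuned to avoid overfitting.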

Extreme gradient boosting (XGBoost)

The XGBoost algorithm, introduced by Chen and Guestrin⁴⁷, extends the concept of gradient boosting (GB). It belongs to the family of gradient boosting and ensemble methods and is applicable to both regression and classification problems. XGBoost is known for executing faster than other gradient boosting algorithms and for its strong performance, and it is frequently used in machine learning competitions⁴⁸. Given a dataset $D=\left\{ \left( x_i, y_i \right) \right\}$, $i=1,...,n$, of n samples with m features ($x_i \in R^m$, $y_i \in R$), the tree model approximates the output with Z additive functions as follows⁴⁹:

$$y_{p} = \phi \left( x_{i} \right) = \sum\limits_{z = 1}^{Z} f_{z} (x_{i}), \quad f_{z} \in F$$

(5)

In this relationship, F is the space of the fitted decision trees, defined as follows⁴⁹:

$$F=\left\{ f(x)=w_{q(x)} \right\}\quad\left( q:R^m \to T,\; w \in R^T \right)$$

(6)

Here, q denotes the structure of a decision tree, which maps a sample to one of its leaves, w is the vector of leaf weights, and T is the number of leaves. Each function f corresponds to a tree structure q and its own leaf weights w. To optimize the ensemble of decision trees and reduce error, XGBoost minimizes the following objective function⁴⁹:

$$L^{(t)} = \sum\limits_{i = 1}^{n} l\left( y_{e,i},\; y_{p,i}^{(t - 1)} + f_{t} (x_{i}) \right) + \Omega (f_{t})$$

(7)

In this equation, l is the loss function measuring the difference between the observed value ${y_{e,i}}$ and the predicted value ${y_{p,i}}$, t is the iteration number, and $\Omega(f_t)$ is a penalty term that limits the complexity of the fitted model.
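Eq. (7) is usually minimized with a second-order expansion of the loss. In Chen and Guestrin's formulation the penalty is Ω(f) = γT + ½λ Σ w_j², which gives an optimal leaf weight w* = -G/(H + λ) and a split gain of ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) - (G_L+G_R)²/(H_L+H_R+λ)] - γ, where G and H are sums of first and second derivatives over the samples in a leaf. The sketch below assumes logistic loss for the flood/non-flood label; the function names are illustrative and not the XGBoost library API.

```python
# Minimal sketch of XGBoost's second-order leaf weight and split gain,
# assuming logistic loss on 0/1 flood labels.
import math


def grad_hess_logistic(y, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y, p * (1.0 - p)


def leaf_weight(G, H, lam=1.0):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Loss reduction from splitting a leaf into left/right children."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma
```

A split is worth making only when its gain is positive, so larger λ and γ values prune the trees more aggressively and reduce overfitting.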

Light gradient boosting machines (LightGBM)

LightGBM is a gradient boosting algorithm based on decision trees, used for classification and various other machine learning tasks. It represents the data compactly: continuous feature values are grouped into discrete bins, so only the bin histograms need to be stored, which reduces memory consumption for data held in structures such as NumPy arrays or pandas data frames. Histogram-based split finding is the default training method for decision trees in LightGBM; XGBoost also offers it as an option. LightGBM uses tree-based learners exclusively and is noted for both its accuracy and its efficiency⁵⁰. Most decision tree algorithms grow trees level by level, increasing their depth, whereas LightGBM grows trees leaf by leaf. At each step it splits the leaf with the largest reduction (delta) in the cost function, since a large delta indicates a region the model has not yet learned well. For the same number of leaves, leaf-wise growth generally reaches a lower cost function value than depth-wise growth. LightGBM supports regression, binary classification, and multi-class classification, and experiments on various datasets show that it can achieve high accuracy while speeding up training by more than 20 times⁵¹. Notably, not all data points contribute equally to training in LightGBM: points with larger gradients are less well trained and contribute more to the information gain, so focusing on them is more efficient, while points with small gradients can be subsampled⁴⁶.
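The gradient-based sampling described at the end of the paragraph is LightGBM's gradient-based one-side sampling (GOSS). A minimal sketch follows: keep the top a·n points by absolute gradient, randomly sample b·n of the rest, and up-weight the sampled small-gradient points by (1 - a)/b so the gradient statistics stay approximately unbiased. The values a = 0.2, b = 0.1 and the function name are illustrative, not LightGBM defaults or API.

```python
# Minimal sketch of gradient-based one-side sampling (GOSS).
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Returns (indices, weights) of the points used to grow the next tree."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_k, rand_k = int(a * n), int(b * n)
    top = order[:top_k]                                            # keep every large-gradient point
    sampled = random.Random(seed).sample(order[top_k:], rand_k)   # subsample the rest
    amplify = (1.0 - a) / b          # compensates for the dropped small-gradient points
    return top + sampled, [1.0] * len(top) + [amplify] * len(sampled)
```

With a = 0.2 and b = 0.1, each tree is grown on 30% of the data, which is one source of LightGBM's speed-up.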

CatBoost

CatBoost is another boosting technique, proposed by Dorogush et al.⁵². It uses a gradient boosting scheme to build a regression model from successively adjusted estimates, with several modifications designed to minimize overfitting. Gradient boosting models are valuable machine learning tools that deliver accurate results in many fields, including environmental parameter estimation, ecosystem factor dispersion, and weather forecasting, and CatBoost performs particularly well in classification tasks. In general, omitting certain features can improve its accuracy, mainly because gradient boosting uses a binary tree classification scheme. CatBoost differs from other boosting techniques in the following ways. First, it uses a sophisticated method to convert categorical features into numerical values: as noted by Prokhorenkova et al.⁵³, target statistics handle categorical features effectively with minimal information loss. Second, CatBoost combines categorical variables to exploit relationships among parameters. Third, it uses a symmetric tree strategy to mitigate overfitting and improve classification performance.
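The target-statistics idea can be sketched as CatBoost-style ordered target statistics: the examples are put in a random order, and each one's category is encoded using only the targets of earlier examples with the same category, (sum + a·prior)/(count + a), so an example's own label never leaks into its encoding. The prior, the smoothing weight a, the single permutation, and the function name are illustrative simplifications of what the library does.

```python
# Minimal sketch of ordered target statistics for one categorical column.
import random


def ordered_target_statistics(categories, targets, prior, a=1.0, seed=0):
    """Encode a categorical column using only targets seen earlier in a random order."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)          # artificial "time" ordering of the data
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + a * prior) / (k + a)  # history only: own target excluded
        sums[c] = s + targets[i]
        counts[c] = k + 1
    return encoded
```

For a categorical layer such as lithology or land use class, the first occurrence of each class is encoded with the prior, and later occurrences move toward that class's observed flood rate.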

Suppose we have a dataset

$$D=\left\{ \left( X_j, Y_j \right) \right\}_{j=1,...,m}$$

(8)

where $X_j=\left( x_{j}^{1},x_{j}^{2},...,x_{j}^{n} \right)$ is a vector of n attributes and $Y_j \in R$ is the target value. The samples are independently and identically distributed according to an unknown distribution $\rho \left( \cdot,\cdot \right)$. The goal of learning is to train a function $H:R^n \to R$ that minimizes the expected loss $L(H):=E\,L(y,H(X))$, where $L(\cdot,\cdot)$ is a smooth loss function and $(X, y)$ is a test sample drawn from the same distribution as D. Gradient boosting builds a sequence of greedy approximations $H^t:R^n \to R$, $t=0,1,2,...$, in which each approximation is obtained from the previous one in an incremental manner, $H^t=H^{t-1}+g^t$, with

$$g^{t} = \arg \min _{g \in G} L(H^{t - 1} + g) = \arg \min _{g \in G} E\,L\left(y,H^{t - 1} (X) + g(X)\right)$$

(9)

This minimization problem is usually solved approximately, either with Newton’s method, using a second-order approximation of $L(H^{t-1}+g)$ at $H^{t-1}$, or by taking (negative) gradient steps⁵⁴.

To run the models, a multiple correlation analysis was first performed to examine the correlations between parameters, and parameters with high mutual correlation were removed. The models were then run in R or Python, and the importance of each variable was determined. Finally, the model results were transferred to ArcGIS 10.8.1 to prepare visual maps of flash flood susceptibility.

Evaluation of model performance

To assess the performance of the models and methods used in this study, we used the statistical indicators R² (coefficient of determination), NSE (Nash-Sutcliffe efficiency), RMSE (root mean square error), DW (Durbin-Watson statistic), accuracy, and MAE (mean absolute error), calculated as follows:

$$R^{2} = \frac{{\left( {\sum\nolimits_{{i = 1}}^{n} {(P_{i} - \overline{P} )(O_{i} - \overline{O} )} } \right)^{2} }}{{\sum\nolimits_{{i = 1}}^{n} {(P_{i} - \overline{P} )^{2} \sum\nolimits_{{i = 1}}^{n} {(O_{i} - \overline{O} )^{2} } } }}$$

(10)

$$NSE = 1 - \frac{\sum\nolimits_{i = 1}^{n} \left( O_{i} - P_{i} \right)^{2}}{\sum\nolimits_{i = 1}^{n} \left( O_{i} - \overline{O} \right)^{2}}$$

(11)

$$RMSE = \sqrt{\frac{1}{n}\sum\limits_{i = 1}^{n} \left( P_{i} - O_{i} \right)^{2}}$$

(12)

$$DW = \frac{\sum\nolimits_{i = 2}^{n} \left( \delta x_{i} - \delta x_{i - 1} \right)^{2}}{\sum\nolimits_{i = 1}^{n} \delta x_{i}^{2}}$$

(13)

$$Accuracy = \frac{{TP + TN}}{{TP + TN + FP + FN}}$$

(14)

$$MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| P_{i} - O_{i} \right|$$

(15)

In these equations, ${P_i}$ is the computed value, ${O_i}$ is the observed value, $\overline{P}$ is the mean of the computed values, $\overline{O}$ is the mean of the observed values, and n is the number of data points. $\delta {x_i}$ and $\delta {x_{i - 1}}$ are the residuals at consecutive points.

Additionally, the following definitions are applied:

TP (True Positives): The number of flood points correctly identified as flood.
TN (True Negatives): The number of non-flood points correctly identified as non-flood.
FP (False Positives): The number of non-flood points incorrectly identified as flood.
FN (False Negatives): The number of flood points incorrectly identified as non-flood.
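The indicators in Eqs. (10) to (15) translate directly into code, as in the sketch below. `pred` holds the computed values P_i and `obs` the observed values O_i; the function names are illustrative, and the Durbin-Watson function takes the residuals δx_i.

```python
# Minimal sketch of the evaluation indicators, Eqs. (10)-(15).
import math


def r2(pred, obs):
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    num = sum((p - mp) * (o - mo) for p, o in zip(pred, obs)) ** 2
    den = sum((p - mp) ** 2 for p in pred) * sum((o - mo) ** 2 for o in obs)
    return num / den


def nse(pred, obs):
    mo = sum(obs) / len(obs)
    return 1.0 - sum((o - p) ** 2 for p, o in zip(pred, obs)) / sum((o - mo) ** 2 for o in obs)


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def durbin_watson(residuals):
    """About 2: no lag-1 autocorrelation; toward 0 positive, toward 4 negative."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)
```

Note that R² only measures linear association: predictions of 2·O + 1 give R² = 1 but a poor NSE, which is why NSE and RMSE are reported alongside it.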

Evaluation of flash flood sensitivity prediction map accuracy using the ROC curve

In this phase, the accuracy of the flash flood susceptibility maps was assessed using the ROC (Receiver Operating Characteristic) curve and the area under the curve (AUC). The ROC curve plots the true positive rate against the false positive rate for every possible threshold value, and the AUC reflects the model’s ability to distinguish flood from non-flood pixels. The model with the highest AUC value was considered the best.
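The AUC can be computed without drawing the curve, because the area under the empirical ROC curve equals the probability that a randomly chosen flood point receives a higher score than a randomly chosen non-flood point (ties count one half). The pairwise sketch below is O(n²) and meant for illustration only; the function name is hypothetical.

```python
# Minimal sketch: AUC as the Mann-Whitney pairwise ranking probability.

def roc_auc(labels, scores):
    """labels: 1 = flood, 0 = non-flood; scores: predicted susceptibility."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of flood and non-flood points.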

Results

Factors influencing flash flood occurrence

In this study, the significant parameters affecting flash flood susceptibility and occurrence were identified: watershed area, perimeter, basin length, stream order, stream length, average stream length, maximum and minimum elevation, relief ratio, slope, aspect, distance from river, hydrologic soil group, land use, bifurcation ratio, stream length ratio, stream frequency, drainage density, ruggedness number, length ratio, circularity ratio, form factor, lithology, NDVI, 2-year return period precipitation, 2-year return period discharge, Topographic Wetness Index (TWI), Sediment Transport Index (STI), and Stream Power Index (SPI). The information layers for all these parameters were prepared and are presented in the following sections.

Landuse

The land use data were derived from cloud-free, multi-temporal Sentinel-2 images with low atmospheric reflectance and from Sentinel-1 radar images acquired in 2022, processed on the Google Earth Engine platform. Because some land uses are spectrally similar, such as residential areas and barren land, or dense shrubland and agricultural land, elevation, vegetation index (NDVI), and slope layers were added to the Google Earth Engine classification to improve land use detection. The classification combined Sentinel-1 and Sentinel-2 images in Google Earth Engine using the Random Forest algorithm, and the results were validated against GPS field points. The classification achieved a Kappa coefficient of 0.80 and an overall accuracy of 0.82. The land use map is presented in Fig. 2a. To evaluate the classification in more detail, User’s Accuracy and Producer’s Accuracy were calculated for each land use class. User’s Accuracy is the proportion of pixels classified into a given class that truly belong to that class, whereas Producer’s Accuracy is the proportion of ground reference pixels of a class that the model correctly identified (Table 2).

Table 2 Per-class accuracy assessment of the land use/land cover classification.

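Overall accuracy, Kappa, and the per-class User’s and Producer’s Accuracy can all be derived from a single confusion matrix. The sketch below is illustrative only: it assumes rows are mapped classes and columns are reference (GPS) classes, and the 3×3 matrix is made-up counts, not the study’s Table 2.

```python
from typing import Sequence


def accuracy_report(cm: Sequence[Sequence[int]]) -> dict:
    """Accuracy statistics from a confusion matrix.

    Assumed layout: cm[i][j] = number of validation pixels mapped as
    class i whose reference (field/GPS) label is class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(k)]
    row_tot = [sum(cm[i]) for i in range(k)]                       # mapped totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # reference totals

    overall = sum(diag) / n
    # Chance agreement expected from the row and column marginals
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (overall - p_e) / (1 - p_e)
    # User's accuracy: correct / all pixels mapped as the class (row total)
    users = [diag[i] / row_tot[i] if row_tot[i] else float("nan") for i in range(k)]
    # Producer's accuracy: correct / all reference pixels of the class (column total)
    producers = [diag[j] / col_tot[j] if col_tot[j] else float("nan") for j in range(k)]
    return {"overall": overall, "kappa": kappa, "users": users, "producers": producers}


# Hypothetical counts for three classes
cm = [[50, 3, 2],
      [5, 40, 5],
      [5, 7, 33]]
report = accuracy_report(cm)
```

Because row and column totals differ, the first class has a higher User’s than Producer’s Accuracy here, which is exactly the kind of asymmetry a per-class table exposes and a single Kappa value hides.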

Soil texture map

To prepare the soil texture map, we followed the methodology of Bousbih et al.²⁷, using the SRCI (Soil Reflectance Classification Index) and BI (Brightness Index) together with Sentinel-2 bands 11 and 12 and soil moisture data from Sentinel-1. The percentages of clay (Fig. 2b) and sand (Fig. 2c) were extracted in Google Earth Engine, and the percentage of silt (Fig. 2d) was then obtained by subtracting the sum of the clay and sand percentages from 100%. Finally, the soil was classified into the United States Department of Agriculture (USDA) texture classes within Google Earth Engine (Fig. 2e).
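The silt calculation and the USDA classification step can be sketched per pixel as follows. This is an illustration, not the study’s Earth Engine code: it uses the commonly published rule set for the USDA texture triangle, checked in order so that boundary points fall into the first matching class.

```python
def silt_from_clay_sand(clay: float, sand: float) -> float:
    """Silt (%) as the remainder of the three USDA fractions."""
    silt = 100.0 - (clay + sand)
    if silt < 0:
        raise ValueError("clay + sand exceeds 100%")
    return silt


def usda_texture_class(sand: float, silt: float, clay: float) -> str:
    """USDA texture class from sand, silt and clay percentages."""
    if silt + 1.5 * clay < 15:
        return "sand"
    if silt + 2 * clay < 30:
        return "loamy sand"
    # From here on silt + 2*clay >= 30 holds
    if (7 <= clay < 20 and sand > 52) or (clay < 7 and silt < 50):
        return "sandy loam"
    if 7 <= clay < 27 and 28 <= silt < 50 and sand <= 52:
        return "loam"
    if (silt >= 50 and 12 <= clay < 27) or (50 <= silt < 80 and clay < 12):
        return "silt loam"
    if silt >= 80 and clay < 12:
        return "silt"
    if 20 <= clay < 35 and silt < 28 and sand > 45:
        return "sandy clay loam"
    if 27 <= clay < 40 and 20 < sand <= 45:
        return "clay loam"
    if 27 <= clay < 40 and sand <= 20:
        return "silty clay loam"
    if clay >= 35 and sand > 45:
        return "sandy clay"
    if clay >= 40 and silt >= 40:
        return "silty clay"
    if clay >= 40 and sand <= 45 and silt < 40:
        return "clay"
    return "unclassified"


clay, sand = 20.0, 40.0
texture = usda_texture_class(sand, silt_from_clay_sand(clay, sand), clay)
```

The same conditions can be vectorized over raster bands, which is how they would typically be expressed in a Google Earth Engine expression.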

Hydrologic soil groups map

In large watersheds with multiple soil groups, the area and position of each group must be considered. Soil groups occupying less than 3% of the total watershed area can be excluded, but impervious surfaces must always be taken into account. The hydrologic soil groups are classified according to their minimum infiltration rates (Table 3)⁵⁵.

Table 3 Minimum infiltration rates in hydrologic soil groups.

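The grouping rule and the 3% exclusion rule can be expressed in a few lines. The thresholds below are the commonly quoted SCS/NRCS values (A > 7.6, B 3.8 to 7.6, C 1.3 to 3.8, D < 1.3 mm/h); they are an assumption here, since the contents of Table 3 are not reproduced in the text.

```python
def hydrologic_soil_group(min_infiltration_mm_h: float) -> str:
    """Map a minimum infiltration rate (mm/h) to an SCS hydrologic soil group.

    Assumed thresholds: A > 7.6, B 3.8-7.6, C 1.3-3.8, D < 1.3 mm/h.
    """
    if min_infiltration_mm_h < 0:
        raise ValueError("infiltration rate cannot be negative")
    if min_infiltration_mm_h > 7.6:
        return "A"
    if min_infiltration_mm_h >= 3.8:
        return "B"
    if min_infiltration_mm_h >= 1.3:
        return "C"
    return "D"


def groups_to_keep(area_by_group: dict, min_fraction: float = 0.03,
                   always_keep: tuple = ("impervious",)) -> set:
    """Drop groups below `min_fraction` of the watershed area, but never
    drop impervious surfaces (the key name "impervious" is hypothetical)."""
    total = sum(area_by_group.values())
    return {g for g, a in area_by_group.items()
            if a / total >= min_fraction or g in always_keep}
```

For example, a watershed with areas `{"A": 50, "B": 48, "D": 1, "impervious": 1}` keeps A, B and the impervious surfaces, while group D (1%) is dropped.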

The soil texture map derived from satellite images in the previous step was used to prepare the hydrologic soil group map (Fig. 2f).

Curve number (CN) map

After the hydrologic soil groups were combined with the land use and vegetation cover maps, a Curve Number (CN) map was produced for use in the SCS method to estimate discharge in sub-basins lacking statistical data. The CN values are given in Table 4 and the map is shown in Fig. 2g.

Table 4 Curve number (CN) for average soil moisture conditions.

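For reference, the standard SCS-CN relations that turn a curve number into a runoff depth are sketched below in metric units. The initial-abstraction ratio of 0.2 is the conventional default and an assumption here; the text does not state which ratio or how sub-basin CN values were aggregated (an area-weighted mean is shown as one common choice).

```python
def area_weighted_cn(parts):
    """Composite CN for a sub-basin from (area, cn) pairs."""
    total = sum(a for a, _ in parts)
    return sum(a * cn for a, cn in parts) / total


def scs_runoff_mm(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth Q (mm) from rainfall depth P (mm), SCS-CN method."""
    if not 0 < cn <= 100:
        raise ValueError("CN must be in (0, 100]")
    s = 25400.0 / cn - 254.0   # potential maximum retention S (mm)
    ia = ia_ratio * s          # initial abstraction Ia (mm)
    if p_mm <= ia:
        return 0.0             # all rainfall absorbed before runoff starts
    return (p_mm - ia) ** 2 / (p_mm - ia + s)
```

With CN = 100 (fully impervious) the retention is zero and all rainfall becomes runoff; with CN = 80 a 50 mm storm yields roughly 13.8 mm of runoff.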

Precipitation and discharge

To prepare the precipitation map, the rain gauge stations were first screened and those with the longest records were selected. Some stations lie within the watershed and others around it (Fig. 2h). EasyFit software was used to identify the best-fitting statistical distribution for each station and to estimate precipitation for different return periods. The analysis used annual maximum daily precipitation over a 30-year period (1992–2022), and the results for the various return periods are presented in Table 5. As with the hydrometric stations, no single distribution fits all rain gauge stations: each station may require its own distribution, and 15 distinct distributions were selected for the 19 stations. Precipitation for each return period was then zoned by applying various interpolation models to the station values in ArcGIS 10.8.1. The best models for zoning precipitation with different return periods in the Dez sub-basins of Lorestan Province are presented in Table 5.
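As an illustration of how a return-period depth is obtained from one station’s annual maxima, the sketch below fits a Gumbel (EV1) distribution by the method of moments and reads off the quantile with non-exceedance probability 1 − 1/T. Gumbel is only one of the candidate distributions EasyFit can select, and the rainfall series is hypothetical.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329


def gumbel_fit_moments(annual_maxima):
    """Method-of-moments Gumbel parameters (location, scale)."""
    scale = math.sqrt(6) * stdev(annual_maxima) / math.pi
    loc = mean(annual_maxima) - EULER_GAMMA * scale
    return loc, scale


def gumbel_quantile(loc, scale, return_period):
    """Depth exceeded on average once every `return_period` years."""
    if return_period <= 1:
        raise ValueError("return period must exceed 1 year")
    p = 1.0 - 1.0 / return_period          # annual non-exceedance probability
    return loc - scale * math.log(-math.log(p))


# Hypothetical annual maximum daily rainfall (mm) at one station
maxima = [38.2, 45.0, 52.1, 29.7, 61.3, 40.8, 47.5, 35.9, 55.0, 43.1]
loc, scale = gumbel_fit_moments(maxima)
design = {T: gumbel_quantile(loc, scale, T) for T in (2, 5, 10, 25, 50, 100, 200)}
```

Because the Gumbel distribution is right-skewed, the 2-year depth (the median) is slightly below the sample mean.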

Table 5 Best kriging models for interpolating daily maximum precipitation with different return periods in the Dez Sub-basins of Lorestan Province.


The precipitation zoning map with a 2-year return period, along with the locations of the rain gauge stations, is presented in Fig. 2h.
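Table 5 reports the best kriging model for each return period but not its fitted parameters, so the following is a minimal ordinary kriging sketch under assumptions: a spherical semivariogram with hypothetical nugget, sill, and range, and projected station coordinates. In practice these parameters are fitted to the empirical variogram (for example in ArcGIS Geostatistical Analyst).

```python
import numpy as np


def spherical_variogram(h, nugget, sill, range_):
    """Spherical semivariogram; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    inside = nugget + (sill - nugget) * (1.5 * h / range_ - 0.5 * (h / range_) ** 3)
    gamma = np.where(h < range_, inside, sill)
    return np.where(h == 0, 0.0, gamma)


def ordinary_kriging(xy, z, target, nugget, sill, range_):
    """Ordinary kriging estimate at one target point.

    xy: (n, 2) station coordinates, z: (n,) station values, target: (2,) point.
    Returns (estimate, weights).
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Semivariances between all station pairs, bordered by the unbiasedness row/column
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = spherical_variogram(d, nugget, sill, range_)
    a[n, n] = 0.0
    # Semivariances between stations and the target
    d0 = np.linalg.norm(xy - np.asarray(target, dtype=float), axis=1)
    b = np.ones(n + 1)
    b[:n] = spherical_variogram(d0, nugget, sill, range_)
    sol = np.linalg.solve(a, b)
    weights = sol[:n]  # last entry is the Lagrange multiplier
    return float(weights @ z), weights
```

Ordinary kriging is an exact interpolator: at a station location it returns the station value, and its weights always sum to one.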

Next, the length of the drainage network and its average slope were determined for each sub-basin of the watershed using the ASTER digital elevation model provided by the USGS. The results are presented in Table 6.

Table 6 sub-basins of Lorestan Province.


Table 7 presents the coefficient of determination (R²), the Nash-Sutcliffe efficiency (NSE) coefficient, and the root mean square error (RMSE) used to evaluate the multiple regression and SCS methods for return periods of 2, 5, 10, 25, 50, 100, and 200 years, together with the Durbin-Watson statistic. As Table 7 shows, the multiple regression method outperformed the SCS method in this study. The runoff estimates obtained with the SCS method were consistently higher than the observed values, in agreement with Talikhoshk et al.⁵⁶; this discrepancy may be due to the method’s neglect of actual evapotranspiration and real soil moisture content⁵⁶. The poor performance of the SCS method, however, contradicts the results reported by Hosseini⁵⁷, Esfandiari et al.⁵⁸, and Soleimani et al.⁵⁹. The multiple regression results confirm the high accuracy of this approach for simulating runoff in areas lacking statistical data, consistent with the research of Zema et al.⁶⁰,⁶¹. The 2-year return period showed the best performance, and the corresponding regression equation for use at stations lacking statistical data is given in Eq. (16). Figure 2i shows the spatial distribution of discharge with a 2-year return period across the study area.

$$D_{2} = -0.051A + 10.77\,CN + 0.487P + 1.17S + 0.002L - 1.129\,SL - 850.791 \tag{16}$$

where $D_2$ denotes the discharge for the 2-year return period, $A$ is the area, $CN$ is the curve number, $P$ is precipitation, $S$ is retention, $L$ is the stream length, and $SL$ is the stream slope.
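Equation (16) can be evaluated directly, as in the sketch below. The text does not state the units of the inputs, so the values in the usage example are hypothetical and serve only to show the arithmetic. Because the intercept is large and negative, inputs outside the range of the calibration data can easily produce meaningless (even negative) discharges.

```python
def discharge_2yr(A, CN, P, S, L, SL):
    """2-year discharge from the regression of Eq. (16).

    Units follow the original regression (not stated in the text); the
    coefficients are only meaningful within the calibration data range.
    """
    return (-0.051 * A + 10.77 * CN + 0.487 * P + 1.17 * S
            + 0.002 * L - 1.129 * SL - 850.791)


# Hypothetical sub-basin attributes
d2 = discharge_2yr(A=100, CN=80, P=50, S=63.5, L=20000, SL=5)
```

Term by term: −5.1 + 861.6 + 24.35 + 74.295 + 40 − 5.645 − 850.791 = 138.709, which shows how strongly the curve-number term dominates the equation.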

Considering the Durbin-Watson statistic (acceptable range 1.5 to 2.5), the multiple regression analyses between the input and output data are validated for the 5-, 10-, and 25-year return periods. The coefficient of determination, RMSE, and Nash-Sutcliffe efficiency values of the multiple regression analysis corroborate this evaluation (Table 7).

Table 7 Comparison of multiple regression and SCS methods across different return periods.

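The four statistics used to compare the regression and SCS methods can be computed as follows. This is a generic sketch with standard definitions; R² is taken here as the squared Pearson correlation between observed and simulated values, which equals the regression R² for an ordinary least-squares fit with an intercept.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    m = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1 - sse / sum((o - m) ** 2 for o in obs)


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    vo = sum((o - mo) ** 2 for o in obs)
    vs = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (vo * vs)


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 indicate no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)
```

The distinction matters for a method that consistently overestimates, as the SCS method does here: simulated values of `[2, 4, 6]` against observations `[1, 2, 3]` give a perfect R² of 1 but an NSE of −6, because NSE penalizes bias and R² does not.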

Slope

The slope of a watershed plays a significant role in runoff and flooding. Increased slope results in decreased infiltration, increased runoff, and elevated peak discharge during flood events²⁸. The slope map is presented in Fig. 2j.
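A slope raster can be derived from the DEM with finite differences. The sketch below uses central differences via `numpy.gradient`; GIS packages such as ArcGIS typically use Horn’s 3×3 weighted window, so values may differ slightly. The square cell size in metres is an assumption.

```python
import numpy as np


def slope_degrees(dem, cellsize):
    """Slope (degrees) from a DEM grid with square cells of `cellsize` metres."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cellsize)
    # Magnitude of the elevation gradient converted to an angle
    return np.degrees(np.arctan(np.hypot(dz_drow, dz_dcol)))


# A synthetic plane rising 15 m per 30 m cell towards the east
cols = np.arange(5) * 30.0
dem = np.tile(0.5 * cols, (4, 1))
slope = slope_degrees(dem, 30.0)
```

Every cell of this plane has a gradient of 0.5, i.e. a slope of about 26.6°.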

Aspect

Aspect can affect infiltration rates by altering factors such as snowmelt, evaporation, soil moisture, and wind. The aspect map was classified into ten classes spanning 0 to 360 degrees (Fig. 2k).
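Aspect is the compass direction of steepest descent, measured clockwise from north. The sketch below assumes a north-up raster (row index increasing southward, column index eastward) and marks flat cells with −1; the ten-class binning used in the study is not reproduced because its class limits are not given.

```python
import numpy as np


def aspect_degrees(dem, cellsize):
    """Aspect (degrees clockwise from north) of the downslope direction.

    Assumes rows run north to south and columns west to east.
    Flat cells are returned as -1.
    """
    dem = np.asarray(dem, dtype=float)
    dz_drow, dz_dcol = np.gradient(dem, cellsize)
    dz_dx = dz_dcol    # elevation change towards east
    dz_dy = -dz_drow   # elevation change towards north (rows run south)
    # Downslope vector is (-dz_dx, -dz_dy); azimuth = atan2(east, north)
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    flat = (dz_dx == 0) & (dz_dy == 0)
    return np.where(flat, -1.0, aspect)
```

A surface rising towards the east faces west (270°), and one rising towards the north faces south (180°).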

Geology

From a geological perspective, rocks directly influence permeability and surface runoff, which makes lithology a critical factor in the occurrence of flooding in watersheds. Fig. 2l presents the geological map of the study area, and Table 8 details each geological class identified in the sub-basins of the Dez watershed in Lorestan Province.

Table 8 Geological classes of the Dez watershed Sub-basins in Lorestan Province.


Stream power index (SPI)

The Stream Power Index (SPI) serves as a crucial parameter for assessing stream dynamics and erosion in river systems, significantly influencing the occurrence of flash floods. Specifically, variations in this index can indicate fundamental changes in hydrological patterns as well as the potential for erosion and sedimentation within a watershed. Figure 2m illustrates the SPI map for the study area.

Sediment transport index (STI)

The Sediment Transport Index (STI) is a valuable tool for evaluating the risk of flash floods in various regions, particularly in areas prone to severe soil erosion. Figure 2n presents the sediment transport index map for the region under study.
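SPI and STI are both computed from the specific catchment area As (upslope area per unit contour width) and the local slope β. The sketch below uses the commonly cited forms SPI = As·tan β and STI = (As/22.13)^0.6·(sin β/0.0896)^1.3, where 22.13 m and 0.0896 are the USLE unit-plot length and slope constants; the text does not state which variants were used, and deriving As from a flow-accumulation grid (cell count plus one, times cell width) is an assumption.

```python
import numpy as np


def specific_catchment_area(flow_acc_cells, cellsize):
    """As (m^2 per m of contour width): (upslope cells + the cell itself)
    times cell area, divided by cell width."""
    return (np.asarray(flow_acc_cells, dtype=float) + 1.0) * cellsize


def stream_power_index(sca, slope_deg):
    """SPI = As * tan(beta)."""
    return np.asarray(sca, dtype=float) * np.tan(np.radians(slope_deg))


def sediment_transport_index(sca, slope_deg, m=0.6, n=1.3):
    """STI = (As / 22.13)^m * (sin(beta) / 0.0896)^n."""
    sin_b = np.sin(np.radians(slope_deg))
    return (np.asarray(sca, dtype=float) / 22.13) ** m * (sin_b / 0.0896) ** n
```

Both indices grow with contributing area and slope, so they peak along steep, well-fed channels; STI equals 1 on a slope matching the unit plot.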

Vegetation cover (NDVI)

The Normalized Difference Vegetation Index (NDVI) is one of the most recognized, straightforward, and widely used indices in vegetation studies. Its values range from −1 to +1, with higher values indicating denser vegetation; negative values typically correspond to clouds, snow, and water. The NDVI map for the study area is shown in Fig. 2o.
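NDVI is the normalized difference of near-infrared and red reflectance. For Sentinel-2 these are bands B8 and B4; the text does not say which sensor produced the NDVI layer, so that band choice is an assumption. Pixels where both bands are zero are set to NaN instead of dividing by zero.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red == 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(np.broadcast(nir, red).shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Hypothetical surface reflectances: vegetation, bare soil, no data
values = ndvi([0.5, 0.2, 0.0], [0.1, 0.2, 0.0])
```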

Topographic wetness index (TWI)

The Topographic Wetness Index (TWI) plays a significant role in flash flood occurrence because it reflects the tendency of water to accumulate at a location, as determined by the upslope contributing area and the local slope. Areas with high TWI typically accumulate more water and may be poorly drained, making them susceptible to flooding during heavy rainfall. TWI data also help predict moisture conditions and manage water resources, which can be instrumental in mitigating flash flood risk. Figure 2p displays the TWI map for the study area.
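The usual definition is TWI = ln(As / tan β), with As the specific catchment area and β the local slope. In the sketch below, As is taken as a precomputed input, and the small floor on tan β (an assumption, chosen arbitrarily) keeps flat cells finite.

```python
import numpy as np


def topographic_wetness_index(sca, slope_deg, min_tan=1e-4):
    """TWI = ln(As / tan(beta)); tan(beta) is floored to avoid division by zero on flats."""
    tan_b = np.maximum(np.tan(np.radians(np.asarray(slope_deg, dtype=float))), min_tan)
    return np.log(np.asarray(sca, dtype=float) / tan_b)
```

For the same contributing area, a gentle slope gives a higher TWI than a steep one, which is why valley floors and footslopes stand out as wet zones.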

Distance from river

Distance from river is a critical factor in assessing the risk of flash floods, as proximity to the river increases the likelihood of inundation. Intense rainfall can significantly elevate water levels and expose adjacent areas to flooding. Figure 2q illustrates the Distance from river for the study area.
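A distance-from-river layer is a Euclidean distance transform of the river raster. The brute-force sketch below is intended for small grids and illustration; for full rasters, `scipy.ndimage.distance_transform_edt(~river_mask, sampling=cellsize)` computes the same distances far more efficiently. Cell centres and square cells are assumed.

```python
import numpy as np


def distance_to_river(river_mask, cellsize):
    """Euclidean distance (m) from every cell centre to the nearest river cell."""
    mask = np.asarray(river_mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask contains no river cells")
    rows, cols = np.indices(mask.shape)
    cells = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    river = cells[mask.ravel()]
    # Pairwise distances (in cells) between every cell and every river cell
    d = np.sqrt(((cells[:, None, :] - river[None, :, :]) ** 2).sum(axis=-1))
    return d.min(axis=1).reshape(mask.shape) * cellsize


# A river running down the first column of a 3 x 5 grid of 30 m cells
mask = np.zeros((3, 5), dtype=bool)
mask[:, 0] = True
dist = distance_to_river(mask, 30.0)
```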

The results of the analyses of the watershed’s physical parameters are presented as digital maps. These maps show the spatial distribution of key features within the watershed, including area, elevation, drainage network, and flow patterns, and allow a visual and scientific examination of the relationships between watershed parameters and flash flood potential. Figure 3 covers the following watershed metrics: area, perimeter, basin length, stream order, stream length, average stream length, maximum and minimum elevation, relief ratio, bifurcation ratio, stream frequency, drainage density, ruggedness number, length ratio, circularity ratio, and form factor.
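Several of these metrics are simple ratios of basin measurements. The sketch below uses the standard textbook definitions; the units (km, km², and elevations in m converted to km for relief) are assumptions, since the text does not list the formulas it applied.

```python
import math


def morphometry(area_km2, perimeter_km, basin_length_km, total_stream_length_km,
                n_streams, h_max_m, h_min_m):
    """Common basin morphometric indices from basic measurements."""
    relief_km = (h_max_m - h_min_m) / 1000.0
    dd = total_stream_length_km / area_km2
    return {
        "drainage_density": dd,                                           # km/km^2
        "stream_frequency": n_streams / area_km2,                         # streams/km^2
        "form_factor": area_km2 / basin_length_km ** 2,                   # dimensionless
        "circularity_ratio": 4 * math.pi * area_km2 / perimeter_km ** 2,  # 1 for a circle
        "relief_ratio": relief_km / basin_length_km,                      # dimensionless
        "ruggedness_number": relief_km * dd,                              # dimensionless
    }


def bifurcation_ratios(streams_per_order):
    """Rb between successive stream orders: N_u / N_(u+1)."""
    return [a / b for a, b in zip(streams_per_order, streams_per_order[1:])]
```

A perfectly circular basin has a circularity ratio of 1; elongated basins score lower and tend to produce flatter, later flood peaks.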

Detection of multicollinearity among factors

Before running the models, a multiple correlation analysis was performed to detect multicollinearity among the factors, using the tolerance (TOL) and variance inflation factor (VIF) statistics. The results of this analysis for the parameters used in modeling are given in Table 9.

Table 9 Multiple correlation analysis of input parameters to the Model.

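TOL and VIF come from regressing each factor on all the others: TOL_j = 1 − R_j² and VIF_j = 1/TOL_j. A minimal NumPy sketch is shown below; the synthetic data are for illustration only. A common rule of thumb flags VIF > 10 (TOL < 0.1) as problematic, although the text does not state which threshold was applied.

```python
import numpy as np


def tolerance_and_vif(X):
    """TOL and VIF for each column of X (n samples x k factors).

    R_j^2 comes from an OLS regression, with intercept, of column j
    on all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        centred = y - y.mean()
        tol[j] = resid @ resid / (centred @ centred)   # = 1 - R_j^2
    return tol, 1.0 / tol


rng = np.random.default_rng(0)
a, b = rng.normal(size=500), rng.normal(size=500)
c = a + b + rng.normal(scale=0.01, size=500)   # nearly a linear combination
tol, vif = tolerance_and_vif(np.column_stack([a, b, c]))
```

In this example all three factors receive very large VIFs, since any one of them is almost exactly recoverable from the other two.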

Determination of training and validation datasets

In this phase of the research, training and validation datasets were created by field sampling. During field visits, flood-prone and non-flood-prone locations were recorded with a Global Positioning System (GPS) device, yielding 252 sites in total. Of these, 70% were used to train the models and 30% to validate them (Fig. 4).
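The text does not say how the 70/30 split was drawn. One common choice, sketched here as an assumption, is a stratified random split that preserves the flood/non-flood ratio in both sets. The equal 126/126 class balance and the seed are hypothetical.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=42):
    """Split sample indices into training and validation sets while
    preserving each class's proportion."""
    rng = random.Random(seed)
    train, val = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(train_frac * len(idx))
        train += idx[:k]
        val += idx[k:]
    return sorted(train), sorted(val)


# Hypothetical inventory: 126 flood (1) and 126 non-flood (0) sites
labels = [1] * 126 + [0] * 126
train, val = stratified_split(labels)
```

With 252 sites this yields 176 training and 76 validation points.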

Model results

The spatial outputs of the models are presented in Fig. 5, which displays the flash flood susceptibility maps on a continuous scale from low to very high. Comparing these maps reveals a critical point: the Random Forest model, which was the most accurate, provides a more precise and conservative spatial estimate of high-risk areas, placing the northeastern and southwestern parts of the study area in the “high” to “very high” susceptibility range. In contrast, the AdaBoost model, which performed worst, classifies the central to southern part of the region as high-risk. Consistent with its low AUC score, this pattern suggests a higher false-positive rate, that is, a tendency to label safe areas as hazardous. The Random Forest map is therefore not only statistically superior but also more reliable for practical risk management and planning.

Model evaluation

A comprehensive analysis of the evaluation results shows that the Random Forest (RF) model is superior on all key performance metrics. It is the optimal performer, with the lowest error rates (RMSE: 0.12 and MAE: 0.11), the highest explained variance (R²: 0.94), the greatest accuracy (Accuracy: 0.95), and the strongest ability to discriminate between classes (AUC: 0.89), clearly ahead of the competing models. The H2O model emerges as a reliable alternative and the second-best candidate: although its results do not match those of RF, it ranks second on every metric (e.g., RMSE: 0.25 and Accuracy: 0.85), substantially ahead of the remaining models. Among the others, XGBoost, which typically ranks third or fourth, performs somewhat better than CatBoost, LightGBM, and AdaBoost. These three models occupy the lowest tiers of the comparison, with weak R² and AUC values.

Table 10 Performance evaluation results of the models utilized in the study.

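The text does not give the formulas behind Table 10. A plausible reading, sketched here as an assumption, computes accuracy from predicted flood probabilities at a 0.5 threshold, and RMSE, MAE, and R² (as 1 − SSE/SST) directly on the probabilities against the 0/1 labels. The four validation points are hypothetical.

```python
import math


def classification_metrics(y_true, p_pred, threshold=0.5):
    """Accuracy, RMSE, MAE and R^2 for probabilistic predictions of 0/1 labels."""
    n = len(y_true)
    errors = [p - y for y, p in zip(y_true, p_pred)]
    mean_y = sum(y_true) / n
    sst = sum((y - mean_y) ** 2 for y in y_true)
    sse = sum(e * e for e in errors)
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return {
        "accuracy": hits / n,
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - sse / sst,
    }


metrics = classification_metrics([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.1])
```

Here every point is classified correctly (accuracy 1.0), yet the two uncertain predictions (0.6 and 0.4) still pull RMSE to about 0.29 and R² down to 0.66, which is why error metrics on probabilities complement plain accuracy.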

The discriminatory power of each model was also compared quantitatively using the Receiver Operating Characteristic (ROC) curves in Fig. 6. The Area Under the Curve (AUC) values derived from this figure provide a threshold-independent metric for comparing the models. The Random Forest model achieved an AUC of 0.89, whereas the AdaBoost model yielded a substantially lower AUC of 0.60. This difference of 0.29 underscores the superior ability of the Random Forest model to distinguish flood-prone from non-flood-prone locations: an AUC of 0.89 indicates high predictive accuracy, while 0.60 is only marginally better than random guessing (AUC = 0.5). The evidence from Fig. 6 corroborates the statistical findings in Table 10 and establishes Random Forest as the most robust model for this geospatial application.
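The AUC has a direct probabilistic meaning: it is the chance that a randomly chosen flood site receives a higher susceptibility score than a randomly chosen non-flood site, with ties counting one half (the Mann-Whitney formulation). The pairwise sketch below is adequate for validation sets of the size used here (76 points); the scores are hypothetical.

```python
def auc_mann_whitney(y_true, scores):
    """ROC AUC as P(score of a flood site > score of a non-flood site)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


auc = auc_mann_whitney([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3, 0.2])
```

In this example 8 of the 9 flood/non-flood pairs are ranked correctly, so the AUC is 8/9 ≈ 0.89; a model that assigns every site the same score gets exactly 0.5.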

Comparative analysis of computational efficiency in machine learning models

Table 11 summarizes the computational efficiency of the models. Random Forest and LightGBM are the most computationally efficient, with relatively short processing times (approximately 10 and 14 min, respectively) and moderate memory consumption (2–3 GB). Conversely, CatBoost and H2O Deep Learning are the most resource-demanding algorithms, with much longer processing times (32 and 43 min, respectively) and substantial memory requirements (5–8 GB). XGBoost and AdaBoost fall in the moderate efficiency category, with processing times of approximately 20 and 16 min and memory usage of 4 GB and 3 GB, respectively.

Table 11 Computational efficiency comparison of machine learning models.

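The text does not describe how processing time and memory were measured. A minimal standard-library sketch for timing one training call and recording its peak Python-level allocation is shown below. Note the limitation stated in the docstring: `tracemalloc` only sees allocations reported to Python’s tracer, so memory held by native code that does not report to it, or by H2O’s separate Java process, would need OS-level monitoring instead.

```python
import time
import tracemalloc


def profile_call(fn, *args, **kwargs):
    """Run fn once; return (result, wall-clock seconds, peak traced memory in MB).

    Only allocations visible to tracemalloc are counted; memory used by
    untracked native code or by an external process is not captured.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak / 2 ** 20


# Stand-in workload; a real benchmark would pass a model's fit function
result, seconds, peak_mb = profile_call(lambda n: sum(range(n)), 1000)
```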

Discussion and conclusion

This study evaluated the sensitivity of sub-basins in the Dez River basin to flash floods using a suite of machine learning models: Random Forest (RF), H2O, and several boosting algorithms (XGBoost, CatBoost, LightGBM, AdaBoost). The comparative analysis revealed a pronounced superiority of the Random Forest model, which achieved exceptional performance metrics (R² = 0.94, RMSE = 0.12, AUC = 0.89). Its clear outperformance of the boosting algorithms, and particularly the large gap with AdaBoost (R² = 0.58, RMSE = 0.43), warrants interpretation within the physiographic and hydrological context of the Dez Basin. The robust performance of Random Forest can be attributed to algorithmic strengths that suit the complex, non-linear relationships among the diverse geospatial and hydrological parameters used in this study (e.g., topography, land use, soil, and rainfall). Its bagging (bootstrap aggregating) ensemble reduces variance and mitigates the risk of overfitting by averaging many deep decision trees, each built on a bootstrap sample of the data. This property is crucial for environmental datasets, where multicollinearity and complex interactions are common. In contrast, boosting algorithms such as AdaBoost and the gradient boosting methods fit models sequentially, each correcting the errors of its predecessors, which can make them more susceptible to overfitting on noisy data and may explain their lower generalization capability in this application. The second-best performer, the H2O model, consistently ranked behind RF but ahead of the boosting models across all metrics (e.g., R² = 0.77, RMSE = 0.25).
This suggests that its distributed computing framework and regularization techniques may offer a more balanced bias-variance trade-off in this spatial prediction task than the boosting machines evaluated here. The observed performance hierarchy (RF > H2O > XGBoost/CatBoost > LightGBM > AdaBoost) underscores that there is no single universally best algorithm; the optimal choice depends strongly on the data structure and the nature of the problem. Despite the limited performance of CatBoost in the present study, Hasnaoui et al.⁶² showed in the Hodna Basin that integrating CatBoost with a Convolutional Neural Network (CNN) using remote sensing data achieved 92% accuracy in flash flood susceptibility mapping, and their emphasis on the pivotal role of hydrological and topographic factors aligns with the current findings. The divergent findings of other studies, such as Xu et al.⁶³ and Al-Kindi and Alabri¹⁰, who reported superior performance for LightGBM (AUC = 0.9896) and for CatBoost/XGBoost (AUC = 0.91 and 0.98, respectively), further reinforce this context dependency and show how regional specificities influence model efficacy. Regarding the key drivers of flash flood sensitivity, the variable importance analysis of the Random Forest model identified precipitation and watershed area as the most influential factors, while stream order had the least impact. The primacy of precipitation is consistent with the hydro-climatic regime of the region, where intense, short-duration rainfall events are the primary trigger for flash floods. The significance of basin area relates to its role in concentrating runoff. The relative unimportance of stream order may indicate that in these high-relief, flash-flood-prone sub-basins, the rapid overland flow response to rainfall dominates flood generation more than the hierarchical structure of the drainage network.
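The bagging mechanism credited above for Random Forest’s robustness can be illustrated with a deliberately simplified sketch: each learner is trained on a bootstrap sample (drawn with replacement) and predictions are combined by majority vote. A one-feature threshold rule (a decision stump) stands in for a full decision tree here, and the random feature subsetting that distinguishes Random Forest from plain bagging is omitted; the data are synthetic.

```python
import random


def fit_stump(xs, ys):
    """Best single-threshold rule on one feature: returns (threshold, polarity)."""
    best = (None, 1, len(ys) + 1)
    for t in sorted(set(xs)):
        for polarity in (1, 0):
            preds = [int(x > t) if polarity else int(x <= t) for x in xs]
            err = sum(p != y for p, y in zip(preds, ys))
            if err < best[2]:
                best = (t, polarity, err)
    return best[0], best[1]


def stump_predict(stump, x):
    t, polarity = stump
    return int(x > t) if polarity else int(x <= t)


def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def bagging_predict(stumps, x):
    """Majority vote over the ensemble."""
    votes = sum(stump_predict(s, x) for s in stumps)
    return int(votes * 2 > len(stumps))


xs = list(range(1, 11))
ys = [0] * 5 + [1] * 5
stumps = bagging_fit(xs, ys)
```

Individual stumps disagree near the class boundary because each sees a different bootstrap sample, but their vote is stable away from it; averaging out such sample-to-sample variability is the variance reduction that bagging provides.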
The findings of this study concerning the influence of hydrological and topographic indices on flash flood susceptibility align with the results of similar research conducted by Hasnaoui et al.⁶⁴ in the Hodna Basin (Algeria), which also demonstrated, with 89% accuracy, that hydrological and topographic factors are the primary drivers of flood risk. The discrepancy between these findings and studies that identified slope⁶⁵ or distance from the river⁶⁶,⁶⁷ as more critical factors can be explained by the unique combination of climatic, topographic, and geological conditions in the Dez Basin, affirming that flash flood drivers are inherently site-specific. In addition to the model comparisons, the methodological components of the study provided further insights. The land use classification, which achieved an overall accuracy of 82% using Sentinel imagery and the RF algorithm on the Google Earth Engine platform, proved to be a reliable input for the model. Conversely, the overestimation of discharge by the SCS method, consistent with the findings of Talikhoshk et al.⁵⁶, underscores the limitations of simplified hydrological models that do not fully account for dynamic factors such as actual soil moisture and evapotranspiration. This highlights the advantage of data-driven machine learning approaches that can implicitly capture these complex processes when trained on representative data. In conclusion, this research demonstrates that Random Forest is a highly effective tool for flash flood susceptibility mapping in the Dez Basin, outperforming several advanced boosting algorithms. The results emphasize that model selection should be guided by the specific characteristics of the study area. The identification of precipitation and basin area as key factors provides critical information for local disaster management planners, suggesting that monitoring rainfall patterns and considering basin scale are paramount for early warning systems.
Crucially, the practical implications of these scientific findings for basin management are threefold:

Prioritization of infrastructure investments

The high-resolution susceptibility map generated by the RF model (AUC = 0.89) identifies specific zones of “very high” flood susceptibility, particularly in the northeastern and southwestern sub-basins. These areas should be prioritized for structural interventions such as slope stabilization measures (e.g., check dams, terracing) and stormwater drainage systems. Given the dominance of precipitation intensity (β = 0.34) and basin area (β = 0.28) as controlling factors, investments in early-warning systems should focus on real-time rainfall monitoring networks in these high-risk zones rather than on uniform basin-wide coverage.

Land-use planning regulations

The strong influence of land use on flood susceptibility necessitates policy reforms. We recommend that local authorities in Lorestan Province amend zoning regulations to:

Restrict new construction in areas classified as “high” or “very high” susceptibility.

Incentivize preservation of natural vegetation cover (NDVI > 0.6) in headwater regions, as these areas significantly reduce runoff generation.

Community-based adaptation strategies

The model’s accuracy in identifying flood-prone locations provides a scientific basis for community engagement. Disaster risk reduction programs should:

Train farmers in soil conservation techniques for areas with poor drainage and high clay content, where infiltration is limited.

Establish community-based rainfall monitoring stations in data-scarce sub-basins to address the temporal coverage limitations identified in this study.

These management actions translate the scientific findings directly into resilience-building measures, demonstrating how geospatial modeling can inform sustainable water resource governance in data-scarce regions such as the Dez Basin. Future work could explore hybrid modeling approaches and incorporate real-time meteorological data to further enhance predictive accuracy and operational utility.

Study limitations

This study has several limitations that should be considered when interpreting the results:

Data resolution constraints

The Sentinel-2 imagery used has a spatial resolution of 10 m (visible bands), which may not capture sub-pixel hydrological features. For instance, narrow drainage channels (< 10 m width) might be underestimated, potentially affecting the accuracy of flood susceptibility mapping in areas with complex topography. Despite this constraint, the Random Forest model maintained high accuracy (AUC = 0.89), suggesting its robustness to minor spatial limitations.

Temporal coverage limitations

The 5-day revisit period of Sentinel-2 could miss rapid hydrological events. During flash floods in particular, critical peak flow conditions might not be captured, leading to underestimation of flood dynamics in short-duration events. Future studies should integrate higher-resolution imagery (e.g., PlanetScope at 3 m) and fuse Sentinel-2 with Sentinel-1 radar data to improve temporal resolution.

Hydrological estimation uncertainties

Our discharge estimates have a margin of error of ±15% (based on RMSE calculations against gauge data in Table 7), and this uncertainty propagates into the flood susceptibility maps (Fig. 5). In addition, the SCS method consistently overestimated discharge (Table 7), highlighting the limitations of simplified hydrological models that do not fully account for dynamic factors such as soil moisture and evapotranspiration.

Sampling and model generalizability

The flood inventory dataset (252 points) may not fully represent the spatial heterogeneity of flash flood events across the Dez Basin. Although the Random Forest model achieved high accuracy (AUC = 0.89), its performance may differ in regions with other climatic or geomorphological conditions. The scarcity of ground-based hydrological stations remains a critical constraint. In addition, machine learning models such as Random Forest may overfit the training data in regions with few flood samples, and their transferability to ungauged basins is uncertain. The susceptibility maps are validated at the sub-basin scale and may not capture local-scale dynamics in urbanized areas (e.g., Khorramabad city). Future studies should incorporate denser sampling networks and real-time monitoring data.

Data availability

The data are available upon request from the corresponding author.

References

Holton, J. R., Pyle, J. A. & Curry, J. A. Encyclopedia of Atmospheric Sciences. Second Edition. (Academic Press, 2015).
Georgakakos, K. P. Analytical results for operational flash flood guidance. J. Hydrol. 317 (1–2), 81–103. https://doi.org/10.1016/j.jhydrol.2005.05.009 (2006).
Norbiato, D., Borga, M., Degli Esposti, S., Gaume, E. & Anquetin, S. Flash flood warning based on rainfall depth-duration thresholds and soil moisture conditions: an assessment for gauged and ungauged basins. J. Hydrol. 362 (3–4), 274–290. https://doi.org/10.1016/j.jhydrol.2008.08.023 (2008).
Costache, R. et al. Flash-flood hazard using deep learning based on H2O R package and fuzzy-multicriteria decision-making analysis. J. Hydrol. 609, 127747. https://doi.org/10.1016/j.jhydrol.2022.127747 (2022).
Youssef, A. M., Pradhan, B. & Sefry, S. A. Flash flood susceptibility assessment in Jeddah City (Kingdom of Saudi Arabia) using bivariate and multivariate statistical models. Environ. Earth Sci. 75 (1), 12. https://doi.org/10.1007/s12665-015-4830-8 (2016).
Habibi, A., Delavar, M. R., Sadeghian, M. S. & Nazari, B. Flood susceptibility mapping and assessment using regularized Random forest and naïve bayes algorithms. ISPRS Annals Photogrammetry Remote Sens. Spat. Inform. Sci., 10, 241–248. https://doi.org/10.5194/isprs-annals-X-4-W1-2022-241-2023 (2023).
Ren, H. et al. Flood susceptibility assessment with random sampling strategy in ensemble learning (RF and XGBoost). Remote Sens. 16 (2), 320. https://doi.org/10.3390/rs16020320 (2024).
Sellami, E. M. & Rhinane, H. Google Earth engine and machine learning for flash flood exposure Mapping—Case study: Tetouan, Morocco. Geosciences 14 (6), 152. https://doi.org/10.3390/geosciences14060152 (2024).
Elghouat, A. et al. Integrated approaches for flash flood susceptibility mapping: Spatial modeling and comparative analysis of statistical and machine learning models. A case study of the Rheraya watershed, Morocco. J. Water Clim. Change. 15 (8), 3624–3646. https://doi.org/10.2166/wcc.2024.726 (2024).
Al-Kindi, K. M. & Alabri, Z. Investigating the role of the key conditioning factors in flood susceptibility mapping through machine learning approaches. Earth Syst. Environ. 8 (1), 63–81. https://doi.org/10.1007/s41748-023-00369-7 (2024).
Wahba, M. et al. Forecasting of flash flood susceptibility mapping using Random forest regression model and geographic information systems. Heliyon https://doi.org/10.1016/j.heliyon.2024.e33982 (2024).
Rutledge, D. N. & Barros, A. S. Durbin–Watson statistic as a morphological estimator of information content. Anal. Chim. Acta. 454 (2), 277–295. https://doi.org/10.1016/S0003-2670(01)01555-0 (2002).
Salas, J. D. Analysis and modeling of hydrological time series. In: Maidment DR, editor. Handbook of hydrology. (McGraw-Hill, 1993).
Mohammadzadeh, A. & Massoudzadegan, S. Forecasting daily volatility and value at risk with high frequency data. Dev. Transformation Manage. Q. 8 (27), 63–74 (2015).
Choubin, B. et al. Regional groundwater potential analysis using classification and regression trees. In Spatial modeling in GIS and R for earth and environmental sciences. 485–498 https://doi.org/10.1016/B978-0-12-815226-3.00022-3 (2019).
Monfared, B., Najafabadi, M. & Nafarzadegan, A. R. Flood zoning and identification of effective factors in flood occurrence: A case study of the urban watershed of Bastak. Master’s thesis, Department of Desert Management and Control. (Hormozgan University, 2021).
Eisfelder, C. et al. Cropland and crop type classification with Sentinel-1 and Sentinel-2 time series using Google Earth engine for agricultural monitoring in Ethiopia. Remote Sens. 16 (5), 866. https://doi.org/10.3390/rs16050866 (2024).
Rasti, S., Mahdavifardnh, M., Shaykh Ghaderi, H., Nasiri, A. & Taktaz, N. Z. Improving classification accuracy by combining multi-season images of Sentinel 1 and 2 in order to prepare a landuse map in the cloud space of Google Earth engine (case study: Guilan province). Geogr. Hum. Relations. 5 (3), 357–373. https://doi.org/10.22034/gahr.2022.336692.1696 (2022).
Yoothong, K., Moncharoen, L., Vijarnson, P. & Eswaran, H. Clay mineralogy of Thai soils. Appl. Clay Sci. 11 (5–6), 357–371. https://doi.org/10.1016/S0169-1317(96)00033-6 (1997).
Kariuki, P. C., Woldai, T. & Van Der Meer, F. Effectiveness of spectroscopy in identification of swelling indicator clay minerals. Int. J. Remote Sens. 25 (2), 455–469. https://doi.org/10.1080/0143116031000084314 (2004).
Stenberg, B., Rossel, R. A. V., Mouazen, A. M. & Wetterlind, J. Visible and near infrared spectroscopy in soil science. Adv. Agron. 107, 163–215. https://doi.org/10.1016/S0065-2113(10)07005-7 (2010).
Danoedoro, P. & Zukhrufiyati, A. Integrating spectral indices and geostatistics based on Landsat-8 imagery for surface clay content mapping in Gunung Kidul area, Yogyakarta, Indonesia. In: Proceedings of the 36th Asian Conference on Remote Sensing, Quezon City, Metro Manila, Philippines (2015).
Sabins, F. F. Remote sensing for mineral exploration. Ore Geol. Rev. 14, 157–183. https://doi.org/10.1016/S0169-1368(99)00007-4 (1999).
Khan, N. M., Rastoskuev, V. V., Sato, Y. & Shiozawa, S. Assessment of hydrosaline land degradation by using a simple approach of remote sensing indicators. Agric. Water Manag. 77, 96–109. https://doi.org/10.1016/j.agwat.2004.09.038 (2005).
Asfaw, E., Suryabhagavan, K. V. & Argaw, M. Soil salinity modeling and mapping using remote sensing and GIS: the case of Wonji sugar cane irrigation farm, Ethiopia. J. Saudi Soc. Agric. Sci. 17, 250–258. https://doi.org/10.1016/j.jssas.2016.05.003 (2018).
Caloz, R., Abednego, B. & Collet, C. The Normalisation of a Soil Brightness Index for the Study of Changes in Soil Conditions. In: Proceedings of the 4th International Colloquium on Spectral Signatures of Objects in Remote Sensing. 18–22 (1988).
Bousbih, S. et al. Soil texture Estimation using radar and optical data from Sentinel-1 and Sentinel-2. Remote Sens. 11 (13), 1520. https://doi.org/10.3390/rs11131520 (2019).
Parvaresh, A., Mahdavi, R., Melkian, A., Ismailpour, Y. & Halisaz, A. Prioritizing the flood potential of sub-watersheds in Sokhon. Hormozgan using fuzzy TOPSIS and ELECTRE methods. Doctoral dissertation in Watershed Sciences and Engineering. (Hormozgan University, 2018).
Moore, I. D., Grayson, R. B. & Ladson, A. R. Digital terrain modelling: A review of hydrological, geomorphological and biological applications. Hydrol. Process. 5, 3–30. https://doi.org/10.1002/hyp.3360050103 (1991).
Moore, I. D. & Burch, G. J. Sediment transport capacity of sheet and rill flow: application of unit stream power theory. Water Resour. Res. 22 (8), 1350–1360. https://doi.org/10.1029/WR022i008p01350 (1986).
Nookaratnam, K., Srivastava, Y. K., Venkateswarao, V., Amminedu, E. & Murthy, K. S. R. Check dam positioning by prioritization of micro-watersheds using SYI model and morphometric analysis: remote sensing and GIS perspective. J. Indian Soc. Remote Sens. 33 (1), 25–28. https://doi.org/10.1007/BF02989988 (2005).
Schumm, S. A. Evolution of drainage systems and slopes in badlands at Perth Amboy, New Jersey. Geol. Soc. Am. Bull. 67, 597–646 (1956).
Horton, R. E. Erosional development of streams and their drainage basins; hydrophysical approach to quantitative morphology. Geol. Soc. Am. Bull. 56, 275–370. https://doi.org/10.1130/0016-7606(1945)56[275:EDOSAT]2.0.CO;2 (1945).
Miller, V. C. A quantitative geomorphic study of drainage basin characteristics in the Clinch Mountain area, Virginia and Tennessee. Proj. 389–402 (Columbia University, 1953).
Strahler, A. N. Quantitative analysis of watershed geomorphology. Eos Trans. Am. Geophys. Union. 38, 913–920 (1957).
Strahler, A. N. Quantitative geomorphology of drainage basins and channel networks (Part II). In Handbook of Applied Hydrology 4–39 (McGraw-Hill, 1964).
Schumm, S. A. Evolution of drainage systems and slopes in badlands at Perth Amboy, New Jersey. Geol. Soc. Am. Bull. 67, 597–646. https://doi.org/10.1130/0016-7606(1956)67[597:EODSAS]2.0.CO;2 (1956).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 (7553), 436–444. https://doi.org/10.1038/nature14539 (2015).
Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117. https://doi.org/10.1016/j.neunet.2014.09.003 (2015).
Choi, J. et al. An optimal boosting algorithm based on nonlinear conjugate gradient method. J. Korean Soc. Industr. Appl. Mathemat. 22 (1), 1–13 (2018).
Divakar, K. & Chitharanjan, K. Performance evaluation of credit card fraud transactions using boosting algorithms. Int. J. Electron. Commun. Comput. Eng. IJECCE. 10 (6), 262–270 (2019).
Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55 (1), 119–139. https://doi.org/10.1006/jcss.1997.1504 (1997).
Iban, M. C. & Bilgilioglu, S. S. Snow avalanche susceptibility mapping using novel tree-based machine learning algorithms (XGBoost, NGBoost, and LightGBM) with eXplainable artificial intelligence (XAI) approach. Stoch. Env. Res. Risk Assess. 37 (6), 2243–2270. https://doi.org/10.1007/s00477-023-02392-6 (2023).
Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29 (5), 1189–1232. https://doi.org/10.1214/aos/1013203451 (2001).
Du, J., Fang, J., Xu, W. & Shi, P. Analysis of dry/wet conditions using the standardized precipitation index and its potential usefulness for drought/flood monitoring in Hunan Province, China. Stoch. Environ. Res. Risk Assess. 27, 377–387. https://doi.org/10.1007/s00477-012-0589-6 (2013).
Hajizadeh, H., Farhang, M. & Vafaie Sadr, A. Searching for cosmic strings in Planck data using image processing tools and machine learning. Master’s thesis in Physics. (Shahid Beheshti University, 2020).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794. https://doi.org/10.1145/2939672.2939785 (2016).
Brownlee, J. Imbalanced Classification with Python: Better metrics, Balance Skewed classes, cost-sensitive Learning. (Machine Learning Mastery, 2020).
Truong, V. H., Papazafeiropoulos, G., Vu, Q. V., Pham, V. T. & Kong, Z. Predicting the patch load resistance of stiffened plate girders using machine learning algorithms. Ocean Eng. 240, 109886. https://doi.org/10.1016/j.oceaneng.2021.109886 (2021).
Liang, Y. et al. Product marketing prediction based on XGBoost and LightGBM algorithm. In Proceedings of the 2nd International Conference on Artificial Intelligence and Pattern Recognition 150–153. https://doi.org/10.1145/3357254.3357290 (2019).
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30 (2017).
Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: gradient boosting with categorical features support. Preprint at https://doi.org/10.48550/arXiv.1810.11363 (2018).
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 31 (2018).
Saber, M. et al. Enhancing flood risk assessment through integration of ensemble learning approaches and physical-based hydrological modeling. Geomatics Nat. Hazards Risk. 14 (1), 2203798. https://doi.org/10.1080/19475705.2023.2203798 (2023).
Mahdavi, M. Applied hydrology 8th edn, Vol. 2, 437 (Tehran University, 2013).
Talikhoshk, S., Mohseni Saravi, M., Vafakhah, M. & Khalighi Sigaroodi, S. Comparison of neuro-fuzzy and SCS methods in prioritizing sub-watersheds for watershed management actions: A case study of the Talghan watershed. Scientific-Research J. Rangel. Watershed Manage. 68 (2), 213–225. https://doi.org/10.22059/jrwm.2015.54922 (2015).
Hosseini, Y. Comparison of SCS unit hydrograph and uniform methods in estimating the maximum flood discharge of the Amoughin basin. Hydrogeomorphology 21 (6), 87–107 (2019).
Esfandiari, F., Pourganji, Z., Mostafazadeh, R. & Aghaei, M. Comparison of methods for converting effective precipitation to surface runoff in simulating flood hydrographs in the Naneh Karan basin, Ardabil Province. Hydrogeomorphology 9 (32), 63–86. https://doi.org/10.22034/hyd.2022.50000.1624 (2022).
Soleimani, K., Shokrian, F., Abdoli, S. & Sabri, E. Prioritizing flood risk potential in the Talhar watershed using geographic information systems. Ecohydrology 8 (3), 749–762. https://doi.org/10.22059/ije.2021.324244.1509 (2021).
Haghizadeh, A., Mohammadlou, M. & Noori, F. Simulation of rainfall-runoff processes using artificial neural networks, adaptive neuro-fuzzy systems, and multivariate regression: A case study of the Khorramabad watershed. Hydrogeomorphology 2 (2), 233–243. https://doi.org/10.22059/ije.2015.56243 (2015).
Zema, D. A., Parhizkar, M., Plaza-Alvarez, P. A., Xu, X. & Lucas-Borja, M. E. Using random forest and multiple-regression models to predict changes in surface runoff and soil erosion after prescribed fire. Model. Earth Syst. Environ. 10 (1), 1215–1228. https://doi.org/10.1007/s40808-023-01838-8 (2024).
Hasnaoui, Y. et al. Enhanced machine learning models development for flash flood mapping using Geospatial data. Euro-Mediterranean J. Environ. Integr. 9 (3), 1087–1107 (2024).
Xu, K., Han, Z., Xu, H. & Bin, L. Rapid prediction model for urban floods based on a light gradient boosting machine approach and Hydrological–Hydraulic model. Int. J. Disaster Risk Sci. 14 (1), 79–97. https://doi.org/10.1007/s13753-023-00465-2 (2023).
Hasnaoui, Y. et al. Integrated remote sensing and deep learning models for flash flood detection based on spatio-temporal land use and cover changes in the Mediterranean region. Environ. Model. Assess. 1–23 (2025).
Abedi, R., Costache, R., Shafizadeh-Moghadam, H. & Pham, Q. B. Flash-flood susceptibility mapping based on XGBoost, random forest and boosted regression trees. Geocarto Int. 37 (19), 5479–5496. https://doi.org/10.1080/10106049.2021.1920636 (2022).
Janizadeh, S., Vafakhah, M., Kapelan, Z. & Mobarghaee Dinan, N. Hybrid XGboost model with various bayesian hyperparameter optimization algorithms for flood hazard susceptibility modeling. Geocarto Int. 37 (25), 8273–8292. https://doi.org/10.1080/10106049.2021.1996641 (2022).
Vafakhah, M., Nasiri Khiavi, A., Janizadeh, S. & Ganjkhanlo, H. Evaluating different machine learning algorithms for snow water equivalent prediction. Earth Sci. Inf. 15 (4), 2431–2445. https://doi.org/10.1007/s12145-022-00846-z (2022).
Moharrami, M., Attarchi, S., Gloaguen, R. & Alavipanah, S. K. Integration of Sentinel-1 and Sentinel-2 data for ground truth sample migration for multi-temporal land cover mapping. Remote Sens. 16 (9), 1566. https://doi.org/10.3390/rs16091566 (2024).
Mullissa, A. et al. LUCA: A Sentinel-1 SAR-Based global forest landuse change alert. Remote Sens. 16 (12), 2151. https://doi.org/10.3390/rs16122151 (2024).

Author information

Authors and Affiliations

Department of Watershed Management Engineering, Faculty of Natural Resources, Lorestan University, Khorramabad, Iran
Hafez Mirzapour, Ali Haghizadeh & Mahdi Soleimani Motlagh

Contributions

Conceptualization: AH, HM; Methodology: AH, HM; Formal analysis and investigation: AH, HM, MS; Writing (original draft preparation): AH, HM; Writing (review and editing): AH, HM, MS; Supervision: AH. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ali Haghizadeh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Mirzapour, H., Haghizadeh, A. & Motlagh, M.S. Evaluating machine learning efficiency and accuracy for real time flash flood mapping. Sci Rep 16, 3975 (2026). https://doi.org/10.1038/s41598-025-34037-9

Received: 06 August 2025
Accepted: 24 December 2025
Published: 30 December 2025
Version of record: 29 January 2026
DOI: https://doi.org/10.1038/s41598-025-34037-9