Introduction

Climate change intensifies the hydrological cycle, which in turn leads to more frequent extreme rainfall events, a trend expected to continue in the future1,2,3. These changes in climate patterns directly affect local rainfall trends, thereby influencing streamflow and the frequency of flash floods4,5. Floods, historically a severe natural hazard, pose significant risks to human populations, particularly in densely populated areas6,7,8, which in turn heightens socio-environmental hazards9. Rainfall parameters such as duration, total amount, intensity, and spatio-temporal distribution are the major factors that influence flash flood occurrence10,11,12. These factors also affect the prediction of future rainfall, which is crucial for designing reliable flood management facilities13. To tackle this challenge, Intensity-Duration-Frequency (IDF) curves can be used to estimate future rainfall intensity for a specific return period and duration14,15. For example, one study created IDF curves for Rize Province, Turkey, for the period 2013 to 2099 and emphasized the importance of reevaluating past design storms using IDF curves around the globe16.

Since climate change can significantly impact rainfall patterns and streamflow17,18,19, the projection of IDF curves must be incorporated in rainfall analysis and flood simulations. Analyzing changes in future extreme rainfall or IDF curves relies on the General Circulation Models (GCMs) predictions20,21,22; however, the coarse resolution of these predictions limits their effectiveness for basin-scale applications23. To solve this issue, downscaling techniques can be employed to generate data at a more localized and accurate scale1,24,25. In this context, there are two key groups of downscaling techniques: (a) dynamical downscaling (DD), which employs Regional Climate Models (RCMs) to downscale GCMs variables; (b) Statistical Downscaling (SD), which establishes a statistical or empirical relationship between large-scale atmospheric variables (predictors) and regional variables (predictands)26. The SD approaches are more appropriate methods compared to DD approaches as they offer reliable accuracy, easy implementation, and lower computational cost27,28. Therefore, SD techniques might be useful for studies focusing on downscaling and projecting rainfall at a basin and local scale29. These methods are categorized into two main groups: Perfect Prognosis (PP) and Model Output Statistics (MOS)26,30.

The PP and MOS methods differ inherently in their downscaling process. PP approaches establish the statistical relationship between observed large-scale predictors and an observed local predictand, whereas in MOS the relationship with the observed predictand is developed using GCM-based predictors. Since statistical downscaling using PP approaches relies on the large-scale predictors being forecast accurately, the application of such methods can introduce uncertainties26. As an alternative, MOS can be employed to explicitly account for GCM errors and biases31. This method has proven to be a reliable tool for climate change projections when a substantial database of past patterns is available, and it can offer advantages over PP for future local-scale predictions32,33,34. Additionally, by combining rainfall and temperature with circulation data as predictors, MOS methods can improve the dynamic control of rainfall estimations35.

Due to the complexity of spatio-temporal relationships between climate variables, traditional simple methods cannot effectively capture these interactions, resulting in less reliable downscaling30. To address this issue, recent research has employed machine learning (ML) techniques as MOS-based approaches to improve the accuracy of the downscaled data34,36. For instance, George and Athira (2023) employed a multi-stage stochastic method using the Relevance Vector Machine (RVM) model for rainfall downscaling in the Bharathapuzha River Basin, India37. Additionally, Niazkar et al. (2023) utilized multi-gene genetic programming (MGGP) and artificial neural networks (ANN) to downscale climate change models for predicting temperature in Kohgiluyeh and Boyer-Ahmad Province, Iran38. The aforementioned studies revealed that ML approaches have shown satisfactory results in downscaling climate variables.

Given the diversity of machine learning algorithms, it is essential to evaluate their performance in learning historical patterns of climate variables to identify and select the most suitable algorithm for downscaling. In this context, the H2O-AutoML platform is a valuable tool for automating the data training and validation process. This platform encompasses various tasks, including data pre-processing, feature selection, model selection, and hyperparameter tuning39. By reducing the need for human intervention and expert knowledge, H2O-AutoML offers an efficient and rapid means of establishing relationships between rainfall and climate variables.

After downscaling and predicting future rainfall, a post-processing approach is needed to generate the related IDF curves under different scenarios. In this context, several researchers have constructed IDF curves using various distribution functions, such as the Generalized Extreme Value (GEV) distribution40, the Gumbel distribution41, Log-Pearson Type III42, and the Bayesian beta distribution14.

After establishing future IDF curves and specifying design rainfall, a robust tool must be employed to formulate the hydrological conditions of the study area and determine the characteristics of flash floods under different climate change scenarios. Currently, few numerical models can accurately analyze hydrological processes at the catchment scale under sub-daily time steps, which is a key requirement for simulating the hydrological response of short-time concentration basins7,43,44,45. In this regard, the Soil and Water Assessment Tool (SWAT) model46,47 can provide spatial accuracies at the Hydrological Response Unit (HRU) level, which makes it an appropriate tool for evaluating the impact of land use on the outputs. Additionally, the SWAT model has a wide range of applications in ungauged basins48,49.

Although recent studies have overlooked the analysis of GCM scenarios, distribution functions, and machine learning models in rainfall and runoff estimations, this study introduces a novel and comprehensive framework that, for the first time, integrates the H2O-AutoML platform with the SWAT hydrological model to predict future flash floods. This hierarchical framework significantly enhances the accuracy of future flash flood predictions at each step of the process. In other words, the innovation of this research lies in its strategic organization of hierarchical levels and the application of robust, diverse methodologies, culminating in highly accurate predictions for future flood events. In the first step of the study, Atmospheric-Oceanic General Circulation Models (AOGCMs) are analyzed to identify the most accurate model for representing historical rainfall in North Al-Batinah. Following this, future rainfall scenarios predicted by the selected model are examined and downscaled utilizing H2O-AutoML. The projection of future IDF curves is achieved by fitting different distribution functions. Finally, the specified rainfall is input into the SWAT model to determine future flash flood characteristics.

In the subsequent sections of the study, Sect. “Methodology” details the materials and methods, Sect. “Case study” describes the case study characteristics, and Sect. “Results and discussion” presents the results. Furthermore, the penultimate section explains the novelties and discusses the current research, and Sect. “Discussion” comprehensively explains the conclusions.

Methodology

This section presents a detailed description of the methods employed within the framework. The procedural steps are visually represented in Fig. 1, and the hierarchical structure of the framework is outlined as follows:

  1. Sect. “Climate change projections” provides a comprehensive explanation of AOGCM models, elucidating the information for selecting the most suitable model for subsequent analysis.

  2. Sect. “Machine learning-based rainfall downscaling” provides a thorough explanation of H2O-AutoML, outlining its role in downscaling future rainfall scenarios.

  3. Sect. “Generating future intensity-duration-frequency curves” explains the details of projecting future IDF curves to address the process of determining future storm rainfall patterns.

  4. Sect. “Hydrological simulation” presents a detailed overview of the SWAT model and its components, as well as the process of calculating flash floods.

Fig. 1 Flowchart of the proposed framework.

Climate change projections

Using Atmosphere-Ocean General Circulation Models (AOGCMs) has greatly advanced our ability to predict climate patterns by incorporating elements such as vegetation and atmospheric chemistry. However, it is essential to acknowledge that these models also have their limitations, including uncertainties in parameterization, reliance on historical data, and potential biases in climate projections that can affect their accuracy and reliability23. The Coupled Model Intercomparison Project Phase 6 (CMIP6) multi-model ensemble builds upon the findings of its predecessor, phase 5, and employs a comprehensive approach to analyze the complex mechanisms of the climate system50, and generates climate projections by considering possible future scenarios51. It combines Shared Socio-economic Pathways (SSPs) and Representative Concentration Pathways (RCPs) to provide a detailed understanding of how these factors could impact our planet’s climate. Indeed, examining SSPs scenarios provides a valuable understanding of potential effects arising from significant socio-economic developments52. The quantitative model projections are significantly improved by the inclusion of the SSPs. In this study, four CMIP6 scenarios, including SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5 are utilized to evaluate the future rainfall changes in the study area.

Machine learning-based rainfall downscaling

The relationship between local rainfall and AOGCM variables is often intricate due to the interplay of various climatic factors53,54. Constructing an effective model for future predictions therefore requires careful consideration of the pertinent variables. Because large-scale atmospheric variables are available for projecting future rainfall under diverse scenarios, we develop a model that downscales them to enhance the precision of rainfall and flood simulations. This model employs observed rainfall as the output (predictand) variable and uses large-scale atmospheric variables from historical AOGCM simulations for 1995 to 2014 as inputs (predictors). This research uses AOGCM variables, including specific humidity (hus), average air temperature, and precipitation flux (pr), as predictors. These variables were selected due to their availability in the selected CMIP6 model. After determining the predictors and predictand, a downscaling model was developed using H2O-AutoML. The following section introduces H2O-AutoML and its application in selecting the most reliable machine learning models.

H2O-AutoML application in climate change downscaling

This study aims to use ML models for downscaling future rainfall by applying the H2O-AutoML tool. H2O-AutoML, an open-source machine learning tool, is an effective platform accessible through various programming languages such as Python and R Programming39. This platform is designed for tabular datasets that can support different types of problems, including regression problems and multi-class classification. Also, H2O-AutoML has a fast scoring capability that allows multiple models to make quick predictions55. Another advantage of H2O-AutoML is its provision of Application Programming Interfaces (APIs) in various languages, which facilitates integration and broad application across different fields and problems. Additionally, it demonstrates effective performance in processing and simulating complex datasets.

H2O automates feature scaling, hyperparameter tuning, and optimization through random grid searches, allowing it to generate multiple models that are evaluated based on various performance metrics. Hence, this study develops a time-efficient framework that quickly identifies the optimal model without requiring manual trial and error. For optimizing model performance, hyper-parameters of the model were fine-tuned to minimize prediction errors and ensure a satisfactory performance level. Following the principle of testing on data not previously considered during training, we utilized a K-fold cross-validation methodology.
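The K-fold idea mentioned above can be illustrated with a minimal sketch in plain Python. The function name, the default of five folds, and the seed are illustrative assumptions; H2O-AutoML performs this splitting internally, so this is not the platform's own code.

```python
import random

def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the shuffled samples as evenly as possible across k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid

# Hypothetical dataset of 20 samples split into 5 folds.
splits = list(k_fold_indices(n_samples=20, k=5))
```

Each sample appears in exactly one validation fold, so every observation is used once to score a model that did not see it during training.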

This study uses six learning models for downscaling the rainfall, including Distributed Random Forest (DRF), Gradient Boosting Machine (GBM), stacked ensemble learning, Generalized Linear Model (GLM), Deep Neural Networks (NN), and Extremely Randomized Trees (XRT). More explanations of these models are provided in Table 1.

Table 1 Summary of H2O-AutoML models.

Models evaluation

In this study, 80% of the CMIP6 historical dataset, comprising specific humidity, air temperature, and precipitation flux as inputs and observed rainfall as the output, was used for training the H2O-AutoML models. The machine learning models were trained on the pre-processed dataset using H2O’s training functions, a procedure that iteratively optimizes each model’s parameters. To analyze the performance of the ML models, several evaluation metrics are employed, including Root Mean Squared Logarithmic Error (RMSLE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and mean residual deviance. The remaining 20% of the dataset is allocated for testing and evaluating the models’ performance. Following this procedure, the optimal ML model is selected based on the performance metrics and is then used for downscaling rainfall for the period from 2023 to 2042. The main aim is to develop an ML model that can accurately predict future daily rainfall based on CMIP6 scenarios.
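As a rough illustration of the error metrics listed above, the following sketch computes MSE, RMSE, MAE, and RMSLE for a small set of observed and predicted daily rainfall values. The numbers and the function name are hypothetical and are not results from this study.

```python
import math

def regression_metrics(observed, predicted):
    """Return MSE, RMSE, MAE, and RMSLE for paired observations and predictions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # RMSLE compares log(1 + value), so inputs must be non-negative (true for rainfall).
    msle = sum((math.log1p(p) - math.log1p(o)) ** 2
               for o, p in zip(observed, predicted)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "RMSLE": math.sqrt(msle)}

# Hypothetical daily rainfall (mm): observed vs. downscaled.
observed = [0.0, 2.0, 5.0, 0.0, 1.0]
predicted = [0.5, 1.5, 4.0, 0.0, 2.0]
metrics = regression_metrics(observed, predicted)
```

For a Gaussian response, the mean residual deviance coincides with the MSE, which is why the two are often reported with the same value.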

Generating future intensity-duration-frequency curves

The future IDF curve can be constructed by fitting a Probability Distribution Function (PDF) to extreme rainfall data for different events and durations14. In this process, rainfall intensity over considered durations and return periods can be calculated based on fitted relationships for the period under consideration (2023–2042). Also, IDF curves can be constructed by fitting a probability distribution function to annual maximum rainfall of different durations (e.g., 5 min, 30 min, 1 h, 2 h). This allows for calculating rainfall quantiles corresponding to each return period (e.g., 10, 25, 50, 100 and 200 years).
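To make the link between a fitted distribution, a return period, and an IDF point concrete, the sketch below uses the Gumbel distribution because its quantile function has a closed form. The choice of Gumbel and the parameter values are assumptions for illustration only; the study itself selects distributions per scenario.

```python
import math

def gumbel_quantile(location, scale, return_period):
    """Rainfall depth whose annual exceedance probability is 1 / return_period."""
    non_exceedance = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(non_exceedance))

def intensity_mm_per_h(depth_mm, duration_min):
    """Convert a rainfall depth over a duration into an average intensity."""
    return depth_mm / (duration_min / 60.0)

# Hypothetical Gumbel parameters (mm) for the annual maximum 60-min rainfall.
location, scale = 20.0, 8.0
idf_60min = {
    T: intensity_mm_per_h(gumbel_quantile(location, scale, T), 60)
    for T in (10, 25, 50, 100, 200)
}
```

Repeating the calculation for each duration (with parameters fitted to that duration's annual maxima) yields one IDF curve per return period.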

Selecting the probability distribution that best fits the annual maximum precipitation is paramount, as it can significantly influence the estimated rainfall quantiles for different return periods. In this study, various distribution functions, including Log-normal, Exponential, Gamma, Normal, Exponentiated Weibull, Generalized Extreme Value (GEV), Log-Laplace, generalized normal, inverse Gaussian, Log-Pearson III, and Exponential power, are fitted to the annual maximum daily rainfall under the SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5 scenarios24. The parameters of each distribution are estimated using the Maximum Likelihood Estimation (MLE) approach57. The Hannan–Quinn Information Criterion (HQIC) is then calculated for each distribution function from the log-likelihood and the number of parameters58. HQIC assesses the fit of a distribution function while penalizing model complexity, thereby balancing goodness of fit against the number of parameters. In this sense it is analogous to a likelihood ratio test, in which models are assessed not only on their goodness of fit but also on the number of parameters they include. This supports a more informed selection of a model that is well fitted to the data without being unnecessarily complex. The HQIC value can be estimated from Eq. (1).

$$\text{Minimize}\ \text{HQIC} = -2L(x;\widehat{\Theta}) + 2q\ln\left(\ln\left(n\right)\right)$$
(1)

where \(L(x;\widehat{\Theta})\) denotes the log-likelihood, and \(q\) and \(n\) are the number of parameters and the number of samples, respectively. A lower HQIC value represents a better compromise between goodness of fit and PDF complexity. Accordingly, the PDF with the minimum HQIC is selected as the best-fitting function for the dataset.
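A minimal sketch of how Eq. (1) can be used to rank candidate distributions is given below. It compares only the Normal and Exponential distributions, because their maximum-likelihood estimates have closed forms; the sample values are hypothetical, and a full analysis would fit every candidate distribution with a statistics library.

```python
import math

def hqic(log_likelihood, n_params, n_samples):
    """Hannan-Quinn Information Criterion, following Eq. (1)."""
    return -2.0 * log_likelihood + 2.0 * n_params * math.log(math.log(n_samples))

def normal_loglik(data):
    """Maximized log-likelihood of a Normal distribution (MLE mean and variance)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # MLE variance divides by n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def exponential_loglik(data):
    """Maximized log-likelihood of an Exponential distribution (MLE rate)."""
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data)

# Hypothetical annual maximum daily rainfall series (mm).
annual_max_mm = [12.0, 35.5, 8.2, 60.1, 22.4, 15.0, 41.3, 9.8, 27.6, 18.9]
n = len(annual_max_mm)
scores = {
    "Normal": hqic(normal_loglik(annual_max_mm), 2, n),            # q = 2 parameters
    "Exponential": hqic(exponential_loglik(annual_max_mm), 1, n),  # q = 1 parameter
}
best = min(scores, key=scores.get)  # distribution with the lowest HQIC
```

The penalty term grows with the number of parameters, so a distribution with more parameters is preferred only if it raises the log-likelihood enough to offset that penalty.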

Hydrological simulation

SWAT model

The SWAT model is a continuous, semi-distributed hydrological model that is widely used to assess the effects of diverse management plans on the quality and quantity of water across different scales59,60. The SWAT model consists of various elements, including weather parameters, land cover, soil characteristics, and a crop module. To formulate the hydrological problem, the ArcSWAT tool first divides the study area into distinct subbasins based on a specified area threshold. These subbasins are then separated into Hydrological Response Units (HRUs), which consist of land segments characterized by the same land cover type, slope percentage, and soil class61,62. In the next step, the key outputs of the model, such as runoff and actual evapotranspiration (ETa), are first calculated at the HRU scale. These outputs are then aggregated to the subbasin level and routed to the catchment outlet. The hydrological processes for each HRU are modeled at daily time steps based on a water balance method61.

In this research, the Curve Number (CN) method is applied to partition rainfall between infiltration and runoff. In addition, the Hargreaves method is employed to estimate ETa in the study area63.
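For readers unfamiliar with the CN method, the standard SCS relationship between event rainfall and direct runoff can be sketched as follows. This is the textbook form in millimetres with the commonly used initial abstraction ratio of 0.2; SWAT additionally adjusts the retention parameter with soil moisture, which this sketch omits, and the example values are hypothetical.

```python
def scs_runoff_mm(rainfall_mm, curve_number, ia_ratio=0.2):
    """Direct runoff depth (mm) from the SCS Curve Number method."""
    retention = 25400.0 / curve_number - 254.0   # potential maximum retention S (mm)
    initial_abstraction = ia_ratio * retention    # Ia, commonly taken as 0.2 * S
    if rainfall_mm <= initial_abstraction:
        return 0.0                                # all rainfall is abstracted
    excess = rainfall_mm - initial_abstraction
    return excess ** 2 / (excess + retention)

# Hypothetical 50 mm storm on a catchment with CN = 80.
runoff = scs_runoff_mm(50.0, 80.0)
```

A higher curve number means less retention and therefore more runoff for the same storm, which is why CN is typically among the most sensitive calibration parameters.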

In this study, the SWAT model was executed using the sub-daily runoff module with 1-hour time steps. This time step was selected to accurately capture the fast development of flood events64. For more detailed explanations of the SWAT sub-daily module, refer to previous studies7,65.

Case study

Al-Batinah, an arid region in northeastern Oman, is bordered by the Western Hajar Mountains to the west and the Sea of Oman to the north. The Al-Batinah coastal plain is narrower at its northwest and southeast edges, expanding to its broadest stretch of approximately 50 km in the center. It comprises continuous alluvial fans transporting sediment from the mountains to the coast and plain. This plain is Oman’s second most densely populated area, trailing only the capital, Muscat7. The flat and fertile Al-Batinah coastal zone, covering over 90% of the shoreline, has evolved into a hub for human settlement. Over the past four decades, focused development has propelled various socio-economic activities, including intensive urbanization and the initiation of coastal tourism projects. Significant infrastructure, encompassing main roads, corniches, markets, fishing harbors, and desalination plants, has been established to support and enhance coastal living. This region is prone to flash floods and heavy rainfall, exemplified by significant events such as the floods caused by the 1890 tropical cyclone, Gonu in June 2007, Phet in June 2010, Kyarr in October 2019, and Shaheen between 1 and 4 October 2021, which resulted in substantial economic and human losses66.

In this study, an area encompassing three wadis (Table 2)—Wadi Al-Shafan, Wadi Al-Sarami, and Wadi Al-Sakhin—in North Al-Batinah is analyzed for future flood prediction (Fig. 2).

Table 2 Characteristics of the study area.
Fig. 2 Location of North Al-Batinah, the study area (ArcMap 10.1).

Results and discussion

Selecting a capable AOGCM model

To enhance the reliability of climate change predictions, this section evaluates the performance of various AOGCMs to identify the most suitable model for the study area. Given that different prediction models can perform variably across regions and computational cells, selecting the right model is crucial for minimizing uncertainties and ensuring accurate future predictions1. To this end, an in-depth analysis was conducted to compare the GCM-projected long-term monthly average rainfall for the baseline (historical) period with observed data, which reveals how well each CMIP6 model reproduces historical patterns. The evaluation assessed 18 AOGCMs using several indices, including MSE67, NMSE68, NSE69, AE70, RMSE71, and KGE72. When evaluated against observed rainfall data from 1995 to 2014, the IITM-ESM model emerged as the most accurate in representing historical rainfall. Table 3 summarizes the performance of the different AOGCMs. Consequently, future scenarios are calculated using the IITM-ESM model, which demonstrated superior performance in estimating historical data.
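As one example of the indices above, the Kling-Gupta efficiency (KGE) can be sketched as below and used to rank candidate models against an observed monthly climatology. The formulation is the original three-component KGE; the model names and rainfall values are hypothetical and do not correspond to the models evaluated here.

```python
import math

def kge(observed, simulated):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in simulated) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated)) / n
    r = cov / (std_o * std_s)   # linear correlation
    alpha = std_s / std_o       # variability ratio
    beta = mean_s / mean_o      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical long-term monthly mean rainfall (mm), January to December.
observed = [12.0, 15.0, 20.0, 8.0, 2.0, 1.0, 1.5, 1.0, 0.5, 3.0, 6.0, 10.0]
candidates = {
    "model_A": [1.1 * v for v in observed],   # slight overestimation
    "model_B": [2.0 * v for v in observed],   # strong overestimation
}
ranking = sorted(candidates, key=lambda name: kge(observed, candidates[name]), reverse=True)
```

A KGE of 1 indicates a perfect match; combining several such indices, as done in this section, reduces the risk of selecting a model that performs well on only one aspect of the rainfall signal.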

Table 3 AOGCM models performance for baseline data (1995–2014).

Rainfall downscaling analysis using H2O-AutoML

After selecting the most suitable prediction model, this section focuses on downscaling its projections under four SSP scenarios: SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5. GCMs, while powerful at capturing large-scale climate patterns, often lack the resolution needed to accurately reflect local climate conditions. By refining these projections to a finer scale, we aim to produce more accurate, region-specific climate predictions, which are crucial for assessing the potential impacts of future climate scenarios. To do so, after identifying IITM-ESM as the best model for the study area, its projections are downscaled using H2O-AutoML. This platform trains the models on data including specific humidity (hus), average air temperature, and precipitation flux (pr), and identifies the optimal ML model for downscaling future rainfall predictions. Figure 3 illustrates the performance of ML models within the H2O platform. The results indicate that the Stacked Ensemble algorithm, identified by the model ID ‘StackedEnsemble_AllModels_6’, demonstrated the highest performance during dataset training. This model, which combines all base models (GBM, Deep Learning, and DRF) with a GLM meta-learner and was trained with 5-fold cross-validation, attained an RMSE of 2.281, MSE of 5.202, MAE of 0.362, RMSLE of 0.325, and a mean residual deviance of 5.202. The second-best performing model, ‘GBM_grid_1_model_6’, which is a three-class classification model using a multinomial distribution, achieved RMSE, MSE, MAE, RMSLE, and mean residual deviance of 2.282, 5.207, 0.366, 0.330, and 5.207, respectively.

Fig. 3 Performance of leaderboard machine learning models.

Table 4 shows that in the cross-validation results for H2O-AutoML, fold one exhibits the highest performance during validation. This indicates that models trained on the data subset corresponding to fold one perform more effectively on this specific portion of the dataset.

Table 4 Performance of the H2O models for subsets of the data.

After identifying the optimal machine learning model, the next step is to downscale rainfall projections from 2023 to 2042 using this selected model. To achieve this, CMIP6 projections, including specific humidity, rainfall, and air temperature under the SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5 scenarios, were gathered and input into the ‘StackedEnsemble_AllModels_6’ model. This process aims to generate downscaled and more accurate predictions of future rainfall.

After predicting future rainfall by H2O, the resulting downscaled daily rainfall data provides a detailed view of potential rainfall changes, which can be aggregated to evaluate trends on larger timescales, such as monthly and yearly. Figure 4A illustrates the downscaled monthly and annual rainfall for the study area. Analysis of the future monthly rainfall patterns reveals that June, July, August, and September are projected to experience the least rainfall. In contrast, March is anticipated to have the highest rainfall across all CMIP6 evaluated scenarios. Moreover, as depicted in this figure, future rainfall is generally projected to increase under all SSP scenarios compared to historical levels.
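The aggregation of downscaled daily rainfall to monthly and annual totals can be sketched in a few lines. The synthetic daily series below (1 mm every tenth day) is purely illustrative and is not output from the downscaling model.

```python
from collections import defaultdict
from datetime import date, timedelta

def aggregate_rainfall(daily):
    """Sum (date, rainfall_mm) pairs into monthly and annual totals."""
    monthly = defaultdict(float)
    annual = defaultdict(float)
    for day, rain in daily:
        monthly[(day.year, day.month)] += rain
        annual[day.year] += rain
    return dict(monthly), dict(annual)

# Synthetic daily series for 2023: 1 mm of rain on every tenth day.
start = date(2023, 1, 1)
daily = [(start + timedelta(days=i), 1.0 if i % 10 == 0 else 0.0) for i in range(365)]
monthly_totals, annual_totals = aggregate_rainfall(daily)
```

Averaging the monthly totals over all years of a period gives a long-term monthly climatology that can be compared between the historical and future periods.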

Additionally, an examination of annual rainfall patterns (Fig. 4B) highlights variations among the different scenarios, emphasizing the need for multiple scenarios to thoroughly assess the range of possible future conditions. Incorporating these diverse scenarios into future planning is essential to develop robust strategies for managing the potential impacts of climate change on rainfall patterns in the region.

Fig. 4 (A) Comparison of monthly aggregated rainfall between historical data (1995–2014) and future SSP scenarios (2023–2042). (B) Aggregated annual rainfall for different SSP scenarios (2023–2042).

Projection of future IDF

After selecting and downscaling the most appropriate climate model using the best-performing machine learning algorithm, the next critical step involves formulating IDF curves for the study area. By deriving these curves under each evaluated SSP scenario, we aim to gain a deeper understanding of how extreme rainfall events might change in the future, which is crucial for designing infrastructure and planning flood mitigation strategies.

Initially, different probability distribution functions (PDFs) were fitted to the downscaled daily rainfall data to capture the variability and extremes associated with each scenario. The parameters of these PDFs were estimated using the Maximum Likelihood Estimation (MLE) method, a statistical approach that identifies the parameter values that maximize the likelihood function, so that each fitted function matches the data as closely as its form allows. The performance of the PDFs was then evaluated using the Hannan–Quinn Information Criterion (HQIC), a metric that balances model fit against complexity. HQIC penalizes more complex models, favoring those that achieve a good fit without unnecessary parameters, and a lower HQIC value indicates a more preferred model. As illustrated in Fig. 5, the results indicate that the Weibull distribution best fits SSP1-2.6, Log-Pearson III is most suitable for SSP2-4.5, Gamma for SSP3-7.0, and Exponential power for SSP5-8.5. These are therefore considered the most appropriate functions for generating IDF curves under the respective scenarios.

Subsequently, future IDF curves were generated by calculating the empirical quantile for a given return period and normalizing it by the corresponding rainfall duration. Figure 6 displays the IDF curves for each SSP scenario. Visual analysis of these curves indicates notably higher rainfall intensity under the SSP2-4.5 and SSP5-8.5 scenarios than under SSP1-2.6 and SSP3-7.0. Interestingly, the IDF curves for SSP2-4.5 and SSP5-8.5 are closely aligned, while SSP1-2.6 and SSP3-7.0 exhibit similar, but comparatively lower, rainfall intensities.

Fig. 5 Comparison of distribution functions for fitting future daily rainfall data.

Fig. 6 Future IDF curves for rainfall with 5, 10, 25, 50, 100, and 200-year return periods.

Simulation of future flash flood using SWAT model

Following the derivation of the IDF curves, this section focuses on formulating the hydrological conditions of the study area and simulating flash floods under various SSP scenarios. To understand the potential impacts of changing rainfall patterns on flood characteristics, a reliable modeling tool is essential. In this context, the SWAT model is employed to evaluate these effects.

Sensitivity analysis and calibration of the SWAT model are performed using the SUFI-2 algorithm, a widely used method for model calibration integrated into the SWAT-CUP73. In the first step, sensitivity analysis is performed to identify the parameters that significantly impact the model outputs, including baseflow and streamflow. The key sensitive factors are the SCS runoff curve number, Manning’s value for the main channel, and soil moisture parameters. Understanding these parameters would be crucial for accurate simulation and reliable results. Other studies have also highlighted the model’s sensitivity to channel flow parameters74. However, in the case study affected by the flash flood, base flow parameters do not show sensitivity to calibration. In this regard, the model was calibrated using the aforementioned parameters, focusing on streamflow (cms).

In the next step, calibration was carried out at a daily time step for three stations, DS1, DS2, and DS3, from 2018 to 2020. It is important to note that the lack of suitable hourly data and flood events posed significant challenges during the calibration process. The coefficient of determination (R2) and Nash-Sutcliffe efficiency (NS) were used to assess the accuracy of the simulated discharge at each stage. The results were acceptable and met the recommended criteria75. The model was then evaluated for a severe flood event, and station DS1, located upstream of the urban area, was selected as the evaluation station (Table 5).
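The two goodness-of-fit statistics used for calibration can be sketched as follows. The discharge values are hypothetical and chosen to show that R2 alone does not detect a constant bias, whereas the Nash-Sutcliffe efficiency does.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_o = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - residual / variance

def r_squared(observed, simulated):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)

# Hypothetical daily discharge (m3/s): the simulation is offset by a constant 1 m3/s.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [2.0, 3.0, 4.0, 5.0, 6.0]
ns = nash_sutcliffe(observed, simulated)
r2 = r_squared(observed, simulated)
```

In this example R2 equals 1 while NS is only 0.5, which is one reason the two statistics are usually reported together.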

Table 5 SWAT model performance statistics for long-term calibration and flood event validation.

After ensuring the model’s reliability for the study area, the SWAT model is driven by design rainfall derived from the future IDF curves, specifically for durations of 15 and 60 min and return periods of 100 and 200 years, to predict future flood occurrences.

To initiate this process, the rainfall intensity is calculated using the IDF curves under the specified conditions. Subsequently, two scenarios—pessimistic and optimistic—are selected based on varying future rainfall conditions, as detailed in Table 6.

Next, the total future rainfall is determined from the calculated rainfall intensity. Following previous research44, the temporal pattern of future rainfall is assumed to mirror that of historical events.

To disaggregate future rainfall, a previously developed methodology76 is applied. First, the rainfall within each time step of a historical event is divided by the cumulative rainfall of that event. This yields a coefficient for each time step that represents the fraction of the total rainfall occurring during that interval. The total future rainfall is then multiplied by these coefficients to establish the future rainfall pattern, so that the temporal distribution of future rainfall follows historical rainfall patterns. For rainfall with a duration of 60 min under the specified return periods, the coefficients are derived from a historical event of the same duration (60 min). In this way, the methodology combines the projected intensity with historical temporal patterns to provide the rainfall input for flood simulation.
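The disaggregation step can be sketched as below, assuming each coefficient is the fraction of the historical event total that falls within a time step. The design intensity, the 15-min time step, and the historical depths are hypothetical values chosen for illustration.

```python
def disaggregate(design_intensity_mm_h, duration_min, historical_steps_mm):
    """Distribute a design rainfall total over time steps using a historical pattern."""
    total_mm = design_intensity_mm_h * duration_min / 60.0   # intensity -> depth
    historical_total = sum(historical_steps_mm)
    # Fraction of the historical event that fell in each time step.
    fractions = [step / historical_total for step in historical_steps_mm]
    return [total_mm * fraction for fraction in fractions]

# Hypothetical 60-min historical event recorded in four 15-min steps (mm).
historical_event = [2.0, 6.0, 8.0, 4.0]
# Hypothetical design intensity of 40 mm/h for a 60-min duration.
future_pattern = disaggregate(40.0, 60, historical_event)
```

Because the fractions sum to one, the disaggregated pattern preserves the design rainfall total while inheriting the shape of the historical hyetograph.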

Table 6 Future rainfall intensity under different scenarios.

Future flood analysis

With the disaggregated rainfall patterns identified, this section presents the flash flood predictions for North Al-Batinah derived from the SWAT model simulations. The Digital Elevation Model (DEM) and stream networks allowed North Al-Batinah to be subdivided into 154 sub-basins. Using the European Space Agency (ESA) land use map together with the Food and Agriculture Organization (FAO) soil map, ArcSWAT generated 493 Hydrological Response Units (HRUs). The resulting sub-basins, stream network, and locations of discharge stations are depicted in Fig. 7.

Following the calibration of the SWAT model, simulations were performed for 1-hour rainfall events corresponding to 100- and 200-year return periods, examining both optimistic and pessimistic scenarios (Fig. 8). Specifically, Fig. 8A and B illustrate the hydrograph for the anticipated rainfall event with a 100-year return period under both optimistic and pessimistic outcomes, while Fig. 8C and D present the hydrograph for future rainfall events with 200-year return periods.

The findings indicate that, under the pessimistic scenario, flood discharge upstream of North Al-Batinah's urbanized area can reach 20.33 and 20.70 m³/s during rainfall events with 100- and 200-year return periods, respectively. Under the optimistic scenario, the corresponding discharges are 16.56 and 16.85 m³/s.
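To put these discharges in relative terms, the short Python sketch below computes the percentage difference between the pessimistic and optimistic scenarios for each return period. The discharge values are those reported above; the dictionary layout and the helper name are illustrative assumptions.

```python
# Reported discharges at DS1 (m^3/s), keyed by return period in years.
discharge = {
    100: {"optimistic": 16.56, "pessimistic": 20.33},
    200: {"optimistic": 16.85, "pessimistic": 20.70},
}


def percent_increase(base, value):
    """Relative increase of value over base, in percent."""
    return 100.0 * (value - base) / base


for years, flows in discharge.items():
    gap = percent_increase(flows["optimistic"], flows["pessimistic"])
    print(f"{years}-year event: pessimistic exceeds optimistic by {gap:.1f}%")
```

For both return periods the pessimistic discharge is roughly 23% higher than the optimistic one, whereas moving from the 100-year to the 200-year event raises the discharge by only about 2% within either scenario.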

Incorporating both optimistic and pessimistic scenarios in flood simulations strengthens the robustness of our analytical framework. This dual approach allows for thoroughly examining boundary conditions, which is critical for effective flood management strategies. By evaluating the extremes of rainfall events under various scenarios, we can identify potential risks and prepare for a range of possible outcomes.

Fig. 7 North Al-Batinah basin location with its sub-basins, stream network, outlets, and discharge stations.

Fig. 8 Hydrograph of the future flood considering rainfall pattern in DS1 discharge station.

Discussion

This study introduces a comprehensive and novel framework for flash flood prediction that links climate model selection, rainfall downscaling, IDF curve construction, and hydrological simulation. Because uncertainty enters at every stage of flash flood forecasting, each step should rely on tools that keep errors as small as possible. The framework is designed to remain reliable even when different models are substituted at each stage in other case studies. In the first step, the best-performing GCM was selected from eighteen CMIP6 models, which reduces the risk of relying on an unsuitable climate model. The H2O-AutoML platform, which offers automated hyperparameter optimization across several ML models, was then used to downscale future rainfall as accurately as possible. Next, to construct the most appropriate IDF curves, forty distribution functions were fitted to the future daily rainfall data under each SSP scenario. Finally, the SWAT sub-daily module was used to simulate future flash floods in the study area. Overall, the framework aims to improve the accuracy and adaptability of flood forecasting, particularly in regions facing the increasing impacts of climate change.

To highlight the novelty of each stage of the current study, the details are given below:

Stage 1: Selecting the most appropriate CMIP6 model.

Several studies have used GCMs to evaluate future rainfall and predict floods. For instance, one study77 used certain CMIP6 models for urban flood analysis but did not assess how well they reproduced historical rainfall in the case study area. Similarly, others4,78 employed CMIP5 for future rainfall estimation, which overlooks the socio-economic considerations built into the CMIP6 scenarios, and did not evaluate how the selected models performed against historical rainfall. Both omissions can introduce uncertainty at several stages of flash flood prediction and may undermine the design of flood control strategies and related decision-making.

Stage 2: Using H2O-AutoML for rainfall downscaling.

In this study, H2O-AutoML was used for rainfall downscaling for the first time, but the novelty of this stage extends beyond that. Compared to the studies by79,80,81, which also used ML models to downscale CMIP6-based scenarios for predicting future rainfall, the proposed model demonstrated superior performance. This may be attributed to the use of less robust ML models in those studies or to the omission of the model selection process described in Stage 1.

Stage 3: Employing a wide range of distribution functions for constructing IDF curves.

Although several studies have constructed IDF curves for future rainfall, they commonly adopt simplifying assumptions that can compromise accuracy. For example, some studies82,83 used the GEV distribution function to create IDF curves under CMIP6 and CMIP5 scenarios without considering that other functions might fit future rainfall data better. Our results show that other distribution functions can indeed fit future rainfall data better, as confirmed by the HQIC. Table 7 summarizes the studies discussed here and contrasts their approaches with the framework introduced in the current research.

Table 7 Summary of discussed studies.
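The HQIC-based comparison of distribution functions mentioned in Stage 3 can be illustrated with a minimal Python sketch. It uses the standard definition HQIC = -2 ln L + 2k ln(ln n), where L is the maximized likelihood, k the number of fitted parameters, and n the sample size. Only two candidate distributions (exponential and normal, both with closed-form maximum-likelihood fits) are compared here for brevity, whereas the study fitted forty; the function names and the sample data are hypothetical.

```python
import math


def hqic(log_likelihood, n_params, n_obs):
    """Hannan-Quinn information criterion; lower values indicate a better fit."""
    return -2.0 * log_likelihood + 2.0 * n_params * math.log(math.log(n_obs))


def exponential_loglik(data):
    """Maximized log-likelihood of an exponential fit (rate = 1 / mean)."""
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data)


def normal_loglik(data):
    """Maximized log-likelihood of a normal fit (MLE mean and variance)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def rank_by_hqic(data):
    """Return (name, HQIC) pairs sorted from best to worst fit."""
    n = len(data)
    candidates = {
        "exponential": (exponential_loglik(data), 1),
        "normal": (normal_loglik(data), 2),
    }
    scores = {name: hqic(ll, k, n) for name, (ll, k) in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical rainfall sample (mm), for illustration only.
sample = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8]
for name, score in rank_by_hqic(sample):
    print(f"{name}: HQIC = {score:.2f}")
```

The same ranking logic extends to any number of candidate distributions, provided each one supplies its maximized log-likelihood and parameter count.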

Conclusion

In this study, we employed a hierarchical statistical framework to forecast future floods in North Al-Batinah, Oman, an arid region significantly impacted by climate change. The framework is structured to make the flood prediction as accurate as possible at each stage. The initial step involved selecting the most suitable AOGCM based on various evaluation criteria. The findings revealed that the IITM-ESM model best reproduces historical rainfall patterns and should therefore be more reliable for future rainfall predictions. The subsequent analysis was therefore performed using the IITM-ESM model under four SSP scenarios: SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5.

In the next step, to downscale the coarse-resolution projections from the selected climate model, different machine-learning models were evaluated through the H2O-AutoML platform to identify the most appropriate model for the study area. The stacked ensemble algorithm performed best in reproducing historical rainfall and was therefore employed to forecast future daily rainfall from 2023 to 2042. Following this, various distribution functions were fitted to identify the most effective one for projecting future IDF curves.

The SWAT model sensitivity analysis and calibration were followed by an examination of future rainfall scenarios for different return periods and durations to assess the potential for flash floods. The calibrated SWAT model revealed that, under the pessimistic scenario, the flood flow at the entrance of the urbanized area in North Al-Batinah could reach 20.33 and 20.70 m³/s during 100- and 200-year return period rainfall events, respectively. Given the proximity of the urban area of North Al-Batinah to the ocean, this situation poses a significant risk of human and economic loss.

This study integrates diverse tools and methods to present a comprehensive framework for flash flood prediction in an arid region. The approach incorporates data from 18 AOGCMs, employs six machine learning models, and explores 40 distribution functions. By combining these elements, the study aims to enhance the accuracy and reliability of flash flood predictions.

Because GCMs reproduce historical data with varying degrees of accuracy, the performance of this framework may differ across study areas. A lack of access to data from regions with different climatic characteristics limited our ability to test the framework more widely. The proposed hierarchical framework should therefore be adapted to each case study to account for the unique characteristics of the study area.

Moreover, this research contributes to two of the Sustainable Development Goals (SDGs). First, the work supports Climate Action (SDG 13) by improving our understanding of future rainfall patterns and their impacts on flood risks under different climate change scenarios. By identifying the best-suited models and downscaling techniques, this study enhances predictive capabilities, helping communities prepare for extreme weather events and mitigate the potential impacts of climate change. Second, the development of accurate IDF curves and the simulation of hydrological conditions are directly relevant to Sustainable Cities and Communities (SDG 11). By providing insights into future flood risks, this research supports the design and implementation of resilient infrastructure, so that urban and rural communities are better equipped to handle extreme weather events. This contribution supports sustainable urban planning and disaster risk reduction and promotes the resilience of cities and communities in the face of climate-related hazards.

Considering the flood vulnerability of the downstream part of North Al-Batinah, decision-makers involved in flood management in Oman need to address this critical issue and implement strategies to mitigate the impact of future floods and rainfall events in the region. We recommend that future studies evaluate flood management practices for the region, such as detention dams, to reduce flood damage and promote sustainable development in the downstream areas of North Al-Batinah. Such measures and planning can be crucial in minimizing the potential consequences of flash floods.