Abstract
Traditional flood prediction approaches either rely on numerical models, which are accurate but computationally intensive, or machine learning models, which are faster but limited by data availability. To address these limitations, we developed a Prediction-to-Map (P2M) framework that combines the strengths of both methods. Trained on observed data and numerical model outputs, P2M delivers rapid, accurate spatial flood predictions. Applied to predict the flood event during Hurricane Nicholas (2021) near Galveston Bay, Texas, P2M produced flood depth maps that closely matched numerical simulations. Comparisons with observed data suggested P2M’s superior performance, as evidenced by higher R-squared and lower RMSE than the numerical model. Moreover, P2M demonstrated remarkable computational efficiency, producing a flood depth map with a 115,200-fold increase in speed. By achieving both faster speed and higher accuracy, this framework overcomes the trade-off in common surrogate models, providing a useful tool for rapid spatial flood prediction.
Similar content being viewed by others
Introduction
Flooding is one of the most destructive natural disasters, posing significant threats to lives, property, and economies globally. It can occur in inland areas due to heavy rainfall and in coastal regions due to storm surges. Compound flooding—resulting from the co-occurrence of these two types of flooding—can be much more widespread and dangerous than single-factor flooding1,2 and has received increased attention over the past decade3. As global exposure to floods is projected to rise4, the demand for rapid and accurate flood prediction becomes crucial for effective decision-making and risk mitigation.
Flood prediction approaches can be broadly categorized into two types. First, process-based numerical models are used to simulate the future situation of the areas of interest. Hydrological and ocean models have been employed to predict inland flooding (e.g., refs. 5,6,7,8) and coastal storm surges (e.g., refs. 9,10,11,12), respectively. Recent advances also include the dynamical coupling between hydrological and ocean models, which not only increases the forecast accuracy but also enables the diagnosis of contributions from different processes13,14 (e.g., surge, fluvial/pluvial, precipitation). However, numerical models, which usually feature high spatial and temporal resolutions, make predictions by solving governing differential equations and are generally computationally intensive and time-consuming.
The second approach involves data-driven machine learning models, which are trained on large datasets to identify patterns and relationships (temporal or spatial) that can be used for flood prediction. Machine learning models are generally less computationally demanding than numerical models and have been widely applied in flood prediction15. Traditional machine learning models rely on real observed data for training and have shown promising results in various flood-prone regions. For instance, machine learning algorithms like support vector regression (SVR), multilayer perceptron (MLP), and long short-term memory (LSTM) networks have been applied for real-time flood stage forecasting based solely on observed water stages, avoiding the need for complex physical models16. Sharma and Kumari17 demonstrated that hybrid models combining convolutional neural networks (CNN) with random forest (RF) and SVR improved forecasting accuracy compared to standalone models. Shi et al.18 showed that their observed data-driven machine learning model achieved comparable or better error rates than physics-based models, with significant computational speedups. However, these models can only predict floods at locations where observed data are available, limiting their applicability in areas without sufficient historical data or sensor coverage. As a result, generating a comprehensive spatial flood prediction using these models is challenging.
One way to address the limitations of observation point–driven machine learning models in spatial flood prediction is to train the models using synthetic datasets generated by numerical simulations. These models, often referred to as surrogate models or emulation models, aim to emulate the outputs of complex physics-based models by learning the input-output relationships from the numerical models and using this knowledge to make predictions based on new input data19. Various machine-learning techniques have been employed to build surrogate models. For example, RF algorithms have been successfully used to approximate flood extent and depth based on numerical model outputs20,21. K-nearest neighbor (KNN) methods have also been explored for rapid flood mapping tasks22. Artificial neural networks (ANN) have demonstrated strong predictive performance when trained on simulation-derived flood scenarios23. CNN, which are particularly adept at capturing spatial patterns, have been utilized for pixel-wise flood extent prediction24, while U-Net architectures have proven effective for high-resolution flood inundation mapping by capturing both global and local spatial features25,26. Additionally, recurrent models, such as LSTM networks, have shown promise in modeling the temporal dynamics of flooding processes based on numerical output data27,28.
In addition to conventional surrogate models, hybrid approaches have been developed to integrate machine learning techniques with numerical or conceptual models. For instance, Fraehr et al.29 introduced a hybrid surrogate model that combines a low-resolution hydrodynamic (low-fidelity) model with a Sparse Gaussian Process (GP) model to downscale predictions to high-resolution flood inundation extents. By applying Empirical Orthogonal Function (EOF) analysis, they reduced the dimensionality of the flood data, enabling the Sparse GP model to learn the relationship between low- and high-fidelity simulations efficiently. This method significantly reduced computational costs while maintaining high accuracy in flood extent predictions. Li and Willems30 proposed a hybrid model for urban pluvial flood prediction. Their approach integrates a lumped conceptual model, configured using graph theory to represent the sewer system’s topology, with logistic regression models calibrated to the hydrological outputs and observed flooding occurrences. This combination allows for fast, probabilistic flood predictions at specific urban locations, achieving up to 86% accuracy with a 96% reduction in computational time compared to traditional hydrodynamic models. Because these hybrid models were trained on synthetic datasets, they can also be categorized as surrogate models.
The accuracy of surrogate models’ predictions highly depends on the performance of the underlying numerical models. This dependence arises from the fact that only numerical models can produce sufficient spatial training data. However, multiple sources of uncertainty can affect the accuracy of numerical models31. Although a well-calibrated numerical model can accurately simulate a specific flood event, its performance can vary across different events, introducing additional uncertainties into the models that depend on it. Additionally, although these approaches are designed to emulate the results of numerical models, they are unable to replicate the results perfectly due to the introduction of uncertainties during the emulation process. As a result, these surrogate models achieve a faster prediction speed at the cost of accuracy, reflecting a trade-off between speed and accuracy. This limitation arises from their heavy reliance on numerical model outputs without sufficiently incorporating observed data into the training and predicting processes.
In this study, we proposed a Prediction-to-Map (P2M) framework aiming at producing rapid and accurate spatial flood predictions. Our framework identified several locations with available observed water depth/level time series as control points. A machine learning model, trained exclusively on the observed data, was then used to predict flood depths at these control points. Subsequently, another machine learning model was trained on the outputs of a numerical model to map the spatial flood depth based on predictions at the control points. Because the prediction process was based on observed data while the mapping process relied on a numerical model, the framework’s accuracy depends partly on how well the machine learning models learn from the data and partly on the numerical model’s performance. This approach leverages the strengths of machine learning models (learning from observed data) and numerical models (providing spatial distribution), offering a promising strategy to overcome the trade-off between speed and accuracy in flood predictions by achieving both faster results and higher accuracy.
The Prediction-to-Map (P2M) framework (hereafter referred to as P2M) developed in this study generates spatial water depth distribution within a prediction area through two major procedures: predicting and mapping (Fig. 1). First, observed data were used to predict the water depths at several locations where long term water depth time series are available. These locations were defined as control points in this study. This prediction model was based on a machine learning model trained exclusively on the observed data (Box ① in Fig. 1). Second, a mapping model produced the spatial water depth distribution based on the water depth predictions at the control points. Specifically, the mapping model calculated the water depth at every pixel of the prediction area. This mapping model was based on another machine learning model trained on the outputs of a numerical model (Box ② in Fig. 1). It built the relationship between the water depths at the control points and that at individual grid cells of the numerical model. Thus, the pixel size of the predicted flood depth map was determined by the grid spacing of the numerical model’s domain. All three component models, including the prediction model, numerical model and mapping model can be specialized to adapt to various applications across different study sites. In this study, the prediction model was implemented using Long Short-Term Memory (LSTM) networks, the numerical model was our newly developed dynamically coupled hydrological-ocean model13,14, and the mapping model was based on several regression models. Each component model within the framework is described in detail in “Methods” section.
Workflow of the Prediction-to-Map (P2M) Framework.
P2M was designed to predict inland flooding, storm surge, and their co-occurrence as compound flooding. The selected study area was Galveston Bay, TX, a joint region among the San Jacinto Watershed, Galveston Bay-Sabine Lake Watershed, Lower Trinity Watershed, and the Northern Gulf of Mexico (now Gulf of America, Fig. 2). This area receives freshwater inputs from the watersheds and saltwater inputs from the coastal ocean. The main sources of freshwater inputs are Buffalo Bayou, San Jacinto River, and Trinity River, with average streamflows of 51 m3/s, 48 m3/s, and 216 m3/s, respectively (calculated from the discharge data at USGS gages 08074000, 08072050, and 08067250). During flooding events, overbank water can contribute additional freshwater via overland processes. Galveston Bay exchanges water with the open ocean through the Galveston Bay Entrance, an outlet bounded by barrier islands. This outlet is dominated by diurnal tides with a tidal range of 0.36 m (data from NOAA station 8771341). The region experiences tropical storms about once every three years32, which can generate both inland flooding and coastal storm surges. During slow-moving or stalling tropical cyclones33, these two types of flooding can co-occur, leading to compound flooding. Hurricane Harvey (2017) was a prototypical example of compound flooding, which caused catastrophic flooding in the Galveston Bay area34. The frequent tropical storm activity highlights the importance of rapid and accurate flood prediction for timely preparation and mitigation measures in this area.
a Observed data sources and model domains. Blue squares, yellow triangles, and red dots indicate the locations of observed water level, discharge, and wind data, respectively. The black dashed box represents the numerical model’s domain. The black curve outlines the area where spatial flood predictions were performed (prediction area). b Zoom-in view of the prediction area.
Results
Flood depth map prediction
We applied P2M to generate the spatial flood depth distribution (flood depth map) during Hurricane Nicholas (2021) through two steps. First, the prediction model produced hourly water depth predictions at the eight control points (blue squares in Fig. 2 and Fig. S1) with lead times from 1 to 6 h. Second, the mapping model generated hourly flood depth maps based on the predicted water depths at the control points. The extent of the flood depth maps was confined to the region surrounded by the control points to maximize the reliability of the predicted water depths (outlined by the black curve in Fig. 2). We also ran the numerical model for the same period for comparative purposes.
Figure 3 shows the peak flood depth distributions during Hurricane Nicholas (2021) produced by P2M and the numerical model. Three major areas experienced flooding: Buffalo Bayou, Greens Bayou, and several bays upstream of Galveston Bay (labeled in Fig. 3a). The flood depth maps generated by P2M closely mirrored the patterns simulated by the numerical model, with two notable differences. First, P2M predicted smaller flood depths in the bays compared to the numerical model. Second, P2M generated greater flood depths in Greens Bayou, leading to the prediction of flooding in small creeks along Greens Bayou that the numerical model did not simulate.
a Map generated by the numerical model. b–g Maps generated by P2M with 1-h through 6-h lead times.
Figure 4 presents the flood depth peak differences between the results of P2M and the numerical model. In Buffalo Bayou and the bays upstream of Galveston Bay, the differences were negative, with average values ranging from –0.43 m to –0.52 m for different lead times (Table S1). These differences suggest that P2M tended to predict lower flood depths in these areas compared to the numerical model. However, these differences were not significant, as they were calculated at the flood peak when water depths were at their highest during the event. The average flood depth peak in these areas was 6.5 m, and thus, the relative error was about 7%. In contrast, along Greens Bayou, water level differences were positive, with average values ranging from 1.09 m to 1.45 m for different lead times. The relative error was about 20%, given the average flood depth peak was 6.3 m. These relatively large differences could be attributed to the uncertainties in the prediction model. The validation results indicate relatively large errors in the predictions at C2 compared to other control points (Fig. S3). Control point C2 provided the upstream information for Greens Bayou and thus was crucial for the predicted flood depth map within this area. Any errors in its predictions were directly passed to the mapping model. The mapping model, in turn, processed this inaccurate input and produced a flawed spatial distribution of flood depths near C2. This demonstrates that the uncertainties in the prediction model could cascade to the mapping model and affect the flood depth map generated (Fig. S9). These uncertainties highlight the important role of control points in P2M. To improve accuracy, it is essential to have control points distributed around the periphery of the prediction area boundary, ensuring comprehensive spatial coverage (see details in supplementary information Table S6). In this study, control points were located at each key river inflow within the prediction area (Fig. S3), which enhanced the reliability of the predicted flood depth map.
a–f Show results from P2M predictions with 1-h through 6-h lead times, respectively.
To assess the overall flood prediction performance of P2M compared to the numerical model, we calculated several evaluation metrics based on the results of both approaches, including the Nash–Sutcliffe Model Efficiency (NSE), Root Mean Squared Error (RMSE), averaged flood peak over the prediction area, and the flooded area for flood depths exceeding several thresholds. Table 1 summarizes these metrics calculated over the entire prediction area. The NSE values were consistently above 0.98 for all lead times, indicating that the results of P2M and the numerical model closely match each other. The RMSE ranged from 0.26 to 0.29 m across six lead times. The average flood peak simulated by the numerical model was 2.00 m, while P2M predicted a range of 2.15 to 2.24 m, resulting in a difference of 0.15–0.24 m between the two methods. This difference and the RMSE correspond to approximately 7.5–14% of the average flood peak, indicating that the results from both methods were of the same magnitude, and the overall difference was not significant. For the flooded area, we compared the regions where the flood depth exceeded several thresholds. For flood depth greater than 0.1 m, P2M predicted a smaller flooded area compared to the numerical model, primarily due to the underestimation of flood depth in Buffalo Bayou and the bays. For the remaining thresholds, P2M consistently predicted smaller flooded areas than the numerical model, though the differences were less pronounced. Overall, these metrics demonstrate that P2M effectively generated flood depth maps that closely matched the patterns and magnitudes of the numerical model’s simulation.
Predictive accuracy
As water level/depth observation was only available at the control points, directly comparing the accuracy of flood depth map predictions generated by P2M and those simulated by the numerical model is challenging. To address this, we selected two control points C6 and C7, where observed data were available, and used the remaining seven control points as predictors to predict the water depths at these two points. The two selected points were located within the area confined by other control points, making them representative of the actual mapping process. This method allowed us to compare the water level time series produced by P2M and the numerical model against the observations to determine which method offers a more accurate representation.
In this study, high accuracy is defined as a strong agreement between predicted and observed data quantified by a higher R-squared (R²) value, and a low predictive error measured by Root Mean Squared Error (RMSE). Water depth predictions were made with six lead times (1 h through 6 h). The prediction model’s accuracy was highest at the 1-hour lead time, gradually decreasing as the lead time increased (Fig. S3, Table S4). To provide an overall accuracy assessment of P2M, we averaged the six predictions and compared them with the numerical model’s simulations. In practical forecasting, these six predictions could also serve as an ensemble set, potentially reducing the uncertainties in the framework. Fig 5 shows the observed, P2M-predicted, and numerical model-simulated water level time series during Hurricane Nicholas. We plotted P2M’s results with all six lead times (ensemble members) and their mean value (ensemble mean). The analysis was mainly conducted by comparing the ensemble mean with the numerical model’s simulation.
The observed (black), P2M-produced (gray), averaged P2M-produced (red) and numerical model-simulated (blue) water level time series during Hurricane Nicholas at control points C6 (a) and C7 (b).
At control point C6 (Fig. 5a), the numerical model performed better in reproducing the peak value of the flood, as P2M underestimated this peak. However, P2M better captured the timing of the flood peak. During the period just before and after the peak, the numerical model overestimated the flood level, while P2M accurately reproduced the surge and recession of the stormwater. Under normal conditions, the numerical model generally captured the high tidal levels but overestimated the low tides. Although P2M produced accurate tidal levels, it also exhibited a phase shift. The calculated metrics indicate that P2M outperformed the numerical model, with an R-squared value of 0.06 higher and an RMSE of 0.1 m lower.
At control point C7 (Fig. 5b), the numerical model overestimated the water levels during the flooding, while P2M underestimated them. The model-observation comparison suggests a better performance of P2M in terms of predicting flooding at this point. Under normal conditions, both methods behaved similarly to those at C6: the numerical model overestimated low tidal levels, while P2M showed a phase shift. The calculated metrics at this point indicate that both methods had the same R-squared value, but P2M’s RMSE was 0.09 m lower than that of the numerical model.
These two control points are situated in different geographic locations, with C6 located inland, primarily influenced by inflows from C3 through C5, and C7 positioned in upper Galveston Bay, receiving water from both inland sources and the ocean (Fig. S3c). The validation results at these two locations demonstrate P2M’s accuracy in predicting flood depth under varying conditions, further confirming the reliability of the flood depth maps it generated.
Computational efficiency
Table 2 summarizes the computational time of P2M and the numerical model. P2M’s training process took 720 h with 240 CPU cores to train the prediction model. The training time for the mapping model was about 15 min with a single CPU core, which was negligible compared to the extensive time required for the prediction model’s training. The numerical model did not require a training process, and thus, no training time was provided. However, it’s important to note that building a numerical model involves significant effort, including creating the computational domain, preparing input files for forcing and boundary conditions and calibrating and validating the model. The time required for these efforts is difficult to quantify as CPU hours and thus is not summarized in the table. Moreover, the time required to collect data for both numerical and machine learning models is not summarized either.
For the prediction process, we used 480 CPU cores to run the numerical model. The process began with a 4-min initialization, during which the model read all necessary input files, set parameters, and initialized variables. After initialization, the model began stepping through the simulation period, which took approximately 2 min per simulation hour. Therefore, the total run time of the numerical model depends on the prediction period. For example, a 6-h prediction would require 16 min (7680 CPU-minutes). In comparison, P2M required only 4 s using a single CPU core to produce a prediction for one lead time (4 CPU-seconds), resulting in a 115,200-fold increase in computation efficiency. If predictions for all six lead times are required, the total time would be 24 s using 1 CPU core or 4 s using 6 CPU cores. This comparison highlights the significant computational efficiency of P2M, which is especially important in forecasting applications. Its rapid runtime enables large ensemble simulations to be completed within seconds, making it well-suited for uncertainty quantification based on varying rainfall scenarios, model parameters, and/or boundary conditions.
Discussion
The key contribution of P2M lies in its ability to generate flood predictions that are rapid, accurate, and spatially distributed. As illustrated in Fig. 6, existing flood prediction methods tend to involve trade-offs between speed, accuracy, and spatial generalization. Machine learning models generate rapid and accurate flood predictions, but are limited by the availability of training data. Numerical models simulate accurate spatial flood distribution but are time-consuming. P2M’s spatial prediction capability and fast speed give it a distinct advantage over machine learning and numerical models, respectively. One major strength of P2M is its demonstrated potential to achieve higher predictive accuracy than surrogate models. Surrogate models, which are trained solely on numerical models’ outputs, can emulate numerical model results at a faster speed. However, the speedup comes at the cost of accuracy due to the introduction of uncertainties during the emulation process. As a result, surrogate models cannot surpass the accuracy of their underlying numerical models, reflecting a trade-off between speed and accuracy35,36. Accordingly, most surrogate model studies focus on how closely their outputs match those of numerical simulations (e.g., refs. 20,21,22,29), establishing the numerical model as an upper bound for accuracy. In contrast, P2M’s prediction component is trained directly on observed data at selected control points. This allows P2M to achieve higher predictive accuracy than the numerical model at these locations, eliminating the accuracy constraint imposed by surrogate modeling frameworks. The spatial generalization beyond these control points is handled by a mapping model, which captures water depth relationships across the domain. While similar in structure to the framework proposed by Zhou et al.37,38, which also identifies representative points for prediction and then maps spatial flood distributions, the key difference lies in the source of training data. In Zhou’s framework, prediction models were trained on numerical outputs. In contrast, P2M strategically locates its control points at locations with observed data, enabling greater predictive accuracy and better real-world applicability. Although uncertainties may arise in both prediction and mapping components, testing results showed that the P2M outperformed the numerical model in terms of accuracy. This indicates that P2M overcomes the speed-accuracy trade-off in surrogate modeling frameworks.
“Machine Learning” refers to the machine learning models trained on the observed data. “Surrogate Model” refers to the machine learning models trained on the numerical models’ outputs.
Another strength of P2M is its application to a joint area that connects several hydrological basins and the coastal ocean, where multiple flood drivers such as tide, storm surge, river discharge, and heavy rainfall interact. Accurate flood prediction in such areas requires robust cross-scale modeling of both land and ocean processes as well as their interactions. Unlike existing surrogate modeling studies that focus solely on either hydrological basins (e.g., refs. 39,40,41) or coastal systems (e.g., refs. 42,43), the numerical model in this study is a recently developed two-way dynamically coupled hydrological-ocean model. Compared to traditional one-way coupled models, which often struggle to capture the flood dynamics in the land-ocean interaction zone44, the coupled model used in this study is capable of increasing the accuracy by 40%13,14. Moreover, it provided accurate simulations of flood dynamics in every grid cell of the model domain, whereas traditional one-way coupled numerical models typically link hydrological and ocean processes at discrete points in main river channels, limiting their coverage in overland areas (e.g., refs. 45,46,47). This underlying numerical model enabled P2M to produce reliable and accurate predictions of most flooding conditions across the land-ocean continuum.
The complexity of the underlying numerical model also demonstrates an advantage of P2M over surrogate models. Surrogate models rely on numerical models to generate sufficient training data, often requiring simulations from about 50 events (e.g., refs. 28,38). To avoid excessive computational costs, these numerical models must be relatively simple, limiting their ability to capture complex flood dynamics. Additionally, the calibration and validation of these numerous events are time-intensive. In contrast, P2M uses numerical model outputs to train the relationships between water depths at different locations, rather than directly training the water depth. Our results suggest that P2M generated accurate predictions, with the mapping model being trained on the simulation of a single representative event. This allows P2M to benefit from the precision of a sophisticated and well-validated numerical model, providing a distinct advantage in terms of both efficiency and accuracy.
P2M is adaptable to other study sites as long as sufficient observed data are available to serve as control points. Its flexibility allows it to be trained on numerical model outputs across different morphological and hydrodynamic conditions, making it a valuable tool for diverse environmental predictions. Additionally, it can be extended beyond flood level predictions to variables such as salinity and sediment concentration, broadening its applicability. This adaptability suggests potential applications in coastal and estuarine management, where rapid and accurate predictions of key environmental variables are essential for decision-making, hazard mitigation, and ecosystem monitoring.
P2M demonstrated its capability of producing spatial flood predictions within seconds. The predictions had similar patterns and potentially higher accuracy than the numerical model, suggesting that P2M could be valuable in flood warning and response. In this section, we further discuss the limitations of P2M, providing a balanced view between this data-driven approach and the process-based numerical model.
First, the prediction model in P2M still relied on the data availability from discrete locations, limiting its ability to capture complex spatial features and local small-scale events. For example, discharge data used in the model were from available USGS gages, which were mainly located in the main channels, providing upstream conditions for the predictions. However, these data could overlook the contributions from overland processes and small creeks, especially during events with significant localized precipitation. Similarly, the limited wind data may not fully represent the wind field across the prediction area. These limitations could be improved by incorporating spatially distributed data, such as using observed spatial precipitation instead of discrete discharge data48,49. However, this approach would greatly increase the training time. Given P2M’s robust performance in both validation and prediction and the study’s primary focus on developing the framework, potential improvements are needed for future work.
Second, the training data for the mapping model was derived solely from the outputs of the numerical model simulation of Hurricane Harvey (details in “Methods” section). Although Harvey was a representative event that encompassed a wide range of flooding scenarios, relying on a single event could introduce limitations in capturing all possible conditions. However, configuring and calibrating a numerical model to be robust enough to serve as the source of data for training the mapping model is still very time-consuming. In addition, to capture the hydrodynamics on a small scale, the numeric model features a 100-m horizontal resolution, which largely increases the computational cost while limiting its spatial coverage. To reduce computational time, we utilized the 2D version of ROMS, which did not include waves in its configuration, potentially introducing some uncertainties to the model. To address these limitations, future efforts will involve expanding the spatial coverage of numerical models, implementing the 3D version of ROMS with wave coupling, and introducing additional hurricane events or synthetic tropical cyclones50 to enrich the training dataset for the mapping model.
Third, while P2M demonstrates strong predictive capabilities, its forecasting horizon is currently limited to six hours, posing a constraint on its applicability for longer-term flood preparedness. For operational forecasting beyond this time window, P2M can utilize predicted forcings such as discharge from the National Water Model51 and wind from North American Mesoscale Forecast System52 to iteratively update water level predictions. This rolling-out strategy has the potential to extend the forecasting window. However, it also introduces additional uncertainties propagated from the forcing predictions. Assessing the performance of P2M under this approach falls beyond the scope of this study and will be the focus of future investigations.
Last, P2M may exhibit bigger uncertainties when predicting extreme events than the numerical model. This is because the prediction model was trained exclusively on observed historical data, which could lead to an Out-of-Distribution (OOD) issue when encountering extreme events beyond the range of the training data53,54. In contrast, process-based numerical models are driven by governing equations. As the dynamics of extreme events follow these governing equations, numerical models may provide more reliable simulations of such events. In practical operations, we recommend using both numerical and P2M methods to balance the computational efficiency and accuracy during extreme weather events.
Methods
Observed data availability
We collected three data types within the study area to build the prediction model within P2M. Water level data were collected from the lower parts of the watersheds (C1 through C6 in Fig. 2b) and the middle-upper parts of Galveston Bay (C7 and C8 in Fig. 2b). Water level data were converted to water depth, which served as the dependent variable in the prediction. These data also provided information on historical water depth variations, serving as independent variables for the prediction. Discharge data were collected upstream of the water level data locations. These data served as independent variables, providing information on river flooding occurrences. Wind speed and direction data were collected from the middle-lower part of the bay and one coastal location west of the study area. These data were also treated as independent variables, providing information on wind patterns for predicting wind-induced water depth variations. All data were collected over a ten-year period, from 2013/11/1 to 2023/11/1, at hourly intervals. The sources of the data from USGS and NOAA gaging stations are summarized in Table S2.
Prediction model
The prediction model in P2M is responsible for predicting the water depths at the eight control points using observed data. The prediction model took a historical time series of water depth, discharge, and wind data as inputs, and output predicted water depths for future time steps (Fig. S1). Specifically, the input data included 10 days of hourly discharge data at eight locations, water depth data at the eight control points, and wind speed and direction data at three locations (yellow triangles, blue squares, and red dots, respectively; Fig. 2a). This resulted in 22 input variables: eight discharge variables, eight water depth variables, and six wind variables (3 locations × 2 wind parameters). With a time series length of 240 h (10 days), the total input size for the model was 22 × 240. The model’s output was the predicted water depths at the control points for the upcoming hours. Separate models were built for each control point and lead time. In this study, we built the model to predict the water depths up to 12 h in advance. Thus, the total number of machine learning models contained in the prediction model was 8 × 12 = 96. The results shown in the study were based on the predictions with lead times 1 through 6 h because model uncertainty grew significantly when predicting water depth during hurricanes with lead times greater than 6 h (see testing results in Fig. S3).
All the machine learning models were trained and tested on the data collected over a 10-year time series. The data were segmented using a sliding window approach, with each window spanning 252 h (240 h for input and 12 h for output). Each segment of the time series, corresponding to a 252-h window, was treated as a training or testing example. Overlapping was allowed between different examples. After removing windows with missing data, a total of 64,245 training/testing examples were obtained. We split all the examples into training (before 2021/8/1) and testing (after 2021/8/1) sets (Fig. S2). One hurricane (Hurricane Nicholas) and one tropical storm (Tropical Storm Harold) were included in the testing set to evaluate the model’s performance during flooding events.
Long Short-Term Memory (LSTM) network was selected to predict water depths using the historical time series data. The LSTM network was constructed with a many-to-one architecture (gray dashed box in Box ①; Fig. 1). The activation function used in the network was the Leaky Rectified Linear Unit (Leaky ReLU), which is similar to the standard Rectified Linear Unit (ReLU) but has a slope for the negative inputs. The slope coefficient α determines this slope. The model hyperparameters included the slope coefficient α, the number of layers, and the number of features in the hidden state (hidden size) of each layer. Table 3 outlines the hyperparameter configurations used during training. For each control point and lead time, 330 LSTM networks were built with different hyperparameter combinations. Five-fold cross-validation (CV) was performed on the training set for each network to identify the optimal hyperparameters. The network with the lowest CV error, based on RMSE, was selected for each control point and lead time (Table S3). In total, 330 × 5 × 96 = 158,400 LSTM networks were fitted during this training process. The prediction model was tested on the testing set (see supplementary information).
Numerical model
The numerical model in P2M generated hindcast results for building the relationship between the water depths at the eight control points and the water depth at every location within the prediction area. The numerical model used in this study was a coupled hydrological-ocean model developed in our recent work13,14. It coupled the hydrological model, the hydrological modeling extension package of the Weather Research and Forecasting model (WRF-Hydro)55, and the ocean model, the Regional Ocean Modeling System (ROMS)56 using the Coupled Ocean-Atmosphere-Wave and Sediment Transport (COAWST) Modeling System57. The coupled model realized a seamless connection between these two components on an interface boundary through a two-way dynamical coupling method. It allows an integrated simulation of the entire land-ocean continuum and is ideal for events driven by multiple flood sources. The coupled model has been successfully applied to simulate the compound flooding event induced by Hurricane Florence near Cape Fear Estuary, NC (2018) and has demonstrated high accuracy in reproducing water level variations during this hurricane-induced compound flooding event13,14.
To generate training data for the mapping model, the coupled model’s outputs need to capture both river flooding and storm surge to include all potential flooding situations. Thus, we selected a prototypical compound flooding event—Hurricane Harvey (2017)—for running the model. Hurricane Harvey generated a strong storm surge and tremendous rainfall in the study area, representing most flooding situations this area may experience. The domain of this numerical model covered the entire Galveston Bay, the San Jacinto Watershed, the Galveston Bay-Sabine Lake Watershed, the lower part of the Lower Trinity Watershed, and the coastal ocean adjacent to the bay with a 100-m grid spacing (Fig. 2a and Fig. S5). The model was initialized on 2017-08-01 at 0:00 and ran for 60 days. Hourly water depth was outputted for every grid cell within the prediction area and at the eight control points. This resulted in a dataset of 1,440 hourly water depth records for each location. In addition, we used the coupled model to simulate Hurricane Nicholas (2021) for testing and comparative purposes. This simulation began on 2021-09-01 at 0:00 and ran for 30 days. Detailed model setup and validation are provided in supplementary information (Fig. S6).
Mapping model
The mapping model generates the spatial distribution of water depth by calculating the water depth at each pixel in the prediction area, which contained 43,246 cells of the numerical model’s grid with a 100-m horizontal resolution. For each grid cell, we trained an individual machine learning model on the water depth data from the numerical model. The input of each machine learning model was the water depths at the eight control points, and the output was the water depth at its corresponding grid cell. The mapping model was trained on the numerical model’s outputs during Hurricane Harvey. The estimated return period for an event of Harvey’s magnitude in Southeast Texas is 1000–10,000 years58, which provides a robust upper limit for training. We ran the numerical model for 60 days, covering the period before, during, and after Harvey, which generated 1440 training sets for the mapping model, covering all 43,246 cells within the prediction area. The generated training water depth data ranged from low levels under normal conditions to high levels under extreme hurricane conditions. This comprehensive training dataset covered a wide range of scenarios, minimizing the likelihood of encountering OOD issues during forecasting. We tested the mapping model on the observed data during a new hurricane event: Nicholas (see supplementary information Fig. S8).
In the prediction area, two types of locations were identified. Type 1 locations were those where the water depth was consistently greater than zero, which means these locations were always covered by water. Buffalo Bayou and Galveston Bay are examples of Type 1 locations. All eight control points were also Type 1 locations, as the observation stations are typically set in the bay and main channels. Type 2 locations were those where the water depth was zero under normal conditions and greater than zero during flooding. These locations remain dry unless affected by flooding, which results in truncated water depth data due to the lack of recorded values during normal conditions. Houston is an example of a Type 2 location.
Four machine learning models were included in the mapping model (Table 4). For Type 1 locations, we fitted a linear regression model on the entire water depth time series from the numerical model (Model-1). For Type 2 locations, we fitted another three models. Model-2 was also a linear regression but was trained on the untruncated part of the data. Model-3 was a hurdle model. To fit this model, we first converted the response water depth to a binary variable indicating whether the location was flooded (water depth greater than zero). A logistic regression was then fitted on this binary response. For conditions when the logistic regression indicated that the location was flooded, Model-2 was used to calculate the water depth. Model-4 was a Tobit Model, which was designed to handle truncated data59.
A five-fold CV was conducted at each Type 2 location. Given the unbalanced feature of the data at certain locations where flooding occurred within a very narrow time window or not at all, specific criteria were applied. We defined a valid fold as one containing both flooded and unflooded data and counted the number of such folds. If there were more than two valid folds, we calculated the average RMSE for these folds as the CV RMSE. The model with the lowest CV RMSE was selected for each location. If fewer than two valid folds were identified, it indicated that the location experienced either short-duration flooding or no flooding at all. These locations were characterized as low flood risk (hereafter referred to as low-flood-risk locations), as they were simulated to experience minimal flooding during the extreme event of Hurricane Harvey. These low-flood-risk locations would be excluded from the mapping model. The model selection results are provided in supplementary information (Fig. S7).
Data availability
All observed data used in this study are publicly accessible. Water level and discharge data were obtained from U.S. Geological Survey (USGS) gages (https://dashboard.waterdata.usgs.gov), and tidal and meteorological data were sourced from NOAA stations (https://tidesandcurrents.noaa.gov). Topographic and bathymetric data used to drive the numerical models were retrieved from the NHDPlusV2, NOAA Coastal Relief Model, and NCEI Estuarine Bathymetric Digital Elevation Models, as detailed in the Supplementary Information. Meteorological forcing data were obtained from the National Water Model CONUS Retrospective Dataset and the Rapid Refresh (RAP) atmospheric product. The numerical simulations were conducted using open-source model frameworks, including WRF-Hydro (https://ral.ucar.edu/projects/wrf_hydro), ROMS (https://www.myroms.org), and the COAWST modeling system (https://code.usgs.gov/coawstmodel/COAWST). The Prediction-to-Map (P2M) machine learning framework was developed using TensorFlow-Keras (https://www.tensorflow.org/guide/keras) for long short-term memory (LSTM) networks and Scikit-learn (https://scikit-learn.org) for regression models.
Code availability
Custom code for training, testing, and applying the P2M models is available from the corresponding author upon reasonable request. A public GitHub repository will be released upon publication.
References
Wahl, T., Jain, S., Bender, J., Meyers, S. D. & Luther, M. E. Increasing risk of compound flooding from storm surge and rainfall for major US cities. Nat. Clim. Change 5, 1093–1097 (2015).
Sangsefidi, Y., Bagheri, K., Davani, H. & Merrifield, M. Data analysis and integrated modeling of compound flooding impacts on coastal drainage infrastructure under a changing climate. J. Hydrol. 616, 128823 (2023).
Wijetunge, J. J. & Neluwala, N. G. P. B. Compound flood hazard assessment and analysis due to tropical cyclone-induced storm surges, waves and precipitation: a case study for coastal lowlands of Kelani river basin in Sri Lanka. Nat. Hazards 116, 3979–4007 (2023).
Hirabayashi, Y. et al. Global flood risk under climate change. Nat. Clim. change 3, 816–821 (2013).
Chen, W. et al. A coupled river basin-urban hydrological model (DRIVE-Urban) for real-time urban flood modeling. Water Resour. Res. 58, e2021WR031709 (2022).
Mendes, J. & Maia, R. Hydrologic modelling calibration for operational flood forecasting. Water Resour. Manag. 30, 5671–5685 (2016).
Reed, S., Schaake, J. & Zhang, Z. A distributed hydrologic model and threshold frequency-based method for flash flood forecasting at ungauged locations. J. Hydrol. 337, 402–420 (2007).
Rozalis, S., Morin, E., Yair, Y. & Price, C. Flash flood prediction using an uncalibrated hydrological model and radar rainfall data in a Mediterranean watershed under changing hydrological conditions. J. Hydrol. 394, 245–255 (2010).
Bajo, M., Zampato, L., Umgiesser, G., Cucco, A. & Canestrelli, P. A finite element operational model for storm surge prediction in Venice. Estuar., Coast. Shelf Sci. 75, 236–249 (2007).
Madsen, H. & Jakobsen, F. Cyclone induced storm surge and flood forecasting in the northern Bay of Bengal. Coast. Eng. 51, 277–296 (2004).
Mattocks, C. & Forbes, C. A real-time, event-triggered storm surge forecasting system for the state of North Carolina. Ocean Model. 25, 95–119 (2008).
Zhang, K., Li, Y., Liu, H., Rhome, J. & Forbes, C. Transition of the coastal and estuarine storm tide model to an operational storm surge forecast model: A case study of the Florida coast. Weather Forecast. 28, 1019–1037 (2013).
Bao, D. et al. A numerical investigation of Hurricane Florence-induced compound flooding in the Cape Fear Estuary using a dynamically coupled hydrological-ocean model. J. Adv. Model. Earth Syst. 14, e2022MS003131 (2022).
Bao, D., Xue, Z. G. & Warner, J. C. Quantifying compound and nonlinear effects of hurricane-induced flooding using a dynamically coupled hydrological-ocean model. Water Resour. Res. 60, e2023WR036455 (2024).
Mosavi, A., Ozturk, P. & Chau, K. W. Flood prediction using machine learning models: Literature review. Water 10, 1536 (2018).
Dazzi, S., Vacondio, R. & Mignosa, P. Flood stage forecasting using machine-learning methods: A case study on the Parma River (Italy). Water 13, 1612 (2021).
Sharma, S. & Kumari, S. Comparison of machine learning models for flood forecasting in the Mahanadi River Basin, India. J. Water Clim. Change 15, 1629–1652 (2024).
Shi, J. et al. Deep learning models for flood predictions in south florida. arXiv preprint arXiv:2306.15907 (2023).
Zhao, Y., Jiang, C., Vega, M. A., Todd, M. D. & Hu, Z. Surrogate modeling of nonlinear dynamic systems: A comparative study. J. Comput. Inf. Sci. Eng. 23, 011001 (2023).
Zahura, F. T. et al. Training machine learning surrogate models from a high-fidelity physics-based model: Application for real-time street-scale flood prediction in an urban coastal community. Water Resour. Res. 56, e2019WR027038 (2020).
Zahura, F. T. & Goodall, J. L. Predicting combined tidal and pluvial flood inundation using a machine learning surrogate model. J. Hydrol. Regl. Stud. 41, 101087 (2022).
Hou, J., Zhou, N., Chen, G., Huang, M. & Bai, G. Rapid forecasting of urban flood inundation using multiple machine learning models. Nat. Hazards 108, 2335–2356 (2021).
Hosseiny, H., Nazari, F., Smith, V. & Nataraj, C. A framework for modeling flood depth using a hybrid of hydraulics and machine learning. Sci. Rep. 10, 8222 (2020).
Kabir, S. et al. A deep convolutional neural network model for rapid prediction of fluvial flood inundation. J. Hydrol. 590, 125481 (2020).
Frame, J. M. et al. Rapid inundation mapping using the US National Water Model, satellite observations, and a convolutional neural network. Geophys. Res. Lett. 51, e2024GL109424 (2024).
Löwe, R., Böhm, J., Jensen, D. G., Leandro, J. & Rasmussen, S. H. U.-F. L. O. O. D. Topographic deep learning for predicting urban pluvial flood water depth. J. Hydrol. 603, 126898 (2021).
Hu, R., Fang, F., Pain, C. C. & Navon, I. M. Rapid spatio-temporal flood prediction and uncertainty quantification using a deep learning method. J. Hydrol. 575, 911–920 (2019).
Kao, I. F., Liou, J. Y., Lee, M. H. & Chang, F. J. Fusing stacked autoencoder and long short-term memory for regional multistep-ahead flood inundation forecasts. J. Hydrol. 598, 126371 (2021).
Fraehr, N., Wang, Q. J., Wu, W. & Nathan, R. Development of a fast and accurate hybrid model for floodplain inundation simulations. Water Resour. Res. 59, e2022WR033836 (2023).
Li, X. & Willems, P. A hybrid model for fast and probabilistic urban pluvial flood prediction. Water Resour. Res. 56, e2019WR025128 (2020).
Muñoz, D. F., Moftakhari, H. & Moradkhani, H. Quantifying cascading uncertainty in compound flood modeling with linked process-based and machine learning models. Hydrol. Earth Syst. Sci. 28, 2531–2553 (2024).
Keim, B. D., Muller, R. A. & Stone, G. W. Spatiotemporal patterns and return periods of tropical storm and hurricane strikes from Texas to Maine. J. Clim. 20, 3498–3509 (2007).
Trepanier, J. C., Nielsen-Gammon, J., Brown, V. M., Thompson, D. T. & Keim, B. D. Stalling North Atlantic Tropical Cyclones. J. Appl. Meteorol. Climatol. 63, 1409–1426 (2024).
Blake, E. S., & Zelinsky, D. A. National Hurricane Center tropical cyclone report: Hurricane Harvey (AL092017). National Hurricane Center (2018).
Alizadeh, R., Allen, J. K. & Mistree, F. Managing computational complexity using surrogate models: a critical review. Res. Eng. Des. 31, 275–298 (2020).
Razavi, S., Tolson, B. A. & Burn, D. H. Review of surrogate modeling in water resources. Water Resour. Res. 48, W07401 (2012).
Zhou, Y., Wu, W., Nathan, R. & Wang, Q. J. A rapid flood inundation modelling framework using deep learning with spatial reduction and reconstruction. Environ. Model. Softw. 143, 105112 (2021).
Zhou, Y., Wu, W., Nathan, R. & Wang, Q. J. Deep learning-based rapid flood inundation modeling for flat floodplains with complex flow paths. Water Resour. Res. 58, e2022WR033214 (2022).
Contreras, M. T., Gironás, J. & Escauriaza, C. Forecasting flood hazards in real-time: A surrogate model for hydrometeorological events in an Andean watershed. Nat. Hazards Earth Syst. Sci. Discuss. 2020, 1–24 (2020).
Droppers, B., Bierkens, M. F. P. & Wanders, N. A multi-resolution deep-learning surrogate framework for global hydrological models. Water Resour. Res. 61, e2024WR037736 (2025).
Sun, R., Pan, B. & Duan, Q. A surrogate modeling method for distributed land surface hydrological models based on deep learning. J. Hydrol. 624, 129944 (2023).
Jiang, P. et al. Digital Twin Earth--Coasts: Developing a fast and physics-informed surrogate model for coastal floods via neural operators., (2021). arXiv preprint arXiv:2110.07100 .
Xu, Z. et al. arXiv preprint. A Fast AI Surrogate for Coastal Ocean Circulation Models., (2024). arXiv:2410.14952.
Santiago-Collazo, F. L., Bilskie, M. V. & Hagen, S. C. A comprehensive review of compound inundation models in low-gradient coastal watersheds. Environ. Model. Softw. 119, 166–181 (2019).
Du, J. & Park, K. Estuarine salinity recovery from an extreme precipitation event: Hurricane Harvey in Galveston Bay. Sci. total Environ. 670, 1049–1059 (2019).
Huang, W. et al. Compounding factors for extreme flooding around Galveston Bay during Hurricane Harvey. Ocean Model. 158, 101735 (2021).
Valle-Levinson, A., Olabarrieta, M. & Heilman, L. Compound flooding in Houston-Galveston bay during hurricane harvey. Sci. Total Environ. 747, 141272 (2020).
Chang, L. C., Liou, J. Y. & Chang, F. J. Spatial-temporal flood inundation nowcasts by fusing machine learning methods and principal component analysis. J. Hydrol. 612, 128086 (2022).
Chen, C. et al. A short-term flood prediction based on spatial deep learning network: A case study for Xi County, China. J. Hydrol. 607, 127535 (2022).
Jing, R., Lin, N., Emanuel, K., Vecchi, G. & Knutson, T. R. A comparison of tropical cyclone projections in a high-resolution global climate model and from downscaling by statistical and statistical-deterministic methods. J. Clim. 34, 9349–9364 (2021).
NOAA National Water Model CONUS Retrospective Dataset. (n.d.). Retrieved from https://registry.opendata.aws/nwm-archive.
NOAA North American Mesoscale Forecast System (NAM). (n.d.). Retrieved from https://registry.opendata.aws/noaa-nam.
Rasp, S., Pritchard, M. S. & Gentine, P. Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. 115, 9684–9689 (2018).
Reichstein, M. et al. Deep learning and process understanding for data-driven Earth system science. Nature 566, 195–204 (2019).
Gochis, D. J. et al. WRF-Hydro V5 Technical Description Originally Created: Updated: WRF-Hydro V5 Technical Description. NCAR Technical Note, 1–107 (2018).
Shchepetkin, A. F. & McWilliams, J. C. The regional oceanic modeling system (ROMS): a split-explicit, free-surface, topography-following-coordinate oceanic model. Ocean Model. 9, 347–404 (2005).
Warner, J. C., Armstrong, B., He, R. & Zambon, J. B. Development of a Coupled Ocean–Atmosphere–Wave–Sediment Transport (COAWST) Modeling System. Ocean Model. 35, 230–244 (2010).
Kunkel, K. E. & Champion, S. M. An assessment of rainfall from Hurricanes Harvey and Florence relative to other extremely wet storms in the United States. Geophys. Res. Lett. 46, 13500–13506 (2019).
Tobin, J. Estimation of relationships for limited dependent variables. Econometrica: J. Econ. Soc. 24-36 (1958).
Acknowledgements
This work was supported by the U.S. Geological Survey Coastal and Marine Hazards and Resources Program, The Cooperative Ecosystem Studies Units (award G20AC00099), and by Congressional appropriations through the Additional Supplemental Appropriations for Disaster Relief Act of 2019 (Public Law 116–120; 133 Stat. 871). Funding from NOAA Coupled Ocean Modeling Testbed (award NA21NOS0120185), NASA Coastal Resilience (award 80NSSC23K0126), National Science Foundation (Award 2023443), and the US Department of Defense through the US Army Engineer Research and Development Center (ERDC) (Award W912HZ2220005), and the Gulf Research Program of the National Academies of Sciences, Engineering, and Medicine (SCON – 10000883) is also appreciated.
Author information
Authors and Affiliations
Contributions
D.B. developed the hybrid model framework, conducted numerical experiments, and analyzed the results. Z.G.X. conceived and supervised the study, contributed to model design, and provided scientific guidance. M.H., K.X., and C.K.H. provided expertise in experiment design, hydrodynamic modeling, and interpretation of results. J.C.T. contributed to the statistical evaluation and flood risk assessment. Z.G.X., K.X., and C.K.H. secured funding for the study. D.B. and Z.G.X. wrote the main manuscript text. D.B. prepared all figures. All authors reviewed and edited the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bao, D., George Xue, Z., Hiatt, M. et al. A machine learning-based prediction-to-map framework for rapid and accurate spatial flood prediction. npj Nat. Hazards 2, 71 (2025). https://doi.org/10.1038/s44304-025-00122-2
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s44304-025-00122-2








