Introduction

Roadway safety during floods presents a critical challenge for emergency managers. A majority of flood fatalities in the United States occur when vehicles attempt to traverse submerged roadways, particularly bridges and low-water crossings1,2. Climate change and rapid urbanization are expected to increase the frequency and severity of floods in coming decades3,4. As such, there is an urgent need for enhanced flood warning systems that pinpoint the location and timing of floods to enable pre-emptive dispatch of emergency services and closure of affected roadways.

The evolution of flood forecasting systems over recent decades reflects the hydrology community’s response to this challenge. Modern systems have achieved increasingly accurate predictions of flood timing, magnitude, and extent5,6,7,8 by integrating weather data into hydrological models for runoff estimation, routing predicted runoff through stream network models, and generating detailed inundation maps9,10. In the United States, the National Water Model (NWM) integrates atmospheric forcings, land surface processes, and channel routing modules to generate real-time flood forecasts for ~3.4 million river reaches11—a dramatic expansion in coverage beyond the River Forecast Centers’ 3800 flood gage locations12.

Parallel advances in Data Assimilation (DA) have enhanced the accuracy of flood predictions by integrating real-time sensor observations into hydrologic models. Previous research has demonstrated that DA substantially enhances the accuracy of rainfall-runoff models by updating model states and parameters based on in-situ and remote sensing observations13. Focus areas for updates include states like soil moisture14,15, evapotranspiration16, snowpack17,18, water depth19,20, and model parameters like hydraulic conductivity and groundwater recession constants21,22. With respect to streamflow, assimilation of water level and discharge measurements into routing models has been shown to substantially improve discharge estimates23,24,25,26,27,28.

While existing studies have largely focused on the use of DA for streamflow estimation at ungaged locations28, the potential of DA to improve flood forecasting skill remains underexplored29. Recent studies have investigated the temporal persistence of state corrections from hybrid ensemble and optimal interpolation DA schemes for up to 18 hours27. Although these results are promising, few studies have investigated the potential for DA to enhance flood forecasts in an operational context while accounting for uncertainty in meteorological forcings.

The National Water Model currently utilizes a computationally efficient ‘nudging’ scheme, which replaces model-generated streamflow values at specific assimilation points with observed data30. This approach improves forecast accuracy by incorporating streamflow observations into the model’s initial conditions at the start of each forecast horizon. However, because nudging corrects discharge only at gage sites, only discharge estimates at and downstream of gages are improved31. Moreover, nudging does not provide probabilistic estimates of stage and discharge, and it assumes that gage measurements are perfectly accurate.

Beyond simple nudging approaches, the research community has developed a diverse array of data assimilation methods for flood forecasting, ranging from Ensemble Kalman filters implemented in platforms such as DART and OpenDA25,32, to hybrid four-dimensional-variational/particle-filter schemes for multiple basins33, to image-assimilating particle filters that fuse earth observation flood extents into coupled hydrologic-hydraulic models34. These studies demonstrate substantial skill improvements over open-loop runs, though they vary significantly in terms of model complexity and computational requirements. A comprehensive comparison of several data assimilation frameworks for flood forecasting, including model architectures, assimilation methods, and implementation details, is provided in the Supplementary Information (Supplementary Table S1).

While effective flood management is predicated on accurate hydrologic predictions, there is also a growing demand for impact-based warnings that connect hydrologic predictions with practical decision-making needs35,36. These impact-based warnings capture the effects of floods on cities, people, and infrastructure—for example, providing emergency managers with the probability that an affected roadway will overtop within a projected forecast horizon. Many existing systems focus on offline applications like vulnerability and risk assessment37,38, relying on pre-determined floodplain maps to identify probable flood-affected areas39, and prioritizing catastrophic flood events as opposed to smaller floods at bridges and low-water crossings. Recent work has expanded impact-based flood assessment to incorporate nowcasting for emergency response and roadway safety. Developments include real-time situational awareness frameworks that fuse multi-modal observations to monitor road conditions during flooding40 and physics-based approaches that provide high-resolution nowcasting of roadway conditions for emergency response planning41. Several state and county programs operationally issue bridge-specific or low-water-crossing alerts. However, these systems are largely confined to sensor-equipped sites and are typically deterministic, with only limited capability to propagate forecasts to ungaged segments. For instance, Iowa’s BridgeWatch monitors bridge infrastructure for road overtopping and utilizes the Iowa Flood Information System (IFIS) for real-time water levels and flood forecasts across its network of nearly 500 stream-stage sensors, yet forecasting capabilities remain constrained to gauged locations42. 
Similarly, North Carolina’s Flood Inundation Mapping and Alert Network (NC FIMAN-T) integrates real-time sensor data with hydraulic models to provide interactive flood inundation maps and alerts, but forecasts are limited to gauged locations with minimal capability to predict conditions at ungauged areas between monitoring points43. The Harris County flood monitoring network in Texas likewise provides flood nowcasts based on real-time stage data from 200 gages, but does not provide stage estimates at ungauged locations and does not provide look-ahead capability for future water levels44. These limitations highlight the critical need for forecasting systems that can extend predictions beyond the immediate vicinity of existing gages to support proactive decision-making across transportation networks. While the NWM provides High Water Probability forecasts—indicating the likelihood that a stream will reach high water flow within the forecast period—these predictions rely on empirically-determined thresholds from retrospective simulations rather than actual bridge or culvert dimensions. This disconnect between hydrological forecasting capabilities and infrastructure-specific characteristics makes it challenging to effectively use these warnings for real-world emergency response.

In response to these challenges, this paper introduces an end-to-end operational flood forecasting system that combines online hydrologic modeling, bridge infrastructure data, and a novel data assimilation scheme to enable accurate real-time bridge flooding predictions. Our proposed framework comprises the following key components:

  • Efficient Muskingum-based streamflow routing: An efficient Muskingum routing scheme is implemented to simulate streamflow in large river networks and serves as the basis for the proposed data assimilation and impact forecasting schemes. We show that this streamflow model admits a state-space representation and we provide a computationally efficient implementation, allowing for the application of a Kalman Filter for state estimation.

  • Improved nowcasts via Kalman Filtering: We derive and implement a Kalman Filtering scheme to assimilate real-time stream gage data into the Muskingum routing model. United States Geological Survey (USGS) gage measurements are assimilated into the model at the start of each hour, thereby ensuring that the model remains up-to-date with sensor observations and improving subsequent discharge forecasts.

  • Improved forecasts via multi-model weighting: To further improve discharge forecasts, we derive a new scheme that uses an ensemble of scenarios to project future runoff inputs and then dynamically assigns weights to each ensemble member based on its past performance. For runoff forecasts, we utilize multiple scenarios with different magnitudes based on short-range NWM forecasts, and the performance of each scenario is evaluated using USGS discharge observations over the preceding forecasting horizon.

  • Impact-based predictions from bridge infrastructure data: We estimate the probabilities of bridge flooding by comparing an ensemble of forecasted water stage heights against bridge low chord elevations using a bridge inventory dataset for the state of Texas. The inclusion of bridge data translates hydrological forecasts into actionable warnings for specific infrastructure assets.
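The state-space claim in the first component can be illustrated with a minimal sketch. It assumes the classic Muskingum recursion on each reach, Q[t+1] = C0·I[t+1] + C1·I[t] + C2·Q[t], where the inflow I is the lateral runoff plus the sum of upstream outflows, and it assumes reaches are indexed upstream-first. The parameters `K`, `X`, and `dt` and all function names are illustrative, not the paper's implementation. Because the update is linear in the reach outflows, the state transition matrix can be recovered column by column by routing unit vectors with zero runoff.

```python
from typing import List, Sequence, Tuple

Coeffs = Tuple[float, float, float]


def muskingum_coeffs(K: float, X: float, dt: float) -> Coeffs:
    """Classic Muskingum coefficients (C0, C1, C2); K and dt share time units."""
    denom = 2.0 * K * (1.0 - X) + dt
    c0 = (dt - 2.0 * K * X) / denom
    c1 = (dt + 2.0 * K * X) / denom
    c2 = (2.0 * K * (1.0 - X) - dt) / denom
    return c0, c1, c2  # C0 + C1 + C2 == 1 up to rounding (mass conservation)


def route_step(Q: Sequence[float], q_prev: Sequence[float], q_next: Sequence[float],
               upstream: Sequence[Sequence[int]], coeffs: Sequence[Coeffs]) -> List[float]:
    """Advance every reach outflow by one time step.

    Reaches must be ordered upstream-first so that upstream outflows at t+1
    are already known when a downstream reach is processed.
    """
    Q_next = [0.0] * len(Q)
    for i, (c0, c1, c2) in enumerate(coeffs):
        inflow_prev = q_prev[i] + sum(Q[j] for j in upstream[i])
        inflow_next = q_next[i] + sum(Q_next[j] for j in upstream[i])
        Q_next[i] = c0 * inflow_next + c1 * inflow_prev + c2 * Q[i]
    return Q_next


def transition_matrix(upstream: Sequence[Sequence[int]],
                      coeffs: Sequence[Coeffs]) -> List[List[float]]:
    """Recover A in Q[t+1] = A Q[t] + (runoff terms) by routing unit vectors."""
    n = len(upstream)
    zeros = [0.0] * n
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        e_j = [1.0 if k == j else 0.0 for k in range(n)]
        column = route_step(e_j, zeros, zeros, upstream, coeffs)
        for i in range(n):
            A[i][j] = column[i]
    return A


# Example: a three-reach chain 0 -> 1 -> 2 with hourly steps (hypothetical parameters).
upstream = [[], [0], [1]]
coeffs = [muskingum_coeffs(K=3600.0, X=0.2, dt=3600.0)] * 3
A = transition_matrix(upstream, coeffs)
```

The same linearity means the full routing can be written as Q[t+1] = A·Q[t] + B0·q[t+1] + B1·q[t], which is the form a Kalman Filter operates on. Because each reach depends only on itself and its upstream reaches, A is lower-triangular under upstream-first ordering.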

The proposed data assimilation framework, referred to here as Kalman Filtering with Multi-Model Weighting (KF+MMW), is benchmarked against the National Water Model with nudging for a three-month period of storm activity in the Guadalupe and San Antonio River basins, two major watersheds within Texas’s Flash Flood Alley that experience frequent and dangerous flooding events45. Using discharge observations from 75 USGS gaging stations and hydraulic geometry data from 2337 bridges, we evaluate the skill of both methods at forecasting discharges and bridge impact events. The resulting analysis quantifies both the improvement in forecasting accuracy and the gain in lead time attributable to our proposed approach. By integrating real-time data, handling uncertainties in forecasted runoff inputs, and offering site-specific warnings for individual bridges, our approach provides an adaptable system for improved flood response under real-world conditions.

Results

We evaluate the improvement in forecasting skill afforded by our proposed framework over a three-month period of severe storms in south-central Texas. Compared to the current nudging approach used by the NWM, the proposed KF+MMW data assimilation method offers improved forecasting skill for both discharge and bridge impacts over all lead times considered. These improvements increase at longer lead times, giving emergency operators more time to dispatch maintenance crews and first responders. We demonstrate that the proposed framework is effective at reducing both missed events and false alarms that undermine trust in flood early warning systems.

Overview of study area and study period

This study focuses on a three-month period of severe storm events affecting the Guadalupe and San Antonio River basins, two major watersheds in south-central Texas with drainage areas of 17,350 km² and 10,830 km², respectively. We investigate a period of storm activity from April to July of 2023, which includes the largest storms of the year. To ensure that our results remain robust under different hydrologic conditions, a separate assessment of storm events from April to October of 2021 is included in the Supplementary Information (SI) Document (Supplementary Fig. S1).

Figure 1 shows an overview of hydrologic conditions and predicted bridge impacts during the study period, including: (a) time series of discharges for reaches in the Guadalupe basin (purple) and San Antonio basin (green) as measured by USGS gages; and (b) time series of predicted bridge warnings for the 2337 bridges in the study area, with warnings for all bridges and low-water crossings shown in black, and warnings for major span bridges shown in red. Maps (i-vi) show predicted probabilities of bridge impacts at selected time points, with smaller yellow markers representing bridges at low risk of flooding and larger red markers indicating bridges at high risk of flooding. Precipitation fields from the Analysis of Record for Calibration (AORC) dataset are included to show the spatial distribution of storms alongside bridge impacts46. For this assessment, a bridge impact refers to an event where stream stage rises above the bridge low chord, and a bridge warning occurs when the mean predicted stream stage is greater than the bridge low chord at any point over the next 12-h forecast window.
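The warning rule defined above, together with the ensemble-based impact probability described in the Introduction, can be sketched as follows. The sketch assumes that forecasted stages are already expressed as water-surface elevations in the same vertical datum as the bridge low chord (the conversion from discharge to stage is not shown). It also assumes that the impact probability is the fraction of ensemble members exceeding the low chord at any point in the window; this definition and the function names are illustrative, not taken from the paper.

```python
from typing import Sequence


def exceeds_low_chord(stages: Sequence[float], low_chord: float) -> bool:
    """True if any stage in the forecast window rises above the low chord elevation."""
    return any(h > low_chord for h in stages)


def impact_probability(ensemble_stages: Sequence[Sequence[float]], low_chord: float) -> float:
    """Fraction of ensemble members that overtop the low chord within the window (assumed definition)."""
    hits = sum(exceeds_low_chord(member, low_chord) for member in ensemble_stages)
    return hits / len(ensemble_stages)


def bridge_warning(ensemble_stages: Sequence[Sequence[float]], low_chord: float) -> bool:
    """Warning if the ensemble-mean stage exceeds the low chord at any lead time."""
    n_members = len(ensemble_stages)
    mean_stage = [sum(step) / n_members for step in zip(*ensemble_stages)]
    return exceeds_low_chord(mean_stage, low_chord)


# Three members, each a short window of stage elevations in metres (illustrative values;
# the study uses a 12-h window of hourly steps).
members = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [4.0, 4.0, 4.0]]
print(bridge_warning(members, 2.5), impact_probability(members, 2.5))
```

Note that the warning (based on the ensemble mean) and the probability (based on individual members) can disagree: with a low chord of 3.0 m in this example, one member overtops but the mean does not.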

Fig. 1: Temporal and spatial evolution of bridge warnings at 2337 bridges and low water crossings in the San Antonio and Guadalupe Basins during a period of storm events from April to June of 2023.

Yellow dots indicate normal conditions, while red dots represent high risk of flooding. a Streamflow measured at USGS gages. b Percentage of bridges with warnings: the black line represents all bridges, while the red line shows only span bridges, excluding low water crossings. Maps (i-vi) illustrate the progression of warnings (triangles for span bridges, circles for low water crossings) alongside corresponding precipitation data over time.

Following the progression of storm events shown in Fig. 1, bridge warnings coincide with hydrograph peaks and roughly map to the spatial distribution of rainfall. In (i), warnings are issued throughout the basin at the onset of the rainfall event on April 7. On April 21 (ii), rain fronts concentrate around downtown San Antonio, leading to multiple co-located warnings that align with high USGS gage readings. On May 1 (iii), during a period of no storm activity, few warnings are issued; however, a small number of warnings persist at low water crossings, where the warning threshold is close to the channel bottom. On May 13 (iv), warnings return as a storm approaches from the northeast, with a sharp increase in warnings over the next three hours as water levels rise and the storm moves downstream across the basin. At its peak (v), approximately 290 warnings are issued, impacting 13% of all bridges. On May 17 (vi), as the storm subsides and water levels recede, several bridge warnings are lifted. The remaining warnings are concentrated downstream, showing the cascading effect from the upstream tributaries, consistent with the progression of the storm event.

Streamflow and bridge impact forecasts improve over all lead times

Taking the entire period of record across the two river basins, the overall improvement in forecasting skill is evaluated by computing two skill scores that capture accuracy in hydrologic predictions and bridge impact predictions, respectively. First, the Continuous Ranked Probability Skill Score (CRPSS) is computed to quantify the agreement between forecasted and measured discharges at gage sites. Second, the Brier Skill Score (BSS) is computed to quantify the agreement between forecasted probabilities of bridge impacts and observed (binary) bridge impacts at gage sites. We evaluate the difference in average skill scores for the 75 gages where observations are available. A positive CRPSS or BSS indicates better predictive skill than the reference model (an open-loop simulation without data assimilation), while a negative score indicates poorer performance.
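Both skill scores take the usual form SS = 1 − S / S_ref for a score S where lower is better, with the open-loop simulation as the reference. A minimal sketch follows, assuming the standard (non-"fair") ensemble estimator CRPS = E|X − y| − ½·E|X − X′| and the Brier score BS = mean((p − o)²). The function names and toy numbers are hypothetical, and the study may use a different CRPS estimator.

```python
from typing import Sequence


def crps_ensemble(members: Sequence[float], obs: float) -> float:
    """Ensemble CRPS estimator: E|X - y| - 0.5 * E|X - X'| (non-fair form)."""
    m = len(members)
    spread_to_obs = sum(abs(x - obs) for x in members) / m
    spread_within = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return spread_to_obs - 0.5 * spread_within


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def skill_score(score: float, ref_score: float) -> float:
    """Skill of a negatively oriented score relative to a reference: 1 - S / S_ref."""
    return 1.0 - score / ref_score


# Toy CRPSS at one gage over two forecast times, with the open loop as reference.
obs = [10.0, 12.0]
da_ens = [[9.0, 11.0], [11.0, 13.0]]     # data-assimilated ensembles
open_loop = [[5.0, 6.0], [7.0, 8.0]]     # reference (open-loop) ensembles
crps_da = sum(crps_ensemble(e, y) for e, y in zip(da_ens, obs)) / len(obs)
crps_ol = sum(crps_ensemble(e, y) for e, y in zip(open_loop, obs)) / len(obs)
crpss = skill_score(crps_da, crps_ol)
```

The BSS follows the same pattern, with `brier_score` of the forecast probabilities replacing the mean CRPS.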

Evaluating the average performance over the entire period of record, the proposed KF+MMW data assimilation method offers improved forecasting skill at all lead times considered. Figure 2b shows the median CRPSS by lead time, while Fig. 2d shows the median BSS. Our approach (blue) achieves higher average and median CRPSS and BSS at all lead times than the NWM with nudging (orange). Although skill scores for both methods decrease as lead time increases, the relative advantage of KF+MMW grows with lead time, indicating that the proposed approach is most beneficial at longer forecast horizons, where interventions have maximum effect.

Fig. 2: Performance comparison.

Kalman Filter with multi-model weighting vs. NWM with nudging for the April–July 2023 study period. a CRPSS (Continuous Ranked Probability Skill Score) difference for flood forecasting for each gage – evaluates probabilistic streamflow forecast accuracy, b CRPSS performance across lead times, c BSS (Brier Skill Score) difference for bridge warnings using bridge low chord threshold – assesses predictions of threshold exceedance, d BSS performance across lead times using bridge low chord threshold. Only sites with warnings are included in (c) and (d).

Improvements in forecasting skill hold not only in aggregate, but also at individual gage sites. Figure 2a, c shows the relative change in forecasting skill at the maximum 12-h forecasting horizon in terms of the mean ΔCRPSS and ΔBSS, respectively. Here, ΔCRPSS and ΔBSS represent the difference in CRPSS and BSS between our approach and the NWM with nudging. A positive ΔCRPSS or ΔBSS (blue) indicates that our approach outperforms the NWM with nudging, while a negative difference (red) indicates that the NWM with nudging outperforms. As seen in Fig. 2a, c, KF+MMW matches or exceeds the performance of nudging at most locations. Because the large number of ‘true negatives’ reduces the sensitivity of the BSS, Fig. 2c, d include only sites where impacts occur. Nevertheless, KF+MMW still demonstrates better forecasting skill when all sites are included, as shown in Supplementary Fig. S2. These results remain consistent for storm events in 2021 (April 15 to October 31), as shown in Supplementary Fig. S3. While CRPSS and BSS measure the performance of ensemble forecasts, we also evaluate the hydrologic performance of ensemble-mean forecasts across all lead times using the Nash–Sutcliffe efficiency (NSE) and the Kling–Gupta efficiency (KGE). In terms of NSE and KGE, our framework consistently outperforms the NWM for both the 2021 and 2023 evaluation periods (Supplementary Figs. S4 and S5; Supplementary Table S2).
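For reference, the two deterministic metrics can be computed from an ensemble-mean forecast as sketched below. The sketch uses the standard definitions NSE = 1 − Σ(s − o)² / Σ(o − ō)² and KGE = 1 − √((r − 1)² + (α − 1)² + (β − 1)²), where r is the Pearson correlation, α = σ_sim / σ_obs, and β = μ_sim / μ_obs. This is the original 2009 KGE formulation; some studies use a modified variability term, and the study's exact variant is not stated here. This is an illustrative sketch, not the paper's evaluation code.

```python
import math
from typing import Sequence


def nse(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / spread of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def kge(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Kling-Gupta efficiency (2009 form) from correlation, variability ratio, and bias ratio."""
    n = len(obs)
    mu_s, mu_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


obs = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0, 4.0, 6.0, 8.0]   # perfectly correlated but biased and over-dispersed
```

The `doubled` series shows why the two metrics are complementary. Its correlation with the observations is perfect, yet its KGE is 1 − √2 because both α and β equal 2.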

For the BSS calculations in Fig. 2c, d, we use the estimated low chord elevations of bridges to determine the occurrence of bridge impacts. However, because bridge flooding events are relatively rare, the study period contains limited data for assessing bridge impact forecasting skill. For this reason, we conduct an additional sensitivity analysis to confirm that forecasting skill holds under different bridge impact thresholds.

Large storm events show the greatest improvement in forecasting skill

We conduct a sensitivity analysis by varying the thresholds at which bridge impacts are defined to occur and then re-computing the Brier Skill Score for each threshold. For each gauged bridge, impacts are defined to occur when discharges reach the 50th, 75th, 90th, 95th, and 100th percentiles of observed streamflow. By guaranteeing that a sufficient number of detectable events occurs at each site, this assessment ensures reliable quantification of detection accuracy across the study area. Moreover, this sensitivity analysis assesses forecasting performance across a range of flow conditions from moderate to extreme events, ensuring that the proposed method works for storms of different sizes.
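The threshold sweep for a single site can be sketched as below. Each percentile of observed flow defines a binary event series, and the ensemble forecasts are converted to exceedance probabilities at the same threshold. ΔBSS is then computed against a shared open-loop reference. The linear-interpolation percentile, the function names, and the choice to return `None` when the reference Brier score is zero are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    rank = (p / 100.0) * (len(xs) - 1)
    lo = int(rank)                      # floor, since rank is non-negative
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def exceedance_probs(ensemble_by_time: Sequence[Sequence[float]], threshold: float) -> List[float]:
    """Fraction of ensemble members at or above the threshold at each time step."""
    return [sum(q >= threshold for q in members) / len(members) for members in ensemble_by_time]


def delta_bss_by_percentile(obs, ens_new, ens_base, ens_ref,
                            percentiles=(50, 75, 90, 95, 100)) -> Dict[float, Optional[float]]:
    """BSS(new) - BSS(baseline) at one site for each percentile-based impact threshold."""
    result: Dict[float, Optional[float]] = {}
    for p in percentiles:
        threshold = percentile(obs, p)
        events = [1 if q >= threshold else 0 for q in obs]   # impact when flow reaches threshold
        bs_ref = brier_score(exceedance_probs(ens_ref, threshold), events)
        if bs_ref == 0.0:
            result[p] = None  # BSS is undefined when the reference is already perfect
            continue
        bss_new = 1.0 - brier_score(exceedance_probs(ens_new, threshold), events) / bs_ref
        bss_base = 1.0 - brier_score(exceedance_probs(ens_base, threshold), events) / bs_ref
        result[p] = bss_new - bss_base
    return result


# Toy site: a perfect forecast, a flat baseline, and an all-zero reference (hypothetical data).
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sweep = delta_bss_by_percentile(obs, [[q] for q in obs], [[3.0]] * 5, [[0.0]] * 5)
```

Repeating this for each site and lead time and then taking the median of the non-`None` values per cell (for example, with `statistics.median`) yields a threshold-by-lead-time grid of the kind shown in Fig. 3.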

Figure 3 shows the results of the sensitivity assessment, with the bridge impact threshold on the vertical axis, the lead time on the horizontal axis, and the median ΔBSS over all sites indicated by color (blue indicating that KF+MMW performs better, and red indicating that NWM with nudging performs better). The proposed approach consistently outperforms NWM forecasts across all lead times and thresholds, and the advantage is most pronounced at longer lead times and larger streamflow percentiles (bottom-right corner). This result is important because large storm events are the most likely to lead to fatalities, while longer lead times give emergency operators more opportunity to plan effective countermeasures against extreme events. These results remain consistent for storm events from April 15 to October 31, 2021 (Supplementary Fig. S6).

Fig. 3: Median difference in BSS across lead times for various thresholds at gauged locations for the April–July 2023 study period.

Positive ΔBSS indicates that KF+MMW predictions are more skillful than NWM predictions.

Improved discharge predictions reduce false positives and false negatives

A closer examination of the storm period at a single gage (USGS 08175000) reveals that the proposed DA approach mitigates both false positives (i.e., false alarms) and false negatives (i.e., missed flood events) that complicate emergency response. Figure 4 compares observed and forecasted discharges and bridge impacts under both the NWM (a) and our proposed framework (b). Here, the impact threshold is based on the low chord elevation of the bridge. For both (a) and (b), the top panels show discharge while the bottom panels show the predicted probability of bridge impacts. Observed discharge and observed bridge impacts are shown in gray, while progressive forecasts at 1–12 h lead times are shown for both the NWM (red) and the proposed approach (blue), with lighter shades indicating predictions at longer lead times.

Fig. 4: Streamflow and bridge impact forecasting performance comparison during flood events.

Comparison of streamflow predictions and bridge impact warnings under a NWM with Nudging (top) and b KF+MMW (bottom). The upper panels in each section display discharge predictions, while the lower panels represent predicted probabilities of bridge flooding. Shaded regions correspond to observed discharge (top) and overtopping events (bottom), with the dashed line in the discharge plots indicating the threshold for bridge impact, as determined from bridge low chord data. Each forecast is associated with lead times, represented by the color gradients on the right. Detailed views of two flood events are shown in c April 6–9, 2023 and d May 10–20, 2023.

Figure 4 shows that the proposed KF+MMW data assimilation approach captures true bridge impact events better than the NWM with nudging. Notably, the proposed framework captures the flooding event on April 9, which the NWM forecast misses. This result can be partially attributed to the multi-model weighting technique: the evolution of model weights over time (Supplementary Fig. S7) shows that, during this interval, heavier weights are assigned to ensemble members with larger rainfall forecasts, thereby increasing the forecasted runoff volume. The proposed DA approach also better captures the timing of the true flooding event around May 15, whereas the NWM forecast predicts the onset too early and then under-predicts flooding during the tail end of the event.
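The exact weighting rule is not given in this section, so the sketch below assumes one plausible choice. Each runoff scenario (here, hypothetical scale factors applied to a short-range runoff forecast) is routed to the gage. The resulting discharges are scored by their squared error against observations over the preceding horizon, and weights are set proportional to a Gaussian likelihood exp(−SSE / 2σ²). The routing itself is not shown. The error scale `sigma`, the scale factors, and all function names are assumptions, not the authors' formulation.

```python
import math
from typing import List, Sequence


def make_scenarios(runoff_forecast: Sequence[float],
                   scales: Sequence[float] = (0.5, 1.0, 1.5)) -> List[List[float]]:
    """Hypothetical runoff scenarios: the short-range forecast scaled down or up."""
    return [[s * r for r in runoff_forecast] for s in scales]


def mmw_weights(member_discharges: Sequence[Sequence[float]],
                observed: Sequence[float], sigma: float) -> List[float]:
    """Gaussian-likelihood weights from each member's squared error at the gage
    over the preceding horizon (assumed weighting rule)."""
    sse = [sum((f - o) ** 2 for f, o in zip(member, observed)) for member in member_discharges]
    best = min(sse)  # subtracting the minimum keeps exp() from underflowing to all zeros
    raw = [math.exp(-(s - best) / (2.0 * sigma ** 2)) for s in sse]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_forecast(member_forecasts: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Weighted mean of member forecasts at each future time step."""
    return [sum(w * v for w, v in zip(weights, step)) for step in zip(*member_forecasts)]


# Discharges at the gage simulated from three scenarios during the previous cycle,
# compared with observations (illustrative numbers).
past_members = [[8.0, 9.0], [10.0, 12.0], [12.0, 15.0]]
observed = [10.5, 12.5]
weights = mmw_weights(past_members, observed, sigma=1.0)
```

Under this rule, the scenario that best tracked recent observations dominates the next forecast. When recent flows exceed the central scenario, the weight would shift toward the high-runoff scenario, which is consistent with the behaviour described for April 9.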

In addition to capturing true flooding events, the proposed DA approach more successfully rejects false alarms. In general, NWM forecasts with nudging exhibit a wider spread of ensemble discharge estimates (Fig. 4a). This spread indicates greater uncertainty and a tendency to overestimate discharges, which in turn leads to false alarms. Specific instances of overestimation occur on April 22, May 10, and May 14, when the forecasted discharges significantly exceed observed values. By contrast, KF+MMW (Fig. 4b) shows tighter clustering of ensemble trajectories, especially during peak flow events. Under the proposed method, forecasts on May 14 and 15 closely align with the observed streamflow: the false alarm triggered by the NWM on May 14 is rejected, while the subsequent true flood event on May 15 is correctly predicted. Although a false alarm occurs around May 17 for both methods, the magnitude, duration, and probability of overtopping predicted by our approach are smaller than those of the NWM predictions. Supplementary Fig. S8 confirms this pattern quantitatively: across all lead times, KF+MMW achieves higher values of the area under the receiver operating characteristic (ROC) curve (AUC) than the NWM-Nudging baseline, with ROC curves that lie nearer to the ideal top-left corner, underscoring its ability to detect true bridge-impact events while suppressing false alarms.
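The ROC analysis can be sketched as follows. Each probability threshold yields one point on the curve: a hit rate (probability of detection) and a false-alarm rate (probability of false detection). The area under the curve equals the probability that a randomly chosen impact hour receives a higher forecast probability than a randomly chosen non-impact hour, with ties counted as one half. The function names are illustrative. The pairwise AUC computation is quadratic in sample size and is meant only to show the definition.

```python
from typing import Sequence, Tuple


def hit_and_false_alarm_rates(probs: Sequence[float], outcomes: Sequence[int],
                              threshold: float) -> Tuple[float, float]:
    """One ROC point: (probability of detection, probability of false detection)."""
    warn = [p >= threshold for p in probs]
    hits = sum(1 for w, o in zip(warn, outcomes) if w and o == 1)
    misses = sum(1 for w, o in zip(warn, outcomes) if not w and o == 1)
    false_alarms = sum(1 for w, o in zip(warn, outcomes) if w and o == 0)
    correct_negatives = sum(1 for w, o in zip(warn, outcomes) if not w and o == 0)
    pod = hits / (hits + misses) if hits + misses else 0.0
    n_neg = false_alarms + correct_negatives
    pofd = false_alarms / n_neg if n_neg else 0.0
    return pod, pofd


def roc_auc(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC as the probability that an impact outranks a non-impact (ties count 0.5).

    Requires at least one impact and one non-impact in the sample.
    """
    pos = [p for p, o in zip(probs, outcomes) if o == 1]
    neg = [p for p, o in zip(probs, outcomes) if o == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to no discrimination (the diagonal), and an AUC of 1.0 corresponds to a curve passing through the top-left corner.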

Forecasting improvement holds at ungaged bridges as verified by retrospective rainfall forcings

Our evaluation thus far has focused on the gaged sites where discharge and river stage data are available; however, these sites represent only a small fraction of the 2337 bridges within the study area. To validate our bridge flood warning system’s performance across all bridges in the San Antonio and Guadalupe watersheds with a focus on its efficacy during flash flood events, we compare our forecasts against discharge and bridge impact estimates derived from NWM retrospective simulation data.

Rainfall forcing data sources differ significantly between real-time operations and retrospective analysis. In real-time operations, both the NWM and our forecasting system rely on a combination of Multi-Radar Multi-Sensor (MRMS) products—including both gage-adjusted and radar-only precipitation observations47—along with precipitation forecasts from the Rapid Refresh (RAP) and High-Resolution Rapid Refresh (HRRR) models48,49, which inherently contain uncertainties due to their real-time nature and forecasting limitations. By contrast, the retrospective simulation employs the Analysis of Record for Calibration (AORC) rainfall input forcings46, which provide more reliable approximations of actual rainfall conditions through comprehensive quality control and bias correction. For this reason, we treat the discharges produced by AORC forcings as ‘ground truth’ for the purposes of quantifying forecasting skill at ungaged bridges.

Taking the AORC-derived discharges for each bridge as ‘ground truth’, we evaluate the system’s performance by calculating Brier scores for bridge impact forecasts at various lead times and comparing our proposed framework to NWM forecasts with nudging over the retrospective simulation. The verification methodology compares forecast skill using the Brier Skill Score (BSS), where positive values indicate better performance relative to a reference forecast. Here, bridge impact thresholds are based on known low chord elevations at all sites. The simulation period spans April to October 2021, corresponding to the availability of retrospective simulation results. This analysis shows that over 50% of individual bridge locations exhibit improved forecast accuracy compared to the NWM, while the NWM outperforms our framework at fewer than 15% of locations (see Fig. 5b). The remaining 30% of sites show no significant difference in skill, predominantly at sites where no flooding events occur during the study period. Importantly, our framework consistently outperforms the NWM across the entire 12-hour forecasting horizon, demonstrating improved skill even at longer lead times. In particular, we analyze three intervals within the study period during which flash flood warnings were issued (April 29–May 1, July 6–7, and October 14–15, 2021; Fig. 5d–f)50. During these periods, our framework outperforms the existing NWM at nearly twice as many bridge locations, confirming its effectiveness for flash flood scenarios that pose the greatest risk to motorists.
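Because both frameworks are scored against the same reference forecast at each bridge, ranking them by BSS is equivalent to ranking them by Brier score whenever the reference score is positive. This holds because BSS = 1 − BS / BS_ref decreases monotonically in BS. The sketch below uses that comparison to sort bridges into the three categories of Fig. 5b. The tolerance `tol` that defines "no significant difference" is a hypothetical stand-in for whatever criterion the study applied.

```python
from typing import Dict, List, Sequence


def classify_bridges(bs_kf: Sequence[float], bs_nwm: Sequence[float],
                     tol: float = 1e-3) -> List[str]:
    """Label each bridge by which framework has the lower Brier score (hence higher BSS)."""
    labels = []
    for a, b in zip(bs_kf, bs_nwm):
        if abs(a - b) <= tol:
            labels.append("neither")   # typically bridges with no events, where both scores are near zero
        elif a < b:
            labels.append("KF+MMW")
        else:
            labels.append("NWM")
    return labels


def category_percentages(labels: Sequence[str]) -> Dict[str, float]:
    """Percentage of bridges in each category, as summarized in a pie chart."""
    n = len(labels)
    return {k: 100.0 * labels.count(k) / n for k in ("KF+MMW", "NWM", "neither")}


# Brier scores at four hypothetical bridges for one lead time.
labels = classify_bridges([0.01, 0.05, 0.0, 0.02], [0.03, 0.02, 0.0, 0.0205])
shares = category_percentages(labels)
```

Comparing Brier scores directly also sidesteps bridges where the reference score is zero, because the BSS is undefined there.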

Fig. 5: Verification of bridge flood impact forecasts from the proposed framework (KF+MMW) compared to National Water Model (NWM) retrospective simulations over the study period from April 1 to October 31, 2021.

a Spatial map of bridge infrastructure, where dot color indicates the model with higher Brier Skill Score (BSS): blue for KF+MMW, orange for NWM with nudging. b Distribution of comparative forecast skill across all bridges by lead time, showing percentage where KF+MMW (blue), NWM (orange), or neither model (gray) achieved higher BSS. c Streamflow measurements from USGS gages during the study period. Spatial distribution of relative forecast skill at 2177 bridge locations during three flash flood warning periods: d April 29–May 1, 2021; e July 6–7, 2021; f October 14–15, 2021. Color coding follows panel (a). Pie charts quantify the percentage of bridges where each model performed better or conditions were comparable.

Discussion

This study introduces a new framework for operational flood forecasting that addresses the critical gap between hydrologic predictions and real-world impacts on transportation infrastructure. The proposed data assimilation scheme significantly improves predictions of bridge flooding at longer lead times than the current state of the art. Moreover, by evaluating flood predictions against real bridge infrastructure data, this study highlights both the real-world potential for transportation management during floods and the challenges that remain in implementation. Taken together, the proposed framework not only enhances flood forecasting accuracy but also paves the way for real-time, impact-based flood warnings that directly address public safety needs.

Our study demonstrates that the proposed KF+MMW data assimilation approach outperforms nudging in both discharge forecasts and bridge impact predictions. This improvement is especially pronounced at longer lead times, underscoring its implications for early warning systems and emergency response planning. Emergency operators require adequate lead times to dispatch maintenance crews, close affected roadways, and reroute traffic while simultaneously avoiding widespread traffic impacts. The proposed system’s ability to provide accurate forecasts at longer lead times will provide emergency managers and the public more time to prepare and respond, reducing economic impacts and saving lives.

The sensitivity analysis reveals that the proposed approach maintains its advantage across various streamflow thresholds corresponding to storms of different magnitudes and return intervals. This robustness is crucial for an operational flood forecasting system, as it must accurately predict both frequent, low-impact events and rare, high-impact floods—all while minimizing false positives and false negatives that undermine confidence in early warning systems. The consistent advantage in forecasting skill attributable to KF+MMW, especially at higher thresholds and longer lead times, suggests that it is suitable for predicting severe flooding events when early warnings are most critical.

This study moves beyond traditional discharge-focused flooding evaluations and specifically assesses the forecasting skill of roadway flood impact predictions by directly incorporating bridge infrastructure data. While some operational systems, as discussed in the Introduction (e.g., Iowa’s BridgeWatch, NC FIMAN-T), provide valuable impact-based alerts for specific infrastructure, often at gauged locations, a distinct contribution of our work is the rigorous, probabilistic skill evaluation of infrastructure-specific impact forecasts across a large river network, including numerous ungauged sites. This evaluation quantifies the benefits of our proposed data assimilation (KF+MMW) scheme not just for streamflow, but directly for the prediction of bridge overtopping events using metrics like the Brier Skill Score. Detailed skill assessment for infrastructure impacts is critical because, while accurate discharge predictions are generally correlated with accurate predictions of bridge impacts, the two quantities are not interchangeable. Indeed, optimizing model calibration and data assimilation solely for discharge prediction may potentially harm prediction of bridge impacts by emphasizing model accuracy during periods that are less relevant to bridge flooding, such as falling hydrograph limbs or baseflow conditions. The integration of bridge datasets into our model for estimating bridge impact probabilities thus represents an advance towards impact-based flood forecasting. By providing site-specific warnings for individual bridges, the proposed system offers actionable information for emergency responders and the public. Ongoing work will further evaluate the utility of the proposed forecasting system to emergency operators through planned real-world exercises.

Two major factors account for the difference in performance between the proposed DA approach and nudging. First, while nudging applies a discharge correction only at the gage location itself, the Kalman Filter imposes a distributed runoff correction over the gage’s upstream contributing area (see Fig. 6). Thus, nudging is only capable of improving model accuracy downstream of gages, while Kalman Filtering improves model accuracy both upstream and downstream of gages. This capability is crucial for accurate estimation of flood impacts at the watershed scale, given that the large majority of reaches are upstream of gage sites and are therefore unaffected by nudging.
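To make the first factor concrete, the sketch below contrasts a single Kalman update with a nudging-style replacement on a hypothetical three-reach chain (reach 0 flows into gaged reach 1, which flows into reach 2). The covariance values, the observation, and the helper names `kalman_update` and `nudge` are illustrative assumptions, and the sketch is a simplified state-correction analogue rather than the runoff-correction formulation described in the Methods. It shows only the mechanism: correlated prior errors in `P` let one gage innovation correct the upstream reach, whereas the nudge changes only the gaged reach at analysis time.

```python
import numpy as np

# Hypothetical chain: reach 0 -> reach 1 (gaged) -> reach 2.
# Assumed prior error covariance; positive off-diagonals encode correlated errors.
P = np.array([[4.0, 3.0, 2.0],
              [3.0, 5.0, 3.5],
              [2.0, 3.5, 6.0]])
H = np.array([[0.0, 1.0, 0.0]])         # only reach 1 is observed
R = np.array([[1.0]])                   # measurement noise variance (assumed)
x_prior = np.array([10.0, 20.0, 25.0])  # prior discharges [m^3/s]
y = np.array([26.0])                    # observed discharge at the gage


def kalman_update(x, P, y, H, R):
    """Standard Kalman update with gain K = P H^T (H P H^T + R)^-1."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (y - H @ x), K


def nudge(x, y, gage_idx, coeff=1.0):
    """Nudging-style correction applied to the gaged reach only."""
    x_new = x.copy()
    x_new[gage_idx] += coeff * (y[0] - x[gage_idx])
    return x_new


x_kf, K = kalman_update(x_prior, P, y, H, R)
x_nudged = nudge(x_prior, y, gage_idx=1)
print("Kalman gain:  ", K.ravel())
print("KF posterior: ", x_kf)      # upstream reach 0 is also corrected
print("Nudged state: ", x_nudged)  # only reach 1 changes at analysis time
```

With these assumed numbers the update raises all three reaches, including the headwater reach that nudging never touches; in the full framework the shape of the covariance, and hence the spatial spread of the correction, is governed by the process noise covariance.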

Fig. 6: Spatial distribution of streamflow corrections from data assimilation methods.

Normalized difference in streamflow induced by (a) nudging in the NWM and (b) the Kalman Filter, relative to the open-loop simulation, over the study period from April to July 2023.

In addition to its effect on the spatial distribution of estimated discharge, the correction applied by the Kalman Filter also has a significant effect on the subsequent forecast. Because the correction induced by nudging is applied only at the gage itself, its effect diminishes rapidly as the correction is routed downstream. On the other hand, the correction applied by the Kalman Filter is distributed over the gage’s entire upstream area. This correction functions as an extra runoff input that is routed through the drainage network during the forecasting step. Thus, a positive correction applied by the Kalman Filter will increase the subsequent discharge forecast over the entire forecast window while a negative correction will decrease the discharge forecast. The exact distribution of this runoff correction is ultimately controlled by the Kalman Filter’s process noise covariance matrix, which reflects the spatial covariance in the uncertainty of the forcing input. If additional data are available on the spatial correlation in runoff inputs (e.g. from radar rainfall measurements), then this information may be used to further improve both interpolation and forecasting of discharges within the Kalman Filtering framework.

The second factor responsible for the improvement in forecasting skill is the proposed multi-model ensemble-weighting technique. Given an ensemble of future runoff forecasts, this approach dynamically assigns weights to ensemble members based on their historical performance against USGS observations. Thus, ensemble members that have performed well in the recent past will be given larger weights in the subsequent 12-h forecast. This technique is particularly effective given that ensembles are often dominated by one or two outliers that predict large spurious rainfall events. In Fig. 4, these spurious rainfall forecasts can be seen on May 10 and May 17, where they negatively affect the quality of the NWM forecast. The proposed model-weighting technique suppresses these outliers, resulting in fewer and less persistent false alarms.

While this study represents an important step towards real-time transportation management during flood events, several opportunities for improvement should be addressed in future work, including better representation of bridge elevation and inundation model uncertainties, enhancements to the data assimilation method, and extensions to different geographic regions and modeling environments.

While the integration of bridge datasets represents a significant advance towards operational impact-based flood forecasting, it is important to acknowledge the uncertainties in infrastructure data quality and inundation mapping techniques that affect bridge impact estimates. Bridge elevation data and local channel geometry may contain significant uncertainties that propagate to impact predictions. These uncertainties underscore the importance of maintaining up-to-date, quality-controlled infrastructure databases for reliable flood impact assessments. The method used to translate modeled discharge into stage is also subject to uncertainty. Given that measured rating curves are not available for the large majority of reaches, we use synthetic rating curves derived from HAND (Height Above Nearest Drainage) hydraulic geometries to convert discharge to flood depth. Future research should investigate how uncertainty in both bridge elevations and rating curves may be quantified and incorporated into our probabilistic flood forecasting framework. Similarly, confidence in impact predictions may be improved by accounting for localized hydraulic phenomena such as bridge-induced backwater effects. The current Muskingum-based approach, while computationally efficient for large-scale applications, does not capture these localized hydraulic interactions, which can significantly alter water stage near bridge structures. Future efforts should focus on integrating detailed hydraulic models or data-driven corrections at critical locations to better represent bridge-flow interactions and their influence on upstream water levels. Quantifying and communicating these uncertainties will help ensure that emergency operators understand the overall level of confidence in reported roadway flooding predictions.

Regarding data assimilation, future research should explore refinements that further improve forecasting skill. Rainfall forecasts currently constitute the largest source of uncertainty in the forecasting model. Thus, improving rainfall and runoff forecasts before the streamflow routing step will likely have the largest impact on forecasting accuracy. Towards this goal, additional data sources, such as radar rainfall estimates, rain gage measurements, or soil moisture observations, should be integrated into the proposed data assimilation scheme. The operational reliability of the scheme also depends on stream gage input, and gage failures can be particularly critical during major flood events. Future work should thus explore strategies that keep forecasts robust when gages fail, including scenarios with missing data or abnormal values. These strategies may involve adapting the Kalman Filter’s assimilation process, for instance, by skipping correction steps when data are missing or by integrating statistical tests to identify and mitigate the impact of erroneous observations51. Given that the spatial domain of the Kalman Filter has a significant effect on both the computational speed and the spatial distribution of the runoff correction, alternative localization or inflation techniques should also be explored. Moreover, alternative ensemble forecasting methods and refinements to the current time-lagged approach should be investigated, particularly to enhance responsiveness during rapidly evolving conditions and to mitigate detection delays for critical events (e.g., as observed in specific instances in Fig. 4b). For instance, assigning greater weight to more recent forecasts within the time-lagged ensemble may improve responsiveness. Further exploration should also focus on generating multiple distinct ensemble members and propagating them directly through the forecasting chain in each forecast cycle, to better characterize the variability in rainfall forecasts.
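As one possible starting point for the gage-failure strategies mentioned above, the sketch below drops gages that report missing values (NaN), rejects observations whose normalized innovation is implausibly large, and skips the correction step entirely when no usable data remain. The gating rule, its threshold (about 6.63, the 99% quantile of a chi-square distribution with one degree of freedom), the per-gage test on the diagonal of the innovation covariance, and the function name `robust_update` are all assumptions for illustration, not part of the framework evaluated in this study.

```python
import numpy as np


def robust_update(x_prior, P_prior, y, H, V, gate=6.63):
    """Kalman update that tolerates missing or grossly erroneous gage readings.

    y may contain NaN for gages that reported no data. A gage is rejected when its
    squared innovation divided by its innovation variance exceeds `gate` (assumed rule).
    """
    ok = ~np.isnan(y)
    if ok.any():
        S = H[ok] @ P_prior @ H[ok].T + V[np.ix_(ok, ok)]
        nu = y[ok] - H[ok] @ x_prior
        idx = np.flatnonzero(ok)
        ok[idx[nu ** 2 / np.diag(S) > gate]] = False  # innovation gating
    if not ok.any():
        return x_prior, P_prior  # no usable observations: skip the correction step
    Hk, Vk = H[ok], V[np.ix_(ok, ok)]
    S = Hk @ P_prior @ Hk.T + Vk
    K = P_prior @ Hk.T @ np.linalg.inv(S)
    x_post = x_prior + K @ (y[ok] - Hk @ x_prior)
    P_post = (np.eye(len(x_prior)) - K @ Hk) @ P_prior
    return x_post, P_post


# Hypothetical two-gage example.
x_prior = np.array([10.0, 20.0])
P_prior = 4.0 * np.eye(2)
H, V = np.eye(2), np.eye(2)
print(robust_update(x_prior, P_prior, np.array([np.nan, 22.0]), H, V)[0])    # gage 0 missing
print(robust_update(x_prior, P_prior, np.array([10.5, 200.0]), H, V)[0])     # gage 1 rejected
print(robust_update(x_prior, P_prior, np.array([np.nan, np.nan]), H, V)[0])  # no update
```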

Finally, future research should extend the proposed data assimilation framework to new regions served by the NWM and other hydro-meteorological forecasting systems. Our study demonstrates significant improvements in flood prediction across two major Texas watersheds, particularly within Flash Flood Alley, one of the most flood-prone regions in the United States. However, the benefits of our approach can be extended far beyond Texas. Given that our framework builds on top of the model structure of the NWM, it can be readily implemented across the entire United States where NWM forecasts are already available. Our scalability analysis also shows near-linear computational complexity, demonstrating the framework’s efficiency for large-scale deployment (Supplementary Fig. S9). Moreover, our methodology is compatible with existing continental and global flood prediction systems, such as the European Flood Awareness System (EFAS) and the Global Flood Awareness System (GloFAS)5,9, as these systems typically employ distributed hydrologic models with both rainfall-runoff and routing modules—similar to the NWM structure utilized in this study. Because our framework improves flood predictions by incorporating real-time gage data into the routing module through data assimilation, our approach can be readily applied to flood forecasting systems in other regions where gage data are available.

Methods

Forecasting framework

The proposed flood forecasting framework consists of four basic elements: (i) an online routing model based on the Muskingum method, (ii) a data assimilation scheme based on Kalman Filtering, (iii) a dynamic model-weighting scheme for ensemble forecasts, and (iv) a post-processing step that converts discharge to stage and determines the probability of bridge impacts. The architecture of this system is depicted in Fig. 7.

Fig. 7: Bridge flood early warning system architecture and workflow.

Bridge flood early warning framework showing: (1) watershed partitioning, (2) Kalman filtering with gage data assimilation, (3) weighted ensemble scenarios, (4) streamflow forecasting, (5) time-lagged ensembles, (6) bridge threshold from DEM and bridge data, and (7) probabilistic bridge warnings.

The basic forecasting cycle proceeds as follows. First, discharge data from field-deployed gages are assimilated into the Muskingum routing model to correct initial estimates of streamflow at the start of each forecast horizon. Next, the Muskingum model is simulated using an ensemble of 18-h lateral runoff and groundwater recharge forecasts from the National Water Model. A novel model-weighting scheme dynamically adjusts the influence of each lateral inflow scenario based on its performance over the previous forecasting horizon. The weighted forecasts are summed to create a single estimate of forecasted discharge. To generate probabilistic forecasts, we use time-lagged ensembles over a 12-h shared window. This approach mitigates ‘flip-flopping’ forecasts that pose difficulties for decision makers. Finally, site-specific rating curves are used to translate discharge forecasts into stage forecasts, and the probability of bridge impacts is computed using bridge envelope data tied to each modeled reach.

Muskingum routing and its state space representation

Our routing model is adapted from the routing model used by the NWM, which employs the Muskingum-Cunge method for channel routing. The Muskingum-Cunge method in the NWM is implemented as follows:

$${{\bf{Q}}}_{t}^{j+1}={\alpha }_{j}{{\bf{Q}}}_{t}^{j}+{\beta }_{j}{{\bf{Q}}}_{t-1}^{j}+{\chi }_{j}{{\bf{Q}}}_{t-1}^{j+1}+{\gamma }_{j}{{\bf{q}}}_{t}^{j}$$
(1)

where Q represents the modeled discharge for a given stream reach [m³/s] and q represents the exogenous lateral inflow to the reach from both runoff and subsurface recharge [m³/s]. For each stream reach, the discharge is modeled at both the upstream end (indexed by j) and the downstream end (indexed by j + 1). In the recurrence relation given by Eq. (1), the downstream discharge at the current time step (t) is calculated from the upstream discharge at the current time step, the upstream and downstream discharges at the previous time step (t − 1), and the lateral inflow. The parameters α, β, χ, and γ are given by:

$${\alpha }_{j}=\frac{\Delta t-2{K}_{j}{X}_{j}}{2{K}_{j}(1-{X}_{j})+\Delta t}\qquad {\beta }_{j}=\frac{\Delta t+2{K}_{j}{X}_{j}}{2{K}_{j}(1-{X}_{j})+\Delta t}$$
(2)
$${\chi }_{j}=\frac{2{K}_{j}(1-{X}_{j})-\Delta t}{2{K}_{j}(1-{X}_{j})+\Delta t}\qquad {\gamma }_{j}=\frac{\Delta t}{{K}_{j}(1-{X}_{j})+\Delta t/2}$$
(3)

where K represents the travel time through the reach [s], X is a dimensionless coefficient that ranges from 0 to 0.5 and represents the degree of flood wave attenuation through the reach, and Δt represents the model time step [s]. The K and X parameters depend on channel geometry and roughness as well as the wave celerity, which itself depends on the stage and discharge52.
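Because Eqs. (1) to (3) fully define a single routing step once K, X, and Δt are fixed, a short sketch can make the recurrence concrete. The function names and the example values of K, X, Δt, and the hydrograph are hypothetical; the code simply transcribes the equations as written above for one reach with constant parameters.

```python
def muskingum_coefficients(K, X, dt):
    """Routing coefficients of Eqs. (2) and (3). K and dt in seconds, 0 <= X <= 0.5."""
    denom = 2.0 * K * (1.0 - X) + dt
    alpha = (dt - 2.0 * K * X) / denom          # weight on current upstream inflow
    beta = (dt + 2.0 * K * X) / denom           # weight on previous upstream inflow
    chi = (2.0 * K * (1.0 - X) - dt) / denom    # weight on previous outflow
    gamma = dt / (K * (1.0 - X) + dt / 2.0)     # weight on lateral inflow
    return alpha, beta, chi, gamma


def route_reach(inflow, lateral, K, X, dt, q0):
    """Apply Eq. (1) to one reach with constant K and X.

    inflow[t]  : discharge at the upstream end (j) [m^3/s]
    lateral[t] : lateral inflow q_t [m^3/s]
    q0         : initial discharge at the downstream end (j + 1) [m^3/s]
    """
    alpha, beta, chi, gamma = muskingum_coefficients(K, X, dt)
    outflow = [q0]
    for t in range(1, len(inflow)):
        outflow.append(alpha * inflow[t] + beta * inflow[t - 1]
                       + chi * outflow[-1] + gamma * lateral[t])
    return outflow


# Hypothetical reach: 1-h travel time, X = 0.2, hourly time step.
a, b, c, g = muskingum_coefficients(K=3600.0, X=0.2, dt=3600.0)
print(a + b + c)  # equals 1 up to rounding, so a steady inflow passes through unchanged
hydrograph = [10, 10, 40, 80, 60, 35, 20, 12, 10, 10]
print(route_reach(hydrograph, [0.0] * len(hydrograph), 3600.0, 0.2, 3600.0, q0=10.0))
```

Two properties follow directly from Eqs. (2) and (3) and are easy to check numerically: α + β + χ = 1, so a steady upstream inflow is passed through unchanged, and χ + γ = 1, so a steady lateral inflow q eventually produces a steady outflow of q. Choosing 2KX ≤ Δt ≤ 2K(1 − X) keeps α and χ non-negative.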

The Muskingum-Cunge method employed by the NWM iteratively recomputes the K and X routing parameters at each time step based on the modeled discharges in each reach. This step imposes an additional computational burden and may also introduce model instability. Our flood forecasting framework therefore modifies the NWM approach by using a Muskingum routing scheme in which the K and X parameters are assumed to remain constant in time. For real-time flood forecasting, this assumption is justified by the fact that uncertainty in the routing parameters is small compared to the uncertainty in the magnitude and timing of rainfall forecasts. Additionally, to ensure a fair comparison, we further calibrate the model to align with the NWM outputs without nudging (‘Standard Open Loop Analysis and Assimilation’). Details of the calibration procedure, which uses joint state-parameter estimation with expectation maximization, and its results are provided in the Supplementary Information (Supplementary Figs. S10 and S11).

When applied to each reach in a stream network, the Muskingum routing Eq. (1) leads to a system of discrete-time difference equations. This system of equations may be expressed in state space form (Eqs. (4) and (5)), wherein the discharges at time (t + 1) are represented as a linear function of the streamflows (Q) and the lateral inflow (q) at time (t) for each reach. The matrices A and B are determined by the routing coefficients, taking into account the stream connectivity:

$${{\bf{x}}}_{t+1}={\bf{A}}{{\bf{x}}}_{t}+{\bf{B}}{{\bf{u}}}_{t}+{{\bf{w}}}_{t}$$
(4)
$${{\bf{y}}}_{t+1}={\bf{H}}{{\bf{x}}}_{t+1}+{{\bf{v}}}_{t+1}$$
(5)

where \({{\bf{x}}}_{t}\) is the state vector of discharges in each reach at time t; \({{\bf{u}}}_{t}\) is the exogenous input vector of lateral inflows for each reach at time t; \({{\bf{y}}}_{t}\) is the vector of observed discharges (USGS gages); \({{\bf{w}}}_{t}\) is the process noise; \({{\bf{v}}}_{t}\) is the measurement noise; A is the state transition matrix; B is the input matrix; and H is the observation matrix.
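Eq. (1) contains the current upstream discharge on its right-hand side, so assembling A and B for a network requires solving a triangular system rather than copying coefficients into a matrix. A minimal sketch follows, under stated assumptions: the state holds the downstream-end discharge of each reach, the upstream-end discharge of a reach is the sum of its tributaries' outflows, and the lateral inflow in Eq. (4) is the value applied over the step from t to t + 1. The function name `build_state_space` and the coefficient values (which correspond to K = Δt = 3600 s and X = 0.2) are hypothetical.

```python
import numpy as np


def build_state_space(downstream, alpha, beta, chi, gamma):
    """Assemble A and B of Eq. (4) from per-reach Muskingum coefficients.

    downstream[r] is the index of the reach that reach r drains into (-1 at the outlet).
    Assumption: the upstream-end discharge of a reach is the sum of its tributaries' outflows.
    """
    n = len(downstream)
    C = np.zeros((n, n))  # connectivity: C[d, r] = 1 if reach r drains into reach d
    for r, d in enumerate(downstream):
        if d >= 0:
            C[d, r] = 1.0
    # Eq. (1) for all reaches: x_{t+1} = a*(C x_{t+1}) + b*(C x_t) + c*x_t + g*u_t
    # => (I - diag(a) C) x_{t+1} = (diag(b) C + diag(c)) x_t + diag(g) u_t
    N = np.eye(n) - np.diag(alpha) @ C  # invertible for a tree (acyclic) network
    A = np.linalg.solve(N, np.diag(beta) @ C + np.diag(chi))
    B = np.linalg.solve(N, np.diag(gamma))
    return A, B


# Hypothetical network: reaches 0 and 1 join into reach 2 (the outlet).
alpha, beta, chi, gamma = (np.full(3, v) for v in
                           (2160 / 9360, 5040 / 9360, 2160 / 9360, 3600 / 4680))
A, B = build_state_space([2, 2, -1], alpha, beta, chi, gamma)
u = np.array([1.0, 2.0, 3.0])  # constant lateral inflows [m^3/s]
x_steady = np.linalg.solve(np.eye(3) - A, B @ u)
print(x_steady)  # each reach carries its tributaries' flow plus its own lateral inflow
```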

Kalman Filter with multiple lateral input scenarios

Our data assimilation approach is based on the well-known Kalman Filter, which combines a dynamical model of a physical system with sensor measurements to generate improved estimates of the system’s internal states. For linear systems with Gaussian noise, in which the dynamical model is correctly specified and the covariances of the process and measurement noise are known precisely, the Kalman Filter is the optimal (minimum mean squared error) estimator of system states. Crucially, the Kalman Filter explicitly accounts for uncertainty in both the model and sensor observations and generates probabilistic estimates of the system’s internal states, meaning that the variances of the predicted streamflows are represented explicitly. Variants of the Kalman Filter have been widely used in numerical weather prediction and hydrologic modeling over the past decades53,54.

In traditional Kalman Filter approaches, while the states (i.e., streamflows) may be corrected through data assimilation, the exogenous inputs (i.e., lateral inflows) are assumed to be known up to some zero-mean Gaussian error. In reality, however, lateral inflows are subject to significant uncertainties due to meteorological and hydrologic model limitations. These uncertainties are further exacerbated by the use of operational real-time rainfall data in our study, which are likely to be less accurate than calibrated or reanalyzed data. To address these problems, we introduce a Kalman Filter approach that considers an ensemble of multiple future lateral input scenarios. First, the prediction step projects the state vector forward in time using the state-space model for each lateral input scenario i:

$${\hat{{\bf{x}}}}_{t+1| t}^{i}={\bf{A}}{\hat{{\bf{x}}}}_{t| t}^{i}+{\bf{B}}{{\bf{u}}}_{t}^{i}$$
(6)
$${{\bf{P}}}_{t+1| t}^{i}={\bf{A}}{{\bf{P}}}_{t| t}^{i}{{\bf{A}}}^{T}+{{\bf{Q}}}_{t}^{i}$$
(7)

where \({\hat{{\bf{x}}}}_{t| t}^{i}\) represents the posterior estimates of streamflows for each reach for ensemble member i at time t; \({{\bf{u}}}_{t}^{i}\) represents the lateral inflows of ensemble member i (i.e., lateral input scenario i); \({\hat{{\bf{x}}}}_{t+1| t}^{i}\) represents the prior estimates of streamflows at time t + 1 predicted at time t; \({{\bf{P}}}_{t+1| t}^{i}\) is the prior error covariance matrix; and \({{\bf{Q}}}_{t}^{i}\) is the process noise covariance matrix. Next, the update step corrects the estimated state by combining the current prediction with measurements from gages:

$${{\bf{K}}}_{t+1}^{i}={{\bf{P}}}_{t+1| t}^{i}{{\bf{H}}}^{T}{\left({\bf{H}}{{\bf{P}}}_{t+1| t}^{i}{{\bf{H}}}^{T}+{{\bf{V}}}_{t+1}^{i}\right)}^{-1}$$
(8)
$${\hat{{\bf{x}}}}_{t+1| t+1}^{i}={\hat{{\bf{x}}}}_{t+1| t}^{i}+{{\bf{K}}}_{t+1}^{i}\left[{{\bf{y}}}_{t+1}-{\bf{H}}{\hat{{\bf{x}}}}_{t+1| t}^{i}\right]$$
(9)
$${{\bf{P}}}_{t+1| t+1}^{i}=\left({\bf{I}}-{{\bf{K}}}_{t+1}^{i}{\bf{H}}\right){{\bf{P}}}_{t+1| t}^{i}$$
(10)

where \({{\bf{K}}}_{t+1}^{i}\) is the Kalman gain at time t + 1; \({{\bf{P}}}_{t+1| t+1}^{i}\) is the posterior error covariance at time t + 1; and \({{\bf{V}}}_{t+1}^{i}\) is the measurement noise covariance matrix. This update step is executed only at the initial time step of each forecast cycle, when current gage data are available. For all subsequent steps, predictions are propagated using the system dynamics in Eq. (6).
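The prediction and update steps above can be sketched for a set of lateral input scenarios as follows. This is a minimal dense-matrix illustration under assumed inputs (the two-reach matrices, noise covariances, and scenario values are hypothetical), not the authors' implementation; a network with many reaches would need sparse or localized covariance handling. Each scenario keeps its own state and covariance, the gage observation is assimilated only at the first step of the cycle, and the remaining steps follow Eq. (6).

```python
import numpy as np


def kf_predict(x, P, A, B, u, Qn):
    """Prediction step, Eqs. (6) and (7)."""
    return A @ x + B @ u, A @ P @ A.T + Qn


def kf_update(x_prior, P_prior, y, H, V):
    """Update step, Eqs. (8) to (10)."""
    S = H @ P_prior @ H.T + V
    K = np.linalg.solve(S, H @ P_prior).T  # K = P H^T S^-1 (P and S are symmetric)
    x_post = x_prior + K @ (y - H @ x_prior)
    P_post = (np.eye(len(x_prior)) - K @ H) @ P_prior
    return x_post, P_post


def forecast_cycle(xs, Ps, A, B, scenarios, y, H, Qn, V):
    """Run one forecast cycle for every lateral input scenario i.

    scenarios[i] is an (n_steps, n_reaches) array of lateral inflows for member i.
    Gage data y are assimilated only at the first step; later steps use Eq. (6) alone.
    """
    forecasts, posteriors = [], []
    for x, P, U in zip(xs, Ps, scenarios):
        x, P = kf_update(*kf_predict(x, P, A, B, U[0], Qn), y, H, V)
        posteriors.append((x, P))
        traj = [x]
        for u in U[1:]:
            x, P = kf_predict(x, P, A, B, u, Qn)
            traj.append(x)
        forecasts.append(np.array(traj))
    return forecasts, posteriors


# Hypothetical two-reach chain (reach 0 drains into gaged reach 1) and two scenarios.
A = np.array([[0.5, 0.0], [0.5, 0.5]])
B = 0.5 * np.eye(2)
H = np.array([[0.0, 1.0]])
Qn, V = 0.1 * np.eye(2), np.array([[0.05]])
scenarios = [np.full((6, 2), 2.0), np.full((6, 2), 8.0)]  # low- and high-runoff members
xs, Ps = [np.zeros(2)] * 2, [np.eye(2)] * 2
forecasts, _ = forecast_cycle(xs, Ps, A, B, scenarios, np.array([3.0]), H, Qn, V)
print([f[-1] for f in forecasts])
```

The per-scenario trajectories produced this way are what the model weights of Eq. (14) would later combine.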

Model weighting

At each time step, the Mean Squared Error (MSE) is calculated for each lateral input scenario as the average of the squared differences between the predicted streamflows (\(\hat{{\bf{x}}}\)) and the observed streamflow data (y):

$${\,\text{MSE}\,}_{t}^{i}=\frac{1}{T}\mathop{\sum }\limits_{k=1}^{T}{\left({{\bf{y}}}_{t}-{\hat{{\bf{x}}}}_{t| t-k}^{i}\right)}^{2}$$
(11)

where \({{\bf{y}}}_{t}\) is the observed streamflow at time t, \({\hat{{\bf{x}}}}_{t| t-k}^{i}\) is the streamflow at time t predicted by lateral input scenario i at time t − k (k = 1, …, T), and T is the number of previous time steps considered. In our study, T spans the previous 18 h. The weights for each scenario are updated based on their MSEs. Specifically, each weight is set to be inversely proportional to the MSE, ensuring that scenarios with lower errors receive larger weights:

$${\tilde{w}}_{t}^{i}={\left({\text{MSE}}_{t}^{i}+\epsilon \right)}^{-1}$$
(12)

Here, ϵ is a small value (e.g., \(10^{-6}\)) that prevents division by zero. After updating, the weights are normalized over the N lateral input scenarios so that their sum equals one:

$${w}_{t}^{i}=\frac{{\tilde{w}}_{t}^{i}}{\mathop{\sum }\nolimits_{j = 1}^{N}{\tilde{w}}_{t}^{j}}$$
(13)

We obtain the forecasted streamflow for future time steps using the weights \({w}_{t}^{i}\). Specifically, the estimate of the streamflow at a future time step t + Δt, given the current time t, is computed as a weighted sum of the streamflows predicted by each of the N lateral input scenarios:

$${\hat{x}}_{t+\Delta t| t}=\mathop{\sum }\limits_{j=1}^{N}{w}_{t}^{j}{\hat{x}}_{t+\Delta t| t}^{j}$$
(14)
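Eqs. (11) to (14) can be sketched compactly with array operations. The array layout, the function names, and the choice to average the squared errors over gages as well as over the T lags are assumptions for illustration; the example numbers are hypothetical and chosen to show how an outlier scenario with spurious rainfall is down-weighted.

```python
import numpy as np


def scenario_weights(y_obs, past_preds, eps=1e-6):
    """Inverse-MSE weights, Eqs. (11) to (13).

    y_obs      : observed discharge at the gages at time t, shape (n_gages,)
    past_preds : shape (N, T, n_gages); past_preds[i, k - 1] is scenario i's
                 forecast for time t issued at time t - k
    """
    err = past_preds - y_obs[None, None, :]
    mse = np.mean(err ** 2, axis=(1, 2))  # Eq. (11); also averaged over gages (assumption)
    w_tilde = 1.0 / (mse + eps)           # Eq. (12)
    return w_tilde / w_tilde.sum()        # Eq. (13)


def weighted_forecast(weights, scenario_forecasts):
    """Eq. (14): scenario_forecasts has shape (N, n_steps, n_reaches)."""
    return np.tensordot(weights, scenario_forecasts, axes=1)


# Hypothetical example: two reasonable scenarios and one outlier with spurious rainfall.
y_obs = np.array([100.0])
past = np.stack([np.full((18, 1), 102.0),   # MSE = 4
                 np.full((18, 1), 98.0),    # MSE = 4
                 np.full((18, 1), 150.0)])  # MSE = 2500
w = scenario_weights(y_obs, past)
future = np.stack([np.full((12, 1), 100.0),
                   np.full((12, 1), 100.0),
                   np.full((12, 1), 300.0)])
print(w)
print(weighted_forecast(w, future)[0])
```

With these assumed numbers the outlier receives a weight below 0.001, so its 300 m³/s forecast barely shifts the combined estimate, which is the suppression behaviour described in the Discussion.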

Bridge data extraction

The bridge warning system is built upon a comprehensive geospatial dataset derived from LIDAR point clouds that characterize bridge infrastructure55. This dataset, developed for the Texas Department of Transportation (TxDOT), contains detailed information about bridge geometry, elevation profiles, and associated stream characteristics. The system exclusively monitors bridges that cross streams, filtering out non-stream crossings. Water surface elevations are computed using the Height Above Nearest Drainage (HAND) methodology56, which utilizes a 3-m digital elevation model (DEM)57. The system translates streamflow forecasts into stage heights using synthetic rating curves derived from HAND calculations. For each bridge, a warning threshold is established based on the clearance between the water surface and the bridge’s low chord elevation, as this quantity represents the critical point where water may impact the structure58. The bridges are classified into two categories based on their structural characteristics: (1) low-water crossings and small bridges, and (2) large-span bridges. The classification is determined by calculating the ratio of the distance between the low chord and stream bed elevations to the bridge thickness. Bridges with a ratio greater than 2.5 are classified as large-span bridges, while those below this threshold are categorized as low-water crossings and small bridges.
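The translation from the bridge dataset to a discharge threshold and a bridge class can be sketched as below. The input names, the linear interpolation of the synthetic rating curve, and the treatment of a low chord lying above the tabulated curve are assumptions for illustration; only the low-chord criterion and the 2.5 ratio cutoff come from the text above.

```python
import numpy as np


def discharge_threshold(low_chord_hand, rating_stage, rating_discharge):
    """Discharge z at which the HAND-based stage reaches the bridge low chord.

    low_chord_hand : low-chord height above the nearest drainage [m] (assumed input)
    rating_stage, rating_discharge : synthetic rating curve, stage increasing [m, m^3/s]
    """
    if low_chord_hand > rating_stage[-1]:
        return np.inf  # assumption: low chord above the tabulated curve is never reached
    return float(np.interp(low_chord_hand, rating_stage, rating_discharge))


def classify_bridge(low_chord_elev, streambed_elev, thickness, cutoff=2.5):
    """Ratio of (low chord - stream bed) to bridge thickness; above cutoff means large-span."""
    ratio = (low_chord_elev - streambed_elev) / thickness
    return "large-span bridge" if ratio > cutoff else "low-water crossing / small bridge"


# Hypothetical rating curve and bridge geometry.
stage = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
discharge = np.array([0.0, 20.0, 60.0, 120.0, 200.0])
print(discharge_threshold(2.5, stage, discharge))  # linear interpolation between 60 and 120
print(classify_bridge(105.0, 100.0, 1.5), "|", classify_bridge(102.0, 100.0, 1.0))
```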

Time-lagged ensembles and bridge impact probability

A time-lagged ensemble (TLE) is a method for creating probabilistic forecasts by reusing past model runs. By gathering output from previous model simulations initialized at different times, a collection of forecasts for a specific future time is formed. This ensemble of forecasts provides a range of possible outcomes, allowing for the calculation of probabilities. In the case of the NWM, TLEs are used to estimate the probability of high water occurring within the next 12 h using 7 time-lagged ensemble members.

For a given bridge, the discharge at which water starts impacting the low chord of the bridge is denoted as z. The forecast discharge from the i-th member of the time-lagged ensemble is denoted \({\hat{x}}^{i}\). The probability that the discharge at the bridge will exceed the warning threshold z can be expressed as:

$$P(\hat{x} > z)=\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}H({\hat{x}}^{i}-z)$$
(15)

where \(P(\hat{x} > z)\) is the probability that the discharge will exceed the threshold z; \({\hat{x}}^{i}\) represents the forecast discharge from the i-th ensemble member; z is the threshold discharge value where the water starts impacting the low chord of the bridge; n is the total number of ensemble members in the time-lagged ensemble; \(H({\hat{x}}^{i}-z)\) is the Heaviside step function, which returns 1 if \({\hat{x}}^{i} > z\) (indicating that the discharge exceeds the threshold) and 0 otherwise.
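Eq. (15) reduces to counting members above the threshold. The sketch below applies it to a hypothetical 7-member time-lagged ensemble over a 12-step window; the rule that a member counts as an exceedance when its peak within the shared window exceeds z is an assumption for illustration, as are the function names and forecast values.

```python
import numpy as np


def exceedance_probability(member_values, z):
    """Eq. (15): fraction of ensemble members whose forecast exceeds z."""
    x = np.asarray(member_values, dtype=float)
    return float(np.mean(x > z))  # (x > z) plays the role of the Heaviside step H(x - z)


def tle_bridge_probability(lagged_runs, z):
    """lagged_runs: (n_members, n_steps) forecasts covering the shared window.

    Assumption: a member counts as an exceedance if its peak in the window exceeds z.
    """
    peaks = np.max(np.asarray(lagged_runs, dtype=float), axis=1)
    return exceedance_probability(peaks, z)


# Hypothetical 7-member time-lagged ensemble over a 12-step window.
peaks = [80.0, 95.0, 100.0, 101.0, 120.0, 60.0, 130.0]
runs = np.array([np.linspace(40.0, p, 12) for p in peaks])
print(tle_bridge_probability(runs, z=100.0))  # 3 of 7 members exceed z
```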

Score and skill assessment

To assess forecast performance, we employ two primary scoring metrics: the Continuous Ranked Probability Score (CRPS) and the Brier Score (BS). The CRPS evaluates probabilistic forecasts for continuous variables and is defined as:

$${\rm{CRPS}}\,=\mathop{\int}\nolimits_{-\infty }^{\infty }{[F(x)-H(x-{x}_{o})]}^{2}dx$$
(16)

where F(x) is the cumulative distribution function of the forecast, \(H(x-{x}_{o})\) is the Heaviside step function of the observation \({x}_{o}\), and x is the forecasted variable. Lower CRPS values indicate better forecast performance. We apply the CRPS to evaluate continuous ensemble forecasts in our hydrograph predictions. This score measures the integrated squared difference between the cumulative distribution function of the forecast and the observation, assessing both accuracy and precision59. For our forecasts, we calculate the CRPS for each time step, evaluating the entire probability distribution drawn from time-lagged ensembles.
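For an ensemble forecast, F(x) in Eq. (16) is a step function, and the integral has a well-known closed form that avoids numerical quadrature. The sketch below assumes F is the empirical CDF of the time-lagged members; the function name is hypothetical, and variants of this estimator (such as the "fair" CRPS) normalize the second term differently.

```python
import numpy as np


def crps_ensemble(members, obs):
    """CRPS of Eq. (16) with F taken as the empirical CDF of the ensemble members.

    Uses the identity CRPS = E|X - x_o| - 0.5 E|X - X'|, where X and X' are drawn
    independently from the ensemble (all member pairs, including identical ones).
    """
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(term1 - term2)


print(crps_ensemble([0.0, 1.0], obs=0.0))  # 0.25, matching direct integration of Eq. (16)
print(crps_ensemble([5.0], obs=2.0))       # one member: reduces to the absolute error, 3.0
```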

For bridge warning events, we use the Brier Score, which measures the accuracy of probabilistic forecasts of a binary event. The Brier Score is calculated as:

$${{BS}}\,=\frac{1}{N}\mathop{\sum }\limits_{t=1}^{N}{({f}_{t}-{o}_{t})}^{2}$$
(17)

where N is the number of forecasts, \({f}_{t}\) is the forecast probability of the event occurring, and \({o}_{t}\) is the actual outcome (1 if the event occurred, 0 if it did not)60. In our case, \({o}_{t}\) equals 1 when the measured discharge exceeds the threshold discharge z at which water reaches the bridge low chord, and 0 otherwise. Thus, for perfectly accurate forecasts of bridge warnings, the Brier Score evaluates to zero.
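Eq. (17), together with the outcome definition above, can be sketched as follows. The function name and the forecast probabilities and observed discharges are hypothetical values chosen for illustration.

```python
import numpy as np


def brier_score(forecast_probs, observed_discharge, z):
    """Eq. (17): o_t = 1 when the observed discharge exceeds the bridge threshold z."""
    f = np.asarray(forecast_probs, dtype=float)
    o = (np.asarray(observed_discharge, dtype=float) > z).astype(float)
    return float(np.mean((f - o) ** 2))


probs = [0.0, 3 / 7, 1.0, 0.2]         # hypothetical forecast probabilities
observed = [50.0, 110.0, 130.0, 90.0]  # hypothetical observed discharges [m^3/s]
print(brier_score(probs, observed, z=100.0))
```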

To compare the relative improvement of different forecasting methods, we calculate skill scores. For the CRPS, we compute the Continuous Ranked Probability Skill Score (CRPSS) for both the nudging and Kalman Filter (KF) methods, using the open-loop forecast as a reference:

$${\text{CRPSS}}_{{\rm{Nudging}}}=1-\frac{{\text{CRPS}}_{{\rm{Nudging}}}}{{\text{CRPS}}_{{\rm{open}}\text{-}{\rm{loop}}}}$$
(18)
$${\text{CRPSS}}_{{\rm{KF}}}=1-\frac{{\text{CRPS}}_{{\rm{KF}}}}{{\text{CRPS}}_{{\rm{open}}\text{-}{\rm{loop}}}}$$
(19)

Similarly, for bridge warning events, we calculate the Brier Skill Score (BSS) for both nudging and KF methods:

$${\text{BSS}}_{{\rm{Nudging}}}=1-\frac{{\text{BS}}_{{\rm{Nudging}}}}{{\text{BS}}_{{\rm{open}}\text{-}{\rm{loop}}}}$$
(20)
$${\text{BSS}}_{{\rm{KF}}}=1-\frac{{\text{BS}}_{{\rm{KF}}}}{{\text{BS}}_{{\rm{open}}\text{-}{\rm{loop}}}}$$
(21)

These skill scores range from negative infinity to 1, with positive values indicating improvement over the open-loop forecast, negative values indicating degradation, and a score of 1 representing a perfect forecast.

To quantify the relative improvement of the KF method over nudging, we calculate the difference between these skill scores:

$$\Delta \,\text{CRPSS}={\text{CRPSS}}_{{\rm{KF}}}-{\text{CRPSS}}_{{\rm{Nudging}}}$$
(22)
$$\Delta \,\text{BSS}={\text{BSS}}_{{\rm{KF}}}-{\text{BSS}}_{{\rm{Nudging}}}$$
(23)
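Eqs. (18) to (23) share one pattern, sketched below with hypothetical function names. The Brier Scores in the example are illustrative numbers only, not results from this study.

```python
def skill_score(score, reference_score):
    """Generic skill score, Eqs. (18) to (21): 1 - score / reference (reference must be > 0)."""
    if reference_score <= 0:
        raise ValueError("reference score must be positive")
    return 1.0 - score / reference_score


def delta_skill(score_kf, score_nudging, score_open_loop):
    """Eqs. (22) and (23): skill of KF minus skill of nudging, same open-loop reference."""
    return skill_score(score_kf, score_open_loop) - skill_score(score_nudging, score_open_loop)


# Illustrative Brier Scores only (not results from this study).
bs_open_loop, bs_nudging, bs_kf = 0.10, 0.08, 0.05
print(skill_score(bs_nudging, bs_open_loop), skill_score(bs_kf, bs_open_loop))
print(delta_skill(bs_kf, bs_nudging, bs_open_loop))
```

Because both skill scores share the open-loop reference, ΔBSS reduces algebraically to (BS_Nudging − BS_KF)/BS_open-loop, so a positive value means the KF forecast has the lower (better) score; the same holds for ΔCRPSS.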