Introduction

Riverine floods are one of the most devastating natural disasters in the world1,2. While risk management has effectively led to decreased flood vulnerability globally, the impacts of unprecedented floods continue to affect both the societal and environmental sectors3. A fundamental question that engineering hydrology pursues is how these flood extremes behave under present and future climates4. This aligns with one of the grand challenges in climate science: predicting weather and climate extremes under a changing climate (including flood-producing storms)5,6. However, flood projection remains a challenging task, since it requires characterizing changes in the magnitudes of rainfall extremes as well as their frequencies of occurrence, both of which modify the likelihood of unprecedented flood risk.

Conventional approaches to flood projection involve integrating bias-corrected, downscaled global climate model output with either process-based hydrological models or extreme value theory-based statistical models7,8. The validity of such model chains is frequently challenged by the limited reliability of global climate models in representing the influences of atmospheric dynamics (i.e., the movement and circulation of the atmosphere) and its interplay with atmospheric thermodynamics (i.e., exchange of heat and water within the atmosphere) on extreme rainfall9. The coarse resolution of these models leads to inadequate representation of the space-time structures of extreme rainfall that generate pluvial floods, especially over complex terrains characterized by intricate rainfall variabilities10,11.

A preferred approach in the climate impact research community involves using high-resolution regional climate models, for example in pseudo-global warming experiments (i.e., assuming fixed synoptic circulation patterns)12. While this approach provides fine-scale rainfall structure and quantifies the influence of atmospheric thermodynamics, the role of atmospheric dynamics in dictating the rainfall process is not adequately addressed. This is less desirable since rainfall changes (in terms of both frequencies and magnitudes) are more closely tied to atmospheric dynamics than thermodynamics, particularly in mid-latitude regions13,14,15 and for the most severe storms16. The changing frequency of rainfall extremes driven by atmospheric circulation leads to changing flood magnitudes with varying recurrence intervals.

Similarly, the flow analog approach, that is, selecting “analog” circulation patterns to quantify climate change impacts on rainfall extremes, demonstrates advantages in unraveling the interplay between atmospheric thermodynamics and dynamics17. Its limitation, however, emerges in dealing with the roles of mixed flood-generating processes in flood peak distributions. As demonstrated by worldwide evidence, extreme floods over a specific region are frequently associated with a collection of synoptic drivers18. For instance, the devastating floods in the eastern United States result from both landfalling tropical cyclones and extratropical systems19; floods in eastern Asian countries are associated with flood-producing storms from landfalling tropical cyclones, extratropical systems (e.g., southwest vortexes), and large-scale synoptic forcings (e.g., monsoon system)20.

Here we propose a framework of flood projection that explicitly addresses changes in rainfall extremes in terms of both magnitudes and frequency. We consider the interplay of atmospheric dynamics and thermodynamics in flood-producing storms of different magnitudes. This is mainly accomplished by synoptic clustering and a series of factual and counterfactual experiments based on large ensemble climate model outputs. We identify the relative contribution of changes in atmospheric dynamics and thermodynamics to changing rainfall extremes of varying recurrence intervals. The contribution is then used to refine convective-scale rainfall projections from thermodynamics-perturbed experiments, enabling high-resolution (i.e., 100 m and hourly) flood inundation projection through hydrodynamic simulations (see Supplementary Fig. 1 for the flowchart of the framework).

We choose the North China Plain (NCP) as the testbed for the proposed framework. The NCP region experienced catastrophic rainfall and flooding during the period 29 July–1 August 2023, with 62 fatalities and 1.8 million individuals evacuated, making it one of the deadliest floods in Chinese history. The four-day deluge produced the heaviest rainfall accumulation over Beijing in the past 140 years, and multiple gauges recorded two-day rainfall exceeding twice the local annual average21. The July 2023 flood is one of four record-breaking floods that struck NCP during 2012–2023. Consecutive flood episodes, contrary to the normal conditions of water scarcity over NCP, pose threats to the existing flood adaptation and mitigation strategies and place a quarter of the Chinese population (i.e., 400 million) at considerable risk. The dilemma in NCP makes it a “poster child” of regions globally facing unprecedented flood risks.

Our results indicate that an unprecedented flood-producing storm under present-day conditions (with a comparable recurrence interval to the July 2023 flood) will be 3.1 times wetter in terms of accumulated rainfall by the century’s end. This corresponds to a 32% increase per 1 K temperature rise. About 20% of this substantial increase is contributed by more frequent atmospheric circulations favorable for extreme rainfall. The projected flood leads to a ~4.4 times increase in inundation depths, mostly over low-lying, densely populated regions. The population exposed to moderate and high depths (i.e., within 0.5–2.0 m) shows a nearly fivefold increase under current protection levels, rising from 6.25 to 30.21 million. These results highlight the urgency of improved hazard mitigation and climate adaptation strategies in one of the world’s most densely populated regions. Our framework provides a promising approach for managing unprecedented flood risks under a changing climate.

Results

Synoptic patterns for historical flood-producing storms

There are three main synoptic patterns for the flood-producing storms over NCP (see the “Methods” section for details of synoptic clustering, Supplementary Fig. 2). The primary distinctions among the three patterns are determined by the positioning of the western Pacific subtropical high, anomalies in the mid-latitude westerly, and the pathways of low-level moisture transport (including direction and intensity, Supplementary Fig. 2). The key features of each synoptic pattern are highlighted by a representative flood-producing storm (Fig. 1). The July 2016 flood (i.e., four-day storm total rainfall of 162 mm) is characterized by a southern displacement of the western Pacific subtropical high. This facilitates moisture transport along its western fringe from the Bay of Bengal towards NCP. The “cut-off” low-pressure system to the north of NCP enhances the ascending motion of warm humid air mass and atmospheric instability (Pattern 2, Fig. 1b). The synoptic pattern of the July 2016 flood, with the largest frequency of occurrences (N = 27) during the period 1961–2023, resembles that of the climatological mean rainfall in July and August over NCP22.

Fig. 1: Synoptic conditions of three historical flood-producing storms over NCP.

a The August 1963 flood, b the July 2016 flood, and c the July 2023 flood. Red contours show 500 hPa geopotential height (with an interval of 40 gpm). Shade and vector show the magnitude (in kg/m²/s) and direction of 850 hPa water vapor flux. NCP is outlined by the black box. The annual maximum 4-day rainfall total (AMP4day) of each flood event and its historical rank during the period 1961–2023 are shown in the subtitles.

For the July 2023 flood (i.e., four-day storm total rainfall of 227 mm), the subtropical high is displaced northwards compared to the July 2016 flood. The contour of 5760 gpm exhibits a notable westward extension over the continent. Tropical cyclones developed along the southern flank of the subtropical high, leading to sharp pressure gradients and intense moisture transport (i.e., the outer tropical cyclone rain bands) from the western Pacific towards NCP (Pattern 3, N = 16, Fig. 1c). The “blocking” high to the north of NCP led to stagnant storm motion and thus persistent rainfall. The August 1963 flood (i.e., four-day storm total rainfall of 264 mm) produced the largest storm total during the period 1961–2023 (Pattern 1, N = 20, Fig. 1a). Its synoptic pattern combines the favorable ingredients of the July 2016 flood (i.e., pressure anomalies in the mid-latitudes) and the July 2023 flood (i.e., low-level moisture transport from the western Pacific). Contrasting synoptic patterns lead to diverse spatial rainfall patterns (Supplementary Fig. 3). The contrasting rainfall patterns are determined by how low-level moisture fluxes are controlled by regional topography and/or upper-level pressure systems.

We calculated the annual maximum 4-day rainfall accumulation (abbreviated as AMP4day, same below) over NCP. This is because disastrous floods over NCP, especially those that produce widespread inundation, mostly result from excessive rainfall over multiple-day periods. The time series of AMP4day is well correlated with domain-average low-level convergence (r = 0.73, P < 0.01, Supplementary Fig. 4). This highlights the importance of large-scale forcing in dictating extreme rainfall and floods over NCP. The development of mesoscale convective systems and their interplay with topography are further responsible for contrasting rainfall patterns between different synoptic groups18,21,23.

Changing synoptic patterns and flood-producing storms

The present-day decade-long CESM-LENS simulations, i.e., 1995–2004 (400 model years in total, see the “Methods” section), capture distinct features of the three synoptic patterns (Supplementary Fig. 5) as well as the corresponding patterns of AMP4day under present-day conditions (Supplementary Figs. 6 and 3). The spatial rainfall pattern within each synoptic group is well maintained under future scenarios, as revealed by the decade-long CESM-LENS simulations, i.e., 2071–2080 (400 model years in total). The mean magnitudes of AMP4day, however, show pronounced increases, especially over mountainous regions (Supplementary Fig. 6). The enhanced rainfall can be predominantly ascribed to the elevated atmospheric water-holding capacity associated with rising temperatures (i.e., thermodynamics), given the similar large-scale forcings within each synoptic group as identified by the synoptic clustering algorithm (see the “Methods” section for details).

The frequencies of the three synoptic patterns responsible for AMP4day show notable changes under future climate conditions over NCP (Supplementary Fig. 7). Specifically, the frequency of the August 1963 flood-like pattern (Pattern 1) remains constant (i.e., at 14%) in the future climate. The July 2023 flood-like pattern (Pattern 3), however, is approximately three times as frequent in the future climate under the RCP 8.5 scenario (i.e., 59%) as under present-day conditions (i.e., 20%). This is contrasted by a sharp reduction in the occurrences of the July 2016 flood-like pattern (Pattern 2), i.e., from 66% under present-day conditions to 27% in the future climate. Divergent changes in the frequencies of synoptic patterns underscore the complex nature of atmospheric dynamics and its significance for changes in extreme rainfall.

To further highlight the individual impacts of atmospheric thermodynamics and dynamics on future rainfall changes, we examine the statistical properties of AMP4day in different synoptic groups based on generalized extreme value (GEV) distributions (see the “Methods” section). A notable finding is that the shape parameter for the Pattern 3 group is below zero. A negative shape parameter points to a heavy, unbounded upper tail of the extreme rainfall distribution under this type of synoptic environment. This is contrasted by the distribution of AMP4day in the other two synoptic groups with positive shape parameters (Supplementary Fig. 8). Since the July 2023 flood-like synoptic pattern is becoming more frequent, the shape parameter of the GEV distribution for the entire storm catalog (that is, encompassing all three synoptic groups) transitions from being generally positive under present-day conditions to being generally negative in the future climate. This indicates a strong tendency towards an increased likelihood of the most extreme rainfall.
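The link between the sign of the shape parameter and the upper tail can be illustrated with a minimal sketch. It assumes the sign convention implied by the text (negative shape gives a heavy, unbounded upper tail; positive shape gives a bounded one). The location, scale, and shape values below are hypothetical and are not fitted to any data in this study.

```python
import math

def gev_return_level(T, loc, scale, shape):
    """Return level for a return interval of T years under a GEV distribution.

    Assumed sign convention: shape < 0 means a heavy, unbounded upper tail;
    shape > 0 means an upper tail bounded at loc + scale / shape.
    """
    p = 1.0 - 1.0 / T          # annual non-exceedance probability
    y = -math.log(p)
    if abs(shape) < 1e-12:     # Gumbel limit as shape approaches zero
        return loc - scale * math.log(y)
    return loc + scale / shape * (1.0 - y ** shape)

# Hypothetical parameters (mm), chosen only for illustration.
loc, scale = 70.0, 25.0
return_intervals = (10, 100, 186)
heavy = [gev_return_level(T, loc, scale, -0.2) for T in return_intervals]
bounded = [gev_return_level(T, loc, scale, 0.2) for T in return_intervals]
upper_bound = loc + scale / 0.2  # finite limit of the bounded case

for T, h, b in zip(return_intervals, heavy, bounded):
    print(f"T = {T:>3} yr: shape<0 -> {h:6.1f} mm, shape>0 -> {b:6.1f} mm")
```

With identical location and scale, the negative-shape distribution yields larger return levels at every return interval, and the gap widens as the return interval grows, while the positive-shape levels stay below a finite bound.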

Thermodynamic and dynamic contributions to rainfall extremes

Contrasting responses of the AMP4day distributions to future climate among three synoptic groups highlight varying contributions of atmospheric thermodynamics and dynamics to rainfall changes. We assess their contributions through a set of factual and counterfactual experiments. These experiments involve modifying the frequencies of different synoptic patterns and their corresponding AMP4day rainfall under both present-day and future climate conditions (see the “Methods” section).

Figure 2a shows the GEV distributions of the four experiments (see Supplementary Table 1 for details). The two factual experiments show the statistical distribution of AMP4day under present-day (i.e., HIST) and future climate conditions (i.e., FUTU). If the storm catalog in the future climate is resampled based on the frequency of the three synoptic patterns under present-day conditions (i.e., THERM), that is, assuming minimal changes in the frequency of synoptic patterns for rainfall extremes, the location of the probability distribution of AMP4day is comparable to that of FUTU, except that the upper tail is lighter. By contrast, if the storm catalog under present-day conditions is resampled based on the frequency of the three synoptic patterns in the future climate (i.e., DYNM), that is, assuming minimal changes in atmospheric thermodynamics, the location of the probability distribution is comparable to that of HIST. Additionally, DYNM shows a heavier upper tail than that of THERM, highlighting the role of atmospheric dynamics in dictating the probability of extreme rainfall, especially with larger return intervals (i.e., unprecedented storms).
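The two counterfactual experiments can be sketched as a frequency-controlled resampling of a storm catalog. This is one plausible reading of the description above, not the authors' implementation. The function name `resample_catalog`, the toy catalogs, and the arbitrary 1.5 rainfall multiplier are hypothetical; only the pattern frequencies (14/66/20% and 14/27/59%) are taken from the text.

```python
import random
from collections import defaultdict

def resample_catalog(rainfall_catalog, target_freq, n, seed=0):
    """Draw n storms: the synoptic pattern follows target_freq, and the
    rainfall is drawn from storms of that pattern in rainfall_catalog."""
    rng = random.Random(seed)
    by_pattern = defaultdict(list)
    for pattern, amp in rainfall_catalog:
        by_pattern[pattern].append(amp)
    patterns = sorted(target_freq)
    weights = [target_freq[p] for p in patterns]
    drawn = rng.choices(patterns, weights=weights, k=n)
    return [(p, rng.choice(by_pattern[p])) for p in drawn]

# Pattern frequencies quoted in the text (Patterns 1, 2, 3).
present_freq = {1: 0.14, 2: 0.66, 3: 0.20}
future_freq = {1: 0.14, 2: 0.27, 3: 0.59}

# Toy catalogs of (pattern, AMP4day in mm); all values are hypothetical.
present_catalog = [(1, 90.0), (1, 120.0), (2, 60.0), (2, 75.0),
                   (2, 80.0), (3, 100.0), (3, 150.0)]
future_catalog = [(p, amp * 1.5) for p, amp in present_catalog]

# THERM: future rainfall with present-day pattern frequencies.
therm = resample_catalog(future_catalog, present_freq, n=400)
# DYNM: present-day rainfall with future pattern frequencies.
dynm = resample_catalog(present_catalog, future_freq, n=400)
```

Each resampled catalog could then be fitted with a GEV distribution in the same way as HIST and FUTU, so that differences among the four fits isolate the frequency change from the rainfall change.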

Fig. 2: Probability distributions of AMP4day over NCP and relative contributions of different components to its future change.

a Probability distribution of the normalized AMP4day (i.e., minus the mean and divided by the standard deviation) from the CESM-LENS output under different factual and counterfactual experiments. b Relative contribution of thermodynamics, dynamics, and their interaction to future AMP4day changes. The inset plot in a is a zoomed view of the section outlined by the dashed black box. The shade represents the uncertainty band (i.e., 25th–75th percentile) attributed to GEV fitting and sampling uncertainty (see the “Methods” section).

We quantify the relative contributions of atmospheric thermodynamics and dynamics by comparing the non-exceedance probability of AMP4day derived from different experiments based on factor separation analysis (see the “Methods” section). There is a strong dependence of the contributions of atmospheric thermodynamics and dynamics to rainfall changes on return intervals (Fig. 2b). The thermodynamic impact dominates changes in rainfall with small return intervals, but the dominance quickly diminishes with increasing return intervals. Changes in moderate rainfall (with return intervals <5 years) are almost entirely attributable to the thermodynamic impact. The contribution of atmospheric dynamics and its interactions with the thermodynamic component plays a more important role in extreme rainfall with return intervals exceeding 100 years.

The return interval of the July 2023 flood is ~186 years under present-day conditions (see the “Methods” section for details). For a 186-year storm in the future climate, changes in the storm total rainfall (i.e., 4-day accumulation) can be attributed as follows: 20% (the 25th–75th percentile range is 7–59%, same below) to changes in atmospheric thermodynamics, 20% (4–55%) to changes in atmospheric dynamics, and 60% (0–77%) to their interactions (Fig. 2b). This means that by neglecting changes in atmospheric dynamics, future changes in the storm magnitude with a comparable return interval will be notably underestimated.
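A two-factor separation of this kind can be sketched as follows, assuming the standard formulation in which HIST is the baseline, THERM and DYNM each switch on one factor, and FUTU switches on both. Applying it to rainfall magnitudes at a fixed return interval is an assumption here, and the four input values are hypothetical numbers chosen only so that the output reproduces the 20/20/60% split quoted above.

```python
def factor_separation(hist, therm, dynm, futu):
    """Split the total change (futu - hist) into thermodynamic, dynamic,
    and interaction shares using two-factor separation."""
    total = futu - hist
    if total == 0:
        raise ValueError("no total change to attribute")
    thermodynamic = therm - hist                # thermodynamics alone
    dynamic = dynm - hist                       # dynamics alone
    interaction = futu - therm - dynm + hist    # residual synergy term
    return {
        "thermodynamic": thermodynamic / total,
        "dynamic": dynamic / total,
        "interaction": interaction / total,
    }

# Hypothetical 186-year rainfall magnitudes (mm) for the four experiments.
shares = factor_separation(hist=100.0, therm=120.0, dynm=120.0, futu=200.0)
print(shares)
```

By construction the three shares sum to one, so the interaction term absorbs whatever part of the total change is not explained by either factor acting alone.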

Flood projection with dynamics-constrained rainfall forcing

We project the magnitude of the July 2023 storm in the future climate (i.e., 2071–2080) using pseudo-global warming (PGW) experiments and a scaling factor approach (see the “Methods” section). By comparing simulated rainfall between PGW experiments and present-day simulations (CTRL), we assess the influence of atmospheric thermodynamics and its interaction with atmospheric dynamics on rainfall changes. This impact is then adjusted by a multiplicative factor, which is the inverse of the sum of the contributions (including thermodynamics and its interaction with dynamics) to the simulated storm's total rainfall. For the July 2023 flood, the factor is calculated as 1.25 (the 25th–75th percentile range is 1.04–2.22, see the “Methods” section).
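As a worked check of the multiplicative factor, the sketch below takes the inverse of the summed thermodynamic and interaction shares. The shares (20% and 60%) and the dynamic-share range (4–55%) are those quoted in the text; the function name is hypothetical, and treating the three shares as summing to one is an assumption.

```python
def dynamics_scaling_factor(thermodynamic_share, interaction_share):
    """Inverse of the fraction of the rainfall change that the PGW
    experiments are assumed to capture (thermodynamics plus interaction)."""
    captured = thermodynamic_share + interaction_share
    if captured <= 0:
        raise ValueError("captured share must be positive")
    return 1.0 / captured

# Central estimate for the July 2023 flood: 1 / (0.20 + 0.60)
central = dynamics_scaling_factor(0.20, 0.60)

# If the shares sum to one, the same factor equals 1 / (1 - dynamic share).
lower = 1.0 / (1.0 - 0.04)   # dynamic share at its 25th percentile
upper = 1.0 / (1.0 - 0.55)   # dynamic share at its 75th percentile

print(round(central, 2), round(lower, 2), round(upper, 2))
```

The larger the dynamic share, the more the thermodynamics-only PGW signal has to be inflated, which is why the factor depends on the recurrence interval.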

The factor varies for different magnitudes of storms due to the dependence of relative contributions on the recurrence intervals. The use of a convection-permitting model in PGW and CTRL simulations enables fine-scale characterization of extreme rainfall response to increased temperature. The high-resolution rainfall data from the convection-permitting model is then utilized to drive a land surface model (i.e., Noah LSM) and a flood hydrodynamic model (i.e., CaMa-Flood) to evaluate flood inundation associated with the projected rainfall (see the “Methods” section).

The CTRL simulations capture the spatial and temporal variability of the July 2023 storm quite well (see the “Methods” section). Under the RCP 8.5 scenario, the mean temperature is projected to increase by 4 K during the period 2071–2080 compared to 1995–2004. The increased temperature results in strong rainfall enhancement over NCP in the PGW simulations. The domain-averaged storm total is 455.7 ± 16.5 mm, around 2.5 times that in the CTRL simulations (i.e., 178.9 ± 5.8 mm). The temporal variability of projected rainfall in the PGW simulations remains almost consistent with the CTRL simulations (Fig. 3). The ratio of change is approximately equivalent to a 25% rainfall increase per 1 K temperature rise. After multiplying by the scaling factor of 1.25, the projected storm total rainfall is 3.1 times that under present-day conditions, i.e., a 32% increase per 1 K temperature rise.

Fig. 3: Spatial and temporal rainfall patterns in the CTRL and PGW simulations.

Spatial patterns of storm total rainfall (shade, in mm) in a CTRL and b PGW simulations. c Time series of domain-average hourly rain rate (in mm/h) derived from simulations and rain gauge observations. These results are the ensemble mean of all six members. The dashed black lines represent the 200 and 400 m elevation contours over NCP. The thin gray lines represent provincial boundaries. The locations of four megacities (i.e., BJ—Beijing, BD—Baoding, SJ—Shijiazhuang, TJ—Tianjin) are shown in black circles. The gray dots highlight grids with statistically insignificant rainfall differences between the CTRL and PGW simulations (based on Student’s t-test, P > 0.01). The inset plots show averaged rainfall accumulation along different bands of elevation. The uncertainty bands in c highlight the full range of all six ensemble members.

The spatial distributions of storm total rainfall exhibit an overall high consistency between the two sets of simulations (Fig. 3). The storm total rainfall over low-lying regions on the windward side (with elevations <200 m) increases fourfold in the PGW simulations. The relative changes in rainfall magnitudes are more pronounced in low-lying areas compared to the mountains and their lee sides. This is partially attributed to the spatial distribution of temperature and atmospheric water vapor fields. A spatially uniform increment of temperature, on average, results in larger absolute rises in atmospheric water vapor and thus rainfall over low-lying regions relative to the mountains. The associated feedback of thermodynamic changes to low-level moisture convergence (i.e., atmospheric dynamic components), e.g., through increased latent heating24, might play a role in producing a relatively larger rainfall increase in low-lying regions. The positive feedbacks within the wetter storm between convective activities and latent heat release thus contribute to substantial rainfall increases25. The notable rainfall increases over the entire domain could also be associated with other factors, e.g., the capabilities of high-resolution simulations (1 km and hourly) in accurately resolving convective activities, changes in the spatial organization of rainfall cells26, and statistical shifts in rainfall type (i.e., stratiform and convective) within mesoscale convective systems27.

The inundation of the July 2023 flood is mainly situated at the foothills of the mountains, primarily affecting the outskirts of Beijing and Baoding, with populations exceeding 10 million each (Fig. 4a). The CaMa-Flood model simulation agrees well with in-situ observations and satellite-retrieved flood inundation (Supplementary Figs. 9 and 10). The total area of flood inundation (with flood depth > 0.1 m) amounts to ~1.84 × 10⁴ km². This contrasts with the projected inundated area of 7.72 × 10⁴ km² (7.18 × 10⁴–9.33 × 10⁴ km², i.e., the 25th–75th percentile range, same below) by the 2070s. The inundation extends from the foothills of the mountains to the low-lying region, including Tianjin and Shijiazhuang, two more cities with populations exceeding 10 million (Fig. 4b). The spatial pattern of elevated inundation is associated with the spatial concentration of extreme rainfall under Pattern 3 and the increasing frequency under the future scenario.

Fig. 4: Simulated maximum flood depth for the July 2023 flood under present-day and future climate conditions and flood exposure changes.

a Present-day flood depth, b future-climate flood depth, c changes in flood exposure (represented by the number of people exposed to different flood depths). The results are the ensemble means of all six members. Shade represents the simulated maximum flood depth during historical and projected flood processes. The black dashed lines represent the 200 and 400 m elevation contours over NCP. The black solid lines represent the boundary of the upper Hai River basin. The thin gray lines represent provincial boundaries. The locations of four megacities (i.e., BJ: Beijing, BD: Baoding, SJ: Shijiazhuang, TJ: Tianjin) are shown in red circles. Two hydrometric stations, i.e., Zhangfang (labeled as A) and Manshuihe (labeled as B), are shown in red stars (see Supplementary Figs. 9 and 10 for observed and simulated hydrographs and flood inundation). The uncertainty bar in c represents the standard deviation of six ensemble members.

The population exposed to low flood depths (i.e., 0.1–0.5 m) increases by about 38%, rising from 15.35 to 21.13 million (13.54–22.67 million), whereas the number of people exposed to moderate and high flood depths (i.e., 0.5–2 m) shows a nearly fivefold increase, rising from 6.25 to 30.21 million (22.92–39.37 million). This indicates a structural shift in flood exposure over NCP in the future climate. While new flood-control measures might play an important role in addressing the projected expansion of inundation, substantial enhancement to the existing flood-control facilities is needed to mitigate the impacts of elevated flood depths.
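Exposure counts of this kind can be obtained by overlaying a maximum flood depth grid on a co-registered population grid. The sketch below assumes flattened grids of equal length and half-open depth bins; the bin edges follow the depth classes named in the text, while the function name, the bin-edge convention, and the toy cell values are hypothetical.

```python
def exposure_by_depth(depths_m, population, bins):
    """Sum the population of grid cells whose maximum flood depth falls in
    each bin. Bins are half-open intervals [low, high) in meters."""
    if len(depths_m) != len(population):
        raise ValueError("depth and population grids must have equal size")
    totals = {name: 0.0 for name in bins}
    for depth, people in zip(depths_m, population):
        for name, (low, high) in bins.items():
            if low <= depth < high:
                totals[name] += people
                break
    return totals

# Depth classes from the text; cells below 0.1 m count as not inundated.
bins = {"low": (0.1, 0.5), "moderate_high": (0.5, 2.0)}

# Toy flattened grids (hypothetical values).
depths_m = [0.0, 0.2, 0.5, 1.9, 2.5]
population = [10.0, 20.0, 30.0, 40.0, 50.0]

exposure = exposure_by_depth(depths_m, population, bins)
print(exposure)
```

Running the same overlay on the present-day and future-climate depth grids, with the population grid held fixed, isolates the effect of the hazard change on exposure.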

Discussion

We provide a framework for high-resolution flood projection that considers changes in both the magnitude and frequency of rainfall extremes. This aligns with recent initiatives of estimating extreme floods with extremely low probabilities (e.g., probable maximum floods) through physics-based model simulations28. While existing endeavors from recent model intercomparison projects, such as CMIP6 HighResMIP29 and CORDEX-CMIP6 RCMs30, offer climate projections with state-of-the-art modeling techniques, these model outputs still feature relatively coarse spatial resolutions, with most coarser than 0.25° and finer products covering only specific regions, such as Europe at 0.125°. Physical processes (e.g., convection, orographic lifting), which drive extreme rainfall variabilities and are subsequently responsible for unprecedented flood risks, cannot be resolved by these models10,31. Convection-permitting model simulations could address this limitation, but they are computationally demanding. This makes our framework, which derives a scaling relation by integrating large ensemble climate model simulations with convection-permitting simulations, promising in bridging the gap.

The substantial increase in population exposure is partially due to the high-end scenario (RCP 8.5) adopted in the analysis. Although the scenario represents the most intense emission pathway and might be less probable across the world32, the projected temperature increase, i.e., around 4 K, is broadly consistent with the most likely range of temperature increase (i.e., 1.6–4.1 K) across China (https://climateknowledgeportal.worldbank.org/country/china/climate-data-projections). We additionally examine changes in rainfall and population exposure by assuming a 2 K temperature increase, representing a mid-range or lower-end scenario (see the “Methods” section). Our results show that the projected rainfall accumulation is 302.4 ± 8.7 mm. The corresponding population exposed to 0.5–2 m inundation depths is around 17.34 million (13.34–25.24 million, Supplementary Table 3), approximately a threefold increase from the current climate. Our expanded temperature range analysis is valuable for assessing low-probability, high-impact flood events that are of critical concern for policy making33.

Our analysis highlights a spatial offset between the area of heightened flood inundation and that with increases in extreme rainfall. This is mainly determined by flood propagation along topographic gradients. It is also partially associated with the larger rates of rainfall increases over low-lying regions. The convection-permitting simulations realistically represent dynamical processes and their interactions with the changing atmospheric thermodynamics34. Similar spatial offsets can be expected for other regions worldwide with mountainous terrains, such as the state of California west of the Sierra Nevada, the US Mid-Atlantic Region, and so forth35,36. The spatial offset highlights the utility of our framework in projecting future flood risks37.

Synoptic clustering highlights that flood-producing processes are recurring in both present and future climates, but with notable changes in their frequencies and intensities. While there is a series of statistical techniques to deal with mixtures of flood peak distributions38, it is vital to unveil the drivers that dictate the changing behaviors of flood-producing processes. This involves investigating the mechanisms that dictate interactions between tropical/subtropical systems (e.g., tropical cyclones, subtropical high) and the westerlies over NCP39,40,41. A better understanding of the processes that modulate atmospheric dynamics offers effective constraints on predicting and managing unprecedented floods42.

There are a few considerations when transferring the framework to other regions worldwide. Although atmospheric dynamics play a vital role in rainfall extremes over mid-latitude regions (such as NCP), their relative contribution to future rainfall changes is not as notable as that of atmospheric thermodynamics. The reverse seems true for subtropical regions, where a notable dynamic contribution to rainfall changes is projected43. This would lead to substantial changes in synoptic patterns, in terms of both their frequencies and the number of clusters. The atmospheric fields for synoptic clustering require adjustments to best capture the synoptic drivers that dictate spatial and temporal variabilities of flood-producing storms, for instance, the integrated water vapor transport for atmospheric rivers over the North American coastline44 and the extratropical cyclonic systems over central Europe45. These synoptic drivers, together with land surface properties that dictate flood-generation processes (such as land use/land cover), collectively determine the selection of rainfall indices for flood projection. Our framework might be limited for coastal regions characterized by compound flood drivers (i.e., rainfall and storm surge). This limitation could potentially be addressed by integrating sea level projections and storm surge modeling in future studies46.

A caveat of the present study is that large ensemble simulations of a single global climate model are adopted in the flood projection. The capability of global climate model simulations in capturing present rainfall climatology (i.e., spatial patterns and synoptic drivers) and its future changes is crucial for establishing a robust scaling ratio. Validating the regional performance of different climate ensemble outputs is thus needed prior to projection. However, it remains a challenge for existing climate models to project future atmospheric dynamics. This mainly results from decadal-scale climate variability, the uncertain anthropogenic impact on circulations, as well as model biases in representing these processes47,48. We note that CESM-LENS, by providing many ensemble members with perturbed initial conditions, projects a spread of internal climate variability comparable to that in the CMIP6 models49. Future studies need to expand the size of the storm catalog by including a suite of state-of-the-art climate model simulations with reasonable regional performance50. The improved large ensemble will enable probabilistic projections of extreme floods that directly contribute to better hydraulic infrastructure designs.

Methods

Annual maximum four-day rainfall accumulation

We calculated the annual maximum four-day rainfall accumulation (i.e., AMP4day) over the North China Plain (NCP) region based on the ECMWF Reanalysis V5 product (ERA5), spanning the period from 1961 to 2023. The resolution of ERA5 is 0.25° in space and 1 h in time. The performance of the ERA5 rainfall product over mainland China has been evaluated in previous studies51. As an additional evaluation, we compare AMP4day derived from ERA5 with that derived from a gridded rainfall product of China, CN05.1. The CN05.1 product is interpolated from 2400 rain gauges, with a spatial resolution of 0.25° and a daily temporal resolution. The time series of AMP4day from ERA5 and CN05.1 are significantly correlated with each other (r = 0.90, P < 0.01). The climatological mean annual maximum four-day rainfall is 65.6 mm for CN05.1 and 68.7 mm for ERA5. We principally utilize the ERA5 rainfall product in our analysis. This is mainly because ERA5 also provides the atmospheric fields (e.g., wind, pressure) directly associated with the corresponding rainfall extremes. Those atmospheric fields are also used for synoptic clustering.
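The AMP4day index itself reduces to a running four-day sum followed by an annual maximum. The sketch below assumes domain-average daily rainfall grouped by calendar year and does not let a window cross a year boundary, which is a simplifying assumption; the function name and the toy series are hypothetical.

```python
def annual_max_nday(daily_by_year, window=4):
    """Annual maximum of the running `window`-day rainfall accumulation.

    daily_by_year maps a year to its list of daily rainfall totals (mm).
    Windows are evaluated within each year only.
    """
    result = {}
    for year, daily in daily_by_year.items():
        if len(daily) < window:
            raise ValueError(f"year {year} has fewer than {window} days")
        running = sum(daily[:window])   # accumulation of the first window
        best = running
        for i in range(window, len(daily)):
            running += daily[i] - daily[i - window]  # slide forward one day
            best = max(best, running)
        result[year] = best
    return result

# Toy daily series (mm); values are hypothetical.
daily_by_year = {
    2000: [0.0, 10.0, 20.0, 30.0, 40.0, 0.0, 0.0],
    2001: [5.0, 5.0, 5.0, 5.0],
}
amp4day = annual_max_nday(daily_by_year, window=4)
print(amp4day)
```

For hourly inputs such as ERA5, the hourly values would first be summed to daily totals before applying the same running window.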

We chose AMP4day because extreme floods over NCP have mainly resulted from multi-day, continuous rainfall extremes (like the August 1963 flood and the July 2023 flood). These flood extremes show large spatial extents and are responsible for the most hazardous impacts over NCP. The choice of using 4 days as the accumulative period is determined based on calculating correlation coefficients between accumulative rainfall and flood peak discharge over the 296 watersheds. The correlation reaches a saturation point when the accumulative period is 4 days. There are ~220 watersheds demonstrating significant correlations (P < 0.05, Supplementary Fig. 11). Further extending the accumulative period does not notably increase the correlation and the number of watersheds with significant correlations. Short-term extreme rainfall (e.g., <4 days) is mostly responsible for local flood hazards. These storms are mainly dictated by small-scale convection and its fine-scale interactions with regional topography. Local floods from these storms are not the focus of the present study.
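The saturation argument for the four-day window can be sketched as a simple stopping rule: pick the shortest accumulation period beyond which one extra day adds almost nothing to the rainfall-discharge correlation. The tolerance, the function name, and the correlation values below are all hypothetical; the text does not state the exact criterion used.

```python
def saturation_window(corr_by_window, tol=0.01):
    """Return the shortest accumulation period (days) after which adding one
    more day raises the correlation by less than `tol`."""
    windows = sorted(corr_by_window)
    for current, following in zip(windows, windows[1:]):
        gain = corr_by_window[following] - corr_by_window[current]
        if gain < tol:
            return current
    return windows[-1]  # no saturation found within the tested range

# Hypothetical mean correlations between n-day rainfall and flood peaks.
corr_by_window = {1: 0.40, 2: 0.55, 3: 0.65, 4: 0.72, 5: 0.725, 6: 0.727}
chosen = saturation_window(corr_by_window, tol=0.01)
print(chosen)
```

In practice the same rule could be applied to the count of watersheds with significant correlations instead of the mean correlation, since the text reports that both measures level off at four days.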

CESM large ensemble simulations

We examine daily rainfall based on the output from the Large Ensemble of Community Earth System Model version 1 (CESM-LENS)49. CESM-LENS provides 40 ensemble members spanning 1920–2100, combining historical simulations with “high-emissions” representative concentration pathway 8.5 (RCP 8.5) simulations from 2005 to 2100. The spatial resolution is ~1°. Each simulation begins from a randomly perturbed initial atmospheric state, so that the ensemble characterizes the projection spread arising from both internal climate variability and forced climate change49. We take two decade-long periods of the simulations: 1996–2005, to represent present-day conditions, and 2071–2080, to represent future conditions under the RCP 8.5 scenario. For each decade-long period, we obtain a catalog of AMP4day over 400 model years (i.e., 40 ensemble members multiplied by 10 simulation years). As with ERA5, we extract relevant atmospheric fields associated with AMP4day from the two decade-long CESM-LENS periods for synoptic clustering analyses.

The large sample size (400 model years) can represent a wide range of internal climate variability during a given period (e.g., 2071–2080). Extending the length of the “snapshot” period is desirable but may have little impact52. Detecting climate change signals by comparing different snapshots of large-ensemble climate simulations remains a challenge due to multiple factors, such as decadal climate variability and model biases, which are beyond the scope of the present study.

CESM-LENS has been evaluated in previous studies in terms of its performance relative to other ensemble global climate model simulations, especially in terms of representing present-day rainfall climatology over East Asia53. As our analysis demonstrates, the CESM-LENS successfully reproduces key aspects of rainfall climatology over NCP under present climate conditions, including both the spatial patterns of extreme rainfall events and their associated synoptic drivers, with performance comparable to the ERA5 reanalysis product. The regional performance of climate ensemble products plays a critical role in determining the robustness of flood projections.

Synoptic clustering

We characterize synoptic forcings responsible for each AMP4day event within the storm catalog based on sea level pressure (slp), u and v-component wind at 850 hPa (u850, v850), and geopotential height at 500 hPa (h500). These fields are averaged over the 4-day period of each extreme event. We avoid using thermodynamic fields, e.g., humidity and temperature, due to their dramatic differences between present and future climate. Synoptic clustering without thermodynamic fields is mainly intended to specifically quantify the dynamic impacts on rainfall extremes17,54,55. The selected atmospheric fields represent important features of low-level moisture transport and dynamic lifting. These are key ingredients for extreme rainfall over NCP. For instance, the domain-average low-level convergence at 850 hPa is significantly correlated with basin-average annual maximum 4-day rainfall accumulation (r = 0.73 for ERA5, and r = 0.60 for CESM-LENS, Supplementary Fig. 4). Synoptic precursors for extreme rainfall can be relatively better represented in these global climate models.

We classify these synoptic forcings into different patterns based on the spectral clustering algorithm56. The inputs for the clustering algorithm include all atmospheric variables, i.e., slp, u850, v850, h500. We conduct clustering analyses for ERA5 during the period 1961–2023. We determine the optimal number of clusters based on the Silhouette coefficient57 and the Calinski–Harabasz score58. Higher values of the metrics indicate better clustering performance. Both metrics suggest three as the optimal number of clusters (Supplementary Fig. 12). The three synoptic clusters are consistent with frequent synoptic drivers responsible for extreme floods over NCP20. Three representative historical flood-producing storms include the August 1963 storm for Pattern 1, the July 2016 storm for Pattern 2, and the July 2023 storm for Pattern 3. Similarly, we perform spectral clustering analyses for the storm catalogs in CESM-LENS. We randomly choose 200 model years from each decade-long period, i.e., 1996–2005 and 2071–2080. We then pool the two samples of 200 model years into a single set of 400 model years and apply spectral clustering to the pooled set. We choose three as the optimal cluster number, as consistently suggested by the Silhouette coefficient and the Calinski–Harabasz score (Supplementary Fig. 12).

To assess the robustness of synoptic clustering, we employ the adjusted mutual information (AMI) score59. Specifically, we generate multiple sets (20 in total) of synoptic clusters by introducing random perturbations to the clustering parameters (“random_seed”). For each perturbed experiment, we compute the AMI score between every pair of synoptic patterns within a given cluster, where a higher score reflects stronger intra-cluster similarity. Our analysis reveals that the optimal cluster number of three yields the highest mean AMI score while exhibiting the smallest variability. More clusters lead to notable degradation of the AMI score, pointing to higher uncertainties (Supplementary Fig. 12).

We counted the frequency (i.e., number of occurrences) of different synoptic patterns for each decade-long period. This enables us to evaluate the changing frequencies of different synoptic patterns from present-day to future climate conditions. The procedures were repeated 100 times, to minimize the impact of sampling uncertainty. Frequencies of each synoptic pattern under present and future climate conditions are shown in Supplementary Fig. 7. Comparisons of spectral clustering analysis between ERA5 and CESM-LENS highlight the capabilities of large ensemble climate model simulations in capturing key dynamic features of flood-producing storms.

Generalized extreme value distribution

We examined the statistical properties of AMP4day based on the generalized extreme value (GEV) distribution60. The GEV distribution is a family of probability distributions developed within the framework of extreme value theory. Its cumulative distribution function takes the form

$$F(x|\mu ,\sigma ,\varepsilon )=\exp \left\{-{\left[1-\varepsilon \left(\frac{x-\mu }{\sigma }\right)\right]}^{1/\varepsilon }\right\}$$
(1)

where \(\mu\), \(\sigma\), and \(\varepsilon\) represent location, scale, and shape parameters, respectively. The location and scale parameters are related to the magnitude and variability of rainfall distributions, respectively. The shape parameter indicates the tail properties of the distribution, with negative (positive) values pointing to heavy (light) upper tails of rainfall distribution.
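As an illustration of Eq. (1), the sketch below evaluates the GEV cumulative distribution in the sign convention used here (negative shape means a heavy upper tail) and converts between rainfall magnitude and return period. The function names and the parameter values in the usage note are hypothetical and are not the fitted values of this study.

```python
import math


def gev_cdf(x, mu, sigma, eps):
    """F(x) = exp{-[1 - eps * (x - mu) / sigma]^(1/eps)}, as in Eq. (1)."""
    z = (x - mu) / sigma
    if abs(eps) < 1e-12:
        # Gumbel limit as the shape parameter tends to zero.
        return math.exp(-math.exp(-z))
    t = 1.0 - eps * z
    if t <= 0.0:
        # Outside the support: above the upper bound when eps > 0,
        # below the lower bound when eps < 0.
        return 1.0 if eps > 0 else 0.0
    return math.exp(-(t ** (1.0 / eps)))


def return_period(x, mu, sigma, eps):
    """Return period (years) of an annual maximum exceeding x."""
    p_exceed = 1.0 - gev_cdf(x, mu, sigma, eps)
    return math.inf if p_exceed <= 0.0 else 1.0 / p_exceed


def return_level(period, mu, sigma, eps):
    """Magnitude exceeded on average once every `period` years."""
    if period <= 1.0:
        raise ValueError("period must exceed 1 year")
    y = -math.log(1.0 - 1.0 / period)
    if abs(eps) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + sigma * (1.0 - y ** eps) / eps
```

For example, with hypothetical parameters `mu=60`, `sigma=20`, `eps=-0.1`, calling `return_level(100, 60, 20, -0.1)` and then `return_period` on the result recovers 100 years.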

We estimate GEV parameters based on maximum likelihood estimators60. The model performance (especially pertaining to the upper tail) is further evaluated by comparing the quantiles of empirical data against quantiles derived from the fitted GEV distribution, i.e., the quantile–quantile plot60. We calculate the coefficient of determination for the quantile–quantile plot, to quantitatively assess the alignment of the quantiles along the 1:1 line and thus the performance of the GEV distribution. We quantify the uncertainty related to GEV fitting based on a bootstrap approach (see Fig. 2a for the uncertainty band representing the confidence interval within the 25th–75th percentile range).

We estimate the GEV parameters for the AMP4day catalog derived from CN05.1 covering the period from 1961 to 2023. The AMP4day value for the year 2023 is 227 mm, corresponding to an estimated return period of ~186 years (with a confidence interval of 88–460 years, representing the 25th–75th percentile range).

Factual and counterfactual experiments

We create four factual/counterfactual experiments to assess the relative contributions of atmospheric thermodynamics and dynamics to rainfall changes (see Supplementary Table 1 for details). There are two factual experiments, i.e., HIST and FUTU. For each experiment, we randomly selected 200 model years from the corresponding decade-long snapshot of CESM-LENS, i.e., 1996–2005 for HIST and 2071–2080 for FUTU. These selections led to two storm catalogs representing present-day and future climate conditions, respectively.

There are two counterfactual experiments, i.e., THERM and DYNM to quantify the impacts of changes in atmospheric thermodynamics and dynamics on future rainfall distributions. For THERM, we randomly select 200 model years from the period 2071–2080 according to the frequency of three different synoptic patterns under present-day conditions, that is 14% (N = 28), 66% (N = 132), and 20% (N = 40) for Pattern 1, Pattern 2, and Pattern 3, respectively. We assumed that atmospheric dynamics are maintained in the future climate. For DYNM, we randomly selected 200 model years from the period 1996–2005 according to the frequency of three different synoptic patterns under future-climate conditions, that is 14% (N = 28), 27% (N = 54), and 59% (N = 118) for Pattern 1, Pattern 2, and Pattern 3, respectively. We assume that thermodynamic conditions (i.e., temperature and humidity) are maintained in the future climate.
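The counterfactual sampling can be sketched as a stratified draw by synoptic pattern. The text does not state whether model years are drawn with or without replacement; this sketch assumes sampling with replacement so that a pattern can be drawn more often than it occurs in the source period. The function name and the catalog layout (a list of `(pattern, amp4day)` pairs) are hypothetical.

```python
import random


def sample_by_pattern(catalog, target_counts, rng):
    """Draw events so that each synoptic pattern appears a prescribed number of times.

    catalog       : list of (pattern_label, amp4day_mm) tuples from one period
    target_counts : {pattern_label: number of events to draw}
    rng           : random.Random instance (seeded for reproducibility)
    Assumption    : events are drawn with replacement within each pattern.
    """
    pools = {}
    for label, value in catalog:
        pools.setdefault(label, []).append(value)
    sample = []
    for label, n in target_counts.items():
        pool = pools.get(label)
        if not pool:
            raise ValueError(f"no events available for pattern {label}")
        sample.extend((label, v) for v in rng.choices(pool, k=n))
    return sample


# THERM-like draw: future-period events, present-day pattern frequencies.
THERM_COUNTS = {1: 28, 2: 132, 3: 40}
# DYNM-like draw: present-day events, future pattern frequencies.
DYNM_COUNTS = {1: 28, 2: 54, 3: 118}
```

Repeating the draw with different seeds mirrors the repeated resampling used to limit sampling uncertainty.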

The four experiments are repeated 100 times to minimize the impact of sampling uncertainty. This leads to 100 storm catalogs for each experiment. Each storm catalog consists of 200 randomly selected AMP4day events from CESM-LENS. We derive GEV distributions for each storm catalog. Each of the GEV distributions is evaluated based on calculating the coefficient of determination for the quantile–quantile plot. The mean (median) values of the coefficient of determination for all four experiments exceed 0.98 (0.99) (Supplementary Fig. 13). The composite mean GEV distributions of the four experiments are shown in Fig. 2.

Factor separation analysis

We quantified the individual impacts of changes in atmospheric dynamics and thermodynamics, as well as their interactions, on rainfall changes based on factor separation analyses61. The impact is calculated by estimating the change in exceedance probability of AMP4day with a particular return interval (i.e., the average time interval between the occurrence of two extreme events). The relative contributions of the different components are calculated as follows:

Dynamic component \({\delta }_{{{\rm{d}}}}\):

$${\delta }_{{\rm{d}}}=\frac{{E}_{{\rm{dynm}}}^{k}-{E}_{{\rm{hist}}}^{k}}{{E}_{{\rm{futu}}}^{k}-{E}_{{\rm{hist}}}^{k}}$$
(2)

Thermodynamic component \({\delta }_{{{\rm{t}}}}\):

$${\delta }_{{\rm{t}}}=\frac{{E}_{{\rm{therm}}}^{k}-{E}_{{\rm{hist}}}^{k}}{{E}_{{\rm{futu}}}^{k}-{E}_{{\rm{hist}}}^{k}}$$
(3)

Dynamic-thermodynamic interactions \({\delta }_{{{\rm{i}}}}\):

$${\delta }_{{\rm{i}}}=\frac{\left({E}_{{\rm{futu}}}^{k}+{E}_{{\rm{hist}}}^{k}\right)-\left({E}_{{\rm{dynm}}}^{k}+{E}_{{\rm{therm}}}^{k}\right)}{{E}_{{\rm{futu}}}^{k}-{E}_{{\rm{hist}}}^{k}}$$
(4)

where \(E\) represents the exceedance probability for each of the four experiments (HIST, FUTU, THERM, DYNM), and \(k\) represents the return interval of AMP4day under present-day conditions. For this study, \(k\) varies from 2 to 300 years. According to these equations, the relative contributions of different components to future rainfall changes depend on the return interval.
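A minimal sketch of Eqs. (2)–(4) follows. The function name is hypothetical, and the exceedance probabilities in the usage note are made-up numbers chosen only to show the arithmetic; they are not results of this study.

```python
def factor_separation(e_hist, e_futu, e_therm, e_dynm):
    """Return (delta_d, delta_t, delta_i) for one return interval k.

    Each argument is the exceedance probability of the same AMP4day
    threshold in the HIST, FUTU, THERM and DYNM experiments.
    """
    total = e_futu - e_hist
    if total == 0.0:
        raise ValueError("no change between FUTU and HIST; fractions undefined")
    delta_d = (e_dynm - e_hist) / total                          # Eq. (2)
    delta_t = (e_therm - e_hist) / total                         # Eq. (3)
    delta_i = ((e_futu + e_hist) - (e_dynm + e_therm)) / total   # Eq. (4)
    return delta_d, delta_t, delta_i
```

With hypothetical inputs `e_hist=0.01`, `e_futu=0.05`, `e_therm=0.018`, `e_dynm=0.018`, the three fractions are 0.2, 0.2, and 0.6. By construction they always sum to one.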

We quantify the contributions of thermodynamics and its feedback on dynamics to rainfall changes based on PGW experiments. To account for the contribution of dynamics (i.e., changes in storm frequency), the full rainfall projection is then estimated by multiplying the PGW-derived rainfall change by a scaling factor. The factor is the inverse of the total contribution from thermodynamics and its interaction with dynamics, i.e., 1/(\({\delta }_{{{\rm{t}}}}\) + \({\delta }_{{{\rm{i}}}}\)) or 1/(1−\({\delta }_{{{\rm{d}}}}\)). For the July 2023 flood, \({\delta }_{{{\rm{t}}}}\), \({\delta }_{{{\rm{i}}}}\), and \({\delta }_{{{\rm{d}}}}\) equal 20% (7–59%), 60% (0–77%), and 20% (4–55%), respectively. The multiplicative factor equals 1.25 (i.e., 1/(1–20%)). The confidence interval ranges from 1.04 (i.e., 1/(1–4%)) to 2.22 (i.e., 1/(1–55%)).
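The equivalence of the two forms of the scaling factor follows from Eqs. (2)–(4), whose numerators sum to the common denominator. A short worked derivation, with the superscript \(k\) dropped for brevity:

$${\delta }_{{\rm{d}}}+{\delta }_{{\rm{t}}}+{\delta }_{{\rm{i}}}=\frac{\left({E}_{{\rm{dynm}}}-{E}_{{\rm{hist}}}\right)+\left({E}_{{\rm{therm}}}-{E}_{{\rm{hist}}}\right)+\left({E}_{{\rm{futu}}}+{E}_{{\rm{hist}}}-{E}_{{\rm{dynm}}}-{E}_{{\rm{therm}}}\right)}{{E}_{{\rm{futu}}}-{E}_{{\rm{hist}}}}=1$$

$$\frac{1}{{\delta }_{{\rm{t}}}+{\delta }_{{\rm{i}}}}=\frac{1}{1-{\delta }_{{\rm{d}}}},\qquad \frac{1}{1-0.20}=1.25,\quad \frac{1}{1-0.04}\approx 1.04,\quad \frac{1}{1-0.55}\approx 2.22$$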

Convection-permitting rainfall simulations

We examined the July 2023 storm over NCP based on the Advanced Research version of the Weather Research and Forecasting model (WRF, version 4.0)62. We configured three one-way nested domains. The grid spacing is 9, 3, and 1 km, respectively. The outer domain covers the East Asian continent and parts of the Northwest Pacific. The innermost domain centers over the Hai River basin, including Beijing, Tianjin, and Hebei province (Supplementary Fig. 14). We configured 38 vertical levels in the model, with 20 of them below 2 km above ground. We set 50 hPa as the upper boundary of the model. The physics options of WRF simulations include: the Rapid Radiative Transfer Model for long-wave radiation, Dudhia’s scheme for short-wave radiation, and the Noah land surface model. The Kain–Fritsch cumulus scheme is turned off for all domains. The initial and boundary conditions are provided by the ERA5 reanalysis fields. The output interval for the innermost domain is 1 h.

We conducted ensemble simulations using the ndown approach by adopting different combinations of microphysics schemes and boundary layer schemes, to minimize the impact of physics options on rainfall simulation. We chose three microphysics schemes, i.e., the WRF single-moment six-class scheme, Thompson scheme, and Morrison double-moment scheme, and two boundary layer schemes, i.e., the Yonsei University boundary layer scheme and Mellor–Yamada–Janjic TKE scheme. This leads to an ensemble of six-member simulations (i.e., six combinations of different physics options). This ensemble set is referred to as CTRL simulations. The choice of ensemble size was dictated by the computation and storage resources for the high-resolution output (i.e., 1 km and 1 h). All the simulations start from 12 UTC on 28 July 2023 and run till 12 UTC on 2 August 2023. The first 12 h were treated as a spin-up period and were not included in the following analyses.

The ensemble mean of the CTRL simulations is compared against observations from 5071 rain gauges across NCP. Model configurations capture the spatial and temporal rainfall variability of the July 2023 storm quite well (Supplementary Fig. 15). The CTRL simulations accurately represent the location of maximum daily rainfall accumulation during the four-day period. The time series of the simulated domain-average rain rate is correlated with rain gauge observations (r = 0.94, P < 0.01). The mean domain-average hourly rain rate from simulations and gauge observations is 1.81 and 1.60 mm/h, respectively.

Pseudo-global warming experiments

We investigated the thermodynamic impacts on the July 2023 storm based on the pseudo-global warming (PGW) downscaling approach63. We first calculated the ensemble mean difference of decadal mean air temperatures (for both surface and atmospheric vertical levels) for both July and August between 1995–2004 and 2071–2080 from CESM-LENS. The spatial pattern of surface temperature changes is consistent between July and August (Supplementary Fig. 16). The mean air temperature difference is ~4 K throughout the atmospheric levels below 200 hPa, pointing to a relatively uniform warming atmosphere under future climate conditions (Supplementary Fig. 17). We added the derived temperature perturbations to the initial and boundary conditions (i.e., ERA5 reanalysis fields) as adopted in the CTRL simulations. We then re-calculated specific humidity (at both surface and atmospheric levels) accordingly, assuming that the relative humidity is constant under thermodynamic perturbations. Previous evidence shows negligible change of relative humidity under a warming climate64. All the other model configurations were kept consistent with the CTRL simulations. This leads to another ensemble of six-member simulations, referred to as the PGW simulations.
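The constant-relative-humidity adjustment can be sketched as follows. The text does not specify which saturation vapor pressure formula was used; this sketch assumes the Bolton (1980) approximation over liquid water, and the function names are hypothetical.

```python
import math

EPSILON = 0.622  # ratio of the molecular weights of water vapor and dry air


def saturation_vapor_pressure_hpa(temp_k):
    """Saturation vapor pressure (hPa), Bolton (1980) approximation (assumed)."""
    tc = temp_k - 273.15
    return 6.112 * math.exp(17.67 * tc / (tc + 243.5))


def perturbed_specific_humidity(q, temp_k, pressure_hpa, delta_t):
    """Specific humidity (kg/kg) after warming by delta_t at fixed relative humidity."""
    # Vapor pressure implied by the original specific humidity.
    e = q * pressure_hpa / (EPSILON + (1.0 - EPSILON) * q)
    rh = e / saturation_vapor_pressure_hpa(temp_k)
    # Hold relative humidity fixed while the temperature increases.
    e_new = rh * saturation_vapor_pressure_hpa(temp_k + delta_t)
    return EPSILON * e_new / (pressure_hpa - (1.0 - EPSILON) * e_new)
```

Under this assumption, a 4 K warming of near-surface air at 300 K raises specific humidity by roughly a quarter, in line with Clausius–Clapeyron scaling of about 6–7% per kelvin.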

The PGW experiments do not aim to make real projections of rainfall under a warming climate. The PGW experiments only represent future rainfall changes under specific synoptic conditions, which may become more likely under a warming climate65. We compare the climatological mean low-level wind fields between present and future scenarios. The spatial patterns exhibit remarkable consistency, with only marginal variations in magnitude (Supplementary Fig. 18). This evidence further justifies our choice of only perturbing thermodynamic fields in the PGW experiments. Although we only impose thermodynamic changes, atmospheric dynamics are allowed to evolve accordingly, i.e., demonstrating the interactions between thermodynamics and dynamics. Comparisons between the PGW and CTRL simulations shed light on the impacts of thermodynamic changes and their interactions with atmospheric dynamics on future rainfall. These impacts will be then referenced and rescaled to estimate rainfall changes attributed to the changes in the frequency of favorable synoptic conditions.

To test the impact of moderate thermodynamic changes, we implement another set of simulations by assuming a 2 K temperature rise throughout the atmospheric levels below 200 hPa. These new ensembles represent a mid-range or lower-end climate scenario. We show the domain-averaged rainfall accumulation for the July 2023 storm is 302 ± 8.7 mm with a 2 K temperature increase, compared to 455.7 ± 16.5 mm for the RCP 8.5 scenario (~4 K, Supplementary Fig. 19).

CaMa-Flood model simulations

The rainfall output from WRF simulations was further used to drive the Noah land surface model with default model parameters66 and the hydrodynamic CaMa-Flood model (version 4)67. The spatial resolution of the CaMa-Flood model is 1 min (~1.8 km). The simulated flood depth was downscaled onto a 3 s (~100 m at the equator) high-resolution digital elevation model. The downscaling is performed by checking, at each high-resolution pixel, whether the elevation is lower than the simulated water level7,68.
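The pixel-wise comparison can be sketched as below. For simplicity, the sketch assumes that each coarse cell maps onto a regular block of `ratio` × `ratio` fine pixels and that the coarse model provides a water surface elevation per cell; the actual mapping between CaMa-Flood unit catchments and high-resolution pixels is not described here, and the function name is hypothetical.

```python
def downscale_flood_depth(coarse_wse, fine_dem, ratio):
    """Return fine-resolution flood depth (m) as a nested list.

    coarse_wse : 2-D list of water surface elevation (m) per coarse cell
    fine_dem   : 2-D list of ground elevation (m) per fine pixel
    ratio      : number of fine pixels per coarse cell along each axis
    A pixel is flooded only where its elevation is below the water level.
    """
    depth = []
    for i, row in enumerate(fine_dem):
        out_row = []
        for j, ground in enumerate(row):
            water_level = coarse_wse[i // ratio][j // ratio]
            out_row.append(max(0.0, water_level - ground))
        depth.append(out_row)
    return depth
```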

We calibrate the CaMa-Flood model by adjusting Manning coefficients of channels and floodplains as well as parameters characterizing river bathymetry (i.e., depth and width), using observed hourly discharge at two stream gaging stations, Zhangfang (labeled as point A) and Manshuihe (labeled as point B, Fig. 4), as reference. The areas downstream of the two stations suffered the most severe inundation during the July 2023 flood. The Kling–Gupta efficiency, Nash–Sutcliffe efficiency, and correlation coefficient are 0.68 (0.56), 0.68 (0.22), and 0.84 (0.60) for the Zhangfang (Manshuihe) station, respectively (Supplementary Fig. 9). Previous studies suggest that Kling–Gupta efficiency values above 0.3 can be considered acceptable in hydrological modeling69.
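For reference, the two efficiency metrics can be computed as in the sketch below. It assumes the original Kling–Gupta formulation (correlation, variability ratio, and bias ratio with equal weights); the text does not state which variant was used, and the function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nash_sutcliffe(sim, obs):
    """NSE = 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = _mean(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def kling_gupta(sim, obs):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    n = len(obs)
    mean_sim, mean_obs = _mean(sim), _mean(obs)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    cov = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs)) / n
    r = cov / (std_sim * std_obs)   # linear correlation
    alpha = std_sim / std_obs       # variability ratio
    beta = mean_sim / mean_obs      # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
```

Both metrics equal 1 for a perfect simulation. The Kling–Gupta efficiency separates errors in timing, variability, and bias, which helps explain why a station can score acceptably on it while the Nash–Sutcliffe efficiency remains low.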

The deviations between the simulated and observed discharge series are associated with several factors. For instance, the Noah model simulates runoff over pervious grids based on the infiltration-excess process. Since the hourly rainfall rate is relatively small, especially during the early stage of the storm (i.e., <5 mm/h), the assumption of an infiltration-excess scheme might lead to an earlier rising limb of flood hydrographs but smaller flood peaks. Additionally, the observed flood discharge inherits uncertainty from the discharge rating curves. These curves do not sufficiently represent extreme river flows like the July 2023 flood. During extreme floods, debris and sediments have the potential to notably impact the reliability of the rating curve by reshaping the morphology of the channels.

We additionally evaluate the performance of the flood model in simulating flood inundation by referring to satellite-retrieved inundation (Supplementary Fig. 10). As can be seen from the comparison, the simulated flood inundation is overall consistent with satellite retrievals. We define inundation using a flood depth threshold of 0.1 m (see also, e.g., refs. 70,71,72). The hit rate and critical success index for the entire domain are 0.70 and 0.11, respectively (see Supplementary Table 2 for definitions). The performance is better for the river region to the north of the domain (i.e., downstream of Manshuihe), where the hit rate and critical success index are 0.86 and 0.23, respectively. For the river region in the southern domain (i.e., downstream of Zhangfang), the simulated flood depth implies an overestimated flood inundation. There is a large flood detention basin in the southern domain. The detention basin was activated when the heavy rainfall episodes ended due to the exceptional flood depths. This leads to a shift in the timing of simulated flood inundation. The overestimation might be additionally associated with the uncertainty of the inundation extent derived from satellite images over complex terrains73. This can be remedied by incorporating ultra-high resolution auxiliary imagery74 or LiDAR measurements75, which will be pursued in future work. In the present study, we focus on the maximum flood depth for the entire simulation.
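The two skill scores can be sketched from a pixel-wise contingency table. The definitions used in this study are given in Supplementary Table 2, which is not reproduced here; the sketch assumes the standard forms (hit rate = hits / (hits + misses), critical success index = hits / (hits + misses + false alarms)) and treats depths at or above the threshold as inundated. The function name is hypothetical.

```python
def inundation_skill(sim_depth, obs_wet, threshold=0.1):
    """Return (hit_rate, critical_success_index) for flattened pixel lists.

    sim_depth : simulated flood depth (m) per pixel
    obs_wet   : True where the satellite retrieval marks the pixel as inundated
    threshold : depth (m) at or above which a simulated pixel counts as inundated
    """
    hits = misses = false_alarms = 0
    for depth, wet in zip(sim_depth, obs_wet):
        sim_wet = depth >= threshold
        if sim_wet and wet:
            hits += 1
        elif wet:
            misses += 1
        elif sim_wet:
            false_alarms += 1
    if hits + misses == 0:
        raise ValueError("no observed inundation; scores undefined")
    hit_rate = hits / (hits + misses)
    csi = hits / (hits + misses + false_alarms)
    return hit_rate, csi
```

Under these definitions, a high hit rate combined with a low critical success index indicates many false alarms, which is what an overestimated inundation extent would produce.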

Flood exposure analysis

We compared the simulated flood depth between the present-day and future climate conditions. For present-day conditions, the CaMa-Flood model is driven by the CTRL simulations (WRF-Noah). For future climate conditions, we first derive rainfall projection with constrained atmospheric dynamics, that is, the difference between PGW and CTRL simulations multiplied by a scaling factor. This projected rainfall output is piped into the WRF/Noah/CaMa-Flood model chain. We extracted the maximum flood depth for each ensemble member under both present-day and future climate conditions.

We categorized the maximum flood depths into four different groups: 0.1–0.5, 0.5–1.0, 1.0–2.0, and >2.0 m. We overlaid a gridded population layer (with a spatial resolution of 100 m)69 onto the maximum flood depth to calculate the corresponding population exposed to different inundation depths. Note that we use the current population density map for both current and future scenarios. China's population is projected to decline slightly by the 2050s (https://databank.worldbank.org/China-Population-Projection-15-50/id/61621b1c), with the rates of change varying across sub-regions. Since NCP is one of the most developed and populated regions in the country, its population demonstrates minimal changes76. Our projection of flood exposure should thus be little affected by using the current population density map.
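The overlay can be sketched as a binned sum over co-registered pixels. The sketch assumes the depth and population grids share the same 100 m pixels and that each class includes its lower bound but not its upper bound; the handling of class boundaries is not specified in the text, and the names are hypothetical.

```python
DEPTH_CLASSES = [
    ("0.1-0.5 m", 0.1, 0.5),
    ("0.5-1.0 m", 0.5, 1.0),
    ("1.0-2.0 m", 1.0, 2.0),
    (">2.0 m", 2.0, float("inf")),
]


def exposed_population(max_depth, population):
    """Sum population per flood depth class.

    max_depth  : maximum flood depth (m) per pixel, flattened
    population : population count per pixel, same order as `max_depth`
    Assumption : classes are lower-inclusive and upper-exclusive.
    """
    totals = {label: 0.0 for label, _, _ in DEPTH_CLASSES}
    for depth, people in zip(max_depth, population):
        for label, lower, upper in DEPTH_CLASSES:
            if lower <= depth < upper:
                totals[label] += people
                break
    return totals
```

Running the function on the present-day and future maximum depth maps with the same population layer gives the change in exposure attributable to the change in flood hazard alone.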

The version of the CaMa-Flood model in this study does not incorporate representations of hydraulic infrastructure such as dams/reservoirs or levees. We note, however, that our flood projection is unlikely to be strongly affected by this omission. This is because the focus of our present study is unprecedented flood risk (like the July 2023 flood, whose return interval exceeds 100 years), whereas the design standards of most existing flood-control structures over NCP correspond to return intervals of around 20–30 years77. In addition, inventories of flood-control measures have limited availability, as documented in the literature78. Our projection highlights the regions with high flood risks under current protection levels, providing references for stakeholders and policymakers for improved mitigation and adaptation strategies. Developing inventories of flood-control measures and incorporating their representations into flood modeling would need future efforts from the flood research community.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.