Introduction

Floods continue to pose a major global threat, causing significant loss of life and extensive damage to infrastructure, property, and agriculture. In recent decades, severe flooding driven by record-breaking rainfall events has become more frequent worldwide1,2. For example, in July 2021, Zhengzhou, China, experienced an unprecedented rainstorm, leading to catastrophic urban flooding3. The city recorded an hourly maximum rainfall of 201.9 mm, breaking the national record previously held since 19754. Similarly, in August 2017, record-breaking rainfall associated with Hurricane Harvey led to severe flash flooding in Houston, USA, inundating over 300,000 structures and impacting approximately half a million vehicles5. High-impact events of this magnitude were previously considered highly improbable, but they are occurring with increasing regularity.

At present, the predominant approach for designing flood protection systems and water-related infrastructure relies on extreme value analysis (EVA), which involves fitting a theoretical statistical distribution, typically a generalized extreme value (GEV) distribution, to historical extreme records6. While EVA is mathematically robust and has been widely used in engineering practice, it is constrained by two fundamental assumptions: that the statistical properties of extremes remain stationary over time, and that the historical record adequately represents the full range of future possibilities.
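As a rough illustration of this workflow, the sketch below fits a GEV distribution to a series of annual maxima, reads off a return level, and attaches a bootstrap percentile interval. It is a minimal sketch, not the procedure used in this study: the function names, the maximum-likelihood fit from SciPy, and the number of resamples are all assumptions made for illustration. Note that SciPy's shape parameter `c` has the opposite sign to the shape parameter commonly used in hydrology.

```python
import numpy as np
from scipy.stats import genextreme


def gev_return_level(annual_maxima, return_period):
    """Fit a GEV by maximum likelihood and return the T-year return level."""
    # SciPy convention: c = -xi, so c < 0 corresponds to a heavy (Frechet-type) tail.
    c, loc, scale = genextreme.fit(annual_maxima)
    # The T-year level is exceeded with probability 1/T in any given year.
    return genextreme.isf(1.0 / return_period, c, loc=loc, scale=scale)


def bootstrap_interval(annual_maxima, return_period, n_boot=1000, seed=0):
    """5-95th percentile interval of the return level from resampled records."""
    rng = np.random.default_rng(seed)
    x = np.asarray(annual_maxima, dtype=float)
    levels = [
        gev_return_level(rng.choice(x, size=x.size, replace=True), return_period)
        for _ in range(n_boot)
    ]
    return np.percentile(levels, [5, 95])
```

Both assumptions named in the text are visible here: the fit treats every annual maximum as a draw from one fixed distribution, and the resampling can only reshuffle values that are already in the record.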

Three major challenges undermine these assumptions. First, the climate is nonstationary7,8,9,10, and climate change is altering the frequency and intensity of extreme rainfall events. Global warming enhances atmospheric moisture-holding capacity and convective intensity, increasing the likelihood of record-breaking rainfall11,12,13,14. Second, different extreme rainfall events may belong to different statistical populations because of the presence of multiple rainfall-generating mechanisms15,16. This heterogeneity violates the assumption that all extremes belong to a single statistical population. However, extensions such as the two-component extreme value (TCEV) distribution17,18 can account for this heterogeneity by modeling two populations of extremes originating from different physical processes. Third, even in a stationary climate with a distribution of events from the same population, internal climate variability19 can lead to the occurrence of record-breaking events that are significantly larger than any previously observed on record. This is due to the inherently stochastic nature of the climate system, which implies that even relatively long observational records (e.g., >30-year records) may fail to capture very rare but physically possible extreme events. In this study, we focus on the second and third challenges, as the first has been extensively addressed in the literature on rainfall extremes in a changing climate10,20,21,22,23,24,25.

We illustrate the problem investigated here with an example. During Hurricane Harvey in August 2017, a rain gauge in northwest Houston, USA, recorded 408.4 mm of rainfall in 24 h, as shown in Fig. 1. This event significantly exceeded all other observed extremes by a considerable margin. While the exact return period of this event is unknown, its probability of occurrence was likely underestimated by the conventional EVA approach. The GEV distribution fitted to historical annual precipitation maxima provided lower estimates for 100–300-year return periods and failed to account for the magnitude of this hurricane-induced event, even when considering the 5–95th confidence intervals for a 100-year return period. Given that hydraulic infrastructures are typically designed based on return periods of 100 or 200 years, the existing infrastructures were largely unprepared for an event of this magnitude. While illustrative of a single case, this example motivates a broader systematic investigation of record-breaking extreme rainfall events and underscores the limitations of conventional EVA methodology in assessing the risks posed by such extremes.

Fig. 1: Observed extreme rainfall for a 24-h duration and fitted GEV distribution (excluding the record-breaking event during the fitting) for a station located northwest of Houston, Texas.

The shaded area shows the 5–95th confidence interval for the GEV obtained from 10^3 bootstraps. The observed 24-h maximum rainfall (408.4 mm) recorded on August 26, 2017, during Hurricane Harvey, is shown as a representative record-breaking event and is plotted at the return period given by the plotting position method (Weibull formula), which is unlikely to represent the true return period of this event.
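For reference, the Weibull plotting position assigns the m-th largest of n annual maxima an empirical return period of (n + 1)/m years. The short sketch below (the function name is a hypothetical choice for illustration) makes explicit why the largest value in a record can never be plotted beyond n + 1 years, however extreme it is.

```python
import numpy as np


def weibull_return_periods(annual_maxima):
    """Empirical return period (years) of each annual maximum, T = (n + 1) / rank."""
    x = np.asarray(annual_maxima, dtype=float)
    n = x.size
    # Rank 1 is the largest value; ties are broken by order of appearance.
    order = np.argsort(-x, kind="stable")
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(1, n + 1)
    return (n + 1) / ranks
```

With a record of n years, the largest observation is always placed at n + 1 years, which is why the plotted position of a record-breaking event says little about its true return period.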

To enhance the robustness and resilience of engineering design to avoid flood damage, there is a growing need for methodologies that can account for internal climate variability and diverse rainfall-generating mechanisms. In this study, we investigate whether a stochastic weather generator can serve as a more reliable tool for estimating record-breaking rainfall events compared to conventional EVA. Specifically, we use the Advanced Weather Generator (AWE-GEN) to simulate the stochastic variability of the precipitation process by generating a large ensemble of 100-year-long hourly synthetic rainfall time series (see “Methods”). Unlike conventional EVA, which relies solely on the “tail” part of historical records, our approach considers the full distribution, including both tail and non-tail parts. This allows the reproduction of a broad range of rainfall statistics beyond extremes, including the explicit representation of internal climate variability and different rainfall-generating mechanisms by using rainfall statistics computed over different months.

The proposed stochastic approach based on AWE-GEN is evaluated using hourly rainfall data from 2703 stations across nine countries, spanning a range of climates and storm types. The geographical distribution of these stations is shown in Fig. 2. Among them, we have identified 429 stations that experienced record-breaking rainfall events for various durations (1, 3, 6, 12, and 24 h), highlighted in green in Fig. 2. Details on identifying record-breaking rainfall events and the proposed stochastic approach are provided in the “Methods” section. In the following sections, we present the performance of the new approach relative to conventional EVA and discuss the implications for flood risk estimation and infrastructure design under uncertainty.

Fig. 2: Spatial distribution of the 2703 quality-controlled rain stations used in this study, located across the United States, Belgium, Germany, Switzerland, the United Kingdom, South Korea, Japan, Singapore, and New Zealand.

Highlighted in green are the selected stations identified as experiencing record-breaking rainfall events.

Results

Simulation of unseen record-breaking rainfall events

The performance of the stochastic AWE-GEN approach compared to the conventional GEV-based EVA method in simulating extreme rainfall across five durations (1, 3, 6, 12, and 24 h) at ten representative stations is showcased in Fig. 3. The left column illustrates five stations where AWE-GEN fails to reproduce a rainfall magnitude equal to the record-breaking events within the 5–95th percentiles for a 100-year return period; the right column showcases five stations where the proposed approach instead successfully captures the record-breaking events for each duration. Although the exact return period of a record-breaking rainfall event is unknown (as discussed in more detail later), we define success as a case where the magnitude of the record-breaking event falls within the 5–95th percentile range of AWE-GEN simulations for a 100-year return period, and failure as a case where the event exceeds this range (see “Methods”). Because the choice of a 100-year return period is arbitrary, other target return periods (50 and 200 years) are also used to define success and failure.
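The success criterion can be sketched as a simple range check over the ensemble. The sketch below makes one loudly stated assumption that is not taken from the text: the 100-year return level of each 100-year-long realization is approximated by that realization's largest annual maximum. The function names are hypothetical, and the actual estimator is described in the “Methods”.

```python
import numpy as np


def ensemble_range(ensemble_annual_maxima, lower=5.0, upper=95.0):
    """Percentile range of the 100-year level across realizations.

    `ensemble_annual_maxima` has shape (n_realizations, n_years).
    Assumption: the maximum of each 100-year realization stands in for
    its 100-year return level.
    """
    ens = np.asarray(ensemble_annual_maxima, dtype=float)
    level_per_realization = ens.max(axis=1)
    return np.percentile(level_per_realization, [lower, upper])


def is_captured(record_value, ensemble_annual_maxima, lower=5.0, upper=95.0):
    """True if the observed record lies inside the ensemble percentile range."""
    lo, hi = ensemble_range(ensemble_annual_maxima, lower, upper)
    # The text defines failure as exceeding the range; a record below the
    # lower bound is also treated as not captured here (an assumption).
    return bool(lo <= record_value <= hi)
```

Changing `lower` and `upper` to 10/90 or 1/99 gives the other percentile ranges considered in the analysis.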

Fig. 3: Comparison of AWE-GEN simulations and GEV distributions in capturing record-breaking rainfall events across five durations (1, 3, 6, 12, and 24 h) for ten representative stations.

The locations of these stations are: a Kashiwazaki, Niigata, Japan (69-year record); b southeastern New Mexico, USA (57-year record); c eastern Switzerland, near Säntis Mountain (31-year record); d near Forres, Moray, Scotland (60-year record); e near Oshima, Tokyo, Japan (69-year record); f Basel, Switzerland (31-year record); g Gifu, Japan (128-year record); h Fayetteville, North Carolina, USA (50-year record); i near Arecibo on the northern coast of Puerto Rico (49-year record); and j northwest Houston, Texas, USA (28-year record). Panels on the left (a, c, e, g, i) show cases where AWE-GEN fails to capture the record-breaking events within the 5–95th percentiles at the 100-year return period (“Failure”); panels on the right (b, d, f, h, j) show cases where AWE-GEN successfully captures the record-breaking events within the 5–95th percentiles at the 100-year return period (“Success”). Blue shaded areas represent percentile ranges of the 100 AWE-GEN simulations; the solid line shows the fitted GEV distribution, and dashed lines indicate its 5–95th confidence interval.

In all ten cases, the record-breaking events fall outside the 5–95% bootstrap confidence intervals of the GEV fits, demonstrating that the underestimation is not only due to a poor fit of the GEV curve but persists even when accounting for parameter uncertainty. This reveals a critical limitation of conventional EVA methods: the tail behavior estimated from limited records of annual maxima often fails to anticipate the magnitude of unprecedented, very large extremes.

In contrast, the AWE-GEN approach provides a broader probabilistic envelope that incorporates internal climate variability, enabling it to simulate a more realistic range of extremes. For instance, in Fig. 3j, corresponding to the Houston station affected by Hurricane Harvey, the observed 24-h rainfall (408.4 mm) is successfully captured within the 5–95th percentile range of the AWE-GEN ensemble. For the same event, the GEV distribution assigns a return period of approximately 320 years; thus, the magnitude of such an event will be underestimated using conventional 100- or 200-year design criteria. This could lead to the underdesign of hydraulic structures and flood prevention measures, posing risks to infrastructure and public safety.

Another illustrative example occurred in October 2016, when Hurricane Matthew brought heavy rainfall to central and eastern North Carolina, which resulted in major flooding. More than 600 roads were closed and nearly 99,000 structures were affected by floodwaters26. The station of Fayetteville, North Carolina, recorded a 12-h record-breaking rainfall event of 307 mm during Hurricane Matthew (Fig. 3h). The GEV method estimates the return period of this event at 6000 years. Furthermore, a clear discrepancy emerges between the historical observations and the GEV distribution tail. The GEV distribution systematically underestimates the high return levels, even when accounting for its 5–95% confidence intervals. By contrast, the AWE-GEN approach successfully reproduces this extreme event. This suggests that a stochastic weather generator, by accounting for a larger number of statistics and, above all, for internal variability, offers a more flexible and robust representation of the range of potential unseen extremes.

However, not all record-breaking events are successfully captured by AWE-GEN. For example, a station located on the northern coast of Puerto Rico experienced unprecedented heavy rainfall during Hurricane Maria in September 2017 (Fig. 3i). As a powerful Category 4 hurricane, Hurricane Maria brought catastrophic flooding and landslides to Puerto Rico, with extreme rainfall reaching 547 mm within 24 h at this station. This event represents a significant outlier in the historical rainfall record. While it lies within the broader 1–99th percentile range of the AWE-GEN simulations, it exceeds the upper boundary of the 5–95th percentile range for a 100-year return period. The GEV method also fails to represent this event within conventional return periods, assigning a return period of about 1000 years. These limitations of traditional EVA highlight the challenges of modeling high-impact outliers that deviate significantly from precedent events on record. Conversely, there is a value in using stochastic rainfall generators like AWE-GEN that are better equipped to simulate unseen but possible record-breaking rainfall events.

Success rate of the stochastic rainfall generator

After illustrating the potential of AWE-GEN in Fig. 3, we provide a systematic analysis of results across all selected record-breaking rainfall events and stations. Figure 4 summarizes the success rates in capturing record-breaking events across different durations (1, 3, 6, 12, and 24 h) using three approaches: (a) the proposed stochastic AWE-GEN approach; (b) the GEV distribution fitted to historical observations; and (c) the GEV distribution fitted to 100 synthetic 100-year-long realizations generated by AWE-GEN. The success rate is quantified at the 100-year return period for three percentile ranges: 10–90th, 5–95th, and 1–99th. To examine whether the poor performance of the GEV-based EVA method is primarily due to data limitations, the third approach (Fig. 4c) increases data availability by extending the rainfall time series length through synthetic realizations (see “Methods”). We also present the success rates calculated for the 50-year and 200-year return periods for the three approaches in Supplementary Fig. 1. As expected, success rates are higher with a 200-year target return period and lower with a 50-year target return period, but the differences are not particularly pronounced and, most importantly, they do not change the relative performance of the AWE-GEN and EVA-based approaches.

Fig. 4

Comparison of the success rates (%) in capturing record-breaking rainfall events across different durations (1, 3, 6, 12, and 24 h) using: a the AWE-GEN stochastic rainfall generator; b the GEV distribution fitted to observed data; and c the GEV distribution fitted to the 100 years of synthetic realizations generated by AWE-GEN. Success rates are shown for three percentile ranges: 10–90th, 5–95th, and 1–99th.

The AWE-GEN approach (Fig. 4a) consistently achieves high success rates across all durations and percentile ranges, demonstrating its robustness in capturing unseen extreme rainfall events. For example, using the 5–95th percentile range, the AWE-GEN method achieves success rates of 58% (1 h), 87% (3 h), 97% (6 h), 93% (12 h), and 76% (24 h). These are considerably higher than those obtained with the conventional GEV method (Fig. 4b), which achieves success rates of only 5%, 1%, 2%, 4%, and 2%, respectively. This pattern remains consistent across all percentile ranges considered.

By fitting the GEV distribution to the large ensemble of 100-year-long synthetic realizations generated by AWE-GEN (Fig. 4c), we effectively eliminate the issue of limited observational data. If the GEV method were only limited by data availability, its success rates applied to synthetic data should be comparable to those of the AWE-GEN approach. The GEV method fitted to the 100 synthetic realizations (Fig. 4c) improves considerably over the fit to observations (Fig. 4b) but still falls far short of the success rate obtained with the stochastic weather generator (Fig. 4a). For example, at the 5–95th percentile range, the success rates of the GEV method fitted with synthetic realizations are 21% (1 h), 58% (3 h), 57% (6 h), 45% (12 h), and 27% (24 h).

These findings suggest that the conventional GEV method suffers from inherent methodological limitations in capturing record-breaking rainfall events, beyond the commonly acknowledged constraint of limited observational data. While increasing the dataset size through synthetic realizations improves the success rates to some extent, the GEV framework still falls short in capturing the stochastic nature of unseen extreme rainfall events. In contrast, the results confirm that AWE-GEN is capable of simulating record-breaking rainfall events with a high degree of success, particularly for mid-to-long durations in the range of 3–24 h, and it still captures more than 50% of record-breaking events at the hourly scale. This makes the proposed approach a valuable tool for assessing risks associated with record-breaking precipitation extremes. Given the dominant role of stochastic variability for future projections of station-scale rainfall extremes under an uncertain future20,27, such an approach is likely to capture most of the extremes in a non-stationary climate as well. However, the model skill in simulating short-duration (1-h) unseen extremes is considerably lower than for other durations, which calls for further improvements to the model structure at these temporal scales or for the combination of different stochastic rainfall models28.

Theoretically, the greater the extremity of an outlier event, the more challenging it is to capture its magnitude. To explore whether the degree of extremity of record-breaking events correlates with the success rate of the AWE-GEN approach, we analyze the distribution of the ratio of maximum to second maximum rainfall for both success and failure cases across all durations in Fig. 5. The ratio represents the degree of extremity of a record-breaking event in the observed historical record. Failure cases generally display higher ratios, with greater medians compared to success cases for all durations. This indicates that failure cases are generally associated with more extreme outliers, where the record-breaking rainfall event significantly exceeds the historical pattern. This pattern is particularly pronounced for longer durations, such as 12 and 24 h. In contrast, for shorter durations (i.e., 1 h and 3 h), success and failure cases exhibit more overlap in their ratio distributions, suggesting that the performance of the AWE-GEN approach at shorter timescales is less related to the degree of event extremity.

Fig. 5: Violin plots showing the distribution of the ratio between maximum and second maximum rainfall for AWE-GEN simulation success and failure cases across five durations (1, 3, 6, 12, and 24 h).

Boxplots within the violins show the median and interquartile range. Success refers to cases where the record-breaking event falls within the 5–95th percentile range of AWE-GEN simulations for the 100-year return period, while failure denotes cases where the event exceeds this range.

Interestingly, while the failure cases are generally associated with higher ratios due to more extreme outliers, the success cases also show the presence of some high outliers across all durations. These outliers in the success cases indicate that the AWE-GEN approach has the capacity (although not always) to simulate extremes that even substantially exceed the subsequently observed record-breaking extremes.

Limitations of conventional EVA in predicting record-breaking events

We further assess whether the apparent underestimation by the conventional EVA approach is genuine, or whether these record-breaking events simply have very large return periods, in which case the EVA estimates would be correct. To do so, we compared the return periods estimated for all the identified record-breaking rainfall events, obtained by inverting different extreme value distributions (GEV, Gumbel, and the two-parameter Fréchet), with the theoretical return periods of these events. In this analysis, the theoretical return periods are derived directly using the same number of stations and record length as observations and are an accurate approximation of the real ensemble distribution of return periods of the analyzed record-breaking events (see Methods and Supplementary Fig. 4).
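Inverting a fitted distribution means evaluating its annual exceedance probability at the event magnitude and taking the reciprocal, T = 1 / (1 - F(x)). The sketch below does this for the three families named above using SciPy. It is an illustrative sketch under stated assumptions: the function name is hypothetical, the fits use SciPy's default maximum-likelihood estimation, the two-parameter Fréchet is represented by `invweibull` with the location fixed at zero, and whether the record event itself is included in the fitted sample is left to the caller.

```python
import numpy as np
from scipy.stats import genextreme, gumbel_r, invweibull


def return_periods(annual_maxima, event_value):
    """Return period (years) of `event_value` under GEV, Gumbel and Frechet fits."""
    x = np.asarray(annual_maxima, dtype=float)
    fits = {
        "GEV": (genextreme, genextreme.fit(x)),
        "Gumbel": (gumbel_r, gumbel_r.fit(x)),
        # Two-parameter Frechet: shape and scale are fitted, location fixed at 0.
        "Frechet": (invweibull, invweibull.fit(x, floc=0)),
    }
    periods = {}
    for name, (dist, params) in fits.items():
        p_exceed = dist.sf(event_value, *params)  # annual exceedance probability
        # A bounded fitted tail can give zero probability, i.e. an infinite return period.
        periods[name] = float(1.0 / p_exceed) if p_exceed > 0 else float("inf")
    return periods
```

Because the three families differ mainly in how fast the upper tail decays, the same event can be assigned return periods that differ by orders of magnitude.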

This analysis shows that the GEV, despite being widely used for EVA of rainfall, consistently overestimates return periods of record-breaking events (Fig. 6). This overestimation is evident in the wide range of return periods produced by GEV, with many values extending far beyond theoretically expected return periods. Such overestimation undermines the reliability of GEV for accurately quantifying the risks associated with record-breaking precipitation events in the tail of the distribution: their return periods are largely overestimated or, equivalently, their magnitude is underestimated for a given return period. For practical applications, these inflated return periods could lead to under-preparedness in flood risk management and infrastructure design. For example, record-breaking events that are likely to occur over relatively shorter timeframes (a century or so) are misrepresented as events with extremely low probabilities.

Fig. 6: Boxplots of return periods (years) for all identified record-breaking rainfall events, derived by inverting different extreme value distributions fitted to the data: generalized extreme value (GEV), Gumbel, and Fréchet.

The return periods derived theoretically from the definition of return period, the number of stations, and length of the time series for each station are also shown. The boxplots display the distribution of estimated return periods, with the horizontal line representing the median.

While the Gumbel and Fréchet distributions are special cases of the GEV family, in hydrological practice it is common to use them rather than the GEV. This is often done either for historical consistency, simplicity in engineering design, or based on assumptions about the tail behavior of the data. For example, many engineering guidelines still recommend the Gumbel distribution for design purposes29,30. Our analysis shows that the Gumbel distribution, while exhibiting a narrower range of return periods than GEV, still overestimates return periods relative to the theoretical expectations, with a higher median. Its overestimation is less severe than GEV but remains problematic. In contrast, the two-parameter Fréchet distribution demonstrates the closest alignment with theoretical return periods. Its narrower interquartile range and lower median values suggest that the Fréchet distribution provides a reasonable probabilistic assessment of return periods associated with record-breaking events. Similarly, Papalexiou and Koutsoyiannis (2013), by analyzing over 15,000 global rainfall records to evaluate which extreme value distribution best fits annual maximum daily rainfall globally, found that the Fréchet distribution consistently outperformed other distributions31. The heavy-tail characteristic of this distribution better captures the variability and extremity of rainfall events.

Our results reinforce the idea that using the Fréchet distribution or other distributions explicitly tailored to better capture the tail of the extreme values, such as the TCEV distribution17,18, would be essential to assign the correct probability to record-breaking rainfall events originating from different rainfall-generating mechanisms16. However, the practical application of such models generally requires that the parameters for the two (or more) populations be estimated reliably from enough samples in the precipitation data, which may not always be the case. It should be noted that at any single station, the magnitude of these record-breaking events will still be underestimated using conventional return periods of 100–200 years for design, even using the Fréchet distribution, as the true return periods of these record-breaking events are estimated to be around 500–5000 years (see the range of theoretical return periods in Fig. 6).

While our study applies a consistent GEV across thousands of stations for objective comparison, we acknowledge that some practitioners might use a broader set of approaches in operational hydrology and engineering, including comparing multiple statistical distributions (e.g., Gumbel, Fréchet, and generalized Pareto), and regional frequency analysis29,32,33. Guidelines such as the flood estimation handbook (FEH)34 recommend the use of a range of distributions and the careful assessment of model fit, threshold selection, and data independence. Our pragmatic use of EVA was chosen to ensure methodological consistency across a large and diverse dataset and still reflects the approach used by many engineers worldwide.

Discussion

Overall, the GEV and Gumbel distributions tend to overestimate return periods for record-breaking rainfall events, with the GEV distribution showing the most pronounced overestimation. This limitation highlights the risks of relying solely on conventional EVA for infrastructure design and flood prevention. The Fréchet distribution, with its closer alignment to theoretical values, may offer a more reliable alternative, especially in contexts requiring a proper approximation of the heavy tails of the distribution. However, the stochastic approach presented here provides further advantages over traditional EVA methods. Although this approach comes with slightly higher implementation complexity and is less accessible for some practitioners, its improved performance demonstrates the value of robust stochastic modeling techniques that can explicitly account for internal climate variability and diverse rainfall-generating mechanisms. While user-friendly weather and rainfall generators are becoming more widely available35, we suggest that future work focus on developing even more accessible tools and practical guidelines to facilitate wider adoption by practitioners.

Importantly, our results show that the stochastic approach is substantially less sensitive to the exclusion of extreme events from the calibration dataset compared to conventional extreme value approaches. This reduced sensitivity is particularly valuable in the real world, where future unprecedented rainfall events are inevitably absent from historical records. By leveraging the entire rainfall time series, the stochastic framework offers a more reliable foundation for predicting unseen extremes.

In summary, our findings highlight the limitations of conventional EVA, which consistently underestimates the magnitude of record-breaking rainfall events, even when using extended synthetic datasets and considering wide confidence intervals. This underperformance is not only due to limited data availability but also reflects methodological constraints in capturing the full spectrum of rainfall variability. In contrast, the proposed stochastic weather generator approach demonstrates a more realistic capacity for simulating record-breaking rainfall events, particularly for mid-to-long durations between 3 and 12 h. Success rates for these durations consistently exceeded 95% within the 1–99th percentile range and 85% within the 5–95th percentile range. This probabilistic framework enables a statistical representation of the variability inherent in the precipitation process, enhancing the reliability of risk estimates for very rare events. As such, it provides a more adaptive and robust foundation for flood risk estimation and infrastructure planning in an uncertain climate, with direct implications for engineering practice and long-term resilience.

Methods

Observational data

Observed hourly rainfall time series from 3520 rain stations across multiple countries, including the United States, Belgium, Germany, Switzerland, the United Kingdom, South Korea, Japan, Singapore, and New Zealand, were collected from national meteorological and/or hydrological agencies (Table S1). Stations were selected based on a minimum record length of ten years, with an average record length of 44 years. Although national meteorological agencies conducted preliminary quality checks on rain gauge observations, certain records still contain anomalously high values or unrealistically long dry periods, along with missing data. To ensure that the recorded rainfall extremes are reliable and not caused by measurement errors, equipment malfunctions, or data processing mistakes, additional quality-control procedures were developed and applied to these multi-sourced hourly rainfall datasets. The quality control procedures are provided in Supplementary Information (Text S1). Ultimately, 2703 rain stations passed the quality-control procedures and were used in the formal analysis. Figure 2 illustrates the geographical distribution of the 2703 quality-controlled stations, highlighting the stations that experienced record-breaking rainfall events (see definition in the next section).

To characterize the climatic diversity of the study domain, we classified each station using the Köppen–Geiger climate classification36, based on its geographic coordinates. Specifically, we identified 315 stations in arid and semi-arid climates, 1295 in temperate climates, 925 in continental climates, 157 in tropical climates and 11 in polar climates. A detailed analysis of success rates by climate group is presented in Supplementary Fig. 2. Due to limited representation in tropical and polar climates, our analysis focused on three dominant groups with sufficient sample size: Arid/Semi-arid, Temperate, and Continental.

Identification of record-breaking rainfall events

Record-breaking rainfall events were systematically identified using hourly rainfall data from the 2703 quality-controlled rain stations. The identification involved the following steps:

(1) Extraction of annual maxima: Annual maximum rainfall for five durations (1, 3, 6, 12, and 24 h) at each station was computed using a moving window method, ensuring the largest rainfall event for each duration was recorded for every year of available data.

(2) Quantification of extreme event magnitudes: For each station and each duration, the maximum and the second maximum annual rainfall events were extracted from the series of annual maxima. The ratio between the maximum event and the second maximum event (Max1/Max2) was then calculated. This ratio represents the degree to which a certain record-breaking event exceeds the second-highest event in the record, serving as a quantifiable measure of extremity.

(3) Threshold selection for record-breaking events: Stations with ratios in the top 5% (95th percentile) of the cumulative distribution function (CDF) of Max1/Max2 across all gauges were identified as experiencing significant record-breaking events (Fig. 7). These thresholds range approximately from 1.51 to 1.57, depending on the rainfall duration, implying that the magnitude of record-breaking events is at least 50% larger than the second-largest maxima on record.

Fig. 7: Cumulative distribution functions (CDFs) of the ratio between observed maximum and second maximum rainfall events for five durations (1, 3, 6, 12, and 24 h).

The inset highlights the 95th percentile, representing the ratio thresholds identifying the top 5% of most extreme record-breaking rainfall events.

This procedure identified 135 stations per duration that experienced record-breaking rainfall events, for a total of 675 selections. Because some stations experienced record-breaking events at multiple durations, the final number of unique selected stations is 429, with an average record length of 36.2 years (highlighted in green in Fig. 2).
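The three identification steps described in this section can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the moving window is applied within a single year of hourly data, and the percentile uses linear interpolation, all of which are assumptions.

```python
def annual_max_moving_sum(hourly, window):
    """Largest rainfall total over any `window` consecutive hours of one year."""
    if len(hourly) < window:
        raise ValueError("series shorter than window")
    running = sum(hourly[:window])
    best = running
    for i in range(window, len(hourly)):
        running += hourly[i] - hourly[i - window]  # slide the window by one hour
        best = max(best, running)
    return best


def max1_max2_ratio(annual_maxima):
    """Ratio of the largest to the second-largest annual maximum."""
    top = sorted(annual_maxima, reverse=True)
    return top[0] / top[1]


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def select_record_breaking(ratio_by_station, q=95.0):
    """Return the ratio threshold and the station ids at or above it."""
    threshold = percentile(list(ratio_by_station.values()), q)
    selected = [sid for sid, r in ratio_by_station.items() if r >= threshold]
    return threshold, selected
```

In practice the selection would be run once per duration, with `ratio_by_station` holding the Max1/Max2 value of every gauge for that duration.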

Stochastic weather generator

AWE-GEN (Advanced WEather GENerator) is an hourly weather generator designed to simulate time series of weather variables such as precipitation, cloud cover, air temperature, and incoming shortwave radiation. It combines stochastic approaches with some level of process representation, offering a flexible framework for weather simulation that goes beyond purely statistical methods37. We use only the precipitation module of AWE-GEN, which is based on the Neyman–Scott Rectangular Pulse (NSRP) model28,38. The NSRP model is well suited to capturing the temporal clustering and variability of rainfall, particularly between seasonal and hourly scales, although its performance degrades at sub-hourly scales28. It generates rainfall stochastically by representing storm arrivals, the rain cells within each storm, the timing of the cells relative to the storm origin, and the durations and intensities of the cells, which makes it particularly effective for modeling extreme rainfall events. In this framework, the precipitation at a given time is the sum of the contributions of all rain cells active in that time window, which may belong to the same storm or to different storms. Model parameters are calibrated by fitting several rainfall statistics (e.g., mean, coefficient of variation, lag-1 autocorrelation, skewness, and frequency of precipitation) at different aggregation durations (1, 6, 24, and 72 h); unlike other approaches39, no fitting is specifically targeted at extreme events. A full description of the NSRP model and the associated parameter estimation is available in previous references37,38,40,41,42,43. AWE-GEN also includes a module for inter-annual variability, ensuring a realistic representation of precipitation variability from hourly to decadal scales.
The parameters of the weather generator are calibrated independently for each of the twelve months to account for seasonality. They were estimated from the observed hourly rainfall data at the selected stations with record-breaking rainfall events, excluding the entire year containing the record-breaking event, so that the model is blind to this occurrence. For a comprehensive description of the AWE-GEN model structure and parameterization, refer to Fatichi et al.37. The ability of AWE-GEN to realistically replicate the observed rainfall time series is shown in Supplementary Figs. 5 and 6.

Conventional extreme value analysis

Conventional hydrological design and risk assessment typically rely on EVA to statistically represent rare events by fitting an appropriate probability distribution6,31,44,45. Extreme rainfall time series for various durations are generally constructed using block maxima over the observational period. The GEV distribution46 is the most widely adopted and recommended distribution for extreme rainfall frequency analysis when data are obtained using the annual block maxima approach21,47,48. The CDF of the GEV is expressed as:

$$F\left(x \mid \mu ,\sigma ,\xi \right)=\exp \left[-{\left(1+\xi \left(\frac{x-\mu }{\sigma }\right)\right)}^{-1/\xi }\right]$$
(1)

where \(\mu\), \(\sigma\), and \(\xi\) represent the location, scale, and shape parameters, respectively, and the expression holds for \(1+\xi (x-\mu )/\sigma > 0\). Specifically, \(\mu\) determines the center of the distribution, \(\sigma\) describes the dispersion of data around \(\mu\), and \(\xi\) governs the tail behavior of the distribution. Depending on the shape parameter \(\xi\), the Gumbel (\(\xi =0\), obtained as the limit \(\xi \to 0\)), Fréchet (\(\xi > 0\)), and Weibull (\(\xi < 0\)) distributions can be derived as special cases. The GEV parameters were estimated using L-moments, which are preferred for their robustness against skewness and outliers and are especially advantageous for limited sample sizes31,32,49.
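As an illustration of the L-moment fit of Eq. (1), the sketch below estimates the three GEV parameters from a series of annual maxima and evaluates return levels. It assumes Hosking's widely used rational approximation for the shape parameter (the text does not state which implementation was used), and the function names are hypothetical. Note that Hosking's shape \(k\) has the opposite sign to \(\xi\) in Eq. (1).

```python
import math


def sample_l_moments(data):
    """First three sample L-moments from unbiased probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(1, n)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * x[j] for j in range(2, n)) / (n * (n - 1) * (n - 2))
    return b0, 2.0 * b1 - b0, 6.0 * b2 - 6.0 * b1 + b0


def fit_gev_lmoments(data):
    """Return (mu, sigma, xi) in the sign convention of Eq. (1)."""
    l1, l2, l3 = sample_l_moments(data)
    c = 2.0 / (3.0 + l3 / l2) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c  # Hosking's approximation, k = -xi
    if abs(k) < 1e-6:  # Gumbel limit
        sigma = l2 / math.log(2.0)
        return l1 - 0.5772156649 * sigma, sigma, 0.0
    g = math.gamma(1.0 + k)
    sigma = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    mu = l1 - sigma * (1.0 - g) / k
    return mu, sigma, -k


def gev_cdf(x, mu, sigma, xi):
    """Eq. (1), with the Gumbel form used when xi is exactly zero."""
    if xi == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))
    z = 1.0 + xi * (x - mu) / sigma
    if z <= 0.0:  # outside the support of the distribution
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-z ** (-1.0 / xi))


def gev_return_level(T, mu, sigma, xi):
    """Rainfall depth with return period T years, i.e. F = 1 - 1/T."""
    y = -math.log(1.0 - 1.0 / T)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi
```

The return period assigned to an observed event `x` by a fitted distribution follows from inverting the same relation, `1.0 / (1.0 - gev_cdf(x, mu, sigma, xi))`.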

To quantify the uncertainty in the GEV estimates, confidence intervals were calculated using a bootstrap resampling approach, generating 10³ bootstrap samples by resampling the original data with replacement. This provides robust uncertainty bounds around the fitted GEV distribution, facilitating a comparison with results from the stochastic weather generator. In this study, both the conventional EVA and the AWE-GEN rainfall generator are used under the assumption of stationarity, as is standard practice.

Comparison of weather generator vs extreme value analysis

To assess the capability of the proposed stochastic approach in simulating unprecedented record-breaking extremes, we tested both AWE-GEN and GEV-based methods using observed rainfall data from the selected stations (see section “Observational data”). For both approaches, parameters were estimated using all available records excluding the year of the record-breaking event, simulating a scenario in which the extreme event had not yet been recorded. This allowed us to evaluate whether each approach could identify the record-breaking event as a potential future rainfall extreme. For the proposed stochastic approach, an ensemble of 100 synthetic realizations, each comprising a 100-year-long hourly rainfall time series, was generated with AWE-GEN. These 100 realizations represent equiprobable stochastic replicates of the rainfall time series and thus explicitly account for internal climate variability in rainfall occurrence.

To determine whether the limited performance of the GEV-based EVA method was primarily due to data limitations, an additional comparison was performed. The GEV distribution was fitted separately to each of the 100 synthetic rainfall realizations generated by AWE-GEN. By evaluating the ability of these GEV fits to capture the record-breaking events, we assessed whether increased data availability (in this case from synthetic realizations) improved the performance of the conventional EVA approach, or whether its limitations persisted even with much longer data series.

Additionally, to evaluate whether the degree of extremity of record-breaking events influences the ability of AWE-GEN to capture them, we classified the stations into two groups: (1) success cases, where AWE-GEN simulations captured the event within their 5–95th percentile range at a given return period, and (2) failure cases, where the record-breaking event exceeded the 95th percentile. We then analyzed the distribution of Max1/Max2 for both success and failure cases through violin plots.
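The success/failure classification can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: return levels in each realization are read from the empirical curve using Weibull plotting positions with linear interpolation, the observed record is compared at a return period `T_obs` supplied by the caller, and the function names are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def empirical_return_level(annual_maxima, T):
    """Return level at return period T, with Weibull positions i / (n + 1)."""
    s = sorted(annual_maxima)
    n = len(s)
    pos = (1.0 - 1.0 / T) * (n + 1) - 1.0  # zero-based fractional rank
    pos = min(max(pos, 0.0), n - 1.0)      # clamp to the simulated range
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def classify_station(observed_record, T_obs, ensemble_maxima,
                     lower_q=5.0, upper_q=95.0):
    """Compare the observed record with the ensemble band at T_obs.

    ensemble_maxima: one list of simulated annual maxima per realization.
    """
    levels = [empirical_return_level(r, T_obs) for r in ensemble_maxima]
    low = percentile(levels, lower_q)
    high = percentile(levels, upper_q)
    if observed_record > high:
        return "failure"
    if observed_record < low:
        return "below_range"  # case not defined in the text; kept separate here
    return "success"
```

Grouping the Max1/Max2 ratios of the stations by the returned label gives the two samples compared in the violin plots.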

When analyzing a large number of stations (2703) with extensive observational data, some observed maxima are expected to have very large return periods, potentially much larger than the value assigned by the Weibull plotting position formula50. The distribution of these very large return periods depends on the number of stations and on the record lengths. To provide a robust benchmark for evaluating return periods estimated by the GEV and by two other extreme value distributions widely used in engineering practice (Gumbel and two-parameter Fréchet), we generated theoretical distributions of the return periods associated with record-breaking events. Specifically, for each station, we randomly generated as many return periods as there are years in its observational record, using the definition of return period as the inverse of the exceedance probability, with the exceedance probability drawn uniformly between 0 and 1. We then selected only the top 5% of stations ranked by the ratio of the largest to the second-largest return period, mirroring the criterion used to identify record-breaking events in the actual data. Although it is impossible to assign precise return periods to individual observed events, this stochastic approach provides a realistic theoretical distribution of return periods across all the station-years used, enabling a meaningful comparison with the return periods obtained by inverting the extreme value distributions. To account for randomness in the generation process, we repeated the procedure twenty times and confirmed that the resulting theoretical return-period distributions remained stable across replicates (Supplementary Fig. 4).
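The benchmark generation described above can be sketched in a few lines. The function name, the rounding used to pick the top 5%, and the choice to return the largest return period of each selected station are assumptions made for illustration.

```python
import random


def theoretical_record_return_periods(record_lengths, top_fraction=0.05, rng=None):
    """Largest return periods of the stations with the highest T1/T2 ratio.

    record_lengths: number of years of record at each station (>= 2 each).
    """
    rng = rng or random.Random()
    stations = []
    for n_years in record_lengths:
        # Return period = 1 / exceedance probability, with the probability
        # uniform on (0, 1]; 1 - random() avoids a division by zero.
        periods = sorted(
            (1.0 / (1.0 - rng.random()) for _ in range(n_years)), reverse=True
        )
        stations.append((periods[0] / periods[1], periods[0]))
    stations.sort(reverse=True)  # rank stations by largest/second-largest ratio
    n_selected = max(1, int(round(top_fraction * len(stations))))
    return [t_max for _, t_max in stations[:n_selected]]
```

Calling the function repeatedly with different seeds corresponds to the replicate runs used to check the stability of the resulting distribution.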