Introduction

An increase in the rate of dam failures has led to growing concerns about the state of aging dams in the United States (US) and elsewhere. The decadal rate of dam failures has been increasing, especially since the 1970s1, and this trend appears to persist (Supplemental Fig. 1a). The deteriorating state of dam maintenance, coupled with a changing climate marked by more extreme rainfall events, could precipitate failures due to overtopping2,3,4,5. The US National Inventory of Dams (NID) contains over 92,000 dams with an average age of 60 years6. Of these, approximately 25,000 are classified as posing a high or significant hazard on failure, and 74% of these dams (18,657) are rated as fair, poor, or unsatisfactory, or are not rated for their condition. Since 2000, there have been reports of 630 dam failures in the contiguous United States (CONUS) attributed to hydrologic factors7. The conventional dam design criteria require the spillway to be designed to handle the “inflow design flood”, which is usually identified as the “Probable Maximum Flood” (PMF) derived from the “Probable Maximum Precipitation” (PMP)8. These criteria are typically “event” based, where the event has a duration of 1 to 24 hours9 under conditions of extreme antecedent moisture. Most large dams were designed to withstand extreme floods with an implied return period ranging from \(10^{4}\) to \(10^{7}\) years10,11, as estimated by the PMF approach.

However, dam failures in the last 20 years and many near failures suggest more modest return periods for failure, i.e., higher failure risks. For example, climate conditions during the Oroville spillway failure in 201712, as well as during the Michigan dam failures in 202013, were shown to be relatively moderate. Notably, of the 630 recorded failures, 18 were primarily induced by snowmelt during moderate rainfall events7. The narrative of many of the recent dam failures, including Oroville, the Michigan dams, and the Libyan dam failures in 202314, highlights two main factors: (1) the overtopping failure was associated with multiple prior rainfall events that led to “watershed memory” or high “antecedent soil moisture”, and to a full reservoir at the time of the last rainfall event that led to overtopping, and (2) reservoir operators invariably kept the reservoir relatively full due to operational requirements anticipating future water demands, or to meet regulatory requirements, and hence there was low flood storage capacity when the overtopping storm occurred. These conditions fall well below the design criteria, which assume extreme storm scenarios of a given duration under extreme antecedent moisture conditions as the causal factor15.

The frequency, intensity, and clustering of extreme rainfall events vary with inter-annual (e.g., ENSO) and decadal climate variations, and anthropogenic climate change16,17,18,19. Do these variations, coupled with aging and poorly maintained dams, lead to significant increases in the risk of overtopping and dam failures? Several studies have shown that the current hydrologic infrastructure design standards are insufficient, given the substantial increase in the frequency of extreme meteorological events20,21. Consequently, an increased likelihood of hydrological failures of dams is anticipated across various regions of the US in response to the changing climate22,23. Nonetheless, to date, no systematic analysis has addressed this critical question on a continental or global scale, partly due to an underestimation of the issue’s seriousness and the complexity involved in modeling hydroclimatic conditions alongside human behavior (reservoir operation). Such an analysis requires assumptions about the operating rules and condition of the dams that are hard to validate in large-scale studies. Extensive data on rainfall are available for the US, and how changing rainfall extremes affect dam failure is an open question.

Numerous studies have provided evidence that extreme precipitation events have been increasing both in intensity and frequency in large parts of the US24,25,26,27. Climate projections suggest this trend will persist as global warming continues, as predicted by the Clausius-Clapeyron relationship24,28. However, no corresponding analysis of a sequence of rainfall events preceding a large rainfall event has been done at a continental scale. A persistent atmospheric circulation regime can bring recurrent storms into a region, culminating in an extreme flood peak and volume29,30.

We consider three main questions: 1) What primarily triggered recent hydrologically induced dam failures: a sequence of high rainfall events, a single extreme rainfall event, or a combination of both? 2) How has the likelihood of these factors evolved across the CONUS? 3) What does it imply for the changing risk of dam failure in different regions of the country relative to the data that may have been available at the time of dam design? We explore these questions using data for the 552 recent dams whose failure was attributed to hydrologic factors, excluding snow-related failures, and find that extreme rainfall event sequences, rather than an extreme 1-day rainfall, are more common at failure. The trends in the annual one-day, multi-day, and joint precipitation events across the country were analyzed to provide an indication of the possible regional exposure of CONUS dams to such events.

Results

Return periods of 1-day and precedent rainfalls during dam failures

We addressed the first question, i.e., whether the event rainfall or the preceding multi-day rainfall, or a combination of the two, was extreme at the time of failure of the 552 US dams that have failed since 2000 using a blended 1901-2021 20CRv3-ERA5 reanalysis rainfall dataset. At each site, the dam failure event rainfall’s return period was assessed using an analysis of the maximum 1-day rainfall preceding dam failure (A*). The return period was evaluated using the annual maximum daily rainfall (A) for the full period of record using a non-stationary GEV model with a linear temporal trend in the scale and location parameters (see Methods for details). A similar analysis was performed for the cumulative rainfall (K) for the k = 5 and 30 days preceding failure, using a non-stationary Gamma distribution with linear time trends for the location and scale parameters, applied to the annual maximum rainfall, Kk, for each of those durations and for the joint event (A*, K*) associated with the failure event.

We find that A* associated with failure was usually a moderate event, often with a return period of under 10 years, and around 100-1,000 years in a few cases (Fig. 1). These are much smaller events in most cases than would be associated with a PMP or PMF used as the dam failure design criterion. The preceding 5- or 30-day rainfall (K5* or K30*) also had similar to slightly larger return periods, but the joint event (A*, K5*) or (A*, K30*) was usually much more unusual. For instance, in the state of South Carolina, over 40 dam failures occurred in October 2015, attributed to persistent heavy rainfall spanning several days31 and a high 1-day rainfall. This is seen in Fig. 1d, as a cluster of dam incidents characterized by notably high return periods for the joint occurrences of A* and K5*. Another cluster featuring an extreme joint occurrence of A* and K30* is observed across the states of New Jersey and New York, associated with the historical floods caused by Hurricane Irene in 2011. In this case, prolonged heavy rainfall also occurred approximately 15 days before the hurricane32,33. Consequently, the indication is that the joint occurrence of a moderately rare 1-day rainfall event coupled with a sequence of rare wet events preceding was a factor in many of the dam failures. This compound risk of persistent high rainfall is often associated with a persistent atmospheric circulation pattern leading to recurrent high rainfall events in a region29,30.

Fig. 1: Estimated return periods for precipitation events during each dam failure across the CONUS.
Return periods of (a) 1-day maximum precipitation, A*, (b) 5-day precedent precipitation total, K5*, (c) 30-day precedent precipitation total, K30*, (d) joint occurrence of 1-day maximum precipitation and 5-day precedent precipitation total, (A*, K5*), and (e) joint occurrence of 1-day maximum precipitation and 30-day precedent precipitation total, (A*, K30*), for the observed dam failures at each location. The models fit for the probability distributions (GEV for A, Gamma for K5 and K30, and best copula for (A, K5) and (A, K30)) consider linear time trends fit independently to the 1901-2021 20CRv3-ERA5 reanalysis data at each dam location (see Methods for details), and the exceedance probabilities are evaluated from the local model at the year of failure of each dam.

Return periods of 1-day and precedent rainfalls under non-stationarity

The second question was whether the return period of the precipitation associated with the failure event had changed significantly from the time of design of the dam to the time of failure. A challenge with this analysis was that the date of construction was available for only 497 of the dams that had failed. A comparison of the exceedance probabilities of A*, K5*, and K30* for the dam failure event, evaluated for the year of failure versus the year of design using the at-site non-stationary models across the entire group of 497 dams does not suggest a significant change in these probabilities across these time periods (Fig. 2 and Supplemental Figs. 2 and 3). This is interesting given the widespread published examples of increases in the frequency of extreme daily and sub-daily rainfall21. However, there are clusters of dam sites across the CONUS where the exceedance probabilities of A*, K5*, and K30* appear to have increased and where they have decreased – e.g., in Fig. 2, the exceedance probabilities of A* have been increasing (red colored) across a cluster of dams in the northeast with a mean age of approximately 98 years, while the exceedance probabilities of K5* and K30* have been decreasing (blue colored) in some dams in the south with a mean age of around 58 years. There is a large variation in these results that may mask the change at the national level. A widespread change in the hydroclimatic risk associated with the failure event is not evident, perhaps because persistent rainfall events are implicated in the failure. In most cases, where the compound events (A*, K*) were rare at the time of failure, they were also rare at the time of design. The clustering of floods at inter-annual to decadal time scales has been demonstrated as an issue for flood risk and trend analysis34,35.

Fig. 2: Comparison of return periods for precipitation events at dam construction and incident times across the CONUS.
a Comparison between the distributions of return periods estimated at the time of dam incidents and those estimated at the time of dam construction. Here, J5* and J30* denote the joint occurrence (A*, K5*) and the joint occurrence (A*, K30*), respectively. The box represents the interquartile range, while the whiskers represent the range of values within 1.5 times the interquartile range beyond the first and third quartiles. b Ratio of return periods estimated at the time of dam incidents to those estimated at the time of dam construction for A*, (c) for K5*, (d) for K30*, (e) for joint occurrence of A* and K5*, and (f) for joint occurrence of A* and K30*. Dams lacking construction-year information are marked with an ‘X’ symbol.

Recent trends in precipitation extremes and climate-induced dam failure potential across the CONUS

Our exploration of the precipitation records confirmed evidence36,37,38 of decadal variability in precipitation extremes over the CONUS. The most recent trends for increases in precipitation extremes were often noted post-1975. Consequently, we used an instrumental data set, the 1979-2021 CPC-CONUS land precipitation to confirm the trends noted in the reanalysis data. The recent linear trends in the 10- and 100-year return period events of daily annual maximum rainfall (A) and the preceding 5- and 30-days of rainfall (K5 and K30) across the CONUS for the CPC-CONUS data are shown in Fig. 3. We note that while there are isolated regions with statistically significant trends in A, K5 and K30 (Fig. 3 and Supplemental Fig. 4a–f), there are widespread statistically significant trends in the joint events of 100-year A and 10-year K for both k = 5 and 30 days. Notably, substantial clusters of dams, specifically those with large size (height > 12.2 m or storage capacity > 1.23 million m3) and categorized as posing a high or significant hazard upon failure, are located in areas demonstrating increased occurrences of these joint events (Fig. 4 and Supplemental Fig. 4g, h). This is concerning since a) it suggests that these joint events are getting stronger/more frequent and b) even accounting for these trends, the joint (A*, K5*) or (A*, K30*) associated with the dam failure event often had very high return periods for many of the US dams that had a hydrologic failure in the last 22 years.

Fig. 3: Recent linear trends in 10-year and 100-year precipitation events across the CONUS.
Recent linear trends in (a) 10-year annual maximum daily rainfall, (b) 100-year annual maximum daily rainfall, (c) 10-year 5-day precedent rainfall total, (d) 100-year 5-day precedent rainfall total, (e) 10-year 30-day annual maximum rainfall, and (f) 100-year 30-day annual maximum rainfall, for the 1979-2021 CPC-CONUS daily rainfall dataset. The color represents the ratio of the return level in 2021 to that in 1979.

Fig. 4: Spatial distribution of aging dams and recent trends in extreme joint precipitation events across the CONUS.
Spatial distribution of aging dams and recent linear trends in (a) joint event of 100-year annual maximum daily and 10-year previous 5-day rainfall, and (b) joint event of 100-year annual maximum daily and 10-year previous 30-day rainfall, for the 1979-2021 CPC-CONUS daily rainfall dataset. The color represents the ratio of the return level in 2021 to that in 1979. Only large-sized dams, with height > 12.2 m (40 ft) or storage capacity > 1.23 million m3 (1000 acre-ft), categorized as high or significant hazard dams are displayed.

Discussion

There is a large literature that highlights the potential for higher and more frequent precipitation extremes and the associated potential for increased flooding as climate changes27,39. Much of this work focuses on daily and sub-daily precipitation. However, flooding requires a consideration of the antecedent conditions prior to a large rainfall event, i.e., the prior history of rainfall. The finding in this paper that most of the recent hydrologic dam failures are related to the joint 1-day and preceding rainfall event is consequently not a surprise. Although dams are ostensibly engineered to withstand inflow design floods under conservative hydroclimate conditions, our findings reveal that many recent failures have occurred under climate conditions significantly more moderate than those anticipated by the existing design criteria. This highlights that the traditional design criteria need to be revisited to consider a broader set of conditions that may actually lead to failure.

The fact that the frequency and severity of these compound events are increasing across the CONUS, and possibly in other places, adds an element of urgency to better understanding and predicting these conditions. Prior research emphasizes the role of persistent atmospheric circulation conditions and hemispheric scale quasi-periodic inter-annual and decadal climate variations as carriers of the information associated with such events16,17,18,19. However, given the predominant focus on climate change projections, research on the predictability of such mechanisms has been relatively limited.

The risk factor that is more difficult to map is the likely storage in the reservoir when a moderate or large rainfall event that could lead to overtopping may occur. This depends on the reservoir operator’s management strategy. Some relevant work in this direction exists40,41, but it typically requires data on reservoir inflows, outflows, and storage, which is not readily available at the continental scale. Moreover, an examination of the 2023 NID data6 reveals that for dams that are taller than 50 ft, and are rated as a significant hazard on failure, the condition of over 53% is unrated or unknown. When this is extended to include dams that have a high or significant hazard rating, the percentage of unrated or unknown condition increases to 74%. Consequently, it is difficult to assess the condition of dams that have previously failed and attribute the overtopping failure to additional factors.

A portfolio risk analysis of all US or global dams that considers the climate, fragility, and operational risk factors with a mapping to potential impacts from dam failure is needed to better understand the collective risk of cascading failure of critical infrastructure that would be triggered by dam failure and its socio-economic impacts5.

Methods

Dam incidents

The Association of State Dam Safety Officials (ASDSO) Dam Incident Database compiles a list of dam incidents that occurred within the US7. Spanning from 1874 to the present day, these records primarily originate from state programs and contain information such as the dam’s name, location, downstream hazard potential, incident date, and incident driver. Although this database may not include every dam incident that has happened since 1874, it is the most extensive database of dam incidents for the US, as ASDSO is the primary partner of the Federal Emergency Management Agency (FEMA) in the National Dam Safety Program (NDSP) and serves as the official voice for state dam safety.

For this study, we specifically focus on incidents attributable to hydrological factors or flooding since the year 2000, resulting in a selection of 552 dam incidents documented until 2021. The spatial distribution and the season of their occurrence are illustrated in Supplemental Fig. 2b. Some of these dam incidents have happened simultaneously, possibly within the duration of a single meteorological event. Consequently, although there were 552 incidents in total, they transpired over only 128 distinct days. A Mann-Kendall test shows a statistically significant upward trend in the annual number of dam incidents (Sen’s slope = 1.57, p-value = 0.001), as shown in Supplemental Fig. 1a.
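As an illustration of the trend statistic quoted above, Sen's slope is the median of all pairwise slopes in a series. The following is a minimal Python sketch, assuming equally spaced annual values; the function name `sens_slope` and the example series are hypothetical and are not the ASDSO incident counts.

```python
from itertools import combinations
from statistics import median


def sens_slope(values):
    """Median of pairwise slopes (y_j - y_i) / (j - i), j > i, for equally spaced data."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    slopes = [(values[j] - values[i]) / (j - i)
              for i, j in combinations(range(len(values)), 2)]
    return median(slopes)


# Synthetic series (not study data): a steady increase of 2 per year.
example_slope = sens_slope([1, 3, 5, 7, 9])
```

Because the estimate is a median, a single anomalous year has little influence on it, which is one reason it is commonly paired with the Mann-Kendall test.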

We reiterate that the selection of dam incidents excludes those where snowmelt water was reportedly a triggering factor. Although snow accumulation and melting frequently lead to floods and dam failures, especially during the winter and spring seasons, limited access to long-term snow data hampers a complete evaluation of their severity (i.e., return period) and their role in dam incidents.

Precipitation data

We use data sourced from the Inter-Sectoral Impact Model Inter-comparison Project – Phase 3 (ISIMIP3), which is an initiative to provide consistent, bias-corrected climate datasets for impact modeling42,43. The project offers historical simulations forced by observational climate and socioeconomic information (ISIMIP3a) and bias-corrected CMIP6 climate forcing for pre-industrial, historical, and different future emission scenarios (ISIMIP3b), employing various General Circulation Models (GCMs). Data from the previous phase, ISIMIP2, have already been widely used in several studies related to climate and water resources44.

The ISIMIP3a database provides four observational climate datasets: GSWP3-W5E5, 20CRv3-W5E5, 20CRv3-ERA5, and 20CRv3. All of them have daily temporal and 0.5° spatial resolutions. Their temporal coverage varies, and we used the 20CRv3-ERA5 dataset, which covers a period from 1901 to 2021. The 20CRv3-ERA5 dataset is generated based on the latest version of the European Reanalysis45 (ERA5) and ensemble member 1 of the Twentieth Century Reanalysis version 346 (20CRv3) interpolated to 0.5° spatial resolution. The dataset combines the ERA5 dataset for the period of 1979 – 2019 with 20CRv3 bias-adjusted towards ERA5 for the period of 1901 – 1978. The bias adjustment is done using ISIMIP3BASD v2.5.147 in order to reduce discontinuities at the 1978/1979 transition. Compared to other bias-adjustment and statistical downscaling methods, ISIMIP3BASD v2.5.1 is shown to provide a more robust bias adjustment of extreme values and to preserve trends more accurately across quantiles47,48. We used the average daily precipitation from the four nearest grid points surrounding a dam’s location, weighted by the proximity of these grid points to the dam.
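The proximity weighting of the four surrounding grid points can be sketched as an inverse-distance weighted average. The exact weighting scheme is not specified in the text, so the inverse-distance form, the distance measured in degrees, the `power` parameter, and the function name `idw_precip` are all assumptions made for illustration.

```python
import math


def idw_precip(dam_lon, dam_lat, grid_points, power=1.0):
    """Inverse-distance weighted precipitation at a dam location.

    grid_points: iterable of (lon, lat, precip) for the surrounding grid cells.
    Distances are Euclidean in degrees (an assumption, adequate for a 0.5 degree cell).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lon, lat, precip in grid_points:
        dist = math.hypot(lon - dam_lon, lat - dam_lat)
        if dist == 0.0:
            return precip  # dam sits exactly on a grid point
        weight = 1.0 / dist ** power
        weighted_sum += weight * precip
        weight_total += weight
    return weighted_sum / weight_total
```

A dam at the centre of the four points receives their simple average, while a dam close to one point is dominated by that point's value.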

In addition to the reanalysis precipitation data set, we also used the instrumental gridded daily precipitation data set, the Climate Prediction Center’s (CPC) Unified Gauge-Based Analysis of Daily Precipitation over CONUS (CPC-CONUS) from the National Oceanic and Atmospheric Administration (NOAA)49. The CPC-CONUS is generated by interpolating gauge reports from multiple information sources available at CPC, including the Global Telecommunication System (GTS), Cooperative Observer Network (COOP), and other national and international meteorological agencies. It was shown in an earlier study that the optimal interpolation objective analysis technique, which is used to create the CPC dataset, is capable of generating daily precipitation analyses with biases of less than 1% over most parts of the global land areas50. The quality of the dataset is assured by applying quality control to the collected gauge reports, comparing them to historical records and independent information from measurements at nearby stations, concurrent radar/satellite observations, and numerical model forecasts, with a particular focus on zero and extreme values51.

Calculation of return periods of A* and K*

We consider two types of precipitation statistics derived from each dam incident or rainfall grid cell i: the maximum 1-day precipitation within k days prior to the incident (A*) and the cumulative k-day precipitation total preceding the incident (K*), the latter serving as a proxy for the antecedent wetness. While a more thorough analysis might use a direct soil moisture product instead of a proxy, such data were not available over a sufficiently long period across all 552 dam failure sites; hence, precipitation data were employed. Prior studies have shown the efficacy of recent precipitation as a surrogate for soil moisture levels52.

Return periods for A* and K* are determined based on the annual maximum daily precipitation (A) and total precipitation over k days preceding the date of A (K), respectively. This involves modeling the probability distributions of A and K and subsequently establishing the return periods for A* and K*. In addition, the return periods for the joint occurrence of A* and K* are determined using bivariate copulas. The return periods for A*, K*, and their joint occurrence are determined using a non-stationary model evaluated at the year of the dam incident and the year of dam construction completion. Preliminary analyses tested various values for k (e.g., 5, 10, and 30 days) to account for differing concentration times in watersheds upstream of the dams. However, the results for k = 10 days were very similar to those obtained for k = 5 days. Consequently, we present findings for k = 5 and 30 days only. We established an upper limit of 30 days for k, as the residence time of precipitation absorbed by the soil layer typically does not exceed this period53.
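The extraction of A and K from a daily series can be sketched as follows. This is an illustrative Python version under stated assumptions: the k-day window excludes the day of the maximum itself, ties are resolved by taking the first occurrence, and the window may reach back into the previous year. The function name and argument layout are hypothetical.

```python
def annual_max_and_antecedent(series, start, end, k):
    """Return (A, K) for one year of a continuous daily precipitation series.

    series: daily precipitation values covering all years, in time order.
    start, end: index range [start, end) of the year of interest within series.
    k: number of days preceding the day of the annual maximum to accumulate.
    """
    # Index of the annual maximum daily precipitation (first occurrence on ties).
    day_of_max = max(range(start, end), key=lambda d: series[d])
    a_value = series[day_of_max]
    # Total over the k days before the maximum; the window is truncated only
    # at the very beginning of the record.
    k_value = sum(series[max(0, day_of_max - k):day_of_max])
    return a_value, k_value
```

Applying the function to every year in the record yields the paired samples of A and K to which the marginal and joint distributions are fitted.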

The Generalized Extreme Value (GEV) distribution is the limiting distribution of the maxima of independent and identically distributed (i.i.d.) random variables54. It is a popular choice for analyzing block maxima of hydrometeorological data. We use the GEV distribution to model the annual maximum daily precipitation (A). For each dam site \(i\), let \({A}_{i,t}\) denote the annual maximum daily precipitation for year t derived from the average daily precipitation series obtained from the four nearest grid points in proximity to the dam’s location. The probability distribution for \({A}_{i,t}\) is specified as:

$${A}_{i,t} \sim {GEV}({\mu }_{{A}_{i,t}},{\sigma }_{{A}_{i,t}},\,{\gamma }_{{A}_{i}})$$
(1)

where \({\mu }_{{A}_{i,t}}\), \({\sigma }_{{A}_{i,t}}\), and \({\gamma }_{{A}_{i}}\) are the location, scale, and shape parameters of the GEV distribution, respectively. The cumulative distribution function of \({A}_{i}\) evaluated at \({A}_{i,t}\) is given by:

$${F}_{{A}_{i}}\left({A}_{i,t}\right)={e}^{-{\left[1+{\gamma }_{{A}_{i}}\left(\frac{{A}_{i,t}-{\mu }_{{A}_{i,t}}}{{\sigma }_{{A}_{i,t}}}\right)\right]}^{-1/{\gamma }_{{A}_{i}}}}$$
(2)

Following Katz (2013)55, the non-stationarity of \({A}_{i,t}\) is modeled through linear trends in the parameters \({\mu }_{{A}_{i,t}},{\sigma }_{{A}_{i,t}}\) while treating \({\gamma }_{{A}_{i}}\) as time invariant:

$$\begin{array}{c}{\mu }_{{A}_{i,t}}={{\alpha }_{1}}_{i}+{{\beta }_{1}}_{i}t\\ {\sigma }_{{A}_{i,t}}={{\alpha }_{2}}_{i}+{{\beta }_{2}}_{i}t\end{array}$$
(3)
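Equations (2) and (3) can be combined into a short numerical sketch. The Python code below is illustrative only: the function names are hypothetical, the Gumbel limit is used when the shape parameter is numerically zero, and any parameter values passed in are the caller's own rather than fitted values from this study.

```python
import math


def gev_cdf(x, mu, sigma, gamma):
    """GEV cumulative distribution function in the form of Eq. (2)."""
    if sigma <= 0.0:
        raise ValueError("scale parameter must be positive")
    if abs(gamma) < 1e-12:
        # Gumbel limit as the shape parameter tends to zero.
        return math.exp(-math.exp(-(x - mu) / sigma))
    z = 1.0 + gamma * (x - mu) / sigma
    if z <= 0.0:
        # x lies outside the support: below the lower bound when gamma > 0,
        # above the upper bound when gamma < 0.
        return 0.0 if gamma > 0.0 else 1.0
    return math.exp(-z ** (-1.0 / gamma))


def nonstationary_gev_cdf(x, t, alpha1, beta1, alpha2, beta2, gamma):
    """GEV CDF with linear trends in location and scale, following Eq. (3)."""
    mu_t = alpha1 + beta1 * t
    sigma_t = alpha2 + beta2 * t
    return gev_cdf(x, mu_t, sigma_t, gamma)
```

Setting `beta1` or `beta2` to zero recovers the candidate models with a trend in only one parameter, and setting both to zero recovers the stationary model.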

We test the GEV distributions under four different assumptions to identify the best fit for the observed \({A}_{i,t}\) at each site: 1) No temporal trend, 2) Linear trend in \({\mu }_{{A}_{i,t}}\) only, 3) Linear trend in \({\sigma }_{{A}_{i,t}}\) only, and 4) Linear trend in both \({\mu }_{{A}_{i,t}}\) and \({\sigma }_{{A}_{i,t}}\). The model that yields the lowest Bayesian Information Criterion (BIC) is selected for each site. We use the ‘ismev’ package56 in R to estimate the parameters of the univariate GEV distribution, employing the Maximum Likelihood estimators detailed in Coles (2001)54.
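The BIC-based choice among the four candidate models can be sketched as below. This is a generic illustration, not the `ismev` workflow: the maximized log-likelihoods are assumed to be available from a separate fitting step, and the values in the example dictionary are hypothetical placeholders.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_by_bic(candidates, n_obs):
    """Return the name of the candidate with the lowest BIC.

    candidates: dict mapping model name -> (maximized log-likelihood, n_params).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))


# Hypothetical log-likelihoods (not study results) for the four GEV variants.
# Parameter counts: 3 for the stationary model, +1 for each linear trend.
example_candidates = {
    "stationary": (-500.0, 3),
    "trend_in_mu": (-499.0, 4),
    "trend_in_sigma": (-499.5, 4),
    "trend_in_both": (-498.8, 5),
}
example_choice = select_by_bic(example_candidates, n_obs=121)
```

The `ln(n)` penalty means that a trend term is retained only when it improves the log-likelihood by more than about half of `ln(n)` per added parameter.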

The k-day antecedent precipitation (K) is modeled using the 2-parameter gamma distribution (G2). The G2 distribution is considered a reasonable choice for daily or weekly precipitation events57,58. Let \({K}_{i,t}\) denote the total precipitation over k days preceding the date of the annual maximum precipitation in year t at the dam location of incident \(i\). Reparametrizing \({1/{\sigma }_{{K}_{i,t}}}^{2}\) and \({{\sigma }_{{K}_{i,t}}}^{2}{\mu }_{{K}_{i,t}}\) into \({\alpha }_{{K}_{i,t}}\) and \({\beta }_{{K}_{i,t}}\), as proposed by Johnson et al.59, allows \({K}_{i,t}\) to be modeled as:

$${K}_{i,t} \sim G2({\alpha }_{{K}_{i,t}},{\beta }_{{K}_{i,t}})$$
(4)

where \({\alpha }_{{K}_{i,t}}\) and \({\beta }_{{K}_{i,t}}\) indicate the shape and scale parameters of the G2 distribution, respectively. The cumulative distribution function of G2 for \({K}_{i,t}\) is given by:

$${F}_{{K}_{i}}\left({K}_{i,t}\right)=\frac{1}{\Gamma \left({\alpha }_{{K}_{i,t}}\right)}\gamma \left({\alpha }_{{K}_{i,t}},\frac{{K}_{i,t}}{{\beta }_{{K}_{i,t}}}\right)$$
(5)

where \(\Gamma\) is the gamma function and \(\gamma\) is the lower incomplete gamma function. We model the non-stationarity of \({K}_{i,t}\) through the parameters \({\mu }_{{K}_{i,t}}\) and \({\sigma }_{{K}_{i,t}}\) by assuming they may change linearly over time:

$$\begin{array}{c}{\mu }_{{K}_{i,t}}={{\alpha }_{3}}_{i}+{{\beta }_{3}}_{i}t\\ {\sigma }_{{K}_{i,t}}={{\alpha }_{4}}_{i}+{{\beta }_{4}}_{i}t\end{array}$$
(6)
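A numerical sketch of Eqs. (4) to (6) is given below. It is illustrative only: the regularized lower incomplete gamma function is evaluated with a simple power series (adequate for moderate arguments, but not a substitute for a library routine in the far upper tail), and all function names are hypothetical. Under this reparametrization, \({\mu }_{{K}_{i,t}}\) is the distribution mean and \({\sigma }_{{K}_{i,t}}\) is the coefficient of variation.

```python
import math


def reg_lower_gamma(a, x, tol=1e-14, max_iter=100000):
    """Regularized lower incomplete gamma function P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > tol * abs(total) and n < max_iter:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gamma_cdf(k_value, mu, sigma):
    """Gamma CDF of Eq. (5) with shape 1/sigma^2 and scale sigma^2 * mu."""
    shape = 1.0 / sigma ** 2
    scale = sigma ** 2 * mu
    return reg_lower_gamma(shape, k_value / scale)


def nonstationary_gamma_cdf(k_value, t, alpha3, beta3, alpha4, beta4):
    """Gamma CDF with linear trends in mu and sigma, following Eq. (6)."""
    mu_t = alpha3 + beta3 * t
    sigma_t = alpha4 + beta4 * t
    if mu_t <= 0.0 or sigma_t <= 0.0:
        raise ValueError("mu and sigma must remain positive")
    return gamma_cdf(k_value, mu_t, sigma_t)
```

With `sigma = 1` the shape parameter equals one and the distribution reduces to an exponential with mean `mu`, which gives a convenient check of the implementation.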

Similar to the process for \({A}_{i,t}\), the best G2 distribution for each site is identified by assessing four different assumptions of linear trends in \({\mu }_{{K}_{i,t}}\) and \({\sigma }_{{K}_{i,t}}\) based on BIC. The ‘gamlss’ package60 in R was used to infer the parameters of the univariate G2 distribution. The estimated values of the distribution parameters (\({\mu }_{{A}_{i,t}}\), \({\sigma }_{{A}_{i,t}}\), \({\mu }_{{K}_{i,t}}\), and \({\sigma }_{{K}_{i,t}}\)) and their trends are shown in Supplemental Figures 5 and 6, respectively.

For each dam incident \(i\), the return periods for \({A}_{i}^{* }\) (\({T}_{{A}_{i}^{* }}\)) and \({K}_{i}^{* }\) (\({T}_{{K}_{i}^{* }}\)) at year t are determined from the probability distributions modeled for \({A}_{i}\) and \({K}_{i}\):

$${T}_{{A}_{i}^{* }}(t)=\frac{1}{1-{F}_{{A}_{i}}\left({A}_{i}^{* }|{\mu }_{{A}_{i,t}},\,{\sigma }_{{A}_{i,t}},\,{\gamma }_{{A}_{i}}\right)}$$
(7)
$${T}_{{K}_{i}^{* }}(t)=\frac{1}{1-{F}_{{K}_{i}}\left({K}_{i}^{* }|{\mu }_{{K}_{i,t}},\,{\sigma }_{{K}_{i,t}}\right)}$$
(8)

The probability distributions of \({A}_{i}\) and \({K}_{i}\) serve as the null distributions for \({A}_{i}^{* }\) and \({K}_{i}^{* }\).
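The conversion from a non-exceedance probability to a return period, and the comparison between the incident year and the construction year, can be sketched as follows. The function names are hypothetical, and the CDF values are assumed to come from the fitted non-stationary models evaluated at the two years.

```python
def return_period(cdf_value):
    """Return period T = 1 / (1 - F) for a non-exceedance probability F in [0, 1)."""
    if not 0.0 <= cdf_value < 1.0:
        raise ValueError("cdf_value must lie in [0, 1)")
    return 1.0 / (1.0 - cdf_value)


def return_period_ratio(cdf_at_incident, cdf_at_construction):
    """Ratio of the return period at the incident year to that at the construction year."""
    return return_period(cdf_at_incident) / return_period(cdf_at_construction)
```

A ratio below one indicates that the same rainfall amount had become more likely by the time of the incident than it was when the dam was built.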

Calculation of return period of the joint occurrence of A* and K*

The determination of the return period for the simultaneous occurrence of A* and K* relies on establishing the joint probability distribution of A and K. In this study, the dependence structure between A and K is modeled using copulas, which have been used extensively to specify multivariate distributions for statistical modeling of data. Prior to modeling their joint distributions, an assessment was conducted to ascertain whether the dependence between A and K changes over time.

To investigate whether the dependence between \({A}_{i,t}\) and \({K}_{i,t}\) changes over time, we test the temporal variation in the correlation \({\rho }_{i,t}\) between \({F}_{{A}_{i}}\left({A}_{i,t}\right)\) and \({F}_{{K}_{i}}\left({K}_{i,t}\right).\) For simplicity, we denote \({F}_{{A}_{i}}({A}_{i,t})\) as \({u}_{i,t}\) and \({F}_{{K}_{i}}({K}_{i,t})\) as \({v}_{i,t}\), hereafter. Note that by Eqs. (2) and (5), the temporal trends in the mean and variance of \({A}_{i,t}\) and \({K}_{i,t}\) are effectively addressed in the construction of \({u}_{i,t}\) and \({v}_{i,t}\), and thus we expect the mean and variance of \({u}_{i,t}\) and \({v}_{i,t}\) are constant over time. Then an estimate of \({\rho }_{i}\) would be provided by:

$${u}_{i,t}={F}_{{A}_{i}}({A}_{i,t}),\,{v}_{i,t}={F}_{{K}_{i}}({K}_{i,t})$$
(9)
$${\hat{\rho }}_{i}=\frac{\sum _{t}\left({u}_{i,t}-{m}_{{u}_{i}}\right)\left({v}_{i,t}-{m}_{{v}_{i}}\right)}{\left(n-1\right){s}_{{u}_{i}}{s}_{{v}_{i}}}=\frac{\sum _{t}{u}_{i,t}{v}_{i,t}-n\,{m}_{{u}_{i}}{m}_{{v}_{i}}}{\left(n-1\right){s}_{{u}_{i}}{s}_{{v}_{i}}}$$
(10)

where n is the number of years, and \({m}_{{u}_{i}}\), \({m}_{{v}_{i}}\), \({s}_{{u}_{i}}\), and \({s}_{{v}_{i}}\) are the sample means and standard deviations of \({u}_{i,{t}}\) and \({v}_{i,t}\), respectively. Because these means and standard deviations are expected to be constant over time, a stationary dependence between \({u}_{i,t}\) and \({v}_{i,t}\) implies that the expected value of the cross product \({r}_{i,t}={u}_{i,t}{v}_{i,t}\) is invariant with time. We explore the stationarity of \({r}_{i,t}\) using the Mann-Kendall test to assess whether modeling the non-stationarity in the dependence structure is necessary. This analysis revealed no significant trends in dependence between \({A}_{i,t}\) and \({K}_{i,t}\) at the selected dams, implying that a stationary dependence structure suffices for bivariate models of \({A}_{i,t}\) and \({K}_{i,t}\).
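The stationarity check on the cross product can be sketched with a basic Mann-Kendall test. The version below is a minimal illustration that uses the normal approximation and omits the variance correction for ties, which is a simplifying assumption; the function names are hypothetical.

```python
import math


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test (normal approximation, no tie correction).

    Returns (S, Z, p_value).
    """
    n = len(values)
    s_stat = sum((values[j] > values[i]) - (values[j] < values[i])
                 for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s_stat > 0:
        z_stat = (s_stat - 1) / math.sqrt(var_s)
    elif s_stat < 0:
        z_stat = (s_stat + 1) / math.sqrt(var_s)
    else:
        z_stat = 0.0
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return s_stat, z_stat, p_value


def dependence_trend(u_series, v_series):
    """Apply the test to the cross product r_t = u_t * v_t of the marginal CDF values."""
    cross_product = [u * v for u, v in zip(u_series, v_series)]
    return mann_kendall(cross_product)
```

A large p-value for the cross product series is consistent with a time-invariant dependence structure, in which case a single stationary copula per site is a reasonable choice.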

The joint cumulative distribution function of \({A}_{i,t}\) and \({K}_{i,t}\) is then represented in terms of their marginals \({u}_{i,t}\) and \({v}_{i,t}\), with a stationary copula function \({C}_{i}\), defined by a copula parameter \(\theta\):

$${F}_{{A}_{i},{K}_{i}}\left({A}_{i,t},{K}_{i,t}\right)={C}_{i}\left({u}_{i,t},{v}_{i,t}{\rm{;}}\,\theta \right)$$
(11)

There are several parametric copula families available, and the strength of bivariate dependence is usually controlled by \(\theta\). Using the ‘VineCopula’ package61 in R, we determine the optimal set of copula family and \(\theta\) that best fits the given bivariate distribution based on the BIC statistics for each site \(i\).

As we accommodate non-stationarity for both \({A}_{i,t}\) and \({K}_{i,t}\), the probability of their joint exceedances for a fixed set of thresholds, such as \({A}_{i}^{* }\) and \({K}_{i}^{* }\), may also vary with time. Here, we measure the probability of joint exceedances for \({A}_{i}^{* }\) and \({K}_{i}^{* }\) at time t, i.e., \(p\left({A}_{i,t}\, > \,{A}_{i}^{* },\,{K}_{i,t} \,> \,{K}_{i}^{* }\right)\), using copulas62:

$$p\left({A}_{i,t} \,> \,{A}_{i}^{* },\,{K}_{i,t}\, > \,{K}_{i}^{* }\right)={G}_{t}\left({A}_{i}^{* },{K}_{i}^{* }\right)=1-{u}_{i,t}^{* }-{v}_{i,t}^{* }+{C}_{i}\left({u}_{i,t}^{* },{v}_{i,t}^{* }{\rm{;}}\,\theta \right)$$
(12)
$${u}_{i,t}^{* }={F}_{{A}_{i}}\left({A}_{i}^{* }|{\mu }_{{A}_{i,t}},\,{\sigma }_{{A}_{i,t}},\,{\gamma }_{{A}_{i}}\right),\,{v}_{i,t}^{* }={F}_{{K}_{i}}\left({K}_{i}^{* }|{\mu }_{{K}_{i,t}},\,{\sigma }_{{K}_{i,t}}\right)$$
(13)

Then the return period of the joint occurrence of \({A}_{i}^{* }\) and \({K}_{i}^{* }\) at year t, \({T}_{{A}_{i}^{* },{K}_{i}^{* }}(t)\), is calculated as:

$${T}_{{A}_{i}^{* },{K}_{i}^{* }}(t)=\frac{1}{{G}_{t}\left({A}_{i}^{* },{K}_{i}^{* }\right)}$$
(14)
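Equations (12) and (14) can be illustrated with a short sketch. The Gumbel copula is used here purely as an example family; in the analysis the family and \(\theta\) are selected per site by BIC, so the choice of family, the function names, and any parameter values are assumptions.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v; theta) for theta >= 1 (theta = 1 gives independence)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    inner = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-inner ** (1.0 / theta))


def joint_return_period(u_star, v_star, theta, copula=gumbel_copula):
    """Return period of the joint exceedance, following Eqs. (12) and (14).

    u_star, v_star: marginal non-exceedance probabilities of A* and K* at year t.
    """
    joint_exceedance = 1.0 - u_star - v_star + copula(u_star, v_star, theta)
    if joint_exceedance <= 0.0:
        raise ValueError("joint exceedance probability is not positive")
    return 1.0 / joint_exceedance
```

For two marginal 10-year events, independence (`theta = 1`) gives a joint return period of 100 years, whereas positive dependence shortens it, because wet antecedent periods and large daily maxima then tend to occur together.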

The uncertainties of the estimated return periods for \({A}_{i}^{* }\), \({K}_{i}^{* }\), and their joint exceedance are shown in Supplemental Fig. 7.