Introduction

Challenge: A changing climate suggests increasing aridity in many regions throughout the world, such as the western United States, where warmer temperatures are projected to exacerbate the phase transition of snow to rain, increase evapotranspiration, and result in persistent drought1,2,3. However, the relative roles of precipitation and seasonal temperature changes, and how they alter vegetation and runoff dynamics, are debated4,5. The Upper Colorado River Basin (UCRB; Fig. 1, and Supplementary Figs. S1 and S2) has experienced a multi-decadal drought since the beginning of the 21st century6,7,8,9,10,11, with considerable impacts on local and regional economies and ecosystems. The UCRB serves as the primary hydrologic source for the entire Colorado River Basin (Basin), generating nearly 90% of the Colorado River’s annual natural streamflow12,13. The Colorado River drains approximately 647,500 km2 (250,000 mi2) in the western United States and 9065 km2 (3500 mi2) in northwestern Mexico, serving seven U.S. states, two Mexican states, 30 in-basin Native American Tribes, in-basin cities such as Las Vegas and Phoenix, and out-of-basin (via trans-basin diversions) cities such as Cheyenne, Denver, Salt Lake City, Albuquerque, Los Angeles, San Diego, and Tijuana. It provides agricultural, municipal, and industrial supplies to approximately 40 million people and approximately 2.43 million hectares (6 million acres) of farmland, and supports numerous critical ecosystems, species, and habitats14.

Fig. 1: Upper Colorado River Basin (UCRB) sub-basins.

UCRB Colorado River Simulation System (CRSS) model domain is shown with a black outline. The twenty CRSS model sub-basins, including the basin identifiers (CRSSID, 1–20; Supplementary Table S1), are shown with gray outlines. The closed basin portion of the UCRB is filled in gray. The Lees Ferry gage is shown with an upward blue triangle, including the approximate location of Lake Powell and major UCRB tributaries.

The variability in precipitation, reduction in streamflow and the concomitant increase in surface air temperature in the recent multi-decadal UCRB drought are well recognized5,6,7,10,12,15,16,17,18,19,20,21,22,23,24. However, the reduction in streamflow has been more than proportional to any reductions in precipitation. This divergence is exemplified in water years 2021 and 2022, where precipitation was 83% and 97% of the 1991 to 2020 UCRB average25, yet the unregulated inflow into the major storage structure in the UCRB, Glen Canyon Dam—Lake Powell, was 36% and 63% of average26,27, respectively. Conversely, in water year 2023, precipitation was 111% of average25 and unregulated Lake Powell inflow was 140% of average28. Thus, the sensitivity of streamflow to precipitation appears to be nonlinear and asymmetric. The comparatively high 2023 streamflow given the prior dry year highlights the potential path dependence of hydrologic outcomes to a climate sequence mediated by an active ecosystem that responds to both climate and hydrology.
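The asymmetry can be made concrete with the percent-of-average figures quoted above. The following minimal sketch divides unregulated Lake Powell inflow by precipitation for each water year; the ratio and the name `response_ratio` are illustrative devices introduced here, not quantities used in the analysis.

```python
# Percent-of-average values quoted in the text (water years 2021-2023).
precip_pct = {2021: 83, 2022: 97, 2023: 111}   # relative to the 1991-2020 UCRB average
inflow_pct = {2021: 36, 2022: 63, 2023: 140}   # unregulated Lake Powell inflow


def response_ratio(inflow, precip):
    """Illustrative index: inflow anomaly relative to precipitation anomaly."""
    return inflow / precip


ratios = {wy: round(response_ratio(inflow_pct[wy], precip_pct[wy]), 2)
          for wy in precip_pct}

for wy, ratio in ratios.items():
    print(wy, ratio)
```

A proportional response would give ratios near 1 in every year. Here the ratio is well below 1 in the two dry years and above 1 in the wet year, which is the nonlinear, asymmetric behavior described in the text.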

Researchers hypothesize that multiple, often compounding mechanisms are contributing to observed streamflow declines in the UCRB: (a) increased evapotranspiration, bare soil evaporation, and snow ablation from increasing temperatures, particularly in the spring and summer, or summer and fall, have amplified streamflow reductions, and these effects will intensify under an increasingly warming climate10,15,16,18,19,20,21,22,23,29; (b) warming-driven snow loss reduces surface albedo, increasing net radiation and evapotranspiration, among other effects, thereby further suppressing streamflow30; and (c) declines in spring precipitation, coupled with reduced albedo from diminished cloud cover, lead to increased solar radiation, intensifying warming and thus earlier snowmelt, which collectively increase potential evapotranspiration and exacerbate streamflow losses31.

The central question we focus on is how surface runoff efficiency (RE), i.e., the amount of surface flow (Qs) generated per unit precipitation (P), may change with future P and temperature (T) conditions. The dynamics of P and T as they influence streamflow, including through vegetation, need to be better understood to advise water resource management policies and practices, including those related to consumption and conservation. To explore this question formally, we develop specific composite hypotheses, based on the literature, that can be tested with readily available and publicly accessible data rather than outputs from simulation models, and we use formal causal inference approaches to refine and identify the plausible causal pathways or networks that explain the dynamics of RE conditional on the observables. We find that winter-spring P, rather than summer T, is the dominant driver of the changes in RE. The seasonal maximum of snow water equivalent (SWE), as determined by P, influences the spring T and the biomass production reflecting changes in vegetation phenology and activity, implying a higher evapotranspiration demand in the spring. Summer T, which has been considered a driver of increased evapotranspiration in other studies, does not emerge as a statistically significant driver conditional on the information encoded in the winter-spring P, T, and vegetation dynamics.

Setting: The UCRB contributes approximately 90% of the annual natural streamflow to the full Colorado River Basin12,13. Approximately 80% of this flow occurs between March and August12. Although snow accounts for approximately 50% of annual precipitation in the UCRB32, it produces an estimated 70% of the annual streamflow33. This disproportionate contribution reflects the basin’s snowmelt-dominated hydrology, particularly given that nearly 73% of water year precipitation occurs from early-winter through late-spring (October through June; Supplementary Table S3), underscoring the UCRB’s strong dependence on cold-season snow accumulation and spring melt dynamics; hence, we refer to the period October through June as the early-winter through late-spring precipitation season, corresponding to the snow accumulation season and the primary phase of melt that initiates and sustains the bulk of streamflow generation, with remaining contributions extending into the summer months. The dry season base flow is sustained by the earlier precipitation through groundwater recharge32,34.

Rather than relying solely on the Colorado River at Lees Ferry gage, which lies near the boundary between the UCRB and the Lower Colorado River Basin and has been the focus of many prior studies, we selected 16 sub-basins in the UCRB (Fig. 1, Supplementary Figs. S1 and S2, and Supplementary Table S1) based on the available naturalized streamflow data (water years 1906–2020) to better capture spatial variability in hydroclimatic drivers. The sub-basins are identified by a Colorado River Simulation System identification (CRSSID) number, used by the Bureau of Reclamation (Reclamation) in its water resources modeling. The sub-basins range in size from just under 778 km2 (300 mi2) to just under 51,800 km2 (20,000 mi2), with an average of approximately 11,655 km2 (4500 mi2). Average sub-basin elevations range from approximately 1981 m (6500 ft) to 3353 m (11,000 ft), with an average summit-to-outlet elevation range of approximately 1524 m (5000 ft). On average, over 50% of the area of the individual sub-basins is above 2286 m (7500 ft), and nearly 20% is above 3048 m (10,000 ft) (Supplementary Fig. S1 and Supplementary Table S1). Hydroclimates and ecological diversity vary across and within the sub-basins. Across the full UCRB, natural vegetation, comprising grasslands and wetlands (~31%), shrublands (~65%), and forests (~26%), accounts for ~93% of the land cover, while agriculture accounts for 2%, and water, barren, and developed areas for ~4% (Supplementary Fig. S2 and Supplementary Table S2). The common trend in surface runoff efficiency (RE) across the UCRB can be illustrated using the two most productive of the 16 sub-basins examined (Fig. 1 and Supplementary Table S1), which collectively account for, on average, ~25% of the UCRB’s annual streamflow.
While RE generally varies positively (in phase) with precipitation (P) and negatively (out of phase) with temperature (T), the long-term trend in RE is decreasing across the full instrumental period (water years 1906 through 2020). We find P to be mostly stable although decreasing nominally, while T is increasing (Fig. 2 and Supplementary Table S4) indicating a level of connection between RE and a clear warming trend which reinforces a potential T or evapotranspiration modulation signal. The sharp T increase in the mid-1980s aligns with well-documented warming trends in the late 20th century12,35,36,37. Additionally, there are coherent decadal variations in these hydroclimate variables, suggesting that large-scale ocean-atmosphere circulations produce decadal regimes with wet-cool and dry-warm phases that amplify or attenuate variations in RE, respectively. Multiple hydroclimatic mechanisms drive streamflow with varied influence and timing. While the region has a mixed topography, land cover, and land use (Supplementary Figs. S1 and S2, and Supplementary Tables S1 and S2), a common large-scale climate signal may induce processes in the dominant streamflow sub-basins and account for the decadal variations. Relatedly, Zhao & Zhang link tropical Pacific sea surface temperature and spring precipitation in the UCRB38.

Fig. 2: Temporal trends in surface runoff efficiency (REt), precipitation (Pt), and spring temperature (TMAMJt).

Standardized RE, P, and March through June average temperature (TMAMJt) for water years 1906–2020 are shown with black, blue, and red points, respectively, with corresponding LOWESS (locally weighted scatterplot smoothing) lines using an 11.5-year moving window for two sub-basins: (a) Colorado River at Glenwood Springs, CO (CRSSID 1; contributes the highest flow fraction, on average ~15% of UCRB annual streamflow); (b) Colorado River near Cameo, CO (CRSSID 2; contributes the second-highest flow fraction, on average ~10% of UCRB annual streamflow). The early 21st-century 21-year drought period (2000–2020) is highlighted in yellow. Details of monotonic trend and change-point analyses are provided in Supplementary Table S4.

Approach: A common approach used by past studies is to use spatially distributed physics-based hydrologic models (see Supplementary Note S1 for discussion and examples) to estimate streamflow and hydrologic fluxes in the basin39. These are widely considered to be robust approaches. However, the interpretations and projections of hydrologic processes and fluxes from these models generally vary due to uncertainties in the inputs and the parameters, and most do not address landscape and vegetation changes that may have occurred or changes in model parameters over time15,16,40,41,42,43,44,45,46,47. Therefore, direct inferences as to the potential causal mechanisms for the dependence of surface runoff efficiency (RE) on the climatic inputs using observations in the UCRB are valuable.

Our data-based study employs Bayesian Network (BN) based causality analysis. BNs48,49,50 have been used in many scientific fields, including hydrology51,52,53, for data-informed structured causal inferences with hypotheses postulated as directed acyclic graphs (DAGs). We develop and test specific causal hypotheses with conditional probability models using observed data and confirm the results using multivariate regression. Trends and associations with large scale atmospheric circulation mechanisms are also explored. Recent studies have also used methods such as PCMCI+ (Peter-Clark Momentary Conditional Independence Plus)54,55,56,57,58,59, which are conceptually related to the BN literature but are designed to explore causality in multivariate time series where a priori causal hypotheses may not be forthcoming (see Supplementary Note S2 for additional details). A workflow diagram of our general analytical approach is presented in Supplementary Fig. S11.

Hypotheses: For the UCRB, we analyzed a set of plausible mechanistic hypotheses regarding the determinants of water year surface runoff efficiency (RE). For each water year, t, REt is defined as (Qt − Qbt)/(Pt A), where Qt is the naturalized streamflow integrated over the March through September period; Qbt is the estimated base flow component of Qt, so that (Qt − Qbt) is the naturalized surface flow (Qst); Pt is the total water year P; and A is the contributing drainage area for a given sub-basin (see Supplementary Note S3 for additional details). Subsurface losses from P are considered to manifest as Qb, and evapotranspiration effects are represented using Normalized Difference Vegetation Index (NDVI) values. The March through September streamflow represents the full water year surface runoff period in response to the dominant seasonal precipitation input, and the temperature and vegetation growth response over these months.
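As a minimal sketch, the definition of REt can be written as a small function. The units (streamflow volumes in cubic meters, precipitation depth in meters, area in square meters), the function name, and the input numbers are assumptions chosen only so that the ratio is dimensionless; they are not taken from the study's data.

```python
def runoff_efficiency(q_total_m3, q_base_m3, precip_m, area_m2):
    """RE_t = (Q_t - Qb_t) / (P_t * A), a dimensionless ratio.

    q_total_m3: naturalized March-September streamflow volume (Q_t)
    q_base_m3:  estimated base flow component of Q_t (Qb_t)
    precip_m:   total water year precipitation depth (P_t)
    area_m2:    contributing drainage area (A)
    Units are assumed here for illustration.
    """
    if precip_m <= 0 or area_m2 <= 0:
        raise ValueError("precipitation and area must be positive")
    surface_flow = q_total_m3 - q_base_m3        # Qs_t
    return surface_flow / (precip_m * area_m2)


# Hypothetical sub-basin: 10,000 km2, 0.5 m of precipitation,
# 2.0e9 m3 of total flow of which 0.5e9 m3 is base flow.
re_t = runoff_efficiency(q_total_m3=2.0e9, q_base_m3=0.5e9,
                         precip_m=0.5, area_m2=1.0e10)
print(re_t)
```

With these hypothetical inputs, 1.5e9 m3 of surface flow is generated from 5.0e9 m3 of precipitation input.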

Our hypotheses consider a hierarchical and multivariate dependence structure, allowing some variables (e.g., prior water year precipitation, Pt-1) to inform other variables directly (e.g., REt), as well as through an intermediate variable (e.g., Qbt). Under the BN learning model, one can test whether the information contained in the intermediate variable is sufficient or whether the primary causal variable conveys additional independent information. A DAG (Fig. 3) was developed to provide a structure to test the hypotheses. Using this DAG, we tested various mechanistic hypotheses and quantified arc strengths with probability values (p-values) and the Bayesian Information Criterion (BIC). This approach allowed us to evaluate goodness of fit and model complexity, and to assess conditional independence in order to validate or reject causal hypotheses. Additionally, this testing clarified causality by identifying the relative strength contributions from correlated variables.
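The mediation test described above can be sketched for a single node with a linear-Gaussian model: compare the BIC of REt regressed on the intermediate variable alone with the BIC when a direct arc from the primary variable is added. This is an illustration on synthetic data under assumed coefficients, not the study's implementation; the helper `gaussian_bic` is hypothetical, and the sketch uses the convention BIC = k ln(n) − 2 ln(L), where lower is better (some software reports a rescaled score with the opposite sign).

```python
import numpy as np


def gaussian_bic(y, predictors):
    """BIC (lower is better) of a linear-Gaussian node: y ~ intercept + predictors."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                    # maximum-likelihood error variance
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1                            # regression coefficients + variance
    return k * np.log(n) - 2 * log_lik


rng = np.random.default_rng(0)
n = 115                                           # length of the full period
p_prev = rng.normal(size=n)                       # synthetic stand-in for P_{t-1}
qb = 0.7 * p_prev + rng.normal(scale=0.7, size=n)               # arc P_{t-1} -> Qb_t
re = 0.8 * qb + 0.8 * p_prev + rng.normal(scale=0.5, size=n)    # both arcs into RE_t

bic_mediated_only = gaussian_bic(re, [qb])
bic_with_direct_arc = gaussian_bic(re, [qb, p_prev])
print(bic_mediated_only, bic_with_direct_arc)
```

Because the synthetic REt depends on P_{t-1} directly as well as through Qb_t, the model with the direct arc attains the lower BIC. If the intermediate variable carried all of the information, the extra parameter would tend to be penalized and the arc dropped.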

Fig. 3: Directed Acyclic Graph (DAG) for testing mechanistic hypotheses.

Colored arcs represent hypothesis groups; refer to Supplementary Note S4 (Conceptual Hypotheses and Hypotheses Groups) for a description. Individual hypothesis links as well as the composite network of hypotheses are investigated using information-theoretic criteria (e.g., Bayesian Information Criterion) to synthesize a causal structure, a subset of the full network above, that is most plausible given the data on all the variables.

Based on the varied findings in the literature, we considered the role that past and concurrent P and T may play in changing RE through changes in vegetation (and hence evapotranspiration), snow accumulation, and base flow (and hence soil moisture). T and the NDVI are considered seasonally for each water year t: (a) for the spring season, the average T from March through June (TMAMJt) and the average NDVI from March through May (NDVIMAMt); and (b) for the summer season, the average T and NDVI from July through September (TJASt and NDVIJASt, respectively). Further, for the fall season, the antecedent NDVI (NDVISONant) is represented by the average NDVI for September in water year (t-1) and October and November in the current water year (t). The maximum snow water equivalent (SWEMAXt) is taken as the peak SWE value from October through June of each water year.

We examined two periods of record in our analysis: (1) the full period, i.e., the complete 115-year instrumental period over water years 1906 through 2020; and (2) the NDVI – SWE period, i.e., the recent 38-year subset of the full period for which NDVI and SWE data are available, water years 1983 through 2020. We start our analysis with the full period and then move to the NDVI – SWE period for insights into the role of NDVI and SWE. While not as longitudinally extensive as the full period, the NDVI – SWE period is of much interest given the substantial temperature increase observed over the past four decades12,35,36,37 (e.g., see Fig. 2 and Supplementary Table S4), and specifically to reveal the potential role of changing vegetation and snow in response to P and T.

Because UCRB runoff is snowmelt-dominated and approximately 73% of annual precipitation occurs during the early-winter through late-spring months (October through June; Supplementary Table S3), we used total water year precipitation as the variable of interest for understanding runoff efficiency. This choice is supported by a very strong and highly significant correlation between October through June precipitation and water year precipitation in both the full period (r = 0.90, p ≤ 0.01) and the NDVI – SWE period (r = 0.92, p ≤ 0.01; Table 2).

With respect to the segmented NDVI spans of March through May and July through September, June NDVI was excluded to avoid conflating signals between natural green-up in March through May and the subsequent full irrigated agriculture season. In the UCRB, June marks a transitional period in which NDVI increasingly reflects irrigated cropland activity in addition to native vegetation dynamics, potentially obscuring early-season climatic influences on greenness (see Supplementary Note S5 for additional details).

Results

The time series of most of the hydroclimate variables are highly correlated across the 16 sub-basins, suggesting the role of a common large-scale climate driver of UCRB hydrology

Principal Component Analysis (PCA; PC, principal component) of the 16 sub-basin time series of each variable, based on the correlation matrix, shows that the first and second principal components (PC1 and PC2) explain a very high fraction of the shared variance. PC1 dominates the variance explained for all the hydroclimatic variables in both the full period (38% to 93%) and the NDVI – SWE period (50% to 93%) (Supplementary Tables S5 and S6, and Supplementary Fig. S3). We thus selected PC1 for all subsequent statistical analysis. There are additional components that have some common structure across the sub-basins, but they account for substantially lower variance. For example, the variance explained by PC2 was 3% to 11% for the full period and 5% to 12% for the NDVI – SWE period across all hydroclimatic variables. The loadings (i.e., eigencoefficients) of PC1 were nearly equal across the sub-basins for each variable, suggesting that PC1 represents a coherent basin average (Supplementary Figs. S4a and S4b), while PC2 loadings represent a contrast in behavior across the 16 sub-basins for each variable (Supplementary Figs. S5a and S5b). Additional details of the PCA are provided in Supplementary Note S6. To understand variability and frequency components in the PC time series, a wavelet analysis of PC1 for precipitation (Pt), base flow (Qbt), and surface runoff efficiency (REt) identified statistically significant decadal variability (Supplementary Fig. S6), consistent with the decadal time series variations shown in Fig. 2.
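A correlation-based PCA of this kind can be sketched as follows. The data are synthetic (one shared signal plus independent noise in each of 16 columns), and the function name `pc_scores` and the noise level are assumptions for illustration; the sketch only shows why a strong common driver produces a dominant PC1 with nearly equal, same-signed loadings.

```python
import numpy as np


def pc_scores(data):
    """PCA on the correlation matrix of `data` (rows: water years, columns: sub-basins).

    Returns the fraction of variance explained by each PC, the loadings
    (eigenvectors, one column per PC), and the PC score time series.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # standardize columns
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder so PC1 comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    scores = z @ eigvecs
    return explained, eigvecs, scores


rng = np.random.default_rng(1)
years, basins = 115, 16
common = rng.normal(size=(years, 1))             # shared basin-wide signal
data = common + 0.3 * rng.normal(size=(years, basins))

explained, loadings, scores = pc_scores(data)
print(explained[:2])                             # variance fractions for PC1 and PC2
print(loadings[:, 0])                            # PC1 loadings, one per sub-basin
```

The sign of an eigenvector is arbitrary, so the PC1 loadings may come out all positive or all negative; what matters is that they share a sign and have similar magnitude.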

Surface runoff efficiency in the UCRB is most strongly influenced by precipitation and by spring vegetation, which in turn is influenced by spring temperature. Prior year precipitation and summer temperature play a secondary role, likely due to wet/dry hydroclimatic persistence across years

BN causality analysis applied to the full period (Fig. 4a and Supplementary Table S8) for the leading PC (PC1) of each field identifies Pt as the primary determinant of REt, with average temperature from March through June (TMAMJt) exerting a secondary, modulating influence, and prior water year precipitation (Pt-1) and prior water year average temperature from July through September (TJASt-1) as weaker determinants. Qbt was found to be influenced by Pt-1 but is not retained as a predictor of REt; given Pt-1 and TJASt-1, Qbt does not provide additional statistically significant information. These findings are consistent with the results of Woodhouse et al. for the instrumental record20 and Woodhouse & Pederson for the instrumental and paleorecord21, but diverge from the identification, reported in much of the other existing literature, of TJASt as a direct driver of REt through evapotranspiration, or as a driver of antecedent soil moisture and/or base flow. An additional insight is that the summer hot drought is already in place due to the spring warming in combination with any reduced Pt, which makes Pt a limiting variable for the summer, with TJASt not emerging as an additional factor.

Fig. 4: Bayesian Networks (BN) of direct and mediated drivers of surface runoff efficiency (REt).

PC1 BN depicting hypothesized and learned dependencies influencing REt in the Upper Colorado River Basin with results applied to (a) DAG for the full period (1906–2020); (b) DAG for the NDVI – SWE period (1983–2020). The DAGs show the full postulated BNs. Non-significant arcs (α > 0.05) are shown using dashed lines, significant arcs (0.01 < α ≤ 0.05) are shown using thin solid lines, and highly significant arcs (α ≤ 0.01) are shown using thick solid lines. Arrow direction indicates pre-specified directions of causality of all the variables based on physically plausible hypotheses. Bayesian Information Criterion (BIC) scores and p-values indicating model fit and arc significance are provided in Supplementary Tables S8 and S9. Supplementary Fig. S10 presents the DAG analogous to Fig. 4b, except with TMAMJt directing an arc to SWEMAXt. The SWEMAXt to TMAMJt arc shown here is more significant (p = 3.73e-06) than the TMAMJt to SWEMAXt arc (p = 3.62e-02) shown in Supplementary Fig. S10.

For the NDVI – SWE period (Fig. 4b and Supplementary Table S9), these causal relationships are refined given NDVI and SWE data availability. SWEMAXt and average NDVI from March through May (NDVIMAMt) emerge as the primary and secondary determinants of REt. In turn, SWEMAXt is determined by Pt, and thus emerges as an intermediate variable between Pt and REt. SWEMAXt accounts for 84% of the variance of REt. Conditional on the inclusion of SWEMAXt in the model, NDVIMAMt accounts for 4% of the variance of REt. Further, SWEMAXt accounts for 45% of the variance of TMAMJt which in turn accounts for 80% of the variance of NDVIMAMt, and Pt accounts for 70% of the variance of SWEMAXt (Table 1).

Table 1 PC1 Multiple Linear Regression (MLR)-based confirmatory analysis of Bayesian Network (BN) causal inference for the DAG structures shown in Fig. 4a (Full period) and Fig. 4b (NDVI-SWE period)

Thus, Pt’s influence on REt is approximately 60% of the variance through the two pathways (0.84 × 0.7 or 59% directly through SWEMAXt + 0.84 × 0.7 × 0.45 × 0.8 × 0.04 or 1% through the SWEMAXt, TMAMJt, NDVIMAMt pathway). By comparison, for the full period Pt, TMAMJt, Pt-1, and TJASt-1 account for 36%, 14%, 8%, and 2% of the variance of REt, resulting in an adjusted R2 of 0.58. An application of the Fig. 4a model to the NDVI – SWE period results in Pt, TMAMJt, Pt-1, and TJASt-1 accounting for 64%, 10%, 5%, and 0%, with an adjusted R2 of 0.77. TJASt-1 is no longer statistically significant as a predictor in this shorter period. Despite the potential differences in climate and other factors over the two periods, the dominant influence of Pt is evident in both. The analysis of the NDVI – SWE period clarifies the role of the intermediate variables, and their use in the shorter period boosts the adjusted R2 from 0.77 to 0.90. A comparison of Fig. 4a, b also illustrates how information flows through the revised dependence structure once the intermediate variables are included.
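The pathway arithmetic above can be reproduced directly. The sketch below multiplies the variance fractions exactly as they are written in the text; the variable names are illustrative only.

```python
from math import prod

# Variance fractions quoted in the text for the NDVI-SWE period.
p_to_swe = 0.70        # P_t explains 70% of the variance of SWEMAX_t
swe_to_re = 0.84       # SWEMAX_t explains 84% of the variance of RE_t
swe_to_t = 0.45        # SWEMAX_t explains 45% of the variance of TMAMJ_t
t_to_ndvi = 0.80       # TMAMJ_t explains 80% of the variance of NDVIMAM_t
ndvi_to_re = 0.04      # NDVIMAM_t explains 4% of RE_t, conditional on SWEMAX_t

# Products as written in the text.
snow_path = swe_to_re * p_to_swe
vegetation_path = prod([swe_to_re, p_to_swe, swe_to_t, t_to_ndvi, ndvi_to_re])
total = snow_path + vegetation_path

print(f"snow pathway:       {snow_path:.1%}")
print(f"vegetation pathway: {vegetation_path:.1%}")
print(f"total:              {total:.1%}")
```

The snow pathway contributes about 59% and the vegetation pathway about 1%, which sum to the approximately 60% stated above.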

Directionally, the influence of SWEMAXt on TMAMJt was found to be statistically more significant than the influence of TMAMJt on SWEMAXt (Supplementary Fig. S10), thus indicating the likely direction of influence between these two correlated variables, and supporting the hypothesis that reduced/increased snow (and hence Pt) increases/decreases spring T.

The significance of TMAMJt shifts to influencing REt through NDVIMAMt. Pt does not appear to significantly influence NDVIMAMt, which clarifies that the role of TMAMJt comes through increased vegetative activity and productivity that reduce REt through increased transpiration. None of the other variables emerge as significant predictors of REt. Pt-1 and TJASt-1 influence antecedent NDVI for September through November (NDVISONant), Pt-1 and NDVISONant influence Qbt, and Qbt influences REt only through NDVIMAMt, suggesting a role for subsurface water availability in supporting springtime vegetation activity.

In summary, we establish that Pt is the dominant influence on REt, with its role mediated by SWEMAXt which influences spring temperature TMAMJt and hence spring vegetation activity (NDVIMAMt), but this influence emerges as a secondary driver. Summer temperature TJASt, which has been implicated as the driver of reduced REt in other studies based on potential evapotranspiration estimates, does not emerge as a statistically significant factor. Prior year precipitation Pt-1 and fall vegetation (NDVISONant) do emerge as determinants of Qbt which then has an influence on spring vegetation activity (NDVIMAMt). However, Pt is by far the dominant driver of the REt variance explained.

The results of the confirmatory MLR analysis performed at the DAG node level are presented in Table 1 for the pruned DAG model based on BN results, and in Supplementary Tables S8, S9, and S10 for the full DAG. We note that they support the BN results which were derived over the full DAG’s likelihood, rather than a node at a time.
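A node-level confirmatory regression of this kind can be sketched as an ordinary least-squares fit of one DAG node on its parents, summarized by the adjusted R2. The data below are synthetic, and the coefficients, noise level, and helper name `adjusted_r2` are assumptions for illustration rather than values from Table 1.

```python
import numpy as np


def adjusted_r2(y, X):
    """Adjusted R^2 of an ordinary least-squares fit of y on the columns of X."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])     # add an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


rng = np.random.default_rng(3)
n = 38                                            # length of the NDVI-SWE period
swe_max = rng.normal(size=n)                      # synthetic parent node
ndvi_mam = rng.normal(size=n)                     # synthetic parent node
re = 0.9 * swe_max - 0.2 * ndvi_mam + rng.normal(scale=0.3, size=n)

adj_r2 = adjusted_r2(re, np.column_stack([swe_max, ndvi_mam]))
print(round(adj_r2, 2))
```

Fitting each node on its parents separately is what distinguishes this check from the BN fit, which scores the likelihood of the whole DAG at once.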

Fig. 5: Composite mean differences between the 10% wettest and the 10% driest years across key hydroclimate fields.

Composite mean (January through May) for the UCRB 10% wettest (P PC1 greater than the 90th percentile) water years since 1948 minus the UCRB 10% driest water years (P PC1 less than the 10th percentile) since 1948 for (a) Sea Surface Temperature (SST) (°C); (b) 700 mb Geopotential Height (m); (c) 700 mb Vector Wind (m s−1); (d) Outgoing Long Wave Radiation (W m−2); (e) Surface Precipitable Water (kg m−2); (f) Surface Precipitation Rate (mm day−1); (g) 1000 mb Air Temperature (°C). 10% wettest years since 1948 in rank order: 1997, 1995, 1984, 1957, 1965, 1986, 1952; 10% driest years since 1948 in rank order: 2018, 1977, 2002, 2020, 2012, 1966, 1974. The approximate location of the UCRB is identified with a red oval. Plots were generated using NOAA Earth Systems Research Laboratories Physical Sciences Division Interactive Plotting and Analysis Tool76.

For wet and dry years, in which the basin water year precipitation PC1 is greater than the 90th percentile or less than the 10th percentile, respectively, a large-scale atmospheric circulation pattern in response to Pacific and Atlantic Sea Surface Temperature (SST) anomalies drives or blocks precipitation to the UCRB, with correspondingly cooler/warmer temperatures in the basin (Fig. 5)

To obtain additional insight into our primary statistical finding that precipitation is the principal contributor to the RE signal, we conducted a composite analysis of this primary driver. It reveals that warmer mid-winter through spring (i.e., January through May) seasonal SST anomalies lead to weaker January through May mid-latitude atmospheric pressure and altered wind patterns around the Central North Pacific High that drive eddy transport of precipitable water (and P) towards the UCRB in the mid-winter through spring season. These dynamics are also associated with cooler UCRB T and lower Outgoing Long Wave Radiation (OLR), consistent with increased cloudiness (Fig. 5). The opposite pathway through these interrelated mechanisms also occurs, where cooler SST anomalies lead to stronger atmospheric pressure and wind anomalies that block eddy transport of precipitable water (and P) and lead to higher OLR, consistent with reduced cloudiness. Circulation anomalies induced by spatial SST gradients (such as those associated with El Niño–Southern Oscillation events) that induce changes in the mid-latitude jet stream and associated eddy transport of tropical moisture appear to drive the dynamics of these extreme years. The SST and geopotential height patterns are similar to those identified by others6 (additional information is provided in Supplementary Note S7).

A sensitivity analysis using thresholds above the 75th percentile and below the 25th percentile of the water year precipitation PC1 illustrates concordant sign and spatial structures (Supplementary Fig. S8). Furthermore, composite anomaly plots for the UCRB 10% driest water years (P PC1 less than the 10th percentile) since 1948, relative to the 1991–2020 climatology (Supplementary Fig. S7), exhibit opposing spatial structures that are mutually reinforcing and logically consistent.
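The composite calculation, including the percentile thresholds varied in the sensitivity analysis, can be sketched as below. The index and gridded field are synthetic, and the function name `composite_difference`, the grid size, and the series length are assumptions for illustration; the published plots were produced with the NOAA tool cited in the Fig. 5 caption.

```python
import numpy as np


def composite_difference(index, field, upper=90, lower=10):
    """Mean of `field` over years with index above the `upper` percentile minus
    the mean over years with index below the `lower` percentile.

    index: shape (years,), e.g. the precipitation PC1 score per water year
    field: shape (years, lat, lon), e.g. a seasonal-mean climate field
    """
    wet = index > np.percentile(index, upper)
    dry = index < np.percentile(index, lower)
    return field[wet].mean(axis=0) - field[dry].mean(axis=0), wet, dry


rng = np.random.default_rng(2)
years = 73                                        # illustrative series length
p_pc1 = rng.normal(size=years)                    # synthetic precipitation index
# Synthetic field that co-varies with the index at every grid cell, plus noise.
field = p_pc1[:, None, None] * np.ones((years, 4, 5)) \
    + 0.1 * rng.normal(size=(years, 4, 5))

diff_10, wet_10, dry_10 = composite_difference(p_pc1, field)            # 90th/10th
diff_25, wet_25, dry_25 = composite_difference(p_pc1, field, 75, 25)    # 75th/25th
print(diff_10.shape, int(wet_10.sum()), int(wet_25.sum()))
```

Relaxing the thresholds brings more years into each composite, so a difference map that keeps the same sign and spatial structure indicates that the pattern is not an artifact of a handful of extreme years.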

The 10% highest and lowest REt water years are not the same as those for the corresponding Pt percentiles, and these lead to different details of the climate fields and land-surface responses, such as those related to temperature and to vegetation activity and productivity. However, qualitatively, the conclusions as to moisture transport and precipitation are generally consistent (Supplementary Fig. S9). These differing details are not unexpected, since REt includes the cumulative effects of hydroclimatic processes beyond the synoptic ocean-atmosphere interactions associated with the delivery of moisture and precipitation to the UCRB.

Discussion

Our overarching findings are that surface runoff efficiency (REt) in the UCRB is most strongly influenced by total water year precipitation (Pt), and then by spring average temperature for March through June (TMAMJt) in the full period. When the shorter period with snow water equivalent (SWE) and Normalized Difference Vegetation Index (NDVI) data availability is analyzed, we find that peak SWE (SWEMAXt), which is determined by precipitation, and the spring season average NDVI for March through May (NDVIMAMt), which itself is influenced by spring TMAMJt in the 1983–2020 period (NDVI – SWE period), emerge as significant predictors, highlighting a temperature-vegetation pathway. Base flow (Qbt) and summer average temperature for July through September (TJASt) are not significant predictors of REt in either period. Prior water year precipitation Pt-1 and summer temperature TJASt-1 emerge only as modest influencers of current REt and only in the full period when intermediate NDVI variables are not available. Thus, the causal inference framework best supports the conclusion that water year precipitation is the primary driver of REt and is modulated by the spring temperature-vegetation growth dynamics.

Noting that the anomalous circulation mechanisms (our analysis and by others6) that transport/block P to the region would lead to enhanced/depleted snowpack with higher/lower albedo, we conjecture that the cooler/warmer UCRB is a local or regional response to the winter-spring season P input. A larger snow-covered area would lead to cooler conditions through albedo feedback, and vice versa30,31. While there is a general warming trend in the entire region and across seasons, the warm-dry and cool-wet spring conditions emerge as we considered the extreme winter to spring precipitation modes and their associated ocean-atmosphere circulation. The warm/cool spring anomalies accompanying the low/high precipitation we used as conditioning variables are anomalies from the longer period trend and set the stage for the changes in the vegetation dynamics, as reflected in phenological shifts, morphological changes, and increased biomass in those years. The surge in spring vegetation in warm-dry years amplifies REt’s negative response beyond what would be expected from P alone. Thus, the anomalous circulation dynamics lead to a basin scale response that cascades through the water-energy-vegetation processes.

Implications for studies that analyze the factors responsible for changes in runoff characteristics in similar settings include: (a) investigating long historical records is important for rigorous, statistically verifiable claims; (b) exploring the ocean-atmosphere circulation patterns that drive regional weather and seasonality is useful for insight into the anomalous conditions that lead to changes in REt; (c) dynamic vegetation models that represent phenological and biomass changes are important for the proper attribution of outcomes from process-based hydrologic models; and (d) causality frameworks with statistical hypothesis testing are useful whether a process model or an empirical analysis is employed.

Our findings are generally consistent with those in a recent paper by Hogan & Lundquist, who used different methods and targets for analysis31. They found that decreased spring P, coupled with higher potential evapotranspiration that they attribute to reduced cloud cover and to lowered surface albedo due to earlier snow disappearance, explains much of the 19% reduction in streamflow since 2000, relative to the 14% decrease in P over the same period.

Warming during spring and its effects on terrestrial vegetation activity and productivity, including growth onset, development, and lifecycle dynamics, are well documented60,61,62,63. Our findings are consistent with this observation: spring TMAMJt is very strongly positively correlated (r = 0.87, p ≤ 0.01) with NDVIMAMt. Further, NDVIMAMt is very strongly negatively correlated (r = −0.71, p ≤ 0.01) with REt, suggesting an enhancement of consumptive water use through evapotranspiration (Tables 1 and 2, and Supplementary Table S9). The crucial role of vegetative processes in water availability has previously been documented for a lower-elevation, mid-latitude basin in the eastern United States using a physics-based model64.

Table 2: Matrix of Pearson correlations among PC1 scores of hydroclimatic indicators.

PC1 correlation matrix for the full period (1906–2020) and the NDVI – SWE period (1983–2020).

The seasonal NDVI variables analyzed include early vegetation (i.e., NDVIMAMt) and fully developed vegetation (i.e., NDVIJASt), which also includes agricultural crops. We found the average NDVI for July through September (NDVIJASt) to be positively correlated (r = 0.66, p ≤ 0.01, r = 0.60, p ≤ 0.01, and r = 0.68, p ≤ 0.01) with Pt, SWEMAXt, and REt, respectively, whereas NDVIMAMt is negatively correlated (r = −0.54, p ≤ 0.01, r = −0.61, p ≤ 0.01, and r = −0.71, p ≤ 0.01) with Pt, SWEMAXt, and REt, respectively (Table 2). Furthermore, we found that these two NDVI variables are negatively correlated (r = −0.68, p ≤ 0.01) with each other (Table 2). An explanation for this negative correlation between NDVIJASt and NDVIMAMt is that a higher water year Pt and REt correspond to a higher NDVIJASt given the greater amount of available water. This higher water availability supports natural vegetation and agricultural crops. However, noting that Pt and TMAMJt are negatively correlated in the full period (r = −0.43, p ≤ 0.01) and the NDVI – SWE period (r = −0.63, p ≤ 0.01; Table 2), we infer that cool-wet conditions with more persistent spring snow cover translate into reduced spring vegetation and hence lower NDVIMAMt, which is further supported by the negative correlation between TMAMJt and SWEMAXt (r = −0.67, p ≤ 0.01; Table 2) in the NDVI – SWE period.

Methods

Principal Component Analysis (PCA; PC, principal component), correlation analysis, Bayesian Network (BN) modeling, multiple linear regression (MLR) modeling, analysis of variance (ANOVA), and wavelet analysis were performed using the R statistical computing environment65. A workflow diagram of the general analytical approach is presented in Supplementary Fig. S11.

PCA

Our analyses indicated that there was considerable similarity in the relationships of the timeseries analyzed separately for the 16 sub-basins of the Upper Colorado River Basin (UCRB) based on the spatial correlation structure and its projection into a lower dimension using PCA. Consequently, we used PCA across sub-basins for each variable to identify (a) the leading common temporal pattern (i.e., PC1), and (b) the leading contrast from the common pattern (i.e., PC2). The hypotheses were then explored for these 2 leading PCs to understand how the dominant or common temporal patterns of each variable relate to the same level of patterns of the other variables. This approach enabled us to regionalize hypotheses while retaining the possibility of exploring the causal structure of phenomena that are both spatially common and spatially diverse. To perform the PCA dimensionality reduction of the hydroclimatic variables, the princomp function in the stats R-package65 was used with the multi-sub-basin correlation matrix.
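As an illustration of the dimensionality reduction described above, the following is a minimal Python sketch (not the authors' R code, which used princomp) that extracts PC1 loadings and scores from the correlation matrix of several sub-basin series. It swaps in power iteration for a full eigendecomposition, and the three short series and all function names are hypothetical.

```python
import math

def standardize(columns):
    """Scale each sub-basin series to zero mean and unit variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        out.append([(x - mean) / sd for x in col])
    return out

def correlation_matrix(z):
    """Correlation matrix of already standardized series."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def leading_component(corr, iterations=500):
    """Leading eigenvector and eigenvalue by power iteration."""
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(vi * ci for vi, ci in zip(v, cv))
    return v, eigenvalue

def pc1_scores(columns):
    """Return PC1 loadings, PC1 scores, and the fraction of variance explained."""
    z = standardize(columns)
    loadings, eigenvalue = leading_component(correlation_matrix(z))
    n = len(z[0])
    scores = [sum(loadings[j] * z[j][t] for j in range(len(z))) for t in range(n)]
    return loadings, scores, eigenvalue / len(z)

# Hypothetical annual series for three sub-basins (illustrative values only).
basin_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
basin_b = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
basin_c = [0.8, 2.2, 2.9, 4.1, 4.7, 5.9]

loadings, scores, explained = pc1_scores([basin_a, basin_b, basin_c])
print(loadings, explained)
```

Because the toy series are strongly and positively correlated, the loadings share one sign and PC1 carries most of the variance, which is the "leading common temporal pattern" interpretation used in the text. PC2 would require deflating the correlation matrix and repeating the iteration, which is omitted here.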

Correlation analysis

To assess the pair-wise strength and direction of the linear relationship between the PC scores for each variable, Pearson’s correlation analysis was performed using the cor function in R65. Correlation (r) strength was considered negligible when |r| < 0.20, weak when 0.20 ≤ |r| < 0.40, moderate when 0.40 ≤ |r| < 0.50, strong when 0.50 ≤ |r| < 0.70, and very strong when |r| ≥ 0.70. Pearson’s product-moment correlation coefficient was tested for significance using the cor.test function in R65. Results were classified as non-significant when p > 0.05, significant when 0.01 < p ≤ 0.05, and highly significant when p ≤ 0.01.
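The strength and significance classes defined above can be written as a small lookup. The sketch below is in Python rather than R and only encodes the thresholds stated in this section; the function names are hypothetical, and the p-value is assumed to come from a separate test such as cor.test.

```python
def correlation_strength(r):
    """Label |r| using the thresholds stated in the Methods."""
    a = abs(r)
    if a < 0.20:
        return "negligible"
    if a < 0.40:
        return "weak"
    if a < 0.50:
        return "moderate"
    if a < 0.70:
        return "strong"
    return "very strong"

def significance_class(p_value):
    """Label a p-value using the 0.05 and 0.01 cut-offs stated in the Methods."""
    if p_value <= 0.01:
        return "highly significant"
    if p_value <= 0.05:
        return "significant"
    return "non-significant"

# Two correlations reported in the text: TMAMJt vs NDVIMAMt, and TJASt vs REt (NDVI - SWE period).
label_spring = correlation_strength(0.87)
label_summer = correlation_strength(-0.31)
print(label_spring, label_summer)
```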

Analysis of the correlations of the PC1 scores (Table 2) of each variable revealed additional insights beyond the well-established relationships of surface runoff efficiency (REt) with precipitation (Pt) and peak snow water equivalent (SWEMAXt). As expected, REt was positively correlated with Pt in both the full period (r = 0.60, p ≤ 0.01) and the NDVI – SWE period (r = 0.80, p ≤ 0.01). Further, REt and SWEMAXt were positively correlated (r = 0.92, p ≤ 0.01) during the latter period, reaffirming the dominant influence of precipitation on runoff efficiency. The positive correlation (r = 0.87, p ≤ 0.01) between average temperature for March through June (TMAMJt) and average Normalized Difference Vegetation Index for March through May (NDVIMAMt), the negative correlations between REt and TMAMJt in the full period (r = −0.60, p ≤ 0.01) and the NDVI – SWE period (r = −0.75, p ≤ 0.01), and the negative correlation between REt and NDVIMAMt (r = −0.71, p ≤ 0.01) support the hypothesis that warming leads to an enhanced natural vegetation response, which then decreases REt for the same Pt and SWEMAXt. Further, average temperature for July through September (TJASt) is only moderately negatively correlated with REt in the full period (r = −0.44, p ≤ 0.01) and weakly, non-significantly correlated in the NDVI – SWE period (r = −0.31, p > 0.05), suggesting a limited influence of summer T. The average NDVI for July through September (NDVIJASt) is notably strongly and positively correlated with REt (r = 0.68, p ≤ 0.01), which may suggest that in a wet year, improved water availability in the UCRB increases irrigated agricultural and natural summer vegetation activity and productivity. See Supplementary Note S8 for additional PC1 correlation details and observations (correlations of PC2 are provided in Supplementary Table S7).

These pairwise correlations are suggestive of ways in which the hypotheses in the complete DAG (Fig. 3) can likely be pruned. However, in the multivariate, causal setting, a comprehensive evaluation of the DAG using Bayesian learning to compare the different probabilistic representations implied by the postulated causal pathways is necessary.

BN modeling

Considering our hypotheses (Fig. 3), we explore and refine these hypothesized links using BNs for inference on the causal strength of candidate networks. Note that the direct influence of Pt and Tt on REt is considered in addition to that through the intermediate variables Qbt and NDVIt. Data on all these variables are available, whereas soil moisture and evapotranspiration data are not directly available and would otherwise be estimated using parameterized physics-based models. The DAG is intentionally over-specified, as Bayesian learning is used to retain only those links that are supported by the data, accounting for the sample size and the number of model parameters. As an example, a model for the causality of Qbt would use the likelihood of the data to compare and select the most appropriate of the following probabilistic representations: (1) f(Qbt), (2) f(Qbt|NDVISONant)f(NDVISONant), (3) f(Qbt|NDVISONant)f(NDVISONant|TASt-1, Pt-1)f(TASt-1, Pt-1), and (4) f(Qbt|NDVISONant, TASt-1, Pt-1)f(NDVISONant|TASt-1, Pt-1)f(TASt-1, Pt-1); where f(.|.) represents a conditional probability density function, and f(.) is a marginal probability density function. The winner would then correspond to an appropriately pruned DAG, based on which of the four candidates had the highest likelihood given the data, accounting for the number of parameters estimated from the data for that postulate.

BNs consider a possible dependence structure across a set of variables. Learning in BNs entails testing competing dependence structures, or DAGs, against the data to identify the DAG that best represents the data, relative to the original DAG that was hypothesized to cover plausible causal structures. Identifying the direction of causality as well as the structure of the optimal network is a challenging problem, since the number of potential combinations to be evaluated can quickly become very large, and a number of global optimization algorithms have been proposed. In our case, the problem is simpler: although we considered several potential factors, some of which may be interdependent, we pre-specified the direction of causality for all the variables based on physically plausible hypotheses (Fig. 3). We therefore need only to optimally prune this network so that the most parsimonious model that maximizes the likelihood of the observations can be identified.

All the variables we considered are continuous, and we use the Bayesian Information Criterion (BIC; bic-g in the R package bnlearn)66 as the score function to choose the optimal network by comparing all possible DAGs obtained by arc removal. All conditional and marginal distributions are modeled as Gaussian, the BIC is evaluated as a penalized likelihood measure for each potential configuration of the DAG, and bic-g is evaluated as the difference in the BIC if the DAG is modified to delete or add a specific arc in the network. The BIC considers the likelihood function, the sample size of the data, and the total number of model parameters, which contribute to the effective degrees of freedom.

As an example, consider a simple DAG with variables A, B, and C with a postulated dependence structure {C | A,B, B | A}. In this case the candidate models are (1) f(C | A,B)f(B | A)f(A), (2) f(C | A,B)f(B)f(A), (3) f(C | A)f(A), (4) f(C | B)f(B), and (5) f(C)f(B)f(A). Note the decreasing dependence and complexity (number of variables) in the sequence of models. The first model considers that C depends on both B and A, and that B depends on A. Thus, A informs C in addition to the information A passes to B. The second model considers the dependence of C on A and B, but B is no longer considered to depend on A, and so on.

The BIC is defined as BIC(i) = di ln(n) − 2 ln(Li), where i is the model index, n is the sample size, di is the number of parameters for the ith model, and Li is the estimated likelihood (i.e., joint probability over the n observations) of that model for the observations. The computation of Li entails a choice of a probability distribution for each variable (A, B, C in the example) and a model for the functional dependence between the variables. For example, for model 1, C = a0 + a1 A + a2 B + eC, eC ~ N(0, sC2); B = a3 + a4 A + eB, eB ~ N(0, sB2); and A ~ N(mA, sA2), i.e., d1 = 9. For model 5, A ~ N(mA, sA2), B ~ N(mB, sB2), and C ~ N(mC, sC2), i.e., d5 = 6. Here N(.,.) represents a Normal distribution with the first parameter as the mean and the second parameter as the variance. The corresponding likelihoods would be evaluated as,

$${L}_{1}={\prod }_{j=1}^{n}f({C}_{j}|{A}_{j},\,{B}_{j})f({B}_{j}|{A}_{j})f({A}_{j})$$
(1)
$${L}_{5}={\prod }_{j=1}^{n}f({C}_{j})f({B}_{j})f({A}_{j})$$
(2)

The model that minimizes the BIC is selected, and one can evaluate the strength of each arc through its contribution to the likelihood of the model. We also test for conditional independence using partial correlations, with a t-test67 giving a p-value for each arc's contribution to the model, analogous to the p-value test for correlation significance68 (see Supplementary Note S9). The results from the model selected via BIC scoring are typically consistent with the p-value (from a two-sided test) and with the arc strength based on the likelihood contribution of an arc.
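To make the A, B, C example concrete, the following minimal Python sketch evaluates BIC(i) = di ln(n) − 2 ln(Li) for candidate models 1, 2, and 5 using Gaussian linear regressions fitted by least squares. It is not the bnlearn implementation used in the study; the synthetic data, the random seed, and all function names are assumptions made for illustration.

```python
import math
import random

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def gaussian_node(y, parents):
    """Maximized Gaussian log-likelihood and parameter count for one node.

    Parameters counted: intercept, one slope per parent, residual variance.
    """
    n = len(y)
    design = [[1.0] + [p[t] for p in parents] for t in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yt - sum(b * x for b, x in zip(beta, row)) for row, yt in zip(design, y)]
    var = sum(e * e for e in resid) / n  # maximum-likelihood variance estimate
    loglik = -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return loglik, k + 1

def bic(nodes, n):
    """BIC = d ln(n) - 2 ln(L), summed over the node-wise factors of a DAG."""
    loglik = sum(ll for ll, _ in nodes)
    d = sum(k for _, k in nodes)
    return d * math.log(n) - 2.0 * loglik, d

# Synthetic data (assumption): B depends on A, and C depends on A and B.
rng = random.Random(0)
n = 60
A = [rng.gauss(0.0, 1.0) for _ in range(n)]
B = [0.8 * a + rng.gauss(0.0, 0.5) for a in A]
C = [1.0 * a - 0.7 * b + rng.gauss(0.0, 0.5) for a, b in zip(A, B)]

bic1, d1 = bic([gaussian_node(C, [A, B]), gaussian_node(B, [A]), gaussian_node(A, [])], n)
bic2, d2 = bic([gaussian_node(C, [A, B]), gaussian_node(B, []), gaussian_node(A, [])], n)
bic5, d5 = bic([gaussian_node(C, []), gaussian_node(B, []), gaussian_node(A, [])], n)
print(d1, d2, d5, bic1, bic2, bic5)
```

The parameter counts reproduce d1 = 9 and d5 = 6 from the text. Because the synthetic data were generated with every arc of model 1 present, model 1 attains the lowest BIC here; with real data, an arc is retained only if the gain in likelihood outweighs the ln(n) penalty for its extra parameter.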

The DAGs for the full period and the NDVI – SWE period, obtained after Bayesian learning was applied to PC1 for each of the hydroclimate variables in the complete DAG (Fig. 3), are shown in Fig. 4a, b, respectively. Bayesian learning was performed using the bn.fit function from bnlearn, and results of arc strength calculated using the arc.strength function are provided in Supplementary Tables S8 and S9.

Additional information regarding BN is provided in Supplementary Note S2.

MLR and ANOVA

To confirm the significance of the causal variables and the explanatory power of these predictors, MLR models were developed (Table 1 and Supplementary Table S10). The lm and anova functions available in the stats R-package65 were used. As summarized in Table 1 and Supplementary Table S10, the models confirm the BN causal results.

Wavelet analysis

To understand variability and frequency components in the PC timeseries data, wavelet spectral analysis was conducted using the R-package biwavelet69. Statistically significant decadal variability, consistent with the observed decadal time series variations in the Pt, Qbt, and REt hydroclimate variables, is presented in Supplementary Fig. S6.

Naturalized streamflow data

Monthly naturalized streamflow data for the Colorado River and its tributaries within the Colorado River Basin were obtained from the Bureau of Reclamation (Reclamation)70. This natural flow is computed by Reclamation using their Natural Flow and Salt Model, which starts from U.S. Geological Survey (USGS) gaged flow and removes the effects of dams, diversions, and other anthropogenic actions71. These data were available for water years 1906 through 2020 for 20 sub-basins within the UCRB. A subset of 16 was chosen as the dominant water-producing sub-basins. Separately, these 20 sub-basins are modeled in Reclamation’s Colorado River Simulation System44 (CRSS, also referred to as the Colorado River Decision Support System, CRDSS), and we reference these sub-basins using the CRSS identifier, CRSSID (Supplementary Table S1).

Precipitation and surface air temperature data

Mean monthly precipitation and temperature values were obtained from the National Oceanic and Atmospheric Administration’s (NOAA) Monthly U.S. Climate Gridded (5 × 5 km) dataset72 (NClimGrid). This dataset interpolates instrumental data from the Global Historical Climatology Network Daily (GHCN-D) Temperature and Precipitation Dataset, which consists of daily maximum, minimum, and average temperature values and daily precipitation accumulation values73. The NClimGrid data, which were available from water year 1896 through present, were processed for the full period using a UCRB sub-basin shapefile (a geospatial vector data format developed by ESRI, the Environmental Systems Research Institute) with the 16 sub-basins of interest. Based on this spatial and temporal resolution, monthly P and T values for each of the sub-basins were computed. Monthly P was aggregated to an annual total P value. We chose to use a gridded archive of climatological data and representative averaging instead of raw station data to achieve a more complete and consistent representation of the climatology across the entire sub-basin and thus avoid bias from the locations of specific observation stations. NClimGrid is also the only gridded P and T dataset that is published officially by a U.S. federal agency (NOAA).
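The temporal aggregation described above can be sketched as follows in Python. This is an illustration only, not the authors' processing code: it assumes the common U.S. convention that a water year runs from October through September and is labeled by the calendar year in which it ends, it assumes sub-basin monthly values have already been spatially averaged, and the toy values and function names are hypothetical.

```python
def water_year(year, month):
    """Water year label, assuming October-September years named for the ending year."""
    return year + 1 if month >= 10 else year

def water_year_totals(monthly):
    """Sum monthly values into water-year totals, keeping only complete 12-month years."""
    totals, counts = {}, {}
    for (year, month), value in monthly.items():
        wy = water_year(year, month)
        totals[wy] = totals.get(wy, 0.0) + value
        counts[wy] = counts.get(wy, 0) + 1
    return {wy: totals[wy] for wy in sorted(totals) if counts[wy] == 12}

def seasonal_mean(monthly, year, months):
    """Average of the listed calendar months within one year (e.g., March through June)."""
    values = [monthly[(year, m)] for m in months]
    return sum(values) / len(values)

# Toy sub-basin series from October 2018 through October 2020 (illustrative values only).
keys = (
    [(2018, m) for m in (10, 11, 12)]
    + [(2019, m) for m in range(1, 13)]
    + [(2020, m) for m in range(1, 11)]
)
precip = {key: float(key[1]) for key in keys}       # monthly P, arbitrary units
temperature = {key: 2.0 * key[1] for key in keys}   # monthly mean T, arbitrary units

annual_p = water_year_totals(precip)                        # analogous to Pt
spring_t = seasonal_mean(temperature, 2019, (3, 4, 5, 6))   # analogous to TMAMJt
print(annual_p, spring_t)
```

The lone October 2020 value belongs to water year 2021 and is dropped because that water year is incomplete, which avoids mixing partial totals into the annual series.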

Normalized difference vegetation index (NDVI) data

Mean maximum monthly NDVI values were collated from the Global Inventory Modeling and Mapping Studies-3rd Generation V1.2 (2023-08-24, GIMMS-3G+) dataset on an approximately 8 km by 8 km grid74. This dataset is derived from NOAA satellite vegetation greenness data from the Advanced Very High-Resolution Radiometer (AVHRR), obtained at an approximate 15-day timestep. These data, which were available for water years 1983 through 2021, were first averaged from the two maximum NDVI observations per month in the dataset and were then processed for the NDVI – SWE period using the earlier referenced UCRB sub-basin shapefile. Based on this spatial and temporal resolution, seasonal mean maximum values for each of the sub-basins were then computed.

Sea Surface Temperature (SST), wind, precipitation and precipitable water, radiation, and atmospheric pressure data

Composite means and composite mean anomalies were developed from the National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) Reanalysis 1 data75 using the NOAA-Earth Systems Research Laboratories-Physical Sciences Division Interactive Plotting and Analysis Tool76. The SST dataset used is the NOAA Extended Reconstructed SST V577.

Snow water equivalent (SWE) data

Mean daily SWE data values were obtained from the National Snow and Ice Data Center (NSIDC) gridded (4 × 4 km) dataset78. This SWE and snow depth dataset, developed at the University of Arizona, assimilates, interpolates (between the station locations), and downscales (800 × 800 m) observed and modeled data over the Conterminous U.S. from the Natural Resources Conservation Service’s snow telemetry (SNOTEL) network and the National Weather Service’s cooperative observer program (COOP) network, and from Oregon State University’s Parameter-elevation Regressions on Independent Slopes Model (PRISM), respectively. The NSIDC data, which were available from water year 1982 through present, were processed for the NDVI – SWE period using a UCRB sub-basin shapefile with the 16 sub-basins of interest. Based on this spatial and temporal resolution, maximum daily values for each of the sub-basins were computed. As with the precipitation and surface air temperature data, we chose to use a gridded archive of climatological data followed by PCA instead of raw station data to achieve a more complete and consistent representation of the climatology across the entire sub-basin and thus avoid bias from the locations of specific observation stations.