Introduction

Surface soil moisture (SM) plays an essential and multifaceted role in the functioning of the Earth’s climate system, with important feedbacks to the water, energy, and carbon cycles1,2,3. Accurate projections of future SM changes (ΔSM) are critical for water resources planning, agricultural management, and climate change adaptation. Earth System Models (ESMs) are indispensable tools for projecting ΔSM but often show large uncertainties in its magnitude and even its sign4,5,6,7,8,9. One reason for the large uncertainty is the sparsity of in-situ SM observations10, which hinders the effective development, calibration, and constraint of ESMs.

Recently, the Emergent Constraints (EC) technique has been developed to narrow uncertainty in model projections by deriving a physically explainable relationship between aspects of current climate and future climate projection (X and Y, respectively) that emerges from an ensemble of ESMs11,12,13. Observations of X can then be used to constrain Y if the uncertainty in X observations is small compared to the spread in X simulations. Previous studies have successfully adopted ECs for various climate-relevant quantities and processes such as equilibrium climate sensitivity14, Arctic ocean acidification15, land photosynthesis16, runoff17, precipitation18,19,20,21,22,23, climate warming24,25,26, crop yield sensitivity27,28, soil carbon turnover29, and aerosol-cloud-climate forcing30,31.

So far, EC has not been proposed to constrain ΔSM, probably because the traditional EC approach, which takes a univariate and linear perspective, is ineffective for this goal. From a water balance point of view, ΔSM is controlled by both changes in water demand via potential evapotranspiration (PET) and changes in water supply1,2,6,32,33. With greenhouse warming, an increase in net radiation (Rn) would tend to increase PET, while warming would also directly widen the vapor pressure deficit and thus elevate PET. As a proxy of water supply, precipitation shows diverse regional patterns of change and serves as a major driver of actual evapotranspiration (ET). Thus, a single observable variable is unlikely to accurately capture the joint physical mechanisms controlling ΔSM.

In this work, we extend the traditional EC technique to bivariate EC to account for the joint and compounding effects of temperature (T) and precipitation (P). We use T as the proxy for water demand since it is a key driver of PET under global warming scenarios and has much larger model samples and less observational uncertainty than other drivers of PET (e.g., Rn) (Supplementary Note 1). Here, we identify strong emergent relationships between historical temperature and precipitation trends during 1980–2014 (T trend and P trend) and future ΔSM (upper 10 cm) during 2070–2099 compared to 1980–2014 across 30 ESMs from the Coupled Model Intercomparison Project phase 6 (CMIP6)34,35 (Supplementary Table 1). Observed T and P trends from three observational datasets are then combined with emergent relations to constrain ΔSM using the traditional univariate and the newly developed bivariate ECs. The rationale is that in-situ observations of T and P are more readily available than SM, and the impacts of T and P on SM are physically well-understood.

Results

Uncertainties in future ΔSM

Consistent with previous reports5,6,7,8,9,36, the ensemble mean of CMIP6 models projects a drying trend in global-mean SM toward the end of the 21st century for the high-emission scenario SSP5-8.5 (Supplementary Table 1). By 2070–2099, a global mean decrease in SM of 4.29% is projected, with more drying in South Africa, northern South America, and Southwest Europe (Fig. 1), due to a decrease in P and an increase in ET (Supplementary Fig. 1). The widespread drying is accompanied by large uncertainties in ΔSM (i.e., one standard deviation across the model ensemble), which exceed the uncertainties in the projected changes in T (ΔT) and P (ΔP) (Supplementary Figs. 2–3). Similar patterns are found for the SSP1-2.6, SSP2-4.5, and SSP3-7.0 scenarios but with less pronounced drying (Supplementary Figs. 4–6).

Fig. 1: Global changes in soil moisture by the end of the 21st century and its uncertainty.

a Spatial distribution of the multi-model ensemble mean of future soil moisture changes (ΔSM, %) by 2070–2099 relative to 1980–2014 and its uncertainty (inter-model standard deviation, σ) based on the Coupled Model Intercomparison Project phase 6 (CMIP6) outputs under the SSP5-8.5 scenario (Supplementary Table 1). b Latitudinal mean of ΔSM (black) and the associated uncertainty (gray). c, d Histogram (normalized scale of relative frequency) of grid-level σ and ΔSM across global land areas, respectively.

Underlying mechanisms behind ΔSM

Following the Palmer Drought Severity Index37 and the Budyko theory38, SM can be derived from water supply (related to P) and water demand (related to PET). Although future changes in PET are mainly driven by Rn and T, we use T as a proxy of water demand instead of Rn because the observational uncertainty of T is much lower than that of Rn (Supplementary Note 1). By regressing ΔSM against ΔT and ΔP across CMIP6 models, we found that ΔT alone can explain 40% of the inter-model variance of ΔSM, while we do not find evidence that ΔP alone is related to ΔSM (Fig. 2a, p > 0.05). The physical mechanisms are the widening of the vapor pressure deficit through reduced relative humidity (Supplementary Table 2) and the increase in saturation vapor pressure with warming according to the Clausius-Clapeyron relation (Supplementary Note 1). In addition, ΔPET shows a strong correlation with ΔET over land (r = 0.58), though weaker than over the ocean (r = 0.72), as land ET is also controlled by water availability, soil properties, vegetation dynamics, and other factors39. Although P shows a weak correlation with SM, P remains important for SM prediction for two reasons. First, there is a close linkage between ΔP and ΔET (r = 0.79, p < 0.05) and between ΔET and ΔSM (r = −0.33, p < 0.1). Second, ΔT shows a strong correlation with ΔP (r = 0.82, p < 0.05), which can be physically explained by atmospheric energetics40. Therefore, when ΔT and ΔP are considered together, the explanatory power for ΔSM increases substantially, especially after accounting for the interaction between ΔT and ΔP (R2 = 0.67). This is expected, since the effect of ΔT on ΔET physically depends on ΔP, which also scales linearly with ΔT (Supplementary Fig. 7). Factors associated with the remaining unexplained variance of ΔSM include vegetation structure and physiology41, snow42 and CO243, which are not adopted as constraints on ΔSM because of the relatively high uncertainty associated with their measurement.
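The nested regressions compared in Fig. 2a can be illustrated with a short sketch. The data below are synthetic stand-ins rather than CMIP6 output, and the helper name `r_squared` and all coefficients are assumptions chosen for illustration. The sketch only demonstrates that adding ΔP and the ΔT×ΔP interaction as predictors cannot lower the in-sample R2 of an ordinary least-squares fit with an intercept.

```python
import numpy as np

def r_squared(predictors, y):
    """Coefficient of determination of an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Synthetic 30-member "ensemble" (hypothetical values, NOT CMIP6 data).
rng = np.random.default_rng(0)
n_models = 30
dT = rng.normal(4.0, 1.0, n_models)               # hypothetical warming
dP = 2.0 * dT + rng.normal(0.0, 1.0, n_models)    # hypothetical P change
dSM = 1.0 - 1.5 * dT + 0.2 * dT * dP + rng.normal(0.0, 1.0, n_models)

r2 = {
    "LT": r_squared([dT], dSM),                # temperature only
    "LP": r_squared([dP], dSM),                # precipitation only
    "LTP": r_squared([dT, dP], dSM),           # both, additive
    "NTP": r_squared([dT, dP, dT * dP], dSM),  # both plus interaction
}
for name, value in r2.items():
    print(f"{name}: R2 = {value:.2f}")
```

Because in-sample R2 always grows with added terms, a criterion that penalizes model size is still needed to judge whether the extra terms are warranted.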

Fig. 2: Physical mechanisms behind global ΔSM.

a Scatter plot of CMIP6 ΔSM (%) against predicted ΔSM under the SSP5-8.5 scenario during 2070–2099 relative to 1980–2014. Each circle corresponds to the global mean ΔSM of a CMIP6 model (see Supplementary Table 1). The red, blue, yellow, and purple colors indicate the predictions by the regression models based on CMIP6 outputs of temperature change (ΔT) only (LT), precipitation change (ΔP) only (LP), both ΔT and ΔP (LTP) and their interactions (NTP), respectively. For each ESM, the simulated ΔSM from CMIP6 outputs is plotted on the x-axis, while the predicted ΔSM by four regression models are plotted on the y-axis. The corresponding fitting line and coefficient of determination (R2) is shown, with * indicating statistically significant at 95% confidence level. b Scatterplot of future ΔT against historical T trend during 1980–2014. The dashed black line indicates the linear fitting line, with R2 shown on the top left and 95% confidence interval shown by shaded gray areas. The probability density functions (PDFs) of ΔT and T trends across CMIP6 models are shown on the left and at the bottom, respectively. c Same as (b) but for precipitation.

In addition, we found that models that simulate a large T trend during 1980–2014 are likely to project a higher ΔT in the future (Fig. 2b), and similar findings are obtained for ΔP (Fig. 2c). Similar to ΔT, future ΔP is controlled by greenhouse gas (GHG) increases, as aerosol emissions are expected to be relatively low by the end of the 21st century44, even though P is more sensitive to anthropogenic aerosol emissions than to GHGs. Detection and attribution analysis indicates that T and P trends during 1980–2014 are dominated by the responses to GHGs rather than aerosol emissions and natural internal variability (Supplementary Note 2 and Figs. 8–9). On this basis, recent studies have successfully constrained ΔT and ΔP using historical observations of T and P trends18,24. Therefore, given the strong physical control of ΔT and ΔP on ΔSM (Supplementary Note 1), we hypothesize that historical observations of T and P trends can be utilized to constrain ΔSM.

Bivariate emergent constraints on global and regional ΔSM

Our results confirm a strong emergent relationship between historical T and P trends and future ΔSM across CMIP6 models (Fig. 3a), which is expected since both recent T and P trends and future ΔSM are mainly driven by an increase in GHGs (Supplementary Note 2 and Figs. 8–9). This emergent relationship suggests that CMIP6 models with greater warming in the recent past tend to predict an intensification of future surface soil drying. Although the recent P trend alone shows no statistically significant relation with ΔSM across CMIP6 models, the combination of T and P trends increases the explained variance of ΔSM compared to LT, with R2 values of 0.25 and 0.42 for the LTP and NTP methods, respectively. This is because the effect of T on SM depends on P and vice versa, and such interactions cannot be explicitly represented in a univariate emergent constraint (i.e., LT, LP). The revealed emergent relations are also detected in the observational dataset (Supplementary Table 4). Overall, although the spread in the historical variable X (i.e., T and P trends) and the future variable Y (i.e., ΔSM) is large across CMIP6 models, applying f(X) to constrain Y is potentially useful (p < 0.05), given that the observed uncertainty in X is smaller than that in the simulations (Supplementary Table 5).

Fig. 3: Emergent constraints on global ΔSM.

a Each circle corresponds to a CMIP6 model (Supplementary Table 1), which shows the simulated historical (1980–2014) T trend on the x-axis, P trend on y-axis, and ΔSM (2070–2099 relative to 1980–2014). The observational mean of T and P trends during the historical period are shown by the vertical and horizontal dotted lines, respectively, with the associated uncertainties indicated by the shading (±1 standard deviation, Supplementary Table 3). b The probability density functions (PDFs) of future ΔSM before (black line) and after (colored lines) constraints. The red, blue, yellow, and purple lines are the PDFs of ΔSM after applying the emergent constraints (ECs) of LT, LP, LTP and NTP, respectively. The corresponding error bars (mean ±1 standard deviation) are illustrated at the top. The star in the legend indicates that the PDF curves after constraint are statistically significantly different (p < 0.05) from that before constraints according to the Kolmogorov-Smirnov (K-S) test. The coefficients of determination (R2) of the four ECs (* indicates p < 0.05), the differences in the central estimate of ΔSM after constraint relative to that before constraint (Δμ=μafter-μbefore, %) and the differences in the associated uncertainties (Δσ = (σafter-σbefore)/σbefore, %) are given by the table inset.

Observed T and P trends are then combined with the emergent relationships to derive the constrained PDF of ΔSM (Fig. 3b). The results show a statistically significant difference (K-S test, p < 0.05) between the PDF of the constrained ΔSM based on the developed bivariate ECs and that of the raw (i.e., unconstrained) CMIP6 projections. The raw CMIP6 models project a global decline in SM of 4.29% by the end of the century. After the LT constraint, the projected decrease in SM is much smaller (0.74%), since the negative effect of the warming bias on ΔSM is excluded. When both T and P effects are considered in LTP, the negative sensitivity of ΔSM to T increases while a slight positive sensitivity to P is obtained, and therefore less drying (ΔSM = 0.35%) is projected after excluding both the warming and wetting biases of CMIP6 models. A positive ΔSM is even projected after the NTP constraint (Fig. 3b), as the negative sensitivity of ΔSM to T triples due to T and P interactions. This shift is also supported by the underestimation of historical surface available water (P-ET) by CMIP6 models, which is expected to persist in the future (Supplementary Fig. 10).
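The two summary statistics reported with Fig. 3b, the shift in the central estimate (Δμ = μafter − μbefore) and the relative change in spread (Δσ = (σafter − σbefore)/σbefore), can be sketched as follows. This is a minimal illustration that assumes both distributions are represented by samples; the function name `constraint_shift` and the numbers are hypothetical and are not values from this study.

```python
from statistics import mean, stdev

def constraint_shift(before, after):
    """Return (delta_mu, delta_sigma_pct) for two samples of ΔSM given in %.

    delta_mu        = mean(after) - mean(before)
    delta_sigma_pct = 100 * (sd(after) - sd(before)) / sd(before)
    """
    delta_mu = mean(after) - mean(before)
    sd_before = stdev(before)
    delta_sigma_pct = 100.0 * (stdev(after) - sd_before) / sd_before
    return delta_mu, delta_sigma_pct

# Hypothetical ΔSM samples (%), for illustration only.
raw = [-8.0, -6.0, -4.0, -2.0, 0.0]
constrained = [-3.0, -2.0, -1.0, 0.0, 1.0]

shift, spread_change = constraint_shift(raw, constrained)
print(f"delta mu = {shift:+.2f} %, delta sigma = {spread_change:+.1f} %")
```

A negative Δσ corresponds to a narrower constrained distribution, that is, an uncertainty reduction.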

In addition to the positive shifts of the estimated average ΔSM, both univariate (i.e., LT) and bivariate ECs resulted in narrower PDF curves, indicating a reduction in ΔSM uncertainty. This is because the uncertainty in T and P observations is small compared to their uncertainty in CMIP6 simulations (Supplementary Table 6). In particular, the proposed bivariate nonlinear EC (i.e., NTP) leads to an uncertainty reduction of 7.87%, more than four times the 1.68% achieved by the traditional univariate EC (i.e., LT). The Bayesian Information Criterion (BIC) values for the LT, LTP and NTP models are 171.24, 172.56 and 168.17, respectively, which suggests that the LTP and NTP models do not overfit relative to LT. Since NTP has greater explanatory power and is more effective in reducing the uncertainty of ΔSM than the other models, we place greater confidence in the ΔSM results from the NTP constraint. The better fit of NTP over LTP further indicates the importance of considering T and P interactions for constraining ΔSM and suggests that, when averaged globally, surface soils may not be as dry in the future as previously projected by the raw CMIP6 models.
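To make the role of the BIC concrete, the sketch below uses one common form of the criterion for least-squares fits with Gaussian errors, n·ln(RSS/n) + k·ln(n). The text does not state which form was used to obtain the reported values, so this formula, the function name `bic_gaussian_ols`, and the residual sums of squares are assumptions for illustration only.

```python
import math

def bic_gaussian_ols(rss, n_samples, n_params):
    """BIC of a least-squares fit with Gaussian errors (up to an additive constant).

    rss       : residual sum of squares of the fit
    n_samples : number of data points (here, models in the ensemble)
    n_params  : number of fitted coefficients, including the intercept
    """
    return n_samples * math.log(rss / n_samples) + n_params * math.log(n_samples)

n = 30                                   # ensemble size used in the text
bic_lt = bic_gaussian_ols(60.0, n, 2)    # hypothetical RSS, intercept + T
bic_ntp = bic_gaussian_ols(40.0, n, 4)   # hypothetical RSS, adds P and T*P

# Each extra coefficient costs ln(n); it pays off only if RSS drops enough.
print(f"BIC LT = {bic_lt:.2f}, BIC NTP = {bic_ntp:.2f}")
```

Under this form, a model with more coefficients attains a lower BIC only when the improvement in fit outweighs the ln(n) penalty per added term.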

Similar to the global-scale results, the NTP method outperforms the LT method in semi-arid, dry sub-humid, and humid regions, but does not appear to perform better in hyper-arid and arid climate regions due to large uncertainty in P observations (Fig. 4a and Supplementary Table 5). For these two regions we therefore use global P as the constraint, motivated by the statistically significant correlation between the multi-model ensemble mean of global P and regional P for hyper-arid (r = 0.62, p < 0.05) and arid (r = 0.65, p < 0.05) climate regions, and by the accelerated moisture transfers and hydrological cycle over water-limited regions45. With global P, the emergent relationship shows better goodness of fit for the LP, LTP, and NTP models in both hyper-arid and arid regions (Fig. 4a), which are key drivers in the reduced uncertainty at a global scale. The improvement from the bivariate EC (NTP) depends on the climate zone, reflecting the region-dependent capability of the NTP model and the spatial distribution of uncertainty in the climate observations used for constraining ΔSM. For hyper-arid, arid, dry sub-humid, and humid areas, the NTP approach achieves uncertainty reductions of 14.60%, 9.91%, 3.96%, and 7.13%, respectively, which are 1.00, 1.73, 1.12, and 2.38 times those of the univariate ECs, respectively. However, in hyper-arid, arid, and humid regions, the univariate constraint (LT or LP) appears more effective than the LTP constraint in reducing uncertainty, as the benefit of a better fit is completely counteracted by the additional uncertainty of the P or T trend (Fig. 4c).

Fig. 4: Emergent constraints on regional ΔSM.

a The coefficients of determination (R2) of the emergent relationships based on the LT (red), LP (blue), LTP (yellow) and NTP (purple) constraints for Hyper arid, Arid, Semi-arid, Dry sub-humid and Humid regions. Hyper arid g and Arid g represent the results for Hyper arid and Arid regions using the global precipitation trend as the predictor instead of regional precipitation. A dot indicates that the fit of the emergent relationship is statistically significant (p < 0.05). b The differences in the central estimate of ΔSM after constraint relative to that before constraint (Δμ=μafter-μbefore, %). c The differences in the uncertainties of ΔSM after constraint relative to that before constraint (Δσ = (σafter-σbefore)/σbefore, %). Definitions of five climate regions are provided in Supplementary Fig. 11.

For semi-arid, dry sub-humid and humid regions, the LT, LTP and NTP methods consistently yield a positive shift of ΔSM, suggesting that future SM may have been underestimated by the raw CMIP6 models (Fig. 4b). The largest increase in SM is obtained for NTP, followed by LTP or LT. Overall, the regional pattern of constrained ΔSM broadly exhibits an intensified drying in arid regions and enhanced wetting in wet regions. Specifically, for semi-arid, dry sub-humid and humid regions, the nonlinear bivariate EC (i.e., NTP) even changes the sign of ΔSM from negative to positive (Supplementary Table 7). However, the ECs lead to a decrease in SM in arid and hyper-arid regions (except for NTP in the arid region), as ΔSM in the two regions is dominated by ΔP (Fig. 4a) and ESMs have a wet bias in the P trend (Supplementary Table 5). Spatially, the change in the sign of ΔSM is primarily observed in high-latitude areas of the Northern Hemisphere and Australia (Supplementary Fig. 12). These regional differences between the constrained ΔSM patterns can be attributed to the overestimation of recent T and P trends by CMIP6 models (Supplementary Table 5) and imply that model uncertainty in simulating historical T and P trends would propagate into future projections of ΔT and ΔP, and thus impact ΔSM.

Discussion

Reducing the uncertainty of ESMs in ΔSM projections is critical to inform climate policies and guide water management, although ESMs may not necessarily account for all sources of uncertainty. Given the strong relation between ΔSM and SM itself (r = 0.75), observational SM could in principle be used to constrain ΔSM. However, observational SM datasets show large discrepancies not only in the magnitude but also in the sign of the SM trend, and their uncertainty is even higher than that of CMIP6 models (Supplementary Fig. 13), which makes it infeasible to constrain ΔSM using the historical SM trend. The smaller observational uncertainty of climate forcing relative to SM allows ΔSM projections to be constrained using the EC approach. Previous EC studies have typically used univariate approaches, such as the LT and LP constraints. In this study, we introduced a bivariate EC and applied it to constrain the spread of ΔSM at the global and regional scales. Sensitivity analyses on different future periods and different future scenarios show that the NTP constraint consistently exhibits the best performance in reducing the uncertainty of ΔSM (Supplementary Table 8 and Figure 14). Since future changes in total SM are dominated by ΔP46, the marginal improvement in goodness of fit from the bivariate ECs is mostly offset by the additional uncertainty of ΔT when constraining total SM (Supplementary Fig. 15).

Indeed, the performance of EC methods is related to the prediction error (σy, see Methods) and the observational uncertainty. Since the same database was used for different ECs, and the influence of internal climate variability can be neglected (Supplementary Fig. 8), the EC performance is primarily determined by the prediction error σy, which is captured by the goodness of fit (represented by the residual standard error s) and the additional uncertainty associated with the prediction term \(A{({X}^{T}X)}^{-1}{A}^{T}\). A lower value of s, which corresponds to a better goodness of fit, and a smaller additional uncertainty both improve the ability of the EC to reduce the uncertainty of ΔSM. When extending from a univariate EC to a bivariate EC, the goodness of fit improves, but the added term in the fit introduces additional uncertainty. In some cases, the reduction in ΔSM uncertainty gained from a better goodness of fit may be mostly offset by the additional uncertainty of P, as illustrated by the case of LTP (Fig. 3). In hyper-arid and arid regions, after replacing regional P with global P, the ECs lead to a large reduction in ΔSM uncertainty, as the observational uncertainty in global P is substantially lower than that in regional P for hyper-arid and arid regions (Supplementary Table 5).

Our analysis does not find evidence of emergent relationships under the more moderate warming scenarios SSP1-2.6 and SSP2-4.5 (Supplementary Fig. 14) or in spring and autumn under the SSP5-8.5 scenario (Supplementary Fig. 16), due to weaker warming signals24,47,48 and consequently smaller ΔSM (Supplementary Figs. 2–6). Considering the potential influences of internal variability and uncertainties in the observations, we restrict our historical period to begin in 1980, as the effects of anthropogenic aerosols and internal climate variability are relatively small (Supplementary Note 2 and Figure 8) and gauge observations are more readily available thereafter. Given the global warming hiatus reported after the 2000s49, we conduct a sensitivity analysis by extending the historical period to 1970–2014, which leads to similar conclusions. However, the effectiveness of NTP in uncertainty reduction is lower than for the period 1980–2014, as the uncertainty of observed P during 1970–2014 is higher (Supplementary Fig. 17). Some CMIP6 climate models are known to share common components and are therefore not fully independent. Nevertheless, our sensitivity test shows consistent results under a leave-one-out cross-validation procedure (Supplementary Fig. 18). Due to data limitations, we also do not differentiate between liquid soil moisture and soil ice, which may respond to future climate change through distinct mechanisms.
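A leave-one-out test of this kind can be sketched as below. The exact procedure behind Supplementary Fig. 18 is not described here, so this is only one plausible reading: the regression is refitted N times, each time omitting one model, and the prediction at the observed predictors is recorded. The function name `leave_one_out_predictions` and the synthetic data are assumptions.

```python
import numpy as np

def leave_one_out_predictions(X, y, a):
    """Refit the regression N times, dropping one model each time.

    X : (N, k) design matrix including the intercept column
    y : (N,) predictand (e.g. future change per model)
    a : (k,) row vector of observed predictors, e.g. [1, x1_obs, x2_obs]
    Returns the N predictions at a, one per left-out model.
    """
    n = X.shape[0]
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # boolean mask excluding model i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        preds[i] = a @ beta
    return preds

# Synthetic ensemble (hypothetical values, NOT CMIP6 data).
rng = np.random.default_rng(3)
x1 = rng.normal(0.3, 0.1, 30)
x2 = rng.normal(1.0, 0.5, 30)
X_demo = np.column_stack([np.ones(30), x1, x2])
y_demo = 1.0 - 10.0 * x1 + 2.0 * x2 + rng.normal(0.0, 1.0, 30)
a_obs = np.array([1.0, 0.25, 0.8])        # hypothetical observed trends

loo = leave_one_out_predictions(X_demo, y_demo, a_obs)
print(f"LOO prediction range: {loo.min():.2f} to {loo.max():.2f}")
```

A narrow range of leave-one-out predictions indicates that no single model dominates the emergent relationship.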

Although SM has an upper and a lower limit, its changes are relatively small. Indeed, our results show that the raw estimates of ΔSM range from −16.18% to 1.49% across climate models, while the constrained estimates range from −9.95% to 10.49% (Supplementary Fig. 19). Despite our efforts to reduce uncertainty, accurately estimating ΔSM remains challenging, as other important driving factors (e.g., vegetation dynamics, vertical soil properties) were not considered in our bivariate emergent constraint approach. Nevertheless, our study provides a valuable framework to measure how climate model bias in temperature and precipitation simulations would propagate and affect the projection of ΔSM under global warming. Along with the uncertainty reduction, the results suggest a shift of ΔSM from negative to positive in humid regions and an intensified drying in arid regions, which is linked to climate model bias in simulating surface water availability. The constrained ΔSM from the bivariate ECs can enhance our understanding of the complex and diverse patterns of ΔSM, which has important implications for water resource management, agricultural planning, and climate adaptation, as it provides more reliable information for policy-making. Indeed, the increased drying in dry regions may further exacerbate water scarcity and increase water stress on ecosystems, while the enhanced wetting in wet regions suggests a potential risk of flooding and thus calls for more targeted strategies to optimize water resources for risk management. The remaining low signal (multi-model mean) to noise (inter-model variance) ratio of the constrained ΔSM suggests the need to further reduce the uncertainty associated with observational data and to develop additional constraints in the future.

Methods

CMIP6 models and observations

We use monthly mean output of surface soil moisture (SM), 2 m air temperature (T), and precipitation (P) from all 30 available CMIP6 models for the historical and future scenarios under SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5 (Supplementary Note 1). We focus on changes in surface SM (upper 10 cm) as its outputs are readily available for all models and different models have diverse representations of soil depths for total soil moisture outputs. For the observations, three gridded datasets are utilized for temperature and precipitation, respectively (Supplementary Table 3), which cover global land areas from 1980 to the present. Reanalysis datasets are excluded due to potential artificial trends resulting from temporal assimilation changes18,50. Both the simulations and observational data were interpolated to a 0.5° × 0.5° resolution using a bilinear interpolation method, to match the resolution of CRU land mask. We also mask out Greenland and the Antarctic where surface soil moisture is not meaningful.

Bivariate emergent constraint approach

The EC method is based on a physically explainable emergent relationship between a simulated variable over the historical period (independent variable X) and a simulated variable in a future period (dependent variable Y) across an ensemble of models13,14,20,51. Observations of X are then used to constrain Y if the uncertainty in X observations is small compared to the spread in X simulations. A precondition is therefore that X is defined using the same geographical boundary, such that observations of X can be plugged into the emergent relations to constrain Y. Details of the univariate linear EC method are provided in Supplementary Method 1. Here, we introduce a bivariate EC approach, which incorporates two predictors (X1 and X2) into a regression model expressed in Eq. (1).

$$Y=X\beta +\varepsilon$$
(1)

For the linear bivariate constraint in Eq. (1), Y is an \(N\times 1\) dependent variable matrix \(\left(\begin{array}{c}y_{1}\\ \vdots \\ y_{N}\end{array}\right)\), X is an \(N\times 3\) covariate matrix \(\left(\begin{array}{ccc}1 & x_{1_{1}} & x_{2_{1}}\\ \vdots & \vdots & \vdots \\ 1 & x_{1_{N}} & x_{2_{N}}\end{array}\right)\), β is the \(3\times 1\) regression coefficient matrix \(\left(\begin{array}{c}\beta_{0}\\ \beta_{1}\\ \beta_{2}\end{array}\right)\) for X, and ε is the \(N\times 1\) random error matrix \(\left(\begin{array}{c}\varepsilon_{1}\\ \vdots \\ \varepsilon_{N}\end{array}\right)\), where the random errors \(\varepsilon_{i}\) are normally distributed with constant variance \(\sigma^{2}\) (homoscedasticity) and N is the number of models. The linear regression model can be further expressed as Eq. (2).

$$Y=f({X}_{1},{X}_{2})+\varepsilon ={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\varepsilon$$
(2)

For the nonlinear bivariate constraint in Eq. (1), Y and ε are the same matrices as in the linear bivariate constraint, while X is an \(N\times 4\) covariate matrix \(\left(\begin{array}{cccc}1 & x_{1_{1}} & x_{2_{1}} & x_{1_{1}}x_{2_{1}}\\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{1_{N}} & x_{2_{N}} & x_{1_{N}}x_{2_{N}}\end{array}\right)\), and β is the \(4\times 1\) regression coefficient matrix \(\left(\begin{array}{c}\beta_{0}\\ \beta_{1}\\ \beta_{2}\\ \beta_{3}\end{array}\right)\) for X. The nonlinear regression model can also be further expressed as Eq. (3).

$$Y=f({X}_{1},{X}_{2})+\varepsilon ={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+{\beta }_{3}{X}_{1}{X}_{2}+\varepsilon$$
(3)

where the interaction term X1X2 accounts for the joint, nonlinear impact of X1 and X2 on SM, given that the impact of T on SM depends on P while the effect of P on SM depends on T. If β3 is sufficiently small or not statistically significantly different from zero, it suggests that the variables X1 and X2 do not interact (Supplementary Table 9). A positive β3 indicates that as one covariate increases, the effect of the other covariate on the outcome becomes more positive. A negative β3 indicates that as one covariate increases, the effect of the other covariate on the outcome becomes more negative.
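The design matrices of Eqs. (2) and (3) and the interpretation of β3 can be sketched as follows. Under Eq. (3), the sensitivity of Y to X1 is ∂Y/∂X1 = β1 + β3X2, so it varies with X2. The helper names `design_matrix` and `marginal_effect_x1`, the coefficients, and the synthetic predictors are all assumptions for illustration and are not the values fitted in this study.

```python
import numpy as np

def design_matrix(x1, x2, interaction=False):
    """Build X for Eq. (2) (N x 3) or, with interaction=True, Eq. (3) (N x 4)."""
    cols = [np.ones_like(x1), x1, x2]
    if interaction:
        cols.append(x1 * x2)              # interaction column x1 * x2
    return np.column_stack(cols)

def marginal_effect_x1(beta, x2):
    """dY/dX1 under Eq. (3): beta1 + beta3 * x2."""
    return beta[1] + beta[3] * x2

# Synthetic predictors (hypothetical values, NOT observed trends).
rng = np.random.default_rng(1)
x1 = rng.normal(0.3, 0.1, 30)
x2 = rng.normal(1.0, 0.5, 30)
true_beta = np.array([0.5, -2.0, 0.8, 1.5])   # made-up coefficients

# Noise-free response on purpose, so the fit should recover true_beta.
X_ntp = design_matrix(x1, x2, interaction=True)
y = X_ntp @ true_beta
beta_hat, *_ = np.linalg.lstsq(X_ntp, y, rcond=None)

print("fitted coefficients:", np.round(beta_hat, 3))
print("dY/dX1 at x2 = 0:", marginal_effect_x1(beta_hat, 0.0))
print("dY/dX1 at x2 = 2:", marginal_effect_x1(beta_hat, 2.0))
```

With a positive β3 as in this made-up example, the effect of X1 becomes more positive as X2 increases, matching the interpretation given above.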

For each input pair (x1, x2), the prediction error (the standard deviation of the predicted response) can be expressed as a function of (x1, x2) as Eq. (4).

$${\sigma }_{y}({x}_{1},{x}_{2})=s\sqrt{1+A{({X}^{T}X)}^{-1}{A}^{T}}$$
(4)

where s is the residual standard error, with \(s^{2}\) given by Eq. (5), which represents the estimated deviation of the simulated values from the fitted values. A is an observation-based row vector, \(\left(\begin{array}{ccc}1 & x_{1} & x_{2}\end{array}\right)\) for the relationship of Eq. (2) and \(\left(\begin{array}{cccc}1 & x_{1} & x_{2} & x_{1}x_{2}\end{array}\right)\) for that of Eq. (3).

$${s}^{2}=\frac{1}{N-p-1}{\sum }_{i=1}^{N}{\left(\,f\left({x}_{{1}_{i}},{x}_{{2}_{i}}\right)-{y}_{i}\right)}^{2}$$
(5)

where p is the number of covariates in the regression model, which is 2 for the linear bivariate constraint and 3 for the nonlinear bivariate constraint. Details about the prediction error \({\sigma }_{y}({x}_{1},{x}_{2})\) are further explained in Supplementary Method 2.
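Eqs. (4) and (5) can be evaluated numerically as in the sketch below, which assumes an ordinary least-squares fit and uses synthetic data. The function name `prediction_sigma` and all values are hypothetical. The sketch also illustrates a known property of the term \(A{({X}^{T}X)}^{-1}{A}^{T}\) for a regression with an intercept: it equals 1/N at the ensemble mean of the predictors and grows as the observed point moves away from it.

```python
import numpy as np

def prediction_sigma(X, y, a):
    """Evaluate Eqs. (4)-(5) at the row vector a.

    X : (N, p + 1) design matrix including the intercept column
    y : (N,) simulated future values
    a : (p + 1,) vector such as [1, x1, x2]
    Returns (sigma_y, s).
    """
    n, k = X.shape                                  # k = p + 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s = np.sqrt(resid @ resid / (n - k))            # Eq. (5): N - p - 1 dof
    extra = a @ np.linalg.solve(X.T @ X, a)         # A (X^T X)^-1 A^T
    return s * np.sqrt(1.0 + extra), s              # Eq. (4)

# Synthetic ensemble (hypothetical values, NOT CMIP6 data).
rng = np.random.default_rng(2)
x1 = rng.normal(0.3, 0.1, 30)
x2 = rng.normal(1.0, 0.5, 30)
X_ltp = np.column_stack([np.ones(30), x1, x2])
y_sim = 1.0 - 10.0 * x1 + 2.0 * x2 + rng.normal(0.0, 1.0, 30)

a_mean = np.array([1.0, x1.mean(), x2.mean()])       # at the ensemble mean
a_far = np.array([1.0, x1.mean() + 1.0, x2.mean()])  # far from the ensemble

sigma_mean, s_hat = prediction_sigma(X_ltp, y_sim, a_mean)
sigma_far, _ = prediction_sigma(X_ltp, y_sim, a_far)
print(f"s = {s_hat:.3f}, sigma_y(mean) = {sigma_mean:.3f}, sigma_y(far) = {sigma_far:.3f}")
```

This is why an observed predictor lying far from the model ensemble inflates the prediction error even when the fit itself is good.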

Emergent constraint on future changes in soil moisture

The proposed bivariate EC approach is applied to constrain future ΔSM, with historical T and P trends (1980–2014) as the predictors X1 and X2, and future ΔSM (2070–2099 minus 1980–2014) as the predictand Y. For global and regional constraints, all variables are spatially aggregated accordingly, based on which the ECs are then applied. We use a linear (LTP) and nonlinear (NTP) bivariate EC method, considering the combined influences of T and P and their interactions, respectively. To demonstrate the effectiveness of our proposed method, we compare the ability to reduce uncertainty in future ΔSM between LTP, NTP, and commonly used univariate (linear) EC approaches based on T only (LT) and P only (LP). We first apply the EC at the global scale and then explore its value at the regional scale.
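The predictors and the predictand can be computed from annual series as sketched below. Two assumptions are made that are not stated explicitly in this section: the historical trend is taken as the least-squares slope of the annual series, and ΔSM is expressed as a percentage of the historical mean. The function names and the sample values are hypothetical.

```python
import numpy as np

def linear_trend(years, values):
    """Least-squares slope of values against years (units per year)."""
    slope, _intercept = np.polyfit(years, values, 1)
    return float(slope)

def percent_change(historical, future):
    """Change of the future mean relative to the historical mean, in %."""
    hist_mean = np.mean(historical)
    return float(100.0 * (np.mean(future) - hist_mean) / hist_mean)

# Hypothetical annual series for a single model, for illustration only.
years = np.arange(1980, 2015)                  # 1980-2014 inclusive
temperature = 14.0 + 0.03 * (years - 1980)     # exact 0.03 K/yr trend
sm_hist = np.full(35, 0.25)                    # 1980-2014 soil moisture
sm_future = np.full(30, 0.24)                  # 2070-2099 soil moisture

t_trend = linear_trend(years, temperature)
delta_sm = percent_change(sm_hist, sm_future)
print(f"T trend = {t_trend:.3f} K/yr, dSM = {delta_sm:.1f} %")
```

In practice these quantities would be computed after the spatial aggregation described above, once per model and once per observational dataset.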

After applying the constraint, the contours of equal probability density around the best-fit regression can be expressed as the probability density function (PDF) of y given x1 and x2 (Eq. (6)). Hence, the marginal PDF of the constrained y can be obtained by integrating the product of the conditional PDF of \({y}_{fu}| {x}_{{1}_{ob}},{x}_{{2}_{ob}}\) and the joint PDF of \({x}_{{1}_{ob}},{x}_{{2}_{ob}}\) over both observed variables (Eq. (7)).

$$\mathrm{PDF}\left(y|{x}_{1},{x}_{2}\right)=\frac{1}{\sqrt{2\pi {\sigma }_{y}^{2}\left({x}_{1},{x}_{2}\right)}}\exp \left\{-\frac{{\left(y-f\left({x}_{1},{x}_{2}\right)\right)}^{2}}{2{\sigma }_{y}^{2}\left({x}_{1},{x}_{2}\right)}\right\}$$
(6)

$$\mathrm{PDF}({y}_{fu})=\int_{-\infty }^{+\infty }\int_{-\infty }^{+\infty }\mathrm{PDF}({y}_{fu}| {x}_{{1}_{ob}},{x}_{{2}_{ob}})\,\mathrm{PDF}({x}_{{1}_{ob}},{x}_{{2}_{ob}})\,d{x}_{{1}_{ob}}\,d{x}_{{2}_{ob}}$$
(7)

where \({{\mbox{PDF}}}({\,y}_{fu}| {x}_{{1}_{ob}},{x}_{{2}_{ob}})\) is the probability density for the future projected variable yfu given historical observable variables \({x}_{{1}_{ob}}\) and \({x}_{{2}_{ob}}\), and \({{\mbox{PDF}}}({x}_{{1}_{ob}},{x}_{{2}_{ob}})\) is the bivariate joint Gaussian distribution of \({x}_{{1}_{ob}}\) and \({x}_{{2}_{ob}}\).
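The double integral in Eq. (7) can be approximated numerically. The sketch below uses Monte Carlo sampling of the observed predictors from a bivariate Gaussian and averages the conditional Gaussian of Eq. (6) over the draws. This is one possible numerical approach, not necessarily the one used in this study; the function name `constrained_pdf`, the observational mean and covariance, and the ensemble data are all hypothetical.

```python
import numpy as np

def constrained_pdf(y_grid, X, y, obs_mean, obs_cov, n_draws=2000, seed=0):
    """Monte Carlo estimate of Eq. (7) on y_grid.

    X is the (N, 3) or (N, 4) design matrix of Eq. (2) or Eq. (3);
    obs_mean (2,) and obs_cov (2, 2) define the bivariate Gaussian of
    the observed predictors.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s = np.sqrt(resid @ resid / (n - k))                   # Eq. (5)
    xtx_inv = np.linalg.inv(X.T @ X)

    draws = rng.multivariate_normal(obs_mean, obs_cov, size=n_draws)
    cols = [np.ones(n_draws), draws[:, 0], draws[:, 1]]
    if k == 4:                                             # interaction term
        cols.append(draws[:, 0] * draws[:, 1])
    A = np.column_stack(cols)                              # one row per draw

    mu = A @ beta                                          # f(x1, x2)
    extra = np.einsum("ij,jk,ik->i", A, xtx_inv, A)        # A (X^T X)^-1 A^T
    sigma = s * np.sqrt(1.0 + extra)                       # Eq. (4)

    z = (y_grid[:, None] - mu[None, :]) / sigma[None, :]
    dens = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma[None, :])
    return dens.mean(axis=1)                               # average of Eq. (6)

# Synthetic ensemble and observations (hypothetical, NOT the study's data).
rng = np.random.default_rng(4)
x1 = rng.normal(0.3, 0.1, 30)
x2 = rng.normal(1.0, 0.5, 30)
y_fut = 1.0 - 10.0 * x1 + 2.0 * x2 + rng.normal(0.0, 1.0, 30)
X_ntp = np.column_stack([np.ones(30), x1, x2, x1 * x2])

y_grid = np.linspace(-15.0, 15.0, 3001)
pdf = constrained_pdf(y_grid, X_ntp, y_fut,
                      obs_mean=np.array([0.25, 0.8]),
                      obs_cov=np.diag([0.02**2, 0.1**2]))
dy = y_grid[1] - y_grid[0]
print(f"integral of PDF = {pdf.sum() * dy:.4f}")
```

The constrained mean and standard deviation can then be read from the resulting density, for comparison with the unconstrained ensemble.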

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.