Introduction

Soil moisture, a crucial factor affecting water, energy and carbon cycles1,2,3, serves as a significant source of climate predictability across a wide range of timescales4,5,6,7,8. Based on 3-hourly or daily observations and reanalysis data, previous studies have shown that moist soils and high evaporation can enhance the probability of afternoon precipitation9,10. However, both positive and negative feedbacks of soil moisture on next-day precipitation probability have been observed over the United States11 and Australia12. These reflect complex processes underlying soil moisture–precipitation coupling: both wet and dry soils can either enhance or weaken precipitation, depending on atmospheric conditions13,14,15,16; specifically, wet soil replenishes local water vapor and moistens the atmospheric boundary layer17, while dry soil enhances surface sensible heat and increases the depth of the boundary layer18,19.

Despite abundant evidence that soil moisture influences precipitation, it remains challenging to clearly discern the pathways linking soil moisture to the precipitation response within the complex Earth system, and to identify the conditions that favor strong soil moisture–precipitation coupling. Moreover, past studies20,21,22,23 have emphasized soil moisture variability (standard deviation) in quantifying land–atmosphere coupling, but how the variability modulates soil moisture–precipitation coupling through different physical pathways remains poorly understood. Advancing process-level insights into this coupling can identify key mechanisms and inform the parameterization of physical processes in models, ultimately facilitating effective drought/flood prediction and emergency response24,25.

Climate modeling serves as an important tool for studying soil moisture feedbacks. Extensive model-based studies have shown that soil moisture–atmosphere feedbacks can intensify hydrometeorological aridity26 and extremes27,28, drive land carbon uptake variability29, and regulate land drying30 and water availability31. However, due to differences in atmospheric and land surface parameterizations32,33, climate models show substantial divergences in representations of soil moisture–atmosphere coupling20,34, which necessarily influence their findings. Therefore, assessment and diagnosis of climate models’ ability to simulate soil moisture–precipitation coupling and the underlying processes is crucial to the credibility of model-based understandings, predictions and projections of climate variability and extremes24,35,36.

A series of coupling indices22,23,37,38,39 has been developed to quantify soil moisture effects on the atmosphere, but such metrics do not necessarily reflect the underlying “causality”. Encouragingly, recent strides in causality analysis40,41,42,43 offer new opportunities to analyze causal effects between variables, and the availability of daily satellite observations of soil moisture44,45 facilitates the detection of surface soil moisture–precipitation (SSM–P) coupling across the globe. Here we adopt an information flow-based causal approach, the Liang–Kleeman technique (see Methods), to investigate the causality of SSM on precipitation in boreal warm seasons (May–September), using daily data from Neural Network algorithm-based satellite observations of soil moisture (NNsm), Multi-Source Weighted Ensemble Precipitation (MSWEP) observations, and the European Centre for Medium-Range Weather Forecasts Reanalysis version 5 (ERA5) (see Methods). We also investigate the physical pathways of SSM–P coupling and the linkage between SSM variability and causal effects across the full coupling chain. Furthermore, we evaluate how well the latest climate models in the Coupled Model Intercomparison Project Phase 6 (CMIP6) can capture the SSM–P causality, and discuss implications of our process-level findings for model development.

Results

Causal relationship between surface soil moisture and precipitation

Based on long-term and high-quality observations, we conduct Liang–Kleeman information flow analysis and explore the spatial distribution of SSM–P coupling over the global land (60°S–60°N). The magnitude of the X→Y information flow measures the causal effect strength of X on Y, and thus, here we focus only on the absolute value of the information flow. From the NNsm and MSWEP observations, we find that the global SSM→P causal effects demonstrate large spatial heterogeneity with most values close to zero (Fig. 1a and Supplementary Fig. 1a), indicating that there is no causality of SSM on local precipitation across most of the global land. The SSM→P causality is statistically significant (p < 0.1) over only 15.7% of the global land areas, where the median and average are 0.04 and 0.05 nats day−1, respectively (Fig. 1a and Supplementary Fig. 2a). Further causality analysis based on ERA5 reanalysis data reveals similar patterns of SSM→P causality (Fig. 1c and Supplementary Fig. 1b), with 15.8% of the global land areas exhibiting statistically significant SSM→P causal effects (p < 0.1), indicating the robustness of the identified land–atmosphere coupling based on these two datasets. Both satellite and reanalysis data show obvious meridional differences in SSM→P causality, with two peaks around 10°N and 0°–10°S due to strong coupling in the tropics (Fig. 1b, d). The general lack of strong coupling in the mid-latitudes is likely because our information flow identifies causality from local SSM to precipitation, which cannot capture the impact of non-local (e.g., upwind) soil moisture that is likely more important in the mid-latitudes with generally stronger winds than in the tropics.
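To make the X→Y information flow concrete, the following is a minimal sketch of the widely used bivariate maximum-likelihood estimator of the Liang–Kleeman information flow for two time series. It assumes a linear model and equally spaced samples; the function name `liang_information_flow` and the `dt` argument are illustrative choices, and the exact estimator, normalization and significance test used in this study are those described in its Methods, which may differ from this sketch.

```python
import numpy as np

def liang_information_flow(x1, x2, dt=1.0):
    """Estimate the information flow from x2 to x1 (nats per unit time).

    Bivariate linear estimator built from sample covariances; an
    illustrative sketch, not necessarily the configuration of the study.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Euler forward difference of the target series x1.
    dx1 = (x1[1:] - x1[:-1]) / dt
    # Align the state variables with the forward difference.
    a, b = x1[:-1], x2[:-1]
    c = np.cov(np.vstack([a, b, dx1]))
    c11, c12, c22 = c[0, 0], c[0, 1], c[1, 1]
    c1d1, c2d1 = c[0, 2], c[1, 2]
    numerator = c11 * c12 * c2d1 - c12**2 * c1d1
    denominator = c11**2 * c22 - c11 * c12**2
    return numerator / denominator
```

With daily series at one grid cell, `abs(liang_information_flow(precip, ssm))` would give the magnitude of the SSM→P flow in nats day−1, matching the convention of using only the absolute value.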

Fig. 1: Strong surface soil moisture–precipitation (SSM–P) causality in boreal warm seasons based on observations and reanalysis.

a Global map of causal effects (nats day−1) of NNsm surface (0–5 cm) soil moisture on MSWEP precipitation during 2003–2020. The causal effect is expressed in terms of the magnitude of the information flow. Only statistically significant (p < 0.1) causal effects are shown. The pie chart shows the land area fraction with significant SSM→P causality (colored in pink) over the global land. The royal blue lines denote the 2500-m isoline of terrain height. b Latitudinal averages of the SSM–P causality based on NNsm and MSWEP data. The shading indicates one standard deviation of the causality within each latitude range. c, d Same as (a, b), but for ERA5 surface (0–7 cm) soil moisture and precipitation during 1979–2021. The regions enclosed by gray lines are hotspots detected in this study: North India (NI; 18°–35°N, 73°–87°E, with terrain height < 2500 m), the Sahel region (SR; 10°–16°N, 20°W–40°E), tropical Africa (TA; 0.25°–10°S, 5°–45°E), North Brazil (NB; 10°S–5°N, 40°–65°W), the Pacific coast of Mexico (PCM; the three vertices of the triangle are (17°N, 110°W), (32°N, 110°W), and (17°N, 100°W)), the western Tibetan Plateau (WTP; 27°–32.5°N, 81.5°–91°E, with terrain height > 2500 m), the Iranian Plateau (IP; 30°–41°N, 32°–53°E, with terrain height > 900 m), and the Greater Horn of Africa (GHA; 0°–10°N, 37.5°–51°E).

Here we roughly divide these significant areas across the globe into eight hotspots as outlined in Fig. 1c: North India (NI), the Sahel region (SR), tropical Africa (TA), North Brazil (NB), the Pacific coast of Mexico (PCM), the western Tibetan Plateau (WTP), the Iranian Plateau (IP), and the Greater Horn of Africa (GHA). These findings remain qualitatively unchanged when the significance level is restricted to 0.05, although the significant land area fractions based on observations and reanalysis data decrease to 10.1% and 11.7%, and the average magnitudes of the causality slightly increase (Supplementary Figs. 2c–f). The timescale at which SSM affects precipitation is crucial for both weather and climate predictability. Lagged information flows further demonstrate that SSM impacts on precipitation can persist at weather to sub-seasonal timescales with regional differences (see “Methods” and Supplementary Figs. 3, 4). However, these timescales of persistence in ERA5 may be sensitive to its choices of land cover and soil parameterization46.

Overall, the causality analysis based on the two datasets shows similar regions of strong SSM–P coupling, except in NB, where the satellite product performs poorly45,47,48. Our identified hotspots are partially consistent with Koster et al.34, who investigated how soil moisture in the sub-surface layer couples with precipitation, but with additional hotspots in alpine (WTP and IP) and tropical (NB, TA and GHA) regions. The emergence of these new hotspots of SSM–P coupling is likely due to (1) deficiencies in climate models at the time of Koster et al.34, e.g., coarse spatial resolutions, (2) different spatial and temporal scales, i.e., local scale in our study versus a combination of local and non-local feedbacks in Koster et al.34, and (3) different seasons and soil depths analyzed here as compared with their study.

Physical mechanisms underlying SSM–P causality

Previous studies9,10,11 suggest that local feedbacks of soil moisture to precipitation are regulated through evapotranspiration (ET) and/or surface sensible heat flux (SH). Consequently, there are two primary pathways through which SSM affects precipitation (SSM→ET→P and SSM→SH→P), which can be roughly divided into four sub-processes with terrestrial (SSM→ET and SSM→SH) and atmospheric (ET→P and SH→P) segments. Based on ERA5 reanalysis, we analyze the causality pertaining to each of the four sub-processes and their spatial distributions. The areas with strong SSM→ET and SSM→SH causality are substantially larger than those with strong ET→P and SH→P causality (Fig. 2a and Supplementary Fig. 5). Although terrestrial processes exhibit strong coupling across much of the global land, SSM impacts on precipitation in these regions may be limited by atmospheric processes. In other words, the SSM–P coupling is more constrained by atmospheric processes than terrestrial processes. Additionally, SSM→ET and SSM→SH causality patterns show distinct spatial variations (Supplementary Figs. 5a–b). For instance, in northern India, SH is more sensitive to SSM than ET, and the SSM→ET causality is weak, aligning with previous findings22,49 based on ERA5, MERRA and MERRA2, while Australia exhibits the opposite pattern. This suggests that the SSM effects on ET and SH are not necessarily symmetrical across regions.

Fig. 2: Strong causality of sub-processes by which boreal warm-season surface soil moisture (SSM) affects precipitation (P) based on ERA5 reanalysis.

a Land area fractions (LAFs; %) of significant SSM→evapotranspiration (ET), SSM→sensible heat (SH), ET→P and SH→P causality (p < 0.1) over the global land. b Global map of the pathways underlying SSM–P coupling. The SSM→ET→P pathway occurs over the overlapping areas with significant SSM→ET and ET→P causality, the SSM→SH→P pathway occurs over the overlapping areas with significant SSM→SH and SH→P causality, and thus over the overlapping areas with significant causality of all four sub-processes, both pathways occur. The royal blue lines denote the 2500-m isoline of terrain height. c Atmospheric moisture flux convergence (MFC) regime (mm day−1) and proportions (%) of grid cells with significant SSM→ET, ET→lifting condensation level (LCL) and ET→P causality over the eight SSM–P coupling hotspots. d Same as (c), but for boundary layer height (BLH) variability (−) and proportions of grid cells with significant SSM→SH, SH→BLH and SH→P causality. Here, BLH variability is defined as the ratio of BLH standard deviation to its climatological mean.

The overlapping areas of strong coupling across the four sub-processes largely reproduce the SSM–P coupling hotspots, although with some discrepancies (e.g., in the eastern United States) (Fig. 2b). Areas exclusively affected by the SSM→SH→P pathway account for 7.4% of the global land, while those affected solely by the SSM→ET→P pathway comprise 6.6%, with an additional 3.9% influenced by both pathways. This distribution indicates that both ET-related and SH-related pathways contribute significantly to land–atmosphere coupling. The total overlapping area (Fig. 2b) slightly exceeds the area of significant SSM→P causality (Fig. 1c), likely due to methodological differences in classification approaches or noise. Notably, distinct regional patterns emerge across the eight SSM–P coupling hotspots: in NI, SSM–P coupling primarily operates through the SSM→SH→P pathway with minimal contribution from the SSM→ET→P pathway; in contrast, the SSM→ET→P pathway dominates in the GHA and TA, while both pathways contribute substantially for the remaining five hotspots.
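The overlap logic behind the pathway map can be sketched as follows. Given four boolean significance masks (one per sub-process) on a common latitude–longitude grid, each grid cell is labeled as ET-only, SH-only, both, or neither. The function names, the integer label codes, and the cosine-latitude area weighting are assumptions made for illustration; the source does not state how its land area fractions were weighted, and a land mask would need to be applied beforehand.

```python
import numpy as np

def classify_pathways(sig_ssm_et, sig_et_p, sig_ssm_sh, sig_sh_p):
    """Label grid cells: 0 = none, 1 = ET pathway only, 2 = SH pathway only, 3 = both."""
    # A pathway is present only where both of its segments are significant.
    et_path = np.asarray(sig_ssm_et, dtype=bool) & np.asarray(sig_et_p, dtype=bool)
    sh_path = np.asarray(sig_ssm_sh, dtype=bool) & np.asarray(sig_sh_p, dtype=bool)
    labels = np.zeros(et_path.shape, dtype=np.int8)
    labels[et_path & ~sh_path] = 1
    labels[sh_path & ~et_path] = 2
    labels[et_path & sh_path] = 3
    return labels

def area_fractions(labels, lat_deg):
    """Area-weighted share (%) of each label on a (lat, lon) grid.

    Assumes every cell in `labels` is land and weights cells by cos(latitude).
    """
    labels = np.asarray(labels)
    lat_weights = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    weights = lat_weights[:, None] * np.ones(labels.shape)
    total = weights.sum()
    return {code: 100.0 * weights[labels == code].sum() / total for code in (0, 1, 2, 3)}
```

Summing the fractions for labels 1, 2 and 3 gives the total overlapping area that the text compares against the area of significant SSM→P causality.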

Among the eight hotspots, high proportions (exceeding 80%) of grid cells exhibit statistically significant causality for both SSM→ET and SSM→SH; however, the proportions showing significant ET→P and SH→P causality vary considerably across regions, ranging from 2.1% to 100% (Fig. 2c, d). This indicates that while the terrestrial segment’s causality is prevalent across these hotspots, the atmospheric segment’s influence shows strong regional dependence. The connection between soil moisture and precipitation operates through complex physical processes, with lifting condensation level (LCL) and boundary layer height (BLH) serving as critical intermediate links for ET and SH effects, respectively. Specifically, ET increases humidity in the lower atmosphere, thereby lowering the LCL. This reduced LCL makes it easier for air parcels to reach condensation heights, facilitating cloud formation and thus favoring precipitation. Convective precipitation typically requires the BLH to exceed the LCL. SH can elevate the boundary layer, which can trigger precipitation and potentially increase its intensity. Our analysis of the causal effects of ET on LCL and SH on BLH further confirms the observed pathway differences (Fig. 2c, d). In NI, the SSM→ET→P pathway is nearly absent, corresponding to minimal ET influence on LCL. Conversely, the SSM→SH→P pathway shows limited presence in the GHA and TA, consistent with weak SH→BLH causality in these regions. These process-level insights provide essential guidance for climate model assessment, diagnosis, and development.

To better understand how regional background climate influences these pathway differences, we examine the role of atmospheric moisture flux convergence (MFC) regime and BLH variability (as reflected by the coefficient of variation) across the identified hotspots (Fig. 2c, d). The relatively wet NI, characterized by the South Asian monsoon and predominantly external moisture source50,51, exhibits the largest MFC among all eight SSM–P coupling hotspots. This external moisture dominance renders the atmospheric boundary layer and precipitation relatively insensitive to ET changes, resulting in weak causal effects of ET→LCL and ET→P (Fig. 2c). For the remaining seven hotspots, lower MFC values make ET a non-negligible source of water vapor supply for precipitation, strengthening ET’s impacts on both LCL and precipitation. These results are consistent with previous findings13,16,52 showing that local soil moisture–precipitation feedback can be modulated by lower-tropospheric moisture, which is primarily controlled by water vapor transport.

Meanwhile, the SH–BLH/P causal relationship varies substantially across hotspots, with the proportion of strong SH→P and SH→BLH causality generally increasing with higher region-averaged BLH variability (Fig. 2d). This pattern emerges because BLH variability is affected by multiple factors53,54, including background climatology of SH and its fluctuations, wind speed, and aerosols. The low BLH variability in both the GHA and TA, which implies low sensitivity of BLH to SH variations, may be caused by strong background SH in the former and by weak SH fluctuations in the latter (Supplementary Fig. 6). As a result, SH-induced elevation in BLH is limited and unable to trigger precipitation, thereby making the SSM→SH→P pathway less viable in these two hotspots.
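The BLH variability metric used in this comparison is the ratio of the BLH standard deviation to its climatological mean, i.e., a coefficient of variation. A minimal sketch is given below; the function name is illustrative, and the use of the population standard deviation with NaN-aware reductions is an assumption rather than a detail stated in the source.

```python
import numpy as np

def blh_variability(blh, axis=0):
    """Coefficient of variation of boundary layer height along the time axis."""
    blh = np.asarray(blh, dtype=float)
    # Standard deviation divided by the climatological mean (dimensionless).
    return np.nanstd(blh, axis=axis) / np.nanmean(blh, axis=axis)
```

Because the metric is dimensionless, regions with very different mean BLH can be compared directly, which is what allows the hotspot-by-hotspot ranking discussed above.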

These findings indicate that regional differences in SSM–P causal pathways across hotspots are closely linked to moisture source characteristics and BLH variability. It should be noted that ERA5 reanalysis data show large residual moisture values (P-ET-MFC) over much of NB (Supplementary Fig. 7). This suggests a significant atmospheric moisture imbalance that may influence our derived SSM–P relationships and associated causal pathways. Future work examining the SSM–P coupling and its physical processes in this region needs more reliable data sources.

Variability–causality linkages

Given the importance of the magnitude of soil moisture variability in land–atmosphere coupling22,23, here we delve into how the variability shapes SSM→P causality as well as its physical processes. We also consider soil moisture regime (mean state of soil moisture) in view of its generally recognized role in SSM–P coupling34. Figure 3a depicts the global pattern of SSM variability, with large values concentrated in regions such as the WTP, NI, SR and PCM. Spatially, SSM variability aligns more closely with SSM–P coupling than the SSM regime (Supplementary Fig. 8a). Then, we calculate the probability of significant causality under different ranges of the two SSM characteristics. The probability of significant SSM→P causality is sensitive to changes in SSM regime, with strong causality occurring more frequently in transitional (semi-arid and semi-humid) SSM regime (0.1–0.5 m3 m−3), as indicated in previous research34; meanwhile, the SSM→P causal effect is highly sensitive to changes in the magnitude of SSM variability and the probability becomes larger with increasing SSM variability (Fig. 3b). That is, strong SSM→P causality tends to occur in regions of transitional SSM regime and large SSM variability, conditions that most grid cells in the hotspots exhibit (Supplementary Fig. 8b).

Fig. 3: Relationship between surface soil moisture (SSM) characteristics (variability and regime) and probabilities of strong causality of SSM–precipitation (P) and its sub-processes in boreal warm seasons based on reanalysis.

a Spatial distribution of SSM variability (m3 m−3) over the period of 1979–2021. The magnitude of SSM variability is defined as the standard deviation of daily SSM. The royal blue lines denote the 2500-m isoline of terrain height. b The probability (%) of significant SSM→P causality (p < 0.1) in each bin of SSM variability and SSM regime across the globe. The number of ERA5 grid cells (Ni,j) in the bin with the ith SSM variability range (i = 1, 2, …, 6) and the jth SSM regime range (j = 1, 2, …, 6) is calculated, and then these grid cells are classified into significant and non-significant parts according to the SSM→P causality. Finally, the proportion of the grid cells (ni,j) with significant SSM→P causality in the total grid cells (Ni,j) is calculated, which is the probability (ni,j/Ni,j). c, d Same as (b), but for evapotranspiration (ET)→P and sensible heat (SH)→P causality. e The probability (%) of significant SSM→P, ET→P and SH→P causality in different ranges of SSM variability.
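The binned probability ni,j/Ni,j described in this caption can be sketched with two 2-D histograms: one counting all grid cells per bin and one counting only the cells with significant causality. The function name and the bin edges passed in are hypothetical; the source specifies six ranges per axis but does not list the edges.

```python
import numpy as np

def binned_significance_probability(variability, regime, significant, var_edges, reg_edges):
    """Return (probability in %, cell counts) per (variability, regime) bin."""
    variability = np.ravel(np.asarray(variability, dtype=float))
    regime = np.ravel(np.asarray(regime, dtype=float))
    significant = np.ravel(np.asarray(significant, dtype=bool))
    # Drop cells with missing SSM statistics (e.g., ocean or masked cells).
    valid = np.isfinite(variability) & np.isfinite(regime)
    v, r, s = variability[valid], regime[valid], significant[valid]
    # N_ij: all grid cells in each bin; n_ij: significant grid cells in each bin.
    total, _, _ = np.histogram2d(v, r, bins=[var_edges, reg_edges])
    hits, _, _ = np.histogram2d(v[s], r[s], bins=[var_edges, reg_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        prob = np.where(total > 0, 100.0 * hits / total, np.nan)
    return prob, total
```

Empty bins are returned as NaN rather than zero so that they can be masked in a plot instead of being read as "no significant causality".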

Our analysis further examines how the probabilities of significant causal effects for the four sub-processes vary with the two SSM metrics. Similar to our earlier findings, strong causal effects of ET→P and SH→P tend to occur in a transitional SSM regime with large SSM variability (Fig. 3c, d). However, ET→P effects predominantly appear in semi-arid conditions, while SH→P effects span the entire range of the transitional regime. Strong SSM→ET and SSM→SH causal effects also prefer regions with large SSM variability (Supplementary Fig. 9). In more detailed analyses, we categorize global grid cells according to SSM variability and calculate the proportion exhibiting significant causality of SSM→P, ET→P and SH→P within each category (Fig. 3e). Notably, these proportions increase systematically with the SSM variability magnitude, with SSM→P and SH→P causality exceeding 70% in regions of considerably high variability (≥ 0.09 m3 m−3). This relationship stems from the positive correlation between SSM variability and ET/SH variability (Supplementary Fig. 10): larger soil moisture variability is often associated with wider fluctuations in ET and SH, enabling them to significantly affect boundary layer dynamics and consequently precipitation formation.

Compared to the ET→P process, the SH→P process is much more sensitive to SSM variability, with its proportion showing a steeper increase with SSM variability (Fig. 3e). This differential response occurs because ET→P causality is also constrained by external water vapor sources, which indeed mediate SSM variability—large SSM variability often occurs in regions with large MFC (Supplementary Fig. 11). These higher MFC values indicate greater contributions from external water vapor sources to precipitation, which can attenuate the causal effects of ET on precipitation. Overall, both SSM→ET→P and SSM→SH→P pathways are more likely in transitional regions with large SSM variability. Additionally, SH-related processes are more sensitive to changes in the magnitude of SSM variability than ET-related processes.

Multiple factors influence SSM variability, including land surface characteristics (e.g., land cover, soil type, and slope)55, land–atmosphere interactions, and large-scale circulations56, among others. Determining the dominant factor and how these factors interact with each other to regulate the SSM variability is outside the scope of this study and will be a topic for follow-up research. It should be emphasized, however, that the spatial distribution of SSM variability during 1979–2021 has remained relatively stable over the decades (Supplementary Fig. 12), suggesting SSM variability is an emergent geographical feature with persistent spatial characteristics despite possible climate-driven changes in its magnitude.

Implications for model evaluation and development

Based on daily observations and reanalysis data, we have revealed the spatial distribution of the SSM→P causal effect, with eight hotspots identified globally. Here, we investigate whether climate models can capture these hotspots, and apply our process-level findings to further assess the model performance, as this is useful to help guide models’ representation of land–atmosphere interactions that are important to improve climate predictions21,57. We apply the same information flow technique to output from 16 CMIP6 models, chosen based on the availability of daily output for soil moisture, surface heat flux and precipitation (see Methods and Supplementary Table 1).

According to spatial distributions of the simulated SSM→P causality (Supplementary Figs. 13–16) and proportions of grid cells with significant causality over the eight hotspots (Fig. 4a), most models perform poorly, capturing strong causality in only a small fraction of the strongly coupled areas identified by ERA5. Nonetheless, these climate models, except BCC-ESM1 and MRI-ESM2-0, perform relatively well in NB. Four of the sixteen models (AWI-ESM-1-1-LR, MPI-ESM1-2-LR, MPI-ESM1-2-HR and MPI-ESM-1-2-HAM) perform the best, reasonably capturing four or five hotspots (PCM, SR, IP, NB and NI), and are classified here as “ERA5-similar models”. In contrast, BCC-CSM2-MR, NorESM2-LM, MRI-ESM2-0 and BCC-ESM1 perform the worst, capturing only 19% to 36% of grid cells with significant causality over these hotspots, and are thus classified as “ERA5-dissimilar models”. These findings show that climate models vary widely not only in their simulated soil moisture58, but also in their ability to represent SSM–P causality, highlighting the need for further model improvement. These results can inform scientists when selecting climate models to explore land–atmosphere coupling.

Fig. 4: Process-level evaluation of boreal warm-season surface soil moisture–precipitation (SSM–P) causality based on CMIP6 simulations and reanalysis.

a ERA5-estimated and model-simulated proportions (%) of grid cells with significant SSM→P causality (p < 0.1) over the eight SSM–P coupling hotspots. b Comparison of the proportions of grid cells with significant SSM→evapotranspiration (ET) causality simulated by ERA5-similar models (AWI-ESM-1-1-LR, MPI-ESM1-2-LR, MPI-ESM1-2-HR and MPI-ESM-1-2-HAM) and ERA5-dissimilar models (BCC-CSM2-MR, NorESM2-LM, MRI-ESM2-0 and BCC-ESM1) with those estimated by ERA5. c–e Same as (b), but for SSM→sensible heat (SH), ET→P and SH→P causality, respectively. f Comparison of the relationship between SSM variability and probability of significant SSM→P causality simulated by ERA5-similar models with that derived by ERA5. g Same as (f), but for ERA5-dissimilar models.

To elaborate on the performance differences between “ERA5-similar models” and “ERA5-dissimilar models” in representing SSM–P coupling, we compare ERA5-estimated and model-simulated causality of the four sub-processes across three hotspots, i.e., NI, PCM, and SR (Fig. 4b–e). These hotspots are selected because ERA5-similar models consistently perform well here (with MPI-ESM-1-2-HAM showing slightly weaker performance in NI), while ERA5-dissimilar models consistently underperform. In general, despite individual model differences (Supplementary Fig. 17), ERA5-similar models more accurately capture the causality of all four sub-processes, with their average proportions of significant causality consistently closer to ERA5 estimates compared to those of ERA5-dissimilar models (Fig. 4b–e). The reasons for ERA5-dissimilar models’ underestimation of SSM→P causality are region-dependent: in NI, the primary deficiency lies in weakened SH→P causality; in the PCM, models show reduced causality across multiple pathways (SSM→ET, ET→P, and SH→P); and in the SR, the main limitations are likely due to diminished ET→P and SH→P causal effects. In addition, all models severely underestimate SSM–P coupling in the GHA due to weakened ET→P causality.

We further evaluate how climate models represent the relationship between SSM variability and SSM→P causality. We find that ERA5-similar models successfully capture the ERA5-derived positive correlation between SSM variability and probability of strong SSM→P causality, while ERA5-dissimilar models fail to reproduce this relationship (Fig. 4f, g). The ability to reasonably capture the variability–causality relationship provides an additional metric for evaluating model performance, particularly in representing local land–atmosphere coupling. Improving the representation of relevant sub-processes is an important prerequisite for improving the simulation capability of SSM–P coupling. Our process-level diagnostics clearly identify specific model deficiencies both regionally and globally. Importantly, this diagnostic approach could be extended beyond SSM–ET/SH–P processes to evaluate various other physical processes in climate models.

In summary, our information flow analysis of long-term satellite observations and reanalysis data reveals strong SSM→P causality over approximately 16% of the analyzed global land, concentrated in eight distinct hotspots. We identify significant regional variations in the physical pathways linking SSM to precipitation. The SH-related pathway dominates in NI, while the ET-related pathway prevails in the GHA and TA. Both pathways contribute substantially to the SSM–P coupling in the remaining five hotspots. These regional differences in the dominant pathways can be explained by specific hydroclimate characteristics, particularly external versus local moisture availability and BLH variability. Moreover, we provide in-depth insights into variability–causality relationships, showing that strong causal effects of SSM→P and its sub-processes preferentially occur in regions with large SSM variability. Notably, the SSM→SH→P causality is most sensitive to changes in the magnitude of SSM variability. Our results imply the need for enhanced in situ monitoring of soil moisture in these hotspots to improve SSM remote sensing inversion and understanding of SSM variability.

These process-level insights also offer new perspectives for model evaluation, diagnosis, and development59. Our analysis of 16 CMIP6 models reveals that while most models struggle to accurately represent SSM–P coupling, showing region-specific underestimation of sub-processes’ coupling, four models successfully and consistently reproduce the ERA5-derived variability–causality relationship. Caution is needed when using climate models to analyze future warming-driven changes in land–atmosphere coupling, and more high-frequency data output of climate models is required for accurate assessment60. Our results promote the understanding and prediction of land–atmosphere coupling, and provide a valuable benchmark for evaluating climate models’ physical processes, which facilitates model development.

Methods

Satellite observations and reanalysis data

In order to explore realistic characteristics of land–atmosphere coupling, satellite observations are treated as truth, and state-of-the-art reanalysis data are used for cross validation. We use a global long-term satellite product of daily soil moisture, which reflects 0–5 cm surface soil moisture. This so-called NNsm product adopts a soil moisture neural network inversion algorithm to transfer the advantages of Soil Moisture Active Passive (SMAP) to Advanced Microwave Scanning Radiometer for EOS (AMSR-E) and AMSR2, and to obtain long-term soil moisture observations with accuracy similar to SMAP at a 36-km spatial resolution45. The dataset is able to reproduce the spatiotemporal distribution of SMAP soil moisture, and compares well with in situ observations from the international soil moisture network around the world61, with an unbiased root mean square error of about 0.05 m3 m−3. As a substitute for global precipitation observations, we use the MSWEP version-2 product, with temporal and spatial resolutions of 3 h and 0.1°, respectively. The MSWEP dataset is unique in that it combines in situ, satellite, and reanalysis data to obtain the highest quality precipitation estimates for each location62. In both densely observed and unmeasured areas, the MSWEP tends to exhibit better performance than other precipitation products63,64.

ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global climate, which combines model data with observations (satellite and in situ) in the upper air and near surface65. Soil moisture in the first layer (0–7 cm) and precipitation data provided by ERA5 reanalysis are employed to re-examine the SSM→P causal effect. We also use the ERA5 data of soil moisture in the other three layers (7–28 cm, 28–100 cm, and 100–289 cm), evapotranspiration, surface sensible heat flux, boundary layer height, lifting condensation level (calculated using 2-m temperature and dewpoint temperature66), and atmospheric moisture flux convergence to analyze soil thickness sensitivity and physical processes of the SSM→P causality. The hourly reanalysis data we used are at a 0.25° spatial resolution.
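The source derives the LCL from 2-m temperature and dewpoint temperature following its ref. 66, whose formula is not reproduced in this text. As a stand-in for illustration only, the sketch below uses Espy's rule of thumb, in which the LCL height rises by roughly 125 m per kelvin of dewpoint depression. The function name and the choice of this approximation are assumptions and should not be read as the method used in the study.

```python
import numpy as np

def lcl_height_espy(t2m_kelvin, d2m_kelvin):
    """Approximate LCL height above the surface (m) from 2-m temperature and dewpoint.

    Uses Espy's approximation (about 125 m per K of dewpoint depression),
    a swapped-in stand-in for the formula cited in the source.
    """
    t2m = np.asarray(t2m_kelvin, dtype=float)
    d2m = np.asarray(d2m_kelvin, dtype=float)
    # Dewpoint depression cannot be negative for a physically consistent pair.
    depression = np.clip(t2m - d2m, 0.0, None)
    return 125.0 * depression
```

The approximation captures the qualitative link used in the Results: higher near-surface humidity (smaller dewpoint depression) gives a lower LCL.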

CMIP6 simulations

The CMIP6 simulations are used to evaluate the models’ ability to capture land–atmosphere coupling. CMIP6 was launched by the World Climate Research Program (WCRP) to answer new scientific questions currently facing the field of climate change67,68. Daily surface soil moisture, precipitation and surface heat flux data are required in our study to evaluate simulated SSM–P coupling and its pathways. Under these constraints, sixteen CMIP6 historical simulations (see Supplementary Table 1 for details) are selected, including AWI-ESM-1-1-LR, MPI-ESM-1-2-HAM, MPI-ESM1-2-LR, MPI-ESM1-2-HR, TaiESM1, CMCC-CM2-SR5, CMCC-CM2-HR4, CMCC-ESM2, ACCESS-ESM1-5, NorESM2-MM, NorESM2-LM, BCC-CSM2-MR, BCC-ESM1, CanESM5, MIROC6 and MRI-ESM2-0. Daily surface soil moisture (mrsos variable), surface latent heat flux (hfls variable), surface sensible heat flux (hfss variable), and precipitation (pr variable) fields from the ensemble member r1i1p1f1 of these simulations are used for the period 1970–2014.

Data processing

To analyze the causal relationship between NNsm and MSWEP daily data, it is necessary to interpolate the NNsm data in time and match the two products in space. We do this using the following procedure. First, based on the nearest neighbor method, gaps of no more than three consecutive missing days in the NNsm data are filled with the average of the nearest non-missing NNsm values before and after the gap. Second, the gap-filled data are spatially aggregated onto common grids with 0.5° × 0.5° resolution. Finally, the first step is repeated on the common grids to perform a second temporal interpolation, which further reduces missing data.
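The gap-filling and aggregation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used in the study: missing values are assumed to be stored as NaN, the spatial aggregation is simplified to a block mean over an integer number of fine cells, and the function names `fill_short_gaps` and `aggregate_blocks` are hypothetical.

```python
import warnings
import numpy as np

def fill_short_gaps(series, max_gap=3):
    """Fill runs of at most `max_gap` NaNs with the mean of the valid
    values immediately before and after the run."""
    filled = np.asarray(series, dtype=float).copy()
    missing = np.isnan(filled)
    n = filled.size
    i = 0
    while i < n:
        if not missing[i]:
            i += 1
            continue
        j = i
        while j < n and missing[j]:
            j += 1
        # The gap covers indices [i, j); it needs valid neighbours on both sides.
        if i > 0 and j < n and (j - i) <= max_gap:
            filled[i:j] = 0.5 * (filled[i - 1] + filled[j])
        i = j
    return filled

def aggregate_blocks(field, factor):
    """Average a 2-D field over non-overlapping factor x factor blocks,
    ignoring NaNs (assumes both dimensions are divisible by `factor`)."""
    field = np.asarray(field, dtype=float)
    ny, nx = field.shape
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    with warnings.catch_warnings():
        # Blocks with no valid data stay NaN; silence the related warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(blocks, axis=(1, 3))

# Example: a 2-day gap is filled, a 4-day gap is left missing
daily = [1.0, np.nan, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, 5.0]
daily_filled = fill_short_gaps(daily)
coarse = aggregate_blocks([[1.0, np.nan], [3.0, 5.0]], factor=2)
```

Applying `fill_short_gaps` again to each coarse grid cell after aggregation corresponds to the final step of the procedure.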

The number of missing days in the NNsm data after the above processing is shown in Supplementary Fig. 18 (taking May–September 2020 as an example). Very few days are missing in most regions, but many observations are still missing in high mountainous regions, which constrains our estimation of the causality in these areas (Supplementary Fig. 1a). The MSWEP data are also aggregated onto the 0.5° × 0.5° common grids.

Information flow technique

Causality analysis is a core problem of scientific research, as well as an important problem in philosophy, and scientists have been working on it for more than half a century. While the convergent cross-mapping technique has been widely used in complex systems studies, it may incorrectly identify unidirectional causality as bidirectional causality when analyzing strongly coupled time series pairs40,69,70. This limitation calls for particular caution and additional verification when applying this technique to strongly correlated variables, such as soil moisture and precipitation. Since 2014, a series of studies41,71 have shown that causality can be measured by information flow, which can be derived from first principles and need not take the semi-empirical form of traditional methods. This technique, known as Liang–Kleeman information flow, has been successfully used in a growing number of disciplines, such as earth system science72,73, brain neuroscience74, and quantum physics75.

The Liang–Kleeman technique computes the information flow from the time series of the Y2 variable to those of the Y1 variable at each grid as follows:

$${T}_{2\to 1}=\frac{{C}_{11}{C}_{12}{C}_{2,d1}-{C}_{12}^{2}{C}_{1,d1}}{{C}_{11}^{2}{C}_{22}-{C}_{11}{C}_{12}^{2}}$$
(1)

where \({C}_{11}\), \({C}_{12}\), \({C}_{2,d1}\), \({C}_{1,d1}\) and \({C}_{22}\) are the covariances among Y1, Y2 and d1; d1 is an Euler forward differenced series derived from Y1, which can be expressed as \(({Y}_{1,i+1}-{Y}_{1,i})/\Delta t\), and Δt is the time step size. If \({T}_{2\to 1}\) equals zero, then Y2 is not a cause of Y1; otherwise, it is. The absolute value of \({T}_{2\to 1}\) indicates the strength of the Y2→Y1 causality. Here, the unit of information flow is nats day−1 (nat: natural unit of information). We use this method at the grid level, meaning that our results explore the local influence of soil moisture on precipitation. A significance test is required for practical application. In this study, given the large sample size N (153 days year−1 × 43 years for ERA5 and 153 days year−1 × 18 years for NNsm and MSWEP), \({T}_{2\to 1}\) approximately follows a Gaussian distribution around its true value with a variance \({\left(\frac{{C}_{12}}{{C}_{11}}\right)}^{2}{\hat{\sigma }}_{{a}_{12}}^{2}\), which is a property of maximum likelihood estimation. \({\hat{\sigma }}_{{a}_{12}}^{2}\) can be determined by computing the covariance matrix (\({\left(N{{{\bf{I}}}}\right)}^{-1}\)) of \({{{\boldsymbol{\theta }}}}=\left({a}_{11},{a}_{12};{b}_{1}\right)\), where I is the Fisher information matrix: \({I}_{{ij}}=-\frac{1}{N}{\sum }_{n=1}^{N}\frac{{\partial }^{2}\log \rho \left({{{{\bf{Y}}}}}_{n+1}|{{{{\bf{Y}}}}}_{n}{;}\hat{{{{\boldsymbol{\theta }}}}}\right)}{\partial {\theta }_{i}\partial {\theta }_{j}}\). Thus, the significance test here can be simplified with the Fisher information matrix, and the multi-year average information flows are taken as the true values.
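Equation (1) can be evaluated directly from sample covariances. The sketch below is a minimal illustration for a single pair of time series and omits the significance test; the function name `information_flow` and the synthetic test system (in which Y2 drives Y1 but not the reverse) are assumptions introduced here for demonstration only.

```python
import numpy as np

def information_flow(y1, y2, dt=1.0):
    """Estimate T_{2->1} from Eq. (1), in nats per unit of `dt`."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    # Euler forward difference of Y1; it has one sample fewer than Y1.
    d1 = (y1[1:] - y1[:-1]) / dt
    # Sample covariances among Y1, Y2 and d1 on the common time steps.
    c = np.cov(np.vstack([y1[:-1], y2[:-1], d1]))
    c11, c12, c22 = c[0, 0], c[0, 1], c[1, 1]
    c1d1, c2d1 = c[0, 2], c[1, 2]
    numerator = c11 * c12 * c2d1 - c12**2 * c1d1
    denominator = c11**2 * c22 - c11 * c12**2
    return numerator / denominator

# Synthetic one-way coupled system (hypothetical): Y2 forces Y1 only.
rng = np.random.default_rng(0)
n = 20000
y1 = np.zeros(n)
y2 = np.zeros(n)
for k in range(n - 1):
    y2[k + 1] = 0.6 * y2[k] + rng.standard_normal()
    y1[k + 1] = 0.5 * y1[k] + 0.5 * y2[k] + rng.standard_normal()

t21 = information_flow(y1, y2)  # flow from Y2 to Y1, expected clearly nonzero
t12 = information_flow(y2, y1)  # flow from Y1 to Y2, expected close to zero
```

In a gridded application, the same function would be called once per grid cell, with Y1 as precipitation and Y2 as surface soil moisture.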

This study focuses on boreal warm seasons due to the limited land areas of strong SSM–P coupling identified during boreal cold seasons (Supplementary Fig. 19). It should be emphasized that our results represent climatological causal effects based on calculations performed for each year. This approach is justified because we find that despite interannual variations, regions with significant SSM→P information flows consistently exhibit strong coupling throughout most of the analysis period, rather than only during occasional extreme years (Supplementary Fig. 20). Moreover, based on a month-by-month analysis, which is conducted by applying the information flow technique to data from the same calendar month across years, the SSM–P causality identified in our study has been shown to be minimally affected by seasonality (Supplementary Fig. 21). We also apply the information flow technique to SSM and subsequent precipitation to quantitatively analyze the temporal scale of SSM→P causality.

Timescales of the detected SSM–P causality

If the feedbacks occurred over long timescales, the causal effect of deep soil moisture on precipitation should be stronger than that of shallow soil moisture. By exploring how the causality varies with soil thickness using ERA5 reanalysis, we find that the causal effects gradually weaken with increasing soil thickness, but the eight hotspots display different degrees of sensitivity to changes in surface soil thickness (Supplementary Fig. 22). In particular, the SSM→P causal effect over the WTP decays most rapidly, completely disappearing at a soil depth of 0–100 cm, mainly due to sparse vegetation and the predominance of soil evaporation in this region76. Thus, we propose that the causality detected above is linked to short-term SSM–P feedbacks.

To quantitatively define the timescale over which SSM affects subsequent precipitation, we analyze its lag causality. Specifically, we use 1/e of the proportion of grid cells with significant 0-day-lag SSM→P causality as the criterion for determining the timescale for individual hotspots. When the significant proportion of n-day-lag SSM→P causality falls below this criterion and that of (n−1)-day-lag causality exceeds it, the strong SSM–P coupling is considered to disappear at the n-day lag, indicating that the SSM effect on precipitation can last up to n days. Supplementary Fig. 3 shows the proportion of significant SSM→P causality for different lag times. In TA, WTP and NI, the proportions decrease sharply as lag time increases, with SSM affecting precipitation up to 15, 10 and 12 days later at the 10% significance level, respectively (Supplementary Fig. 3b, f, h). In the other hotspots (GHA, PCM, NB, IP and SR), the timescales of the detected SSM→P causality are more than 16 or even 32 days (Supplementary Fig. 3a, g, c–e). At a stricter significance level (p < 0.05), the timescales become shorter (Supplementary Fig. 4). Nonetheless, the influence of SSM on precipitation over these hotspots is consistently evident at weather to sub-seasonal timescales, regardless of the significance level (p < 0.1 or 0.05). Of these eight hotspots, NI and WTP have the shortest timescales, consistent with the earlier result that their causality decays fastest with soil thickness (Supplementary Fig. 22).
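The 1/e criterion described above can be sketched as a short function. This is an illustrative reading of the criterion, assuming the input is a sequence of significant-proportion values indexed by lag in days (index 0 being the 0-day lag); the function name `coupling_timescale` and the example values are hypothetical and are not taken from the supplementary figures.

```python
import math

def coupling_timescale(proportions):
    """Return the first lag n (days) at which the proportion of grid cells
    with significant causality falls below 1/e of its 0-day-lag value.

    Returns None if the proportion never falls below the criterion within
    the lags provided.
    """
    threshold = proportions[0] / math.e
    for n in range(1, len(proportions)):
        # At the first crossing, the (n-1)-day-lag value is by construction
        # still at or above the criterion.
        if proportions[n] < threshold:
            return n
    return None

# Hypothetical proportions for lags of 0 to 4 days
example = [0.6, 0.5, 0.3, 0.2, 0.1]
timescale = coupling_timescale(example)
```

A return value of None corresponds to hotspots whose timescale exceeds the longest lag examined.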

Land area fraction

Because the data are provided or processed on a regular longitude–latitude grid, grid cells do not have equal areas. When calculating the significant land area fraction, we therefore account for the variation of grid cell area with latitude.
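One common way to account for this is to weight each grid cell by the cosine of its central latitude, which is proportional to cell area on a regular grid to a good approximation. The sketch below assumes this weighting (the source does not specify the exact formula), and the function name `significant_land_fraction` and the toy inputs are hypothetical.

```python
import numpy as np

def significant_land_fraction(significant, land, lats_deg):
    """Area-weighted fraction of land with significant causality.

    significant, land: boolean arrays of shape (n_lat, n_lon)
    lats_deg: grid cell central latitudes in degrees, shape (n_lat,)
    Assumption: cell area is taken as proportional to cos(latitude).
    """
    weights = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))[:, None]
    land = np.asarray(land, dtype=bool)
    # Only land cells can count as significant land area.
    significant = np.asarray(significant, dtype=bool) & land
    land_area = np.sum(weights * land)
    return float(np.sum(weights * significant) / land_area)

# Toy example: two land cells, only the one at 60 degrees is significant
fraction = significant_land_fraction(
    significant=[[False], [True]],
    land=[[True], [True]],
    lats_deg=[0.0, 60.0],
)
```

In this toy case the unweighted fraction would be one half, whereas the weighted value is smaller because the cell at 60° covers about half the area of the equatorial cell.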

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.