Background & Summary

Soil moisture (SM) is a critical component of the global hydrological cycle and a key climate variable influencing water, carbon, and energy fluxes at the land-atmosphere interface1,2,3,4,5. It influences hydrological processes such as runoff, infiltration, and evapotranspiration, with broad applications in weather forecasting4,6, drought monitoring7,8,9,10, flood prediction, and agricultural management11. SM is typically divided into surface soil moisture (SSM) and root zone soil moisture (RZSM), with RZSM being particularly critical because it regulates plant transpiration, nutrient uptake12, and drought resilience, and plays a vital role in climate feedbacks, groundwater recharge13, and ecosystem stability14. Accurate and continuous SM estimation, especially across soil depths, is essential for understanding terrestrial water dynamics and mitigating climate-related risks.

Despite its significance, obtaining high-quality SM data with adequate spatial and temporal resolution remains a challenge15,16. In-situ networks17,18,19,20 offer high-accuracy SM observations and insight into vertical profiles but are limited by sparse spatial coverage due to logistical and financial constraints. Satellite-based missions (e.g., SMOS21, SMAP22) provide global coverage but are restricted to the top ~5 cm of soil23 and often perform poorly in densely vegetated24,25, topographically complex26, frozen27, or snow-covered28,29 environments, resulting in data gaps. Alternatively, physics-based models such as Land Surface Models (LSMs) and Earth System Models (ESMs)30,31,32 provide multilayer SM estimates, but rely on parameterizations that introduce uncertainties due to incomplete physical representations and meteorological forcing errors2,33,34, especially for RZSM35.

Machine learning (ML) approaches have recently emerged as powerful alternatives, enabling data-driven SM estimation by leveraging large-scale environmental data. Several studies have pioneered the application of ML to SM estimation, especially for RZSM, and introduced a number of datasets36,37,38,39,40,41. NNsm37 provides global SSM at 36-km resolution (daily, 2002–2019) using Artificial Neural Networks (ANN); SoMo.ml38 offers global SM at three soil layers (0–10, 10–30 and 30–50 cm) at 0.25° spatial resolution (daily, 2000–2019) based on a Long Short-Term Memory (LSTM) neural network; SoMo.ml-EU40 extends SoMo.ml to 0.1° resolution over Europe; and SMCI41 delivers multilayer SM (0–100 cm) at 1-km resolution over China using Random Forest42 (RF). However, these ML-based datasets are often restricted by coarse spatial resolution, a lack of multi-depth information, or limited validation across climatic regimes.

To address these limitations, we introduce SMRFR (Soil Moisture via Random Forest Regression), a long-term, global, daily, multilayer SM dataset generated using a novel ML-based framework (Fig. 1). Our approach combines quality-controlled in-situ SM data from the International Soil Moisture Network (ISMN20) with multi-source predictors from the ERA5-Land reanalysis32 and remote sensing products (e.g., MODIS vegetation indices, soil properties, and topographic features). To ensure robust learning and generalizability, we employed optimized RF models and applied the Extended Triple Collocation43 (ETC) method to select high-quality stations for training.

Fig. 1

Schematic workflow for generating the SMRFR product, including data pre-processing, model training, and output production.

SMRFR provides globally consistent daily SM estimates at five depth layers (0–5, 5–10, 10–30, 30–50, and 50–100 cm), from 2000 to 2023, with a spatial resolution of 9 km (see Table 1). Compared to existing satellite-based or model-derived products, SMRFR overcomes key limitations by: (i) offering multilayer SM profiles beyond surface-only estimates, (ii) improving spatial resolution relative to typical ML datasets, (iii) utilizing strict data quality control and harmonized multi-source inputs, and (iv) enabling potential transferability to finer resolutions (e.g., 1 km) and regional applications. SMRFR bridges gaps between existing methods and datasets, providing a scientific foundation for improved SM modelling, climate impact research, and water resource management.

Methods

In-situ SM observations

In-situ SM data measured at ISMN stations were used as the target data. All SM time series were resampled to a daily temporal resolution to harmonize differences in sampling frequency across sensors. Following established quality control guidelines44, measurements flagged as unreliable were removed. In addition, sensors with insufficient records (e.g., fewer than 200 days of valid data) were excluded, a threshold intended to accommodate interannual SM variability while retaining enough effective stations.
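The station-level screening described above can be sketched in pandas. This is a minimal illustration, not the authors' code: it assumes each sensor record is a DataFrame with a timestamp index, a volumetric SM column named `sm` and an ISMN quality-flag column named `flag`, and it treats only the ISMN flag `G` ("good") as reliable. The column names and the helper `screen_sensor` are hypothetical; only the 200-day threshold comes from the text.

```python
from typing import Optional

import pandas as pd

MIN_VALID_DAYS = 200  # minimum number of valid daily values, as stated in the text


def screen_sensor(obs: pd.DataFrame, good_flag: str = "G") -> Optional[pd.Series]:
    """Return a daily SM series for one sensor, or None if the record is too short.

    Assumes (hypothetical layout) a DatetimeIndex, an 'sm' column in m3/m3
    and a 'flag' column holding ISMN quality flags.
    """
    # Keep only measurements whose quality flag marks them as good
    good = obs.loc[obs["flag"] == good_flag, "sm"]
    # Harmonize sensors with different sampling intervals to daily means
    daily = good.resample("D").mean().dropna()
    # Exclude sensors with fewer valid days than the threshold
    if len(daily) < MIN_VALID_DAYS:
        return None
    return daily
```

A record passed through `screen_sensor` either comes back as a gap-free daily series ready to be matched with gridded predictors, or as `None`, in which case the sensor is dropped.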

Another in-situ SM dataset45, obtained from Brazil's National Center for Monitoring and Early Warning of Natural Disasters (CEMADEN), was used to evaluate the capability of SMRFR to transfer knowledge of SM dynamics across regions (e.g., across continents)46; the spatial representativeness of this dataset has been demonstrated47. Outliers were removed and data completeness was checked to ensure dataset integrity for evaluation purposes.

Table 1 Specifications of SMRFR.

Predictors for SM modelling

The predictors employed in the RF models (see Table 2) were selected for their strong relevance to SM dynamics. Dynamic predictors were primarily obtained from ERA5-Land32, which circumvents the inconsistencies in spatial scale48 and temporal coverage38 among multiple remote sensing observations. Furthermore, the timely updates of ERA5-Land facilitate the continuous generation of SMRFR, and its fine spatial (9-km) and temporal (hourly) resolutions make it particularly suitable for capturing short- and long-term soil water dynamics. Recognizing that SM dynamics are intricately intertwined with meteorological factors, yet ultimately manifest in SM itself, we incorporated SM as a predictor. This approach encapsulates the information typically reflected by a multitude of meteorological predictors, thereby reducing reliance on auxiliary data in a manner akin to data assimilation. MODIS vegetation indices (e.g., the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI)) were included to capture the role of vegetation in SM retention and evapotranspiration: vegetation mediates the exchange of water between the land and atmosphere, thus playing an essential role in both retaining and depleting soil water.

Table 2 Predictors and target data for SM modelling.

Variations in topography, altitude, and vegetation cover affect solar radiation and hydrological processes like runoff. Additionally, soil heterogeneity, including differences in structure, composition38, and water retention capacity, influences the horizontal distribution and vertical movement of SM49,50,51,52,53,54. Thus, static predictors (e.g. topography, soil texture, bulk density, and field capacity) were incorporated to account for the effects of terrain and soil hydraulic properties on water infiltration and retention.

To ensure consistency across diverse input sources, all predictors were pre-processed to match the SMRFR grid (9 km, daily). ERA5-Land variables were averaged to daily means, and vegetation indices were linearly interpolated to daily frequency. Soil property data (e.g., sand, clay, bulk density) were aggregated from 250 m to 9 km using spatial means. All predictors were then mapped to the target soil depths, projected, and clipped to a unified global land mask.
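The temporal and spatial harmonization steps can be illustrated with a minimal sketch. It assumes pandas Series with a DatetimeIndex for the time series and a plain NumPy array for gridded soil data; the helper names are hypothetical. `block_mean` assumes the fine grid divides evenly into coarse cells (250 m to roughly 9 km corresponds to a factor of about 36), whereas real regridding between different projections would normally rely on a dedicated tool.

```python
import warnings

import numpy as np
import pandas as pd


def hourly_to_daily(hourly: pd.Series) -> pd.Series:
    """Average an hourly series (DatetimeIndex) to daily means."""
    return hourly.resample("D").mean()


def composites_to_daily(composites: pd.Series) -> pd.Series:
    """Linearly interpolate sparse (e.g. 16-day) vegetation-index composites to daily values."""
    daily_index = pd.date_range(composites.index.min(), composites.index.max(), freq="D")
    return composites.reindex(daily_index).interpolate(method="time")


def block_mean(fine: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate a 2-D fine grid to a coarse grid by averaging factor x factor blocks.

    NaNs (e.g. water or missing pixels) are ignored; all-NaN blocks stay NaN.
    """
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("grid shape must be a multiple of the aggregation factor")
    # Element (a*factor + b, c*factor + d) maps to block index (a, b, c, d)
    blocks = fine.reshape(rows // factor, factor, cols // factor, factor)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # silence all-NaN block warnings
        return np.nanmean(blocks, axis=(1, 3))
```

Interpolating with `method="time"` weights by elapsed time, so unevenly spaced composites are handled correctly.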

Model training and application

ML performance improves with the accuracy of input data55,56, and the ETC approach57,58 has been validated as an effective tool for improving estimation accuracy by controlling the quality of the training data59,60,61. ETC rests on the assumptions of (i) orthogonality between product errors and the true signal, (ii) mutual independence of the errors of the three datasets, and (iii) a linear relationship between each product and the true SM. In this study, we applied the ETC method to three independent sources, namely in-situ observations, land surface model outputs, and remote sensing products (see Table S1 in the Supplementary Information), to evaluate how consistently each relates to the unknown, assumed “truth”. Based on the coefficient of determination (R² = 0.762), we selected high-quality stations for training. A total of 433 stations were retained as the final representative subset (Fig. 2). This selective strategy ensures that both model training and validation are grounded in the most reliable and representative SM data available.
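As an illustration of how ETC scores a single station, the sketch below follows the covariance-based ETC formulation43: for three collocated series X, Y and Z with covariances σ, the squared correlation of X with the unknown truth is estimated as ρ²_X = σ_XY σ_XZ / (σ_X² σ_YZ), and analogously for Y and Z. The helper names are hypothetical, and treating 0.762 as a per-station cut-off on the in-situ R² is an assumption for illustration only; the text does not state exactly how that value was applied.

```python
import numpy as np


def etc_r2(x: np.ndarray, y: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Extended Triple Collocation: squared correlation of each series with the unknown truth.

    x, y, z are collocated 1-D series of equal length without gaps.
    Returns [R2_x, R2_y, R2_z].
    """
    q = np.cov(np.vstack([x, y, z]))  # 3 x 3 sample covariance matrix
    r2_x = q[0, 1] * q[0, 2] / (q[0, 0] * q[1, 2])
    r2_y = q[0, 1] * q[1, 2] / (q[1, 1] * q[0, 2])
    r2_z = q[0, 2] * q[1, 2] / (q[2, 2] * q[0, 1])
    return np.array([r2_x, r2_y, r2_z])


def keep_station(insitu, model, satellite, threshold: float = 0.762) -> bool:
    """Hypothetical screening rule: keep a station if the ETC R2 of its in-situ series passes the threshold."""
    insitu, model, satellite = (np.asarray(a, dtype=float) for a in (insitu, model, satellite))
    # Use only days on which all three sources are available
    ok = np.isfinite(insitu) & np.isfinite(model) & np.isfinite(satellite)
    r2_insitu = etc_r2(insitu[ok], model[ok], satellite[ok])[0]
    return bool(r2_insitu >= threshold)
```

Because ETC only needs covariances, it works without knowing the true SM; in exchange, it requires long overlapping records for the covariance estimates to be stable.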

Fig. 2

(a) Spatial distribution of the SM stations, including 434 representative and 1623 failed stations. (b) Valid data length and the number of target SM stations per soil layer from 2000 to 2023.

We initially evaluated multiple ML algorithms, including Support Vector Regression (SVR) and K-Nearest Neighbors (KNN) alongside RF. Among them, RF showed the best overall performance in terms of root mean square error (RMSE) and correlation coefficient (see Table S2), particularly in geographically heterogeneous regions. Because the quality and representativeness of the training data are critical to model performance63, we restricted the training set to the subset of carefully selected stations. Furthermore, SM estimated for the overlying layers was incorporated as an input variable to enhance predictive capability for deeper soil layers, a strategy validated in previous work38. A grid search with five-fold cross-validation was conducted to optimize the RF hyperparameters, identifying the configuration that maximized model accuracy. For each soil layer, the model was trained and validated on the corresponding curated dataset to ensure robustness and representativeness. The main hyperparameters of the final configuration were n_estimators = 1100 and max_depth = 560.
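The training strategy (a cross-validated grid search plus feeding upper-layer estimates to deeper layers) can be sketched with scikit-learn. This is an illustration under assumptions rather than the authors' pipeline: the text does not name a library, the small grid is chosen only so the example runs quickly (the final configuration reported above is far larger), folds are split by sample rather than by station, and using out-of-fold upper-layer estimates during training is our own choice to avoid leakage. `GroupKFold` with station IDs would be a natural alternative for station-wise splits.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict


def tune_rf(X, y, param_grid, seed=0):
    """Five-fold cross-validated grid search over RF hyperparameters (minimizing RMSE)."""
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid=param_grid,
        cv=KFold(n_splits=5, shuffle=True, random_state=seed),
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X, y)
    return search.best_estimator_  # refit on all samples with the best parameters


def fit_layer_chain(X, Y, param_grid, seed=0):
    """Fit one RF per layer (columns of Y, ordered top to bottom).

    Each deeper layer receives the estimate for the layer above as an extra
    predictor; during training that estimate is out-of-fold to limit leakage.
    """
    models, features = [], X
    for layer in range(Y.shape[1]):
        model = tune_rf(features, Y[:, layer], param_grid, seed)
        upper_estimate = cross_val_predict(
            model, features, Y[:, layer],
            cv=KFold(n_splits=5, shuffle=True, random_state=seed),
        )
        models.append(model)
        features = np.column_stack([features, upper_estimate])
    return models


def predict_layer_chain(models, X):
    """Predict all layers top-down, feeding each layer's estimate to the next one."""
    features, estimates = X, []
    for model in models:
        estimate = model.predict(features)
        estimates.append(estimate)
        features = np.column_stack([features, estimate])
    return np.column_stack(estimates)
```

At prediction time, each deeper model sees the RF estimate of the layer above, mirroring how the chain would be applied on a grid where no observations exist.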

Importance of predictors

The contributions of predictors to SM modelling were assessed using the Mean Decrease in Accuracy (MDA) metric, which quantifies the decline in model performance when a predictor’s values are randomly permuted. To facilitate a systematic evaluation, we categorized predictors into groups by type, such as static attributes (e.g., topography, soil properties) and vegetation indices (VI; e.g., NDVI and EVI).
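Grouped permutation importance of this kind can be sketched as follows. It is a minimal illustration under stated assumptions: accuracy is measured as R² (the text does not say which score underlies MDA), all columns of a group are permuted together, and the toy data, column meanings and function names are hypothetical. scikit-learn's `permutation_importance` permutes one feature at a time, so a group-wise variant needs a small custom loop like this one.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def grouped_mda(model, X, y, groups, n_repeats=10, seed=0):
    """Mean Decrease in Accuracy (here: drop in R2) when a group of columns is permuted jointly.

    `groups` maps a group name to the list of column indices belonging to it.
    """
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, model.predict(X))
    importance = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle the rows of the whole group together, keeping within-group structure
            X_perm[:, cols] = X[rng.permutation(len(X))][:, cols]
            drops.append(baseline - r2_score(y, model.predict(X_perm)))
        importance[name] = float(np.mean(drops))
    return importance


def demo():
    """Toy data in which the (synthetic) upper-layer SM column dominates the target."""
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(400, 4))  # columns: upper-layer SM, clay, NDVI, EVI (all synthetic)
    y = 0.6 * X[:, 0] + 0.15 * X[:, 1] + 0.01 * rng.normal(size=400)
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    return grouped_mda(model, X, y, {"SM": [0], "static": [1], "VI": [2, 3]})
```

Permuting correlated predictors (such as NDVI and EVI) together avoids underestimating a group whose members can stand in for one another.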

As depicted in Fig. 3, the dominant role of SM from upper layers in predicting deeper-layer SM highlights the importance of vertical water transfer and moisture memory effects, especially in lower layers where atmospheric influence is reduced. This vertical dependency enables the model to capture the lagged infiltration processes and persistent storage effects that are key to RZSM dynamics.

Fig. 3

Relative importance of predictors in SM modelling. Static predictors are grouped under “static”, and vegetation indices under “VI”.

Among non-SM predictors, soil properties (e.g., clay and sand content, field capacity) exert more influence on spatial variability than on temporal fluctuations64. These properties govern the infiltration rate, water retention capacity, and hydraulic conductivity of soils65,66, especially under contrasting soil types (e.g., sandy vs. clayey regions). In regions with limited vegetation or low rainfall variability, soil properties can dominate SM behavior. VI also play a crucial role in surface and near-surface layers, as vegetation influences SM through both direct mechanisms (e.g., interception, transpiration, root water uptake) and indirect effects (e.g., seasonal phenology, regulation of the surface energy balance), all of which strongly affect short-term SM dynamics67,68. The contribution of VI to model accuracy decreases with depth, consistent with the diminishing role of vegetation processes below the rooting zone. In contrast, soil temperature made only a marginal contribution, likely because its influence is already implicitly captured by other variables such as SM and VI.

Data Records

The SMRFR dataset is available at figshare69. The compressed (.zip) files contain data in Zarr format for the five layers. An example file name is “SMRFR_<YYYY>_v1.0.zarr”, where YYYY denotes the year.
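A minimal access sketch follows. The internal layout of the archives (for example, whether each archive holds one layer or one year) and the variable and dimension names inside the stores are not documented here, so the helper names and paths are hypothetical; only the file-name pattern comes from the text. Opening the stores requires the `xarray` and `zarr` packages.

```python
import zipfile
from pathlib import Path


def smrfr_store_name(year: int, version: str = "1.0") -> str:
    """Build a store name following the documented pattern SMRFR_<YYYY>_v1.0.zarr."""
    return f"SMRFR_{year:04d}_v{version}.zarr"


def extract_archive(zip_path, out_dir) -> Path:
    """Unpack one downloaded .zip archive into out_dir and return that directory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)
    return out


def open_year(root, year: int):
    """Open one yearly Zarr store lazily (requires xarray and zarr)."""
    import xarray as xr  # imported here so the helpers above work without xarray

    return xr.open_zarr(Path(root) / smrfr_store_name(year))
```

For example, `ds = open_year("SMRFR_data", 2020)` followed by `print(ds.data_vars)` would reveal the actual variable names; the directory name here is only a placeholder.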

Technical Validation

We evaluated the suitability and potential of ML-based models for estimating SM, concentrating on three key aspects. First, we examined the modelling performance during the training phase. Second, we evaluated the temporal dynamics and spatial patterns of SMRFR. Finally, we assessed the capability of SMRFR in knowledge transfer scenarios. The ability of these models to transfer knowledge across regions is vital for producing higher-quality data in domains where observations are scarce.

Evaluation on SMRFR and its modelling

Validation of SM Modelling

As shown in Fig. 4a, SM estimates exhibit a strong correlation with in-situ measurements. Model performance improves with increasing soil depth, as indicated by higher correlation coefficients (ranging from 0.947 to 0.982) and lower unbiased RMSE (ubRMSE, decreasing from 0.035 to 0.022 m³/m³). This trend likely reflects the greater temporal stability of deeper soil layers, which are less affected by short-term meteorological variations and surface interactions, leading to more predictable moisture patterns and reduced model uncertainty. The frequency distributions in Fig. 4b further demonstrate this consistency, showing close alignment between estimated and observed SM values. A slight overestimation is observed around medium SM levels (0.2–0.4 m³/m³), which may be influenced by regional variations in soil hydraulic properties and vegetation cover. Figure 4c highlights the model’s robustness under diverse climatic conditions, with estimated SM values closely matching in-situ data across both arid and humid regions.
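The evaluation metrics used here and in the following sections can be computed as below. This sketch assumes the conventional definitions (Pearson correlation; bias as mean of estimate minus observation; ubRMSE = sqrt(RMSE² − bias²)), since the text does not spell out formulas; the function name is hypothetical.

```python
import numpy as np


def sm_metrics(estimated, observed) -> dict:
    """Standard SM evaluation metrics (m3/m3) on paired samples; NaN pairs are dropped."""
    est = np.asarray(estimated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    valid = np.isfinite(est) & np.isfinite(obs)
    est, obs = est[valid], obs[valid]
    diff = est - obs
    bias = diff.mean()
    return {
        "R": float(np.corrcoef(est, obs)[0, 1]),
        "bias": float(bias),
        "RMSE": float(np.sqrt(np.mean(diff**2))),
        # ubRMSE = sqrt(RMSE^2 - bias^2), computed stably as the RMS of bias-removed differences
        "ubRMSE": float(np.sqrt(np.mean((diff - bias) ** 2))),
        "MAE": float(np.mean(np.abs(diff))),
    }
```

ubRMSE removes the constant offset, so a product with a pure wet bias still scores a near-zero ubRMSE while its bias and MAE expose the offset.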

Fig. 4

Comparison between SMRFR (green) and in-situ SM (blue) in the validation set across five layers. (a) Scatter plots, (b) frequency distributions, and (c) violin plots across different climatic zones.

To further assess model robustness under diverse climate regimes, we evaluated SMRFR performance across five Köppen-Geiger climate70 zones using validation stations withheld from model training (see Fig. 5 and Fig. S1). The results show that the model performs best in temperate and continental climates, with the highest correlations and lowest errors. Tropical and polar regions performed slightly worse, with higher variability and errors, which may be due to complex vegetation dynamics, snow-related processes and fewer ground-truth observations.
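Summarizing per-station metrics by main climate group, as in Fig. 5, can be sketched with a pandas `groupby`. The sketch assumes a table with one row per validation station, a `kgc` column holding the full Köppen-Geiger code (e.g. `Cfa`) and one column per metric; this layout, the use of medians and the function name are assumptions. The five main groups are keyed by the first letter of the code.

```python
import pandas as pd

# Main Köppen-Geiger groups, keyed by the first letter of the class code
KG_MAIN = {"A": "Tropical", "B": "Arid", "C": "Temperate", "D": "Continental", "E": "Polar"}


def summarize_by_climate(stations: pd.DataFrame, metrics=("R", "bias", "ubRMSE", "MAE")) -> pd.DataFrame:
    """Median of per-station metrics and number of stations per main Köppen-Geiger group."""
    main_group = stations["kgc"].str[0].map(KG_MAIN)
    summary = stations.groupby(main_group)[list(metrics)].median()
    summary["n_stations"] = stations.groupby(main_group).size()
    return summary
```

Reporting the station count next to each median makes it visible when a group, such as polar stations, rests on few samples.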

Fig. 5

Evaluation of SMRFR SSM (0–5 cm) performance across five Köppen-Geiger climate types using validation stations excluded from model training. (a) Correlation coefficient, (b) Bias, (c) ubRMSE, and (d) MAE are shown as violin plots overlaid with boxplots. Sample size is indicated above the plots in (a).

These climate-based differences highlight both the generalizability and the limits of SMRFR and emphasize the importance of region-specific evaluation in global-scale modelling. Benefiting from extensive data collection and rigorous quality control, our training data encompass a wide range of climatic conditions, enabling strong generalization even in complex hydrological environments. In summary, the ML models effectively capture SM dynamics and can accurately estimate SM at unseen locations.

Temporal dynamics of SMRFR

We compared in-situ SM, estimated SM, and local precipitation to investigate the temporal dynamics of SMRFR at individual stations (see Fig. 6 and Figs. S2–S4). The dynamics of SMRFR aligned closely with in-situ SM, particularly in the upper soil layers, as shown by well-aligned scatter patterns. During the dry season with minimal precipitation, both SMRFR and in-situ SM showed low, stable moisture levels, suggesting that the model effectively captures seasonal depletion and is sensitive to rainfall dynamics.

Fig. 6

Time series and scatter plots of in-situ SM and SMRFR at different depths for a representative station. Each plot includes daily precipitation. KGC stands for Köppen-Geiger climate type; LC stands for land cover.

A wet bias was observed in SMRFR relative to in-situ SM, intensifying with soil depth and as the dry season progressed. This may stem from the high hydraulic conductivity71 of the local soil (e.g., sandy clay loam at station Yosemite-Village-12-W), whose large pore spaces give high infiltration rates but low water retention capacity, leading to fast dry-down in the dry season. This highlights a shortcoming of ML models, which have a limited ability to learn soil-specific hydraulic properties.

Following a prolonged dry period, a moderate rainfall in early September triggered rapid wetting in shallow layers, while deeper layers (>30 cm) remained unaffected. This reflects increased water absorption capacity in desiccated soils. In contrast, during the wet season, elevated SM levels allowed infiltration to reach deeper layers (e.g., 50–100 cm, as seen in April). These examples highlight the model’s ability to empirically capture physically plausible moisture dynamics through data-driven learning.

However, SMRFR showed a muted response to intense rainfall compared to in-situ SM, likely due to the inherent averaging effect of RF, whose output is the arithmetic mean of numerous decision trees and thus a consensus result devoid of extreme values. While this reduces variance and noise, it limits the model’s capability to replicate sharp infiltration and runoff responses. A hybrid ML-physical modelling approach (e.g., integration with hydrological models) might enhance the physical realism of SM dynamics, especially for infiltration and runoff processes during extreme rainfall events.
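The averaging effect described above is a generic property of RF regression and can be checked on toy data. In scikit-learn, `RandomForestRegressor.predict` returns the mean of the individual trees' predictions, and each leaf stores a mean of training targets, so no prediction can leave the range of the training targets, even for inputs far outside the training domain. The data below are synthetic and unrelated to SMRFR; the function name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_averaging_demo(seed: int = 0) -> dict:
    """Show that an RF regressor predicts the mean of its trees and cannot exceed the training range."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(500, 2))   # toy predictors, e.g. scaled rainfall and upper-layer SM
    y = 0.05 + 0.35 * X[:, 0]        # toy SM target within [0.05, 0.40]
    rf = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X, y)

    X_new = rng.uniform(size=(200, 2))
    X_new[:, 0] = 3.0                # an "extreme" input far outside the training range
    forest = rf.predict(X_new)
    tree_mean = np.mean([tree.predict(X_new) for tree in rf.estimators_], axis=0)
    return {"forest": forest, "tree_mean": tree_mean, "y_max": y.max()}
```

A linear model would extrapolate the toy relation to about 1.1 for the extreme input, whereas the forest saturates near the largest SM value it has seen, which is the same mechanism that dampens SMRFR's response to intense rainfall.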

Spatial patterns of SMRFR

We further assessed the spatial patterns of SMRFR and its response to extreme events. For illustration, we analysed localized multi-layer SM maps before and after an extreme rainfall event (details in Fig. 7a). SMRFR provides a comprehensive depiction of SM characteristics within this region, featuring high SM levels in coastal monsoon regions (e.g., the Indian Peninsula and southern China) and drier conditions in interior arid zones. Wet-up patterns correspond to precipitation levels and align with the extreme rainfall event72. During this event, local total precipitation exceeded 700 mm in the Indian Peninsula, south-central China, and the Himalayan region, leading to significant SM increases. SMRFR effectively captures both the spatial continuity and the localized variability of SM changes.

Fig. 7

Multi-layer SM response to an extreme rainfall event (June 25 to July 10, 2020), covering 3°N–55°N and 73°E–135°E. (a) Total precipitation, (b) elevation (DEM), (c) NDVI (July 11), (d–h) SMRFR on June 25, (i–m) SMRFR on July 10, and (n–r) SM differences.

Within the Central and Southern Peninsula, SM profiles vary notably with depth, with northern regions exhibiting lower SM levels in deeper layers. This likely reflects vertical heterogeneity in soil texture and land cover, influenced by elevation gradients. For example, crop-dominated plains may have shallower rooting depths than forested areas to the north, affecting vertical SM redistribution73.

Difference maps (Fig. 7n–r) show significant wet-up in surface layers following the event, while deeper horizons show more limited responses. This attenuation may result from the combined effects of rainfall interception, evaporation, and lateral movement of SM. Notably, along the Himalayas, minimal SM increases were observed despite intense rainfall. This suggests that little rainfall infiltrated the soil, possibly because near-saturated or saturated soils promoted runoff, or because frozen or snow-covered soils inhibited infiltration.

To further evaluate the spatial representativeness of SMRFR, we conducted an in-depth regional analysis across three geographically and climatically distinct areas: the Loess Plateau (China, temperate semi-arid), the Cerrado (Brazil, tropical savanna), and the Central Great Plains (USA, temperate continental), as illustrated in Fig. 8. These regions encompass a wide range of terrain complexity, land cover types, and soil properties, offering a representative testbed for evaluating SMRFR’s ability to capture localized SM variability.

Fig. 8

Evaluation of SMRFR and environmental conditions in three regions: (a) Loess Plateau (China), (b) Cerrado (Brazil), and (c) Central Great Plains (USA). Columns include: (1) SMRFR annual mean SM (2020), (2) CCI annual mean SM (2020), (3) correlation with CCI, (4) RMSE, (5) elevation (DEM), (6) annual mean NDVI (2020), (7) Clay fraction, and (8) Köppen-Geiger climate classification.

In the Central Great Plains, SMRFR closely follows terrain-induced SM gradients, with higher moisture observed in lowland areas and declining levels toward elevated zones. These spatial variations are consistent with local patterns in clay content and vegetation density, reinforcing SMRFR’s sensitivity to surface and subsurface hydrological controls. In contrast, SMRFR performance deteriorates in more topographically heterogeneous environments, such as the southern Loess Plateau and the southern Cerrado Plateau, where rapid changes in elevation and sparse vegetation cover introduce higher uncertainty. This is reflected in both reduced correlation and elevated RMSE. These examples collectively demonstrate SMRFR’s strengths in relatively homogeneous landscapes and highlight areas where further refinement or targeted calibration could improve accuracy in complex terrains.
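Maps of temporal correlation and RMSE against a reference product, as in columns 3 and 4 of Fig. 8, can be computed cell by cell. The NumPy sketch below assumes both products are already regridded to a common (time, lat, lon) grid; the minimum of 30 common observations per cell is an arbitrary illustrative choice, not a value from the text, and the function name is hypothetical.

```python
import numpy as np


def pixelwise_r_rmse(a: np.ndarray, b: np.ndarray, min_obs: int = 30):
    """Per-pixel Pearson r and RMSE between two (time, lat, lon) arrays on the same grid.

    Only time steps valid in both arrays are used; pixels with fewer than
    `min_obs` common observations are set to NaN.
    """
    ok = np.isfinite(a) & np.isfinite(b)
    n = ok.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        # Means over the common valid time steps only
        mean_a = np.where(ok, a, 0.0).sum(axis=0) / n
        mean_b = np.where(ok, b, 0.0).sum(axis=0) / n
        da = np.where(ok, a - mean_a, 0.0)
        db = np.where(ok, b - mean_b, 0.0)
        r = (da * db).sum(axis=0) / np.sqrt((da**2).sum(axis=0) * (db**2).sum(axis=0))
        rmse = np.sqrt(np.where(ok, (a - b) ** 2, 0.0).sum(axis=0) / n)
    too_few = n < min_obs
    r[too_few] = np.nan
    rmse[too_few] = np.nan
    return r, rmse
```

Restricting both statistics to the same set of common time steps keeps the correlation and RMSE maps comparable with each other.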

Comparison with global SM datasets

In this section, we examined the spatial patterns of SMRFR and existing datasets at the global scale. Long-term mean global SSM and RZSM over the entire period (2000–2023) are presented in Fig. 9 and Fig. S5. Overall, SMRFR exhibits spatial distributions consistent with the reference datasets, capturing expected SM gradients driven by topography and climate, characterized by (i) wetter conditions in tropical and monsoon regions and (ii) drier conditions in inland and highland regions.

Fig. 9

Comparison of SMRFR with long-term SSM means from ERA5-Land, GLEAM, and ESA CCI. Individual dataset maps are shown along the diagonal, difference maps (row minus column) above the diagonal, and frequency distribution comparisons below the diagonal. Only grid cells where all datasets are available are used.
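The masking and differencing conventions of this figure can be sketched as follows. The sketch assumes the long-term mean maps are already on a common grid and passed as a dictionary in display order; the function name and dictionary layout are hypothetical.

```python
from typing import Dict, Tuple

import numpy as np


def common_mask_differences(means: Dict[str, np.ndarray]):
    """Mask long-term mean maps to cells valid in every dataset and form row-minus-column differences.

    Returns the masked maps and a dict {(row, col): row - col} for the pairs
    above the diagonal, with datasets ordered as in `means`.
    """
    names = list(means)
    # Keep only grid cells where every dataset has a value
    common = np.logical_and.reduce([np.isfinite(m) for m in means.values()])
    masked = {k: np.where(common, v, np.nan) for k, v in means.items()}
    diffs: Dict[Tuple[str, str], np.ndarray] = {
        (names[i], names[j]): masked[names[i]] - masked[names[j]]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    return masked, diffs
```

Applying one shared mask before differencing ensures that the frequency distributions and difference maps compare the same grid cells across all products.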

Notably, SMRFR unveils distinctive regional traits, especially in RZSM maps, where higher SM levels are observed in some arid regions (e.g., the Sahara Desert and the Middle East) compared to other datasets. These disparities may originate from multiple factors, including differences in input data, processing methodologies and fundamental distinctions in the operational mechanisms between ML and LSMs in simulating SM dynamics. This highlights the necessity for intensive evaluation and inter-comparison studies to better comprehend the underlying causes of these variations. The differences between frequency distributions may reflect the ML algorithm’s heightened sensitivity to distinct features embedded within the training data. Continued efforts are needed to enhance the accuracy and robustness of SM estimations.

Capability of ML models to transfer knowledge across continents

In regions with limited or no in-situ observations, SM estimates rely on SM behavior learned from other locations. It is therefore essential to validate the capability of SMRFR to transfer knowledge across regions (e.g., across continents).

To this end, the independent SM dataset45 was employed as a reference, alongside other SM products. Notably, none of the stations from this dataset were used in SM modelling. Additionally, none of the representative stations are located in eastern Brazil (see Fig. 2), where the stations of this dataset are situated. Thus, SMRFR estimates in this region rely purely on knowledge learned from external training data. We focus on (i) the responsiveness of SMRFR to local SM dynamics and rainfall events and (ii) the consistency and differences between SMRFR and other datasets.

As depicted in Fig. 10, all time series exhibit strong consistency in temporal SM dynamics, with SM peaks coinciding with heavy rainfall in the rainy season and lower SM levels in dry periods. The close co-variation of SMRFR with in-situ SM shows its sensitivity to small SM variations and its capability to simulate both seasonal and interannual SM variability. Nevertheless, varying degrees of wet bias relative to in-situ SM are observed across all four products, suggesting a tendency toward overestimation in such semi-arid regions. Notably, SMRFR performs best in this cross-continental transfer scenario, with a mean absolute error (MAE) of 0.0331 m³/m³, outperforming ERA5-Land (0.1286 m³/m³), ESA CCI (0.1091 m³/m³), and GLEAM (0.1681 m³/m³).

Fig. 10

Time series and scatter plots of mean SSM (0–10 cm) at CEMADEN stations. (a) All stations (all, 360) and subsets in (b) Bahia (BA, 133), (c) Ceará (CE, 64), (d) Pernambuco (PE, 42) and (e) Piauí (PI, 32).

Station-level evaluation (Fig. 11) further confirms SMRFR’s superior accuracy, with the lowest bias, ubRMSE and MAE (0.0273, 0.0339 and 0.0439 m³/m³, respectively) and an acceptable correlation coefficient of 0.65. The distributions of MAE and correlation coefficients reveal that while the relative temporal dynamics of SM are well captured by these datasets, accurate estimation of absolute SM values remains a challenge, particularly for LSM-based datasets74,75.

Fig. 11

(a) Distribution of bias, correlation coefficient, ubRMSE and MAE between four datasets and in-situ SSM (10 cm) of eastern Brazil. (b) Boxplots of performance metrics, which are calculated between SM datasets and in-situ SM for single station. Red lines indicate the best performance.

In summary, SMRFR demonstrates three key advantages: (i) strong ability to capture temporal SM dynamics and seasonal patterns, (ii) responsiveness to both short- and long-term rainfall variations and (iii) reliable estimation of absolute SM levels. These strengths reflect the ML model’s robust learning capacity and its potential to support knowledge transfer across diverse regions.

However, it is worth noting that the current validation is limited to semi-arid regions. Whether the ML model can maintain its superior performance under disparate climatic conditions (e.g., extreme drought, high humidity, or more complex scenarios) remains uncertain. Future investigations should extend the evaluation to broader environmental contexts to further enhance model robustness and generalizability.

Usage Notes

Despite significant improvements in the accessibility of global SM datasets, limitations persist, for instance coarse spatial resolutions (25–50 km)22,38,76,77, simplified vertical structures (e.g., a single root zone layer)78 and uneven continental coverage36,39,40. These limitations hinder the consideration of land surface heterogeneity and large-scale analysis of climate-SM interactions. To meet the increasing demand for high-resolution SM data16,79, this study introduces a framework that leverages ML to systematically generate comprehensive, global-scale, multilayer SM datasets from multi-source data. As a pilot step, we generated SMRFR, a novel SM dataset that provides global daily SM estimates at five soil layers (0–100 cm) at 9-km resolution from 2000 to 2023, with a planned enhancement to 1-km resolution.

During training and validation, the ML framework demonstrated strong proficiency in capturing intricate, nonlinear relationships and dynamic interactions between SM and environmental drivers. Consistent with earlier findings38, SM itself emerged as the dominant predictor, suggesting that the framework can be streamlined to reduce data dependency. SMRFR also exhibits strong transferability, particularly in semi-arid regions, where it effectively captures seasonal and event-scale SM fluctuations in the absence of local training data.

As Earth observation data expand, ML continues to gain traction in Earth system modelling, including for SM, solar radiation80 and precipitation81. However, representing global SM dynamics with a single model per soil layer remains challenging, given the diversity in soil properties, topography, surface roughness, vegetation cover, and freeze-thaw dynamics. These complexities can lead to region-specific performance variations56. Future improvements may benefit from regional model specialization or clustering approaches39. In addition, while SMRFR is suitable for large-scale and regional studies, its current resolution may be insufficient for field-scale applications. Finer-resolution SM datasets (e.g., at hundred-meter scale or finer) are needed to support precision agriculture and irrigation planning16,82.

SMRFR supports a range of applications across agriculture, hydrology, and ecology. It can enhance hydrological models (e.g., SWAT, VIC) by providing high-resolution, multilayer SM inputs that improve simulations of runoff, infiltration, and evapotranspiration processes. For instance, surface layers (e.g., 0–30 cm) support plant-available water estimation in SWAT, while the deeper profiles (up to 100 cm) enhance root-zone moisture representation in VIC. SMRFR can also complement satellite-based SM products for gap-filling and bias correction, and may be integrated with climate datasets (e.g., CMIP6) to assess long-term SM trends and their climate impacts. Regional biases can be further corrected using in-situ networks (e.g., COSMOS, FLUXNET). By facilitating seamless integration, calibration, and validation, SMRFR can enable large-scale SM analysis and provide new opportunities for drought monitoring, water resource management83, and ecosystem research.
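Matching SMRFR's five layers to the layer structure of another model (for example, a 0–30 cm surface store) requires aggregation over depth. The sketch below assumes simple thickness weighting of volumetric SM and a target interval within 0–100 cm; the function names are hypothetical, and actual SWAT or VIC layer definitions depend on the model setup.

```python
import numpy as np

# SMRFR layer boundaries in cm (top, bottom), as listed in the dataset description
SMRFR_LAYERS_CM = [(0, 5), (5, 10), (10, 30), (30, 50), (50, 100)]


def depth_weighted_sm(values, top_cm: float, bottom_cm: float, layers=SMRFR_LAYERS_CM):
    """Thickness-weighted mean volumetric SM (m3/m3) over [top_cm, bottom_cm].

    `values` has the layers on its last axis (same order as `layers`); each
    layer is weighted by its overlap with the target interval.
    """
    weights = np.array([max(0.0, min(b, bottom_cm) - max(t, top_cm)) for t, b in layers])
    if weights.sum() == 0:
        raise ValueError("target interval does not overlap any layer")
    return np.asarray(values, dtype=float) @ weights / weights.sum()


def storage_mm(values, top_cm: float, bottom_cm: float, layers=SMRFR_LAYERS_CM):
    """Soil water storage (mm) in the interval: mean SM times thickness (1 cm = 10 mm)."""
    return depth_weighted_sm(values, top_cm, bottom_cm, layers) * (bottom_cm - top_cm) * 10.0
```

Because the last axis is contracted with `@`, the same call works for a single profile or for a whole (lat, lon, layer) grid.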