Abstract
Open-source legacy data available for training soil organic carbon (SOC) models are limited and not uniformly distributed in space or time. While some process-based models predict SOC changes, most large-scale data-driven SOC modeling efforts overlook temporal shifts. Accounting for the expected temporal drift allows us to increase the accuracy of the datasets available for machine learning models. Here we present an approach for creating proximity-based distance matrices using the legacy data available in the contiguous US (CONUS) and generating spatially resolved temporal shift projections that adjust observations to a target date. The approach was evaluated by comparing SOC observations projected to two reference years (SOC1980 and SOC2020) with observations without temporal adjustment (SOCno−adj). Projected SOC stocks differed significantly between SOCno−adj and SOC2020. Baseline estimates of SOC stocks (top 1 m) in CONUS croplands were higher based on SOCno−adj (14.49 Pg C) than on SOC2020 (13.29 Pg C); for pasture lands the estimates were 15.49 Pg C (SOCno−adj) and 14.22 Pg C (SOC2020), and for forest lands 39.52 Pg C (SOCno−adj) and 40.83 Pg C (SOC2020). The results support the validity of our methodology and its capability to improve SOC stock projections through temporal adjustment. Potential users of this study’s outcomes include stakeholders involved in carbon incentive programs, such as farmers, scientists, policy makers, and industry partners.
Introduction
Soil organic carbon (SOC) is an important ecosystem commodity, providing many critical ecosystem services, and is a crucial component of soil health. Increasing SOC stocks is directly linked to improving soil fertility, water and nutrient holding capacity, and soil biodiversity1,2 and is a promising climate mitigation strategy3,4. Global agricultural lands can be leveraged through appropriate climate-smart agricultural practices to increase SOC sequestration, estimated to be between 0.35 and 1.18 Gt of CO2 per year5,6,7. Soil carbon monitoring in agricultural lands is anticipated to expand in support of carbon credit markets designed to leverage additional land resources. Soil carbon monitoring programs require accurate quantification of SOC baseline stocks, defined as the expected average soil carbon at a given location3,8, which can also aid in generating high-quality carbon credits9,10. Given the complexity of SOC dynamics and nonlinear responses across spatial and temporal scales11,12, it is essential to represent the baseline SOC stocks dynamically to reflect current levels13. Modeling at higher spatial resolution is also essential to accurately account for changes driven by different land use practices14. Estimating the current baseline stocks would ideally involve collecting soil samples for analysis at individual farms, as this would yield the most precise results15. However, this approach is expensive and challenging to implement on a large spatial scale, such as the Conterminous United States (CONUS).
A variety of techniques have been employed to model and map SOC baseline content, including process-based models16,17,18 and data-driven machine learning (ML) models. ML models are particularly effective at larger spatial scales as they can be trained using environmental covariates such as climate, topography, vegetation, parent material, and other factors at various spatial scales19,20. Additionally, geostatistical approaches like regression kriging have accounted for spatial autocorrelation alongside environmental correlations21. Despite these advancements, the predictive performance of these techniques is constrained by limited sampling density and extent when compared to the inherent spatial heterogeneity22. Crucially, the temporal dimension has been largely overlooked when using extensive legacy SOC data collected over multi-decadal timespans in ML models23,24. Most models aggregate multi-year data and treat it as a single moment in time, which introduces systematic errors by assuming that soil carbon levels remain constant over time. This assumption contradicts compelling evidence of the impact of climate change, land use dynamics, and conservation practices on soil carbon storage and fluxes25,26. Reliable models for predicting temporally resolved SOC baselines at scale are unavailable27,28.
Numerous ML techniques and algorithms show promise for large-scale modeling by leveraging legacy data and remotely sensed environmental information29,30. Gradient-boosted decision tree frameworks such as CatBoost31,32 are particularly interesting for their capability to efficiently construct and organize ensemble decision trees using categorical covariate data, which has proven highly valuable in modeling diverse datasets33. Additionally, such a framework enables the inclusion of covariates that influence SOC stocks at various kinetic orders28 and spatial scales34, making CatBoost models highly adaptable and precise for a wide range of environmental conditions. Furthermore, space-for-time substitution, in which a spatial gradient is used to account for temporal variation, could be integrated to fill temporal gaps; it has been employed to enhance SOC predictions on explicit datasets35,36. Thus, applying gradient-boosted ML modeling to establish dynamic baselines and extend them to broader regions has merits; however, further advancements are required to account for the spatial and temporal drivers influencing SOC stocks29. In particular, improvements should enable efficient hyper-parameter learning that accounts for spatial heterogeneity within a small region (e.g., a working land system or a watershed scale) and facilitate the effective use of space-for-time substitution to project temporally resolved SOC data within a specified area.
In this study, we have proposed and evaluated a two-stage methodology for time-adjusted mapping of SOC across the CONUS by leveraging an extensive compilation of legacy soil profile data that lacks repeated measurements. The primary objective was to integrate temporal and spatial analysis to create an approach that leverages these long-term records for mapping contemporary carbon distribution on a continental scale.
Results
Evaluation of the time adjustment method
A robust data set for modeling SOC in CONUS with over 21,000 data points was created after pre-processing, as described in the Methodology section. These data had spatiotemporal gaps (Fig. 1), highlighting discrepancies in spatial distribution and temporal coverage. Using our temporal adjustment method (Fig. 2), we adjusted the multi-decadal SOC observations to 2020 (Fig. 3). This dataset represents the time-adjusted baseline data product, which can be used with bio-climatic and geomorphological variables to generate a spatially explicit SOC dataset for model training. The distribution after temporal adjustment to 2020 shows a noticeable leftward shift compared to non-time-adjusted data (Fig. 4), signifying a decrease in the overall average SOC values. This shift towards lower values aligns with the expectation of SOC loss over time in the CONUS. However, we observed both positive and negative projected temporal shifts (slopes) in SOC (Fig. 5). This variance underscores the spatial sensitivity of the approach to factors such as land use and other input variables.
Model performance
Three models were developed to estimate: (a) SOC without time adjustment (SOCno−adj), (b) SOC adjusted to reflect likely 2020 conditions (SOC2020), and (c) SOC adjusted to reflect likely 1980 conditions (SOC1980). For SOC1980, all data were adjusted to 1980 using a temporal adjustment process similar to that described for 2020. The 2020-adjusted model, evaluated on the independent testing dataset, exhibited an MAE of 3.38 kg m−2 and an RMSE of 4.49 kg m−2 (R² = 0.41; Table 1). Given the complexities associated with SOC prediction, the decrease in performance from train to test is expected and in line with other studies. With SOC2020 adjustment, the expected SOC stock ranged between 0 and 32.84 kg m−2 (Fig. 6), whereas the range was between 2.24 and 21.25 kg m−2 for SOCno−adj projections (Figure S1). This broader range with temporal adjustment is expected, as the process increases or decreases the expected SOC based on observed regional slopes.
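For readers who want to reproduce the reported error statistics on their own predictions, the three metrics can be computed as in the minimal sketch below. The function name `regression_metrics` is hypothetical, and R² is taken here as the coefficient of determination (1 − SS_res/SS_tot), which is an assumption; the study does not state which definition it used.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE and R^2 for paired observed/predicted SOC values.

    Assumption: R^2 is the coefficient of determination, not the squared
    Pearson correlation. Units follow the inputs (e.g. kg m^-2).
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {"mae": mae, "rmse": rmse, "r2": 1.0 - ss_res / ss_tot}


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```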
Feature significance
The summary plot for SOC2020 (Fig. 7) revealed that the top five predictors for SOC stocks in 2020, ranked by importance, were aridity index, soil taxonomy, solar radiation, clay content (0–5 cm), and second layer clay content (5–10 cm). Higher values of the percentage of clay in surface soil and of the aridity index corresponded to higher SOC values at most places. In comparison, lower values of solar radiation, evapotranspiration, upslope curvature, and temperature of the warmest month corresponded to higher SOC values.
Summary plot for the model trained on data time-adjusted to 2020. Features are listed from top to bottom in order of significance. Each point on the summary plot represents a SHAP value for one feature and one instance. The position on the X-axis indicates the impact of that value on the model’s output (to the right means a positive impact, i.e., a higher SOC value); the color of the point indicates the value of the feature for that observation (red high, blue low); and the density of the points shows how common or rare specific SHAP values are for a feature.
Summary plots for SOCno−adj (Figure S2) and SOC1980 (Figure S3) showed different feature importance rankings from SOC2020. The top five features for SOC1980, ranked by importance, were surface clay content (0–5 cm), soil taxonomy, second layer clay content (5–10 cm), aridity index, and elevation. Notably, surface clay content was the most important feature in both SOCno−adj and SOC1980 but moved down the feature importance ranking in SOC2020. Solar radiation moved up the ranking in SOC2020 projections, and lower solar radiation was correlated with higher SOC values.
SOC stocks
Using the model projections (SOC1980, SOC2020, and SOCno−adj), we estimated the total SOC stock in the CONUS for cropland, grassland, and forest land. Comparing SOC2020 with SOCno−adj revealed that accounting for time alone reduced the estimated carbon in croplands (top 1 m) from 14.29 Pg (SOCno−adj) to 13.29 Pg (SOC2020) (Fig. 8). A similar trend was noted for pasture lands, where SOC stock in the top 1 m soil profile decreased from 15.49 Pg (SOCno−adj) to 14.22 Pg (SOC2020). A contrasting trend was observed for forest lands, where the SOC stock within the top 1 m soil profile increased from 39.52 Pg (SOCno−adj) to 40.83 Pg (SOC2020). These results are summarized in a supplementary table (Table S2).
Estimated SOC stocks (Pg C) for different land uses using non-time-adjusted data and data time-adjusted to 2020 (left). Percentage difference in estimated SOC stocks for different land types when using non-time-adjusted data vs. data time-adjusted to 2020 (right). Error bars represent standard deviation.
Comparing SOC1980 and SOC2020 indicated that total SOC stocks within the top 1 m soil profile decreased in croplands from 14.49 Pg in 1980 to 13.29 Pg in 2020, representing a loss of approximately 8.35% (Fig. 9). A similar trend was noted for pasture lands, where SOC stocks in the top 1 m soil profile decreased from 16.26 Pg in 1980 to 14.22 Pg in 2020, representing a loss of approximately 12.54%. An opposite trend was noted for the forest lands, as SOC stocks in the top 1 m soil profile increased from 40.36 Pg in 1980 to 40.83 Pg in 2020, representing an increase of approximately 1.15%. Total SOC stocks, summed over all three major land use types, decreased from 71.12 Pg in 1980 to 68.33 Pg in 2020, reflecting a loss of approximately 3.92%. These results are summarized in a supplementary table (Table S3). Model outputs presented here are not adjusted for changes in the various land use areas in CONUS. The share of total SOC stock held in croplands decreased from 20.4% in 1980 to 19.4% in 2020. Similarly, the share held in pastures declined from 22.9 to 20.8%, whereas the share held in forest lands increased from 56.8 to 59.7% (Figure S4).
Estimated SOC stocks (Pg C) for different land types using 1980-time adjusted data and 2020-time adjusted data (left), Percentage difference in estimated SOC stocks for different land types when using 1980-time adjusted data vs. using 2020-time adjusted data (right). Error bars represent standard deviation.
Discussions
The temporal adjustment methodology presented in this study addresses historical data gaps to account for temporal trends in SOC. The method combines statistical and machine learning techniques. Outcomes suggest that this approach is effective in efficiently parameterizing dynamic input variables and their governing processes impacting SOC changes across spatial scales. For instance, we found that soil textural classes at local scales and the aridity index at larger spatial scales were particularly effective input variables for capturing SOC temporal shifts within the specified area. Prior research has also emphasized the importance of combining modeling methods for SOC estimates at larger spatial scales37, noting in particular their effectiveness for more efficiently parameterizing the dynamic input variables (and processes) that drive carbon changes38. Additionally, our method effectively identified spatiotemporal trends in SOC stocks, exhibiting divergent shifts under different land use types, as evidenced by the model outputs depicting trends in SOC shifts and CONUS maps illustrating slope variations. These findings further underscore the importance of temporally adjusting training data for modeling purposes. Our method will increase the availability of training data for models, which may enable the scientific community to develop improved adaptive learning models39 with the capacity to integrate new data as it becomes available. However, our methodology must be further improved and validated using recent data to address inherent limitations, including process-related constraints, data gaps, and uncertainties within the SOC data itself. A priority for any new soil sampling regime should also be to validate the temporal trends in different regions. This idea has, to some extent, been adopted in Europe27 for a limited study and should yield exciting results when it matures.
The SOC2020 model, trained on temporally adjusted data, exhibited performance consistent with previously reported models utilizing similar legacy datasets. Studies with various machine learning techniques to project SOC baselines at a higher resolution of 250 m² within the CONUS have reported model accuracy with R² values ranging between 0.28 and 0.723,40,41,42,43. Moreover, our model’s integration of temporal adjustment renders its performance metrics more reliable. However, we acknowledge that the correlations indicated by our model may be incomplete or partially inaccurate due to gaps in critical data. Other modeling approaches, including process-based models, also face similar limitations and would equally benefit from additional data inputs.
Significant features for SOC2020 predictions included aridity index, soil taxonomy, solar radiation, clay content (0–5 cm), and second layer clay content (5–10 cm), which were also significant in previous ML modeling studies44,45,46. However, some reshuffling occurred among the top five covariates represented in SHAP plots for SOC2020, SOC1980, and SOCno−adj, which further confirms the improvements in model performance by accurately capturing the dynamic relationship between governing factors of SOC and model projections after temporal adjustments. The top five important features for the models interestingly represented both micro-spatial scales (e.g., top clay and soil type) and macro-scales (e.g., aridity index), which likely played a role in effectively categorizing the temporal slope effects between land use types. For example, micro-scale effects are more important in croplands47,48, whereas climatic effects may be more significant for forestry lands49,50. By using a detailed taxonomy, we were able to utilize soil profile characteristics through depth effectively. This appears crucial for capturing SOC depth attenuation features at smaller spatial scales, which may not be discernible through remotely sensed climate and vegetation data. The influence of soil type on the vertical distribution of SOC stocks has been demonstrated51. Additionally, SOC is used as a segregation criterion in soil classification systems52, further emphasizing the utility of understanding soil type changes with depth. While our model incorporates several SOC-governing factors, it may not fully capture the complexity of abrupt SOC shifts, such as rapid responses to extreme changes in conditions53,54. For example, while the aridity index and slope are useful proxies, they may only partially represent the full impact of sudden changes, such as droughts or erosion, on SOC dynamics.
Advanced datasets for SOC and environmental covariates are needed to enable more nuanced, real-time adjustments to SOC temporal trends. The estimated stocks derived from our model projections fell within a comparable range to those reported in previous studies. We estimated the total SOC stocks for CONUS in the top 1 m soil profile at 71.12 Pg C in 1980 and 68.33 Pg C in 2020. One study estimated this stock at 57 Pg C55, another at 62.2 Pg C56, and a third at 75.208 Pg C while demonstrating a general declining trend of SOC stocks across different regions of CONUS57. Our estimates are comparable to this latter report by Goncalves et al. (2021), which was mostly based on more recent repeated sampling between 2010 and 2013 within CONUS58. One advantage of our method is its longer-term scalability, as it expands modeling to diverse legacy datasets that lacked repeated sampling efforts. Our model estimates are higher than the mean SOC stock reported for the CMIP6 model scenario, at 67.5 Pg C59, and another estimate of 64.2 Pg C40, confirming the overall declining SOC trends. While these models did incorporate temporal adjustments to model temporal shifts in SOC by leveraging the changing trajectory of climate variables, it is essential to note that the feedback between temporal SOC shifts and climate change remains highly uncertain60. Furthermore, the negative impacts of land use change on SOC are relatively substantial compared to the temporal changes induced by climate variability61,62.
The sensitivity of our model to land use changes was evident in the notably variable slopes observed within different land use types and across spatial heterogeneity associated with climate factors. The negative slopes observed in croplands but not in forest land were particularly notable, indicating a significant impact of land use changes on SOC levels. According to the model projections, the total SOC stocks in the top 1 m soil profile of croplands declined from 14.49 Pg C in 1980 to about 13.29 Pg C in 2020. Comparable estimates have been reported in other studies; for instance, one study estimated 13.4 Pg C in the top 1 m in areas classified as croplands or agricultural lands in CONUS55, and another reported 13.0 Pg C63. However, cropland SOC stocks are dynamic, as noted in our model outputs, which showed a decrease of about 8.35% from 1980 to 2020. This trend is consistent with numerous studies demonstrating historical SOC loss in croplands64. The continual decline in SOC from croplands is well-documented and attributed to drivers such as erosion65, some specific land use practices64, and climate change effects66,67. For instance, tilled croplands in CONUS, covering approximately 210 thousand km²68, have been noted to experience significant SOC loss from the topsoil compared to no-till lands69. Another indication of our model’s sensitivity is the contrasting trend observed for forest lands, wherein our projections indicated an increase of about 1.15% from 1980 (40.37 Pg C) to approximately 40.83 Pg C in 2020 for the top 1 m soil profile. Our projected estimates do not consistently align with those reported in other studies, which include 20 Pg C70, 32.7 Pg C56, and approximately 40 Pg C71,72. Such a wide range suggests that estimates for the SOC in forest lands are highly uncertain73.
The small positive change in forest land SOC stocks projected by our model can be attributed to increased net primary productivity of forest biomass, likely driven by rising atmospheric CO2 levels and enhanced dry deposition of nutrients, enriching forest soils74. This could result in higher belowground carbon input and increased residue conversion to SOC. According to a USDA report for the National Forest Inventory, there are indications of increases in mineral soil SOC stocks in US forest lands75. Our model predicted little or no change for large forest areas, which aligns with another study reporting that SOC in forest lands has been stable since 190776. These findings collectively demonstrate that our model exhibits increased sensitivity to land use-induced changes, which serve as the primary drivers of SOC losses and temporal shifts. The results of this study show that our methodology is a promising alternative with the potential to improve existing machine learning methods in SOC modeling, but it remains dependent on further data availability and methodological improvements.
Limitations and future research
Soil data availability is very limited in both space and time. Within the CONUS, we observed large swaths of area with no data, or with limited data and no repeat testing. Such a distribution makes it hard to understand trends in SOC. This is not surprising, as the primary goal of soil testing is not to establish trends. However, given the focus on changes in SOC in response to changing climate, it is crucial to understand the trends in SOC. This underscores the need for additional data, as spatial and temporal data gaps remain a significant constraint for accurately projecting current stocks to validate the impacts of climate-smart farming practices and other uses. This imperative is further underscored by the substantial spatiotemporal data gaps prevalent in many regions globally, such as Africa (Figure S5). Although our approach represents a significant advancement in SOC modeling, it should be critically assessed and improved in several areas: enhanced data collection to support validation, comprehensive uncertainty quantification, and the development of adaptive temporal and spatial models to refine past-projection techniques.
The study’s methodology is built on key assumptions regarding temporal and spatial adjustments, particularly the choice of a 5-year period for averaging and a 10 km spatial similarity threshold for slope computation. These thresholds were chosen to balance data variability and model sensitivity; however, they assume a level of temporal and spatial homogeneity that may not be universally applicable77,78,79. We recognize that spatial similarity could vary by location and that a fixed distance might not capture all aspects of spatial variability. The 5-year temporal adjustment was chosen based on prior evidence showing that observable change in SOC takes multiple years80,81. However, this may not be adequate in regions with extreme climatic events or rapid land use changes53,54. Similarly, the 10 km spatial threshold was selected to ensure data sufficiency but does not capture finer-scale spatial heterogeneity. Future research should explore adaptive temporal frameworks and spatially flexible models that adjust based on local environmental conditions, potentially incorporating high-frequency data and advanced spatial statistics like variogram analysis to better reflect the true temporal and spatial dynamics influencing SOC.
Data harmonization and imputation, such as simulating bulk density and standardizing soil depth profiles, may introduce fixed correlations that could influence our machine learning models by embedding artificial relationships. While harmonization was essential to create a workable and consistent dataset given the scope and scale of our study, this process can introduce biases in data relationships. However, in large-scale models like ours, the risk of bias is even higher if a substantial amount of available soil data is left out because it has not been harmonized. For certain soil properties, such as bulk density, there is currently no viable alternative due to the lack of sufficient field observation data. Detailed information outlining the imputation process at every step and the methods employed is provided in supplementary materials.
The validation of SOC model predictions in this study has some limitations due to the absence of long-term, systematically collected data, especially in regions experiencing substantial land use changes. The variability of environmental covariates, such as climate, soil properties, and land use, also poses a challenge, given their significant spatial and temporal fluctuations that the current dataset may not fully capture82,83. The reliance on limited historical land cover and land use (LCLU) data, which often lack the necessary temporal resolution84, further complicates precise SOC modeling. The absence of detailed records on land management practices, such as crop rotations and tillage methods, further restricts our ability to account for these critical drivers of SOC change.
An improvement to our model could involve utilizing temporally resolved land use maps, as temporal changes in land use management have the potential to alter the land coverage used for our model predictions. For instance, the total cropland area in CONUS decreased by about 18% between 1949 and 201285. However, a reverse trend was observed in some regions between 2008 and 2012, with approximately 30,000 km² converted to cropland from grassland, potentially resulting in SOC loss86. By capturing these temporal changes, we could further enhance model accuracy. Additionally, repeated sampling efforts similar to those conducted in Europe27 would significantly improve our models and provide valuable validation data, the lack of which is a major limitation for both process-based and ML modeling. Robust validation will also enhance the explainability of ML models and mitigate their black-box nature, making the results more meaningful for interpreting SOC governing processes and accurately representing them in ML modeling of SOC changes.
Quantifying uncertainty in SOC projections remains a complex challenge due to the inherent variability in environmental data and the interactions between multiple covariates. While multiple model runs were employed in this study to estimate uncertainty, this approach mainly addresses data variability and model structure uncertainty. Thus, the model’s ability to produce unbiased estimates for large-scale SOC quantification using environmental data remains untested. To provide a point of comparison, we developed a difference map between our SOC predictions and the ML-derived ISRIC SOC dataset for the US99. This map, included as Supplementary Figure S6, highlights regional variations and areas where model estimates differ significantly. This comparison does not constitute a bias test. However, it provides valuable insight into potential discrepancies. A lower value than ISRIC is expected in our SOC predictions because our map accounts for temporal adjustment of data, reflects SOC loss trends, and excludes wetland areas from the analysis. These distinctions highlight differences in methodology and temporal focus between the two datasets. Future studies should develop comprehensive uncertainty quantification frameworks that account for parameter uncertainty, model structural uncertainty, and data uncertainty. Bayesian approaches and sensitivity analyses could provide deeper insights into the sources of uncertainty, allowing for more robust and reliable SOC predictions. However, without a mechanism to evaluate and correct bias, the model’s results should be interpreted cautiously, particularly in the context of applications requiring precise aggregate values, such as carbon trading schemes.
Methodology
Data accumulation and preparation
SOC data was compiled from various sources, including the International Soil Reference and Information Centre (ISRIC) database87, the National Cooperative Soil Survey (NCSS) database88, and the Rapid Carbon Assessment (RaCA) data58. These sources contain soil profile (layers) records with SOC measurements at different depths collected across the CONUS over several decades (1910–2020). After removing the duplicate data, a standardized 1-meter SOC stock value (0–100 cm) was computed for all soil profile points from different sources. A depth-adjusted aggregation method was adopted to calculate the total stock value for 100 cm, which required computing bulk density for each soil profile layer. Bulk density values were estimated using a machine learning model for soil profile layers lacking bulk density. This model was trained on soil profiles with known bulk density values, utilizing SOC content, geographic coordinates, texture (sand and silt percentages), and soil horizon as predictors. SOC stock for each layer within the soil profiles was calculated by integrating the layer-specific SOC concentrations with the corresponding bulk density and thickness measurements. Only soil profiles extending to a depth of at least 1 m were considered, ensuring a comprehensive representation of the top meter. These calculations were confined to the upper 1 m of the soil profile to align with the standard depth for SOC stock assessment. In cases where layers extended beyond this depth, only the portion within the top 1 m was considered, applying a proportional linear adjustment based on the overlap with this depth interval. Additional details about the data, bulk density estimation model and depth-standardized aggregation method are provided in the supplementary document. Thus, for each soil profile, a depth-weighted average SOC content in the top 1 m (SOC1m) was calculated based on the SOC measurements at recorded depth layers. 
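The layer-wise stock calculation and the proportional clipping to the top 1 m can be illustrated with a minimal sketch. It assumes SOC concentration is given in percent by mass and bulk density in g cm−3, and it ignores coarse fragments and gaps between layers; the `Layer` class and the function name `soc_stock_1m` are hypothetical and are not taken from the study's code.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    top_cm: float        # upper boundary of the layer
    bottom_cm: float     # lower boundary of the layer
    soc_pct: float       # SOC concentration, % by mass (assumed unit)
    bulk_density: float  # g cm^-3, measured or model-estimated


def soc_stock_1m(layers, max_depth_cm=100.0):
    """Return the 0-100 cm SOC stock in kg m^-2, or None for shallow profiles."""
    if not layers or max(l.bottom_cm for l in layers) < max_depth_cm:
        return None  # profile does not reach 1 m, so it is excluded
    total = 0.0
    for l in layers:
        # Thickness of the part of this layer lying within the top 1 m.
        overlap = min(l.bottom_cm, max_depth_cm) - min(l.top_cm, max_depth_cm)
        if overlap <= 0:
            continue
        # (% / 100) * g cm^-3 * cm gives g cm^-2; the factor 10 converts to kg m^-2.
        total += l.soc_pct / 100.0 * l.bulk_density * overlap * 10.0
    return total


if __name__ == "__main__":
    # Illustrative profile only; not data from the study.
    profile = [Layer(0, 30, 2.0, 1.3), Layer(30, 120, 0.5, 1.5)]
    print(soc_stock_1m(profile))
```

In this toy profile the second layer extends to 120 cm, so only its 70 cm overlap with the 0–100 cm interval contributes to the stock.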
The compiled dataset was filtered to remove outliers and physically implausible observations. The final aggregated set of SOC profiles encompasses multi-decadal measurements standardized to topsoil SOC1m values, providing extensive coverage of SOC status across the study region (Fig. 1).
Temporal adjustment of stock values
A time series modeling approach was developed to characterize the status of soil organic carbon (SOC) and account for temporal variability by applying space-for-time substitution89,90. For each sampling site, a pair-wise distance matrix was constructed with other sites within a 10 km radius based on haversine distance. Sites within this threshold (10 km) were considered proximal and assumed to have more similar SOC temporal trajectories. The choice of a 10 km radius for establishing spatial similarity was intended to balance data density with model accuracy. While smaller distances could capture finer-scale spatial heterogeneity in SOC, reducing the radius significantly reduced the availability of data points for calculating temporal slopes. This reduction compromised the model’s capacity to generate reliable temporal adjustments, particularly given the inherent sparsity of SOC observations across extensive regions. Thus, we selected 10 km as a compromise to ensure sufficient observation density for reliable slope estimation while still preserving local spatial trends in SOC levels.
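The proximity search described above can be sketched in a few lines. This is a brute-force illustration under stated assumptions: coordinates are in decimal degrees, the Earth radius constant is a choice of this sketch, and the function names are hypothetical. For a dataset of roughly 21,000 sites, a spatial index would likely be preferable to the O(n²) loop shown here.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def proximal_sites(sites, radius_km=10.0):
    """Map each site index to the indices of all other sites within radius_km.

    sites is a list of (lat, lon) tuples.
    """
    neighbours = {i: [] for i in range(len(sites))}
    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            if haversine_km(*sites[i], *sites[j]) <= radius_km:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours


if __name__ == "__main__":
    # Illustrative coordinates only: two sites about 5.6 km apart, one far away.
    print(proximal_sites([(40.0, -100.0), (40.05, -100.0), (41.0, -100.0)]))
```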
For each focal site, proximal sites were filtered to those with total clay and silt content within 10% of the focal site’s value to constrain variation in soil texture. Sampling years and corresponding SOC measurements for the proximal sites were extracted and aggregated to the nearest 5-year period (e.g., 2000–2004 rounded to 2000). Aggregating SOC values within 5-year periods smooths high-frequency variability unrelated to gradual SOC change, enhances the multi-decadal signal, and helps mitigate artifacts from uneven sampling density across years. Shorter (< 5 years) periods were found to be influenced by individual anomalous years, while longer periods excessively smoothed temporal dynamics. The mean SOC value within each 5-year period was calculated as input for the temporal trend analysis.
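The texture filter and the 5-year aggregation can be sketched as follows. Two points are assumptions of this sketch rather than details confirmed by the text: the 10% tolerance is interpreted as relative to the focal site's clay-plus-silt value, and years are floored to the start of their 5-year period, following the "2000–2004 rounded to 2000" example. Function names are hypothetical.

```python
from collections import defaultdict


def texture_match(focal_fines, candidate_fines, tolerance=0.10):
    """True if the candidate's clay+silt content is within 10% of the focal value.

    Assumption: the tolerance is relative to the focal site's value.
    """
    return abs(candidate_fines - focal_fines) <= tolerance * focal_fines


def five_year_means(observations, period=5):
    """Aggregate (year, soc) pairs into sorted (period_start, mean_soc) pairs."""
    bins = defaultdict(list)
    for year, soc in observations:
        # Floor each integer year to the start of its period, e.g. 2003 -> 2000.
        bins[(year // period) * period].append(soc)
    return sorted((start, sum(vals) / len(vals)) for start, vals in bins.items())


if __name__ == "__main__":
    # Illustrative values only; not data from the study.
    obs = [(2001, 10.0), (2003, 12.0), (2007, 8.0), (1999, 9.0)]
    print(five_year_means(obs))
```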
A robust linear regression model (RLM) was fitted to the rounded 5-year midpoints and mean SOC values if at least 3 unique periods were available, implementing Huber’s T norm to limit the influence of outliers91. This yielded slope (m) and intercept (c) coefficients representing the temporal SOC trend for each focal site. Using these slope values, SOC was projected to the desired year using:
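The projection equation is not reproduced in this text. As a minimal sketch, assuming the adjustment shifts each observation along the fitted trend, SOC_target = SOC_obs + m × (year_target − year_obs), the fit and projection could look as follows. The helper names, the tuning constant 1.345, and the example values are illustrative assumptions, and the pure-Python reweighting loop stands in for a library robust-regression routine; it is not the authors' implementation.

```python
from statistics import median

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = m*x + c."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    m = sxy / sxx
    return m, ybar - m * xbar

def huber_line(x, y, t=1.345, iterations=50):
    """Robust line fit: iteratively reweighted least squares, Huber weights."""
    if len(set(x)) < 3:
        raise ValueError("need at least 3 unique periods")
    w = [1.0] * len(x)
    for _ in range(iterations):
        m, c = weighted_line(x, y, w)
        resid = [yi - (m * xi + c) for xi, yi in zip(x, y)]
        # Robust residual scale: normalised median absolute residual
        scale = median(abs(r) for r in resid) / 0.6745
        if scale == 0:
            break
        # Full weight for small residuals, reduced weight for large ones
        w = [1.0 if abs(r) <= t * scale else t * scale / abs(r)
             for r in resid]
    return m, c

def project_soc(soc_obs, year_obs, slope, year_target):
    """Shift an observation along the trend (assumed linear form)."""
    return soc_obs + slope * (year_target - year_obs)

# Hypothetical period means with one anomalous period (2000)
periods = [1985, 1990, 1995, 2000, 2005, 2010]
soc = [3.0, 2.9, 2.8, 5.0, 2.6, 2.5]
slope, intercept = huber_line(periods, soc)
soc_2020 = project_soc(soc_obs=2.0, year_obs=2003, slope=slope,
                       year_target=2020)
print(round(slope, 4), round(soc_2020, 2))
```

In this example an ordinary least-squares fit would be pulled toward the anomalous 2000 value, whereas the reweighting progressively discounts it and recovers a slope close to the underlying decline of 0.02 per year.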
For sites without sufficient temporal data (< 3 periods), the slope coefficient was predicted using a gradient-boosting model based on climate, soil taxonomy, texture (clay, silt), terrain attributes, land use, and geographic coordinates. The model was trained on sites with known slope values. Ten-fold cross-validation (k = 10) yielded an average R2 of 0.6, indicating how well the model generalizes to new, unseen data. Subsequently, a final model was trained on the entire dataset using the parameters refined through the 10-fold cross-validation process, which achieved an R2 of 0.92. Slope values for sites without RLM trends were predicted with this model and used to estimate projected SOC. Additional details about the data and process are provided in the supplementary document. The process of temporal adjustment of SOC to the 2020 reference year (SOC2020) is shown in Fig. 2.
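The fold construction behind 10-fold cross-validation can be sketched as below. The function name, seed, and sample count are hypothetical, and the gradient-boosting model itself is omitted; the point is only that every site with a known slope is held out exactly once.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is held out once."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # Deal the shuffled indices into k folds of near-equal size
    folds = [order[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i
                     for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=25, k=10))
held_out = sorted(j for _, test_idx in splits for j in test_idx)
print(len(splits), held_out == list(range(25)))
```

Averaging a score such as R2 over the ten held-out folds gives the cross-validated estimate, which is generally lower than the score of a final model evaluated on the data it was trained on.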
Feature selection for spatially distributed SOC modeling
An input dataset with topographical characteristics, climate, soil types, land use, and vegetation characteristics of CONUS was developed in consultation with soil scientists to represent likely factors affecting SOC; see supplement Table 1 for details. The spatial covariates/features included elevation, aspect, slope, and several topographical curvatures (cross-sectional, downslope, flowline, general, local, local downslope, local upslope, maximal, minimal, plan, profile, tangential, total, and upslope). Climatic covariates included aridity, potential evapotranspiration, solar radiation, and several bioclimatic variables based on precipitation and temperature. Soil-related covariates included USDA soil taxonomy, soil water content, clay content at six standard depths (0, 10, 30, 60, 100, and 200 cm), sand content at the same six depths, and eleven soil temperature-based bioclimatic variables at 0–5 cm and 5–15 cm soil depths. Land use and crop-related covariates included MODIS land use land cover classifications and the Cropland Data Layer (CDL). Vegetation covariates included NDVI and EVI computed at several percentiles on an annual scale (minimum, 25th percentile, median, 75th percentile, 95th percentile, and maximum). Covariates are described in Supplement A. Feature selection was executed in three phases. First, a correlation analysis identified relationships among covariates, enabling the exclusion of highly correlated variables to reduce redundancy. Second, the covariate pool was condensed according to feature-importance scores from preliminary model results, and model complexity was minimized by retaining only the most important features. Third, subject-matter experts and pertinent literature were consulted to evaluate the applicability of the remaining covariates, ensuring that their inclusion was contextually justified.
After these steps, the initial 97 covariates were narrowed to 38 covariates (Supplement Table S1).
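The first phase, correlation-based pruning, can be illustrated with a greedy filter. The 0.9 threshold, the covariate names, and the values are assumptions for illustration; the correlation cut-off used in the study is not stated here.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def drop_correlated(columns, threshold=0.9):
    """Keep a covariate only if |r| with every kept covariate < threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept

# Hypothetical covariate table (one list of values per covariate)
covariates = {
    "elevation": [120, 340, 560, 800, 1500],
    "mean_temp": [19.8, 18.9, 17.2, 16.1, 11.0],  # tracks elevation closely
    "clay": [32, 12, 41, 18, 27],
}
selected = drop_correlated(covariates)
print(selected)
```

Because the filter is order-dependent, the covariate listed first in a correlated pair is the one retained; ranking covariates by importance or expert preference before filtering is one way to control that choice.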
SOC ML model
A boosting-tree-based regression model (CatBoost) (https://catboost.ai/en/docs/), configured to limit overfitting and preserve generalizability, was used for predicting SOC at 1 km resolution. The temporally adjusted dataset was split into training, validation, and test subsets: 80% of the data was randomly selected for training, and the remaining 20% was randomly divided equally between validation and testing. The model was trained on the training dataset and concurrently evaluated on the held-out validation data. This concurrent assessment during training facilitated early detection of overfitting, thus preserving generalization capabilities.
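The 80/10/10 partition can be sketched with the standard library alone. The function name and seed are hypothetical; in practice the returned index lists would be used to slice the covariate table and SOC values before fitting the model.

```python
import random

def split_80_10_10(n_samples, seed=42):
    """Shuffle indices and split them into train, validation, and test."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = n_samples * 8 // 10          # 80% for training
    n_val = (n_samples - n_train) // 2     # half of the remainder
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_80_10_10(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the validation subset separate from the test subset means the data used to decide when to stop training never contributes to the final performance estimate.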
Upon completion of the training process, the model’s performance was evaluated on the test dataset, a subset of data not previously seen by the model. This served as a reliable test of the model’s predictive skill. Statistical metrics, including Mean Absolute Error (MAE), the coefficient of determination (R2 score), and the Root Mean Square Error (RMSE), were computed to quantify the model’s performance. This multi-layered approach, involving the distinct usage of training, validation, and testing datasets, facilitated extracting and applying key features that could be generalized92.
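For reference, the three metrics follow their standard definitions, sketched here with hypothetical observed and predicted values.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": sqrt(ss_res / n),
        "R2": 1 - ss_res / ss_tot,
    }

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE and RMSE are in the units of the target variable, with RMSE penalizing large errors more heavily, while R2 is unitless and expresses the share of variance in the observations explained by the predictions.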
Assessment of model features
SHAP (SHapley Additive exPlanations) analysis was used to enhance the model’s interpretability using the Python SHAP package (https://shap.readthedocs.io/en/latest/). The beeswarm plot (included in the supplement) provides insight into how each feature contributes to the model’s predictions, transforming the model from a black box into a somewhat explainable framework93,94. SHAP decomposes the model output into the sum of the effects of each feature being introduced into a conditional expectation. The SHAP value for a feature represents its average marginal contribution across all possible combinations of the other features95.
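The idea of an average marginal contribution across all feature combinations can be made concrete with an exact Shapley computation on a toy value function. The feature names and payoffs are hypothetical, and this brute-force enumeration only illustrates the definition; it is not how the SHAP package evaluates tree models.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, features):
    """Exact Shapley values for a coalition value function."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def toy_value(coalition):
    """Hypothetical model output when only `coalition` features are known."""
    v = 0.0
    if "clay" in coalition:
        v += 2.0
    if "precip" in coalition:
        v += 1.0
    if {"clay", "precip"} <= coalition:
        v += 1.0  # interaction shared between the two features
    return v

features = ["clay", "precip", "elevation"]
phi = shapley_values(toy_value, features)
print(phi)
```

The interaction term is split evenly between the two features that produce it, the irrelevant feature receives zero, and the values sum to the difference between the full-model output and the empty-coalition baseline.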
Modeling uncertainty
Modeling uncertainties were investigated by running the model 10 times, with each run initiated using a distinct random seed and trained on a different split of the data, ensuring each model was exposed to a unique portion of the data. This approach allowed the estimation of errors due to the data’s inherent variability. The standard deviation across the predictions from this suite of model runs was used to quantify the uncertainty in SOC stock predictions. Similar methods have been used by other studies96,97,98.
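The per-location uncertainty summary can be sketched as below. Three hypothetical runs over two locations are used for brevity instead of ten, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which form was applied.

```python
from statistics import mean, stdev

def ensemble_summary(runs):
    """Per-location mean and sample standard deviation across model runs.

    `runs` is a list of equally long prediction lists, one per run.
    """
    per_location = list(zip(*runs))
    means = [mean(p) for p in per_location]
    spreads = [stdev(p) for p in per_location]
    return means, spreads

# Hypothetical SOC predictions from three differently seeded runs
runs = [
    [10.0, 4.0],
    [12.0, 4.0],
    [14.0, 4.0],
]
means, spreads = ensemble_summary(runs)
print(means, spreads)
```

A location where the runs disagree receives a large spread, while one where every run returns the same value receives zero, so the resulting map highlights where predictions are most sensitive to the training split.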
Data visualization
The plots in Figs. 1, 3 and 5 were initially created using Plotly Express (v5.13.1) and Plotly Graph Objects (v5.13.1) for Python (see the plotly.express.scatter_geo documentation). Scatter plots were generated using the scatter_geo function to represent SOC data across the USA. Custom color scales were designed to highlight specific patterns in the data, whether by year, SOC value, or slope magnitude. After generating the plots, additional modifications were made using Adobe Illustrator (v28.6) (https://www.adobe.com/products/illustrator.html) to bring the visuals to publication quality, ensuring clarity and consistency across the figures. The map in Fig. 6 was generated by importing the SOC raster data into ArcGIS Pro, where it was visualized with customized layouts and symbology suited to the dataset. The layout was designed within the software, and the final image was exported and then refined using Adobe Illustrator (v28.6) for visual consistency with the other figures.
Data availability
Datasets generated and/or analyzed during the current study are available at the following websites: https://www.isric.org/, https://www.nrcs.usda.gov/resources/data-and-reports/ncss-soil-characterization-data-lab-data-mart, and https://www.nrcs.usda.gov/resources/data-and-reports/rapid-carbon-assessment-raca, as well as the WorldClim historical climate data and the Google Earth Engine Data Catalog. Some of the data projections generated by the current study are available from the corresponding authors on reasonable request.
Change history
04 July 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41598-025-07846-1
References
Lehmann, J., Bossio, D. A., Kögel-Knabner, I. & Rillig, M. C. The concept and future prospects of soil health. Nat. Reviews Earth Environ., 1–10 (2020).
Manns, H. R. & Berg, A. A. Importance of soil organic carbon on surface soil water content variability among agricultural fields. J. Hydrol. 516, 297–303 (2014).
Bossio, D. et al. The role of soil carbon in natural climate solutions. Nat. Sustain. 3, 391–398 (2020).
Lal, R. Soil carbon sequestration to mitigate climate change. Geoderma 123, 1–22 (2004).
IPCC. (Cambridge University Press, USA, 2013).
Schneider, U. A., McCarl, B. A. & Schmid, E. Agricultural sector analysis on greenhouse gas mitigation in US agriculture and forestry. Agric. Syst. 94, 128–140. https://doi.org/10.1016/j.agsy.2006.08.001 (2007).
Smith, P. et al. Greenhouse gas mitigation in agriculture. Philosophical Trans. Royal Soc. B: Biol. Sci. 363, 789–813 (2008).
Conant, R. T., Ogle, S. M., Paul, E. A. & Paustian, K. Measuring and monitoring soil organic carbon stocks in agricultural lands for climate mitigation. Front. Ecol. Environ. 9, 169–173 (2011).
Badgley, G. et al. Systematic over-crediting in California’s forest carbon offsets program. Glob Chang. Biol. 28, 1433–1445 (2022).
Oldfield, E. E. et al. Crediting agricultural soil carbon sequestration. Science 375, 1222–1225 (2022).
Chenu, C. et al. Increasing organic stocks in agricultural soils: knowledge gaps and potential innovations. Soil Tillage. Res. 188, 41–52 (2019).
Lehmann, J. & Kleber, M. The contentious nature of soil organic matter. Nature 528, 60–68. https://doi.org/10.1038/nature16069 (2015).
Luo, Z., Viscarra Rossel, R. A. & Shi, Z. Distinct controls over the temporal dynamics of soil carbon fractions after land use change. Glob Chang. Biol. 26, 4614–4625 (2020).
Viscarra Rossel, R. A., Webster, R., Bui, E. N. & Baldock, J. A. Baseline map of organic carbon in Australian soil to support national carbon accounting and monitoring under climate change. Glob Chang. Biol. 20, 2953–2970 (2014).
Smith, P. et al. How to measure, report and verify soil carbon change to realize the potential of soil carbon sequestration for atmospheric greenhouse gas removal. Glob Chang. Biol. 26, 219–241. https://doi.org/10.1111/gcb.14815 (2020).
Morais, T. G., Teixeira, R. F. & Domingos, T. Detailed global modelling of soil organic carbon in cropland, grassland and forest soils. PLoS One. 14, e0222604 (2019).
Smith, J. et al. Projected changes in the organic carbon stocks of cropland mineral soils of European Russia and the Ukraine, 1990–2070. Glob Chang. Biol. 13, 342–356 (2007).
Smith, P. et al. A comparison of the performance of nine soil organic matter models using datasets from seven long-term experiments. Geoderma 81, 153–225 (1997).
Huang, H. et al. A review on digital mapping of soil carbon in cropland: progress, challenge, and prospect. Environ. Res. Lett. 17, 123004 (2022).
Wadoux, A. M. C., Padarian, J. & Minasny, B. Multi-source data integration for soil mapping using deep learning. Soil 5, 107–119 (2019).
Kumar, S., Lal, R. & Liu, D. A geographically weighted regression kriging approach for mapping soil organic carbon stock. Geoderma 189, 627–634 (2012).
Yates, K. L. et al. Outstanding challenges in the transferability of ecological models. Trends Ecol. Evol. 33, 790–802 (2018).
Wang, Z. et al. Upscaling Soil Organic Carbon measurements at the Continental Scale Using Multivariate Clustering Analysis and Machine Learning. J. Geophys. Research: Biogeosciences. 129, e2023JG007702. https://doi.org/10.1029/2023JG007702 (2024).
Xia, Y., McSweeney, K. & Wander, M. M. Digital Mapping of Agricultural Soil Organic Carbon using soil forming factors: a review of current efforts at the Regional and National scales. Front. Soil. Sci. 2, 890437 (2022).
De Rosa, D. et al. Soil organic carbon stocks in European croplands and grasslands: how much have we lost in the past decade? Glob Chang. Biol. 30, e16992. https://doi.org/10.1111/gcb.16992 (2024).
Heikkinen, J., Keskinen, R., Kostensalo, J. & Nuutinen, V. Climate change induces carbon loss of arable mineral soils in boreal conditions. Glob Chang. Biol. 28, 3960–3973 (2022).
Le Noë, J. et al. Soil organic carbon models need independent time-series validation for reliable prediction. Commun. Earth Environ. 4, 158 (2023).
Menichetti, L., Ågren, G. I., Barré, P., Moyano, F. & Kätterer, T. Generic parameters of first-order kinetics accurately describe soil organic matter decay in bare fallow soils over a wide edaphic and climatic range. Sci. Rep. 9, 20319 (2019).
Grunwald, S. Artificial intelligence and soil carbon modeling demystified: power, potentials, and perils. Carbon Footprints. 1, 5 (2022).
Heuvelink, G. B. et al. Machine learning in space and time for modelling soil organic carbon change. Eur. J. Soil. Sci. 72, 1607–1623 (2021).
Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363 (2018).
Dorogush, A. V. et al. Fighting biases with dynamic boosting. arXiv preprint arXiv:1706.09516 (2017).
Hancock, J. T. & Khoshgoftaar, T. M. CatBoost for big data: an interdisciplinary review. J. big data. 7, 94 (2020).
Wiesmeier, M. et al. Soil organic carbon storage as a key function of soils - a review of drivers and indicators at various scales. Geoderma 333, 149–162. https://doi.org/10.1016/j.geoderma.2018.07.026 (2019).
Georgiou, K., Abramoff, R. Z., Harte, J., Riley, W. J. & Torn, M. S. Microbial community-level regulation explains soil carbon responses to long-term litter manipulations. Nat. Commun. 8, 1223 (2017).
Keyvanshokouhi, S. et al. Effects of soil process formalisms and forcing factors on simulated organic carbon depth-distributions in soils. Sci. Total Environ. 652, 523–537 (2019).
Riggers, C. et al. Multi-model ensemble improved the prediction of trends in soil organic carbon stocks in German croplands. Geoderma 345, 17–30 (2019).
Biney, J. K. M. et al. Prediction of topsoil organic carbon content with Sentinel-2 imagery and spectroscopic measurements under different conditions using an ensemble model approach with multiple pre-treatment combinations. Soil Tillage. Res. 220, 105379 (2022).
Shen, Z. et al. Deep transfer learning of global spectra for local soil carbon monitoring. ISPRS J. Photogrammetry Remote Sens. 188, 190–200. https://doi.org/10.1016/j.isprsjprs.2022.04.009 (2022).
Gautam, S. et al. Continental United States may lose 1.8 petagrams of soil organic carbon under climate change by 2100. Glob. Ecol. Biogeogr. 31, 1147–1160. https://doi.org/10.1111/geb.13489 (2022).
Mishra, U., Gautam, S., Riley, W. J. & Hoffman, F. M. Ensemble Machine Learning Approach improves predicted spatial variation of Surface Soil Organic Carbon stocks in Data-Limited Northern Circumpolar Region. Front. Big Data. 3 https://doi.org/10.3389/fdata.2020.528441 (2020).
Poggio, L. et al. SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty. SOIL 7, 217–240. https://doi.org/10.5194/soil-7-217-2021 (2021).
Zhang, L. et al. A CNN-LSTM Model for Soil Organic Carbon Content Prediction with Long Time Series of MODIS-Based phenological variables. Remote Sens. 14, 4441 (2022).
Georgiou, K. et al. Divergent controls of soil organic carbon between observations and process-based models. Biogeochemistry 156, 5–17. https://doi.org/10.1007/s10533-021-00819-2 (2021).
Román Dobarco, M. et al. Mapping soil organic carbon fractions for Australia, their stocks, and uncertainty. Biogeosciences 20, 1559–1586. https://doi.org/10.5194/bg-20-1559-2023 (2023).
Duarte-Guardia, S. et al. Better estimates of soil carbon from geographical data: a revised global approach. Mitig. Adapt. Strat. Glob. Change. 24, 355–372 (2019).
Tan, Z., Lal, R., Smeck, N. & Calhoun, F. Relationships between surface soil organic carbon pool and site variables. Geoderma 121, 187–195 (2004).
Goidts, E., Wesemael, B. V. & Van Oost, K. Driving forces of soil organic carbon evolution at the landscape and regional scale using data from a stratified soil monitoring. Glob Chang. Biol. 15, 2981–3000 (2009).
Powers, J. S. & Veldkamp, E. Regional variation in soil carbon and δ 13 C in forests and pastures of northeastern Costa Rica. Biogeochemistry 72, 315–336 (2005).
Seibert, J., Stendahl, J. & Sørensen, R. Topographical influences on soil properties in boreal forests. Geoderma 141, 139–148 (2007).
Hobley, E., Wilson, B., Wilkie, A., Gray, J. & Koen, T. Drivers of soil organic carbon storage and vertical distribution in Eastern Australia. Plant. Soil. 390, 111–127 (2015).
Michéli, E., Owens, P. R., Láng, V., Fuchs, M. & Hempel, J. Organic Carbon as a major differentiation criterion in soil classification systems. Soil. Carbon, 37–43 (2014).
Wang, M. et al. Global soil profiles indicate depth-dependent soil carbon losses under a warmer climate. Nat. Commun. 13, 5514 (2022).
Wang, M. et al. Responses of soil organic carbon to climate extremes under warming across global biomes. Nat. Clim. Change. 14, 98–105. https://doi.org/10.1038/s41558-023-01874-3 (2024).
Bliss, N. B., Waltman, S. W., West, L. T., Neale, A. & Mehaffey, M. Distribution of soil organic carbon in the conterminous United States. Soil. Carbon, 85–93 (2014).
Guo, Y., Gong, P., Amundson, R. & Yu, Q. Analysis of Factors Controlling Soil Carbon in the Conterminous United States. Soil Sci. Soc. Am. J. 70, 601–612. https://doi.org/10.2136/sssaj2005.0163 (2006).
Gonçalves, D. R. P., Mishra, U., Wills, S. & Gautam, S. Regional environmental controllers influence continental scale soil carbon stocks and future carbon dynamics. Sci. Rep. 11, 6474. https://doi.org/10.1038/s41598-021-85992-y (2021).
Wills, S. et al. in Soil Carbon (eds Alfred E. Hartemink & Kevin McSweeney) 95–104 (Springer International Publishing, 2014).
O’Neill, B. C. et al. The scenario model intercomparison project (ScenarioMIP) for CMIP6. Geosci. Model Dev. 9, 3461–3482 (2016).
Wiesmeier, M., Barthold, F., Blank, B. & Kögel-Knabner, I. Digital mapping of soil organic matter stocks using Random Forest modeling in a semi-arid steppe ecosystem. Plant. soil. 340, 7–24 (2011).
Guo, L. B. & Gifford, R. M. Soil carbon stocks and land use change: a meta analysis. Glob Chang. Biol. 8, 345–360. https://doi.org/10.1046/j.1354-1013.2002.00486.x (2002).
Houghton, R. A. Interactions between Land-Use Change and Climate-Carbon Cycle Feedbacks. Curr. Clim. Change Rep. 4, 115–127. https://doi.org/10.1007/s40641-018-0099-9 (2018).
Lajtha, K., Bailey, V. L. & McFarlane, K. The Second State of the Carbon Cycle Report-Chap. 12. Soils (Lawrence Livermore National Lab. (LLNL), United States, 2018).
Sanderman, J., Hengl, T. & Fiske, G. J. Soil carbon debt of 12,000 years of human land use. Proc. Natl. Acad. Sci. U. S. A. 114, 9575–9580 (2017).
Van Oost, K. et al. The impact of agricultural soil erosion on the global carbon cycle. Science 318, 626–629 (2007).
Bond-Lamberty, B. & Thomson, A. Temperature-associated increases in the global soil respiration record. Nature 464, 579–582 (2010).
Crowther, T. W. et al. Quantifying global soil carbon losses in response to warming. Nature 540, 104–108. https://doi.org/10.1038/nature20150 (2016).
Prăvălie, R. Exploring the multiple land degradation pathways across the planet. Earth Sci. Rev. 220, 103689 (2021).
Sulman, B. N. et al. Land Use and Land Cover affect the depth distribution of Soil Carbon: insights from a large database of soil profiles. Front. Environ. Sci. 8 https://doi.org/10.3389/fenvs.2020.00146 (2020).
Domke, G. M., Oswalt, S. N., Walters, B. F. & Morin, R. S. Tree planting has the potential to increase carbon sequestration capacity of forests in the United States. Proceedings of the national academy of sciences 117, 24649–24651 (2020).
Liu, J. et al. Critical land change information enhances the understanding of carbon balance in the United States. Glob Chang. Biol. 26, 3920–3929 (2020).
Sleeter, B. M. et al. Effects of contemporary land-use and land-cover change on the carbon balance of terrestrial ecosystems in the United States. Environ. Res. Lett. 13, 045006. https://doi.org/10.1088/1748-9326/aab540 (2018).
Domke, G. et al. Toward inventory-based estimates of soil organic carbon in forests of the United States. Ecol. Appl. 27, 1223–1235 (2017).
Sperry, J. S. et al. The impact of rising CO2 and acclimation on the response of US forests to global warming. Proceedings of the National Academy of Sciences 116, 25734–25744 (2019).
US-EPA. Inventory of US Greenhouse Gas Emissions and Sinks: 1990–2021. (U.S. Environmental Protection Agency, Washington, DC, 2021). https://www.epa.gov/ghgemissions/inventory-us-greenhouse-gas-emissions-and-sinks.
Magerl, A., Le Noë, J., Erb, K. H., Bhan, M. & Gingrich, S. A comprehensive data-based assessment of forest ecosystem carbon stocks in the US 1907–2012. Environ. Res. Lett. 14, 125015 (2019).
Goidts, E., Van Wesemael, B. & Crucifix, M. Magnitude and sources of uncertainties in soil organic carbon (SOC) stock assessments at various scales. Eur. J. Soil. Sci. 60, 723–739 (2009).
Duarte-Guardia, S. et al. Biophysical and socioeconomic factors influencing soil carbon stocks: a global assessment. Mitig. Adapt. Strat. Glob. Change. 25, 1129–1148 (2020).
Lessmann, M., Ros, G. H., Young, M. D. & de Vries, W. Global variation in soil carbon sequestration potential through improved cropland management. Glob Chang. Biol. 28, 1162–1177 (2022).
Post, W. M. & Kwon, K. C. Soil carbon sequestration and land-use change: processes and potential. Glob Chang. Biol. 6, 317–327. https://doi.org/10.1046/j.1365-2486.2000.00308.x (2000).
Smith, P. How long before a change in soil organic carbon can be detected? Glob Chang. Biol. 10, 1878–1883. https://doi.org/10.1111/j.1365-2486.2004.00854.x (2004).
Wadoux, A. M. C., Minasny, B. & McBratney, A. B. Machine learning for digital soil mapping: applications, challenges and suggested solutions. Earth Sci. Rev. 210, 103359 (2020).
Grunwald, S. Artificial intelligence and soil carbon modeling demystified: power, potentials, and perils. Carbon Footprints. 1, 6 (2022).
Wang, J., Bretz, M., Dewan, M. A. A. & Delavar, M. A. Machine learning in modelling land-use and land cover-change (LULCC): current status, challenges and prospects. Sci. Total Environ. 822, 153559 (2022).
Bigelow, D. & Borchers, A. Major uses of land in the United States, 2012. (2017).
Spawn, S. A., Lark, T. J. & Gibbs, H. K. Carbon emissions from cropland expansion in the United States. Environ. Res. Lett. 14, 045009 (2019).
Batjes, N., Ribeiro, E., van Oostrum, A., Leenaars, J. & Jesus de Mendes, J. Standardised soil profile data for the world (WoSIS, July 2016 snapshot). http:\dx.doi.org10 (2016).
Reinsch, T. & West, L. The US national cooperative soil characterization database. (2010).
Blois, J. L., Williams, J. W., Fitzpatrick, M. C., Jackson, S. T. & Ferrier, S. Space can substitute for time in predicting climate-change effects on biodiversity. Proceedings of the national academy of sciences 110, 9374–9379 (2013).
Pickett, S. T. In Long-term Studies in Ecology: Approaches and Alternatives 110–135 (Springer, 1989).
Huber, P. J. Robust regression: asymptotics, conjectures and Monte Carlo. Annals Stat., 799–821 (1973).
Xu, Y. & Goodacre, R. On splitting training and validation set: a comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning. J. Anal. Test. 2, 249–262 (2018).
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30 (2017).
Štrumbelj, E. & Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 41, 647–665 (2014).
Shapley, L. S. 307–317 (Princeton University Press, Princeton, NJ, 1953).
Dutschmann, T. M., Kinzel, L., Ter Laak, A. & Baumann, K. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation. J. Cheminform. 15, 49 (2023).
Wang, Z. et al. Upscaling soil organic carbon measurements at the continental scale using multivariate clustering analysis and machine learning. J. Geophys. Research: Biogeosciences. 129, e2023JG007702 (2024).
Lakshminarayanan, B., Pritzel, A. & Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. Adv. Neural Inf. Process. Syst. 30 (2017).
Hengl, T. et al. SoilGrids250m: global gridded soil information based on machine learning. PLoS One. 12, e0169748. https://doi.org/10.1371/journal.pone.0169748 (2017).
Acknowledgements
This study was funded by SIF, PBC. ACS received funding from USDA-NIFA Hatch project (Project number: TEX0-1-9603). The funders played no role in the study design, data collection, analysis, and interpretation of data, or the writing of this manuscript.
Author information
Contributions
LB: Data collection, curation and analysis, methodology, original draft. ACS: Conceptualization, methodology, supervision, funding, original draft preparation. SK: Conceptualization, methodology, supervision, funding, editing. RT: Data collection, editing. JR: Methodology, editing. RS: Methodology, editing. RP: Analysis, data collection, methodology.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this Article was revised: The original version of this Article contained an error in the spelling of the author Rocky Talchabhadel, which was incorrectly given as Rocky Talchabadel.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bokati, L., Somenahally, A., Kumar, S. et al. Temporal adjustment approach for high-resolution continental scale modeling of soil organic carbon. Sci Rep 15, 6483 (2025). https://doi.org/10.1038/s41598-025-89503-1
DOI: https://doi.org/10.1038/s41598-025-89503-1