Temporal adjustment approach for high-resolution continental scale modeling of soil organic carbon

Bokati, Laxman; Somenahally, Anil; Kumar, Saurav; Robatjazi, Javad; Talchabhadel, Rocky; Sarkar, Reshmi; Perepi, Rahul

doi:10.1038/s41598-025-89503-1


Article
Open access
Published: 22 February 2025


Scientific Reports volume 15, Article number: 6483 (2025)


A Correction to this article was published on 04 July 2025

This article has been updated

Abstract

Open-source legacy data available for training soil organic carbon (SOC) models are limited and not uniformly distributed in space or time. While some process-based models predict SOC changes, most large-scale data-driven SOC modeling efforts overlook temporal shifts. Accounting for the expected temporal drift allows us to increase the accuracy of the datasets available for machine learning models. Here we present an approach for creating proximity-based distance matrices using the legacy data available in the contiguous US (CONUS) and generating spatially resolved temporal shift projections that adjust observations to the target date. The approach was evaluated by comparing SOC observations projected to two reference years (SOC₁₉₈₀ and SOC₂₀₂₀) and without temporal adjustment (SOC_no−adj). Projected SOC stocks differed significantly between SOC_no−adj and SOC₂₀₂₀. Baseline estimates of SOC stocks (top 1 m) were higher for SOC_no−adj than for SOC₂₀₂₀ in CONUS croplands (14.49 vs. 13.29 Pg C) and pasture lands (15.49 vs. 14.22 Pg C), but lower in forest lands (39.52 vs. 40.83 Pg C). The study results confirmed the validity of our methodology and its capability to enhance SOC stock projections with temporal adjustments. Potential users of this study’s outcomes include many stakeholders involved in carbon incentive programs, including farmers, scientists, policy makers, and industry partners.


Introduction

Soil organic carbon (SOC) is an important ecosystem commodity, providing many critical ecosystem services, and is a crucial component of soil health. Increasing SOC stocks is directly linked to improving soil fertility, water and nutrient holding capacity, and soil biodiversity^1,2 and is a promising climate mitigation strategy^3,4. Global agricultural lands can be leveraged through appropriate climate-smart agricultural practices to increase SOC sequestration, estimated to be between 0.35 and 1.18 Gt of CO₂ per year^5,6,7. Soil carbon monitoring in agricultural lands is anticipated to expand to support carbon credit markets designed to leverage additional land resources. Soil carbon monitoring programs require accurate quantification of SOC baseline stocks, defined as the expected average soil carbon at a given location^3,8, which can also aid in generating high-quality carbon credits^9,10. Given the complexity of SOC dynamics and nonlinear responses across spatial and temporal scales^11,12, it is essential to represent the baseline SOC stocks dynamically to reflect current levels¹³. Modeling at higher spatial resolution is also essential to accurately account for changes driven by different land use practices¹⁴. Estimating the current baseline stocks would ideally involve collecting soil samples for analysis at individual farms, as this would yield the most precise results¹⁵. However, this approach is expensive and challenging to implement on a large spatial scale, such as the Conterminous United States (CONUS).

A variety of techniques have been employed to model and map SOC baseline content, including process-based models^16,17,18 and data-driven machine learning (ML) models. ML models are particularly effective at larger spatial scales as they can be trained using environmental covariates such as climate, topography, vegetation, parent material, and other factors at various spatial scales^19,20. Additionally, geostatistical approaches like regression kriging have accounted for spatial autocorrelation alongside environmental correlations²¹. Despite these advancements, the predictive performance of these techniques is constrained by limited sampling density and extent when compared to the inherent spatial heterogeneity²². Crucially, the temporal dimension has been largely overlooked when using extensive legacy SOC data collected over multi-decadal timespans in ML models^23,24. Most models aggregated multi-year data, treating it as a single moment in time, which introduces systematic errors by assuming that soil carbon levels remain constant over time. This assumption contradicts compelling evidence of the impact of climate change, land use dynamics, and conservation practices on soil carbon storage and fluxes^25,26. Reliable models for predicting temporally resolved SOC baselines at scale are unavailable^27,28.

Numerous ML techniques and algorithms show promise for large-scale modeling by leveraging legacy data and remotely sensed environmental information^29,30. Gradient-boosted decision tree frameworks such as CatBoost^31,32 are particularly interesting for their capability to efficiently construct and organize ensemble decision trees using categorical covariate data, which has proven highly valuable in modeling diverse datasets³³. Additionally, they enable the inclusion of covariates that influence SOC stocks at various kinetic orders²⁸ and spatial scales³⁴, making CatBoost models highly adaptable and precise for a wide range of environmental conditions. Furthermore, space-for-time substitution, in which a spatial gradient is used to account for temporal variation, could be integrated to fill temporal gaps and has been employed to enhance SOC predictions on explicit datasets^35,36. Thus, applying gradient-boosted ML modeling to establish dynamic baselines and extend them to broader regions has merits; however, further advancements are required to account for the spatial and temporal drivers influencing SOC stocks²⁹. In particular, improvements should enable efficient hyper-parameter learning that accounts for spatial heterogeneity within a small region (e.g., a working land system or a watershed scale) and facilitate the effective use of space-for-time substitution to project temporally resolved SOC data within a specified area.

In this study, we have proposed and evaluated a two-stage methodology for time-adjusted mapping of SOC across the CONUS by leveraging an extensive compilation of legacy soil profile data that lacks repeated measurements. The primary objective was to integrate temporal and spatial analysis to create an approach that leverages these long-term records for mapping contemporary carbon distribution on a continental scale.

Results

Evaluation of the time adjustment method

A robust dataset for modeling SOC in CONUS with over 21,000 data points was created after pre-processing, as described in the Methodology section. This data had spatiotemporal gaps (Fig. 1), highlighting discrepancies in spatial distribution and temporal coverage. Utilizing our temporal adjustment method (Fig. 2), we adjusted the multi-decadal SOC observations to 2020 (Fig. 3). This dataset represents the time-adjusted baseline data product, which can be used with bio-climatic and geomorphological variables to generate a spatially explicit SOC dataset for model training. The distribution after temporal adjustment to 2020 shifts noticeably to the left compared to non-time-adjusted data (Fig. 4), signifying a decrease in overall average SOC values. This shift towards lower ranges aligns with the expectation of SOC loss over time in the CONUS. However, we observed both positive and negative projections for temporal shifts (slopes) in SOC changes (Fig. 5). This variability underscores the spatial sensitivity of the approach to factors such as land use and other input variables.

Fig. 1

Soil profiles used in the study color-coded by sampling year showcasing spatial coverage and temporal variation. The bar plot below serves as a color key and also indicates the number of samples from each year.


Model performance

Three models were developed to estimate: (a) SOC without time adjustment (SOC_no−adj), (b) SOC adjusted to reflect likely 2020 conditions (SOC₂₀₂₀), and (c) SOC adjusted to reflect likely 1980 conditions (SOC₁₉₈₀). For SOC₁₉₈₀, all data was adjusted to 1980 using a temporal adjustment process similar to that described for 2020. The 2020 adjusted model evaluated on the independent testing dataset exhibited an MAE of 3.38 and an RMSE of 4.49 (R² = 0.41); MAE and RMSE are in kg m⁻² (Table 1). Given the complexities associated with SOC prediction, the decrease in performance from training to testing is expected and in line with other studies. With SOC₂₀₂₀ adjustment, the expected SOC stock ranged between 0 and 32.84 kg m⁻² (Fig. 6), whereas the range was between 2.24 and 21.25 kg m⁻² for SOC_no−adj projections (Figure S1). This broader range with temporal adjustments is expected as the process increases and decreases the expected SOC based on observed regional slopes.
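The three reported metrics can be reproduced from paired observed and predicted values. The following is a minimal sketch rather than the authors' evaluation code: the function name is hypothetical, and R² is assumed here to be the coefficient of determination (1 − SS_res/SS_tot), which the text does not specify.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE and R² for paired observed/predicted SOC values.

    Assumption: R² is the coefficient of determination, not the squared
    correlation coefficient.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # R² is undefined when all observations are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}
```

Because MAE and RMSE carry the units of the target, they are directly comparable to the kg m⁻² ranges quoted above, whereas R² is unitless.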

Table 1 Statistical metrics for various models.


Feature significance

The summary plot for SOC₂₀₂₀ (Fig. 7) revealed that the top five predictors for SOC stocks in 2020, ranked by importance, were aridity index, soil taxonomy, solar radiation, clay content (0–5 cm), and second layer clay content (5–10 cm). Higher values of the percentage of clay in surface soil and the aridity index corresponded to higher SOC values at most places. In comparison, lower values of solar radiation, evapotranspiration, upslope curvature, and temperature of the warmest month corresponded to higher SOC values.

Interestingly, summary plots for SOC_no−adj (Figure S2) and SOC₁₉₈₀ (Figure S3) had different feature importance when compared to SOC₂₀₂₀. The top five features for SOC₁₉₈₀, ranked by importance, were surface clay content (0–5 cm), soil taxonomy, second layer clay content (5–10 cm), aridity index, and elevation. Surface clay content was the most important feature in both SOC_no−adj and SOC₁₉₈₀ but moved down in the feature importance table in SOC₂₀₂₀. Solar radiation moved up to the top in SOC₂₀₂₀ projections, and lower solar radiation was correlated to higher SOC values.

SOC stocks

Using the model projections (SOC₁₉₈₀, SOC₂₀₂₀, and SOC_no−adj), we estimated the total SOC stock in the CONUS for cropland, grassland, and forest land. Comparing SOC₂₀₂₀ with SOC_no−adj revealed that just accounting for time reduced the carbon in croplands (top 1 m) from 14.29 Pg (SOC_no−adj) to 13.29 Pg (SOC₂₀₂₀) (Fig. 8). A similar trend was noted for pasture lands, where SOC stock in the top 1 m soil profile decreased from 15.49 Pg (SOC_no−adj) to 14.22 Pg (SOC₂₀₂₀). A contrasting trend was observed for forest lands, where the SOC stock within the top 1 m soil profile increased from 39.52 Pg (SOC_no−adj) to 40.83 Pg (SOC₂₀₂₀). These results are summarized in a supplementary table (Table S2).
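Totals in Pg follow from the mapped densities in kg m⁻² by multiplying each grid cell by its area and summing within a land use class. The sketch below illustrates that conversion under stated assumptions: equal-area grid cells, a hypothetical function name and class labels, and no claim about the grid or land cover product actually used. The only fixed fact is the unit conversion, 1 Pg = 10¹² kg.

```python
KG_PER_PG = 1e12  # 1 Pg = 1e15 g = 1e12 kg


def total_stock_pg(densities_kg_m2, cell_area_m2, land_use, target_class):
    """Sum SOC density x cell area over cells of one land use class (Pg C).

    Assumes every grid cell has the same area; cells with missing
    density (None) are skipped.
    """
    total_kg = 0.0
    for density, cls in zip(densities_kg_m2, land_use):
        if cls == target_class and density is not None:
            total_kg += density * cell_area_m2
    return total_kg / KG_PER_PG
```

For example, with an illustrative 250 m × 250 m cell (62,500 m²), two cropland cells at 10 and 8 kg m⁻² contribute 1,125,000 kg, or 1.125 × 10⁻⁶ Pg C.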

Comparing SOC₁₉₈₀ and SOC₂₀₂₀ indicated that total SOC stocks within the top 1 m soil profile decreased in croplands from 14.49 Pg in 1980 to 13.29 Pg in 2020, representing a loss of approximately 8.35% (Fig. 9). A similar trend was noted for pasture lands, where SOC stocks in the top 1 m soil profile decreased from 16.26 Pg in 1980 to 14.22 Pg in 2020, representing a loss of approximately 12.54%. An opposite trend was noted for the forest lands, as SOC stocks in the top 1 m soil profile increased from 40.36 Pg in 1980 to 40.83 Pg in 2020, representing an increase of approximately 1.15%. Total SOC stocks, summed over all three major land use types, decreased from 71.12 Pg in 1980 to 68.33 Pg in 2020, reflecting a loss of approximately 3.92%. These results are summarized in a supplementary table (Table S3). Model outputs presented here are not adjusted for changes in the various land use areas in CONUS. The share of total SOC stock held in croplands decreased from 20.4% in 1980 to 19.4% in 2020. Similarly, the share held in pastures declined from 22.9 to 20.8%, whereas the share held in forests increased from 56.8 to 59.7% (Figure S4).
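The relative changes quoted above can be checked directly from the 1980 and 2020 stocks. The short sketch below uses only the rounded values printed in this section; the dictionary and function names are illustrative.

```python
def percent_change(start, end):
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100.0


# (1980 stock, 2020 stock) in Pg C, top 1 m, as reported in the text.
stocks = {
    "cropland": (14.49, 13.29),
    "pasture": (16.26, 14.22),
    "forest": (40.36, 40.83),
    "total": (71.12, 68.33),
}

changes = {name: percent_change(a, b) for name, (a, b) in stocks.items()}
for name, value in changes.items():
    print(f"{name}: {value:+.2f}%")
```

With these rounded inputs the changes come out at roughly −8.3%, −12.5%, +1.2%, and −3.9%. The small gaps from the reported percentages (for example 8.35% for croplands) presumably reflect rounding of the published stock values rather than a different calculation.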

Discussions

The temporal adjustment methodology presented in this study addressed the historical data gaps to account for temporal trends in SOC. The method combines statistical and machine learning techniques. Outcomes suggest that this approach is effective in efficiently parameterizing dynamic input variables and their governing processes impacting SOC changes across spatial scales. For instance, we found that soil textural classes at local scales and the aridity index at larger spatial scales were particularly effective input variables for capturing SOC temporal shifts within the specified area. Prior research has also emphasized the importance of combining modeling methods for SOC estimates at larger spatial scales³⁷. It particularly noted their effectiveness for more efficiently parameterizing the dynamic input variables (and processes) that drive carbon changes³⁸. Additionally, our method effectively identified spatiotemporal trends in SOC stocks, exhibiting divergent shifts under different land use types, as evidenced by the model outputs depicting trends in SOC shifts and CONUS maps illustrating slope variations. These findings further underscore the importance of temporally adjusting training data for modeling purposes. Our method will increase the availability of training data for models, which may enable the scientific community to develop improved adaptive learning models³⁹ with the capacity to integrate new data as it becomes available. However, our methodology must be further improved and validated using recent data to address inherent limitations, including process-related constraints, data gaps, and uncertainties within the SOC data itself. A priority for the new soil sampling regime should also be to validate the temporal trends in different regions. This idea has, to some extent, been adopted in Europe²⁷ for a limited study and should yield exciting results when it matures.

The SOC₂₀₂₀ model, trained on temporally adjusted data, exhibited performance consistent with previously reported models utilizing similar legacy datasets. Studies with various machine learning techniques to project SOC baselines at a higher resolution of 250 m² within the CONUS have reported model accuracy with R² values ranging between 0.28 and 0.7^23,40,41,42,43. Moreover, our model’s integration of temporal adjustment renders its performance metrics more reliable. However, we acknowledge that the correlations indicated by our model may be incomplete or partially inaccurate due to gaps in critical data. Other modeling approaches, including process-based models, also face similar limitations and would equally benefit from additional data inputs.

Significant features for SOC₂₀₂₀ predictions included aridity index, soil taxonomy, solar radiation, clay content (0–5 cm), and second layer clay content (5–10 cm), which were also significant in previous ML modeling studies^44,45,46. However, the top five covariates represented in SHAP plots were reshuffled among SOC₂₀₂₀, SOC₁₉₈₀, and SOC_no−adj, which further supports the improvement in model performance obtained by capturing the dynamic relationship between governing factors of SOC and model projections after temporal adjustments. The top five important features for the models interestingly represented both micro-spatial scales (e.g., top clay and soil type) and macro-scales (e.g., aridity index), which likely played a role in effectively categorizing the temporal slope effects between land use types. For example, micro-scale effects are more important in croplands^47,48, whereas climatic effects may be more significant for forestry lands^49,50. By using a detailed taxonomy, we were able to utilize soil profile characteristics through depth effectively. This appears crucial for capturing SOC depth attenuation features at smaller spatial scales, which may not be discernible through remotely sensed climate and vegetation data. The influence of soil type on the vertical distribution of SOC stocks has been demonstrated⁵¹. Additionally, SOC is used as a segregation criterion in soil classification systems⁵², further emphasizing the utility of understanding soil type changes with depth. While our model incorporates several SOC-governing factors, it may not fully capture the complexity of abrupt SOC shifts, such as rapid responses to extreme changes in conditions^53,54. For example, while the aridity index and slope are useful proxies, they may only partially represent the full impact of sudden changes, such as droughts or erosion, on SOC dynamics. Advanced datasets for SOC and environmental covariates are needed to enable more nuanced, real-time adjustments to SOC temporal trends.

The estimated stocks derived from our model projections fell within a comparable range to those reported in previous studies. We estimated the total SOC stocks for CONUS in the top 1 m soil profile at 71.12 Pg C in 1980 and 68.33 Pg C in 2020. One study estimated this stock at 57 Pg C⁵⁵, another at 62.2 Pg C⁵⁶, and a third at 75.208 Pg C while demonstrating a general declining trend of SOC stocks across different regions of CONUS⁵⁷. Our estimates are comparable to this latter report by Goncalves et al. (2021), which was mostly based on a more recent repeated sampling between 2010 and 2013 within CONUS⁵⁸. One advantage of our method is its longer-term scalability, as it expands modeling to diverse legacy datasets that lacked repeated sampling efforts. Our model estimates are higher than the mean SOC stock reported for the CMIP6 model scenario, at 67.5 Pg C⁵⁹, and another at 64.2 Pg C⁴⁰, confirming the overall SOC declining trends. While these models did incorporate temporal adjustments to model temporal shifts in SOC by leveraging the changing trajectory of climate variables, it is essential to note that the feedback between temporal SOC shifts and climate change remains highly uncertain⁶⁰. Furthermore, the negative impacts of land use change on SOC are relatively substantial compared to the temporal changes induced by climate variability^61,62.

The sensitivity of our model to land use changes was evident in the notably variable slopes observed within different land use types and across spatial heterogeneity associated with climate factors. The negative slopes observed in croplands but not in forest land were particularly notable, indicating a significant impact of land use changes on SOC levels. According to the model projections, the total SOC stocks in the top 1 m soil profile of croplands declined from 14.49 Pg C in 1980 to about 13.29 Pg C in 2020. Comparable estimates have been reported in other studies; for instance, one study estimated 13.4 Pg C in the top 1 m in areas classified as croplands or agricultural lands in CONUS⁵⁵, and another reported 13.0 Pg C⁶³. However, cropland SOC stocks are dynamic, as noted in our model outputs, which showed a decrease of about 8.35% from 1980 to 2020. This trend is consistent with numerous studies demonstrating historical SOC loss in croplands⁶⁴. The continual decline in SOC from croplands is well-documented and attributed to drivers such as erosion⁶⁵, some specific land use practices⁶⁴, and climate change effects^66,67. For instance, tilled croplands, which cover approximately 210 thousand km² of CONUS⁶⁸, have been noted to experience significant SOC loss from the topsoil compared to no-till lands⁶⁹. Another indication of our model’s sensitivity is the contrasting trend observed for forest lands, wherein our projections indicated an increase of about 1.15% from 1980 (40.36 Pg C) to approximately 40.83 Pg C in 2020 for the top 1 m soil profile. Our projected estimates may not consistently align with the values reported in some studies, which range from 20 Pg C⁷⁰ and 32.7 Pg C⁵⁶ to approximately 40 Pg C^71,72. Such a wide range suggests that estimates for the SOC in forest lands are highly uncertain⁷³.

A small positive change in forest land SOC stocks projected by our model can be attributed to increased net primary productivity of forest biomass, likely driven by rising atmospheric CO₂ levels and enhanced dry deposition of nutrients, enriching forest soils⁷⁴. This could result in higher belowground carbon input and increased residue conversion to SOC. According to a USDA report for the National Forest Inventory, there are indications of increases in mineral soil SOC stocks in US forest lands⁷⁵. Our model predicted little or no change for large forest areas, which aligns with another study reporting that SOC in forest lands has been stable since 1907⁷⁶. These findings from our study collectively demonstrate that our model exhibits increased sensitivity to land use-induced changes, which serve as the primary drivers of SOC losses and temporal shifts. The results of this study show that our methodology is a promising alternative that demonstrates the potential for improving existing machine learning methods in SOC modeling but remains dependent on further data availability and methodological improvements.

Limitations and future research

Soil data availability is spatiotemporally very limited. Within the CONUS, we observed large swaths of areas with no data or with limited data and no repeat sampling. Such a distribution makes trends in SOC hard to understand. This is not surprising, as the primary goal of soil testing is not to establish trends. However, given the focus on changes in SOC in response to changing climate, it is crucial to understand the trends in SOC. This underscores the need for additional data, as spatial and temporal data gaps remain a significant constraint for accurately projecting current stocks to validate the impacts of climate-smart farming practices and other uses. This imperative is further underscored by the substantial spatiotemporal data gaps prevalent in many regions globally, such as Africa (Figure S5). Although our approach represents a significant advancement in SOC modeling, it is crucial to critically assess it and identify where improvements are needed, such as enhanced data collection for validation, comprehensive uncertainty quantification, and adaptive temporal and spatial models to refine past-projection techniques.

The study’s methodology is built on key assumptions regarding temporal and spatial adjustments, particularly the choice of a 5-year period for averaging and a 10 km spatial similarity threshold for slope computation. These thresholds were chosen to balance data variability and model sensitivity; however, they assume a level of temporal and spatial homogeneity that may not be universally applicable^77,78,79. We recognize that spatial similarity could vary by location and that a fixed distance might not capture all aspects of spatial variability. The 5-year temporal adjustment was chosen based on prior evidence showing that observable change in SOC takes multiple years^80,81. However, this may not be adequate in regions with extreme climatic events or rapid land use changes^53,54. Similarly, the 10 km spatial threshold was selected to ensure data sufficiency but does not capture finer-scale spatial heterogeneity. Future research should explore adaptive temporal frameworks and spatially flexible models that adjust based on local environmental conditions, potentially incorporating high-frequency data and advanced spatial statistics like variogram analysis to better reflect the true temporal and spatial dynamics influencing SOC.

Data harmonization and imputation such as simulating bulk density and standardizing soil depth profiles may introduce fixed correlations that could potentially influence our machine learning models by embedding artificial relationships. While harmonization was essential to create a workable and consistent dataset given the scope and scale of our study, this process can introduce biases in data relationships. However, the risk of introducing bias is even higher by leaving out a substantial amount of available soil data if not harmonized in large-scale models like ours. For certain soil properties, such as bulk density, there is no viable alternative currently due to the lack of sufficient field observation data. Detailed information outlining the imputation process at every step and the methods employed is provided in supplementary materials.

The validation of SOC model predictions in this study has some limitations due to the absence of long-term, systematically collected data, especially in regions experiencing substantial land use changes. The variability of environmental covariates such as climate, soil properties, and land use also poses a challenge, given their significant spatial and temporal fluctuations that the current dataset may not fully capture^82,83. The reliance on limited historical land cover and land use (LCLU) data, which often lacks the necessary temporal resolution⁸⁴, further complicates precise SOC modeling. The absence of detailed records on land management practices, such as crop rotations and tillage methods, further restricts our ability to account for these critical drivers of SOC change.

An improvement to our model could involve utilizing temporally resolved land use maps, as temporal changes in land use management have the potential to alter the land coverage used for our model predictions. For instance, the total cropland area in CONUS decreased by about 18% between 1949 and 2012⁸⁵. However, a reverse trend was observed in some regions between 2008 and 2012, with approximately 30,000 km² converted to cropland from grassland, potentially resulting in SOC loss⁸⁶. By capturing these temporal changes, we could further enhance model accuracy. Additionally, repeated sampling efforts similar to those conducted in Europe²⁷ would significantly improve our models and provide valuable validation data, the lack of which is a major limitation for both process-based and ML modeling. Robust validation will also enhance the explainability of ML models and mitigate their black-box nature, making the results more meaningful for interpreting SOC governing processes and accurately representing them in ML modeling of SOC changes.

Quantifying uncertainty in SOC projections remains a complex challenge due to the inherent variability in environmental data and the interactions between multiple covariates. While multiple model runs were employed in this study to estimate uncertainty, this approach mainly addresses data variability and model structure uncertainty. Thus, the model’s ability to produce unbiased estimates for large-scale SOC quantification using environmental data remains untested. To provide a comparison, we developed a difference map between our SOC predictions and the ML-derived ISRIC SOC dataset for the US⁹⁹. This map, included as Supplementary Figure S6, highlights regional variations and areas where model estimates differ significantly. This comparison does not constitute a bias test. However, it provides valuable insight into potential discrepancies. A lower value than ISRIC is expected in our SOC predictions because our map accounts for temporal adjustment of data, reflects SOC loss trends, and excludes wetland areas from the analysis. These distinctions highlight differences in methodology and temporal focus between the two datasets. Future studies should develop comprehensive uncertainty quantification frameworks that account for parameter uncertainty, model structural uncertainty, and data uncertainty. Bayesian approaches and sensitivity analyses could provide deeper insights into the sources of uncertainty, allowing for more robust and reliable SOC predictions. However, without a mechanism to evaluate and correct bias, the model’s results should be interpreted cautiously, particularly in the context of applications requiring precise aggregate values, such as carbon trading schemes.

Methodology

Data accumulation and preparation

SOC data was compiled from various sources, including the International Soil Reference and Information Centre (ISRIC) database⁸⁷, the National Cooperative Soil Survey (NCSS) database⁸⁸, and the Rapid Carbon Assessment (RaCA) data⁵⁸. These sources contain soil profile (layers) records with SOC measurements at different depths collected across the CONUS over several decades (1910–2020). After removing the duplicate data, a standardized 1-meter SOC stock value (0–100 cm) was computed for all soil profile points from different sources. A depth-adjusted aggregation method was adopted to calculate the total stock value for 100 cm, which required computing bulk density for each soil profile layer. Bulk density values were estimated using a machine learning model for soil profile layers lacking bulk density. This model was trained on soil profiles with known bulk density values, utilizing SOC content, geographic coordinates, texture (sand and silt percentages), and soil horizon as predictors. SOC stock for each layer within the soil profiles was calculated by integrating the layer-specific SOC concentrations with the corresponding bulk density and thickness measurements. Only soil profiles extending to a depth of at least 1 m were considered, ensuring a comprehensive representation of the top meter. These calculations were confined to the upper 1 m of the soil profile to align with the standard depth for SOC stock assessment. In cases where layers extended beyond this depth, only the portion within the top 1 m was considered, applying a proportional linear adjustment based on the overlap with this depth interval. Additional details about the data, bulk density estimation model and depth-standardized aggregation method are provided in the supplementary document. Thus, for each soil profile, a depth-weighted average SOC content in the top 1 m (SOC_1m) was calculated based on the SOC measurements at recorded depth layers. 
The compiled dataset was filtered to remove outliers and physically implausible observations. The final aggregated set of SOC profiles encompasses multi-decadal measurements standardized to topsoil SOC_1m values, providing extensive coverage of SOC status across the study region (Fig. 1).
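The depth-standardized aggregation can be illustrated with a short sketch. It assumes each layer is described by its top and bottom depth (cm), SOC concentration (% by mass), and bulk density (g cm⁻³); the function name, the input layout, and the omission of a coarse-fragment correction are all assumptions, since the full procedure is given only in the supplementary document. A layer that crosses 100 cm contributes in proportion to its overlap with the 0–100 cm interval.

```python
def soc_stock_top_1m(layers, max_depth_cm=100.0):
    """Return SOC stock (kg m^-2) for the top 1 m, or None if too shallow.

    layers: iterable of (top_cm, bottom_cm, soc_percent, bulk_density_g_cm3).
    Per-layer stock = SOC fraction x bulk density x thickness; the factor
    10 converts (g cm^-3 x cm) to kg m^-2. Gaps between layers are not
    checked in this sketch.
    """
    layers = list(layers)
    if not layers or max(b for _, b, _, _ in layers) < max_depth_cm:
        return None  # profile does not extend to 1 m, so it is excluded
    total = 0.0
    for top, bottom, soc_pct, bulk_density in layers:
        # Thickness of the part of this layer lying within 0-100 cm.
        overlap = min(bottom, max_depth_cm) - max(top, 0.0)
        if overlap <= 0:
            continue
        total += (soc_pct / 100.0) * bulk_density * overlap * 10.0
    return total
```

For a two-layer profile of 0–30 cm (2% SOC, 1.3 g cm⁻³) and 30–120 cm (0.5% SOC, 1.5 g cm⁻³), only 70 cm of the second layer counts, giving 7.8 + 5.25 = 13.05 kg m⁻².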

Temporal adjustment of stock values

A time series modeling approach was developed to characterize the status of soil organic carbon (SOC) and account for temporal variability by applying space-for-time substitution^89,90. For each sampling site, a pair-wise distance matrix was constructed with other sites within a 10 km radius based on haversine distance. Sites within this threshold (10 km) were considered proximal and assumed to have more similar SOC temporal trajectories. The choice of a 10 km radius for establishing spatial similarity was intended to balance data density with model accuracy. While smaller distances could capture finer-scale spatial heterogeneity in SOC, reducing the radius significantly reduced the availability of data points for calculating temporal slopes. This reduction compromised the model’s capacity to generate reliable temporal adjustments, particularly given the inherent sparsity of SOC observations across extensive regions. Thus, we selected 10 km as a compromise to ensure sufficient observation density for reliable slope estimation while still preserving local spatial trends in SOC levels.
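The proximity step can be sketched as follows. The haversine formula itself is standard; the function names, the mean Earth radius constant, and the brute-force pairwise loop are illustrative choices, not the authors' implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def proximal_sites(sites, radius_km=10.0):
    """Map each site index to the indices of sites within radius_km.

    sites: list of (lat, lon) tuples in decimal degrees.
    """
    neighbours = {i: [] for i in range(len(sites))}
    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            if haversine_km(*sites[i], *sites[j]) <= radius_km:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours
```

The pairwise loop scales quadratically with the number of sites, so for a dataset of over 21,000 profiles a spatial index (for example a ball tree with a haversine metric) would likely be preferable in practice.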

For each focal site, proximal sites were filtered to those with total clay and silt content within 10% of the focal site’s value to constrain variation in soil texture. Sampling years and corresponding SOC measurements for the proximal sites were extracted and aggregated to the nearest 5-year period (e.g., 2000–2004 rounded to 2000). Aggregation smooths high-frequency variability unrelated to gradual SOC change, which strengthens the multi-decadal signal, and it helps mitigate artifacts from uneven sampling density across years. Shorter (< 5 years) periods were found to be influenced by individual anomalous years, while longer periods excessively smoothed temporal dynamics. The mean SOC value within each 5-year period was used as input for the temporal trend analysis.
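A minimal sketch of the texture filter and the 5-year aggregation follows. Two readings are assumed here and are not confirmed by the text: "within 10%" is taken as 10 percentage points of clay plus silt, and the binning follows the worked example (2000–2004 mapped to 2000), i.e. each year is assigned to the start of its 5-year period.

```python
from collections import defaultdict
from statistics import mean

def similar_texture(focal_clay_silt, other_clay_silt, tolerance=10.0):
    # Assumption: "within 10%" is read as 10 percentage points of clay + silt.
    return abs(focal_clay_silt - other_clay_silt) <= tolerance

def five_year_bin(year):
    # Follows the example in the text: 2000-2004 -> 2000.
    return year - year % 5

def period_means(observations):
    """observations: iterable of (sampling_year, soc). Returns {period: mean SOC}."""
    bins = defaultdict(list)
    for year, soc in observations:
        bins[five_year_bin(year)].append(soc)
    return {period: mean(values) for period, values in sorted(bins.items())}

# Hypothetical observations pooled from a focal site's proximal sites
obs = [(1987, 12.0), (1989, 14.0), (2001, 11.0), (2004, 10.0), (2012, 9.5)]
series = period_means(obs)
```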

A robust linear regression model (RLM) was fitted to the rounded 5-year midpoints and mean SOC values if at least 3 unique periods were available, implementing Huber’s T norm to limit the influence of outliers⁹¹. This yielded slope (m) and intercept (c) coefficients representing the temporal SOC trend for each focal site. Using these slope values, SOC was projected to the desired year using:

$$\mathrm{SOC}_{\text{Adjusted year}} = \mathrm{SOC}_{\text{observed}} + m \times (\text{Adjusted year} - \text{Sampling year})$$
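The trend fit and the projection can be sketched with a hand-rolled iteratively reweighted least squares (IRLS) loop using Huber weights. This is an illustrative stand-in, not the authors' implementation: the tuning constant 1.345, the MAD-based scale estimate, the iteration count, and the example values are assumptions. The terms "RLM" and "Huber's T" also appear in the statsmodels library (`statsmodels.api.RLM` with `sm.robust.norms.HuberT`), though the text does not name the software used.

```python
from statistics import median

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = m*x + c."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    m = sxy / sxx
    return m, ym - m * xm

def huber_line(x, y, t=1.345, iterations=50):
    """Iteratively reweighted least squares with Huber weights."""
    m, c = weighted_line(x, y, [1.0] * len(x))
    for _ in range(iterations):
        resid = [yi - (m * xi + c) for xi, yi in zip(x, y)]
        # Robust scale: median absolute deviation, scaled for normal errors.
        med = median(resid)
        scale = median(abs(r - med) for r in resid) / 0.6745
        if scale == 0:
            break
        w = [1.0 if abs(r) <= t * scale else t * scale / abs(r) for r in resid]
        m, c = weighted_line(x, y, w)
    return m, c

def site_trend(period_means, min_periods=3):
    """period_means: {period_year: mean SOC}. Returns (m, c), or None if too few periods."""
    if len(period_means) < min_periods:
        return None
    years = sorted(period_means)
    return huber_line(years, [period_means[t] for t in years])

def project_soc(soc_observed, slope, sampling_year, target_year):
    """Apply SOC_adjusted = SOC_observed + m * (target year - sampling year)."""
    return soc_observed + slope * (target_year - sampling_year)

# Hypothetical series: a decline of 0.1 per year with one anomalous period (2005).
periods = {1980: 20.0, 1985: 19.5, 1990: 19.0, 1995: 18.5, 2000: 18.0, 2005: 25.0}
m, c = site_trend(periods)
ols_m, _ = weighted_line(list(periods), list(periods.values()), [1.0] * len(periods))
soc_2020 = project_soc(18.0, m, 2000, 2020)
```

In this example the ordinary least-squares slope (`ols_m`) is pulled positive by the anomalous period, while the Huber fit stays close to the underlying decline, which is the behavior the robust norm is meant to provide.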

For sites without sufficient temporal data (< 3 periods), the slope coefficient was predicted using a gradient-boosting model with climate, soil taxonomy, texture (clay, silt), terrain attributes, land use, and geographic coordinates as predictors. The model was trained on sites with known slope values. K-fold cross-validation (k = 10) was applied, yielding an average R² of 0.6, which indicates how well the model generalizes to new, unseen data. Subsequently, a final model was trained on the entire dataset using the parameters refined through the 10-fold cross-validation process, which achieved an R² of 0.92. Slope values for sites without RLM trends were predicted with this model and used to estimate projected SOC. Additional details about the data and process are provided in the supplementary document. The process of temporal adjustment of SOC to the 2020 reference year (SOC₂₀₂₀) is shown in Fig. 2.
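The k-fold evaluation described above can be sketched as follows. To keep the example self-contained, an ordinary least-squares line is used as a stand-in model; it is not gradient boosting, and the single predictor and synthetic slope values are hypothetical. Only the fold construction and the averaging of out-of-fold R² are meant to mirror the text.

```python
import random

def r2_score(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(x, y):
    # Stand-in model (ordinary least squares), NOT gradient boosting.
    xm, ym = sum(x) / len(x), sum(y) / len(y)
    m = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)
    return lambda v: m * v + (ym - m * xm)

def cross_val_r2(x, y, k=10, seed=0):
    """Mean R² over k folds, each scored on data withheld from fitting."""
    scores = []
    for fold in kfold_indices(len(x), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        model = fit_line([x[i] for i in train], [y[i] for i in train])
        scores.append(r2_score([y[i] for i in fold], [model(x[i]) for i in fold]))
    return sum(scores) / len(scores)

# Synthetic data: one hypothetical predictor and noisy "slope" targets.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [-0.05 * v + rng.gauss(0, 0.1) for v in x]
mean_cv_r2 = cross_val_r2(x, y)
```

Each fold's score is computed on sites the model did not see during fitting, so the averaged value is an out-of-sample estimate.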

Feature selection for spatially distributed SOC modeling

An input dataset with topographical characteristics, climate, soil types, land use, and vegetation characteristics of CONUS was developed in consultation with soil scientists to represent likely factors affecting SOC; see supplement Table 1 for details. The spatial covariates/features included elevation, aspect, slope, and several topographical curvatures, including cross-sectional, downslope, flowline, general, local, local downslope, local upslope, maximal, minimal, plan, profile, tangential, total, and upslope. Climatic covariates included aridity, potential evapotranspiration, solar radiation, and several bioclimatic variables based on precipitation and temperature. Soil-related covariates included USDA soil taxonomy, soil water content, clay content at six standard depths (0, 10, 30, 60, 100, and 200 cm), sand content at those six standard depths, and eleven soil temperature-based bioclimatic variables at 0–5 cm and 5–15 cm soil depths. Land use and crop-related covariates included MODIS land use land cover classifications and the Cropland Data Layer (CDL). Vegetation covariates included NDVI and EVI computed at several percentiles on an annual scale (minimum, 25th percentile, median, 75th percentile, 95th percentile, and maximum). Covariates are described in Supplement A.

Feature selection was executed in three phases. First, a correlation analysis identified relationships among covariates, enabling the exclusion of highly correlated variables to reduce redundancy. Second, the covariate pool was condensed according to preliminary model results using feature-importance scores, so that model complexity was minimized by preserving only the most important features. Third, subject-matter experts and pertinent literature were consulted to evaluate the applicability of the remaining covariates, ensuring that their inclusion was contextually justified.
After these steps, the initial 97 covariates were narrowed to 38 covariates (Supplement Table S1).
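The first phase, removing highly correlated covariates, can be sketched as a greedy filter on pairwise Pearson correlation. The threshold of 0.9, the order in which covariates are considered, and the example values are assumptions; the text does not state the cutoff used.

```python
from math import sqrt

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def drop_correlated(features, threshold=0.9):
    """features: {name: list of values}. Keep a feature only if its |r| with
    every feature already kept is at or below the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical covariate samples
features = {
    "elevation": [120, 340, 560, 210, 800, 95],
    "mean_temp": [18.1, 15.9, 13.8, 17.2, 11.5, 18.4],  # nearly linear in elevation
    "ndvi_median": [0.62, 0.35, 0.71, 0.44, 0.58, 0.30],
}
selected = drop_correlated(features)
```

Because the filter is greedy, which member of a correlated pair survives depends on the input order; in practice that order would be informed by the later phases (feature importance and expert judgment).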

SOC ML model

A boosting-tree-based regression model (CatBoost) (https://catboost.ai/en/docs/), with a strategy to ensure minimal overfitting and optimal generalizability, was used for predicting SOC at 1 km resolution. The temporally adjusted dataset was segregated into training, validation, and test subsets. Of the data, 80% was randomly picked for the training phase, while the remaining 20% was randomly divided equally between validation and testing. The model was trained on the training dataset and concurrently evaluated on the unseen validation data. This synchronous assessment during training facilitated early detection of overfitting, thus preserving generalization capabilities.
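The 80/10/10 partition described above can be sketched with index shuffling. The seed and the rounding rule are assumptions.

```python
import random

def train_val_test_split(n_rows, train_frac=0.8, seed=0):
    """Shuffle row indices, take train_frac for training, and split the
    remainder equally between validation and test."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train_frac * n_rows))
    n_val = (n_rows - n_train) // 2
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = train_val_test_split(1000)
```

With CatBoost, the validation rows would typically be passed to `fit` through its `eval_set` argument so that validation error is tracked during training; whether the authors also used early stopping is not stated in the text.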

Upon completion of the training process, the model’s performance was evaluated on the test dataset, a subset of data not previously seen by the model. This served as a reliable test of the model’s predictive proficiency. Statistical metrics, including Mean Absolute Error (MAE), the coefficient of determination (R² score), and the Root Mean Square Error (RMSE), were computed to quantify the model’s performance. This multi-layered approach, involving the distinct usage of training, validation, and testing datasets, facilitated extracting and applying key features that could be generalized⁹².
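For reference, the three metrics can be computed as below. The observed and predicted values are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt

def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical test-set values
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 18.0]
scores = {"MAE": mae(y_true, y_pred), "RMSE": rmse(y_true, y_pred), "R2": r2(y_true, y_pred)}
```

Here the absolute errors are 1, 0, 1 and 2, so MAE is 1.0, RMSE is the square root of 6/4, and R² is 1 − 6/20 = 0.7.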

Assessment of model features

SHAP (SHapley Additive exPlanations) analysis was used to enhance the model’s interpretability using the Python SHAP package (https://shap.readthedocs.io/en/latest/). The beeswarm plot (included in the supplement) provides insight into how each feature contributes to the model’s predictions, transforming the model from a black box into a somewhat explainable framework^93,94. SHAP decomposes the model output into the sum of the effects of each feature being introduced into a conditional expectation. The SHAP value for a feature represents the average marginal contribution of that feature across all possible combinations of features⁹⁵.
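The "average marginal contribution across all possible combinations" can be made concrete with an exact Shapley computation on a toy model. This sketch does not use the SHAP package; the two-feature model, the feature names, and the use of a fixed baseline for "absent" features (rather than a conditional expectation) are simplifying assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

instance = {"clay": 1.0, "precip": 2.0}   # hypothetical feature values
baseline = {"clay": 0.0, "precip": 0.0}   # reference values for "absent" features

def model(clay, precip):
    # Toy model with an interaction term
    return 2.0 * clay + 3.0 * precip + clay * precip

def value_fn(present):
    inputs = {k: (instance[k] if k in present else baseline[k]) for k in instance}
    return model(**inputs)

phi = shapley_values(value_fn, list(instance))
```

The contributions (3 for `clay`, 7 for `precip`) sum to the difference between the prediction for the instance (10) and the baseline prediction (0), which is the additivity property that the decomposition relies on. Exact enumeration scales exponentially with the number of features, which is why tree-specific algorithms are used for models with dozens of covariates.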

Modeling uncertainty

Modeling uncertainties were investigated by employing 10 runs of the model, with each run initiated using a distinct random seed and trained on a different split of the data, ensuring each model was exposed to a unique portion of the data. This approach allowed the estimation of errors due to the data’s inherent variability. The standard deviation across the predictions from this suite of model runs was used to quantify the uncertainty in SOC stock predictions. Similar methods have been used by other studies^96,97,98.
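The seed-ensemble idea can be sketched as follows. A least-squares line stands in for the SOC model, the data are synthetic, and the use of the sample standard deviation is an assumption; only the structure (ten seeds, ten different training subsets, per-location spread of predictions) follows the text.

```python
import random
from statistics import mean, stdev

def fit_and_predict(x, y, grid, seed, train_frac=0.8):
    """Fit a stand-in model on a seed-specific random subset and predict on grid."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(x)), int(train_frac * len(x)))
    xs, ys = [x[i] for i in idx], [y[i] for i in idx]
    xm, ym = mean(xs), mean(ys)
    m = sum((a - xm) * (b - ym) for a, b in zip(xs, ys)) / sum((a - xm) ** 2 for a in xs)
    return [ym + m * (g - xm) for g in grid]

# Synthetic training data and three hypothetical prediction locations
data_rng = random.Random(1)
x = [data_rng.uniform(0, 1) for _ in range(100)]
y = [5 + 10 * v + data_rng.gauss(0, 1) for v in x]
grid = [0.0, 0.5, 1.0]

runs = [fit_and_predict(x, y, grid, seed) for seed in range(10)]
ensemble_mean = [mean(col) for col in zip(*runs)]
ensemble_sd = [stdev(col) for col in zip(*runs)]  # per-location uncertainty
```

Applied to a raster, the same column-wise standard deviation would be computed per pixel across the ten prediction maps.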

Data visualization

The plots in Figs. 1, 3 and 5 were initially created using Plotly Express (v5.13.1) and Plotly Graph Objects (v5.13.1) for Python (see the plotly.express.scatter_geo documentation, v5.24.1). Custom scatter plots were generated using the scatter_geo function to represent SOC data across the USA. Custom color scales were applied to highlight specific patterns in the data, whether by year, SOC value, or slope magnitude. After generating the plots, additional modifications were made using Adobe Illustrator (v28.6) (https://www.adobe.com/products/illustrator.html) to bring the visuals to publication quality and ensure clarity and consistency across the figures. The map in Fig. 6 was generated by importing the SOC raster data into ArcGIS Pro, where the raster was visualized with customized layouts and symbology suited to the dataset. The layout was designed within the software, and the exported image was refined using Adobe Illustrator (v28.6) for visual consistency with the other figures.

Data availability

Datasets generated and/or analyzed during the current study are available at the following websites: https://www.isric.org/, https://www.nrcs.usda.gov/resources/data-and-reports/ncss-soil-characterization-data-lab-data-mart, https://www.nrcs.usda.gov/resources/data-and-reports/rapid-carbon-assessment-raca, the WorldClim historical climate data, and the Google Earth Engine Data Catalog. Some of the data projections generated by the current study are available from the corresponding authors on reasonable request.

Change history

04 July 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41598-025-07846-1

References

Lehmann, J., Bossio, D. A., Kögel-Knabner, I. & Rillig, M. C. The concept and future prospects of soil health. Nat. Reviews Earth Environ., 1–10 (2020).
Manns, H. R. & Berg, A. A. Importance of soil organic carbon on surface soil water content variability among agricultural fields. J. Hydrol. 516, 297–303 (2014).
Bossio, D. et al. The role of soil carbon in natural climate solutions. Nat. Sustain. 3, 391–398 (2020).
Lal, R. Soil carbon sequestration to mitigate climate change. Geoderma 123, 1–22 (2004).
IPCC. (USA. Cambridge University Press, (2013).
Schneider, U. A., McCarl, B. A. & Schmid, E. Agricultural sector analysis on greenhouse gas mitigation in US agriculture and forestry. Agric. Syst. 94, 128–140. https://doi.org/10.1016/j.agsy.2006.08.001 (2007).
Smith, P. et al. Greenhouse gas mitigation in agriculture. Philosophical Trans. Royal Soc. B: Biol. Sci. 363, 789–813 (2008).
Conant, R. T., Ogle, S. M., Paul, E. A. & Paustian, K. Measuring and monitoring soil organic carbon stocks in agricultural lands for climate mitigation. Front. Ecol. Environ. 9, 169–173 (2011).
Badgley, G. et al. Systematic over-crediting in California’s forest carbon offsets program. Glob Chang. Biol. 28, 1433–1445 (2022).
Oldfield, E. E. et al. Crediting agricultural soil carbon sequestration. Science 375, 1222–1225 (2022).
Chenu, C. et al. Increasing organic stocks in agricultural soils: knowledge gaps and potential innovations. Soil Tillage. Res. 188, 41–52 (2019).
Lehmann, J. & Kleber, M. The contentious nature of soil organic matter. Nature 528, 60–68. https://doi.org/10.1038/nature16069 (2015).
Luo, Z., Viscarra Rossel, R. A. & Shi, Z. Distinct controls over the temporal dynamics of soil carbon fractions after land use change. Glob Chang. Biol. 26, 4614–4625 (2020).
Viscarra Rossel, R. A., Webster, R., Bui, E. N. & Baldock, J. A. Baseline map of organic carbon in Australian soil to support national carbon accounting and monitoring under climate change. Glob Chang. Biol. 20, 2953–2970 (2014).
Smith, P. et al. How to measure, report and verify soil carbon change to realize the potential of soil carbon sequestration for atmospheric greenhouse gas removal. Glob Chang. Biol. 26, 219–241. https://doi.org/10.1111/gcb.14815 (2020).
Morais, T. G., Teixeira, R. F. & Domingos, T. Detailed global modelling of soil organic carbon in cropland, grassland and forest soils. PLoS One. 14, e0222604 (2019).
Smith, J. et al. Projected changes in the organic carbon stocks of cropland mineral soils of European Russia and the Ukraine, 1990–2070. Glob Chang. Biol. 13, 342–356 (2007).
Smith, P. et al. A comparison of the performance of nine soil organic matter models using datasets from seven long-term experiments. Geoderma 81, 153–225 (1997).
Huang, H. et al. A review on digital mapping of soil carbon in cropland: progress, challenge, and prospect. Environ. Res. Lett. 17, 123004 (2022).
Wadoux, A. M. C., Padarian, J. & Minasny, B. Multi-source data integration for soil mapping using deep learning. Soil 5, 107–119 (2019).
Kumar, S., Lal, R. & Liu, D. A geographically weighted regression kriging approach for mapping soil organic carbon stock. Geoderma 189, 627–634 (2012).
Yates, K. L. et al. Outstanding challenges in the transferability of ecological models. Trends Ecol. Evol. 33, 790–802 (2018).
Wang, Z. et al. Upscaling Soil Organic Carbon measurements at the Continental Scale Using Multivariate Clustering Analysis and Machine Learning. J. Geophys. Research: Biogeosciences. 129, e2023JG007702. https://doi.org/10.1029/2023JG007702 (2024).
Xia, Y., McSweeney, K. & Wander, M. M. Digital Mapping of Agricultural Soil Organic Carbon using soil forming factors: a review of current efforts at the Regional and National scales. Front. Soil. Sci. 2, 890437 (2022).
De Rosa, D. et al. Soil organic carbon stocks in European croplands and grasslands: how much have we lost in the past decade? Glob Chang. Biol. 30, e16992. https://doi.org/10.1111/gcb.16992 (2024).
Heikkinen, J., Keskinen, R., Kostensalo, J. & Nuutinen, V. Climate change induces carbon loss of arable mineral soils in boreal conditions. Glob Chang. Biol. 28, 3960–3973 (2022).
Le Noë, J. et al. Soil organic carbon models need independent time-series validation for reliable prediction. Commun. Earth Environ. 4, 158 (2023).
Menichetti, L., Ågren, G. I., Barré, P., Moyano, F. & Kätterer, T. Generic parameters of first-order kinetics accurately describe soil organic matter decay in bare fallow soils over a wide edaphic and climatic range. Sci. Rep. 9, 20319 (2019).
Grunwald, S. Artificial intelligence and soil carbon modeling demystified: power, potentials, and perils. Carbon Footprints. 1, 5 (2022).
Heuvelink, G. B. et al. Machine learning in space and time for modelling soil organic carbon change. Eur. J. Soil. Sci. 72, 1607–1623 (2021).
Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363 (2018).
Dorogush, A. V. et al. Fighting biases with dynamic boosting. arXiv preprint arXiv:1706.09516 (2017).
Hancock, J. T. & Khoshgoftaar, T. M. CatBoost for big data: an interdisciplinary review. J. big data. 7, 94 (2020).
Wiesmeier, M. et al. Soil organic carbon storage as a key function of soils - a review of drivers and indicators at various scales. Geoderma 333, 149–162. https://doi.org/10.1016/j.geoderma.2018.07.026 (2019).
Georgiou, K., Abramoff, R. Z., Harte, J., Riley, W. J. & Torn, M. S. Microbial community-level regulation explains soil carbon responses to long-term litter manipulations. Nat. Commun. 8, 1223 (2017).
Keyvanshokouhi, S. et al. Effects of soil process formalisms and forcing factors on simulated organic carbon depth-distributions in soils. Sci. Total Environ. 652, 523–537 (2019).
Riggers, C. et al. Multi-model ensemble improved the prediction of trends in soil organic carbon stocks in German croplands. Geoderma 345, 17–30 (2019).
Biney, J. K. M. et al. Prediction of topsoil organic carbon content with Sentinel-2 imagery and spectroscopic measurements under different conditions using an ensemble model approach with multiple pre-treatment combinations. Soil Tillage. Res. 220, 105379 (2022).
Shen, Z. et al. Deep transfer learning of global spectra for local soil carbon monitoring. ISPRS J. Photogrammetry Remote Sens. 188, 190–200. https://doi.org/10.1016/j.isprsjprs.2022.04.009 (2022).
Gautam, S. et al. Continental United States may lose 1.8 petagrams of soil organic carbon under climate change by 2100. Glob. Ecol. Biogeogr. 31, 1147–1160. https://doi.org/10.1111/geb.13489 (2022).
Mishra, U., Gautam, S., Riley, W. J. & Hoffman, F. M. Ensemble Machine Learning Approach improves predicted spatial variation of Surface Soil Organic Carbon stocks in Data-Limited Northern Circumpolar Region. Front. Big Data. 3 https://doi.org/10.3389/fdata.2020.528441 (2020).
Poggio, L. et al. SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty. SOIL 7, 217–240. https://doi.org/10.5194/soil-7-217-2021 (2021).
Zhang, L. et al. A CNN-LSTM Model for Soil Organic Carbon Content Prediction with Long Time Series of MODIS-Based phenological variables. Remote Sens. 14, 4441 (2022).
Georgiou, K. et al. Divergent controls of soil organic carbon between observations and process-based models. Biogeochemistry 156, 5–17. https://doi.org/10.1007/s10533-021-00819-2 (2021).
Román Dobarco, M. et al. Mapping soil organic carbon fractions for Australia, their stocks, and uncertainty. Biogeosciences 20, 1559–1586. https://doi.org/10.5194/bg-20-1559-2023 (2023).
Duarte-Guardia, S. et al. Better estimates of soil carbon from geographical data: a revised global approach. Mitig. Adapt. Strat. Glob. Change. 24, 355–372 (2019).
Tan, Z., Lal, R., Smeck, N. & Calhoun, F. Relationships between surface soil organic carbon pool and site variables. Geoderma 121, 187–195 (2004).
Goidts, E., Wesemael, B. V. & Van Oost, K. Driving forces of soil organic carbon evolution at the landscape and regional scale using data from a stratified soil monitoring. Glob Chang. Biol. 15, 2981–3000 (2009).
Powers, J. S. & Veldkamp, E. Regional variation in soil carbon and δ 13 C in forests and pastures of northeastern Costa Rica. Biogeochemistry 72, 315–336 (2005).
Seibert, J., Stendahl, J. & Sørensen, R. Topographical influences on soil properties in boreal forests. Geoderma 141, 139–148 (2007).
Hobley, E., Wilson, B., Wilkie, A., Gray, J. & Koen, T. Drivers of soil organic carbon storage and vertical distribution in Eastern Australia. Plant. Soil. 390, 111–127 (2015).
Michéli, E., Owens, P. R., Láng, V., Fuchs, M. & Hempel, J. Organic Carbon as a major differentiation criterion in soil classification systems. Soil. Carbon, 37–43 (2014).
Wang, M. et al. Global soil profiles indicate depth-dependent soil carbon losses under a warmer climate. Nat. Commun. 13, 5514 (2022).
Wang, M. et al. Responses of soil organic carbon to climate extremes under warming across global biomes. Nat. Clim. Change. 14, 98–105. https://doi.org/10.1038/s41558-023-01874-3 (2024).
Bliss, N. B., Waltman, S. W., West, L. T., Neale, A. & Mehaffey, M. Distribution of soil organic carbon in the conterminous United States. Soil. Carbon, 85–93 (2014).
Guo, Y., Gong, P., Amundson, R. & Yu, Q. Analysis of Factors Controlling Soil Carbon in the Conterminous United States. Soil Sci. Soc. Am. J. 70, 601–612. https://doi.org/10.2136/sssaj2005.0163 (2006).
Gonçalves, D. R. P., Mishra, U., Wills, S. & Gautam, S. Regional environmental controllers influence continental scale soil carbon stocks and future carbon dynamics. Sci. Rep. 11, 6474. https://doi.org/10.1038/s41598-021-85992-y (2021).
Wills, S. et al. Springer International Publishing,. in Soil Carbon (eds Alfred E. Hartemink & Kevin McSweeney) 95–104 (2014).
O’Neill, B. C. et al. The scenario model intercomparison project (ScenarioMIP) for CMIP6. Geosci. Model Dev. 9, 3461–3482 (2016).
Wiesmeier, M., Barthold, F., Blank, B. & Kögel-Knabner, I. Digital mapping of soil organic matter stocks using Random Forest modeling in a semi-arid steppe ecosystem. Plant. soil. 340, 7–24 (2011).
Guo, L. B. & Gifford, R. M. Soil carbon stocks and land use change: a meta analysis. Glob Chang. Biol. 8, 345–360. https://doi.org/10.1046/j.1354-1013.2002.00486.x (2002).
Houghton, R. A. Interactions between Land-Use Change and Climate-Carbon Cycle Feedbacks. Curr. Clim. Change Rep. 4, 115–127. https://doi.org/10.1007/s40641-018-0099-9 (2018).
Lajtha, K., Bailey, V. L. & McFarlane, K. The Second State of the Carbon Cycle Report-Chap. 12. Soils (Lawrence Livermore National Lab.(LLNL), 2018). (United States).
Sanderman, J., Hengl, T. & Fiske, G. J. Soil carbon debt of 12,000 years of human land use. Proc. Natl. Acad. Sci. U. S. A. 114, 9575–9580 (2017).
Van Oost, K. et al. The impact of agricultural soil erosion on the global carbon cycle. Science 318, 626–629 (2007).
Bond-Lamberty, B. & Thomson, A. Temperature-associated increases in the global soil respiration record. Nature 464, 579–582 (2010).
Crowther, T. W. et al. Quantifying global soil carbon losses in response to warming. Nature 540, 104–108. https://doi.org/10.1038/nature20150 (2016).
Prăvălie, R. Exploring the multiple land degradation pathways across the planet. Earth Sci. Rev. 220, 103689 (2021).
Sulman, B. N. et al. Land Use and Land Cover affect the depth distribution of Soil Carbon: insights from a large database of soil profiles. Front. Environ. Sci. 8 https://doi.org/10.3389/fenvs.2020.00146 (2020).
Domke, G. M., Oswalt, S. N., Walters, B. F. & Morin, R. S. Tree planting has the potential to increase carbon sequestration capacity of forests in the United States. Proceedings of the national academy of sciences 117, 24649–24651 (2020).
Liu, J. et al. Critical land change information enhances the understanding of carbon balance in the United States. Glob Chang. Biol. 26, 3920–3929 (2020).
Sleeter, B. M. et al. Effects of contemporary land-use and land-cover change on the carbon balance of terrestrial ecosystems in the United States. Environ. Res. Lett. 13, 045006. https://doi.org/10.1088/1748-9326/aab540 (2018).
Domke, G. et al. Toward inventory-based estimates of soil organic carbon in forests of the United States. Ecol. Appl. 27, 1223–1235 (2017).
Sperry, J. S. et al. The impact of rising CO2 and acclimation on the response of US forests to global warming. Proceedings of the National Academy of Sciences 116, 25734–25744 (2019).
US-EPA. Inventory of US Greenhouse Gas Emissions and Sinks: 1990–2021 (U.S. Environmental Protection Agency, Washington, DC, 2021). https://www.epa.gov/ghgemissions/inventory-us-greenhouse-gas-emissions-and-sinks.
Magerl, A., Le Noë, J., Erb, K. H., Bhan, M. & Gingrich, S. A comprehensive data-based assessment of forest ecosystem carbon stocks in the US 1907–2012. Environ. Res. Lett. 14, 125015 (2019).
Goidts, E., Van Wesemael, B. & Crucifix, M. Magnitude and sources of uncertainties in soil organic carbon (SOC) stock assessments at various scales. Eur. J. Soil. Sci. 60, 723–739 (2009).
Duarte-Guardia, S. et al. Biophysical and socioeconomic factors influencing soil carbon stocks: a global assessment. Mitig. Adapt. Strat. Glob. Change. 25, 1129–1148 (2020).
Lessmann, M., Ros, G. H., Young, M. D. & de Vries, W. Global variation in soil carbon sequestration potential through improved cropland management. Glob Chang. Biol. 28, 1162–1177 (2022).
Post, W. M. & Kwon, K. C. Soil carbon sequestration and land-use change: processes and potential. Glob Chang. Biol. 6, 317–327. https://doi.org/10.1046/j.1365-2486.2000.00308.x (2000).
Smith, P. How long before a change in soil organic carbon can be detected? Glob Chang. Biol. 10, 1878–1883. https://doi.org/10.1111/j.1365-2486.2004.00854.x (2004).
Wadoux, A. M. C., Minasny, B. & McBratney, A. B. Machine learning for digital soil mapping: applications, challenges and suggested solutions. Earth Sci. Rev. 210, 103359 (2020).
Grunwald, S. Artificial intelligence and soil carbon modeling demystified: power, potentials, and perils. Carbon Footprints. 1, 6 (2022).
Wang, J., Bretz, M., Dewan, M. A. A. & Delavar, M. A. Machine learning in modelling land-use and land cover-change (LULCC): current status, challenges and prospects. Sci. Total Environ. 822, 153559 (2022).
Bigelow, D. & Borchers, A. Major uses of land in the United States, 2012 (2017).
Spawn, S. A., Lark, T. J. & Gibbs, H. K. Carbon emissions from cropland expansion in the United States. Environ. Res. Lett. 14, 045009 (2019).
Batjes, N., Ribeiro, E., van Oostrum, A., Leenaars, J. & Jesus de Mendes, J. Standardised soil profile data for the world (WoSIS, July 2016 snapshot). http:\dx.doi.org10 (2016).
Reinsch, T. & West, L. The US national cooperative soil characterization database. (2010).
Blois, J. L., Williams, J. W., Fitzpatrick, M. C., Jackson, S. T. & Ferrier, S. Space can substitute for time in predicting climate-change effects on biodiversity. Proceedings of the national academy of sciences 110, 9374–9379 (2013).
Pickett, S. T. In Long-term Studies in Ecology: Approaches and Alternatives 110–135 (Springer, 1989).
Huber, P. J. Robust regression: asymptotics, conjectures and Monte Carlo. Annals Stat., 799–821 (1973).
Xu, Y. & Goodacre, R. On splitting training and validation set: a comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning. J. Anal. Test. 2, 249–262 (2018).
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30 (2017).
Štrumbelj, E. & Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 41, 647–665 (2014).
Shapley, L. S. 307–317 (Princeton University Press Princeton, NJ, (1953).
Dutschmann, T. M., Kinzel, L., Ter Laak, A. & Baumann, K. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation. J. Cheminform. 15, 49 (2023).
Wang, Z. et al. Upscaling soil organic carbon measurements at the continental scale using multivariate clustering analysis and machine learning. J. Geophys. Research: Biogeosciences. 129, e2023JG007702 (2024).
Lakshminarayanan, B., Pritzel, A. & Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. Adv. Neural Inf. Process. Syst. 30 (2017).
Hengl, T. et al. SoilGrids250m: global gridded soil information based on machine learning. PLoS One. 12, e0169748. https://doi.org/10.1371/journal.pone.0169748 (2017).
Article CAS PubMed Central Google Scholar


Acknowledgements

This study was funded by SIF, PBC. ACS received funding from USDA-NIFA Hatch project (Project number: TEX0-1-9603). The funders played no role in the study design, data collection, analysis, and interpretation of data, or the writing of this manuscript.

Author information

Authors and Affiliations

School of Sustainable Engineering and Built Environment, Arizona State University, 777 E University Dr., Tempe, AZ 85287, USA
Laxman Bokati, Saurav Kumar & Rahul Perepi
Texas A&M AgriLife Research, Texas A&M University, 1710 FM 3053, Overton, TX 75684, USA
Anil Somenahally
Department of Soil and Crop Sciences, Texas A&M University, 370 Olsen Blvd., College Station, TX 77843, USA
Anil Somenahally & Javad Robatjazi
Jackson State University, 1400 John R. Lynch St., Jackson, MS 39217-0168, USA
Rocky Talchabhadel
Prairie View A&M University, PO. Box 519 MS 2008, Prairie View, TX, 978-7190, 77446, USA
Reshmi Sarkar

Authors

Laxman Bokati
Anil Somenahally
Saurav Kumar
Javad Robatjazi
Rocky Talchabhadel
Reshmi Sarkar
Rahul Perepi

Contributions

LB: Data collection, curation and analysis, methodology, original draft. ACS: Conceptualization, methodology, supervision, funding, original draft preparation. SK: Conceptualization, methodology, supervision, funding, editing. RT: Data collection, editing. JR: Methodology, editing. RS: Methodology, editing. RP: Analysis, data collection, methodology.

Corresponding authors

Correspondence to Anil Somenahally or Saurav Kumar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this Article was revised: The original version of this Article contained an error in the spelling of the author Rocky Talchabhadel, which was incorrectly given as Rocky Talchabadel.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Bokati, L., Somenahally, A., Kumar, S. et al. Temporal adjustment approach for high-resolution continental scale modeling of soil organic carbon. Sci Rep 15, 6483 (2025). https://doi.org/10.1038/s41598-025-89503-1


Received: 30 June 2024
Accepted: 05 February 2025
Published: 22 February 2025
DOI: https://doi.org/10.1038/s41598-025-89503-1
