Background & Summary

Canada is the second largest country in the world, with nearly 10 million square kilometres of land area, approximately 62 million hectares of which is agricultural land. Up-to-date soil landscape data are essential for precision agriculture, ecosystem and nutrient cycling modelling, soil organic carbon (SOC) stock inventory, SOC sequestration potential assessment, and sustainable soil management1. With diminishing investment in conventional soil surveys, operational predictive soil mapping (PSM) frameworks and methods have been used to provide up-to-date soil data and information across landscape scales in Canada2,3. Predictive soil mapping, also referred to as digital soil mapping (DSM), is the computer-assisted production of spatial data or maps of soil types and soil properties using structured knowledge of soil and its relationships with environmental variables. PSM involves creating and populating spatially explicit soil information from field and laboratory observations coupled with spatial and non-spatial soil inference systems4. Soil classes and properties include categorical variables such as soil classification names and interdependent continuous variables such as bulk density. In Canada, the first gridded national soil and soil landscape data developed using PSM methods were a subset of the global PSM outputs5. One advantage of PSM methods is that predictions can be repeated or updated as new training point data and co-variable data become available; accordingly, the global soil grids were updated in 20216. When sufficient computing capacity is available, ensemble machine learning can improve performance and provide better deterministic predictions of soil types and properties7. Among the machine learners studied, such as artificial neural networks, support vector machines, gradient boosting decision trees and random forests, variations of the random forest algorithm have produced the most accurate PSM outcomes3,5.

In Canada, with more available soil point data, new co-variable data and advances in machine learning algorithms, regional PSM data have also been generated8,9. For nationwide PSM, more point soil data are being gathered and compiled10. Bias correction11 has been added to the random forest-based machine learning algorithms. Along with new co-variable data, nationwide PSM has now been conducted at a 100 m grid resolution. Such incremental predictive soil mapping operations will be repeated in the future as nationwide soil sampling based on purposive sampling designs yields new data12.

The outputs of PSM should be accompanied by uncertainty measures13 and independent accuracy assessments12,14. PSM output accuracy assessment remains a challenge, especially when validating outputs at regional, national and global scales15. Point-to-point validation requires a purposefully designed and collected independent point data set. For large mapped areas, this requirement is often met by using legacy soil survey point data, which may not provide adequate spatial representativeness given the variability of landscapes and of the processes that influence soil properties. For mapped data covering large areas such as the USA and Australia, point-based independent validations often yield lower accuracies16,17. However, areal or management unit-based accuracy assessments are more meaningful to end users of PSM outputs18. The primary goal of this study is to develop 100 m soil landscape grids of Canada using a PSM method and to conduct statistical evaluations of the predictions19.

Methods

Machine learning and predictive soil mapping

Soil classes and properties are influenced by the soil forming factors, namely climate, topography, parent material, organisms and time. This is formalized in the CLORPT state factor equation (CL = climate condition at a point; O = organisms including land cover; R = relief or topographic attributes; P = parent or surficial geological material; T = time or age), which expresses a theoretical soil-landscape relationship20. To truly reflect the relationship between soil properties and soil forming factors, the universal model of spatial variation recognizes that both the deterministic and stochastic components of soil forming factors need to be modelled21. To represent the soil forming factors in computerized systems, McBratney et al.22 expanded CLORPT into the SCORPAN model, which states that a soil class or soil property is a function of soil intrinsic properties, climate, organisms, relief, parent materials, age and spatial location.

$${\boldsymbol{S}}={\boldsymbol{f}}\left({\boldsymbol{s}},{\boldsymbol{c}},{\boldsymbol{o}},{\boldsymbol{r}},{\boldsymbol{p}},{\boldsymbol{a}},{\boldsymbol{n}}\right)$$
(1)

Where

s - soil intrinsic properties

c - climate

o - organism

r - relief or topography

p - parent materials

a - age or time

n - spatial coordinates of a point soil data

Such soil-environment relationship frameworks, especially the SCORPAN model, have been the foundation of recent predictive soil mapping5. The key steps of predictive soil mapping include training and co-variable data collection and compilation, machine learning model building, prediction, uncertainty measurement, intrinsic and independent statistical validation, and data quality assurance (Fig. 1). The R package ranger was used to construct random forest-based inference models and to predict great group soil classes and selected soil properties23,24. For soil property inference, the quantile random forest (QRF) option enabled the estimation of prediction uncertainty.
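As a minimal sketch of the QRF-style uncertainty step, the example below uses synthetic stand-ins for SCORPAN co-variables and a soil property. The paper's pipeline used the R package ranger; here scikit-learn's random forest is used instead, and the 5% and 95% quantiles are approximated from per-tree predictions (a true QRF retains all leaf observations, so this is a coarser approximation):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic stand-ins for SCORPAN co-variables (X) and a soil property (y)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=500)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Per-tree predictions approximate the conditional distribution; their
# 5% and 95% quantiles give a prediction interval for each location
per_tree = np.stack([tree.predict(X[:10]) for tree in rf.estimators_])
q05, q95 = np.quantile(per_tree, [0.05, 0.95], axis=0)
mean_pred = per_tree.mean(axis=0)
```

In an operational setting, the same quantile pair would be written out as the uncertainty layers accompanying each predicted property grid.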

Fig. 1
figure 1

Flow chart of predictive soil mapping. Key steps of machine learning based predictive soil mapping.

Point soil data processing and compilation

Point soil data with geographic coordinates are the key training data source for machine learning. A soil observation at a specific location is referred to as a pedon. A standard pedon “is the smallest, three-dimensional unit at the surface of the earth that is considered as a soil”25, and it usually has a surface area of approximately 1 m2. Because the original point soil data were not collected at consistent depths, the layered soil property values needed to be harmonized to uniform depth intervals26. Spline algorithms implemented in R23 were used for this harmonization27. The two main soil point data sets were from the Canadian Soil Information Service (https://sis.agr.gc.ca/cansis/nsdb/npdb/index.html) and the Canadian Forest Service10. Figure 2 shows the spatial location and distribution of the combined point soil data used for this project. Most of the sampled pedons were located in the southern regions of Canada, where agricultural land is concentrated. The locations of the legacy pedons were often selected based on ease of access and access permission; therefore, their distribution is skewed and clustered. The great group soil class names used in the point soil data set were harmonized based on the third edition of the Canadian Soil Classification System25.
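The depth harmonization idea can be illustrated with a simplified depth-weighted averaging sketch. This is not the equal-area spline method used in the paper; `harmonize_layer` and the example pedon are hypothetical, and the sketch only shows how horizon values at irregular depths map onto a standard interval:

```python
def harmonize_layer(horizons, top, bottom):
    """Depth-weighted mean of horizon values over the [top, bottom] interval (cm).

    horizons: list of (upper, lower, value) tuples in cm.
    A simplified alternative to the spline harmonization used in the paper.
    """
    total, weight = 0.0, 0.0
    for upper, lower, value in horizons:
        # Length of the overlap between this horizon and the target interval
        overlap = max(0.0, min(lower, bottom) - max(upper, top))
        total += value * overlap
        weight += overlap
    return total / weight if weight > 0 else float("nan")

# Hypothetical pedon with horizons sampled at irregular depths,
# harmonized to the standard 0-30 cm interval
pedon = [(0, 10, 3.2), (10, 25, 1.8), (25, 60, 0.9)]
soc_0_30 = harmonize_layer(pedon, 0, 30)
```

Unlike this sketch, the equal-area spline preserves the mass of the property across the whole profile while producing a smooth depth function.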

Fig. 2
figure 2

Point soil data and distribution. The available soil point (pedon) data were predominantly collected between 1980 and 2000. The data are clustered within the agricultural region of Canada and are not evenly distributed across the nation.

Co-variable data processing and compilation

The co-variables used in this PSM work include climate, land cover, topography and surficial geology data. Within the topographic theme, 100 m grid topographic co-variables such as Iwahashi and Pike landform classes28 and multi-resolution valley bottom flatness29 were derived from 16 m grid digital elevation model (DEM) data using SAGA GIS30. Table 1 summarizes all 70 co-variable layers used for this work.

Table 1 Co-variables used for 100 m soil data prediction.

Data tiling and parallel computing

Tiling and parallel computing solutions were used to make more efficient use of the available computing capacity. To cover Canada, 400 evenly sized tiles were created to subset the underlying data stack used for prediction, bias correction, and uncertainty computations, as described in the following sections (Fig. 3). The 230 tiles that fully or partially overlap the landmass of Canada were used in the PSM data pipeline. The future and furrr packages for R were used to implement parallel computing23.
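The tile-generation step can be sketched as follows. The bounding box coordinates below are illustrative placeholders, not the real projected extent of Canada; the paper parallelized tile processing with the R future/furrr packages, for which Python's `concurrent.futures` would be a rough analogue:

```python
from itertools import product

def make_tiles(xmin, ymin, xmax, ymax, nx=20, ny=20):
    """Split a bounding box into nx * ny equal tiles (20 x 20 = 400 for Canada)."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    return [
        (xmin + i * dx, ymin + j * dy, xmin + (i + 1) * dx, ymin + (j + 1) * dy)
        for i, j in product(range(nx), range(ny))
    ]

# Illustrative bounding box in a projected CRS (metres); not the real extent
tiles = make_tiles(-2_600_000, -900_000, 3_100_000, 4_100_000)
# Tiles overlapping the landmass would then be processed independently and
# in parallel, since each tile subsets the full co-variable data stack
```

Because each tile is processed independently, the same code path serves both the parallelization and the high-volume data handling described above.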

Fig. 3
figure 3

Illustration of tiling system. A systematic tiling system is used both for formulating parallel computing procedures and solving high volume data processing issues.

Bias correction and uncertainty measure

Like each of the machine learners studied, random forest-based machine learning has limitations. For example, random forest can introduce conditional biases, which differ from systematic biases9,11. Conditional bias here means that the variability of the predicted attribute values is less than that of the training or observation data, including a narrowed range between the maximum and minimum of the observed input data. For example, setting aside differences in spatial variability, the observed soil bulk density in Canada ranges from 0 g/cm3 to 2.42 g/cm3, while the predicted range is 0.27 g/cm3 to 1.79 g/cm3. Including bias correction in the random forest learner is therefore an added advancement. Based on the initial solutions by Zhang and Lu11, their model-1 was used here; it uses both the predictor (independent) variables and the response (dependent) variable to correct the inference-introduced bias of random forest. For soil class prediction, no bias correction was applied. The uncertainty measure of soil class prediction was calculated as the percentage of prediction iterations in which the majority class appeared.
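The effect of range compression, and one generic way to counter it, can be sketched with a simple post-hoc linear recalibration of out-of-bag predictions. Note this is a hedged illustration of the bias-correction idea only, not the exact Zhang–Lu model-1, and all data below are synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Synthetic co-variables and response; not real soil data
X = rng.normal(size=(400, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.3, size=400)

rf = RandomForestRegressor(n_estimators=100, oob_score=True,
                           random_state=1).fit(X, y)
oob_pred = rf.oob_prediction_

# Regress observed on OOB-predicted values; because RF predictions shrink
# toward the mean, the fitted slope stretches the compressed range back out
lin = LinearRegression().fit(oob_pred.reshape(-1, 1), y)
corrected = lin.predict(rf.predict(X).reshape(-1, 1))
```

The actual model-1 correction conditions on both the predictors and the response, but the mechanism shown here, widening a range that random forest has narrowed, is the same motivation.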

Under the specifications of the Global Soil Partnership (GSP)13, uncertainty measures should be attached to predicted soil class and property data. However, no specific uncertainty measurement methods are prescribed31. In this project, the 5% and 95% quantiles of soil properties were generated using the QRF algorithm. To present uncertainty intuitively to end users, relative prediction intervals (RPI) were also calculated31,32. The RPI is the ratio between the prediction interval (PI) and the training data confidence interval (TDCI) (Eqs. 2–4)31. RPI values below 1 indicate that the predicted values fall within the 0.05 to 0.95 quantile range of the training data:

$$P{I}_{90}={P}_{95}-{P}_{5}$$
(2)
$${TDC}{I}_{90}={Q}_{95}-{Q}_{5}$$
(3)
$${RP}{I}_{90}=\frac{P{I}_{90}}{{TDC}{I}_{90}}$$
(4)

where

P95 is the 95% quantile of the modeled predictions

P5 is the 5% quantile of the modeled predictions

Q95 is the 95% quantile of the training data

Q5 is the 5% quantile of the training data
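Equations 2–4 translate directly into code. The sketch below assumes the 5% and 95% prediction quantiles are already available from the QRF step; the function name and input values are illustrative:

```python
import numpy as np

def rpi90(pred_q05, pred_q95, train_values):
    """Relative prediction interval: PI90 / TDCI90 (Eqs. 2-4)."""
    pi90 = pred_q95 - pred_q05                        # Eq. 2: P95 - P5
    q05, q95 = np.quantile(train_values, [0.05, 0.95])
    tdci90 = q95 - q05                                # Eq. 3: Q95 - Q5
    return pi90 / tdci90                              # Eq. 4

# Illustrative training data and prediction quantiles
train = np.linspace(0.0, 2.0, 101)    # Q5 = 0.1, Q95 = 1.9, TDCI90 = 1.8
value = rpi90(0.5, 1.4, train)        # PI90 = 0.9, so RPI90 = 0.5
```

An RPI of 0.5, as in this example, means the prediction interval is half as wide as the training data confidence interval, i.e. the prediction is comfortably within the training data range.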

Soil depth data adjustment with depth-to-bedrock data

The outputs of PSM should be further corrected or adjusted with depth-to-bedrock data. In Canada, there are many locations where soils are shallower than 1 metre; in those places, predicted soil attribute values below the depth-to-bedrock are set to “no data”. The input data for this operation include the 100 m soil grids and derived depth-to-bedrock data in metres. The depth-to-bedrock data were compiled from multiple raster and vector data sources (https://sis.agr.gc.ca/cansis/nsdb/psm/depth_to_bedrock_canada_100m.zip). Because of the multiple data sources and the nature of the compiled depth-to-bedrock data, some abrupt division lines remain and will need to be corrected in future updates of the national 100 m soil landscape grids. The outputs are the depth-to-bedrock-corrected national 100 m soil landscape grids.
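The masking operation itself is a simple per-cell comparison between a prediction layer's top depth and the depth-to-bedrock grid. This is a simplified sketch on toy arrays; the nodata value, function name, and grids are illustrative, and the real operation runs over the full 100 m raster stack:

```python
import numpy as np

NODATA = -9999.0

def mask_below_bedrock(soil_layer, layer_top_m, depth_to_bedrock_m):
    """Set predicted values to nodata wherever bedrock lies at or above
    the top of the prediction layer (i.e. the soil is too shallow)."""
    out = soil_layer.copy()
    out[depth_to_bedrock_m <= layer_top_m] = NODATA
    return out

# Toy 2x2 grids: a prediction layer whose top is at 0.6 m depth,
# and depth-to-bedrock in metres
layer = np.array([[1.2, 0.8], [0.5, 0.9]])
bedrock = np.array([[0.4, 2.0], [0.7, 0.3]])
masked = mask_below_bedrock(layer, 0.6, bedrock)
```

Cells where bedrock is shallower than the layer's top depth become nodata; all other predictions pass through unchanged.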

Model-based and independent statistical validations

Generally, two groups of methods are used to validate the outputs of PSM. One group is model-based, using spatial stochastic models such as multi-fold cross-validation that randomly splits the training data multiple times during the prediction procedures. The other is based on independent probabilistic samples, which can be drawn from specific points or areas. Each group of validation methods has advantages and disadvantages15,33. In this project, the coefficient of determination (R2) and root mean squared error (RMSE) from the bias-corrected QRF models were used. For the whole coverage of Canada, soil organic carbon stocks (T/ha) at 0–30 cm depth derived from these 100 m grids and from the global 250 m grids5 were used for correlation analysis.

Areal-based statistical validation and data processing

Canada has seven physiographic regions and a wide variety of soil forming factors such as climate, topography, surficial geological materials, and vegetation cover. Soils across Canada are classified into 10 orders25. Although there are national 1:1 million scale soil maps and data, detailed soil surveys were mainly conducted within the agricultural extent of Canada. Less than 7% of Canada's land is used for agriculture (Fig. 4)3.

Fig. 4
figure 4

Map of Canada and three study sites. The three sites selected for independent statistical validation are all equipped with recent fine-resolution predictive soil mapping data.

For the predicted soil classes at the great group classification level25, areal-based statistical validation was conducted with dominant soil types summarized by the 1:1 million scale Soil Landscapes of Canada (SLC) version 3.2 polygons (https://sis.agr.gc.ca/cansis/nsdb/slc/v3.2/index.html). Quantity and allocation disagreement statistics were calculated between the dominant soil types reported by the SLC and those of the 100 m soil grids summarized by the same SLC polygons34.
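Quantity and allocation disagreement (in the sense of Pontius and Millones, cited as reference 34) can be computed from a confusion matrix of mapped versus reference classes. The sketch below uses a toy two-class matrix; the function name is illustrative:

```python
import numpy as np

def quantity_allocation_disagreement(confusion):
    """Quantity and allocation disagreement from a confusion matrix
    (rows = mapped classes, columns = reference classes)."""
    p = confusion / confusion.sum()                    # cell proportions
    # Quantity disagreement: mismatch in class proportions overall
    quantity = np.abs(p.sum(axis=0) - p.sum(axis=1)).sum() / 2.0
    # Total disagreement is 1 - overall agreement; the remainder after
    # removing quantity disagreement is allocation disagreement
    total = 1.0 - np.trace(p)
    return quantity, total - quantity

# Toy confusion matrix for two soil classes
cm = np.array([[30, 10],
               [ 5, 55]])
q, a = quantity_allocation_disagreement(cm)   # q = 0.05, a = 0.10
```

Splitting total disagreement this way tells end users whether errors stem from getting class proportions wrong (quantity) or from placing the right proportions in the wrong polygons (allocation).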

While the national soil landscape grids were produced with a PSM method for the entirety of Canada in this study, three study sites with detailed soil surveys (https://sis.agr.gc.ca/cansis/nsdb/dss/v3/index.html) and recent predictive soil mapping data were selected for further independent statistical validation of the 100 m soil landscape grids.

In the selected sites, soil properties from the soil surveys and the various PSM sources were summarized by each soil survey polygon12. For the compared soil attributes of the different data sources, summary statistics and Pearson correlation coefficients were calculated using R23.

Swan Lake watershed site

The Swan Lake watershed site is located within a sub-watershed of the Swan Lake basin, Manitoba, Canada. The region receives an annual average of 530 mm of precipitation. The average annual temperature of the region is 1.6 °C, and the frost-free period ranged from 87 to 110 days between 1991 and 2000 (https://climate.weather.gc.ca/climate_normals/index_e.html#1991, accessed November 25, 2024). The dominant landscape of the site is flat to rolling topography. Most of the area is under annual crop cover, with some under grassland. The detailed soil surveys of this region are at 1:50,000 scale. For this site, 10 m soil landscape grids were derived using the same PSM method as this work, with purposively designed point soil samples collected in 2022.

Breadalbane watershed site

The Breadalbane watershed is located in Prince Edward Island (PEI), within the Gulf of St. Lawrence region of Canada. The cool and humid climate is mainly influenced by continental air masses that are humidified and temperature-moderated by the surrounding ocean waters. January and July mean temperatures are −7 °C and 18.7 °C, respectively, with an annual mean precipitation of 1100 mm. The frost-free period varies from 100 to 160 days, allowing for the cultivation of a wide variety of crops35,36. The conventional soil survey data used in this study were based on a 1:20,000 scale soil survey. For this site, 10 m soil landscape grids were derived with purposively designed point soil samples collected in 2018, using the same PSM method as the one used for the 100 m soil grid development.

West Block of the Grasslands National Park site

The West Block of the Grasslands National Park, Saskatchewan, Canada, is located within a semi-arid region with an annual mean precipitation of 363 mm between 1991 and 2000. While approximately one third of this total falls as snow, the remainder falls as rain, mostly during infrequent heavy summer thunderstorms. From June to August, normal daily mean temperatures range from 15 °C to 18 °C (https://climate.weather.gc.ca/climate_normals/index_e.html#1991, accessed November 25, 2024). For this site, the available conventional soil survey is at a scale of 1:50,000; 50 m soil landscape grids were derived with purposively designed point soil samples collected in 2021, using the same PSM method as the one used for this work.

Areal data processing and compilation

The available soil surveys and PSM data come at different scales and resolutions; however, their data structures or data models are common. For the soil surveys, mapped soil polygons are associated with a polygon attribute table (PAT), soil component table (SCT), soil name table (SNT) and soil layer table (SLT). Within a soil polygon there can be more than one soil component, each occupying a percentage of the polygon and linked to a unique soil name/type. A soil name/type is in turn linked to multiple soil layer records. Further details of the soil survey entity model are described on the Canadian Soil Information Service (CanSIS) web site (https://sis.agr.gc.ca/cansis/nsdb/dss/v2/data_model.html). Among the reported soil attributes, soil bulk density (BD) (g/cm3), sand (%), clay (%), and soil organic carbon (SOC) (%) were selected for the independent evaluation. Sand and clay contents are inherent soil properties, whereas SOC content and BD reflect both inherent and human-influenced soil properties. For example, SOC content at 0–30 cm depends strongly on climate, soil, topography, land use, and management, and these soil properties vary greatly across fields and landscapes due to these factors. Within the three selected sites, valid soil survey polygons were used for areal data summary and comparison. Figure 5 shows the key steps of the soil survey polygon-based data compilation and processing. The final areal data of soil BD, sand content, clay content, and SOC content at 0–30 cm depth were used for the summary statistics and correlation coefficient calculations.
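The component-to-polygon aggregation implied by this data model can be sketched with toy tables mirroring the CanSIS structure. The table and column names below are illustrative, not the actual CanSIS schema; the point is the percentage-weighted roll-up from components to a polygon-level attribute:

```python
import pandas as pd

# Toy soil component table (SCT): components as percentages of each polygon
sct = pd.DataFrame({"poly_id": [1, 1, 2],
                    "soil_name": ["A", "B", "C"],
                    "percent": [60, 40, 100]})
# Toy layer-derived attribute per soil name (e.g. SOC % at 0-30 cm)
slt = pd.DataFrame({"soil_name": ["A", "B", "C"],
                    "soc_0_30": [2.0, 4.0, 1.5]})

merged = sct.merge(slt, on="soil_name")
merged["w"] = merged["percent"] / 100.0
# Component percentages sum to 100 per polygon, so the weighted sum
# is the polygon-level weighted mean of the attribute
poly_soc = (merged.assign(wv=merged["w"] * merged["soc_0_30"])
                  .groupby("poly_id")["wv"].sum())
```

The same roll-up, applied per attribute and per polygon, produces the areal values that are then compared against the polygon-summarized PSM grids.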

Fig. 5
figure 5

Flow diagram of soil survey polygon-based summary. Different sources of site-specific data were summarized by soil survey polygons before statistical analysis.

All the processed soil survey and soil grids data and codes are provided via a GitHub repository (https://github.com/CanSISPSM/PSM100mCanada).

Data Records

The 100 m soil landscape grids of Canada are available at Zenodo37. The data set is also accessible via a public-facing government web site (https://sis.agr.gc.ca/cansis/nsdb/psm/index.html). The primary distribution includes GeoTiff files for SOC, BD, sand, clay, silt, pH (CaCl2 buffered), and cation exchange capacity (CEC), with associated quantile uncertainty data at five depth intervals. Further details are provided in the data stack metadata (https://agriculture.canada.ca/atlas/data_donnees/griddedSoilsCanada/supportdocument_documentdesupport/en/ISO_19131_Soil_Landscape_Grids_of_Canada_100m_%e2%80%93_Data_Product_Specifications.pdf).

Technical Validation

Table 2 shows the coefficients of determination (R2) and root mean squared errors (RMSE) between the out-of-bag (OOB) predicted and observed data. Overall, the intrinsic statistical validation, with R2 values ranging between 0.70 and 0.82, indicates strong performance of the random forest with bias correction.

Table 2 Out-of-bag accuracy of 100 m predictive soil mapping.

Between the 0–30 cm SOC stock (T/ha) data of the 250 m grids5 and those of the 100 m soil grids, the overall coefficient of determination (R2) is 0.49. For the overall accuracy assessment of the predicted soil classes, Table 3 shows 62% overall agreement between the predicted and soil survey reported values.

Table 3 Statistical accuracy of predicted soil classes.

For the three independent statistical validation sites, soil survey polygons were used to summarize SOC, BD, sand, and clay contents from the available soil surveys, with scales ranging from 1:20,000 to 1:50,000, and from the soil grids, with resolutions ranging from 10 m to 100 m. SOC, BD, sand and clay values at 0–30 cm depth are summarized for the Swan Lake, Breadalbane, and Grassland National Park sites (Tables 4–6). Although the mean values of SOC, BD, sand, and clay by soil survey polygons are generally within expected ranges, some attributes show wide ranges of mean values among the data sources. For example, at the Swan Lake site, the soil survey reported a mean sand content of 35.29%, which is much lower than the 52.84% and 48.67% reported by the 10 m and 100 m soil grids, respectively.

Table 4 Summary statistics of selected soil attributes, Swan Lake site.
Table 5 Summary statistics of selected soil attributes, Breadalbane site.
Table 6 Summary statistics of selected soil attributes, Grassland National Park site.

Further correlation analysis outcomes are presented in Figs. 6–8, which show varying degrees of agreement. For example, in Fig. 8, the soil bulk density from the soil surveys is positively correlated with that of the 50 m and 100 m soil grids. There are also disagreements: in Fig. 8, the clay contents summarized by soil survey polygons from the 50 m and 100 m soil grids are negatively correlated.

Fig. 6
figure 6

Scatter plots of selected soil properties by soil survey polygons, Swan Lake site.

Fig. 7
figure 7

Scatter plots of selected soil properties by soil survey polygons, Breadalbane site.

Fig. 8
figure 8

Scatter plots of selected soil properties by soil survey polygons, Grassland National Park site.

The legacy soil survey data were collected 20–30 years ago and are highly generalized. In contrast, the 10 m to 50 m soil grids of the selected sites are more spatially explicit and based on recent soil samples. Neither the legacy soil surveys nor the finer soil grids can offer true validation data for the 100 m soil grids in this case. However, the degrees of disagreement in the compared soil properties between the 100 m soil grids and the finer resolution data sources indicate that the 100 m soil grids should not be used for field scale applications. For instance, at the Swan Lake site (Fig. 6), the predicted SOC and sand contents of the 100 m soil grids are positively correlated with the values from the finer 10 m soil grids, whereas the predicted BD and clay values are negatively correlated with those of the 10 m soil grids. For Breadalbane (Fig. 7), the predicted sand and clay values of the 100 m soil grids are positively correlated with those of the 10 m soil grids, while the predicted SOC and BD values are negatively correlated with them. For Grassland National Park (Fig. 8), the predicted SOC and BD values of the 100 m soil grids are positively correlated with those of the 50 m soil grids, while the predicted sand and clay contents are negatively correlated with them. This 100 m soil landscape grid data set is the current version of the ongoing incremental soil grid data development by the Canadian Soil Information Service. As more ground point data are added, the accuracy of the 100 m soil grids is expected to improve.

Usage Notes

Except for the metadata of this published data stack, all the raster data files are in GeoTiff format. Both commercial off-the-shelf and open-source geographic information system software can be used to read, manipulate, and integrate these data files. Given the resolution (100 m) and the incremental developmental nature of this data set, the data are suitable for national and regional scale applications and decision making. For applications at watershed and field scales, finer resolution soil grids should be developed and used.