Background & Summary

Spatially explicit information on available P and exchangeable K is crucial for understanding and managing agricultural soil fertility1,2. These nutrients are fundamental determinants of crop productivity, with phosphorus essential for root development and early crop growth, while potassium plays vital roles in photosynthesis, enzyme activation and plant stress resistance3,4. P and K’s availability and spatial distribution significantly influence fertilizer use efficiency and agricultural sustainability5, especially in semi-arid countries like Morocco where nutrient limitations in soil often constrain crop yields by restricting plant growth and development6.

Despite the recognized importance of P and K in agricultural systems, research efforts worldwide, and in Morocco in particular, have primarily focused on soil organic carbon mapping7,8, leaving these essential nutrients comparatively understudied. While global and continental-scale soil mapping initiatives like iSDAsoil9 (https://isda-africa.com/isdasoil, last accessed 23 December 2024) provide broad coverage of soil properties across Africa at 30-meter resolution, their predictions for Morocco were generated from a small number of samples, which introduces considerable uncertainty in their local applicability. Previous studies of soil nutrients in Morocco have been restricted mainly to local scales or specific agricultural regions10,11,12,13,14, limiting their utility for national-level planning and decision-making regarding fertilizer recommendations for strategic crops.

The only available cartographic information on soil fertility in Morocco was produced as part of the Fertimap project, which aimed to develop maps of key soil properties including organic matter, pH, phosphorus and potassium, along with fertility recommendation tools. Although the maps can be viewed on the project platform, the underlying data cannot be downloaded or repurposed for further applications. Additionally, while the maps were generated using standard interpolation techniques, the lack of documented methodologies and model performance metrics limits the transparency and reproducibility of the findings.

To address these limitations, this study aims to develop the first comprehensive national baseline maps of plant-available phosphorus and exchangeable potassium for Morocco’s croplands using advanced digital soil mapping techniques with machine learning algorithms and environmental covariates. We introduce a new gridded database of available-P (ppm) and exchangeable-K (ppm) at 250-meter spatial resolution covering Morocco’s cropland areas, developed using digital soil mapping techniques and soil databases. Our approach integrates Random Forest machine learning with a comprehensive set of environmental covariates, following specifications aligned with international soil mapping standards and the SCORPAN framework15.

The mapping methodology incorporated a set of environmental covariates, including climate data, terrain attributes, remote sensing indices, and soil parent material, enabling robust predictions across Morocco’s diverse agricultural landscapes. Unlike previous efforts through the Fertimap project that employed traditional interpolation methods16, this study represents the first application of machine learning-based digital soil mapping for comprehensive nutrient assessment at the national scale in Morocco. Particular attention was paid to assessing map accuracy using independent validation data and quantifying prediction uncertainty through ensemble modeling approaches.

These baseline maps represent a significant advance in understanding soil nutrient status across Moroccan cropland areas, providing spatially explicit nutrient information previously unavailable at the national scale. The maps are freely available, facilitating their use by researchers, practitioners, and policymakers for evidence-based decision-making and targeted interventions to improve agricultural productivity and sustainability.

Methods

Soil phosphorus and potassium data

Soil data used in this study were compiled from various sources collected between 2010 and 2022, with approximately 50% of the samples derived from the Fertimap project database and the remaining 50% from complementary datasets compiled through various Moroccan organizations including the Ministry of Agriculture, the National Institute for Agricultural Research, and the Regional Offices for Agricultural Development. In total, we used 5,276 and 6,978 georeferenced topsoil samples (0–30 cm) for available-P and exchangeable-K prediction, respectively (Fig. 1).

Fig. 1 Spatial distribution of soil samples used for the prediction of soil available phosphorus and exchangeable potassium.

For the Fertimap data, we used only a subset of the full dataset, which comprises approximately 32,000 samples collected under the project. The foundational Fertimap dataset is not publicly accessible; soil property maps are published on the Fertimap website (www.fertimap.ma) for visualization purposes, but the underlying raster files (such as TIFF format) are not shared, which limits their reuse in further research. For this study, we obtained access to a subset of the Fertimap data through a formal request to the responsible parties at the National Institute of Agricultural Research. Additional information about the Fertimap project can be found in Bouabid et al.17.

Environmental covariates

The spatial prediction of soil phosphorus and potassium was supported by a set of environmental covariates representing the major soil-forming factors according to the SCORPAN approach15. For the soil property covariate, we used the 250 m resolution soil organic carbon map available at https://github.com/abdelkrim-bsr/SOC_Morocco; details about this dataset can be found in Bouasria et al.8. Additionally, long-term Landsat bare earth spectral reflectance (2002–2022) was incorporated for six spectral bands (Blue, Green, Red, NIR, SWIR1, and SWIR2), generated following the methodology described in Demattê et al.18. Climate variables comprised temperature parameters (annual mean, maximum of the warmest month, minimum of the coldest month) and annual precipitation, obtained from the WorldClim version 2.0 database19 (https://www.worldclim.org), together with reference evapotranspiration (both mean and standard deviation) and the global aridity index, obtained from https://doi.org/10.6084/m9.figshare.7504448.v5 (Zomer et al.20). Hydrological variables included the height above the nearest drainage derived at multiple resolutions (30 m with 100 and 1000 river head threshold cells and 90 m with 1000 threshold cells) from https://gee-community-catalog.org/projects/hand/ (Donchyts et al.21). The global hydrologic curve number dataset (GCN250) of Jaafar and Ahmad22 provided additional information on runoff potential under dry, average, and wet antecedent conditions. Vegetation and land use indicators were derived from multiple remote sensing sources.
These included monthly median values of vegetation indices (NDVI, EVI) based on MODIS data (2000–2023) (https://doi.org/10.5067/MODIS/MCD43A4.006; Schaaf and Wang23), Sentinel-1 radar backscatter measurements in VV and VH polarization (2016–2017) (https://doi.org/10.48436/n2d1v-gqb91; Bauer-Marschallinger et al.24), long-term cropping intensity (2001–2019) (https://doi.org/10.6084/m9.figshare.14099402; Liu et al.25), and the Biodiversity Intactness Index (2017–2020) (https://data.nhm.ac.uk/dataset/bii-bte; Hudson et al.26). Anthropogenic influences were represented through global human modification indices covering 1990–2015, including agricultural stress indicators and human intrusion metrics (https://doi.org/10.5281/zenodo.3963013; Theobald et al.27). Terrain attributes were derived from the MERIT Digital Elevation Model (https://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_DEM/; Yamazaki et al.28), comprising slope, aspect, curvature, topographic wetness index, Valley Bottom Flatness, various roughness measures, and geomorphological classifications. Parent material information was incorporated through lithological classes from the Global Lithological Map database v1.1 (https://doi.pangaea.de/10.1594/PANGAEA.788537; Hartmann and Moosdorf29). All 76 environmental covariates were preprocessed to ensure a consistent spatial resolution (250 m) and projection. Continuous variables were resampled using bilinear interpolation, while categorical variables were resampled using the nearest neighbor method. All processing was performed using Google Earth Engine30, ensuring computational efficiency and reproducibility. All covariates were masked using the long-term mean annual global cropping intensity layer to exclude non-agricultural areas.

Predictive modeling

Figure 2 outlines the methodological flowchart adopted in this study. The available-P and exchangeable-K datasets were each partitioned into training (70%) and testing (30%) subsets using a stratified random sampling approach. Stratification was performed by first sorting each dataset by its nutrient value and dividing it into 10 equal-sized groups, so that the full range of nutrient values was proportionally represented in both subsets. This approach maintains the statistical distribution of the target variables and minimizes potential sampling bias in model development and validation.
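The stratified 70/30 partition described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R code: the function name `stratified_split`, the random seed, and the synthetic values are assumptions introduced here, and within-stratum sampling is assumed to be a simple random shuffle.

```python
import random

def stratified_split(values, train_frac=0.7, n_strata=10, seed=0):
    """Return (train_idx, test_idx) after stratifying by sorted target value."""
    rng = random.Random(seed)
    # Sort sample indices by the target value (e.g. available-P in ppm).
    order = sorted(range(len(values)), key=lambda i: values[i])
    train_idx, test_idx = [], []
    for s in range(n_strata):
        # Slice the sorted indices into n_strata nearly equal-sized groups.
        lo = s * len(order) // n_strata
        hi = (s + 1) * len(order) // n_strata
        stratum = order[lo:hi]
        # Draw the training share at random within each group.
        rng.shuffle(stratum)
        n_train = round(train_frac * len(stratum))
        train_idx.extend(stratum[:n_train])
        test_idx.extend(stratum[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Synthetic, right-skewed values for illustration only (not study data).
demo_rng = random.Random(42)
demo_values = [demo_rng.lognormvariate(3.0, 0.8) for _ in range(1000)]
train_idx, test_idx = stratified_split(demo_values)
```

Because each of the 10 groups contributes the same share to both subsets, low and high nutrient values appear in training and testing in similar proportions, which a plain random split does not guarantee for skewed data.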

Fig. 2 Methodological flowchart.

The 76 environmental covariates were considered potential predictors. Random Forest (RF) was selected as the primary modeling algorithm due to its demonstrated capability in providing high performance across diverse soil science applications, as highlighted by Lamichhane et al.31. Furthermore, Bouasria et al.8 found that RF models outperformed other state-of-the-art algorithms including XGBoost and LightGBM when applied to predict soil organic carbon across Morocco’s diverse landscapes. For feature selection, a Recursive Feature Elimination (RFE) approach was integrated with Random Forest modeling to determine the most influential covariates for each nutrient. The RFE-RF combination has demonstrated superior performance in various soil mapping studies in Morocco, proving effective for nutrient prediction applications8,32,33,34,35.

Random Forest regression models were fitted separately for phosphorus and potassium with the selected covariates, using the ‘ranger’ package in R. A grid search was performed to optimize four key hyperparameters: the number of trees (100, 500, 1000), the number of variables per split (2, 3, 4, 5), the minimum node size (5, 10), and the maximum tree depth (5, 10, 15, 20). Each parameter combination was assessed using 10-fold cross-validation on the training dataset, and the combination that minimized the Root Mean Square Error (RMSE) was selected. These optimized parameters were then used to develop the final prediction models for both nutrients. Model fitting and prediction were performed in the R programming language.
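To give a sense of the size of this search, the grid can be enumerated as below. This is a sketch in Python for illustration only (the study itself used R); the dictionary keys are hypothetical labels, and `select_best` is a helper introduced here that assumes one mean cross-validated RMSE per combination.

```python
from itertools import product

# Candidate values as listed in the text; the key names are illustrative labels.
grid = {
    "num_trees": [100, 500, 1000],
    "mtry": [2, 3, 4, 5],
    "min_node_size": [5, 10],
    "max_depth": [5, 10, 15, 20],
}

# Every combination of the four hyperparameters.
combinations = [dict(zip(grid, combo)) for combo in product(*grid.values())]

# With 10-fold cross-validation, each combination is fitted 10 times.
n_folds = 10
n_fits = len(combinations) * n_folds

def select_best(mean_cv_rmse):
    """Return the index of the combination with the lowest mean CV RMSE."""
    return min(range(len(mean_cv_rmse)), key=lambda i: mean_cv_rmse[i])
```

The grid contains 3 × 4 × 2 × 4 = 96 combinations, so a full search with 10 folds corresponds to 960 model fits per nutrient.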

Accuracy assessment and uncertainty analysis

The models for available-P and exchangeable-K were validated using independent test datasets, representing 30% of the total samples. Model performance and predictive accuracy were evaluated based on multiple statistical metrics, including the coefficient of determination (R2), Lin’s concordance correlation coefficient (LCCC), Root Mean Square Error (RMSE), Mean Error (ME), and the Ratio of Performance to Interquartile Range (RPIQ).
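The five metrics can be computed as in the sketch below. It rests on assumptions the text does not spell out: R2 is taken as 1 − SSE/SST (some studies use the squared Pearson correlation instead), ME is taken as the mean of predicted minus observed, RPIQ as the interquartile range of the observations divided by RMSE, and the quartiles follow the default method of Python's `statistics.quantiles`. The function name and the demo values are hypothetical.

```python
import math
import statistics

def accuracy_metrics(observed, predicted):
    """Compute R2, LCCC, RMSE, ME and RPIQ for paired observations and predictions."""
    n = len(observed)
    mean_o = statistics.fmean(observed)
    mean_p = statistics.fmean(predicted)
    residuals = [p - o for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    sst = sum((o - mean_o) ** 2 for o in observed)
    # Population (divide-by-n) variances and covariance, as used in Lin's CCC.
    var_o = sst / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sse / n)
    # Quartiles of the observed values; RPIQ is undefined when RMSE is zero.
    q1, _, q3 = statistics.quantiles(observed, n=4)
    return {
        "R2": 1.0 - sse / sst,
        "LCCC": 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2),
        "RMSE": rmse,
        "ME": sum(residuals) / n,
        "RPIQ": (q3 - q1) / rmse,
    }

# Made-up values in ppm, for illustration only.
demo = accuracy_metrics([10, 20, 30, 40, 50], [12, 18, 33, 39, 48])
```

LCCC penalizes both scatter and systematic offset from the 1:1 line, whereas RPIQ scales the error by the spread of the observations, which makes it comparable across nutrients with different value ranges.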

Spatial prediction uncertainty was quantified through an ensemble modeling approach using the bootstrapping method. The data splitting process was repeated 25 times, with each iteration using the stratified random sampling approach to split the dataset into 70% training and 30% testing. The RF model was trained on the respective training subset for each bootstrapping iteration and applied to predict phosphorus and potassium values across all cropland areas. This generated 25 independent prediction maps for each nutrient. The ensemble of predictions was used to calculate the pixel-based Prediction Interval Ratio (PIR), defined as the width of the 90% prediction interval (difference between the 95th and 5th percentiles of predictions) divided by the predicted median value (50th percentile).
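For a single pixel, the PIR defined above could be computed as in this sketch. The linear-interpolation percentile rule is an assumption (the text does not state which percentile estimator was used), and the function names and demo values are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1 - frac) + sorted_vals[upper] * frac

def prediction_interval_ratio(pixel_predictions):
    """PIR = (95th percentile - 5th percentile) / median for one pixel."""
    vals = sorted(pixel_predictions)
    p05, p50, p95 = (percentile(vals, q) for q in (5, 50, 95))
    # Note: the ratio is undefined where the median prediction is zero.
    return (p95 - p05) / p50

# 25 made-up ensemble predictions (ppm) for one pixel, for illustration only.
demo_predictions = [float(v) for v in range(1, 26)]
demo_pir = prediction_interval_ratio(demo_predictions)
```

Because the interval width is divided by the median, PIR is dimensionless: a value of 0.5 means the 90% interval spans half of the median prediction at that pixel.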

Generating available-P and exchangeable-K prediction maps

The selected covariates for each parameter and the best RF models were used to generate spatial predictions of available-P (ppm) and exchangeable-K (ppm) at 250 m resolution across Morocco’s cropland areas.

All spatial predictions were processed and stored as GeoTIFF files to maintain the spatial reference system and resolution of the input environmental covariates.

Data Records

The generated maps, together with their lower and upper prediction interval bounds, can be downloaded from the Zenodo repository at https://doi.org/10.5281/zenodo.15073257 (ref. 36). The data are organized as follows. ‘available-P_250m_croplands.tif’: The zip file contains the generated map for the available phosphorus (P2O5 ppm) with a spatial resolution of 250 m and a coordinate system WGS84 (EPSG:4326). ‘exchangeable-K_250m_croplands.tif’: The zip file contains the generated map for the exchangeable potassium (K2O ppm) with a spatial resolution of 250 m and a coordinate system WGS84 (EPSG:4326).

Technical Validation

Prediction accuracy

Using the RFE method, 15 and 50 covariates were selected as important covariates for the prediction of available-P (ppm) and exchangeable-K (ppm), respectively. Fig. 3 presents the variable importance of all selected covariates together with the spatial distribution of the top three features for each parameter; climatic and topographic covariates contribute strongly. The predictive accuracy of the models for both parameters is given in Table 1 and Fig. 4.

Fig. 3 Maps of the top three covariates and the importance of variables for predicting phosphorus and potassium.

Table 1 Best RF model accuracy performances under train and test datasets.
Fig. 4 Random Forest model performance for available-P (ppm) and exchangeable-K (ppm) prediction for training and validation using bootstrapping method.

For exchangeable-K, the initial model achieved a high R2 of 0.95 and LCCC of 0.97 on the training dataset, with an RMSE of 34 ppm and MAE of 23 ppm. The model maintained strong performance using the validation dataset, with R2 of 0.80, LCCC of 0.88, and RMSE of 67 ppm. Similarly, the available-P model demonstrated robust performance on both training data (R2 = 0.95, LCCC = 0.97, RMSE = 17 ppm) and test dataset (R2 = 0.78, LCCC = 0.87, RMSE = 35 ppm). Box plots (Fig. 4) also show that model performance metrics remained stable under bootstrapping (25 times repeated resampling), particularly for LCCC and R2, indicating robust model behavior. In addition, the RPIQ values were consistently higher for exchangeable-K compared to available-P across bootstrap iterations, suggesting better relative prediction accuracy for potassium regardless of training data composition.

Spatial mapping and accuracy analysis

Figures 5 and 6 present the spatial distribution of available-P (ppm) and exchangeable-K (ppm) concentrations across Moroccan croplands, and their associated uncertainty. The nutrient concentration maps are classified according to Moroccan agricultural standards13,37, ranging from very low to very high fertility levels. For phosphorus, the classification ranges from very low (<15 ppm) to very high (>100 ppm), while potassium is categorized from very low (<60 ppm) to very high (>300 ppm). The spatial patterns reveal distinct regional variations in nutrient status across Morocco’s agricultural landscapes. The corresponding uncertainty maps display the Prediction Interval Ratio (PIR90%), with consistently low values (generally <0.8 for phosphorus and <0.5 for potassium) across most cropland areas. These low uncertainty values are consistent with the performance metrics observed during model validation (notably the high R2 values), indicating that the models are stable and reliable in predicting both soil nutrients. Overall, areas with higher uncertainty values typically correspond to regions with limited sampling density.
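As a small illustration of how the quoted class limits apply to a pixel value, the sketch below encodes only the two outer thresholds stated in the text. The intermediate class boundaries of the Moroccan standards are not given here, so all values between the outer limits are grouped under a placeholder label; the dictionary keys, function name, and the "intermediate" label are assumptions introduced for this example.

```python
# Outer class limits (ppm) quoted in the text: (very low below, very high above).
OUTER_LIMITS_PPM = {
    "available_P": (15.0, 100.0),
    "exchangeable_K": (60.0, 300.0),
}

def outer_class(nutrient, value_ppm):
    """Label a value as 'very low', 'very high', or the placeholder 'intermediate'."""
    very_low_below, very_high_above = OUTER_LIMITS_PPM[nutrient]
    if value_ppm < very_low_below:
        return "very low"
    if value_ppm > very_high_above:
        return "very high"
    # The low, medium and high sub-classes are not specified in the text.
    return "intermediate"
```

For example, `outer_class("available_P", 10)` returns `"very low"` and `outer_class("exchangeable_K", 350)` returns `"very high"`.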

Fig. 5 Spatial distribution of available-P (ppm) and associated uncertainty (expressed by Prediction Interval Ratio (PIR)).

Fig. 6 Spatial distribution of exchangeable-K (ppm) and associated uncertainty (expressed by Prediction Interval Ratio (PIR)).

Limitations and future work

This study presents several limitations that should be acknowledged. The spatial distribution of soil samples, while comprehensive at the national scale, remains uneven across Morocco’s diverse agricultural landscapes, with some regions having lower sampling density than others. This sampling heterogeneity may affect prediction accuracy in undersampled areas, as reflected in the uncertainty maps. Additionally, the temporal inconsistency of input data, spanning collection periods from 2010 to 2022, introduces potential variability that may not fully capture recent changes in soil nutrient status. The reliance on remote sensing covariates of variable quality and temporal resolution further contributes to prediction uncertainty, particularly in areas with complex topography or frequent cloud cover.

Another important limitation is the focus on topsoil layers (0–30 cm), which provides limited information about nutrient distribution in deeper soil horizons that may be relevant for certain crops and management practices. The 250-meter spatial resolution, while appropriate for national-scale applications, may not capture fine-scale variability needed for precision agriculture at the field level.

Future research efforts should address these limitations through expanded sampling campaigns targeting undersampled regions to improve spatial coverage and prediction accuracy. Priority should be given to collecting new samples across diverse agroecological zones and establishing collaborations with other Moroccan research institutes to compile a comprehensive national soil database. The long-term objective is to develop an integrated platform that provides both raw soil data and derived raster products for the research community, subject to data sharing agreements.

Future work will also explore higher spatial resolution products, potentially at 30-meter resolution, to better support precision agriculture applications. This study represents a first attempt to demonstrate the advantages of integrating machine learning with environmental covariates for soil mapping in Morocco, establishing a methodological framework that can be recommended for future soil mapping projects as an alternative to traditional interpolation methods employed in previous studies.

Usage Notes

The generated maps of available phosphorus (ppm) and exchangeable potassium (ppm) across Moroccan croplands, developed at a 250 m resolution, provide an essential foundation for understanding soil nutrient distribution nationwide. Being the first maps of their kind, they offer information that can support sustainable agricultural practices, allowing policymakers to make better decisions and farmers to manage fertilizer more precisely. In other words, these maps can help optimize crop productivity, reduce nutrient imbalances and guide targeted interventions where soil amendments are most needed.

Researchers can also exploit these maps to explore nutrient-crop interactions, assess the impact of soil management practices on long-term soil health, and use them as baseline data for modeling studies in agronomy, soil, hydrology, and environmental sciences. Additionally, these maps represent essential inputs for a wide range of applications, including land degradation assessments, crop yield prediction models, and studies on the impacts of climate change, where soil nutrient data is a crucial factor.

Their utility extends beyond farm-level decision-making to broader agricultural and environmental planning. Specifically, these maps facilitate the design of efficient soil monitoring networks, inform evidence-based agricultural policies, and support sustainable land management strategies. They also contribute to national and regional initiatives aimed at enhancing food security, optimizing fertilizer use, and promoting environmental sustainability by enabling data-driven interventions for soil conservation and ecosystem resilience. Finally, these maps bridge data gaps and support existing global and African efforts to generate and model soil information7,38.