Abstract
Soil contamination by metals and metalloids (metal[loid]s) is a global issue with significant risks to human health, ecosystems, and food security. Accurate risk assessment depends on understanding metal(loid) mobility, which dictates bioavailability and environmental impact. Here we show a theory-guided machine learning model that predicts soil metal(loid) fractionation across the globe. Our model identifies total metal(loid) content and soil organic carbon as primary drivers of metal(loid) mobility. We find that 37% of the world’s land is at medium-to-high mobilization risk, with hotspots in Russia, Chile, Canada, and Namibia. Our analysis indicates that global efforts to enhance soil carbon sequestration may inadvertently increase metal(loid) mobility. Furthermore, we uncover a divergence between the spatial distributions of total and mobile metal(loid)s in Europe. These findings offer crucial insights into global distributions and drivers of soil metal(loid) mobility, providing a robust tool for prioritizing metal(loid) mobility testing, raising awareness, and informing sustainable soil management practices.
Introduction
Soil, a non-renewable and indispensable natural resource, underpins vital ecosystem functions1. However, the accumulation of metals and metalloids (hereafter referred to as metal[loid]s) in soil, whether naturally occurring or human-induced, poses a global threat with profound health implications for humans2, livestock3, and wildlife4. The threat stems from their toxicity, widespread occurrence, resistance to biodegradation, and potential for uptake by crops5. High contents of metal(loid)s in soil have been detected worldwide, notably in the European Union6 and China7, impacting vast populations. A global assessment revealed that over 23 million people inhabiting floodplains, which account for approximately 2.8% of the Earth’s land area8, are affected by mining-related metal(loid) accumulation9. With projected population growth and escalating demand for agricultural land10,11, the detrimental impacts of soil metal(loid) contamination on ecosystems and human health are likely to intensify.
To assess the risks associated with soil metal(loid) contamination accurately and inform effective remediation and policymaking12, it is crucial to consider not only the total content of these pollutants but also their mobility within the soil environment1. Metal(loid) mobility dictates their bioaccessibility for plant uptake and subsequent entry into the food chain1, making it a key determinant of their environmental fate and potential risks6,13,14. A comprehensive understanding of metal(loid) mobility in global soils, which is largely governed by chemical speciation, remains a critical gap in the literature.
Obtaining comprehensive, fine-resolution soil metal(loid) speciation data at continental or global scales remains a significant challenge, hindering efforts to assess and understand the global mobility of such pollutants. Although advanced analytical techniques, such as high-energy resolution X-ray absorption spectroscopy and isotope dilution, can accurately determine metal(loid) speciation15,16, their specialized nature and limited availability restrict their widespread application. Sequential extraction experiments, although widely used to fractionate metal(loid)s and evaluate bioaccessibility17,18,19, have not yet been used to consistently and reliably determine fractionation patterns at continental or global scales. This is attributed to the labor-intensive nature of sequential extraction experiments, the heterogeneous nature of soil properties, and the interplay of numerous confounding factors (detailed in Supplementary Note 1). Inconsistent results from previous chemical fractionation studies1 further underscore the critical knowledge gap in the fractionation and mobility analysis of soil metal(loid)s.
The availability of big Earth data and advancements in machine learning (ML) have enabled the development of global predictive environmental models, even with limited sample data20,21,22. Leveraging this opportunity, we compiled a global dataset of 30,832 field measurements of soil metal(loid) fractionation across 5 continents and 56 countries/territories (Methods and Supplementary Fig. 1). We then developed a theory-guided ML model to predict the fractionation of various metal(loid)s, which was trained and validated using the compiled dataset plus an external dataset with 4,900 literature- and experiment-derived data points (Methods). By integrating global topsoil properties at a 5-km resolution, our model uncovered the spatial distributions of the most prevalent metal(loid)s (Pb, As, Cd, Cr, Cu, and Hg), elucidating the fundamental drivers of their mobility in soil. We identified global hotspots of metal(loid) mobilization risk and evaluated their spatiotemporal changes under elevated contamination scenarios and potential soil carbon sequestration efforts. A detailed inspection of mobile metal(loid) content in the European Union (EU) member states was also conducted. This study highlights the dynamic nature of metal(loid) mobility, providing insights for mitigating global soil pollution and supporting sustainable soil management. The present model can also be used to aid soil metal(loid) assessment, raise awareness, and prioritize further testing of metal(loid) mobility at both local and global scales.
Results
Global patterns and key drivers of soil metal(loid) mobility
The developed theory-guided ML model, validated using global data, accurately predicts the distribution of metal(loid)s across four fractions with decreasing mobility: exchangeable, water- and acid-soluble (F1), reducible (F2), oxidizable (F3), and residual (F4) (Methods and Supplementary Fig. 2). The F1 fraction, indicative of the most mobile metal(loid) fraction1, poses the greatest threat to the surrounding environment23. Consequently, its mass percentage, denoted as PF1, was regarded as the mobility index in this study. The ML model was employed to predict the baseline mobility, defined as PF1 at the soil baseline contamination threshold, of 22 metal(loid)s globally at a 5-km resolution (see “Methods”). The baseline mobility measures the most mobile metal(loid) content once baseline contamination occurs; a high baseline mobility represents a high likelihood of a soil metal(loid) being mobilized into soil pore water. Thus, the mobility analysis in the present study indicates the spillover risk of soil metal(loid) contamination to water and food security, and the associated risk is termed the mobilization risk in subsequent analyses.
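As a minimal sketch of the mobility index defined above, PF1 can be computed as the F1 share of the metal(loid) content in a sample. The function name and the sample values are hypothetical, and using the sum of the four fractions as the denominator is an assumption made for illustration (the Methods use the digested total content where it is available).

```python
def mobile_fraction_percent(f1, f2, f3, f4):
    """Return PF1: the F1 share (in %) of the summed four fractions.

    Assumption: the total content is approximated by F1 + F2 + F3 + F4.
    """
    total = f1 + f2 + f3 + f4
    if total <= 0:
        raise ValueError("the summed fraction contents must be positive")
    return 100.0 * f1 / total


# Hypothetical Pb contents (mg/kg) in the four fractions of one soil sample.
pf1 = mobile_fraction_percent(f1=6.0, f2=9.0, f3=5.0, f4=30.0)

# Medium-to-high mobility is flagged when PF1 exceeds 10%.
is_medium_to_high = pf1 > 10.0
```

With these illustrative values, F1 accounts for 6 of 50 mg/kg, so the sample would fall in the medium-to-high mobility class.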
This global assessment revealed substantial spatial variability and element-specific differences in baseline mobility, as illustrated for most prevalent metal(loid)s in Fig. 1a–f. Taking Pb as an example (Fig. 1a), the medium-to-high level of mobility (PF1 > 10%, see “Methods”) occurred in approximately 12.03% of the global topsoil (excluding Antarctica), mostly distributed in the northern USA, southern Canada, North Africa, southern Europe, and southern China. For other prevalent metal(loid)s, including As, Cd, Cr, Cu, and Hg, the global land fractions with medium-to-high mobilization risk amounted to 39.96%, 83.93%, 22.85%, 25.49%, and 40.34%, respectively. Globally, the average proportion of land at medium-to-high mobilization risk was 37.43% for the most prevalent metal(loid)s (Supplementary Table 1).
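The reported global average can be checked directly from the six element-specific land fractions quoted above. The snippet below is only an arithmetic check and assumes an unweighted mean across the six metal(loid)s; the variable names are illustrative.

```python
# Share of global topsoil (%) with PF1 > 10% at baseline, as quoted in the text.
land_share_percent = {
    "Pb": 12.03,
    "As": 39.96,
    "Cd": 83.93,
    "Cr": 22.85,
    "Cu": 25.49,
    "Hg": 40.34,
}

# Unweighted mean across the six prevalent metal(loid)s.
mean_share = sum(land_share_percent.values()) / len(land_share_percent)
print(f"{mean_share:.2f}%")  # prints 37.43%
```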
Fig. 1. a–f Modeled global distribution of the mobile fraction for Pb, As, Cd, Cr, Cu, and Hg at baseline scenarios. The medium-to-high level of mobility (PF1 > 10%) is indicated in red. g Average PF1 values for 22 metal(loid)s across the globe. The TMC for baseline mobility analysis was selected based on worldwide regulatory guidance values (RGVs), primarily those of the US Environmental Protection Agency (USEPA) (Supplementary Tables 2 and 3). h Shapley additive explanations (SHAP) analysis for the mobile fraction (PF1). The impact of the top 10 important descriptors is illustrated through a swarm plot of their corresponding SHAP values. The color of the dot represents the relative value of the descriptor in the dataset (high-to-low depicted as orange-to-blue). The horizontal location of the dots shows whether the effect of that descriptor contributes positively or negatively in that prediction instance (x-axis). The gray bar plot shows the mean absolute SHAP values of the descriptors in each instance, which implies the overall contribution of the descriptor to the PF1 prediction. OC and CEC represent organic carbon and cation exchange capacity, respectively. Source data are provided as a Source Data file.
Among 22 metal(loid)s covered in the current study, uranium, zinc, and sodium exhibited the highest baseline mobility, with global average PF1 values of 41.50%, 38.10%, and 21.04%, respectively, whereas iron, vanadium, and aluminum showed the lowest baseline mobility, with global average PF1 values of 1.82%, 3.23%, and 4.25%, respectively (Fig. 1g). Notably, the mobilization risk evidently increased with the contamination level. Given the elevated contamination levels (Methods), the global average land fraction with PF1 > 10% for the most prevalent metal(loid)s would expand to 52.73% (Supplementary Table 1), as predicted by the ML model. Evident variations in the increase of the mobile fractions were observed from baseline to elevated contamination scenarios (Supplementary Fig. 3), indicating that soil properties had a crucial influence on the mobility distribution and dynamics across the globe. For example, the mobile Pb fraction increased more evidently in South America and Africa than in other regions under the same increase in total Pb content. However, the mobilization risk hotspots remained similar under both the baseline and elevated contamination scenarios.
Interpretation of the ML model (Methods) revealed that electron affinity and covalent radius were the primary elemental properties influencing mobility, whereas total metal(loid) content (TMC) and organic carbon (OC) were the most influential soil properties (Fig. 1h). The positive influence of TMC on PF1 (Fig. 1h) suggests that metal(loid) mobility increases in highly contaminated soils, implying that the associated mobilization risks may escalate disproportionately faster than the level of contamination. The positive correlation between TMC and PF1 explains the above-mentioned influence of elevated contamination levels on mobilization risk. Additionally, the positive influence of OC on PF1 (Fig. 1h) suggests that soil carbon sequestration, a representative negative carbon emission strategy24, may unintentionally increase mobilization risks in metal(loid)-contaminated soils. Generally speaking, the soil pH is also positively correlated with PF1 (Fig. 1h), although with minor fluctuations. The influence of cation exchange capacity (CEC) and clay on PF1 is highly non-linear, indicating complex influencing mechanisms. For example, certain amounts of clay may stabilize metal(loid)s in soil by providing more adsorption sites, while too much clay can also facilitate metal(loid) leaching due to the decreased amount of highly active components and increased particle solubility.
Escalated metal(loid) mobilization risk under the four per mille initiative
The comprehensive mobilization risk map (Methods) of the most prevalent metal(loid)s in soil revealed a high degree of spatial heterogeneity globally, with a pronounced concentration of high mobilization risk areas in the high-latitude regions of the Northern Hemisphere (Fig. 2a). The predicted mobilization risk hotspots existed on all continents, predominantly in central and northern Asia, northern parts of North America and Africa, southern Oceania, and South America (Fig. 2b–i). Regions characterized by high mobile fractions, such as Algeria25 and northwest China26,27,28, were represented in the comprehensive mobilization risk map.
Fig. 2. a Mobilization risk hotspots of prevalent metal(loid)s (Pb, As, Cd, Cr, Cu, and Hg) across the globe. This plot illustrates the number of prevalent metal(loid)s identified with medium-to-high mobilization risk in baseline scenarios for a specific site; b–i Zoomed-in sections of the typical mobilization risk hotspots. Source data are provided as a Source Data file.
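The hotspot map counts, for each site, how many of the six prevalent metal(loid)s exceed the medium-to-high mobility threshold. A minimal sketch of that count is given below; the function name and the per-site PF1 values are hypothetical and serve only to illustrate the counting rule.

```python
THRESHOLD_PERCENT = 10.0  # PF1 above this value is treated as medium-to-high risk


def risk_count(pf1_by_element, threshold=THRESHOLD_PERCENT):
    """Count the metal(loid)s whose predicted PF1 exceeds the threshold."""
    return sum(1 for value in pf1_by_element.values() if value > threshold)


# Hypothetical predicted PF1 values (%) for one 5-km grid cell.
site = {"Pb": 8.2, "As": 14.5, "Cd": 31.0, "Cr": 6.1, "Cu": 11.3, "Hg": 12.9}

count = risk_count(site)  # As, Cd, Cu, and Hg exceed 10%, so the count is 4
```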
High mobilization risk strongly corresponded with the high OC content (Supplementary Fig. 4). OC was the primary soil property driver of PF1, second only to TMC, as revealed by feature importance analysis (Fig. 1h). Moreover, OC was the most important soil property driver for As, Cr, and Cu, and it had an important influence on the PF1 of other prevalent metal(loid)s (Supplementary Fig. 5). Global efforts to promote soil carbon sequestration as a negative emission strategy may lead to a potentially escalated mobilization risk of soil metal(loid). The ML model allows us to evaluate the change in the mobile fraction (PF1) of various metal(loid)s in future scenarios of enhanced soil OC.
The Four Per Mille Initiative aims to increase the soil OC content by 0.4% annually to improve soil health and combat climate change24,29,30. However, this increase in soil carbon could inadvertently heighten the mobilization risk of metal(loid)s. If this initiative were consistently fulfilled from 2024 to 2050, global soils would experience a considerable increase in the mobilization risk of metal(loid)s (Fig. 3a). Specifically, 1% of global land (excluding Antarctica) was predicted to be at medium-to-high mobilization risk for all six prevalent metal(loid)s in 2024; this proportion was 9% for five prevalent metal(loid)s and 13% for four prevalent metal(loid)s (Fig. 3b). According to our model predictions, the global land proportion at medium-to-high mobilization risk of contamination by at least four prevalent metal(loid)s would rise to 82% by 2050 if the Four Per Mille Initiative were fully implemented, representing a significant increase from 23% in 2024 (the sum of the 1%, 9%, and 13% shares above).
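To give a sense of the magnitude of the OC change fed into such a scenario, the sketch below projects a soil OC content forward from 2024 to 2050. It assumes that the 0.4% annual increase is relative and compounds yearly; the text does not state how the scenario was implemented, so this is an illustrative assumption, and the function name and starting value are hypothetical.

```python
def project_oc(oc_start_percent, start_year=2024, end_year=2050, annual_rate=0.004):
    """Project soil OC content (%) under a compounding relative annual increase.

    Assumption: the 0.4% per year target is applied multiplicatively each year.
    """
    years = end_year - start_year
    return oc_start_percent * (1.0 + annual_rate) ** years


# Hypothetical topsoil with 2.0% OC in 2024.
oc_2050 = project_oc(2.0)
relative_gain = oc_2050 / 2.0 - 1.0  # fractional increase over the 26 years
```

Under this assumption the OC content grows by roughly 11% over the 26 years, regardless of the starting value.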
Fig. 3. a Global risk hotspots in 2050 after the adoption of soil carbon sequestration under the Four Per Mille Initiative; b Global and national comparison of the mobilization risk hotspots in 2024 and 2050. This plot illustrates the number of prevalent metal(loid)s identified as posing medium-to-high mobilization risk in baseline scenarios for a specific site. The bar charts depict the percentage of land identified as at risk from various numbers of prevalent metal(loid)s, highlighting the countries that face significant mobilization risks. Source data are provided as a Source Data file.
An increase in the mobilization risk of metal(loid)s occurs in most countries, and spatial variation exists because of the coupled influences of many factors (e.g., soil properties and the employed baseline scenarios). National analysis revealed that 65% of Russian land was at medium-to-high mobilization risk for at least four prevalent metal(loid)s in 2024, whereas the proportions for Chile, Canada, and Namibia were 43%, 36%, and 32%, respectively. The three countries with the largest land proportions at medium-to-high mobilization risk for six prevalent metal(loid)s were Canada (4%), Russia (1%), and Namibia (1%) in 2024. Pakistan, China, and Namibia were expected to experience the most substantial increases in the mobilization risk of metal(loid)s due to the implementation of soil carbon sequestration, whereas Russia was expected to experience only a modest increase in the mobilization risk (Fig. 3b). Although the PF1 values of the six prevalent metal(loid)s generally increased with increasing OC content, the rate of increase diminished at higher OC levels (Supplementary Fig. 6). The average OC contents in Pakistan (1.73%), China (3.66%), and Namibia (0.59%) were much lower than that in Russia (14.87%) in 2024, which explains the more pronounced increase in their mobilization risk under the Four Per Mille Initiative.
Spatial disparities in total and mobile metal(loid) contents in EU soils
By combining the ML model with TMC data from the Land Use/Land Cover Area Frame Survey (LUCAS), we generated contour maps (Fig. 4 and Supplementary Fig. 7) illustrating the mobile metal(loid) contents in the soil (PF1 × TMC) of EU member states in 2009 (Methods). The disparity between the distributions of TMC and mobile metal(loid) content reflected the complex factors influencing mobility and underscores the importance of leveraging the ML model to assess mobilization risk. For example, the total Pb content ranged between 1.99 and 151.12 mg/kg, with an average of 15.07 mg/kg. High levels of Pb contents were observed around central United Kingdom (UK), central Germany, and central Italy. The UK, Germany, Italy, and France, with averages of 26.52, 21.96, 20.78 and 18.46 mg/kg, respectively, ranked among the top EU member states for total Pb content. The UK exhibited the highest mobile Pb content; however, Italy exceeded Germany in terms of mobile Pb (Fig. 4b, d).
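The mobile content used in these maps is the product PF1 × TMC. The sketch below shows how a country with a lower total content can still rank higher in mobile content. The total Pb means are those quoted above, whereas the PF1 values are hypothetical placeholders chosen only to illustrate the rank change; they are not model outputs.

```python
def mobile_content(tmc_mg_per_kg, pf1_percent):
    """Mobile metal(loid) content (mg/kg) as PF1 (in %) times total content."""
    return tmc_mg_per_kg * pf1_percent / 100.0


# Mean total Pb contents (mg/kg) quoted in the text.
total_pb = {"Germany": 21.96, "Italy": 20.78}

# Hypothetical mean PF1 values (%), for illustration only.
pf1_pb = {"Germany": 8.0, "Italy": 10.0}

mobile_pb = {
    country: mobile_content(total_pb[country], pf1_pb[country])
    for country in total_pb
}

# Rankings by total and by mobile content can differ.
rank_by_total = sorted(total_pb, key=total_pb.get, reverse=True)
rank_by_mobile = sorted(mobile_pb, key=mobile_pb.get, reverse=True)
```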
Fig. 4. a Distribution of total Pb content, along with zoomed-in sections in central Europe; b Mean total Pb content of European countries with more than 500 soil samples; c Modeled distribution of mobile Pb content, along with zoomed-in sections in central Europe; d Modeled mean mobile Pb content of European countries with more than 500 soil samples. The total and mobile contents in maps are classified using the geometric interval method in the color spectrum. e–i Mean total and mobile contents of As, Cd, Cr, Cu, and Hg for the top five EU countries. The orange bars on the left-hand side y-axis represent the mobile metal(loid) content, and the blue markers on the right-hand side y-axis represent the total metal(loid) content. The countries are ordered based on their mobile metal(loid) content. UK represents United Kingdom. Source data are provided as a Source Data file.
The areas predicted to have the most significant differences in the spatial distributions of total and mobile Pb included southern Germany, southern UK, and northern France, primarily driven by high soil pH (Supplementary Fig. 8)31. Spatial differences in total and mobile Pb contents were also observed in northern Germany and Sweden, where they were influenced by CEC values and high OC values (Supplementary Fig. 8). Differences in the rankings of EU countries in terms of the total and mobile contents of As, Cd, Cr, Cu, and Hg were also apparent (Fig. 4e–i and Supplementary Fig. 7). When the rankings by total and mobile contents are compared, Austria drops from fourth to fifth for Cu, and Ireland drops from second to third for Hg. The mobility prediction model also revealed an evident increase in mobile metal(loid) content across EU soils when soil carbon sequestration is adopted (Supplementary Fig. 9).
Discussion
Herein, we present a pioneering application of theory-guided ML to predict soil metal(loid) mobility, addressing the challenges posed by the global heterogeneity of soil properties. The validated soil metal(loid) fractionation model, which enables the fractionation prediction of 49 metal(loid)s in diverse soil types, is expected to significantly reduce the resources and time required for geological surveys. Therefore, our soil metal(loid) mobility model enables the creation and routine updating of global metal(loid) mobility maps.
The metal(loid) mobility model can be used as a guide for prioritizing metal(loid) mobility testing and for raising awareness, for example, in central and northern Asia, northern parts of North America and Africa, southern Oceania, and South America. Prioritizing further metal(loid) mobility testing holds more importance for low- and middle-income countries with extensive crop production (e.g., southern Brazil and northern Argentina), where the current metal(loid) mobility condition is largely unknown and limited testing resources must be employed effectively in the future. We note that the presented soil metal(loid) mobility maps cannot be used to guide site-scale assessment and remediation under the current resolution (5 km × 5 km), particularly because site-scale (<100 m) heterogeneous soil metal(loid)s and properties cannot be obtained from existing global datasets32. Nevertheless, our soil metal(loid) mobility model can still support site-scale soil metal(loid) assessment and remediation by substituting, at least partially, costly and time-consuming sequential extraction experiments. Thus, embracing our model in site-scale studies can provide practitioners with a unique ability to assess soil metal(loid) mobility rapidly and reliably.
Our results suggest a disproportionate increase in the mobilization risk compared with the contamination level, highlighting the need for more tailored mobility investigations in specific countries/regions with escalating metal(loid) contamination. For example, Chinese soils are anticipated to experience increasing As contamination33. Our findings indicate that the resulting increase in As mobility, particularly in regions with high arsenic levels such as southeast China, could exacerbate the threat to food safety and human health. Furthermore, the mobility distribution maps can be evaluated together with data on the major metal(loid) sources, guiding detailed mobility testing, aiding in industrial planning, and supporting crop management (Supplementary Note 2). For example, south Chile, Namibia, and Kazakhstan were found to be the mobilization risk hotspots with a high probability of soil metal(loid) accumulation due to natural and anthropogenic sources, which should undergo detailed metal(loid) mobility testing and monitoring.
Our metal(loid) mobility model also provides timely insights into how metal(loid) mobility changes with related descriptors, which can provide essential information for designing remediation measures (Supplementary Note 3). Examining the collective effect of soil attributes on the mobility of metal(loid)s facilitates the classification of all investigated metal(loid)s into three groups on the basis of the influence of soil attributes (Supplementary Note 3). This type of metal(loid) grouping can serve as a benchmark for guiding soil remediation efforts when specific data are lacking for a particular metal(loid) but are abundant for similar metal(loid)s in the same group. Moreover, the metal(loid) mobility model can advance our remediation strategies by providing more tailored mobility responses to varying soil attributes (e.g., soil pH), thus suggesting the most efficient pathway to stabilize metal(loid)s in soils. For example, there might exist a safe pH range, varying among metal(loid)s and conditions, within which PF1 can be evidently decreased. Thus, more tailored safe pH ranges provided by our ML models can promote more precise chemical stabilization techniques for site-specific remediation.
The predicted soil metal(loid) mobility maps have substantial implications for precise soil management. The mobility maps highlight the evident spatial disparity in metal(loid) mobilization risk at the same total content, making it unfeasible to establish singular RGVs worldwide or for countries with large land areas and diverse soil types. Establishing new RGVs that consider both mobile and total contents serves as a necessary step towards improved soil regulation. RGVs can be further tailored to the local environment; thus, different RGVs should be proposed for different regions on the basis of soil heterogeneity, instead of using singular RGVs across the world or over entire countries.
Our results also suggest that soil carbon sequestration, although promising for mitigating greenhouse gas emissions, can significantly increase metal(loid) mobility in the soil. Thus, the adoption of soil carbon sequestration worldwide as a key carbon sequestration strategy might be considered highly risky. Instead, countries must incorporate diverse soil carbon sequestration strategies to address the spatial variation in metal(loid) mobilization risk increases. Acknowledging such mobilization risk variation caused by soil carbon sequestration is also an important step towards achieving sustainable soil carbon management.
Despite its promise, our metal(loid) mobility framework has limitations with respect to the data and methods. The collection of additional fractionation data, ideally in a coordinated manner, from areas that currently have poor coverage would benefit future modeling efforts by allowing the model to be better adapted to these areas. Another notable limitation arises from the use of metal(loid) fractionation to represent mobility. Sequential extraction has been widely used to fractionate soil metal(loid)s into operationally defined forms according to their physicochemical mobility and potential availability34. This method has been proven to be useful in the field of environmental analytical chemistry35 and can work as a common scheme so that results from different researchers can be compared and an international database can be established36,37. Although positive correlations between the mobile content in soil and metal(loid) content in plants have been reported38, the correlation between bioavailability and fractionation data remains controversial16, partly due to the lack of a precise definition of bioavailability and of standard methods for its measurement39. Therefore, particular care should be taken when interpreting the current results with respect to metal(loid) bioavailability. We also acknowledge that our metal(loid) mobility model primarily focuses on the mobility of soil metal(loid)s; therefore, the mobilization risk analysis could shed light on the likelihood of a metal(loid) in soil being mobilized under certain contamination scenarios. In other words, the global mobility maps of soil metal(loid)s cannot represent their actual risk, since the global distribution of the soil metal(loid) content is unavailable, and the source–pathway–receptor interactions are not considered.
In conclusion, this study demonstrates the potential of ML in transforming soil metal(loid) fractionation analysis, providing a foundation for more advanced, tailored, and accurate ML approaches in soil science.
Methods
Assembling a global database for metal(loid) fractionation in soil
We systematically reviewed the literature (on March 1, 2023) with a keyword search of studies published since 1975 in the Web of Science and Google Scholar: TOPIC: [“soil” AND “heavy metal” AND (“fraction*” OR “distribution” OR “speciation” OR “sequential extraction”)]. The initial search yielded more than 10,000 peer-reviewed studies and theses from the university directories. Considering the simple implementation, repeatable fractionation results, and data availability40, we focused on studies involving metal(loid) fractionation from the Community Bureau of Reference (BCR) four-step sequential extraction (N ≈ 1000 studies). We fully reviewed these papers to identify studies with detailed data on soil properties (≥ two soil descriptors), total metal(loid) content, and metal(loid) fractionation (N = 272). To mitigate the potential problems associated with including too many studies from various experimental settings, we excluded studies with a limited number of data points (< 12). The process employed for literature screening and data collection is shown in Supplementary Fig. 10. We extracted all available data from the remaining studies (N = 207, Supplementary Table 4), and used WebPlotDigitizer to extract data from plots, images, and/or maps41.
F1–F4 in the BCR correspond to fractions bound to carbonates, Fe–Mn oxides, organic matter, and crystal minerals, respectively42. Among them, F1 is the most mobile fraction that poses the greatest threat to the surrounding environment23. A specific soil and metal(loid) combination has four data points, considering four BCR fractions, leading to a global dataset (N = 207 studies) with 30,832 data points from 5 continents spanning 56 countries/territories (Supplementary Fig. 1). It contained 49 metal(loid)s (Supplementary Fig. 11), diverse soil textures and types, and all land cover groups (cropland, forest, shrubland, grassland, bare land, peat, and marsh). We note that the data points were unevenly distributed in the global dataset, as is often the case in global soil investigations22,43; nevertheless, our subsequent reliability and uncertainty analyses of the trained ML models imply their robustness and generalization capability to unknown samples and regions.
Clay content (%), pH, CEC (cmol+/kg), and OC (%) were identified as key descriptors of soil, considering their importance in metal(loid) fractionation1,44, availability of data, and feasibility of the trained model for global application (these four descriptors have been well documented in the global soil database). We used the measured metal(loid) content after soil digestion with HCl–HNO3–HF–HClO4 as the total metal(loid) content, and when it was not provided, the sum of BCR extractions was used. The metal(loid) type was replaced with 32 elemental descriptors to facilitate their utilization in ML modeling45. Overall, the global dataset included 37 inputs (4 descriptors for soil properties, 32 descriptors for metal[loid]s and 1 for the total metal[loid] content), and the output was the fractionation percentage under the coupled influence of 37 descriptors. Data conversion and imputation were then implemented to prepare the final and complete global dataset (Supplementary Note 4).
The strong and well-explained input variation within the aggregated dataset improves the robustness and application prospects of the ML models trained on it (Supplementary Note 5)46. No significant linear correlation was observed between the descriptors and metal(loid) fractionation (Supplementary Note 6), indicating a coupled, nonlinear relationship. The challenge in predicting metal(loid) fractionation is further complicated by the significant variations among various metal(loid)s (Supplementary Note 7).
Metal(loid) mobility modeling using theory-guided ML
We validated and compared seven ML algorithms, including deep neural network (DNN)47,48, random forest (RF)49, gradient boosting machine (GBM)50, extreme gradient boosting machine (XGBoost)51, light gradient boosting machine (LightGBM)52,53, support vector machine (SVM)54, and logistic regression (LR)55. First, we split the total number of data points into training and test sets using repeated 90/10 random splits after a comprehensive convergence test (Supplementary Fig. 12). We used the training set to determine the best ML algorithms and set of hyperparameters, as well as train the final model. We used the test set and a brand-new independent test dataset to assess the generalization capability of the trained model.
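A repeated 90/10 random split can be sketched with the standard library alone, as below. The number of repeats, the seed, and the function name are hypothetical choices for illustration; the study determined its split settings through a convergence test that is not reproduced here.

```python
import random


def repeated_splits(n_samples, n_repeats=5, test_fraction=0.1, seed=0):
    """Yield (train_indices, test_indices) pairs from repeated random splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # The first n_test shuffled indices form the test set, the rest train.
        yield indices[n_test:], indices[:n_test]


# Example with 100 hypothetical data points.
splits = list(repeated_splits(100))
train_idx, test_idx = splits[0]
```

Each repeat draws a fresh random partition, so a data point may appear in the test set of several repeats or of none.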
We then conducted ML modeling through the following process: (i) determined the best feature scaling methods (standardization, normalization, and no scaling) for each ML algorithm with their default hyperparameters (Supplementary Table 5); (ii) selected features via five feature selection methods (recursive feature elimination, feature cluster removal, random-forest-based feature selection, and no feature selection); (iii) optimized ML hyperparameters from approximately 4,000 hyperparameter combinations (Supplementary Table 6); (iv) incorporated into the ML models a chemical constraint such that, for a given metal(loid), the sum of all its fractions must equal 100%. Specifically, for a typical fraction prediction, we predicted all fraction percentages (PF1–F4) and normalized that fraction by the sum of PF1–F4. To speed up ML modeling, the above-mentioned modeling process was only applied to the three most robust ML algorithms according to their default performance (Supplementary Table 7). The designed modeling process demonstrated successive improvements in the predictive performance, and the most significant enhancement (76.9% of the total improvement) was achieved by incorporating chemical constraints (Supplementary Fig. 13).
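The chemical constraint in step (iv) can be illustrated as a simple rescaling of the four raw fraction predictions so that they sum to 100%. This is a minimal sketch of the normalization described above; the function name and the raw values are hypothetical, and the handling of non-positive sums is an added safeguard rather than part of the described method.

```python
def apply_sum_constraint(raw_fractions):
    """Rescale raw fraction predictions (%) so that they sum to 100%."""
    total = sum(raw_fractions)
    if total <= 0:
        raise ValueError("the summed raw predictions must be positive")
    return [100.0 * value / total for value in raw_fractions]


# Hypothetical unconstrained predictions for PF1, PF2, PF3, and PF4 (sum = 105).
raw_predictions = [14.0, 22.0, 18.0, 51.0]

constrained = apply_sum_constraint(raw_predictions)
pf1_constrained = constrained[0]  # 14 / 105 of 100%
```

The rescaling preserves the relative proportions of the four fractions while enforcing mass balance.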
All ML models were built and evaluated in Python, with DNN implemented using the Keras package56 and all other models implemented using the scikit-learn package57. We used the grid search method together with five-fold cross-validation (CV) in scikit-learn to select the best hyperparameter combination. The coefficient of determination (R2) from the five-fold CV was utilized as the evaluation indicator during the modeling process. In our case, the best-trained model was XGBoost with standardization, feature cluster removal, the optimized hyperparameter set, and chemical constraints.
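The selection logic, a grid search scored by mean R2 from five-fold CV, can be illustrated without any third-party dependency. The sketch below is not the scikit-learn workflow used in the study: a toy one-dimensional k-nearest-neighbour regressor stands in for the real learners, and the synthetic data, hyperparameter names, and grid values are all hypothetical.

```python
import itertools
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def knn_predict(x_train, y_train, x_query, k, weighted):
    """Toy 1-D k-nearest-neighbour regressor (stand-in for a real learner)."""
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_query))[:k]
    if weighted:
        weights = [1.0 / (abs(x - x_query) + 1e-9) for x, _ in nearest]
    else:
        weights = [1.0] * len(nearest)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def cross_val_r2(x, y, k, weighted, n_folds=5):
    """Mean R2 across n_folds interleaved folds."""
    scores = []
    for fold in range(n_folds):
        test = sorted(range(fold, len(x), n_folds))
        held_out = set(test)
        x_tr = [x[i] for i in range(len(x)) if i not in held_out]
        y_tr = [y[i] for i in range(len(x)) if i not in held_out]
        preds = [knn_predict(x_tr, y_tr, x[i], k, weighted) for i in test]
        scores.append(r2_score([y[i] for i in test], preds))
    return sum(scores) / n_folds

# Synthetic data: a noisy linear relationship
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]

# Evaluate every hyperparameter combination and keep the best mean CV R2
grid = {"k": [1, 5, 15], "weighted": [False, True]}
results = {}
for k, weighted in itertools.product(grid["k"], grid["weighted"]):
    results[(k, weighted)] = cross_val_r2(x, y, k, weighted)
best_params = max(results, key=results.get)
```

In scikit-learn the same pattern is provided by `GridSearchCV` with `cv=5` and `scoring="r2"`; the loop above only makes the underlying bookkeeping explicit.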
After identifying the best-performing ML algorithm and the set of hyperparameters, we trained the optimal XGBoost model for our final predictions and performance verification. The optimal XGBoost model could establish a strong correspondence between the observed and predicted fractionation values on the test set (86% variation explained with a mean absolute error of 6.21) (Supplementary Fig. 13).
Modeling reliability, prediction uncertainty, and descriptor importance
In addition to training-test verification, we performed a series of reliability analyses to assess the out-of-sample error of our model. We compiled an independent test dataset from the most recent studies (1620 data points, Supplementary Table 8) and extensive laboratory experiments (3748 data points). This dataset primarily contained soil samples from China and the EU (details in Supplementary Note 8). Although the optimal XGBoost model was not trained on any measurements from the independent test dataset, it achieved an R2 value of 0.63 (Supplementary Fig. 14), indicating that it can predict the metal(loid) fractionation of previously unseen soil samples with reasonable accuracy33,58. We also performed a hold-out test to show that the trained XGBoost model generalized well to regions not covered during model training (Supplementary Note 9).
We used resampling methods based on the jackknife strategy to estimate robust prediction intervals with the optimal XGBoost model59. Ideally, the uncertainty quantification should yield a c% confidence interval that contains the true value approximately c% of the time60. Our uncertainty analysis showed that 95.49% of the test data fell within the 95% confidence intervals of prediction, close to the nominal level, indicating that the estimated intervals were well calibrated for soil metal(loid) fractionation during testing.
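To make the coverage check concrete, the sketch below builds a basic jackknife interval from leave-one-out residuals and then measures the share of test values that fall inside it. It is an illustration under strong simplifications, not the MAPIE-based procedure used in the study: the "model" is just the training mean, the data are synthetic, and all function names are hypothetical.

```python
import math
import random

def jackknife_interval(y_train, alpha=0.05):
    """Basic jackknife interval for a mean-only model (toy stand-in)."""
    n = len(y_train)
    total = sum(y_train)
    # Leave-one-out residual: |y_i - mean of the remaining n-1 points|
    residuals = sorted(abs(y - (total - y) / (n - 1)) for y in y_train)
    # Empirical quantile of the residuals at level (1 - alpha)(1 + 1/n)
    rank = min(n, math.ceil((1 - alpha) * (n + 1)))
    half_width = residuals[rank - 1]
    centre = total / n
    return centre - half_width, centre + half_width

def coverage(y_true, lower, upper):
    """Share of observations that fall inside [lower, upper]."""
    hits = sum(1 for y in y_true if lower <= y <= upper)
    return hits / len(y_true)

# Synthetic training and test values drawn from the same distribution
rng = random.Random(1)
y_train = [rng.gauss(50, 10) for _ in range(500)]
y_test = [rng.gauss(50, 10) for _ in range(500)]

lower, upper = jackknife_interval(y_train, alpha=0.05)
observed_coverage = coverage(y_test, lower, upper)
```

A well-calibrated 95% interval should give an `observed_coverage` near 0.95, which is the property the 95.49% figure above is checking.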
We calculated the overall descriptor importance on the complete dataset using the permutation importance implementation in scikit-learn61. We then performed a SHAP analysis to investigate the effects of individual descriptors on the F1 fraction, using the TreeSHAP algorithm and the force plot visualization from the SHAP library in Python.
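Permutation importance measures how much a model's score drops when one descriptor is shuffled. The library-free sketch below shows the idea with R2 as the score; it is not the scikit-learn implementation cited above, and the two-feature toy model, data, and function names are hypothetical.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R2 after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - r2_score(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy model in which feature 0 matters far more than feature 1
def toy_model(row):
    return 3.0 * row[0] + 0.1 * row[1]

rng = random.Random(42)
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(200)]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
```

Shuffling the dominant feature destroys most of the predictive signal, so its importance is much larger than that of the weak feature.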
Global metal(loid) mobility distribution and key drivers
To create the global map of soil metal(loid) mobility using the optimal XGBoost model, we sampled global topsoil properties from SoilGrids version 2.0 at a resolution of 5 km × 5 km32. We converted the units of the SoilGrids data so that the optimal XGBoost model could be applied directly. The resulting database comprised over 7 million topsoil data points with complete soil descriptor information (pH, OC, clay, and CEC). Before applying the optimal XGBoost model globally, we assessed the degree of interpolation versus extrapolation62. The model required very limited extrapolation to soil properties outside the range of the training data (0.13% for soil OC and 0% for the other soil properties, Supplementary Table 9), supporting the reliability of its global application.
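One simple way to quantify extrapolation is to count, for each soil property, the share of global grid cells whose value lies outside the minimum-to-maximum range seen in training. The sketch below assumes this univariate range check; the exact procedure used in the study may differ, and the function name and values are hypothetical.

```python
def extrapolation_share(training_values, grid_values):
    """Percentage of grid values outside the training min-max range."""
    low, high = min(training_values), max(training_values)
    outside = sum(1 for value in grid_values if value < low or value > high)
    return 100.0 * outside / len(grid_values)

# Hypothetical soil OC values (arbitrary units)
training_oc = [1.0, 2.0, 3.0, 10.0]
grid_oc = [0.5, 2.0, 5.0, 12.0]
share = extrapolation_share(training_oc, grid_oc)
```

A share near zero means the model is almost always interpolating within conditions it has seen.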
As there is no global dataset of metal(loid) content in topsoil, we investigated the global patterns of metal(loid) mobility under baseline contamination scenarios. To determine representative metal(loid) content values for these scenarios, we used regulatory guidance values (RGVs) from leading multinational organizations or national regulations (Supplementary Tables 1 & 2)63. To avoid bias from ML extrapolation and ensure reliable predictions, we analyzed the 22 metal(loid)s whose selected RGVs fell within the TMC range of the global dataset64,65. Together with the topsoil properties in SoilGrids, these values were used to construct global baseline mobility maps for metal(loid)s. The relative uncertainty maps of the global predictions show the standard deviation divided by the mean of the predictions from 20 runs of our final XGBoost model (Supplementary Fig. 15). The relative uncertainty of the global mobile fractions for prevalent metal(loid)s in baseline scenarios (Fig. 1a–f) averaged 9.70%, indicating a small relative uncertainty in the global application of the model66. The metal(loid) mobility model also allowed us to investigate the influence of metal(loid) contamination by selecting representative RGVs for elevated contamination scenarios (Supplementary Table 1). In rare instances during the global application, the model produced negative fraction predictions, which we replaced with zero.
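The relative uncertainty for one grid cell is the standard deviation of the repeated predictions divided by their mean. The sketch below expresses that ratio as a percentage; the function name and values are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

def relative_uncertainty(run_predictions):
    """Standard deviation divided by the mean, in percent."""
    mean = statistics.fmean(run_predictions)
    if mean == 0:
        raise ValueError("mean prediction is zero; ratio is undefined")
    # Assumption: sample standard deviation (n - 1 in the denominator)
    return 100.0 * statistics.stdev(run_predictions) / mean

# Hypothetical mobile-fraction predictions (%) for one cell from repeated runs
cell_predictions = [9.0, 11.0]
cell_uncertainty = relative_uncertainty(cell_predictions)
```

Applying the function to every cell across the 20 runs would produce a map of the kind described above.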
We used the risk assessment code to evaluate the mobilization risk of metal(loid)s in the soil. Specifically, low and medium-to-high mobilization risks were associated with a PF1 of 1–10% and > 10%, respectively67. Combining the baseline mobility maps for single metal(loid)s led to a comprehensive mobility map of the most prevalent metal(loid)s, including Pb, As, Cd, Cr, Cu, and Hg. Here, we considered a specific site to be susceptible to the mobilization risk of a single metal(loid) when it was predicted to be at medium-to-high mobilization risk in the baseline scenario.
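The thresholds above translate directly into a small classifier. In this sketch the two risk bands come from the text, while the label for PF1 below 1%, the handling of exactly 10% as low risk, and the equal-area treatment of grid cells are assumptions; all names and values are hypothetical.

```python
def mobilization_risk(pf1_percent):
    """Classify mobilization risk from the predicted mobile fraction (%)."""
    if pf1_percent > 10.0:
        return "medium-to-high"
    if pf1_percent >= 1.0:
        return "low"
    # Assumption: the text does not name a class for values below 1%
    return "below low-risk band"

def share_at_medium_to_high_risk(pf1_values):
    """Percentage of cells classified as medium-to-high risk."""
    flagged = sum(1 for v in pf1_values if mobilization_risk(v) == "medium-to-high")
    return 100.0 * flagged / len(pf1_values)

# Hypothetical predicted PF1 values (%) for five grid cells
cells = [0.4, 3.2, 10.0, 10.1, 27.5]
risk_share = share_at_medium_to_high_risk(cells)
```

If grid cells differ in area, the share would need area weighting before it can be read as a proportion of land.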
We note that, although these hypothetical scenarios are unrealistic from a physical perspective, they are intended to shed light on the differences in the global hotspots of soil metal(loid) mobility under the influence of metal(loid) content. Therefore, these metal(loid) mobility maps could help raise awareness, improve industrial layout, and promote precise government coordination.
Using the global mobility model, we also evaluated how metal(loid) mobility varies under the influence of human activity. We focused on soil carbon sequestration, a typical human endeavor to tackle global climate change that has been recognized as a promising technique for removing atmospheric CO2 by sequestering C as soil OC. In other words, soil carbon sequestration helps offset the rapid increase in global CO2 emissions by altering soil properties, particularly soil OC. Before investigating its influence on soil metal(loid) mobility, we determined the change in OC that it would cause. A recent international initiative proposed increasing global soil OC by 0.4% per year to mitigate greenhouse gas emissions30, a target that has been well recognized24. Using this rate of increase in soil OC, we simulated the mobile metal(loid) fraction in 2024 and in 2050. The distribution maps of the mobile metal(loid) fraction allowed us to compute the proportion of land at medium-to-high mobilization risk at global and national scales.
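The cumulative OC change implied by a 0.4% annual increase depends on whether the increments are added or compounded. The sketch below computes both for 2024 to 2050. The annual rate and the two years come from the text; treating the increase as additive is an assumption, although it reproduces the 10.4% increase applied in the EU case study. The function name and the example OC value are hypothetical.

```python
def oc_increase_percent(start_year, end_year, annual_rate_percent=0.4, compound=False):
    """Cumulative soil OC increase (%) between two years."""
    years = end_year - start_year
    if compound:
        return 100.0 * ((1.0 + annual_rate_percent / 100.0) ** years - 1.0)
    # Assumption: simple additive accumulation of the annual rate
    return annual_rate_percent * years

additive = oc_increase_percent(2024, 2050)
compounded = oc_increase_percent(2024, 2050, compound=True)

# Scale a hypothetical topsoil OC value by the additive increase
oc_2024 = 20.0
oc_2050 = oc_2024 * (1.0 + additive / 100.0)
```

The scaled OC value can then be fed to the mobility model with all other descriptors held constant to obtain the 2050 scenario.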
Applying ML mobility models across the EU soils
We applied the optimal XGBoost model to spatially map metal(loid) mobility in the soils of EU member states in 2009. We selected EU member states as a case study for several reasons. First, standardized and extensive soil sampling has been conducted in the Land Use/Land Cover Area Frame Survey (LUCAS) project, which covers a large area with a wide range of soil types. Second, comprehensive analyses, including analyses of soil properties and total contents of prevalent metal(loid)s, have been performed in a single laboratory to ensure consistency within the database. Third, high-quality datasets containing the descriptors required for predicting metal(loid) mobility have been made publicly available for EU member states by international agencies. Finally, whereas the total metal(loid) content has already been analyzed68, an analysis of mobility would provide a more accurate assessment of mobilization risk.
The EU dataset was drawn from the LUCAS project and obtained from the European Soil Data Centre (ESDAC). The LUCAS dataset is considered the most comprehensive across continental Europe and has been extensively used for various modeling purposes. We sampled soil properties from the LUCAS 2009 topsoil database69 and extracted the metal(loid) content from the soil contamination maps of the same project68. The resulting dataset, which was used for the metal(loid) mobility prediction, comprised 17,601 soil samples with complete descriptor information from 23 EU countries. The mobile metal(loid) content was calculated by multiplying the mobile fraction (F1) by the total metal(loid) content44. We investigated the impact of soil carbon sequestration on EU metal(loid) mobility using the same modeling strategy applied globally, increasing soil OC content by 10.4% while keeping all other properties constant.
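The mobile content calculation is a single multiplication once the mobile fraction is expressed as a proportion. A minimal sketch follows; the function name, units, and values are hypothetical.

```python
def mobile_content(total_content, pf1_percent):
    """Mobile metal(loid) content, in the same units as total_content."""
    return total_content * pf1_percent / 100.0

# Hypothetical sample: 40 mg/kg total content with a 12.5% mobile fraction
mobile = mobile_content(total_content=40.0, pf1_percent=12.5)
```

Two sites with the same total content can therefore differ widely in mobile content, which is why maps of total and mobile metal(loid)s need not coincide.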
Data visualization and statistical analyses
We visualized the continental and global distribution of metal(loid) fractionation using ArcGIS with Kriging spatial interpolation at a resolution of 5 km × 5 km. We generated the other plots using Matplotlib 3.7.0 and Seaborn 0.12.2 (ref. 70). We performed statistical tests (Shapiro–Wilk test, Wilcoxon signed-rank test, and Pearson correlation) using SciPy 1.10.1 (ref. 71) and conducted uncertainty analysis using MAPIE 0.9.2 (ref. 72). To reduce the fluctuation in performance evaluation caused by random dataset splitting, we used the mean values from 20 repeated dataset splits as evaluation indicators.
Data availability
Relevant data supporting the key findings of this study are available within the paper and the supplementary information file. Source data are provided with this paper and are also available at https://figshare.com/s/fa76e3cb3e0e0709df8d (ref. 73). The soil property and metal(loid) data across the EU can be found in the Land Use/Land Cover Area Frame Survey (LUCAS) 2009 database69 and the topsoil heavy metal maps68. Global soil property data are available from SoilGrids (https://soilgrids.org/).
Code availability
The code to construct the machine learning models is available via Figshare (https://figshare.com/s/fa76e3cb3e0e0709df8d)73.
References
Hou, D. et al. Metal contamination and bioremediation of agricultural soils for food safety and sustainability. Nat. Rev. Earth Environ. 1, 366–381 (2020).
O’Connor, D., Hou, D., Ok, Y. S. & Lanphear, B. P. The effects of iniquitous lead exposure on health. Nat. Sustain. 3, 77–79 (2020).
Thornton, I. & Abrahams, P. Soil ingestion—a major pathway of heavy metals into livestock grazing contaminated land. Sci. Total Environ. 28, 287–294 (1983).
Bull, K. R., Murton, R. K., Osborn, D., Ward, P. & Cheng, L. High levels of cadmium in Atlantic seabirds and sea-skaters. Nature 269, 507–509 (1977).
Rodríguez-Eugenio, N., McLaughlin, M. & Pennock, D. Soil Pollution: A Hidden Reality. (FAO, Rome, 2018).
Pérez, A. P. & Eugenio, N. R. Status of Local Soil Contamination in Europe. https://doi.org/10.2760/093804, JRC107508 (EUR 29124 EN, Publications Office of the European Union: Luxembourg, 2018).
Zhao, F.-J., Ma, Y., Zhu, Y.-G., Tang, Z. & McGrath, S. P. Soil contamination in China: current status and mitigation strategies. Environ. Sci. Technol. 49, 750–759 (2015).
Nardi, F., Annis, A., Di Baldassarre, G., Vivoni, E. R. & Grimaldi, S. GFPLAIN250m, a global high-resolution dataset of Earth’s floodplains. Sci. Data 6, 180309 (2019).
Macklin, M. G. et al. Impacts of metal mining on river systems: a global assessment. Science 381, 1345–1350 (2023).
Foley, J. A. et al. Solutions for a cultivated planet. Nature 478, 337–342 (2011).
Tilman, D., Balzer, C., Hill, J. & Befort, B. L. Global food demand and the sustainable intensification of agriculture. Proc. Natl Acad. Sci. 108, 20260–20264 (2011).
Hou, D. & Ok, Y. S. Soil pollution—speed up global mapping. Nature 566, 455–456 (2019).
Wang, X. et al. Climate and vegetation as primary drivers for global mercury storage in surface soil. Environ. Sci. Technol. 53, 10665–10675 (2019).
Almendras, M.-L., Carballa, M., Diels, L., Vanbroekhoven, K. & Chamy, R. Prediction of heavy metals mobility and bioavailability in contaminated soil using sequential extraction and biosensors. J. Environ. Eng. 135, 839–844 (2009).
Wang, J., Man, Y., Yin, R. & Feng, X. Isotopic and spectroscopic investigation of mercury accumulation in Houttuynia cordata colonizing historically contaminated soil. Environ. Sci. Technol. 56, 7997–8007 (2022).
Li, Q. et al. Speciation of heavy metals in soils and their immobilization at micro-scale interfaces among diverse soil components. Sci. Total Environ. 825, 153862 (2022).
Liang, Y. et al. Geochemical controls on the distribution and bioavailability of heavy metals in sediments from Yangtze River to the East China Sea: assessed by sequential extraction versus diffusive gradients in thin-films (DGT) technique. J. Hazard. Mater. 452, 131253 (2023).
Bagherifam, S., Brown, T. C., Bagherifam, S. & Baglieri, A. Sequential extraction of labile and recalcitrant fractions of soil organic matter: a case study focusing on antimony (Sb) in humic acids, fulvic acids and humin fractions of long-term aged contaminated soils. Environ. Pollut. 327, 121610 (2023).
Mbodji, M. et al. Speciation of metals by sequential extractions of agricultural soils located near a dumpsite for prediction of element availability to vegetables. Talanta 244, 123411 (2022).
Blumenstock, J., Cadamuro, G. & On, R. Predicting poverty and wealth from mobile phone metadata. Science 350, 1073–1076 (2015).
Toetzke, M., Banholzer, N. & Feuerriegel, S. Monitoring global development aid with machine learning. Nat. Sustain. 5, 533–541 (2022).
Xu, P. et al. Fertilizer management for global ammonia emission reduction. Nature 626, 792–798 (2024).
Jin, J. et al. Influence of pyrolysis temperature on properties and environmental safety of heavy metals in biochars derived from municipal sewage sludge. J. Hazard. Mater. 320, 417–426 (2016).
Yang, Y., Tilman, D., Furey, G. & Lehman, C. Soil carbon sequestration accelerated by restoration of grassland biodiversity. Nat. Commun. 10, 718 (2019).
Benhachem, F. Z. & Harrache, D. Chemical speciation and potential mobility of heavy metals in forest soil near road traffic in Hafir, Algeria. J. Health Pollut. 11, 210614 (2021).
Wang, X. S. & Qin, Y. Leaching characteristics of heavy metals and As from two urban roadside soils. Environ. Monit. Assess. 132, 83–92 (2007).
Xu, Y. et al. Cr migration potential and species properties in the soil profile from a chromate production site in the groundwater depression cone area. Bull. Environ. Contam. Toxicol. 109, 600–608 (2022).
Zeng, X., Jin, Q., Wang, P. & Huang, C. Distribution and speciation of heavy metal(loid)s in soils under multiple preservative-treated wooden trestles. Toxics 11, 249 (2023).
Bradford, M. A. A leaky sink. Nat. Clim. Change 7, 475–476 (2017).
Minasny, B. et al. Soil carbon 4 per mille. Geoderma 292, 59–86 (2017).
Ballabio, C. et al. Mapping LUCAS topsoil chemical properties at European scale using Gaussian process regression. Geoderma 355, 113912 (2019).
Poggio, L. et al. SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty. Soil 7, 217–240 (2021).
Zhang, S. et al. Escalating arsenic contamination throughout Chinese soils. Nat. Sustain. 7, 766–775 (2024).
Hlavay, J., Prohaska, T., Weisz, M., Wenzel, W. W. & Stingeder, G. J. Determination of trace elements bound to soil and sediment fractions (IUPAC Technical Report). Pure Appl. Chem. 76, 415–442 (2004).
Tessier, A. & Campbell, P. G. C. Comment on “Pitfalls of sequential extractions” by P. M. V. Nirel and F. M. M. Morel. Water Res. 25, 115–117 (1991).
Sutherland, R. A. BCR®−701: a review of 10-years of sequential extraction analyses. Anal. Chim. Acta 680, 10–20 (2010).
Tack, F. M. G. & Verloo, M. G. Chemical speciation and fractionation in soil and sediment heavy metal analysis: a review. Int. J. Environ. Anal. Chem. 59, 225–238 (1995).
Shober, A. L., Stehouwer, R. C. & MacNeal, K. E. Chemical fractionation of trace elements in biosolid-amended soils and correlation with trace elements in crop tissue. Commun. Soil Sci. Plant Anal. 38, 1029–1046 (2007).
Kim, R.-Y. et al. Bioavailability of heavy metals in soils: definitions and practical implementation—a critical review. Environ. Geochem. Health 37, 1041–1061 (2015).
Hernández-Moreno, J. M., Rodríguez-González, J. I. & Espino-Mesa, M. Evaluation of the BCR sequential extraction for trace elements in European reference volcanic soils. Eur. J. Soil Sci. 58, 419–430 (2007).
Rohatgi, A. et al. WebPlotDigitizer user manual version 3.4. 1–18 https://automeris.io/WebPlotDigitizer (2014).
Bogusz, A. & Oleszczuk, P. Sequential extraction of nickel and zinc in sewage sludge- or biochar/sewage sludge-amended soil. Sci. Total Environ. 636, 927–935 (2018).
Huang, Y. et al. Size, distribution, and vulnerability of the global soil inorganic carbon. Science 384, 233–239 (2024).
Shaheen, S. M. & Rinklebe, J. Geochemical fractions of chromium, copper, and zinc and their vertical distribution in floodplain soil profiles along the Central Elbe River, Germany. Geoderma 228-229, 142–159 (2014).
Jiang, Y. et al. Topological representations of crystalline compounds for the machine-learning prediction of materials properties. npj Comput. Mater. 7, 28 (2021).
Podgorski, J. & Berg, M. Global threat of arsenic in groundwater. Science 368, 845–850 (2020).
Rosenblatt, F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386–408 (1958).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30, 3146–3154 (2017).
Aiken, E., Bellue, S., Karlan, D., Udry, C. & Blumenstock, J. E. Machine learning and phone data can improve targeting of humanitarian aid. Nature 603, 864–870 (2022).
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
Kleinbaum, D. G., Dietz, K., Gail, M., Klein, M. & Klein, M. Logistic Regression. (Springer, New York, 2002).
Ketkar, N. Introduction to Keras. In Deep Learning with Python, 97–111 (Springer, 2017).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Cook-Patton, S. C. et al. Mapping carbon accumulation potential from global natural forest regrowth. Nature 585, 545–550 (2020).
Barber, R. F., Candès, E. J., Ramdas, A. & Tibshirani, R. J. Predictive inference with the jackknife. Ann. Stat. 49, 486–507 (2021).
Nemani, V. et al. Uncertainty quantification in machine learning for engineering design and health prognostics: a tutorial. Mech. Syst. Signal Process. 205, 110796 (2023).
Altmann, A., Toloşi, L., Sander, O. & Lengauer, T. Permutation importance: a corrected feature importance measure. Bioinformatics 26, 1340–1347 (2010).
van den Hoogen, J. et al. Soil nematode abundance and functional group composition at a global scale. Nature 572, 194–198 (2019).
Jennings, A. A. Analysis of worldwide regulatory guidance values for the most commonly regulated elemental surface soil contamination. J. Environ. Manag. 118, 72–95 (2013).
Muckley, E. S., Saal, J. E., Meredig, B., Roper, C. S. & Martin, J. H. Interpretable models for extrapolation in scientific machine learning. Digit. Discov. 2, 1425–1435 (2023).
Zhan, J. et al. Evaluating interpolation and extrapolation performance of neural retrieval models. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2486–2496 (ACM, New York, 2022).
Haaf, D., Six, J. & Doetterl, S. Global patterns of geo-ecological controls on the response of soil respiration to warming. Nat. Clim. Change 11, 623–627 (2021).
Liu, J., Zhang, X.-H., Tran, H., Wang, D.-Q. & Zhu, Y.-N. Heavy metal contamination and risk assessment in water, paddy soil, and rice around an electroplating plant. Environ. Sci. Pollut. Res. 18, 1623–1632 (2011).
Tóth, G., Hermann, T., Szatmári, G. & Pásztor, L. Maps of heavy metals in the soils of the European Union and proposed priority areas for detailed assessment. Sci. Total Environ. 565, 1054–1062 (2016).
Orgiazzi, A., Ballabio, C., Panagos, P., Jones, A. & Fernández-Ugalde, O. LUCAS soil, the largest expandable soil dataset for Europe: a review. Eur. J. Soil Sci. 69, 140–153 (2018).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Cordier, T. et al. Flexible and systematic uncertainty estimation with conformal prediction via the MAPIE library. In Proceedings of the Twelfth Symposium on Conformal and Probabilistic Prediction with Applications 204, 549–581 (PMLR, Limassol, 2023).
Qi, C. C. et al. Figshare https://figshare.com/s/fa76e3cb3e0e0709df8d (2025).
Acknowledgements
This study was funded by the Innovative Research Groups of the National Natural Science Foundation of China (No. 52121004, L.Y.C.), National Natural Science Foundation of China (Nos. 22376221, C.C.Q. and 22494680, Z.L.), Natural Science Foundation of Hunan Province (No. 2024JJ2074, C.C.Q.), and Young Elite Scientists Sponsorship Program by CAST (No. 2023QNRC001, C.C.Q.). This work was also supported in part by the High Performance Computing Center of Central South University.
Author information
Contributions
C.C.Q., Y.Z., L.Y.C., and Z.L. designed the project. C.C.Q., T.H., and M.T.W. constructed the modelling methodology and performed the modelling. F.H.M.T. and G.R.H. participated in dataset collection and discussion. M.L., B.T.Z., S.D., and Q.S.C. participated in results analysis and discussion. C.C.Q. and Z.L. wrote the initial draft of the manuscript, and all authors edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Michael Harbottle, Xuejun Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Qi, C., Hu, T., Zheng, Y. et al. Global and regional patterns of soil metal(loid) mobility and associated risks. Nat Commun 16, 2947 (2025). https://doi.org/10.1038/s41467-025-58026-8
DOI: https://doi.org/10.1038/s41467-025-58026-8