Abstract
The Tibetan Plateau, a globally significant ecological region, is experiencing escalating pollution from heavy metals (HMs). This study applies a machine learning approach based on the self-organizing map hyper-clustering, alongside advanced methodologies such as Positive Matrix Factorization (PMF), Incremental Spatial Autocorrelation, and Bivariate Local Indicators of Spatial Association (BiLISA), to analyze the ecological risk of soil HMs in representative watersheds of the southeastern Tibetan Plateau, focusing on spatial pattern clustering, pollutant source identification, and interaction risk assessment. The results indicated higher HMs concentrations in the middle and downstream areas. A comprehensive ecological risk assessment integrating the Improved Potential Ecological Risk Index, Enrichment Factor, Contamination Factor, and Geo-accumulation Index identified Cd, Pb, and As as the primary pollutants of concern. By combining PMF with Mantel analysis, pollution was attributed to geological background, agricultural activities, traffic emissions, and atmospheric deposition. The BiLISA method revealed significant spatial interactions among HMs, with the composite pollution of As and Cd occupying the largest proportion in High (As)-High (Cd) aggregation zones, underscoring the need for integrated management strategies. This study offers novel insights into the spatial pollution patterns and source apportionment of soil HMs, providing an advanced analytical framework for their precise control and ecological restoration.
Introduction
The Tibetan Plateau, one of the most ecologically sensitive and vulnerable regions in China, exhibits a heightened response to environmental changes. The Tibetan Plateau has faced unprecedented environmental threats due to global climate changes and escalating anthropogenic activities1,2,3. Soil, a fundamental ecosystem component crucial for agriculture and human sustenance, is increasingly compromised by heavy metals (HMs) contamination, which deteriorates soil fertility, reduces productivity, and poses severe health risks through food chain infiltration. Nationally, approximately 20% of arable land is polluted, emphasizing the urgent need for soil quality preservation to safeguard agricultural security and public health4,5,6. Although industrial activity within the Tibetan Plateau is limited, atmospheric transport of HMs from rapid development in Southeast Asia has contributed to their accumulation in the region’s soil. Elevated levels of HMs such as Cd, Pb, As, and Zn have been linked to anthropogenic sources, including transportation along major provincial roads, where concentrations diminish exponentially with distance. In some areas, Pb concentrations exceed several hundred milligrams per kilogram7,8,9. The compounded impact of these pollutants threatens soil health, ecological integrity, and sustainable economic growth in the Tibetan Plateau10,11,12. The typical small watershed in the southeastern Tibetan Plateau, where this study is conducted, embodies these multifaceted environmental challenges. The study area’s characteristic landform features, agricultural practices, and transportation activities serve as a microcosm of the entire Tibetan Plateau. Therefore, identifying the spatial distribution, origins, and threats of HMs contamination in this representative area is imperative for effective soil pollution prevention and ecological management in this ecologically critical region.
The self-organizing map (SOM), a type of artificial neural network, has been widely applied in the environmental field for exploratory analysis and spatial pattern recognition. By leveraging non-linear mapping, competitive learning strategies, and preservation of spatial topology, SOM provides a robust framework for analyzing complex datasets13. Integrating SOM with complementary algorithms, including hierarchical cluster analysis (HCA) and K-means (KM) clustering, can further enhance analytical precision and resolution. This strategy is consistent with the approach of integrating multiple models to improve environmental assessment outcomes, and it highlights the need for machine learning methods in complex environmental systems14,15. Compared with conventional multivariate approaches such as HCA and KM used alone, SOM delivers deeper insights into within-class dynamics and reveals more distinct inter-group variations16. Its visualization capabilities also make results clearer and easier to present17. Beyond machine learning applications, comprehensive evaluation methods have been increasingly emphasized in understanding HMs pollution. Metrics such as the geo-accumulation index (Igeo), the enrichment factor (EF), the contamination factor (CF), and advanced indices like the improved potential ecological risk index (IPERI) are essential for robust assessments18,19. Thus, we propose that integrating refined machine learning methodologies with these comprehensive evaluation strategies can significantly enhance the understanding of spatial variations in HMs pollution. Additionally, employing multiple indices is crucial for accurately identifying high-risk metals.
Quantitative delineation of the sources contributing to HMs in soil is imperative for the development of targeted strategies aimed at pollution mitigation and the deployment of appropriate remedial actions20,21,22. At present, pollution source identification approaches are primarily categorized into qualitative analysis and quantitative apportionment. Among these, receptor models circumvent the need for explicit emission factor information and do not require a clear understanding of the transport processes of pollutants. They directly measure the receptor environment. Consequently, receptor modeling methods are the most commonly used technical approach in current pollution source apportionment research23. The Positive Matrix Factorization (PMF) represents a comparatively advanced receptor model for analyzing soil HMs datasets. PMF can overcome the problem of negative contributions of pollution sources24. It is extensively employed to clarify the chemical properties and origins of pollution sources, especially within the research on the specific origins of heavy metal contaminants in terrestrial and aquatic deposits25,26,27.
Previous studies have employed spatial autocorrelation analysis at various scales, together with hot and cold spot analysis and spatial prediction, to understand the distribution of soil HMs28,29. Spatial autocorrelation analysis often employs the Incremental Spatial Autocorrelation (ISA) model to handle autocorrelation between the dependent variable and spatial distance, producing reliable results30. It can guide the deployment of environmental monitoring points, ensuring that sampling points accurately represent the contamination levels of the entire research region, thereby improving the representativeness and accuracy of monitoring data. Soil HMs pollution problems typically involve the interaction of multiple HMs, but decision-makers are often unaware of these spatial interaction risks. Previous studies have tended to concentrate on evaluating and forecasting the spatial spread of individual heavy metal contaminants in soil. This approach fragments HMs pollution management, which significantly increases costs in terms of finance, labor, and time, and it fails to provide early warning and supervision of the comprehensive risks posed by multiple HMs. The Bivariate Local Indicators of Spatial Association (BiLISA) is a spatial autocorrelation technique that can couple the risks of pollutants through the spatial interactions between different HMs, with the advantage of characterizing spatial clusters or ‘hot spots’31,32. Considering the presence of HMs point sources and interaction risks in the actual environment, BiLISA analysis is theoretically a suitable tool for risk area assessment33. Currently, the use of BiLISA to assess the interaction risk of HMs in soil remains limited. Therefore, integrating PMF with BiLISA enhances the accuracy of HMs traceability, especially for the identification of binary HMs superimposed pollution areas.
To address the limitations of previous studies, this research employs advanced machine learning techniques, including SOM-based hyper-clustering, the PMF model, ISA, and BiLISA, applied to a dataset of HMs concentrations in soils from small watersheds on the Tibetan Plateau. These methodologies are utilized to uncover pollution patterns, identify and quantify sources of soil contamination, and determine pollution severity, thereby providing foundational data for regional pollution management and ecological restoration. The primary objectives of this study are as follows: (1) to characterize the spatial distribution patterns of heavy metal pollutants; (2) to identify critical contaminants using comprehensive risk assessment indices, including IPERI, EF, Igeo, and CF; (3) to investigate HMs sources and quantify their contributions through the PMF model; and (4) to delineate binary HMs co-contamination zones and define risk management units.
Materials and methods
Study area
The study area is a typical small watershed located in the southeastern Tibetan Plateau, with coordinates spanning from 98°00’ to 99°05’ East in longitude and 28°37’ to 30°20’ North in latitude, and an average altitude of 3500 m. The renowned National Highway 318, known as the Sichuan-Tibet Highway, traverses the entire watershed. It is situated in a plateau temperate semi-humid region with a monsoon climate, marked by moist summers and chilly, arid winters. It exhibits typical characteristics of a warm and dry river valley climate. The yearly average temperature stands at 10 °C, accompanied by an annual rainfall that spans from 350 to 450 mm. The terrain of the study area is predominantly characterized by rolling hills and river valleys. The principal river in the watershed is a major tributary of the Lancang River, with an overall length of approximately 21 km and a drainage area of approximately 192.82 km². The geology is dominated by sandstone, conglomerate, and mudstone sedimentary rocks, while the predominant soil type is loam. The population of the study area is predominantly Tibetan, with residents living along the river. The local economy is primarily agrarian, with agriculture and animal husbandry representing the mainstays of the region’s economy. Barley, corn, and wheat are the primary crops, while pastures are primarily used to support the grazing of yak and Tibetan goats. This “natural-ecological, agricultural, and transportation” composite system serves as a microcosm of plateau environmental changes, making it an ideal representative area for studying soil HMs contamination and ecological environmental impacts.
Soil sampling and chemical analysis
Using remote sensing satellite imagery, a 1 km × 1 km grid was superimposed on the agricultural land layer of the watershed to select sampling points for the study area. A total of 37 surface soil samples were collected across the entire small watershed (Fig. 1). All samples were air-dried in a dedicated soil drying room and then sieved to 100 mesh for the assessment of their physical and chemical characteristics. Soil HMs (As, Cd, Pb, Cr, Zn, Ni) were determined after digestion in a HNO3-HCl-HF system: 50 mg of soil was digested at 180 °C in a mixture of 2.0 mL HCl, 2.0 mL HF, and 6.0 mL HNO3 under microwave conditions for 20 min. Subsequently, the digestion solution was reduced to a volume of 1–2 mL using a reflux program at 170 °C for 20 min and then diluted to 50 mL with a 3% HNO3 solution for measurement. The HMs content was measured by inductively coupled plasma mass spectrometry (ICP-MS, Agilent 8900 Series, Thermo Scientific, USA), calibrated against standard curves, and verified using eight internal standard elements. Random duplicate samples, national soil standard specimens (BW07306), reagent blanks, and parallel soil samples were tested to verify the reliability of the experimental procedures, instrument precision, and data accuracy. The spiked recovery rate of the standard samples ranged from 84 to 114%, and the relative standard deviation (RSD) was less than 5%.
Fig. 1. Sampling sites in the study area, divided into three clusters by the super-clustering method based on the self-organizing map (SOM). The figure was created with ArcGIS Desktop 10.8 (https://www.esri.com/en-us/arcgis/products/arcgis-desktop/overview).
Comprehensive ecological risk assessment index
Improved potential ecological risk index (IPERI)
The IPERI is a refined version of the Potential Ecological Risk Index (PERI) first proposed by Hakanson. It integrates soil contamination levels, ecological and environmental impacts, and toxicological data to offer a more holistic risk evaluation34. The potential ecological risk factor for a specific metal, denoted $E_r^i$, is defined as follows:

$$E_r^i = T_r^i \times \frac{C_i}{C_0} \tag{1}$$

The risk index (RI) for the sampling locations was determined using the following equation:

$$RI = \sum_{i} E_r^i \tag{2}$$
The equations for $E_r^i$ and RI incorporate several key parameters: the metal concentration in soil ($C_i$), the background soil concentration ($C_0$), and the toxic-response factor ($T_r^i$) specific to each metal. The toxic-response factors are derived from the relationship between the bioproduction index (BPI) and the toxic factor ($S_t^i$ value), with the following values assigned to the metals35: Pb and Ni, 5; Zn, 1; As, 10; Cr, 2; and Cd, 30. The potential ecological risk factor of each metal ($E_r^i$) is calculated first, and the RI, their sum, provides a comprehensive risk assessment for the metals involved. The PERI for heavy metals is typically divided into five risk levels, though recent literature has highlighted the need to adjust these levels to the type and concentration of pollutants, as the generic toxicity factors used in Hakanson’s study may not always accurately reflect current pollution status36,37.
Studies have often miscalculated or misclassified the $E_r^i$ and RI values, particularly by using inaccurate toxic-response factors or overlooking the sensitivity of aquatic organisms to pollutants. In Hakanson’s original model, the low-risk threshold for $E_r^i$ was set at 40, based on the highest toxic factor ($S_t^i$ value) among the pollutants, and the first cut-off RI value was set at 150. However, a more refined approach proposed by Ma et al.36 adjusts these values for greater accuracy in assessing heavy metal pollution. In this study, we adopted the toxicity factors from Xu et al.37, which are based on Hakanson’s model, to improve risk assessment and adjust the first-level thresholds for both $E_r^i$ and RI (Table S1). Specifically, we set the first level for $E_r^i$ at 30 (the toxicity factor of Cd, the highest among the six metals) and the first level for RI at 60, obtained by multiplying the unit toxicity factor of 1.13 by the sum of the toxicity factors (53). This refined approach enhances the accuracy of ecological risk classification for heavy metal pollution.
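The arithmetic behind these thresholds can be illustrated with a short sketch. It assumes the standard Hakanson form (risk factor = toxic-response factor × concentration / background, with RI as the sum over metals) and uses the toxic-response factors quoted in the text; the background and site concentrations are hypothetical values chosen only for illustration, and the function names are not from the study.

```python
# Toxic-response factors quoted in the text
# (Pb and Ni = 5, Zn = 1, As = 10, Cr = 2, Cd = 30).
TOXIC_RESPONSE = {"As": 10, "Cd": 30, "Pb": 5, "Cr": 2, "Zn": 1, "Ni": 5}


def risk_factors(conc, background):
    """Single-metal risk factor E = T * C / C0 for every metal in `conc`."""
    return {m: TOXIC_RESPONSE[m] * conc[m] / background[m] for m in conc}


def risk_index(conc, background):
    """Risk index RI: the sum of the single-metal risk factors."""
    return sum(risk_factors(conc, background).values())


# First-level RI threshold: unit toxicity factor (1.13) times the summed
# toxic-response factors (53), rounded to the nearest integer.
total_toxicity = sum(TOXIC_RESPONSE.values())
ri_first_level = round(1.13 * total_toxicity)

# Hypothetical site (illustration only): every metal at twice a
# hypothetical background concentration, in mg/kg.
background = {"As": 20.0, "Cd": 0.1, "Pb": 30.0, "Cr": 80.0, "Zn": 70.0, "Ni": 30.0}
site = {metal: 2.0 * value for metal, value in background.items()}

print(total_toxicity, ri_first_level, risk_index(site, background))
```

With every metal at twice its background, each risk factor is twice the corresponding toxic-response factor, so Cd alone contributes more than half of the RI. This is consistent with Cd having the largest toxic-response factor in the index.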
Geoaccumulation index (Igeo)
The geoaccumulation index (Igeo) has been widely applied to evaluate environmental pollution issues38. This index considers both the influence of natural geological background and the impact of human activities on the environment. It is calculated using Eq. (3)39:

$$I_{geo} = \log_2\left(\frac{C_n}{K \times B_n}\right) \tag{3}$$
Where Cn represents the actual concentration (mg/kg) of HMs in the soil, and Bn denotes the corresponding geochemical background value (mg/kg) of the HMs in the study area. In this study, the soil environment of the Tibet Autonomous Region is used as the reference value. The K value is used to eliminate the influence of different background values caused by rock differences, commonly set at 1.5. Pollution levels are classified into seven levels from low to high, as shown in Table S2.
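As a minimal sketch of this calculation, the snippet below evaluates Igeo with K = 1.5 and assigns a class number. The class boundaries follow the commonly used seven-class Müller scheme (class 0 for Igeo ≤ 0 up to class 6 for Igeo > 5), which is an assumption here because Table S2 is not reproduced in the text; the concentrations are hypothetical.

```python
import math


def igeo(cn, bn, k=1.5):
    """Geoaccumulation index: log2 of concentration over K times background."""
    return math.log2(cn / (k * bn))


def igeo_class(value):
    """Class 0 (Igeo <= 0) to class 6 (Igeo > 5); assumed Mueller-style bins."""
    if value <= 0:
        return 0
    return min(6, math.ceil(value))


# Hypothetical example: a concentration three times the background value.
example = igeo(60.0, 20.0)
print(example, igeo_class(example))
```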
Enrichment factor (EF)
The EF is a widely used and effective method for determining the presence of HMs contamination in soil samples. The EF is primarily used to differentiate whether the source of HMs is anthropogenic or geogenic40. It is calculated as the ratio of an element to a reference element, normalized by the same ratio in the background. Al, Si, K, Sc, Ti, Fe, Mn, Sr, and Zr are the most commonly used reference elements41,42. This study adopts Zr as the reference element, based on the background values of soils in the Tibet Autonomous Region, to calculate the EF using Eq. (4):

$$EF = \frac{C_n / C_{Zr}}{B_n / B_{Zr}} \tag{4}$$
Where Cn is the actual concentration of HMs obtained from testing, CZr is the actual content of Zr in soil samples, Bn denotes the background value of corresponding HMs in the Tibet Autonomous Region soil, and BZr indicates the background value of Zr in the region. Based on the magnitude of the EF, the pollution degree of HMs elements is divided into five levels (Table S3)43.
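The double normalization in the EF can be sketched as follows. The function name and all numerical values are hypothetical and serve only to show how the sample ratio and the background ratio combine.

```python
def enrichment_factor(cn, c_zr, bn, b_zr):
    """EF: (metal / Zr) in the sample divided by (metal / Zr) in the background."""
    return (cn / c_zr) / (bn / b_zr)


# Hypothetical example: the metal is doubled relative to background while
# Zr is unchanged, so the enrichment factor is about 2.
ef = enrichment_factor(cn=0.2, c_zr=250.0, bn=0.1, b_zr=250.0)
print(ef)
```

Because both ratios are normalized by Zr, a sample in which the metal and Zr are diluted or concentrated by the same proportion keeps an EF near 1.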
Contamination factor (CF)
The CF quantifies the enrichment of HMs relative to background levels, serving as an indicator of contamination severity. It is defined by the equation34:

$$CF = \frac{C_i}{C_0} \tag{5}$$
Where Ci stands for the concentration of metal i in the soil, C0 represents the corresponding background concentration. The classification criteria for contamination levels are presented in Table S4.
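A small sketch of the CF calculation and its classification is given below. The four levels (CF < 1 low, 1 to 3 moderate, 3 to 6 considerable, ≥ 6 very high) are the commonly cited Hakanson thresholds and are an assumption here, since Table S4 is not reproduced in the text; the concentrations are hypothetical.

```python
def contamination_factor(ci, c0):
    """CF: measured concentration divided by background concentration."""
    return ci / c0


def cf_level(cf):
    """Assumed Hakanson-style levels; Table S4 may define them differently."""
    if cf < 1:
        return "low"
    if cf < 3:
        return "moderate"
    if cf < 6:
        return "considerable"
    return "very high"


# Hypothetical example: a concentration three times the background value.
cf = contamination_factor(6.0, 2.0)
print(cf, cf_level(cf))
```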
Machine learning methods
Spatial patterns of HMs pollution using optimized machine learning models
The Self-Organizing Map (SOM), an unsupervised neural network, imitates the behavior of human brain neurons to form self-organizing clustering patterns. By associating input vectors with their best-matching neurons, SOM facilitates the clustering of samples with similar features44. However, SOM does not inherently delineate sub-groups of output neurons, necessitating further super-clustering of neuron vectors45. Recent studies suggest that combining SOM with HCA outperforms the commonly used KM approach for recognizing ecological patterns45,46.
To determine the most suitable model for spatial HMs pollution analysis, four clustering methods were constructed: HCA, KM, HCA + SOM, and KM + SOM. Using the R package “NbClust”, the optimal number of clusters was identified based on multiple indicators47. The SOM approach mapped sampling locations to their respective best-matching units (BMUs) to visualize spatial pollution patterns, and differences among clusters were characterized using the Kruskal-Wallis and Wilcoxon tests.
Using the HMs concentrations from 37 sites, missing values were excluded, and three data transformations (log(1 + x), Z-score normalization, and min-max scaling) were applied to enhance SOM training45. SOM grid sizes followed heuristic guidelines, with a total neuron count approximating five times the square root of input samples48. Hexagonal topologies were selected for smoother information flow, and four configurations (3 × 10, 10 × 3, 5 × 6, 6 × 5) were tested. Convergence was achieved by minimizing the quantization error (QE) and topographic error (TE), with the optimal models exhibiting high explained variance.
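The preprocessing and grid-size heuristic described above can be sketched in plain Python. This is an illustration rather than the workflow used in the study: the helper names are hypothetical, the Z-score uses the sample standard deviation, and the rule that each grid side must have at least three neurons is an assumption introduced only to enumerate candidate layouts.

```python
import math
import statistics


def som_neuron_count(n_samples):
    """Heuristic map size: about five times the square root of the sample count."""
    return round(5 * math.sqrt(n_samples))


def log1p_transform(values):
    """log(1 + x) transformation."""
    return [math.log1p(v) for v in values]


def zscore(values):
    """Z-score normalization (sample standard deviation; an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def minmax(values):
    """Min-max scaling to the interval [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def grid_candidates(n_neurons, min_side=3):
    """Rectangular layouts whose sides multiply to n_neurons (min_side assumed)."""
    return [
        (rows, n_neurons // rows)
        for rows in range(min_side, n_neurons + 1)
        if n_neurons % rows == 0 and n_neurons // rows >= min_side
    ]


n_neurons = som_neuron_count(37)
print(n_neurons, grid_candidates(n_neurons))
```

For 37 samples the heuristic gives 5 × √37 ≈ 30.4, that is, 30 neurons, and the enumerated layouts coincide with the four configurations tested in the text.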
Neuron vectors of the optimal SOM models underwent super-clustering using Euclidean distance with Ward.D linkage and K-means methods46,49. For comparison, input data clustering was performed directly. SOM performance was quantitatively assessed based on clustering quality metrics provided by “NbClust”. Eight final models emerged, incorporating varying combinations of SOM topologies and clustering approaches.
Spatial information from sample labels was mapped onto the SOM’s trained BMUs, enabling visualization of HMs spatial distribution. Statistical characterization of clusters revealed significant differences between groups, enhancing the ecological interpretation of spatial HMs patterns50. This optimized SOM-based workflow demonstrates superior performance in clustering and spatial visualization, offering robust insights into HMs pollution dynamics.
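To make the BMU mapping and the quantization error concrete, the following sketch assigns each sample to its nearest codebook vector and averages the resulting distances. It assumes an already trained codebook and Euclidean distance; the two-neuron codebook and the samples are hypothetical, and SOM training itself is not shown.

```python
import math


def best_matching_unit(sample, codebook):
    """Index of the codebook vector closest to the sample (Euclidean distance)."""
    distances = [math.dist(sample, weights) for weights in codebook]
    return distances.index(min(distances))


def quantization_error(samples, codebook):
    """Mean distance between each sample and its best-matching unit."""
    total = sum(
        math.dist(s, codebook[best_matching_unit(s, codebook)]) for s in samples
    )
    return total / len(samples)


# Hypothetical trained codebook with two neurons and two scaled samples.
codebook = [(0.0, 0.0), (1.0, 1.0)]
samples = [(0.1, 0.0), (0.9, 1.0)]

bmus = [best_matching_unit(s, codebook) for s in samples]
print(bmus, quantization_error(samples, codebook))
```

Once every site has a BMU index, the cluster label of that neuron (from the super-clustering step) can be attached to the site and plotted on a map.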
PMF model analysis
The PMF model is a multivariate receptor model recommended by the United States Environmental Protection Agency (PMF5.0, USEPA). Its statistical principle is to weight the errors of the chemical components in the soil samples and then to optimize iteratively by the least squares method, in order to determine the main sources of soil HMs pollution and their contributions. The pollution source component profile matrix F and the source contribution matrix G can be calculated using the following matrix equation:

$$X = GF + E \tag{6}$$
Matrix X is an n × m matrix, where n is the number of samples and m is the number of soil HMs categories. This matrix X can be decomposed into matrices G and F, where G is an n × f matrix representing the source contribution, and F is an f × m matrix representing the pollution source component spectrum. Here, f is the number of main pollution sources. Additionally, matrix E represents the residual, indicating the difference between X and the product of G and F.
The most crucial step in the PMF model is selecting the optimal number of pollution source factors, f. Under the constraints $g_{ik} \geq 0$ and $f_{kj} \geq 0$, the number of factors at which the objective function Q reaches its minimum is taken as the appropriate number of pollution source factors. With fit coefficients exceeding 0.93 for all HMs, the PMF-derived estimates accurately captured the observed variations in HM concentrations in soil (Fig. S1). The objective function Q is defined as follows:

$$Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{e_{ij}}{u_{ij}} \right)^2 \tag{7}$$

Where $e_{ij}$ is the residual of species j in sample i (an element of matrix E), and $u_{ij}$ is the corresponding uncertainty.
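The role of Q can be illustrated by evaluating it for a given factorization. The sketch below only computes the uncertainty-weighted sum of squared residuals for fixed G and F; it does not perform the constrained minimization carried out by EPA PMF 5.0. All matrices are hypothetical.

```python
def matmul(g, f):
    """Product of an n x k matrix and a k x m matrix given as nested lists."""
    return [
        [sum(g[i][k] * f[k][j] for k in range(len(f))) for j in range(len(f[0]))]
        for i in range(len(g))
    ]


def pmf_q(x, g, f, u):
    """Sum over all cells of ((x - GF) / u) squared."""
    model = matmul(g, f)
    return sum(
        ((x[i][j] - model[i][j]) / u[i][j]) ** 2
        for i in range(len(x))
        for j in range(len(x[0]))
    )


# Hypothetical data: two samples, two species, one non-negative factor.
x = [[3.5, 4.0], [6.0, 7.0]]   # measured concentrations
g = [[1.0], [2.0]]             # source contributions (n x f)
f = [[3.0, 4.0]]               # source profile (f x m)
u = [[0.5, 0.5], [0.5, 0.5]]   # measurement uncertainties

print(pmf_q(x, g, f, u))
```

Here two cells are reproduced exactly and the other two have residuals of 0.5 and −1.0, which the uncertainty of 0.5 scales to 1 and −2, so Q is 1 + 4 = 5.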
Geostatistical analysis
Moran’s I is divided into Global Moran’s I and Local Moran’s I, commonly used as a significant research indicator to measure the potential spatial dependence between observed values of variables within the same region. Since global spatial autocorrelation may mask local spatial autocorrelation, we need a model that can explore and analyze spatial distributions at a microscale, and Local Moran’s I is one excellent representative51. In contrast to univariate spatial autocorrelation that focuses on a single variable, bivariate spatial autocorrelation elucidates the spatial interconnections between distinct variables by employing local Moran’s I. The Bivariate Moran’s I tool, based on the spatial autocorrelation of bivariate variables, has high adaptability and effectiveness in describing the spatial interactions and dependencies of two geographical features. It can be used to determine whether two variables exhibit correlation in space and assess the specific locations of statistically significant spatial aggregation types.
Soil HMs are continuously distributed, and sampling points alone cannot capture the HMs content of the entire area. In theory, the estimated value approaches the actual value only as the number of sampling points tends to infinity. To visualize the spatial distribution of composite soil HMs pollution, this study applied inverse distance weighting interpolation to the sampling point data to obtain raster data, which served as reference values for computing the bivariate spatial correlation. The formula for this calculation is as follows52:

$$I_{hm} = \frac{X_i^h - \bar{X}^h}{\sigma_h} \sum_{j=1}^{n} W_{ij}\, \frac{X_j^m - \bar{X}^m}{\sigma_m} \tag{8}$$

Where $X_i^h$ and $X_j^m$ represent the values of HMs variables h and m at grid cells i and j, respectively; $\bar{X}^h$ and $\bar{X}^m$ are the means of the two variables; $\sigma_h$ and $\sigma_m$ are their standard deviations; and $W_{ij}$ is the spatial weight between grid cells i and j. Because the weights should also capture the relationship between grid cells that are spatially close but not adjacent, geographical distance is used to determine them: $W_{ij} = 1/d_{ij}$, where $d_{ij}$ is the Euclidean distance between grid cells i and j. A significantly positive $I_{hm}$ indicates a positive spatial association between the two variables, which forms either a pattern of bivariate soil HMs aggregation (High-High) or a pattern without composite soil HMs pollution (Low-Low). A significantly negative $I_{hm}$ indicates a negative spatial association, which appears as the asymmetric, spatially heterogeneous patterns Low-High and High-Low. When the result is not statistically significant, the association is classified as nonsignificant.
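A compact sketch of the bivariate local statistic with inverse-distance weights is shown below. It assumes standardized variables (population standard deviation) and row-standardized weights, which are common conventions but are not stated in the text, and it omits the permutation test that is normally used to judge significance. The coordinates and values are hypothetical.

```python
import math
import statistics


def zscores(values):
    """Standardize values with the population standard deviation (assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def inverse_distance_weights(coords):
    """W_ij = 1 / d_ij for i != j, then row-standardized (an assumption)."""
    n = len(coords)
    weights = []
    for i in range(n):
        row = [0.0 if i == j else 1.0 / math.dist(coords[i], coords[j])
               for j in range(n)]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights


def bivariate_local_moran(h, m, coords):
    """Local statistic and cluster label for each location.

    The label pairs variable h at location i with the spatial lag of m.
    """
    zh, zm = zscores(h), zscores(m)
    w = inverse_distance_weights(coords)
    results = []
    for i in range(len(h)):
        lag = sum(w[i][j] * zm[j] for j in range(len(m)))
        label = ("High" if zh[i] > 0 else "Low") + "-" + ("High" if lag > 0 else "Low")
        results.append((zh[i] * lag, label))
    return results


# Hypothetical data: two nearby high sites and two nearby low sites.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
metal_h = [9.0, 8.0, 1.0, 2.0]
metal_m = [7.0, 9.0, 2.0, 1.0]

for value, label in bivariate_local_moran(metal_h, metal_m, coords):
    print(round(value, 3), label)
```

In this toy layout the first two sites fall into the High-High quadrant and the last two into the Low-Low quadrant, and all four local values are positive. In practice each value would be compared against a permutation-based reference distribution before a cell is mapped as a significant cluster.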
The Incremental Spatial Autocorrelation (ISA) tool is used to test spatial autocorrelation (Global Moran’s I) of point attributes at a series of increasing distances. In simple terms, it measures the strength of spatial clustering of observation values at every possible distance between data points, and then returns a z-score53. The z-score reflects the strength of spatial clustering as a function of distance, with increasing z-scores indicating increasing clustering significance. Peaks in z-scores with statistical significance indicate the most prominent distances where spatial clustering occurs during the simulated outward diffusion process54.
Results and discussion
Distribution characteristics of surface soil HMs
The distribution characteristics of HMs concentrations in surface soil samples are shown in Table S5. The mean concentrations of Zn, As, Cd, and Pb were 1.25, 1.49, 2.35, and 2.57 times their background values (74, 19.7, 0.081, and 29.1 mg/kg, respectively). In contrast, the mean concentrations of Cr and Ni were lower than their corresponding background values55. The proportions of sites where the six elements exceeded the background values were, in descending order: Cd (100%), Pb (91.89%), Zn (64.86%), As (48.45%), Ni (16.22%), and Cr (3.95%). Furthermore, unlike the other three HMs, the mean concentrations of As, Cd, and Pb were 2.68, 1.96, and 2.77 times the national averages, respectively56. This indicates that the enrichment levels of these three HMs in the study area are relatively high with respect to both local and national background values. In addition to natural processes, human activities may be contributing to the environmental presence of these metals.
Compared with direct clustering by HCA or K-means, hyper-clustering of a 3 × 10 SOM using K-means or HCA demonstrated superior performance across multiple evaluation metrics. Of the two, the HCA-based hyper-clustering model emerged as the optimal approach due to its higher PtBiserial index57 and better multi-metric performance (Table S6). This indicates a better alignment between the dataset and the partitions derived from SOM hyper-clustering58, effectively capturing the spatial distribution characteristics of HMs pollution in the study area. Specifically, the SOM model was trained using six HMs from 37 soil samples as input data, normalized using min-max scaling. A total of 30 neurons (3 × 10) were selected, achieving the maximum explained variance (EV = 99.73%), the optimal quantization error (QE = 0.0153), and the optimal topographic error (TE = 0.0842) (Table S7). Finally, SOM neurons were grouped into three clusters (C1, C2, and C3) based on HCA and NbClust results to reveal spatial pollution patterns (Fig. S2a, b).
The sampling points were divided into three clusters (Fig. 1), each characterized by distinct pollution profiles. Cluster C1, consisting of 24 sites, was primarily located in the central region of the watershed. Cluster C2 included 10 sites, mainly distributed across the midstream and downstream areas of the watershed. Cluster C3 contained three sites (S12, S35, and S36), exhibiting considerable spatial variability, with sites located in both the upstream and midstream regions (Fig. 2a). Kruskal-Wallis tests indicated significant differences in HMs pollution among the three clusters (p < 0.05) (Fig. 2b). Specifically, C1 exhibited the lowest overall HMs concentrations and posed minimal pollution risk. In contrast, C2 was characterized by pronounced multi-metal contamination, particularly with Cd and As. Meanwhile, C3 was distinguished by high concentrations of Cr and Ni.
Fig. 2. Spatial distribution of heavy metals (HMs). (a) Grouping of sampling locations into three distinct clusters through the super-clustering of a Self-Organizing Map (SOM) combined with Hierarchical Cluster Analysis (HCA). (b) Patterns of HMs contamination across clusters (C1, C2, and C3 represent the SOM super-clustering outcomes).
Evaluation of HMs accumulation situation
According to the IPERI assessment, all HMs in the study area exhibited low ecological risks except for Cd, which posed a considerable ecological risk with $E_r^i$ = 71.66. The RI values for the six HMs ranged from 44.51 to 270.37, with a mean value of 108.48. Based on the refined RI risk classification, the low and moderate ecological risk categories each accounted for 13.51% of the sampling sites, while 72.98% of the sites fell under the medium risk category. This indicates that the majority of the study area faces at least moderate ecological threats (Fig. 3). Spatially, clusters C2 and C3 exhibited significant ecological risks associated with Cd exposure, while moderate risks from Pb and As were primarily concentrated in cluster C2. Overall, the watershed displayed a distinct pattern of heightened ecological risks in the midstream and downstream regions.
The accumulation status of the six HMs in the surface soil of the study area was evaluated by calculating the Igeo values, as shown in Fig. 3. More than 83% of the sampling points were classified as unpolluted for Cr, Ni, and Zn, indicating relatively minor harm to the soil ecological environment from these three HMs. Only As, Cd, and Pb had Igeo values greater than 1, at 10.81%, 13.51%, and 18.92% of the total points, respectively. This indicates the presence of moderate to heavy contamination by As, Cd, and Pb in the study area, highlighting the need for special attention to the environmental impact of these HMs on soil. Cr and Ni exhibited pollution risks ranging from unpolluted to heavily polluted only in cluster C3, while Zn was categorized as posing no pollution risk exclusively in cluster C1. Overall, these results suggest that As, Cd, and Pb are the main pollutants, followed by Cr, Ni, and Zn.
The EF values of selected HMs in agricultural soil samples from the small river basin are presented in Table S8. Cr, Ni, and Zn exhibited significant enrichment in cluster C3, whereas across the watershed as a whole these HMs showed slight to moderate enrichment in agricultural soil. Pb was significantly enriched at 94.59% of the sampling points. The EF values for As ranged from 0.89 to 14.75, with an average of 3.79 and a relatively high relative standard deviation, reflecting wide spatial variation in enrichment levels. The points significantly enriched in As were primarily distributed in the midstream and downstream regions of the watershed, which therefore require targeted attention. Cd exhibited significant or higher enrichment at all sampling points across the entire watershed, highlighting substantial enrichment of Cd in the study area’s soil. In general, Pb and Cd are significantly enriched in the soil of the study area, As exhibits localized regional aggregation, and Cr, Ni, and Zn show overall enrichment levels ranging from slight to moderate. Consistent with the conclusions drawn from the Igeo calculations, attention should focus on the environmental impact of As, Cd, and Pb in the study area.
The classification of HMs pollution levels based on CF values is presented in Fig. 3. The results reveal that the entire study area is subject to at least moderate Cd pollution (100%), with 21.62% of the area experiencing considerable pollution. Pb and As placed 91.89% and 51.35% of the area under moderate pollution, respectively, with 18.92% and 10.81% of the area reaching a higher pollution level (CF ≥ 3); these pollutants were predominantly concentrated in clusters C2 and C3. Additionally, Zn had a moderate pollution impact, affecting 64.86% of the region. Pollution levels varied spatially among the clusters. Notably, Cr and Ni pollution was negligible in clusters C1 and C2, with moderate contamination observed only in C3. The pollution of Cd, Pb, and As was primarily concentrated in the midstream and downstream regions of the watershed.
Quantitative source apportionment
To identify the sources of HMs, Pearson correlation analysis was first conducted to determine the correlations among the six HMs. The potential sources of the HMs and their contributions were then assigned and quantified using the PMF model and the Mantel test (Fig. 4). Before running the model, we input the sample concentration values and their uncertainties, which were calculated from the detection limits of the ICP-MS instrument. The species signal-to-noise ratios (S/N) computed automatically by the model were all greater than or equal to 9, indicating a strong signal for species categorization. During model execution, solutions with 3 to 6 factors were tested, with 20 runs for each and a random start seed. After the base model runs, the most suitable number of factors was chosen as the one with the lowest and most stable Q values and the smallest differences among Qtrue, Qrobust, and Qexpected. Four factors were found to fit the data very well: the scaled residuals of all species ranged from −3 to 3 and were normally distributed, and the linear regression coefficients (r²) exceeded 0.9. Bootstrap (BS) and Displacement (DISP) methods were used for error assessment of the model to analyze uncertainties and biases. In the BS analysis, more than 92% of factors could be completely mapped, and in the DISP summary, no factor swaps were found at dQmax = 4, indicating almost no rotational ambiguity in this solution. The four-factor PMF solution was therefore considered stable and reliable.
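The fit diagnostics mentioned above can be illustrated with a short Python sketch. It assumes the usual PMF formulation, in which the data matrix X (samples × species) is approximated by G·F and each residual is scaled by its uncertainty U. The function name, the cut-off of 4 for Qrobust, and the approximation Qexpected ≈ n·m − p·(n + m) are assumptions based on common PMF practice and are not taken from the model output of this study.

```python
def pmf_diagnostics(X, U, G, F, outlier_cut=4.0):
    """Summarize a PMF fit X ~ G.F with uncertainties U (lists of lists).

    X, U: n samples x m species; G: n x p contributions; F: p x m profiles.
    """
    n, m, p = len(X), len(X[0]), len(F)
    q_true = 0.0
    q_robust = 0.0
    within_3 = 0
    for i in range(n):
        for j in range(m):
            fitted = sum(G[i][k] * F[k][j] for k in range(p))
            r = (X[i][j] - fitted) / U[i][j]  # uncertainty-scaled residual
            q_true += r * r
            if abs(r) <= outlier_cut:  # Qrobust ignores poorly fitted points
                q_robust += r * r
            if abs(r) <= 3.0:
                within_3 += 1
    return {
        "q_true": q_true,
        "q_robust": q_robust,
        # Assumed approximation: data values minus fitted parameters.
        "q_expected": n * m - p * (n + m),
        "share_within_3": within_3 / (n * m),
    }
```

Comparing `q_true`, `q_robust`, and `q_expected` across candidate factor numbers, and checking that `share_within_3` is close to 1, mirrors the selection logic described in the paragraph above.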
The PMF analysis results are displayed in Fig. 4a, and the Mantel test results are shown in Fig. 4b. Factor 1 (F1) had the lowest contribution ratio (16.42%), with the highest loading for As (71.84%, Mantel test, p < 0.01), followed by Pb (16.54%), Cd (14.89%), and Zn (14.04%). Unlike many regions where As contamination is primarily attributed to irrigation water pollution or industrial discharge, the Tibetan Plateau exhibits a distinctive dual-source pattern (a combination of high geological background and agricultural activities), resulting in As concentrations that far exceed local background values. Numerous studies have demonstrated that the use of fertilizers and pesticides is a significant factor leading to increased As content in agricultural soil59,60. The arable soil of the study area is mainly affected by the urea, organic fertilizers, and pesticides that residents apply during the growing season to promote plant growth and to control weeds and pests. Another important reason for the high As content is the widespread distribution of As-rich shale in the Tibetan Plateau, leading to As levels far exceeding the background values of Chinese and world soils61. Therefore, As enrichment is mainly caused by agricultural inputs superimposed on a high geological background.
Factor 2 (F2) had the highest contribution ratio (30.82%), with loadings on Pb (83.42%, Mantel test, p < 0.01), Cd (24.07%, Mantel test, p < 0.05), Zn (15.81%), Ni (12.50%), and Cr (11.99%). Traffic is an important source of HMs enrichment: exhaust emissions release Pb together with Cd, Ni, and Cu; brake wear releases Ba, Cu, Sb, and Fe; and tire wear releases Zn62. Previous studies have reported that the sources of Pb may include exhaust emissions63, coal combustion64, industrial waste, and fertilizers65,66. Unlike low-altitude regions, where Pb pollution is primarily linked to industrial activities and coal combustion, Pb contamination in the Tibetan Plateau is predominantly traffic-related. The study area is a transit area for the famous Sichuan-Xizang Line (National Highway 318), where large numbers of tourists gather during the peak tourism season, most of whom drive large-displacement off-road vehicles along National Highway 318 to enter the Tibetan Plateau. Due to the low oxygen content on the plateau (the average altitude of the study area is 3500 m), vehicle engines undergo incomplete combustion67, which increases the Pb content of vehicle exhaust and leads to the enrichment of Pb in the study area. Therefore, the enrichment of Pb is closely related to traffic emissions.
Factor 3 (F3) contributes secondarily (29.99%), with Cr (68.50%, Mantel test, p < 0.01), Ni (67.49%, Mantel test, p < 0.01), Zn (27.18%, Mantel test, p < 0.05), and Cd (17.12%). According to the Igeo analysis of Cr and Ni, 97.30% and 94.59% of sites, respectively, were classified as unpolluted, with average EFCr values less than 2, indicating a slight enrichment level. Moreover, the mean values of Cr and Ni throughout the study area did not exceed the background values of soil in the Tibet Autonomous Region. Compared with the other three contributing factors, the contribution of F3 calculated by PMF to these HMs was closer to their respective background values. This suggests that the parent material has a strong provenance attribute for these highly loaded HMs. The sources of Cr and Ni in soil depend largely on their contents in parent rocks, and the anthropogenic inputs (such as fertilizers and manure) are typically lower than the HMs contents in the parent rocks after weathering processes form soil68. Cr and Ni are assigned to the same factor by the PMF model and Mantel test, typically considered as indicators of natural sources, which is supported by abundant literature69,70,71. Therefore, the conclusion that F3 represents natural sources seems reasonable.
Factor 4 (F4) ranks third in terms of source contribution, accounting for 22.77%, with loadings on Cd (43.92%, Mantel test, p < 0.01), Zn (42.97%, Mantel test, p < 0.01), As (28.00%), Ni (20.01%), and Cr (19.06%). It is unlikely that all Cd in the agricultural soil of the basin comes from the plateau itself, as the study area has almost no industry, except for a few small-scale repair shops, and no obvious environmental pollution problems. Cd in the soil may come from the periphery of the plateau, which needs to be considered from the perspective of atmospheric deposition, as atmospheric input has become a key process for Cd accumulation on the Tibetan Plateau72,73. This view is supported by previous studies showing that the high Cd content in sediments of Ximen Co Lake on the Tibetan Plateau is due to the influence of the southwest monsoon, which carries anthropogenic pollutants from West and South Asia to the Tibetan Plateau74. Backward trajectory analysis of atmospheric transport suggests that the Gurenhekou Glacier and Jade Dragon Snow Mountain are mainly influenced by southwest and southeast air masses. Apart from possible local industrial activities, anthropogenic inputs from South Asia may be a potential source of trace elements at the Gurenhekou Glacier, resulting in high Cd content and enrichment75. Around the Tibetan Plateau, countries such as Iran and Kuwait in West Asia, as well as India and Pakistan in South Asia, have been important oil-producing countries for decades, and rapid economic growth in South Asia in recent decades has led to increasing environmental pollution. During these processes, pollutants such as gases, metals, and organic compounds are emitted into the atmosphere, and the westerly and South Asian monsoon airflows transport them to the Tibetan Plateau29. Therefore, the fourth factor, related to atmospheric deposition, can be considered the source of Cd, Zn, As, and Ni accumulation in the soil.
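The Mantel test cited for each factor compares two distance matrices and judges significance by permutation. The Python sketch below is a generic one-tailed version under stated assumptions: the distance matrices are built from single variables (for example, one metal's concentrations and one factor's contributions at the same sampling points), and the function names, permutation count, and seed are illustrative. The section does not state which matrices were compared in this study, so this is not a reproduction of that analysis.

```python
import math
import random
from itertools import combinations


def distance_matrix(values):
    """Pairwise absolute differences between per-sample values."""
    return [[abs(a - b) for b in values] for a in values]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("distances must not be constant")
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p) for a one-tailed Mantel test of positive association."""
    n = len(d1)
    pairs = list(combinations(range(n), 2))  # upper triangle only
    x = [d1[i][j] for i, j in pairs]

    def statistic(order):
        # Relabel the samples of d2 (rows and columns together).
        return pearson(x, [d2[order[i]][order[j]] for i, j in pairs])

    identity = list(range(n))
    observed = statistic(identity)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if statistic(order) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

A small p-value from `mantel` indicates that samples that are similar in one variable also tend to be similar in the other, which is the sense in which a metal is linked to a factor above.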
Spatial autocorrelation analysis of HMs
The application of ISA facilitated the assessment of the extent of statistically significant soil contamination and identified the regions requiring immediate remedial action. A graphical representation of the z-scores of Moran’s I as a function of distance highlighted the specific ranges where spatial clustering of HMs concentrations was most pronounced76. Figure 5 and Table S9 display the spatial clustering strength of Pb, Cd, and As at each distance interval obtained through ISA. When the distance interval is set to 10, neither Pb nor Cd shows a peak in Z-score and only As shows a peak; the distance intervals were therefore increased gradually to capture more precise distance peaks. When the distance interval for Pb is set to 20, the Z-score peaks at 1.97, with a clustering radius of 1000 m, then decreases with increasing distance. For Cd, the Z-score peaks at 1.78, with a clustering radius of 2324.11 m. As peaks at a Z-score of 12.87, with a clustering radius of 5928.90 m. The calculated Z-scores and P-values indicate that the clustering of As (Z = 12.86, P < 0.01) in the study area is significantly stronger than that of Cd (Z = 1.78, P = 0.04 < 0.05) and Pb (Z = 1.97, P = 0.04 < 0.1), which is related to local agricultural activities. The clustering radii follow the order As > Cd > Pb, indicating that the enrichment range of As is much larger than that of Pb and Cd, which may be due to the dispersed distribution of agricultural land in the multi-valley terrain of the study area and differences in pesticide or fertilizer use by residents. Cd exhibits a relatively large enrichment radius, possibly due to its high mobility, as well as geographic factors (including precipitation, DEM, slope, etc.) and soil properties (pH, texture), leading to Cd enrichment77,78.
Through these analyses, the areas with the most severe HMs enrichment have been identified, and long-term monitoring and remediation measures should be implemented in these areas to reduce the risks posed by soil HMs accumulation to agricultural products, direct or indirect harm to human health, and adverse ecological impacts.
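The idea behind ISA can be sketched in Python as follows: global Moran’s I is computed with a binary distance-band weight matrix for a series of increasing distance thresholds, and the z-score at each threshold is inspected for a peak. This sketch swaps in a permutation-based z-score rather than the analytical z-score reported by GIS software; the function names, band settings, and seed are assumptions for illustration.

```python
import math
import random


def morans_i(values, coords, band):
    """Global Moran's I with binary weights (1 if distance <= band)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= band:
                num += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        return None  # no neighbours within the band, or a constant field
    return (n / w_sum) * (num / denom)


def z_score(values, coords, band, permutations=199, seed=0):
    """Permutation z-score of Moran's I (values shuffled over locations)."""
    observed = morans_i(values, coords, band)
    if observed is None:
        return None
    rng = random.Random(seed)
    shuffled = list(values)
    sims = []
    for _ in range(permutations):
        rng.shuffle(shuffled)
        sims.append(morans_i(shuffled, coords, band))
    mu = sum(sims) / len(sims)
    sd = math.sqrt(sum((s - mu) ** 2 for s in sims) / (len(sims) - 1))
    return (observed - mu) / sd if sd > 0 else None


def incremental_autocorrelation(values, coords, start, step, n_bands):
    """Return [(band, z)] for n_bands thresholds starting at `start`."""
    bands = [start + k * step for k in range(n_bands)]
    return [(b, z_score(values, coords, b)) for b in bands]


def first_peak(profile):
    """First band whose z-score exceeds both of its neighbours, if any."""
    zs = [(b, z) for b, z in profile if z is not None]
    for k in range(1, len(zs) - 1):
        if zs[k][1] > zs[k - 1][1] and zs[k][1] > zs[k + 1][1]:
            return zs[k]
    return None
```

The band returned by `first_peak` plays the role of the clustering radius discussed above; when no peak appears, the band spacing can be refined, as was done for Pb and Cd.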
Spatial analysis through ISA reveals distinct spatial patterns among the three HMs. Identifying and understanding their clustering patterns in space is crucial when studying interactions among multiple HMs pollutants. Determining these spatial clusters not only helps unveil the distribution characteristics of pollutants but also aids in deeper comprehension of their potential relationships and impacts. Bivariate Local Indicators of Spatial Association (Bivariate LISA) maps provide information on the interactions among HMs. These maps categorize any two pollutants into five types: High-High, High-Low, Low-High, Low-Low, and Not significant79. Figure 6 illustrates the spatial interaction patterns among HMs in the study area that require particular attention. The High-High clustering area of As-Cd composite pollution is concentrated in the southwest and central parts of the study area, accounting for 19.80% of the total area (38.17 km²), indicating a significant synergistic effect between the two HMs pollutants. The central and southwest parts of the study area are concentrated agricultural areas where residents apply fertilizers and herbicides extensively during the crop growing season. According to the results of the PMF model analysis, As mainly originates from geological background overlaid with anthropogenic agricultural activities, while Cd primarily comes from atmospheric deposition and agricultural activities80,81. The clustering effect of agricultural activities explains the accumulation of As and Cd in the soil of the central and southwest parts. In contrast, the Low-Low clustering area mainly occurs in the northeast, where the altitude is much higher than in the central and southwest areas, with only a small portion of alpine grasslands, almost unaffected by human activities, covering 27.35% of the total area.
Additionally, spatial heterogeneity patterns were found, primarily concentrated in the south-central region (Low-High, accounting for 8.57% of the total area) and the northeastern part of the central region (High-Low, only accounting for 0.067% of the total area). This is mainly attributed to differences in pollution sources and variations in soil adsorption mechanisms. The remaining 44.21% of the study area did not show significant correlations, mainly comprising part of the Trans-Himalayan region, with minimal influence from residents, thus not being the focus of this study.
Bivariate LISA mapping of HMs. The figure was created with ArcGIS Desktop 10.8 (https://www.esri.com/en-us/arcgis/products/arcgis-desktop/overview) and GeoDa 1.14.0 (http://geodacenter.github.io/).
The High-High clustering area of As-Pb composite pollution is only scattered in the central part of the study area, accounting for only 0.57% of the total area (1.10 km²). This indicates that there is no significant synergistic effect between the two HM pollutants overall. This finding aligns with the results derived from the PMF model and Mantel test, which revealed that the two HMs originate from distinct sources. It is possible that this co-clustering area results from the overlap between emission sources associated with F1 and F2. The Low-Low clustering area (covering 25.27% of the total area) is concentrated in contiguous areas in the north, which are high-altitude mountainous areas with minimal human influence. Moreover, relatively few soil sampling points were collected and analyzed in this part of the study area, so it does not require particular attention. Spatial heterogeneity is observed in the western (High-Low, accounting for 17.94%) and central-southern (Low-High, accounting for 16.46%) regions. This heterogeneity primarily stems from the differing sources of the two HMs. The western region is characterized by concentrated enrichment of As and Cd, while the central-southern region exhibits enrichment of Cd and Pb. In the remaining 39.76% of the study area, no significant correlation between the two variables was detected, hence no further specific discussion is provided.
The High-High clustering area of Pb-Cd composite pollution is mainly concentrated in the central part, traversed by National Highway 318, covering 8.60% of the total area (16.58 km²), indicating a synergistic effect between Pb and Cd. In F2 resolved by the PMF model, Pb is the main loading element and Cd is a minor component, and the Mantel test shows a significant correlation between Pb and Cd (r = 0.5, p < 0.05), indicating a high degree of similarity in pollution sources in this area. Additionally, this area is home to famous scenic spots where large numbers of self-driving tourists gather each year, and residents also use vehicles to access grasslands for grand events, significantly increasing the likelihood of Pb and Cd accumulation from vehicular emissions. The Low-Low clustering area of Pb-Cd composite pollution is concentrated in the northeast (covering 23.87%), with minimal human influence. The High-Low and Low-High clustering areas account for 9.06% and 17.61%, respectively, and essentially coincide with the single-variable HMs enrichment areas. The western High-Low area partially belongs to a basin, making it more susceptible to Pb enrichment, and may also be affected by local climatic conditions leading to the dispersion of traffic sources. The remaining 40.85% of the study area did not show significant correlations between the two variables and will not be subjected to further analysis.
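The five Bivariate LISA categories used in this section can be illustrated with a simplified Python sketch. For each location, the standardized value of the first metal is compared with the average standardized value of the second metal among its neighbours, and significance is judged by conditional permutation. The neighbour lists, the two-sided pseudo p-value, the significance level, and all names are assumptions for illustration; dedicated software may differ in these details.

```python
import math
import random


def standardize(values):
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("values must not be constant")
    return [(v - mean) / sd for v in values]


def bivariate_lisa(x, y, neighbors, permutations=199, alpha=0.05, seed=0):
    """Label each location by its x value and the neighbours' mean y value.

    neighbors[i] is the list of indices treated as neighbours of i.
    """
    zx, zy = standardize(x), standardize(y)
    rng = random.Random(seed)
    n = len(x)
    labels = []
    for i in range(n):
        nb = neighbors[i]
        if not nb:
            labels.append("Not significant")
            continue
        lag = sum(zy[j] for j in nb) / len(nb)  # spatial lag of y
        observed = zx[i] * lag                  # local bivariate statistic
        others = [zy[j] for j in range(n) if j != i]
        extreme = 0
        for _ in range(permutations):
            # Conditional permutation: keep i fixed, draw random neighbours.
            sim_lag = sum(rng.sample(others, len(nb))) / len(nb)
            if abs(zx[i] * sim_lag) >= abs(observed):
                extreme += 1
        p_value = (extreme + 1) / (permutations + 1)
        if p_value > alpha:
            labels.append("Not significant")
        else:
            first = "High" if zx[i] > 0 else "Low"
            second = "High" if lag > 0 else "Low"
            labels.append(first + "-" + second)
    return labels
```

The share of locations carrying each label corresponds to the area percentages reported for the As-Cd, As-Pb, and Pb-Cd pairs when the locations are equal-area grid cells.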
The zoning of HMs interaction risks
The spatial distribution of HMs interactions is depicted through LISA maps, yet the combined risk of these pollutants remains elusive because their joint toxicity is poorly understood. Currently, HMs pollution is managed based on separate risk thresholds for each pollutant, making it challenging to quantify their collective impact. In this study, the risk thresholds for each HM were set according to the strictest risk screening values specified in the soil environment quality risk control standard (GB 15618-2018), with thresholds for As, Pb, and Cd set at 25, 90, and 0.3 mg/kg, respectively. Based on these parameters, the study area is classified into four major categories and seven subcategories of risk control zones: the multi-metal risk control zone (H-H-H), the dual-metal risk control zones (As(Low)-H-H, Pb(Low)-H-H, Cd(Low)-H-H), the single-metal risk control zones (As(High)-L-L and Pb(High)-L-L; no Cd(High)-L-L zone was identified), and the clean zone (L-L-L).
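The zoning rule described above can be expressed as a small Python function. The thresholds are the screening values quoted in the text; treating a concentration as High only when it strictly exceeds the threshold, and applying the rule to individual locations, are assumptions made for this sketch, and the function name is illustrative.

```python
# Risk screening values quoted in the text (mg/kg).
THRESHOLDS = {"As": 25.0, "Pb": 90.0, "Cd": 0.3}


def risk_zone(as_mgkg, pb_mgkg, cd_mgkg):
    """Return the risk control zone label for one location."""
    conc = {"As": as_mgkg, "Pb": pb_mgkg, "Cd": cd_mgkg}
    # Assumption: "High" means strictly above the screening value.
    high = [m for m in ("As", "Pb", "Cd") if conc[m] > THRESHOLDS[m]]
    if len(high) == 3:
        return "H-H-H"  # multi-metal risk control zone
    if len(high) == 2:
        low = next(m for m in conc if m not in high)
        return f"{low}(Low)-H-H"  # dual-metal zone, named by the low metal
    if len(high) == 1:
        return f"{high[0]}(High)-L-L"  # single-metal zone
    return "L-L-L"  # clean zone
```

For example, a location with hypothetical concentrations of 30, 50, and 0.1 mg/kg for As, Pb, and Cd would be labelled `As(High)-L-L`.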
The risk management zones are illustrated in Fig. 7. The L-L-L zone covers the majority of the northern and northeastern parts of the study area, representing 54.31% of the total area. The H-H-H zone, which accounts for only 0.22% of the total area, is mainly located in the central region. The dual-metal risk control zones As(Low)-H-H, Pb(Low)-H-H, and Cd(Low)-H-H are also concentrated in the central region, representing 0.18%, 0.03%, and 0.87% of the total area, respectively. The central region experiences pollution from mixed sources including agriculture, transport, and atmospheric deposition. Although these zones occupy a small proportion of the area, the spatial clustering of the two HMs is evident, as discussed in the previous section. Therefore, attention should be paid to these clustering risk areas in the dual-metal risk control zones, with reference to the LISA maps. Single-metal risk control zones are concentrated in the middle and lower reaches, with As(High)-L-L and Pb(High)-L-L accounting for 27.59% and 16.79% of the total area, respectively. This indicates that the predominant pollution in the study area is from single heavy metal sources, with no spatial interaction effects. The identification of multi-metal risk control zones enables more accurate delineation of areas polluted by different HMs, thereby providing technical support for the precise prevention and control of soil HMs contamination.
Watershed HMs risk interaction zones. The figure was created with ArcGIS Desktop 10.8 (https://www.esri.com/en-us/arcgis/products/arcgis-desktop/overview) and GeoDa 1.14.0 (http://geodacenter.github.io/).
Conclusion
This paper employs machine learning techniques to delineate the spatial distribution of HMs, identify key pollution sources through integrated indices, the PMF model, and BiLISA, and ultimately assess interactive risk zones and delineate risk management areas for major contaminants. (1) HMs concentrations are elevated in midstream and downstream areas due to anthropogenic activities and geological background. (2) As, Pb, and Cd are the primary pollutants exceeding safety thresholds, requiring regulatory attention. (3) Source apportionment reveals that As mainly originates from agricultural activities and geological sources (71.84%), Pb from traffic emissions (83.42%), and Cd from atmospheric deposition (43.92%), suggesting targeted mitigation strategies such as improved fertilization practices, promotion of low-emission vehicles, and regional air pollution controls. (4) Spatial analysis identifies significant interaction risks between As and Cd, underscoring the need for integrated pollution management. Future research should incorporate spatiotemporal modeling and bioavailability assessments to refine risk evaluations. This study demonstrates the effectiveness of machine learning in environmental analysis, providing insights for large-scale pollution management.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Shi, W., Qiao, F. & Zhou, L. Identification of ecological risk zoning on Qinghai-Tibet plateau from the perspective of ecosystem service supply and demand. Sustainability 13 (10), 5366 (2021).
Wu, J. et al. Inorganic pollution around the Qinghai-Tibet plateau: an overview of the current observations. Sci. Total Environ. 550, 628–636 (2016).
Wu, J. et al. Pollution, ecological-health risks, and sources of heavy metals in soil of the Northeastern Qinghai-Tibet plateau. Chemosphere 201, 234–242 (2018).
Liu, J. et al. A Spatial distribution - Principal component analysis (SD-PCA) model to assess pollution of heavy metals in soil. Sci. Total Environ. 859, 160112 (2023).
Wang, S. et al. Spatial distribution and source apportionment of heavy metals in soil from a typical county-level City of Guangdong Province, China. Sci. Total Environ. 655, 92–101 (2019).
Fei, X. et al. Improved heavy metal mapping and pollution source apportionment in Shanghai City soils using auxiliary information. Sci. Total Environ. 661, 168–177 (2019).
Sheng, J. et al. Heavy metals of the Tibetan top soils: level, source, Spatial distribution, Temporal variation and risk assessment. Environ. Sci. Pollut Res. 19, 3362–3370 (2012).
Li, L., Wu, J., Lu, J. & Xu, J. Speciation, risks and isotope-based source apportionment of trace elements in soils of the Northeastern Qinghai–Tibet plateau. Geochem.: Explor. Environ. Anal. 20, 315–322 (2019).
Wang, G. et al. Traffic-related trace elements in soils along six highway segments on the Tibetan plateau: influence factors and Spatial variation. Sci. Total Environ. 581–582, 811–821 (2017).
Qin, G. et al. Soil heavy metal pollution and food safety in China: effects, sources and removing technology. Chemosphere 267, 129205 (2021).
Huang, J. et al. A new exploration of health risk assessment quantification from sources of soil heavy metals under different land use. Environ. Pollut. 243, 49–58 (2018).
Harvey, P. J. et al. Evaluation and assessment of the efficacy of an abatement strategy in a former lead smelter community, Boolaroo, Australia. Environ. Geochem. Health. 38, 941–954 (2016).
Chon, T. S. Self-Organizing maps applied to ecological sciences. Ecol. Inf. 6, 50–61 (2011).
Gao, B., Stein, A. & Wang, J. A two-point machine learning method for the Spatial prediction of soil pollution. Int. J. Appl. Earth Obs Geoinf. 108, 102742 (2022).
Bammou, Y. et al. Improving landslide susceptibility mapping in semi-arid regions using machine learning and Geospatial techniques. DYSONA - Appl. Sci. 6, 269–290 (2025).
Vesanto, J. & Alhoniemi, E. Clustering of the self-organizing map. IEEE Trans. Neural Netw. 11, 586–600 (2000).
Licen, S., Astel, A. & Tsakovski, S. Self-organizing map algorithm for assessing Spatial and Temporal patterns of pollutants in environmental compartments: A review. Sci. Total Environ. 878, 163084 (2023).
Hadzi, G. Y., Essumang, D. K. & Ayoko, G. A. Assessment of contamination and potential ecological risks of heavy metals in riverine sediments from gold mining and pristine areas in Ghana. J. Trace Elem. Minerals. 7, 100109 (2024).
Zhang, Y. et al. Toxicities and risk assessment of heavy metals in sediments of Taihu lake, China, based on sediment quality guidelines. J. Environ. Sci. 62, 31–38 (2017).
Fei, X. et al. Contamination assessment and source apportionment of heavy metals in agricultural soil through the synthesis of PMF and GeogDetector models. Sci. Total Environ. 747, 141293 (2020).
Ercilla-Montserrat, M., Muñoz, P., Montero, J. I., Gabarrell, X. & Rieradevall, J. A study on air quality and heavy metals content of urban food produced in a mediterranean City (Barcelona). J. Clean. Prod. 195, 385–395 (2018).
Li, C. et al. Enhancement of heavy metal immobilization in sewage sludge biochar by combining alkaline hydrothermal treatment and pyrolysis. J. Clean. Prod. 369, 133325 (2022).
Dong, B. et al. Multiple methods for the identification of heavy metal sources in cropland soils from a resource-based region. Sci. Total Environ. 651, 3127–3138 (2019).
Cao, Q., Wang, H. & Chen, G. Source apportionment of PAHs using two mathematical models for Mangrove sediments in Shantou coastal zone, China. Estuaries Coasts. 34, 950–960 (2011).
Liang, J. et al. Spatial distribution and source identification of heavy metals in surface soils in a typical coal mine City, Lianyuan, China. Environ. Pollut. 225, 681–690 (2017).
Rastegari Mehr, M. et al. Distribution, source identification and health risk assessment of soil heavy metals in urban areas of Isfahan Province, Iran. J. Afr. Earth Sci. 132, 16–26 (2017).
Bhuiyan, M. A., Dampare, S. B., Islam, M. A. & Suzuki, S. Source apportionment and pollution evaluation of heavy metals in water and sediments of Buriganga river, Bangladesh, using multivariate analysis and pollution evaluation indices. Environ. Monit. Assess. 187, 4075 (2015).
Wu, Z., Chen, Y., Han, Y., Ke, T. & Liu, Y. Identifying the influencing factors controlling the Spatial variation of heavy metals in suburban soil using Spatial regression models. Sci. Total Environ. 717, 137212 (2020).
Du, H. et al. Contamination characteristics, source analysis, and Spatial prediction of soil heavy metal concentrations on the Qinghai-Tibet plateau. J. Soils Sediments. 23, 2202–2215 (2023).
Huo, X. N., Li, H., Sun, D. F., Zhou, L. D. & Li, B. G. Combining geostatistics with Moran’s I analysis for mapping soil heavy metals in Beijing, China. Int. J. Environ. Res. Public. Health. 9, 995–1017 (2012).
Abokifa, A. A., Katz, L. & Sela, L. Spatiotemporal trends of recovery from lead contamination in Flint, MI as revealed by crowdsourced water sampling. Water Res. 171, 115442 (2020).
Zhao, C. S. et al. Impact of Spatial variations in water quality and hydrological factors on the food-web structure in urban aquatic environments. Water Res. 153, 121–133 (2019).
Jia, Z. et al. An integrated methodology for improving heavy metal risk management in soil-rice system. J. Clean. Prod. 273, 122797 (2020).
Hakanson, L. An ecological risk index for aquatic pollution control. A sedimentological approach. Water Res. 14, 975–1001 (1980).
Suresh, G., Ramasamy, V., Sundarrajan, M. & Paramasivam, K. Spatial and vertical distributions of heavy metals and their potential toxicity levels in various beach sediments from high-background-radiation area, Kerala, India. Mar. Pollut Bull. 91, 389–400 (2015).
Ma, J., Han, C. & Jiang, Y. Some problems in the application of potential ecological risk index. Geogr. Res. 39, 1233–1241 (2020).
Xu, Z., Ni, S., Tuo, X. & Zhang, C. Calculation of heavy metal’s toxicity factors in the evaluation of potential ecological risk index. Environ. Sci. Technol. 31 (2), 112–115 (2008).
Liu, J. et al. Geochemical dispersal of thallium and accompanying metals in sediment profiles from a smelter-impacted area in South China. Appl. Geochem. 88, 239–246 (2018).
Muller, G. Index of geoaccumulation in sediments of the Rhine River. GeoJournal 2, 108–118 (1969).
Yildiz, U. & Ozkul, C. Heavy metals contamination and ecological risks in agricultural soils of Usak, Western Turkiye: a Geostatistical and multivariate analysis. Environ. Geochem. Health. 46, 58 (2024).
Ozkul, C. Heavy metal contamination in soils around the Tuncbilek thermal power plant (Kutahya, Turkey). Environ. Monit. Assess. 188, 284 (2016).
Liu, Y. et al. High cadmium concentration in soil in the three Gorges region: Geogenic source and potential bioavailability. Appl. Geochem. 37, 149–156 (2013).
Sutherland, R. A. Bed sediment-associated trace metals in an urban stream, Oahu, Hawaii. Environ. Geol. 39, 611–627 (2000).
Li, J. et al. Source apportionment and Ecological-Health risks assessment of heavy metals in topsoil near a factory, central China. Exposure Health. 13, 79–92 (2021).
Rahman, A. S., Kono, Y. & Hosono, T. Self-organizing map improves Understanding on the hydrochemical processes in aquifer systems. Sci. Total Environ. 846, 157281 (2022).
Farsadnia, F. et al. Identification of homogeneous regions for regionalization of watersheds by two-level self-organizing feature maps. J. Hydrol. 509, 387–397 (2014).
Charrad, M., Ghazzali, N., Boiteau, V. & Niknafs, A. NbClust: an R package for determining the relevant number of clusters in a data set. J. Stat. Softw. 61, 1–36 (2014).
Vesanto, J. & Alhoniemi, E. Clustering of the self-organizing map. IEEE Trans. Neural Netw. 11 (3), 586–600 (2000).
Xiang, Q. et al. The potential ecological risk assessment of soil heavy metals using self-organizing map. Sci. Total Environ. 843, 156978 (2022).
Wang, S. et al. Machine learning-driven assessment of heavy metal contamination in the impounded lakes of China’s South-to‐North water diversion project: identifying Spatiotemporal patterns and ecological risks. J. Hazard. Mater. 480, 135983 (2024).
Li, Y., Wang, X. & Gong, P. Combined risk assessment method based on Spatial interaction: A case for polycyclic aromatic hydrocarbons and heavy metals in Taihu lake sediments. J. Clean. Prod. 328, 129590 (2021).
Anselin, L. Local indicators of spatial association-LISA. Geogr. Anal. 27, 93–115 (1995).
Jossart, J., Theuerkauf, S. J., Wickliffe, L. C. & Morris, J. A. Jr Applications of Spatial autocorrelation analyses for marine aquaculture siting. Front. Mar. Sci. 6, 806 (2020).
Ran, H. et al. Pollution characteristics and source identification of soil metal(loid)s at an abandoned arsenic-containing mine, China. J. Hazard. Mater. 413, 125382 (2021).
Huang, J. et al. Elemental composition of the topsoil fine fraction at and around the Tibetan plateau. Environ. Pollut. 320, 121098 (2023).
Teng, Y. et al. Soil and soil environmental quality monitoring in China: a review. Environ. Int. 69, 177–199 (2014).
Charrad, M., Ghazzali, N., Boiteau, V. & Niknafs, A. NbClust: an R package for determining the relevant number of clusters in a data set. J. Stat. Softw. 61, 1–36 (2014).
Milligan, G. W. An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika 45, 325–342 (1980).
Varol, M., Sunbul, M. R., Aytop, H. & Yilmaz, C. H. Environmental, ecological and health risks of trace elements, and their sources in soils of Harran plain, Turkey. Chemosphere 245, 125592 (2020).
Zhou, Y. et al. Arsenic in agricultural soils across China: distribution pattern, accumulation trend, influencing factors, and risk assessment. Sci. Total Environ. 616–617, 156–163 (2018).
Li, C. et al. Geothermal spring causes arsenic contamination in river waters of the Southern Tibetan plateau, China. Environ. Earth Sci. 71 (9), 4143–4148 (2013).
Thorpe, A. & Harrison, R. M. Sources and properties of non-exhaust particulate matter from road traffic: a review. Sci. Total Environ. 400, 270–282 (2008).
Men, C. et al. Pollution characteristics, risk assessment, and source apportionment of heavy metals in road dust in Beijing, China. Sci. Total Environ. 612, 138–147 (2018).
Wang, J. et al. Bioaccessibility, sources and health risk assessment of trace metals in urban park dust in Nanjing, Southeast China. Ecotoxicol. Environ. Saf. 128, 161–170 (2016).
Tang, Z. et al. Contamination and health risks of heavy metals in street dust from a coal-mining City in Eastern China. Ecotoxicol. Environ. Saf. 138, 83–91 (2017).
Atafar, Z. et al. Effect of fertilizer application on soil heavy metal concentration. Environ. Monit. Assess. 160, 83–89 (2010).
Shi, J. et al. Chemical characteristics of PM2.5 emitted from motor vehicles exhaust under the plateau with low oxygen content. Atmos. Environ. 314, 120053 (2023).
Rodriguez Martin, J. A., Arias, M. L. & Grau Corbi, J. M. Heavy metals contents in agricultural topsoils in the Ebro basin (Spain). Application of the multivariate geoestatistical methods to study spatial variations. Environ. Pollut. 144, 1001–1012 (2006).
Jiang, Y. et al. Source apportionment and health risk assessment of heavy metals in soil for a township in Jiangsu Province, China. Chemosphere 168, 1658–1668 (2017).
Lv, J. et al. Identifying the origins and spatial distributions of heavy metals in soils of Ju country (Eastern China) using multivariate and geostatistical approach. J. Soils Sediments 15, 163–178 (2014).
Xue, J. L. et al. Positive matrix factorization as source apportionment of soil lead and cadmium around a battery plant (Changxing County, China). Environ. Sci. Pollut. Res. Int. 21, 7698–7707 (2014).
Zhang, Z. et al. Identification of anthropogenic contributions to heavy metals in wetland soils of the Karuola glacier in the Qinghai-Tibetan Plateau. Ecol. Indic. 98, 678–685 (2019).
Dong, Z. et al. New insights into trace elements deposition in the snow packs at remote alpine glaciers in the northern Tibetan Plateau, China. Sci. Total Environ. 529, 101–113 (2015).
Yuan, H. et al. Characteristics and origins of heavy metals in sediments from Ximen Co lake during summer monsoon season, a deep lake on the eastern Tibetan Plateau. J. Geochem. Explor. 136, 76–83 (2014).
Li, R. et al. Spatial distribution and source analysis of trace elements in typical mountain glaciers on the Qinghai-Tibet Plateau. J. Glaciol. Geocryol. 43, 1277–1289 (2021).
Meng, Y., Cave, M. & Zhang, C. Spatial distribution patterns of phosphorus in top-soils of Greater London Authority area and their natural and anthropogenic factors. Appl. Geochem. 88, 213–220 (2018).
Li, R., Zhang, R., Yang, Y. & Li, Y. Accumulation characteristics, driving factors, and model prediction of cadmium in soil-highland barley system on the Tibetan Plateau. J. Hazard. Mater. 453, 131407 (2023).
Liu, T., Yuan, X., Luo, K., Xie, C. & Zhou, L. Molecular engineering of a new method for effective removal of cadmium from water. Water Res. 253, 121326 (2024).
Ogneva-Himmelberger, Y. & Huang, L. Spatial distribution of unconventional gas wells and human populations in the Marcellus Shale in the United States: vulnerability analysis. Appl. Geochem. 60, 165–174 (2015).
Fei, X. et al. Comprehensive assessment and source apportionment of heavy metals in Shanghai agricultural soils with different fertility levels. Ecol. Indic. 106, 105508 (2019).
Cai, L. M., Wang, Q. S., Wen, H. H., Luo, J. & Wang, S. Heavy metals in agricultural soils from a typical township in Guangdong Province, China: occurrences and spatial distribution. Ecotoxicol. Environ. Saf. 168, 184–191 (2019).
Acknowledgements
This work was supported by the Key Research and Development Program of Ningxia (Grant No. 2023BEG01002), the National Natural Science Foundation of China (Grant No. 42473071), and the Natural Science Foundation of Henan (Grant No. 222300420128).
Author information
Contributions
S.D. and R.S. conceptualised the study. Material preparation, data collection, and analysis were performed by Y.L. and Y.Y. Materials, facilities, and supervision were contributed by X.L., G.C. and W.D. The first draft of the manuscript was written by Y.L. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, Y., Yu, Y., Ding, S. et al. Application of machine learning in soil heavy metals pollution assessment in the southeastern Tibetan plateau. Sci Rep 15, 13579 (2025). https://doi.org/10.1038/s41598-025-97006-2