Groundwater quality evolution across China

Zhou, Qing; Zhang, Jiangjiang; Zhang, Shuyou; Chen, Qiang; Fan, Huifeng; Cao, Chenglong; Zhang, Yanni; Yang, Yadi; Luo, Jian; Yao, Yijun

doi:10.1038/s41467-025-57853-z


Article
Open access
Published: 14 March 2025


Nature Communications volume 16, Article number: 2522 (2025)

Abstract

China is facing a severe groundwater quality crisis amid economic development and climate change, yet the extent and trajectory of this crisis remain largely unknown. Here we developed a machine-learning model, incorporating natural and socio-economic factors, to construct annual probabilistic maps of poor groundwater quality (PGQ, i.e., Class V based on the Chinese groundwater quality standard) across China from 1980 to 2100. Alarmingly, our findings indicate an escalation in the PGQ area ratio, rising from 17.3% in 1980 to 22.2% in 2000, and surging to 40.8% by 2020, adversely affecting 6.8%, 17.5%, and 36.0% of the Chinese population, respectively. The predominant drivers of this degradation were identified as agricultural discharge (contributing to 10.7% growth in PGQ area ratio), followed by groundwater exploitation (5.6%), industrial discharge (5.3%), domestic discharge (1.7%), climate change (0.5%), and land use change (-0.3%). By 2050, the PGQ area ratio could range from 37.9% to 48.3% under different socio-economic and climate scenarios. Our study highlights the urgent need for effective water resources management and conservation measures to mitigate the deteriorating trend of groundwater quality and address the challenges posed by socio-economic development and climate change, thereby safeguarding water security for China and the global community.

Introduction

Groundwater provides almost half of all drinking water worldwide, particularly in rural areas¹. Yet it faces substantial threats from socio-economic development and climate change^2,3. Serving as a crucial source of drinking water for over 400 cities across China, groundwater is especially vital in the northern regions, where it accounts for two-thirds of the drinking water, half of the industrial water, and one-third of the irrigation water^4,5. As the world’s largest industrial powerhouse and a major agricultural producer, China has experienced considerable groundwater quality problems over recent decades⁶. Extensive pollution discharges from agricultural and industrial activities, coupled with large-scale over-exploitation, have led to widespread groundwater pollution across the nation^7,8. In 2020, a report by China’s Ministry of Ecology and Environment indicated that 33.7% of the 10,242 tested sites (dominated by shallow groundwater) had “marginal” groundwater quality, while another 43.6% were classified as “poor”⁹.

In response to the groundwater quality crisis, nations including China have developed environmental monitoring networks aimed at assessing and mitigating the negative impacts of contamination and over-exploitation on this vital resource¹⁰. These networks are important for detecting pollutants early, understanding groundwater flow dynamics, and formulating strategies to increase groundwater sustainability¹¹. However, monitoring is often both costly and time-consuming, and a key limitation of these networks is the sparse spatial distribution of their monitoring sites, which fails to provide comprehensive national coverage¹². This deficiency is particularly acute in emerging economies like China, where the low density of monitoring sites inadequately captures the spatial variability of groundwater quality and pollution incidents, compromising the accuracy and completeness of monitoring efforts¹³. Furthermore, considerable challenges persist in comprehensively quantifying the driving factors behind groundwater quality deterioration and in projecting future changes.

To address the lack of comprehensive data for groundwater quality assessment, researchers have increasingly utilized machine learning (ML) to enhance the mapping of contaminants like arsenic and fluoride, which often stem from natural geological sources^14,15. Pioneering works by Amini et al.¹⁶ and Rodríguez-Lado et al.¹⁷ have incorporated environmental factors into ML models to create predictive maps. These efforts are notable for their ability to interpret complex environmental data—including climate, soil, geology, and topography—into useful predictive tools for assessing groundwater quality^18,19,20,21. However, these studies primarily focused on static environmental factors as predictors, which may not sufficiently capture the dynamic and evolving impacts of human activities and climate change on groundwater systems.

In this study, we employed extensive groundwater survey data, along with natural and socio-economic variables across China, to develop an ML model for predicting groundwater quality. Utilizing this model, we mapped the annual probabilities of poor groundwater quality (PGQ, i.e., Class V based on the Chinese groundwater quality standard²²; Table S1) from 1980 to 2100, under various future scenarios of socio-economic development and climate change. This approach allowed us to illuminate evolving patterns of groundwater quality and assess the profound influence of human activities and climate change on China’s groundwater quality. Our analysis offers crucial insights and guidance for crafting strategies to address the environmental threats that China’s groundwater resources are facing. We underscore the importance of adapting to changes in development patterns and climate conditions to effectively mitigate these challenges.

Results

Random forest modeling

In our study, we compiled a dataset of 1977 groundwater quality surveys extracted from published articles. This dataset incorporated geospatial information, including location and land-use type, as well as temporal information in the form of the sampling year. The spatial distribution and temporal variation of the survey data are presented in Fig. S1. To obtain a robust and unbiased model, we divided these 1977 surveys into two distinct datasets: 90% for training and 10% for validation. To offset potential issues arising from the relatively small sample size, we applied data augmentation, generating modified duplicates of the existing data to enlarge the training dataset. The validation dataset was augmented independently of the training dataset, ensuring no overlap between the two and preserving the integrity of the validation process.
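The split-then-augment order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the class counts are toy values chosen to give roughly 43% PGQ, and `augment` (noisy duplicates with a hypothetical 5% jitter) is only one possible reading of "modified duplicates".

```python
import random

def stratified_split(labels, val_fraction=0.10, seed=0):
    """Return (train_idx, val_idx), holding out val_fraction of each class."""
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_val = round(len(idx) * val_fraction)
        val_idx += idx[:n_val]
        train_idx += idx[n_val:]
    return sorted(train_idx), sorted(val_idx)

def augment(rows, copies, jitter, rng):
    """Hypothetical augmentation: keep each row and append noisy duplicates."""
    out = [list(r) for r in rows]
    for r in rows:
        for _ in range(copies):
            out.append([x * (1 + rng.uniform(-jitter, jitter)) for x in r])
    return out

# Toy stand-in for the 1977 surveys: 850 PGQ (1) and 1127 non-PGQ (0).
labels = [1] * 850 + [0] * 1127
features = [[float(i), 1.0] for i in range(len(labels))]

train_idx, val_idx = stratified_split(labels)

# Augment each subset separately, after splitting, so that no duplicate of a
# training row can end up in the validation set.
rng = random.Random(1)
train_aug = augment([features[i] for i in train_idx], copies=2, jitter=0.05, rng=rng)
val_aug = augment([features[i] for i in val_idx], copies=2, jitter=0.05, rng=rng)
```

Splitting before augmenting is the design choice that matters here: augmenting first would let near-copies of the same survey appear on both sides of the split and inflate the validation scores.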

To establish a quantitative relationship between predictors and groundwater quality (PGQ or non-PGQ), we employed a non-parametric supervised ML technique known as random forest (RF). To develop a parsimonious classification model, redundant variables were eliminated from an initial selection of 51 potential predictor variables (Table S2). This variable pruning, guided by an evaluation of collinearity and independence (Fig. S2), ensured that only those variables that did not compromise the prediction accuracy of subsequent RF models were retained (“Model predictor selection” in Methods). Addressing multicollinearity among these predictors is crucial, as it helps prevent adverse effects such as overfitting, reduced interpretability, and ultimately, compromised predictive performance of the model²³. The 25 predictors finally retained (indicated in bold in Table S2) include factors related to soil properties, geographical and hydrogeological conditions, climate change, groundwater exploitation, pollution discharge, and land-use change. The out-of-bag error of each predictor, reflecting its impact on the model’s predictive accuracy²⁴, is used to assess the importance of each predictor (Fig. S3). The five most important predictors are depth-based groundwater type, air temperature, aridity index, precipitation, and sand content of soil (100–200 cm).
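As a rough illustration of collinearity-based pruning, the sketch below applies a greedy pairwise-correlation filter. It is an assumption-laden stand-in: the actual selection procedure is described in the Methods and Table S2, and the 0.8 threshold, the predictor names, and the toy values here are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def prune_collinear(columns, threshold=0.8):
    """Greedy filter: keep a predictor only if its |r| with every
    already-kept predictor does not exceed the threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical predictors; the second one tracks the first almost linearly.
columns = {
    "annual_precipitation": [400, 600, 800, 1000, 1200, 1400],
    "wet_season_precipitation": [300, 460, 590, 760, 900, 1080],
    "sand_content": [55, 20, 70, 35, 60, 25],
}
kept = prune_collinear(columns)
```

A greedy filter depends on the order in which predictors are visited, so in practice the candidates would be ordered by some prior measure of usefulness before pruning.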

The performance of the RF model on the validation set (10% of the data, which was randomly selected while maintaining the relative distribution of PGQ and non-PGQ) is summarized in the confusion matrix in Table S3. Despite a prevalence of PGQ of only 43% in the dataset, the model performs well in predicting both PGQ (sensitivity: 0.82) and non-PGQ (specificity: 0.87) at a probability cutoff of 0.50. The accuracy is correspondingly high at 0.85. Likewise, the model’s area under the receiver operating characteristic curve (AUC), which considers the full range of possible cutoffs, has a high value of 0.88 on the validation set (Fig. S4). Additionally, the model achieves substantial agreement beyond chance in its classifications, with a Cohen’s kappa coefficient of 0.69. While slightly below the ideal, this value falls within 0.61–0.80, which Landis and Koch defined as substantial agreement²⁵. This slightly lower kappa value reflects the inherent complexity and variability of the environmental data and processes analyzed in this study. The concordance between our model’s predictions and the PGQ maps presented in the “2021 Annual Report of China’s Groundwater Monitoring Project”²⁶ attests to the model’s accuracy and reliability (Fig. S5).
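The reported accuracy and kappa follow directly from the prevalence, sensitivity, and specificity quoted above. The short check below rebuilds the confusion-matrix proportions from those three rates; the function name is illustrative, and the inputs are the rounded values from the text rather than the raw counts in Table S3.

```python
def metrics_from_rates(prevalence, sensitivity, specificity):
    """Accuracy and Cohen's kappa from class prevalence and per-class rates."""
    tp = prevalence * sensitivity              # PGQ predicted as PGQ
    tn = (1 - prevalence) * specificity        # non-PGQ predicted as non-PGQ
    fp = (1 - prevalence) * (1 - specificity)  # non-PGQ predicted as PGQ
    accuracy = tp + tn
    predicted_pgq = tp + fp
    # Agreement expected by chance, given the marginal frequencies.
    expected = prevalence * predicted_pgq + (1 - prevalence) * (1 - predicted_pgq)
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, kappa

accuracy, kappa = metrics_from_rates(prevalence=0.43, sensitivity=0.82, specificity=0.87)
print(round(accuracy, 2), round(kappa, 2))  # 0.85 0.69
```

Kappa is lower than accuracy because roughly half of the agreement (about 0.51 here) would be expected by chance alone with these class frequencies.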

China’s groundwater quality during 1980–2020

Utilizing the developed model, we drew annual maps of the probability of PGQ across China from 1980 to 2020, with a detailed resolution of 1 km. In 1980, regions with PGQ probability exceeding 0.5 were primarily located in Southwest China, Northwest China, and parts of Northeast China, showing diverse levels of PGQ probability (Fig. 1a). As the 21st century unfolded, the situation evolved, with a substantial increase in PGQ probability observed in North China and Central China (Fig. 1b). These regions, which previously had relatively good groundwater quality, experienced a notable degradation by 2020 (Fig. 1c). The probability of being classified as having PGQ exceeds 0.9 in up to 2.1% of the national area, particularly across most regions of North China, signaling a notable deterioration in groundwater quality. In 1980, only about 17.3% of the national area was affected by PGQ (Fig. 2a). By 2000, this percentage had slightly increased to 22.2%. However, there was a substantial rise to a peak of 41.6% in 2019, followed by a slight decrease to 40.8% in 2020.

**Fig. 1: Spatial-temporal patterns of China’s groundwater quality and affected population.**

**Fig. 2: Temporal Evolution of PGQ Area and Affected Population.**

Figure 3a illustrates the spatial distribution of groundwater quality category changes in China from 1980 to 2020, detailing regions of deterioration (non-PGQ changed to PGQ), improvement (PGQ changed to non-PGQ), and stability (unchanged). Across the country, 25.3% of the national area experienced deterioration in groundwater quality, 1.8% showed improvement, and 72.9% remained unchanged. Figs. S6a and S6b show that the area with deteriorated groundwater quality expanded from 6.8% during 1980–2000 to 20.9% during 2000–2020, suggesting an accelerated deterioration over recent decades. Between 1980 and 2000, regions with deteriorated groundwater quality were primarily scattered across Central China, North China, and coastal regions (Fig. S6a). In the last two decades, deterioration has spread to encompass more regions, including North China, Northeast China, Northwest China, and Southeast China (Fig. S6b).
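The three change categories can be derived cell by cell from two probability maps. The sketch below assumes equal-area cells and the 0.5 probability cutoff used elsewhere in the text; the eight toy probabilities are invented for illustration. The last lines check that the reported deterioration and improvement shares are consistent with the reported change in PGQ area ratio.

```python
from collections import Counter

def classify_change(p_start, p_end, cutoff=0.5):
    """Label one cell by comparing its PGQ status at two dates."""
    start_pgq, end_pgq = p_start > cutoff, p_end > cutoff
    if end_pgq and not start_pgq:
        return "deteriorated"   # non-PGQ changed to PGQ
    if start_pgq and not end_pgq:
        return "improved"       # PGQ changed to non-PGQ
    return "unchanged"

# Toy probabilities for eight equal-area cells (not from the paper).
p_1980 = [0.20, 0.70, 0.40, 0.60, 0.10, 0.30, 0.80, 0.45]
p_2020 = [0.60, 0.80, 0.30, 0.40, 0.55, 0.20, 0.90, 0.70]

counts = Counter(classify_change(a, b) for a, b in zip(p_1980, p_2020))
shares = {k: 100 * v / len(p_1980) for k, v in counts.items()}

# Consistency check on the reported national figures (percentage points):
# net change = deteriorated - improved should match the 1980-to-2020 rise.
net_reported = round(25.3 - 1.8, 1)
rise_reported = round(40.8 - 17.3, 1)
```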

**Fig. 3: Groundwater quality changes and the drivers.**

Compared to the expansion of PGQ area, the affected population has undergone a more pronounced change. In the last century, most of the regions with PGQ were not densely populated, with the majority having a population density of less than 100 people per km² (Fig. 1d). However, since the turn of the century, there has been a noticeable increase in the population density within the newly emerged PGQ regions (Fig. 1e). Approximately one-third of these PGQ regions now have a population density exceeding 100 people per km² (Fig. 1f). From 1980 to 2020, the area ratio suffering from PGQ more than doubled, while the proportion of affected population rose roughly fivefold: from 6.8% of the population in 1980 to 17.5% in 2000, peaking at 38.3% in 2018 before slightly decreasing to 36.0% in 2020 (Fig. 2b).
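Area ratio and affected-population ratio can diverge because they weight the same PGQ cells differently. The sketch below shows the two weightings on a toy four-cell grid (all grid values are invented), then computes the growth factors implied by the percentages reported in the text.

```python
def pgq_shares(prob, area_km2, population, cutoff=0.5):
    """Return (area share, population share) of cells whose PGQ
    probability exceeds the cutoff."""
    flagged = [p > cutoff for p in prob]
    area_share = sum(a for a, f in zip(area_km2, flagged) if f) / sum(area_km2)
    pop_share = sum(n for n, f in zip(population, flagged) if f) / sum(population)
    return area_share, pop_share

# Toy grid: the two PGQ cells cover half the area but hold few people.
prob = [0.9, 0.6, 0.3, 0.2]
area_km2 = [1.0, 1.0, 1.0, 1.0]
population = [20, 50, 400, 530]
area_share, pop_share = pgq_shares(prob, area_km2, population)

# Growth factors implied by the reported 1980 and 2020 percentages.
area_growth = 40.8 / 17.3        # about 2.4
population_growth = 36.0 / 6.8   # about 5.3
```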

By analyzing drivers of groundwater quality changes from 1980 to 2020 (“Analysis of drivers” in Methods), we obtained the distribution of dominant factors (Fig. S7a) and their relative contributions (Fig. S7b). The analysis showed that the dominant drivers vary markedly from region to region.

Pollution-dominant regions encompassed the largest area, covering 16.2% of the national area (Fig. 3b). Among the pollution sources, agricultural discharge emerged as the predominant contributor, affecting 9.6% of the national area, mainly located in North China, Northeast China, and Central China. Industrial discharge as the dominant driver was concentrated primarily in Northeast China and Central China, affecting 4.4% of national area. In contrast, regions predominantly influenced by domestic discharge were concentrated in Southeast China, comprising 2.2% of the national area. Additionally, regions where groundwater exploitation was the main driver constituted 4.6% of national area, notably impacting Northeast China and Northwest China. The distribution of regions dominated by land use was fragmented, with the smallest area ratio of 0.2%.

During 1980–2020, the drivers of groundwater quality change in China have evolved spatially and temporally. Notably, in regions experiencing groundwater deterioration, agricultural discharge and climate change emerged as the primary drivers from 1980 to 2000, affecting 2.7% and 1.8% of the national area, respectively (Fig. S6c). The period from 2000 to 2020 saw a change in dominant drivers, with agricultural discharge and groundwater exploitation taking precedence, controlling 7.6% and 5.7% of the national area, respectively (Fig. S6d). This period witnessed a marked escalation in the cumulative impact of various environmental factors on the change of PGQ area ratio. Specifically, agricultural discharge experienced a dramatic increase in its contribution to the PGQ area ratio, climbing from 0.04% in 1985 to 10.7% in 2020, while industrial discharge rose from 0.09% to 5.3% over the same period (Fig. 3c). Similarly, groundwater exploitation marked a notable increase, escalating from 0.07% to 5.6%. Domestic discharge, though initially negligible, also notably impacted PGQ area ratio, contributing an increase of 1.7% by 2020 from 0.1% in 1985. Concurrently, climate change demonstrated a fluctuating yet persistent moderate influence, starting from a 0.1% contribution in 1985, peaking at 0.9% in 2000, and stabilizing at 0.5% in 2020. In contrast, land use changes had a consistently minimal and predominantly positive impact on groundwater quality dynamics.
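As a quick consistency check, the driver contributions reported for 2020 should add up to the total rise in PGQ area ratio since 1980. The values below are the ones quoted in the text and abstract (in percentage points); the land-use value of -0.3 comes from the abstract.

```python
# Contribution of each driver to the change in PGQ area ratio by 2020.
contributions = {
    "agricultural discharge": 10.7,
    "groundwater exploitation": 5.6,
    "industrial discharge": 5.3,
    "domestic discharge": 1.7,
    "climate change": 0.5,
    "land use change": -0.3,
}

total = round(sum(contributions.values()), 1)
observed_rise = round(40.8 - 17.3, 1)  # PGQ area ratio, 2020 minus 1980

# Drivers ranked from largest to smallest contribution.
ranking = sorted(contributions, key=contributions.get, reverse=True)
print(total, observed_rise)  # 23.5 23.5
```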

Post-2020 projections of groundwater quality changes

We analyzed the temporal evolution of groundwater quality from 2021 to 2100 under four distinct scenarios, enabling an assessment of long-term influence of these key drivers on groundwater quality (Fig. 4). In Scenario I, we maintained the intensity of all influencing factors at their 2020 levels. To further investigate the responses of groundwater quality to future socio-economic development and climate change, we used the Shared Socio-economic Pathways (SSP) scenarios, i.e., SSP1-1.9 (Scenario II), SSP2-4.5 (Scenario III), and SSP5-8.5 (Scenario IV), from the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR6)²⁷ (“Scenario settings” in Methods). Specifically, Fig. 4a-d shows the PGQ probability in 2050 under four scenarios (I, II, III, and IV), while Fig. 4e-h depicts the affected populations, respectively. Additionally, Fig. S8 explores changes in PGQ probability up to 2050, and Fig. S9 displays the changes in groundwater quality categories between 2020 and 2050 under four scenarios, whereas Fig. S10 extends to 2100, detailing the distribution of PGQ probability as well.

**Fig. 4: Projections of China’s groundwater quality under four scenarios (I, II, III, and IV).**

Scenario I projects a near-constant rate of deterioration, with the PGQ area ratio projected to rise from 40.8% in 2020 to 42.9% in 2050, and slightly increase further to 43.1% in 2100 (affecting ~35.3% of the population) (Fig. 2). Key regions of concern include Southeast China, Southwest China, and Central China by 2050 (Fig. S9a), particularly in South China where deterioration is expected to expand by 2100 (Fig. S10i).

In contrast, Scenario II projects substantial improvements, driven by proactive environmental policies and reduced emissions. This optimistic projection is evidenced by the PGQ area ratio’s decline to 37.9% in 2050, with expectations for a further reduction to 33.2% in 2100, illustrating a sustained positive trend (Fig. 2). The affected population is expected to decrease from 31.1% in 2050 to 21.1% in 2100. This improvement is predominantly observed in Northeast China and North China in 2050 (Fig. S9b), with additional progress anticipated in South China in 2100 (Fig. S10j).

Scenario III presents a concerning outlook, revealing a troubling trend with the PGQ area ratio rising from 47.2% in 2050 to 49.1% in 2100 (Fig. 2). Approximately 39.1% of the population could be exposed to PGQ in 2050, further increasing to 40.5% in 2100. This scenario reflects a situation of medium pollution and persistent emissions, leading to widespread deterioration, particularly in Central China and Southwest China in 2050 (Fig. S9c). Moreover, the projection indicates an expansion of the affected regions, extending groundwater quality deterioration to Northwest China and South China in 2100 (Fig. S10k).

Scenario IV, which represents the highest emission trajectory, projects an even more alarming increase in PGQ area ratio (48.3% in 2050 and 50.8% in 2100) (Fig. 2). Concurrently, approximately 40.7% of the population could be exposed to PGQ in 2050, with this proportion slightly decreasing to about 39.5% in 2100. The spatial distribution of PGQ expands notably, observed in the more industrialized and populated regions like Northeast and Central China in 2050 (Fig. S9d), and later spreads to additional regions in Northwest China and South China in 2100 (Fig. S10l).

Across all four scenarios, relative to the spatial distribution pattern observed in 2020, the probability of encountering PGQ in northern regions, especially in North China and Northeast China, is expected to slightly decrease by 2050 (Fig. S8) and 2100 (Figs. S10e-h), though the overall groundwater quality is projected to remain poor (Figs. S9 and S10i-l).
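The scenario figures quoted above can be gathered into one structure for side-by-side comparison. The numbers are taken from the preceding paragraphs; `None` marks a value the text does not state, and the dictionary layout itself is only an illustrative choice.

```python
# PGQ area ratio (%) and affected population (%) by scenario and year.
projections = {
    "I (2020 levels held)": {"area": {2050: 42.9, 2100: 43.1},
                             "population": {2050: None, 2100: 35.3}},
    "II (SSP1-1.9)": {"area": {2050: 37.9, 2100: 33.2},
                      "population": {2050: 31.1, 2100: 21.1}},
    "III (SSP2-4.5)": {"area": {2050: 47.2, 2100: 49.1},
                       "population": {2050: 39.1, 2100: 40.5}},
    "IV (SSP5-8.5)": {"area": {2050: 48.3, 2100: 50.8},
                      "population": {2050: 40.7, 2100: 39.5}},
}

# Spread of the PGQ area ratio across scenarios in 2050.
area_2050 = [s["area"][2050] for s in projections.values()]
low, high = min(area_2050), max(area_2050)

# Change in PGQ area ratio between 2050 and 2100, in percentage points.
trend = {name: round(s["area"][2100] - s["area"][2050], 1)
         for name, s in projections.items()}
```

Only Scenario II shows a falling area ratio after 2050; under Scenario IV the affected population share dips slightly by 2100 even though the area ratio keeps rising.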

Our analysis reveals a complex interplay of factors (i.e., climate change, groundwater exploitation, land use and pollution discharge) driving changes in China’s groundwater quality during 2020–2100 (“Analysis of drivers” in Methods). The spatial distribution of dominant factors affecting PGQ probability in China during 2020–2050 is detailed in Fig. S11, specifically focusing on areas experiencing deterioration (Fig. 5a, b, d, e) or improvement (Fig. 5c). Temporal trends of these drivers across China are presented in Fig. S12.

**Fig. 5: Dominant drivers of China’s groundwater quality change under four scenarios (I, II, III, and IV) during 2020–2100.**

Pollution discharge plays a critical role, especially in regions like Central China and Southwest China, where industrial, agricultural, and domestic activities are prevalent. It is the dominant driver over the largest area under Scenario IV, at 3.0% of the national area (Fig. 5e), compared with 2.2% under Scenario III (Fig. 5d) and 2.6% under Scenario I (Fig. 5a). During 2020–2050, pollution is the leading contributor to the increase in PGQ area ratio, particularly in Scenario IV, with an increase of 4.0% (Fig. 5f). This trend not only persists but also intensifies by 2100, with pollution contributing an increase of 4.2% in PGQ area ratio relative to 2020, reflecting the escalating effects of pollution under high-emission scenarios.

Land use, whose changes are driven by agricultural expansion, urbanization, and other human activities, predominantly impacts groundwater quality in Central China and Eastern China, particularly in regions such as the North China Plain, the middle and lower reaches of the Yangtze River, and the Northeast Plain, where intensive agricultural and urban development are prevalent. Under Scenario IV, land use dominates groundwater quality change in 5.6% of the national area, while Scenario III sees a controlling area ratio of 5.5% (Fig. 5d, e). Figure 5f reveals that land use is one of the key contributors to changes in PGQ area ratio, with Scenario IV leading to a 1.5% increase, and Scenario III showing a 1.2% increase by 2100. Notably, Scenario II stands out with a substantial decrease in land-use-related impacts, reducing the PGQ area ratio by 0.9% in 2050 and further to 1.8% in 2100, indicating a potential positive effect of controlled changes in land use on groundwater quality in certain scenarios.

Groundwater exploitation notably affects regions with high water demand, such as Northeast China, Northwest China, and North China. Under all scenarios, it contributes to groundwater quality deterioration in up to 0.3% of the national area. However, Scenario II stands out for its positive effects, improving groundwater quality across ~4.0% of the national area (Fig. 5c). From 2020 to 2050, reductions in groundwater exploitation, particularly under Scenario II, result in a slight improvement in PGQ area ratio by 0.2% (Fig. 5f). In 2100, continued efforts to reduce groundwater exploitation under Scenario II are projected to further reduce the PGQ area ratio by up to 0.6%, underscoring the importance of sustainable groundwater management.

Climate change exerts a moderate yet notable influence on groundwater quality across all scenarios, especially in Northwest China, Central China, and Southeast China. In Scenario I, climate change affects 1.2% of the national area (Fig. 5a), increasing to 3.1% in Scenario IV (Fig. 5e). The impact is similarly substantial in Scenario III, where it influences 3.1% of the national area. In 2050, climate change has a modest impact on the PGQ area ratio, subtly altering it by less than 1.0% across various scenarios (Fig. 5f). However, in 2100, its effect becomes slightly more pronounced, potentially increasing the PGQ area ratio by up to 1.8% in Scenario IV, reflecting a steady, though gradual, worsening of groundwater quality driven by this factor.

Discussion

In this study, our model ranks the factors that notably influence the prediction accuracy of groundwater quality, identifying depth-based groundwater type as the most critical predictor (Fig. S3). Sensitivity analysis, involving a ±10% change in each variable, highlighted permeability as the most sensitive factor (Fig. S13). In addition, we analyzed the interrelationships among the predictors (Fig. S2), as well as the partial dependence plots and feature interaction charts (Figs. S14, S15), demonstrating the complex interactions among the predictors. Shallow groundwater is more vulnerable to surface contaminants, while deeper groundwater is less impacted²⁸. Regions with lower precipitation, higher temperatures, and higher groundwater exploitation rates showed higher PGQ probabilities (Fig. S15). Climate-driven changes directly affect recharge rates and groundwater replenishment, while indirectly altering groundwater usage patterns²⁹. In arid and semi-arid regions, reduced rainfall leads to over-extraction and higher pollutant concentrations due to decreased dilution, whereas humid regions see increased mobilization and dilution of pollutants^30,31. As climate change continues to reshape global temperature and precipitation patterns, these dynamics are expected to become increasingly complex. Additionally, soil sand content (100–200 cm) notably affects recharge and contamination, with higher content increasing susceptibility to surface contaminants (Fig. S14f)³². It is essential to note that this analysis is based on a nationwide perspective, and the most critical predictors vary spatially, as shown in Fig. S7a. These insights highlight the importance of informing policymakers and stakeholders about region-specific factors that must be managed to improve groundwater quality.
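A ±10% sensitivity analysis of the kind mentioned above can be sketched as a one-at-a-time perturbation. The logistic function below is a hypothetical stand-in for the trained classifier, with invented coefficients and baseline values; it is not the paper's random forest and will not reproduce Fig. S13.

```python
import math

def toy_pgq_probability(x):
    """Hypothetical stand-in for the trained model (not the paper's RF)."""
    z = (0.004 * x["exploitation"]
         - 0.002 * x["precipitation"]
         + 0.05 * x["temperature"])
    return 1 / (1 + math.exp(-z))

def oat_sensitivity(model, baseline, delta=0.10):
    """One-at-a-time sensitivity: half the change in the predicted probability
    when one variable moves from -delta to +delta, the others held fixed."""
    result = {}
    for name, value in baseline.items():
        up = model({**baseline, name: value * (1 + delta)})
        down = model({**baseline, name: value * (1 - delta)})
        result[name] = (up - down) / 2
    return result

baseline = {"exploitation": 200.0, "precipitation": 600.0, "temperature": 12.0}
sensitivity = oat_sensitivity(toy_pgq_probability, baseline)
most_sensitive = max(sensitivity, key=lambda k: abs(sensitivity[k]))
```

One-at-a-time perturbation ignores interactions between predictors, which is why the text pairs it with partial dependence plots and feature interaction charts.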

It is indisputable that increased pollution discharges and excessive exploitation have played critical roles in the deterioration of this vital resource over the past decades. Groundwater exploitation in China has surged from 645.7 billion m³ in 1980 to a peak of 1133.8 billion m³ in 2012 (Fig. S16a). Over a similar period, the total volume of domestic and industrial wastewater has roughly doubled, increasing from around 41.4 billion tons in 1980 to about 85.2 billion tons in 2020, with the majority of this growth attributed to domestic discharge (Fig. S16b). Pollution discharges have been the main drivers of groundwater quality degradation in China over the past four decades, particularly from agricultural discharge in major regions like North China³³ (Figs. 3 and S7). Excessive groundwater exploitation serves as a secondary driver, predominantly in North China and Northwest China. Moreover, industrial and domestic discharges notably impact groundwater quality³⁴. North China, one of the most important agricultural and economic centers in China, bears the brunt of groundwater exploitation, with rising groundwater withdrawal and waste discharge intensifying the risk of pollutants seeping into aquifers and threatening the quality of deep groundwater resources³⁵. Remarkably, changes in land use, such as optimized management of industrial and agricultural lands, converting agricultural land to forested areas or conservation zones under the widespread “grain-for-green” policy, establishing urban green spaces, and restoring wetlands, likely contribute to the improvement of groundwater quality by reducing pollution from agricultural chemicals³⁶. Strengthening such environmentally friendly measures, like improving fertilizer and pesticide use efficiency, is crucial for protecting groundwater quality³⁷.

Although groundwater quality in China has declined for decades, recent years have seen a notable change in this trend. This change, marked by the slight decrease in the PGQ area ratio in 2020 after years of consistent increase (Fig. 2), may be attributed to a variety of strategic interventions. By 2020, groundwater exploitation had decreased by one-fifth from its peak intensity in 2012 (Fig. S16a), likely owing to the operation of major water conservancy projects such as the South-North Water Transfer Project³⁸. The establishment and enforcement of groundwater pollution prevention laws and regulations, combined with environmental inspection systems, have played a vital role in curbing pollution sources³⁹. Furthermore, the number of wastewater treatment facilities in China has been rising (Fig. S17), attributable to the proactive environmental protection measures enacted by the government, such as the “Action Plan for Prevention and Control of Water Pollution”⁴⁰. Innovative developments in pollution control technologies, spanning source control, process blocking, and end-point remediation (supported by dedicated scientific programs), along with their engineered applications, have further contributed to groundwater quality improvements⁴¹. These actions demonstrate how government initiatives can effectively address water quality challenges and set a precedent for achieving the United Nations’ Sustainable Development Goal 6 on sustainable water and sanitation⁴².

In response to diverse development and climate change trajectories, China urgently needs proactive and flexible strategies to manage groundwater resources and control industrial pollution. These strategies should enforce strict compliance and discharge standards for manufacturing enterprises, prioritize sustainable development goals, and integrate climate resilience tactics to ensure the long-term viability of groundwater resources, particularly in vulnerable areas^43,44,45. In North China, particularly the North China Plain, reducing groundwater over-exploitation through stricter regulations and promoting water-saving irrigation is crucial⁴⁶. Additionally, incentivizing sustainable agricultural practices, such as crop rotation and controlled fertilizer application, can mitigate groundwater contamination⁴⁷. In Northwest China, policies should target reducing industrial water use and enhancing recharge programs, while encouraging water recycling and less water-intensive technologies⁴⁸. In Southern China, particularly in regions prone to extreme weather events, effective flood control and improved land-use planning are needed to reduce contaminant transport during floods, alongside stronger waste management regulations to minimize contamination risks^49,50.

Although visual comparisons and statistical evaluations suggest that the model estimates are generally reasonable, notable uncertainties remain. Monte Carlo simulations were applied to assess uncertainties in PGQ probability predictions, providing insights into spatial and temporal variability. As shown in Figs. S18d-f and S19e-h, uncertainties are moderate overall but higher in North and Northeast China from 1980 to 2100, likely due to complex interactions between socio-economic factors and groundwater systems. Under Scenario IV, uncertainties become more pronounced (Fig. S20), likely driven by increased discharges and groundwater exploitation. This highlights the need for adaptive groundwater management and pollution control strategies, particularly in high-emission scenarios, to mitigate adverse impacts and enhance prediction reliability in vulnerable regions.
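The text does not detail the Monte Carlo setup, so the sketch below shows only the general idea under loudly stated assumptions: inputs are perturbed with independent Gaussian noise (a hypothetical 10% relative standard deviation), a hypothetical logistic function stands in for the trained model, and the spread of the resulting probabilities is taken as the uncertainty.

```python
import math
import random
import statistics

def toy_pgq_probability(exploitation, precipitation):
    """Hypothetical stand-in for the trained classifier (not the paper's model)."""
    z = 0.004 * exploitation - 0.002 * precipitation + 0.6
    return 1 / (1 + math.exp(-z))

def monte_carlo(inputs, rel_sd=0.10, n_draws=2000, seed=42):
    """Mean and standard deviation of the predicted PGQ probability when each
    input is perturbed by independent Gaussian noise (assumed error model)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        perturbed = {k: v * (1 + rng.gauss(0, rel_sd)) for k, v in inputs.items()}
        draws.append(toy_pgq_probability(**perturbed))
    return statistics.mean(draws), statistics.stdev(draws)

# One toy grid cell; repeating this per cell would give an uncertainty map.
mean_p, sd_p = monte_carlo({"exploitation": 200.0, "precipitation": 600.0})
```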

Building on these uncertainties, our study also encounters certain limitations, particularly concerning assumptions of socio-economic predictors and the spatiotemporal resolution of data. The use of regional averages instead of point-specific values, the variability in sampling methods, and the exclusion of highly correlated predictor variables—though necessary for model generalization and parsimony—may have led to an incomplete representation of complex interactions, potentially underestimating the contributions of correlated factors. Additionally, assuming steady-state conditions for some environmental variables like soil properties and water table depth may overlook key temporal variations, limiting the capture of dynamic contaminant transport over time. Furthermore, the lack of distinction between geogenic and anthropogenic sources of PGQ and differentiation among phreatic, confined, and unspecified groundwater samples could affect the accuracy of predictions. Moreover, the use of Class V as a threshold for PGQ is conservative, and stricter thresholds like Class IV could be considered. Our single-factor assessment focuses on the worst indicator, which may lead to stricter evaluations but offers a cautious approach to groundwater management. Lastly, our model prioritizes factors influencing overall groundwater quality rather than specific indicators, potentially differing from more targeted assessments.

Despite the inherent limitations, our approach offers distinct advantages over traditional methods, which are often labor-intensive and costly. By integrating regional data from previous research with accessible natural and socio-economic predictors, our methodology presents a comprehensive analytical framework for assessing groundwater quality. This strategy provides detailed patterns of groundwater quality evolution across China from 1980 to 2100, revealing that pollution discharge, groundwater over-exploitation, and land-use change are major contributors to its degradation, particularly in high-risk regions affecting large populations. Furthermore, our projections indicate that proactive environmental policies can substantially improve groundwater quality under diverse future scenarios. This approach is especially valuable for emerging economies, where limited groundwater data and PGQ are common. By offering a cost-effective solution, our model not only fills critical data gaps but also provides actionable insights into resource management and strategic planning. Although uncertainties remain, especially regarding socio-economic models and emissions paths, the findings underscore the need for adaptive, region-specific groundwater management practices to mitigate socio-economic and climate-related impacts. Future research can build on this model by refining data resolution and examining specific pollutant pathways to enhance predictive reliability and support long-term, sustainable groundwater policies.

Methods

Data collection and pre-processing

The groundwater quality data were obtained from published articles by searching the China National Knowledge Infrastructure (CNKI) and Web of Science. Initially, a total of 31,535 studies, comprising 25,200 articles in Chinese and 6335 articles in English, were retrieved using the following search formula: “Groundwater quality” (Topic) and “China” (Topic). Given the heterogeneity of the data, the retrieved publications were further filtered according to the following rules: (i) at least five groundwater quality indicators were determined; (ii) sampling locations were provided at prefecture level; (iii) there were more than five sampling points and sufficient statistical information (including at least the mean); (iv) land-use type was provided; (v) samples were taken between 1990 and 2020. To capture the contemporary groundwater quality status, data predating 1990 were deliberately omitted because of their limited representativeness of the prevailing conditions. Finally, 753 articles were identified to establish our dataset, which contains a total of 1977 surveys, as shown in Fig. S1.

The groundwater quality data were classified into two categories: non-PGQ and PGQ, based on the criteria outlined in the Groundwater Quality Standards of the People’s Republic of China (GB/T 14848-2017)²². This classification process involves the following four steps:

Step 1: Collect all relevant groundwater quality indicators from the dataset, ensuring they contain more than five indicators, as specified by established testing standards.

Step 2: Assess each indicator individually by classifying it as Class V or not, based on Table S1.

Step 3: Determine the overall groundwater quality with the worst indicator.

Step 4: Categorize the groundwater quality into two types: non-PGQ (Class I to IV) and PGQ (Class V).
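The four steps above can be sketched in a few lines of Python. This is only an illustration of the worst-indicator rule: the indicator names and limits in `PLACEHOLDER_CLASS_V_LIMITS` are hypothetical stand-ins, not the values of Table S1 or GB/T 14848-2017, and the minimum of five indicators follows filtering rule (i) as an assumption.

```python
from typing import Dict

# Hypothetical Class V limits for illustration only (NOT the Table S1 values).
PLACEHOLDER_CLASS_V_LIMITS: Dict[str, float] = {
    "ind_1": 1.0, "ind_2": 1.0, "ind_3": 1.0, "ind_4": 1.0, "ind_5": 1.0,
}


def is_class_v(name: str, value: float, limits: Dict[str, float]) -> bool:
    """Step 2: an indicator is Class V if it exceeds its (placeholder) limit."""
    return value > limits[name]


def classify_survey(indicators: Dict[str, float],
                    limits: Dict[str, float] = PLACEHOLDER_CLASS_V_LIMITS,
                    min_indicators: int = 5) -> int:
    """Return 1 for PGQ (Class V) and 0 for non-PGQ (Class I to IV)."""
    # Step 1: keep only indicators that have a limit, and require enough of them.
    known = {k: v for k, v in indicators.items() if k in limits}
    if len(known) < min_indicators:
        raise ValueError("too few indicators to classify this survey")
    # Steps 3 and 4: the worst indicator decides, so one Class V value is enough.
    return int(any(is_class_v(k, v, limits) for k, v in known.items()))
```

Real standards also contain two-sided limits (for example, acceptable ranges), which this one-sided sketch does not handle.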

In our analysis, we meticulously built the relationship between predictors and groundwater quality category (PGQ or non-PGQ). As listed in Table S2, the predictors included two classes of steady-state factors (soil properties and geographic and hydrogeologic characteristics) and four classes of time-variant factors (i.e., climate change, pollution discharge, groundwater exploitation, and land-use change).

Groundwater quality is influenced by various environmental and chemical processes, including redox reactions, contaminant transport, and natural filtration mechanisms⁵¹. Given China’s vast and diverse landscape, along with notable regional variations in data availability and quality, we prioritized accessible and quantifiable predictors to ensure broader applicability and consistency in groundwater quality predictions across different regions. Accordingly, we selected predictors that not only capture redox-related influences indirectly but also reflect other critical mechanisms affecting groundwater quality, providing a comprehensive foundation for reliable assessment. To effectively capture these complex processes, we selected predictors such as soil properties (e.g., porosity, permeability, soil type, and soil composition across different soil layers) to reflect contaminant adsorption and biological degradation, geographic and hydrogeologic characteristics (e.g., topographic wetness index, water-table ratio, and groundwater type) to indicate contaminant accumulation and natural dilution, and climatic factors (e.g., temperature, precipitation, and aridity) to represent groundwater recharge, evaporation, and pollutant mobility¹⁷,¹⁹,³⁰. Additionally, pollution discharge (agricultural, industrial, and domestic sources) can substantially affect groundwater quality by increasing contaminant loads, while land-use change can disrupt the natural infiltration and recharge processes⁵². Notably, groundwater exploitation (e.g., groundwater supply and groundwater exploitation rate) emerges as a crucial anthropogenic perturbation factor⁵³,⁵⁴. Excessive groundwater extraction lowers water tables, concentrates contaminants, and, in coastal areas, can trigger seawater intrusion, increasing salinity³⁵,⁵⁵,⁵⁶. It can also mobilize naturally occurring contaminants like arsenic and fluoride, further degrading water quality³⁰.

To classify groundwater quality at the prefecture level, we implemented a series of data preprocessing steps. First, regional data were converted into 1 km² grid cells based on regional land-use types, with all cells within a prefecture assigned the corresponding groundwater quality data to ensure consistency. The groundwater quality data were then binarized, assigning a value of zero to non-PGQ and one to PGQ. The purpose of this approach was twofold: (i) to prioritize the fundamental health question of whether groundwater is safe (non-PGQ) or unsafe (PGQ) for drinking; and (ii) to address variations in precision resulting from the diverse analysis methods employed in different data sources. Next, natural factors in the predictors were extracted at a 1 km² resolution, assigning each grid cell a unique value representing local environmental conditions. Socio-economic factors were uniformly applied to all grid cells within a province, based on the sampling year. The dataset on land-use types covers specific benchmark years: 1980, 1990, 1995, 2000, 2005, 2010, 2015, 2020, and every five years between 2020 and 2100. For other years, land-use data are assumed to be identical to those of the most recent prior benchmark year. For instance, land-use data in 1989 are taken to be the same as those in 1980. For other dynamic factors, their cumulative average values from 1980 to the target year were calculated as the model inputs. This accounts for the lag in pollutant migration and effectively captures the long-term impacts on groundwater quality. Fertilizer consumption and pesticide consumption were allocated evenly per km² of agricultural land, with non-agricultural land assigned zero. Industrial wastewater discharge, industrial solid waste discharge, domestic wastewater discharge, domestic solid waste discharge, and groundwater supply were allocated per km² based on population density for each province. Per capita data for these variables were calculated by dividing the total value by the population of each province, then distributed per km² based on population density. Groundwater exploitation rates were uniformly distributed across each province, with regions having a population density below 1 person/km² assigned a rate of zero. Other factors were allocated per km² based on the resolution of raster data or vector data. Finally, groundwater quality categories for each year and region were matched with the corresponding natural and socio-economic variables to create the training and validation datasets for the RF model.
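Three of the preprocessing rules described here lend themselves to a short sketch: the benchmark-year lookup for land use, the cumulative average of a dynamic factor, and the per-capita allocation by population density. The function names and the toy numbers in the comments are illustrative assumptions and are not taken from the released code.

```python
from bisect import bisect_right
from typing import Dict, List

# Benchmark years listed in the text: 1980, then every five years from 1990 to 2100.
LAND_USE_BENCHMARKS: List[int] = [1980] + list(range(1990, 2101, 5))


def benchmark_year(year: int) -> int:
    """Most recent benchmark year at or before `year` (e.g., 1989 maps to 1980)."""
    if year < LAND_USE_BENCHMARKS[0]:
        raise ValueError("no land-use benchmark before 1980")
    return LAND_USE_BENCHMARKS[bisect_right(LAND_USE_BENCHMARKS, year) - 1]


def cumulative_average(series: Dict[int, float], target_year: int,
                       start_year: int = 1980) -> float:
    """Mean of the annual values from `start_year` through `target_year`."""
    values = [v for y, v in series.items() if start_year <= y <= target_year]
    if not values:
        raise ValueError("no data in the requested window")
    return sum(values) / len(values)


def allocate_by_population(total: float, province_population: float,
                           cell_density: List[float]) -> List[float]:
    """Per-capita value times population density (persons/km2) of each cell."""
    per_capita = total / province_population
    return [per_capita * density for density in cell_density]
```

For example, a provincial total of 100 units shared by 50 people and two 1 km² cells holding 10 and 40 people would be split 20 and 80 under this rule.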

Model development and evaluation

In this study, the RF model was constructed to predict the spatiotemporal dynamics of groundwater quality from natural and socio-economic factors. The RF model was chosen for its ability to handle large datasets with numerous predictor variables while efficiently capturing complex interactions among them²⁴. The RF model uses the bootstrap resampling method to build multiple decision trees. For the construction of each tree, samples are independently selected; however, the distributions for all trees in the forest are the same, which guarantees the robustness of the model. In addition, the RF model offers key advantages such as preventing overfitting by introducing randomness at each decision node, which enhances generalization. By selecting random feature subsets for each tree, RF effectively addresses the challenges of high dimensionality in the data. It also reduces experimental noise and improves prediction accuracy through an ensemble approach, averaging predictions from multiple decision trees. Furthermore, RF provides inherent feature importance measures, helping identify key variables, and handles missing data efficiently, either through imputation or splitting based on available features. With minimal parameter tuning required compared to other machine learning algorithms, RF presents itself as an ideal model for this study¹⁴,⁵⁷,⁵⁸.

We began by optimizing the hyperparameters of the RF model, with a particular focus on the number of decision trees. By systematically testing tree counts in increments of 10, from 10 to 200, we identified 50 trees as the optimal balance between accuracy and computational efficiency (Fig. S21). Other parameters, such as minimum leaf size, were kept at their default settings, as these have proven effective in similar applications.

To further validate the model’s performance and safeguard against overfitting, we conducted 10-fold cross-validation. The dataset was randomly partitioned into 10 subsets, with 9 used for training and 1 for validation in each iteration. This process was repeated across all subsets, ensuring comprehensive evaluations. The averaged performance metrics (Table S4 and Fig. S22) confirmed the model’s ability to generalize effectively, without overfitting.
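The partitioning behind 10-fold cross-validation can be sketched as follows. This is a generic index-based version; whether the study partitioned at the level of surveys or of grid points is not stated, so the unit of a "sample" here is an assumption, as is the name `k_fold_indices`.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists; each sample is validated once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random partition
    folds = [indices[i::k] for i in range(k)]     # k nearly equal subsets
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation
```

In use, a model would be fitted on each `train` list and scored on the matching `validation` list, and the k scores would then be averaged.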

After confirming the model’s robustness through cross-validation, we proceeded to retrain it using 90% of the data (1779 surveys, comprising 67,315 points) for training and 10% (198 surveys, comprising 15,335 points) for validation, each preserving the proportion of PGQ and non-PGQ of the full dataset. This approach maximizes the model’s learning capacity by providing a larger training set while preserving an independent validation set to assess performance on unseen data. This final validation confirmed that the model maintained strong predictive accuracy, further enhancing its generalization ability.
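A split that preserves the PGQ to non-PGQ proportion is usually called a stratified split. The sketch below shows one minimal way to do it on label lists; the function name and the rounding rule for the validation size are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], val_fraction: float = 0.1,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split indices so each class keeps roughly `val_fraction` in validation."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train: List[int] = []
    validation: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)  # per-class validation size
        validation.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(validation)
```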

To precisely and comprehensively establish the relationship between model predictors and groundwater quality, 51 predictor variables were initially included. We conducted Pearson correlation analysis to reduce redundancy in the predictor variables. Initially, all original variables were included in the RF model training. A candidate variable was then excluded if its Pearson correlation coefficient²³ with any other variable was larger than 0.7. However, certain variables were retained despite exceeding this correlation threshold, because excluding them notably reduced the model’s predictive performance, demonstrating their essential contributions to accurately capturing the system’s dynamics. This step-by-step approach ensures that the final set of variables is less likely to include redundant information, thus improving the parsimony and interpretability of the model. Finally, 25 predictors were retained to predict groundwater quality in China. The data sources and detailed information about these predictors are described in the Supplementary Information.
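A correlation filter of this kind might look like the sketch below. Two points are assumptions rather than statements from the text: the threshold is applied to the absolute value of the coefficient, and predictors are screened greedily in input order. The `keep` argument stands in for the variables that were retained despite exceeding the threshold.

```python
import math
from typing import Dict, Iterable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (inputs must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_correlated(columns: Dict[str, Sequence[float]], threshold: float = 0.7,
                    keep: Iterable[str] = ()) -> List[str]:
    """Greedy filter: skip a predictor if |r| > threshold with one already kept."""
    forced = set(keep)
    selected: List[str] = []
    for name, values in columns.items():
        redundant = any(abs(pearson(values, columns[s])) > threshold
                        for s in selected)
        if name in forced or not redundant:
            selected.append(name)
    return selected
```

Because the filter is order-dependent, which member of a correlated pair survives depends on how the predictors are listed.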

The final RF model was trained with 50 trees using the selected 25 predictors as inputs to classify each 1 km² grid as being PGQ or not. Each decision tree in the forest casts a ‘vote’ for a class label based on the input data, with the final class label determined by the majority vote across all trees. The model also calculates class-label probabilities based on the proportion of trees voting for each class. The performance of the RF model is systematically validated with the following metrics.
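The voting rule can be written down directly. In this sketch each tree contributes a hard vote of 0 (non-PGQ) or 1 (PGQ); breaking an exact tie in favor of non-PGQ is an assumption, since the tie rule is not specified in the text.

```python
from typing import Sequence, Tuple


def forest_vote(tree_votes: Sequence[int]) -> Tuple[int, float]:
    """Return (class label, PGQ probability) from per-tree votes of 0 or 1."""
    prob_pgq = sum(tree_votes) / len(tree_votes)  # share of trees voting PGQ
    label = 1 if prob_pgq > 0.5 else 0            # majority vote
    return label, prob_pgq
```

With 50 trees, 32 votes for PGQ give a probability of 0.64 and a PGQ label.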

Prevalence (Prev) quantifies the proportion of actual positives in the dataset and is calculated as follows⁵⁹:

$${Prev}=\frac{{TP}+{FN}}{{TP}+{FN}+{TN}+{FP}}$$

(1)

where TP (true positives) are samples correctly identified as PGQ; FN (false negatives) are samples incorrectly identified as non-PGQ; TN (true negatives) are samples correctly identified as non-PGQ; FP (false positives) are samples incorrectly identified as PGQ.

General statistical indices, such as accuracy (Acc), sensitivity (Sen), precision (Prec) and specificity (Spec), were employed to quantify the bias of the RF model, with higher values indicating better model effectiveness⁶⁰. The definitions of TP, TN, FP, and FN shed light on the seemingly subtle distinctions among these statistics, which play a crucial role in assessing the model’s performance and its ability to minimize false positives and negatives⁵⁹:

$${Acc}=\frac{{TP}+{TN}}{{TP}+{FN}+{TN}+{FP}}$$

(2)

$${Sen}=\frac{{TP}}{{TP}+{FN}}$$

(3)

$${Prec}=\frac{{TP}}{{TP}+{FP}}$$

(4)

$${Spec}=\frac{{TN}}{{TN}+{FP}}$$

(5)

The performance of the trained RF model was then assessed using the area under the receiver operating characteristic (ROC) curve (AUC). The ROC curve is primarily used to evaluate the performance of a classification model by showing the trade-off between the sensitivity and specificity across different thresholds⁶¹. AUC is a single measure that summarizes the overall performance of RF for classification. The higher the AUC, the better the model is at distinguishing between PGQ and non-PGQ. A perfect classifier would have an AUC of 1, while a completely random one would have an AUC of 0.5.
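AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen PGQ sample receives a higher score than a randomly chosen non-PGQ sample (ties counted as one half). The sketch below uses this pairwise definition; it is quadratic in the number of samples and is meant for illustration only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))
```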

Additionally, Cohen’s kappa coefficient (K), which measures the agreement between the observed and predicted classifications corrected for chance, was used to further assess the model’s accuracy⁵⁹. A kappa coefficient of 0.6–0.8 is generally considered good, and values above 0.8 are deemed excellent. K is calculated as:

$$K=\frac{{P}_{{{{\rm{o}}}}}-{P}_{{{{\rm{e}}}}}}{1-{P}_{{{{\rm{e}}}}}}$$

(6)

where P_o is the observed agreement (Acc), and P_e is the expected agreement by chance:

$${P}_{e}=\frac{\left({TP}+{FP}\right)\times \left({TP}+{FN}\right)+({FN}+{TN})\times ({FP}+{TN})}{{({TP}+{FN}+{TN}+{FP})}^{2}}$$

(7)
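Equations (1) to (7) all derive from the four confusion-matrix counts, so they can be evaluated together. The sketch below is a direct transcription; the counts in the example that follows are made up for illustration and are not results from this study.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Prevalence, accuracy, sensitivity, precision, specificity and kappa."""
    n = tp + fn + tn + fp
    acc = (tp + tn) / n                                           # Eq. (2), P_o
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2  # Eq. (7)
    return {
        "Prev": (tp + fn) / n,              # Eq. (1)
        "Acc": acc,
        "Sen": tp / (tp + fn),              # Eq. (3)
        "Prec": tp / (tp + fp),             # Eq. (4)
        "Spec": tn / (tn + fp),             # Eq. (5)
        "Kappa": (acc - p_e) / (1 - p_e),   # Eq. (6)
    }
```

For TP = 40, FN = 10, TN = 45 and FP = 5, the observed agreement is 0.85, the chance agreement is 0.5, and kappa is therefore 0.7.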

Apart from evaluating the predictive abilities of the RF model, the spatial patterns of the estimates were validated by comparing them with nationally recognized maps, ensuring their consistency and reliability.

To assess predictor importance, we utilized the error of out-of-bag (OOB) samples among trees, a common metric employed in RF models. This approach involves randomly permuting the values of a specific predictor variable and calculating the resulting change in OOB error. OOB error refers to the prediction error computed on samples that were not included in the bootstrap sample used to train each decision tree in the RF model. If permuting a predictor leads to a considerable increase in OOB error, it suggests that the variable plays a critical role in prediction accuracy. This approach offers a reliable means of identifying the important features for model prediction.
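The permutation idea can be illustrated with a simplified sketch. Note the substitution: instead of the per-tree OOB samples used by an RF implementation, this version takes any prediction function and a single evaluation set, so it shows the principle rather than the exact OOB procedure. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def error_rate(predict: Callable[[List[float]], int],
               X: Sequence[List[float]], y: Sequence[int]) -> float:
    """Share of samples whose predicted class differs from the true class."""
    return sum(predict(row) != target for row, target in zip(X, y)) / len(y)


def permutation_importance(predict: Callable[[List[float]], int],
                           X: Sequence[List[float]], y: Sequence[int],
                           column: int, n_repeats: int = 20,
                           seed: int = 0) -> float:
    """Mean increase in error after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = error_rate(predict, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)  # break the link between this predictor and y
        X_perm = [row[:column] + [value] + row[column + 1:]
                  for row, value in zip(X, shuffled)]
        increases.append(error_rate(predict, X_perm, y) - baseline)
    return sum(increases) / n_repeats
```

A predictor the model never uses gets an importance of exactly zero, because shuffling it cannot change any prediction.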

To further assess the sensitivity of each influencing factor on the model’s predictions⁶², we conducted a sensitivity analysis by systematically increasing or decreasing each continuous predictor by 10%. This approach allowed us to observe how these perturbations influenced the predicted average probability of PGQ over the historical period from 1980 to 2020. By comparing these modified predictions to the baseline scenario, where no changes to the predictors were applied, we were able to quantify the degree to which each factor contributed to variability in PGQ probability, providing insights into which predictors most notably affect groundwater quality predictions under different conditions.

Following the identification of key predictors, partial dependence plots (PDPs) were generated to visualize each predictor’s marginal effect on the probability of PGQ. The PDPs were generated by fixing the focal predictor at each value across its empirical range in turn while integrating out the effects of all other predictors, thereby isolating the unique contribution of the predictor to the response⁶³. This was achieved by averaging the model predictions over the distribution of the data, providing a clear visual representation of the dependency between the predictors and the probability of PGQ.
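The averaging behind a PDP can be sketched as follows, for any function that returns a PGQ probability for one row of predictors. The function names and the toy linear model used to check the sketch are assumptions for illustration.

```python
from typing import Callable, List, Sequence


def partial_dependence(predict_proba: Callable[[List[float]], float],
                       X: Sequence[Sequence[float]], column: int,
                       grid: Sequence[float]) -> List[float]:
    """Average predicted probability with one predictor forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[column] = value  # overwrite the focal predictor only
            total += predict_proba(modified)
        curve.append(total / len(X))  # average over the other predictors
    return curve
```

For a toy model `0.1 + 0.5 * x0 + 0.2 * x1` evaluated on data where `x1` averages 0.5, the curve for `x0` runs from 0.2 at `x0 = 0` to 0.7 at `x0 = 1`, which recovers the slope of 0.5.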

In addition, feature interaction charts (FICs) were constructed to explore the combined effect of two predictors on the predicted probability of PGQ⁶³. An FIC takes into account the range of values of both predictors: instead of fixing a single variable, it varies both variables simultaneously over their empirical ranges. The model’s predictions are then averaged over the joint distribution of the two predictors, and the combined effect is visualized in a two-dimensional color-coded map. This approach highlights how the interaction between two predictors influences the model output, thus providing a deeper understanding of the relationships within the data.

PGQ prediction

The RF model was used to reconstruct and predict the spatiotemporal distribution of groundwater quality from 1980 to 2100, enabling us to infer long-term trends in populations potentially exposed to PGQ. We applied a probability cutoff (Prob_cut) of 0.5 to identify the populations at risk in areas exceeding this threshold. The affected population in each grid cell (1 km²) was then calculated by multiplying the cell’s population (Pop) by its probability of PGQ (Prob_poor). The calculation of the potentially affected population (Pop_affect) is summarized in the following equation:

$${{Pop}}_{{{{\rm{affect}}}}}=\left\{\begin{array}{c}{Pop}\times {{Prob}}_{{{{\rm{poor}}}}},\,{{Prob}}_{{{{\rm{poor}}}}} \, > \, {{Prob}}_{{{{\rm{cut}}}}}\\ \,0,\,{{Prob}}_{{{{\rm{poor}}}}}\le {{Prob}}_{{{{\rm{cut}}}}}\end{array}\right.\,$$

(8)
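Equation (8) translates into a one-line rule per grid cell. The sketch below applies it to a list of (population, probability) pairs; the cell values in the example that follows are invented for illustration.

```python
from typing import Iterable, Tuple


def affected_population(pop: float, prob_poor: float,
                        prob_cut: float = 0.5) -> float:
    """Eq. (8): Pop x Prob_poor above the cutoff, otherwise zero."""
    return pop * prob_poor if prob_poor > prob_cut else 0.0


def total_affected(cells: Iterable[Tuple[float, float]],
                   prob_cut: float = 0.5) -> float:
    """Sum the affected population over (population, probability) grid cells."""
    return sum(affected_population(pop, prob, prob_cut) for pop, prob in cells)
```

Three cells with (1000, 0.8), (500, 0.5) and (200, 0.3) give 800 affected people: only the first cell exceeds the cutoff, since a probability exactly equal to 0.5 does not count.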

The dynamics of PGQ probability among different periods were controlled by the four categories of time-variant drivers (i.e., climate change, pollution discharge, groundwater exploitation, and land-use change). To isolate the impacts of a specific category of time-variant driver on PGQ, we designed factorial experiments between 1980 and 2020. In each experiment, we kept one specific driver category (e.g., climate) constant at its initial state in 1980, while allowing the other driver categories to vary with time as observed (i.e., following their real-world time series). For instance, Prob_Climate denotes the simulation in which all drivers in the climate category remained constant at their 1980 values while the drivers in the other categories kept their real (time-variant) values.

By comparing the simulations between experiments, we could isolate the impact of a target driver category and quantify its contribution. To implement the comparison, we defined the difference between the simulation driven by all variables in period y (Prob_All,y) and that driven by only part of the time-variant driver categories, with the target category held constant (Prob_Vi,y), as the actual contribution of the target variable i (Vi) in period y (Con_Vi,y):

$${{Con}}_{{Vi},y}={{Prob}}_{{All},y}-{{Prob}}_{{Vi},y}$$

(9)

The relative contribution (%) of variable i (i = 1, 2, 3,…, n) and period y (y = 1, 2, 3,…, k) (RelCon_Vi,y) was defined as:

$${{RelCon}}_{{Vi},y}=\frac{\left|{{Con}}_{{Vi},y}\right|}{{\sum }_{i}^{n}\left|{{Con}}_{{Vi},y}\right|}\times 100\%$$

(10)
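Equations (9) and (10) can be sketched for a single grid cell as below. The probabilities in the example that follows are illustrative numbers, not model output, and the driver names are shorthand labels chosen here.

```python
from typing import Dict, Tuple


def contributions(prob_all: float, prob_fixed: Dict[str, float]
                  ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (Con, RelCon in %) for each driver category in one period."""
    # Eq. (9): full simulation minus the run with that category frozen.
    con = {name: prob_all - prob for name, prob in prob_fixed.items()}
    total = sum(abs(c) for c in con.values())
    # Eq. (10): share of each absolute contribution, in percent.
    rel = {name: (abs(c) / total * 100.0 if total else 0.0)
           for name, c in con.items()}
    return con, rel
```

With a full-simulation probability of 0.60 and frozen-category runs of 0.58 (climate), 0.40 (pollution), 0.50 (exploitation) and 0.62 (land use), the contributions are 0.02, 0.20, 0.10 and -0.02, and pollution accounts for about 58.8% of the total absolute contribution.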

To quantify the contribution of each driver category to changes in PGQ area ratio on the national scale, we define the differences between the area ratio of PGQ driven by all variables in period y (AR_All,y) and that driven by partial time-variant driver categories (AR_Vi,y) as the primitive contribution of the target variable i (Vi) in period y (ConAR_Vi,y):

$${{ConAR}}_{{Vi},y}={{AR}}_{{All},y}-{{AR}}_{{Vi},y}$$

(11)

The difference between the area ratio of PGQ driven by all variables in period y (AR_All,y) and that driven by all variables in the baseline year (AR_All,BY) is defined as the actual growth:

$${{ActAR}}_{{All},y}={{AR}}_{{All},y}-{{AR}}_{{All},{BY}}$$

(12)

Then, the relative contributions of each driver category to changes in the PGQ area ratio are calculated as:

$${{RelAR}}_{{Vi},y}=\frac{{{ActAR}}_{{All},y}}{{\sum }_{i}^{n}{{ConAR}}_{{Vi},y}}\times {{ConAR}}_{{Vi},y}$$

(13)
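The rescaling in Eq. (13) ensures that the category contributions add up to the actual growth in Eq. (12), even when the primitive contributions of Eq. (11) do not. A minimal sketch, with area ratios invented for illustration, is:

```python
from typing import Dict


def relative_area_contributions(ar_all: float, ar_baseline: float,
                                ar_fixed: Dict[str, float]) -> Dict[str, float]:
    """Eqs. (11)-(13): rescale primitive contributions to match actual growth."""
    con_ar = {name: ar_all - ar for name, ar in ar_fixed.items()}  # Eq. (11)
    act_ar = ar_all - ar_baseline                                  # Eq. (12)
    total = sum(con_ar.values())
    if total == 0:
        raise ValueError("primitive contributions sum to zero")
    return {name: act_ar / total * c for name, c in con_ar.items()}  # Eq. (13)
```

For an area ratio of 40% against a 20% baseline, and frozen-category runs of 28%, 35%, 39% and 41%, the primitive contributions are 12, 5, 1 and -1 (sum 17), which are scaled by 20/17 so that they sum to the actual growth of 20 percentage points.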

The grid annual mean relative contributions of the four driver categories in groundwater were used to visualize the spatial patterns of the influential intensity of different driver categories on a national scale. We defined the single dominant driver category of each pixel as the variable with a maximum relative contribution. As pollution discharges are composed of agricultural, industrial, and domestic pollution, the dominant driver category of pollution discharge for each pixel is defined as the secondary driver category with the largest relative contributions among the three secondary driver categories.

To further elucidate the temporal dynamics of groundwater quality, the differences in the projected groundwater quality categories for the years 2020, 2000, and 1980 were calculated, generating maps of groundwater quality changes for the periods 1980–2000, 2000–2020, and 1980–2020. Regions of groundwater quality deterioration (non-PGQ to PGQ) during these periods were extracted as masks. This approach yielded detailed maps of the driving factors behind deteriorated groundwater quality regions across the three distinct periods.

In future scenarios, the analysis of dominant driver focuses exclusively on four primary categories: climate change, groundwater exploitation, land use and pollution discharges, without individually detailing secondary pollution categories. We kept one specific driver category (e.g., climate) constant at its states in 2020, while allowing the other driver categories to vary with time.

We evaluated the temporal evolution of groundwater quality from 2021 to 2100 in four different scenarios, which allowed us to assess the long-term changes in groundwater quality by considering the continuity of the underlying factors over the specified time frame. To evaluate the impacts of climate change on groundwater quality, we utilized the SSP scenarios, namely SSP1-1.9, SSP2-4.5, and SSP5-8.5, as outlined in the IPCC AR6⁶⁴. These scenarios provide a framework for understanding and projecting different socio-economic and climate conditions, allowing us to examine the potential responses of groundwater quality to various development and climate change trajectories.

In Scenario I (Baseline), we assumed a consistent development intensity for all drivers. Climate, groundwater exploitation, and pollution discharges were maintained at the same levels as the cumulative average values in 2020. Similarly, land-use type in Scenario I was also assumed to remain unchanged from 2020. In contrast, Scenarios II-IV incorporated projections to account for future variability. Specifically, climate data from one CMIP6 model, EC-Earth3, were used as model input to project future PGQ probabilities⁶⁵,⁶⁶,⁶⁷. Additionally, land-use data in these scenarios were derived from the SSP-RCP global 1 km land-use simulation dataset (2020–2100) for SSP1-1.9, SSP2-4.5, and SSP5-8.5 scenarios, updated every five years⁶⁸. To maintain consistency in representation, these land-use data were aligned with those used for the historical period from 1980 to 2020.

In these three scenarios, annual change rates of −3%, 1%, and 3%, respectively, were applied to both groundwater exploitation and pollution discharge. These rates were determined based on historical trends⁶⁹,⁷⁰ and future projections⁷¹,⁷² related to industrial activities, agricultural expansion, and environmental policies. The low rate of −3% reflects a proactive approach toward reducing resource use and pollution discharge, aligning with the sustainability-focused SSP1-1.9 scenario⁷³. In contrast, the medium rate of 1% corresponds to a moderate development pathway, as seen in SSP2-4.5. This scenario embodies a balanced approach, allowing socio-economic development to progress while placing emphasis on environmental protection⁷⁴. The high rate of 3% aligns with the SSP5-8.5 scenario, characterized by rapid economic growth and minimal environmental regulation⁷⁵. This scenario anticipates high energy demands, with substantial reliance on fossil fuels and industrial expansion. We acknowledge that future development may exhibit more intricate patterns, and our current study may not fully capture this complexity and uncertainty. These rates were then applied to the corresponding values from 2021 to 2100, and the cumulative average was calculated as the model input for each year. Please refer to Table S5 for detailed information on the specific scenario assumptions.
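One possible reading of this procedure is sketched below. Two points are assumptions and not confirmed by the text: the annual rate compounds on the previous year’s value, and the cumulative average runs from the first year of the series. The function names are hypothetical.

```python
from typing import Dict


def project_series(history: Dict[int, float], rate: float,
                   end_year: int = 2100) -> Dict[int, float]:
    """Extend an annual series by compounding `rate` (e.g., -0.03, 0.01, 0.03)."""
    series = dict(history)
    for year in range(max(series) + 1, end_year + 1):
        series[year] = series[year - 1] * (1.0 + rate)
    return series


def cumulative_average_inputs(series: Dict[int, float],
                              start_year: int = 1980) -> Dict[int, float]:
    """Running mean from `start_year` to each year, used as the model input."""
    inputs: Dict[int, float] = {}
    total, count = 0.0, 0
    for year in sorted(series):
        if year < start_year:
            continue
        total += series[year]
        count += 1
        inputs[year] = total / count
    return inputs
```

Because the inputs are cumulative averages, a change in the annual rate feeds into the model only gradually: after two years at 3% growth from a flat value of 100, the annual value is 106.09, while the four-year running mean is only about 102.27.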

To evaluate the impact of uncertainties in climate change, groundwater exploitation, and pollution discharge on model predictions, we applied Monte Carlo (MC) simulations to quantify the variability in groundwater quality predictions due to input uncertainties⁷⁶. This analysis was conducted for the period 1980–2100, with data processed at 5-year intervals for 1980–2015, annually from 2016 to 2020, and at 5-year intervals for future projections from 2025 to 2100 under four scenarios.

In each scenario-year combination, MC simulations with 50 members were run to introduce variability in key natural and socio-economic predictors. Perturbations were applied to these dominant factors, such as climate, groundwater exploitation, and pollution discharge, reflecting their real-world uncertainties. For instance, temperature varied within a ±2 °C range, while predictors like precipitation, fertilizer, and pesticide use were perturbed by up to ±20%.

The results of these simulations can be used to evaluate the range of possible groundwater quality outcomes based on uncertainties in input data. For each year and scenario, we calculated the mean and standard deviation of the predicted probabilities based on the ensemble outcomes, providing a robust estimate of both expectation and variability in groundwater quality predictions.
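The ensemble procedure can be sketched as follows for a single grid cell. Several choices are assumptions made for illustration: perturbations are drawn from uniform distributions, the predictor named `temperature` receives the additive ±2 °C shift while every other predictor receives the multiplicative ±20% shift, and the sample standard deviation is reported. The `predict_proba` argument stands in for the trained RF model.

```python
import random
import statistics
from typing import Callable, Dict, Tuple


def perturb(inputs: Dict[str, float], rng: random.Random,
            temp_range: float = 2.0, rel_range: float = 0.20) -> Dict[str, float]:
    """One perturbed copy of the inputs (uniform noise is an assumption)."""
    out = {}
    for name, value in inputs.items():
        if name == "temperature":
            out[name] = value + rng.uniform(-temp_range, temp_range)   # +/- 2 degC
        else:
            out[name] = value * (1.0 + rng.uniform(-rel_range, rel_range))  # +/- 20%
    return out


def monte_carlo(predict_proba: Callable[[Dict[str, float]], float],
                inputs: Dict[str, float], n_members: int = 50,
                seed: int = 0) -> Tuple[float, float]:
    """Ensemble mean and standard deviation of the predicted PGQ probability."""
    rng = random.Random(seed)
    probs = [predict_proba(perturb(inputs, rng)) for _ in range(n_members)]
    return statistics.mean(probs), statistics.stdev(probs)
```

Fixing the seed makes an ensemble reproducible, which helps when comparing scenarios that should differ only in their inputs.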

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The code and source data required to generate the figures in this study have been deposited in Figshare [https://doi.org/10.6084/m9.figshare.27612528] (ref. ⁷⁷). Data for the predictors can be found in the Supplementary Information. Regional groundwater quality data are protected and are not available due to data privacy laws.

Code availability

The code for this study is available via Figshare at https://doi.org/10.6084/m9.figshare.27612528 (ref. ⁷⁷).

References

Delaire, C. et al. Assessing Groundwater Quality: A Global Perspective: Importance, Methods and Potential Data Sources. 59 (Friends of Groundwater in the World Water Quality Alliance, 2021).
Lall, U., Josset, L. & Russo, T. A snapshot of the world’s groundwater challenges. Annu. Rev. Environ. Resour. 45, 171–194 (2020).
Kuang, X. et al. The changing nature of groundwater in the global water cycle. Science 383, eadf0630 (2024).
Bei, E., Wu, X., Qiu, Y., Chen, C. & Zhang, X. A tale of two water supplies in China: finding practical solutions to urban and rural water supply problems. Acc. Chem. Res. 52, 867–875 (2019).
Ministry of Environmental Protection. National Plan for Groundwater Pollution Prevention and Control (2011–2020). (Ministry of Environmental Protection, 2011).
Zhang, Q., Miao, L., Wang, H., Hou, J. & Li, Y. How rapid urbanization drives deteriorating groundwater quality in a provincial capital of China. Pol. J. Environ. Stud. 29, 441–450 (2019).
Li, M. et al. The decline in the groundwater table depth over the past four decades in China simulated by the Noah-MP land model. J. Hydrol. 607, 127551 (2022).
Ma, T. et al. Pollution exacerbates China’s water scarcity and its regional inequality. Nat. Commun. 11, 650 (2020).
Ministry of Ecology and Environment of the People’s Republic of China. China Ecological and Environmental Status Bulletin. (Ministry of Ecology and Environment of the People’s Republic of China, 2020).
China Geological Survey, Ministry of Natural Resources. National Groundwater Monitoring Report of China. (China Geological Survey, Ministry of Natural Resources, 2022).
Amirabdollahian, M. & Datta, B. Identification of contaminant source characteristics and monitoring network design in groundwater aquifers: an overview. J. Environ. Prot. 4, 26–41 (2013).
Mahlknecht, J. et al. Nitrate prediction in groundwater of data scarce regions: The futuristic fresh-water management outlook. Sci. Total Environ. 905, 166863 (2023).
Damania, R., Desbureaux, S., Rodella, A.-S., Russ, J. & Zaveri, E. Quality Unknown: The Invisible Water Crisis. (World Bank Publications, 2019).
Haggerty, R., Sun, J., Yu, H. & Li, Y. Application of machine learning in groundwater quality modeling—a comprehensive review. Water Res. 233, 119745 (2023).
Zhi, W., Appling, A. P., Golden, H. E., Podgorski, J. & Li, L. Deep learning for water quality. Nat. Water 2, 228–241 (2024).
Amini, M. et al. Statistical modeling of global geogenic arsenic contamination in groundwater. Environ. Sci. Technol. 42, 3669–3675 (2008).
Rodríguez-Lado, L. et al. Groundwater arsenic contamination throughout China. Science 341, 866–868 (2013).
Podgorski, J. & Berg, M. Global analysis and prediction of fluoride in groundwater. Nat. Commun. 13, 4232 (2022).
Podgorski, J. & Berg, M. Global threat of arsenic in groundwater. Science 368, 845–850 (2020).
Cao, H., Xie, X., Wang, Y. & Deng, Y. The interactive natural drivers of global geogenic arsenic contamination of groundwater. J. Hydrol. 597, 126214 (2021).
Cao, H., Xie, X., Wang, Y. & Liu, H. Predicting geogenic groundwater fluoride contamination throughout China. J. Environ. Sci. 115, 140–148 (2022).
Standardization Administration of China. Standard for Groundwater Quality (GB/T 14848-2017) (General Administration of Quality Supervision, Inspection and Quarantine of the People’s Republic of China; Standardization Administration of China, 2017).
Liu, F. et al. Mapping high-resolution national soil information grids of China. Sci. Bull. 67, 328–340 (2022).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Landis, J. R. & Koch, G. G. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 33, 363–374 (1977).
CIGEM. Annual Report of China’s Groundwater Monitoring Project (China Geological Survey, Ministry of Natural Resources, 2021).
Masson-Delmotte, V. et al. Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (Cambridge University Press, 2021).
Berg, R. C., Kempton, J. P. & Cartwright, K. Potential for Contamination of Shallow Aquifers in Illinois. Vol. 532 (Illinois State Geological Survey, 1984).
Kløve, B. et al. Climate change impacts on groundwater and dependent ecosystems. J. Hydrol. 518, 250–266 (2014).
Wang, Y. et al. Genesis of geogenic contaminated groundwater: As, F and I. Crit. Rev. Environ. Sci. Technol. 51, 2895–2933 (2021).
Jia, Y. et al. Distribution, formation and human-induced evolution of geogenic contaminated groundwater in China: a review. Sci. Total Environ. 643, 967–993 (2018).
Nolan, B. T. et al. Factors influencing ground-water recharge in the eastern United States. J. Hydrol. 332, 187–205 (2007).
Han, D. & Currell, M. J. Review of drivers and threats to coastal groundwater quality in China. Sci. Total Environ. 806, 150913 (2022).
Machiwal, D., Jha, M. K., Singh, V. P. & Mohan, C. Assessment and mapping of groundwater vulnerability to pollution: current status and challenges. Earth-Sci. Rev. 185, 901–927 (2018).
Thaw, M., GebreEgziabher, M., Villafañe-Pagán, J. Y. & Jasechko, S. Modern groundwater reaches deeper depths in heavily pumped aquifer systems. Nat. Commun. 13, 5263 (2022).
Bryan, B. A. et al. China’s response to a national land-system sustainability emergency. Nature 559, 193–204 (2018).
Burri, N. M., Weatherl, R., Moeck, C. & Schirmer, M. A review of threats to groundwater quality in the anthropocene. Sci. Total Environ. 684, 136–154 (2019).
Yao, Y. et al. Integration of groundwater into China’s south-north water transfer strategy. Sci. Total Environ. 658, 550–557 (2019).
Xiang, C. & van Gevelt, T. Central inspection teams and the enforcement of environmental regulations in China. Environ. Sci. Policy 112, 431–439 (2020).
MEP. Action Plan for Prevention and Control of Water Pollution printed and distributed (in Chinese), https://english.mee.gov.cn/News_service/news_release/201504/t20150427_299595.shtml (2015).
Liu, Y.-c., Fei, Y.-h., Li, Y.-s., Bao, X.-l. & Zhang, P.-w. Pollution source identification methods and remediation technologies of groundwater: a review. China Geol. 7, 125–137 (2024).
Herrera, V. Reconciling global aspirations and local realities: challenges facing the sustainable development goals for water and sanitation. World Dev. 118, 106–117 (2019).
Howells, M. et al. Integrated analysis of climate change, land-use, energy and water strategies. Nat. Clim. Change 3, 621–626 (2013).
Naddaf, M. The world faces a water crisis—4 powerful charts show how. Nature 615, 774–775 (2023).
Famiglietti, J. S. The global groundwater crisis. Nat. Clim. Change 4, 945–948 (2014).
Lapworth, D. J., Boving, T. B., Kreamer, D. K., Kebede, S. & Smedley, P. L. Groundwater quality: Global threats, opportunities and realising the potential of groundwater. Sci. Total Environ. 811, 152471 (2022).
Gan, L. et al. Distributions, origins, and health-risk assessment of nitrate in groundwater in typical alluvial-pluvial fans, North China Plain. Environ. Sci. Pollut. Res., 1–18 (2022).
Chen, L., Caro, F., Corbett, C. J. & Ding, X. Estimating the environmental and economic impacts of widespread adoption of potential technology solutions to reduce water use and pollution: Application to China’s textile industry. Environ. Impact Assess. Rev. 79, 106293 (2019).
Kato, S. & Huang, W. Land use management recommendations for reducing the risk of downstream flooding based on a land use change analysis and the concept of ecosystem-based disaster risk reduction. J. Environ. Manag. 287, 112341 (2021).
Sinisi, L. & Aertgeerts, R. Guidance on Water Supply and Sanitation in Extreme Weather Events. (World Health Organization. Regional Office for Europe, 2011).
Islam, A. & Quaff, A. R. Groundwater contamination and remediation: a review. Afr. J. Biomed. Res. 27, 3855–3867 (2024).
Smedley, P. & Kinniburgh, D. G. Essentials of Medical Geology: Revised Edition (ed O. Selinus) 279–310 (Springer Netherlands, 2013).
Wang, Y. et al. Groundwater quality and health: making the invisible visible. Environ. Sci. Technol. 57, 5125–5136 (2023).
Niazi, H. et al. Global peak water limit of future groundwater withdrawals. Nat. Sustain. 7, 413–422 (2024).
Jasechko, S. et al. Rapid groundwater decline and some cases of recovery in aquifers globally. Nature 625, 715–721 (2024).
Wu, J., Meng, F., Wang, X. & Wang, D. The development and control of the seawater intrusion in the eastern coast of Laizhou Bay, China. Environ. Geol. 54, 1763–1770 (2008).
Pham, Q. B., Tran, D. A., Ha, N. T., Islam, A. R. M. T. & Salam, R. Random forest and nature-inspired algorithms for mapping groundwater nitrate concentration in a coastal multi-layer aquifer system. J. Clean. Prod. 343, 130900 (2022).
Podgorski, J. E., Labhasetwar, P., Saha, D. & Berg, M. Prediction modeling and mapping of groundwater fluoride contamination throughout India. Environ. Sci. Technol. 52, 9889–9898 (2018).
Taylor, K. E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. Atmos. 106, 7183–7192 (2001).
Pal, S. C., Ruidas, D., Saha, A., Islam, A. R. M. T. & Chowdhuri, I. Application of novel data-mining technique based nitrate concentration susceptibility prediction approach for coastal aquifers in India. J. Clean. Prod. 346, 131205 (2022).
Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006).
Xu, C. & Gertner, G. Z. Uncertainty and sensitivity analysis for models with correlated parameters. Reliab. Eng. Syst. Saf. 93, 1563–1573 (2008).
Cutler, D. R. et al. Random forests for classification in ecology. Ecology 88, 2783–2792 (2007).
Arias, P. A. et al. IPCC Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change 33−144 (Cambridge, 2021).
Thrasher, B. et al. NASA global daily downscaled projections, CMIP6. Sci. Data 9, 262 (2022).
Peng, S. 1 km multi-scenario and multi-model monthly temperature data for China in 2021–2100. National Tibetan Plateau/Third Pole Environment Data Center (2022).
Peng, S. 1 km multi-scenario and multi-model monthly precipitation data for China in 2021–2100. National Tibetan Plateau/Third Pole Environment Data Center (2022).
Qi, Z. Global 1 km Land Use Simulation Dataset under SSP-RCP Scenarios (2020–2100). National Earth System Science Data Center, National Science & Technology Infrastructure of China. (https://doi.org/10.12041/geodata.29986078912207.ver1.db).
NBSC. China City Statistical Yearbook. (National Bureau of Statistics of China, China Statistics Press, 1985–2021).
NBSC. China Statistical Yearbook on Environment. (National Bureau of Statistics of China, China Statistics Press, 1981–2021).
Riahi, K. et al. The Shared Socioeconomic Pathways and their energy, land use, and greenhouse gas emissions implications: an overview. Glob. Environ. Change 42, 153–168 (2017).
O’Neill, B. C. et al. The roads ahead: Narratives for shared socioeconomic pathways describing world futures in the 21st century. Glob. Environ. Change 42, 169–180 (2017).
Van Vuuren, D. P. et al. Energy, land-use and greenhouse gas emissions trajectories under a green growth paradigm. Glob. Environ. Change 42, 237–250 (2017).
Fricko, O. et al. The marker quantification of the Shared Socioeconomic Pathway 2: a middle-of-the-road scenario for the 21st century. Glob. Environ. Change 42, 251–267 (2017).
Kriegler, E. et al. Fossil-fueled development (SSP5): An energy and resource intensive scenario for the 21st century. Glob. Environ. Change 42, 297–315 (2017).
Mooney, C. Z. Monte Carlo Simulation (Sage, 1997).
Zhou, Q. et al. Groundwater Quality Evolution Across China. Figshare https://doi.org/10.6084/m9.figshare.27612528 (2024).


Acknowledgements

We thank the many data providers whose data were an essential component of this work. We acknowledge data support from the National Tibetan Plateau Data Center (TPDC), the National Earth System Science Data Center (https://www.geodata.cn), and the Geospatial Data Cloud. This work was supported by the National Key Research and Development Program of China (Nos. 2020YFC1807002, Y.J.Y., 2021YFC1809103, Y.J.Y., and 2022YFD1700104, Y.M.L.), the National Natural Science Foundation of China (Nos. 42077140, Y.J.Y. and 41991335, Y.M.L.), the Natural Science Foundation of Jiangsu Province (BK20231461, J.J.Z.) and Conservation of Biodiversity in China in the light of Climate Change (CHN-2152, 18/0015, Q.C.).

Author information

These authors contributed equally: Qing Zhou, Jiangjiang Zhang, Shuyou Zhang.

Authors and Affiliations

Institute of Soil Science, Chinese Academy of Sciences, Nanjing, China
Qing Zhou, Shuyou Zhang, Qiang Chen, Yanni Zhang, Yadi Yang & Yijun Yao
University of Chinese Academy of Sciences, Beijing, China
Qing Zhou, Shuyou Zhang, Qiang Chen, Yanni Zhang & Yijun Yao
Yangtze Institute for Conservation and Development, Hohai University, Nanjing, China
Jiangjiang Zhang & Chenglong Cao
The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, China
Jiangjiang Zhang & Chenglong Cao
College of Environment, Hohai University, Nanjing, China
Shuyou Zhang
Nanjing Institute of Environmental Sciences of the Ministry of Ecology and Environment, Nanjing, China
Qiang Chen
Xuchang Meteorological Service, Xuchang, China
Huifeng Fan
School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA, USA
Jian Luo

Authors

Qing Zhou
Jiangjiang Zhang
Shuyou Zhang
Qiang Chen
Huifeng Fan
Chenglong Cao
Yanni Zhang
Yadi Yang
Jian Luo
Yijun Yao

Contributions

Y.J.Y. and J.L. designed the study. Q.Z., H.F.F., C.L.C., Y.N.Z., and Y.D.Y. collected the data. Q.Z., H.F.F., S.Y.Z., and Q.C. analyzed the data. Y.J.Y., Q.Z., and J.J.Z. performed the modeling. Y.J.Y., Q.Z., and J.J.Z. wrote the first complete draft of the manuscript. Y.J.Y., J.L., Q.Z., and J.J.Z. revised the paper with inputs from all co-authors. All authors contributed to the interpretation of results, writing, and revision of the paper.

Corresponding authors

Correspondence to Jian Luo or Yijun Yao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Rahim Barzegar, Hui Qian, and Chunmiao Zheng for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Zhou, Q., Zhang, J., Zhang, S. et al. Groundwater quality evolution across China. Nat Commun 16, 2522 (2025). https://doi.org/10.1038/s41467-025-57853-z


Received: 17 July 2024
Accepted: 03 March 2025
Published: 14 March 2025
DOI: https://doi.org/10.1038/s41467-025-57853-z
