Introduction

The fluctuations in inland water bodies (quantity) and their complex interactions with physical, chemical, and biological characteristics (quality) are fundamental to sustaining life on Earth. Water quality, in particular, is sensitive to both natural and anthropogenic disturbances1. On the one hand, climate change strongly affects river water quality through both gradual and episodic changes2. On the other hand, biogeochemical processes in rivers are substantially disrupted by anthropogenic activities such as population growth, land development, and agricultural production3. More than 50% of global river systems are estimated to be no longer classified as natural4. Against this backdrop, understanding and systematically quantifying the effects of natural and anthropogenic drivers on water quality dynamics is critically important. This understanding is pivotal not only for the supply and safety of water for human consumption but also for food production, energy supply, and the overall health and stability of aquatic ecosystems5.

Currently, most studies focus on long-term water quality trends at continental or regional scales6,7; fewer address seasonal trends, and those that do focus primarily on remote, natural watersheds. These studies emphasize the impact of large-scale climate change on river systems, confirming that seasonal climatic variability is closely linked to the migration, transformation, and biochemical reaction rates of water pollutants8. However, anthropogenic interventions influence seasonal trends more strongly than interannual trends. For example, land management measures such as road construction, farmland conservation, and reforestation are typically implemented on subannual scales and can influence the seasonal dynamics of river water quality indicators by regulating surface runoff generation, controlling soil erosion, and altering the transport of organic matter9. Agricultural activities such as fertilization and irrigation follow clear seasonal cycles, and precipitation amplifies the uneven effects of nutrient runoff10. Industrial water use and wastewater discharge also vary seasonally with market demand11. Consequently, human activities drive biogeochemical processes involving oxygen, nutrient, and sediment cycling, profoundly altering seasonal river water quality dynamics.

Water quality dynamics reflect both physical processes, such as river–atmosphere gas exchange, and biological processes, including photosynthetic oxygen production and respiratory oxygen consumption. However, systematic attribution analyses at the national scale are constrained by factors such as the scarcity of meteorological, terrestrial, and landscape gradient sites and inconsistent data collection frequencies12. Some researchers attribute river changes to climate forcing13, whereas others emphasize the role of anthropogenic pressures14,15. Li et al. proposed an integrative mechanistic perspective that extends beyond single-factor effects, emphasizing dynamic changes in geomorphology, structure, and material and energy flows within river–terrestrial systems16. These processes are shaped and regulated by both natural factors and human activities. It is therefore essential to look beyond traditional watershed boundaries and systematically assess the relative impacts of natural factors and human activities on river health.

China’s river basins, marked by diverse hydrology, variable climate, and uneven anthropogenic pressures, face concurrent challenges from agricultural intensification, industrial pollution, and urban expansion17. These characteristics make them ideal for studying water quality dynamics and coupled human–nature responses. We analyzed the interannual and seasonal variations in chemical oxygen demand (COD) and dissolved oxygen (DO) concentrations across China from 2006 to 2020 and examined how seasonal water quality variations differ between natural and managed watersheds. COD and DO are the most nationally representative water quality parameters for identifying pollution levels and assessing the health status of water bodies6. Building on this, we compared seasonal water quality trends across 1915 pairs of natural watersheds and nearby managed counterparts with similar climates to disentangle climatic and anthropogenic influences. To quantify the direction and strength of human intervention, we introduce the T-NM index, which captures the asymmetric amplification and attenuation effects of human intervention across seasons. We compiled a comprehensive 15-year dataset for the ten largest river basins in China, encompassing 30 attributes in six major categories: seasonal elements, meteorology, watershed attributes, socioeconomics, land use, and landscape. Using a machine learning framework, we then separated the natural and anthropogenic influences on seasonal water quality variations across natural and managed watersheds nationwide.

Our study addresses three key questions: (1) How have COD and DO evolved over the past 15 years in different watersheds in China? (2) What are the similarities and differences in the seasonal water quality trends between natural and managed watersheds in China? (3) What are the driving factors of the seasonal water quality variations in both types of watersheds?

Results

Decadal trends in COD and DO concentrations

COD and DO concentrations varied widely nationwide, with annual average concentrations ranging from 0.72 to 15.5 mg L−1 for COD and from 2.1 to 11.1 mg L−1 for DO (Fig. S1). The northern river basins, such as the Songhua River (ShR), Huai River (HuR), and Hai River (HaR) Basins, exhibited relatively high COD concentrations, averaging over 4 mg L−1, whereas the southern rivers had average concentrations below 3 mg L−1. DO concentrations exhibited a distinct latitudinal gradient, with lower concentrations in rivers at lower latitudes (Fig. S1). Trend analysis revealed that 61.1% of watersheds exhibited a decreasing trend in COD concentrations between 2006 and 2020, of which 35.2% were statistically significant (p < 0.05); meanwhile, 64.7% of watersheds showed an increasing trend in DO concentrations, of which 26.4% were statistically significant (Fig. 1a, b). Among the watersheds with significant trends, the median changes in COD and DO concentrations were −1.8% and 3.7%, respectively. Nationally, the average change rates of COD and DO were −1.57 and 0.93 mg L−1 dec−1, respectively, indicating a significant overall improvement in river water quality across China since 2006 (Table S1). These findings are consistent with those of Ma et al., who reported average change rates of −3.8 and 0.71 mg L−1 dec−1 for COD and DO, respectively, in Chinese rivers between 2003 and 20176. Specifically, the ShR and Liao River (LiR) Basins in North China showed faster water quality improvements, with COD reduction rates exceeding 1.43 mg L−1 dec−1 and DO increase rates exceeding 1.34 mg L−1 dec−1, whereas the Pearl River (PeR) Basin in the southernmost part of China showed slower improvement and even an increasing trend in COD concentration (Table S1).
The ShR and LiR Basins have undergone sustained industrial decline, accompanied by stricter pollution control enforcement and expanded wastewater treatment infrastructure, resulting in substantial reductions in organic pollutant loads. In contrast, the PeR Basin, characterized by rapid urbanization and dense population, continues to face mounting challenges from non-point source pollution. During the rainy season, intensified agricultural runoff and diffuse pollutant inputs may partially undermine the effectiveness of existing management measures18.
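The text reports trend directions, significance at p < 0.05, and rates in mg L−1 dec−1, but this section does not name the trend method. As a minimal sketch, assuming a Mann–Kendall test paired with a Theil–Sen slope (a common choice for water quality series, not confirmed by the source), an annual series could be tested and its slope rescaled to a per-decade rate as follows. The function names and the example COD series are hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(values):
    """Two-sided Mann-Kendall test using the normal approximation.

    Ties are not corrected for (assumption: continuous data).
    Returns the S statistic, the z score, and the two-sided p-value.
    """
    n = len(values)
    # S = number of increasing pairs minus number of decreasing pairs in time order.
    s = 0
    for i, j in combinations(range(n), 2):
        diff = values[j] - values[i]
        s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction moves S one unit toward zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


def sen_slope(years, values):
    """Theil-Sen slope: median of all pairwise slopes (units per year)."""
    slopes = [
        (values[j] - values[i]) / (years[j] - years[i])
        for i, j in combinations(range(len(values)), 2)
        if years[j] != years[i]
    ]
    return median(slopes)


# Hypothetical annual mean COD (mg/L) for one watershed, 2006-2020.
years = list(range(2006, 2021))
cod = [5.0 - 0.15 * k for k in range(len(years))]

s, z, p = mann_kendall(cod)
rate_per_decade = sen_slope(years, cod) * 10  # mg L-1 dec-1
print(f"S={s}, z={z:.2f}, p={p:.2g}, slope={rate_per_decade:.2f} mg/L per decade")
```

Under this sketch, a watershed would count toward the "significant decrease" share when its slope is negative and p < 0.05.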

Fig. 1: Trajectories and spatial distribution of COD and DO concentration changes across China’s ten major river basins.

a, b Histograms showing the percentage of sub-watersheds with significant changes (p < 0.05) in COD (a) and DO (b) concentrations from 2006 to 2020. Dark gray indicates increasing concentration trends, while light gray represents decreasing trends. The red dashed line marks no change in concentration. c Quadrant chart illustrating four distinct trajectories of COD and DO concentration changes in sub-watersheds. Colored circles denote significant trends (p < 0.05), while gray circles indicate non-significant trends (p > 0.05). d Bar chart showing the number of watersheds corresponding to each of the four concentration change trajectories. e Spatial distribution of the four concentration change trajectories across China’s ten major river basins. Colored circles represent significant trends (p < 0.05), and gray circles denote non-significant trends (p > 0.05). f Distribution of river networks and monitoring stations, with green lines indicating the first-order rivers analyzed and gray dots indicating monitoring stations.

We classified the trajectories of COD and DO concentration changes into four categories according to their combined trends (Fig. 1c, d). Quadrants Q1 and Q3 represent synchronous changes in COD and DO, with Q1 indicating simultaneous increases (288 watersheds, 62 significant) and Q3 indicating simultaneous decreases (319 watersheds, 26 significant). Quadrants Q2 and Q4 represent asynchronous changes, where Q2 denotes a reduction in COD coupled with an increase in DO (687 watersheds, 202 significant), and Q4 denotes an increase in COD accompanied by a decrease in DO (167 watersheds, 10 significant). Our analysis focused on watersheds exhibiting significant trends in both COD and DO. Spatially, all four trajectory types were distributed across watersheds nationwide (Fig. 1e and Table S2). Quadrant Q1, representing significant increases in both COD and DO, accounted for 19.8% ± 7.0% of the four trajectories across the ten major river basins, with the southernmost PeR Basin exhibiting a notably greater proportion (12 watersheds, 38.7%). Quadrant Q2, characterized by COD reduction and DO increase, was the dominant trajectory (69.1% ± 8.8%), peaking at 87.5% in the Yellow River (YeR) Basin. Significant simultaneous reductions in COD and DO (Q3) were observed mostly in watersheds in southern China, such as the Yangtze River (YaR), HuR, and Southwest River (SwR) Basins. Notably, within quadrants Q2 and Q3, another dominant trajectory emerged, characterized by a significant reduction in COD accompanied by a nonsignificant increase (Q2, 226 watersheds) or a nonsignificant decrease (Q3, 141 watersheds) in DO (Fig. 1d). These results demonstrate a pervasive downward trend in COD across watersheds nationwide, irrespective of DO variations, and highlight the divergent concentration trajectories of COD and DO.
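The quadrant scheme described above depends only on the signs of the two trends, optionally restricted to watersheds where both trends are significant. A minimal sketch follows; the helper names, the record layout `(cod_slope, cod_p, do_slope, do_p)`, and the example values are hypothetical, and zero slopes are simply left unclassified (an assumption).

```python
from collections import Counter


def trajectory_quadrant(cod_slope, do_slope):
    """Map the signs of the COD and DO trends to quadrants Q1-Q4.

    Q1: COD up, DO up; Q2: COD down, DO up;
    Q3: COD down, DO down; Q4: COD up, DO down.
    Returns None when either slope is exactly zero (no direction).
    """
    if cod_slope == 0 or do_slope == 0:
        return None
    if cod_slope > 0:
        return "Q1" if do_slope > 0 else "Q4"
    return "Q2" if do_slope > 0 else "Q3"


def count_trajectories(records, alpha=0.05, both_significant=True):
    """Count watersheds per quadrant.

    Each record is (cod_slope, cod_p, do_slope, do_p). With both_significant=True,
    only watersheds whose COD and DO trends are both significant are counted.
    """
    counts = Counter()
    for cod_slope, cod_p, do_slope, do_p in records:
        if both_significant and not (cod_p < alpha and do_p < alpha):
            continue
        quadrant = trajectory_quadrant(cod_slope, do_slope)
        if quadrant is not None:
            counts[quadrant] += 1
    return counts


# Hypothetical trend results for four watersheds.
records = [
    (-0.12, 0.01, 0.08, 0.02),   # Q2, both trends significant
    (-0.05, 0.03, -0.02, 0.40),  # Q3, DO trend not significant
    (0.10, 0.04, 0.05, 0.01),    # Q1, both trends significant
    (0.02, 0.30, -0.01, 0.50),   # Q4, neither trend significant
]
print(count_trajectories(records))
print(count_trajectories(records, both_significant=False))
```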

Seasonal water quality trends across China

Consistent with the decadal trends, seasonal patterns nationwide also reflect a general tendency toward decreasing COD and increasing DO concentrations. For COD, trends in spring, fall, and winter were consistent, with 17.9%, 22.2%, and 22.5% of the watersheds, respectively, showing significant reductions, whereas only 12.3% showed a significant decrease in summer. Owing to seasonal differences in hydrological and pollution dynamics, the specific watersheds with significant reductions varied by season. Additionally, 6% and 7.6% of the watersheds exhibited significant COD increases in spring and summer, respectively, primarily in the southern YaR, HuR, and PeR Basins and the northern HaR, LiR, and ShR Basins (Fig. 2a, c). For DO, the predominant pattern was an increase, with 13.3%, 19.7%, and 25.5% of the watersheds showing significant increases in spring, fall, and winter, respectively, and fewer than 3% showing significant reductions in these seasons. In summer, however, 9.2% of the watersheds showed significant DO reductions, mainly in the northeastern plain, the lower Yangtze River Plain, the northwestern plateau, and the southeastern hills (Fig. 2b, e).

Fig. 2: Seasonal trends in COD and DO concentrations across China’s ten major river basins from 2006 to 2020.

a, b Maps illustrating the spatial distribution of seasonal trends in COD (a) and DO (b) concentrations across sub-watersheds during 2006–2020 (p < 0.05). Red and blue colors represent increasing and decreasing trends, respectively, with deeper shades indicating more significant changes. c, e Grouped bar charts showing the proportions of sub-watersheds with significant and non-significant seasonal trends in COD (c) and DO (e) for spring, summer, autumn, and winter. Dark colors indicate significant trends (p < 0.05), while lighter shades denote non-significant trends (p > 0.05); red and blue represent increasing and decreasing trends, respectively. d, f Average significant trends (p < 0.05) in COD (d) and DO (f) concentrations across northern and southern basins in different seasons. Bars represent mean trends, and colored hollow circles denote changes in individual sub-watershed concentrations. The curves to the right of the bars display the normal distribution of the data points. The horizontal bars represent 1.5 times the standard deviation (SD).

Summer exhibited distinct responses compared with the other seasons, with more watersheds showing significant COD increases and DO decreases and fewer showing significant COD reductions and DO increases. Specifically, summer had 28–106% more watersheds with significant COD increases and 217–473% more with significant DO decreases than the other seasons. Conversely, it had 31–46% fewer watersheds with significant COD decreases and 10–54% fewer with significant DO increases (Fig. 2a, b). In rivers of both northern and southern regions, COD concentrations were generally lower, and DO concentrations higher, in spring, autumn, and winter than in summer (Fig. 2d, f). Further analysis indicates that the combined effects of high temperatures and intense rainfall during summer substantially increase the flux of pollutants into river systems via surface runoff. This seasonal pattern coincides with periods of intensified agricultural fertilization, irrigation, and industrial activity, leading to a substantial rise in non-point source pollution loads and making water quality more vulnerable to disturbance. Additionally, changes in hydrological regimes during the flood season may reduce the dilution capacity of certain river segments, further increasing the risk of water quality deterioration19. Although more than half of the watersheds did not exhibit statistically significant seasonal trends (p > 0.05), the grouped bar charts in Fig. 2 show that, regardless of statistical significance, water quality generally improved in spring, fall, and winter, whereas improving and deteriorating trends were almost balanced in summer. Although the seasonal trends were generally consistent with the annual trends, they revealed spatial variations that annual trend analysis could not capture.
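The "more" and "fewer" percentages above are relative differences between the summer share of watersheds and the share in each other season. The sketch below reproduces that arithmetic for significant COD decreases using the rounded shares reported earlier (12.3% in summer versus 17.9%, 22.2%, and 22.5%); it gives roughly 31–45% fewer, so the 46% upper bound in the text presumably reflects unrounded shares. The function name is hypothetical.

```python
def relative_difference(summer_share, other_share):
    """Percent by which the summer share exceeds (+) or falls short of (-) another season's share."""
    return (summer_share - other_share) / other_share * 100


# Shares (%) of watersheds with significant COD decreases, as reported in the text.
summer_cod_decrease = 12.3
other_seasons = {"spring": 17.9, "fall": 22.2, "winter": 22.5}

diffs = {
    season: round(relative_difference(summer_cod_decrease, share), 1)
    for season, share in other_seasons.items()
}
print(diffs)  # {'spring': -31.3, 'fall': -44.6, 'winter': -45.3}
```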
Understanding these seasonal dynamics is crucial for the sustainable management of water resources and aquatic environments.

This study advances previous work by extending the temporal resolution of trend analysis from the interannual to the seasonal scale, enabling a nationwide assessment of the seasonal dynamics of COD and DO. This seasonal perspective complements existing interannual studies by offering a more nuanced understanding of how river water quality responds to temporal variability. Several of our findings are consistent with national-scale annual trend analyses and watershed-level seasonal assessments, lending robustness to the observed patterns. For example, a comprehensive trend analysis of seven major river basins in China by Liu et al. (2017) found that 27.4% of watersheds showed a significant downward trend in COD and 24.7% showed a significant upward trend in DO, consistent with the results of this study20. At the basin level, numerous studies have indicated that in China’s most densely populated basin (the HuR Basin), COD decreased significantly in winter and spring from 2003 to 2015, whereas an increasing trend was observed in some upstream areas in summer21,22. From 2010 to 2019, the YaR Basin exhibited an increasing trend in DO, with larger increases in spring and winter than in summer23.

Comparison of seasonal water quality trends between natural and managed watersheds

To better elucidate seasonal water quality variations, we compared managed watersheds with their neighboring natural counterparts. We found that 16–24% of the managed watersheds exhibited significant increasing or decreasing seasonal trends in COD or DO concentrations (p < 0.1), whereas their natural neighbors showed no notable trends (p > 0.1). Conversely, only 7–14% of the managed watersheds showed the opposite pattern, with significant trends in the natural watersheds (p < 0.1) but not in the managed watersheds (p > 0.1) (Table S3). This outcome highlights the critical influence of anthropogenic stressors on water quality seasonality, consistent with the findings of Stow24. We then focused on pairs of managed and natural watersheds exhibiting significant seasonal changes and calculated the ratio of the absolute trend value in the managed watershed to that in the natural watershed (denoted T-NM). Finally, we quantified the specific impact of anthropogenic factors on seasonal water quality variations by analyzing the direction and magnitude of T (Fig. 3).
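Following the definition above, T is the ratio of the absolute managed trend to the absolute natural trend, and the direction is labeled by the signs of the two trends. A minimal sketch, with hypothetical function names and example slopes; the code uses an ASCII "-" where the text writes "−", and it assumes both trends are non-zero because only significant pairs are analyzed.

```python
def t_nm(natural_trend, managed_trend):
    """T-NM magnitude: |managed trend| / |natural trend| for a significant pair."""
    if natural_trend == 0:
        raise ValueError("the natural-watershed trend must be non-zero")
    return abs(managed_trend) / abs(natural_trend)


def direction_pattern(natural_trend, managed_trend):
    """Directional label: N+/N- for the natural trend, M+/M- for the managed trend.

    Assumes non-zero trends (only significant pairs are classified).
    """
    n = "N+" if natural_trend > 0 else "N-"
    m = "M+" if managed_trend > 0 else "M-"
    return n + m


# Hypothetical seasonal COD trends (mg L-1 yr-1) for one natural-managed pair.
natural, managed = -0.20, -0.50
print(direction_pattern(natural, managed), t_nm(natural, managed))
```

In this example both watersheds show declining COD (N−M−), and the managed decline is about 2.5 times steeper than the natural one.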

Fig. 3: Classification and pairing of natural and managed watersheds.

a Spatial distribution and pairing method of natural and managed sites across the ten major river basins. Triangles represent the 195 natural sites, while circles denote the 1540 managed sites. The color coding illustrates the anthropogenic disturbance index, with higher values indicating greater human impact. Each natural site was paired with every managed site within a 115 km radius, yielding 1915 natural–managed pairs. The number of managed sites per natural site ranges from 1 to 55, with a median of 4. Only pairs with significant trend changes (p < 0.1) were included in the analysis. b Four directional categories and five magnitude levels of trend changes between natural and managed watersheds. The four quadrants capture the directional consistency or divergence of trends between paired watersheds. Specifically, the first quadrant (N+M+) indicates synchronous increasing trends in both natural and managed watersheds; the second quadrant (N+M−) reflects a rising trend in natural watersheds but a declining trend in managed ones; the third quadrant (N−M−) denotes synchronous decreasing trends in both; and the fourth quadrant (N−M+) represents a declining trend in natural watersheds and a rising trend in managed watersheds. Furthermore, the magnitude of divergence in trend slopes is classified into five levels: highly suppressed (T < 0.44), moderately suppressed (0.44 ≤ T < 0.86), low impact (0.86 ≤ T < 1.22), moderately amplified (1.22 ≤ T < 2.58), and highly amplified (T ≥ 2.58). Classifications for COD and DO were based on the same thresholds.
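The caption states that natural and managed sites were paired within a 115 km radius but not how distance was measured. The sketch below assumes great-circle (haversine) distance on a spherical Earth of radius 6371 km; the site identifiers and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pair_sites(natural, managed, radius_km=115.0):
    """Return (natural_id, managed_id) for every managed site within radius_km of a natural site."""
    pairs = []
    for n_id, (n_lat, n_lon) in natural.items():
        for m_id, (m_lat, m_lon) in managed.items():
            if haversine_km(n_lat, n_lon, m_lat, m_lon) <= radius_km:
                pairs.append((n_id, m_id))
    return pairs


# Hypothetical site coordinates (latitude, longitude).
natural_sites = {"N1": (30.0, 110.0)}
managed_sites = {"M1": (30.5, 110.0), "M2": (32.0, 110.0)}
pairs = pair_sites(natural_sites, managed_sites)
print(pairs)  # [('N1', 'M1')]
```

Here M1 lies about 56 km from N1 and is paired, while M2 lies about 222 km away and is not. A spatial index would be preferable to the double loop for thousands of sites, but the brute-force version keeps the pairing rule explicit.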

The results indicated that 62% of the watersheds exhibited a predominant T-N−M− pattern in COD concentrations (N−M− denotes a declining COD trend in both managed and natural watersheds), whereas 60% exhibited a dominant T-N+M+ pattern in DO concentrations (N+M+ indicates an increasing DO trend in both managed and natural watersheds) (Table S4). Notably, 8% of the watersheds exhibited the T-N+M+ pattern in COD concentrations, while 19% showed the T-N−M− pattern in DO concentrations. For both COD and DO, the T-N+M− pattern (i.e., decreasing in managed watersheds and increasing in natural watersheds) and the T-N−M+ pattern (i.e., increasing in managed watersheds and decreasing in natural watersheds) each accounted for less than 30% of the watersheds. This directional consistency between managed and natural watersheds suggests that climatic factors dominate these trends, with anthropogenic pressures acting as modifiers that either reinforce or counteract them. Furthermore, the T-NM patterns exhibited seasonal regularity. Synchronous water quality improvement in both managed and natural watersheds, characterized by the T-N−M− pattern for COD and the T-N+M+ pattern for DO, occurred mainly in fall (COD: 71%; DO: 83%) and winter (COD: 73%; DO: 78%), indicating a broad, gradual improvement in water quality during these seasons. These results are consistent with those of other studies on seasonal water quality trends across natural watersheds at the regional scale25.

To better understand how anthropogenic factors interact with climatic factors to influence water quality, we examined the magnitude of the T index in detail. The results revealed that 17–24% of the watersheds were minimally affected by anthropogenic influences (Category 3, with T values between the 40th and 60th percentiles, 0.86 ≤ T < 1.22). A T value of 1 indicates that the trend in a managed watershed is identical in magnitude to that in the adjacent natural watershed, suggesting that natural factors drive the water quality change. In 27–33% of the managed watersheds, the water quality trend exceeded that in neighboring natural watersheds by 22–158% (Category 4, with T values between the 60th and 90th percentiles, 1.22 ≤ T < 2.58). This moderate amplification occurred in both improving and deteriorating watersheds in spring and summer but primarily in improving watersheds in fall and winter (Fig. 4a, b). Additionally, in 5–19% of the managed watersheds, trends exceeded those in natural watersheds by at least 158% (Category 5, with T values above the 90th percentile, T ≥ 2.58). This strong amplification was observed primarily in improving watersheds across all seasons. Furthermore, 23–35% and 5–13% of the watersheds exhibited moderate suppression (Category 2, with T values between the 10th and 40th percentiles, 0.44 ≤ T < 0.86) and strong suppression (Category 1, with T values below the 10th percentile, T < 0.44) of water quality trends, respectively, attributable to anthropogenic factors. The extent of anthropogenic impacts also differed between the two indicators, with a greater tendency toward strong amplification of COD trends and strong suppression of DO trends.
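The five magnitude categories are half-open intervals on T, and the "22–158%" figures are simply T − 1 expressed as a percentage. The sketch below encodes the thresholds stated in the Fig. 3 caption and, as an illustration only, shows how decile-based cut points (10th, 40th, 60th, 90th percentiles) could be derived from pooled T values using Python's default "exclusive" quantile method, which may differ from the method the authors used. All function names are hypothetical.

```python
import bisect
from statistics import quantiles

# Upper bounds of Categories 1-4, as given in the Fig. 3 caption.
T_THRESHOLDS = [0.44, 0.86, 1.22, 2.58]
T_LABELS = [
    "highly suppressed",      # Category 1: T < 0.44
    "moderately suppressed",  # Category 2: 0.44 <= T < 0.86
    "low impact",             # Category 3: 0.86 <= T < 1.22
    "moderately amplified",   # Category 4: 1.22 <= T < 2.58
    "highly amplified",       # Category 5: T >= 2.58
]


def magnitude_category(t):
    """Return (category number, label); a T equal to a threshold goes to the higher class."""
    idx = bisect.bisect_right(T_THRESHOLDS, t)
    return idx + 1, T_LABELS[idx]


def excess_over_natural(t):
    """Percent by which the managed trend exceeds (+) or falls short of (-) the natural trend."""
    return (t - 1.0) * 100.0


def percentile_thresholds(t_values):
    """10th, 40th, 60th, and 90th percentiles of pooled T values."""
    deciles = quantiles(t_values, n=10)  # 9 cut points at 10%, 20%, ..., 90%
    return [deciles[0], deciles[3], deciles[5], deciles[8]]


print(magnitude_category(1.0))           # (3, 'low impact')
print(round(excess_over_natural(2.58)))  # 158
```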
These findings suggest that the influence of anthropogenic factors on water quality also has seasonal characteristics, highlighting the potential of controlling seasonal anthropogenic impacts to improve long-term water quality trends.

Fig. 4: Seasonal variation trends and spatial patterns of COD and DO in natural and managed watersheds from 2006 to 2020.

a, b Proportions of watersheds showing different combinations of change directions and magnitudes for COD (a) and DO (b) across the four seasons. The T-value represents the ratio of the absolute significant trend (p < 0.1) in managed watersheds to that in natural watersheds. c, d Spatial distribution of seasonal variation trends (p < 0.1) for COD (c) and DO (d) in natural and managed watersheds. N+M+ denotes synchronous increases in both natural and managed watersheds, while N−M− indicates synchronous decreases. Red and blue represent increasing and decreasing trends, respectively, with gray indicating low impact (0.86 ≤ T < 1.22).

The spatial distribution of T values across China revealed that the impact of anthropogenic factors on water quality was region-specific (Fig. 4c, d). The smallest influence on river water quality (0.86 ≤ T < 1.22) occurred in the Southeast River (SeR) and YaR Basins. We identified strong amplification of the general water quality improvement trend (T-N−M− ≥ 2.58 for COD; T-N+M+ ≥ 2.58 for DO) in 72 subbasins across all basins except the SeR, ShR, and HuR Basins; in these managed watersheds, the decreasing COD trend or increasing DO trend was substantially greater than in their natural neighbors. This amplification effect of anthropogenic interventions was particularly notable in fall. It could be attributed to hydraulic infrastructure and land use modifications, including pollutant dilution by increased reservoir discharge and enhanced vegetation cover and ecological protection measures in managed watersheds26,27. The effect was particularly pronounced in the sparsely vegetated upstream regions of the YeR and YaR Basins, explaining why these areas have become hotspots of anthropogenically driven water quality improvement. For example, ecological restoration projects such as Grain for Green and small watershed management in the middle YeR and upper YaR Basins have improved ecosystem structure and water self-purification capacity, amplifying the effect of human intervention on water quality improvement28.

However, we also observed strong suppression of water quality trends, primarily in basins exhibiting overall improvement (T-N−M− < 0.44 for COD; T-N+M+ < 0.44 for DO) (Fig. 4c, d). These patterns were evident in northern basins, such as the YeR and HaR Basins, and in southern basins, including the YaR, HuR, and PeR Basins, where COD reduction or DO increase rates in managed watersheds were lower than those in their natural neighbors. This phenomenon is likely attributable to intense human activities in these regions, such as agricultural practices, poorly regulated industrial development, and pollutant discharge, which complicate water quality improvement efforts. In addition, some areas continue to face overlapping pollution sources and poor hydrological connectivity, which constrain the effectiveness of management interventions in reducing external pressures. For instance, parts of the North China Plain are characterized by a high density of small-scale industries and the co-occurrence of non-point source pollution and untreated domestic sewage29. Delayed implementation of pollution control measures in such regions has limited the extent of water quality improvement in managed watersheds.

Additionally, we found stronger evidence that anthropogenic pressures influence COD concentrations than DO concentrations, which is consistent with previous findings30,31. COD originates mainly from agricultural runoff, industrial effluents, and domestic sewage, so regional intensification of human activities or expansion of wastewater treatment can substantially increase or decrease local COD concentrations, respectively. DO, by contrast, is affected not only indirectly by anthropogenic discharges through their effects on aquatic organism growth and metabolism but also directly by natural factors such as temperature, which constrains the solubility of oxygen in water32.
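To illustrate the direct temperature constraint on DO mentioned above, the sketch below evaluates the Benson and Krause freshwater oxygen-saturation equation, as tabulated in APHA Standard Methods. It is not part of the authors' analysis; it assumes fresh water at 1 atm and ignores salinity and barometric-pressure corrections, and the function name is hypothetical.

```python
import math


def do_saturation_mg_per_l(temp_c):
    """Equilibrium DO (mg/L) in fresh water at 1 atm (Benson & Krause fit, APHA form).

    Salinity and pressure corrections are deliberately omitted.
    """
    t_k = temp_c + 273.15  # absolute temperature in kelvin
    ln_c = (
        -139.34411
        + 1.575701e5 / t_k
        - 6.642308e7 / t_k**2
        + 1.243800e10 / t_k**3
        - 8.621949e11 / t_k**4
    )
    return math.exp(ln_c)


for temp in (5, 15, 25):
    print(temp, round(do_saturation_mg_per_l(temp), 2))
```

Saturation falls from roughly 12.8 mg/L at 5 °C to about 8.3 mg/L at 25 °C, which is one reason summer DO can decline even without additional pollutant loads.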

Overall, our study establishes a systematic paired-comparison framework for natural and managed watersheds and introduces the T-NM index to normalize and quantify differences in both the direction and magnitude of seasonal trends. The results reveal marked spatial heterogeneity in the effects of anthropogenic interventions on river water quality, highlighting the dual role of human activities in either amplifying or suppressing water quality improvements. Identifying such region-specific effects is critical for detecting management blind spots, optimizing intervention strategies, and ultimately improving water quality.

Drivers and controls of seasonal water quality trends

To elucidate the mechanisms underlying seasonal and long-term water quality trends, we employed a random forest (RF) model to assess the relative importance of various factors influencing COD and DO in both natural and managed watersheds. The predictor variables included seasonal elements (season [SEA], monthly trend values [Q], and seasonal trend values [SQ]), meteorological factors (temperature [TEM] and precipitation [PRE]), watershed attributes (slope [SLP] and watershed area [WA]), and landscape elements (Table S5). To highlight differences in driving factors between natural and managed watersheds, we used the same indicator system for each water quality parameter in both watershed types. The RF model performed better in predicting water quality in natural watersheds (COD R² = 0.81; DO R² = 0.83) than in managed watersheds (COD R² = 0.74; DO R² = 0.70) (Table S6). These results are consistent with recent research suggesting that anthropogenic disturbances reduce the prediction accuracy of water quality models33.
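The R² values above compare model skill between watershed types. The text does not say whether they were computed on held-out data; the sketch below only shows the standard coefficient of determination, 1 − SS_res/SS_tot, which is what `sklearn.metrics.r2_score` also computes. The function name and the observed and predicted values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Raises ZeroDivisionError if all observed values are identical (SS_tot = 0).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted COD (mg/L) for five samples.
observed = [3.1, 4.2, 2.8, 5.0, 3.6]
predicted = [3.0, 4.0, 3.1, 4.7, 3.9]
print(round(r_squared(observed, predicted), 3))
```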

The importance scores of the predictor variables revealed substantial differences in the primary drivers of seasonal water quality trends between natural and managed watersheds (Fig. 5). Although seasonal factors play a crucial role in predicting water quality in both types of watersheds, their influence is greater in managed watersheds (Fig. 5a, b). For COD, Q-COD and SQ-COD ranked 4th and 3rd in importance in natural watersheds but rose to 1st and 2nd, respectively, in managed watersheds. For DO, SEA and SQ-DO ranked 5th and 8th in natural watersheds and 4th and 5th, respectively, in managed watersheds (Fig. 5c, d). Interestingly, this difference between the two indicators suggests that trend-based variables were more critical for COD, whereas categorical variables were more important for accurate DO predictions. This could be attributed to the nature of COD, which serves as an indicator of organic pollution levels in water and exhibits a clear concentration‒discharge relationship34,35. Discharge (including agricultural runoff and domestic sewage) often fluctuates over time, exhibiting patterns on decadal, interannual, and seasonal scales. Therefore, seasonal or monthly COD trend values can effectively capture these dynamic changes. In contrast, DO, a solute indicative of a water body’s self-purification capacity, is influenced not only by pollutant discharge and subsequent biological activities but also by gas exchange and solubility36. Categorical variables therefore capture the combined effects of discharge, climate, and other seasonal characteristics more clearly, making them more critical for model predictions than trend variables.
Furthermore, the smaller importance gap between the top drivers and other factors in natural watersheds suggests that in areas with lower human activity, the individual contributions of the driving factors to water quality are relatively modest (Fig. S2).
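The text does not state which importance measure the RF model uses (impurity-based or permutation-based). As an assumption, the sketch below implements permutation importance, a model-agnostic measure: the mean increase in mean squared error when one predictor column is shuffled. A toy linear function stands in for a fitted random forest, and the feature labels "SQ" and "SLP" are borrowed only as illustrative names; all helper names are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    predict maps one feature row (a list) to a prediction; X is a list of rows.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in permuted]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def rank_features(names, importances):
    """Feature names ordered from most to least important."""
    ordered = sorted(zip(names, importances), key=lambda item: -item[1])
    return [name for name, _ in ordered]


def toy_model(row):
    # Stand-in for a fitted RF: depends on column 0 and ignores column 1.
    return 3.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [toy_model(row) for row in X]
imp = permutation_importance(toy_model, X, y)
print(rank_features(["SQ", "SLP"], imp))
```

Because the toy model ignores the second column, shuffling it leaves the error unchanged and its importance is exactly zero; a real RF would usually assign small but non-zero scores to weak predictors.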

Fig. 5: Drivers of seasonal water quality trends.

a, b Frequency ranking of variables in the COD models for natural watersheds (a) and managed watersheds (b). c, d Frequency ranking of variables in the DO models for natural watersheds (c) and managed watersheds (d). Abbreviations are listed in Table S5. e Conceptual illustration of dominant drivers of seasonal water quality variation. Meteorological factors (e.g., air temperature and precipitation) dominate seasonal water quality dynamics in natural watersheds, whereas landscape factors—namely the composition and configuration of land use—have a greater influence in managed watersheds. The blue shaded ellipse and directional arrows represent the typical seasonal trajectory of water quality (from spring to winter), while the red dashed arrows indicate potential deviations due to increased anthropogenic pressures or climate extremes. More details are given in “Discussion”.

In addition to seasonal factors, climatic variables (temperature and precipitation) and watershed attributes (slope and watershed area) emerged as the most important predictors in natural watersheds. Partial dependence plots revealed that COD increased and DO decreased with increasing temperature and precipitation, indicating that river water quality degradation was greater during high-temperature seasons and rainy periods (Figs. S3–S6 and Table S7). Notably, over the past 15 years, temperature and precipitation showed increasing trends in 83.4% and 85.8% of watersheds across China, respectively, with significant increases in 24.3% and 15.0% of these watersheds (Table S8). These findings emphasize the importance of understanding how river water quality responds to seasonal climate variability under long-term global warming and increasingly frequent extreme precipitation events. Moreover, DO concentrations decrease with increasing watershed area, with faster changes in watersheds with slopes above 20° and areas smaller than 10³ km² (Fig. S7). These steep, small watersheds likely represent headwater streams, the sources of watershed water resources, including glacier and permafrost zones on the Qinghai–Tibet Plateau and forested highland watersheds in the Changbai Mountains. These insights highlight the importance of strengthening seasonal water quality monitoring and protection in headwater regions, which are essential conduits for terrestrial carbon transfer to the oceans and highly sensitive biogeochemical reactors under climate change37.

In contrast, our RF models identified landscape metrics such as Shannon’s diversity index (SHDI), largest patch index (LPI), and edge density (ED), which reflect landscape diversity, dominance, and fragmentation, respectively, as more important drivers in managed watersheds. For COD, SHDI was the third most important driver in managed watersheds (Fig. 5b), second only to the seasonal trend variables. Higher SHDI was associated with lower COD (Fig. S4), likely because diverse landscape patterns enhance pollutant deposition, dispersion, adsorption, and purification during runoff38,39. This finding is consistent with other studies linking SHDI to river water quality trends40. Importantly, SHDI was not a static variable but was calculated from land use data for eight years between 2006 and 2020. It therefore reflects the trajectory of land use change in managed watersheds, which is driven primarily by urbanization. During the study period, the average SHDI in managed watersheds increased from 0.926 to 0.944, mainly owing to a 42.8% increase in urban and residential land and a 6.2% decrease in agricultural land (Table S9).

Although urbanization is often associated with increased pollution pressures, previous studies have reported that in certain river basins, such as the YaR, YeR, and HaR Basins, reductions in fertilizer use, improved wastewater collection, and advances in treatment technologies may have contributed to localized improvements in water quality41,42,43. For DO, the dominance of waterbody landscape features was the second most important driver in managed watersheds, after temperature (Fig. 5d). Partial dependence plots revealed that when LPI-4 is below 0.1, DO increases significantly with increasing landscape dominance (Fig. S6), likely because larger waterbody patches generally indicate higher river connectivity and more complex flow networks, which facilitate oxygen exchange and more uniform DO distributions44. Indeed, the hotspots of significant DO suppression (\({T}_{{DO}-{N}+{M}+}\) < 0.44) in the YaR and PeR Basins corresponded to regions where LPI-4 was below 0.1 (Figs. 4d and S7). These findings emphasize the need to optimize landscape patterns in managed watersheds with small waterbody patch areas.

These driver identification results provide guidance for tailoring watershed water quality management strategies to regional conditions. For instance, the strong explanatory power of topographic and climatic variables such as slope and precipitation in natural watersheds suggests that hydrological regulation under natural background conditions should be prioritized. In contrast, in managed watersheds, the relatively high contribution of landscape configuration metrics such as SHDI and LPI indicates that optimizing land use structure and ecological spatial patterns could further enhance the combined benefits of pollution reduction and water quality restoration beyond existing management efforts. Despite spatiotemporal uncertainties arising from regional differences in data availability and from mismatches between the temporal resolution of the explanatory factors and the water quality observations, the machine learning models showed robust generalizability and stability, supported by rigorous data filtering, variable control, and cross-validation. The models effectively identified the dominant drivers of seasonal water quality dynamics in both natural and managed watersheds, underscoring their practical utility in bridging data gaps and uncovering latent seasonal patterns in dynamic water quality systems.

Discussion

In a world shaped by rapid climate change and intensifying human activities, river water quality management is becoming increasingly challenging. Our study provides a basis for understanding seasonal trends in river water quality. It reveals a significant overall improvement in China’s river water quality since 2006, while highlighting the coexistence of improvement and degradation in summer. Through paired comparisons of natural and managed watersheds, using the T-NM metric to assess trend direction and magnitude, we found that 52–89% of watersheds exhibited consistent trends, indicating that climate predominantly drives seasonal water quality patterns. However, anthropogenic pressures exert a dual influence, both amplifying (22–158%) and suppressing (14–56%) trends, with notable seasonal variation. These findings are essential for identifying regions where human interventions either improve or worsen water quality. Furthermore, human activities have a greater impact on COD, which is closely tied to pollutant emissions, than on DO, which reflects the overall health of river systems.

To explain these water quality changes, we developed four independent RF models to simulate monthly COD and DO concentrations (2006–2020) at 1735 natural and managed sites nationwide. Despite spatiotemporal biases arising from inconsistent data availability and mismatched time scales, our study underscores the utility of machine learning models in filling data gaps and uncovering seasonal patterns in dynamic water quality systems. We found that monthly and seasonal trend variables are the main drivers of water quality, with a greater influence in managed watersheds and on COD concentrations (47.08%, CODMn-Managed). Climate factors such as precipitation (25.37%, CODMn-Natural) and watershed attributes such as slope (17.40%, DO-Natural) explain variations in natural watersheds, whereas landscape metrics such as SHDI (11.58%, CODMn-Managed) and LPI (10.66%, DO-Managed) play a larger role in managed watersheds. These findings support sustainable water environment management by helping to identify and mitigate seasonal water quality degradation trends, strengthen monitoring and protection of natural headwater streams, and optimize landscape patterns in managed watersheds.

Ultimately, we emphasize the need for more focused interventions to alleviate anthropogenic pressures on river water quality, account for seasonal patterns, and adapt to evolving climate and human development. Targeted summer management measures, including runoff regulation, pollution source control, and improved real-time water quality monitoring, are essential to minimize the impact of high temperatures and changing precipitation patterns on water health. Moreover, in regions where anthropogenic pressures significantly amplify pollution, comprehensive measures such as agricultural restructuring, separate sewerage systems, and ecological buffer zones are recommended. In contrast, areas with strong natural attenuation effects offer opportunities to optimize resource allocation and improve cost-effectiveness. Future research should focus on developing hybrid modeling frameworks that integrate process-based hydrological understanding with data-driven algorithms. By incorporating high-resolution remote sensing, in situ monitoring data, and pollutant emission inventories, such models can improve robustness and generalizability under complex environmental conditions, thereby providing stronger methodological support and scientific evidence for adaptive watershed governance under changing climatic and socio-economic conditions.

Methods

Monitoring data sources and site classification

Through the China National Environmental Monitoring Centre (CNEMC) (https://www.cnemc.cn/), we collected weekly and monthly COD and DO monitoring data at key monitoring sections in ten major river basins in China over 15 years (2006–2020). The dataset includes 170,925 COD observations and 171,542 DO observations from 1735 monitoring stations, from which monthly averages were calculated. As is common in water quality monitoring, the 15-year record contains gaps due to monitoring anomalies or missing values. To balance the requirements of model training and data quality, we applied a dual filtering strategy: for spatial coverage, each site was required to have at least 100 valid observations of COD and DO concentrations during 2006–2020; for temporal coverage, valid observations were required for at least 80% of the months in each calendar year. In addition, following the quality control approach of Zhang et al.45, we systematically assessed data integrity and removed potential outliers or measurement errors by excluding values outside the monthly mean ±3 standard deviations at each monitoring site46.
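The two screening rules and the ±3 SD outlier rule can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: observations are assumed to arrive as hypothetical (site, year, month, value) tuples, the 80% rule is read as at least 10 of 12 months in every year from 2006 to 2020, and the ±3 SD window is assumed to be computed over all of a site's observations using the population standard deviation.

```python
from collections import defaultdict
from statistics import mean, pstdev

STUDY_YEARS = range(2006, 2021)  # 2006-2020 inclusive


def screen_sites(records, min_obs=100, min_month_frac=0.8, years=STUDY_YEARS):
    """Keep sites that pass both filters.

    records: iterable of (site, year, month, value); value None = missing.
    Returns {site: [(year, month, value), ...]} for the retained sites.
    """
    by_site = defaultdict(list)
    for site, year, month, value in records:
        if value is not None:  # only valid observations count
            by_site[site].append((year, month, value))

    kept = {}
    for site, obs in by_site.items():
        months = defaultdict(set)  # year -> months with >= 1 valid value
        for year, month, _ in obs:
            months[year].add(month)
        enough_obs = len(obs) >= min_obs
        # every study year needs >= 80% of its 12 months (i.e. >= 10 months)
        enough_months = all(len(months[y]) >= min_month_frac * 12 for y in years)
        if enough_obs and enough_months:
            kept[site] = obs
    return kept


def drop_outliers(obs, k=3.0):
    """Drop values outside mean +/- k * SD of the site's pooled observations."""
    values = [v for _, _, v in obs]
    mu, sd = mean(values), pstdev(values)
    return [(y, m, v) for y, m, v in obs if abs(v - mu) <= k * sd]


def monthly_means(obs):
    """Average the retained observations within each (year, month)."""
    groups = defaultdict(list)
    for y, m, v in obs:
        groups[(y, m)].append(v)
    return {key: mean(vals) for key, vals in sorted(groups.items())}
```

Chaining the three functions (screen, drop outliers, average) yields per-site monthly series; whether outliers were removed before or after monthly averaging in the study is not stated, so the ordering here is itself an assumption.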

The 2006–2020 period was selected because its 15-year span is suitable for long-term trend assessment and aligns with the temporal coverage of the driving variable dataset (Table S5). Moreover, traceable and systematically accessible national-scale observational data have been available since 2006. The period also covers three major national environmental governance cycles in China (the 11th, 12th, and 13th Five-Year Plans), providing a consistent monitoring foundation and a valuable basis for evaluating policy performance. The selected sites exhibit substantial spatial heterogeneity, spanning different hydrological, climatic, and geographical regions as well as varying levels of anthropogenic impact. Climatically, the sites cover a wide range of hydroclimatic conditions, with monthly mean precipitation ranging from 0 to 755.87 mm and monthly mean air temperature from −31.04 °C to 30.82 °C. The study sites also encompass a broad spectrum of stream sizes, with Strahler stream orders from 1 to 9. Topographically, the stations are distributed across plains, hills, and mountainous regions. In terms of land use and landscape composition, 737 watersheds are dominated by cropland, 687 by forest, and 177 by grassland, while 26 and 78 watersheds are dominated by water bodies and urban/residential-industrial land, respectively.

Long-term changes in land cover are among the most direct manifestations of human activities affecting terrestrial ecosystems, driving transitions from natural to semi-natural or anthropogenically modified systems47. Wang et al. developed a Human Activity Intensity (HAI) index based on a weighted land use classification model, which has been widely applied in ecological security assessments, land use planning, and environmental impact studies (Eq. 1)48,49. In this study, we used 1-km resolution land use data derived from Landsat remote sensing imagery (https://www.resdc.cn/) and spatial analysis tools in ArcGIS 10.8 to calculate HAI values for each grid cell, characterizing the spatial heterogeneity of human activity intensity across watershed units. Following the ecological disturbance classification framework proposed by Lu et al. and accounting for China’s hydrological context, watersheds with HAI values below 3 were designated as natural, and those with values of 3 or higher were classified as managed50. Through systematic validation and regional calibration, we identified 195 natural and 1540 managed watersheds across China (Fig. S1).

$${\rm{HAI}}=\frac{{\sum }_{{\rm{i}}=1}^{{\rm{n}}}{{\rm{LA}}}_{{\rm{i}}}\times {{\rm{k}}}_{{\rm{i}}}}{{\rm{WA}}}$$
(1)

In Eq. (1), n is the number of land use types, \({{LA}}_{i}\) is the area of the ith land use type, \({k}_{i}\) is the human disturbance intensity coefficient of the ith land use type (Table S10), and WA is the watershed area.
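Eq. (1) and the HAI threshold translate directly into a few lines of Python. The coefficient values below are hypothetical placeholders (the actual \({k}_{i}\) are listed in Table S10), and WA is approximated as the sum of the land use areas, which assumes the classified areas cover the whole watershed.

```python
# HYPOTHETICAL human-disturbance coefficients k_i per land use type;
# replace with the values in Table S10.
K_COEFFS = {
    "forest": 1.0,
    "grassland": 2.0,
    "water": 2.5,
    "cropland": 5.0,
    "urban": 9.0,
}


def hai(land_use_areas, coeffs=K_COEFFS):
    """HAI = sum_i(LA_i * k_i) / WA, with WA taken as the sum of all LA_i."""
    wa = sum(land_use_areas.values())
    if wa <= 0:
        raise ValueError("watershed area must be positive")
    return sum(area * coeffs[lu] for lu, area in land_use_areas.items()) / wa


def classify_watershed(hai_value, threshold=3.0):
    """Watersheds with HAI below the threshold are labelled natural."""
    return "natural" if hai_value < threshold else "managed"
```

With these placeholder coefficients, a watershed that is 80% forest and 20% cropland scores 1.8 and is labelled natural.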

Comprehensive and seasonal trend analysis

To assess the status of COD and DO across major river basins, we calculated the monthly and seasonal mean concentrations for all monitoring sites from 2006 to 2020, and further calculated the molar ratio of COD to DO for all available monthly data. Seasons were defined as spring (March‒May), summer (June‒August), fall (September‒November), and winter (December‒February).

$${{\rm{R}}}_{{\rm{COD}}/{\rm{DO}}}=\frac{{{\rm{c}}}_{{\rm{COD}}}/{{\rm{M}}}_{{\rm{COD}}}}{{{\rm{c}}}_{{\rm{DO}}}/{{\rm{M}}}_{{\rm{DO}}}}$$
(2)

In Eq. (2), \({R}_{{COD}/{DO}}\) is the COD to DO molar ratio, \({c}_{{COD}}\) and \({c}_{{DO}}\) are the respective concentrations (mg/L) at a given time, and \({M}_{{COD}}\) and \({M}_{{DO}}\) are the corresponding molar masses, both taken as that of oxygen (O₂). Because COD and DO are both expressed as oxygen-equivalent concentrations and share the same molar mass (32 g/mol), the concentration ratio equals the molar ratio.
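A small sketch of the season definition and Eq. (2) follows. Assigning December to the following year's winter is an assumption made here for illustration; the text does not state how the December–February season is split across calendar years.

```python
from collections import defaultdict
from statistics import mean

# Month -> season, following the definitions in the text.
SEASON_OF_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
    12: "winter", 1: "winter", 2: "winter",
}
M_O2 = 32.0  # g/mol; COD and DO are both reported as mg O2/L


def cod_do_molar_ratio(c_cod, c_do, m_cod=M_O2, m_do=M_O2):
    """Eq. (2): (c_COD / M_COD) / (c_DO / M_DO); equals c_COD / c_DO when the masses match."""
    return (c_cod / m_cod) / (c_do / m_do)


def seasonal_means(monthly):
    """monthly: {(year, month): value} -> {(season_year, season): mean value}.

    Assumption: December is counted in the winter of the following year.
    """
    groups = defaultdict(list)
    for (year, month), value in monthly.items():
        season_year = year + 1 if month == 12 else year
        groups[(season_year, SEASON_OF_MONTH[month])].append(value)
    return {key: mean(vals) for key, vals in groups.items()}
```

For example, 16 mg/L COD and 8 mg/L DO give a molar ratio of 2.0, the same as the plain concentration ratio.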

Trends were estimated with Sen’s slope estimator, a robust non-parametric method that is resistant to measurement errors and outliers and is therefore well suited to long-term time series51. The Mann–Kendall (MK) non-parametric test was then used to assess the statistical significance of the trends; this widely used test does not require normally distributed data and is relatively insensitive to missing values and outliers52. Following Stahl et al.53, the magnitude of change was expressed as the Sen’s slope rate, i.e., the percentage change per decade relative to the series mean, which is robust to outliers.

$${\rm{\beta }}={\rm{median}}\left(\frac{{{\rm{x}}}_{{\rm{j}}}-{{\rm{x}}}_{{\rm{i}}}}{{\rm{j}}-{\rm{i}}}\right),\forall {\rm{j}} > {\rm{i}}$$
(3)
$${\rm{r}}=\frac{{\rm{\beta }}\times 10}{\bar{{\rm{x}}}}\times 100 \%$$
(4)

In Eqs. (3) and (4), \({x}_{j}\) and \({x}_{i}\) are the values of the water quality time series at times j and i, with \({\rm{\beta }}\) > 0 indicating an increasing concentration trend and \({\rm{\beta }}\) < 0 a decreasing trend. \(\bar{x}\) is the mean concentration of the series, and \({r}\) is the trend magnitude (% per decade).
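For readers who want to reproduce the trend statistics, the sketch below implements Sen’s slope (the median of all pairwise slopes, as in Eq. (3)), a two-sided Mann–Kendall test using the usual normal approximation with a tie correction, and the per-decade rate of Eq. (4). It is a generic textbook implementation, not the authors' code; it assumes the series is equally spaced in years, so that multiplying β by 10 gives a per-decade change, and it does not correct for serial correlation or seasonality.

```python
import math
from itertools import combinations
from statistics import mean, median


def sens_slope(x):
    """Eq. (3): median of pairwise slopes (x_j - x_i) / (j - i) for all j > i."""
    slopes = [(x[j] - x[i]) / (j - i) for i, j in combinations(range(len(x)), 2)]
    return median(slopes)


def mann_kendall(x):
    """Two-sided Mann-Kendall test; returns (S, Z, p) via the normal approximation."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i]) for i, j in combinations(range(n), 2))
    # Variance of S, corrected for groups of tied values.
    counts = {}
    for v in x:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    if var_s == 0:  # e.g. a constant series: no evidence of a trend
        return s, 0.0, 1.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return s, z, p


def rate_per_decade(x, beta=None):
    """Eq. (4): r = beta * 10 / mean(x) * 100, in % per decade (annual steps assumed)."""
    if beta is None:
        beta = sens_slope(x)
    return beta * 10 / mean(x) * 100
```

A trend would then be retained as significant when p falls below the chosen level (the pairing analysis uses P < 0.1).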

Collaborative analysis of seasonal variation trends in natural and managed watersheds

To qualitatively and quantitatively discern differences in COD and DO concentration trends between natural watersheds and adjacent managed watersheds, we adopted a paired natural–managed site approach, inspired by Singh et al., to help identify the underlying causes of seasonal variations54. Ficklin et al. found that correlations between managed and natural watersheds were strongest within a 115 km radius, likely because of similar climatic regimes within this distance33. We therefore assumed that natural and managed sites within a 115 km radius are subject to comparable climatic conditions (Fig. S8). Using the ArcGIS Buffer spatial analysis tool, we identified the managed sites within a 115 km radius of each of the 195 natural sites, yielding 1915 natural–managed site pairs. To minimize the confounding effects of geographic heterogeneity, two-sample t-tests were used to confirm that meteorological and watershed attributes did not differ significantly between paired natural and managed watersheds. Our analysis focused on pairs with statistically significant slopes (P < 0.1).
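The radius-based pairing can be approximated without GIS software by computing great-circle (haversine) distances between site coordinates. This sketch is a stand-in for the ArcGIS Buffer workflow described above, assuming sites are given as latitude/longitude in decimal degrees on a spherical Earth; the site identifiers used to exercise it are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pair_sites(natural, managed, radius_km=115.0):
    """Pair each natural site with every managed site within radius_km.

    natural, managed: {site_id: (lat, lon)}.
    Returns a list of (natural_id, managed_id, distance_km).
    """
    pairs = []
    for n_id, (nlat, nlon) in natural.items():
        for m_id, (mlat, mlon) in managed.items():
            d = haversine_km(nlat, nlon, mlat, mlon)
            if d <= radius_km:
                pairs.append((n_id, m_id, d))
    return pairs
```

A managed site may fall within the radius of several natural sites and so appear in more than one pair; the brute-force double loop is adequate for a few thousand sites.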

To capture the direction and magnitude of COD and DO concentration trends, we developed a metric termed T-NM. The T-value is the absolute ratio of the significant trend slopes in managed and natural watersheds, and the quadrant in which a pair falls reflects the trend directions. COD and DO trends were evaluated within the same framework, denoted \({T}_{{COD}}\) and \({T}_{{DO}}\), respectively. For trend direction, a quadrant diagram shows whether the trends in natural and managed watersheds agree or diverge. For magnitude, percentiles of the T-values were used to classify the divergence in trend slopes into five levels (Table S11): highly suppressed (T < 0.44), moderately suppressed (0.44 ≤ T < 0.86), low impact (0.86 ≤ T < 1.22), moderately amplified (1.22 ≤ T < 2.58), and highly amplified (T ≥ 2.58). The same thresholds were applied to COD and DO. We focused on the highly suppressed and highly amplified categories to emphasize the anthropogenic influences evident in managed watersheds.

$${\rm{T\text{-}NM}}=\left|\frac{{{\rm{\beta }}}_{{\rm{M}}}}{{{\rm{\beta }}}_{{\rm{N}}}}\right|$$
(5)

In Eq. (5), \(\text{T-NM}\) denotes the trend-based metric developed in this study, where \({\beta }_{M}\) and \({\beta }_{N}\) are the statistically significant trend magnitudes in managed and natural watersheds, respectively.
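Eq. (5) and the five-level classification can be written as below. The thresholds are those stated in the text; the "N±M±" quadrant label is an assumed reading of the notation used in the Results (e.g., N+M+ for pairs in which both slopes are positive), and the function names are hypothetical.

```python
from bisect import bisect_right

# Class boundaries from the text: T < 0.44, 0.44 <= T < 0.86, ..., T >= 2.58.
T_BREAKS = [0.44, 0.86, 1.22, 2.58]
T_LABELS = ["highly suppressed", "moderately suppressed", "low impact",
            "moderately amplified", "highly amplified"]


def t_nm(beta_managed, beta_natural):
    """Eq. (5): absolute ratio of managed to natural significant trend slopes."""
    if beta_natural == 0:
        raise ValueError("natural-watershed slope must be non-zero")
    return abs(beta_managed / beta_natural)


def quadrant(beta_managed, beta_natural):
    """Sign pattern of a pair, e.g. 'N+M+' when both slopes are positive."""
    n = "+" if beta_natural > 0 else "-"
    m = "+" if beta_managed > 0 else "-"
    return f"N{n}M{m}"


def classify_t(t):
    """Map a T-value to its class; bisect_right puts boundary values in the upper class."""
    return T_LABELS[bisect_right(T_BREAKS, t)]
```

Using `bisect_right` reproduces the half-open intervals in the text, so a T-value of exactly 0.44 falls in "moderately suppressed" and 2.58 in "highly amplified".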

Selection of driving variables and Random Forest model methodology

The heterogeneity of watersheds arises from both dynamic and static factors. Dynamic factors include temporal variations in climate conditions and anthropogenic disturbances, whereas static factors encompass relatively stable landscape attributes such as land use type, soil properties, topography, and geological structure. Together, these factors modulate the timing and pathways of material transport within river basins, leading to complex seasonal responses in water quality55. We collected and computed monthly or annual time series of 30 driving variables in six categories for 1735 watersheds across China from 2006 to 2020. Specifically, we calculated landscape metrics at both the class level (e.g., the largest patch index (LPI) and edge density (ED)) and the landscape level (e.g., Shannon’s diversity index (SHDI)) to quantify the dominance, fragmentation, and diversity of the land use spatial structure. All landscape metrics were computed at the watershed scale in Fragstats 4.2 using an 8-neighbor rule for both class- and landscape-level analyses. We also included the overall and seasonal trends of COD and DO concentrations, along with the categorical variable season, as model inputs. The trend variables were incorporated as site-level state inputs to capture the long-term variation of water quality under varying natural and anthropogenic influences, as well as its spatial heterogeneity. Variables were selected on the basis of their influence on COD and DO concentrations in watersheds and of data availability56,57. Detailed descriptions of all predictor variables and their data sources are provided in Table S5.
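To make the landscape metrics concrete, the sketch below computes SHDI from class proportions and the class-level LPI from 8-connected patches on a toy raster of land use codes, mirroring the 8-neighbor rule used in Fragstats. It assumes equal-area cells and ignores Fragstats details such as background and border handling, and edge density is omitted; it illustrates the metric definitions and is not a replacement for Fragstats.

```python
import math
from collections import Counter, deque


def shdi(grid):
    """Shannon's diversity index: -sum(p_i * ln p_i) over class proportions."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def patches(grid, cls):
    """Sizes (in cells) of the 8-connected patches of class `cls`."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != cls or seen[r][c]:
                continue
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:  # breadth-first flood fill over the 8 neighbours
                i, j = queue.popleft()
                size += 1
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and not seen[ni][nj] and grid[ni][nj] == cls):
                            seen[ni][nj] = True
                            queue.append((ni, nj))
            sizes.append(size)
    return sizes


def lpi(grid, cls):
    """Class-level largest patch index, in % of total landscape area."""
    total = len(grid) * len(grid[0])
    sizes = patches(grid, cls)
    return 100.0 * max(sizes) / total if sizes else 0.0
```

On the 2 × 2 grid [[1, 2], [2, 1]], the two diagonal cells of class 1 form a single patch under the 8-neighbor rule, so LPI for class 1 is 50%; under a 4-neighbor rule they would be two patches and LPI would be 25%.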

Furthermore, we developed four independent Random Forest (RF) models to simulate COD in natural watersheds (195 sites, 9796 samples) and managed watersheds (1540 sites, 73,714 samples), and DO in natural watersheds (195 sites, 9788 samples) and managed watersheds (1540 sites, 73,694 samples). We first used Spearman’s rank correlation coefficient to assess correlations among the 30 candidate predictor variables, and then combined Recursive Feature Elimination (RFE) with expert knowledge to select the 10 least correlated driving variables for each model (Fig. S9 and Table S7). RF is a robust ensemble machine learning technique based on decision trees and used for both classification and regression. Each tree is trained independently on a bootstrap sample of the data, with a random subset of predictors considered at each split, and for regression the predictions of all trees are averaged; this randomization reduces variance and improves stability and accuracy8,58.
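The correlation screening step can be illustrated with a dependency-free Spearman coefficient, computed as the Pearson correlation of average ranks. The 0.7 cut-off for flagging strongly correlated predictor pairs is a hypothetical choice for illustration; the study combined the correlation matrix with RFE and expert judgment rather than a fixed threshold, and that selection logic is not reproduced here.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; NaN if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def flag_correlated(features, threshold=0.7):
    """Return (name_a, name_b, rho) for predictor pairs with |rho| >= threshold.

    features: {name: list of values}; 0.7 is a HYPOTHETICAL cut-off.
    """
    flagged = []
    for (na, a), (nb, b) in combinations(features.items(), 2):
        rho = spearman(a, b)
        if abs(rho) >= threshold:  # NaN never passes this comparison
            flagged.append((na, nb, rho))
    return flagged
```

Because it works on ranks, Spearman’s rho is 1 for any strictly increasing relationship (for example x against x²), which suits skewed environmental predictors.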

Each of the four models was tuned using GridSearchCV from the Scikit-Learn library (Table S12). Ten-fold cross-validation was conducted on 85% of the data to optimize hyperparameters and reduce overfitting, and the remaining 15% was held out as an independent test set for model evaluation. The mean absolute error (MAE) was selected as the optimization objective because it is more robust to outliers and extreme deviations, yielding more balanced fitting performance across the full range of concentration values. Model performance was evaluated using four complementary metrics: the coefficient of determination (R²), MAE, the Nash–Sutcliffe efficiency coefficient (NSE), and percent bias (PBIAS), which together assess both accuracy and generalization capability. Both the natural and managed watershed RF models showed satisfactory fit (Fig. S10 and Table S6). To enhance the robustness and reliability of our findings, we ran each model 30 times with different random seeds and recorded how frequently each variable appeared in the top ranks. This approach mitigates potential biases introduced by random sampling and bootstrapping and identifies consistently influential variables rather than relying on potentially unstable single-run results. To explore the marginal effect of the core features on the predicted concentrations, we generated partial dependence plots (PDPs) for each individual variable. To prevent PDPs from being misleading in regions with sparse or no data, we also plotted the probability density distribution of each variable, enhancing the interpretability and reliability of the results59. This analysis was implemented using the PDPbox and NumPy libraries in Python 3.6.
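The evaluation and interpretation steps can be sketched as follows: NSE and PBIAS (using the common hydrological convention PBIAS = 100 × Σ(obs − sim) / Σ obs, where positive values indicate underestimation; the text does not state which sign convention was used), a model-agnostic partial dependence curve that averages predictions after fixing one feature at each grid value, and a tally of how often each variable appears in the top k across repeated runs. In scikit-learn, an MAE objective corresponds to `scoring="neg_mean_absolute_error"` in `GridSearchCV`, and `sklearn.inspection.partial_dependence` provides a library implementation of the PDP computation; the plain-Python versions below only illustrate the definitions, and `predict` stands for any fitted model's prediction function.

```python
from collections import Counter
from statistics import mean


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mo = mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mo) ** 2 for o in obs)
    if sst == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1 - sse / sst


def pbias(obs, sim):
    """Percent bias, 100 * sum(obs - sim) / sum(obs); positive = underestimation (assumed convention)."""
    return 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def partial_dependence(predict, rows, feature, grid):
    """Mean prediction with `feature` fixed at each grid value for every row.

    predict: callable mapping a list of dict rows to a list of predictions.
    """
    curve = []
    for v in grid:
        modified = [{**r, feature: v} for r in rows]  # copy rows, override one feature
        curve.append(mean(predict(modified)))
    return curve


def top_k_frequency(rankings, k=3):
    """Count how often each variable is in the top k across runs.

    rankings: one list per run, variables ordered by decreasing importance.
    """
    counts = Counter()
    for ranking in rankings:
        counts.update(ranking[:k])
    return counts
```

Averaging over all observed rows, rather than fixing the other features at their means, is what makes the curve a partial dependence rather than a single-profile sensitivity test; plotting it alongside the feature's density, as described above, shows where the curve is supported by data.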