Introduction

The integrated consequences of climate variability and anthropogenic activities have led to considerable global alterations of catchment hydrological regimes in recent decades1,2,3,4. Global warming has caused an overall upward trend in temperatures and fluctuations in precipitation5. By increasing water vapour content within the atmospheric boundary layer6, and accelerating the degradation of permafrost ice wedge and glaciers7, it has thereby affected the water cycle process in river basins. Meanwhile, with economic development and population growth, extensive anthropogenic activities such as the irrigation8, urbanization9 and construction of water projects10 have modified the conditions of catchment substrates and the distribution of water resources on both spatial and temporal scales3,11. Within the human-nature system, attribution analysis of runoff variation facilitates the dynamic equilibrium of water resources, strengthens adaptive capacities to climate challenges, and provides data-driven insights for watershed management, ecological restoration, and related initiatives3.

Currently, to separate the impacts of climate change and anthropogenic activities on hydrological regimes, researchers have primarily employed four approaches: statistical attribution analysis12,13, the Budyko framework3,14, hydrological models4,15 and machine learning16,17,18. Statistical attribution analysis, which rests on empirical statistical methods, requires long-term hydro-meteorological observations3, whereas existing studies often face data scarcity because monitoring periods are short. The Budyko framework, although more physically meaningful than standard empirical statistical methods, has limitations in accurately modelling hydrological processes, which restricts its effectiveness in climate resilience methods19. Hydrological models such as SWAT mitigate the interpretability limitations of purely data-driven approaches while also demonstrating advantages in representing flood peaks20. However, their predictive capabilities across temporal scales still face challenges20,21. Furthermore, they typically require extensive input data such as topographic, land use, and soil parameters. By contrast, the flexible data processing capability of machine learning enables it to handle different variables across various time scales using only meteorological data and runoff as inputs22. In particular, the RFRM demonstrates advantages over individual machine learning models in variable selection, handling of high-dimensional features23 and computational efficiency24. However, as black-box models, machine learning algorithms lack interpretability regarding the physical mechanisms underlying hydrological processes, a limitation frequently noted in comparative studies25,26. Because the strengths of physics-based and data-driven models are complementary, multi-model fusion approaches contribute to reducing the uncertainty associated with reconstructed runoff17,27,28. The preceding discussion has summarized methods for reconstructing runoff. However, beyond the choice of methodology, multi-temporal scale analysis and attribution analysis emerge as pivotal yet underexplored factors in accurately quantifying anthropogenic impacts.

Current studies quantifying the impacts of climate change and anthropogenic activities on runoff exhibit two primary limitations. First, runoff reconstruction relies predominantly on monthly and annual temporal scales29, with limited exploration of alternative scales and no conclusive evidence that monthly resolution is the most efficient or accurate. Second, analyses frequently adopt a single perspective, typically explaining contribution rates through land-use alterations alone, and may therefore overlook additional anthropogenic mechanisms influencing runoff given the complexity of anthropogenic activities. Specifically, different human activities affect runoff variations across distinct temporal scales. In terms of remote sensing indicators, vegetation plays an important role in runoff regulation. Studies have shown that its influence on the runoff coefficient is more pronounced over short timescales30,31, and tends to diminish at annual and longer scales32. Meanwhile, remote sensing metrics such as impervious surface area (ISA) and nighttime light (NTL) reflect urbanization intensity. Urbanization increases the proportion of impervious surfaces, which significantly shortens confluence time, raises the runoff coefficient, and amplifies flood peaks during storm events, while generally suppressing baseflow and groundwater recharge33. As urbanization intensifies, differences in surface runoff between urban and rural areas become more evident at seasonal scales, with urbanization playing the most significant role in runoff increase during heavy rainfall events34,35. To address these limitations, this study introduces a novel framework that identifies watershed-specific optimal time scales and employs a hybrid RF–SWAT approach to ensure reliable application in basins with limited underlying surface data. 
Furthermore, the framework integrates multi-year government census data with remote sensing products to analyse the practical causes of contribution rate variations. A case study of the Lan River Basin was conducted to validate the proposed method. The Qiantang River Basin is a representative basin in south-eastern China, with the Lan River Basin constituting one of its three sub-basins. Xu et al.4 employed the SWAT model to project runoff changes in the Lan River basin from 2011 to 2100, revealing potential multiannual runoff reduction and increasingly polarized seasonal distribution patterns (decreased winter flows vs. intensified summer precipitation), with marked uncertainties in these projections. Analogously, Zhang et al.36 projected that extreme flows of small return periods at all stations in Lan River Basin will likely increase in the future period 2011–2040 under all three scenarios of climate change. Xia et al.3 systematically investigated the contribution rates of climate change and anthropogenic activities to runoff impacts in the North and South Source Basins of the upper Qiantang River by innovatively integrating six separation methods based on the Budyko hypothesis framework. This paper presents new references for water resource management in the Lan River Basin with explicit emphasis on temporal scale selection and multi-source data fusion, while offering methodological insights for other watershed studies.

Materials and methods

Study area

The Lan River Basin, situated within a subtropical humid monsoon region (Fig. 1a), experiences hot and rainy summers and mild, dry winters. This fluvial system maintains a mean annual temperature of 17 °C with precipitation averaging 1,545 mm. Marked spatiotemporal disparities in rainfall distribution drive pronounced hydrological seasonality, where April-July runoff constitutes 55%−60% of the annual discharge3. Figure 1 illustrates the geographical location of the Lan River Basin.

Fig. 1

(a) Location map of Zhejiang Province; (b) Geographical context of the Lan River Basin; (c) Drainage network of the Lan River Basin. The map was created using ArcGIS 10.2 (https://desktop.arcgis.com/) based on the standard map No.GS(2024)0650.

The Lan River, located in west-central Zhejiang Province (119°10′–120°15′ E, 29°05′–29°45′ N), constitutes a principal tributary of the Qiantang River system (Fig. 1b). Originating from the Majin Stream in the northwestern headwaters of the basin, the upper reaches flow eastward through Quzhou City and Jinhua City before exiting the Jinqu Basin at Lanxi City, where the river converges with the Jinhua River. The mainstream ultimately merges with the Xin’an River at Meicheng Town, north of Lanxi City in Jiande City, forming the Fuchun River. Along its course it passes three hydrological stations: Quzhou Station, Jinhua Station, and Lanxi Station (the basin’s outlet) (Fig. 1c). The channel is 303 km long and 250–350 m wide. This study focuses on the Lan River’s mainstream and its major tributary basins, encompassing a total area of 19,350 km². The basin exhibits substantial topographic variation, featuring 1,216 m of relief and a mean elevation of 286 m.

Data acquisition

Daily runoff records (1970–2019) from three hydrological stations (Lanxi, Quzhou, and Jinhua) were provided by the Zhejiang Provincial Hydrology Bureau. Meteorological data spanning 1970–2019 from five in-basin stations were obtained from the National Meteorological Data Centre of the China Meteorological Administration (http://data.cma.cn/). A 90-m resolution Digital Elevation Model (DEM) was acquired from the Geospatial Data Cloud platform (https://www.gscloud.cn/). Soil classification data (1:1,000,000 scale) derived from the Second National Soil Survey were sourced from the Nanjing Institute of Soil Sciences (https://s.ncdc.ac.cn/). Land use patterns were extracted from 1-km resolution datasets hosted by the Resource and Environment Science and Data Center, Chinese Academy of Sciences (https://www.resdc.cn/). The National Ecosystem Science Data Center (http://www.nesdc.org.cn) provided 30-m annual maximum Normalized Difference Vegetation Index (NDVI) data (2000–2018). Annual 30-m resolution ISA data (2000, 2005, 2010, 2015, 2018) were obtained from Kuang et al.37. A DMSP-OLS-like NTL composite dataset (1992–2019) was compiled following Wu et al.38. Socioeconomic and agroforestry statistics (1970–2019) for Jinhua and Quzhou Cities were compiled from municipal statistical yearbooks.

Methods

Quantifying contributions of climate change and anthropogenic activities

$$\Delta {Q_{total}}={Q_{post}} - {Q_{pre}}$$
(1)

where \(\Delta {Q_{total}}\) is the variation in runoff before and after the abrupt change (the periods before and after the abrupt change are the reference period and the period influenced by anthropogenic activities, respectively), \({Q_{{\text{post}}}}\) is the measured multi-year average runoff during the period after the abrupt change, and \({Q_{pre}}\) is the measured multi-year average runoff during the period before the abrupt change.

$$\Delta {Q_C}={Q_{re}} - {Q_{pre}}$$
(2)
$$\Delta {Q_H}={Q_{post}} - {Q_{re}}$$
(3)

where \(\Delta {Q_C}\) and \(\Delta {Q_H}\) are the changes in runoff caused by climate change and anthropogenic activities, respectively, and \({Q_{{\text{re}}}}\) is the multi-year average runoff reconstructed by inputting the meteorological data from the period after the abrupt change into the model, so that it is influenced solely by meteorological factors.
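
As a consistency check that follows directly from Eqs. (1)–(3), and is not stated explicitly in the original derivation, adding the two components recovers the total change, because the reconstructed runoff cancels:

$$\Delta {Q_C}+\Delta {Q_H}=\left( {{Q_{re}} - {Q_{pre}}} \right)+\left( {{Q_{post}} - {Q_{re}}} \right)={Q_{post}} - {Q_{pre}}=\Delta {Q_{total}}$$

The decomposition is therefore exact by construction; any error in \({Q_{re}}\) shifts runoff between the climate and human components without changing their sum.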

$${\eta _C}=\frac{{\Delta {Q_C}}}{{\Delta {Q_{{\text{total}}}}}}$$
(4)
$${\eta _H}=\frac{{\Delta {Q_H}}}{{\Delta {Q_{{\text{total}}}}}}$$
(5)

where \({\eta _C}\) and \({\eta _H}\) are the percentage contributions of climate change and human activities to the total runoff variations, respectively.
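
Equations (1)–(5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name, the dictionary keys and the input values are hypothetical, and the factor of 100 is an assumption made so that the contribution rates are returned as percentages.

```python
def attribute_runoff_change(q_pre, q_post, q_re):
    """Split a runoff change into climate and human parts (Eqs. 1-5).

    q_pre  : observed mean runoff in the reference period
    q_post : observed mean runoff in the human-impacted period
    q_re   : reconstructed (climate-only) mean runoff in the impacted period
    """
    dq_total = q_post - q_pre            # Eq. (1)
    if dq_total == 0:
        raise ValueError("no runoff change; contribution rates are undefined")
    dq_climate = q_re - q_pre            # Eq. (2)
    dq_human = q_post - q_re             # Eq. (3)
    return {
        "dq_total": dq_total,
        "dq_climate": dq_climate,
        "dq_human": dq_human,
        # Assumption: rates are scaled by 100 to express them in percent.
        "eta_climate_pct": 100.0 * dq_climate / dq_total,   # Eq. (4)
        "eta_human_pct": 100.0 * dq_human / dq_total,       # Eq. (5)
    }


# Hypothetical values in mm, chosen only to illustrate the arithmetic.
example = attribute_runoff_change(q_pre=800.0, q_post=900.0, q_re=820.0)
print(example)
```

With these illustrative inputs, 20 mm of the 100 mm change is assigned to climate and 80 mm to human activities. Note that a rate can exceed 100% or become negative when the two components have opposite signs.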

Resampling to temporal scales

To implement temporal upscaling, this study processed daily meteorological variables (precipitation, potential evapotranspiration, etc.) and runoff records spanning 1970–2019 through a multi-temporal aggregation framework. The original daily resolution data were systematically upscaled to weekly, two-week, monthly, two-month, seasonal (three-month), and annual temporal resolutions. The aggregation protocols governing diverse temporal scales and variables are systematically detailed in Table 1.

Table 1 Aggregation protocols.
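
The temporal upscaling described above can be sketched as follows. Several assumptions are made that the text does not confirm: fluxes such as precipitation and runoff depth are summed while state variables such as temperature are averaged, multi-month bins are aligned to calendar months starting in January, and weekly and two-week bins are omitted for brevity. The function and argument names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate_daily(dates, values, scale, how):
    """Aggregate a daily series to a coarser calendar-aligned scale.

    scale : "month", "two_month", "season" (three months) or "year"
    how   : "sum" for fluxes, "mean" for state variables (assumed protocol)
    Returns a dict keyed by (year, bin index within the year).
    """
    months_per_bin = {"month": 1, "two_month": 2, "season": 3, "year": 12}[scale]
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    bins = defaultdict(list)
    for d, v in zip(dates, values):
        bins[(d.year, (d.month - 1) // months_per_bin)].append(v)
    result = {}
    for key in sorted(bins):
        total = sum(bins[key])
        result[key] = total if how == "sum" else total / len(bins[key])
    return result


# Hypothetical one-year series: 1 mm of precipitation and 17 degrees C every day.
days = [date(1970, 1, 1) + timedelta(days=i) for i in range(365)]
monthly_precip = aggregate_daily(days, [1.0] * 365, "month", "sum")
annual_temp = aggregate_daily(days, [17.0] * 365, "year", "mean")
print(monthly_precip[(1970, 0)], annual_temp[(1970, 0)])
```

Keeping the aggregation rule per variable explicit matters because summing a temperature series or averaging a precipitation series would change the physical meaning of the input.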

Reconstruction methods of runoff

According to the calculation method above, separating the impacts of climate change and human activities requires reconstructing runoff for the human activity-influenced period under meteorological influences only. The Lan River Basin is characterized by hilly and valley-plain terrain and a humid monsoon climate, with an average annual precipitation of approximately 1500 mm. Additionally, precipitation and evapotranspiration exhibit strong seasonality, and there is no stable snow cover. Meanwhile, rapid urbanization and industrialization have further complicated the hydrological processes, which are collectively influenced by climate, topography, and human activities. Therefore, to comprehensively capture these complex mechanisms, runoff regression models can be constructed based on RFRM and SWAT. On the data-driven side, RFRM, as a representative supervised machine learning algorithm, establishes nonlinear mapping relationships between explanatory and response variables on training sets, effectively capturing complex mechanisms within watershed hydrological systems, and predicts response variables from explanatory variables on test sets. RFRM’s theoretical foundation originates from the Bagging (bootstrap aggregating) framework, which improves model performance by averaging many trees fitted to bootstrap samples and thereby reducing variance. On the physics-based side, hydrological models can address the interpretability limitations of machine learning model outputs by providing physics-based frameworks. Among them, SWAT, a semi-distributed watershed hydrological model, uses discretized spatial parameters to simulate continuous time-series hydrology, sediment, and other processes. Moreover, prior applications have already demonstrated SWAT’s effectiveness in the Lan River Basin, confirming its regional suitability4. 
SWAT’s hydrological simulation process is divided into the land phase of the hydrological cycle (i.e., runoff generation and overland flow concentration) and the streamflow phase (i.e., channel flow concentration). The elements constructing this physical framework—and the data input into the model—include DEM, land-use data, soil property data, and meteorological data.

Sensitivity analysis and calibration were performed for RFRM and SWAT using different methods. For the RFRM, out-of-period tests were employed to accommodate the characteristics of hydrological time-series data. For the SWAT model, an integrated approach combining Latin Hypercube (LH) sampling with the One-Factor-at-a-Time (OAT) method was applied39.

The framework for quantifying and analysing the impacts of climate change and anthropogenic activities on runoff is shown in Fig. 2. First, the optimal set of explanatory variables for the RFRM model was selected through Spearman correlation and multicollinearity analyses across various temporal scales. The model was then trained, validated, and used with this optimal variable combination and time scale to reconstruct runoff. For SWAT, DEM and land use/cover data were prepared, and the SWAT model at the optimal temporal scale was selected for runoff reconstruction. Finally, remote sensing products and socioeconomic statistics were integrated to parse anthropogenic impacts on runoff variation.

Fig. 2

Flowchart of the framework.

Results

Temporal differentiation of climatic impacts on runoff

Figure 3 illustrates runoff dynamics and climate change characteristics in the Lan River Basin from 1970 to 2019. The annual basin-scale meteorological factors were calculated with the Thiessen polygon method, which divides the basin into sections according to the meteorological stations and spatially interpolates the annual precipitation, temperature, evapotranspiration and potential evapotranspiration for 1970–2019. Xia et al.3 evaluated the abrupt change points of runoff in the Lan River Basin based on the Mann–Kendall test, Pettitt test, and moving t-test. With reference to their results, the base period and human activity impact periods were divided according to the positions of the abrupt changes as follows: base period (BP): 1970–1987, variation period 1 (VP1): 1988–2008, and variation period 2 (VP2): 2009–2019. Since 1970, the Lan River Basin has exhibited an overall increasing trend in runoff, with annual averages of approximately 801 mm, 885 mm and 1010 mm during the BP, VP1 and VP2, respectively (Fig. 3a). Extreme runoff events were recorded in specific years, including 1401 mm (1995), 1407 mm (2010), 1356 mm (2012) and 1347 mm (2015), with heightened frequency during VP2. The rising runoff trend is partially attributed to increased precipitation, which averaged 1,565 mm, 1,588 mm and 1,763 mm annually during BP, VP1, and VP2, respectively (Fig. 3b). Concurrently, annual evaporation increased from 896 mm (BP) to 913 mm (VP1) and 1,010 mm (VP2), closely tracking temperature rises from 17.0 °C (BP) to 17.7 °C (VP1) and 18.2 °C (VP2) (Figs. 2d and 3c). Air pressure and relative humidity exhibited variable declines, while wind speed fluctuated across periods (Fig. 3g).

Fig. 3

Analysis of runoff and meteorological data across BP, VP1 and VP2 Periods.

Explanatory variable selection in RFRM

The formation of river runoff begins with the precipitation process. As the fundamental source of runoff, the amount and spatiotemporal distribution of precipitation directly determine the magnitude and characteristics of the runoff process. Subsequently, during the storage and infiltration phase, precipitation is intercepted by vegetation, infiltrates into the soil, fills depressions, and is consumed by evapotranspiration. When rainfall intensity exceeds the infiltration capacity, overland flow begins to form, initiating the hillslope runoff process, which gradually extends to the entire watershed. Consequently, climate is the most fundamental factor influencing river runoff, with precipitation and evaporation directly controlling the formation and variation of runoff, while temperature, humidity, wind speed, and other factors exert indirect effects through the former two. Based on meteorological station data, this study selected precipitation, evaporation, temperature, air pressure, relative humidity, and wind speed as the initial explanatory variables. According to the divided periods, the numbers of data groups for BP, VP1, VP2, and the whole period at the annual scale are 18, 21, 11, and 50, respectively, which are far fewer than those at other temporal scales. To eliminate the bias that these small annual sample sizes would introduce into comparisons across scales, a bootstrap procedure was used: at each temporal scale, 18, 21, 11, and 50 groups of data were sampled from BP, VP1, VP2, and the whole period, respectively; sampling with replacement was repeated 1000 times, and the Spearman correlation coefficients of the draws were averaged. Figures 4 and 5 show the Spearman correlations between meteorological elements and runoff at multiple temporal scales in each period and the whole period. Among them, the correlation between precipitation and runoff is the strongest, with a maximum of 0.90 and a median of 0.85. 
It is worth noting that the correlation between precipitation and runoff shows a significant scale dependence: its correlation coefficient increases from 0.29 at the daily scale to 0.90 at the seasonal scale, which is in line with the cumulative effect of watershed confluence time. Wind speed generally shows poor correlation with runoff, so it was excluded first. The poor correlation between evaporation and runoff is in line with the characteristics of watersheds in humid regions29,40. In addition to precipitation, relative humidity, temperature, and air pressure also exhibited good correlations with runoff. However, beyond correlation, multicollinearity between variables must be considered. In multiple regression, multicollinearity among variables can affect coefficients and may impact model accuracy. Therefore, the variance inflation factor (VIF) of each element was calculated to identify collinearity between elements41,42. Relative humidity, air pressure, evaporation, temperature, and precipitation were selected to form an independent dataset termed set 1, which was subsequently used for multicollinearity analysis. The analysis workflow is shown in Table 2. In the analysis of set 1, regardless of the temporal scale, the VIF values of relative humidity and air pressure were significantly higher than those of the remaining variables, so these two variables were removed (remd), thereby forming set 2. As the temporal scale increased, the VIF values of variables increased and the collinearity between variables intensified. Commonly used thresholds for judging the strength of collinearity are VIF = 5 (ref. 42) and VIF = 10 (ref. 43). As shown in Table 2, VIF values of variables in set 3 and set 4 were all below 5 across all temporal scales, while those in set 2 were below 10 at the daily scale. 
Thus, three explanatory variable combinations and one single explanatory variable were selected for model input: precipitation–evaporation (P–E), precipitation–temperature (P–T), precipitation–evaporation–temperature (P–E–T), and precipitation (P).
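
The bootstrap-averaged Spearman correlation described in this section can be sketched with the Python standard library alone. This is an illustrative reading of the procedure, not the authors' implementation: all function names are hypothetical, ties receive their average rank, and draws in which one variable is constant are skipped because the coefficient is undefined for them.

```python
import random


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_spearman(x, y, sample_size, n_draws=1000, seed=0):
    """Mean Spearman coefficient over draws of sample_size pairs with replacement."""
    rng = random.Random(seed)
    coefficients = []
    for _ in range(n_draws):
        idx = [rng.randrange(len(x)) for _ in range(sample_size)]
        r = spearman([x[i] for i in idx], [y[i] for i in idx])
        if r == r:  # skip NaN from degenerate draws
            coefficients.append(r)
    return sum(coefficients) / len(coefficients) if coefficients else float("nan")


# Hypothetical monthly precipitation and runoff (mm); 18 pairs per draw as for BP.
precip = [80, 120, 200, 150, 60, 90, 210, 170, 40, 110, 95, 130]
runoff = [30, 55, 120, 70, 20, 35, 140, 90, 15, 50, 45, 60]
print(round(bootstrap_spearman(precip, runoff, sample_size=18), 3))
```

Drawing the same number of pairs at every temporal scale is what makes the averaged coefficients comparable between the annual scale and the much longer finer-scale series.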

Fig. 4

Spearman correlation coefficient heatmap for each period.

Fig. 5

Spearman correlation coefficient heatmap for the full period.

Table 2 Multicollinearity analysis results across different Temporal scales.
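
For readers who wish to reproduce the multicollinearity screening, the VIF of each variable can be obtained as the corresponding diagonal element of the inverse of the correlation matrix of the explanatory variables, which equals 1/(1 − R²) from regressing that variable on the others. The sketch below uses only the standard library; the function names and the input values are hypothetical and the code is not taken from the original study.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of equally long columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    if any(norm == 0 for norm in norms):
        raise ValueError("a constant column has no defined correlation")
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance inflation factor of each column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical monthly precipitation (mm), evaporation (mm) and temperature (deg C).
p = [80, 120, 200, 150, 60, 90]
e = [70, 75, 95, 85, 60, 72]
t = [10, 14, 24, 20, 8, 12]
for name, value in zip(["P", "E", "T"], vif([p, e, t])):
    print(name, round(value, 2), "below 5" if value < 5 else "5 or above")
```

Repeating the calculation after dropping the variable with the largest VIF mirrors the stepwise reduction from set 1 to the later sets.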

RFRM-based simulation

To elucidate the response mechanisms between meteorological factors and runoff during the baseline period and optimize model structure, this study constructed a runoff simulation model using the Random Forest algorithm. The modelling process involved three stages. First, the model was tuned, trained, and validated using the training set (data from Lanxi Station, 1972–1982) and testing set (data from Lanxi Station, 1983–1987). Second, after inputting the training set, simulated runoff values were compared with observed values, followed by hyperparameter optimization via grid search. The relationship between model accuracy and temporal scales is detailed in Tables 3–6. Finally, the refined model was retrained on the training set to reconstruct runoff values from 1988 to 2019. Models with a Nash–Sutcliffe efficiency coefficient (NSE) and coefficient of determination (R²) > 0.75, alongside percent bias (PBIAS) < 10%, were deemed optimal44. As demonstrated by the evaluation metrics in Tables 3–6, the precipitation-evaporation model (P-E model) at the monthly and two-month scales (P-E-Month and P-E-Two-Month), the precipitation-temperature model (P-T model) at the monthly and two-month scales (P-T-Month and P-T-Two-Month), the precipitation-evaporation-temperature model (P-E-T model) at the monthly scale (P-E-T-Month), and the precipitation model (P model) at the two-month scale (P-Two-Month) exhibited optimal performance. Overall, models based on combinations of variables generally have higher accuracy than those based on a single variable. Meanwhile, among the seven temporal scales ranging from daily to annual, models at the monthly and two-month scales perform best, which supports including the two-month scale in hydrological model selection. 
Furthermore, models utilizing temporal scales larger than monthly generally outperformed finer-scale counterparts in metric evaluations. However, while seasonal-scale models achieved superior performance on training datasets, they underperformed monthly- and two-month scale models during testing, indicating stronger robustness in the latter two temporal resolutions.
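
The acceptance criteria used above (NSE and R² above 0.75, PBIAS within 10%) can be computed as in the following sketch. Two points are assumptions rather than statements from the source: R² is taken as the squared Pearson correlation between observed and simulated runoff, and PBIAS is defined so that positive values indicate overestimation, with the 10% limit applied to its absolute value. Function names and data are hypothetical.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def r_squared(obs, sim):
    """Squared Pearson correlation between observations and simulations."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


def pbias(obs, sim):
    """Percent bias; positive means overestimation (assumed sign convention)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def is_acceptable(obs, sim, threshold=0.75, max_abs_pbias=10.0):
    return (nse(obs, sim) > threshold
            and r_squared(obs, sim) > threshold
            and abs(pbias(obs, sim)) < max_abs_pbias)


# Hypothetical monthly runoff depths in mm.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [11.0, 21.0, 29.0, 41.0]
print(nse(observed, simulated), r_squared(observed, simulated),
      pbias(observed, simulated), is_acceptable(observed, simulated))
```

Evaluating the three metrics separately on the training and testing sets, as in Tables 3–6, is what exposes models that fit the training period well but generalize poorly.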

As detailed in Tables 7 and 8, the model-reconstructed runoff was used to quantify the contribution rates of climatic and anthropogenic drivers. All model simulation results consistently indicate that during VP1, human activities dominated runoff variation, with contribution rates of 77.67–93.96%. In VP2, the human contribution rate decreased to 46.91–66.10%. Notably, although the relative contribution rate decreased, the absolute runoff change caused by human activities increased significantly: the monthly scale runoff change rose from 5.39–6.15 mm in VP1 to 8.58–11.79 mm in VP2 (an increase of about 90%), while the two-month scale change increased from 10.93–13.06 mm to 20.84–24.18 mm (an increase of about 90%), indicating an enhanced influence of human activities on runoff variation.

Table 3 Accuracy of P-E models across Temporal scales on testing (training) sets.
Table 4 Accuracy of P-T models across Temporal scales on testing (training) sets.
Table 5 Accuracy of P-E-T models across Temporal scales on testing (training) sets.
Table 6 Accuracy of P models across Temporal scales on testing (training) sets.
Table 7 Reconstruction results of the RFRM during the VP1 period.
Table 8 Reconstruction results of the RFRM during the VP2 period.

SWAT-based simulation

Based on the DEM of the Lan River Basin, the SWAT model divided the basin into 50 sub-basins and further generated 319 hydrological response units (HRUs) by combining soil type, land use, and slope classifications (5%, 10%, and 10%, respectively). Meteorological driving data integrated daily observations from 1970 to 2019 at five meteorological stations, covering precipitation, evaporation, wind speed, relative humidity, and sunshine duration. During calibration, parameters were adjusted using simulated and observed runoff data from three hydrological stations (Jinhua, Quzhou, and Lanxi). We set 1970–1971 as the warm-up period for the SWAT model, 1972–1982 as the calibration period, and 1983–1987 as the validation period. Table 9 lists the sensitive parameters identified by the sensitivity analysis, which were then used in model calibration. The validated parameters were then applied to reconstruct runoff during the human-impacted periods. Model performance metrics are shown in Table 10. Daily and annual scale results were excluded due to R² < 0.5. The monthly scale SWAT model (SWAT-Month) performed well. During calibration, Lanxi Station achieved a relative bias (RBIAS) of −7.32%, with R² = 0.85 and NSE = 0.84, while Quzhou Station had R² = 0.85 and NSE = 0.83. During validation, Lanxi Station showed R² = 0.87 and RBIAS = −3.2%, and Jinhua Station had R² = 0.86 and RBIAS = −0.87%. Overall, the SWAT model demonstrated satisfactory accuracy in simulating runoff for the Lan River Basin. Table 11 presents the contribution rates of climate change and anthropogenic activities based on SWAT reconstruction results. The reconstruction results of the SWAT model are consistent with the RFRM: during VP1, human activities dominated the contribution rate (78.10%); in VP2, their contribution rate decreased to 67.30%, but the absolute runoff change caused by human activities increased significantly from 5.42 mm to 12.31 mm (126.9% increase). 
This result similarly indicates that in VP2, the influence of human activities on runoff increase shows an enhanced trend.

Table 9 Parameters considered for calibration and validation of the SWAT model.
Table 10 Performance metrics of the SWAT-Month model during calibration and validation periods across three hydrological stations.
Table 11 Reconstruction results of the SWAT during the VP1 and VP2 periods.

Discussions

Comparative analysis of machine learning and hydrological models across multiple Temporal scales

The results in the RFRM-based simulation and SWAT-based simulation sections demonstrate that the monthly scale exhibits optimal performance for most models (with the exception of the P model, which performs best at the bimonthly scale). This indicates that the monthly scale most effectively captures the relationship between meteorological factors and runoff. Notably, the two-month scale model also holds significant interpretative value, with its mechanisms explained as follows: (1) In the P model, the bimonthly temporal window may smooth short-term evaporation and temperature fluctuations, enabling precipitation alone to adequately represent the water balance during this period; (2) The bimonthly scale more comprehensively integrates time-lagged effects in hydrological processes (e.g., soil water storage/release, groundwater recharge/discharge) and effectively captures such delayed responses, whereas the monthly scale model emphasizes immediate hydrological reactions. Further analysis reveals that model accuracy follows a unimodal distribution across temporal scales: accuracy progressively improves from daily to monthly/bimonthly scales, then declines at coarser scales. The lower accuracy at the annual scale may stem from limited data points, while the lowest accuracy at the daily scale can be attributed to: (1) Pronounced high-frequency variability in daily-scale meteorological data (e.g., precipitation), which complicates pattern recognition by models; (2) The inherent time lag in runoff generation, suggesting that incorporating lagged input variables (e.g., meteorological data from the preceding day) may enhance performance in fine-scale modelling45.

Figure 6 comprehensively compares the performance of seven well-performing models (P-E-Month, P-T-Month, P-E-T-Month, P-E-Two-Month, P-T-Two-Month, P-Two-Month, and SWAT-Month) in simulating runoff. The upper panels (a–g) show scatterplots of simulated versus observed values, while the lower panels (h–n) display corresponding boxplots. Scatter plot analysis indicates a favorable distribution of monthly scale model data points, with clusters tightly aligned along the 1:1 line. To systematically evaluate the predictive performance differences of the models across different flow levels, this study adopts a partition-based analysis of the validation period data. The statistical properties of the data are: mean (\(\mu\)) = 69.42 mm, standard deviation (\(\sigma\)) = 120.9 mm. The approach first divides the runoff series into three independent hydrological regime zones, low (\(x<\mu\)), medium (\(\mu \leqslant x \leqslant \mu +2\sigma\)) and high (\(x>\mu +2\sigma\)), based on the relative spread of the flows from the mean46, then calculates the RBIAS within each partition. This reveals model simulation capabilities in specific flow intervals. Such hydrological regime-based layered diagnostics can effectively identify systematic biases masked in holistic evaluations. Based on runoff depth classification results (Table 12), all RFRM variants demonstrate overestimation in low-value (1.04–19.94%) and high-value (3.56–18.66%) zones but underestimation in medium-value zones (−8.31 to −2.88%). Conversely, the SWAT model shows underestimation in high-value zones (−3.20%), consistent with findings from another study47. This bidirectional bias suggests systematic flaws in its baseflow simulation mechanism and stormflow response processes. Under different runoff depth intervals, each model demonstrates varying performance capabilities. 
In the low-, medium-, high-, and total-value zones, the best-performing models are the P-E-Month, P-T-Month, P-E-T-Month, and P-T-Month models, respectively. Taken together, the P-T-Month model performs well across the low-, medium-, high-, and total-value zones, with relative bias (RBIAS) values of 3.26%, −3.65%, 3.30%, and 0.10%, respectively. The absolute global deviation remains below 5%. The P-E-T-Month model overestimates by 6.73% in low-value zones and underestimates by 8.31% in medium-value zones, indicating that introducing temperature factors exacerbates simulation uncertainty for low-medium runoff under global warming. As the worst-performing monthly scale model, the P-Month model exhibits significant overestimation in low-value zones (19.94%), high-value zones (18.66%), and total-value zones (14.46%), highlighting differential impacts of driver combinations on model performance. Collectively, these results demonstrate the robust adaptability of the P-T-Month model across all runoff depth intervals. Because the bimonthly models have few data points, direct visual comparison with monthly scale models is difficult, which necessitates supplementary analysis via box plots. The boxplots indicate that, compared to the three two-month scale models, the three monthly scale models better simulate runoff. Specifically, the three RFRM models, P-E-Month (Fig. 6h), P-T-Month (Fig. 6i), and P-E-T-Month (Fig. 6j), closely matched observed runoff depths in terms of medians, distribution ranges, and outlier patterns, demonstrating both accurate trends and statistical fidelity. Among these, the P-T-Month model exhibits the closest match between simulated and observed values for both median and interquartile range. Therefore, whether in scatter plots or box plots, the results indicate that the P-T-Month model demonstrates the best performance. 
However, the RFRM models exhibit slightly lower box positions in the simulated data, with particularly poor capture of extreme high values. Notably, although the SWAT model similarly showed lower box positions, only the SWAT-Month model captured the statistically significant response between meteorological drivers and high runoff depth events (around 275 mm), demonstrating its distinct advantage over RFRM in simulating extreme hydrological conditions. At the bimonthly scale, while the models captured median values reasonably well, significant positional shifts in their boxplots were observed. Among the bimonthly models, P-T-Two-Month (Fig. 6l) performed relatively better. P-E-Two-Month (Fig. 6k) exhibited marked deviation, particularly in its upper quartiles, where simulated values were systematically underestimated. The P-Two-Month model (Fig. 6m) displayed an excessively large box range with dispersed data, failing to replicate the distribution characteristics of the observed data. In summary, the P-E-Month and P-T-Month models demonstrated comprehensively superior performance, outperforming their counterparts in predictive accuracy, extreme value simulation, and distribution matching. This underscores how jointly incorporating precipitation with either evapotranspiration or temperature as input variables significantly enhances runoff process representation. While the physics-based SWAT-Month model performed below the top data-driven models, it maintained robust adaptability and stability, establishing its value as a benchmark reference model that provides critical interpretability for its data-driven counterparts.
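
The partition-based RBIAS diagnostic described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes RBIAS is defined as 100 × Σ(sim − obs) / Σ(obs), that the zone thresholds come from the observed series, and that the sample standard deviation is used. The function names `rbias` and `rbias_by_zone` are hypothetical.

```python
import numpy as np


def rbias(sim, obs):
    """Relative bias in percent (assumed definition): 100 * sum(sim - obs) / sum(obs)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 100.0 * float((sim - obs).sum()) / float(obs.sum())


def rbias_by_zone(sim, obs):
    """RBIAS for the low, medium, high and total zones of the observed series.

    Zones follow the text: low (x < mu), medium (mu <= x <= mu + 2*sigma),
    high (x > mu + 2*sigma), with mu and sigma taken from the observations.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    mu = obs.mean()
    sigma = obs.std(ddof=1)  # assumption: sample standard deviation
    masks = {
        "low": obs < mu,
        "medium": (obs >= mu) & (obs <= mu + 2.0 * sigma),
        "high": obs > mu + 2.0 * sigma,
        "total": np.ones(obs.shape, dtype=bool),
    }
    # An empty zone has no defined bias, so report NaN for it.
    return {
        zone: rbias(sim[mask], obs[mask]) if mask.any() else float("nan")
        for zone, mask in masks.items()
    }
```

Calling `rbias_by_zone(simulated, observed)` on the validation-period series returns one RBIAS value per zone, so opposite-signed biases that cancel in the total become visible.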

Fig. 6

Simulation performance of different models during the validation period.

However, neither the machine learning nor the hydrological models are perfect, and biases during the calibration (1972–1982) and validation (1983–1987) periods may affect the final results. Uncertainties in model performance remain prevalent, and different scholars investigating the same basin frequently arrive at divergent contribution rate estimates48,49. These uncertainties are difficult to quantify, particularly for black-box machine learning models. Overall, the reconstruction results differ only slightly among the RFRM models: the contribution rates of climate change and human activities within VP1 and VP2 differ by about 10%, which is within an acceptable range. Both RFRM and SWAT agree on the incremental impact of climate change and human activities on runoff variation: the impact of human activities in VP2 is significantly higher than in VP1, indicating that human activities were more frequent in VP2 than in VP1 and promoted an increase in runoff.

Table 12 RBIAS statistics of monthly scale models across runoff depth intervals.

Systematic evaluation of human-induced runoff variations across distinct periods

Huang et al.50 found that built-up areas derived from Chinese census data underestimate urbanization levels, and that remote sensing datasets can improve the accuracy of urban land area and urbanization degree estimates. Given the spatial limitations of traditional census data in urbanization monitoring, this study combines remote sensing datasets and government-provided statistical data to discuss the real-world causes of the contribution rate results, including urbanization progress, stages of human activity development, water conservancy facility construction, and water resource management policies. The adopted remote sensing datasets include NDVI and ISA37, derived from daytime satellite imagery, and NTL data38, derived from nighttime satellite imagery. A vegetation index is a simple, effective and empirical measure of surface vegetation51, so the NDVI was used to characterize vegetation change in the Lan River Basin. Vegetation dynamics significantly alter runoff regimes by modulating hydrological processes (e.g., transpiration and interception evaporation), thereby regulating the magnitude of available watershed runoff. Impervious surfaces are composed of materials that impede natural water infiltration, such as roads, parking lots, rooftops, and reservoirs, and serve as a primary indicator of urbanization52. The impacts of ISA on runoff manifest in two ways: first, impervious surfaces prevent water infiltration, so that more precipitation is converted directly into surface runoff, increasing total runoff; second, the expansion of impervious areas reduces infiltration capacity and urban water retention potential, thereby elevating runoff coefficients, meaning significantly higher surface runoff under equivalent rainfall conditions33. This effect is also pronounced at the seasonal scale34. 
Nighttime light data not only reflect the extent and boundaries of impervious surfaces and monitor urban expansion53, but are also used to estimate regional population density, distribution, and trends54. The applications of NTL in population, urbanization, and economics demonstrate its close association with anthropogenic activities. The impacts of NTL on runoff are twofold: first, NTL increases surface temperature through anthropogenic heat release, enhancing nighttime urban heat island effects and altering regional water cycles55; second, increased NTL intensity and expanded lit areas reflect intensified anthropogenic activities, elevated water consumption, and exacerbated reductions in infiltration, groundwater recharge, and post-precipitation evaporation56. Analysing long-term NTL data helps identify developmental stages, spatiotemporal patterns of anthropogenic activities, and their hydrological impacts. Figures 7 and 10(a) show the changes in NDVI within the Lan River Basin from 2000 to 2018. The annual maximum NDVI showed a fluctuating upward trend (0.72–0.77) during 2000–2018, with minimum and maximum values recorded in 2005 and 2017, respectively, indicating sustained improvement in vegetation coverage. Although extensive studies suggest that vegetation growth typically reduces runoff19, this study observed increased measured runoff depths, creating a paradox. This contradiction is attributed to three factors: (1) precise NDVI inversion relies on high-resolution remote sensing imagery, yet the weak NDVI variation (p < 0.05) in this basin may be subject to measurement errors due to sensor limitations; (2) in semi-humid climates with dense vegetation, runoff is significantly less sensitive to NDVI changes, in contrast to the high sensitivity in arid, sparsely vegetated regions57; (3) in urbanized plains, NDVI has less impact on runoff than other human activities. 
Thus, NDVI changes are not the primary driver of runoff variation in the Lan River Basin, necessitating analysis of direct anthropogenic influences such as urbanization and water resource management. Figures 8 and 10(b) show the expansion of ISA in the Lan River Basin from 2000 to 2018. Specifically, the area of high/medium imperviousness (imperviousness > 30%) has increased continuously since 2000, with rapid growth during 2000–2005 (VP1) and slower increases during 2006–2018 (VP2), reaching a total growth of approximately 28% by the end of 2018 compared to 2005. Figures 9 and 10(c) show the changes in NTL from 1992 to 2019. Piecewise linear regression revealed an abrupt change in the annual mean NTL in 2009, with growth rates of 0.42 per year during 1992–2008 (VP1) and 0.79 per year during 2009–2019 (VP2). By 2019, the mean NTL (12.44) had increased by approximately 146% compared to 2008 (5.04). The NTL trend analysis (Figs. 9 and 10(c)) shows gradual expansion of nighttime light intensity and spatial coverage in the Lan River Basin from 1992 to 2019 (VP1 and VP2), with more pronounced expansion in the north-eastern region (near Lanxi Station, the basin’s outlet), indicating increasingly frequent anthropogenic activities in the Lan River Basin, particularly in the northeast. Synthesizing the results from both remote sensing products, the growth of ISA and NTL collectively explains the change in the impact of human activities on runoff from VP1 to VP2, as derived from reconstructed and observed data at Lanxi Station.
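
The piecewise linear regression used to locate the change in the NTL series can be illustrated with a simple two-segment search. This is a sketch under stated assumptions rather than the authors' procedure: it fits an independent least-squares line to each side of every candidate break year and keeps the split with the smallest total squared error. The names `fit_line`, `two_segment_breakpoint` and the `min_len` parameter are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares line through (x, y); returns slope and sum of squared errors."""
    xc = x - x.mean()  # centre x for numerical stability; the slope is unchanged
    slope, intercept = np.polyfit(xc, y, 1)
    residuals = y - (slope * xc + intercept)
    return float(slope), float((residuals ** 2).sum())


def two_segment_breakpoint(years, values, min_len=3):
    """Exhaustive search for the break year that minimises the total SSE.

    The returned break year is the first year of the second segment.
    Each segment must contain at least `min_len` points.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    best = None
    for k in range(min_len, len(years) - min_len + 1):
        slope_before, sse_before = fit_line(years[:k], values[:k])
        slope_after, sse_after = fit_line(years[k:], values[k:])
        total = sse_before + sse_after
        if best is None or total < best["sse"]:
            best = {
                "break_year": float(years[k]),
                "slope_before": slope_before,
                "slope_after": slope_after,
                "sse": total,
            }
    return best
```

Applied to an annual mean series, `two_segment_breakpoint(years, values)` returns the candidate break year together with the growth rate (slope per year) on each side, which are the quantities reported in the text.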

Fig. 7

Spatiotemporal changes in NDVI across the Lan River Basin, Jinhua City, and Quzhou City (2000–2018). (Fig. 7 was generated by the manuscript author using ArcGIS 10.2 software, https://desktop.arcgis.com/.)

Fig. 8

Spatiotemporal changes in ISA across the Lan River Basin, Jinhua City, and Quzhou City (2000–2018). (Fig. 8 was generated by the manuscript author using ArcGIS 10.2 software, https://desktop.arcgis.com/.)

Fig. 9

Spatiotemporal changes in NTL across the Lan River Basin, Jinhua City, and Quzhou City (1992–2019). (Fig. 9 was generated by the manuscript author using ArcGIS 10.2 software, https://desktop.arcgis.com/.)

Fig. 10

Temporal variations of anthropogenic activity indicators (1970–2019).

Table 13 displays statistical data provided by the governments of Jinhua and Quzhou cities. During VP2 (2010–2019), the growth rates of gross domestic product (GDP) in Jinhua and Quzhou reached 168% and 171%, respectively, while those of per capita GDP were 151% and 162%, and those of population were 6.6% and 3.5%. These findings demonstrate rapid economic development and frequent anthropogenic activities in the two cities within the basin during this period. Over the same period, the growth rates of total water supply in Jinhua and Quzhou were 86% and −23%, respectively. Specifically, the growth rates of domestic and industrial water use in Jinhua were 77% and −93%, respectively. Globally, annual human water withdrawals account for 8.35% of total runoff, of which domestic, industrial, and irrigation withdrawals constitute 0.84%, 1.69% and 5.57% of total runoff, respectively58. Although population growth increased domestic water consumption, industrial water use in Jinhua decreased by approximately 2 billion m³ during VP2, while total water supply and domestic water use increased by only 196 million m³ and 63 million m³, respectively. Furthermore, domestic and industrial water use is characterized by a pattern of withdrawal and subsequent return flow, resulting in a low net water consumption coefficient. Although their influence on total basin runoff remains limited at monthly or longer time scales, such activities can markedly amplify low-flow signals over short-term intervals (particularly during dry seasons). In contrast to domestic and industrial water use, agricultural irrigation is characterized by significantly higher consumptive water use. It directly reduces surface runoff through field evapotranspiration and infiltration, while simultaneously altering seasonal water distribution via the “infiltration-groundwater recharge-delayed discharge” process. 
At the seasonal scale, this typically results in water consumption that is higher in summer and lower in winter. Jinhua’s crop planting area declined sharply around 2000, with the planting areas in Jinhua and Quzhou at the end of VP2 being 78% and 91% of those at the end of VP1, respectively. Reductions in planting area indicate lower demand for agricultural irrigation water. In addition to reduced planting area, Quzhou’s second and third major agricultural census bulletins show that by the end of 2006, the city’s cultivated areas with sprinkler, drip, and sub-surface irrigation facilities were 0.4 thousand hectares and 0.3 thousand hectares, respectively. By the end of 2016, these areas had increased to 4.7 thousand hectares, indicating the development of agricultural water-saving technologies in Quzhou between VP1 and VP2. This progress can be attributed to Quzhou’s scientific and technological advancements, as well as a series of water-saving policies and water resource management regulations issued by Zhejiang Province and Quzhou City (e.g., Zhejiang’s “Five-Water Co-Governance” policy, the Quzhou City Water Resource Management Measures, Quzhou Government Office Document [2009] No. 50). Meanwhile, Jinhua also promulgated a series of water-saving policies during VP2, such as the Opinions on Implementing the Strictest Water Resource Management System and Comprehensively Promoting the Construction of a Water-Saving Society (Jinhua Government Document [2013] No. 67). Regarding reservoir regulation, when the regulation time scale aligns with the research scale (e.g., annual regulation analysed at an annual scale), the total annual runoff remains largely unchanged apart from losses due to evaporation and seepage from the reservoir. 
The primary impact lies in the intra-annual redistribution of streamflow, making these effects difficult to detect at annual or longer scales but clearly observable as reduced peak flows at finer temporal resolutions such as monthly, weekly, or daily scales. Because human activities are complex, statistical census data are used here as supporting evidence of the intensity of anthropogenic influences, rather than as direct causal evidence for changes in streamflow. In summary, remote sensing products and statistical census data show that anthropogenic activities were more frequent in VP2 than in VP1, and that their tendency to promote runoff increase was more evident, which is consistent with and lends empirical support to the reconstructed results. Future studies should seek to quantify the contributions of different anthropogenic activities to streamflow processes and their variations across multiple temporal scales.

Table 13 Interannual variations of anthropogenic activity indicators in the Lan River Basin.

Systematic summary of the research methodology framework

Machine learning models for hydrological simulation typically involve three primary sources of uncertainty: (1) data uncertainty due to poor data quality, (2) variable uncertainty from suboptimal selection of explanatory variables, and (3) model uncertainty arising from failure to capture physical processes. Among these, variable uncertainty originates from two opposing risks: too many explanatory variables may trigger the curse of dimensionality, preventing identification of key features, whereas too few reduce model generalization capability41. However, many studies overlook the selection of explanatory variables. This study established a systematic methodological framework (variable screening, model comparison, and multi-source validation) to quantify climatic and anthropogenic impacts on runoff. Its value extends beyond the empirical application in the Lan River Basin, and it can serve as a reference for similar research. For variable selection, the framework integrated Bootstrap resampling with statistical testing to address data scarcity (e.g., sparse annual-scale data) and multicollinearity. By implementing 1000 Bootstrap repetitions matched to annual-scale sample sizes, it reduced small-sample fluctuations in the correlation analysis. Combined with VIF-based elimination of highly collinear variables (e.g., relative humidity and atmospheric pressure), the final variable combinations (P-T, P-E, etc.) established a robust foundation for high-precision simulations across temporal scales. This screening logic overcame limitations of traditional empirical approaches, particularly in basins with uneven monitoring periods. To address model uncertainty, the framework achieved methodological complementarity through comparative analysis of data-driven and physics-based models coupled with multi-scale cross-validation. 
The RFRM captured nonlinear relationships between meteorological variables and runoff59, while the SWAT model compensated for the black-box nature of machine learning with explicit physical processes (e.g., runoff generation and concentration mechanisms). The five model types were trained and validated synchronously across seven temporal scales (from daily to annual) using out-of-period tests. Model accuracy peaked at the monthly and bimonthly scales, underscoring the critical influence of temporal scale selection. The strong performance of the two-month scale models provides new insights for subsequent research. For model evaluation, we calculated runoff depth-partitioned RBIAS for the top-performing monthly-scale models, revealing systematic biases across hydrological regimes. This informs the future development of tiered machine learning models based on flow regimes (low/medium/high flow). This multi-model, multi-scale design mitigated biases inherent in single-model or single-scale approaches, demonstrating strong adaptability for basins experiencing intense climatic fluctuations and complex human disturbances. Furthermore, the framework moved beyond the traditional reliance on land use change in anthropogenic impact analysis by integrating long-term hydro-meteorological observations, multi-source remote sensing products (ISA, NTL, NDVI), and socioeconomic statistics (GDP, population, policies). This multi-source integration is particularly suitable for basins with compound anthropogenic pressures (urbanization, agriculture, hydraulic engineering). The framework’s versatility manifests in three dimensions: (1) the variable screening module accommodates basins with uneven data quality; (2) the model comparison strategy balances accuracy and physical interpretability; (3) the multi-source validation method supports regions with diverse human activities. 
Even in divergent climates (e.g., arid/semi-arid basins) or data-scarce contexts (e.g., lacking high-frequency monitoring), effective adaptation remains achievable through adjustable VIF thresholds and optimized temporal scale combinations (e.g., emphasizing quarterly-annual scales). This framework not only provides scientific support for water resource management in the Lan River Basin but also establishes a standardized, refined methodological foundation for attribution studies in diverse watersheds.
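
The variable-screening step described in this section (Bootstrap-resampled Spearman correlation followed by VIF-based collinearity checks) can be sketched as below. This is an illustrative outline under assumptions, not the study's implementation: the function names, the `size` and `seed` parameters, and the use of a 95% percentile interval are hypothetical choices. The VIF is computed from the diagonal of the inverse correlation matrix, which is equivalent to 1 / (1 − R²) from regressing each predictor on the others.

```python
import numpy as np
from scipy.stats import spearmanr


def bootstrap_spearman(x, y, n_boot=1000, size=None, seed=0):
    """Mean Spearman rho and 95% percentile interval over bootstrap resamples.

    `size` is the number of pairs drawn per resample (defaults to len(x));
    it could be set to the annual-scale sample size, as one reading of the text.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    size = n if size is None else size
    rng = np.random.default_rng(seed)
    rhos = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=size)  # resample pairs with replacement
        rho, _ = spearmanr(x[idx], y[idx])
        rhos[b] = rho
    rhos = rhos[~np.isnan(rhos)]  # drop degenerate (constant) resamples
    lower, upper = np.percentile(rhos, [2.5, 97.5])
    return float(rhos.mean()), (float(lower), float(upper))


def vif(X):
    """Variance inflation factor of each column of X (rows are samples)."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))
```

In use, each candidate meteorological variable would be passed with runoff to `bootstrap_spearman`, and the retained predictors stacked column-wise and passed to `vif`; predictors whose VIF exceeds the chosen threshold (VIF > 10 in this study) would then be candidates for removal.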

Conclusions

This study developed a systematic methodological framework to quantify and analyse the contribution rates of climate change and anthropogenic activities to runoff variation, integrating machine learning models, hydrological models, and multi-source data fusion across multi-temporal scales. The design logic and principal findings are summarized below:

(1) To screen explanatory variables for the RFRM inputs, this study conducted correlation and multicollinearity analyses between meteorological factors and runoff across multiple temporal scales. To mitigate interference from small-sample fluctuations in correlation analysis, Bootstrap resampling (1000 iterations) was employed to compute Spearman correlations between meteorological variables and runoff. Subsequent VIF analysis eliminated multicollinearity, thus identifying optimal variable combinations with cross-scale robustness.

(2) Cross-validation across seven temporal scales (daily to annual) revealed peak model accuracy at the monthly and bimonthly scales. Subsequent runoff depth-partitioned RBIAS calculations for monthly-scale models demonstrated systematic biases under different hydrological regimes. This finding lays the foundation for developing tiered machine learning models based on flow magnitude classification (low/medium/high flow regimes).

(3) The framework integrated remote sensing products (ISA, NTL, NDVI) and socioeconomic statistics (GDP, water use, policy documents, etc.) to parse anthropogenic impacts on runoff variation, overcoming the limitations of conventional analyses that relied solely on land use change as a single perspective.

This framework was applied to the Lan River Basin. Variable screening results indicated that precipitation exhibited the most significant association with runoff (maximum Spearman coefficient of 0.90 at the seasonal scale). Although relative humidity and air pressure also correlated with runoff, they were excluded due to high multicollinearity (VIF > 10) with other meteorological factors. Models using combined variable inputs generally outperformed single-variable models in accuracy. Among the seven temporal scales from daily to annual, the monthly and bimonthly scale models performed best. Specifically, the P-T-Month model exhibited the most balanced performance, while the SWAT-Month model, though slightly underperforming P-T-Month, demonstrated unique advantages in simulating extreme high-flow events. Overall, RFRM and SWAT yielded similar runoff reconstructions. Multi-source data were integrated to analyse the results. ISA, NTL, NDVI, and socioeconomic statistical data reveal that human activities exert dual impacts on runoff. These dual effects highlight the necessity of analysing human impacts through quantitative assessment. These conclusions provide direct references for studies of watersheds with characteristics similar to those of the Lan River Basin (e.g., humid and semi-humid areas, urban watersheds, etc.).