Introduction

Lakes contain 87% of global liquid freshwater1 and act as regulators of the carbon cycle and climate2. They provide important ecosystem services and are sentinels of climate change3. Previous studies of the impacts of climate change on lakes have focused mainly on the loss of lake ice cover4,5,6, accelerated lake evaporation7, and alteration of lake mixing regimes8. One of the most direct responses to current climate change is an increase in lake surface water temperature (LSWT)9,10,11, which further impacts lake ecosystems12 and greenhouse gas emissions13. Additionally, variations in LSWT influence atmospheric temperature and moisture at a local scale. Therefore, it is important to understand how lake temperatures vary in response to climate change.

The Tibetan Plateau (TP), often referred to as the “Asian Water Tower”, is experiencing rapid warming and is currently warmer than at any time in the past 2000 years14. Under this changing climate, the area of lakes15 as well as water storage16 and water levels17,18,19,20 have changed during recent decades. These changes will inevitably affect fluctuations in lake temperatures. With the development of advanced measurement technologies, LSWT datasets21,22,23 have become available, facilitating the study of lake temperatures on the TP. Wan et al.22 calculated LSWT change rates for 374 lakes on the TP for the period 2001‒2015, revealing contrasting patterns between nighttime and daytime: an overall trend of 0.37 ± 0.44 °C/decade at night and −0.54 ± 0.51 °C/decade during the day. Using moderate resolution imaging spectroradiometer (MODIS) products, Zhang et al.21 analyzed temperature changes in 52 lakes on the TP over the period 2001‒2012, revealing an overall increasing trend, with rates ranging from 0.01 to 0.47 °C/decade, although 21 of these lakes exhibited cooling trends.

In general, LSWT is influenced by climatic factors, such as air temperature, solar radiation, cloud cover, and wind speed24. Specifically, widespread increases in LSWT occur in response to atmospheric warming25,26,27; increased solar radiation can increase LSWT during spring and summer28; a decline in wind speed has been observed to increase LSWT by diminishing vertical mixing and surface turbulent heat fluxes within lakes8,29; and increased specific humidity can raise LSWT by reducing the upward latent heat flux. In summary, air temperature, shortwave radiation, longwave radiation, and specific humidity have positive effects on LSWT, while wind speed has a negative effect on LSWT.

Many studies have explored the drivers of lake warming, aiming to understand the causes of lake temperature changes. Tong et al.7 quantified the contributions of key external climate-forcing parameters to global LSWT trends, finding that an increase in air temperature accounted for 47% of global lake warming, with other notable contributions from longwave radiation (26%) and specific humidity (25%). Huang et al.30 found that changes in air temperature, downward longwave radiation, and wind speed were the most important climatic drivers of LSWT changes in China. Most studies have emphasized the influence of climatic factors on temperature changes within lakes but have generally overlooked the key role of rivers flowing into lakes. For example, on the TP, the warming rate of non-glacier-fed lakes is much higher than that of glacier-fed lakes, likely owing to cold water input from glacier melting31. Therefore, it is necessary to consider the role of river inflows into lakes when determining the causes of lake temperature changes, especially in high-elevation lakes.

The spatial distribution of LSWT provides important evidence of intra-lake warming patterns. For example, in large lakes across the Northern Hemisphere, LSWTs in deep areas display higher rates of warming during summer, attributed to the greater temporal persistence in deep areas of temperature anomalies associated with an earlier onset of thermal stratification32. Analyzing temperature differences among regions of a lake can reveal heat transfer pathways, water mixing patterns, and the spatial differentiation of habitats for aquatic organisms. This is crucial for building a comprehensive and accurate understanding of lake ecosystems, which in turn supports the rational use of lake resources and the sustainable development of human society.

The major objective of this study is to identify and better understand the role of lake inflows in determining both the overall variations and intra-lake heterogeneity of LSWT from July to November. We selected three lakes (Nam Co, Siling Co, and Paiku Co) and their surrounding basins on the TP (Fig. 1), all having available in-situ lake temperature observations, as case studies. The random forest method was used to identify the main driving factors of LSWT changes in these three lakes. We also analyzed the relationships between the intra-lake heterogeneity of LSWT and both fluvial inflows and lake bathymetry.

Fig. 1: Overview of the study area.

a Location of the study area on the Tibetan Plateau (TP). b–d Distribution of rivers and geography of the Nam Co, Siling Co, and Paiku Co basins, respectively. The red five-pointed stars indicate the locations of the in-situ monitoring sites of lake temperature. The yellow triangle indicates the in-situ flow monitoring point in Nam Co.

Results

Evaluation of CCI-LSWT dataset

After comparison with in-situ observations at different points within the three lakes (Fig. 2), the CCI-LSWT dataset was shown to accurately reproduce the dynamics of LSWT. The minimum r between CCI-LSWT and observed LSWT exceeded 0.9, and the largest absolute values of mean bias and RMSE were 0.51 and 1.60 °C, respectively.

Fig. 2: Comparison of daily lake surface water temperature (LSWT) (°C) between the CCI-LSWT dataset and observations in the three studied lakes.

a Nam Co. b Siling Co. c Northern part of Paiku Co. d Southern part of Paiku Co. RMSE root-mean-square error, r correlation coefficient, n data volume.

In addition, we validated the LSWT values from the CCI-LSWT dataset across entire lakes using MODIS data on a monthly scale. The comparison was conducted during the ice-free period when the CCI-LSWT dataset contains more available data. Accordingly, we selected May‒December for Nam Co33, April‒November for Siling Co34, and April‒December for Paiku Co35. There was a strong correlation between CCI-LSWT and MODIS LSWT, with r exceeding 0.9 and absolute values of mean bias and RMSE not exceeding 1.66 and 2.15 °C, respectively (Fig. 3). However, an underestimation was observed in CCI-LSWT, which may be a result of the transit time of MODIS Terra generally being later in the day than that of the satellites producing the CCI-LSWT dataset, allowing more time for the lake to heat up. Overall, our evaluation demonstrates that the CCI-LSWT dataset can be applied to the analysis of LSWT in these three lakes.

Fig. 3: Interannual variation of monthly lake surface water temperature (LSWT) of CCI (blue line) and moderate resolution imaging spectroradiometer (MODIS) data (red line) during the period 2001–2019.

a Nam Co. b Siling Co. c Paiku Co. RMSE root-mean-square error, r correlation coefficient.

Evaluation of lake inflow dataset

We collected available monthly average observed inflows of Nam Co Basin at Niyaqu River to evaluate the accuracy of the GRFR dataset for the period 2008‒2012. Our results showed that the GRFR dataset can accurately capture the monthly variation in inflow discharge, with an r of 0.84 (Fig. 4). The relatively small mean bias of 0.43 m³/s and RMSE of 1.82 m³/s confirmed the accuracy of the GRFR dataset, this being sufficient for our focus on discharge changes here.

Fig. 4: Comparison of multi-year monthly average values of discharge between observed and Global Reach-level Flood Reanalysis (GRFR) during the period 2008‒2012 in Nam Co basin.

RMSE root-mean-square error, r correlation coefficient.

Contributions of hydroclimatic variables to LSWT

We calculated the mean LSWT of the whole lake from July to November for the period 2001‒2019, as shown in Fig. 5. Using linear fitting, we found that the temperature variation in each of the three lakes was below 0.1 °C/decade and did not pass the 0.05 significance test. Despite the general increase in air temperature across the TP, the three lakes did not show a significant warming trend, so the influence of other hydrometeorological factors was investigated.
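The linear fitting and significance check described above can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not state which test was used, so a two-sided t-test on the regression slope is assumed, and the function name `decadal_trend` and the constant `T_CRIT_DF17` are hypothetical. With 19 annual values and two fitted parameters there are 17 degrees of freedom, for which the two-sided 5% critical value of Student's t is about 2.110.

```python
import numpy as np

# Two-sided 5% critical value of Student's t for 17 degrees of freedom
# (19 annual means minus 2 fitted parameters). Assumed test, see lead-in.
T_CRIT_DF17 = 2.110


def decadal_trend(years, lswt):
    """Return (trend in degC/decade, t-statistic of the slope) from an
    ordinary least-squares fit of annual mean LSWT against year."""
    years = np.asarray(years, dtype=float)
    lswt = np.asarray(lswt, dtype=float)
    x = years - years.mean()              # centre years for numerical stability
    slope, intercept = np.polyfit(x, lswt, 1)
    resid = lswt - (slope * x + intercept)
    n = years.size
    # Standard error of the slope from the residual variance
    se = np.sqrt(np.sum(resid ** 2) / (n - 2) / np.sum(x ** 2))
    return slope * 10.0, slope / se


# Synthetic example only (not data from the study)
years = np.arange(2001, 2020)
lswt = 10.0 + 0.02 * (years - 2001) + 0.1 * (-1.0) ** np.arange(19)
trend, t_stat = decadal_trend(years, lswt)
significant = abs(t_stat) > T_CRIT_DF17
```

A trend is treated as significant at the 0.05 level only when the absolute t-statistic exceeds the critical value; for series of a different length the critical value would need to be changed accordingly.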

Fig. 5: Interannual variation of lake surface water temperature (LSWT).

a Nam Co. b Siling Co. c Paiku Co.

We explored the importance of 2-m air temperature, shortwave radiation, longwave radiation, wind speed, specific humidity, and river discharge to the changes in LSWT. After the predictors were decorrelated using zero-phase component analysis, random forest analysis was used to quantify the relative contributions of the different variables to the overall LSWT variations in the three lakes from July to November (Fig. 6). This analysis showed that > 85% of the variation in LSWT values can be attributed to meteorological variables and lake inflows. Specifically, river discharge was the most important factor explaining the changes in LSWT values (p < 0.01) in all lakes, with its relative contribution rate (RCR) reaching 40%, exceeding the influence of meteorological factors. The water inputs to lakes from precipitation and glacier or snow meltwater typically have a lower temperature than the lake water, which can modulate lake temperature31. This may explain why the three lakes do not exhibit significant interannual variation. In addition, air temperature and specific humidity also had a significant impact on overall lake temperatures (p < 0.05).

Fig. 6: Estimated relative contribution rates (RCRs) of surface air temperature (AT), specific humidity (SH), 10 m wind speed (WS), shortwave radiation (SW), longwave radiation (LW), and runoff to long-term lake surface water temperatures (LSWTs) from July to November during the period 2001–2019.

A red five-pointed star indicates that the corresponding variable passes the 95% significance test. a Nam Co. b Siling Co. c Paiku Co. The explained variances of random forest for the LSWT of Nam Co, Siling Co, and Paiku Co are 88.4%, 86.6%, and 88.3%, respectively.

We further analyzed the factors affecting LSWT values in each month, as shown in Table 1. Air temperature exhibited a strong correlation with LSWTs in most months, especially in Nam Co and Siling Co, suggesting that air temperature substantially influences lake temperature on a monthly scale. However, the relationship between each meteorological variable and LSWTs varied considerably from month to month in Paiku Co. River discharge had a negative correlation with LSWTs in summer in all three lakes, indicative of the cooling effect of river inflows on LSWTs.

Table 1 Correlation analysis between monthly mean air temperature (AT), downward longwave radiation (LW), downward shortwave radiation (SW), specific humidity (SH), wind speed (WS), discharge, and lake surface water temperature (LSWT) during the period 2001‒2019

Intra-lake heterogeneity of LSWT and associated influencing factors

We calculated the summer and autumn LSWT averages in each lake for 19 years at the grid scale, and provided the locations of the major rivers flowing into the lake as well as the lake bathymetry corresponding to the lake. In general, the spatial distributions of LSWT exhibited notable differences across different seasons.

As illustrated in Fig. 7, the LSWT of Nam Co exhibits an obvious west-to-east warming gradient during summer. The eastern basin experiences higher temperatures due to its shallower depth and consequently lower heat capacity, allowing for faster thermal response. In contrast, the western margins exhibit cooler LSWT, likely resulting from wind-induced upwelling of colder subsurface waters driven by prevailing westerlies36. In autumn, the temperature of Nam Co shows a pattern where the central area is warmer while the surrounding areas are cooler. Near the lake inlet, the temperature is the lowest which may be associated with cooler inflow runoff. In autumn, the difference between the air temperature in the basin and the LSWT reached a maximum in October and November (Fig. 7d). There is a well-established positive correlation between river temperature and surrounding air temperature37,38. Therefore, we used the difference between the basin average air temperature and LSWT as an indicator of the temperature difference between river and lake water. Our results indicate that the temperature difference between the lakes and their inflowing rivers reached a peak during October and November, consistent with the findings of a previous study that measured lake and river temperatures in Paiku Co basin35. In autumn, river water temperature experienced a rapid decline, whereas LSWT values remained relatively stable owing to the high specific heat capacity of lakes. Consequently, despite lower river inflow in autumn, the relatively cooler nature of the river water leads to sufficient cold water being brought into the lakes, resulting in the observed pattern of lower LSWT values at the inlets of the three lakes. The central lake region maintains higher temperatures due to its greater depth (Fig. 7c), where the larger water volume provides increased heat capacity and slower cooling.

Fig. 7: Intra-lake heterogeneity of lake surface water temperature (LSWT) and its influencing factors of Nam Co in summer and autumn.

a Spatial distribution of LSWT and location of rivers in summer at Nam Co Basin. Darker blue colors of catchments equate to larger river flows in the corresponding region. b The distribution of LSWT in autumn. c The bathymetry of Nam Co. d Variation of basin air temperature (AT) and LSWT.

Rivers flow into Siling Co from all four compass directions. Among these, the Za’gya Zangbo in the northern basin and the Za’gen Zangbo and Ngari Zangbo in the western basin, which originate from mountain regions, have flow volumes that are over 20 times greater than those of rivers in the eastern and southern regions. Consequently, their cooling effects on LSWT in summer are more pronounced, resulting in two particularly cold zones: one with LSWT below 13 °C in the northern part of the lake and another in the western part of the lake with an LSWT range of 13‒13.3 °C (Fig. 8). In autumn, there is a significant temperature difference between the river and lake, causing lower temperatures near the inflow area of the lake.

Fig. 8: Intra-lake heterogeneity of lake surface water temperature (LSWT) and its influencing factors of Siling Co in summer and autumn.

a Spatial distribution of LSWT and location of rivers in summer at Siling Co Basin. Darker blue colors of catchments equate to larger river flows in the corresponding region. b The distribution of LSWT in autumn. c The bathymetry of Siling Co. d Variation of basin air temperature (AT) and LSWT.

In Paiku Co, the spatial distribution characteristics in summer and autumn are opposite (Fig. 9). In summer, the lowest lake temperatures occur in the northern region, attributable to its greater water depth which results in slower thermal response. The substantial river-lake temperature contrast in autumn induces lower temperatures in the southern lake region.

Fig. 9: Intra-lake heterogeneity of lake surface water temperature (LSWT) and its influencing factors of Paiku Co in summer and autumn.

a Spatial distribution of LSWT and location of rivers in summer at Paiku Co Basin. Darker blue colors of catchments equate to larger river flows in the corresponding region. b The distribution of LSWT in autumn. c The bathymetry of Paiku Co. d Variation of basin air temperature (AT) and LSWT.

Discussion

TP lakes are sensitive to regional hydroclimatic changes and human activities. Studying the intra-lake heterogeneity of LSWT can help us to better understand regional environmental change. Our findings show that interannual variation of LSWT in Nam Co, Siling Co, and Paiku Co was insignificant. The results of the random forest analysis indicate that the factor with the greatest influence on LSWT in all three lakes from July to November was lake inflows, followed by air temperature and specific humidity. We found that lake inflows increased the intra-lake heterogeneity of LSWT, but the mechanism in summer differed from that in autumn. During summer, runoff flowing into the lakes increased significantly (Fig. 4), while in autumn, there was a large temperature difference between rivers and lakes. Both of these effects cause localized lower temperatures in certain areas of the lakes. In addition to runoff, lake bathymetry can also explain the intra-lake heterogeneity of LSWT, which is manifested as the difference in LSWT between the deep and shallow areas of the lake.

The CCI-LSWT dataset is subject to inherent uncertainties from diverse sources. Radiative transfer models may introduce inaccuracies due to imperfect parameterizations or assumptions. Uncertainties associated with retrieval arise from simplifications in model design, observational constraints, or non-ideal atmospheric conditions. ERA5-Land dataset uncertainties stem from the model limitations in fully capturing the physical processes that govern the earth system. To evaluate how these uncertainties might affect the random forest results, we performed a sensitivity analysis by introducing Gaussian noise to each independent variable. The noise had a mean of 0 and a standard deviation equivalent to 10% of the standard deviation of the original variable. A total of 50 independent disturbance experiments were conducted. The results of the random forest model showed that the ranking of variable importance remained stable across all experiments. This indicates that the conclusions drawn from the model are robust and not significantly affected by the uncertainties in the input data.
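The perturbation experiment described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and `abs_correlation_importance` is only a lightweight stand-in for the random forest importance measure so that the example runs with NumPy alone. Any callable that returns one importance score per predictor could be passed as `importance_fn`.

```python
import numpy as np


def abs_correlation_importance(X, y):
    """Stand-in importance score (absolute Pearson correlation).
    Hypothetical helper; the study used random forest importance instead."""
    return np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                     for j in range(X.shape[1])])


def ranking_stability(X, y, importance_fn, n_experiments=50,
                      noise_frac=0.10, seed=0):
    """Fraction of perturbation experiments in which the importance
    ranking of the predictors is identical to the unperturbed ranking."""
    rng = np.random.default_rng(seed)
    baseline = np.argsort(-importance_fn(X, y))
    # Noise standard deviation: a fixed fraction of each predictor's own std
    sigma = noise_frac * X.std(axis=0)
    unchanged = 0
    for _ in range(n_experiments):
        X_noisy = X + rng.normal(0.0, sigma, size=X.shape)
        ranking = np.argsort(-importance_fn(X_noisy, y))
        unchanged += int(np.array_equal(ranking, baseline))
    return unchanged / n_experiments


# Synthetic example only (not data from the study)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(95, 3))
y_demo = 3.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + X_demo[:, 2]
stability = ranking_stability(X_demo, y_demo, abs_correlation_importance)
```

A returned value of 1.0 would correspond to the situation reported in the text, where the ranking of variable importance remained stable across all experiments.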

This study emphasizes the influence of fluvial inflows on LSWT. In addition to precipitation, river inflow may originate from glacier and snow meltwater, particularly in the Tibetan Plateau region. Taking Nam Co as a case study (Fig. 10), we analyzed the intra-annual variations of total inflow runoff and cryospheric meltwater contribution (the sum of glacier and snow melt) from 2001 to 2019 based on the simulated results from Zhou et al.16. The results show that meltwater makes up a sustained proportion of the inflow from May to September, constituting a persistent cold water source entering the lake system that may contribute to LSWT suppression. Future climate warming may drive contrasting thermal regimes in lake systems. In glacier-fed basins where glacier melt accelerates, an increase in cold meltwater fluxes may maintain or intensify the cooling effects on LSWT. Conversely, in regions where glaciers retreat and meltwater contributions decline, the reduction of cold water input may amplify lake warming trends. This bidirectional response highlights the critical role of cryospheric meltwater in modulating the thermal dynamics of alpine lakes.

Fig. 10

The intra-annual variability of total inflow runoff, glacier and snow meltwater, and the proportion of glacier and snow meltwater in the total inflow runoff in the Nam Co basin from 2001 to 2019.

The study of intra-lake heterogeneity of LSWT should be emphasized in future research because the spatial distribution of LSWT provides more insight than a single average value for the whole lake. Future work also needs more specific quantification of the relative contributions of river inflows, key meteorological factors (e.g., air temperature, wind speed), and lake bathymetry in shaping the heterogeneity of LSWT. A direct comparison using percentage contributions or significance testing would help to clarify the dominant drivers of spatial variation and further support the role of river inflows in lake thermodynamics. To achieve these objectives, the use of three-dimensional lake models should be encouraged, because such models can represent external meteorological forcing such as wind stress, together with lake thermal stratification and basin topography, all of which play a critical role in shaping intra-lake LSWT heterogeneity39,40,41.

Methods

Study area

The main study area comprised three lakes located in the central and southern parts of the TP: Nam Co, Siling Co, and Paiku Co. Nam Co, with a surface area of over 2000 km², is situated to the north of the Nyainqentanglha Mountains. The main rivers flowing into Nam Co are the Naqu, Boqu, Angqu, and Cequ on the western shoreline, the Niyaqu on the eastern shoreline, and a series of parallel tributaries on the southern shoreline; few rivers flow into the northern part of the lake. The lake gradually becomes shallower from the center to the periphery, with a maximum depth exceeding 90 m and an average depth of over 50 m33. The average annual temperature of Nam Co is below 0 °C, with the highest temperature occurring in July42. Glacial meltwater is of great importance to the water balance and hydrological processes within Nam Co and its surrounding basin43,44. Siling Co is currently the largest lake in the Tibet Autonomous Region of China45. This lake is located within a closed catchment, with the main rivers flowing into it being the Za’gen Zangbo and Ngari Zangbo in the west, Boques Zangbo in the east, and Za’gya Zangbo in the north. The average annual temperature of Siling Co is 0.3 °C and the average total annual rainfall is approximately 290 mm. Lake inflow is markedly affected by both precipitation and snow and glacier melt46. Paiku Co, located on the southern TP, is also a hydrologically closed lake and has a surface area of 266 km². Rivers flow from south to north into the lake; major rivers include the Daqu, Bulaqu, and Barixiongqu. The average depth of the lake is 41.1 m and the deepest point (72.8 m) is located in the northern portion of the lake47. According to the second Chinese glacier inventory (2009)48, the total glacial area in Paiku Co basin has decreased from 130.2 to 119.9 km² during the past 40 years. The average annual air temperature from June 2015 to May 2016 was 4.4 °C47.

Observational data

In-situ measured lake water temperature data for the three lakes were obtained from the National Tibetan Plateau/Third Pole Environment Data Center. In-situ water temperature monitoring data49 (3‒83 m in depth, 10 layers) for Nam Co (Fig. 1b) were collected from October 2011 to July 2014 and expressed as daily average values. Daily lake water temperature measurements (2‒38 m in depth, 5 layers) for Siling Co (Fig. 1c) spanned the period June‒December 2017. Using the observed lake water temperature data from different depths in Nam Co and Siling Co, model curves were constructed to obtain the lake water temperature at 0 m when data from all depths were available. In total, 48 data points were available for Nam Co, while 93 data points were available for Siling Co. In Paiku Co (Fig. 1d), two water temperature profiles were monitored: one in the southern part of the lake (0‒42 m in depth) and the other in the northern part (0‒72 m in depth)35. These observed data spanned the period June 2015‒May 2018 for the southern monitoring point, and June 2016‒May 2017 for the northern monitoring point. The observed water temperature data at 0 m for the three lakes were used to validate LSWT product data.

In addition to lake water temperature data, observed daily runoff data in the Niyaqu River, located in the eastern part of the Nam Co basin (Fig. 1b), were collected. The observed runoff data were obtained from the Nam Co Monitoring and Research Station for Multisphere Interactions (30°46′N, 90°59′E), which was established by the Institute of Tibetan Plateau Research, Chinese Academy of Sciences in 2005. These data, available from May to October over the period 2008‒2012, were primarily used to validate lake inflow data from runoff products.

Following the method described in Han et al.50, the bathymetry of three lakes was derived from vectorized contour lines based on prior surveys47,51,52, and was mainly used to evaluate its impact on the intra-lake heterogeneity of LSWT.

Product-based LSWT data

A satellite-derived LSWT product with high spatial resolution, released by the European Space Agency Climate Change Initiative (CCI-LSWT)53,54,55, has been widely used in previous studies6,56,57 and was also utilized here. The CCI-LSWT dataset provides daily LSWT data for 2024 lakes worldwide from five sensors: ATSR2 (ERS-2), AATSR (Envisat), MODIS (Terra), AVHRR (MetOpA and MetOpB) and SLSTR (Sentinel3A and Sentinel3B), with a spatial resolution of 1/120° grid format (~1 km resolution at the equator). For the three studied lakes, a buffer zone of 1 km was constructed to eliminate the influence of water level change on LSWT.

The CCI-LSWT data are sparse during the ice-covered period owing to problems in discriminating clouds from lake ice or snow58. Hence, we analyzed lake water temperature only in summer (July‒September) and autumn (October‒November)12,59, excluding December because the lakes are generally frozen at this time33,34,35. The CCI-LSWT dataset was validated at the point scale. By identifying the four nearest grid points corresponding to the in-situ measurement locations of each lake, we compared the measured lake temperatures at 0 m with the mean CCI-LSWT values of these four grid points.
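The point-scale matching step can be sketched as follows. This is a minimal illustration, assuming the LSWT field for one day is held as a 2-D array on a regular latitude/longitude grid with NaN for missing retrievals; the function name, argument layout, and the equirectangular distance approximation are assumptions made for the example and are not taken from the study.

```python
import numpy as np


def mean_of_nearest_cells(lats, lons, lswt, site_lat, site_lon, k=4):
    """Mean LSWT of the k grid cells closest to an in-situ site.

    lats, lons : 1-D arrays of grid-cell centre coordinates (degrees)
    lswt       : 2-D array with shape (len(lats), len(lons)); NaN = no data
    """
    lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
    # Equirectangular approximation: scale longitude offsets by cos(latitude)
    dlat = lat2d - site_lat
    dlon = (lon2d - site_lon) * np.cos(np.radians(site_lat))
    dist2 = dlat ** 2 + dlon ** 2
    # Flattened indices of the k smallest distances
    idx = np.argsort(dist2, axis=None)[:k]
    # NaN cells among the selected neighbours are ignored in the mean
    return float(np.nanmean(np.asarray(lswt, dtype=float).ravel()[idx]))


# Synthetic example only (not data from the study)
lats = np.array([30.0, 30.1, 30.2])
lons = np.array([90.0, 90.1, 90.2])
field = np.arange(9, dtype=float).reshape(3, 3)
matched = mean_of_nearest_cells(lats, lons, field, 30.05, 90.05)
```

At the roughly 1 km spacing of the CCI-LSWT grid, the flat-earth distance approximation is adequate for ranking neighbouring cells.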

Validation of water surface temperatures from the MODIS land surface temperature (LST) product against in-situ measurements has been conducted in previous studies, showing good performance for many lakes60,61,62. Therefore, the MODIS LST product was used here to validate the CCI-LSWT product in terms of the overall LSWT variations in the three lakes. The MODIS LST dataset contains 15 years (2001‒2015) of nighttime and daytime LSWT data for 374 lakes22. For comparison with the CCI-LSWT dataset, we specifically utilized the daytime LSWT data obtained from each entire lake.

Meteorological data

The European Centre for Medium-range Weather Forecasts (ECMWF) Reanalysis v5-Land (ERA5-Land) dataset63 provides global land gridded climate-forcing data from 1981 to the near present at hourly temporal resolution and a spatial resolution of 0.1° × 0.1°. Various climate-forcing variables from the hourly ERA5-Land dataset were used to analyze their influence on LSWT. On the basis of previous work7,30, we selected five factors that mainly influence LSWT: 2-m air temperature, longwave radiation, shortwave radiation, wind speed, and specific humidity. Wind speed was calculated using both the 10 m v-component and u-component of wind. Specific humidity was accurately estimated by integrating the 2 m dewpoint temperature and surface pressure data. For each lake, the monthly mean value of each meteorological variable was calculated.
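The two derived variables can be computed along the following lines. This is a sketch under stated assumptions: the text does not give the formulas used, so the example assumes the Bolton (1980) approximation for saturation vapour pressure and the standard moist-air relation for specific humidity. The function names are hypothetical; inputs are taken to be in ERA5-Land units (wind components in m/s, dewpoint in K, surface pressure in Pa).

```python
import math


def wind_speed(u10, v10):
    """10 m wind speed (m/s) from the eastward (u) and northward (v) components."""
    return math.hypot(u10, v10)


def specific_humidity(dewpoint_k, pressure_pa):
    """Specific humidity (kg/kg) from 2 m dewpoint temperature and surface pressure.

    Vapour pressure uses the Bolton (1980) approximation, an assumption here
    because the source does not state which formula was applied.
    """
    td_c = dewpoint_k - 273.15
    # Actual vapour pressure equals saturation vapour pressure at the dewpoint (hPa)
    e_hpa = 6.112 * math.exp(17.67 * td_c / (td_c + 243.5))
    e_pa = e_hpa * 100.0
    # 0.622 is the ratio of the gas constants of dry air and water vapour
    return 0.622 * e_pa / (pressure_pa - 0.378 * e_pa)


# Illustrative values only: a dewpoint of 0 degC at 570 hPa surface pressure
ws = wind_speed(3.0, 4.0)
q = specific_humidity(273.15, 57000.0)
```

Because surface pressure over the TP is far below sea-level pressure, the same dewpoint corresponds to a noticeably higher specific humidity than it would at low elevation, which is why surface pressure enters the calculation.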

Lake inflows

Lake inflow data were derived from the Global Reach-level Flood Reanalysis (GRFR) dataset, which contains 3-hourly and daily river flow records for 2.94 million rivers worldwide over a 40-year period (1980‒2019)64. The dataset is based on the variable infiltration capacity distributed hydrological model and the parallel computation of discharge model, integrating global natural river runoff with high resolution and accuracy into simulations. For each lake basin, we extracted lake inflow data from major rivers that flow into the lakes and calculated the monthly average river discharge from the GRFR dataset. For Niyaqu River in Nam Co Basin, the multi-year monthly average observed runoff (calculated from daily data) was compared with the lake inflow obtained from the GRFR product (Fig. 4).

Basin boundaries

Lake basin boundaries were obtained from the HydroBASINS dataset, which offers 12 hierarchically nested sub-basin breakdowns globally at arc-second resolution65. Level six and seven watersheds were mainly used here.

Estimating the contributions of forcing variables to LSWT

In this study, our predictors used in random forest included meteorological variables (wind speed, air temperature, shortwave radiation, longwave radiation, and specific humidity) as well as river discharge. We used zero-phase component analysis to eliminate the effects of interactions among variables as much as possible. This method is commonly used for data pre-processing in machine learning and has also been applied to the reconstruction of climate fields66. The random forest method was then used to estimate the contributions of hydrometeorological variables to changes in LSWT values. Random forest is a classification tree-based algorithm that falls under the broader category of machine learning methods, requiring simulation and iterative processes. Breiman67 introduced the Bagging (Bootstrap Aggregation) method, which became a milestone in the field of machine learning and has been widely cited. Based on the Bagging approach that randomly selects subsets of samples, the random forest algorithm further reduces the correlation between individual trees by randomly selecting features at each node. This enhances regression accuracy and improves overall model performance. Compared with other statistical classifiers, random forest offers advantages such as high classification accuracy, ability to model complex interactions among predictor variables, and flexibility in performing various types of statistical data analyses68. The order of importance was determined by their frequency and relative position in individual trees across the entire forest. The explanatory power of the response variables was estimated using the mean-squared error (MSE). Subsequently, the relative contribution rate (RCR) of each factor was calculated based on the MSE values. Random forest has been applied to lake feature attribution in previous studies69,70.
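The decorrelation step can be sketched as follows. This is a minimal NumPy illustration of zero-phase component analysis (ZCA) whitening, not the authors' code. The helper `relative_contribution` is an assumption: the text states only that the RCR was calculated from the MSE values, so the example simply normalizes non-negative importance scores to sum to 100%. Both function names and the small regularization term `eps` are hypothetical.

```python
import numpy as np


def zca_whiten(X, eps=1e-8):
    """ZCA-whiten a (samples x predictors) matrix.

    The output columns are mutually uncorrelated with unit variance while
    staying as close as possible to the original (centred) predictors.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigendecomposition
    # W = U diag(1/sqrt(lambda)) U^T is the symmetric inverse square root of cov
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W


def relative_contribution(importance):
    """Normalize importance scores to percentages (assumed RCR definition)."""
    imp = np.clip(np.asarray(importance, dtype=float), 0.0, None)
    return 100.0 * imp / imp.sum()


# Synthetic example only: three correlated predictors
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X_demo = base @ np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]])
X_white = zca_whiten(X_demo)
rcr = relative_contribution([2.0, 1.0, 1.0])
```

The whitened predictors would then be passed to the random forest in place of the raw variables. Predictors with very different units may also be standardized before whitening; whether this was done in the study is not stated.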

Evaluation criteria

Various statistical criteria, including root-mean-square error (RMSE), bias, and correlation coefficient (r) were used to evaluate the performances of the CCI-LSWT and GRFR runoff products. These criteria are defined as follows:

$$\text{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}{(X_{\text{s},i}-X_{\text{o},i})}^{2}},$$
(1)
$$\text{Bias}=\frac{1}{N}\sum_{i=1}^{N}(X_{\text{s},i}-X_{\text{o},i}),$$
(2)
$$r=\frac{\sum_{i=1}^{N}(X_{\text{s},i}-\bar{X}_{\text{s}})(X_{\text{o},i}-\bar{X}_{\text{o}})}{\sqrt{\sum_{i=1}^{N}{(X_{\text{s},i}-\bar{X}_{\text{s}})}^{2}}\sqrt{\sum_{i=1}^{N}{(X_{\text{o},i}-\bar{X}_{\text{o}})}^{2}}},$$
(3)

where N is the number of paired values; \(X_{\text{s},i}\) denotes the observed LSWT, MODIS LSWT, or observed runoff at the Niyaqu River at time step i; \(X_{\text{o},i}\) denotes the corresponding CCI-LSWT value or GRFR runoff value; and \(\bar{X}_{\text{s}}\) and \(\bar{X}_{\text{o}}\) denote the mean values of the corresponding data.
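Equations (1)‒(3) translate directly into code. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the argument order follows the subscripts in the equations, so the sign of the bias is that of \(X_{\text{s}}-X_{\text{o}}\).

```python
import math


def rmse(x_s, x_o):
    """Root-mean-square error, Eq. (1)."""
    n = len(x_s)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(x_s, x_o)) / n)


def bias(x_s, x_o):
    """Mean bias, Eq. (2); positive when x_s exceeds x_o on average."""
    return sum(s - o for s, o in zip(x_s, x_o)) / len(x_s)


def pearson_r(x_s, x_o):
    """Pearson correlation coefficient, Eq. (3)."""
    n = len(x_s)
    mean_s = sum(x_s) / n
    mean_o = sum(x_o) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(x_s, x_o))
    var_s = sum((s - mean_s) ** 2 for s in x_s)
    var_o = sum((o - mean_o) ** 2 for o in x_o)
    return cov / math.sqrt(var_s * var_o)


# Illustrative values only (not data from the study)
series_s = [1.0, 2.0, 3.0]
series_o = [1.0, 2.0, 4.0]
metrics = (rmse(series_s, series_o), bias(series_s, series_o),
           pearson_r(series_s, series_o))
```

Only time steps at which both series have valid values should be passed in, so that N counts matched pairs.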