Abstract
Remote sensing technology has the potential to significantly enhance large-scale and long-term dynamic monitoring of lakes. High-quality in-situ datasets are essential for improving the accuracy and reliability of remote sensing retrieval of lake ecosystems. This dataset provides satellite-ground synchronized in-situ data on multiple water parameters for typical lakes in China spanning the period between 2020 and 2023. It includes quality-checked water optical parameters (remote sensing reflectance (Rrs), chlorophyll-a (Chl-a), total suspended matter (TSM) and Secchi disk depth (SDD)) and water surface temperature (WST) data. It encompasses 586 sampling points across 18 lakes. The dataset has two significant highlights. Firstly, synchronous observations from multiple satellites were coordinated during data collection, effectively supporting the retrieval and validation of water remote sensing products. Secondly, it encompasses diverse data types, collecting synchronous measurements of Rrs and various water parameters. This dataset will be updated continuously, substantially enhancing regional and global lake monitoring capabilities through satellite remote sensing data.
Background & Summary
Water optical parameters and WST serve as crucial indicators for measuring and assessing lake ecosystems1,2. Water optical parameters offer valuable insights into the detection of organic and inorganic particles, light penetration, ecological risk assessment, and the influence of climate change on lakes3,4,5. Variations in WST can affect the growth, reproduction, and distribution of aquatic organisms and the lake’s water balance. They find extensive application in monitoring, researching, and managing lake environments, facilitating the evaluation of the impact of human activities, climate change, and pollution on these ecosystems. Consequently, the accurate and long-term monitoring of changes in lake water optical parameters holds great significance. However, water experiments often encounter limitations due to weather conditions, instrumentation constraints, and sampling time. Collecting valuable long-term water multi-parameter data necessitates substantial economic and temporal investments6. Furthermore, water quality parameters datasets are often dispersed among researchers and institutions. Therefore, consolidating such data into large-scale, long-term, and high-precision water quality datasets poses considerable challenges.
With the advancement of remote sensing satellites and the maturation of remote sensing technology, utilizing remote sensing technology for large-scale and long-term lake monitoring has emerged as an inevitable trend7,8,9. Satellite sensors for water quality monitoring can be broadly categorized into marine and land sensors. The marine sensors include the Moderate Resolution Imaging Spectroradiometer (MODIS, Terra and Aqua), the Visible Infrared Imaging Radiometer Suite (VIIRS, NPP), and the Ocean and Land Colour Instrument (OLCI, Sentinel-3). The land sensors encompass the Operational Land Imager (OLI, Landsat 8–9), the Multi-Spectral Instrument (MSI, Sentinel-2), the Panchromatic and Multispectral camera (PMS, Gaofen), and the Charge-Coupled Device camera (CCD, HJ2A)10,11,12. These sensors substantially enhance lake monitoring capabilities in terms of spatial and temporal scales, thereby furnishing long-term water optical information for lakes. To extract water optical parameters from satellite images, limnologists, oceanographers, and other researchers have devised various retrieval models based on empirical, semi-empirical, analytical, and artificial intelligence approaches13,14,15,16. However, the applicability and generalizability of these algorithms across spatiotemporal scales require further investigation, especially considering spectral interactions in inland waters16. Optically active components interfere with one another in the intrinsic optical properties of the water mixture, making it challenging to discern the contributions of distinct water quality components to the overall water-leaving radiance17. Although specific studies have monitored water optical parameters across spatial and temporal scales, these retrieval models rely on a large amount of consistent supporting data.
In recent years, several in-situ datasets have been published for aquatic ecological research, as presented in Table 1. These datasets can be categorized into ocean datasets and inland water datasets. The ocean datasets encompass SeaBASS18, CoastColour Round Robin datasets19, and a compilation of global bio-optical in-situ data for ocean color satellite applications20. These datasets effectively integrate global ocean water spectra and quality data, establishing a solid foundation for marine ecological research. Similarly, various datasets of inland water parameters have also been released, such as ReaLSAT21, LCC-TP v1.022, AquaSat23, GLORIA24, LakeTSim25, and an extensive spatiotemporal water quality dataset spanning four decades in China26. ReaLSAT presents a new global dataset that contains the location and surface area variations of 681,137 lakes from 1984 to 2015. AquaSat contains more than 600,000 matchups, covering 1984–2019, of ground-based water quality parameter measurements paired with spectral reflectance from Landsat 5, 7, and 8. GLORIA includes 7,572 curated hyperspectral remote sensing reflectance measurements at 1 nm intervals within the 350 to 900 nm wavelength range. LakeTSim combined numerical modeling and satellite thermal data to create a regional dataset of long-term water temperatures for 401 French lakes. LCC-TP v1.0 is the first dataset of lake-catchment characteristics for 1525 lakes on the Tibetan Plateau, including six categories: lake characteristics, topography, climate, land cover/land use, soil and geology, and human activities. The 40-year water quality spatiotemporal dataset comprises over 330,000 observations encompassing daily, weekly, and monthly records of surface water quality covering the period from 1980 to 2022. Among inland water datasets, only two provide water parameter data specifically for China, whereas the GLORIA dataset contains relatively limited data in China.
For the past five years, GLORIA holds a mere 113 in-situ records in China, all collected in 2021 across four lakes in southeastern China (Hou Lake, Wuhan East Lake, Poyang Lake, and Liangzi Lake). Conversely, the 40-year water quality spatiotemporal dataset provides significant long-term series water parameter data in China. It serves as an extensive source of long-term series data for water quality monitoring of China’s offshore waters, lakes, reservoirs, and rivers. However, the majority of these data are concentrated on water quality monitoring in the eastern coastal areas, resulting in a limited number of monitoring points for inland lakes in China. Furthermore, the monitoring points for lakes are mainly situated along the shorelines, making it challenging to capture the entire distribution of water parameters within the lakes. This limitation may introduce uncertainty when using remote sensing technology to retrieve lake water optical parameters.
To fulfill the requirements of satellite remote sensing applications across various domains, China initiated the “National Civilian Space Infrastructure Terrestrial Observation Satellite Common Application Support Platform Project” in 2018. This project provides common and fundamental technical services, ensuring the provision of in-situ data measurements for six parameters across the country: radiation, geometry, water parameters, land surface, atmosphere, and vegetation. The project has accumulated a substantial volume of synchronous satellite-ground in-situ measurement data, supporting quantitative remote sensing research conducted by Chinese satellites. The initial batch of selected water-related common products included four water optical parameters (Rrs, Chl-a, TSM, SDD) and WST. More water quality parameters, such as CDOM, COD, TN and TP, will be measured with support from future projects.
After four years of development, the project has established a synchronized satellite-ground dataset for water optical parameters and WST data in typical lakes across China. This dataset aims to construct a large-scale, long-term water dataset covering typical lakes in China. From 2020 to 2023, a total of 586 matched Rrs, Chl-a, TSM, SDD, and WST data points were collected from 18 lakes. The satellite-ground synchronous dataset serves as a validation resource for water remote sensing products. GaoFen or HJ-2A/B satellite simultaneous observations were coordinated during the in-situ data collection process. Additionally, global open satellite data, including Landsat 8-9 and Sentinel-2, were considered in the lake experiments. The synchronous observation method ensures a time difference within ±1.5 hours between in-situ and satellite observation data. Moreover, this dataset simultaneously captures Rrs, water optical parameters, and WST data, enabling researchers to explore their intrinsic relationship and thus establish more reliable and credible retrieval models for water optical parameters. With the ongoing support of the project, the collection of water quality parameters in China’s typical lakes will continue, and this dataset will be updated regularly. This dataset is a valuable complement to lake research, providing an effective and comprehensive data resource.
Methods
Parameter description
This dataset focuses on four water optical parameters and WST, which are essential for remote sensing studies of lakes. These parameters play a crucial role in understanding the ecological dynamics of lakes, and extensive studies utilizing remote sensing retrieval modeling have been conducted on them27.
Rrs provides a wealth of water information and can be utilized to infer essential water parameters. Data from airborne or satellite remote sensing platforms enable monitoring large-scale water regions and assessing water quality.
Chl-a is a photosynthetic pigment found in aquatic plants, algae, and certain bacteria28. It plays a crucial role in capturing light energy and initiating the process of photosynthesis, which is essential for estimating the primary productivity of aquatic ecosystems.
TSM refers to the concentration of suspended or floating organic and inorganic particles in water29. TSM measurements provide insights into water turbidity, sedimentation processes, and the transport of nutrients and pollutants.
SDD represents the depth to which light can penetrate through the water. It is influenced by factors such as TSM, phytoplankton abundance, and CDOM. SDD can provide information about the pollution level and overall water quality30.
WST is often employed as an indicator for water quality monitoring. Changes in WST are closely related to factors such as dissolved oxygen content, biological activity, water circulation, and water pollution. Furthermore, WST has significant implications for the survival and reproduction of aquatic organisms31.
By monitoring these parameters, researchers can gain insights into aquatic ecosystems’ ecological conditions, primary productivity, and nutrient dynamics. These parameters are widely utilized in environmental monitoring projects, research endeavors, and management efforts to assess the impact of human activities, climate change, and pollution on lakes.
Dataset description
This dataset encompasses 18 lakes located within China, as illustrated in Fig. 1. The longitude of the lakes ranges from 86.725° to 132.336°, while the latitude ranges from 24.859° to 49.546°. The dataset is categorized based on various lake characteristics, including the lake zone, latitude, altitude, water quality type, formation, hydrology, and surface area. Table 2 demonstrates the comprehensive representation of the dataset across different regions in China. Considering China’s location in the mid-latitude region, the dataset includes 15 lakes in the mid-latitude region and three in the low-latitude region. The altitudes of these lakes range from 0 to 3,200 meters. According to the Environmental Quality Standards for Surface Water (GB3838-2002)32, the lakes are classified as follows: four lakes as Class II, seven lakes as Class III, five lakes as Class IV, and one lake as Class V. These lakes comprise 16 freshwater lakes and two saline lakes (Lake Qinghai and Lake Tuosu). Furthermore, the sampling points were strategically designed to sample multiple water optical parameters, ensuring a more accurate representation of the spatial distribution of different parameters within the lakes33. Figure 2 depicts the sampling locations for each experiment.
Fig. 1 The geographical locations of the 18 lakes44.
This dataset compiles water experimental data from 2020 to 2023, as shown in Table 3. Among the 18 lakes included, water quality data for eight lakes was collected on a single occasion, while data for the remaining lakes was collected two or more times. Lakes, with their relatively limited spatial heterogeneity compared to other land covers, allow for the direct validation of medium-high-resolution remote sensing products. As for low-resolution satellite remote sensing products, in-situ data can be transformed into satellite image pixel scale data through scale conversion, enabling their utilization in product validation34.
This dataset’s advantage lies in its effective synchronization of in situ data collection with Chinese satellite observations. By coordinating synchronous observations of satellites over the lakes during in-situ data collection, the dataset ensures the most accurate “truth value” for water remote sensing products. Table 3 illustrates the synchronous satellite data during the water sampling experiment, ensuring a time difference between ground sampling and satellite observations of approximately ±1.5 hours.
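The synchronization criterion described above can be illustrated with a short check. This is a minimal sketch, not the project's matching code: the function name `is_synchronous`, the example timestamps, and the choice to treat a difference of exactly 1.5 hours as acceptable are all assumptions. Both timestamps are assumed to be expressed in the same time zone.

```python
from datetime import datetime, timedelta

# Tolerance taken from the ±1.5 hour window stated in the text.
MAX_OFFSET = timedelta(hours=1.5)


def is_synchronous(sample_time, overpass_time, max_offset=MAX_OFFSET):
    """Return True when the in-situ sample lies within the window around the overpass.

    Both datetimes are assumed to share one time zone (for example, both in UTC).
    """
    return abs(sample_time - overpass_time) <= max_offset


# Hypothetical sampling and overpass times, 50 minutes apart.
sample = datetime(2021, 7, 15, 10, 40)
overpass = datetime(2021, 7, 15, 11, 30)
print(is_synchronous(sample, overpass))
```

The same function can be reused for the wider windows discussed for Sentinel-2 by passing, for example, `max_offset=timedelta(days=1)`.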
Chinese multispectral satellites like GF1 (8 m/16 m), GF1B/C/D (8 m), GF2 (4 m), and GF6 (8 m/16 m) have high spatial resolutions. They can also provide high-temporal and high-spatial-resolution data for retrieving water optical parameters. For instance, HJ2A/B has multispectral sensors (16 m) and infrared and thermal infrared sensors (48/96 m). The synchronicity between Sentinel-2, Landsat 8-9, and the in-situ dataset was also investigated. When the time window was one day before and after the in-situ sampling, 73% of the ground experiments were synchronized with Sentinel-2 observations. Further expanding the time window to 3 days before and after the field sampling resulted in 90% synchronicity with Sentinel-2 data. However, fewer synchronous satellite images are available because Landsat satellites have a 16-day temporal resolution. Nonetheless, twelve water experiments coincided with Landsat satellite overpasses, allowing for the verification of water remote sensing products using this dataset.
Observation dataset acquisition
Rrs measurement method
The Rrs measurements were conducted using a TriOS RAMSES water spectrometer. The spectral resolution of the in-situ Rrs is 1 nm. This system comprises one irradiance sensor and two radiance sensors. The above-water measurement method was employed for data collection. The Rrs at each sampling station was calculated following the methodology proposed by Mobley35,36:
\[{R}_{rs}=\frac{{L}_{t}^{0+}-\rho \,{L}_{{sky}}^{0+}}{{E}_{d}^{0+}}\quad (1)\]
where \({L}_{t}^{0+}\) (\(W\cdot {m}^{-2}\cdot {sr}^{-1}\cdot {nm}^{-1}\)) is the above-water upwelling radiance, \({L}_{{sky}}^{0+}\) (\(W\cdot {m}^{-2}\cdot {sr}^{-1}\cdot {nm}^{-1}\)) is the sky radiance, \(\rho \) is the reflectance factor of the air-water interface for sky radiance, and \({E}_{d}^{0+}\) (\(W\cdot {m}^{-2}\cdot {nm}^{-1}\)) is the above-surface downwelling irradiance, all measured using TriOS RAMSES radiometers within the 350–900 nm wavelength range. The sensors were attached to a frame adhering to the viewing angles of Mobley. \({L}_{t}^{0+}\) has a relative azimuth angle (φv) within 90°–135° from the sun and a zenith angle (θv) of 45° from the nadir, and \({L}_{{sky}}^{0+}\) has a zenith angle (\({{\rm{\theta }}}_{v}^{{\prime} }\)) of θv + 90° from nadir.
The instrument setup and measurements are conducted as follows: (i) Before measurement, the TriOS RAMSES water spectrometer is mounted on a specialized device that fixes the angles of the three sensors. This device is depicted in Fig. 3(a). (ii) The device is placed horizontally at the edge of the boat deck, ensuring the target water body is not influenced by surrounding structures’ shadows, sun glints, water foam, or white caps. (iii) The observation geometry θv, φv is set to 40° and 135°, respectively. The zenith angles of the two radiance sensors are adjusted to 40° and −40°, with one sensor pointing toward the water surface and the other sensor pointing toward the sky. (iv) Each observation strictly follows the field observation geometry. Simultaneous spectral data from two radiance sensors and one irradiance sensor are collected, with multiple spectra acquired for each sensor. (v) Filtering is applied to the collected spectral data from each sensor to eliminate residual sun glint, sky reflection components, and the influence of clouds and surrounding structures. (vi) Remote sensing reflectance (Rrs) is computed based on the processed data.
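Step (vi) can be sketched in a few lines. The sketch assumes the standard above-water formulation attributed to Mobley, Rrs = (Lt − ρ·Lsky)/Ed, applied per wavelength. The function name, the sample spectra, and the default ρ = 0.028 (a value commonly quoted for a 40°/135° viewing geometry under low wind) are assumptions, not values reported for this dataset.

```python
def remote_sensing_reflectance(l_t, l_sky, e_d, rho=0.028):
    """Compute above-water Rrs (sr^-1) per wavelength as (L_t - rho * L_sky) / E_d.

    l_t, l_sky: radiance spectra (W m^-2 sr^-1 nm^-1) on a shared wavelength grid.
    e_d: downwelling irradiance spectrum (W m^-2 nm^-1) on the same grid.
    rho: assumed sky-reflectance factor of the air-water interface.
    """
    if not (len(l_t) == len(l_sky) == len(e_d)):
        raise ValueError("all spectra must share the same wavelength grid")
    return [(lt - rho * ls) / ed for lt, ls, ed in zip(l_t, l_sky, e_d)]


# Hypothetical two-wavelength example (already filtered spectra).
rrs = remote_sensing_reflectance(
    l_t=[0.010, 0.020], l_sky=[0.05, 0.10], e_d=[1.0, 1.2]
)
print(rrs)
```

In practice ρ depends on viewing geometry, wind speed, and sky conditions, so a fixed default is only a starting point.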
Chl-a and WST measurement method
Chl-a and WST were measured using a water quality monitoring platform manufactured by YSI37. We specifically used the EXO1/EXO2 instrument model. This platform includes a multi-parameter device designed to monitor water quality parameters. The measurement of Chl-a concentration relies on chlorophyll fluorescence. When excited by light of a specific wavelength, chlorophyll emits light at a longer wavelength. The excitation light source for the instrument is a light-emitting diode with a peak wavelength of approximately 470 nm. Upon stimulation by this light source, chlorophyll within intact cells emits fluorescence within the 650–700 nm spectrum. The instrument quantifies the fluorescence signal to determine the chlorophyll concentration in the water. For WST measurements, we employed a thermistor or thermal resistance sensor, which ensures a high level of accuracy and stability. During the measurement, the instrument was placed 10–30 cm underwater. The measurement time was 1 minute, and the interval between successive measurements was 2–3 seconds. The average value of the multiple measurements was recorded as the Chl-a concentration and WST at the sampling point.
TSM measurement method
TSM was determined gravimetrically in the laboratory through the following steps38: (i) Water sampling: Standard glass containers (300–500 ml) were used to collect samples at 30–50 cm below the water surface, and each sample was labeled. (ii) Preparation of filters: The filter membranes were pre-rinsed with distilled water and then placed into weighing bottles. The weighing bottles containing the filter membranes were dried in an oven at 103–105 °C for 60 minutes. After the weighing bottles cooled to room temperature, their weights (B) were measured and recorded using a balance. (iii) Sample processing: A 100 ml portion of a well-mixed water sample was filtered using filter membranes and a vacuum pump. The weighing bottle with the filtered sample on the filter membrane was then placed into an oven set at 103–105 °C for 60 minutes. After the sample cooled to room temperature, its weight (A) was measured and recorded using a balance. (iv) TSM (mg/L) calculation: The water samples collected during the field experiment were returned to the laboratory for measurement, and the concentration was computed as shown in Eq. (2):
\[C=\frac{(A-B)\times {10}^{6}}{V}\quad (2)\]
where C is TSM concentration (mg/L); A (g) is the weight of TSM, filter membrane, and weighing bottle; B (g) is the weight of the filter membrane and weighing bottle; V (ml) represents the sample volume.
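The unit conversion implied by the definitions of A, B, and V can be checked numerically. The following is a minimal sketch; the function name and the example weights are hypothetical. The factor of 10^6 converts grams to milligrams (×1000) and millilitres to litres (×1000).

```python
def tsm_mg_per_l(a_g, b_g, volume_ml):
    """Return the TSM concentration in mg/L.

    a_g: weight (g) of dried residue + filter membrane + weighing bottle.
    b_g: weight (g) of filter membrane + weighing bottle before filtration.
    volume_ml: filtered sample volume (ml).
    """
    if volume_ml <= 0:
        raise ValueError("sample volume must be positive")
    # (g -> mg) * (ml -> L) gives the factor 1e6.
    return (a_g - b_g) * 1e6 / volume_ml


# Hypothetical weights: 2.5 mg of residue retained from a 100 ml sample.
print(tsm_mg_per_l(25.0035, 25.0010, 100.0))  # approximately 25 mg/L
```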
SDD measurement method
SDD measurements are conducted using the Secchi disk method39. The Secchi disk, typically a white or black-and-white disk with a diameter of approximately 30 cm, is employed to assess the transparency of lake waters. The SDD refers to the depth at which the disk is no longer visible to an observer when lowered into the water. This depth serves as a quantitative indicator of the water body’s transparency or vertical visibility. It is recommended to record the average of the two measurements taken at the bow and stern during the assessment.
Sensor corrections
TriOS calibration
The TriOS radiance and irradiance sensors were calibrated at wavelengths of 460 nm, 550 nm, and 640 nm for four integration times. The radiance source used for the integrating sphere was the USLR-V12F-NMNN model manufactured by Labsphere Inc. The irradiance sensor utilized the FEL-C Irradiance Standard (F-1703) manufactured by Optronic Laboratories. The results demonstrate excellent fitting outcomes for the radiance and irradiance sensors at the three specified wavelengths (Fig. 4). After one year of TriOS radiance and irradiance sensor operation, the calibration coefficient changes were examined, and no significant deviations were found.
YSI EXO1/EXO2 calibration
Chl-a and WST were measured using the YSI water quality parameter monitoring platform. The sensor calibration was performed before each measurement. We used a dual-channel fluorescence sensor for the Chl-a measurement and expressed the measurements in relative fluorescence units (RFU). The Chl-a values were derived from the correlation between Chl-a concentration and RFU. The RFU channel was calibrated using a two-point calibration method (Fig. 5)40. The instrument’s conductivity and temperature were directly calibrated for WST using standard conductivity solutions.
Fig. 5 Two-point calibration of Chl-a (a) and conductivity (b). x is the measured value before sensor calibration, and y is the value after sensor calibration. The linear relationship between the values before and after calibration is used to correct the sensor readings. For example, a Chl-a concentration measured by the instrument as 50 μg/L before calibration corresponds to 42.128 μg/L after calibration.
Chlorophyll calibration was performed using distilled water (0 μg/L) and water with a known chlorophyll concentration. A standard solution containing 625 μg/L of Rhodamine WT dye can be utilized as the reference solution with a known chlorophyll concentration. For the conductivity/temperature sensor, calibration is primarily required for conductivity, while temperature does not need calibration. Conductivity calibration is performed using standard conductivity solutions. The two-point calibration procedure is as follows: (i) First, immerse the sensor in distilled water and input 0 as the value for the first standard solution in the instrument. (ii) Next, place the sensor in the Rhodamine WT standard solution and observe the data readings. Accept this calibration point once the data becomes stable (or shows no significant changes for approximately 40 seconds). (iii) Complete the calibration process.
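The arithmetic behind a two-point calibration can be sketched as a straight line through the two reference points. This is an illustration only: the instrument performs the calibration internally, and the function names and all numeric readings below are hypothetical.

```python
def two_point_calibration(raw_zero, raw_standard, true_standard, true_zero=0.0):
    """Return (gain, offset) so that gain * raw + offset passes through both points.

    raw_zero: sensor reading in the zero standard (distilled water).
    raw_standard: sensor reading in the second standard solution.
    true_standard: accepted value assigned to the second standard.
    """
    if raw_standard == raw_zero:
        raise ValueError("the two raw readings must differ")
    gain = (true_standard - true_zero) / (raw_standard - raw_zero)
    offset = true_zero - gain * raw_zero
    return gain, offset


def apply_calibration(raw, gain, offset):
    """Map a raw sensor reading to its calibrated value."""
    return gain * raw + offset


# Hypothetical readings: 2.0 in distilled water, 52.0 in a standard assigned 40.0.
gain, offset = two_point_calibration(raw_zero=2.0, raw_standard=52.0, true_standard=40.0)
print(gain, offset, apply_calibration(27.0, gain, offset))
```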
Data quality control
Specific quality control measures must be implemented to ensure the generation of a high-quality dataset for water optical parameters. These measures encompass data matching, completeness checks, and quality control, all of which collectively contribute to producing a high-quality dataset.
1) Before quality control, all in situ data must be reviewed to ensure the collected data’s accuracy and proper matching. This review examines naming conventions, such as replacing abbreviations like “Lon” for longitude with “Longitude”. Additionally, units are standardized to ensure consistency and conformity to the appropriate unit of measurement. For example, for SDD measurements, “meters (m)” may be changed to “centimeters (cm)”.
2) Completeness checks are essential to ensure the usability of the in-situ dataset. These checks encompass metadata completeness, including OID, sampling time, sampling point coordinates, and other relevant information. Furthermore, the identification of missing or invalid values is crucial. Due to organizational differences, completeness checks are conducted separately for Rrs and water quality data. Initially, the completeness of metadata is assessed, and data with incomplete metadata is excluded. The presence of invalid or missing values is determined. The corresponding data point is flagged if a sampling point has multiple missing values for water optical parameters. Moreover, if a sampling point exhibits substantial missing data within the spectral range of 350–940 nm, the data point is also flagged.
3) Data matching ensures a one-to-one correspondence between Rrs and water optical parameter data. Spatial matching is conducted to exclude data that do not align spatially. This step ensures the consistency and reliability of the dataset.
4) Quality control: The reflectance spectra undergo a thorough quality control process to identify outliers and unrealistic spectral shapes. A set of quality control indicators (refer to Table 4) was utilized, derived from the GLORIA in-situ dataset24. These indicators addressed high-frequency variability, baseline shifts, and negative slopes, particularly in the ultraviolet-to-blue region of the spectrum. The QWIP indicator was employed to identify spectral shapes that deviated from the typical central tendencies observed in aquatic reflectance data. Reasonable threshold ranges for maximum and minimum values were established through a comprehensive literature review and extensive sampling experience. Hydrology and remote sensing specialists expertly evaluated spectra flagged as ‘Suspect’.
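The completeness checks in step 2 can be sketched as a small flagging routine. This is not the project's released code: the field names, flag names, and both thresholds (more than one missing parameter, more than 10% of the spectrum missing) are assumptions chosen for illustration, and the spectral-shape indicators of Table 4 are not reproduced.

```python
import math

# Hypothetical column names; the dataset's own keys may differ.
REQUIRED_METADATA = ("OID", "Time", "Longitude", "Latitude")
PARAMETERS = ("Chl-a", "TSM", "SDD", "WST")


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def completeness_flags(record, rrs, max_missing_params=1, max_missing_fraction=0.1):
    """Return a list of hypothetical flag names for one sampling point.

    record: dict holding metadata and water parameters.
    rrs: list of Rrs values across the spectral range.
    """
    flags = []
    if any(is_missing(record.get(key)) for key in REQUIRED_METADATA):
        flags.append("incomplete_metadata")
    n_missing = sum(is_missing(record.get(key)) for key in PARAMETERS)
    if n_missing > max_missing_params:
        flags.append("missing_parameters")
    missing_fraction = (
        sum(is_missing(v) for v in rrs) / len(rrs) if rrs else 1.0
    )
    if missing_fraction > max_missing_fraction:
        flags.append("incomplete_spectrum")
    return flags


good = {"OID": 1, "Time": "2021-07-15 10:40", "Longitude": 100.2, "Latitude": 36.9,
        "Chl-a": 3.1, "TSM": 5.0, "SDD": 120.0, "WST": 18.4}
print(completeness_flags(good, [0.004, 0.006, 0.005]))  # no flags expected
```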
Table 4 Quality control flag names for Rrs and water optical parameters.
The quality control identified 551 high-quality in-situ records, all of which remain unflagged. These records encompass essential information, including spatial details such as longitude and latitude, temporal details such as sampling time, and water parameter information such as Chl-a, TSM, SDD, and WST. Table 5 presents the preliminary statistics for these data.
Data Records
The main tasks have been completed, including lake selection, sampling design, data collection, sensor calibration, and data archiving. All data have undergone consistency checks and quality control to flag suspicious or erroneous data.
This dataset is hosted in the Zenodo database, an open-access scientific data storage and sharing platform41. The data is contained in several comma-separated values (CSV) files, and a Microsoft Excel file provides keys to column names (see Table 6).
Technical Validation
Validation of water remote sensing reflectance
To verify the usability of the Rrs data, this experiment compares the Rrs data with satellite images. The Rrs data are converted to the equivalent image value of a synchronous satellite42. First, water reflectance (Rw) is calculated as:
\[{R}_{w}(\lambda )=\pi \,{R}_{rs}(\lambda )\quad (3)\]
In-situ Rw spectra were then weighted by the spectral response functions of the sensor bands, thus deriving multi-spectral data comparable to atmospherically corrected sensor reflectance from the image:
\[{R}_{{eq}}({\lambda }_{i})=\frac{\int {R}_{w}(\lambda )\,{F}_{i}(\lambda )\,d\lambda }{\int {F}_{i}(\lambda )\,d\lambda }\quad (4)\]
where \({R}_{{eq}}\left({\lambda }_{i}\right)\) is the sensor reflectance simulated from in-situ reflectance (water equivalent spectrum), and Fi represents the spectral response function of the i th band of the satellite sensor.
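The band weighting can be sketched numerically. The sketch assumes that water reflectance is obtained as Rw = π·Rrs and that the band-equivalent value is the response-weighted mean of Rw, integrated with the trapezoidal rule. The function names and the toy spectra are hypothetical; real spectral response functions come from the sensor operator.

```python
import math


def trapezoid(y, x):
    """Integrate y over x with the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


def band_equivalent_reflectance(wavelengths, rrs, srf):
    """Return the response-weighted mean of R_w = pi * Rrs for one sensor band.

    wavelengths: wavelength grid (nm), increasing.
    rrs: in-situ Rrs (sr^-1) on that grid.
    srf: spectral response of the band on the same grid.
    """
    if not (len(wavelengths) == len(rrs) == len(srf)):
        raise ValueError("inputs must share the same wavelength grid")
    r_w = [math.pi * r for r in rrs]
    weighted = [rw * f for rw, f in zip(r_w, srf)]
    denominator = trapezoid(srf, wavelengths)
    if denominator == 0:
        raise ValueError("spectral response integrates to zero")
    return trapezoid(weighted, wavelengths) / denominator


# Toy example: a triangular response centred at 560 nm.
wl = [540.0, 550.0, 560.0, 570.0, 580.0]
rrs = [0.010, 0.012, 0.014, 0.012, 0.010]
srf = [0.0, 0.5, 1.0, 0.5, 0.0]
print(band_equivalent_reflectance(wl, rrs, srf))
```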
Remote sensing images are obtained from satellites that transit simultaneously with the ground sampling time. The Sentinel-2 data are from the Copernicus Data Space Ecosystem (https://dataspace.copernicus.eu/), and the Landsat 8 data are from the United States Geological Survey (https://earthexplorer.usgs.gov/). GF and HJ satellite data are downloaded from the China Center for Resources Satellite Data and Application (https://data.cresda.cn/#/2dMap). To obtain the surface reflectance (SR), the Sentinel-2 and Landsat 8 data were atmospherically corrected using ACOLITE DSF, and the GF and HJ data were atmospherically corrected using FLAASH.
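The Pearson correlation coefficient (r), the mean absolute percentage error (MAPE, %) and the root mean square error (RMSE, sr⁻¹) are used to evaluate the deviations between Req and the remote sensing image value:
\[r=\frac{{\sum }_{i=1}^{N}({M}_{i}-\bar{M})({E}_{i}-\bar{E})}{\sqrt{{\sum }_{i=1}^{N}{({M}_{i}-\bar{M})}^{2}}\sqrt{{\sum }_{i=1}^{N}{({E}_{i}-\bar{E})}^{2}}}\quad (5)\]
\[{MAPE}=\frac{100 \% }{N}\mathop{\sum }\limits_{i=1}^{N}\left|\frac{{E}_{i}-{M}_{i}}{{M}_{i}}\right|\quad (6)\]
\[{RMSE}=\sqrt{\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}{({E}_{i}-{M}_{i})}^{2}}\quad (7)\]
where N is the total number of measurements; i represents a single measurement; M and E are measured values and estimated values, respectively; and \(\bar{M}\) and \(\bar{E}\) are their means.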
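The three metrics can be computed with the standard library alone. The sketch below uses the usual textbook definitions, with M as measured and E as estimated values; the function names and the sample numbers are illustrative and not taken from the dataset.

```python
import math


def pearson_r(measured, estimated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_m * var_e)


def mape(measured, estimated):
    """Mean absolute percentage error (%), relative to the measured values."""
    n = len(measured)
    return 100.0 / n * sum(abs((e - m) / m) for m, e in zip(measured, estimated))


def rmse(measured, estimated):
    """Root mean square error, in the units of the inputs."""
    n = len(measured)
    return math.sqrt(sum((e - m) ** 2 for m, e in zip(measured, estimated)) / n)


# Illustrative values only.
m = [1.0, 2.0, 3.0, 4.0]
e = [1.1, 1.9, 3.2, 3.8]
print(pearson_r(m, e), mape(m, e), rmse(m, e))
```

Note that MAPE is undefined when a measured value is zero, which matters for near-zero reflectance in the near-infrared bands.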
We verified the correlation and accuracy of each band between the water equivalent spectrum and satellite image values using six lakes in different regions of China (Fig. 6 & Table 7). The satellites utilized for validation encompass Landsat, Sentinel, and China’s HJ and GF satellites, ensuring a highly representative validation. The SR data of the satellite and the water equivalent spectrum are evenly distributed on both sides of the 1:1 line, indicating that the satellite data can monitor the water-leaving spectral radiance. However, in some lakes, the SR data of the satellite and the water equivalent spectrum do not show an obvious correlation. For example, in Fig. 6(c,f), the SR data of the satellite and the water equivalent spectrum show a weak correlation and negative correlation, indicating inconsistency between the satellite data and the water spectrum data. In addition, the Sentinel-2 data exhibit weaker consistency with the water equivalent spectrum data in Bands 6, 7, and 8 in Fig. 6(a). This discrepancy can be attributed to the low remote sensing reflectivity of the water spectrum in the near-infrared band. These situations will cause significant uncertainty in the retrieval of water quality parameters from remote sensing satellites.
Remote sensing inversion of water optical parameters
The in-situ data collected from the lakes in China can be effectively employed for remote sensing retrieval of water optical parameters. To assess the applicability of this dataset, a water optical parameter retrieval model was constructed using simulated sensor reflectance derived from in-situ reflectance (Eq. 4) and water optical parameters (Chl-a, TSM, and SDD)43 as exemplified through GF1 PMS and Sentinel-2 MSI. The retrieval model uses the simulated PMS and MSI sensor reflectance based on in-situ reflectance as training data. The remote sensing retrieval of water optical parameters was performed using a random forest regression algorithm. We employed the RandomForestRegressor model from the Scikit-learn library in Python. The model is based on ensemble learning and utilizes the random forest algorithm to fit and predict continuous target variables. The model parameter configuration is set as n_estimators=500, max_depth=10, random_state=3. The training and validation datasets consist exclusively of quality-controlled in-situ data. The independent variable is the reflectance of the PMS sensor (Band 1–4) and MSI sensor (Band 2, 3, 4 and 8) at different spectral bands (blue, green, red, and near-infrared). The dependent variable is the corresponding water optical parameter. The data were divided into a training dataset (385 samples) and a testing dataset (166 samples) in a 7:3 ratio.
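The model configuration described above can be sketched with scikit-learn. The model parameters (n_estimators=500, max_depth=10, random_state=3) and the 385/166 split sizes come from the text. Everything else is an assumption: the synthetic reflectance and target values stand in for the real data, and the seed used for the split is not stated in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 551 quality-controlled samples with four bands
# (blue, green, red, near-infrared); real inputs would be simulated sensor reflectance.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.1, size=(551, 4))
# Synthetic target loosely tied to a band ratio, plus noise (illustration only).
y = 50.0 * X[:, 1] / (X[:, 0] + 0.01) + rng.normal(0.0, 0.5, size=551)

# 166 test samples leaves 385 for training, matching the 7:3 split in the text.
# The split seed is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=166, random_state=3
)

# Model parameters as stated in the text.
model = RandomForestRegressor(n_estimators=500, max_depth=10, random_state=3)
model.fit(X_train, y_train)

pred = model.predict(X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
r = float(np.corrcoef(pred, y_test)[0, 1])
print(f"test RMSE = {rmse:.3f}, r = {r:.3f}")
```

One such model would presumably be fitted per water parameter and per sensor, since each combination has its own target variable and band set.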
To verify the effectiveness of remote sensing retrieval for Chl-a, TSM, and SDD, the retrieval results of PMS and MSI are presented in Fig. 7. For both sensors, the training and validation correlations of the retrieval models for the three water optical parameters exceeded 80%, except for the Chl-a validation results of GF1 PMS. Furthermore, the correlations for SDD in the validation dataset surpass 90%. Additionally, because the spectral response functions of the two sensors are relatively similar in the visible to near-infrared range, the difference in root mean square error (RMSE) between the retrieval results of the two sensors is minimal, within 5%. These results indicate a high accuracy in retrieving the three water optical parameters and a strong correlation between water optical parameters and sensor reflectance. Overall, the findings demonstrate the availability and effectiveness of both in-situ and remote sensing data in water optical parameter retrieval.
Usage Notes
Quality flags
Every record in the dataset is accompanied by a quality flag, and the interpretation of the quality flag can be found in the “Variable” file. Missing values for water optical parameters were designated as outliers.
Data update
The dataset will undergo regular updates with the project’s ongoing support, ensuring the latest information is included. Any updates to the dataset will be thoroughly documented and described in the Zenodo database, providing a comprehensive account of the changes made.
Code availability
The code to conduct the quality flagging described in the “Technical validation” section is written in Python and is available on Zenodo and at https://github.com/VolunteeredGI/Lake-in-situ-dataset-code.
References
Gholizadeh, M. H., Melesse, A. M. & Reddi, L. A Comprehensive Review on Water Quality Parameters Estimation Using Remote Sensing Techniques. Sensors. 16, 1298 (2016).
Bhateria, R. & Jain, D. Water quality assessment of lake water: a review. Sustain. Water Resour. Manag. 2, 161–173 (2016).
Zhang, Y. et al. A study of absorption characteristics of chromophoric dissolved organic matter and particles in Lake Taihu, China. Hydrobiologia. 592, 105–120 (2007).
Woolway, R. I. et al. Global lake responses to climate change. Nature Reviews Earth & Environment. 1, 388–403 (2020).
Stefan, H. G., Cardoni, J. J., Schiebe, F. R. & Cooper, C. M. Model of light penetration in a turbid lake. Water Resources Research. 19, 109–120 (1983).
Sharpley, A. N. et al. Managing Agricultural Phosphorus for Protection of Surface Waters: Issues and Options. Journal of Environmental Quality. 23, 437–451 (1994).
Qi, C., Huang, S. & Wang, X. Monitoring water quality parameters of Taihu Lake based on remote sensing images and LSTM-RNN. IEEE Access. 8, 188068–188081 (2020).
Wang, S. et al. A dataset of remote-sensed Forel-Ule Index for global inland waters during 2000–2018. Scientific Data. 8, 26 (2021).
Tortini, R. et al. Satellite-based remote sensing data set of global surface water storage change from 1992 to 2018. Earth System Science Data. 12, 1141–1151 (2020).
Smith, B. et al. A chlorophyll-a algorithm for Landsat-8 based on mixture density networks. Frontiers in Remote Sensing. 1, 623678 (2021).
Jiang, D. et al. Remotely estimating total suspended solids concentration in clear to extremely turbid waters using a novel semi-analytical method. Remote Sensing of Environment. 258, 112386 (2021).
Liu, Y., Fan, J.-P. & Jiang, H. Evaluation of Parametric and Nonparametric Algorithms for the Estimation of Suspended Particulate Matter in Turbid Water using Gaofen-1 Wide Field-of-view Sensors. J Indian Soc Remote Sens. 49, 2673–2687 (2021).
Pahlevan, N. et al. Seamless retrievals of chlorophyll-a from Sentinel-2 (MSI) and Sentinel-3 (OLCI) in inland and coastal waters: A machine-learning approach. Remote Sensing of Environment. 240, 111604 (2020).
Wang, Q. et al. Long-term remote sensing of total suspended matter using Landsat series sensors in Hulun Lake, China. International Journal of Remote Sensing. 42, 1379–1397 (2021).
Song, K. et al. A unified model for high resolution mapping of global lake (>1 ha) clarity using Landsat imagery data. Science of The Total Environment. 810, 151188 (2022).
Yang, H. et al. A review of remote sensing for water quality retrieval: Progress and challenges. Remote Sensing 14(8), 1770 (2022).
Dube, T., Mutanga, O., Seutloali, K., Adelabu, S. & Shoko, C. Water quality monitoring in sub-Saharan African lakes: a review of remote sensing applications. African Journal of Aquatic Science. 40, 1–7 (2015).
Werdell, P. J., Fargion, G. S., McClain, C. R. & Bailey, S. W. The SeaWiFS bio-optical archive and storage system (SeaBASS): Current architecture and implementation. No. NASA/TM-2002-211617 (2002).
Nechad, B. et al. CoastColour Round Robin data sets: a database to evaluate the performance of algorithms for the retrieval of water quality parameters in coastal waters. Earth system science data 7(2), 319–348 (2015).
Valente, A. et al. A compilation of global bio-optical in situ data for ocean-colour satellite applications–version three. Earth System Science Data Discussions 2022, 1–61 (2022).
Khandelwal, A. et al. ReaLSAT, a global dataset of reservoir and lake surface area variations. Sci. Data. 9, 356 (2022).
Liu, J. et al. A dataset of lake-catchment characteristics for the Tibetan Plateau. Earth System Science Data. 14, 3791–3805 (2022).
Ross, M. R. et al. AquaSat: A Data Set to Enable Remote Sensing of Water Quality for Inland Waters. Water Resources Research. 55, 10012–10025 (2019).
Lehmann, M. K. et al. GLORIA-A globally representative hyperspectral in situ dataset for optical sensing of water quality. Scientific data 10(1), 100 (2023).
Sharaf, N. et al. A long-term dataset of simulated epilimnion and hypolimnion temperatures in 401 French lakes (1959–2020). Earth System Science Data Discussions 2023, 1–16 (2023).
Lin, J. et al. An extensive spatiotemporal water quality dataset covering four decades (1980–2022) in China. Earth System Science Data 16(2), 1137–1149 (2024).
Yin, Z. et al. Water clarity changes in Lake Taihu over 36 years based on Landsat TM and OLI observations. International Journal of Applied Earth Observation and Geoinformation. 102, 102457 (2021).
Harvey, E. T., Kratzer, S. & Philipson, P. Satellite-based water quality monitoring for improved spatial and temporal retrieval of chlorophyll-a in coastal waters. Remote Sensing of Environment. 158, 417–430 (2015).
Bilotta, G. S. & Brazier, R. E. Understanding the influence of suspended solids on water quality and aquatic biota. Water research. 42, 2849–2861 (2008).
Swift, T. J. et al. Water clarity modeling in Lake Tahoe: Linking suspended matter characteristics to Secchi depth. Aquat. Sci. 68, 1–15 (2006).
Dokulil, M. T. et al. Increasing maximum lake surface temperature under climate change. Climatic Change. 165, 56 (2021).
Standardization Administration of the P.R.C. Environmental Quality Standards for Surface Water (GB3838-2002), Beijing, Standards Press of China https://www.mee.gov.cn/ywgz/fgbz/bz/bzwb/shjbh/shjzlbz/200206/W020061027509896672057.pdf (2002).
Zhai, M. et al. Water Multi-Parameter Sampling Design Method Based on Adaptive Sample Points Fusion in Weighted Space. Remote Sensing. 14, 2780 (2022).
Cui, Y. et al. Global water cycle and remote sensing big data: Overview, challenge, and opportunities. Big Earth Data 2(3), 282–297 (2018).
Mobley, C. D. Estimation of the remote-sensing reflectance from above-surface measurements. Applied optics 38(36), 7442–7455 (1999).
Mobley, C. D. Polarized reflectance and transmittance properties of windblown sea surfaces. Appl. Opt. 54, 4828 (2015).
Yellow Springs Incorporated. The Basics of Chlorophyll Measurement, p 2., accessed April 2, 2024 at https://www.ysi.com/File%20Library/Documents/Technical%20Notes/T606-The-Basics-of-Chlorophyll-Measurement.pdf.
Ministry of Ecology and Environment of the People’s Republic of China. Water quality-Determination of suspended substance-Gravimetric method (GB 11901-89), Beijing, Standards Press of China https://www.mee.gov.cn/image20010518/3723.pdf (1990).
Lee, Z. et al. Secchi disk depth: A new theory and mechanistic model for underwater visibility. Remote Sensing of Environment. 169, 139–149 (2015).
Yellow Springs Incorporated. EXO User Manual, p128–132, accessed April 2, 2024 at https://www.ysi.com/file%20library/documents/manuals/exo-user-manual-web.pdf.
Zhai, M. et al. Satellite-ground synchronous in-situ dataset of water optical parameters and surface temperature for typical lakes in China. Zenodo https://doi.org/10.5281/zenodo.10434390 (2024).
Martins, V. S. et al. Assessment of atmospheric correction methods for Sentinel-2 MSI images applied to Amazon floodplain lakes. Remote Sensing 9(4), 322 (2017).
Zhai, M. et al. Retrieve of total suspended matter in typical lakes in China based on broad bandwidth satellite data: Random Forest model with Forel-Ule Index. Frontiers in Environmental Science 11, 1132346 (2023).
Zhang, X. et al. GLC_FCS30: Global land-cover product with fine classification system at 30 m using time-series Landsat imagery. Earth System Science Data 13(6), 2753–2776 (2021).
Acknowledgements
We would like to acknowledge Yong Xie and Wen Shao for their support in data collection. We sincerely acknowledge funding from the National Key Research and Development Program of China (Grant No. 2023YFB3905800) and The Common Application Support Platform for Land Observation Satellites of China’s Civil Space Infrastructure.
Author information
Contributions
Mingjian Zhai: conceptualization, data filtering, and writing-original draft & editing; Xiang Zhou: supervision, formal analysis, and investigation; Zui Tao: conceptualization, validation and writing-review; Yong Xie: supervision; Jian Yang: project administration; Wen Shao: data collection, software; Hongming Zhang: visualization; Tingting Lv: writing-review & editing.
Ethics declarations
Competing interests
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhai, M., Zhou, X., Tao, Z. et al. Satellite-ground synchronous in-situ dataset of water optical parameters and surface temperature for typical lakes in China. Sci Data 11, 883 (2024). https://doi.org/10.1038/s41597-024-03704-3
DOI: https://doi.org/10.1038/s41597-024-03704-3









