A non-optically active lake salinity dataset by satellite remote sensing

Deng, Mingming; Ma, Ronghua; Wang, Lixin; Hu, Minqi; Xue, Kun; Cao, Zhigang; Xiong, Junfeng; Yu, Zhengyang

doi:10.1038/s41597-025-05686-2


Data Descriptor
Open access
Published: 30 July 2025


Scientific Data volume 12, Article number: 1324 (2025)


Abstract

Water salinity characterizes the physicochemical properties of natural water and is an essential parameter for assessing lake water quality. However, because salinity is a non-optically active parameter, remote sensing inversion of water salinity is inefficient, and no pixel-scale lake salinity dataset has been available. Conventional function models based on salinity tracers or single lakes have low regional applicability, whereas machine learning algorithms can effectively capture the nonlinear relationship between radiance and salinity, which opens opportunities for large-scale inversion. Our study constructed an extreme gradient boosting (XGB) salinity model, which was used to generate the Inner Mongolia lake salinity (IMSAL) dataset from Sentinel-2 remote sensing reflectance. The IMSAL dataset contains 928 raster scenes with 10-meter spatial resolution for eight lakes from 2016 to 2024. Cross-validation and independent validation against measured and literature-recorded salinities confirmed good consistency and reliability. The dataset provides information on spatial patterns and long-term variations in lake salinity that is useful for preventing lake salinization and supporting lake management for sustainable ecosystem development.

Background & Summary

Water salinity is an essential parameter for evaluating water quality and is crucial to the stability of lake ecosystems^1,2. It affects the biological, physical, and chemical processes of lake ecosystems, including the survival and distribution of aquatic organisms, the utilization of water resources, and the carbon cycle of lakes^3,4,5. Lake salinity is highly sensitive in space and time to climate change and anthropogenic activity^6,7, especially in arid and semi-arid regions^8,9. However, field surveys do not provide adequate spatial detail, which makes it difficult to monitor salinity continuously.

Remote sensing acquires lake water quality information frequently and extensively in various bands of the electromagnetic spectrum, effectively compensating for the sparse spatial coverage and low sampling frequency of field surveys¹⁰. The Multi-Spectral Instrument (MSI) on board the Sentinel-2 satellites has high spatial resolution (10‒60 m), a short revisit period (5 days with the twin satellites), and considerable radiometric resolution (12 bit) with an enhanced signal-to-noise ratio at 443 nm¹¹. MSI images offer detailed spatial characterization that is useful for monitoring water constituents in small inland lakes. They have been widely used to estimate the concentrations of optically active constituents (OACs) and non-OACs, such as colored dissolved organic matter (CDOM), chlorophyll a (Chla, μg/L), total nitrogen, and total phosphorus^12,13,14. Lake salinity is a non-OAC: it has no direct color signal and a complicated nonlinear relationship with remote sensing reflectance R_rs(λ) (units sr⁻¹)¹⁵. The CDOM absorption coefficient (a_g(λ), m^–1) commonly serves as a tracer of salinity in estuaries, but the correlation between a_g(λ) and salinity may not be significant in inland lakes^16,17. In addition, previous studies have attempted to use Secchi Disk Depth (SDD, m) as an indirect parameter to retrieve lake salinity on the Tibetan Plateau¹⁸. However, such indirect inversion relies on the accuracy of intermediate parameters, which are region-specific and complicated to retrieve and which inevitably introduce errors. To minimize this error propagation, machine learning (ML) algorithms have been used to model the nonlinear relationship between R_rs(λ) and salinity through their intricate structures and networks.
ML is an innovative way to mine information on non-OACs from satellite images and has proved effective in inland waters, for example in estimating dissolved oxygen, particulate organic carbon, and dissolved organic carbon concentrations^19,20,21. Unlike other non-OACs, regional-scale lake salinity estimation has long been challenged by diverse climatic and geographic conditions, because the complexity of lake hydrology, ion composition, and salt sources produces salinities spanning multiple orders of magnitude. Furthermore, the high variability of Chla, a_g(λ), and suspended particulate matter (SPM, mg/L) causes optical complexity in inland waters and makes a uniform salinity algorithm hard to develop^17,22. Therefore, non-optically active salinity, which varies from place to place, needs to be monitored effectively by remote sensing, especially in arid and semi-arid areas where highly dynamic changes occur.

Inner Mongolia lies in a typical arid and semi-arid transition zone that crosses most climatic regions of China from east to west (Fig. 1) and hosts various lake types, including freshwater (<1 g/L), brackish (1–3 g/L), and oligosaline (3–35 g/L) lakes (Table 1)²³. These lakes are critical shields for ecological security in northern China and provide valuable water resources to maintain ecosystem functions^24,25. However, no long-term satellite monitoring of lake salinity exists for the region, so its spatiotemporal distribution is hard to reveal. Consequently, the Inner Mongolia lake salinity (IMSAL) dataset for eight lakes over the past nine years was created from Sentinel-2 MSI data with the XGB algorithm. It offers multi-scale (daily, annual, intra-annual seasonal, and seasonal) average records and pixel-scale spatial distributions of lake salinity. The purpose of this research is to (1) construct a 10 m spatial resolution salinity dataset from MSI images, (2) provide a dataset paradigm that other lakes can follow to generate salinity datasets, and (3) share long-term salinity data to facilitate lake management and prevent lake salinization.

Table 1 Statistics of water quality parameters of sampled lakes.


Methods

The IMSAL dataset contains salinity raster data from 2016 to 2024 for eight lakes in Inner Mongolia. Figure 2 gives an overview of the workflow for salinity retrieval and dataset generation, which consists of three major modules. Module one is data collection and preprocessing of in situ and MSI data, including field surveys, atmospheric correction, water extraction, and index construction. Module two is the construction, testing, and five-fold cross-validation (CV) of the XGB salinity retrieval model. Module three is an independent validation using the latest in situ and literature-recorded data.

Field data collection

Field data were obtained through uniformly distributed in situ sampling and from previous research on the eight lakes between July 2017 and October 2024. All data were collected in spring (March to May), summer (June to August), and autumn (September to November); winter data are lacking because of ice cover. A total of 318 records were collected: 229 from in situ measurements and 89 from published literature^{26,27,28,29,30,31,32,33,34,35,36,37}. Usage details for the literature-recorded salinity are shown in Table 2. These data were divided into three datasets. Dataset one contains 229 data pairs for model training, testing, and five-fold CV, with a matching interval of ±3 days between field data and MSI images. Dataset two consists of the 69 latest field-measured and literature-recorded salinity values, which were used for independent validation of the IMSAL dataset. Dataset three comprises 62 in situ samples collected concurrently with Sentinel-2 overpasses (±6 hours), with measured parameters including spectra, salinity (ppt), Chla, a_g(λ), SPM, and SDD; these samples were used to assess R_rs(λ). Salinity was measured at the lake surface (depth <0.5 m) with a calibrated YSI water quality meter (YSI, Inc.). Surface water samples were stored in a lightproof box and returned to the laboratory immediately at the end of each cruise to measure Chla, a_g(λ), and SPM. Chla and a_g(λ) were measured with a Shimadzu UV2700 spectrophotometer (Shimadzu, Inc.)^38,39. SPM was measured using the pre-combustion and weighing method⁴⁰. A Spectral Evolution PSR-1100f (350–1050 nm) was used to measure the total water-leaving radiance (L_sw), the sky radiance (L_sky), and the radiance of the reference gray panel (L_p), from which R_rs(λ) was calculated with the following formula⁴¹:

$${R}_{{rs}}\left({\rm{\lambda }}\right)=\frac{\left({{\rm{L}}}_{{\rm{sw}}}-{\rm{\rho }}\times {{\rm{L}}}_{{\rm{sky}}}\right)\times {{\rm{\rho }}}_{{\rm{p}}}}{\pi \times {{\rm{L}}}_{{\rm{p}}}},$$

(1)

where ρ is the Fresnel reflectance of the water surface, set to 0.028⁴², and ρ_p is the reflectance of the gray panel (30%). Lastly, the measured R_rs(λ) was convolved with the Spectral Response Function (SRF) of the MSI sensor to obtain R_rs(λ) at each band's center wavelength.
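
As an illustration only, Eq. (1) and the band convolution can be sketched in Python. The function names, the dictionary layout, the radiance values, and the triangular SRF below are assumptions made for the example; the real SRF values come from the MSI sensor documentation, and the convolution is written here as a simple SRF-weighted mean.

```python
import math

RHO_SKY = 0.028   # Fresnel reflectance of the water surface (value from the text)
RHO_PANEL = 0.30  # reflectance of the gray reference panel (30%, from the text)


def rrs_from_radiances(l_sw, l_sky, l_p, rho=RHO_SKY, rho_p=RHO_PANEL):
    """Eq. (1): R_rs = (L_sw - rho * L_sky) * rho_p / (pi * L_p), in sr^-1."""
    return (l_sw - rho * l_sky) * rho_p / (math.pi * l_p)


def band_average(rrs_spectrum, srf):
    """SRF-weighted mean of a measured R_rs spectrum for one sensor band.

    Both arguments are dicts keyed by wavelength in nm. Wavelengths that are
    missing from the measured spectrum are skipped.
    """
    common = [wl for wl in srf if wl in rrs_spectrum]
    weight_sum = sum(srf[wl] for wl in common)
    return sum(rrs_spectrum[wl] * srf[wl] for wl in common) / weight_sum


# Hypothetical radiances (L_sw, L_sky, L_p) at three wavelengths near 560 nm.
radiances = {
    559: (0.80, 4.0, 9.0),
    560: (0.82, 4.0, 9.0),
    561: (0.81, 4.0, 9.0),
}
spectrum = {wl: rrs_from_radiances(*values) for wl, values in radiances.items()}

# Made-up triangular SRF; not the real MSI response function.
rrs_560 = band_average(spectrum, {559: 0.5, 560: 1.0, 561: 0.5})
```

Normalizing by the sum of the SRF weights keeps the band value in the same units (sr⁻¹) as the measured spectrum.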

Table 2 Lakes with recorded salinity field data from published papers for model training or independent cross-validation (CV).


Sentinel-2 MSI image derived Rrs(λ)

A total of 976 Sentinel-2 MSI Level-1C images acquired between March 2016 and October 2024 were downloaded from the Copernicus Data Space Ecosystem (https://dataspace.copernicus.eu/). Each MSI tile covers approximately 100 × 100 km²; the satellites orbit at an altitude of 786 km and pass over at 10:30 local time to minimize the effects of sunglint and cloud cover⁴³. For each year, the available spring, summer, and autumn images acquired under cloudless or low-cloud (<10%) conditions were downloaded. The downloaded MSI images were atmospherically corrected using the dark spectrum fitting (DSF) algorithm integrated into the Atmospheric Correction for OLI lite (ACOLITE) processor to derive R_rs(λ). The DSF algorithm was designed for atmospheric correction over coastal and inland waters and has been widely used^44,45. The MSI-derived R_rs(λ) contains 11 bands resampled to a spatial resolution of 10 m.

Lake water extraction

Lake outline data from the Lake-Watershed Science Data Center (http://lake.geodata.cn) were used as initial boundaries for water extraction. The dataset contains accurate lake names, locations, and area properties. The vector lake outlines were interpreted from China Brazil Earth Resources Satellite (CBERS) and Landsat-5 images, in combination with digital elevation maps and lake chronicles⁴⁶. The lake outlines were archived on the Google Earth Engine (GEE) platform and matched with Sentinel-2 images, and water was extracted using the normalized difference water index (NDWI, Eq. (2)) and the Otsu method⁴⁷.

$${\rm{NDWI}}=\left({\rm{G}}-{\rm{NIR}}\right)/\left({\rm{G}}+{\rm{NIR}}\right),$$

(2)

where G is the green band (560 nm) and NIR is the near-infrared band (842 nm). Accurate estimation of lake salinity is difficult in waters with aquatic plants, because their optical properties differ from those of open water and tend to cause high reflectance in the infrared bands. Therefore, the Floating Algae Index (FAI, threshold of 0.03, Eq. (3)) was additionally used to exclude waters with aquatic plants in grass-type lakes (Ulansuhai Lake)⁴⁸.

$${\rm{FAI}}={\rm{NIR}}-{\rm{R}}-\left[\left({\rm{SWIR}}-{\rm{R}}\right)\times \left({{\rm{\lambda }}}_{{\rm{NIR}}}-{{\rm{\lambda }}}_{{\rm{R}}}\right)/\left({{\rm{\lambda }}}_{{\rm{SWIR}}}-{{\rm{\lambda }}}_{{\rm{R}}}\right)\right]$$

(3)

where, for the MSI sensor, the bands used to calculate the FAI were the red band (R, 664 nm), the NIR band, and the short-wave infrared band (SWIR, 1610 nm), with λ denoting the corresponding wavelength.
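
The two indices and the thresholding step can be sketched as follows. This is a minimal illustration and not the GEE workflow used in the study: the Otsu routine is a plain histogram implementation with an arbitrary bin count, the function names are hypothetical, and treating pixels with FAI above 0.03 as aquatic vegetation is our reading of the threshold stated above.

```python
LAMBDA_R, LAMBDA_NIR, LAMBDA_SWIR = 664.0, 842.0, 1610.0  # nm, from the text
FAI_THRESHOLD = 0.03  # from the text


def ndwi(green, nir):
    """Eq. (2): normalized difference water index."""
    return (green - nir) / (green + nir)


def fai(red, nir, swir):
    """Eq. (3): height of the NIR band above the red-SWIR baseline."""
    baseline_rise = (swir - red) * (LAMBDA_NIR - LAMBDA_R) / (LAMBDA_SWIR - LAMBDA_R)
    return nir - red - baseline_rise


def otsu_threshold(values, bins=64):
    """Threshold that maximizes the between-class variance of a 1-D sample."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # a constant sample has no meaningful split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_t, best_var, w0, sum0 = lo, -1.0, 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width
    return best_t


def is_open_water(green, red, nir, swir, ndwi_cut):
    """Water by NDWI, excluding pixels flagged as aquatic vegetation by FAI."""
    return ndwi(green, nir) > ndwi_cut and fai(red, nir, swir) <= FAI_THRESHOLD
```

In use, `otsu_threshold` would be applied to the NDWI values of all pixels inside a lake outline, and the resulting cut passed to `is_open_water` pixel by pixel.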

In addition, because bottom reflection and the adjacency effects of mixed pixels can cause incorrect salinity estimates in optically shallow waters, a two-pixel buffer was created to minimize these influences⁴⁹. In post-processing, small patches (<3 pixels) were removed and the lake area was then calculated in ArcGIS 10.2 (WGS1984_UTM projection).
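
A minimal sketch of this post-processing on a binary water mask is given below. It assumes that the buffer is applied inward from the shoreline and that patches are defined by 4-connectivity; neither detail is specified in the text, and the study performed these steps in ArcGIS rather than in Python.

```python
from collections import deque


def erode(mask, pixels=2):
    """Shrink a binary water mask inward by `pixels` (4-neighbour erosion)."""
    rows, cols = len(mask), len(mask[0])
    for _ in range(pixels):
        out = [[False] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if not mask[r][c]:
                    continue
                neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                # Pixels on the image edge are treated as touching non-water.
                out[r][c] = all(
                    0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]
                    for rr, cc in neighbours
                )
        mask = out
    return mask


def drop_small_patches(mask, min_pixels=3):
    """Remove 4-connected water patches with fewer than `min_pixels` pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            patch, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first search over one patch
                y, x = queue.popleft()
                patch.append((y, x))
                for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= yy < rows and 0 <= xx < cols
                    if inside and mask[yy][xx] and not seen[yy][xx]:
                        seen[yy][xx] = True
                        queue.append((yy, xx))
            if len(patch) >= min_pixels:
                for y, x in patch:
                    out[y][x] = True
    return out


def water_area_km2(mask, pixel_size_m=10.0):
    """Area of the water pixels for square pixels of the given size."""
    return sum(map(sum, mask)) * pixel_size_m ** 2 / 1e6
```

For a 10 m raster, each retained pixel contributes 100 m² to the lake area.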

This study generated water coverage frequency maps and detected the change rate of lake area with the Mann-Kendall (M-K) test for the eight lakes from 2016 to 2024⁵⁰ (Fig. 3 and Table 3); the M-K test was implemented in Python. The constant-water frequency threshold was set to 95% to minimize errors due to cloud cover over the lake surface. The water coverage of Daihai, Hongjiannao, Juyan, and Ulansuhai lakes fluctuated more strongly than that of the other lakes (percentage of constant water ≤ 65%); notably, the area of Daihai Lake shrank significantly over the past nine years (change rate: −1.10 km²/year, p < 0.01). In contrast, Hulun, Dalinor, and Nanhaizi lakes maintained relatively constant water boundaries (>84%) over the same period. The areas of Hulun (13.98 km²/year, p < 0.01) and Nanhaizi (0.002 km²/year, p < 0.01) increased, while those of Dalinor (−1.01 km²/year, p < 0.01), Chagannaoer (−0.18 km²/year, p < 0.01), Daihai, and Juyan (−0.19 km²/year, p < 0.01) decreased. Hongjiannao and Ulansuhai lakes showed no significant area changes (p > 0.27).
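
For readers unfamiliar with the M-K test, a compact sketch follows. It uses the normal approximation without a tie correction, and it pairs the test with the Theil-Sen slope as the change rate; the text does not state how the change rate was computed, so the slope estimator is an assumption. The area series in the usage note is made up.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test; returns (S, Z, p).

    Uses the normal approximation and assumes no tied values.
    """
    n = len(series)
    # S counts later-minus-earlier sign agreements over all ordered pairs.
    s = sum((b > a) - (b < a) for a, b in combinations(series, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p


def sen_slope(times, values):
    """Median of all pairwise slopes (Theil-Sen estimator)."""
    pairs = combinations(zip(times, values), 2)
    return median((v2 - v1) / (t2 - t1) for (t1, v1), (t2, v2) in pairs)
```

Calling `mann_kendall` on nine annual lake areas and `sen_slope` on the matching years gives a significance level and a rate in km²/year, the two quantities reported per lake in Table 3.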

Table 3 Statistics on total water area, constant water area (water coverage frequency greater than 95%), percentage of constant water, and change rate of water area in each lake from 2016 to 2024.


Estimation of lake water salinity

XGB, one of the most widely applied and effective machine learning algorithms, was used to build the lake salinity model⁵¹. XGB is an ensemble learner based on decision trees that performs gradient descent by iteratively fitting residuals and adds a regularization term to prevent overfitting. Each tree contributes a score determined by the leaf node into which the input data fall, and the final prediction is the sum of these scores across all trees in the ensemble. The algorithm's good prediction accuracy and robustness have been demonstrated in retrieval models for regression of lake water quality parameters^52,53.
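
The residual-iteration idea can be illustrated with a deliberately small toy, which is not XGBoost itself: one input feature, depth-one trees (stumps), squared-error loss, and splits chosen by plain squared error rather than XGBoost's regularized gain. Only the leaf score, sum of residuals divided by (count + λ), follows the regularized form for squared-error loss. All names are hypothetical, and the input is assumed to contain at least two distinct feature values.

```python
def fit_stump(x, residuals, reg_lambda=1.0):
    """Best single-split stump on one feature, with shrunken leaf scores."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        w_left = sum(left) / (len(left) + reg_lambda)
        w_right = sum(right) / (len(right) + reg_lambda)
        loss = (sum((r - w_left) ** 2 for r in left)
                + sum((r - w_right) ** 2 for r in right))
        if best is None or loss < best[0]:
            best = (loss, threshold, w_left, w_right)
    return best[1:]


def stump_score(tree, xi):
    """Score of the leaf that the sample falls into."""
    threshold, w_left, w_right = tree
    return w_left if xi <= threshold else w_right


def boost(x, y, n_trees=200, learning_rate=0.1, reg_lambda=1.0):
    """Additive ensemble: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals, reg_lambda)
        trees.append(tree)
        pred = [pi + learning_rate * stump_score(tree, xi)
                for xi, pi in zip(x, pred)]
    return base, learning_rate, trees


def predict(model, xi):
    """Final prediction: base value plus the sum of all tree scores."""
    base, learning_rate, trees = model
    return base + learning_rate * sum(stump_score(t, xi) for t in trees)
```

The learning rate and λ both damp each tree's contribution, which is why many small trees are summed rather than a few large ones.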

The XGB salinity algorithm was implemented in Python with the scikit-learn (sklearn) library in this study. The main phases of salinity modeling were feature selection, model training, testing, and validation (Fig. 2). Thirteen input features were filtered with the feature-importance assessment embedded in XGB, namely, R_rs(443), R_rs(497), R_rs(560), R_rs(664), R_rs(704), R_rs(740), R_rs(842), NDWI, chromaticity angle (alpha), and lake area. The evolution of lakes is inevitably associated with changes in water volume and area, especially in arid and semi-arid regions, which can affect the physicochemical properties of the water (e.g., salinity)^18,54. Alpha is a physical quantity related to the composition and inherent optical properties of optically deep water; its formula can be found in Wang et al.⁵⁵. The 229 matched pairs of salinity and R_rs(λ) were randomly divided into a 70% training set (N = 153) and a 30% test set (N = 76), a choice based on the size of Dataset one and the ratio frequently used for the XGB algorithm. The model parameters and structure were determined on the training set, and model performance was evaluated on the test set. During training, the GridSearchCV method was employed to determine the model hyperparameters: the number of trees was 500, the learning rate 0.05, the maximum tree depth 7, the subsample rate 0.8, the regularization 0.01, and the minimum child weight 8. To evaluate model performance after the model structure had been defined, Dataset one was randomly divided into five groups for five-fold CV⁵⁶. The average statistical metrics of the five assessments were used to evaluate the XGB salinity model's performance. The evaluation metrics were the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and bias (systematic error).
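
The reported configuration can be written down as a short sketch. Several points are assumptions and not taken from the authors' code: the feature column names are hypothetical, the reported "regularization" is mapped to the L2 term `reg_lambda` (the text does not say whether the L1 or the L2 term was tuned), the squared-error objective is assumed, and the random seed is arbitrary. The splitting helpers use only the standard library; `build_model` needs the third-party `xgboost` package.

```python
import random

# Hyperparameters reported in the text, keyed by xgboost.XGBRegressor
# argument names. Using reg_lambda (L2) for "regularization" is an assumption.
XGB_PARAMS = {
    "n_estimators": 500,
    "learning_rate": 0.05,
    "max_depth": 7,
    "subsample": 0.8,
    "reg_lambda": 0.01,
    "min_child_weight": 8,
}

# Hypothetical column names for the input features listed in the text.
FEATURES = [
    "Rrs443", "Rrs497", "Rrs560", "Rrs664", "Rrs704", "Rrs740", "Rrs842",
    "NDWI", "alpha", "lake_area",
]


def train_test_indices(n, n_test, seed=0):
    """Random split of range(n) into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[n_test:], idx[:n_test]


def kfold_indices(n, k=5, seed=0):
    """Yield (train, validation) index lists for shuffled k-fold CV."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


def build_model():
    """Create the regressor; requires the third-party xgboost package."""
    from xgboost import XGBRegressor
    return XGBRegressor(objective="reg:squarederror", **XGB_PARAMS)
```

With 229 samples, `train_test_indices(229, 76)` reproduces the 153/76 split sizes, and `kfold_indices(229)` yields five validation folds of 45 or 46 samples. A hyperparameter search of the kind described above could wrap `build_model()` in scikit-learn's `GridSearchCV` with a grid around these values.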

Data Records

The IMSAL dataset contains 928 salinity rasters at 10 m spatial resolution, with the corresponding mean and standard deviation records, for eight lakes in Inner Mongolia over the past nine years. It is available at the Zenodo repository (https://doi.org/10.5281/zenodo.15849093)⁵⁷. All raster data are stored in ‘TIF’ format and the data records are stored in Microsoft Excel xlsx files; the dataset totals about 29.7 GB. Each lake’s properties, including lake name, abbreviated name, location, elevation (m), district, and climate type, are catalogued in ‘Lake_info.xlsx’ (Table 4). The IMSAL dataset has eight subfolders named ‘Lake_Name_SAL’ and three tables. Each subfolder contains five data folders (Table 5). These data folders store the salinity retrieved from MSI R_rs(λ): daily salinity raster data (folder name: ‘Abbreviation_Individual’), average annual salinity (‘Abbreviation_Annual’), average intra-annual seasonal salinity (‘Abbreviation_Intraannual_Season’), average seasonal salinity (‘Abbreviation_Season’), and nine-year average salinity (‘Abbreviation_Average’). The value corresponding to ‘Abbreviation’ for each lake is given in ‘Lake_info.xlsx’. The raster name, mean, standard deviation (S.D.), MSI image identifier, and date of each raster scene are recorded in the statistics table ‘SAL_stat.xlsx’ (Table 4). The metadata of the dataset are compiled in the table ‘IMSAL_meta.xlsx’. Table 5 lists the storage architecture of the IMSAL dataset, the data format, and the spatial and temporal resolution of the rasters.
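
The folder naming can be turned into a small path helper, shown here as a sketch. Reading ‘Lake_Name_SAL’ as the lake name followed by `_SAL` is our interpretation, the abbreviation `HL` in the usage note is hypothetical (the real abbreviations are listed in ‘Lake_info.xlsx’), and the choice of the population standard deviation for scene statistics is an assumption.

```python
from pathlib import PurePosixPath
from statistics import mean, pstdev

# Suffixes of the five data folders described in the text.
PRODUCTS = ("Individual", "Annual", "Intraannual_Season", "Season", "Average")


def lake_folders(root, lake_name, abbreviation):
    """Map each product type to its folder for one lake."""
    base = PurePosixPath(root) / f"{lake_name}_SAL"
    return {product: base / f"{abbreviation}_{product}" for product in PRODUCTS}


def scene_stats(pixels, nodata=None):
    """Mean and standard deviation of the valid pixels of one salinity raster."""
    valid = [p for p in pixels if p is not None and p != nodata]
    return mean(valid), pstdev(valid)
```

For example, `lake_folders("IMSAL", "Hulun", "HL")["Annual"]` would point to `IMSAL/Hulun_SAL/HL_Annual` under these naming assumptions.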

Table 4 Property name and description of the IMSAL dataset.


Table 5 The storage architecture of the IMSAL dataset, data format, and the spatiotemporal resolution of raster data.


Technical Validation

Accuracy assessment of ACOLITE-derived MSI R_rs(λ)

Because accurate salinity estimation depends on the performance of the atmospheric correction, it was necessary to evaluate the consistency of MSI-derived R_rs(λ) with in situ measured R_rs(λ) (Fig. 4). The MSI-derived R_rs(λ) has the highest accuracy at 664 nm (N = 62, R² = 0.83, and RMSE = 0.0023 sr^–1), and its precision at visible wavelengths (497–664 nm) exceeds that in the NIR bands (704–865 nm). The AC processor overestimated R_rs(λ) in the infrared range, where water absorbs strongly and the small values are prone to error. Although the processor does not perform well at 783–865 nm, these bands were not primary input features of the model. Therefore, the R_rs(λ) input features involved in model training were reliable. Overall, the ACOLITE processor exhibits acceptable performance (RMSE < 0.0048 sr^‒1, |bias| < 0.0034 sr^‒1), demonstrating that it can be employed to derive R_rs(λ) from MSI images (Table 6).

Table 6 Accuracy statistics of in-situ measurements R_rs(λ) and ACOLITE-derived MSI R_rs(λ).


Overall accuracy assessment

The XGB salinity model performs with considerable accuracy (R² = 0.98, RMSE = 0.95 ppt, MAE = 0.56 ppt, and MAPE = 10.2%) on the 30% independent test set, as shown in Fig. 5a. Estimated salinity ranged from 0.5 to 18 ppt and was consistent with the measured values. However, lake salinity in the range of 15–18 ppt was slightly underestimated, possibly because of the constraints of the model boundaries. Five-fold CV demonstrated accuracy comparable to that on the 30% test set (R² = 0.95, RMSE = 0.99 ppt, MAE = 0.51 ppt, and MAPE = 12.2%), and the salinity derived from cross-validation covered the same range as the field measurements, with only a few points underestimated (Fig. 5b). In general, the assessment results confirm that the XGB salinity model has good precision and robustness for salinity inversion based on MSI R_rs(λ).
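
For reference, the five metrics can be computed as in the sketch below. The paper does not give the formulas, so two conventions are assumed: R² is taken as 1 − SS_res/SS_tot (the squared correlation of a fitted line would be another common choice), and bias is the mean of estimated minus measured values.

```python
import math


def regression_metrics(measured, estimated):
    """R2, RMSE, MAE, MAPE (%), and bias of estimates against measurements."""
    n = len(measured)
    errors = [e - m for m, e in zip(measured, estimated)]
    mean_measured = sum(measured) / n
    ss_res = sum(err ** 2 for err in errors)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(err) for err in errors) / n,
        # MAPE is undefined when a measured value is zero.
        "MAPE": 100.0 * sum(abs(err / m) for err, m in zip(errors, measured)) / n,
        "bias": sum(errors) / n,
    }
```

RMSE, MAE, and bias carry the units of salinity (ppt), whereas R² and MAPE are dimensionless, which is why MAPE is the more informative error measure at low salinity.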

Pixel-based statistical validation

Salinity rasters retrieved from concurrent satellite images were selected, and histograms were computed over the lake-wide pixels (Fig. 6). An image histogram shows the mode of the estimated salinity and reveals outliers in the raster data. Outliers are commonly distributed at both ends of the histogram, with few pixels and abrupt value transitions. The salinity histograms of most lakes showed a single peak and little within-lake variation (Fig. 6), consistent with the mode of the salinity distribution measured in the field. Bimodal histograms were found in a few lakes, with few pixels in the secondary peak (Fig. 6r,x). Outlier pixels were observed in Daihai Lake in the range of 0–5 ppt, probably because of mixed pixels in nearshore waters, and were not found in other lakes (Fig. 6x). Box plots display the mean and standard deviation of measured and estimated salinity, with small deviations for most lakes. Overall, no substantial outlier pixels were observed in the salinity raster images, which demonstrates good data quality.

Lake salinity independent validation

To objectively assess the reliability and scientific validity of the IMSAL dataset, independent validation was carried out using Dataset two, which was not involved in model training and testing (Fig. 7). A total of 69 recent field-sampled and literature-recorded values that met a one-day matching interval with the salinity raster images were used. For historical salinity raster images, the validation density was insufficient because few matching data were available. Because the few in situ measurements from Lake Nanhaizi (N = 3) had all been used for model training and testing, this lake was not included in the independent validation (Fig. 7a). Field sampling of this lake will be expanded in future work. As the fitted line in Fig. 7b shows, the independent validation achieved good precision (N = 69, R² = 0.94, p < 0.01, and MAPE = 11.9%). Overestimation of salinity was the main form of XGB model error, especially in the range from 0 to 6 ppt: there were more training samples from oligosaline-type lakes than from brackish-type lakes, which raised low-salinity predictions. In contrast, the XGB model slightly underestimated lake salinity in the range of 6–9 ppt because of slight overfitting caused by dense training data in this range (Fig. 5b). Note that salinity estimates in freshwater lakes (<1 ppt) may be highly uncertain (RMSE = 1.58 ppt), because the salinity spectral characteristics are weak there as a result of inadequate freshwater-salt mixing and low water salinity. Additionally, unavoidable systematic errors in fitting the nonlinear relationship between salinity and R_rs(λ) exacerbated the uncertainty in freshwater lakes. The numerical comparison with external data demonstrates again that the IMSAL dataset is scientifically sound.
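
The date matching behind such a validation can be sketched as follows. The one-day interval is interpreted here as ±1 day between the sampling date and the image acquisition date, which is an assumption; the function name and the tuple layout are hypothetical, and the scene list is assumed to be non-empty.

```python
from datetime import date


def match_samples(samples, scene_dates, max_days=1):
    """Pair each field sample with the nearest scene within `max_days`.

    `samples` is a list of (sample_date, salinity_ppt) tuples and
    `scene_dates` a non-empty list of acquisition dates. Samples without
    a scene inside the window are dropped.
    """
    pairs = []
    for sample_date, salinity in samples:
        nearest = min(scene_dates, key=lambda d: abs((d - sample_date).days))
        if abs((nearest - sample_date).days) <= max_days:
            pairs.append((sample_date, nearest, salinity))
    return pairs
```

Widening `max_days` to 3 would reproduce the looser ±3 day window used for the training match-ups in Dataset one, at the cost of larger time gaps between sample and image.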

Usage Notes

Effective monitoring of water salinity is essential for preventing lake salinization and assessing water quality conditions, but regional spatiotemporal monitoring of lake salinity remains insufficient. The IMSAL dataset fills this gap to some extent and helps the public understand the spatial patterns and trends of lake salinity. Lake salinity in arid and semi-arid areas responds sensitively to climate, which gives climatologists a chance to mine meteorological information from the salinity dataset. Because the model has uncertainties in freshwater, users whose research requires a careful distinction between freshwater and brackish water should make that distinction in conjunction with field data. The insufficient density of independent validation for historical salinity images motivates our future work, which needs to increase the sampling frequency or establish routine monitoring stations to collect long-term validation data. Model underestimation of salinity in the 15–18 ppt range may delay the detection of changes in water properties and seasonal patterns and thus delay response measures. Potential correction strategies include expanding the model boundaries by adding training samples from high-salinity lakes, or building a correction function from field sampling to recalibrate the salinity of oligosaline lakes. In addition, the XGB model established here for rapid inversion of lake salinity from remote sensing offers technical insights for generating lake salinity datasets in other regions. Unlike salinity data constructed from environmental data modeling and field measurements^53,58, the XGB salinity model built on remote sensing imagery can reconstruct historical data and be updated iteratively. The complete code for building the XGB salinity model and applying it to MSI images is provided with this study. Users interested in other lakes can extend the training samples of the current model to construct a practical model.

Code availability

Demonstration code for the construction, five-fold cross-validation, and application of the XGB salinity model is available at https://github.com/MingMDeng/SAL_inversion.git and should be run and edited with Python 3.11. Atmospheric correction of the Sentinel-2 MSI data was performed with the ACOLITE software (version 20221114.0). Remote sensing reflectance was extracted and images were mosaicked in ENVI 5.3. Pixel-based manipulation and salinity data visualization were done in ArcGIS 10.2.

References

Wurtsbaugh, W. A. et al. Decline of the world’s saline lakes. Nat. Geosci. 10, 816–821 (2017).
Zhao, J. & Temimi, M. An Empirical Algorithm for Retreiving Salinity in the Arabian Gulf: Application to Landsat-8 Data. in 2016 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS) 4645–4648 (IEEE, New York, 2016).
Vidal, N. et al. Salinity shapes food webs of lakes in semiarid climate zones: a stable isotope approach. Inland Waters 11, 476–491 (2021).
Jeppesen, E., Beklioglu, M., Ozkan, K. & Akyurek, Z. Salinization Increase due to Climate Change Will Have Substantial Negative Effects on Inland Waters: A Call for Multifaceted Research at the Local and Global Scale. Innovation-Amsterdam 1, 100030 (2020).
Liao, Y. et al. Salinity is an important factor in carbon emissions from an inland lake in arid region. Sci. Total Environ. 906, 167721 (2024).
Canedo-Argueelles, M. et al. Saving freshwater from salts. Science 351, 914–916 (2016).
Tang, X. et al. Effects of climate change and anthropogenic activities on lake environmental dynamics: A case study in Lake Bosten Catchment, NW China. J. Environ. Manage. 319, 115764 (2022).
Williams, W. D. Salinisation: A major threat to water resources in the arid and semi-arid regions of the world. Lakes & Reservoirs 4, 85–91 (1999).
Jiang, X. et al. Centenary covariations of water salinity and storage of the largest lake of Northwest China reconstructed by machine learning. J. Hydrol. 612, 128095 (2022).
Hu, M. et al. A dataset of trophic state index for nation-scale lakes in China from 40-year Landsat observations. Sci. Data 11, 659 (2024).
Drusch, M. et al. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote Sensing of Environment 120, 25–36 (2012).
Xu, J. et al. Optical models for remote sensing of chromophoric dissolved organic matter (CDOM) absorption in Poyang Lake. ISPRS-J. Photogramm. Remote Sens. 142, 124–136 (2018).
Guo, H., Huang, J. J., Chen, B., Guo, X. & Singh, V. P. A machine learning-based strategy for estimating non-optically active water quality parameters using Sentinel-2 imagery. Int. J. Remote Sens. 42, 1841–1866 (2021).
Pahlevan, N. et al. Seamless retrievals of chlorophyll-a from Sentinel-2 (MSI) and Sentinel-3 (OLCI) in inland and coastal waters: A machine-learning approach. Remote Sens. Environ. 240, 111604 (2020).
Sullivan, S. A. Experimental Study of the Absorption in Distilled Water, Artificial Sea Water, and Heavy Water in the Visible Region of the Spectrum. J. Opt. Soc. Am. 53, 962 (1963).
Bai, Y. et al. Remote sensing of salinity from satellite-derived CDOM in the Changjiang River dominated East China Sea. J. Geophys. Res.-Oceans 118, 227–243 (2013).
Liu, Y., Bao, A. & Chen, X. Measuring salinity of low salinity lake by optical remote sensing: A case study of Bosten Lake. Journal of Remote Sensing 18, 902–911 (2014).
Liu, C. et al. The decrease of salinity in lakes on the Tibetan Plateau between 2000 and 2019 based on remote sensing model inversions. Int. J. Digit. Earth 16, 2644–2659 (2023).
Guo, H. et al. A generalized machine learning approach for dissolved oxygen estimation at multiple spatiotemporal scales using remote sensing. Environ. Pollut. 288, 117734 (2021).
Liu, D. et al. Substantial increase of organic carbon storage in Chinese lakes. Nat. Commun. 15, 8049 (2024).
Zhao, Z., Shi, K. & Zhang, Y. Remote sensing estimation of dissolved organic carbon concentrations in Chinese lakes based on Landsat images. J. Hydrol. 638, 131466 (2024).
Palmer, S. C. J., Kutser, T. & Hunter, P. D. Remote sensing of inland waters: Challenges, progress and future directions. Remote Sens. Environ. 157, 1–8 (2015).
Song, C. et al. Widespread declines in water salinity of the endorheic Tibetan Plateau lakes. Environ. Res. Commun. 4, 091002 (2022).
Tao, S. et al. Rapid loss of lakes on the Mongolian Plateau. Proc. Natl. Acad. Sci. USA. 112, 2281–2286 (2015).
Feng, S., Zheng, S., Guan, W., Han, L. & Wang, S. Evolution of the lake area and its drivers during 1990-2021 in Inner Mongolia. Environ. Earth Sci. 83, 414 (2024).
Xu, C. Bacterial Diversity in the Water and Sediment of Daihai Lake. (Shanghai Ocean University, Shanghai, 2023).
Yang, F. et al. Investigation of Water Environment and Fish Diversity in Dalinor Wetlands I. Major Ions, Salt Content and Electrical Conductivity in the Water of Dali Lake. Wetland Science 18, 507–515 (2020).
Li, X. et al. Spatiotemporal Variation of Phytoplankton Community Structure and Its Influencing Factors in the Dalinor Lake. Wetland Science 21, 897–906 (2023).
Gao, J. et al. Effects of water quality and bacterial community composition on dissolved organic matter structure in Daihai lake and the mechanisms. Environ. Res. 214, 114109 (2022).
Yang, W. et al. Characterization for nitrogen metabolism of sediments in highland saline lake. China Environmental Science 43, 1328–1339 (2023).
Ren, X. et al. Spatial changes and driving factors of lake water quality in Inner Mongolia, China. J. Arid Land 15, 164–179 (2023).
Yu, H. Health Assessment and Study on Adaptability of Evaluation System in Juyan Lake, Inner Mongolia. (Inner Mongolia Agricultural University, 2021).
Zhou, J. Effects of Wuliangsuhai Water Environmental Factors on Nutritional Status of Lakes. (Inner Mongolia Agricultural University, 2020).
Hu, J. Spatiotemporal dynamics and driving factors of zooplankton community in representative lakes in the Inner Mongolia plateau, China. (Xi’an University of Technology, 2024).
Bai, H. et al. Characteristics of Zooplankton Community Structure and Its Relationship with Environmental Factors in Hongjiannao Lake. Journal of Ecology and Rural Environment 38, 1064–1075 (2022).
Bai, H. et al. Seasonal Changes of Phytoplankton Communities and Their Relationship with Environmental Factors in the Hongjiannao Lake, A Desert Lake. Wetland Science 22, 155–168 (2024).
Feng, S., Liu, X. & Li, H. Spatial variations of δD and δ18O in lake water of western China and their controlling factors. Journal of Lake Sciences 32, 1199–1211 (2020).
Gitelson, A. A. et al. A simple semi-analytical model for remote estimation of chlorophyll-a in turbid waters: Validation. Remote Sens. Environ. 112, 3582–3593 (2008).
Bricaud, A., Morel, A. & Prieur, L. Absorption by Dissolved Organic Matter of the Sea (Yellow Substance) in the UV and Visible Domains. Limnol. Oceanogr. 26, 43–53 (1981).
Cao, Z., Duan, H., Feng, L., Ma, R. & Xue, K. Climate- and human-induced changes in suspended particulate matter over Lake Hongze on short and long timescales. Remote Sens. Environ. 192, 98–113 (2017).
Mueller, J. L. et al. Ocean Optics Protocols For Satellite Ocean Color Sensor Validation, Revision 4. Volume III: Radiometric Measurements and Data Analysis Protocols. (2003).
Mobley, C. D. Estimation of the remote-sensing reflectance from above-surface measurements. Appl. Optics 38, 7442–7455 (1999).
Zeng, F. et al. Monitoring inland water via Sentinel satellite constellation: A review and perspective. ISPRS-J. Photogramm. Remote Sens. 204, 340–361 (2023).
Vanhellemont, Q. Adaptation of the dark spectrum fitting atmospheric correction for aquatic applications of the Landsat and Sentinel-2 archives. Remote Sens. Environ. 225, 175–192 (2019).
Vanhellemont, Q. Sensitivity analysis of the dark spectrum fitting atmospheric correction for metre- and decametre-scale satellite imagery using autonomous hyperspectral radiometry. Opt. Express 28, 29948–29965 (2020).
Ma, R. et al. China’s lakes at present: Number, area and spatial distribution. Sci. China-Earth Sci. 54, 283–289 (2011).
Otsu, N. Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979).
Hu, C. A novel ocean color index to detect floating algae in the global oceans. Remote Sens. Environ. 113, 2118–2129 (2009).
Deng, M. et al. Monitoring Salinity in Inner Mongolian Lakes Based on Sentinel-2 Images and Machine Learning. Remote Sens. 16, 3881 (2024).
Sen, P. K. Estimates of the Regression Coefficient Based on Kendall’s Tau. Journal of the American Statistical Association 63, 1379–1389 (1968).
Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. in KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery, New York, 2016).
Cao, Z. et al. A machine learning approach to estimate chlorophyll-a from Landsat-8 measurements in inland lakes. Remote Sens. Environ. 248, 111974 (2020).
Xu, P., Liu, K., Shi, L. & Song, C. Machine learning modeling reveals the spatial variations of lake water salinity on the endorheic Tibetan Plateau. J. Hydrol.-Reg. Stud. 56, 102042 (2024).
Williams, W. D. Environmental threats to salt lakes and the likely status of inland saline ecosystems in 2025. Environ. Conserv. 29, 154–167 (2002).
Wang, S. et al. A dataset of remote-sensed Forel-Ule Index for global inland waters during 2000-2018. Sci. Data 8, 26 (2021).
Cao, Z. et al. A decade-long chlorophyll-a data record in lakes across China from VIIRS observations. Remote Sens. Environ. 301, 113953 (2024).
Deng, M. et al. A non-optically active lake salinity dataset in Inner Mongolia. Zenodo https://doi.org/10.5281/zenodo.15849093 (2025).
Thorslund, J. & van Vliet, M. T. H. A global dataset of surface water and groundwater salinity measurements from 1980–2019. Sci. Data 7, 231 (2020).
Pekel, J.-F., Cottam, A., Gorelick, N. & Belward, A. S. High-resolution mapping of global surface water and its long-term changes. Nature 540, 418–422 (2016).

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant Nos. 42361144002 and 42371371). We gratefully acknowledge field data support from the Lake-Watershed Science Data Center (http://lake.geodata.cn) and Inner Mongolia University. The authors thank Yiqiu Wu, Guang Gao, and Feizhou Cheng of the Nanjing Institute of Geography and Limnology for their efforts in the field experiments.

Author information

Authors and Affiliations

Key Laboratory of Lake and Watershed Science for Water Security, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, 210008, China
Mingming Deng, Ronghua Ma, Minqi Hu, Kun Xue, Zhigang Cao, Junfeng Xiong & Zhengyang Yu
University of Chinese Academy of Sciences, Beijing, 100049, China
Mingming Deng & Zhengyang Yu
University of Chinese Academy of Sciences, Nanjing, 211135, China
Ronghua Ma
School of Ecology and Environment, Inner Mongolia University, Hohhot, 010021, China
Lixin Wang

Authors

Mingming Deng
Ronghua Ma
Lixin Wang
Minqi Hu
Kun Xue
Zhigang Cao
Junfeng Xiong
Zhengyang Yu

Contributions

Deng, M. contributed to data curation, formal analysis, methodology, programming, quality control, validation, visualization, and writing – original draft preparation and editing. Ma, R. contributed to conceptualization, data curation, funding acquisition, investigation, methodology, programming, project administration, quality assurance, quality control, supervision, and writing – review and editing. Wang, L. contributed to data curation. Hu, M. contributed to writing – review and editing. Xue, K. contributed to writing – review and editing. Cao, Z. contributed to methodology and writing – review and editing. Xiong, J. contributed to data curation and validation. Yu, Z. contributed to writing – review and editing.

Corresponding author

Correspondence to Ronghua Ma.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Deng, M., Ma, R., Wang, L. et al. A non-optically active lake salinity dataset by satellite remote sensing. Sci Data 12, 1324 (2025). https://doi.org/10.1038/s41597-025-05686-2

Received: 10 January 2025
Accepted: 24 July 2025
Published: 30 July 2025
DOI: https://doi.org/10.1038/s41597-025-05686-2