Abstract
Water salinity characterizes the physicochemical properties of natural water and is an essential parameter for assessing lake water quality. However, because salinity is a non-optically active parameter, its retrieval from remote sensing is difficult, which has left a lack of pixel-scale lake salinity datasets. Conventional functional models based on salinity tracers or on single lakes have limited regional applicability, whereas machine learning algorithms can effectively capture the nonlinear relationship between radiance and salinity, enabling large-scale retrieval. We constructed an extreme gradient boosting (XGB) salinity model and applied it to Sentinel-2 remote sensing reflectance to generate the Inner Mongolia lake salinity (IMSAL) dataset. The IMSAL dataset contains 928 raster scenes at 10 m spatial resolution for eight lakes from 2016 to 2024. Cross-validation and independent validation against measured and literature-reported salinities confirmed good consistency and reliability. The dataset provides valuable information on spatial patterns and long-term variations in lake salinity, which can help prevent lake salinization and support lake management for sustainable ecosystem development.
Background & Summary
Water salinity is an essential parameter for evaluating water quality and is crucial for maintaining the stability of lake ecosystems1,2. It affects the biological, physical, and chemical processes of lakes, including the survival and distribution of aquatic organisms, the use of water resources, and the lake carbon cycle3,4,5. Lake salinity is highly sensitive in space and time to climate change and human activity6,7, especially in arid and semi-arid regions8,9. However, field surveys do not provide adequate spatial detail, which makes continuous salinity monitoring difficult.
Remote sensing in various bands of the electromagnetic spectrum acquires lake water quality information frequently and extensively, effectively compensating for the spatial sparseness and low frequency of field surveys10. The Multi-Spectral Instrument (MSI) on board Sentinel-2 offers high spatial resolution (10–60 m), a high revisit frequency (5 days with the twin satellites), and good radiometric resolution (12 bit) with an enhanced signal-to-noise ratio at 443 nm11. MSI images thus provide finer spatial detail, which is useful for monitoring water constituents in small inland lakes. MSI has been widely used to estimate the concentrations of optically active constituents (OACs) and non-OACs, such as colored dissolved organic matter (CDOM), chlorophyll a (Chla, μg/L), total nitrogen, and total phosphorus12,13,14. Lake salinity is a non-OAC: it has no direct color signal and a complicated nonlinear relationship with remote sensing reflectance Rrs(λ) (sr−1)15. The CDOM absorption coefficient (ag(λ), m−1) commonly serves as a salinity tracer in estuaries, but the correlation between ag(λ) and salinity is often not significant in inland lakes16,17. Previous studies have also used Secchi disk depth (SDD, m) as an indirect parameter to retrieve lake salinity on the Tibetan Plateau18. However, such indirect methods depend on the accuracy of intermediate parameters whose retrieval is region-specific, complicated, and inevitably error-prone. To minimize error propagation, machine learning (ML) algorithms, with their flexible model structures and networks, have been used to model the nonlinear relationship between Rrs(λ) and salinity directly.
ML is an innovative way to extract information on non-OACs from satellite images and has proven effective in inland waters, for example in estimating dissolved oxygen, particulate organic carbon, and dissolved organic carbon concentrations19,20,21. Compared with other non-OACs, regional-scale estimation of lake salinity is particularly challenged by diverse climatic and geographic conditions, because the complexity of lake hydrology, ion composition, and salt sources makes salinity vary over several orders of magnitude. Furthermore, the high variability of Chla, ag(λ), and suspended particulate matter (SPM, mg/L) makes inland waters optically complex and hinders the development of a uniform salinity algorithm17,22. Effective remote sensing monitoring of non-optically active salinity, which varies from place to place, is therefore needed, especially in arid and semi-arid areas where salinity changes are highly dynamic.
Inner Mongolia lies in a typical arid to semi-arid transition zone and crosses most of China's climatic regions from east to west (Fig. 1). Its lakes include freshwater (<1 g/L), brackish (1–3 g/L), and oligosaline (3–35 g/L) types (Table 1)23. These lakes are critical ecological shields for northern China and provide valuable water resources to maintain ecosystem functions24,25. However, long-term satellite monitoring of lake salinity is lacking in the region, so its spatiotemporal distribution remains difficult to reveal. We therefore created the Inner Mongolia lake salinity (IMSAL) dataset for eight lakes over the past nine years (2016–2024) from Sentinel-2 MSI data and the XGB algorithm. IMSAL provides multi-scale average records (daily, annual, intra-annual seasonal, and seasonal) and pixel-scale spatial distributions of lake salinity. The purposes of this research are to (1) construct a 10 m spatial resolution salinity dataset from MSI images, (2) provide a paradigm for generating salinity datasets for other lakes, and (3) share long-term salinity data to facilitate lake management and prevent lake salinization.
(a) Location of sampled lakes, (b–i) distribution of sampling points and frequency of water occurrence for eight lakes from east to west in Inner Mongolia. Surface water frequency data were from the Global Surface Water dataset (https://global-surface-water.appspot.com/)59.
Methods
The IMSAL dataset contains salinity raster data for eight lakes in Inner Mongolia from 2016 to 2024. Figure 2 outlines the workflow for salinity retrieval and dataset generation, which consists of three modules. Module one covers the collection and preprocessing of in situ and MSI data, including field surveys, atmospheric correction, water extraction, and index construction. Module two covers construction and testing of the XGB salinity retrieval model and its five-fold cross-validation (CV). Module three is an independent validation using the latest in situ and literature-recorded data.
Field data collection
Field data were obtained from uniformly distributed in situ sampling in the eight lakes from July 2017 to October 2024 and from previous research. All data were collected in spring (March to May), summer (June to August), and autumn (September to November); winter data are lacking because of ice cover. A total of 318 records were collected: 229 from in situ measurements and 89 from published literature26,27,28,29,30,31,32,33,34,35,36,37. Details of the literature-recorded salinity are shown in Table 2. The data were organized into three datasets. Dataset one contains 229 data pairs for model training, testing, and five-fold CV, with a matching window of ±3 days between field data and MSI images. Dataset two consists of 69 recent field-measured and literature-recorded salinity values, which were used for independent validation of IMSAL. Dataset three comprises 62 in situ samples collected concurrently with Sentinel-2 overpasses (±6 hours), for which spectra, salinity (ppt), Chla, ag(λ), SPM, and SDD were measured; it was used to assess the satellite-derived Rrs(λ). Salinity was measured at the lake surface (depth <0.5 m) with a calibrated YSI water quality meter (YSI, Inc.). Surface water samples were stored in a lightproof box and returned to the laboratory immediately at the end of each cruise to measure Chla, ag(λ), and SPM. Chla and ag(λ) were measured with a Shimadzu UV2700 spectrophotometer (Shimadzu, Inc.)38,39. SPM was measured by the pre-combustion and weighing method40. A Spectral Evolution PSR-1100f spectroradiometer (350–1050 nm) was used to measure the total water-leaving radiance (Lsw), the sky radiance (Lsky), and the radiance of the reference gray panel (Lp), from which Rrs(λ) was calculated with the following formula41:
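$$R_{rs}(\lambda) = \frac{L_{sw}(\lambda) - \rho\,L_{sky}(\lambda)}{\pi L_{p}(\lambda)/\rho_{p}} \qquad (1)$$
where ρ is the Fresnel reflectance of the air–water interface (0.028)42 and ρp is the reflectance of the gray panel (30%). Finally, the in situ Rrs(λ) spectra were convolved with the spectral response function (SRF) of each MSI band to obtain band-equivalent Rrs(λ).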
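As a minimal sketch of Eq. (1) and the band convolution, the Python below computes Rrs(λ) from the three measured radiances and takes the SRF-weighted mean of a spectrum over one band. The Gaussian SRF (centre and width) is a hypothetical stand-in; the official MSI SRFs tabulated by ESA should replace it in practice, and the helper names are illustrative.

```python
import numpy as np

RHO_FRESNEL = 0.028  # air-water (Fresnel) reflectance, as in Eq. (1)
RHO_PANEL = 0.30     # reflectance of the grey reference panel (30%)


def above_water_rrs(l_sw, l_sky, l_p, rho=RHO_FRESNEL, rho_p=RHO_PANEL):
    """Rrs(lambda) in sr^-1 from total water-leaving, sky, and panel radiances (Eq. 1)."""
    l_sw, l_sky, l_p = (np.asarray(a, dtype=float) for a in (l_sw, l_sky, l_p))
    ed = np.pi * l_p / rho_p            # downwelling irradiance from a Lambertian panel
    return (l_sw - rho * l_sky) / ed    # remove reflected skylight, normalise by Ed


def gaussian_srf(wl, center, fwhm):
    """Hypothetical Gaussian stand-in for an MSI band SRF (replace with ESA's tables)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((np.asarray(wl, dtype=float) - center) / sigma) ** 2)


def band_equivalent_rrs(rrs, srf):
    """SRF-weighted mean Rrs; assumes rrs and srf share a uniform wavelength grid."""
    rrs, srf = np.asarray(rrs, dtype=float), np.asarray(srf, dtype=float)
    return float(np.sum(rrs * srf) / np.sum(srf))


if __name__ == "__main__":
    wl = np.arange(350.0, 1051.0, 1.0)        # PSR-1100f range on a 1 nm grid
    l_sw = 2.0 + 0.001 * (wl - 350.0)         # synthetic radiances, not field data
    l_sky = np.full_like(wl, 10.0)
    l_p = np.full_like(wl, 50.0)
    rrs = above_water_rrs(l_sw, l_sky, l_p)
    # Approximate centre/width for MSI band 3 (green), for illustration only
    print(band_equivalent_rrs(rrs, gaussian_srf(wl, 560.0, 35.0)))
```

Normalising by the SRF sum makes the band value a weighted average, so a spectrally flat Rrs is returned unchanged for any band shape.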
Sentinel-2 MSI image derived Rrs(λ)
A total of 976 Sentinel-2 MSI Level-1C images acquired between March 2016 and October 2024 were downloaded from the Copernicus Data Space Ecosystem (https://dataspace.copernicus.eu/). Each MSI tile covers approximately 100 × 100 km2; the satellites orbit at an altitude of 786 km, with an overpass at about 10:30 local time to minimize the effects of sunglint and cloud cover43. For each year, images from spring, summer, and autumn were downloaded under cloud-free or low-cloud (<10%) conditions. The downloaded images were atmospherically corrected with the dark spectrum fitting (DSF) algorithm integrated into the Atmospheric Correction for OLI lite (ACOLITE) processor to derive Rrs(λ). The DSF algorithm was designed for atmospheric correction over coastal and inland waters and has been widely used44,45. The MSI-derived Rrs(λ) contains 11 bands resampled to a spatial resolution of 10 m.
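The ±3-day matching between field samples and MSI acquisitions used for Dataset one, together with the cloud (<10%) and season screening, can be sketched as follows. The scene tuples, the nearest-date rule, and the function names are assumptions for illustration, not the authors' GEE or ACOLITE workflow.

```python
from datetime import date


def season_of(d):
    """Season labels used in the text; winter (Dec-Feb) returns None (ice cover)."""
    if 3 <= d.month <= 5:
        return "spring"
    if 6 <= d.month <= 8:
        return "summer"
    if 9 <= d.month <= 11:
        return "autumn"
    return None


def match_samples(samples, scenes, max_days=3, max_cloud=10.0):
    """Pair each field sample with the nearest usable scene within +/- max_days.

    samples: list of (sample_id, date)
    scenes:  list of (scene_id, date, cloud_percent)
    Returns a list of (sample_id, scene_id, day_gap).
    """
    # Keep only low-cloud scenes from spring, summer, or autumn
    usable = [s for s in scenes if s[2] < max_cloud and season_of(s[1]) is not None]
    pairs = []
    for sample_id, sample_date in samples:
        best = None
        for scene_id, scene_date, _cloud in usable:
            gap = abs((scene_date - sample_date).days)
            if gap <= max_days and (best is None or gap < best[1]):
                best = (scene_id, gap)
        if best is not None:
            pairs.append((sample_id, best[0], best[1]))
    return pairs


if __name__ == "__main__":
    samples = [("S01", date(2023, 7, 10))]
    scenes = [("scene_A", date(2023, 7, 12), 4.0)]  # hypothetical scene id
    print(match_samples(samples, scenes))
```

Setting `max_days=1` gives the one-day window used for independent validation; samples with no scene inside the window are simply dropped.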
Lake water extraction
Lake outlines from the Lake-Watershed Science Data Center (http://lake.geodata.cn) were used as initial boundaries for water extraction. This dataset contains accurate lake names, locations, and area attributes; its vector outlines were interpreted from China–Brazil Earth Resources Satellite (CBERS) and Landsat-5 images together with digital elevation maps and lake chronicles46. On the Google Earth Engine (GEE) platform, the lake outlines were overlaid on the matched Sentinel-2 images, and water was extracted using the normalized difference water index (NDWI, Eq. (2)) with the Otsu thresholding method47.
$$\mathrm{NDWI} = \frac{G - NIR}{G + NIR} \qquad (2)$$
where G is the green band (560 nm) and NIR is the near-infrared band (842 nm). Salinity is difficult to estimate accurately in waters covered by aquatic plants, because their optical properties differ from those of open water and they cause high reflectance in the infrared. The Floating Algae Index (FAI, threshold 0.03, Eq. (3)) was therefore used to exclude aquatic-plant-covered waters in the grass-type lake (Ulansuhai Lake)48.
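$$\mathrm{FAI} = R_{NIR} - R'_{NIR}, \qquad R'_{NIR} = R_{R} + \left(R_{SWIR} - R_{R}\right)\frac{\lambda_{NIR} - \lambda_{R}}{\lambda_{SWIR} - \lambda_{R}} \qquad (3)$$
where, for the MSI sensor, the FAI bands were adapted to the red band (R, 664 nm), NIR, and the short-wave infrared band (SWIR, 1610 nm), and λ is the corresponding wavelength.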
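The masking steps above (NDWI with an Otsu threshold, then removal of pixels with FAI > 0.03) can be sketched in NumPy. This assumes reflectance arrays for the 560, 664, 842, and 1610 nm bands; the 256-bin histogram used for Otsu and the helper names are illustrative choices, and in the study the FAI exclusion was applied only to Ulansuhai Lake.

```python
import numpy as np


def ndwi(green, nir):
    """NDWI = (G - NIR) / (G + NIR), Eq. (2)."""
    green, nir = np.asarray(green, dtype=float), np.asarray(nir, dtype=float)
    return (green - nir) / (green + nir)


def fai(red, nir, swir, wl_red=664.0, wl_nir=842.0, wl_swir=1610.0):
    """FAI = R_NIR minus the red-SWIR baseline interpolated at the NIR wavelength, Eq. (3)."""
    red, nir, swir = (np.asarray(a, dtype=float) for a in (red, nir, swir))
    baseline = red + (swir - red) * (wl_nir - wl_red) / (wl_swir - wl_red)
    return nir - baseline


def otsu_threshold(values, nbins=256):
    """Otsu (1979): cut that maximises the between-class variance of a 1-D sample."""
    v = np.asarray(values, dtype=float)
    v = v[np.isfinite(v)]
    hist, edges = np.histogram(v, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    n0 = np.cumsum(hist).astype(float)          # pixels in bins 0..i (class 0)
    n1 = n0[-1] - n0                            # pixels above bin i (class 1)
    s0 = np.cumsum(hist * centers)
    mean0 = np.divide(s0, n0, out=np.zeros_like(s0), where=n0 > 0)
    mean1 = np.divide(s0[-1] - s0, n1, out=np.zeros_like(s0), where=n1 > 0)
    between = n0 * n1 * (mean0 - mean1) ** 2
    return edges[int(np.argmax(between)) + 1]   # upper edge of the last class-0 bin


def water_mask(green, red, nir, swir, fai_max=0.03, exclude_plants=True):
    """Water = NDWI above its Otsu threshold, optionally minus pixels with FAI > fai_max."""
    index = ndwi(green, nir)
    water = index > otsu_threshold(index)
    if exclude_plants:  # applied only to the grass-type lake (Ulansuhai) in the text
        water &= ~(fai(red, nir, swir) > fai_max)
    return water
```

In GEE the same logic would run server-side on image bands; the NumPy version only makes the thresholding explicit.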
In addition, bottom reflection in optically shallow waters and adjacency effects from nearshore mixed pixels may cause incorrect salinity estimates, so a two-pixel buffer was applied along the lake boundary to minimize these influences49. In post-processing, small patches (<3 pixels) were removed, and the lake area was then calculated in ArcGIS 10.2 (WGS1984_UTM projection).
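A sketch of this post-processing, assuming that the two-pixel buffer is an inward erosion of the water mask and that patches are 8-connected (neither detail is specified in the text). It uses only NumPy and a breadth-first search so it runs without GIS software; the 10 m pixel size matches the MSI rasters, and the function names are hypothetical.

```python
import numpy as np
from collections import deque

EIGHT_NEIGHBOURS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def inward_buffer(mask, pixels=2):
    """Erode a boolean water mask by `pixels` with a 3x3 square (image edges count as land)."""
    m = np.asarray(mask, dtype=bool)
    rows, cols = m.shape
    for _ in range(pixels):
        padded = np.pad(m, 1, constant_values=False)
        out = m.copy()
        for dr, dc in EIGHT_NEIGHBOURS:  # a pixel survives only if all 8 neighbours are water
            out &= padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
        m = out
    return m


def remove_small_patches(mask, min_pixels=3):
    """Keep only 8-connected water patches with at least `min_pixels` pixels."""
    m = np.asarray(mask, dtype=bool)
    rows, cols = m.shape
    seen = np.zeros_like(m)
    keep = np.zeros_like(m)
    for r0, c0 in zip(*np.nonzero(m)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        patch, queue = [], deque([(r0, c0)])
        while queue:  # breadth-first flood fill of one patch
            r, c = queue.popleft()
            patch.append((r, c))
            for dr, dc in EIGHT_NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and m[rr, cc] and not seen[rr, cc]:
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        if len(patch) >= min_pixels:
            for r, c in patch:
                keep[r, c] = True
    return keep


def area_km2(mask, pixel_size_m=10.0):
    """Water area from a pixel count on a projected (e.g. UTM) grid."""
    return float(np.count_nonzero(mask)) * pixel_size_m ** 2 / 1e6
```

Note that lake area in the text was computed in ArcGIS; whether it used the buffered or unbuffered mask is not stated, so `area_km2` can be applied to either.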
We generated water coverage frequency maps for the eight lakes from 2016 to 2024 and estimated the rate of change in lake area and its significance with the Mann-Kendall (M-K) test50 (Fig. 3 and Table 3); the M-K test was implemented in Python. The frequency threshold for constant water was set at 95% to minimize errors caused by cloud cover over the lake surface. Water coverage frequency fluctuated more strongly in Daihai, Hongjiannao, Juyan, and Ulansuhai lakes than in the other lakes (percentage ≤ 65%); notably, Daihai has shrunk significantly over the past nine years (change rate: −1.10 km2/year, p < 0.01). In contrast, Hulun, Dalinor, and Nanhaizi lakes maintained relatively constant water boundaries (>84%) over the same period. The areas of Hulun (13.98 km2/year, p < 0.01) and Nanhaizi (0.002 km2/year, p < 0.01) increased, whereas those of Dalinor (−1.01 km2/year, p < 0.01), Chagannaoer (−0.18 km2/year, p < 0.01), Daihai, and Juyan (−0.19 km2/year, p < 0.01) decreased. Hongjiannao and Ulansuhai lakes showed no significant area change (p > 0.27).
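The frequency map and trend test can be sketched as below. The text does not give the exact formulation, so this assumes the classic M-K statistic with a tie-corrected variance and a normal approximation for the two-sided p-value, and takes the reported change rate to be the Sen (1968) slope estimator cited50; the function names are hypothetical.

```python
import numpy as np
from math import erf, sqrt


def water_frequency(masks):
    """Per-pixel water frequency from a stack (scenes, rows, cols): 1 water, 0 land, NaN no data."""
    m = np.asarray(masks, dtype=float)
    valid = np.sum(np.isfinite(m), axis=0)
    water = np.nansum(m, axis=0)
    return np.divide(water, valid, out=np.full(water.shape, np.nan), where=valid > 0)


def mann_kendall(x):
    """Mann-Kendall S, Z, and two-sided p (normal approximation, tie-corrected variance)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = float(sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n)))
    _, counts = np.unique(x, return_counts=True)
    ties = np.sum(counts * (counts - 1) * (2 * counts + 5))
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    z = 0.0 if s == 0 else (s - np.sign(s)) / sqrt(var_s)  # continuity correction
    p = 1.0 - erf(abs(z) / sqrt(2.0))                      # equals 2 * (1 - Phi(|z|))
    return s, z, p


def sens_slope(t, x):
    """Median of all pairwise slopes (Sen, 1968), e.g. km2/year when t is in years."""
    t, x = np.asarray(t, dtype=float), np.asarray(x, dtype=float)
    slopes = [(x[j] - x[i]) / (t[j] - t[i])
              for i in range(len(x) - 1) for j in range(i + 1, len(x)) if t[j] != t[i]]
    return float(np.median(slopes))
```

Constant water would then be `water_frequency(stack) >= 0.95` (whether the 95% cut is inclusive is an assumption), and a lake's annual area series feeds `mann_kendall` and `sens_slope`.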
Estimation of lake water salinity
XGB, one of the most widely applied and effective machine learning algorithms, was used to build the lake salinity model51. XGB is an ensemble learner built on decision trees: it performs gradient boosting, iteratively fitting new trees to the residuals of the current ensemble, and adds a regularization term to prevent overfitting. Each tree contributes a score determined by the leaf node into which an input sample falls, and the final prediction is the sum of these scores across all trees in the ensemble. The algorithm has shown good prediction accuracy and robustness in retrieval models for the regression of lake water quality parameters52,53.
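For reference, the additive tree structure described above corresponds to the generic XGBoost formulation of Chen & Guestrin51, written here for a squared-error loss (an assumption; the loss used in this study is not stated). Here η is the learning rate, q_k maps a sample to a leaf of tree k, T_k is the number of leaves, w the leaf scores, I_j the samples in leaf j with n_j = |I_j|, and g_i, h_i the first and second derivatives of the loss with respect to the previous prediction:

$$\hat{y}_i = \hat{y}^{(0)} + \eta \sum_{k=1}^{K} f_k(\mathbf{x}_i), \qquad f_k(\mathbf{x}) = w_{q_k(\mathbf{x})}$$

$$\mathcal{L} = \sum_{i=1}^{n} \tfrac{1}{2}\left(y_i - \hat{y}_i\right)^2 + \sum_{k=1}^{K}\left(\gamma T_k + \tfrac{1}{2}\lambda \lVert \mathbf{w}_k \rVert^2\right)$$

$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} = \frac{\sum_{i \in I_j}\left(y_i - \hat{y}_i^{(t-1)}\right)}{n_j + \lambda}$$

For squared error, g_i = ŷ_i − y_i and h_i = 1, so each optimal leaf score is a shrunken mean of the residuals in that leaf; this is the sense in which XGB fits residuals, and λ is the regularization that pulls leaf scores toward zero.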
The XGB salinity algorithm was implemented in Python with the sklearn library. The main phases of salinity modeling were feature selection, model training, testing, and validation (Fig. 2). Thirteen input features were selected with the feature-importance assessment built into XGB, including Rrs(443), Rrs(497), Rrs(560), Rrs(664), Rrs(704), Rrs(740), Rrs(842), NDWI, chromaticity angle (alpha), and lake area. Lake area was included because lake evolution is inevitably associated with changes in water volume and area, especially in arid and semi-arid regions, which can affect the physicochemical properties of water (e.g., salinity)18,54. Alpha is a physical quantity related to the composition and inherent optical properties of optically deep water; its formula can be found in Wang et al.55. The 229 matched pairs of salinity and Rrs(λ) were randomly divided into a 70% training set (N = 153) and a 30% test set (N = 76), a ratio chosen according to the size of Dataset one and commonly used with XGB. The model parameters and structure were determined on the training set, and model performance was evaluated on the test set. During training, the GridSearchCV method was used to determine the hyperparameters: the number of trees was 500, the learning rate 0.05, the maximum tree depth 7, the subsample rate 0.8, the regularization 0.01, and the minimum child weight 8. After the model structure was defined, Dataset one was randomly divided into five groups for five-fold CV56, and the mean statistical metrics of the five assessments were used to evaluate the performance of the XGB salinity model. Evaluation metrics comprised the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and bias (systematic error).
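The evaluation protocol above (five metrics averaged over five folds) can be sketched without extra dependencies. The hyperparameters are written with `xgboost.XGBRegressor` argument names; mapping "regularization = 0.01" to `reg_lambda` is an assumption (it could equally be `reg_alpha`). An ordinary least-squares regressor stands in for XGB so the block runs without xgboost installed; the fold assignment and helper names are hypothetical.

```python
import numpy as np

# Hyperparameters reported in the text, using xgboost.XGBRegressor argument names.
# Mapping "regularization = 0.01" to reg_lambda (L2) is an assumption.
XGB_PARAMS = dict(n_estimators=500, learning_rate=0.05, max_depth=7,
                  subsample=0.8, reg_lambda=0.01, min_child_weight=8)


def salinity_metrics(obs, pred):
    """R2, RMSE, MAE, MAPE (%), and bias (pred - obs); MAPE assumes obs > 0."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    err = pred - obs
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err) / obs)),
        "bias": float(np.mean(err)),
    }


def kfold_cv(X, y, fit_predict, k=5, seed=0):
    """Average the metrics over k random folds; fit_predict(X_tr, y_tr, X_te) -> predictions."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    results = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        results.append(salinity_metrics(y[test], fit_predict(X[train], y[train], X[test])))
    return {name: float(np.mean([r[name] for r in results])) for name in results[0]}


def least_squares_fit_predict(X_train, y_train, X_test):
    """Stand-in regressor (ordinary least squares) so the sketch runs without xgboost."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef


# With xgboost installed, the XGB model would be plugged in like this:
# from xgboost import XGBRegressor
# def xgb_fit_predict(X_train, y_train, X_test):
#     return XGBRegressor(**XGB_PARAMS).fit(X_train, y_train).predict(X_test)
```

Here R2 is the coefficient of determination (1 − SSres/SStot); if the reported R2 comes from a fitted regression line instead, values can differ slightly. A hyperparameter search would wrap the estimator in `sklearn.model_selection.GridSearchCV` with a parameter grid.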
Data Records
The IMSAL dataset, which contains 928 salinity rasters at 10 m spatial resolution and the corresponding mean and standard deviation records for eight lakes in Inner Mongolia over the past nine years, is available in the Zenodo repository (https://doi.org/10.5281/zenodo.15849093)57. All rasters are stored in ‘TIF’ format and the records in Microsoft Excel xlsx files, totaling about 29.7 GB. The properties of each lake, including lake name, abbreviated name, location, elevation (m), district, and climate type, are catalogued in ‘Lake_info.xlsx’ (Table 4). The IMSAL dataset comprises eight subfolders named ‘Lake_Name_SAL’ and three tables. Each subfolder contains five data folders (Table 5) that store the salinity retrieved from MSI Rrs(λ): daily salinity rasters (folder name: ‘Abbreviation_Individual’), average annual salinity (‘Abbreviation_Annual’), average intra-annual seasonal salinity (‘Abbreviation_Intraannual_Season’), average seasonal salinity (‘Abbreviation_Season’), and nine-year average salinity (‘Abbreviation_Average’). The ‘Abbreviation’ for each lake is given in ‘Lake_info.xlsx’. The raster name, mean, standard deviation (S.D.), MSI image identifier, and date of each raster scene are recorded in the statistics table ‘SAL_stat.xlsx’ (Table 4). The dataset metadata are compiled in ‘IMSAL_meta.xlsx’. Table 5 lists the storage architecture, data format, raster spatial resolution, and temporal resolution of the IMSAL dataset.
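A minimal sketch of navigating this folder layout and reproducing per-scene statistics, assuming that ‘Lake_Name’ and ‘Abbreviation’ are placeholders filled from ‘Lake_info.xlsx’, that no-data pixels are NaN or a fill value, and that S.D. is the sample standard deviation (ddof = 1). The NaN-aware pixel mean is one plausible way to form the annual and seasonal composites; the exact averaging used for IMSAL is not described. Reading a TIF into an array (e.g. with `rasterio.open(path).read(1)`) is omitted, and all helper names are hypothetical.

```python
from pathlib import Path

import numpy as np

PRODUCTS = ("Individual", "Annual", "Intraannual_Season", "Season", "Average")


def product_dir(root, lake_name, abbreviation, product):
    """<root>/<Lake_Name>_SAL/<Abbreviation>_<product>, following the folder names in the text."""
    if product not in PRODUCTS:
        raise ValueError(f"unknown product {product!r}; expected one of {PRODUCTS}")
    return Path(root) / f"{lake_name}_SAL" / f"{abbreviation}_{product}"


def scene_stats(salinity, nodata=None):
    """Mean and sample S.D. (ddof=1, an assumption) over the valid pixels of one raster."""
    a = np.asarray(salinity, dtype=float)
    valid = np.isfinite(a)
    if nodata is not None:
        valid &= a != nodata
    v = a[valid]
    return float(v.mean()), float(v.std(ddof=1))


def composite_mean(stack):
    """Pixel-wise mean of co-registered rasters (scenes, rows, cols), ignoring NaN."""
    return np.nanmean(np.asarray(stack, dtype=float), axis=0)
```

Comparing `scene_stats` output against the mean and S.D. columns of ‘SAL_stat.xlsx’ is a quick way to check that no-data handling matches the published records.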
Technical Validation
Accuracy assessment of ACOLITE-derived MSI Rrs(λ)
Because accurate salinity estimation depends on the performance of atmospheric correction, the consistency between MSI-derived and in situ measured Rrs(λ) was evaluated (Fig. 4). MSI-derived Rrs(λ) was most accurate at 664 nm (N = 62, R2 = 0.83, RMSE = 0.0023 sr−1) and was more precise at visible wavelengths (497–664 nm) than in the NIR bands (704–865 nm). The processor overestimated Rrs(λ) in the infrared range, where strong water absorption leaves small reflectance values that are prone to error. Although the processor did not perform well at 783–865 nm, these bands were not the primary input features of the model, so the Rrs(λ) features used in model training were reliable. Overall, the ACOLITE processor performed acceptably (RMSE < 0.0048 sr−1, |bias| < 0.0034 sr−1), indicating that it can be used to derive Rrs(λ) from MSI images (Table 6).
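Per-band scores like those in Table 6 can be computed along these lines. This sketch assumes matched arrays of in situ and MSI Rrs(λ) shaped stations × bands, defines bias as the mean of (MSI − in situ), and uses the squared Pearson correlation for R2; these definitions are assumptions rather than details given in the text.

```python
import numpy as np


def per_band_scores(in_situ, satellite, bands):
    """R2 (squared Pearson r), RMSE, and bias (satellite minus in situ) for each band.

    in_situ, satellite: arrays shaped (n_stations, n_bands), Rrs in sr^-1.
    """
    a = np.asarray(in_situ, dtype=float)
    b = np.asarray(satellite, dtype=float)
    scores = {}
    for k, band in enumerate(bands):
        x, y = a[:, k], b[:, k]
        err = y - x
        r = np.corrcoef(x, y)[0, 1]
        scores[band] = {
            "R2": float(r ** 2),
            "RMSE": float(np.sqrt(np.mean(err ** 2))),
            "bias": float(np.mean(err)),
        }
    return scores
```

A positive bias in the NIR bands would correspond to the overestimation described above.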
Overall accuracy assessment
On the 30% held-out test set, the XGB salinity model achieved high accuracy (R2 = 0.98, RMSE = 0.95 ppt, MAE = 0.56 ppt, and MAPE = 10.2%), as shown in Fig. 5a. Estimated salinity ranged from 0.5 to 18 ppt and agreed well with measured values, although salinity in the 15–18 ppt range was slightly underestimated, possibly because of the limits of the model's training range. Five-fold CV gave accuracy comparable to the held-out test (R2 = 0.95, RMSE = 0.99 ppt, MAE = 0.51 ppt, and MAPE = 12.2%), and the cross-validated salinity matched the range of field measurements, with only a few points underestimated (Fig. 5b). Overall, these results confirm that the XGB salinity model retrieves salinity from MSI Rrs(λ) with good precision and robustness.
Pixel-based statistical validation
Histograms of all lake pixels were computed for the salinity rasters retrieved from satellite images concurrent with field sampling (Fig. 6). An image histogram shows the mode of estimated salinity and reveals outliers in the raster data, which commonly lie at both ends of the histogram with few pixels and abrupt value transitions. Most lakes showed a single-peaked salinity histogram with little within-lake variation (Fig. 6), consistent with the distribution of field-measured salinity. A few lakes showed bimodal histograms, with few pixels in the secondary peak (Fig. 6r,x). Outlier pixels in the 0–5 ppt range were observed only in Daihai Lake, probably because of mixed pixels in nearshore waters (Fig. 6x). Box plots of the mean and standard deviation of measured and estimated salinity show small deviations for most lakes. Overall, no substantial outlier pixels were observed, indicating that the salinity rasters are of good quality.
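The histogram screening described above can be sketched as follows. Here an "abrupt value transition" is operationalized, as an assumption, as at least one empty bin separating a group of pixels from the contiguous run of bins around the mode; the 0.5 ppt bin width and the function name are likewise hypothetical.

```python
import numpy as np


def histogram_outliers(salinity, bin_width=0.5):
    """Histogram mode and a mask of pixels cut off from the main mode by empty bins.

    Pixels outside the contiguous run of non-empty bins around the peak are flagged,
    an assumed proxy for 'few pixels with abrupt value transitions'. NaN is never flagged.
    """
    v = np.asarray(salinity, dtype=float).ravel()
    finite = np.isfinite(v)
    x = v[finite]
    edges = np.arange(np.floor(x.min()), x.max() + 2 * bin_width, bin_width)
    counts, edges = np.histogram(x, bins=edges)
    peak = int(np.argmax(counts))
    mode = 0.5 * (edges[peak] + edges[peak + 1])
    lo = peak
    while lo > 0 and counts[lo - 1] > 0:                 # extend left until an empty bin
        lo -= 1
    hi = peak
    while hi < len(counts) - 1 and counts[hi + 1] > 0:   # extend right until an empty bin
        hi += 1
    flagged = np.zeros(v.shape, dtype=bool)
    flagged[finite] = (x < edges[lo]) | (x > edges[hi + 1])
    return float(mode), flagged.reshape(np.shape(salinity))
```

A bimodal lake whose secondary peak is connected to the main one by non-empty bins is not flagged by this rule, which matches treating such peaks as real structure rather than noise.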
Lake salinity independent validation
To objectively assess the reliability of the IMSAL dataset, independent validation was carried out using Dataset two, which was not involved in model training or testing (Fig. 7). A total of 69 recent field-sampled and literature-recorded values, each within one day of a salinity raster, were used. Validation density was insufficient for historical salinity rasters because few matching data were available. The few in situ measurements from Lake Nanhaizi (N = 3) had all been used for model training and testing, so that lake was not included in the independent validation (Fig. 7a); field sampling of this lake will be expanded in future work. As the fitted line in Fig. 7b shows, the independent validation achieved good precision (N = 69, R2 = 0.94, p < 0.01, and MAPE = 11.9%). Overestimation was the main form of XGB model error, especially from 0 to 6 ppt, probably because the training set contained more samples from oligosaline lakes than from brackish lakes, which pushed low-salinity predictions upward. In contrast, the model slightly underestimated salinity in the 6–9 ppt range, possibly because of slight overfitting to the dense training data in this range (Fig. 5b). Salinity estimates for freshwater lakes (<1 ppt) may carry high uncertainty (RMSE = 1.58 ppt), because insufficient freshwater–salt mixing and low salinity give weak salinity-related spectral features. Unavoidable systematic errors in fitting the nonlinear relationship between salinity and Rrs(λ) further increase the uncertainty in freshwater lakes. The comparison with external data thus further supports the reliability of IMSAL.
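Extracting the raster value at each validation point requires converting projected coordinates to a pixel index. The sketch below assumes a north-up raster with a known upper-left corner and 10 m pixels (no rotation terms) and optionally takes the median of a small window; whether the authors used a single pixel or a window is not stated, and the function names are hypothetical.

```python
import numpy as np


def world_to_pixel(x, y, x_origin, y_origin, pixel_size=10.0):
    """(row, col) of projected point (x, y) in a north-up raster with upper-left corner (x_origin, y_origin)."""
    col = int(np.floor((x - x_origin) / pixel_size))
    row = int(np.floor((y_origin - y) / pixel_size))
    return row, col


def sample_raster(raster, x, y, x_origin, y_origin, pixel_size=10.0, window=1):
    """Median of a window x window block centred on the point; NaN if outside or all no-data."""
    a = np.asarray(raster, dtype=float)
    row, col = world_to_pixel(x, y, x_origin, y_origin, pixel_size)
    half = window // 2
    r0, r1 = max(row - half, 0), min(row + half + 1, a.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, a.shape[1])
    if r0 >= r1 or c0 >= c1:  # point falls outside the raster
        return float("nan")
    block = a[r0:r1, c0:c1]
    return float(np.nanmedian(block)) if np.isfinite(block).any() else float("nan")
```

A 3 × 3 window dampens single-pixel noise but can mix in nearshore pixels, so `window=1` may be preferable close to the shoreline.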
Usage Notes
Effective monitoring of water salinity is essential for preventing lake salinization and assessing water quality. However, regional spatiotemporal monitoring of lake salinity remains insufficient. The IMSAL dataset partly fills this gap and helps the public understand spatial patterns and trends in lake salinity. Because lake salinity in arid and semi-arid areas responds sensitively to climate, the dataset also offers climatologists an opportunity to extract climate signals. As the model is uncertain in freshwater lakes, users whose research requires a careful distinction between freshwater and brackish water should combine the dataset with field data. The insufficient density of independent validation for historical salinity images motivates our future work, which will require more frequent sampling or routine monitoring stations to collect long-term validation data. The model's underestimation of salinity in the 15–18 ppt range may delay the detection of changes in water properties and seasonal patterns, and hence delay response measures. Potential corrections include expanding the model's range by adding training samples from high-salinity lakes, or building a correction function from field samples to recalibrate the salinity of oligosaline lakes. In addition, our XGB model for rapid remote sensing retrieval of lake salinity offers technical insights for generating lake salinity datasets in other regions. Unlike salinity data constructed from environmental-data modeling and field measurements53,58, the remote-sensing-based XGB salinity model in this study can reconstruct historical salinity and update the data iteratively. The complete code for building the XGB salinity model and applying it to MSI images is provided with this study.
Users interested in other lakes can extend the training samples of the current model to construct a practical model for those lakes.
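One of the correction strategies mentioned above, a correction function fitted to field samples, could look like the following hypothetical sketch: a linear fit of measured against estimated salinity, applied only to pixels whose estimates are at or above a chosen threshold (15 ppt here, matching the underestimated range). A single linear fit and a hard threshold are assumptions; a hard threshold can introduce a discontinuity, so a smooth blend may be preferable in practice.

```python
import numpy as np


def fit_linear_correction(estimated, measured):
    """Least-squares fit of measured ~ a * estimated + b from matched field samples."""
    a, b = np.polyfit(np.asarray(estimated, dtype=float), np.asarray(measured, dtype=float), 1)
    return float(a), float(b)


def apply_correction(salinity, a, b, threshold=15.0):
    """Apply the correction to pixels estimated at or above `threshold` ppt; NaN stays NaN."""
    s = np.asarray(salinity, dtype=float)
    out = s.copy()
    high = s >= threshold          # NaN compares False, so no-data pixels are untouched
    out[high] = a * s[high] + b
    return out
```

The correction should be fitted on samples not used to train the XGB model, otherwise it inherits the same bias it is meant to remove.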
Code availability
Demonstration code for constructing, five-fold cross-validating, and applying the XGB salinity model is available at https://github.com/MingMDeng/SAL_inversion.git and should be run and edited with Python 3.11. Atmospheric correction of the Sentinel-2 MSI data was performed with the ACOLITE software (version 20221114.0). Remote sensing reflectance extraction and image mosaicking were performed in ENVI 5.3, and pixel-based operations and salinity data visualization in ArcGIS 10.2.
References
Wurtsbaugh, W. A. et al. Decline of the world’s saline lakes. Nat. Geosci. 10, 816–821 (2017).
Zhao, J. & Temimi, M. An Empirical Algorithm for Retreiving Salinity in the Arabian Gulf: Application to Landsat-8 Data. in 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 4645–4648 (IEEE, New York, 2016).
Vidal, N. et al. Salinity shapes food webs of lakes in semiarid climate zones: a stable isotope approach. Inland Waters 11, 476–491 (2021).
Jeppesen, E., Beklioglu, M., Ozkan, K. & Akyurek, Z. Salinization Increase due to Climate Change Will Have Substantial Negative Effects on Inland Waters: A Call for Multifaceted Research at the Local and Global Scale. Innovation-Amsterdam 1, 100030 (2020).
Liao, Y. et al. Salinity is an important factor in carbon emissions from an inland lake in arid region. Sci. Total Environ. 906, 167721 (2024).
Cañedo-Argüelles, M. et al. Saving freshwater from salts. Science 351, 914–916 (2016).
Tang, X. et al. Effects of climate change and anthropogenic activities on lake environmental dynamics: A case study in Lake Bosten Catchment, NW China. J. Environ. Manage. 319, 115764 (2022).
Williams, W. D. Salinisation: A major threat to water resources in the arid and semi-arid regions of the world. Lakes & Reservoirs 4, 85–91 (1999).
Jiang, X. et al. Centenary covariations of water salinity and storage of the largest lake of Northwest China reconstructed by machine learning. J. Hydrol. 612, 128095 (2022).
Hu, M. et al. A dataset of trophic state index for nation-scale lakes in China from 40-year Landsat observations. Sci. Data 11, 659 (2024).
Drusch, M. et al. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote Sensing of Environment 120, 25–36 (2012).
Xu, J. et al. Optical models for remote sensing of chromophoric dissolved organic matter (CDOM) absorption in Poyang Lake. ISPRS-J. Photogramm. Remote Sens. 142, 124–136 (2018).
Guo, H., Huang, J. J., Chen, B., Guo, X. & Singh, V. P. A machine learning-based strategy for estimating non-optically active water quality parameters using Sentinel-2 imagery. Int. J. Remote Sens. 42, 1841–1866 (2021).
Pahlevan, N. et al. Seamless retrievals of chlorophyll-a from Sentinel-2 (MSI) and Sentinel-3 (OLCI) in inland and coastal waters: A machine-learning approach. Remote Sens. Environ. 240, 111604 (2020).
Sullivan, S. A. Experimental Study of the Absorption in Distilled Water, Artificial Sea Water, and Heavy Water in the Visible Region of the Spectrum. J. Opt. Soc. Am. 53, 962 (1963).
Bai, Y. et al. Remote sensing of salinity from satellite-derived CDOM in the Changjiang River dominated East China Sea. J. Geophys. Res.-Oceans 118, 227–243 (2013).
Liu, Y., Bao, A. & Chen, X. Measuring salinity of low salinity lake by optical remote sensing: A case study of Bosten Lake. Journal of Remote Sensing 18, 902–911 (2014).
Liu, C. et al. The decrease of salinity in lakes on the Tibetan Plateau between 2000 and 2019 based on remote sensing model inversions. Int. J. Digit. Earth 16, 2644–2659 (2023).
Guo, H. et al. A generalized machine learning approach for dissolved oxygen estimation at multiple spatiotemporal scales using remote sensing. Environ. Pollut. 288, 117734 (2021).
Liu, D. et al. Substantial increase of organic carbon storage in Chinese lakes. Nat. Commun. 15, 8049 (2024).
Zhao, Z., Shi, K. & Zhang, Y. Remote sensing estimation of dissolved organic carbon concentrations in Chinese lakes based on Landsat images. J. Hydrol. 638, 131466 (2024).
Palmer, S. C. J., Kutser, T. & Hunter, P. D. Remote sensing of inland waters: Challenges, progress and future directions. Remote Sens. Environ. 157, 1–8 (2015).
Song, C. et al. Widespread declines in water salinity of the endorheic Tibetan Plateau lakes. Environ. Res. Commun. 4, 091002 (2022).
Tao, S. et al. Rapid loss of lakes on the Mongolian Plateau. Proc. Natl. Acad. Sci. USA 112, 2281–2286 (2015).
Feng, S., Zheng, S., Guan, W., Han, L. & Wang, S. Evolution of the lake area and its drivers during 1990-2021 in Inner Mongolia. Environ. Earth Sci. 83, 414 (2024).
Xu, C. Bacterial Diversity in the water and Sediment of Daihai Lake. (Shanghai Ocean University, Shanghai, 2023).
Yang, F. et al. Investigation of Water Environment and Fish Diversity in Dalinor Wetlands I. Major Ions, Salt Content and Electrical Conductivity in the Water of Dali Lake. Wetland Science 18, 507–515 (2020).
Li, X. et al. Spatiotemporal Variation of Phytoplankton Community Structure and Its Influencing Factors in the Dalinor Lake. Wetland Science 21, 897–906 (2023).
Gao, J. et al. Effects of water quality and bacterial community composition on dissolved organic matter structure in Daihai lake and the mechanisms. Environ. Res. 214, 114109 (2022).
Yang, W. et al. Characterization for nitrogen metabolism of sediments in highland saline lake. China Environmental Science 43, 1328–1339 (2023).
Ren, X. et al. Spatial changes and driving factors of lake water quality in Inner Mongolia, China. J. Arid Land 15, 164–179 (2023).
Yu, H. Health Assessment and Study on Adaptability of Evaluation System in Juyan Lake, Inner Mongolia. (Inner Mongolia Agricultural University, 2021).
Zhou, J. Effects of Wuliangsuhai Water Environmental Factors on Nutritional Status of Lakes. (Inner Mongolia Agricultural University, 2020).
Hu, J. Spatiotemporal dynamics and driving factors of zooplankton community in representative lakes in the Inner Mongolia plateau, China. (Xi’an University of Technology, 2024).
Bai, H. et al. Characteristics of Zooplankton Community Structure and Its Relationship with Environmental Factors in Hongjiannao Lake. Journal of Ecology and Rural Environment 38, 1064–1075 (2022).
Bai, H. et al. Seasonal Changes of Phytoplankton Communities and Their Relationship with Environmental Factors in the Hongjiannao Lake, A Desert Lake. Wetland Science 22, 155–168 (2024).
Feng, S., Liu, X. & Li, H. Spatial variations of δD and δ18O in lake water of western China and their controlling factors. Journal of Lake Sciences 32, 1199–1211 (2020).
Gitelson, A. A. et al. A simple semi-analytical model for remote estimation of chlorophyll-a in turbid waters: Validation. Remote Sens. Environ. 112, 3582–3593 (2008).
Bricaud, A., Morel, A. & Prieur, L. Absorption by Dissolved Organic-Matter of the Sea (yellow Substance) in the Uv and Visible Domains. Limnol. Oceanogr. 26, 43–53 (1981).
Cao, Z., Duan, H., Feng, L., Ma, R. & Xue, K. Climate- and human-induced changes in suspended particulate matter over Lake Hongze on short and long timescales. Remote Sens. Environ. 192, 98–113 (2017).
Mueller, J. L. et al. Ocean Optics Protocols For Satellite Ocean Color Sensor Validation, Revision 4. Volume III: Radiometric Measurements and Data Analysis Protocols. (2003).
Mobley, C. D. Estimation of the remote-sensing reflectance from above-surface measurements. Appl. Optics 38, 7442–7455 (1999).
Zeng, F. et al. Monitoring inland water via Sentinel satellite constellation: A review and perspective. ISPRS-J. Photogramm. Remote Sens. 204, 340–361 (2023).
Vanhellemont, Q. Adaptation of the dark spectrum fitting atmospheric correction for aquatic applications of the Landsat and Sentinel-2 archives. Remote Sens. Environ. 225, 175–192 (2019).
Vanhellemont, Q. Sensitivity analysis of the dark spectrum fitting atmospheric correction for metre- and decametre-scale satellite imagery using autonomous hyperspectral radiometry. Opt. Express 28, 29948–29965 (2020).
Ma, R. et al. China’s lakes at present: Number, area and spatial distribution. Sci. China-Earth Sci. 54, 283–289 (2011).
Otsu, N. Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979).
Hu, C. A novel ocean color index to detect floating algae in the global oceans. Remote Sens. Environ. 113, 2118–2129 (2009).
Deng, M. et al. Monitoring Salinity in Inner Mongolian Lakes Based on Sentinel-2 Images and Machine Learning. Remote Sens. 16, 3881 (2024).
Sen, P. K. Estimates of the Regression Coefficient Based on Kendall’s Tau. Journal of the American Statistical Association 63, 1379–1389 (1968).
Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. in KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery, New York, 2016).
Cao, Z. et al. A machine learning approach to estimate chlorophyll-a from Landsat-8 measurements in inland lakes. Remote Sens. Environ. 248, 111974 (2020).
Xu, P., Liu, K., Shi, L. & Song, C. Machine learning modeling reveals the spatial variations of lake water salinity on the endorheic Tibetan Plateau. J. Hydrol. Reg. Stud. 56, 102042 (2024).
Williams, W. D. Environmental threats to salt lakes and the likely status of inland saline ecosystems in 2025. Environ. Conserv. 29, 154–167 (2002).
Wang, S. et al. A dataset of remote-sensed Forel-Ule Index for global inland waters during 2000–2018. Sci. Data 8, 26 (2021).
Cao, Z. et al. A decade-long chlorophyll-a data record in lakes across China from VIIRS observations. Remote Sens. Environ. 301, 113953 (2024).
Deng, M. et al. A non-optically active lake salinity dataset in Inner Mongolia. Zenodo https://doi.org/10.5281/zenodo.15849093 (2025).
Thorslund, J. & van Vliet, M. T. H. A global dataset of surface water and groundwater salinity measurements from 1980–2019. Sci. Data 7, 231 (2020).
Pekel, J.-F., Cottam, A., Gorelick, N. & Belward, A. S. High-resolution mapping of global surface water and its long-term changes. Nature 540, 418–422 (2016).
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Grant Nos. 42361144002 and 42371371). We gratefully acknowledge field data support from the Lake-Watershed Science Data Center (http://lake.geodata.cn) and Inner Mongolia University. We also thank Yiqiu Wu, Guang Gao, and Feizhou Cheng of the Nanjing Institute of Geography and Limnology for their efforts in the field experiments.
Author information
Contributions
Deng, M. contributed to data curation, formal analysis, methodology, programming, quality control, validation, visualization, and writing – original draft preparation and editing. Ma, R. contributed to conceptualization, data curation, funding acquisition, investigation, methodology, programming, project administration, quality assurance, quality control, supervision, and writing – review and editing. Wang, L. contributed to data curation. Hu, M. contributed to writing – review and editing. Xue, K. contributed to writing – review and editing. Cao, Z. contributed to methodology, writing – review and editing. Xiong, J. contributed to data curation and validation. Yu, Z. contributed to writing – review and editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Deng, M., Ma, R., Wang, L. et al. A non-optically active lake salinity dataset by satellite remote sensing. Sci Data 12, 1324 (2025). https://doi.org/10.1038/s41597-025-05686-2
DOI: https://doi.org/10.1038/s41597-025-05686-2