Prolonged water body types dataset of urban agglomeration in central China from 1990 to 2021

Ma, Haozheng; Yang, Xiaohong; Fan, Runyu; He, Kang; Wang, Lizhe

doi:10.1038/s41597-025-04794-3

Download PDF

Data Descriptor
Open access
Published: 22 March 2025

Prolonged water body types dataset of urban agglomeration in central China from 1990 to 2021

Scientific Data volume 12, Article number: 480 (2025) Cite this article

1733 Accesses
1 Citations
Metrics details

Subjects

Abstract

Data and knowledge of spatial-temporal dynamics of multiple types of water body are significant for water resources management but remain very limited. Using the Landsat satellite data and weakly supervised deep learning techniques for long term mapping, we report annual maps of multiple types of inland water bodies on urban agglomeration scale in the middle reaches of the Yangtze River (MRYR) during 1990-2021 at 30m spatial resolution. Accuracy assessment from 14000 validation points in seven years indicates an overall accuracy of 94.50%. Quantitative and qualitative comparison with other water related mapping products further demonstrates the superiority on long time span, refined classification system, and the high precision. The Water-MRYR could support the government and the public in the challenging water resources management and wetland conservation.

Evaluation of water ecological security and diagnosis of Obstacles in the Yangtze river delta, China

Article Open access 22 August 2025

Extraction of water bodies from high-resolution remote sensing imagery based on a deep semantic segmentation network

Article Open access 25 June 2024

Ecological-environmental challenges and restoration of aquatic ecosystems of the Middle-Eastern

Article Open access 14 October 2022

Background & Summary

Inland surface water bodies are critical components of ecosystems on earth. They play a vital role in supporting agriculture, industry, and human societies from regional to global¹. The inland water are also closely associated with several of the Sustainable Development Goals (such as, SDG 6, 14 and 15). Different types of water bodies, such as rivers, lakes, and artificial water body, have different impacts on the ecological environment of cities^2,3,4. The artificial ponds used for agriculture and aquaculture typically have higher carbon emissions^5,6. Large-scale collection of artificial water bodies can impact the land surface dramatically, leading to changes of land cover types in lakes, rivers, and wetlands.

Numerous inland surface water body mapping data at various spatial resolutions have been created and applied in surface water body monitoring^7,8,9,10,11. These binary maps (water and not water) reported the monthly or yearly dynamics of permanent and seasonal surface water bodies around the world. Besides, some land cover^12,13,14 and wetland^15,16 related researches have started to consider the classification of water and wetland (seasonal water) and their long-term trend^17,18,19,20. However, the various types of surface waters have similar spectral features, which causes challenges when differentiating them. Threshold classification method based on the various calculated shape parameters, such as compactness, area and boundary index, has been applied to map the various water cover types^21,22,23,24. Because of the challenging stable features extraction and extensive manually post-processing work, existing studies are constrained to a limited number of cities and years, preventing large-scale and longer-term mapping and analysis^19,21.

The urban agglomeration in the middle reaches of the Yangtze River^25,26 encompasses the major cities of Hubei, Hunan, and Jiangxi, covering an area of approximately 250,000 square kilometers, as shown in Fig. 1. This region is endowed with abundant water and wetland resources²⁷, with key rivers such as the Yangtze River, Xiangjiang River, and Ganjiang River traverse the area, and significant inland lakes like Dongting Lake, Poyang Lake, and Hong Lake. In this case, high precision mapping of the surface water bodies and derived spatial-temporal dynamics information over the past few decades in this area are essential for the understanding of the roles of various water cover types in water security, but such information has not been well documented to the government and public, yet.

In recent years, the deep learning methods, with their automatic feature learning and extraction capabilities, have shown advantages and widely applied in remote sensing image interpretation^28,29,30, such as image classification, semantic segmentation, and object detection. Among these models, the deep learning based semantic segmentation models have the ability of capturing contextual information for pixel-level classification³¹, thus have been widely used to reveal the spatiotemporal distribution of the research object from remote sensing image in large scale^32,33,34,35. In this study, we used deep learning techniques to map the multiple water body types of the urban agglomeration in the middle reaches of the Yangtze River from 1990 to 2021 at 30 resolution. Four types of water body are considered in this study, including lakes, rivers, wetlands, and artificial water bodies. A Semantic-to-Pixel framework is proposed combing the semantic segmentation deep model and binary water body maps for the mapping of water cover types. In addition, a weakly supervised long term mapping framework is proposed aiming at the automatic long term sample generation. The overall accuracy indicated by over 14000 validation points for seven years ranges from 0.9245 to 0.9598, which reflects the high precision of our method. The dataset can serve as a foundation for further studies on drivers of artificial water bodies expansion and their implications for socio-economic conditions and sustainable land use development, particularly regarding environmental impacts.

Methods

Study area

The urban agglomeration in the middle reaches of the Yangtze River^25,26 encompasses 31 major cities of Hubei, Hunan, and Jiangxi, covering an area of approximately 250,000 km² with a population of 130 million. It is an important component of the Yangtze River Economic Belt and holds a significant position in China’s economic and social development pattern. This urban agglomeration is one of the regions with the richest water resources in China, with three International Wetland cities, Wuhan, Changde, and Nanchang.

Materials

Satellite imagery

The Landsat on the Google Earth Engine (GEE), covering Landsat 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), Landsat 8 Operational Land Imager (OLI), are used in this study. All of the images are from the Tire 1 of Landsat Collection 2, which is a dataset containing atmospherically corrected surface reflectance data. The spatial resolution of Landsat data is 30 meters, with six bands being used in our work, including the red, green, blue, near-infrared, and two shortwave infrared bands. The Quality Assessment (QA) band, which is driven from CFmask^36,37 to mask observations affected by cloud or cloud shadow, is used to remove observations in low quality from each tiles.

For the semantic segmentation task, the median composite images are used for a clear observation. As the images from Landsat 5 are usually in low quality, the images from 1990 to 2010 are the composite using the adjacent years(three years composite for the middle year). After the launch of Landsat 8, the quality of the images has been greatly improved, therefore we use the median synthesis within the year.

Semantic segmentation Dataset

Precision annotated labels for model training are crucial for semantic segmentation tasks. However, there is currently a lack of publicly available benchmark datasets for mapping multiple types of water bodies. In this case, we start out by manually assign the labels for semantic segmentation. The median composite imagery in 2020 from Landsat 8 is selected for label generation. The image of the whole area is first crop into non-overlapping patches with size 512 × 512 pixels (2940 in total). We conduct stratified sampling based on the number of water bodies in the GSW_JRC⁷, which is a global surface water dataset developed by the European Commission’s Joint Research Centre as shown in Table 1. Then 466 (around 15%) patches are selected and manually labeled with four water body categories as shown in Table 2. Specifically, considering the resolution of remote sensing images and the complexity of annotation, for the category of artificial water bodies, we do not remove the tunnel gaps among them, but instead used a Semantic-to-Pixel framework (next section) for contour extraction. Finally, the semantic segmentation dataset is randomly divided into training and validation set of 4:1.

Table 1 The introduction of dataset used in this study.

Full size table

Table 2 The classification system used used in this study.

Full size table

Binary water map

Binary water map is used to remove the tunnel gaps among the artificial water bodies and serve as weak guidance for long term training. The GSW Yearly Water Classification History⁷ provides YearlyHistory dataset with spatial information on permanent and seasonal water in GEE platform. However, the seasonal water category is a composition of the monthly bianry product, resulting in a relatively lower accuracy.

In order to decrease the omission for seasonal water, the widely used mNDWI-VIs algorithm^8,9,38 is selected to generate the water frequency map Water_f. Water_f is developed using the time series Landsat data and three indices, NDVI³⁹ (Normalized Difference Vegetation Index, equation (1)), EVI⁴⁰ (Enhanced Vegetation Index, equation (2)), and mNDWI⁴¹ (Modified Normalized Difference Water Index, equation (3)). The water body detection algorithm is ((mNDWI > EVI or mNDWI > NDVI) and (EVI < 0.1)). The pixels satisfying the above condition are classified as water. In addition, the Shuttle Radar Topography Mission (SRTM)⁴² digital elevation model (DEM), as well as the slope, are used to remove terrain shadows in mountainous areas. The water frequency maps are then merged with a threshold 0 to get the binary water map with clear boundary. Finally, the binary water maps from GSW_YRC and mNDWI-VIs algorithm are intersected together to get the synthetic water map.

$$NDVI=\frac{NIR-RED}{NIR+RED}$$

(1)

$$EVI=2.5\ast (\frac{NIR-RED}{NIR+6\ast RED-7.5\ast BLUE+1})$$

(2)

$$MNDWI=\frac{GREEN-SWIR}{GREEN+SWIR}$$

(3)

Multiple Water body types mapping using deep semantic segmentation models

This study proposed a framework for mapping multiple water body types in urban agglomerations scale using satellite imagery and deep learning techniques. We first create a semantic segmentation dataset for the multiple water body types using the Landsat 8 median composite images and manually assigned labels. The Semantic-to-Pixel framework for the mapping of multiple types of water bodies is then applied to get the map in 2020 as shown in Fig. 2 step1. Then the weakly-supervised training framework, shown in Fig. 2 step2, is applied to train models for the mapping application in other years and the models are used for the long-term mapping.

The batch size is set to 8 for the model training. The Segformer model⁴³, which is developed from vision transformer, has demonstrated strong feature extraction capabilities and been widely used in remote sensing tasks, is selected as the backbone for wetland mapping. The Mean Intersection over Union (MIoU)) is selected as the metric index on the test dataset.

Due to the relatively lower resolution and the upsample structure in the semantic segmentation model. We ignore the tunnel gaps among the artificial water bodies and just get the semantic boundary while labeling and training the model. In order to get a more detailed mapping result, the outputs of the segmentation model are refined with the water frequency maps to correct the tunnel gaps and road among the clusters of artificial water bodies. The tunnel gaps among the artificial water body are masked as they are assigned not water in the binary water maps.

Long term multiple Water body types mapping via weakly supervision

Due to the domain shift from the different satellite sensors and imaging quality, the model trained with the remote sensing images and labels from one year could not be directly applied in mapping the images from other years even in the same place⁴⁴. The long term stable training samples are generated using various ways^17,19,22,45. However, manually annotated labels year by year is extremely time-consuming for decades observation. In this case, we proposed the weakly supervised long term mapping framework for the multiple water body types. For the long-term mapping, samples from the baseline year with labels ${D}_{l}={({x}_{i}^{l},{y}_{i}^{l})}_{i=1}^{{N}_{l}}$ and other years without labels ${D}_{u}={({x}_{i}^{u})}_{i=1}^{{N}_{u}}$ are used to train the semantic segmentation model. In order to achieve that, the proposed framework utilizes a standard teacher-student model, which consists two models of the same architecture. As for the weight updating, we adopt the commonly used exponential moving average(EMA)⁴⁶ strategy, which means the teacher model’s weights are updated based on the weights of student model with a momentum.

The loss design is also important for the training process, we design three loss in this weakly supervised long term mapping framework. For the labeled images, the objective is to minimize the fully-supervised cross-entropy loss L_s shown in equation (4). As for the unlabeled images, the pixel-wise pseudo-labels are generated from the prediction of the teacher model. In order to select the high-confidence pseudo-labels, we calculate the entropy for each pixel and compare it with a dynamic percentile threshold θ_t. The entropy at pixel (i, j) is defined as in equation (5). The p_ij ∈ R^C is the probability of each category after the softmax function in teacher model and C is the number of classes. The pixels with entropy lower than the θ_t are selected as reliable pseudo labels and the unsupervised loss L_u is calculated. During the training procedure, the pseudo-labels tend to be reliable gradually. Base on this intuition, the θ_t is updated with linear strategy every epoch, which is shown in equation (6) with t and T denoting the current epoch and total epoch (θ₀ set to 80% in this study).

$${L}_{ce}=-\sum {Y}_{n}log(S({X}_{n}))$$

(4)

$${E}_{{p}_{ij}}=-\mathop{\sum }\limits_{c=0}^{C-1}{p}_{ij}(c)log({p}_{ij}(c))$$

(5)

$${\theta }_{t}={\theta }_{0}+\frac{t}{T}(1-{\theta }_{0})$$

(6)

Besides, as the different water body types belong to water, the binary water maps are also used as supervision to get the water loss L_water, which helps the model to differentiate between water and non water. The probability maps of the classes in water are merged and softmax to the binary output. The cross entropy loss is also used for the unsupervised loss and water loss. The final loss is shown in equation (7).

$$L={L}_{s}+{L}_{u}+{L}_{water}$$

(7)

Accuracy assessment

To cross-validate our product, we design the accuracy assessments by a stratified random sampling approach. We evaluate the accuracy based on mapping from Landsat 30 m prediction. Accuracy assessment and area estimation of the mapping results are performed followed^47,48. The overall accuracy (OA) is used to describe the overall performance of all categories in the mapping products. To evaluate each category, the F1 Score is chosen as it represents the harmonic mean of Precision (producer accuracy) and Recall (user accuracy), effectively balancing these two metrics.

The stratified random sampling is conducted on seven years (1990, 1995, 2000, 2005, 2010, 2015, and 2020) with about 2000 validation points for each year to assess the mapping accuracy. For each point, we determine the category the point belonged to using the Landsat mosaic imagery and the corresponding binary water map. Specifically, for the points from 2015 and 2020, we also use Sentinel-2 imagery with higher spatial resolution as a reference for annotation.

Data Records

Water-MRYR dataset can be accessed at https://doi.org/10.57760/sciencedb.12990⁴⁹. The dataset is a long term water body types mapping result of the urban agglomeration in the middle reaches of the Yangtze River from 1990 to 2021 at 30m spatial resolution, which has been produced using the developed pixel-semantic framework and weakly supervised deep learning technology. The accuracy of the mapping results is validated using over 14,000 verification points, achieving an overall accuracy (OA) of 94.50%. The mapping product reveals the spatiotemporal distribution of various water body types in the urban agglomeration, which is of great significance for ecological conservation and water resource management. The dataset contains 14 products, with one every five years from 1990 to 2010 and one every year from 2013 to 2021. All the files are stored in .tif format and named UA_MRYR_Year.tif for each year. The index numbers for 0 to 4 represent background, lake, river, wetland, and artificial water, as shown in Fig. 3.

Technical Validation

Accuracy assessment on the mapping product

Validation on sample points

Validated with our sample points, the OA of water cover mapping for the seven reference years are 92.45%, 93.41%, 93.61%, 95.49%, 95.33%, 95.98%, and 95.26%. The average OA for the seven years is 94.50%. The F1 score on different categories is shown in Table 3. The accuracy of lakes and artificial water bodies all exceeds 90%, while the accuracy of most rivers and wetlands exceeds 80%. The above results demonstrate the high accuracy of our products.

Table 3 The accuracy assessment result for the F1 Score on each categories and Overall Accuracy on each year.

Full size table

Validation on open-access validation sets

To get a more comprehensive accuracy verification, we evaluate our products based on two open-access validation sets. The first⁵⁰ is a sample set for global land cover mapping, which is produced using all-season Landsat-8 imagery around 2015. We filter the samples in the study area and get 1008 points. After manually check the assign the water related category, the OA for our products (UA_MRYR_2015) is 98.91%. The GLobal Lakes and Wetlands Database(GLWD)⁵¹ is a summary of several global and reginal water-related dataset developed in 2004. We use the level-2 data, which contains 691 polygons with rough shape boundary for lakes and rivers, to validate our product (UA_MRYR_2005). After manual rectification some errors in GLWD according to the Landsat imagery in 2005, our mapping products are semantically consistent with 96.96% (670) of polygons, revealing the high accuracy of our mapping product.

Statistical-level validation

In addition to assessment on validation points, the consistency between mapping products and statistical data is also considered as an important aspect of evaluating mapping results. The city level area statistics are conducted using the mapping product in 2020 as shown in Fig. 4. Nearly 20% of the water resources are artificial water body, which significantly affect the economy and environment in the urban agglomerations. Our map reveals extensive artificial water body distributed around large rivers and lakes in most cities. The top ten cities account for 80.74% of the the urban agglomerations share of artificial water body coverage, including Jingzhou (28.67%), Wuhan (8.98%), Xiaogan (8.00%), Changde (6.40%), Yueyang (5.93%), Yiyang (5.65%), Huanggang (5.35%), Xiantao (5.12%), Qianjiang (3.33%) and Tianmen (3.33%). Among the top 20 cities, 13 cities are located in Hubei Province, which also has the largest amount of water resources.

Our research also reveals the relationship between artificial water body area and fishery industry production in city level (https://www.stats.gov.cn). The data on the total value of fishery production comes from the official statistical data released by the Provincial Bureau of Statistics. The result shows a significant correlation between artificial water bodies and the total value of the fishery industry, with Pearson Correlation: 0.88*** and Spearman Correlation: 0.83***, suggesting a central role of artificial water body in supporting regional fishery economy.

Comparison with other water-related datasets

For a more comprehensive comparison, we assess the performance of our products with four water related mapping products in the MRYR, as shown in Fig. 5. The GSW_JRC⁷ contains permanent water and seasonal water. EA_Wetland²³ is a wetland mapping product in east Asia in 10m resolution. The multiple water cover types are considered in this product, and the inland swamp and marsh are merged as the wetland category for comparison. The CWaM²¹ is a water cover map product in China developed with Sentinel-1 and 2 imagery. The rice filed and agriculture ponds are merged as artificial water. GLC_FCS30D¹⁸ is a dynamic global land cover product, which also considers wetland categories the same as GWL_FCS30¹⁶. The inland wetland, namely swamp, marsh and flooded flat are merged for comparison. ESA_Worldcover⁵² is a renowned global land cover map from the European Space Agency. The water, wetland, cropland, and grassland types are selected for comparison.

For wetland classification dataset, EA_Wetland has presented mapping with high spatial resolution (10m), but its performance in distinguishing multiple types of water bodies needs improvement. For the water specific data, the CWaM also clearly presents the spatial distribution of some water body categories, but it also contains a substantial amount of noise for the artificial ponds. For the land cover product, the GLC_FCS30D and ESA WorldCover mainly classify the artificial water body as cropland and grassland. In summary, our products best demonstrate the spatial information of multiple types of water body.

Comparison with long-term mapping products

The four mapping products mentioned above only provide the spatial information only in year 2021 except the GLC_FCS30D. To assess the presentation ability of spatiotemporal information, the GLC_FCS30D and the GSW_YRC are considered for comparison. The two products provide long-term (the nominal period is 1985-2021, but the data for 1985-1990 is incomplete due to the poor imaging quality) spatial information for water and land cover respectively. Different types of water bodies have different trends of change, but this information has not been reflected in current mapping products. In this case, the mapping results for the three products in Hong Lake(The largest lake in Hubei Province) are shown in Fig. 6.

In terms of water bodies, our products maintain a high degree of consistency with GSW, but GSW lacks the specific category information for water bodies. The GSW_JRC shows the increase tendency of seasonal water body, which is actually the artificial water body as shown in our products. The GLC_FCS30D also shows the tendency of area increase for the water but fails to classifies the types of water body. In addition, GLC_FCS30D also misclassifies a large number of artificial water bodies as farmland and grassland, without correctly capturing the spatial semantic information of artificial water bodies. Our product clearly demonstrates the water cover types and reveals the significant expansion of artificial water body around the Hong Lake.

Usage Notes

The Water-MRYR dataset provides long-term (1990–2021) mapping of water body types(lake, river, wetland, and artificial water) in the urban agglomeration of the middle reaches of the Yangtze River, with a spatial resolution of 30 meters. We recommend that users view this data product using software such as ArcGIS or QGIS. Our data provides long-term spatiotemporal information on the changes of different water body types, which can be used to detect the expansion of artificial water bodies and their encroachment on other types of water bodies, such as wetlands.

Code availability

The experiments are conducted under the Pytorch environment on an NVIDIA RTX 4080GPU, with Python==3.8.16, torch==1.13.1, and torchvision==0.14.1. The code for long term mapping of multiple types of water with Landsat imagery can be accessed on GitHub at the following link (https://github.com/haozhengma/Water-MRYR). Additionally, we also provide the GEE code of mNDWI-VIs algorithm for water mapping.

References

Pi, X. et al. Mapping global lake dynamics reveals the emerging roles of small lakes. Nature communications 13, 5777 (2022).
Article ADS CAS PubMed PubMed Central MATH Google Scholar
Huang, X., Xie, C., Fang, X. & Zhang, L. Combining pixel-and object-based machine learning for identification of water-body types from urban high-resolution remote-sensing imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8, 2097–2110 (2015).
Article ADS MATH Google Scholar
Wu, Q. et al. Satellites reveal hotspots of global river extent change. Nature communications 14, 1587 (2023).
Article ADS CAS PubMed PubMed Central MATH Google Scholar
Yao, F. et al. Satellites reveal widespread decline in global lake water storage. Science 380, 743–749 (2023).
Article ADS CAS PubMed MATH Google Scholar
Holgerson, M. A. & Raymond, P. A. Large contribution to inland water co2 and ch4 emissions from very small ponds. Nature Geoscience 9, 222–226 (2016).
Article ADS CAS Google Scholar
Dong, B., Xi, Y., Cui, Y. & Peng, S. Quantifying methane emissions from aquaculture ponds in china. Environmental Science & Technology 57, 1576–1583 (2022).
Article ADS MATH Google Scholar
Pekel, J.-F., Cottam, A., Gorelick, N. & Belward, A. S. High-resolution mapping of global surface water and its long-term changes. Nature 540, 418–422 (2016).
Article ADS CAS PubMed MATH Google Scholar
Zou, Z. et al. Divergent trends of open-surface water body area in the contiguous united states from 1984 to 2016. Proceedings of the National Academy of Sciences 115, 3810–3815 (2018).
Article ADS CAS MATH Google Scholar
Wang, X. et al. Gainers and losers of surface and terrestrial water resources in china during 1989–2016. Nature communications 11, 3471 (2020).
Article ADS CAS PubMed PubMed Central MATH Google Scholar
Li, Y., Dang, B., Zhang, Y. & Du, Z. Water body classification from high-resolution optical remote sensing imagery: Achievements and perspectives. ISPRS Journal of Photogrammetry and Remote Sensing 187, 306–327 (2022).
Article ADS MATH Google Scholar
Han, W. et al. A survey of machine learning and deep learning in remote sensing of geological environment: Challenges, advances, and opportunities. ISPRS Journal of Photogrammetry and Remote Sensing 202, 87–113 (2023).
Article ADS MATH Google Scholar
Chen, J. et al. Global land cover mapping at 30 m resolution: A pok-based operational approach. ISPRS Journal of Photogrammetry and Remote Sensing 103, 7–27 (2015).
Article ADS MATH Google Scholar
Zhang, X. et al. Glc_fcs30: Global land-cover product with fine classification system at 30 m using time-series landsat imagery. Earth System Science Data Discussions 2020, 1–31 (2020).
CAS Google Scholar
Karra, K. et al. Global land use/land cover with sentinel 2 and deep learning. In 2021 IEEE international geoscience and remote sensing symposium IGARSS, 4704–4707 (IEEE, 2021).
Mao, D. et al. National wetland mapping in china: A new product resulting from object-based and hierarchical classification of landsat 8 oli images. ISPRS Journal of Photogrammetry and Remote Sensing 164, 11–25 (2020).
Article ADS MATH Google Scholar
Zhang, X. et al. Gwl_fcs30: global 30 m wetland map with fine classification system using multi-sourced and time-series remote sensing imagery in 2020. Earth System Science Data Discussions 2022, 1–31 (2022).
Google Scholar
Amani, M. et al. Wetland change analysis in alberta, canada using four decades of landsat imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 10314–10335 (2021).
Article ADS MATH Google Scholar
Zhang, X. et al. Glc_fcs30d: the first global 30 m land-cover dynamics monitoring product with a fine classification system for the period from 1985 to 2022 generated using dense-time-series landsat imagery and the continuous change-detection method. Earth System Science Data 16, 1353–1381 (2024).
Article ADS MATH Google Scholar
Wang, M. et al. Interannual changes of urban wetlands in china’s major cities from 1985 to 2022. ISPRS Journal of Photogrammetry and Remote Sensing 209, 383–397 (2024).
Article ADS MATH Google Scholar
Lou, A., He, Z., Zhou, C. & Lai, G. Long-term series wetland classification of guangdong-hong kong-macao greater bay area based on apsmnet. International Journal of Applied Earth Observation and Geoinformation 128, 103765 (2024).
Article Google Scholar
Li, Y. & Niu, Z. Systematic method for mapping fine-resolution water cover types in china based on time series sentinel-1 and 2 images. International Journal of Applied Earth Observation and Geoinformation 106, 102656 (2022).
Article MATH Google Scholar
Deng, Y. et al. Automated and refined wetland mapping of dongting lake using migrated training samples based on temporally dense sentinel 1/2 imagery. International Journal of Digital Earth 16, 3199–3221 (2023).
Article ADS MATH Google Scholar
Wang, M. et al. Wetland mapping in east asia by two-stage object-based random forest and hierarchical decision tree algorithms on sentinel-1/2 images. Remote Sensing of Environment 297, 113793 (2023).
Article MATH Google Scholar
Peng, K. et al. Continental-scale wetland mapping: A novel algorithm for detailed wetland types classification based on time series sentinel-1/2 images. Ecological Indicators 148, 110113 (2023).
Article MATH Google Scholar
Zheng, Z. & Qingyun, H. Spatio-temporal evaluation of the urban agglomeration expansion in the middle reaches of the yangtze river and its impact on ecological lands. Science of The Total Environment 790, 148150 (2021).
Article CAS PubMed MATH Google Scholar
Dai, X. et al. Spatio-temporal variations of ecosystem services in the urban agglomerations in the middle reaches of the yangtze river, china. Ecological Indicators 115, 106394 (2020).
Article MATH Google Scholar
Liu, Y., Lai, L. & Gao, Y. Mapping 10 m monthly surface water dynamics in the yangtze river basin from 2017 to 2020 using a robust atmc algorithm. Journal of Hydrology 626, 130327 (2023).
Article Google Scholar
Zhu, X. X. et al. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE geoscience and remote sensing magazine 5, 8–36 (2017).
Article MATH Google Scholar
Ma, L. et al. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS journal of photogrammetry and remote sensing 152, 166–177 (2019).
Article ADS MATH Google Scholar
Wang, L., Zuo, B., Le, Y., Chen, Y. & Li, J. Penetrating remote sensing: Next-generation remote sensing for transparent earth. The Innovation 4, 100519 (2023).
Article PubMed PubMed Central Google Scholar
Li, W. et al. Multiclass crop interpretation via a lightweight attentive feature fusion network using vehicle-view images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18, 496–509 (2024).
Article MATH Google Scholar
Fan, R. et al. Urban informal settlements classification via a transformer-based spatial-temporal fusion network using multimodal remote sensing and time-series human activity data. International Journal of Applied Earth Observation and Geoinformation 111, 102831 (2022).
Article MATH Google Scholar
Fan, R. et al. Fine-scale urban informal settlements mapping by fusing remote sensing images and building data via a transformer-based multimodal fusion network. IEEE Transactions on Geoscience and Remote Sensing 60, 1–16 (2022).
MATH Google Scholar
Zhang, L. et al. A prolonged artificial nighttime-light dataset of china (1984-2020). Scientific Data 11, 414 (2024).
Article PubMed PubMed Central MATH Google Scholar
Tong, X. et al. Global area boom for greenhouse cultivation revealed by satellite mapping. Nature Food 1–11 (2024).
Zhu, Z. & Woodcock, C. E. Automated cloud, cloud shadow, and snow detection in multitemporal landsat data: An algorithm designed specifically for monitoring land cover change. Remote Sensing of Environment 152, 217–234 (2014).
Article ADS Google Scholar
Zhu, Z., Wang, S. & Woodcock, C. E. Improvement and expansion of the fmask algorithm: Cloud, cloud shadow, and snow detection for landsats 4–7, 8, and sentinel 2 images. Remote sensing of Environment 159, 269–277 (2015).
Article ADS Google Scholar
Zou, Z. et al. Continued decrease of open surface water body area in oklahoma during 1984–2015. Science of the Total Environment 595, 451–460 (2017).
Article ADS CAS PubMed MATH Google Scholar
Tucker, C. J. Red and photographic infrared linear combinations for monitoring vegetation. Remote sensing of Environment 8, 127–150 (1979).
Article ADS MATH Google Scholar
Huete, A. et al. Overview of the radiometric and biophysical performance of the modis vegetation indices. Remote sensing of environment 83, 195–213 (2002).
Article ADS Google Scholar
Xu, H. Modification of normalised difference water index (ndwi) to enhance open water features in remotely sensed imagery. International journal of remote sensing 27, 3025–3033 (2006).
Article ADS MATH Google Scholar
Farr, T. G. et al. The shuttle radar topography mission. Reviews of geophysics 45 (2007).
Xie, E. et al. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems 34, 12077–12090 (2021).
MATH Google Scholar
Rolf, E., Klemmer, K., Robinson, C. & Kerner, H. Position: mission critical--satellite data is a distinct modality in machine learning. Proceedings of the 41 st International Conference on Machine Learning. (2024).
Wang, M. et al. Interannual changes of coastal aquaculture ponds in china at 10-m spatial resolution during 2016–2021. Remote Sensing of Environment 284, 113347 (2023).
Article MATH Google Scholar
Tarvainen, A. & Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in neural information processing systems 30 (2017).
Olofsson, P. et al. Good practices for estimating area and assessing accuracy of land change. Remote sensing of Environment 148, 42–57 (2014).
Article ADS MATH Google Scholar
Foody, G. M. Explaining the unsuitability of the kappa coefficient in the assessment and comparison of the accuracy of thematic maps obtained by image classification. Remote sensing of environment 239, 111630 (2020).
Article MATH Google Scholar
Ma, H., Wang, L., Yang, X., Fan, R. & He, K. Prolonged Water-Body Types Dataset of Urban Agglomeration in Central China(1990-2021), https://doi.org/10.57760/sciencedb.12990 (2024).
Li, C. et al. The first all-season sample set for mapping global land cover with landsat-8 data. Science Bulletin 62, 508–515 (2017).
Article ADS MathSciNet PubMed MATH Google Scholar
Lehner, B. & Döll, P. Development and validation of a global database of lakes, reservoirs and wetlands. Journal of hydrology 296, 1–22 (2004).
Article ADS MATH Google Scholar
Zanaga, D. et al. Esa worldcover 10 m 2021 v200, https://doi.org/10.5281/zenodo.7254221 (2022).

Download references

Acknowledgements

This paper is funded by National Natural Science Foundation of China (No.U21A2013) and the “CUG Scholar” Scientific Research Funds at China University of Geosciences (Wuhan) (Project No.2024013).

Author information

Authors and Affiliations

China University of Geosciences, Wuhan, School of Computer Science, Wuhan, 430078, China
Haozheng Ma, Xiaohong Yang, Runyu Fan & Lizhe Wang
China University of Geosciences, Wuhan, Key Laboratory of Geological Survey and Evaluation of Ministry of Education, Wuhan, 430074, China
Kang He

Authors

Haozheng Ma
View author publications
Search author on:PubMed Google Scholar
Xiaohong Yang
View author publications
Search author on:PubMed Google Scholar
Runyu Fan
View author publications
Search author on:PubMed Google Scholar
Kang He
View author publications
Search author on:PubMed Google Scholar
Lizhe Wang
View author publications
Search author on:PubMed Google Scholar

Contributions

W.L. conceived the experiment(s), M.H. and F.R. conducted the experiment(s), H.K. analysed the results, M.H. and Y.X. wrote the paper.

Corresponding author

Correspondence to Lizhe Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Ma, H., Yang, X., Fan, R. et al. Prolonged water body types dataset of urban agglomeration in central China from 1990 to 2021. Sci Data 12, 480 (2025). https://doi.org/10.1038/s41597-025-04794-3

Download citation

Received: 03 October 2024
Accepted: 10 March 2025
Published: 22 March 2025
DOI: https://doi.org/10.1038/s41597-025-04794-3

Subjects

Abstract

Similar content being viewed by others

Evaluation of water ecological security and diagnosis of Obstacles in the Yangtze river delta, China

Extraction of water bodies from high-resolution remote sensing imagery based on a deep semantic segmentation network

Ecological-environmental challenges and restoration of aquatic ecosystems of the Middle-Eastern

Background & Summary

Methods

Study area

Materials

Satellite imagery

Semantic segmentation Dataset

Binary water map

Multiple Water body types mapping using deep semantic segmentation models

Long term multiple Water body types mapping via weakly supervision

Accuracy assessment

Data Records

Technical Validation

Accuracy assessment on the mapping product

Validation on sample points

Validation on open-access validation sets

Statistical-level validation

Comparison with other water-related datasets

Comparison with long-term mapping products

Usage Notes

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links