Abstract
In this study, Sentinel-2 time-series satellite imagery and an improved CA-DeepLabV3+ semantic segmentation network were used to construct a model for extracting urban impervious surfaces. The model was applied to map the distribution of impervious surfaces in the central urban area of Chongqing from 2017 to 2022, and the spatiotemporal evolution of these surfaces was analyzed using area change statistics and the standard deviational ellipse method. The results indicate that the improved CA-DeepLabV3+ model performs well in identifying impervious surfaces, with precision, recall, F1 score, and MIoU values of 90.78%, 90.85%, 90.82%, and 83.25%, respectively. These values are higher than those of the other classic semantic segmentation models tested, indicating high reliability and good generalization performance. The analysis shows that the impervious surface area in Chongqing's central urban area grew rapidly over the five-year period, with a clear expansion trend, especially in the core urban area and its surroundings. The standard deviational ellipse analysis revealed a pronounced directional expansion of impervious surfaces, primarily along the north–south axis. Overall, the model enables large-scale, time-series monitoring of impervious surface distribution, providing technical support for studying urban impervious surface expansion and for fine-grained urban management.
Introduction
The broad concept of impervious surface (IS) refers to natural or man-made materials that prevent surface water from infiltrating into the soil, thereby altering flood runoff, material deposition, and pollution profiles1. In academic research, the term impervious surfaces typically refers to areas covered by hard, impermeable artificial materials, such as building rooftops, parking lots, and roads. Since 2000, research on impervious surface remote sensing has increased rapidly both in China and internationally. Although such research started relatively late in China, it has developed quickly, and Chinese scholars are contributing increasingly to the field. In recent years, researchers worldwide have studied methods for extracting impervious surfaces from a variety of remote sensing images.
Currently, methods for extracting impervious surfaces can be broadly classified into three categories: spectral mixture analysis (SMA), index-based methods, and image classification methods. SMA is a relatively mature method for extracting impervious surfaces. It decomposes each mixed pixel into the proportions of its constituent endmembers, which enables land cover information to be extracted at the subpixel level and mitigates the effects of mixed pixels and spectral similarity on impervious surface extraction2. Ridd et al.3 viewed urban surfaces as a linear combination of vegetation, impervious surface, and soil and proposed the well-known vegetation-impervious surface-soil (V-I-S) model, providing a new perspective for extracting impervious surfaces. SMA mitigates the mixed-pixel problem in medium and low spatial resolution images to some extent. However, because endmember selection is highly subjective, urban impervious surface categories are complex, and their spectra are heterogeneous, selecting endmembers over large areas remains difficult. In addition, this method tends to overestimate impervious surface fractions in areas of low coverage and underestimate them in areas of high coverage4. Compared with SMA, extracting impervious surfaces by constructing impervious surface indices has the advantage of simplicity. Index-based methods exploit the spectral characteristics of impervious surfaces, enhancing the differences between impervious surfaces and other land cover types through algebraic operations between spectral bands5. Chinese scholar Xu (2010) proposed the normalized difference impervious surface index (NDISI), which specifically accounts for the influence of water bodies and sand on impervious surface extraction6. However, distinguishing impervious surfaces from bare land remains challenging.
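To make the linear mixing idea behind the V-I-S model concrete, the following is a minimal NumPy sketch of least-squares spectral unmixing for a single pixel. The endmember spectra and the `unmix_fractions` helper are hypothetical illustrations, not taken from the cited studies; the sum-to-one constraint is imposed softly through a heavily weighted row of ones, and non-negativity is not enforced.

```python
import numpy as np

def unmix_fractions(pixel, endmembers, sum_weight=1e3):
    """Least-squares linear unmixing with a soft sum-to-one constraint.

    pixel:      (B,) reflectance vector for one pixel
    endmembers: (B, M) matrix whose columns are endmember spectra (e.g. V, I, S)
    The constraint is imposed by appending a heavily weighted row of ones;
    non-negativity of the fractions is NOT enforced in this sketch.
    """
    n_bands, n_members = endmembers.shape
    A = np.vstack([endmembers, sum_weight * np.ones((1, n_members))])
    b = np.append(pixel, sum_weight)
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return fractions

# Hypothetical 4-band endmember spectra (columns: vegetation, impervious, soil).
E = np.array([
    [0.05, 0.20, 0.15],
    [0.08, 0.22, 0.20],
    [0.45, 0.25, 0.30],
    [0.25, 0.30, 0.40],
])
true_fractions = np.array([0.5, 0.3, 0.2])
mixed_pixel = E @ true_fractions          # noise-free synthetic mixed pixel
estimated = unmix_fractions(mixed_pixel, E)
print(np.round(estimated, 3))
```

Because the synthetic pixel is noise-free and the endmember matrix has full column rank, the estimate recovers the fractions used to build it. Real pixels would also need a non-negativity constraint and careful endmember selection, which is where the difficulties described above arise.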
Index-based methods are simple and straightforward; however, because they rely on spectral values alone, selecting an appropriate impervious surface index and determining the optimal threshold for different images, regions, and sensors can be challenging, which makes it difficult to construct a broadly applicable index model. Beyond indices, Deyong et al. (2003) applied the classification and regression tree (CART) method to Landsat and nighttime light data and obtained good impervious surface extraction results7. However, CART is sensitive to data noise and training sample errors, which can reduce the final accuracy to some extent. Sawaya et al. (2003) established a regression relationship between impervious surface indices and the normalized difference vegetation index (NDVI) and performed pixel-by-pixel calculations on high-resolution data, yielding good extraction results8. Multivariate regression avoids assigning a mixed pixel that contains a range of impervious surface proportions to a single category, but pixels that mix impervious surface and bare soil signals still lower the estimation accuracy.
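The threshold-selection problem described above can be illustrated with a short NumPy sketch that computes a normalized-difference index and picks a threshold automatically. The NDBI-style (SWIR - NIR)/(SWIR + NIR) ratio and Otsu's between-class-variance criterion are swapped-in examples chosen for brevity, not the NDISI or any method used in the cited studies, and the synthetic band values are hypothetical.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-12):
    """Generic (a - b) / (a + b) band ratio; eps guards against division by zero."""
    return (a - b) / (a + b + eps)

def otsu_threshold(values, bins=256):
    """Return the histogram bin edge that maximizes Otsu's between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                     # count at or below each candidate split
    w1 = w0[-1] - w0                         # count above it
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)              # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)   # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    return edges[1:][np.argmax(between)]

rng = np.random.default_rng(0)
# Synthetic scene: half "pervious" pixels with index ~ -0.3, half "impervious" ~ +0.3.
target = np.concatenate([rng.normal(-0.3, 0.05, 1000), rng.normal(0.3, 0.05, 1000)])
nir = rng.uniform(0.2, 0.4, target.size)
swir = nir * (1 + target) / (1 - target)     # chosen so that the index equals `target`
index = normalized_difference(swir, nir)     # NDBI-style (SWIR - NIR) / (SWIR + NIR)
t = otsu_threshold(index)
mask = index > t                             # True = candidate impervious pixel
print(f"threshold={t:.3f}, impervious fraction={mask.mean():.3f}")
```

A data-driven threshold such as Otsu's adapts to each image, but it assumes a clearly bimodal index histogram, which mixed urban scenes containing bare soil often do not provide.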
Image classification methods treat impervious surfaces as a land cover type and classify them using their spectral and spatial characteristics9. Common remote sensing image classification algorithms include artificial neural networks (ANNs)10, maximum likelihood (ML)11, support vector machines (SVMs)12, and random forests (RF)13. Weng (2008) noted that ANNs can handle nonlinear spectral mixing and achieve a higher extraction accuracy than linear spectral mixture analysis models10. However, ANNs converge slowly during training and place high demands on training samples14. Nallapareddy et al. argued that even with incomplete and erroneous input data, ANNs can provide solutions by learning and generalizing from examples and experience15. Zhang et al. (2009) combined linear spectral analysis with least squares support vector machines (LS-SVM) to extract impervious surfaces in Nanjing from Landsat images and achieved a high extraction accuracy16. For SVMs, however, kernel function selection still relies mainly on experiments and experience, and there is no consensus on how the choice of kernel affects classification accuracy14. Zhang Bing et al. produced a global 30 m high-precision impervious surface mapping product using a random forest model17. Conventional machine learning methods perform well, but a notable drawback is their reliance on features manually engineered by domain experts; moreover, these methods use shallow, single-layer feature extraction without deep feature learning18,19,20.
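As an illustration of one of the conventional classifiers listed above, the sketch below implements a Gaussian maximum likelihood classifier with NumPy: each class is modeled by a multivariate normal distribution fitted to training pixels, and each pixel is assigned to the class with the highest log-likelihood. The class name, the equal priors, and the synthetic four-band samples are assumptions for illustration only.

```python
import numpy as np

class GaussianMaxLikelihood:
    """Per-class multivariate Gaussian classifier with equal class priors."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Small ridge keeps the covariance invertible for near-degenerate bands.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            self.params_.append((mean, np.linalg.inv(cov), logdet))
        return self

    def predict(self, X):
        # Log-likelihood up to a constant: -0.5 * (log|S| + Mahalanobis distance).
        scores = [
            -0.5 * (logdet + np.einsum("ij,jk,ik->i", X - mean, inv, X - mean))
            for mean, inv, logdet in self.params_
        ]
        return self.classes_[np.argmax(np.stack(scores, axis=1), axis=1)]

rng = np.random.default_rng(42)
# Hypothetical 4-band training pixels: 0 = pervious, 1 = impervious.
X_perv = rng.normal([0.05, 0.08, 0.40, 0.20], 0.03, size=(300, 4))
X_imp = rng.normal([0.20, 0.22, 0.25, 0.30], 0.03, size=(300, 4))
X = np.vstack([X_perv, X_imp])
y = np.array([0] * 300 + [1] * 300)

model = GaussianMaxLikelihood().fit(X, y)
accuracy = (model.predict(X) == y).mean()
print(f"training accuracy: {accuracy:.3f}")
```

Accuracy is measured on the training pixels only to show the mechanics. Every input feature here is a raw band value chosen by hand, which is the kind of manual feature design the paragraph identifies as a limitation of conventional methods.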
In recent years, deep learning has driven significant progress in image classification, image segmentation, and related fields. Deep learning models can effectively extract both low-level and high-level features from training images, substantially improving processing speed and accuracy21,22,23,24. Traditional convolutional neural networks (CNNs) are designed mainly for image classification with fixed-size inputs and struggle with pixel-level semantic segmentation, because their fully connected layers discard spatial information and thus make fine pixel-level prediction difficult25. Haotao Hu et al. improved segmentation accuracy by adjusting the dilation rates of the parallel branches in the Atrous Spatial Pyramid Pooling (ASPP) module of the DeepLabv3+ network26. DeepLabv3+ performs well in image segmentation tasks across a variety of scenarios and compares favorably with other advanced models in accuracy, robustness, and real-time performance. These strengths are attributed to its multi-scale feature extraction, the design of its ASPP module, and its decoder module, which have made DeepLabv3+ one of the leading image segmentation models. However, although the ASPP module enhances feature extraction, the overall architecture remains limited when fusing features at different scales: it is insufficient for complex scenes and performs poorly on densely distributed small objects27,28,29.
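The receptive-field gain from atrous (dilated) convolution follows from a simple relation: a k × k kernel with dilation rate r spans k + (k - 1)(r - 1) pixels per side while keeping only k × k weights. The short Python sketch below evaluates this for a 3 × 3 kernel at rates 1, 6, 12, and 18, the rates this paper lists for the ASPP module; the helper name is illustrative.

```python
def effective_kernel_size(kernel: int, rate: int) -> int:
    """Side length spanned by a kernel x kernel filter with dilation `rate`."""
    return kernel + (kernel - 1) * (rate - 1)

for rate in (1, 6, 12, 18):
    span = effective_kernel_size(3, rate)
    # The weight count stays 3 * 3 = 9 per channel pair regardless of the rate.
    print(f"rate {rate:2d}: 3x3 kernel spans {span}x{span} pixels with 9 weights")
```

For these rates the 3 × 3 kernel spans 3, 13, 25, and 37 pixels per side, which is how parallel ASPP branches gather context at several scales from the same feature map.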
To address the difficulty most deep learning models have in extracting small targets, and the blurred edges they produce when segmenting remote sensing images, this study improves the classic DeepLabv3+ network for remote sensing image segmentation. We replace the complex and computationally intensive Xception backbone with the lightweight MobileNetV2 backbone, which is known for efficient computation and high accuracy and reduces model complexity while maintaining effective feature extraction. Additionally, we introduce the Coordinate Attention (CA) module as the attention mechanism. Unlike traditional modules such as CBAM and SE, which may lose spatial information when calculating channel attention, the CA module captures interactions between spatial positions and channels, enabling the model to focus more effectively on target regions. These improvements enhance segmentation accuracy without significantly increasing the computational cost. Specifically, this study aims to address the following issues: (1) difficulty in extracting small targets due to their size and dense distribution; (2) information loss and insufficient feature extraction at different scales in complex scenes; and (3) blurred edge segmentation produced by conventional models. We designed the CA-DeepLabV3+ model to reduce information loss, enhance the extraction of key features at different scales, and improve the handling of complex scenes and densely distributed small objects. We then applied the improved model to a Sentinel-2 image dataset to extract impervious surfaces in the main urban area of Chongqing.
Materials and methods
Study area and data
Study area
Chongqing is located in the upper reaches of the Yangtze River (105°11′–110°11′E, 28°10′–32°13′N). It is a key national center for advanced manufacturing and the financial hub of western China, and its main urban area covers 5,273 km². The region has a typical mountain-city topography that includes the Jinyun, Zhongliang, Tongluo, and Mingyue mountain ranges, and the Yangtze and Jialing rivers converge here, resulting in large elevation differences and fragmented terrain. As a national central city and a critical economic support point in Southwest China, Chongqing's main urban area is an important economic center and transportation hub, characterized by a highly developed economy, a high level of urbanization, and rapid urban development. The main urban area contains a wide variety of land cover types and distinct urban boundaries, and its impervious surfaces are both concentrated and widespread. Accurately estimating impervious surfaces is therefore crucial for urban planning, spatial coordination, and environmental protection in Chongqing's main urban area; however, previous research in this area has been relatively limited.
In this study, we utilized multi-temporal land use/land cover (LULC) remote sensing monitoring data for Chongqing from 2017 to 2022. The spatial distributions of the study area and land use types are shown in Fig. 1. The urban land, rural residential areas, and other construction land within the main urban area are typical types of impervious surfaces in the land cover dataset for Chongqing.
China map with the location of Chongqing marked (top left); Digital Elevation Model (DEM) of Chongqing (bottom left); Land use type map of Chongqing’s main urban area (right side) (by ArcGIS Pro v.3.0.1, Esri Inc., Redlands, CA, USA. https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview/).
Data
In this study, we used Sentinel-2 satellite images, which meet the high spatial resolution requirement, as the remote sensing data source. Sentinel-2, part of the European Space Agency's (ESA) Copernicus program, is a high-resolution multispectral imaging mission equipped with the MultiSpectral Instrument (MSI), which has 13 bands with central wavelengths from 443 to 2190 nm covering the visible, near-infrared, and short-wave infrared regions30,31. The bands have spatial resolutions of 10, 20, and 60 m, and the two complementary satellites (Sentinel-2A and Sentinel-2B) together provide a revisit period of 5 days. Sentinel-2 has been successfully applied in fields such as fire assessment32, land cover classification33,34, and agricultural plastic greenhouse identification35. The parameters of the Sentinel-2 data are shown in Table 1. Preprocessing of the Sentinel-2 imagery included radiometric calibration, atmospheric correction, resampling, and image cropping. The data for 2017–2019 were Level-1C (L1C) products, i.e., orthorectified, geometrically corrected top-of-atmosphere reflectance images without atmospheric correction. We used Sen2Cor, the atmospheric correction processor released by ESA, to convert them to Level-2A (L2A) data31. The data for 2019–2022 were preprocessed in Google Earth Engine, including cloud removal using the QA60 band, which contains cloud mask information.
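The QA60-based cloud removal can be sketched as a bitmask test. In the QA60 band, bit 10 flags opaque clouds and bit 11 flags cirrus; in Google Earth Engine this test is usually written with `bitwiseAnd` on the `QA60` band. The NumPy version below applies the same logic to plain arrays so that it runs without an Earth Engine account; the function names and sample values are hypothetical, and the study's own script is not given.

```python
import numpy as np

OPAQUE_CLOUD_BIT = 1 << 10  # QA60 bit 10: opaque clouds
CIRRUS_BIT = 1 << 11        # QA60 bit 11: cirrus clouds

def qa60_clear_mask(qa60):
    """True where neither the opaque-cloud nor the cirrus bit is set."""
    qa = np.asarray(qa60).astype(np.uint16)
    return ((qa & OPAQUE_CLOUD_BIT) == 0) & ((qa & CIRRUS_BIT) == 0)

def mask_clouds(band, qa60):
    """Replace cloudy pixels with NaN so they can be skipped in later compositing."""
    return np.where(qa60_clear_mask(qa60), band.astype(float), np.nan)

qa = np.array([[0, 1024], [2048, 3072]])       # clear, opaque, cirrus, both
red = np.array([[0.12, 0.55], [0.40, 0.60]])   # hypothetical reflectances
print(qa60_clear_mask(qa))
print(mask_clouds(red, qa))
```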
Methods
This paper presents an architecture for extracting impervious surfaces from Sentinel-2 images based on the improved DeepLabv3+ network model. The main steps are as follows: (1) data preprocessing and dataset construction; (2) training the semantic segmentation model; (3) extracting the impervious surface information from the time series; and (4) analyzing the spatiotemporal changes in the impervious surfaces. The automated extraction process is illustrated in Fig. 2.
Dataset construction
Binary impervious surface labels were created from the high-resolution Sentinel-2 images. Ground truth data were generated by manual annotation in ArcGIS Pro; based on visual interpretation and field survey data, the samples were classified as pervious or impervious surfaces. The annotated impervious surfaces included buildings, roads, and parking lots, and the pervious surfaces included vegetation, water bodies, and bare soil. Pervious and impervious samples were assigned values of 0 and 1, respectively. Considering memory constraints, computational efficiency, and feature learning, we adopted a sliding-window input scheme and cropped the preprocessed images and their annotations into 256 × 256 pixel tiles. To improve model robustness and reduce overfitting, the cropped samples were augmented by multi-angle flipping, mirroring, linear stretching, and noise addition. In total, 527 image-label pairs were obtained for dataset construction and divided into training and validation sets at a 4:1 ratio. Additionally, an independent test dataset of 150 images and 150 labels, not used during training or validation, was prepared to assess the model's performance. The test images and corresponding labeled samples used in this study are shown in Fig. 3.
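The tiling, augmentation, and 4:1 split steps can be sketched in NumPy as follows. The function names, the non-overlapping stride, and the stretch and noise ranges are assumptions made for illustration, and reading "multi-angle flipping" as 90° rotations is also an assumption. The key point is that geometric transforms are applied identically to an image tile and its label, whereas radiometric changes touch the image only.

```python
import numpy as np

def sliding_window_tiles(image, label, size=256, stride=256):
    """Yield aligned (image_tile, label_tile) pairs.

    image: (H, W, C) array; label: (H, W) array of 0 (pervious) / 1 (impervious).
    Edge strips narrower than `size` are dropped in this simple version.
    """
    height, width = label.shape
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield (image[top:top + size, left:left + size],
                   label[top:top + size, left:left + size])

def augment(img, lbl, rng):
    """Randomly rotate/mirror image and label together; stretch and add noise to the image only."""
    k = int(rng.integers(4))                     # rotate by k * 90 degrees
    img, lbl = np.rot90(img, k, axes=(0, 1)), np.rot90(lbl, k, axes=(0, 1))
    if rng.random() < 0.5:                       # horizontal mirror
        img, lbl = img[:, ::-1], lbl[:, ::-1]
    gain, offset = rng.uniform(0.9, 1.1), rng.uniform(-0.02, 0.02)  # hypothetical linear stretch
    img = img * gain + offset + rng.normal(0.0, 0.01, img.shape)    # hypothetical noise level
    return np.ascontiguousarray(img), np.ascontiguousarray(lbl)

def split_train_val(n_pairs, rng, train_share=0.8):
    """Shuffle indices and split them 4:1 into training and validation sets."""
    order = rng.permutation(n_pairs)
    n_train = int(round(n_pairs * train_share))
    return order[:n_train], order[n_train:]

rng = np.random.default_rng(0)
scene = rng.random((1024, 1024, 4))                          # hypothetical 4-band scene
truth = (rng.random((1024, 1024)) > 0.5).astype(np.uint8)    # hypothetical binary labels
tiles = [augment(i, l, rng) for i, l in sliding_window_tiles(scene, truth)]
train_idx, val_idx = split_train_val(527, rng)
print(len(tiles), tiles[0][0].shape, len(train_idx), len(val_idx))
```

With 527 image-label pairs, rounding 80% gives 422 training and 105 validation pairs; the text does not state how the 4:1 split was rounded.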
Original images and labeled samples(by ArcGIS Pro v.3.0.1, Esri Inc., Redlands, CA, USA. https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview/).
DeepLabv3+ Network modeling
DeepLabv3+ is an advanced deep learning model designed for semantic image segmentation, that is, precise pixel-level classification. It adopts an encoder-decoder structure36. The encoder uses a network built on depth-wise separable convolutions (Xception) to extract both shallow and deep semantic information. The deep features are fed into an ASPP (Atrous Spatial Pyramid Pooling) module with parallel branches at dilation rates of 1, 6, 12, and 18, together with a global average pooling branch. The multi-scale features from these branches are concatenated along the channel dimension and compressed by a 1 × 1 convolution to obtain high-level feature maps. Because atrous convolution enlarges the receptive field without reducing the feature map resolution or adding parameters, each convolution output covers a wider range of context. The decoder restores detail through gradual upsampling and feature fusion, improving the accuracy of segmentation boundaries. First, a 1 × 1 convolution adjusts the number of channels of the shallow features (downsampled twice); these features are then concatenated with the high-level feature maps upsampled by a factor of four and refined with 3 × 3 convolutions. Finally, 4× upsampling produces a prediction map with the same resolution as the original image. The key innovations of the model are atrous convolution and ASPP, which capture rich contextual information at multiple scales, enlarge the receptive field, improve the understanding of complex scenes, and improve the recognition of objects of different sizes. The network framework is illustrated in Fig. 4.
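A simplified PyTorch sketch of the ASPP module described above follows: a 1 × 1 branch, three dilated 3 × 3 branches (rates 6, 12, 18), and an image-level pooling branch are concatenated along the channel dimension and compressed with a 1 × 1 convolution. It assumes PyTorch is installed and uses plain rather than depth-wise separable atrous convolutions; the 320 input channels and 256 output channels are illustrative choices, and batch normalization is omitted from the pooling branch so that a batch of one would still work in training mode.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Simplified Atrous Spatial Pyramid Pooling (DeepLabv3+-style) sketch."""

    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()

        def conv_branch(kernel, rate):
            # padding = rate keeps H x W unchanged for a 3x3 kernel with dilation `rate`
            padding = 0 if kernel == 1 else rate
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel, padding=padding, dilation=rate, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        # Rate-1 branch is a 1x1 convolution; the others are dilated 3x3 convolutions.
        self.branches = nn.ModuleList([conv_branch(1, 1)] + [conv_branch(3, r) for r in rates])
        # Image-level branch: global average pooling followed by a 1x1 convolution.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.ReLU(inplace=True),
        )
        # Concatenate all branches along channels and compress with a 1x1 convolution.
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        size = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = self.image_pool(x)
        feats.append(F.interpolate(pooled, size=size, mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))

aspp = ASPP(in_ch=320).eval()          # 320 input channels: hypothetical backbone output
with torch.no_grad():
    out = aspp(torch.randn(2, 320, 16, 16))
print(out.shape)
```

Because each dilated branch uses padding equal to its rate, all branches keep the input's spatial size, which is what allows them to be concatenated.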
Traditional network structure of DeepLabv3+ (by ArcGIS Pro v.3.0.1, Esri Inc., Redlands, CA, USA. https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview/).
CA-DeepLabV3+ model
This study improves on the classic DeepLabv3+ network. The original DeepLabv3+ model uses Xception as the encoder backbone for feature extraction, but its complex structure entails a large number of parameters and a high computational load. To address this issue, this study adopts the lightweight MobileNetV2 as the backbone network; MobileNetV2 maintains efficient computation while providing high accuracy and performs well in classification and detection tasks. MobileNetV2 is an upgraded version of MobileNetV1: it retains depth-wise separable convolutions and adds a 1 × 1 expansion convolution that increases the number of channels before the data enter the depth-wise convolution37. Its main building blocks are shown in Fig. 5a.
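The MobileNetV2 building block (an inverted residual with a linear bottleneck) can be sketched in PyTorch as below, assuming PyTorch is installed. The expansion ratio of 6 is the value commonly used in MobileNetV2, but the class name and test shapes are illustrative. Unlike the reference design, this sketch always includes the expansion layer, and a practical model would more likely load a pretrained backbone such as torchvision's `mobilenet_v2`.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2-style block: 1x1 expand -> 3x3 depthwise -> 1x1 linear projection."""

    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        # Shortcut only when input and output shapes match.
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),             # 1x1 expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),                # 3x3 depthwise
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),            # 1x1 projection, no activation
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

x = torch.randn(1, 32, 64, 64)
same = InvertedResidual(32, 32).eval()
down = InvertedResidual(32, 64, stride=2).eval()
with torch.no_grad():
    y_same, y_down = same(x), down(x)
print(y_same.shape, y_down.shape)
```

The stride-2 block halves the spatial size and changes the channel count, so it has no shortcut; the stride-1 block with matching channels adds its input back.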
After extracting low-level and high-level features, we use the CA module as the attention mechanism (Fig. 5b). Existing attention mechanisms, such as the convolutional block attention module (CBAM) and squeeze-and-excitation (SE), generally use global max pooling or average pooling to compute channel attention, which can discard spatial information about objects. Channel attention typically relies on global average pooling to compress the input features into a single feature vector that captures cross-channel information. Coordinate Attention (CA) departs from this framework by embedding positional information into channel attention: it decomposes channel attention into two parallel, independent feature encoding processes that aggregate features along the vertical (height) and horizontal (width) directions38. This design captures inter-channel interactions while also gathering the direction-aware and position-aware information that is crucial for visual tasks, without significantly increasing the computational cost. As a result, the model remains computationally efficient while attending to a broader area, which enables more precise localization and recognition of target objects.
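The two directional pooling steps described above can be written compactly in PyTorch. The sketch below follows the structure of Coordinate Attention as it is commonly implemented (pool along width and along height, a shared 1 × 1 convolution, then two sigmoid-gated 1 × 1 convolutions), assuming PyTorch is installed; the reduction ratio of 32, the minimum of 8 hidden channels, and the Hardswish activation are assumed settings rather than values reported in this paper.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention sketch: direction-aware pooling along H and W."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1, bias=False)   # shared transform
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)               # height-wise gate
        self.conv_w = nn.Conv2d(mid, channels, 1)               # width-wise gate

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                       # (n, c, h, 1): pool along width
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1): pool along height
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                       # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # (n, c, 1, w)
        return x * a_h * a_w                                        # position-aware reweighting

ca = CoordinateAttention(64).eval()
x = torch.randn(2, 64, 32, 48)
with torch.no_grad():
    out = ca(x)
print(out.shape)
```

Since both attention maps lie in (0, 1), the module only rescales features: `a_h` varies along rows and `a_w` along columns, so their product gives each pixel a weight that depends on its position as well as its channel.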
The overall network architecture of the improved DeepLabv3+ remote sensing image segmentation algorithm is shown in Fig. 6. During the feature extraction process, high-level and low-level features are weighted by the CA module to highlight the more important features. These features are then fused, and the fused feature map is processed through a 3 × 3 convolution and an up-sampling operation to obtain the prediction results. By introducing the CA module, the model is better able to focus on regions related to the target objects and to understand contextual information, thus improving the segmentation of each object and enhancing the model’s segmentation accuracy.
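The fusion step can be sketched as a DeepLabv3+-style decoder that accepts optional attention modules for the low-level and high-level streams, assuming PyTorch is installed. When no modules are supplied, `nn.Identity()` is used, so the sketch runs on its own; passing coordinate-attention modules would correspond to the CA weighting described above. The 48 reduced low-level channels follow a common DeepLabv3+ setting, and the input shapes (24 channels at stride 4, 256 channels at stride 16) are hypothetical. Where exactly the paper applies CA relative to the 1 × 1 reduction is not specified.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveDecoder(nn.Module):
    """DeepLabv3+-style decoder with optional attention on both feature streams."""

    def __init__(self, low_ch, high_ch=256, num_classes=2,
                 low_attention=None, high_attention=None):
        super().__init__()
        # nn.Identity() stands in when no attention module is supplied.
        self.low_att = low_attention if low_attention is not None else nn.Identity()
        self.high_att = high_attention if high_attention is not None else nn.Identity()
        self.reduce = nn.Sequential(               # 1x1 conv shrinks low-level channels
            nn.Conv2d(low_ch, 48, 1, bias=False), nn.BatchNorm2d(48), nn.ReLU(inplace=True))
        self.fuse = nn.Sequential(                 # 3x3 refinement, then per-pixel class scores
            nn.Conv2d(48 + high_ch, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1))

    def forward(self, low, high, out_size):
        low = self.reduce(self.low_att(low))
        high = F.interpolate(self.high_att(high), size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        logits = self.fuse(torch.cat([low, high], dim=1))
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)

decoder = AttentiveDecoder(low_ch=24).eval()
low = torch.randn(1, 24, 64, 64)    # hypothetical stride-4 backbone features
high = torch.randn(1, 256, 16, 16)  # hypothetical ASPP output at stride 16
with torch.no_grad():
    logits = decoder(low, high, out_size=(256, 256))
print(logits.shape)
```

The final bilinear upsampling restores the input tile size, so `logits.argmax(dim=1)` would give a 0/1 impervious map per pixel.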
Enhanced DeepLabv3+ network model (by ArcGIS Pro v.3.0.1, Esri Inc., Redlands, CA, USA. https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview/).
Accuracy evaluation
To verify the performance of the semantic segmentation algorithm, four accuracy evaluation metrics were used: precision, recall, F1 score, and mean intersection over union (MIoU). Using manually annotated impervious surfaces as a reference, we quantitatively evaluated the impervious surface extraction results. In binary classification, each pixel falls into one of four outcomes: true positive (TP), true negative (TN), false positive (FP), or false negative (FN). TP and TN are the numbers of pixels correctly classified as impervious and pervious surfaces, respectively. FP is the number of pervious surface pixels incorrectly classified as impervious surfaces, and FN is the number of impervious surface pixels incorrectly classified as pervious surfaces.
(1) Precision is the proportion of pixels predicted as impervious surfaces that are actually impervious surfaces. The formula is as follows:
$$Precision=\frac{TP}{TP+FP} \tag{1}$$
(2) Recall is the proportion of actual positive samples (impervious surface pixels) that the model correctly predicts. The formula is as follows:
$$Recall=\frac{TP}{TP+FN} \tag{2}$$
(3) The F1 score is the harmonic mean of precision and recall; it is high only when both precision and recall are high. The formula is as follows:
$$F1=2 \times \frac{Precision \times Recall}{Precision+Recall} \tag{3}$$
(4) MIoU is a standard metric for semantic segmentation accuracy. It measures the degree of overlap between the extracted impervious surface patches and the annotated reference: for each class, the ratio of the intersection to the union of the predicted and ground truth sets is computed, and these ratios are then averaged over all classes. Because it penalizes both false positives and false negatives, MIoU provides an objective measure of segmentation quality.
$$MIoU=\frac{1}{k+1}\sum_{i=0}^{k} \frac{TP_i}{TP_i+FP_i+FN_i} \tag{4}$$
where k + 1 is the number of classes (k = 1 here, for the impervious and pervious classes), and TP_i, FP_i, and FN_i are counted by treating class i as the positive class.
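The four metrics can be computed from binary masks with a few lines of NumPy. The sketch below is an illustration, not the evaluation code used in this study: precision, recall, and F1 are computed for the impervious class, MIoU averages the IoU of the impervious and pervious classes (k + 1 = 2), and the small masks are hypothetical. The paper does not state whether its reported precision, recall, and F1 values are averaged over both classes.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Precision, recall, F1 (impervious class) and two-class MIoU from binary masks (1 = impervious)."""
    pred = np.asarray(pred).astype(bool).ravel()
    truth = np.asarray(truth).astype(bool).ravel()
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou_impervious = tp / (tp + fp + fn)
    # For the pervious class the roles swap: TN are its hits, FN/FP its errors.
    iou_pervious = tn / (tn + fn + fp)
    miou = (iou_impervious + iou_pervious) / 2   # k + 1 = 2 classes in Eq. (4)
    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}

pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [1, 1, 0]])
m = segmentation_metrics(pred, truth)
print(m)
```

In this toy example two of the three predicted impervious pixels are correct and two of the three true impervious pixels are found, so precision, recall, and F1 all equal 2/3, while each class has an IoU of 0.5. A class with no pixels would cause a division by zero, which a full implementation should guard against.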
Results
Comparison of training results of different models
To validate the performance of the CA-DeepLabV3+ model in impervious surface recognition, we compared it with three classic semantic segmentation models: DeepLabv3+, UNet, and PSPNet, using identical training parameters for all models. The CA-DeepLabV3+ model achieved the highest accuracy of the four models. Table 2 and Fig. 7 compare the accuracy of the improved DeepLabv3+ model and the other three models in both impervious surface (IS) and pervious surface (PS) recognition. As shown in Table 2, the overall accuracies (OAs) of all models in impervious surface recognition exceeded 80%, demonstrating the effectiveness of deep learning for impervious surface recognition.
On all four evaluation metrics, the improved CA-DeepLabV3+ model outperforms the other three models in recognizing impervious and pervious surfaces. Across both classes, the CA-DeepLabV3+ model achieves an overall precision of 90.78%, a recall of 90.85%, an MIoU of 83.25%, and an F1 score of 90.82%. These four metrics are 1.11%, 1.95%, 2.54%, and 1.56% higher, respectively, than those of the original DeepLabv3+ model; 3.86%, 4.75%, 3.99%, and 4.34% higher than those of UNet; and 0.79%, 2.10%, 3.29%, and 1.22% higher than those of PSPNet (Table 2). These results indicate that the CA-DeepLabV3+ model performs well in impervious surface extraction, with better generalization and more accurate recognition of impervious surface areas.
As shown in Fig. 8, the CA-DeepLabV3+ model performs best in segmenting impervious surfaces in Chongqing. Its multi-scale feature extraction captures both large roads and small buildings, enabling precise segmentation of complex scenes. It also excels at boundary delineation, producing segmentations with clear edges for both buildings and roads. The coordinate attention mechanism, which encodes long-range dependencies along the horizontal and vertical directions, strengthens the relationships between distant pixels and improves the model's understanding and segmentation of complex scenes. Furthermore, the model is stable and consistent across different scenes and viewing angles, indicating strong generalization ability. In contrast, the original DeepLabv3+ model and the classic UNet and PSPNet models yield poorer results: they tend to misclassify some pervious surfaces, such as bare soil and sediment-laden water bodies, as impervious surfaces, and they fail to capture fragmented impervious surface areas adequately. In summary, the CA-DeepLabV3+ model shows strong performance and broad application potential for impervious surface segmentation.
Extracted impervious surfaces using four different models. The black area represents the background, and the white area represents impervious surfaces. Panel (a) contains bare soil and water, panel (b) contains complex and fragmented impervious surface areas, and panel (c) contains contiguous impervious surface areas (by ArcGIS Pro v.3.0.1, Esri Inc., Redlands, CA, USA. https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview/).
Spatiotemporal analysis of impervious surfaces in Chongqing’s urban area
Area changes
The expansion of impervious surfaces in the main urban area of Chongqing from 2017 to 2022 is summarized in Table 3, and the dynamic changes in the extracted distributions are shown in Fig. 9. Overall, the impervious surface area increased rapidly over the five-year period, with expansion of varying degrees. From 2017 to 2022, the built-up area of Chongqing city increased from 1351.74 km² to 1698.47 km², a growth of 25.7% (source: Chongqing Statistical Yearbook 2018–2023)39,40,41,42,43,44. Over the same period, the impervious surface area in the main urban area increased from 855.51 km² to 1051.37 km², a growth of 22.9%. Although the absolute values of the two datasets differ, their growth trends are highly consistent. This consistency suggests that impervious surface area and built-up land area track each other closely, both reflecting the progress and intensity of urban development, and it supports the reliability of the results obtained in this study.
In 2017, impervious surfaces in Chongqing's main urban area were concentrated mainly in the city center and its surroundings, with no large-scale impervious surfaces elsewhere; the total area was 855.51 km². From 2017 to 2018, impervious surface coverage increased rapidly to 897.23 km², an increase of 41.72 km² (a growth rate of 4.88%). The expansion was concentrated mainly in the urban core and its surroundings, with significant growth in the eastern and southern regions. From 2018 to 2019, the impervious surface area continued to expand, reaching 932.28 km², an increase of 35.05 km² (3.91%). Although the growth rate slowed slightly, the area continued to increase overall; expansion in the urban core began to saturate, and newly developing areas became the main growth regions. From 2019 to 2020, the area increased further to 959.43 km², an increase of 27.15 km² (2.91%), a relatively steady rate reflecting ongoing urban development. From 2020 to 2021, the area increased to 976.59 km², an increase of 17.16 km² (1.79%). This marked slowdown was likely related to the COVID-19 pandemic, which restricted urban construction and development activities. From 2021 to 2022, the area increased substantially to 1051.37 km², an increase of 74.78 km² (7.66%). According to the Chongqing Territorial Spatial Master Plan (2021–2035)45, after 2021 Chongqing accelerated the development of several key areas, particularly the Liangjiang New Area and the Western (Chongqing) Science City. These regions have been designated as strategic growth poles for the city, focusing on technological innovation and the concentration of high-end industries.
With the rapid construction of infrastructure and functional expansion in these areas, the intensity of land development increased significantly, leading to marked growth in impervious surface area. The large-scale construction of transportation networks, public service facilities, and emerging industrial parks drove the notable rebound in impervious surface expansion from 2021 to 2022. The government's urban development plans and policy directions thus directly influenced the speed and scope of impervious surface expansion.
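The year-on-year increases and growth rates reported for 2017–2022 can be reproduced from the annual areas with a short Python script. The figures are those given in the text; the script only makes the arithmetic explicit (growth rate = increase / previous year's area × 100).

```python
# Impervious surface areas (km²) of the main urban area, as reported in the text.
areas_km2 = {2017: 855.51, 2018: 897.23, 2019: 932.28,
             2020: 959.43, 2021: 976.59, 2022: 1051.37}

years = sorted(areas_km2)
rows = []
for prev, curr in zip(years, years[1:]):
    increase = areas_km2[curr] - areas_km2[prev]
    rate_pct = 100 * increase / areas_km2[prev]   # growth relative to the earlier year
    rows.append((prev, curr, round(increase, 2), round(rate_pct, 2)))
    print(f"{prev}-{curr}: +{increase:.2f} km² ({rate_pct:.2f}%)")

total_pct = 100 * (areas_km2[2022] - areas_km2[2017]) / areas_km2[2017]
print(f"2017-2022 total growth: {total_pct:.1f}%")
```

Running it reproduces the reported annual increments (41.72, 35.05, 27.15, 17.16, and 74.78 km²), the annual rates (4.88%, 3.91%, 2.91%, 1.79%, and 7.66%), and the 22.9% total growth.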
Standard deviation ellipse analysis
To study the distribution patterns and directional changes in the impervious surfaces in Chongqing, the standard deviation ellipse (SDE) method was used to analyze the spatial changes in the impervious surfaces from 2017 to 2022. The SDE, based on the spatial location and structure of the study objects, quantitatively analyzes the centrality, directionality, and spatial morphology of geographic features. The long axis of the ellipse represents the dominant direction of the impervious surfaces, while the short axis represents the main distribution range. A longer long axis indicates a more pronounced directionality of the distribution, while a shorter short axis indicates stronger aggregation.
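The SDE quantities can be sketched with NumPy through the eigen-decomposition of the (optionally weighted) covariance matrix of pixel coordinates: the mean center is the ellipse center, the square roots of the eigenvalues give the one-standard-deviation long and short semi-axes, and the leading eigenvector gives the orientation. This is an assumed formulation for illustration; GIS tools such as ArcGIS's Directional Distribution may scale the axes differently, so absolute axis lengths need not match Table 4, and the sample coordinates are hypothetical.

```python
import numpy as np

def standard_deviational_ellipse(x, y, weights=None):
    """Mean center, 1-SD long/short semi-axes, and long-axis azimuth (degrees clockwise from north).

    x, y are projected coordinates (e.g. metres east/north) of impervious pixels;
    optional weights could be pixel areas. Uses the eigen-decomposition of the
    weighted covariance matrix.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, float)
    cx, cy = np.average(x, weights=w), np.average(y, weights=w)
    dx, dy = x - cx, y - cy
    cxy = np.average(dx * dy, weights=w)
    cov = np.array([[np.average(dx * dx, weights=w), cxy],
                    [cxy, np.average(dy * dy, weights=w)]])
    evals, evecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    short_sd, long_sd = np.sqrt(evals)
    vx, vy = evecs[:, 1]                         # unit vector along the long axis
    azimuth = np.degrees(np.arctan2(vx, vy)) % 180.0
    return (cx, cy), long_sd, short_sd, azimuth

rng = np.random.default_rng(1)
# Hypothetical pixel centroids stretched north-south (sd 5 km) versus east-west (sd 1 km).
xs = rng.normal(0.0, 1000.0, 5000)
ys = rng.normal(0.0, 5000.0, 5000)
center, long_sd, short_sd, az = standard_deviational_ellipse(xs, ys)
print(center, round(long_sd), round(short_sd), round(az, 1))
```

For points stretched north-south, the long axis comes out roughly five times the short axis and the azimuth is close to 0° (north); comparing these quantities across years is what reveals the directional trends discussed below.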
By comparing the standard deviation ellipses and the shifts in their centroids (Table 4; Fig. 10), it was found that due to the shape of the study area, the ellipses maintained a relatively stable tilt direction during the five periods. The direction of the long axis of the impervious surfaces remained basically unchanged. The length of the long axis of the standard deviation ellipse increased over time, reflecting a clear directionality in the expansion of the impervious surfaces, mainly along the north–south axis of the main urban area. The short axis also grew each year, indicating that the impervious surfaces in the east and west were also expanding, and the overall aggregation decreased. Chongqing’s main urban area was located at the confluence of the Yangtze and Jialing rivers. This area is characterized by hilly and mountainous terrain. The urban development in this region is constrained by the terrain and has primarily expanded along the north–south direction, while the expansion in the east–west direction is slower.
The directions and distances of the centroid shifts reflect the centrality of the impervious surface distribution and how it changed over time. From 2017 to 2018, the centroid shifted southwest from a position slightly south of the center of the study area, a relatively large shift of about 1.59 km. From 2018 to 2021, the centroid first moved northeastward and then slightly southwestward; in 2022 it shifted markedly eastward, by approximately 1.67 km.
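Shift distances and directions like those reported above follow from the planar distance and bearing between consecutive mean centers. This is a minimal sketch assuming the centers are already in a projected coordinate system measured in metres (for example, a UTM zone; the study's actual coordinate system is not stated in this section). The eight-point compass labelling is an illustrative choice, and `centroid_shift` and `trajectory` are hypothetical names.

```python
import math

COMPASS = ["north", "northeast", "east", "southeast",
           "south", "southwest", "west", "northwest"]


def centroid_shift(p0, p1):
    """Distance (km), bearing (deg clockwise from north) and compass label
    of a mean-center move from p0 to p1, both (x, y) in projected metres."""
    dx = p1[0] - p0[0]            # eastward displacement
    dy = p1[1] - p0[1]            # northward displacement
    distance_km = math.hypot(dx, dy) / 1000.0
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    # Each compass sector spans 45 degrees, centred on its direction
    label = COMPASS[int((bearing + 22.5) // 45) % 8]
    return distance_km, bearing, label


def trajectory(centers):
    """Shift between each pair of consecutive years in {year: (x, y)}."""
    years = sorted(centers)
    return [(a, b, *centroid_shift(centers[a], centers[b]))
            for a, b in zip(years, years[1:])]
```

If the centers were stored as longitude/latitude instead, a geodesic distance would be needed in place of `math.hypot`.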
Overall, the mean center and expansion direction of Chongqing's impervious surfaces were largely consistent with the city's urban development plans. Development was centered on the main urban area while promoting coordinated development of the northeastern, southeastern, and western regions of Chongqing and advancing urban–rural and regional integration.
Discussion
In this study, we developed a remote sensing semantic segmentation model for urban impervious surfaces that uses Sentinel-2 time-series satellite imagery and is based on an improved DeepLabV3+ semantic segmentation network (CA-DeepLabV3+). The model was used to extract the distribution of impervious surfaces in Chongqing's central urban area from 2017 to 2022, and the spatiotemporal evolution of these surfaces was then analyzed using the area change and standard deviational ellipse methods. The main conclusions of this study are as follows.
(1) The CA-DeepLabV3+ model is highly reliable. The impervious surface extraction model based on the improved CA-DeepLabV3+ network achieved precision, recall, F1 score, and MIoU values of 90.78%, 90.85%, 90.82%, and 83.25%, respectively, and proved able to monitor the large-scale, time-series distribution of impervious surfaces effectively. On all four evaluation metrics, CA-DeepLabV3+ significantly outperformed the other three models in distinguishing impervious from pervious surfaces, indicating better generalization and more accurate recognition in impervious surface extraction tasks.
(2) The spatiotemporal analysis showed that impervious surface coverage in Chongqing's central urban area expanded significantly from 2017 to 2022. According to the extraction results and their dynamic changes, both the number and the area of impervious surface regions grew rapidly over this period, with varying degrees of expansion. Impervious surfaces were concentrated mainly in the urban core and its surrounding areas, and the speed and direction of their expansion were influenced by urban development planning and policy guidance.
(3) The standard deviational ellipse analysis revealed a clear directional tendency in the expansion of impervious surfaces in Chongqing's central urban area, primarily along the north–south axis. Some expansion also occurred in the east–west direction, and overall aggregation weakened. Chongqing's central urban area lies at the confluence of the Yangtze and Jialing rivers and is characterized by hilly and mountainous terrain, which constrains urban development and favors north–south expansion. The migration trajectory of the mean center, which reflects the central position and direction of impervious surface expansion, is highly consistent with Chongqing's urban planning.
(4) The improved CA-DeepLabV3+ model excels at extracting impervious surface information: it significantly outperforms other classic semantic segmentation models in distinguishing impervious from pervious surfaces and shows high reliability and generalization performance. It enables large-scale, time-series monitoring of impervious surface distribution, providing technical support for studies of urban impervious surface expansion and for fine-scale urban management. In future work, we will further improve impervious surface extraction using high-resolution and hyperspectral remote sensing data to obtain more detailed and accurate distribution data in support of refined urban management and environmental protection.
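For reference, the four evaluation metrics cited in conclusion (1) can be computed from a pixel-level confusion matrix. This is a minimal sketch assuming a binary labelling (0 = pervious, 1 = impervious), with precision, recall, and F1 computed for the impervious class and MIoU averaged over both classes. The study may instead average precision and recall over classes, so these exact definitions are an assumption, and `segmentation_metrics` is a hypothetical name.

```python
import numpy as np


def segmentation_metrics(y_true, y_pred, n_classes=2, positive=1):
    """Precision, recall, F1 (for the positive class) and mean IoU.

    y_true, y_pred : arrays of non-negative integer class labels per pixel
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # cm[i, j] = number of pixels of true class i predicted as class j
    cm = np.bincount(n_classes * y_true + y_pred,
                     minlength=n_classes ** 2).reshape(n_classes, n_classes)

    tp = cm[positive, positive]
    fp = cm[:, positive].sum() - tp      # predicted positive, actually not
    fn = cm[positive, :].sum() - tp      # actually positive, missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

    # Per-class IoU = TP / (TP + FP + FN) = diagonal / (row + column - diagonal)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    miou = float(np.mean(inter / union))
    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}
```

Because MIoU also averages in the pervious class and IoU penalizes false positives and false negatives together, it is typically lower than F1, consistent with the gap between 83.25% and 90.82% reported above.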
Data availability
The data that support the plots within this paper and other findings of this study are available from the corresponding author upon reasonable request.
References
Slonecker, E. T. et al. Remote sensing of impervious surfaces: A review. Remote Sens. Reviews. https://doi.org/10.1080/02757250109532436 (2001).
Fang, L. et al. Estimation of large-scale impervious surface percentage by fusion of multi-source time series remote sensing data. JRS 24, 1243–1254 (2020).
Ridd, M. K. Exploring a V-I-S (vegetation-impervious surface-soil) model for urban ecosystem analysis through remote sensing: Comparative anatomy for cities. Int. J. Remote Sens. 16, 2165–2185 (1995).
Wu, C. & Murray, A. T. Estimating impervious surface distribution by spectral mixture analysis. Remote Sens. Environ. 84, 493–505 (2003).
Sun, Z., Wang, C., Guo, H. & Shang, R. A modified normalized difference impervious surface index (MNDISI) for automatic urban mapping from Landsat imagery. Remote Sens. 9, 942 (2017).
Xu, H. Analysis of impervious surface and its impact on urban heat environment using the normalized difference impervious surface index (NDISI). Photogramm. Eng. Remote Sens. 76, 557–565 (2010).
Hu, D., Chen, S., Qiao, K. & Cao, S. Integrating CART algorithm and multi-source remote sensing data to estimate sub-pixel impervious surface coverage: A case study from Beijing municipality, China. Chin. Geogr. Sci. 27, 614–625 (2017).
Sawaya, K. E., Olmanson, L. G., Heinert, N. J., Brezonik, P. L. & Bauer, M. E. Extending satellite remote sensing to local scales: Land and water resource monitoring using high-resolution imagery. Remote Sens. Environ. 88, 144–156 (2003).
Xu, H. & Wang, M. Remote sensing-based retrieval of ground impervious surfaces. Zggx 20, 1270–1289 (2021).
Weng, Q. & Hu, X. Medium spatial resolution satellite imagery for estimating and mapping urban impervious surfaces using LSMA and ANN. IEEE Trans. Geosci. Remote Sens. 46, 2397–2406 (2008).
Tailor, A. Introductory digital image processing: A remote sensing perspective. Image Vis. Comput. 4, 229 (1986).
Sun, Z. Estimating urban impervious surfaces from Landsat-5 TM imagery using multilayer perceptron neural network and support vector machine. J. Appl. Remote Sens. 5, 053501 (2011).
Xu, J. et al. Measuring spatio-temporal dynamics of impervious surface in Guangzhou, China, from 1988 to 2015, using time-series Landsat imagery. Sci. Total Environ. 627, 264–281 (2018).
Jia, K., Li, Q., Tian, Y. C. & Wu, B. A review of classification methods of remote sensing imagery. Guang Pu Xue Yu Guang Pu Fen Xi 31, 2618–2623 (2011).
Anusha, N. & Balakrishnan, B. Automatic flood detection in Multi-Temporal Sentinel-1 synthetic aperture radar imagery using ANN algorithms. Int. J. Comput. Commun. Control 15 (2020).
Zhang, Y., Chen, L. & He, C. Estimating urban impervious surfaces using LS-SVM with multi-scale texture. In 2009 Joint Urban Remote Sensing Event 1–6 (2009).
Zhang, X. et al. Development of a global 30 m impervious surface map using multisource and multitemporal remote sensing datasets with the Google Earth Engine platform. Earth Syst. Sci. Data 12, 1625–1648 (2020).
Cheng, G., Han, J. & Lu, X. Remote sensing image scene classification: Benchmark and state of the art. Proc. IEEE 105, 1865–1883 (2017).
Plajer, I. C., Băicoianu, A., Majercsik, L. & Ivanovici, M. Multisource remote sensing data visualization using machine learning. IEEE Trans. Geosci. Remote Sens. 62, 1–12 (2024).
Ullah, F. et al. Conventional to deep ensemble methods for hyperspectral image classification: A comprehensive survey. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 17, 3878–3916 (2024).
Zhao, Z.-Q., Zheng, P., Xu, S. & Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 30, 3212–3232 (2019).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems Vol. 25 (Curran Associates, Inc., 2012).
Yang, Z. & Nevatia, R. A multi-scale cascade fully convolutional network face detector. In 23rd International Conference on Pattern Recognition (ICPR) 633–638 (2016).
Stuhlsatz, A., Lippel, J. & Zielke, T. Feature extraction with deep neural networks by a generalized discriminant analysis. IEEE Trans. Neural Netw. Learn. Syst. 23, 596–608 (2012).
Marmanis, D. et al. Semantic segmentation of aerial images with an ensemble of CNNs. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. III-3, 473–480 (2016).
Hu, H., Cai, S., Wang, W., Zhang, P. & Li, Z. A semantic segmentation approach based on DeepLab network in high-resolution remote sensing images. In Image and Graphics (eds Zhao, Y. et al.) 292–304 (Springer International Publishing, 2019).
Lin, Y., Xu, D., Wang, N., Shi, Z. & Chen, Q. Road extraction from very-high-resolution remote sensing images via a nested SE-Deeplab model. Remote Sens. 12, 2985 (2020).
Yu, F. & Koltun, V. Multi-scale context aggregation by dilated convolutions. https://arxiv.org/abs/1511.07122v3 (2015).
He, K., Zhang, X., Ren, S. & Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1904–1916 (2015).
Yi, Z., Jia, L. & Chen, Q. Crop classification using multi-temporal Sentinel-2 data in the Shiyang river basin of China. Remote Sens. 12, 4052 (2020).
Zhaoying, Z. Sentinel-2 User Handbook.
Seydi, S. T., Akhoondzadeh, M., Amani, M. & Mahdavi, S. Wildfire damage assessment over Australia using Sentinel-2 imagery and MODIS land cover product within the Google Earth Engine cloud platform. Remote Sens. 13, 220 (2021).
Feng, Q. et al. Integrating multitemporal Sentinel-1/2 data for coastal land cover classification using a multibranch convolutional neural network: A case of the Yellow River Delta. Remote Sens. 11, 1006 (2019).
Zhang, T., Su, J., Xu, Z., Luo, Y. & Li, J. Sentinel-2 satellite imagery for urban land cover classification by optimized random forest classifier. Appl. Sci. 11, 543 (2021).
Sun, H., Wang, L., Lin, R., Zhang, Z. & Zhang, B. Mapping plastic greenhouses with two-temporal Sentinel-2 images and 1D-CNN deep learning. Remote Sens. 13, 2820 (2021).
Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F. & Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Computer Vision – ECCV 2018 (eds Ferrari, V., Hebert, M., Sminchisescu, C. & Weiss, Y.) 833–851 (Springer International Publishing, 2018).
Howard, A. G. et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. https://doi.org/10.48550/arXiv.1704.04861 (2017).
Hou, Q., Zhou, D. & Feng, J. Coordinate attention for efficient mobile network design. https://doi.org/10.48550/arXiv.2103.02907 (2021).
Chongqing Municipal Bureau of Statistics. Chongqing Statistical Yearbook (China Statistics, 2018).
Chongqing Municipal Bureau of Statistics. Chongqing Statistical Yearbook (China Statistics, 2019).
Chongqing Municipal Bureau of Statistics. Chongqing Statistical Yearbook (China Statistics, 2020).
Chongqing Municipal Bureau of Statistics. Chongqing Statistical Yearbook (China Statistics, 2021).
Chongqing Municipal Bureau of Statistics. Chongqing Statistical Yearbook (China Statistics, 2022).
Chongqing Municipal Bureau of Statistics. Chongqing Statistical Yearbook (China Statistics, 2023).
Chongqing Municipal Bureau of Planning and Natural Resources. Chongqing Territorial Spatial Master Plan (2021–2035) (Chongqing Municipal Bureau of Planning and Natural Resources, 2024).
Funding
This research was funded by “ERP Study in Porous carbonate rock area” (No: F2022405).
Author information
Contributions
D.W. conceived the experimental design and processed the experimental data together with Y.C. D.W. discussed the statistical analysis results. H.K. provided important guidance on the experimental design and analysis, and contributed to project management and funding acquisition. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wei, D., Chang, Y. & Kuang, H. Extraction and spatiotemporal analysis of impervious surfaces in Chongqing based on enhanced DeepLabv3+. Sci Rep 15, 9807 (2025). https://doi.org/10.1038/s41598-025-94882-6
DOI: https://doi.org/10.1038/s41598-025-94882-6