Introduction

Change detection is a fundamental process that compares remote sensing images captured at different times over the same geographical area to identify alterations on Earth’s surface. It is a crucial approach for acquiring up-to-date geographic information and has therefore become a significant research focus in remote sensing applications. The detection of land use and land cover (LULC) changes provides an efficient and cost-effective basis for map updating, urban planning, and economic development prediction. Consequently, change detection studies have garnered considerable attention in recent years within the remote sensing community1,2,3,4.

Change detection methods can be classified into two approaches: direct change prediction5,6,7 and classification followed by change identification8,9. They can further be categorized as classical machine learning-based methods10,11 or deep learning-based methods12,13. Initially, Walter performed pixel- and object-based classification in a GIS database to obtain change results9. Gu et al. proposed an enhanced Markov random field-based method for change detection14. Cao et al. extracted the changed and unchanged areas without relying on training parameters15. Lv et al. introduced a mixed conditional random field (mixed-CRF) model for change detection11. While classical machine learning-based change detection methods have demonstrated some success in practical applications, they exhibit limitations in model generalization, assumptions about data distribution, computational complexity, interpretability of results, sensitivity to high-dimensional and noisy data, and dependence on labeled data. To address these challenges, researchers have increasingly turned to deep learning-based algorithms and techniques to enhance the accuracy and efficiency of change detection.

Change detection network architectures based on deep learning primarily build on AlexNet16, the Visual Geometry Group network (VGG)17, Inception18, the residual neural network (ResNet)19, the fully convolutional network (FCN)20, U-Net21, and DeepLab V1 through DeepLab V3+22,23,24,25, which integrate components such as Inception modules, residual connections, and feature pyramids26. In the context of remote sensing multi-temporal urban change detection, Jiang et al. used a convolutional neural network to extract features from bi-temporal remote sensing images, connected these features sequentially, calculated the Euclidean distance between the feature maps, and finally obtained the change detection map12. However, the Euclidean distance may be too simple a metric to fully capture complex spatial and temporal change information. Liu et al. employed an unsupervised deep convolutional coupling network to detect changes between heterogeneous radar image datasets27; although this method requires no labeled data, it is less accurate and comprehensive than supervised approaches. Amirkolaee et al. estimated digital surface models (DSMs) using dense convolutional neural networks (DCNNs) to construct three-dimensional geospatial information from single remote sensing images28; however, the accuracy of change detection with this method is contingent on the precision of the DSMs. D’Addabbo et al. calculated several deep features that efficiently captured contextual information for Very High-Resolution (VHR) image analysis using AlexNet-based pretrained convolutional layers29, but processing VHR images demands substantial computational resources and time, limiting the scalability of the method to large-scale datasets. Basavaraju et al. used modified residual connections and a new spatial pyramid pooling module to perform change detection while preserving the shape of the changed regions30; although these improvements enhanced model performance, they also increased model complexity, leading to greater training difficulty and higher computational costs. Jinzhu et al. used U-Net to capture historical urban development and simulate the future development of the North China Plain, the fastest urbanizing region in the world31; this approach focused primarily on rapid urbanization but was less sensitive to natural disasters, environmental changes, and other types of alterations. Iris et al. proposed a deep Siamese kernel point convolutional network that performs change detection directly on 3D data, completing change detection and classification in one step by combining 2D image change detection with 3D point cloud analysis32. While this method achieved superior accuracy and improved classification performance, it also imposes high demands on computing resources and storage space. Overall, these studies demonstrate that deep learning methods offer superior accuracy, improved classification performance, and significantly reduced regional confusion.

To address the limitations of existing methods, this study proposes an urban change detection method that integrates Bi-Unet, a Dense Block, and LSTM to capture the complex features of remote sensing images, enhance change detection, improve model generalization and feature reuse, and capture long-term dependencies in time series. The main contributions are as follows:

  1. Unlike the single-encoder structure of traditional U-Net architectures, this model employs a Bi-Unet structure, incorporating an additional encoder to process inputs from two phases. By passing the difference matrix to the decoder for feature fusion, this approach better detects changes in images.

  2. To capture complex features and enhance the model’s generalization and feature reuse abilities, dense connections are utilized in both the encoding and decoding units. However, these connections may increase network parameters, cause redundancy in backpropagation data, and raise GPU consumption during training. These issues are mitigated by adding a Dropout layer at the end of the dense connections.

  3. In neural network backpropagation, layers closer to the current layer have a greater influence, while the influence of distant layers diminishes with increasing network depth. To capture the influence of distant layers, this model incorporates an LSTM network, leveraging gate control mechanisms to combine temporal features learned by different layers, thereby effectively capturing long-term dependencies in time series.

Methods

This study introduces an urban change detection model based on the classical U-Net architecture, which yields more accurate semantic segmentation results with fewer training samples. Figure 1 illustrates the architecture of the proposed model, which consists of an encoder and a decoder. The encoder extracts low-level features from the bi-temporal images, whereas the decoder extracts the corresponding high-level features. The model takes bi-temporal images as input and outputs the change map between them. Feature extraction for each image is performed in four units, whose detailed structures are explained later in this paper. By computing the differences between the bi-temporal features, additional feature channels establish a path for backward propagation during training that connects low-level features with their high-level counterparts21; four skip connections feed the four decoder units. To ensure feature reuse within each unit, both the encoder and decoder adopt Dense Blocks to connect feature maps across network layers. LSTM networks are employed in each block of the low-level feature extraction to address the long-term dependency problem of simple recurrent neural networks (RNNs).

Fig. 1
figure 1

Architecture of the proposed BiUnet-Dense.

Bi-Unet

In recent years, the proliferation of deep learning-based methodologies across diverse domains has produced a plethora of change detection techniques. In33,34, a novel deep patch-based architecture was proposed in which features extracted from bi-temporal patches are processed simultaneously through an array of dilated convolutional layers. However, this approach requires processing each pixel individually, resulting in computationally intensive operations. Daudt et al. introduced three distinct fully convolutional Siamese networks for change-region detection35; however, their models lacked sufficient consideration of temporal data patterns. To address these limitations, we propose a deep learning model based on Bi-Unet, which effectively captures spatial features from bi-temporal inputs. Figure 2 illustrates the specific modules in both the encoder and decoder units.

The architecture of Bi-Unet is based on the U-Net network. The core feature of the U-Net network lies in its distinctive U-shaped structure, which consists of two main components: an encoder and a decoder. In the encoder section, image features are progressively extracted via multi-layer convolutional and pooling operations, thereby reducing the data dimensionality. In the decoder section, the spatial dimensions and details of the image are gradually restored through upsampling operations and skip connections. Skip connections, a key characteristic of the U-Net network, facilitate the effective fusion of features at different scales by concatenating the feature maps from the encoder with the upsampled results from the decoder. This design enables the U-Net network to capture fine structures within images, thereby enhancing the segmentation accuracy for small targets and edge regions. The U-Net network employs a fully convolutional architecture, eliminating the need for fully connected layers found in traditional CNNs, thus reducing the number of parameters and computational load. Additionally, this architecture allows the U-Net network to process input images of any size, improving the model’s flexibility and practicality. Building upon the single-encoder structure of the traditional U-Net architecture, Bi-Unet introduces an additional encoder to receive information from two phases, passing the difference matrix to the decoder for feature fusion, thereby better detecting changes in the image.

As illustrated in Fig. 2, the proposed model takes the bi-temporal inputs T1 and T2. Each encoder unit consists of three blocks, each comprising a 3 × 3 convolutional layer, a batch normalization (BN) layer, a rectified linear unit (ReLU) layer, and a Dropout layer. After feature extraction in the encoder, the feature maps extracted from the bi-temporal images are down-sampled through 2 × 2 max-pooling. During the decoding phase, the max-pooling results undergo transposed convolution, and the difference between the corresponding bi-temporal encoder features is computed. This difference is concatenated with the transposed convolution results to serve as the input of block 1. By incorporating skip connections within the Bi-Unet architecture, high-resolution information is integrated with low-resolution information to generate more intricate features while preserving spatial and temporal details. Each decoder unit likewise comprises three blocks consisting of a 3 × 3 convolutional layer, BN layer, ReLU layer, and Dropout layer. Finally, the urban change map is derived at the end of the model using a 1 × 1 convolutional layer; the output represents the changes observed between the bi-temporal images.
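To make the block structure concrete, the following is a minimal PyTorch sketch of one encoder/decoder building block and the difference-based skip fusion described above. The channel widths, dropout rate, and fusion details are illustrative assumptions, not the exact published configuration.

```python
# A sketch of the Conv-BN-ReLU-Dropout block and a decoder unit that fuses
# the bi-temporal difference skip connection; channel sizes are illustrative.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """3x3 convolution -> BN -> ReLU -> Dropout, as in each encoder/decoder block."""
    def __init__(self, in_ch, out_ch, p_drop=0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p_drop),
        )

    def forward(self, x):
        return self.body(x)

class DecoderUnit(nn.Module):
    """Upsample, fuse the bi-temporal difference skip, then refine with three blocks."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.blocks = nn.Sequential(
            ConvBlock(out_ch + skip_ch, out_ch),
            ConvBlock(out_ch, out_ch),
            ConvBlock(out_ch, out_ch),
        )

    def forward(self, x, feat_t1, feat_t2):
        x = self.up(x)
        # Difference matrix of the corresponding encoder features, passed as the skip.
        skip = torch.abs(feat_t1 - feat_t2)
        return self.blocks(torch.cat([x, skip], dim=1))
```

In a full Bi-Unet, two weight-sharing copies of the encoder path would process T1 and T2, and each DecoderUnit would receive the corresponding pair of encoder feature maps.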

Fig. 2
figure 2

The specific module units of both the encoder and decoder.

Dense block

Since the introduction of ResNet19, two prominent trends have emerged in backbone network structures: increased depth and increased width. Huang et al. proposed a technique called stochastic depth, which trains networks by randomly dropping layers while ensuring convergence of the algorithm36. Building on this stochastic-depth idea and to address network redundancy, Huang et al. introduced DenseNet37. The core component of DenseNet is the Dense Block, whose primary objectives are to mitigate vanishing gradients, significantly reduce the parameter count, enhance feature extraction, and promote feature reuse while maintaining a compact model size. In a Dense Block, multiple convolutional layers are densely connected, with each layer receiving input from all preceding layers. Specifically, for an N-layer Dense Block, the input of the Nth layer consists of the feature maps of all previous N-1 layers concatenated along the channel dimension. This design facilitates smoother information flow and more efficient feature reuse. As the number of layers increases, the number of input feature maps for each layer grows accordingly; an N-layer block contains N(N+1)/2 connections that facilitate seamless information flow and gradient propagation throughout the network. This characteristic not only simplifies training but also enhances feature extraction performance.

The dense interconnections between the encoder and decoder units are shown in Fig. 3. Specifically, the input image is connected to block1, block2, block3, a max-pooling layer, and subsequent units. Assume that the input image is denoted by \(x_{0}\), the output of layer \(l\) is \(x_{l}\), and the transformation performed by each dense convolutional block is \(H_{l}(\cdot)\), where \(l\) denotes the layer number. In a conventional feedforward neural network, \(x_{l}=H_{l}(x_{l-1})\). In ResNet, the equation becomes \(x_{l}=H_{l}(x_{l-1})+x_{l-1}\), whereas in DenseNet it becomes \(x_{l}=H_{l}([x_{0},x_{1},\dots,x_{l-1}])\), where [·] denotes the concatenation operation. In our model, \(H_{l}(\cdot)\) consists of a 3 × 3 convolution followed by BN and a ReLU activation. Because Dense Blocks are used, the number of input feature maps grows as the depth increases, significantly raising the computational complexity per layer. This connection mode may increase network parameters, introduce redundant backpropagation data, and raise GPU consumption during training. To address these challenges, we introduce a Dropout layer at the end of the dense connections, as depicted in Fig. 3. The Dropout layer randomly sets a subset of neuron outputs to zero, thereby reducing the computational load during training and mitigating neuron co-adaptation, which enhances the model’s generalization ability and prevents overfitting.
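As a reference, the following sketch shows how such a modified Dense Block could be written in PyTorch, with a Dropout layer closing the dense connections as described. The growth rate, layer count, and dropout probability are assumed values.

```python
# A sketch of the modified Dense Block: each layer receives the concatenation
# of all preceding feature maps, x_l = H_l([x_0, ..., x_{l-1}]), and a Dropout
# layer closes the block; hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth=32, n_layers=3, p_drop=0.2):
        super().__init__()
        self.layers = nn.ModuleList()
        for l in range(n_layers):
            # H_l: 3x3 convolution followed by BN and ReLU, as in the text.
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch + l * growth, growth, kernel_size=3, padding=1),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
        # Dropout at the end of the dense connections mitigates redundancy.
        self.dropout = nn.Dropout2d(p_drop)

    def forward(self, x0):
        features = [x0]
        for layer in self.layers:
            # Concatenate all preceding feature maps along the channel dimension.
            features.append(layer(torch.cat(features, dim=1)))
        return self.dropout(torch.cat(features, dim=1))
```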

Fig. 3
figure 3

The modified dense interconnections between the encoder and decoder.

LSTM block

The encoding unit employs a recurrent neural network (RNN) for feature-map extraction. RNNs38 have been widely adopted for handling time-dependent data because of their ability to capture temporal relationships among sequential features. LSTM39, a sophisticated recurrent module comprising four interconnected neural networks, has been widely applied in diverse domains such as motion trajectory prediction40,41,42 and movement-related time-series forecasting43,44,45. By incorporating four interacting layers to address the long-term dependency problem, LSTM stands out among other types of RNNs. The gates of an LSTM selectively incorporate or exclude information from the cell state through activation functions, thereby enabling precise control over the flow of information. At each LSTM processing step, an output denoted as h and a cell state denoted as C are generated. The specific operations of LSTM can be summarized as follows:

To determine the information that can be transmitted through the cell state for a sequence X with input vector \(x_{t}\) at time t, the first step is a decision made by the forget gate using the sigmoid activation function, as shown in (1).

$$f_{t}=\sigma \left(W_{f}\cdot \left[h_{t-1},x_{t}\right]+b_{f}\right).$$
(1)

where \(\sigma\) is the sigmoid function, \(W_{f}\) is the weight matrix of the forget gate, \(h_{t-1}\) is the previous output, \(x_{t}\) is the current input, and \(b_{f}\) is the bias. Equation (1) constrains the value of \(f_{t}\) between 0 and 1, thereby determining the extent to which \(C_{t-1}\) is propagated.

The second step generates the new information to be stored and consists of two components. First, the input gate determines the values to be updated using a sigmoid activation function, as shown in (2). Second, a candidate value \(\tilde{C}_{t}\) is generated by the tanh function, as shown in (3). This candidate value can be incorporated into the cell state produced by the current layer.

$$i_{t}=\sigma \left(W_{i}\cdot \left[h_{t-1},x_{t}\right]+b_{i}\right)$$
(2)
$$\tilde{C}_{t}=\tanh \left(W_{c}\cdot \left[h_{t-1},x_{t}\right]+b_{c}\right).$$
(3)

where \(\sigma\) denotes the sigmoid function and tanh the hyperbolic tangent, \(W_{i}\) and \(W_{c}\) are the corresponding weight matrices, \(h_{t-1}\) is the previous output, \(x_{t}\) is the current input, and \(b_{i}\) and \(b_{c}\) are the biases. Equation (4) combines the outputs of these two components to update the cell state.

$$C_{t}=f_{t}\ast C_{t-1}+i_{t}\ast \tilde{C}_{t}.$$
(4)

The final step determines the output of the model. First, the output gate is computed, as shown in (5). Then, the tanh function rescales \(C_{t}\) to the range of −1 to 1, and this rescaled value is multiplied element-wise by the output of (5), as shown in (6). Ultimately, \(h_{t}\) represents the output of the LSTM.

$$o_{t}=\sigma \left(W_{o}\cdot \left[h_{t-1},x_{t}\right]+b_{o}\right)$$
(5)
$$h_{t}=o_{t}\ast \tanh \left(C_{t}\right).$$
(6)

The detailed processing steps of the three LSTM loop blocks within a single encoder unit at a given time are shown in Fig. 4. Each unit consists of three blocks, where each block receives a time-series vector representation as input and is connected to the output of the preceding block. Block 1 takes \([C_{0},h_{0}]\) and \(x_{1}\) as inputs and generates \([C_{1},h_{1}]\) as its output. Similarly, block 2 takes \([C_{1},h_{1}]\) and \(x_{2}\) as inputs, yielding \([C_{2},h_{2}]\) as its output.
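The gate equations (1)–(6) translate directly into code. Below is a from-scratch PyTorch sketch of a single LSTM cell step; chaining it as h1, c1 = cell(x1, h0, c0) and then h2, c2 = cell(x2, h1, c1) reproduces the block wiring of Fig. 4. The linear-layer formulation is standard, but the class itself is illustrative rather than the authors’ implementation.

```python
# One LSTM step implementing Eqs. (1)-(6); the concatenation [h_{t-1}, x_t]
# follows the equations above.
import torch
import torch.nn as nn

class LSTMCellSketch(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        cat = input_size + hidden_size
        self.W_f = nn.Linear(cat, hidden_size)  # forget gate, Eq. (1)
        self.W_i = nn.Linear(cat, hidden_size)  # input gate, Eq. (2)
        self.W_c = nn.Linear(cat, hidden_size)  # candidate state, Eq. (3)
        self.W_o = nn.Linear(cat, hidden_size)  # output gate, Eq. (5)

    def forward(self, x_t, h_prev, c_prev):
        z = torch.cat([h_prev, x_t], dim=-1)
        f_t = torch.sigmoid(self.W_f(z))        # Eq. (1)
        i_t = torch.sigmoid(self.W_i(z))        # Eq. (2)
        c_tilde = torch.tanh(self.W_c(z))       # Eq. (3)
        c_t = f_t * c_prev + i_t * c_tilde      # Eq. (4)
        o_t = torch.sigmoid(self.W_o(z))        # Eq. (5)
        h_t = o_t * torch.tanh(c_t)             # Eq. (6)
        return h_t, c_t
```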

Fig. 4
figure 4

The loop blocks of the LSTM layer.

As depicted in Fig. 1, the encoder for each bi-temporal sequence comprises five units, with the output of each unit serving as the input of the next. Because the model consists of multiple LSTM layers, the first layer receives the initial input data, whereas each successive layer uses the output of its predecessor as input. The final output vector corresponds to the final step of the last layer. Within this architecture, a single vector representation encodes information about all geographical locations within the cell state. Consequently, although the input channels are initially distinct, their unique characteristics are subsequently blended into the LSTM cell-state vector and used as the final output.

Change detection

The proposed model is a network architecture based on Bi-Unet that incorporates two encoder streams with shared weights, each assigned to one of the bi-temporal input images, to enable more accurate change detection. To facilitate the skip connections, the absolute difference between the bi-temporal images is utilized. Each unit in both the encoder and decoder consists of three blocks. To address potential feature overwriting and the long-term dependency problem of RNNs, LSTM networks are employed within each encoder unit. Furthermore, Dense Blocks are incorporated in both the encoder and decoder to enhance feature transmission and extract additional change information: each layer not only takes input from all previous layers but also ensures that subsequent layers can effectively utilize the extracted features.

In our model, the spatial details present in the early layers complement the more abstract, comparative information encoded later, leading to more accurate prediction of boundary changes in the output image. Importantly, the architectures are trained end to end without transfer learning from other datasets. The OSCD and CD_Data_GZ datasets were used as the experimental data in this study. The input and output images preserve their original dimensions; the parameters of each block are listed in Table 1.

Table 1 Network structure of BiUnet-Dense.

Results

Implementation details

We implemented BiUnet-Dense using nn.Module in the PyTorch framework. All images were used without preprocessing and were input at their original dimensions. To increase training-sample diversity, resizing, random flipping, and random rotation to 32 × 32 patches were applied during each iteration. The hyperparameters were set as follows: 50 epochs, a batch size of 32, and a patch size of 96. The model was trained on an NVIDIA GeForce RTX 2080 Ti GPU and an Intel(R) Xeon(R) E5-2690 v4 CPU.
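Because the exact augmentation pipeline is not specified, the sketch below illustrates one plausible implementation in which the same random crop, flip, and rotation are applied jointly to both temporal images and the label, as change detection requires. The function name and parameters are hypothetical.

```python
# Hypothetical paired augmentation: identical geometric transforms are applied
# to both temporal images and the change label to keep them aligned.
import random
import torchvision.transforms.functional as TF

def augment_pair(img_t1, img_t2, label, patch=96):
    # Random crop (same window for all three tensors).
    i = random.randint(0, img_t1.shape[-2] - patch)
    j = random.randint(0, img_t1.shape[-1] - patch)
    img_t1, img_t2, label = (TF.crop(x, i, j, patch, patch) for x in (img_t1, img_t2, label))
    # Random horizontal flip.
    if random.random() < 0.5:
        img_t1, img_t2, label = (TF.hflip(x) for x in (img_t1, img_t2, label))
    # Random rotation by a multiple of 90 degrees.
    angle = random.choice([0, 90, 180, 270])
    img_t1, img_t2, label = (TF.rotate(x, angle) for x in (img_t1, img_t2, label))
    return img_t1, img_t2, label
```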

Dataset description

The OSCD dataset addresses the challenge of detecting temporal variations in multispectral satellite images captured by Sentinel-2 between 2015 and 201846. This dataset comprises 24 pairs of carefully selected multispectral image sets acquired from diverse locations worldwide, including Brazil, the USA, Europe, the Middle East, and Asia. Each location was represented by a set of multispectral images consisting of 13 bands obtained from the Sentinel-2 satellite. The spatial resolution ranges from 10 m to 60 m.

The CD_Data_GZ dataset was collected from 2006 to 2019, covering the suburban areas of Guangzhou, China47. To facilitate the generation of image pairs, we utilized the Google Earth service within BIGEMAP to acquire 19 seasonal VHR image pairs with three spectral bands: red, green, and blue. This dataset has a spatial resolution of 0.55 m and dimensions ranging from 1006 × 1168 pixels to 4936 × 5224 pixels.

Quantitative evaluation metrics

To evaluate the performance of the proposed urban change detection algorithm and compare it with other methods, we employed a set of evaluation metrics. Before analyzing these metrics in depth, it is necessary to define the fundamental terminology. TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively. In the change detection setting, TP refers to changed data correctly identified as changed, whereas FP denotes unchanged data incorrectly classified as changed. TN represents unchanged data correctly classified as unchanged, whereas FN denotes changed data erroneously identified as unchanged. The quantitative evaluation metrics include precision, recall, F1-score, and the Kappa index. Precision quantifies the ability to avoid false detections by dividing the number of true positives by the total number of instances predicted as changed, and is defined as follows:

$$Precision=\frac{TP}{TP+FP}.$$
(7)

Recall denotes the ratio of true positives to the total number of actually positive instances and measures how effectively false negatives are minimized. It is mathematically defined as follows:

$$Recall=\frac{TP}{TP+FN}.$$
(8)

Precision and recall typically exhibit a trade-off: an increase in recall often results in a decrease in precision, and vice versa. The F1-score is defined as the harmonic mean of recall and precision, making it a more comprehensive measure of algorithm performance that encompasses both metrics.

$$F1\text{-}score=\frac{2\times Precision\times Recall}{Precision+Recall}.$$
(9)

The Kappa index48 is a statistical measure of agreement between classification results and reference data. It is widely recognized as more robust than a simple percentage of agreement because it accounts for the possibility of chance agreement. The Kappa index ranges from 0 to 1, with higher values indicating better classifier performance. Considering both the observed relative agreement between the classifier and the ground truth, \(p_{o}\), and the hypothetical probability of chance agreement, \(p_{e}\), the Kappa index is defined as:

$$Kappa=\frac{{p}_{o}-{p}_{e}}{1-{p}_{e}}.$$
(10)

where \(p_{o}\) and \(p_{e}\) are given by

$${p}_{o}=\frac{TP+TN}{TP+FP+TN+FN}$$
(11)
$${p}_{e}=\frac{\left(TP+FP\right)\left(TP+FN\right)+\left(TN+FP\right)(TN+FN)}{{(TP+FP+TN+FN)}^{2}}.$$
(12)
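For reference, Eqs. (7)–(12) can be computed directly from the four counts. The following minimal Python function is a sketch that assumes non-degenerate denominators.

```python
# Reference implementation of Eqs. (7)-(12) for binary change detection,
# assuming tp+fp, tp+fn, and the total count are all nonzero.
def change_detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    precision = tp / (tp + fp)                                # Eq. (7)
    recall = tp / (tp + fn)                                   # Eq. (8)
    f1 = 2 * precision * recall / (precision + recall)        # Eq. (9)
    total = tp + fp + tn + fn
    p_o = (tp + tn) / total                                   # Eq. (11), observed agreement
    p_e = ((tp + fp) * (tp + fn) + (tn + fp) * (tn + fn)) / total ** 2  # Eq. (12)
    kappa = (p_o - p_e) / (1 - p_e)                           # Eq. (10)
    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}
```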

Experimental results

To evaluate the performance of BiUnet-Dense, we conducted a comparative analysis against several bi-temporal change detection methods: fully convolutional Siamese concatenation (FC-Siam-conc)35, the dual-task constrained deep Siamese convolutional network (DTCDSCN)49, the combination of a Siamese network and NestedUNet (SNUNet)50, the bitemporal image transformer (BIT)51, the attention-based multiscale transformer network (AMTNet)52, and the transformer-based Siamese two-stream CD framework (ScratchFormer)53. To ensure consistency, we implemented all six networks in the PyTorch framework and trained them on both datasets.

Quantitative results

Table 2 presents the parameters and the four evaluation metrics for urban change detection of BiUnet-Dense and the comparison bi-temporal methods. Precision and recall are reported for both the changed and unchanged classes. The results show that our method performs well in multiple respects.

Table 2 shows that, among the compared models, BiUnet-Dense demonstrates outstanding performance on both the OSCD and CD_Data_GZ datasets, ranking first in both F1-score and Kappa coefficient. Compared with FC-Siam-conc, DTCDSCN, SNUNet, BIT, AMTNet, and ScratchFormer on the OSCD dataset, BiUnet-Dense improved the F1-score by 19.54%, 16.44%, 16.26%, 15.16%, 15.65%, and 0.28%, respectively, while its Kappa increased by 19.56%, 17.31%, 17.03%, 15.25%, 15.45%, and 0.23%, respectively. On the CD_Data_GZ dataset, the F1-score of BiUnet-Dense increased over the comparison models by 2.46%, 0.51%, 9.04%, 0.42%, 4.13%, and 0.64%, respectively, and the Kappa increased by 1.56%, 11.03%, 9.52%, 3.1%, 2.56%, and 1.26%, respectively. On the OSCD dataset in particular, BiUnet-Dense performs outstandingly in the precision and recall of the changed class, although its parameter count and computational complexity are relatively large. ScratchFormer and BIT follow closely behind: the former strikes a balance between performance and computational load, while the latter performs especially well in the precision of the changed class. DTCDSCN and SNUNet are computationally efficient and suitable for resource-constrained scenarios, but their performance is compromised. AMTNet and FC-Siam-conc each have advantages on different indicators. Overall, as computing resources become increasingly abundant, BiUnet-Dense is the preferred choice when maximum performance is the goal.

Table 2 Comparison results on the two change detection datasets.

The evaluation metric curves of all models during the training and testing phases are shown in Fig. 5, where (a)–(g) correspond to the OSCD dataset and (h)–(n) to the CD_Data_GZ dataset. As the first row of Fig. 5 shows, on the OSCD dataset the precision, recall, F1-score, and Kappa of ScratchFormer are very close to those of BiUnet-Dense. However, even after training has stabilized, the precision and F1-score of ScratchFormer in the testing phase remain higher than in the training phase. This anomaly may imply that the model has not fully learned the features of the training set, or that a regularization method was incorrectly applied during testing, causing the model to perform better at test time. AMTNet increasingly shows a trend of underfitting the longer it is trained. The evaluation metrics of BIT, SNUNet, and FC-Siam-conc are relatively low, and DTCDSCN not only has very low metric values but also oscillates severely. Evidently, BiUnet-Dense consistently outperformed the other models in both the training and testing phases. In the training phase, precision, recall, F1-score, and Kappa reached 43.52%, 60.67%, 50.68%, and 49.53%, respectively, whereas in the testing phase they were 41.98%, 54.29%, 47.34%, and 43.32%, respectively. Furthermore, the curves of BiUnet-Dense are smoother and more stable, with similar growth rates across the training and testing phases and no notable discrepancies.

On the CD_Data_GZ dataset, the precision, recall, F1-score, and Kappa of ScratchFormer, BIT, and FC-Siam-conc during the training phase were significantly higher than those of BiUnet-Dense, yet these four metrics were very low during the testing phase, indicating that these three models overfit. The remaining three models, AMTNet, SNUNet, and DTCDSCN, oscillated severely throughout training and failed to converge. BiUnet-Dense therefore outperformed the other models on both the training and test sets. During the training phase, precision, recall, F1-score, and Kappa reached 75.32%, 43.93%, 55.49%, and 43.97%, respectively; during the testing phase, they were 57.98%, 33.64%, 42.56%, and 38.91%, respectively.

Fig. 5
figure 5

The evaluation metric curves for the training and testing phases on both datasets.

Qualitative results

Figure 6 shows the change detection results of the various methods using three channels of the OSCD dataset. The OSCD dataset has a low resolution, and its ground truth marks all relative changes between the two time periods, encompassing modifications in buildings, roads, vegetation, water bodies, and other features. In this figure, white, black, green, and magenta represent TPs, TNs, FPs, and FNs, respectively. These results clearly indicate that all methods are influenced by variations in lighting conditions and by the spatial characteristics of buildings, such as rooftops, as well as by alterations in vegetation cover, bare soil, and water bodies. BiUnet-Dense exhibits limitations in accurately delineating boundaries, leading to numerous FPs surrounding the predicted TPs. Nevertheless, BiUnet-Dense demonstrated superior performance by detecting a greater number of TPs while maintaining high accuracy compared with the other methods. Although some FPs persist in its results, their frequency is lower than that of the alternative approaches, and they are primarily concentrated around the predicted TPs. Furthermore, BiUnet-Dense produces fewer FN predictions than the other detection techniques. Notably, it is resilient to changes unrelated to urbanization and successfully identified a substantial number of alterations in buildings and roads. ScratchFormer and AMTNet also correctly identified most of the TPs; however, compared with BiUnet-Dense, the results of ScratchFormer contain more FNs, and many FPs appear around the TPs detected by AMTNet. This may be due to local-feature dilution in the global context modeling of the ScratchFormer transformer architecture, which leads to an accumulation of FNs, whereas the cross-attention branches of AMTNet, although they enhance semantic association, may introduce feature pollution in dense prediction scenarios and spread FPs. The remaining three models, BIT, SNUNet, and DTCDSCN, adopt multi-scale feature fusion or edge enhancement modules and therefore delineate change boundaries well; however, they misclassified almost all changed pixels as unchanged (FNs).

The change detection results of the different methods on the CD_Data_GZ dataset are presented in Fig. 7. Because this dataset has a higher resolution, with ground-truth annotations limited to building changes, all methods demonstrated precise boundary delineation in their predictions. BiUnet-Dense performs particularly well in detecting continuous, dense, fine changes. Almost all models wrongly classify dark building roofs resulting from changes as unchanged, producing FNs: the models rely too heavily on the characteristics of the surrounding environment, and when a dark roof lies in a heterogeneous area, contextual semantic bias suppresses change detection. Among these approaches, BiUnet-Dense achieved superior performance by identifying the highest number of TPs and by detecting subtle changes in buildings and roads that were not annotated in the ground truth. Consequently, all roads identified by BiUnet-Dense were marked as FPs because of their absence from the provided annotations. In urban change detection tasks, our primary focus is on capturing changes in buildings and roads; although some unrelated changes may be detected as FPs, BiUnet-Dense demonstrates greater robustness to urbanization-related alterations.

Fig. 6
figure 6

Change detection on the OSCD dataset using the different methods in different scenes.

Fig. 7
figure 7

Change detection on the CD_Data_GZ dataset using the different methods in different scenes.

Discussion

Evaluation of the Bi-Unet

The dual-encoder architecture employed by Bi-Unet provides the U-Net network with two parallel feature extraction pathways. Each pathway independently processes one of the bi-temporal input images, extracting distinct feature sets whose differences yield a difference matrix. This parallel processing enables the model to capture more diverse image information, thereby enhancing the diversity of feature extraction. Compared with single-encoder U-Net networks, the dual-encoder architecture facilitates the formation of multiple feature representations during encoding. Each encoder branch learns unique feature maps that are effectively utilized to recover image details and delineate boundaries during decoding. By integrating these diverse feature representations, the model produces more accurate and detailed segmentation results. Additionally, the dual-encoder structure enhances the model’s generalization capability: because the two encoder branches extract features independently, which are subsequently fused and leveraged in later network layers, the model adapts better to various types of images and segmentation tasks. Consequently, this design allows the model to perform robustly even on unseen images.

As shown in Table 3, the application of the dual-encoder model resulted in significant improvements in the OSCD dataset, with precision_change, recall_change, precision_no_change, F1_score, and Kappa increasing by 8.61%, 31.52%, 1.55%, 19.77%, and 20.05%, respectively. On the CD_Data_GZ dataset, these metrics improved by 1.55%, 3.16%, 0.38%, 2.4%, and 3.38%, respectively. These enhancements can be attributed to the diverse feature representations generated by the dual-encoder, leading to more accurate and detailed segmentation results. In urban change detection tasks, the unchanged areas typically constitute a larger proportion compared to changed areas, potentially causing an imbalance between positive and negative samples. The introduction of dual encoders may have exacerbated this imbalance by extracting features from multiple dimensions, thereby focusing more on changed regions and less on unchanged regions. Consequently, the recall_no_change metric decreased by 1.39%.

Table 3 Evaluation results of architectures with dual-encoder and single-encoder.

Evaluation of the dense block

In this section, we investigate the impact of the modified Dense Block, which facilitates feature reuse by establishing connections between feature maps across network layers. A comparative analysis of our approach with and without a Dense Block is presented in Table 4. After incorporating the modified Dense Block into the model architecture, the precision_change, recall_change, precision_no_change, F1_score, and Kappa coefficient on OSCD improved by 2.88%, 20.61%, 1.02%, 10.87%, and 10.36%, respectively. On the CD_Data_GZ dataset, the precision_change, F1_score, and Kappa coefficient improved by 15.53%, 14.64%, and 12.59%. The Dense Block leverages feature multiplexing and gradient flow to enhance the network’s ability to learn feature representations for change detection tasks more effectively. This improvement allows the model to identify areas of change more accurately, thereby enhancing precision. Simultaneously, the densely connected design enables the network to capture change information more comprehensively, reducing the likelihood of missed detections. Consequently, both precision and recall are improved, leading to a higher F1_score. The Dense Block thus improves the model’s classification performance by refining feature representation and gradient flow, which enhances the accuracy of identifying changed and unchanged areas. However, it should be noted that the use of Dense Blocks also increased FPs, resulting in a slight reduction of 1.1% in recall_no_change on OSCD. On the CD_Data_GZ dataset, recall_change, precision_no_change, and recall_no_change decreased by 2.17%, 3.42%, and 1.11%, respectively.

Table 4 Evaluation results of architectures with and without dense block.

Evaluation of the LSTM block

In this section, we investigate the role of the LSTM block in computing the temporal relationship between the bi-temporal images. A comparative analysis of our approach with and without the LSTM block is presented in Table 5. On the OSCD dataset, incorporating LSTM blocks led to improvements in precision_change, recall_change, precision_no_change, F1_score, and Kappa of 0.47%, 9.54%, 0.47%, 4.16%, and 4.96%, respectively. On the CD_Data_GZ dataset, the precision_change, recall_change, F1_score, and Kappa improved by 6.74%, 19.87%, 7.84%, and 6.03%, respectively. The LSTM’s ability to capture long-term dependencies in data is particularly advantageous for change detection tasks, where changes are often not isolated but correlated with prior and subsequent data points in the time series. Through its gating mechanism, LSTM can effectively learn and utilize these long-term dependencies to identify change points more accurately, thereby enhancing precision. The gating mechanism also allows LSTM to flexibly adjust the retention of past information, which aids in identifying the historical data relevant to current changes, thus improving recall. Consequently, improvements in both precision and recall lead to enhanced F1_score and Kappa values. However, it should be noted that while the LSTM block enhances the model’s ability to detect changes, it also led to a slight decrease (0.6%) in recall_no_change on the OSCD dataset; on the CD_Data_GZ dataset, precision_no_change and recall_no_change decreased by 2.36% and 3.4%, respectively.

Table 5 Evaluation results of architectures with and without LSTM block.

Conclusion

This paper presents a novel learning framework for urban change detection that integrates a modified Dense Block and an LSTM block within a Bi-Unet architecture. Quantitative and qualitative analyses highlighted the challenges associated with urban change detection, including significant disparity errors, diverse types of changes, and substantial misalignment between changed and unchanged areas. The experimental results reveal that change detection is affected by numerous FPs caused by labeling errors, lighting variations, or other unrelated changes. By incorporating bi-temporal tasks, our model significantly improves on single-task change detection frameworks. Moreover, both the modified Dense Block and the LSTM block contribute to improved change detection through their more rigorous formulations. The Dense Block serves as an effective feature extractor that combines identity mapping, deep supervision, and diversified depths to facilitate feature reuse throughout the network, enabling more efficient learning and helping to reduce FP detections. The LSTM block, in turn, provides an efficient aggregation strategy for the skip connections by encoding information from multiple timestamps into more meaningful temporal feature vectors. The proposed method substantially reduces FPs and significantly improves the F1-score and Kappa, successfully identifying labeling errors in both examined datasets.

In future research, we aim to enhance generalization by employing multitask learning with multi-temporal images. Multitask learning54 can improve learning performance through data amplification, attribute selection, representation bias management, and the prevention of overfitting. We will integrate multi-temporal images into multiple tasks and learn several related tasks simultaneously. During learning, a shallow shared representation will facilitate knowledge sharing and complementarity among domains, thereby promoting mutual learning and enhancing generalization.

Determining the initial manifestation of urban change is highly valuable for change detection and can provide more comprehensive insights into the frequency of urbanization. In this study, we employed an LSTM network whose internal activations, encompassing both the intra-cell operations and the information stored within the cell states, can be investigated. As emphasized by Lyu et al.55, the cell state effectively preserves crucial information across consecutive time steps, thereby encapsulating the alterations in the current sequence image pair. With appropriate fine-tuning of our approach, it may be feasible to discern changes after each temporal increment and accurately track their precise onset.

Finally, we strive to optimize the preservation and shape fidelity of detected objects while minimizing false positives in close proximity to true positives.