Fig. 1

Structural diagram of the modified Pyramid Vision Transformer (PVT-v2) encoder. The architecture processes the input through overlapping patch embedding, hierarchical transformer blocks, and MHSA mechanisms. The figure highlights the four progressive stages (f p 1 to f p 4) with varying feature dimensions, which enables the model to extract hierarchical representations for precise change detection. The satellite image on the left side is from the Levir-MCI dataset66.