Abstract
Semantic segmentation of high-resolution Unmanned Aerial Vehicle (UAV) remote sensing images plays a crucial role in environmental monitoring, urban planning, agricultural assessment, and disaster management. Deep-learning-based semantic segmentation methods have demonstrated superior performance; however, they rely on large amounts of annotated data, so their performance degrades significantly in small-sample scenarios. To improve performance on small-scale remote sensing semantic segmentation datasets, methods combining knowledge distillation and semi-supervised learning have been proposed. These methods use models pre-trained on large-scale natural image datasets (such as ImageNet) to directly guide the training of student models on target datasets, achieving significant performance gains. However, the feature distribution of natural image datasets differs substantially from that of remote sensing datasets. Student models directly guided by teacher models pre-trained on natural images therefore often fail to reach optimal performance, especially when few labeled samples are available in the target domain. Whether introducing a medium-scale remote sensing dataset as an intermediate domain between natural image datasets and the target remote sensing dataset can further improve model performance is a question worth exploring. This study proposes a few-shot remote sensing image semantic segmentation method that combines multi-stage knowledge distillation (MKD) and semi-supervised learning (SSL) to progressively bridge domain gaps and leverage unlabeled data. Experimental results on the Erhai UAV dataset (EH) show that the proposed MKD + SSL method achieves a mean IoU of 77.05% with only 880 labeled samples, outperforming the widely used single-stage KD method by +3.06% mIoU, with per-class IoU gains ranging from +2.17% to +5.21%.
On the Cityscapes benchmark, our framework further surpasses state-of-the-art methods such as UniMatch, achieving +1.5% and +1.4% mIoU improvements under the 1/16 and 1/8 labeled settings, respectively. These results demonstrate that the proposed method effectively enhances segmentation accuracy in few-shot settings and generalizes well across diverse datasets, indicating broad practical value.
Data availability
The model code and additional metadata can be accessed on GitHub (https://github.com/Mrjianghanlin/A-Few-Shot-High-Resolution-Remote-Sensing-Image-Semantic-Segmentation-Method3). The research utilizes three key datasets: (A) The UAV remote sensing data of the EH dataset can be accessed on Google Drive (https://drive.google.com/drive/folders/14YghGdWH-DzJy3sfTEQy0Tj3zhJtwd7Y?usp=sharing). (B) The high-resolution remote sensing images of the HW dataset are available on AI Studio (https://aistudio.baidu.com/datasetdetail/54302/0). (C) Urban scene images of the Cityscapes dataset are freely accessible on the official website (https://www.cityscapes-dataset.com/). If you are unable to access any of these datasets, please contact the corresponding author, Han-Lin Jiang (email: 15691552855@163.com), for assistance.
Funding
This study was supported by the National Natural Science Foundation of China (32260131, 31960119, 62262001), the Yunnan Young and Middle-aged Academic and Technical Leaders Reserve Talent Project in China (202405AC350023, 202205AC160001), and the Scientific Research Fund project of the Education Department of Yunnan Province of China (2025Y1250, 2024Y850).
Author information
Contributions
H.-L.J. and N.W. designed the study and wrote the manuscript. B.G. and Z.-K.L. performed the experiments. R.-H.W. analyzed the data. X.-W.L. and M.Z. wrote the manuscript. B.-H.C. and E.-M.Z. prepared figures. G.-P.R. supervised the project. X.-W.L., E.-M.Z., and D.-Q.Y. acquired funding. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jiang, HL., Wang, N., Geng, B. et al. A few-shot high-resolution remote sensing image semantic segmentation method. Sci Rep (2026). https://doi.org/10.1038/s41598-026-46887-y