Abstract
Hyperspectral images (HSIs) carry rich spatial and spectral information, and classification accuracy depends heavily on extracting discriminative spectral-spatial features from them. Convolutional neural networks (CNNs) have demonstrated remarkable performance in HSI classification, but increasing network depth can lead to performance degradation. Furthermore, their fixed scale and limited receptive field restrict their ability to capture long-range dependencies, which hinders feature learning and, in turn, generalization. This paper presents a novel HSI classification framework, MTSA-Net, which integrates a multiscale transformer with a spatial attention mechanism to obtain a more robust and flexible model. The framework first applies 3-D and 2-D convolution layers, followed by spatial attention that emphasizes the most informative spatial features. These enhanced features are then passed through multiscale transformer encoders to capture local and global representations, effectively modeling long-range dependencies. Finally, a feature fusion module combines the features extracted at the different scales into a more comprehensive representation for final classification. Extensive experiments on five widely used benchmark HSI datasets demonstrate that MTSA-Net outperforms state-of-the-art approaches, particularly with limited training samples. It achieves overall accuracies of 98.84%, 98.77%, 99.80%, 97.84%, and 95.87% on the Indian Pines, Pavia University, Salinas Valley, Houston-13, and Houston-18 datasets, respectively. The source code for this work will be accessible at https://github.com/irfan01000 for reproducibility.
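The data flow described in the abstract (spatial attention, then multiscale transformer encoding, then feature fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 3-D/2-D convolution stage is omitted, the attention gate uses two hypothetical scalar weights (`w_avg`, `w_max`) in place of a learned convolution, each "encoder" is a single-head self-attention with identity projections, the patch sizes `(1, 2, 4)` are arbitrary, and fusion is assumed to be plain concatenation. All function names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Gate each pixel of a (C, H, W) feature map by a value in (0, 1).

    The gate is computed from the channel-wise mean and max maps.
    Assumption: two scalar weights stand in for a learned convolution.
    """
    avg_map = feat.mean(axis=0)                        # (H, W)
    max_map = feat.max(axis=0)                         # (H, W)
    gate = sigmoid(w_avg * avg_map + w_max * max_map)  # (H, W)
    return feat * gate[None, :, :]

def tokens_at_scale(feat, patch):
    """Average-pool non-overlapping patch x patch windows into tokens (N, C)."""
    c, h, w = feat.shape
    assert h % patch == 0 and w % patch == 0, "patch must divide H and W"
    pooled = feat.reshape(c, h // patch, patch, w // patch, patch).mean(axis=(2, 4))
    return pooled.reshape(c, -1).T                     # (N, C)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention, identity projections."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)            # (N, N)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ tokens                            # (N, C)

def multiscale_features(feat, patches=(1, 2, 4)):
    """Attend spatially, encode at several scales, fuse by concatenation."""
    attended = spatial_attention(feat)
    per_scale = [
        self_attention(tokens_at_scale(attended, p)).mean(axis=0)  # (C,)
        for p in patches
    ]
    return np.concatenate(per_scale)                   # (C * len(patches),)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 8))   # stand-in for convolutional features
fused = multiscale_features(feat)
print(fused.shape)                      # (24,)
```

Small patches keep fine local detail while larger patches summarize wider neighborhoods, so the concatenated vector mixes local and global context before a classifier head would be applied.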
Data availability
The datasets analyzed during this research are available at:
https://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes
https://machinelearning.ee.uh.edu/2013-ieee-grss-data-fusion-contest/
https://machinelearning.ee.uh.edu/2018-ieee-grss-data-fusion-challenge-fusion-of-multispectral-lidar-and-hyperspectral-data/
Acknowledgements
This work was funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project, under grant No. (PNURSP2026R161), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors gratefully acknowledge Princess Nourah bint Abdulrahman University for its technical and financial support.
Author information
Contributions
Conceptualization, I.A. & G.F.; methodology, I.A. & F.H.; implementation, I.A. & F.H.; validation, I.A.; formal analysis, G.F. & A.K.; investigation, I.A. & A.K.; writing, I.A. & G.F.; result interpretation, I.A. & A.A.; review and editing, A.K., S.A.G., & A.A.; supervision and proofreading, S.A.G. & E.A.; and funding acquisition, S.A.G. & E.A. All authors reviewed and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ahmad, I., Farooque, G., Hadi, F. et al. A multiscale transformer with spatial attention for hyperspectral image classification. Sci Rep (2026). https://doi.org/10.1038/s41598-025-34756-z
DOI: https://doi.org/10.1038/s41598-025-34756-z


