Abstract
Accurate brain tumor classification from magnetic resonance imaging (MRI) is critical for early diagnosis and effective clinical decision-making. Although recent CNN–Transformer hybrid models have shown promising performance, most approaches rely primarily on single-modal spatial information, limiting their ability to capture complementary spectral features, model tumor heterogeneity, and generalize across datasets. To address these challenges, this paper proposes MM-FD-ConvFormer, a multimodal frequency-aware deformable CNN–Transformer network for robust brain tumor classification with enhanced interpretability. The proposed model integrates three complementary modalities: (1) spatial MRI representations from original images, (2) frequency-domain MRI representations obtained via Fourier or wavelet transforms to capture texture and intensity variations, and (3) multi-scale contextual features for modeling global dependencies. A ConvNeXt V2 backbone is employed to extract discriminative spatial features, while a parallel lightweight ConvNeXt-based branch processes frequency-domain inputs. These features are subsequently fused and refined using a Swin Transformer V2 to capture long-range contextual relationships. To effectively integrate heterogeneous modalities and adapt to irregular tumor boundaries, a deformable cross-modal attention mechanism is introduced, enabling dynamic and shape-aware feature fusion. Final classification is performed on a unified multimodal representation, with an optional uncertainty-aware prediction head to improve reliability. The proposed model is evaluated using multiple public datasets, including the Kaggle Brain Tumor MRI and Figshare datasets for training, with external validation on the clinically relevant BraTS 2020/2021 dataset and optional testing on TCIA/REMBRANDT to assess cross-dataset generalization. Extensive experiments demonstrate that MM-FD-ConvFormer consistently outperforms standard CNN baselines, advanced transformer-based models, and hybrid approaches in terms of accuracy, macro-F1 score, and AUC. Furthermore, qualitative analyses using Grad-CAM, attention map visualization, and weakly supervised pseudo-segmentation provide interpretable insights into tumor localization and model decision-making. Overall, MM-FD-ConvFormer offers a robust, interpretable, and generalizable solution for automated brain tumor classification in real-world clinical imaging applications.
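To make the two-branch pipeline concrete, the following is a minimal PyTorch sketch of the design described above: a spatial branch, a lightweight frequency branch fed with the log-magnitude Fourier spectrum, and a cross-modal attention fusion stage followed by a classification head. All module names, channel sizes, and the dense (non-deformable) attention are illustrative assumptions, not the authors' implementation; in the full model, ConvNeXt V2 and Swin Transformer V2 backbones would replace the small stand-in encoders.

```python
import torch
import torch.nn as nn

class SmallEncoder(nn.Module):
    """Stand-in CNN encoder (a ConvNeXt V2 backbone would be used in practice)."""
    def __init__(self, in_ch=1, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, kernel_size=4, stride=4),  # patchify stem
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.GELU(),
        )
    def forward(self, x):
        return self.net(x)  # (B, dim, H/4, W/4)

class CrossModalFusion(nn.Module):
    """Simplified cross-attention between spatial and frequency tokens;
    a dense stand-in for the paper's deformable cross-modal attention."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
    def forward(self, spatial_tokens, freq_tokens):
        # Spatial tokens query the frequency tokens; residual + norm.
        fused, _ = self.attn(spatial_tokens, freq_tokens, freq_tokens)
        return self.norm(spatial_tokens + fused)

class MMFDConvFormerSketch(nn.Module):
    def __init__(self, num_classes=4, dim=64):
        super().__init__()
        self.spatial_branch = SmallEncoder(1, dim)
        self.freq_branch = SmallEncoder(1, dim)  # lightweight frequency branch
        self.fusion = CrossModalFusion(dim)
        self.head = nn.Linear(dim, num_classes)
    def forward(self, x):
        # Frequency-domain input: log-magnitude of the centered 2-D FFT.
        freq = torch.fft.fft2(x)
        freq = torch.log1p(torch.abs(torch.fft.fftshift(freq)))
        s = self.spatial_branch(x).flatten(2).transpose(1, 2)     # (B, N, dim)
        f = self.freq_branch(freq).flatten(2).transpose(1, 2)     # (B, N, dim)
        fused = self.fusion(s, f)
        return self.head(fused.mean(dim=1))  # global pool -> class logits

model = MMFDConvFormerSketch(num_classes=4)  # e.g. glioma/meningioma/pituitary/no-tumor
logits = model(torch.randn(2, 1, 224, 224))
print(logits.shape)  # torch.Size([2, 4])
```

Note that the deformable attention proposed in the paper would learn sampling offsets concentrated around irregular tumor boundaries rather than attending densely to all frequency tokens as the `nn.MultiheadAttention` stand-in does here.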
Data availability
The datasets used in this study are publicly available and can be accessed via References 33 to 37.
Abbreviations
- AUC: Area Under the Curve
- ANOVA: Analysis of Variance
- BraTS: Brain Tumor Segmentation Dataset
- CAM: Class Activation Mapping
- CNN: Convolutional Neural Network
- ConvNeXt: Convolutional Next Architecture
- DNN: Deep Neural Network
- F1-score: Harmonic Mean of Precision and Recall
- FD: Frequency Domain
- GAN: Generative Adversarial Network
- GNN: Graph Neural Network
- Grad-CAM: Gradient-weighted Class Activation Mapping
- IoU: Intersection over Union
- K-fold: K-Fold Cross Validation
- Macro-F1: Macro-Averaged F1 Score
- LSTM: Long Short-Term Memory
- MRI: Magnetic Resonance Imaging
- MM-FD-ConvFormer: Multimodal Frequency-Domain Convolutional Transformer
- MODIS: Moderate Resolution Imaging Spectroradiometer
- NIH: National Institutes of Health
- ROC: Receiver Operating Characteristic
- SD: Standard Deviation
- SHAP: SHapley Additive exPlanations
- SOTA: State of the Art
- Swin: Shifted Window Transformer
- TCIA: The Cancer Imaging Archive
- TPR: True Positive Rate
- U-Net: U-shaped Neural Network
- ViT: Vision Transformer
- XAI: Explainable Artificial Intelligence
References
Li, Z., Wan, Y., Xu, L., Su, W., Wang, P. & Zhou, X. A diffusion model and few-shot learning framework for EEG abnormality classification and cerebral infarction assessment. IEEE Trans. Comput. Soc. Syst. https://doi.org/10.1109/TCSS.2026.3665229.
Jin, L. et al. Frequency-aware spatial–temporal attention explainable network for EEG decoding. IEEE J. Biomed. Health Inform. 29 (10), 7175–7185. https://doi.org/10.1109/JBHI.2025.3576088 (2025).
Jiang, M. et al. Frequency-aware diffusion model for multi-modal MRI image synthesis. J. Imaging. 11 (5), 152 (2025).
Ke, L. et al. Brain tumor classification from MRI images using a multi-scale channel attention CNN integrated with SVM. Sci. Rep. 16, 6297. https://doi.org/10.1038/s41598-026-36164-3 (2026).
Shinde, R. U., Sangolagi, V. A., Patil, M. B., Mhetre, V. & Kulkarni, S. Scalable and robust CNN models for brain tumor detection in healthcare applications. Proc. Copyr. 346, 353 (2024).
Sahoo, J. R., Nanda, S. K. & Panda, G. Brain tumor detection using transformer-based EfficientB0Net. SN Comput. Sci. 7 (1), 80 (2026).
Anand, V., Khajuria, A., Pachauri, R. K. & Gupta, V. Multi-class classification of brain tumors using optimized CNN and transfer learning techniques. Sci. Rep. 16, 4709. https://doi.org/10.1038/s41598-025-34806-6 (2026).
Tang, Z., Liao, X., Liao, B., Shen, C. & Zhang, Y. MRI brain tumor classification using RFENet three-branch model with SwishReLU. Biomed. Signal. Process. Control. 113, 108893 (2026).
Pacal, I. & Banerjee, T. Towards accurate and interpretable brain tumor diagnosis: T-FSPANNet with tri-attribute and pyramidal attention-based feature fusion. Biomed. Signal. Process. Control. 113, 108852 (2026).
Balamurugan, A. G., Srinivasan, S., Monica, P., Mathivanan, S. K. & Shah, M. A. Robust brain tumor classification by fusion of deep learning and channel-wise attention mode approach. BMC Med. Imaging. 24 (1), 147 (2024).
Prasad, A. Y., Tanaka, K., Krishnamoorthy, R. & Thiagarajan, R. Robust brain tumor detection and classification from multichannel MRI using deep learning. Dev. Neurobiol. 85 (3), e22991 (2025).
Nassar, S. E., Yasser, I., Amer, H. M. & Mohamed, M. A. A robust MRI-based brain tumor classification via a hybrid deep learning technique. J. Supercomput. 80 (2), 2403–2427 (2024).
Ganesh, S., Kannadhasan, S. & Jayachandran, A. Multi-class robust brain tumor with hybrid classification using DTA algorithm. Heliyon 10 (1). https://doi.org/10.1016/j.heliyon.2023.e23610 (2024).
Zhang, J. et al. EFF_D_SVM: a robust multi-type brain tumor classification system. Front. Neurosci. 17, 1269100 (2023).
Shah, H. A. et al. A robust approach for brain tumor detection in magnetic resonance images using finetuned EfficientNet. IEEE Access. 10, 65426–65438 (2022).
Hasan, N., Ahmed, M. F., Nasif, M. A., Haq, M. R. & Rahman, M. Hybrid feature extraction approach for robust brain tumor classification: HOG, GLCM, and artificial neural network. In: Proc 6th Int Conf Electrical Engineering and Information & Communication Technology (ICEEICT). IEEE, pp 1292–1297 (2024).
Babu Vimala, B. et al. Detection and classification of brain tumor using hybrid deep learning models. Sci. Rep. 13 (1), 23029 (2023).
Agarwal, M. et al. Deep learning for enhanced brain tumor detection and classification. Results Eng. 22, 102117 (2024).
Sharif, M. I., Khan, M. A., Alhussein, M., Aurangzeb, K. & Raza, M. A decision support system for multimodal brain tumor classification using deep learning. Complex. Intell. Syst. 8 (4), 3007–3020 (2022).
Lerousseau, M., Deutsch, E. & Paragios, N. Multimodal brain tumor classification. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2020 (eds Crimi, A. & Bakas, S.), Lecture Notes in Computer Science, vol. 12659 (Springer, Cham). https://doi.org/10.1007/978-3-030-72087-2_42 (2021).
Usha, M. P., Kannan, G. & Ramamoorthy, M. Multimodal brain tumor classification using convolutional Tumnet architecture. Behav. Neurol. 2024, 4678554 (2024).
Rohini, A. et al. Multimodal hybrid convolutional neural network based brain tumor grade classification. BMC Bioinform. 24 (1), 382 (2023).
Ullah, M. S. et al. BrainNet: a fusion assisted novel optimal framework of residual blocks and stacked autoencoders for multimodal brain tumor classification. Sci. Rep. 14 (1), 5895 (2024).
Razzaghi, P., Abbasi, K., Shirazi, M. & Rashidi, S. Multimodal brain tumor detection using multimodal deep transfer learning. Appl. Soft Comput. 129, 109631 (2022).
Khan, M. A. et al. Multimodal brain tumor detection and classification using deep saliency map and improved dragonfly optimization algorithm. Int. J. Imaging Syst. Technol. 33 (2), 572–587 (2023).
Lilhore, U. K. et al. AG-MS3D-CNN: multiscale attention-guided 3D convolutional neural network for robust brain tumor segmentation across MRI protocols. Sci. Rep. 15 (1), 24306 (2025).
Lilhore, U. K., Simaiya, S., Prasad, D. & Guleria, K. A hybrid tumour detection and classification based on machine learning. J. Comput. Theor. Nanosci. 17 (6), 2539–2544 (2020).
Dalal, S. et al. An efficient brain tumor segmentation method based on adaptive moving self-organizing map and fuzzy K-mean clustering. Sensors 23 (18), 7816 (2023).
Simaiya, S., Lilhore, U. K., Walia, R., Chauhan, S. & Vajpayee, A. An efficient brain tumour detection from MR images based on deep learning and transfer learning model. In: Proc Int Conf IoT, Communication and Automation Technology (ICICAT). IEEE, pp 1–5 (2023).
Sayah, A. et al. Enhancing the REMBRANDT MRI collection with expert segmentation labels and quantitative radiomic features. Sci. Data. 9 (1), 338 (2022).
Henry, T. et al. Brain tumor segmentation with self-ensembled, deeply supervised 3D U-Net neural networks: a BraTS 2020 challenge solution. In MICCAI Brainlesion Workshop, 327–339 (Springer, 2020). arXiv:2011.01045.
Kaggle Brain Tumor MRI Dataset. Available online: https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset (2023).
Figshare Brain Tumor Dataset. Available online: https://figshare.com/articles/dataset/brain_tumor_dataset/1512427 (2015).
BraTS 2021 Dataset. The Cancer Imaging Archive. Available online: https://www.cancerimagingarchive.net/analysis-result/rsna-asnr-miccai-brats-2021/ (2021).
TCIA REMBRANDT Dataset. The Cancer Imaging Archive. Available online: https://www.cancerimagingarchive.net/collection/rembrandt/ (2022).
Acknowledgements
The authors would like to acknowledge the Deanship of Graduate Studies and Scientific Research, Taif University for funding this work.
Author information
Authors and Affiliations
Contributions
Anto Lourdu Xavier Raj Arockia Selvarathinam contributed to conceptualization, methodology design, model development, experimental implementation, and manuscript drafting. Umesh Kumar Lilhore contributed to conceptualization, supervision, formal analysis, results interpretation, and critical revision of the manuscript. Roobaea Alroobaea contributed to data curation, experimental validation, and performance analysis. Majed Alsafyani contributed to dataset preparation, experimental support, and result verification. Abdullah M. Baqasah contributed to literature review, comparative analysis, and manuscript editing. Sultan Algarni contributed to statistical analysis, result visualization, and discussion refinement. MD Monish Khan contributed to supervision, project administration, funding acquisition, and final manuscript approval.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Arockia Selvarathinam, A.X., Lilhore, U.K., Alroobaea, R. et al. MM-FD-ConvFormer: multimodal frequency-aware deformable CNN–Transformer network for robust brain tumor classification. Sci. Rep. (2026). https://doi.org/10.1038/s41598-026-43616-3
DOI: https://doi.org/10.1038/s41598-026-43616-3


