Abstract
Cancer, a devastating disease that affects people worldwide, is driven by somatic mutations. Classifying gene expression data is essential for diagnosing disease and distinguishing tumor types. However, small sample sizes, very large numbers of features, and noise make this task challenging, particularly when feature selection must be performed on high-dimensional microarray data. Selecting the most relevant and informative genes from microarray data is critical for identifying prospective biomarkers and for gaining insight into the fundamental mechanisms of cancer. This study introduces a novel hybrid model that combines feature selection and classification to identify the most significant and informative features from microarray data associated with brain cancer. The research uses the GSE50161 dataset obtained from the Curated Microarray Database (CuMiDa), which comprises 130 samples in five classes, with 54,676 genes (probes) measured per sample. We first applied minimum redundancy maximum relevance (mRMR) filtering to reduce dimensionality by removing redundant features, and then used Harris Hawks Optimization (HHO) to refine the feature subset for classification performance. To improve the classification of the brain cancer microarray data, we used three metaheuristic algorithms, Differential Evolution (DE), HHO, and Particle Swarm Optimization (PSO), to optimize the hyperparameters “C” and “sigma” of the support vector machine (SVM). The experimental results indicate that the proposed framework improves the ability to differentiate between benign and malignant tissues while reducing computation time and dimensionality. Furthermore, the genes selected from the brain cancer dataset were interpreted biologically. The interpretation is consistent with the findings of related studies, and the selected genes are relevant to patients’ prognoses.
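The filter stage and the SVM kernel width mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how relevance and redundancy are measured, so absolute Pearson correlation is used here as a simple stand-in for the mutual information that mRMR usually relies on. The Gaussian kernel form with width “sigma” is also an assumption. The names `pearson`, `mrmr_select`, `rbf_kernel`, `genes`, and `labels` are hypothetical, and the HHO wrapper and the DE/HHO/PSO tuning of “C” and “sigma” are not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mrmr_select(columns, target, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy.

    Assumption: |Pearson r| replaces mutual information for both terms.
    `columns` is a list of per-gene expression vectors, one value per sample.
    """
    relevance = [abs(pearson(col, target)) for col in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]  # first pick: relevance only
            redundancy = sum(abs(pearson(columns[j], columns[s]))
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

def rbf_kernel(u, v, sigma):
    """Gaussian kernel exp(-||u - v||^2 / (2 * sigma^2)); the form is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Toy data (hypothetical): six samples, two classes, three "genes".
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [1, 2, 1, 5, 6, 5],  # gene 0: strongly related to the label
    [1, 3, 1, 5, 7, 5],  # gene 1: also relevant, but nearly a copy of gene 0
    [0, 1, 0, 1, 0, 1],  # gene 2: weakly relevant, little overlap with gene 0
]

print(mrmr_select(genes, labels, k=2))                # [0, 2]
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0))  # exp(-1), about 0.3679
```

Ranking by relevance alone would keep genes 0 and 1. The redundancy penalty replaces gene 1 with gene 2 because gene 1 adds little beyond gene 0. Under the kernel form assumed here, “sigma” corresponds to gamma = 1 / (2 * sigma^2) in libraries that parameterize the RBF kernel as exp(-gamma * ||u - v||^2).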
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors acknowledge the Princess Nourah bint Abdulrahman University Researchers Supporting Project, number PNURSP2026R435, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Funding
This work was supported by the Princess Nourah bint Abdulrahman University Researchers Supporting Project, number PNURSP2026R435, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Author information
Contributions
The authors confirm their contributions to the paper as follows: study conception and design: HANAA FATHI, ARAR ALTAWIL, AYDA K. KELANY, IBRAHIM I. M. MANHRAWY; data collection: ARAR ALTAWIL, AYDA K. KELANY; analysis and interpretation of results: HANAA FATHI, ARAR ALTAWIL, AYDA K. KELANY, IBRAHIM I. M. MANHRAWY; draft manuscript preparation: DEEMA M. ALSEKAIT, HANAA FATHI, ARAR ALTAWIL, AYDA K. KELANY, IBRAHIM I. M. MANHRAWY. All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Manhrawy, I.I.M., Fathi, H., Alsekait, D.M. et al. Hybrid feature selection and classification model using high-dimensional data based on a metaheuristic algorithm for brain cancer diagnosis. Sci Rep (2026). https://doi.org/10.1038/s41598-026-41573-5
DOI: https://doi.org/10.1038/s41598-026-41573-5