Abstract
Viral evolution is rapidly becoming a global challenge for vaccine development, because new variants can often escape the immune response and reduce vaccine efficacy. Traditional genomic epidemiology relies on retrospective phylogenetic analysis, which can explain past mutations but cannot predict future evolutionary trends. To address these limitations, a new Refined Deep Evolutionary Learning Framework (R-DELF) is proposed that combines genomic, structural, and temporal information to predict viral mutations proactively and to assess vaccine suitability. The methodology uses an ESM-2 Transformer to extract structure-aware embeddings, which are merged with dual-attention Graph Neural Networks (GNNs) that learn phylogenetic and structural dependencies. An evolutionary learning maximiser improves adaptation modelling, and an Explainable AI layer offers interpretability through residue-level attribution. In experiments, the framework achieves 99.2% accuracy, 97.92% precision, 98.89% recall and a 99.4% F1 score, which is higher than current AI-based virology models. It is implemented in Python with TensorFlow, using genomic and protein data obtained from Kaggle. The framework enables high-risk mutations to be predicted in advance, supports timely vaccine production, and improves pandemic preparedness through data-driven predictions of viral evolution.
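For readers unfamiliar with the four evaluation metrics quoted in the abstract, the following minimal sketch shows how they are conventionally derived from binary confusion-matrix counts. It is an illustration under stated assumptions, not the authors' evaluation code: the `ConfusionCounts` container, the `classification_metrics` function, and the example counts are all hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 using the standard definitions."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only; these are NOT results from the study.
example = ConfusionCounts(tp=90, fp=10, fn=5, tn=95)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it always falls between the two values when all three are computed from the same confusion matrix.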
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. (DGSSR-2025-02-01427).
Author information
Contributions
O.S. conceived and designed the study and contributed to manuscript writing and revision; M.I. performed data analysis and contributed to manuscript preparation; A.A. contributed to data collection; F.A. participated in model validation and visualization of outcomes; Y.A. contributed to manuscript preparation; A.A. assisted with literature review and manuscript editing; and E.A. supported manuscript revision. All authors reviewed and approved the final version of the manuscript and agreed to be accountable for its content.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shahin, O.R., Ibrahim, M.N., Alanazi, A. et al. Predicting genetic evolution of viruses to identify suitable vaccines using artificial intelligence. Sci Rep (2026). https://doi.org/10.1038/s41598-026-35143-y
DOI: https://doi.org/10.1038/s41598-026-35143-y