Scientific Reports
Predicting genetic evolution of viruses to identify suitable vaccines using artificial intelligence
  • Article
  • Open access
  • Published: 03 February 2026

  • Osama R. Shahin1,
  • Mohamed N. Ibrahim2,
  • Awadh Alanazi3,
  • Fahd S. Alharithi4,
  • Yasir Alruwaili3,5,
  • Ahmad A. Alzahrani6 &
  • Eman Fawzy El Azab2 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Evolution

Abstract

Viral evolution is a growing global challenge for vaccine development, because new variants can often escape the immune system and reduce vaccine efficacy. Traditional genomic epidemiology relies on retrospective phylogenetic analysis, which can explain past mutations but cannot predict future evolutionary trends. To address these limitations, a new Refined Deep Evolutionary Learning Framework (R-DELF) is proposed that combines genomic, structural, and temporal information to predict viral mutations proactively and to assess vaccine suitability. The method uses an ESM-2 Transformer to extract structure-aware embeddings, which are merged with dual-attention Graph Neural Networks (GNNs) that learn phylogenetic and structural dependencies. An evolutionary learning maximiser improves adaptation modelling, and an Explainable AI layer provides interpretability through residue-level attribution. In experiments, the framework achieves 99.2% accuracy, 97.92% precision, 98.89% recall and a 99.4% F1 score, which is higher than current AI-based virology models. It is implemented in Python with TensorFlow, using genomic and protein data obtained from Kaggle. The framework enables high-risk mutations to be predicted in advance, supports timely vaccine production, and improves pandemic preparedness through data-driven predictions of viral evolution.
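
As a quick consistency check on the metrics above, the F1 score is conventionally defined as the harmonic mean of precision and recall. The sketch below is not the authors' evaluation code. It only applies that standard definition to the reported precision and recall, and the function name `f1_from_precision_recall` is hypothetical. Under this definition the two reported values imply an F1 of roughly 98.4%, so the reported 99.4% may reflect a different averaging scheme or evaluation split, which the abstract does not specify.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
reported_precision = 0.9792
reported_recall = 0.9889

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 implied by reported precision and recall: {f1:.4f}")
```

Because the harmonic mean always lies between its two inputs, any F1 computed from a single precision/recall pair cannot exceed the larger of the two.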

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. (DGSSR-2025-02-01427).

Author information

Authors and Affiliations

  1. Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia

    Osama R. Shahin

  2. Department of Clinical Laboratories Sciences, College of Applied Medical Sciences at Al Qurayyat, Jouf University, Al Qurayyat, 77454, Saudi Arabia

    Mohamed N. Ibrahim & Eman Fawzy El Azab

  3. Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Jouf University, Sakaka, Saudi Arabia

    Awadh Alanazi & Yasir Alruwaili

  4. Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia

    Fahd S. Alharithi

  5. Center for Health Research and Innovations, Deanship of Graduate Studies and Scientific Research, Jouf University, Sakaka, Saudi Arabia

    Yasir Alruwaili

  6. Department of Computer Science and Artificial Intelligence, College of Computing, Umm-AlQura University, P.O.Box 8XH2+XVP, Mecca, 24382, Saudi Arabia

    Ahmad A. Alzahrani

Contributions

O.S. conceived and designed the study and contributed to manuscript writing and revision; M.I. performed data analysis and contributed to manuscript preparation; A.A. contributed to data collection; F.A. participated in model validation and visualization of outcomes; Y.A. contributed to manuscript preparation; A.A. assisted with literature review and manuscript editing; and E.A. supported manuscript revision. All authors reviewed and approved the final version of the manuscript and agreed to be accountable for its content.

Corresponding author

Correspondence to Osama R. Shahin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shahin, O.R., Ibrahim, M.N., Alanazi, A. et al. Predicting genetic evolution of viruses to identify suitable vaccines using artificial intelligence. Sci Rep (2026). https://doi.org/10.1038/s41598-026-35143-y

  • Received: 08 November 2025

  • Accepted: 02 January 2026

  • Published: 03 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-35143-y

Keywords

  • Artificial intelligence
  • Genetic prediction
  • Genomic analysis machine learning
  • Vaccine development
  • Viral evolution

Scientific Reports (Sci Rep)

ISSN 2045-2322 (online)
