A unified evolution-driven deep learning framework for virus variation driver prediction

Abstract

The increasing frequency of emerging viral infections necessitates a rapid human response, highlighting the cost-effectiveness of computational methods. However, existing computational approaches are limited by their input forms or incomplete functionalities, preventing a unified prediction of diverse virus variation drivers and hindering in-depth applications. To address this issue, we propose a unified evolution-driven framework for predicting virus variation drivers, named Evolution-driven Virus Variation Driver prediction (E2VD), which is guided by virus evolutionary traits. With evolution-inspired design, E2VD comprehensively and significantly outperforms state-of-the-art methods across various virus mutational driver prediction tasks. Moreover, E2VD effectively captures the fundamental patterns of virus evolution. It not only distinguishes different types of mutations but also accurately identifies rare beneficial mutations that are critical for viruses to survive, while maintaining generalization capabilities across different lineages of SARS-CoV-2 and different types of viruses. Importantly, with predicted biological drivers, E2VD perceives virus evolutionary trends in which potential high-risk mutation sites are accurately recommended. Overall, E2VD represents a unified, structure-free and interpretable approach for analysing and predicting viral evolutionary fitness, providing an ideal alternative to costly wet-lab measurements to accelerate responses to emerging viral infections.

Fig. 1: The motivation and methodology of E2VD.
Fig. 2: Model architecture and prediction tasks.
Fig. 3: Prediction performance of E2VD.
Fig. 4: Ablation studies.
Fig. 5: Generalization performance evaluation.
Fig. 6: The pipeline and results of evolutionary trend prediction.

Data availability

UniRef90 for PLM pretraining can be downloaded from https://www.uniprot.org/. The raw DMS datasets of binding affinity, expression and antibody escape prediction tasks of SARS-CoV-2 are publicly available from previous studies (refs. 12–14). The raw DMS datasets of the mutational effect prediction tasks of influenza virus, Zika virus and HIV are also publicly available from previous studies (refs. 16–18). The processed datasets are publicly available via GitHub at https://github.com/ZhiweiNiepku/E2VD (ref. 76) and figshare at https://figshare.com/projects/E2VD/206608 (ref. 77). Source data are provided with this paper.

Code availability

Relevant code and models are available via GitHub at https://github.com/ZhiweiNiepku/E2VD (ref. 76) and figshare at https://figshare.com/projects/E2VD/206608 (ref. 77).

References

  1. Kuiken, T., Fouchier, R., Rimmelzwaan, G. & Osterhaus, A. Emerging viral infections in a rapidly changing world. Curr. Opin. Biotechnol. 14, 641–646 (2003).

  2. Luo, G. G. & Gao, S.-J. Global health concerns stirred by emerging viral infections. J. Med. Virol. 92, 399 (2020).

  3. Prokunina-Olsson, L. et al. Covid-19 and emerging viral infections: the case for interferon lambda. J. Exp. Med. 217, e20200653 (2020).

  4. Jackson, C. B., Farzan, M., Chen, B. & Choe, H. Mechanisms of SARS-CoV-2 entry into cells. Nat. Rev. Mol. Cell Biol. 23, 3–20 (2022).

  5. Shang, J. et al. Structural basis of receptor recognition by SARS-CoV-2. Nature 581, 221–224 (2020).

  6. Walls, A. C. et al. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 181, 281–292 (2020).

  7. Krammer, F. SARS-CoV-2 vaccines in development. Nature 586, 516–527 (2020).

  8. Dong, Y. et al. A systematic review of SARS-CoV-2 vaccine candidates. Signal Transduct. Target. Ther. 5, 237 (2020).

  9. Tao, K. et al. The biological and clinical significance of emerging SARS-CoV-2 variants. Nat. Rev. Genet. 22, 757–773 (2021).

  10. Graham, R. L. & Baric, R. S. Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission. J. Virol. 84, 3134–3146 (2010).

  11. Yang, S. et al. Fast evolution of SARS-CoV-2 BA.2.86 to JN.1 under heavy immune pressure. Lancet Infect. Dis. 24, e70–e72 (2024).

  12. Starr, T. N. et al. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell 182, 1295–1310 (2020).

  13. Starr, T. N. et al. Deep mutational scans for ACE2 binding, RBD expression, and antibody escape in the SARS-CoV-2 Omicron BA.1 and BA.2 receptor-binding domains. PLoS Pathog. 18, e1010951 (2022).

  14. Cao, Y. et al. Imprinted SARS-CoV-2 humoral immunity induces convergent Omicron RBD evolution. Nature 614, 521–529 (2023).

  15. Yisimayi, A. et al. Repeated Omicron exposures override ancestral SARS-CoV-2 immune imprinting. Nature 625, 148–156 (2024).

  16. Lei, R. et al. Mutational fitness landscape of human influenza H3N2 neuraminidase. Cell Rep. 42, 111951 (2023).

  17. Sourisseau, M. et al. Deep mutational scanning comprehensively maps how zika envelope protein mutations affect viral growth and antibody escape. J. Virol. 93, 10–1128 (2019).

  18. Haddox, H. K., Dingens, A. S., Hilton, S. K., Overbaugh, J. & Bloom, J. D. Mapping mutational effects along the evolutionary landscape of HIV envelope. eLife 7, e34420 (2018).

  19. Wu, N. C. et al. High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution. Sci. Rep. 4, 4942 (2014).

  20. Doud, M. B. & Bloom, J. D. Accurate measurement of the effects of all amino-acid mutations on influenza hemagglutinin. Viruses 8, 155 (2016).

  21. Soh, Y. S., Moncla, L. H., Eguia, R., Bedford, T. & Bloom, J. D. Comprehensive mapping of adaptation of the avian influenza polymerase protein pb2 to humans. eLife 8, e45079 (2019).

  22. Wang, G. et al. Deep-learning-enabled protein–protein interaction analysis for prediction of SARS-CoV-2 infectivity and variant evolution. Nat. Med. 29, 2007–2018 (2023).

  23. Han, W. et al. Predicting the antigenic evolution of SARS-CoV-2 with deep learning. Nat. Commun. 14, 3478 (2023).

  24. Thadani, N. N. et al. Learning from prepandemic data to forecast viral escape. Nature 622, 818–825 (2023).

  25. Chen, C. et al. Computational prediction of the effect of amino acid changes on the binding affinity between SARS-CoV-2 spike RBD and human ACE2. Proc. Natl Acad. Sci. USA 118, e2106480118 (2021).

  26. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  27. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  28. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

  29. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).

  30. Loewe, L. & Hill, W. G. The population genetics of mutations: good, bad and indifferent. Philos. Trans. R. Soc. Lond. B Biol. Sci. 365, 1153–1167 (2010).

  31. Duffy, S. Why are RNA virus mutation rates so damn high? PLoS Biol. 16, e3000003 (2018).

  32. Pak, M. A. et al. Using AlphaFold to predict the impact of single mutations on protein stability and function. PLoS ONE 18, e0282689 (2023).

  33. Yin, R., Feng, B. Y., Varshney, A. & Pierce, B. G. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci. 31, e4379 (2022).

  34. Stevens, A. O. & He, Y. Benchmarking the accuracy of AlphaFold 2 in loop structure prediction. Biomolecules 12, 985 (2022).

  35. McDonald, E. F., Jones, T., Plate, L., Meiler, J. & Gulsevin, A. Benchmarking AlphaFold2 on peptide structure prediction. Structure 31, 111–119 (2023).

  36. Hie, B. L., Yang, K. K. & Kim, P. S. Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins. Cell Syst. 13, 274–285 (2022).

  37. Hie, B., Zhong, E. D., Berger, B. & Bryson, B. Learning the language of viral evolution and escape. Science 371, 284–288 (2021).

  38. Swanson, K., Chang, H. & Zou, J. Predicting immune escape with pretrained protein language model embeddings. In Proc. 17th Machine Learning in Computational Biology Meeting (eds Knowles, D. A. et al.) 110–130 (PMLR, 2022).

  39. Chen, J. et al. Running ahead of evolution—AI-based simulation for predicting future high-risk SARS-CoV-2 variants. Int. J. High. Perform. Comput. Appl. 37, 650–665 (2023).

  40. Stefanini, M., Lovino, M., Cucchiara, R. & Ficarra, E. Predicting gene and protein expression levels from DNA and protein sequences with perceiver. Comput. Methods Programs Biomed. 234, 107504 (2023).

  41. Martiny, H.-M., Armenteros, J. J. A., Johansen, A. R., Salomon, J. & Nielsen, H. Deep protein representations enable recombinant protein expression prediction. Comput. Biol. Chem. 95, 107596 (2021).

  42. Wang, E. Prediction of antibody binding to SARS-CoV-2 RBDs. Bioinforma. Adv. 3, vbac103 (2023).

  43. Baer, C. F. Does mutation rate depend on itself? PLoS Biol. 6, e52 (2008).

  44. Hie, B. L. et al. Efficient evolution of human antibodies from general protein language models. Nat. Biotechnol. 42, 275–283 (2024).

  45. Zhang, Y. & Yang, Q. A survey on multi-task learning. IEEE Trans. Knowl. Data Eng. 34, 5586–5609 (2021).

  46. Elnaggar, A. et al. Prottrans: toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127 (2021).

  47. Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. Adv. Neural Inf. Process. Syst. 34, 29287–29303 (2021).

  48. Unsal, S. et al. Learning functional properties of proteins with language models. Nat. Mach. Intell. 4, 227–245 (2022).

  49. Heinzinger, M. et al. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinform. 20, 1–17 (2019).

  50. Gu, J. et al. Recent advances in convolutional neural networks. Pattern Recognit. 77, 354–377 (2018).

  51. Huang, C., Talbott, W., Jaitly, N. & Susskind, J. M. Efficient representation learning via adaptive context pooling. In Proc. 39th International Conference on Machine Learning (eds Niu, G. et al.) 9346–9355 (PMLR, 2022).

  52. Bork, P. & Koonin, E. V. Protein sequence motifs. Curr. Opin. Struct. Biol. 6, 366–376 (1996).

  53. Bailey, T. L., Williams, N., Misleh, C. & Li, W. W. Meme: discovering and analyzing dna and protein sequence motifs. Nucleic Acids Res. 34, W369–W373 (2006).

  54. Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. In Proc. IEEE International Conference on Computer Vision (ed Mortensen, E.) 2999–3007 (IEEE, 2017).

  55. Cao, Y. et al. Omicron escapes the majority of existing SARS-CoV-2 neutralizing antibodies. Nature 602, 657–663 (2022).

  56. Ma, W., Fu, H., Jian, F., Cao, Y. & Li, M. Immune evasion and ACE2 binding affinity contribute to SARS-CoV-2 evolution. Nat. Ecol. Evol. 7, 1457–1466 (2023).

  57. Markov, P. V. et al. The evolution of SARS-CoV-2. Nat. Rev. Microbiol. 21, 361–379 (2023).

  58. Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–739 (2010).

  59. Chan, T. M. & Pătraşcu, M. Counting inversions, offline orthogonal range counting, and related problems. In Proc. 21st Annual ACM-SIAM Symposium on Discrete Algorithms (ed. Charikar, M.) 161–173 (SIAM, 2010).

  60. Ajtai, M., Jayram, T., Kumar, R. & Sivakumar, D. Approximate counting of inversions in a data stream. In Proc. 34th Annual ACM Symposium on the Theory of Computing 370–379 (ACM, 2002).

  61. Hendrycks, D. et al. The many faces of robustness: a critical analysis of out-of-distribution generalization. In Proc. IEEE/CVF International Conference on Computer Vision 8340–8349 (IEEE, 2021).

  62. Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807 (2014).

  63. Yue, C. et al. ACE2 binding and antibody evasion in enhanced transmissibility of XBB.1.5. Lancet Infect. Dis. 23, 278–280 (2023).

  64. Parums, D. V. The XBB.1.5 (‘kraken’) subvariant of Omicron SARS-CoV-2 and its rapid global spread. Med. Sci. Monit. 29, e939580–1 (2023).

  65. Farahani, A., Voghoei, S., Rasheed, K. & Arabnia, H. R. A brief review of domain adaptation. In Proc. ICDATA 2020 and IKE 2020 (eds Stahlbock, R. et al.) 877–894 (Springer Nature, 2021).

  66. Moulana, A. et al. The landscape of antibody binding affinity in SARS-CoV-2 Omicron BA.1 evolution. eLife 12, e83442 (2023).

  67. Dejnirattisai, W. et al. SARS-CoV-2 Omicron-B.1.1.529 leads to widespread escape from neutralizing antibody responses. Cell 185, 467–484 (2022).

  68. Starr, T. N. et al. SARS-CoV-2 RBD antibodies that maximize breadth and resistance to escape. Nature 597, 97–102 (2021).

  69. Hong, Q. et al. Molecular basis of receptor binding and antibody neutralization of Omicron. Nature 604, 546–552 (2022).

  70. Meng, B. et al. Altered TMPRSS2 usage by SARS-CoV-2 Omicron impacts infectivity and fusogenicity. Nature 603, 706–714 (2022).

  71. Cameroni, E. et al. Broadly neutralizing antibodies overcome SARS-CoV-2 Omicron antigenic shift. Nature 602, 664–670 (2022).

  72. Triveri, A. et al. SARS-CoV-2 spike protein mutations and escape from antibodies: a computational model of epitope loss in variants of concern. J. Chem. Inf. Model. 61, 4687–4700 (2021).

  73. Gruell, H. et al. Neutralisation sensitivity of the SARS-CoV-2 Omicron BA.2.75 sublineage. Lancet Infect. Dis. 22, 1422–1423 (2022).

  74. Wu, L. et al. SARS-CoV-2 Omicron RBD shows weaker binding affinity than the currently dominant Delta variant to human ACE2. Signal Transduct. Target. Ther. 7, 8 (2022).

  75. Imai, M. et al. Efficacy of antiviral agents against Omicron subvariants BQ.1.1 and XBB. N. Engl. J. Med. 388, 89–91 (2023).

  76. Nie, Z. & Liu, X. Code of E2VD. figshare https://doi.org/10.6084/m9.figshare.27762207.v1 (2024).

  77. Nie, Z. & Liu, X. Data of E2VD. figshare https://doi.org/10.6084/m9.figshare.25911490.v1 (2024).

  78. Alley, E. C., Khimulya, G., Biswas, S., AlQuraishi, M. & Church, G. M. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322 (2019).

Acknowledgements

This work was supported by the National Key R&D Program of China (grant no. 2022ZD0118201 to J.C.), the Shenzhen Medical Research Funds in China (grant no. B2302037 to J.C.), the National Natural Science Foundation of China (grant nos. 62425101 and 62088102 to Y.T.; grant nos. 61972217, 32071459, 62176249, 62006133 and 62271465 to J.C.), Self-Supporting Program of Guangzhou Laboratory (grant no. SRPG22-001 to P.Z.) and AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China. We thank M. Li from the University of Waterloo and Y. Mao from Peking University for their constructive discussions.

Author information

Contributions

Z.N. designed the prototype of the project. Z.N. and X.L. contributed to model architecture design, model training, data analysis and manuscript writing. Z.W. and F.X. contributed to the pretraining of PLMs. Y.L. contributed to figure production and data analysis. H.S., T.D. and G.S. contributed to data curation. Y.T., J.C., W.G., P.Z. and Y.W. contributed to technical discussions.

Corresponding authors

Correspondence to Jie Chen or Yonghong Tian.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Meng Yang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Dimensionality reduction visualization of prediction tasks for SARS-CoV-2.

a, The expression prediction task, where three types of mutations are presented, including Risk-free (MFI ratio < 0.5), Risk-free (0.5 < MFI ratio < 1) and Risky (MFI ratio > 1). b, The antibody escape prediction task, where two types of mutations are presented, including Low-risk (Escape score < 0.4) and High-risk (Escape score > 0.4).
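
The cut-offs in this caption amount to a simple binning rule. A minimal Python sketch follows, assuming the function names `label_expression` and `label_escape` (hypothetical, not taken from the E2VD code) and assuming that a value lying exactly on a cut-off falls into the lower group, since the caption does not say how ties are handled:

```python
def label_expression(mfi_ratio: float) -> str:
    """Bin an expression MFI ratio into the three groups of panel a.

    Assumption: a ratio exactly equal to 0.5 or 1 goes to the lower group.
    """
    if mfi_ratio > 1:
        return "Risky"
    if mfi_ratio > 0.5:
        return "Risk-free (0.5 < MFI ratio < 1)"
    return "Risk-free (MFI ratio < 0.5)"


def label_escape(escape_score: float) -> str:
    """Bin an antibody escape score into the two groups of panel b.

    Assumption: a score exactly equal to 0.4 is treated as Low-risk.
    """
    return "High-risk" if escape_score > 0.4 else "Low-risk"
```

For instance, `label_expression(1.2)` returns `"Risky"` and `label_escape(0.1)` returns `"Low-risk"`.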

Extended Data Fig. 2 Dimensionality reduction visualization of prediction tasks for influenza virus, Zika virus and HIV.

a, The mutational effect prediction task for influenza virus, where three types of mutations are presented, including Risk-free (Mutational effect < -1), Risk-free (-1 < Mutational effect < 0) and Risky (Mutational effect > 0). b,c, The mutational effect prediction task for Zika virus (b) and HIV (c), where three types of mutations are shown, including Risk-free (Mutational effect < -5), Risk-free (-5 < Mutational effect < 0) and Risky (Mutational effect > 0).
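
The three panels share the upper cut-off of 0 and differ only in the lower one (-1 for influenza virus, -5 for Zika virus and HIV), so the grouping can be written once with a per-virus table. The sketch below is illustrative only: the dictionary `LOWER_CUTOFF`, its keys and the function name are hypothetical, and values lying exactly on a cut-off are assigned to the lower group as an assumption.

```python
# Lower cut-offs read from the caption; the dictionary keys are hypothetical labels.
LOWER_CUTOFF = {"influenza": -1.0, "zika": -5.0, "hiv": -5.0}


def label_mutational_effect(virus: str, effect: float) -> str:
    """Assign a mutational effect score to one of the three caption groups.

    Assumption: a score exactly equal to a cut-off goes to the lower group.
    """
    lower = LOWER_CUTOFF[virus]
    if effect > 0:
        return "Risky"
    if effect > lower:
        return f"Risk-free ({lower:g} < Mutational effect < 0)"
    return f"Risk-free (Mutational effect < {lower:g})"
```

With these cut-offs, a score of -2 is placed in the lowest group for influenza virus but in the middle group for Zika virus and HIV.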

Supplementary information

Supplementary Information

Supplementary Figs. 1–3 and Tables 1–26.

Reporting Summary

Source data

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nie, Z., Liu, X., Chen, J. et al. A unified evolution-driven deep learning framework for virus variation driver prediction. Nat Mach Intell 7, 131–144 (2025). https://doi.org/10.1038/s42256-024-00966-9

  • DOI: https://doi.org/10.1038/s42256-024-00966-9
