Abstract
SARS-CoV-2 mutations accumulated throughout the COVID-19 pandemic, posing significant challenges for immune prevention. An optimistic perspective suggests that SARS-CoV-2 will become more tropic to humans, with weaker virulence and stronger infectivity; however, tracing a quantified trajectory of this process remains difficult. Here we introduce an attentional recurrent network based on language embedding (ARNLE) framework to analyse the shift in SARS-CoV-2 host tropism towards humans. ARNLE combines a language model trained by self-supervised learning to capture the features of amino acid sequences with a supervised bidirectional long short-term memory (bi-LSTM)-based network that discerns the relationship between mutations and host tropism among coronaviruses. We identified a shift in SARS-CoV-2 host tropism from weak to strong, transitioning from an approximately Chiroptera-tropic coronavirus to a primate-tropic coronavirus. Delta variants were closer to other common primate coronaviruses than previous SARS-CoV-2 variants, and a similar phenomenon was observed among the Omicron variants. We employed a Bayesian post hoc explanation method to analyse the key mutations influencing the human tropism of SARS-CoV-2. ARNLE identified pivotal mutations in the spike protein, including T478K, L452R and G142D, as top determinants of human tropism. Our findings suggest that language models such as ARNLE will significantly facilitate the identification of potentially prevalent variants and provide important support for screening key mutations, aiding the timely updating of vaccines to protect against future emerging SARS-CoV-2 variants.
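Conceptually, the ARNLE pipeline embeds each amino acid sequence with a language model and then classifies host tropism with an attention-equipped bi-LSTM. The following is a minimal, hypothetical sketch of such a classifier in PyTorch, not the authors' released implementation; the class name, embedding dimension, hidden size and number of host classes are illustrative assumptions.

import torch
import torch.nn as nn

class BiLSTMAttentionClassifier(nn.Module):
    """Toy ARNLE-style head: bi-LSTM over language-model embeddings with additive attention."""
    def __init__(self, embed_dim=1024, hidden_dim=256, n_hosts=2):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)   # one attention score per residue
        self.head = nn.Linear(2 * hidden_dim, n_hosts)

    def forward(self, x):                          # x: (batch, seq_len, embed_dim) embeddings
        h, _ = self.lstm(x)                        # (batch, seq_len, 2 * hidden_dim)
        w = torch.softmax(self.attn(h), dim=1)     # attention weights over residues
        context = (w * h).sum(dim=1)               # weighted sequence representation
        return self.head(context)                  # host-tropism logits

# Example: random embeddings for four spike-length (1,273-residue) sequences.
model = BiLSTMAttentionClassifier()
print(model(torch.randn(4, 1273, 1024)).shape)     # torch.Size([4, 2])

In the actual framework the embeddings come from the pretrained ELMo model released with the paper, and key mutations are subsequently ranked with a separate Bayesian post hoc explanation method.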
Data availability
The trained supervised bi-LSTM model and all sequence datasets (including the training, blind validation and test datasets) are available via GitHub at https://github.com/SN-1604/ARNLE and via Zenodo at https://doi.org/10.5281/zenodo.13836159 (ref. 67). The trained ELMo model, the embedded training and blind validation data for the supervised bi-LSTM model, and Supplementary Table 3 are available via Zenodo at https://doi.org/10.5281/zenodo.13826663 (ref. 68). The models can be run, and the embedded test data generated, using the code on GitHub. Source data are provided with this paper.
Code availability
All Python code is available via GitHub at https://github.com/SN-1604/ARNLE and via Zenodo at https://doi.org/10.5281/zenodo.13836159 (ref. 67). A tutorial on how to use ARNLE and reproduce the findings is available on GitHub.
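For readers who prefer to retrieve the archived material programmatically, a hypothetical Python helper along the following lines could clone the repository and list the files deposited in the Zenodo record; the record ID (13836159) is taken from the DOI above, and this snippet is not part of the ARNLE tutorial.

import subprocess
import requests

# Clone the ARNLE repository (requires git on the PATH).
subprocess.run(["git", "clone", "https://github.com/SN-1604/ARNLE"], check=True)

# List the files archived in the Zenodo record via Zenodo's public records API.
record = requests.get("https://zenodo.org/api/records/13836159", timeout=30).json()
for f in record.get("files", []):
    print(f["key"], f["links"]["self"])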
References
Douam, F. et al. Genetic dissection of the host tropism of human-tropic pathogens. Annu. Rev. Genet. 49, 21–45 (2015).
Cui, J., Li, F. & Shi, Z. L. Origin and evolution of pathogenic coronaviruses. Nat. Rev. Microbiol. 17, 181–192 (2019).
Lim, Y. X. et al. Human coronaviruses: a review of virus-host interactions. Diseases 4, 26 (2016).
Choe, H. & Farzan, M. How SARS-CoV-2 first adapted in humans. Science 372, 466–467 (2021).
Zhou, P. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020).
Grint, D. J. et al. Severity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) alpha variant (B.1.1.7) in England. Clin. Infect. Dis. 75, e1120–e1127 (2022).
Liu, C. et al. The antibody response to SARS-CoV-2 Beta underscores the antigenic distance to other variants. Cell Host Microbe 30, 53–68.e12 (2022).
Liu, Y. & Rocklov, J. The reproductive number of the Delta variant of SARS-CoV-2 is far higher compared to the ancestral SARS-CoV-2 virus. J. Travel Med. 28, taab124 (2021).
Karim, S. & Karim, Q. A. Omicron SARS-CoV-2 variant: a new chapter in the COVID-19 pandemic. Lancet 398, 2126–2128 (2021).
Farinholt, T. et al. Transmission event of SARS-CoV-2 delta variant reveals multiple vaccine breakthrough infections. BMC Med. 19, 255 (2021).
Liu, L. et al. Striking antibody evasion manifested by the Omicron variant of SARS-CoV-2. Nature 602, 676–681 (2021).
Rossler, A. et al. SARS-CoV-2 Omicron variant neutralization in serum from vaccinated and convalescent persons. N. Engl. J. Med. 386, 698–700 (2022).
Ma, C. et al. Broad host tropism of ACE2-using MERS-related coronaviruses and determinants restricting viral recognition. Cell Discov. 9, 57 (2023).
Rambaut, A. et al. Addendum: a dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 6, 415 (2021).
Ozono, S. et al. SARS-CoV-2 D614G spike mutation increases entry efficiency with enhanced ACE2-binding affinity. Nat. Commun. 12, 848 (2021).
Cele, S. et al. Escape of SARS-CoV-2 501Y.V2 from neutralization by convalescent plasma. Nature 593, 142–146 (2021).
Li, J. et al. Machine learning methods for predicting human-adaptive influenza A viruses based on viral nucleotide compositions. Mol. Biol. Evol. 37, 1224–1236 (2020).
Babayan, S. A., Orton, R. J. & Streicker, D. G. Predicting reservoir hosts and arthropod vectors from evolutionary signatures in RNA virus genomes. Science 362, 577–580 (2018).
Li, J. et al. Genomic representation predicts an asymptotic host adaptation of bat coronaviruses using deep learning. Front. Microbiol. 14, 1157608 (2023).
Xia, X. Extreme genomic CpG deficiency in SARS-CoV-2 and evasion of host antiviral defense. Mol. Biol. Evol. 37, 2699–2705 (2020).
Pollock, D. D. et al. Viral CpG deficiency provides no evidence that dogs were intermediate hosts for SARS-CoV-2. Mol. Biol. Evol. 37, 2706–2710 (2020).
Li, J. et al. Deep learning based on biologically interpretable genome representation predicts two types of human adaptation of SARS-CoV-2 variants. Brief. Bioinform. 23, bbac036 (2022).
Asgari, E. & Mofrad, M. R. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE 10, e0141287 (2015).
Rao, R. et al. Evaluating protein transfer learning with TAPE. Adv. Neural Inf. Process. Syst. 32, 9689–9701 (2019).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Alley, E. C. et al. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322 (2019).
Hie, B. et al. Learning the language of viral evolution and escape. Science 371, 284–288 (2021).
Roy, A. et al. Base composition and host adaptation of the SARS-CoV-2: insight from the codon usage perspective. Front. Microbiol. 12, 548275 (2021).
Kawashima, I. Y. et al. SARS-CoV-2 host prediction based on virus-host genetic features. Sci. Rep. 12, 4576 (2022).
Obermeyer, F. et al. Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness. Science 376, 1327–1332 (2022).
JD, B. & RA, N. Fitness effects of mutations to SARS-CoV-2 proteins. Virus Evol. 9, vead055 (2023).
Li, L. et al. Machine learning detection of SARS-CoV-2 high-risk variants. Preprint at bioRxiv https://doi.org/10.1101/2023.04.19.537460 (2023).
Q, S. et al. VarEPS: an evaluation and prewarning system of known and virtual variations of SARS-CoV-2 genomes. Nucleic Acids Res. 50, D888–D897 (2022).
Marquet, C. et al. Embeddings from protein language models predict conservation and variant effects. Hum. Genet. 141, 1629–1647 (2021).
Peters, M. et al. Deep contextualized word representations. In Proc. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (eds Walker, M. et al.) 2227–2237 (ACL, 2018).
Devlin, J. et al. BERT: pre-training of deep bidirectional transformers for language understanding. Preprint at https://arxiv.org/abs/1810.04805 (2018).
Radford, A. et al. Improving language understanding by generative pre-training. Preprint at https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (2018).
Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616–624 (2023).
Zvyagin, M. et al. GenSLMs: genome-scale language models reveal SARS-CoV-2 evolutionary dynamics. Int. J. High Perf. Comput. Appl. 37, 683–705 (2023).
Dao, T. et al. FlashAttention: fast and memory-efficient exact attention with IO-awareness. Adv. Neural Inf. Process. Syst. 35, 16344–16359 (2022).
Dai, Z. et al. Transformer-XL: attentive language models beyond a fixed-length context. Preprint at https://arxiv.org/abs/1901.02860 (2019).
Li, Y. et al. LocalViT: bringing locality to vision transformers. Preprint at https://arxiv.org/abs/2104.05707 (2021).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Yang, M. et al. Integrating convolution and self-attention improves language model of human genome for interpreting non-coding regions at base-resolution. Nucleic Acids Res. 50, e81 (2022).
Chen, Z. et al. Integration of a deep learning classifier with a random forest approach for predicting malonylation sites. Genom. Proteom. Bioinform. 16, 451–459 (2018).
Fu, L., Peng, Q. & Chai, L. Predicting DNA methylation states with hybrid information based deep-learning model. IEEE/ACM Trans. Comput. Biol. Bioinform. 17, 1721–1728 (2020).
Pan, X. et al. Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks. BMC Genomics 19, 511 (2018).
Lu, M. et al. Deep attention network for egocentric action recognition. IEEE Trans. Image Process. 28, 3703–3713 (2019).
De, S. et al. Griffin: mixing gated linear recurrences with local attention for efficient language models. Preprint at https://arxiv.org/abs/2402.19427 (2024).
Peng, B. et al. RWKV: reinventing RNNs for the transformer era. Preprint at https://arxiv.org/abs/2305.13048 (2023).
Gu, A. & Dao, T. Mamba: linear-time sequence modeling with selective state spaces. Preprint at https://arxiv.org/abs/2312.00752 (2023).
Hammelman, J. & Gifford, D. K. Discovering differential genome sequence activity with interpretable and efficient deep learning. PLoS Comput. Biol. 17, e1009282 (2021).
Wang, Y. et al. Structural basis for SARS-CoV-2 Delta variant recognition of ACE2 receptor and broadly neutralizing antibodies. Nat. Commun. 13, 871 (2022).
Bugatti, A. et al. The D405N mutation in the spike protein of SARS-CoV-2 Omicron BA.5 inhibits spike/integrins interaction and viral infection of human lung microvascular endothelial cells. Viruses 15, 332 (2023).
He, P. et al. SARS-CoV-2 Delta and Omicron variants evade population antibody response by mutations in a single spike epitope. Nat. Microbiol. 7, 1635–1649 (2022).
Supasa, P. et al. Reduced neutralization of SARS-CoV-2 B.1.1.7 variant by convalescent and vaccine sera. Cell 184, 2201–2211.e7 (2021).
Lista, M. J. et al. The P681H mutation in the spike glycoprotein of the Alpha variant of SARS-CoV-2 escapes IFITM restriction and is necessary for Type I interferon resistance. J. Virol. 96, e0125022 (2022).
Liu, Y. et al. Delta spike P681R mutation enhances SARS-CoV-2 fitness over Alpha variant. Cell Rep. 39, 110829 (2022).
Mishra, T. et al. SARS-CoV-2 spike E156G/Delta157-158 mutations contribute to increased infectivity and immune escape. Life Sci. Alliance 5, e202201415 (2022).
Selvavinayagam, S. T. et al. Low SARS-CoV-2 viral load among vaccinated individuals infected with Delta B.1.617.2 and Omicron B.1.1.529 but not with Omicron BA.1.1 and BA.2 variants. Front. Public Health 10, 1018399 (2022).
He, X. et al. Research progress in spike mutations of SARS‐CoV‐2 variants and vaccine development. Med. Res. Rev. 43, 932–971 (2023).
Ai, J. et al. Antibody evasion of SARS-CoV-2 Omicron BA.1, BA.1.1, BA.2 and BA.3 sub-lineages. Cell Host Microbe 30, 1077–1083.e4 (2022).
Cao, Y. et al. Characterization of the enhanced infectivity and antibody evasion of Omicron BA.2.75. Cell Host Microbe 30, 1527–1539.e5 (2022).
Marcinkiewicz, A. L. et al. Structural evolution of an immune evasion determinant shapes pathogen host tropism. Proc. Natl Acad. Sci. USA 120, e2301549120 (2023).
Cao, Y. et al. Imprinted SARS-CoV-2 humoral immunity induces convergent Omicron RBD evolution. Nature 614, 521–529 (2023).
Liu, Y. SN-1604/ARNLE: attentional recurrent network based on language embedding (ARNLE). Zenodo https://doi.org/10.5281/zenodo.13836159 (2024).
Liu, Y. An attentional recurrent network based on language embedding framework for human tropism prediction reveals trends in the prevalence of SARS-CoV-2 variants. Zenodo https://doi.org/10.5281/zenodo.13826663 (2024).
Acknowledgements
This study was supported by grants from the National Key Research and Development Program (grant no. 2023YFC2604400, Peng Li), the Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences (grant no. CIFMS2022-I2M-1-011, J.T.Y.), the National Science and Technology Major Project of China (grant no. 2018ZX10201001, Peng Li) and the Biomedical High Performance Computing Platform, Chinese Academy of Medical Sciences.
Author information
Authors and Affiliations
Contributions
H.S., Peng Li, J.Y. and A.W. conceived the study. Y.L., K.W., Jinhui Li and Jiangfeng Liu managed the data. Y.L. and Jing Li built the model. Y.L., Peihan Li, Y.Y. and L.Y. performed the data analysis. Peng Li, L.J. and H.S. supervised the project. Y.L. and Jing Li wrote the manuscript. H.S., Peng Li, J.Y. and A.W. revised the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Jianyang Zeng, Wei Zheng and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Methods used for the Supplementary figures, Supplementary Figs. 1–10 and Supplementary Tables 1–7.
Supplementary Tables
Supplementary Tables 1–7.
Supplementary Data 1
Statistical source data for Supplementary Fig. 1.
Supplementary Data 2
Statistical source data for Supplementary Fig. 2.
Supplementary Data 3
Statistical source data for Supplementary Fig. 3.
Supplementary Data 4
Statistical source data for Supplementary Fig. 4.
Supplementary Data 5
Statistical source data for Supplementary Fig. 5.
Supplementary Data 6
Statistical source data for Supplementary Fig. 6.
Supplementary Data 7
Statistical source data for Supplementary Fig. 7.
Supplementary Data 8
Statistical source data for Supplementary Fig. 8.
Supplementary Data 9
Statistical source data for Supplementary Fig. 9.
Supplementary Data 10
Statistical source data for Supplementary Fig. 10.
Source data
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, Y., Li, J., Li, P. et al. ARNLE model identifies prevalence potential of SARS-CoV-2 variants. Nat Mach Intell 7, 18–28 (2025). https://doi.org/10.1038/s42256-024-00919-2