
Review Article

Transformers and genome language models

Abstract

Large language models based on the transformer deep learning architecture have revolutionized natural language processing. Motivated by the analogy between human language and the genome’s biological code, researchers have begun to develop genome language models (gLMs) based on transformers and related architectures. This Review explores the use of transformers and language models in genomics. We survey open questions in genomics that are amenable to gLMs and motivate the use of gLMs and the transformer architecture for these problems. We discuss the potential of gLMs for modelling the genome through unsupervised pretraining tasks, focusing in particular on the power of zero- and few-shot learning. We examine the strengths and limitations of the transformer architecture, as well as those of current gLMs more broadly. Finally, we consider the future of genomic modelling beyond the transformer architecture, based on current trends in research. This Review serves as a guide for computational biologists and computer scientists interested in transformers and language models for genomic data.
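As a concrete illustration of the kind of unsupervised pretraining task mentioned above, the following is a minimal sketch of a BERT-style masked-nucleotide objective for a toy transformer encoder. The vocabulary, model dimensions and random ‘DNA’ batch are illustrative assumptions for this sketch only and do not correspond to any specific gLM covered in this Review.

```python
# Minimal, hypothetical sketch of masked-token pretraining on DNA.
# All sizes and the toy sequence batch are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "[MASK]": 4}

class TinyGenomeLM(nn.Module):
    def __init__(self, vocab_size=5, d_model=64, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)  # predict each token's identity

    def forward(self, tokens):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        h = self.embed(tokens) + self.pos(positions)
        return self.head(self.encoder(h))

# Toy example: mask ~15% of nucleotides, train the model to reconstruct them.
seq = torch.randint(0, 4, (8, 128))                    # batch of random "DNA"
mask = torch.rand(seq.shape) < 0.15
inputs = seq.masked_fill(mask, VOCAB["[MASK]"])
model = TinyGenomeLM()
logits = model(inputs)
# Loss is computed only on masked positions, so no labels are required.
loss = nn.functional.cross_entropy(logits[mask], seq[mask])
loss.backward()
```

Because the objective needs no annotations, pretrained representations of this kind can later be probed in zero- or few-shot settings, which is the usage pattern the Review focuses on.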


Fig. 1: A big-picture look at the diverse applications of gLMs.
Fig. 2: A comparison of how different genomic deep learning models operate on DNA sequence data.
Fig. 3: The total amount of compute, in petaflop/s-days (PFS-days), used to train the various models discussed in the Review (all models for which parameter count, training time and GPU usage were available).
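For readers unfamiliar with the unit in Fig. 3: one petaflop/s-day is the compute delivered by sustaining 10^15 floating-point operations per second for one day. The short sketch below shows how such an estimate can be assembled from GPU count, per-GPU throughput, utilization and training time; the hardware numbers are illustrative assumptions, not values taken from any model in the Review.

```python
# Hypothetical back-of-the-envelope estimate of training compute in
# petaflop/s-days (PFS-days); the hardware figures are illustrative assumptions.
PFS_DAY_IN_FLOPS = 1e15 * 86_400   # FLOPs in one petaflop/s-day

def pfs_days(num_gpus, peak_tflops_per_gpu, utilization, training_days):
    """Total training compute, assuming a sustained fraction of peak throughput."""
    sustained_flops_per_sec = num_gpus * peak_tflops_per_gpu * 1e12 * utilization
    total_flops = sustained_flops_per_sec * training_days * 86_400
    return total_flops / PFS_DAY_IN_FLOPS

# e.g. 64 GPUs at a nominal 100 TFLOP/s each, 30% utilization, for 10 days:
print(f"{pfs_days(64, 100, 0.30, 10):.1f} PFS-days")  # ~19.2 PFS-days
```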



Acknowledgements

We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information


Contributions

M.E.C. selected the papers to review, summarized contributions from all papers, performed analysis, and designed all figures. A.M., B.W., M.W. and D.F. helped with figure design. C.D. contributed to paper selection and summarizing contributions; A.M., M.W., M.K., F.J.T. and H.G. contributed to manuscript writing. A.M. supervised and B.W. conceived and supervised the project.

Corresponding author

Correspondence to Bo Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Jesper Tegner, Fan Yang, Xuegong Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Appendices A–C, Table 1, Figs. 1 and 2.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Consens, M.E., Dufault, C., Wainberg, M. et al. Transformers and genome language models. Nat Mach Intell 7, 346–362 (2025). https://doi.org/10.1038/s42256-025-01007-9

