The Nucleotide Transformer is a series of foundation models pre-trained on DNA sequences via self-supervised learning, yielding context-specific representations of nucleotide sequences. These representations can then be used to accurately predict molecular phenotypes.
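As a rough illustration of the self-supervised objective behind such models (a minimal sketch, not the authors' implementation; the k-mer size, mask rate and `[MASK]` token are assumptions for illustration): DNA is tokenized into k-mer tokens, a fraction of tokens is hidden, and the model is trained to recover the hidden tokens from their sequence context.

```python
import random

def kmer_tokenize(seq, k=6):
    """Split a DNA sequence into non-overlapping k-mer tokens."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=1):
    """Randomly hide a fraction of tokens, returning the corrupted
    input and the original tokens the model must predict."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok  # training target at this position
        else:
            masked.append(tok)
    return masked, targets

seq = "ATGCGTACGTTAGCCGATAGCTAGGCTAACG" * 4
tokens = kmer_tokenize(seq)
masked, targets = mask_tokens(tokens)
print(f"{len(tokens)} tokens, {len(targets)} masked")
```

A transformer trained on this recovery task learns context-dependent token representations; those internal representations are then what downstream phenotype predictors consume.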

References
Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019). This Review argues for deep learning as a major genomics modelling technique.
Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021). This paper presents Enformer, a supervised deep learning model that predicts thousands of epigenetic and transcriptional profiles from hundreds of human and mouse cell types using DNA sequences of up to 200 kb as input.
Nguyen, E. et al. HyenaDNA: long-range genomic sequence modeling at single nucleotide resolution. NeurIPS Proc. 1872, 43177–43201 (2023). This paper presents HyenaDNA, a DNA foundation model with context lengths of up to 1 million nucleotides.
Zhou, Z. et al. DNABERT-2: efficient foundation model and benchmark for multi-species genomes. Preprint at https://doi.org/10.48550/arXiv.2306.15006 (2024). This preprint presents DNABERT-2, a multi-species DNA language model using byte pair encoding.
Tang, Z. & Koo, P. K. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. Preprint at bioRxiv https://doi.org/10.1101/2024.02.29.582810 (2024). This preprint evaluates the power of pre-trained genomics language models.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This is a summary of: Dalla-Torre, H. et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods https://doi.org/10.1038/s41592-024-02523-z (2024).
Cite this article
Generalized AI models for genomics applications. Nat Methods 22, 231–232 (2025). https://doi.org/10.1038/s41592-024-02524-y