
  • Research Briefing

Generalized AI models for genomics applications

The Nucleotide Transformer is a series of foundation models pre-trained on DNA sequences through self-supervised learning. The models extract context-specific representations of nucleotide sequences, which can then be used to accurately predict molecular phenotypes.
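The self-supervised objective behind such models is masked language modelling: sequences are split into nucleotide tokens, a fraction of tokens is hidden, and the model is trained to recover them from context. The sketch below illustrates the data-preparation step only (tokenization into non-overlapping 6-mers and BERT-style masking); the 6-mer vocabulary matches the Nucleotide Transformer's tokenizer, but the 15% mask rate and the helper names are illustrative assumptions, not the authors' code.

```python
import random

def tokenize_6mers(seq):
    """Split a DNA sequence into non-overlapping 6-mer tokens
    (the vocabulary unit used by the Nucleotide Transformer)."""
    usable = len(seq) - len(seq) % 6  # drop any trailing partial k-mer
    return [seq[i:i + 6] for i in range(0, usable, 6)]

def mask_tokens(tokens, mask_rate=0.15, rng=None):
    """BERT-style masking (illustrative): hide ~15% of tokens;
    the model learns by predicting the hidden tokens from context."""
    rng = rng or random.Random(0)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            inputs.append("[MASK]")
            targets.append(tok)    # loss is computed on these positions
        else:
            inputs.append(tok)
            targets.append(None)   # unmasked positions contribute no loss
    return inputs, targets

# Toy example (sequence invented for illustration)
seq = "ATGCGTACGTTAGCATGCGTACGT"
tokens = tokenize_6mers(seq)
inputs, targets = mask_tokens(tokens)
print(tokens)  # → ['ATGCGT', 'ACGTTA', 'GCATGC', 'GTACGT']
```

After pre-training on this objective, the model's internal token embeddings serve as the "context-specific representations" that downstream probes or fine-tuned heads map to molecular phenotypes.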


Fig. 1: Pre-training and fine-tuning of Nucleotide Transformer models.

References

  1. Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019). This Review argues for deep learning as a major genomics modelling technique.


  2. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021). This paper presents Enformer, a supervised deep learning model that predicts thousands of epigenetic and transcriptional profiles from hundreds of human and mouse cell types using DNA sequences of up to 200 kb as input.


  3. Nguyen, E. et al. HyenaDNA: long-range genomic sequence modeling at single nucleotide resolution. NeurIPS Proc. 1872, 43177–43201 (2023). This paper presents HyenaDNA, a DNA foundation model with context lengths of up to 1 million nucleotides.


  4. Zhou, Z. et al. DNABERT-2: efficient foundation model and benchmark for multi-species genomes. Preprint at https://doi.org/10.48550/arXiv.2306.15006 (2024). This preprint presents DNABERT-2, a multi-species DNA language model using byte pair encoding.

  5. Tang, Z. & Koo, P. K. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. Preprint at bioRxiv https://doi.org/10.1101/2024.02.29.582810 (2024). This preprint evaluates the power of pre-trained genomics language models.



This is a summary of: Dalla-Torre, H. et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods https://doi.org/10.1038/s41592-024-02523-z (2024).


Cite this article

Generalized AI models for genomics applications. Nat. Methods 22, 231–232 (2025). https://doi.org/10.1038/s41592-024-02524-y

