
  • Research Briefing

Generalized AI models for genomics applications

The Nucleotide Transformer is a series of foundation models pre-trained on DNA sequences through self-supervised learning. The models extract context-specific representations of nucleotide sequences, which can then be used to accurately predict molecular phenotypes.
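The self-supervised objective behind such models is masked language modelling: sequences are split into nucleotide tokens, a fraction of tokens is hidden, and the model is trained to recover them from context. The sketch below illustrates the data-preparation step only (tokenization into non-overlapping 6-mers and BERT-style masking); the 6-mer vocabulary matches the Nucleotide Transformer's tokenizer, but the 15% mask rate and the helper names are illustrative assumptions, not the authors' code.

```python
import random

def tokenize_6mers(seq):
    """Split a DNA sequence into non-overlapping 6-mer tokens
    (the vocabulary unit used by the Nucleotide Transformer)."""
    usable = len(seq) - len(seq) % 6  # drop any trailing partial k-mer
    return [seq[i:i + 6] for i in range(0, usable, 6)]

def mask_tokens(tokens, mask_rate=0.15, rng=None):
    """BERT-style masking (illustrative): hide ~15% of tokens;
    the model learns by predicting the hidden tokens from context."""
    rng = rng or random.Random(0)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            inputs.append("[MASK]")
            targets.append(tok)    # loss is computed on these positions
        else:
            inputs.append(tok)
            targets.append(None)   # unmasked positions contribute no loss
    return inputs, targets

# Toy example (sequence invented for illustration)
seq = "ATGCGTACGTTAGCATGCGTACGT"
tokens = tokenize_6mers(seq)
inputs, targets = mask_tokens(tokens)
print(tokens)  # → ['ATGCGT', 'ACGTTA', 'GCATGC', 'GTACGT']
```

After pre-training on this objective, the model's internal token embeddings serve as the "context-specific representations" that downstream probes or fine-tuned heads map to molecular phenotypes.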


Fig. 1: Pre-training and fine-tuning of Nucleotide Transformer models.

References

  1. Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019). This Review argues for deep learning as a major genomics modelling technique.


  2. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021). This paper presents Enformer, a supervised deep learning model that predicts thousands of epigenetic and transcriptional profiles from hundreds of human and mouse cell types using DNA sequences of up to 200 kb as input.


  3. Nguyen, E. et al. HyenaDNA: long-range genomic sequence modeling at single nucleotide resolution. NeurIPS Proc. 1872, 43177–43201 (2023). This paper presents HyenaDNA, a DNA foundation model with context lengths of up to 1 million nucleotides.


  4. Zhou, Z. et al. DNABERT-2: efficient foundation model and benchmark for multi-species genomes. Preprint at https://doi.org/10.48550/arXiv.2306.15006 (2024). This preprint presents DNABERT-2, a multi-species DNA language model using byte pair encoding.

  5. Tang, Z. & Koo, P. K. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. Preprint at bioRxiv https://doi.org/10.1101/2024.02.29.582810 (2024). This preprint evaluates the power of pre-trained genomics language models.



This is a summary of: Dalla-Torre, H. et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods https://doi.org/10.1038/s41592-024-02523-z (2024).


Cite this article

Generalized AI models for genomics applications. Nat. Methods 22, 231–232 (2025). https://doi.org/10.1038/s41592-024-02524-y

