Supervised deep learning models have a dazzling track record in many computational genomics tasks, but their success relies on vast (and often costly) experimental data for training. Recently, genomic language models (gLMs), whose pretraining requires only DNA sequences (albeit in large numbers), have emerged as a potentially appealing alternative and are drawing interest from many researchers in the genomics and computational biology community, including Peter Koo of Cold Spring Harbor Laboratory. “We were initially excited by the growing class of gLMs that aim to learn unsupervised representations of DNA,” he says. However, after building these models with his team, “We found that they consistently underperformed well-established supervised models.” Intrigued to know whether these observations hold more generally, Koo and his colleagues shifted their project to a rigorous evaluation of gLMs.
Challenges abound for benchmarking in such a fast-paced area. Although new gLMs keep emerging, issues with code and data availability often hinder full reproducibility. “Many functional genomics modeling papers provided code and data, but these were often incomplete or difficult to adapt,” says Koo. This led the team to concentrate on a small but representative set of gLMs whose data and model baselines could be reliably obtained. Another important distinction of their benchmarking study from previous efforts is the tasks they designed. “The key innovation of our evaluation is its focus on biologically aligned tasks that are tied to open questions in gene regulation,” notes Koo. “In contrast, most existing benchmarks rely on classification tasks that originated in the machine learning literature and continue to be propagated in gLM studies, despite being disconnected from how models would be used to advance biological understanding and discovery.”