
  • Research Briefing

Assessing uncertainty of sequence representations generated by protein language models

Embeddings inferred by protein language models are replacing structure-derived descriptions of proteins, genes and genomes. We propose a model-agnostic measure to quantify the reliability of these new representations.
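The briefing does not disclose the measure itself, but reference 4 below points to a long tradition of scoring a protein sequence against random (shuffled) sequences. The sketch below illustrates that general idea only: it scores how far a sequence's embedding lies from the embeddings of shuffled copies of itself. The `toy_embed` function is a hypothetical stand-in, not the authors' model, and `random_background_score` is an illustrative construction, not the published measure.

```python
import random
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_embed(seq, dim=32, seed=0):
    """Hypothetical stand-in for a protein language model embedder.

    Pools hashed per-residue vectors with position weights so that
    shuffling a sequence actually changes its embedding.
    """
    rng = np.random.default_rng(seed)
    table = {aa: rng.normal(size=dim) for aa in AMINO_ACIDS}
    n = len(seq)
    vec = sum(table[aa] * ((i + 1) / n)
              for i, aa in enumerate(seq) if aa in table)
    return vec / n

def random_background_score(seq, n_random=200, seed=1):
    """Z-score of a sequence's embedding against a shuffled background.

    One plausible reading of a random-sequence reliability check
    (illustrative only): embed many shuffled copies of the sequence,
    then measure how far the real embedding sits from that background
    distribution, in units of its spread.
    """
    rng = random.Random(seed)
    emb = toy_embed(seq)
    rand_embs = []
    for _ in range(n_random):
        chars = list(seq)
        rng.shuffle(chars)
        rand_embs.append(toy_embed("".join(chars)))
    rand_embs = np.stack(rand_embs)
    centroid = rand_embs.mean(axis=0)
    d_rand = np.linalg.norm(rand_embs - centroid, axis=1)
    d_real = np.linalg.norm(emb - centroid)
    return (d_real - d_rand.mean()) / (d_rand.std() + 1e-12)
```

A low score under a scheme like this would mean the embedding is hard to distinguish from embeddings of random sequences with the same composition, flagging the protein as potentially poorly represented.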


Fig. 1: RNS-based assessments of embeddings identify poorly represented proteins across different data sets.

References

  1. Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017). This work introduces the transformer architecture, built on the attention mechanism.

  2. Weissenow, K. & Rost, B. Are protein language models the new universal key? Curr. Opin. Struct. Biol. 91, 102997 (2025). This review article discusses the transition from evolutionary information to machine-learned embeddings for protein prediction.


  3. Dallago, C. et al. Learned embeddings from deep learning to visualize and predict protein sets. Curr. Protoc. 1, e113 (2021). This article introduces ‘Bioembeddings’, a publicly available library of pLM pipelines.


  4. Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970). The earliest work we have identified that illustrates the use of random sequences to evaluate the significance of protein sequence similarities.



Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is a summary of: Prabakaran, R. & Bromberg, Y. Quantifying uncertainty in protein representations across models and tasks. Nat. Methods https://doi.org/10.1038/s41592-026-03028-7 (2026).


Cite this article

Assessing uncertainty of sequence representations generated by protein language models. Nat Methods (2026). https://doi.org/10.1038/s41592-026-03027-8

