
  • Research Briefing

Assessing uncertainty of sequence representations generated by protein language models

Embeddings inferred by protein language models are replacing structure-derived descriptions of proteins, genes and genomes. We propose a model-agnostic measure to quantify the reliability of these new representations.
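The briefing does not disclose the measure itself, but reference 4 below points to a long tradition of scoring a protein sequence against random (shuffled) sequences. The sketch below illustrates that general idea only: it scores how far a sequence's embedding lies from the embeddings of shuffled copies of itself. The `toy_embed` function is a hypothetical stand-in, not the authors' model, and `random_background_score` is an illustrative construction, not the published measure.

```python
import random
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_embed(seq, dim=32, seed=0):
    """Hypothetical stand-in for a protein language model embedder.

    Pools hashed per-residue vectors with position weights so that
    shuffling a sequence actually changes its embedding.
    """
    rng = np.random.default_rng(seed)
    table = {aa: rng.normal(size=dim) for aa in AMINO_ACIDS}
    n = len(seq)
    vec = sum(table[aa] * ((i + 1) / n)
              for i, aa in enumerate(seq) if aa in table)
    return vec / n

def random_background_score(seq, n_random=200, seed=1):
    """Z-score of a sequence's embedding against a shuffled background.

    One plausible reading of a random-sequence reliability check
    (illustrative only): embed many shuffled copies of the sequence,
    then measure how far the real embedding sits from that background
    distribution, in units of its spread.
    """
    rng = random.Random(seed)
    emb = toy_embed(seq)
    rand_embs = []
    for _ in range(n_random):
        chars = list(seq)
        rng.shuffle(chars)
        rand_embs.append(toy_embed("".join(chars)))
    rand_embs = np.stack(rand_embs)
    centroid = rand_embs.mean(axis=0)
    d_rand = np.linalg.norm(rand_embs - centroid, axis=1)
    d_real = np.linalg.norm(emb - centroid)
    return (d_real - d_rand.mean()) / (d_rand.std() + 1e-12)
```

A low score under a scheme like this would mean the embedding is hard to distinguish from embeddings of random sequences with the same composition, flagging the protein as potentially poorly represented.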


Fig. 1: RNS-based assessments of embeddings identify poorly represented proteins across different data sets.

References

  1. Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017). This work introduces the transformer architecture, built on the attention mechanism.

  2. Weissenow, K. & Rost, B. Are protein language models the new universal key? Curr. Opin. Struct. Biol. 91, 102997 (2025). This review article discusses the transition from evolutionary information to machine-learned embeddings for protein prediction.


  3. Dallago, C. et al. Learned embeddings from deep learning to visualize and predict protein sets. Curr. Protoc. 1, e113 (2021). This article introduces ‘Bioembeddings’, a publicly available library of pLM pipelines.


  4. Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970). The earliest work we have identified that illustrates the use of random sequences to evaluate the significance of protein sequence similarities.



Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is a summary of: Prabakaran, R. & Bromberg, Y. Quantifying uncertainty in protein representations across models and tasks. Nat. Methods https://doi.org/10.1038/s41592-026-03028-7 (2026).


Cite this article

Assessing uncertainty of sequence representations generated by protein language models. Nat Methods (2026). https://doi.org/10.1038/s41592-026-03027-8

