Bridging the gap between hybrid and sequence-only protein language models

doi:10.1038/s41592-026-03049-2

Research Briefing
Published: 30 March 2026

Nature Methods (2026)

By allowing protein language models (PLMs) to learn from each other’s most confident predictions, we compressed the collective knowledge of existing PLMs into VESM — a single sequence-only model that outperforms state-of-the-art hybrid methods. VESM predictions extended beyond binary pathogenicity classification, accurately quantifying the severity of variant effects on clinical phenotypes.

**Fig. 1: VESM predicts clinical, experimental and phenotypic effects of variants.**

References

Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021). This paper introduced the ESM family and showed that protein language models can learn rich biological constraints from sequence data.
Brandes, N. et al. Genome-wide prediction of disease variant effects with a deep protein language model. Nat. Genet. 55, 1512–1522 (2023). This paper introduced protein language models as foundational tools for genome-wide variant effect prediction.
Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023). This paper introduced AlphaMissense as a state-of-the-art hybrid variant effect predictor trained on sequence alignments, 3D protein structures and population allele frequency data.
Karczewski, K. J. et al. Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genomics 2, 100168 (2022). This paper introduces Genebass, a public resource of rare variant association statistics from UK Biobank exomes, providing the data we used to evaluate VESM predictions.
Notin, P. et al. ProteinGym: large-scale benchmarks for protein fitness prediction and design. In Advances in Neural Information Processing Systems Vol. 36 (eds. Oh, A. et al.) 64331–64379 (Curran Associates, 2023). This article introduces ProteinGym, a comprehensive benchmark of clinical and deep-mutational-scanning datasets that enables systematic comparison of variant effect prediction methods.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is a summary of: Dinh, T., Jang, S. K., Zaitlen, N. & Ntranos, V. Compressing the collective knowledge of ESM into a single protein language model. Nat. Methods https://doi.org/10.1038/s41592-026-03050-9 (2026).

About this article

Cite this article

Bridging the gap between hybrid and sequence-only protein language models. Nat Methods (2026). https://doi.org/10.1038/s41592-026-03049-2

Version of record: 30 March 2026
