Abstract
Accurate characterization of multi-state protein conformations is crucial for understanding protein functional mechanisms and advancing targeted therapies. Coevolutionary constraints extracted from homologous sequences help reveal protein structure and function, and MSA Transformer captures these constraints automatically through its attention mechanisms. Here we introduce EvoSplit, which uses the multi-conformational coevolutionary signals captured by MSA Transformer and disentangles the signals associated with distinct conformations to guide protein structure prediction. EvoSplit outperforms AF-Cluster on 85 fold-switching proteins and successfully models the conformations of proteins beyond AlphaFold2’s training set. We then identify 54 candidates with potential conformational diversity among cancer-related human proteins. Notably, for five GTPases, EvoSplit consistently predicts two conformations, one of which has not been previously reported. As an important example, protein–protein interaction analysis provides new insights into novel function-associated conformations of HRAS. Furthermore, the validity of these newly identified conformations is examined by evolutionary analysis and extensive molecular dynamics simulations.
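As an illustrative aside, the toy sketch below shows why pooling sequences that encode different constraints can dilute coevolutionary signal, and why splitting them can sharpen it. It is not the EvoSplit algorithm: it uses plain mutual information on a binary alphabet rather than MSA Transformer attention, and the two subfamilies (`sub_a`, `sub_b`), their labels, and the function `mutual_information` are hypothetical constructs introduced only for this example.

```python
import itertools
import numpy as np


def mutual_information(msa: np.ndarray, n_states: int = 2) -> np.ndarray:
    """Pairwise mutual information (nats) between columns of an integer-encoded MSA."""
    n_seq = msa.shape[0]
    one_hot = np.eye(n_states)[msa]          # shape: (n_seq, n_col, n_states)
    single = one_hot.mean(axis=0)            # per-column state frequencies
    # pair[i, j, a, b]: fraction of sequences with state a at column i and b at column j
    pair = np.einsum("sia,sjb->ijab", one_hot, one_hot) / n_seq
    expected = single[:, None, :, None] * single[None, :, None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(pair > 0, pair * np.log(pair / expected), 0.0)
    mi = terms.sum(axis=(2, 3))
    np.fill_diagonal(mi, 0.0)                # ignore self-pairs
    return mi


# Toy binary "alignment" with two hypothetical subfamilies (labels assumed known here).
bits = (0, 1)
# Subfamily A: columns 0 and 1 covary perfectly; columns 2 and 3 are independent.
sub_a = np.array([[a, a, c, d] for a, c, d in itertools.product(bits, repeat=3)])
# Subfamily B: columns 2 and 3 covary perfectly; columns 0 and 1 are independent.
sub_b = np.array([[a, b, c, c] for a, b, c in itertools.product(bits, repeat=3)])
pooled = np.vstack([sub_a, sub_b])

mi_pooled = mutual_information(pooled)
mi_a = mutual_information(sub_a)
mi_b = mutual_information(sub_b)

print("pooled      MI(0,1), MI(2,3):", mi_pooled[0, 1], mi_pooled[2, 3])
print("subfamily A MI(0,1), MI(2,3):", mi_a[0, 1], mi_a[2, 3])
print("subfamily B MI(0,1), MI(2,3):", mi_b[0, 1], mi_b[2, 3])
```

In this toy, each subfamily carries one strong coupling that is weakened in the pooled alignment. The sketch takes the subfamily labels as given, so it illustrates only the motivation for separating the signals, not how that separation might be performed.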
Data availability
The sources of public data used for method evaluation and the exploration of potential multi-conformational proteins are detailed in the “Dataset Collection” subsection in Methods. All referenced ground truth structures were obtained from the Protein Data Bank (PDB). All relevant evaluation datasets and results generated in this study are publicly available via Zenodo at https://doi.org/10.5281/zenodo.18334964 (ref. 87).
Code availability
Molecular dynamics simulations were performed using Amber (version 22). EvoSplit version 1.0 was used in this study. The EvoSplit code is available via GitHub at https://github.com/PepperLee-sm/EvoSplit and via Zenodo at https://doi.org/10.5281/zenodo.18335365 (ref. 88) under the Apache v.2.0 license.
References
Murzin, A. G. Metamorphic proteins. Science 320, 1725–1726 (2008).
Boehr, D. D., Nussinov, R. & Wright, P. E. The role of dynamic conformational ensembles in biomolecular recognition. Nat. Chem. Biol. 5, 789–796 (2009).
Parisi, G., Zea, D. J., Monzon, A. M. & Marino-Buslje, C. Conformational diversity and the emergence of sequence signatures during evolution. Curr. Opin. Struct. Biol. 32, 58–65 (2015).
Hrabe, T. et al. PDBFlex: exploring flexibility in protein structures. Nucleic Acids Res. 44, D423–D428 (2016).
Monzon, A. M., Rohr, C. O., Fornasari, M. S. & Parisi, G. CoDNaS 2.0: a comprehensive database of protein conformational diversity in the native state. Database 2016, baw038 (2016).
Saldaño, T. E., Monzon, A. M., Parisi, G. & Fernandez-Alberti, S. Evolutionary conserved positions define protein conformational diversity. PLoS Comput. Biol. 12, e1004775 (2016).
Monzon, A. M. et al. Conformational diversity analysis reveals three functional mechanisms in proteins. PLoS Comput. Biol. 13, e1005398 (2017).
Ellaway, J. I. J. et al. Identifying protein conformational states in the Protein Data Bank: toward unlocking the potential of integrative dynamics studies. Struct. Dyn. 11, 034701 (2024).
Dishman, A. F. & Volkman, B. F. Metamorphic protein folding as evolutionary adaptation. Trends Biochem. Sci. 48, 665–672 (2023).
Chakravarty, D., Schafer, J. W. & Porter, L. L. Distinguishing features of fold-switching proteins. Protein Sci. 32, e4596 (2023).
Tuinstra, R. L. et al. Interconversion between two unrelated protein folds in the lymphotactin native state. Proc. Natl. Acad. Sci. USA 105, 5057–5062 (2008).
Porter, L. L. & Looger, L. L. Extant fold-switching proteins are widespread. Proc. Natl. Acad. Sci. USA 115, 5968–5973 (2018).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Roney, J. P. & Ovchinnikov, S. State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101 (2022).
Del Alamo, D., Sala, D., Mchaourab, H. S. & Meiler, J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. eLife 11, e75751 (2022).
Heo, L. & Feig, M. Multi-state modeling of G-protein coupled receptors at experimental accuracy. Proteins 90, 1873–1885 (2022).
Stein, R. A. & Mchaourab, H. S. SPEACH_AF: sampling protein ensembles and conformational heterogeneity with Alphafold2. PLoS Comput. Biol. 18, e1010483 (2022).
Sala, D., Hildebrand, P. W. & Meiler, J. Biasing AlphaFold2 to predict GPCRs and kinases with user-defined functional or structural properties. Front. Mol. Biosci. 10, 1121962 (2023).
Wayment-Steele, H. K. et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 625, 832–839 (2024).
Kalakoti, Y. & Wallner, B. AFsample2 predicts multiple conformations and ensembles with AlphaFold2. Commun. Biol. 8, 373 (2025).
Hopf, T. A. et al. Three-dimensional structures of membrane proteins from genomic sequencing. Cell 149, 1607–1621 (2012).
Morcos, F., Jana, B., Hwa, T. & Onuchic, J. N. Coevolutionary signals across protein lineages help capture multiple protein conformations. Proc. Natl. Acad. Sci. USA 110, 20533–20538 (2013).
Uguzzoni, G. et al. Large-scale identification of coevolution signals across homo-oligomeric protein interfaces by direct coupling analysis. Proc. Natl. Acad. Sci. USA 114, E2662–E2671 (2017).
Schafer, J. W. & Porter, L. L. Evolutionary selection of proteins with two folds. Nat. Commun. 14, 5478 (2023).
Kamisetty, H., Ovchinnikov, S. & Baker, D. Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era. Proc. Natl. Acad. Sci. USA 110, 15674–15679 (2013).
Anishchenko, I., Ovchinnikov, S., Kamisetty, H. & Baker, D. Origins of coevolution between residues distant in protein 3D structures. Proc. Natl. Acad. Sci. USA 114, 9122–9127 (2017).
Rao, R. M. et al. MSA Transformer. in Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) vol. 139 8844–8856 (PMLR, 2021).
Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. in Proc. Second International Conference on Knowledge Discovery and Data Mining 226–231 (AAAI Press, 1996).
Chakravarty, D. et al. AlphaFold predictions of fold-switched conformations are driven by structure memorization. Nat. Commun. 15, 7296 (2024).
Ovchinnikov, S. et al. Protein structure determination using metagenome sequence data. Science 355, 294–298 (2017).
Lapedes, A. S. & Stormo, G. D. Correlated mutations in models of protein sequences: phylogenetic and structural effects. Lect. Notes-Monogr. Ser. 33, 236–256 (1999).
Thomas, J., Ramakrishnan, N. & Bailey-Kellogg, C. Graphical models of residue coupling in protein families. IEEE/ACM Trans. Comput. Biol. Bioinform. 5, 183–197 (2008).
Weigt, M., White, R. A., Szurmant, H., Hoch, J. A. & Hwa, T. Identification of direct residue contacts in protein–protein interaction by message passing. Proc. Natl. Acad. Sci. USA 106, 67–72 (2009).
Ishiura, M. et al. Expression of a gene cluster kaiABC as a circadian feedback process in cyanobacteria. Science 281, 1519–1523 (1998).
Chang, Y.-G. et al. A protein fold switch joins the circadian oscillator to clock output in cyanobacteria. Science 349, 324–328 (2015).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
MacQueen, J. Some methods for classification and analysis of multivariate observations. in Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics vol. 5 281–298 (University of California Press, 1967).
Johnson, L. S., Eddy, S. R. & Portugaly, E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinform. 11, 431 (2010).
Chakravarty, D. & Porter, L. L. AlphaFold2 fails to predict protein fold switching. Protein Sci. 31, e4353 (2022).
Piomponi, V., Cazzaniga, A. & Cuturello, F. Evolutionary constraints guide AlphaFold2 in predicting alternative conformations and inform rational mutation design. J. Chem. Inf. Model. 65, 9459–9468 (2025).
Sondka, Z. et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res. 52, D1210–D1217 (2024).
Ha, J. & Loh, S. N. Protein conformational switches: from nature to design. Chem. A Eur. J. 18, 7984–7999 (2012).
De Sanctis, J. et al. Lck function and modulation: Immune cytotoxic response and tumor treatment more than a simple event. Cancers 16, 2630 (2024).
Prakaash, D., Fagnen, C., Cook, G. P., Acuto, O. & Kalli, A. C. Molecular dynamics simulations reveal membrane lipid interactions of the full-length lymphocyte specific kinase (Lck). Sci. Rep. 12, 21121 (2022).
Hofmann, G. et al. Binding, domain orientation, and dynamics of the Lck SH3−SH2 domain pair and comparison with other Src-family kinases. Biochemistry 44, 13043–13050 (2005).
Pawlonka, J., Rak, B. & Ambroziak, U. The regulation of cyclin D promoters – review. Cancer Treat. Res. Commun. 27, 100338 (2021).
Wang, J. et al. Aberrant Cyclin D1 splicing in cancer: from molecular mechanism to therapeutic modulation. Cell Death Dis. 14, 244 (2023).
Bahar, M. E., Kim, H. J. & Kim, D. R. Targeting the RAS/RAF/MAPK pathway for cancer therapy: from mechanism to clinical studies. Signal Transduct. Target Ther. 8, 1–38 (2023).
Cuesta, C., Arévalo-Alameda, C. & Castellano, E. The importance of being PI3K in the RAS Signaling Network. Genes 12, 1094 (2021).
Yin, G. et al. Targeting small GTPases: emerging grasps on previously untamable targets, pioneered by KRAS. Signal Transduct. Target Ther. 8, 212 (2023).
Vetter, I. R. & Wittinghofer, A. The guanine nucleotide-binding switch in three dimensions. Science 294, 1299–1304 (2001).
Herrmann, C. Ras–effector interactions: after one decade. Curr. Opin. Struct. Biol. 13, 122–129 (2003).
Cherfils, J. & Zeghouf, M. Regulation of small GTPases by GEFs, GAPs, and GDIs. Physiol. Rev. 93, 269–309 (2013).
Parise, A., Cresca, S. & Magistrato, A. Molecular dynamics simulations for the structure-based drug design: targeting small-GTPases proteins. Expert Opin. Drug Discov. 19, 1259–1279 (2024).
Case, D. et al. Amber 2022 https://doi.org/10.13140/RG.2.2.31337.77924 (2022).
Szklarczyk, D. et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2022).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Wu, R. et al. High-resolution de novo structure prediction from primary sequence. Preprint at https://doi.org/10.1101/2022.07.21.500999 (2022).
Hayes, T. et al. Simulating 500 million years of evolution with a language model. Science 387, 850–858 (2025).
Barrio-Hernandez, I. et al. Clustering predicted structures at the scale of the known protein universe. Nature 622, 637–645 (2023).
Liu, S. et al. Assisting and accelerating NMR assignment with restrained structure prediction. Commun. Biol. 8, 1067 (2025).
Lu, W. et al. DynamicBind: predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model. Nat. Commun. 15, 1071 (2024).
Hu, Y. et al. Exploring protein conformational changes using a large-scale biophysical sampling augmented deep learning strategy. Adv. Sci. 11, 2400884 (2024).
The UniProt Consortium et al. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Steinegger, M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinform. 20, 473 (2019).
Hong, Y., Lee, J. & Ko, J. A-Prot: protein structure modeling using MSA transformer. BMC Bioinform. 23, 93 (2022).
Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. in Advances in Neural Information Processing Systems vol. 34 29287–29303 (Curran Associates, 2021).
Vorberg, S., Seemayer, S. & Söding, J. Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction. PLoS Comput. Biol. 14, e1006526 (2018).
Eastman, P. et al. OpenMM 7: rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 13, e1005659 (2017).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Mirabello, C., Wallner, B., Nystedt, B., Azinas, S. & Carroni, M. Unmasking AlphaFold to integrate experiments and predictions in multimeric complexes. Nat. Commun. 15, 8724 (2024).
Case, D. A. et al. AmberTools. J. Chem. Inf. Model. 63, 6183–6191 (2023).
Maier, J. A. et al. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 11, 3696–3713 (2015).
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).
Berendsen, H. J. C., Postma, J. P. M., Van Gunsteren, W. F., DiNola, A. & Haak, J. R. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684–3690 (1984).
Zwanzig, R. Nonlinear generalized Langevin equations. J. Stat. Phys. 9, 215–220 (1973).
Ryckaert, J.-P., Ciccotti, G. & Berendsen, H. J. C. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327–341 (1977).
Oh, K. J. & Deng, Y. An efficient parallel implementation of the smooth particle mesh Ewald method for molecular dynamics simulations. Comput. Phys. Commun. 177, 426–431 (2007).
Gowers, R. et al. MDAnalysis: a Python package for the rapid analysis of molecular dynamics simulations. in Proc. 15th Python in Science Conference 98–105 https://doi.org/10.25080/Majora-629e541a-00e (2016).
Scheurer, M. et al. PyContact: rapid, customizable, and visual analysis of noncovalent interactions in MD simulations. Biophys. J. 114, 577–583 (2018).
Vig, J. et al. BERTology meets biology: interpreting attention in protein language models. in 9th International Conference on Learning Representations, ICLR 2021 (2021).
Zhang, Y. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.
Li, S. et al. Dataset for EvoSplit. Zenodo https://doi.org/10.5281/zenodo.18334964 (2026).
Li, S. et al. Source code for EvoSplit. Zenodo https://doi.org/10.5281/zenodo.18335365 (2026).
Acknowledgements
This work was supported by the National Science and Technology Major Project (No. 2022ZD0115001 to Y.Q.G. and S.Liu), the National Natural Science Foundation of China (T2495221 to Y.Q.G.), and the New Cornerstone Science Foundation (NCI202305 to Y.Q.G.). We thank Prof. Lauren L. Porter for helpful discussions on fold-switching datasets. We thank Zhen Zhu for assistance with molecular simulations and Hao Chai for insightful discussions during the early stages of the project. We also thank Shiwei Li for assistance with scheme visualization.
Author information
Contributions
Y.Q.G. and S.L. (corresponding author) designed and developed overall concepts in the paper and supervised the project. S.L. (first author), C.Z., and L.K. developed the EvoSplit method. S.L. (first author), C.Z., and Y.X. performed data collection, evaluation, and analysis. S.L. (first author) wrote the initial draft of the manuscript. All authors contributed ideas to the work and assisted in manuscript editing and revision.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Chemistry thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, S., Zhang, C., Kong, L. et al. Disentangling coevolutionary constraints for modeling protein conformational heterogeneity. Commun Chem (2026). https://doi.org/10.1038/s42004-026-01940-9