Fig. 1: Description of Cfold and test set results. | Nature Communications

From: Structure prediction of alternative protein conformations

a Description of Cfold. Using all monomeric protein structures in the PDB, we create a conformational split of structural clusters. Two structures are treated as different conformations when they differ by >0.2 in TM-score over identical sequence regions. A structure prediction network is trained on one partition of conformations, and the remaining structural clusters of conformations are held out for evaluation. The network is similar to the Evoformer of AlphaFold2 (Methods). It has two tracks: the MSA track, which processes the multiple sequence alignment (MSA) representation, and the pair track, which processes the amino acid pair representation. At training time, one coevolutionary representation is created to predict one structure. At inference (the MSA clustering strategy is displayed here), the trained Evoformer network is used to predict alternative protein conformations by creating many different coevolutionary representations (orange/green). These are made by sampling and clustering different sequences from the full MSA.

b TM-score distributions for conformations seen during training (train) and not seen during training (test) for the two strategies (n = 145 and 154 for dropout and MSA clustering, respectively). MSA clustering yields slightly higher TM-scores than dropout. For each method, the best TM-score was selected out of approximately 100 samples (Methods). The black boxes encompass the data quartiles, the white dots mark the median of each distribution, and the black lines span the minimum and maximum values.

c Density plot of the TM-score to conformations in the training set vs. the TM-score to unseen conformations, using the best strategy (MSA clustering). The higher the density, the darker the colour. Only structures that could be predicted with a TM-score >0.8 for both train and test conformations among the samples taken are displayed (52 structures, n = 5408 sampled predictions). Predicted structures corresponding to lysine acetyltransferase PDB IDs 4AVA/4AVB (blue/grey) are shown.
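
The sampling idea in panel a and the best-of-samples selection in panel b can be illustrated with a minimal Python sketch. It is not the Cfold implementation: the function names `sample_msa_subsets` and `best_tm_score`, the integer-encoded MSA array, and the choice of plain random row sampling (without the subsequent clustering step) are all assumptions made for illustration, and the TM-scores are taken as given rather than computed.

```python
import numpy as np


def sample_msa_subsets(msa, n_samples, n_rows, seed=0):
    """Draw several random row subsets of an MSA (hypothetical helper).

    msa is assumed to be an array of shape (n_sequences, length) whose
    first row is the query sequence. Every subset keeps the query row.
    """
    rng = np.random.default_rng(seed)
    msa = np.asarray(msa)
    n_seqs = msa.shape[0]
    k = min(n_rows, n_seqs)
    subsets = []
    for _ in range(n_samples):
        # Pick k-1 distinct non-query rows, then put the query row first.
        others = rng.choice(np.arange(1, n_seqs), size=k - 1, replace=False)
        rows = np.concatenate(([0], np.sort(others)))
        subsets.append(msa[rows])
    return subsets


def best_tm_score(tm_scores):
    """Return (index, score) of the highest TM-score among the samples."""
    scores = np.asarray(tm_scores, dtype=float)
    best = int(np.argmax(scores))
    return best, float(scores[best])
```

In this sketch, each subset would be fed to the trained network to give one predicted structure, and `best_tm_score` would then be applied to the TM-scores of those predictions against a reference conformation, mirroring the roughly 100 samples per method described in the caption.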
