Fig. 2: IFUM performance on Mega-scale and other datasets. | Nature Communications

From: Protein folding stability estimation with explicit consideration of unfolded states

a Scatter plot of IFUM-predicted ΔG (ΔGpred) versus experimental ΔG (ΔGexp) for the Mega-scale test set. ΔG values are clamped between –1 and 5 kcal/mol, the reported experimental dynamic range²⁰. Color intensity indicates point density (bright: high density; dark: low density). The model achieved a Pearson correlation coefficient (PCC) of 0.78 and a root mean square error (RMSE) of 1.16 kcal/mol on this dataset. b, c PCC and RMSE comparisons. b IFUM compared with IFUMbaseline, ESMtherm, ESM2 pseudo-perplexity (ESM2pppl), and protein length as a baseline (baseline-length) on the Mega-scale Common set (see “Methods”). c IFUM compared with the IFUMbaseline model, which is trained without unfolded-state modeling, and baseline-length on the Mega-scale test set. d Example predictions for the Q5L and V16D mutants of the HECTD1 CPH domain (PDB: 3DKM). Left: overlay of ESMFold-predicted structures for the Q5L (blue) and V16D (orange) mutants, annotated with their ΔGpred and ΔGexp values. The mutation sites (Q5L and V16D) are each indicated with a dotted circle. The corresponding predicted equilibrium ensemble distograms are shown in a heatmap (orange: V16D; blue: Q5L). e Histogram comparing ΔGpred values between the stable (ΔGexp ≥ 5 kcal/mol) and unstable (ΔGexp ≤ –1 kcal/mol) subsets of the Mega-scale test set. The x-axis labels denote ΔGpred ranges, where parentheses indicate an exclusive boundary and square brackets an inclusive boundary. f Scatter plot of ΔGpred versus ΔGexp for 57 unique wild-type proteins from literature data. Marker size corresponds to protein length, and color to the ESMFold-predicted Local Distance Difference Test (pLDDT) score. The PCC rises from 0.53 to 0.97 as a higher pLDDT cutoff is applied. The dashed line indicates perfect agreement (y = x). g Histogram comparing ΔGpred values between the CATH and DisProt datasets (two-sided Welch’s t-test, p ≪ 0.001). Source data are provided as a Source data file.
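The metrics named in this caption (clamping ΔG to the –1 to 5 kcal/mol range, PCC, RMSE, PCC above a pLDDT cutoff, and Welch's t-test) can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation code. The function names are hypothetical, and applying the clamp to both predicted and experimental values before scoring is an assumption. The sketch returns only Welch's t statistic and degrees of freedom; for the p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same two-sided Welch's test.

```python
import math
from statistics import mean, variance

# Experimental dynamic range quoted in the caption (kcal/mol).
DG_MIN, DG_MAX = -1.0, 5.0


def clamp(values, lo=DG_MIN, hi=DG_MAX):
    """Clip each ΔG value into [lo, hi] (assumed to apply to pred and exp alike)."""
    return [min(max(v, lo), hi) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(pred, exp):
    """Root mean square error, in the units of the inputs (kcal/mol here)."""
    return math.sqrt(mean((p - e) ** 2 for p, e in zip(pred, exp)))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pcc_above_cutoff(pred, exp, plddt, cutoff):
    """PCC restricted to proteins whose pLDDT score is at least `cutoff`."""
    keep = [i for i, s in enumerate(plddt) if s >= cutoff]
    return pearson([pred[i] for i in keep], [exp[i] for i in keep])


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the figure.
    dg_pred = clamp([0.2, 1.5, 2.8, 4.1, 6.3])
    dg_exp = clamp([-1.4, 1.0, 3.1, 3.9, 5.0])
    print(f"PCC = {pearson(dg_pred, dg_exp):.2f}, "
          f"RMSE = {rmse(dg_pred, dg_exp):.2f} kcal/mol")
```

Raising the `cutoff` argument of `pcc_above_cutoff` mirrors the panel f analysis in spirit: low-confidence structures are dropped before the correlation is computed.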