Fig. 6: Analysis of phase separation-related feature contributions to pathogenicity prediction. | Nature Communications


From: Decoding Missense Variants by Incorporating Phase Separation via Machine Learning


a–e Pathogenicity prediction performance of the model combining EVE with PS-related features. a AUROC (left) and AUPR (right) on the independent test set (n = 15,394). The purple line represents the model trained with both EVE and PS features; the green line represents the EVE score alone. b AUROC (left) and AUPR (right) for the subset of variants from (a) located within IDRs (n = 5656). c, d Divergence between the predicted score distributions of standalone EVE (green) and the combined model (purple), quantified using a two-sided Mann–Whitney U test on the independent test set (****P < 0.0001; P-values are 2.4e-27, 1.7e-15, 9.6e-59, and 7.2e-293, respectively). The boxplot components within each violin plot are, from top to bottom, maxima, upper quartile, median, lower quartile, and minima. c Score distributions for pathogenic-prone variants (pathogenic and likely pathogenic, n = 2044; left) and benign-prone variants (benign and likely benign, n = 3612; right) located in IDRs. d The same evaluation as in (c), for variants located in Domains (n = 6665 pathogenic or likely pathogenic; n = 3073 benign or likely benign). e Evaluation of IDR variants with high AlphaFold2 pLDDT scores (pLDDT ≥ 70, n = 2763) and low pLDDT scores (pLDDT < 50, n = 2407). f–i Pathogenicity prediction performance of the model combining ESM1b with PS-related features. f Evaluation of the model trained with ESM1b and PS features using 5-fold cross-validation on the ClinVar dataset (n = 140,321). g Evaluation of IDR variants with high AlphaFold2 pLDDT scores (pLDDT ≥ 70, n = 36,032) and low pLDDT scores (pLDDT < 50, n = 25,755). h, i Pathogenicity prediction for 1,015,769 ClinVar VUSs by combining PS features with ESM1b scores. Source data are provided as a Source Data file.
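
The metrics named in the caption can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: the function names and the toy scores below are hypothetical and chosen only for illustration. It shows how the Mann–Whitney U statistic relates to AUROC (AUROC equals U divided by the number of pathogenic-benign pairs) and how an average-precision estimate of AUPR can be computed. It returns the U statistic only and does not compute the P-value.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(pos: Sequence[float], neg: Sequence[float]) -> float:
    """U statistic for the positive group, computed from its rank sum."""
    ranks = average_ranks(list(pos) + list(neg))
    rank_sum_pos = sum(ranks[: len(pos)])
    return rank_sum_pos - len(pos) * (len(pos) + 1) / 2


def auroc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    return mann_whitney_u(pos, neg) / (len(pos) * len(neg))


def average_precision(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Step-wise AUPR estimate (average precision); ties are not specially handled."""
    labeled = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    labeled.sort(key=lambda t: -t[0])  # highest predicted score first
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(labeled, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / len(pos)


# Toy scores (made up, not from the paper): higher means more pathogenic.
pathogenic_scores = [0.9, 0.8, 0.6, 0.4]
benign_scores = [0.7, 0.3, 0.2, 0.1]

print("U     =", mann_whitney_u(pathogenic_scores, benign_scores))
print("AUROC =", auroc(pathogenic_scores, benign_scores))
print("AUPR  =", average_precision(pathogenic_scores, benign_scores))
```

In practice a statistics library would normally be used for the test itself, since it also provides the two-sided P-value; the sketch above only makes the relationship between the rank-based statistic and AUROC explicit.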
