Fig. 5: Evaluation of PSMutPred scores across ClinVar (ref. 64) variants.
From: Decoding Missense Variants by Incorporating Phase Separation via Machine Learning

a–d Comparison of variants’ Pearson correlation between groups. The groups are a PS-prone group (83 known PS proteins, 1451 variants) and a low-PS-prone group (8528 proteins, 84,840 variants), both defined by known PS proteins, and a predicted PS-prone group (1276 proteins, 30,889 variants) and a predicted low-PS-prone group (7335 proteins, 56,853 variants). Two-tailed P-values were computed with the SciPy pearsonr function (*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; NS, not significant). a Comparison between the PS-prone group and the low-PS-prone group. P-values are 1.7e-8, 5.8e-8, 2.6e-4, 3.4e-5, 2.4e-14, and 1.2e-275, respectively. b Comparison between the predicted PS-prone group and the predicted low-PS-prone group. P-values are 9.9e-82, 0.01, 4.9e-53, 0.02, 3.6e-198, and 7.7e-223, respectively. c Comparison between variants located in IDRs (n = 15,427) and domains (n = 15,462) within the predicted PS-prone group. P-values are 6.4e-61, 2.8e-13, 1.7e-23, 6.0e-37, 1.5e-149, and 1.4e-107, respectively. d Comparison between variants from neurodegenerative disease (ND)-related proteins (19 proteins, n = 252) and variants from other proteins (non-ND) within the predicted PS-prone group. P-values are 0.88, 3.1e-8, 0.005, and 2.6e-95, respectively. e AUROC scores of PSMutPred-IP models for pathogenicity prediction of IDR missense variants from the PS-prone group (n = 489 variants). f Evaluation parallel to e, but for the predicted PS-prone group (n = 8188). g Comparison of the proportion values defined by different PSMutPred-IP models: IP-RF (top), IP-LR (middle), and IP-SVR (bottom). The PS-prone group is compared with the low-PS-prone group on the left (PS proteins), and the predicted PS-prone group with the predicted low-PS-prone group on the right (Predicted-PS proteins).
Differences are based on the two-sample Kolmogorov–Smirnov D statistic, with positive values indicating higher proportions in the PS-prone group and negative values indicating higher proportions in the low-PS-prone group. Source data are provided as a Source Data file.
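
For readers who want to reproduce the kind of statistics named in this caption, the following is a minimal sketch rather than the authors' analysis code. It uses synthetic stand-in scores (not the paper's data) to show a two-tailed Pearson P-value and a signed two-sample Kolmogorov–Smirnov D. The variable names, the helper `signed_ks`, and the sign convention (positive when the first group tends toward higher values) are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, ks_2samp

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-variant scores; NOT the paper's data.
ps_scores = rng.normal(0.3, 1.0, size=500)    # hypothetical PS-prone group
low_scores = rng.normal(0.0, 1.0, size=2000)  # hypothetical low-PS-prone group

# Pearson correlation between the scores and a second per-variant quantity
# (here a synthetic one). pearsonr reports a two-sided P-value by default.
other = 0.5 * ps_scores + rng.normal(0.0, 1.0, size=ps_scores.size)
r, p_value = pearsonr(ps_scores, other)


def signed_ks(a, b):
    """Signed two-sample KS D (hypothetical helper).

    Returns the empirical-CDF difference F_b - F_a of largest magnitude,
    so the result is positive when `a` tends toward higher values than `b`.
    This sign convention is an assumption, not taken from the paper.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    diff = cdf_b - cdf_a
    return diff[np.argmax(np.abs(diff))]


d_signed = signed_ks(ps_scores, low_scores)
d_unsigned = ks_2samp(ps_scores, low_scores).statistic

print(f"Pearson r = {r:.3f}, two-tailed P = {p_value:.2e}")
print(f"signed KS D = {d_signed:.3f}, unsigned KS D = {d_unsigned:.3f}")
```

The magnitude of the signed value matches the usual unsigned D returned by `ks_2samp`; only the sign carries the extra information about which group's distribution is shifted upward.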