Fig. 2: Evaluation of the performance of the pSAM model and identification of DNLs with residue-level contribution analysis. | Nature Communications

From: Deep learning prioritizes cancer mutations that alter protein nucleocytoplasmic shuttling to drive tumorigenesis

A pSAM shows significantly better predictive performance than existing tools on 1000 randomly selected proteins. B pSAM-predicted probabilities for nuclear-localized peptides whose NLS activity scores were measured previously. The data are presented as a box-and-whisker graph (bounds of box: first to third quartile; bottom and top lines: minimum to maximum; central line: median). NLS activity score classes: scores 9–10 (n = 57 peptides), scores 5–8 (n = 168 peptides) and scores 1–4 (n = 149 peptides). C Delta scores of amino acids located in known NLS regions compared with amino acids in randomly selected regions. Changes in nuclear localization probability were calculated for 8920 amino acids in NLSs and 8920 amino acids in randomly selected regions. The data are presented as a box-and-whisker graph (bounds of box: first to third quartile; bottom and top lines: minimum to maximum; central line: median). D Positions of the predicted DNL regions relative to the protein terminus. E Residue-level contributions to the nuclear localization probability of PML. F Residue-level contributions to the nuclear localization probability of TP53. G t-SNE distribution of predicted NLS-like and NES-like regions. H t-SNE distribution of known NLSs and NESs. I Numbers of known and predicted NLSs and NESs. J Domain analysis of NLS regions and NLS-like regions retrieved from NLSdb and pSAM. The error bands represent 95% confidence intervals. K Shuffling analysis of known NLSs and predicted determinants of nuclear localization (DNLs). For the 1752 selected proteins, the matched wild-type (WT) sequence, the validated-NLS-truncated sequence, the DNL-truncated sequence and a randomly truncated sequence were compared. The data are presented as a box-and-whisker graph (bounds of box: first to third quartile; bottom and top lines: minimum to maximum; central line: median). L Coverage of known and predicted targeting peptides for nuclear localization.
A two-sided Kruskal–Wallis test was used for (B), a two-sided Student’s t test for (C), a two-sided Pearson correlation test for (J) and a two-sided Wilcoxon test for (K). Source data are provided as a Source Data file.
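Panels C, E, F and K rest on the change in predicted nuclear localization probability when residues are altered or a region is removed. The caption does not give pSAM's exact definition of the delta score, so the following is only a minimal sketch, assuming a delta score is the wild-type probability minus the probability after a single-residue substitution (or after deleting a region, as in the truncation comparison of K). The names `toy_nuclear_probability`, `residue_delta_scores` and `truncation_delta` are hypothetical, and the toy predictor is a stand-in scoring basic-residue content, not the pSAM model.

```python
import math
from typing import Callable, List

Predictor = Callable[[str], float]


def toy_nuclear_probability(seq: str) -> float:
    """Hypothetical stand-in for a trained classifier such as pSAM.

    Maps the fraction of basic residues (K/R) through a logistic curve.
    It only illustrates the delta-score bookkeeping; it is NOT pSAM.
    """
    if not seq:
        return 0.0
    basic_fraction = sum(aa in "KR" for aa in seq) / len(seq)
    return 1.0 / (1.0 + math.exp(-20.0 * (basic_fraction - 0.15)))


def residue_delta_scores(seq: str, predict: Predictor, mask: str = "A") -> List[float]:
    """Assumed residue-level contribution: P(WT) - P(residue i replaced by `mask`).

    Positive values mean the residue supports nuclear localization
    under the given predictor.
    """
    p_wt = predict(seq)
    deltas = []
    for i in range(len(seq)):
        mutant = seq[:i] + mask + seq[i + 1:]
        deltas.append(p_wt - predict(mutant))
    return deltas


def truncation_delta(seq: str, start: int, end: int, predict: Predictor) -> float:
    """Drop in predicted probability when residues [start, end) are deleted."""
    return predict(seq) - predict(seq[:start] + seq[end:])


if __name__ == "__main__":
    # Illustrative sequence with the SV40 NLS (PKKKRKV) at positions 5-11.
    seq = "MASDEPKKKRKVEDPGSA"
    for aa, d in zip(seq, residue_delta_scores(seq, toy_nuclear_probability)):
        print(f"{aa}\t{d:+.3f}")
    print("NLS deleted:   ", truncation_delta(seq, 5, 12, toy_nuclear_probability))
    print("Other deleted: ", truncation_delta(seq, 0, 5, toy_nuclear_probability))
```

With a real model, `predict` would wrap its inference call; the substitution residue (`mask`) and whether to substitute or delete are design choices that change the scale of the deltas, which is why this sketch keeps both as parameters.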

Back to article page