Fig. 2: A logistic regression model, PepScore, predicts the stable probability of noncanonical peptides. | Nature Communications

From: Widespread stable noncanonical peptides identified by integrated analyses of ribosome profiling and ORF features

a Overview of data analysis steps.
b Distribution of PhyloCSF scores of stable (N = 343) vs. undetectable (N = 100) microproteins. The two-sided Wilcoxon Rank Sum Test P-value is shown.
c Cumulative lengths of stable vs. undetectable microproteins. The stable microproteins were divided into three groups by PhyloCSF score, and each group was compared with the undetectable peptides: < −5 (N = 73, P = 2 × 10⁻¹⁰); ≥ −5 and ≤ 0 (N = 37, P = 3 × 10⁻¹²); > 0 (N = 233, P = 6 × 10⁻²⁴). The P-values were calculated using the two-sided Wilcoxon Rank Sum Test.
d The expected ORF lengths at different FDRs based on randomized transcript and genome sequences. Transcripts were grouped by length range for the calculation.
e The FDRs of observed ORF lengths in stable vs. undetectable microproteins. As in (c), the stable microproteins were divided into three groups by PhyloCSF score, and each group was compared with the undetectable peptides: < −5 (N = 73, P = 3 × 10⁻¹¹); ≥ −5 and ≤ 0 (N = 37, P = 7 × 10⁻¹³); > 0 (N = 233, P = 1 × 10⁻²⁵). The P-values were calculated using the two-sided Wilcoxon Rank Sum Test.
f The logistic regression model PepScore classifies stable vs. undetectable microproteins. The coefficients and P-values of the training parameters are shown.
g The PepScore distribution of the indicated peptide groups. The boxes are bounded by the 25th and 75th percentiles, and the center line represents the median. The whiskers extend 1.5× the interquartile range from each edge of the box. N = 99 for undetectable microproteins and N = 67 for stable ones from annotated lncRNAs; N = 273 for RefSeq-defined proteins <100 aa, N = 4318 for proteins between 100 aa and 200 aa, and N = 27,566 for proteins >200 aa. The P-values calculated using the two-sided Wilcoxon Rank Sum Test are shown.
h The ROC curve showing the performance of PepScore in classifying stable vs. undetectable microproteins. The AUROC value is shown.
i AUROC values for various models using different parameters to classify stable vs. undetectable microproteins.
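
Panels h and i summarize classifier performance as AUROC, while most other panels report Wilcoxon Rank Sum Test P-values. The two are closely related: AUROC equals the rank-sum (Mann-Whitney) U statistic divided by the number of positive/negative pairs. The following is a minimal sketch of that relationship, not the authors' code; the function name `auroc` and the example scores are hypothetical and are not taken from the paper.

```python
def auroc(pos_scores, neg_scores):
    """Return AUROC: the probability that a randomly chosen positive
    (e.g. stable) item scores higher than a randomly chosen negative
    (e.g. undetectable) item, counting ties as one half."""
    # Pool the scores, remembering which class each one came from.
    labeled = [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores]
    labeled.sort(key=lambda pair: pair[0])

    # Assign 1-based ranks, giving tied scores their average rank.
    ranks = [0.0] * len(labeled)
    i = 0
    while i < len(labeled):
        j = i
        while j + 1 < len(labeled) and labeled[j + 1][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos, n_neg = len(pos_scores), len(neg_scores)
    # Rank sum of the positive class, as in the Wilcoxon rank-sum test.
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, labeled) if label == 1)
    # Mann-Whitney U for the positive class, normalized by the pair count.
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Made-up classifier scores, for illustration only.
    stable = [0.9, 0.8, 0.6, 0.4]
    undetectable = [0.7, 0.3, 0.2]
    print(f"AUROC = {auroc(stable, undetectable):.3f}")
```

An AUROC of 0.5 corresponds to scores that do not separate the two groups at all, and 1.0 to perfect separation, which is why the same rank statistic can serve both as a test of group difference and as a measure of classifier performance.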
