Fig. 2: Development of ML-based MS2 filtering. | Nature Communications


From: Implementing N-terminomics and machine learning to probe Nt-arginylation


a Assessment of prediction accuracy for MS2 spectra with varying training methods. Plotted are the distribution of Pearson’s correlation coefficient (PCC) of PSMs for Arg-starting peptides according to prediction model (left), the percentage of PSMs with PCC ≥ 0.9 (center), and the fragment ion species-specific PCC (right). Box plots in this figure show the median (center line), interquartile range (IQR; box limits), and whiskers extending to 1.5 × IQR; outliers beyond this range are shown as individual points. b PCCs obtained by comparing the PSMs searched as Nt-arginylated peptides to the predicted spectra generated using the fine-tuned MS2 prediction model. Bin size: 0.025. c Box plots comparing ion species-specific PCC values for b-ions and y-ions. n indicates the number of PSMs for putative Nt-arginylated peptides in each PCC score group of (b). d Discriminatory power of similarity measures between true positive and false positive Nt-arginylation PSMs, shown as receiver-operating characteristic (ROC) curves. The PSMs were generated using a decoy database specialized for Nt-arginylation search. PCC Pearson’s correlation coefficient, COS cosine similarity, SPC Spearman’s correlation coefficient, spec_FNR spectral false negative rate, spec_FPR spectral false positive rate, percolator percolator score. e Cumulative false discovery rate (FDR) by order of percolator score and PCC similarity score. A PSM with a lower index has a higher score. f Distribution of ion species-specific PCC values for b-ions and y-ions. n indicates the number of PSMs for putative Nt-arginylated peptides divided into two groups based on the PCC score of (e) corresponding to 1% FDR.
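The similarity scores and the cumulative FDR curve named in the caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy intensity vectors, and the decoys-over-targets FDR estimate are all assumptions made for illustration, since the caption does not state how these quantities were computed.

```python
import math


def pearson(observed, predicted):
    """Pearson's correlation coefficient (PCC) between two intensity vectors."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return float("nan")  # undefined for a constant vector
    return cov / math.sqrt(var_o * var_p)


def cosine(observed, predicted):
    """Cosine similarity (COS) between two intensity vectors."""
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0 or norm_p == 0:
        return float("nan")
    return dot / (norm_o * norm_p)


def cumulative_fdr(scores, is_decoy):
    """Cumulative FDR over PSMs ranked from highest to lowest score.

    Assumption: FDR is estimated as (#decoys so far) / (#targets so far),
    a common target-decoy estimate; the caption does not specify the formula.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    decoys = 0
    targets = 0
    fdr = []
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr.append(decoys / max(targets, 1))
    return fdr


# Toy data (hypothetical): matched fragment ion intensities for one PSM.
observed = [0.10, 0.80, 0.35, 0.00, 0.55]
predicted = [0.12, 0.75, 0.40, 0.02, 0.50]
pcc = pearson(observed, predicted)
cos = cosine(observed, predicted)

# Toy PSM list (hypothetical): similarity scores with target/decoy labels.
psm_scores = [0.95, 0.91, 0.72, 0.60]
psm_is_decoy = [False, False, True, False]
fdr_curve = cumulative_fdr(psm_scores, psm_is_decoy)
```

In this sketch, a PSM with a lower rank index has a higher score, matching the ordering described for panel e; a score cutoff at 1% FDR would be the lowest-ranked position where the cumulative estimate still stays at or below 0.01.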
