Fig. 3: Development of ML-based RT filtering and fragment-mass-error-based statistical filtering.
From: Implementing N-terminomics and machine learning to probe Nt-arginylation

a RT of Arg-starting peptides predicted by the RT prediction model versus observed RT. Blue line, fitted linear regression; red dashed lines, Δt95% region. b Predicted RT of the Arg-starting peptides of (a) with Arg replaced by Gly-Val. Only 41 out of 2567 PSMs remain within the Δt95% RT interval. c Deviation between the predicted and observed RT of Arg-starting peptides in which Arg is replaced by other amino acids or dipeptides. The distributions are expressed as cumulative fractions. Dashed line, RT at Δt95%. d Predicted versus observed RT of the putative Nt-arginylated peptides obtained by tandem database search. Red dashed lines indicate the Δt95% region of the RT model. e A histogram of Nt-arginylated peptides as a function of the deviation between observed and predicted RT. f Distributions of precursor mass errors for PSMs that fit (within) and do not fit (outside) the RT model. A two-sample Kolmogorov–Smirnov (K–S) test shows that the two distributions are different. D, distance statistic; P, P-value. g Average mass errors of b- and y-fragment ions in each PSM of the putative Nt-arginylated peptides. h A histogram of Nt-arginylated peptides as a function of P-values obtained from Student's t-test comparing b-ion errors and y-ion errors. The PSMs are divided into two groups based on P-values. Dashed lines, median of each group.
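The RT filter in panels a, b and d can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that Δt95% is the symmetric window around the regression line that contains 95% of the deviations between observed and calibrated predicted RT, and that observed RT is regressed on predicted RT. The function names `fit_line`, `half_width` and `rt_filter` are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def half_width(residuals, coverage=0.95):
    """Smallest |residual| cutoff covering at least `coverage` of the residuals.

    Assumption: with coverage=0.95 this plays the role of half of the
    Δt95% window used in the figure.
    """
    ranked = sorted(abs(r) for r in residuals)
    k = math.ceil(coverage * len(ranked)) - 1
    return ranked[max(k, 0)]


def rt_filter(observed, predicted, slope, intercept, cutoff):
    """Flag each PSM: True if its observed RT lies within the window."""
    return [
        abs(obs - (slope * pred + intercept)) <= cutoff
        for obs, pred in zip(observed, predicted)
    ]
```

In this sketch, the line and the cutoff would be derived from a trusted calibration set (such as the Arg-starting peptides of panel a) and then applied to the candidate PSMs, so that a candidate whose observed RT falls outside the window is treated as not fitting the RT model.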