Fig. 2: Expert manual evaluation of literature reported unannotated protein detections in mass spectrometry datasets. | Nature Communications

From: Community benchmarking and evaluation of human unannotated microprotein detection by mass spectrometry based proteomics

A Counts of each pair of ratings among the PSMs that were assessed by two evaluators (n = 155). The Pearson correlation between paired ratings is indicated. B For a set of manually evaluated PSMs (n = 274), the spectrum was also predicted using several machine learning models (see “Methods”). The spectral angle indicates how much the observed spectrum differs from the closest predicted spectrum, with larger angles indicating a worse match. The best spectral angles are shown for PSMs grouped by evaluator rating. The box in each boxplot indicates the interquartile range between the first and third quartiles, while the center line indicates the median. The whiskers indicate minima and maxima within 1.5 times the interquartile range. C Mean ± standard error of ratings of PSMs sampled from each study, for each of six evaluators (n = 620 rated PSMs in total). Standard errors were corrected for finite population (the total count of reported PSMs supporting unannotated proteins in the study). Ratings were given on a 1–5 scale. D Overall distribution of ratings for unannotated protein PSMs among all studies and evaluators (n = 620 PSMs). Bars indicate proportions ± standard errors. E Log Ribo-Seq read counts for ORFs expressing proteins in PSMs rated highly (rating > 3, n = 65 proteins) or lowly (rating < 3, n = 105 proteins). Reads are from a collection of human Ribo-Seq studies (see “Methods”). Boxplot elements are as defined in B. Differences between group means were tested using a two-sided permutation test. F Predicted lengths of proteins rated highly (rating > 3, n = 65 proteins) or lowly (rating < 3, n = 105 proteins). Boxplot elements are as defined in B. Differences between group means were tested using a two-sided permutation test. G Evaluated and extrapolated counts (± SEM) of HLA and non-HLA high-rated (rating of 4 or 5) protein detections. Extrapolated counts give the number of high-rated protein detections expected if the entire dataset had been evaluated. Source data are provided as a Source Data file.
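
The caption names three statistics without giving formulas: the spectral angle (B), the finite-population-corrected standard error (C), and the two-sided permutation test on group means (E, F). The Python sketch below illustrates one common form of each and is not the authors' implementation. It assumes that the spectral angle is the arccosine of the cosine similarity between aligned fragment intensity vectors, that the correction is the standard factor sqrt((N - n) / (N - 1)), and that the permutation test shuffles group labels and compares absolute differences in means. All function and parameter names are hypothetical.

```python
import math
import random


def spectral_angle(observed, predicted):
    """Angle in radians between two aligned intensity vectors.

    Assumed definition: arccos of cosine similarity, so 0 means the
    spectra point in the same direction and larger values are worse.
    """
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(
        sum(p * p for p in predicted)
    )
    if norm == 0:
        raise ValueError("intensity vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def fpc_standard_error(sample, population_size):
    """Standard error of the mean with a finite population correction."""
    n = len(sample)
    if n < 2 or population_size < n:
        raise ValueError("need n >= 2 and population_size >= n")
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(variance / n) * fpc


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    size_a = len(group_a)

    def mean_gap(values):
        a, b = values[:size_a], values[size_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    pooled = list(group_a) + list(group_b)
    observed = mean_gap(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        if mean_gap(pooled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

With this form of the correction, the standard error shrinks to zero when every reported PSM in a study has been rated (n = N) and approaches the usual s / sqrt(n) when the sample is a small fraction of the study's PSMs.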
