Extended Data Fig. 2: nuORFdb benchmarking.
From: Unannotated proteins expand the MHC-I-restricted immunopeptidome in cancer

a. Spectra search times (y axis) for the HLA-A*02:01 sample with different databases (x axis). b-c. nuORFdb minimizes the loss of sensitivity for annotated peptides, while enabling discovery of nuORF peptides. Number of annotated peptides (b) and nuORF peptides (c) discovered (y axis) across four databases (x axis). d. nuORFdb spectra mapping has the lowest %FDR among the three databases. %FDR for nuORF peptides (y axis) across databases (x axis). Global FDR for all peptides was set to 1%. e. nuORF peptides are discovered across multiple databases. Number of nuORF peptides unique to or shared across databases (y axis), as indicated by the black circles below (x axis). Bars on the bottom left indicate the total number of nuORF peptides discovered using each database. f. Ratios of nuORF types discovered vary depending on the database used for spectra mapping. Proportion of nuORFs of different types (y axis) in the set of nuORFs discovered by all three databases (Shared), using each database, or those specific to each database and not found by others (x axis). g. ORFs discovered using different databases vary in RNA-seq and Ribo-seq read coverage. Percent of annotated (UCSCdb) or nuORF (other databases) peptides with >0 reads (y axis) discovered using the four databases, or discovered uniquely by a database (x axis). h-k. MS spectrum mapping to the correct peptide sequence is more challenging using RNAdb and TransDb. h. Distribution of the number of considered matches for each spectrum across four databases. i. Difference between Spectrum Mill score for the top ranked (Rank1) and second best (Rank2) peptide sequences (y axis) across databases (x axis). n = 11007 (UCSC), 155 (Shared), 253 (nuORFdb), 68 (nuORFdb specific), 320 (RNAdb), 64 (RNAdb specific), 389 (TransDb), 149 (TransDb specific). Median, 25th and 75th percentiles (box range), and 1.5 × IQR (whiskers) are shown. j. Distribution of the HLAthena-predicted binding score (MSi) (left) and percent of peptides with MSi score >= 0.8 (red line on the left) (x axis) across databases (y axis). k. Predicted hydrophobicity index (y axis) and retention time (x axis) of peptides discovered using different databases for the HLA-A*24:02 sample.