Table 1 Properties of reanalyzed studies

From: Community benchmarking and evaluation of human unannotated microprotein detection by mass spectrometry based proteomics

| Citation | sORF database size | Considered noncanonical PSMs | Reported sORFs with MS support | HLA or non-HLA | Source material | Annotation definition | Reported false discovery rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cao et al. 44 | Three-frame translation of transcriptome | 28 | 17 | non-HLA | HEK293T | Human UniProtKB 2019 | 1% at peptide and protein level, proteome-wide |
| Bogaert et al. 28 | 16,919 | 8 | 6 | non-HLA | HEK293T cellular cytosol | Human UniProtKB/Swiss-Prot 2021 | <1% peptide, <2.5% protein, proteome-wide |
| Chothani et al. 4 | 7767 | 5763 | 614 | non-HLA | NHDF and HUVEC (Slany et al. 66), ES (Shekari et al. 67), Heart (Doll et al. 68) | Human UniProtKB 2017 | 1% PSM level, unannotated specific |
| Duffy et al. 34 | 38,187 | 2445 | 366 | non-HLA | Adult brain, Prenatal brain, hESC-derived neurons | Human UniProtKB | 1% at peptide and protein level, proteome-wide |
| Douka et al. 40 | 45 | 18 | 8 | non-HLA | SH-SY5Y cells (Murillo et al. 69 and Brenig et al. 70) | Human UniProtKB 2019 | 10% at peptide level, proteome-wide |
| Prensner et al. 33 | 553 | 6236 | 140 | HLA and non-HLA | 14 published mass spectrometry datasets | UCSC RefSeq | 1% at PSM level, proteome-wide |
| Ouspenskaia et al. 29 | 237,437 | 9985 | 4903 | HLA and non-HLA* | Lymphoblastoid cell line (Sarkizova et al. 71), patient-derived melanoma cell line, patient-derived glioblastoma cell line (Shraibman et al. 72), chronic lymphocytic leukemia tumor, ovarian carcinoma, renal cell carcinoma | Annotated genes on UCSC Genome Browser hg19 | 1% at PSM level, class-specific FDR for each type of unannotated ORF (e.g., uORF, dORF) |
| Chen et al. 35 | 7824 | 33 | 12 | HLA and non-HLA† | iPSCs | Human UniProtKB | 1% at PSM level, proteome-wide |
| Chong et al. 32 | Three-frame translation of transcriptome | 2597 | 384 | HLA | Patient-derived melanoma cell lines and lung cancer samples with matched normal tissues | Human UniProtKB/TrEMBL 2018 | Class-specific FDR for unannotated, keep only PSMs identified by both Comet and MaxQuant. Estimated FDR < 0.001% |
| Martinez et al. 42 | 7554 | 1160 | 319 | HLA | Six cancer cell lines from Bassani-Sternberg 2015 (25576301): B-cells EBV transformed, B-cell leukemia, basal like breast cancer, colon carcinoma, primary fibroblast | Human UniProtKB/Swiss-Prot | 1% FDR at peptide level, proteome-wide |
| van Heesch et al. 6 | 1598 | 1942 | 500 | non-HLA | Heart (Doll et al., 29133944), iPSC-derived cardiomyocytes | Human UniProtKB 2017 | 1% targeted FDR, 50-60% estimated FDR |
| Lu et al. 59 | 2969 | 964 | 308 | non-HLA | Cell lines: lung, colorectal cancer, liver cancer, cervical cancer | Human UniProtKB/Swiss-Prot | 1% FDR at PSM, peptide and protein level |

  1. *Only HLA spectra were evaluated. †Only non-HLA spectra were evaluated.
  2. List of all studies reanalyzed. sORF database size indicates the number of sORFs in the protein sequence database in the MS analysis for each study. The number of these ORFs with proteomic support according to the study is also given. Considered noncanonical PSMs is the number of PSMs supporting a sORF-encoded protein reported in each study for which we could obtain the necessary information to evaluate; PSMs actually evaluated were selected randomly from this set. Annotation definition indicates the database used by each study to define the set of annotated or “canonical” proteins; all other proteins are considered to be unannotated, sORF-expressed proteins. Reported false discovery rate indicates the FDR given in each study for the list of sORF detections and whether this was calculated proteome-wide (a common FDR considering both unannotated and annotated proteins) or specific to the unannotated proteins.
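The distinction the legend draws between a proteome-wide FDR and one specific to the unannotated proteins can be illustrated with a minimal sketch. It assumes a simple target-decoy estimate (accepted decoys divided by accepted targets), which is a common approach but not necessarily the procedure used by any study in the table. The `PSM` class, the `estimated_fdr` helper, the scores, and the counts are all hypothetical toy values, not data from the reanalyzed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """Hypothetical peptide-spectrum match record (illustration only)."""
    score: float
    is_decoy: bool
    is_unannotated: bool  # True if the match is to an sORF-type (unannotated) entry


def estimated_fdr(psms, threshold):
    """Target-decoy estimate: accepted decoys / accepted targets at a score cutoff."""
    accepted = [p for p in psms if p.score >= threshold]
    targets = sum(not p.is_decoy for p in accepted)
    decoys = sum(p.is_decoy for p in accepted)
    return decoys / targets if targets else 0.0


# Made-up toy data: 98 annotated targets, 2 unannotated targets,
# and 1 decoy that falls in the unannotated class.
psms = (
    [PSM(10.0, False, False)] * 98
    + [PSM(10.0, False, True)] * 2
    + [PSM(10.0, True, True)] * 1
)

# Proteome-wide: annotated and unannotated matches are pooled.
proteome_wide = estimated_fdr(psms, threshold=5.0)

# Class-specific: only the unannotated matches are considered.
class_specific = estimated_fdr(
    [p for p in psms if p.is_unannotated], threshold=5.0
)

print(f"proteome-wide FDR:  {proteome_wide:.2%}")
print(f"class-specific FDR: {class_specific:.2%}")
```

In this toy set the pooled estimate is 1%, while the estimate restricted to the unannotated class is 50%, because the few unannotated targets are heavily outnumbered by annotated ones in the pooled calculation.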