Fig. 3: Transcript diversity contributes to a wealth of protein diversity.

a Total number of transcripts and ORFs for each gene in the lrCaptureSeq dataset. ORF number typically scales with transcript number, as shown by similar line slopes across most genes. A minority of genes exhibit far fewer ORFs than transcript isoforms (steep slopes). b Lorenz plots of isoform ORF distributions, similar to Fig. 2C. Many predicted protein isoforms (dots) are expected to contribute to overall gene expression (see also Supplementary Fig. 3A, B). c Shannon diversity index for unique predicted ORFs for each gene. Genes that encode trans-synaptic binding proteins are highlighted in red. d Treeplot depicting relative abundance of predicted ORFs within the dataset. For most genes, overall expression is distributed across many ORF isoforms. Genes with steep slopes in a (e.g., Cntn4) differ here from the transcript treeplot (Fig. 2E). e Schematic of proteomic techniques used to enrich for cell surface proteins. f Coomassie-stained protein gel of biotin-labeled and streptavidin-enriched cell surface proteins. The elution lane shows enrichment of higher-molecular-weight proteins compared with total lysate input (left lane). Bands from 75 to 250 kDa were excised for mass spectrometry. Right (–) lane, negative control sample omitting the biotinylation reagent. g Plot depicting the number of unannotated peptides discovered by mass spectrometry, that is, peptides absent from the UniProtKB database. Such peptides would have gone undetected had they not been predicted by lrCaptureSeq.
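
For readers unfamiliar with the metric in panel c, the following is a minimal sketch of how a Shannon diversity index can be computed from per-isoform abundances for one gene. It assumes the standard definition H = -Σ p_i ln p_i with the natural logarithm and abundance-weighted proportions. The caption does not state the log base or weighting used, so these choices, the function name `shannon_diversity`, and the example counts are illustrative assumptions rather than the authors' implementation.

```python
import math


def shannon_diversity(abundances):
    """Return the Shannon index H = -sum(p_i * ln(p_i)) for one gene.

    `abundances` holds one non-negative value per predicted ORF isoform
    (for example, read counts). Isoforms with zero abundance contribute
    nothing to the sum. The natural log is an assumption of this sketch.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total  # proportion of gene expression from this isoform
            h -= p * math.log(p)
    return h


# Hypothetical read counts for two genes with four ORF isoforms each.
even_gene = [25, 25, 25, 25]   # expression spread evenly across isoforms
skewed_gene = [97, 1, 1, 1]    # one dominant isoform

print(shannon_diversity(even_gene))
print(shannon_diversity(skewed_gene))
```

Under this definition, a gene whose expression is spread evenly across many ORF isoforms scores higher than a gene dominated by a single isoform, and a gene with only one isoform scores zero.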