Figure 1: A computational pipeline to quantify the expression of pseudogenes from TCGA RNA-seq data.

First, we combined the latest pseudogene annotations from the Yale Pseudogene database and the GENCODE Pseudogene Resource, and filtered out pseudogene exons that overlapped any known protein-coding gene. Second, we evaluated the sequence uniqueness of each pseudogene exon and retained only those pseudogenes containing at least one exon with sufficient alignability for further characterization. Third, we filtered out reads that mapped to multiple genomic locations from the TCGA BAM files.
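
The three filtering steps can be illustrated with a minimal sketch. It is not the authors' implementation: the `Interval` type, the function names, the `min_score` threshold, and the representation of reads as `(read_id, n_hits)` pairs are all hypothetical, chosen only to make the logic concrete. Coordinates are assumed to be 0-based and half-open, and the per-exon alignability scores are assumed to have been computed beforehand.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Interval(NamedTuple):
    """Hypothetical genomic interval: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def drop_coding_overlaps(pseudo_exons: List[Interval],
                         coding_genes: List[Interval]) -> List[Interval]:
    """Step 1: discard pseudogene exons overlapping any protein-coding gene."""
    return [e for e in pseudo_exons
            if not any(overlaps(e, g) for g in coding_genes)]


def keep_alignable_pseudogenes(exons_by_gene: Dict[str, List[Interval]],
                               alignability: Dict[Interval, float],
                               min_score: float) -> Dict[str, List[Interval]]:
    """Step 2: keep pseudogenes with at least one sufficiently alignable exon.

    min_score is an assumed cutoff; the caption does not state one.
    Exons without a score are treated as not alignable.
    """
    kept = {}
    for gene, exons in exons_by_gene.items():
        good = [e for e in exons if alignability.get(e, 0.0) >= min_score]
        if good:
            kept[gene] = good
    return kept


def keep_unique_reads(reads: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Step 3: keep reads reported at exactly one genomic location.

    Each read is an assumed (read_id, n_hits) pair.
    """
    return [r for r in reads if r[1] == 1]
```

The quadratic overlap check keeps the sketch short; for genome-scale annotations an interval tree or a sorted sweep would be the usual choice. In a real BAM file the hit count might be taken from the `NH` tag when the aligner sets it, but how multi-mapped reads were identified here is not stated in the caption.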