Figure 1

Overview of the study workflow, datasets, and sequencing robustness. (A) Schematic of the datasets analyzed in the study. Replicate sets were collected and analyzed at each step of the sample collection and sequencing process. (B) Schematic of the post-sequencing transcript quality metrics used: transcript integrity number (TIN) and 3′ transcript bias, which capture the consistency and uniformity of coverage across a given transcript. (C) Pairwise correlations of the transcriptional profiles of replicate pairs, with replicate types separated by dashed horizontal lines. Each point is one matched replicate pair, assigned to a high-TIN or low-TIN group by the transcript integrity check described in the Supplementary Methods. Density curves at the bottom of the plot show background distributions of correlations. (D) Expression differences within each replicate group. The y-axis shows the largest gene expression difference for each replicate pair, and the x-axis indicates the replicate type. Library replicates are the most similar in expression quantification, whereas FFPE vs. FF/OCT replicates are the least similar. (E) Expression differences from (D) plotted against the transcriptional-profile correlations of the same replicate pairs, showing that the two metrics are inversely related and how this relationship depends on replicate type. In (D) and (E), only high-TIN pairs are plotted and only COSMIC cancer genes were analyzed.
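The two per-pair metrics in panels (C) to (E) can be illustrated with a minimal sketch. This is not the study's pipeline: it assumes expression is compared on a log2(value + 1) scale, that "correlation" means Pearson correlation across genes, and that the "greatest difference" is the maximum absolute log-scale difference over a restricted gene set (standing in for the COSMIC cancer gene list). The function names `log_expr`, `pearson`, and `compare_replicates` are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple


def log_expr(values: Dict[str, float], genes: Iterable[str]) -> List[float]:
    """Assumed transform: log2(expression + 1) for each gene, in the given order."""
    return [math.log2(values[g] + 1.0) for g in genes]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_replicates(
    rep_a: Dict[str, float],
    rep_b: Dict[str, float],
    gene_set: Iterable[str],
) -> Tuple[float, float]:
    """Return (correlation, max absolute log2 difference) for one replicate pair.

    Only genes in gene_set that are quantified in both replicates are used
    (a hypothetical stand-in for restricting the analysis to COSMIC genes).
    """
    shared = sorted(g for g in gene_set if g in rep_a and g in rep_b)
    xa = log_expr(rep_a, shared)
    xb = log_expr(rep_b, shared)
    r = pearson(xa, xb)
    max_diff = max(abs(a - b) for a, b in zip(xa, xb))
    return r, max_diff
```

With toy values (not study data), two replicates that agree except for one gene, such as `{"TP53": 3.0, "KRAS": 7.0, "EGFR": 15.0}` vs. `{"TP53": 3.0, "KRAS": 7.0, "EGFR": 31.0}`, give a correlation of about 0.98 and a maximum difference of 1.0 log2 unit. A larger single-gene discrepancy raises the difference and lowers the correlation, which is the inverse relationship summarized in panel (E).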