Fig. 3: Long-read RNA-seq shows consistency in gene expression quantification with short-read RNA-seq data.

a, Scatterplots of spike-in gene log2-transformed CPM values obtained from long-read direct cDNA and PCR cDNA RNA-seq (using Salmon), and short-read RNA-seq (using Salmon), compared with expected log2-transformed spike-in CPM for five different spike-in RNAs. Light blue points represent Sequin Mix A version 1 and SIRV E2; dark blue points represent Sequin Mix A version 2, ERCC and SIRV E0 + long SIRV RNAs. b, Box plots showing the median, upper and lower quartiles, and 1.5 times the interquartile ranges of the Spearman correlation between log2-transformed CPMs (using Salmon) for protein-coding genes from replicates generated by different protocols. Light green represents replicates from different cell lines (inter-cell line: n = 667, 617, 534, 514, 447 and 411 for dRNA versus cDNA, dRNA versus dcDNA, cDNA versus dcDNA, dRNA versus Illumina, cDNA versus Illumina, and dcDNA versus Illumina, respectively) and light blue represents replicates from the same cell line (intra-cell line: n = 113, 103, 90, 86, 73 and 69). c, Box plots showing the median, upper and lower quartiles, and 1.5 times the interquartile ranges of the Spearman correlation between log2-transformed CPMs (using Salmon) for long-noncoding RNA genes from replicates generated by different protocols. Light green represents replicates from different cell lines (inter-cell line: n = 667, 617, 534, 514, 447 and 411 for dRNA versus cDNA, dRNA versus dcDNA, cDNA versus dcDNA, dRNA versus Illumina, cDNA versus Illumina, and dcDNA versus Illumina, respectively). Light blue represents replicates from the same cell line (intra-cell line: n = 113, 103, 90, 86, 73 and 69). d, Scatterplot of log2-transformed CPMs from protein-coding genes obtained from long-read direct cDNA (using Salmon) compared with those obtained from short-read RNA-seq (using Salmon) in the A549 cell line.
e, Scatterplot of log2-transformed CPMs from long-noncoding RNA genes obtained from long-read direct cDNA (using Salmon) compared with those obtained from short-read RNA-seq (using Salmon) in the A549 cell line. f, Heatmap showing the correlation of gene log2-transformed CPM estimates across the SG-NEx samples generated using PCR cDNA, direct cDNA, direct RNA and short-read protocols.
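
The quantities compared in panels b to f (log2-transformed CPMs and their Spearman correlation between two samples) can be illustrated with a minimal sketch. This is not the authors' pipeline: the counts are made up, the function names are hypothetical, and the pseudocount of 1 added before the log transform is an assumption, since the legend does not state how zero counts were handled.

```python
import math

def log2_cpm(counts, pseudocount=1.0):
    """Convert raw gene counts to log2(CPM + pseudocount).

    The pseudocount is an assumption made here to avoid log2(0).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("library size must be positive")
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up counts for five genes in two samples (illustration only).
long_read_counts = [10, 200, 35, 0, 1500]
short_read_counts = [40, 180, 12, 1, 1400]

rho = spearman(log2_cpm(long_read_counts), log2_cpm(short_read_counts))
print(f"Spearman correlation: {rho:.2f}")
```

Because Spearman correlation depends only on ranks, the log2 transform and the pseudocount do not change its value within a pair of samples; they matter for the scatterplots in panels a, d and e, where the transformed values themselves are shown.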