Fig. 2: K-value is an indicator of gene isoform quantification error. | Nature Biotechnology


From: Improving gene isoform quantification with miniQuant

Fig. 2

a, Schematic illustration of the gene isoform structures and the corresponding aligned short reads for FAM219A (left) and SPINDOC (right). The two-dimensional density plot (n = 200 simulations) shows the correlation between true and estimated abundance. b, Barplot showing the number of genes and corresponding isoforms within each K-value group. The first K-value group includes genes with K-value = 1. Each subsequent K-value group contains genes with K-values between the two numbers above and below the line. The numbers of genes and isoforms are labeled to the left and right of the bars, respectively. c, Boxplot showing the median MARD of genes within each K-value group across sequencing depths (n = 9) quantified using five different tools. Only genes with expression levels TPM > 1 are retained for visualization. In boxplots, the hinges represent the first and third quartiles, the center line represents the median, and the whiskers extend to the smallest and largest data points within 1.5 times the interquartile range from the hinges. All boxplots in the subsequent analyses follow the same definition unless otherwise specified. d, Barplot showing the MARD of ten indicated genes with low (red) and high (blue) K-values across sequencing depths (n = 9) quantified by kallisto. The overall MARD (gray) is calculated as the median MARD of all genes with expression level TPM > 1. K-values are shown in brackets next to the gene symbol. In barplots, data are presented as mean values ± s.e. All barplots in the subsequent analyses follow the same definition unless otherwise specified. e, Comparison of the performance (precision, recall, accuracy and F1 score) in identifying differentially expressed gene isoforms with low (red) and high (blue) K-values.
f, Violin plot showing the performance (precision, recall, accuracy and F1 score) of identifying differentially expressed gene isoforms between ESC and DE across sequencing depths (n = 9) within each K-value group, quantified using five different tools. g, Boxplot showing the median MARD (left) and irreproducibility (right) of genes within each K-value group per sample from GTEx (n = 54 human tissues), TCGA (n = 32 cancer types) and ENCODE (n = 46 cell lines and human tissues). Only genes with expression levels TPM > 1 are retained for visualization.
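
The summary statistics named in this caption can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the article's implementation. The per-isoform absolute relative difference |est − true| / (est + true) is one common definition of the error underlying MARD and is assumed here, because the caption does not give the formula. The helpers `ard`, `gene_mard`, `boxplot_stats` and `de_performance` are hypothetical names. Quartiles are computed by linear interpolation between order statistics, which may differ from the software used to draw the figure, and the zero-denominator conventions are also assumptions. Differential-expression calls are taken as given sets of isoform IDs; how they were produced is not described here.

```python
from statistics import mean, quantiles


def ard(true_tpm: float, est_tpm: float) -> float:
    """Absolute relative difference for one isoform (ASSUMED formula)."""
    denom = true_tpm + est_tpm
    # Assumption: an isoform with zero true and zero estimated abundance has no error.
    return 0.0 if denom == 0 else abs(est_tpm - true_tpm) / denom


def gene_mard(true_tpm: list[float], est_tpm: list[float]) -> float:
    """MARD of one gene: mean ARD over its isoforms (ASSUMED aggregation)."""
    return mean(ard(t, e) for t, e in zip(true_tpm, est_tpm))


def boxplot_stats(values: list[float]) -> dict[str, float]:
    """Hinges, median and 1.5 x IQR whiskers, following the caption's boxplot definition."""
    # method="inclusive" linearly interpolates between sorted data points (needs >= 2 values).
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observed points that lie inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {"lower_whisker": min(inside), "q1": q1, "median": med,
            "q3": q3, "upper_whisker": max(inside)}


def de_performance(truth: set[str], called: set[str], universe: set[str]) -> dict[str, float]:
    """Precision, recall, accuracy and F1 of DE isoform calls against a ground truth.

    Assumes `truth` and `called` are both subsets of `universe` (all tested isoforms).
    """
    tp = len(truth & called)           # true DE isoforms that were called
    fp = len(called - truth)           # called but not truly DE
    fn = len(truth - called)           # truly DE but missed
    tn = len(universe - truth - called)  # correctly left uncalled
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(universe) if universe else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}
```

For example, with the hypothetical isoform IDs in `de_performance({"a", "b"}, {"a", "c"}, {"a", "b", "c", "d", "e"})`, precision, recall and F1 are all 0.5 and accuracy is 0.6: one of the two calls is correct, one of the two true DE isoforms is recovered, and two isoforms are correctly left uncalled.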
