Extended Data Fig. 2: Gene expression and alternative splicing profiles across samples.

(a) The number of expressed genes (transcripts per million, TPM > 0.1) increases rapidly with the number of clean reads across all 8,536 samples, reaching a plateau at 50 million reads. The black line is the smoothed curve fitted by a generalized additive model using the geom_smooth function from ggplot2 (v3.3.6) in R (v3.4.1). The shaded area around the line represents the 95% confidence interval of the fitted values. (b) The percentage of unexpressed genes (TPM < 0.1 across all samples) on known chromosomes (Known) and unplaced scaffolds (Unplaced, 54.10%). (c-f) Compared to expressed genes, the unexpressed genes have shorter gene length (df = 21,921, P = 2.2 × 10⁻⁴) (c), fewer exons (df = 27,675, P = 2.5 × 10⁻⁵) (d), higher CG density (df = 21,921, P = 1.5 × 10⁻¹⁰³) (e), and higher dN/dS ratio (df = 19,718, P = 5.4 × 10⁻²¹) (f). (g) The number of spliced introns increases rapidly with the number of clean reads across samples, reaching a plateau at 100 million reads. The smoothed curve and the shaded band are obtained using the same method as in (a). (h-k) Compared to all genes, genes without spliced introns in any tissue have shorter gene length (df = 22,320, P = 2.9 × 10⁻¹⁸) (h), fewer exons (df = 17,690, P = 7.4 × 10⁻⁵²) (i), lower expression levels (median gene expression levels across samples, df = 28,479, P = 0.35) (j), and higher dN/dS ratio (df = 19,921, P = 3.7 × 10⁻³²) (k). All P values above were obtained with the two-sided Welch two-sample t-test, and * indicates P < 0.05. (l) Distribution of gene types for genes without spliced introns. (m) Significant Gene Ontology terms (P < 0.05) for genes without spliced introns, based on the hypergeometric test.
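
The two tests named in this legend can be illustrated with a minimal sketch. It is not the authors' code, which is not given here, and the function names `welch_t` and `hypergeom_upper_tail` are hypothetical. The sketch assumes the reported df values are Welch-Satterthwaite degrees of freedom and that the enrichment P value in (m) is an upper-tail hypergeometric probability. The t-test P value itself is omitted; it would come from a t distribution with the computed df.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Squared standard errors of the two group means (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k): at least k annotated genes in a list of n genes.

    N is the background size and K the number of background genes
    annotated with the term (assumed parameterization).
    """
    total = math.comb(N, n)
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1))
    return hits / total


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the figure.
    t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    print(f"t = {t:.4f}, df = {df:.4f}")
    print(f"enrichment P = {hypergeom_upper_tail(2, 10, 4, 3):.4f}")
```

Because the Welch-Satterthwaite df depends on both group sizes and both variances, it differs from panel to panel even when the same gene sets are compared, which is consistent with the varying df values listed above.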