Fig. 2: Validation of transcripts identified from long-read data in preimplantation mouse embryos. | Nature Communications

From: High-resolution annotation of the mouse preimplantation embryo transcriptome using long-read sequencing

a Saturation of novel transcripts identified by short-read sequencing alone or by a combination of short- and long-read sequencing. The combination was achieved by merging transcripts identified from short- and long-read data. At each step, the identified novel transcripts were annotated using the GENCODE annotation. Lines and bands represent, respectively, the mean and the 99% confidence interval of the number of novel transcripts identified at each step. b Sequence homology and domain analysis of novel coding transcripts. The blue prism represents significance (log10(e-value) < −5) in both Blastp and Pfam; green points indicate significance in Blastp only, purple points in Pfam only, and gray points in neither analysis. c Scatter plot of the fraction of conserved bases (base-wise phyloP score > 0.972; x axis) against the maximal average phastCons score over a 200-bp window (y axis) for novel non-coding transcripts. Blue points indicate transcripts with higher base-wise conservation (phyloP) than random control regions, orange points indicate transcripts with higher window-based conservation (phastCons) than random control regions, and red points indicate transcripts that met both conservation criteria. d Association between H3K4me3 enrichment and gene expression. Red heatmaps show the distribution of H3K4me3 signal at the promoters of novel transcripts with novel TSSs within annotated loci and of novel genes, where the TSSs overlap H3K4me3 peaks (±500 bp). Each row represents a promoter region of ±4 kb around a TSS at the 2-cell, 4-cell, and 8-cell stages. Blue heatmaps show the distribution of TPM for the same two classes of TSSs; each row represents the TPM calculated from short-read data from the 2-cell, 4-cell, and 8-cell stages. e, f Validation examples of identified transcripts: IGV views of H3K4me3 density and RNA-seq alignment density for a novel isoform (e: Chr1:9,790,650–9,907,978; Sgk3) and a novel gene (f: Chr11:105,165,544–105,183,862).
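The saturation analysis in panel a can be illustrated with a minimal Python sketch. It assumes that novel transcripts are available as one set of identifiers per sample, that samples are added in random order and merged by set union at each step, and that the 99% interval is an empirical percentile interval over many random orderings. The caption does not say how the orderings or the interval were computed, annotation against GENCODE is not modelled, and `saturation_curve`, `sample_sets`, and `n_permutations` are hypothetical names.

```python
import random
import statistics


def saturation_curve(sample_sets, n_permutations=1000, seed=0):
    """Mean and empirical 99% interval of cumulative novel-transcript counts.

    sample_sets: hypothetical input, a list with one set of novel transcript
    IDs per sample. For each random sample ordering, the sets are merged by
    union one sample at a time and the union size is recorded after each step.
    n_permutations must be at least 2 so that quantiles can be computed.
    Returns a list of (mean, lower, upper) tuples, one per step.
    """
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    n_samples = len(sample_sets)
    counts_per_step = [[] for _ in range(n_samples)]
    for _ in range(n_permutations):
        order = list(range(n_samples))
        rng.shuffle(order)
        merged = set()
        for step, idx in enumerate(order):
            merged |= sample_sets[idx]  # merge this sample's transcripts
            counts_per_step[step].append(len(merged))

    curve = []
    for counts in counts_per_step:
        # Assumption: the 99% interval is taken as the 0.5th and 99.5th
        # empirical percentiles; n=200 yields 199 cut points at 0.5% steps.
        cuts = statistics.quantiles(counts, n=200, method="inclusive")
        curve.append((statistics.mean(counts), cuts[0], cuts[-1]))
    return curve
```

Because every ordering ends with all samples merged, the last step always equals the size of the full union; whether the curve is still rising at that point is what indicates saturation.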
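The four colour classes in panel b follow from applying the same cutoff, log10(e-value) < −5 (equivalently e-value < 10^−5), to each search independently. The sketch below assumes that each transcript has already been reduced to its best Blastp e-value and best Pfam e-value, with `None` meaning no hit; the function names and the treatment of a reported e-value of 0 are hypothetical choices, not the authors' code.

```python
import math

# Cutoff stated in the caption: log10(e-value) < -5.
LOG10_EVALUE_CUTOFF = -5.0


def is_significant(evalue):
    """Return True if an e-value passes the log10(e-value) < -5 cutoff.

    Assumption: None (no hit) is not significant, and an e-value of 0,
    which some search tools report for very strong hits, is significant.
    """
    if evalue is None:
        return False
    if evalue == 0:
        return True
    return math.log10(evalue) < LOG10_EVALUE_CUTOFF


def classify_coding(blastp_evalue, pfam_evalue):
    """Assign a novel coding transcript to one of the panel b classes."""
    blastp = is_significant(blastp_evalue)
    pfam = is_significant(pfam_evalue)
    if blastp and pfam:
        return "both"          # blue in the figure
    if blastp:
        return "blastp_only"   # green
    if pfam:
        return "pfam_only"     # purple
    return "neither"           # gray
```

For example, a transcript with a Blastp e-value of 1e-6 and a Pfam e-value of 0.5 falls into the Blastp-only (green) class.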
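The two axes of panel c can be computed from per-base conservation scores. This sketch assumes that a transcript's phyloP and phastCons scores are given as Python lists, for example concatenated over its exons. It does not reproduce the comparison against random control regions that defines the point colours, and averaging transcripts shorter than 200 bp over their full length is an assumption; the function names are hypothetical.

```python
def fraction_conserved(phylop_scores, cutoff=0.972):
    """x axis: fraction of bases whose base-wise phyloP score exceeds cutoff."""
    if not phylop_scores:
        return 0.0
    n_conserved = sum(1 for score in phylop_scores if score > cutoff)
    return n_conserved / len(phylop_scores)


def max_window_mean(phastcons_scores, window=200):
    """y axis: maximal mean phastCons score over any window of `window` bases.

    Uses a running sum so each window is updated in O(1).
    Assumption: transcripts shorter than the window are averaged over
    their full length.
    """
    n = len(phastcons_scores)
    if n == 0:
        return 0.0
    if n <= window:
        return sum(phastcons_scores) / n
    running = sum(phastcons_scores[:window])
    best = running
    for i in range(window, n):
        # Slide the window one base: add the new base, drop the oldest.
        running += phastcons_scores[i] - phastcons_scores[i - window]
        best = max(best, running)
    return best / window
```

The sliding running sum keeps the window search linear in transcript length, which matters when scoring many long transcripts.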