Fig. 1: De novo identification of TSSs related to common wheat development.

a Genomic tracks illustrating the global repression of TEs by DNA methylation. Genes are marked with gray rectangles. The region in the orange dashed box is enlarged in b. b Genomic tracks illustrating a local hypo-methylated TE locus with open chromatin and active histone modifications (expanded view of the boxed region in a). c Top: average ELE-RNA expression profiles, determined by nascent RNA sequencing, surrounding hypo-methylated and random TEs. FPKM, fragments per kilobase of transcript per million mapped fragments. Bottom: heatmaps of CG, CHG, and CHH DNA methylation rates around each hypo-methylated TE locus that overlaps with an ELE-RNA. d Workflow of the experimental design. e Genomic tracks illustrating the CAGE signals at the TSSs of genes and ELEs in four tissues. CAGE(+/−) represents the positive/negative strand in the CAGE-seq analysis. f Workflow of the genome-wide TSS annotation based on the integration of CAGE-seq, transcriptome, and epigenome data. CAGE-seq data were generated from embryo, seedling, spike, and root. A CAGE-TSS is defined as a region with an enriched CAGE signal detected by CAGEr. CAGE-TSSs located at the 5′ end of annotated genes or of de novo assembled transcripts with coding potential are defined as gene TSSs, whereas intergenic noncoding CAGE-TSSs that overlap active epigenetic markers indicative of enhancer activity, including enrichment of H3K4me3 and H3K9ac, are defined as ELE-TSSs. CAGE signals located at the 5′ end of genes or transcripts that are relatively weak but supported by epigenetic and transcriptomic features, including H3K9ac, H3K4me3, H3K36me3, RNA-sequencing signal, and open chromatin, are classified as low-confidence (LC) gene TSSs. g Donut plot showing the distribution of CAGE clusters in genic and intergenic regions in four tissues. h Scatter plot showing the transcription levels in intergenic regions as determined by CAGE-seq (x-axis) and total RNA-seq (y-axis) data. Each dot represents a CAGE cluster. The lines represent contours. i Heatmaps of the CAGE-TSSs surrounding the ELEs defined by the H3K4me3 and H3K9ac peaks. The heatmaps present the signal densities for CAGE-TSSs (left) and H3K4me3 or H3K9ac (right); peaks are ordered by increasing ELE length. Source data are provided as a Source Data file.
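
The classification rules described for panel f can be read as a small decision procedure. The following Python sketch is illustrative only and is not the pipeline used in the study: the `CageCluster` record, its boolean fields, and the `classify_cage_tss` function are hypothetical names, the notion of a "strong" versus "relatively weak" CAGE signal is reduced to a single flag because no threshold is given here, and the sketch assumes that overlap with either H3K4me3 or H3K9ac is sufficient for an ELE-TSS call.

```python
from dataclasses import dataclass


@dataclass
class CageCluster:
    """Hypothetical summary of one CAGE-TSS; all fields are illustrative."""
    at_gene_5prime_end: bool     # 5' end of an annotated gene or a coding de novo transcript
    strong_cage_signal: bool     # assumption: stands in for an unspecified signal cutoff
    supported_by_features: bool  # H3K9ac, H3K4me3, H3K36me3, RNA-seq, or open chromatin
    intergenic_noncoding: bool   # intergenic and without coding potential
    overlaps_h3k4me3: bool
    overlaps_h3k9ac: bool


def classify_cage_tss(c: CageCluster) -> str:
    """Assign a CAGE-TSS to one of the classes named in panel f."""
    if c.at_gene_5prime_end:
        if c.strong_cage_signal:
            return "gene TSS"
        if c.supported_by_features:
            # Weak CAGE signal rescued by epigenomic or transcriptomic support.
            return "LC gene TSS"
        return "unclassified"
    # Assumption: either active mark is enough; the caption does not say
    # whether both are required.
    if c.intergenic_noncoding and (c.overlaps_h3k4me3 or c.overlaps_h3k9ac):
        return "ELE-TSS"
    return "unclassified"
```

For example, under these assumptions an intergenic noncoding cluster that overlaps only an H3K9ac peak would be called an ELE-TSS, whereas a weak cluster at a gene 5′ end with no supporting features would remain unclassified.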