Fig. 2: Transcriptional and epigenetic features of pTEs.

a The four-step pipeline used to construct the rice TIP map. First, the rice genome was sequenced to obtain raw reads, including both long and short reads. Second, de novo assembly was performed, followed by annotation of TEs and genes. Third, a pangenome graph was constructed, from which SVs were extracted for TE annotation. Finally, this information was integrated with gene annotations to generate the rice TIP map. SVs annotated as TEs are referred to as pTEs. b Bar plot showing the number of genes with pTE insertions in each genic region. Numbers above the bars give the number of genes with a single pTE insertion. Gray bars give the number of genes with pTE insertions at multiple locations. c Effect of pTE insertions on gene expression levels at different cold-treatment time points. pTEs in different rice varieties are categorized into two groups: those with the TE (+TE) and those without the TE (-TE). “Upstream” indicates a single pTE located within 2 kb of the gene and inserted in the upstream region, whereas “Exon” indicates a single pTE, likewise within 2 kb of the gene, inserted within an exon. Box plots show the distribution of gene expression levels: the center line represents the median, the box bounds indicate the 25th and 75th percentiles, and the whiskers extend to the minimum and maximum values. For the “Upstream” category, n = 4573 genes; for the “Exon” category, n = 1385 genes. Statistical significance was assessed with a two-tailed Wilcoxon test (***P < 0.001; ns, P > 0.05). d Workflow for classifying genomic sequences into pTE, consensus TE (cTE), and non-TE categories. The two genomes used for comparison are Nipponbare (ref. 97) and MH63 (ref. 38). e H3K27me3 enrichment and epigenetic profiles surrounding genes, cTEs and pTEs. f Proportions of different histone modifications among whole genome sequences (WGS) of different categories.
g Proportions of different histone modifications among gene flanking sequences (GFS) of different categories. GFS refers to genes together with their 2 kb flanking sequences. Source data are provided as a Source Data file.
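
The summary statistics and test named for panel c can be illustrated with a minimal sketch. It assumes that "Wilcoxon test" means the unpaired rank-sum form, which the caption does not state, and it uses a normal approximation with tie correction and no continuity correction. The function names and the expression values are hypothetical and are not taken from the Source Data file.

```python
import math
import statistics


def box_stats(values):
    """Five-number summary as drawn in panel c: whiskers at min and max."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


def average_ranks(values):
    """Return 1-based ranks (ties get the average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(x, y):
    """Two-tailed rank-sum test (assumed form); returns (U, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var == 0:  # all observations identical: no evidence of a shift
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical expression values for one gene set at one time point.
plus_te = [1.0, 2.0, 3.0, 4.0, 5.0]    # varieties carrying the pTE (+TE)
minus_te = [6.0, 7.0, 8.0, 9.0, 10.0]  # varieties lacking the pTE (-TE)

print(box_stats(plus_te))
print(rank_sum_test(plus_te, minus_te))
```

If the comparison in panel c were instead paired by gene, the signed-rank form of the Wilcoxon test would apply, and this sketch would not.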