Fig. 3: TE-derived TSSs arise from specific TE families with different evolutionary ages.

A Boxplots showing the distribution of TE-derived TSSs across the TE families and subfamilies from which they arise. Each boxplot summarizes data from 115 biosamples; the center line indicates the median, the box represents the interquartile range, and the whiskers extend to the most extreme values within 1.5× the interquartile range. Testis tissues (enclosed in a dashed rectangle) contained the most TE-derived TSSs for most TE families and subfamilies. B Fold enrichments (red dots) and genomic percentages (gray bars) of the TE subfamilies from which TE-derived TSSs arise. C TE-derived TSSs, promoter-proximal TEs, and TSSs of GENCODE protein-coding genes grouped by evolutionary age into “Hominids”, “Old World Anthropoids (OWA)”, “Primates”, and “Mammals” categories (see Methods for details). Chi-squared test P-values are shown. D Distribution of TE subfamilies across evolutionary categories for high-confidence TE-derived TSSs, all TE-derived TSSs, and promoter-proximal TEs. E In evolutionarily young categories (“Hominids” and “OWA”), TE-derived TSSs showed lower sequence divergence than promoter-proximal TEs, whereas in evolutionarily old categories (“Primates” and “Mammals”), TE-derived TSSs showed higher sequence divergence than promoter-proximal TEs. Boxplot elements are defined as in A. One-sided Wilcoxon rank-sum test P-values are shown. F Sequence divergence of TE-derived TSSs versus promoter-proximal TEs in individual TE subfamilies. Boxplot elements are defined as in A. One-sided Wilcoxon rank-sum test P-values are shown.
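Panel B pairs fold enrichment with genomic percentage, but the caption does not define the enrichment itself. A common definition, assumed here rather than taken from the Methods, divides the fraction of TE-derived TSSs falling in a subfamily by the fraction of the genome that subfamily covers. The minimal Python sketch below uses a hypothetical helper, `fold_enrichment`, and toy subfamily names and sizes (`SubfamA`, `SubfamB`), not data from this study.

```python
from collections import Counter


def fold_enrichment(tss_subfamilies, genome_bp_by_subfamily, genome_size_bp):
    """Hypothetical helper (assumed definition, not the authors' code).

    fold enrichment = (share of TE-derived TSSs in a subfamily)
                      / (share of the genome covered by that subfamily)
    """
    counts = Counter(tss_subfamilies)  # TE-derived TSSs per subfamily
    total = len(tss_subfamilies)       # all TE-derived TSSs
    enrichment = {}
    for subfamily, bp in genome_bp_by_subfamily.items():
        observed = counts.get(subfamily, 0) / total  # fraction of TE-derived TSSs
        expected = bp / genome_size_bp               # genomic fraction (cf. gray bars)
        enrichment[subfamily] = observed / expected if expected > 0 else float("nan")
    return enrichment


# Toy input: 40 TE-derived TSSs spread over two hypothetical subfamilies.
tss = ["SubfamA"] * 30 + ["SubfamB"] * 10
genome_bp = {"SubfamA": 1_000_000, "SubfamB": 5_000_000}
fe = fold_enrichment(tss, genome_bp, genome_size_bp=100_000_000)
print(fe)  # SubfamA about 75-fold, SubfamB about 5-fold
```

Under this definition, values above 1 mean that TE-derived TSSs arise from a subfamily more often than its genomic share alone would predict.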
G Structural completeness of TE subfamilies for TE-derived TSSs and promoter-proximal TEs. H Transcription factor enrichment at TE-derived TSSs for each TE subfamily (Fisher’s exact test FDR values are color-coded). Enriched transcription factors shared by different TE subfamilies are shown in the zoomed-in view. The bottom-right inset contains histograms of the number of overlapping transcription factors in TE-derived TSSs and promoter-proximal TEs for different TE subfamilies.
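Panel H reports Fisher’s exact test FDR values, but the caption does not specify the contingency table, the sidedness of the test, or the multiple-testing procedure. The sketch below assumes a one-sided (enrichment) test on a 2×2 table of TE-derived TSSs versus a background set, crossed with overlap versus no overlap of a given transcription factor, followed by Benjamini-Hochberg adjustment across transcription factors. The helper names `fisher_greater` and `benjamini_hochberg`, the factor names, and the counts are all hypothetical.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test (enrichment) for the 2x2 table

                     TF overlap   no TF overlap
        TE-TSSs          a              b
        background       c              d

    Returns P(X >= a), where X is hypergeometric with the table margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy counts for three hypothetical transcription factors in one TE subfamily.
tables = {"TF1": (3, 0, 0, 3), "TF2": (2, 1, 1, 2), "TF3": (0, 5, 5, 0)}
pvals = {tf: fisher_greater(*t) for tf, t in tables.items()}
fdr = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
print(pvals, fdr)
```

The hypergeometric tail is summed with exact integer arithmetic before the single division, which avoids accumulating floating-point error for small tables; very large tables would call for a library implementation instead.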