Fig. 2: Characteristics of TE-derived TSSs and their host genes.

A Numbers of TE-derived TSSs identified using RAMPAGE data in 115 biosamples. Each biosample is represented by one black dot (all TE-derived TSSs) and one red dot (high-confidence TE-derived TSSs). The two biosamples with the most TE-derived TSSs (enclosed in a dashed rectangle) are testis tissues from two donors. The top-left inset shows that more than half of the TE-derived TSSs in each biosample belong to the high-confidence subset, and the top-right inset contains two histograms showing counts of TE-derived TSSs identified in different numbers of biosamples. In each boxplot, the center line indicates the median, the box represents the interquartile range, and the whiskers extend to the most extreme values within 1.5× the interquartile range.

B Pie chart of the types of genes linked to the high-confidence TE-derived TSSs. Half of the high-confidence TE-derived TSSs are connected to the transcripts of protein-coding genes.

C Protein-coding genes with higher expression levels are more likely to have TE-derived TSSs. Each boxplot summarizes data from 115 biosamples, with the center line indicating the median, the box representing the interquartile range, and the whiskers extending to the most extreme values within 1.5× the interquartile range. P-values from one-sided Wilcoxon rank-sum tests are shown.

D Genes with TE-derived TSSs exhibit significantly higher tissue specificity than housekeeping genes.

E (Top) Dendrogram resulting from agglomerative hierarchical clustering of all tissue samples with RAMPAGE data, based on their expression profiles of TE-derived TSSs. Each leaf of the tree represents one tissue sample, and subtrees dominated by a single tissue type are highlighted. (Bottom) Gene ontology (GO) analysis of genes with tissue-specific TE-derived TSSs detected in each tissue type. The top five enriched GO terms for each tissue type are shown, with fold enrichment and P-values derived from Fisher’s exact test.
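
The boxplot convention stated for panels A and C (median line, interquartile-range box, whiskers at the most extreme values within 1.5× the interquartile range) can be illustrated with a minimal Python sketch. The function name `boxplot_stats`, the quartile interpolation method, and the example values are all assumptions made for illustration; the caption does not say which software or quantile method produced the figure.

```python
import statistics


def boxplot_stats(values):
    """Summarize one boxplot: median, box edges, whisker ends, outliers.

    Hypothetical helper; quartiles use linear interpolation, which is an
    assumption and not something the caption specifies.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box

    # Fences mark the farthest a whisker is allowed to reach.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers end at the most extreme observed values inside the fences,
    # not at the fences themselves.
    lower_whisker = min(v for v in data if v >= lower_fence)
    upper_whisker = max(v for v in data if v <= upper_fence)
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Made-up counts, not data from the figure.
    example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
    print(boxplot_stats(example))
```

In this made-up example the whiskers end at 1 and 9, and the value 50 lies beyond the upper fence, so it would be drawn as an individual point and not covered by the whisker.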