Fig. 6: Unistrand flam-like clusters are selectively enriched for antisense ERVs.

a Boxplot showing the fraction of interspersed repeat content for the indicated repeat classes. Each data point represents one species (n = 119). Species with multiple genome assemblies are represented by their mean. b Boxplot showing the number of subfamilies detected per LTR family with either gag + pol (left) or gag + pol + env (right) ORFs. Each data point corresponds to one species (n = 119). Species with multiple genome assemblies are represented by their mean. c Bar plot showing the LTR contribution (left, antisense; right, sense) to total transposon content across all annotated flam-like clusters. Gypsy elements are shown in red (antisense) or blue (sense), and other LTR elements are shown in grey. Clusters are grouped by synteny as indicated on the right. Species and genome assemblies (sorted alphabetically) are indicated on the left. d Similar to (c), but showing LTR content across flam and the major dual-strand clusters in D. melanogaster. Cluster strand was defined according to total transposon content (light grey). e Boxplot showing strand bias, defined as the sense-strand minus the antisense-strand contribution to total transposon content, for transposons classified as LTR, LTR/Gypsy or any other LTR, respectively. Strand bias is shown across all annotated flam-like clusters (left, dark grey, n = 48) or across the major dual-strand clusters in D. melanogaster and proTRAC de novo predicted clusters (right, light grey, n = 354). Means were compared using a two-sided Student’s t-test. f Boxplot displaying Gypsy versus other LTR coverage relative to the genomic average across different unistrand clusters. Each point corresponds to one cluster in one genome assembly. g Scatterplot showing Gypsy enrichment against env enrichment in unistrand clusters from the indicated species (see “Cluster content analyses” in the Methods for details). Only high-quality LTR transposons are included in the analysis (those with both gag and pol ORFs and at least one good genomic hit).
Boxplots show the median (central line), interquartile range (IQR, box), and minimum and maximum values (whiskers, extending at most 1.5 × IQR beyond the box). Source data are available in the source data file.
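The strand-bias metric in (e), the two-sided Student's t-test used to compare groups, and the 1.5 × IQR whisker convention can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes strand bias is computed per cluster from base-pair counts of sense and antisense transposon sequence divided by total transposon content, and the function names `strand_bias`, `student_t` and `tukey_whiskers` are hypothetical. The sketch computes only the pooled-variance t statistic and its degrees of freedom; a p-value would require the t distribution (for example via `scipy.stats.ttest_ind`, whose defaults are an equal-variance, two-sided test). Quartiles use the inclusive (linear interpolation) method, which is an assumption about how the boxplots were drawn.

```python
import math
from statistics import mean, quantiles, variance


def strand_bias(sense_bp: float, antisense_bp: float, total_te_bp: float) -> float:
    """Hypothetical helper: sense minus antisense contribution, each as a
    fraction of total transposon content (assumed to be measured in bp)."""
    return (sense_bp - antisense_bp) / total_te_bp


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic with pooled (equal) variance,
    returned together with its degrees of freedom (na + nb - 2)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def tukey_whiskers(values: list[float]) -> tuple[float, float]:
    """Whisker ends: the most extreme data points lying within
    1.5 * IQR below the first quartile and above the third quartile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = min(v for v in values if v >= q1 - 1.5 * iqr)
    high = max(v for v in values if v <= q3 + 1.5 * iqr)
    return low, high
```

Under these assumptions, a cluster with 300 bp of sense and 100 bp of antisense transposon sequence out of 1,000 bp total has a strand bias of 0.2, so positive values indicate sense-dominated clusters and negative values antisense-dominated ones.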