Fig. 3: GC content of 4d sites in mammalian transcripts supports the unwanted transcript hypothesis.
From: Interpreting mammalian synonymous site conservation in light of the unwanted transcript hypothesis

(a) GC4 content of single-exon genes (n = 1119) and of the first (n = 6070) and last (n = 8053) exons of multi-exon genes is significantly greater than the GC4 content of internal exons (two-sided Wilcoxon rank-sum tests, P < 2.2 × 10⁻¹⁶ in all cases). Violin plots show data density. (b) PhyloP scores are significantly higher at 4d sites within 3 bp of exon–intron boundaries (n = 81,221) than at 4d sites elsewhere in exons (n = 2,539,897; ANOVA, F = 6561, d.f. = 2, P < 2.2 × 10⁻¹⁶). The red dotted line indicates neutrality. Boxplots in (a, b) show the first and third quartiles with a median line; whiskers extend ±1.5× IQR. (c, d) Histograms of base counts across mammals at 4d sites at the first (c; 5′ exon edge, n = 1044) and last (d; 3′ exon edge, n = 12,049) positions in exons, binned by phyloP, show a strong bias toward G at conserved 4d sites. (e, f) Logo plots of base content at 4d sites in the first six (e; n = 29,871) and last six (f; n = 19,792) positions of exons; the probability of each base reflects its count across the mammalian genomes at 4d sites at those positions. The grey exon diagram places these positions in the context of the exon/intron boundaries. Source data are provided as a Source Data file.
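Two of the per-site quantities in this caption, GC4 content (panel a) and per-position base probabilities for the logo plots (panels e, f), can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes an in-frame coding sequence read with the standard genetic code, treats only the third position of the eight fourfold-degenerate codon families as 4d sites, and does not reproduce the cross-mammal alignment used in the figure. The function names `fourfold_sites`, `gc4`, and `position_probabilities` are hypothetical.

```python
from collections import Counter

# Standard genetic code: codons starting with one of these two-base prefixes
# are fourfold degenerate at the third position
# (Ala GCN, Arg CGN, Gly GGN, Leu CTN, Pro CCN, Ser TCN, Thr ACN, Val GTN).
FOURFOLD_PREFIXES = {"GC", "CG", "GG", "CT", "CC", "TC", "AC", "GT"}
BASES = ("A", "C", "G", "T")


def fourfold_sites(cds):
    """Return the third-position bases of fourfold-degenerate codons.

    Assumption: `cds` is in frame from its first base; a trailing
    partial codon is ignored.
    """
    cds = cds.upper()
    sites = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES:
            sites.append(codon[2])
    return sites


def gc4(cds):
    """Fraction of 4d sites that are G or C, or None if there are no 4d sites."""
    sites = fourfold_sites(cds)
    if not sites:
        return None
    return sum(base in ("G", "C") for base in sites) / len(sites)


def position_probabilities(columns):
    """Per-position base probabilities for a sequence logo.

    `columns[k]` holds every base observed (e.g. across species) at 4d sites
    at exon position k. Each base's probability is its count divided by the
    total count of A/C/G/T at that position; other symbols such as N are
    skipped. Positions with no valid bases yield an empty dict.
    """
    probs = []
    for observed in columns:
        counts = Counter(b for b in (x.upper() for x in observed) if b in BASES)
        total = sum(counts.values())
        probs.append({b: counts[b] / total for b in BASES} if total else {})
    return probs


if __name__ == "__main__":
    # Codons GCT (Ala), GCC (Ala), ATG (Met), CTA (Leu): 4d bases are T, C, A.
    print(fourfold_sites("GCTGCCATGCTA"))
    print(gc4("GCTGCCATGCTA"))  # 1 of 3 sites is G/C
    print(position_probabilities([["G", "G", "A", "C"], ["T", "t", "N"]]))
```

Filtering non-ACGT symbols before normalising keeps each position's probabilities summing to 1, which is what a logo plot expects; how the original analysis handled gaps or ambiguous bases is not stated in the caption.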