Fig. 1: Conservation at fourfold degenerate sites in mammals is GC-biased.
From: Interpreting mammalian synonymous site conservation in light of the unwanted transcript hypothesis

a Boxplots showing genome GC content at 4d sites (GC4) per mammalian Order (n = 240 placental mammal genome assemblies). All mammal genomes in the Zoonomia dataset show a GC4 bias except for the hispid cotton rat (Sigmodon hispidus, Order: Rodentia; GC4 = 49.5%). Boxes represent first and third quartiles with median line, whiskers extend ±1.5× IQR, and outliers beyond the whiskers are shown as points. b The number of each base at 4d sites across 240 placental mammal genomes, binned by phyloP score, where more positive scores indicate stronger conservation. Pie charts showing the proportion of each base seen at 4d sites across the mammal genomes for sites with c phyloP < 2.27 and d phyloP ≥ 2.27. At conserved 4d sites (phyloP ≥ 2.27), we observe a general bias towards C or G at GC sites (e) and a bias towards A or T at AT sites (f) among the mammal genomes. g Per-transcript mean phyloP at 4d sites that are GC correlates positively with mean phyloP at 4d sites that are AT (n = 17,394; two-tailed Pearson’s r = 0.63, P < 2.2 × 10⁻¹⁶). Blue trend line shows a linear regression fitted using the ‘geom_smooth’ function in ggplot2 in R. Colour scale represents mean transcript 4d phyloP. Species silhouettes in (a) were obtained from phylopic.org and are available under a Creative Commons licence. Source data are provided as a Source Data file.
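
As an illustration of the GC4 quantity plotted in (a), the following minimal Python sketch computes the fraction of fourfold degenerate (4d) sites that are G or C in a single in-frame coding sequence. It assumes the standard genetic code and a hypothetical helper name, `gc4`; it is not the authors' pipeline, which may identify 4d sites differently (for example from genome annotations and alignments).

```python
# Codon prefixes (first two bases) whose third position is fourfold degenerate
# in the standard genetic code: Leu (CT), Val (GT), Ser (TC), Pro (CC),
# Thr (AC), Ala (GC), Arg (CG), Gly (GG).
FOURFOLD_PREFIXES = frozenset({"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"})


def gc4(cds: str) -> float:
    """Return the fraction of 4d sites in an in-frame CDS that are G or C.

    Hypothetical helper for illustration; assumes `cds` starts at codon
    position 1 and uses the standard genetic code.
    """
    seq = cds.upper()
    gc_count = 0
    total = 0
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        third = codon[2]
        # Count only unambiguous bases at the third position of 4d codons.
        if codon[:2] in FOURFOLD_PREFIXES and third in "ACGT":
            total += 1
            if third in "GC":
                gc_count += 1
    if total == 0:
        raise ValueError("no fourfold degenerate sites found")
    return gc_count / total
```

For example, in `ATGGCCCTAGGGTAA` the codons GCC, CTA and GGG carry 4d sites, and two of the three third positions are G or C, so `gc4` returns 2/3. Under this definition a value above 0.5 corresponds to the GC4 bias described for (a).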