Supplementary Figure 3: Average nucleotide diversity (π) for MAE and BAE genes in the global 1000 Genomes data set.
From: Genes with monoallelic expression contribute disproportionately to genetic diversity in humans

π is calculated for the coding regions (CDS), including all sites. Error bars, 95% confidence intervals calculated by bootstrapping. Orange, BAE genes; blue, MAE genes. (a) π per cell line for genes classified as MAE and BAE in that cell line. (b) π for genes classified as MAE with RPKM >1 in only one (MAE 1), two, three or four cell lines, compared with genes classified as BAE. (c) π for genes experimentally determined to be MAE (217) or BAE (2,412) on the basis of SNP array assays of five clones from the GM13130 cell line, as reported in ref. 3. (d) π for MAE and BAE genes at low, intermediate and high expression levels, as defined in e. (e) Definition of the low, intermediate and high expression categories for genes in the genome-wide data set. For each gene, RPKM is the highest RPKM observed among the six cell lines in which the gene has its assigned status; category boundaries are shown as dashed lines. (f) Mutation rate–corrected, non-CpG-prone π values for MAE and BAE genes by expression level. (g) Mutation rate–corrected overall π for MAE and BAE genes by expression level. In f and g, 95% confidence intervals were estimated by bootstrapping; colors are as in a.
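A minimal sketch of how a group's average π and its bootstrap 95% confidence interval might be computed. It assumes biallelic sites summarized by alternate-allele counts, the standard unbiased per-site estimator n/(n−1)·2p(1−p), averaging over all CDS sites (monomorphic sites contribute zero), and a percentile bootstrap that resamples genes with replacement. The function names and toy numbers are hypothetical; this is not the authors' pipeline.

```python
import random
from typing import List, Sequence, Tuple

# A gene is summarized as (sum of per-site diversity, number of CDS sites).
Gene = Tuple[float, int]


def site_diversity(alt_count: int, n_chrom: int) -> float:
    """Unbiased per-site diversity at a biallelic site: n/(n-1) * 2p(1-p)."""
    p = alt_count / n_chrom
    return n_chrom / (n_chrom - 1) * 2.0 * p * (1.0 - p)


def gene_diversity(alt_counts: Sequence[int], n_chrom: int, n_sites: int) -> Gene:
    """Summarize one CDS. Only variable sites need to be listed in alt_counts;
    monomorphic sites add 0 to the sum but still count toward n_sites."""
    return sum(site_diversity(a, n_chrom) for a in alt_counts), n_sites


def pooled_pi(genes: Sequence[Gene]) -> float:
    """Average pi per site over a set of genes (total diversity / total sites)."""
    return sum(d for d, _ in genes) / sum(n for _, n in genes)


def bootstrap_ci(genes: Sequence[Gene], n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for pooled_pi, resampling genes with replacement."""
    rng = random.Random(seed)
    stats: List[float] = sorted(
        pooled_pi(rng.choices(genes, k=len(genes))) for _ in range(n_boot)
    )
    lo_idx = int(alpha / 2 * n_boot)
    return stats[lo_idx], stats[n_boot - 1 - lo_idx]


# Toy example (hypothetical allele counts, 100 sampled chromosomes):
genes = [gene_diversity([1, 5, 50], 100, 1200), gene_diversity([2], 100, 800)]
print(pooled_pi(genes), bootstrap_ci(genes))
```

Pooling diversity over sites, rather than averaging per-gene π values, weights long genes more heavily; whether the figure uses this weighting is an assumption here.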
Note for f and g:
Nucleotide diversity (π) in expression level bins. BAE genes are shifted toward much higher mRNA expression levels than MAE genes (blue, MAE genes; orange, BAE genes). Although previous studies have suggested that highly expressed genes are subject to stronger selective pressure than weakly expressed genes (Proc. Natl. Acad. Sci. USA 102, 14338–14343, 2005; Trends Genet. 24, 114–123, 2008), we do not find strong evidence for a negative correlation between π and expression level in our gene sets. Specifically, we stratified MAE and BAE genes into eight equally sized bins by expression level in six cell types (log10(RPKM); see the Online Methods for our definition of expression level) and examined the linear relationship between π and expression level (f, non-CpG π; g, overall π). Differences in mutation rate were corrected using a divergence-based mutation rate map. Expression level is not significantly correlated with either non-CpG π (P = 0.52 for MAE and 0.61 for BAE) or overall π (P = 0.10 for MAE and 0.07 for BAE). Even the marginal correlation between expression level and overall π for BAE genes is driven almost entirely by the genes in the highest expression bin (log10(RPKM) > 2.0); without them, the trend becomes flat (P = 0.89; solid black line). This most highly expressed group of genes can explain only 9% of the difference in overall π (Δπ) between MAE and BAE genes.
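A minimal sketch of the stratification described above: genes are split into eight equally sized bins by log10(RPKM), and a straight line is fitted to the per-bin means. The note does not say whether the fit used bin means or individual genes; fitting on bin means is an assumption here, as are all helper names. A P value can be obtained from the slope's t statistic (b / se_b) with (number of bins − 2) degrees of freedom, or, if SciPy is available, `scipy.stats.linregress` reports it directly.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple


def equal_size_bins(values: Sequence[float], n_bins: int = 8) -> List[List[int]]:
    """Split indices into n_bins groups of (nearly) equal size, ordered by value.
    When len(values) is not a multiple of n_bins, the lowest bins get one extra.
    Requires at least n_bins values so that no bin is empty."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    size, extra = divmod(len(order), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        end = start + size + (1 if b < extra else 0)
        bins.append(order[start:end])
        start = end
    return bins


def binned_means(log_rpkm: Sequence[float], pi: Sequence[float],
                 n_bins: int = 8) -> Tuple[List[float], List[float]]:
    """Mean log10(RPKM) and mean pi in each expression bin."""
    bins = equal_size_bins(log_rpkm, n_bins)
    xs = [mean(log_rpkm[i] for i in b) for b in bins]
    ys = [mean(pi[i] for i in b) for b in bins]
    return xs, ys


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = a + b*x; returns (a, b, standard error of b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (len(x) - 2) / sxx)
    return a, b, se_b

# To refit without the highest-expression bin: ols(xs[:-1], ys[:-1])
```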
Under an extremely conservative assumption, one could argue that the nonsignificant trend of overall π for BAE genes holds uniformly, down to the lowest expression levels (dashed black line). Even in that case, the potential contribution of expression bias is estimated to explain only 36% of Δπ. To obtain this estimate, we extrapolated the π value of hypothetical BAE genes that follow the gene-length and expression-level distributions of MAE genes as

$$\hat{\pi}_{\mathrm{BAE}}^{\mathrm{hyp}} = \frac{\sum_{i} L_i \left(\hat{\beta}_0 + \hat{\beta}_1 x_i\right)}{\sum_{i} L_i},$$

where the sums run over all MAE genes, $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimated intercept and slope of the BAE trend over expression level (7.8 × 10⁻⁴ and −6.2 × 10⁻⁵, respectively; from the dashed black line), and $L_i$ and $x_i$ are the number of fourfold-degenerate sites and the expression level of MAE gene i.
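A minimal sketch of the extrapolation, reading it as a site-weighted average of the BAE trend line evaluated at each MAE gene's expression level. The intercept and slope are the values quoted in the text; the toy gene data are hypothetical. The share of Δπ attributed to expression bias is assumed here to be (π_hyp − π_BAE)/(π_MAE − π_BAE), since the note does not spell out that formula.

```python
from typing import Sequence

# Intercept and slope of the BAE trend (dashed black line), as quoted in the text.
BETA0 = 7.8e-4
BETA1 = -6.2e-5  # per unit of log10(RPKM)


def extrapolated_bae_pi(sites: Sequence[int], log_rpkm: Sequence[float],
                        beta0: float = BETA0, beta1: float = BETA1) -> float:
    """Site-weighted average of beta0 + beta1 * x_i over MAE genes i,
    where sites[i] is L_i and log_rpkm[i] is x_i."""
    num = sum(L * (beta0 + beta1 * x) for L, x in zip(sites, log_rpkm))
    return num / sum(sites)


def fraction_explained(pi_mae: float, pi_bae: float, pi_hyp: float) -> float:
    """Assumed definition: share of delta-pi = pi_MAE - pi_BAE that is
    recovered by giving BAE genes the MAE expression distribution."""
    return (pi_hyp - pi_bae) / (pi_mae - pi_bae)


# Toy usage with hypothetical MAE genes (sites, log10 RPKM) and observed pi values:
pi_hyp = extrapolated_bae_pi([100, 300], [0.0, 1.0])
share = fraction_explained(pi_mae=1.0e-3, pi_bae=7.0e-4, pi_hyp=pi_hyp)
```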