Extended Data Fig. 3: Magnitude of methylation change near somatic mutations by tissue and genomic context.
From: Somatic mutation as an explanation for epigenetic aging

a) Boxplots of the distribution of ΔMF10kb values for mutated (red) versus random control (n = 260,000, blue) sites for each tissue type separately (n = 813, 144, and 1,643 mutated sites from Pancreas, Brain, and Ovary tissues, respectively). P value shown for a two-sided Mann-Whitney test for a difference in median methylation fraction between the mutated and non-mutated random control loci. P value shown for a two-sided Mann-Whitney test for a difference in median absolute deviation (MAD) of ΔMF10kb between the mutated and non-mutated random control loci. The central line represents the median, the edges of the box the interquartile range (IQR), and the whiskers 1.5 times the IQR. b) A histogram of the median methylation fraction across comparison sites within ±10 kb of mutated (n = 2,600, red) and random control sites (n = 260,000, blue). Mutated sites are the same as in Fig. 3b. Random control sites have been selected as before, with the additional criterion that their methylation profile matches that of the matched samples at mutated sites (as measured by the median methylation fraction of comparison sites, Methods). P value shown for a two-sided Mann-Whitney test for a difference in median methylation fraction between the mutated and random control loci. c) Probability distribution of ΔMF10kb values for mutated (red) versus random control (blue) sites. Mutated and random sites are the same as in (b). P value calculated as in (a). d) Line plot depicting the fold enrichment for mutated over non-mutated random control sites as a function of ΔMF10kb, for the same sites as in Fig. 3b. Sites are stratified depending on whether the site is a CpG and/or falls within a CpG island (n = 419 CpG-non-CGI, 21 CpG-CGI, 2,120 non-CpG-non-CGI, and 39 non-CpG-CGI sites). Fold enrichment is the ratio of the probability of observing a given ΔMF10kb for mutated sites versus non-mutated random control sites. ΔMF10kb is divided into equally spaced bins from –0.4 to 0.4.
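The fold enrichment defined for panel (d) can be illustrated with a minimal sketch. The function name `fold_enrichment`, the number of bins, and the handling of values outside the range are assumptions made for illustration; the caption only states that the bins are equally spaced from –0.4 to 0.4, and this is not the authors' code.

```python
def fold_enrichment(mutated, control, lo=-0.4, hi=0.4, n_bins=8):
    """Per-bin ratio of P(dMF in bin | mutated) to P(dMF in bin | control).

    `mutated` and `control` are non-empty lists of delta-MF values.
    n_bins is an assumed value; the caption does not give the bin count.
    """
    width = (hi - lo) / n_bins

    def bin_probabilities(values):
        counts = [0] * n_bins
        for v in values:
            if lo <= v <= hi:
                # Clamp so that v == hi falls into the last bin.
                idx = min(int((v - lo) / width), n_bins - 1)
                counts[idx] += 1
        # Values outside [lo, hi] stay in the denominator (an assumption).
        return [c / len(values) for c in counts]

    p_mut = bin_probabilities(mutated)
    p_ctl = bin_probabilities(control)
    # A bin with no control sites has an undefined ratio, reported as NaN.
    return [pm / pc if pc > 0 else float("nan") for pm, pc in zip(p_mut, p_ctl)]


# Toy values only, not data from the study.
toy_mutated = [-0.3, -0.3, 0.3, 0.3]
toy_control = [-0.3, -0.1, 0.1, 0.3]
toy_ratios = fold_enrichment(toy_mutated, toy_control, n_bins=4)
print(toy_ratios)
```

In this toy case the mutated values concentrate in the two outermost bins, so the ratio exceeds 1 there and is 0 in the two central bins.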
e) Bar chart showing the fold enrichment of mutated sites with the most extreme methylation changes (absolute ΔMF10kb |Z-score| > 1.96, n = 401 mutated sites) in various genomic regions, compared to all other mutated sites (n = 2,199 mutated sites). P values were calculated using a two-sided Fisher exact test. The categories ‘Upstream gene’ and ‘Downstream gene’ refer to variants located within 1 kb of the 5’ transcription start site and the 3’ transcription stop site, respectively, but outside the gene itself. f) As in (e), but comparing the mutated sites with the most extreme gains of methylation (Z-score of ΔMF10kb > 1) to those with the most extreme losses of methylation (Z-score of ΔMF10kb < –1). g) Boxplot of the ΔMF10kb value as a function of the mutated allele frequency (MAF). Same sites and samples as in Fig. 3e (n = 3,880 mutated loci). The Pearson correlation is shown for the association of MAF with ΔMF10kb and the absolute value of ΔMF10kb. Two-sided P values were calculated based on the exact distribution of Pearson’s r modeled as a beta distribution. The central line represents the median, the edges of the box the interquartile range (IQR), the whiskers 1.5 times the IQR, and the points all ΔMF10kb values outside of these ranges.
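The region-level comparison in panels (e) and (f) can be sketched as a 2×2 contingency test. The function names below are hypothetical, and the definition of fold enrichment as a ratio of in-region fractions is an assumption, since the caption does not spell it out. The two-sided P value uses the common convention of summing all tables no more probable than the observed one; this is a sketch under those assumptions, not the authors' implementation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Sum tables at most as probable as the observed one (small float tolerance).
    total = sum(p for p in (prob(x) for x in range(x_min, x_max + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(total, 1.0)


def region_enrichment(extreme_in, extreme_total, other_in, other_total):
    """Fold enrichment (assumed: ratio of in-region fractions) and Fisher P value."""
    frac_extreme = extreme_in / extreme_total
    frac_other = other_in / other_total
    fold = frac_extreme / frac_other if frac_other > 0 else float("inf")
    p_value = fisher_exact_two_sided(
        extreme_in, extreme_total - extreme_in,
        other_in, other_total - other_in,
    )
    return fold, p_value


# Toy counts only, not data from the study.
toy_fold, toy_p = region_enrichment(3, 4, 1, 4)
print(toy_fold, round(toy_p, 4))
```

With the group sizes given in the caption, a call would take the form `region_enrichment(k_extreme, 401, k_other, 2199)`, where `k_extreme` and `k_other` are the per-region counts, which the caption does not report.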