Extended Data Fig. 2: Supplemental characterization of CpG mutations. | Nature Aging


From: Somatic mutation as an explanation for epigenetic aging


a) Distribution of methylation fraction values across all CpG sites (TCGA, n = 273,202; PCAWG, n = 326,749) and all samples (TCGA, n = 8,680; PCAWG, n = 651), shown separately for each dataset.
b) CpG density (number of CpGs per base pair) in the 50 bp and 125 bp surrounding each CpG site in (a). In each inner boxplot, the central line marks the median, the box edges mark the interquartile range (IQR), and the whiskers extend to 1.5 times the IQR.
c) Violin plots showing the distribution of mean methylation fraction in non-mutated individuals at the same mutated CpG sites as in Fig. 1d (n = 8,037 sites), stratified by CpG mutation type.
d) As in (c), but showing the distribution of CpG density in the 125 bp surrounding each CpG site.
e) Pie chart showing the proportion of CpG mutations (n = 467,079 mutations) that result in each mutated nucleotide. Note that 5’-CpG-3’ sites are palindromic, corresponding to a 3’-GpC-5’ sequence on the opposite strand; thus, mutation of the C residue on one strand is equivalent to mutation of the complementary G residue on the other. For simplicity, we refer to all CpG mutations by the status of the C residue.
f) Violin plots showing the mean methylation fraction across all PCAWG samples for CpG sites mutated in at least one sample (left, n = 1,137 CpG sites), CpG sites not mutated in any sample (middle, n = 325,614 CpG sites), and all measured CpG sites (right, n = 326,751 CpG sites). Significant differences in distribution (p ≤ 3.03 × 10⁻⁵⁰) are marked with (***) and non-significant differences (p > 0.05) with (n.s.), based on two-sided Mann-Whitney tests.
g) Methylation fraction at the same mutated CpG sites as in Fig. 1d (n = 8,037 sites). CpG sites are binned into five groups by MAF, and violin plots summarize the distribution of methylation fraction within each group. Vertical bars inside each violin represent the interquartile range. The two-sided p value was calculated from the exact distribution of Pearson’s r, modeled as a beta distribution.
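Panels (b) and (d) report CpG density as the number of CpGs per base pair in a fixed-width window around each site. A minimal Python sketch of that calculation follows. The caption does not state the exact window conventions, so this sketch assumes (our choice, not the authors') that the window starts window // 2 bases upstream of the C of the CpG, is clipped at the sequence ends, includes the site itself, and counts a CpG only when both of its bases fall inside the window. The function name `cpg_density` and the toy sequence are hypothetical.

```python
def cpg_density(seq: str, pos: int, window: int) -> float:
    """Return CpGs per base pair in a `window`-bp region around `pos`.

    `pos` is the 0-based index of the C in the CpG of interest.
    Hypothetical conventions: the region starts `window // 2` bases upstream
    of `pos`, is clipped at the sequence ends, and a CpG is counted only if
    both of its bases lie inside the region.
    """
    if window < 2:
        raise ValueError("window must be at least 2 bp")
    if not 0 <= pos < len(seq):
        raise ValueError("pos must index a base inside seq")
    seq = seq.upper()
    start = max(0, pos - window // 2)
    end = min(len(seq), pos - window // 2 + window)
    # Count "CG" dinucleotides whose C (index i) and G (index i + 1) are both in [start, end).
    n_cpg = sum(1 for i in range(start, end - 1) if seq[i:i + 2] == "CG")
    return n_cpg / (end - start)


if __name__ == "__main__":
    demo = "ACGT" * 50  # toy 200-bp sequence with a CpG every 4 bp
    site = 101          # 0-based index of a C that starts a CpG
    for w in (50, 125):
        print(w, cpg_density(demo, site, w))  # prints "50 0.24", then "125 0.248"
```

Calling the function once with 50 and once with 125 gives the two window sizes shown in panel (b). A different centring or counting convention would shift the values slightly, so the absolute densities here should not be compared directly with the figure.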
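The strand convention in panel (e) amounts to a small normalization step before mutations are tallied. The Python sketch below assumes that each CpG mutation is given as a reference and alternate base on the forward strand, with the reference being either the C or the G of the CpG. A G reference is reported by its complement on the opposite strand, so forward-strand G>A becomes C>T. The function names and toy input are hypothetical, and the sketch is not the authors' pipeline.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def collapse_cpg_mutation(ref: str, alt: str) -> str:
    """Name a CpG point mutation by the C residue of the palindromic CpG.

    A change at the G of a forward-strand CpG is the same base-pair change as
    a change at the C of the reverse-strand CpG, so the alt base is complemented.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in ("C", "G") or alt not in COMPLEMENT or alt == ref:
        raise ValueError(f"not a CpG point mutation: {ref}>{alt}")
    if ref == "C":
        return f"C>{alt}"
    return f"C>{COMPLEMENT[alt]}"


def mutation_proportions(mutations):
    """Fraction of mutations in each collapsed class (C>A, C>G, C>T)."""
    counts = Counter(collapse_cpg_mutation(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in sorted(counts.items())}


if __name__ == "__main__":
    toy = [("C", "T"), ("G", "A"), ("C", "A"), ("G", "C")]  # hypothetical calls
    print(mutation_proportions(toy))  # {'C>A': 0.25, 'C>G': 0.25, 'C>T': 0.5}
```

Collapsing onto the C residue leaves three classes instead of six, which matches the way the caption says all CpG mutations are referred to.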
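The p value in panel (g) relies on the exact null distribution of Pearson's r. For n independent bivariate-normal pairs with zero correlation, (r + 1)/2 follows a Beta(n/2 - 1, n/2 - 1) distribution. Equivalently, r follows that beta distribution rescaled to [-1, 1]. The Python sketch below uses NumPy and SciPy and assumes this standard null model. The helper `pearson_exact_pvalue` and the synthetic data are hypothetical. SciPy's `scipy.stats.pearsonr` derives its two-sided p value from the same beta distribution, so the two results can be cross-checked.

```python
import numpy as np
from scipy import stats


def pearson_exact_pvalue(x, y):
    """Pearson's r and a two-sided p value from r's exact null distribution.

    Under H0 (no correlation, bivariate normal data), r follows
    Beta(n/2 - 1, n/2 - 1) rescaled from [0, 1] to [-1, 1].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if y.size != n or n < 3:
        raise ValueError("x and y need the same length, at least 3")
    r = float(np.clip(np.corrcoef(x, y)[0, 1], -1.0, 1.0))
    a = n / 2 - 1
    null = stats.beta(a, a, loc=-1, scale=2)  # null distribution of r on [-1, 1]
    # The null is symmetric about 0, so the two-sided p value is twice the lower tail at -|r|.
    p = min(1.0, 2 * null.cdf(-abs(r)))
    return r, float(p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=50)            # synthetic data, not from the study
    y = 0.3 * x + rng.normal(size=50)
    print(pearson_exact_pvalue(x, y))
    print(stats.pearsonr(x, y))        # should agree to floating-point precision
```

This beta formulation is mathematically equivalent to the familiar t test with t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom. Writing it in terms of r itself simply makes the "exact distribution of Pearson's r" explicit.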
