Extended Data Fig. 3: DNA methylation profiles of 144 WGBS samples.

(a) The percentage of covered CpGs (read depth ≥ 5× or ≥ 10×) across the entire genome increases rapidly with the number of reads used for methylation extraction and approximately plateaus at 200 million reads. The solid and dashed black lines are smoothed curves fitted with a generalized additive model using the geom_smooth function from ggplot2 (v3.3.6) in R (v3.4.1) for read depth ≥ 5× and ≥ 10×, respectively. The shaded areas around the lines represent the 95% confidence intervals of the fitted values. (b) Compared with covered CpGs (Covered), uncovered CpGs (Uncovered; read depth < 5× across all samples) tend to be located within gene deserts (df = 15,074,753, P < 2.2 × 10^−308) and in regions with higher CG density (df = 15,074,753, P < 2.2 × 10^−308). Both P values were obtained with the two-sided Welch two-sample t-test; * indicates P < 0.05. (c) Distribution of uncovered CpGs (read depth < 5×) along the entire genome.
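
For readers unfamiliar with the test used in panel (b), the following is a minimal sketch of how a Welch two-sample t statistic and its Welch-Satterthwaite degrees of freedom can be computed. It is illustrative only and is not the authors' code (the legend indicates the analysis was run in R). The function name welch_t and the two small input lists are hypothetical, made-up values, not data from the study. The P value itself requires the t-distribution CDF, which is not in the Python standard library and is therefore left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Squared standard error of each group mean (sample variance / n).
    se2_x = variance(x) / nx
    se2_y = variance(y) / ny
    # Test statistic: difference in means over the combined standard error.
    t = (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (nx - 1) + se2_y ** 2 / (ny - 1))
    return t, df


# Hypothetical values for illustration only (e.g. a per-CpG feature in two groups).
covered = [1.0, 2.0, 3.0, 4.0, 5.0]
uncovered = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(covered, uncovered)
print(f"t = {t_stat:.4f}, df = {dof:.4f}")
```

Because the degrees of freedom come from this approximation rather than from n − 2, the value reported in the legend is presumably rounded to an integer.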