Fig. 1: Colon cancer associates with epigenetically drifted cells.

a, PCA of WGBS datasets generated from healthy human and colon cancer samples of the indicated ages (yo, years old). Data were obtained from the TCGA database. b, Bar chart indicating the number of genes with hyper- or hypomethylated promoters in the indicated conditions (total n = 100; Supplementary Table 1). Differential methylation was assessed by comparing individuals older than 60 years to those younger than 15 years for aging-related changes, and by comparing all CRC samples to all healthy controls for cancer-related changes. The P value was calculated using a two-sided chi-squared test. c, Venn diagrams of the differentially methylated (DM) promoters found in the aging and cancer datasets; differential methylation was assessed as described in b. The P value was calculated using a one-sided hypergeometric test. d, Boxplot indicating the DNAm level of the promoters of ACCA drift genes in colon cancer (colon adenocarcinoma, COAD) and nine other cancer types from the TCGA datasets: esophageal carcinoma (ESCA), liver hepatocellular carcinoma (LIHC), head and neck squamous cell carcinoma (HNSC), bladder urothelial carcinoma (BLCA), kidney renal clear cell carcinoma (KIRC), breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), glioblastoma multiforme (GBM) and skin cutaneous melanoma (SKCM). Boxes represent the interquartile range, with whiskers spanning the 5th–95th percentiles. P values were computed using two-sided Wilcoxon rank-sum tests between healthy and tumor samples. n = 50 samples per cancer type (50 healthy + 50 tumor). e, Proportional distribution of the ACCA drift genes classified as upregulated (15.92%), downregulated (45.21%) or not regulated (39.87%) in CRC compared to healthy controls. Differential expression was determined based on both effect size (log2 fold change) and statistical significance (false discovery rate (FDR) < 0.05) (Supplementary Table 2). f, Schematic representation of an integrative analysis of DNA methylation and gene expression datasets from healthy human colon samples. g, GO analysis of the statistically correlated genes revealed in f.
Numbers indicate the number of significant genes included in the corresponding GO term.
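
For readers unfamiliar with the one-sided hypergeometric test mentioned for panel c, the following is a minimal illustrative sketch of how an overlap between two promoter sets can be tested for enrichment. It is not the authors' analysis code. The function name `hypergeom_overlap_pvalue` and all counts in the usage example are hypothetical placeholders, and the choice of background (the "universe" of promoters tested) is an assumption that must be set by the analyst.

```python
from math import comb


def hypergeom_overlap_pvalue(universe: int, set_a: int, set_b: int, overlap: int) -> float:
    """One-sided (upper-tail) hypergeometric P value for a set overlap.

    Returns P(X >= overlap), where X is the number of promoters shared by
    two sets of sizes `set_a` and `set_b` drawn from a common background
    of `universe` promoters. All argument names are illustrative.
    """
    upper = min(set_a, set_b)
    if not (0 <= overlap <= upper and max(set_a, set_b) <= universe):
        raise ValueError("counts are inconsistent with the universe size")

    # Number of ways to choose set_b promoters from the background.
    total = comb(universe, set_b)

    # Sum the probability mass for every overlap at least as large as observed.
    # math.comb returns 0 when k > n, so impossible terms drop out.
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call; not taken from the figure.
    p = hypergeom_overlap_pvalue(universe=20000, set_a=300, set_b=400, overlap=40)
    print(f"one-sided hypergeometric P = {p:.3e}")
```

The sketch uses exact integer arithmetic from the standard library, which avoids an external dependency; a statistics package with a hypergeometric survival function would give the same upper-tail probability.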