Abstract
A subset of colorectal cancer (CRC) patients displays abnormal DNA methylation patterns (“Outlier Methylation Phenotype” (OMP)) in normal adjacent tissue (NAT). We analyzed the impact of OMP status on epigenetic age and tumor progression using published aging and mitotic clocks and a colonic epigenetic clock (EpiAge) developed on healthy controls with equal representation from Black and Caucasian populations. Three aging clocks (including EpiAge) showed significantly lower epigenetic age in NAT of CRC patients than in controls. Mitotic clocks suggested significantly fewer stem cell divisions in CRC versus controls. A binomial model using inactive X chromosome CpGs suggested that NAT of CRC patients had fewer stem cells compared with controls. On comparing NAT and tumor methylomes, the OMP group showed the fewest epigenetic differences. In conclusion, our study demonstrates that NAT of OMP-CRC patients undergoes epigenetic age deceleration and has fewer epigenetic differences from tumor tissue, suggesting that NAT of OMPs has progressed toward a tumor identity.
Introduction
We have identified a subgroup of colorectal cancer (CRC) patients who have highly disrupted epigenomes and display an “Outlier Methylation Phenotype” (OMP)1,2 in normal adjacent tissue (NAT). Black CRC patients appear more than twice as likely to have OMP as their White counterparts (11/42 versus 4/35)2.
Whether OMPs have distinguishing clinical outcomes is unclear. However, we observed that the frequency of OMPs in the healthy population increases with age, suggesting the accumulation of epigenetic perturbations over time1. If epigenetic changes occur throughout life, the rate of biological aging could be different from the rate of chronological aging. A common approach to estimating biological age is to use DNA methylation clocks. Such epigenetic clocks are composed of a set of CpGs that exhibit age-related change, identified using predictive modelling, which are then used to estimate the epigenetic age of individuals3,4,5,6. However, none of the previously established clocks are derived from healthy populations of African descent. For instance, Hannum’s clock3 is based on peripheral blood leukocytes from White individuals, whereas Horvath’s multi-tissue clock4 is derived from multiple datasets including a large number of normal tissues from cancer patients in TCGA (which also under-represents patients of African descent). These epigenetic clocks have been used to determine the epigenetic age of colon tumor tissues7 or normal colon mucosa8,9. Interestingly, one of these studies, conducted on a racially diverse population9, also notes the limitations of the Horvath clock for the African American population. Hence, because Black patients are absent from or under-represented in the development of existing epigenetic clocks, we aimed to develop a colonic epigenetic clock using healthy controls, with equal representation of White and Black individuals, to estimate the epigenetic age of our patients.
While it appears that OMPs are associated with undesirable outcomes, including low birth weight10 and colorectal cancer2, we wished to identify the impact of OMP status on epigenetic aging and tumor progression.
Results
Identification of age-associated CpGs and development of a normal colon mucosa epigenetic clock (EpiAge)
Our study cohort comprised 68 healthy controls and 77 CRC patients. Among the CRC patients, there were 15 OMPs and 62 non-OMPs (Suppl Table 1).
We first identified 92,171 age-related CpGs (p < 0.05) using linear regression of methylation data from the healthy control samples, with equal representation of Black and White racial ancestries (Suppl Table 1). We then developed a colonic epigenetic clock by predictive modelling of the top 10,060 (p < 0.001) CpGs from the 92,171 age-associated CpGs (Suppl Fig. 1). The final model (EpiAge) had 58 CpGs (Table 1), which were used to calculate the epigenetic ages of the samples using the below formula:
Based on our newly developed epigenetic clock, epigenetic ages of control samples were very highly correlated (R² = 0.9928) with their chronological ages (Fig. 1a), as expected for samples on which the clock was developed. On the other hand, the epigenetic ages of NAT of cancer patients were not strongly correlated (R² = 0.1918) with their chronological ages (Fig. 1b). As we have identified a subgroup of highly epigenetically unstable cancer patients2, described as having an outlier methylation phenotype (OMP), we further examined the correlation after excluding these OMPs. Non-OMP cancer patients had a moderate correlation (R² = 0.4753) between their epigenetic and chronological ages (Fig. 1c).
Comparison of epigenetic ages between healthy controls and NAT of cancer patients
Our study groups (healthy controls and cancer patients) were age-matched, so there was no difference in the chronological ages of the study groups (Fig. 2a). However, our colonic epigenetic clock showed significantly lower epigenetic age for NAT of cancer patients compared to healthy controls (Fig. 2b). As our EpiAge clock is derived from a control group of moderate size, rather than a large group, we analyzed the cancer/control differences using other published epigenetic clocks (Horvath, Hannum, GrimAge, PhenoAge) (Fig. 3a–d). Two of the four clocks (Hannum and PhenoAge) showed significantly lower age, or age deceleration, of cancer patients compared to the control group (Fig. 3b,c). The Horvath clock, which is derived substantially from NAT of cancer patients in the TCGA database, did not show a significant difference, nor did the GrimAge clock, which is trained on smoking pack-years. We further used four mitotic clock measures (tnsc from epiTOC2, tnsc2 from epiTOC2, pcgtAge from epiTOC and HypoSC from HypoClock) (Fig. 3e–h) to interrogate these differences. Interestingly, all four mitotic measures suggested significantly fewer stem-cell divisions in NAT of cancer patients compared to healthy controls.
Figure 3. Epigenetic age in NAT and Healthy groups using different aging and mitotic clocks. (a) Horvath multi-tissue clock; (b) Hannum clock; (c) PhenoAge; (d) GrimAge; (e) tnsc from epiTOC2; (f) tnsc2 from epiTOC2; (g) pcgtAge from epiTOC; (h) hypoSC (Hypo score) from HypoClock. Red symbols indicate OMPs. tnsc: the estimated cumulative number of stem-cell divisions per stem-cell per year and per sample using the full epiTOC2 model; tnsc2: the estimated cumulative number of stem-cell divisions per stem-cell per year and per sample using an approximation of epiTOC2 which assumes all epiTOC2 CpGs have beta-values exactly 0 in the fetal stage; pcgtAge: the mitotic score obtained using the epiTOC model; hypoSC: HypoScore, obtained by taking the average methylation of 678 solo-WCGWs which exhibit uniformly high methylation across fetal tissue types.
Because cell-type composition can influence the shifts observed in epigenetic aging and stem cell divisions in OMP NAT, we used a metric of epithelial cell purity, leukocyte unmethylation for purity (LUMP)11. We have shown earlier2 that there is no difference in epithelial cell purity between NAT and healthy controls. In this study, we further tested whether NAT of OMPs differs from NAT of non-OMPs in epithelial cell purity. Our data (Suppl Fig. 2) indicate that NAT of OMPs and non-OMPs do not differ in epithelial cell purity. Furthermore, OMPs and non-OMPs did not differ clinically (Suppl Table 1). However, OMPs and non-OMPs did differ in some of the methylation-based predictors for lifestyle and biochemical traits (alcohol consumption, smoking and BMI).
We also employed a binomial model to compare stem-cell number in two groups of women (healthy control (Nh) and cancer (Nc)) using ChrX CpGs (Materials and Methods; see also McLaren, 197212; Turan et al., 201013). Among the sites analyzed, eleven sites had significantly higher Nh than Nc. On the other hand, only one site was significant in the opposite direction. No other sites showed significant differences. However, as our criterion for significance was extremely stringent (Materials and Methods), it is possible that more than these dozen sites would achieve statistical significance under a less stringent criterion. Overall, our data indicate significantly greater stem cell number in the healthy control group compared to the cancer group. This analysis provides a plausible explanation for decreased epigenetic age in NAT of cancer patients compared to healthy controls.
We examined the influence of OMPs (as defined in Ghosh et al., 20222) on epigenetic age deceleration seen in the NAT of cancer patients. For this set of analyses, we excluded the OMPs and repeated the analyses shown in Figs. 2 and 3 for non-OMP cancer patients versus healthy controls. The outcome for chronological age and EpiAge was similar (Fig. 4a,b): there was no difference in chronological age, but EpiAge of NAT of cancer patients was significantly lower than that of controls. All other epigenetic clocks (Fig. 5a–d) had similar outcomes. However, three of the four mitotic clock measures (tnsc, tnsc2, pcgtAge) lost significance after excluding the OMPs (Fig. 5e–h).
On using the tissue prediction function, we observed that OMPs had < 20% probability of being a colon tissue (Fig. 6).
Figure 5. Epigenetic age in NAT and Healthy groups using different aging and mitotic clocks after excluding the OMPs. (a) Horvath multi-tissue clock; (b) Hannum clock; (c) PhenoAge; (d) GrimAge; (e) tnsc from epiTOC2; (f) tnsc2 from epiTOC2; (g) pcgtAge from epiTOC; (h) hypoSC (Hypo score) from HypoClock. Red symbols indicate OMPs. tnsc: the estimated cumulative number of stem-cell divisions per stem-cell per year and per sample using the full epiTOC2 model; tnsc2: the estimated cumulative number of stem-cell divisions per stem-cell per year and per sample using an approximation of epiTOC2 which assumes all epiTOC2 CpGs have beta-values exactly 0 in the fetal stage; pcgtAge: the mitotic score obtained using the epiTOC model; hypoSC: HypoScore, obtained by taking the average methylation of 678 solo-WCGWs which exhibit uniformly high methylation across fetal tissue types.
Differential expression analysis of OMPs vs. non-OMPs
To validate our methylation data on stem-cell estimation, we also performed expression analysis on a subset of samples (4 OMPs vs. 7 non-OMPs). All the NAT samples came from patients with the same tumor grade (Grade II/moderately differentiated), because tumor grade can be affected by the cancer stem cell population. We identified 149 upregulated and 34 downregulated genes in OMPs compared to non-OMPs (Suppl Fig. 3, Suppl Table 2). Interestingly, a number of genes associated with cancer stem cells or stemness (NNMT, SAA1, CCL2, CXCL2, CCN1) were among the top 10 upregulated genes in OMPs. Furthermore, pathway analysis of the 149 upregulated genes showed 30 activated pathways (z score > 2) in the OMP group (Suppl Table 3). The top activated pathway was the Interleukin-4 and interleukin-13 signaling pathway, which is a known pathway for stemness in CRC.
Comparison of epigenetic ages between normal and tumor tissues of cancer patients
Comparison of the epigenetic ages of NAT and tumor tissues of cancer patients demonstrated that tumor tissues have significantly greater epigenetic ages (paired t-test: p < 0.0001, mean of differences = 25.99) compared to their NAT in the EpiAge clock (Fig. 7a,b). We also analyzed these differences using other epigenetic clocks (Fig. 8a–d). Interestingly, only two of the four clocks (Hannum (paired t-test: p = 0.0006, mean of differences = 8.672) and PhenoAge (paired t-test: p < 0.0001, mean of differences = 37.86)) validated this trend, whereas the Horvath clock (which is based largely on TCGA samples) indicated decreased age of tumor tissues (paired t-test: p = 0.0080, mean of differences = −5.635) and GrimAge did not show any difference (paired t-test: p = 0.242) between the epigenetic ages of NAT and tumor tissues. As expected, all mitotic clock measures (Fig. 8e–h) indicated a significantly (p < 0.0001) greater number of stem cell divisions in the tumor tissues compared to NATs.
Figure 7. Epigenetic age of tumor tissues using the EpiAge clock. (a) Comparison of EpiAge between NAT and tumor tissues of CRC patients. (b) Estimation plot showing the deviation of EpiAge from NAT to tumor tissue for individual CRC patients (lines) and the mean of differences between EpiAge in tumor and NAT.
Figure 8. Comparison of epigenetic ages in NAT and tumor tissues of CRC patients using different aging and mitotic clocks. (a) Horvath multi-tissue clock; (b) Hannum clock; (c) PhenoAge; (d) GrimAge; (e) tnsc from epiTOC2; (f) tnsc2 from epiTOC2; (g) pcgtAge from epiTOC; (h) hypoSC (Hypo score) from HypoClock. Red symbols indicate OMPs. tnsc: the estimated cumulative number of stem-cell divisions per stem-cell per year and per sample using the full epiTOC2 model; tnsc2: the estimated cumulative number of stem-cell divisions per stem-cell per year and per sample using an approximation of epiTOC2 which assumes all epiTOC2 CpGs have beta-values exactly 0 in the fetal stage; pcgtAge: the mitotic score obtained using the epiTOC model; hypoSC: HypoScore, obtained by taking the average methylation of 678 solo-WCGWs which exhibit uniformly high methylation across fetal tissue types.
Differential methylation of tumor tissues
As a measure of global epigenetic change between NAT and tumor, we also analyzed the magnitude of site-specific epigenetic differences between the NAT and tumor tissues of the cancer patients. Overall, tumor tissues were hypomethylated (77% of interrogated CpGs) compared to NATs (Fig. 9a). We further divided the cancer patients into three groups based on racial ancestry or OMP status: White non-OMPs, Black non-OMPs and OMPs (White and Black OMPs combined, due to the limited number of White OMPs). The White and Black non-OMP groups showed a large number of significant methylation differences between NAT and tumor tissues (104,209 CpGs and 60,405 CpGs, respectively; Fig. 9b,c).
Figure 9. Volcano plots showing differentially methylated CpGs between NAT and tumor tissues of CRC patients. (a) All CRC patients (n = 76); (b) White non-OMP CRC patients (n = 31); (c) Black non-OMP CRC patients (n = 30); (d) OMP CRC patients (n = 15). CpGs with P value less than 4.6E-07 and magnitude of difference > 0.05 were considered to be significant and are plotted for each comparison. The x-axis shows the magnitude of methylation difference between normal and tumor tissues. The y-axis shows the significance level (p-value). Numbers (within boxes) on the left side of the vertical dotted lines indicate the number of significantly hypomethylated CpGs in tumors, and numbers (within boxes) on the right side of the vertical dotted lines indicate the number of significantly hypermethylated CpGs in tumors compared to NAT.
On the other hand, the methylation profile of NAT did not differ dramatically from that of tumor tissues in the OMP group (only 871 significantly differentially methylated CpGs; Fig. 9d), suggesting that the NAT of OMP patients was much more similar, in DNA methylation profile, to their tumors than the NAT of non-OMP patients.
As variability within the groups can affect the significance observed, we performed Principal Component Analysis (PCA) (Suppl Fig. 4) to investigate whether the decreased differences between OMP NAT and OMP tumors are attributable to variability. However, as expected, our results suggested that tumor tissues of both OMPs and non-OMPs are highly variable (green and purple ellipses in Suppl Fig. 4), whereas NAT of OMPs are all clustered close together (red ellipse) and are minimally variable. It is noteworthy that we observed similar results when comparing healthy tissue and NAT (Fig. 1A,B of Ghosh et al., 2022)2, wherein NAT of OMPs clustered separately from NAT of non-OMPs and healthy controls.
Discussion
Our previous finding2 of large numbers of DNA methylation differences and differences of substantial magnitude in CRC with OMP led to the current study. As it has been suggested that epigenetic alterations are age-related changes that increase the risk of cancer and other age-related disorders14, we evaluated the epigenetic age of our patient groups. It is noteworthy that a number of epigenetic clocks have been developed to calculate the epigenetic age of individuals. However, none of these clocks were developed from a population with substantial representation of African Americans. Hence, we chose to develop our own colonic clock using the methylation data of healthy controls from our study. We used four published DNA methylation aging clocks (HannumAge3, HorvathAge4, PhenoAge5, GrimAge6), four measures from mitotic clocks (tnsc, tnsc2, pcgtAge, HypoSC)15 and our own colonic clock (EpiAge) to compare the epigenetic age of normal colon mucosa from patients without cancer with normal colon mucosa and tumor tissues from colon cancer patients. Two of the clocks (Horvath and GrimAge) showed no significant difference between the epigenetic age of normal colon mucosa in colon cancer patients compared with patients without cancer. This observation would seem to suggest that the development of colon cancer has no relationship to biological age. However, we note that the Horvath clock relies heavily on DNA methylation data from “adjacent normal” tissues (NAT) of TCGA, so that the failure of this clock to discriminate between NAT of cancer patients and patients without cancer may not be surprising. Furthermore, the GrimAge clock is trained on smoking pack-years. In other words, the training set used in the GrimAge clock is derived from methylation data from individuals with varying degrees of smoking habits.
In this regard, we note that the GrimAge clock does not distinguish between the epigenetic age of tumor tissues versus NAT of cancer patients and the Horvath clock predicts a slightly younger epigenetic age for tumor tissue.
Three of the aging clocks (Hannum, PhenoAge and EpiAge) show substantial differences between the epigenetic age of NAT in cancer patients versus patients without cancer. However, the direction of the difference (NAT in cancer patient being epigenetically “younger” than non-cancer patient mucosa) is unexpected from the simplest hypothesis that NAT of cancer patients should have greater biological age than the normal tissues of patients without cancer (cancer being a disease associated with greater chronological age).
Because DNA methylation clocks are related, in greater or lesser degree, to failure of maintenance of DNA methylation through cell divisions, i.e., they perform as “mitotic clocks”, a simple prediction is that tumor tissues should have more cell divisions than the NAT. This prediction is fulfilled using the epiTOC, epiTOC2 and HypoClock solo-WCGW models of Teschendorff15. Applying the same models to compare the number of mitotic divisions in NAT of cancer patients with the normal mucosa of patients without cancer, the stem cells of the cancer patients appear to have undergone fewer mitotic divisions in the epiTOC and epiTOC2 clocks. Similarly, significantly decreased hypomethylation of the solo-WCGWs in NAT of cancer patients compared to the normal mucosa of patients without cancer suggests fewer stem cell divisions in NAT of patients with cancer. Interestingly, the epiTOC and epiTOC2 clocks lose this significance when OMPs are excluded. Taken at face value, this observation seems to suggest that either the mitotic clocks of cancer patients run slower (because they have accumulated fewer cell divisions), or that they have been started later than those of patients without cancer, or that NAT of cancer patients has undergone fewer cell divisions because it has fewer stem cells.
Our analysis does indicate that the NAT of the cancer group has fewer stem cells than the control group using a binomial model of independent X chromosome CpG sites to predict the number of stem cells. Unfortunately, we were limited by the number of OMPs and hence could not carry out a similar comparison for OMPs and non-OMPs as separate groups. However, we did perform differential expression analysis on a subset of OMPs and non-OMPs. Five out of the top 10 upregulated genes (NNMT, SAA1, CCL2, CXCL2, CCN1) have been shown to be associated with cancer stem cells (either have increased expression in cancer stem cells or promote stemness)16,17,18,19,20,21. The top activated pathway was Interleukin-4 and interleukin-13 signaling pathway which is a known pathway for stemness in CRC22,23.
As prognosis in cancer is dependent on tumor characteristics rather than NAT, we also studied the epigenome of the tumor tissues. We had paired tissues (NAT and tumor) from CRC patients, so we were able to conduct paired analysis to identify epigenetic events involved in normal to tumor tissue progression. We quantified the epigenetic differences between NAT and tumor tissues and found that these epigenetic differences followed a decreasing order from White non OMPs, African American non OMPs to OMPs. Our tumor methylation data re-emphasized our initial observation that NAT of OMPs were highly epigenetically disrupted, and more similar to tumor tissues in overall methylation levels, compared to non-OMPs.
It is noteworthy that there was no difference in demographic or clinical profile between OMPs and non-OMPs (Suppl Table 1). We were limited by data on lifestyle factors such as diet, alcohol or smoking history and metabolic status. However, on using methylation-based predictors (Suppl Table 4) we did see significant differences in alcohol consumption, smoking and BMI. Hence, lifestyle factors might be associated with OMP status and epigenetic aging, but this needs to be validated in future studies.
It is noteworthy that OMP is a characteristic feature of NAT. Hence, unlike other molecular markers like CpG island methylator phenotype (CIMP)24 which can be used only after cancer development, OMP has the potential to identify individuals at risk of developing cancer as well as improve treatment outcomes in CRC patients. Our present study provided insights into the characteristic features of OMPs in terms of epigenetic aging and tumor epigenome. However, we were limited by the sample size to conduct any clinical correlations. Of note, decelerated epigenetic age has been associated with poor prognosis in breast cancer25, cervical squamous cell carcinoma26 and stomach adenocarcinoma27.
Our study demonstrated two remarkable distinguishing features of OMP CRC patients: lower epigenetic age (or epigenetic age deceleration) of NAT and minimal epigenetic differences between tumor and NAT. A potential explanation for these observations could be the existence of cancer stem cells. Cancer stem cells are characterized by self-renewal, proliferation and enhanced plasticity, leading to tumorigenesis. They may also contribute to metastasis and tumor heterogeneity. Such cancer stem cells can originate by dedifferentiation of differentiated cells or by transformation of non-malignant adult stem cells28.
The gut microbiota also plays an important role in inducing stemness. We have shown a significantly higher abundance of the CRC-associated bacterial genera Bacteroides and Fusobacterium in NAT of OMPs compared to non-OMPs2. Interestingly, bacterial species (B. fragilis and F. nucleatum) from both of these genera have been shown to induce stemness in colorectal tissues29,30. Previous studies have also shown that cancer stem cell markers are enriched in NAT31. In our case, this can be explained by using the Waddington landscape (Fig. 10), wherein the presence of cancer stem cells within the NAT of cancer patients results in epigenetic age deceleration. Furthermore, it is plausible that OMPs/outliers have a higher proportion of cancer stem cells, which may have originated from dedifferentiation of normal colon cells (red ball in Fig. 10) or oncogenic transformation of stem cells (blue ball in Fig. 10). Irrespective of the mode of origin, these cancer stem cells lose colon cell identity, resulting in diminished resemblance to colon tissue (Fig. 6). We speculate that NAT of OMPs has a higher proportion of cancer stem cells, which might explain the minimal epigenetic differences between tumor and NAT.
Our study has some limitations. First of all, we have only considered chronological age for healthy controls for the EpiAge clock development and excluded other demographic variables. Secondly, our conclusions are solely based on in silico (methylation and expression) data. We were not able to validate the results using immunohistochemistry due to unavailability of unstained tissue slides.
In future, large scale multi-centric studies are needed to identify a large group of OMP among CRC patients and screening colonoscopy patients and validate the outcomes of the present study, as well as for better clinical characterization of CRC patients with OMPs (association with tumor stage, response to treatment and survival). Furthermore, interventional studies need to be conducted in screening colonoscopy patients to identify OMPs and employ lifestyle changes (like diet, physical activity) to determine whether they suppress the risk of CRC development.
Methods
Epigenetic aging
Methylation data processing and identification of OMPs
Genomic DNA from normal colon mucosa of healthy controls (n = 68) and NAT of CRC patients (n = 77) was run on Illumina’s EPIC array as described previously2. The resulting raw files were normalized using minfi package in R and checked for batch effects and quality control2.
Identification of OMPs in this cohort of patients was done by the boxplot method of outlier analysis2. Briefly, in the first step, each CpG site was analyzed for the presence of outliers [methylation levels beyond 1.5 times the interquartile range (IQR) below the first quartile (“hypomethylated outliers”) or above the third quartile (“hypermethylated outliers”) of the distribution]. In the second step, the distribution of the number of outlier CpGs per sample was plotted and the same IQR test as in Step 1 was applied to identify individuals with an extremely large number of outlier CpGs compared with the rest of the population. Outliers of Step 2 were considered to be the individuals with OMP. Furthermore, OMP status was validated by using other statistical approaches such as unsupervised clustering and principal component analysis (PCA)2.
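The two-step boxplot procedure described above can be sketched in base R as follows. This is a minimal illustration rather than the pipeline used in the study: the function names `iqr_outliers` and `identify_omp`, the CpG-by-sample layout of the input matrix, the toy data, and the choice to test only the upper tail in Step 2 (individuals with an unusually large outlier count) are all assumptions made for the example.

```r
# Boxplot rule: flag values more than 1.5 * IQR below Q1 or above Q3.
# side = "upper" flags only the high tail (used here for outlier counts).
iqr_outliers <- function(v, side = c("both", "upper")) {
  side <- match.arg(side)
  q <- quantile(v, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  high <- v > q[2] + fence
  low <- v < q[1] - fence
  if (side == "upper") high else (high | low)
}

# beta: numeric matrix of methylation values, CpGs in rows, samples in columns
# (hypothetical layout chosen for this sketch).
identify_omp <- function(beta) {
  # Step 1: within each CpG, flag hypo- or hypermethylated outlier samples.
  flags <- t(apply(beta, 1, iqr_outliers))
  # Step 2: count outlier CpGs per sample, then apply the IQR test to the counts.
  counts <- colSums(flags, na.rm = TRUE)
  list(outlier_counts = counts, omp = iqr_outliers(counts, side = "upper"))
}

# Toy data (not real measurements): 200 CpGs, 20 samples,
# with sample s1 shifted at every CpG.
set.seed(1)
toy_beta <- matrix(rnorm(200 * 20, mean = 0.5, sd = 0.02), nrow = 200,
                   dimnames = list(paste0("cg", 1:200), paste0("s", 1:20)))
toy_beta[, 1] <- 0.95
res <- identify_omp(toy_beta)
```

In this toy input, `res$outlier_counts` holds the number of outlier CpGs per sample and `res$omp` is a logical vector marking the samples whose counts are themselves boxplot outliers.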
Development of colonic clock (EpiAge)
Methylation data and chronological ages of 68 controls (from our previous study2) were used to identify age-associated CpGs (using the lm function in R). Elastic net regression (using the glmnet package in R) was used to perform predictive modelling and develop the final EpiAge model. Briefly, highly significant (linear regression p < 0.001) age-related CpGs were used to fit the model using the following code:
fit <- glmnet(x, y, lambda = cv.glmnet(x, y)$lambda.1se)
Then the coefficients were extracted using the below code:
Coefficient <- coef(fit, s = fit$lambda.1se)
Where x = Methylation data of the samples for the significant age-related CpGs; y = Chronological age of the samples.
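To make the workflow around the elastic net fit concrete, the sketch below shows two steps in base R: screening CpGs for association with age by per-CpG linear regression, and turning a set of extracted coefficients into an epigenetic age. For a linear model of this kind, the prediction is the intercept plus the weighted sum of beta values. The function names, the CpG-by-sample matrix layout, the toy data and the toy coefficients are hypothetical; the actual EpiAge CpGs are those reported in Table 1.

```r
# Per-CpG linear regression of methylation on age; returns the names of CpGs
# whose age coefficient has p < alpha. beta: CpGs in rows, samples in columns.
age_associated_cpgs <- function(beta, age, alpha = 0.001) {
  pvals <- apply(beta, 1, function(m) {
    summary(lm(m ~ age))$coefficients["age", "Pr(>|t|)"]
  })
  names(pvals)[pvals < alpha]
}

# Apply a linear clock: intercept + sum over clock CpGs of (weight * beta value).
# coefs: named numeric vector with an "(Intercept)" entry and one entry per CpG.
predict_epiage <- function(beta, coefs) {
  w <- coefs[names(coefs) != "(Intercept)"]
  as.vector(coefs[["(Intercept)"]] + crossprod(beta[names(w), , drop = FALSE], w))
}

# Toy data (not real measurements): cgA tracks age, cgB does not.
age <- c(30, 40, 50, 60, 70, 80)
toy <- rbind(
  cgA = 0.2 + 0.005 * age + c(0.01, -0.01, 0.005, -0.005, 0.01, -0.01),
  cgB = c(0.5, 0.6, 0.4, 0.4, 0.6, 0.5)
)
selected <- age_associated_cpgs(toy, age)

# Hypothetical coefficients, for illustration only.
toy_coefs <- c("(Intercept)" = 10, cgA = 100)
toy_ages <- predict_epiage(toy, toy_coefs)
```

In practice, the coefficient vector would come from the fitted model (for example, the non-zero entries of the extracted coefficients), and the same function could then be applied to NAT and tumor samples.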
Calculation of epigenetic ages and mitotic measures
Epigenetic ages were calculated for both the published clocks3,4,5,6 and our EpiAge clock using the methylation values (from our previous study2) of the CpGs in the respective clocks. Parameters for mitotic clocks were calculated as described15. We also determined the probability of colon tissue in our samples using Horvath’s tissue predictor function4.
Correlation analyses
Correlation analyses among chronological and epigenetic ages for different study groups were done in GraphPad Prism (version 8).
Between-group comparisons
Comparisons of different parameters (chronological age, epigenetic age, mitotic measures) between the groups were done by two-sided t-test in GraphPad Prism (version 8). P values less than 0.05 were considered significant.
Cell composition/purity
Epithelial cell purity between the NAT of OMPs and NAT of non-OMPs was estimated by leukocyte unmethylation for purity (LUMP) as described previously11.
DNA methylation-based predictors
DNA methylation-based predictors for lifestyle and biochemical traits were calculated using MethylDetectR32.
Stem-cell number calculation using X chromosome CpGs in women
This method is based on the variance of the binomial distribution to calculate the number of stem cells using X-chromosome DNA methylation as a proxy for inactive X-chromosomes in females (after McLaren, 197212; Turan et al., 201013). We filtered the 18,567 chromosome X CpGs on the array to 2530 CpGs that have different methylation ranges in males and females and are unmethylated in males (methylation fraction < 0.10 in males). Because we rely on the principle of X-chromosome inactivation, we further filtered these to 2015 methylated CpGs (> 40% methylation in females). Next, we selected 308 independent CpGs (CpGs ≥ 250 bp away from neighboring CpGs) for further calculations (Suppl Fig. 5). We used a binomial model to estimate the number, N, of stem cells in the two populations (healthy and cancer) of women. Using this model, our estimate is N ~ p(1-p)/var(p), where p is the average fraction methylated over the 35 healthy control women, or the 39 women with cancer, and var(p) is the variance of that fraction across the women in the group.
We hypothesized that all individuals in the group of either healthy controls or women with cancer have the same chance, p, of methylating one of the X chromosomes in each of their N embryonic stem cells. This situation is analogous to the estimation of N using data from paternal/maternal X-gene inheritance. In that case p = 1/2. In our case p is unknown but assumed to be the same over all women in the group, that is, all healthy women have the same p, and all women with cancer have the same p, but the two groups may have different values of p. We used each site to estimate N using the above formula.
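The estimator follows directly from the binomial variance. As a short worked derivation, assuming that methylation states are independent across the N stem cells and ignoring measurement noise (the symbols K_j and m_j are introduced here only for illustration), let K_j be the number of stem cells in woman j in which the site is methylated and m_j the corresponding methylated fraction:

```latex
\begin{align*}
K_j &\sim \mathrm{Binomial}(N, p), \qquad m_j = \frac{K_j}{N}, \\
\mathbb{E}[m_j] &= p, \qquad \mathrm{Var}(m_j) = \frac{p(1-p)}{N}, \\
N &= \frac{p(1-p)}{\mathrm{Var}(m_j)} \;\approx\; \frac{\hat{p}\,(1-\hat{p})}{\widehat{\mathrm{var}}(p)} .
\end{align*}
```

Here \hat{p} is the mean methylated fraction across the women in a group and \widehat{\mathrm{var}}(p) is its sample variance, so a larger spread between women at a site implies a smaller estimated N.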
We tested the null hypothesis of no difference between the two groups. This was done for each of the 308 sites as follows: we first determined the N value for each group at one site, Nh (average number of stem cells in healthy control group) and Nc (average number of stem cells in cancer group). Next, we found the absolute difference diff(data) = Nh – Nc. Then we randomly permuted the 74 data numbers to form a simulated data set. In this simulated data all 74 observations are equally likely, 35/74, to be assigned to the healthy simulated set and 39/74 to the cancer set. So, the hypothesis of no difference is strictly true for the simulated data. Finally, we computed diff(simulated) for the simulated data and compared it to diff(data). We simulated 99 data sets and counted the number of times the simulated difference is smaller than diff(data). If diff(data) is among the 5 largest differences we declared H > C significantly different, at the 5% level.
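The per-site randomization test described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the toy vectors are not real data, and "among the 5 largest differences" is read as the observed difference ranking in the top 5 of the 100 values formed by the 99 simulated differences plus the observed one.

```r
# Estimate stem-cell number N at one site from per-woman methylated fractions,
# using the binomial variance relation N ~ p(1 - p) / var(p).
estimate_n <- function(m) {
  p <- mean(m)
  p * (1 - p) / var(m)
}

# One-sided randomization test for Nh > Nc at a single site.
perm_test_site <- function(healthy, cancer, n_sim = 99, top = 5) {
  obs <- estimate_n(healthy) - estimate_n(cancer)
  pooled <- c(healthy, cancer)
  n_h <- length(healthy)
  sim <- replicate(n_sim, {
    # Randomly reassign the pooled observations to the two groups.
    idx <- sample(length(pooled), n_h)
    estimate_n(pooled[idx]) - estimate_n(pooled[-idx])
  })
  # Significant if fewer than `top` simulated differences reach the observed one,
  # i.e. the observed difference is among the `top` largest of n_sim + 1 values.
  list(diff = obs, significant = sum(sim >= obs) < top)
}

# Toy fractions (not real data) for one site.
healthy <- c(0.45, 0.50, 0.55, 0.50, 0.48, 0.52)
cancer <- c(0.30, 0.70, 0.40, 0.60, 0.35, 0.65)
set.seed(1)
site_result <- perm_test_site(healthy, cancer)
```

Running `perm_test_site` once per site yields the first-pass list of candidate sites; the retesting step can then be implemented by calling it repeatedly on each candidate.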
We recognized that with 308 tests at the 5% level we should expect at least a few spurious results. We did not want to subject all 308 sites to more stringent testing because of the time it would consume. Instead, we took the results significant at the 5% level (fewer than 20) and retested each of them 100 times. If a result passed all 100 new 5% randomized tests, it was regarded as significant. Otherwise, it was discarded from the list of significant results. The resulting test's nominal significance level, treating the 100 retests as independent, is (0.05)^100 ≈ 7.9E-131. We expect no spurious significant results.
Differential gene expression analysis
RNA-Seq was performed on 4 OMPs and 7 non-OMPs. We used R package DESeq2, version 3.13 for differential gene expression analyses. A log2 fold change > 1 or < −1 and an adjusted p value < 0.05 were considered significant.
Methylation analyses of tumor samples
Tumor samples
Tumor tissues (fresh frozen) of 76 patients with colorectal cancer (enrolled in our previous study2) were purchased from the Fox Chase Cancer Center biobank.
Written informed consent from the patients was obtained and the study was conducted in accordance with Declaration of Helsinki ethical guidelines and the study was approved by Temple University’s institutional review board.
Sample processing and quality check
Genomic DNA was extracted from colon tissue samples using the Invitrogen PureLink Genomic DNA kit as per the manufacturer’s protocol. Extracted DNA was quantified using a Thermo Fisher NanoDrop.
Illumina EPIC array
Extracted DNA was sent to an external genomics facility at Penn State University to be run on Illumina’s EPIC array. Prior to the array run, the DNA was treated with bisulfite using the Zymo EZ DNA Methylation kit. Bisulfite-treated DNA was then processed for the EPIC array as described previously33. The output data were generated as IDAT files.
Array data processing
Raw data files were preprocessed using minfi’s preprocessIllumina function as described previously2. Beta values obtained after these preprocessing steps were used for all the subsequent analyses.
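For readers unfamiliar with beta values, the conventional definition is the methylated signal divided by the total signal at a probe. The sketch below illustrates that definition only; the function name, the optional `offset` term and the handling of zero total intensity are assumptions, and the exact settings used in this study are not stated here.

```python
def beta_value(methylated, unmethylated, offset=0.0):
    """Beta = M / (M + U + offset) for one probe in one sample.

    `offset` is an optional stabilizing constant (an assumption in this
    sketch); probes with zero total intensity return None.
    """
    total = methylated + unmethylated + offset
    if total == 0:
        return None
    return methylated / total


# Hypothetical intensities: 300 methylated, 100 unmethylated
print(beta_value(300, 100))  # 0.75
```

Beta values range from 0 (unmethylated) to 1 (fully methylated), which is why a fixed cutoff on the difference in beta can be read as a difference in methylation fraction.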
Differential methylation analysis
Between-group comparisons of methylation values were done using a paired two-sided t test in R. Bonferroni correction was used to correct for multiple testing, and a P value of 4.6E-07 was used as the cutoff for significance, as described previously2. A cutoff of 0.05 for the magnitude of the difference in beta values was also applied. Hence, CpGs with a P value less than 4.6E-07 and a magnitude of difference > 0.05 were considered significant.
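The two-part criterion above can be illustrated for a single CpG as follows. This is a sketch under stated assumptions: the P value is taken as already computed by the paired t test, the function names are hypothetical, and the example beta values are invented for illustration.

```python
from statistics import mean

P_CUTOFF = 4.6e-07     # Bonferroni-corrected threshold quoted in the text
DELTA_CUTOFF = 0.05    # minimum magnitude of the mean difference in beta


def mean_paired_difference(group_a, group_b):
    """Mean of the per-pair differences (group_a - group_b) for one CpG."""
    if len(group_a) != len(group_b):
        raise ValueError("paired samples must have equal length")
    return mean([a - b for a, b in zip(group_a, group_b)])


def is_differentially_methylated(p_value, group_a, group_b):
    """True when both the P value and the effect-size criteria are met."""
    delta = mean_paired_difference(group_a, group_b)
    return p_value < P_CUTOFF and abs(delta) > DELTA_CUTOFF


# Hypothetical paired beta values for one CpG in three patients
tissue_a = [0.80, 0.75, 0.82]
tissue_b = [0.40, 0.45, 0.38]
print(is_differentially_methylated(1e-9, tissue_a, tissue_b))  # True
```

Requiring both conditions excludes CpGs whose difference is statistically detectable but too small in magnitude to be of interest.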
Principal component analysis (PCA)
Principal component analysis was done using the FactoMineR package in R.
Data availability
The datasets generated during this study are available in the GEO repository under accession number GSE199057.
References
Ghosh, J., Schultz, B., Coutifaris, C. & Sapienza, C. Highly variant DNA methylation in normal tissues identifies a distinct subclass of cancer patients. Adv. Cancer Res. 142, 1–22 (2019).
Ghosh, J. et al. Epigenome-wide study identifies epigenetic outliers in normal mucosa of patients with colorectal cancer. Cancer Prev. Res. 15, 755–766 (2022).
Hannum, G. et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell. 49, 359–367 (2013).
Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 14, R115 (2013).
Levine, M. E. et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging 10, 573–591 (2018).
Lu, A. T. et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging 11, 303–327 (2019).
Zheng, C., Li, L. & Xu, R. Association of epigenetic clock with consensus molecular subtypes and overall survival of colorectal cancer. Cancer Epidemiol. Biomark. Prev. 28, 1720–1724 (2019).
Wang, T. et al. Dysfunctional epigenetic aging of the normal colon and colorectal cancer risk. Clin. Epigenetics. 12, 5 (2020).
Devall, M. et al. Racial disparities in epigenetic aging of the right vs left colon. J. Natl. Cancer Inst. 113, 1779–1782 (2021).
Ghosh, J., Mainigi, M., Coutifaris, C. & Sapienza, C. Outlier DNA methylation levels as an indicator of environmental exposure and risk of undesirable birth outcome. Hum. Mol. Genet. 25, 123–129 (2016).
Aran, D., Sirota, M. & Butte, A. J. Systematic pan-cancer analysis of tumour purity. Nat. Commun. 6, 8971 (2015).
McLaren, A. Numerology of development. Nature 239, 274–276 (1972).
Turan, N. et al. Inter- and intra-individual variation in allele-specific DNA methylation and gene expression in children conceived using assisted reproductive technology. PLoS Genet. 6, e1001033 (2010).
López-Otín, C., Blasco, M. A., Partridge, L., Serrano, M. & Kroemer, G. The hallmarks of aging. Cell 153, 1194–1217 (2013).
Teschendorff, A. E. A comparison of epigenetic mitotic-like clocks for cancer risk prediction. Genome Med. 12, 56 (2020).
Wang, W., Yang, C., Wang, T. & Deng, H. Complex roles of nicotinamide N-methyltransferase in cancer progression. Cell Death Dis. 13, 267 (2022).
Wang, X. et al. SAA suppresses α-PD-1 induced anti-tumor immunity by driving TH2 polarization in lung adenocarcinoma. Cell Death Dis. 14, 718 (2023).
Lim, S. Y., Yuzhalin, A. E., Gordon-Weeks, A. N. & Muschel, R. J. Targeting the CCL2-CCR2 signaling axis in cancer metastasis. Oncotarget 7, 28697–28710 (2016).
Chen, M. C. et al. CXCL2/CXCR2 axis induces cancer stem cell characteristics in CPT-11-resistant LoVo colon cancer cells via Gαi-2 and Gαq/11. J. Cell. Physiol. 234, 11822–11834 (2019).
Lv, Y. et al. CXCL2: a key player in the tumor microenvironment and inflammatory diseases. Cancer Cell Int. 25, 133 (2025).
Haque, I. et al. Cyr61/CCN1 signaling is critical for epithelial-mesenchymal transition and stemness and promotes pancreatic carcinogenesis. Mol. Cancer. 10, 8 (2011).
Cui, G., Wang, Z., Liu, H. & Pang, Z. Cytokine-mediated crosstalk between cancer stem cells and their inflammatory niche from the colorectal precancerous adenoma stage to the cancerous stage: mechanisms and clinical implications. Front. Immunol. 13, 1057181 (2022).
He, B. et al. IL-13/IL-13RA2 signaling promotes colorectal cancer stem cell tumorigenesis by inducing ubiquitinated degradation of p53. Genes Dis. 11, 495–508 (2024).
Toyota, M. et al. CpG island methylator phenotype in colorectal cancer. Proc. Natl. Acad. Sci. USA 96, 8681–8686 (1999).
Ren, J. T., Wang, M. X., Su, Y., Tang, L. Y. & Ren, Z. F. Decelerated DNA methylation age predicts poor prognosis of breast cancer. BMC Cancer. 18, 989 (2018).
Lu, X. et al. Epigenetic age acceleration of cervical squamous cell carcinoma converged to human papillomavirus 16/18 expression, immunoactivation, and favourable prognosis. Clin. Epigenetics. 12, 23 (2020).
Hong, C. et al. Epigenetic age acceleration of stomach adenocarcinoma associated with tumor stemness features, immunoactivation, and favorable prognosis. Front. Genet. 12, 563051 (2021).
Chu, X. et al. Cancer stem cells: advances in knowledge and implications for cancer therapy. Signal. Transduct. Target. Ther. 9, 170 (2024).
Liu, Q. Q. et al. Enterotoxigenic Bacteroides fragilis induces the stemness in colorectal cancer via upregulating histone demethylase JMJD2B. Gut Microbes. 12, 1788900 (2020).
Liu, H. et al. Fusobacterium nucleatum promotes colorectal cancer cell to acquire stem cell-like features by manipulating lipid droplet-mediated Numb degradation. Adv. Sci. 9, e2105222 (2022).
Atkinson, R. L. et al. Cancer stem cell markers are enriched in normal tissue adjacent to triple negative breast cancer and inversely correlated with DNA repair deficiency. Breast Cancer Res. 15, R77 (2013).
Hillary, R. F. & Marioni, R. E. MethylDetectR: a software for methylation-based health profiling. Wellcome Open Res. 5, 283 (2020).
Mani, S. et al. Epigenetic changes in preterm birth placenta suggest a role for ADAMTS genes in spontaneous preterm birth. Hum. Mol. Genet. 28, 84–95 (2019).
Funding
The current study was funded by award numbers R21CA264213 and R01CA281948 from the National Cancer Institute. The project described was also supported by TUFCCC/HC Regional Comprehensive Cancer Health Disparity Partnership, Award Numbers U54 CA221704 (5) and 2 U54 CA221704 (5) from the National Cancer Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.
Author information
Contributions
JG and CS conceived and designed the study, OA, JNR, SL, SB, BMS and JG performed statistical analysis, BMS processed the samples for methylation array, JG wrote the manuscript with inputs from CS. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Schultz, B.M., Ayeotan, O., Rana, J.N. et al. Abnormal epigenetic aging of epigenetic outliers in normal colon mucosa from colorectal cancer patients. Sci Rep 16, 4022 (2026). https://doi.org/10.1038/s41598-025-34035-x