Mapping the temporal landscape of breast cancer using epigenetic entropy

Monyak, Daniel L.; Holloway, Shannon T.; Gumbert, Graham J.; Grimm, Lars J.; Hwang, E. Shelley; Marks, Jeffrey R.; Shibata, Darryl; Ryser, Marc D.

doi:10.1038/s42003-025-08867-2

Article
Open access
Published: 16 October 2025

Communications Biology volume 8, Article number: 1477 (2025)

Abstract

Although generally unknown, the age of a newly diagnosed tumor encodes valuable etiologic and prognostic information. Here, we estimate the age of breast cancers, defined as the time from the start of growth to detection, using a measure of epigenetic entropy derived from genome-wide methylation arrays. Based on an ensemble of neutrally fluctuating CpG (fCpG) sites, this stochastic epigenetic clock differs from conventional clocks that measure age-related increases in methylation. We show that younger tumors exhibit hallmarks of aggressiveness, such as increased proliferation and genomic instability, whereas older tumors are characterized by elevated immune infiltration, indicative of enhanced immune surveillance. These findings suggest that the clock captures a tumor’s effective growth rate resulting from the evolutionary-ecological competition between intrinsic growth potential and external systemic pressures. Because of the clock’s ability to differentiate old and stable from young and aggressive tumors, it has potential applications in risk stratification of early-stage breast cancers and guiding early detection efforts.

Introduction

When a woman is diagnosed with breast cancer, it is generally not possible to ascertain how long the tumor has been growing. Yet knowledge about a tumor’s age at diagnosis could provide important prognostic clues: older indolent tumors that are less likely to progress may require less invasive treatment, whereas younger fast-growing tumors require more urgent and aggressive treatment. While there is a rich literature on the estimation of the mean sojourn time of breast cancer^1,2,3,4,5—the average time tumors spend in a detectable but asymptomatic, pre-clinical state—there is a paucity of tools to assess the age of individual tumors at the time of detection.

Epigenetic clocks provide a promising approach to estimate individual tumor age. Originally developed to quantify the biologic aging process in humans, epigenetic clocks leverage specific patterns of DNA methylation that are strongly correlated with biologic tissue age^6,7. Broadly, these clocks focus on the DNA methylation status of CpG sites across the genome that become differentially methylated with increasing tissue age. These clocks are well suited for estimating the number of cell divisions since birth of the person, making them valuable tools for studying normal tissue aging and pre-neoplastic changes^8,9,10,11,12. However, they are less effective for estimating tumor mitotic age, defined as the number of cell divisions since birth of the first tumor cell.

In contrast to “one-way” clocks that measure age-related methylation changes, “two-way” epigenetic clocks leverage CpG sites that fluctuate between the unmethylated and methylated states on a relatively fast time scale (Fig. 1A). Originally introduced in the context of homeostatic intestinal stem cell dynamics¹³, such stochastic epigenetic clocks have also been applied to hematologic malignancies¹⁴. Here we follow similar design principles to develop a breast cancer-specific two-way epigenetic clock to measure individual tumor mitotic age at diagnosis based on average methylation levels ($\beta$-values) of select CpG sites included in standard methylation arrays. Unlike “one-way” methylation clocks that begin ticking at birth, our clock is reset at onset of tumor growth, allowing it to track the number of cell divisions up to the time of diagnosis.

**Fig. 1: Fluctuating CpG sites (fCpG) and modulators of tumor age.**

The proposed stochastic clock measures the entropy of an ensemble of fluctuating CpG (fCpG) sites. In the tumor’s most recent common ancestor cell, each fCpG was either unmethylated ($\beta =0$), hemi-methylated ($\beta =0.5$), or methylated ($\beta =1$). As the tumor expands, replication errors produce a mixture of cells with different methylation states (Fig. 1B), thus progressively increasing the tumor’s epigenetic entropy¹⁵. In the special case of unbiased fCpG sites—whose methylation and de-methylation rates are in balance—the bulk-level methylation converges to $\beta =0.5$ with increasing tumor mitotic age, regardless of the first tumor cell’s state (Fig. 1C). Thus, by measuring the distribution of unbiased fCpG sites, we can derive an estimate of the age of a given tumor cell population relative to the start of the most recent clonal expansion.

The combination of tumor-specific age estimates and gene expression profiles further provides a unique opportunity to characterize the evolutionary and ecological pressures that shape the temporal landscape of breast cancer. Notably, aggressive tumors that evolve in a weakly suppressive immune microenvironment are expected to reach a detectable size faster than indolent, slow-growing tumors in a strongly suppressive immune microenvironment (Fig. 1D).

The manuscript is structured as follows. Combining DNA methylation and gene expression data from several hundred breast cancer and normal breast tissue samples, we first identify a set of unbiased fCpG sites and introduce the epigenetic clock index as a proxy measure of tumor mitotic age. We then evaluate the face validity of the index by examining its relationship with established prognostic markers, and we combine methylation and gene expression data to identify tumor- and microenvironment-specific factors that modulate tumor mitotic age. Finally, we validate key properties of the clock index in independent cohorts of patients with paired primary-metastasis samples, and we derive quantitative estimates of individual breast cancers’ mitotic and calendar ages.

Results

Selection of unbiased fCpG sites

To identify a set of unbiased fCpG sites in breast cancer, we used 450K methylation array data from 634 invasive breast cancers in The Cancer Genome Atlas¹⁶ (TCGA) and 79 normal breast tissue samples¹⁷. Using a two-step selection process, we identified an ensemble of fCpG sites with balanced (de-)methylation rates as follows.

First, we identified CpG sites with an average $\beta$-value close to 0.5 in both normal breast tissue and breast cancers, thus excluding sites with an inherent bias toward methylation or de-methylation, and sites that are subject to systematic selection during homeostasis and/or tumorigenesis (Fig. 2A). To avoid confounding by cell type heterogeneity, we further constrained candidate sites to be balanced within luminal and basal epithelial breast tissues, respectively (Fig. 2B).

**Fig. 2: Selection and validation of unbiased fluctuating CpG (fCpG) sites.**

In the second step, we ordered the set of unbiased CpG sites by between-tumor variability and included only the 500 most fluctuating sites in the final clock set of unbiased fCpGs (Fig. 2C; see Supplementary Data 1 for clock fCpGs). Importantly, this final step excludes non-informative sites that either do not fluctuate at all (i.e., imprinted hemi-methylated state) or fluctuate too fast (i.e., steady-state methylation of $\beta \approx 0.5$ reached on time scales much shorter than the average tumor mitotic age at diagnosis). To ensure that this step did not select for CpG sites that mainly captured cell-type composition effects, we verified that individual fCpGs had low $\beta$-value variation across the normal samples (median standard deviation of 0.06 compared to 0.19 across tumors; P < 10⁻¹⁰, Wilcoxon rank-sum test; Fig. 2D). Furthermore, as we expected, the fCpG sites were enriched for non-genic/regulatory CpGs, with 27.6% of fCpGs versus 17.8% of non-fCpGs being non-genic/regulatory (P = 2 × 10⁻⁸, $\chi^{2}$ test).
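The two-step selection can be illustrated with a minimal Python sketch. It assumes the balance window of mean $\beta \in [0.4, 0.6]$ that is quoted later in the purity analysis, uses variance as the measure of between-tumor variability, and omits the additional luminal/basal epithelial constraint; the function name and the dictionary layout are hypothetical and are not taken from the study's code.

```python
from statistics import mean, pvariance


def select_fcpg_sites(tumor_betas, normal_betas, n_sites=500, lo=0.4, hi=0.6):
    """Sketch of the two-step fCpG selection.

    tumor_betas / normal_betas: dict mapping a CpG identifier to the list of
    beta values observed across tumor / normal samples (hypothetical layout).
    """
    # Step 1: keep sites whose mean beta is close to 0.5 in tumors and normals.
    balanced = [
        site
        for site in tumor_betas
        if site in normal_betas
        and lo <= mean(tumor_betas[site]) <= hi
        and lo <= mean(normal_betas[site]) <= hi
    ]
    # Step 2: rank the balanced sites by between-tumor variance (assumed
    # variability measure) and keep the n_sites most variable ones.
    balanced.sort(key=lambda site: pvariance(tumor_betas[site]), reverse=True)
    return balanced[:n_sites]
```

In this sketch, a site that is balanced in tumors but biased in normal tissue is dropped in step 1, and a balanced site that barely varies between tumors is ranked last in step 2.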

Next, we sought to validate the unbiased and fluctuating nature of the clock set in two independent cohorts. In a cohort of 146 breast cancer patients (Lund cohort)¹⁸, we found significantly higher inter-tumor variability in $\beta$-values among the CpG sites in the clock set, as compared to the CpG sites not included in the clock set (Fig. 2E). Similarly, in a small cohort of 5 patients with multiple samples from their primary tumors¹⁹, we found elevated intra-tumor variability in clock set vs non-clock set sites (Fig. 2F). Together, these patterns corroborate the unbiased and fluctuating nature of the clock set of CpG sites.

Interestingly, the fCpG sites in the clock set were more tightly concentrated around $\beta =0.5$ in each normal breast sample, as compared to breast cancers (Supplementary Fig. 1). Consistent with the underlying dynamic model of the clock (Fig. 1A), this suggests that over decades of breast development and maintenance, the fCpGs had converged to the stationary methylation state of $\beta =0.5$ (Fig. 1C).

Epigenetic clock index

At the level of individual tumors, the 500 fCpG sites in the clock set exhibited primarily unimodal or bimodal distributions of $\beta$-values (Fig. 3A). We explored how these tumor-specific distributions of $\beta$-values could be used to estimate tumor mitotic age. In the founding tumor cell, each fCpG starts in either the unmethylated ($\beta =0$), hemi-methylated ($\beta =0.5$), or methylated ($\beta =1$) state (Fig. 1A). Although the trajectories of individual sites are subject to stochastic fluctuations (Fig. 1C), an ensemble of sites starting in the same initial configuration collectively drift toward the steady state of $\beta =0.5$ (Fig. 3B).

**Fig. 3: The distribution of β-values across the clock set encodes tumor age.**

By considering the histograms of $\beta$-value distributions at different tumor mitotic ages, we can track the evolution of the three “peaks” corresponding to the subsets of initially unmethylated, hemi-methylated, and methylated clock sites (Fig. 3C). As the tumor’s mitotic age increases, the left peak of the histogram (consisting of originally unmethylated clock sites) starts moving to the right, whereas the right peak (originally methylated sites) moves to the left; the middle peak (originally hemi-methylated sites) remains stationary. By measuring the extent to which the three peaks have converged to the stationary value of $\beta =0.5$, we can thus estimate the mitotic age of individual tumors.
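The peak movement described above can be made concrete with a simplified model. The sketch below assumes a symmetric two-state process in which each allele flips its methylation state with probability `mu` per cell division; this is an illustrative stand-in, not the oscillator model specified in the Methods, and `mu` is a hypothetical parameter.

```python
def expected_beta(beta0, mu, generations):
    """Expected bulk beta after a number of cell divisions.

    Assumes a symmetric flip probability `mu` per allele per division, so the
    deviation from 0.5 shrinks by a factor (1 - 2 * mu) at every division.
    """
    return 0.5 + (beta0 - 0.5) * (1.0 - 2.0 * mu) ** generations


# Peak positions for initially unmethylated, hemi-methylated and methylated sites.
for age in (0, 100, 1000):
    peaks = [expected_beta(b0, mu=0.001, generations=age) for b0 in (0.0, 0.5, 1.0)]
    print(age, [round(p, 3) for p in peaks])
```

Under these assumptions the left and right peaks approach 0.5 symmetrically while the middle peak stays fixed, which is the qualitative behavior that the clock exploits.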

Concretely, we used the standard deviation of the $\beta$-values, denoted by ${s}_{\beta }$, to quantify the relationship between tumor mitotic age and the evolving clock set profile (Fig. 3C). Because ${s}_{\beta }$ is highest at time 0, when the $\beta$-value distribution exhibits three sharp peaks, and then monotonically decreases over time (Fig. 3D), we introduced the epigenetic clock index ${c}_{\beta }=1-{s}_{\beta }$ as a proxy measure of tumor mitotic age (Fig. 3E; see Supplementary Data 2 for clock index values).
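As a minimal sketch, the index can be computed from one tumor's clock-set $\beta$-values as follows. Whether the population or the sample standard deviation is used is not stated here; the population form is assumed, and the function name is hypothetical.

```python
from statistics import pstdev


def clock_index(betas):
    """Return c_beta = 1 - s_beta for one tumor's clock-set beta values.

    Uses the population standard deviation (an assumption).
    """
    return 1.0 - pstdev(betas)


# Three sharp peaks at 0, 0.5 and 1 (mitotically young) versus values that
# have drifted toward 0.5 (mitotically older).
young = [0.0, 0.5, 1.0] * 10
older = [0.3, 0.5, 0.7] * 10
print(clock_index(young) < clock_index(older))
```

A tumor whose sites have all converged to $\beta = 0.5$ has ${s}_{\beta} = 0$ and therefore reaches the maximum index of 1.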

While most breast cancers are thought to derive from a single clone of origin, recent work suggests that some breast cancers may derive from multiple founder cells²⁰. If this is the case, the initial $\beta$-value distribution of fCpGs is not confined to the three discrete peaks at $\beta$ = 0, 0.5, and 1. The initial epigenetic entropy of a multiclonal cancer is thus higher compared to a cancer of monoclonal origin, and the epigenetic clock index will overestimate the true tumor mitotic age at the time of resection. However, because it appears unlikely that a tumor derives from a large number of independent clones, we expect this bias to be limited.

We identified a significant difference in the distribution of the epigenetic clock index ${c}_{\beta }$ among invasive ductal carcinomas (IDC; median, 0.81) and invasive lobular carcinomas (ILC; median, 0.85; P < 10⁻¹⁰, Wilcoxon rank-sum test). This is aligned with the notion of ILC as a molecularly and clinically distinct disease entity²¹, and consistent with the propensity of ILCs to be mammographically occult²², which may lead to older tumor mitotic ages at the time of diagnosis. Cognizant of these differences between ILC and IDC, we decided to reduce further confounding in downstream analyses by focusing on the ductal cancers.

In the next two sections, we characterize the relationship between a tumor’s mitotic age, as quantified by the epigenetic clock index ${c}_{\beta }$, and its evolutionary-ecological context as determined by its intrinsic growth potential and external pressures from the microenvironment.

Younger tumors have more aggressive phenotypes

As a breast tumor grows, its likelihood of detection on the basis of imaging or symptoms increases. Because fast growing tumors are expected to reach a detectable size sooner than slow growing ones, we hypothesized that younger tumor mitotic age would correlate with established markers of tumor aggressiveness. To test this hypothesis, we correlated the epigenetic clock index with several established features of tumor aggressiveness, including molecular subtype^23,24, genomic instability^25,26, grade²⁷, proliferation²⁸, and size²⁹.

There was a clear relationship between tumor mitotic age and molecular subtype: luminal A tumors, which have a more favorable prognosis, were older than luminal B and basal tumors (Fig. 4A-B; Supplementary Table 1). Similarly, when using clinical instead of molecular subtyping in two additional cohorts of breast tumors (Germany cohort, n = 253; WCHS cohort, n = 445), triple-negative cancers tended to be younger compared to luminal A and luminal B cancers (Supplementary Fig. 2A, 3). Consistent with these subtype patterns, there was a strong correlation between genomic instability and younger tumor mitotic age (Fig. 4C,D). Younger tumors were of higher histopathologic grade (Fig. 4E) but not stage (Fig. 4F), and more likely to be Ki-67-positive (Supplementary Fig. 2B).

**Fig. 4: Epigenetic clock index vs. clinicopathological variables.**

Another prognostic factor in breast cancer is tumor size, with larger lesions having worse outcomes. We found that smaller tumors were of older mitotic age compared to larger tumors (Fig. 4G, H), presumably because slow growing tumors spend more time at the smaller end of the detectable size range, and are, therefore, more likely to be detected at a smaller size.

Finally, the relationship between tumor mitotic age and patient age at diagnosis was inconclusive, with a weak negative correlation in TCGA (R = −0.18) and no correlation in either the Lund (R = −0.10; Supplementary Table 1) or WCHS (R = −0.03; Supplementary Table 1) cohorts. This is consistent with the notion that the fCpG clock measures the age of the tumor—starting with the most recent common ancestor cell—and not the age of the patient.

Identifying modulators of tumor mitotic age

The time it takes for a tumor to grow from a single cell to a detectable mass depends on its effective growth rate, that is, the difference between cell proliferation and cell death (Fig. 5A). Cell proliferation primarily reflects the tumor’s intrinsic growth potential and aggressiveness, whereas cell death is often the result of extrinsic selective pressures applied by the tumor microenvironment, such as immune surveillance and resource constraints due to limited vascularization^30,31.

**Fig. 5: Tumor mitotic age vs. measures of proliferation.**

To explore putative modulators of effective tumor growth and tumor mitotic age at diagnosis, we performed genome-wide correlation analyses of the epigenetic clock index ${c}_{\beta }$ against gene expression. As predicted, mitotically younger tumors exhibited increased expression of proliferation-related genes such as Ki67 and MCM2 (Fig. 5B, Supplementary Table 1). The signal was further augmented when considering the average expression across a set of genes involved in M-phase and mitotic checkpoint regulation (Fig. 5C,D) and the fraction of cells in S-phase (Fig. 5E).

Next, we examined the microenvironment’s ability to decrease the effective growth rate of a tumor through increased cell death. As hypothesized, the expression of immune cell markers such as CD3, CD4, CD8 and FOXP3 was elevated in mitotically older tumors (Fig. 5B; Supplementary Table 1). This suggests that tumors that are subject to immune surveillance (e.g., through neo-antigen directed immune control by CD8+ T-cells) have a lower effective growth rate and thus reach a detectable size at an older mitotic age than tumors that successfully evade immune control.

To perform a systematic analysis of tumor mitotic age modulation, we performed a genome-wide gene set enrichment analysis (GSEA) (Fig. 6A). Consistent with the univariate gene expression analyses, mitotically younger tumors were enriched for pathways related to proliferation and cell cycle control. Conversely, mitotically older tumors were enriched for immune pathways and immune-related signaling pathways, again supporting the notion of effective immune control in older, slower growing lesions.

**Fig. 6: Pathway enrichment and immune decomposition analyses.**

For a more in-depth analysis of the immune infiltrate, we used the CIBERSORTx algorithm³² to estimate the extent and composition of the immune compartment. As expected, the extent of the immune compartment increased with tumor mitotic age (Fig. 6B). When decomposing each tumor’s immune compartment into the major cell types, we found an increase in the fraction of T-cells in mitotically older tumors (Fig. 6C, Supplementary Table 2), again suggestive of T-cell mediated immune surveillance.

Analysis of paired tumor samples validates epigenetic clock

Multiple tumor samples from the same patient provide a unique opportunity to assess the internal validity of the epigenetic clock. Indeed, paired samples should be epigenetically more related—via their most recent common ancestor cell—than samples from different patients. In a cohort of 8 women with multi-focal breast cancer³³, we found that the within-patient correlations of the clock set fCpG sites were higher (median, 0.70) than the between-patient correlations (median, 0.11; P = 3 × 10⁻⁶, Wilcoxon rank-sum test; Fig. 7A). The same held true for a cohort of 18 patients with paired primary tumors and lymph node metastases³⁴ (median, 0.85 vs. 0.13, P < 10⁻¹⁰; Fig. 7B) and a subset of 22 patients with paired primary tumors and metastases (including lymph node and distant metastases) from the AURORA US Metastasis Project³⁵ (median, 0.66 vs. 0.13, P < 10⁻¹⁰; Supplementary Fig. 4).
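The within-patient versus between-patient comparison can be sketched as follows. The sketch assumes Pearson correlation across the clock-set $\beta$-values (the correlation measure is not specified in this section), and the sample layout is hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length lists of beta values."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def within_between_medians(samples):
    """samples: list of (patient_id, clock-set beta values).

    Returns the median correlation over same-patient pairs and over
    different-patient pairs.
    """
    within, between = [], []
    for (p1, b1), (p2, b2) in combinations(samples, 2):
        (within if p1 == p2 else between).append(pearson(b1, b2))
    return median(within), median(between)
```

A rank-sum test on the two lists of correlations, as used in the text, would then quantify the difference between the two groups.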

The two cohorts of patients with paired primary and metastasis samples^34,35 allowed us to test two additional properties of the fCpG clock. First, assuming that each metastasis is seeded by a single cell from the primary tumor, synchronous metastases should be younger than their matched primaries. Indeed, the epigenetic clock measures the mitotic age of the metastasis relative to the seeding event, which occurred after initiation of the primary tumor. Consistent with this prediction, metastases had a lower epigenetic clock index compared to their matched primaries in 33/40 patients, and only one metastasis was noticeably older than the matched primary (Fig. 7C). These findings provide direct support for interpreting the epigenetic clock index as a proxy measure for tumor mitotic age. However, it is important to acknowledge that potential differences in tumor biology and microenvironmental conditions between primary tumors and metastases may act as unmeasured confounders in this analysis.

Second, the timing of metastatic dissemination relative to the primary tumor’s age is expected to impact the epigenetic similarity of the two samples: if the metastasis is seeded early during primary tumor growth (i.e., similar ${c}_{\beta }$ values), the $\beta$-values of the two samples are expected to be closely related (Fig. 7D, E) because the metastasis seeding cell came from a mostly homogeneous population; conversely, if the metastasis is seeded late (i.e., different ${c}_{\beta }$ values), the $\beta$-values are expected to differ more substantially (Fig. 7F) because the seeding cell came from a heterogeneous population. Corroborating this hypothesis, and consistent with a corresponding simulation of metastatic seeding based on the oscillator model (Fig. 7G), we found a negative correlation between the primary-metastasis mitotic age difference and $\beta$-value similarity (Fig. 7H).

Finally, we note that metastases may be seeded not by a single cell, but by a cluster of cells from the primary tumor³⁶. In such cases—similar to the scenario of a multiclonal primary tumor (see ‘Epigenetic clock index’)—our method may overestimate the true mitotic age of the metastasis. However, as long as the estimated mitotic age of the metastasis is younger than that of the matched primary, its true mitotic age must also be younger, and thus the conclusion derived from Fig. 7C remains valid.

Quantifying tumor mitotic age

So far, we have used the epigenetic clock index ${c}_{\beta }$ as a correlate of tumor mitotic age. To derive quantitative estimates of each tumor’s mitotic and calendar ages, we proceeded as follows (see Methods for details). First, we invoked the mathematical oscillator model (Fig. 1A) to relate tumor mitotic age to the measured $\beta$-values of fCpG sites in the clock set. Next, we decomposed each tumor’s empirical fCpG $\beta$-value distribution into three groups (Fig. 8A): originally unmethylated fCpGs (left peak in the histogram), originally hemi-methylated fCpGs (middle peak), and originally methylated fCpGs (right peak). Finally, we combined the peak location in each group with the oscillator model to infer the estimated mitotic age of the tumor (Fig. 8B).

We then combined tumor-specific estimates of mitotic age (Fig. 8B) and proliferation rate (Fig. 8C, D) to derive tumor-specific estimates of calendar age (see Supplementary Data 2 for age estimates). Anchoring the median tumor age at a consensus estimate of 3 years (see Methods), the distribution of calendar ages across the TCGA and Lund cohorts ranged from 0.3 to 29.9 years, with an interquartile range of 1.6 to 5.5 years (Fig. 8E). There were notable differences in median tumor calendar ages by molecular subtype, ranging from 1.1 years in basal cancers to 6.5 years in luminal A cancers (Supplementary Table 3). Interestingly, even though there was only a small difference between Her2-positive and luminal A tumors with respect to tumor mitotic age (163 vs. 171 generations), there was a clear separation with respect to calendar age (2.5 vs. 6.5 years; Supplementary Table 3). In other words, the Her2-positive cancers accumulated a similar number of cell divisions in a much shorter time, which is consistent with a generally more aggressive clinical presentation and worse prognosis^37,38.
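One way to picture the conversion from mitotic to calendar age is sketched below. It assumes that calendar age is proportional to the number of generations divided by a relative proliferation rate, and that a single scale factor is chosen so that the cohort median equals 3 years; the exact procedure is described in the Methods and may differ, and all names here are hypothetical.

```python
from statistics import median


def calendar_ages(mitotic_ages, proliferation_rates, median_years=3.0):
    """Convert mitotic ages (generations) to calendar ages (years).

    Assumes age is proportional to generations / relative proliferation rate,
    rescaled so that the cohort median equals `median_years`.
    """
    raw = [g / r for g, r in zip(mitotic_ages, proliferation_rates)]
    scale = median_years / median(raw)
    return [scale * t for t in raw]


# Three tumors with the same mitotic age but different proliferation rates.
print(calendar_ages([160, 160, 160], [1.0, 2.0, 4.0]))
```

Under this assumption, two tumors with nearly equal mitotic ages can differ widely in calendar age, as observed for Her2-positive versus luminal A cancers.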

Adjusting for tumor purity

Bulk samples contain a mixture of tumor and stroma. Because the epigenetic clock index was correlated with tumor purity, as measured in the TCGA cohort by the consensus purity estimate³⁹ (CPE; R = −0.70; Supplementary Fig. 5A), we restricted our analyses to samples of high tumor purity (CPE ≥ 0.6). Nevertheless, we cannot rule out that the observed variability in $\beta$-value distributions among the selected fCpG sites, which are used to estimate tumor mitotic age, was at least partially driven by the methylation patterns of admixed non-epithelial cells. If this is the case, then, e.g., the immune pathway enrichment of older tumors (Fig. 6A) may be confounded by the presence of non-epithelial cells that alter the measured $\beta$-value distribution.

First, we checked whether the selection of clock sites was strongly influenced by the presence of non-epithelial cells. To do this, we started with all unbiased sites (mean $\beta \in [0.4,\,0.6]$ in TCGA, normals, basal/luminal epithelial) and separately identified the 500 most variable sites in the least and most pure tumors (bottom and top CPE quartiles). These two selections showed high overlap (91% and 99%) with the original clock set of most variable CpG sites across all tumors, suggesting that the selection of fCpGs was not unduly driven by the non-epithelial tumor compartment.

Next, to adjust for possible confounding of $\beta$-values by tumor purity, we modeled the measured methylation as a mixture of tumor and stroma methylation (see Methods for details). This adjustment was based on the CPE measure of tumor purity³⁹; we also employed EpiSCORE⁴⁰ to estimate the fraction of epithelial content and found that the two measures of purity were well correlated (R = 0.63). The resulting purity-adjusted epigenetic clock index ${c}_{\beta }^{\alpha }$ exhibited a lower correlation with tumor purity (R = −0.25, Supplementary Fig. 5B) and was lower than the unadjusted epigenetic clock index ${c}_{\beta }$ (Supplementary Fig. 5C).
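A two-component mixture of this kind can be inverted as in the following sketch. It assumes a linear mixture, measured $\beta$ = purity × tumor $\beta$ + (1 − purity) × stroma $\beta$, with a stromal $\beta$-value supplied by the user; how the stromal methylation is actually estimated is described in the Methods, so `stroma_beta` and the function name are hypothetical.

```python
def purity_adjusted_beta(measured, purity, stroma_beta):
    """Solve measured = purity * tumor + (1 - purity) * stroma for the tumor beta.

    `stroma_beta` is an assumed, user-supplied stromal methylation level.
    The result is clipped to the valid beta range [0, 1].
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must lie in (0, 1]")
    tumor = (measured - (1.0 - purity) * stroma_beta) / purity
    return min(1.0, max(0.0, tumor))


# A tumor beta of 0.9 diluted by 20% stroma at beta 0.5 is measured as 0.82.
print(purity_adjusted_beta(0.82, purity=0.8, stroma_beta=0.5))
```

If the admixed cells sit near $\beta = 0.5$, removing their contribution spreads the tumor values away from 0.5, which is consistent with the adjusted index being lower than the unadjusted one.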

When replacing the unadjusted epigenetic clock index with the purity-adjusted version, the strength of correlations between markers of tumor aggressiveness and younger tumor mitotic age remained unaltered (Supplementary Table 1, Supplementary Fig. 6, Supplementary Fig. 7). Individual immune genes and the extent of immune infiltration remained associated with older tumor mitotic age, although the correlations were attenuated (Supplementary Table 1, Supplementary Table 2). While the immune pathways were no longer enriched in older tumors (Supplementary Fig. 7), there was still a positive correlation between the fraction of T cells and tumor mitotic age (Supplementary Table 2).

Discussion

In this study, we developed an epigenetic clock to measure the age of newly diagnosed breast cancers. Measuring epigenetic entropy among neutrally fluctuating CpG (fCpG) sites, the clock requires no knowledge about the tumor’s initial DNA methylation state. This allows it to track the number of cell divisions since birth of the tumor, a task that previous tissue clocks were not able to accomplish. Based on standard methylation arrays, the clock has potential as a novel marker of aggressiveness and prognosis in early-stage breast cancer.

Once a patient is diagnosed with breast cancer, the tumor’s mitotic age encodes valuable prognostic information. Intuitively, a slow-growing tumor that takes a long time to reach the threshold of detection is more likely to have a good prognosis compared to a fast-growing tumor that quickly expands into a detectable mass. Our analyses corroborate this hypothesis by revealing that mitotically younger tumors were enriched for features of tumor aggressiveness and predictors of poor outcome, including genomic instability, higher grade, and basal molecular subtype^25,26. This property of the epigenetic clock is quite remarkable given that its constituent fCpG sites were selected only on the basis of simple statistical properties of their $\beta$-value distributions.

Beyond prognostication, the clock holds promise in risk-stratified screening approaches. The efficacy of breast cancer screening critically depends on the sojourn time, that is, the time window during which the tumor is asymptomatic but mammographically detectable. If the sojourn time is short, early detection is unlikely even under frequent screening; if it is long, some cancers will be overdiagnosed⁴¹. Sojourn time estimates are usually obtained by fitting natural history models to population data, yielding indirect, population-averaged estimates. Our approach, in contrast, allows for direct and individual-level characterization of tumor age, which provides an upper bound for the sojourn time. Assuming an overall median time to detection of 3 years in our cohort, the time to detection in luminal A cancers (6.5 years) was substantially longer compared to that in luminal B (2.4 years), Her2-positive (2.5 years), and basal (1.1 years) cancers. These estimates are consistent with the observation that interval cancers are enriched for more aggressive subtypes compared to screen-detected cancers⁴², and highlight opportunities for data-driven personalization of screening schedules.

The epigenetic clock also provides an opportunity to quantify the evolutionary-ecological pressures that shape the temporal landscape of breast cancers. Indeed, because most tumors are of comparable size at the time of diagnosis, tumor mitotic age is related to the effective growth rate: tumors that reach the detection threshold at a younger age have a higher effective growth rate compared to tumors that reach the threshold at a higher age. Our analyses characterized the effective growth rate of breast cancers as a competition between tumor-intrinsic growth potential (e.g., proliferation) and microenvironmental pressures (e.g., surveillance by immune cells)⁴³. According to this model, highly proliferative tumors that successfully evade the immune system are detected at a younger age compared to less proliferative lesions subject to continuous immune control.

Our study has several limitations. First, because tumor age is not observable in practice, a direct validation of the clock is not possible. Nevertheless, we note that the clock correctly classified the age ordering of primary tumors and metastases in 33 of 40 patients. Second, the epigenetic clock index was correlated with sample purity, which suggests the latter may be a confounder in our analyses. To mitigate this risk of bias, we systematically repeated all analyses using purity-adjusted methylation values; while some of the associations were attenuated, the overall qualitative conclusions remained unchanged. To fully address the potential confounding of age estimates by tumor purity, single-cell methylation data are needed. Third, estimation of tumor mitotic age was based on a simple mathematical model of (de-)methylation dynamics. In future work, this approximation can be refined using more sophisticated simulation-based models that account for the underlying population dynamics, including cell proliferation and death, and possibly selection.

How long a newly diagnosed breast cancer has been growing is generally considered a known unknown. Here we revisited this assumption and developed a way to infer tumor age using standard methylation arrays. While developed specifically for breast cancer, the approach can be generalized to any cancer type and, as such, provides a scalable technology to characterize the temporal landscape of oncology.

Methods

TCGA cohort

Of the 1085 invasive breast cancers from female patients in The Cancer Genome Atlas (TCGA)⁴⁴, 774 had available methylation array data (Infinium HumanMethylation450 BeadChip, Illumina, San Diego, CA, USA). After excluding 138 tumors of low tumor content (consensus purity estimate³⁹ [CPE] <0.6) and one sample each from two patients with two primary samples, the remaining 634 tumors were used to select the ensemble of 500 fCpG sites. Finally, after excluding 10 tumors with ≥5% missing clock set fCpG measurements and 224 tumors with a histology code other than infiltrating duct carcinoma, the analytic cohort consisted of 400 tumors. The following variables were retrieved: patient age at diagnosis; tumor histology; T stage (subsetted to T1, T2, T3, and T4); summary stage (subsetted to stages I, II, or III). For all 400 patients with invasive ductal carcinoma, gene expression quantification (RNA-seq) and copy number segment data were available as well; when >1 measurement was available, one was selected at random. All clinical and sequencing data were retrieved from the Genomic Data Commons (GDC; https://gdc.cancer.gov) using the R package TCGAbiolinks (version 2.25.3)⁴⁵.

Lund cohort

We retrieved publicly available methylation array data (Infinium HumanMethylation450 BeadChip) from 181 primary breast cancers in the Southern Sweden Breast Cancer Group tissue bank at the Department of Oncology and Pathology, Skåne University Hospital (Lund, Sweden) and the Department of Pathology, Landspitali University Hospital (Reykjavik, Iceland)¹⁸. The data were obtained through the Gene Expression Omnibus (GSE75067). Because calculation of the purity metric CPE requires gene expression, somatic copy-number, and immunohistochemistry in addition to methylation data, we instead assessed tumor purity using the leukocyte unmethylation percentage (LUMP) value. A tumor’s LUMP value is calculated as the average $\beta$-value among 44 specific CpG sites, divided by 0.85; we found the LUMP value to be strongly correlated with CPE (R = 0.86). After excluding samples of low purity (LUMP < 0.6; n = 35), the remaining 146 samples all had <5% missing clock set fCpG measurements. After exclusion of non-ductal histology (n = 48) we ended up with an analytic cohort of n = 98. The following variables were retrieved: patient age at diagnosis; tumor grade; tumor size; molecular subtype (PAM50); fraction of genome altered (FGA); expression of a mitotic checkpoint gene module⁴⁶; fraction of cells in S-phase (flow cytometry).
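The LUMP calculation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the 44 CpG identifiers are not listed in this section, so `LUMP_SITES` holds hypothetical placeholder names, and skipping missing measurements is an assumption made here for illustration.

```python
from statistics import mean

# Hypothetical placeholder IDs; the real LUMP panel consists of 44 specific
# CpG sites that are not listed in this section.
LUMP_SITES = [f"cg_placeholder_{k:02d}" for k in range(44)]


def lump_score(beta_by_site, lump_sites=LUMP_SITES, scale=0.85):
    """Average beta-value over the LUMP sites, divided by 0.85.

    Sites with a missing measurement (None or absent) are skipped,
    which is an assumption of this sketch.
    """
    values = [beta_by_site[s] for s in lump_sites
              if beta_by_site.get(s) is not None]
    if not values:
        raise ValueError("no LUMP sites measured in this sample")
    return mean(values) / scale


def passes_purity_filter(beta_by_site, threshold=0.6):
    """Inclusion rule LUMP >= 0.6 (samples with LUMP < 0.6 are excluded)."""
    return lump_score(beta_by_site) >= threshold


# Toy sample in which every LUMP site has a beta-value of 0.68.
sample = {site: 0.68 for site in LUMP_SITES}
print(round(lump_score(sample), 3), passes_purity_filter(sample))
```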

Normal breast tissue cohort

We obtained publicly available methylation array data (Infinium HumanMethylation450 BeadChip) from 100 normal breast tissue samples in the Susan G. Komen Tissue Bank¹⁷,⁴⁷ (GSE88883). We excluded samples of low purity (LUMP < 0.6), resulting in a cohort of 79 normal breast tissue samples used for identifying fCpG sites.

Germany cohort

We retrieved publicly available methylation array data (Infinium HumanMethylation450 BeadChip) from 305 primary breast cancers collected within the Bavarian Breast Cancer Cases and Controls Study 2. The data were obtained through the Gene Expression Omnibus (GSE69914)⁴⁸. After excluding samples of low purity (LUMP < 0.6; n = 52), the remaining 253 samples all had <5% missing clock set fCpG measurements. ER, PR, Her2, and Ki-67 status measured by IHC were retrieved. Clinical subtypes were defined as follows: Hormone receptor negative (HR-) tumors were defined as those that were ER- and PR-, while HR+ tumors had to be positive for one or both of ER and PR. Then, HR+/Her2- tumors were labeled as luminal A, HR+/Her2+ as luminal B, HR-/Her2+ as Her2 positive, and HR-/Her2- as triple-negative.
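The clinical subtype rules above amount to a small decision table. The following Python sketch encodes them; the function name and the boolean inputs are illustrative choices, not part of the published code.

```python
def clinical_subtype(er_positive, pr_positive, her2_positive):
    """Map IHC receptor status to the four clinical subtypes.

    HR+ means ER and/or PR positive; HR- means both are negative.
    """
    hr_positive = er_positive or pr_positive
    if hr_positive:
        return "luminal B" if her2_positive else "luminal A"
    return "Her2 positive" if her2_positive else "triple-negative"


# Enumerate all eight receptor combinations.
for er in (True, False):
    for pr in (True, False):
        for her2 in (True, False):
            print(er, pr, her2, "->", clinical_subtype(er, pr, her2))
```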

WCHS cohort

We retrieved publicly available methylation array data (Infinium HumanMethylation450 BeadChip) from 694 primary breast cancers collected within the Women’s Circle of Health Study (WCHS). The raw intensity data in IDAT format were obtained through the Gene Expression Omnibus (GSE226569) using the R package GEOquery (version 2.76.0)⁴⁹,⁵⁰. The data were subjected to standard background correction, bias correction, masking, and conversion to $\beta$-values using the openSesame pipeline from the R package sesame (version 1.26.0)⁵¹. After removing DCIS (stage 0) tumors (n = 56), we excluded samples of low purity (LUMP < 0.6; n = 79). Then, after noting that some clock set fCpGs had highly sparse data, we removed 51 sites that were missing measurements in more than 25% of the remaining tumors. Finally, 114 tumors were excluded for having ≥5% missing measurements across the 449 remaining fCpGs, resulting in a final cohort of 445 patients with invasive tumors. ER, PR, and Her2 status measured by IHC, patient age, and summary stage were provided by the publishing authors. Clinical subtypes were determined as described previously for the Germany cohort.
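The two-stage missingness filter described for the WCHS cohort (first sparse sites, then sparse tumors) can be sketched as below. The nested-dictionary data layout and the function name are assumptions made for illustration; the thresholds follow the text (sites missing in more than 25% of tumors are dropped, then tumors with at least 5% missing values across the remaining sites).

```python
def filter_sites_then_tumors(beta, max_site_missing=0.25, max_tumor_missing=0.05):
    """Two-stage missingness filter.

    beta: dict tumor_id -> dict site_id -> beta-value (None if missing).
    Returns (kept_sites, kept_tumors).
    """
    tumors = list(beta)
    sites = list(next(iter(beta.values())))

    # Stage 1: drop sites missing in MORE than 25% of tumors.
    kept_sites = [
        s for s in sites
        if sum(beta[t][s] is None for t in tumors) / len(tumors) <= max_site_missing
    ]
    if not kept_sites:
        return [], []

    # Stage 2: drop tumors with >= 5% missing values across the kept sites.
    kept_tumors = [
        t for t in tumors
        if sum(beta[t][s] is None for s in kept_sites) / len(kept_sites) < max_tumor_missing
    ]
    return kept_sites, kept_tumors


toy = {
    "A": {"s1": 0.5, "s2": None, "s3": 0.4},
    "B": {"s1": 0.5, "s2": None, "s3": 0.6},
    "C": {"s1": 0.5, "s2": 0.7, "s3": 0.5},
    "D": {"s1": 0.5, "s2": 0.3, "s3": None},
}
print(filter_sites_then_tumors(toy))
```

Filtering sites before tumors matters: a few very sparse sites would otherwise push many tumors over the 5% threshold.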

Single-cell DNAm atlas

We retrieved publicly available single-cell whole genome bisulfite sequencing data for breast luminal (N = 3 measurements) and basal (N = 4) epithelial tissue from the DNA methylation atlas of normal human cell types collected by Loyfer et al. (GSE186458)⁵², all of which were of very high purity (LUMP ≥ 0.99). We only retrieved data collected for the 482,422 sites that also appeared on the Infinium HumanMethylation450 BeadChip.

Multiple sample cohorts

We retrieved publicly available methylation array data (Infinium HumanMethylation450 BeadChip) from four cohorts with paired tumor samples. The first cohort consisted of 8 breast cancer patients with multiple primary samples (GSE106360)¹⁹. Only samples from the 5 patients (2 patients with 5 samples each; 3 patients with 3 samples each) who had not received neoadjuvant therapy were used. Because LUMP values were highly variable, we did not apply any purity filtering. The second cohort consisted of 10 patients diagnosed with multi-focal breast cancer (GSE39451)³³. For each patient, methylation array data from 2 foci were available, and we only included the 8 patients where both samples were of sufficient purity (LUMP ≥ 0.6). The third cohort consisted of paired primary and lymph node metastasis samples from 44 patients (GSE58999)³⁴. Only patients where both samples were of sufficient purity (LUMP ≥ 0.6) were included (n = 18). The fourth cohort, from the AURORA US Metastasis Project, consisted of primary and metastasis samples taken from 55 patients with metastatic breast cancer (GSE212370)³⁵. Only patients with at least one primary and one metastasis sample of sufficient purity (LUMP ≥ 0.6) were included (n = 22). When more than one primary or metastasis sample was available, the one with the highest LUMP value was selected.

Selection of fluctuating CpG (fCpG) sites

First, we sought to identify CpG sites with balanced methylation and de-methylation rates, defined as having an average methylation content (β-value) between 0.4 and 0.6 in the TCGA cohort (N = 634), bulk normal breast tissue cohort (N = 79), single-cell breast luminal epithelial cohort (N = 3), and single-cell breast basal epithelial cohort (N = 4). CpG sites with ≥20 missing values in either the TCGA or bulk normal cohorts were excluded from this selection process. In the second step, we ranked all such balanced CpG sites by their $\beta$-value variance among tumors in the TCGA cohort and selected the 500 most variable fCpGs to define the clock set ${{\mathcal{C}}}$. Based on the clock set, each tumor was assigned an epigenetic clock index ${c}_{\beta }=1-{s}_{\beta }$ where ${s}_{\beta }$ is the standard deviation of the β values in the clock set. We compared the proportion of genic/regulatory sites in clock sites vs. non-clock sites using the ${\chi }^{2}$ test; genic/regulatory CpG sites were identified as those associated with regulatory features or genes in one or both of the official annotation files of the Infinium HumanMethylation450 and MethylationEPIC bead chip arrays (https://support.illumina.com).
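The two-step selection and the clock index can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the data layout, the function names, the inclusive 0.4 to 0.6 bounds, and the use of the population variance and population standard deviation are all assumptions, and the missing-value exclusion is omitted for brevity.

```python
from statistics import mean, pstdev, pvariance


def is_balanced(values, low=0.4, high=0.6):
    """True if the average beta-value lies between 0.4 and 0.6."""
    return low <= mean(values) <= high


def select_clock_set(tumor_beta, reference_cohorts, n_sites=500):
    """Step 1: keep sites balanced in the tumor cohort and every reference cohort.
    Step 2: rank them by beta-value variance among tumors and keep the top n.

    tumor_beta: dict site -> list of beta-values across tumors.
    reference_cohorts: list of dicts with the same layout (e.g. normal tissue).
    """
    balanced = [
        site for site, values in tumor_beta.items()
        if is_balanced(values)
        and all(is_balanced(cohort[site]) for cohort in reference_cohorts)
    ]
    balanced.sort(key=lambda site: pvariance(tumor_beta[site]), reverse=True)
    return balanced[:n_sites]


def clock_index(sample_beta, clock_set):
    """Epigenetic clock index c = 1 - s, with s the SD of clock-set beta-values."""
    return 1.0 - pstdev([sample_beta[site] for site in clock_set])


tumor_beta = {
    "a": [0.1, 0.9, 0.5, 0.5],    # balanced, highly variable
    "b": [0.45, 0.55, 0.5, 0.5],  # balanced, low variance
    "c": [0.9, 0.9, 0.8, 0.8],    # not balanced
}
normal_beta = {"a": [0.5, 0.5], "b": [0.45, 0.5], "c": [0.5, 0.5]}
clock = select_clock_set(tumor_beta, [normal_beta], n_sites=2)
print(clock, clock_index({"a": 0.0, "b": 1.0}, clock))
```

Under this definition, a sample whose clock-set $\beta$-values all sit near 0.5 has an index close to 1, whereas a sample with values spread toward 0 and 1 has a lower index.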

Gene expression analyses

For tumors in TCGA, relative gene expression levels were taken as the mean-centered, log2(x + 1) transformation of the reported transcript per million (TPM) intensities. Among the 60,616 RNA transcripts recorded in TCGA, only those classified as protein-coding genes by the HUGO Gene Nomenclature Committee⁵³ were included in subsequent analyses (n = 18,910). Expression of a mitotic checkpoint gene module⁴⁶ was calculated as the average relative expression of the genes included in the module. Molecular subtyping was based on the PAM50 algorithm⁵⁴ as implemented in R package Genefu⁵⁵. For tumors in the Lund cohort, identical gene expression and molecular subtyping analyses had previously been reported⁴⁶, thus enabling a direct comparison between tumors in the TCGA and Lund cohorts.
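The expression transform and module score described above can be illustrated with a short sketch. It assumes that mean-centering is done per gene across tumors, which the text does not state explicitly; the gene names and function names are hypothetical.

```python
from math import log2
from statistics import mean


def relative_expression(tpm_by_gene):
    """Mean-centered log2(TPM + 1) per gene.

    tpm_by_gene: dict gene -> list of TPM values across tumors.
    Centering is per gene across tumors (an assumption of this sketch).
    """
    relative = {}
    for gene, tpm in tpm_by_gene.items():
        logged = [log2(x + 1.0) for x in tpm]
        center = mean(logged)
        relative[gene] = [value - center for value in logged]
    return relative


def module_score(relative, module_genes):
    """Average relative expression of the module genes, one score per tumor."""
    n_tumors = len(relative[module_genes[0]])
    return [mean(relative[g][j] for g in module_genes) for j in range(n_tumors)]


# Two hypothetical genes measured in two tumors.
tpm = {"GENE_A": [0.0, 3.0], "GENE_B": [1.0, 1.0]}
rel = relative_expression(tpm)
print(module_score(rel, ["GENE_A", "GENE_B"]))
```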

Pathway enrichment analyses

For the TCGA cohort, we performed a gene set enrichment analysis (GSEA) using the software package GSEA⁵⁶,⁵⁷ to identify Hallmark gene sets that are correlated with the epigenetic clock index ${c}_{\beta }$. The analysis was performed using the Pearson correlation to rank individual genes; phenotype-permutation-based P values and false-discovery rate (FDR) Q values were computed using 1000 permutations. All other inputs were kept at their defaults.

CIBERSORTx

To assess the immune cell composition within the tumor microenvironment, we employed CIBERSORTx using the LM22 signature matrix and batch correction³². Briefly, RNA-seq data from the TCGA tumor samples were uploaded to the CIBERSORTx web portal, where gene expression profiles were deconvoluted to estimate the absolute scores for 22 distinct immune cell types. The analysis was performed with the default parameters, including 100 permutations for statistical significance assessment. For reporting of results the 22 distinct cell types were then collapsed into six mutually exclusive categories: B cells, macrophages, mast cells, myeloid cells, natural killer (NK) cells, and T cells.

Copy-number analyses

For tumors in TCGA, copy-number (CN) data consisted of specified chromosomal regions of equal CN, the ${\log }_{2}(\frac{x}{2})$-transformed CN, and the number of probes. We converted these values to absolute copy numbers and determined each segment to have either a copy number gain (segment mean ≥ 2.5), a copy number loss (segment mean ≤ 1.5), or no change (1.5 < segment mean < 2.5). The fractions of the genome altered by copy number gains and by copy number losses were each calculated for every tumor by dividing the number of probes affected by gains and losses, respectively, by the total number of probes. The total fraction of the genome altered by copy number alterations (FGA) was then calculated as the sum of these two values. For tumors in the Lund cohort, the same approach had previously been used to compute FGA², thus enabling direct comparison between tumors in the TCGA and Lund cohorts.
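The FGA calculation can be sketched as follows. The tuple layout for segments and the function names are illustrative assumptions; the back-transformation inverts the $\log_2(x/2)$ transform, and the thresholds are those given in the text.

```python
def absolute_copy_number(log2_ratio):
    """Invert the log2(x / 2) transform: x = 2 * 2**log2_ratio."""
    return 2.0 * 2.0 ** log2_ratio


def fraction_genome_altered(segments, gain=2.5, loss=1.5):
    """Probe-weighted fractions of the genome with gains, losses, and either.

    segments: list of (log2_ratio, n_probes) tuples for one tumor.
    Returns (fraction_gained, fraction_lost, fga).
    """
    total = sum(n for _, n in segments)
    gained = sum(n for r, n in segments if absolute_copy_number(r) >= gain)
    lost = sum(n for r, n in segments if absolute_copy_number(r) <= loss)
    return gained / total, lost / total, (gained + lost) / total


# Toy tumor: a diploid segment, a 4-copy segment, and a 1-copy segment.
toy_segments = [(0.0, 500), (1.0, 300), (-1.0, 200)]
print(fraction_genome_altered(toy_segments))
```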

In silico model of tumor growth and fCpG dynamics

To simulate the dynamics of fCpG sites in a growing tumor, we used a discrete-time birth-death process. Starting with a single founding tumor cell, the population is updated in time intervals of one day, at which time each cell either divides, dies, or remains unchanged with probabilities $\alpha$, $\lambda$, and $1-\alpha -\lambda$, respectively. Upon cell division, each allele in each cell changes its methylation state with probability $\mu$. We tracked an ensemble of 90 fCpG sites, assuming independent (de-)methylation dynamics. Unless otherwise specified, the following parameters were used: $\alpha =0.17$ (the estimated mean proliferation rate in the TCGA-Lund combined cohort, see below for details), $\lambda =0.15$ (to reach a population of 10⁹ cells in 3 years, or, ${\left(1+\alpha -\lambda \right)}^{3\cdot 365}\approx {\,10}^{9}$), and ${{\rm{\mu }}}=0.002$ (the estimated flip rate in the combined cohort, see below for details).
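A scaled-down sketch of such a birth-death simulation is given below. It is not the authors' simulator and makes several labeled assumptions: each of the founder's two alleles per site starts methylated with probability 0.5, flips are applied independently in both daughter cells, lineages that go extinct are restarted, and growth stops at a small `max_cells` cap because simulating 10⁹ individual cells is impractical in plain Python.

```python
import random


def expected_size(alpha=0.17, lam=0.15, days=3 * 365):
    """Mean-field population size (1 + alpha - lam) ** days."""
    return (1.0 + alpha - lam) ** days


def simulate_tumor(alpha=0.17, lam=0.15, mu=0.002, n_sites=90,
                   max_cells=500, max_days=3 * 365, seed=0, max_restarts=1000):
    """Discrete-time birth-death process with fluctuating CpG alleles.

    Returns (days_elapsed, n_cells, bulk_betas), where bulk_betas[s] is the
    fraction of methylated alleles at site s across all cells.
    """
    rng = random.Random(seed)
    for _ in range(max_restarts):
        # Assumption: founder alleles are methylated with probability 0.5.
        cells = [[rng.random() < 0.5 for _ in range(2 * n_sites)]]
        day = 0
        while cells and len(cells) < max_cells and day < max_days:
            day += 1
            next_cells = []
            for cell in cells:
                u = rng.random()
                if u < alpha:
                    # Division: each allele flips with probability mu
                    # in each daughter (assumption: both daughters).
                    for _ in range(2):
                        next_cells.append(
                            [m != (rng.random() < mu) for m in cell])
                elif u >= alpha + lam:
                    next_cells.append(cell)  # cell remains unchanged
                # otherwise (alpha <= u < alpha + lam) the cell dies
            cells = next_cells
        if cells:  # lineage survived; otherwise restart from a new founder
            n_alleles = 2 * len(cells)
            betas = [sum(c[2 * s] + c[2 * s + 1] for c in cells) / n_alleles
                     for s in range(n_sites)]
            return day, len(cells), betas
    raise RuntimeError("every simulated lineage went extinct")


day, n_cells, betas = simulate_tumor(n_sites=30, max_cells=300, seed=1)
print(day, n_cells, [round(b, 2) for b in betas[:5]])
print(f"{expected_size():.2e}")
```

With a single founder, the bulk $\beta$-values start at exactly 0, 0.5, or 1 and drift away from these values as divisions accumulate, which is the behaviour the clock relies on.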

Tumor specific proliferation rates

For tumors in the Lund cohort, tumor specific proliferation rates ${\alpha }_{i}$ were estimated based on the reported fraction ${f}_{i}$ of cells in S-phase as ${\alpha }_{i}={f}_{i}/{T}_{S}$, where ${T}_{S}$ is the average time spent in S-phase (see Supplementary Methods for details). We assumed ${T}_{S}$ to equal 12.7 hours, based on an average across five cancer cell lines⁵⁸. Because ${f}_{i}$ is not reported in the TCGA cohort, we used the Lund data to develop a predictive model of S-phase fraction using an elastic net model. As candidate predictors, we included FGA, LUMP value, and average gene expression levels within each of the following gene modules⁴⁶: mitotic checkpoint (see above), immune response, stroma, mitotic progression, early response, steroid response, basal, and lipid. The model was fit to the Lund cohort tumors using cross-validation for hyperparameter optimization, and then applied to TCGA tumors to predict tumor-specific S-phase fractions ${f}_{i}$ and proliferation rates ${\alpha }_{i}$.
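As a small worked example of the relation ${\alpha }_{i}={f}_{i}/{T}_{S}$, the sketch below converts an S-phase fraction into a proliferation rate. Expressing ${T}_{S}$ in days so that the rate is per day is an assumption made here for consistency with the daily time step of the simulation; the function name is hypothetical.

```python
T_S_HOURS = 12.7  # average S-phase duration used in the text


def proliferation_rate_per_day(s_phase_fraction, t_s_hours=T_S_HOURS):
    """alpha = f / T_S, with T_S converted from hours to days (assumption)."""
    if not 0.0 <= s_phase_fraction <= 1.0:
        raise ValueError("S-phase fraction must lie in [0, 1]")
    return s_phase_fraction / (t_s_hours / 24.0)


# A tumor with 9% of cells in S-phase.
print(round(proliferation_rate_per_day(0.09), 3))
```

Under this unit convention, an S-phase fraction of about 9% corresponds to a rate close to the value $\alpha =0.17$ used in the simulations.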

Tumor age estimation

To estimate tumor mitotic and calendar ages from the empirical $\beta$-value distributions, we proceeded in two steps. In the first step, we decomposed each tumor’s empirical $\beta$-value distribution into three groups, or “peaks”, of fCpG sites: the originally unmethylated fCpG sites (left peak), the originally hemi-methylated fCpG sites (middle peak), and the originally methylated fCpG sites (right peak). We achieved this by fitting a mixture model of three Beta distributions to the $\beta$-values of the 500 fCpG sites in the clock set using the R package BetaModels (version 0.5.2). To improve convergence of this method, sites with extreme $\beta$-values ($\beta > 0.98$ or $\beta < 0.02$) were removed before fitting the mixture model (a total of 46 and 244 sites were thus removed in the TCGA and Lund cohorts, respectively). In preparation for the next step, we determined the mode of each Beta component in the mixture as the location of the corresponding peak. At this point we excluded tumors with a middle peak location outside the interval [0.4, 0.6] because this suggests a bias in the (de-)methylation rates and thus violates a basic assumption of the fCpG dynamics in the clock set (35 and 8 tumors were excluded in the TCGA and Lund cohorts, respectively). We also excluded 3 TCGA and 7 Lund tumors that were of the “normal” molecular subtype. In the second step, we used the stochastic oscillator model (Fig. 1A) to relate the empirical peak location to the approximate age of the tumor. Because this step requires knowledge about the unknown stochastic (de-)methylation rate, we constrained the overall calendar age distribution across the Lund and TCGA cohorts to have a median of 3 years, which corresponds to the mean sojourn time in breast cancer⁴,⁵⁹. See Supplementary Methods for details.
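To make the link between peak location and mitotic age concrete, the sketch below uses a generic symmetric two-state model in which each allele flips with probability $\mu$ per division. This is an assumption chosen for illustration and is not necessarily the stochastic oscillator model of the Supplementary Methods. Under it, an originally unmethylated site is methylated after $n$ divisions with probability $\frac{1}{2}\left(1-{(1-2\mu )}^{n}\right)$, which can be inverted for $n$. The Beta mode formula $(a-1)/(a+b-2)$ is standard for $a,b>1$; all function names are hypothetical.

```python
from math import log


def beta_mode(a, b):
    """Mode of a Beta(a, b) distribution, defined here for a, b > 1."""
    if a <= 1.0 or b <= 1.0:
        raise ValueError("mode formula requires a > 1 and b > 1")
    return (a - 1.0) / (a + b - 2.0)


def middle_peak_ok(location, low=0.4, high=0.6):
    """Inclusion check: middle peak must lie in the interval [0.4, 0.6]."""
    return low <= location <= high


def left_peak_location(n_divisions, mu):
    """Expected beta-value of an originally unmethylated site after n
    divisions under the symmetric two-state flip model (assumption)."""
    return 0.5 * (1.0 - (1.0 - 2.0 * mu) ** n_divisions)


def mitotic_age_from_left_peak(location, mu):
    """Invert left_peak_location for the number of divisions."""
    if not 0.0 <= location < 0.5:
        raise ValueError("left peak location must lie in [0, 0.5)")
    return log(1.0 - 2.0 * location) / log(1.0 - 2.0 * mu)


# Round trip: 200 divisions at mu = 0.002, then recover the age.
loc = left_peak_location(200, 0.002)
print(round(loc, 4), round(mitotic_age_from_left_peak(loc, 0.002), 1))
```

In this simplified picture the left and right peaks move symmetrically toward 0.5 as the tumor ages, so peak locations near 0 and 1 indicate few divisions since the founder cell.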

EpiSCORE

To provide an orthogonal measure of purity in the TCGA cohort, we used the R package EpiSCORE (version 0.9.6)⁴⁰. Following the standard instructions of the tool, we converted the 450K methylation array data to gene-level DNAm data and then used the breast reference DNAm matrix provided with the package to estimate, for each tumor, the fraction of each of the following breast cell-types: basal, endothelial, fat, fibroblast, luminal, lymphocyte, and macrophage. The epithelial fraction was computed as the sum of the luminal and basal fractions.

Purity adjusted analyses

Acknowledging the correlation between the epigenetic clock index ${c}_{\beta }$ and tumor purity, we derived a purity-adjusted epigenetic clock index ${c}_{\beta }^{\alpha }$ and repeated relevant correlation analyses with ${c}_{\beta }^{\alpha }$ instead of ${c}_{\beta }$. Because the epigenetic clock index was derived from the distribution of $\beta$-values of fCpG sites, we performed the purity adjustment at the level of $\beta$-values. For this, we assumed that the measured $\beta$-value at site $i$ (${\beta }_{i}^{m}$) could be decomposed as a weighted sum of $\beta$-values of the tumor (${\beta }_{i}^{t}$) and the immune component (${\beta }_{i}^{s}$),

$${\beta }_{i}^{m}=p{\beta }_{i}^{t}+\left(1-p\right){\beta }_{i}^{s}, \tag{1}$$

where $p$ is the sample purity as measured by CPE. To estimate ${\beta }_{i}^{s}$ we combined the CIBERSORTx decomposition of the stroma (see section CIBERSORTx) with $\beta$-values of its constituent cells (${\beta }_{k}^{c}$) to obtain

$${\beta }_{i}^{s}=\sum_{k\in \mathrm{LM22}}{w}_{i,k}\,{\beta }_{k}^{c},$$

where ${w}_{i,k}$ is the fraction of cell type $k$ (in the LM22 signature) in tumor sample $i$. The ${\beta }_{k}^{c}$ were estimated using published cell-type specific methylation values⁶⁰. Finally, the purity adjusted $\beta$-values were obtained by solving Eq. (1) for ${\beta }_{i}^{t}$ and truncating values below 0 and above 1 (necessary for <5.7% of the adjusted $\beta$-values).
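Solving Eq. (1) for the tumor component gives ${\beta }_{i}^{t}=\left({\beta }_{i}^{m}-(1-p){\beta }_{i}^{s}\right)/p$. The sketch below illustrates this adjustment for a single site. It is not the authors' code: the cell-type names are hypothetical, and rescaling the cell-type weights to sum to 1 is an assumption made here.

```python
def stromal_beta(cell_fractions, cell_type_beta):
    """Weighted sum of cell-type beta-values at one CpG site.

    cell_fractions: dict cell type -> weight in this sample.
    cell_type_beta: dict cell type -> beta-value of that cell type at the site.
    Weights are rescaled to sum to 1 (an assumption of this sketch).
    """
    total = sum(cell_fractions.values())
    if total <= 0:
        raise ValueError("cell-type weights must sum to a positive value")
    return sum(w / total * cell_type_beta[k] for k, w in cell_fractions.items())


def purity_adjusted_beta(beta_measured, purity, beta_stroma):
    """Solve beta_m = p * beta_t + (1 - p) * beta_s for beta_t, then
    truncate the result to the interval [0, 1]."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must lie in (0, 1]")
    beta_tumor = (beta_measured - (1.0 - purity) * beta_stroma) / purity
    return min(1.0, max(0.0, beta_tumor))


# Hypothetical two-cell-type stroma at one site.
b_s = stromal_beta({"T_cell": 3.0, "B_cell": 1.0}, {"T_cell": 0.8, "B_cell": 0.4})
print(round(b_s, 3), round(purity_adjusted_beta(0.5, 0.8, 0.9), 3))
```

Truncation is needed because noise in the purity or stromal estimates can push the solved value slightly outside the valid range of a $\beta$-value.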

Statistics and reproducibility

Correlations between two continuous variables were calculated using the Pearson correlation coefficient. The medians of continuous variables were compared using a two-sided Wilcoxon rank-sum test at a significance level of 0.05. For each variable, tumors with missing values of that variable were excluded. All analyses and visualizations were performed in Python (version 3.9.19) and R (version 4.3)⁶¹. All analyses were based on publicly available data sources and can thus be fully reproduced. To maximize statistical power, all qualifying samples were included in the respective data analyses.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All data used in this work are publicly available under https://www.cancer.gov/ccg/research/genome-sequencing/tcga (TCGA data), and under the following Gene Expression Omnibus accession numbers: GSE75067 (Lund cohort); GSE88883 (normal breast cohort); GSE69914 (Germany cohort); GSE226569 (WCHS cohort); GSE186458 (DNAm atlas); GSE106360, GSE39451, GSE58999, GSE212370 (multiple sample cohorts). A list of the 500 fCpG sites used to estimate tumor mitotic age is provided in Supplementary Data 1. Tumor mitotic ages for samples in the TCGA and Lund cohorts are provided in Supplementary Data 2. The numerical data underlying the figures in this manuscript are provided in Supplementary Data 3.

Code availability

All Python (version 3.9.19) and R (version 4.3) code used to produce the results in this paper is available on GitHub at https://github.com/danmonyak/EpiClockInvasiveBRCA (MIT License)⁶¹.

References

Duffy, S. W., Chen, H. H., Tabar, L. & Day, N. E. Estimation of mean sojourn time in breast cancer screening using a Markov chain model of both entry to and exit from the preclinical detectable phase. Stat. Med. 14, 1531–1543 (1995).
Michaelson, J. et al. Estimates of breast cancer growth rate and sojourn time from screening database information. J. Women’s Imaging 5, 11–19 (2003).
Shapiro, S., Goldberg, J. D. & Hutchison, G. B. Lead time in breast cancer detection and implications for periodicity of screening. Am. J. Epidemiol. 100, 357–366 (1974).
Shen, Y. & Zelen, M. Screening sensitivity and sojourn time from breast cancer early detection clinical trials: mammograms and physical examinations. J. Clin. Oncol. 19, 3490–3499 (2001).
Weedon-Fekjær, H., Vatten, L. J., Aalen, O. O., Lindqvist, B. & Tretli, S. Estimating mean sojourn time and screening test sensitivity in breast cancer mammography screening: new results. J. Med. Screen. 12, 172–178 (2005).
Hannum, G. et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell 49, 359–367 (2013).
Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 14, 1–20 (2013).
Yang, Z. et al. Correlation of an epigenetic mitotic clock with cancer risk. Genome Biol. 17, 1–18 (2016).
Youn, A. & Wang, S. The MiAge Calculator: a DNA methylation-based mitotic age calculator of human tissue types. Epigenetics 13, 192–206 (2018).
Zhu, T., Tong, H., Du, Z., Beck, S. & Teschendorff, A. E. An improved epigenetic counter to track mitotic age in normal and precancerous tissues. Nat. Commun. 15, 4211 (2024).
Zhou, W. et al. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nat. Genet. 50, 591–602 (2018).
Teschendorff, A. E. A comparison of epigenetic mitotic-like clocks for cancer risk prediction. Genome Med. 12, 1–17 (2020).
Gabbutt, C. et al. Fluctuating methylation clocks for cell lineage tracing at high temporal resolution in human tissues. Nat. Biotechnol. 40, 720–730 (2022).
Gabbutt, C. et al. Evolutionary dynamics of 1,976 lymphoid malignancies predict clinical outcome. medRxiv 2023.11.10.23298336 (2023).
Teschendorff, A. E. On epigenetic stochasticity, entropy and cancer risk. Philos. Trans. R. Soc. B 379, 20230054 (2024).
Koboldt, D. C. et al. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Johnson, K. C., Houseman, E. A., King, J. E. & Christensen, B. C. Normal breast tissue DNA methylation differences at regulatory elements are associated with the cancer risk factor age. Breast Cancer Res. 19, 1–11 (2017).
Holm, K. et al. An integrated genomics analysis of epigenetic subtypes in human breast tumors links DNA methylation patterns to chromatin states in normal mammary cells. Breast Cancer Res. 18, 1–20 (2016).
Luo, Y. et al. Regional methylome profiling reveals dynamic epigenetic heterogeneity and convergent hypomethylation of stem cell quiescence-associated genes in breast cancer following neoadjuvant chemotherapy. Cell Biosci. 9, 16 (2019).
Nishimura, T. et al. Evolutionary histories of breast cancer and related clones. Nature 620, 607–614 (2023).
Ciriello, G. et al. Comprehensive molecular portraits of invasive lobular breast cancer. Cell 163, 506–519 (2015).
van der Veer, E. L. et al. Causes and consequences of delayed diagnosis in breast cancer screening with a focus on mammographic features and tumour characteristics. Eur. J. Radiol. 167, 111048 (2023).
Danielsen, H. E., Pradhan, M. & Novelli, M. Revisiting tumour aneuploidy — the place of ploidy assessment in the molecular era. Nat. Rev. Clin. Oncol. 13, 291–304 (2016).
Ricke, R. M., van Ree, J. H. & van Deursen, J. M. Whole chromosome instability and cancer: a complex relationship. Trends Genet. 24, 457–466 (2008).
Chia, S. K. et al. A 50-gene intrinsic subtype classifier for prognosis and prediction of benefit from adjuvant tamoxifen. Clin. Cancer Res. 18, 4465–4472 (2012).
Wallden, B. et al. Development and verification of the PAM50-based Prosigna breast cancer gene signature assay. BMC Med. Genom. 8, 1–14 (2015).
Rakha, E. A. et al. Breast cancer prognostic classification in the molecular era: the role of histological grade. Breast Cancer Res. 12, 1–12 (2010).
Wiesner, F. G. et al. Ki-67 as a prognostic molecular marker in routine clinical use in breast cancer patients. Breast 18, 135–141 (2009).
Carter, C. L., Allen, C. & Henson, D. E. Relation of tumor size, lymph node status, and survival in 24,740 breast cancer cases. Cancer 63, 181–187 (1989).
Loftus, L. V., Amend, S. R. & Pienta, K. J. Interplay between cell death and cell proliferation reveals new strategies for cancer therapy. Int. J. Mol. Sci. 23, 4723 (2022).
Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).
Desmedt, C. et al. Abstract S6-2: characterization of different foci of multifocal breast cancer using genomic, transcriptomic and epigenomic data. Cancer Res. 72, S6–2 (2012).
Reyngold, M. et al. Remodeling of the methylation landscape in breast cancer metastasis. PLoS One 9, e103896 (2014).
Garcia-Recio, S. et al. Multiomics in primary and metastatic breast tumors from the AURORA US network finds microenvironment and epigenetic drivers of metastasis. Nat. Cancer 4, 128–147 (2023).
Cheung, K. J. & Ewald, A. J. A collective route to metastasis: seeding by tumor cell clusters. Science 352, 167–169 (2016).
Fan, C. et al. Concordance among gene-expression–based predictors for breast cancer. N. Engl. J. Med. 355, 560–569 (2006).
Haque, R. et al. Impact of breast cancer subtypes and treatment on survival: an analysis spanning two decades. Cancer Epidemiol. Biomark. Prev. 21, 1848–1855 (2012).
Aran, D., Sirota, M. & Butte, A. J. Systematic pan-cancer analysis of tumour purity. Nat. Commun. 6, 8971 (2015).
Teschendorff, A. E., Zhu, T., Breeze, C. E. & Beck, S. EPISCORE: cell type deconvolution of bulk tissue DNA methylomes from single-cell RNA-Seq data. Genome Biol. 21, 221 (2020).
Welch, H. G. & Black, W. C. Overdiagnosis in cancer. J. Natl. Cancer Inst. 102, 605–613 (2010).
Li, J. et al. Molecular differences between screen-detected and interval breast cancers are largely explained by PAM50 subtypes. Clin. Cancer Res. 23, 2584–2592 (2017).
Maley, C. C. et al. Classifying the evolutionary and ecological features of neoplasms. Nat. Rev. Cancer 17, 605–619 (2017).
Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Colaprico, A. et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 44, e71–e71 (2015).
Fredlund, E. et al. The gene expression landscape of breast cancer is shaped by tumor protein p53 status and epithelial-mesenchymal transition. Breast Cancer Res. 14, 1–13 (2012).
Sherman, M. E. et al. The Susan G. Komen for the Cure Tissue Bank at the IU Simon Cancer Center: a unique resource for defining the “molecular histology” of the breast. Cancer Prev. Res. 5, 528–535 (2012).
Yang, Z., Jones, A., Widschwendter, M. & Teschendorff, A. E. An integrative pan-cancer-wide analysis of epigenetic enzymes reveals universal patterns of epigenomic deregulation in cancer. Genome Biol. 16, 140 (2015).
Chen, J. et al. An epigenome-wide analysis of socioeconomic position and tumor DNA methylation in breast cancer patients. Clin. Epigenet. 15, 68 (2023).
Davis, S. & Meltzer, P. S. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics 23, 1846–1847 (2007).
Zhou, W., Triche, T. J. Jr, Laird, P. W. & Shen, H. SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions. Nucleic Acids Res. 46, e123–e123 (2018).
Loyfer, N. et al. A DNA methylation atlas of normal human cell types. Nature 613, 355–364 (2023).
Seal, R. L. et al. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res. 51, D1003–D1009 (2023).
Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167 (2009).
Gendoo, D. M. et al. Genefu: an R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics 32, 1097–1099 (2016).
Mootha, V. K. et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
Bialic, M., Al Ahmad Nachar, B., Koźlak, M., Coulon, V. & Schwob, E. Measuring S-phase duration from asynchronous cells using dual EdU-BrdU pulse-chase labeling flow cytometry. Genes 13, 408 (2022).
Bhatt, R. et al. Estimation of age of onset and progression of breast cancer by absolute risk dependent on polygenic risk score and other risk factors. Cancer 130, 1590–1599 (2024).
Hannon, E. et al. Assessing the co-variability of DNA methylation across peripheral cells and tissues: Implications for the interpretation of findings in epigenetic epidemiology. PLoS Genet. 17, e1009443 (2021).
Monyak, D. L. Analysis scripts and documentation for “Mapping the Temporal Landscape of Breast Cancer Using Epigenetic Entropy”. v1.0. Zenodo. https://doi.org/10.5281/zenodo.16813782 (2025).

Acknowledgements

We gratefully recognize our funders who provided support for this work: National Institutes of Health (grant R01-CA271237 to M.D.R. and L.J.G.; grant U2C-CA233254 to E.S.H.; grant U54-CA217376 to D.S.), and Breast Cancer Research Foundation (grant BCRF-19-074 to E.S.H.).

Author information

Authors and Affiliations

Trinity College of Arts and Sciences, Duke University, Durham, NC, USA
Daniel L. Monyak & Graham J. Gumbert
Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA
Shannon T. Holloway & Marc D. Ryser
Department of Radiology, Duke University School of Medicine, Durham, NC, USA
Lars J. Grimm
Department of Surgery, Duke University School of Medicine, Durham, NC, USA
E. Shelley Hwang & Jeffrey R. Marks
Department of Pathology, University of Southern California Keck School of Medicine, Los Angeles, CA, USA
Darryl Shibata
Department of Mathematics, Duke University, Durham, NC, USA
Marc D. Ryser

Contributions

M.D.R., D.S., J.R.M., E.S.H., and L.J.G. conceived the study and secured funding. M.D.R., D.L.M., and S.T.H. developed and optimized the analytical methods. D.L.M., G.G., and S.T.H. curated the datasets, maintained the code base, and performed the computational analyses. D.L.M. and S.T.H. prepared the figures and illustrations. All authors contributed to the interpretation of the results. M.D.R., D.S., and J.R.M. supervised the project. D.L.M., M.D.R., and D.S. drafted the manuscript, and all authors revised it critically and approved the final version.

Corresponding authors

Correspondence to Jeffrey R. Marks, Darryl Shibata or Marc D. Ryser.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Andrew Teschendorff and the other, anonymous, reviewer for their contribution to the peer review of this work. Primary Handling Editor: Johannes Stortz. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Monyak, D.L., Holloway, S.T., Gumbert, G.J. et al. Mapping the temporal landscape of breast cancer using epigenetic entropy. Commun Biol 8, 1477 (2025). https://doi.org/10.1038/s42003-025-08867-2

Received: 11 October 2024
Accepted: 09 September 2025
Published: 16 October 2025
Version of record: 16 October 2025
DOI: https://doi.org/10.1038/s42003-025-08867-2