Main

Cancer development is an evolutionary process1,2; consequently, the evolutionary history of a cancer may set its future trajectory and allow inference of the clinical path of a patient5. However, testing this hypothesis directly is challenging because longitudinal patient samples are required to document evolutionary history. Evolutionary histories are therefore typically inferred from single timepoint data; for example, somatic (epi)mutations are patterned in distinctive ways by differing evolutionary dynamics3. In the haematological system, genome sequencing of single cells or single-cell colonies has been used to infer the phylogenetic relationships among cells6,7,8. The expense of this approach has restricted analyses to small numbers of cases, limiting suitability for clinical translation.

DNA methylation can serve as a lineage marker, recording the clonal architecture of cell populations9,10,11,12,13 or the proliferative history14,15. We have recently identified CpG sites at which DNA methylation fluctuates stochastically over a timescale measured in years4. These fluctuating CpGs (fCpGs) function as a ‘methylation barcode’, offering a low-cost strategy for lineage tracing at high temporal resolution in patient samples4. In this study, we constructed a quantitative modelling framework called evolutionary inference using fluctuating methylation (EVOFLUx). This framework enables precise quantitative inference of the evolutionary history of cancer cells from input fCpG data derived from clinical specimens, at scale (Fig. 1a).

Fig. 1: Selection and characterization of fCpG loci.

a, Schematic representing the study design. Bulk cancer tissue samples were collected, DNA extracted and methylation arrays were performed. Evolutionary dynamics were inferred and these were correlated with clinical variables and outcomes. The illustrations of the body, tumour, eppendorf tube and machine were reproduced courtesy of NIAID. The illustration of the physician was adapted from Science Figures, under an Open Design Licence 1.0. The illustration of the screen was created by Simon Dürr under a CC0 1.0 licence. b, fCpG methylation patterns reflect population evolutionary dynamics. Three scenarios are depicted: in a polyclonal population with a distant MRCA, diverse fCpG methylation (unmethylated in white, heterozygous in grey and methylated in black) results in a unimodal distribution of average methylation near the steady state (top). Following a recent clonal expansion (cell number 3 of the top panel), identical fCpG methylation across cells yields a characteristic ‘W-shape’ in the bulk methylation distribution (middle). Post-bottleneck, ongoing fluctuations generate diverse fCpG methylation patterns, changing the distribution of bulk methylation values (bottom). c, A hierarchically clustered heatmap of the 978 fCpGs identified in lymphoid cells (n = 2,204 samples). Magnified regions represent the homogeneous intermediate methylation pattern in normal lymphoid cells and the speckled pattern in cancer samples. PBMC, peripheral blood mononuclear cell; MGUS, monoclonal gammopathy of undetermined significance; MM, multiple myeloma; NOS, not otherwise specified; RT, Richter transformation. d, Example histograms of fCpG methylation distributions from healthy and neoplastic lymphoid cells (CLLs). e,f, Heatmaps showing the log2 fold change (FC) in genomic locations (e) and chromatin status (f) of fCpGs compared with non-fCpGs. 
g, Comparison between fCpG-associated and non-fCpG-associated genes in a single CLL sample (left; P = 5.14 × 10^−12, Wilcoxon test, sample SCLL-328), and expression of fCpG-associated genes separated by discretized allele methylation status (right; P values were determined by two-sided Wilcoxon test, no multiple correction; n = 505 for fCpG genes and n = 15,736 for non-fCpG genes). The boxplot centre shows the median, the box shows the quartiles and whiskers represent ±1.5× interquartile range (IQR). TPM, transcripts per million.

EVOFLUx works by considering the heterogeneity of fCpG methylation values within a sample. At a diploid locus, each fCpG can take one of three states: neither allele methylated, one allele methylated or both alleles methylated (0%, 50% or 100% methylated, respectively), so n fCpG sites can take $3^n$ possible methylation patterns. fCpGs fluctuate methylation status independently, meaning that they function as an ‘evolving barcode’ to track clonal evolution: two somatic cells with close ancestry will share a near-identical pattern of fCpG methylation, whereas distantly related cells will have divergent fCpG methylation patterns (Fig. 1b). In bulk populations of clonal somatic cells, the dominant fCpG pattern represents the fCpG state of the founder cell of the population. Therefore, the precise distribution of fCpG methylation is determined by the evolutionary history of the population, meaning that mathematical modelling can be used to recover the evolutionary history of a sample from input fCpG data.
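The contrast between clonal and polyclonal bulk samples can be illustrated with a small simulation. The following is a minimal sketch, not the EVOFLUx model: the numbers of loci and cells, the uniform choice of states and the 2% drift fraction are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SITES = 1000  # hypothetical number of fCpG loci
N_CELLS = 500   # hypothetical number of cells in the bulk sample


def bulk_methylation(cell_states):
    """Per-locus bulk methylation fraction; states count methylated alleles (0, 1 or 2)."""
    return cell_states.mean(axis=0) / 2.0


def fraction_near(beta, centre, width=0.1):
    """Fraction of loci whose bulk methylation lies within `width` of `centre`."""
    return float(np.mean(np.abs(beta - centre) < width))


# Polyclonal sample (simplifying assumption): every cell draws every locus independently.
polyclonal = rng.integers(0, 3, size=(N_CELLS, N_SITES))

# Clonal sample: all cells inherit the founder pattern, then a small
# (assumed 2%) share of cell-locus entries is redrawn to mimic recent drift.
founder = rng.integers(0, 3, size=N_SITES)
clonal = np.tile(founder, (N_CELLS, 1))
flip = rng.random(clonal.shape) < 0.02
clonal[flip] = rng.integers(0, 3, size=flip.sum())

poly_beta = bulk_methylation(polyclonal)
clon_beta = bulk_methylation(clonal)

for name, beta in [("polyclonal", poly_beta), ("clonal", clon_beta)]:
    peaks = [round(fraction_near(beta, c), 2) for c in (0.0, 0.5, 1.0)]
    print(name, "fraction of loci near 0, 0.5, 1:", peaks)
```

In this toy setting the polyclonal sample concentrates around 50% methylation, whereas the clonal sample splits into three peaks near 0%, 50% and 100%, which is the W-shape described in the text.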

Here we focused on lymphoid neoplasms, which cover a broad spectrum of diseases and subtypes with highly variable clinicobiological features, from highly proliferative acute disease to indolent chronic leukaemia, arising in infants to older adults, with tumour samples across disease stages16,17. These tumours have been extensively profiled by DNA methylation arrays, which have provided insights into their cellular origin, pathogenesis and clinical behaviour18. Although their temporal clonal dynamics have been partially analysed in a few patients19,20, their precise evolutionary histories remain poorly characterized. Applying EVOFLUx to 1,976 well-characterized lymphoid malignancies, we precisely measured individual tumour evolutionary history and show that these histories are associated with disease outcome.

Characterization of fCpGs in lymphoid cancers

We assembled bulk Illumina methylation array data of normal and neoplastic lymphoid cells from 2,430 samples14,21,22,23,24,25,26,27,28,29,30 (Methods; Supplementary Tables 1–3). Following quality control, we retained 2,204 samples from 2,054 patients and 389,180 CpGs. As fCpG loci are tissue specific4, we constructed a pipeline to identify lymphoid-specific fCpGs (Methods; Extended Data Fig. 1a) using 1,471 samples from multiple lymphoid tumour entities (Supplementary Tables 1 and 2). We identified 978 pan-lymphoid cancer fCpGs (Supplementary Table 4). Methylation at fCpGs shows a characteristic ‘speckled’ pattern across cancers (Fig. 1c) because (de)methylation occurs independently in each tumour, in stark contrast to the orderly patterns observed for traditional methylation clocks or a random subset of CpGs (Extended Data Fig. 1b–d). fCpGs did not cluster the samples by disease (Fig. 1c), and there was also no clustering based on disease subtype or array platform (Extended Data Fig. 1e–i), except for some B and T cell acute lymphoblastic leukaemias (B/T-ALLs) and multiple myeloma cases that exhibited global hypermethylation or hypomethylation, respectively, as previously reported14. This is consistent with fCpGs behaving as a stochastic ‘barcode’ encoding lineage information. By comparison, methylation at CpGs excluded by our selection filters either did cluster by disease, had very low heterogeneity across samples or had unequal methylation and demethylation rates (Supplementary Figs. 1 and 2).

We then examined the methylation value distribution of these 978 fCpGs in individual samples (Supplementary Table 5). In each cancer sample, the fCpGs followed a characteristic ‘W-shaped’ distribution that depicts the fCpG methylation pattern of the founder cell of the cancer sample (Fig. 1d). By contrast, the healthy B cell subpopulations, which were not included in the discovery set, had unimodal distributions with intermediate methylation levels consistent with these being polyclonal populations (that is, average of the three methylation states; Fig. 1d).

Methylation values across fCpGs were uncorrelated, except for a small number of fCpGs located within 1 kb of another fCpG (74 of 978; Extended Data Fig. 2a). In whole-genome bisulfite sequencing (WGBS) data of sorted bulk B and T cell populations31, methylation at fCpG loci in these polyclonal normal samples was largely intermediate. Over a small window of 100 bp, as the distance from the fCpGs increased, an increasing fraction of the neighbouring CpGs were either hypermethylated or hypomethylated (Extended Data Fig. 2b). Together, these analyses suggest that the local 3D genome structure influences (de)methylation processes.

We sought to verify fCpGs as ‘evolving barcodes’. Analysis of fCpG methylation fluctuation over time confirmed that inter-patient fCpG heterogeneity was not caused by common single-nucleotide polymorphisms (SNPs; Methods; Extended Data Fig. 2c and Supplementary Fig. 3). We generated long-read nanopore sequencing32 on normal B cells (n = 6) and matched chronic lymphocytic leukaemia (CLL)-Richter transformation samples (n = 2 pairs) to simultaneously detect genetic mutations and DNA methylation, and confirmed that fCpG methylation variation is not a consequence of underlying somatic mutation (Extended Data Fig. 2d and Supplementary Fig. 4). In matched data, fCpG methylation levels measured by bead array or long-read sequencing were highly concordant (Extended Data Fig. 2e), and similar excellent concordance was observed in additional WGBS data (Extended Data Fig. 2f). We constructed fCpG methylation haplotypes using long-read sequencing (Extended Data Fig. 2g) and additional single-cell reduced representation bisulfite sequencing6, and detected lower intra-haplotype heterogeneity within CLL samples than normal B cell samples (Extended Data Fig. 2h and Supplementary Fig. 5), consistent with the leukaemia being a clonal expansion, whereas normal B cells are polyclonal.

We utilized somatic copy number alterations in 492 CLL and 85 mantle cell lymphoma (MCL) samples26,33 (Supplementary Tables 6 and 7) to distinguish between alleles and show that (de)methylation at fCpGs occurred independently on each allele (Extended Data Fig. 3a), despite copy number alterations being rare in our cohorts (Supplementary Fig. 6). Thus, fCpGs show independent ongoing allele-specific changes to methylation, uniquely labelling cell lineages.

As the DNA methylome is influenced by age34, we tested whether fCpGs showed evidence of age-dependent epigenetic modulation. In normal blood samples, mean fCpG methylation was not correlated with age, suggesting that fluctuations continue throughout life, whereas fCpG methylation variance increased with age (Extended Data Fig. 3b and Supplementary Fig. 7). Variance is higher in samples where there has been a recent clonal expansion (that is, homozygous methylated or unmethylated alleles become more prominent; Extended Data Fig. 3c), suggesting that fCpGs were detecting age-related clonal expansions of cells of the haematopoietic system4,35,36.

We analysed the genomic features of fCpG sites. fCpGs were enriched on the shores of CpG islands (Fig. 1e), underrepresented in gene-associated regions (Extended Data Fig. 3d) and, notably, were distinct from CpGs used in other epigenetic clocks (Extended Data Fig. 3e and Supplementary Tables 8 and 9). At the chromatin level, fCpGs were enriched in normal and neoplastic B cell weak promoters and enhancers as well as H3K27me3-marked regions, and significantly underrepresented in active promoters and H3K36me3-marked regions (Fig. 1f). RNA-sequencing analysis of CLL samples demonstrated that genes associated with fCpGs have significantly lower expression levels (Fig. 1g and Extended Data Fig. 3f), with no association between fCpG methylation status and associated gene expression in matched cases (Fig. 1g). No correlation was observed between fCpG methylation and the expression of key DNA methylation modifier genes (Extended Data Fig. 3g). Pathway enrichment analysis revealed that fCpG-associated genes were underrepresented in pathways ubiquitously expressed across multiple tissue types but enriched in developmental pathways (Supplementary Tables 10 and 11). Although these results do not provide a detailed molecular understanding of the mechanisms underpinning fCpG fluctuation, together they indicate that fCpGs tend to be located in silent regions of the genome and do not regulate transcription, so are likely to be neutral lineage markers.

EVOFLUx measures clonal evolution

We developed EVOFLUx, a stochastic mathematical modelling and Bayesian inference framework, to simulate how clonal evolution quantitatively determines fCpG methylation values and enable inference of evolutionary history of individual tumour samples (Fig. 2a).

Fig. 2: The EVOFLUx model accurately captures fCpG data patterns.

a, Schematic of the EVOFLUx method that simulates the evolutionary dynamics of a growing cancer. EVOFLUx relies on a mathematical model describing how the fCpG distributions vary with the evolutionary parameters of a cancer. The model is split into two phases: before the MRCA (τ) in which methylation changes occur in the single-cell lineage that is subsequently transformed, and following the MRCA in which the population grows exponentially (θ). At each time step, epigenetic switching is allowed to occur between the three possible states (rates μ, ν, γ and ζ). b,c, Simulated fCpG methylation distributions for cancer with a distant versus recent MRCA (b), and rapid versus slow growth (c). d, Fisher plot showing the change in the subclonal composition within a simulated set of longitudinal samples (top). Scatter plots showing the marginal fCpG methylation distribution between simulated pairwise longitudinal samples with the subclonal proportions depicted in each corresponding sample timepoint (n = 1,000 simulated fCpGs; bottom). Points in the scatter plot are coloured according to the difference in methylation between the first and the subsequent timepoints. e, Histogram overlaying the fCpG methylation distributions of data simulated at T1 (neutral) and T3 (subclonal; left), and comparison of the fraction of fCpGs at intermediate values between T1 and T3 (right; the P value was determined by chi-squared test).

EVOFLUx simulates the ongoing gain and loss of methylation at fCpGs within a lineage from the birth of a patient until the beginning of a cancer-associated clonal expansion at some specified time, and then continues to simulate methylation fluctuations within the growing population of cancer cells until the cancer sample was collected at time T (Methods; Fig. 2a and Supplementary Information). The key parameters in the model are:

  • Cancer growth rate per year (θ), assuming an exponentially growing population.

  • Cancer age, measured in terms of the age of the patient in years at the time the cancer started growing (τ).

  • fCpG switching rates per allele per year. Four parameters corresponding to the four possible transitions between homozygous unmethylated, heterozygous methylated and homozygous methylated (μ, ν, γ and ζ).

By combining the cancer growth rate and age, the cancer effective population size (Ne), the number of long-lived lineages in the cancer, is calculated as $N_e = e^{\theta(T - \tau)}$.
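As a worked illustration of this relationship, the effective population size follows directly from the growth rate θ, the patient age at sampling T and the patient age at which growth began τ. The function name and the example values below are hypothetical and are not taken from the study.

```python
import math


def effective_population_size(theta, sample_age, age_at_growth_start):
    """Return N_e = exp(theta * (T - tau)) for exponential growth.

    theta: growth rate per year; sample_age: T in years;
    age_at_growth_start: tau in years.
    """
    elapsed = sample_age - age_at_growth_start
    if elapsed < 0:
        raise ValueError("the cancer cannot start growing after it was sampled")
    return math.exp(theta * elapsed)


# Hypothetical example: growth rate of 2 per year, sampled 6.5 years after growth began.
ne = effective_population_size(theta=2.0, sample_age=65.0, age_at_growth_start=58.5)
print(f"N_e is approximately {ne:.3g}")
```

Because the relationship is exponential, a small change in either θ or T − τ changes Ne by a large factor.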

Computational simulations of the model recapitulated the observed W-shaped fCpG methylation distribution. Altering model parameters caused notable shifts in the distribution: increasing cancer age caused the flanking peaks of the W-shaped distribution to move towards the central peak, whereas slower growth broadened peak width (Fig. 2b,c). Hence, the distribution of fCpGs encoded the evolutionary history of a tumour.
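The inward drift of the flanking peaks with cancer age can be sketched with a three-state Markov chain for a single locus. This is a simplified illustration rather than the EVOFLUx implementation: the assignment of μ, ν, γ and ζ to particular transitions, the factor of two for states with two switchable alleles, the Euler time stepping and the rate values are all assumptions made here.

```python
import numpy as np


def rate_matrix(mu, nu, gamma, zeta):
    """Transition rates per year between 0, 1 and 2 methylated alleles.

    Assumed mapping: mu (0 -> 1), nu (1 -> 2), gamma (1 -> 0), zeta (2 -> 1),
    each per allele, so states with two switchable alleles leave at twice the rate.
    """
    return np.array([
        [-2 * mu, 2 * mu, 0.0],
        [gamma, -(gamma + nu), nu],
        [0.0, 2 * zeta, -2 * zeta],
    ])


def state_probabilities(start_state, years, q, dt=0.01):
    """Probability of each state after `years`, using small Euler time steps."""
    steps = int(round(years / dt))
    step_matrix = np.eye(3) + q * dt  # first-order approximation of exp(q * dt)
    p0 = np.zeros(3)
    p0[start_state] = 1.0
    return p0 @ np.linalg.matrix_power(step_matrix, steps)


def expected_methylation(probs):
    """Expected methylation fraction given state probabilities."""
    return float(probs @ np.array([0.0, 0.5, 1.0]))


q = rate_matrix(mu=0.01, nu=0.01, gamma=0.01, zeta=0.01)  # hypothetical rates
recent = expected_methylation(state_probabilities(0, 5, q))    # founder 5 years ago
distant = expected_methylation(state_probabilities(0, 50, q))  # founder 50 years ago
print(f"unmethylated-founder loci: {recent:.3f} after 5 years, {distant:.3f} after 50 years")
```

Loci that were unmethylated in the founder sit close to 0% shortly after the expansion and move towards 50% as time passes, which is the direction of peak movement described for older cancers.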

We added simulations of a single advantageous subclone within the cancer (Methods). Sampling longitudinally from model simulations and comparing fCpG methylation between timepoints showed that subclone outgrowth was marked by the small number of fCpGs with distinct methylation status in the subclone, becoming detectable only when the subclone was sufficiently large (Fig. 2d,e).

EVOFLUx contained an extensive Bayesian inference method to learn model parameters from input fCpG methylation distribution data, accounting for tumour purity and the technical noise introduced by the methylation array (Methods; Fig. 3a). We generated simulated fCpG distributions with prespecified (that is, known) model parameters and validated the ability of EVOFLUx to accurately recover the ‘ground truth’ in these simulated data (Extended Data Fig. 4a,b and Supplementary Information), even when the assumptions underlying the method were weakened (Extended Data Fig. 4c–h and Supplementary Information).

Fig. 3: EVOFLUx reveals the evolutionary dynamics of lymphoid cancers.

a, Schematic showing the EVOFLUx Bayesian inference pipeline used to infer patient-specific tumour evolutionary history from input bulk methylation data, patient age and tumour purity. The illustration of human ageing was adapted from Pixabay under a CC0 1.0 licence. The illustration of the tumour was adapted courtesy of NIAID. b, Scatter plot of the inferred growth rate (θ) versus effective population size ($N_e = e^{\theta(T - \tau)}$) per sample from patients with cancer, using the posterior median as a descriptive summary statistic of the probability distribution inferred for each evolutionary variable (n = 1,885 samples; coloured according to cancer type). c, Scatter plot of the inferred time since the MRCA (τ) versus the mean epigenetic switching rate (averaged across different rate parameters) per sample (n = 1,885 samples). d–f, Inferred growth rate (top) and effective population size (Ne; bottom) of individual cancer samples separated by molecular subtype in B-ALL (d; two-sided MWU test, n = 767 samples), MCL (e; two-sided MWU test, n = 126 samples) and CLL (f; two-sided MWU test, n = 702 samples). Differences between subtypes were tested using MWU tests, with Holm–Bonferroni corrections applied. In panel d, B-ALL cases with MLL rearranged had significantly higher growth rate than each of the other B-ALL subtypes (all P < 0.005); however, for ease of presentation, only the comparison with t(1;19) is shown here. cMCL, conventional MCL; nnMCL, non-nodal MCL. g, Inferred growth rate (top) and effective population size (bottom) for CLL cases with a TP53 driver mutation in each IGHV CLL subtype (two-sided MWU, n = 646 samples, multiple corrected using false discovery rate together with analysis of other driver mutations; Extended Data Fig. 7). WT, wild type. The boxplot centre shows the median, the box denotes the quartiles and whiskers represent ±1.5× IQR.

Evolution of lymphoid malignancies

We applied EVOFLUx to 1,976 samples of lymphoid cancers (including T-ALL, B-ALL, CLL, MCL, diffuse large B cell lymphoma (DLBCL) and multiple myeloma) and premalignant conditions (that is, monoclonal B cell lymphocytosis (MBL) and monoclonal gammopathy of undetermined significance) for which we had age and tumour cell purity information, to infer the individual growth rate of each cancer, time since the most recent common ancestor (MRCA) and epigenetic switching rates (Supplementary Table 12). Posterior distributions were well formed (Extended Data Fig. 5a) and posterior predictive distributions recapitulated the input data well (Extended Data Fig. 5b), emphasizing the excellent fit of the model to data. Inferred parameters were not significantly affected by tumour cell content of samples (Supplementary Fig. 8) or exclusion of copy number alteration-altered regions (Extended Data Fig. 5c–e and Supplementary Fig. 9). Most parameters were also insensitive to the number of fCpGs excluded (Extended Data Fig. 5f,g), except the effective population size (Extended Data Fig. 5h).

Paediatric ALL and adult lymphoid neoplasms exhibited markedly different evolutionary histories (Fig. 3b). ALLs demonstrated much higher growth rates (θ; Extended Data Fig. 6a; P = 9.3 × 10^−306, Mann–Whitney U (MWU) test, Holm–Sidak correction), smaller effective population sizes (Ne; Extended Data Fig. 6b; P = 8.1 × 10^−25) and shorter times since the MRCA (Extended Data Fig. 6c; P = 6.0 × 10^−306) than other lymphoid malignancies. T-ALL grew faster than B-ALL (P = 0.0017, Holm–Sidak correction) and showed more homogeneous growth rates (P = 0.00044, Levene test). In adult cancers, MBL (a precursor to CLL) displayed lower growth rates and longer time since the MRCA than CLL (Extended Data Fig. 6a,c; P = 9.7 × 10^−10 and P = 9.9 × 10^−13, respectively). DLBCL notably had the largest Ne despite comparable growth rates.

fCpG switching rates varied significantly across diseases, with paediatric ALLs showing much faster switching than adult malignancies (Fig. 3c and Extended Data Fig. 6d). Paediatric ALLs also demonstrated a strong positive correlation between fCpG switching rate and growth rate (Extended Data Fig. 6e; P = 2.4 × 10^−98 and R^2 = 0.44 in B-ALL and P = 5.9 × 10^−6 and R^2 = 0.22 in T-ALL) and a strong negative correlation with patient age (Extended Data Fig. 6f–k; P = 3.3 × 10^−137 and R^2 = 0.56 in B-ALL and P = 3.6 × 10^−18 and R^2 = 0.6 in T-ALL). These findings suggest that fCpG (de)methylation rates are decreased in adult lymphoid cancers.

In CLL, we estimated the ‘contemporary’ growth rate derived from multiple longitudinal clinical measurements of the lymphocyte count preceding treatment. The EVOFLUx inferred growth rate, which represents the rate of the initial clonal expansion of the disease, was moderately correlated with the contemporary growth rate (P = 2 × 10^−5, R = 0.27; Extended Data Fig. 6l and Supplementary Fig. 10).
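One simple way to turn serial lymphocyte counts into a growth rate is a straight-line fit to the logarithm of the counts over time. The sketch below assumes exponential growth and ordinary least squares; the fitting procedure used in the study is not described in this section, and the function name and data are hypothetical.

```python
import numpy as np


def exponential_growth_rate(times_years, counts):
    """Slope of log(count) against time, i.e. the exponential growth rate per year."""
    times = np.asarray(times_years, dtype=float)
    log_counts = np.log(np.asarray(counts, dtype=float))
    slope, _intercept = np.polyfit(times, log_counts, deg=1)
    return float(slope)


# Hypothetical counts generated from noise-free growth at 0.4 per year.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
counts = [10.0 * np.exp(0.4 * t) for t in times]
rate = exponential_growth_rate(times, counts)
print(f"estimated growth rate: {rate:.2f} per year")
```

With real clinical counts the fit would be noisy, and at least two measurements at distinct times are needed for the slope to be defined.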

Evolution varies by cancer subtype

We examined how cancer evolutionary history related to molecular subtypes. In B-ALL, MLL-rearranged cases had a significantly higher growth rate (Fig. 3d; P = 1.3 × 10^−13, MWU, 44.3 ± 6.1 versus 11.7 ± 0.2 per year (mean ± standard error)), but lower effective population size Ne (P = 3.7 × 10^−7, 1.8 × 10^5 ± 0.2 × 10^5 versus 3.0 × 10^5 ± 0.06 × 10^5 cells) than the other subtypes, consistent with their distinct clinical behaviour37. In MCL, the generally more indolent leukaemic non-nodal MCL26 had a lower growth rate (Fig. 3e; P = 1.1 × 10^−3, 1.7 ± 0.1 versus 2.1 ± 0.1 per year) and Ne (P = 7.4 × 10^−5, 4.7 × 10^5 ± 1.4 × 10^5 versus 1.5 × 10^6 ± 0.2 × 10^6 cells) than the more aggressive conventional MCL. In DLBCL transcriptomic subtypes38, there were no significant differences, probably due to the smaller number of cases and the lower sample purity (Supplementary Fig. 11).

In CLL, two major molecular subtypes are defined based on the extent of somatic hypermutation in the heavy-chain variable region of the IG gene (IGHV): unmutated CLL (U-CLL) and mutated CLL (M-CLL). The more aggressive U-CLL subtype39,40 showed significantly higher growth rates (Fig. 3f; P = 1.3 × 10^−32, 2.3 ± 0.04 versus 1.8 ± 0.02 per year) and larger Ne (P = 2.1 × 10^−22, 7.2 × 10^5 ± 0.3 × 10^5 versus 4.1 × 10^5 ± 0.3 × 10^5 cells) than M-CLL, independent of tumour purity (Supplementary Fig. 12). Similar results were obtained when analysing its precursor condition MBL (Supplementary Fig. 12).

Patients with mutations in specific driver genes, such as TP53, are well known to have a worse prognosis41. We compared the inferred growth rates and effective population sizes accounting for IGHV status for the most prevalent driver genetic alterations in CLL: TP53, SF3B1, NOTCH1, ATM, POT1 and IGLV3-21R110, del(11)(q22.3), del(13)(q14.3), del(17)(p13.1) and trisomy 12 (Fig. 3g and Extended Data Fig. 7). Patients with M-CLL with TP53 mutations had a higher growth rate and effective population size (P = 0.030 and P = 0.036, respectively, MWU test, false discovery rate corrected; Fig. 3g). Trisomy 12 was associated with increased effective population size in both U-CLL and M-CLL (P = 0.036 and P = 0.036, respectively), but no difference in growth rate.

Most lymphoid cancers grow effectively-neutrally

As new advantageous subclones can arise during cancer evolution, we also examined subclonal architecture in our cohort. In CLL, a small fraction of cases presents two or more clones with independent origins42. In genetic data, subclonal, independent and monoclonal (Extended Data Fig. 8a) architectures are distinguishable by characteristic patterning of mutation allele frequencies43,44. Similarly, in fCpG data, clonal architectures are depicted by additional intermediate peaks in the methylation distribution (Fig. 2d,e). Simulations showed that subclone inference by EVOFLUx was limited to detecting only strongly selected subclones arising at an intermediate timepoint in the history of the tumour (Supplementary Fig. 13 and Supplementary Information), for reasons analogous to limitations of subclone detection in genetic sequencing data45. We describe the evolution in tumours without detectable subclones as effectively-neutral.

Applying EVOFLUx in our cohort revealed that most cancers (1,610 of 1,976) showed no evidence of either subclonal or independent clones (Extended Data Fig. 8b and Supplementary Table 12). The frequency of subclone detection varied considerably between cancer types, ranging from over 30% in CLL (232 of 718) to less than 5% in DLBCL (1 of 57).

We verified EVOFLUx inferences with matched whole-exome sequencing (WES) data from 425 CLL cases (Supplementary Table 13). Using the MOBSTER subclonal deconvolution tool43, subclones were detected in 78 of 425 cancers (Supplementary Table 14), and these cancers had significantly higher EVOFLUx subclonality weights (P = 2.0 × 10^−4, MWU test; Extended Data Fig. 8c). MOBSTER was more likely to detect subclones in cancers with more mutations (Extended Data Fig. 8d), suggesting limited power to detect subclones in WES. We therefore obtained matched whole-genome sequencing (WGS) data for 127 CLL samples (Supplementary Table 15) and observed better agreement between EVOFLUx and MOBSTER subclone calls (P = 3.9 × 10^−4; Extended Data Fig. 8e and Supplementary Table 16), in which MOBSTER subclone calls were then independent of single-nucleotide variant count (P > 0.05). A classifier to predict WGS subclone architecture using EVOFLUx outperformed a WES-based classifier (area under the curve (AUC) = 0.73 versus AUC = 0.62) and performed equivalently to a classifier using both EVOFLUx and WES (AUC = 0.74; Extended Data Fig. 8f). Hence, EVOFLUx was more effective at detecting ongoing subclonal selection than MOBSTER applied to WES data.
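For readers less familiar with the AUC, it can be read as the probability that a randomly chosen subclonal case receives a higher score than a randomly chosen non-subclonal case. The sketch below computes it by direct pairwise comparison; the scores and labels are invented for illustration, and this is not necessarily how the AUC values above were computed.

```python
import numpy as np


def auc_from_scores(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


# Hypothetical subclonality weights and reference subclone calls (1 = subclone present).
weights = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
calls = [1, 1, 0, 1, 0, 0]
auc = auc_from_scores(weights, calls)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.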

CLLs with two independent clonal origins were detected in 22 of 718 cases. Validation through comparing IG gene rearrangements from WES or WGS and RNA sequencing46 (Supplementary Table 17) showed that patients with multiple IG gene rearrangements had elevated independent origin model weightings (Extended Data Fig. 8g; P = 0.028).

fCpGs record clonal dynamics over time

Some patients with CLL undergo Richter transformation, the emergence of an aggressive phenotype with dismal prognosis. We assembled longitudinal matched WGS and methylation data27 for two patients with CLL developing Richter transformation followed for 19.5 and 14.5 years, respectively. WGS data provided ground-truth measurement of clonal evolution during the decades of longitudinal follow-up, which we contrasted with clonal inference from methylation data (Fig. 4a,b and Supplementary Table 18).

Fig. 4: fCpGs allow for phylogenetic reconstruction of longitudinal lymphoid cancer samples.

a,b, Timelines and Fisher plots derived from WGS of two patients with CLL with longitudinal samples, annotated with treatment received (top). The Richter-transformed clone is shown in puce. Scatter plots showing the marginal fCpG methylation distribution between pairwise samples from above (middle). The points are coloured according to the difference in methylation between the first and the subsequent timepoints. Reconstructed phylogenies of the relationship between samples, annotated with the sample timepoints (bottom). The black triangles represent the time that occurred since the MRCA (T − τ), taken as the posterior median from the single-sample EVOFLUx inferences. The methylation fraction of the 978 fCpG loci are presented as heatmaps (0% blue and 100% red). Benda.-Obi, bendamustine, obinutuzumab; CLB, chlorambucil; CLB-R, chlorambucil and rituximab; CP, cyclophosphamide and prednisone; R-CVP, rituximab, cyclophosphamide, vincristine and prednisone; RFCM, rituximab, fludarabine, cyclophosphamide and mitoxantrone. c, An example of longitudinal samples (SW-BCP-ALL-375) showing the development of the fCpG distribution from diagnostic B-ALL, through remission and relapse. d, Scatter plot showing the marginal fCpG methylation distribution between diagnosis and relapse from panel c.

In CLL case 12, between T1 and T2 (13.1 years), WGS data showed that a subclonal expansion occurred, and this was mirrored in fCpGs (Fig. 4a). In the short (1 month) period between T2 and T3, the patient received ibrutinib treatment, but no clonal expansion was detected in either WGS or methylation data. The patient then received rituximab, cyclophosphamide, vincristine and prednisone (R-CVP) combination treatment, but by T4 (5.6 months after T3) presented a clinical manifestation of Richter transformation, and WGS showed that the Richter transformation clone had expanded to form 77% of the tumour. This very large clonal expansion was clearly evident in fCpG methylation data.

In CLL case 19, the initial samples at T1–T4 spanned a period of 7.1 years. WGS showed gradual expansion of a subclone that was mirrored in fCpG methylation data (Fig. 4b). At T5 (5.5 years later), there had been a large nested subclonal expansion detected by WGS that was also evident in fCpG data. At T6 (2.8 years later), the patient was diagnosed with Richter transformation. WGS showed near-fixation of the Richter transformation clone and there was a correspondingly stark signal in fCpG data.

We used EVOFLUx on these longitudinal fCpG methylation data to construct phylogenetic relationships between samples (Methods). In both cases, the phylogenies (Fig. 4a,b) showed that the Richter transformation clone diverged exceptionally early, roughly a decade before the MRCA of the samples containing non-transformed CLL cells (9 and 12 years for cases 12 and 19, respectively). This was consistent with our previous analysis of single-cell RNA sequencing and DNA sequencing that detected Richter transformation cells at low frequencies within the diagnostic CLL sample27, but suggests, remarkably, that the initial Richter transformation divergence occurred well before diagnosis, over 30 years before the clinical presentation of Richter transformation.

We validated the fCpG phylogenetic inferences by comparing the methylation-based trees with phylogenetic trees from matched WGS data27 (Methods; Extended Data Fig. 9a,b). fCpGs exactly recapitulated the WGS tree topology and had highly similar branch lengths, unlike other CpG sets (Supplementary Figs. 14–16). Hence, fCpGs are a high-resolution phylogenetic character.

We also performed phylogenetic analysis on B-ALL cases from diagnosis to relapse (Extended Data Fig. 9c–e). All relapse samples formed a separate clade from the initial diagnostic sample, suggesting a major treatment-induced evolutionary bottleneck. In patients with B-ALL with matched cancer–remission samples, we consistently observed that the W-shape (indicating a clonal expansion) was replaced by a unimodal distribution (indicating no clonal expansion) following successful treatment, with similar variance as normal blood (Extended Data Fig. 9f,g). In two patients with matched diagnosis, remission and relapse samples, we found that the unimodal fCpG distribution during remission was replaced with a W-shaped distribution at relapse similar in shape to the diagnostic sample, due to the clonal expansion driving recurrence (Fig. 4c and Extended Data Fig. 9h). Comparing the fCpG distributions between diagnostic versus relapse samples revealed subclonal evolution through treatment (Fig. 4d and Extended Data Fig. 9i).

Evolutionary history and clinical outcome

To investigate the relationship between the evolutionary history and future clinical trajectory of a cancer, we leveraged a series of 478 CLL cases with well-annotated follow-up data. Using univariate Cox models, we tested the effect of evolutionary parameters on time to first treatment (TTFT), which reflects the natural cancer biology, and then on overall survival, which is a more complex end point as it convolves disease biology with treatment responses.

In univariate analysis, faster growing CLLs had markedly shorter TTFT (Fig. 5a; P = 1.4 × 10−30, hazard ratio (HR) = 3.95) and worse overall survival (P = 0.0053, HR = 1.51). The Ne of a cancer did not have a strong effect on TTFT (P = 0.058, HR = 1.17), but was associated with shorter overall survival (P = 1.3 × 10−4, HR = 1.41). The patient age at the time of the MRCA of the CLL population was highly correlated with the age of the patient (Supplementary Fig. 17), so unsurprisingly, older patients had worse overall survival (P = 2.3 × 10−11, HR = 1.79). The decrease in risk of progression with cancer age (P = 3.8 × 10−17, HR = 0.65), measured by the time since the MRCA, was probably due to confounding with the growth rate, as these parameters were negatively correlated (Supplementary Fig. 17). The epigenetic switching rate parameters were largely uninformative of prognosis.

Fig. 5: The evolutionary history of a tumour is prognostic of clinical outcome.

a, Univariate survival analysis of the TTFT (blue) and overall survival (OS; red) in the discovery CLL cohort for evolutionary variables inferred via EVOFLUx. b, Kaplan–Meier curves comparing the TTFT between patients with high versus low inferred cancer growth rates, separated by IGHV mutational status. c, Multivariate Cox regression model of the TTFT shows that the cancer growth rate is significant when controlling for IGHV status, TP53 alterations and age at sampling. The error bars represent 95% confidence intervals. A log-rank test was used in the Kaplan–Meier curves and Wald tests for Cox models. A Schoenfeld residuals test was used to test proportional hazard assumptions. No multiple comparison adjustments were done.

As the growth rate was different between U-CLL and M-CLL (Fig. 5b), we analysed its prognostic impact within each group separately. Higher growth rates consistently correlated with shorter TTFT in both subtypes (Fig. 5b; P = 1.4 × 10−5 for M-CLL, P = 1.56 × 10−7 for U-CLL and overall P = 2.1 × 10−53). In a multivariate Cox regression model, growth rate maintained a strong independent prognostic impact as a quantitative variable (P = 2.2 × 10−10, HR = 2.28) even when controlling for the IGHV mutational status and TP53 aberrations, as well as age (Fig. 5c). Of note, the effect of TP53 mutations on TTFT appeared to be mediated by increased growth rate. The cancer Ne was more significantly correlated with overall survival than the growth rate, and this effect was preserved in the U-CLL subtype (Extended Data Fig. 10a; P = 0.55 for M-CLL, P = 9.90 × 10−7 for U-CLL and overall P = 4.72 × 10−9) and in the multivariate setting with IGHV status, TP53 aberrations and age at sampling (Extended Data Fig. 10b; P = 0.025, HR = 1.33).

Although the inference of the evolutionary parameters on our initial cohort was wholly blinded to the clinical outcomes, we also validated our findings using a second independent cohort of 209 patients with CLL (135 untreated at sampling)28,29 (Supplementary Table 2). These results verified tumour initial growth rate as a predictor of TTFT (Extended Data Fig. 10c–e). Furthermore, the EVOFLUx-derived growth rate was prognostic even when controlling for the contemporary rate of change of lymphocyte counts (P = 0.018; Extended Data Fig. 10f).

These results demonstrate that the evolutionary parameters inferred from a cost-effective methylation array could have a direct clinical application by helping to predict the clinical behaviour of patients with CLL independently of well-established prognostic variables.

Discussion

Our study establishes a computational framework called EVOFLUx, which enables quantitative measurement of the evolutionary history of human malignancies at massive scale using only widely available and low-cost bulk methylation data as input. Evolutionary histories are fundamentally distinct from characterizations of the contemporary phenotype of cancer cells, such as the fraction of proliferating cells. The EVOFLUx methodology should also work identically for sequencing-based methods to measure methylation such as bisulfite-based47,48 and bisulfite-free approaches32,49 (for example, long-read nanopore), which show an excellent correlation with our array data. In theory, these methods allow for assessment of many more fCpGs, increasing inference accuracy. EVOFLUx should also be applicable to tumour-derived cell-free DNA extracted from blood.

Evolutionary histories are strongly associated with disease phenotype and clinical outcomes across lymphoid disease types. We consider this strong evidence that clonal evolution, the fundamental cellular process of disease development, underlies the clinical course of the disease. Consequently, we expect these results to generalize across all cancer types. We note that genome-wide DNA methylation analyses also measure other important biological features of a cancer (for example, molecular subtype14) that could be combined with EVOFLUx-based inference of evolutionary history to further improve the prognostic value of DNA methylation data.

In summary, we present a cost-effective high-throughput platform for measuring cancer evolutionary dynamics at the population scale in patient samples. These fundamental measurements of the disease biology hold substantial prognostic value and represent an innovative asset in the field of precision oncology.

Methods

Assembly and quality control of DNA methylation data

Using a harmonized pipeline14 (v4.1; see Code availability section), we assembled and processed Illumina methylation array data from 2,430 bulk samples of normal and neoplastic lymphoid cells from previous publications14,21,22,23,24,25,26,27,28,29,30. As healthy control samples, this dataset contained sorted CD19+ B cells (n = 40), CD3+ T cells (n = 35), peripheral blood mononuclear cells (n = 6) and whole-blood samples (n = 6). As tumour samples, we included 797 precursor B-ALLs and 90 T-ALLs at diagnosis, 28 B-ALLs and 2 T-ALLs at relapse, as well as 74 B-ALLs and 12 T-ALLs at complete remission (that is, normal blood); 149 MCLs; 722 CLLs, 55 samples of its precursor condition MBL and 6 samples from patients with CLL undergoing a DLBCL transformation called Richter transformation; 62 primary DLBCLs, not otherwise specified; and 104 multiple myelomas and 16 samples of its precursor condition monoclonal gammopathy of undetermined significance. Raw idat files were loaded and processed with R (v4.3.1) using the minfi package50,51 (v1.46.0) in batches as specified in the column ‘SSNOB_NORMALIZATION_BATCH’ of Supplementary Table 2. In brief, the data were processed for each batch as follows. First, idat files were loaded into an RGChannelSet object and minfi quality metrics were generated using the qcReport function. We removed samples with unexpected distributions of methylation values (that is, distributions markedly distinct from a bimodal distribution centred around β-values of 0 and 1 and/or from the remaining samples) and samples with low signal intensities of internal control probes, including bisulfite conversions I and II, extension hybridization, hybridization, non-polymorphic, specificities I and II, and target removal probes.

Next, further quality metrics were derived using the function minfiQC on the unnormalized RGChannelSet object. Samples with median signal intensities of unmethylated and methylated channels of at least 10.5 in log2 scale were considered as having good signal intensities. Subsequently, detection P values were calculated across all CpGs and samples using the detectionP function for the unnormalized RGChannelSet object. Samples were considered good if they had a mean detection P value across all CpGs of P ≤ 0.01. On a CpG level, we retained CpGs with a detection P ≤ 1 × 10−16 in 90% or more of the samples, which has been shown to improve the quality of downstream analyses52,53. The RGChannelSet object was normalized with the single-sample batch-independent preprocessNoob function with dye bias correction. We next retained only CpGs (excluding CH probes) that contained no SNP in either the interrogated CpG or the probe extension, using the dropMethylationLoci and dropLociWithSnps functions with default options (minor allele frequency (MAF) = 0). Further analyses using long-read nanopore data, Illumina array control probes, annotation packages and a data-driven approach were used to ensure the lack of any genetic confounding in the methylation values of the resulting fCpGs (see the next sections).
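The sample-level and CpG-level thresholds described above can be illustrated with a minimal Python sketch. It is not the minfi-based pipeline used in the study: the function names and the dictionary-based data layout are hypothetical, and applying the CpG filter across a caller-supplied list of samples is an assumption.

```python
from statistics import mean

# Thresholds quoted in the text.
MIN_MEDIAN_LOG2_INTENSITY = 10.5
MAX_MEAN_DETECTION_P = 0.01
CPG_DETECTION_P = 1e-16
MIN_FRACTION_SAMPLES = 0.90


def good_samples(median_meth, median_unmeth, detection_p):
    """Return samples passing the intensity and mean detection-P checks.

    median_meth / median_unmeth: dict sample -> median log2 intensity.
    detection_p: dict sample -> dict cpg -> detection P value.
    (Hypothetical layout, chosen for illustration only.)
    """
    keep = []
    for sample, pvals in detection_p.items():
        intensity_ok = (
            median_meth[sample] >= MIN_MEDIAN_LOG2_INTENSITY
            and median_unmeth[sample] >= MIN_MEDIAN_LOG2_INTENSITY
        )
        detection_ok = mean(pvals.values()) <= MAX_MEAN_DETECTION_P
        if intensity_ok and detection_ok:
            keep.append(sample)
    return keep


def good_cpgs(detection_p, samples):
    """Return CpGs with detection P <= 1e-16 in at least 90% of `samples`."""
    cpgs = next(iter(detection_p.values())).keys()
    keep = []
    for cpg in cpgs:
        n_detected = sum(detection_p[s][cpg] <= CPG_DETECTION_P for s in samples)
        if n_detected >= MIN_FRACTION_SAMPLES * len(samples):
            keep.append(cpg)
    return keep
```

In this sketch a sample must pass both checks to be retained, whereas the CpG filter only counts how many samples detect each probe.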

Furthermore, CpGs with any previous evidence of potential cross-hybridization were excluded54 and only CpGs mapping to autosomal chromosomes were subsequently retained for downstream analyses. Finally, to further confirm the accuracy of the filtering criteria, we checked the distribution of normalized methylation values and performed principal component analyses separately for samples passing all quality checks as well as those considered as bad samples. The final DNA methylation matrix contained 2,204 samples and 389,180 CpGs passing all the aforementioned quality controls, and included 2,054 patients (22 technical replicates, 3 synchronic and 125 longitudinal samples from the same patients)55 (Supplementary Table 2).

To determine the purity of samples, we used our previously described deconvolution strategy to infer tumour cell content by DNA methylation14, which was used as a consensus purity in all the tumour samples except for DLBCL and multiple myeloma. In these two tumour entities, we have previously identified a loss of the DNA methylation signature that causes inaccurate tumour purity predictions from DNA methylation data, and therefore we used available genetic or flow cytometry data for DLBCL and multiple myeloma, respectively.

Pipeline to select fluctuating CpGs

We constructed a pipeline to identify fCpGs in lymphoid tumours, based on the following criteria:

  1. Heterogeneous across different participants with the same disease (by accepting CpG loci with the top 5% of standard deviation of methylation value within a cancer type).

  2. Equally likely to be methylated or unmethylated (by selecting CpGs with average methylation of approximately 0.5 within a cancer type).

  3. Unlikely to be associated with specific cell or cancer types. We used an unsupervised Laplacian score feature selection metric56 to rank CpG loci by their tendency to preserve the nearest-neighbour graph, and accepted the 5% least-informative CpGs.
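As a rough illustration of the first two criteria only, the following Python sketch keeps the most variable CpGs within one cancer type and then requires a mean methylation near 0.5. The function name, the data layout and the 0.4 to 0.6 window are hypothetical (the text says only "approximately 0.5"), and the Laplacian score step is deliberately omitted.

```python
from statistics import mean, pstdev


def candidate_fcpgs(beta, top_fraction=0.05, mean_window=(0.4, 0.6)):
    """Apply criteria (1) and (2) to one cancer type.

    beta: dict cpg -> list of methylation values across samples.
    top_fraction: fraction of CpGs kept by standard deviation (5% in the text).
    mean_window: assumed tolerance around 0.5; not a published threshold.
    """
    sds = {cpg: pstdev(values) for cpg, values in beta.items()}
    n_keep = max(1, round(top_fraction * len(sds)))
    # Criterion 1: highest between-participant standard deviation.
    most_variable = sorted(sds, key=sds.get, reverse=True)[:n_keep]
    # Criterion 2: average methylation close to 0.5.
    lo, hi = mean_window
    return [cpg for cpg in most_variable if lo <= mean(beta[cpg]) <= hi]
```

Whether the criteria are applied sequentially, as here, or intersected after being computed independently is also an assumption of the sketch.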

Exclusion of genetic confounding on fCpGs

We performed a series of analyses to exclude potential genetic confounding (germline SNPs and somatic SNVs) on our fCpGs. We first excluded the possibility that common germline SNPs caused methylation heterogeneity at fCpG sites between individuals. We observed very distinct methylation dynamics of array control probes containing SNPs (which had been removed during the initial array processing) versus fCpGs. SNP probes showed the same distribution in all samples (Extended Data Fig. 2c), including longitudinally followed cases (Supplementary Fig. 3), whereas fCpGs only showed a W distribution in cancer samples with ongoing fluctuations over time. Thus, although SNPs reflect the stable genetic identity of the individual, fCpGs reflect the identity of a single cell and its evolving lineage. In addition, we used the packages SNPlocs.Hsapiens.dbSNP155.GRCh38 (v0.99.24) and MafH5.gnomAD.v4.0.GRCh38 (v3.19) to check for any known significant germline or somatic genetic confounding on the resulting 978 fCpGs. We found approximately 60% of fCpGs reported in the gnomAD v4 database (with the array background having approximately 65%), but with a very low MAF (median of 1 × 10−5 and mean of 1 × 10−3). To exclude the possibility of unknown or very rare genetic confounding, we used the data-driven gaphunting algorithm57 available in the minfi R package, which further ruled out possible cancer-specific single-nucleotide variations (SNVs) that could confound the methylation values at the 978 identified fCpGs. Finally, Oxford Nanopore long-read sequencing of a subset of normal and neoplastic samples further validated that fCpGs represent methylated or demethylated cytosines (Extended Data Fig. 2d,e; see next section).

Generation and analyses of long-read nanopore data

For long-read methylation sequencing in CLL and Richter transformation samples, concentration was assessed using the Qubit assay and DNA integrity was analysed either with the Femto Pulse System (Agilent) or the Fragment Analyzer (Agilent). When more than 6 µg of material with good integrity was available, DNA was additionally treated with the Short Fragment Eliminator Kit XS (PacBio) and eluted in EB buffer. Approximately 4 µg of DNA was used for library preparation according to the standard LSK114 kit and protocol from Oxford Nanopore. The time for DNA repair and end-prep was increased up to 30 min at 20 °C and 30 min at 65 °C. Adapter ligation was performed for 1 h at room temperature. All elutions were performed at 37 °C for 1.5 h, and 550–600 ng of DNA was loaded onto FLO-PRO114M flow cells (CLL cells). Flow cells were washed (EXP-WSH004) after 1–2 days if the pore count decreased to less than 30%. A total of 1–4 washes were performed for each flow cell. Flow cells were run for 100 h in total (CLL cells) with the Fast model (MinKNOW 23.11.7, Dorado 7.2.13). The raw data were rebasecalled using dorado duplex (v0.5.3), applying the SUP model with modified base calling to detect 5mC and 5hmC (model dna_r10.4.1_e8.2_400bps_sup@v4.3.0_5mCG_5hmCG@v1).

In normal B cell samples, 1–3 µg of DNA was used for WGS. Libraries were prepared with the DNA ligation kit LSK110 with no modifications. Libraries were loaded onto a flow cell version FLO-PRO002 (R9.4) and were run for 90–110 h. The basecalling was performed on live mode with the Guppy basecaller (v6.2.7), included in the MinKNOW (v22.08.6), using the SUP model for base modification detection of 5mC and 5hmC (dna_r9.4.1_450bps_modbases_5hmc_5mc_cg_sup.cfg).

In all samples, the unmapped BAM files generated by basecalling were converted to FASTQ files using the SAMtools fastq -T Mm,Ml command. The FASTQ files were then mapped to BAM files using the command minimap2 -ax map-ont -y ../GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.mmi. The methylation values were extracted from the BAMs into bedMethyl files using the in-house tool bam2bedmethyl (v0.3.2) and compressed/indexed using bgzip/tabix. Reads from each strand were combined to generate DNA matrices for each CpG and were used for obtaining the methylation values of all fCpGs.

In addition, mini BAM files containing all reads from the 976 fCpGs were generated (in hg38 genome assembly). The reads showed excellent mappability, with a mean of perfect nucleotide matches (NM tag; Levenshtein distance) for all fCpGs across samples of 96.41% (range of 73.31–97.90), and mean mapping quality (MAPQ) of all the reads covering all fCpGs across samples of 59.510 (range of 2–60). Subsequently, long reads were phased using variants called using Clair 3 (v1.0.9, model r941_prom_hac_g360 + g422)58 with the Longphase package (v1.7)59. The methylation status of each CpG was called using the modcall function within the Longphase package. At fCpGs, only 2.7% of the reads were non-canonical bases (Extended Data Fig. 2d). The variant allele frequency (VAF) of these mutations tended to be low and was negatively correlated with the coverage at that site (Supplementary Fig. 4a). Hence, the majority of these non-canonical base pairs are probably due to errors in nucleotide assignment. There is also no association between the methylation status of different reads and the variants present within a 50-bp window of each fCpG locus (Supplementary Fig. 4b). Hence, assessment of fCpG methylation via bead array was not majorly confounded by miscalled variants. The fCpG methylation patterns seen in the bead array data were replicated in the long-read data (Extended Data Fig. 2e) and the correlation between the fraction methylated measured via bead array and long-read sequencing at fCpGs was excellent (Extended Data Fig. 2e). The same correspondence was observed in WGBS data (Extended Data Fig. 2f).

To assess the intra-sample long-read diversity for each sample, the pairwise Hamming distances were calculated between every read on both haplotypes. The two lists of Hamming distances were concatenated, and the mean calculated as a summary statistic of the read diversity for each sample. One normal B cell sample contained only two reads from one haplotype, and zero from the other, and so was excluded from further analysis.
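The read-diversity summary described above can be sketched in Python as follows. The sketch assumes that each read is encoded as an equal-length sequence of methylation calls at the same fCpGs with no missing values, which is a simplification; the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def hamming(read_a, read_b):
    """Number of positions at which two equal-length reads differ."""
    return sum(a != b for a, b in zip(read_a, read_b))


def read_diversity(hap1_reads, hap2_reads):
    """Mean pairwise Hamming distance, computed within each haplotype.

    Distances are calculated between every pair of reads on the same
    haplotype, the two lists are concatenated and the mean is returned.
    """
    distances = []
    for reads in (hap1_reads, hap2_reads):
        distances.extend(hamming(a, b) for a, b in combinations(reads, 2))
    return mean(distances)
```

For example, three reads on one haplotype contribute three pairwise distances and two reads on the other contribute one, so the summary is the mean of four values.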

Analysis of scRRBS data

Previously published single-cell reduced representation bisulfite sequencing (scRRBS) data were obtained6 and the fCpG methylation values were extracted for normal B cells from 6 donors and CLL cells from 12 patients. There was a high dropout rate, so to extract meaningful patterns we plotted a subset of 40 cells and 20 fCpGs with a high density and overlap of fCpGs across single cells as examples (Supplementary Fig. 5a,b).

To compare the full set of data accounting for the high degree of missing data, we used a metric of heterogeneity at a given fCpG that weights by the number of non-missing fCpGs according to:

$${d}_{i}=\sqrt{\frac{{n}_{i}({n}_{i}-1)}{2}}\sigma ({\beta }_{i})$$

where \({n}_{i}\) is the number of non-NaN values for the ith fCpG, \(\frac{{n}_{i}({n}_{i}-1)}{2}\) is the total number of possible pairwise comparisons among a set of \({n}_{i}\) objects and \(\sigma ({\beta }_{i})\) is the standard deviation across the methylation values of the ith fCpG (Supplementary Fig. 5c).
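The heterogeneity metric above translates directly into a few lines of Python. This is a minimal sketch: the function name is hypothetical, missing values are assumed to be encoded as None or NaN, and the use of the population standard deviation is an assumption because the text does not state which estimator was used.

```python
import math
from statistics import pstdev


def heterogeneity(beta_values):
    """d_i = sqrt(n_i (n_i - 1) / 2) * sd(beta_i) for one fCpG.

    beta_values: methylation values across single cells, with missing
    entries given as None or NaN (assumed encoding).
    """
    observed = [b for b in beta_values if b is not None and not math.isnan(b)]
    n = len(observed)
    if n < 2:
        # No pairwise comparison is possible with fewer than two cells.
        return 0.0
    return math.sqrt(n * (n - 1) / 2) * pstdev(observed)
```

An fCpG observed in four cells with values 0, 1, 0 and 1 has six pairwise comparisons and a standard deviation of 0.5, so the weighting favours sites with many non-missing cells.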

Characterization and annotation of fCpGs

To characterize the genomic and regulatory context of fCpGs, we used a series of statistical analyses and database annotations. We annotated fCpGs using the Illumina manifest and other genomic annotation packages available at Bioconductor, including IlluminaHumanMethylation450kanno.ilmn12.hg19 (v0.6.1) and IlluminaHumanMethylationEPICanno.ilm10b2.hg19 (v0.6.0). We additionally used the packages SNPlocs.Hsapiens.dbSNP155.GRCh38 (v0.99.24) and MafH5.gnomAD.v4.0.GRCh38 (v3.19) to check for any possible germline or somatic genetic confounding on the resulting 978 fCpGs. We found approximately 60% of fCpGs reported in the gnomAD v4 database (with the array background having approximately 65%), but with a very low MAF (median of 1 × 10−5 and mean of 1 × 10−3). In addition, we used the Illumina 450k and EPIC array internal SNP probes and showed that their methylation dynamics were markedly distinct from those of fCpGs in single-timepoint (Extended Data Fig. 2c) and longitudinal (Supplementary Fig. 3) samples. Finally, the data-driven gaphunting algorithm available in the minfi R package was applied with all the previously published thresholds and cut-offs57, which further ruled out possible cancer-specific SNVs that could confound the methylation values at the 978 identified fCpGs.

We used Chi-squared tests to assess the enrichment of fCpGs in distinct genomic regions or elements. We performed gene-set enrichment analysis on the fCpG-associated genes using gProfiler60, specifically focusing on the Gene Ontology biological processes61 and the Human Protein Atlas62. The statistical domain space was limited to genes targeted by at least one CpG in the 389,180 candidate CpG set and significance was determined using the g:SCS algorithm63. Previous chromatin segmentation of normal and neoplastic B cells was used to assess the chromatin-state enrichment of fCpG14,64.

fCpGs were checked for their overlap with previous ‘epigenetic clocks’, including mitotic14,65,66,67,68, chronological age69,70,71,72,73,74,75,76,77,78, gestational age79,80,81,82,83, biological age and mortality84,85,86 and trait predictors87,88. The package methylCIPHER (https://github.com/MorganLevineLab/methylCIPHER) was used to obtain the CpGs for most of the epigenetic clocks. The package methylclock (v1.10.0) was used to calculate all epigenetic clocks but epiCMIT, which was derived as previously described14.

CLL RNA sequencing data

Previously available RNA sequencing data for 294 patients with CLL were obtained33 and processed as previously described26. Matched RNA sequencing data and DNA methylation data for the same patients at the same timepoint were available for 224 patients with CLL. Transcript per million counts were used to represent differential gene expression values across genes and samples. We used the gene annotation provided in the R Bioconductor package IlluminaHumanMethylationEPICanno.ilm10b2.hg19 to classify genes associated with fCpGs. Genes targeted by any fCpG were considered as ‘fCpG genes’.

In each methylation sample, the 978 fCpGs were discretized as homozygous demethylated, heterozygous methylated or homozygous methylated (coded as [0,1,2], respectively). This was done by separately fitting a β-mixture model with three components to each sample using Stan89 and extracting the component mixture probability. The gene expression values for genes classified as having an fCpG with 0, 1 or 2 alleles methylated were plotted as previously described.

DNA methylation data from normal blood samples

External DNA methylation data were downloaded from the Gene Expression Omnibus database using the GEOquery R package (v2.72.0). For sorted immune cells, these include GSE137594 and GSE184269. For whole-blood samples, these include GSE72773, GSE55763, GSE40279 and GSE36054. Data were analysed with the normalization procedure used in each study together with the metadata provided. Mean and standard deviation for fCpGs were calculated with fCpGs present in the provided normalized matrices.

A stochastic model of fCpGs in a growing population

We built a generative computational model of how the patterns of fCpGs vary over time (t) according to the evolutionary history of a cancer. Initially, our model focused on neutral evolution, before expanding to non-neutral modes of tumour evolution below. For the full explanation of the model, see the Supplementary Information.

Our model was parameterized in terms of the age of the patient at which the MRCA emerged (τ), the exponential growth rate of the cancer (θ) and the epigenetic switching rates of the fCpGs (μ, ν, γ and ζ). The model was partitioned into two phases: before and after the emergence of the MRCA. At time t = 0, the fCpGs were assumed to be equally likely to be homozygously methylated or demethylated. The fCpG status of the MRCA at time t = τ was calculated by applying matrix exponentiation.
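The matrix-exponentiation step of the first phase can be sketched as follows. The layout of the rate matrix is an assumption made for illustration (the full model is specified in the Supplementary Information): states are ordered as homozygous demethylated, heterozygous and homozygous methylated, with μ and γ applied per allele, so homozygous states leave at rates 2μ and 2γ, and ν and ζ governing the transitions out of the heterozygous state. The helper names are hypothetical and the matrix exponential is a simple scaling-and-squaring routine written with the standard library.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=12):
    """exp(q * t) via scaling and squaring with a truncated Taylor series."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[r + x for r, x in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def rate_matrix(mu, nu, gamma, zeta):
    """Assumed generator over states (w, k, m); rows sum to zero."""
    return [
        [-2 * mu, 2 * mu, 0.0],        # w -> k
        [zeta, -(zeta + nu), nu],      # k -> w or k -> m
        [0.0, 2 * gamma, -2 * gamma],  # m -> k
    ]


def mrca_state_probabilities(tau, mu, nu, gamma, zeta):
    """State probabilities (w, k, m) of one fCpG in the MRCA at age tau."""
    p0 = [0.5, 0.0, 0.5]  # equally likely homozygous states at t = 0
    transition = mat_exp(rate_matrix(mu, nu, gamma, zeta), tau)
    return [sum(p0[i] * transition[i][j] for i in range(3)) for j in range(3)]
```

Under this assumed generator with all four rates equal, the probabilities relax towards (0.25, 0.5, 0.25) as τ grows, which corresponds to an average methylation of 0.5.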

The second phase of the model consisted of a discrete time Markov process. The effective population size of the growing cancer was modelled as growing according to a deterministic exponential growth equation, Ne = eθ(T − τ). Each fCpG was considered independently; at each time step, t → t + δt, the number of homozygous-methylated (m), heterozygous-methylated (k) and homozygous-demethylated cells (w) at a specific fCpG was updated according to the epigenetic switching rates.

At the time of sample, T, the fraction methylation of each simulated fCpG was calculated by summing the number of methylated alleles and normalizing by the total number of alleles in the population:

$${\beta }_{c}=\frac{k+2m}{2{N}_{e}}$$

We further accounted for contaminating normal cells and the technical noise introduced by the methylation bead array. The methylation of the contaminated samples was assumed to be an average of the cancer methylation, βc(t), weighted by the tumour purity ρ, and the average of the normal population, βn, weighted by 1 − ρ. Following our previous work, the bead array was assumed to saturate at extreme methylation values, shifting the minimum and maximum methylation by δ and ε, respectively4. The noise of the bead array was assumed to be β-distributed, with precision parameter κ.
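The observation model described above can be sketched in Python. Two parts are assumptions rather than statements from the text: the saturation is implemented as a linear rescaling of the interval [0, 1] onto [δ, ε], and the β-distributed noise uses a mean and precision parameterization, Beta(κm, κ(1 − m)). Function names are hypothetical.

```python
import random


def cancer_fraction_methylated(k, m, n_e):
    """beta_c = (k + 2m) / (2 N_e): methylated alleles over all alleles."""
    return (k + 2 * m) / (2 * n_e)


def expected_array_beta(beta_c, beta_n, rho, delta, eps):
    """Mix cancer and normal methylation by purity, then apply saturation.

    The linear map onto [delta, eps] is an assumed form of the offset.
    """
    mixed = rho * beta_c + (1 - rho) * beta_n
    return delta + (eps - delta) * mixed


def noisy_array_beta(mean_beta, kappa, rng=random):
    """Draw one noisy measurement with mean `mean_beta` and precision kappa."""
    return rng.betavariate(kappa * mean_beta, kappa * (1 - mean_beta))
```

As a usage example, a fully methylated clonal site (βc = 1) in a sample of purity 0.8 with βn = 0.5, δ = 0.05 and ε = 0.95 has an expected array value of 0.86 before noise is added.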

Non-neutral models of tumour evolution

Alongside our model of neutral exponentially growing cancer populations, we devised two alternative models of cancer growth:

  1. A subclonal selection model in which a single cell within the cancer develops a selective advantage and begins to grow at an increased growth rate.

  2. An independent clonal origins model, in which a patient has developed two distinct cancers concurrently.

For the subclonal selection model, we replaced the growth rate (θ) and the time of the MRCA (τ) with the growth rates and time of the MRCA of the initial, slower-growing population (θ1 and τ1, respectively), and that of the more recently emerging, faster-growing population (θ2 and τ2), constraining τ1 < τ2 and θ1 < θ2 (Extended Data Fig. 8a). We assumed that the initial cancer population began exponentially growing at τ1 as above, but at time t = τ2, we selected a single cell with a set of fCpG states drawn according to the cancer population and allowed this second population to grow concurrently with a growth rate θ2.

The independent-cancer model followed the same scheme as the nested subclonal selection model, except the methylation status of the emerging cancer was that of an independent cell that experienced random fluctuations between t = 0 and t = τ2.

If we let the number of cells in the less fit subclone in each methylation state be {m1, k1, w1} and in the fitter subclone be {m2, k2, w2}, following the convention above, then in both cases the measured methylation patterns at the time of sample are:

$${\beta }_{c}(T)=\frac{{k}_{1}(T)+2{m}_{1}(T)+{k}_{2}(T)+2{m}_{2}(T)}{2{N}_{e}(T)}$$

where \({N}_{e}(T)={e}^{{\theta }_{1}(T-{\tau }_{1})}+{e}^{{\theta }_{2}(T-{\tau }_{2})}\).

Adaption of simulations to a longitudinal setting

We modified the simulations of how the fCpG methylation distribution changes over time to allow for multiple sequential sample collections. These simulations allow for neutral, independent clones, a single subclonal expansion or two subclonal expansions, which can either be nested or emerge from the clonal trunk in parallel. This required pre-specification of sampling times, along with the emergence times of any subclones or independent clones, which we collected to form a set of ‘landmark times’. The discrete time steps of the simulation were split into phases between the landmark times, which evolved according to the discrete time Markov process outlined above. At each sampling time, the fCpG methylation fraction was calculated as above and stored as a column in the output matrix.

Prior functions

For each methylation array blood sample, we had matched age (T) and purity (ρ) information. Hence, the parameters to be inferred are the growth rate (θ), the age of the patient when the MRCA emerged (τ), the epigenetic switching rates (μ, ν, γ, ζ), the average fraction methylated of contaminating normal cells (βn), the β-offsets from 0 and 1 due to the background noise on the methylation array (δ and ε, respectively) and the precision of the β-distributed noise (κ).

These parameters are constrained either to be positive (θ, μ, ν, γ, ζ, κ > 0) or to lie within a specified range (0 < τ/T, δ, ε < 1), which we achieved using appropriate prior distributions. To better allow for priors to be set on a biologically meaningful scale, the priors for the log-normal distribution were set in terms of the real scale mean and standard deviation, rather than the standard log-scale. To reduce correlations in the posterior and make sampling more efficient, the variables ν and ζ were normalized by μ and γ, respectively.

The priors are as follows:

$$\theta \sim {\rm{lognormal}}(3,2)$$
$$\frac{\tau }{T} \sim {\rm{beta}}(2,2)$$
$$\mu \sim {\rm{halfnormal}}(0,0.05)$$
$$\gamma \sim {\rm{halfnormal}}(0,0.05)$$
$$\frac{\nu }{\mu } \sim {\rm{lognormal}}(1,0.7)$$
$$\frac{\zeta }{\gamma } \sim {\rm{lognormal}}(1,0.7)$$
$${\beta }_{n} \sim {\rm{beta}}(2,2)$$
$$\delta \sim {\rm{beta}}(5,95)$$
$${\epsilon } \sim {\rm{beta}}(95,5)$$
$$\kappa \sim {\rm{halfnormal}}(100,30)$$
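Because the log-normal priors are stated in terms of the real-scale mean and standard deviation, they must be converted to log-scale parameters before sampling. The conversion is the standard moment-matching identity for the log-normal distribution; the function names below are hypothetical.

```python
import math
import random


def lognormal_log_params(mean, sd):
    """Convert a real-scale mean and sd to log-scale (mu, sigma)."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2
    return mu, math.sqrt(sigma2)


def sample_lognormal(mean, sd, rng=random):
    """Draw from a log-normal with the given real-scale mean and sd."""
    mu, sigma = lognormal_log_params(mean, sd)
    return rng.lognormvariate(mu, sigma)
```

For the growth-rate prior, lognormal(3, 2) in this convention corresponds to a distribution whose mean is 3 and whose standard deviation is 2 on the original scale.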

When fitting non-neutral models of tumour growth, the inference was parameterized in terms of the relative growth of the fitter subclone, \({\tilde{\theta }}_{2}=\frac{{\theta }_{2}}{{\theta }_{1}}\), and the fraction of the population consisting of the fitter subclone, \(f=\frac{{e}^{{\theta }_{2}(t-{\tau }_{2})}}{{e}^{{\theta }_{1}(t-{\tau }_{1})}+{e}^{{\theta }_{2}(t-{\tau }_{2})}}\). Evaluating f at the time of sampling (t = T), the age at which the second clone emerges is then:

$${\tau }_{2}=T-\frac{(T-{\tau }_{1}){\theta }_{1}}{{\theta }_{2}}-\frac{{\rm{logit}}(f)}{{\theta }_{2}}$$

This parameterization induces less correlation in the resulting posterior, which greatly improves the sampling efficiency. The priors on these additional parameters are:

$$\frac{{\tau }_{1}}{T} \sim {\rm{beta}}(2,2)$$
$${\widetilde{\theta }}_{2} \sim {\rm{lognormal}}(1,0.7)$$
$$f \sim {\rm{beta}}(2,2)$$

All the other priors were the same as in the neutral case.
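The reparameterization of the subclonal model can be checked numerically. The sketch below computes τ2 from the relative growth rate and the subclone fraction using the expression given above, and then recovers the fraction from the two exponential population sizes at the time of sampling. Function and argument names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def tau2_from_fraction(T, tau1, theta1, rel_theta2, f):
    """Age at which the fitter subclone emerges, given f at sampling age T."""
    theta2 = rel_theta2 * theta1
    return T - (T - tau1) * theta1 / theta2 - logit(f) / theta2


def subclone_fraction(T, tau1, theta1, tau2, theta2):
    """Fraction of the population made up by the fitter subclone at age T."""
    n1 = math.exp(theta1 * (T - tau1))
    n2 = math.exp(theta2 * (T - tau2))
    return n2 / (n1 + n2)
```

The round trip works because logit(f) equals θ2(T − τ2) − θ1(T − τ1), which rearranges to the expression for τ2.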

Bayesian inference

We developed a stochastic estimator of the log-likelihood function at a given set of parameters by simulating the fCpG methylation distribution a large number of times, correcting for the bias inherent in using a finite number of simulations and penalizing the log-likelihood for extreme values of the Ne (see Supplementary Information for details).

The standard Bayesian algorithms developed to infer the posterior for a given set of data (for example, Markov chain Monte Carlo (MCMC), nested sampling) are typically used when the log-likelihood is analytically tractable and can be calculated exactly. It has been shown that, as long as the stochastic approximation of the log-likelihood is unbiased, MCMC methods can obtain an exact Bayesian inference of the true posterior, as in pseudo-marginal Metropolis–Hastings90.

Here we used a nested sampling approach using the dynesty package91,92,93. Unlike pseudo-marginal Metropolis–Hastings, nested sampling is able to efficiently explore multimodal posterior landscapes (which can occur under the subclonal and independent cancer models).

Model selection for the mode of tumour evolution

We used an expected log pointwise predictive density94 approach to compare our competing models of evolution for each sample using the arviz Python package95, which uses PSIS-LOO-CV to compare the out-of-sample prediction accuracy between models while naturally penalizing more complex models. This required the log-likelihood per data point and the posterior predictive for every point in the posterior. The weights of the respective models were calculated using pseudo-Bayesian model averaging using Akaike-type weighting, stabilized using the Bayesian bootstrap96.
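The Akaike-type weighting step can be illustrated with a short sketch. It converts one expected log pointwise predictive density (elpd) value per model into normalized weights, and it deliberately omits the Bayesian bootstrap stabilization and the PSIS-LOO-CV estimation of the elpd itself, both of which the study delegated to arviz. The function name and the example values are hypothetical.

```python
import math


def pseudo_bma_weights(elpd):
    """Akaike-type weights from a dict of model name -> elpd estimate.

    Weights are proportional to exp(elpd); the maximum is subtracted
    first for numerical stability.
    """
    best = max(elpd.values())
    unnormalized = {name: math.exp(value - best) for name, value in elpd.items()}
    total = sum(unnormalized.values())
    return {name: u / total for name, u in unnormalized.items()}
```

With elpd values of -100, -100 and -110 for three candidate models, the first two share almost all of the weight equally and the third receives a weight close to zero.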

CLL and Richter transformation genomic analyses

Previous mutated annotation files from WES46 and WGS27 data were used to further validate our distinct EVOFLUx evolutionary modes (that is, neutral, subclonal and independent) and Richter transformation phylogenies.

Subclonal deconvolution of WES and WGS data

To detect subclones in bulk WES and WGS data, we used MOBSTER43, which fits the VAF spectrum with a mixture model containing a Pareto distribution to account for the neutral tail97 and a variable number of β-distributions to account for the clonal and subclonal peaks.

We ran MOBSTER using the default parameters, except using a minimum 5% VAF threshold and lowering the minimum number of mutations to compose a cluster to five in WES samples due to the low number of mutations. We then manually quality controlled all 377 WES samples and 10 WGS samples, tuning the fitting parameters to better represent the data (for instance, when the clonal peak had been called at a low frequency despite the median tumour purity being 95%).

Phylogenetic inference of longitudinal methylation data

A novel Bayesian phylogenetic method was used to reconstruct the evolutionary relationships and the time to MRCA of longitudinal samples from the same patients. This was carried out in the BEAST (v1.8.4) framework98,99 using custom models implemented in PISCA100 (v1.1; available from https://github.com/adamallo/PISCA).

EVOFLUx provided an estimate of the age of the patient when the MRCA of each bulk sample emerged. To estimate the methylation status of each fCpG at the MRCA of the sample in each of our longitudinal samples, we discretized the fCpGs as described above (see the section ‘CLL RNA sequencing data’).

We implemented a four-parameter biallelic binary substitution model analogous to the pre-growth EVOFLUx model in PISCA. This plugin contains all the required statistical machinery to use this model for somatic phylogenetic estimation. The biallelic binary substitution model has three relative rate parameters: (1) heterozygous methylation \(\tilde{\upsilon }\), (2) homozygous demethylation \(\tilde{\gamma }\), and (3) heterozygous demethylation \(\tilde{\zeta }\), where homozygous methylation \(\tilde{\mu }\) was normalized to 1. For all relative transition rate parameters, a log-normal prior with a mean of 1 and a standard deviation of 0.6 was used. A half-normal prior with a mean of 0 and a standard deviation of 0.13 was used for the molecular clock rate, with a strict clock model for the rate of evolution across the tree. Two demographic tree models, constant population size101 and exponential growth102, were compared by marginal likelihood estimation using path-sampling103, and the constant population model was deemed more appropriate.
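
One way to picture a model of this kind is as a three-state continuous-time Markov chain over the number of methylated alleles at a site (0, 1 or 2). The sketch below rests on assumptions that the text does not confirm: the mapping of the four named rates onto transitions is one reading of their names, any allele-multiplicity factors are ignored, and the actual PISCA parameterization may differ. The function names are hypothetical.

```python
def rate_matrix(mu, nu, gamma, zeta):
    """Generator matrix over states (0, 1, 2) = number of methylated alleles.

    Assumed mapping (not confirmed by the source):
      0 -> 1 at mu    (methylation of a homozygous unmethylated site)
      1 -> 2 at nu    (methylation of a heterozygous site)
      2 -> 1 at gamma (demethylation of a homozygous methylated site)
      1 -> 0 at zeta  (demethylation of a heterozygous site)
    """
    return [
        [-mu, mu, 0.0],
        [zeta, -(zeta + nu), nu],
        [0.0, gamma, -gamma],
    ]


def stationary_distribution(mu, nu, gamma, zeta):
    """Stationary state frequencies from detailed balance (birth-death chain)."""
    unnormalised = [1.0, mu / zeta, (mu / zeta) * (nu / gamma)]
    total = sum(unnormalised)
    return [value / total for value in unnormalised]


# Relative rates with mu fixed to 1; the other three values are illustrative.
rates = dict(mu=1.0, nu=0.7, gamma=1.3, zeta=0.9)
Q = rate_matrix(**rates)
pi = stationary_distribution(**rates)
```

Fixing one rate to 1 removes the redundancy between the relative rates and the overall clock rate, which is why only three relative rates need priors.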

MCMC chains were run for 100 million generations, sampled every 100,000 generations, and convergence was assessed using Tracer (v1.7)104, ensuring effective sample sizes (ESS) greater than 500 for all parameters. Maximum clade credibility trees were then made using 10% burn-in and median node heights. The resulting trees were plotted using ggtree105.

Phylogenetic inference of SNVs from WGS data

Each bulk sample was represented by the set of clonal mutations found during the deconvolution of WGS data (see above). Where a mutation was deemed absent in the clonal peak, the reference nucleotide was used. Mutational signature assignment106 was used to select mutations in the clock-like SBS1 channel107. BEAST (v1.10)108 was then used with the simple binary substitution model (as SBS1 effectively represents just C-to-T substitutions), a strict clock model, a constant population size prior101 and a flat prior on the age of the MRCA (from zero to the earliest patient sample), with ancestral state estimation at the root. Chains were run and ESS values assessed as described above. The distances between the ancestral state of the root at each MCMC state and the clock rate were used to calculate the expected evolutionary distance between the root and the known germline. This was used to inform the length of the branch between the germline (at birth) and the MRCA of the samples.

Survival analysis

Clinical analyses were performed in CLL for TTFT and overall survival from the time of sampling. Tumour growth rate (θ), effective population size (Ne) and epigenetic switching rates were analysed as continuous variables in univariate Cox regression models for both TTFT and overall survival. The effect sizes of the HRs for each evolutionary variable were analysed considering different scaling factors. In particular, the growth rate was analysed assuming exponential growth (that is, for θ = 1, the population becomes e ≈ 2.72 times bigger per year), the Ne was considered per million cells, and the cancer age or time from the MRCA was analysed per 10 years. Individual switching rate parameters (μ, ν, γ and ζ) were largely uninformative of prognosis and were summarized into a mean epigenetic switching rate, which was scaled by a factor of 100. In addition, growth rate and effective population size were analysed as continuous variables in multivariate Cox regression models together with TP53 aberrations (considering mutations and deletions together), IGHV gene mutational status and the age of patients at sampling. Kaplan–Meier curves were generated for low and high growth rates and effective population size within IGHV subtypes, with cut-offs chosen by the maximally selected log-rank statistic using the maxstat package (v0.7-25). P values from Kaplan–Meier curves were derived using the log-rank statistic. The survival (v3.5-7), survminer (v0.4.9) and ggsurvfit (v0.3.1) packages were used under R (v4.3.1). Plots were generated using ggplot2 (v3.5.2).
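
The scaling factors only change the unit in which each HR is expressed. Because a Cox model is linear in the covariate on the log-hazard scale, an HR per unit converts to an HR per c units by raising it to the power c. The sketch below illustrates this arithmetic together with the exponential-growth interpretation of θ. The function names and the example HR are hypothetical and are not results from the study.

```python
import math


def rescale_hazard_ratio(hr_per_unit, units):
    """HR for a change of `units` in a covariate, given the HR per single unit."""
    return math.exp(math.log(hr_per_unit) * units)


def yearly_fold_change(theta):
    """Fold change in population size per year under exponential growth."""
    return math.exp(theta)


# Illustrative value only: an HR of 1.05 per year of cancer age corresponds
# to an HR of 1.05 ** 10 per 10 years.
hr_per_decade = rescale_hazard_ratio(1.05, 10)
fold_per_year = yearly_fold_change(1.0)
```

Rescaling leaves P values and model fit unchanged; it only makes the reported effect sizes easier to compare across variables measured on very different scales.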

Estimating the rate of change in lymphocyte counts

Historical records of the absolute number of lymphocytes in blood, obtained by haemocytometer, were collected for patients with CLL over the whole disease course (that is, an approximation of the number of malignant CLL cells in blood). For 231 patients with CLL, we could obtain at least 10 sample timepoints (that is, at least 10 medical appointments; median n = 27 and mean n = 34) before the first treatment, allowing us to track the natural history of the disease before treatment intervention (Supplementary Fig. 10). We fitted a linear model to each of the 231 cases, obtained the slope of the observed log number of lymphocytes over time (that is, the coefficient of the univariate linear model) and compared it with growth rate estimates derived from EVOFLUx.
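
The per-patient fit can be sketched as an ordinary least-squares regression of log lymphocyte count on time. This is a minimal illustration, assuming natural logarithms and time measured in years so that the slope is directly comparable with an exponential growth rate. The function name and the synthetic, noise-free data are hypothetical.

```python
import math


def log_count_slope(times_years, lymphocyte_counts):
    """Ordinary least-squares slope of ln(count) regressed on time."""
    logs = [math.log(count) for count in lymphocyte_counts]
    n = len(times_years)
    mean_t = sum(times_years) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_years, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    return sxy / sxx


# Synthetic example: 10 timepoints, counts growing exponentially at 0.5 per year.
times = [0.5 * i for i in range(10)]
counts = [10.0 * math.exp(0.5 * t) for t in times]
slope = log_count_slope(times, counts)
```

Under exponential growth the log counts fall on a straight line, so the fitted slope recovers the growth rate used to generate the synthetic data.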

Statistical analysis

All statistical tests in the study were two-sided. Appropriate multiple test correction, such as the Holm–Sidak correction, is noted when applied.
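
For reference, the Holm–Sidak step-down adjustment can be sketched in a few lines. This is a generic illustration of the procedure rather than the code used in the study, and the function name and example P values are hypothetical.

```python
def holm_sidak(p_values):
    """Holm-Sidak step-down adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak adjustment over the (m - rank) hypotheses not yet stepped past.
        candidate = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Illustrative P values only.
adjusted = holm_sidak([0.01, 0.04, 0.03])
```

The smallest P value is adjusted as if all m tests were being corrected, and each subsequent one is corrected over progressively fewer tests, which makes the procedure less conservative than a single-step correction.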

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.