Abstract
Cell division drives somatic evolution but is challenging to quantify. We developed a framework to count cell divisions with DNA replication-related mutations in polyguanine homopolymers. Analyzing 505 samples from 37 patients, we studied the milestones of colorectal cancer evolution. Primary tumors diversify at ~250 divisions from the founder cell, while distant metastasis divergence occurs significantly later, at ~500 divisions. Notably, distant but not lymph node metastases originate from primary tumor regions that have undergone surplus divisions, tying subclonal expansion to metastatic capacity. Then, we analyzed a cohort of 73 multifocal lung cancers and showed that the cell division burden of the tumors’ common ancestor distinguishes independent primary tumors from intrapulmonary metastases and correlates with patient survival. In lung cancer too, metastatic capacity is tied to more extensive proliferation. The cell division history of human cancers is easily accessible using our simple framework and contains valuable biological and clinical information.
Data availability
The data needed to reproduce all analyses and figures, including all polyguanine genotypes, are available without restrictions from https://doi.org/10.5281/zenodo.14269963 (ref. 62).
Code availability
The code to run all analyses presented is available without restrictions from https://doi.org/10.5281/zenodo.14269963 (ref. 62).
References
1. Jones, S. et al. Comparative lesion sequencing provides insights into tumor evolution. Proc. Natl Acad. Sci. USA 105, 4283–4288 (2008).
2. Werner, B. et al. Measuring single cell divisions in human tissues from multi-region sequencing data. Nat. Commun. 11, 1035 (2020).
3. Hu, Z. et al. Quantitative evidence for early metastatic seeding in colorectal cancer. Nat. Genet. 51, 1113–1122 (2019).
4. Abascal, F. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021).
5. Williams, N. et al. Life histories of myeloproliferative neoplasms inferred from phylogenies. Nature 602, 162–168 (2022).
6. Moeller, M. E., Mon Père, N. V., Werner, B. & Huang, W. Measures of genetic diversification in somatic tissues at bulk and single-cell resolution. eLife 12, RP89780 (2024).
7. Seplyarskiy, V. B. & Sunyaev, S. The origin of human mutation in light of genomic data. Nat. Rev. Genet. 22, 672–686 (2021).
8. Gao, Z., Wyman, M. J., Sella, G. & Przeworski, M. Interpreting the dependence of mutation rates on age and time. PLoS Biol. 14, e1002355 (2016).
9. Yaacov, A., Rosenberg, S. & Simon, I. Mutational signatures association with replication timing in normal cells reveals similarities and differences with matched cancer tissues. Sci. Rep. 13, 7833 (2023).
10. Viguera, E., Canceill, D. & Ehrlich, S. D. Replication slippage involves DNA polymerase pausing and dissociation. EMBO J. 20, 2587–2595 (2001).
11. Boyer, J. C. et al. Sequence dependent instability of mononucleotide microsatellites in cultured mismatch repair proficient and deficient mammalian cells. Hum. Mol. Genet. 11, 707–713 (2002).
12. McDonald, M. J. et al. Mutation at a distance caused by homopolymeric guanine repeats in Saccharomyces cerevisiae. Sci. Adv. 2, e1501033 (2016).
13. Schlötterer, C. Evolutionary dynamics of microsatellite DNA. Chromosoma 109, 365–371 (2000).
14. Shibata, D., Navidi, W., Salovaara, R., Li, Z. H. & Aaltonen, L. A. Somatic microsatellite mutations as molecular tumor clocks. Nat. Med. 2, 676–681 (1996).
15. Tsao, J. L. et al. Genetic reconstruction of individual colorectal tumor histories. Proc. Natl Acad. Sci. USA 97, 1236–1241 (2000).
16. Frumkin, D., Wasserstrom, A., Kaplan, S., Feige, U. & Shapiro, E. Genomic variability within an organism exposes its cell lineage tree. PLoS Comput. Biol. 1, e50 (2005).
17. Salipante, S. J. & Horwitz, M. S. Phylogenetic fate mapping. Proc. Natl Acad. Sci. USA 103, 5448–5453 (2006).
18. Wasserstrom, A. et al. Estimating cell depth from somatic mutations. PLoS Comput. Biol. 4, e1000058 (2008).
19. Biezuner, T. et al. A generic, cost-effective, and scalable cell lineage analysis platform. Genome Res. 26, 1588–1599 (2016).
20. Naxerova, K. et al. Hypermutable DNA chronicles the evolution of human colon cancer. Proc. Natl Acad. Sci. USA 111, E1889–E1898 (2014).
21. Naxerova, K. et al. Origins of lymphatic and distant metastases in human colorectal cancer. Science 357, 55–60 (2017).
22. Reiter, J. G. et al. Lymph node metastases develop through a wider evolutionary bottleneck than distant metastases. Nat. Genet. 52, 692–700 (2020).
23. Cheek, D. & Shneer, S. The empirical mean position of a branching Lévy process. J. Appl. Probab. 57, 1252–1259 (2020).
24. Tsao, J. L. et al. Colorectal adenoma and cancer divergence. Evidence of multilineage progression. Am. J. Pathol. 154, 1815–1824 (1999).
25. Athreya, K. B. & Ney, P. E. Branching Processes (Springer, 1972).
26. Kimura, M. & Ohta, T. Stepwise mutation model and distribution of allelic frequencies in a finite population. Proc. Natl Acad. Sci. USA 75, 2868–2872 (1978).
27. Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).
28. Kapadia, C. D. et al. Clonal dynamics and somatic evolution of haematopoiesis in mouse. Preprint at bioRxiv https://doi.org/10.1101/2024.09.17.613129 (2024).
29. Snippert, H. J. et al. Intestinal crypt homeostasis results from neutral competition between symmetrically dividing Lgr5 stem cells. Cell 143, 134–144 (2010).
30. Cheshier, S. H., Morrison, S. J., Liao, X. & Weissman, I. L. In vivo proliferation and cell cycle kinetics of long-term self-renewing hematopoietic stem cells. Proc. Natl Acad. Sci. USA 96, 3120–3125 (1999).
31. Neher, R. Dynamic Aspects of DNA: DNA-Slippage and Nucleosome Dynamics. PhD Dissertation, Ludwig–Maximilians–Univ. (2007).
32. Tomasetti, C., Vogelstein, B. & Parmigiani, G. Half or more of the somatic mutations in cancers of self-renewing tissues originate prior to tumor initiation. Proc. Natl Acad. Sci. USA 110, 1999–2004 (2013).
33. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892 (2012).
34. Chen, C.-D., Yen, M.-F., Wang, W.-M., Wong, J.-M. & Chen, T.H.-H. A case–cohort study for the disease natural history of adenoma–carcinoma and de novo carcinoma and surveillance of colon and rectum after polypectomy: implication for efficacy of colonoscopy. Br. J. Cancer 88, 1866–1873 (2003).
35. Muzny, D. M. et al. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
36. Hu, Z., Li, Z., Ma, Z. & Curtis, C. Multi-cancer analysis of clonality and the timing of systemic spread in paired primary tumors and metastases. Nat. Genet. 52, 701–708 (2020).
37. Ulintz, P. J., Greenson, J. K., Wu, R., Fearon, E. R. & Hardiman, K. M. Lymph node metastases in colon cancer are polyclonal. Clin. Cancer Res. 24, 2214–2224 (2018).
38. Klein, C. A. Parallel progression of primary tumours and metastases. Nat. Rev. Cancer 9, 302–312 (2009).
39. Rew, D. A., Wilson, G. D., Taylor, I. & Weaver, P. C. Proliferation characteristics of human colorectal carcinomas measured in vivo. Br. J. Surg. 78, 60–66 (1991).
40. Bozic, I., Gerold, J. M. & Nowak, M. A. Quantifying clonal and subclonal passenger mutations in cancer evolution. PLoS Comput. Biol. 12, e1004731 (2016).
41. Girard, N. et al. Genomic and mutational profiling to assess clonal relationships between multiple non-small cell lung cancers. Clin. Cancer Res. 15, 5184–5190 (2009).
42. Detterbeck, F. C. et al. The IASLC Lung Cancer Staging Project: background data and proposed criteria to distinguish separate primary lung cancers from metastatic foci in patients with two lung tumors in the forthcoming eighth edition of the TNM Classification for Lung Cancer. J. Thorac. Oncol. 11, 651–665 (2016).
43. Detterbeck, F. C. et al. The IASLC Lung Cancer Staging Project: background data and proposals for the classification of lung cancer with separate tumor nodules in the forthcoming eighth edition of the TNM Classification for Lung Cancer. J. Thorac. Oncol. 11, 681–692 (2016).
44. Yang, C.-Y. et al. Genomic profiling with large-scale next-generation sequencing panels distinguishes separate primary lung adenocarcinomas from intrapulmonary metastases. Mod. Pathol. 36, 100047 (2023).
45. Burr, R. et al. Developmental mosaicism underlying EGFR-mutant lung cancer presenting with multiple primary tumors. Nat. Cancer 5, 1681–1696 (2024).
46. Wassenaar, E. C. et al. A unique interplay of access and selection shapes peritoneal metastasis evolution in colorectal cancer. Preprint at bioRxiv https://doi.org/10.1101/2024.09.25.614736 (2024).
47. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
48. Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).
49. Al Bakir, M. et al. The evolution of non-small cell lung cancer metastases in TRACERx. Nature 616, 534–542 (2023).
50. Engstrand, J., Nilsson, H., Strömberg, C., Jonas, E. & Freedman, J. Colorectal cancer liver metastases—a population-based study on incidence, management and survival. BMC Cancer 18, 78 (2018).
51. Koi, M. et al. Human chromosome 3 corrects mismatch repair deficiency and microsatellite instability and reduces N-methyl-N′-nitro-N-nitrosoguanidine tolerance in colon tumor cells with homozygous hMLH1 mutation. Cancer Res. 54, 4308–4312 (1994).
52. Salipante, S. J., Thompson, J. M. & Horwitz, M. S. Phylogenetic fate mapping: theoretical and experimental studies applied to the development of mouse fibroblasts. Genetics 178, 967–977 (2008).
53. Guyot D’Asnières De Salins, A. et al. Discordance between immunochemistry of mismatch repair proteins and molecular testing of microsatellite instability in colorectal cancer. ESMO Open 6, 100120 (2021).
54. Sammalkorpi, H. et al. Background mutation frequency in microsatellite-unstable colorectal cancer. Cancer Res. 67, 5691–5698 (2007).
55. Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).
56. Paradis, E., Claude, J. & Strimmer, K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290 (2004).
57. Revell, L. J. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223 (2012).
58. Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T.-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
59. National Cancer Institute. The Cancer Genome Atlas Program (TCGA) https://www.cancer.gov/ccg/research/genome-sequencing/tcga (2022).
60. Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).
61. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
62. Blohmer, M. Quantifying cell divisions along evolutionary lineages in cancer. GitHub https://github.com/mblohmer/cell_divs_polyg (2024).
Acknowledgements
This work was supported by funding from the National Institutes of Health/National Cancer Institute (grants nos. R37CA225655, R01CA279054, R01CA26928), an American Association for Cancer Research NextGen Grant for Transformative Cancer Research and an Emerging Leader Award from the Mark Foundation for Cancer Research (all to K.N.). We thank the Boston Children’s Organoid Core and the Microscopy Resources on the North Quad (MicRoN) core at Harvard Medical School.
Author information
Contributions
M.B., D.M.C., W.-T.H., I.-H.L., E.C.E.W., A.N.G. and K.N. analyzed the data. D.M.C. developed the mathematical framework. M.B., W.-T.H., M. Kessler, F.C., J.W., W.H. and E.C.E.W. performed the experiments. C.-Y.Y., Y.-C.Y., H.-L.H., M.L., S.I.P., O.K., J.K.L. and T.-Y.C. obtained and reviewed the clinical samples and clinical data. D.S., M.M.K. and M. Kloor contributed to data interpretation. K.N., M.B., W.-T.H. and D.M.C. designed the study. K.N., M.B. and D.M.C. wrote the paper with input from all authors. K.N. supervised and directed the research.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Peter Campbell, Benjamin Izar and Peter Van Loo for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Whole genome doubling simulations.
Simulation of the interaction between L1 distance and whole genome duplication. Insertions and deletions at 50 polyguanine tracts were simulated with a mutation rate of 4 × 10⁻⁴ with (a) or without (b) whole genome duplication. Blue line, mean L1 value among 1,000 simulations; grey bars, 2.5th and 97.5th percentiles of the simulated L1 value; dashed line, product of cell divisions and μ. The data were smoothed by applying a rolling mean over 10 cell divisions.
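To make the expected behavior concrete, here is a minimal simulation sketch in Python (NumPy). It assumes a simple stepwise model that may differ from the authors' simulation: each allele copy of a tract gains or loses one guanine with probability μ per division, a tract's genotype is the mean length shift across its copies, and whole genome duplication copies every allele at a chosen division. The function names and the `wgd_at` parameter are hypothetical. Because duplication doubles both the number of mutating copies and the denominator of the mean, the expected L1 stays close to μ × divisions as long as mutations per tract remain rare.

```python
import numpy as np

def net_change(rng, n_div, mu, shape):
    """Net length change per allele copy after n_div divisions.

    Each division, each copy slips with probability mu; a slip adds or
    removes one guanine with equal probability (stepwise model).
    """
    k = rng.binomial(n_div, mu, size=shape)  # number of slippage events per copy
    return 2 * rng.binomial(k, 0.5) - k      # net sum of k steps of +1 or -1

def simulate_l1(n_div, mu=4e-4, n_tracts=50, n_sims=1000, wgd_at=None, seed=0):
    """Mean and 2.5/97.5 percentiles of L1 after n_div divisions.

    Hypothetical setup: a diploid lineage (2 copies per tract) that, if
    wgd_at is given, duplicates every copy at that division (4 copies).
    """
    rng = np.random.default_rng(seed)
    if wgd_at is None or wgd_at >= n_div:
        change = net_change(rng, n_div, mu, (n_sims, n_tracts, 2))
    else:
        before = net_change(rng, wgd_at, mu, (n_sims, n_tracts, 2))
        doubled = np.concatenate([before, before], axis=2)  # WGD copies each allele
        after = net_change(rng, n_div - wgd_at, mu, (n_sims, n_tracts, 4))
        change = doubled + after
    # Genotype = mean shift across copies; L1 = mean absolute shift over tracts.
    l1 = np.abs(change.mean(axis=2)).mean(axis=1)
    return l1.mean(), np.percentile(l1, [2.5, 97.5])

for wgd in (None, 50):
    mean_l1, (lo, hi) = simulate_l1(100, wgd_at=wgd)
    print(f"WGD at {wgd}: mean L1 {mean_l1:.4f} vs mu x divisions {4e-4 * 100:.4f}")
```

With μ = 4 × 10⁻⁴ and 100 divisions, both runs should give a mean L1 slightly below μ × divisions = 0.04; the shortfall comes from opposing indels in the same tract cancelling each other, which becomes more frequent as μ × divisions grows.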
Extended Data Fig. 2 Quartet similarity between true and inferred phylogenies.
Observed quartet similarity between experimentally defined and inferred phylogenies (dashed red lines) compared with the null distribution of quartet similarities obtained by randomly permuting the labels of the experimental phylogeny 10,000 times (histograms). a, Mismatch repair proficient trees. b, Mismatch repair deficient trees.
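The permutation null can be sketched as follows. This is a simplified stand-in and not necessarily the quartet similarity metric used in the paper: trees are hypothetical nested tuples of leaf names, each four-leaf subset is resolved with the four-point condition on edge-count distances, and agreement is the fraction of subsets resolved identically in both trees. Shuffling the leaf labels of the experimental tree generates the null distribution, and the p-value counts permutations at least as similar as the observed pair.

```python
import random
from itertools import combinations

def leaf_paths(tree, path=()):
    """Map each leaf of a nested-tuple tree to its child-index path from the root."""
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths

def edge_distance(pa, pb):
    """Edges between two leaves: depth(a) + depth(b) - 2 * depth(LCA)."""
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return len(pa) + len(pb) - 2 * shared

def quartet_split(paths, a, b, c, d):
    """Index of the split of {a,b,c,d} by the four-point condition (None if unresolved)."""
    dist = lambda u, v: edge_distance(paths[u], paths[v])
    sums = [dist(a, b) + dist(c, d), dist(a, c) + dist(b, d), dist(a, d) + dist(b, c)]
    best = min(sums)
    return sums.index(best) if sums.count(best) == 1 else None

def quartet_agreement(paths1, paths2):
    """Fraction of four-leaf subsets resolved identically in both trees."""
    quartets = list(combinations(sorted(paths1), 4))
    same = sum(quartet_split(paths1, *q) == quartet_split(paths2, *q) for q in quartets)
    return same / len(quartets)

def permutation_test(true_tree, inferred_tree, n_perm=10_000, seed=0):
    """Observed agreement and one-sided p-value from shuffled true-tree labels."""
    rng = random.Random(seed)
    true_paths, inferred_paths = leaf_paths(true_tree), leaf_paths(inferred_tree)
    observed = quartet_agreement(true_paths, inferred_paths)
    labels = list(true_paths)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        permuted = {new: true_paths[old] for old, new in zip(labels, shuffled)}
        hits += quartet_agreement(permuted, inferred_paths) >= observed
    return observed, (hits + 1) / (n_perm + 1)

true_tree = ((("A", "B"), "C"), (("D", "E"), "F"))
inferred_tree = ((("A", "B"), "C"), ("D", ("E", "F")))
print(permutation_test(true_tree, inferred_tree, n_perm=2000))
```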
Extended Data Fig. 3 Polyguanine mutation rates in additional model systems.
For the cell lines HT-29 (a, colorectal cancer), A549 (b, lung adenocarcinoma), HMEC (c, normal breast epithelium) and RPTEC (d, normal kidney epithelium), the experimental design is outlined (left panel) and the mutation rate is estimated using all pairwise comparisons among samples (right panel). Cell divisions on the x-axis were plotted with a jitter of 0.8 cell divisions to aid visualization of all data points. The mutation rate per repeat per division is inferred as the slope of a linear regression of L1 distance on cell divisions; the confidence interval and p-value are shown. Lines show the mean L1 distance according to the linear regression model, and shaded areas show the standard error of the estimate.
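A minimal sketch of the rate estimate, assuming each data point is one pairwise comparison with a known number of cell divisions separating the two samples. The helper name and the normal-approximation interval are assumptions, not the authors' code; for few comparisons a t quantile with n − 2 degrees of freedom would be more appropriate.

```python
import math

def mutation_rate_fit(divisions, l1):
    """OLS fit of L1 distance on cell divisions; the slope estimates mu.

    Returns the slope and an approximate 95% confidence interval.
    """
    n = len(divisions)
    mx = sum(divisions) / n
    my = sum(l1) / n
    sxx = sum((x - mx) ** 2 for x in divisions)
    sxy = sum((x - mx) * (y - my) for x, y in zip(divisions, l1))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(divisions, l1))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, (slope - 1.96 * se, slope + 1.96 * se)

# Hypothetical comparisons: divisions separating each pair and their L1 distance.
divs = [10, 20, 30, 40, 50]
dists = [4.1e-4, 7.9e-4, 1.2e-3, 1.61e-3, 1.99e-3]
print(mutation_rate_fit(divs, dists))
```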
Extended Data Fig. 4 Mutation accumulation in cells with different division rates.
a, From five mice aged 20–24 months, we obtained clonal cell populations from 20 hematopoietic stem cells (HSCs) and 81 intestinal crypts. b, Distance of HSCs (n = 20) and intestinal crypts (n = 81) from a reference HSC sample (the analysis was done separately for each mouse, but all samples are combined in the plot). c, All pairwise L1 distances among HSCs (n = 54) and intestinal crypts (n = 644); as in b, the analysis was done separately for each mouse. d, Schematic of cell division histories consistent with our results. Two HSCs are separated by a median L1 distance of 0.0312. We reason that the vast majority of this L1 burden must stem from embryonic divisions: our mice have an average age of 22 months, or approximately 660 days, and if HSCs divide every 57 days, they would complete fewer than 12 divisions during postnatal life, a number that is likely small in comparison with the number of embryonic divisions. We therefore estimate that embryonic divisions contribute approximately 0.0312/2 = 0.0156 of L1 burden to each lineage. For intestinal crypts, subtracting the embryonic burden of the crypt lineage and the burden of the reference HSC lineage (approximately 0.0156 each) from the total L1 burden of 0.045 leaves a postnatal L1 burden of 0.0138. We can convert this burden into a mutation rate by leveraging the relatively well-understood proliferation rate of intestinal stem cells: Lgr5+ intestinal stem cells divide approximately every 2 days, so they have divided approximately 660/2 = 330 times during the animal’s life. Dividing the postnatal L1 burden of 0.0138 by these 330 divisions yields a mutation rate of 4.2 × 10⁻⁵, remarkably similar to our in vitro estimates across human cell lines. P-values in b and c were calculated using a two-sided Wilcoxon test. Box plot elements: center line, median; box limits, lower and upper quartiles; whiskers, lowest and highest value within 1.5 IQR.
Graphics in a were created with BioRender.com.
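The arithmetic in panel d can be retraced step by step. This sketch only restates the numbers given in the caption; it assumes, as the caption argues, that postnatal HSC divisions are negligible and that the reference HSC lineage carries the same embryonic burden as the crypt lineage.

```python
hsc_pair_l1 = 0.0312      # median L1 distance between two HSCs
crypt_total_l1 = 0.045    # crypt L1 distance to the reference HSC
age_days = 660            # ~22 months
hsc_interval_days = 57    # assumed HSC division interval
isc_interval_days = 2     # Lgr5+ intestinal stem cell division interval

hsc_postnatal_divisions = age_days / hsc_interval_days             # ~11.6, i.e. fewer than 12
embryonic_per_lineage = hsc_pair_l1 / 2                            # 0.0156 per lineage
crypt_postnatal_l1 = crypt_total_l1 - 2 * embryonic_per_lineage    # 0.0138
isc_divisions = age_days / isc_interval_days                       # 330
mu = crypt_postnatal_l1 / isc_divisions                            # ~4.2e-5 per repeat per division

print(f"{embryonic_per_lineage:.4f} {crypt_postnatal_l1:.4f} {mu:.2e}")
```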
Extended Data Fig. 5 Patient level summaries of evolutionary landmarks.
Patient-level point estimates (dot) and confidence intervals (shaded area) of cell divisions from zygote to the most recent common ancestor (MRCA) of all tumor cells (a), primary tumor diversification (b), divergence of lymph node metastases (c), and divergence of distant metastases (d). 95% confidence intervals were calculated with 1,000 bootstrap replicates of polyguanine repeats across both the samples shown and the in vitro experiment in Fig. 2 (incorporating uncertainty in the mutation rate estimate). Dashed lines indicate the cohort medians.
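The bootstrap can be sketched as below. It assumes that per-repeat L1 contributions are available for a tumor comparison and for a single in vitro comparison with a known number of divisions, and that divisions are estimated as L1 divided by μ; the paper estimates μ by regression over many in vitro pairs, so this is a simplification. Resampling the same repeat indices for both inputs propagates the uncertainty in μ into the division estimate. All names are hypothetical.

```python
import random

def divisions_with_ci(tumor_l1, invitro_l1, invitro_divisions, n_boot=1000, seed=0):
    """Point estimate and approximate 95% bootstrap CI of cell divisions (L1 / mu).

    tumor_l1[i] and invitro_l1[i] are per-repeat L1 contributions of repeat i.
    """
    def estimate(idx):
        idx = list(idx)
        mu = sum(invitro_l1[i] for i in idx) / len(idx) / invitro_divisions
        tumor = sum(tumor_l1[i] for i in idx) / len(idx)
        return tumor / mu

    n = len(tumor_l1)
    rng = random.Random(seed)
    point = estimate(range(n))
    # Resample repeats with replacement, jointly for tumor and in vitro data.
    boots = sorted(estimate([rng.randrange(n) for _ in range(n)]) for _ in range(n_boot))
    # Approximate 2.5th and 97.5th percentiles of the bootstrap distribution.
    return point, (boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot) - 1])

tumor = [0.05 + 0.01 * (i % 7) for i in range(60)]
invitro = [0.002 + 0.001 * (i % 5) for i in range(60)]
print(divisions_with_ci(tumor, invitro, invitro_divisions=100))
```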
Extended Data Fig. 6 Cell divisions during colorectal cancer (CRC) metastasis.
a, Cell division burden of CRC most recent common ancestors (MRCAs, n = 30) before (x-axis) and after (y-axis) exclusion of repeats located on recurrently altered chromosomes (7, 8, 13, 18, 20). Line, linear regression; shaded area, standard error. b, Cell divisions from the cancer MRCA to primary tumor (n = 150 and n = 33), lymph node metastasis (n = 106 and n = 24) and distant metastasis (n = 65 and n = 4) samples in mismatch repair proficient (MMRp) and mismatch repair deficient (MMRd) cancers, respectively. P-values, two-sided Wilcoxon test. c, Purity-independent distance to the primary tumor, calculated as {1 – coalescence ratio} for all pairs of a lymph node metastasis (n = 526) or distant metastasis (n = 383) with a primary tumor sample and normalized by the patient median of {1 – coalescence ratio} over all primary tumor sample pairs (n = 397). P-value, two-sided Wilcoxon test. d, Divisions from lymph node metastasis divergence to tumor samples. Lines connect patient medians (n = 32). P-value, two-sided paired Wilcoxon test. e, Schematic explaining primary tumor diversification. The MRCAs of all unique primary tumor sample pairs (red dots) are located on the tree, and their cell divisions from the MRCA of all cancer samples (grey dot) are recorded; the median of these values represents primary tumor diversification. f, Cell division burden at primary tumor diversification (n = 36), lymph node metastasis divergence (n = 21) and distant metastasis divergence (n = 26). Metachronous (meta) and synchronous (syn) distant metastases are included. P-values, two-sided paired Wilcoxon test with Holm adjustment for multiple hypothesis testing. g, Illustration of the analysis in Fig. 3j. For each metastasis, the primary tumor region with the smallest L1 distance is chosen. The median cell division burden (from the tumor MRCA) of those regions is contrasted with the median cell division burden of all remaining primary tumor regions. h, As in Fig. 3j for lymph node metastases (n = 18). P-value, two-sided paired Wilcoxon test.
i, Time from tumor initiation to metastasis divergence, assuming a division rate of 1/(4 days). Dots, median values; text, number of metastases; lines, range. Colors indicate distance from tumor initiation. Box plot elements: center line, median; box limits, lower and upper quartiles; whiskers, lowest and highest value within 1.5 IQR.
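Panel e can be expressed as a short tree computation. The sketch assumes a hypothetical rooted tree stored as child-to-parent links, with branch lengths already converted to cell divisions and the root placed at the MRCA of all cancer samples; the conversion to time uses the 4-day division interval assumed in panel i. Tree, node names and helpers are illustrative only.

```python
from itertools import combinations
from statistics import median

def depth_from_root(node, parent, branch_divisions):
    """Cumulative cell divisions from the cancer MRCA (root) to node."""
    total = 0.0
    while node in parent:
        total += branch_divisions[node]
        node = parent[node]
    return total

def lineage(node, parent):
    """Node followed by all of its ancestors up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def pairwise_mrca(a, b, parent):
    """First ancestor of b that is also an ancestor of a."""
    ancestors_a = set(lineage(a, parent))
    return next(n for n in lineage(b, parent) if n in ancestors_a)

def primary_diversification(primary_samples, parent, branch_divisions):
    """Median divisions from the cancer MRCA to the MRCAs of all primary pairs."""
    return median(
        depth_from_root(pairwise_mrca(a, b, parent), parent, branch_divisions)
        for a, b in combinations(primary_samples, 2)
    )

def divisions_to_days(divisions, days_per_division=4):
    return divisions * days_per_division

# Hypothetical tree: M1 is a metastasis; P1-P3 are primary tumor regions.
parent = {"M1": "root", "n1": "root", "P1": "n1", "n2": "n1", "P2": "n2", "P3": "n2"}
branch_divisions = {"M1": 600, "n1": 200, "P1": 100, "n2": 50, "P2": 80, "P3": 90}
d = primary_diversification(["P1", "P2", "P3"], parent, branch_divisions)
print(d, divisions_to_days(d))
```

Here the pairwise MRCAs sit at 200, 200 and 250 divisions, so the median is 200 divisions, or 800 days at one division every 4 days.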
Extended Data Fig. 7 Additional details on the multifocal lung cancer cohort.
a, Clinical variables relevant to the differentiation between independent and metastatic multifocal lung cancers. Multifocal lung cancers are commonly classified as metastatic if they arise within two years of each other, or if they are located in different lung lobes while a lymph node metastasis is present (ref. 42). The time (in years) between the first and the last tumor resection is shown. Colors show the presence of local lymph node metastases (at either resection). Shapes show the location difference between tumors. b, Purity-corrected cell division burden of lung adenocarcinomas (n = 6) from Burr, Leshchiner, Costantino et al. (ref. 45) and colorectal cancers (n = 45) from Wassenaar, Gorelick et al. (ref. 46). For both cancer types, all samples with a purity above 30% were included. P-value, two-sided Wilcoxon test. c, SBS1 mutation burden of lung adenocarcinomas (n = 36) and colorectal cancers (n = 60) in the PCAWG cohort. P-value, two-sided Wilcoxon test. d, As in Fig. 4j for overall survival. P-value, two-sided log-rank test. e, As in Fig. 4k for tumor pairs classified as independent (n = 7). P-value, two-sided paired Wilcoxon test. Box plot elements: center line, median; box limits, lower and upper quartiles; whiskers, lowest and highest value within 1.5 IQR.
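For reference, the clinical rule paraphrased in panel a can be written as a small function. It is a hypothetical helper that encodes only the two criteria named in the caption; treating the two-year boundary as strict is an assumption.

```python
def classify_pair(years_apart, different_lobes, lymph_node_metastasis):
    """Simplified clinical rule for a pair of multifocal lung tumors.

    Hypothetical helper encoding only the two criteria named in the caption.
    """
    if years_apart < 2:  # assumption: "within two years" read as strictly less than 2
        return "metastatic"
    if different_lobes and lymph_node_metastasis:
        return "metastatic"
    return "independent"

print(classify_pair(3.5, different_lobes=True, lymph_node_metastasis=False))
```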
Extended Data Fig. 8 The coalescence ratio (CoaR) correlates with survival in patients with multifocal lung cancer.
a, CoaRs of inter-individual lung adenocarcinoma and colorectal cancer pairs (that is, tumors from different patients compared with each other; n = 1,973 and n = 43,359, respectively). b, Kaplan–Meier curve of overall survival (OS) for groups defined by the CoaR. Every patient is represented by the tumor pair with the highest CoaR. P-value, two-sided log-rank test. c,d, P-value of a two-sided log-rank test of progression-free survival (c) and overall survival (d) as a function of the CoaR cut-off that differentiates independent and metastatic tumors (black line, left y-axis). The cut-off was varied in steps of 0.01 from the lowest to the highest observed CoaR, and at each step the group of patients with all tumor pairs below the cut-off was compared with the group of patients with at least one tumor pair above the cut-off. The fraction of patients in whom the CoaR of at least one tumor pair was above the cut-off is shown as a red line (right y-axis). A p-value of 0.05 is marked by a dotted black line. e, Cox proportional hazards model of progression-free survival. The model is adjusted for all documented clinicopathological risk factors. Squares display the maximum likelihood estimate of the hazard ratio, and lines demarcate the 95% confidence interval. Dotted lines denote specific hazard ratios. P-values on the right were obtained using a two-sided Wald test. Box plot elements: center line, median; box limits, lower and upper quartiles; whiskers, lowest and highest value within 1.5 IQR.
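The cut-off scan in panels c and d can be sketched with a hand-written log-rank test (the standard form with hypergeometric variance, compared against a chi-square distribution with 1 degree of freedom). The data layout, function names and the treatment of a CoaR exactly equal to the cut-off (counted as below) are assumptions; in practice a survival library such as lifelines would normally be used instead.

```python
import math

def logrank_p(times, events, groups):
    """Two-sided log-rank p-value comparing group 1 with group 0."""
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g)
        o_minus_e += d1 - d * n1 / n          # observed minus expected events in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))     # upper tail of chi-square with 1 df

def cutoff_scan(patient_coars, times, events, step=0.01):
    """(cut-off, fraction of patients above, log-rank p) for each cut-off.

    patient_coars[i] lists the CoaRs of all tumor pairs of patient i.
    """
    max_coar = [max(c) for c in patient_coars]
    values = [c for pairs in patient_coars for c in pairs]
    lo, hi = min(values), max(values)
    results = []
    for k in range(int(round((hi - lo) / step)) + 1):
        cutoff = lo + k * step
        group = [int(m > cutoff) for m in max_coar]  # 1 = at least one pair above
        if 0 < sum(group) < len(group):              # both groups must be non-empty
            results.append((cutoff, sum(group) / len(group), logrank_p(times, events, group)))
    return results
```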
Supplementary information
Supplementary Information
Supplementary Figs. 1–9 and Notes 1–5.
Supplementary Tables
Supplementary Tables 1 and 2.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Blohmer, M., Cheek, D.M., Hung, WT. et al. Quantifying cell divisions along evolutionary lineages in cancer. Nat Genet 57, 706–717 (2025). https://doi.org/10.1038/s41588-025-02078-5
DOI: https://doi.org/10.1038/s41588-025-02078-5