Main

Cancer begins when cells acquire new malignant traits that increase cell growth, survival and invasion5. Investigating the earliest steps in tumour initiation has important implications for understanding cancer biology, improving risk stratification and prevention6,7. However, studying this process in human tissues is challenging, because phenotypic changes and evolutionary progression make it difficult to characterize the cells that originate the lesion when sampling only malignant tumours. Premalignant lesions are therefore a crucial window into these earlier initiating events, because they reflect an important, often long, period of molecular and evolutionary change where cells acquire the necessary traits for full-blown malignancy8,9.

Therefore, to study the initiation of colorectal tumours, we collected and profiled numerous premalignant colon polyps from several individuals with familial adenomatous polyposis (FAP), a hereditary cancer predisposition syndrome caused by a germline heterozygous loss-of-function mutation in the APC gene4. Starting in adolescence, individuals with FAP develop many colorectal polyps, some of which will inevitably progress to colorectal cancer (CRC) without prophylactic colectomy4. The abundant polyps in these individuals are an excellent model for studying early tumour progression, because they arise at different times and retain information about the initial state of the lesion that is easily obscured by later selective sweeps. We sampled the entire process of CRC development, in which normal mucosal epithelium can form premalignant polyps, some of which later may develop dysplastic features and then transform into malignant adenocarcinomas (AdCas).

Colorectal polyps are canonically assumed to develop after an epithelial cell acquires the required oncogenic mutation(s) to clonally expand and create a detectable lesion composed of the descendants of the original initiating mutant cell10,11,12. However, some studies of human premalignant polyps challenge this hypothesis. First, others have found that key early driver mutations (APC and KRAS) are sometimes not found in all cells in a polyp, as would be expected if they initiated a clonal expansion13. Instead, there can be several unique subclonal mutations in APC or KRAS in a single polyp13. Second, limited in situ hybridization and single-crypt sequencing data have indicated that polyps can be composed of epithelial cells from different genetic lineages that diverged early in the patient’s lifetime, before the polyp initiated2,3. Both observations indicate that polyp initiation is not always monoclonal or the result of clonal expansion of a single cell. Instead, colorectal polyp initiation may often be polyclonal or the result of expansion of a group of genetically diverse cells, possibly driven by cell-extrinsic effects.

To evaluate whether polyclonal initiation is evident in premalignant colorectal polyps from patients with FAP, to determine its prevalence and to investigate the role of somatic driver mutations in CRC development, we performed an evolutionary analysis using whole-genome sequencing (WGS) and/or whole-exome sequencing (WES) data from 123 samples in total, taken from normal mucosa, benign polyps without evidence of dysplasia, dysplastic premalignant polyps or malignant AdCas from six patients with FAP. The profiled lesions were of varied age, size and physical location in the colon and represent an important resource in the NIH Human Tumour Atlas Network (HTAN)14 for understanding premalignant progression in CRC. This bulk genomic profiling was used to detect spontaneously occurring somatic mutations15,16,17,18, which are a record of past mutational processes and evolutionary dynamics19,20,21,22. We computationally analysed these bulk sequencing data to infer clonal architecture and dynamics without longitudinal sampling.

We found that 40% of benign and 28% of dysplastic polyps had evidence of polyclonal initiation. Polyclonal samples were less likely to have clonal APC or KRAS driver mutations, indicating that these oncogenic mutations do not always drive monoclonal expansions that lead to tumour initiation. Single-crypt WGS of three polyps and one AdCa from two further individuals with FAP indicates the polyps are often polyclonal. Two of these polyps show early genetic divergence of individual crypts, including one polyp with several unique APC second-hit mutations acquired after lesion initiation, consistent with polyclonal initiation. Taken together, our results indicate that polyclonal initiation of premalignant polyps is common, challenging the classic model of monoclonal initiation.

FAP polyps have canonical CRC drivers

To investigate early events in CRC lesion initiation, we performed WGS on fresh frozen histologically normal mucosal tissue and several benign and dysplastic polyps from six individuals with FAP (Fig. 1a, Supplementary Tables 1–3, Supplementary Methods and Supplementary Fig. 1). Five of these individuals had a germline truncating heterozygous mutation in APC detected in all tissue samples and blood (Supplementary Figs. 2–6), whereas A014 had no detectable germline mutation in APC as assessed by WGS or clinical diagnostic genetic testing. Because this individual has no family history of FAP (Supplementary Table 1), this finding is consistent with reports that roughly half of index cases have no pathogenic variants detectable in the blood and up to 20% of index cases with a detectable APC mutation have a somatic mosaic variant that might not be detected in all tissues23,24.

Fig. 1: Colorectal polyps in patients with FAP harbour common CRC driver mutations.

a, Overview of the FAP cohort (six patients, n = 123 samples, WGS and/or WES). b, Oncoplot summarizing the landscape of non-silent SNVs, small insertions/deletions, copy-number gains/amplifications (Gain/Amp), deep deletions (Loss/Del) and copy-neutral LOH (cnLOH) in CRC driver genes based on WGS when available or WES. Only somatic mutations are shown. WGD, whole-genome doubling. c, UpSet plot showing the combination of somatic mutations in APC, KRAS, FBXW7 and TP53. d, ppVAF distributions for all mutations (left) and APC second-hit driver mutations (right) in mucosa (green), benign polyps (orange) and dysplastic polyps (blue) from WGS data. e,f, ppVAF posterior distributions of APC mutations in samples with a single APC second-hit mutation (e, separated into clonal (left) or subclonal (right) APC mutations) or several APC second-hit driver mutations (f), ordered by ascending ppVAF point estimates. Asterisks in f denote clustered APC second-hit mutations (Supplementary Fig. 7). APC second-hit mutation ppVAF distributions are shown in blue, and KRAS driver mutation ppVAF posterior distributions are shown in pink. WGS was used in samples for which it was available; otherwise WES data were used to estimate ppVAFs. g, Clonal driver mutation fractions (Supplementary Methods) from WGS (when available) or WES data for benign and dysplastic polyps showing mutations in APC (benign, n = 31; dysplastic, n = 53 mutations), KRAS (benign, n = 6; dysplastic, n = 25 mutations) and FBXW7 (benign, n = 3; dysplastic, n = 5 mutations). Error bars are 95% Bayesian credible intervals (Supplementary Methods). Illustrations in a were created using BioRender. Ma, Z. (2025) https://BioRender.com/4v5zpca.

Somatic alterations, including single-nucleotide variants (SNVs) and copy-number alterations (CNAs), in driver genes associated with CRC were detected in nearly all polyp samples (91% of benign and 95% of dysplastic polyps) but rarely in normal mucosa (10%) (Fig. 1b). Driver alterations in APC and KRAS were common in polyps; 83% of benign polyps and 82% of dysplastic polyps (adenomas) had second-hit APC driver mutations or APC loss of heterozygosity (LOH), whereas only 7% of normal mucosal samples had biallelic APC inactivation. Furthermore, 17% of benign and 35% of dysplastic polyps had both APC and KRAS somatic mutations (Fig. 1c). The high prevalence of APC and KRAS somatic mutations in polyp samples is consistent with previous reports in both FAP and sporadic adenomas25,26 (Extended Data Figs. 1 and 2) and indicates that these mutations are associated with polyp initiation. Further CRC driver mutations were found in FBXW7, TP53 and other proliferative signalling associated genes in some lesions (Fig. 1b). Although recurrent gains of chromosomes 7, 13 and 20 were noted, as observed in other premalignant colorectal lesions25,26, polyps were generally less aneuploid than malignant CRC samples (Extended Data Fig. 3). The fraction of genome altered was low in both benign (median 0.003) and dysplastic (median 0.03) polyps and increased with disease stage in both our FAP cohort and previously published FAP25 and sporadic cohorts26 (Extended Data Fig. 3). Thus, FAP polyps harbour early oncogenic events, most frequently somatic SNVs and more rarely CNAs.

Clonal drivers are often absent in FAP polyps

To better assess the role of driver mutations in polyp initiation and growth, we computed the purity- and ploidy-adjusted variant allele frequencies (ppVAFs) of all somatic mutations detected (Supplementary Methods). The ppVAF is the estimated fraction of epithelial cells that have a mutation and is analogous to the cancer cell fraction in malignant samples27. The cancer cell fraction is usually calculated using tumour purity values estimated from CNA frequencies28, but our normal and polyp samples have relatively few CNAs compared to CRCs (Extended Data Fig. 3), making copy-number-based purity estimation difficult. Instead, we used the epithelial cell fractions previously measured and reported in further samples from this patient cohort using single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq)29 to estimate the distribution of sample purities for normal mucosa, polyps and AdCas (Supplementary Methods, Supplementary Notes 1 and 2 and Extended Data Fig. 4). The ppVAF point estimates and posterior probability distributions indicate that somatic APC mutations are found at higher allele frequencies than other somatic mutations, reinforcing the idea that they occur early in polyp development and/or experience positive selection (Fig. 1d).
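
As an illustration only, a point estimate of a purity- and ploidy-adjusted VAF can be sketched with the standard conversion used for cancer cell fractions. The function name, its arguments and the assumption that non-epithelial cells are diploid at the locus are hypothetical choices made for this sketch; the study itself computes posterior distributions using scATAC-seq-derived purity distributions, as described in the Supplementary Methods.

```python
def ppvaf_point_estimate(vaf, purity, local_copy_number=2, mutated_copies=1):
    """Illustrative point estimate of the fraction of epithelial cells with a mutation.

    vaf               -- observed variant allele frequency (alt reads / total reads)
    purity            -- epithelial cell fraction of the sample, in (0, 1]
    local_copy_number -- total copies of the locus in epithelial cells (assumed)
    mutated_copies    -- copies carrying the variant in each mutated cell (assumed)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if not 0 <= vaf <= 1:
        raise ValueError("vaf must be in [0, 1]")
    # Average number of copies of the locus per cell in the mixed sample,
    # assuming non-epithelial cells carry two copies.
    mean_copies = purity * local_copy_number + (1 - purity) * 2
    fraction = vaf * mean_copies / (purity * mutated_copies)
    # Sampling noise can push the estimate above 1; cap it at all cells.
    return min(fraction, 1.0)


# Hypothetical input: a heterozygous variant seen at VAF 0.15 in a sample
# with 60% epithelial cells and no copy-number change at the locus.
example = ppvaf_point_estimate(vaf=0.15, purity=0.6)
```

Under these assumptions, a heterozygous mutation carried by every epithelial cell in a diploid region would be expected at a raw VAF of half the purity, which is why raw VAFs alone understate clonality in impure samples.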

However, many polyp samples in our cohort lacked clonal driver mutations (Fig. 1e–g). This observation is not consistent with the hypothesis that APC second-hit mutations or other driver mutations are required for monoclonal expansions leading to polyp formation. Furthermore, 8 of the 69 polyps harboured more than one APC somatic driver mutation (Fig. 1f). Although five represent clustered mutational events that occurred in the same subclone (Supplementary Fig. 7) or probable biallelic loss in a single subclone in the patient without a germline APC mutation (A014), two samples (A001C107 and G055) show evidence of two subclonal APC mutations. Because a strong selective advantage for several truncating APC mutations to appear in the same subclone with a germline heterozygous mutation is unlikely, this observation indicates that several unique APC mutations existed in the epithelial cell population that initiated the lesion and/or further APC mutations were acquired by cells in the lesion that did not have an APC second hit at the time of lesion initiation. Together, these results indicate that polyps are not always caused by a monoclonal expansion of an epithelial cell with a driver mutation.

Polyps have subclones that diverged early

Because we observed that polyp initiation is not always associated with monoclonal expansion of one or more driver mutations, we wondered how lesion initiation occurred in samples without clonal driver mutations. We hypothesized that several colonic crypts (clonal units of epithelial homeostasis derived from a small number of stem cells30) might collectively initiate each polyp, rather than a single mutant crypt. This would lead to a polyp composed of several distinct genetic lineages that diverged early. We defined initiation of a single polyp from several crypts as ‘polyclonal’ and used the WGS data to determine whether polyp initiation was probably monoclonal or polyclonal (Fig. 2a). In monoclonal samples, clonal expansion of a single crypt leads to all somatic mutations in that founding crypt being present in all epithelial cells in the resulting lesion. This sweep leads to many detectable clonal mutations in the sequenced polyp. By contrast, polyclonal lesions will probably have fewer clonal mutations, because any clonal mutations in the lesion must be present in all founding crypts in a polyp (Fig. 2b).

Fig. 2: Some FAP premalignant polyps had an early most recent common evolutionary ancestor and are probably polyclonal.

a, Schematic showing that polyclonal polyps are initiated from several colorectal crypts in the intestinal mucosa (green crypts), whereas monoclonal polyps are the result of a clonal expansion of a single crypt (pink). b, Schematic showing that monoclonal lesions have an MRCA occurring after lesion initiation (dotted line) because they initiate from a single crypt, leading to many detected clonal mutations (purple Xs on tree). Bottom parts show hypothetical ppVAF distribution shapes for monoclonal and polyclonal samples, with clonal mutations marked in purple. c, Number of expected clonal SNVs detected in WGS data from samples from patients with FAP. Samples with fewer than 63 clonal SNVs (dashed line, corresponding to an MRCA at less than 1 year old) were classified as having an early MRCA and are probably polyclonal. d, ppVAF distributions for example monoclonal (left) and polyclonal (right) lesions. e, ppVAF distributions for all monoclonal (left) and polyclonal (right) lesions combined. f, Fraction of WGS samples classified as polyclonal based on estimated clonal SNV count. Fractions were estimated from n = 25 mucosal samples, n = 15 benign polyp samples and n = 54 dysplastic polyp samples. Error bars are 95% Bayesian credible intervals (Supplementary Methods). Illustrations in a were created using BioRender. Ma, Z. (2025) https://BioRender.com/4v5zpca.

We estimated the number of clonal SNVs in each sample using the ppVAF posterior probability distributions computed for each somatic mutation from the sequencing data. We classified each somatic SNV as clonal or subclonal based on its ppVAF posterior probability distribution and counted the number of clonal mutations in each sample (Supplementary Methods and Supplementary Note 1). We used these estimated clonal SNV counts to determine whether lesions were probably monoclonal or polyclonal. Because most SNVs in these samples have clock-like mutational signatures31 (Extended Data Fig. 5), the number of clonal mutations can be used to estimate an upper bound on the age of the most recent common ancestor (MRCA) for the sample (Supplementary Methods). Samples with an MRCA earlier than the time at which the lesion initiated are by definition polyclonal. In polyclonal samples, several genetic lineages must have existed during lesion initiation, so these genetic lineages must therefore have diverged from the MRCA before initiation. If the MRCA occurred at or after lesion initiation, the lesion is composed of cells that originated from a single genetic lineage at the time of initiation and is monoclonal. We observed that, whereas normal mucosal samples had few (fewer than 15) clonal SNVs, benign and dysplastic polyps had a much wider range of expected clonal SNV counts (0–3,882 SNVs) (Fig. 2c). The low clonal mutation count in the normal samples is consistent with previous observations indicating that normal intestinal mucosa is highly polyclonal, often having few mutations shared between nearby crypts32.
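
The clock-like reasoning above can be sketched as a simple conversion from a clonal SNV count to an upper bound on MRCA age. This is a minimal sketch under a strong assumption: a constant accumulation rate, with a default of 63 clonal SNVs per year chosen only so that the sketch reproduces the one-year threshold quoted in this article. The function names and the linear model are hypothetical; the calibration actually used is described in the Supplementary Methods and may differ.

```python
def mrca_age_upper_bound(clonal_snvs, snvs_per_year=63.0):
    """Upper bound on MRCA age (years) under an assumed constant mutation rate."""
    if clonal_snvs < 0:
        raise ValueError("clonal_snvs must be non-negative")
    if snvs_per_year <= 0:
        raise ValueError("snvs_per_year must be positive")
    return clonal_snvs / snvs_per_year


def has_early_mrca(clonal_snvs, max_age_years=1.0, snvs_per_year=63.0):
    """True if the MRCA bound falls before max_age_years (probably polyclonal)."""
    return mrca_age_upper_bound(clonal_snvs, snvs_per_year) < max_age_years


# Hypothetical clonal SNV counts for three samples.
calls = {count: has_early_mrca(count) for count in (14, 63, 3882)}
```

A sample with few clonal SNVs is assigned an MRCA that predates any plausible time of lesion initiation, which is the basis for calling it polyclonal.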

To investigate the prevalence of polyclonal polyps, we estimated the frequency of polyps with an MRCA that existed before one year of age (fewer than 63 clonal SNVs, using WGS data) (Supplementary Methods), which is far earlier than polyps appear in most individuals with FAP33 and is similar to the estimated clonal mutation counts of the polyclonal normal mucosa. We found that 6 out of 15 benign polyps (40%) had an early MRCA, indicating that they are polyclonal, and 15 out of 54 dysplastic polyps (28%) had an early MRCA (Fig. 2d–f). Samples referred to as polyclonal generally do not have a clonal mutation peak visible in the raw VAF or ppVAF distributions (Fig. 2d,e and Supplementary Figs. 8–17), consistent with their early divergence time. WES data from some samples with WGS data as well as 27 further samples from our cohort were also subjected to clonal SNV count thresholding to estimate the prevalence of early-diverging samples and showed a similar fraction of these lesions (Extended Data Fig. 6). Additionally, clonal SNV counts indicate that polyclonality is prevalent in premalignant and malignant multiregion WES samples from a different published FAP cohort25 (Extended Data Fig. 7 and Supplementary Methods). A similar analysis using published multiregion WES data from individuals without a hereditary CRC predisposition syndrome26 indicates that polyclonal initiation also occurs in sporadic premalignant adenomas but not AdCas (Extended Data Fig. 7).
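
To make the prevalence estimates concrete, the following sketch computes the observed fraction of polyclonal polyps together with a 95% Bayesian credible interval. It assumes a uniform Beta(1, 1) prior and approximates the posterior quantiles by Monte Carlo sampling; both choices, and the function name, are assumptions for illustration, as the prior and method used in the study are specified only in the Supplementary Methods.

```python
import random


def polyclonal_fraction_interval(n_polyclonal, n_total, draws=20000, seed=0):
    """Observed fraction and an approximate 95% credible interval.

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + n_polyclonal, 1 + n_total - n_polyclonal).
    """
    if not 0 <= n_polyclonal <= n_total or n_total == 0:
        raise ValueError("need 0 <= n_polyclonal <= n_total and n_total > 0")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    alpha = 1 + n_polyclonal
    beta = 1 + n_total - n_polyclonal
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return n_polyclonal / n_total, lower, upper


# Counts reported in the text: 6 of 15 benign and 15 of 54 dysplastic polyps.
benign = polyclonal_fraction_interval(6, 15)
dysplastic = polyclonal_fraction_interval(15, 54)
```

With only 15 benign polyps, the interval around 40% is wide, so the difference between the benign and dysplastic fractions should be read with that uncertainty in mind.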

These analyses indicate that polyclonal initiation is probably present in at least 25–40% of polyps in individuals with FAP. Selective sweeps occurring after lesion initiation can obscure a polyclonal population structure, so the true fraction of lesions that initiate polyclonally may be higher. This is particularly true in older lesions or lesions containing driver mutations with a substantial selective advantage, highlighting the importance of studying early premalignant adenomas. Amongst dysplastic polyps, those of monoclonal origin had shorter telomeres than polyclonal lesions (P = 0.0015; Fisher’s combined P value from per-patient Wilcoxon rank-sum tests) (Supplementary Methods and Extended Data Fig. 8), consistent with a higher number of total cell divisions due to earlier lesion initiation and/or increased proliferation rate. Lesions that initiated earlier or proliferated faster would have shorter telomeres when sampled and would be more likely to seem monoclonal, particularly if there were expansion of an advantageous clone.
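
The combined P value quoted above pools per-patient tests with Fisher's method. A minimal sketch of that combination step follows, using the closed-form survival function of the chi-square distribution for even degrees of freedom so that no external library is needed. The function name and the example P values are hypothetical and are not taken from the study.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values using Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    """
    if not p_values:
        raise ValueError("at least one P value is required")
    if any(not 0 < p <= 1 for p in p_values):
        raise ValueError("P values must be in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # For even degrees of freedom 2k:
    # P(chi2 > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-patient Wilcoxon rank-sum P values.
combined = fisher_combined_p([0.05, 0.05])
```

Fisher's method assumes that the per-patient tests are independent, which is reasonable when each test uses lesions from a different individual.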

Polyclonal polyps have subclonal drivers

We used these monoclonal/polyclonal classifications to further investigate the role of driver mutations in polyp initiation and progression. Most monoclonal and polyclonal polyps have an APC second-hit driver mutation (Fig. 3a). As expected, the ppVAFs of APC somatic driver mutations are lower in polyclonal samples than monoclonal samples (Fig. 3b; P = 0.003 for benign, P = 0.005 for dysplastic polyps, Wilcoxon rank-sum test), indicating that these mutations are not the sole drivers of lesion initiation and growth and have not caused a selective sweep in the polyclonal polyps. Similarly, KRAS mutations are more likely to be found at lower subclonal frequencies in dysplastic polyclonal polyps than dysplastic monoclonal polyps (P = 0.074 for benign, P = 0.012 for dysplastic polyps, Wilcoxon rank-sum test; Fig. 3c,d). These data are consistent with the notion that premalignant lesions are not always the product of monoclonal expansions associated with common early CRC driver mutations. However, such driver mutations probably have a selective advantage that often increases their frequency and can lead to selective sweeps in more advanced lesions.

Fig. 3: Polyclonal samples had subclonal driver mutations and were not the result of the expansion of an APC and/or KRAS mutated clone.

a, Fraction of monoclonal and polyclonal WGS samples with second-hit APC somatic driver mutations. b, ppVAFs of APC second-hit mutations in benign (left) and dysplastic (right) polyps. c, Fraction of monoclonal and polyclonal WGS samples with KRAS driver mutations. d, ppVAFs of KRAS driver mutations in benign (left) and dysplastic (right) polyps. Error bars in a and c are 95% Bayesian credible intervals (Supplementary Methods), and fractions in a and c were estimated from n = 25 mucosal, n = 6 polyclonal benign, n = 9 monoclonal benign, n = 15 polyclonal dysplastic and n = 39 monoclonal dysplastic samples.

Single-crypt WGS shows polyclonality

We performed single-crypt isolation and WGS from polyps and AdCas from two further patients with FAP (Fig. 4, Extended Data Figs. 9 and 10, Supplementary Methods and Supplementary Table 4) to examine the clonal relationships between cell populations within each lesion. Lesions from patients with FAP were dissected into several regions before single-crypt isolation to better determine the association between spatial proximity and genomic similarity. After filtering out low-quality crypts and polyps/tumours, we built phylogenies using single-crypt WGS data from the remaining three polyps and one AdCa sample (Supplementary Methods and Supplementary Note 3). We first focused on the polyp for which the most crypts were sequenced (FAP03_P2; 20 single crypts included after filtering). We found that crypts within the polyp diverged early in life, with no mutations shared by all crypts sequenced in the polyp (Fig. 4c). This finding indicates that several distinct genetic lineages with a very early common ancestor combined to form the polyp. The lack of clonal somatic mutations in this polyp is reflected in the structure of its single-crypt phylogeny (Fig. 4c), with long terminal branches and no trunk, as hypothesized in the polyclonal schematic shown in Fig. 2b. This high degree of polyclonality seems spatially structured, with one polyp region (R4) seeming to be monophyletic (Fig. 4c), with 1,025 mutations shared between and exclusive to all 5 crypts in the region (Extended Data Fig. 10b). This implies that the crypts in R4 share a more recent common ancestor than the rest of the polyp, which may have expanded clonally to create that region. By contrast, the other regions are polyclonal mixtures of crypts from different genetic lineages.
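
The counts discussed above (mutations shared by all crypts, and mutations shared within but exclusive to one region) can be illustrated with set operations on per-crypt mutation calls. This is a sketch of the counting logic only, not of phylogeny reconstruction, which is described in the Supplementary Methods. The function names, crypt identifiers and mutation labels are hypothetical.

```python
def trunk_mutations(crypt_mutations):
    """Mutations present in every crypt, i.e. the trunk of the phylogeny."""
    sets = [set(muts) for muts in crypt_mutations.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


def region_exclusive_shared(crypt_mutations, region_of, region):
    """Mutations shared by all crypts in `region` and absent from all other crypts."""
    inside = [set(m) for c, m in crypt_mutations.items() if region_of[c] == region]
    outside = [set(m) for c, m in crypt_mutations.items() if region_of[c] != region]
    if not inside:
        return set()
    shared = set.intersection(*inside)
    return shared.difference(*outside)


# Toy data: three crypts, two from region "R4" and one from region "R1".
crypts = {
    "crypt_a": {"m1", "m2", "m3"},
    "crypt_b": {"m1", "m2", "m4"},
    "crypt_c": {"m1", "m5"},
}
regions = {"crypt_a": "R4", "crypt_b": "R4", "crypt_c": "R1"}

trunk = trunk_mutations(crypts)
r4_only = region_exclusive_shared(crypts, regions, "R4")
```

An empty trunk corresponds to the pattern described for FAP03_P2, whereas a large region-exclusive shared set corresponds to a monophyletic region such as R4.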

Fig. 4: Single-crypt phylogenies based on WGS indicate that FAP polyps are polyclonal, whereas adenocarcinomas are monoclonal.

a, Schematic showing collection of polyp P2 from patient FAP03, regional dissection, single-crypt isolation and WGS procedure (Supplementary Methods and Supplementary Information). b, Images of individual isolated crypts from FAP03_P2. c, Single-crypt phylogeny reconstructed from single-crypt WGS data from polyp P2, patient FAP03 and adjacent normal mucosa (blue). Putative CRC-associated mutations are highlighted. The spatial region from which each crypt originated is indicated by the tip label colours (Supplementary Table 4). d, Schematic showing collection of CRC lesion T3 from patient FAP01 as well as regional dissection of the lesion. e, Images of individual isolated crypts from FAP01_T3. f, Single-crypt phylogeny reconstructed from lesion T3 from patient FAP01 and normal mucosa (blue). Putative CRC-associated mutations are highlighted, with mutations denoted in grey if they were filtered out of some samples but not others or if the sequencing depth at the mutated site was too low to detect the variant (Supplementary Methods and Supplementary Note 3, section 5). The spatial region from which each crypt originated is indicated by the tip label colours. Illustrations in a and d were created using BioRender (Ma, Z. (2025) https://BioRender.com/4v5zpca) and FigDraw (https://www.figdraw.com).

The single-crypt WGS analysis also showed putative cancer driver mutations in distinct subpopulations within the polyp. Importantly, no driver mutations are shared by all crypts in the polyp. Instead, there were three truncating frameshift mutations in APC occurring in independent subclones (Fig. 4 and Extended Data Fig. 9c) that either arose after lesion initiation or were present at subclonal frequencies in the initiating cell population. One of these APC second-hit mutations is found in all crypts in region R4, raising the possibility that it caused a clonal expansion that dominated that part of the polyp. These observations indicate that this polyp did not initiate from a clonal expansion driven by a single APC second hit but rather is a mixture of epithelial cell clones with independent growth advantages.

By contrast, single-crypt WGS of the malignant AdCa tumour sample FAP01_T3 showed a monoclonal expansion, with 479 mutations shared by all 10 crypts in the sample (Fig. 4d–f). This expanded clone includes a second-hit stopgain mutation in APC, a missense driver mutation in SMAD4 (R361C) and a truncating mutation in CTNNB1 (beta-catenin) (Fig. 4f). Lower sequencing depth in some loci in some samples makes it more difficult to determine which driver mutation(s) are found in all crypts and possibly responsible for the initial expansion; in particular, low coverage (less than or equal to 3×) at the APC second-hit mutation locus in crypts R6_G9, R5_G4 and R1_G7 limited our ability to detect the mutation in these samples (Supplementary Note 3). However, the phylogenetic structure of this tumour (long trunk from which all tumour crypts originate) clearly indicates that it is monoclonal (compare phylogeny to schematics in Fig. 2b) and provides a contrasting example to the phylogenetic pattern in the polyclonal polyp FAP03_P2.

Two further polyps from these patients with single-crypt sequencing data (FAP01_P6 and FAP03_P1) show different patterns (Extended Data Fig. 9). The 908 clonal mutations in the 7 crypts from polyp FAP01_P6 indicate that it is monoclonal. The lesion has a clonal expansion, possibly resulting from the APC LOH event found in all crypts from this polyp and further fuelled by a subclonal KRAS mutation (Q61R), consistent with most of the APC second-hit driven monoclonal polyps in our bulk sequencing dataset. By contrast, the single-crypt WGS data from polyp FAP03_P1 are more difficult to interpret. Although very few clonal mutations are present (21 shared by all 7 crypts), indicating early genetic divergence of the crypts within it, the somatic APC truncating mutation present in 6 out of the 7 crypts (which share only 57 other mutations) raises the possibility that this APC second hit occurred very early in life and may have contributed to lesion initiation. In summary, this single-crypt sequencing dataset provides high-resolution orthogonal validation of polyclonal initiation in FAP and highlights the mutational heterogeneity in polyclonal lesions.

Discussion

Although it is widely assumed that malignancies are the product of clonal expansions from single mutant cells of origin10, limited case studies in CRC and other cancers indicate that premalignant lesions can be polyclonal2,3,34,35. However, systematic assessment of the prevalence of polyclonality in patient samples is still lacking, despite the utility of sequencing to detect this phenomenon. To address this, we used WGS data from 69 colorectal polyps from individuals with FAP and found that 40% of those with benign histology and 28% of those with dysplasia originated from several colon crypts. Furthermore, analysis of single-crypt WGS data supported our conclusion that premalignant colorectal lesions in individuals with FAP can initiate polyclonally and showed local expansion of subpopulations within a polyp with unique APC second-hit mutations. These findings point towards a possible role for cell-extrinsic mechanisms in tumour initiation and indicate that this process may involve cell–cell interactions in a premalignant multicellular ecosystem as well as cell-intrinsic effects of driver mutations.

The finding that many polyps are polyclonal has implications for understanding the molecular and microenvironmental determinants and dynamics of tumour initiation and progression. Polyclonal initiation provides a genetically diverse starting point for premalignant evolution. This intralesion heterogeneity can persist because sweeps are relatively rare, with long periods of stasis between such events. The finding that premalignant polyps may experience few selective sweeps before accruing genetic alterations and transforming to CRC is consistent with our previous findings that subsequent evolution is often effectively neutral19,20,21. Common drivers (APC, KRAS) may not directly initiate the lesion by causing a clonal expansion but instead may be present at subclonal frequencies in the initiating population or be acquired after lesion initiation. Their subsequent expansion can lead to monoclonal conversion of a previously polyclonal lesion and may lead to underestimation of the frequency of polyclonal initiation in our dataset, indicating that the extent of this phenomenon may be substantially higher than reported here.

Our study has several limitations. First, our bulk sequencing analysis does not directly estimate the purity of each sample individually, but rather uses the distribution of epithelial cell fractions estimated from scATAC-seq data from the HTAN FAP patient cohort as a measure of purity. Although this procedure avoids the pitfalls of copy-number-based algorithms that render them inappropriate for non-malignant tissue samples (Supplementary Methods, Extended Data Fig. 4 and Supplementary Note 2), samples with lower purity due to stromal or immune cell inclusion may be falsely called polyclonal. Although this cannot account for all polyclonal samples we identify by means of bulk genomic sequencing (Supplementary Note 2), further profiling with strategies that either isolate epithelial cells for sequencing (such as single-crypt WGS) or directly measure epithelial cell fraction in the same sample will be instrumental to accurately estimate the fraction of polyclonal samples. Additionally, we focused on CRC initiation in the context of a hereditary cancer predisposition syndrome (FAP). Focus on this patient population also introduces the possibility that some of our polyclonal polyps could be caused by stochastic collision of independently initiated lesions due to the increased density of colorectal polyps in patients with FAP. However, recently published studies indicate that polyclonality is a general phenomenon in colorectal lesions. Our findings are consistent with studies in the analogous APC mutant murine model, which demonstrate cellular cooperativity during colorectal tumorigenesis35 and polyclonal tumour origins36,37,38,39. Moreover, analysis of clonal SNV counts in sporadic CRC adenomas similarly indicates that 29% are polyclonal36,38. Thus, polyclonality is unlikely to be restricted to hereditary colon cancers or exclusively caused by random polyp collisions.

Our findings raise questions about the events necessary and/or sufficient for cancer development. Here, we did not identify what mechanistic role these further epithelial cell clones are playing in tumorigenesis, and follow-up studies will be required to determine how these clones function and interact. Evidence for the role of cell–cell interactions in tumour development between epithelial cells indicates that both cooperation between premalignant clones40 and recruitment of neighbouring non-malignant epithelium36,41 may contribute to polyclonal initiation, although other interactions may also be involved. Furthermore, signals from non-epithelial cells in the microenvironment, including fibroblasts and immune cells42,43,44, may lead to malignant phenotypic changes in several colonic crypts at once, resulting in a polyclonal lesion. These findings also raise questions about the role of canonical driver mutations in tumour initiation: how does acquisition of an oncogenic mutation in an individual cell lead to expansion of a diverse group of cells, not all of which have the mutation? More generally, polyclonal initiation may be common across diverse tissue types34,35. Indeed, a recent study in premalignant pancreatic cancer lesions demonstrated extreme multifocality, consistent with polyclonal origins34. The conceptual and analytic framework we outline for assessing polyclonality from sequencing data can be extended to other premalignant lesions to systematically investigate this underappreciated phenomenon in cancer initiation.

Methods

See Supplementary Information.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.