Main

Whole-genome doubling (WGD) is found in more than 30% of solid cancers and leads to increased rates of metastasis, drug resistance and poor therapeutic outcomes1,2,3,4,5. Often observed on a background of TP53 mutation, WGD leads to increased chromosomal instability (CIN) and karyotypic diversification4,7,8. Errors in chromosome segregation often lead to cytokinesis failure and the generation of polyploid cells9, indicating that WGD may be an active process during tumour evolution7. Furthermore, phenotypic consequences, such as chromatin and epigenetic compensatory changes10, replication stress11, and cell-cycle dysregulation10,11, enable cell persistence despite the expected deleterious effects of WGD. In patient tumours, the impact of WGD on tumour evolution, cancer-cell phenotypes, and the tumour microenvironment remains poorly understood, being limited in part by bulk whole-genome sequencing (WGS) approaches that do not allow the identification of WGD subpopulations. Crucially, reports from in vitro and patient-derived xenograft models have demonstrated that the temporal and evolutionary dynamics of WGD can be captured at single-cell resolution12,13. We therefore sought to use single-cell approaches to study WGD in individuals with high-grade serous ovarian cancer (HGSOC), an archetypal tumour of genomic and chromosomal instability. Our results establish WGD as both an ongoing evolutionary process and an important covariate of inflammatory signalling and immunosuppression in HGSOC.

Cohort and single-cell WGS

We generated a multimodal mapping of aneuploidy, genomic instability, and cell-intrinsic and tumour microenvironment phenotypic read-outs (Extended Data Fig. 1a). We studied a cohort of 41 treatment-naive HGSOC patients14 (Fig. 1a, Extended Data Fig. 1b, Methods and Supplementary Tables 1 and 2) using single-cell whole-genome sequencing (scWGS), multiplexed immunofluorescence and single-cell RNA sequencing (scRNA-seq), applied to 70 multi-site samples. The cohort included 18 homologous recombination-deficient (HRD)-Dup (enriched in duplications; BRCA1 mutant-like) and 8 HRD-Del (enriched in deletions; BRCA2 mutant-like) cases, as well as 14 HR-proficient foldback inversion (FBI)-bearing tumours and one tandem duplicator tumour, as inferred by integrating point mutations and structural variants13,14,15.

Fig. 1: WGD is a dynamic mutational process.

a, Overview of the MSK SPECTRUM cohort and specimen collection workflow, including numbers of patients, sites and samples processed by various means. H&E, haematoxylin and eosin; IF, immunofluorescence. b, Study design for analysing cellular ploidy and WGD in single cells using scWGS with the DLP+ protocol. The plot shows the classification of WGD multiplicity in cancer cells (0, 1 or 2 WGDs) using the fraction of the genome with major copy number (CN) ≥ 2 versus the mean allele CN difference; n = 30,260 cells. BAF, B-allele frequency; TCN, total copy number. c, Top, age at diagnosis, mutation signature, BRCA1/BRCA2 mutation status, and WGD class. Middle, distribution of cell ploidy of individual cells for each tumour, coloured by the number of WGDs. Bottom, percentage of WGDs, number of cells per patient, and fraction of cells in the minority WGD multiplicity state. Bottom right, illustrations of cell classifications. d, Heatmaps of total copy number (left) and allelic imbalance (right) for patient OV-045, with predicted WGD multiplicity and site of resection for each cell annotated. The 1×WGD population was downsampled from 1,857 to 200 cells for visualization, and the full 0×WGD and 2×WGD populations, numbering 18 and 44 cells, respectively, are shown. A-Hom, homozygous for haplotype A; A-gained, allelic imbalance with more copies of haplotype A (analogous for haplotype B); Balanced, equal copies of the two haplotypes.

To generate scWGS data, we flow-sorted tumour-derived single-cell suspensions to remove CD45+ immune cells and prepared libraries following the direct library preparation (DLP+) protocol16 (Methods and Supplementary Table 2). Sequencing yielded 100,054 single-cell whole genomes (median, 1,720 per patient) with a median coverage depth of 0.060 and a median coverage breadth of 0.057 per cell (Extended Data Fig. 2a,b and Supplementary Table 3). After extensive quality control, including filtering out non-malignant cells and doublets using the optical components of DLP+, we retained 30,260 high-quality tumour-cell genomes for downstream analysis (Extended Data Fig. 2c,d, Methods, Supplementary Note and Supplementary Table 4). The aggregated copy-number landscape was as expected for HGSOC (Extended Data Fig. 2e) and correlated with clinical panel-based bulk sequencing (Extended Data Fig. 2f) and matched bulk WGS (Extended Data Fig. 2g). From the scWGS data, we inferred the number of WGD events in the evolutionary history of each tumour cell (WGD multiplicity), based on allele-specific copy-number profiles3,17 (Fig. 1b and Extended Data Fig. 2h,i). Per-cell WGD multiplicity correlated with mitochondrial DNA copy number (Extended Data Fig. 2j), fraction of overlapping reads (Extended Data Fig. 2k), and cell size, as measured by the optical components of DLP+ (Extended Data Fig. 2l), providing orthogonal validation based on known correlates of nuclear genome scaling16,18.
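The two per-cell summary statistics used to classify WGD multiplicity (Fig. 1b) can be illustrated with a minimal sketch. It assumes equal-width bins with allele-specific copy number given as (major, minor) pairs, and takes the allele CN difference to be major minus minor. The function name and toy profiles are hypothetical, and no decision thresholds are implied because none are stated here.

```python
from typing import List, Tuple

# One (major_cn, minor_cn) pair per genomic bin; bins are assumed equal width.
Bin = Tuple[int, int]


def wgd_features(bins: List[Bin]) -> Tuple[float, float]:
    """Return (fraction of genome with major CN >= 2, mean allele CN difference).

    Hypothetical helper: the allele CN difference is assumed to be
    major CN minus minor CN, averaged over bins.
    """
    if not bins:
        raise ValueError("cell has no copy-number bins")
    n = len(bins)
    frac_major_ge2 = sum(1 for major, _ in bins if major >= 2) / n
    mean_allele_diff = sum(major - minor for major, minor in bins) / n
    return frac_major_ge2, mean_allele_diff


# Toy profiles (invented for illustration, not patient data).
diploid_like = [(1, 1)] * 8 + [(1, 0)] * 2
doubled_like = [(2, 2)] * 6 + [(2, 1)] * 2 + [(2, 0)] * 2

print(wgd_features(diploid_like))
print(wgd_features(doubled_like))
```

In this toy example the doubled profile has major CN ≥ 2 across the whole genome, whereas the diploid-like profile has none, which is the contrast the classification relies on.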

Ongoing WGD

Intra-patient cellular WGD heterogeneity was pervasive across the cohort, with 40 of 41 patients exhibiting coexisting WGD multiplicities (Fig. 1c and Supplementary Note). For example, patient OV-045 (Fig. 1d) simultaneously had 0×WGD cells (1%; Extended Data Fig. 2m), a majority of 1×WGD cells (97%; Extended Data Fig. 2n) and a small fraction of 2×WGD cells (2%; Extended Data Fig. 2o). In total, 4% of all tumour cells across the cohort (n = 1,213 cells) represented non-majority WGD multiplicities (median of 2.5% of cells per patient; Extended Data Fig. 2p). Mixed WGD multiplicities were observed across sites for 16 out of 21 patients with multi-site sequencing, consistent with WGD as an ongoing process (Supplementary Note). As 39 out of 41 patients’ tumours were dominated by a single WGD multiplicity (more than 85% of cells), we divided tumours into two categories: WGD-high (over 85% of cells had at least 1×WGD; 27 out of 41 patients); or WGD-low (fewer than 15% of cells had at least 1×WGD; 14 out of 41 patients). The two tumours with intermediate (50–85%) proportions of cells having at least 1×WGD were grouped with WGD-high because they had large WGD clones. WGD-high tumours constituted 66% of the cohort, were enriched for FBI and HRD-Del mutation signatures, and occurred in patients who were significantly older at diagnosis, concordant with previous bulk genome sequencing studies14,19 (Extended Data Fig. 2q–t). Thus, the WGD-high fraction is consistent with previous bulk estimates of WGD prevalence across patients17. However, single-cell analysis established that WGD is ubiquitous across patients and exists as a distribution over coexisting 0×WGD, 1×WGD and 2×WGD cells in tumours, congruent with WGD as an ongoing mutational process.
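The tumour-level categories defined above can be written as a short sketch. The 85%, 15% and 50% cut-offs come from the text. The function names are hypothetical, and the "unassigned" label for tumours between 15% and 50% is an assumption, because no tumour in the cohort is described as falling in that range.

```python
from typing import Dict


def fraction_with_wgd(cells_by_multiplicity: Dict[int, int]) -> float:
    """Fraction of cells with at least one WGD (multiplicity >= 1)."""
    total = sum(cells_by_multiplicity.values())
    if total == 0:
        raise ValueError("tumour has no cells")
    doubled = sum(n for m, n in cells_by_multiplicity.items() if m >= 1)
    return doubled / total


def classify_tumour(cells_by_multiplicity: Dict[int, int]) -> str:
    """Apply the WGD-high / WGD-low rules described in the text."""
    frac = fraction_with_wgd(cells_by_multiplicity)
    if frac > 0.85:
        return "WGD-high"
    if frac < 0.15:
        return "WGD-low"
    if frac >= 0.50:
        # Intermediate tumours (50-85%) are grouped with WGD-high.
        return "WGD-high"
    # Assumption: the 15-50% range is not covered by the text.
    return "unassigned"


# Cell counts for OV-045 as given in the Fig. 1d legend.
ov045 = {0: 18, 1: 1857, 2: 44}
print(classify_tumour(ov045))
```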

Evolutionary histories of WGD clones

We next inferred evolutionary histories and WGD timing for each tumour to characterize the role of WGD in HGSOC clonal evolution. We developed doubleTime, a multi-step computational approach that uses somatic single nucleotide variants (SNVs) to estimate the timing of clonal divergence and WGD expansion(s) in each tumour (Fig. 2a, Methods and Supplementary Data 12). After excluding two patients because of technical limitations (OV-024 and OV-125; Methods), we observed four classes of WGD evolution: truncal WGD, parallel WGD, subclonal WGD, and unexpanded WGD. Truncal WGD, defined as a single WGD event ancestral to all cells and an absence of residual 0×WGD cells, was observed in 21 patients (Fig. 2b and Extended Data Fig. 4a; see Fig. 1c, Extended Data Fig. 3 and Supplementary Note for a diagram and analysis of residual 0×WGD cells). Parallel WGD, defined by multiple clones with different ancestral WGD events, was observed in two patients, OV-025 and OV-045 (Fig. 2b,c). Remarkably, for both of these patients, multiple WGD clones coexisted in different anatomical sites. In patient OV-025, all clones were present in both the right adnexa and omentum, and in patient OV-045, the left adnexa harboured one of the three WGD clones, whereas the right adnexa, omentum and peritoneal tumours were mixtures of all three WGD clones. Subclonal WGD, defined by a WGD clone coexisting with 0×WGD cells, was seen in five patients (Fig. 2b; further details below). Unexpanded WGD, defined as the absence of a discernible WGD clone, nevertheless included small populations of 1×WGD cells in all but one of the remaining 11 patients (Fig. 2b and Extended Data Fig. 4a).

Fig. 2: Evolutionary timing of WGD events from SNVs.

a, Schematic of the approach for timing WGDs in SNV clones (Methods). cnLOH, copy-neutral loss of heterozygosity. b, Clone phylogenies and WGD timing for 18 patients (see Extended Data Fig. 4a for another 21 patients). Branch length shows the number of age-associated SNVs (C-to-T at CpG sites) assigned to each branch, adjusted for coverage-depth-related reduction in SNV sensitivity. Expanded WGD events are shown as triangles at the predicted location along WGD branches, coloured by relative timing. Branches are coloured by WGD multiplicity. Bar plots show, for each leaf, the fraction of cells in each WGD multiplicity and the fraction of cells from each anatomical site. OV-045 and OV-075 (starred) each harboured 0×WGD cells not captured in the doubleTime clone tree. The x axis is labelled with the SBMClone clone indices for each leaf. c, Variant allele frequency (VAF) of SNVs in two-copy LOH regions showing support for parallel versus shared WGD for patients OV-045 (left) and OV-025 (right). Each axis shows a different pair of clones, and each SNV is coloured according to its most likely variant copy numbers in the respective clones. SNVs that are assigned variant copy numbers 0/0 (absent from both clones) or 2/1 or 1/2 (inconsistent with the simple CNLOH WGD model) have been omitted. d, Histogram and rug plot showing the sensitivity-adjusted age-associated SNV count for WGD and diagnosis events for WGD-low (top, n = 14 patients) and WGD-high (bottom, n = 25 patients) tumours. Left, diagram showing the two time periods being measured by SNV counts. MRCA, most recent common ancestor. e, Fraction of additional-WGD cells in each clone plotted against the log binomial P value for the test that a clone has a greater fraction of additional-WGD cells than the average additional-WGD fraction across the cohort. Patients with P < 0.01 (dotted line) are annotated.

To refine our understanding of WGD heterogeneity, we timed key evolutionary events in each patient using age-associated C>T CpG mutations20. WGD-high tumours exhibited increased mutation time from conception to surgical resection compared with WGD-low tumours, similar to WGD versus non-WGD patients in previous bulk WGS analyses19. Although WGD events generally occurred early in tumour evolution19, a long tail of late events was also observed (Fig. 2d). In 8 out of 25 WGD-high tumours, the WGD event occurred more than 50% of the way through the tumour’s ancestral branch or after the most recent common ancestor (Fig. 2b,d and Extended Data Fig. 4a). Three of these late-WGD patients harboured residual populations of 0×WGD cells, consistent with a pre-WGD ancestral population coexisting with late-emerging WGD clones: OV-045 had 16 0×WGD cells (0.8%; Extended Data Figs. 3a and 4b), OV-075 had 30 0×WGD cells (3.3%; Extended Data Figs. 3c and 4c) and OV-081 had 216 0×WGD cells (35%; Fig. 2b). We speculate that the lack of residual 0×WGD populations observed in patients with earlier timing may indicate 1×WGD clonal sweeps, and therefore increased fitness associated with WGD in these patients.
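As a rough illustration of how the position of a WGD along a branch can be estimated from SNV copy numbers, the sketch below uses a deliberately simplified model. It is not the doubleTime method. It assumes a two-copy LOH region in which SNVs acquired before the WGD are present on both copies and SNVs acquired afterwards on one copy, a constant mutation rate per DNA copy, and perfect detection sensitivity. The function name and the counts are hypothetical.

```python
def relative_wgd_time(n_two_copy: int, n_one_copy: int) -> float:
    """Estimate the fraction of a branch elapsed before the WGD.

    Simplified model (an assumption, not the doubleTime method):
      - before the WGD the region has one copy, so pre-WGD SNVs end up
        on two copies: n_two_copy is proportional to t_pre;
      - after the WGD mutations arise on either of two copies and stay
        on one copy: n_one_copy is proportional to 2 * t_post.
    Hence t_pre / (t_pre + t_post) = n_two / (n_two + n_one / 2).
    """
    if n_two_copy < 0 or n_one_copy < 0:
        raise ValueError("SNV counts must be non-negative")
    denominator = n_two_copy + n_one_copy / 2
    if denominator == 0:
        raise ValueError("no informative SNVs")
    return n_two_copy / denominator


# Hypothetical counts: 30 two-copy SNVs and 60 one-copy SNVs place the
# WGD halfway along the branch under this model.
print(relative_wgd_time(30, 60))
```

Under this model, a value above 0.5 would correspond to a WGD occurring more than halfway along the branch.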

Additional-WGD cells, those with one more WGD than the majority population (1×WGD in 0×WGD clones and 2×WGD in 1×WGD clones; Fig. 1c), were detected in 37 out of 41 patients, further exemplifying that WGD is ongoing. We investigated whether these additional-WGD cells shared common mutations indicative of clonal expansions (Fig. 2b,e). In patient OV-025, a small clone containing 40 2×WGD cells (and 4 1×WGD cells) harboured 296 clone-specific SNVs (Extended Data Fig. 4d). Subclonal WGD expansions in patients OV-006 (27 cells), OV-031 (7 cells) and OV-139 (17 cells) were too small to be detected by SNV analysis but nevertheless exhibited shared copy-number events across multiple WGD cells (Extended Data Fig. 4e–g). For other patients, unexpanded WGD cells were distributed across multiple clones and anatomical sites: 25 out of 31 patients had additional-WGD cells in multiple clones, and 14 out of 21 patients with multisite scWGS had additional-WGD cells in multiple sites (Extended Data Fig. 4h), indicative of ongoing WGD across clonal populations as a background mutational process.
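The definition of additional-WGD cells and the one-sided binomial test summarized in Fig. 2e can be sketched as follows. The helper names are hypothetical, and the cohort-average fraction of 0.04 is only a placeholder for illustration, not the value used in the analysis.

```python
import math


def is_additional_wgd(cell_wgd: int, majority_wgd: int) -> bool:
    """A cell is additional-WGD if it has one more WGD than the majority."""
    return cell_wgd == majority_wgd + 1


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """One-sided P value: probability of >= k successes in n trials."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )


# OV-025 small clone from the text: 40 2xWGD cells and 4 1xWGD cells.
# The background fraction of 0.04 is a placeholder assumption.
p_value = binomial_upper_tail(40, 44, 0.04)
print(p_value < 0.01)
```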

Post-WGD genomic diversification

We then asked how WGD promotes genomic diversification and evolvability. First, we quantified cell-to-cell genomic heterogeneity using pairwise nearest-neighbour copy-number distance (NND) (Methods and Extended Data Fig. 5a). Mean NND increased with WGD multiplicity and was highest for additional-WGD cells (Fig. 3a). Some WGD-high tumours exhibited surprising levels of cellular diversity: in eight patients, the average difference between each cell and its most similar neighbour was more than 10% of the genome. The empirical distribution of NND values had a heavy tail (Extended Data Fig. 5b) consisting of cells with very distinct copy-number profiles. We therefore defined cells with NND above the 99th percentile of a beta distribution fit as divergent (Fig. 3b and Methods). These divergent cells exhibited substantial chromosome- and arm-level alterations relative to pseudobulk profiles (Fig. 3c and Extended Data Fig. 5c,d), with higher nullisomy rates across all tumours (Extended Data Fig. 5e). Increased nullisomy and lack of clonal expansion, as indicated by each cell’s unique copy-number profile, indicate that these cells have reduced proliferative capacity and decreased fitness, reminiscent of the ‘hopeful monsters’ identified in colorectal cancer organoids21. Divergent cells were present in 38 out of 41 patients (mean, 2.6% of cells), with higher rates in WGD-high tumours (Fig. 3d), and were more frequently additional-WGD cells (Extended Data Fig. 5f). Furthermore, the fraction of divergent cells was highest in late-WGD tumours and decreased with the age of the WGD event(s) (Extended Data Fig. 5g). Overall, these results suggest that expansion of WGD clones coincides with increased rates of catastrophic cell division.
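The nearest-neighbour distance can be illustrated with a minimal sketch, assuming each cell is a vector of total copy number over equal-width bins and that distance is the fraction of bins with a different copy number (as in the Fig. 3a legend). The function names and toy profiles are hypothetical. The beta-distribution fit and the 99th-percentile cut-off used to call divergent cells are omitted.

```python
from typing import List, Sequence


def cn_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of bins at which two cells have a different copy number."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_neighbour_distances(profiles: List[Sequence[int]]) -> List[float]:
    """For each cell, the distance to its most similar other cell."""
    if len(profiles) < 2:
        raise ValueError("need at least two cells")
    return [
        min(cn_distance(p, q) for j, q in enumerate(profiles) if j != i)
        for i, p in enumerate(profiles)
    ]


# Toy profiles over four bins (invented for illustration).
cells = [
    [2, 2, 2, 2],
    [2, 2, 2, 3],
    [2, 2, 2, 2],
    [4, 1, 0, 3],  # differs from every other cell at most bins
]
print(nearest_neighbour_distances(cells))
```

This all-pairs version scales quadratically with the number of cells, so it is suited to illustration rather than to tens of thousands of genomes.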

Fig. 3: Impact of WGD on rates of chromosomal instability at single-cell resolution.

a, Nearest-neighbour distance in each WGD population, where distance is calculated as the fraction of the genome with a different CN. The centre line shows the median, box boundaries show quartiles, and whiskers indicate 1.5 × the interquartile range (IQR). b, QQ plot of the beta distribution fit versus empirical quantiles of NND values for all cells, including divergent cells (greater than the 99th percentile of the beta distribution). c, CN profile of an example divergent cell from OV-004 (top) compared with the pseudobulk CN of all cells for OV-004 (bottom). Each point is a 500-kb bin coloured by assigned CN state, and y axes show normalized read counts. Shaded regions indicate CN differences. d, Fraction of divergent cells. Boxplots are defined as in a. e, Method for inferring cell-specific CN events in non-divergent cells. Chrom., chromosome. f, Ploidy-normalized event counts per cell split by WGD multiplicity and WGD-high versus WGD-low tumour status. Mann–Whitney one-sided U-test significance (FDR corrected) is annotated: *1.0 × 10−2 < P ≤ 5.0 × 10−2, **1.0 × 10−3 < P ≤ 1.0 × 10−2, ***1.0 × 10−4 < P ≤ 1.0 × 10−3, ****P ≤ 1.0 × 10−4. Only significant comparisons are shown. Boxplots are defined as in a. g, High-resolution whole-slide immunofluorescence imaging to detect micronuclei (MN) and primary nuclei (PN), and quantify micronuclei rates. Scale bars, 10 μm. h, Mean primary nuclei area. Significance was calculated using a GEE model with patients as groups, annotated as in f. Boxplots are defined as in a. i, Micronuclei rates per slide. Each point is a tumour region of interest (ROI). Bar plots show total number of cGAS+ micronuclei across tumour ROIs (top) and total number of primary nuclei across tumour ROIs (bottom). Small tumour ROIs (fewer than 103 primary nuclei) have been excluded. Shaded boxplots indicate patients highlighted in Extended Data Fig. 5j. Boxplots are defined as in a. j, Micronuclei rate per slide. Significance was calculated using a GEE model with patients as groups, annotated as in f.

To study post-WGD diversification in non-divergent cells, we computed cell-specific copy-number aberrations (CNAs) accrued since each cell’s immediate ancestor in a phylogenetic tree (Fig. 3e, Extended Data Fig. 5h and Methods). Per-cell rates of gains and losses affecting whole chromosomes, chromosome arms, and segments (>15 Mb) increased with WGD multiplicity for all event types. Rates normalized to account for genome size yielded the same trend, indicating that rate differences were not entirely attributable to increased chromosome number, but rather were indicative of increased systemic instability after the WGD (Fig. 3f and Methods). For instance, ploidy-adjusted chromosome (2.6-fold) and arm (2.4-fold) losses were more abundant in WGD-high 1×WGD cells than in WGD-low 0×WGD cells (P = 1.2 × 10−2 and P = 2.3 × 10−3, Mann-Whitney U-test, FDR adjusted). Chromosome and arm gains both exhibited 2.3-fold increases (P = 2.1 × 10−2 and P = 6.8 × 10−3, Mann-Whitney U-test, FDR adjusted). In a multivariate generalized estimating equations (GEE) model accounting for covariates (patient age, mutation signature and anatomical site), chromosome, arm, and segmental alterations remained significantly associated with WGD (Extended Data Fig. 5i).
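The idea of normalizing event rates for genome size can be shown with a small sketch. It assumes the simplest possible normalization, dividing each cell's event count by its ploidy. The exact normalization is described in the Methods and may differ. The function names and counts are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# (cell-specific event count, cell ploidy) per cell.
Cell = Tuple[int, float]


def ploidy_normalized_rate(event_count: int, ploidy: float) -> float:
    """Events per unit of ploidy (an assumed normalization)."""
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    return event_count / ploidy


def fold_change(group_a: List[Cell], group_b: List[Cell]) -> float:
    """Ratio of mean ploidy-normalized rates between two groups of cells."""
    rate_a = mean(ploidy_normalized_rate(e, p) for e, p in group_a)
    rate_b = mean(ploidy_normalized_rate(e, p) for e, p in group_b)
    return rate_a / rate_b


# Invented counts for illustration only.
doubled_cells = [(6, 3.0), (3, 3.0)]
undoubled_cells = [(1, 2.0), (2, 2.0)]
print(fold_change(doubled_cells, undoubled_cells))
```

A fold change above 1 after this adjustment indicates that the excess of events is not explained by chromosome number alone.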

We next sought to validate increased CNA rates in WGD populations through immunofluorescence quantification of cGAS+ ruptured micronuclei. Missegregated chromosomes can become encapsulated in micronuclei, which are structures that have aberrant, rupture-prone nuclear envelopes. Ruptured micronuclei release genomic double-stranded DNA (dsDNA) into the cytoplasm22,23,24, resulting in activation of innate immune signalling driven by the cytosolic dsDNA-sensing pathway cGAS-STING25. Thus, we reasoned that cGAS expression can act as an orthogonal in situ marker of missegregation. We performed multiplexed immunofluorescence on formalin-fixed and paraffin-embedded (FFPE) sections (measuring DAPI, cGAS, panCK, CD8, p53 and STING), using high-resolution whole-slide microscopy imaging. We used a deep-learning approach to perform whole-slide quantification of primary nuclei and cGAS+ ruptured micronuclei. From 102 quality-filtered slides spanning 37 patients, we detected 20,988,413 primary nuclei and 896,042 ruptured micronuclei (Fig. 3g and Methods). Tumour cell nuclear area was significantly larger in WGD-high than in WGD-low tumours (P = 3.5 × 10−7; Fig. 3h), further supporting biophysical correlates of WGD. The micronuclei rate, computed as the number of ruptured cGAS+ micronuclei per primary nucleus in tumour regions, ranged from 0.001 to 0.543 across regions of interest (Fig. 3i and Supplementary Tables 5 and 6). Within-patient variation was also observed, reflective of spatially heterogeneous micronuclei rates across tissues. Importantly, the micronuclei rate was 3.3-fold higher in WGD-high tumours (P = 1.8 × 10−6; Fig. 3j and exemplar regions in Extended Data Fig. 5j and Methods), providing further evidence, orthogonal to scWGS, that WGD significantly impacts CIN.
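The micronuclei rate described above is a simple ratio and can be sketched per region of interest. The function name and example ROI counts are hypothetical. The minimum ROI size is left as a required parameter rather than fixed to a particular value. The final ratio uses the slide-wide totals quoted in the text, which is an aggregate and not the per-ROI rate reported in Fig. 3i.

```python
from typing import Iterable, List, Tuple


def micronuclei_rates(
    rois: Iterable[Tuple[int, int]], min_primary_nuclei: int
) -> List[float]:
    """Ruptured cGAS+ micronuclei per primary nucleus for each ROI.

    Each ROI is (n_ruptured_micronuclei, n_primary_nuclei). ROIs with
    fewer primary nuclei than min_primary_nuclei are skipped.
    """
    if min_primary_nuclei < 1:
        raise ValueError("min_primary_nuclei must be at least 1")
    rates = []
    for n_micronuclei, n_primary in rois:
        if n_primary < min_primary_nuclei:
            continue  # ROI too small to give a stable rate
        rates.append(n_micronuclei / n_primary)
    return rates


# Invented ROI counts; the threshold of 100 is for illustration only.
example_rois = [(50, 1000), (3, 20), (120, 4000)]
print(micronuclei_rates(example_rois, min_primary_nuclei=100))

# Aggregate ratio from the totals quoted in the text.
overall = 896_042 / 20_988_413
print(round(overall, 4))
```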

Taken together, multiple forms of CIN, including chromosomal missegregations, catastrophic mitoses, and ruptured micronuclei, exhibited elevated rates in WGD cells, firmly linking WGD to increased CIN and cellular genomic diversification in HGSOC.

Evolvability of WGD clones

Given the increased CIN associated with WGD, we next used scWGS-based phylogenies to investigate the impact of this instability on tumour evolution (Fig. 4a and Methods). We categorized CNA events on ancestral (root) branches into those inferred to occur after WGD in the ancestral branches of WGD-high tumours (post-WGD), before WGD in ancestral branches of WGD-high tumours (pre-WGD) or on the ancestral branches of WGD-low tumours (non-WGD). Ancestral gains of chromosomes and arms were rare in general, although chromosome gains were significantly more numerous post-WGD than pre-WGD, similar to previous results26 (Fig. 4b). By contrast, losses of chromosomes and arms were an order of magnitude more frequent than gains in all contexts. The ratio of losses to gains on ancestral branches was also significantly higher than the same ratio computed for cell-specific event rates (Extended Data Fig. 5k). These results, together with simulation experiments27 (Supplementary Note), indicate that the commonly observed pseudo-triploid karyotypes in HGSOC are unlikely to arise through incremental gains on a diploid background, and instead arise from WGD and both pre-WGD and post-WGD losses.

Fig. 4: Modes of evolution post WGD.

a, Pre- and post-WGD events illustrated for the ancestral branch of patient OV-044. Top, CN profile of the inferred ancestral non-WGD clone. Bottom, CN profile of the WGD clone. The plots in between show the CN changes (positive indicating gains, negative indicating losses) inferred to be pre-WGD and post-WGD, as illustrated on the left. b, Counts of ancestral arm and chromosome events detected across the cohort for non-WGD ancestral branches of WGD-low tumours, and pre- and post-WGD branches for WGD-high tumours. Bars and 95% confidence intervals show the distribution of counts on the given type of branch. Mann–Whitney U-test significance (FDR corrected) is annotated as: *1.0 × 10−2 < P ≤ 5.0 × 10−2, **1.0 × 10−3 < P ≤ 1.0 × 10−2, ***1.0 × 10−4 < P ≤ 1.0 × 10−3, ****P ≤ 1.0 × 10−4. Only significant comparisons are shown. c, Bar plots show counts of arm and chromosome events occurring post-WGD for all high-confidence clonal and subclonal WGD events detected across the cohort, split by clonality of the WGD (cell fraction threshold, 0.99). Bars and 95% confidence intervals show the distribution of counts on the root branch of the given type of WGD. Each bar indicates a clone that is labelled below and annotated above with the number of WGD events ancestral to the clone, as well as its clonality. The bottom bar plots show the fraction of cells from each patient that the clone represents. d, Boxplots summarizing c, annotated with FDR-corrected significance (Mann–Whitney U-test) as in b. NS, not significant.

To determine whether post-WGD losses were the result of immediate post-WGD instability (for example, divergent cells) or the accumulation of gradual losses, we analysed chromosome and arm CNAs in truncal and subclonal WGD clones (Fig. 4c). Truncal WGD clones harboured significantly more alterations than subclonal WGD clones (Fig. 4d), including three times as many whole chromosome and arm losses. The number of post-WGD events for some subclonal WGD clones was surprisingly low, and rarely (only one clone in OV-025) exceeded the average number of post-WGD events calculated for divergent cells (Fig. 4c). For example, the WGD clone in patient OV-081 (64% of cells) exhibited only two arm losses post-WGD compared with an average of 8.6 chromosome or arm events for divergent cells. For truncal WGD clones, the number of chromosome and arm losses was significantly correlated with the age of the WGD as measured by the number of C>T CpG mutations occurring from the WGD to the time of sample collection (Methods and Extended Data Fig. 5l). These results support a fitness model in which WGD cells are more likely to expand if they gradually accumulate post-WGD losses, rather than experience the large-scale alterations observed in divergent cells.

WGD and cellular phenotypes

Finally, we studied the phenotypic impact of WGD on cancer-cell-intrinsic, stromal, and immune cell transcriptional states using previously published patient- and site-matched scRNA-seq data14. We sought to determine whether WGD-specific phenotypic associations were independent of previously discovered links between mutation signatures, cellular states, and immune evasion in HGSOC14. We first focused on how WGD and CIN affect the cell cycle in cancer cells. WGD-high tumours exhibited a lower proportion of S-phase cells and a higher proportion of G1-phase cells, both cohort-wide and within the HRD-Dup subset (Extended Data Fig. 6a,b and Methods). Similarly, pseudotime inference of cell-cycle trajectories revealed distinct disruptions to cell-cycle progression in WGD-high versus WGD-low tumours (Extended Data Fig. 6c and Methods). In particular, MCM-complex genes involved in licensing of DNA replication origins at the G1/S transition (MCM2 and MCM6) were expressed earlier in the cell cycle in WGD-high tumours, together with factors involved in MCM-complex loading, such as CDC6 (Extended Data Fig. 6c), likely facilitating the replication of larger genomes. The G1/S cyclin gene CCNE1 and genes involved in DNA repair (BRCA2 and MSH2) also had altered temporal order. Investigating differential responses to CIN, we found that the expression of E2F target genes showed strong negative correlation with chromosome losses in WGD-low tumours (Spearman’s ρ = −0.64, P = 0.015; Extended Data Fig. 6d) and an absence of correlation in WGD-high tumours. Furthermore, the fraction of cells in G1 was correlated with rates of chromosome losses in WGD-low tumours (Spearman’s ρ = 0.64, P = 0.016; Extended Data Fig. 6e), but not in WGD-high tumours. Thus, WGD and CIN were each associated with altered cell-cycle dynamics, including delayed progression through G1 (refs. 28,29).

Next we investigated CIN-dependent activation of innate immunity in cancer cells. CIN transcriptional phenotypes30 were significantly higher in WGD-high tumours (Fig. 5a), as expected given the CIN increases observed by means of scWGS and immunofluorescence. Nevertheless, WGD-high tumours showed a significant decrease in type I (IFNα and IFNβ) and type II (IFNγ) interferon, inflammatory pathways, and TNF via NF-κB signalling, relative to WGD-low tumours. The decrease was statistically significant for the cohort as a whole (Fig. 5a) and for the HRD-Dup subset (Extended Data Fig. 7a), with similar trends for the FBI subset (Extended Data Fig. 7b), indicating that the effect of WGD on cell-intrinsic immuno-phenotypic signalling may be independent of mutation signature. Interestingly, scWGS-derived rates of chromosome, arm, and segmental losses were positively correlated with immune-related expression programs in WGD-low tumours but not in WGD-high tumours (Extended Data Fig. 7c). This indicates that the innate immune response to CIN may be preserved in WGD-low tumours and abrogated in WGD-high tumours. Repression of STING1, an innate immune response gene activated by the presence of cytosolic DNA, is a well-established mechanism for evasion of the immunostimulatory effects of CIN31,32,33,34,35,36. STING1 was expressed at significantly lower levels in WGD-high tumours (Fig. 5b), whereas in WGD-low tumours, STING1 expression was positively correlated with rates of missegregation, especially chromosome losses (Spearman’s ρ = 0.75, P = 0.003; Fig. 5c). This finding was confirmed by the immunofluorescence measurements, which also showed a decrease in STING1 protein in WGD-high tumours (Fig. 5d,e). Similarly, STING1 protein was weakly correlated with micronuclei rate in WGD-low tumours, whereas in WGD-high tumours, STING1 exhibited a negative correlation with micronuclei rate (Fig. 5f). These results support a model in which WGD-high tumours adapt to increased rates of genomic and chromosomal instability by transcriptional remodelling of interferon signalling response pathways, including repression of STING1 (ref. 37).

Fig. 5: Tumour-cell phenotypes and microenvironment remodelling in the context of WGD.

a, Scatter plot depicting GEE regression coefficients versus Benjamini–Hochberg-adjusted P values for selected genes and pathways in WGD-high and WGD-low tumour cells. MHC, major histocompatibility complex. b, Per-sample mean gene expression of STING1 in WGD-high (n = 63) and WGD-low (n = 34) samples. Centre line shows the median, box boundaries show quartiles and whiskers indicate 1.5 × IQR. Significance calculated using two-sided Wilcoxon rank sum test is included. c, Scatter plot of STING1 gene expression versus rate (counts per cell) of chromosomal losses, split by WGD-low and WGD-high (colours). Lines indicate the result of a linear regression in either WGD-high or WGD-low tumours. Regression coefficients and significance results are shown separately for WGD-low and WGD-high tumours. d, Example immunofluorescence images of WGD-high and WGD-low tumour samples with varying STING1 expression. Top, multichannel overlay images of STING1, panCK, DAPI and cGAS intensity at high magnification (scale bars, 125 μm). Bottom, zoomed insets (locations indicated by white boxes in the top panels; scale bars, 15 μm). e, Boxplots showing distribution of per-sample mean STING1 immunofluorescence intensity over tumour cells for WGD-high and WGD-low samples. Box plots are defined as in b. Significance calculated using a GEE model is included. f, Scatter plot and density estimation of STING1 versus micronuclei rate for 1 mm × 1 mm tiles in tumour ROIs. Points, density contours and coefficients, and P values of a generalized linear model are coloured by WGD-high and WGD-low tumour status. g, Differential cell-type abundance testing results from Milo with permutation testing (Methods) for cell types in WGD-high versus WGD-low samples. h, Normalized enrichment scores (NES) in the interferon pathway for cell types in the tumour microenvironment. CAF, cancer-associated fibroblasts; cDC1, conventional type 1 DCs; DCs, dendritic cells; EC, endothelial cells; NK, natural killer; pDC, plasmacytoid DCs. i, NES in the cell-cycle pathway for cell types in the tumour microenvironment.

To validate the cell-intrinsic impacts of WGD in vitro, we used TP53 mutant hTERT-immortalized retinal pigment epithelial (RPE-1) and TP53 mutant fallopian tube epithelial (FNE1) cell lines. In each cell line, a distinct, spontaneously arising WGD clone was observed by scWGS and could be identified in scATAC-seq and scRNA-seq using clone-specific chromosome- and arm-level copy-number events (Methods and Extended Data Fig. 8a–c). We first studied how non-WGD cells from early, predominantly diploid passages of each cell line responded to the CIN-inducing drugs nocodazole and reversine. Treatment was associated with increased chromosome and arm losses and gains (Extended Data Fig. 8d) and a concomitant rise in G1 cell fraction and mean STING1 expression (Extended Data Fig. 8e,f). Untreated later passages of each cell line harboured an almost-equal mixture of WGD and non-WGD cells, allowing robust identification of WGD-specific transcriptional programs. In these mixed-WGD samples, WGD cells did not exhibit an increased G1 cell fraction (Extended Data Fig. 8g), despite increased rates of copy-number events (Extended Data Fig. 8d). However, STING1 expression was lower in WGD cells than in non-WGD cells in mixed-WGD samples and across treatment conditions in early-passage RPE-1 samples (Extended Data Fig. 8f). Together, these in vitro data indicate that WGD-induced STING1 downregulation can occur independently of the tumour immune microenvironment.

Finally, we profiled the composition of cell states in the tumour immune microenvironments of the patient tumours. We found enrichment of CXCL10+CD274+ macrophages (M2.CXCL10), and IFN-producing plasmacytoid and activated dendritic cells in WGD-low tumours in cohort-wide (Fig. 5g and Extended Data Fig. 9a) and HRD-Dup-specific analyses (Extended Data Fig. 9b). All the main cell types had significant enrichment of ISGs in WGD-low tumours, indicating a pro-inflammatory immune response (Fig. 5h). By contrast, WGD-high tumours showed enrichment for endothelial cells, pericytes, and cancer-associated fibroblasts (Fig. 5g), along with ISG suppression. WGD-high tumours also showed slight enrichment of cytotoxic CD8+ T cells, possibly because of mutual exclusivity between cytotoxic CD8+ T cells and CXCL10+CD274+ macrophages across the cohort (Extended Data Fig. 9c). Notably, all the main cell types in WGD-high tumours (except for endothelial cells) exhibited marked depletion in cell-cycle-related gene expression, consistent with a pro-angiogenic yet immunosuppressive microenvironment in WGD tumours (Fig. 5i).

Discussion

We used scWGS matched with scRNA-seq and tissue-based immunofluorescence quantification of ruptured micronuclei to reveal the impact of WGD on tumour evolvability and phenotypic states in HGSOC. Using doubleTime to infer the evolutionary histories and timing of WGD revealed a complex role for WGD in HGSOC and context-dependent selection of WGD clones. More than half of the tumours in our cohort harboured a truncal WGD event, with the timing ranging from very early to late, indicating that WGD cells can expand across the evolutionary continuum. In a subset of patients, we observed partial expansion of recently emerged late-WGD clones coexisting with populations of residual 0×WGD cells, indicating that there was active selection at the time of tumour resection. The absence of residual 0×WGD cells in early WGD cases is consistent with 1×WGD clonal sweeps, underscoring the positive selective advantage that WGD confers in ovarian cancer. Intriguingly, in tumours in which we observed parallel WGD events, these events occurred at approximately the same time in the tumour’s evolutionary history. This could indicate that cell-extrinsic promotion factors led to a WGD-permissive state in these patients, enabling the simultaneous expansion of distinct WGD subclones. In WGD-low tumours, the small fractions of cells generated by ongoing WGD indicate that fixation of WGD is not limited by the event rate, but rather by tumour contexts that are permissive of WGD expansion, raising the crucial question of which cell-intrinsic and microenvironmental factors modulate the selection of WGD in HGSOC.

The relationship between WGD and genomic diversification is evident: we found ubiquitous minor populations that have undergone additional doublings, an increased rate of cell-specific aneuploidies post-WGD, and profoundly divergent cells38. Analysis of tumour-derived single-cell data allowed measurement of CNA rates much closer to the true underlying rate of CNA in patient tumours than is possible with bulk sequencing methods. Although DLP+ sequences live cells and may miss deleterious CNAs in non-viable cells, we observed cells with large regions of homozygous deletions, indicating that we nevertheless did capture part of the non-viable population. The existence of cells with highly divergent genomes is indicative of punctuated copy-number evolution39,40,41,42,43 as a mechanism for generating the extensive losses seen in some WGD clones. However, analysis of both truncal and subclonal WGD indicates that gradual losses, rather than punctuated evolution, shape the post-WGD evolution of many WGD clones, which simultaneously requires adaptation and tolerance for the high CIN levels associated with WGD. Despite elevated CIN, WGD-high tumours showed decreased cell-intrinsic and cell-extrinsic interferon signalling and a pro-angiogenic, immunosuppressive tumour microenvironment, consistent with previous findings on chronic CIN-induced immune suppression37,44. The disrupted correlation between CIN and STING1 in WGD-high tumours implicates STING1 transcriptional repression as a prerequisite for the clonal expansion of WGD. Given the very early timing of WGD in some patients, our results also prompt further investigation of STING1 repression as an early event that may precede WGD in the evolutionary history of some HGSOC tumours. Studying WGD and cGAS-STING in the context of serous tubal intraepithelial carcinoma (STIC) precursor lesions45,46 could yield important insights into how WGD and cGAS-STING modulation contribute to tumorigenesis in HGSOC.

Our data introduce a critical covariate for therapeutic stratification of patients: nearly every tumour harbours WGD cells with co-existing multiplicities. Even with the modest cohort size presented here, we anticipate that studying how WGD clones affect responsiveness to HRD-stratified PARP inhibitors, or to anti-angiogenic therapies such as bevacizumab, will advance the rational administration of therapeutic strategies for HGSOC47,48. Intriguingly, the genomic and phenotypic consequences of WGD were evident even within HRD subtypes, indicating the potential for composite biomarkers involving mutational process and WGD to stratify patients. Moreover, given that emerging approaches targeting the WGD process itself and/or the downstream consequences of CIN49,50,51 are in early phase clinical trials, we anticipate that further insight into WGD evolutionary dynamics will be required to interpret the efficacy and durability of response. The relevance of our findings to other tumour types remains unclear, although in vitro12, breast patient-derived xenograft models13 and pancreatic cancer mouse7 studies indicate that ongoing WGD dynamics may be pervasive across TP53 mutant cancers. Thus, future studies should prioritize investigating how the evolutionary dynamics of ongoing WGD affect therapeutic responses52 across tumour types.

Methods

Experimental methods

Sample collection

All the enrolled patients were consented to an institutional biospecimen banking protocol and MSK-IMPACT testing53, and all analyses were performed per a biospecimen research protocol. All protocols were approved by the Institutional Review Board (IRB) of Memorial Sloan Kettering Cancer Center. Patients were consented following the IRB-approved standard operating procedures for informed consent. Written informed consent was obtained from all patients before conducting any study-related procedures. The study was conducted in accordance with the Declaration of Helsinki and the Good Clinical Practice guidelines (GCP).

We collected fresh tumour tissues from 41 HGSOC patients at the time of up-front diagnostic laparoscopic or debulking surgery. Ascites and tumour tissue from multiple metastatic sites, including bilateral adnexa, omentum, pelvic peritoneum, bilateral upper quadrants and bowel, were procured in a predetermined, systematic fashion (a median of four primary and metastatic tissues per patient) and were placed in cold RPMI for immediate processing. Blood samples were collected before surgery for the isolation of peripheral blood mononuclear cells (PBMCs) for normal whole-genome sequencing (WGS). The isolated cells were frozen and stored at −80 °C. Tissue was also snap-frozen for bulk DNA extraction and tumour WGS, and formalin-fixed and paraffin-embedded (FFPE) for histological, immunohistochemical and multiplex immunophenotypic characterization.

Sample processing

We profiled patient samples using five different experimental assays:

  1. Viably frozen single-cell suspensions were derived from fresh tissue samples and processed for scWGS of 70 sites from 41 patients (mean of 1,429 cells per site; Supplementary Table 3). CD45− cells were flow-sorted in samples with low tumour purity.

  2. CD45+ and CD45− flow-sorted cells from previously reported fresh tissue samples were processed for scRNA-seq of 123 sites from 32 patients (about 6,000 cells per site).

  3. For each specimen with scWGS and/or scRNA-seq, site-matched FFPE tissue sections were stained by multiplexed immunofluorescence for micronuclei and DNA-sensing mechanisms, together with adjacent sections used for whole-slide haematoxylin and eosin (H&E) staining (102 tissue samples from 37 patients).

  4. FDA-approved clinical sequencing of 468 cancer genes (MSK-IMPACT) was obtained on DNA extracted from FFPE tumour and matched normal blood specimens for each patient (Extended Data Fig. 1b).

  5. Snap-frozen tissues were processed to obtain matched tumour-normal bulk WGS on a single representative site from 33 of 41 patients with scWGS, scRNA-seq and immunofluorescence, to derive mutational processes from genome-wide single-nucleotide and structural variants.

Single-cell DNA sequencing

Tissue dissociation

Tumour tissue was immediately processed for tissue dissociation. Fresh tissue was cut into 1-mm pieces and dissociated at 37 °C using a human tumour dissociation kit (Miltenyi Biotec) on a gentleMACS Octo Dissociator. After dissociation, single-cell suspensions were filtered and washed with ammonium-chloride-potassium (ACK) lysing buffer. Cells were stained with Trypan blue, and cell counts and viability were assessed using a Countess II automated cell counter (ThermoFisher). For a detailed protocol, see ref. 54. Freshly dissociated cells were processed for scRNA-seq as described previously14. Viably frozen dissociated cells were stored for scWGS.

Cell sorting

Viably frozen dissociated cells used for scWGS were thawed and then stained with a mixture of GhostRed780 live/dead marker (TonBo Biosciences) and Human TruStain FcX Fc receptor blocking solution (BioLegend). For samples with low tumour purity, the stained samples were then optionally incubated and stained with Alexa Fluor 700 anti-human CD45 antibody (BioLegend). After staining, they were washed and resuspended in RPMI plus 2% FCS and submitted for cell sorting. The cells were sorted into CD45-positive and CD45-negative fractions by fluorescence-activated cell sorting on a BD FACSAria III flow cytometer (BD Biosciences). Positive and negative controls were prepared and used to set up compensations on the flow cytometer. Cells were sorted into tubes containing RPMI plus 2% FCS for sequencing.

Library preparation and sequencing

Single-cell whole-genome library preparation was done as described previously16. In brief, single cells were dispensed into nanowells with protease (Qiagen) and DirectPCR cell lysis reagent (Viagen). After overnight incubation, cells were subjected to heat lysis and protease inactivation followed by tagmentation in a tagmentation mix (14.335 nl TD buffer, 3.5 nl TDE1 and 0.165 nl 10% Tween-20) at 55 °C for 10 min. When the tagmentation reaction was neutralized, eight cycles of PCR followed. The indexed single-cell libraries were recovered from the nanowells by centrifugation into a pool and sequenced at the MSKCC Integrated Genomics Core on an Illumina NovaSeq 6000 (paired-end 150-base pair reads).

Immunofluorescence

Overview

We profiled matched FFPE tissues by immunofluorescence to quantify the rate of micronuclei formation in tumours using a six-colour assay (DAPI, cGAS, STING, p53, panCK and CD8). Immunofluorescence detection was done at the Molecular Cytology Core Facility of Memorial Sloan Kettering Cancer Center using a Discovery XT processor (Ventana Medical Systems, Roche-AZ). Antigen retrieval was done using ULTRA Cell Conditioning (Ventana Medical Systems, 950-224). The tissue sections were blocked first for 30 min in background blocking reagent (Innovex, NB306). Multiplex assay antibodies and conditions are described in Supplementary Table 6.

Tissue staining

Automated multiplex immunofluorescence was done using a Leica Bond BX staining system. Paraffin-embedded tissues were sectioned at 5 μm and baked at 58 °C for 1 h. Slides were loaded into the Leica Bond and immunofluorescence staining was done as follows. Samples were dewaxed at 72 °C before being pretreated with EDTA-based epitope retrieval ER2 solution (Leica, AR9640) for 20 min at 100 °C. The 5-plex antibody staining and detection was done sequentially. The primary antibody against cGAS (1.25 μg ml−1, rb, CST, 7997), STING (0.075 μg ml−1, rb, CST, 13647), p53 (0.005 μg ml−1, rb, Abcam, ab32389), panCK (ms, 1:500, DAKO, M3515) or CD8 (rb, Ventana, 1:40) was incubated for 1 h at room temperature followed by application of Leica Bond polymer anti-rabbit HRP secondary antibody (included in the Polymer Refine detection kit (Leica, DS9800)) for 8 min at room temperature. For the mouse primary antibody, the rabbit anti-mouse linker (Leica Bond post-primary reagent included in Polymer Refine detection kit (Leica, DS9800)) was incubated for 8 min before the application of Leica Bond polymer anti-rabbit HRP. After that, Alexa Fluor tyramide signal amplification reagents (Life Technologies, B40953, B40958) or CF dye tyramide conjugates (Biotium, 92172, 96053, 92174) were used for detection. After each round of immunofluorescence staining, epitope retrieval was done for denaturation of primary and secondary antibodies before another primary antibody was applied. When the run was finished, slides were washed in PBS and incubated in 5 μg ml−1 4′,6-diamidino-2-phenylindole (DAPI) (Sigma Aldrich) in PBS for 5 min, rinsed in PBS and mounted in Mowiol 4–88 (Calbiochem). Slides were kept overnight at −20 °C before imaging.

RPE-1 cell-line experiments

We explored the phenotypic effects of chromosomal instability and WGD in TP53-knockout RPE-1 cells. TP53-knockout RPE-1 was a gift from the Maciejowski laboratory at the Memorial Sloan Kettering Cancer Center (MSKCC). RPE-1 cells were cultured in DMEM (Corning) supplemented with 10% fetal bovine serum (Sigma-Aldrich), 1% penicillin-streptomycin (Thermo Fisher) at 37 °C and 5% CO2. All cells were periodically tested for mycoplasma contamination.

TP53−/− RPE-1 cells were treated with nocodazole, reversine or a DMSO control to induce varying levels of chromosomal instability, then subjected to both 10x multiome sequencing and scWGS using DLP+ (Supplementary Table 7). For nocodazole treatment, RPE-1 cells were seeded to reach 20% confluence at the time of nocodazole addition. Cells were treated with 100 ng ml−1 nocodazole (Sigma-Aldrich) or DMSO for 8 h. After 8 h, cells were washed three times with PBS to remove the drug. After 48 h, the cells were collected. For reversine (Cayman Chemical Company) treatment, cells were treated at a concentration of 0.5 µM reversine for 48 h. After 48 h, cells were washed three times with PBS to remove the drug. Cells were collected after 12 h. We collected 10,000 cells per condition for 10x Genomics Chromium Single Cell Multiome ATAC+ gene expression according to the manufacturer’s protocol. Library preparation and sequencing were done in the MSKCC Integrated Genomics Core. We subjected 1 million matched cells per condition to scWGS DLP+ as described above.

A spontaneously arising WGD subclone was observed as a minor population of TP53-knockout RPE-1 cells (Extended Data Fig. 8a). The relative fraction of this WGD population was monitored by DNA FISH every 5 passages. After 30 further passages (sample RPE-1 mixed), the WGD subclone, as measured by DLP+, comprised 37% of the population. Sample RPE-1 mixed was subjected to scWGS DLP+ and 10x scRNA-seq.

FNE1 cell-line experiments

FNE1 cells were a gift from Tan Ince. Cells were cultured in FOMI (US Biological Life Science, 506388.500) at 5% O2 and 5% CO2 at 37 °C, as described previously55. All cells were periodically tested for mycoplasma contamination. TP53 knockout was performed by electroporation (Lonza 4D nucleofector) of a ribonucleoprotein complex of Alt-R Cas9 (IDT 1081058) and the guide sequence mC*mC*mA* rUrUrG rUrUrC rArArU rArUrC rGrUrC rCrGrG rUrUrU rUrArG rArGrC rUrArG rArArA rUrArG rCrArA rGrUrU rArArA rArUrA rArGrG rCrUrA rGrUrC rCrGrU rUrArU rCrArA rCrUrU rGrArA rArArA rGrUrG rGrCrA rCrCrG rArGrU rCrGrG rUrGrC mU*mU*mU* rU. Cells were treated with 10 µM nutlin-3a to select for TP53-deficient cells for one week; at that time point, cells treated with a control guide were no longer proliferating. Loss of p53 was also confirmed by sequencing. For reversine (Cayman Chemical Company) treatment, early passage cells were treated at a concentration of 0.25 µM reversine for 48 h. After 48 h, cells were washed three times with PBS to remove the drug. Cells were collected after 12 h. We collected 20,000 cells for each condition. Cells were passaged and monitored for the emergence of a WGD population by DNA FISH as described above. By passage 15 after TP53 loss, nearly 50% of the cells were polyploid, as quantified by DNA FISH.

Monitoring for WGD using DNA FISH

After every five passages, cells were frozen and assessed for WGD using DNA FISH. In brief, cells were pelleted and incubated in 5 ml 75 mM KCl for 15–30 min. Cells were subsequently washed two times in ice-cold 3:1 methanol:glacial acetic acid solution. Cells were then spotted on a slide and dried overnight at 37 °C. Slides were washed twice in 2× SSC for 2 min each, then dehydrated sequentially in 70%, 85% and 100% ethanol, and air-dried for 2 min. FISH probes (MetaSystems, D-6008-100-OG) were applied to cells on glass slides, sealed with a coverslip using rubber cement and co-denatured with the samples at 72 °C for 5 min. After denaturation, hybridization was performed overnight at 37 °C in a humidified chamber. After hybridization, slides were washed in 2× SSC three times for 2 min each, rinsed in PBS, counterstained with DAPI and dehydrated in 70%, 85% and 100% ethanol before being mounted in ProLong Gold antifade solution. Quantification of tetraploid cells was performed on a Zeiss LSM880 (Carl Zeiss Microscopy) using a Plan-Apochromat 63×/1.4 NA oil objective lens.

Computational methods

Computational analyses of multimodal datasets were enabled by the Isabl platform56.

Single-cell DNA sequencing

Overview

The single-cell DNA analysis pipeline is a suite of workflows for analysing the single-cell data generated by the DLP+ platform16. The workflow takes dual-indexed reads from Illumina paired-end sequencing data as the input and performs various alignment and postprocessing tasks. The pipeline, which we ran within the Isabl framework56, is publicly available on GitHub (https://github.com/mondrian-scwgs/mondrian).

Alignment

We used Trim Galore to remove adapters and FastQC to generate QC reports before running alignment. The reads were then aligned with bwa-mem v0.7.17 (ref. 57) (with support for bwa-aln). PCR duplicates were marked using Picard v.2.27.4 with the MarkDuplicates tool, and alignment metrics were computed for each cell with the Picard tools CollectWgsMetrics and CollectInsertSizeMetrics. The pipeline also generated plots for each alignment metric for a quick overview.

Copy-number segmentation

Reads were tabulated for non-overlapping 500-kilobase regions. A modal regression normalization16 was performed to reduce GC bias. The pipeline then ran HMMcopy with six different ploidy settings and the best fit was chosen automatically58. The pipeline also generated heatmaps with cell clustering, per-cell copy-number profiles and the modal regression curve for visualization.
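As an illustration of the read-tabulation step only, the following minimal sketch counts reads in non-overlapping 500-kilobase bins. The function name `tabulate_reads` and the choice to assign each read to a bin by its start coordinate are assumptions made for this example; the pipeline's actual binning rules, GC correction and HMMcopy fitting are not reproduced here.

```python
from collections import Counter

BIN_SIZE = 500_000  # non-overlapping 500-kilobase bins, as stated in the text


def tabulate_reads(read_starts):
    """Count reads per (chromosome, bin index).

    read_starts: iterable of (chromosome, 0-based start position).
    Assumption: a read belongs to the bin that contains its start coordinate.
    """
    counts = Counter()
    for chrom, start in read_starts:
        counts[(chrom, start // BIN_SIZE)] += 1
    return counts


# Hypothetical reads for illustration only.
example_counts = tabulate_reads(
    [("chr1", 0), ("chr1", 499_999), ("chr1", 500_000), ("chr2", 10)]
)
```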

Quality control

The scWGS data were first subjected to quality control and filtering to remove non-cancer cells, S-phase replicating cells, low-quality cells, and doublets, resulting in 30,260 high-quality cancer-cell genomes (Extended Data Fig. 2c,d and Supplementary Note). The quality-control pipeline compiled the results from the total copy-number analysis and alignment, and we then used a random forest classifier to predict the quality of each cell based on the alignment and HMMcopy metrics16. We then inferred allele-specific copy-number profiles for each of these cells using SIGNALS13. Patient-level average ploidy ranged from 1.6 to 4.4, and the average fraction of LOH ranged from 0.12 to 0.57. Ploidy and LOH estimates were concordant with matching bulk WGS and clinical panel sequencing by MSK-IMPACT, and losses and gains from scWGS coincided with known drivers of HGSOC (Extended Data Fig. 2e–g). Thus, at a pseudobulk level, the genomic characteristics of our scWGS cohort matched those of both whole-genome and targeted bulk data.

Haplotype-specific copy number

In a bulk WGS matched normal sample for each patient, we measured reference and alternate allele counts for SNPs from the 1000 Genomes Phase 2 reference panel. We used a binomial exact test to identify SNPs that were heterozygous in the normal sample. Using SHAPEIT59 and the 1000 Genomes Phase 2 reference panel, we computed haplotype blocks. Next, we measured per-cell reference and alternate allele counts for heterozygous SNPs in the tumour scWGS data.

Mitochondrial DNA copy number

To infer the mitochondrial DNA copy number, we first computed the average read depth of the mitochondrial genome in each cell, restricting it to reads with a mapping quality of at least 30. Then we converted the mitochondrial genome coverage for each cell to an approximate copy number by dividing by the nuclear genome coverage and multiplying by the cell’s average (nuclear) ploidy.
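The conversion described above amounts to a depth ratio scaled by nuclear ploidy. A minimal sketch follows; the function name and the example depths are hypothetical, and the mitochondrial depth is assumed to have been computed already from reads with mapping quality of at least 30.

```python
def mito_copy_number(mito_depth, nuclear_depth, nuclear_ploidy):
    """Approximate mitochondrial DNA copies per cell.

    mito_depth:     mean read depth over the mitochondrial genome
    nuclear_depth:  mean read depth over the nuclear genome
    nuclear_ploidy: average nuclear ploidy of the cell
    """
    if nuclear_depth <= 0:
        raise ValueError("nuclear depth must be positive")
    return mito_depth / nuclear_depth * nuclear_ploidy


# Hypothetical cell: 10x mitochondrial depth, 0.05x nuclear depth, ploidy 2.
example_mito_cn = mito_copy_number(10.0, 0.05, 2.0)
```

Dividing by nuclear depth gives mitochondrial copies per haploid nuclear genome; multiplying by ploidy rescales this to copies per cell.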

Cell filtering

We established stringent filters to maximize the removal of problematic cells without losing sensitivity to rare, interesting populations, including those representing cell-specific WGD.

Removal of low-quality cells

We removed cells with a quality score lower than 0.75. The quality score was computed using the classifier presented in ref. 16.

Removal of normal cells

After copy-number calling, we identified normal cells as those with an average copy-number state between 1.95 and 2.05 with a standard deviation of less than 0.5. We removed these normal cells from further analysis. We also manually inspected cells with aneuploidy slightly outside this range but much less than tumour cells in the same sample, and manually selected ‘aberrant normal’ cells for removal (see Supplementary Note for examples). These cells typically did not share SNVs with the tumour cells and may correspond to other epithelial cells affected by field cancerization60 or immune/stromal cells with rare chromosomal aberrations.
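The automatic part of this filter can be sketched as follows. The function name is hypothetical, the interval bounds are treated as inclusive, and the population standard deviation over bins is used; the text does not specify either detail. The manual review of 'aberrant normal' cells is not modelled.

```python
from statistics import mean, pstdev


def is_normal_cell(bin_states, low=1.95, high=2.05, max_sd=0.5):
    """Flag a cell as normal from its per-bin integer copy-number states.

    A cell is called normal when its mean copy number lies in [low, high]
    and the standard deviation across bins is below max_sd.
    """
    return low <= mean(bin_states) <= high and pstdev(bin_states) < max_sd


# Hypothetical profiles: a diploid cell with one gained bin, and a tumour-like cell.
near_diploid = [2] * 99 + [3]
tumour_like = [1] * 50 + [3] * 50
```

The second example shows why the standard-deviation condition is needed: a profile with balanced losses and gains has a mean of exactly 2 but is clearly aneuploid.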

Removal of S-phase cells

It is necessary to remove S-phase cells before downstream analysis because the observed HMMcopy profiles of these cells reflect a mixture of both somatic (heritable) copy number and transient doubling of replicated genomic loci. We nominated S-phase cells through a combination of features known to correlate with S-phase cells. We aimed to isolate the high-quality G1/2-phase cells for downstream analysis, so we did not need to distinguish between S-phase cells and low-quality cells (noisy HMMcopy profiles resulting from other factors, such as under-tagmentation before sequencing or incomplete cell lysis).

We first computed the following three features for each cell:

  1. The Spearman correlation between the HMMcopy state profile of the cell of interest and the RepliSeq replication timing profile from MCF-7 cells. S-phase cells have higher correlations than G1/2-phase cells.

  2. The number of HMMcopy breakpoints per cell, that is, the number of pairs of adjacent bins with different integer copy-number states. S-phase cells have more breakpoints than G1/2-phase cells.

  3. The median breakpoint prevalence across all HMMcopy breakpoints. This statistic was calculated by first computing the mean prevalence of each breakpoint across all cells belonging to a particular patient. Then, for each cell of interest, we restricted to the genomic loci with detected breakpoints in that cell and calculated the median of the mean breakpoint prevalences for those loci. S-phase cells have low median breakpoint prevalence, because they have many rare breakpoints.

All three features varied widely across patients because of each patient’s unique number, positioning and heterogeneity of somatic copy-number alteration. We therefore used a strategy of examining each feature’s distribution across all cells in a patient, manually inspecting outlier cells and selecting custom thresholds for each patient. We used a filtering approach whereby cells are called as S-phase if any two of the three features are beyond the threshold. This conservative strategy ensured that all remaining cells were truly in the G1/2 phase and therefore had HMMcopy profiles that accurately reflected the somatic copy number. The thresholds used for each patient are included as Supplementary Table 4.
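The two-of-three rule can be sketched as below. The class and function names are hypothetical, and the threshold values in the example are placeholders; the per-patient thresholds actually used are those in Supplementary Table 4.

```python
from dataclasses import dataclass


@dataclass
class SPhaseThresholds:
    """Per-patient cut-offs (placeholder values are chosen by manual review)."""
    rt_correlation: float     # flag if replication-timing correlation is above this
    breakpoints: int          # flag if the breakpoint count is above this
    median_prevalence: float  # flag if median breakpoint prevalence is below this


def is_s_phase(rt_correlation, n_breakpoints, median_prevalence, thresholds):
    """Call a cell S-phase when at least two of the three features are beyond threshold."""
    flags = [
        rt_correlation > thresholds.rt_correlation,
        n_breakpoints > thresholds.breakpoints,
        median_prevalence < thresholds.median_prevalence,
    ]
    return sum(flags) >= 2


# Placeholder thresholds for one hypothetical patient.
example_thresholds = SPhaseThresholds(
    rt_correlation=0.2, breakpoints=100, median_prevalence=0.1
)
```

Note the direction of each comparison: the first two features are elevated in S-phase cells, whereas the third is reduced.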

Removal of doublets

We applied several orthogonal approaches to remove doublets from the DLP data. First, under the assumption that the chromosome 17 LOH should be clonal in ovarian cancer, we removed tumour cells that lacked LOH of chromosome 17. Then we used a combination of mutation-based features to manually identify tumour-normal doublets, including LOH (much lower than typical tumour cells), the proportion of SNVs with alternate reads (higher than typical normal cells) and copy-number profiles that were similar to tumour cells with the addition of two copies across the genome. Finally, two raters separately reviewed the brightfield image of each cell in the clear microfluidic nozzle before deposition in the microwell array for sequencing and flagged any images that appeared to contain more than one cell. Any cell with an image that was flagged by at least one reviewer was removed from analysis. Example doublet copy-number profiles and spotter images are included in the Supplementary Note.

Removal of suspect high-ploidy cells

We restricted analysis to cells with high-confidence ploidy calls. Absolute ploidy is unidentifiable from the copy-number data of an individual cell, so we took a parsimony approach and assumed the true ploidy to be the lowest ploidy value that provided a reasonable fit to the data. One failure mode in the automatic determination of ploidy by HMMcopy occurred when HMMcopy converged on a solution with double the true ploidy, driven by the overfitting of isolated outlier bins. Such cells were characterized by mostly even copy-number states, except for isolated bins with odd copy numbers. To remove such potential artefacts, we required there to be at least one segment longer than 10 megabases in length with a copy number of 1, 3 or 5. Cells with no segments longer than 10 megabases with copy number 1, 3, or 5 were removed from further analysis. Note that as a result of this conservative approach, G2-phase cells and cells that had sustained perfect doublings would be detected as half their true ploidy or omitted from this study.
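A minimal sketch of the odd-segment requirement follows. It assumes per-chromosome lists of integer copy-number states in 500-kilobase bins and treats a segment as a run of consecutive bins with the same state; the function names are hypothetical.

```python
from itertools import groupby

BIN_SIZE = 500_000  # bin width used for copy-number calling


def has_long_odd_segment(chrom_states, min_length=10_000_000, odd_states=(1, 3, 5)):
    """True if one run of identical odd states on this chromosome exceeds min_length."""
    for state, run in groupby(chrom_states):
        if state in odd_states and len(list(run)) * BIN_SIZE > min_length:
            return True
    return False


def keep_cell(states_by_chrom):
    """Keep a cell only if some chromosome carries a qualifying odd-state segment."""
    return any(has_long_odd_segment(states) for states in states_by_chrom.values())


# Hypothetical cells: one with a 10.5-megabase copy-number-3 segment,
# one with only isolated odd bins on an even background.
supported = {"chr1": [4] * 100 + [3] * 21 + [4] * 50, "chr2": [4] * 80}
suspect = {"chr1": [4, 3] * 100}
```

Runs are evaluated per chromosome so that a segment cannot span a chromosome boundary.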

In conclusion, we performed several filtering steps including both automatic classification and manual review to remove low-quality cells, normal cells, S-phase cells, doublets and dubious high-ploidy cells (see also Supplementary Note). The requirement that predicted copy-number profiles include at least one 10-megabase or larger segment with a copy-number state of 1, 3 or 5 ruled out a non-WGD solution with half of the inferred copy number. However, it should be noted that individual cells that had sustained perfect doublings and non-aberrant G2 phase cells would be detected as half of their true ploidy in this study.

Comparison with bulk copy number

We used the WGS copy number inferred by ReMixT61 to validate the average ploidy in the MSK SPECTRUM cohort. Similarly, we used the IMPACT copy number inferred by FACETS62 for further orthogonal validation.

Detecting WGD in single cells using allele-specific copy number

WGD events were identified in single cells based on the allele-specific copy number state, as previously described for bulk WGS3. We computed two metrics from SIGNALS results: the fraction of the genome with two or more copies for the main allele (FM2) and the fraction of the genome with three or more copies for the main allele (FM3). Similar to the results in bulk WGS, a clear separation could be seen between subpopulations using each metric (Extended Data Fig. 2h,i). We classified any cell with FM2 > 0.5 as having undergone at least one WGD, and any cell with FM3 > 0.5 as having undergone at least two WGDs.
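The classification rule can be sketched as follows, assuming that the 'main allele' is the major (more abundant) allele and that each segment is given as a length and a major copy number. The function names are hypothetical.

```python
def fraction_major_at_least(segments, k):
    """Fraction of the genome (by length) with major copy number >= k."""
    total = sum(length for length, _ in segments)
    return sum(length for length, major in segments if major >= k) / total


def wgd_multiplicity(segments):
    """Return 0, 1 or 2, where 2 means 'at least two WGDs'.

    segments: list of (length_in_bp, major_copy_number) for one cell.
    """
    fm2 = fraction_major_at_least(segments, 2)
    fm3 = fraction_major_at_least(segments, 3)
    if fm3 > 0.5:
        return 2
    if fm2 > 0.5:
        return 1
    return 0


# Hypothetical cell: 60% of the genome at major copy number 2, the rest at 1.
example_cell = [(60_000_000, 2), (40_000_000, 1)]
```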

Patient-level WGD classifications

Tumours were classified as WGD-high at the patient level if the fraction of cells with at least one WGD exceeded 50% of the cells sequenced for that patient. The remaining tumours were classified as WGD-low.
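In code, the patient-level rule reduces to a single fraction. The sketch below assumes a list of per-cell WGD multiplicities (0, 1 or 2) for one patient; the function name is hypothetical.

```python
def classify_patient(cell_multiplicities):
    """Label a patient WGD-high if more than 50% of cells have at least one WGD."""
    n_wgd = sum(1 for m in cell_multiplicities if m >= 1)
    return "WGD-high" if n_wgd / len(cell_multiplicities) > 0.5 else "WGD-low"


# Hypothetical patient with 70% 1xWGD cells.
example_label = classify_patient([0] * 30 + [1] * 70)
```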

Subclonal WGD classification

We classified cells for each patient as comprising a subclonal WGD subpopulation if they were predicted to have one more WGD than the ‘background’ WGD multiplicity, which we define as the lowest WGD multiplicity representing at least 25% of cells. For all WGD-low tumours, this was 0×WGD. For most WGD-high tumours, this was 1×WGD, with the exception of cells from patients OV-081 and OV-125, which had a background WGD multiplicity of 0×WGD as they had more than 25% 0×WGD cells.
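The background-multiplicity definition can be sketched as below, assuming a list of per-cell WGD multiplicities for one patient. Function names are hypothetical, and the 25% cut-off is treated as inclusive.

```python
from collections import Counter


def background_multiplicity(cell_multiplicities, min_fraction=0.25):
    """Lowest WGD multiplicity that represents at least min_fraction of cells."""
    n = len(cell_multiplicities)
    counts = Counter(cell_multiplicities)
    eligible = [m for m, c in counts.items() if c / n >= min_fraction]
    if not eligible:
        raise ValueError("no multiplicity reaches the minimum fraction")
    return min(eligible)


def subclonal_wgd_flags(cell_multiplicities, min_fraction=0.25):
    """Per-cell flag: True if the cell has exactly one more WGD than the background."""
    background = background_multiplicity(cell_multiplicities, min_fraction)
    return [m == background + 1 for m in cell_multiplicities]


# Hypothetical WGD-high patient that retains 30% 0xWGD cells: background is 0xWGD.
mixed_patient = [0] * 30 + [1] * 70
```

In the `mixed_patient` example the tumour is WGD-high by cell fraction, yet its 1×WGD cells count as a subclonal WGD population because 0×WGD cells exceed the 25% cut-off, mirroring the exception described for OV-081 and OV-125.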

Variant calling

SNV calling

Because the low per-cell coverage in scWGS was insufficient to resolve variants at nucleotide resolution, we merged all the single cells together to create a pseudo-bulk genome for each library. We ran the Mutect2 variant caller63 on the merged data across all the libraries from each patient. We computed the reference and alternate counts for each cell at all variant loci detected across all samples from a given patient.

SV calling

We used a similar approach for breakpoint calling by creating pseudo-bulk libraries, then running deStruct64 and Lumpy65 on each library. Only consensus SVs detected by both methods were retained; SVs from both methods were considered consensus if their coordinates were within 200 base pairs and their orientations matched. The SV calls were further post-processed as described in a previous study66.
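The consensus criterion can be illustrated with a simple pairwise comparison. This sketch assumes that both breakends of a call must lie within 200 base pairs of the corresponding breakends in the other caller's output and that breakends are already ordered consistently; the `Breakpoint` type and function names are hypothetical and do not reflect the output formats of deStruct or Lumpy.

```python
from typing import NamedTuple


class Breakpoint(NamedTuple):
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def same_event(a, b, max_dist=200):
    """True if two calls share chromosomes and orientations and lie within max_dist."""
    return (
        a.chrom1 == b.chrom1
        and a.chrom2 == b.chrom2
        and a.strand1 == b.strand1
        and a.strand2 == b.strand2
        and abs(a.pos1 - b.pos1) <= max_dist
        and abs(a.pos2 - b.pos2) <= max_dist
    )


def consensus_calls(calls_a, calls_b, max_dist=200):
    """Calls from the first caller that are matched by any call from the second."""
    return [a for a in calls_a if any(same_event(a, b, max_dist) for b in calls_b)]


# Hypothetical calls for illustration only.
call_a = Breakpoint("1", 1000, "+", "5", 5000, "-")
call_b = Breakpoint("1", 1150, "+", "5", 4900, "-")
```

The quadratic comparison is adequate for a sketch; for genome-scale call sets, sorting by position or using an interval index would be preferable.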

Filtering somatic variant calls using ArtiCull

We applied ArtiCull67 to remove artefactual SNVs resulting from the short insert sizes in the scWGS data. ArtiCull was trained on high-confidence correct and artefactual calls based on manually labelled clones from seven patients (OV-004, OV-022, OV-045, OV-046, OV-052, OV-081 and OV-083), then applied to all variants from all patients.

SBMClone

We applied SBMClone68 to the filtered somatic variants for each patient. SBMClone was run ten times for each patient with different random initializations, and the solution with the highest likelihood was kept (for patient OV-024, two of the initializations exceeded the runtime limit of seven days so the best solution of eight initializations was used).

Evolutionary histories of SNV clones using doubleTime

We developed doubleTime, a method for computing the evolutionary histories of the SNV clones in each patient, including accurate placement of WGD events in the clonal phylogeny of each patient. We have made doubleTime publicly available on GitHub (https://github.com/shahcompbio/doubleTime). It involves three main steps. First, we constructed a clonal phylogeny relating the clones identified by SBMClone. Second, we assigned WGD events to branches in the clonal phylogeny. For each pair of WGD clones, we assessed whether those clones arose from a single shared WGD or two parallel WGD events. Given this information, we were able to unambiguously assign WGD events to branches of each patient’s clonal phylogeny. Third, we used a probabilistic model to assign SNVs to branches of the clonal phylogeny, including assignment before and after WGD events on WGD branches. To control for the effect of small clones on sensitivity to detect mutations, terminal branch lengths were corrected for the total haploid coverage of the corresponding clone (Supplementary Note). We describe each of the three steps in detail below. Patient OV-024 was excluded because the clones were predominantly 2×WGD, which doubleTime does not support. Patient OV-125 was excluded owing to low cell counts (no SBMClone clone with at least 20 cells).

SBMClone SNV-based clonal phylogenies

We reconstructed phylogenetic trees with SBMClone clones as leaves using a binarized version of the implicit block structure inferred by SBMClone. We first computed a density matrix D, in which each row corresponded to a clone (cell block), each column corresponded to an SNV cluster (SNV block) and each entry Di,j contained the number of pairs (a,b), in which cell a in clone i had at least one alternative read covering SNV b in cluster j, divided by the total number of possible pairs (the size of clone i times the size of cluster j). We then computed a binary matrix B by rounding up those entries of D that exceeded a density of 0.01, removing empty columns, and collapsing identical rows (combining clones that contained the same blocks of mutations). We then attempted to infer a phylogenetic tree by applying the perfect phylogeny algorithm. Matrices B that did not permit a perfect phylogeny were manually modified with the minimum number of changes required to permit a perfect phylogeny; this typically occurred when mutations shared between two or more clones had been lost owing to a deletion in a subset of the clones.
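The binarization and the perfect phylogeny check can be illustrated with a minimal sketch. It assumes that the density matrix D is supplied as a list of rows and uses the standard criterion that, in a rooted perfect phylogeny, the clone sets of any two SNV blocks must be nested or disjoint. The function names `binarize` and `admits_perfect_phylogeny` are hypothetical and are not part of SBMClone or doubleTime.

```python
from itertools import combinations


def binarize(density, threshold=0.01):
    """Threshold D, drop empty SNV-block columns and merge identical clone rows.

    Returns (B, groups), where groups[k] lists the original clone indices
    that were collapsed into row k of B.
    """
    binary = [[1 if d > threshold else 0 for d in row] for row in density]
    n_cols = len(binary[0]) if binary else 0
    # Keep only SNV blocks that are present in at least one clone.
    keep = [j for j in range(n_cols) if any(row[j] for row in binary)]
    groups = {}  # row pattern -> original clone indices with that pattern
    for i, row in enumerate(binary):
        groups.setdefault(tuple(row[j] for j in keep), []).append(i)
    patterns = list(groups)
    return [list(p) for p in patterns], [groups[p] for p in patterns]


def admits_perfect_phylogeny(B):
    """True if every pair of SNV blocks has nested or disjoint clone sets."""
    n_cols = len(B[0]) if B else 0
    col_sets = [{i for i, row in enumerate(B) if row[j]} for j in range(n_cols)]
    for s, t in combinations(col_sets, 2):
        if (s & t) and not (s <= t or t <= s):
            return False
    return True
```

A matrix that fails this check corresponds to the cases described above in which B had to be modified manually.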

Discerning parallel from shared WGD

To identify cases in which sequenced WGD cells arose from distinct WGD events, we analysed SNVs from the single-cell DNA sequencing data. Specifically, for each patient, we focused exclusively on those regions that exhibited copy-neutral loss of heterozygosity (cnLOH; major copy number 2 and minor copy number 0) among nearly all (90% or more) tumour cells with a single WGD. Given a candidate bipartition of the 1×WGD cells, under the infinite-sites assumption, each cnLOH SNV can be assigned to one of the following categories:

  • two mutant copies in both clones (shared pre-WGD and pre-divergence);

  • one mutant copy in one clone (private post-divergence);

  • no mutant copies (false-positive variant);

  • one mutant copy in both clones (shared post-WGD and pre-divergence);

  • two mutant copies in one clone (private pre-WGD and post-divergence).

The last two categories of SNVs present evidence for or against multiple parallel WGD events. SNVs that are shared at one variant copy (VAF ~ 0.5) would indicate that the two sets of cells underwent the same ancestral WGD event, because they share mutations that must have followed the WGD. Conversely, SNVs that are private at two variant copies (VAF ~ 1) would indicate that the two sets of cells underwent distinct WGD events, because they have private mutations that preceded the WGD. Specifically, we considered the following hypotheses:

  1. single-WGD: shared one-copy SNVs are allowed but private two-copy SNVs are not allowed;

  2. multiple-WGD: shared one-copy SNVs are not allowed, but private two-copy SNVs are allowed.

To evaluate the relative strength of these hypotheses, we developed a likelihood ratio test that compared the probability of observing the given variant counts for cnLOH SNVs under these two hypotheses: for each patient, we evaluated P(multiple-WGD)/P(single-WGD) using a simple binomial model of read counts. We then tested the significance of this likelihood ratio by generating an empirical null distribution: we fixed the total SNV read counts and their best-fitting variant copy numbers under the single-WGD hypothesis and resampled alternate read counts.
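As an illustration only, the likelihood ratio can be sketched as follows. The sketch assumes that read counts are pooled within each of the two candidate clones, that each cnLOH SNV takes its best-fitting pair of variant copy numbers under each hypothesis, and that a fixed error rate `EPS` keeps the expected VAF away from 0 and 1. The value of `EPS`, the function names and the per-SNV maximization are assumptions of this sketch, and the empirical null is not shown.

```python
import math

EPS = 0.01  # assumed sequencing error rate (hypothetical value)

# Variant copy numbers (clone A, clone B) out of two total copies in cnLOH regions.
COMMON = [(2, 2), (1, 0), (0, 1), (0, 0)]      # allowed under both hypotheses
SINGLE_WGD = COMMON + [(1, 1)]                 # adds shared one-copy SNVs
MULTIPLE_WGD = COMMON + [(2, 0), (0, 2)]       # adds private two-copy SNVs


def log_binom(alt, total, vaf):
    """Binomial log-probability of alt reads out of total at the given VAF."""
    p = min(max(vaf, EPS), 1 - EPS)
    log_choose = (math.lgamma(total + 1) - math.lgamma(alt + 1)
                  - math.lgamma(total - alt + 1))
    return log_choose + alt * math.log(p) + (total - alt) * math.log(1 - p)


def snv_loglik(counts, states):
    """Best log-likelihood of one SNV over the allowed copy-number pairs."""
    alt_a, tot_a, alt_b, tot_b = counts
    return max(log_binom(alt_a, tot_a, ma / 2) + log_binom(alt_b, tot_b, mb / 2)
               for ma, mb in states)


def log_likelihood_ratio(snvs):
    """log[P(multiple-WGD) / P(single-WGD)] summed over cnLOH SNVs."""
    return sum(snv_loglik(c, MULTIPLE_WGD) - snv_loglik(c, SINGLE_WGD)
               for c in snvs)
```

Positive values favour multiple-WGD and negative values favour single-WGD; each SNV is given as a tuple (alt reads in clone A, total reads in clone A, alt reads in clone B, total reads in clone B).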

Assigning SNVs to branches and estimating branch lengths

From the previous steps, we have a tree relating the clones detected by SBMClone. We placed WGD events on branches such that all WGD-high tumours had a WGD event placed on the root of the tree, except those in which parallel WGD events had been identified (patients OV-025 and OV-045) or WGD only affected a subset of clones (patient OV-081), in which case those specific events were placed further down the tree. We used a probabilistic model to assign SNVs to branches and estimate branch lengths based on read-count evidence for SNVs in each clone (for each leaf, we collected read counts only from those cells in the majority WGD multiplicity). For WGD branches, the model assigns SNVs as occurring before or after the WGD and estimates the length of the branch before and after the WGD. This strategy effectively splits each branch with a WGD event into two unique positions in the tree, meaning that the total number of positions in the tree to which an SNV can be assigned is equal to the number of branches plus the number of branches with WGD events.

For this analysis, we considered only those SNVs in regions where, for each SBMClone clone, more than 80% of cells shared the same copy-number state. We further restricted analysis to SNVs in regions with allele-specific copy-number states whose multiplicity (the variant copy number, or the number of copies of the genome containing the SNV), and thus the expected VAF, could be uniquely determined by the combination of tree placement and WGD status (that is, whether or not the clone was affected by an ancestral WGD event). Specifically, we analysed regions with the following copy-number states across all clones:

  • 1:0 in both WGD and non-WGD clones;

  • 1:1 in both WGD and non-WGD clones;

  • 2:0 in WGD clones, 1:0 in non-WGD clones;

  • 2:1 in WGD clones, 1:1 in non-WGD clones;

  • 2:2 in WGD clones, 1:1 in non-WGD clones.

In each of these scenarios, we assumed that the WGD and copy-number events immediately following the WGD accounted for the differences in copy number between WGD and non-WGD clones. Note that the only patient in the cohort with different WGD status for different leaves was patient OV-081, so for nearly all patients, we analysed only those SNVs with clonal copy-number states (matching the above listed states depending on WGD status). The multiplicity for an SNV on a particular allele placed on a particular branch of the tree was as follows:

  • 0, if the corresponding allele had 0 copies;

  • equal to the allele-specific copy number of the allele in the clone, if the SNV occurred pre-WGD and the leaf was affected by WGD;

  • equal to 1 otherwise.
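The multiplicity rules above, and the expected VAF they imply, can be written as a short sketch. It assumes that the clone in question descends from the branch carrying the SNV, so that the SNV is present in the clone; the function names and arguments are hypothetical.

```python
def multiplicity(allele_cn, snv_pre_wgd, leaf_has_wgd):
    """Number of copies carrying the SNV in a clone that inherited it."""
    if allele_cn == 0:
        return 0
    if snv_pre_wgd and leaf_has_wgd:
        # A pre-WGD SNV is present on every copy of its allele after doubling.
        return allele_cn
    return 1


def expected_vaf(major, minor, on_major, snv_pre_wgd, leaf_has_wgd):
    """Expected VAF for an SNV on the major or minor allele of a major:minor state."""
    allele_cn = major if on_major else minor
    return multiplicity(allele_cn, snv_pre_wgd, leaf_has_wgd) / (major + minor)
```

For example, in a 2:1 region of a WGD clone, an SNV on the major allele has an expected VAF of 2/3 if it occurred before the WGD and 1/3 if it occurred after.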

Each SNV was assigned to a tree position by fitting the observed total and alternative counts of said SNV to the expected VAFs for all clones. SNVs were assigned to positions in the tree using a Dirichlet categorical distribution, and a beta-binomial emission model was used to relate observed SNV counts to expected VAFs. The model was implemented in Pyro and fitted using black-box variational inference69. Note that when computing branch lengths, we only used C>T SNVs at CpG sites because these SNVs have been reported to correspond most closely to chronological age20.

To account for the differences in genome size and copy-number heterogeneity between different patients with varying amounts of aneuploidy, we normalized the number of C>T CpG SNVs on each branch by the number of bases being considered. First, we computed the effective genome length of each clone as the total size of the bins considered to be clonal for a valid copy-number state as defined above, with each bin weighted by its total copy number. Then, for the internal nodes of the tree, we assumed that the only copy-number changes to these bins were directly coupled to WGD events. Thus, for post-WGD branches, the genome length was identical to that of the leaves; and for pre-WGD branches, the genome length was computed using the correspondence described above between pre- and post-WGD copy numbers.

Estimating pre- and post-WGD changes in WGD subpopulations

We used a maximum parsimony-based method to estimate pre- and post-WGD changes from estimated ancestral and descendent copy-number profiles. We proceeded independently for each bin. Let x be the ancestral copy-number state and y be the descendent copy-number state, and assume that y is produced by a combination of pre-WGD copy-number change followed by WGD followed by post-WGD copy-number change. We can relate x and y using

$$y=2(x+b)+a,$$

where b represents the pre-WGD copy-number change and a represents the post-WGD copy-number change. Let the cost of any given a and b be |a| + |b|. Conveniently, every combination of x and y results in a unique a and b that minimize this cost. Thus, for each x and y, we computed the associated b and a as the pre- and post-WGD changes, respectively, and |a| + |b| as the cost of those changes.
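The minimization can be checked by exhaustive search for a single bin. The sketch below assumes non-negative integer copy numbers and additionally requires the pre-doubling copy number x + b to be non-negative; the function name is hypothetical.

```python
def split_wgd_change(x, y):
    """Return (b, a, cost) minimizing |a| + |b| subject to y = 2 * (x + b) + a."""
    best = None
    # b >= -x keeps the pre-doubling copy number x + b non-negative;
    # values of b above y can never be optimal when y >= 0.
    for b in range(-x, y + 1):
        a = y - 2 * (x + b)
        cost = abs(a) + abs(b)
        if best is None or cost < best[2]:
            best = (b, a, cost)
    return best
```

For example, x = 1 and y = 3 gives b = 0 and a = 1 (a single post-WGD gain), whereas x = 1 and y = 4 gives b = 1 and a = 0 (a single pre-WGD gain that is then doubled).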

Computing the percentage genome different

We computed the percentage genome different for a pair of cells as follows. First, we computed the bin-level difference in total copy number and identified consecutive segments of changed and unchanged bins. We then removed segments less than or equal to 2 megabases in size (that is, affecting fewer than four consecutive 500-kb bins). Finally, we counted the number of bins for which the two genomes have different total copy numbers and divided by the total number of bins considered.
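A minimal sketch of this computation is given below. It assumes that both cells are described by total copy number over the same ordered 500-kb bins, that runs of fewer than four consecutive bins are discarded, and that discarded bins are excluded from both the numerator and the denominator. The parameter `min_segment_bins` and the handling of discarded bins are assumptions of this sketch, and chromosome boundaries are ignored for brevity.

```python
from itertools import groupby


def percent_genome_different(cn_a, cn_b, min_segment_bins=4):
    """Percentage of retained bins whose total copy number differs between two cells."""
    changed = [a != b for a, b in zip(cn_a, cn_b)]
    kept_changed = 0
    kept_total = 0
    # Walk over consecutive runs of changed or unchanged bins.
    for is_changed, run in groupby(changed):
        length = len(list(run))
        if length < min_segment_bins:
            continue  # short segment: treated as noise and dropped
        kept_total += length
        if is_changed:
            kept_changed += length
    return 100.0 * kept_changed / kept_total if kept_total else 0.0
```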

Classification of divergent cells

We defined divergent cells as outliers of the nearest-neighbour distance (NND), using the percentage genome different as the distance metric. For each index cell, we identified its nearest neighbour as the other cell in the population with the minimal percentage genome different. The NND for each cell is thus the percentage genome different with respect to this neighbour cell. We then fitted a beta distribution to the NND values of all cells in the cohort and called divergent cells as those cells that have NND values above the 99th percentile of this beta distribution.

Cell phylogenies using MEDICC2

We derived estimates of chromosome missegregation rates per cell for each patient from copy-number phylogenies inferred using MEDICC2 (ref. 70). In addition to the cell filtering applied for all analyses, we removed divergent cells before running MEDICC2. First, we refined the single-cell haplotype-specific copy-number profiles for each patient by applying the dynamic programming formulation from asmultipcf71 to GC-corrected read counts and phased B-allele frequencies for each bin across all cells from the patient. Using this method, we identified segment boundaries for each patient and then summarized the number of copies of each segment and haplotype in each cell by rounding. Next, we ran MEDICC2 (ref. 70) on these refined haplotype-specific single-cell copy numbers to infer a tree with single cells corresponding to leaves. We used the --wgd-x2 flag for MEDICC2, which represents WGD as an actual doubling of all copy-number segments in the genome, rather than the default behaviour of adding 1 to all segments.

Reconstruction of ancestral copy number

To infer the ancestral haplotype-specific copy-number profiles associated with internal nodes of the cell phylogeny inferred by MEDICC2, we used a maximum-parsimony approach that treats each bin independently and aims to minimize the total number of changes on the tree. For each branch, the parsimony score is the absolute difference between the haplotype-specific copy-number profiles of the parent and the child. Transitions from 0 to any other copy number are given a score of infinity to prevent gain from 0 copies. The score for a WGD branch (assuming WGD placement from MEDICC2 is correct) is the sum of two parsimony scores: the parsimony score for copy-number changes between the parent and an intermediate genome, and the parsimony score for copy-number changes between a doubled version of the intermediate genome and the child (this is described above in the Estimating pre- and post-WGD changes in WGD subpopulations section). The state of each bin at each branch in the tree was chosen to minimize this parsimony score using the Sankoff algorithm72,73. We assumed that the MEDICC2 placement of WGD on branches of the phylogeny is correct in most cases, with the following exceptions.

  1. For patients OV-025 and OV-045, we adjusted the WGD placement to be concordant with SNV evidence indicating a distinct clonal origin of multiple parallel WGD clones.

  2. For 10 patients (OV-002, OV-003, OV-014, OV-024, OV-036, OV-044, OV-051, OV-052, OV-071 and OV-083), MEDICC2 failed to identify an ancestral WGD affecting a large proportion (97–100%) of cells that were indicated as WGD by the cell-specific CNA-based classifier. To correct this, we added a WGD event for each of these patients such that the number of WGD events ancestral to each cell in the MEDICC2 tree was identical to the number of ancestral WGD events indicated by the CNA-based classification.

  3. For a further 5 patients (OV-004, OV-022, OV-050, OV-087 and OV-139), MEDICC2 disagreed with the cell-specific CNA-based classifier on the WGD classification of a small number (at most five) of cells. These cells were removed from the tree before ancestral reconstruction.
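The per-bin parsimony reconstruction can be illustrated with a minimal Sankoff sketch for one bin and one haplotype. It uses the absolute copy-number difference as the branch cost and an infinite cost for gains from 0 copies, and it omits the special treatment of WGD branches for brevity. The tree encoding (a dictionary from each internal node to its children), the `max_cn` cap and the function names are assumptions of this sketch.

```python
import math


def transition_cost(parent, child):
    """Branch cost: |difference|, with gains from 0 copies forbidden."""
    if parent == 0 and child > 0:
        return math.inf
    return abs(parent - child)


def sankoff(children, leaf_states, root, max_cn=8):
    """Return (node -> copy number, total cost) minimizing changes on the tree."""
    states = range(max_cn + 1)
    cost = {}  # node -> minimal subtree cost for each candidate state

    def up(node):
        if node not in children:  # leaf: only the observed state is allowed
            cost[node] = [0 if s == leaf_states[node] else math.inf for s in states]
            return
        for child in children[node]:
            up(child)
        cost[node] = [
            sum(min(transition_cost(s, t) + cost[c][t] for t in states)
                for c in children[node])
            for s in states
        ]

    def down(node, state):
        assignment[node] = state
        for child in children.get(node, []):
            best = min(states, key=lambda t: transition_cost(state, t) + cost[child][t])
            down(child, best)

    up(root)
    assignment = {}
    root_state = min(states, key=lambda s: cost[root][s])
    down(root, root_state)
    return assignment, cost[root][root_state]
```

The recursion is adequate for small illustrative trees; a tree with thousands of cells would need an iterative traversal.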

Classifying events from copy-number differences

Given a phylogenetic tree in which both leaves and internal nodes are labelled by haplotype-specific copy-number profiles, we identified the copy-number events on each branch using a greedy approach. First, we identified the differences between the parent haplotype-specific copy-number profile and the child copy-number profile. Then, for each chromosome and haplotype, we explained the copy-number differences between parent and child using events that are as large as possible:

  1. if more than 90% of bins in the chromosome were altered in the same direction, we called a chromosome gain or loss that accounted for a change of one copy for all bins in the chromosome;

  2. if no chromosome gain or loss was found, but 90% of the bins in one of the two arms were altered in the same direction, we called an arm-level gain or loss that accounted for a change of one copy for all bins in the chromosome arm;

  3. if no chromosome- or arm-level gain or loss was found, we called a gain or loss of the largest contiguous segment that had a change in the same direction.

We then adjusted the copy-number difference by the selected event and repeated until all copy-number changes between parent and child have been accounted for. Note that if nearly all of the bins of a chromosome are gained (or lost), our method will first predict a chromosome gain (or loss), then another small segment loss (or gain) to account for the few bins that were predicted as unchanged. We selected this approach because we consider a whole chromosome (or arm) change to be more parsimonious if most of a chromosome’s (or arm’s) bins are altered. Our approach is also more robust to bin-level noise than a strategy that requires 100% of the bins to be altered.
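The greedy procedure can be sketched for a single chromosome and haplotype as follows. The sketch assumes that the input is the per-bin copy-number difference (child minus parent), that the arm boundary is given as the index of the first q-arm bin, and that the 90% threshold is applied strictly at both the chromosome and the arm level. All function names and the event encoding are hypothetical.

```python
def sign(v):
    return (v > 0) - (v < 0)


def dominant_direction(values, threshold=0.9):
    """+1 or -1 if more than `threshold` of bins change in that direction, else 0."""
    for direction in (1, -1):
        if sum(1 for v in values if sign(v) == direction) > threshold * len(values):
            return direction
    return 0


def longest_run(diff):
    """(start, end, direction) of the longest contiguous run of same-sign changes."""
    best = None
    i = 0
    while i < len(diff):
        d = sign(diff[i])
        j = i
        while j < len(diff) and sign(diff[j]) == d:
            j += 1
        if d != 0 and (best is None or j - i > best[1] - best[0]):
            best = (i, j, d)
        i = j
    return best


def call_events(diff, centromere):
    """Greedily explain `diff` with chromosome, arm and segment events of one copy."""
    diff = list(diff)
    events = []
    while any(diff):
        direction = dominant_direction(diff)
        if direction:
            kind, start, end = "chromosome", 0, len(diff)
        else:
            arms = (("arm_p", 0, centromere), ("arm_q", centromere, len(diff)))
            for name, s, e in arms:
                direction = dominant_direction(diff[s:e]) if e > s else 0
                if direction:
                    kind, start, end = name, s, e
                    break
            else:  # no arm-level event: fall back to the largest segment
                start, end, direction = longest_run(diff)
                kind = "segment"
        events.append((kind, "gain" if direction > 0 else "loss", start, end))
        for k in range(start, end):
            diff[k] -= direction  # remove the explained one-copy change
    return events
```

With 19 of 20 bins gained, the sketch reproduces the behaviour noted above: it first calls a chromosome gain and then a single-bin segment loss for the unchanged bin.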

For branches with WGD, we computed the intermediate pre-doubling profile that would result in the fewest copy-number changes (see Estimating pre- and post-WGD changes in WGD subpopulations above). Using our bin-independent parsimony model, we can compute this optimal intermediate profile analytically. We then performed the event-calling procedure described above twice: once on the differences between the parent and the intermediate pre-WGD profile, and once between the doubled intermediate profile and the child.

Normalizing missegregation rates to account for cell ploidy

We controlled for the opportunity for each cell to missegregate by dividing the number of copy-number events for each cell by the number of chromosomes (for chromosome-level missegregations) or arms (for arm-level missegregations) in the inferred parent node of each cell in the tree (the source of the terminal branch). This yields a rate of missegregation events per cell and per parental copy. For shorter segmental copy-number events, we divided the number of events in each cell by its parent’s genome length to control for opportunity. Although the resulting rate is not comparable to chromosome- and arm-level rates, it makes the cell-specific segmental rates more comparable between cells and across patients.

Enumerating events on ancestral branches

We classified copy-number events on the root branch of each patient’s cell phylogeny into three classes of event timing. Events were classified as non-WGD if they were predicted to occur on the root branch of a WGD-low tumour; pre-WGD if they were predicted to occur before the WGD event on the root branch of a WGD-high tumour; and post-WGD if they were predicted to occur after the WGD event on the root branch of a WGD-high tumour. Patients OV-025, OV-045, and OV-081 were omitted from this analysis because their WGD history precludes this categorization of copy-number events.

Calculating post-WGD changes in WGD clones

We catalogued all the high-confidence WGD clones detected in our cohort. This included all predicted WGD clades with at least 20 cells in the MEDICC2 phylogenies. We also included three small WGD clones from patients OV-006, OV-031 and OV-139 (Extended Data Fig. 4e–g). Counts of shared post-WGD events were calculated from the ancestral reconstruction on MEDICC2 trees as described above (see the Reconstruction of ancestral copy number section).

Single-cell RNA sequencing

Cell type assignment

Using scRNA-seq of CD45+/− sorted cells, we assigned the main cell types by supervised clustering using CellAssign74, as described in ref. 14.

InferCNV and scRNA-seq-derived copy-number clonal decomposition

InferCNV (v.1.3.5) was used to identify large-scale copy-number alterations in ovarian cancer cells identified by CellAssign75,76. For each patient, 3,200 non-cancer cells annotated by CellAssign were randomly sampled from the cohort and used as the set of reference ‘normal’ cells. After subtracting the reference expression in non-cancer cells, chromosome-level smoothing and de-noising, we derived a processed expression matrix that represents copy-number signals. Cancer-cell subclusters were identified by ward.D2 hierarchical clustering and the random_trees partition method using P < 0.05.

WGD classification

Identification of WGD cells from scRNA-seq data is technically challenging, because inferred copy number from expression data is typically noisy, allele-specific markers are sparse, and, as shown in our scWGS analysis, the prevalence of non-WGD cells in WGD-high tumours and WGD cells in WGD-low tumours is generally low, confounding identification of non-clonal ploidy populations within samples. Leveraging the high concordance between scWGS- and scRNA-derived copy number, even between non-site-matched patient samples (see Supplementary Note), we propagated scWGS-derived WGD status labels to all available patient-matched scRNA-seq samples for the purposes of transcriptional phenotyping analysis. Within-sample absolute normalization of unique molecular identifier (UMI) counts between tumour and non-tumour cells showed a significant increase in overall transcript counts per cell in WGD-high versus WGD-low tumours (see Supplementary Note), which was highly concordant with established estimates of transcriptional changes in WGD versus non-WGD samples in bulk RNA77. Thus, we concluded that site-matched scRNA-seq data effectively capture WGD transcriptional phenotypes. Any analyses correlating scWGS-derived missegregation rates to transcriptional phenotypes were restricted to site-matched samples with at least 20 cells in both scWGS and scRNA-seq.

Cell-cycle analysis

Discrete cell-cycle phase information was computed using Seurat’s CellCycleScoring function, excluding samples with fewer than 20 malignant cells. To estimate the association between WGD and cell-cycle phase, we used binomial GEE models cohort wide. We included tumour site and added interaction terms for WGD and age, and for WGD and mutation signature subtype. We repeated this analysis within the HRD-Dup signature subset.

We identified circular trajectories linked to cell-cycle progression in cancer cells using Cyclum78. Across the cohort, 10,000 cancer cells annotated by CellAssign were randomly sampled across tumours and used for cell-cycle trajectory inference. Pseudotime inference was performed on the scaled cell-by-gene matrix, limiting genes to cell-cycle markers included in cell-cycle GO terms (GO:0007049). Discretization of the continuous pseudotime trajectories was accomplished using a three-component Gaussian mixture model. Smoothed pseudotime trajectories of cell-cycle-related genes previously reported in the literature79 were then evaluated to interpret phase-specific gene activity and phase transitions as a function of pseudotime (Extended Data Fig. 6e).

Differential gene and pathway activity

Pathways were curated from single-cell hallmark metaprograms80, 50 hallmark pathways81 or CIN-associated gene signatures manually curated from the literature, including inflammatory signalling and ER stress30,37, and scored in single cells using Seurat’s AddModuleScore function. Owing to the hierarchical nature of the data, with multiple samples from patients, we used GEE on sample mean gene or pathway expression levels, adding tumour site (adnexa or non-adnexa) as a covariate in the model and restricting analysis to samples with at least 20 cells to compare WGD multiplicities. We repeated this procedure subsetting for HRD-Dup samples and adding an interaction term for age and WGD status as well as tumour site (HRD-Dup-only model). P values were adjusted for multiple testing using FDR. In parallel, we also performed differential expression analysis using a pseudobulked generalized linear mixed model (DREAMLET82), accounting for random patient and fixed tumour-site effects, and performed gene-set enrichment analysis (GSEA) with the same set of pathways.

Differential cell-type abundance

To determine cell populations that were differentially abundant between WGD-low and WGD-high samples, we used miloR v.1.8.1 (ref. 83), setting prop to 0.2 and using tumour_megasite (adnexa or non-adnexa) as a contrast in the differential abundance testing. To obtain significance values for each cell population, we ran permutation tests by swapping the sample WGD status labels 1,000 times and computing the proportion of tests in which the resulting non-permuted median log2(fold change) was more extreme than the permuted median values for each cell type.

Immunofluorescence

ROIs

We defined ROIs containing tumours on immunofluorescence images by delineating regions with tumour foci based on panCK, p53 and DAPI signal, and contrasting these with images of the immunofluorescence-adjacent H&E section. ROI annotations were drawn in QuPath. To ensure that complex tissue regions within ROIs used for analysis only included tumour, we classified regions of tumour, stroma, vasculature and glass within each ROI. We trained a pixel classifier with examples of tumour, stroma, vasculature and glass from each of the ROIs and slides using the panCK, p53 and DAPI signal in immunofluorescence, and verifying the region classification against the immunofluorescence-adjacent H&E section. ROIs with high cGAS background were excluded from analysis to minimize false-positive segmentations of cGAS+ micronuclei.

Segmentation of primary nuclei and micronuclei

Whole-slide immunofluorescence images stained with DAPI, cGAS, STING, p53, panCK and CD8 were analysed to characterize primary nuclei and micronuclei in ROIs. Segmentation of primary nuclei was done in QuPath v.0.5.1 using the StarDist algorithm on the DAPI channel84. We used a segmentation model pretrained on single-channel DAPI images (dsb2018_heavy_augment.pb). Applying the primary nuclei segmentation model across all ROIs yielded 20,988,413 primary nuclei in tumour regions. Segmented primary nuclei ranged between 5 μm2 and 100 μm2 in size, with a minimum fluorescence intensity of 1 a.u. The cell membrane for each primary nucleus was approximated using a cell expansion of 3 μm from the nuclear boundary.

Micronuclei were detected by StarDist segmentation of cGAS spots. We trained a new segmentation model on single-channel cGAS images using a U-Net architecture. We manually annotated cGAS+ micronuclei in a set of 256-pixel × 256-pixel tiles encompassing tumour regions across all slides. We created training and test sets using a 70:30 split, resulting in a training set of 70 tiles and a test set of 30 tiles. To ensure that the model generalized across patients and samples, we applied augmentation to the training set by applying random rotations, flips and intensity changes. We monitored the loss function during model training and saved the trained model with frozen weights.

The trained model allows for whole-slide quantification and cell-level annotation of primary nuclei and micronuclei. Nuclear segmentation was also done using StarDist on the DAPI channel. Each micronucleus was assigned to the closest primary nucleus. Micronuclei were included for analysis if they were 10 μm or less from the centroid of the closest nucleus, had an area of 20 μm2 or less, a circularity of more than 0.65 and a minimum object probability of more than 0.75.
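The assignment and inclusion criteria can be expressed as a short filter. The sketch assumes that each detected micronucleus is a dictionary with a centroid (in μm), an area (in μm2), a circularity and an object probability, and that distance is measured between the micronucleus centroid and the nucleus centroid. These field names and the function name are hypothetical and do not correspond to QuPath or StarDist output formats.

```python
import math


def assign_and_filter(micronuclei, nuclei_centroids, max_dist=10.0, max_area=20.0,
                      min_circularity=0.65, min_prob=0.75):
    """Assign each micronucleus to its closest primary nucleus and apply the filters."""
    kept = []
    if not nuclei_centroids:
        return kept
    for mn in micronuclei:
        # Index of, and distance to, the closest primary nucleus centroid.
        idx, dist = min(
            ((i, math.dist(mn["centroid"], c)) for i, c in enumerate(nuclei_centroids)),
            key=lambda pair: pair[1],
        )
        if (dist <= max_dist and mn["area"] <= max_area
                and mn["circularity"] > min_circularity and mn["prob"] > min_prob):
            kept.append({**mn, "nucleus": idx, "distance": dist})
    return kept
```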

Validation of micronuclei segmentation

We evaluated our method on a test dataset with held-out micronuclei labels, showing good performance of predicted micronuclei segmentations with high average precision and F1 scores (intersection-over-union (IoU) > 0.5). We quantitatively evaluated the segmentation performance on the test data by considering cGAS+ micronuclei objects in the ground truth to be correctly matched if there were predicted objects with sufficient overlap. We used IoU as the overlap criterion, demonstrating good performance with a chosen IoU threshold of more than 0.5.
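Object-level matching at an IoU threshold of more than 0.5 can be sketched as follows. The sketch assumes that each object is represented as a set of pixel coordinates and that each predicted object can match at most one ground-truth object. The function names are hypothetical, and this is not the evaluation code used with StarDist.

```python
def iou(a, b):
    """Intersection-over-union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_scores(truth, predicted, threshold=0.5):
    """Count matches with IoU > threshold and derive precision, recall and F1."""
    unmatched = list(range(len(predicted)))
    tp = 0
    for t in truth:
        hit = next((j for j in unmatched if iou(t, predicted[j]) > threshold), None)
        if hit is not None:
            unmatched.remove(hit)  # each prediction is used at most once
            tp += 1
    fp = len(predicted) - tp
    fn = len(truth) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```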

Micronuclei rates

Micronuclei rupture rates were estimated on the basis of the number of cGAS+ micronuclei and primary nuclei segmented in tumour ROIs. The rate of micronuclei rupture was estimated by localization of cGAS+ micronuclei neighbouring primary nuclei. The micronuclei rate was calculated as the fraction of primary nuclei with one or more micronuclei. Applying the micronuclei segmentation model across all ROIs yielded 896,042 cGAS+ micronuclei in tumour ROIs, with a mean micronuclei area of 0.76 μm2, ranging between 0.1 μm2 and 6.8 μm2. Slide-level and ROI-level micronuclei rates were calculated and are summarized in Supplementary Table 5; small ROIs with 1,000 primary nuclei or fewer were excluded from downstream analyses.
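The rate calculation and the ROI size filter amount to the following sketch, which assumes that each ROI is summarized as a list holding the number of assigned micronuclei for every primary nucleus. The function names and the input layout are hypothetical.

```python
def micronuclei_rate(mn_per_nucleus):
    """Fraction of primary nuclei with one or more assigned micronuclei."""
    if not mn_per_nucleus:
        return None
    return sum(1 for n in mn_per_nucleus if n >= 1) / len(mn_per_nucleus)


def roi_rates(rois, min_nuclei=1000):
    """Per-ROI rates, keeping only ROIs with more than `min_nuclei` primary nuclei."""
    return {name: micronuclei_rate(counts)
            for name, counts in rois.items()
            if len(counts) > min_nuclei}
```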

Statistical comparisons of micronuclei rates

To compare the micronuclei rate between WGD-high and WGD-low tumours, we used GEE. We used binary WGD status (WGD-high versus WGD-low) as the dependent variable with a Gaussian distribution and log(micronuclei rate) as the independent variable, adding patient as a group variable in the model. The reported effect size of WGD was calculated from the coefficient of log(micronuclei rate) in the learned model.

Analysing the relationship between micronuclei rate and STING1

We used a linear mixed effects model to evaluate the relationship between STING1 protein intensity and micronuclei rate separately for WGD-high and WGD-low tumours. We first divided each image into a regular grid of 1 mm × 1 mm tiles. For each primary nucleus, we computed the mean STING1 protein intensity in the combined nuclear and cytoplasmic region. For each tile, we then computed the micronuclei rate in the tile and the mean STING1 protein intensity for tumour cells detected in the tile. We log-transformed this micronuclei rate and mean STING1 intensity and used a linear mixed effects model with the formula log(STING1) ~ log(micronuclei rate) with images as the group variable. We then report the coefficient and P value of the coefficient of log(micronuclei rate) in the model.

Mutational signatures

We analysed mutational signatures by integrating SNVs and structural variations detected by either bulk WGS or scWGS in a unified probabilistic approach called multimodal correlated topic models (MMCTM)15.

For bulk WGS samples, we obtained signature labels in the MSK SPECTRUM cohort (n = 41) using MMCTM, as presented in ref. 14. Mutational signatures for cases without bulk WGS data were assigned on the basis of mutational signatures inferred from scWGS. For scWGS samples, we obtained signature labels in the MSK SPECTRUM cohort (n = 41) using a ridge classifier with default regularization strength (α = 1.0). This classifier was trained on the integrated SNV and SV signature probabilities, which were obtained using MMCTM13 from HGSOC bulk whole genomes13 (n = 170).

Consensus mutational signatures were derived preferentially from MMCTM signatures inferred from bulk WGS, and otherwise from MMCTM signatures inferred from scWGS. Mutational signatures for cases without bulk WGS data (OV-006, OV-044, OV-046, and OV-071) or with inconclusive bulk WGS assignments (OV-004, OV-045, OV-080, and OV-081) were resolved on the basis of scWGS.

Analysis of RPE-1 and FNE1 cell-line experiments

10x scRNA-seq preprocessing

Raw 10x Genomics sequencing data for RPE-1-mixed and FNE1-mixed were aligned using CellRanger (v.7.0.0), which also performed barcode filtering and UMI gene counting using the 10x Genomics GRCh38 reference transcriptome.

10x Multiome preprocessing

Raw 10x Genomics sequencing data for RPE-D, RPE-Noco and RPE-Rev were aligned to the 10x Genomics GRCh38 reference transcriptome using CellRanger ARC (v.2.0.2). CellRanger ARC also performed barcode filtering and UMI gene counting to generate feature-barcode matrices for both RNA and ATAC modalities.

scATAC-seq copy-number analysis

Copy number was inferred from the scATAC-seq component of the 10x Genomics multiome data for the RPE-D, RPE-Noco and RPE-Rev samples. Blacklist-filtered fragments were first counted in 10-megabase genome bins. Bins with a GC content of less than 30% were removed before GC correction using modal regression16. Cells with more than 5% of their bins containing NA values after GC modal correction were removed from subsequent analysis. GC-corrected counts were smoothed using the DNAcopy R package (v.1.73.0) smooth.CNA function, setting smooth.region = 4. Smoothed counts were mean-normalized per cell before clustering using Seurat (v.5)85. For visualization, mean-normalized and smoothed counts were scaled bin-wise to emphasize copy-number differences between clusters.

scRNA-seq copy-number analysis

Copy number was inferred from 10x scRNA-seq for the RPE-1-Mixed and FNE1-Mixed samples using Numbat (v.1.4.0)86 to preprocess and smooth expression counts. Smoothed counts were then rebinned to 500-kilobase bins, reduced to 50 dimensions by PCA and then clustered using Leiden clustering at 1.0 resolution on an SNN graph.

Identification of WGD subclones

A spontaneously arising WGD copy-number clone was observed in all DLP+ samples for RPE-1, characterized by gain of chromosome 1p and loss of chromosomes 1q, 2q, 4q and 21 (Extended Data Fig. 8a). The same WGD clone was evident in copy number inferred from scATAC-seq for RPE-1-D, RPE-1-Noco, and RPE-1-Rev, and from scRNA-seq for RPE-1-Mixed (Extended Data Fig. 8a). For event rate analyses in RPE-1-D, RPE-1-Noco and RPE-1-Rev we excluded scRNA-seq cells in the scATAC-seq-inferred WGD clone from further analysis to characterize the phenotypic impact of CIN in non-WGD cells. For RPE-1-Mixed, we aimed to characterize the phenotypic differences between WGD and non-WGD cells. We therefore used the scRNA-based copy-number clusters to label cells in that sample as either WGD or non-WGD.

In the FNE1-Mixed cell line, from the DLP+ data we identified a WGD clone characterized by loss of chromosomes 4, 18 and 21, and gain of chromosomes 5 and 20 (Extended Data Fig. 8b). The same WGD clone was evident in the copy number inferred from scRNA-seq (Extended Data Fig. 8b).

Estimating rates of cell-specific events from DLP+

We inferred cell-specific rates of copy-number change from the RPE-1 and FNE1 DLP+ data using a clustering-based method. We first removed low-quality and cycling cells as described above. We then clustered all cells from each cell line to identify a stable non-WGD copy-number profile. Next, for each cell, we computed the number of copy-number events between the stable non-WGD profile and the cell profile under two scenarios: including a WGD along the path from stable profile to cell profile, and not including a WGD. For each scenario, we classified events using the same greedy approach as in the patient data to identify chromosome, arm and segment events. We kept the smaller set of events for each cell; if this corresponded to the scenario with a WGD, then the cell was called WGD. For the FNE1 data, we ignored the small number of WGD cells in preceding samples (9.2–13.2% of cells) because only those in the FNE1-Mixed sample represented the clone identifiable in scRNA-seq (Extended Data Fig. 8b).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.