Main

Breast cancer is the most common malignancy in women, accounting for more than 15% of new cancer cases in the USA annually1. Clinically, breast tumours are stratified into three immunohistochemistry subtypes, namely ER+HER2−, HER2+ and triple-negative breast cancer (TNBC), on the basis of the expression of ER, progesterone receptor and HER2 (ref. 2). Although heterogeneity in gene expression, especially measures of proliferation, within these subtypes correlates with prognosis and patterns of relapse, and is used to guide therapy3, ultimately the paradigm of three major subtypes dictates our understanding of and approach to the disease.

We previously defined eleven subtypes of breast cancer on the basis of integrative clustering (IC) of genomic and transcriptional profiles, and demonstrated their distinct prognosis and relapse trajectories4,5. Among patients with ER+ cancer (80% of cases), one-quarter had a 45% chance of distant recurrence two decades post-diagnosis5. This ER+ ‘high-risk’ subgroup, corresponding to IC1, IC2, IC6 and IC9 subtypes, is enriched for luminal B tumours harbouring focal oncogene amplification and overexpression, similar to ERBB2-amplified tumours (IC5, 10–15%). Moreover, genes within these amplicons mediate resistance to hormonal therapy6,7. TNBC comprises genome-unstable basal-like IC10 and IC4ER− tumours, the latter with relapse risk that persists beyond 5 years.

Although the IC subgroups improve relapse prediction and define new drivers5, their origins, evolution and tumour immune microenvironments (TMEs) remain unknown. To investigate, we assessed the genomic architecture and microenvironmental composition of breast tumours from a meta-cohort of 1,828 tumours spanning pre-invasive ductal carcinoma in situ (DCIS), primary and metastatic lesions, profiled using whole-genome sequencing (WGS) and transcriptome sequencing8,9,10. We further implemented a machine learning framework to determine IC subtypes from DNA-based profiles alone. Our analyses reveal three primary genomic archetypes of breast cancer: (i) TNBC (IC10 and IC4ER−); (ii) typical-risk ER+HER2− (IC3, IC4ER+, IC7 and IC8); and (iii) high-risk ER+HER2− (IC1, IC2, IC6 and IC9) and HER2+ (IC5) (referred to as ER+ high-risk + HER2+). The last group is characterized by early, recurrent amplifications, including extrachromosomal DNA (ecDNA) owing to APOBEC3B (A3B)-editing at ER-induced R-loops. These genomic patterns, accompanied by variable TMEs, implicate complex rearrangements as a major driver of immune escape and highlight new therapeutic vulnerabilities in aggressive subgroups.

Evolution of the IC subgroups

The mutational processes underlying breast cancer initiation and progression are incompletely understood11,12,13,14. Herein we uniformly processed 1,828 samples from DCIS (n = 406), primary (n = 702) and metastatic (n = 720) lesions using a harmonized, state-of-the-art bioinformatics pipeline to identify single nucleotide variants (SNVs), copy number aberrations (CNAs), structural variants (SVs), ecDNA and mutational signatures (Fig. 1a, Supplementary Fig. 1a–c and Supplementary Table 1). Owing to shallow coverage of the archival DCIS cohort, SNVs and SVs were not called10. Additionally, we used the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) cohort of primary invasive tumours4 (n = 1,894) with both RNA and DNA profiles and about 20 years of clinical follow-up. To our knowledge, this represents the largest collection of uniformly processed breast tumours spanning all disease stages.

Fig. 1: ENiClust identifies the IC subtypes.

a, Schematic of the study design. BC, breast cancer; HTAN, Human Tumor Atlas Network; sWGS, shallow WGS; TCGA, The Cancer Genome Atlas; PCAWG, Pan-Cancer Analysis of Whole Genomes. b, Schematic of the ENiClust IC classifier. WES, whole-exome sequencing. c, Kaplan–Meier curves of distant relapse-free (DRF) survival of the ER+ typical-risk and ER+ high-risk classes detected by the four IC subtype classifiers. Shaded area represents 95% confidence interval. HR, hazard ratio. d, Difference in distant relapse-free survival probability (top) or delta in Cox proportional hazard ratio (bottom) between ER+ typical-risk and ER+ high-risk classes detected by the four different IC classifiers. Error bars represent the difference in 95% confidence intervals between ER+ typical-risk and ER+ high-risk in each model. e, Differential pattern of relapse across the ICs, illustrated by the cumulative (black) and annual (red) risk of relapse over time. f, IC subgroup (left) and subtype (right) distributions across disease stages. P, primary; M, metastatic. The schematics in a,b were created with BioRender.com.

Although the ICs predict distant relapse and delineate genomic drivers4,5, current methods fail to accurately capture them using DNA profiles alone12. Accordingly, we developed Ensemble Integrative Clustering (ENiClust), which reliably infers IC subtypes from whole-exome sequencing or WGS, across all stages of disease (Fig. 1b and Supplementary Table 2). The final ensemble model yields a nine-class prediction (Fig. 1b and Supplementary Table 2), which is further split into ten on the basis of the ER status of IC4 (that is, IC4ER+ and IC4ER−). These ten classes comprise four clinically distinct IC subgroups: TNBC (IC10 and IC4ER−), HER2+ (IC5), ER+ typical-risk (IC3 + IC7, IC4ER+ and IC8) and ER+ high-risk (IC1, IC2, IC6 and IC9). Throughout we refer to HER2+ tumours as those classified as IC5, enriching for ERBB2 amplification. ENiClust outperformed iC10 DNA alone1,12 (Methods and Supplementary Fig. 1d) and improved patient stratification, with high-risk tumours exhibiting worse distant recurrence-free survival (METABRIC; Fig. 1c–e and Supplementary Fig. 1d–f). Thus, ENiClust identifies clinically meaningful subgroups with distinct biology.
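
The class-to-subgroup bookkeeping described in this paragraph can be sketched in a few lines of Python. This is an illustration only, not part of ENiClust: the function name `ic_subgroup`, the string labels and the assumption that IC3 and IC7 form a single merged class (read from "IC3 + IC7" in the text) are all hypothetical choices made for the example.

```python
# Nine classes as described in the text; "IC3/IC7" is an assumed label
# for the merged IC3 + IC7 class.
NINE_CLASSES = ["IC1", "IC2", "IC3/IC7", "IC4", "IC5", "IC6", "IC8", "IC9", "IC10"]

# Ten classes (after splitting IC4 by ER status) mapped to four subgroups.
SUBGROUP_BY_CLASS = {
    "IC10": "TNBC",
    "IC4ER-": "TNBC",
    "IC5": "HER2+",
    "IC3/IC7": "ER+ typical-risk",
    "IC4ER+": "ER+ typical-risk",
    "IC8": "ER+ typical-risk",
    "IC1": "ER+ high-risk",
    "IC2": "ER+ high-risk",
    "IC6": "ER+ high-risk",
    "IC9": "ER+ high-risk",
}


def ic_subgroup(predicted_class: str, er_positive: bool) -> str:
    """Map a nine-class prediction plus ER status to one of four subgroups."""
    if predicted_class not in NINE_CLASSES:
        raise ValueError(f"unknown class: {predicted_class}")
    label = predicted_class
    if predicted_class == "IC4":
        # IC4 is the only class that is split by ER status.
        label = "IC4ER+" if er_positive else "IC4ER-"
    return SUBGROUP_BY_CLASS[label]
```

For example, `ic_subgroup("IC4", er_positive=False)` returns `"TNBC"`, whereas the same class with `er_positive=True` returns `"ER+ typical-risk"`.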

Using ENiClust, we interrogated the distribution of ICs across disease stages. DCIS was enriched for IC5 tumours (Fisher’s exact test P = 2.98 × 10−6; Fig. 1f), corroborating our previous findings10. ER+ high-risk ICs were enriched among metastatic tumours, consistent with their increased relapse risk (Fig. 1f and Extended Data Fig. 1a). IC10 basal-like tumours were depleted in the metastatic cohort, potentially owing to differences in ancestry (Extended Data Fig. 1b–d). The ICs were largely stable from primary to metastasis (concordance = 71.8%; Extended Data Fig. 1e,f).
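
Enrichment statements of this kind rest on a 2×2 contingency table (for example, IC5 versus other subtypes, by DCIS versus invasive stage). The sketch below shows how an odds ratio and a two-sided Fisher's exact P value can be computed from such a table using only the Python standard library. The function names are hypothetical, the counts in the usage note are a toy table rather than study data, and the study's own statistical software is not specified here.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

With the toy table `[[3, 1], [1, 3]]`, `odds_ratio(3, 1, 1, 3)` is 9.0 and `fisher_exact_two_sided(3, 1, 1, 3)` is 34/70 (about 0.486), which illustrates that a large odds ratio from few samples need not be significant.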

There was an increased proportion of luminal B versus luminal A from pre-invasive to primary (Δ(LumB/(LumA + LumB)) = +11%) and primary to metastatic (Δ(LumB/(LumA + LumB)) = +29%; Extended Data Fig. 1g) lesions. Among primary tumours, ER signalling in ER+ high-risk tumours was more akin to that of HER2+ER+ tumours15 and significantly lower than that of ER+ typical-risk tumours (Extended Data Fig. 1h), with no difference between primary and metastatic tumours (Extended Data Fig. 1i). Compared to ER+ typical-risk, ER+ high-risk was enriched among patients with tumours that were resistant to endocrine therapy (odds ratio (OR) ≥ 5.58, P ≤ 0.03; Supplementary Fig. 1g). In a clinical trial (NCT00651976) in early-stage ER+ breast cancer, high-risk tumours had a decreased proliferation score with letrozole treatment but it remained significantly higher than that for typical-risk tumours (P ≤ 0.02; Supplementary Fig. 1h). Thus, ER+ high-risk tumours may experience persistent proliferation despite endocrine treatment. New therapies (selective oestrogen receptor degraders and proteolysis-targeting chimeras) that more fully suppress proliferation might particularly benefit this subgroup.

Early IC-specific SVs fuel progression

The IC subtypes have distinct CNA landscapes (Extended Data Fig. 1j), but their SV landscape and evolution have not been investigated. Leveraging ENiClust, we found that the IC-subgroup-specific genomic landscape of breast cancer is consistent throughout disease progression despite an increased burden of alterations10,13,16,17 (Fig. 2a and Extended Data Fig. 2a,b). Both HER2+ and ER+ high-risk primary and metastatic tumours exhibit characteristic sharp increases in SV burden at their respective recurrently amplified loci (IC5: 17q12; IC6: 8p11; IC2: 11q13; IC1: 17q23). The peak of SV burden at 17q12 (ERBB2) suggests that ERBB2 amplification is fuelled by complex alterations, such as ecDNA18. The mutational burden in primary ER+ typical-risk tumours was minimal (Supplementary Fig. 1i) but increased in metastatic disease (Fig. 2a), in part owing to treatment (Extended Data Fig. 2c). IC10 and IC4ER− tumours exhibit diffuse genome-wide instability with an increased SV burden, although the latter show an attenuated pattern and harbour fewer pathogenic SVs and alterations in DNA repair pathways, confirming previous reports19 (Extended Data Fig. 2d,e). Across metastatic sites, the cumulative burden of alterations was higher in lung and subcutaneous metastases and lower in soft-tissue and in-breast recurrences (Extended Data Fig. 2f). These subgroup-specific alterations were seen in DCIS (Extended Data Fig. 2a), emphasizing early oncogene addiction and mechanisms of malignant transformation.

Fig. 2: SVs define three distinct genomic archetypes.

a, IC group-level CNA profile (shaded area; dark denotes amplification, light denotes deletion) with SV burden (line) as overlay and total alteration burden in primary and metastatic samples. b, Pareto front projection on ternary plot of CNA and SV signature profiles from primary (left) and metastatic (right) tumours independently, resulting in three genomic archetypes. Each plotted circle represents a tumour. c, Lollipop plots illustrating the correlation between mutational features and the distance to each archetype. amp., amplification; BFB, breakage–fusion–bridge; TIC, templated insertion chain; LOH, loss of heterozygosity; WGD, whole-genome doubling; FGA, fraction of genome altered.

Next we characterized CNA and SV signatures in 702 primary breast tumours, replicating the 24 CNA20 and 6 rearrangement8,21 signatures (RSs) previously reported (Supplementary Fig. 2a–c). RS3, RS5 (associated with homologous recombination deficiency (HRD); Supplementary Fig. 2d) and CN17 were enriched in IC10 tumours, whereas RS4, RS6 (associated with complex amplifications) and CN7 were enriched in ER+ high-risk and HER2+ tumours (Extended Data Fig. 2g,h and Supplementary Fig. 2e–g). ER+ typical-risk tumours were enriched for CN1 (associated with diploid genomes; Supplementary Fig. 2d,e).

Projected on a two-dimensional plane (Supplementary Fig. 3a,b), the architectural profiles follow a continuum and form a polyhedron reminiscent of Pareto optimum theory, which illustrates trade-offs between biological tasks22. Primary breast cancers map onto three dominant genomic archetypes (Supplementary Fig. 3c–f): TNBC-enriched, ER+ typical-risk-enriched and ER+ high-risk + HER2+-enriched. Tumours dominated by a single mutational process are proximal to a vertex, whereas those characterized by multiple processes cluster at the centre (Fig. 2b and Extended Data Fig. 2i). The TNBC-enriched archetype was positively correlated with genomic instability, HRD and APOBEC-editing SNVs (Fig. 2c and Supplementary Fig. 3g). Compared to ER+ high-risk tumours, HER2+ tumours were enriched for tyfonas (Extended Data Fig. 2j). The ER+ high-risk + HER2+-enriched archetype was positively correlated with complex amplifications, reactive oxygen species and APOBEC-associated SNVs, as well as with co-amplification of multiple cytobands (Extended Data Fig. 3a). By contrast, the ER+ typical-risk-enriched archetype negatively correlated with most genomic features.
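
The geometry behind a ternary archetype plot can be illustrated with barycentric coordinates: a tumour at a point inside a triangle is expressed as a weighted mixture of the three vertices, with a weight near 1 at a vertex and weights near 1/3 at the centre. The sketch below is not the Pareto front procedure used in the study. It only shows the coordinate arithmetic, and the function names, vertex positions and archetype labels are assumptions made for the example.

```python
ARCHETYPES = ("TNBC-enriched", "ER+ typical-risk-enriched", "ER+ high-risk + HER2+-enriched")


def barycentric_weights(point, vertices):
    """Weights (w1, w2, w3), summing to 1, of a 2-D point relative to a triangle."""
    x, y = point
    (x1, y1), (x2, y2), (x3, y3) = vertices
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("vertices are collinear; triangle is degenerate")
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return (w1, w2, w3)


def dominant_archetype(point, vertices):
    """Label of the vertex with the largest weight for this point."""
    weights = barycentric_weights(point, vertices)
    return ARCHETYPES[weights.index(max(weights))]
```

With hypothetical vertices `[(0, 0), (1, 0), (0, 1)]`, the point `(0.8, 0.1)` has weights of approximately (0.1, 0.8, 0.1) and is assigned to the second archetype, whereas the centroid `(1/3, 1/3)` has three equal weights, corresponding to a tumour shaped by several processes.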

Tumours predicted to be BRCA-like on the basis of germline or somatic genomic features23 map to the TNBC-enriched archetype (Extended Data Fig. 3b). Indeed, both BRCA1-like and BRCA2-like ER+ and ER− tumours demonstrated significantly higher TNBC-archetype scores than non-HRD tumours, and HRD-like ER+ high-risk tumours were closer to the TNBC-enriched archetype than their non-HRD-like counterparts (OR = 5.09; P = 6.5 × 10−4). Additionally, the mutational patterns of BRCA1-like and BRCA2-like ER− and ER+ tumours were highly concordant (Supplementary Fig. 3h,i). Notably, whereas 43.6% of TNBC tumours were HRD-like, 13.2% of ER+ high-risk tumours were also predicted to be HRD-like, with most being ER+ high-risk IC1 or IC9 (OR = 4.43; P = 0.03; Extended Data Fig. 3c and Supplementary Fig. 3j). Indeed, although foldback inversions and pyrgos were enriched in TNBC (foldback inversion: 17.3%, P = 2.00 × 10−3; pyrgos: 18.8%, P = 9.33 × 10−4), these mutational events were also observed in ER+ tumours (5.1% and 4.1%, respectively; Extended Data Fig. 3d). These data reinforce multiple mechanisms of genome instability in TNBC24 that also affect a subset of ER+ tumours.

The three genomic archetypes replicated in an independent cohort of 2,229 primary tumours from Genomics England21 (Extended Data Fig. 3e). Overall, the genomic landscape of primary breast tumours falls along a continuum with mutational patterns captured by three main genomic archetypes, namely, genome-stable, diploid genomes (ER+ typical-risk-enriched), genome-wide instability (TNBC-enriched) and focal, complex amplifications (ER+ high-risk + HER2+-enriched).

Metastatic lesions exhibit increased SNV and SV burdens compared to unpaired primary tumours, probably owing to therapy, as we and others have shown13,17. Using the above approach, we identified six de novo SV signatures in metastases that correlated with those in primary tumours (Supplementary Fig. 4a,b) and showed similar subgroup-specific enrichment patterns (Extended Data Fig. 3f). Two-dimensional projection again revealed three dominant archetypes (Supplementary Fig. 4c) that overlap with those in primary tumours (Fig. 2b,c, Extended Data Fig. 3g and Supplementary Fig. 4d). Our results were robust to choice of dimensionality reduction algorithm (Supplementary Fig. 4e–g). Thus, the three genomic archetypes of breast cancer are conserved in metastatic disease.

SV signatures were generally conserved, although increased, in metastatic tumours except for RS4 and RS6 in ER+ high-risk and HER2+ tumours, respectively, which were stable (Extended Data Fig. 3h). These data support the early occurrence of complex rearrangements and their persistence through metastasis. Although the distribution of CNA signatures mirrored primary tumours, the Pareto front revealed increased alteration burden and more intermixed profiles in metastasis, consistent with increased whole-genome doubling and genomic instability17 (Extended Data Fig. 3i and Supplementary Fig. 4h,i). Thus, metastatic tumours retain the scars of subgroup-specific mutational processes operative in early-stage disease.

Although ER+ typical-risk tumours have a favourable prognosis, 29% of patients experience distant relapse4. We investigated whether the genomic archetypes improve risk stratification. Mapping METABRIC onto the Pareto front (Methods, Extended Data Fig. 3j and Supplementary Fig. 4j–l), the position of ER+ typical-risk tumours was predictive of relapse, with recurrent tumours mapping closer to the ER+ high-risk + HER2+ archetype (Extended Data Fig. 3k,l) accompanied by a higher HRD loss-of-heterozygosity score, invasive lobular carcinoma (ILC) histology and increased proliferation.

In METABRIC, ILCs were enriched in ER+ typical-risk tumours (OR = 2.20, P = 2.27 × 10−3, Fisher’s exact test; Supplementary Fig. 4m). Within ER+ high-risk tumours, ILCs exhibited a higher 5-year recurrence risk (39% versus 30%) and cumulative recurrence risk (62% versus 54% at 20 years; Extended Data Fig. 3m). This difference was more marked among ER+ typical-risk tumours (55% versus 37% at 20 years). ILCs were closer to the ER+ typical-risk archetype than their invasive ductal carcinoma (IDC) counterparts (P = 2.10 × 10−5; Extended Data Fig. 3n,o) given their lower levels of whole-genome doubling, ploidy and fraction of genome altered. Thus, given comparable genomic architectures, lobular histology remains a high-risk feature.
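
Recurrence risks at fixed horizons, such as the 5-year and 20-year figures above, are typically read from a time-to-event curve. As a minimal sketch, assuming a plain Kaplan–Meier estimator without competing risks (the study's exact survival methodology is not reproduced here), the cumulative risk at a horizon can be taken as one minus the estimated relapse-free probability. The function names and the follow-up data in the usage note are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(event_time, survival_probability), ...].

    times:  follow-up time per patient (for example, years)
    events: True if distant relapse was observed, False if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    curve = []
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        relapses = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1.0 - relapses / at_risk
        curve.append((t, survival))
    return curve


def cumulative_risk(curve, horizon):
    """One minus the relapse-free probability at the given horizon."""
    survival = 1.0
    for t, s in curve:
        if t <= horizon:
            survival = s
    return 1.0 - survival
```

For four hypothetical patients with follow-up times `[1, 2, 3, 4]` and event flags `[True, False, True, True]`, the curve is `[(1, 0.75), (3, 0.375), (4, 0.0)]`, so the cumulative risk at time 3 is 0.625. The censored patient at time 2 leaves the risk set without counting as a relapse.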

ER-induced R-loops fuel ecDNA genesis

ER+ high-risk and HER2+ breast tumours were enriched for complex amplifications in two independent cohorts (OR > 10.1; P < 2.2 × 10−16; Fig. 3a and Extended Data Fig. 4a), motivating further exploration of their origin and nature (Supplementary Fig. 5a,b). There was no difference in cyclic amplifications in HER2+ER− primary tumours compared to HER2+ER+ primary tumours (Extended Data Fig. 4b). Leveraging two independent ecDNA inference methods, 43–67% of primary ER+ high-risk and HER2+ cases were predicted to harbour ecDNA (Extended Data Fig. 4c,d). A proportion of HER2+ primary tumours (25.7%) harboured amplifications in loci specific to the ER+ high-risk subgroup (Extended Data Fig. 4e and Supplementary Fig. 5c), with 8.57% predicted to be on ecDNA. Additionally, we observed a modest enrichment of inversions at the 11q13 locus in primary tumours. HRD and ecDNA were mutually exclusive in primary ER+ high-risk and IC10 tumours (OR = 0.21–0.29; false discovery rate (FDR) < 0.02; Supplementary Fig. 5d). We interrogated complex amplifications in 406 pre-invasive DCIS profiled with shallow WGS (5× median coverage)10. We predicted 35 cyclic and 205 complex non-cyclic amplifications, enriched in ER+ high-risk + HER2+ tumours (OR = 4.21; P = 2.48 × 10−4; Extended Data Fig. 4f and Supplementary Fig. 5e). This pattern replicated in 12 DCIS samples from Genomics England (92.8×)25. Leveraging the clock-like accumulation of mutations, SNV density informs the timing of cyclic amplifications (Methods). Compared to cyclic amplifications in TNBC tumours, cyclic amplifications in ER+ high-risk and HER2+ tumours had a lower SNV density before amplification, suggesting an earlier origin (Fig. 3b and Supplementary Fig. 5f). The median time of cyclic amplification in ER+ high-risk and HER2+ tumours was decades earlier than in IC10 tumours, implicating cyclic amplifications as early events.
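
The timing argument relies on the fact that an SNV acquired before an amplification is copied along with the segment and is therefore carried on several copies, whereas an SNV acquired afterwards sits on a single copy. The sketch below illustrates this idea with the standard relation between variant allele fraction (VAF), tumour purity and copy number. It is not the procedure in the Methods: the function names, the multiplicity threshold and the per-megabase normalization are assumptions made for the example.

```python
def snv_multiplicity(vaf, purity, tumour_cn, normal_cn=2):
    """Estimated number of tumour copies that carry the SNV.

    Rearranges: expected VAF = m * purity / (purity * tumour_cn + (1 - purity) * normal_cn)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    return vaf * (purity * tumour_cn + (1 - purity) * normal_cn) / purity


def pre_amplification_density(vafs, purity, tumour_cn, region_mb, min_copies=2):
    """SNVs per megabase that appear to predate the amplification.

    An SNV is counted as pre-amplification when its rounded multiplicity
    is at least `min_copies` (an assumed, illustrative threshold).
    """
    early = sum(
        1
        for vaf in vafs
        if round(snv_multiplicity(vaf, purity, tumour_cn)) >= min_copies
    )
    return early / region_mb
```

In a hypothetical pure sample with an 8-copy amplicon spanning 2 Mb, SNVs at VAFs of 0.5 and 0.75 map to 4 and 6 copies (pre-amplification), whereas those at 0.125 map to a single copy, so `pre_amplification_density([0.5, 0.125, 0.75, 0.125], 1.0, 8, 2.0)` gives 1.0 per megabase. Under a clock-like mutation rate, a lower value suggests that less time elapsed before the amplification occurred.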

Fig. 3: Cyclic amplifications are early mutational processes in ER+ high-risk and HER2+ breast tumours.

a, Proportion (top) and number (bottom) of samples with at least one cyclic or complex non-cyclic amplification in primary or metastatic tumours. b, The density of SNVs occurring before amplification in primary (top) and metastatic (bottom) tumours. Boxplot represents median, 0.25 and 0.75 quantiles with whiskers at 1.5× the interquartile range. c, Illustration showing copy number (CN) and SVs linking together disjoint segments in ecDNA (top), ratio of read depth in the tumour versus normal sample (middle) and location of oncogenes in ecDNA (bottom) in a representative primary IC2 tumour. d, Ratio of sequencing coverage in digested versus parental UCD65 (IC2) cell line in the predicted ecDNA region (dashed red line) compared to 1,000 null regions. e, Proportion of tumours within each IC subtype that harbour cyclic, complex non-cyclic or linear amplification in IC-specific oncogenes. f, Schematic for the genesis of cyclic amplifications. TC-NER, transcription-coupled nucleotide-excision repair. g, The density of ER-induced R-loops in cyclic versus complex non-cyclic amplifications. h, The percentage of breakpoints that overlap ER-induced R-loops with (+) or without (−) E2 treatment. Error bars represent the standard deviation across three replicates. i, The distance of each oncogene to the nearest ER-induced R-loop. The schematic in f was created with BioRender.com.

Most cyclic amplifications in ER+ high-risk (88%) and HER2+ (96%) tumours overlapped at least one COSMIC-defined oncogene (Extended Data Fig. 4g). Of these, 79–92% involved oncogenes in IC-associated cytobands (Extended Data Fig. 1j) and 15% involved two or more cytobands (Extended Data Fig. 4h and Supplementary Table 3). In cell line models of IC2 (UCD65) and IC6 (UCD12) before and after linear DNA digestion, significantly higher sequencing coverage occurred at regions predicted to encode ecDNA, corroborating our computational predictions (Fig. 3c,d and Supplementary Fig. 5g,h). Oncogene incorporation varied across subtypes, with HER2+ tumours harbouring the largest number per megabase (Extended Data Fig. 4i,j). A total of 82% of IC2, 59% of IC5 (HER2+), 48% of IC6 and 32.5% of IC1 tumours had predicted cyclic amplifications at subgroup-defining cytobands, whereas 3% of IC1 and IC9 tumours harboured cyclic amplifications at 20q13, spanning the NCOA3 oncogene (Fig. 3e). Overall 42% of IC9 tumours harbour ecDNA, but these ecDNAs are diffuse along the genome and do not include MYC. In support, focal SV peaks were not observed at 8q24 spanning the MYC oncogene in IC9 primary or metastatic tumours. Instead, a broader region is subject to enhancer hijacking by the long noncoding RNA PVT1, as we previously reported26. PVT1 co-amplifies with MYC in about 90% of tumours (Supplementary Fig. 5i). Frequent enhancer hijacking at MYC may explain the weak correlation between MYC copy number and mRNA abundance (Supplementary Fig. 5j,k).

The subset of ER+ typical-risk tumours harbouring ecDNA fell along the ER+ typical-risk versus high-risk archetype continuum (Extended Data Fig. 4k,l). By contrast, ER− tumours with ecDNA had limited structural conservation (Extended Data Fig. 5a). Across all subgroups, similar patterns were observed in metastatic and pre-invasive tumours (Extended Data Fig. 5b–f).

Increased replication stress has been associated with response to checkpoint27 and DNA repair28 inhibitors, and hence is a therapeutic vulnerability in TNBC8. Assessing replication stress across the IC subgroups, we found increased levels of oncogene-induced replication stress in ER+ high-risk and HER2+ tumours compared to ER+ typical-risk, IC10 and IC4ER− tumours (FDR < 0.026; Extended Data Fig. 6a,b and Supplementary Table 2). The replication stress signature was positively correlated with TNBC-enriched and ER+ high-risk + HER2+-enriched genomic archetypes (effect size > 0.154, P < 4.98 × 10−15; Extended Data Fig. 6c–e). Within ER+ typical-risk tumours, ILC had a higher replication stress than IDC (FDR = 4.08 × 10−3). Meta-analysis suggests a positive association between ecDNA and replication stress in HER2+, IC1 and IC6 tumours (Extended Data Fig. 6f) and higher levels of type-I interferon signature in ecDNA+ tumours (Supplementary Fig. 6a). Finally, ER+ high-risk and HER2+ tumours demonstrated increased cGAS–STING activity (Extended Data Fig. 6g,h), a possible therapeutic target linked to chromosomal instability and replication stress29.

Consistent with the findings of ref. 30, our data showed that cyclic amplifications were significantly enriched for translocations compared to complex non-cyclic amplifications in ER+ primary tumours (Fig. 3f, Extended Data Fig. 7a,b and Supplementary Fig. 6b). These cyclic-amplified ER+ high-risk tumours had a higher ESR1 mRNA abundance (β = 1.27; P = 6.90 × 10−3; Extended Data Fig. 7c) and enriched ER binding within the amplified region (Extended Data Fig. 7d and Supplementary Fig. 6c). Nonetheless, ER signalling was lower in ER+ high-risk compared to typical-risk tumours (Extended Data Fig. 1h). Given the evidence for ecDNA in pre-malignant lesions, we reasoned that ER signalling is increased in DCIS lesions that classify as ER+ high-risk and subsequently decreases in invasive disease. Leveraging 18 paired ER+ DCIS and primary tumours with transcriptome sequencing10, we observed decreased ER signalling in ER+ high-risk tumours (effect size = 0.33; P = 0.03; Extended Data Fig. 7e). These data support the role of ER in ecDNA genesis through translocations and emphasize their early origin.

The mechanism by which ER activation induces translocations remains unknown. ER recruitment of A3B promotes double-stranded breaks (DSBs) at ER binding sites31 (Fig. 3f and Extended Data Fig. 7a). Increased ER-induced transcription leads to the formation of R-loops producing single-stranded DNA, a substrate for A3B-editing32. A3B deaminates cytosine to uracil, which can be repaired by base-excision repair (BER). Single-strand nicks induced by BER coupled with transcription-coupled nucleotide-excision repair processing of the R-loop can result in DSBs31. Together, these findings indicate that A3B can exacerbate chromosomal instability in the pre-invasive setting33. We reasoned that ER-induced R-loops initiate translocation-bridge amplifications through A3B-editing and confirmed that A3B binding in ER+ cell lines was enriched in cyclic versus non-cyclic amplifications (Extended Data Fig. 7d and Supplementary Fig. 6c). Treatment with oestradiol (E2) in MCF7 cell lines induced R-loops (nR-loops = 212) in the same regions where cyclic amplifications were observed in patient tumours (Fig. 3g and Extended Data Fig. 7f,g). This finding was specific to ER-induced R-loops (nR-loops = 13,965; Extended Data Fig. 7h). Unresolved R-loops due to A3B knockout in MCF10A cells were preferentially enriched in regions of cyclic amplifications in primary breast tumours (Extended Data Fig. 7i and Supplementary Fig. 6d). Tumours containing ecDNA were also enriched for transcription-replication collision-associated large tandem duplications (>100 kb), indicative of impaired R-loop resolution28 (Supplementary Fig. 6e). Translocations within cyclic amplifications were significantly closer to ER-induced R-loops than those outside cyclic amplifications (Extended Data Fig. 7j,k). These data support a role for A3B in R-loop resolution, contributing to ecDNA formation.

Accordingly, we reasoned that oestrogen-induced SV breakpoints would be enriched at ER-induced R-loops. Comparing SV patterns in E2-treated MCF7 cells through high-throughput genome-wide translocation sequencing of DSBs forming translocations induced by CRISPR–Cas9 (ref. 30), we confirmed the enrichment for E2-induced breakpoints at E2-induced R-loops (Fig. 3h) compared to all R-loops (Extended Data Fig. 7l). There was no difference in replication timing between cyclic and non-cyclic amplifications (Supplementary Fig. 6f). ER-induced R-loops were enriched closer to the IC-specific oncogenes PAK1 (IC2), ZNF703 (IC6) and MYC (IC9) than to all other COSMIC-defined oncogenes, including ERBB2 (Fig. 3i). There was no enrichment of ER-induced R-loops near IC1 oncogenes.

Germline CNA polymorphisms in A3B have been associated with APOBEC-dependent mutations34 and immune activation in breast cancer35. Despite limited power, our analyses found a modest but nonsignificant decrease in ecDNA prevalence in ER+ high-risk and typical-risk samples with the homozygous deletion allele (n = 5; Extended Data Fig. 7m). Together, these data indicate that ER activity promotes cyclic amplifications through R-loop formation and A3B-editing.

The ICs harbour distinct TMEs

Tumour clonal composition and genomic features are sculpted by immune pressures36, and oncogenic alterations promote both pro-tumour and anti-tumour immune responses37. Using transcriptomic profiles, we characterized the TME in primary (nTCGA = 1,015; nMETABRIC = 1,894) and metastatic (n = 360) tumours focusing on four subtypes defined by immune infiltration and stromal composition: immune-enriched fibrotic, immune-enriched non-fibrotic, fibrotic and depleted38 (Fig. 4a, Extended Data Fig. 8a and Supplementary Table 2). The reproducibility of the TME subtypes is supported by single-cell spatial proteomic profiling (n = 384; Extended Data Fig. 8b) and cell type proportions estimated from bulk transcriptomics (Supplementary Fig. 7a).

Fig. 4: Complex alterations contribute to IC-specific immune escape.

a, Schematic of TME subtypes and select immune escape pathways. b, Comparison of TME subtypes by IC subgroup. The number of tumours in each subgroup is indicated on the top of each bar. c, Left: proportion of primary and metastatic samples in each IC subgroup with GIE. Right: proportion of samples with alterations in each pathway stratified by IC subgroup and stage of progression. d, Proportion of alteration types in primary and metastatic samples for each of the immune escape pathways. The schematic in a was adapted from BioRender.com (credit: A. Iwasaki & J.-H. Lee; https://app.biorender.com/biorender-templates/figures/all/t-5f4fb77c3b02b700b74df8c6-mhc-class-i-and-ii-pathways).

We then quantified microenvironmental differences across the IC subgroups. Primary IC10 and IC4ER− tumours were enriched for immune-rich (immune-enriched non-fibrotic and immune-enriched fibrotic) TMEs (OR = 3.004, P = 5.17 × 10−11, Fisher’s exact test; Fig. 4b and Supplementary Fig. 7b), as previously reported39. ER+ high-risk and HER2+ primary tumours harboured immune-depleted TMEs (OR = 3.09, P = 1.06 × 10−15, Fisher’s exact test), whereas genome-stable ER+ typical-risk and IC4ER− primary tumours were enriched for fibrotic signatures (fibrotic and immune-enriched fibrotic subtypes; OR = 5.619, P < 2.2 × 10−16, Fisher’s exact test). These observations replicated using a second transcriptional immune score (Supplementary Fig. 7c,d). Within ER+ high-risk tumours, immune enrichment did not differ across subgroups (Extended Data Fig. 8c). Among ER+ typical-risk tumours, ILCs were enriched for the immune-enriched fibrotic subtype compared with IDCs (OR = 2.18, P = 1.17 × 10−3; Extended Data Fig. 8d).

IC4ER− tumours have a more favourable prognosis but longer-term risk of recurrence than IC10 tumours5 despite similar genomic landscapes (Fig. 2a). To investigate differences in their TME, we leveraged single-cell spatial proteomic data and discovered an increased proportion of fibroblasts and T cells in IC4ER− compared to IC10 tumours (Extended Data Fig. 8e and Supplementary Fig. 7e). In support, previous work has linked increased T cell infiltration with improved overall survival in TNBC39. Compared to primary tumours, ER− metastatic tumours were depleted of immune-enriched non-fibrotic and immune-enriched fibrotic features (OR = 3.01; P = 2 × 10−4; Extended Data Fig. 8f). By contrast, HER2+ and ER+ tumours exhibited stable TMEs through metastasis (Fig. 4b, Extended Data Fig. 8g and Supplementary Fig. 7f), consistent with previous reports that ER promotes immunosuppression and immunoediting in pre-invasive lesions40,41.

We found that 43.86% of primary and 47.67% of metastatic tumours exhibited genetic immune escape (GIE), most of which occurred in a single pathway with varying prevalence across IC subgroups (Fig. 4c and Supplementary Tables 4 and 5). IC2 and IC6 tumours were more immune-depleted than IC1 and IC9 tumours (Extended Data Fig. 8c) but harboured fewer GIE alterations (Extended Data Fig. 9a). Instead, 60% of primary IC6 tumours amplified IDO1, a gene located within 8p11.21 that encodes indoleamine 2,3-dioxygenase, a heme-containing enzyme that metabolizes tryptophan and is involved in immune tolerance42 (Extended Data Fig. 9b,c). ER+ typical-risk ILCs exhibit fewer GIE alterations than ER+ typical-risk IDC tumours (Extended Data Fig. 9d), and GIE was not associated with antigen burden (Supplementary Fig. 7g).

Complex alterations and SVs have been overlooked when evaluating GIE37. We found that about 20% of primary and metastatic tumours with GIE harboured SVs or complex amplifications (Fig. 4d and Extended Data Fig. 9e). HER2+ tumours demonstrated the largest increase in GIE between primary and metastatic disease, potentially owing to greater pressure to evade anti-HER2 therapies (OR = 2.23, FDR = 0.19, Fisher’s exact test; Extended Data Fig. 9f). These data illuminate the role of complex alterations in immune escape and tumour-immune co-evolution during disease progression.

Discussion

Here we identify three dominant genomic archetypes of breast cancer driven by distinct mutational processes, describing a continuum of genomic profiles and providing a mechanistic basis for these patterns (Fig. 5a). These three archetypes overlap with the main clinical breast cancer subgroups with a notable difference. For a sizeable proportion of ER+ tumours (43.2%), the ER+ high-risk + HER2+ archetype dominates and the mutational processes are indistinguishable from those of HER2+ tumours. Rather than amplifying ERBB2, these ER+ high-risk tumours harbour focal amplifications of other oncogenes (Extended Data Fig. 1j) and have an increased risk of recurrence akin to HER2+ tumours before the introduction of anti-HER2 therapies5. These ER+ high-risk tumours may similarly benefit from agents directed at their amplified oncogenic drivers and/or shared vulnerabilities.

Fig. 5: Genomic and microenvironmental evolution of breast cancer subgroups.

a, Schematic summary of the genomic and microenvironmental characteristics of the three dominant genomic archetypes in breast cancer. b, Temporal changes in genomic stability, ER signalling and immune enrichment from pre-invasive, primary invasive to metastatic disease across subgroups. The schematics in a,b were created with BioRender.com.

A defining feature of the ER+ high-risk + HER2+ archetype is the generation of focally amplified ecDNA through ER-induced R-loops and A3B-editing. ER-induced R-loops create single-stranded DNA, which serves as a substrate for A3B-editing. DSBs arising from BER and nucleotide-excision repair are resolved in the form of interchromosomal translocations. Translocations that produce dicentric chromosomes can lead to chromosome bridges during mitosis, and breakage of these bridges can generate ecDNA30. ecDNA formation preferentially occurs at loci that define the four ER+ high-risk subgroups and HER2+ disease. Although ecDNA genesis depends on ER, circular amplification may reduce reliance on ER by increasing a particular oncogene’s copy number and rewiring its regulatory network43. This is supported by reduced ER signalling in ER+ high-risk tumours from DCIS to invasive disease. As ER transcriptional activity can contribute to DSBs44, ecDNA formation may balance increased oncogenic signalling with protection against further ER-induced genomic instability (Fig. 5b), and hence reflects an evolutionary trade-off, consistent with mutual exclusivity between complex amplifications and diffuse genome instability.

Beyond tumour subtype, the mutational processes captured by our architectural map may be indicative of distinct therapeutic vulnerabilities. For example, HRD-like tumours are sensitive to PARP inhibition and this has become a mainstay of therapy for TNBC. We find that 44% of TNBC tumours have HRD-like profiles on the basis of WGS, and 13% of ER+ high-risk tumours exhibit BRCA2-like patterns. Although HRD as measured from sequencing data is not confirmed to correlate with PARP inhibitor sensitivity, this result implies that additional patients may benefit from these agents. Further, we find that focally amplified ER+ high-risk tumours exhibit increased replication stress pathway activities, suggesting potential sensitivity to new agents targeting this pathway. Additionally, although APOBEC3 mutagenesis can occur early during tumorigenesis, given its effect on ER activity, A3B represents a potential target in the ER+ high-risk subgroup for which inhibitors are in development45.

The mutational processes that generate and propagate genomic instability both sculpt oncogenic signalling and mediate interactions between tumour cells and the TME. More specifically, SVs contribute to GIE in 9% of breast tumours, but have been overlooked, owing to the need for WGS. Basal-like IC10 tumours, which harbour both high genomic instability and immune infiltrates, probably adapt to this immune pressure through GIE. By contrast, ER+ tumours, both typical- and high-risk, are more immune-depleted at the onset with fewer GIE events, suggesting non-GIE mechanisms37. This is noteworthy given the evolving utility of immunotherapy in breast cancer46. Despite high immune infiltration, up to 62% of TNBC tumours are resistant to current immunotherapies, potentially owing to GIE, whereas 38% of ER+ tumours have immune-enriched TMEs, making them candidates for such agents. Our findings highlight multiple potential strategies for personalizing breast cancer treatment, which will be the focus of ongoing preclinical and translational studies.

Methods

A detailed description of the methods and materials is available in the Supplementary Information.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.