Introduction

Despite remarkable progress in breast cancer treatment, locoregional recurrence and distant metastasis remain the primary causes of cancer-related mortality. This clinical challenge largely arises from dynamic intra-tumoral heterogeneity, where genomic and epigenetic alterations generate therapy-resistant subpopulations1. While conventional treatments preferentially eliminate rapidly proliferating cells, they often spare dormant tumor reservoirs with stemness properties2. Similarly, targeted therapies inevitably select for resistant clones through mutation or alternative pathway activation3. In addition to the cancer cell-intrinsic factors, tumor cells exploit stromal interactions—including immune evasion and angiogenesis-mediated nutrient acquisition—to facilitate disease recurrence4. These observations underscore the urgent need for preclinical models that accurately recapitulate human disease progression and recurrence.

Established cell lines, after prolonged in vitro culture, often lose critical features of human tumor heterogeneity upon transplantation5. Genetically engineered mouse models (GEMMs) using mammary-specific promoters like MMTV or WAP provide valuable alternatives6. Unlike the pregnancy-activated WAP promoter, the MMTV promoter is constitutively active in mammary epithelium without requiring pregnancy, significantly simplifying experimental procedures. Although the polyomavirus middle T antigen (PyMT) does not occur naturally in human breast cancer, unlike commonly activated oncogenes such as Wnt7 and HER2/Neu8, it mimics receptor tyrosine kinase signaling and generates tumors that progress through histopathological stages closely resembling human breast cancer, with infiltrating tumor-associated stromal cells actively shaping disease evolution9,10. Notably, while MMTV-PyMT tumors display luminal B features including HER2/Neu upregulation in late stages, their lack of estrogen and progesterone receptor expression also makes them suitable for modeling triple-negative breast cancer11.

MMTV-PyMT also offers additional advantages, particularly its strain compatibility and genetic simplicity. Unlike most MMTV-driven models limited to FVB mice, MMTV-PyMT reliably forms tumors on the C57BL/6 background—albeit with longer latency—enabling access to the extensive genetic tools available for this widely used background. Moreover, its single-transgene design contrasts with the complex multi-component systems required for modeling other cancers (e.g., Cre+ Kras+ p53null models for lung12 and pancreatic cancer13), allowing for orthogonal use with Cre-loxP mediated lineage-tracing technologies in native tumor microenvironments.

Beyond cell-type-specific Cre drivers, markers for biological processes (e.g., proliferation14 and senescence15) have been used to control Cre expression. Here, we developed a proliferation tracing and ablation system by incorporating a fluorescent-diphtheria toxin receptor (DTR) dual reporter into the MMTV-PyMT breast cancer model. This system enables simultaneous identification and selective elimination of proliferating cell populations, while allowing investigation of relapse originating from dormant reservoirs. Through integrated single-cell RNA sequencing (scRNA-seq), we characterized the transcriptomic landscapes of primary and recurrent tumors and uncovered profound intra-tumoral heterogeneity and extensive microenvironmental remodeling associated with locoregional relapse. Our work establishes a versatile experimental platform that serves as both a mechanistic discovery tool and a therapeutic evaluation system, providing valuable resources for understanding tumor recurrence and developing targeted treatments.

Results

Design of the PyMT ProTracer and Deleter System

Building upon a previously reported genetic tracing system for proliferating cells14, we developed a cell proliferation recording and depletion system by replacing the fluorescent reporter with a fluorescent-diphtheria toxin receptor (DTR) dual reporter. We integrated this system with a spontaneous breast cancer model driven by mammary-specific PyMT overexpression (MMTV-PyMT) and called the result PyMT ProTracer/Deleter (Fig. 1a). In this system, a single tamoxifen (Tam) pulse activates DreERT2, which removes the Rox-STOP-Rox (RSR) cassette upstream of Cre (Ki67-RSR-Cre). Subsequently, when cells enter the cell cycle and express Ki67, Cre recombinase is produced. The activated DreERT2 and Cre collectively mediate the removal of transcriptional stop cassettes (Rox-STOP-Rox and LoxP-STOP-LoxP, respectively) in the reporter line, driving the concurrent expression of tdTomato fluorescent protein and DTR16. This dual-recombinase system provides two key capabilities: (1) permanent genetic labeling of all proliferative (Ki67+) cells following Tam administration, and (2) selective ablation of these cells via diphtheria toxin (DT) treatment at any desired timepoint.
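
The recombination logic described above can be illustrated with a minimal Python sketch. It is an idealized toy model, not part of the experimental pipeline: it assumes fully efficient recombination, and the class and method names (`Cell`, `tamoxifen_pulse`, `enter_cell_cycle`, `diphtheria_toxin`) are hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Toy model of one tumor cell carrying the ProTracer/Deleter alleles."""

    ki67_rsr_removed: bool = False      # Rox-STOP-Rox excised upstream of Cre
    reporter_rsr_removed: bool = False  # Rox-STOP-Rox excised in the reporter
    reporter_lsl_removed: bool = False  # LoxP-STOP-LoxP excised in the reporter
    alive: bool = True

    def tamoxifen_pulse(self) -> None:
        # Tam activates DreERT2, which excises both Rox-flanked STOP cassettes.
        self.ki67_rsr_removed = True
        self.reporter_rsr_removed = True

    def enter_cell_cycle(self) -> None:
        # Ki67 expression yields Cre only if the RSR cassette is already gone.
        if self.ki67_rsr_removed:
            self.reporter_lsl_removed = True

    @property
    def labeled(self) -> bool:
        # tdTomato and DTR are expressed once both STOP cassettes are removed.
        return self.reporter_rsr_removed and self.reporter_lsl_removed

    def diphtheria_toxin(self) -> None:
        # DT kills only cells that express DTR.
        if self.labeled:
            self.alive = False


cycling, dormant, no_tam = Cell(), Cell(), Cell()
cycling.tamoxifen_pulse()
cycling.enter_cell_cycle()  # proliferates after the pulse: labeled
dormant.tamoxifen_pulse()   # never cycles: stays unlabeled
no_tam.enter_cell_cycle()   # cycles without Tam: stays unlabeled

for cell in (cycling, dormant, no_tam):
    cell.diphtheria_toxin()

survivors = [cell.alive for cell in (cycling, dormant, no_tam)]
```

In this sketch only the cell that cycled after the Tam pulse is ablated, which mirrors the two requirements of the design: prior Dre activation and subsequent Ki67 expression.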

Fig. 1: PyMT ProTracer/Deleter for modeling breast cancer relapse.

a Schematic of Ki67-based proliferation tracing and deletion in the PyMT spontaneous breast cancer model. Tamoxifen-induced DreER/Rox recombination activates Ki67 promoter-driven Cre, enabling continuous labeling of proliferating cells. b Experimental workflow for fluorescent labeling of organoids derived from the PyMT ProTracer/Deleter model (4-OH-TAM: 4-Hydroxytamoxifen; IF: immunofluorescence; FC: flow cytometry). IF staining (c) and FC quantification (d) of tdTomato+ cells after 4-OH-TAM treatment. e EdU incorporation and Ki67 IF staining in cultured organoids. White arrowheads denote rare non-proliferative single cells or small clusters (scale bar, 50 μm). f Tumor growth kinetics post-palpation. Mice were administered with 2 mg tamoxifen (Tam) via oral gavage on Day 58, and 200 ng diphtheria toxin (DT) intraperitoneally on Day 60. g (Upper) Experimental design for fluorescent labeling and genetic ablation of tumor cells proliferating within a two-day time window. (Lower) IF analysis of tdTomato+ cell abundance in primary versus relapsed tumors (scale bar, 50 μm). In the TamPost approach, Tam was administered both before DT (initial labeling) and during relapse, enabling detection of repopulating cancer cells that were not labeled during the original 2-day labeling window. Mean ± s.e.m. shown. P values were calculated by comparing individual animals using two-tailed paired (d, g) Student’s t-test.

We implemented this system in the MMTV-PyMT spontaneous breast cancer model, which faithfully recapitulates the molecular and histopathological progression of human breast cancer10 (Supplementary Fig. 1a, b). To functionally validate the ProTracer/Deleter system, we first employed an established PyMT in vitro organoid culture system17 (Fig. 1b). Treatment with 4-hydroxytamoxifen (4-OH-Tam) efficiently induced tdTomato labeling in cultured PyMT organoids (Fig. 1b–d). The tdTomato expression pattern showed strong concordance with both endogenous Ki67 expression and EdU incorporation, confirming labeling of proliferating cell populations (Fig. 1e).

To complement our in vitro organoid findings, we evaluated the ProTracer/Deleter system in endogenous PyMT tumors. We found that a single Tam treatment resulted in efficient labeling of 90% of PyMT+ cells, exceeding the proportion of actively proliferating cells detected by Ki67 or EdU staining (40%) (Supplementary Fig. 1c, d). This discrepancy highlights the distinction between Ki67 fate mapping, which labels cells that have expressed Ki67 at any point since the Tam treatment, and Ki67 reporting, which only scores cells that are currently proliferating. Furthermore, injection of DT led to a profound shrinkage of tumors (Fig. 1f), with a dramatic reduction of tdTomato+ PyMT cells (Fig. 1g). Subsequent relapse was characterized by progressive repopulation, with Tam supplementation during this phase ultimately restoring tdTomato+ cell proportions to pre-treatment levels (Fig. 1g). Collectively, these results demonstrate the system’s robust capacity to monitor proliferative dynamics across all phases of tumor regression and recurrence.
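
The difference between fate mapping and reporting can be made concrete with a small Python sketch. The Ki67 histories below are hypothetical values chosen only to illustrate the two definitions; they are not measurements from this study, and the function names are assumptions.

```python
# Rows are cells, columns are consecutive time points after the Tam pulse.
# True means the cell was Ki67+ at that time point (hypothetical values).
ki67_history = [
    [True, False, False],   # cycled early, then stopped
    [False, True, True],    # still cycling at the final time point
    [False, False, True],   # started cycling late
    [True, False, False],   # cycled early, then stopped
    [False, False, False],  # never cycled within the window
]


def fate_mapped_fraction(history):
    """Fraction of cells that were Ki67+ at any time point (lineage tracing)."""
    return sum(any(cell) for cell in history) / len(history)


def reporter_fraction(history):
    """Fraction of cells that are Ki67+ at the last time point (snapshot)."""
    return sum(cell[-1] for cell in history) / len(history)


traced = fate_mapped_fraction(ki67_history)
snapshot = reporter_fraction(ki67_history)
```

Because the traced fraction accumulates every cell that cycled at least once, it can only equal or exceed the snapshot fraction taken at the end of the same window.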

Transcriptomic landscape of primary and relapsed PyMT tumors

We then dissociated the Tam-induced primary PyMT tumors and the Tam/DT-induced relapsed tumors (Supplementary Fig. 2a) and incubated the cells with different hashtag antibodies, which are antibodies conjugated to unique DNA barcodes that tag both tumor cells and stromal cells (Supplementary Fig. 2b). The information contained within these barcodes enabled us to deconvolute the cells into their respective origins, whether from primary or relapsed tumors, allowing us to pool the single-cell suspensions and eliminate batch effects18. Following multiple quality control steps, we obtained single-cell transcriptomes from a total of 16,892 cells (10,494 from primary tumors and 6,398 from relapsed tumors) for downstream analysis (Supplementary Fig. 2c). These cells were categorized into 12 distinct cell types (Fig. 2a), which were annotated using established markers (Fig. 2b). Notably, the abundance of neutrophils was increased in relapsed tumors (Fig. 2c), a phenotype associated with adverse clinical outcomes in breast cancer, including resistance to immunotherapy and distant metastasis19. Conversely, the percentage of natural killer (NK) cells20, which are potent cytotoxic innate lymphoid cells that limit tumor growth, was decreased (Fig. 2c). These changes suggest that cancer recurrence involves not only the re-emergence of malignant epithelial cells but also the remodeling of stromal cells, thereby creating a permissive environment for the re-initiation of cancer cells.
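
To illustrate how hashtag barcodes allow pooled cells to be assigned back to their tumor of origin, the following Python sketch applies a simple threshold rule to per-cell hashtag counts. It is an assumption-laden illustration, not the demultiplexing method used in this study: the function name, the `min_count` and `min_ratio` thresholds, and the example counts are all hypothetical.

```python
def assign_hashtag(counts, min_count=10, min_ratio=5.0):
    """Assign one cell to a hashtag from its per-hashtag UMI counts.

    counts: dict mapping hashtag name to UMI count (at least two hashtags).
    Returns the winning hashtag name, "negative", or "doublet".
    """
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    (top_tag, top), (_, second) = ranked[0], ranked[1]
    if top < min_count:
        return "negative"  # too few barcode reads to call
    if second > 0 and top / second < min_ratio:
        return "doublet"   # two hashtags at comparable levels
    return top_tag


# Hypothetical counts for four cells stained with two of the hashtags.
cells = [
    {"A0305": 250, "A0306": 3},
    {"A0305": 4, "A0306": 180},
    {"A0305": 120, "A0306": 90},
    {"A0305": 2, "A0306": 1},
]
calls = [assign_hashtag(cell) for cell in cells]
```

Cells called as doublets or negatives would be excluded before the remaining cells are split by sample. Dedicated tools apply more robust statistical models than this fixed-threshold rule.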

Fig. 2: Single-cell transcriptomic profiling reveals distinct cellular landscapes between primary and relapsed PyMT tumors.

a UMAP visualization of tumor cellular composition. Immune cell populations are demarcated by black dashed circles. Neutrophils and NK cells are highlighted with green and blue arrows, respectively. b Dot plot analysis of cell type-specific marker genes. Dot size represents the percentage of cells expressing each gene, while color intensity indicates normalized expression levels. Sankey plot comparing the proportions of CD45+ immune cells (c) and CD45− non-immune cells (d) in primary and relapsed tumors.

Cancer cell dynamics in tumor relapse

In addition to comparing the total numbers of different cell types, we also sought to identify the distinctions between the same type of cells isolated from primary and relapsed PyMT tumors by re-clustering the cells at a higher resolution and analyzing their gene expression profiles. Given that the tumor cells are of epithelial origin in breast cancer21, we initially focused on the EpCAM+ epithelial cells, which were further divided into 14 clusters (Supplementary Fig. 3a, Fig. 3a). Notably, PyMT+ epithelial cells from the relapsed tumor exhibited a greater enrichment of cancer stem cell populations, including the Tspan8+ and Aldh1a3+ clusters, which are known prognostic markers of poor clinical outcome22,23. Pseudo-time analysis based on the unsupervised RNA velocity indicated that the Aldh1a3+ cells were in a more progenitor-like state, with the potential to give rise to other malignant epithelial populations (Fig. 3b). This stem-like phenotype may enable their survival during proliferation ablation due to low cell cycling and subsequently repopulate recurrent lesions.

Fig. 3: PyMT+ cells in primary and relapsed tumors.

a UMAP visualization of epithelial cell heterogeneity. Tspan8+ and Aldh1a3+ cancer stem cell populations are demarcated with black dashed lines. A Sankey plot compares the frequencies of different epithelial subsets. b RNA velocity-based pseudo-temporal trajectory of epithelial subtypes using partition-based graph abstraction. Arrows indicate predicted differentiation directions. c UMAP projection demonstrating PyMT expression restriction to Ly6a− cells. d InferCNV analysis comparing aneuploidy scores between Ly6a− and Ly6a+ epithelial cells. e Gene set enrichment analysis revealing upregulation of estrogen response pathway in PyMT Ly6a+ epithelial cells. f UMAP showing cellular composition of myoepithelial cells (yellow) and epithelial cells (cyan). Feature plots showing the expression of PyMT and other EMT markers. g InferCNV analysis showing that a myoepithelial subpopulation shares aneuploidy score with PyMT+ epithelial population. h Differential gene set analysis showing significant upregulation of mesenchymal signatures in PyMT+ myoepithelial versus PyMT+ epithelial cells.

We identified a unique Ly6a+ (Sca-1+) epithelial population that emerged specifically in relapsed tumors. While Ly6a is an established murine stem cell marker24, these cells exhibited low PyMT expression (Fig. 3c, Supplementary Fig. 3b). Compared to bona fide cancer cell clusters, the Ly6a+ cluster exhibited a relatively lower accumulation of copy number variations (CNVs), particularly in chromosomes 1 and 17 (Fig. 3d). Although the precise role of Ly6a+ cells in relapse requires further investigation, their molecular profile, including retained estrogen receptor expression (Fig. 3e), suggests they may represent an early-stage PyMT tumor population that survives DT-mediated ablation. This hypothesis aligns with the known ER+ status of early PyMT lesions25.

Apart from the epithelial population, we identified a subset of myoepithelial cells that aberrantly expressed PyMT (Fig. 3f). To determine whether these PyMT+ myoepithelial cells represented transformed cells, we performed inferCNV analysis on the myoepithelial cluster. Strikingly, these PyMT+ myoepithelial cells shared characteristic copy number variations with malignant epithelial cells, confirming their neoplastic origin (Fig. 3g). Furthermore, these cells co-expressed basal epithelial markers (Krt5, Krt14) and mesenchymal markers (Vim, Snai1, Cdh2) (Fig. 3f), suggesting either keratin switching or active epithelial-mesenchymal transition (EMT). Pathway analysis reinforced this interpretation, showing significant enrichment of mesenchymal development programs in the PyMT+ myoepithelial population (Fig. 3h). These findings reveal an unexpected plasticity in tumor cell differentiation, where a subset of malignant cells adopts a mesenchymal-like phenotype while maintaining molecular features of transformation.

T cell dynamics in tumor relapse

Immune cells are key components of the tumor microenvironment that influence tumor progression. We characterized lymphoid and myeloid cells, the two major categories of immune cells, in detail. Lymphoid cells, including NK cells and T cells, were further subdivided into 15 subtypes (Fig. 4a, b). Notably, we observed an increased proportion of γδ T cells in relapsed PyMT tumors (Fig. 4b). These γδ T cells have been reported to mediate both anti-tumor effects through IFNγ and pro-tumor effects via IL17 production26. In the PyMT model, the γδ T cells enriched in relapsed tumors exhibited high expression of Il17a (Fig. 4c), along with elevated levels of the galactoside-binding protein Lgals1 (Fig. 4c), which inactivates effector T cells by binding to their glycosylated receptors27.

Fig. 4: T cells in primary and relapsed tumors.

a Dot plot showing representative marker genes for annotating different subsets of T lymphocytes. Dot size represents the percentage of cells expressing each gene; color intensity shows normalized expression levels (log-scaled). b UMAP visualization of lymphoid cell populations, highlighting γδ T cells (black arrows) and PD-1+ CD8+ T cells (blue arrows). Sankey diagram quantifies proportional changes in lymphoid subsets between primary and relapsed tumors. c Functional characterization of γδ T cells through violin plots comparing expression profiles of anti-tumor genes versus pro-tumor genes. d Expression of immune checkpoints across CD8+ T cell subsets. A dot plot visualizes both expression levels (color intensity) and detection frequency (dot size). e Clinical correlation analysis showing frequencies of exhausted CD8+ T cells and naïve CD4+ T cells in immunotherapy-treated breast cancer patients. Bar plot illustrating estrogen receptor (ER)+ breast cancer and triple-negative breast cancer (TNBC) cohorts, stratified by T cell clonal expansion status (E: expanded; NE: non-expanded).

Interestingly, this contrasts with γδ T cells in human breast cancer, which exhibit a predominantly pro-inflammatory phenotype characterized by higher expression of anti-tumor mediators including granzymes A/B, perforin (PRF1) and IFNγ compared to immunosuppressive markers (IL17A and LGALS1) (Supplementary Fig. 4a). Consistent with this functional difference, increased γδ T cell infiltration in human breast cancer correlates with better patient prognosis (Supplementary Fig. 4b). These findings highlight important species-specific differences in γδ T cell biology between the murine PyMT model and human breast cancers that warrant careful consideration when translating preclinical findings.

In the context of conventional CD4+ T cells and CD8+ T cells, a noteworthy observation is that Tcf7+ naïve T cells were increased among both CD4+ T cells and CD8+ T cells within relapsed tumors. Conversely, exhausted Pd1+ CD8+ T cells, which co-expressed additional exhaustion markers such as Tox, Tim3 and Tigit28, were significantly reduced (Fig. 4b, d). To evaluate the clinical relevance of these findings, we analyzed single-cell RNA-seq data from human breast cancer patients, where T cell clonal expansion served as a biomarker for anti-PD1 response29. Intriguingly, while PD1+ exhausted CD8+ T cell frequencies were comparable between groups, treatment-responsive TNBC patients exhibited fewer undifferentiated Tcf7+ CD4+ T cells compared to non-responders (Fig. 4e). These results suggest that increased T cell naivety, which parallels T cell exhaustion, may induce T cell anergy and contribute to immune evasion during tumor recurrence and immune checkpoint inhibitor resistance.

Myeloid cell dynamics in tumor relapse

Myeloid cells, comprising neutrophils, dendritic cells, and monocytes/macrophages, represent a major component of the tumor microenvironment and display remarkable heterogeneity and plasticity30,31. Through single-cell analysis, we identified 16 distinct myeloid subpopulations (Fig. 5a, b). Beyond alterations of cell numbers, differential gene expression analysis revealed key molecular alterations in myeloid cells during tumor relapse (Fig. 5c). Most notably, we observed upregulation of the chemokine Cxcl2 and pro-inflammatory S100a8/a9 genes32,33, along with Vegfa, a marker of immunosuppressive M2-like macrophages.

Fig. 5: Myeloid cells in primary and relapsed tumors.

a Dot plot showing representative marker genes for annotating different subsets of myeloid cells. Dot size represents detection frequency; color intensity shows mean normalized expression (log2 scale). b UMAP visualization of myeloid heterogeneity, identifying neutrophils (Neu), conventional DCs, plasmacytoid DCs (pDCs), monocytes (Mono), and macrophages (Mac). Vegfa+ populations are highlighted (orange arrows: neutrophils; blue arrows: macrophages). Sankey plot compares the frequencies of different myeloid subsets in primary and relapsed tumors. c Differentially expressed gene (DEG) analysis of myeloid populations in relapsed (Tam + DT) vs. primary tumors (Tam). Volcano plot displays log2 fold-change versus statistical significance (−log10 of adjusted P value). d Violin plots showing the expression of Spp1 and Vegfa across various neutrophil and macrophage subsets. Colors as in (b). The highlighted Vegfa+ neutrophils and Vegfa+ macrophages also exhibited elevated expression levels of Spp1. e Patient stratification based on the expression levels of SPP1 and VEGFA (left panel). The Kaplan-Meier disease-free survival curves of breast cancer patients stratified by the expression levels of SPP1 and VEGFA (right panel). The P value was determined by log-rank test. f Prevalence of SPP1+ VEGFA+ macrophages in immunotherapy-treated cohorts. Bar plots compare ER+ and TNBC patients, stratified by T cell clonal expansion status (E: expanded; NE: non-expanded). g Violin plots showing the expression of the chemokine Cxcl2 and its corresponding receptor Cxcr2 across different cell types (colors consistent with Fig. 2a). h Cell-cell communication network inferred by CellChat. Edge weights indicate interaction probabilities, highlighting autocrine Cxcl2-Cxcr2 signaling in neutrophil recruitment.

A significant proportion of Vegfa+ neutrophils and macrophages co-expressed Spp1 (Fig. 5d), an extracellular matrix protein known to promote PyMT tumor recurrence34. This finding translated to human breast cancer: analysis of the METABRIC dataset35 showed that patients with high expression of both SPP1 and VEGFA had the poorest prognosis (Fig. 5e). Similarly, increased SPP1+VEGFA+ macrophage infiltration correlated with poor response to immune checkpoint blockade (Fig. 5f).
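
As an illustration of two-gene patient stratification, the Python sketch below groups patients by whether their SPP1 and VEGFA expression lies above the cohort median. The median cutoff, the function name, and the expression values are assumptions made for this example; the cutoffs actually used for the METABRIC analysis are not stated in the text.

```python
from statistics import median


def stratify(patients):
    """Group patient IDs by SPP1 and VEGFA expression relative to the median."""
    spp1_cut = median(p["SPP1"] for p in patients)
    vegfa_cut = median(p["VEGFA"] for p in patients)
    groups = {}
    for p in patients:
        spp1 = "high" if p["SPP1"] > spp1_cut else "low"
        vegfa = "high" if p["VEGFA"] > vegfa_cut else "low"
        groups.setdefault(f"SPP1-{spp1}/VEGFA-{vegfa}", []).append(p["id"])
    return groups


# Hypothetical expression values for four patients.
cohort = [
    {"id": "P1", "SPP1": 8.0, "VEGFA": 7.5},
    {"id": "P2", "SPP1": 2.0, "VEGFA": 7.0},
    {"id": "P3", "SPP1": 7.0, "VEGFA": 3.0},
    {"id": "P4", "SPP1": 1.0, "VEGFA": 2.0},
]
groups = stratify(cohort)
```

Each resulting group would then be passed to a survival analysis, such as a Kaplan-Meier estimate with a log-rank comparison.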

Intriguingly, the concurrent upregulation of Cxcl2 and neutrophil expansion suggests a potential inflammatory feedback loop36,37. While Cxcl2 was broadly expressed across monocytes, macrophages and neutrophils, its receptor Cxcr2 showed neutrophil-specific expression (Fig. 5g, Supplementary Fig. 5). Cell-cell communication analysis demonstrated that neutrophils co-express both Cxcl2 and Cxcr2 (Fig. 5h), creating an autocrine signaling axis that sustains neutrophil recruitment and maintains a pro-tumorigenic, neutrophil-rich microenvironment during relapse.

Other stromal cell dynamics in tumor relapse

In contrast to the immune components, non-immune stromal populations (endothelial cells and fibroblasts) showed conserved transcriptional profiles and proportions between primary and relapsed tumors (Supplementary Fig. 6a). This stability contrasts with Myc inactivation models where tumor recurrence is accompanied by extensive stromal remodeling, particularly TP53 loss-driven neo-angiogenesis38, suggesting our proliferation-ablation model engages distinct microenvironmental reprogramming pathways during relapse.

Analysis of non-immune stromal cells also revealed a notable reduction in fibroblasts during relapse (Fig. 2d, Supplementary Fig. 6b), consistent with reports of chemotherapy-induced fibroblast senescence39,40. Given their phenotypic heterogeneity, we performed sub-clustering of fibroblasts, identifying distinct populations including (1) Ly6a+ Cd34+ Cxcl12+ fibroblast adipogenic progenitors (FAPs)41, (2) Acta2+ Tagln+ Mki67low quiescent myofibroblasts42, (3) Mki67+ Stmn1+ proliferative myofibroblasts, and (4) Apod+ Acta2+ Igfbp2+ transitional fibroblasts that represent an intermediate state (Supplementary Fig. 6c–e). The scarcity of proliferating fibroblasts aligned with low Ki67 staining, confirming limited in situ proliferation (Supplementary Fig. 6f).

Notably, FAP and transitional fibroblasts exhibited elevated senescence markers (Supplementary Fig. 6g). However, all subsets were diminished in relapsed tumors. These findings highlight the heterogeneous nature of cancer-associated fibroblasts, which substantially decreased during the recurrence of the PyMT model, potentially contributing to microenvironmental remodeling that facilitates cancer relapse.

Discussion

Our study establishes a robust experimental platform for investigating breast tumor relapse through the integration of a Ki67-based fluorescent tracing system with diphtheria toxin receptor (DTR)-mediated cell ablation in the MMTV-PyMT model. While this approach is not directly translatable to clinical applications, it provides significant advantages over conventional cancer treatments by enabling the selective elimination of proliferating cells with minimal off-target effects. Unlike chemotherapy or radiotherapy, which indiscriminately affect both malignant and stromal compartments, our ProTracer/Deleter system achieves precise and population-specific depletion, thereby reducing confounding variables that complicate mechanistic studies. This specificity is particularly valuable given the well-documented heterogeneity in treatment responses across tumor models43,44, which often stems from variations in oncogene expression profiles and tumor microenvironment characteristics. Although clinical translation remains challenging, our system offers unprecedented resolution for dissecting tumor relapse dynamics.

The genetic ablation of proliferating cells in our model provides distinct insights compared to oncogene inactivation approaches that mimic targeted therapy. While both strategies induce tumor regression, they differ fundamentally in their mechanisms of recurrence. Oncogene deprivation typically leads to compensatory activation of alternative pathways, exemplifying the principle of oncogene addiction. For example, silencing Myc frequently results in endogenous Myc reactivation45, whereas HER2-downregulated tumors often develop polyclonal recurrences characterized by MET amplification or JAK/STAT pathway activation46,47. In contrast, our ProTracer/Deleter system reveals that proliferative cell depletion enables the outgrowth of quiescent populations, particularly cancer stem cells with enhanced tumor-initiating capacity and dynamic remodeling of stromal cells (e.g., neutrophil recruitment via autocrine Cxcl2-Cxcr2 signaling). This finding challenges the conventional view that residual cells simply re-enter the cell cycle upon mitogenic stimulation48, suggesting instead a more complex hierarchy of tumor repopulation.

Our studies extend beyond cell-autonomous mechanisms to highlight the critical roles of stroma-dependent processes in tumor relapse49. The PyMT ProTracer/Deleter model reveals dynamic immune microenvironment remodeling, characterized by NK cell reduction and neutrophil expansion. These changes suggest that recurrent tumors evade immune surveillance through multiple mechanisms: exclusion of cytotoxic lymphocytes20 and recruitment of neutrophils50 together establish an immunosuppressive niche. Although total T cell and macrophage numbers remain similar, recurrent tumors exhibit a marked enrichment of pro-tumoral γδ T cells and VEGFA+ SPP1+ macrophages. These findings suggest that relapsing tumors actively reshape their microenvironment to evade immune destruction. Notably, the relative contributions of intrinsic versus extrinsic relapse mechanisms appear to vary by molecular subtype, with ER+ tumors favoring genomic adaptations and TNBCs relying more heavily on stromal remodeling51,52, thereby underscoring the need for subtype-specific therapeutic strategies.

Looking forward, the modular design of our PyMT ProTracer/Deleter system creates numerous opportunities for technical innovation in breast cancer research. The platform’s flexibility allows for straightforward adaptation of key components: (1) the Dre-activated Ki67-Cre cassette could be replaced with EMT (Vimentin)53 or senescence (p16Ink4a) reporters15; (2) dual-recombinase systems could enable parallel tracking of multiple stromal lineages54; and (3) alternative reporter modules (e.g., SrtA for cell-cell interaction labeling55 or NuTRAP tags for multi-omics56) could reveal new biological insights. By combining the well-established strengths of the MMTV-PyMT model with these cutting-edge technologies, our system provides not only a powerful tool for studying tumor relapse but also a versatile platform for addressing diverse questions in cancer biology.

Methods

Animals used

H11CAG-DreERT2 (#T049774) and Ki67-RSR-Cre (#T050101) (both from GemPharmatech), Tg(MMTV-PyMT) (#022974, Jackson Laboratory), and R26CAG-LSL-RSR-tdT-DTR (denoted as R26CAG-LR-tdT-DTR, #NM-KI-190086, Shanghai Model Organisms Center) mice were on the C57BL/6 background. Mice were housed in a room maintained at ~22 °C and ~50% humidity, with a 12-hour light/12-hour dark cycle (lights on from 7:30 to 19:30) and 10–15 fresh air exchanges per hour.

No randomization or blinding was used to allocate experimental groups. No statistical methods were used to predetermine sample size. All animal work was done in accordance with a protocol (L025504202107010) approved by the Sun Yat-sen University Cancer Center (SYSUCC) Institutional Animal Care and Use Committee.

Tumor size measurement

When the primary tumors became palpable, the tumor growth was monitored every three days by measuring tumor length (L) and width (W). Tumor volume (V) was then calculated using the formula V = π × L × W² / 6.
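
The volume formula can be written as a short Python function. This is a minimal sketch: the function name is hypothetical, and millimeters are assumed as the measurement unit because the text does not state one.

```python
import math


def tumor_volume(length_mm: float, width_mm: float) -> float:
    """Ellipsoid estimate V = pi * L * W^2 / 6, returned in mm^3."""
    return math.pi * length_mm * width_mm ** 2 / 6


# Example: a tumor measuring 10 mm long and 6 mm wide (hypothetical values).
example_volume = tumor_volume(10, 6)  # equals 60 * pi mm^3
```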

PyMT organoid culture

PyMT organoid cultures were established using an optimized protocol based on previously described methods17. Briefly, minced tumor tissue was digested in RPMI-based buffer containing 1 mg/ml Collagenase A (Sigma #10103586001), 0.1 mg/ml DNase I (Sigma #10104159001), and 1% heat-inactivated FBS. Mechanical dissociation was performed using the “Tumor_3” program on a Miltenyi Single-cell Suspension Dissociator followed by 20 min incubation at 37 °C. Unlike conventional protocols aiming for single-cell suspensions, this modified approach using shorter collagenase digestion preferentially preserved epithelial cell clusters.

After centrifugation (300 g, 5 min, room temperature), the supernatant was transferred to BSA-coated tubes while allowing large fragments to settle. Tumor clusters were selectively pelleted using brief centrifugation (300 g, 3 sec, room temperature) to eliminate single cells and stromal contaminants. Two additional enrichment rounds were performed before resuspending the clusters in organoid culture medium mixed with phenol red-free, growth factor-reduced Matrigel (Corning #356231) for solidification at 37 °C. Cultures were typically passaged every 4–6 days.

For passaging, medium was carefully aspirated and Matrigel was dissolved using Cell Recovery Solution (Corning #354253). After PBS washing, organoids were dissociated into smaller clusters or single cells using TrypLE (Thermo #12604013) at 37 °C for 5–10 min, with digestion stopped by adding an equal volume of FBS-containing medium.

To activate DreERT2 in vitro, cultures were treated with 4-hydroxytamoxifen (4-OH-Tam; ApexBio #B5421). A 100 mg/ml stock solution in DMSO was diluted in PBS to 1 mg/ml (1000×) and stored at −20 °C. For treatment, 1 μl of 1000× stock was added per 1 ml culture medium for 48-hour incubation.
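
The dilution arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the small volume contributed by the added stock is ignored, as is conventional for a 1000× stock.

```python
def fold_dilution(initial_mg_ml: float, final_mg_ml: float) -> float:
    """Fold dilution needed to go from one concentration to another."""
    return initial_mg_ml / final_mg_ml


def final_concentration_ug_ml(stock_mg_ml: float, stock_ul: float,
                              medium_ml: float) -> float:
    """Concentration after adding stock_ul of stock to medium_ml of medium.

    1 mg/ml equals 1 ug/ul, so the added mass in ug is stock_mg_ml * stock_ul.
    The added stock volume is neglected relative to the medium volume.
    """
    return stock_mg_ml * stock_ul / medium_ml


dmso_to_pbs_fold = fold_dilution(100.0, 1.0)           # 100-fold into PBS
working_ug_ml = final_concentration_ug_ml(1.0, 1.0, 1.0)  # 1 ug/ml in culture
```

Under these assumptions the 1000× stock yields a working concentration of 1 μg/ml 4-OH-Tam in the culture medium.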

Tamoxifen and diphtheria toxin treatment

Tamoxifen (MedChemExpress) was dissolved in corn oil (20 mg ml−1) and heated at 37 °C for 3 hours. Mice received tamoxifen by oral gavage (100 mg/kg body weight). A single dose of 200 ng DT (List Laboratories) was injected intraperitoneally two days after tamoxifen administration.
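
The weight-based tamoxifen dose can be converted to a gavage volume as in the following Python sketch. The function name and the 20 g example body weight are assumptions for illustration; the dose and stock concentration are taken from the text.

```python
def tamoxifen_gavage(body_weight_g: float,
                     dose_mg_per_kg: float = 100.0,
                     stock_mg_per_ml: float = 20.0):
    """Return (dose in mg, gavage volume in ul) for one mouse."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000  # g to kg
    volume_ul = dose_mg * 1000 / stock_mg_per_ml     # ml to ul
    return dose_mg, volume_ul


# Example for a hypothetical 20 g mouse.
dose_mg, volume_ul = tamoxifen_gavage(20.0)
```

For a 20 g mouse this gives 2 mg of tamoxifen in 100 μl of corn oil, which is consistent with the 2 mg dose stated in the legend of Fig. 1f.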

Immunofluorescence staining

The mice were euthanized by CO2 and subjected to whole-animal perfusion. In brief, an incision was immediately made in the right atrium, and 10 ml PBS was slowly injected through the left ventricle. A change in the color of the liver was taken to indicate successful perfusion. After whole-body perfusion, PyMT tumors were dissected and fixed with 4% paraformaldehyde (PFA) overnight at 4 °C. After three washes with PBS, the fixed tissues were incubated in 30% sucrose in PBS (w/v) overnight and frozen in optimum cutting temperature gel (Tissue-Tek; Sakura). Sections 8 μm thick were cut on a Leica CM1850 cryotome.

Slides were blocked in PBS with 10% normal donkey serum (Jackson Laboratory), 1% BSA, and 0.3% Triton X-100 for 1 h at 20 °C. Primary antibodies were incubated overnight at 4 °C in the staining solution (5% normal donkey serum, 0.5% BSA, and 0.3% Triton X-100). Secondary antibodies were added to the staining solution and incubated for 1 h at 20 °C. Excess antibodies were removed by washing for 5 min in PBS with 0.5% Tween-20.

Primary antibodies included: anti-PyMT (Alexa-647 conjugated, Santa Cruz, sc-53481) and anti-Ki67 (CST, #12202). Secondary antibodies included: Alexa-488 donkey anti-rabbit and Alexa-647 donkey anti-rat (1:800, both from Jackson ImmunoResearch). Stained sections were visualized by a Zeiss LSM 980 confocal microscope or scanned with Zeiss Axioscan.Z1. ImageJ was used to manually count the number of Ki67+ nuclear foci and automatically quantify the number of DAPI+ nuclear foci.

EdU labeling of proliferating cells

For in vivo proliferation analysis, tumor-bearing PyMT ProTracer/Deleter mice received intraperitoneal injections of EdU (2.5 mg ml−1 in H2O) at 200 μl per mouse. Tumors were harvested 24 hours post-injection and fixed in 4% paraformaldehyde (PFA) overnight prior to cryo-sectioning. The Click-iT reaction was performed according to the manufacturer’s protocol (ApexBio, EdU Imaging Kits Cy5, #K1076).

In vitro experiments used the same EdU stock solution (2.5 mg ml−1, 1000× concentrated), which was diluted to a working concentration of 2.5 μg ml−1 for labeling proliferating organoids. Nuclei were counterstained with DAPI (1 μg ml−1) to enable cellular localization.
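The 1000× dilutions used for in vitro labeling follow the usual C1·V1 = C2·V2 relation. The sketch below is a minimal illustration; the function name and the 2 ml example culture volume are assumptions, and the small volume contributed by the stock itself is ignored.

```python
def stock_volume_ul(final_ug_per_ml: float, culture_ml: float,
                    stock_mg_per_ml: float) -> float:
    """Volume of stock (microliters) needed to reach the final concentration."""
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    stock_ug_per_ul = stock_mg_per_ml  # 1 mg/ml is numerically equal to 1 ug/ul
    return final_ug_per_ml * culture_ml / stock_ug_per_ul


if __name__ == "__main__":
    # EdU: 2.5 mg/ml stock, 2.5 ug/ml working concentration, hypothetical 2 ml well.
    print(stock_volume_ul(2.5, 2.0, 2.5))
```

With a 1000× stock this reduces to 1 μl of stock per 1 ml of medium, matching the 4-OH-Tam treatment described earlier.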

Tumor dissociation and hashtag antibody labeling

After whole-animal perfusion, the PyMT tumors were dissected and minced briefly with scissors. The tissue pieces were transferred to a tissue digestion C-tube (Miltenyi) and incubated with the mouse tumor dissociation kit (RWD, #DHTE-5001) according to the manufacturer’s instructions. Briefly, the tumor pieces were dissociated enzymatically and mechanically on a Single-cell Suspension Dissociator (RWD, #DSC-410) at 37 °C for 30 minutes. When digestion was complete, the cell suspension was filtered through a 70-μm cell strainer and transferred to 2 ml ice-cold RPMI medium to stop the reaction. Cellular debris and red blood cells were removed with Debris Removal Solution (Miltenyi) and 1× RBC Lysis Buffer (Biolegend), respectively, as per the manufacturers’ instructions.

A total of 0.1–0.5 × 10^6 cells were incubated for 15 min at 4 °C with anti-mouse Fc-block CD16/32 antibody (clone 93, 1:100) and True-Stain Monocyte Blocker (1:100) (both from Biolegend) in 100 µl FACS staining buffer (1.5% FBS and 0.5% BSA in PBS). Cells were subsequently stained with cell hashtag antibodies (A0305, A0306, A0307, A0308, all from Biolegend, Supplementary Fig. 2c) in FACS staining buffer at 4 °C for 25 min. The unbound hashtag antibodies were removed, and the cells were resuspended in FACS staining buffer containing DAPI (1 μg ml−1). During FACS, DAPI singlets were sorted for downstream scRNA-seq library preparation.

scRNA-seq raw data pre-processing

Single-cell suspensions (2 × 10^6 cells ml−1 in ice-cold PBS) were loaded onto a microwell chip using the Singleron Matrix Single Cell Processing System. Barcoding Beads were subsequently collected from the microwell chip, and the mRNA captured by the Barcoding Beads was reverse transcribed to obtain cDNA for PCR amplification. The amplified cDNA was then fragmented and indexed with sequencing adapters. The single-cell RNA sequencing (scRNA-seq) libraries were constructed according to the protocol of the GEXSCOPE scRNA Library Kits (Singleron Biotechnologies). Individual libraries were diluted to 4 nM, pooled, and then sequenced on a NovaSeq 6000 (Illumina) with 150 bp paired-end reads.

The Singleron scRNA-seq analysis pipeline CeleScope (v1.9.0; https://github.com/singleron-RD/CeleScope) was used with default parameters to pre-process the raw reads and to generate a gene expression matrix. Briefly, barcodes and unique molecular identifiers (UMIs) were extracted from R1 reads and corrected. Adapter sequences and poly(A) tails were trimmed from R2 reads, and the trimmed R2 reads were aligned against the GRCm38 (mm10) transcriptome using STAR (v2.6.1a). To detect the transgene, a 1266 bp coding sequence of PyMT (GenBank U27813.1) was added to the mouse reference for scRNA-seq analysis. This sequence was prepared in FASTA format and accompanied by a GTF file detailing its gene annotation. These files were appended to the existing mouse reference genome FASTA and GTF files, and the modified reference was built using the CeleScope ‘mkref’ command. The raw sequencing data were then processed with CeleScope against this reference, generating a gene expression matrix that included PyMT.
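The step of appending a transgene to a reference can be sketched as follows. This is a minimal illustration under stated assumptions, not the commands used in this study: the helper names, the contig name "PyMT", the "custom" source label, and the choice of gene, transcript, and exon records are all hypothetical, and the attributes that CeleScope ‘mkref’ actually requires should be checked against its documentation. The placeholder sequence is a run of N bases, not the U27813.1 sequence.

```python
def fasta_record(seqname: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record wrapped at `width` bases."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{seqname}\n" + "\n".join(chunks) + "\n"


def transgene_gtf(seqname: str, gene_id: str, length: int,
                  source: str = "custom") -> str:
    """Return gene, transcript, and exon GTF records for a single-exon transgene.

    GTF has nine tab-separated columns: seqname, source, feature, start, end,
    score, strand, frame, attributes. Coordinates are 1-based and inclusive.
    """
    gene_attr = f'gene_id "{gene_id}"; gene_name "{gene_id}";'
    tx_attr = (f'gene_id "{gene_id}"; transcript_id "{gene_id}_tx"; '
               f'gene_name "{gene_id}";')
    rows = []
    for feature, attr in (("gene", gene_attr), ("transcript", tx_attr),
                          ("exon", tx_attr)):
        rows.append("\t".join([seqname, source, feature, "1", str(length),
                               ".", "+", ".", attr]))
    return "\n".join(rows) + "\n"


def append_to_reference(base_text: str, extra_text: str) -> str:
    """Concatenate reference text and transgene text, ensuring a newline between."""
    if base_text and not base_text.endswith("\n"):
        base_text += "\n"
    return base_text + extra_text


if __name__ == "__main__":
    placeholder = "N" * 1266  # stand-in for the real coding sequence
    print(fasta_record("PyMT", placeholder).count("\n"))
    print(transgene_gtf("PyMT", "PyMT", len(placeholder)), end="")
```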

Uniquely mapped reads were assigned to exons using featureCounts (v2.0.1). Successfully assigned reads sharing the same cell barcode, UMI, and gene were grouped together to create the gene expression matrix for subsequent analysis. Seurat57 (v4.4.0; https://github.com/satijalab/seurat) was used for quality control, dimensionality reduction, and clustering. For each sample dataset, we filtered the expression matrix based on the following criteria: 1) cells with fewer than 200 detected genes or in the top 2% of gene counts were excluded; 2) cells in the top 2% of UMI counts were excluded; 3) cells with mitochondrial content exceeding 10% were excluded; and 4) genes expressed in fewer than three cells were excluded. We employed the R package DecontX58 (v1.0.0; https://github.com/campbio/decontX) to estimate and remove contamination from ambient RNA present in the cell suspension. After filtering, a total of 16,892 cells were retained for downstream analyses, with an average of 1,317 genes and 2,602 UMIs per cell.
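The cell-level filtering criteria above can be sketched in plain Python. This is an illustration only, not the Seurat code used in this study: the function names and dictionary keys are hypothetical, the top-2% cutoffs are assumed to be computed on all cells of a sample before any other filter is applied, and a cell is treated as being in the top 2% when its value is strictly above the 98th percentile.

```python
from statistics import quantiles


def upper_cutoff(values, top_fraction=0.02):
    """Return the percentile value above which cells fall in the top fraction."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
    return cuts[round((1 - top_fraction) * 100) - 1]      # e.g. the 98th percentile


def filter_cells(cells, min_genes=200, max_pct_mito=10.0, top_fraction=0.02):
    """Keep cells passing the gene-count, UMI-count, and mitochondrial filters.

    `cells` is a list of dicts with keys 'n_genes', 'n_umis', and 'pct_mito'.
    """
    gene_cut = upper_cutoff([c["n_genes"] for c in cells], top_fraction)
    umi_cut = upper_cutoff([c["n_umis"] for c in cells], top_fraction)
    kept = []
    for c in cells:
        if c["n_genes"] < min_genes or c["n_genes"] > gene_cut:
            continue  # too few detected genes, or in the top 2% of gene counts
        if c["n_umis"] > umi_cut:
            continue  # in the top 2% of UMI counts
        if c["pct_mito"] > max_pct_mito:
            continue  # mitochondrial content above 10%
        kept.append(c)
    return kept


if __name__ == "__main__":
    # Synthetic example data; values are arbitrary and not from the study.
    demo = [{"n_genes": 190 + i, "n_umis": 1000 + i, "pct_mito": 5.0}
            for i in range(101)]
    print(len(filter_cells(demo)))
```

The gene-level filter (genes expressed in fewer than three cells) and ambient RNA correction are separate steps and are not shown here.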

Filtered data were then log-normalized and scaled using ‘NormalizeData’ and ‘ScaleData’, respectively. Cell–cell variation due to UMI counts and percent mitochondrial reads was regressed out. The top 2000 variable genes were selected by ‘FindVariableFeatures’ for principal component analysis (PCA). The top 30 principal components, determined by the JackStraw algorithm, were used for dimensional reduction and for clustering at a resolution of 1.5. The Uniform Manifold Approximation and Projection (UMAP) algorithm was applied to visualize cells in a two-dimensional space. Heatmaps, dot plots, feature plots, and violin plots displaying the expression of markers used to identify each cell type were generated by ‘DoHeatmap’, ‘DotPlot’, ‘FeaturePlot’, and ‘VlnPlot’, respectively. The cell type identity of each cluster was determined from the expression of canonical markers based on the annotation of the SynEcoSys database (https://singleron.bio/product/detail-25.html). To obtain a higher-resolution map of epithelial and immune cells, we extracted cells of the same cell type and performed re-clustering with the resolution set to 1.0, 1.5, and 1.5 for epithelial, lymphoid, and myeloid cells, respectively.

Gene expression program and pathway analysis

To identify biological processes and hallmark pathways associated with differentially expressed genes (DEGs) across different populations, we conducted Gene Ontology (GO) enrichment analysis and Gene Set Enrichment Analysis (GSEA). Differential expression analysis was performed using the ‘FindMarkers’ function in the Seurat R package, with DEGs defined as genes exhibiting an adjusted p-value < 0.05 and log2 fold change > 0.25. For GO enrichment analysis, we employed the ‘enrichGO’ function from the clusterProfiler59 (v4.6.2; https://github.com/YuLab-SMU/clusterProfiler) R package, using biological process terms in the org.Mm.eg.db annotation database. The top five significantly upregulated GO terms for each cell type were visualized as bar plots and ranked by their q value. To evaluate pathway-level activity, we performed GSEA using the ‘GSEA’ function in the clusterProfiler package. Hallmark gene sets for Mus musculus were obtained from the MSigDB database via the msigdbr60 package (v7.5.1; https://github.com/igordot/msigdbr).
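The DEG thresholds and the over-representation test that underlies GO enrichment can be illustrated with a short sketch. It is not the clusterProfiler implementation: the function names and record keys are hypothetical, and the p-value shown is the plain one-sided hypergeometric tail without multiple-testing correction.

```python
from math import comb


def select_degs(results, padj_max=0.05, log2fc_min=0.25):
    """Return genes passing the adjusted p-value and log2 fold-change cutoffs.

    `results` is a list of dicts with keys 'gene', 'padj', and 'log2fc'.
    """
    return [r["gene"] for r in results
            if r["padj"] < padj_max and r["log2fc"] > log2fc_min]


def hypergeom_enrichment_p(n_background, n_term, n_degs, n_overlap):
    """P(X >= n_overlap) when drawing n_degs genes from n_background genes,
    of which n_term are annotated to the term."""
    upper = min(n_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(comb(n_term, i) * comb(n_background - n_term, n_degs - i)
               for i in range(n_overlap, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 background genes, 5 in the term, 5 DEGs, all 5 overlapping.
    print(hypergeom_enrichment_p(10, 5, 5, 5))
```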

The signatures of venous, tip, capillary, and immature endothelial cells were used to compare tumor-associated endothelial cells isolated from primary and relapsed tumors61. To assess the senescence status of fibroblasts, we curated a senescence-related gene set primarily derived from the senescence-associated secretory phenotype (SASP) genes62. We then computed the enrichment scores of these gene programs using the ‘AddModuleScore’ function in the Seurat package.
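The idea behind a module score can be shown with a deliberately simplified sketch: the mean expression of the signature genes in a cell minus the mean expression of a set of control genes. This is not Seurat's ‘AddModuleScore’ implementation, which selects control genes from expression-matched bins; here the control genes are passed in explicitly, and the function name and example gene names are illustrative assumptions.

```python
from statistics import mean


def module_score(cell_expr, signature, controls):
    """Simplified per-cell score: mean(signature genes) - mean(control genes).

    `cell_expr` maps gene name to normalized expression for one cell.
    Genes absent from `cell_expr` are skipped.
    """
    sig = [cell_expr[g] for g in signature if g in cell_expr]
    ctl = [cell_expr[g] for g in controls if g in cell_expr]
    if not sig or not ctl:
        raise ValueError("no signature or control genes found in this cell")
    return mean(sig) - mean(ctl)


if __name__ == "__main__":
    # Arbitrary example values for one hypothetical cell.
    cell = {"Il6": 2.0, "Cxcl1": 4.0, "Actb": 1.0, "Gapdh": 1.0}
    print(module_score(cell, ["Il6", "Cxcl1"], ["Actb", "Gapdh"]))
```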

Trajectory and RNA velocity analysis

For the analysis of RNA velocity, a BAM file containing epithelial cells was used together with the GRCm38 (mm10) reference genome. The analysis was conducted using the ‘scv.pl.velocity_embedding_stream’ function in the scVelo63 (v0.3.2; https://github.com/theislab/scvelo) Python package with default parameters. The results were projected onto the UMAP plot from the Seurat clustering analysis to ensure visualization consistency.

We then used partition-based graph abstraction (PAGA) in the Scanpy package to perform trajectory inference analysis. Briefly, a symmetrized kNN-like graph based on PCA data was constructed by the ‘scanpy.tl.paga’ function, using the approximate nearest neighbor search within UMAP. A PAGA graph was then generated from the connectivity between partitions.

Copy number variation (CNV) analysis for malignant cell identification

CNV analysis was performed using the inferCNV R package64 (v1.18.1; https://github.com/broadinstitute/infercnv) to discriminate malignant epithelial cells from non-malignant populations, with immune cells serving as reference controls for baseline gene expression. The analysis utilized raw UMI count matrices as input, retaining only genes detected in at least 20 cells. Genes were ordered by chromosomal position, and relative expression values were centered to 1, with floor/ceiling thresholds set at 1.5 standard deviations from residual-normalized expression values. To minimize noise from gene-specific expression fluctuations, we applied a sliding window approach (101-gene span) to smooth chromosomal expression patterns. The resulting CNV profiles were visualized as heatmaps using the R pheatmap function, displaying arm-level and whole-chromosome alterations.
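The smoothing and thresholding steps can be illustrated with a plain moving average over genes ordered by chromosomal position. This sketch is not inferCNV's own implementation: it uses unweighted averaging, shrinks the window at the ends of the gene list, and the function names are hypothetical.

```python
def sliding_mean(values, window=101):
    """Unweighted moving average centered on each gene.

    Near the ends of the list the window is truncated rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def clip(values, floor, ceiling):
    """Bound each value to the interval [floor, ceiling]."""
    return [min(max(v, floor), ceiling) for v in values]


if __name__ == "__main__":
    # A single-gene spike is spread over its neighbors by a 3-gene window.
    print(sliding_mean([0, 0, 3, 0, 0], window=3))
```

With a 101-gene window, an expression change confined to a single gene contributes at most 1/101 of its magnitude to any smoothed value, which is why isolated fluctuations are suppressed while arm-level shifts remain visible.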

Cell-cell interaction analysis (CellChat)

Intercellular communication networks were inferred using the CellChat R package65 (v2.1.2; https://github.com/sqjin/CellChat), which systematically evaluates chemokine-receptor binding and direct cell-cell contact. The single-cell dataset was preprocessed and filtered to retain only interactions supported by a minimum of 10 cells per group using ‘filterCommunication(cellchat, min.cells = 10)’. Circle plots of the overall communication network were generated by ‘netVisual_aggregate()’. To specifically interrogate the CXCL2-CXCR2 axis, we extracted significantly enriched ligand-receptor pairs using ‘extractEnrichedLR()’ and generated targeted visualizations with ‘netVisual_individual()’. These plots used directional arrows to depict sender-to-receiver interactions, with node colors corresponding to annotated cell types and edge weights reflecting interaction strengths, thereby providing both system-level and pairwise perspectives on cellular crosstalk within the tumor microenvironment.

Human breast cancer RNA expression analysis

The clinical annotations and RNA expression data for the breast cancer METABRIC dataset were obtained from cBioPortal (http://www.cbioportal.org/datasets). Patients were stratified by the expression levels of SPP1 and VEGFA: expression above the median was classified as ‘high’, and expression below the median as ‘low’. Kaplan-Meier curves were generated using the ‘survfit’ function from the survival package in R, and log-rank P values were calculated using the ‘survdiff’ function in the same package.
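The median stratification and the Kaplan-Meier estimate can be sketched as follows. This is an illustration in Python rather than the R survival code used here; the function names are hypothetical, and assigning values exactly equal to the median to the ‘low’ group is an assumption, since the text does not specify how ties are handled.

```python
from statistics import median


def median_split(expression):
    """Label each patient 'high' (above the median) or 'low' (otherwise).

    `expression` maps patient ID to the expression value of one gene.
    """
    cut = median(expression.values())
    return {pid: ("high" if value > cut else "low")
            for pid, value in expression.items()}


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `events` holds 1 for an observed event and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # product-limit update
            curve.append((t, survival))
        at_risk -= removed                     # drop events and censored cases
    return curve


if __name__ == "__main__":
    # Toy cohort: the observation at time 2 is censored.
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```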

The scRNA-seq dataset of breast cancer patients who received immune checkpoint immunotherapy in combination with neoadjuvant chemotherapy (European Genome-phenome Archive, EGAS00001004809) was used to predict the effects of T cell naïveness and SPP1+ VEGFA+ macrophages on T cell clonal expansion. The processed data were obtained from http://biokey.lambrechtslab.org. T cells were re-clustered at a resolution of 1.5, and myeloid cells were re-clustered at a resolution of 1.0. Monocytes and macrophages were extracted based on the expression of PTPRC and LYZ. CD68+ CD163+ macrophages were then distinguished from VCAN+ FCN1+ monocytes to calculate the percentage of SPP1+ VEGFA+ macrophages.
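The percentage of double-positive macrophages can be computed as in the sketch below. It is illustrative only: the function name is hypothetical, and a cell is assumed to be positive for a gene when its expression value is greater than zero, which may differ from the threshold applied in this study.

```python
def percent_double_positive(cells, gene_a="SPP1", gene_b="VEGFA"):
    """Percentage of cells expressing both genes (expression > 0).

    `cells` is a list of dicts mapping gene name to expression value;
    a gene missing from a cell is treated as not expressed.
    """
    if not cells:
        raise ValueError("no cells provided")
    n_double = sum(1 for c in cells
                   if c.get(gene_a, 0) > 0 and c.get(gene_b, 0) > 0)
    return 100.0 * n_double / len(cells)


if __name__ == "__main__":
    # Four hypothetical macrophages, one of which expresses both genes.
    macrophages = [
        {"SPP1": 2.1, "VEGFA": 0.7},
        {"SPP1": 1.3},
        {"VEGFA": 0.4},
        {},
    ]
    print(percent_double_positive(macrophages))
```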

Statistics

In most in vivo experiments, group sizes were determined based on the results of preliminary experiments, and no statistical method was used to predetermine sample size. Experiments were repeated independently at least twice, and the combined data are presented. For the dot plots of IF staining results, smaller dots without an outline are values from individual fields (~0.4 mm2 fields), and outlined circles represent mean values taken over 4−5 fields from the same mouse, unless otherwise noted. Statistical tests were performed by comparing individual animals. All in vitro experiments were repeated independently three times (batches). The P values in these in vitro experiments were determined based on biological replicates.
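Because statistical tests compare animals rather than imaging fields, field-level measurements are first averaged within each mouse. The sketch below shows that aggregation step only; the function name, mouse identifiers, and values are hypothetical.

```python
from statistics import mean


def per_mouse_means(field_values):
    """Average field-level measurements within each mouse.

    `field_values` is a list of (mouse_id, value) pairs, one per imaged field.
    Returns a dict mapping mouse_id to the mean over that mouse's fields.
    """
    grouped = {}
    for mouse_id, value in field_values:
        grouped.setdefault(mouse_id, []).append(value)
    return {mouse_id: mean(values) for mouse_id, values in grouped.items()}


if __name__ == "__main__":
    # Arbitrary example: percentage of Ki67+ nuclei per field.
    fields = [("mouse1", 10.0), ("mouse1", 20.0), ("mouse2", 30.0)]
    print(per_mouse_means(fields))
```

The per-mouse means, one value per animal, are then the inputs to the group comparison, so that multiple fields from one animal are not counted as independent observations.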