Introduction

Breast cancer, which can be clinically stratified into hormone receptor-positive (HR+), HER2-positive (HER2+), and triple-negative (TNBC) subtypes, is the leading cause of cancer-related death in women1. The use of immune checkpoint inhibitors (ICIs), both alone and in combination with chemotherapy, has thus far yielded mixed results in the treatment of HR+ breast tumors2,3,4. Although anti-Programmed Cell Death Protein 1 (PD-1) therapy combined with chemotherapy is an effective standard treatment for some TNBC patients, the molecular drivers of early-stage ICI response remain unclear, as PD-L1 immunohistochemistry lacks sufficient predictive power5,6. The factors governing the induction and maintenance of antitumor responses in the setting of ICIs, as well as conventional cytotoxic chemotherapies used in HR+ breast cancer management, are likely multifaceted and span both tumor-intrinsic and microenvironmental processes. Moreover, tumor-intrinsic characteristics significantly influence the priming, activation, and recruitment of T cells to the tumor microenvironment, which are critical for an immune response in the context of ICIs. Yet the specific mechanisms linking tumor and microenvironmental cells that may result in therapeutic response to ICI plus chemotherapy in breast cancer are not well understood.

Multiomic single-nucleus sequencing, which couples single-nucleus transcriptome and chromatin accessibility profiling, offers opportunities to study the microenvironment of tumors and uncover molecular programs underlying response or resistance to therapy in breast cancer7,8. Previous studies have focused on elucidating mechanisms driving therapeutic response by characterizing the clonal expansion of T cells in breast cancer8 and examining the immune cell states in the TNBC subtype7,9,10. However, the intricate tumor-intrinsic properties and tumor-immune interactions that emerge during combination ICI and chemotherapy treatment, particularly in the context of combination anti-PD-1 and taxane chemotherapy in early-stage HR+ breast cancer, remain largely unknown.

In this work, we examine the tumor-intrinsic and -extrinsic microenvironmental dynamics underlying therapeutic response to combination anti-PD-1 + taxane chemotherapy in HR+ breast cancer by performing and analyzing single-nucleus multiome sequencing (snRNA/ATAC-seq) on tumors biopsied before, during, and after neoadjuvant nab-paclitaxel + pembrolizumab treatment from 20 patients with early-stage HR+ breast cancer. Our findings suggest that key shifts in tumor, macrophage, and T cell states may underlie response to chemo-immunotherapy.

Results

Single-nucleus transcriptome and chromatin accessibility profiling

We conducted a pilot study (NCT02999477) of changes in PD-L1 expression that occur during preoperative treatment with nab-paclitaxel + pembrolizumab in HR+ breast cancer11 (Fig. 1A). Eligible patients had treatment-naïve stage II-III HR+ breast cancer. Briefly, patients were randomized 1:1 to receive a 2-week window of nab-paclitaxel (Arm 1: Chemo → Combo) or pembrolizumab (Arm 2: ICI → Combo). Patients randomized to Arm 1 (Chemo → Combo) received nab-paclitaxel 125 mg/m² via intravenous (IV) infusion once a week for two weeks. Patients randomized to Arm 2 (ICI → Combo) received one dose of 200 mg pembrolizumab via IV infusion. After this window phase, all patients received weekly nab-paclitaxel 125 mg/m² in combination with pembrolizumab 200 mg once every three weeks. Total planned neoadjuvant therapy for all patients was 12 doses of weekly nab-paclitaxel and five doses of every-3-week pembrolizumab. Approximately one-half of patients experienced an incomplete clinical response to the trial regimen and received additional standard-of-care neoadjuvant chemotherapy with Adriamycin/Cyclophosphamide (AC) for four cycles prior to surgery. Patients were subsequently categorized as “favorable responders” with residual cancer burden (RCB) 0-I, or “unfavorable responders” with RCB II-III, consistent with the associated clinical trial evaluation11.
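The response grouping described above amounts to a simple lookup from RCB class to response category. The following is a minimal sketch of that mapping; the function name `response_group` and the label spellings it accepts are illustrative assumptions, not part of the trial protocol.

```python
# Illustrative mapping of residual cancer burden (RCB) class to the
# response groups named in the text. Names here are hypothetical.
FAVORABLE = {"0", "I"}        # RCB 0-I
UNFAVORABLE = {"II", "III"}   # RCB II-III


def response_group(rcb_class: str) -> str:
    """Return 'favorable' or 'unfavorable' for an RCB class label.

    Accepts labels such as "0", "I", "RCB-II", or "rcb iii".
    """
    label = rcb_class.strip().upper()
    if label.startswith("RCB"):
        label = label[3:]
    label = label.strip(" -")
    if label in FAVORABLE:
        return "favorable"
    if label in UNFAVORABLE:
        return "unfavorable"
    raise ValueError(f"Unrecognized RCB class: {rcb_class!r}")


if __name__ == "__main__":
    for raw in ["0", "RCB-I", "rcb ii", "III"]:
        print(raw, "->", response_group(raw))
```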

Fig. 1: The transcriptomic and epigenetic landscape of the tumor microenvironment in primary hormone receptor-positive breast cancer at single-cell resolution.

A The study design (Created in BioRender. Miler-jones, L. (2025) https://BioRender.com/fv15ktr). B Genomic and clinical overview of the hormone receptor-positive (HR+) breast cancer sample. Each column represents a tumor sample. Tumors are ordered by residual cancer burden (RCB) response (RCB 0–I or RCB II-III) and within each subgroup ordered by treatment arm. “Stage” denotes the breast cancer stage. “BluePrint” refers to the molecular subtype assessed by MammaPrint. The presence of bulk RNA-seq and bulk WES data is indicated by a black box for each tumor. The type of 10x Genomics assay used for each tumor is denoted by different colors. An additional four cycles of neoadjuvant Adriamycin/cyclophosphamide (AC) are marked by a black box for each patient. Somatic mutations in genes frequently mutated in breast cancer and common copy number alterations are displayed for each tumor. Sample indicators connect samples from the same patients. C UMAP representation of transcriptional (left) and epigenetic landscape (right) of sequenced cells. Top: UMAPs of broad cell-type annotation. Bottom: Transcriptionally based UMAPs displaying canonical marker gene expression. D Proportion of cellular compartments in each sample. Samples are colored by the RCB response. Source data are provided as a Source Data file.

In both treatment arms, treatment-naïve tumor biopsies were collected within 14 days of treatment initiation (“Baseline”). Additional biopsies were collected, as feasible, at week 3, day 1, between window monotherapy and combination therapy (“W3D1”; “On-monotherapy”); at week 7, day 1, during combination therapy (“W7D1”; “On-combination”); after the completion of combination therapy (“Pre-surgery”); and at the time of surgery (“Surgery”). Tumor biopsies collected from both treatment arms were profiled using established protocols for single-nucleus RNA-seq (snRNA-seq) or single-nucleus multiome sequencing (paired snRNA/ATAC-seq) (Methods). In total, we successfully generated single-nucleus transcriptomes, with or without chromatin accessibility profiles, for 40 tumor biopsies from 20 patients (Fig. 1B). Eight tumor biopsies from three patients were profiled using snRNA-seq and 32 tumor biopsies from 17 patients were profiled using multiome sequencing (Supplementary Data 1A). Additionally, 31 of 40 tumor biopsies were also profiled with both bulk RNA-seq and whole-exome sequencing (WES).

From the snRNA-seq data, we detected 249,379 total nuclei, with an average of 1583 unique genes expressed per nucleus (Supplementary Fig. 1A and Supplementary Data 1; Methods). Clustering and annotation analysis (Methods) identified cell types assigned to breast epithelial cells, immune cells, and stromal cells (Fig. 1C). None of these three major cell classes was unique to any specific patient, library construction method, treatment, or treatment timepoint (Fig. 1D and Supplementary Fig. 1B). For the snATAC-seq data, we performed peak calling12 for each annotated cell type per individual patient tumor biopsy and quantified the number of fragments on the consensus peak regions across all patient tumor biopsy samples in the cohort. After standard quality control (Methods), we captured 22,125 nuclei with high-quality data for both their snRNA-seq and snATAC-seq profiles (Fig. 1C).

Tumor programs linked to combination therapy response

To dissect the tumor-intrinsic molecular programs that contribute to differential therapeutic response to combination therapy in HR+ breast cancer, we first identified malignant cells using single-cell variational aneuploidy analysis with SCEVAN13 (Supplementary Fig. 2A–C; Methods). In order to identify common molecular programs across malignant cells from multiple patients derived at multiple sampling timepoints, we next employed a non-negative matrix factorization (NMF)-based intratumor heterogeneity detection method14,15 to identify gene expression metaprograms (MPs). Briefly, for each MP, a set of genes most shared among the clustered programs is defined as the MP gene signature, and among the malignant cells across our patient tumor biopsies, we identified 11 MPs (Methods). The genes comprising two MPs, MP1 (mitochondrial and ribosomal genes) and MP10 (genes having high correlation with the number of detected unique molecular identifiers (UMIs); Pearson correlation coefficient = 0.75), were suggestive of suboptimal data quality or sequencing library size, respectively, and were thereby excluded from downstream analyses.
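The grouping of NMF programs into metaprograms can be illustrated with two small steps: compare programs by the overlap of their top genes (Fig. 2A reports pairwise Jaccard similarity on the top 50 genes), then take the genes most shared within a cluster as the MP signature. The sketch below is a simplified illustration under stated assumptions: the function names, the toy gene lists, and the alphabetical tie-breaking are hypothetical, and the published method14,15 includes clustering and ranking details not reproduced here.

```python
from collections import Counter
from itertools import combinations


def jaccard(genes_a, genes_b):
    """Jaccard similarity index between two gene lists."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(programs):
    """Jaccard index for every pair of programs.

    programs: dict mapping program name -> list of top genes.
    """
    return {
        (p, q): jaccard(programs[p], programs[q])
        for p, q in combinations(sorted(programs), 2)
    }


def consensus_signature(cluster, n_genes=50):
    """Genes shared by the most programs in a cluster.

    cluster: list of gene lists (one per program in the cluster).
    Ties are broken alphabetically (an assumption made for determinism).
    """
    counts = Counter(g for genes in cluster for g in set(genes))
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked[:n_genes]


if __name__ == "__main__":
    # Toy programs for illustration only; not data from the study.
    programs = {
        "P1": ["ESR1", "GATA3", "TFF1"],
        "P2": ["ESR1", "GATA3", "XBP1"],
        "P3": ["VIM", "ZEB1", "TGFBI"],
    }
    print(pairwise_jaccard(programs))
    print(consensus_signature([programs["P1"], programs["P2"]], n_genes=2))
```

In practice the similarity matrix would feed a clustering step, and each resulting cluster would yield one MP signature.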

The remaining MPs were annotated based on functional pathway enrichment analyses of their gene signatures (Fig. 2A, B; Methods). Four MPs resembled those identified in a previous multi-cancer intratumor heterogeneity study15, including “cell cycle” (MP2), epithelial to mesenchymal transition (EMT) I or “EMT-I” (MP3), “Interferon-MHC II” (MP4), and “Stress” (MP5) (Supplementary Fig. 2D, E). Most MPs could be further categorized based on shared biological processes. For example, in addition to the widely shared MP3 (EMT-I) among patient tumors, the less commonly shared MP6 (EMT-II) and MP11 (EMT-III) were not only enriched for the established EMT MSigDB Hallmark pathway but also had unique enrichments of apoptosis and protein secretion pathways, respectively (Supplementary Data 2). This finding suggests that, in addition to a broadly shared general EMT state among most tumors in this cohort, subsets of tumors exhibited variations of EMT-like programs that reflect distinct biological processes16,17 of the EMT process in these tumors. We also identified two estrogen receptor (ER)-related MPs, MP7 (ER-I) and MP8 (ER-II), along with MP9, which is associated with a previously identified prognostic feature (Apelin, a peptide hormone)18 in breast cancer (Supplementary Fig. 2F–H).
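Annotating a gene signature against a pathway collection such as MSigDB Hallmark is commonly done with an over-representation test. The sketch below uses a one-sided hypergeometric test; this choice of test, the function name, and the example numbers are assumptions made for illustration, since the enrichment procedure is specified only in the Methods.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_signature, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    n_universe:  number of genes in the background set
    n_pathway:   number of background genes in the pathway
    n_signature: number of genes in the MP signature
    n_overlap:   number of signature genes that are in the pathway
    """
    upper = min(n_pathway, n_signature)
    total = comb(n_universe, n_signature)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_signature - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: a 50-gene signature sharing 10 genes with a
    # 200-gene pathway, against a 20,000-gene background.
    p = hypergeom_enrichment_p(20000, 200, 50, 10)
    print(f"enrichment p-value: {p:.3e}")
```

When many pathways are tested per signature, the resulting p-values would normally be adjusted for multiple comparisons.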

Fig. 2: Tumor-intrinsic gene signatures associated with combination therapy response.

A Heatmap displaying pairwise Jaccard similarity indices among robust non-negative matrix factorization (rNMF) programs based on their top 50 genes. Programs are clustered into nine metaprograms (MPs); MPs, timepoint and patient information are labeled at the top. The central scatterplot illustrates Pearson correlation of rNMFs with cellular complexity. B Heatmap depicting gene membership within MPs, with rows as top representative genes, columns as MPs, and functional annotations on the right. C Distribution of ER-I-related MP7 and EMT-III-related MP11 signature level in tumor cells from pretreatment biopsies. P values from two-sided Wald tests on linear mixed model coefficients (patient as random effect; Methods), without multiple-comparison adjustment. Data presented as median with nested quantile ranges and boxes narrowing toward distribution tails to show extreme values. D Baseline comparison of the relative abundance of ER-I-related MP7 and EMT-III-related MP11 states in the tumor population between favorable responders (R; N(pt) = 3) and unfavorable responders (NR; N(pt) = 9), with p values from two-sided Mann–Whitney–Wilcoxon test. Data presented as median with interquartile range (first and third quartiles). E Dotplot of scaled transcription factor expression (color) and target region enrichment scores (dot size). Each row shows a gene regulatory network (GRN), named after the transcription factor and its target region; columns represent four tumor cell states. F PROGENy-inferred activity in ER-I (MP7) and EMT-III (MP11) tumor cell states. Barplots (left) show pathway activity scores relative to other tumor states. X-axis (right) shows gene weights in the p53 (top) and TGFβ (bottom) pathways; Y-axis shows t-values from differential expression comparing MP7 or MP11 to other states. G ER-I (MP7) and EMT-III (MP11) abundance in tumors with or without TP53 mutations from pretreatment biopsies. P values from two-sided Mann–Whitney–Wilcoxon test. Data presented as median with interquartile range (first and third quartiles). H ER-I-related MP7 and EMT-III-related MP11 gene signatures relative to the overall tumor signature in bulk RNA-Seq data from TCGA primary HR+ breast cancer. Patients grouped by TP53 mutation status, with significance tested by two-sided Mann–Whitney–Wilcoxon test. Data presented as median with interquartile range (first and third quartiles). Source data are provided as a Source Data file.

Malignant cells were then categorized based on their relative signature scores of these MPs (Methods; Supplementary Data 1B). Among baseline tumors, favorable responders (RCB = 0–I) harbored a higher enrichment of MP11 (EMT-III) compared to unfavorable responders (RCB = II-III) (Fig. 2C and Supplementary Fig. 2I). In contrast, favorable responders exhibited a lower MP7 (ER-I) enrichment than unfavorable responders (Fig. 2C and Supplementary Fig. 2I). When comparing the cellular abundance between the two clinical groups, we observed a higher proportion of EMT-III-related MP11 and a lower proportion of ER-I-related MP7 among malignant cells in favorable responders than in unfavorable responders (Fig. 2D). Longitudinal analysis (Supplementary Fig. 2J–L) revealed that EMT-III-related MP11 levels significantly decreased during combination therapy, independent of response (Supplementary Fig. 2L). However, when comparing on-combination to on-monotherapy tumors, ER-I-related MP7 exhibited opposing trends between the two clinical groups: combination therapy elevated ER-I-related MP7 in unfavorable responders while reducing it in favorable responders (Supplementary Fig. 2L).
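The two steps in this paragraph, assigning each malignant cell to a state from its relative MP scores and then comparing state abundance per patient, can be sketched as follows. This is a simplified illustration: assigning a cell to its highest-scoring MP, the function names, and the toy scores are assumptions, and the study's actual categorization rule is described in the Methods.

```python
from collections import Counter, defaultdict


def assign_state(scores):
    """Return the MP with the highest relative signature score.

    scores: dict mapping MP name -> relative signature score for one cell.
    Ties resolve to the alphabetically first MP (an assumption).
    """
    return max(sorted(scores), key=scores.get)


def state_proportions(cells):
    """Fraction of each patient's malignant cells assigned to each MP state.

    cells: iterable of (patient_id, scores) tuples.
    Returns dict: patient_id -> {MP name: fraction of cells}.
    """
    counts = defaultdict(Counter)
    for patient, scores in cells:
        counts[patient][assign_state(scores)] += 1
    return {
        patient: {mp: n / sum(c.values()) for mp, n in c.items()}
        for patient, c in counts.items()
    }


if __name__ == "__main__":
    # Toy scores for illustration only; not data from the study.
    cells = [
        ("pt1", {"MP7": 0.9, "MP11": 0.1}),
        ("pt1", {"MP7": 0.2, "MP11": 0.8}),
        ("pt1", {"MP7": 0.1, "MP11": 0.7}),
        ("pt2", {"MP7": 0.6, "MP11": 0.3}),
    ]
    for patient, props in state_proportions(cells).items():
        print(patient, props)
```

The per-patient fractions are the quantities one would then compare between response groups, for example with the Mann–Whitney–Wilcoxon test reported in Fig. 2D.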

To validate this finding, we projected these two MP gene signatures onto bulk RNA-seq data from (i) 25 baseline pretreatment biopsies from this same trial11 and (ii) 13 pretreatment breast biopsies from a cohort of patients with ER+ metastatic breast cancer treated with eribulin and pembrolizumab on a clinical trial19. In the bulk RNA-seq data from this trial, early-stage HR+ breast cancer patients with favorable responses exhibited higher EMT-III-related MP11 and lower ER-I-related MP7 gene signature scores than patients with unfavorable responses (Supplementary Fig. 2M). Similarly, metastatic ER+ breast cancer patients experiencing partial response to eribulin and pembrolizumab harbored higher EMT-III-related MP11 and lower ER-I-related MP7 gene signature scores than patients with progressive disease (PD; Supplementary Fig. 2L). Additionally, of the 12 baseline tumor samples in our study, nine underwent BluePrint subtyping (an 80-gene molecular assay classifying early-stage breast cancer into Basal, Luminal, and HER2 subtypes). Among these, tumors of the basal subtype exhibited a higher proportion of EMT-III-related MP11 (Supplementary Fig. 2O). Taken together, these findings suggest that EMT-III-related MP11 and ER-I-related MP7 may represent tumor-intrinsic ER+ breast cancer gene signatures associated with response and resistance to preoperative ICI plus chemotherapy, respectively.

To infer epigenetic gene regulation specific to malignant cells harboring a high relative gene signature score for MP11 and MP7, we next evaluated enhancer-driven gene regulatory networks (GRNs) from malignant cells with paired single-nucleus transcriptome and chromatin accessibility data via SCENIC+20. TCF7L1/2 transcription factor (TF) regulons, which are a key component of Wnt-β-catenin signaling21,22 and regulate EMT process in epithelial cells23,24,25, were specific to EMT-III-related MP11-high malignant cells (Fig. 2E and Supplementary Fig. 2R–Q). GATA3 and ESR1 regulons, two essential TFs that regulate maintenance of hormone sensitivity in breast cancer tumor cells26,27, were enriched in ER-I-related MP7-high malignant cells (Fig. 2E and Supplementary Fig. 2R, S). These results indicated a phenotypic consistency between the molecular gene expression programs and their predicted epigenetic GRNs.

Lastly, we applied PROGENy28 to malignant cell gene expression profiles to estimate their pathway-responsive gene activities. Transforming growth factor-β (TGF-β) signaling, a key driver of the EMT process, was highly activated in EMT-III-related MP11-high malignant cells (Fig. 2F), whereas p53 signaling was the most activated pathway in ER-I-related MP7-high malignant cells. To further investigate the relationship between p53 signaling and the ER-I-related MP7 signature, we analyzed the cellular abundance of various tumor cell states in patients with and without somatic TP53 mutations. Patients with somatic TP53 mutations tended to have a higher abundance of EMT-III-related MP11-high malignant cells and lower abundance of ER-I-related MP7-high malignant cells (Fig. 2G), which was further confirmed via companion analysis of these MP signature scores in early-stage ER+ primary breast cancer patients from The Cancer Genome Atlas (TCGA) stratified by TP53 mutation status29 (Fig. 2H). Taken together, these observations link tumor-intrinsic transcriptional metaprograms to distinct biological and clinical states in this trial setting, suggesting that the p53 and TGF-β pathways play divergent roles in HR+ breast cancer tumor cells to potentially influence response to preoperative ICI plus chemotherapy.

CD8+ T cell states responsive to combination therapy

To complement the investigation of malignant cell molecular programs associated with differential therapeutic response to neoadjuvant nab-paclitaxel + pembrolizumab combination therapy, we next evaluated the contribution of CD8+ T cells to this therapeutic response within our patient cohort. We identified a total of 18,772 T cells, of which 17,230 were CD8+ T cells (Supplementary Fig. 3A). Using the same methods as those applied to tumor cells, we found that CD8+ T cell subpopulations were characterized by three MPs (Fig. 3A, Supplementary Fig. 3B, C, and Supplementary Data 1C) that recapitulate known transcriptional profiles of previously defined CD8+ T cell states30: naive, cytotoxic, and exhausted cell states (Supplementary Fig. 3D, E). For example, increased expression of exhaustion-associated markers such as PDCD1 (PD-1), HAVCR2 (TIM-3), and ENTPD1 (CD39) was detected in a subset of exhausted CD8+ T cells, while CCR7 and IL7R (IL-7R) were upregulated in naive subsets, and PRF1 (Perforin 1) and GNLY (Granulysin) were elevated in cytotoxic subsets (Supplementary Fig. 3D). We also evaluated cellular signaling activities stimulated by extracellular cytokines across our different CD8+ T cell populations with CytoSig31. Through this analysis, we observed that CD8+ T cells exhibiting exhaustion also had significantly elevated interleukin-15 (IL-15) cytokine response scores compared to the other two CD8+ T cell populations (Fig. 3B, C). IL-15 is produced by various cell types and is traditionally recognized for its role in promoting cytotoxic effector functions32; however, this finding implies that the enrichment of exhausted CD8+ T cells may prevent antitumor response even in the setting of IL-15 exposure from other cells in the microenvironment.

Fig. 3: Reprogramming of CD8+ T cells and their molecular signatures during monotherapy and combination therapy.

A Heatmap illustrating average gene expression from three metaprograms (MPs) among three distinct CD8+ T cell states. Rows represent gene names. Top: cell state, patient, and timepoint information. B Differential cytokine activity between cytotoxic and exhausted CD8+ T cell states, analyzed using the two-sided Mann–Whitney–Wilcoxon test. The Y-axis displays negative log10 p values; the X-axis shows t-values. C Boxplot depicting distribution of predicted IL-15 cytokine activity across three CD8+ T cell states. Cytotoxic, naive, and exhausted CD8+ T cell states were detected in 38, 38, and 35 samples, respectively. Each point represents a sample, with p values calculated using the two-sided Mann–Whitney–Wilcoxon test. Data presented as median with interquartile range (first and third quartiles). D Violin plots illustrating shift in distribution of three MP gene signatures from baseline to monotherapy in favorable and unfavorable responders with biopsies at both timepoints. Top: CD8+ T cells during chemotherapy vs. pretreatment CD8+ T cells. Bottom: CD8+ T cells during pembrolizumab treatment vs. pretreatment CD8+ T cells. P values were calculated via two-sided Wald tests on linear mixed model coefficients (Methods: Differential gene signature analysis), with patient ID modeled as a random effect. No adjustment was made for multiple comparisons. E Violin plots displaying shift in distribution of three MP gene signatures from monotherapy to combination therapy in favorable and unfavorable responders with biopsies at both timepoints. P values were calculated via two-sided Wald tests on linear mixed model coefficients (Methods: Differential gene signature analysis), with patient ID modeled as a random effect. No adjustment was made for multiple comparisons. F Violin plots showing shift in distribution of three MP gene signatures from combination therapy to post-combination therapy (with AC treatment) in favorable and unfavorable responders with biopsies at both timepoints. P values were calculated via two-sided Wald tests on linear mixed model coefficients (Methods: Differential gene signature analysis), with patient ID modeled as a random effect. No adjustment was made for multiple comparisons. RCB residual cancer burden. Source data are provided as a Source Data file.

Given these findings, we proceeded to compare the three CD8+ T cell population states through trial therapy (Fig. 3D, E and Supplementary Fig. 3F–H). Exhaustion gene scores initially decreased between on-monotherapy and on-combination therapy time points (Fig. 3E and Supplementary Fig. 3G), and cytotoxic gene scores subsequently increased between on-combination therapy and pre-surgery time points (Fig. 3F and Supplementary Fig. 3G, H), seemingly consistent with the anticipated effects of ICI in augmenting CD8+ T cell effector states. While this pattern was seen in both favorable and unfavorable responders, there were also notable differences between the response groups in both baseline scores and dynamics. At baseline, tumor biopsies from favorable responders exhibited numerically higher naive, cytotoxic effector, and exhausted gene signature scores than those from unfavorable responders (Supplementary Fig. 3F). Over the course of treatment, only unfavorable responders demonstrated a late enrichment of the exhausted gene signature in the pre-surgery samples (Fig. 3F and Supplementary Fig. 3G, H), suggesting a less sustained therapeutic benefit in this group.

Macrophage state dynamics during ICI plus chemotherapy

In addition to T cells, macrophages play an important role in mediating immune surveillance and have been implicated in contributing to selective ICI response in other solid tumors33,34. We thus complemented our microenvironmental analysis by investigating macrophages in the microenvironment of tumors from this trial cohort. Within the myeloid lineage, we identified two subclusters corresponding to distinct cell populations defined by the hierarchical clustering significance method sc-SHC35. Among all patient tumor biopsies, one of the myeloid subclusters displayed high expression of canonical macrophage markers (CD68 and CD163)36 and was thus annotated as the macrophage sub-compartment (Supplementary Fig. 4A and Supplementary Data 1D). Utilizing the same approaches applied to other cell compartments, we identified five macrophage subpopulations defined by six biologically relevant MPs. While five of the MPs could be assigned to a specific macrophage subpopulation, the presentation-related MP8 was broadly expressed among multiple macrophage subpopulations, including those characterized by MP4 (lipid), MP6 (adhesion), and MP7 (interferon) (Fig. 4A and Supplementary Fig. 4B, C; Methods). Macrophages with a high MP gene signature score for either MP1 (endocytosis) or MP4 (lipid processing) displayed high concomitant expression of canonical anti-inflammatory markers (MERTK, MRC1) (Supplementary Fig. 4D). Conversely, macrophages with a high relative MP7 (interferon) gene signature score possessed high expression of pro-inflammatory markers (IL1B and CXCL10).

Fig. 4: Reprogramming of macrophages and their molecular signatures during monotherapy and combination therapy.

A Heatmap displaying average gene expression from six metaprograms (MPs) among five macrophage cell states. Rows represent gene names. Top: cell state, patient, and timepoint information. B Pairwise Pearson correlation among the six MP gene signatures. P values calculated with two-sided Student’s t-distribution. C Boxplot illustrating distribution of six MP gene signatures in macrophages during monotherapy, comparing 2802 macrophage cells from two favorable responders and 1990 macrophage cells from seven unfavorable responders. P values were calculated via Wald tests on linear mixed model coefficients (Methods: Differential gene signature analysis), with patient ID modeled as a random effect. No adjustment made for multiple comparison. Data presented as median with multiple nested quantile ranges (50–75%, 75–87.5%, 87.5–93.75%, 93.75–96.875%, and 96.875–100%), with boxes narrowing toward distribution tails to show extreme values. D Violin plots showing shift in distribution of six MP gene signatures from baseline to monotherapy in favorable and unfavorable responders with biopsies at both timepoints. Top: Macrophages during chemotherapy vs. pretreatment macrophages. Bottom: Macrophages during Pembrolizumab treatment vs. pretreatment macrophages. P values were calculated via two-sided Wald tests on linear mixed model coefficients (Methods: Differential gene signature analysis), with patient ID modeled as a random effect. No adjustment made for multiple comparison. E Violin plots depicting shift in distribution of six MP gene signatures from monotherapy to combination therapy in favorable and unfavorable responders with biopsies at both timepoints. P values calculated via two-sided Wald tests on linear mixed model coefficients (Methods: Differential gene signature analysis), with patient ID modeled as a random effect. No adjustment made for multiple comparison. F Violin plots illustrating shift in distribution of six MP gene signatures from combination therapy to post-combination therapy (with AC treatment) in favorable and unfavorable responders with biopsies at both timepoints. P values calculated via two-sided Wald tests on linear mixed model coefficients (Methods: Differential gene signature analysis), with patient ID modeled as a random effect. No adjustment made for multiple comparison. RCB residual cancer burden. Source data are provided as a Source Data file.

We next evaluated whether these macrophage populations mapped to previously reported macrophage states37,38,39. Interestingly, the endocytosis-related MP1 gene signature score was negatively correlated with other MP gene signature scores among macrophages (Fig. 4B), whereas interferon-related MP7 was positively correlated with other MP gene signature scores except endocytosis-related MP1. As previous studies have reported that M2-like macrophages are endocytotic40, we investigated the association of these two MPs with the M1/M2 axis. The endocytosis-related MP1 gene signature score exhibited a significant positive correlation with literature-derived gene signatures37,38,39 representing the canonical M2-like macrophage phenotype, while the interferon-related MP7 gene signature score had a significant positive correlation with literature-derived gene signatures representing the canonical M1-like macrophage phenotype (Supplementary Fig. 4E). Companion analysis of enhancer-driven gene regulatory networks (Supplementary Fig. 4F) identified that the TCF12-DAB2 regulon41,42,43 is specifically enriched in the endocytosis-related MP1 cell state (Supplementary Fig. 4G), and previous studies have shown that DAB2 regulates macrophage polarization44. Growing evidence suggests that macrophage differentiation exists along a spectrum of phenotypic states, moving beyond the rigid classification of purely antitumor M1 or pro-tumor M2 subtypes45. Our results suggest that endocytosis and interferon-related pathways may in part characterize two functional branches of dynamic transcriptional programming in macrophages in this specific context.
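The pairwise comparison of MP signature scores described above relies on the Pearson correlation coefficient. The sketch below computes it for every pair of signatures across cells; the function names and toy score vectors are illustrative assumptions, and the significance testing reported in Fig. 4B is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_table(scores):
    """Pearson correlation for every pair of signatures.

    scores: dict mapping signature name -> per-cell score list
            (all lists in the same cell order).
    """
    names = sorted(scores)
    return {
        (a, b): pearson(scores[a], scores[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Toy per-cell scores for illustration only; not data from the study.
    scores = {
        "MP1_endocytosis": [0.9, 0.7, 0.4, 0.1],
        "MP7_interferon": [0.1, 0.3, 0.5, 0.9],
        "MP8_presentation": [0.2, 0.2, 0.6, 0.8],
    }
    for pair, r in correlation_table(scores).items():
        print(pair, round(r, 3))
```

A negative coefficient for a pair such as MP1 and MP7 would correspond to the anti-correlation pattern described in the text.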

Although we did not observe any statistically significant association between macrophage MP signature scores and therapy response in pretreatment biopsies (Supplementary Fig. 4H), we found that the interferon-related MP7 gene signature was enriched in on-treatment tumors from favorable responders after monotherapy (Fig. 4C). Moreover, in tumor biopsies from unfavorable responders collected after exposure to either monotherapy, we observed a decrease in endocytosis-related MP1 gene signature score compared to pretreatment baseline tumor biopsies (Fig. 4D). In the same biopsies, we detected an increase in the gene signature scores of two other MPs, secretion-related MP5 and adhesion-related MP6, both of which were anti-correlated with endocytosis-related MP1 gene signature scores among macrophages (Fig. 4D). In contrast, on-treatment tumor biopsies from favorable responders showed an increase in endocytosis-related MP1, interferon-related MP7, and presentation-related MP8 after chemotherapy. Next, when combination therapy was added after either monotherapy window, the gene signature scores for four MPs (MP5, MP6, MP7, and MP8) across macrophages were relatively diminished, but the endocytosis-related MP1 gene signature score was elevated in both favorable and unfavorable responders (Fig. 4E). However, when comparing the on-combination tumor biopsies to those after combination therapy, we observed an elevation in adhesion-related MP6 and antigen presentation-related MP8 gene signature scores across macrophages, whereas the endocytosis-related MP1 gene signature decreased in favorable responders but increased in unfavorable responders (Fig. 4F and Supplementary Fig. 4I, J). Collectively, these findings suggest that both anti-PD-1 therapy and chemotherapy, when administered as monotherapy in HR+ breast cancer, may enhance a pro-inflammatory response in microenvironmental macrophages. Furthermore, exposure to both treatments in a combination therapy setting appears to induce polarization toward increased endocytosis-associated metaprograms only in unfavorable responders, with a decrease in this feature observed in favorable responders during the course of combination therapy.

Tumor-immune crosstalk linked to combination therapy response

Given the contributions of microenvironmental cells to therapeutic response, along with prior studies demonstrating synergistic interactions between malignant and immune cells that influence tumor progression46,47, we next used MultiNicheNetR48 to conduct a focused investigation into the predicted tumor-immune interactions that may be operant among HR+ breast cancer patients, stratified by therapy response (Supplementary Data 3A). We first predicted interactions in pretreatment baseline biopsies (Fig. 5A–D). Predicted signaling between the TIGIT receptor on CD8+ T cells and Nectin family ligands on malignant cells was present in pretreatment tumor biopsies from both favorable and unfavorable responders (Fig. 5A). However, inferred CD8+ T cell PD-1 signaling activity in pretreatment baseline tumor biopsies was higher in favorable responders than in unfavorable responders (Supplementary Fig. 5A, B), and predicted CD86/CTLA-4 interaction was numerically higher in favorable responders (Fig. 5B). By contrast, predicted tumor-expressed WBP1 and FAM200A ligand signaling49 to the CTLA-4 immune checkpoint receptor on CD8+ T cells was enriched only in pretreatment baseline tumor biopsies from unfavorable responders (Fig. 5A, B). Additionally, in assessing interactions between malignant cells and macrophages, we found that malignant cell-expressed ligand signaling to the ITGAV and NRP1/2 receptors on macrophages was more prevalent in pretreatment tumor biopsies from favorable responders (Fig. 5C, D). Conversely, in pretreatment tumor samples from unfavorable responders to the combination therapy, we observed enrichment of CLDN1 and CXCR4 signaling interactions between malignant cells and macrophages (Fig. 5C, D).

Fig. 5: Tumor cellular interactions with CD8+ T cells and macrophages.

A Rank plots displaying baseline ligand-receptor (L–R) interactions between tumors and CD8+ T cells, sorted by their MultiNicheNetR prioritization score. Top: Predominant L–R interactions in favorable responders. Bottom: Predominant L–R interactions in unfavorable responders. Annotations highlight L–R interactions involving immune checkpoint receptors on CD8+ T cells that rank within the top 50 for each group. (Top illustration was created in BioRender. Miler-jones, L. (2025) https://BioRender.com/e6aheln). B Boxplots illustrating the distribution of three L–R interaction metrics for tumors expressing ligands that interact with the CTLA-4 receptor on CD8+ T cells, comparing favorable and unfavorable responders. Left: Probability of producing both ligand and receptor; Middle: Probability of producing the ligand; Right: Probability of producing the receptor. Data are presented as median with interquartile range (first and third quartiles). C Rank plots displaying baseline ligand-receptor (L–R) interactions between tumors and macrophages, sorted by their MultiNicheNetR prioritization score. Top: Predominant L–R interactions in favorable responders. Bottom: Predominant L–R interactions in unfavorable responders. Annotations highlight the top five L–R interactions for each group. (Top illustration was created in BioRender. Miler-jones, L. (2025) https://BioRender.com/e6aheln). D Boxplots illustrating the distribution of three L–R interaction metrics for tumors expressing ligands that interact with the receptors on macrophages, comparing favorable and unfavorable responders. Left: Probability of producing both ligand and receptor; Middle: Probability of producing the ligand; Right: Probability of producing the receptor. Data are presented as median with interquartile range (first and third quartiles). E Heatmap illustrating the changes in tumor-CD8+ T cell L–R interactions from pretreatment (BL) to on-combination therapy (W7D1). The top five altered L–R interactions for each group are displayed. F Heatmap depicting the changes in tumor-macrophage L–R interactions from pretreatment (BL) to on-combination therapy (W7D1). The top five altered L–R interactions for each group are shown. RCB residual cancer burden. Source data are provided as a Source Data file.

Next, we sought to further dissect the impact of combination therapy on tumor-immune interactions in HR+ breast cancer by comparing predicted signaling interactions between pretreatment baseline (BL) and on-combination therapy (W7D1) tumor biopsies, further partitioned between favorable and unfavorable responders (Supplementary Data 3B). Compared to pretreatment baseline tumor biopsies, on-combination therapy tumor biopsies exhibited augmented malignant cell-CD8+ T cell interactions critical for tumor recognition (HLA-B-CD8A) and CD8+ T cell cytotoxic effector function (ZG16B-CXCR450) only in favorable responders (Fig. 5E). Malignant cells from on-combination biopsies of favorable responders were predicted to interact with pro-inflammatory macrophages (Fig. 5F), and these macrophages also demonstrated increased expression of canonical pro-inflammatory genes (Supplementary Fig. 5C) and enrichment of the interferon-related MP7 signature (Supplementary Fig. 5D). Conversely, predicted interactions between SFTPD and LAIR1, a receptor known to inhibit T cell activation51, from malignant cells to CD8+ T cells were more prevalent in on-combination therapy tumor biopsies from unfavorable responders (Fig. 5E), potentially suggesting suppression of CD8+ T cell activation by tumors. Moreover, VEGFA-NRP1/2 signaling from malignant cells to macrophages was enriched in on-combination therapy biopsies from unfavorable responders (Fig. 5F). This aligns with prior evidence showing that macrophages expressing Neuropilin-1 (NRP1) or NRP2 drive tumor progression52,53,54. In summary, these observations underscore the dynamic state reprogramming of malignant and immune cells, along with predicted interactions among them, during neoadjuvant nab-paclitaxel + pembrolizumab combination therapy in early-stage HR+ breast cancer.

Discussion

In this study, we conducted a comprehensive analysis of single-nucleus transcriptomic and chromatin accessibility landscapes in early-stage HR+ breast cancer patients on a clinical trial examining neoadjuvant nab-paclitaxel + pembrolizumab combination therapy. Prior works have both characterized distinct immune/stromal microenvironments across breast cancer subtypes55 and defined the lineage-specific progenitors of luminal and basal tumor cells in primary disease56. A prior window-of-opportunity study8 investigated changes in the tumor microenvironment of breast cancer, specifically focusing on its relationship to T cell expansion. However, the mechanisms by which tumor-intrinsic properties and immune-tumor interactions modulate therapeutic efficacy during chemotherapy, PD-1 blockade, or combination regimens remain poorly understood. Therefore, our study extends these investigations by assessing the dynamic influence of taxane chemotherapy and anti-PD-1 immunotherapy, both as monotherapies and in combination, on tumor-intrinsic and -extrinsic microenvironmental features in patient tumors. From this analysis, we identified cell type-specific molecular programs, GRNs, and signaling interactions that are associated with patient responses to the combination of taxane chemotherapy and anti-PD-1 therapy. Prior research has indicated that certain malignant cell gene signatures, such as cell cycle and EMT, are shared among malignant cells across different cancer types15,55, thereby providing a valuable framework for identifying common therapeutic targets and understanding tumor progression mechanisms. Additionally, high immune infiltration in lung adenocarcinoma has been strongly linked to the induction of EMT, which is associated with increased expression of inhibitory immune checkpoints like CTLA-4 and PD-L157. A pan-cancer study58 further demonstrated that the crosstalk between immune evasion and EMT correlates with responses to immune checkpoint blockade across several cancer types. Utilizing an NMF-based approach15, we detected several malignant cell molecular programs that recurred among patients in our cohort. Notably, two of these metaprograms, EMT-III-related MP11 and ER-I-related MP7, demonstrated a significant correlation with response and resistance to the combination therapy, respectively. In addition, the EMT-III gene signature was associated with mutations in TP53, and malignant cells with a high EMT-III or ER-I gene signature score displayed enrichment of TGF-β or estrogen signaling pathway activities, respectively. Lastly, by integrating paired gene expression and chromatin accessibility data, we determined that the EMT-III program was predominantly regulated by TCF7L1/2 TF regulons, whereas the ER-I program was regulated by ESR1 and GATA3 TF regulons59. Previous studies have revealed a positive correlation between EMT and PD-L1 expression in breast cancer patients60. Collectively, these findings suggest that a tumor-specific EMT program may serve as a potential biomarker for combination immunotherapy and chemotherapy in breast cancer, with potential relevance to other cancer types.

Immune cell states and their relative abundance have also been implicated as predictive biomarkers for ICI efficacy. In our study, we detected relatively high levels of naïve, cytotoxic effector, and exhausted CD8+ T cell states in baseline tumor biopsies from patients who responded favorably to the therapy, whereas these cell populations were detected at relatively low levels in patients with an unfavorable response. Furthermore, favorable responders exhibited an increase in the CD8+ T cell cytotoxic effector state and a decrease in the CD8+ T cell exhausted state in on-combination therapy tumor biopsies, suggesting a more robust antitumor T cell response compared to unfavorable responders. In evaluating macrophage state dynamics, our findings suggest that neoadjuvant exposure to either taxane chemotherapy or anti-PD-1 as monotherapies may independently elevate pro-inflammatory macrophage phenotypes, but that the combination of these two therapies may promote a shift of these monotherapy-induced inflammatory macrophages toward an anti-inflammatory macrophage phenotype in tumors resistant to therapy.

Our findings also reveal a difference between predicted tumor-immune cell signaling interactions in pretreatment baseline tumor biopsies versus on-combination therapy biopsies, suggestive of a shift during therapy that may influence patient responses. For example, we detected a higher number of interactions between tumor and CD8+ T cells via PD-1 in favorable responders compared to unfavorable responders at baseline. We also found that pretreatment interactions between tumor cells and CTLA-4 on CD8+ T cells were more prevalent in unfavorable responders. In addition, macrophages expressing NRP1/2 receptors have been linked to immunosuppression, and upon evaluating signaling activity between pretreatment baseline and on-combination therapy tumor biopsies, we noted enhanced tumor-macrophage interactions via NRP1/2-VEGFA in on-combination therapy biopsies from unfavorable responders.

Our study has several limitations. Due to the unique trial design and limited tissue access, our analyses are constrained by the small sample size. Consequently, these findings would benefit from further experimental investigations into the specific cell-type relationships observed, as well as additional external clinical validation, where feasible, with tumors treated with similar therapies and analyzed using the same sequencing approach. Additionally, spatial heterogeneity within tumor regions inherently restricts our ability to comprehensively determine the cell types and states identified. This issue may be further investigated with companion spatially resolved transcriptomics from future trial and real-world samples and cohorts.

In summary, our study provides a comprehensive overview of the tumor-intrinsic and -extrinsic microenvironmental dynamics following a combination of taxane chemotherapy and anti-PD-1 therapy in the neoadjuvant setting of HR+ breast cancer. We identified malignant cell molecular programs, such as ER-I, and immune cell states (i.e., exhausted CD8+ T cell states) and interactions (i.e., malignant-T cell signaling interactions via CTLA-4) that correlate with unfavorable responses to the combination therapy. Our study thereby offers valuable insights into potential predictive biomarkers for therapeutic response in early-stage HR+ breast cancer patients. Further research, including longitudinal studies on patient survival, is necessary to validate the associations identified in this study.

Methods

Sample collection and processing

Patient population and sample selection

This study was approved by the institutional review board of Dana-Farber/Harvard Cancer Center and was conducted in accordance with the principles of the Declaration of Helsinki. As breast cancer is a leading cause of cancer death among women, only female patients were enrolled. Participants did not receive compensation and provided information on their self-identified gender. Tumor biopsies were collected from patients enrolled in the randomized and open-label clinical trial NCT02999477, which enrolled 32 patients. All participants provided written informed consent before undergoing any study-related procedures. Among the 32 patients, two withdrew before receiving study therapy, and one was found ineligible due to HER2 positivity after a single dose of pembrolizumab. Of the remaining 29 patients eligible for efficacy analysis, only 20 were ultimately evaluated. This discrepancy arose because the single-cell sequencing study was initiated prior to full trial enrollment (six trial patients were enrolled after the single-cell sequencing efforts began and were therefore not included in the single-cell cohort), one patient failed to achieve successful single-cell profiling at all timepoints, and baseline biopsies were absent for two patients. Briefly, baseline biopsies were available for 21 patients for single-cell sequencing; however, due to a high rate of library construction failure, only 12 patients had successful library construction and single-cell profiling. Baseline patient and disease characteristics for these 12 patients are presented in Supplementary Data 4. The median age was 42 years, with 66.7% having clinical stage II breast cancer and 75% of tumors exhibiting pure ductal histology. Additionally, 16.7% of tumors were classified as HR low-positive. These characteristics align with the 29 patients included in the efficacy analysis outlined in Ada et al., 2024 (co-submitted). 
For W3D1 biopsies, 14 patient samples were available, with successful library construction and single-cell profiling achieved for nine patients. Similarly, for W7D1 biopsies, 12 patient samples were available, of which eight underwent successful library construction and single-cell profiling. Among pre-surgery biopsies, five samples were available, with four successfully processed. For surgery biopsies, ten samples were available, with successful library construction achieved for seven. Overall, despite variability in sample availability and processing success, the patient characteristics of those with successful single-cell profiling remained consistent with those in the efficacy analysis (Supplementary Data 4).

Nuclei isolation

Nuclei isolation was performed as previously described61. Low-retention microcentrifuge tubes (Fisher Scientific, Hampton, NH, USA) were used throughout the procedure to minimize nuclei loss. Briefly, tissues were manually dissociated by chopping with fine spring scissors for 10 minutes, then homogenized in TST solution. The homogenate was filtered through a 30 µm MACS SmartStrainer (Miltenyi Biotec, Germany) and centrifuged at 500×g for 10 min at 4 °C to pellet the nuclei. The nuclei pellet was resuspended in a lysis buffer to permeabilize the nuclei, followed by another centrifugation at 500×g for 10 min at 4 °C. The final nuclei pellet was resuspended in 150 µl of 10x Genomics Diluted Nuclei Buffer. Trypan blue-stained nuclei were then counted manually using INCYTO C-Chip Neubauer Improved Disposable Hemacytometers (VWR International Ltd., Radnor, PA, USA).

Single-nuclei RNA-sequencing (snRNA-seq)

Following the 10x Genomics protocol62, a maximum of 10,000 nuclei per sample were loaded into each channel of the Chromium Next GEM Chip K for processing on the 10x Chromium Controller (10x Genomics, Pleasanton, CA, USA). This was followed by cDNA generation and library construction according to the manufacturer’s instructions (Chromium Next GEM Single Cell 5ʹ Reagent Kits v2 User Guide, Rev E). The resulting libraries were then normalized and pooled for sequencing on an Illumina NovaSeq system (Illumina, Inc., San Diego, CA, USA) with run parameters set to 26, 10, 10, and 90.

Multiomic GEX and ATAC profiling (MO)

Following the 10x Genomics protocol62, we loaded around 16,000 nuclei per sample per channel of the Chromium Next GEM Chip J for processing on the 10x Chromium Controller (10x Genomics, Pleasanton, CA, USA). This was followed by transposition or cDNA generation and library construction according to the manufacturer’s instructions (Chromium Next GEM Single Cell Multiome ATAC + Gene Expression User Guide, Rev F). The resulting libraries were then normalized and pooled for sequencing on two NovaSeq SP-100 flow cells (Illumina, Inc., San Diego, CA, USA).

Single-nuclei gene expression analysis

We utilized CellRanger v6.0.1 to generate raw gene expression matrices for snRNA-seq and CellRanger Arc v2.0.0 for MO profiling, quantifying reads aligned to the GRCh38-2020-A reference genome on a per-cell basis. CellBender (v0.3.0)63 was then applied to the raw expression matrix to eliminate technical artifacts arising from ambient RNA and random barcode swapping. Cells with fewer than 250 expressed genes, fewer than 500 UMIs, or greater than 5% mitochondrial gene counts were excluded from further analysis. Subsequently, we employed Scrublet64 to remove potential doublets from the filtered gene expression matrices. After these quality control steps, a total of 249,379 cells remained for downstream analysis (Python 3.8.12 and R 4.1).
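The three exclusion thresholds above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the `CellQC` container, the barcodes, and the per-cell values are hypothetical, and the only details taken from the text are the cutoffs (250 genes, 500 UMIs, 5% mitochondrial counts).

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC summary (hypothetical container, for illustration only)."""
    barcode: str
    n_genes: int      # genes with at least one count
    n_umis: int       # total UMI count
    pct_mito: float   # percentage of counts from mitochondrial genes


def passes_qc(cell, min_genes=250, min_umis=500, max_pct_mito=5.0):
    """Return True if the cell survives the thresholds quoted in the text."""
    return (
        cell.n_genes >= min_genes
        and cell.n_umis >= min_umis
        and cell.pct_mito <= max_pct_mito
    )


# Toy cells: only the first one satisfies all three criteria.
cells = [
    CellQC("AAAC-1", n_genes=1200, n_umis=3100, pct_mito=1.2),
    CellQC("AAAG-1", n_genes=240, n_umis=900, pct_mito=0.8),   # too few genes
    CellQC("AACT-1", n_genes=800, n_umis=450, pct_mito=2.0),   # too few UMIs
    CellQC("AAGT-1", n_genes=950, n_umis=2000, pct_mito=7.5),  # high mito
]
kept = [c.barcode for c in cells if passes_qc(c)]
```

Cells sitting exactly on a cutoff are retained here, which follows the "fewer than" and "greater than" wording of the criteria.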

There were seven sequencing batches in total (Supplementary Data 1A). For each sequencing batch, Leiden clustering and cell compartment annotation were performed on the aggregated expression matrices with Scanpy (1.9.1)65. Unless specified, default Scanpy parameters were employed. Briefly, for each batch, 2000 highly variable genes were identified and used for the principal component analysis (PCA). The top 50 principal components (PCs) were then used to compute the nearest neighbors distance matrix, with a local neighborhood size of 10 for constructing the neighborhood graph. We applied Leiden clustering at various resolutions (0.1, 0.3, and 0.5) to the neighborhood graph. For each clustering resolution, the Scanpy rank_genes_groups function was used to characterize the highly expressed genes per cluster. Clusters were annotated by comparing their highly expressed genes to canonical cell type marker genes (Supplementary Data 5), with the Jaccard index quantifying the overlap. The resolution yielding the fewest uncharacterized clusters was selected. Cell types were then grouped into their respective lineages prior to concatenating data from the entire cohort (Supplementary Data 1E). Lastly, we repeated the PCA analysis on 2000 highly variable genes from the integrated expression matrix, constructed a neighborhood graph, and performed uniform manifold approximation and projection (UMAP) to embed the neighborhood graph into a two-dimensional space for visualization.
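The Jaccard-based annotation step can be sketched as follows. This is an illustrative reading only: the marker lists below are common textbook markers rather than the contents of Supplementary Data 5, and the function names and the rule for leaving a cluster uncharacterized (no overlap with any marker set) are assumptions.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotate_cluster(cluster_genes, marker_sets, min_score=0.0):
    """Return (best_label, score); label is None when no marker set overlaps."""
    scores = {
        label: jaccard(cluster_genes, genes)
        for label, genes in marker_sets.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] <= min_score:
        return None, scores[best]
    return best, scores[best]


# Illustrative marker sets (assumed, not taken from Supplementary Data 5).
markers = {
    "T cell": ["CD3D", "CD3E", "CD2", "TRAC"],
    "Macrophage": ["CD68", "CD163", "CSF1R", "C1QA"],
    "Epithelial": ["EPCAM", "KRT8", "KRT18", "KRT19"],
}
label, score = annotate_cluster(["CD3D", "CD3E", "TRAC", "IL7R"], markers)
```

In this toy case the cluster shares three of five distinct genes with the T cell set, so it is labeled "T cell" with a Jaccard index of 0.6. Counting the clusters that return `None` at each resolution gives the quantity used to pick the resolution.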

Malignant cell detection through copy number analysis

Malignant cells were distinguished from normal cells in each individual sample using SCEVAN (1.0.1)13, with immune cells as a reference. Subsequently, we designated malignant cells originating from the epithelial compartment as tumor cells. As this tool also infers the copy number profile of malignant cells, we confirmed that the common copy number alterations detected through WES were also present in the copy number profile derived from the single-cell expression matrix (Supplementary Fig. 2B).

Recurrent programs detection and metaprograms generation

Within each cell type, we filtered out samples with fewer than 100 cells and adopted the methodologies outlined in previous studies14,15. In brief, we applied sparse non-negative matrix factorization (sNMF), implemented in nimfa (1.4.0), to a given cell type from an individual sample, exploring a range of k values from 4 to 10, and characterized the resulting NMF programs by their top 50 defining genes. We then assessed the degree of overlap for the NMF programs both within and across samples, retaining those programs with fewer than 35 overlapping genes within the same sample and more than ten overlapping genes between different samples. These selected NMF programs were designated as robust NMF programs. Finally, we applied a customized clustering method15 to these robust NMF programs, resulting in the identification of clusters referred to as metaprograms (MPs).
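One literal reading of the overlap criteria can be sketched in a few lines. The sketch is an assumption about how the two thresholds combine and is not the published implementation of refs. 14,15: a program is kept when it shares fewer than 35 genes with every other program from the same sample and more than ten genes with at least one program from a different sample. Gene names and sample IDs are placeholders.

```python
def overlap(a, b):
    """Number of genes shared by two programs."""
    return len(set(a) & set(b))


def select_robust_programs(programs, max_within=35, min_between=10):
    """programs: list of (sample_id, gene_list); returns indices of kept programs."""
    robust = []
    for i, (sample_i, genes_i) in enumerate(programs):
        within_ok = True     # no near-duplicate program in the same sample
        between_ok = False   # recurs in at least one other sample
        for j, (sample_j, genes_j) in enumerate(programs):
            if i == j:
                continue
            shared = overlap(genes_i, genes_j)
            if sample_j == sample_i and shared >= max_within:
                within_ok = False
            if sample_j != sample_i and shared > min_between:
                between_ok = True
        if within_ok and between_ok:
            robust.append(i)
    return robust


def genes(start, n=50):
    """Placeholder 50-gene program: g<start> .. g<start+n-1>."""
    return [f"g{i}" for i in range(start, start + n)]


programs = [
    ("S1", genes(0)),     # shares 40 genes with the next S1 program
    ("S1", genes(10)),    # redundant within S1
    ("S2", genes(30)),    # recurs across samples
    ("S2", genes(200)),   # found in one sample only
    ("S1", genes(60)),    # shares 20 genes with the S2 program above
]
robust_idx = select_robust_programs(programs)
```

With these toy inputs the third and fifth programs are retained. Note that this reading discards both members of a redundant within-sample pair, whereas an implementation could instead keep one representative.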

Cell state annotation

Within each cell type, cells were further classified into distinct cell states using MPs previously identified for that specific cell type. For each cell type, including CD8 T cells, macrophages, and tumor cells, we quantified cell-type-specific MPs using the Vision (3.0.1)66 method, generating an N (cell count) × M (MP count) signature matrix per cell type. Because the score ranges of MPs are not directly comparable, we applied Gaussian mixture modeling (GMM) with hard labeling via the signature scoring Python package67. Briefly, a GMM was fitted to each MP scoring matrix with the number of mixtures equal to the number of identified MPs. Pearson’s correlation coefficient between MPs and GMM clusters was calculated to generate an M × C similarity matrix. Subsequently, GMM mixtures were labeled according to the MP with which they exhibited the highest similarity. In the final step, each cell was classified under the MP that corresponded to the GMM mixture with the highest probability in the GMM (Supplementary Fig. 6).
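The labeling step that follows the GMM fit can be sketched as below. The sketch assumes the mixture has already been fitted and that its per-cell membership probabilities are available as an N × C table; it does not reproduce the API of the signature scoring package67, and all names and values are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def label_cells(mp_scores, responsibilities, mp_names):
    """Assign each cell the MP label of its most probable GMM mixture."""
    n_mps, n_mix = len(mp_names), len(responsibilities[0])
    mp_cols = [[row[m] for row in mp_scores] for m in range(n_mps)]
    mix_cols = [[row[c] for row in responsibilities] for c in range(n_mix)]
    # M x C similarity matrix between MP scores and mixture memberships.
    sim = [[pearson(mp_cols[m], mix_cols[c]) for c in range(n_mix)]
           for m in range(n_mps)]
    # Each mixture inherits the name of its most similar MP.
    mix_label = [mp_names[max(range(n_mps), key=lambda m: sim[m][c])]
                 for c in range(n_mix)]
    # Hard labeling: take the mixture with the highest probability per cell.
    return [mix_label[max(range(n_mix), key=lambda c: row[c])]
            for row in responsibilities]


mp_scores = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.2, 0.9]]             # N x M
responsibilities = [[0.05, 0.95], [0.1, 0.9], [0.9, 0.1], [0.8, 0.2]]   # N x C
states = label_cells(mp_scores, responsibilities, ["MP_a", "MP_b"])
```

Because mixtures are matched to MPs by correlation, the arbitrary ordering of mixture components does not matter: in the toy data mixture 0 tracks `MP_b` and mixture 1 tracks `MP_a`.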

Differential gene expression and pathway activity analysis

We utilized the decoupler (1.6.0)68 Python package for the differentially expressed gene (DEG) analysis and pathway activity inference. To identify DEGs associated with specific cell states, we generated pseudobulk samples69 for each cell state within each patient sample. DEGs for each cell state within the same cell type were then calculated using DESeq270,71. To account for the unequal number of samples across patients, we included patient ID as a design factor in our analysis to minimize patient-driven biases.
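Pseudobulk generation amounts to summing raw counts over all cells that share a sample and a cell state. The following is a minimal, library-free sketch of that aggregation; it is not the decoupler implementation, and the sample IDs, state names, and counts are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts per (sample_id, cell_state).

    cells: iterable of (sample_id, cell_state, {gene: count}) tuples.
    Returns {(sample_id, cell_state): {gene: summed_count}}.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for sample_id, state, counts in cells:
        for gene, n in counts.items():
            profiles[(sample_id, state)][gene] += n
    return {key: dict(gene_counts) for key, gene_counts in profiles.items()}


cells = [
    ("P1_BL", "exhausted", {"PDCD1": 3, "TOX": 1}),
    ("P1_BL", "exhausted", {"PDCD1": 2}),
    ("P1_BL", "effector", {"GZMB": 4}),
    ("P2_BL", "exhausted", {"TOX": 5}),
]
bulk = pseudobulk(cells)
```

Each resulting profile is then treated as one observation in the count-based differential expression model, with patient ID available as a covariate.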

For the estimation of signaling activity levels within each cell state, we utilized PROGENy (1.0.6)28, which is based on a compendium of large-scale public signaling perturbation experiments that define responsive genes and interaction weights for pathways. Briefly, the DEG statistics for each cell state were fitted with a multivariate linear model, which predicted the observed gene expression using the PROGENy pathway-gene interaction weights. The resulting t-values of the model coefficients served as the activity scores for each cell state’s signaling pathways.
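Written out, the model described above takes roughly the following form. The notation is ours and is meant as a sketch: \(s_{g}\) denotes the DEG statistic of gene \(g\) for a given cell state, \(W_{gp}\) the PROGENy weight linking gene \(g\) to pathway \(p\), \(a_{p}\) the coefficient estimated for pathway \(p\), and the intercept \(a_{0}\) is an assumption about the fitting procedure.

$$s_{g}={a}_{0}+\sum_{p=1}^{P}{W}_{gp}\,{a}_{p}+{\varepsilon }_{g},\qquad {\mathrm{activity}}_{p}=\frac{{\hat{a}}_{p}}{\mathrm{SE}({\hat{a}}_{p})}$$

Because all pathways enter the model jointly, the t-value for one pathway reflects its contribution after accounting for genes shared with the other pathways.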

Gene signatures on bulk RNA-sequencing data

Initially, we established a set of signature genes for each cell state, selecting the top 100 genes that were differentially expressed (with adjusted P value <0.05 and a log fold change >1) in comparison to other cells detected in our cohort. In accordance with the guidelines provided by Vision (3.0.1)66, we normalized the bulk RNA-sequencing raw count matrix by dividing by the total reads per sample and then multiplying by the median total reads across the cohort. Next, cell state signature scores were calculated on the scaled count matrix via Vision. Lastly, the resulting cell state signature scores were further normalized against the average scores of cell states within the same cell type, as the signatures being investigated were pertinent to individual cell types.
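The library-size scaling described above is simple enough to show directly. This is a sketch under the assumption that every sample has a nonzero read total; the sample IDs and counts are invented for illustration.

```python
import statistics


def normalize_counts(counts):
    """Scale each sample to the cohort's median library size.

    counts: {sample_id: {gene: raw_count}}; returns the same structure
    with each count divided by the sample total and multiplied by the
    median total across samples.
    """
    totals = {s: sum(g.values()) for s, g in counts.items()}
    median_total = statistics.median(totals.values())
    return {
        s: {gene: c / totals[s] * median_total for gene, c in g.items()}
        for s, g in counts.items()
    }


raw = {
    "S1": {"ESR1": 10, "GATA3": 30},    # total 40
    "S2": {"ESR1": 50, "GATA3": 50},    # total 100 (cohort median)
    "S3": {"ESR1": 60, "GATA3": 140},   # total 200
}
scaled = normalize_counts(raw)
```

After scaling, every sample sums to the median library size (100 in this toy cohort), so signature scores computed downstream are not driven by sequencing depth.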

Differential gene signature analysis

The gene signature scores were compared between two groups of interest to identify signatures with significant differences. Changes in cell type-specific signatures over treatment time were modeled using a linear mixed-effects regression model (1):

$$y={\beta }_{0}+{\beta }_{1}t+Zd+\epsilon \tag{1}$$

where \(y\) is the observed signature score for a signature across all cells in the given cell type, \({\beta }_{0}\) is the intercept, \({\beta }_{1}\) is the fixed effect of treatment time \(t\) on signature level, \(Z\) is a binary design matrix indicating whether the cells are from the same patient, \(d\) is a vector of random effects for patient, which is normally distributed with mean zero and represents the deviation of each patient's mean signature score from the overall mean, and \(\epsilon\) is the random error. To ensure robust statistical modeling of within-patient dynamics while mitigating bias from incomplete longitudinal data, we included only patients with samples at both timepoints in this analysis. To identify signatures that significantly differentiate favorable responders from unfavorable responders, we applied the same model form, with \({\beta }_{1}\) as the fixed effect of therapy response \(t\) on signature level.

Cell-to-cell communication of single-nuclei gene expression data

Given the requisite minimum number of cells and samples per condition for the application of MultiNicheNetR (2.0.0)48, our analysis concentrated on cellular interactions at the cell type level, specifically between tumors, macrophages, and CD8 T cells. The algorithm operates by identifying ligand-receptor interactions that exhibit differential expression and activity between specified conditions, as detailed in the referenced literature48. We investigated the interactions that were differentially activated between favorable and unfavorable responders using their baseline samples (Supplementary Data 3A). Subsequently, we explored the changes in cell-cell communication dynamics from baseline to W7D1, focusing on differences between favorable and unfavorable responders (Supplementary Data 3B).

Single-nuclei ATAC-sequencing data processing

We generated a fragment bed file for each sample, where each line corresponds to a unique ATAC-seq fragment identified by the assay, using CellRanger Arc v2.0.0. We then proceeded with the SCENIC+ (1.0.1.dev6+ge5ba6fc)20 workflow for our downstream analysis. First, leveraging the cell type labels derived from gene expression analysis, we constructed pseudobulk fragment bed files for each cell type and conducted consensus peak calling12,20. Upon establishing the consensus peak regions, we counted the number of fragments overlapping the consensus peak regions. Cells with a fraction of fragments in peaks <0.45, log number of unique fragments <3.3, or transcription start site enrichment score <5 were removed. This resulted in 22,125 high-quality cells (Supplementary Data 1F).
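The three chromatin accessibility filters can be sketched as a single predicate. Two points are assumptions rather than statements from the text: the logarithm is taken to be base 10 (so 3.3 corresponds to roughly 2000 unique fragments), and the fraction of fragments in peaks is computed from per-cell fragment counts. The function and argument names are hypothetical.

```python
import math


def passes_atac_qc(n_unique_fragments, n_fragments_in_peaks, tss_enrichment,
                   min_frip=0.45, min_log_fragments=3.3, min_tss=5.0):
    """Return True if a cell survives the three ATAC QC thresholds."""
    if n_unique_fragments <= 0:
        return False
    frip = n_fragments_in_peaks / n_unique_fragments
    return (
        frip >= min_frip
        and math.log10(n_unique_fragments) >= min_log_fragments  # assumed base 10
        and tss_enrichment >= min_tss
    )


ok = passes_atac_qc(2000, 1000, 6.0)          # FRiP 0.5, log10 about 3.30
too_shallow = passes_atac_qc(1900, 1000, 6.0)  # below the fragment cutoff
low_frip = passes_atac_qc(5000, 2000, 8.0)     # FRiP 0.4
low_tss = passes_atac_qc(5000, 3000, 4.9)      # TSS enrichment below 5
```

Only the first toy cell passes; each of the other three fails exactly one criterion.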

Next, we applied topic modeling to the entire cohort, as well as tumor cells and macrophages individually, using LDA with a collapsed Gibbs sampler. We selected models with 30, 15, and 10 topics for the entire cohort, tumor cells, and macrophages, respectively, based on the stabilization of metrics72,73,74 and log-likelihood. Finally, we conducted dimensionality reduction using UMAP on all topics identified across the entire cohort.

Topic binarization

We followed the recommendations described in ref. 20 to generate the required region sets for gene regulation network inference analysis. Briefly, we binarized topic regions using two methods: Otsu75 and ntop (the top 3000 regions per topic).
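The two binarization strategies can be illustrated with a small, dependency-free sketch. It is not the SCENIC+ implementation: the Otsu threshold here is computed exactly over the sorted values rather than over a histogram, and the region names, probabilities, and function names are hypothetical.

```python
def otsu_threshold(values):
    """Threshold maximizing between-class variance (Otsu's criterion)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    best_var, best_t, cum = -1.0, xs[0], 0.0
    for i in range(1, n):            # candidate split: xs[:i] low, xs[i:] high
        cum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue                 # cannot split between equal values
        w0, w1 = i / n, (n - i) / n
        mu0, mu1 = cum / i, (total - cum) / (n - i)
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, xs[i - 1]
    return best_t


def binarize_topic(region_probs, method="otsu", ntop=3000):
    """Select regions for one topic from {region: topic-region probability}."""
    if method == "otsu":
        t = otsu_threshold(list(region_probs.values()))
        return {r for r, p in region_probs.items() if p > t}
    if method == "ntop":
        ranked = sorted(region_probs, key=region_probs.get, reverse=True)
        return set(ranked[:ntop])
    raise ValueError(f"unknown method: {method}")


probs = {
    "chr1:100-600": 0.30, "chr1:900-1400": 0.28,
    "chr2:50-550": 0.02, "chr3:10-510": 0.01, "chr4:5-505": 0.015,
}
otsu_regions = binarize_topic(probs, method="otsu")
top_region = binarize_topic(probs, method="ntop", ntop=1)
```

In the toy topic, Otsu separates the two high-probability regions from the three low-probability ones, whereas ntop keeps a fixed number of regions regardless of how the probabilities are distributed.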

Differentially accessible regions detection

We imputed the region accessibility exploiting the cell-topic and topic-region probabilities and normalized the probability values with the default scale factor 1,000,000. To speed up the hypothesis testing step, we identified highly variable regions and then identified differentially accessible regions per cell state using the Wilcoxon rank-sum test with log fold change >1.5 and false discovery rate <0.05.

Motif enrichment analysis

We used the score and ranking database v10 motif collection (SCENIC+ motif collection). Motif enrichment was performed using both the cisTarget and DEM algorithms on binarized topic regions, the top 3000 regions per topic, and cell-type-based differentially accessible regions. The motif enrichment analysis was run both including and excluding promoters, which were defined as regions within 500 bp up- or downstream of the TSS of each gene.

Gene regulation network inference

The raw gene expression count matrix, imputed accessibility and motif enrichment results were used as input into the SCENIC+ workflow, keeping 22,125 cells with both high-quality ATAC-seq and RNA-seq profiles. The SCENIC+ workflows were run using default parameters20.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.