Introduction

Breast cancer remains the most commonly diagnosed malignancy worldwide and a major contributor to cancer-related mortality among women1. Despite continuous advances in therapeutic strategies and significant improvements in clinical outcomes, a proportion of patients still have a poor prognosis because of tumor recurrence, distant metastasis, or treatment resistance. High intratumoral heterogeneity (ITH) and the complexity of cancer cells promote drug resistance, which calls for better prognostic assessment and more effective therapeutic targeting. However, the clinical use of multi-modal, high-dimensional data requires extensive analytical expertise and is time-intensive, even with dimensionality reduction techniques such as matrix factorization or tensor decomposition. The continual expansion of oncology-related databases further complicates the extraction of meaningful information2. Machine learning (ML) approaches are therefore becoming vital for automatically uncovering underlying patterns across diverse data modalities and helping clinicians make informed decisions3,4. ML, a specialized field within artificial intelligence (AI), is dedicated to recognizing patterns in data for predictive modeling and automated analysis. For decades, ML has served as an effective adjunct in oncology for cancer characterization and treatment planning, with demonstrated applications in early diagnostic screening, classification of cancer subtypes, extraction of clinically relevant features, deconvolution of the tumor microenvironment, prognosis estimation, and prediction of therapeutic responses5,6,7.

Mitochondrial pathway abnormalities and metabolic disorders can alter gene expression, thereby facilitating cancer development, progression, and immune evasion8,9. Histone H2B monoubiquitination negatively regulates the Warburg effect and tumor formation in breast cancer cells by modulating the expression of mitochondrial intake genes9,10. Research indicates that apoptosis-inducing factors influence mitochondrial respiration and that their phosphorylation promotes breast cancer progression. Moreover, mitochondrial ribosome-related genes are highly expressed in metastatic breast adenocarcinoma11,12,13. Pyruvate carboxylase (PC), a mitochondrial enzyme, has also been associated with metastasis in breast cancer14. Because tumor cells rely primarily on glycolysis and mitochondrial oxidative phosphorylation for survival, targeting these metabolic pathways has become a promising direction for cancer therapy. Specific changes in mitochondrial function observed in metastatic breast adenocarcinoma further support the exploration of novel treatment strategies15. Enhanced mitochondrial glucose oxidation is a typical characteristic of breast cancer, in which the mitochondrial electron transport chain (MTC) plays a pivotal role in sustaining tumor growth16,17. Accordingly, inhibiting the MTC, particularly in combination with molecularly targeted therapies, has shown strong anti-tumor effects. Moreover, voltage-dependent anion channel 1 (VDAC1), a mitochondrial outer membrane protein essential for cellular energy homeostasis, is often upregulated in malignant tissues, and its inhibition suppresses breast cancer cell proliferation18. Collectively, these findings highlight the central role of mitochondrial energy metabolism in breast cancer pathogenesis and its potential as a therapeutic target.

Abnormalities in mitochondrial energy metabolism are linked to oxidative stress, a state in which excessive free radicals and reactive oxygen species (ROS), together with reduced antioxidant capacity, lead to ROS accumulation and an imbalance between oxidants and antioxidants19,20,21. Oxidative stress is a key feature of many cancers, and cancer cells typically maintain higher ROS levels than normal cells. Mitochondrial respiration is a major source of ROS, and excessive ROS can damage cellular components19,22,23. Superoxide is generated by electron leakage at complexes I and III of the mitochondrial electron transport chain and is subsequently converted into hydrogen peroxide. Mitochondrial DNA is prone to somatic mutations, which can disrupt the respiratory chain and impair energy production, further promoting ROS accumulation and tumor development24,25.

This study comprehensively explored the abnormal expression patterns of mitochondrial genes in breast cancer (BRCA) and their prognostic significance across seven clinical cohorts (TCGA-BRCA, GSE86347, GSE21653, GSE58812, GSE123845, GSE42568, and GSE10886) and a single-cell dataset (GSE176078). To refine precision risk stratification for BRCA, we integrated ten machine-learning algorithms into a risk-stratification model, MitoScore. Immune infiltration and metabolic flux analyses further elucidated the immune heterogeneity and metabolic dysregulation of patients in the high-risk subgroup. We then selected the most important gene in this model, CPS1, and confirmed its oncogenic role, which bridges glycolysis and immunosuppression. These results suggest that targeting CPS1 to normalize mitochondrial metabolism and reprogram the tumor microenvironment is a feasible strategy in BRCA.

Results

Mitochondrial genes are abnormally expressed and associated with poor prognosis in BRCA

Single-cell transcriptomic profiling of BRCA (GSE176078) revealed pronounced upregulation of mitochondrial genes (MGs) in tumor epithelial cells compared with their normal counterparts (Fig. 1A–D). KEGG pathway enrichment analysis showed that these mitochondrial genes were mainly grouped in key metabolic pathways, such as the tricarboxylic acid cycle, oxidative phosphorylation, and glycolysis, with a focus on pyruvate metabolism (Fig. 1E). Validation in the TCGA-BRCA cohort confirmed increased mitochondrial metabolic activity in tumor tissues compared with adjacent normal tissues. Survival analyses further showed a robust association between elevated mitochondrial gene expression and adverse clinical outcomes (Fig. 1F–H). However, the prognostic utility of the mitochondrial gene score was suboptimal, with an area under the curve (AUC) below 0.7, underscoring the need for a refined risk stratification model to improve predictive accuracy in BRCA management.
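The AUC in Fig. 1H treats survival status (dead or alive) as a binary label. As a minimal sketch, assuming one score per patient and a 0/1 status (1 = dead), the AUC equals the probability that a randomly chosen deceased patient scores higher than a randomly chosen surviving patient, with ties counted as one half. This is the static AUC; the 1-, 3- and 5-year curves elsewhere in the paper are time-dependent ROC analyses, which this sketch does not reproduce. The function name `roc_auc` and the example scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: probability that a positive (label 1, e.g. dead) outscores
    a negative (label 0, e.g. alive); tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four patients (1 = dead, 0 = alive).
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in cohort size, which is fine for illustration; production code would use a sort-based rank statistic instead.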

Fig. 1

Expression and Clinical Implications of Mitochondrial Gene Signatures in BRCA. (A) UMAP plot delineating distinct cellular clusters within the BRCA tumor microenvironment in the GSE176078 single-cell dataset. (B) Cell subtype annotation using specific markers. (C) UMAP showing the mitochondrial gene signature across single-cell populations. A curated panel of 1136 mitochondrial genes from Human.MitoCarta3.0 was quantified, revealing pronounced enrichment in tumor epithelial cells compared with non-malignant counterparts. (D) Mitochondrial gene scores in normal epithelial and malignant cancer cells. (E) KEGG pathway enrichment of mitochondrial gene signatures. (F, G) In the TCGA-BRCA dataset, mitochondrial genes were highly expressed in tumor tissues (F) and were significantly associated with unfavorable patient prognosis (G). (H) ROC curves showing the accuracy of mitochondrial gene scores in predicting patient survival status (dead or alive). An AUC greater than 0.7 was considered acceptable.

Integrating multiple machine learning methods to construct the prognostic model: MitoScore

To develop a reliable prognostic model based on mitochondrial genes, we integrated multiple machine learning approaches using seven clinical datasets (training cohort: TCGA-BRCA; validation cohorts: GSE86347, GSE21653, GSE58812, GSE123845, GSE42568, and GSE10886). As illustrated in Fig. 2A, the StepCox + RSF combination achieved the highest predictive performance (C-index = 0.94) during model optimization. The final model, termed MitoScore (StepCox[forward] + RSF), showed consistent prognostic performance across the validation cohorts, with a mean C-index approaching 0.7. Survival analysis showed that MitoScore significantly stratified patient outcomes, with high risk scores corresponding to unfavorable survival (Fig. 2B). The model's discriminative capacity was further supported by high time-dependent AUC values (AUC ≥ 0.97 at 1, 3, and 5 years; Fig. 2C). Supplementary analyses further confirmed the model's prognostic reliability (Supplementary Fig. 1). Comparison with 101 published gene signatures showed that MitoScore had an advantage in specific datasets (Supplementary Fig. 2).
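The C-index used to rank the algorithm combinations extends the AUC to censored survival data. The sketch below implements Harrell's C as a hypothetical helper (`harrell_c_index`), assuming event indicators coded 1 = death and risk scores where higher means worse. It skips pairs with tied follow-up times, a simplification that dedicated survival packages handle more carefully; the cohort values are invented for illustration.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i had an event and a strictly
    shorter follow-up than j. It is concordant when i also has the higher
    risk score; tied risk scores count as 0.5. Pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: follow-up (months), event (1 = death), model risk score.
times = [5, 12, 20, 30, 41]
events = [1, 1, 0, 1, 0]
risk = [2.1, 1.7, 0.9, 0.3, 0.4]
print(round(harrell_c_index(times, events, risk), 3))  # 0.875
```

Seven of the eight comparable pairs are ordered correctly here; only the patient who died at 30 months is ranked below a longer-surviving censored patient.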

Fig. 2

Construction of a prognostic model for breast cancer based on mitochondrial-related genes. (A) Heatmap showing the performance of various prognostic models across different cohorts (TCGA and GEO datasets). Models include combinations of Stepwise Cox regression (forward, backward, or both), Random Survival Forest (RSF), Lasso, and other machine learning approaches. The C-index is used to evaluate the predictive accuracy of each model, with higher values (red) indicating better performance. Mean C-index values for all cohorts and validation cohorts are highlighted. (B) Kaplan–Meier survival curve comparing the high-risk and low-risk groups in the TCGA cohort. The high-risk group (red) exhibits significantly worse survival compared to the low-risk group (blue), with a hazard ratio (HR) of 15.11 (95% CI 11.05–20.67, p < 0.001). (C) Receiver operating characteristic (ROC) curve assessing the predictive accuracy of the prognostic model at 1, 3, and 5 years. The area under the curve (AUC) values are 0.97, 0.99, and 0.97, respectively, demonstrating excellent discrimination performance. (D) Bubble plot showing the importance of mitochondrial-related genes in the prognostic model. Genes such as CPS1, COX6B2, BCL2L10, and BAK1 are highlighted based on their hazard ratios (HR) and importance scores. The size of each bubble represents the HR, with HR > 1 indicating a higher risk and HR < 1 indicating a protective effect, while the color gradient indicates gene importance.

Collectively, MitoScore may serve as a clinically valuable tool for predicting survival in BRCA patients, with potential applications in personalized risk assessment and treatment decisions.

The clinical relevance and single-cell expression pattern of MitoScore

To explore the clinical applicability of the MitoScore model, we evaluated its correlation with the clinicopathological features of BRCA. In the TCGA-BRCA cohort, the risk score was notably higher in tumor tissues than in adjacent normal tissues (Fig. 3A). The scores were also strongly associated with advanced clinical stage, with higher risk scores corresponding to more advanced disease (Fig. 3B). A prognostic nomogram integrating multiple variables was then developed to predict patient survival probabilities, and calibration curves and decision curve analysis (DCA) confirmed its clinical utility and robust predictive performance (Fig. 3C–E). Collectively, these findings highlight the capacity of MitoScore to stratify BRCA patients by disease severity and its potential as a biomarker for monitoring cancer progression.
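Decision curve analysis compares a model against the "treat all" and "treat none" strategies across threshold probabilities p_t, using the standard net benefit NB(p_t) = TP/n − (FP/n) × p_t / (1 − p_t); "treat none" always has NB = 0. The sketch assumes a binary outcome (for example, death by a fixed time point) and predicted probabilities from a nomogram, and it ignores censoring, which survival-specific DCA implementations account for. Function names and data are hypothetical.

```python
from typing import Sequence


def net_benefit(pred_prob: Sequence[float], outcome: Sequence[int], pt: float) -> float:
    """Net benefit of intervening in patients whose predicted risk is >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("threshold probability must lie strictly between 0 and 1")
    n = len(outcome)
    tp = sum(1 for p, y in zip(pred_prob, outcome) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(pred_prob, outcome) if p >= pt and y == 0)
    return tp / n - (fp / n) * (pt / (1.0 - pt))


def net_benefit_treat_all(outcome: Sequence[int], pt: float) -> float:
    """Reference strategy: intervene in everyone ('treat none' has net benefit 0)."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1.0 - prevalence) * (pt / (1.0 - pt))


# Hypothetical nomogram probabilities of death by a fixed time point.
probs = [0.9, 0.7, 0.4, 0.2]
died = [1, 1, 0, 0]
for pt in (0.3, 0.5):
    print(pt, round(net_benefit(probs, died, pt), 3),
          round(net_benefit_treat_all(died, pt), 3))
```

A model is clinically useful over a threshold range where its curve lies above both reference strategies.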

Fig. 3

The clinical features and single-cell expression patterns of MitoScore. (A) Risk scores for tumor and normal tissue based on the MitoScore model. (B) Correlation of risk scores with tumor stage (T), lymph node stage (N), and metastasis stage (M). (C) Nomogram based on the MitoScore model to predict patient survival at 1, 3, and 5 years. (D) Calibration curve of the nomogram showing the agreement between predicted and observed values. (E) DCA curve of the MitoScore model. (F) Differences in the MitoScore risk score between tumor cells and normal epithelial cells. (G) Expression patterns of the 64 MitoScore model genes in single-cell subpopulations.

Mitochondrial metabolism drives MIF-CXCR4 signaling-mediated immune escape

To investigate why patients with high mitochondrial metabolism (high risk) have worse outcomes, we hypothesized that elevated metabolic scores might be associated with a colder (immunosuppressive) tumor microenvironment (TME). Immune cell infiltration analysis revealed that high-metabolism patients had a less active TME, characterized by markedly reduced levels of several T cell subsets and NK cells (Fig. 4A). ESTIMATE analysis likewise indicated a lower overall immune score in patients with high metabolism (Fig. 4B). To further characterize TME differences, cell communication analysis of the single-cell dataset revealed significant differences in MIF, SPP1, and MK (midkine) signaling between the high- and low-metabolism groups (Fig. 4C). In tumor-to-T cell interactions, MIF-(CD74+CXCR4) and MIF-(CD74+CD44) signaling was enhanced (Fig. 4D).

Fig. 4

High mitochondrial metabolism drives MIF-CXCR4 signaling-mediated immune escape. (A) Immune infiltration analysis based on the CIBERSORT algorithm in the TCGA-BRCA dataset. (B) ESTIMATE algorithm assessing the overall immune level of samples. (C) CellChat cell communication analysis of the single-cell dataset showing all differential ligand-receptor pairs in the high- and low-risk groups. (D) Strength of communication from tumor cells to T cells in the risk subgroups. (E, F) Expression and prognostic roles of MIF signaling in the TCGA-BRCA dataset.

MIF promotes tumor proliferation and survival and downregulates anti-tumor immune surveillance26. CXCR4 is one of the receptors for MIF, and MIF-CXCR4 signaling has been extensively studied in cancer. CXCR4 maintains the exhausted state of CD8⁺ T cells through the JAK2/STAT3/TOX pathway, thereby weakening the efficacy of immunotherapy27. In the TCGA-BRCA cohort, MIF signaling components (MIF, CD74, CD44, and CXCR4) were elevated in tumor samples and associated with poor prognosis (Fig. 4E, F). This suggests that MIF signaling impairs T cell-mediated anti-tumor responses, ultimately contributing to decreased patient survival.

In addition, high-risk patients showed increased expression of CD274 (encoding PD-L1) and signs of T cell dysfunction, including reduced proliferation and increased exhaustion (Supplementary Fig. 3). Collectively, these results suggest that MIF-CXCR4 signaling and PD-L1 drive immune escape in BRCA patients with high mitochondrial metabolism, ultimately reducing survival.

Glycolysis dominates mitochondrial metabolism

To investigate the metabolic heterogeneity between patients with divergent mitochondrial metabolic activity, we first performed KEGG enrichment analysis on the 64 model genes, which revealed significant enrichment in glycolysis-related pathways, including carbon metabolism and serine metabolism (Fig. 5A). Subsequent single-cell metabolic profiling revealed elevated glucose and lactate metabolic flux in high-risk cells (Fig. 5B), and pathway enrichment analysis confirmed enhanced glycolytic activity in high-risk subpopulations (Fig. 5C).

Fig. 5

Enhanced mitochondrial metabolism leads to a pro-tumorigenic phenotype. (A) KEGG enrichment analysis of the 64 MitoScore model genes. (B) scFEA analysis of differences in glucose and lactate metabolic flux between MitoScore-positive and -negative tumor cells. (C) scMetabolism analysis of metabolic pathway differences in tumor cells. (D) In TCGA-BRCA, hypermetabolic patients show higher levels of hypoxia and glycolysis. (E) Schematic of how high mitochondrial metabolism leads to unfavorable patient prognosis. Left: molecular mechanisms of MIF-driven tumors; MIF/CXCR4 signaling inhibits T cell killing, and MIF/HIF-1α promotes glycolysis-dependent cell proliferation. Right: in patients with high mitochondrial metabolism, enhanced glycolysis and MIF-CXCR4 signaling lead to cancer cell proliferation and immunosuppression, which in turn lead to poor prognosis. Abbreviations: MIF, macrophage migration inhibitory factor; HIF-1α, hypoxia-inducible factor 1 subunit alpha; CXCR4, C-X-C chemokine receptor type 4.

Beyond its established role in suppressing T cell activation through CXCR4, MIF has been reported to potentiate glycolysis-driven tumor proliferation by upregulating HIF-1α28. Hypoxia synergizes with HIF-1α to transcriptionally activate glycolytic enzymes such as LDHB29,30. Consistent with these mechanisms, TCGA cohort analysis revealed significantly higher hypoxia and glycolysis signature scores in the high-risk group (Fig. 5D). Together, these findings suggest that amplified glycolytic capacity is a hallmark metabolic adaptation in hypermetabolic patients that drives disease progression and reduces survival (Fig. 5E). This metabolic reprogramming may establish a vicious cycle in which enhanced glycolysis fuels tumor growth while lactate accumulation and HIF-1α-mediated signaling create an immunosuppressive microenvironment.
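The section does not specify how the hypoxia and glycolysis signature scores in Fig. 5D were computed; rank-based methods such as GSVA or ssGSEA are common choices. As a simplified stand-in, one can z-score each signature gene across samples and average the z-scores per sample. The helper `zscore_signature` and the gene names below are hypothetical placeholders, not the study's gene sets.

```python
import numpy as np


def zscore_signature(expr: np.ndarray, genes: list, signature: list) -> np.ndarray:
    """Mean per-gene z-score over signature genes.

    expr: genes x samples matrix on a log scale; genes: row names.
    Signature genes missing from the matrix are ignored.
    """
    idx = [genes.index(g) for g in signature if g in genes]
    if not idx:
        raise ValueError("none of the signature genes are in the matrix")
    sub = np.asarray(expr, dtype=float)[idx, :]
    mu = sub.mean(axis=1, keepdims=True)
    sd = sub.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes contribute 0 instead of NaN
    return ((sub - mu) / sd).mean(axis=0)


# Hypothetical 3-gene x 3-sample matrix; "GENE_X" is absent and is skipped.
expr = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 5.0, 5.0]])
print(zscore_signature(expr, ["GENE_A", "GENE_B", "GENE_C"],
                       ["GENE_A", "GENE_B", "GENE_X"]))
```

Standardizing per gene keeps highly expressed genes from dominating the score, which is the main reason this simple average is often used as a sanity check alongside rank-based methods.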

CPS1, an oncogene, drives glycolysis in BRCA cells

To further investigate how mitochondria-related genes (MRGs) regulate breast cancer progression, we focused on CPS1, the most important gene in the MitoScore model. CPS1 expression was significantly higher in tumor tissues than in normal breast tissues (Fig. 6A). Kaplan–Meier analysis showed that high CPS1 expression was associated with poorer overall survival in breast cancer patients (p < 0.0001; Fig. 6B). ROC analysis yielded AUC values of 0.70, 0.61, and 0.55 for 1-, 3-, and 5-year survival, respectively (Fig. 6C), indicating moderate short-term predictive value that declines at later time points. Bubble and violin plots of CPS1 expression across breast cancer cell types showed that CPS1 is primarily expressed in malignant and epithelial cells (Fig. 6D, E).
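The Kaplan–Meier p-value in Fig. 6B most likely comes from a log-rank test, the usual companion to Kaplan–Meier curves, although the section does not name the test or the cutoff used to define high and low CPS1 expression. A minimal two-group log-rank sketch, assuming follow-up times, 0/1 event indicators, and 0/1 group labels; `logrank_test` and the example data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; groups are 0/1 labels. Returns (chi2, p) with 1 df."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # number at risk in group 1
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, groups) if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical data: group 1 (e.g. high CPS1) dies earlier than group 0.
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(round(chi2, 3), round(p, 3))
```

The p-value uses the identity that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper tail is erfc(sqrt(chi2 / 2)).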

Fig. 6

CPS1 expression and functional validation in breast cancer. (A) CPS1 mRNA expression was higher in TCGA-BRCA tumor tissues than in normal tissues (p < 0.01). (B) Overall survival was worse in TCGA-BRCA patients with high CPS1 expression (p < 0.0001). (C) AUCs of CPS1 expression for predicting 1-, 3-, and 5-year survival were 0.70, 0.61, and 0.55, respectively. (D) Single-cell analysis (GSE176078) showed that CPS1 expression was significantly higher in some cell types (e.g., oligodendrocytes and fibroblasts) than in others (e.g., B lymphocytes and endothelial cells). (E) CPS1 expression was higher in malignant epithelial cells than in non-malignant cells. (F, G) qPCR validation of CPS1 knockdown efficiency in BT549 (F) and E0771 (G) cells, confirming successful knockdown in both cell lines. (H–J) CPS1 knockdown reduced colony formation in BT549 and E0771 cells. (K, L) Proliferation of BT549 and E0771 cells was diminished after CPS1 knockdown. (M) ECAR decreased after CPS1 knockdown, with significant differences in the sh#2 group (**p < 0.01, ***p < 0.001). (N, O) Glycolytic activity was significantly reduced in the CPS1 knockdown group. (P) Real-time ECAR in BT549 cells showed reduced glycolysis in the knockdown group. (Q) Western blot showing reduced Pan-Kla and H3K9la levels in the knockdown group, suggesting that CPS1 is involved in epigenetic and transcriptional regulation of breast cancer cells through histone H3 lactylation.

To investigate the functional role of CPS1, we knocked down CPS1 with two shRNAs (sh#1 and sh#2) in BT549 and E0771 breast cancer cell lines. Quantitative RT-PCR confirmed a marked reduction in CPS1 mRNA in both cell lines (Fig. 6F, G; p < 0.0001). Clonogenic assays showed that CPS1 knockdown significantly reduced colony formation in both cell lines compared with the control group (shNC) (Fig. 6H–J; **p < 0.01, ***p < 0.001), and CCK8 assays showed significant differences in proliferation between the CPS1 knockdown and control groups at 72 h (Fig. 6K, L), indicating that CPS1 knockdown inhibited the proliferation of E0771 and BT549 cells. Conversely, CPS1 overexpression in E0771 cells significantly enhanced their proliferation (Supplementary Fig. 4A–E). ECAR measurements showed that glycolytic activity was significantly reduced in CPS1 knockdown cells, particularly in the sh#2 group (Fig. 6M–P; **p < 0.01, ***p < 0.001). Western blot analysis revealed reduced Pan-Kla and H3K9 lactylation levels in the CPS1 knockdown group (Fig. 6Q), suggesting that the metabolic changes caused by CPS1 loss affect histone lactylation and thereby link cellular metabolism with epigenetic modification. In summary, these results suggest that CPS1 plays a crucial role in regulating metabolic activity and epigenetic modifications in breast cancer cells, highlighting its potential as a therapeutic target.
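Knockdown efficiency from qPCR is conventionally computed with the 2^(−ΔΔCt) method, although the section does not state the quantification method or the reference gene. The sketch assumes replicate Ct values for CPS1 and a hypothetical housekeeping gene in knockdown and control (shNC) cells; `fold_change_ddct` and all Ct values are illustrative, not data from this study.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_kd: Sequence[float], ref_kd: Sequence[float],
                     target_ctrl: Sequence[float], ref_ctrl: Sequence[float]) -> float:
    """Relative expression by the 2^(-ddCt) method.

    ddCt = (Ct_target - Ct_ref)_knockdown - (Ct_target - Ct_ref)_control,
    computed from the mean Ct of replicate wells.
    """
    d_kd = mean(target_kd) - mean(ref_kd)
    d_ctrl = mean(target_ctrl) - mean(ref_ctrl)
    return 2.0 ** -(d_kd - d_ctrl)


# Illustrative Ct values only (CPS1 vs. a hypothetical reference gene).
fc = fold_change_ddct([26.1, 25.9, 26.0], [18.0, 18.1, 17.9],
                      [24.0, 24.1, 23.9], [18.0, 17.9, 18.1])
print(f"relative CPS1 mRNA: {fc:.2f}; knockdown efficiency: {1 - fc:.0%}")
```

A two-cycle increase in normalized Ct corresponds to a fourfold drop in transcript, i.e. about 75% knockdown; the method assumes near-100% amplification efficiency for both primer pairs.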

Fig. 7

CPS1 knockdown enhances anti-tumor immunity and synergizes with anti-PD1 therapy in a mouse breast cancer model. (A) Schematic of the experimental design. C57BL/6J mice were subcutaneously injected with E0771 cells (shNC or shCPS1, 5 × 10⁵ cells). Anti-PD1 (200 μg/mouse) or control treatments were administered intraperitoneally on days 7, 12, and 16. Tumor growth and immune responses were assessed on day 21. (B) Representative images of excised tumors from the four treatment groups: shNC, anti-PD1, shCPS1, and shCPS1 + anti-PD1. Tumor size is visibly reduced in the shCPS1 + anti-PD1 group compared to the other groups. (C) Tumor growth curves showing a significant reduction in tumor volume in the shCPS1 + anti-PD1 group compared to single treatments or controls (p < 0.0001). (D) Kaplan–Meier survival curve illustrating improved survival in the shCPS1 + anti-PD1 group compared to the other groups (p < 0.0001). (E) Tumor weight at the endpoint. The shCPS1 + anti-PD1 group exhibits the lowest tumor weight (p < 0.0001). (F–H) Flow cytometry analysis of tumor-infiltrating immune cells: (F) Percentage of CD4+ T cells among CD45+ immune cells. (G) Percentage of CD8+ T cells among CD45+ immune cells. (H) Percentage of Ki67+ proliferating CD8+ T cells among tumor-infiltrating CD8+ T cells. The shCPS1 + anti-PD1 group shows significantly enhanced infiltration and activation of CD8+ T cells (p < 0.0001). (I, J) Flow cytometry analysis of IFN-γ+ CD8+ T cells in tumors. Representative plots (I) and quantification (J) demonstrate increased IFN-γ production in CD8+ T cells from the shCPS1 + anti-PD1 group (p < 0.0001). (K, L) Flow cytometry analysis of GZMB+ CD8+ T cells in tumors. Representative plots (K) and quantification (L) show significantly increased cytotoxic CD8+ T cells in the shCPS1 + anti-PD1 group (p < 0.0001).

CPS1 knockdown enhances T-cell anti-tumor immunity and synergizes with anti-PD1 therapy in a mouse breast cancer model

To assess the effects of CPS1 knockdown on tumor progression and anti-tumor immunity, we established a subcutaneous breast cancer model in C57BL/6J mice using E0771 cells with CPS1 knockdown (sh-CPS1) and treated the mice with anti-PD1 (Fig. 7A). CPS1 knockdown significantly inhibited tumor growth, and combination with anti-PD1 treatment further enhanced this anti-tumor effect, reducing tumor volume and weight (Fig. 7B, C, E). This suggests a potential synergy between CPS1 knockdown and immune checkpoint inhibition. Kaplan–Meier survival analysis revealed that CPS1 knockdown prolonged survival, with the most pronounced benefit in the sh-CPS1 + anti-PD1 group (Fig. 7D). Flow cytometry showed an increased presence of CD3⁺ and CD8⁺ T cells in the sh-CPS1 and sh-CPS1 + anti-PD1 groups, suggesting that CPS1 knockdown promotes immune cell infiltration into the tumor microenvironment and a more active anti-tumor immune response. The combination of CPS1 knockdown and anti-PD1 treatment appeared to enhance this effect further, suggesting a mechanism through which CPS1 suppression may improve the efficacy of immune checkpoint blockade. Additionally, Ki67 expression in CD8⁺ T cells was increased in the sh-CPS1 group and further elevated with anti-PD1 treatment, indicating enhanced proliferation of tumor-reactive T cells (Fig. 7H). Functional analysis of CD8⁺ T cells revealed a higher frequency of IFN-γ⁺ cells in the sh-CPS1 and sh-CPS1 + anti-PD1 groups, suggesting improved effector function (Fig. 7I, J). However, the proportion of GZMB-producing CD8⁺ T cells did not differ significantly between groups (Fig. 7K, L).
Overall, these findings indicate that CPS1 knockdown strengthens anti-tumor immunity by promoting CD8⁺ T cell infiltration and function, thereby enhancing the therapeutic efficacy of anti-PD1 immunotherapy in breast cancer.
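Fig. 7C reports tumor volumes, but this section does not give the formula used. A widely used caliper-based estimate is the modified ellipsoid V = L × W² / 2, with W the shorter axis. The sketch assumes that formula and hypothetical caliper readings to build a per-group mean growth curve; both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def tumor_volume_mm3(length_mm: float, width_mm: float) -> float:
    """Modified ellipsoid estimate V = L * W^2 / 2, with W the shorter caliper axis."""
    long_axis, short_axis = max(length_mm, width_mm), min(length_mm, width_mm)
    return long_axis * short_axis ** 2 / 2.0


def mean_growth_curve(measurements: Dict[int, List[Tuple[float, float]]]) -> Dict[int, float]:
    """Mean tumor volume per measurement day from (length, width) caliper pairs."""
    return {
        day: sum(tumor_volume_mm3(length, width) for length, width in mice) / len(mice)
        for day, mice in sorted(measurements.items())
    }


# Hypothetical caliper readings (mm) for one group on two days.
print(mean_growth_curve({7: [(4.0, 2.0), (6.0, 2.0)], 21: [(10.0, 6.0)]}))
# {7: 10.0, 21: 180.0}
```

Taking the larger reading as length makes the result independent of the order in which the two axes were recorded.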

Discussion

Breast cancer (BC), characterized by pronounced molecular and phenotypic heterogeneity, presents considerable obstacles in prognostic stratification and the formulation of tailored therapeutic approaches31. Empirical evidence underscores the pivotal influence of clinicopathological variables, including histological differentiation, tumor staging, patient age, and disease dissemination, on survival outcomes and therapeutic efficacy. Concurrently, molecular biomarkers have gained prominence as indispensable components in risk stratification and treatment optimization32,33. Advances in elucidating the molecular underpinnings of BC have catalyzed the identification of innovative prognostic indicators and therapeutic targets, thereby accelerating the evolution of precision oncology. Furthermore, the integration of machine learning (ML) methodologies has significantly augmented the analytical capacity for processing high-dimensional omics data, facilitating evidence-based clinical interventions34. Nonetheless, the selection of ML algorithms that ensure robust model performance and prognostic reliability remains a pressing issue in translational research.

This study integrates single-cell transcriptomic analysis and machine learning to systematically investigate the prognostic role of mitochondria-related genes (MRGs) in TCGA-BRCA and develops an MRG-based scoring system, MitoScore, from mitochondrial gene expression patterns. Our findings indicate a strong association between mitochondrial dysfunction and BRCA progression. Notably, malignant cells had significantly higher mitochondrial gene scores than other cell types, consistent with a link between mitochondrial activity and tumor aggressiveness. By integrating single-cell RNA sequencing data with multi-cohort clinical datasets, we identified a set of key MRGs and constructed a prognostic prediction model that performed robustly across independent cohorts, stratifying BRCA patients into high- and low-risk groups and offering a valuable tool for risk assessment and personalized therapy. Additionally, enrichment analysis showed that tumors with high MRG scores were primarily associated with proliferation-related pathways, such as DNA replication and the cell cycle, which may partly account for their worse prognosis.

Beyond its prognostic utility, this study examines the influence of mitochondria-associated genes on the tumor immune landscape. The immune milieu of BRCA is a highly complex and dynamic network with a decisive impact on oncogenesis, metastatic dissemination, and therapeutic efficacy35. The interaction between tumor cells and the immune system is further shaped by genomic and epigenomic changes within the tumor cells themselves, which regulate the expression of tumor-specific antigens and various immunomodulatory factors and can thereby either enhance or inhibit tumor immune surveillance. A comprehensive understanding of the BRCA immune microenvironment is therefore crucial for advancing precision immunotherapies, including immune checkpoint inhibitors and chimeric antigen receptor (CAR) T-cell therapies, both of which have shown promise in reprogramming the immune system to target and eradicate cancer cells36,37. Our findings show that high-risk patients have a markedly immunosuppressive tumor milieu, characterized by reduced infiltration of cytotoxic CD8⁺ T lymphocytes and elevated levels of immune inhibitory molecules such as CD80 and PDCD1LG238,39. Moreover, the enhanced expression of HLA-associated antigen-presentation genes in high-risk patients points to a potential mechanism of adaptive immune escape. Notably, the MRG score not only serves as a robust prognostic indicator for BRCA survival but also stratifies patients within immunotherapy cohorts, providing clinicians with a refined tool for identifying the individuals most likely to benefit from immunotherapy.

Furthermore, we investigated the function and mechanism of CPS1 (carbamoyl phosphate synthetase 1), a key mitochondria-related gene in breast cancer. Previous studies have identified CPS1 as a key metabolic regulator that is primarily expressed in the liver and kidneys40. This enzyme plays a crucial role in the urea cycle, catalyzing the conversion of ammonia (NH₃) and bicarbonate into carbamoyl phosphate and thereby contributing to ammonia detoxification through urea synthesis. Altered CPS1 expression has been detected in several cancers, such as hepatocellular carcinoma and colorectal cancer, suggesting its potential involvement in metabolic reprogramming during tumor progression41,42. Our experiments showed that CPS1 expression was significantly elevated in tumor tissues and correlated with poorer overall survival. However, the conclusion that CPS1 “drives glycolysis” lacks direct evidence, particularly because downstream targets such as HIF-1α have not yet been verified, and this inference requires further experimental validation.

In functional experiments, CPS1 knockdown significantly suppressed glycolytic activity and reduced histone lactylation, revealing a potential link between mitochondrial metabolism and epigenetic regulation. However, this mechanism requires further confirmation, as does the role of the MIF-CXCR4 axis in immune evasion. In vivo, CPS1 knockdown not only enhanced anti-tumor immune responses but also synergized with anti-PD1 therapy, significantly reducing tumor growth and extending survival in a murine breast cancer model. These preliminary results provide promising evidence for CPS1 as a novel therapeutic target to enhance the efficacy of immunotherapy. However, because the current study has not yet definitively demonstrated the specific role of CPS1 in the immune microenvironment, we plan to further investigate the causal relationships among CPS1, immune evasion, and the tumor microenvironment in future research.

In conclusion, while this study provides new insights into the role of CPS1 in breast cancer, many questions remain unresolved. Future work will focus on the specific functions of CPS1 in metabolic reprogramming and immune regulation, with further foundational experiments to validate its potential as a therapeutic target for immunotherapy.

Materials and methods

Data download and processing

Data were obtained from the UCSC Xena TCGA-BRCA dataset (https://xena.ucsc.edu/), comprising 1231 samples (1118 breast cancer and 113 normal tissue samples). RNA-seq data and corresponding clinical information were collected, along with additional BRCA datasets from GEO (GSE86347, GSE21653, GSE58812, GSE123845, GSE42568, and GSE10886). Expression data were normalized as fragments per kilobase of transcript per million mapped reads (FPKM) and log2-transformed. A total of 1136 mitochondrial genes were retrieved from the MitoCarta3.0 database. Batch effects across GSE86347, GSE21653, GSE58812, GSE123845, GSE42568, and GSE10886 were corrected using the R package ‘sva’, which estimates and removes systematic technical differences between experimental batches so that samples can be analyzed consistently across datasets. Principal component analysis (PCA) plots before and after correction confirmed the effectiveness of the adjustment; detailed results are presented in Supplementary Fig. 4.
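The ‘sva’ package provides ComBat, which fits per-gene location and scale batch parameters and shrinks them with empirical Bayes. The sketch below shows only the simplest ingredient of that idea: per-gene mean-centering within each batch, followed by restoring the overall gene mean. It is not ComBat (no scale adjustment, no shrinkage, no protection of biological covariates), and the function and batch names are hypothetical.

```python
import numpy as np


def center_by_batch(expr: np.ndarray, batches: list) -> np.ndarray:
    """Location-only batch adjustment for a genes x samples log-expression matrix.

    Each gene is mean-centered within every batch, then the gene's overall mean
    is added back. Unlike ComBat, there is no scale adjustment, no empirical
    Bayes shrinkage, and no protection of biological covariates.
    """
    x = np.asarray(expr, dtype=float)
    labels = np.asarray(batches)
    if labels.shape[0] != x.shape[1]:
        raise ValueError("one batch label per sample (column) is required")
    overall_mean = x.mean(axis=1, keepdims=True)
    out = x.copy()
    for b in np.unique(labels):
        cols = labels == b
        out[:, cols] -= x[:, cols].mean(axis=1, keepdims=True)
    return out + overall_mean


# Two hypothetical datasets with a constant offset of 10 log2 units.
expr = np.array([[1.0, 2.0, 11.0, 12.0]])
print(center_by_batch(expr, ["GSE_a", "GSE_a", "GSE_b", "GSE_b"]))  # [[6. 7. 6. 7.]]
```

If biology is confounded with dataset (for example, one dataset contains only tumors), centering removes real signal along with the batch effect, which is why ComBat accepts a model matrix of covariates to protect.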

Processing of single-cell transcriptomic profiles

Single-cell RNA sequencing (scRNA-seq) data were processed using the Seurat v4.3.2 pipeline, beginning with the removal of low-quality cells, defined by thresholds on the number of detected genes, unique molecular identifier (UMI) counts, and the proportion of mitochondrial gene expression. Genes expressed in too few cells were also excluded to ensure data robustness. The remaining expression matrices were normalized and scaled to mitigate technical variability arising from differences in sequencing depth and transcriptional activity across cells. Mitochondrially encoded transcripts were specifically annotated because of their roles in metabolic regulation and in stress response pathways relevant to tumor progression and intratumoral heterogeneity.
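The QC and normalization steps above can be sketched with NumPy as follows. The thresholds (200 to 6000 detected genes, under 20% mitochondrial reads, genes detected in at least 3 cells) are hypothetical placeholders because the text does not report the actual cut-offs, and the normalization mirrors Seurat-style LogNormalize (counts scaled to 10,000 per cell, then natural log1p) rather than reproducing Seurat itself.

```python
import numpy as np

# Hypothetical QC thresholds: the actual cut-offs are not reported in the text.
MIN_GENES, MAX_GENES, MAX_PCT_MT = 200, 6000, 20.0
MIN_CELLS_PER_GENE = 3

def qc_filter(counts, gene_names):
    """counts: genes x cells UMI matrix. Returns filtered counts and kept gene names."""
    is_mt = np.array([g.upper().startswith("MT-") for g in gene_names])
    n_genes = (counts > 0).sum(axis=0)            # detected genes per cell
    n_umi = counts.sum(axis=0)                    # UMIs per cell
    pct_mt = 100.0 * counts[is_mt].sum(axis=0) / np.maximum(n_umi, 1)
    keep_cells = (n_genes >= MIN_GENES) & (n_genes <= MAX_GENES) & (pct_mt < MAX_PCT_MT)
    counts = counts[:, keep_cells]
    keep_genes = (counts > 0).sum(axis=1) >= MIN_CELLS_PER_GENE
    return counts[keep_genes], [g for g, k in zip(gene_names, keep_genes) if k]

def log_normalize(counts, scale_factor=1e4):
    """Library-size normalization followed by natural log1p (LogNormalize-style)."""
    return np.log1p(counts / counts.sum(axis=0, keepdims=True) * scale_factor)

def scale_data(norm, max_value=10.0):
    """Per-gene z-score across cells, with an upper clip at max_value."""
    mu = norm.mean(axis=1, keepdims=True)
    sd = norm.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0
    return np.clip((norm - mu) / sd, -np.inf, max_value)

# Toy data: 1000 genes x 50 cells; the first five cells carry inflated MT reads.
rng = np.random.default_rng(1)
genes = [f"MT-G{i}" for i in range(13)] + [f"GENE{i}" for i in range(987)]
counts = rng.poisson(1.0, size=(1000, 50))
counts[:13, :5] += 500
filtered, kept_genes = qc_filter(counts, genes)
norm = log_normalize(filtered)
scaled = scale_data(norm)
```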

To explore and visualize cellular phenotypic diversity, Uniform Manifold Approximation and Projection (UMAP) was used as a nonlinear dimensionality reduction method that preserves the intrinsic structure of the transcriptomic landscape. Cells were clustered using the K-Means algorithm to resolve discrete cellular subsets. These clusters were then annotated based on canonical marker genes, identifying populations that included B lymphocytes, epithelial tumor cells, and other cell types within the tumor microenvironment.
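A minimal K-Means sketch (k-means++ seeding followed by Lloyd iterations) on low-dimensional cell coordinates is shown below. The choice of k, the input embedding, and the toy data are assumptions for illustration and do not reflect the study's settings.

```python
import numpy as np

def kmeans_pp_init(x, k, rng):
    """k-means++ seeding: each new center is drawn with probability proportional to D^2."""
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.array(centers, dtype=float)

def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's k-means on an (n_cells x n_dims) embedding, e.g. PCA or UMAP coordinates."""
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(x, k, rng)
    for _ in range(n_iter):
        # Assign each cell to its nearest centroid (squared Euclidean distance).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy embedding: three well-separated groups of 30 "cells".
rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in true_centers])
labels, centers = kmeans(points, k=3)
```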

Model construction

Differential expression analysis of mitochondria-related genes between non-cancerous and tumor tissue samples was performed using the “limma” package in R. Genes with an adjusted false discovery rate (FDR) below 0.05 and an absolute log2 fold change (|log2FC|) greater than 1.46 were selected to ensure both statistical and biological relevance. Subsequently, 101 combinations of 10 machine learning algorithms, including stepwise Cox, Lasso, Ridge, partial least squares regression for Cox (plsRcox), CoxBoost, random survival forests (RSF), generalized boosted regression modeling (GBM), elastic net (Enet), supervised principal components (SuperPC), and survival support vector machines (survival-SVM), were applied with tenfold cross-validation. The principal aim of this analysis was to identify the most robust prognostic mitochondria-related genes (MRGs), evaluated by the concordance index (C-index) of the resulting survival models. The C-index measures the discriminatory power of a predictor, with higher values indicating greater agreement between predicted risk and observed survival outcomes across patients.
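The gene-selection filter can be illustrated with the thresholds stated above. The sketch assumes that per-gene P-values and log2 fold changes have already been computed (for example, by limma's moderated t-test, which is not reimplemented here) and applies the Benjamini-Hochberg procedure, the default adjustment in limma's `topTable`, before filtering on FDR and |log2FC|. Gene names and values are made up.

```python
import numpy as np

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P-values (FDR)."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest P-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.minimum(ranked, 1.0)
    return adj

def select_degs(genes, log2fc, pvals, fdr_cut=0.05, lfc_cut=1.46):
    """Keep genes with FDR < fdr_cut and |log2FC| > lfc_cut."""
    fdr = benjamini_hochberg(pvals)
    keep = (fdr < fdr_cut) & (np.abs(np.asarray(log2fc, float)) > lfc_cut)
    return [g for g, k in zip(genes, keep) if k]

genes = ["G1", "G2", "G3", "G4"]
print(select_degs(genes, [2.0, -1.5, 0.8, -3.0], [0.01, 0.04, 0.03, 0.005]))
# ['G1', 'G2', 'G4']
```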

Multi-cohort survival analysis and model performance evaluation

Multi-cohort survival analysis was conducted to evaluate the prognostic impact of mitochondria-related genes across multiple TCGA and GEO datasets (e.g., GSE86347, GSE21653) and to assess the robustness and generalizability of the results. Data from each cohort were preprocessed and normalized to harmonize data structures across cohorts.

For each cohort, a Cox proportional hazards model was used to assess the association between the expression of each gene and survival, and hazard ratios (HRs) were calculated to quantify the strength of this association. Genes with higher HRs contributed more to risk, and their prognostic relevance was visualized on color-coded scales to facilitate comparison. Patients were then stratified into high-risk and low-risk subgroups according to the expression signature of the selected mitochondria-related genes, enabling comparison of survival outcomes between the two groups.
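The stratification step can be sketched as a linear risk score (a weighted sum of gene expression with Cox-style coefficients), a median cut-off, and a Kaplan-Meier survival estimate for each group. The coefficients, the median split, and the example data are assumptions; the text does not specify the cut-off used to define the high- and low-risk groups.

```python
import numpy as np

def risk_scores(expr, coefs):
    """Linear predictor sum_j beta_j * x_ij for a samples x genes matrix."""
    return np.asarray(expr, float) @ np.asarray(coefs, float)

def median_split(scores):
    """Label samples above the median score as high risk (assumed cut-off)."""
    scores = np.asarray(scores, float)
    return np.where(scores > np.median(scores), "high", "low")

def kaplan_meier(time, event):
    """Return distinct event times and the Kaplan-Meier survival estimate S(t)."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    uniq = np.unique(time[event])
    surv, s = [], 1.0
    for t in uniq:
        at_risk = np.sum(time >= t)          # patients still under observation at t
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return uniq, np.array(surv)

# Hypothetical two-gene signature with made-up coefficients.
scores = risk_scores([[1.0, 2.0], [0.5, 0.0]], [0.3, -0.1])
groups = median_split([0.1, 0.5, 0.9, 1.3])
times, surv = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])
```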

To assess predictive accuracy across cohorts, receiver operating characteristic (ROC) curves and area under the curve (AUC) values were used. These analyses highlighted the prognostic significance of mitochondria-related genes in breast cancer and supported their potential application in personalized treatment and risk assessment. To compare the performance of mitochondria-related genes across datasets, the analysis included GSE86347, GSE21653, GSE58812, GSE123845, GSE42568, GSE10886, and TCGA. For each dataset, risk was assessed using a combination of Cox proportional hazards and random survival forest (RSF) models.
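A simplified fixed-horizon ROC AUC is sketched below: patients with an event by the horizon are cases, patients still under observation beyond it are controls, and patients censored earlier are dropped. This naive handling of censoring is an assumption for illustration; time-dependent ROC methods that weight censored observations are better suited to survival data.

```python
import numpy as np

def auc_rank(scores, labels):
    """AUC as the probability that a random case outranks a random control (ties count 1/2)."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def horizon_labels(time, event, horizon):
    """Binary case/control labels at a fixed horizon; early-censored patients are dropped."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    case = event & (time <= horizon)
    control = time > horizon
    usable = case | control
    return case[usable], usable

# Usage: labels at a 5-unit horizon, then AUC of a risk score on usable patients.
y, usable = horizon_labels([1, 2, 3, 6, 7], [1, 0, 1, 0, 1], horizon=5)
risk = np.array([0.9, 0.4, 0.7, 0.2, 0.3])
auc = auc_rank(risk[usable], y)
```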

The predictive performance of each model was assessed using the C-index, with higher values indicating better predictive accuracy. This comparison allowed a comprehensive evaluation of model robustness and consistency across datasets, supporting the value of mitochondria-related gene signatures for predicting breast cancer prognosis in diverse patient cohorts.
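For reference, a minimal implementation of Harrell's C-index is shown below. It assumes that a higher model output means higher risk, treats pairs with tied survival times as non-comparable, and counts tied risk scores as half-concordant; it is an illustrative sketch rather than the implementation used in the study.

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's concordance index: higher risk should correspond to shorter survival."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    risk = np.asarray(risk, float)
    concordant = tied = comparable = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is usable only if the shorter time is an observed event
        for j in range(n):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable

# Perfect ranking: the earliest event has the highest risk score.
c = harrell_c([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1])
```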

Potential mechanisms and pathway analysis

To explore potential mechanisms and pathways, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.kegg.jp/)43 enrichment analyses were performed. Mitochondria-related genes (MRGs) were converted to Entrez IDs, and terms with an adjusted P-value < 0.05 were considered significantly enriched. Potential mechanisms were further explored using Gene Set Variation Analysis (GSVA). The cancer immunity cycle was examined using established approaches, and pathways associated with immunotherapy response were identified.
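GO and KEGG over-representation analyses are commonly based on a one-sided hypergeometric test. The sketch below computes that P-value for a single term using only the Python standard library; the gene universe, term membership, and query genes are toy assumptions, and P-values across all tested terms would still require multiple-testing adjustment (for example, Benjamini-Hochberg).

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K term genes, n query genes)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def enrich(query, term_genes, universe):
    """One-sided over-representation P-value and overlap size for a single term."""
    universe = set(universe)
    query = set(query) & universe
    term = set(term_genes) & universe
    overlap = len(query & term)
    return hypergeom_sf(overlap, len(universe), len(term), len(query)), overlap

# Toy example: 10-gene universe, a 3-gene term, and 2 query genes both in the term.
p, k = enrich(["a", "b"], ["a", "b", "c"], list("abcdefghij"))
```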

In-depth characterization of tumor-infiltrating immune cell landscapes

Immune cell infiltration in the different risk groups was analyzed using RNA sequencing data from breast cancer samples. The ssGSEA algorithm was used to estimate the abundance of infiltrating immune cells and the activity of immune-related pathways in tumor samples. The TIMER2.0 database, which integrates results from multiple algorithms, provided summarized estimates of immune cell infiltration in TCGA. In addition, the R package “IOBR”, which implements various immune infiltration algorithms, was used to estimate the abundance of immune cells in tumor samples.
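To illustrate how ssGSEA-type scores arise, the sketch below computes a simplified single-sample enrichment score for one gene set: genes are ranked by expression within the sample, and the score sums the difference between a rank-weighted cumulative distribution of gene-set members and an unweighted cumulative distribution of non-members. The weight exponent alpha = 0.25 and the toy data are assumptions, and package implementations may additionally normalize scores across samples.

```python
import numpy as np

def ssgsea_score(expr, gene_names, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample (1-D expression vector)."""
    expr = np.asarray(expr, float)
    order = np.argsort(-expr)                              # highest expression first
    ranked_genes = np.asarray(gene_names)[order]
    n = len(expr)
    rank_weight = np.arange(n, 0, -1, dtype=float) ** alpha  # top gene gets n**alpha
    in_set = np.isin(ranked_genes, list(gene_set))
    if not in_set.any() or in_set.all():
        raise ValueError("gene set must be a proper, non-empty subset of the genes")
    p_in = np.cumsum(np.where(in_set, rank_weight, 0.0)) / rank_weight[in_set].sum()
    p_out = np.cumsum(~in_set) / (~in_set).sum()
    return float(np.sum(p_in - p_out))

# Toy sample: G0 is the most highly expressed gene, G19 the least.
genes = [f"G{i}" for i in range(20)]
expr = np.arange(20, 0, -1, dtype=float)
top_score = ssgsea_score(expr, genes, {"G0", "G1", "G2"})
bottom_score = ssgsea_score(expr, genes, {"G17", "G18", "G19"})
```

A set concentrated among the most highly expressed genes yields a positive score, whereas one concentrated at the bottom of the ranking yields a negative score.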

Cell culture

The E0771 and BT549 cell lines were obtained from the American Type Culture Collection (ATCC) and cultured in RPMI 1640 medium supplemented with 10% fetal bovine serum (FBS, Gibco, Grand Island, NY, USA) and 1% penicillin/streptomycin solution. The cells were maintained at 37 °C with 5% CO₂.

Establishment of stable cell lines via lentiviral transduction

E0771 and BT549 breast cancer cells were seeded in 24-well culture plates at a density of 1 × 10⁵ cells per well to ensure suitable confluency for viral infection and cultured for 12–20 h. The lentivirus was then diluted in complete medium containing 5 µg/mL polybrene, and 0.5 mL of the diluted lentivirus solution was added to each well. After 24 h of transduction, the virus-containing medium was replaced with 0.5 mL of complete medium containing 1 µg/mL puromycin to select stably transduced cells. Once the cells reached confluency, a portion was harvested for qRT-PCR and Western blot analyses to evaluate knockdown efficiency. The CPS1 knockdown target sequences used in this study were as follows: CPS1 (human): GCCAACCCTATTATTGGGAAT; CPS1 (mouse): TGTCATCCGCCTGCTAGTTAA.

Quantitative real-time PCR

Total RNA was extracted using TRIzol reagent (Invitrogen, USA) according to the manufacturer’s protocol. RNA concentration and purity were assessed, with an A260/A280 ratio of 1.8–2.0 required. cDNA was synthesized using the PrimeScript™ RT reagent Kit (Takara, Japan) in a 20 µL reaction containing 1 µg of RNA, random primers or Oligo(dT), reverse transcriptase, and reaction buffer, under the following conditions: 25 °C for 5 min, 50 °C for 30 min, and 85 °C for 5 min. qPCR was performed in 20 µL reactions containing 10 µL of TB Green® Premix Ex Taq™ II (Takara, Japan), 1–2 µL of cDNA, 0.4 µL each of forward and reverse primers (10 µM), and nuclease-free water, on a Thermo Fisher QuantStudio™ 5 thermal cycler with an initial denaturation at 95 °C for 5 min followed by 40 cycles of 95 °C for 10 s and 60 °C for 30 s. Melting curve analysis was performed to confirm amplification specificity. Relative expression was calculated using the 2^(−ΔΔCT) method, and gene expression levels were compared between samples using statistical tests. The gene-specific primer sequences used in the study were as follows: CPS1 (human): Forward Sequence: GTGGTGCTGAACTCCAAGAGGA; Reverse Sequence: GAGTAGTCTCCCAAGGACAGGT. CPS1 (mouse): Forward Sequence: CATGGAACATCCAGCCGAATTGG; Reverse Sequence: GATGGCACATCCTCAGAGCCTT.
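The 2^(−ΔΔCT) calculation can be written out explicitly. The sketch assumes a single reference gene (not named in the text) and uses the mean ΔCT of the control group as the calibrator; the Ct values in the usage example are invented.

```python
import numpy as np

def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change 2^-(dCt_sample - mean dCt_control), with dCt = Ct_target - Ct_reference."""
    delta_ct = np.asarray(ct_target, float) - np.asarray(ct_ref, float)
    delta_ct_ctrl = np.mean(np.asarray(ct_target_ctrl, float) - np.asarray(ct_ref_ctrl, float))
    return 2.0 ** -(delta_ct - delta_ct_ctrl)

# Invented values: in the knockdown sample, CPS1 amplifies 2 cycles later relative to the
# reference gene than in the control, i.e. about 25% of control expression.
fold = relative_expression([26.0], [18.0], [24.0], [18.0])
```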

Colony formation assay

Cells in the logarithmic phase of growth were trypsinized to obtain a single-cell suspension and subsequently seeded into 6-well plates at a density of 800 cells per well. The plates were maintained at 37 °C in a humidified incubator with 5% CO₂ for 10 days, with medium changes every three days to support optimal cell viability and proliferation. At the end of the incubation period, the culture medium was removed, and cells were gently washed with phosphate-buffered saline (PBS). Colonies were fixed using 4% paraformaldehyde for 20 min at room temperature, followed by staining with 0.1% crystal violet for an additional 20 min. Unbound dye was washed off with water, and the plates were air-dried to allow clear visualization and analysis of colony formation.

CCK-8 assay

Cells were seeded into 96-well plates at a density of 5000 cells per well to ensure consistent cell density across wells and incubated at 37 °C with 5% CO₂. At 0 h (defined as the time at which cell adherence was confirmed under the microscope), 24 h, 48 h, and 72 h, 10 µL of CCK-8 solution was added to the designated wells, and the plate was returned to the incubator for 4 h. Absorbance was then measured at 450 nm using a microplate reader, and the data were recorded for analysis.
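One way the CCK-8 readings could be summarized is sketched below. The blank wells (medium plus CCK-8 without cells) and the normalization to the 0 h reading are assumptions, because the text does not describe how absorbance values were processed.

```python
import numpy as np

def cck8_growth(od, od_blank):
    """Blank-corrected OD450 normalized to each well's 0 h reading.

    od: wells x time points (0, 24, 48, 72 h); od_blank: mean OD of assumed blank wells.
    Returns the mean and sample SD of relative growth at each time point.
    """
    corrected = np.asarray(od, float) - od_blank
    rel = corrected / corrected[:, [0]]
    return rel.mean(axis=0), rel.std(axis=0, ddof=1)

# Invented replicate wells.
mean_rel, sd_rel = cck8_growth([[0.3, 0.5, 0.9, 1.5], [0.3, 0.5, 0.9, 1.5]], od_blank=0.1)
```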

Subcutaneous syngeneic tumor model and administration of anti-PD-1 immunotherapy

To establish an immunocompetent breast cancer model, 6–8-week-old female C57BL/6 mice (weight range: 18–22 g) were subcutaneously injected in the right flank with 5 × 10⁵ E0771 cells suspended in 100 µL of PBS. For all invasive procedures (cell inoculation and euthanasia), mice were anesthetized with 3% isoflurane delivered via a precision vaporizer in 100% oxygen at a flow rate of 1 L/min. Seven days after inoculation, mice were randomly assigned to treatment or control groups. The treatment group received intraperitoneal injections of anti-PD-1 monoclonal antibody (10 mg/kg) on days 7, 11, and 15. Tumor dimensions were measured every two days with calipers, and tumor volume was calculated as (length × width²)/2. Mice were humanely euthanized by CO₂ asphyxiation (displacement rate: 30% of chamber volume/min) followed by cervical dislocation when tumor volume reached or exceeded 1000 mm³ (1 cm³). A separate, independent cohort of tumor-bearing mice was used for long-term survival observation to generate survival curves. All experiments were conducted in strict accordance with institutional ethical guidelines for animal research.
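The caliper formula and humane endpoint above translate directly into code. Treating width as the shorter perpendicular diameter is a common convention assumed here, and the measurements are invented.

```python
ENDPOINT_MM3 = 1000.0  # humane endpoint stated in the protocol (1 cm^3)

def tumor_volume(length_mm: float, width_mm: float) -> float:
    """Caliper estimate V = (length x width^2) / 2 in mm^3; width assumed to be the shorter axis."""
    return length_mm * width_mm ** 2 / 2.0

def reached_endpoint(length_mm: float, width_mm: float) -> bool:
    """True when the estimated volume reaches or exceeds the endpoint."""
    return tumor_volume(length_mm, width_mm) >= ENDPOINT_MM3

v = tumor_volume(12.0, 10.0)  # 12 x 10^2 / 2 = 600 mm^3
```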

Multiparameter flow cytometric analysis of tumor-infiltrating immune cells

To analyze immune cell subsets within the tumor microenvironment, subcutaneous tumors were harvested from tumor-bearing mice, mechanically dissociated, and enzymatically digested in a solution containing hyaluronidase, collagenase V, and DNase I at 37 °C for 2 h. The cell suspension was filtered through a 70 µm strainer, treated with red blood cell lysis buffer for 5 min, washed, and resuspended in pre-cooled PBS. For cytokine detection, single-cell suspensions were stimulated for 5 h at 37 °C in the presence of monensin (1 µg/mL), ionomycin (100 ng/mL), and PMA (100 ng/mL). Cells were stained in 96-well plates with the Zombie NIR™ Fixable Viability Kit (BioLegend), blocked with anti-CD16/32, and labeled with anti-mouse CD45, CD3, and CD8a antibodies (BioLegend). After fixation with 1% paraformaldehyde and membrane permeabilization with the Invitrogen permeabilization reagent (3165296), intracellular staining was performed with GZMB and IFN-γ antibodies. Finally, samples were filtered through 200-mesh strainers and analyzed by flow cytometry.

Western blot

Cells and tissues were lysed, and protein concentration was measured using the Pierce BCA Protein Assay Kit (Beyotime Biotechnology, P0012S). Proteins were separated by SDS-PAGE and transferred onto PVDF membranes (Millipore, USA). The membranes were blocked with 5% skimmed milk at room temperature for 2 h and incubated with primary antibodies overnight at 4 °C. After incubation with secondary antibodies for 1 h at room temperature, chemiluminescence was detected using the Tanon5200 system (China). Western blot bands were quantified using ImageJ software, and the antibodies used are listed in Supplementary Table 1.
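A common way to turn ImageJ band intensities into relative protein levels is to divide each target band by its loading-control band and then scale to the control lanes. The sketch below assumes this approach; the loading control and the intensity values are hypothetical, as the text does not specify them.

```python
import numpy as np

def normalized_band_intensity(target, loading, control_idx):
    """Target/loading-control ratio, expressed relative to the mean ratio of control lanes."""
    ratio = np.asarray(target, float) / np.asarray(loading, float)
    return ratio / ratio[control_idx].mean()

# Hypothetical densitometry values: lanes 0-1 are controls, lane 2 is a knockdown.
rel = normalized_band_intensity([1000, 1200, 300], [2000, 2000, 1500], control_idx=[0, 1])
```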

Assessment of cellular respiration and glycolytic activity using Seahorse XF Analyzer

Cellular metabolism was assessed using the Seahorse XF96 Analyzer (Agilent Technologies, USA) by measuring the oxygen consumption rate (OCR) for mitochondrial respiration and the extracellular acidification rate (ECAR) for glycolytic activity. Transduced cells (1 × 10⁴) were seeded into Seahorse XF 96-well plates, and assays were performed according to the manufacturer’s instructions. For the mitochondrial stress test, oligomycin, FCCP, and rotenone/antimycin A were injected sequentially to determine basal respiration, ATP production, and maximal respiratory capacity. For the glycolysis stress test, glucose, oligomycin, and 2-deoxyglucose were injected to determine glycolytic rate and capacity. Data were collected in real time, and baseline and stress-response parameters for both respiration and glycolysis were calculated to assess cellular metabolic activity.
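The stress-test parameters listed above are typically derived from the rate measurements between injections. The sketch below follows the commonly described Agilent report definitions (for example, non-mitochondrial OCR as the minimum rate after rotenone/antimycin A, and non-glycolytic acidification as the last rate before glucose injection); the number of measurements per phase and the example rates are assumptions.

```python
import numpy as np

def mito_stress_params(ocr, n_basal, n_oligo, n_fccp):
    """Mito Stress Test parameters from one well's OCR trace (rates in injection order)."""
    ocr = np.asarray(ocr, float)
    basal_seg = ocr[:n_basal]
    oligo_seg = ocr[n_basal:n_basal + n_oligo]
    fccp_seg = ocr[n_basal + n_oligo:n_basal + n_oligo + n_fccp]
    rot_seg = ocr[n_basal + n_oligo + n_fccp:]       # after rotenone/antimycin A
    non_mito = rot_seg.min()
    basal = basal_seg[-1] - non_mito
    maximal = fccp_seg.max() - non_mito
    return {
        "non_mitochondrial": non_mito,
        "basal": basal,
        "atp_production": basal_seg[-1] - oligo_seg.min(),
        "proton_leak": oligo_seg.min() - non_mito,
        "maximal": maximal,
        "spare_capacity": maximal - basal,
    }

def glyco_stress_params(ecar, n_pre, n_glucose, n_oligo):
    """Glycolysis Stress Test parameters from one well's ECAR trace (2-DG phase unused here)."""
    ecar = np.asarray(ecar, float)
    pre = ecar[:n_pre]
    glu = ecar[n_pre:n_pre + n_glucose]
    oli = ecar[n_pre + n_glucose:n_pre + n_glucose + n_oligo]
    non_glyco = pre[-1]
    glycolysis = glu.max() - non_glyco
    capacity = oli.max() - non_glyco
    return {"non_glycolytic_acidification": non_glyco, "glycolysis": glycolysis,
            "glycolytic_capacity": capacity, "glycolytic_reserve": capacity - glycolysis}

# Invented traces with three measurements per phase.
mito = mito_stress_params([100, 102, 101, 40, 38, 39, 180, 190, 185, 20, 18, 19], 3, 3, 3)
glyco = glyco_stress_params([10, 11, 10, 30, 32, 31, 45, 47, 46, 8, 7, 8], 3, 3, 3)
```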

Statistical analysis and data evaluation

All data are expressed as mean ± standard deviation (SD). Statistical analyses were conducted using GraphPad Prism 7.0 software (GraphPad Software, USA). For comparisons between two groups, an unpaired two-tailed t-test was employed. For comparisons involving multiple groups, one-way analysis of variance (ANOVA) was followed by Tukey’s post-hoc test for pairwise comparisons. In cases involving two independent variables, two-way ANOVA followed by Bonferroni’s post-hoc test was used. A P-value less than 0.05 was considered statistically significant.
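The tests described above correspond to standard SciPy functions, as sketched below for the two-group and one-way designs (`scipy.stats.tukey_hsd` requires SciPy 1.8 or later). This is an illustrative equivalent, not the GraphPad Prism workflow used in the study; the two-way ANOVA with Bonferroni correction is not shown, and the data are invented.

```python
import numpy as np
from scipy import stats

def mean_sd(x):
    """Mean and sample standard deviation (ddof=1), as reported in mean ± SD."""
    x = np.asarray(x, float)
    return x.mean(), x.std(ddof=1)

def two_group(a, b):
    """Unpaired two-tailed Student's t-test (equal variances assumed)."""
    return stats.ttest_ind(a, b, equal_var=True)

def multi_group(*groups):
    """One-way ANOVA followed by Tukey's HSD for all pairwise comparisons."""
    anova = stats.f_oneway(*groups)
    tukey = stats.tukey_hsd(*groups)
    return anova, tukey

a, b, c = [1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]
ttest = two_group(a, b)
anova, tukey = multi_group(a, b, c)
significant = ttest.pvalue < 0.05
```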