Introduction

Breast cancer has become the foremost threat to female health globally. With an estimated 300,590 new cases and 43,700 deaths in the United States in 2023, it still has the second highest mortality rate among all malignancies1,2. In particular, triple-negative breast cancer, which accounts for 10%–50% of all types of breast cancer, has a poor prognosis and a high recurrence rate3,4,5. Although significant breakthroughs have been made in recent years, the disease burden is still increasing, and further understanding of breast cancer is required.

Oxidative stress is an imbalance between the oxidant and antioxidant systems. Studies have recently discovered a relationship between oxidative stress and cancer. Reactive oxygen species (ROS) and reactive nitrogen species (RNS) contribute to oxidative stress. Most ROS and RNS are produced by the mitochondrial respiratory chain and can be decomposed by antioxidants such as superoxide dismutase (SOD), catalase, and glutathione peroxidase. Overproduction of ROS is believed to lead to DNA damage and tumorigenesis6. Recently, several studies have demonstrated the prognostic value of oxidative stress in breast cancer7,8,9. However, is oxidative stress always beneficial to the tumor? What is the impact of oxidative stress on the tumor microenvironment after tumor formation?

In the past, oxidative stress was recognized as a promoting factor in tumor progression and a cause of resistance to therapy10,11,12. Moreover, the persistent presence of ROS may alter metabolic pathways in tumors, leading to metastasis12,13. Research has shown that breast cancer type 1 susceptibility protein (BRCA1) can decrease ROS levels and protect breast cells from oxidative stress-induced apoptosis14. Does oxidative stress therefore promote the growth of breast cancer? Or, since oxidative stress can damage DNA and induce apoptosis, can it also damage and kill breast cancer cells? The mechanism underlying this contradiction remains unknown, and further research is urgently needed.

In this study, bulk gene expression data and single-cell data from breast cancer were used to analyze the prognostic effects of oxidative stress and its biological mechanism in the breast cancer microenvironment. This could provide additional treatment targets and support for clinical practice.

Materials and methods

Ethical statement

Ethical committee approval was not required because only data from public databases were used for the bioinformatics analyses in this study.

Data download and processing

Gene expression datasets for breast cancer were downloaded from TCGA (https://portal.gdc.cancer.gov/), METABRIC (https://www.cbioportal.org/), and the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The single-cell dataset was downloaded from GSE161529. Data on oxidative stress-related genes were downloaded from the Gene Ontology Database (https://www.geneontology.org/). The TCGA data were in counts format; therefore, log2 transformation was not required. Differential gene expression was analyzed using the limma package in R 4.3 software (https://www.r-project.org/), with the thresholds set at an absolute log2 fold change (|log2FC|) > 1 and an adjusted P-value < 0.05.
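The thresholding step described above can be illustrated with a minimal sketch. It is written in Python for illustration only; the study itself used the limma package in R, and the sketch covers only the final filter, not the model fitting. The `GeneStat` record, the `filter_degs` function, and the gene names and values are hypothetical.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log2fc: float   # log2 fold change, tumor vs. normal
    adj_p: float    # multiple-testing-adjusted P-value


def filter_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Keep genes with |log2FC| > lfc_cutoff and adjusted P < p_cutoff."""
    return [s.gene for s in stats
            if abs(s.log2fc) > lfc_cutoff and s.adj_p < p_cutoff]


# Hypothetical values, for illustration only (not results from the study).
EXAMPLE_STATS = [
    GeneStat("GENE_A", 2.3, 0.001),    # up-regulated and significant
    GeneStat("GENE_B", -1.8, 0.010),   # down-regulated and significant
    GeneStat("GENE_C", 0.4, 0.001),    # effect size below the cutoff
    GeneStat("GENE_D", 3.0, 0.200),    # adjusted P-value above the cutoff
]

print(filter_degs(EXAMPLE_STATS))
```

Taking the absolute value of the fold change keeps both up- and down-regulated genes, which matches the "absolute value" wording of the threshold.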

Construction of an oxidative stress risk score model and a prognosis model

Genes shared between the differentially expressed genes in the TCGA dataset ("counts" format) and the oxidative stress-related genes were identified, and least absolute shrinkage and selection operator (LASSO) and Cox regression analyses were performed to construct the oxidative stress risk score model. A prognosis model was constructed using univariate and multivariate Cox regression analyses. The prognostic analysis was conducted with the TCGA dataset and validated with a GEO dataset (GSE103091).

Processing of a single-cell atlas

The Seurat package15 was used to analyze the single-cell data, with the clustering resolution set at 0.5. The clustering results were visualized using uniform manifold approximation and projection (UMAP), while the CytoTRACE16 results were shown using t-distributed stochastic neighbor embedding (t-SNE). The FindAllMarkers function in the Seurat package was used to evaluate differences in gene expression between clusters. In addition, cell-type annotation was performed using the SingleR17 package with Human Primary Cell Atlas Data and Blueprint Encode Data. The AddModuleScore function of the Seurat package was used to calculate oxidative stress scores for the different cell types. Epithelial cells were excluded from this study.
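The idea behind a per-cell module score can be sketched as follows. This is a deliberately simplified Python illustration, not Seurat's implementation: Seurat's AddModuleScore selects its control genes from expression-matched bins, whereas this sketch takes a fixed control list as input. The function name `module_score`, the control gene names, and the expression values are hypothetical.

```python
from statistics import mean


def module_score(cell_expr, gene_set, control_genes):
    """Mean expression of gene_set minus mean expression of control_genes.

    cell_expr maps gene name -> normalized expression in one cell.
    Genes missing from cell_expr are treated as not expressed (0.0).
    """
    target = mean([cell_expr.get(g, 0.0) for g in gene_set])
    control = mean([cell_expr.get(g, 0.0) for g in control_genes])
    return target - control


# Hypothetical expression values for one cell (illustration only).
EXAMPLE_CELL = {"CAMP": 2.0, "ALAS2": 4.0, "CTRL_1": 1.0, "CTRL_2": 1.0}

score = module_score(EXAMPLE_CELL, ["CAMP", "ALAS2"], ["CTRL_1", "CTRL_2"])
print(score)
```

Subtracting a control-gene average means the score reflects relative enrichment of the gene set within a cell rather than overall expression level, which is why scores can be compared across cells of different sequencing depth.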

Functional enrichment analysis

Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were conducted on the genes differentially expressed between clusters in the single-cell data. The clusterProfiler package of R software was used to perform the GO and KEGG analyses.
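Over-representation analysis of this kind is commonly based on a hypergeometric test, which asks how likely it is to see at least the observed overlap between a gene list and a term by chance. The following Python sketch shows that calculation under this assumption; it is not the clusterProfiler code, and the function name `enrichment_p` and the example counts are hypothetical.

```python
from math import comb


def enrichment_p(n_universe, n_term, n_selected, n_overlap):
    """Upper-tail hypergeometric P-value, P(X >= n_overlap).

    n_universe: number of background genes
    n_term:     background genes annotated to the term
    n_selected: genes in the list being tested
    n_overlap:  genes in the list that are annotated to the term
    """
    upper = min(n_term, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the term,
# 4 genes selected, 2 of which are in the term.
print(enrichment_p(20, 5, 4, 2))
```

In practice the P-values of all tested terms would then be adjusted for multiple testing before terms are reported.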

Immune cell infiltration analysis

Immune cell infiltration analysis was performed using the immunedeconv package18. Three methods (“XCELL,” “EPIC,” and “MCPCOUNTER”) were used to analyze the proportions of the different cell types.

Differentiation, developmental, and pseudo-time analyses

The Monocle 3 package19,20,21 of R software was used to evaluate the developmental trajectory of breast cancer cells. The CytoTRACE package was used to evaluate the developmental differentiation of the different cell types.

Cellular communication

Cell-cell communication in breast cancer was analyzed using the iTALK package. Important communication events are displayed as links between cell types, and highly or differentially expressed genes are shown in the results.
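The general principle of ligand-receptor-based communication analysis can be sketched as a lookup: a link is drawn when one cell type highly expresses a ligand and another highly expresses its receptor. The Python sketch below illustrates only this principle and is not the iTALK implementation; the function `find_links`, the placeholder gene names, and the cell-type assignments are hypothetical.

```python
def find_links(top_genes, lr_pairs):
    """Return (sender, ligand, receiver, receptor) tuples.

    top_genes: cell type -> set of highly expressed genes
    lr_pairs:  iterable of (ligand, receptor) gene pairs
    """
    links = []
    for ligand, receptor in lr_pairs:
        for sender, sender_genes in top_genes.items():
            if ligand not in sender_genes:
                continue
            for receiver, receiver_genes in top_genes.items():
                if receptor in receiver_genes:
                    links.append((sender, ligand, receiver, receptor))
    return links


# Placeholder genes and assignments, for illustration only.
EXAMPLE_TOP_GENES = {
    "Endothelial": {"LIG_1", "REC_2"},
    "T cell": {"REC_1"},
    "CAF": {"LIG_2"},
}
EXAMPLE_LR_PAIRS = [("LIG_1", "REC_1"), ("LIG_2", "REC_2")]

for link in find_links(EXAMPLE_TOP_GENES, EXAMPLE_LR_PAIRS):
    print(link)
```

Because sender and receiver are iterated independently, the same cell type can appear on both ends of a link, which corresponds to autocrine signaling.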

Statistical analysis

Univariate and multivariate Cox regression analyses were conducted to build the prognostic model. The prognostic analysis was conducted with the TCGA dataset and validated with a GEO dataset (GSE103091). Factors with a P-value less than 0.05 in the univariate Cox regression analysis were entered into the multivariate Cox regression analysis, and factors with a P-value less than 0.05 in the multivariate Cox regression analysis were included in the prognosis model. The accuracy of the prognostic model was assessed using receiver operating characteristic (ROC) curve analysis of 1-, 3-, and 5-year survival. All analyses in this study were conducted using R software (R 4.1).
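The two-stage screening described above can be sketched as a simple filter. The Python sketch below shows only the selection logic and does not fit any Cox model: the multivariate fit is replaced by a stand-in function that returns made-up P-values. The names `select_factors`, `fake_multivariate_fit`, `factor_x`, and `factor_y`, and all P-values, are hypothetical.

```python
def select_factors(univariate_p, multivariate_p_for, alpha=0.05):
    """Two-stage screen: univariate P < alpha, then multivariate P < alpha.

    univariate_p:       factor -> P-value from its single-factor Cox model
    multivariate_p_for: callable taking the candidate list and returning
                        factor -> P-value from the joint Cox model
    """
    candidates = [f for f, p in univariate_p.items() if p < alpha]
    joint_p = multivariate_p_for(candidates)
    return [f for f in candidates if joint_p[f] < alpha]


# Hypothetical P-values, for illustration only (not results from the study).
UNIVARIATE_P = {"age": 0.001, "tumor_stage": 0.0005, "risk_score": 0.002,
                "factor_x": 0.30, "factor_y": 0.04}


def fake_multivariate_fit(candidates):
    """Stand-in for a real multivariate Cox fit; returns made-up P-values."""
    made_up = {"age": 0.003, "tumor_stage": 0.001, "risk_score": 0.010,
               "factor_y": 0.20}
    return {f: made_up[f] for f in candidates}


print(select_factors(UNIVARIATE_P, fake_multivariate_fit))
```

In this made-up example, `factor_x` is dropped at the univariate stage and `factor_y` at the multivariate stage, so only factors that remain significant after adjustment for the others enter the final model.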

Results

Construction of a risk score model with oxidative stress-related genes

The TCGA-BRCA dataset, comprising 1077 patients, was used for hub gene selection, and 666 genes with dysregulated expression in the TCGA-BRCA dataset were included in the subsequent analysis. First, 29 genes were shared between the oxidative stress-related genes and the hub genes in the TCGA-BRCA dataset (Fig. 1A). After these genes were evaluated with LASSO-Cox regression to prevent overfitting, a risk score model was constructed with eight genes (Fig. 1B and C).

Fig. 1

Construction of the prognostic model. A. Common genes among the DEGs of the TCGA-BRCA dataset and oxidative stress-related genes; B. LASSO coefficient profiles based on the LASSO regression model; C. The LASSO partial likelihood deviance shows the eight oxidative stress-related genes determined by an optimal lambda; D. Time-dependent ROC curve of the 1-, 3-, and 5-year overall survival predicted by the prognostic model.

A risk score model was established using eight genes, and the regression results showed that CAMP was a protective factor, while FABP1, CYP1A1, ALAS2, HIF3A, ADCY8, MAS1, and CAV3 were risk factors for breast cancer. The formula for the oxidative stress score was given as follows:

Risk score = -0.00822437 × CAMP + 0.004850592 × FABP1 + 0.009326334 × CYP1A1 + 0.065457188 × ALAS2 + 0.027152253 × HIF3A + 0.030062013 × ADCY8 + 0.01766539 × MAS1 + 0.00524067 × CAV3.
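As a worked illustration, the formula above can be evaluated with a few lines of Python. The coefficients are copied from the formula; the function names, the sample names, and the expression values are hypothetical. Splitting samples at the cohort median is an assumption made for this sketch, since the cutoff used to define the high- and low-risk groups is not stated here.

```python
from statistics import median

# Coefficients taken from the risk score formula above.
COEFFICIENTS = {
    "CAMP": -0.00822437,
    "FABP1": 0.004850592,
    "CYP1A1": 0.009326334,
    "ALAS2": 0.065457188,
    "HIF3A": 0.027152253,
    "ADCY8": 0.030062013,
    "MAS1": 0.01766539,
    "CAV3": 0.00524067,
}


def risk_score(expression):
    """Weighted sum over the eight genes; raises KeyError if one is missing."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label a sample 'high' if its score exceeds the cohort median (assumed cutoff)."""
    cutoff = median(scores.values())
    return {name: ("high" if s > cutoff else "low") for name, s in scores.items()}


# Hypothetical expression values for one sample (illustration only).
EXAMPLE_SAMPLE = {gene: 1.0 for gene in COEFFICIENTS}
print(risk_score(EXAMPLE_SAMPLE))

# Hypothetical scores for four samples.
print(split_by_median({"s1": 0.1, "s2": 0.3, "s3": 0.2, "s4": 0.4}))
```

Because CAMP is the only gene with a negative coefficient, higher CAMP expression lowers the score, which is consistent with its description as a protective factor.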

Prognostic analysis of the risk score model

Univariate and multivariate survival analyses were also conducted. The multivariate survival analysis showed that the risk score, age, and tumor stage were independent prognostic factors for breast cancer (Table 1). A prognostic model was established based on the three risk factors, and a Kaplan–Meier survival analysis demonstrated significant differences between the high- and low-risk groups. The time-dependent ROC curve also illustrated that the model satisfactorily predicted survival, with an area under the ROC curve of up to 0.87 for 1-year survival (Fig. 1D).

Table 1 Multivariable Cox analysis in TCGA dataset.

External datasets were used to validate the prognostic model. Data were downloaded from the METABRIC and GEO databases, and a survival analysis was conducted using the prognostic model (Fig. 2). Overall survival was analyzed in the METABRIC and GEO datasets (Fig. 2A and C, respectively), and recurrence-free survival and median survival were analyzed in the METABRIC (Fig. 2B) and GEO (Fig. 2D) datasets, respectively. These results demonstrated that the prognostic model had good prognostic ability.
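For readers unfamiliar with the Kaplan–Meier method used in these comparisons, the estimator can be sketched in a few lines of Python. This is a minimal illustration of the standard product-limit calculation, not the code used in the study; the function name `kaplan_meier` and the follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data for five patients (illustration only).
EXAMPLE_TIMES = [1, 2, 2, 3, 4]
EXAMPLE_EVENTS = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(EXAMPLE_TIMES, EXAMPLE_EVENTS):
    print(t, round(s, 3))
```

Running the estimator separately on the high- and low-risk groups gives the two curves that are compared in a Kaplan–Meier plot; a formal comparison would additionally require a test such as the log-rank test, which is not shown here.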

Fig. 2

Prognostic analysis of the METABRIC and GEO datasets using the prognostic model. A. Kaplan–Meier survival analysis of the overall survival in the METABRIC dataset of breast cancer using the prognostic model. B. Kaplan–Meier survival analysis of the recurrence-free survival in the METABRIC dataset of breast cancer using the prognostic model. C. Kaplan–Meier survival analysis of the overall survival in the GEO dataset of breast cancer using the prognostic model. D. Kaplan–Meier survival analysis of the median survival in the GEO dataset of breast cancer using the prognostic model.

Immune cell infiltration analysis

Immune cell infiltration analysis was performed on the TCGA-BRCA dataset using the XCELL, EPIC, and MCPCOUNTER methods. The results showed that the percentage of endothelial cells was higher in the high-risk group than in the low-risk group, with the groups defined by the risk score model (Fig. 3).

Fig. 3

Immune cell infiltration analysis using “XCELL,” “EPIC,” and “MCPCOUNTER” methods.

Single-cell analysis with the risk score model

A single-cell dataset was used for further analyses. GSE161529, which contains normal, preneoplastic, and breast cancer cells, was used in this study. The results demonstrated that the percentage of endothelial cells was lower in breast cancer cells than in normal and preneoplastic cells (Fig. 4A, B, and C). The oxidative risk score was assessed using the AddModuleScore function in the Seurat package, and the results demonstrated that the oxidative risk score was lower in breast cancer cells than in normal and preneoplastic cells (Fig. 4D). The conflicting results of the single-cell and bulk gene expression analyses required further investigation.

Fig. 4

UMAP plots showing the microenvironment of normal cells, pre-neoplastic cells, and breast cancer cells. A. A UMAP plot showing the microenvironment of normal cells; four cell types are shown. B. A UMAP plot showing the microenvironment of pre-neoplastic cells; four cell types are shown. C. A UMAP plot showing the microenvironment of breast cancer cells; four cell types are shown. D. The oxidative stress scores in normal, pre-neoplastic, and breast cancer cells.

Cell trajectory and pseudo-time analyses were conducted on breast cancer cells in the GSE161529 dataset. The cell trajectory analysis showed that endothelial cells and cancer-associated fibroblasts (CAFs) may be involved at the beginning of the developmental trajectory of breast cancer (Fig. 5A). The pseudo-time analysis was consistent with this result (Fig. 5B). The differentiation potential of the different cell types was analyzed using CytoTRACE, and the results demonstrated that endothelial cells had the lowest degree of differentiation (Fig. 5C and D). The top 20 genes expressed in breast cancer cells are shown in supplemental Fig. 1.

Fig. 5

Cell trajectory and pseudo-time analyses of breast cancer cells. A. A cell trajectory analysis of breast cancer cells. B. A pseudo-time analysis of breast cancer cells. C. A boxplot of the differentiation analysis of breast cancer cells. D. A t-SNE plot of the differentiation analysis of breast cancer cells.

Further analysis of endothelial cells in breast cancer tissues revealed that they can be divided into eight subpopulations (Fig. 6A), and gene ontology analysis of the endothelial cell markers showed that they were mainly involved in cell adhesion and cell migration (supplemental Figs. 2A–C). A cell trajectory analysis was also performed on endothelial cells. It demonstrated that subpopulation 2 may be the beginning of the developmental trajectory of endothelial cells (Fig. 6B). The differentiation potential of endothelial cells was also analyzed. This showed that subpopulations 2, 3, and 4 had a lower degree of differentiation, while subpopulations 5, 0, and 1 had a higher degree of differentiation (Fig. 6C and D).

Fig. 6

Cell trajectory and pseudo-time analyses of endothelial cells. A. A UMAP plot showing the endothelial cells of breast cancer tissues. B. A cell trajectory analysis of endothelial cells. C. A boxplot of the differentiation analysis of endothelial cells. D. A t-SNE plot of the differentiation analysis of endothelial cells. E. A biological process analysis of the genes differentially expressed between cluster 2 and cluster 1 of endothelial cells. F. A violin plot of gene expression in endothelial cells.

Differential gene expression analysis was conducted between cluster 2 and cluster 1 of endothelial cells, and a gene ontology analysis was performed on the resulting genes. The results showed that oxidative stress was one of the major biological processes involved (Fig. 6E). Gene expression in the different subpopulations is shown in Fig. 6F. Antioxidant-related genes, including NFKBIA, HSPA1B, and GSTK1, were overexpressed in the subpopulations with a lower degree of differentiation, while TNFRSF4 and RGS3 were overexpressed in the subpopulations with a higher degree of differentiation (Fig. 6F). These results may explain the aforementioned contradiction.

Based on the above results, we can define a new cluster of endothelial cells in breast cancer, HSPA1B+ and GSTK1+ endothelial cells, which have a lower degree of differentiation and are more aggressive.

Cell–cell interaction analysis

Intercellular communication between the breast cancer cell subpopulations was examined. Four categories of intercellular communication were analyzed in this study: checkpoints, cytokines, growth factors, and others (Fig. 7A, B, C, and D). All four categories showed strong interactions between endothelial cells and immune cells (T cells, B cells, and macrophages). In addition, CAFs also interacted with endothelial cells.

Fig. 7

Intercellular communication between breast cancer cells. A. The checkpoint subset of the intercellular communication between breast cancer cells. B. The cytokine subset of the intercellular communication between breast cancer cells. C. The growth factor subset of the intercellular communication between breast cancer cells. D. The other subsets of the intercellular communication between breast cancer cells.

Discussion

In this study, we constructed an oxidative stress risk score model and a prognostic model using the TCGA dataset. The prognostic model performed well with the METABRIC and GEO datasets. An analysis of the breast cancer microenvironment was performed using single-cell data, and the results showed that the oxidative stress risk score was lower in breast cancer cells than in normal cells, which indicates that breast cancer can produce substances that prevent cellular damage from oxidative stress. Endothelial cells showed the lowest degree of differentiation in the breast cancer microenvironment, and further analysis revealed that endothelial cells overexpressing antioxidant-related genes may avoid apoptosis induced by ROS and RNS. Moreover, endothelial cells overexpressing antioxidant-related genes exhibited a low degree of differentiation.

An imbalance in the oxidative system leads to oxidative stress, which generates ROS and RNS, resulting in DNA damage and mutations in tumor suppressor genes. Therefore, oxidative stress is believed to be a factor that promotes breast cancer development4,22,23. Recently, Han et al.24 found that KEAP1 inhibits NRF2 and reduces the proliferation of breast cancer, and Ye et al.25 further verified this result. Although some studies have constructed prognostic models for breast cancer that contain oxidative stress-related genes, these models have flaws, and a better model is required. Zhao et al.26 constructed a prognostic model with oxidative stress-related long noncoding RNAs and clinical factors; however, the model included the N and M stages, which are already part of the tumor stage, so the parameters were redundant and the model was prone to overfitting. Hu et al.7 also constructed a prognosis model with oxidative stress-related genes that had good prognostic value. However, they validated the model only with the Kaplan–Meier method and only on the same dataset. In this study, we constructed a prognostic model using three parameters: age, tumor stage, and oxidative stress risk score. The model performed well on both the training set (TCGA dataset) and the validation sets (METABRIC and GEO datasets), and LASSO regression was used to prevent overfitting.

Initiation, promotion, and progression are believed to be the three stages of tumorigenesis, and ROS and RNS have been shown to be involved at all stages11,27,28. Yuzefovych et al.29 discovered that oxidative stress can damage DNA and promote breast cancer progression and metastasis; thus, an imbalance of the oxidative system appears to be a promoting factor in breast cancer30,31. Consistent with previous studies, we found that oxidative stress plays an important role in breast cancer tumorigenesis. We also found that the antioxidant response becomes stronger after tumorigenesis, which would reduce the cellular damage caused by oxidative stress in breast cancer. Unlike most studies, which only constructed and verified a prognostic model, we applied the model to single-cell data and examined the impact of oxidative stress on the tumor microenvironment of breast cancer. Interestingly, after the formation of breast cancer, endothelial cells in the tumor microenvironment overexpressed antioxidant genes, which may allow cancer cells to escape ROS-induced death. To our knowledge, this is the first study to describe this mechanism by which the breast cancer microenvironment responds to ROS, and further studies are needed to verify this conclusion.

Using a single-cell dataset, we found that the oxidative risk score was significantly lower in breast cancer cells than in normal cells. This result contradicts the results of previous studies. A literature search revealed that Morotti et al.32 had found that peroxidative drugs that induce the production of ROS could "kill" breast cancer cells. Furthermore, Tong et al.33 found that glutaminase was upregulated in pancreatic ductal adenocarcinoma and promoted glutaminolysis, tumor growth, and metastasis. Therefore, further analysis was performed, and the results revealed that antioxidant-related genes were overexpressed in breast cancer endothelial cells. Thus, breast cancer cells may escape oxidative stress-induced death.

However, the mechanisms underlying oxidative stress in breast cancer remain unclear. Dai et al.34 found that the MAPK/JUN pathway may participate in this process and promote the aggressiveness of breast cancer cells. Wang et al.35 found that KLF5 was upregulated in breast cancer with liver metastasis and worsened the long-term outcome. Luo et al.36 found that ZMYND8 activated the expression of NRF2, which protects breast cancer cells from oxidative stress. In this study, we found that antioxidant-related genes, such as HSPA1B, GSTK1, and DNAJB1, were overexpressed in the endothelial cells of breast cancer, especially in cells with a lower degree of differentiation. This suggests that the overexpression of antioxidant-related genes promotes breast cancer aggressiveness.

Although we aimed to understand the mechanism of oxidative stress in breast cancer, this study has some limitations. First, the microenvironment of breast cancer is complicated5,37,38. Although some target genes were identified, the pathway should be further verified, and the impact of ROS on breast cancer requires further exploration. Second, additional datasets, particularly clinical datasets, should be used to validate the prognostic model. Third, because the datasets in this study were downloaded from public databases, selection and publication biases may have occurred.

Conclusions

In summary, we constructed a prognostic model using age, tumor stage, and the oxidative risk score that performed well in the TCGA, METABRIC, and GEO datasets. These results demonstrate the role of oxidative stress in tumorigenesis. However, we found that the oxidative stress score was lower in breast cancer cells than in normal cells, and a likely reason for this contradiction is that antioxidant-related genes are overexpressed in breast cancer endothelial cells. The tumor therefore behaves like a "crafty boy": at the initiation stage, oxidative stress "helped" the tumor, whereas during the promotion and progression stages, the tumor "discarded and prevented" oxidative stress.