Introduction

According to GLOBOCAN 2020, breast cancer is the most prevalent malignancy in women. In the United States, breast cancer accounts for nearly one-third of female cancer diagnoses. Largely owing to improvements in early screening, detection and systematic treatment, cancer survivors are living longer; as a result, the number of persons diagnosed with second primary malignancies (SPMs), defined as two or more malignant tumors of distinct histological types occurring in the same individual simultaneously or successively, has increased1,2,3,4,5,6. Considering that nearly 8% of American cancer survivors have been diagnosed with more than one primary cancer, more attention should be paid to SPMs7. A meta-analysis reported that breast cancer survivors were almost 17% more likely to develop SPMs than the general population8. The etiology of SPMs remains unknown and likely reflects shared risk factors, including genetic, reproductive, hormonal, endocrine, treatment-related, environmental and lifestyle factors9,10.

Breast cancer is a heterogeneous disease with four molecular subtypes: Luminal A, Luminal B, HER2 enriched and triple negative breast cancer (TNBC). Previous studies have reported a relationship between hormone receptor (HR) status and SPMs. One study reported that, compared with the general population, the risk of an SPM diagnosis was 20% higher among HR-positive survivors and 44% higher among HR-negative survivors11. The risk by HR status for a limited number of sites, including lung, contralateral breast, ovarian and uterine corpus SPMs, has been confirmed in several related studies. For instance, an increased risk of contralateral breast cancer was reported after a first primary estrogen receptor (ER)-negative breast cancer. Standardized incidence ratios for second primary ovarian cancers were significantly increased after ER-negative breast cancer. Second lung cancer rates were significantly elevated after ER-negative, but not ER-positive, breast cancer. Patients with invasive breast cancer have a higher risk of developing subsequent endometrial cancer regardless of ER or progesterone receptor (PR) status12,13,14,15. Unfortunately, few studies have reported incidence trends by breast cancer subtype, the risk of SPMs overall has not been quantified by subtype, and evidence on the factors that influence SPM development remains limited.

Therefore, accurately estimating SPM risk for first primary breast cancer (FPBC) survivors and profiling the characteristics of patients at risk would be valuable. The main purposes of our study were to describe the incidence trends and population characteristics of FPBC by subtype, to investigate the risk factors for developing SPMs, and to explore a risk stratification model and the relative risk of developing SPMs.

Materials and methods

The incidence trend of breast cancer by subtypes

The SEER*Stat software includes a variable named “Breast Subtype 2010” to facilitate the analysis of trends in breast cancer incidence rates. Based on ER, PR and HER2 status, breast cancer is categorized into four subtypes: Luminal A (ER+ and/or PR+, HER2-), Luminal B (ER+ and/or PR+, HER2+), HER2 enriched (ER- and PR-, HER2+) and triple negative breast cancer (TNBC; ER- and PR-, HER2-)16.
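The four-way classification above can be written as a small decision rule. The following Python sketch is illustrative only: the function name `breast_subtype` is hypothetical, and it assumes that each receptor status is already known as a simple positive/negative value (borderline or unknown results are not handled).

```python
def breast_subtype(er: bool, pr: bool, her2: bool) -> str:
    """Map receptor status to one of the four subtypes described in the text.

    Assumption: every status is known and binary (True = positive).
    """
    hormone_receptor_positive = er or pr  # "ER+ and/or PR+"
    if hormone_receptor_positive:
        return "Luminal B" if her2 else "Luminal A"
    # ER- and PR- from here on
    return "HER2 enriched" if her2 else "TNBC"


# Example: ER+, PR-, HER2- falls in the Luminal A group
example = breast_subtype(er=True, pr=False, her2=False)
```

Because the rule depends only on whether at least one hormone receptor is positive, ER+/PR- and ER-/PR+ tumors are assigned to the same Luminal group.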

Study population in SEER datasets

First primary breast cancer (FPBC) cases diagnosed in 2010–2019 were identified from 18 registries of the Surveillance, Epidemiology and End Results (SEER) database to explore the distribution of characteristics by subtype. FPBC was defined as a first pathologically diagnosed breast cancer of stage I, II or III. The exclusion criteria were: age younger than 20 years or older than 85 years; a first primary cancer other than breast cancer; unknown pathological stage or stage IV disease; and missing or unknown information on age, radiotherapy, grade, marital status, clinical lymph node status or survival. The access to and use of SEER data did not require informed patient consent.

SPMs ascertainment and Follow-Up

SPMs were defined as two or more malignant tumors of distinct histological types occurring in the same individual simultaneously or successively. To minimize bias from incidental SPM diagnoses due to heightened medical surveillance, only cancers of any type occurring more than 1 year after FPBC were regarded as SPMs. Follow-up for SPMs ended at whichever of the following occurred first: the age of 85 years, loss to follow-up, death or December 31, 2018.
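The two rules above (a latency of more than 1 year, and follow-up ending at the earliest of four dates) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, latency is assumed to be recorded in months, and the date on which a patient reaches 85 years is assumed to be supplied by the caller.

```python
from datetime import date
from typing import Optional

STUDY_END = date(2018, 12, 31)  # administrative end of follow-up stated in the text


def counts_as_spm(latency_months: float) -> bool:
    """A later cancer is counted as an SPM only if it occurs more than 1 year after FPBC."""
    return latency_months > 12


def follow_up_end(turns_85: date,
                  lost_to_follow_up: Optional[date] = None,
                  death: Optional[date] = None) -> date:
    """Return the earliest of: reaching age 85, loss to follow-up, death, study end."""
    candidates = [turns_85, STUDY_END]
    candidates += [d for d in (lost_to_follow_up, death) if d is not None]
    return min(candidates)


# Example: a patient who dies in 2016, well before turning 85
end = follow_up_end(turns_85=date(2030, 1, 1), death=date(2016, 5, 1))
```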

SPMs in Hebei breast cancer cohort

In Hebei Province, located in northern China, we extracted detailed information on ER, PR and HER2 status from pathological reports at the Fourth Hospital of Hebei Medical University. Tumors with 1% or more positive nuclear staining were classified as ER positive (ER+) or PR positive (PR+). HER2 positivity (HER2+) was defined as a positive nuclear staining intensity of “2+” or “3+” in tumor cells. The definition of the molecular subtypes was consistent with the SEER classification. The use of the data was approved by the Ethics Informed Committee of the Fourth Hospital of Hebei Medical University (Shijiazhuang, Hebei, China), and all analyses were performed in accordance with the approved guidelines. Written informed consent was obtained from all subjects or, if the patients were deceased, from their next of kin.

Statistical analysis

The SEER data were randomly split into training and validation sets at a ratio of 7:3. The training set was used to train the prediction model, and the validation set was used to validate it. The cumulative incidence rate (CIR) of SPM development was assessed by Fine-Gray competing risk regression analysis. Experiencing a non-SPM event and death from any cause were considered competing events when calculating hazard ratios (HRs) and 95% CIs for SPMs17. Chi-square tests were used to compare categorical data. Univariate and multivariate analyses based on the competing risk model were performed. Based on the model, a nomogram was established. The predictive performance of the model was measured using ROC curves and calibration plots. A risk stratification model was built on the total score of each patient as calculated by the nomogram. Time was defined as the latency between the two cancers, and status was defined as SPM or not. X-Tile software was used to determine the best cut-off values for dividing patients into low-risk, intermediate-risk and high-risk groups. R software (R Core Team, 2020) was used for statistical analysis and graphic plotting. A p value < 0.05 was considered statistically significant.
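To illustrate what a cumulative incidence under competing risks measures, the sketch below computes a nonparametric cumulative incidence curve (the Aalen-Johansen form). It is not the Fine-Gray regression used in the study, which was run in R; it is a simpler estimator shown only to make the role of competing events concrete. The function name, the event coding (0 = censored, 1 = SPM, 2 = competing event) and the toy data are all assumptions.

```python
from collections import Counter


def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence of `cause` in the presence of competing events.

    times  : follow-up time for each patient
    events : 0 = censored, 1 = SPM, 2 = competing event (coding is an assumption)
    Returns a list of (time, cumulative incidence) pairs.
    """
    n_at_time = Counter(times)
    d_cause = Counter(t for t, e in zip(times, events) if e == cause)
    d_any = Counter(t for t, e in zip(times, events) if e != 0)

    at_risk = len(times)
    event_free = 1.0  # probability of no event of any type before the current time
    cif = 0.0
    curve = []
    for t in sorted(n_at_time):
        cif += event_free * d_cause[t] / at_risk   # new SPMs among those still event-free
        event_free *= 1 - d_any[t] / at_risk       # any event removes a patient from the event-free pool
        at_risk -= n_at_time[t]
        curve.append((t, cif))
    return curve


# Toy data: four hypothetical patients
curve = cumulative_incidence(times=[1, 2, 3, 4], events=[1, 2, 0, 1])
```

Unlike one minus a Kaplan-Meier estimate that treats competing events as censoring, this estimator does not let patients who have already died or had a competing event contribute to later SPM incidence.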

Results

The incidence trend of breast cancer by subtypes

The incidence rates of the Luminal B, Luminal A and HER2 enriched subtypes increased significantly from 2010 to 2018, with annual percentage changes (APCs) of 2.45, 1.16 and 0.37, respectively. TNBC showed a decreasing trend, with an APC of −0.53 (Fig. 1D). In the young group, the incidence rates of all four subtypes increased, with APCs of 2.14, 1.38, 1.27 and 0.54 for Luminal B, HER2 enriched, Luminal A and TNBC, respectively (Fig. 1A). In the middle group, TNBC showed a decreasing trend, with an APC of −1.15, whereas the APCs were 2.55, 1.25 and 0.64 for the Luminal B, Luminal A and HER2 enriched subtypes, respectively (Fig. 1B). In the old group, the incidence rate of HER2 enriched decreased, with an APC of −0.28, whereas the other three subtypes increased, with APCs of 2.49, 1.13 and 0.21 for Luminal B, Luminal A and TNBC, respectively (Fig. 1C).
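For readers unfamiliar with the APC, the sketch below shows the standard log-linear definition: the natural logarithm of the rate is regressed on calendar year, and the APC is (exp(slope) − 1) × 100. The text does not state which software or weighting was used for the APCs reported above, so the unweighted least-squares fit, the function name and the example rates here are assumptions for illustration only.

```python
import math


def annual_percent_change(years, rates):
    """APC from an unweighted least-squares fit of ln(rate) on year."""
    logs = [math.log(r) for r in rates]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return (math.exp(slope) - 1) * 100


# Hypothetical rates that grow by exactly 2% per year
years = list(range(2010, 2019))
rates = [10.0 * 1.02 ** (y - 2010) for y in years]
apc = annual_percent_change(years, rates)
```

A positive APC therefore corresponds to a rate that rises by roughly that percentage each year, and a negative APC to a rate that falls.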

Fig. 1

Incidence trends of female breast cancer by subtypes: (A) young age group; (B) middle age group; (C) old age group; (D) whole age group.

Characteristics of FPBC survivors by subtypes

Table 1 shows the characteristics of 324,661 breast cancer patients. The most common subtype was Luminal A, accounting for 75.61%, followed by TNBC (10.30%); HER2 enriched was the least common subtype (3.95%). The median ages at diagnosis were 61, 56, 56 and 58 years, and the latencies between the FPBC diagnosis and the SPM diagnosis were 39, 39, 40 and 41 months for Luminal A, Luminal B, HER2 enriched and TNBC, respectively. The distributions of most characteristics differed significantly among subtypes.

Table 1 Comparisons of baseline characteristics of patients with FPBC by subtypes.

Cumulative incidences of SPMs by subtypes

The five-year CIR of SPMs in TNBC was 2.84%, significantly higher than in the other three subtypes: Luminal A (2.61%), Luminal B (2.30%) and HER2 enriched (2.21%) (Fig. 2A). In age-specific analyses, the CIRs in the young group for TNBC, Luminal A, HER2 enriched and Luminal B were 1.53%, 1.29%, 1.14% and 1.05%, respectively, with no significant difference among the four subtypes (Fig. 2B). In the middle group (Fig. 2C), the highest five-year CIR was 2.61% for triple negative FPBC patients, followed by Luminal B (2.07%) and Luminal A (2.03%); HER2 enriched had the lowest five-year CIR (1.94%), and the differences among the rates were statistically significant. In the old group (Fig. 2D), TNBC again had the highest five-year CIR (3.73%), followed by Luminal A (3.56%), Luminal B (3.27%) and HER2 enriched (3.22%). The median follow-up time was 45 months. During follow-up, the proportions of patients developing SPMs were 2.23% (5485/245463), 1.91% (629/32922), 1.92% (246/12839) and 2.69% (901/33437) for Luminal A, Luminal B, HER2 enriched and TNBC, respectively.
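The crude proportions quoted above can be reproduced directly from the reported counts. The short Python check below uses only the numbers given in the text; the variable names are illustrative. These are simple proportions over the whole follow-up period, not the five-year CIRs, which additionally account for follow-up time and competing events.

```python
# (patients who developed SPMs, patients with that subtype), as reported in the text
counts = {
    "Luminal A": (5485, 245463),
    "Luminal B": (629, 32922),
    "HER2 enriched": (246, 12839),
    "TNBC": (901, 33437),
}

# Crude percentage of patients with an SPM, rounded to two decimals
proportions = {
    subtype: round(100 * spm / total, 2)
    for subtype, (spm, total) in counts.items()
}

# The subtype denominators should add up to the full cohort
cohort_size = sum(total for _, total in counts.values())
```

The denominators sum to 324,661, which matches the cohort size reported for Table 1.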

Fig. 2

The cumulative incidences and distribution of SPMs by FPBC subtypes: (A) whole age group; (B) young age group; (C) middle age group; (D) old age group.

Risk factors for developing SPMs

Of the 324,661 individuals, 227,263 were assigned to the training set. The clinical characteristics of the training and validation sets are compared in STable 1. To explore risk factors for the development of SPMs, a univariate analysis based on the competing risk model was performed in the training set. It showed that eight variables, including radiotherapy, chemotherapy, tumor size, pathological stage, clinical lymph node status, subtype and age at diagnosis, were closely related to the development of SPMs in breast cancer survivors. Subsequently, these statistically significant factors were incorporated into the multivariate analysis. The results revealed that age at diagnosis, subtype, radiotherapy and clinical lymph node status were strongly associated with the development of SPMs. As Table 2 shows, older age was associated with a higher risk of developing SPMs (HR: 1.529, 95% CI: 1.464–1.598). In addition, the TNBC subtype was an independent risk factor for developing SPMs (HR: 1.252, 95% CI: 1.144–1.368). Patients who received radiotherapy were more likely to develop SPMs than those who did not (HR: 1.201, 95% CI: 1.134–1.273).

Table 2 SPMs in univariate and multivariate analysis based on competing risk model in training set.

Predictive nomogram model for the risk of SPM development

On the basis of all factors significantly associated with developing SPMs in the multivariate competing risk model, a nomogram was constructed to predict the risk of developing SPMs (Fig. 3A). In the training set, age at diagnosis was the strongest contributor to SPM development, followed by subtype. The time-dependent ROC curves showed AUC values at 3 and 5 years of 0.682 (95% CI 0.674–0.691) and 0.679 (95% CI 0.663–0.681), respectively (Fig. 3C), suggesting favorable discrimination of the nomogram. Moreover, the calibration curves indicated that the nomogram was well calibrated (Fig. 3B). The performance of the model in the validation set is shown in SFig 1. The results of the SPM prediction models by main cancer type are shown in STables 2–4.

Fig. 3

Competing risk model: (A) nomogram for predicting the risk of developing SPMs in female patients with FPBC; (B) calibration plots; and (C) ROC curves for predicting the 3- and 5-year probability of SPMs in the training set.

Risk stratification model and risk stratification for subgroup analysis

Based on the nomogram, patients were divided into a low-risk group (31.94%; total score < 59), an intermediate-risk group (51.83%; total score 60–84) and a high-risk group (16.23%; total score > 85). The CIRs differed among the three groups, at 1.47%, 2.39% and 3.27%, respectively. The CIR of SPMs in the high-risk group was significantly higher than that in the low-risk group (p < 0.01).
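The three-level stratification above amounts to thresholding the nomogram total score. The sketch below is illustrative only and rests on an assumption about the boundaries: the cut-offs as reported (< 59, 60–84, > 85) leave scores of exactly 59 and 85 unassigned, so this example assumes that scores below 60 are low risk and scores of 85 or more are high risk. The function name is hypothetical.

```python
def risk_group(total_score: float) -> str:
    """Assign a risk group from the nomogram total score.

    Assumption: boundaries are taken as < 60 (low), 60 to < 85 (intermediate)
    and >= 85 (high); the text does not say how scores of exactly 59 or 85 are handled.
    """
    if total_score < 60:
        return "low"
    if total_score < 85:
        return "intermediate"
    return "high"


# Example: a hypothetical patient with a total score of 70
group = risk_group(70)
```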

Although the nomogram performed well overall, its effectiveness in subgroups was unclear. Therefore, we divided patients into subgroups to further assess the nomogram. The risk stratification was statistically significant in all subgroups (P < 0.05), which implies that the nomogram effectively distinguishes risk groups for SPMs among FPBC survivors (Fig. 4).

Fig. 4

Subgroup analysis of risk factors for FPBC patients, including race, tumor size, histology, subtype, radiotherapy, clinical lymph node status, surgery, chemotherapy, marital status and grade.

Risk factors for subgroup analysis and stratification analysis

To further explore the risk factors for FPBC by subtype, we performed a subtype-stratified analysis (SFig 5). Older age was a shared risk factor regardless of subtype. Radiotherapy, no chemotherapy and lymph node status were risk factors for Luminal A. No chemotherapy was a risk factor for Luminal B. Well differentiation and marital status were associated with developing SPMs for HER2 enriched. Tumor size, radiotherapy, no chemotherapy, well differentiation and pathological stage played important roles in developing SPMs for TNBC. To further quantify the relative risk for each subtype compared with the most common subtype, Luminal A, we performed a stratification analysis. TNBC patients had the strongest relative risk of developing SPMs in the vast majority of subgroups, with HRs greater than 1.00. Compared with Luminal A, neither the Luminal B nor the HER2 enriched subtype differed significantly for the majority of factors (STable 2).

The distribution of SPMs in the American and Hebei cohorts

As shown in Fig. 5, in the SEER database, Luminal A FPBC cases most commonly developed second breast cancer, followed by lung cancer and gynecological cancer, whereas Luminal A cases in the Hebei cohort most commonly developed second breast cancer, lung cancer and thyroid cancer, in that order. In both SEER and Hebei, the three most common SPMs for HER2 enriched were second breast, lung and gynecological cancer. For Luminal B, the most common SPMs in the Hebei cohort were second breast, lung and colorectal cancer, the same as in the SEER database. Interestingly, the most common SPM was second breast cancer in both the SEER and Hebei cohorts; the second and third most common SPMs were lung and gynecological cancer in SEER, whereas this order was reversed in the Hebei cohort (Fig. 5A).

In the SEER database, Luminal A was the predominant subtype in both the SPMs and No-SPMs cohorts. However, TNBC was the only subtype whose proportion was generally higher in the SPMs cohort than in the No-SPMs cohort. In the Hebei cohort, the subtypes ranked by proportion of SPMs were Luminal A, Luminal B, HER2 enriched and TNBC (Fig. 5B).

Fig. 5

The distribution of SPMs: (A) distribution of SPMs by age and FPBC subtypes; (B) distribution of molecular subtypes in the No-SPMs and SPMs cohorts in the USA and Hebei.

Discussion

According to the latest cancer statistics, the incidence of female breast cancer continued to increase slowly from 2014 through 2018, with an average annual percent change of 0.5%, and female breast cancer remained the most common cancer in the United States18. As public health awareness has increased and diagnostic and treatment options have improved, the survival of cancer patients has lengthened and the number of SPMs has gradually increased in recent years7. Previous studies indicated that patients with SPMs involving breast cancer had worse outcomes than those with breast cancer only19,20. To our knowledge, few published studies have described the distribution of characteristics, explored the risk factors for developing SPMs and identified a risk stratification for breast cancer. Therefore, an in-depth study of risk prediction models for this population is necessary. Our results are informative for understanding the pathogenic factors of SPMs, and this study underscores the urgency of strengthening awareness of SPM detection and screening among breast cancer survivors.

Overall, there were substantial disparities among subtypes in breast cancer incidence rates, which also varied by age group, but few studies have examined incidence trends by subtype. Only one previous study, which specifically examined the incidence trends of the molecular subtypes of breast cancer from 2010 to 2016, reported that the Luminal B, HER2 enriched and TNBC incidence rates showed no statistically significant changes in the slope of the trends21. In contrast, we examined breast cancer incidence rates by subtype from 2010 to 2019 and found a decreasing trend for TNBC, with an APC of −0.53, consistent with a previous report that the TNBC incidence rate decreased from 14.8 per 100,000 in 2011 to 14.0 in 2019 (APC: −0.6, P = 0.02)22. We found that the incidence rate of TNBC in the young group was increasing, in line with TNBC being notably common in younger and premenopausal women. HR-positive breast cancers showed increasing trends during the study period; declining fertility rates, a shift to later age at first birth and rising obesity rates may explain the increased risk of HR-positive breast cancer23,24,25,26,27,28,29. In addition, increasing access to mammography screening may also have contributed to the rise in HR-positive cancers, as screening mammography detects ER-positive cancers with higher sensitivity than ER-negative cancers24.

A previous study reported that Luminal A was the most common subtype in both the No-SPMs and SPMs cohorts, and that the Luminal A subtype was more likely to have an SPM (10%) than other subtypes30. In contrast, in the current study, in both Hebei and the USA, TNBC was the subtype most likely to develop SPMs and had the highest five-year CIR of SPMs, which is consistent with previous reports31,32. Consistent with our findings, the risk of SPMs typically increases with age at primary breast cancer diagnosis33,34. Many studies have reported that adjuvant radiotherapy for breast cancer increases the risk of SPMs, and similar results were observed in the current study35,36,37,38. In the current study, the prediction model for SPMs was established and evaluated using competing risk analysis. The model showed good predictive ability and clinical practicability, which will contribute to screening and clinical intervention for the population at high risk of SPMs.

To our knowledge, many tools have been developed to identify prognostic risk factors in SPMs among FPBC survivors, but tools for stratifying the risk of SPM development are scarce20,39. In this study, we constructed a risk stratification system that distinguishes risk levels according to the scores calculated by the nomogram, and fewer than 20% of breast cancer patients were classified as high risk for SPM development. Our results demonstrated that, in the high-risk group, the CIR was highest for TNBC, which confirms that TNBC is the molecular subtype most susceptible to SPMs19. This might be explained by the higher frequency of BRCA1/2 mutations in TNBC patients, a potentially higher frequency of impaired DNA repair mechanisms and the limited understanding of relevant treatment targets40,41. The HR-positive subtype was also found to carry a relatively higher risk of SPMs. Previous studies reported that endocrine therapy reduced contralateral breast cancer risk by over 50% and confirmed an increased risk of SPMs after ER-positive breast cancer in patients receiving tamoxifen therapy31,32,42,43,44, which emphasizes the importance of improving endocrine therapy initiation and adherence for ER-positive breast cancer patients.

Although the major strengths of this study are the use of high-quality population-based cancer registry data and the systematic comparison of the risk of SPM development by subtype, several limitations should be noted. Firstly, the SEER database represents almost 30% of the population of the United States, but its largest limitation is that 70–80% of the population is white, which makes it difficult to include race as a risk factor for SPMs. In addition, because SEER considers contralateral and all metachronous breast cancers as second primaries, some recurrent or metastatic breast cancers may have been misclassified as SPMs45. Secondly, given the relatively short follow-up time available to observe subsequent primary events, the results by subtype should be interpreted with caution. Thirdly, the lack of detailed treatment data and the absence of data on risk factors (such as genetic susceptibility, obesity, menopausal status and smoking) in SEER limited our ability to explore and identify the specific factors that may have contributed to the variation in SPM risk by subtype35,46. Fourthly, the risk prediction models in this study were not validated in an external cohort.

In conclusion, the incidence rates differed among subtypes. Subtype, age at diagnosis, radiotherapy and lymph node status were risk factors for SPMs, and the risk stratification model could identify the population at high risk of SPMs, who should be closely followed up and screened by clinicians.

ER: estrogen receptor; PR: progesterone receptor; HER2: human epidermal growth factor receptor 2; TNBC: triple negative breast cancer; SPMs: second primary malignancies; FPBC: first primary breast cancer; APC: annual percentage change; young group: < 40 years old; middle group: 40–64 years old; old group: > 65 years old; HR: hazard ratio; CIR: cumulative incidence rate.