Developing machine learning models for personalized treatment strategies in early breast cancer patients undergoing neoadjuvant systemic therapy based on SEER database

Ren, Jiahui; Li, Yili; Zhou, Jing; Yang, Ting; Jing, Jingfeng; Xiao, Qian; Duan, Zhongxu; Xiang, Ke; Zhuang, Yuchen; Li, Daxue; Gao, Han

doi:10.1038/s41598-024-72385-0

Open access
Published: 27 September 2024

Scientific Reports volume 14, Article number: 22055 (2024)

Abstract

This study aimed to compare the long-term outcomes of breast-conserving surgery plus radiotherapy (BCS + RT) and mastectomy in early breast cancer (EBC) patients who received neoadjuvant systemic therapy (NST), and to construct and validate a machine learning algorithm that could help clinicians formulate personalized treatment strategies for this population. We analyzed data from the Surveillance, Epidemiology, and End Results (SEER) database on EBC patients who underwent BCS + RT or mastectomy after NST (2010–2018). Propensity score matching (PSM) was used to minimize potential bias, and breast cancer-specific survival (BCSS) and overall survival (OS) were compared between the two surgical groups. We also trained and validated six machine learning survival models and developed a cloud-based surgical treatment recommendation system based on the optimal model. Among the 13,958 patients, 9028 (64.7%) underwent BCS + RT and 4930 (35.3%) underwent mastectomy. After PSM, there were 3715 patients in each group. Compared with mastectomy, BCS + RT was associated with significantly better BCSS (p < 0.001) and OS (p < 0.001). Prognostic variables associated with BCSS were used to develop the machine learning models. In both the training and validation cohorts, the random survival forest (RSF) model achieved the highest concordance index (C-index; 0.847 and 0.795), outperforming the other machine learning models, namely Rpart (0.725 and 0.707), Xgboost (0.762 and 0.727), Glmboost (0.748 and 0.788), Survctree (0.764 and 0.766), and Survsvm (0.777 and 0.790), as well as the classical Cox model (0.749 and 0.782). Finally, a web-based prediction tool was built to facilitate clinical application [https://jhren.shinyapps.io/shinyapp1]. After adjusting for other confounders, BCS + RT was associated with improved outcomes in patients with EBC after NST compared with mastectomy. Moreover, the RSF model can predict long-term outcomes for these patients, providing guidance for the choice of surgical method and for postoperative follow-up.

Introduction

Breast cancer remains a significant global health challenge, and advances in treatment continue to shape patient care^1,2. In early breast cancer (EBC), the advent of neoadjuvant systemic therapy (NST) has transformed the range of treatment options and created a dynamic interplay between surgical modalities and adjuvant therapies^3,4. In recent years, several observational studies have consistently reported better survival in patients treated with breast-conserving surgery plus radiotherapy (BCS + RT) than in those treated with mastectomy^5,6,7,8,9,10. However, the survival impact of the type of local therapy after modern NST remains unclear^11,12,13. Clinicians therefore face a critical choice between BCS + RT and mastectomy for patients who have undergone NST.

The present study addresses this clinical dilemma by comparing the long-term outcomes of BCS + RT and mastectomy in EBC patients after NST. We analyzed a cohort of 13,958 patients treated from 2010 to 2018, which also allowed us to describe trends in treatment preferences over this period. With improvements in imaging for treatment response assessment and in the localization of breast lesions, the use of BCS among patients undergoing NST has risen markedly^3,14,15,16,17,18.

Beyond traditional survival analyses, our investigation applies several machine learning approaches to support personalized treatment decisions^19. Using data from the Surveillance, Epidemiology, and End Results (SEER) database, we not only assess overall and breast cancer-specific survival but also introduce a random survival forest (RSF) model. This model, based on ten prognostic variables, predicted long-term outcomes more accurately than the other models we evaluated, offering a practical tool for clinicians facing complex surgical treatment decisions. We also developed a cloud-based recommendation system that visualizes predicted survival curves and is deployed on the internet, bridging the gap between research findings and clinical practice. This user-friendly tool supports dynamic, data-driven decision-making and helps clinicians tailor treatment plans to individual patient profiles.

Methods

Study population

We obtained the dataset from the SEER 17 Registries, Nov 2022 Sub database and performed the analysis using SEER*Stat 8.4.0 software. This retrospective cohort study was approved by the Institutional Review Board of the Chongqing Health Center for Women and Children, which waived the requirement for informed consent because the data are public and contain no personally identifiable patient information. Inclusion criteria were: female patients with invasive ductal carcinoma (IDC) or invasive lobular carcinoma (ILC); T1–3, N0–3, M0 disease; age 20–79 years; and BCS or mastectomy after NST, with treatment initiated within 6 months of diagnosis. Exclusion criteria were: non-beam radiation; BCS without subsequent radiotherapy; preoperative or intraoperative radiotherapy, or an unspecified radiotherapy sequence; and death within 6 months of diagnosis. Patients with incomplete data on neoadjuvant therapy or missing values in the Site-Specific Codes for Neoadjuvant Therapy Treatment Effect were also excluded. The data filtering procedure is illustrated in Fig. 1. The study adhered to the principles of the Declaration of Helsinki (2013 revision).
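
The inclusion and exclusion criteria above can be expressed as a single record-level filter. The sketch below is illustrative only: the `Record` fields and their string codes are hypothetical simplifications, not actual SEER*Stat variable names or codes, and a real extraction would first map SEER codes onto these categories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical, simplified fields; real SEER*Stat exports use their own codes.
    sex: str
    histology: str               # "IDC" or "ILC" are eligible
    t_stage: str                 # "T1".."T3" eligible
    n_stage: str                 # "N0".."N3" eligible
    m_stage: str                 # must be "M0"
    age: int
    surgery: str                 # "BCS" or "mastectomy"
    received_nst: bool           # systemic therapy before (or before and after) surgery
    months_to_treatment: int
    radiation: str               # "beam", "none", or "other" (non-beam)
    rt_sequence: str             # "postoperative", "preoperative", "intraoperative", "unknown", "none"
    died: bool
    survival_months: int
    nst_response: Optional[str]  # None when the treatment-effect code is missing


def eligible(r: Record) -> bool:
    """Apply the inclusion criteria first, then each exclusion criterion."""
    included = (
        r.sex == "female"
        and r.histology in {"IDC", "ILC"}
        and r.t_stage in {"T1", "T2", "T3"}
        and r.n_stage in {"N0", "N1", "N2", "N3"}
        and r.m_stage == "M0"
        and 20 <= r.age <= 79
        and r.surgery in {"BCS", "mastectomy"}
        and r.received_nst
        and r.months_to_treatment <= 6
    )
    if not included:
        return False
    if r.radiation == "other":  # non-beam radiation
        return False
    if r.surgery == "BCS" and r.radiation == "none":  # BCS without radiotherapy
        return False
    if r.radiation == "beam" and r.rt_sequence != "postoperative":
        return False  # preoperative, intraoperative, or unspecified sequence
    if r.died and r.survival_months < 6:  # died within 6 months of diagnosis
        return False
    if r.nst_response is None:  # missing neoadjuvant treatment-effect code
        return False
    return True
```

Mastectomy patients with or without postoperative beam radiotherapy pass the filter, since only BCS requires subsequent radiotherapy.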

Study variables

EBC refers to cancer confined within the breast, with or without involvement of regional lymph nodes, and without spread to distant organs. Patients were categorized as having received NST if they underwent systemic treatment before surgery, or both before and after surgery. Clinical and pathologic data included surgery type, age, race, marital status, median household income, rural–urban status, histological type, tumor site, grade, T stage, N stage, molecular subtype, and response to neoadjuvant therapy. The primary endpoint was breast cancer-specific survival (BCSS), with overall survival (OS) considered as the secondary outcome.

Statistical analysis

We conducted univariate and multivariate analyses to evaluate mortality risk and to identify independent prognostic factors. Multiple imputation by chained equations was used to handle missing values and maintain model stability^20. To minimize potential selection bias and balance baseline characteristics between the two surgical groups, we performed propensity score matching (PSM). Matching variables included age, race, tumor grade, T stage, N stage, estrogen receptor (ER) status, progesterone receptor (PR) status, HER2 status, marital status, and year of diagnosis. Including the year of diagnosis was intended to account for changes in diagnostic technologies and treatment strategies over time, ensuring a more balanced comparison between the BCS and mastectomy cohorts. Matching used a 1:1 nearest-neighbor approach without replacement, with a caliper width of 0.002. Categorical variables were compared using the chi-squared test. Survival was analyzed using Kaplan–Meier (KM) curves and the log-rank test. All tests were two-sided, and a P value below 0.05 was considered statistically significant. All analyses were performed in R version 4.1 and Python 3.11.
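
The matching step can be sketched as greedy 1:1 nearest-neighbor matching without replacement. This sketch assumes the propensity scores have already been estimated (for example, by logistic regression of surgery type on the matching variables) and treats the 0.002 caliper as an absolute difference on the propensity-score scale; the text does not state whether the caliper was applied on the raw, logit, or standard-deviation scale, and the order in which patients are matched (here, highest score first) is likewise an assumption.

```python
from bisect import bisect_left


def match_nearest_neighbor(treated, control, caliper=0.002):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping a patient id (str) to its propensity score.
    Returns (treated_id, control_id) pairs; treated patients with no unused
    control within the caliper stay unmatched.
    """
    # Controls kept sorted by score so the nearest candidates can be found by bisection.
    pool = sorted((score, cid) for cid, score in control.items())
    pairs = []
    # Assumed matching order: treated patients with the highest scores first.
    for tid, score in sorted(treated.items(), key=lambda item: -item[1]):
        if not pool:
            break
        i = bisect_left(pool, (score,))
        # The nearest control lies just below (i - 1) or at/above (i) the score.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pool)]
        j = min(candidates, key=lambda k: abs(pool[k][0] - score))
        if abs(pool[j][0] - score) <= caliper:
            _, cid = pool.pop(j)  # without replacement: a control is used once
            pairs.append((tid, cid))
    return pairs
```

Greedy matching is order-dependent; optimal matching, which minimizes the total distance across all pairs, is a common alternative.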

Machine learning model design

In this study, patients were randomly allocated to training and testing datasets in a 7:3 ratio. Variables independently associated with the primary outcome were identified using least absolute shrinkage and selection operator (LASSO) regression together with univariate and multivariate Cox regression. These variables were then used to construct several machine learning models: RSF, Rpart, Xgboost, Glmboost, Survctree, and Survsvm. Categorical features were one-hot encoded, that is, represented as binary indicator variables. Hyperparameters were optimized by tenfold cross-validation with Bayesian optimization to maximize the concordance index (C-index), defined as the proportion of comparable patient pairs whose predicted risks are ordered consistently with their observed survival times; higher values indicate better discrimination. The RSF hyperparameters were tuned within the following ranges: number of trees (100–500), number of variables randomly sampled at each split (2–10), and minimum node size (2–10). The selected optimal model was further compared with the traditional Cox model. Model performance was assessed using calibration plots, decision curve analysis (DCA), and receiver operating characteristic (ROC) curves. The models were built with the mlr3, randomForestSRC, and scikit-survival packages. Finally, a user-friendly web-based prediction tool was developed to facilitate clinical application.
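
To make the C-index definition concrete, the following sketch computes Harrell's C for right-censored data, together with a minimal one-hot encoder. It is a plain-Python illustration, not the mlr3 or scikit-survival implementation; in particular, pairs with tied follow-up times are simply skipped here, whereas library implementations handle ties in their own ways.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable when patient i had the event and a strictly
    shorter follow-up time than patient j. The pair is concordant when i also
    has the higher predicted risk; tied risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def one_hot(values, categories):
    """Encode each categorical value as a 0/1 indicator vector over `categories`."""
    return [[1 if v == c else 0 for c in categories] for v in values]
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of comparable pairs.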

Results

Description of population

A total of 13,958 patients were included in this analysis. BCS + RT was used in 9028 patients (64.7%) and mastectomy in 4930 patients (35.3%). From 2010 to 2018, BCS rates increased steadily and mastectomy rates decreased correspondingly among EBC patients after NST (Supplementary Fig. S1). The clinicopathological characteristics of the patients are presented in Supplementary Table S1. Most patients were aged 40–59 years (54.2%), followed by those aged 60–79 years (34.9%) and 20–39 years (10.9%). White patients constituted 70% of the cohort. Regarding marital status, 57.3% of patients were non-single and 38.6% were single. Approximately 51.4% of patients reported a relatively favorable economic status, with an annual median household income exceeding $70,000. Overall, 91.6% of patients resided in metropolitan areas. The most common histological type was invasive ductal carcinoma (IDC), representing 90.9% of cases. Most tumors were grade II (33.2%) or grade III/IV (53.5%). The most frequent tumor location was the upper outer quadrant. By molecular subtype, HR+/HER2− tumors accounted for 39.7%, followed by HR−/HER2− (23.7%), HR+/HER2+ (22.8%), and HR−/HER2+ (11.3%). T1, T2, and T3 stages accounted for 20.4%, 60.6%, and 18.9% of patients, respectively, and N0, N1, N2, and N3 stages for 43.0%, 41.6%, 9.9%, and 5.5%, respectively. Regarding the neoadjuvant treatment effect, complete response (CR) was observed in 22.9%, partial response (PR) in 25.7%, overall response (OR) combined in 17.9%, and no response (NR) in 5.2% of cases.

Survival analysis

Univariate and multivariable Cox regression identified the variables significantly associated with OS and BCSS in EBC patients after NST: age, race, marital status, rural–urban status, grade, tumor site, T stage, N stage, molecular subtype, and response to neoadjuvant therapy (Supplementary Table S2). To adjust for confounding, we performed 1:1 PSM on all of these variables as well as the year of diagnosis (Table 1). After PSM, there were 3715 patients in each group, with a mean follow-up of 63.2 ± 32.4 months. KM survival curves showed better BCSS (p < 0.001) and OS (p < 0.001) in the BCS + RT group than in the mastectomy group (Fig. 2). We also performed subgroup analyses by the clinicopathological factors of interest: T stage, N stage, and response to neoadjuvant therapy. In patients with T1 tumors, BCSS did not differ significantly between the two groups (p = 0.11), and the difference in OS was smaller but statistically significant (p = 0.02). Among patients with T2 or T3 tumors, both BCSS and OS differed significantly (p < 0.001). Regardless of lymph node status, BCSS and OS differed significantly between the BCS + RT and mastectomy groups (p < 0.001 or p = 0.046). Finally, across all categories of response to NST (CR, PR, OR, and NR), both BCSS and OS were significantly better in the BCS + RT group than in the mastectomy group (p = 0.01 and p < 0.01 for CR, p < 0.001 and p = 0.003 for PR, p < 0.001 and p < 0.001 for OR, and p = 0.001 and p = 0.002 for NR).
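
The Kaplan–Meier estimator and the two-group log-rank test used in these comparisons can be written in a few lines. This is a minimal sketch for illustration, not the R routines used in the study; the p-value relies on the identity that the upper tail of a chi-square distribution with one degree of freedom at x equals erfc(√(x/2)).

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, S(t)) at each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    o_minus_e, var = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                        # total at risk
        n_a = sum(1 for u, a in zip(times, in_a) if u >= t and a)  # at risk in group A
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # deaths at t
        d_a = sum(1 for u, e, a in zip(times, events, in_a) if u == t and e and a)
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
```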

Table 1 Comparison of baseline characteristics before and after PSM.

Machine learning models

Cox regression (Supplementary Table S2) and LASSO regression (Supplementary Fig. S2) identified surgery type, race, marital status, rural–urban status, grade, tumor site, T stage, N stage, molecular subtype, and response to neoadjuvant therapy as prognostic variables independently associated with the primary outcome, BCSS. Six models (RSF, Rpart, Xgboost, Glmboost, Survctree, and Survsvm) were built on these ten variables. Patients were divided into training and test sets in a 7:3 ratio, and, to ensure model stability, tenfold cross-validation with Bayesian optimization was applied within the training set to select the optimal hyperparameters.
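
The data partitioning and tuning loop can be sketched as follows. The seed, the helper names, and the `cv_score` callable are hypothetical; and because Bayesian optimization needs an external library, this dependency-free sketch substitutes plain random search over the RSF ranges given in the Methods (ntree 100–500, mtry 2–10, nodesize 2–10).

```python
import random


def train_test_split_ids(ids, train_frac=0.7, seed=2024):
    """Randomly split patient ids into training and test sets (7:3 by default)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def kfold(ids, k=10, seed=2024):
    """Yield (train, validation) id lists for k-fold cross-validation."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def random_search(cv_score, n_iter=30, seed=2024):
    """Random search over the RSF ranges (a stand-in for Bayesian optimization).

    cv_score: hypothetical callable mapping a parameter dict to the mean
    validation C-index across the cross-validation folds.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {
            "ntree": rng.randint(100, 500),
            "mtry": rng.randint(2, 10),
            "nodesize": rng.randint(2, 10),
        }
        score = cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Bayesian optimization differs from this sketch by choosing each new candidate from a surrogate model of past scores rather than sampling independently, which usually needs fewer evaluations.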

In the training and validation cohorts, the predictive performance of the RSF model (C-index 0.847 and 0.795) was better than that of the other models: Rpart (0.725 and 0.707), Xgboost (0.762 and 0.727), Glmboost (0.748 and 0.788), Survctree (0.764 and 0.766), and Survsvm (0.777 and 0.790). The tuned RSF hyperparameters were ntree = 277, mtry = 2, and nodesize = 7. The RSF model also outperformed the traditional Cox model (0.749 and 0.782) in both the training and validation cohorts. We then further evaluated the accuracy of the RSF model. Its time-dependent AUCs at 3, 5, and 10 years were 0.878, 0.849, and 0.825, respectively, in the training cohort (Fig. 3A) and 0.822, 0.783, and 0.713, respectively, in the validation cohort (Fig. 3B). Agreement between predicted and observed outcomes was assessed with calibration plots; the 3-, 5-, and 10-year calibration plots showed good agreement between predicted and actual values in both cohorts (Fig. 3C,D). DCA, which quantifies the clinical “net benefit” of a prediction model, indicated that the RSF model provided a better net benefit at most threshold probabilities (Fig. 4A,B). We also ranked the clinical characteristics by their importance in the model: N stage, response to neoadjuvant therapy, molecular subtype, grade, surgery type, and T stage were the six most important predictors of survival (Fig. 5). Furthermore, we determined the optimal cutoff value for the risk score based on the survival curve and used it to stratify patients into high-risk (risk score > 21.56) and low-risk (risk score ≤ 21.56) groups (Supplementary Fig. S3). The KM analysis and log-rank test comparing the high-risk and low-risk groups are presented in Fig. 5 and showed a significant difference between the two groups (p < 0.001) (Supplementary Fig. S4).
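
The “net benefit” plotted in DCA has a simple closed form at each threshold probability p_t: NB = TP/n − (FP/n) × p_t/(1 − p_t). The sketch below assumes a binary outcome (death by a fixed horizon) known for every patient; with censored follow-up, as in this study, the true- and false-positive proportions would instead be estimated with survival methods, so this is a simplification rather than the study's procedure.

```python
def net_benefit(pred_risk, died_by_horizon, threshold):
    """Net benefit of intervening in patients whose predicted risk is >= threshold."""
    n = len(pred_risk)
    tp = sum(1 for p, y in zip(pred_risk, died_by_horizon) if p >= threshold and y)
    fp = sum(1 for p, y in zip(pred_risk, died_by_horizon) if p >= threshold and not y)
    odds = threshold / (1 - threshold)  # weight of a false positive relative to a true positive
    return tp / n - fp / n * odds


def treat_all_net_benefit(died_by_horizon, threshold):
    """Reference strategy: intervene in every patient regardless of predicted risk."""
    prevalence = sum(1 for y in died_by_horizon if y) / len(died_by_horizon)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A decision curve plots these quantities across a range of thresholds, alongside the treat-all curve and the treat-none line (net benefit 0).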

Because the RSF model performed better than the traditional Cox model, it can be used not only to predict the survival function of an individual patient but also to compare predicted outcomes under different treatment plans, giving surgeons a reference when choosing treatment. We therefore deployed the recommendation system on the internet at [https://jhren.shinyapps.io/shinyapp1]; users enter a patient's clinicopathologic characteristics in a browser and click the predict button (Fig. 6).

Discussion

Comparative effectiveness of surgical modalities

Our findings indicate that BCS + RT was associated with better BCSS (p < 0.001) and OS (p < 0.001) than mastectomy in EBC patients after NST. This improvement is consistent with several observational studies that have reported more favorable survival with BCS + RT than with mastectomy^21,22,23. In a meta-analysis of 16 studies with a combined total of 3531 patients, Sun et al. found no significant difference in local recurrence between BCS + RT and mastectomy (p = 0.26), a significant difference in regional recurrence (p = 0.03), and lower distant recurrence (p < 0.01), higher disease-free survival (p < 0.01), and higher OS (p < 0.01) with BCS + RT^24. These differences could reflect several factors, including the biological behavior of residual disease, the effect of radiotherapy, and the overall management strategy associated with BCS + RT. In addition, rigorous and detailed postoperative follow-up of these patients might facilitate early detection and management of recurrences^25. The evolving management of EBC after NST calls for a nuanced understanding of the comparative effectiveness of treatment modalities^26,27. Because our comparison was made after PSM to balance relevant covariates, these findings are less likely to be explained by imbalances in the matched baseline characteristics.

In this analysis, several factors were associated with survival outcomes: age, race, marital status, rural–urban status, grade, tumor site, T stage, N stage, molecular subtype, and response to neoadjuvant therapy, in accordance with several previous studies^22,28,29. Considering these factors, we conducted subgroup analyses by T stage, N stage, and response to neoadjuvant therapy. First, among patients with T1 tumors, BCSS did not differ significantly between BCS + RT and mastectomy. Although OS did differ, the difference was less pronounced than those observed for T2 and T3 tumors. We speculate that T1 tumors are generally smaller, with limited local invasion, so both surgical approaches may control the disease effectively; as tumor size and burden increase, BCS + RT may provide better local control and thus better long-term outcomes. We acknowledge that NST is not routinely recommended for patients with node-negative T1 tumors. However, such patients may receive NST because of specific tumor biology, patient preference, or other clinical factors, and including them reflects the diversity of treatment decisions in real-world practice. According to NCCN guidelines^30, neoadjuvant therapy may be considered in patients with triple-negative or HER2-positive breast cancer even when the tumor is smaller than 2 cm and the axillary lymph nodes are negative; in these subtypes in particular, treatment response provides important prognostic information and guides adjuvant therapy for the individual patient. Second, significant differences between the two surgical approaches were observed in every N-stage subgroup. This may reflect the influence of lymph node involvement on surgical choice, and BCS + RT might also control nodal disease more comprehensively, especially in patients with higher N stages, resulting in better survival. Lastly, the survival advantage of BCS + RT was consistent across all categories of response to NST, including patients who did not achieve a complete response, which suggests that surgical options should be weighed beyond the extent of tumor response alone. Understanding the specific scenarios in which one surgical modality may confer a survival advantage over the other enables a more nuanced and personalized approach to breast cancer management.

Machine learning augmentation

The choice between BCS + RT and mastectomy remains complex and is influenced by clinical factors, patient preferences, and evolving therapeutic options. However, accurate prediction models to support this choice are lacking in clinical practice, so a more accurate and powerful model is needed. To our knowledge, the current study is the largest to analyze the choice of surgical procedure in EBC patients after NST. Beyond traditional survival analyses, we developed six machine learning models to predict long-term outcomes in these patients. The RSF model, based on ten prognostic variables identified through Cox and LASSO regression, outperformed the other machine learning models (Rpart, Xgboost, Glmboost, Survctree, and Survsvm) in terms of C-index in both the training and validation cohorts.

The RSF algorithm, first proposed in 2008^31, showed better predictive performance than the classical Cox model in our data, highlighting the potential of machine learning to improve prognostic accuracy. RSF extends traditional survival analysis through ensemble learning: many survival trees are grown, each on a bootstrap sample of patients and with a random subset of candidate variables considered at each split, and the final prediction aggregates the predictions of the individual trees. This randomization yields a diverse set of trees whose combined prediction is more robust and less prone to overfitting than that of a single tree. Because RSF can handle high-dimensional data, capture non-linear relationships, and account for complex interactions, it is a valuable tool for clinicians navigating complex breast cancer treatment decisions^32.
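
The aggregation step can be illustrated with the formulation of Ishwaran et al. (2008): each terminal node h has a Nelson–Aalen cumulative hazard H_h(t) = Σ over event times t_l ≤ t of d_l / Y_l, where d_l is the number of deaths and Y_l the number at risk at t_l; a patient's ensemble cumulative hazard is the average of the terminal-node estimates across trees, and the ensemble mortality sums it over the distinct event times. In this sketch, tree growing (bootstrap sampling and log-rank splitting) is omitted, and each tree is a hypothetical callable returning the in-bag follow-up data of the node a patient falls into. The text does not state whether the study's risk score is this mortality quantity.

```python
from collections import Counter


def nelson_aalen(times, events, t):
    """Nelson-Aalen cumulative hazard at time t within one terminal node."""
    deaths = Counter(u for u, e in zip(times, events) if e)
    hazard = 0.0
    for u in sorted(deaths):
        if u > t:
            break
        at_risk = sum(1 for v in times if v >= u)
        hazard += deaths[u] / at_risk
    return hazard


def ensemble_chf(trees, x, t):
    """Average terminal-node cumulative hazard over trees.

    Each tree is a hypothetical callable: x -> (times, events) of x's terminal node.
    """
    total = 0.0
    for tree in trees:
        node_times, node_events = tree(x)
        total += nelson_aalen(node_times, node_events, t)
    return total / len(trees)


def ensemble_mortality(trees, x, event_times):
    """Ensemble cumulative hazard summed over the distinct event times."""
    return sum(ensemble_chf(trees, x, t) for t in event_times)
```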

In addition, predictor importance can be calculated from the model to identify the factors most closely related to prognosis in EBC after NST, which might facilitate surgical management and reduce the medical burden. In our model, N stage, response to neoadjuvant therapy, molecular subtype, grade, surgery type, and T stage, in that order, played the most important roles in long-term prognosis, consistent with prior studies^22,28,29.
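
Variable importance of this kind can be sketched as permutation importance: shuffle one predictor, re-score, and record the drop in C-index. randomForestSRC computes its permutation importance on out-of-bag data within each tree, so this whole-dataset version is a simplified stand-in, and `predict_risk` is a hypothetical fitted-model callable.

```python
import random


def c_index(times, events, risks):
    """Harrell's C; pairs with tied follow-up times are skipped."""
    num = den = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                den += 1
                if risks[i] > risks[j]:
                    num += 1.0
                elif risks[i] == risks[j]:
                    num += 0.5
    return num / den


def permutation_importance(predict_risk, rows, times, events, feature, n_repeats=20, seed=0):
    """Mean drop in C-index after shuffling one feature across patients.

    predict_risk: hypothetical callable mapping a feature dict to a risk score.
    rows: list of feature dicts, one per patient.
    """
    rng = random.Random(seed)
    base = c_index(times, events, [predict_risk(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [{**r, feature: v} for r, v in zip(rows, column)]
        drops.append(base - c_index(times, events, [predict_risk(r) for r in permuted]))
    return sum(drops) / n_repeats
```

A feature the model ignores scores exactly zero, because shuffling it leaves every prediction unchanged.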

The RSF risk stratification enables the evaluation of a patient's prognosis according to their clinicopathological profile. The high-risk cut-off (risk score > 21.56) was determined using the calibration curve to identify patients with lower predicted survival rates. This cut-off serves to distinguish between patients with different survival probabilities and to provide actionable information for surgeons and patients when considering surgical options. When applying the model, if the 'surgical type' variable for a high-risk patient is changed to BCS + RT, the RSF model may predict longer survival or move them to a lower-risk category. This indicates that opting for BCS + RT instead of mastectomy could potentially benefit these patients. Through individualized survival probability curves, the prognosis is presented with greater precision, providing a more detailed perspective on patients' outcomes. However, this change in surgical approach might not necessarily lower the patient's predicted risk score, as it also depends on other clinicopathologic features.
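
The what-if use of the model amounts to re-scoring the same patient with only the surgery variable changed and then applying the 21.56 cutoff. In the sketch below, `toy_risk` is a made-up scoring function standing in for the fitted RSF, and the feature names are hypothetical; such comparisons are model-based predictions under an altered covariate, not estimates of a causal treatment effect.

```python
def risk_group(score, cutoff=21.56):
    """High risk if the score exceeds the cutoff reported in the Results."""
    return "high" if score > cutoff else "low"


def compare_surgery(predict_risk, patient, options=("BCS + RT", "mastectomy")):
    """Re-score one patient under each surgery option, holding other features fixed."""
    results = {}
    for option in options:
        score = predict_risk({**patient, "surgery": option})
        results[option] = (score, risk_group(score))
    return results


def toy_risk(profile):
    """Made-up scoring rule for illustration only; not the fitted RSF."""
    penalty = 3.0 if profile["surgery"] == "mastectomy" else 0.0
    return 20.0 + penalty + 2.0 * profile["n_stage"]
```

With `toy_risk`, a patient with n_stage 0 moves from the high-risk to the low-risk group when the surgery variable is switched to BCS + RT, while a patient with n_stage 1 stays high-risk under both options, mirroring the point that the change depends on the other clinicopathologic features.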

Web-based recommendation system

To bridge the gap between research findings and real-world clinical applications, we developed a cloud-based recommendation system. This system, accessible through a web interface, facilitates dynamic and data-driven decision-making by visualizing survival curves for each treatment plan. By deploying this system on the internet, we empower clinicians to make informed and personalized treatment decisions based on individual patient profiles.

Study limitation

This retrospective study relies on data extracted from SEER. Although SEER provides a wealth of information, it has inherent limitations: variability in data collection methods, potential coding errors, and the absence of certain clinical variables may introduce bias or limit the granularity of our analysis. Despite the use of PSM to mitigate confounding, selection bias may persist. Unmeasured or inadequately controlled variables, such as menopausal status or detailed information on the neoadjuvant therapy, could affect the observed outcomes, and the retrospective design makes it difficult to account fully for all relevant clinical variables. In addition, the study predominantly includes patients from the United States, which may limit the applicability of the results to other healthcare settings. Furthermore, the training and test sets come from the same database, which may introduce overlap and limit the assessment of the model's generalizability. The model's predictions should therefore be interpreted with caution and require external validation in other cohorts to confirm their reliability and generalizability. We strongly recommend further validation studies before the web-based prediction model is applied to other patient populations.

Conclusions

Our findings carry substantial clinical implications for EBC patients after NST, providing evidence that BCS + RT is associated with better survival than mastectomy. The integration of machine learning models, particularly RSF, adds a new dimension to prognostic prediction and offers clinicians a powerful tool for personalized treatment decisions. Future work may refine and expand the machine learning model, incorporate additional relevant variables, and validate its performance in diverse patient populations. Further research could also explore how emerging therapies and evolving treatment paradigms affect the comparative effectiveness of surgical modalities in EBC.

In conclusion, our study contributes to the evolving discourse on breast cancer management by providing robust evidence in favor of BCS + RT over mastectomy in EBC patients post-NST. The integration of machine learning augments prognostic predictions, and the web-based recommendation system facilitates the translation of research findings into actionable insights for clinicians.

Data availability

The dataset for this study can be obtained from the SEER database (https://seer.cancer.gov/).

Abbreviations

BCS + RT: Breast-conserving surgery plus radiotherapy
BCSS: Breast cancer-specific survival
C-index: Concordance index
DCA: Decision curve analysis
EBC: Early breast cancer
HER2: Human epidermal growth factor receptor 2
HR: Hormone receptor
KM: Kaplan–Meier
LASSO: Least absolute shrinkage and selection operator
NST: Neoadjuvant systemic therapy
OS: Overall survival
PSM: Propensity score matching
ROC: Receiver operating characteristic
RSF: Random survival forest
SEER: Surveillance, Epidemiology, and End Results

References

Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73(1), 17–48 (2023).
Article PubMed Google Scholar
Loibl, S., Poortmans, P., Morrow, M., Denkert, C. & Curigliano, G. Breast cancer. Lancet. 397(10286), 1750–1769 (2021).
Article CAS PubMed Google Scholar
Mieog, J. S., van der Hage, J. A. & van de Velde, C. J. Neoadjuvant chemotherapy for operable breast cancer. Br. J. Surg. 94(10), 1189–1200 (2007).
Article CAS PubMed Google Scholar
Wolmark, N., Wang, J., Mamounas, E., Bryant, J. & Fisher, B. Preoperative chemotherapy in patients with operable breast cancer: Nine-year results from National Surgical Adjuvant Breast and Bowel Project B-18. J. Natl. Cancer Inst. Monogr. 30, 96–102 (2001).
Article Google Scholar
Wrubel, E., Natwick, R. & Wright, G. P. Breast-conserving therapy is associated with improved survival compared with mastectomy for early-stage breast cancer: A propensity score matched comparison using the national cancer database. Ann. Surg. Oncol. 28(2), 914–919 (2021).
Article PubMed Google Scholar
Kim, H. et al. Survival of breast-conserving surgery plus radiotherapy versus total mastectomy in early breast cancer. Ann. Surg. Oncol. 28(9), 5039–5047 (2021).
Article PubMed Google Scholar
de Boniface, J., Szulkin, R. & Johansson, A. L. V. Survival after breast conservation vs mastectomy adjusted for comorbidity and socioeconomic status: A Swedish national 6-year follow-up of 48 986 women. JAMA Surg. 156(7), 628–637 (2021).
Article PubMed PubMed Central Google Scholar
Wang, L. et al. Comparisons of breast conserving therapy versus mastectomy in young and old women with early-stage breast cancer: Long-term results using propensity score adjustment method. Breast Cancer Res. Treat. 183(3), 717–728 (2020).
Article PubMed Google Scholar
Lazow, S. P., Riba, L., Alapati, A. & James, T. A. Comparison of breast-conserving therapy vs mastectomy in women under age 40: National trends and potential survival implications. Breast J. 25(4), 578–584 (2019).
Article PubMed Google Scholar
Corradini, S. et al. Mastectomy or breast-conserving therapy for early breast cancer in real-life clinical practice: Outcome comparison of 7565 cases. Cancers (Basel). 11(2), 160 (2019).
Article CAS PubMed PubMed Central Google Scholar
Agarwal, S., Pappas, L., Neumayer, L., Kokeny, K. & Agarwal, J. Effect of breast conservation therapy vs mastectomy on disease-specific survival for early-stage breast cancer. JAMA Surg. 149(3), 267–274 (2014).
Article PubMed Google Scholar
van Maaren, M. C. et al. 10 year survival after breast-conserving surgery plus radiotherapy compared with mastectomy in early breast cancer in the Netherlands: A population-based study. Lancet Oncol. 17(8), 1158–1170 (2016).
Lagendijk, M. et al. Breast conserving therapy and mastectomy revisited: Breast cancer-specific survival and the influence of prognostic factors in 129,692 patients. Int. J. Cancer. 142(1), 165–175 (2018).
Barranger, E. et al. Effect of neoadjuvant chemotherapy on the surgical treatment of patients with locally advanced breast cancer requiring initial mastectomy. Clin. Breast Cancer. 15(5), e231–e235 (2015).
Vugts, G. et al. Patterns of care in the administration of neo-adjuvant chemotherapy for breast cancer. A population-based study. Breast J. 22(3), 316–321 (2016).
Gobardhan, P. D. et al. The role of radioactive iodine-125 seed localization in breast-conserving therapy following neoadjuvant chemotherapy. Ann. Oncol. 24(3), 668–673 (2013).
Petruolo, O. et al. How often does modern neoadjuvant chemotherapy downstage patients to breast-conserving surgery? Ann. Surg. Oncol. 28(1), 287–294 (2021).
Nakamura, S. et al. 3D-MR mammography-guided breast conserving surgery after neoadjuvant chemotherapy: Clinical results and future perspectives with reference to FDG-PET. Breast Cancer. 8(4), 351–354 (2001).
Rajula, H. S. R., Verlato, G., Manchia, M., Antonucci, N. & Fanos, V. Comparison of conventional statistical methods with machine learning in medicine: Diagnosis, drug development, and treatment. Medicina (Kaunas). 56(9), 455 (2020).
van Buuren, S. & Groothuis-Oudshoorn, K. mice: Multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–67 (2011).
Arlow, R. L. et al. Breast-conservation therapy after neoadjuvant chemotherapy does not compromise 10-year breast cancer-specific mortality. Am. J. Clin. Oncol. 41(12), 1246–1251 (2018).
Simons, J. M. et al. Disease-free and overall survival after neoadjuvant chemotherapy in breast cancer: Breast-conserving surgery compared to mastectomy in a large single-centre cohort study. Breast Cancer Res. Treat. 185(2), 441–451 (2021).
Gwark, S. et al. Survival after breast-conserving surgery compared with that after mastectomy in breast cancer patients receiving neoadjuvant chemotherapy. Ann. Surg. Oncol. 30(5), 2845–2853 (2023).
Sun, Y., Liao, M., He, L. & Zhu, C. Comparison of breast-conserving surgery with mastectomy in locally advanced breast cancer after good response to neoadjuvant chemotherapy: A PRISMA-compliant systematic review and meta-analysis. Medicine (Baltimore). 96(43), e8367 (2017).
Montgomery, D. A., Krupa, K. & Cooke, T. G. Follow-up in breast cancer: Does routine clinical examination improve outcome? A systematic review of the literature. Br. J. Cancer. 97(12), 1632–1641 (2007).
Heil, J. et al. Eliminating the breast cancer surgery paradigm after neoadjuvant systemic therapy: Current evidence and future challenges. Ann. Oncol. 31(1), 61–71 (2020).
Criscitiello, C. et al. Impact of neoadjuvant chemotherapy and pathological complete response on eligibility for breast-conserving surgery in patients with early breast cancer: A meta-analysis. Eur. J. Cancer. 97, 1–6 (2018).
Angelucci, D. et al. Long-term outcome of neoadjuvant systemic therapy for locally advanced breast cancer in routine clinical practice. J. Cancer Res. Clin. Oncol. 139(2), 269–280 (2013).
Mamounas, E. P. et al. Predictors of locoregional recurrence after neoadjuvant chemotherapy: Results from combined analysis of National Surgical Adjuvant Breast and Bowel Project B-18 and B-27. J. Clin. Oncol. 30(32), 3960–3966 (2012).
National Comprehensive Cancer Network. Breast Cancer (Version 3.2024). BINV-M 1 OF 2. https://www.nccn.org/professionals/physician_gls/pdf/breast.pdf.
Ishwaran, H., Kogalur, U. B., Blackstone, E. H. & Lauer, M. S. Random survival forests. Ann. Appl. Stat. 2(3), 841–860 (2008).
Taylor, J. M. Random survival forests. J. Thorac. Oncol. 6(12), 1974–1975 (2011).


Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Breast and Thyroid Surgery Department, Chongqing Health Center for Women and Children, Chongqing, China
Jiahui Ren, Yili Li, Jing Zhou, Ting Yang, Jingfeng Jing, Qian Xiao, Zhongxu Duan, Ke Xiang, Yuchen Zhuang, Daxue Li & Han Gao
Breast and Thyroid Surgery Department, Women and Children’s Hospital of Chongqing Medical University, Chongqing, China
Jiahui Ren, Yili Li, Jing Zhou, Ting Yang, Jingfeng Jing, Qian Xiao, Zhongxu Duan, Ke Xiang, Yuchen Zhuang, Daxue Li & Han Gao


Contributions

J.R.: Data curation (lead); formal analysis (lead); methodology (lead); software (lead); validation (equal); visualization (lead); writing – original draft (lead). Y.L.: Formal analysis (supporting); methodology (supporting); validation (equal). J.Z.: Data curation (supporting); software (supporting). T.Y.: Formal analysis (supporting); validation (supporting). J.J.: Validation (supporting); writing – original draft (supporting). Q.X.: Validation (supporting); writing – original draft (supporting). Z.D.: Methodology (supporting). K.X.: Software (supporting). Y.Z.: Formal analysis (supporting); validation (supporting). D.L.: Conceptualization (equal); project administration (supporting); writing – review and editing (equal). H.G.: Conceptualization (equal); project administration (lead); writing – review and editing (equal). All authors contributed to the article and approved the submitted version.

Corresponding authors

Correspondence to Daxue Li or Han Gao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval

The study was exempted from Institutional Review Board review because it used de-identified, previously collected, publicly available data.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Ren, J., Li, Y., Zhou, J. et al. Developing machine learning models for personalized treatment strategies in early breast cancer patients undergoing neoadjuvant systemic therapy based on SEER database. Sci Rep 14, 22055 (2024). https://doi.org/10.1038/s41598-024-72385-0


Received: 13 April 2024
Accepted: 06 September 2024
Published: 27 September 2024
Version of record: 27 September 2024
DOI: https://doi.org/10.1038/s41598-024-72385-0