Introduction

Neoadjuvant chemotherapy (NAC) has emerged as a cornerstone in the management of clinically node-positive (cN+) breast cancer as it not only increases the rate of breast conservation but also enables residual disease‐guided systemic therapy, which has been linked to improved prognostic outcomes1,2,3,4,5. Despite these advances in systemic therapy for breast cancer, a consensus on the optimal post‐NAC axillary management remains elusive in such patients6,7,8,9. Institutional variability in the indications for and extent of axillary surgery reflects ongoing challenges in the accurate assessment of residual nodal disease following NAC, a critical factor given the risks of lymphedema and upper‐limb dysfunction that are associated with axillary procedures.

Preoperative evaluation of nodal status in patients with cN+ breast cancer after NAC relies predominantly on imaging modalities; however, a meta-analysis by Samiei et al. demonstrated that imaging-based assessments for pathological node negativity (ypN0) are limited by suboptimal diagnostic accuracy10. Although several predictive models that incorporate clinicopathological parameters (such as the tumor stage, biological subtype, and clinical response) have been developed, they have been primarily evaluated for their predictive performance rather than their diagnostic precision11,12,13. Given that safe omission of axillary surgery depends on reliable identification of ypN0, further validation of these models is imperative14,15.

Recent clinical efforts have focused on predicting breast pathological complete response (breast pCR) using needle biopsy of the tumor bed after NAC, as part of investigations into omission of surgery for patients who respond well to NAC16,17,18,19. Several studies have reported that image-guided needle biopsy can accurately diagnose breast pCR with high sensitivity and specificity. As this approach gains evidence, post-NAC needle biopsy may become a standard tool for determining eligibility for omission of surgery in the near future. This evolving ability to assess ypT stage preoperatively also strengthens its role as a clinically accessible predictor of axillary response.

In parallel, cohort studies involving patients with cN+ breast cancer have shown that breast pCR correlates strongly with pathological node negativity, especially in hormone receptor (HR)-negative and HER2-positive subtypes20,21. Studies have also reported that ypT0 is more strongly associated with ypN0 than is ypTis22,23. These findings highlight the clinical relevance of ypT stage in predicting nodal response and support the development of a predictive model that incorporates ypT stage and other clinicopathological factors to improve patient selection for axillary surgery omission.

Given this evolving landscape, a predictive model incorporating ypT stage alongside other clinicopathological factors may have significant clinical utility in identifying cN+ patients who could safely undergo less invasive axillary surgery. In this study, we developed and externally validated a nomogram incorporating ypT stage to predict ypN0 in cN+ breast cancer patients treated with NAC.

Materials and methods

Study population

This observational multicenter study was conducted with ethical approval obtained from the institutional review boards of all participating institutions. We included consecutive patients with cN+ breast cancer who underwent NAC followed by definitive surgery from January 2006 to September 2024. cN+ was defined as histopathologically confirmed involvement of axillary lymph nodes based on fine-needle aspiration or core-needle biopsy. Patients with distant metastases, those who did not receive an anthracycline and/or taxane-based regimen, and cases with indeterminate HR and HER2 status were excluded. The primary objective of this study was to identify factors predictive of conversion to ypN0 following NAC and to construct a nomogram for estimation of the ypN0 probability. The training cohort consisted of patients from Hiroshima University Hospital, and external validation was performed with data from patients at JA Onomichi General Hospital, Higashihiroshima Medical Center, Hiroshima Prefectural Hospital, Kure Medical Center, JA Hiroshima General Hospital, and Asa Citizens Hospital. Because this study entailed retrospective analysis of hospital database records, the requirement for written informed consent was waived; study details were publicly disclosed, and an opt-out option was provided to patients. All methods were performed in accordance with the relevant guidelines and regulations.

Data collection and analysis

Patient data were extracted from the hospital databases and electronic medical records of each participating institution. Collected clinical variables included age and preoperative T and N classifications, with staging determined according to the 7th edition of the American Joint Committee on Cancer staging system24. Pathological data from biopsy specimens were recorded, including the tumor histological type, grade, and the expression status of hormone receptors (estrogen receptor [ER] and progesterone receptor [PR]) and HER2. ER and PR statuses were evaluated via immunohistochemistry (IHC), and cases were deemed HR positive if ≥ 1% of tumor cells exhibited staining25. HER2 status was considered positive if IHC yielded a score of 3+ or if in-situ hybridization demonstrated a gene amplification ratio greater than 2 (ref. 26). Post-NAC clinical nodal status (ycN) was primarily assessed by axillary ultrasound, with the disappearance of lymph node enlargement or normalization of nodal architecture (restoration of the fatty hilum and cortical thinning) interpreted as ycN0 (refs. 27, 28). MRI or PET-CT findings were used as supplementary modalities according to each institution’s policy to support the ultrasound-based evaluation. Final ypT and ypN statuses were determined from surgical pathology reports. According to ypT stage, patients were classified as having residual invasive carcinoma (non-ypT0/ypTis), noninvasive carcinoma only (ypTis), or complete disappearance of the tumor (ypT0), while their nodal status was categorized as ypN0 or ypN+ based on the absence or presence of residual lymph node metastases29.

Neoadjuvant chemotherapy

NAC consisted of sequential anthracycline- and taxane-based regimens administered according to institutional protocols. Anthracycline regimens included AC or EC, and taxane regimens consisted of weekly paclitaxel or 3-weekly docetaxel, given sequentially in either order. Dose-dense schedules were also permitted, including ddAC/EC followed by weekly or q2-weekly paclitaxel. For HER2-positive disease, trastuzumab was co-administered with the taxane component, and pertuzumab was added following its approval for early-stage breast cancer. For triple-negative breast cancer, the KEYNOTE-522 regimen—comprising paclitaxel plus carboplatin with pembrolizumab, followed by pembrolizumab with anthracycline-based chemotherapy (AC or EC)—was used for stage II or higher cases after regulatory approval. Detailed regimen definitions, dosing, and schedules are provided in Supplementary Table S1.

Endpoints and statistical analysis

Baseline characteristics of patients in the training and validation cohorts were compared using the Wilcoxon rank-sum test for continuous variables and the chi-square test for categorical variables. In the training cohort, logistic regression analysis was performed with ypN0 as the dependent variable. Variables that were statistically significant, as well as those that were not statistically significant but deemed clinically relevant, were subsequently incorporated into the nomogram30. The diagnostic performance of the nomogram was evaluated based on its discriminative ability, calibration, decision curve analysis (DCA), and diagnostic accuracy at various cutoff thresholds31. Discrimination ability was assessed by constructing receiver operating characteristic (ROC) curves and calculating the area under the curve (AUC) with 95% confidence intervals (CIs). Internal validation was performed using bootstrap resampling (n = 1000) to obtain bias-corrected AUC estimates. Calibration was evaluated using calibration curves and the Hosmer–Lemeshow goodness-of-fit test. Diagnostic accuracy at various thresholds was also determined, and the nomogram was further validated using an external cohort. All statistical analyses were performed using R version 4.3.2 (http://www.r-project.org; R Foundation for Statistical Computing, Vienna, Austria), with a two-sided significance level set at 5%.
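The discrimination analysis described above can be illustrated with a minimal Python sketch (the study itself used R). It computes the AUC as the probability that a randomly chosen ypN0 case receives a higher predicted probability than a randomly chosen ypN+ case, and derives a percentile bootstrap confidence interval. The function names, the toy data, and the choice of the percentile method are assumptions made for illustration; the exact bootstrap procedure used to obtain the bias-corrected AUC is not described here and may differ.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resampling patients with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (synthetic, not study data): 1 = ypN0, 0 = ypN+.
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]

toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

With only eight synthetic cases the interval is very wide; the sketch is meant to show the mechanics rather than to reproduce any reported value.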

Results

Characteristics of the training and validation cohorts

In this study, we used the data of 330 patients from Hiroshima University Hospital as the training cohort and those of 279 patients from other institutions as the validation cohort. Table 1 summarizes the baseline characteristics of the two cohorts. No significant differences were observed between the training and validation groups in terms of age (p = 0.74), histology (p = 0.23), clinical tumor size (p = 0.20), hormone receptor status (p = 0.13), ycN status (p = 0.93), pathological T status (p = 0.28), or pathological N status (p = 0.31). However, clinical nodal status (p = 0.032), histological grade (p < 0.001), and HER2 status (p < 0.001) differed significantly between the two groups. Specifically, the training cohort contained a higher proportion of patients with grade 3 tumors, whereas the validation cohort contained a higher proportion of patients with HER2-positive tumors.

Table 1 Characteristics of the training and validation cohorts.

Logistic regression analysis and nomogram construction for prediction of ypN0 in the training cohort

Based on the logistic regression results in the training cohort (Table 2) and on clinical relevance, we selected five factors (ypT stage, clinical nodal status, HR status, HER2 status, and ycN status) to construct a nomogram (Fig. 1). In this nomogram, each predictor is allocated a score on the uppermost axis in line with the value on its own axis. Summing these points across all five variables gives a total score on the “Total Score” axis; this score can be translated into a predicted probability of ypN0 by using the corresponding scale at the bottom of the nomogram. In clinical practice, a physician would identify the patient’s category for each of the five factors, read off the associated score for each category, sum the scores to obtain the patient’s total score, and draw a vertical line from the “Total Score” axis to the probability scale to determine the estimated probability of ypN0.
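The scoring procedure described above can be sketched in Python as follows. The sketch assumes the common construction in which the predictor with the largest effect range spans 0 to 100 points and the total score is mapped back to the logistic linear predictor. The intercept and coefficients below are hypothetical placeholders, not the fitted values from Table 2, and the function names are illustrative only.

```python
import math

# HYPOTHETICAL log-odds coefficients, for illustration only.
# These are NOT the fitted values reported in Table 2.
INTERCEPT = -2.0
COEFS = {
    "ypT": {"residual": 0.0, "ypTis": 1.2, "ypT0": 2.4},
    "cN": {"cN2-3": 0.0, "cN1": 0.6},
    "HR": {"positive": 0.0, "negative": 0.8},
    "HER2": {"negative": 0.0, "positive": 0.7},
    "ycN": {"ycN+": 0.0, "ycN0": 0.9},
}


def max_effect_range(coefs):
    """Largest spread of coefficients within any single predictor."""
    return max(max(lv.values()) - min(lv.values()) for lv in coefs.values())


def points_for(variable, level, coefs=COEFS):
    """Points for one category; the widest predictor spans 0 to 100."""
    levels = coefs[variable]
    scale = 100.0 / max_effect_range(coefs)
    return (levels[level] - min(levels.values())) * scale


def total_points(patient, coefs=COEFS):
    """Sum of points; `patient` must give a level for every predictor."""
    return sum(points_for(var, lvl, coefs) for var, lvl in patient.items())


def probability_from_points(total, coefs=COEFS, intercept=INTERCEPT):
    """Map a total score to the linear predictor, then to a probability."""
    baseline = intercept + sum(min(lv.values()) for lv in coefs.values())
    linear_predictor = baseline + total * max_effect_range(coefs) / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Example patient profile (same categories as the example in the Fig. 1 legend).
patient = {"ypT": "ypT0", "cN": "cN1", "HR": "negative",
           "HER2": "positive", "ycN": "ycN0"}
score = total_points(patient)
prob_ypn0 = probability_from_points(score)
```

Because the coefficients are placeholders, the resulting score and probability will not match the values read from the published nomogram; only the mapping from categories to points to probability is illustrated.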

Table 2 Logistic regression analysis for prediction of ypN0 in the training cohort.
Fig. 1

Nomogram for prediction of ypN0. The nomogram was developed using a training cohort of 330 patients and incorporates five variables: ypT stage, cN status, HR status, HER2 status, and ycN status. Each variable is assigned a corresponding score, which is summed to obtain the total score. This total is mapped to a linear predictor and the predicted probability of ypN0. As an example, for a patient with ypT0, cN1, HR-negative, HER2-positive, and ycN0 status, the total score is approximately 210 points, corresponding to a predicted probability of ypN0 of ≈ 0.92.

Table 3 Diagnostic performance of the nomogram for prediction of ypN0 at various risk thresholds.

Internal validation of the nomogram

ROC curve analysis for the diagnostic value of the nomogram yielded an AUC of 0.866 (95% CI: 0.829–0.903), with a bootstrap-corrected AUC of 0.858 (Fig. 2A). The calibration curve demonstrated close agreement between observed and predicted values (Fig. 3A), and the Hosmer–Lemeshow test revealed no significant lack of fit (χ² = 8.19, degrees of freedom [df] = 7, p = 0.32). DCA visualized the model’s net benefit across various risk thresholds (Fig. 4A). Table 3A summarizes the diagnostic performance of the nomogram at various risk thresholds for the prediction of ypN0 in the training cohort. At thresholds of 0.9, 0.8, 0.7, and 0.6, the sensitivity was 0.42, 0.51, 0.54, and 0.66, respectively, while the specificity was 0.98, 0.96, 0.94, and 0.86, respectively. The positive predictive value (PPV) exceeded 0.90 at thresholds of 0.9 (0.96), 0.8 (0.93), and 0.7 (0.91).
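As a minimal illustration of how threshold-based metrics of this kind are obtained, the following Python sketch classifies a patient as predicted ypN0 when the predicted probability is at or above the threshold and then computes sensitivity, specificity, and PPV. The function name, the "at or above" convention, and the toy data are assumptions for illustration; the toy data are synthetic and unrelated to Table 3.

```python
def diagnostic_metrics(labels, probs, threshold):
    """Sensitivity, specificity, and PPV for predicting ypN0 (label 1).

    A patient is classified as predicted ypN0 when prob >= threshold.
    A metric is None when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted_ypn0 = p >= threshold
        if predicted_ypn0 and y == 1:
            tp += 1
        elif predicted_ypn0 and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Toy data (synthetic, not study data): 1 = ypN0, 0 = ypN+.
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_probs = [0.95, 0.85, 0.75, 0.65, 0.30, 0.72, 0.40, 0.35, 0.20, 0.10]

metrics_at_07 = diagnostic_metrics(toy_labels, toy_probs, 0.7)
```

Raising the threshold generally trades sensitivity for specificity and PPV, which is the pattern seen across the thresholds reported above.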

Fig. 2

ROC curves of the nomogram for prediction of ypN0. The ROC curves demonstrate the nomogram’s discriminative performance, with the x-axis representing the false positive rate and the y-axis representing the true positive rate.

Fig. 3

Calibration plots of the nomogram for prediction of ypN0. (A) Calibration plot for the training cohort. The apparent and bias-corrected calibration curves (bootstrap resampling) are shown. The dashed line indicates perfect calibration. The Hosmer–Lemeshow test revealed no significant lack of fit (χ² = 8.19, df = 7, p = 0.32). (B) Calibration plot for the validation cohort, showing similar results (χ² = 12.07, df = 8, p = 0.15).

Fig. 4

DCA of the nomogram for prediction of ypN0. The gray and black curves correspond to the “treat-all” and “treat-none” strategies, respectively.

External validation of the nomogram

External validation of the nomogram was performed using the validation cohort. The results of the ROC curve analysis, calibration curve, and DCA are presented in Figs. 2B, 3B, and 4B, respectively. The AUC was 0.855 (95% CI: 0.811–0.899). The calibration curve demonstrated good agreement between predicted and observed probabilities, and the Hosmer–Lemeshow test revealed no significant lack of fit (χ² = 12.07, df = 8, p = 0.15). DCA was used to visualize the model’s net benefit across various risk thresholds. In a post hoc analysis, the diagnostic performance of the nomogram was evaluated across different probability thresholds to assess its predictive accuracy for ypN0. Table 3B summarizes the diagnostic performance of the nomogram at various risk thresholds for the prediction of ypN0 in the validation cohort. At thresholds of 0.9, 0.8, 0.7, and 0.6, the sensitivity was 0.50, 0.57, 0.59, and 0.73, respectively, while the specificity was 0.95, 0.94, 0.94, and 0.87, respectively. The positive predictive value (PPV) remained consistently high, exceeding 0.90 at thresholds of 0.9 (0.93), 0.8 (0.93), and 0.7 (0.93).
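The net benefit plotted in a decision curve can be sketched as follows, using the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. In this sketch the "intervention" is assumed to be classifying a patient as ypN0; the function names and toy data are illustrative assumptions, and the data are synthetic.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of the model at a given threshold probability.

    NB = TP/n - FP/n * threshold / (1 - threshold),
    where a patient is classified positive (predicted ypN0) if prob >= threshold.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit when every patient is classified positive."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def net_benefit_treat_none():
    """Net benefit when no patient is classified positive (always zero)."""
    return 0.0


# Toy data (synthetic, not study data): 1 = ypN0, 0 = ypN+.
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_probs = [0.95, 0.85, 0.75, 0.65, 0.30, 0.72, 0.40, 0.35, 0.20, 0.10]

nb_model = net_benefit(toy_labels, toy_probs, 0.7)
nb_all = net_benefit_treat_all(toy_labels, 0.7)
```

A decision curve repeats this calculation over a grid of thresholds and compares the model's curve with the "treat-all" and "treat-none" reference strategies.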

Discussion

In this study, we developed and validated a nomogram for the prediction of ypN0 in patients with cN+ breast cancer who were treated with NAC. Unlike previous models, our nomogram uniquely incorporates the presence or absence of a pCR alongside other clinicopathological parameters. The model had high diagnostic performance, with AUC values of 0.866 in the training cohort and 0.855 in the external validation cohort. Notably, the nomogram exhibited a particularly high PPV across multiple risk thresholds, suggesting its potential clinical utility in the selection of candidates for de-escalation of axillary surgery.

Previous ypN0-prediction models for cN + breast cancer have yielded moderate to good discriminative performance; however, further improvements are necessary to ensure the safe omission of axillary surgery. The model developed and externally validated by Vila et al. achieved an AUC of 0.78–0.82, but it relied solely on clinicopathological factors and did not incorporate dynamic parameters—such as the tumor reduction rate or clinical response—that reflect the effectiveness of NAC11. Although Corsi et al. developed a model that incorporates a clinical complete response as a predictor of ypN0, it had only moderate diagnostic accuracy (sensitivity: 71%, specificity: 73%, AUC = 0.77)12. In contrast, the model developed by Huang et al. had a high discriminative performance (AUC = 0.85), using a nomogram that integrated post-NAC ultrasound features of axillary lymph nodes with clinicopathological features13. However, a breast ultrasound-based determination of a complete response has a high false-positive rate, and the use of multiple ultrasound parameters for axillary lymph nodes poses challenges in terms of reproducibility and generalizability. In light of these findings, future efforts should be focused on the incorporation of objective measures that accurately reflect NAC effectiveness into prediction models to enhance diagnostic accuracy and ensure the safe omission of axillary surgery.

Unlike previous nomogram models that primarily relied on pre-NAC clinicopathological parameters, our model uniquely incorporates the post-treatment ypT stage (an emerging, biopsy-assessable indicator), thereby bridging the transition toward biopsy-based surgical de-escalation strategies. Recent retrospective analyses have consistently shown that breast pCR, particularly ypT0, is strongly associated with ypN0, especially in HR-negative and HER2-positive subtypes20,21,22,23. These findings support the clinical relevance of ypT stage as a surrogate marker for nodal response after NAC. As breast pCR is defined by pathological assessment of the surgical specimen, it has conventionally been available only postoperatively. However, recent studies focusing on omission of surgery for NAC responders have demonstrated that it can be reliably assessed using image-guided needle biopsy of the tumor bed16,17,18,19. A representative study by Tasoulis et al. demonstrated the feasibility of using image-guided vacuum-assisted biopsy to preoperatively assess breast pCR after NAC16. In a multicenter pooled analysis involving 166 patients, they showed that applying a standardized protocol yielded a false-negative rate of 3.2%, a negative predictive value of 97.4%, and an overall diagnostic accuracy of 89.5%. These findings support the reliability of image-guided biopsy in identifying patients who achieve ypT0 and may inform patient selection for de-escalated surgical approaches.

To achieve a high PPV for predicting ypN0, we developed a nomogram that incorporates ypT stage in patients with cN+ breast cancer. The model demonstrated strong discriminative ability and good calibration, and in a post hoc analysis evaluating diagnostic performance across probability thresholds, a predicted probability of ≥ 0.7 corresponded to a PPV ≥ 0.90 in both the training and validation cohorts. This level of diagnostic performance is clinically meaningful: a PPV ≥ 0.90 means that no more than 10% of patients predicted to be ypN0 harbor residual nodal disease, which corresponds to a false-negative rate (FNR) below 10%, a widely accepted benchmark for safely omitting axillary surgery. Landmark axillary de-escalation trials such as ACOSOG Z1071, SENTINA, and SN FNAC were similarly designed to maintain the FNR below this threshold6,7,32. Importantly, all variables included in our nomogram, such as HR status, HER2 status, and clinical nodal response, are routinely available in standard clinical practice, reinforcing its practical applicability. HR-negative and HER2-positive tumors typically exhibit higher proliferative activity and greater chemosensitivity than HR-positive/HER2-negative tumors. The strong association of these subtypes with ypN0 likely reflects their intrinsic molecular responsiveness to cytotoxic and HER2-targeted therapy. Although recent AI-based prediction models using large datasets have demonstrated excellent discriminative performance, our clinically interpretable nomogram provides a transparent, easily applicable tool for bedside decision-making. Future integration of deep learning or machine-learning frameworks using larger multicenter datasets may further enhance model generalizability.

Despite the promising performance of our model, its primary limitation lies in the use of ypT stage derived from surgical specimens. Because ypT0 currently requires pathological confirmation after surgery, the model’s applicability is confined to postoperative prediction and cannot yet be directly applied to preoperative decision-making. However, as image-guided or vacuum-assisted biopsy techniques for confirming breast pCR continue to evolve, similar nomograms based on biopsy-confirmed pCR may enable preoperative prediction of ypN0 in the near future. In addition, the retrospective and multicenter design, spanning a long enrolment period (2006–2024), may have introduced heterogeneity in imaging methods, systemic therapy regimens, and pathological assessment across institutions and time periods. Because of the limited sample size within each era, adjustment or stratification by treatment period was not feasible, which may have affected the statistical robustness and generalizability of the findings. Although our nomogram showed consistent performance across cohorts, further prospective validation using biopsy-based pCR and long-term clinical outcomes will be essential to establish its generalizability and clinical utility in guiding axillary surgery de-escalation.

In conclusion, our nomogram incorporating ypT stage provides a robust and clinically practical tool for predicting ypN0 in patients with cN+ breast cancer treated with NAC. With consistently high PPVs exceeding 0.90, the model meets the diagnostic performance threshold traditionally required for safely omitting axillary surgery. As the preoperative assessment of breast pCR via image-guided biopsy becomes increasingly feasible, this model may serve as a foundation for individualized axillary management strategies in the era of surgical de-escalation. Future prospective studies are warranted to validate its performance in biopsy-based settings and to support its implementation in clinical decision-making.