Introduction

Multigene signatures (MGS) estimate the risk of recurrence and the benefit of adjuvant chemotherapy (aCT) in patients with estrogen receptor (ER)-positive, human epidermal growth factor receptor 2 (HER2)-negative early breast cancer [1,2,3,4,5,6]. The clinical utility of MammaPrint® (MP; Agendia, The Netherlands and Irvine, CA) and Oncotype DX® (ODX; Exact Sciences, Madison, WI) has been validated in large, prospective, randomized, multinational phase III trials: the Microarray in Node-negative and 1–3 positive lymph nodes Disease may Avoid Chemo Therapy trial (MINDACT) and the Trial Assigning IndividuaLized Options for Treatment (Rx) (TAILORx) [1, 2]. Other commercially available MGS, such as Prosigna® (PAM50-based assay; NanoString Technologies, Seattle, WA) and EndoPredict® (Myriad Genetics, Salt Lake City, UT), also provide a risk of recurrence, but these tests have not yet been prospectively validated in randomized controlled trials. The clinical trial Rx for Positive Node, Endocrine Responsive Breast Cancer (RxPONDER) is currently evaluating the clinical predictive utility of ODX for aCT in patients with a recurrence score of ≤25 and, as a secondary endpoint, Prosigna® in patients with node-positive (1–3 nodes) breast cancer [7]. Recently, in order to increase accessibility to MP and BluePrint® (BP), we validated a targeted RNA next-generation sequencing (NGS) kit for formalin-fixed paraffin-embedded tissue for the implementation of MGS testing in a decentralized setting [8, 9].

Major oncological societies strongly recommend the use of a clinically validated MGS test to guide therapy decisions in patients with ER-positive HER2-negative lymph node negative breast cancer and some guidelines recommend the test in patients with up to three positive lymph nodes [10,11,12,13,14,15]. However, the cost of an MGS test is high, and it currently remains unclear which patients benefit most from MGS because the two major clinical trials, MINDACT and TAILORx, used different strategies [1, 2]. Ideally, MGS tests are used in a preselected patient group to allow optimal cost-effectiveness.

In recent years, several statistical models based on multiple clinical–pathological parameters have been developed to predict MGS results and/or aCT benefit. Two of them, PREDICT and new Adjuvant! Online (nAOL), are mentioned in the European Society of Medical Oncology guidelines, and nAOL is also mentioned in the guidelines of the National Comprehensive Cancer Network, to help predict recurrence risk and potential benefit from systemic adjuvant treatment [13, 14]. Moreover, decision trees for selecting patients for MGS testing based on a few clinical–pathological features, or on the combination of clinical–pathological features with outcomes of statistical models such as Magee equations (ME), have also been developed [16, 17]. Still, most currently available MGS predictors have only been designed for ODX and have been tested in relatively small cohorts in a limited number of centers. Recently, the first predictor for Prosigna®, the size, nodal, and Ki-67 index (SiNK) [18], was developed; to date, no predictors have been developed for MP.

Here, we evaluated the utility of eight different statistical models in predicting MGS results. Although each of the models has been designed with a specific MGS in mind, we explored the usefulness of the models for predicting each of the three MGS. These models could be valuable because access to a specific MGS is limited in certain countries and performing the test can be expensive. We measured the concordance between the risk scores obtained by one of the three major commercially available MGS (ODX, MP, or Prosigna®) and the eight currently available statistical models in a retrospective series of ER-positive HER2-negative early stage breast carcinomas. We aimed to identify a more cost-effective model able to accurately select cases prior to additional MGS testing, further decreasing cost and delay in therapeutic decision-making.

Patients and methods

In this retrospective study, we evaluated 129 patients with invasive ER-positive HER2-negative early breast cancer diagnosed at University Hospitals Leuven between May 2013 and April 2019 for whom there was doubt about the recommendation of aCT during the multidisciplinary meeting (MDM) and who were therefore tested with an MGS. These patients comprised almost 7% of all patients diagnosed in that period. Patients with (N = 57) and without (N = 72) lymph node involvement were included. After surgery, all patients were discussed at the MDM twice: once before and once after the MGS results were available. A representative tumor block from the resection specimen was selected by board-certified pathologists (GF and ASVR) following the standard requirements as indicated by ODX (N = 44), MP/BP (N = 28), or Prosigna/PAM50® (N = 57).

The main prognostic factors used at our institution to consider aCT are lymph node involvement, lymph vascular invasion, young age (<50 years), large tumors (>2 cm), multifocal tumors, low ER expression (Allred < 5), absence of progesterone receptor (PR) expression (<1%), high tumor grade (grade 3), and high Ki-67 (≥20%).

We retrieved information about ER, PR, HER2, and Ki-67 from the pathology reports. Antibody clones and conditions for ER, PR, HER2, and Ki-67 can be found in Supplementary Table 1. Immunohistochemistry (IHC) and/or fluorescent in situ hybridization were evaluated according to the ASCO/CAP guidelines [19, 20]. In our institution, ER and PR are scored by the semi-quantitative Allred scoring method and Ki-67 is estimated by the average over the full surface of the tumor [21].

Multigene signatures (MGS)

Patient samples were tested with ODX, MP, or Prosigna® and classified into high or low risk of distant recurrence based on the most recent classification criteria as shown in Table 1 [1, 5, 22]. Patients with an intermediate-risk result obtained by ODX or Prosigna® were reclassified into high and low risk as specified in Table 1. In premenopausal patients or patients aged 50 years or younger, an ODX recurrence score (RS) of 16 was used as the cutoff for risk classification, according to the NCCN guidelines, even though the contribution of ovarian suppression is less clear in these patients [14].

Table 1 Cutoffs for high–low risk classification for each multigene signature.

Intrinsic molecular subtypes were available for patients tested with the combination of MP/BP or with Prosigna®/PAM50. MP/BP is currently available as the targeted RNA-based NGS kit, and in this retrospective study, most samples were tested with both NGS kit and microarray but only the outcome based on microarray was taken into account [9].

Statistical models

ME [16, 23,24,25], Memorial Sloan Kettering simplified risk score (MSK-SRS) [26], Breast Cancer Recurrence Score Estimator (BCRSE) [27, 28], OncotypeDXCalculator (ODXC) [29, 30], nAOL [31,32,33], MyMammaPrint (MyMP) [1, 34], PREDICT v2.1 [35,36,37] and SiNK [18] were computed for each patient. More details about the statistical models, their development and validation can be found in Table 2.

Table 2 Overview of the statistical models and their characteristics used in this study.

ME, MSK-SRS, BCRSE, and ODXC were developed for predicting ODX results in patients with lymph node negative breast cancer, but in this study we also applied them to patients with one to three positive lymph nodes. We included one patient with four positive lymph nodes. In our series, the H-score, which is determined by multiplying the percentage of stained cells at each intensity by that intensity (scored from 0 to 3) and adding the results, was not available for most patients as it is not widely used [38]. Therefore, we used the Allred/H-score conversion table to derive it [39]. Ki-67 staining was performed on all cases. nAOL, MyMP, and PREDICT were not developed for the purpose of predicting the outcome of MGS. However, as those models are frequently used in clinical practice for decision-making concerning aCT and/or MGS testing, we decided to include them in this study.
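The H-score definition given above can be illustrated with a minimal Python sketch. The function name and input layout are illustrative assumptions; the study itself derived H-scores from Allred scores via the conversion table in [39], which is not reproduced here.

```python
def h_score(pct_by_intensity):
    """Return the H-score (range 0-300) for one tumor.

    pct_by_intensity maps a staining intensity (0, 1, 2, or 3) to the
    percentage of tumor cells stained at that intensity.
    Hypothetical helper for illustration only.
    """
    for intensity, pct in pct_by_intensity.items():
        if intensity not in (0, 1, 2, 3):
            raise ValueError(f"intensity must be 0-3, got {intensity}")
        if pct < 0:
            raise ValueError("percentages must be non-negative")
    if sum(pct_by_intensity.values()) > 100:
        raise ValueError("percentages must not sum to more than 100")
    # Multiply each percentage by its intensity and add the results.
    return sum(intensity * pct for intensity, pct in pct_by_intensity.items())


# Invented example: 10% unstained, 20% weak, 30% moderate, 40% strong.
example = h_score({0: 10, 1: 20, 2: 30, 3: 40})
print(example)  # 0*10 + 1*20 + 2*30 + 3*40 = 200
```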

Experimental setup

Our primary objective was to assess the concordance between the high or low risk of recurrence assigned by the inexpensive statistical models and the MGS risk of recurrence. For this objective, we reclassified patients with an intermediate-risk result obtained by ODX or Prosigna® into high and low risk as specified in Table 1. Specifically for ME, equation 2 (ME eq 2), for which Ki-67 is not required (see Table 2), is stated to give the highest concordance with the ODX recurrence score; therefore, we included ME eq 2 and the average of the three equations (ME av eq) [23]. Because some of the multivariable MGS estimators include an intermediate category, we evaluated its concordance with high or low MGS results as well. In addition, the concordance of each single ME as well as the concordance of different cutoffs (original, Magee Decision Algorithm [25], and an alternative on Magee Decision Algorithm) with MGS risk categories was calculated (Table 2 and Supplementary Tables 7–10).

The secondary objective was to look at the change in decision to use aCT with and without the results of MGS. At the moment of decision-making, ODX and Prosigna® used a three-tier system, with the integration of an intermediate-risk class. The decision about aCT in case of an intermediate-risk result with ODX was based on RS of 18 and with Prosigna® on the 10% cutoff for 10-year distant recurrence.

Statistical analysis

MGS test outcomes were compared with the outcomes of the statistical models using a contingency table. Based on the contingency table, we measured the concordance, the negative (low risk) percent agreement (NPA), and the positive (high risk) percent agreement (PPA) with MGS results as the reference standard. As some inexpensive statistical models result in an intermediate-risk outcome, we calculated the concordance, NPA, and PPA in three ways. For our primary objective, we ignored the intermediate-risk category, and in Supplementary Table 10, we report results when the intermediate-risk category is integrated in the high-risk and when it is integrated in the low-risk category.
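As an illustration of these agreement measures, the following minimal Python sketch computes NPA, PPA, and concordance with the MGS result as reference, under each of the three ways of handling a model's intermediate-risk outcomes. The function name, labels, and toy data are assumptions made for illustration and do not reproduce the software used in the study.

```python
def agreement(pairs, intermediate="ignore"):
    """Compute NPA, PPA, and concordance (all in %) against the MGS reference.

    pairs: iterable of (model_risk, mgs_risk) with model_risk in
           {"low", "intermediate", "high"} and mgs_risk in {"low", "high"}.
    intermediate: "ignore" drops intermediate model results,
                  "as_high" / "as_low" merges them into that category.
    """
    remap = {"ignore": None, "as_high": "high", "as_low": "low"}[intermediate]
    both_low = both_high = mgs_low = mgs_high = 0
    for model, mgs in pairs:
        if mgs not in ("low", "high"):
            raise ValueError(f"unexpected MGS risk: {mgs}")
        if model == "intermediate":
            if remap is None:
                continue  # primary analysis: intermediate results are left out
            model = remap
        if mgs == "low":
            mgs_low += 1
            both_low += model == "low"
        else:
            mgs_high += 1
            both_high += model == "high"

    def pct(num, den):
        return 100.0 * num / den if den else None  # undefined without cases

    return {
        "NPA": pct(both_low, mgs_low),          # agreement among MGS low risk
        "PPA": pct(both_high, mgs_high),        # agreement among MGS high risk
        "concordance": pct(both_low + both_high, mgs_low + mgs_high),
    }


# Toy data (invented, not study data).
toy = [("low", "low"), ("high", "low"), ("high", "high"),
       ("intermediate", "high"), ("intermediate", "low")]
for mode in ("ignore", "as_high", "as_low"):
    print(mode, agreement(toy, mode))
```

Returning `None` when a denominator is zero is a design choice of this sketch; it avoids reporting an agreement rate for a category that contains no patients.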

Results

Patient and tumor characteristics are shown in Table 3. No meaningful differences between high- and low-risk distribution were observed in patients with or without lymph-node involvement.

Table 3 Patient and tumor characteristics for patients tested with Oncotype DX® (N = 44), MammaPrint® (N = 28), or Prosigna® (N = 57).

Concordance between inexpensive statistical models and MGS results

Out of the 129 patients, 68 (53%) were MGS low risk and 61 (47%) MGS high risk. Table 4 shows the number of patients with low-, intermediate-, and high-risk outcome obtained by each statistical model. Whereas nAOL and MyMP stratified about 85% of all patients into high risk, PREDICT only stratified 34% of the patients into high risk. ME, MSK-SRS, and BCRSE classified most patients into the intermediate-risk category.

Table 4 Overview of patients with low-, intermediate-, and high-risk outcome with each inexpensive statistical model (N = 129).

The comparisons between risk outcomes obtained by the MGS and the inexpensive statistical models are shown in Table 5. The concordance, NPA, and PPA for risk results obtained by the MGS and by the inexpensive statistical models, based on the contingency tables, can be found in Table 6. Figure 1 shows the number of patients with a high- or low-risk outcome obtained by each model and the percentage of patients who were low or high risk by each MGS. ME eq 2 and ME av eq had a high-risk outcome for 4 patients who were tested with ODX, and all 4 patients were high risk with ODX (Fig. 1). These 2 models also classified 2 patients as low risk, both of whom were low risk with ODX. ME eq 2 and ME av eq thus correctly classified high and low risk, but only for ODX. This was reflected in concordance rates, NPA, and PPA of 100.0% for the two models for the prediction of ODX (Table 6). For MP, ME eq 2 and ME av eq resulted in a high-risk outcome in 1 patient, which corresponded with an MP high-risk outcome (Fig. 1), and in a low-risk outcome in only 1 patient, which was discordant with the MP high-risk outcome (Fig. 1). All patients classified as high risk with BCRSE had a high-risk result with the respective MGS, but discordant results were observed in the low-risk classification with this model. All 4 patients classified as low risk by nAOL were also low risk with ODX (NPA of 14.3%, PPA of 100.0%, and concordance of 45.5% for ODX), but nAOL showed discordant results for the prediction of low risk in MP and Prosigna® and for the prediction of high risk in the three MGS. We observed discordant results with MSK-SRS, MyMP, PREDICT, and SiNK for the three MGS in both high- and low-risk prediction.
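As a worked check of the nAOL versus ODX figures quoted above, the short Python sketch below uses a 2 × 2 table that is back-calculated from the reported percentages and the 44 ODX-tested patients. These counts are an inference made for illustration, assuming nAOL produced no intermediate-risk results in this subgroup; they are not read from Table 5.

```python
# Counts inferred from the reported NPA, PPA, and concordance (assumption).
odx_low_naol_low = 4     # the 4 nAOL low-risk patients, all ODX low risk
odx_low_naol_high = 24   # inferred: ODX low risk but nAOL high risk
odx_high_naol_low = 0    # inferred from a PPA of 100.0%
odx_high_naol_high = 16  # inferred: ODX high risk and nAOL high risk

odx_low = odx_low_naol_low + odx_low_naol_high
odx_high = odx_high_naol_low + odx_high_naol_high
n = odx_low + odx_high   # 44 patients tested with ODX

npa = 100 * odx_low_naol_low / odx_low
ppa = 100 * odx_high_naol_high / odx_high
concordance = 100 * (odx_low_naol_low + odx_high_naol_high) / n

print(n, round(npa, 1), round(ppa, 1), round(concordance, 1))
# 44 14.3 100.0 45.5
```

The example shows why a model can reach a PPA of 100.0% and still have a low concordance: classifying nearly every patient as high risk captures all MGS high-risk cases but misclassifies most MGS low-risk cases.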

Table 5 Predictive value of ME, MSK-SRS, BCRSE, ODXC, nAOL, MyMP, PREDICT, and SiNK in MGS tested tumors.
Table 6 Comparison of test outcomes between each multigene signature and each inexpensive statistical model with results for negative percent agreement (low risk), positive percent agreement (high risk), and concordance.
Fig. 1: The number of patients with a high model risk or a low model risk and the corresponding percentage of those patients with a high or low risk by each MGS.

Green: MGS low risk, red: MGS high risk. The number of patients with a low-risk outcome and a high-risk outcome obtained by ODX, MP, or Prosigna® is shown at the top of the figure. The number of patients per statistical model with a high-risk outcome or a low-risk outcome obtained by the model is shown on each bar. The x-axis shows the percentage of patients classified as high or low risk with the model who are high risk or low risk with the MGS. ME Magee equations, ME av.eq average of the three Magee equations, MSK-SRS Memorial Sloan Kettering simplified risk score, BCRSE Breast Cancer Recurrence Score Estimator, ODXC OncotypeDXCalculator, nAOL new Adjuvant! Online, MyMP Mymammaprint.com. For ME, cutoffs ≤12 and >30 were used for low and high risk, respectively.

The comparison between MGS high- and low-risk outcome and the outcome of the inexpensive statistical models for patients without and with lymph node involvement is shown in Supplementary Tables 2 and 3, respectively. The concordances, NPA, and PPA for patients with and without lymph node involvement and the results with the inclusion of intermediate categories of the inexpensive predictive models either in the low- or high-risk category are shown in Supplementary Tables 11 and 12.

Change in MDM recommendation for aCT administration

A switch in aCT recommendation based on MDM decisions was observed in 47% (61/129) of patients after incorporation of MGS results. Following MGS testing, aCT was given to 61 patients, which corresponds to a 15% (11/72) relative and a 9% (11/129) absolute reduction (Table 7). The change in MDM decision for aCT administration with ODX, MP, and Prosigna® separately is shown in Table 7b, c, and d, respectively. No reduction in aCT administration was seen after performing Prosigna®.

Table 7 Change in MDM decision to add chemotherapy with integration of MGS results for (a) all patient samples tested with an MGS (n = 129), (b) patient samples tested with Oncotype DX® (n = 44), (c) patient samples tested with MammaPrint® (n = 28), and (d) patient samples tested with Prosigna® (n = 57).
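The relative and absolute reductions in aCT administration reported in the text follow from simple arithmetic, sketched below in Python. The variable names are illustrative; the number of patients recommended aCT before MGS testing (72) is taken from the denominator of the relative reduction given in the text.

```python
n_patients = 129   # all patients tested with an MGS
act_before = 72    # aCT recommended by the MDM before MGS results
act_after = 61     # aCT given after incorporation of MGS results

fewer_act = act_before - act_after                      # 11 patients
relative_reduction = 100 * fewer_act / act_before       # relative to prior aCT
absolute_reduction = 100 * fewer_act / n_patients       # relative to all patients

print(fewer_act, round(relative_reduction), round(absolute_reduction))
# 11 15 9
```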

Although follow-up was too short to evaluate clinical outcome, two patients tested with Prosigna® showed distant recurrence. Of those two patients, one had a high-risk MGS outcome and received aCT after incorporation of MGS results in the MDM recommendation, but would not have received aCT based on the MDM recommendation before MGS results were available. The other patient had a low-risk outcome and did not receive aCT, as was also recommended by the MDM before the Prosigna® results were available.

Discussion

Major oncological societies strongly recommend the use of clinically validated MGS to guide therapy decisions in patients with ER-positive HER2-negative breast cancer. However, many factors impede the broad use of commercially available MGS, including the high cost of performing an MGS and the lack of consensus on preselection of a defined target population. As guidelines for the use of MGS lack patient selection criteria, several inexpensive statistical models based on multiple clinical–pathological parameters have been developed to predict MGS results and/or aCT benefit, in order to select patients who would benefit from an MGS [40]. However, most models have only been developed and validated for ODX and have been tested in relatively small cohorts in a limited number of centers.

In this retrospective and observational study, we benchmarked eight currently available statistical models for predicting the results of one of three MGS, namely ODX, MP, and Prosigna®. The predictive models MSK-SRS, BCRSE, ODXC, ME, and SiNK are focused on pathological parameters whereas the others we computed, namely nAOL, MyMP, and PREDICT are focused on clinical parameters. To the best of our knowledge, this is the first time that the information provided by such clinical-oriented tools was tested for predicting the results of MGS in the context of clinical decision-making concerning aCT in patients with luminal early breast carcinomas.

In general, none of the models was able to correctly predict both high- and low-risk cases for all three MGS. Most models based on pathological characteristics provided intermediate-risk results in the majority of the cases (45–89%) (Supplementary Table 4), similar to their reference works [16, 26, 28, 30]. Yet, they still showed the best results as potential screening tools to select patients to be tested by MGS. Remarkably, all cases classified by ME eq 2, ME av eq, and BCRSE as high risk matched the results of each MGS. In the low-risk category, only ME eq 2 and ME av eq showed a 100% match with ODX. The incorporation of the intermediate-risk results of the inexpensive predictive models in either the high- or low-risk category resulted in a moderate to marked reduction in concordance rates for most models with ODX and MP but better concordance rates with Prosigna® (Supplementary Table 10). MSK-SRS, BCRSE, and ODXC, which were specifically developed to predict ODX, showed high concordances (>80%) with ODX. For MP, only moderate concordance rates were observed, with the clinical-oriented model PREDICT resulting in the highest concordance of 67.9%. Concordance rates of 75.0% and lower were observed between risk results obtained by the statistical models and Prosigna®. We observed a concordance rate of 66.7% between Prosigna® and SiNK, which was specifically developed for Prosigna®. This is comparable to the concordance of 71% observed during the validation of SiNK [18].

ER status was the only parameter present in all models. Only ME eq 1, ME eq 3, BCRSE, PREDICT, and SiNK integrate the proliferation marker Ki-67; ME, BCRSE, and SiNK use the exact Ki-67 value, whereas PREDICT stratifies into high and low with a cutoff of >10%. International guidelines offer no consensus on high and low Ki-67 values [13, 41]. Another important parameter is the PR, which is a binary variable in ODXC and a semi-quantitative variable in MSK-SRS, BCRSE, and ME [42]. It has already been shown that PR is strongly related to ODX outcome [43,44,45,46]. Moreover, in ER-positive HER2-negative breast cancer, an absent PR is an independent marker of poor prognosis [47,48,49,50]. In this study, PR status was negative in 16% (21/126) of the patients, of whom 62% (13/21) had an MGS high-risk and 38% (8/21) an MGS low-risk result. As opposed to the clinical-oriented models, most pathological-oriented ones have been developed for patients without lymph node involvement (Table 2); however, we computed those pathological-oriented models regardless of the lymph node status. In our study, 72 patients (56%) were lymph node negative and 57 patients (44%) had lymph node involvement, of whom 61% (35/57) had only one positive lymph node. In patients with lymph node involvement, we observed good concordance rates (>70%) for ODX only with ME, MSK-SRS, BCRSE, and ODXC. ME av eq and BCRSE resulted in a correct high-risk MGS prediction for only one patient.

The decision to recommend aCT or to use an MGS depends on the risk assessed by the breast cancer specialist, but this assessment has a large interobserver variability [51,52,53,54]. The main prognostic factors used within our institution to consider an MGS are lymph node involvement, lymph vascular invasion, age, tumor size, tumor grade, number of tumor foci, Ki-67, and the level of hormone receptor expression. In our study, 70% (90/129) of patients had at least one, and 33% (43/129) had both, of the prognostic factors linked to poor outcome, namely tumor size of >2 cm and lymph node involvement, which triggered the request of an MGS during the MDM. Compared to the patients in MINDACT and TAILORx, this study contained more patients with a clinical high-risk profile (44/129 = 34% with PREDICT) [1, 2]. The characteristics of the patient population in this study overlap with those of a series of real-life patients recently described by Hajjaji et al. [55].

When we looked at the change in decision to use aCT following MGS results, we observed a reduction in administration of aCT after performing the MGS. This is in line with earlier results for MP and ODX [56]. When considering Prosigna® specifically, no reduction in the administration of aCT was seen in our study. The higher number of high-risk results with Prosigna® is consistent with the results described by Hequet et al. and by Hajjaji et al. [55, 57]. In Hajjaji et al., the high-risk outcome with Prosigna® was associated with the presence of positive lymph nodes [55]. This can be partly explained by the fact that the algorithm of Prosigna® uses a lower cutoff to classify patients as high risk in case of positive lymph nodes, compared to patients without lymph node involvement. Interestingly, ODX also applies a different algorithm in patients with positive lymph nodes. RxPONDER will be crucial for the validation of these MGS in the context of lymph node involvement [7]. In contrast, MP only has one valid algorithm for both patients with and patients without lymph node involvement.

Despite the relative rarity of cases classified as high risk by ME and BCRSE, these tests could still be a useful adjunct in selecting patients for MGS testing in countries where the access to the test is restricted. Regardless of the MGS, we observed that in seven out of seven patients classified as high risk by ME and in six out of six patients classified as high risk by BCRSE, the results were concordant with the MGS risk classification (Fig. 1). In other words, MGS testing could have been avoided in about 5% (7/129) of patients, resulting in a reduced cost and likely resulting in faster therapeutic decision-making. So far, only the Magee Decision Algorithm has already proven to be cost-effective by avoiding ODX testing in a limited number of patients [16, 58]. Moreover, a prospective randomized trial has recently confirmed the cost-effectiveness of integrating ME in clinical practice [59, 60]. Here, for the first time, we show that ME and BCRSE can be useful in selecting patients for MGS testing, not only for ODX but also for MP and Prosigna®.

We acknowledge that our study has several limitations, including its observational and retrospective nature. Even though the risk assessment can differ across the different MGS assays, each patient in our study was tested with only one of the three MGS assays. In addition, we should take into account that ODX has a predictive character whereas MP and Prosigna® are merely prognostic. The latter can also contribute to the heterogeneity of the results on a patient level [61] as the models have been developed based on different patient populations. As the cutoff for ODX RS is debatable in premenopausal women, we included Supplementary Table 13 with the NPA, PPA, and concordance for the comparison between ODX with a cutoff of 25 regardless of age and the eight statistical models. We observed reduced concordance rates for ME (which was originally 100.0%), nAOL, MyMP, and SiNK, and improved concordance rates for MSK-SRS, BCRSE, and ODXC (which reached 100.0%). Moreover, our study was performed in a single center and aCT recommendations can vary significantly across centers. Therefore, validation of our findings in a large, multicentric patient cohort is needed.

To conclude, inexpensive statistical models, in particular ME av eq and BCRSE, can be useful in selecting patients with ER-positive HER2-negative early breast cancers for MGS testing. Although statistical models cannot replace MGS testing, they do provide a possible way to reduce the number of MGS tests, resulting in enhanced cost-effectiveness and reduced delay in therapeutic decision-making. Integration of MGS results into MDM recommendations resulted in a substantial decisional switch and a reduction in chemotherapy administration.