Introduction

Breast cancer (BC) is the most prevalent malignant tumor in women, and it is the most lethal of all invasive tumors1,2,3. The course of treatment of BC has been continuously optimized, and the associated fatality rate has been substantially reduced owing to the development of precision medicine4. Clinicians primarily rely on traditional biomarkers for treatment decision-making in BC, including Ki-67, progesterone receptor (PR), estrogen receptor (ER), and human epidermal growth factor receptor 2 (HER-2)5. Ki-67 expression is a reliable biomarker of cell proliferative activity6 and is closely correlated with the prognosis of BC patients. Relapse rates are typically higher and prognosis is typically worse when the Ki-67 level is high7,8. Further, patients with Ki-67 expression > 20% respond better to neoadjuvant chemotherapy (NAC)9. However, in clinical practice, the primary method for detecting Ki-67 is immunohistochemistry (IHC), which requires invasive biopsy and may result in bleeding, infection, or metastasis10. In HER-2-negative BC patients, preoperative biopsy has been reported to significantly increase Ki-67 expression, thereby worsening the prognosis11,12. Therefore, it is crucial to establish a timely and non-invasive strategy for predicting Ki-67 expression in BC to improve patient outcomes.

In detecting BC, dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is highly sensitive and provides detailed information about tissue anatomy and function13. With the rapid development of quantitative medical imaging, radiomics can transform medical images into high-dimensional quantitative data, making it possible to identify correlations between imaging features and biological characteristics14. Previous MRI radiomics studies have explored its value in differentiating benign from malignant breast lesions and in predicting molecular subtypes, Ki-67 expression, and the efficacy of neoadjuvant therapy, and all have shown good performance15,16,17,18,19,20,21. However, these studies focused on improving model performance. The internal working principles of the models remain difficult to elucidate because of their low transparency, and this limited interpretability restricts the clinical application of their predictions.

The SHapley Additive exPlanations (SHAP) algorithm has increasingly been used in medical research, and it has been applied to interpret prediction models for the pathological grading of pancreatic neuroendocrine tumors, molecular subtypes of BC, and post-acute pancreatitis diabetes22,23,24. Therefore, this study used the SHAP algorithm to interpret the optimal model for predicting Ki-67 expression in BC. This method can identify the features that most strongly influence the model and quantify the impact of each feature on a single sample, even when the features are not fully independent. The feature contributions can be visualized to help doctors understand the model’s prediction logic.

Materials and methods

Study cohorts

The ethics committee of the Affiliated Hospital of North Sichuan Medical College approved this retrospective study and waived the requirement for informed consent (approval number: 2023ER131-1). All procedures were performed in accordance with the Declaration of Helsinki (as revised in 2013). The clinical and imaging data of 290 BC patients admitted to the Affiliated Hospital of North Sichuan Medical College between May 2021 and April 2023 were collected. The inclusion criteria were: (1) female patients with BC confirmed by puncture or surgical biopsy; (2) no previous treatment of the breast. The exclusion criteria were: (1) incomplete clinical data; (2) image artifacts; (3) incomplete imaging data. Finally, 195 BC patients with 201 lesions were included, comprising 189 patients with unilateral BC and 6 with bilateral BC (Fig. 1).

Fig. 1

Flowchart of patient recruitment with inclusion and exclusion criteria.

Image acquisition

Images were acquired on a 3.0 T MRI scanner (Shanghai United Imaging Medical Co., Ltd., China) with a 10-channel phased-array breast coil. Patients were scanned in the prone position with the chest wall against the coil and the breasts hanging freely within it. The scan covered both breasts and axillae. All patients underwent conventional MRI, including transverse T1-weighted imaging (T1WI), transverse fat-suppressed T2-weighted imaging (FS-T2WI), transverse diffusion-weighted imaging (DWI), and transverse fat-suppressed T1-weighted dynamic contrast-enhanced imaging (DCE-MRI). First, the T1WI, FS-T2WI, and DWI sequences were acquired. After these pre-contrast sequences, Gd-DOTA (0.2 mmol/kg) was injected into the median cubital vein with a high-pressure injector at a rate of 2.5 mL/s, followed by 20 mL of normal saline at the same rate. DCE-MRI images were obtained in eight phases of 69 s each. The detailed parameters of the scanning sequences are shown in Appendix 1.

Analyses of clinical data and conventional imaging features

The clinicopathological data of the patients were collected, including age, reproductive history, menopausal status, tumor location, pretreatment T stage, lymph node (LN) metastasis status, and the presence of concurrent benign tumors. The Ki-67 index was defined as the percentage of cancer cells with positive nuclear staining. A previous study found no significant difference in prognosis between patients with Ki-67 < 14% and those with Ki-67 of 14–20%, whereas the prognosis of patients with Ki-67 > 20% was significantly worse25. Therefore, 20% was used as the cutoff in this study: Ki-67 ≤ 20% was defined as low expression and Ki-67 > 20% as high expression.

All conventional imaging features were evaluated by two radiologists according to the American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS)26. The features were as follows: tumor maximum diameter (TMD), type of time-signal intensity curve (TIC), apparent diffusion coefficient (ADC) value, mammographic calcification, internal enhancement pattern, type of gland, and necrosis. TMD was measured three times on the largest section of the tumor on DCE-MRI, and the average was recorded. The extent of the solid lesion was determined on the DCE-MRI sequence and then marked three times on the ADC map, and the resulting values were averaged. The DCE-MRI images were imported into the uWS-MR medical image processing software (R005, United Imaging Medical, Shanghai, China). In the phase with the most apparent enhancement, the region of interest (ROI) was drawn on the slice with the largest lesion area, covering the whole lesion. Next, the TIC was generated, and the relevant semi-quantitative parameters were derived27. All ROIs were drawn by two radiologists blinded to the patients’ pathological results, and the inter-observer agreement was evaluated using the intraclass correlation coefficient (ICC).
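The text does not state which form of the ICC was used. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), and applies it to hypothetical measurements of five lesions by two readers. The function name and the data are invented for the example.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings has shape (n_targets, k_raters).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Mean squares for targets (rows), raters (columns) and residual error.
    ms_rows = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((x.mean(axis=0) - grand) ** 2) / (k - 1)
    ss_err = np.sum((x - grand) ** 2) - ms_rows * (n - 1) - ms_cols * (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical measurements: one row per lesion, one column per reader.
readings = [[10, 12], [20, 19], [30, 33], [40, 41], [50, 52]]
agreement = icc_2_1(readings)
```

For these toy readings the two readers agree closely, so the coefficient is near 1. Other ICC forms (for example consistency rather than absolute agreement) would give slightly different values.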

Extraction of radiomics feature

The lesions were identified by two radiologists, each with 3 years of experience, who were unaware of the patients’ pathological reports. One radiologist used 3D Slicer software to delineate the ROI on the early (ph2) and delayed (ph5) phases of DCE-MRI. The other radiologist delineated a randomly selected one-third of the patients to evaluate the inter-observer agreement. Features were extracted with the PyRadiomics package in 3D Slicer after the voxels of the segmented ROIs were resampled to 1 × 1 × 1 mm (x-, y-, and z-axes) by linear interpolation to ensure data standardization and comparability. Radiomics features were derived from both the original images and filtered images. Laplacian of Gaussian filters (sigma = 0.5, 1.0, 1.5, 2.0) and wavelet filters (LLH, LHL, LHH, HLL, HHL, HLH, HHH, and LLL) were used for filtering. A total of 1223 radiomics features were extracted from each phase (2446 across ph2 and ph5). The feature classes comprised 14 shape features, 18 first-order features, 24 GLCM features, 14 GLDM features, 16 GLRLM features, 16 GLSZM features, and 5 NGTDM features.
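The total of 1223 features can be reproduced from the class counts and filters listed above. The short check below assumes that the shape features are computed once from the segmentation mask, while the intensity and texture classes are computed on the original image and on each filtered image. The variable names are illustrative.

```python
# Feature class sizes as listed in the text.
SHAPE_FEATURES = 14
INTENSITY_AND_TEXTURE = {
    "firstorder": 18, "glcm": 24, "gldm": 14,
    "glrlm": 16, "glszm": 16, "ngtdm": 5,
}

LOG_SIGMAS = [0.5, 1.0, 1.5, 2.0]
WAVELET_SUBBANDS = ["LLH", "LHL", "LHH", "HLL", "HHL", "HLH", "HHH", "LLL"]

# Image types: the original image plus every filtered image.
n_image_types = 1 + len(LOG_SIGMAS) + len(WAVELET_SUBBANDS)

# Assumption: shape features depend only on the mask, so they are counted once.
per_image_type = sum(INTENSITY_AND_TEXTURE.values())
features_per_phase = SHAPE_FEATURES + n_image_types * per_image_type
features_two_phases = 2 * features_per_phase
```

Under this assumption, 13 image types with 93 intensity and texture features each, plus 14 shape features, give 1223 features per phase and 2446 for ph2 and ph5 together.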

Radiomics features with ICC ≥ 0.80 were retained and then standardized by Z-score to make features measured on different scales comparable. Next, the Pearson correlation coefficient between each pair of features was calculated; when the correlation exceeded the threshold of 0.8, one feature of the pair was removed. Finally, the most significant features were selected by recursive feature elimination (RFE).
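The selection steps described above can be sketched with NumPy and scikit-learn on synthetic data. Several details here are assumptions, since the text does not specify them: which feature of a correlated pair is dropped (this sketch keeps the first one encountered), the use of the absolute correlation, the estimator inside RFE (logistic regression here), and the number of features to keep. `drop_correlated` is a hypothetical helper.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def drop_correlated(X, threshold=0.8):
    """Greedy filter: keep a column only if |r| <= threshold with every kept column."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Synthetic training data: 6 independent features plus 3 near-duplicates.
rng = np.random.default_rng(0)
n_samples = 140
base = rng.normal(size=(n_samples, 6))
X = np.hstack([base, base[:, :3] + 0.05 * rng.normal(size=(n_samples, 3))])
y = (base[:, 0] - base[:, 1] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

# Z-score standardization, fitted on the training data only.
X_std = StandardScaler().fit_transform(X)

# Correlation filter with the 0.8 threshold from the text.
kept = drop_correlated(X_std, threshold=0.8)

# Recursive feature elimination (L2 penalty is scikit-learn's default).
estimator = LogisticRegression(solver="liblinear", max_iter=10_000)
selector = RFE(estimator, n_features_to_select=3).fit(X_std[:, kept], y)
selected = [kept[i] for i in np.flatnonzero(selector.support_)]
```

In this toy setup the three near-duplicate columns are removed by the correlation filter, and RFE then ranks the remaining six. In practice the scaler and the selection steps would be fitted on the training set and applied unchanged to the test set.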

Explainable model development

The Radiomics model was developed by combining the features selected from ph2 and ph5. The clinical-imaging features associated with Ki-67 expression were selected using univariate and binary logistic regression analyses, and a Clinical-imaging model was then developed. The Combined model was constructed by integrating the selected clinical-imaging features with the radiomics features to predict Ki-67 expression in BC using logistic regression. The logistic regression used L2 regularization and the liblinear solver, which is well suited to small datasets, with the maximum number of iterations set to 10,000 to allow the model to converge. The optimal regularization strength parameter C was determined by grid search (GS), and 5-fold cross-validation was used to evaluate each candidate value. The SHAP algorithm was used to explain the optimal model. The SHAP value of a feature is its marginal contribution averaged over all possible feature combinations: the change in the prediction when the feature is added to a combination is computed, and these changes are combined with combinatorial weights. The model output for a sample is then the base value (the average prediction) plus the sum of the individual feature contributions. In this study, a bar plot was used to display the feature importance scores. Force plots and waterfall plots were used to show the prediction process for a single patient, helping users intuitively understand the effects of features on the model prediction and improving the interpretability and transparency of the model. Finally, SHAP value heatmaps were used to show the direction and strength of the influence of features on the model predictions.
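A minimal scikit-learn sketch of the tuning procedure described above is given here on synthetic data. The candidate values of C, the ROC AUC scoring metric, and the stratified shuffled folds are assumptions made for the example, because the text does not report them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the training set (140 lesions, 14 selected features).
rng = np.random.default_rng(42)
X_train = rng.normal(size=(140, 14))
true_weights = rng.normal(size=14)
y_train = (X_train @ true_weights + rng.normal(size=140) > 0).astype(int)

# Logistic regression with the liblinear solver and up to 10,000 iterations;
# the L2 penalty is scikit-learn's default.
model = LogisticRegression(solver="liblinear", max_iter=10_000)

# Hypothetical grid of regularization strengths (smaller C = stronger penalty).
param_grid = {"C": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validation scores every candidate value of C.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(model, param_grid, scoring="roc_auc", cv=cv)
search.fit(X_train, y_train)

best_C = search.best_params_["C"]
best_cv_auc = search.best_score_
```

After fitting, `search.best_estimator_` is the model refitted on the full training set with the selected C, and it is this model that would be evaluated on the held-out test set.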

Statistical analysis

Python (version 3.9.13) and SPSS (version 25.0) were used for statistical analysis. Normally distributed continuous variables were described as mean ± standard deviation (SD) and compared using the independent-samples t test. Non-normally distributed continuous variables were described as median (quartiles) and compared using the Mann–Whitney U test. Categorical variables were expressed as frequency and percentage and compared using the chi-square test or Fisher’s exact test. All statistical tests were two-sided, and P < 0.05 was considered statistically significant.

The performance of each model was assessed using the area under the receiver operating characteristic (ROC) curve (AUC) with its 95% confidence interval (CI), and the sensitivity, specificity, precision, and F1 score were compared. The DeLong test was used to determine whether the models differed significantly in predicting Ki-67 expression status. The overall improvement between models was assessed via the Integrated Discrimination Improvement (IDI). Calibration curves were used to evaluate the goodness of fit of the model. Decision curve analysis (DCA) was used to quantify the net benefit in the test set at different threshold probabilities and thereby evaluate the clinical applicability of the model.
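Two of the metrics named above have simple closed forms, sketched here with their standard definitions. The IDI is the gain in mean predicted probability among events minus the gain among non-events, and the net benefit at threshold probability p_t is TP/n − (FP/n) · p_t/(1 − p_t). The function names and the toy data are illustrative and are not taken from the study.

```python
import numpy as np

def idi(y_true, p_old, p_new):
    """Integrated Discrimination Improvement of p_new over p_old."""
    y = np.asarray(y_true, dtype=bool)
    p_old = np.asarray(p_old, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    gain_events = p_new[y].mean() - p_old[y].mean()
    gain_nonevents = p_new[~y].mean() - p_old[~y].mean()
    return gain_events - gain_nonevents

def net_benefit(y_true, p, threshold):
    """Net benefit of treating every case with predicted probability >= threshold."""
    y = np.asarray(y_true, dtype=bool)
    positive = np.asarray(p, dtype=float) >= threshold
    n = y.size
    true_pos = np.sum(positive & y)
    false_pos = np.sum(positive & ~y)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)

# Toy example: three high-expression and two low-expression cases.
y = [1, 1, 1, 0, 0]
p_reference = [0.6, 0.5, 0.4, 0.5, 0.4]
p_improved = [0.8, 0.7, 0.6, 0.3, 0.2]

idi_value = idi(y, p_reference, p_improved)
nb_reference = net_benefit(y, p_reference, threshold=0.5)
nb_improved = net_benefit(y, p_improved, threshold=0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds for each model and comparing the curves with the treat-all and treat-none strategies.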

Results

Clinical data

The median age of the 195 enrolled patients was 51 years (range, 24–84 years). The cases were randomly assigned to the training and test sets. The training set comprised 140 lesions (75 high-expression and 65 low-expression lesions), while the test set contained 61 lesions (42 high-expression and 19 low-expression lesions). The ICC of the semi-quantitative parameters ranged from 0.945 to 0.999. Univariate analysis showed that TMD, internal enhancement pattern, and necrosis differed significantly between the Ki-67 high- and low-expression groups (P < 0.05). Binary logistic regression showed that internal enhancement pattern and necrosis were independent predictors of Ki-67 expression in BC after controlling for other variables (P < 0.05) (Table 1). A Clinical-imaging model was developed using logistic regression based on internal enhancement pattern and tumor necrosis.

Table 1 Univariate and binary logistic regression results in the training set.

Radiomics analysis and evaluation

A total of 2446 radiomics features were extracted from phases 2 and 5 of the DCE-MRI sequence. Of these, 2316 features with ICC ≥ 0.8 were retained (1126 from ph2 and 1190 from ph5). To reduce redundancy, the Pearson correlation coefficient between features was calculated and highly correlated features were removed using a threshold of 0.8, leaving 242 features. Finally, RFE retained 12 radiomics features (7 from ph2 and 5 from ph5), and the Radiomics model was constructed by logistic regression. A Combined model was constructed by combining the clinical-imaging and radiomics features. In the training set, the AUCs of the Clinical-imaging, Radiomics, and Combined models were 0.682, 0.797, and 0.821, respectively; in the test set, they were 0.666, 0.796, and 0.802, respectively. The ROC curves of the models are shown in Fig. 2. The sensitivity and specificity of each model in predicting Ki-67 expression in breast cancer patients are shown in Table 2.

Fig. 2

ROC curves of the three models. (A) Training set. (B) Test set.

Table 2 Performance of Clinical-imaging, radiomics and combined models in the training and test sets.

In the training set, the DeLong test showed significant differences in performance between the Clinical-imaging and Radiomics models (P < 0.001) and between the Clinical-imaging and Combined models (P < 0.001), but not between the Radiomics and Combined models (P = 0.076). In the test set, there was no statistically significant difference between the Clinical-imaging and Radiomics models (P = 0.137) or between the Radiomics and Combined models (P = 0.872), whereas the difference between the Clinical-imaging and Combined models was statistically significant (P = 0.046). The IDI results showed that, in the training set, the Combined model improved on the Clinical-imaging and Radiomics models by 11.8% and 2.1%, respectively; in the test set, the corresponding improvements were 11% and 1.74%.

The calibration curves of the Combined model demonstrated good agreement between predicted and observed outcomes in the training and test sets (Fig. 3). The Hosmer–Lemeshow test showed no significant difference between the predicted and observed values (training set: P = 0.257; test set: P = 0.447). The DCA plot shows that the net clinical benefit of the Combined model is higher than that of the Clinical-imaging model and, to a large extent, higher than that of the Radiomics model (Fig. 3).

Fig. 3

Model evaluation. (A) Calibration curve of Combined model. (B) Decision curves of the three models in the test set.

Explainable machine learning model building

The optimal model was interpreted using the SHAP algorithm. Both the global and the individual Shapley values of the Combined model were calculated, and the weights of the nine most important features were displayed (Fig. 4). The figure shows that the radiomics features from ph2 of DCE-MRI contributed more to the model than those from ph5, with the surface-to-volume ratio of the original ph2 images contributing the most. Figure 5 illustrates individualized predictions of the Combined model using SHAP plots for two breast cancer patients, visualizing how MRI-based radiomics and clinical features affect the model’s prediction of Ki-67 expression at the individual level. The output value for patient 1 was 2.05, which was predicted as high expression (Fig. 5A). A waterfall plot (Fig. 5B) illustrates the overall decision-making process for this patient, with a final predicted probability of 0.886. Figure 5C shows patient 2, with an output value of −1.45, predicted by the model as low expression. A waterfall plot (Fig. 5D) likewise shows the overall decision-making process for this patient, with a final predicted probability of 0.191. In addition, the heat map of feature distribution under sample clustering (Fig. 6) shows the direction and strength of the influence of each feature across all samples. The more brightly colored a feature is in the figure, the greater its contribution to the model. Consistent with the results shown in Fig. 4, the surface-to-volume ratio contributed the most, followed by Imc1 from the second phase of DCE-MRI with Laplacian of Gaussian filtering.
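The reported output values and probabilities are consistent with the SHAP output f(x) being expressed in log-odds, which is the usual case when a logistic regression is explained on its linear scale. Under that assumption, the logistic (sigmoid) function converts one into the other, as the short check below shows. The function name is illustrative.

```python
import math

def log_odds_to_probability(z):
    """Logistic (sigmoid) transform from log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Output values reported for the two example patients.
p_patient_1 = log_odds_to_probability(2.05)
p_patient_2 = log_odds_to_probability(-1.45)
```

This gives about 0.886 for patient 1 and about 0.190 for patient 2. The second value matches the reported 0.191 to within the rounding of the displayed output value.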

Fig. 4

SHAP bar graphs show the weights of the nine most important features in the model.

Fig. 5

Visualization of the model’s predictions by SHAP. The base value is the model’s average predicted value, before any feature contribution is added. Arrows: contribution of each feature, pointing in the direction in which the feature pushes the prediction; arrow length indicates the magnitude of the contribution. f(x): the model’s actual prediction for a specific case. (A,B) A 51-year-old patient with invasive carcinoma of the right breast, no special type, with Ki-67 of 30% and high expression predicted by the Combined model. (C,D) A 55-year-old patient with invasive carcinoma of the left breast, no special type, with Ki-67 of 10% and low expression predicted by the Combined model. (B) and (D) show waterfall plots; the dashed line shows the patient’s overall decision-making process in the model.

Fig. 6

Heat map of SHAP values. The direction and strength of the influence of each model feature are shown for all patients in the test set. For example, the figure shows that wavelet-HLL-glcm-Imc1 in the delayed enhancement phase has a large positive influence on the first two samples.

Discussion

This study investigated the value of clinical-imaging and dual-phase DCE-MRI radiomics features in predicting Ki-67 expression in BC patients. The findings indicated that the Combined model, which integrated clinical-imaging and dual-phase radiomics features, outperformed both the Clinical-imaging model and the Radiomics model alone. In addition, this study used the SHAP algorithm to explain the best-performing model and to visualize the prediction process at both the individual and the overall level. This improves the interpretability of the model, helps clinicians better understand its decision-making process, and may support more accurate treatment planning for patients.

Recently, many studies have explored the correlation between breast MRI and Ki-67 expression and found that some conventional MRI features are correlated with Ki-67 expression status. Song et al.26 explored the relationship between MRI morphology, kinetic characteristics, ADC value, and Ki-67 expression, and found that wash-out and intratumoral high signal intensity on T2WI were independent biomarkers of Ki-67 expression status in BC. In the present study, however, semi-quantitative parameters were not correlated with Ki-67 expression, which may be due to differences in sample size and distribution between the two studies. Surov et al.28 analyzed the relationship between ADC and the Ki-67 proliferation index and other histopathological features of BC in a multicenter study and observed that the ADC value could not be used as a surrogate index of BC proliferation activity, which is in line with the findings of this study. Further, this study revealed that patients with high Ki-67 expression were more likely to have necrosis than those with low Ki-67 expression, which may be related to the proliferative activity of cancer cells. BC with elevated Ki-67 expression proliferates rapidly, and neovascularization may be insufficient to supply blood to the center of the lesion, resulting in necrosis29. This study also found that patients with elevated Ki-67 levels were more likely to show ring enhancement, similar to the findings of a previous study30.

Radiomics extracts high-throughput quantitative features from medical images. In contrast to conventional imaging features, it can more accurately reflect the overall heterogeneity of lesions and reveal possible associations between imaging features and biological characteristics14,31. Therefore, in addition to conventional imaging features, the ability of DCE-MRI radiomics features to predict Ki-67 expression in BC patients was analyzed. Wu et al.32 extracted radiomics features from early and delayed DCE-MRI and demonstrated that a nomogram model based on preoperative DCE-MRI images had good efficacy in predicting lymphatic invasion of invasive BC determined by D2-40. In the early and delayed phases of breast DCE-MRI, the characteristics of signal intensity changes may depict the internal blood vessels, microcirculation, and blood circulation of breast lesions, which is important in differentiating various forms of BC33,34. On this basis, this study extracted radiomics features from early and delayed DCE-MRI images to capture more of the hidden image information for predicting the Ki-67 expression status of BC35. To reduce the redundancy between features in the early and delayed phases, the Pearson correlation test was used to remove highly correlated features and thereby improve model performance.

Ma et al.36 found that quantitative radiomics features of breast tumors extracted from DCE-MRI were related to the expression of Ki-67 in BC. Feng et al.37 showed that a radiomics model based on the whole tumor area of the DCE-MRI parametric map and the ADC map could predict the Ki-67 expression of BC patients (AUC = 0.839). Liu et al.38 established a deep learning-based radiomics model using multi-parametric MRI for preoperative prediction of Ki-67 status, with a reported model performance of 0.875. Liang et al.39 established a radiomics classifier based on T2WI to predict Ki-67 expression, with a reported model performance of 0.740. Zhang et al.17 extracted radiomics features of bone metastases in patients with primary breast cancer to predict the Ki-67 expression of BC patients; the AUC was 0.796 in the training set and 0.705 in the test set.

Although previous studies have demonstrated good predictive capabilities for assessing Ki-67 expression in BC, these prediction models remain a “black box”, and it is challenging to identify the precise role of each feature within them. Therefore, the SHAP algorithm was used in this study to explain a conventional machine learning model at both the local and global levels40,41. The SHAP algorithm quantifies the marginal contribution of each feature when it is included in the model and visualizes the overall or individual contribution of each feature. In this study, it was used to analyze the degree to which different features contributed to the prediction model. The average SHAP value of a feature reflects its average contribution to the overall prediction: the larger the absolute SHAP value, the greater the influence of the feature on the prediction result. The SHAP bar chart showed that “ph2-original-shape-surface volume ratio” had the highest SHAP value, indicating its substantial contribution to the model. By examining the SHAP values for each sample, clinicians can see how each feature affects the model prediction under different circumstances, which enhances understanding of the model and facilitates its clinical application24,34,41.
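The notion of a marginal contribution averaged over feature combinations can be made concrete with a brute-force calculation. The sketch below is not the SHAP library and is not necessarily how the values in this study were computed. It enumerates every feature subset for a hypothetical four-feature linear model in log-odds space, and it replaces absent features by rows of a background sample, which is an assumption of this example. All names and numbers are invented.

```python
import itertools
import math
import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values of predict at x by enumerating all feature subsets."""
    n = x.size

    def value(subset):
        # Features in `subset` take the values of x; the rest come from the background.
        data = background.copy()
        for k in subset:
            data[:, k] = x[k]
        return float(predict(data).mean())

    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi, value(())

# Hypothetical linear model in log-odds space and a synthetic background sample.
rng = np.random.default_rng(1)
background = rng.normal(size=(50, 4))
weights = np.array([1.5, -0.8, 0.3, 0.0])
intercept = 0.2

def log_odds(data):
    return data @ weights + intercept

x = np.array([1.0, 0.5, -2.0, 3.0])
phi, base_value = exact_shapley(log_odds, x, background)
```

For a linear model, each contribution reduces to the coefficient times the feature's deviation from its background mean, and the base value plus the sum of contributions recovers f(x). The enumeration grows exponentially with the number of features, which is why practical tools rely on closed forms or approximations.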

Limitations

This study has some limitations. First, owing to the limited temporal resolution of the DCE-MRI acquisition, only semi-quantitative parameters could be used to explore the relationship with Ki-67 expression in BC. We will improve the temporal resolution in subsequent studies to explore the relationship between quantitative parameters and Ki-67 expression status. Second, this study only explored the value of the logistic regression algorithm in predicting Ki-67 expression in breast cancer; the value of other algorithms, and their explanation by the SHAP algorithm, needs further exploration. Third, this was a retrospective study with a relatively small sample size, and larger, multi-institutional studies are needed to verify the effectiveness of the model. Finally, the tumors were segmented manually, which is time-consuming and subject to inter-observer variability. In the future, semi-automatic or automatic segmentation may be adopted.

Conclusion

This study developed and validated an interpretable machine learning model based on clinical-imaging features and dual-phase DCE-MRI radiomics features. The model can effectively predict the Ki-67 expression of BC patients and improve clinicians’ understanding of its decision-making process, which may help clinicians formulate personalized treatment options for patients.