
A multimodal machine learning model for the stratification of breast cancer risk

Abstract

Machine learning models for the diagnosis of breast cancer can facilitate the prediction of cancer risk and subsequent patient management among other clinical tasks. For the models to impact clinical practice, they ought to follow standard workflows, help interpret mammography and ultrasound data, evaluate clinical contextual information, handle incomplete data and be validated in prospective settings. Here we report the development and testing of a multimodal model leveraging mammography and ultrasound modules for the stratification of breast cancer risk based on clinical metadata, mammography and trimodal ultrasound (19,360 images of 5,216 breasts) from 5,025 patients with surgically confirmed pathology across medical centres and scanner manufacturers. Compared with the performance of experienced radiologists, the model performed similarly at classifying tumours as benign or malignant and was superior at pathology-level differential diagnosis. With a prospectively collected dataset of 191 breasts from 187 patients, the overall accuracies of the multimodal model and of preliminary pathologist-level assessments of biopsied breast specimens were similar (90.1% vs 92.7%, respectively). Multimodal models may assist diagnosis in oncology.
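At a high level, a model of the kind described in the abstract combines per-modality feature extractors (mammography, ultrasound, clinical metadata) into a single risk head while tolerating incomplete exams. The sketch below illustrates late fusion of per-modality embeddings with zero-filling for a missing modality; all dimensions, the fusion strategy and the five-class head are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(mammo_feat, us_feat, clin_feat):
    """Late fusion: concatenate per-modality embeddings, zero-filling any
    missing modality so that incomplete exams can still be scored.
    The embedding sizes (128, 128, 16) are illustrative assumptions."""
    parts = []
    for feat, dim in ((mammo_feat, 128), (us_feat, 128), (clin_feat, 16)):
        parts.append(feat if feat is not None else np.zeros(dim))
    return np.concatenate(parts)

# Hypothetical linear risk head over the fused vector
# (five output classes, analogous to a T1-T5 partition).
W = rng.normal(size=(5, 272))

# Example: mammography and clinical features present, ultrasound missing.
fused = fuse(rng.normal(size=128), None, rng.normal(size=16))
logits = W @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()           # softmax over the five risk classes
```

In a real system the per-modality features would come from trained image encoders rather than random vectors; the point of the sketch is only the fusion-with-missing-modality pattern.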


Fig. 1: The overall study design for breast cancer risk stratification and patient care.
Fig. 2: Confusion-matrix comparison between our AI system and human experts on fine-grained breast disease classification.
Fig. 3: Performance of individual modules and readers on coarse-grained breast cancer assessment via the inference algorithm.
Fig. 4: The adaptability of the BMU-Net model for breast cancer risk stratification to real-world clinical settings.


Data availability

The main data supporting the results in this study are available within the paper and its Supplementary Information. The mammography, ultrasound and multimodal datasets from the First Affiliated Hospital of Anhui Medical University (two branches), Xuancheng People’s Hospital, Nanjing Hospital affiliated to Nanjing Medical University, and Fuyang Cancer Hospital of China are protected because of patient privacy, yet some data can be made available for academic purposes from the corresponding author on reasonable request and with permission from the hospitals. Source data for Figs. 2 and 3 are provided with this paper.

Code availability

The code used in this study is available on GitHub at https://github.com/Qian-IMMULab/BMU-Net (ref. 55). The pretrained weights for the mammography module are publicly available from the Mirai model (ref. 28). Custom code and the annotation tool for the deployment of the AI system are available for research purposes from the corresponding author on reasonable request.

References

  1. Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).

  2. Oeffinger, K. C. et al. Breast cancer screening for women at average risk: 2015 guideline update from the American Cancer Society. JAMA 314, 1599–1614 (2015).

  3. Boyd, N. F. et al. Mammographic density and the risk and detection of breast cancer. N. Engl. J. Med. 356, 227–236 (2007).

  4. Berg, W. A. et al. Combined screening with ultrasound and mammography vs mammography alone in women at elevated risk of breast cancer. JAMA 299, 2151–2163 (2008).

  5. Harada-Shoji, N. et al. Evaluation of adjunctive ultrasonography for breast cancer detection among women aged 40–49 years with varying breast density undergoing screening mammography: a secondary analysis of a randomized clinical trial. JAMA Netw. Open 4, e2121505 (2021).

  6. Berg, W. A. et al. Shear-wave elastography improves the specificity of breast US: the BE1 multinational study of 939 masses. Radiology 262, 435–449 (2012).

  7. Cho, N. et al. Distinguishing benign from malignant masses at breast US: combined US elastography and color Doppler US—influence on radiologist accuracy. Radiology 262, 80–90 (2012).

  8. Kolb, T. M., Lichy, J. & Newhouse, J. H. Comparison of the performance of screening mammography, physical examination, and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology 225, 165–175 (2002).

  9. De Felice, C. et al. Diagnostic utility of combined ultrasonography and mammography in the evaluation of women with mammographically dense breasts. J. Ultrasound 10, 143–151 (2007).

  10. D’Orsi, C. J., Sickles, E. A., Mendelson, E. B. & Morris, E. A. ACR BI-RADS Atlas: Breast Imaging Reporting and Data System; Mammography, Ultrasound, Magnetic Resonance Imaging, Follow-up and Outcome Monitoring, Data Dictionary (American College of Radiology, 2013).

  11. Lazarus, E., Mainiero, M. B., Schepps, B., Koelliker, S. L. & Livingston, L. S. BI-RADS lexicon for US and mammography: interobserver variability and positive predictive value. Radiology 239, 385–391 (2006).

  12. Tosteson, A. N. et al. Consequences of false-positive screening mammograms. JAMA Intern. Med. 174, 954–961 (2014).

  13. Gilbert, F. J. et al. Single reading with computer-aided detection for screening mammography. N. Engl. J. Med. 359, 1675–1684 (2008).

  14. Chen, C.-M. et al. Breast lesions on sonograms: computer-aided diagnosis with nearly setting-independent features and artificial neural networks. Radiology 226, 504–514 (2003).

  15. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  16. Yu, K.-H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2, 719–731 (2018).

  17. Shen, D., Wu, G. & Suk, H.-I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017).

  18. Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).

  19. De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342–1350 (2018).

  20. Ardila, D. et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25, 954–961 (2019).

  21. Yankeelov, T. E., Abramson, R. G. & Quarles, C. C. Quantitative multimodality imaging in cancer research and therapy. Nat. Rev. Clin. Oncol. 11, 670–680 (2014).

  22. Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).

  23. Boehm, K. M. et al. Multimodal data integration using machine learning improves risk stratification of high-grade serous ovarian cancer. Nat. Cancer 3, 723–733 (2022).

  24. Lotter, W. et al. Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nat. Med. 27, 244–249 (2021).

  25. McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature 577, 89–94 (2020).

  26. Wu, N. et al. Deep neural networks improve radiologists’ performance in breast cancer screening. IEEE Trans. Med. Imaging 39, 1184–1194 (2019).

  27. Kim, H.-E. et al. Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. Lancet Digit. Health 2, e138–e148 (2020).

  28. Yala, A. et al. Toward robust mammography-based models for breast cancer risk. Sci. Transl. Med. 13, eaba4373 (2021).

  29. Qian, X. et al. A combined ultrasonic B-mode and color Doppler system for the classification of breast masses using neural network. Eur. Radiol. 30, 3023–3033 (2020).

  30. Qian, X. et al. Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nat. Biomed. Eng. 5, 522–532 (2021).

  31. Shen, Y. et al. Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nat. Commun. 12, 5645 (2021).

  32. Yan, L. et al. A domain knowledge-based interpretable deep learning system for improving clinical breast ultrasound diagnosis. Commun. Med. 4, 90 (2024).

  33. Zhang, A., Xing, L., Zou, J. & Wu, J. C. Shifting machine learning for healthcare from development to deployment and from models to data. Nat. Biomed. Eng. 6, 1330–1345 (2022).

  34. Nagendran, M. et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. Brit. Med. J. 368, m689 (2020).

  35. Swanson, K., Wu, E., Zhang, A., Alizadeh, A. A. & Zou, J. From patterns to patients: advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell 186, 1772–1791 (2023).

  36. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).

  37. Buchberger, W., Niehoff, A., Obrist, P., DeKoekkoek-Doll, P. & Dünser, M. Clinically and mammographically occult breast lesions: detection and classification with high-resolution sonography. Semin. Ultrasound CT MRI 21, 325–336 (2000).

  38. Kim, H. J. et al. Mammographically occult breast cancers detected with AI-based diagnosis supporting software: clinical and histopathologic characteristics. Insights Imaging 13, 57 (2022).

  39. Tadesse, G. F., Tegaw, E. M. & Abdisa, E. K. Diagnostic performance of mammography and ultrasound in breast cancer: a systematic review and meta-analysis. J. Ultrasound 26, 355–367 (2023).

  40. Lev, M. H. et al. Acute stroke: improved nonenhanced CT detection—benefits of soft-copy interpretation by using variable window width and center level settings. Radiology 213, 150–155 (1999).

  41. Youk, J. H., Kim, E.-K., Kim, M. J. & Oh, K. K. Sonographically guided 14-gauge core needle biopsy of breast masses: a review of 2,420 cases with long-term follow-up. Am. J. Roentgenol. 190, 202–207 (2008).

  42. Elmore, J. G. et al. Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA 313, 1122–1132 (2015).

  43. Li, H., Zhuang, S., Li, D.-A., Zhao, J. & Ma, Y. Benign and malignant classification of mammogram images based on deep learning. Biomed. Signal Process. Control 51, 347–354 (2019).

  44. Johnson, J. M. & Khoshgoftaar, T. M. Survey on deep learning with class imbalance. J. Big Data 6, 27 (2019).

  45. Mirbagheri, E., Ahmadi, M. & Salmanian, S. Common data elements of breast cancer for research databases: a systematic review. J. Fam. Med. Prim. Care 9, 1296 (2020).

  46. Chang, J. M., Moon, W. K., Cho, N. & Kim, S. J. Breast mass evaluation: factors influencing the quality of US elastography. Radiology 259, 59–64 (2011).

  47. Sarp, S. et al. Tumor location of the lower-inner quadrant is associated with an impaired survival for women with early-stage breast cancer. Ann. Surg. Oncol. 14, 1031–1039 (2007).

  48. Clough, K. B., Kaufman, G. J., Nos, C., Buccimazza, I. & Sarfati, I. M. Improving breast cancer surgery: a classification and quadrant per quadrant atlas for oncoplastic surgery. Ann. Surg. Oncol. 17, 1375–1391 (2010).

  49. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition https://doi.org/10.1109/CVPR.2016.90 (IEEE, 2016).

  50. Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. Preprint at https://arxiv.org/abs/2010.11929 (2020).

  51. Snoek, C. G., Worring, M. & Smeulders, A. W. Early versus late fusion in semantic video analysis. In Proc. 13th Annual ACM International Conference on Multimedia 399–402 (Association for Computing Machinery, 2005).

  52. McHugh, M. L. Interrater reliability: the kappa statistic. Biochem. Med. 22, 276–282 (2012).

  53. Fluss, R., Faraggi, D. & Reiser, B. Estimation of the Youden Index and its associated cutoff point. Biom. J. 47, 458–472 (2005).

  54. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In IEEE International Conference on Computer Vision https://doi.org/10.1109/ICCV.2017.74 (IEEE, 2017).

  55. Qian, X. et al. BMU-Net. GitHub https://github.com/Qian-IMMULab (2024).


Acknowledgements

We thank L. Yan and X. Gao for software infrastructure support for data preprocessing; P. Liu, X. Zhang and Y. Wu for help in clinical data management. This work would not have been possible without the participation of the mammographers (Y. Sun, Y. Lu, W. Qian, X. Wang and B. Zhu), the sonographers (W. Yao, X. Shuai, J. Zhang and X. Xie) and the pathologists (the team support from the Department of Pathology at the First Affiliated Hospital of Anhui Medical University). This study was supported by the National Natural Science Foundation of China (no. 82371993 to X.Q.), an internal grant from the ShanghaiTech University (to X.Q.) and the HPC Computing Platform of ShanghaiTech University.

Author information

Authors and Affiliations

Authors

Contributions

X.Q. conceived, designed and supervised the project. J.P., D.S. and H. Zheng provided clinical and technical expertise for the study. X.Q., Z.L., D.Y. and Y.C. preprocessed the raw image data, developed the deep-learning framework and software tools necessary for the experiments. C. Han, G.Z. and Z.L. created the datasets, interpreted the data and defined the clinical labels. C. Han, G.Z., N.C., W.Z., F.M., H. Zhang, X.W., Y.S. and W.Q. collected the mammography, ultrasound, clinical contextual information and patients’ pathology results in clinic. X.Q., Z.L., D.Y., Y.C., C. Hu and Z.E. executed the research and performed statistical analysis. X.Q. conducted literature search and wrote the manuscript. All authors contributed to the review and editing of the manuscript.

Corresponding author

Correspondence to Xuejun Qian.

Ethics declarations

Competing interests

X.Q., D.S. and Z.L. are co-inventors on a provisional patent application (2023108088594, China, 2023) encompassing the work described. The other authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Ao Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of the flowchart of patient recruitment and assignment.

Pre-defined inclusion and exclusion criteria were applied to all datasets. Owing to the lack of pathology results in the screening population, only the diagnostic population with findings was included in our study. The mammography dataset was retrospectively collected between February 2016 and June 2022. The ultrasound and multimodal datasets were prospectively gathered from September 2019 to August 2023 and from January 2021 to August 2023, respectively.

Extended Data Fig. 2 The performance of five mammographers and four sonographers in the reader study, using a confusion matrix.

Mammographers were tested on the internal test cohort of MG_H1, and sonographers were tested on the internal test cohort of US_H1M1.

Extended Data Fig. 3 The coarse-grained performance of individual modules and readers on a BI-RADS 4 subset.

Different from Fig. 3, whose results are based on the combined BI-RADS categories, the results in a–d are measured exclusively on the BI-RADS 4 subsets (BI-RADS 4 refers to the original radiologists’ interpretation from the clinical reports in Table 1, not to the readers in this study) from the test cohorts of the mammography dataset and the ultrasound dataset, respectively.

Extended Data Fig. 4 The coarse-grained performance of individual modules and readers on a BI-RADS 5 subset.

Different from Fig. 3, whose results are based on the combined BI-RADS categories, the results here are measured only on the BI-RADS 5 subsets of a, the MG_H1 test cohort and b, the US_H1M1 test cohort. The ground truth for the radiologist-labelled BI-RADS 5 subsets is 49 malignant and 2 benign for mammography, and 10 malignant for ultrasound. Note that very few benign cases (3.9% for the mammography dataset and 0% for the ultrasound dataset) are included in BI-RADS 5; thus, only sensitivity is meaningful in this figure.

Extended Data Fig. 5 Confusion matrix of the mammography module on two external mammography datasets.

a, MG_H2 test cohort; b, MG_H3 test cohort. T1–T5 refer to regular-check benign, attention-needed benign, carcinoma in situ, CIS–IC carcinoma and invasive carcinoma, respectively.

Extended Data Fig. 6 Confusion matrix of the ultrasound module on three external ultrasound datasets.

a, US_H1M2 test cohort; b, US_H2 test cohort; c, US_H3 test cohort. T1–T5 refer to regular-check benign, attention-needed benign, carcinoma in situ, CIS–IC carcinoma and invasive carcinoma, respectively.

Extended Data Fig. 7 Confusion matrix of the BMU-Net model on an external multimodal dataset.

T1–T5 refer to regular-check benign, attention-needed benign, carcinoma in situ, CIS–IC carcinoma and invasive carcinoma, respectively.

Extended Data Fig. 8 Examples of AI prediction basis.

Colour-coded heatmaps overlaid on the corresponding mammography and trimodal ultrasound images were generated from the final convolutional layer using the Grad-CAM approach. Breast surgical pathology confirmed a, invasive carcinoma, predicted as class T5 by the BMU-Net model, and b, fibroadenoma, predicted as class T1 by the BMU-Net model.
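The Grad-CAM computation described above can be outlined from the final convolutional layer's activations and the class-score gradients alone. The sketch below is a minimal NumPy rendering of the published Grad-CAM formula (ref. 54): channel weights from global average pooling of the gradients, a weighted sum of the feature maps, ReLU, then normalization for overlay. It is an illustration, not the authors' code.

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heatmap.

    feature_maps: activations A_k of the final conv layer, shape (C, H, W)
    gradients: d(class score)/dA_k for the predicted class, shape (C, H, W)
    Returns a heatmap of shape (H, W) scaled to [0, 1].
    """
    # Channel weights: global average pooling of the gradients.
    weights = gradients.mean(axis=(1, 2))                       # (C,)
    # Weighted combination of the feature maps, then ReLU.
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    # Scale to [0, 1] so the map can be rendered as a colour overlay.
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```

In practice the low-resolution map is upsampled to the input image size before being overlaid on the mammogram or ultrasound frame.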

Supplementary information

Supplementary Information (PDF)

Supplementary Figures, Tables, Notes and References.

Reporting Summary (PDF)

Source data

Source Data for Fig. 2 (XLSX)

Source data and statistics.

Source Data for Fig. 3 (XLSX)

Source data and statistics.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Qian, X., Pei, J., Han, C. et al. A multimodal machine learning model for the stratification of breast cancer risk. Nat. Biomed. Eng. 9, 356–370 (2025). https://doi.org/10.1038/s41551-024-01302-7

