Abstract
Machine learning models for the diagnosis of breast cancer can facilitate the prediction of cancer risk and subsequent patient management among other clinical tasks. For the models to impact clinical practice, they ought to follow standard workflows, help interpret mammography and ultrasound data, evaluate clinical contextual information, handle incomplete data and be validated in prospective settings. Here we report the development and testing of a multimodal model leveraging mammography and ultrasound modules for the stratification of breast cancer risk based on clinical metadata, mammography and trimodal ultrasound (19,360 images of 5,216 breasts) from 5,025 patients with surgically confirmed pathology across medical centres and scanner manufacturers. Compared with the performance of experienced radiologists, the model performed similarly at classifying tumours as benign or malignant and was superior at pathology-level differential diagnosis. With a prospectively collected dataset of 191 breasts from 187 patients, the overall accuracies of the multimodal model and of preliminary pathologist-level assessments of biopsied breast specimens were similar (90.1% vs 92.7%, respectively). Multimodal models may assist diagnosis in oncology.
Data availability
The main data supporting the results in this study are available within the paper and its Supplementary Information. The mammography, ultrasound and multimodal datasets from the First Affiliated Hospital of Anhui Medical University (two branches), Xuancheng People’s Hospital, Nanjing Hospital affiliated to Nanjing Medical University, and Fuyang Cancer Hospital of China are protected because of patient privacy, yet some data can be made available for academic purposes from the corresponding author on reasonable request and with permission from the hospitals. Source data for Figs. 2 and 3 are provided with this paper.
Code availability
The code used in this study is available on GitHub at https://github.com/Qian-IMMULab/BMU-Net (ref. 55). The pretrained weights for the mammography module are those of the publicly available Mirai model28. Custom code and the annotation tool for the deployment of the AI system are available for research purposes from the corresponding author on reasonable request.
References
Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).
Oeffinger, K. C. et al. Breast cancer screening for women at average risk: 2015 guideline update from the American Cancer Society. JAMA 314, 1599–1614 (2015).
Boyd, N. F. et al. Mammographic density and the risk and detection of breast cancer. N. Engl. J. Med. 356, 227–236 (2007).
Berg, W. A. et al. Combined screening with ultrasound and mammography vs mammography alone in women at elevated risk of breast cancer. JAMA 299, 2151–2163 (2008).
Harada-Shoji, N. et al. Evaluation of adjunctive ultrasonography for breast cancer detection among women aged 40-49 years with varying breast density undergoing screening mammography: a secondary analysis of a randomized clinical trial. JAMA Netw. Open 4, e2121505 (2021).
Berg, W. A. et al. Shear-wave elastography improves the specificity of breast US: the BE1 multinational study of 939 masses. Radiology 262, 435–449 (2012).
Cho, N. et al. Distinguishing benign from malignant masses at breast US: combined US elastography and color Doppler US—influence on radiologist accuracy. Radiology 262, 80–90 (2012).
Kolb, T. M., Lichy, J. & Newhouse, J. H. Comparison of the performance of screening mammography, physical examination, and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology 225, 165–175 (2002).
De Felice, C. et al. Diagnostic utility of combined ultrasonography and mammography in the evaluation of women with mammographically dense breasts. J. Ultrasound 10, 143–151 (2007).
D’Orsi, C. J., Sickles, E. A., Mendelson, E. B. & Morris, E. A. ACR BI-RADS Atlas: Breast Imaging Reporting and Data System; Mammography, Ultrasound, Magnetic Resonance Imaging, Follow-up and Outcome Monitoring, Data Dictionary (American College of Radiology, 2013).
Lazarus, E., Mainiero, M. B., Schepps, B., Koelliker, S. L. & Livingston, L. S. BI-RADS lexicon for US and mammography: interobserver variability and positive predictive value. Radiology 239, 385–391 (2006).
Tosteson, A. N. et al. Consequences of false-positive screening mammograms. JAMA Intern. Med. 174, 954–961 (2014).
Gilbert, F. J. et al. Single reading with computer-aided detection for screening mammography. N. Engl. J. Med. 359, 1675–1684 (2008).
Chen, C.-M. et al. Breast lesions on sonograms: computer-aided diagnosis with nearly setting-independent features and artificial neural networks. Radiology 226, 504–514 (2003).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Yu, K.-H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2, 719–731 (2018).
Shen, D., Wu, G. & Suk, H.-I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342–1350 (2018).
Ardila, D. et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25, 954–961 (2019).
Yankeelov, T. E., Abramson, R. G. & Quarles, C. C. Quantitative multimodality imaging in cancer research and therapy. Nat. Rev. Clin. Oncol. 11, 670–680 (2014).
Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).
Boehm, K. M. et al. Multimodal data integration using machine learning improves risk stratification of high-grade serous ovarian cancer. Nat. Cancer 3, 723–733 (2022).
Lotter, W. et al. Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nat. Med. 27, 244–249 (2021).
McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature 577, 89–94 (2020).
Wu, N. et al. Deep neural networks improve radiologists’ performance in breast cancer screening. IEEE Trans. Med. Imaging 39, 1184–1194 (2019).
Kim, H.-E. et al. Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. Lancet Digit. Health 2, e138–e148 (2020).
Yala, A. et al. Toward robust mammography-based models for breast cancer risk. Sci. Transl. Med. 13, eaba4373 (2021).
Qian, X. et al. A combined ultrasonic B-mode and color Doppler system for the classification of breast masses using neural network. Eur. Radiol. 30, 3023–3033 (2020).
Qian, X. et al. Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nat. Biomed. Eng. 5, 522–532 (2021).
Shen, Y. et al. Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nat. Commun. 12, 5645 (2021).
Yan, L. et al. A domain knowledge-based interpretable deep learning system for improving clinical breast ultrasound diagnosis. Commun. Med. 4, 90 (2024).
Zhang, A., Xing, L., Zou, J. & Wu, J. C. Shifting machine learning for healthcare from development to deployment and from models to data. Nat. Biomed. Eng. 6, 1330–1345 (2022).
Nagendran, M. et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. Brit. Med. J. 368, m689 (2020).
Swanson, K., Wu, E., Zhang, A., Alizadeh, A. A. & Zou, J. From patterns to patients: advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell 186, 1772–1791 (2023).
Russakovsky, O. et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).
Buchberger, W., Niehoff, A., Obrist, P., DeKoekkoek-Doll, P. & Dünser, M. Clinically and mammographically occult breast lesions: detection and classification with high resolution sonography. Semin. Ultrasound CT MRI 21, 325–336 (2000).
Kim, H. J. et al. Mammographically occult breast cancers detected with AI-based diagnosis supporting software: clinical and histopathologic characteristics. Insights Imaging 13, 57 (2022).
Tadesse, G. F., Tegaw, E. M. & Abdisa, E. K. Diagnostic performance of mammography and ultrasound in breast cancer: a systematic review and meta-analysis. J. Ultrasound 26, 355–367 (2023).
Lev, M. H. et al. Acute stroke: improved nonenhanced CT detection—benefits of soft-copy interpretation by using variable window width and center level settings. Radiology 213, 150–155 (1999).
Youk, J. H., Kim, E.-K., Kim, M. J. & Oh, K. K. Sonographically guided 14-gauge core needle biopsy of breast masses: a review of 2,420 cases with long-term follow-up. Am. J. Roentgenol. 190, 202–207 (2008).
Elmore, J. G. et al. Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA 313, 1122–1132 (2015).
Li, H., Zhuang, S., Li, D.-A., Zhao, J. & Ma, Y. Benign and malignant classification of mammogram images based on deep learning. Biomed. Signal Process. Control 51, 347–354 (2019).
Johnson, J. M. & Khoshgoftaar, T. M. Survey on deep learning with class imbalance. J. Big Data 6, 27 (2019).
Mirbagheri, E., Ahmadi, M. & Salmanian, S. Common data elements of breast cancer for research databases: a systematic review. J. Fam. Med. Prim. Care 9, 1296 (2020).
Chang, J. M., Moon, W. K., Cho, N. & Kim, S. J. Breast mass evaluation: factors influencing the quality of US elastography. Radiology 259, 59–64 (2011).
Sarp, S. et al. Tumor location of the lower-inner quadrant is associated with an impaired survival for women with early-stage breast cancer. Ann. Surg. Oncol. 14, 1031–1039 (2007).
Clough, K. B., Kaufman, G. J., Nos, C., Buccimazza, I. & Sarfati, I. M. Improving breast cancer surgery: a classification and quadrant per quadrant atlas for oncoplastic surgery. Ann. Surg. Oncol. 17, 1375–1391 (2010).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition https://doi.org/10.1109/CVPR.2016.90 (IEEE, 2016).
Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. Preprint at https://arxiv.org/abs/2010.11929 (2020).
Snoek, C. G., Worring, M. & Smeulders, A. W. Early versus late fusion in semantic video analysis. In Proc. 13th Annual ACM International Conference on Multimedia 399–402 (Association for Computing Machinery, 2005).
McHugh, M. L. Interrater reliability: the kappa statistic. Biochem. Med. 22, 276–282 (2012).
Fluss, R., Faraggi, D. & Reiser, B. Estimation of the Youden Index and its associated cutoff point. Biom. J. 47, 458–472 (2005).
Selvaraju, R. R. et al. Grad-cam: visual explanations from deep networks via gradient-based localization. In IEEE International Conference on Computer Vision https://doi.org/10.1109/ICCV.2017.74 (IEEE, 2017).
Qian, X. et al. BMU-Net. GitHub https://github.com/Qian-IMMULab (2024).
Acknowledgements
We thank L. Yan and X. Gao for software infrastructure support for data preprocessing; P. Liu, X. Zhang and Y. Wu for help in clinical data management. This work would not have been possible without the participation of the mammographers (Y. Sun, Y. Lu, W. Qian, X. Wang and B. Zhu), the sonographers (W. Yao, X. Shuai, J. Zhang and X. Xie) and the pathologists (the team support from the Department of Pathology at the First Affiliated Hospital of Anhui Medical University). This study was supported by the National Natural Science Foundation of China (no. 82371993 to X.Q.), an internal grant from the ShanghaiTech University (to X.Q.) and the HPC Computing Platform of ShanghaiTech University.
Author information
Authors and Affiliations
Contributions
X.Q. conceived, designed and supervised the project. J.P., D.S. and H. Zheng provided clinical and technical expertise for the study. X.Q., Z.L., D.Y. and Y.C. preprocessed the raw image data, developed the deep-learning framework and software tools necessary for the experiments. C. Han, G.Z. and Z.L. created the datasets, interpreted the data and defined the clinical labels. C. Han, G.Z., N.C., W.Z., F.M., H. Zhang, X.W., Y.S. and W.Q. collected the mammography, ultrasound, clinical contextual information and patients’ pathology results in clinic. X.Q., Z.L., D.Y., Y.C., C. Hu and Z.E. executed the research and performed statistical analysis. X.Q. conducted literature search and wrote the manuscript. All authors contributed to the review and editing of the manuscript.
Corresponding author
Ethics declarations
Competing interests
X.Q., D.S. and Z.L. are co-inventors on a provisional patent application (2023108088594, China, 2023) encompassing the work described. The other authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks Ao Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Overview of the flowchart of patient recruitment and assignment.
Pre-defined inclusion and exclusion criteria were applied to all datasets. Owing to the lack of pathology results in the screening population, only the diagnostic population with findings was included in our study. The mammography dataset was retrospectively collected between February 2016 and June 2022. The ultrasound and multimodal datasets were prospectively gathered from September 2019 to August 2023 and from January 2021 to August 2023, respectively.
Extended Data Fig. 2 The performance of five mammographers and four sonographers in the reader study, using a confusion matrix.
Mammographers were tested on the internal test cohort of MG_H1, and sonographers were tested on the internal test cohort of US_H1M1.
Extended Data Fig. 3 The coarse-grained performance of individual modules and readers on a BI-RADS 4 subset.
Unlike Fig. 3, whose results are based on the combined BI-RADS categories, the results in a–d are measured exclusively on the BI-RADS 4 subset (here, BI-RADS 4 refers to the original radiologists' interpretation from the clinical reports in Table 1, not to the readers in this study) from the test cohorts of the mammography and ultrasound datasets, respectively.
Extended Data Fig. 4 The coarse-grained performance of individual modules and readers on a BI-RADS 5 subset.
Unlike Fig. 3, whose results are based on the combined BI-RADS categories, the results here are measured only on the BI-RADS 5 subset of a, the MG_H1 test cohort and b, the US_H1M1 test cohort. The ground truth of the radiologist-labelled BI-RADS 5 subsets comprises 49 malignant and 2 benign cases for mammography, and 10 malignant cases for ultrasound. Note that very few benign cases (3.9% of the mammography dataset and 0% of the ultrasound dataset) fall within BI-RADS 5; thus, only sensitivity is meaningful in this figure.
Extended Data Fig. 5 Confusion matrix of the mammography module on two external mammography datasets.
a, MG_H2 test cohort. b, MG_H3 test cohort. T1–T5 refer to regular-check benign, attention-needed benign, carcinoma in situ, CIS-IC carcinoma and invasive carcinoma, respectively.
Extended Data Fig. 6 Confusion matrix of the ultrasound module on three external ultrasound datasets.
a, US_H1M2 test cohort. b, US_H2 test cohort. c, US_H3 test cohort. T1–T5 refer to regular-check benign, attention-needed benign, carcinoma in situ, CIS-IC carcinoma and invasive carcinoma, respectively.
Extended Data Fig. 7 Confusion matrix of the BMU-Net model on an external multimodal dataset.
T1–T5 refer to regular-check benign, attention-needed benign, carcinoma in situ, CIS-IC carcinoma and invasive carcinoma, respectively.
Extended Data Fig. 8 Examples of AI prediction basis.
Colour-coded heatmaps overlaid on the corresponding mammography and trimodal ultrasound images were generated from the final convolutional layer using the Grad-CAM approach. Breast surgical pathology confirmed a, invasive carcinoma, predicted as class T5 by the BMU-Net model, and b, fibroadenoma, predicted as class T1 by the BMU-Net model.
Supplementary information
Supplementary Information (PDF)
Supplementary Figures, Tables, Notes and References.
Source data
Source Data for Fig. 2 (XLSX)
Source data and statistics.
Source Data for Fig. 3 (XLSX)
Source data and statistics.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qian, X., Pei, J., Han, C. et al. A multimodal machine learning model for the stratification of breast cancer risk. Nat. Biomed. Eng 9, 356–370 (2025). https://doi.org/10.1038/s41551-024-01302-7
Received:
Accepted:
Published:
Version of record:
Issue date:
DOI: https://doi.org/10.1038/s41551-024-01302-7