Abstract
Foundation models have emerged as powerful tools for addressing various tasks in clinical settings. However, their potential for breast ultrasound analysis remains untapped. Here we present BUSGen, the first foundation generative model designed for breast ultrasound image analysis. Pretrained on over 3.5 million breast ultrasound images, BUSGen has acquired extensive knowledge of breast structures, pathological features and clinical variations. With few-shot adaptation, BUSGen can generate repositories of realistic and informative task-specific data, facilitating the development of models for a wide range of downstream tasks. Extensive experiments highlight BUSGen’s exceptional adaptability, with our approach significantly exceeding real-data-trained foundation models in breast cancer screening, diagnosis and prognosis. In early breast cancer diagnosis, our approach outperformed all board-certified radiologists (n = 9), achieving an average sensitivity improvement of 16.5% (P < 0.0001). In addition, we characterized the scaling effect of using synthetic data. Finally, BUSGen enabled de-identified data sharing, a step towards the secure use of medical data.
Data availability
The BUSI dataset used in this study is publicly available at https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset. The UDIAT dataset is publicly available at https://www2.docm.mmu.ac.uk/STAFF/m.yap/dataset.php. The BUS-BRA dataset (ref. 103) is publicly available at https://zenodo.org/records/8231412. We released a repository of the BUSI dataset with scanner device augmentation, publicly available in figshare at https://doi.org/10.6084/m9.figshare.30571916.v1 (ref. 104). However, due to respective Institutional Review Board restrictions and to protect patient privacy, the BUS-3.5M and the downstream datasets used in this study cannot be made publicly available. De-identified data may be made available by the corresponding authors for research purposes upon reasonable request. Source data are provided with this paper.
Code availability
The pretraining and adaptation code for BUSGen and an online API are available upon reasonable request via email to haojunyu@pku.edu.cn. Requests will be responded to within 6 weeks. Also, we provide an online demo of BUSGen at https://aibus.bio.
References
Siegel, R. L. et al. Cancer statistics, 2023. CA Cancer J. Clin. 73, 17–48 (2023).
Chhikara, B. S. & Parang, K. Global cancer statistics 2022: the trends projection analysis. Chem. Biol. Lett. 10, 451–451 (2023).
Xia, C. et al. Cancer statistics in China and United States, 2022: profiles, trends, and determinants. Chin. Med. J. 135, 584–590 (2022).
Sickles, E. A. ACR BI-RADS atlas, breast imaging reporting and data system. J. Am. Coll. Radiol. 39 (2013).
Ohuchi, N. et al. Sensitivity and specificity of mammography and adjunctive ultrasonography to screen for breast cancer in the Japan Strategic Anti-cancer Randomized Trial (J-START): a randomised controlled trial. Lancet 387, 341–348 (2016).
Berg, W. A. et al. Detection of breast cancer with addition of annual screening ultrasound or a single screening MRI to mammography in women with elevated breast cancer risk. JAMA 307, 1394–1404 (2012).
Shen, S. et al. A multi-centre randomised trial comparing ultrasound vs mammography for screening breast cancer in high-risk Chinese women. Br. J. Cancer 112, 998–1004 (2015).
Brem, R. F., Lenihan, M. J., Lieberman, J. & Torrente, J. Screening breast ultrasound: past, present, and future. Am. J. Roentgenol. 204, 234–240 (2015).
Sood, R. et al. Ultrasound for breast cancer detection globally: a systematic review and meta-analysis. J. Glob. Oncol. 5, 1–17 (2019).
Park, Y. et al. Pan-Asian adapted ESMO clinical practice guidelines for the management of patients with early breast cancer: a KSMO-ESMO initiative endorsed by CSCO, ISMPO, JSMO, MOS, SSO and TOS. Ann. Oncol. 31, 451–469 (2020).
Weese, J. & Lorenz, C. Four Challenges in Medical Image Analysis from an Industrial Perspective (Elsevier, 2016).
González-Villà, S. et al. A review on brain structures segmentation in magnetic resonance imaging. Artif. Intell. Med. 73, 45–69 (2016).
Geirhos, R. et al. Shortcut learning in deep neural networks. Nat. Mach. Intell. 2, 665–673 (2020).
Cao, K. et al. Large-scale pancreatic cancer detection via non-contrast CT and deep learning. Nat. Med. 29, 3033–3043 (2023).
Goetz, L., Seedat, N., Vandersluis, R. & Schaar, M. Generalization—a key challenge for responsible AI in patient-facing clinical applications. npj Digit. Med. 7, 126 (2024).
Tiu, E. et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat. Biomed. Eng. 6, 1399–1406 (2022).
Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).
Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).
De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342–1350 (2018).
Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
Yap, M. H. et al. Automated breast ultrasound lesions detection using convolutional neural networks. IEEE J. Biomed. Health Inform. 22, 1218–1226 (2017).
Al-Dhabyani, W., Gomaa, M., Khaled, H. & Fahmy, A. Dataset of breast ultrasound images. Data Brief 28, 104863 (2020).
Lin, Z. et al. A new dataset and a baseline model for breast lesion detection in ultrasound videos. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Wang, L. et al.) 614–623 (Springer, 2022).
Malin, B. A., Emam, K. E. & O’Keefe, C. M. Biomedical data privacy: problems, perspectives, and recent advances. J. Am. Med. Inform. Assoc. 20, 2–6 (2013).
Shen, Y. et al. Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nat. Commun. 12, 5645 (2021).
Wu, T., Sultan, L. R., Tian, J., Cary, T. W. & Sehgal, C. M. Machine learning for diagnostic ultrasound of triple-negative breast cancer. Breast Cancer Res. Treat. 173, 365–373 (2019).
Ozaki, J. et al. Deep learning method with a convolutional neural network for image classification of normal and metastatic axillary lymph nodes on breast ultrasonography. Jpn J. Radiol. 40, 814–822 (2022).
Zuluaga-Gomez, J., Al Masry, Z., Benaggoune, K., Meraghni, S. & Zerhouni, N. A CNN-based methodology for breast cancer diagnosis using thermal images. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 9, 131–145 (2021).
Ong Ly, C. et al. Shortcut learning in medical AI hinders generalization: method for estimating AI model generalization without external data. npj Digit. Med. 7, 124 (2024).
Namli, T. et al. A scalable and transparent data pipeline for AI-enabled health data ecosystems. Front. Med. 11, 1393123 (2024).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35, 27730–27744 (2022).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 10684–10695 (IEEE, 2022).
Gemini Team Google. Gemini: a family of highly capable multimodal models. Preprint at https://arxiv.org/abs/2312.11805 (2023).
Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://arxiv.org/abs/2108.07258 (2021).
Bluethgen, C. et al. A vision-language foundation model for the generation of realistic chest X-ray images. Nat. Biomed. Eng. 9, 494–506 (2025).
Wang, J. et al. Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nat. Med. 31, 609–617 (2025).
McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature 577, 89–94 (2020).
Lee, C. H. et al. Breast cancer screening with imaging: recommendations from the society of breast imaging and the ACR on the use of mammography, breast MRI, breast ultrasound, and other technologies for the detection of clinically occult breast cancer. J. Am. Coll. Radiol. 7, 18–27 (2010).
Marmot, M. G. et al. The benefits and harms of breast cancer screening: an independent review. Br. J. Cancer 108, 2205–2240 (2013).
Qian, X. et al. Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nat. Biomed. Eng. 5, 522–532 (2021).
Pacilè, S. et al. Improving breast cancer detection accuracy of mammography with the concurrent use of an artificial intelligence tool. Radiol. Artif. Intell. 2, 190208 (2020).
Dembrower, K. et al. Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study. Lancet Digit. Health 2, 468–474 (2020).
Zhou, L.-Q. et al. Lymph node metastasis prediction from primary breast cancer US images using deep learning. Radiology 294, 19–28 (2020).
Tafreshi, N. K., Kumar, V., Morse, D. L. & Gatenby, R. A. Molecular and functional imaging of breast cancer. Cancer Control 17, 143–155 (2010).
Zhao, S., Zuo, W.-J., Shao, Z.-M. & Jiang, Y.-Z. Molecular subtypes and precision treatment of triple-negative breast cancer. Ann. Transl. Med. 8, 499 (2020).
Tsang, J. Y. & Tse, G. M. Molecular classification of breast cancer. Adv. Anat. Pathol. 27, 27–35 (2020).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning 8748–8763 (PMLR, 2021).
Fan, L. et al. Scaling laws of synthetic images for model training… for now. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 7382–7392 (IEEE, 2024).
Ktena, I. et al. Generative models improve fairness of medical classifiers under distribution shifts. Nat. Med. 30, 1166–1173 (2024).
Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 34, 8780–8794 (2021).
Ho, J. & Salimans, T. Classifier-free diffusion guidance. In NeurIPS Workshop on Deep Generative Models and Downstream Applications Poster (NeurIPS, 2021).
Zhang, L., Rao, A. & Agrawala, M. Adding conditional control to text-to-image diffusion models. In Proc. IEEE/CVF International Conference on Computer Vision 3836–3847 (IEEE, 2023).
Hu, E. J. et al. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations Poster (ICLR, 2022).
Lu, C. et al. DPM-Solver: a fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Adv. Neural Inf. Process. Syst. 35, 5775–5787 (2022).
Lu, C. et al. DPM-Solver++: fast solver for guided sampling of diffusion probabilistic models. Mach. Intell. Res. 22, 730–751 (2025).
Kazdan, J. et al. CPsample: classifier protected sampling for guarding training data during diffusion. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).
Lin, W. et al. PMC-CLIP: contrastive language-image pre-training using biomedical documents. In Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention 525–536 (Springer, 2023).
Wang, Z., Wu, Z., Agarwal, D. & Sun, J. MedCLIP: contrastive learning from unpaired medical images and text. In Proc. Conference on Empirical Methods in Natural Language Processing 3876 (ACL, 2022).
Eslami, S., Meinel, C. & de Melo, G. in Findings of the Association for Computational Linguistics: EACL 2023 (eds Vlachos, A. & Augenstein, I.) 1181–1193 (ACL, 2023).
Zhang, S. et al. A multimodal biomedical foundation model trained from fifteen million image–text pairs. NEJM AI 2, 2400640 (2025).
Chen, X., Xie, S. & He, K. An empirical study of training self-supervised vision transformers. In Proc. IEEE/CVF International Conference on Computer Vision 9640–9649 (IEEE, 2021).
He, K. et al. Masked autoencoders are scalable vision learners. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 16000–16009 (IEEE, 2022).
Geman, D., Geman, S., Hallonquist, N. & Younes, L. Visual Turing test for computer vision systems. Proc. Natl Acad. Sci. USA 112, 3618–3623 (2015).
Carlini, N. et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21) 2633–2650 (USENIX, 2021).
Elmore, J. G., Armstrong, K., Lehman, C. D. & Fletcher, S. W. Screening for breast cancer. JAMA 293, 1245–1256 (2005).
Myers, E. R. et al. Benefits and harms of breast cancer screening: a systematic review. JAMA 314, 1615–1634 (2015).
Yuan, W.-H., Hsu, H.-C., Chen, Y.-Y. & Wu, C.-H. Supplemental breast cancer-screening ultrasonography in women with dense breasts: a systematic review and meta-analysis. Br. J. Cancer 123, 673–688 (2020).
Yap, M. H. et al. Breast ultrasound region of interest detection and lesion localisation. Artif. Intell. Med. 107, 101880 (2020).
Pinder, S. E. Ductal carcinoma in situ (DCIS): pathological features, differential diagnosis, prognostic factors and specimen evaluation. Modern Pathol. 23, 8–13 (2010).
Ernster, V. L. & Barclay, J. Increases in ductal carcinoma in situ (DCIS) of the breast in relation to mammography: a dilemma. J. Natl Cancer Inst. Monogr. 1997, 151–156 (1997).
Winchester, D. P., Jeske, J. M. & Goldschmidt, R. A. The diagnosis and management of ductal carcinoma in-situ of the breast. CA Cancer J. Clin. 50, 184–200 (2000).
Watanabe, T. et al. Ultrasound image classification of ductal carcinoma in situ (DCIS) of the breast: analysis of 705 DCIS lesions. Ultrasound Med. Biol. 43, 918–925 (2017).
Kim, S. H. et al. Correlation of ultrasound findings with histology, tumor grade, and biological markers in breast cancer. Acta Oncol. 47, 1531–1538 (2008).
Ahsan, M. M., Mahmud, M. P., Saha, P. K., Gupta, K. D. & Siddique, Z. Effect of data scaling methods on machine learning algorithms and model performance. Technologies 9, 52 (2021).
Frid-Adar, M. et al. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321, 321–331 (2018).
Sagers, L. W. et al. Augmenting medical image classifiers with synthetic data from latent diffusion models. Preprint at https://arxiv.org/abs/2308.12453 (2023).
Kerlikowske, K., Smith-Bindman, R., Ljung, B.-M. & Grady, D. Evaluation of abnormal mammography results and palpable breast abnormalities. Ann. Intern. Med. 139, 274–284 (2003).
Weigel, M. T. & Dowsett, M. Current and emerging biomarkers in breast cancer: prognosis and prediction. Endocr. Relat. Cancer 17, 245–262 (2010).
Baxevanis, C. N., Fortis, S. P. & Perez, S. A. in Seminars in Cancer Biology Vol. 72 76–89 (Elsevier, 2021).
Liedtke, C. et al. Response to neoadjuvant therapy and long-term survival in patients with triple-negative breast cancer. J. Clin. Oncol. 26, 1275–1281 (2008).
Kennecke, H. et al. Metastatic behavior of breast cancer subtypes. J. Clin. Oncol. 28, 3271–3277 (2010).
Rao, R., Euhus, D., Mayo, H. G. & Balch, C. Axillary node interventions in breast cancer: a systematic review. JAMA 310, 1385–1394 (2013).
Cianfrocca, M. & Goldstein, L. J. Prognostic and predictive factors in early-stage breast cancer. Oncologist 9, 606–616 (2004).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proc. IEEE International Conference on Computer Vision 618–626 (IEEE, 2017).
Gatys, L. A., Ecker, A. S. & Bethge, M. Image style transfer using convolutional neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2414–2423 (IEEE, 2016).
Zhu, J.-Y., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proc. IEEE International Conference on Computer Vision 2223–2232 (IEEE, 2017).
He, K., Gkioxari, G., Dollár, P. & Girshick, R. Mask R-CNN. In Proc. IEEE International Conference on Computer Vision 2961–2969 (IEEE, 2017).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
Sun, X. & Xu, W. Fast implementation of DeLong’s algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Process. Lett. 21, 1389–1393 (2014).
Benjamini, Y., Drai, D., Elmer, G., Kafkafi, N. & Golani, I. Controlling the false discovery rate in behavior genetics research. Behav. Brain Res. 125, 279–284 (2001).
Glickman, M. E., Rao, S. R. & Schultz, M. R. False discovery rate control is a recommended alternative to Bonferroni-type adjustments in health studies. J. Clin. Epidemiol. 67, 850–857 (2014).
Hsu, W. et al. External validation of an ensemble model for automated mammography interpretation by artificial intelligence. JAMA Netw. Open 5, 2242343 (2022).
Park, S. H. et al. Methods for clinical evaluation of artificial intelligence algorithms for medical diagnosis. Radiology 306, 20–31 (2023).
Ahn, J. S. et al. Association of artificial intelligence-aided chest radiograph interpretation with reader performance and efficiency. JAMA Netw. Open 5, 2229289 (2022).
Twilt, J. J. et al. AI-assisted vs unassisted identification of prostate cancer in magnetic resonance images. JAMA Netw. Open 8, 2515672 (2025).
Gommers, J. J. et al. Influence of AI decision support on radiologists’ performance and visual search in screening mammography. Radiology 316, 243688 (2025).
Hunter, D. J. & Holmes, C. Where medical statistics meets artificial intelligence. N. Engl. J. Med. 389, 1211–1219 (2023).
Hwang, Y.-T. & Su, N.-C. Sample size determination for comparing accuracies between two diagnostic tests under a paired design. Biom. J. 64, 771–804 (2022).
Riesthuis, P., Otgaar, H. & Bücken, C. Ready to ROC? A tutorial on simulation-based power analyses for null hypothesis significance, minimum-effect, and equivalence testing for ROC curve analyses. Behav. Res. Methods 57, 1–20 (2025).
Jung, S.-H. A Dunnett-type test and its sample size calculation for comparing K ROC curves with a control. Diagnostics 14, 1813 (2024).
Gómez-Flores, W., Gregorio-Calas, M. J. & de Albuquerque Pereira, W. C. BUS-BRA: a breast ultrasound dataset for assessing computer-aided diagnosis systems. Med. Phys. 51, 3110–3123 (2024).
Li, Y. BUSI dataset with scanner device augmentation. figshare https://doi.org/10.6084/m9.figshare.30571916.v1 (2025).
Acknowledgements
We thank R. Li and J. Fan from Peking University for helpful suggestions and discussions. D.H. was supported by a National Science and Technology Major Project (2022ZD0160300) and the National Science Foundation of China (NSFC62376007). L.W. was supported by the National Science Foundation of China (NSFC92470123, NSFC62276005).
Author information
Contributions
H. Yu and L.W. conceived and designed the study. Z.N., B.T., Y. Luo and X.G. carried out data acquisition. Y. Li, H. Yu, Y. Luo, Z.N. and Q.W. carried out data processing and annotation. H. Yu developed the AI models. Y. Li carried out generated data cleaning. Y. Li developed the platform for reader study. N.Z., Z.N., W.Q., J.T., M.Z., X.G., J.H., L.H. and Y.W. participated in the reader study. H. Yu, H. Ye, S.H., D.H., Y. Li, N.Z., Z.N., D.W., Z.Z., Q.W., D.D., Q.Z., J.Z. and L.W. wrote and revised the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks Fajin Dong and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Overview of BUSGen pretraining and fine-tuning.
a, The noising and denoising processes of the diffusion model, where xT represents the noisy image and x0 is the original image. The model learns to denoise xt to obtain xt−1 iteratively. b, The architecture of BUSGen: a U-Net with condition embeddings and time embeddings incorporated at each layer. The structure includes multiple encoder blocks, a middle block and decoder blocks, where each encoder block consists of ResBlocks and downsampling, the middle block contains ResBlocks and an AttnBlock, and each decoder block comprises ResBlocks and upsampling. c, Encoder, Middle and Decoder Blocks. d, ResBlock and AttnBlock. e, Illustration of the LoRA fine-tuning principle, where the update to the pretrained weights W is parameterized by the product of low-rank matrices A and B, enabling efficient adaptation of the model.
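The two mechanisms named in panels a and e can be illustrated with a minimal sketch. It follows the standard formulations in the cited DDPM and LoRA papers (refs. 76 and 83), not BUSGen's actual code: the linear noise schedule, the number of steps, the rank and the helper names (linear_beta_schedule, alpha_bar, q_sample, lora_weight) are all assumptions chosen for illustration.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (illustrative values only)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alpha_bar(betas):
    """Cumulative products of (1 - beta_t), one value per time step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def matmul(P, Q):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def lora_weight(W, A, B, scale=1.0):
    """Effective weight W + scale * (B @ A); W is frozen, only A and B would be trained."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


rng = random.Random(0)

# Panel a: noise a toy 3-pixel "image" to the last step of a 1,000-step schedule.
alpha_bars = alpha_bar(linear_beta_schedule(T=1000))
x0 = [0.5, -0.25, 1.0]
xt, eps = q_sample(x0, t=999, alpha_bars=alpha_bars, rng=rng)

# Panel e: rank-1 adapter on a 4 x 4 weight (hypothetical sizes).
d_out, d_in, r = 4, 4, 1
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in, random init
B = [[0.0] * r for _ in range(d_out)]                                # d_out x r, zero init
W_eff = lora_weight(W, A, B)

full_params = d_out * d_in        # parameters in W
lora_params = r * (d_in + d_out)  # parameters in A and B
```

With B initialized to zero, the adapted weight equals the pretrained weight before any fine-tuning step, and the number of trainable parameters grows with r(d_in + d_out) rather than d_in × d_out, which is why low-rank adaptation suits few-shot settings.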
Extended Data Fig. 2 Comparison of BUS-DM to state-of-the-art models.
The baselines (MedCLIP, PubMedCLIP and BiomedCLIP) were pretrained on large-scale publicly available image-text pairs.
Extended Data Fig. 3 Performance comparison with self-supervised learning baselines.
BUS-DM outperforms Baseline-CLIP, ViT-MoCo, and ViT-MAE across benign-malignant classification (a-d), molecular subtype classification (e), and ALN status classification (f) tasks.
Extended Data Fig. 4 DABIS score of real data and generated data.
DABIS score comparison. Real data shows higher shortcut learning (AUC 0.600) compared to BUSGen-generated data (AUC 0.493), indicating that generated data reduces data acquisition-induced bias.
Extended Data Fig. 5 Saliency maps of the ALN metastasis prediction task.
Saliency maps for ALN metastasis prediction. Heat maps (bottom row) highlight peritumoral regions most influential for predicting axillary lymph node metastasis from ultrasound images (top row).
Supplementary information
Supplementary Information
Supplementary Sections 1–9, Figs. 1–14, Tables 1–18 and References.
Source data
Source Data Fig. 2
Source data.
Source Data Fig. 4
Source data.
Source Data Fig. 5
Source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, H., Li, Y., Zhang, N. et al. A foundation generative model for breast ultrasound image analysis. Nat. Biomed. Eng (2026). https://doi.org/10.1038/s41551-026-01639-1
DOI: https://doi.org/10.1038/s41551-026-01639-1


