
  • Article
A generalist foundation model and database for open-world medical image segmentation

Abstract

Vision foundation models have demonstrated vast potential in achieving generalist medical segmentation capability, providing a versatile, task-agnostic solution through a single model. However, current generalist models are simply pre-trained on diverse medical data that contain irrelevant information, often resulting in negative transfer and degraded performance. Furthermore, the practical applicability of foundation models across diverse open-world scenarios, especially in out-of-distribution (OOD) settings, has not been extensively evaluated. Here we construct a publicly accessible database, MedSegDB, based on a tree-structured hierarchy and annotated from 129 public medical segmentation repositories and 5 in-house datasets. We further propose a Generalist Medical Segmentation model (MedSegX), a vision foundation model trained with a model-agnostic Contextual Mixture of Adapter Experts (ConMoAE) for open-world segmentation. We conduct a comprehensive evaluation of MedSegX across a range of medical segmentation tasks. Experimental results indicate that MedSegX achieves state-of-the-art performance across various modalities and organ systems in in-distribution (ID) settings. In OOD and real-world clinical settings, MedSegX consistently maintains its performance in both zero-shot and data-efficient generalization, outperforming other foundation models.
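
The abstract does not detail ConMoAE's internals, but the general idea of a contextual mixture of adapter experts can be sketched as follows. This is a rough, hypothetical illustration, not the authors' implementation: all names, shapes and the softmax-gating choice are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(x, W_down, W_up):
    # Bottleneck adapter: down-project, ReLU, up-project,
    # added back to the input as a residual.
    return x + np.maximum(x @ W_down, 0.0) @ W_up

def conmoae_forward(x, context, experts, W_gate):
    # Gate on a context embedding, then mix the adapter experts' outputs.
    logits = context @ W_gate                      # (n_experts,)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                       # softmax gate
    outputs = np.stack([adapter(x, Wd, Wu) for Wd, Wu in experts])
    return np.tensordot(weights, outputs, axes=1)  # weighted sum over experts

d, bottleneck, n_experts = 8, 2, 4
experts = [(rng.normal(size=(d, bottleneck)) * 0.1,
            rng.normal(size=(bottleneck, d)) * 0.1)
           for _ in range(n_experts)]
W_gate = rng.normal(size=(d, n_experts))

x = rng.normal(size=(d,))    # a feature vector from the frozen backbone
ctx = rng.normal(size=(d,))  # a context embedding (e.g. modality/task cue)
y = conmoae_forward(x, ctx, experts, W_gate)
print(y.shape)  # (8,)
```

Because the adapters wrap a frozen backbone feature and the gate is conditioned on context rather than on the input alone, related tasks can share experts while unrelated ones are routed apart, which is the intuition behind mitigating negative transfer.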


Fig. 1: The tree-structured database MedSegDB and the development and evaluation of MedSegX.
Fig. 2: Performance on ID evaluation.
Fig. 3: Performance under cross-site shift setting.
Fig. 4: Performance under cross-task shift setting.
Fig. 5: Performance evaluation under real-world settings.
Fig. 6: Performance evaluation on 3D segmentation.

Data availability

The main data supporting the results in this study are available within the paper and its Supplementary Information. The pre-training, ID validation and OOD validation datasets from our MedSegDB database are curated from open-source datasets and can be accessed via the weblinks provided in Supplementary Table 1. Among these, the data in MedSegDB that permit redistribution are available on HuggingFace (ref. 60) at https://huggingface.co/datasets/medicalai/MedSegDB. Each dataset used in the study is also listed in Supplementary Table 1. The validation dataset from MedSegDB that consists of real-world data collected from hospitals cannot be fully released in a public repository owing to privacy regulations. We have deposited a minimum dataset of de-identified real-world data on GitHub (https://github.com/MedSegX/MedSegX-code).

Code availability

The source code used to train MedSegX and reproduce the results is available on GitHub at https://github.com/MedSegX/MedSegX-code (ref. 61).

References

  1. Wang, S. et al. Annotation-efficient deep learning for automatic medical image segmentation. Nat. Commun. 12, 5915 (2021).

  2. Cao, K. et al. Large-scale pancreatic cancer detection via non-contrast CT and deep learning. Nat. Med. 29, 3033–3043 (2023).

  3. Mei, X. et al. Artificial intelligence-enabled rapid diagnosis of patients with COVID-19. Nat. Med. 26, 1224–1228 (2020).

  4. Wang, G. et al. A deep-learning pipeline for the diagnosis and discrimination of viral, non-viral and COVID-19 pneumonia from chest X-ray images. Nat. Biomed. Eng. 5, 509–521 (2021).

  5. Wang, S. et al. Mining whole-lung information by artificial intelligence for predicting EGFR genotype and targeted therapy response in lung cancer: a multicohort study. Lancet Digit. Health 4, e309–e319 (2022).

  6. Zhang, K. et al. Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell 181, 1423–1433.e11 (2020).

  7. Xu, Y. et al. Improving artificial intelligence pipeline for liver malignancy diagnosis using ultrasound images and video frames. Brief. Bioinform. 24, bbac569 (2023).

  8. Hatamizadeh, A. et al. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. In International MICCAI Brainlesion Workshop (eds Crimi, A. et al.) 272–284 (Springer, 2021).

  9. Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).

  10. Lee, H. H., Bao, S., Huo, Y. & Landman, B. A. 3D UX-Net: a large kernel volumetric convnet modernizing hierarchical transformer for medical image segmentation. In International Conference on Learning Representations 21891-21905 (ICLR, 2023).

  11. Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Navab, N. et al.) 234–241 (Springer, 2015).

  12. Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N. & Liang, J. UNet++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 39, 1856–1867 (2019).

  13. Zhang, J., Xie, Y., Xia, Y. & Shen, C. DoDNet: learning to segment multi-organ and tumors from multiple partially labeled datasets. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 1195–1204 (IEEE, 2021).

  14. Ye, Y., Xie, Y., Zhang, J., Chen, Z. & Xia, Y. UniSeg: a prompt-driven universal segmentation model as well as a strong representation learner. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Greenspan, H. et al.) 508–518 (Springer, 2023).

  15. Kirillov, A. et al. Segment anything. In Proc. IEEE/CVF International Conference on Computer Vision 4015–4026 (IEEE, 2023).

  16. Oquab, M. et al. DINOv2: learning robust visual features without supervision. Trans. Mach. Learn. Res. https://openreview.net/forum?id=a68SUt6zFt (2024).

  17. Ma, J. & Wang, B. Towards foundation models of biological image segmentation. Nat. Methods 20, 953–955 (2023).

  18. Ma, J. et al. Segment anything in medical images. Nat. Commun. 15, 654 (2024).

  19. Cheng, J. et al. SAM-Med2D. Preprint at https://arxiv.org/abs/2308.16184 (2023).

  20. Liu, Z. et al. Task-customized self-supervised pre-training with scalable dynamic routing. In Proc. AAAI Conference on Artificial Intelligence 1854–1862 (AAAI, 2022).

  21. Wang, Z., Dai, Z., Póczos, B. & Carbonell, J. Characterizing and avoiding negative transfer. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 11293–11302 (IEEE, 2019).

  22. Senushkin, D., Patakin, N., Kuznetsov, A. & Konushin, A. Independent component alignment for multi-task learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 20083–20093 (IEEE, 2023).

  23. Langlotz, C. P. RadLex: a new method for indexing online educational materials. Radiographics 26, 1595–1597 (2006).

  24. Shazeer, N. et al. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer. In International Conference on Learning Representations 878–896 (ICLR, 2017).

  25. Ma, J., Li, F. & Wang, B. U-Mamba: enhancing long-range dependency for biomedical image segmentation. Preprint at https://arxiv.org/abs/2401.04722 (2024).

  26. Li, T. et al. SrSNet: accurate segmentation of stroke lesions by a two-stage segmentation framework with asymmetry information. Expert Syst. Appl. 254, 124329 (2024).

  27. Leclerc, S. et al. Deep learning for segmentation using an open large-scale dataset in 2D echocardiography. IEEE Trans. Med. Imaging 38, 2198–2210 (2019).

  28. Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).

  29. Hupkes, D. et al. A taxonomy and review of generalization research in NLP. Nat. Mach. Intell. 5, 1161–1174 (2023).

  30. Shiraishi, J. et al. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules. Am. J. Roentgenol. 174, 71–74 (2000).

  31. Stirenko, S. et al. Chest X-ray analysis of tuberculosis by deep learning with segmentation and augmentation. In International Conference on Electronics and Nanotechnology 422–428 (IEEE, 2018).

  32. Siegel, R. L., Giaquinto, A. N. & Jemal, A. Cancer statistics, 2024. CA Cancer J. Clin. 74, 12–49 (2024).

  33. Zhou, H.-Y. et al. nnFormer: volumetric medical image segmentation via a 3D transformer. IEEE Trans. Image Process. 32, 4036–4045 (2023).

  34. Wang, N. et al. MISSU: 3D medical image segmentation via self-distilling TransUNet. IEEE Trans. Med. Imaging 42, 2740–2750 (2023).

  35. Zhu, J., Qi, Y. & Wu, J. Medical SAM 2: segment medical images as video via Segment Anything Model 2. Preprint at https://arxiv.org/abs/2408.00874 (2024).

  36. Ma, J. et al. Segment anything in medical images and videos: benchmark and deployment. Preprint at https://arxiv.org/abs/2408.03322 (2024).

  37. Ravi, N. et al. SAM 2: segment anything in images and videos. In International Conference on Learning Representations 41175–41218 (ICLR, 2025).

  38. Huang, Y. et al. Segment anything model for medical images? Med. Image Anal. 92, 103061 (2024).

  39. Ye, J. et al. SA-Med2D-20M dataset: segment anything in 2D medical imaging with 20 million masks. Preprint at https://arxiv.org/abs/2311.11969 (2023).

  40. Ma, J. et al. AbdomenCT-1K: is abdominal organ segmentation a solved problem? IEEE Trans. Pattern Anal. Mach. Intell. 44, 6695–6714 (2021).

  41. Ji, Y. et al. AMOS: a large-scale abdominal multi-organ benchmark for versatile medical image segmentation. Adv. Neural Inf. Process. Syst. 35, 36722–36732 (2022).

  42. Bilic, P. et al. The liver tumor segmentation benchmark (LiTS). Med. Image Anal. 84, 102680 (2023).

  43. Wasserthal, J. et al. TotalSegmentator: robust segmentation of 104 anatomic structures in CT images. Radiol. Artif. Intell. 5, e230024 (2023).

  44. Pang, S. et al. SpineParseNet: spine parsing for volumetric MR image by a two-stage segmentation framework with semantic image representation. IEEE Trans. Med. Imaging 40, 262–273 (2020).

  45. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

  46. Krishna, R. et al. Visual genome: connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vis. 123, 32–73 (2017).

  47. Wu, J. et al. Chest ImaGenome dataset (version 1.0.0). PhysioNet https://doi.org/10.13026/wv01-y230 (2021).

  48. Fifty, C. et al. Efficiently identifying task groupings for multi-task learning. Adv. Neural Inf. Process. Syst. 34, 27503–27516 (2021).

  49. Clark, K. et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26, 1045–1057 (2013).

  50. Heller, N. et al. The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: results of the KiTS19 challenge. Med. Image Anal. 67, 101821 (2021).

  51. Heller, N. et al. The KiTS21 Challenge: automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CT. Preprint at https://arxiv.org/abs/2307.01984 (2023).

  52. Agarap, A. F. Deep learning using rectified linear units (ReLU). Preprint at https://arxiv.org/abs/1803.08375 (2018).

  53. Cheng, B., Misra, I., Schwing, A. G., Kirillov, A. & Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 1290–1299 (IEEE, 2022).

  54. Milletari, F., Navab, N. & Ahmadi, S.-A. V-Net: fully convolutional neural networks for volumetric medical image segmentation. In International Conference on 3D Vision (3DV) 565–571 (IEEE, 2016).

  55. Chen, C. et al. MA-SAM: modality-agnostic SAM adaptation for 3D medical image segmentation. Med. Image Anal. 98, 103310 (2024).

  56. Zhai, X. et al. A large-scale study of representation learning with the visual task adaptation benchmark. Preprint at https://arxiv.org/abs/1910.04867 (2019).

  57. Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations 611–631 (ICLR, 2021).

  58. Zhang, H., Su, Y., Xu, X. & Jia, K. Improving the generalization of segmentation foundation model under distribution shift via weakly supervised adaptation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 23385–23395 (IEEE, 2024).

  59. Peng, Z. et al. Parameter efficient fine-tuning via cross block orchestration for Segment Anything Model. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 3743–3752 (IEEE, 2024).

  60. Zhang, S. MedSegDB. HuggingFace https://huggingface.co/datasets/medicalai/MedSegDB (2025).

  61. Zhang, S. MedSegX. GitHub https://github.com/MedSegX/MedSegX-code (2025).

Acknowledgements

G.W. acknowledges funding from the National Natural Science Foundation of China (grants T2522008 and 62272055), New Cornerstone Science Foundation through the XPLORER PRIZE, and the National Key Research and Development Program of China (2024YFC3044700). Shanghang Zhang was supported by the National Science and Technology Major Project (No. 2022ZD0117800). S.W. was supported by the National Natural Science Foundation of China (62425112, 92359202 and 12326610).

Author information

Contributions

G.W., X.L., Siqi Zhang, Q.Z., J. Yue, H.X., J. Yao and Y.W. collected and analysed the data. M.L., Shanghang Zhang and G.W. conceived and supervised the project. G.W., Siqi Zhang, X.L., Q.Z., Shanghang Zhang, J. Yue and M.L. contributed to data interpretation, method design, and critical review and wrote the manuscript. All authors discussed the results and reviewed the manuscript.

Corresponding authors

Correspondence to Shanghang Zhang or Guangyu Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Namkug Kim, Ongen Liao and Dong Ni for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Detailed description of MedSegDB.

It consists of 5 anatomical subtrees covering the body parts of the head and neck, torso, skeleton, blood vessel and skin, spanning 39 major organs and tissues and 111 segmentation tasks. MedSegDB incorporates 10 medical imaging modalities: computed tomography (CT), magnetic resonance imaging (MRI), X-ray, colonoscopy, dermoscopy, CT angiography (CTA), cone-beam CT (CBCT), ultrasound (US), fundus and endoscopy.

Extended Data Fig. 2 Detailed data processing for development and evaluation of MedSegDB.

a. Flow chart describing the datasets collected for the development and evaluation of ID and OOD sets. b. Details of the real-world datasets.

Extended Data Fig. 3 Detailed performance of MedSegX and other competitors on tasks related to 5 body parts in ID test set.

Comparison of 5 models on tasks in the parts of the a. head and neck (n = 67,746), b. torso (n = 205,066), c. skeleton (n = 45,981), d. blood vessel (n = 8,121), and e. skin (n = 818). P-values are calculated with a two-sided t-test. Bar graphs indicate the mean ± 95% CI. The dashed horizontal line represents the Dice score achieved by MedSegX.
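
For reference, the Dice score reported throughout these panels is the standard overlap metric between a predicted and a ground-truth mask. A minimal sketch for binary 2D masks follows; the function name and example masks are illustrative, not taken from the authors' code.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    # Dice = 2|A ∩ B| / (|A| + |B|) for binary masks;
    # eps guards against division by zero for empty masks.
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_score(pred, target), 3))  # 2*2/(3+3) -> 0.667
```

A Dice score of 1 indicates perfect overlap and 0 indicates none; the per-task means and confidence intervals in these figures aggregate this quantity over test images.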

Extended Data Fig. 4 Generalization performance evaluation of MedSegX and other competitors on 18 cross-site tasks with different proportions of OOD fine-tuning data (n = 5,801).

a. 15% fine-tuning data. b. 25% fine-tuning data. c. 50% fine-tuning data. d. 100% fine-tuning data. Tasks consist of tooth (X-ray), gallbladder (CT), left kidney (CT), right kidney (CT), left lung (CT), right lung (CT), liver (CT), pancreas (CT), stomach (CT), optic cup (Fundus), optic disc (Fundus), left atrium (MRI), prostate (MRI), right ventricle (MRI), left ventricle (MRI), spleen (MRI), left lung (X-ray), and right lung (X-ray) segmentation. In all box plots, each box spans the first to the third quartile, with the centre line at the median. The whiskers extend to the farthest data point that lies within 2× the interquartile range (IQR) from the nearest quartile.
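
The whisker rule used in these captions (2× IQR, rather than the more common 1.5× convention) can be reproduced numerically. This is a sketch with made-up data, not the authors' plotting code; the function name and values are illustrative.

```python
import numpy as np

def whisker_limits(data, k=2.0):
    # Box edges at Q1/Q3; each whisker reaches the farthest data point
    # lying within k * IQR of the nearest quartile.
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lo = data[data >= q1 - k * iqr].min()
    hi = data[data <= q3 + k * iqr].max()
    return lo, hi

data = np.array([0.55, 0.70, 0.72, 0.75, 0.78, 0.80, 0.95])
lo, hi = whisker_limits(data)
print(lo, hi)
```

With k = 2.0, outliers beyond the whiskers are rarer than under the 1.5× rule, so the whiskers summarize more of each Dice distribution.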

Extended Data Fig. 5 Data-efficient performance evaluation of MedSegX and SAM-Med2D on 18 cross-site tasks.

95% confidence intervals are shown as shaded areas.

Extended Data Fig. 6 Generalization performance evaluation of MedSegX and competitors on seven cross-target tasks with different proportions of OOD fine-tuning data (n = 4,095).

a. 15% fine-tuning data. b. 25% fine-tuning data. c. 50% fine-tuning data. d. 100% fine-tuning data. Tasks consist of the colon tumor (CT), kidney tumor (CT), liver tumor (CT), lung tumor (CT), pancreas tumor (CT), prostate tumor (MRI), and vestibular schwannoma (MRI) segmentation. In all box plots, each box spans the first to the third quartile, with the centre line at the median. The whiskers extend to the farthest data point that lies within 2× the interquartile range (IQR) from the nearest quartile.

Extended Data Fig. 7 Data-efficient performance evaluation of MedSegX and MedSAM on 7 unseen categories of body tumors.

95% confidence intervals are shown as shaded areas.

Extended Data Fig. 8 The models’ average performance under the real-world setting (n = 950).

a. Average zero-shot performance evaluation of 6 models on 5 real-world datasets. b. Average fine-tuning performance evaluation of 6 models on 5 real-world datasets. P-values are calculated with a two-sided t-test. Bar graphs indicate the mean ± 95% CI. The dashed horizontal line represents the Dice score achieved by MedSegX.

Extended Data Fig. 9 Visualization of 8 segmentation examples in OOD test sets.

T1: Left Ventricular Myocardium, T2: Teeth, T3: Colon Tumor, T4: Kidney Tumor, T5: Liver Tumor, T6: Lung Tumor, T7: Pancreas Tumor, T8: Prostate Tumor.

Extended Data Fig. 10 Visualization analysis for the ablation experiments.

8 segmentation examples from both ID and OOD test sets are incorporated (ID tasks: T1–T4; OOD tasks: T5–T8): T1: Left Ventricular Myocardium, T2: Optic Disc, T3: Liver, T4: Brain Core Tumor, T5: Breast Cancer, T6: Left Ventricle Epicardium, T7: Left Lung, T8: Stomach, T9: Kidney Tumor, T10: Lung Tumor.

Supplementary information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, S., Zhang, Q., Zhang, S. et al. A generalist foundation model and database for open-world medical image segmentation. Nat. Biomed. Eng. (2025). https://doi.org/10.1038/s41551-025-01497-3

