Abstract
Biomedical image analysis is fundamental for biomedical discovery. Holistic image analysis comprises interdependent subtasks such as segmentation, detection and recognition, which are tackled separately by traditional approaches. Here, we propose BiomedParse, a biomedical foundation model that can jointly conduct segmentation, detection and recognition across nine imaging modalities. This joint learning improves the accuracy for individual tasks and enables new applications such as segmenting all relevant objects in an image through a textual description. To train BiomedParse, we created a large dataset comprising over 6 million triples of image, segmentation mask and textual description by leveraging natural language labels or descriptions accompanying existing datasets. We showed that BiomedParse outperformed existing methods on image segmentation across nine imaging modalities, with larger improvements on objects with irregular shapes. We further showed that BiomedParse can simultaneously segment and label all objects in an image. In summary, BiomedParse is an all-in-one tool for biomedical image analysis across all major imaging modalities, paving the way for efficient and accurate image-based biomedical discovery.
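The abstract describes training on (image, segmentation mask, textual description) triples and text-prompted segmentation at inference time. A minimal sketch of that data structure and interface is given below; the class name, field names and the `segment_by_text` signature are illustrative assumptions, not the released BiomedParse API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ImageMaskTextTriple:
    """One training example of the kind described in the abstract."""
    image: np.ndarray   # H x W x C pixel array
    mask: np.ndarray    # H x W boolean array, True inside the target object
    text: str           # natural-language label, for example "neoplastic cells in breast pathology"


def segment_by_text(model, image: np.ndarray, prompt: str) -> np.ndarray:
    """Hypothetical text-prompted inference: return one mask covering all objects matching the prompt.

    The released repository defines the actual interface; this wrapper only illustrates the idea.
    """
    return model(image, prompt)
```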
Data availability
BiomedParseData can be accessed at https://aka.ms/biomedparse-release. The three real-world pathology images, together with the annotations by the pathologists and by BiomedParse, can also be accessed at https://aka.ms/biomedparse-release.
Code availability
BiomedParse can be accessed at https://aka.ms/biomedparse-release, including the model weights and relevant source code. We include detailed methods and implementation steps in the Methods to allow for independent replication.
Acknowledgements
The authors thank the Microsoft Health and Life Sciences Research team and the Microsoft Health Futures team for support and helpful discussions.
Author information
Authors and Affiliations
Contributions
T.Z., Y.G., J.Y., N.U., M.W., H.P. and S.W. contributed to the conception and design of the work. T.Z. contributed to the data acquisition and curation of BiomedParseData. T.Z., J.Y., N.U. and M.W. contributed to BiomedParse model training. Y.G., T.Z., H.L., N.U. and S.K. contributed to the evaluation of BiomedParse and baseline models. T.N. and J.G. contributed to the technical discussions. A.C., J.A., C.M., B.P. and C.B. provided clinical inputs to the study. All authors contributed to the drafting and revision of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
C.B. is a member of the scientific advisory board and owns stock in PrimeVax and BioAI; is on the scientific board of Lunaphore and SironaDx; has a consultant or advisory relationship with Sanofi, Agilent, Roche and Incendia; contributes to institutional research for Illumina, and is an inventor on US patent applications US20180322632A1 (Image Processing Systems and Methods for Displaying Multiple Images of a Biological Specimen) filed by Ventana Medical Systems, Providence Health and Services Oregon and US20200388033A1 (System and Method for Automatic Labeling of Pathology Images) filed by Providence Health and Services Oregon, Omics Data Automation. The other authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Stefania Moroianu, Dong Ni, Yichi Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Number of images in each of the 25 anatomic sites from 9 modalities.
One anatomic site can be present in multiple modalities.
Extended Data Fig. 2 Ablation studies comparing the performance of BiomedParse and two variants.
BiomedParse-SAM stands for using SAM to initialize the image encoder. BiomedParse-PubMedBERT stands for using the frozen PubMedBERT as the text encoder. Each modality category contains multiple object types. Each object type was aggregated as the instance median to be shown in the plot. N in the plot denotes the number of object types in the corresponding modality, as follows: N = 112 for All, N = 27 for CT, N = 34 for MRI, N = 12 for X-Ray, N = 24 for Pathology, N = 7 for Ultrasound, N = 2 for Fundus, N = 3 for Endoscope, N = 2 for Dermoscopy, and N = 1 for OCT. Each box shows the quartiles of the distribution, with the center as the median, the minimum as the first quartile, and the maximum as the third quartile. The whiskers extend to the farthest data point that lies within 2 times the inter-quartile range (IQR) from the nearest quartile. Data points that lie outside the whiskers are shown as fliers. * indicates the significance level at which BiomedParse outperforms BiomedParse-PubMedBERT, with two-sided paired t-test p-value < 1 × 10⁻² for **, p-value < 1 × 10⁻³ for ***, p-value < 1 × 10⁻⁴ for ****. Exact p-values for the comparison between BiomedParse and BiomedParse-PubMedBERT are as follows: p-value < 9.52 × 10⁻¹⁰ for All, p-value < 1.67 × 10⁻³ for CT, p-value < 4.87 × 10⁻⁴ for MRI, p-value < 1.98 × 10⁻⁴ for Pathology, and p-value < 7.13 × 10⁻³ for Ultrasound.
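The aggregation and testing procedure described in this legend (instance-level medians per object type, a two-sided paired t-test across object types, and box plots whose whiskers stop at 2 times the IQR) can be sketched as follows; the CSV file name and column names are assumptions for illustration only.

```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import ttest_rel

# Per-instance Dice scores for the two models; file and column names are illustrative.
df = pd.read_csv("per_instance_dice.csv")  # columns: modality, object_type, dice_biomedparse, dice_pubmedbert

# Aggregate each object type to its instance median, as stated in the legend.
per_type = df.groupby(["modality", "object_type"], as_index=False).median(numeric_only=True)

# Two-sided paired t-test across object types within one modality.
ct = per_type[per_type.modality == "CT"]
res = ttest_rel(ct.dice_biomedparse, ct.dice_pubmedbert)
print(f"CT: N = {len(ct)} object types, two-sided paired t-test p = {res.pvalue:.2e}")

# Box plot with whiskers at 2 times the IQR from the nearest quartile (matplotlib's `whis` argument).
plt.boxplot([ct.dice_biomedparse, ct.dice_pubmedbert],
            labels=["BiomedParse", "BiomedParse-PubMedBERT"], whis=2.0)
plt.ylabel("Dice score")
plt.show()
```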
Extended Data Fig. 3 Evaluating BiomedParse and competing methods in terms of Average Symmetric Surface Distance.
Box plot comparing the performance of BiomedParse and competing methods in terms of Average Symmetric Surface Distance (ASSD). Smaller ASSD indicates better segmentation performance. Each box shows the quartiles of the distribution, with center as the median, minimum as the first quartile, and maximum as the third quartile. The whiskers extend to the farthest data point that lies within 2 times the inter-quartile range (IQR) from the nearest quartile. Data points that lie outside the whiskers are shown as fliers. Each modality category contains multiple object types. Each object type was aggregated as the instance median to be shown in the plot. The numbers of object types in each modality are as follows: n = 112 for All, n = 27 for CT, n = 34 for MRI, n = 12 for X-Ray, n = 24 for Pathology, n = 7 for Ultrasound, n = 2 for Fundus, n = 3 for Endoscope, n = 2 for Dermoscopy, and n = 1 for OCT. * indicates the significance level at which BiomedParse outperforms the best-competing method, with two-sided paired t-test p-value < 1 × 10⁻² for **, p-value < 1 × 10⁻³ for ***, p-value < 1 × 10⁻⁴ for ****. Exact p-values for the comparison between BiomedParse and MedSAM with oracle box prompt are as follows: p-value < 3.43 × 10⁻⁶ for All, p-value < 2.61 × 10⁻³ for CT, p-value < 7.73 × 10⁻⁵ for MRI, and p-value < 2.94 × 10⁻⁸ for Pathology.
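A minimal sketch of ASSD for a pair of binary masks, assuming the surface of a mask is taken as its boundary pixels and distances are measured in pixel units; physical voxel spacing could be passed through the `sampling` argument of `distance_transform_edt`.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt


def surface(mask: np.ndarray) -> np.ndarray:
    """Boundary pixels of a binary mask: pixels removed by a single erosion step."""
    mask = mask.astype(bool)
    return mask & ~binary_erosion(mask)


def assd(pred: np.ndarray, ref: np.ndarray) -> float:
    """Average Symmetric Surface Distance between two non-empty binary masks, in pixel units."""
    pred_surf, ref_surf = surface(pred), surface(ref)
    # Distance from every pixel to the nearest surface pixel of the other mask.
    dist_to_ref = distance_transform_edt(~ref_surf)
    dist_to_pred = distance_transform_edt(~pred_surf)
    all_distances = np.concatenate([dist_to_ref[pred_surf], dist_to_pred[ref_surf]])
    return float(all_distances.mean())
```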
Extended Data Fig. 4 Comparing BiomedParse with biomedical-specific text prompt segmentation models.
Bar plot comparing BiomedParse with biomedical-specific text prompt segmentation models across different organs on CT in terms of Dice score. Each bar shows the mean of the distribution, with error bar indicating the 95% confidence interval. The sample sizes for the target organs are as follows: n = 27,779 for All, n = 4,409 for Aorta, n = 864 for Bladder, n = 1,677 for Duodenum, n = 1,964 for Esophagus, n = 712 for Gallbladder, n = 4,105 for Inferior vena cava, n = 635 for Left adrenal gland, n = 1,776 for Left kidney, n = 4,648 for Liver, n = 1,345 for Pancreas, n = 571 for Right adrenal gland, n = 1,649 for Right kidney, n = 1,587 for Spleen, and n = 1,837 for Stomach. * indicates the significance level at which BiomedParse outperforms the best-competing method, with two-sided paired t-test p-value < 1 × 10⁻² for **, p-value < 1 × 10⁻³ for ***, p-value < 1 × 10⁻⁴ for ****. Exact p-values for the comparison between BiomedParse and SegVol are as follows: p-value < 2.23 × 10⁻³⁰⁸ for All, p-value < 1.86 × 10⁻⁵⁸ for Aorta, p-value < 1.73 × 10⁻⁷ for Bladder, p-value < 3.44 × 10⁻⁸⁶ for Duodenum, p-value < 5.00 × 10⁻¹⁸⁵ for Esophagus, p-value < 3.37 × 10⁻¹⁵ for Gallbladder, p-value < 6.28 × 10⁻⁹⁹ for Inferior vena cava, p-value < 5.08 × 10⁻¹⁰ for Left adrenal gland, p-value < 9.26 × 10⁻³¹ for Left kidney, p-value < 3.31 × 10⁻³⁷ for Liver, p-value < 2.27 × 10⁻⁵⁶ for Pancreas, p-value < 1.01 × 10⁻¹⁶ for Right adrenal gland, p-value < 2.98 × 10⁻²⁰ for Right kidney, p-value < 1.09 × 10⁻²⁰ for Spleen, and p-value < 4.68 × 10⁻²⁵ for Stomach.
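For reference, a per-instance Dice score and a mean with an error bar can be computed as in the sketch below; the normal-approximation 95% confidence interval is an assumption, since the legend does not state how the interval was derived.

```python
import numpy as np


def dice_score(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom > 0 else 1.0


def mean_and_ci(scores) -> tuple[float, float]:
    """Mean score and half-width of a normal-approximation 95% confidence interval."""
    scores = np.asarray(scores, dtype=float)
    half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
    return float(scores.mean()), float(half_width)
```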
Extended Data Fig. 5 Comparing BiomedParse with fine-tuned SAM and MedSAM.
Bar plot comparing BiomedParse with SAM and MedSAM when SAM and MedSAM are both further trained on the entire BiomedParseData. Both SAM and MedSAM were provided with the oracle bounding box around the segmentation target during both the training and inference stages. Each bar shows the mean of the distribution, with error bar indicating the 95% confidence interval. Each modality category contains multiple object types. Each object type was aggregated as the instance median to be shown in the plot. The numbers of object types in each modality are as follows: n = 105 for All, n = 26 for CT, n = 34 for MRI, n = 6 for X-Ray, n = 24 for Pathology, n = 7 for Ultrasound, n = 2 for Fundus, n = 3 for Endoscope, n = 2 for Dermoscopy, and n = 1 for OCT. * indicates the significance level at which BiomedParse outperforms the best-competing method, with two-sided paired t-test p-value < 1 × 10⁻² for **, p-value < 1 × 10⁻³ for ***, p-value < 1 × 10⁻⁴ for ****. Exact p-values for the comparison between BiomedParse and SAM-FT with oracle box prompt are as follows: p-value < 1.78 × 10⁻⁷ for All, p-value < 2.02 × 10⁻² for CT, p-value < 1.32 × 10⁻² for X-Ray, p-value < 3.52 × 10⁻⁸ for Pathology, and p-value < 1.49 × 10⁻² for Ultrasound.
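The oracle box prompt is derived from the ground-truth mask; a minimal sketch is below, assuming the tight axis-aligned bounding box in (x_min, y_min, x_max, y_max) order with no padding or perturbation, which the legend does not specify.

```python
import numpy as np


def oracle_box(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Tight bounding box (x_min, y_min, x_max, y_max) around a ground-truth binary mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty mask has no bounding box")
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```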
Extended Data Fig. 6 Comparison between BiomedParse and competing methods on the MedSAM benchmark.
We evaluated MedSAM and SAM using the ground truth bounding box for the segmentation. For nnU-Net and DeepLabV3+, we used the evaluation results reported by MedSAM. Results are shown by imaging modality, with statistical significance comparison between BiomedParse and the best-competing method MedSAM. Each box shows the quartiles of the distribution, with center as the median, minimum as the first quartile, and maximum as the third quartile. The whiskers extend to the farthest data point that lies within 2 times the inter-quartile range (IQR) from the nearest quartile. Data points that lie outside the whiskers are shown as fliers. Each modality category contains multiple object types. Each object type was aggregated as the instance median to be shown in the plot. The numbers of object types in each modality are as follows: n = 50 for All, n = 18 for CT, n = 15 for MRI, n = 6 for X-Ray, n = 1 for Pathology, n = 6 for Ultrasound, n = 2 for Fundus, n = 1 for Endoscope, and n = 1 for Dermoscopy. * indicates the significance level at which BiomedParse outperforms the best-competing method, with two-sided paired t-test p-value < 1 × 10⁻² for **, p-value < 1 × 10⁻³ for ***, p-value < 1 × 10⁻⁴ for ****. Exact p-values for the comparison between BiomedParse and MedSAM with oracle box prompt are as follows: p-value < 2.98 × 10⁻³ for All, p-value < 7.08 × 10⁻³ for CT, and p-value < 4.35 × 10⁻² for MRI.
Extended Data Fig. 7 Comparing the improvement of BiomedParse over SAM with shape irregularity.
Scatter plots comparing the improvement of BiomedParse over SAM with shape irregularity in terms of box ratio (left), convex ratio (middle), and inverse rotational inertia (right). Each dot represents the mean statistics over one object type in our segmentation ontology. We show the regression line with the 95% confidence interval as the error bands. The p-values show the two-sided Wald test results.
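A sketch of two of the irregularity statistics is given below, assuming box ratio = object area / bounding-box area and convex ratio = object area / convex-hull area (both closer to 1 for regular shapes); these definitions are inferred from the metric names rather than stated in the legend. For the regression p-values, `scipy.stats.linregress` provides a comparable two-sided Wald-type test on the slope.

```python
import numpy as np
from skimage.measure import label, regionprops


def shape_irregularity(mask: np.ndarray) -> dict[str, float]:
    """Box ratio and convex ratio of the largest connected component of a binary mask."""
    regions = regionprops(label(mask.astype(np.uint8)))
    largest = max(regions, key=lambda region: region.area)
    return {
        "box_ratio": largest.area / largest.area_bbox,       # also known as extent
        "convex_ratio": largest.area / largest.area_convex,  # also known as solidity
    }
```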
Supplementary information
Supplementary Information
Supplementary Table 1, Figs. 1–11 and References.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, T., Gu, Y., Yang, J. et al. A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Nat Methods 22, 166–176 (2025). https://doi.org/10.1038/s41592-024-02499-w