Abstract
Biomedical image analysis is fundamental for biomedical discovery. Holistic image analysis comprises interdependent subtasks such as segmentation, detection and recognition, which are tackled separately by traditional approaches. Here, we propose BiomedParse, a biomedical foundation model that can jointly conduct segmentation, detection and recognition across nine imaging modalities. This joint learning improves the accuracy for individual tasks and enables new applications such as segmenting all relevant objects in an image through a textual description. To train BiomedParse, we created a large dataset comprising over 6 million triples of image, segmentation mask and textual description by leveraging natural language labels or descriptions accompanying existing datasets. We showed that BiomedParse outperformed existing methods on image segmentation across nine imaging modalities, with larger improvements on objects with irregular shapes. We further showed that BiomedParse can simultaneously segment and label all objects in an image. In summary, BiomedParse is an all-in-one tool for biomedical image analysis across all major imaging modalities, paving the way for efficient and accurate image-based biomedical discovery.
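The abstract describes training on (image, segmentation mask, textual description) triples and text-prompted segmentation at inference time. A minimal sketch of that data structure and interface is given below; the class name, field names and the `segment_by_text` signature are illustrative assumptions, not the released BiomedParse API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ImageMaskTextTriple:
    """One training example of the kind described in the abstract."""
    image: np.ndarray   # H x W x C pixel array
    mask: np.ndarray    # H x W boolean array, True inside the target object
    text: str           # natural-language label, for example "neoplastic cells in breast pathology"


def segment_by_text(model, image: np.ndarray, prompt: str) -> np.ndarray:
    """Hypothetical text-prompted inference: return one mask covering all objects matching the prompt.

    The released repository defines the actual interface; this wrapper only illustrates the idea.
    """
    return model(image, prompt)
```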
Data availability
BiomedParseData can be accessed at https://aka.ms/biomedparse-release. The three real-world pathology images, together with the annotations by the pathologists and by BiomedParse, can also be accessed at https://aka.ms/biomedparse-release.
Code availability
BiomedParse can be accessed at https://aka.ms/biomedparse-release, including the model weights and relevant source code. We include detailed methods and implementation steps in the Methods to allow for independent replication.
Acknowledgements
The authors thank the Microsoft Health and Life Sciences Research team and the Microsoft Health Futures team for support and helpful discussions.
Author information
Authors and Affiliations
Contributions
T.Z., Y.G., J.Y., N.U., M.W., H.P. and S.W. contributed to the conception and design of the work. T.Z. contributed to the data acquisition and curation of BiomedParseData. T.Z., J.Y., N.U. and M.W. contributed to BiomedParse model training. Y.G., T.Z., H.L., N.U. and S.K. contributed to the evaluation of BiomedParse and baseline models. T.N. and J.G. contributed to the technical discussions. A.C., J.A., C.M., B.P. and C.B. provided clinical inputs to the study. All authors contributed to the drafting and revision of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
C.B. is a member of the scientific advisory board and owns stock in PrimeVax and BioAI; is on the scientific board of Lunaphore and SironaDx; has a consultant or advisory relationship with Sanofi, Agilent, Roche and Incendia; contributes to institutional research for Illumina, and is an inventor on US patent applications US20180322632A1 (Image Processing Systems and Methods for Displaying Multiple Images of a Biological Specimen) filed by Ventana Medical Systems, Providence Health and Services Oregon and US20200388033A1 (System and Method for Automatic Labeling of Pathology Images) filed by Providence Health and Services Oregon, Omics Data Automation. The other authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Stefania Moroianu, Dong Ni, Yichi Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Rita Strack, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Number of images in each of the 25 anatomic sites from 9 modalities.
One anatomic site can be present in multiple modalities.
Extended Data Fig. 2 Ablation studies comparing the performance of BiomedParse and two variants.
BiomedParse-SAM stands for using SAM to initialize the image encoder. BiomedParse-PubMedBERT stands for using the frozen PubMedBERT as the text encoder. Each modality category contains multiple object types. Each object type was aggregated as the instance median to be shown in the plot. N in the plot denotes the number of object types in the corresponding modality, as follows: N = 112 for All, N = 27 for CT, N = 34 for MRI, N = 12 for X-Ray, N = 24 for Pathology, N = 7 for Ultrasound, N = 2 for Fundus, N = 3 for Endoscope, N = 2 for Dermoscopy, and N = 1 for OCT. Each box shows the quartiles of the distribution, with the center as the median, the minimum as the first quartile, and the maximum as the third quartile. The whiskers extend to the farthest data point that lies within 2 times the inter-quartile range (IQR) from the nearest quartile. Data points that lie outside the whiskers are shown as fliers. * indicates the significance level at which BiomedParse outperforms BiomedParse-PubMedBERT, with two-sided paired t-test p-value < 1 × 10⁻² for **, p-value < 1 × 10⁻³ for ***, p-value < 1 × 10⁻⁴ for ****. Exact p-values for the comparison between BiomedParse and BiomedParse-PubMedBERT are as follows: p-value < 9.52 × 10⁻¹⁰ for All, p-value < 1.67 × 10⁻³ for CT, p-value < 4.87 × 10⁻⁴ for MRI, p-value < 1.98 × 10⁻⁴ for Pathology, and p-value < 7.13 × 10⁻³ for Ultrasound.
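The aggregation and testing procedure described in this legend (instance-level medians per object type, a two-sided paired t-test across object types, and box plots whose whiskers stop at 2 times the IQR) can be sketched as follows; the CSV file name and column names are assumptions for illustration only.

```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import ttest_rel

# Per-instance Dice scores for the two models; file and column names are illustrative.
df = pd.read_csv("per_instance_dice.csv")  # columns: modality, object_type, dice_biomedparse, dice_pubmedbert

# Aggregate each object type to its instance median, as stated in the legend.
per_type = df.groupby(["modality", "object_type"], as_index=False).median(numeric_only=True)

# Two-sided paired t-test across object types within one modality.
ct = per_type[per_type.modality == "CT"]
res = ttest_rel(ct.dice_biomedparse, ct.dice_pubmedbert)
print(f"CT: N = {len(ct)} object types, two-sided paired t-test p = {res.pvalue:.2e}")

# Box plot with whiskers at 2 times the IQR from the nearest quartile (matplotlib's `whis` argument).
plt.boxplot([ct.dice_biomedparse, ct.dice_pubmedbert],
            labels=["BiomedParse", "BiomedParse-PubMedBERT"], whis=2.0)
plt.ylabel("Dice score")
plt.show()
```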
Extended Data Fig. 3 Evaluating BiomedParse and competing methods in terms of Average Symmetric Surface Distance.
Box plot comparing the performance of BiomedParse and competing methods in terms of Average Symmetric Surface Distance (ASSD). Smaller ASSD indicates better segmentation performance. Each box shows the quartiles of the distribution, with center as the median, minimum as the first quartile, and maximum as the third quartile. The whiskers extend to the farthest data point that lies within 2 times the inter-quartile range (IQR) from the nearest quartile. Data points that lie outside the whiskers are shown as fliers. Each modality category contains multiple object types. Each object type was aggregated as the instance median to be shown in the plot. The numbers of object types in each modality are as follows: n = 112 for All, n = 27 for CT, n = 34 for MRI, n = 12 for X-Ray, n = 24 for Pathology, n = 7 for Ultrasound, n = 2 for Fundus, n = 3 for Endoscope, n = 2 for Dermoscopy, and n = 1 for OCT. * indicates the significance level at which BiomedParse outperforms the best-competing method, with two-sided paired t-test p-value < 1 × 10⁻² for **, p-value < 1 × 10⁻³ for ***, p-value < 1 × 10⁻⁴ for ****. Exact p-values for the comparison between BiomedParse and MedSAM with oracle box prompt are as follows: p-value < 3.43 × 10⁻⁶ for All, p-value < 2.61 × 10⁻³ for CT, p-value < 7.73 × 10⁻⁵ for MRI, and p-value < 2.94 × 10⁻⁸ for Pathology.
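A minimal sketch of ASSD for a pair of binary masks, assuming the surface of a mask is taken as its boundary pixels and distances are measured in pixel units; physical voxel spacing could be passed through the `sampling` argument of `distance_transform_edt`.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt


def surface(mask: np.ndarray) -> np.ndarray:
    """Boundary pixels of a binary mask: pixels removed by a single erosion step."""
    mask = mask.astype(bool)
    return mask & ~binary_erosion(mask)


def assd(pred: np.ndarray, ref: np.ndarray) -> float:
    """Average Symmetric Surface Distance between two non-empty binary masks, in pixel units."""
    pred_surf, ref_surf = surface(pred), surface(ref)
    # Distance from every pixel to the nearest surface pixel of the other mask.
    dist_to_ref = distance_transform_edt(~ref_surf)
    dist_to_pred = distance_transform_edt(~pred_surf)
    all_distances = np.concatenate([dist_to_ref[pred_surf], dist_to_pred[ref_surf]])
    return float(all_distances.mean())
```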
Extended Data Fig. 4 Comparing BiomedParse with biomedical-specific text prompt segmentation models.
Bar plot comparing BiomedParse with biomedical-specific text prompt segmentation models across different organs on CT in terms of Dice score. Each bar shows the mean of the distribution, with error bar indicating the 95% confidence interval. The sample sizes for the target organs are as follows: n = 27,779 for All, n = 4,409 for Aorta, n = 864 for Bladder, n = 1,677 for Duodenum, n = 1,964 for Esophagus, n = 712 for Gallbladder, n = 4,105 for Inferior vena cava, n = 635 for Left adrenal gland, n = 1,776 for Left kidney, n = 4,648 for Liver, n = 1,345 for Pancreas, n = 571 for Right adrenal gland, n = 1,649 for Right kidney, n = 1,587 for Spleen, and n = 1,837 for Stomach. * indicates the significance level at which BiomedParse outperforms the best-competing method, with two-sided paired t-test p-value < 1 × 10⁻² for **, p-value < 1 × 10⁻³ for ***, p-value < 1 × 10⁻⁴ for ****. Exact p-values for the comparison between BiomedParse and SegVol are as follows: p-value < 2.23 × 10⁻³⁰⁸ for All, p-value < 1.86 × 10⁻⁵⁸ for Aorta, p-value < 1.73 × 10⁻⁷ for Bladder, p-value < 3.44 × 10⁻⁸⁶ for Duodenum, p-value < 5.00 × 10⁻¹⁸⁵ for Esophagus, p-value < 3.37 × 10⁻¹⁵ for Gallbladder, p-value < 6.28 × 10⁻⁹⁹ for Inferior vena cava, p-value < 5.08 × 10⁻¹⁰ for Left adrenal gland, p-value < 9.26 × 10⁻³¹ for Left kidney, p-value < 3.31 × 10⁻³⁷ for Liver, p-value < 2.27 × 10⁻⁵⁶ for Pancreas, p-value < 1.01 × 10⁻¹⁶ for Right adrenal gland, p-value < 2.98 × 10⁻²⁰ for Right kidney, p-value < 1.09 × 10⁻²⁰ for Spleen, and p-value < 4.68 × 10⁻²⁵ for Stomach.
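For reference, a per-instance Dice score and a mean with an error bar can be computed as in the sketch below; the normal-approximation 95% confidence interval is an assumption, since the legend does not state how the interval was derived.

```python
import numpy as np


def dice_score(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom > 0 else 1.0


def mean_and_ci(scores) -> tuple[float, float]:
    """Mean score and half-width of a normal-approximation 95% confidence interval."""
    scores = np.asarray(scores, dtype=float)
    half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
    return float(scores.mean()), float(half_width)
```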
Extended Data Fig. 5 Comparing BiomedParse with fine-tuned SAM and MedSAM.
Bar plot comparing BiomedParse with SAM and MedSAM when SAM and MedSAM are both further trained on the entire BiomedParseData. Both SAM and MedSAM were provided with the oracle bounding box around the segmentation target during both the training and inference stages. Each bar shows the mean of the distribution, with error bar indicating the 95% confidence interval. Each modality category contains multiple object types. Each object type was aggregated as the instance median to be shown in the plot. The numbers of object types in each modality are as follows: n = 105 for All, n = 26 for CT, n = 34 for MRI, n = 6 for X-Ray, n = 24 for Pathology, n = 7 for Ultrasound, n = 2 for Fundus, n = 3 for Endoscope, n = 2 for Dermoscopy, and n = 1 for OCT. * indicates the significance level at which BiomedParse outperforms the best-competing method, with two-sided paired t-test p-value < 1 × 10⁻² for **, p-value < 1 × 10⁻³ for ***, p-value < 1 × 10⁻⁴ for ****. Exact p-values for the comparison between BiomedParse and SAM-FT with oracle box prompt are as follows: p-value < 1.78 × 10⁻⁷ for All, p-value < 2.02 × 10⁻² for CT, p-value < 1.32 × 10⁻² for X-Ray, p-value < 3.52 × 10⁻⁸ for Pathology, and p-value < 1.49 × 10⁻² for Ultrasound.
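The oracle box prompt is derived from the ground-truth mask; a minimal sketch is below, assuming the tight axis-aligned bounding box in (x_min, y_min, x_max, y_max) order with no padding or perturbation, which the legend does not specify.

```python
import numpy as np


def oracle_box(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Tight bounding box (x_min, y_min, x_max, y_max) around a ground-truth binary mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty mask has no bounding box")
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```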
Extended Data Fig. 6 Comparison between BiomedParse and competing methods on the MedSAM benchmark.
We evaluated MedSAM and SAM using the ground truth bounding box for the segmentation. For nnU-Net and DeepLabV3+, we used the evaluation results reported by MedSAM. Results are shown by imaging modality, with statistical significance comparison between BiomedParse and the best-competing method MedSAM. Each box shows the quartiles of the distribution, with center as the median, minimum as the first quartile, and maximum as the third quartile. The whiskers extend to the farthest data point that lies within 2 times the inter-quartile range (IQR) from the nearest quartile. Data points that lie outside the whiskers are shown as fliers. Each modality category contains multiple object types. Each object type was aggregated as the instance median to be shown in the plot. The numbers of object types in each modality are as follows: n = 50 for All, n = 18 for CT, n = 15 for MRI, n = 6 for X-Ray, n = 1 for Pathology, n = 6 for Ultrasound, n = 2 for Fundus, n = 1 for Endoscope, and n = 1 for Dermoscopy. * indicates the significance level at which BiomedParse outperforms the best-competing method, with two-sided paired t-test p-value < 1 × 10⁻² for **, p-value < 1 × 10⁻³ for ***, p-value < 1 × 10⁻⁴ for ****. Exact p-values for the comparison between BiomedParse and MedSAM with oracle box prompt are as follows: p-value < 2.98 × 10⁻³ for All, p-value < 7.08 × 10⁻³ for CT, and p-value < 4.35 × 10⁻² for MRI.
Extended Data Fig. 7 Comparing the improvement of BiomedParse over SAM with shape irregularity.
Scatter plots comparing the improvement of BiomedParse over SAM with shape irregularity in terms of box ratio (left), convex ratio (middle), and inverse rotational inertia (right). Each dot represents the mean statistics over one object type in our segmentation ontology. We show the regression line with the 95% confidence interval as the error bands. The p-values show the two-sided Wald test results.
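A sketch of two of the irregularity statistics is given below, assuming box ratio = object area / bounding-box area and convex ratio = object area / convex-hull area (both closer to 1 for regular shapes); these definitions are inferred from the metric names rather than stated in the legend. For the regression p-values, `scipy.stats.linregress` provides a comparable two-sided Wald-type test on the slope.

```python
import numpy as np
from skimage.measure import label, regionprops


def shape_irregularity(mask: np.ndarray) -> dict[str, float]:
    """Box ratio and convex ratio of the largest connected component of a binary mask."""
    regions = regionprops(label(mask.astype(np.uint8)))
    largest = max(regions, key=lambda region: region.area)
    return {
        "box_ratio": largest.area / largest.area_bbox,       # also known as extent
        "convex_ratio": largest.area / largest.area_convex,  # also known as solidity
    }
```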
Supplementary information
Supplementary Information
Supplementary Table 1, Figs. 1–11 and References.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, T., Gu, Y., Yang, J. et al. A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Nat Methods 22, 166–176 (2025). https://doi.org/10.1038/s41592-024-02499-w