Abstract
Magnetic resonance imaging (MRI) is a critical imaging modality in clinical diagnosis and research, yet its complexity and heterogeneity hinder scalable, generalizable machine learning. Although foundation models have revolutionized language and vision tasks, their application to MRI remains constrained by data scarcity and narrow anatomical focus. We present Decipher-MR, a 3D MRI-specific vision-language foundation model trained on 200,000 MRI series from over 22,000 studies spanning diverse anatomical regions, sequences, and pathologies. Decipher-MR integrates self-supervised vision learning with report-guided text supervision to build robust representations for broad applications. To enable efficient use, Decipher-MR adopts a modular design in which lightweight, task-specific decoders are tuned on top of a frozen pretrained encoder. Following this setting, we evaluate Decipher-MR across disease classification, demographic prediction, anatomical localization, and cross-modal retrieval, demonstrating consistent improvements over existing foundation models and task-specific approaches. These results support Decipher-MR as a promising and reusable foundation for MRI-based AI, within the scope of the tasks and datasets evaluated.
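The frozen-encoder, lightweight-decoder setup described above can be illustrated with a minimal PyTorch sketch. This is an illustrative assumption, not the authors' released code: the class names, the embedding dimension, and the build_finetuning_modules helper are hypothetical.

```python
# Minimal sketch (assumption, not the authors' implementation) of tuning a
# lightweight task-specific decoder on top of a frozen pretrained encoder.
import torch
import torch.nn as nn


class LinearDecoder(nn.Module):
    """Lightweight task-specific head trained on frozen encoder embeddings."""

    def __init__(self, embed_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.head(embeddings)


def build_finetuning_modules(encoder: nn.Module, num_classes: int):
    # Freeze the pretrained vision encoder; only the decoder receives gradients.
    for param in encoder.parameters():
        param.requires_grad = False
    encoder.eval()
    decoder = LinearDecoder(embed_dim=768, num_classes=num_classes)
    return encoder, decoder


# Hypothetical usage: embeddings = encoder(mri_volume); logits = decoder(embeddings)
```

Only the decoder parameters are passed to the optimizer, so adapting the model to a new task leaves the pretrained representation untouched.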
Data availability
The MRI datasets used for model pretraining and internal evaluations are proprietary and cannot be shared publicly due to institutional, contractual, and privacy restrictions. These include the pretraining MRI corpus, the held-out validation set, and three additional internal evaluation datasets (Source1, Source2 Head and Neck, and MRDLAS). Access to these datasets is not permitted outside the hosting institutions, and they therefore cannot be made available. An example benchmark for anatomical localization is available at registry.opendata.aws/gehcai-mapsmr. This study also makes use of several publicly available datasets for benchmarking, including ADNI, PI-CAI, ACDC, LLD-MMRI, MRART, and AMOS. These datasets can be accessed through their respective data portals in accordance with their data-use agreements (citations provided in the manuscript). All dataset identifiers, accession links, and licensing details are provided in the Methods and Supplementary Information. Only aggregated results and derived summary statistics are reported in this manuscript. No individual-level proprietary data or raw imaging files can be released. Additional non-sensitive materials may be provided by the corresponding author upon reasonable request and subject to institutional approval.
Code availability
The pretraining code is primarily adapted from several open-source frameworks, including DINOv2 (https://github.com/facebookresearch/dinov2), the HuggingFace Transformers Trainer for BERT models (https://huggingface.co/docs/transformers/en/main_classes/trainer), and OpenCLIP (https://github.com/mlfoundations/open_clip). We introduced customized components specific to the healthcare domain, and to the MR modality in particular, such as patch tokenization, data augmentation, and data organization, as detailed in the paper. Due to considerations related to safety, intellectual property, and commercial viability, the pretrained Decipher-MR weights cannot be directly shared at this time. The algorithmic procedures and model architectures are fully described in the “Methods” section to support reproducibility, and additional methodological details may be provided by the corresponding author upon reasonable request.
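As an illustration of the MR-specific patch tokenization mentioned above, the sketch below shows one standard way to tokenize a 3D MRI volume with a strided 3D convolution, as used in ViT-style encoders. It is a minimal sketch under our own assumptions; the patch size, embedding dimension, and the PatchEmbed3D class are hypothetical and not taken from the released pipeline.

```python
# Minimal sketch (assumption, not the released code) of 3D patch tokenization
# for an MRI volume. Patch size and embedding dimension are illustrative only.
import torch
import torch.nn as nn


class PatchEmbed3D(nn.Module):
    """Split a 3D volume into non-overlapping patches and project each to a token."""

    def __init__(self, patch_size=(4, 16, 16), in_channels=1, embed_dim=768):
        super().__init__()
        # A strided 3D convolution is equivalent to flattening each patch and
        # applying a shared linear projection.
        self.proj = nn.Conv3d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (batch, channels, depth, height, width)
        tokens = self.proj(volume)                 # (B, embed_dim, D', H', W')
        return tokens.flatten(2).transpose(1, 2)   # (B, num_patches, embed_dim)


# Example: a 1-channel 32x256x256 series yields (32/4)*(256/16)*(256/16) = 2048 tokens.
tokens = PatchEmbed3D()(torch.zeros(1, 1, 32, 256, 256))
print(tokens.shape)  # torch.Size([1, 2048, 768])
```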
References
Carré, A. Standardization of brain MR images across machines and protocols: bridging the gap for MRI-based radiomics. Sci. Rep. 10, 12340 (2020).
OpenAI et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2024).
Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (2021). https://api.semanticscholar.org/CorpusID:231591445.
Oquab, M. et al. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research (2024). https://openreview.net/forum?id=a68SUt6zFt.
He, K. et al. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16000–16009 (2022).
Pérez-García, F. et al. Exploring scalable medical image encoders beyond text supervision. Nat. Mach. Intell. 7, 119–130 (2025).
Moutakanni, T. et al. Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning. Preprint at https://arxiv.org/abs/2405.01469 (2024).
Blankemeier, L. et al. Merlin: a vision language foundation model for 3D computed tomography. Preprint at https://arxiv.org/abs/2406.06512 (2024).
Yang, L. et al. Advancing multimodal medical capabilities of Gemini. Preprint at https://arxiv.org/abs/2405.03162 (2024).
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).
Codella, N. C. F. et al. MedImageInsight: an open-source embedding model for general domain medical imaging. Preprint at https://arxiv.org/abs/2410.06542 (2024).
Ye, Y. et al. Continual self-supervised learning: Towards universal multi-modal medical data representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11114–11124 (2024).
Zhang, S. et al. A multimodal biomedical foundation model trained from fifteen million image-text pairs. NEJM AI 2, AIoa2400640 (2025).
Zhao, T. et al. A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Nat. Methods 22, 166–176 (2025).
Cox, J. et al. BrainSegFounder: towards 3D foundation models for neuroimage segmentation. Preprint at https://arxiv.org/abs/2406.10395 (2024).
Ma, J. et al. Segment anything in medical images. Nat. Commun. 15, 654 (2024).
Sun, J. et al. Medical image analysis using improved SAM-Med2D: segmentation and classification perspectives. BMC Med. Imaging 24 (2024).
Tak, D. et al. A foundation model for generalized brain MRI analysis. Preprint at medRxiv https://www.medrxiv.org/content/early/2024/12/03/2024.12.02.24317992 (2024).
Wang, S. et al. Triad: vision foundation model for 3D magnetic resonance imaging. Preprint at https://arxiv.org/abs/2502.14064 (2025).
Cox, J. et al. BrainSegFounder: towards 3D foundation models for Neuroimage Segmentation. Med. Image Anal. 97, 103301 (2024).
Ji, Y. et al. Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation. In Koyejo, S. et al. (eds.) Advances in Neural Information Processing Systems, vol. 35, 36722–36732 (Curran Associates, Inc., 2022). https://proceedings.neurips.cc/paper_files/paper/2022/file/ee604e1bedbd069d9fc9328b7b9584be-Paper-Datasets_and_Benchmarks.pdf.
Bernard, O. et al. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Trans. Med. Imaging 37, 2514–2525 (2018).
Wang, H. et al. SAM-Med3D: towards general-purpose segmentation models for volumetric medical images. Preprint at https://arxiv.org/abs/2310.15161 (2024).
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).
Wang, Y. et al. DETR3D: 3D object detection from multi-view images via 3d-to-2d queries. In Proceedings of the 5th Conference on Robot Learning, vol. 164 of Proceedings of Machine Learning Research, 180–191 (PMLR, 2022). https://proceedings.mlr.press/v164/wang22b.html.
Chen, Z. et al. Medical phrase grounding with region-phrase context contrastive alignment. In MICCAI (2023).
Silva, M. V. F. et al. Alzheimer’s disease: risk factors and potentially protective measures. J. Biomed. Sci. 26, 33 (2019).
Hasanzadeh, F. et al. Bias recognition and mitigation strategies in artificial intelligence healthcare applications. npj Digital Med. 8, 154 (2025).
Dutt, R., Bohdal, O., Tsaftaris, S. A. & Hospedales, T. Fairtune: Optimizing parameter efficient fine tuning for fairness in medical image analysis. In International Conference on Learning Representations (2024).
Wang, R. et al. Drop the shortcuts: image augmentation improves fairness and decreases AI detection of race and other demographics from medical images. eBioMedicine 102, 105047 (2024). https://doi.org/10.1016/j.ebiom.2024.105047.
Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).
Doshi, J., Erus, G., Ou, Y., Gaonkar, B. & Davatzikos, C. Multi-atlas skull-stripping. Acad. Radiol. 20, 1566–1576 (2013).
He, X., Wang, A. Q. & Sabuncu, M. R. Neural pre-processing: A learning framework for end-to-end brain MRI pre-processing. In Greenspan, H. et al. (eds.) Medical Image Computing and Computer Assisted Intervention–MICCAI 2023, 258–267 (Springer Nature Switzerland, 2023).
Yuan, Y., Ahn, E., Feng, D., Khadra, M. & Kim, J. Z-ssmnet: Zonal-aware self-supervised mesh network for prostate cancer detection and diagnosis with bi-parametric MRI. Comput. Med. Imaging Graph. 122, 102510 (2025).
Petersen, R. C. et al. Alzheimer’s disease neuroimaging initiative (ADNI): clinical characterization. Neurology 74, 201–209 (2010).
Saha, A. et al. Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI): an international, paired, non-inferiority, confirmatory study. Lancet Oncol. 25, 879–887 (2024).
Lou, M. et al. SDR-Former: a Siamese dual-resolution transformer for liver lesion classification using 3D multi-phase imaging. Neural Networks 107228 (2025).
Nárai, Á. et al. Movement-related artefacts (mr-art) dataset of matched motion-corrupted and clean structural mri brain scans. Sci. Data 9, 630 (2022).
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (2021). https://openreview.net/forum?id=YicbFdNTTy.
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C. & Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186 (Association for Computational Linguistics, 2019). https://aclanthology.org/N19-1423/.
Gu, Y. et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthcare 3 (2021). https://doi.org/10.1145/3458754.
Grattafiori, A. et al. The Llama 3 herd of models. Preprint at https://arxiv.org/abs/2407.21783 (2024).
Lee, D., de Keizer, N., Lau, F. & Cornet, R. Literature review of SNOMED CT use. J. Am. Med. Inform. Assoc. 21, e11–e19 (2013).
Amazon Web Services. Amazon comprehend medical - extract insights from medical text (n.d.). https://aws.amazon.com/comprehend/medical/.
Dong, H. et al. MRI-CORE: a foundation model for magnetic resonance imaging. Preprint at https://arxiv.org/abs/2404.09957 (2024).
Myronenko, A. 3D MRI brain tumor segmentation using autoencoder regularization. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 311–320 (Springer International Publishing, Cham, 2019).
Isensee, F. et al. nnU-Net revisited: a call for rigorous validation in 3D medical image segmentation. In Linguraru, M. G. et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, 488–498 (Springer Nature Switzerland, 2024).
Acknowledgements
The authors thank Seyed Iman Zare Estakhraji, Marc Lebel, Michail Fanariotis, and anonymous reviewers for their constructive feedback and discussions throughout the development of this study.
Author information
Authors and Affiliations
Contributions
Z.Y. designed the pretraining model with contributions from X.X., N.D., and I.M. on model design and data preparation. N.D., I.M., X.X., A.H., F.H., K.K., and L.R. worked on decoder design and carried out the fine-tuning experiments. E.V., B.S., L.W., P.B., and T.K.H. assisted with the experimental setup for fine-tuning and with the analysis of results. E.B. designed and directed the project with contributions from P.B. and T.K.H. Z.Y. wrote the majority of the manuscript with input from N.D., I.M., X.X., A.H., F.H., K.K., L.R., and E.B. All authors discussed the results and contributed to the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
All authors are employees of GE Healthcare. The authors declare no additional competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, Z., DSouza, N., Megyeri, I. et al. Decipher-MR: a vision-language foundation model for 3D MRI representations. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02596-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41746-026-02596-4


