Abstract
Magnetic resonance imaging (MRI) is a critical imaging modality in clinical diagnosis and research, yet its complexity and heterogeneity hinder scalable, generalizable machine learning. Although foundation models have revolutionized language and vision tasks, their application to MRI remains constrained by data scarcity and narrow anatomical focus. We present Decipher-MR, a 3D MRI-specific vision-language foundation model trained on 200,000 MRI series from over 22,000 studies spanning diverse anatomical regions, sequences, and pathologies. Decipher-MR integrates self-supervised vision learning with report-guided text supervision to build robust representations for broad applications. To enable efficient use, Decipher-MR adopts a modular design in which lightweight, task-specific decoders are tuned on top of a frozen pretrained encoder. Following this setting, we evaluate Decipher-MR across disease classification, demographic prediction, anatomical localization, and cross-modal retrieval, demonstrating consistent improvements over existing foundation models and task-specific approaches. These results support Decipher-MR as a promising and reusable foundation for MRI-based AI, within the scope of the tasks and datasets evaluated.
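The frozen-encoder, lightweight-decoder setup described above can be illustrated with a minimal PyTorch sketch. This is an illustrative assumption, not the authors' released code: the class names, the embedding dimension, and the build_finetuning_modules helper are hypothetical.

```python
# Minimal sketch (assumption, not the authors' implementation) of tuning a
# lightweight task-specific decoder on top of a frozen pretrained encoder.
import torch
import torch.nn as nn


class LinearDecoder(nn.Module):
    """Lightweight task-specific head trained on frozen encoder embeddings."""

    def __init__(self, embed_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.head(embeddings)


def build_finetuning_modules(encoder: nn.Module, num_classes: int):
    # Freeze the pretrained vision encoder; only the decoder receives gradients.
    for param in encoder.parameters():
        param.requires_grad = False
    encoder.eval()
    decoder = LinearDecoder(embed_dim=768, num_classes=num_classes)
    return encoder, decoder


# Hypothetical usage: embeddings = encoder(mri_volume); logits = decoder(embeddings)
```

Only the decoder parameters are passed to the optimizer, so adapting the model to a new task leaves the pretrained representation untouched.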
Data availability
The MRI datasets used for model pretraining and internal evaluations are proprietary and cannot be shared publicly due to institutional, contractual, and privacy restrictions. These include the pretraining MRI corpus, the held-out validation set, and three additional internal evaluation datasets (Source1, Source2 Head and Neck, and MRDLAS). Access to these datasets is not permitted outside the hosting institutions, and they therefore cannot be made available. An example benchmark for anatomical localization is available at registry.opendata.aws/gehcai-mapsmr. This study also makes use of several publicly available datasets for benchmarking, including ADNI, PI-CAI, ACDC, LLD-MMRI, MRART, and AMOS. These datasets can be accessed through their respective data portals in accordance with their data-use agreements (citations provided in the manuscript). All dataset identifiers, accession links, and licensing details are provided in the Methods and Supplementary Information. Only aggregated results and derived summary statistics are reported in this manuscript. No individual-level proprietary data or raw imaging files can be released. Additional non-sensitive materials may be provided by the corresponding author upon reasonable request and subject to institutional approval.
Code availability
The pretraining code is primarily adapted from several open-source frameworks, including DINOv2 (https://github.com/facebookresearch/dinov2), the HuggingFace Transformers Trainer for BERT models (https://huggingface.co/docs/transformers/en/main_classes/trainer), and OpenCLIP (https://github.com/mlfoundations/open_clip). We introduced customized components specific to the healthcare domain, and to the MR modality in particular, such as patch tokenization, data augmentation, and data organization, as detailed in the paper. Due to considerations related to safety, intellectual property, and commercial viability, the pretrained Decipher-MR weights cannot be directly shared at this time. The algorithmic procedures and model architectures are fully described in the “Methods” section to support reproducibility, and additional methodological details may be provided by the corresponding author upon reasonable request.
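As an illustration of the MR-specific patch tokenization mentioned above, the sketch below shows one standard way to tokenize a 3D MRI volume with a strided 3D convolution, as used in ViT-style encoders. It is a minimal sketch under our own assumptions; the patch size, embedding dimension, and the PatchEmbed3D class are hypothetical and not taken from the released pipeline.

```python
# Minimal sketch (assumption, not the released code) of 3D patch tokenization
# for an MRI volume. Patch size and embedding dimension are illustrative only.
import torch
import torch.nn as nn


class PatchEmbed3D(nn.Module):
    """Split a 3D volume into non-overlapping patches and project each to a token."""

    def __init__(self, patch_size=(4, 16, 16), in_channels=1, embed_dim=768):
        super().__init__()
        # A strided 3D convolution is equivalent to flattening each patch and
        # applying a shared linear projection.
        self.proj = nn.Conv3d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (batch, channels, depth, height, width)
        tokens = self.proj(volume)                 # (B, embed_dim, D', H', W')
        return tokens.flatten(2).transpose(1, 2)   # (B, num_patches, embed_dim)


# Example: a 1-channel 32x256x256 series yields (32/4)*(256/16)*(256/16) = 2048 tokens.
tokens = PatchEmbed3D()(torch.zeros(1, 1, 32, 256, 256))
print(tokens.shape)  # torch.Size([1, 2048, 768])
```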
References
Carré, A. Standardization of brain MR images across machines and protocols: bridging the gap for MRI-based radiomics. Sci. Rep. 10, 12340 (2020).
OpenAI et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2024).
Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (2021). https://api.semanticscholar.org/CorpusID:231591445.
Oquab, M. et al. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research (2024). https://openreview.net/forum?id=a68SUt6zFt.
He, K. et al. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16000–16009 (2022).
Pérez-García, F. et al. Exploring scalable medical image encoders beyond text supervision. Nat. Mach. Intell. 7, 119–130 (2025).
Moutakanni, T. et al. Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning. Preprint at https://arxiv.org/abs/2405.01469 (2024).
Blankemeier, L. et al. Merlin: a vision language foundation model for 3D computed tomography. Preprint at https://arxiv.org/abs/2406.06512 (2024).
Yang, L. et al. Advancing multimodal medical capabilities of Gemini. Preprint at https://arxiv.org/abs/2405.03162 (2024).
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).
Codella, N. C. F. et al. MedImageInsight: an open-source embedding model for general domain medical imaging. Preprint at https://arxiv.org/abs/2410.06542 (2024).
Ye, Y. et al. Continual self-supervised learning: Towards universal multi-modal medical data representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11114–11124 (2024).
Zhang, S. et al. A multimodal biomedical foundation model trained from fifteen million image-text pairs. NEJM AI 2, AIoa2400640 (2025).
Zhao, T. et al. A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities. Nat. Methods 22, 166–176 (2025).
Cox, J. et al. BrainSegFounder: towards 3D foundation models for neuroimage segmentation. Preprint at https://arxiv.org/abs/2406.10395 (2024).
Ma, J. et al. Segment anything in medical images. Nat. Commun. 15, 654 (2024).
Sun, J. et al. Medical image analysis using improved SAM-Med2D: segmentation and classification perspectives. BMC Med. Imaging 24 (2024).
Tak, D. et al. A foundation model for generalized brain MRI analysis. Preprint at medRxiv https://www.medrxiv.org/content/early/2024/12/03/2024.12.02.24317992 (2024).
Wang, S. et al. Triad: vision foundation model for 3D magnetic resonance imaging. Preprint at https://arxiv.org/abs/2502.14064 (2025).
Cox, J. et al. BrainSegFounder: towards 3D foundation models for Neuroimage Segmentation. Med. Image Anal. 97, 103301 (2024).
Ji, Y. et al. Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation. In Koyejo, S. et al. (eds.) Advances in Neural Information Processing Systems, vol. 35, 36722–36732 (Curran Associates, Inc., 2022). https://proceedings.neurips.cc/paper_files/paper/2022/file/ee604e1bedbd069d9fc9328b7b9584be-Paper-Datasets_and_Benchmarks.pdf.
Bernard, O. et al. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Trans. Med. Imaging 37, 2514–2525 (2018).
Wang, H. et al. SAM-Med3D: towards general-purpose segmentation models for volumetric medical images. Preprint at https://arxiv.org/abs/2310.15161 (2024).
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).
Wang, Y. et al. DETR3D: 3D object detection from multi-view images via 3d-to-2d queries. In Proceedings of the 5th Conference on Robot Learning, vol. 164 of Proceedings of Machine Learning Research, 180–191 (PMLR, 2022). https://proceedings.mlr.press/v164/wang22b.html.
Chen, Z. et al. Medical phrase grounding with region-phrase context contrastive alignment. In MICCAI (2023).
Silva, M. V. F. et al. Alzheimer’s disease: risk factors and potentially protective measures. J. Biomed. Sci. 26, 33 (2019).
Hasanzadeh, F. et al. Bias recognition and mitigation strategies in artificial intelligence healthcare applications. npj Digital Med. 8, 154 (2025).
Dutt, R., Bohdal, O., Tsaftaris, S. A. & Hospedales, T. Fairtune: Optimizing parameter efficient fine tuning for fairness in medical image analysis. In International Conference on Learning Representations (2024).
Wang, R. et al. Drop the shortcuts: image augmentation improves fairness and decreases AI detection of race and other demographics from medical images. eBioMedicine 102, 105047 (2024). https://doi.org/10.1016/j.ebiom.2024.105047.
Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).
Doshi, J., Erus, G., Ou, Y., Gaonkar, B. & Davatzikos, C. Multi-atlas skull-stripping. Acad. Radiol. 20, 1566–1576 (2013).
He, X., Wang, A. Q. & Sabuncu, M. R. Neural pre-processing: A learning framework for end-to-end brain MRI pre-processing. In Greenspan, H. et al. (eds.) Medical Image Computing and Computer Assisted Intervention–MICCAI 2023, 258–267 (Springer Nature Switzerland, 2023).
Yuan, Y., Ahn, E., Feng, D., Khadra, M. & Kim, J. Z-ssmnet: Zonal-aware self-supervised mesh network for prostate cancer detection and diagnosis with bi-parametric MRI. Comput. Med. Imaging Graph. 122, 102510 (2025).
Petersen, R. C. et al. Alzheimer’s disease neuroimaging initiative (ADNI): clinical characterization. Neurology 74, 201–209 (2010).
Saha, A. et al. Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI): an international, paired, non-inferiority, confirmatory study. Lancet Oncol. 25, 879–887 (2024).
Lou, M. et al. SDR-Former: a Siamese dual-resolution transformer for liver lesion classification using 3D multi-phase imaging. Neural Networks 107228 (2025).
Nárai, Á. et al. Movement-related artefacts (mr-art) dataset of matched motion-corrupted and clean structural mri brain scans. Sci. Data 9, 630 (2022).
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (2021). https://openreview.net/forum?id=YicbFdNTTy.
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C. & Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186 (Association for Computational Linguistics, 2019). https://aclanthology.org/N19-1423/.
Gu, Y. et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthcare 3 (2021). https://doi.org/10.1145/3458754.
Grattafiori, A. et al. The Llama 3 herd of models. Preprint at https://arxiv.org/abs/2407.21783 (2024).
Lee, D., de Keizer, N., Lau, F. & Cornet, R. Literature review of SNOMED CT use. J. Am. Med. Inform. Assoc. 21, e11–e19 (2013).
Amazon Web Services. Amazon comprehend medical - extract insights from medical text (n.d.). https://aws.amazon.com/comprehend/medical/.
Dong, H. et al. MRI-CORE: a foundation model for magnetic resonance imaging. Preprint at https://arxiv.org/abs/2404.09957 (2024).
Myronenko, A. 3D MRI brain tumor segmentation using autoencoder regularization. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 311–320 (Springer International Publishing, Cham, 2019).
Isensee, F. et al. nnU-Net revisited: a call for rigorous validation in 3D medical image segmentation. In Linguraru, M. G. et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, 488–498 (Springer Nature Switzerland, 2024).
Acknowledgements
The authors thank Seyed Iman Zare Estakhraji, Marc Lebel, Michail Fanariotis, and anonymous reviewers for their constructive feedback and discussions throughout the development of this study.
Author information
Authors and Affiliations
Contributions
Z.Y. designed the pretraining model with contributions from X.X., N.D., and I.M. on model design and data preparation. N.D., I.M., X.X., A.H., F.H., K.K., and L.R. worked on decoder design and carried out the fine-tuning experiments. E.V., B.S., L.W., P.B., and T.K.H. assisted with the experimental setup for fine-tuning and with the analysis of results. E.B. designed and directed the project with contributions from P.B. and T.K.H. Z.Y. wrote the majority of the manuscript with input from N.D., I.M., X.X., A.H., F.H., K.K., L.R., and E.B. All authors discussed the results and contributed to the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
All authors are employees of GE Healthcare. The authors declare no additional competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, Z., DSouza, N., Megyeri, I. et al. Decipher-MR: a vision-language foundation model for 3D MRI representations. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02596-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41746-026-02596-4


