Accurate prediction of disease-risk factors from volumetric medical scans by a deep vision model pre-trained with 2D scans

Abstract

The application of machine learning to tasks involving volumetric biomedical imaging is constrained by the limited availability of annotated datasets of three-dimensional (3D) scans for model training. Here we report a deep-learning model pre-trained on 2D scans (for which annotated data are relatively abundant) that accurately predicts disease-risk factors from 3D medical-scan modalities. The model, which we named SLIViT (for ‘slice integration by vision transformer’), preprocesses a given volumetric scan into 2D images, extracts their feature map and integrates it into a single prediction. We evaluated the model in eight different learning tasks, including classification and regression for six datasets involving four volumetric imaging modalities (computed tomography, magnetic resonance imaging, optical coherence tomography and ultrasound). SLIViT consistently outperformed domain-specific state-of-the-art models and was typically as accurate as clinical specialists who had spent considerable time manually annotating the analysed scans. Automating diagnosis tasks involving volumetric scans may save valuable clinician hours, reduce data acquisition costs and duration, and help expedite medical research and clinical applications.
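
To make the pipeline summarized above concrete, the following is a minimal, hypothetical PyTorch sketch of the slice-then-integrate idea (a 2D backbone extracts per-slice features and a small transformer fuses them into one prediction). It is not the authors' SLIViT implementation; the backbone choice, layer sizes and mean pooling are illustrative assumptions only, and the released code is linked under 'Code availability' below.

import torch
import torch.nn as nn
import torchvision.models as models

class SliceIntegrationSketch(nn.Module):
    """Hypothetical slice-then-integrate model: 2D backbone + transformer fusion."""
    def __init__(self, n_outputs=1, d_model=512, n_heads=8, n_layers=2):
        super().__init__()
        backbone = models.resnet18(weights=None)   # stand-in 2D feature extractor
        backbone.fc = nn.Identity()                # keep the 512-dim slice embedding
        self.backbone = backbone
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.integrator = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_outputs)  # classification/regression head

    def forward(self, volume):                       # volume: (batch, slices, 3, H, W)
        b, s = volume.shape[:2]
        feats = self.backbone(volume.flatten(0, 1))  # per-slice features: (b*s, 512)
        feats = feats.view(b, s, -1)                 # regroup by volume: (b, s, 512)
        fused = self.integrator(feats).mean(dim=1)   # fuse and pool over slices
        return self.head(fused)                      # one prediction per volume

# Toy usage: two 32-slice volumes at 224 x 224 resolution.
model = SliceIntegrationSketch(n_outputs=1)
print(model(torch.randn(2, 32, 3, 224, 224)).shape)  # torch.Size([2, 1])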

Fig. 1: The SLIViT framework.
Fig. 2: Overview of SLIViT’s performance across 3D imaging modalities.
Fig. 3: Performance comparison on four tasks of AMD-biomarker classification when trained on less than 700 OCT volumes.
Fig. 4: Performance comparison on cardiac function prediction tasks when trained on echocardiograms.
Fig. 5: SLIViT’s performance compared with manual assessment by retina clinical specialists.

Data availability

The 2D OCT dataset was downloaded from https://data.mendeley.com/datasets/rscbjbr9sj/3. The 3D OCT datasets are not publicly available owing to institutional data-use policy and to concerns about patient privacy. However, they are available from the authors upon reasonable request and with permission of the institutional review board (IRB). The echocardiogram dataset was downloaded from https://stanfordaimi.azurewebsites.net/datasets/834e1cd1-92f7-4268-9daa-d359198b310a. The MRI dataset was downloaded from https://www.ukbiobank.ac.uk under application number 33127. The 3D CT, the 2D CT and the 2D X-ray datasets were downloaded from https://medmnist.com.
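
For illustration, the MedMNIST datasets listed above can be fetched programmatically with the medmnist package; the sketch below loads the 3D CT lung-nodule set. Class and argument names follow the MedMNIST v2 documentation and should be verified against the installed package version.

from medmnist import NoduleMNIST3D  # 3D CT lung-nodule subset of MedMNIST

# Download (if needed) and load the training split; each item is a (volume, label) pair.
train_set = NoduleMNIST3D(split="train", download=True)
volume, label = train_set[0]
print(len(train_set), getattr(volume, "shape", None), label)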

Code availability

The code of SLIViT is available via the project’s GitHub repository at https://github.com/cozygene/SLIViT.

Acknowledgements

This work was supported by a National Institutes of Health (NIH)/National Institute of General Medical Sciences grant 5R25GM135043, an NIH/National Institute of Biomedical Imaging and Bioengineering grant R01EB035028, NIH/National Eye Institute grants R01EY023164 and 1R01EY030614, and an Unrestricted Grant from Research to Prevent Blindness. This research was conducted using the UK Biobank Resource under application number 33127. We also acknowledge the participants of the UCLA Computational Genomics Summer Institute (supported by the aforementioned NIH/National Institute of General Medical Sciences grant) for stimulating discussions.

Author information

Authors and Affiliations

Authors

Contributions

O.A., B.D., N.R., G.C., U.A., M.G.N., B.Z., S.S., J.N.C., S.R.S. and E.H. contributed to the conception and design of the work. O.A., B.D., G.C., U.A., M.G.N., A.R., Z.J.C., Y.W., K.H., S.V., L.T., F.C., A.V., A.K., S.L., D.O., L.A., V.H., S.F.-A., H.E., C.C.W., S.R.S. and E.H. contributed to data acquisition. O.A., B.D., N.R., G.C., B.Z., N.Z., I.G., J.N.C., S.R.S. and E.H. contributed to the evaluation of the work. O.A., B.D., N.R., G.C., U.A., M.G.N., P.T., I.G., S.S., J.N.C., S.R.S. and E.H. contributed to the analysis and interpretation of the data. O.A., B.D., N.R., G.C., U.A., A.R., M.C., E.R., C.W.A., N.Z., I.G., S.S., J.N.C., S.R.S. and E.H. contributed to drafting and revising the paper. S.R.S. and E.H. contributed equally as co-advisers. All authors read and approved the final version of the paper.

Corresponding authors

Correspondence to Oren Avram, Srinivas R. Sadda or Eran Halperin.

Ethics declarations

Competing interests

E.H. has an affiliation with Optum. S.R.S. has affiliations with AbbVie/Allergan, Alexion, Amgen, Apellis, ARVO, Astellas, Bayer, Biogen, Boehringer Ingelheim, Carl Zeiss Meditec, Centervue, Character, Eyepoint, Heidelberg, iCare, IvericBio, Janssen, Macula Society, Nanoscope, Nidek, NotalVision, Novartis, Optos, OTx, Pfizer, Regeneron, Roche, Samsung Bioepis and Topcon. The other authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Tianyu Zhang, Yukun Zhou and the two other, anonymous, reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 PR AUC comparison of five models in four single-task AMD-biomarker classification problems when trained on less than 700 OCT volumes.

Shown are the PR AUC scores, as an alternative scoring metric, for the OCT experiments shown in Fig. 3. The left panel shows the performance when trained and tested on the Houston Dataset (see Supplementary Table 1). The right panel shows the performance when trained on the Houston Dataset and tested on the SLIVER-net Dataset (see Supplementary Table 2). The dashed lines represent the corresponding biomarker’s positive-label prevalence, which is the expected performance of a random model. Box plot whiskers represent a 90% CI.
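
For reference, the convention used in this figure, PR AUC scored against the positive-label prevalence as the random-model baseline, can be computed as in the short sketch below; the labels and scores are made-up placeholders, not study data.

import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0])                    # hypothetical biomarker labels
y_score = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.9, 0.4, 0.2])   # hypothetical model scores

pr_auc = average_precision_score(y_true, y_score)  # PR AUC (average precision)
baseline = y_true.mean()                           # positive-label prevalence (random model)
print(f"PR AUC = {pr_auc:.3f}, random-model baseline = {baseline:.3f}")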

Extended Data Fig. 2 Precision-recall performance compared to clinical retina specialists’ assessment.

Shown are SLIViT’s PR curves (blue), as an alternative scoring metric, for the OCT experiments shown in Fig. 5. SLIViT was trained using less than 700 OCT volumes (Houston Dataset) and tested on an independent dataset (Pasadena Dataset). In each panel, the light-blue shaded area represents a 90% CI for SLIViT’s performance, the red dot represents the retina clinical specialists’ average performance, and the green asterisks correspond to the retina clinical specialists’ assessments. Two of the clinical specialists obtained the exact same performance score for IHRF classification.

Extended Data Fig. 3 SLIViT’s performance in a frame-shuffling experiment.

Shown is the distribution of ROC AUC scores of 101 SLIViT models in four single-task classification problems of AMD high-risk factors (DV, IHRF, SDD, and hDC), trained on a volumetric-OCT dataset. One model was trained on the OCT dataset in its original form, while the other 100 models were trained on randomly shuffled copies of the dataset. The performance ranks of the former model (Original) within the performance distribution of the latter models (Shuffled) were 22, 34, 56, and 47 for DV, IHRF, SDD, and hDC, respectively. The expected performance of a random classifier is 0.5. Box plot whiskers extend to the 5th and the 95th ranked models (out of the 100 shuffled models’ performance distribution).
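
The rank statistic described in this caption (where the model trained on the ordered volumes falls within the score distribution of models trained on shuffled copies) can be illustrated as follows; the scores below are synthetic placeholders, not the reported results.

import numpy as np

rng = np.random.default_rng(0)
shuffled_scores = rng.normal(loc=0.85, scale=0.01, size=100)  # 100 models on shuffled copies
original_score = 0.85                                         # model on the ordered dataset

# Rank of the original model among all 101 models (1 = worst, 101 = best).
rank = int(np.sum(shuffled_scores < original_score)) + 1
print(f"Original model ranks {rank} out of {len(shuffled_scores) + 1}")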

Extended Data Fig. 4 ImageNet and OCT B-scans pre-training contribution for OCT-related downstream learning tasks.

Shown are the ROC (left) and PR (right) AUC scores across fine-tuned models for volumetric-OCT classification tasks, initialized with five different sets of weights. ‘Combined’, the initialization proposed for SLIViT, is ImageNet weight initialization followed by supervised pre-training on the Kermany Dataset. ‘ssCombined’ is ImageNet weight initialization followed by self-supervised pre-training on an unlabeled version of the Kermany Dataset. The expected ROC AUC score of a random model is 0.5. The dashed lines represent the corresponding biomarker’s positive-label prevalence, which is the expected PR AUC score of a random model. Box plot whiskers represent a 90% CI.

Extended Data Fig. 5 ImageNet and OCT B-scans pre-training contribution for non-OCT-related downstream learning tasks.

Shown are the performance scores of models initialized with five different sets of weights on the volumetric ultrasound and MRI regression tasks (R²) and the volumetric CT classification task (ROC AUC). ‘Combined’, the initialization proposed for SLIViT, is ImageNet weight initialization followed by supervised pre-training on the Kermany Dataset. ‘ssCombined’ is ImageNet weight initialization followed by self-supervised pre-training on an unlabeled version of the Kermany Dataset. The expected R² and ROC AUC of a random model are 0 and 0.5, respectively. Box plot whiskers represent a 90% CI.

Extended Data Fig. 6 Feature similarity analysis between various pre-trained backbone projections.

Shown are nine scatterplots of a similarity analysis (CKA) comparing the projections of a biomedical-imaging dataset induced by different biomedical-imaging pre-trained backbones. Each panel corresponds to a different pair of pre-trained backbones (upper: biomedical pairs; middle: biomedical and ImageNet pairs; lower: biomedical and random pairs). In each panel, each of the 768 dots represents the similarity score computed for the projections induced by the corresponding filter. A dot is red if it falls within the top 5% of scores (and gray otherwise). The dashed lines show the average score for the correspondingly colored set of dots.
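
The similarity measure referred to here, centered kernel alignment (CKA), can be computed in its standard linear form as in the sketch below. The figure reports one score per filter, whereas this sketch computes a single score over full feature matrices, so it is illustrative only; the shapes and random inputs are assumptions.

import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two feature matrices over the same n examples (rows)."""
    X = X - X.mean(axis=0)                  # center each feature column
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 768))         # 200 images x 768 features, backbone A
Y = rng.standard_normal((200, 768))         # same 200 images, backbone B
print(f"CKA = {linear_cka(X, Y):.3f}")      # near 0 for unrelated random features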

Extended Data Fig. 7 2D biomedical-imaging pre-training performance contribution for 3D OCT-related downstream learning tasks.

Shown are the ROC AUC scores on four volumetric-OCT single-task classification problems. Four SLIViT models were evaluated in every classification problem. Each SLIViT model was initialized with ImageNet weights and then pre-trained on a 2D biomedical-imaging dataset of a different modality. The considered modalities were CT, X-ray, OCT, and Mixed (containing all the images from the CT, X-ray, and OCT datasets). SLIVER-net’s performance (Domain-specific) is reproduced from Fig. 3. The expected performance of a random model is 0.5. Box plot whiskers represent a 90% CI.

Extended Data Fig. 8 2D biomedical-imaging pre-training performance contribution for 3D non-OCT-related downstream learning tasks.

Shown are the performance scores for the volumetric ultrasound and MRI regression tasks (R²) and the volumetric CT classification task (ROC AUC). Four SLIViT models were evaluated in every learning problem. Each SLIViT model was initialized with ImageNet weights and then pre-trained on a 2D biomedical-imaging dataset of a different modality. The considered modalities were CT, X-ray, OCT, and Mixed (containing all the images from the CT, X-ray, and OCT datasets). The performance scores of the domain-specific methods are reproduced from Fig. 2. The expected R² and ROC AUC of a random model are 0 and 0.5, respectively. Box plot whiskers represent a 90% CI.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4 and Tables 1 and 2.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Avram, O., Durmus, B., Rakocz, N. et al. Accurate prediction of disease-risk factors from volumetric medical scans by a deep vision model pre-trained with 2D scans. Nat. Biomed. Eng. 9, 507–520 (2025). https://doi.org/10.1038/s41551-024-01257-9
