Accurate prediction of disease-risk factors from volumetric medical scans by a deep vision model pre-trained with 2D scans

Abstract

The application of machine learning to tasks involving volumetric biomedical imaging is constrained by the limited availability of annotated datasets of three-dimensional (3D) scans for model training. Here we report a deep-learning model pre-trained on 2D scans (for which annotated data are relatively abundant) that accurately predicts disease-risk factors from 3D medical-scan modalities. The model, which we named SLIViT (for ‘slice integration by vision transformer’), preprocesses a given volumetric scan into 2D images, extracts their feature map and integrates it into a single prediction. We evaluated the model in eight different learning tasks, including classification and regression for six datasets involving four volumetric imaging modalities (computed tomography, magnetic resonance imaging, optical coherence tomography and ultrasound). SLIViT consistently outperformed domain-specific state-of-the-art models and was typically as accurate as clinical specialists who had spent considerable time manually annotating the analysed scans. Automating diagnosis tasks involving volumetric scans may save valuable clinician hours, reduce data acquisition costs and duration, and help expedite medical research and clinical applications.
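
To make the pipeline summarized above concrete, the following is a minimal, hypothetical PyTorch sketch of the slice-then-integrate idea (a 2D backbone extracts per-slice features and a small transformer fuses them into one prediction). It is not the authors' SLIViT implementation; the backbone choice, layer sizes and mean pooling are illustrative assumptions only, and the released code is linked under 'Code availability' below.

import torch
import torch.nn as nn
import torchvision.models as models

class SliceIntegrationSketch(nn.Module):
    """Hypothetical slice-then-integrate model: 2D backbone + transformer fusion."""
    def __init__(self, n_outputs=1, d_model=512, n_heads=8, n_layers=2):
        super().__init__()
        backbone = models.resnet18(weights=None)   # stand-in 2D feature extractor
        backbone.fc = nn.Identity()                # keep the 512-dim slice embedding
        self.backbone = backbone
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.integrator = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_outputs)  # classification/regression head

    def forward(self, volume):                       # volume: (batch, slices, 3, H, W)
        b, s = volume.shape[:2]
        feats = self.backbone(volume.flatten(0, 1))  # per-slice features: (b*s, 512)
        feats = feats.view(b, s, -1)                 # regroup by volume: (b, s, 512)
        fused = self.integrator(feats).mean(dim=1)   # fuse and pool over slices
        return self.head(fused)                      # one prediction per volume

# Toy usage: two 32-slice volumes at 224 x 224 resolution.
model = SliceIntegrationSketch(n_outputs=1)
print(model(torch.randn(2, 32, 3, 224, 224)).shape)  # torch.Size([2, 1])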

Fig. 1: The SLIViT framework.
Fig. 2: Overview of SLIViT’s performance across 3D imaging modalities.
Fig. 3: Performance comparison on four tasks of AMD-biomarker classification when trained on less than 700 OCT volumes.
Fig. 4: Performance comparison on cardiac function prediction tasks when trained on echocardiograms.
Fig. 5: SLIViT’s performance compared with manual assessment by retina clinical specialists.

Data availability

The 2D OCT dataset was downloaded from https://data.mendeley.com/datasets/rscbjbr9sj/3. The 3D OCT datasets are not publicly available owing to institutional data-use policy and to concerns about patient privacy. However, they are available from the authors upon reasonable request and with permission of the institutional review board (IRB). The echocardiogram dataset was downloaded from https://stanfordaimi.azurewebsites.net/datasets/834e1cd1-92f7-4268-9daa-d359198b310a. The MRI dataset was downloaded from https://www.ukbiobank.ac.uk under application number 33127. The 3D CT, the 2D CT and the 2D X-ray datasets were downloaded from https://medmnist.com.
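
For illustration, the MedMNIST datasets listed above can be fetched programmatically with the medmnist package; the sketch below loads the 3D CT lung-nodule set. Class and argument names follow the MedMNIST v2 documentation and should be verified against the installed package version.

from medmnist import NoduleMNIST3D  # 3D CT lung-nodule subset of MedMNIST

# Download (if needed) and load the training split; each item is a (volume, label) pair.
train_set = NoduleMNIST3D(split="train", download=True)
volume, label = train_set[0]
print(len(train_set), getattr(volume, "shape", None), label)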

Code availability

The code of SLIViT is available via the project’s GitHub repository at https://github.com/cozygene/SLIViT.

Acknowledgements

This work was supported by a National Institutes of Health (NIH)/National Institute of General Medical Sciences grant 5R25GM135043, an NIH/National Institute of Biomedical Imaging and Bioengineering grant R01EB035028, NIH/National Eye Institute grants R01EY023164 and 1R01EY030614, and an Unrestricted Grant from Research to Prevent Blindness. This research was conducted using the UK Biobank Resource under application number 33127. We also acknowledge the participants of the UCLA Computational Genomics Summer Institute (supported by the aforementioned NIH/National Institute of General Medical Sciences grant) for stimulating discussions.

Author information

Authors and Affiliations

Authors

Contributions

O.A., B.D., N.R., G.C., U.A., M.G.N., B.Z., S.S., J.N.C., S.R.S. and E.H. contributed to the conception and design of the work. O.A., B.D., G.C., U.A., M.G.N., A.R., Z.J.C., Y.W., K.H., S.V., L.T., F.C., A.V., A.K., S.L., D.O., L.A., V.H., S.F.-A., H.E., C.C.W., S.R.S. and E.H. contributed to data acquisition. O.A., B.D., N.R., G.C., B.Z., N.Z., I.G., J.N.C., S.R.S. and E.H. contributed to the evaluation of the work. O.A., B.D., N.R., G.C., U.A., M.G.N., P.T., I.G., S.S., J.N.C., S.R.S. and E.H. contributed to the analysis and interpretation of the data. O.A., B.D., N.R., G.C., U.A., A.R., M.C., E.R., C.W.A., N.Z., I.G., S.S., J.N.C., S.R.S. and E.H. contributed to drafting and revising the paper. S.R.S. and E.H. contributed equally as co-advisers. All authors read and approved the final version of the paper.

Corresponding authors

Correspondence to Oren Avram, Srinivas R. Sadda or Eran Halperin.

Ethics declarations

Competing interests

E.H. has an affiliation with Optum. S.R.S. has affiliations with AbbVie/Allergan, Alexion, Amgen, Apellis, ARVO, Astellas, Bayer, Biogen, Boehringer Ingelheim, Carl Zeiss Meditec, Centervue, Character, Eyepoint, Heidelberg, iCare, IvericBio, Janssen, Macula Society, Nanoscope, Nidek, NotalVision, Novartis, Optos, OTx, Pfizer, Regeneron, Roche, Samsung Bioepis and Topcon. The other authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks Tianyu Zhang, Yukun Zhou and the two other, anonymous, reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 PR AUC comparison of five models in four single-task AMD-biomarker classification problems when trained on less than 700 OCT volumes.

Shown are the PR AUC scores, as an alternative scoring metric, for the OCT experiments shown in Fig. 3. The left panel shows the performance when trained and tested on the Houston Dataset (see Supplementary Table 1). The right panel shows the performance when trained on the Houston Dataset and tested on the SLIVER-net Dataset (see Supplementary Table 2). The dashed lines represent the corresponding biomarker’s positive-label prevalence, which is the expected performance of a random model. Box plot whiskers represent a 90% CI.
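
For reference, the convention used in this figure, PR AUC scored against the positive-label prevalence as the random-model baseline, can be computed as in the short sketch below; the labels and scores are made-up placeholders, not study data.

import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0])                    # hypothetical biomarker labels
y_score = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.9, 0.4, 0.2])   # hypothetical model scores

pr_auc = average_precision_score(y_true, y_score)  # PR AUC (average precision)
baseline = y_true.mean()                           # positive-label prevalence (random model)
print(f"PR AUC = {pr_auc:.3f}, random-model baseline = {baseline:.3f}")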

Extended Data Fig. 2 Precision-recall performance compared to clinical retina specialists’ assessment.

Shown are SLIViT’s PR curves (blue), as an alternative scoring metric, for the OCT experiments shown in Fig. 5. SLIViT was trained using less than 700 OCT volumes (Houston Dataset) and tested on an independent dataset (Pasadena Dataset). In each panel, the light-blue shaded area represents a 90% CI for SLIViT’s performance, the red dot represents the retina clinical specialists’ average performance, and the green asterisks correspond to the retina clinical specialists’ assessments. Two of the clinical specialists obtained the exact same performance score for IHRF classification.

Extended Data Fig. 3 SLIViT’s performance in a frame-shuffling experiment.

Shown is the distribution of ROC AUC scores of 101 SLIViT models in four single-task classification problems of AMD high-risk factors (DV, IHRF, SDD, and hDC), trained on a volumetric-OCT dataset. One model was trained on the OCT dataset in its original form, while the other 100 models were trained on randomly shuffled copies of the dataset. The performance ranks of the former model (Original) within the performance distribution of the latter models (Shuffled) were 22, 34, 56, and 47 for DV, IHRF, SDD, and hDC, respectively. The expected performance of a random classifier is 0.5. Box plot whiskers extend to the 5th and the 95th ranked models (out of the 100 shuffled models’ performance distribution).
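
The rank statistic described in this caption (where the model trained on the ordered volumes falls within the score distribution of models trained on shuffled copies) can be illustrated as follows; the scores below are synthetic placeholders, not the reported results.

import numpy as np

rng = np.random.default_rng(0)
shuffled_scores = rng.normal(loc=0.85, scale=0.01, size=100)  # 100 models on shuffled copies
original_score = 0.85                                         # model on the ordered dataset

# Rank of the original model among all 101 models (1 = worst, 101 = best).
rank = int(np.sum(shuffled_scores < original_score)) + 1
print(f"Original model ranks {rank} out of {len(shuffled_scores) + 1}")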

Extended Data Fig. 4 ImageNet and OCT B-scans pre-training contribution for OCT-related downstream learning tasks.

Shown are the ROC (left) and PR (right) AUC scores across fine-tuned models for volumetric-OCT classification tasks, initialized with five different sets of weights. ‘Combined’, the initialization proposed for SLIViT, is ImageNet weight initialization followed by supervised pre-training on the Kermany Dataset. ‘ssCombined’ is ImageNet weight initialization followed by self-supervised pre-training on an unlabeled version of the Kermany Dataset. The expected ROC AUC score of a random model is 0.5. The dashed lines represent the corresponding biomarker’s positive-label prevalence, which is the expected PR AUC score of a random model. Box plot whiskers represent a 90% CI.

Extended Data Fig. 5 ImageNet and OCT B-scans pre-training contribution for non-OCT-related downstream learning tasks.

Shown are the performance scores of models initialized with five different sets of weights on the volumetric ultrasound and MRI regression tasks (R²) and the volumetric CT classification task (ROC AUC). ‘Combined’, the initialization proposed for SLIViT, is ImageNet weight initialization followed by supervised pre-training on the Kermany Dataset. ‘ssCombined’ is ImageNet weight initialization followed by self-supervised pre-training on an unlabeled version of the Kermany Dataset. The expected R² and ROC AUC of a random model are 0 and 0.5, respectively. Box plot whiskers represent a 90% CI.

Extended Data Fig. 6 Feature similarity analysis between various pre-trained backbone projections.

Shown are nine scatterplots of a similarity analysis (CKA) comparing the projections of a biomedical-imaging dataset induced by different biomedical-imaging pre-trained backbones. Each panel corresponds to a different pair of pre-trained backbones (upper: biomedical pairs; middle: biomedical and ImageNet pairs; lower: biomedical and random pairs). In each panel, each of the 768 dots represents the similarity score computed for the projections induced by the corresponding filter. A dot is red if it falls within the top 5% of scores (and gray otherwise). The dashed lines show the average score for the correspondingly colored set of dots.
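
The similarity measure referred to here, centered kernel alignment (CKA), can be computed in its standard linear form as in the sketch below. The figure reports one score per filter, whereas this sketch computes a single score over full feature matrices, so it is illustrative only; the shapes and random inputs are assumptions.

import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two feature matrices over the same n examples (rows)."""
    X = X - X.mean(axis=0)                  # center each feature column
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 768))         # 200 images x 768 features, backbone A
Y = rng.standard_normal((200, 768))         # same 200 images, backbone B
print(f"CKA = {linear_cka(X, Y):.3f}")      # near 0 for unrelated random features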

Extended Data Fig. 7 2D biomedical-imaging pre-training performance contribution for 3D OCT-related downstream learning tasks.

Shown are the ROC AUC scores on four volumetric-OCT single-task classification problems. Four SLIViT models were evaluated in every classification problem. Each SLIViT model was initialized with ImageNet weights and then pre-trained on a 2D biomedical-imaging dataset of a different modality. The considered modalities were CT, X-ray, OCT, and Mixed (containing all the images from the CT, X-ray, and OCT datasets). SLIVER-net’s performance (Domain-specific) is reproduced from Fig. 3. The expected performance of a random model is 0.5. Box plot whiskers represent a 90% CI.

Extended Data Fig. 8 2D biomedical-imaging pre-training performance contribution for 3D non-OCT-related downstream learning tasks.

Shown are the performance scores for the volumetric ultrasound and MRI regression tasks (R²) and the volumetric CT classification task (ROC AUC). Four SLIViT models were evaluated in every learning problem. Each SLIViT model was initialized with ImageNet weights and then pre-trained on a 2D biomedical-imaging dataset of a different modality. The considered modalities were CT, X-ray, OCT, and Mixed (containing all the images from the CT, X-ray, and OCT datasets). The performance scores of the domain-specific methods are reproduced from Fig. 2. The expected R² and ROC AUC of a random model are 0 and 0.5, respectively. Box plot whiskers represent a 90% CI.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4 and Tables 1 and 2.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Avram, O., Durmus, B., Rakocz, N. et al. Accurate prediction of disease-risk factors from volumetric medical scans by a deep vision model pre-trained with 2D scans. Nat. Biomed. Eng. 9, 507–520 (2025). https://doi.org/10.1038/s41551-024-01257-9
