Shifting the retinal foundation models paradigm from slices to volumes for optical coherence tomography
  • Article
  • Open access
  • Published: 05 March 2026


  • Raphael Judkiewicz1,
  • Eran Berkowitz2,3,4,
  • Meishar Meisel2,
  • Tomer Michaeli5 &
  • Joachim A. Behar1

npj Digital Medicine (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Diagnostic markers
  • Eye diseases

Abstract

Optical Coherence Tomography (OCT) is essential in ophthalmology for cross-sectional imaging of the retina. Pretrained foundation models facilitate task-specific model development by enabling fine-tuning with limited labeled data. However, current foundation models rely on a single B-scan (usually the central slice), overlooking volumetric context. This research investigates video foundation models to capture full 3D retinal structure and improve diagnostic performance. V-JEPA, a state-of-the-art video foundation model, was benchmarked against retinal foundation models (RETFound, VisionFM) and a natural image foundation model (DINOv2). All were fine-tuned to detect Age-related Macular Degeneration or Glaucomatous Optic Neuropathy using five OCT datasets. V-JEPA consistently equaled or outperformed image-based models, achieving an average AUROC of 0.94 (0.80–0.99), versus 0.90 (0.76–0.98) for the best image model, a statistically significant improvement (p < 0.001). To our knowledge, this is the first application of transformer-based video models to volumetric OCT, highlighting their promise in 3D medical imaging.
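The core idea described above — treating the stack of B-scans in an OCT volume as the frame axis of a video, so a video foundation model such as V-JEPA can consume the full 3D structure — can be sketched as follows. This is an illustrative sketch only: the even-spacing subsampling strategy, the frame count of 16, and the array shapes are assumptions for demonstration, not the authors' actual preprocessing pipeline.

```python
import numpy as np

def volume_to_clip(volume: np.ndarray, num_frames: int = 16) -> np.ndarray:
    """Subsample an OCT volume of B-scans into a fixed-length 'clip'.

    volume: array of shape (num_bscans, H, W), the stack of cross-sections.
    Returns an array of shape (num_frames, H, W) of evenly spaced slices,
    mimicking how a video model consumes a fixed number of frames.
    """
    n = volume.shape[0]
    # Evenly spaced slice indices across the volume (both endpoints included).
    idx = np.linspace(0, n - 1, num_frames).round().astype(int)
    return volume[idx]

# Toy example: a 64-slice volume of 8x8 B-scans becomes a 16-frame clip.
vol = np.random.rand(64, 8, 8)
clip = volume_to_clip(vol)
print(clip.shape)  # (16, 8, 8)
```

A video transformer would then tokenize this clip along both the spatial and the slice ("temporal") axes, which is what lets it exploit volumetric context that single-B-scan models discard.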


Data availability

We used the following open-access datasets: CirrusOCT (https://zenodo.org/records/1481223), Gamma (gamma.grand-challenge.org), A2A OCT (people.duke.edu/~sf59/RPEDC_Ophth_2013_dataset.htm), and NEH-UT (data.mendeley.com/datasets/8kt969dhx6/2). The HYRD dataset may be made available for non-commercial academic use, with permission from the Hillel Yaffe Medical Center. Please contact the corresponding author regarding such requests.

Code availability

The source code used in this study is available in the GitHub repository https://github.com/aim-lab/oct-fm-slices-to-volumes, which will be made public upon publication. Instructions for running the code are provided in the README.md file.


Acknowledgements

We acknowledge the assistance of ChatGPT, an AI-based language model developed by OpenAI, in editing the manuscript. R.J. and J.B. acknowledge the support of the Zimin Foundation.

Author information

Authors and Affiliations

  1. Faculty of Biomedical Engineering, Technion-IIT, Haifa, Israel

    Raphael Judkiewicz & Joachim A. Behar

  2. Department of Ophthalmology, Hillel Yaffe Medical Center, Hadera, Israel

    Eran Berkowitz & Meishar Meisel

  3. The Ruth and Bruce Rappaport Faculty of Medicine, Technion-IIT, Haifa, Israel

    Eran Berkowitz

  4. The Adelson School of Medicine, Ariel University, Ariel, Israel

    Eran Berkowitz

  5. Faculty of Electrical and Computer Engineering, Technion-IIT, Haifa, Israel

    Tomer Michaeli


Contributions

Conceptualization: J.B. and R.J. Methodology: J.B., R.J., and T.M. Data curation: E.B. and M.M. Investigation: J.B. and R.J. Funding acquisition: J.B. Writing—original draft: J.B. and R.J. Writing—review & editing: J.B., R.J., E.B., M.M., and T.M.

Corresponding author

Correspondence to Joachim A. Behar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Judkiewicz, R., Berkowitz, E., Meisel, M. et al. Shifting the retinal foundation models paradigm from slices to volumes for optical coherence tomography. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02496-7


  • Received: 02 June 2025

  • Accepted: 17 February 2026

  • Published: 05 March 2026

  • DOI: https://doi.org/10.1038/s41746-026-02496-7

