Boosting foundation models for rare eye disease diagnosis via a multimodal text-to-image generative framework
  • Article
  • Open access
  • Published: 24 March 2026


  • Ruoyu Chen1,
  • Weiyi Zhang1,
  • Bowen Liu1,
  • Xinyuan Wu1,
  • Xiaolan Chen1,
  • Pusheng Xu1,
  • Shunming Liu1,
  • Mingguang He1,2,3 &
  • Danli Shi1,2

npj Digital Medicine (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Eye diseases
  • Retinal diseases

Abstract

The rising prevalence of vision-threatening retinal diseases poses a significant burden on global healthcare systems. Although deep learning (DL) techniques offer promising avenues for improving diagnostic efficiency, data scarcity and class imbalance continue to hinder the training of robust diagnostic models, particularly for rare eye diseases. Here, we introduce EyeDiff, a generative foundation model capable of synthesizing lesion-preserving ophthalmic images from textual descriptions. Both objective metrics and expert human evaluations confirmed EyeDiff’s ability to generate high-fidelity images across multiple imaging modalities, accurately reflecting textual descriptions of diverse retinal diseases and lesion types. By augmenting minority classes across 11 globally sourced datasets, EyeDiff consistently boosted diagnostic accuracy for both common and rare eye diseases across different foundation model types, including modality-specific, multimodal and vision-language foundation models trained solely on real data. These results underscore EyeDiff’s potential as a general-purpose text-to-image foundation model, offering a scalable and flexible approach to generating balanced, disease-relevant data for advancing retinal disease diagnosis.


Data availability

The data for model training in the current study are available as open data through the following links: Retinal Image Bank (https://imagebank.asrs.org/), EyePACS (https://www.kaggle.com/c/diabetic-retinopathy-detection/data), OCTDL (https://ieee-dataport.org/documents/octdl-optical-coherence-tomography-dataset-image-based-deep-learning-methods), REFUGE (https://bitbucket.org/woalsdnd/refuge/src/master/), ORIGA (https://figshare.com/articles/dataset/Retinal_Fundus_Glaucoma_Image_dataset/24549217?file=43119880), RIM-ONE (https://bit.ly/rim-one-dl-images), DRISHTI (https://www.kaggle.com/datasets/lokeshsaipureddi/drishtigs-retina-dataset-for-onh-segmentation) and GAMMA (https://paperswithcode.com/dataset/gamma-challenge). The data for validation in the current study are available as open data through the following links: IDRID (https://ieee-dataport.org/open-access/indian-diabetic-retinopathy-image-dataset-idrid), MESSIDOR-2 (https://www.adcis.net/en/third-party/messidor2/), APTOS-2019 (https://www.kaggle.com/competitions/aptos2019-blindness-detection/data), PAPILA (https://figshare.com/articles/dataset/PAPILA/14798004/1), Glaucoma Fundus (https://dataverse.harvard.edu/dataset.xhtml?persistentId=https://doi.org/10.7910/DVN/1YRRAC), JSIEC (https://zenodo.org/record/3477553), Retina (https://www.kaggle.com/datasets/jr2ngb/cataractdataset), OCTID (https://borealisdata.ca/dataverse/OCTID) and OCTDL (https://ieee-dataport.org/documents/octdl-optical-coherence-tomography-dataset-image-based-deep-learning-methods).

Code availability

The deep-learning model was developed using PyTorch (http://pytorch.org) and trained on an NVIDIA V100 GPU. The code for deep-learning model development can be accessed at https://github.com/huggingface/diffusers/tree/main/examples/dreambooth.


Acknowledgements

We thank the American Society of Retina Specialists for providing the valuable Retina Image Bank and the InnoHK HKSAR Government for providing valuable support. The study was supported by the Start-up Fund for RAPs under the Strategic Hiring Scheme (P0048623) from HKSAR, Global STEM Professorship Scheme (P0046113), and Henry G. Leong Endowed Professorship in Elderly Vision Health. The sponsors or funding organizations had no role in the design or conduct of this research.

Author information

Author notes
  1. These authors contributed equally: Ruoyu Chen, Weiyi Zhang, Bowen Liu.

Authors and Affiliations

  1. School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China

    Ruoyu Chen, Weiyi Zhang, Bowen Liu, Xinyuan Wu, Xiaolan Chen, Pusheng Xu, Shunming Liu, Mingguang He & Danli Shi

  2. Research Centre for SHARP Vision, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China

    Mingguang He & Danli Shi

  3. Centre for Eye and Vision Research (CEVR), 17W Hong Kong Science Park, Hong Kong SAR, China

    Mingguang He


Contributions

D.S. conceived the study. D.S. built the deep learning model. D.S., R.C., and W.Z. conducted the literature search and analyzed the data. R.C. and X.C. completed the human evaluation. W.Z. performed validation of downstream tasks and quantitative evaluation. R.C. wrote the manuscript. R.C., B.L., P.X., S.L., and X.W. organized the figures and tables in this study. M.H. provided the data and facilities. All authors critically revised the manuscript. All authors have read and approved the manuscript.

Corresponding authors

Correspondence to Mingguang He or Danli Shi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article


Cite this article

Chen, R., Zhang, W., Liu, B. et al. Boosting foundation models for rare eye disease diagnosis via a multimodal text-to-image generative framework. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02560-2


  • Received: 28 October 2024

  • Accepted: 08 March 2026

  • Published: 24 March 2026

  • DOI: https://doi.org/10.1038/s41746-026-02560-2

