Comparative analysis of generic vision-language models in detecting and diagnosing inherited retinal diseases using fundus photographs

Meng, Xiang; Wong, Wendy Meihua; Pushpanathan, Krithi; Srinivasan, Sahana; Xue, Cancan; Wang, Meng; Miao, Heng; Li, Hengtong; Yang, Liping; Cen, Ling-Ping; Chen, Li Jia; Chan, Hwei Wuen; Tham, Yih-Chung; Cheng, Ching-Yu

doi:10.1038/s41433-025-04013-8

Article
Published: 07 October 2025

Eye volume 39, pages 3187–3194 (2025)

Abstract

Background

To evaluate the clinical applicability of three generic vision-large-language models (VLLMs), OpenAI’s GPT-4omni (GPT-4o) and GPT-4V(ision) and Google’s Gemini, in detecting and diagnosing inherited retinal diseases (IRDs) from fundus photographs.

Methods

This head-to-head comparative study curated 60 ultra-widefield (UWF) fundus images from 30 IRD patients at the National University Hospital, Singapore. Ten normal, open-source UWF fundus images were included for comparison. The three VLLMs analysed all 70 fundus images using standardised prompts, generating descriptions of 10 specified retinal features and providing clinical insights. Three blinded consultant-level graders rated each description on a three-point scale (0 = poor, 1 = borderline, 2 = good), giving each VLLM 2100 scores (70 images × 10 features × 3 graders). Clinical insights, comprising disease detection, diagnosis and pathological gene inference, were evaluated against the clinical ground truth.
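
The score count follows from the design: 70 images, 10 features and 3 graders give 2100 ratings per model. A minimal Python sketch of that bookkeeping is given here; the function names `expected_score_count` and `mean_quality_score` are hypothetical, and the example ratings are made up for illustration rather than taken from the study.

```python
from statistics import mean

N_IMAGES = 70      # 60 IRD + 10 normal UWF images (from the Methods)
N_FEATURES = 10    # specified retinal features described per image
N_GRADERS = 3      # blinded consultant-level graders
VALID_SCORES = {0, 1, 2}  # 0 = poor, 1 = borderline, 2 = good


def expected_score_count(n_images=N_IMAGES, n_features=N_FEATURES,
                         n_graders=N_GRADERS):
    """Number of ratings one model receives under the stated design."""
    return n_images * n_features * n_graders


def mean_quality_score(ratings):
    """Mean of a flat list of 0/1/2 ratings; rejects out-of-scale values."""
    if not ratings:
        raise ValueError("no ratings supplied")
    bad = [r for r in ratings if r not in VALID_SCORES]
    if bad:
        raise ValueError(f"ratings outside the 0-2 scale: {bad}")
    return mean(ratings)


if __name__ == "__main__":
    print(expected_score_count())  # 2100
    # Hypothetical ratings for illustration only, not study data.
    print(mean_quality_score([2, 2, 1, 0, 2]))
```

Validating the scale before averaging is a design choice of this sketch: it surfaces data-entry errors early instead of letting them shift the mean.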

Results

GPT-4o achieved the highest mean quality score for feature description (1.64 [0.697], mean [SEM]), outperforming GPT-4V (1.57 [0.738]) and Gemini (1.46 [0.800]; both p < 0.001). All models demonstrated high detection accuracy (≥81.4%), but Gemini incorrectly classified all normal fundus images as IRD. GPT-4o (65.7%) outperformed GPT-4V (50%) and Gemini (60%) in diagnostic accuracy. Gene inference precision remained low (≤20.3%) across all models. Concordance was high across all models, both between feature descriptions and diagnoses (≥97.1%) and between diagnoses and clinical recommendations (100%).
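
High detection accuracy can coexist with misclassifying every normal image because the dataset is imbalanced (60 IRD images, 10 normal). The sketch below illustrates this with a hypothetical model that labels every image as IRD; the predictions are an assumption for illustration and are not the actual output of any model in the study, and `detection_metrics` is a hypothetical helper name.

```python
def detection_metrics(y_true, y_pred, positive="IRD"):
    """Accuracy, sensitivity and specificity for a binary IRD-vs-normal task."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists differ in length")
    if not y_true:
        raise ValueError("no labels supplied")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / n_pos if n_pos else float("nan"),
        "specificity": tn / n_neg if n_neg else float("nan"),
    }


# Class balance from the Methods: 60 IRD images and 10 normal images.
y_true = ["IRD"] * 60 + ["normal"] * 10
# Hypothetical model that calls every image IRD (not actual model output).
y_pred = ["IRD"] * 70

metrics = detection_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in metrics.items()})
# {'accuracy': 0.857, 'sensitivity': 1.0, 'specificity': 0.0}
```

With this class balance, accuracy alone cannot distinguish a useful detector from one that flags everything, which is why specificity on the normal images is worth reporting alongside it.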

Conclusions

GPT-4o and GPT-4V showed promising potential for detecting IRDs from fundus photographs, with good feature extraction and high detection accuracy. Gemini misclassified normal fundus images as IRD. All three VLLMs require further refinement to improve diagnostic accuracy and gene inference.

**Fig. 1: The specific case examples of fundus images for six IRD-related genes curated in the study.**

**Fig. 2: Flowchart of overall study design.**

**Fig. 3: The model’s mean quality scores representing features description capability.**

**Fig. 4: Models’ performance in classifying fundus images.**

**Fig. 5: Heatmap of prediction genes for GPT-4o, GPT-4V and Gemini.**

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Funding

This work was supported by grants from the National Medical Research Council, Singapore (MOH-CSASI22jul-0001; to CYC). XM acknowledges the support of the China Scholarship Council program (project ID: 202306010300).

Author information

These authors contributed equally: Xiang Meng, Wendy Meihua Wong
These authors jointly supervised this work: Yih-Chung Tham and Ching-Yu Cheng

Authors and Affiliations

Centre for Innovation & Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Xiang Meng, Wendy Meihua Wong, Krithi Pushpanathan, Meng Wang, Hengtong Li, Hwei Wuen Chan, Yih-Chung Tham & Ching-Yu Cheng
Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Xiang Meng, Wendy Meihua Wong, Krithi Pushpanathan, Meng Wang, Hengtong Li, Hwei Wuen Chan, Yih-Chung Tham & Ching-Yu Cheng
Department of Ophthalmology, Peking University People’s Hospital, Beijing, China
Xiang Meng & Heng Miao
Department of Ophthalmology, Third Hospital, Peking University, Beijing, China
Xiang Meng & Liping Yang
Department of Ophthalmology, National University Hospital, Singapore, Singapore
Wendy Meihua Wong, Hwei Wuen Chan & Yih-Chung Tham
Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
Sahana Srinivasan, Cancan Xue & Ching-Yu Cheng
Beijing Key Laboratory of Ocular Disease and Optometry Science, Beijing, China
Heng Miao
Beijing Key Laboratory of Restoration of Damaged Ocular Nerve, Peking University Third Hospital, Beijing, China
Liping Yang
Guangdong Provincial Key Laboratory of Medical Immunology and Molecular Diagnostics, School of Medical Technology, Guangdong Medical University, Zhanjiang, China
Ling-Ping Cen
Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, China
Li Jia Chen

Contributions

Conception and design of the study (YCT and CYC); Acquisition, analysis, and interpretation of data (XM, KP, HTL and WMW); Drafting the work (XM); Revising the work (XM, CCX, MW, HM and LPY); Consultant-grade evaluators for clinical diagnostic assessment (WMW, LJC and LPC); Supervision (HWC, YCT and CYC); Final approval of the version to be published (all authors).

Corresponding authors

Correspondence to Yih-Chung Tham or Ching-Yu Cheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Tables (PDF)

Supplementary Figure 1 (TIF)

Supplementary Figure 2 (TIF)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Meng, X., Wong, W.M., Pushpanathan, K. et al. Comparative analysis of generic vision-language models in detecting and diagnosing inherited retinal diseases using fundus photographs. Eye 39, 3187–3194 (2025). https://doi.org/10.1038/s41433-025-04013-8

Received: 13 November 2024
Revised: 31 July 2025
Accepted: 10 September 2025
Published: 07 October 2025
Version of record: 07 October 2025
Issue date: December 2025
DOI: https://doi.org/10.1038/s41433-025-04013-8
