Abstract
Background
To evaluate the clinical applicability of three generic Vision-Large-Language Models (VLLMs), OpenAI’s GPT-4omni and GPT-4V(ision) and Google’s Gemini, in detecting and diagnosing inherited retinal diseases (IRDs) from fundus photographs.
Methods
This head-to-head comparative study curated 60 ultra-widefield (UWF) fundus images from 30 IRD patients at the National University Hospital, Singapore. Ten normal, open-source UWF fundus images were included for comparison. The three VLLMs analysed all 70 images using standardised prompts, generating descriptions of ten specified retinal features and providing clinical insights. Each VLLM received 2100 scores for its feature descriptions (70 images × 10 features × 3 graders), rated by three blinded consultant-level graders on a three-point scale (0 = poor, 1 = borderline, 2 = good). Clinical insights, comprising disease detection, diagnosis and pathological gene inference, were evaluated against the clinical ground truth.
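The score count quoted in the Methods follows directly from the study design. The short sketch below reproduces that arithmetic; the variable names are illustrative, and the combined total across the three models is derived here rather than reported in the abstract.

```python
# Study design figures taken from the Methods paragraph.
IRD_IMAGES = 60      # UWF fundus images from 30 IRD patients
NORMAL_IMAGES = 10   # open-source normal UWF fundus images
FEATURES = 10        # retinal features described per image
GRADERS = 3          # blinded consultant-level graders
MODELS = ("GPT-4omni", "GPT-4V", "Gemini")

total_images = IRD_IMAGES + NORMAL_IMAGES

# Each grader scores every feature description of every image once.
scores_per_model = total_images * FEATURES * GRADERS

# Derived figure (not stated in the abstract): scores across all models.
total_scores = scores_per_model * len(MODELS)

print(total_images, scores_per_model, total_scores)
```

Each score lies on the 0 to 2 scale, so the per-model mean quality score is an average over these 2100 values.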
Results
GPT-4omni achieved the highest mean quality score in feature description (1.64 [0.697], mean [SEM]), outperforming GPT-4V (1.57 [0.738]) and Gemini (1.46 [0.800]; both p < 0.001). All models demonstrated high detection accuracy (≥81.4%), but Gemini incorrectly classified all normal fundus images as IRD. GPT-4omni (65.7%) outperformed GPT-4V (50%) and Gemini (60%) in diagnostic accuracy. Gene inference precision remained low (≤20.3%) across all models. Concordance was high across all models, both between feature descriptions and diagnoses (≥97.1%) and between diagnoses and clinical recommendations (100%).
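To make the reported percentages easier to interpret, the sketch below converts them into implied image counts. It assumes the accuracies were computed over all 70 images, which the abstract does not state explicitly; the helper names are hypothetical. It also shows the ceiling on detection accuracy for a model that labels every normal image as IRD.

```python
TOTAL_IMAGES = 70    # 60 IRD + 10 normal, from the Methods
NORMAL_IMAGES = 10

def implied_count(percent, n=TOTAL_IMAGES):
    """Nearest whole number of images matching a reported percentage.

    Assumes the percentage was computed over n images (an assumption).
    """
    return round(percent / 100 * n)

def accuracy(correct, n=TOTAL_IMAGES):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / n, 1)

# Diagnostic accuracies as reported in the Results.
diagnosis = {"GPT-4omni": 65.7, "GPT-4V": 50.0, "Gemini": 60.0}
counts = {model: implied_count(p) for model, p in diagnosis.items()}

# A model that flags all normal images as IRD can be right on at most
# the 60 IRD images, which caps its detection accuracy.
ceiling = accuracy(TOTAL_IMAGES - NORMAL_IMAGES)

print(counts)
print(ceiling)
```

Under this assumption, the lower bound on detection accuracy reported above (81.4%) corresponds to 57 of 70 images.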
Conclusions
GPT-4omni and GPT-4V demonstrated promising potential in detecting IRDs from fundus photographs, with good feature extraction capabilities and high detection accuracy. Gemini, however, misclassified normal fundus images as IRD. All three VLLMs require further refinement to improve diagnostic accuracy and gene inference.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
This work was supported by grants from the National Medical Research Council, Singapore (MOH-CSASI22jul-0001; to CYC). XM acknowledges the support of the China Scholarship Council programme (project ID: 202306010300).
Author information
Contributions
Conception and design of the study (YCT and CYC); acquisition, analysis and interpretation of data (XM, KP, HTL and WMW); drafting the work (XM); revising the work (XM, CCX, MW, HM and LPY); consultant-grade evaluation for clinical diagnostic assessment (WMW, LJC and LPC); supervision (HWC, YCT and CYC); final approval of the version to be published (all authors).
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Meng, X., Wong, W.M., Pushpanathan, K. et al. Comparative analysis of generic vision-language models in detecting and diagnosing inherited retinal diseases using fundus photographs. Eye 39, 3187–3194 (2025). https://doi.org/10.1038/s41433-025-04013-8
DOI: https://doi.org/10.1038/s41433-025-04013-8


