Abstract
Since the introduction of ChatGPT in November 2022, large language models (LLMs) have gained widespread adoption among individual consumers and medical practitioners, with a consequent increase in publications describing their utility in healthcare. This review highlights original research articles on how LLMs can be utilized by various stakeholders in ophthalmology for clinical assistance, patient education, medical education, and research. Across studies employing different methodologies, ChatGPT consistently responded with better accuracy and quality than other LLMs, with newer iterations offering further advantages. Studies have likewise identified limitations of LLMs, including hallucination, inability to interpret image-based prompts, and limited performance in non-English languages. As newer iterations of existing models and more advanced models with image processing capabilities are introduced, generative artificial intelligence should be continuously monitored for its implications in eye care.
Funding
FGPK - National Institutes of Health Bridge2AI (AI-READI Salutogenesis Grand Challenge) Grant OT2OD032644.
Author information
Contributions
JCMA, MCBG, GMNS, FGPK - designed the study, acquired, parsed, and interpreted the data, drafted and revised the manuscript, and approved the final version of the manuscript. APA, IDN - designed the study, acquired the data, and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Artiaga, J.C.M., Guevarra, M.C.B., Sosuan, G.M.N. et al. Large language models in ophthalmology: a scoping review on their utility for clinicians, researchers, patients, and educators. Eye 39, 2752–2761 (2025). https://doi.org/10.1038/s41433-025-03935-7
DOI: https://doi.org/10.1038/s41433-025-03935-7


