Table 1 Comparing the accuracy of responses to FRCOphth Part 2 written questions with different LLM-chatbots.
| Covariate | Levels | Inaccurate response | Accurate response | Univariable OR | Multilevel OR |
|---|---|---|---|---|---|
| LLM-chatbot | ChatGPT-3.5 | 65 (50.4) | 64 (49.6) | - | - |
| | Google Bard | 62 (48.1) | 67 (51.9) | 1.10 (0.67–1.79, p = 0.71) | 1.16 (0.63–2.14, p = 0.64) |
| | Bing Chat | 22 (17.1) | 107 (82.9) | 4.94 (2.82–8.92, p < 0.001)*** | 11.90 (5.54–25.53, p < 0.001)*** |
| | ChatGPT-4.0 | 27 (20.9) | 102 (79.1) | 3.84 (2.24–6.71, p < 0.001)*** | 8.10 (3.95–16.62, p < 0.001)*** |
| | ChatGPT-4.0 prompted | 5 (11.6) | 38 (88.4) | 7.72 (3.10–23.52, p < 0.001)*** | 23.36 (6.51–83.80, p < 0.001)*** |
| Difficulty | Mean (SD) | 2.8 (0.8) | 2.4 (0.8) | 0.60 (0.48–0.75, p < 0.001)*** | 0.52 (0.21–1.25, p = 0.14) |
| Topic^a | Investigations | 26 (40.0) | 39 (60.0) | - | - |
| | Trauma | 16 (61.5) | 10 (38.5) | 0.42 (0.16–1.05, p = 0.07) | 0.11 (0.01–1.92, p = 0.13) |
| | Oculoplastic & Orbit | 13 (50.0) | 13 (50.0) | 0.67 (0.26–1.67, p = 0.39) | 0.39 (0.03–5.40, p = 0.48) |
| | Glaucoma | 24 (61.5) | 15 (38.5) | 0.42 (0.18–0.93, p = 0.035)* | 0.13 (0.01–1.41, p = 0.09) |
| | Strabismus | 10 (38.5) | 16 (61.5) | 1.07 (0.42–2.77, p = 0.89) | 0.68 (0.05–9.31, p = 0.77) |
| | Paediatrics | 7 (26.9) | 19 (73.1) | 1.81 (0.69–5.19, p = 0.24) | 3.75 (0.23–60.75, p = 0.35) |
| | Retina | 23 (29.5) | 55 (70.5) | 1.59 (0.80–3.21, p = 0.19) | 1.37 (0.19–10.03, p = 0.76) |
| | Cataract | 5 (12.8) | 34 (87.2) | 4.53 (1.68–14.57, p = 0.005)** | 2.66 (0.16–45.68, p = 0.50) |
| | Cornea & External Eye | 2 (3.8) | 50 (96.2) | 16.67 (4.60–107.44, p < 0.001)*** | 23.55 (1.42–390.93, p = 0.028)* |
| | Uveitis & Oncology | 10 (25.6) | 29 (74.4) | 1.93 (0.82–4.79, p = 0.14) | 2.22 (0.23–21.91, p = 0.49) |
| | Neurology | 13 (25.0) | 39 (75.0) | 2.00 (0.91–4.55, p = 0.09) | 1.59 (0.16–15.53, p = 0.69) |
| | Genetics | 6 (46.2) | 7 (53.8) | 0.78 (0.23–2.66, p = 0.68) | 1.70 (0.05–61.67, p = 0.77) |
| | Pharmacology | 12 (46.2) | 14 (53.8) | 0.78 (0.31–1.96, p = 0.59) | 0.89 (0.06–12.26, p = 0.93) |
| | Miscellaneous | 14 (26.9) | 38 (73.1) | 1.81 (0.83–4.05, p = 0.14) | 1.42 (0.16–12.53, p = 0.75) |
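
The univariable odds ratios in the LLM-chatbot rows can be checked directly from the counts in the table. The following is a minimal sketch, assuming the univariable column reports the crude (cross-product) odds ratio of an accurate response relative to the reference level, ChatGPT-3.5. The function name `crude_odds_ratio` is hypothetical and used only for illustration. The sketch does not attempt to reproduce the confidence intervals, p-values, or the multilevel estimates, whose fitting method is not stated in the table.

```python
# Counts (inaccurate, accurate) copied from the LLM-chatbot rows of Table 1.
# ChatGPT-3.5 is treated as the reference level (its OR cells are "-").
counts = {
    "ChatGPT-3.5": (65, 64),
    "Google Bard": (62, 67),
    "Bing Chat": (22, 107),
    "ChatGPT-4.0": (27, 102),
    "ChatGPT-4.0 prompted": (5, 38),
}


def crude_odds_ratio(group, reference):
    """Unadjusted odds ratio of an accurate response, group vs. reference.

    Hypothetical helper: odds = accurate / inaccurate for each arm,
    and the odds ratio is the ratio of the two odds.
    """
    group_inaccurate, group_accurate = group
    ref_inaccurate, ref_accurate = reference
    group_odds = group_accurate / group_inaccurate
    ref_odds = ref_accurate / ref_inaccurate
    return group_odds / ref_odds


reference = counts["ChatGPT-3.5"]
crude_ors = {
    name: round(crude_odds_ratio(cell, reference), 2)
    for name, cell in counts.items()
    if name != "ChatGPT-3.5"
}

for name, value in crude_ors.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption, the computed values agree with the point estimates in the Univariable OR column (1.10, 4.94, 3.84 and 7.72). The multilevel estimates cannot be recovered this way, because they depend on the model's other terms.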