Table 4 Performance of AI and experts for the identification of visually significant cataract cases in a test set of 186 eyes
From: Detecting visually significant cataract using retinal photograph-based deep learning
| Reader | Sensitivity (%) (95% CI) | Specificity (%) (95% CI) | TP (no.) | TN (no.) | FP (no.) | FN (no.) | Accuracy (%) (95% CI) | Error rate (%) (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Round 1 (using retinal photographs only) | | | | | | | | |
Our algorithm | 93.3 (85.9–97.5) | 99.0 (94.4–99.9) | 83 | 96 | 1 | 6 | 96.2 (92.4–98.5) | 3.8 (1.5–7.6) |
Grader 1 | 27.0 (18.1–37.4) | 100.0 (96.3–100.0) | 24 | 97 | 0 | 65 | 65.1 (57.7–71.9) | 34.9 (28.1–42.3) |
Grader 2 | 24.7 (16.2–35.0) | 100.0 (96.3–100.0) | 22 | 97 | 0 | 67 | 64.0 (56.6–70.9) | 36.0 (29.1–43.4) |
Clinician 1 | 93.3 (85.9–97.5) | 92.8 (85.7–97.0) | 83 | 90 | 7 | 6 | 93.0 (88.3–96.2) | 7.0 (3.8–11.7) |
Clinician 2 | 53.9 (43.0–64.6) | 97.9 (92.7–99.7) | 48 | 95 | 2 | 41 | 76.9 (70.2–82.7) | 23.1 (17.3–29.8) |
Clinician 3 | 60.7 (49.7–70.9) | 96.9 (91.2–99.4) | 54 | 94 | 3 | 35 | 79.6 (73.1–85.1) | 20.4 (14.9–26.9) |
Clinician 4 | 29.2 (20.1–39.8) | 99.0 (94.4–100.0) | 26 | 96 | 1 | 63 | 65.6 (58.3–72.4) | 34.4 (27.6–41.7) |
| Round 2 (using both retinal and slit-lamp photographs) | | | | | | | | |
Clinician 1 | 96.6 (90.5–99.3) | 90.7 (83.1–95.7) | 86 | 88 | 9 | 3 | 93.5 (89.0–96.6) | 6.5 (3.4–11.0) |
Clinician 2 | 59.6 (48.6–69.8) | 97.9 (92.7–99.7) | 53 | 95 | 2 | 36 | 79.6 (73.1–85.1) | 20.4 (14.9–26.9) |
Clinician 3 | 79.8 (69.9–87.6) | 93.8 (87.0–97.7) | 71 | 91 | 6 | 18 | 87.1 (81.4–91.6) | 12.9 (8.4–18.6) |
Clinician 4 | 51.7 (40.8–62.4) | 96.9 (91.2–99.4) | 46 | 94 | 3 | 43 | 75.3 (68.4–81.3) | 24.7 (18.7–31.6) |
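Every row follows from its four counts. In each row TP + FN = 89 eyes with visually significant cataract and TN + FP = 97 eyes without, so sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), accuracy = (TP + TN)/186, and error rate = (FP + FN)/186, which is 100% minus accuracy. The sketch below recomputes a row from its counts. The table does not state how its 95% CIs were computed; the exact (Clopper-Pearson) binomial interval is an assumption here, though it does reproduce, for example, the 96.3% lower bound reported for 97/97 specificity. The helper names `binom_cdf`, `clopper_pearson` and `reader_metrics` are hypothetical, not taken from the paper.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for the p where g(p) == target; g must increase in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided 1 - alpha interval for x successes in n.

    Assumed CI method: the source table does not name one.
    """
    # Lower bound: the p at which P(X >= x) equals alpha / 2 (0 when x == 0).
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2 (1 when x == n).
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def reader_metrics(tp: int, tn: int, fp: int, fn: int, alpha: float = 0.05):
    """Hypothetical helper: {metric: (estimate %, CI low %, CI high %)} for one row."""
    total = tp + tn + fp + fn
    counts = {
        "sensitivity": (tp, tp + fn),  # among eyes with significant cataract
        "specificity": (tn, tn + fp),  # among eyes without
        "accuracy": (tp + tn, total),
        "error_rate": (fp + fn, total),
    }
    result = {}
    for name, (x, n) in counts.items():
        low, high = clopper_pearson(x, n, alpha)
        result[name] = (100 * x / n, 100 * low, 100 * high)
    return result


# Example: the "Our algorithm" row (TP 83, TN 96, FP 1, FN 6).
for name, (est, low, high) in reader_metrics(83, 96, 1, 6).items():
    print(f"{name}: {est:.1f} ({low:.1f}-{high:.1f})")
```

For the "Our algorithm" row, the point estimates round to 93.3, 99.0, 96.2 and 3.8, matching the table. Bisection on the binomial tail keeps the sketch dependency-free; with SciPy available, the same bounds are usually taken from beta-distribution quantiles instead.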