Table 4 Performance of AI and experts for the identification of visually significant cataract cases in a test set of 186 eyes

From: Detecting visually significant cataract using retinal photograph-based deep learning

 

|  | Sensitivity (%) (95% CI) | Specificity (%) (95% CI) | TP (no.) | TN (no.) | FP (no.) | FN (no.) | Accuracy (%) (95% CI) | Error rate (%) (95% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Round 1 (using retinal photographs only) |  |  |  |  |  |  |  |  |
| Our algorithm | 93.3 (85.9–97.5) | 99.0 (94.4–99.9) | 83 | 96 | 1 | 6 | 96.2 (92.4–98.5) | 3.8 (1.5–7.6) |
| Grader 1 | 27.0 (18.1–37.4) | 100.0 (96.3–100.0) | 24 | 97 | 0 | 65 | 65.1 (57.7–71.9) | 34.9 (28.1–42.3) |
| Grader 2 | 24.7 (16.2–35.0) | 100.0 (96.3–100.0) | 22 | 97 | 0 | 67 | 64.0 (56.6–70.9) | 36.0 (29.1–43.4) |
| Clinician 1 | 93.3 (85.9–97.5) | 92.8 (85.7–97.0) | 83 | 90 | 7 | 6 | 93.0 (88.3–96.2) | 7.0 (3.8–11.7) |
| Clinician 2 | 53.9 (43.0–64.6) | 97.9 (92.7–99.7) | 48 | 95 | 2 | 41 | 76.9 (70.2–82.7) | 23.1 (17.3–29.8) |
| Clinician 3 | 60.7 (49.7–70.9) | 96.9 (91.2–99.4) | 54 | 94 | 3 | 35 | 79.6 (73.1–85.1) | 20.4 (14.9–26.9) |
| Clinician 4 | 29.2 (20.1–39.8) | 99.0 (94.4–100.0) | 26 | 96 | 1 | 63 | 65.6 (58.3–72.4) | 34.4 (27.6–41.7) |
| Round 2 (using both retinal and slit-lamp photographs) |  |  |  |  |  |  |  |  |
| Clinician 1 | 96.6 (90.5–99.3) | 90.7 (83.1–95.7) | 86 | 88 | 9 | 3 | 93.5 (89.0–96.6) | 6.5 (3.4–11.0) |
| Clinician 2 | 59.6 (48.6–69.8) | 97.9 (92.7–99.7) | 53 | 95 | 2 | 36 | 79.6 (73.1–85.1) | 20.4 (14.9–26.9) |
| Clinician 3 | 79.8 (69.9–87.6) | 93.8 (87.0–97.7) | 71 | 91 | 6 | 18 | 87.1 (81.4–91.6) | 12.9 (8.4–18.6) |
| Clinician 4 | 51.7 (40.8–62.4) | 96.9 (91.2–99.4) | 46 | 94 | 3 | 43 | 75.3 (68.4–81.3) | 24.7 (18.7–31.6) |

  1. The 186 eyes were randomly extracted from the SCES and SINDI test sets, with visually significant cataracts defined as cataracts with BCVA < 20/60. Cataracts were graded independently by A.G.T. using the Wisconsin cataract grading system, to form the gold standard for this evaluation.
  2. TP, true positive; TN, true negative; FP, false positive; FN, false negative.
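
The percentage columns follow directly from the four counts in each row. As a minimal sketch (the `ConfusionCounts` class and the function names are hypothetical, introduced only for illustration), the point estimates can be recomputed as below. The 95% confidence intervals are not reproduced, because the table does not state which interval method was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one rater (hypothetical helper, not from the source)."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn


def sensitivity(c: ConfusionCounts) -> float:
    """Percentage of truly positive eyes that were flagged: TP / (TP + FN)."""
    return 100.0 * c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Percentage of truly negative eyes that were cleared: TN / (TN + FP)."""
    return 100.0 * c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Percentage of all eyes classified correctly: (TP + TN) / total."""
    return 100.0 * (c.tp + c.tn) / c.total


def error_rate(c: ConfusionCounts) -> float:
    """Percentage of all eyes classified incorrectly: (FP + FN) / total."""
    return 100.0 * (c.fp + c.fn) / c.total


# Counts copied from the "Our algorithm" row of Round 1.
algorithm = ConfusionCounts(tp=83, tn=96, fp=1, fn=6)

print(f"n = {algorithm.total}")                          # n = 186
print(f"sensitivity = {sensitivity(algorithm):.1f}%")    # sensitivity = 93.3%
print(f"specificity = {specificity(algorithm):.1f}%")    # specificity = 99.0%
print(f"accuracy    = {accuracy(algorithm):.1f}%")       # accuracy    = 96.2%
print(f"error rate  = {error_rate(algorithm):.1f}%")     # error rate  = 3.8%
```

Applying the same functions to any other row gives that row's reported point estimates; every row sums to 186 eyes, with 89 positive and 97 negative by the gold standard.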