Table 2. Diagnostic Performance of Reviewers and DL-DSS.
Reviewer / DL-DSS | AUC | Comparison | Sensitivity, % (n/N) | Specificity, % (n/N) | Accuracy, % (n/N) | F-1 Score |
|---|---|---|---|---|---|---|
Step 1: Initial Image Analysis | | Step 1 vs Step 2 (p value) | | | | |
Reviewer A | 0.94 [0.88–0.98] | 0.49 | 88.6 (31/35) [73.3–96.8] | 85.7 (54/63) [74.6–93.3] | 86.7 (85/98) [74.1–94.6] | 0.827 [0.737–0.870] |
Reviewer B | 0.78 [0.68–0.85] | <0.01 | 71.4 (25/35) [53.7–85.4] | 68.3 (43/63) [55.3–79.4] | 69.4 (68/98) [54.7–81.5] | 0.624 [0.509–0.704] |
Reviewer C | 0.87 [0.79–0.93] | 0.11 | 97.1 (34/35) [85.1–99.9] | 65.1 (41/63) [52.0–76.7] | 76.5 (75/98) [63.8–85.0] | 0.747 [0.687–0.760] |
Step 2: DL-DSS Alone | | | | | | |
DL-DSS | 0.92 [0.85–0.97] | | 74.3 (26/35) [56.7–87.5] | 92.1 (58/63) [82.4–97.4] | 85.7 (84/98) [73.2–93.9] | 0.788 [0.663–0.867] |
Step 3: DL-DSS Aided | | Step 1 vs Step 3 (p value) | | | | |
Reviewer A | 0.95 [0.88–0.98] | 0.65 | 85.7 (30/35) [69.7–95.2] | 93.7 (59/63) [84.5–98.2] | 90.8 (89/98) [79.2–97.1] | 0.869 [0.770–0.921] |
Reviewer B | 0.91 [0.83–0.96] | <0.01 | 80.0 (28/35) [63.1–91.6] | 93.7 (59/63) [84.5–98.2] | 88.8 (87/98) [76.9–95.8] | 0.836 [0.723–0.902] |
Reviewer C | 0.91 [0.83–0.96] | 0.17 | 91.4 (32/35) [76.9–98.2] | 71.4 (45/63) [58.7–82.1] | 78.6 (77/98) [65.2–87.9] | 0.753 [0.673–0.787] |
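
The n/N counts in the table are enough to recompute the point estimates for sensitivity, specificity, accuracy, and F-1 score. The following is a minimal Python sketch of that arithmetic, using the Step 1 counts for Reviewer A (31 of 35 positive cases and 54 of 63 negative cases correctly classified). The names `Counts` and `point_estimates` are hypothetical and introduced only for illustration. The bracketed intervals and the AUC are not reproduced, since the table does not state how they were derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Per-reader counts as they appear in the table's (n/N) fractions."""
    tp: int   # positive cases correctly called positive (sensitivity numerator)
    pos: int  # total positive cases (sensitivity denominator)
    tn: int   # negative cases correctly called negative (specificity numerator)
    neg: int  # total negative cases (specificity denominator)


def point_estimates(c: Counts) -> dict:
    """Return sensitivity, specificity, accuracy, and F-1 as fractions in [0, 1]."""
    fn = c.pos - c.tp  # missed positives
    fp = c.neg - c.tn  # false alarms
    return {
        "sensitivity": c.tp / c.pos,
        "specificity": c.tn / c.neg,
        "accuracy": (c.tp + c.tn) / (c.pos + c.neg),
        # F-1 is the harmonic mean of precision and recall,
        # which simplifies to 2TP / (2TP + FP + FN).
        "f1": 2 * c.tp / (2 * c.tp + fp + fn),
    }


# Reviewer A, Step 1: sensitivity 31/35, specificity 54/63 (from the table).
reviewer_a_step1 = Counts(tp=31, pos=35, tn=54, neg=63)
metrics = point_estimates(reviewer_a_step1)

print(f"Sensitivity: {metrics['sensitivity'] * 100:.1f}%")  # 88.6%
print(f"Specificity: {metrics['specificity'] * 100:.1f}%")  # 85.7%
print(f"Accuracy:    {metrics['accuracy'] * 100:.1f}%")     # 86.7%
print(f"F-1 score:   {metrics['f1']:.3f}")                  # 0.827
```

Substituting the counts from any other row (for example `Counts(tp=26, pos=35, tn=58, neg=63)` for DL-DSS alone) gives the corresponding point estimates in the same way.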