Table 2 Performance metrics of the multi-resolution AI model on the internal cohort validation set, the internal cohort test set, and the external cohort test set.
TP, TN, FP, FN, sensitivity, specificity, and accuracy are reported at an average probability threshold of p > 0.265.

Cohort | AUC | TP | TN | FP | FN | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|
Internal cohort validation set (N = 2048) | 0.989 (0.980–0.996) | 153 | 1770 | 117 | 8 | 95.03% (90.4–98.4) | 93.80% (92.0–95.6) | 93.90% (92.2–95.5) |
Internal cohort test set (N = 224) | 0.993 (0.975–1.00) | 19 | 194 | 11 | 0 | 100.00% (100–100) | 94.63% (90.3–98.1) | 95.09% (90.9–98.3) |
External cohort (N = 428) | 0.963 (0.909–0.994) | 42 | 161 | 224 | 1 | 97.67% (91.1–100) | 41.82% (35.9–48.0) | 47.43% (41.9–53.3) |
External cohort with fine-tuning (test set N = 357) | 0.964 (0.902–0.998) | 29 | 311 | 14 | 3 | 90.63% (75.0–100) | 95.69% (93.6–97.6) | 95.24% (93.0–97.3) |
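
The sensitivity, specificity, and accuracy columns can be recomputed from the TP, TN, FP, and FN counts in each row. The following is a minimal sketch in Python, assuming the standard definitions of these metrics; the function name `metrics_from_counts` and the `rows` dictionary are illustrative and do not come from the source.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) as percentages.

    Assumes standard definitions and that each denominator is non-zero,
    which holds for every row of Table 2.
    """
    sensitivity = 100.0 * tp / (tp + fn)                 # true positive rate
    specificity = 100.0 * tn / (tn + fp)                 # true negative rate
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)   # overall agreement
    return sensitivity, specificity, accuracy


# Confusion counts (TP, TN, FP, FN) copied from Table 2.
rows = {
    "Internal cohort validation set": (153, 1770, 117, 8),
    "Internal cohort test set": (19, 194, 11, 0),
    "External cohort": (42, 161, 224, 1),
    "External cohort with fine-tuning": (29, 311, 14, 3),
}

for name, counts in rows.items():
    sens, spec, acc = metrics_from_counts(*counts)
    print(f"{name}: N={sum(counts)}, "
          f"sensitivity={sens:.2f}%, specificity={spec:.2f}%, accuracy={acc:.2f}%")
```

Recomputing the rows this way shows where the external cohort's low specificity comes from: 224 of its 385 negative cases are flagged as positive at the p > 0.265 threshold, while only 1 of 43 positive cases is missed.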