Table 3. COVID-19 discriminability of the machine learning (ML) model, compared with the clinical model, radiologist consensus, and the combined radiologist + ML model.
Model | Positive/total, n | AUC<sup>a</sup>, % (95% CI) | Accuracy, % (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) |
|---|---|---|---|---|---|---|---|
Validation set 1 | |||||||
ML model | 40/605 | 89.9 (85.9–93.9) | 89.3 (86.5–91.6) | 57.5 (40.9–73.0) | 91.5 (88.9–93.7) | 32.6 (22.8–42.3) | 97.9 (96.6–99.1) |
Clinical model | 40/605 | N/A | 70.4 (66.6–74.0) | 30.0 (16.6–46.5) | 73.3 (69.4–76.9) | 7.4 (3.4–11.4) | 93.7 (91.4–95.9) |
Radiologist consensus | 40/605 | N/A | 73.2 (69.5–76.7) | 55.0 (38.5–70.7) | 74.5 (70.7–78.1) | 13.3 (8.1–18.4) | 95.9 (94.0–97.8) |
Radiologist + ML model | 40/605 | N/A | 68.4 (64.6–72.1) | 92.5 (79.6–98.4) | 66.7 (62.7–70.6) | 16.4 (11.6–21.3) | 99.2 (98.3–100.1) |
Validation set 2 | |||||||
ML model | 155/3121 | 91.3 (89.2–93.3) | 93.0 (92.0–93.9) | 57.4 (49.2–65.3) | 94.8 (94.0–95.6) | 36.8 (30.7–42.9) | 97.7 (97.2–98.3) |
Validation set 3 | |||||||
ML model | 27/382 | 95.8 (91.6–99.9) | 96.9 (94.6–98.4) | 77.8 (57.7–91.4) | 98.3 (96.4–99.4) | 77.8 (62.1–93.5) | 98.3 (97.0–99.7) |
Clinical model | 27/382 | N/A | 67.2 (62.2–71.9) | 57.7 (36.9–76.6) | 67.9 (62.7–72.8) | 11.8 (6.2–17.4) | 95.6 (93.0–98.1) |
Radiologist read<sup>b</sup> | 27/382 | N/A | 92.3 (89.1–94.8) | 53.8 (33.4–73.4) | 95.1 (92.3–97.1) | 45.2 (27.6–62.7) | 96.5 (94.6–98.5) |
Radiologist + ML model | 27/382 | N/A | 55.5 (50.3–60.6) | 92.3 (74.9–99.1) | 52.7 (47.3–58.1) | 12.7 (8.0–17.4) | 98.9 (97.4–100.4) |
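
For readers who want to relate the columns to one another, the following is a minimal sketch of how accuracy, sensitivity, specificity, PPV and NPV follow from a 2×2 confusion matrix. The counts used (21 true positives, 6 false negatives, 6 false positives, 349 true negatives) are an assumption: they are back-calculated to be consistent with the 27/382 ML model row of validation set 3 and are not reported in the table itself. The function names `diagnostic_metrics` and `wald_ci` are hypothetical. The normal-approximation (Wald) interval is likewise an assumption about how the PPV and NPV intervals may have been obtained; it is not stated in the source.

```python
from math import sqrt


def diagnostic_metrics(tp, fn, fp, tn):
    """Return standard binary-classification metrics as percentages."""
    total = tp + fn + fp + tn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),  # true-positive rate
        "specificity": 100 * tn / (tn + fp),  # true-negative rate
        "ppv": 100 * tp / (tp + fp),          # positive predictive value
        "npv": 100 * tn / (tn + fn),          # negative predictive value
    }


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% interval for a proportion, in percent.

    The bounds are not clipped to [0, 100], so the upper bound can
    exceed 100% when the proportion is close to 1.
    """
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return 100 * (p - half_width), 100 * (p + half_width)


# Assumed counts (back-calculated, not taken from the source table).
tp, fn, fp, tn = 21, 6, 6, 349

metrics = diagnostic_metrics(tp, fn, fp, tn)
ppv_ci = wald_ci(tp, tp + fp)
npv_ci = wald_ci(tn, tn + fn)

for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
print(f"PPV 95% CI: {ppv_ci[0]:.1f} to {ppv_ci[1]:.1f}")
print(f"NPV 95% CI: {npv_ci[0]:.1f} to {npv_ci[1]:.1f}")
```

With these assumed counts, the point estimates round to the values in the ML model row for validation set 3, and the Wald intervals round to the PPV and NPV intervals in that row. Because a Wald interval is not bounded at 100%, an unclipped interval of this kind would also be one possible explanation for upper limits such as 100.1 and 100.4 in the combined-model rows.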