Table 3 Classification accuracy and 95% confidence intervals for machine learning models on the LR-M test set, compared with average radiologist performance on the first test-set segmentation.
Model / reader | ROC AUC | PR AUC | Accuracy (95% CI) | P value | Sensitivity (95% CI) | P value | Specificity (95% CI) | P value | Kappa |
---|---|---|---|---|---|---|---|---|---|
Manual | 0.68 | 0.91 | 0.71 (0.51–0.85) | 0.20 | 0.79 (0.56–0.92) | 1.00 | 0.40 (0.12–0.77) | 0.03 | 0.18 |
TPOT | 0.60 | 0.89 | 0.63 (0.43–0.79) | 0.03 | 0.68 (0.45–0.85) | 0.08 | 0.40 (0.12–0.77) | 0.03 | 0.07 |
Radiologist 1 | NA | NA | 0.79 (0.59–0.91) | 0.44 | 0.79 (0.56–0.92) | 1.00 | 0.80 (0.36–0.98) | 0.56 | 0.48 |
Radiologist 2 | NA | NA | 0.79 (0.59–0.91) | 0.44 | 0.79 (0.56–0.92) | 1.00 | 0.80 (0.36–0.98) | 0.56 | 0.48 |
Radiologist 3 | NA | NA | 0.83 (0.63–0.94) | 0.80 | 0.79 (0.56–0.92) | 1.00 | 1.00 (0.51–1.00) | 1.00 | 0.61 |
Radiologist 4 | NA | NA | 0.83 (0.63–0.94) | 0.80 | 0.84 (0.61–0.95) | 1.00 | 0.80 (0.36–0.98) | 0.56 | 0.56 |
Mean radiologist | NA | NA | 0.81 (0.61–0.92) | 1.00 | 0.80 (0.57–0.93) | 1.00 | 0.85 (0.40–1.00) | 1.00 | NA |
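
The point estimates in each row can be derived from a single 2×2 confusion matrix. The sketch below illustrates that arithmetic. It rests on assumptions the table itself does not state: a test set of 19 positive and 5 negative cases, per-row counts inferred from the reported proportions, and a Kappa column read as Cohen's kappa between predicted and true labels. The function name `binary_metrics` and the counts are hypothetical illustrations, not the authors' code or data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and Cohen's kappa from a 2x2 table."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    # Agreement expected by chance, from the marginal totals of
    # the true labels and the predicted labels.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return accuracy, sensitivity, specificity, kappa


# ASSUMED counts (tp, fn, tn, fp): inferred from the reported proportions,
# not given in the source.
assumed_counts = {
    "Manual": (15, 4, 2, 3),
    "Radiologist 3": (15, 4, 5, 0),
    "Radiologist 4": (16, 3, 4, 1),
}

for name, counts in assumed_counts.items():
    acc, sens, spec, kappa = binary_metrics(*counts)
    print(f"{name}: acc={acc:.2f} sens={sens:.2f} spec={spec:.2f} kappa={kappa:.2f}")
```

Under these assumed counts the rounded outputs agree with the corresponding table rows, which suggests, but does not confirm, that the Kappa column measures agreement with the reference standard. The confidence intervals and P values are not reproduced here because the table does not state which methods were used to compute them.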