Table 5 Sensitivity analysis of performance metrics on two datasets (post re-adjudication)
| Workflow | Prevalence | Sensitivity (95% CI) | Specificity (95% CI) | F1 Score (95% CI) | Accuracy (95% CI) |
|---|---|---|---|---|---|
Agentic workflow (AP3) | 50% | 0.64 (0.56, 0.74) | 0.98 (0.96, 1.00) | 0.76 (0.69, 0.83) | 0.81 (0.77, 0.86) |
Expert-driven workflow (XP3) | 50% | 0.78 (0.72, 0.85) | 0.95 (0.92, 0.99) | 0.85 (0.81, 0.98) | 0.87 (0.83, 0.91) |
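At a fixed prevalence, accuracy and F1 are determined by sensitivity and specificity. The sketch below recomputes the point estimates from the table's rounded sensitivity, specificity, and 50% prevalence as a consistency check. It assumes the metrics were computed on a single balanced set. The function name `derived_metrics` is hypothetical, and the sketch does not reproduce the confidence intervals. Small differences in the second decimal (for example, 0.77 here vs. the reported 0.76 F1 for AP3) are expected: the inputs are rounded, and the table's estimates were presumably computed from the underlying counts, which are not reported here.

```python
def derived_metrics(sensitivity: float, specificity: float, prevalence: float) -> dict:
    """Precision, F1, and accuracy implied by sensitivity/specificity at a given prevalence.

    Hypothetical helper for illustration; cell values are expressed as
    fractions of the whole evaluation set rather than raw counts.
    """
    tp = sensitivity * prevalence                # true positives
    fn = (1 - sensitivity) * prevalence          # false negatives
    tn = specificity * (1 - prevalence)          # true negatives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)             # equivalent to the harmonic mean of precision and recall
    accuracy = tp + tn
    return {"precision": precision, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Rounded point estimates from Table 5 (50% prevalence).
    for name, sens, spec in [("AP3", 0.64, 0.98), ("XP3", 0.78, 0.95)]:
        m = derived_metrics(sens, spec, 0.5)
        print(f"{name}: F1={m['f1']:.3f}, accuracy={m['accuracy']:.3f}")
```

With these inputs the sketch gives F1 of about 0.771 and accuracy of 0.810 for AP3, and F1 of about 0.852 and accuracy of 0.865 for XP3. These values match the table to within rounding.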