Table 5 Sensitivity analysis of performance metrics on two datasets (post re-adjudication)

From: An autonomous agentic workflow for clinical detection of cognitive concerns using large language models

| Workflow | Prevalence | Sensitivity (95% CI) | Specificity (95% CI) | F1 Score (95% CI) | Accuracy (95% CI) |
|---|---|---|---|---|---|
| Agentic workflow (AP3) | 50% | 0.64 (0.56, 0.74) | 0.98 (0.96, 1.00) | 0.76 (0.69, 0.83) | 0.81 (0.77, 0.86) |
| Expert-driven workflow (XP3) | 50% | 0.78 (0.72, 0.85) | 0.95 (0.92, 0.99) | 0.85 (0.81, 0.98) | 0.87 (0.83, 0.91) |

XP3: expert prompt 3; AP3: agent prompt 3; CI: confidence interval.
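
As a reading aid, the sketch below shows how accuracy and F1 score follow from sensitivity, specificity, and prevalence under the standard confusion-matrix definitions. It is not the authors' code: the function name `derived_metrics` is hypothetical, and it assumes the stated 50% prevalence applies exactly. Because the tabulated inputs are rounded, the derived values should only be expected to land close to the reported point estimates, not to match them digit for digit.

```python
def derived_metrics(sensitivity, specificity, prevalence):
    """Derive accuracy, precision, and F1 from sensitivity, specificity, prevalence.

    Hypothetical helper for illustration only. All quantities are expressed
    as fractions of the whole dataset, so no sample size is needed.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")

    tp = prevalence * sensitivity              # true positives
    fn = prevalence * (1.0 - sensitivity)      # false negatives
    tn = (1.0 - prevalence) * specificity      # true negatives
    fp = (1.0 - prevalence) * (1.0 - specificity)  # false positives

    accuracy = tp + tn
    precision = tp / (tp + fp)
    f1 = 2.0 * tp / (2.0 * tp + fp + fn)
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


# Point estimates taken from the table; prevalence assumed to be exactly 0.50.
ap3 = derived_metrics(sensitivity=0.64, specificity=0.98, prevalence=0.50)
xp3 = derived_metrics(sensitivity=0.78, specificity=0.95, prevalence=0.50)

for name, metrics in (("AP3", ap3), ("XP3", xp3)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

At 50% prevalence, accuracy reduces to the mean of sensitivity and specificity, which is why the higher sensitivity of XP3 outweighs the slightly higher specificity of AP3 in the accuracy column.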