Table 3 Post-re-adjudication classification performance of all prompts on the prompt validation dataset

From: An autonomous agentic workflow for clinical detection of cognitive concerns using large language models

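| Prompt | Sensitivity (95% CI) | Specificity (95% CI) | F1 Score (95% CI) | Accuracy (95% CI) |
|--------|----------------------|----------------------|-------------------|-------------------|
| P0     | 0.85 (0.81, 0.89)    | 0.54 (0.53, 0.56)    | 0.54 (0.52, 0.56) | 0.62 (0.61, 0.64) |
| AP3    | 0.62 (0.58, 0.66)    | 0.98 (0.95, 1.00)    | 0.74 (0.68, 0.80) | 0.88 (0.86, 0.91) |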

AP, agent prompt; CI, confidence interval.
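The four metrics in Table 3 are standard binary-classification measures. The sketch below shows how their point estimates follow from confusion-matrix counts. It assumes "positive" means a cognitive concern is present. The `ConfusionCounts` class, the `classification_metrics` function, and the example counts are hypothetical illustrations, not the authors' code or data. The sketch does not reproduce the 95% CIs, because this table does not state how they were computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary-classifier counts
    (positive = cognitive concern present, by assumption)."""
    tp: int  # true positives: concern present, flagged
    fp: int  # false positives: concern absent, flagged
    tn: int  # true negatives: concern absent, not flagged
    fn: int  # false negatives: concern present, not flagged


def _ratio(num: int, den: int) -> float:
    """Safe division: returns NaN instead of raising when den == 0."""
    return num / den if den else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict:
    """Point estimates of the metrics reported in the table (no CIs)."""
    return {
        # Share of true concerns that were flagged (recall).
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        # Share of non-concerns correctly left unflagged.
        "specificity": _ratio(c.tn, c.tn + c.fp),
        # Harmonic mean of precision and sensitivity, in the
        # equivalent count form 2TP / (2TP + FP + FN).
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        # Share of all cases classified correctly.
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
    }


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(classification_metrics(ConfusionCounts(tp=8, fp=3, tn=27, fn=2)))
```

Unlike sensitivity and specificity, F1 depends on how common the positive class is, because it uses precision and ignores true negatives. This helps explain why a high-specificity prompt such as AP3 can gain much more F1 than its drop in sensitivity alone would suggest.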