npj Digital Medicine

Table 3 Post-re-adjudication classification performance of all prompts on the prompt validation dataset

From: An autonomous agentic workflow for clinical detection of cognitive concerns using large language models

Prompt	Sensitivity (95% CI)	Specificity (95% CI)	F1 Score (95% CI)	Accuracy (95% CI)
P0	0.85 (0.81, 0.89)	0.54 (0.53, 0.56)	0.54 (0.52, 0.56)	0.62 (0.61, 0.64)
AP3	0.62 (0.58, 0.66)	0.98 (0.95, 1.00)	0.74 (0.68, 0.80)	0.88 (0.86, 0.91)

AP agent prompt, CI confidence interval.
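
For readers less familiar with the four reported metrics, the following minimal sketch shows how each is conventionally derived from confusion-matrix counts. The names `ConfusionCounts` and `classification_metrics` and the counts in the usage lines are hypothetical illustrations; they are not taken from the article and do not reproduce the table's values. How the authors computed the 95% confidence intervals is not stated here, so intervals are not sketched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary confusion-matrix counts."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard definitions of the four metrics in the table.

    Assumes every denominator is non-zero, i.e. the dataset contains
    at least one positive case and at least one negative case.
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        # Fraction of true positive cases that were flagged.
        "sensitivity": c.tp / (c.tp + c.fn),
        # Fraction of true negative cases that were not flagged.
        "specificity": c.tn / (c.tn + c.fp),
        # Harmonic mean of precision and sensitivity, written in count form.
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
        # Fraction of all cases classified correctly.
        "accuracy": (c.tp + c.tn) / total,
    }


# Illustrative counts only (assumed, not from the study).
example = ConfusionCounts(tp=40, fp=20, tn=80, fn=10)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because F1 depends on precision rather than specificity, a prompt can combine high sensitivity with a modest F1 score when false positives are frequent, which is the pattern the P0 row shows.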
