Table 1 Evaluation dimensions and indicators of AI agents in healthcare
From: AI agent in healthcare: applications, evaluations, and future directions
| Dimension | Primary indicator | Representative metrics | Operational example (typical agent) |
|---|---|---|---|
| Basic indicators | Objective correctness | Accuracy, precision, recall, F1-score, ROC-AUC: measure the correctness of the model's prediction results | |
| Basic indicators | Semantic correctness | BLEU, ROUGE, METEOR, BERTScore: assess the semantic correctness of the model's output | |
| Basic indicators | Task completion | Completion rate, success rate, (tool use): indicate how well the model achieves a specific medical task | |
| Developmental indicators | Efficiency level | Response time, number of interaction rounds: emphasis is placed on how quickly, and in how many rounds, the agent responds | |
| Developmental indicators | Content & presentation quality | Richness, usefulness, safety, ethical compliance, readability, coherence: ensure that output meets requirements for text quality and content value | |
| Developmental indicators | Humanistic care | Humanistic care, confidence, adherence, satisfaction: assess the appropriateness of humanistic considerations and user acceptability in the interaction | |
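
As a rough illustration of how some of the quantitative metrics in Table 1 might be computed, the following minimal sketch covers accuracy, precision, recall, and F1-score for binary predictions, together with a task completion rate and the mean number of interaction rounds. The function names, the binary label encoding (1 = positive), and the sample data are assumptions made for illustration only; they are not taken from the source article.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive, assumed encoding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def completion_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks the agent completed (True = completed)."""
    return sum(1 for done in outcomes if done) / len(outcomes) if outcomes else 0.0


def mean_interaction_rounds(rounds: Sequence[int]) -> float:
    """Average number of interaction rounds per session."""
    return sum(rounds) / len(rounds) if rounds else 0.0


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    labels = [1, 0, 1, 1, 0, 1]
    predictions = [1, 0, 0, 1, 1, 1]
    print(classification_metrics(labels, predictions))
    print(completion_rate([True, True, False, True]))
    print(mean_interaction_rounds([3, 5, 4]))
```

The semantic metrics (BLEU, ROUGE, METEOR, BERTScore) and the subjective indicators (for example readability or satisfaction) are not covered by this sketch; they typically rely on reference texts, pretrained models, or human ratings.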