Table 1 Evaluation framework

From: Simulated patient systems powered by large language model-based AI agents offer potential for transforming medical education

| Performance aspect | Evaluation dimension | Evaluation by | Metrics |
| --- | --- | --- | --- |
| Effectiveness | Knowledgebase validity (NER) | Medical doctors | F1 |
| Effectiveness | QA accuracy (conversation) | Researchers | Accuracy |
| Effectiveness | Readability | Algorithm | Flesch Reading Ease, Flesch-Kincaid Grade Level |
| Trustworthiness | Robustness (system) | Researchers | Accuracy, ANOVA |
| Trustworthiness | Stability (personality) | Researchers | Accuracy, ANOVA |
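
The table names its metrics but does not say how they were computed. As a minimal sketch, the standard textbook definitions of F1 and the two Flesch readability scores can be written as below. The function names and the choice to pass in precomputed counts (true positives, words, sentences, syllables) are assumptions made for illustration, not the authors' implementation. Syllable counting in particular is tool-dependent and is left outside the sketch.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive and false-negative counts.

    F1 is the harmonic mean of precision and recall. Returning 0.0 when
    there are no true positives is a convention assumed here.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    print(f1_score(tp=8, fp=2, fn=2))
    print(flesch_reading_ease(words=100, sentences=5, syllables=150))
    print(flesch_kincaid_grade(words=100, sentences=5, syllables=150))
```

Taking counts as inputs keeps the formulas separate from tokenization and syllable-counting choices, which differ between readability tools and can shift the scores.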