Table 1 Evaluation framework
| Performance aspect | Evaluation dimension | Evaluation by | Metrics |
|---|---|---|---|
| Effectiveness | Knowledgebase validity (NER) | Medical doctors | F1 |
| Effectiveness | QA accuracy (conversation) | Researchers | Accuracy |
| Effectiveness | Readability | Algorithm | Flesch Reading Ease, Flesch-Kincaid Grade Level |
| Trustworthiness | Robustness (system) | Researchers | Accuracy, ANOVA |
| Trustworthiness | Stability (personality) | Researchers | Accuracy, ANOVA |
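
The two readability metrics listed in Table 1 are standard published formulas. The following is a minimal sketch of how they could be computed, assuming the word, sentence, and syllable counts are already available; the function names are illustrative, and the source does not state which tool was used to compute the scores.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for illustration only (not data from the study).
ease = flesch_reading_ease(words=100, sentences=5, syllables=150)
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(f"Reading Ease: {ease:.2f}, Grade Level: {grade:.2f}")
```

Both functions take raw counts rather than text, since syllable counting is heuristic and tool-dependent.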