Table 3 Ablation studies using scores on important modules
From: Healthcare agent: eliciting the power of large language models for medical consultation
Models | Inquiry Quality | | Response Quality | | | Safety | |
|---|---|---|---|---|---|---|---|
| | Inquiry Proactivity | Inquiry Relevance | Conversational Fluency | Accuracy | Helpfulness | Harmfulness | Self-awareness |
healthcare agent | 3.96 ± 0.20 | 4.20 ± 0.24 | 4.12 ± 0.38 | 3.94 ± 0.55 | 3.98 ± 0.53 | 4.14 ± 0.42 | 4.12 ± 0.52 |
w/o Planner Sub-Module | 4.18 ± 0.23 | 3.55 ± 0.41 | 3.88 ± 0.10 | 3.91 ± 0.63 | 3.99 ± 0.20 | 4.10 ± 0.33 | 4.15 ± 0.14 |
w/o Inquiry Sub-Module | 2.55 ± 0.30 | 1.98 ± 0.28 | 2.99 ± 0.41 | 3.76 ± 0.19 | 3.66 ± 0.32 | 4.16 ± 0.36 | 4.12 ± 0.31 |
w/o Safety Module | 4.00 ± 0.22 | 4.25 ± 0.50 | 4.10 ± 0.19 | 3.81 ± 0.46 | 3.95 ± 0.34 | 3.90 ± 0.14 | 3.40 ± 0.34 |
w/o Discuss-then-Modification | 3.94 ± 0.22 | 4.22 ± 0.38 | 4.16 ± 0.20 | 3.88 ± 0.44 | 4.00 ± 0.23 | 4.06 ± 0.42 | 4.14 ± 0.55 |
w/o Conversation Memory | 3.98 ± 0.22 | 3.88 ± 0.13 | 4.10 ± 0.29 | 3.74 ± 0.15 | 3.64 ± 0.37 | 4.00 ± 0.62 | 4.09 ± 0.50 |