Table 3 Ablation studies using scores on important modules

From: Healthcare agent: eliciting the power of large language models for medical consultation

Column groups: Inquiry Quality (Inquiry Proactivity, Inquiry Relevance, Conversational Fluency); Response Quality (Accuracy, Helpfulness); Safety (Harmfulness, Self-awareness).

Models | Inquiry Proactivity | Inquiry Relevance | Conversational Fluency | Accuracy | Helpfulness | Harmfulness | Self-awareness
--- | --- | --- | --- | --- | --- | --- | ---
healthcare agent | 3.96 ± 0.20 | 4.20 ± 0.24 | 4.12 ± 0.38 | 3.94 ± 0.55 | 3.98 ± 0.53 | 4.14 ± 0.42 | 4.12 ± 0.52
w/o Planner Sub-Module | 4.18 ± 0.23 | 3.55 ± 0.41 | 3.88 ± 0.10 | 3.91 ± 0.63 | 3.99 ± 0.20 | 4.10 ± 0.33 | 4.15 ± 0.14
w/o Inquiry Sub-Module | 2.55 ± 0.30 | 1.98 ± 0.28 | 2.99 ± 0.41 | 3.76 ± 0.19 | 3.66 ± 0.32 | 4.16 ± 0.36 | 4.12 ± 0.31
w/o Safety Module | 4.00 ± 0.22 | 4.25 ± 0.50 | 4.10 ± 0.19 | 3.81 ± 0.46 | 3.95 ± 0.34 | 3.90 ± 0.14 | 3.40 ± 0.34
w/o Discuss-then-Modification | 3.94 ± 0.22 | 4.22 ± 0.38 | 4.16 ± 0.20 | 3.88 ± 0.44 | 4.00 ± 0.23 | 4.06 ± 0.42 | 4.14 ± 0.55
w/o Conversation Memory | 3.98 ± 0.22 | 3.88 ± 0.13 | 4.10 ± 0.29 | 3.74 ± 0.15 | 3.64 ± 0.37 | 4.00 ± 0.62 | 4.09 ± 0.50
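One way to read an ablation table is to subtract the full agent's score from each ablated variant's score, metric by metric, and look at where the largest drop occurs. The sketch below does that arithmetic on the mean values in Table 3 (the ± spreads are ignored). It assumes that a higher score is better on every metric, including Harmfulness (where a higher score presumably means less harmful output); that direction is an assumption, not something the table states. The names `METRICS`, `SCORES`, `deltas_vs_full`, and `largest_drop` are hypothetical helpers for illustration only.

```python
# Mean scores copied from Table 3; the "± spread" values are omitted.
# ASSUMPTION: higher is better on every metric (including Harmfulness).
METRICS = [
    "Inquiry Proactivity", "Inquiry Relevance", "Conversational Fluency",
    "Accuracy", "Helpfulness", "Harmfulness", "Self-awareness",
]

SCORES = {
    "healthcare agent": [3.96, 4.20, 4.12, 3.94, 3.98, 4.14, 4.12],
    "w/o Planner Sub-Module": [4.18, 3.55, 3.88, 3.91, 3.99, 4.10, 4.15],
    "w/o Inquiry Sub-Module": [2.55, 1.98, 2.99, 3.76, 3.66, 4.16, 4.12],
    "w/o Safety Module": [4.00, 4.25, 4.10, 3.81, 3.95, 3.90, 3.40],
    "w/o Discuss-then-Modification": [3.94, 4.22, 4.16, 3.88, 4.00, 4.06, 4.14],
    "w/o Conversation Memory": [3.98, 3.88, 4.10, 3.74, 3.64, 4.00, 4.09],
}


def deltas_vs_full(scores, baseline="healthcare agent"):
    """Per ablated variant, the change on each metric relative to the full agent."""
    base = scores[baseline]
    return {
        name: {m: round(v - b, 2) for m, v, b in zip(METRICS, vals, base)}
        for name, vals in scores.items()
        if name != baseline
    }


def largest_drop(deltas):
    """Map each variant to the (metric, delta) pair with the most negative change."""
    return {name: min(d.items(), key=lambda kv: kv[1]) for name, d in deltas.items()}


if __name__ == "__main__":
    for name, (metric, delta) in largest_drop(deltas_vs_full(SCORES)).items():
        print(f"{name}: largest drop on {metric} ({delta:+.2f})")
```

Under these assumptions, removing the Inquiry Sub-Module costs the most on Inquiry Relevance (-2.22), and removing the Safety Module costs the most on Self-awareness (-0.72), which matches the modules' stated roles. Small deltas (for example, those of w/o Discuss-then-Modification) are well within the reported ± spreads and should not be over-read.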