Table 4 Results on medical concept explanation, reported as BLEU/ROUGE scores

From: Towards evaluating and building versatile large language models for medicine

| Method | Size | Health Fact Exp. | Do Entity Exp. | BioLORD Concept Exp. | Avg. |
| --- | --- | --- | --- | --- | --- |
| *Closed-source models* | | | | | |
| GPT-4 | – | 18.63/20.80 | 19.14/21.14 | 20.33/22.80 | 19.37/21.58 |
| Claude-3.5 | – | 14.96/18.48 | 8.75/13.28 | 13.95/18.49 | 12.56/16.75 |
| *Open-source models* | | | | | |
| MEDITRON | 7B | 6.09/8.65 | 7.68/25.39 | 11.76/22.66 | 8.51/18.90 |
| InternLM 2 | 7B | 22.36/27.01 | 5.28/10.39 | 6.95/13.62 | 11.53/17.01 |
| Mistral | 7B | 18.11/21.31 | 9.21/14.11 | 13.27/16.68 | 13.53/17.37 |
| Llama 3 | 8B | 16.79/20.32 | 14.88/18.84 | 8.87/14.61 | 13.51/17.92 |
| Qwen 2 | 7B | 14.94/17.45 | 5.87/9.73 | 6.81/10.83 | 9.20/12.67 |
| Med42-v2 | 8B | 18.15/21.21 | 13.31/17.13 | 12.26/15.64 | 14.57/18.00 |
| Baichuan 2 | 7B | 18.04/20.56 | 9.75/13.12 | 10.99/13.62 | 12.93/15.77 |
| MMedIns-Llama 3 | 8B | **30.50/28.53** | **34.66/39.99** | **38.12/43.90** | **34.43/37.47** |

  1. “Exp.” denotes Explanation. Bold values indicate the best results.
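The Avg. column is consistent with an unweighted mean of the three task scores, with BLEU and ROUGE averaged independently. A quick sanity check on two rows transcribed from the table (values and model names are taken from the table above, not from any external source):

```python
# Verify that Avg. = unweighted mean of the three per-task scores,
# rounded to two decimals; BLEU and ROUGE are averaged separately.
scores = {
    # model: [(BLEU, ROUGE) for Health Fact, Do Entity, BioLORD Concept]
    "GPT-4": [(18.63, 20.80), (19.14, 21.14), (20.33, 22.80)],
    "MMedIns-Llama 3": [(30.50, 28.53), (34.66, 39.99), (38.12, 43.90)],
}

for model, tasks in scores.items():
    bleu = round(sum(b for b, _ in tasks) / len(tasks), 2)
    rouge = round(sum(r for _, r in tasks) / len(tasks), 2)
    print(f"{model}: {bleu}/{rouge}")
# GPT-4: 19.37/21.58
# MMedIns-Llama 3: 34.43/37.47
```

Both computed pairs match the reported Avg. entries, confirming the column is a plain per-metric mean over the three explanation tasks.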