Table 2 Rationale evaluation on MMedB with ROUGE-1/BLEU-1

From: Towards building multilingual language model for medicine

| Method | English | Chinese | Japanese | French | Russian | Spanish | Avg. |
|---|---|---|---|---|---|---|---|
| Zero-Shot Evaluation | | | | | | | |
| GPT-3.5 | 36.21/38.25 | 27.33/37.34 | 21.30/31.87 | 33.95/45.51 | 12.65/20.70 | 24.62/36.20 | 26.01/34.98 |
| Gemini-1.0 pro | 11.85/28.20 | 6.26/27.23 | 6.54/24.28 | 8.42/33.11 | 3.39/15.38 | 7.22/27.98 | 7.28/26.03 |
| Parameter-efficient Fine-tuning Evaluation | | | | | | | |
| BLOOMZ | 41.45/36.81 | 43.09/45.17 | 28.79/38.09 | 38.89/37.49 | 22.25/15.28 | 42.36/39.22 | 36.14/35.34 |
| InternLM | 41.29/38.05 | 43.47/44.71 | 22.80/37.57 | 30.35/32.14 | 18.24/16.79 | 36.32/34.56 | 32.08/33.97 |
| Llama 2 | 44.72/39.34 | 42.69/43.71 | 45.58/49.53 | 42.93/39.29 | 31.75/22.66 | 44.22/39.64 | 41.98/39.03 |
| MedAlpaca | 43.59/39.52 | 40.71/42.50 | 37.27/44.69 | 39.82/39.57 | 30.11/22.83 | 42.80/39.64 | 39.05/38.12 |
| ChatDoctor | 44.65/40.26 | 40.88/42.80 | 39.54/45.00 | 40.12/39.06 | 30.95/22.84 | 42.88/40.23 | 39.84/38.37 |
| PMC-LLaMA | 44.98/40.90 | 40.09/42.95 | 38.15/43.67 | 38.89/38.64 | 30.08/22.45 | 43.00/39.80 | 39.20/38.07 |
| MEDITRON | 44.26/40.42 | 39.26/42.06 | 36.31/43.34 | 38.73/37.88 | 28.34/21.64 | 42.02/39.06 | 38.15/37.40 |
| Mistral | 48.13/42.80 | 45.61/46.31 | 43.82/48.19 | 44.73/41.07 | 33.62/24.75 | 47.37/42.83 | 43.88/40.99 |
| InternLM 2 | 46.87/41.66 | 47.64/49.28 | 42.22/46.91 | 41.81/38.46 | 26.78/21.71 | 44.51/40.13 | 41.64/39.69 |
| BioMistral | 45.85/41.34 | 43.12/44.75 | 38.76/44.46 | 41.82/39.41 | 27.73/18.80 | 45.52/41.07 | 40.46/38.31 |
| Llama 3 | 46.33/41.73 | 47.09/47.44 | 46.24/50.43 | 43.13/40.69 | 30.89/22.22 | 47.14/42.70 | 43.47/40.87 |
| MMedLM (Ours) | 41.63/38.83 | 44.30/46.38 | 38.61/46.90 | 37.54/37.78 | 19.99/21.28 | 40.79/38.77 | 37.14/38.32 |
| MMedLM 2 (Ours) | 47.07/41.51 | 47.15/48.36 | 47.90/52.24 | 43.22/41.36 | 27.81/25.70 | 46.17/42.64 | 43.22/41.97 |
| MMed-Llama 3 (Ours) | 46.56/41.57 | 47.12/47.71 | 48.10/53.18 | 43.62/40.97 | 33.92/24.87 | 47.67/43.32 | 44.50/41.94 |
| Full Fine-tuning Evaluation | | | | | | | |
| BLOOMZ | 45.94/40.51 | 48.37/48.26 | 44.71/48.61 | 44.47/41.05 | 29.95/21.50 | 45.91/40.77 | 43.22/40.12 |
| InternLM | 46.53/41.86 | 48.24/48.64 | 44.89/49.83 | 41.80/37.95 | 27.87/21.20 | 43.42/38.59 | 42.12/39.68 |
| Llama 2 | 46.87/41.39 | 46.62/46.57 | 48.53/51.21 | 44.43/40.38 | 33.05/23.24 | 45.96/40.37 | 44.24/40.53 |
| MedAlpaca | 47.33/42.31 | 45.72/46.49 | 45.35/49.12 | 43.78/40.41 | 32.80/23.15 | 45.99/40.57 | 43.49/40.34 |
| ChatDoctor | 47.22/41.97 | 44.66/45.81 | 38.87/47.95 | 44.64/40.25 | 32.19/23.37 | 45.68/40.71 | 42.21/40.01 |
| PMC-LLaMA | 47.33/42.87 | 45.87/46.18 | 44.52/48.44 | 43.80/40.23 | 31.14/22.28 | 46.30/40.68 | 43.16/40.12 |
| MEDITRON | 47.40/42.85 | 47.93/48.61 | 49.13/52.03 | 45.93/41.37 | 33.65/24.10 | 46.42/41.11 | 45.08/41.68 |
| Mistral | 47.16/41.82 | 48.34/47.91 | 48.80/50.60 | 45.83/40.88 | 34.52/24.68 | 47.55/41.41 | 45.37/41.22 |
| InternLM 2 | 49.48/44.12 | 51.38/51.58 | 50.64/53.46 | 46.73/42.00 | 32.93/24.05 | 47.94/41.96 | 46.52/42.86 |
| BioMistral | 47.96/42.16 | 49.76/49.33 | 49.73/52.12 | 46.34/41.64 | 34.20/24.27 | 47.57/41.11 | 45.93/41.77 |
| Llama 3 | 48.74/43.66 | 49.44/49.426 | 51.97/53.98 | 47.11/42.49 | 34.73/25.07 | 48.59/42.44 | 46.76/42.84 |
| MMedLM (Ours) | 47.37/41.98 | 48.68/49.28 | 48.95/52.34 | 45.39/41.41 | 33.24/24.67 | 46.68/41.35 | 45.05/41.84 |
| MMedLM 2 (Ours) | 50.02/44.77 | 51.39/51.78 | 54.79/57.10 | 49.04/45.30 | 37.49/28.18 | 50.14/44.59 | 48.81/45.29 |
| MMed-Llama 3 (Ours) | 47.61/42.47 | 49.96/49.36 | 52.89/55.06 | 47.92/42.85 | 36.31/26.67 | 48.61/43.41 | 47.21/43.29 |

  1. The best results under each setting are shown in bold.
  2. The value preceding the symbol ‘/’ represents BLEU-1, while the value following it represents ROUGE-1.
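
For readers who want to work with these numbers programmatically, the following is a minimal sketch of how a cell can be split at the '/' symbol and how an Avg. entry can be recomputed. It assumes the Avg. column is the unweighted mean over the six languages, rounded to two decimals; the table does not state this, but the assumption reproduces the GPT-3.5 and Gemini-1.0 pro rows. The helper names `parse_cell` and `row_average` are illustrative and do not come from the paper.

```python
def parse_cell(cell: str) -> tuple[float, float]:
    """Split a 'first/second' cell into two floats.

    float() ignores surrounding whitespace, so cells such as
    '6.26 / 27.23' and '45.85/41.34' are both accepted.
    """
    first, second = cell.split("/")
    return float(first), float(second)


def row_average(cells: list[str]) -> tuple[float, float]:
    """Unweighted mean of each metric over the per-language cells.

    Assumption: this is how the Avg. column was produced.
    """
    pairs = [parse_cell(c) for c in cells]
    n = len(pairs)
    mean_first = sum(p[0] for p in pairs) / n
    mean_second = sum(p[1] for p in pairs) / n
    return round(mean_first, 2), round(mean_second, 2)


# Per-language cells of the GPT-3.5 row (English ... Spanish), copied from the table.
gpt35 = [
    "36.21/ 38.25", "27.33/ 37.34", "21.30/ 31.87",
    "33.95/ 45.51", "12.65/ 20.70", "24.62/ 36.20",
]

print(row_average(gpt35))  # (26.01, 34.98), matching the row's Avg. cell
```

Keeping the two values of each cell as a pair, rather than merging them, lets either metric be sorted or compared across models independently.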