Table 2 Rationale evaluation on MMedBench. Each cell reports ROUGE-1/BLEU-1 (%)
From: Towards building multilingual language model for medicine
Method | English | Chinese | Japanese | French | Russian | Spanish | Avg. |
---|---|---|---|---|---|---|---|
**Zero-Shot Evaluation** | | | | | | | |
GPT-3.5 | 36.21/ 38.25 | 27.33/ 37.34 | 21.30/ 31.87 | 33.95/ 45.51 | 12.65/ 20.70 | 24.62/ 36.20 | 26.01/ 34.98 |
Gemini-1.0 Pro | 11.85/ 28.20 | 6.26/ 27.23 | 6.54/ 24.28 | 8.42/ 33.11 | 3.39/ 15.38 | 7.22/ 27.98 | 7.28/ 26.03 |
**Parameter-efficient Fine-tuning Evaluation** | | | | | | | |
BLOOMZ | 41.45/ 36.81 | 43.09/ 45.17 | 28.79/ 38.09 | 38.89/ 37.49 | 22.25/ 15.28 | 42.36/ 39.22 | 36.14/ 35.34 |
InternLM | 41.29/ 38.05 | 43.47/ 44.71 | 22.80/ 37.57 | 30.35/ 32.14 | 18.24/ 16.79 | 36.32/ 34.56 | 32.08/ 33.97 |
Llama 2 | 44.72/ 39.34 | 42.69/ 43.71 | 45.58/ 49.53 | 42.93/ 39.29 | 31.75/ 22.66 | 44.22/ 39.64 | 41.98/ 39.03 |
MedAlpaca | 43.59/ 39.52 | 40.71/ 42.50 | 37.27/ 44.69 | 39.82/ 39.57 | 30.11/ 22.83 | 42.80/ 39.64 | 39.05/ 38.12 |
ChatDoctor | 44.65/ 40.26 | 40.88/ 42.80 | 39.54/ 45.00 | 40.12/ 39.06 | 30.95/ 22.84 | 42.88/ 40.23 | 39.84/ 38.37 |
PMC-LLaMA | 44.98/ 40.90 | 40.09/ 42.95 | 38.15/ 43.67 | 38.89/ 38.64 | 30.08/ 22.45 | 43.00/ 39.80 | 39.20/ 38.07 |
MEDITRON | 44.26/ 40.42 | 39.26/ 42.06 | 36.31/ 43.34 | 38.73/ 37.88 | 28.34/ 21.64 | 42.02/ 39.06 | 38.15/ 37.40 |
Mistral | 48.13/ 42.80 | 45.61/ 46.31 | 43.82/ 48.19 | 44.73/ 41.07 | 33.62/ 24.75 | 47.37/ 42.83 | 43.88/ 40.99 |
InternLM 2 | 46.87/ 41.66 | 47.64/ 49.28 | 42.22/ 46.91 | 41.81/ 38.46 | 26.78/ 21.71 | 44.51/ 40.13 | 41.64/ 39.69 |
BioMistral | 45.85/ 41.34 | 43.12/ 44.75 | 38.76/ 44.46 | 41.82/ 39.41 | 27.73/ 18.80 | 45.52/ 41.07 | 40.46/ 38.31 |
Llama 3 | 46.33/ 41.73 | 47.09/ 47.44 | 46.24/ 50.43 | 43.13/ 40.69 | 30.89/ 22.22 | 47.14/ 42.70 | 43.47/ 40.87 |
MMedLM (Ours) | 41.63/ 38.83 | 44.30/ 46.38 | 38.61/ 46.90 | 37.54/ 37.78 | 19.99/ 21.28 | 40.79/ 38.77 | 37.14/ 38.32 |
MMedLM 2 (Ours) | 47.07/ 41.51 | 47.15/ 48.36 | 47.90/ 52.24 | 43.22/ 41.36 | 27.81/ 25.70 | 46.17/ 42.64 | 43.22/ 41.97 |
MMed-Llama 3 (Ours) | 46.56/ 41.57 | 47.12/ 47.71 | 48.10/ 53.18 | 43.62/ 40.97 | 33.92/ 24.87 | 47.67/ 43.32 | 44.50/ 41.94 |
**Full Fine-tuning Evaluation** | | | | | | | |
BLOOMZ | 45.94/ 40.51 | 48.37/ 48.26 | 44.71/ 48.61 | 44.47/ 41.05 | 29.95/ 21.50 | 45.91/ 40.77 | 43.22/ 40.12 |
InternLM | 46.53/ 41.86 | 48.24/ 48.64 | 44.89/ 49.83 | 41.80/ 37.95 | 27.87/ 21.20 | 43.42/ 38.59 | 42.12/ 39.68 |
Llama 2 | 46.87/ 41.39 | 46.62/ 46.57 | 48.53/ 51.21 | 44.43/ 40.38 | 33.05/ 23.24 | 45.96/ 40.37 | 44.24/ 40.53 |
MedAlpaca | 47.33/ 42.31 | 45.72/ 46.49 | 45.35/ 49.12 | 43.78/ 40.41 | 32.80/ 23.15 | 45.99/ 40.57 | 43.49/ 40.34 |
ChatDoctor | 47.22/ 41.97 | 44.66/ 45.81 | 38.87/ 47.95 | 44.64/ 40.25 | 32.19/ 23.37 | 45.68/ 40.71 | 42.21/ 40.01 |
PMC-LLaMA | 47.33/ 42.87 | 45.87/ 46.18 | 44.52/ 48.44 | 43.80/ 40.23 | 31.14/ 22.28 | 46.30/ 40.68 | 43.16/ 40.12 |
MEDITRON | 47.40/ 42.85 | 47.93/ 48.61 | 49.13/ 52.03 | 45.93/ 41.37 | 33.65/ 24.10 | 46.42/ 41.11 | 45.08/ 41.68 |
Mistral | 47.16/ 41.82 | 48.34/ 47.91 | 48.80/ 50.60 | 45.83/ 40.88 | 34.52/ 24.68 | 47.55/ 41.41 | 45.37/ 41.22 |
InternLM 2 | 49.48/ 44.12 | 51.38/ 51.58 | 50.64/ 53.46 | 46.73/ 42.00 | 32.93/ 24.05 | 47.94/ 41.96 | 46.52/ 42.86 |
BioMistral | 47.96/ 42.16 | 49.76/ 49.33 | 49.73/ 52.12 | 46.34/ 41.64 | 34.20/ 24.27 | 47.57/ 41.11 | 45.93/ 41.77 |
Llama 3 | 48.74/ 43.66 | 49.44/ 49.42 | 51.97/ 53.98 | 47.11/ 42.49 | 34.73/ 25.07 | 48.59/ 42.44 | 46.76/ 42.84 |
MMedLM (Ours) | 47.37/ 41.98 | 48.68/ 49.28 | 48.95/ 52.34 | 45.39/ 41.41 | 33.24/ 24.67 | 46.68/ 41.35 | 45.05/ 41.84 |
MMedLM 2 (Ours) | 50.02/ 44.77 | 51.39/ 51.78 | 54.79/ 57.10 | 49.04/ 45.30 | 37.49/ 28.18 | 50.14/ 44.59 | 48.81/ 45.29 |
MMed-Llama 3 (Ours) | 47.61/ 42.47 | 49.96/ 49.36 | 52.89/ 55.06 | 47.92/ 42.85 | 36.31/ 26.67 | 48.61/ 43.41 | 47.21/ 43.29 |
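The metric pairs above are standard n-gram overlap scores. As a point of reference, the sketch below shows one plausible way to compute a ROUGE-1/BLEU-1 pair for a single generated rationale; it assumes the `rouge-score` and `nltk` packages and simple whitespace tokenization, and the function name `rationale_scores` is hypothetical. The paper's exact scoring pipeline, in particular its tokenization for Chinese and Japanese, may differ.

```python
# Minimal sketch: scoring one generated rationale against its reference.
# Assumes the `rouge-score` and `nltk` packages; whitespace tokenization
# is an assumption and would need a language-specific tokenizer for
# Chinese or Japanese text.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

def rationale_scores(reference: str, prediction: str) -> tuple[float, float]:
    """Return (ROUGE-1 F1, BLEU-1) for one prediction, scaled to percent."""
    # ROUGE-1: F1 over unigram overlap between reference and prediction.
    scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
    rouge1 = scorer.score(reference, prediction)["rouge1"].fmeasure

    # BLEU-1: unigram precision only (higher n-gram weights set to zero),
    # with smoothing so short outputs do not collapse to zero.
    bleu1 = sentence_bleu(
        [reference.split()],
        prediction.split(),
        weights=(1.0, 0.0, 0.0, 0.0),
        smoothing_function=SmoothingFunction().method1,
    )
    return 100.0 * rouge1, 100.0 * bleu1

# Toy usage: a paraphrased rationale scores high on unigram overlap.
ref = "aspirin irreversibly inhibits cyclooxygenase and reduces platelet aggregation"
hyp = "aspirin reduces platelet aggregation by inhibiting cyclooxygenase"
print(rationale_scores(ref, hyp))
```

Each per-language column would then correspond to such scores averaged over that language's MMedBench test split; the Avg. column is the mean of the six language scores (e.g., for GPT-3.5, the six ROUGE-1 values average to 26.01).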