Table 4 Results on medical concept explanation, reported as BLEU/ROUGE scores
From: Towards evaluating and building versatile large language models for medicine
| Method | Size | Health Fact Exp. | DO Entity Exp. | BioLORD Concept Exp. | Avg. |
|---|---|---|---|---|---|
| Closed-source Models | | | | | |
| GPT-4 | – | 18.63/20.80 | 19.14/21.14 | 20.33/22.80 | 19.37/21.58 |
| Claude-3.5 | – | 14.96/18.48 | 8.75/13.28 | 13.95/18.49 | 12.56/16.75 |
| Open-source Models | | | | | |
| MEDITRON | 7B | 6.09/8.65 | 7.68/25.39 | 11.76/22.66 | 8.51/18.90 |
| InternLM 2 | 7B | 22.36/27.01 | 5.28/10.39 | 6.95/13.62 | 11.53/17.01 |
| Mistral | 7B | 18.11/21.31 | 9.21/14.11 | 13.27/16.68 | 13.53/17.37 |
| Llama 3 | 8B | 16.79/20.32 | 14.88/18.84 | 8.87/14.61 | 13.51/17.92 |
| Qwen 2 | 7B | 14.94/17.45 | 5.87/9.73 | 6.81/10.83 | 9.20/12.67 |
| Med42-v2 | 8B | 18.15/21.21 | 13.31/17.13 | 12.26/15.64 | 14.57/18.00 |
| Baichuan 2 | 7B | 18.04/20.56 | 9.75/13.12 | 10.99/13.62 | 12.93/15.77 |
| MMedIns-Llama 3 | 8B | 30.50/28.53 | 34.66/39.99 | 38.12/43.90 | 34.43/37.47 |
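
The Avg. column appears to be the unweighted mean of the three per-task scores, taken separately over the BLEU and ROUGE halves of each cell (presumably from unrounded values, since a few entries differ by 0.01 after rounding). The minimal sketch below reproduces the GPT-4 row's average under that assumption; the function name `average_scores` is illustrative, not from the paper.

```python
# Sketch (assumption): Avg. = unweighted mean of the three task scores,
# averaging the BLEU and ROUGE halves of each "BLEU/ROUGE" cell separately.

def average_scores(task_scores):
    """Average a list of (BLEU, ROUGE) pairs, rounding to two decimals."""
    bleu = sum(b for b, _ in task_scores) / len(task_scores)
    rouge = sum(r for _, r in task_scores) / len(task_scores)
    return round(bleu, 2), round(rouge, 2)

# GPT-4 row: Health Fact Exp., DO Entity Exp., BioLORD Concept Exp.
gpt4 = [(18.63, 20.80), (19.14, 21.14), (20.33, 22.80)]
print(average_scores(gpt4))  # (19.37, 21.58), matching the reported Avg.
```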
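
Each cell pairs a BLEU score with a ROUGE score on a 0–100 scale. The table itself does not specify which BLEU/ROUGE variants or implementations were used, so the sketch below is purely illustrative: it uses NLTK's smoothed sentence-level BLEU and the `rouge-score` package's ROUGE-1 F-measure as stand-ins, with made-up reference and candidate strings.

```python
# Illustrative only: the exact BLEU/ROUGE configuration behind Table 4 is not
# stated here; NLTK sentence BLEU and rouge-score ROUGE-1 serve as stand-ins.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer


def bleu_rouge(reference: str, candidate: str):
    """Return (BLEU, ROUGE-1 F1) for one prediction, scaled to 0-100."""
    bleu = sentence_bleu(
        [reference.split()],
        candidate.split(),
        smoothing_function=SmoothingFunction().method1,
    )
    rouge = rouge_scorer.RougeScorer(["rouge1"]).score(reference, candidate)
    return 100 * bleu, 100 * rouge["rouge1"].fmeasure


# Hypothetical reference explanation and model output for a medical concept.
ref = "Aspirin is a nonsteroidal anti-inflammatory drug used to relieve pain and fever."
cand = "Aspirin is an anti-inflammatory drug that relieves pain and fever."
print(bleu_rouge(ref, cand))
```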