Table 3 Results on information extraction, reported as accuracy (%). Ext. = extraction.
From: Towards evaluating and building versatile large language models for medicine
Method | Size | PICO: Participant Ext. | PICO: Intervention Ext. | PICO: Outcome Ext. | ADE Drug Dose Ext. | PMC Patient Basic Info. Ext. | Avg. |
|---|---|---|---|---|---|---|---|
Closed-source models | | | | | | | |
GPT-4 | – | 67.44 | 62.79 | 65.12 | 91.30 | 97.93 | 76.92 |
Claude-3.5 | – | 65.12 | 76.74 | 60.47 | 95.65 | 99.07 | 79.41 |
Open-source models | | | | | | | |
MEDITRON | 7B | 72.09 | 46.51 | 51.16 | 95.65 | 72.20 | 67.52 |
InternLM 2 | 7B | 72.09 | 74.42 | 69.77 | 95.65 | 83.60 | 79.11 |
Mistral | 7B | 60.47 | 65.12 | 48.84 | 91.30 | 85.20 | 70.18 |
Llama 3 | 8B | 58.14 | 79.07 | 58.14 | 69.57 | 95.93 | 72.17 |
Qwen 2 | 7B | 58.14 | 67.44 | 41.86 | 73.91 | 95.93 | 67.46 |
Med42-v2 | 8B | 55.81 | 60.47 | 60.47 | 91.30 | 95.67 | 72.74 |
Baichuan 2 | 7B | 48.84 | 34.88 | 16.28 | 69.57 | 73.33 | 48.58 |
MMedIns-Llama 3 | 8B | 83.72 | 79.07 | 62.79 | 95.65 | 97.60 | 83.77 |
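The table does not say how Avg. is computed. The sketch below assumes it is the unweighted arithmetic mean of the five task columns and recomputes it from the scores above as a consistency check. The names `TABLE_3`, `unweighted_mean`, and `recompute_averages` are hypothetical and used only for illustration. The scores are copied from Table 3.

```python
"""Recompute the Avg. column of Table 3 from the five per-task accuracies.

Assumption (not stated in the table): Avg. is the unweighted arithmetic
mean of the five task columns. All scores are copied from Table 3.
"""

TASKS = (
    "PICO: Participant Ext.",
    "PICO: Intervention Ext.",
    "PICO: Outcome Ext.",
    "ADE Drug Dose Ext.",
    "PMC Patient Basic Info. Ext.",
)

# Hypothetical container: model -> (per-task accuracies in TASKS order, reported Avg.)
TABLE_3 = {
    "GPT-4": ((67.44, 62.79, 65.12, 91.30, 97.93), 76.92),
    "Claude-3.5": ((65.12, 76.74, 60.47, 95.65, 99.07), 79.41),
    "MEDITRON": ((72.09, 46.51, 51.16, 95.65, 72.20), 67.52),
    "InternLM 2": ((72.09, 74.42, 69.77, 95.65, 83.60), 79.11),
    "Mistral": ((60.47, 65.12, 48.84, 91.30, 85.20), 70.18),
    "Llama 3": ((58.14, 79.07, 58.14, 69.57, 95.93), 72.17),
    "Qwen 2": ((58.14, 67.44, 41.86, 73.91, 95.93), 67.46),
    "Med42-v2": ((55.81, 60.47, 60.47, 91.30, 95.67), 72.74),
    "Baichuan 2": ((48.84, 34.88, 16.28, 69.57, 73.33), 48.58),
    "MMedIns-Llama 3": ((83.72, 79.07, 62.79, 95.65, 97.60), 83.77),
}


def unweighted_mean(values):
    """Arithmetic mean with every task weighted equally."""
    return sum(values) / len(values)


def recompute_averages(table=TABLE_3):
    """Return {model: (recomputed mean, reported Avg., absolute difference)}."""
    results = {}
    for model, (scores, reported) in table.items():
        if len(scores) != len(TASKS):
            raise ValueError(f"{model}: expected {len(TASKS)} scores, got {len(scores)}")
        mean = unweighted_mean(scores)
        results[model] = (mean, reported, abs(mean - reported))
    return results


if __name__ == "__main__":
    results = recompute_averages()
    # Print models ranked by recomputed mean, highest first.
    for model, (mean, reported, diff) in sorted(
        results.items(), key=lambda item: item[1][0], reverse=True
    ):
        print(f"{model:<16} recomputed={mean:6.2f} reported={reported:6.2f} diff={diff:.3f}")
```

Under this assumption, every recomputed mean matches the reported Avg. to within 0.01. The largest gap is Mistral: its five scores average 70.186, while the table reports 70.18. This is possibly because the displayed per-task scores were themselves rounded from unrounded values.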