Table 3. Results on information extraction, reported as accuracy scores

From: Towards evaluating and building versatile large language models for medicine

| Method | Size | PICO: Participant Ext. | PICO: Intervention Ext. | PICO: Outcome Ext. | ADE Drug Dose Ext. | PMC patient Basic Info. Ext. | Avg. |
|---|---|---|---|---|---|---|---|
| Closed-source Models | | | | | | | |
| GPT-4 | | 67.44 | 62.79 | 65.12 | 91.30 | 97.93 | 76.92 |
| Claude-3.5 | | 65.12 | 76.74 | 60.47 | 95.65 | 99.07 | 79.41 |
| Open-source Models | | | | | | | |
| MEDITRON | 7B | 72.09 | 46.51 | 51.16 | 95.65 | 72.20 | 67.52 |
| InternLM 2 | 7B | 72.09 | 74.42 | 69.77 | 95.65 | 83.60 | 79.11 |
| Mistral | 7B | 60.47 | 65.12 | 48.84 | 91.30 | 85.20 | 70.18 |
| Llama 3 | 8B | 58.14 | 79.07 | 58.14 | 69.57 | 95.93 | 72.17 |
| Qwen 2 | 7B | 58.14 | 67.44 | 41.86 | 73.91 | 95.93 | 67.46 |
| Med42-v2 | 8B | 55.81 | 60.47 | 60.47 | 91.30 | 95.67 | 72.74 |
| Baichuan 2 | 7B | 48.84 | 34.88 | 16.28 | 69.57 | 73.33 | 48.58 |
| MMedIns-Llama 3 | 8B | 83.72 | 79.07 | 62.79 | 95.65 | 97.60 | 83.77 |

  1. “Ext.” denotes extraction and “Info.” denotes information. Bolding represents the best results.
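
The table does not say how the Avg. column is computed. The sketch below assumes it is the unweighted mean of the five per-task accuracies (three PICO sub-tasks, ADE Drug Dose Ext., and PMC patient Basic Info. Ext.) and checks that assumption against three rows copied from the table. The function name `mean_accuracy` and the dictionary layout are illustrative, not taken from the paper.

```python
# Per-task accuracies copied from Table 3 (three rows, for brevity).
# Column order: Participant, Intervention, Outcome, ADE Drug Dose, PMC patient Basic Info.
scores = {
    "GPT-4": [67.44, 62.79, 65.12, 91.30, 97.93],
    "Claude-3.5": [65.12, 76.74, 60.47, 95.65, 99.07],
    "MMedIns-Llama 3": [83.72, 79.07, 62.79, 95.65, 97.60],
}

# Avg. values as reported in the table.
reported_avg = {"GPT-4": 76.92, "Claude-3.5": 79.41, "MMedIns-Llama 3": 83.77}


def mean_accuracy(values):
    """Unweighted mean of per-task accuracies (assumed definition of Avg.)."""
    return sum(values) / len(values)


for model, values in scores.items():
    computed = mean_accuracy(values)
    print(f"{model}: computed {computed:.2f}, reported {reported_avg[model]:.2f}")
```

Under this assumption the computed means agree with the reported Avg. values to within rounding for these rows. This suggests, but does not confirm, that each task is weighted equally regardless of its test-set size.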