Table 3 Reasoning analysis: evaluations in the zero-shot learning setting to examine whether the LLMs could reason about drugs using learned knowledge

From: A collaborative large language model for drug analysis

| Method | No. of parameters | USMLE (Acc.) | MedMCQA (Acc.) | MMLU (Acc.) | ChatDoctor (Precision) | ChatDoctor (Recall) | ChatDoctor (F1) | ADE (Acc.) | Drug_Effects (Acc.) | DDI (Acc.) | PubMedQA (Acc.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Med-PaLM-2⁶ | 340B (1.9×) | 79.7 | 71.3 | 79.2 | – | – | – | – | – | – | – |
| ChatGPT¹ | 175B (1×) | 55.8 | 63.5 | 71.4 | 4.7 | 5.6 | 5.1 | 45.2 | 39.8 | 42.8 | 64.1 |
| GPT-4² | >1T (>5.7×) | 80.2 | 76.6 | 84.4 | 15.6 | 10.7 | 12.7 | 55.5 | 47.8 | 58.5 | 74.5 |
| DrugGPT (current work) | 175B (1×) | **82.7** | **80.2** | **85.6** | **50.2** | **33.7** | **40.3** | **84.2** | **92.7** | **95.1** | **84.5** |

Bold values indicate the highest performance for each dataset. Relative parameter counts in parentheses are with respect to ChatGPT (175B). "–" indicates that the result was not reported.
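As a quick arithmetic check, the ChatDoctor F1 column can be recomputed as the harmonic mean of the reported precision and recall. The sketch below uses the values from the table above; the function name `f1` is illustrative, not from the paper.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)

# ChatDoctor (precision, recall) pairs as reported in Table 3
rows = {
    "ChatGPT": (4.7, 5.6),
    "GPT-4": (15.6, 10.7),
    "DrugGPT": (50.2, 33.7),
}

for method, (p, r) in rows.items():
    # Each recomputed F1 matches the table to one decimal place
    print(f"{method}: F1 = {f1(p, r):.1f}")
```

Running this reproduces the tabulated F1 scores (5.1, 12.7, and 40.3), confirming the three ChatDoctor columns are internally consistent.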