Table 3 Reasoning analysis: evaluations in the zero-shot learning setting to examine whether the LLMs could reason about drugs using learned knowledge
| Method | No. of parameters | USMLE (accuracy) | MedMCQA (accuracy) | MMLU (accuracy) | ChatDoctor (precision) | ChatDoctor (recall) | ChatDoctor (F1) | ADE (accuracy) | Drug_Effects (accuracy) | DDI (accuracy) | PubMedQA (accuracy) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Med-PaLM-2 (ref. 6) | 340B (1.9×) | 79.7 | 71.3 | − | − | − | − | − | − | − | 79.2 |
| ChatGPT (ref. 1) | 175B (1×) | 55.8 | 63.5 | 71.4 | 4.7 | 5.6 | 5.1 | 45.2 | 39.8 | 42.8 | 64.1 |
| GPT-4 (ref. 2) | >1T (>5.7×) | 80.2 | 76.6 | 84.4 | 15.6 | 10.7 | 12.7 | 55.5 | 47.8 | 58.5 | 74.5 |
| DrugGPT (current work) | 175B (1×) | 82.7 | 80.2 | 85.6 | 50.2 | 33.7 | 40.3 | 84.2 | 92.7 | 95.1 | 84.5 |