Fig. 1: Med-PaLM 2 performance on MultiMedQA. | Nature Medicine

From: Toward expert-level medical question answering with large language models

a, Med-PaLM 2 achieved an accuracy of 86.5% on USMLE-style questions in the MedQA dataset. The shaded region highlights the reported performance of models developed after Med-PaLM 2. b, In a pairwise ranking study on n = 1,066 consumer medical questions, Med-PaLM 2 answers were preferred over physician answers by a panel of physicians across eight of nine axes in our evaluation framework. Stacked bars represent proportions of answers for which physician raters preferred Med-PaLM 2 answers (orange), answers generated by other physicians (blue) or ties (light blue). Error bars reflect 95% confidence intervals of the overall preference rates for physician and Med-PaLM 2 answers, as determined by clustered bootstrapping computed over all 1,066 paired ratings.
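The clustered bootstrapping mentioned for the error bars in panel b can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each rating is stored as a (cluster, preference) pair, that clusters correspond to questions (the caption does not say what the clustering unit is), and that a percentile interval is used. The function name `clustered_bootstrap_ci` and the labels "model", "physician" and "tie" are hypothetical.

```python
import random


def clustered_bootstrap_ci(ratings, target, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the share of ratings equal to `target`.

    ratings: list of (cluster_id, preference) pairs. The cluster unit is an
    assumption here (for example, one cluster per question).
    Whole clusters are resampled with replacement, so ratings that share a
    cluster always stay together in a resample.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")

    # Group preferences by cluster.
    clusters = {}
    for cluster_id, preference in ratings:
        clusters.setdefault(cluster_id, []).append(preference)
    cluster_ids = list(clusters)

    rng = random.Random(seed)
    rates = []
    for _ in range(n_resamples):
        # Draw as many clusters as there are in the data, with replacement.
        drawn = [rng.choice(cluster_ids) for _ in cluster_ids]
        prefs = [p for cid in drawn for p in clusters[cid]]
        rates.append(sum(p == target for p in prefs) / len(prefs))

    # Percentile interval from the sorted resampled rates.
    rates.sort()
    lower = rates[int((alpha / 2) * n_resamples)]
    upper = rates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

Calling `clustered_bootstrap_ci(ratings, "model")` and `clustered_bootstrap_ci(ratings, "physician")` on the same ratings would give one interval per answer source, analogous to the two sets of error bars the caption describes. Resampling clusters rather than individual ratings keeps correlated ratings together, which typically widens the interval compared with a naive bootstrap.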
