Fig. 3: Ranking comparison of long-form answers. | Nature Medicine

From: Toward expert-level medical question answering with large language models

Med-PaLM 2 answers are consistently preferred over Med-PaLM answers by physician raters across all rating dimensions, in both the MultiMedQA (a) and adversarial (b) question sets. Stacked bars show the proportions of answers for which physician raters preferred Med-PaLM 2 answers (orange), Med-PaLM answers (green) or neither (ties, light blue). Error bars show 95% confidence intervals estimated by bootstrapping, centered on the preference rates for Med-PaLM 2 and Med-PaLM, respectively, across n = 1,066 paired ratings. Detailed breakdowns for adversarial questions are presented in Supplementary Table 4.
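The caption states only that the error bars come from bootstrapping. It does not specify the number of resamples, the interval method, or whether resampling is done by question or by rater. The snippet below is therefore a minimal sketch under assumptions, not the authors' procedure. It applies a percentile bootstrap over paired preference labels. The labels "A" (Med-PaLM 2 preferred), "B" (Med-PaLM preferred) and "tie", the 1,000 resamples, and the 60/25/15 counts are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def preference_rates(labels):
    """Fraction of paired ratings preferring each option ('A', 'B', or 'tie')."""
    counts = Counter(labels)
    n = len(labels)
    return {k: counts.get(k, 0) / n for k in ("A", "B", "tie")}


def bootstrap_ci(labels, option, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the rate at which `option` is preferred.

    Assumption: each paired rating is resampled independently with replacement.
    The paper's actual resampling unit and settings are not given in the caption.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        # Draw n ratings with replacement and recompute the preference rate.
        sample = [labels[rng.randrange(n)] for _ in range(n)]
        stats.append(sample.count(option) / n)
    stats.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the bootstrap rates.
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical paired ratings: 60 prefer A, 25 prefer B, 15 ties.
labels = ["A"] * 60 + ["B"] * 25 + ["tie"] * 15
rates = preference_rates(labels)
ci_a = bootstrap_ci(labels, "A")
ci_b = bootstrap_ci(labels, "B")
print(rates, ci_a, ci_b)
```

The bar height is the observed preference rate. A percentile interval is not forced to be symmetric around it, so the error bar can extend further on one side than on the other.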