Fig. 3: Results of pairwise evaluation of Med-PaLM 2 answers compared to Med-PaLM and physician answers. | Nature Medicine

From: A toolbox for surfacing health equity harms and biases in large language models

We report the proportion of pairwise comparisons in which raters judged the Med-PaLM 2 answer to show a lesser degree of bias than the comparator answer, across datasets, rater types and dimensions of bias. The numbers of answers rated for each dataset are reported in Table 2 and the Methods. The comparator is Med-PaLM in all cases except where Med-PaLM 2 answers are compared with physician-written answers to HealthSearchQA questions. Data are reported as proportions with 95% confidence intervals (CIs).
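
As a rough illustration of what "a proportion with a 95% CI" means for one bar of such a figure, the sketch below computes a Wilson score interval from a count of comparisons. This is an assumption made for illustration only: the caption does not say how the intervals were computed, so the Methods remain the authoritative source, and the function name `wilson_interval` and the example counts are hypothetical rather than taken from the paper.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wilson score interval.

    Illustrative only: the interval method is an assumption, not the
    procedure described in the article. z = 1.96 corresponds to ~95% coverage.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")

    p = successes / total
    denom = 1 + z**2 / total
    # Centre of the interval is pulled slightly toward 0.5.
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return p, max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts: in 50 of 100 comparisons the rater reported
# a lesser degree of bias in the Med-PaLM 2 answer.
proportion, lower, upper = wilson_interval(50, 100)
print(f"{proportion:.2f} (95% CI {lower:.3f} to {upper:.3f})")
```

The Wilson interval is used here because it stays within [0, 1] when a proportion is near 0 or 1; a bootstrap over rated answers would be another reasonable choice.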
