Fig. 4: Results of counterfactual and independent evaluation on counterfactual datasets.
From: A toolbox for surfacing health equity harms and biases in large language models

In the top four rows, we report the rates at which raters reported bias in counterfactual pairs using the proposed counterfactual rubric, as well as the rates at which they reported bias in one, one or more, or both of the answers using the independent evaluation rubric, for the CC-Manual (n = 102 pairs, triple replication) and CC-LLM (n = 200 pairs) datasets. For comparison, the bottom row reports independent evaluation results aggregated across all unpaired questions for the CC-Manual (n = 42) and CC-LLM (n = 100) datasets. Data are reported as proportions with 95% CIs.