Fig. 2: Bias in LLMs demonstrated quantitatively. | Communications Medicine

From: Unmasking and quantifying racial bias of large language models in medical report generation

This figure presents evidence of LLM bias with respect to racial and ethnic groups. a GPT-3.5-turbo's projected hospitalization duration comparisons across racial and ethnic groups. b GPT-3.5-turbo's projected cost comparisons across racial and ethnic groups. c GPT-4's projected hospitalization duration comparisons across racial and ethnic groups. d GPT-4's projected cost comparisons across racial and ethnic groups. ***, **, and * denote P < 0.001, P < 0.05, and P ≥ 0.05, respectively, in two-sided z-tests. For a–d, n = 4000 generated responses for each pairwise comparison. The exact P values can be found in Supplementary Data 1.
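The caption reports two-sided z-tests over n = 4000 generated responses per comparison. The sketch below shows one way such a test could be computed. It is not the authors' analysis code. It assumes that each comparison contrasts the mean projected value (for example, hospitalization days) of two independent groups with a large-sample two-sample z-test. The caption does not say this, so treat it as an assumption. The function names `two_sided_z_test` and `significance_marker` are hypothetical, and the marker thresholds mirror those stated in the caption.

```python
import math
import statistics


def two_sided_z_test(group_a, group_b):
    """Large-sample two-sample z-test for a difference in means (assumed design).

    Returns (z, p), where p is the two-sided p-value under a standard normal
    null distribution. Assumes independent samples with at least two values each.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    # Standard error of the difference in means, from the sample variances.
    se = math.sqrt(statistics.variance(group_a) / n_a
                   + statistics.variance(group_b) / n_b)
    z = (mean_a - mean_b) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def significance_marker(p):
    """Map a p-value to the markers defined in the Fig. 2 caption."""
    if p < 0.001:
        return "***"
    if p < 0.05:
        return "**"
    return "*"


if __name__ == "__main__":
    # Toy, made-up projected hospital stays (days); NOT data from the study.
    group_a = [5.0, 6.0, 7.0, 6.0]
    group_b = [4.0, 5.0, 4.0, 5.0]
    z, p = two_sided_z_test(group_a, group_b)
    print(f"z = {z:.2f}, p = {p:.4f}, marker = {significance_marker(p)}")
```

Running the script prints `z = 3.00, p = 0.0027, marker = **`. The toy samples are tiny only so that the arithmetic can be checked by hand. A z-test relies on a normal approximation, which is appropriate for large samples such as the 4000 responses per comparison reported in the caption.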
