Table 2 Performance comparison across additional race categories

From: Mitigating the risk of health inequity exacerbated by large language models

| Model | Middle Eastern | Indigenous | African American | South Asian | East Asian |
| --- | --- | --- | --- | --- | --- |
| LLaMA3 8B w/ EquityGuard | 70.0 ± 0.6% | 69.9 ± 0.7% | 70.2 ± 0.5% | 70.2 ± 0.6% | 70.3 ± 0.5% |
| LLaMA3 8B w/o EquityGuard | 68.7 ± 0.8% | 68.6 ± 0.8% | 69.2 ± 0.7% | 68.9 ± 0.8% | 69.2 ± 0.7% |
| Mistral v0.3 w/ EquityGuard | 70.3 ± 0.6% | 70.4 ± 0.5% | 70.4 ± 0.6% | 70.6 ± 0.5% | 70.6 ± 0.5% |
| Mistral v0.3 w/o EquityGuard | 68.9 ± 0.7% | 68.9 ± 0.8% | 69.2 ± 0.7% | 69.3 ± 0.8% | 69.5 ± 0.7% |
| GPT-4 | 71.1 ± 0.5% | 71.1 ± 0.5% | 74.1 ± 0.6% | 73.1 ± 0.6% | 71.2 ± 0.5% |

  1. EquityGuard reduces variability in NDCG@10 scores among diverse racial groups. Values are reported as mean ± standard deviation over five runs.
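
The variability claim in the note can be checked against the reported means. The following is a minimal sketch, assuming that "variability" is read as the spread (maximum minus minimum) of the per-group mean NDCG@10 scores; the source does not state which measure the authors used, and the names `TABLE_2_MEANS` and `group_spread` are hypothetical. The sketch ignores the reported standard deviations.

```python
# Mean NDCG@10 (%) per racial group, copied from Table 2.
# Column order: Middle Eastern, Indigenous, African American,
# South Asian, East Asian.
TABLE_2_MEANS = {
    "LLaMA3 8B w/ EquityGuard": [70.0, 69.9, 70.2, 70.2, 70.3],
    "LLaMA3 8B w/o EquityGuard": [68.7, 68.6, 69.2, 68.9, 69.2],
    "Mistral v0.3 w/ EquityGuard": [70.3, 70.4, 70.4, 70.6, 70.6],
    "Mistral v0.3 w/o EquityGuard": [68.9, 68.9, 69.2, 69.3, 69.5],
    "GPT-4": [71.1, 71.1, 74.1, 73.1, 71.2],
}


def group_spread(means):
    """Return max - min of the per-group means, rounded to one decimal.

    This is an assumed proxy for cross-group variability, not
    necessarily the measure used in the paper.
    """
    return round(max(means) - min(means), 1)


# Spread in percentage points for each model row of the table.
spreads = {model: group_spread(means) for model, means in TABLE_2_MEANS.items()}

for model, spread in spreads.items():
    print(f"{model}: {spread:.1f} percentage points")
```

Under this assumed measure, the spread narrows from 0.6 to 0.4 percentage points for LLaMA3 8B and from 0.6 to 0.3 for Mistral v0.3 when EquityGuard is applied, while GPT-4 shows a spread of 3.0.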