Fig. 4: Resolvability of dialect prejudice.

From: AI generates covertly racist decisions about people based on their dialect

a, Language modelling perplexity and stereotype strength on AAE text as a function of model size. Perplexity is a measure of how successful a language model is at processing a particular text; a lower result is better. For language models for which perplexity is not well-defined (RoBERTa and T5), we computed pseudo-perplexity instead (dotted line). Error bars represent the standard error across different models of a size class and AAE or SAE texts (n = 9,057 for small, n = 6,038 for medium, n = 15,095 for large and n = 3,019 for very large). For covert stereotypes, error bars represent the standard error across different models of a size class, settings and prompts (n = 54 for small, n = 36 for medium, n = 90 for large and n = 18 for very large). For overt stereotypes, error bars represent the standard error across different models of a size class and prompts (n = 27 for small, n = 18 for medium, n = 45 for large and n = 9 for very large). Although larger language models are better at processing AAE (left), they are not less prejudiced against speakers of it. Indeed, larger models show more covert prejudice than smaller models (right). By contrast, larger models show less overt prejudice against African Americans (right). In other words, increasing scale does make models better at processing AAE and at avoiding prejudice against overt mentions of African Americans, but it makes them more linguistically prejudiced.

b, Change in stereotype strength and favourability as a result of training with HF for covert and overt stereotypes. Error bars represent the standard error across different prompts (n = 9). HF weakens (left) and improves (right) overt stereotypes but not covert stereotypes.

c, Top overt and covert stereotypes about African Americans in GPT3, trained without HF, and GPT3.5, trained with HF. Colour coding as positive (green) and negative (red) is based on ref. 34. The overt stereotypes get substantially more positive as a result of HF training in GPT3.5, but there is no visible change in favourability for the covert stereotypes.
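Panel a uses perplexity for autoregressive models and pseudo-perplexity for masked models such as RoBERTa, for which ordinary perplexity is not well-defined. The sketch below illustrates one common way to compute both quantities with the Hugging Face transformers library; it is not the paper's own code, and the model names (gpt2, roberta-base) and the example text are placeholders.

import math
import torch
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

def perplexity(text, model_name="gpt2"):
    # Standard perplexity: exponential of the mean negative log-likelihood per token.
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood over tokens
    return math.exp(loss.item())

def pseudo_perplexity(text, model_name="roberta-base"):
    # Pseudo-perplexity: mask each token in turn, score it with the masked model,
    # and exponentiate the average negative log-likelihood.
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name).eval()
    ids = tok(text, return_tensors="pt").input_ids[0]
    nlls = []
    with torch.no_grad():
        for i in range(1, len(ids) - 1):  # skip the special start/end tokens
            masked = ids.clone()
            masked[i] = tok.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            nlls.append(-torch.log_softmax(logits, dim=-1)[ids[i]].item())
    return math.exp(sum(nlls) / len(nlls))

print(perplexity("He be workin hard every day."))
print(pseudo_perplexity("He be workin hard every day."))

Lower values of either metric indicate that the model assigns higher probability to the text, which is the sense in which the left panel shows larger models processing AAE more successfully.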
