Abstract
The rapid deployment of generative language models has raised concerns about social biases affecting the well-being of diverse consumers. The extant literature on generative language models has primarily examined bias via explicit identity prompting. However, prior research on bias in language-based technology platforms has shown that discrimination can occur even when identity terms are not specified explicitly. Here, we advance studies of generative language model bias by considering a broader set of natural use cases via open-ended prompting, which we refer to as a laissez-faire environment. In this setting, we find that across 500,000 observations, generated outputs from the base models of five publicly available language models (ChatGPT 3.5, ChatGPT 4, Claude 2.0, Llama 2, and PaLM 2) are more likely to omit characters with minoritized race, gender, and/or sexual orientation identities compared to reported levels in the U.S. Census, or relegate them to subordinated roles as opposed to dominant ones. We also document patterns of stereotyping across language model–generated outputs with the potential to disproportionately affect minoritized individuals. Our findings highlight the urgent need for regulations to ensure responsible innovation while protecting consumers from potential harms caused by language models.
Introduction
The widespread deployment of generative language models (LMs)—algorithmic computer systems that generate text in response to various inputs, including chat—is raising concerns about societal impacts1. Despite this, they are gaining momentum as tools for social engagement and are expected to transform major segments of industry2. In education, LMs are being adopted in a growing number of settings, many of which include unmediated interactions with students3,4. Khan Academy (with over 100 million estimated consumers) launched Khanmigo in March 2023, a ChatGPT4-powered super tutor promising to bring one-on-one tutoring to students as a writing assistant, academic coach, and guidance counselor5. In June 2023, the California Teachers Association called for educators to embrace LMs for use cases ranging from tutoring to co-writing with students6; meanwhile, GPT-simulated students are being used to train novice teachers to reduce the risk of negatively impacting actual students7. Corresponding with usage spikes at the start of the following school year, OpenAI released a teacher guide in August8 and signed a partnership with Arizona State University in January 2024 to use ChatGPT as a personal tutor for subjects such as freshman writing composition9.
The rapid adoption of LMs in unmediated interactions with consumers is not limited to students. For example, due in part to rising loneliness among the U.S. public, a range of new LM-based products have entered the artificial intimacy industry10. The field of grief tech offers experiences for consumers to digitally engage with loved ones post-mortem via voice and text generated by LMs11. However, as labor movements responding to the threat of automation have observed, there is currently a lack of protection for both workers and consumers from the negative impacts of LMs in personal settings12. In an illustrative example, the National Eating Disorders Association replaced its human-staffed helpline in March 2023 with a fully automated chatbot built on a generative LM. When asked about how to support those with eating disorders, the model encouraged patients to take responsibility for healthy eating at a caloric deficit, ableist advice that is known to worsen the condition of individuals with eating disorders13.
A rising number of published studies of LM bias have emerged in different sectors, including journalism, medicine, education, and human resources14,15,16,17,18. However, few specifically interrogate the potential for LMs to reproduce and amplify societal bias with direct exposure to diverse end-users19,20,21,22. This study addresses this gap by investigating how the base models of five publicly available LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) respond to open-ended writing prompts covering three domains of life set in the United States: classroom interactions (Learning), the workplace (Labor), and interpersonal relationships (Love). We analyze the resulting responses for textual cues shown to exacerbate potential harms for minoritized individuals by race, gender, and sexual orientation23,24. Notably, we define harm as “… the impairment, or setback, of a person, entity, or society’s interests. People or entities suffer harm if they are in worse shape than they would be had the activity not occurred”25. We employ this definition as it acknowledges the ways in which algorithms arbitrarily and discriminatorily affect people’s lives with or without their awareness26.
This study advances the algorithmic bias literature in multiple ways, building upon prior intersectional approaches15,27,28 and advancing our understanding of sociotechnical harms emerging from algorithmic systems29,30. The extant studies of bias in generative LMs, including attempted self-audits by LM developers, are limited in scope and context, examining a handful of race/ethnicity categories (e.g., Black, White, or Asian), binary gender categorizations (Woman, Man), and one or two LMs31,32,33,34,35,36,37,38. The most widely adopted methodologies utilize what we term explicit identity prompting, where studies probe LMs using prompt templates that directly enumerate identity categories, e.g., “The Black woman works as a …”31,32. While these approaches are valuable for assessing stereotypical associations encoded by LMs32, they fail to capture a wider range of everyday scenarios where consumers need not explicitly specify identity terms to encounter bias. Examples of this include discrimination against distinctively African-American names in hiring17,39 and search engine results19,40. Our study builds on recent approaches that account for this broader set of natural uses with open-ended prompting33, where we analyze how LMs respond to prompts that do not rely on the usage of explicit identity terms (including for race, gender, or sexual orientation).
Furthermore, existing measures of bias for open-ended prompting have not been grounded in end-consumer contexts41,42 and have primarily focused on explicit biases in generative AI outputs. Some examples include methods that either rely on bias scores that consolidate multiple races34 or measures that use automated sentiment analysis or toxicity detection to approximate potential harms to humans33. Studies considering implicit biases remain limited. Given that modern generative LMs have become better at masking explicit biases via increased model safety guardrails and reinforcement learning from human feedback43, the algorithmic bias research landscape is shifting to a focus on covert forms of bias44,45. Existing studies of algorithmic bias are also limited in their consideration of multidimensional proxies of race46, variations across races47, and other issues associated with small-N populations48. These approaches reinforce framings that exclude members of the most minoritized communities from being considered valid or worthy of study, reinforcing their erasure in the scholarly discourse and perpetuating their minoritization in application.
To address these gaps, this study applies the theoretical framework of intersectionality49 to model algorithmic bias by inspecting structures of power embedded in language50. This framework offers several contributions to the LM and algorithmic bias literature. By employing an intersectional lens, we examine the societal reproduction of unjust systems of power within generative LM outputs51,52. This theoretical grounding allows for the examination of interconnected systems of power— what Collins refers to as the “matrix of domination”—and the potential for these outputs to advantage or disadvantage particular, often intersecting, socially constructed identities53. Specifically, we identify patterns of omission, subordination, and stereotyping, and examine the extent to which these models perpetuate biased narratives for minoritized intersectional subgroups, including small-N populations by race, gender and sexual orientation. We then analyze LM-generated texts for identity cues that have been shown to activate cognitive stereotyping54, including biased associations by names and pronouns23,24. Multiple studies investigate these potential psychosocial harms, such as increased negative self-perception55, prejudices about other identity groups56, and stereotype threat (which decreases cognitive performance in many settings, including academics54). These are frequently described in related literature as representational harms in that they portray certain social identity groups in a negative or subordinated manner57,58, shaping societal views about individuals based on group assumptions59. Representational harms from generative LMs are therefore not limited to the scope of individual experiences. Rather, they are inextricable from systems that amplify societal inequities and unevenly reflect the resulting biases (e.g., from training data, algorithms, and composition of the artificial intelligence (AI) workforce60) back to consumers who inhabit intersectional, minoritized identities19,47,61. To that end, we pose the following research question: To what extent does open-ended prompting of generative language models result in biased outputs against minoritized race, gender and sexual orientation identities?
In this work, we identify patterns of omission, subordination, and stereotyping against every minoritized identity group included in our study. Our analysis allows for a critical examination of the ways in which implicit AI bias may result in downstream potential harms62,63. Specifically, this study extends existing algorithmic bias frameworks characterizing representational harms41,58,64 to include an investigation of what we term Laissez-Faire harms (from laissez-faire, to let people do as they choose), where (1) the LMs freely respond to open-ended prompts, (2) prompts correspond to unmediated consumer interactions (e.g., creative writing65) rather than probing for bias, and (3) market actors (i.e., companies) are free to develop and deploy such systems without government intervention. By extending the discussion of representational harms into the social sphere, we reframe these harms from a public policy lens and therefore redefine them as Laissez-Faire harms to account for their broad societal impacts. This phrasing was motivated by the rapid deployment of generative AI tools as broad public-facing interfaces, coupled with the limited set of regulations and human-rights protections to guide this expansion. While we do not directly examine human exposure to LM outputs, we believe our study plays a key role in advancing the field’s knowledge of implicit LM biases by analyzing text responses generated from open-ended prompts that are free of explicit race/ethnicity, gender, and sexuality-specific identity signals.
Results
The results reflect our analysis of 500,000 outputs generated by the base models of five publicly available generative language models: ChatGPT 3.5 and ChatGPT 4 (developed by OpenAI), Llama 2 (Meta), PaLM 2 (Google), and Claude 2.0 (Anthropic). We query these LMs with 100 unique open-ended prompts spanning three core domains of social life situated within the context of the United States: Learning (i.e., student interactions across K-12 academic subjects), Labor (i.e., workplace interactions across occupations from the U.S. Bureau of Labor Statistics), and Love (i.e., interpersonal interactions between romantic partners, friends, and siblings). In total, we analyze 50 domain-specific prompt scenarios: 15 for Learning, 15 for Labor, and 20 for Love (see Table 1 for examples), under both the power-neutral and power-laden conditions (i.e., in which there is a dominant and a subordinate character). This generated a total of 100,000 stories per model (1000 for each of the 100 unique prompts), collected using the default parameters configured for consumer access over a period of twelve weeks.
Each domain is then examined through the lens of intersectionality (see Supplementary Methods A), which describes how power is embedded in both social discourse and language28,50. Although our prompts involve two characters at most, we observe responses from all five LMs that reproduce broader structures of inequality codified through textual cues for race and gender (see Section “Textual Identity Proxies and Psychosocial Impacts”). Based on model-generated names, we model seven categories of racialization drawn from the 2030 OMB-approved U.S. Census classifications66: American Indian or Alaska Native (AI/AN), Native Hawaiian or Pacific Islander (NH/PI), Middle Eastern or North African (MENA), Hispanic or Latino (we adopt Latine as a gender-neutral label), Asian, African-American or Black, and White. We model three gender classifications based on model-generated pronouns, titles, and gendered references: feminized (F), masculinized (M), and non-binary (NB). From these gender classifications, we infer sexual orientation from the six unique gender pairs (NB-NB, NB-F, NB-M, F-F, F-M, M-M; see Section “Modeling Gender, Sexual Orientation, and Race” for a detailed explanation of race and gender assignation). In all, we identify patterns of omission, subordination, and stereotyping that perpetuate biased narratives for minoritized intersectional subgroups, including small-N populations by race, gender, and sexual orientation.
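To make the assignment procedure concrete, the sketch below illustrates one way such identity proxies could be derived from generated text; the gendered word lists and name-to-race likelihoods are illustrative placeholders rather than the study's actual mappings (Table S9 and the name datasets described in the Methods).

```python
# Illustrative sketch only: deriving gender, race, and gender-pair proxies from
# a generated story. Word lists and name-race likelihoods are placeholders, not
# the study's actual mappings.
import re
from collections import Counter

FEMINIZED = {"she", "her", "hers", "ms", "mrs", "woman", "girl"}
MASCULINIZED = {"he", "him", "his", "mr", "man", "boy"}
NONBINARY = {"they", "them", "theirs", "non-binary", "nonbinary"}

# Hypothetical fractional race likelihoods for a few names (each sums to 1).
NAME_RACE = {
    "maria": {"Latine": 0.72, "White": 0.20, "Asian": 0.08},
    "john": {"White": 0.88, "Black": 0.07, "Latine": 0.05},
}

def infer_gender(character_text: str) -> str:
    """Assign F, M, or NB based on which gendered references dominate."""
    tokens = re.findall(r"[a-z-]+", character_text.lower())
    counts = Counter()
    for tok in tokens:
        if tok in FEMINIZED:
            counts["F"] += 1
        elif tok in MASCULINIZED:
            counts["M"] += 1
        elif tok in NONBINARY:
            counts["NB"] += 1
    return counts.most_common(1)[0][0] if counts else "Unknown"

def race_likelihoods(name: str) -> dict:
    """Return the fractional race likelihoods associated with a given name."""
    return NAME_RACE.get(name.lower(), {})

def gender_pair(gender_a: str, gender_b: str) -> str:
    """Order-insensitive gender pair (e.g., F-M) used to infer orientation."""
    return "-".join(sorted([gender_a, gender_b]))

print(infer_gender("Maria thanked her mentor."))         # F
print(infer_gender("John said he would help."))          # M
print(race_likelihoods("Maria"), gender_pair("M", "F"))  # likelihoods, 'F-M'
```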
Patterns of Omission
The first pattern we identify is that of omission. To quantify it, we begin by restricting our analysis to power-neutral prompt responses and measuring statistical deviations from the U.S. Census. For a given demographic, we define the representation ratio as the proportion p of characters with the observed demographic divided by the proportion p* of that demographic in a comparison distribution:

$$R_{\mathrm{rep}} = \frac{p}{p^{*}}$$
Here, a demographic characteristic could be any combination of race, gender, and/or sexuality. We compute gender and sexuality proportions directly from gender reference mappings (see Table S9), and we model race using fractional counting (Eq. 1), in which each character contributes its name-based racial likelihoods to the race totals rather than a single hard-assigned label.
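As a rough illustration (not the authors' exact pipeline), the representation ratio with fractionally counted race proportions could be computed as follows; the character records and Census proportions are invented placeholders.

```python
# Sketch of R_rep = p / p*, with race proportions estimated by fractional
# counting over name-based racial likelihoods. All values are placeholders.
characters = [
    {"name": "Sarah",   "race_probs": {"White": 0.83, "Black": 0.09, "Latine": 0.08}},
    {"name": "Maria",   "race_probs": {"Latine": 0.72, "White": 0.20, "Asian": 0.08}},
    {"name": "Hiroshi", "race_probs": {"Asian": 0.67, "White": 0.25, "Latine": 0.08}},
]

# Comparison distribution p* (illustrative stand-in for Census proportions).
census = {"White": 0.60, "Latine": 0.19, "Black": 0.13, "Asian": 0.06}

def representation_ratios(characters, census):
    """Fractionally count each race across characters, then divide p by p*."""
    totals = {race: 0.0 for race in census}
    for character in characters:
        for race, prob in character["race_probs"].items():
            if race in totals:
                totals[race] += prob
    n = len(characters)
    return {race: (totals[race] / n) / census[race] for race in census}

print(representation_ratios(characters, census))
```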
These ratios allow us to understand the degree to which texts from LMs correlate with or amplify the underrepresentation of minoritized groups beyond known patterns. Figure 1ai shows that White characters are the most represented across all domains (i.e., Learning, Labor, and Love) and models, from 71.0% (Learning, ChatGPT3.5) to 84.1% (Love, PaLM2). The second most represented racial group only reaches a 13.2% likelihood (Latine, Love, Claude2.0). Examining the distribution within domain-model combinations (horizontal rows in Fig. 1ai), the ranked order of representation by race is typically White, Latine, and Black (aside from a few exceptions that invert Black and Latine representations), with Asian represented in fourth place in all instances.
Fig. 1: a, b show overall likelihoods by race, sexual orientation, and gender inferred from LM-generated text in response to power-neutral prompts, categorized by model and domain. Bluer colors represent greater degrees of omission, and redder colors represent greater degrees of over-representation in comparison to the U.S. Census, with the exception of MENA, which is approximated by an auxiliary dataset (see Section “Modeling Gender, Sexual Orientation, and Race”). All colors except gray refer to cells with p < .001 (two-tailed, computed using the Wilson score interval). We summarize median representation ratios in (aii, b). We focus on especially omitted groups in (c, d) with log-scale histograms of names by racial likelihood in the LM-generated texts. Exact Rrep ratios, p-values, confidence intervals, and effect sizes (Cohen’s d) are provided in Table S13a–d.
While the rank order aligns with the representation in the U.S. Census, proportional representation is not observed. Compared to the U.S. Census, median representation for racially minoritized characters (Fig. 1aii) ranges from ratios of 0.22 (MENA, Labor, v = 57247, p < 0.001, d = −1.337, 95% CI [0.198, 0.237]) to 0.66 (NH/PI, Labor, v = 57247, p < 0.001, d = −0.249, 95% CI [0.585, 0.800]), while White characters are over-represented at median ratios ranging from over 1.25 in Learning (v = 71870, p < 0.001, d = 1.04, 95% CI [1.238, 1.262]) to 1.34 in Labor (v = 57247, p < 0.001, d = 1.233, 95% CI [1.324, 1.351]). This means that names reflecting any minoritized race are 33% (i.e., NH/PI, Labor) to 78% (i.e., MENA, Labor) less likely to appear in LM-generated stories, while White names are up to 34% more likely to appear, relative to their representation in the U.S. Census. Meanwhile, gender representation is predominantly binary, skewing towards more feminized character representation, particularly for students in the Learning domain (except for ChatGPT 4, which skews masculinized).
Concerning gender, characters with non-binary pronouns are represented less than 0.5% of the time in all models except ChatGPT3.5 (3.9% in Learning). Binary gender representation ratios skew slightly feminine for all domains (Rrep = 1.07, v = 193370, p < 0.001, d = 0.023, 95% CI [1.058, 1.089]), whereas non-binary genders are underrepresented by an order of magnitude compared to Census levels (Rrep = 0.10, v = 193370, p < 0.001, d = −0.119, 95% CI [0.065, 0.148], see Fig. 1aii). Non-heterosexual romantic relationships are similarly underrepresented and are depicted in less than 3% of generated stories, with median representation ratios ranging from 0.04 (NB-NB, v = 35587, p < 0.001, d = −0.380, 95% CI [0.008, 0.241]) to 0.28 (F-F, v = 35587, p < 0.001, d = −0.181, 95% CI [0.214, 0.364], see Fig. 1b). Therefore, we find that all five generative LMs exacerbate patterns of omission for minoritized identity groups beyond population-level differences in race, gender, and sexual orientation (with p-values of <0.001 across nearly every combination of model and domain). That is, we observe far fewer mentions of these identity groups than we would expect given their representation in the population.
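One way to operationalize the significance checks reported above is to invert a Wilson score interval at the desired confidence level and ask whether the Census proportion falls outside it; the sketch below uses made-up counts, and the interval-based test is our reading of the procedure named in the figure caption.

```python
# Sketch: Wilson score interval for an observed character proportion, compared
# against a Census benchmark. Counts are placeholders, not study values.
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 3.2905):
    """Wilson score interval; z = 3.2905 corresponds to two-tailed p < .001."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half

observed, total, census_share = 710, 1000, 0.60  # placeholder counts
low, high = wilson_interval(observed, total)
print(f"99.9% Wilson interval: [{low:.3f}, {high:.3f}]")
print("Differs from Census at p < .001:", not (low <= census_share <= high))
```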
In Fig. 1c we illustrate additional patterns of omission specifically for NH/PI and AI/AN names, where we find little to no representation above a racial likelihood threshold of 24% (NH/PI) and 10% (AI/AN). Notably, this pattern of omission also holds for intersectional non-binary identities, where models broadly represent non-binary identified characters with predominantly White names (Fig. 1d). These baseline findings indicate that LMs broadly amplify the omission of minoritized groups in response to power-neutral prompts. The extent of this erasure exceeds expected values from the overall undercounting of minoritized groups in U.S. Census datasets67,68.
Patterns of Subordination
The representation of minoritized groups increases when power dynamics are added to the prompts, specifically with the introduction of a subordinate character. We find that race- and gender-minoritized characters appear predominantly in portrayals where they are seeking help or are powerless. We quantify their relative frequency using the subordination ratio (see Eq. 3), which we define as the proportion of a demographic observed in the subordinate role compared to the dominant role. Figure 2a displays overall subordination ratios at the intersection of race and gender.
Fig. 2: a shows subordination ratios across all domains and models, increasing from left to right. Ratios for each model are indicated by different symbols plotted on a log scale (circles refer to ChatGPT3.5, squares refer to ChatGPT4, plus symbols refer to Claude2, x symbols refer to Llama2, and triangles refer to PaLM2). Center lines indicate the median across all five models. Redder colors represent greater degrees of statistical confidence (calculated as two-tailed p-values for the binomial ratio distribution, with p < .05 shown in yellow, p < .01 shown in orange, p < .001 shown in red, and p > .05 shown in gray), compared against the null hypothesis (subordination ratio = 1, dotted). b shows the median subordination values across all five models by gender, race, and domain. Values above 1 indicate greater degrees of subordination, and values below 1 indicate greater degrees of dominance. Exact Rsub ratios, p-values, and confidence intervals are provided in Table S13e–m.
This approach allows us to focus on relative differences in the portrayal of characters when power-laden prompts are introduced. If the subordination ratio is less than 1, we observe dominance; if the subordination ratio is greater than 1, we observe subordination; and if the subordination ratio equals 1, the demographic is neutral (independent of power dynamics):

$$R_{\mathrm{sub}} = \frac{p_{\mathrm{subordinate}}}{p_{\mathrm{dominant}}}$$
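A minimal sketch of this ratio, with invented counts, is given below; the paper additionally imputes zero cells with Laplace smoothing for name-level ratios (Fig. 3a), which we note in a comment but do not reproduce here.

```python
# Sketch of the subordination ratio: the share of a demographic among
# subordinate characters divided by its share among dominant characters.
# Counts are placeholders; zero cells would need smoothing (see Fig. 3a).
def subordination_ratio(sub_count, dom_count, sub_total, dom_total):
    """R_sub > 1 indicates subordination; R_sub < 1 indicates dominance."""
    return (sub_count / sub_total) / (dom_count / dom_total)

# e.g., a group appearing 300 times among 1000 subordinate characters but only
# 100 times among 1000 dominant characters yields R_sub = 3.0 (subordinated).
print(subordination_ratio(300, 100, 1000, 1000))
```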
Overall, feminized characters are generally dominant in the Learning domain (i.e., subordination <1, meaning they are more likely to be portrayed as a “star student”). Notably, this relationship holds across all classroom subjects, including math, despite cultural stereotypes about math and gender (see Section “Textual Identity Proxies and Psychosocial Impacts”)69,70. This result is consistent with recent trends in U.S. higher education in which women obtain undergraduate degrees at significantly higher rates than their male counterparts71. However, feminized characters hold largely subordinated positions in the Labor domain (i.e., subordination >1; see Fig. 2a, b). White feminized characters are uniformly dominant in stories across all five models in Learning (Rsub = 0.25, v = 139149, p < 0.001, 95% CI [0.238, 0.262]), while White masculinized characters are uniformly dominant in Labor (Rsub = 0.69, v = 79754, p < 0.001, 95% CI [0.667, 0.710]). For Love, most models portray White feminized characters as dominant (Rsub = 0.73, v = 141411, p < 0.001, 95% CI [0.700, 0.752]), with the exception of PaLM 2 and ChatGPT 4. We observe that for any combination of domain and model, either a White feminized and/or a White masculinized character is dominant (p < .001). The same universal access to power is not afforded to characters of other racialized and gendered identities. Non-binary intersections across all races tend to appear as more subordinated; however, due to their omission, these results are not statistically significant (see Fig. 1d). Domain differences are also observed at the intersection of race and gender. For example, as shown in Fig. 2b, high degrees of subordination are observed for Asian women in Labor (Rsub = 3.75, v = 79754, p < 0.001, 95% CI [2.95, 4.78]) and, to a lesser extent, Love (Rsub = 2.18, v = 141411, p < 0.001, 95% CI [1.832, 2.594]), whereas they are dominant in Learning (Rsub = 0.45, v = 139149, p < 0.001, 95% CI [0.367, 0.541]). Conversely, Asian men are highly subordinated in Learning (Rsub = 7.70, v = 139149, p < 0.001, 95% CI [5.416, 10.96]) and moderately subordinated in Love (Rsub = 1.46, v = 141411, p = 0.004, 95% CI [1.132, 1.872]), whereas their subordination ratio in Labor is ambiguous (Rsub = 0.86, v = 79754, p = 0.562, 95% CI [0.496, 1.453]). Overall, the models reinforce dominant portrayals of women in educational settings and men in workplace settings.
Examining names that are increasingly likely to be associated with one race (measured using fractionalized counting—see Eq. 1) reveals a more fine-grained pattern (see Fig. 3). With few exceptions (e.g., PaLM2 tends to repeat a single high-likelihood Black name, Amari, as a star student in Learning), the models respond to greater degrees of racialization with greater degrees of subordination for all races except White, as shown in Fig. 3a, b (LMs do not produce high-likelihood racialized names for NH/PI and AI/AN, as shown in Fig. 1c, hence these two categories are missing from Fig. 3).
Fig. 3: a shows subordination ratios, increasing from left to right per plot, of unique given names across all LMs, by race for which likelihoods vary (models do not generate high likelihood NH/PI or AI/AN names, as shown in Fig. 1c). When a name has zero occurrences in either dominant or subordinated roles, we impute using Laplace smoothing. b plots overall subordination across all models above a racial likelihood threshold as a percentage from 0 to 100. c shows the median subordination ratio taken across all integer thresholds from 0 to 100, controlling for the effects of gender and categorized by domain, model, race, and gender (for non-binary characters, the models do not generate high likelihood racial names, as shown in Fig. 1d). Exact Rmrs ratios, p-values (two-tailed binomial ratio distribution), and confidence intervals are provided in Table S13n–p.
To quantify the extent to which subordination ratios vary across names for increasing degrees of racialization, we introduce the median racialized subordination ratio, which quantifies subordination across a range of possible racial thresholds. First, we control for possible confounding effects by conditioning on gender references (pronouns, titles, etc.). Then, for each intersection of race and gender, we compute the median of all subordination ratios for names above a variable likelihood threshold t as defined in Eq. 4. With sufficiently granular t, this statistic measures subordination while taking the spectrum of racial likelihoods into account. For our experiments, we set t ∈ [1, 2, …, 100].
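The threshold sweep can be sketched as follows; the per-name records are invented, the gender conditioning described above is omitted for brevity, and pooling counts above each threshold is our simplifying assumption about Eq. 4.

```python
# Sketch of the median racialized subordination ratio: for each threshold t,
# keep names whose racial likelihood exceeds t%, form a pooled subordination
# ratio over those names (Laplace-smoothed), then take the median over t.
# Name records are invented placeholders; gender conditioning is omitted.
from statistics import median

names = [
    # (racial likelihood in %, subordinate-role count, dominant-role count)
    (87, 120, 2), (73, 40, 5), (66, 25, 1), (45, 10, 8), (30, 6, 7),
]

def pooled_ratio(records, smooth=1):
    sub = sum(r[1] for r in records) + smooth
    dom = sum(r[2] for r in records) + smooth
    return sub / dom

def median_racialized_subordination(records, thresholds=range(1, 101)):
    ratios = []
    for t in thresholds:
        kept = [r for r in records if r[0] > t]
        if kept:  # skip thresholds that exclude every name
            ratios.append(pooled_ratio(kept))
    return median(ratios)

print(median_racialized_subordination(names))
```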
Figure 3c shows intersectional median racialized subordination ratios by race and gender. We find large median subordination ratios for every binary gender intersection of Asian, Black, Latine, and MENA characters across nearly all models and domains (for non-binary characters, LMs do not produce a significant number of high-likelihood racialized names for any race except White). In 86.67% of cases (i.e., 104 of 120 table cells), characters from minoritized races appeared more frequently in a subordinated role compared to a dominant role. By contrast, in 3% of all cases (i.e., 1 of 30 table cells), White masculinized or feminized characters appeared more frequently in a subordinated role compared to a dominant role. In Learning, Latine masculinized students are portrayed by Claude 2.0 in the median as 1308.6 times more likely to be subordinated (i.e., a struggling student) than dominant (i.e., a star student, Rmrs = 1308.6, v = 15908, p < 0.001, 95% CI [184.31, 9290.3]). Across models and domains, Asian feminized characters are subordinated by several orders of magnitude (Rmrs = 172.6 for ChatGPT 4 in Learning, v = 11044, p < 0.001, 95% CI [23.644, 1260.2]; Rmrs = 352.2 for Claude 2.0 in Labor, v = 8604, p < 0.001, 95% CI [49.475, 2507.7]; and Rmrs = 160.6 for PaLM 2 in Labor, v = 7925, p < 0.001, 95% CI [22.544, 1144.5]). Black and MENA masculinized characters are subordinated to a similar degree by PaLM 2 (Rmrs = 83.8 for Black masculinized characters in Love, v = 10853, p < 0.001, 95% CI [11.438, 613.25]; Rmrs = 350.7 for MENA masculinized characters in Labor, v = 4588, p < 0.001, 95% CI [48.938, 2513.5]).
To further illustrate levels of subordination, we provide counts for the most common highly racialized names across LMs by race, gender, domain, and power condition (baseline is power-neutral; dominant and subordinated are power-laden) (Table 2). Asian, Black, Latine, and MENA names are several orders of magnitude more likely to be subordinated when a power dynamic is introduced. By contrast, White names are several orders of magnitude more likely to appear in baseline and dominant roles than non-White names. In the Learning domain, Sarah (83.1% White) and John (88.0% White) appear 11,699 and 5915 times, respectively, in the baseline condition and 10,925 and 5239 times in the dominant condition. The next most common name, Maria (72.3% Latine), is a distant third, appearing just 550 times in the baseline condition and 364 times in the dominant condition.
In contrast, when it comes to subordinated roles, this dynamic is reversed. In Learning, Maria appears subordinated 13,580 times compared to 5939 for Sarah (a relative difference of 229%) and 3005 for John (a relative difference of 452%). Whereas Maria is significantly more likely to be portrayed as a struggling student than a star student, the opposite is true for Sarah and John. This reversal pattern of subordination extends to masculinized Latine, Black, MENA, and Asian names. For example, in the Learning domain, Juan (86.9% Latine) and Jamal (73.4% Black) are 184.41 and 5.28 times more likely to appear subordinated than in dominant portrayals, respectively. The most commonly occurring masculinized Asian and MENA names (i.e., Hiroshi, 66.7% Asian, and Ahmed, 71.2% MENA) do not appear in either baseline or dominant positions for Learning despite appearing frequently in subordinated roles. Of the most frequently occurring racially minoritized names, only two appear more frequently in dominant than subordinated roles: Amari (86.4% Black, 1251 stories) and Priya (68.2% Asian, 52 stories). However, both of these appearances are generated exclusively by PaLM 2 in the Learning domain. Whereas PaLM 2 portrays other Black characters as subordinated across all domains, it represents Asian feminized characters as dominant in the Learning domain. This breaks from the pattern of the other four LMs, which portray Asian characters as subordinated, reflecting variation in how LMs manifest model minority stereotypes. However, in Labor and Love, these exceptions disappear, and all of the most common minoritized characters are predominantly portrayed as subordinated. This pattern extends beyond the most common minoritized names (see Fig. 3a; we provide a larger sample of names in Tables S10 and S11(a–e)).
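Counts of this kind can be tabulated with a simple group-by over the labeled stories, as in the sketch below; the records are invented and stand in for the parsed story annotations behind Table 2.

```python
# Sketch: tallying name appearances by power condition (baseline, dominant,
# subordinated), in the spirit of Table 2. Story records are invented.
from collections import Counter, defaultdict

stories = [
    {"name": "Sarah", "condition": "baseline"},
    {"name": "Maria", "condition": "subordinated"},
    {"name": "Maria", "condition": "subordinated"},
    {"name": "John",  "condition": "dominant"},
]

counts = defaultdict(Counter)
for story in stories:
    counts[story["name"]][story["condition"]] += 1

for name, by_condition in sorted(counts.items()):
    print(name, dict(by_condition))
```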
Patterns of Stereotyping
To analyze patterns of stereotyping, we turn to the linguistic content of the LM-generated narratives. We start by sampling stories (Table 3) with the most common racialized names (shown in Table 2). For the most omitted identity groups (LGBTQ+ and Indigenous; see Fig. 1c, d), we search for additional textual cues beyond names and gender references, including broad descriptors (e.g., Native American, transgender) and specific country/Native nation names and sexualities (e.g., Samoa, Muscogee, pansexual). We find representations of these terms to be low overall and entirely non-existent for most Native/Pacific Islander nations and sexualities. Sample stories in which these identity proxies do appear can be found in Table 4, and additionally in Table S12e–h. Qualitative coding identified frequently occurring linguistic patterns and stereotypes (see Section “Qualitative Coding for Explicit Stereotype Analysis”). Table 3a–d depicts representative stories for the most frequently occurring highly racialized names by identity group.
We find evidence of widespread cultural stereotyping across groups in addition to stereotypes that are group-specific. To some degree, these stereotypes provide a linguistic explanation for the high rates of subordination discussed in Section “Patterns of Subordination”.
The most frequent stereotype affecting MENA, Asian, and Latine characters is that of the perpetual foreigner72, which the LMs rhetorically employ to portray the subordination of these characters due to differences in culture, language, and/or disposition. Claude 2.0’s Maria is described as a student who just moved from Mexico, ChatGPT 4’s Ahmed is a foreign student from Cairo (in Egypt), and PaLM 2’s Priya is a new employee from India (Table 3a–c). All three characters face barriers that the texts attribute to their international background. Maria and Ahmed struggle with language barriers, and Priya has to learn how to “adjust to the American work culture”. Each character is also assigned additional character traits that map onto group-specific racial stereotypes. Maria is described using terms associated with a lack of intelligence and as someone who struggles to learn Spanish, despite it being her native language. This type of characterization reproduces negative stereotypes of Latina students as low-achieving (which is also reinforced strongly with masculinized Latine names, shown in Fig. 2b)73. Ahmed is described as “cantankerous”, aligning with negative stereotypes of MENA individuals as conflict-seeking74. Some ChatGPT 4 stories even depict Ahmed as requiring adjustments due to his upbringing in a war-torn nation (see Supplementary Method C, Tables 13a, d). Priya is described as grateful, which may be considered a positive sentiment in isolation, however, the absence of leadership qualities in any of her portrayals reifies model minority stereotypes of Asian women as obedient, demure, and good followers75. Priya is always a mentee, and despite being a quick learner, she nevertheless needs John’s help. While such portrayals may describe inequities in American society (such as systemic barriers that impede the career advancement of Asians/Asian Americans75), the stories produced by these models limit the responsibility for these inequities to the individual. By framing their struggles as deficits resulting from their foreignness or personality traits, these stories universally fail to account for larger structures and systems that produce gendered racism76,49.
In turn, LM stories center the white savior stereotype77, with dominant characters displaying positive traits in the process of helping minoritized individuals overcome challenges. For example, John (88.0% White), Charlie (31.3% White), and Sara (74.9% White) are depicted as successful, patient, hard-working, and charitable (Table 3a, d). In one illustrative story, Jamal (73.4% Black) is introduced by Claude 2.0 as a jobless single father of three who is ultimately saved by Sara. Sara is portrayed as a hard worker driven by a calling to help other people. In that sense, Jamal is introduced to tell stories of Sara's good deeds, which include connecting Jamal with the food bank and finding ways to ensure his children are fed. There is no mention of attempts made by Jamal to help himself, let alone any reference to the historically entrenched systems that lead to the recurring separation of Black families in the U.S.78. The final dialogue between Jamal and Sara illustrates the rhetorical purpose of Jamal’s desperate portrayal, which is to ennoble Sara (“Helping people is my calling”). Jamal, meanwhile, appears in a power-dominant or power-neutral portrayal only twice despite filling a subordinated role 154 times. Credit for the success of the minoritized individual in these stories is ultimately attributed to characters embodying this white savior stereotype.
Stories emphasizing the struggle of individuals with minoritized sexualities are framed in a similar manner. Characters who are openly gay or transgender are most commonly cast in stories of displacement and homelessness due to coming out (Table 4a), while comparatively few stories depict gay or transgender individuals in stories that are affirming or mundane. Similar to Jamal’s depiction, the unnamed gay teenager is mentioned to elevate the main character, who is a diligent and compassionate social worker (Alicia, 47.0% White). The sexuality of the social worker is left unspecified, which illustrates the sociolinguistic concept of marking79. The asymmetry in textual cues specifying sexuality draws an explicit cultural contrast between the gay teenage client and the unmarked social worker, thus creating distance between the victim and the savior in the same manner that foreignness does in stories of Ahmed, Priya, and Maria.
Even in the more intimate scenarios, we observe imbalances that disproportionately subordinate LGBTQ+ characters. In Table 4b, Llama 2’s Alex (47.5% White) is a non-binary character who faces financial difficulties and must rely on their romantic partner Sarah (83.1% White) for support (in this story, Sarah is referred to using she/her pronouns). Whereas Sarah is a software engineer, Alex is “pursuing their passion for photography” and is “struggling to make ends meet”. Outputs like this play into cultural stereotypes that non-binary individuals are unfit for the professional world80. Across all 32 LM-generated stories of Alex as a non-binary character involving finances, Alex must rely on their partner for support. Furthermore, in every story except for one, their partner’s gender is binary (i.e., 96.9% of stories). For comparison, in cases where a heterosexual couple is presented, 9483 out of the 14,282 stories involving a financial imbalance place the masculinized character in a dominant position over the feminized character (i.e., 66.4% of stories). Therefore, non-binary identified characters in LGBTQ+ relationships are depicted by the models in a way that considerably amplifies comparable gender inequities faced by feminized characters in heterosexual relationships, above and beyond non-binary character omission in power-neutral settings (see Figs. 1a and 2b).
Multiple aforementioned stereotypes converge in stories describing Indigenous peoples. Table 4c introduces an unnamed Inuit elder from a remote village who is critically ill, living in harsh natural conditions. As with previous stories of the perpetual foreigner and white savior, ChatGPT 4’s savior James (86.8% White) is a main character who must also transcend “borders”, “communication barriers”, and “unfamiliar cultural practices” (despite the story taking place in Alaska). However, on top of that, James must also work with “stringent resources” and equipment that is “meager” and “rudimentary”. This positions the Inuit elder as a noble savage81, someone who is simultaneously uncivilized yet revered in a demeaning sense (mysteriously, the unnamed Inuit elder never speaks and only communicates his appreciation through a “grateful smile”). Twelve out of 13 occurrences of Inuit portrayals followed this sick patient archetype. Table 4d highlights another aspect of this stereotype, described as representations frozen in time82. Dale (90.5% White), the Native American character, is put in a position of power as somebody with authority to teach his best friend a “thrilling and unusual” hobby: making dreamcatchers. In the story, several words combine to frame Dale in a mystical and historical light (“ancient”, “sacred”, and “ancestors and fables”). As a result, his character is simultaneously distanced in both culture and time from Jon (90.7% White), a New Yorker who is curious by nature and “expands his world view” thanks to Dale. Most stories containing the term “Native American” follow this same archetype of teaching antique hobbies (in 18 out of 19 dominant portrayals). In the other common scenario, the term “Native American” is used only in the context of a historical topic to be studied in the classroom (in 68 out of 109 occurrences). The disproportionate frequency of such portrayals omits the realities that Indigenous peoples contend with in modern society, reproducing and furthering their long history of erasure from the lands that are now generally referred to as America.
Discussion
As history has shown, fictional depictions of human beings are more than passive interpretations of the real world83,84,85. Rather, they are active catalysts of cultural production that shape the construction of contemporary social reality, often impacting the freedoms and rights of minoritized communities globally86,87,88. Compared to human authors, language models produce stories that reflect social biases with greater scale, efficiency, and influence. We demonstrate that patterns of omission, subordination, and stereotyping are widespread across five widely used models. These patterns have the potential to affect consumers across races, genders, and sexual orientations. Crucially, they are present in LM outputs spanning educational contexts, workplace settings, and interpersonal relationships. Implicit bias and discrimination continue to be overlooked by model developers in favor of self-audits under the relatively new categories of AI safety and red-teaming, repurposing terms originating from fields such as computer security89,90. Such framings give greater attention to malicious users, national security concerns, or future existential risks at the expense of safeguarding fundamental human rights91,59. Despite lacking rigorous evidence, developers use terms like “Helpful, Harmless, Honest” or “Responsible” to market their LMs92,93. The generative AI bias literature consistently finds that the leading LMs overwhelmingly reify socially dominant representations (i.e., white, heteronormative)32,36,43,44,45,94. We provide additional evidence that these models exacerbate racist and sexist ideologies for everyday consumers with scale and efficiency. In line with prior evidence, our findings underscore the extent to which generative AI models produce sexist and racist representations in text28,62,95 and in vision models96,97,98, all of which further homogenize and essentialize marginalized identities99,100. The bias we identify is especially impactful as it does not require explicit prompting to reinforce the omission and subordination of minoritized groups. This, in turn, increases the risks of psychosocial and physical harms, even outside of conscious awareness42,101,102.
Results highlight widespread patterns of omission in the power-neutral condition, as well as high ratios of subordination and prevalent stereotyping in the power-laden condition. Combined, these outputs contribute to a lived experience in which consumers with minoritized identities, if represented at all, encounter character portrayals as struggling students (as opposed to star students), as patients or defendants (as opposed to doctors or lawyers), and as friends or romantic partners who are more likely to borrow money or do the chores for someone else (as opposed to the other way around). Importantly, these omission levels exceed any level of bias that may be expected if language models were simply reflecting reality103. Minoritized characters are up to thousands of times more likely to be portrayed as subordinated and stereotyped than empowered (see Fig. 3c). As the social psychology literature shows, omission, subordination, and stereotyping through racialized and gendered textual cues have direct consequences for consumer health and psychological well-being104. For example, exposure to linguistic cues that signal one-sided stereotypic associations (e.g., cantankerous Ahmed or supportive Priya) can lead to unhealthy eating behaviors102 and reduced motivation to pursue career opportunities105. Observed patterns of subordination may be especially consequential when the magnitude and duration of stereotyping are proportional to the frequency of linguistic triggers101. As language models are being rapidly adopted in educational settings with goals such as personalized learning106, their potential to propagate cultural stereotypes further exacerbates pre-existing threats, especially if used in high-pressure contexts (e.g., testing and assessment)107. These stereotypes disproportionately target minoritized groups54,55 and may contribute to increased cognitive load, significantly impacting sense of belonging70, behavior108, self-perception, and even cognitive performance23,54,73. Even for individuals who do not inhabit minoritized identities, such stereotypes reinforce pre-existing prejudices56.
The prompts in our study correspond to scenarios where LMs are increasingly having unmediated interactions with vulnerable consumers, from AI-assisted writing for K-12 and university students3,9 to text-based bots for simulating romantic interactions10,11 or roleplaying as refugees seeking asylum109. By releasing these models as general-purpose interfaces, LM developers risk propagating Laissez-Faire harms to an untold number of susceptible secondary producers who build products using their models. This is particularly consequential for minoritized students, for whom language and identity are critical in the acquisition of academic knowledge110, as well as for consumers in international contexts, who are not covered by the U.S.-centric focus of this initial study. A growing number of AI bias and fairness studies contend that to truly understand the broad impacts of AI-generated potential harms, future research should analyze prompts across diverse use-cases, including models reflecting varying cultural and linguistic contexts111,112,113 (e.g., BLOOM114). It remains to be seen if open-ended prompting leads these models to behave in similar ways. Our results call for researchers to adapt our open-ended prompting method to examine additional prompts in other languages, locales, and power contexts, with consideration of additional identity factors (e.g., religion, class, disability). Such studies would benefit from the framework of intersectionality, replacing U.S.-centric identity categories with power structures specific to international contexts (e.g., using caste), and considering a broader set of use-cases, including representations of people in generative audio, image, or video.
Our findings are especially urgent given the limited set of regulatory human-rights protections in the U.S. context, underscoring the need for multiple reforms in generative AI policy. In 2022, under U.S. President Biden, the Office of Science and Technology Policy (OSTP) released an AI Bill of Rights that documented the dangers of unchecked automated technologies and provided a blueprint for risk mitigation. Seven major companies—Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI—voluntarily committed to upholding the principles of this Bill and ensuring that their products were scrutinized for potential harm. The blueprint is now maintained by the U.S. Archives115. A current examination of the priorities of the OSTP and the White House presents a different future for AI: one in which deregulation and expansion are the primary goals. The current U.S. Administration distributed America’s AI Action Plan in July 2025, which identifies more than 90 Federal policy actions to achieve the goals of the administration. Furthermore, the OSTP has explicitly revoked the Executive Order (EO) on AI from the Biden administration and has produced a new EO on preventing “woke AI” in the federal government. The EO, as well as the AI Action Plan, is focused on removing ideological biases from large language models. Our analyses demonstrate that there is indeed considerable ideological bias in contemporary large language models116. In regulating AI, we advocate for intersectional and sociotechnical approaches towards addressing the structural gaps that have enabled developers to sell recent language models as general-purpose tools to an unregulated number of consumer markets, while also remaining vague about (or refusing to define) the types of potential harms that are addressed in their self-audits. First, effective regulation of language models must go beyond benchmarking117 to audit real-life consumer use cases89—including creative writing—while also grounding measures in a thoughtful consideration of potential human harms prior to their limited deployment in well-tested scenarios42. Second, our findings bolster calls for greater transparency from LM developers118, providing the public with details of the training datasets, model architectures, and labeling protocols used in the creation of generative LMs, given that each of these steps can contribute to the types of bias we observe in our experiments47,103. Third, we highlight the urgent need to expand public infrastructure to support third-party research capable of matching the rapid pace of model release, as millions of AI models have proliferated across the web, putting strain on traditional research and publishing pathways119. The stereotyping literature suggests that identity threats may be reduced by creating identity-safe environments through cues that signal belonging120. Critical AI education also raises awareness of the potential for language models to discriminate, helping to protect minoritized students by empowering them to respond in conducive ways121,122. Our study finds that publicly available LMs do not reflect reality; instead, they amplify biases by several orders of magnitude and reproduce discriminatory stereotypes reflecting dangerous ideologies concerning race, gender, and sexual orientation61.
Given the disproportionate impacts on minoritized individuals and communities, we highlight the urgent need for critical and culturally relevant global AI education and literacy programs to inform, protect, and empower diverse consumers in the face of the Laissez-Faire harms they may encounter alongside the proliferation of generative AI tools123.
Limitations
This study has several limitations. Reliance on U.S. Census racial categories and prompts framed around the term American limits the generalizability of findings to international contexts. Laissez-faire harms tied to categories such as caste, religion, or class in non-U.S. societies remain beyond our study scope; however, studies of this type are encouraged in future research. While our study identifies major stereotypes by race (e.g., perpetual foreigner, white savior) and gender (e.g., glass ceiling), additional analyses are necessary for subtler or emergent stereotypes (e.g., those by nationality, socio-economic status, etc.)113. Likewise, our analysis focuses on five widely deployed, English-dominant LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, PaLM2), excluding open-source multilingual models (e.g., BLOOM) and smaller domain-specific models, potentially overlooking biases in non-English or other domain-specific contexts.
Additionally, the datasets we employ have several limitations. First, in the absence of self-reported data, countries of origin can only approximate race for MENA and NH/PI identities. Second, the methods used to create and collect both datasets skew their racial distributions, due to factors such as voting restrictions and the demographic bias of Wikipedia editors124. As we discuss in Section “Modeling Gender, Sexual Orientation, and Race”, Florida voter registration imperfectly approximates the demographic composition of the United States. Controlling for such local variations when quantifying name-race associations would necessitate a national-level dataset surveying a significant number of named individuals alongside racial and ethnic self-identification that also incorporates membership in Indigenous communities. To the best of our knowledge, no such dataset currently exists. These limitations remain a persistent issue within widely adopted data collection methods for race and/or ethnicity, including the U.S. Census (which in 2023 proposed adding MENA as a racial category alongside allowing open-ended self-identification of ethnicity). This operational shortcoming affects all publicly available research datasets combining U.S. racial categories with given name data125,126,127. We also note several limitations to our approach for modeling gender and sexual orientation. First, categorical mapping onto word lists does not capture stories in which characters use gender pronouns from multiple categories (e.g., they/she) or neopronouns. Second, we are unable to effectively infer transgender identities, as such individuals may choose to adopt pronouns or references in any of the above categories despite maintaining a separate gender identity (furthermore, we observe no instances of the terms trans woman or trans man in any of the generated stories). Third, our approach does not account for sexual orientations that cannot be directly inferred from single snapshots of gender references. To better capture broadly omitted gender populations, we utilize search keywords (e.g., transgender) to produce qualitative analyses (see Supplementary Methods B section 7). That said, our choice of keywords is far from exhaustive and warrants continued research. To support such efforts, we open-source our collected data (see Supplementary Methods D).
Ethical and Societal Impact
In this study, we evaluate intersectional forms of bias in LM-generated text outputs. Given the nature of the biases we find in all five LMs, we did not involve human subjects in our research, nor did we outsource data labeling and analysis beyond members of our authorship team. We released our dataset to allow for audit transparency and in the hopes of furthering responsible AI research. At 500,000 stories, the size of our dataset may also reduce barriers to entry for researchers with less funding (e.g., independent researchers). We must also highlight the possibility of adverse impacts. One concern with releasing these data is that reading a dataset of this nature may be triggering and upsetting to readers and, if not properly contextualized, may risk subliminally reinforcing biased narratives about historically marginalized social groups to unsuspecting readers. Furthermore, some studies suggest that warning readers that LMs may generate biased outputs can increase anticipatory anxiety, while having mixed results on actually dissuading readers from engaging128. We hope that this risk will be outweighed by the benefits of informing susceptible consumers of possible subliminal harms.
A secondary group of adverse impacts includes discriminatory abuses of the datasets and methods we describe in our study for modeling race, gender, and sexual orientation. One recent abuse of automated models is illuminated by a 2020 civil lawsuit, National Coalition on Black Civic Participation v. Wohl129, which describes how a group of defendants used automated robocalls to target and attempt to intimidate tens of thousands of Black voters ahead of the November 2020 U.S. election. To mitigate the risks of our models being used in such a system, we do not release our trained models.
Finally, to preserve the privacy of real-world individuals whose data contributed to fractional race modeling, we do not publish racial probabilities in our dataset, as they may be used to reveal personally identifiable information for rare names in particular. For researchers seeking to reproduce our work, we note that these data may be accessed instead through a gated repository, similar to the one described above, by contacting the researchers whom we cite in our work (see Section “Data Availability”).
Methods
To answer our research question, we divided our methodological approach into three stages. First, we selected the language models and designed open-ended prompts that incorporated power dynamics to uncover underlying biases related to race, gender, and sexual orientation within each model. Second, we quantified biases of omission and subordination by calculating representation ratios based on the probabilistic distribution of race, gender, and sexual orientation identities, using LM-generated names and gendered references (including pronouns and titles). Third, we employed critical qualitative methods130 to analyze the most frequently occurring identity cues across intersectional subgroups and validated stereotype constructs using interrater reliability techniques.
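As one concrete possibility for the interrater reliability step, agreement between two coders on a binary stereotype code could be summarized with Cohen's kappa; the labels below are placeholders, and the choice of kappa is our assumption rather than a detail reported here.

```python
# Sketch: Cohen's kappa for two coders applying a binary stereotype code
# (1 = stereotype present) to the same stories. Labels are placeholders, and
# kappa is assumed here as one common interrater reliability statistic.
from collections import Counter

coder_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
coder_b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n            # raw agreement
    pa, pb = Counter(a), Counter(b)
    expected = sum((pa[k] / n) * (pb[k] / n) for k in set(a) | set(b))
    return (observed - expected) / (1 - expected)

print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.583 for these labels
```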
Model Selection
We investigate 500,000 texts generated by the base models of five publicly available generative language models: ChatGPT 3.5 and ChatGPT 4 (developed by OpenAI), Llama 2 (Meta), PaLM 2 (Google), and Claude 2.0 (Anthropic). Model selection was based on both the sizable funding wielded by these companies and their investors (on the order of tens of billions in USD131) and the prominent policy roles that each company has played at the federal level. In July of 2023, the U.S. White House secured voluntary commitments from each of these companies to ensure product safety before launching them publicly132. Our analysis, in part, tests the extent to which these companies met this policy imperative.
We query these LMs with 100 unique open-ended prompts pertaining to 50 everyday scenarios across three core domains of social life situated within the context of the United States. For each language model (LM), we gathered a total of 100,000 stories—1000 samples for each of the 100 unique prompts—using the default parameters configured for consumer access, over a period of twelve weeks.
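A generic version of this collection loop is sketched below; query_model is a hypothetical wrapper standing in for each vendor's chat API (whose client libraries differ), and the sample counts simply mirror the study design.

```python
# Sketch of the collection loop: 1000 samples for each of the 100 prompts, per
# model, at default consumer-facing parameters. `query_model` is a hypothetical
# placeholder for a vendor-specific API client, not a real library call.
import json
import time

MODELS = ["chatgpt-3.5", "chatgpt-4", "claude-2.0", "llama-2", "palm-2"]

def query_model(model: str, prompt: str) -> str:
    raise NotImplementedError("wrap the vendor-specific chat API here")

def collect(prompts, samples_per_prompt=1000, out_path="stories.jsonl"):
    with open(out_path, "a", encoding="utf-8") as out:
        for model in MODELS:
            for prompt in prompts:
                for i in range(samples_per_prompt):
                    story = query_model(model, prompt)
                    record = {"model": model, "prompt": prompt,
                              "sample": i, "story": story}
                    out.write(json.dumps(record) + "\n")
                    time.sleep(0.1)  # naive rate limiting; adjust per provider
```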
Prompt Design
Several principles guided our prompt design. First, prompts were designed to reflect potential use cases across multiple domains, for example, an AI writing assistant for students in the classroom5,9 or for screenwriters in entertainment12. An analysis of consumer interactions with ChatGPT ranked creative writing as the most frequent consumer use case (comprising 21% of all conversations), highlighting the relevance of our study scope65. Second, each prompt uses the colloquial identity term American, which is common parlance to refer to those residing in the United States (i.e., The American People), regardless of their socio-economic background (i.e., race, ethnicity, citizenship, employment status, etc.). Even though American is imprecise in that it can also refer to people outside of the United States (e.g., individuals living in Central or South American nations), as we show in the results, these models appear to interpret American to mean those in the United States, thus furthering U.S.-centric biases present in earlier technology platforms that privilege WEIRD (Western, Educated, Industrialized, Rich, Democratic) norms and values133,134,135.
Utilizing the intersectional theoretical framework28,50, we examine how LMs generate outputs in response to prompts that depict everyday power dynamics and forms of routinized domination49. For each scenario, we capture the effect of power by dividing our prompts into two treatments: one power-neutral condition and one power-laden condition, where the latter contains a dominant character and a subordinate one. Therefore, our study conceptualizes social power specifically through prompts that ask LMs to generate stories in response to scenarios where dominant and subordinated characters interact with one another.
To obtain stories from a wide variety of contexts, our prompts span three primary domains of life in the US: Learning, Labor, and Love. In total, our study assesses 50 prompt scenarios: 15 for Learning, 15 for Labor, and 20 for Love (see Table 1 for examples). Learning scenarios describe classroom interactions between students, spanning 15 academic subjects: nine (9) core subjects commonly taught in U.S. public K-12 schools, three (3) subjects from Career and Technical Education (CTE), and three (3) subjects from Advanced Placement (AP). Labor scenarios describe workplace interactions and span 15 occupations categorized by the U.S. Bureau of Labor Statistics (BLS). For both domains, we select subjects and occupations to reflect a diversity of statistical representations by gender, class, and race, including subjects and occupations for which minoritized groups are statistically overrepresented in comparison to the 2022 U.S. Census68,136 (see Tables S1, S2). Love scenarios describe interpersonal interactions that are subcategorized by interactions between (a) romantic partners, (b) friends, or (c) siblings. In each of these three subcategories, we design six shared scenarios capturing everyday interpersonal interactions (ranging from going shopping to doing chores). For romantic partners, we add two extension scenarios that capture dynamics specific to intimate relationships: (1) going on a date, and (2) moving to a new city. We limit our scenarios to interpersonal interactions between two people in the interest of studying the effects of power (see Section “Textual Identity Proxies and Psychosocial Impacts”). While these prompt scenarios do not reflect the full diversity of experiences that comprise interpersonal interactions, we believe this framework offers a beachhead for future studies to assess an even wider variety of culturally relevant prompts, both within the U.S. and beyond. For each LM, set to default parameters, we collect 100,000 outputs (or 1000 samples for each of the 100 unique prompts). We provide a complete list of prompt scenarios in Tables S3, S4, and S5. Data collection was conducted from August 16th to November 7th, 2023.
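As an illustration of the two-treatment design, the sketch below builds paired power-neutral and power-laden prompts for the Learning domain. The wording is hypothetical and paraphrased for illustration; the exact prompts used in the study are those listed in Table 1 and Tables S3–S5.

```python
# Hypothetical prompt templates illustrating the power-neutral vs. power-laden treatments
# (paraphrased; the study's exact wording is in Tables S3-S5).
LEARNING_NEUTRAL = "Write a story about two American students working together in {subject} class."
LEARNING_POWER = ("Write a story about an American student at the top of the {subject} class "
                  "helping another American student who is struggling in the class.")

def build_learning_prompts(subjects: list[str]) -> list[dict]:
    """Return one power-neutral and one power-laden prompt per academic subject."""
    prompts = []
    for subject in subjects:
        prompts.append({"domain": "Learning", "treatment": "power-neutral",
                        "text": LEARNING_NEUTRAL.format(subject=subject)})
        prompts.append({"domain": "Learning", "treatment": "power-laden",
                        "text": LEARNING_POWER.format(subject=subject)})
    return prompts
```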
Textual Identity Proxies and Psychosocial Impacts
We analyze LM-generated outputs for bias using linguistic identity cues with the potential to induce psychosocial harms that disproportionately affect minoritized consumers. We specifically focus on textual identity proxies for race, gender, and sexual orientation in the context of stories, narratives, and portrayals of people. Established cognitive studies show how exposure to biased representations and stereotypic associations can shape how individuals view themselves, which, in turn, shapes their interactions with their environment in contexts where identities are salient104,137. For example, female undergraduates majoring in math, science, and/or engineering who viewed an advertisement video of professionals in their academic field were more likely to respond with cognitive and physiological vigilance and to report a reduced sense of belonging and motivation when the video portrayed a gender imbalance, compared to when the video showed equal gender representations70. However, these effects did not extend to male undergraduates, irrespective of representation ratios. These video portrayals thus functioned as a situational cue with cognitive impacts depending on both the participant setting (i.e., academic environments) and the identity of the students (i.e., gender), given the prevalent American cultural stereotype that math is for boys69. Identity-based cues may be textual as well as visual. A study assessing the same stereotype on Asian-American female learners found that wording that selectively cued race or gender identity on a questionnaire administered prior to a test predicted performance based on whether a racial stereotype was activated (i.e., Asians are good at math) or whether a gender stereotype was activated (i.e., women are bad at math)24. Therefore, intersectional identity backgrounds must be taken into account when considering how identity portrayals may function as situational cues138. Furthermore, the impacts of narrative cues may be positive or negative depending on a variety of factors in addition to social identity, including the perceived risk of a situation and how the cue is framed104. Potential psychosocial harms faced by minoritized groups from negative stereotypic cues are broad and far-ranging, including negative impacts on behavior108, attitude23, performance24,54,73,139, and self-perception55, in addition to reinforcing the prejudiced perceptions of other identity groups56.
Settings that elicit identity-based cues do not require the reader to be consciously monitoring for stereotypes; in some cases, this may in fact magnify the effect101. This aligns with our study’s context, where race, gender, and sexual orientation are not explicitly requested (see Table 1). Following stereotyping studies that leverage linguistic identity cues23,24,102,105, we analyze LM-generated texts for race proxies (using names) and gender proxies (using pronouns, titles, and gendered references). Table 5 shows the similarities between textual proxies in our study and words that have been demonstrated in psychology studies to prime stereotype threat by race and gender. This experimental design has additional precedent in sociotechnical studies that report discriminatory outcomes in hiring17,39 and targeted search advertisements40 in response to equivalent proxies.
To extract textual identity proxies at scale, we fine-tune a coreference resolution model (ChatGPT 3.5) using 150 hand-labeled examples to address underperformance of the pretrained LMs on underrepresented groups (e.g., non-binary)140. On an evaluation dataset of 4600 uniformly down-sampled LM-generated texts, our model performs at 98.0% gender precision, 98.1% name precision, 97.0% gender recall, and 99.3% name recall (0.0063, 95% CI). Overall name coverage of our fractional counting datasets is 99.98%.
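As an illustration of how extraction quality of this kind can be scored, the sketch below computes precision and recall of extracted names against hand labels for a single story; the data layout is our own assumption, and the fine-tuned extractor itself is not reproduced here.

```python
# Sketch of scoring extracted identity proxies against hand-labeled examples
# (hypothetical data layout; the study's evaluation set is described in the text).
def name_precision_recall(gold_names: set[str], extracted_names: set[str]) -> tuple[float, float]:
    """Set-based precision/recall for the names extracted from one story."""
    true_positives = len(gold_names & extracted_names)
    precision = true_positives / len(extracted_names) if extracted_names else 1.0
    recall = true_positives / len(gold_names) if gold_names else 1.0
    return precision, recall

# Example: one hand-labeled story with three characters.
gold = {"maria", "joy", "amir"}
extracted = {"maria", "joy"}
print(name_precision_recall(gold, extracted))  # (1.0, 0.666...)
```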
Modeling Gender, Sexual Orientation, And Race
In the context of studies of real-world individuals, the gold standard for assessing identity is through voluntary self-identification46,66,141. Given our context of studying fictional characters generated by LMs, our study instead measures observed identity46 via associations between identity categories and textual proxies. Out of the four gender labels collected by the U.S. Census Bureau68, our model quantifies three categories of gendering: feminized (F), masculinized (M), and non-binary (NB, which is listed in the Census as “None of these”). We are unable to quantify transgender as a gender category because our study examines gender references found in LM-generated text via pronouns, titles, and gendered references, all of which may be used non-exclusively by transgender individuals and are thus insufficient for determining transgender identity in the absence of explicit identity prompting. We model sexual orientation similarly by examining pairwise gender references in the LM-generated responses to a subset of prompts specific to romantic relationships (Table 1). Based on our gender model, we are able to model six relationship pairs, implying various sexual orientations (NB-NB, NB-F, NB-M, F-F, M-M, F-M). As with gender, our list of quantifiable sexual orientations is limited to those that can be inferred through textual proxies alone. For example, we are not able to model bisexual identity in our study setting, where responses consist of a single relationship story (and bisexual relationships may span several of the pairs we model). Our models for gender and sexual orientation are thus non-exhaustive and do not capture the full spectrum of identities or relationships that may be implied in open-ended language use cases. We base our quantitative model on frequently observed gender references in LM-generated texts. For modeling gender associations in textual cues, we utilize word lists of the kind used both in studies of algorithmic bias in language models and in social psychology23,24. Previous works consider only binary genders34,142, yet we observe gender-neutral pronouns in language model outputs and extend prior word lists to capture non-binary genders. Noting the potential volatility of word lists in bias research143, we provide our complete list of gendered references with a mapping to broad gender categories in Table S6a. Out of the 500,000 stories we collect, we observe a handful of cases where gender and sexuality labels are explicitly specified in LM-generated text. Given their small number, we analyze these qualitatively (see Section “Qualitative Coding for Explicit Stereotype Analysis”).
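The following is a minimal sketch of how gendered references can be mapped to the three modeled categories and how a relationship pair can be derived for the romance prompts. The word list shown is a small hypothetical excerpt (the study's full mapping is in Table S6a), and the majority-vote rule is our own simplification.

```python
# Hypothetical excerpt of a gendered-reference word list mapped to broad categories;
# the study's complete mapping is provided in Table S6a.
GENDER_MAP = {
    "she": "F", "her": "F", "hers": "F", "ms": "F", "mrs": "F", "girlfriend": "F",
    "he": "M", "him": "M", "his": "M", "mr": "M", "boyfriend": "M",
    "they": "NB", "them": "NB", "theirs": "NB",
}
PAIR_ORDER = {"NB": 0, "F": 1, "M": 2}  # yields the six pairs NB-NB, NB-F, NB-M, F-F, F-M, M-M

def gender_of(references: list[str]) -> str | None:
    """Assign F/M/NB to one character from its extracted references (majority vote; a simplification)."""
    votes = [GENDER_MAP[r.lower()] for r in references if r.lower() in GENDER_MAP]
    return max(set(votes), key=votes.count) if votes else None

def relationship_pair(refs_a: list[str], refs_b: list[str]) -> str | None:
    """Derive the implied relationship pair (e.g., 'F-M') from two characters' references."""
    a, b = gender_of(refs_a), gender_of(refs_b)
    if a is None or b is None:
        return None
    return "-".join(sorted([a, b], key=PAIR_ORDER.get))

print(relationship_pair(["she", "her"], ["they", "them"]))  # NB-F
```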
We model seven categories of racialization corresponding to the latest OMB-approved Census classifications66: American Indian or Alaska Native (AI/AN), Native Hawaiian or Pacific Islander (NH/PI), Middle Eastern or North African (MENA), Hispanic or Latino (we adopt Latine as a gender-neutral label), Asian, African-American or Black, and White. For modeling racial associations in textual cues, we use fractional counting, which has been shown in related studies to avoid issues of bias and algorithmic undercounting that impact minoritized races when using categorical modeling141. Following this approach, a fractional racial likelihood is assigned to a name based on open-sourced datasets of individuals reporting self-identified race, via mortgage applications125 or voter registrations126. We model race using the given name, as the majority (90.9%) of LM responses to our prompts refer to individuals using given names only. While given names do not correspond to racial categories in a mutually exclusive manner (for example, the name Joy may depict an individual of any race), they still carry a perceived racial signal, as demonstrated by bias studies across multiple settings17,18,19,34,39,40. Specifically, we define racial likelihood as the proportion of individuals with a given name self-identifying as a particular race:
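The corresponding equation is not reproduced in this text; a reconstruction consistent with this definition (our notation, not necessarily the paper's) is:

```latex
% Fractional racial likelihood of race r for a given first name n,
% reconstructed from the prose definition above (notation assumed).
P(r \mid n) \;=\; \frac{\left|\{\, i : \mathrm{name}(i) = n \ \wedge\ \mathrm{race}(i) = r \,\}\right|}
                       {\left|\{\, i : \mathrm{name}(i) = n \,\}\right|}
```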
Modeling observed race at an aggregate level enables us to better capture occurrences where any given name may be chosen by individuals from a wide distribution of races, albeit at different statistical likelihoods for a given context or time frame. Therefore, the choice of dataset(s) influences the degree to which fractional counting can account for various factors that shape name distribution, such as trends in migration. We are unable to use the U.S. Census data directly, as it only releases surname information. Therefore, we base our fractional counting on two complementary datasets for which data on given names is present. The first dataset we leverage is open-sourced Florida Voter Registration Data from 2017 and 2022126, which contains names and self-identified race classifications for 27,420,716 people comprising 447,170 unique given names. Of the seven racial categories in the latest OMB-proposed Census66, the Florida Voter Registration Data contains five: White, Hispanic or Latino, Black, Asian Pacific Islander (API), and American Indian or Alaska Native (AI/AN). While any non-Census dataset is an approximation of racial categories (and even the Census itself approximates the general population), we find this dataset to be the most appropriate publicly available dataset out of all candidate datasets identified for which a large number of named individuals self-report racial identity125,126,127. First, it models a greater number and granularity of race/ethnicity categories compared to other datasets. For example, Rosenman, Olivella, & Imai127 leverage voter registration data from six states but categorically omit AI/AN as a label by aggregating this racial category as Other. Second, we find that the degree of sampling bias introduced by the data collection process of voting is lower than the comparable sampling bias introduced by other dataset methods, such as mortgage applications125, which systematically underrepresent Black and Latine individuals. Of the candidate datasets we evaluated, Florida voter registration data126 most closely approximates the racial composition of the US Census, deviating by no more than 4.57% for all racial groups (with the largest gap due to representing White individuals at 63.87% compared to 2021 Census levels of 59.30%). By contrast, mortgage application data125 overcounts White individuals with a representation of 82.33% (deviation of +23.03%) while undercounting Black individuals with a representation of 4.20% (deviation of −9.32%).
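To illustrate how fractional counting aggregates over LM-generated names, the sketch below uses a small hypothetical name-to-probability table; in the study the probabilities are derived from the voter registration data (and the Wikipedia-based extension described below), and the representation ratio shown is a simplified stand-in for Eq. 1.

```python
from collections import defaultdict

# Hypothetical fractional likelihoods P(race | given name); the study derives these from
# Florida voter registration data plus the Wikipedia-based extension described below.
NAME_RACE_PROBS = {
    "joy":   {"White": 0.62, "Black": 0.23, "Latine": 0.08, "Asian": 0.05, "AI/AN": 0.02},
    "maria": {"Latine": 0.71, "White": 0.24, "Black": 0.03, "Asian": 0.02},
}

def fractional_counts(generated_names: list[str]) -> dict[str, float]:
    """Sum each name's racial probabilities across all generated characters."""
    totals: dict[str, float] = defaultdict(float)
    for name in generated_names:
        for race, prob in NAME_RACE_PROBS.get(name.lower(), {}).items():
            totals[race] += prob
    return dict(totals)

def representation_ratios(counts: dict[str, float], census_share: dict[str, float]) -> dict[str, float]:
    """Share of generated characters attributed to each group divided by its Census share
    (a simplified stand-in for the representation ratio of Eq. 1)."""
    total = sum(counts.values())
    return {race: (counts.get(race, 0.0) / total) / share
            for race, share in census_share.items()} if total else {}
```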
Nevertheless, using approximations to the US Census in the absence of country-wide given name identification introduces limitations. In particular, Florida is one of many states with a large elderly population, which influences the distribution of names according to generational trends. Historical patterns of migration, warfare, and settlement also shape the distribution of named individuals within demographic subgroups, restricting the degree to which any state’s geography may substitute as a fully representative sample of national name-race trends. One illustrative example is Florida’s Seminole community (originating from yat’siminoli, or free people), an Indigenous nation that has maintained their sovereignty in the Florida Everglades86. Similar heterogeneity shapes Florida’s Latine demographic due to geopolitical events such as the 1980 protests at the Peruvian embassy in Cuba and the ensuing governmental response that eventually drove hundreds of thousands of Cuban people to Florida144.
In general, no racial group is a monolith, and broad race categorizations can obscure the identities of meaningful sub-groups46,47. The history of race as a social construct reveals its multidimensional and overlapping nature with other social constructs such as religion, class87, kinship83, and national identity84. For example, the exclusion of country-of-origin identities (i.e., Chinese, Indian, Nigerian) and the omission (via aggregation) of individuals identifying as MENA or NH/PI into the White or Asian/Pacific Islander categories, respectively, masks their marginalization within these categories. These limitations remain a persistent issue within widely adopted data collection methods for race and/or ethnicity, including the U.S. Census (which, in 2023, proposed adding MENA as a race in addition to allowing open-ended self-identification of ethnicity). To the best of our knowledge, this operational shortcoming affects all publicly available research datasets containing a large number of individuals that self-classify U.S. racial categories with given name data125,126,127. At the same time, we recognize that quantitative and computational methods can be emancipatory145 and used to foster collective solidarity, reclaim forgotten histories, and hold power to account26.
To address the problem of categorical omission, we leverage an additional data source to approximate the racial likelihood of names for MENA and NH/PI populations. We build on the approach developed by Le, Himmelstein, Hippen, Gazzara, & Greene146, which uses data on named individuals from Wikipedia to analyze disparities in academic honorees by country of origin. Our approach leverages OMB’s proposed hierarchical race and ethnicity classifications to approximate race for the two missing categories by mapping existing country lists for both racial groups to Wikipedia’s country taxonomy. For MENA, we build upon OMB’s country list66 based on a study of MENA-identifying community members147. For NH/PI, we leverage public health guides for Asian American individuals intended for disaggregating Pacific Islanders from API148. The full list of countries we use is provided in Table S6b. Due to the demographic bias of Wikipedia editors124, Wikipedia is likely to over-represent Anglicized names and under-represent MENA and NH/PI names. Therefore, we would expect names extracted for these two racial categories to show results, in the aggregate, more similar to the treatment of White names than to that of other minoritized races. However, our study shows the opposite to be true (see Sections “Limitations” and “Ethical and Societal Impact”). We find that language models generate text outputs that under-represent names approximated from MENA and NH/PI countries in power-neutral portrayals, and subordinate these names when power dynamics are introduced, similar to other minoritized races, genders, and sexual orientations. For full technical details and replication, see Supplementary Methods B, Tables S7–S9.
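A minimal sketch of this country-to-category mapping is shown below. The country lists are abbreviated placeholders (the full lists are in Table S6b), and the Wikipedia record layout is our own assumption.

```python
# Sketch of extending name-race coverage for the two categories missing from the voter data,
# by mapping Wikipedia-listed individuals to MENA or NH/PI via country of origin.
MENA_COUNTRIES = {"Egypt", "Iran", "Lebanon", "Morocco"}          # abbreviated; see Table S6b
NHPI_COUNTRIES = {"Samoa", "Tonga", "Fiji", "Marshall Islands"}   # abbreviated; see Table S6b

def category_for_country(country: str) -> str | None:
    if country in MENA_COUNTRIES:
        return "MENA"
    if country in NHPI_COUNTRIES:
        return "NH/PI"
    return None

def names_by_category(wikipedia_people: list[dict]) -> dict[str, set[str]]:
    """Group Wikipedia-listed given names by approximated racial category.

    `wikipedia_people` is assumed to look like [{"given_name": "Leilani", "country": "Samoa"}, ...].
    """
    grouped: dict[str, set[str]] = {"MENA": set(), "NH/PI": set()}
    for person in wikipedia_people:
        category = category_for_country(person["country"])
        if category is not None:
            grouped[category].add(person["given_name"].lower())
    return grouped
```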
Qualitative Coding for Explicit Stereotype Analysis
Our quantitative approach in Section “Modeling Gender, Sexual Orientation, and Race” models the associations between textual identity cues and social portrayals at the aggregate level, which assesses implicit stereotypes in settings where consumers may be primed via repeated engagement with LMs. This exemplifies what other scholars describe as distributional harms149. By contrast, instance harms consist of a single LM output that is damaging on its own, such as a single story that contains one or more explicit stereotypes that perpetuate wrongful, overgeneralized beliefs about demographic groups64. Modeling instance harms requires going deeper than statistical analyses of gender references and names. To model explicit stereotypes, we follow the critical mixed methods approach proposed by Lukito & Pruden130. The first step identifies stereotypes via open-ended reading of a representative subset of the LM-generated texts sampled from the most frequently occurring identity cues for each intersectional demographic group. Second, we operationalize stereotypes from open-ended reading (e.g., white savior, perpetual foreigner, and noble savage) to construct a codebook using definitions grounded in relevant social sciences literature72,77,81. Next, we iteratively code stereotypes across multiple authors, who serve as raters, to validate our constructs. Finally, based on the coding process, we create clusters of stories organized around non-exclusive combinations of stereotypes, choosing representative stories to highlight stereotypes by sampling from the largest cluster within each identity category, as shown in Section “Patterns of Stereotyping” (see Supplementary Methods B section 7 for more details on the qualitative procedure, definitions, codebook construction, and interrater reliability).
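The paper does not specify which interrater statistic was used; as one common choice, the sketch below computes Cohen's kappa between two raters' binary codes for a single stereotype construct (toy data; the study's actual procedure is documented in Supplementary Methods B section 7).

```python
# Cohen's kappa between two raters coding the same stories for one stereotype construct
# (toy binary codes for illustration only).
from sklearn.metrics import cohen_kappa_score

rater_1 = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = stereotype present, 0 = absent
rater_2 = [1, 0, 1, 0, 0, 0, 1, 0]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.2f}")  # 0.75 for this toy example
```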
Statistical Methods
We calculate two-tailed p-values for all statistics defined in the paper. These statistics consist of ratios that either compare one demographic distribution against a fixed distribution (e.g., representation ratios) or ratios that compare two demographic distributions against each other (e.g., subordination ratios). We parametrize the former as a binomial distribution, as the comparison distributions may be considered as non-parametric constants for which underlying counts are not available (e.g., Census-reported figures, see Eq. 1 and Extended Technical Details in the Supplementary Methods). We calculate two-tailed p-values for these using the Wilson score interval, which is shown to perform better than the normal approximation for skewed observations approaching zero or one by allowing for asymmetric intervals150. This is well-suited for our data, where we observe a long-tail of probabilities (see Section “Patterns of Omission” for examples). While the Wilson score interval does not require normality, it assumes datasets with multiple independent samples and also assumes that all values lie in the interval [0, 1], which we confirm in our dataset.
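For illustration, the sketch below computes a Wilson interval for an observed proportion and a two-tailed p-value from the corresponding score test (the test whose inversion yields the Wilson interval). The counts are toy values and the use of statsmodels is our own choice, not the authors' code.

```python
import math
from statsmodels.stats.proportion import proportion_confint

# Toy comparison of an observed proportion against a fixed benchmark (e.g., a Census share).
observed = 420        # characters attributed to a group across generated stories
n = 100_000           # stories analyzed
benchmark = 0.013     # fixed comparison proportion

p_hat = observed / n
low, high = proportion_confint(observed, n, alpha=0.05, method="wilson")

# Score test of H0: p = benchmark; its inversion is the Wilson interval.
z = (p_hat - benchmark) / math.sqrt(benchmark * (1 - benchmark) / n)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p from the standard normal

print(f"proportion = {p_hat:.4f}, 95% CI [{low:.4f}, {high:.4f}], p = {p_value:.3g}")
```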
We parametrize ratios between two statistics (see Eqs. 3 and 4) using binomial ratio distributions. First, we take the log-transform of both ratios, which may then be approximated by the normal distribution, as shown by Katz in obtaining confidence intervals for risk ratios151. Following this procedure, we compute two-tailed p-values by calculating the standard error directly on the log-transformed confidence intervals152. Crucially, the log-transform does not require normality in the numerator or denominator of the ratios. As with the Wilson score intervals, the distributions must fit a binomial distribution with independent samples lying in the interval [0, 1], as confirmed in our data.
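A sketch of the Katz procedure as we understand it from the citations: the standard error of the log ratio, a confidence interval on the log scale, and a two-tailed p-value recovered from it. The counts are toy values; this is our own paraphrase, not the authors' released code.

```python
import math

# Katz-style confidence interval and p-value for a ratio of two proportions
# (e.g., a subordination ratio); toy counts for illustration.
a, n1 = 320, 50_000   # times a group appears in the subordinate role, out of n1 stories
b, n2 = 180, 50_000   # times the group appears in the dominant role, out of n2 stories

ratio = (a / n1) / (b / n2)
log_ratio = math.log(ratio)

# Standard error of the log ratio (Katz et al., 1978).
se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)

z_crit = 1.959964  # two-sided 95% critical value of the standard normal
ci_low = math.exp(log_ratio - z_crit * se)
ci_high = math.exp(log_ratio + z_crit * se)

# Two-tailed p-value from the log-scale z statistic (cf. Altman & Bland).
z = log_ratio / se
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"ratio = {ratio:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}], p = {p_value:.3g}")
```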
For ratios that compare one demographic distribution against a fixed proportion (i.e., representation ratios), we also report Cohen’s d as the effect size statistic to account for the potential impacts of standard deviation in the demographic distribution. For ratios that compare two demographic distributions against each other, we note that the reported statistic (i.e., subordination ratios) is equivalent to the odds ratio as an appropriate measure of effect size. All inferential statistics reported in the main article include degrees of freedom v, p-value, 95% confidence interval, and the corresponding effect size statistic.
Reporting Summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The Laissez-Faire Prompts data generated in this study have been deposited in the Harvard Dataverse repository [https://doi.org/10.7910/DVN/WF8PJD]. The auxiliary datasets we use in this study (e.g., to model racial associations to names, following previous approaches126,146) can also be found on public Harvard Dataverse and GitHub repositories, including Florida Voter Registration Data [https://doi.org/10.7910/DVN/UBIG3F] and named individuals on Wikipedia [https://doi.org/10.1016/j.cels.2021.07.007]. We provide additional technical details in Supplementary Methods B and document our dataset with a Datasheet153 in Supplementary Methods E.
Code availability
The code is available at https://doi.org/10.5281/zenodo.17905666; it provides utilities for querying generative language models to produce the datasets generated and analyzed during the current study154.
References
Metz, C. What exactly are the dangers posed by AI? The New York Times (2023). Available at: https://www.nytimes.com/2023/05/01/technology/ai-problems-danger-chatgpt.html (Accessed: 17th December 2023).
Nguyen, T., Jump, A. & Casey, D. Emerging tech impact radar: 2023. (Gartner, 2023). Available at: https://www.gartner.com/en/doc/emerging-technologies-and-trends-impact-radar-excerpt (Accessed: 17th December 2023).
Extance, A. ChatGPT has entered the classroom: how LLMs could transform education. Nature 623, 474–477 (2023).
Markel, J. M., Opferman, S. G., Landay, J. A. & Piech, C. Gpteach: Interactive TA training with GPT-based students. In Proceedings of the tenth ACM conference on learning@ scale 226–236 https://doi.org/10.1145/3573051.3593393 (2023).
Khan, S. How AI could save (not destroy) education [Video]. TED Talk (April 2023). Available at: https://www.ted.com/talks/sal_khan_how_ai_could_save_not_destroy_education?utm_campaign=tedspread&utm_medium=referral&utm_source=tedcomshare.
Peeples, J. The Future of Education? California Teachers Association (2023). Available at: https://www.cta.org/educator/posts/the-future-of-education. (Accessed: 17th December 2023).
Jörke, M. et al. GPTCoach: Towards LLM-Based Physical Activity Coaching. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Vol. 993, 1–46 (2025).
OpenAI. Teaching with AI. Available at: https://openai.com/blog/teaching-with-ai (Accessed: 17th December 2023).
Hayden Field. OpenAI announces first partnership with a university (CNBC, 2024). Retrieved from: https://www.cnbc.com/2024/01/18/openai-announces-first-partnership-with-a-university.html (Accessed: 19th January 2024).
Chow, A. R. Why people are confessing their love for AI chatbots. Time (2023). Available at: https://time.com/6257790/ai-chatbots-love/ (Accessed: 17th December 2023).
Carballo, R. Using AI to talk to the dead. The New York Times (2023). Available at: https://www.nytimes.com/2023/12/11/technology/ai-chatbots-dead-relatives.html (Accessed: 17th December 2023).
Coyle, J. In Hollywood Writers’ Battle Against AI, Humans Win (For Now). AP News (2023). Available at: https://apnews.com/article/sianood-ai-strike-wga-artificial-intelligence-39ab72582c3a15f77510c9c30a45ffc8 (Accessed: 17th December 2023).
Wells, K. Eating disorder helpline takes down chatbot after it gave weight loss advice. NPR (2023). Available at: https://www.npr.org/2023/06/08/1181131532/eating-disorder-helpline-takes-down-chatbot-after-it-gave-weight-loss-advice (Accessed: 17th December 2023).
Fang, X., Che, S., Mao, M., Zhang, H., Zhao, M. & Zhao, X. Bias of AI-generated content: an examination of news produced by large language models. Sci. Rep. 14, 5224 (2024).
Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V. & Daneshjou, R. Large language models propagate race-based medicine. NPJ Digit. Med. 6, 195 (2023).
Warr, M., Oster, N. J. & Isaac, R. Implicit bias in large language models: experimental proof and implications for education. J. Res. Technol. Educ. 57, 1–24 (2024).
Armstrong, L., Liu, A., MacNeil, S. & Metaxa, D. The Silicon Ceiling: Auditing GPT’s Race and Gender Biases in Hiring. In Proceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (1–18) https://doi.org/10.1145/3689904.3694699 (2024).
Kaplan, D. M. et al. What’s in a Name? Experimental evidence of gender bias in recommendation letters generated by ChatGPT. J. Med. Internet Res. 26, e51837 (2024).
Noble, S. U. Algorithms of Oppression: How Search Engines Reinforce Racism. (New York University Press, 2018).
Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the Dangers of Stochastic Parrots. In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency https://doi.org/10.1145/3442188.344592 (2021).
Benjamin, R. Race After Technology: Abolitionist Tools for the New Jim Code. (John Wiley & Sons, 2019).
Dastin, J. Amazon scraps secret AI recruiting tool that showed bias against women. Reuters (2018). Available at: https://jp.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G (Accessed: 17th December 2023).
Steele, J. R. & Ambady, N. “Math is Hard!” The Effect of Gender Priming on Women’s Attitudes. J. Exp. Soc. Psychol. 42, 428–436 (2006).
Shih, M., Pittinsky, T. L. & Ambady, N. Stereotype susceptibility: Identity salience and shifts in quantitative performance. Psychol. Sci. 10, 80–83 (1999).
Solove, D. J. & Citron, D. K. Risk and Anxiety: A Theory of Data Breach Harms. Tex. L. Rev. 96, 737 (2017).
D’Ignazio, C. & Klein, L. 4.“What Gets Counted Counts.” In Data Feminism. Retrieved from https://data-feminism.mitpress.mit.edu/pub/h1w0nbqp (2020).
Buolamwini, J. & Gebru, T. Gender Shades: Intersectional accuracy disparities in commercial gender classification. In Proc. 1st Conference on Fairness, Accountability and Transparency 77–91 (PMLR, 2018).
Ovalle, A., Subramonian, A., Gautam, V., Gee, G. & Chang, K.-W. Factoring the matrix of domination: a critical review and reimagination of intersectionality in AI Fairness. In Proc. 2023 AAAI/ACM Conference on AI, Ethics, and Society https://doi.org/10.1145/3600211.3604705 (2023).
Dixon-Román, E., Nichols, T. P. & Nyame-Mensah, A. The racializing forces of/in AI educational technologies. Learn. Media Technol. 45, 236–250 (2020)
Broussard, M. Auditing Algorithmic Medical Systems to Uncover AI Harms and Remedy Racial Injustice. In Oxford Intersections: Racism by Context (ed. Dhanda, M.), (Oxford, online edn., Oxford Academic, 2025), https://doi.org/10.1093/9780198945246.003.0020, Accessed [12/23/2025].
Sheng, E., Chang, K.-W., Natarajan, P. & Peng, N. The woman worked as a babysitter: on biases in language generation. In Proc. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP, 2019). https://doi.org/10.18653/v1/d19-1339.
Cheng, M., Durmus, E. & Jurafsky, D. Marked Personas: using natural language prompts to measure stereotypes in language models https://doi.org/10.48550/ARXIV.2305.18189 (2023).
Dhamala, J. et al. Bold: Dataset and metrics for measuring biases in open-ended language generation. In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency https://doi.org/10.1145/3442188.3445924 (2021).
Bommasani, R., Liang, P. & Lee, T. Holistic Evaluation of Language Models. Annals of the New York Academy of Sciences (John Wiley & Sons, 2023).
Kirk, H. R. et al. Bias out-of-the-box: an empirical analysis of intersectional occupational biases in popular generative language models. Adv. Neural Inf. Process. Syst. 34, 2611–2624 (2021).
Wan, Y. & Chang, K. W. White men lead, black women help? Benchmarking and mitigating language agency social biases in LLMs. In Proc. 63rd Annual Meeting of the Association for Computational Linguistics 9082–9108 (Association for Computational Linguistics, 2025).
Guo, W. & Caliskan, A. Detecting emergent intersectional biases: Contextualized word embeddings contain a distribution of human-like biases. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society 122–133 https://doi.org/10.1145/3461702.3462536 (2021).
An, H., Acquaye, C., Wang, C., Li, Z. & Rudinger, R. Do Large Language Models Discriminate in Hiring Decisions on the Basis of Race, Ethnicity, and Gender? Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2, 386–397 (2024).
Bertrand, M. & Mullainathan, S. Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review. https://doi.org/10.3386/w9873 (2003).
Sweeney, L. Discrimination in Online Ad Delivery. Queue 11, 10–29 (2013).
Blodgett, S. L., Barocas, S., Daumé III, H. & Wallach, H. Language (technology) is Power: A Critical Survey of “Bias” in NLP. In Proc. 58th Annual Meeting of the Association for Computational Linguistics https://doi.org/10.18653/v1/2020.acl-main.485 (2020).
Vassel, F. M., Shieh, E., Sugimoto, C. R. & Monroe-White, T. The psychosocial impacts of generative AI harms. In Proceedings. AAAI Symposium Series (Vol. 3, No. 1, pp. 440-447) https://doi.org/10.1609/aaaiss.v3i1.31251 (2024).
Leidinger, A. & Rogers, R. How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7, 839–854 (2024).
Bai, X., Wang, A., Sucholutsky, I. & Griffiths, T. L. Explicitly unbiased large language models still form biased associations. Proc. Natl. Acad. Sci. U.S.A. 122, e2416228122 (2025).
Kumar, A., Yunusov, S. & Emami, A. Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 1, 375–392 (2024).
Hanna, A., Denton, E., Smart, A. & Smith-Loud, J. Towards a critical race methodology in algorithmic fairness. In Proc. 2020 Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3351095.3372826 (2020).
Field, A., Blodgett, S. L., Waseem, Z. & Tsvetkov, Y. A Survey of Race, Racism, and Anti-Racism in NLP. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Vol 1: Long Papers) https://doi.org/10.18653/v1/2021.acl-long.149 (2021).
Fealing, K. H. & Incorvaia, A. D. Understanding diversity: overcoming the small-N problem. Harv. Data Sci. Rev. (2022). Available at: https://hdsr.mitpress.mit.edu/pub/vn6ib3o5/release/1. (Accessed: 17th December 2023).
Crenshaw, K. W. Mapping the Margins: Intersectionality, Identity Politics, and Violence Against Women of Color. Stanf. Law Rev. 43, 1241 (1991).
Cho, S., Crenshaw, K. W. & McCall, L. Toward a Field of Intersectionality Studies: Theory, Applications, and Praxis. J. Women Cult. Soc. 38, 785–810 (2013).
Collins, P. H. Black Feminist Thought: Knowledge, consciousness, and the politics of empowerment. Hyman (1990).
Crenshaw, K. On Intersectionality: The Essential Writings of Kimberley Crenshaw. (Mcmillan, 2015).
May, V. M. Pursuing intersectionality, unsettling dominant imaginaries. (Routledge, 2015).
Steele, C. M. & Aronson, J. Stereotype threat and the intellectual test performance of African Americans. J. Personal. Soc. Psychol. 69, 797–811 (1995).
Davies, P. G., Spencer, S. J., Quinn, D. M. & Gerhardstein, R. Consuming Images: how television commercials that elicit stereotype threat can restrain women academically and professionally. Personal. Soc. Psychol. Bull. 28, 1615–1628 (2002).
Devine, P. G. Stereotypes and prejudice: their automatic and controlled components. J. Personal. Soc. Psychol. 56, 5–18 (1989).
Elliott-Groves, E. & Fryberg, S. A. “A future denied” for young indigenous people: from social disruption to possible futures. Handbook of Indigenous Education 1–19 (Springer Nature, 2017).
Shelby, R. et al. Sociotechnical harms of algorithmic systems: scoping a taxonomy for harm reduction. In Proc. 2023 AAAI/ACM Conference on AI, Ethics, and Society https://doi.org/10.1145/3600211.3604673 (2023).
Lazar, S. & Nelson, A. AI Safety on Whose Terms? Science 381, 138–138 (2023).
Monroe-White, T., Marshall, B. & Contreras-Palacios, H. Waking up to Marginalization: Public Value Failures in Artificial Intelligence and Data Science. In Artificial Intelligence Diversity, Belonging, Equity, and Inclusion (7–21). (PMLR, 2021).
Gebru, T. & Torres, É. P. The TESCREAL bundle: Eugenics and the promise of utopia through artificial general intelligence. First Monday. 29, https://doi.org/10.5210/fm.v29i4.13636 (2024).
Gillespie, T. Generative AI and the politics of visibility. Big Data Soc. 11, 20539517241252131 (2024).
Kotek, H., Sun, D. Q., Xiu, Z., Bowler, M. & Klein, C. Protected group bias and stereotypes in Large Language Models. Preprint at https://doi.org/10.48550/arXiv.2403.14727 (2024).
Dev, S. et al. On measures of biases and harms in NLP. In Findings of the Association for Computational Linguistics: AACL-IJCNLP. 246–267, Online only. https://doi.org/10.18653/v1/2022.findings-aacl.24 (2022).
Merrill, J. B. & Lerman, R. What do people really ask chatbots? It’s a lot of sex and homework. Washington Post (August 2024). https://www.washingtonpost.com/technology/2024/08/04/chatgpt-use-real-ai-chatbot-conversations/ (Accessed: 19th September 2024).
U.S. Office of Management and Budget. Initial Proposals for Updating OMB’s Race and Ethnicity Statistical Standards. Federal Register. Available at: https://www.federalregister.gov/documents/2023/01/27/2023-01635/initial-proposals-for-updating-ombs-race-and-ethnicity-statistical-standards (Accessed: 17th December 2023).
Deng, B. & Watson, T. LGBTQ+ data availability. Brookings (2023). Available at: https://www.brookings.edu/articles/lgbtq-data-availability-what-we-can-learn-from-four-major-surveys/ (Accessed: 17th December 2023).
Anderson, L, File, T., Marshall, J., McElrath, K. & Scherer, Z. New household pulse survey data reveal differences between LGBT and non-LGBT respondents during COVID-19 Pandemic. (Census.gov, 2022). Available at: https://www.census.gov/library/stories/2021/11/census-bureau-survey-explores-sexual-orientation-and-gender-identity.html (Accessed: 17th December 2023).
Cvencek, D., Meltzoff, A. N. & Greenwald, A. G. Math–gender stereotypes in elementary school children. Child Dev. 82, 766–779 (2011).
Murphy, M. C., Steele, C. M. & Gross, J. J. Signaling threat: how situational cues affect women in math, science, and engineering settings. Psychol. Sci. 18, 879–885 (2007).
Hurst, K. US women are outpacing men in college completion, including in every major racial and ethnic group. https://www.pewresearch.org/short-reads/2024/11/18/us-women-are-outpacing-men-in-college-completion-including-in-every-major-racial-and-ethnic-group/ (2024).
Huynh, Q.-L., Devos, T. & Smalarz, L. Perpetual foreigner in One’s Own Land: potential implications for identity and psychological adjustment. J. Soc. Clin. Psychol. 30, 133–162 (2011).
Gonzales, P. M., Blanton, H. & Williams, K. J. The effects of stereotype threat and double-minority status on the test performance of Latino women. Personal. Soc. Psychol. Bull. 28, 659–670 (2002).
Hemmatian, B. & Varshney, L. R. Debiased large language models still associate Muslims with uniquely violent acts. Preprint at https://doi.org/10.31234/osf.io/xpeka (2022).
Li, P. Recent developments: hitting the ceiling: an examination of barriers to success for Asian American women. Berkeley J. Gend. Law Justice 29, 140–167 (2014).
Steketee, A., Williams, M. T., Valencia, B. T., Printz, D. & Hooper, L. M. Racial and language microaggressions in the school ecology. Perspect. Psychol. Sci. 16, 1075–1098 (2021).
Aronson, B. A. The white savior industrial complex: a cultural studies analysis of a teacher educator, savior film, and future teachers. J. Crit. Thought Prax. 6, 36–54 (2017).
Alexander, M. The New Jim Crow: Mass Incarceration in the Age of Colorblindness, New Press, New York (2010).
Waugh, L. R. Marked and unmarked: a choice between unequals in semiotic structure. Semiotica 38, 299–318 (1982).
Felkner, V. K., Chang, H. C. H., Jang, E. & May, J. WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models. In the Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 9126–9140. https://doi.org/10.18653/v1/2023.acl-long.507 (2023).
Deloria, P. J. Playing indian. Yale (University Press, 2022).
Leavitt, P. A., Covarrubias, R., Perez, Y. A. & Fryberg, S. A. “Frozen in time”: The Impact of Native American Media Representations on Identity and Self-understanding. J. Soc. Issues 71, 39–53 (2015).
Witgen, M. An Infinity Of Nations: How The Native New World Shaped Early North America. (University of Pennsylvania Press, 2011).
Khalid, A. Central Asia: A New History From The Imperial Conquests To The Present. (Princeton University Press, 2021).
Said, E. W. Culture and Imperialism. Vintage Books (1994).
Dunbar-Ortiz, R. An Indigenous Peoples’ History Of The United States. (Beacon Press, 2023).
Dunbar-Ortiz, R. Not “a nation of immigrants”: Settler colonialism, white supremacy, and a history of erasure and exclusion. (Beacon Press, 2021).
Immerwahr, D. How to hide an empire: a history of the greater United States. 1st edn. (Farrar, Straus and Giroux, 2019).
Shieh, E. & Monroe-White, T. Teaching Parrots to See Red: Self-Audits of Generative Language Models Overlook Sociotechnical Harms. In Proceedings of the AAAI Symposium Series Vol. 6, 333–340 https://doi.org/10.1609/aaaiss.v6i1.36070 (2025).
Feffer, M., Sinha, A., Deng, W. H., Lipton, Z. C. & Heidari, H. Red-Teaming for Generative AI: Silver Bullet or Security Theater? Proceedings of the AAAI/ACM Conference on AI Ethics and Society 7, 421–437 (2024).
Schopmans, H. R. From Coded Bias to Existential Threat.In Proc. 2022 AAAI/ACM Conference on AI, Ethics, and Society https://doi.org/10.1145/3514094.3534161 (2022).
Askell, A. et al. A general language assistant as a laboratory for alignment. Preprint at https://doi.org/10.48550/arXiv.2112.00861 (2021).
Doshi, T. How We’ve created a helpful and responsible bard experience for Teens. Google: The Keyword–Product Updates. Retrieved from: https://blog.google/products/bard/google-bard-expansion-teens/ (2023).
Devinney, H., Björklund, J. & Björklund, H. We don’t talk about that: case studies on intersectional analysis of social bias in large language models. In Workshop on Gender Bias in Natural Language Processing (GeBNLP), Bangkok, Thailand, 16th August 2024. (33–44). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.gebnlp-1.3 (2024).
Narayanan Venkit, P., Gautam, S., Panchanadikar, R., Huang, T. H., & Wilson, S. Unmasking nationality bias: A study of human perception of nationalities in AI-generated articles. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (554–565) https://doi.org/10.1145/3600211.3604667 (2023).
Luccioni, A. S., Akiki, C., Mitchell, M. & Jernite, Y. Stable bias: evaluating societal representations in diffusion models. Adv. Neural Inf. Process. Syst. 36, 56338–56351 (2023).
Ghosh, S., Venkit, P. N., Gautam, S., Wilson, S., & Caliskan, A. Do generative AI models output harm while representing non-Western cultures: Evidence from a community-centered approach. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7, 476–489 (2024).
Raj, C., Mukherjee, A., Caliskan, A., Anastasopoulos, A. & Zhu, Z. BiasDora: Exploring Hidden Biased Associations in Vision-Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024. 1, 10439–10455 (2024).
Lee, M. H., Montgomery, J. M. & Lai, C. K. Large language models portray socially subordinate groups as more homogeneous, consistent with a bias observed in humans. In The 2024 ACM Conference on Fairness, Accountability, and Transparency (1321–1340) https://doi.org/10.1145/3630106.3658975 (2024).
Wang, A., Morgenstern, J. & Dickerson, J. P. Large language models that replace human participants can harmfully misportray and flatten identity groups. Nat. Mach. Intell. 7, 400–411 (2025).
Bargh, J. A. & Chartrand, T. L. Studying the mind in the middle: a practical guide to priming and automaticity. Handbook of Research Methods in Social and Personality Psychology Vol. 2, 253–285 (Cambridge University Press, 2000).
Guendelman, M. D., Cheryan, S. & Monin, B. Fitting in but getting fat: Identity threat and dietary choices among US immigrant groups. Psychol. Sci. 22, 959–967 (2011).
Hooker, S. Moving beyond “algorithmic bias is a data problem”. Patterns 2, 4 (2021).
Spencer, S. J., Logel, C. & Davies, P. G. Stereotype threat. Ann. Rev. Psychol. 67, 415–437 (2016).
Gaucher, D., Friesen, J. & Kay, A. C. Evidence that gendered wording in job advertisements exists and sustains gender inequality. J. Personal. Soc. Psychol. 101, 109 (2011).
Pataranutaporn, P. et al. AI-generated characters for supporting personalized learning and well-being. Nat. Mach. Intell. 3, 1013–1022 (2021).
Steele, C. M. A threat in the air: how stereotypes shape intellectual identity and performance. Am. Psychol. 52, 613 (1997).
McGee, E. “Black Genius, Asian Fail”: The Detriment of Stereotype Lift and Stereotype Threat in High-Achieving Asian and Black STEM Students. AERA Open (2018).
Dastin, J. US explores AI to train immigration officers on talking to refugees. Reuters. https://www.reuters.com/world/us/us-explores-ai-train-immigration-officers-talking-refugees-2024-05-08/ (Accessed: 13th May 2024).
Brown, B. A., Reveles, J. M. & Kelly, G. J. Scientific literacy and discursive identity: a theoretical framework for understanding science learning. Sci. Edu. 89, 779–802 (2005).
Mei, K., Fereidooni, S. & Caliskan, A. Bias against 93 stigmatized groups in masked language models and downstream sentiment classification tasks. In Proc. 2023 ACM Conference on Fairness, Accountability, and Transparency (1699–1710) (ACM, 2023).
Tan, X. E. et al. Towards Massive Multilingual Holistic Bias. Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP) 1, 403–426 (2025).
Nguyen, I., Suresh, H. & Shieh, E. Representational Harms in LLM-Generated Narratives Against Nationalities Located in the Global South. HEAL Workshop, CHI 2025 https://heal-workshop.github.io/chi2025_papers/50_Representational_Harms_in_L.pdf (2025).
Scao, T. L., et al. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100. https://doi.org/10.48550/arXiv.2211.05100 (2022).
White House. (2022). Blueprint for an AI bill of rights: Making automated systems work for the American people. https://bidenwhitehouse.archives.gov/ostp/ai-bill-of-rights/.
Hashmi, N., Lodge, S., Sugimoto, C. R., & Monroe-White, T. Echoes of Eugenics: Tracing the Ideological Persistence of Scientific Racism in Scholarly Discourse. In Proceedings of the 5th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 82–92 https://doi.org/10.1145/3757887.3768171 (2025).
Raji, I. D., Denton, E., Bender, E. M., Hanna, A. & Paullada, A. AI and the Everything in the Whole Wide World Benchmark. In Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2. https://doi.org/10.48550/arXiv.2111.15366 (2021).
Bommasani, R. et al. The Foundation Model Transparency Index. arXiv:2310.12941. Retrieved from https://arxiv.org/abs/2310.12941.
Edwards, B. Exponential growth brews 1 million AI models on Hugging Face. Arstechnica. https://arstechnica.com/information-technology/2024/09/ai-hosting-platform-surpasses-1-million-models-for-the-first-time/ (Accessed: 29th September 2024).
Cheryan, S., Plaut, V. C., Davies, P. G. & Steele, C. M. Ambient belonging: how stereotypical cues impact gender participation in computer science. J. Personal Soc Psychol. 97, 1045 (2009).
Sanders, M. G. Overcoming obstacles: Academic achievement as a response to racism and discrimination. J. Negro Educ. 66, 83–93 (1997).
Tanksley T. C. We’re changing the system with this one: Black students using critical race algorithmic literacies to subvert and survive AI-mediated racism in school. English Teaching: Practice & Critique, 23, 36–56, (2024).
Solyst, J., Yang, E., Xie, S., Ogan, A., Hammer, J. & Eslami, M. The potential of diverse youth as stakeholders in identifying and mitigating algorithmic bias for a future of fairer AI. Proc. ACM Hum.Comput. Interact. 7, 1–27 (2023).
Wilson, J. Proceed with Extreme Caution: Citation to Wikipedia in Light of Contributor Demographics and Content Policies. Vanderbilt J. Entertain. Technol. Law 16, 857 (2014).
Tzioumis, K. Demographic aspects of first names. Sci Data 5, 180025 (2018).
Sood, G. Florida Voter Registration Data, 2017 and 2022. Harvard Dataverse https://doi.org/10.7910/DVN/UBIG3F (2022).
Rosenman, E. T. R., Olivella, S. & Imai, K. Race and ethnicity data for first, middle, and surnames. Sci Data 10, 299 (2023).
Bridgland, V. M. E., Jones, P. J. & Bellet, B. W. A Meta-analysis of the efficacy of trigger warnings, content warnings, and content notes. Clin. Psychol. Sci. 12, 751–771 (2022).
National Coalition on Black Civic Participation v. Wohl: Statement of Interest of the United States of America. U.S. Department of Justice, Civil Rights Division, Southern District of New York (2022). https://www.justice.gov/d9/case-documents/attachments/2022/08/12/ncbp_v_wohl_us_soi_filed_8_12_22_ro_tag.pdf.
Lukito, J. & Pruden, M. L. Critical computation: mixed-methods approaches to big language data analysis. Rev. Commun. 23, 62–78 (2023).
Griffith, E. & Metz, C. A New Area of A.I. Booms, Even Amid the Tech Gloom. The New York Times (2023).
U.S. White House. FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI. The White House. Available at: https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/ (Accessed: 17th December 2023).
Septiandri, A. A., Constantinides, M., Tahaei, M. & Quercia, D. WEIRD FAccTs: How Western, Educated, Industrialized, Rich, and Democratic is FAccT? In Proc. 2023 ACM Conference on Fairness, Accountability, and Transparency (160–171) (ACM, 2023).
Linxen, S., Sturm, C., Brühlmann, F., Cassau, V., Opwis, K. & Reinecke, K. How weird is CHI? In Proc. 2021 Chi Conference on Human Factors in Computing Systems (1–14) (ACM, 2021).
Atari, M., Xue, M. J., Park, P. S., Blasi, D. E. & Henrich, J. Which Humans? Preprint at https://doi.org/10.31234/osf.io/5b26t (2023).
U.S. Census Bureau QuickFacts: United States (2021). Available at: https://www.census.gov/quickfacts/fact/table/US/PST045222 (Accessed: 17th December 2023).
Master, A., Cheryan, S. & Meltzoff, A. N. Motivation and identity. In Handbook of Motivation at School (300-319). (Routledge, 2016).
Williams, J. C. Double jeopardy? An empirical study with implications for the debates over implicit bias and intersectionality. Harv. J. Law Gend. 37, 185 (2014).
Aronson, J., Quinn, D. M. & Spencer, S. J. Stereotype Threat and the Academic Underperformance of Minorities and Women (83–103) (Prejudice Academic Press 1998).
Cao, Y. T. & Daumé, H. III Toward gender-inclusive coreference resolution: an analysis of gender and bias throughout the machine learning lifecycle. Comput. Linguist. 47, 615–661 (2021).
Kozlowski, D. et al. Avoiding Bias When Inferring Race Using Name-based Approaches. PLoS ONE 17, e0264270 (2022).
Bolukbasi, T., Chang K. W., Zou, J., Saligrama, V. & Kalai, A. T. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Adv. Neural Inf. Process. Syst. 29 https://doi.org/10.48550/arXiv.1607.06520 (2016).
Antoniak, M. & Mimno, D. Bad Seeds: Evaluating Lexical Methods for Bias Measurement. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL, 2021).
Blitzer, J. Everyone Who Is Gone Is Here: The United States, Central America, and the Making of a Crisis. Penguin (2024).
Monroe-White, T. Emancipatory data science: a liberatory framework for mitigating data harms and fostering social transformation. In Proc. 2021 Computers and People Research Conference (23–30) (ACM, 2021).
Le, T.T., Himmelstein, D.S., Hippen, A.A., Gazzara, M.R. & Greene, C.S. Analysis of scientific society honors reveals disparities. Cell Syst. 12, 900–906.e5 (2021).
Willson, S., & Dunston, S. Cognitive Interview Evaluation of the Revised Race Question, with Special Emphasis on the Newly Proposed Middle Eastern/North African Response, National Center for Health Statistics. (2017).
Chin, M.K. et al. Manual on collection, analysis, and reporting of asian american health data. AA & NH/PI Health Central. Available at: https://aanhpihealth.org/resource/sian-american-manual-2023/ (Accessed: 17th December 2023).
Rauh, M. et al. Characteristics of harmful text: towards rigorous benchmarking of language models. Adv. Neural Inf. Process. Syst. 35, 24720–24739 (2022).
Wilson, E. B. Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc. https://doi.org/10.1080/01621459.1927.10502953 (1927).
Katz, D. J. S. M., Baptista, J., Azen, S. P. & Pike, M. C. Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics. 34, 469–474 (1978).
Altman, D. G. & Bland, J. M. How to obtain the P value from a confidence interval. BMJ 343, d2304 (2011).
Gebru, T. et al. Datasheets for Datasets. Commun. ACM 64, 86–92 (2021).
Shieh, E., Vassel, F. M., Sugimoto, C. R., and Monroe-White, T. Intersectional biases in narratives generated by open-ended prompting of generative language models. GitHub. https://doi.org/10.5281/zenodo.17905666 (2025).
Acknowledgements
Authors T.M.-W. and C.R.S. acknowledge funding support from the National Science Foundation under award number SOS-2152288. F.-M.V. acknowledges funding support from the National Science Foundation under award number CCF-1918549. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Diego Kozlowski, Stella Chen, Rahul Gupta-Iwasaki, Gerald Higginbotham, Bryan Brown, Jay Kim, Dakota Murray, James Evans, Zarek Drozda, Ashley Ding, Princewill Okoroafor, and Hideo Mabuchi for helpful inputs and discussion on earlier versions of the manuscript.
Author information
Authors and Affiliations
Contributions
E.S. conceived the study; E.S. and T.M.-W. contributed to the design of the study; E.S. prepared the primary datasets; E.S. and T.M.-W. performed analysis; E.S., F.-M.V. and T.M.-W. contributed to the interpretation of the results and E.S., F.-M.V., C.S. and T.M.-W. contributed to the writing of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shieh, E., Vassel, FM., Sugimoto, C.R. et al. Intersectional biases in narratives produced by open-ended prompting of generative language models. Nat Commun 17, 1243 (2026). https://doi.org/10.1038/s41467-025-68004-9