Abstract
The rapid deployment of generative language models has raised concerns about social biases affecting the well-being of diverse consumers. The extant literature on generative language models has primarily examined bias via explicit identity prompting. However, prior research on bias in language-based technology platforms has shown that discrimination can occur even when identity terms are not specified explicitly. Here, we advance studies of generative language model bias by considering a broader set of natural use cases via open-ended prompting, which we refer to as a laissez-faire environment. In this setting, we find that across 500,000 observations, generated outputs from the base models of five publicly available language models (ChatGPT 3.5, ChatGPT 4, Claude 2.0, Llama 2, and PaLM 2) are more likely to omit characters with minoritized race, gender, and/or sexual orientation identities compared to reported levels in the U.S. Census, or relegate them to subordinated roles as opposed to dominant ones. We also document patterns of stereotyping across language model–generated outputs with the potential to disproportionately affect minoritized individuals. Our findings highlight the urgent need for regulations to ensure responsible innovation while protecting consumers from potential harms caused by language models.
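As an illustration of the comparison described above (a minimal sketch with hypothetical counts, not the paper's actual analysis pipeline), the snippet below estimates the share of generated characters coded to a given identity group, attaches a Wilson score interval to that share (the interval method cited in the reference list), and divides by a Census benchmark share to obtain a representation ratio that falls below 1.0 when the group is under-represented.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical placeholder counts: stories whose protagonist is coded to one
# identity group, out of all generated stories, versus a Census benchmark share.
observed, total = 312, 10_000
census_share = 0.136

lo, hi = wilson_interval(observed, total)
ratio = (observed / total) / census_share
print(f"observed share {observed / total:.3f} (95% CI {lo:.3f}-{hi:.3f}); "
      f"representation ratio vs. Census: {ratio:.2f}")
```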
Data availability
The Laissez-Faire Prompts data generated in this study have been deposited in the Harvard Dataverse repository [https://doi.org/10.7910/DVN/WF8PJD]. The auxiliary datasets we use in this study (e.g., to model racial associations to names, following previous approaches126,146) can also be found on public Harvard Dataverse and GitHub repositories, including Florida Voter Registration Data [https://doi.org/10.7910/DVN/UBIG3F] and named individuals on Wikipedia [https://doi.org/10.1016/j.cels.2021.07.007]. We provide additional technical details in Supplementary Methods B and document our dataset with a Datasheet153 in Supplementary Methods E.
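For readers orienting themselves to the auxiliary name data, the following is a minimal sketch (with invented column names and toy rows, not the Florida voter file's actual schema) of how name-level racial association probabilities can be derived from self-reported race counts in such a dataset.

```python
import pandas as pd

# Toy rows standing in for a voter-file extract; the real file's field names differ.
voters = pd.DataFrame({
    "first_name": ["LATOYA", "LATOYA", "EMILY", "EMILY", "EMILY"],
    "race":       ["Black", "Black", "White", "White", "Asian"],
})

# P(race | first name): count name-race pairs, then normalize within each name.
name_race = (
    voters.groupby(["first_name", "race"]).size()
          .groupby(level=0).transform(lambda s: s / s.sum())
          .rename("p_race_given_name")
          .reset_index()
)
print(name_race)
```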
Code availability
The code is available at https://doi.org/10.5281/zenodo.17905666; it provides utilities for querying generative language models to produce the datasets generated and analyzed in the current study154.
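As a minimal sketch of what such querying utilities do (assuming the OpenAI Python client and an illustrative model name and prompt; the released repository's own interfaces and the study's exact prompts may differ), the snippet below issues a single open-ended, identity-free prompt and prints the generated story.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical open-ended ("laissez-faire") prompt: no identity terms are specified.
prompt = "Write a short story about a student who excels in their favorite class."

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,
    max_tokens=300,
)
print(response.choices[0].message.content)
```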
References
Metz, C. What exactly are the dangers posed by AI? The New York Times (2023). Available at: https://www.nytimes.com/2023/05/01/technology/ai-problems-danger-chatgpt.html (Accessed: 17th December 2023).
Nguyen, T., Jump, A. & Casey, D. Emerging tech impact radar: 2023. (Gartner, 2023). Available at: https://www.gartner.com/en/doc/emerging-technologies-and-trends-impact-radar-excerpt (Accessed: 17th December 2023).
Extance, A. ChatGPT has entered the classroom: how LLMs could transform education. Nature 623, 474–477 (2023).
Markel, J. M., Opferman, S. G., Landay, J. A. & Piech, C. Gpteach: Interactive TA training with GPT-based students. In Proceedings of the tenth ACM conference on learning@ scale 226–236 https://doi.org/10.1145/3573051.3593393 (2023).
Khan, S. How AI could save (not destroy) education. Sal Khan: How AI could save (not destroy) education [Video], TED Talk (April 2023). https://www.ted.com/talks/sal_khan_how_ai_could_save_not_destroy_education?utm_campaign=tedspread&utm_medium=referral&utm_source=tedcomshare (2023).
Peeples, J. The Future of Education? California Teachers Association (2023). Available at: https://www.cta.org/educator/posts/the-future-of-education. (Accessed: 17th December 2023).
Jörke, M. et al. GPTCoach: Towards LLM-Based Physical Activity Coaching. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Vol. 993, 1–46 (2025).
OpenAI. Teaching with AI. Available at: https://openai.com/blog/teaching-with-ai (Accessed: 17th December 2023).
Hayden Field. OpenAI announces first partnership with a university (CNBC, 2024). Retrieved from: https://www.cnbc.com/2024/01/18/openai-announces-first-partnership-with-a-university.html (Accessed: 19th January 2024).
Chow, A. R. Why people are confessing their love for AI chatbots. Time (2023). Available at: https://time.com/6257790/ai-chatbots-love/ (Accessed: 17th December 2023).
Carballo, R. Using AI to talk to the dead. The New York Times (2023). Available at: https://www.nytimes.com/2023/12/11/technology/ai-chatbots-dead-relatives.html (Accessed: 17th December 2023).
Coyle, J. In Hollywood Writers’ Battle Against AI, Humans Win (For Now). AP News (2023). Available at: https://apnews.com/article/hollywood-ai-strike-wga-artificial-intelligence-39ab72582c3a15f77510c9c30a45ffc8 (Accessed: 17th December 2023).
Wells, K. Eating disorder helpline takes down chatbot after it gave weight loss advice. NPR (2023). Available at: https://www.npr.org/2023/06/08/1181131532/eating-disorder-helpline-takes-down-chatbot-after-it-gave-weight-loss-advice (Accessed: 17th December 2023).
Fang, X., Che, S., Mao, M., Zhang, H., Zhao, M. & Zhao, X. Bias of AI-generated content: an examination of news produced by large language models. Sci. Rep. 14, 5224 (2024).
Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V. & Daneshjou, R. Large language models propagate race-based medicine. NPJ Digit. Med. 6, 195 (2023).
Warr, M., Oster, N. J. & Isaac, R. Implicit bias in large language models: experimental proof and implications for education. J. Res. Technol. Educ. 57, 1–24 (2024).
Armstrong, L., Liu, A., MacNeil, S. & Metaxa, D. The Silicon Ceiling: Auditing GPT’s Race and Gender Biases in Hiring. In Proceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (1–18) https://doi.org/10.1145/3689904.3694699 (2024).
Kaplan, D. M. et al. What’s in a Name? Experimental evidence of gender bias in recommendation letters generated by ChatGPT. J. Med. Internet Res. 26, e51837 (2024).
Noble, S. U. Algorithms of Oppression: How Search Engines Reinforce Racism. (New York University Press, 2018).
Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency https://doi.org/10.1145/3442188.3445922 (2021).
Benjamin, R. Race After Technology: Abolitionist Tools for the New Jim Code. (John Wiley & Sons, 2019).
Dastin, J. Amazon scraps secret AI recruiting tool that showed bias against women. Reuters (2018). Available at: https://jp.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G (Accessed: 17th December 2023).
Steele, J. R. & Ambady, N. “Math is Hard!” The Effect of Gender Priming on Women’s Attitudes. J. Exp. Soc. Psychol. 42, 428–436 (2006).
Shih, M., Pittinsky, T. L. & Ambady, N. Stereotype susceptibility: Identity salience and shifts in quantitative performance. Psychol. Sci. 10, 80–83 (1999).
Solove, D. J. & Citron, D. K. Risk and Anxiety: A Theory of Data Breach Harms. Tex. L. Rev. 96, 737 (2017).
D’Ignazio, C. & Klein, L. “What Gets Counted Counts.” Chapter 4 in Data Feminism. Retrieved from https://data-feminism.mitpress.mit.edu/pub/h1w0nbqp (2020).
Buolamwini, J. & Gebru, T. Gender Shades: Intersectional accuracy disparities in commercial gender classification. In Proc. 1st Conference on Fairness, Accountability and Transparency 77–91 (PMLR, 2018).
Ovalle, A., Subramonian, A., Gautam, V., Gee, G. & Chang, K.-W. Factoring the matrix of domination: a critical review and reimagination of intersectionality in AI Fairness. In Proc. 2023 AAAI/ACM Conference on AI, Ethics, and Society https://doi.org/10.1145/3600211.3604705 (2023).
Dixon-Román, E., Nichols, T. P. & Nyame-Mensah, A. The racializing forces of/in AI educational technologies. Learn. Media Technol. 45, 236–250 (2020).
Broussard, M. Auditing Algorithmic Medical Systems to Uncover AI Harms and Remedy Racial Injustice. In Oxford Intersections: Racism by Context (ed. Dhanda, M.) (Oxford Academic, online edn., 2025). https://doi.org/10.1093/9780198945246.003.0020 (Accessed: 23rd December 2025).
Sheng, E., Chang, K.-W., Natarajan, P. & Peng, N. The woman worked as a babysitter: on biases in language generation. In Proc. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP, 2019). https://doi.org/10.18653/v1/d19-1339.
Cheng, M., Durmus, E. & Jurafsky, D. Marked Personas: using natural language prompts to measure stereotypes in language models https://doi.org/10.48550/ARXIV.2305.18189 (2023).
Dhamala, J. et al. BOLD: Dataset and metrics for measuring biases in open-ended language generation. In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency https://doi.org/10.1145/3442188.3445924 (2021).
Bommasani, R., Liang, P. & Lee, T. Holistic Evaluation of Language Models. Annals of the New York Academy of Sciences (John Wiley & Sons, 2023).
Kirk, H. R. et al. Bias out-of-the-box: an empirical analysis of intersectional occupational biases in popular generative language models. Adv. Neural Inf. Process. Syst. 34, 2611–2624 (2021).
Wan, Y. & Chang, K. W. White men lead, black women help? Benchmarking and mitigating language agency social biases in LLMs. In Proc. 63rd Annual Meeting of the Association for Computational Linguistics 9082–9108 (Association for Computational Linguistics, 2025).
Guo, W. & Caliskan, A. Detecting emergent intersectional biases: Contextualized word embeddings contain a distribution of human-like biases. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society 122–133 https://doi.org/10.1145/3461702.3462536 (2021).
An, H., Acquaye, C., Wang, C., Li, Z. & Rudinger, R. Do Large Language Models Discriminate in Hiring Decisions on the Basis of Race, Ethnicity, and Gender? Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2, 386–397 (2024).
Bertrand, M. & Mullainathan, S. Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review. https://doi.org/10.3386/w9873 (2003).
Sweeney, L. Discrimination in Online Ad Delivery. Queue 11, 10–29 (2013).
Blodgett, S. L., Barocas, S., Daumé III, H. & Wallach, H. Language (technology) is Power: A Critical Survey of “Bias” in NLP. In Proc. 58th Annual Meeting of the Association for Computational Linguistics https://doi.org/10.18653/v1/2020.acl-main.485 (2020).
Vassel, F. M., Shieh, E., Sugimoto, C. R. & Monroe-White, T. The psychosocial impacts of generative AI harms. In Proceedings of the AAAI Symposium Series Vol. 3, 440–447 https://doi.org/10.1609/aaaiss.v3i1.31251 (2024).
Leidinger, A. & Rogers, R. How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7, 839–854 (2024).
Bai, X., Wang, A., Sucholutsky, I. & Griffiths, T. L. Explicitly unbiased large language models still form biased associations. Proc. Natl. Acad. Sci. U.S.A. 122, e2416228122 (2025).
Kumar, A., Yunusov, S. & Emami, A. Subtle Biases Need Subtler Measures: Dual Metrics for Evaluating Representative and Affinity Bias in Large Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 1, 375–392 (2024).
Hanna, A., Denton, E., Smart, A. & Smith-Loud, J. Towards a critical race methodology in algorithmic fairness. In Proc. 2020 Conference on Fairness, Accountability, and Transparency. https://doi.org/10.1145/3351095.3372826 (2020).
Field, A., Blodgett, S. L., Waseem, Z. & Tsvetkov, Y. A Survey of Race, Racism, and Anti-Racism in NLP. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Vol 1: Long Papers) https://doi.org/10.18653/v1/2021.acl-long.149 (2021).
Fealing, K. H. & Incorvaia, A. D. Understanding diversity: overcoming the small-N problem. Harv. Data Sci. Rev. (2022). Available at: https://hdsr.mitpress.mit.edu/pub/vn6ib3o5/release/1. (Accessed: 17th December 2023).
Crenshaw, K. W. Mapping the Margins: Intersectionality, Identity Politics, and Violence Against Women of Color. Stanf. Law Rev. 43, 1241 (1991).
Cho, S., Crenshaw, K. W. & McCall, L. Toward a Field of Intersectionality Studies: Theory, Applications, and Praxis. J. Women Cult. Soc. 38, 785–810 (2013).
Collins, P. H. Black Feminist Thought: Knowledge, Consciousness, and the Politics of Empowerment. (Unwin Hyman, 1990).
Crenshaw, K. On Intersectionality: The Essential Writings of Kimberlé Crenshaw. (Macmillan, 2015).
May, V. M. Pursuing intersectionality, unsettling dominant imaginaries. (Routledge, 2015).
Steele, C. M. & Aronson, J. Stereotype threat and the intellectual test performance of African Americans. J. Personal. Soc. Psychol. 69, 797–811 (1995).
Davies, P. G., Spencer, S. J., Quinn, D. M. & Gerhardstein, R. Consuming Images: how television commercials that elicit stereotype threat can restrain women academically and professionally. Personal. Soc. Psychol. Bull. 28, 1615–1628 (2002).
Devine, P. G. Stereotypes and prejudice: their automatic and controlled components. J. Personal. Soc. Psychol. 56, 5–18 (1989).
Elliott-Groves, E. & Fryberg, S. A. “A future denied” for young indigenous people: from social disruption to possible futures. Handbook of Indigenous Education 1–19 (Springer Nature, 2017).
Shelby, R. et al. Sociotechnical harms of algorithmic systems: scoping a taxonomy for harm reduction. In Proc. 2023 AAAI/ACM Conference on AI, Ethics, and Society https://doi.org/10.1145/3600211.3604673 (2023).
Lazar, S. & Nelson, A. AI Safety on Whose Terms? Science 381, 138–138 (2023).
Monroe-White, T., Marshall, B. & Contreras-Palacios, H. Waking up to Marginalization: Public Value Failures in Artificial Intelligence and Data Science. In Artificial Intelligence Diversity, Belonging, Equity, and Inclusion (7–21). (PMLR, 2021).
Gebru, T. & Torres, É. P. The TESCREAL bundle: Eugenics and the promise of utopia through artificial general intelligence. First Monday. 29, https://doi.org/10.5210/fm.v29i4.13636 (2024).
Gillespie, T. Generative AI and the politics of visibility. Big Data Soc. 11, 20539517241252131 (2024).
Kotek, H., Sun, D. Q., Xiu, Z., Bowler, M. & Klein, C. Protected group bias and stereotypes in Large Language Models. Preprint at https://doi.org/10.48550/arXiv.2403.14727 (2024).
Dev, S. et al. On measures of biases and harms in NLP. In Findings of the Association for Computational Linguistics: AACL-IJCNLP. 246–267, Online only. https://doi.org/10.18653/v1/2022.findings-aacl.24 (2022).
Merrill, J. B. & Lerman, R. What do people really ask chatbots? It’s a lot of sex and homework. Washington Post (August 2024). https://www.washingtonpost.com/technology/2024/08/04/chatgpt-use-real-ai-chatbot-conversations/ (Accessed: 19th September 2024).
U.S. Office of Management and Budget. Initial Proposals for Updating OMB’s Race and Ethnicity Statistical Standards. Federal Register. Available at: https://www.federalregister.gov/documents/2023/01/27/2023-01635/initial-proposals-for-updating-ombs-race-and-ethnicity-statistical-standards (Accessed: 17th December 2023).
Deng, B. & Watson, T. LGBTQ+ data availability. Brookings (2023). Available at: https://www.brookings.edu/articles/lgbtq-data-availability-what-we-can-learn-from-four-major-surveys/ (Accessed: 17th December 2023).
Anderson, L., File, T., Marshall, J., McElrath, K. & Scherer, Z. New household pulse survey data reveal differences between LGBT and non-LGBT respondents during COVID-19 Pandemic. (Census.gov, 2022). Available at: https://www.census.gov/library/stories/2021/11/census-bureau-survey-explores-sexual-orientation-and-gender-identity.html (Accessed: 17th December 2023).
Cvencek, D., Meltzoff, A. N. & Greenwald, A. G. Math–gender stereotypes in elementary school children. Child Dev. 82, 766–779 (2011).
Murphy, M. C., Steele, C. M. & Gross, J. J. Signaling threat: how situational cues affect women in math, science, and engineering settings. Psychol. Sci. 18, 879–885 (2007).
Hurst, K. US women are outpacing men in college completion, including in every major racial and ethnic group. https://www.pewresearch.org/short-reads/2024/11/18/us-women-are-outpacing-men-in-college-completion-including-in-every-major-racial-and-ethnic-group/ (2024).
Huynh, Q.-L., Devos, T. & Smalarz, L. Perpetual foreigner in One’s Own Land: potential implications for identity and psychological adjustment. J. Soc. Clin. Psychol. 30, 133–162 (2011).
Gonzales, P. M., Blanton, H. & Williams, K. J. The effects of stereotype threat and double-minority status on the test performance of Latino women. Personal. Soc. Psychol. Bull. 28, 659–670 (2002).
Hemmatian, B. & Varshney, L. R. Debiased large language models still associate Muslims with uniquely violent acts. Preprint at https://doi.org/10.31234/osf.io/xpeka (2022).
Li, P. Recent developments: hitting the ceiling: an examination of barriers to success for Asian American women. Berkeley J. Gend. Law Justice 29, 140–167 (2014).
Steketee, A., Williams, M. T., Valencia, B. T., Printz, D. & Hooper, L. M. Racial and language microaggressions in the school ecology. Perspect. Psychol. Sci. 16, 1075–1098 (2021).
Aronson, B. A. The white savior industrial complex: a cultural studies analysis of a teacher educator, savior film, and future teachers. J. Crit. Thought Prax. 6, 36–54 (2017).
Alexander, M. The New Jim Crow: Mass Incarceration in the Age of Colorblindness, New Press, New York (2010).
Waugh, L. R. Marked and unmarked: a choice between unequals in semiotic structure. Semiotica 38, 299–318 (1982).
Felkner, V. K., Chang, H. C. H., Jang, E. & May, J. WinoQueer: A Community-in-the-Loop Benchmark for Anti-LGBTQ+ Bias in Large Language Models. In the Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 9126–9140. https://doi.org/10.18653/v1/2023.acl-long.507 (2023).
Deloria, P. J. Playing Indian. (Yale University Press, 2022).
Leavitt, P. A., Covarrubias, R., Perez, Y. A. & Fryberg, S. A. “Frozen in time”: The Impact of Native American Media Representations on Identity and Self-understanding. J. Soc. Issues 71, 39–53 (2015).
Witgen, M. An Infinity Of Nations: How The Native New World Shaped Early North America. (University of Pennsylvania Press, 2011).
Khalid, A. Central Asia: A New History From The Imperial Conquests To The Present. (Princeton University Press, 2021).
Said, E. W. Culture and Imperialism. (Vintage Books, 1994).
Dunbar-Ortiz, R. An Indigenous Peoples’ History Of The United States. (Beacon Press, 2023).
Dunbar-Ortiz, R. Not “a nation of immigrants”: Settler colonialism, white supremacy, and a history of erasure and exclusion. (Beacon Press, 2021).
Immerwahr, D. How to hide an empire: a history of the greater United States. 1st edn. (Farrar, Straus and Giroux, 2019).
Shieh, E. & Monroe-White, T. Teaching Parrots to See Red: Self-Audits of Generative Language Models Overlook Sociotechnical Harms. In Proceedings of the AAAI Symposium Series Vol. 6, 333–340 https://doi.org/10.1609/aaaiss.v6i1.36070 (2025).
Feffer, M., Sinha, A., Deng, W. H., Lipton, Z. C. & Heidari, H. Red-Teaming for Generative AI: Silver Bullet or Security Theater? Proceedings of the AAAI/ACM Conference on AI Ethics and Society 7, 421–437 (2024).
Schopmans, H. R. From Coded Bias to Existential Threat.In Proc. 2022 AAAI/ACM Conference on AI, Ethics, and Society https://doi.org/10.1145/3514094.3534161 (2022).
Askell, A. et al. A general language assistant as a laboratory for alignment. Preprint at https://doi.org/10.48550/arXiv.2112.00861 (2021).
Doshi, T. How we’ve created a helpful and responsible Bard experience for teens. Google: The Keyword–Product Updates. Retrieved from: https://blog.google/products/bard/google-bard-expansion-teens/ (2023).
Devinney, H., Björklund, J. & Björklund, H. We don’t talk about that: case studies on intersectional analysis of social bias in large language models. In Workshop on Gender Bias in Natural Language Processing (GeBNLP), Bangkok, Thailand, 16th August 2024. (33–44). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.gebnlp-1.3 (2024).
Narayanan Venkit, P., Gautam, S., Panchanadikar, R., Huang, T. H., & Wilson, S. Unmasking nationality bias: A study of human perception of nationalities in AI-generated articles. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (554–565) https://doi.org/10.1145/3600211.3604667 (2023).
Luccioni, A. S., Akiki, C., Mitchell, M. & Jernite, Y. Stable bias: evaluating societal representations in diffusion models. Adv. Neural Inf. Process. Syst. 36, 56338–56351 (2023).
Ghosh, S., Venkit, P. N., Gautam, S., Wilson, S., & Caliskan, A. Do generative AI models output harm while representing non-Western cultures: Evidence from a community-centered approach. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7, 476–489 (2024).
Raj, C., Mukherjee, A., Caliskan, A., Anastasopoulos, A. & Zhu, Z. BiasDora: Exploring Hidden Biased Associations in Vision-Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024. 1, 10439–10455 (2024).
Lee, M. H., Montgomery, J. M. & Lai, C. K. Large language models portray socially subordinate groups as more homogeneous, consistent with a bias observed in humans. In The 2024 ACM Conference on Fairness, Accountability, and Transparency (1321–1340) https://doi.org/10.1145/3630106.3658975 (2024).
Wang, A., Morgenstern, J. & Dickerson, J. P. Large language models that replace human participants can harmfully misportray and flatten identity groups. Nat. Mach. Intell. 7, 400–411 (2025).
Bargh, J. A. & Chartrand, T. L. Studying the mind in the middle: a practical guide to priming and automaticity. Handbook of Research Methods in Social and Personality Psychology Vol. 2, 253–285 (Cambridge University Press, 2000).
Guendelman, M. D., Cheryan, S. & Monin, B. Fitting in but getting fat: Identity threat and dietary choices among US immigrant groups. Psychol. Sci. 22, 959–967 (2011).
Hooker, S. Moving beyond “algorithmic bias is a data problem”. Patterns 2, 4 (2021).
Spencer, S. J., Logel, C. & Davies, P. G. Stereotype threat. Ann. Rev. Psychol. 67, 415–437 (2016).
Gaucher, D., Friesen, J. & Kay, A. C. Evidence that gendered wording in job advertisements exists and sustains gender inequality. J. Personal. Soc. Psychol. 101, 109 (2011).
Pataranutaporn, P. et al. AI-generated characters for supporting personalized learning and well-being. Nat. Mach. Intell. 3, 1013–1022 (2021).
Steele, C. M. A threat in the air: how stereotypes shape intellectual identity and performance. Am. Psychol. 52, 613 (1997).
McGee, E. “Black Genius, Asian Fail”: The Detriment of Stereotype Lift and Stereotype Threat in High-Achieving Asian and Black STEM Students. AERA Open (2018).
Dastin, J. US explores AI to train immigration officers on talking to refugees. Reuters. https://www.reuters.com/world/us/us-explores-ai-train-immigration-officers-talking-refugees-2024-05-08/ (Accessed: 13th May 2024).
Brown, B. A., Reveles, J. M. & Kelly, G. J. Scientific literacy and discursive identity: a theoretical framework for understanding science learning. Sci. Edu. 89, 779–802 (2005).
Mei, K., Fereidooni, S. & Caliskan, A. Bias against 93 stigmatized groups in masked language models and downstream sentiment classification tasks. In Proc. 2023 ACM Conference on Fairness, Accountability, and Transparency (1699–1710) (ACM, 2023).
Tan, X. E. et al. Towards Massive Multilingual Holistic Bias. Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP) 1, 403–426 (2025).
Nguyen, I., Suresh, H. & Shieh, E. Representational Harms in LLM-Generated Narratives Against Nationalities Located in the Global South. HEAL Workshop, CHI 2025 https://heal-workshop.github.io/chi2025_papers/50_Representational_Harms_in_L.pdf (2025).
Scao, T. L., et al. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100. https://doi.org/10.48550/arXiv.2211.05100 (2022).
White House. Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People (2022). https://bidenwhitehouse.archives.gov/ostp/ai-bill-of-rights/.
Hashmi, N., Lodge, S., Sugimoto, C. R., & Monroe-White, T. Echoes of Eugenics: Tracing the Ideological Persistence of Scientific Racism in Scholarly Discourse. In Proceedings of the 5th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 82–92 https://doi.org/10.1145/3757887.3768171 (2025).
Raji, I. D., Denton, E., Bender, E. M., Hanna, A. & Paullada, A. AI and the Everything in the Whole Wide World Benchmark. In Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2. https://doi.org/10.48550/arXiv.2111.15366 (2021).
Bommasani, R. et al. The Foundation Model Transparency Index. Preprint at https://arxiv.org/abs/2310.12941 (2023).
Edwards, B. Exponential growth brews 1 million AI models on Hugging Face. Arstechnica. https://arstechnica.com/information-technology/2024/09/ai-hosting-platform-surpasses-1-million-models-for-the-first-time/ (Accessed: 29th September 2024).
Cheryan, S., Plaut, V. C., Davies, P. G. & Steele, C. M. Ambient belonging: how stereotypical cues impact gender participation in computer science. J. Personal Soc Psychol. 97, 1045 (2009).
Sanders, M. G. Overcoming obstacles: Academic achievement as a response to racism and discrimination. J. Negro Educ. 66, 83–93 (1997).
Tanksley T. C. We’re changing the system with this one: Black students using critical race algorithmic literacies to subvert and survive AI-mediated racism in school. English Teaching: Practice & Critique, 23, 36–56, (2024).
Solyst, J., Yang, E., Xie, S., Ogan, A., Hammer, J. & Eslami, M. The potential of diverse youth as stakeholders in identifying and mitigating algorithmic bias for a future of fairer AI. Proc. ACM Hum.Comput. Interact. 7, 1–27 (2023).
Wilson, J. Proceed with Extreme Caution: Citation to Wikipedia in Light of Contributor Demographics and Content Policies. Vanderbilt J. Entertain. Technol. Law 16, 857 (2014).
Tzioumis, K. Demographic aspects of first names. Sci Data 5, 180025 (2018).
Sood, G. Florida Voter Registration Data (2017 and 2022). Harvard Dataverse https://doi.org/10.7910/DVN/UBIG3F (2022).
Rosenman, E. T. R., Olivella, S. & Imai, K. Race and ethnicity data for first, middle, and surnames. Sci Data 10, 299 (2023).
Bridgland, V. M. E., Jones, P. J. & Bellet, B. W. A Meta-analysis of the efficacy of trigger warnings, content warnings, and content notes. Clin. Psychol. Sci. 12, 751–771 (2022).
U.S. Department of Justice, Civil Rights Division. National Coalition on Black Civic Participation v. Wohl – Statement of Interest of the United States of America, Southern District of New York (2022). https://www.justice.gov/d9/case-documents/attachments/2022/08/12/ncbp_v_wohl_us_soi_filed_8_12_22_ro_tag.pdf.
Lukito, J. & Pruden, M. L. Critical computation: mixed-methods approaches to big language data analysis. Rev. Commun. 23, 62–78 (2023).
Griffith, E. & Metz, C. A New Area of A.I. Booms, Even Amid the Tech Gloom. The New York Times (2023).
U.S. White House. FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI. The White House. Available at: https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/ (Accessed: 17th December 2023).
Septiandri, A. A., Constantinides, M., Tahaei, M. & Quercia, D. WEIRD FAccTs: How Western, Educated, Industrialized, Rich, and Democratic is FAccT? In Proc. 2023 ACM Conference on Fairness, Accountability, and Transparency (160–171) (ACM, 2023).
Linxen, S., Sturm, C., Brühlmann, F., Cassau, V., Opwis, K. & Reinecke, K. How weird is CHI? In Proc. 2021 Chi Conference on Human Factors in Computing Systems (1–14) (ACM, 2021).
Atari, M., Xue, M. J., Park, P. S., Blasi, D. E. & Henrich, J. Which Humans? Preprint at https://doi.org/10.31234/osf.io/5b26t (2023).
U.S. Census Bureau QuickFacts: United States (2021). Available at: https://www.census.gov/quickfacts/fact/table/US/PST045222 (Accessed: 17th December 2023).
Master, A., Cheryan, S. & Meltzoff, A. N. Motivation and identity. In Handbook of Motivation at School (300-319). (Routledge, 2016).
Williams, J. C. Double jeopardy? An empirical study with implications for the debates over implicit bias and intersectionality. Harv. J. Law Gend. 37, 185 (2014).
Aronson, J., Quinn, D. M. & Spencer, S. J. Stereotype threat and the academic underperformance of minorities and women. In Prejudice (83–103) (Academic Press, 1998).
Cao, Y. T. & Daumé, H. III Toward gender-inclusive coreference resolution: an analysis of gender and bias throughout the machine learning lifecycle. Comput. Linguist. 47, 615–661 (2021).
Kozlowski, D. et al. Avoiding Bias When Inferring Race Using Name-based Approaches. PLoS ONE 17, e0264270 (2022).
Bolukbasi, T., Chang K. W., Zou, J., Saligrama, V. & Kalai, A. T. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Adv. Neural Inf. Process. Syst. 29 https://doi.org/10.48550/arXiv.1607.06520 (2016).
Antoniak, M. & Mimno, D. Bad Seeds: Evaluating Lexical Methods for Bias Measurement. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL, 2021).
Blitzer, J. Everyone Who Is Gone Is Here: The United States, Central America, and the Making of a Crisis. (Penguin Press, 2024).
Monroe-White, T. Emancipatory data science: a liberatory framework for mitigating data harms and fostering social transformation. In Proc. 2021 Computers and People Research Conference (23–30) (ACM, 2021).
Le, T.T., Himmelstein, D.S., Hippen, A.A., Gazzara, M.R. & Greene, C.S. Analysis of scientific society honors reveals disparities. Cell Syst. 12, 900–906.e5 (2021).
Willson, S., & Dunston, S. Cognitive Interview Evaluation of the Revised Race Question, with Special Emphasis on the Newly Proposed Middle Eastern/North African Response, National Center for Health Statistics. (2017).
Chin, M. K. et al. Manual on collection, analysis, and reporting of Asian American health data. AA & NH/PI Health Central. Available at: https://aanhpihealth.org/resource/asian-american-manual-2023/ (Accessed: 17th December 2023).
Rauh, M. et al. Characteristics of harmful text: towards rigorous benchmarking of language models. Adv. Neural Inf. Process. Syst. 35, 24720–24739 (2022).
Wilson, E. B. Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc. https://doi.org/10.1080/01621459.1927.10502953 (1927).
Katz, D. J. S. M., Baptista, J., Azen, S. P. & Pike, M. C. Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics. 34, 469–474 (1978).
Altman, D. G. & Bland, J. M. How to obtain the P value from a confidence interval. BMJ 343, d2304 (2011).
Gebru, T. et al. Datasheets for Datasets. Commun. ACM 64, 86–92 (2021).
Shieh, E., Vassel, F. M., Sugimoto, C. R., and Monroe-White, T. Intersectional biases in narratives generated by open-ended prompting of generative language models. GitHub. https://doi.org/10.5281/zenodo.17905666 (2025).
Acknowledgements
Authors T.M.-W. and C.R.S. acknowledge funding support from the National Science Foundation under award number SOS-2152288. F.-M.V. acknowledges funding support from the National Science Foundation under award number CCF-1918549. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Diego Kozlowski, Stella Chen, Rahul Gupta-Iwasaki, Gerald Higginbotham, Bryan Brown, Jay Kim, Dakota Murray, James Evans, Zarek Drozda, Ashley Ding, Princewill Okoroafor, and Hideo Mabuchi for helpful inputs and discussion on earlier versions of the manuscript.
Author information
Authors and Affiliations
Contributions
E.S. conceived the study; E.S. and T.M.-W. contributed to the design of the study; E.S. prepared the primary datasets; E.S. and T.M.-W. performed analysis; E.S., F.-M.V. and T.M.-W. contributed to the interpretation of the results; and E.S., F.-M.V., C.R.S. and T.M.-W. contributed to the writing of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shieh, E., Vassel, F.-M., Sugimoto, C. R. et al. Intersectional biases in narratives produced by open-ended prompting of generative language models. Nat. Commun. (2026). https://doi.org/10.1038/s41467-025-68004-9
DOI: https://doi.org/10.1038/s41467-025-68004-9