Extended Data Fig. 1: Differences in the effects of slurs by identity for generic insults and homophobia. | Nature Human Behaviour

From: Multimodal large language models can make context-sensitive hate speech evaluations aligned with human judgement

This figure shows the difference in marginal means for generic insults and homophobia between users with a specified race and gender and the reference group of anonymous users. Each column shows the results for one slur type. Each point represents the estimated difference in marginal means and is colored according to the identity depicted. The top row shows results for human subjects (Nposts = 55,620, evaluated by Nsubjects = 1,854). The remaining rows show results for each model tested, with Nposts = 60,000 per model. Error bars are 95% confidence intervals: the MLLM results use bootstrap confidence intervals, and the human experiment results use standard errors clustered at the subject level.
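As a rough illustration of how a bootstrap confidence interval for a difference in means against a reference group can be computed, the sketch below uses a simple percentile bootstrap on two lists of ratings. It is a simplified assumption, not the authors' procedure: the function name `bootstrap_diff_ci`, the binary ratings, the number of resamples, and the use of raw group means in place of model-based marginal means are all hypothetical choices made for the example.

```python
import random
from statistics import mean


def bootstrap_diff_ci(group, reference, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group) - mean(reference).

    Hypothetical helper: the paper's marginal means come from its own
    estimation procedure, which this sketch does not reproduce.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        g = rng.choices(group, k=len(group))
        r = rng.choices(reference, k=len(reference))
        diffs.append(mean(g) - mean(r))
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles, clamped to valid indices.
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return mean(group) - mean(reference), diffs[lo_idx], diffs[hi_idx]


# Made-up ratings (1 = post judged hateful, 0 = not); not data from the study.
identity_ratings = [1] * 30 + [0] * 70
anonymous_ratings = [1] * 20 + [0] * 80

estimate, lower, upper = bootstrap_diff_ci(identity_ratings, anonymous_ratings)
print(f"difference = {estimate:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The human-subject rows would need a different treatment, since each subject rates many posts: resampling or clustering at the subject level is what keeps the interval from being too narrow.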