Extended Data Fig. 2: Differences in the effects of slurs by identity across prompts (closed models). | Nature Human Behaviour

From: Multimodal large language models can make context-sensitive hate speech evaluations aligned with human judgement

This figure shows the difference in marginal means between users with an identity cue and anonymous users for each slur, and how these differences vary across prompts. Results are shown for each of the closed models (N_posts = 60,000 for each model). The top row shows results for human subjects (N_posts = 55,620, evaluated by N_subjects = 1,854). Each point represents the estimated difference in marginal means and is colored according to the identity depicted. The shape of each point denotes the prompt variant. Error bars are 95% confidence intervals: the MLLM results use bootstrap confidence intervals, and the human experiment results use standard errors clustered at the subject level.
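As a rough illustration of how a bootstrap interval for one such point could be computed, the sketch below resamples two groups of ratings and takes percentiles of the resampled difference in means. It rests on several assumptions that the caption does not state: a percentile bootstrap, plain group means in place of model-based marginal means, binary ratings, and the hypothetical function name `bootstrap_diff_ci`. The article's actual estimator may differ.

```python
import random
from statistics import mean


def bootstrap_diff_ci(with_cue, anonymous, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(with_cue) - mean(anonymous).

    Hypothetical helper for illustration only; not the article's code.
    """
    rng = random.Random(seed)
    # Point estimate: simple difference in group means (assumed stand-in
    # for the difference in marginal means).
    point = mean(with_cue) - mean(anonymous)

    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        cue_sample = rng.choices(with_cue, k=len(with_cue))
        anon_sample = rng.choices(anonymous, k=len(anonymous))
        diffs.append(mean(cue_sample) - mean(anon_sample))

    # Approximate percentile interval from the sorted bootstrap differences.
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


# Toy data (fabricated for the example): 1 = post rated hateful, 0 = not.
ratings_with_cue = [1] * 30 + [0] * 10
ratings_anonymous = [1] * 10 + [0] * 30
estimate, ci_low, ci_high = bootstrap_diff_ci(ratings_with_cue, ratings_anonymous)
```

The human-subject row would need a different treatment, since the caption reports clustering at the subject level; resampling individual ratings as above would ignore that each subject evaluated many posts.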