
Research Briefing

Measuring context sensitivity in artificial intelligence content moderation

Automated content-moderation systems designed to detect prohibited content on social media often struggle to account for contextual information, which can lead them to erroneously flag innocuous content. An experimental study on hate speech shows how multimodal large language models can enable more context-sensitive content moderation on social media.
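As a rough illustration of the general approach (not the study's actual protocol), the sketch below shows how a post might be presented to a multimodal model together with contextual cues about its author. The helper `query_mllm`, the prompt wording, and the labels are all hypothetical stand-ins for whatever multimodal chat API and rubric a deployment would actually use.

```python
# Minimal sketch of context-sensitive evaluation with a multimodal LLM.
# `query_mllm` is a hypothetical stand-in for a real multimodal chat API;
# the prompt wording and labels are illustrative, not the study's protocol.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    text: str                         # the post's text
    author_profile: str               # contextual cue, e.g. author description
    image_path: Optional[str] = None  # optional attached image


def query_mllm(prompt: str, image: Optional[str] = None) -> str:
    """Placeholder: replace with a call to an actual multimodal model."""
    return "NOT_HATEFUL"


def evaluate(post: Post) -> str:
    """Ask the model for a judgement that takes author context into account."""
    prompt = (
        "You are a content moderator. Using the author's profile as context, "
        "decide whether the following post is hate speech.\n"
        f"Author profile: {post.author_profile}\n"
        f"Post text: {post.text}\n"
        "Answer with exactly one label: HATEFUL or NOT_HATEFUL."
    )
    return query_mllm(prompt, image=post.image_path)


print(evaluate(Post(text="example post", author_profile="example profile")))
```

The key design point is that the author profile (and any attached image) travels with the post into the same evaluation, rather than the classifier seeing the text in isolation.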


Fig. 1: Evaluations of racialized language vary depending on the author.
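The figure reflects a conjoint-style design in which author attributes are randomized across otherwise comparable posts (see ref. 4). As a minimal sketch, assuming invented placeholder data rather than the study's results, the difference in mean flag rates across randomized author groups estimates the effect of the author attribute:

```python
# Minimal sketch: quantifying how a randomized author attribute shifts
# hate-speech flags, in the spirit of a conjoint analysis (ref. 4).
# The data below are invented placeholders, not the study's results.

import pandas as pd

# One row per evaluated post; author_group was randomly assigned.
df = pd.DataFrame({
    "author_group": ["A", "A", "A", "B", "B", "B"],
    "flagged":      [1,   1,   0,   0,   1,   0],  # 1 = flagged as hateful
})

# With random assignment, the difference in mean flag rates across groups
# estimates the average marginal component effect (AMCE) of authorship.
rates = df.groupby("author_group")["flagged"].mean()
print(rates)
print(f"Effect of author group A vs B: {rates['A'] - rates['B']:+.2f}")
```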

References

  1. Wilson, R. A. & Land, M. K. Hate speech on social media: content moderation in context. Conn. Law Rev. 52, 1029–1076 (2021). This review discusses online hate speech, related legal debates and the importance of context.


  2. Davidson, T. et al. Automated hate speech detection and the problem of offensive language. Proc. ICWSM 11, 512–515 (2017). This paper shows how hate speech is often conflated with offensive language.


  3. Sap, M. et al. The risk of racial bias in hate speech detection. Proc. ACL 57, 1668–1679 (2019). This paper demonstrates racial bias in hate speech detection models.


  4. Hainmueller, J., Hopkins, D. J. & Yamamoto, T. Causal inference in conjoint analysis: understanding multidimensional choices via stated preference experiments. Polit. Anal. 22, 1–30 (2014). This paper provides a thorough introduction to conjoint experiments.


  5. Vecchiato, A. & Munger, K. Introducing the visual conjoint, with an application to candidate evaluation on social media. J. Exp. Polit. Sci. 12, 57–71 (2025). This paper introduces visual conjoint designs.



Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is a summary of: Davidson, T. Multimodal large language models can make context-sensitive hate speech evaluations aligned with human judgement. Nat. Hum. Behav. https://doi.org/10.1038/s41562-025-02360-w (2025).


Cite this article

Measuring context sensitivity in artificial intelligence content moderation. Nat Hum Behav (2025). https://doi.org/10.1038/s41562-025-02363-7

