Fig. 2

Polarity score agreement among the mean score, individual human reviewers, and large language models (LLMs) across the human-reviewed sample (n = 123 responses). The heatmap shows pairwise agreement in response polarity scores between human reviewers and LLM classes, measured with Cohen’s kappa.
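
For readers who want to reproduce this kind of agreement matrix, the following is a minimal sketch of pairwise Cohen's kappa. It assumes the unweighted form of the statistic and categorical polarity labels; the caption does not say whether a weighted variant was used. The function names, rater names, and label values are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappa(ratings):
    """Symmetric rater-by-rater kappa matrix as nested dicts."""
    names = list(ratings)
    # Diagonal is set to 1.0 (a rater compared with itself).
    matrix = {r: {c: 1.0 for c in names} for r in names}
    for r, c in combinations(names, 2):
        matrix[r][c] = matrix[c][r] = cohen_kappa(ratings[r], ratings[c])
    return matrix


# Hypothetical polarity labels (-1 negative, 0 neutral, 1 positive).
ratings = {
    "reviewer_1": [1, 0, -1, 1, 0, -1, 1, 1],
    "reviewer_2": [1, 0, -1, 0, 0, -1, 1, 1],
    "llm_a":      [1, 1, -1, 1, 0, -1, 0, 1],
}
matrix = pairwise_kappa(ratings)
for name, row in matrix.items():
    print(name, {other: round(k, 2) for other, k in row.items()})
```

The resulting nested dictionary can be passed to any plotting library to draw a heatmap. Because polarity scores are ordered, a weighted kappa may be a better fit in practice; the unweighted form is used here only to keep the sketch short.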