Table 1 Pairwise comparison of situational judgment test (SJT) scores.
From: Large language models can outperform humans in social situational judgments
| | Human sample | Copilot | ChatGPT | Claude | Gemini |
|---|---|---|---|---|---|
| Copilot | 0.007 | – | – | – | – |
| ChatGPT | 1 | 0.003 | – | – | – |
| Claude | < 0.001 | 0.021 | 0.002 | – | – |
| Gemini | 1 | 0.003 | 0.891 | 0.002 | – |
| you.com | 0.028 | 0.891 | 0.020 | 0.003 | 0.010 |