Table 1. Pairwise comparisons of SJT (situational judgment test) scores.

From: Large language models can outperform humans in social situational judgments

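|         | Human sample | Copilot | ChatGPT | Claude | Gemini |
|---------|--------------|---------|---------|--------|--------|
| Copilot | 0.007        |         |         |        |        |
| ChatGPT | 1            | 0.003   |         |        |        |
| Claude  | <0.001       | 0.021   | 0.002   |        |        |
| Gemini  | 1            | 0.003   | 0.891   | 0.002  |        |
| you.com | 0.028        | 0.891   | 0.020   | 0.003  | 0.010  |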

1. Each cell gives the Bonferroni–Holm-adjusted p-value for the pairwise comparison between the row group and the column group.
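The Bonferroni–Holm (Holm step-down) adjustment named in the footnote can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The function name `holm_adjust` and the example p-values are hypothetical and are not taken from the study. With six groups (the human sample and five chatbots), the table holds 15 pairwise comparisons, so m = 15 if the correction family is exactly these comparisons (an assumption). The procedure sorts the raw p-values, multiplies the k-th smallest by (m - k + 1), carries a running maximum so the adjusted values never decrease, and caps them at 1. That cap is consistent with the entries of exactly 1 in the table.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values in the original input order.

    Hypothetical helper for illustration; not taken from the study.
    """
    m = len(pvalues)
    # Indices of the raw p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1,
        # then capped at 1.
        value = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted value may not fall below earlier ones.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values, not from the study.
    raw = [0.01, 0.04, 0.03, 0.005]
    print(holm_adjust(raw))
```

Because of the running maximum, two comparisons can end up with identical adjusted p-values even when their raw p-values differ.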