Scientific Reports

Table 1 Pairwise comparison of SJT scores.

From: Large language models can outperform humans in social situational judgments

	Human sample	Copilot	ChatGPT	Claude	Gemini
Copilot	0.007	–	–	–	–
ChatGPT	1	0.003	–	–	–
Claude	< 0.001	0.021	0.002	–	–
Gemini	1	0.003	0.891	0.002	–
you.com	0.028	0.891	0.020	0.003	0.010

Values are Bonferroni–Holm adjusted p-values for all pairwise comparisons.
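
For readers unfamiliar with the adjustment, the following is a minimal sketch of the Bonferroni–Holm step-down procedure, not the authors' analysis code. The function name `holm_adjust` and the raw p-values in `example_raw` are hypothetical and chosen only for illustration; the table reports adjusted values only, and the article's unadjusted p-values are not reproduced here. With six groups there are 15 pairwise comparisons, so in the table's setting the number of tests m would be 15.

```python
def holm_adjust(p_values):
    """Return Bonferroni-Holm adjusted p-values in the original order.

    Sketch of the standard step-down procedure: sort the raw p-values in
    ascending order, multiply the k-th smallest (k = 0, 1, ...) by (m - k),
    enforce monotonicity with a running maximum, and cap at 1.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        scaled = (m - rank) * p_values[idx]
        # An adjusted value may never be smaller than one for a smaller raw p.
        running_max = max(running_max, scaled)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values, NOT taken from the article.
example_raw = [0.01, 0.04, 0.03, 0.005]
print(holm_adjust(example_raw))
```

Because adjusted values are capped at 1, entries of exactly 1 in a table of this kind are consistent with raw p-values whose scaled value reached or exceeded 1.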
