Table 2 Discrepancy analyses between ChatGPT model predictions and human judgments for current stimuli.
Variable | N | Mean | SD | 95% CI | t | p (uncorrected) | p (BH-FDR) | p (Bonferroni) | Cohen’s d |
|---|---|---|---|---|---|---|---|---|---|
davinci-human simp. diff | 60 | 0.20 | 1.04 | [−0.07, 0.47] | 1.49 | 0.141 | 0.141 | > 0.99 | 0.19 |
moral scenarios | 20 | 0.80 | 0.49 | [0.57, 1.03] | 7.28 | < 0.001 | < 0.001 | < 0.001 | 1.63 |
neutral scenarios | 20 | 0.81 | 0.77 | [0.44, 1.17] | 4.65 | < 0.001 | < 0.001 | 0.004 | 1.04 |
immoral scenarios | 20 | −1.00 | 0.51 | [−1.24, −0.76] | −8.76 | < 0.001 | < 0.001 | < 0.001 | −1.96 |
davinci-human abs. diff | 60 | 0.91 | 0.54 | [0.77, 1.05] | 13.06 | < 0.001 | < 0.001 | < 0.001 | 1.69 |
moral scenarios | 20 | 0.81 | 0.47 | [0.59, 1.03] | 7.68 | < 0.001 | < 0.001 | < 0.001 | 1.72 |
neutral scenarios | 20 | 0.91 | 0.64 | [0.61, 1.21] | 6.34 | < 0.001 | < 0.001 | < 0.001 | 1.42 |
immoral scenarios | 20 | 1.01 | 0.50 | [0.77, 1.24] | 9.08 | < 0.001 | < 0.001 | < 0.001 | 2.03 |
davinci-human sq. diff | 60 | 1.11 | 1.09 | [0.83, 1.39] | 7.89 | < 0.001 | < 0.001 | < 0.001 | 1.02 |
moral scenarios | 20 | 0.86 | 0.65 | [0.56, 1.17] | 5.89 | < 0.001 | < 0.001 | < 0.001 | 1.32 |
neutral scenarios | 20 | 1.22 | 1.39 | [0.57, 1.87] | 3.93 | < 0.001 | 0.001 | 0.022 | 0.88 |
immoral scenarios | 20 | 1.25 | 1.11 | [0.73, 1.77] | 5.02 | < 0.001 | < 0.001 | 0.002 | 1.12 |
gpt-4o-human simp. diff | 60 | 0.32 | 0.79 | [0.12, 0.52] | 3.15 | 0.003 | 0.003 | 0.062 | 0.41 |
moral scenarios | 20 | 0.58 | 0.27 | [0.45, 0.71] | 9.48 | < 0.001 | < 0.001 | < 0.001 | 2.12 |
neutral scenarios | 20 | 0.96 | 0.56 | [0.70, 1.22] | 7.65 | < 0.001 | < 0.001 | < 0.001 | 1.71 |
immoral scenarios | 20 | −0.58 | 0.43 | [−0.78, −0.38] | −6.03 | < 0.001 | < 0.001 | < 0.001 | −1.35 |
gpt-4o-human abs. diff | 60 | 0.72 | 0.44 | [0.60, 0.83] | 12.56 | < 0.001 | < 0.001 | < 0.001 | 1.62 |
moral scenarios | 20 | 0.58 | 0.27 | [0.45, 0.71] | 9.48 | < 0.001 | < 0.001 | < 0.001 | 2.12 |
neutral scenarios | 20 | 1.00 | 0.49 | [0.77, 1.22] | 9.18 | < 0.001 | < 0.001 | < 0.001 | 2.05 |
immoral scenarios | 20 | 0.58 | 0.42 | [0.38, 0.78] | 6.15 | < 0.001 | < 0.001 | < 0.001 | 1.38 |
gpt-4o-human sq. diff | 60 | 0.71 | 0.77 | [0.51, 0.91] | 7.16 | < 0.001 | < 0.001 | < 0.001 | 0.92 |
moral scenarios | 20 | 0.41 | 0.32 | [0.26, 0.56] | 5.62 | < 0.001 | < 0.001 | < 0.001 | 1.26 |
neutral scenarios | 20 | 1.22 | 0.99 | [0.75, 1.68] | 5.52 | < 0.001 | < 0.001 | < 0.001 | 1.23 |
immoral scenarios | 20 | 0.51 | 0.59 | [0.23, 0.79] | 3.85 | 0.001 | 0.001 | 0.026 | 0.86 |
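
The t and Cohen’s d columns are consistent with a one-sample t-test of each mean difference against zero, with d taken as the mean divided by the SD. The following Python sketch reproduces those two quantities from the reported summary statistics and shows standard Bonferroni and Benjamini-Hochberg adjustments. It is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, and the choice of a one-sample test is inferred from the table rather than stated in it.

```python
import math


def one_sample_summary(mean, sd, n):
    """Return (t, d) for a one-sample t-test against zero.

    Assumption: the test is against zero and Cohen's d = mean / SD.
    """
    standard_error = sd / math.sqrt(n)
    t = mean / standard_error
    d = mean / sd
    return t, d


def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted p values, returned in input order."""
    m = len(pvals)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Summary statistics from the "davinci-human simp. diff, moral scenarios" row.
    t, d = one_sample_summary(mean=0.80, sd=0.49, n=20)
    print(f"t = {t:.2f}, d = {d:.2f}")
```

For the moral-scenarios row under davinci-human simp. diff (mean 0.80, SD 0.49, N 20), this gives t of about 7.30 and d of about 1.63, against the reported 7.28 and 1.63; the small gap in t is what one would expect from working with the rounded mean and SD. The adjustment functions need the unrounded p values, which the table does not report.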