Table 3 Effect of language on the accuracy rates of ChatGPT-4o and DeepSeek across different groups (mean ± standard deviation).
Group | ChatGPT-4o English | ChatGPT-4o Chinese | ChatGPT-4o p | ChatGPT-4o Cohen’s d | DeepSeek English | DeepSeek Chinese | DeepSeek p | DeepSeek Cohen’s d |
|---|---|---|---|---|---|---|---|---|
Group A | 82.2 ± 0.038 | 83.3 ± 0.037 | 0.844 | 0.029 | 50 ± 0.050 | 55.6 ± 0.050 | 0.457 | 0.111 |
Group B | 80 ± 0.040 | 80 ± 0.040 | 1.000 | 0.000 | 90 ± 0.030 | 80 ± 0.040 | 0.061 | 0.281 |
Group C | 90 ± 0.030 | 93.3 ± 0.025 | 0.402 | 0.120 | 100 ± 0 | 98.9 ± 0.010 | 0.317 | 0.149 |
Group D | 100 ± 0 | 100 ± 0 | 1.000 | 0.000 | 100 ± 0 | 93.3 ± 0.025 | 0.013* | 0.374 |
Group E | 100 ± 0 | 100 ± 0 | 1.000 | 0.000 | 100 ± 0 | 90 ± 0.030 | 0.002* | 0.466 |
Total | 90.4 ± 0.0294 | 91.3 ± 0.0282 | 0.643 | 0.031 | 88 ± 0.0325 | 83.6 ± 0.0371 | 0.056 | 0.127 |
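
The table does not state how the Cohen’s d values were obtained. As a rough illustration only, the sketch below assumes each answer is scored 0 or 1, that the two language conditions have equal sample sizes, and that d is the difference in accuracy divided by the pooled standard deviation. The function name `cohens_d_from_proportions` is hypothetical and is not taken from the source.

```python
import math


def cohens_d_from_proportions(p1: float, p2: float) -> float:
    """Cohen's d for two equal-sized groups of 0/1 scores with means p1 and p2.

    Assumption (not stated in the table): the population variance of a
    0/1 variable with mean p, i.e. p * (1 - p), is used for each group.
    """
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    if pooled_sd == 0:
        # Neither group has any spread: d is 0 when the means coincide,
        # and unbounded when they differ.
        return 0.0 if p1 == p2 else math.inf
    return abs(p1 - p2) / pooled_sd


# DeepSeek, Group B: 90% (English) vs 80% (Chinese)
d_group_b = cohens_d_from_proportions(0.90, 0.80)
print(round(d_group_b, 3))
```

Under these assumptions the Group B DeepSeek comparison gives a value of about 0.28, close to the tabulated 0.281. Small differences from the table are to be expected if the original analysis used a sample-variance correction or a different pooling formula.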