Table 2 Accuracy rates of ChatGPT-4o and DeepSeek in the English and Chinese versions across different groups.
Group | ChatGPT-4o: Value (%) | ChatGPT-4o: 95% CI (Wald binomial) | ChatGPT-4o: Standard deviation | DeepSeek: Value (%) | DeepSeek: 95% CI (Wald binomial) | DeepSeek: Standard deviation | p | Cohen’s d |
|---|---|---|---|---|---|---|---|---|
Group A | 82.2a (83.3a) | 72.7–89.5 (74–90.4) | 0.038 (0.037) | 50.0a (55.6a) | 39.3–60.7 (44.7–66) | 0.050 (0.050) | < 0.001* (< 0.001*) | 0.720 (0.629) |
Group B | 80.0a (80.0a) | 70.2–87.7 (70.2–87.7) | 0.040 (0.040) | 90.0b (80.0b) | 81.9–95.3 (70.2–87.7) | 0.030 (0.040) | 0.061 (1.000) | 0.281 (0.000) |
Group C | 90.0b (93.3b) | 81.9–95.3 (86.1–97.5) | 0.030 (0.025) | 100.0c (98.9c) | 96–100 (94–100) | 0.000 (0.011) | 0.002* (0.055) | 0.469 (0.289) |
Group D | 100.0cd (100.0cd) | 96–100 (96–100) | 0.000 (0.000) | 100.0cd (93.3cd) | 96–100 (86.1–97.5) | 0.000 (0.025) | 1.000 (0.013*) | 0.000 (0.373) |
Group E | 100.0cd (100.0cd) | 96–100 (96–100) | 0.000 (0.000) | 100.0cd (90.0cd) | 96–100 (81.9–95.3) | 0.000 (0.030) | 1.000 (0.002*) | 0.000 (0.465) |
Total | 90.4 (91.3) | 87.3–93 (88.3–93.8) | 0.029 (0.028) | 88.0 (83.6) | 84.6–90.9 (79.8–86.9) | 0.033 (0.037) | 0.247 (< 0.001*) | 0.077 (0.234) |
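
As a rough illustration of how the Cohen’s d column relates to the accuracy rates, the sketch below computes d from two proportions using a pooled sample standard deviation of 0/1 item scores. It rests on two assumptions that the table does not state: that each group contains 90 items (inferred from values such as 82.2% = 74/90), and that d was derived this way. The function name `cohens_d_binary` and the constant `N_ITEMS` are hypothetical.

```python
import math

N_ITEMS = 90  # assumption: items per group, inferred from 82.2% = 74/90


def cohens_d_binary(correct_1, correct_2, n=N_ITEMS):
    """Cohen's d for two equal-sized sets of 0/1 scores (pooled sample SD)."""
    p1, p2 = correct_1 / n, correct_2 / n
    # Sample variance (n - 1 denominator) of a 0/1 variable with mean p.
    var1 = p1 * (1 - p1) * n / (n - 1)
    var2 = p2 * (1 - p2) * n / (n - 1)
    pooled_sd = math.sqrt((var1 + var2) / 2)
    if pooled_sd == 0:
        return 0.0  # both models scored identically with no variance
    return abs(p1 - p2) / pooled_sd


# Counts assumed from the first value in each cell (not the value in parentheses).
d_group_a = cohens_d_binary(74, 45)  # 82.2% vs 50.0%
d_group_b = cohens_d_binary(72, 81)  # 80.0% vs 90.0%
d_group_c = cohens_d_binary(81, 90)  # 90.0% vs 100.0%
print(round(d_group_a, 3), round(d_group_b, 3), round(d_group_c, 3))
```

Under these assumptions the results fall within rounding of the tabulated 0.720, 0.281, and 0.469 for Groups A to C, which suggests, but does not confirm, that the effect sizes were computed on item-level correct/incorrect scores.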