Table 3 Effect of language on accuracy rates of ChatGPT-4o and DeepSeek across different groups (mean ± standard deviation).

From: Evaluation of ChatGPT-4o and DeepSeek as tools for orthodontic health literacy in public dental education

 

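| Group | ChatGPT-4o: English | ChatGPT-4o: Chinese | ChatGPT-4o: p | ChatGPT-4o: Cohen’s d | DeepSeek: English | DeepSeek: Chinese | DeepSeek: p | DeepSeek: Cohen’s d |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Group A | 82.2 ± 0.038 | 83.3 ± 0.037 | 0.844 | 0.029 | 50 ± 0.050 | 55.6 ± 0.050 | 0.457 | 0.111 |
| Group B | 80 ± 0.040 | 80 ± 0.040 | 1.000 | 0.000 | 90 ± 0.030 | 80 ± 0.040 | 0.061 | 0.281 |
| Group C | 90 ± 0.030 | 93.3 ± 0.025 | 0.402 | 0.120 | 100 ± 0 | 98.9 ± 0.010 | 0.317 | 0.149 |
| Group D | 100 ± 0 | 100 ± 0 | 1.000 | 0.000 | 100 ± 0 | 93.3 ± 0.025 | 0.013* | 0.374 |
| Group E | 100 ± 0 | 100 ± 0 | 1.000 | 0.000 | 100 ± 0 | 90 ± 0.03 | 0.002* | 0.466 |
| Total | 90.4 ± 0.0294 | 91.3 ± 0.0282 | 0.643 | 0.031 | 88 ± 0.0325 | 83.6 ± 0.0371 | 0.056 | 0.127 |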

  1. Bold values indicate statistically significant differences.
  2. *p < 0.05.
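
The table does not state how Cohen’s d was computed. As a reading aid only, the sketch below shows one common way to obtain an effect size from two accuracy rates, under the assumption (not confirmed by the source) that every answer is scored 0 or 1, so each group's variance is p(1 − p) and the two variances are averaged into a pooled standard deviation. The function name `cohens_d_from_accuracies` is hypothetical.

```python
import math


def cohens_d_from_accuracies(acc_a: float, acc_b: float) -> float:
    """Effect size between two accuracy rates given in percent.

    Assumption (not taken from the source): each answer is a 0/1 outcome,
    so the variance within a group is p * (1 - p).
    """
    p_a, p_b = acc_a / 100.0, acc_b / 100.0
    # Pooled standard deviation: root of the mean of the two variances.
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    diff = abs(p_a - p_b)
    if pooled_sd == 0:
        # Neither group shows any variation (both at 0% or 100%).
        return 0.0 if diff == 0 else math.inf
    return diff / pooled_sd


if __name__ == "__main__":
    # DeepSeek, Group B: English 90 vs Chinese 80.
    print(round(cohens_d_from_accuracies(90, 80), 3))
    # ChatGPT-4o, Group D: both languages at 100.
    print(cohens_d_from_accuracies(100, 100))
```

Under this assumption, the DeepSeek Group B pair (90 vs 80) gives about 0.28, which is close to the tabulated 0.281, and identical accuracies give 0. The small remaining gaps suggest the authors may have applied a sample-size correction, which the sketch does not attempt to reproduce.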