Table 4 Consistency rates of ChatGPT-4o and DeepSeek on the English and Chinese versions of the questions across groups (mean ± standard deviation).

From: Evaluation of ChatGPT-4o and DeepSeek as tools for orthodontic health literacy in public dental education

| | Group A | Group B | Group C | Group D | Group E | Total | p1 | ICC |
|---|---|---|---|---|---|---|---|---|
| ChatGPT-4o | 95.56 ± 0.093 (96.67 ± 0.105) | 100 ± 0 (100 ± 0) | 100 ± 0 (96.67 ± 0.105) | 100 ± 0 (100 ± 0) | 100 ± 0 (100 ± 0) | 99.11 ± 0.043 (98.67 ± 0.065) | 0.086 (0.574) | 0.982 (0.984) |
| DeepSeek | 100 ± 0 (94.44 ± 0.141) | 100 ± 0 (100 ± 0) | 100 ± 0 (98.89 ± 0.035) | 100 ± 0 (96.67 ± 0.105) | 100 ± 0 (100 ± 0) | 100 ± 0 (98 ± 0.079) | 1.000 (0.437) | 1.000 (0.978) |
| p2 | 0.146 (0.543) | 1.000 (1.000) | 1.000 (0.942) | 1.000 (0.317) | 1.000 (1.000) | 1.000 (0.412) | | |

  
  1. Each cell shows the result for the English and Chinese versions of the same questions, written as English value (Chinese value).
  2. p1, Kruskal–Wallis test (comparison across Groups A to E); p2, Mann–Whitney U test (comparison between ChatGPT-4o and DeepSeek). ICC, intraclass correlation coefficient.
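The notes name the two tests but not the data layout behind them. The sketch below shows how the Kruskal–Wallis test (p1, five groups, so df = 4) and the Mann–Whitney U test (p2, two models) could be computed with the Python standard library. It is illustrative only and rests on assumptions not stated in the source: each question yields one consistency score, the scores are treated as independent samples, p2 uses the tie-corrected normal approximation without continuity correction, and a p-value of 1 is reported when every score is tied. The function names and the 0/1 scores are hypothetical, not the study's data. Library implementations such as SciPy's `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` may differ in these defaults (for example, `mannwhitneyu` applies a continuity correction by default).

```python
import math
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of t^3 - t over each set of t tied values (tie-correction term)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H with a chi-square p-value (df = k - 1)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    correction = 1 - tie_term(pooled) / (n ** 3 - n)
    if correction == 0:
        # Assumption: when every score is identical, report H = 0 and p = 1.
        return 0.0, 1.0
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = (12 / (n * (n + 1)) * total - 3 * (n + 1)) / correction
    df = len(groups) - 1
    if df % 2:
        raise ValueError("this closed-form p-value only handles even df")
    # Chi-square survival function for even df:
    # P(X > h) = exp(-h/2) * sum_{i < df/2} (h/2)^i / i!
    half = h / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return h, p


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test using the tie-corrected normal
    approximation (no continuity correction)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # Assumption: all scores tied -> p = 1
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical 0/1 consistency scores for six questions per group (NOT study data).
group_a = [1, 1, 0, 1, 1, 1]
others = [[1] * 6 for _ in range(4)]  # hypothetical Groups B to E, all consistent
h, p1 = kruskal_wallis(group_a, *others)  # five groups -> df = 4
u, p2 = mann_whitney(group_a, [1] * 6)
print(round(h, 3), round(p1, 3))  # 4.0 0.406
print(u, round(p2, 3))            # 15.0 0.317
```

The closed-form chi-square tail used for p1 is exact only for an even number of degrees of freedom, which covers the five-group comparison here; a general implementation would need the regularized incomplete gamma function instead.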