Table 1 Results of the generalized linear mixed model (GLMM).

From: Evaluation of DeepSeek-R1 and ChatGPT-4o on the Chinese national medical licensing examination: a multi-year comparative study

| Fixed effect              | Estimate (β) | SE    | Z       | p-value | Signif. |
|---------------------------|--------------|-------|---------|---------|---------|
| ChatGPT-4o (baseline)     | −0.265       | 0.117 | −2.267  | 0.023   | *       |
| DeepSeek-R1 vs ChatGPT-4o | −1.829       | 0.155 | −11.769 | < 0.001 | ***     |
| Year 2018                 | 0.037        | 0.137 | 0.268   | 0.788   |         |
| Year 2019                 | 0.350        | 0.139 | −3.173  | 0.002   | **      |
| Year 2020                 | 0.333        | 0.138 | −2.419  | 0.016   | *       |
| Year 2021                 | −0.337       | 0.138 | −2.420  | 0.016   | *       |
| Unit 2                    | 0.177        | 0.105 | 1.676   | 0.094   |         |
| Unit 3                    | 0.344        | 0.105 | 3.279   | 0.001   | **      |
| Unit 4                    | −0.017       | 0.107 | −0.164  | 0.870   |         |
| DeepSeek-R1 × Year 2018   | 0.330        | 0.205 | 1.612   | 0.107   |         |
| DeepSeek-R1 × Year 2019   | −0.050       | 0.223 | −0.224  | 0.822   |         |
| DeepSeek-R1 × Year 2020   | −0.625       | 0.240 | −2.602  | 0.009   | **      |
| DeepSeek-R1 × Year 2021   | −0.361       | 0.230 | −1.571  | 0.116   |         |

* p < 0.05, ** p < 0.01, *** p < 0.001 (two-tailed). In the GLMM, Unit 1 served as the reference category for subject unit and is therefore not shown in the table.
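The sketch below shows how the Z, p-value and Signif. columns relate to Estimate (β) and SE. It recomputes the Wald statistic Z = β/SE for a few rows, converts it to a two-tailed p-value under the standard normal distribution, and assigns the marks defined in the footnote.

It rests on several assumptions that this table does not state:

- The GLMM uses a logit link, so exp(β) reads as an odds ratio.
- The model uses treatment coding, with ChatGPT-4o, Unit 1 and an unlisted reference year as baselines.
- The outcome coding (correct vs incorrect answer) is not given here, so the direction of the odds ratios is left uninterpreted.

All function names are hypothetical. Because β and SE are rounded to three decimals, the recomputed Z and p-values match the reported ones only approximately.

```python
import math

# (term, beta, SE) copied from Table 1; values are rounded to 3 decimals,
# so recomputed statistics only approximate the reported ones.
ROWS = [
    ("ChatGPT-4o (baseline)", -0.265, 0.117),
    ("DeepSeek-R1 vs ChatGPT-4o", -1.829, 0.155),
    ("Year 2018", 0.037, 0.137),
    ("Unit 3", 0.344, 0.105),
    ("DeepSeek-R1 × Year 2020", -0.625, 0.240),
]


def wald_z(beta: float, se: float) -> float:
    """Wald statistic: the estimate divided by its standard error."""
    return beta / se


def two_tailed_p(z: float) -> float:
    """Two-tailed normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def stars(p: float) -> str:
    """Significance marks as defined in the table footnote."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def odds_ratio(beta: float) -> float:
    """exp(beta); an odds ratio only if the GLMM uses a logit link (assumed)."""
    return math.exp(beta)


def combined_log_odds(main_effect: float, interaction: float) -> float:
    """Model contrast within one year, assuming treatment coding:
    main effect + model-by-year interaction. Point estimate only; its SE
    would need the coefficient covariance matrix, which Table 1 omits."""
    return main_effect + interaction


if __name__ == "__main__":
    for term, beta, se in ROWS:
        z = wald_z(beta, se)
        p = two_tailed_p(z)
        print(f"{term:28s} z={z:8.3f} p={p:.4f} {stars(p):3s} "
              f"exp(beta)={odds_ratio(beta):.3f}")
    # DeepSeek-R1 vs ChatGPT-4o in 2020: main effect + DeepSeek-R1 × Year 2020
    print(f"2020 contrast (log-odds): {combined_log_odds(-1.829, -0.625):.3f}")
```

Under these assumptions, the interaction rows are not standalone model comparisons. They shift the DeepSeek-R1 vs ChatGPT-4o main effect within a given year. That is why the sketch adds the two estimates rather than reading the interaction row alone.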