Table 3 Cohen's Kappa statistics for inter-coder reliability between the human coder and ChatGPT

From: Evaluating large language models in analysing classroom dialogue

| Codes | Math lessons | Chinese lessons |
| --- | --- | --- |
| ELI | 0.973*** | 0.995*** |
| EL | 0.961*** | 0.977*** |
| REI | 0.838*** | 0.962*** |
| RE | 0.947*** | 0.932*** |
| CI | 0.497*** | −0.005 |
| SC | 0.216*** | 0.004 |
| RC | −0.002 | 0^a |
| A | 0.843*** | 0.992*** |
| Q | 0.735*** | 0.830*** |
| RB | 0.909*** | 0.874*** |
| RW | 0.662*** | −0.004 |
| SU | 0.611*** | −0.010 |
| SA | 0.702*** | 0.529*** |
| OI | 0.962*** | 0.958*** |
| O | 0.944*** | 0.953*** |

  1. This table presents Cohen's Kappa statistics for inter-coder reliability between the human coder and ChatGPT across the coding scheme's codes in math and Chinese lessons. The statistics measure the level of agreement between the two coding methods; higher values indicate stronger agreement.
  2. ^a No statistics are computed because in the human coding results the frequency of RC is 0.
  3. ***Indicates significance at p < 0.001.
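For reference, Cohen's kappa compares the observed agreement between two coders against the agreement expected by chance from each coder's marginal label frequencies: κ = (p_o − p_e)/(1 − p_e). The sketch below is a minimal pure-Python illustration of that formula on hypothetical binary presence/absence labels for a single code; the data are invented for the example and are not from the study. It also shows why the RC cell is undefined: when expected agreement is 1 (e.g., one coder never assigns the code and the other's labels force p_e = 1), the denominator is zero and no statistic can be computed.

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two raters labelling the same items."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    # Observed agreement: proportion of items where the raters match.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / (n * n)
    if p_e == 1:
        # Degenerate marginals (e.g., a code never used): kappa is undefined,
        # matching the "no statistics computed" case for RC in the table.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels: 1 = code assigned to the utterance, 0 = not assigned.
human   = [1, 1, 0, 1, 0, 0, 1, 0]
chatgpt = [1, 1, 0, 1, 0, 1, 1, 0]
print(cohen_kappa(human, chatgpt))  # 7/8 observed vs 0.5 chance -> 0.75
```

In practice a library routine such as `sklearn.metrics.cohen_kappa_score` would typically be used instead of a hand-rolled function; the explicit version above just makes the p_o and p_e terms visible.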