Table 8 Agreement between human scores and GPT-4 scores, measured by quadratic weighted kappa (QWK).

From: Applying large language models for automated essay scoring for non-native Japanese

 

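Human scoring vs. GPT-4 scoring

| Measure | Agreement | Measure | Agreement |
| --- | --- | --- | --- |
| MATTR | 0.655 | CN | 0.807 |
| LD | 0.819 | ACC | 0.794 |
| LS | 0.679 | CPC | 0.783 |
| MDD | 0.743 | SOPT | 0.798 |
| MLC | 0.812 | SOPK | 0.805 |
| VPT | 0.754 | word2vec | 0.644 |
| CT | 0.667 | IMM | 0.680 |
| DCT | 0.803 | GE | 0.648 |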
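For readers who want to see how an agreement value of this kind is obtained, the following is a minimal sketch of the standard quadratic weighted kappa computation for two sets of integer ratings. It is not the authors' implementation: the function name `quadratic_weighted_kappa`, the `min_rating`/`max_rating` parameters, and the example ratings are assumptions made for illustration, and this table does not state the rating scale used for each measure.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two equal-length lists of integer ratings.

    Hypothetical helper for illustration; the rating scale is an assumption
    supplied by the caller.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("the rating scale needs at least two categories")
    total = len(rater_a)

    # Observed confusion matrix: rows index rater A, columns index rater B.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    # Marginal rating counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty grows with the squared distance between ratings.
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_a[i] * hist_b[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used one and the same category throughout.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Made-up ratings on an assumed 1-5 scale (not data from the paper).
human_scores = [1, 2, 3, 4, 5]
gpt4_scores = [1, 2, 3, 4, 4]
kappa = quadratic_weighted_kappa(human_scores, gpt4_scores, 1, 5)
print(round(kappa, 3))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Because the weights are quadratic, a disagreement of two scale points is penalized four times as heavily as a disagreement of one point.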