Table 8 Agreement between human and GPT-4 scores, measured by quadratic weighted kappa (QWK).
From: Applying large language models for automated essay scoring for non-native Japanese
| Comparison | Measure | Agreement | Measure | Agreement |
|---|---|---|---|---|
| Human scoring vs. GPT-4 scoring | MATTR | 0.655 | CN | 0.807 |
| | LD | 0.819 | ACC | 0.794 |
| | LS | 0.679 | CPC | 0.783 |
| | MDD | 0.743 | SOPT | 0.798 |
| | MLC | 0.812 | SOPK | 0.805 |
| | VPT | 0.754 | word2vec | 0.644 |
| | CT | 0.667 | IMM | 0.680 |
| | DCT | 0.803 | GE | 0.648 |
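
For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how a quadratic weighted kappa between two raters can be computed. It is not the authors' code: the function name `quadratic_weighted_kappa`, the 1 to 4 score range, and the sample scores are hypothetical and chosen only for illustration.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Cohen's kappa with quadratic weights for two lists of integer scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("at least two score categories are required")
    n = len(rater_a)

    # Observed confusion matrix: observed[i][j] counts items scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic penalty: grows with the squared distance between scores.
            weight = (i - j) ** 2 / (k - 1) ** 2
            # Count expected under chance agreement, from the marginals.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Degenerate case (assumption of this sketch): both raters gave every
        # item the same single score, which is treated here as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on an assumed 1-4 scale, not data from the study.
human = [1, 2, 3, 4]
gpt4_same = [1, 2, 3, 4]
gpt4_reversed = [4, 3, 2, 1]
kappa_same = quadratic_weighted_kappa(human, gpt4_same, 1, 4)
kappa_reversed = quadratic_weighted_kappa(human, gpt4_reversed, 1, 4)
```

Under this definition, identical score lists give a value of 1, and disagreements are penalised in proportion to the squared distance between the two scores, so a one-point difference costs far less than a three-point difference.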