Table 11 Distribution of measured outcomes for the linguistic feature-based and LLM-based AES approaches.

From: Applying large language models for automated essay scoring for non-native Japanese

| Statistical metric | AES system | Mean score (Lexical richness) | Mean score (Syntactic complexity) | Mean score (Content) | Mean score (Grammatical accuracy) |
|---|---|---|---|---|---|
| QWK | Jess-human | 0.608 | 0.591 | 0.518 | 0.655 |
| QWK | JWriter-human | 0.600 | 0.589 | 0.521 | 0.661 |
| QWK | BERT-human | 0.653 | 0.652 | 0.638 | 0.671 |
| QWK | GPT-4-human | 0.665 | 0.655 | 0.657 | 0.689 |
| QWK | OCLL-human | 0.639 | 0.648 | 0.623 | 0.662 |
| QWK | Human-human | 0.657 | 0.639 | 0.578 | 0.677 |
| PRMSE | Jess-human | 0.691 | 0.658 | 0.601 | 0.689 |
| PRMSE | JWriter-human | 0.683 | 0.675 | 0.584 | 0.732 |
| PRMSE | BERT-human | 0.701 | 0.746 | 0.628 | 0.749 |
| PRMSE | GPT-4-human | 0.711 | 0.733 | 0.634 | 0.754 |
| PRMSE | OCLL-human | 0.687 | 0.699 | 0.619 | 0.694 |
| PRMSE | Human-human | 0.691 | 0.745 | 0.590 | 0.744 |
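
For readers unfamiliar with the first metric, the following is a minimal sketch of how a quadratic weighted kappa (QWK) between a system's scores and a human rater's scores is commonly computed. It assumes integer scores on a fixed scale; the function name, its parameters, and the sample scores are hypothetical illustrations and are not taken from the study, whose exact scoring scale and implementation are not given in this table. PRMSE is not sketched here.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa between two lists of integer scores.

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of essays")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("the scale needs at least two score categories")
    n = len(rater_a)

    # Observed agreement matrix: observed[i][j] counts essays scored
    # i by rater A and j by rater B (after shifting by min_score).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal histograms of each rater's scores.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic penalty grows with the squared score distance.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under chance agreement (independent marginals).
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        # Both raters gave every essay the same single score.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Made-up scores on an assumed 1-5 scale, for illustration only.
system_scores = [1, 2, 3, 4, 5]
human_scores = [1, 2, 3, 4, 5]
kappa = quadratic_weighted_kappa(system_scores, human_scores, 1, 5)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.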