Table 2. Measures of nonnative Japanese writing proficiency.

From: Applying large language models for automated essay scoring for non-native Japanese

| Criteria | Measures | Code | Calculation |
| --- | --- | --- | --- |
| Lexical richness | Lexical diversity | MATTR | Moving average type token ratio per essay: MATTR(W)word form = \(\frac{{\sum }_{{\rm{i}}=1}^{{\rm{N}}-{\rm{W}}+1}{{\rm{F}}}_{{\rm{i}}}}{{\rm{W}}({\rm{N}}-{\rm{W}}+1)}\) |
| | Lexical density | LD | # of lexical words (token) / # of words |
| | Lexical sophistication | LS | # of sophisticated word types / total # of words per essay |
| Syntactic complexity | Mean dependency distance | MDD | \({\rm{MDD}}=\frac{1}{n}{\sum }_{i=1}^{n}\lvert {\rm{DD}}_{i}\rvert\) |
| | Mean length of clause | MLC | # of words / # of clauses |
| | Verb phrases per T-unit | VPT | # of verb phrases / # of T-units |
| | Clauses per T-unit | CT | # of clauses / # of T-units |
| | Dependent clauses per T-unit | DCT | # of dependent clauses / # of T-units |
| | Complex nominals per T-unit | CNT | # of complex nominals / # of T-units |
| | Adverbial clauses rate | ACC | # of adverbial clauses / # of clauses |
| | Coordinate phrases rate | CPC | # of coordinate phrases / # of clauses |
| Cohesion | Semantic similarity | SOPT | Synonym overlap/paragraph (topic) |
| | | SOPK | Synonym overlap/paragraph (keywords) |
| | word2vec cosine similarity | word2vec | (a) word2vec → (b) cosine similarity between sample and reference |
| Content elaboration | Metadiscourse marker rate | IMM | # of metadiscourse markers (type) / # of words |
| Grammatical accuracy | Grammatical error rate | GER | # of errors per essay |

*The term “token” refers to an occurrence of a word in the text.
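
The two formulas in the table (MATTR and MDD) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `mattr` and `mean_dependency_distance` are hypothetical, tokenization and dependency parsing are assumed to have been done beforehand, F_i is read as the number of word types in the i-th window of W tokens, and the fallback to a plain type-token ratio for essays shorter than W is an assumption that the table does not specify.

```python
from collections import Counter
from typing import Sequence, Tuple


def mattr(tokens: Sequence[str], window: int) -> float:
    """Moving-average type-token ratio: sum of F_i divided by W * (N - W + 1)."""
    n = len(tokens)
    if n == 0 or window <= 0:
        raise ValueError("need a non-empty token list and a positive window")
    if n < window:
        # Assumption (not from the table): use plain TTR for short essays.
        return len(set(tokens)) / n

    counts = Counter(tokens[:window])
    type_sum = len(counts)  # F_1: number of types in the first window
    for i in range(window, n):
        # Slide the window one token to the right.
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        counts[tokens[i]] += 1
        type_sum += len(counts)  # F_i for the current window
    return type_sum / (window * (n - window + 1))


def mean_dependency_distance(links: Sequence[Tuple[int, int]]) -> float:
    """MDD = (1/n) * sum of |DD_i| over n dependency links.

    Each link is a (dependent_position, head_position) pair of word
    positions; the root has no head and is simply not listed.
    """
    if not links:
        raise ValueError("need at least one dependency link")
    return sum(abs(dep - head) for dep, head in links) / len(links)


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    print(mattr(["a", "a", "b", "a"], window=2))
    print(mean_dependency_distance([(1, 2), (2, 4), (3, 4)]))
```

For the toy token list, the three windows of size 2 contain 1, 2, and 2 types, so MATTR is 5/6; for the toy links, the distances are 1, 2, and 1, so MDD is 4/3. The sliding counter keeps the MATTR computation linear in the essay length instead of rebuilding a set for every window.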