Table 2 Measures of nonnative Japanese writing proficiency.
From: Applying large language models for automated essay scoring for non-native Japanese
| Criteria | Measures | Code | Calculation |
|---|---|---|---|
| Lexical richness | Lexical diversity | MATTR | Moving average type token ratio per essay: \({\rm{MATTR}}({\rm{W}})_{\text{word form}}=\frac{\sum_{i=1}^{{\rm{N}}-{\rm{W}}+1}{\rm{F}}_{i}}{{\rm{W}}({\rm{N}}-{\rm{W}}+1)}\) |
| | Lexical density | LD | # of lexical words (token) / # of words |
| | Lexical sophistication | LS | # of sophisticated word types / total # of words per essay |
| Syntactic complexity | Mean dependency distance | MDD | \({\rm{MDD}}=\frac{1}{n}\sum_{i=1}^{n}|{\rm{DD}}_{i}|\) |
| | Mean length of clause | MLC | # of words / # of clauses |
| | Verb phrases per T-unit | VPT | # of verb phrases / # of T-units |
| | Clauses per T-unit | CT | # of clauses / # of T-units |
| | Dependent clauses per T-unit | DCT | # of dependent clauses / # of T-units |
| | Complex nominals per T-unit | CNT | # of complex nominals / # of T-units |
| | Adverbial clauses rate | ACC | # of adverbial clauses / # of clauses |
| | Coordinate phrases rate | CPC | # of coordinate phrases / # of clauses |
| Cohesion | Semantic similarity | SOPT | Synonym overlap/paragraph (topic) |
| | | SOPK | Synonym overlap/paragraph (keywords) |
| | word2vec cosine similarity | word2vec | (a) word2vec → (b) cosine similarity between sample and reference |
| Content elaboration | Metadiscourse marker rate | IMM | # of metadiscourse markers (type) / # of words |
| Grammatical accuracy | Grammatical error rate | GER | # of errors per essay |
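
The two measures given as formulas in the table, MATTR and MDD, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that the essay has already been segmented into word tokens, that F_i is the number of distinct word forms in the i-th window of W tokens, and that DD_i is the difference between the positions of a dependent and its head, with the root excluded from the n links. The function names and the `heads` encoding (1-based head positions, 0 for the root) are hypothetical choices made for this example.

```python
def mattr(tokens, window):
    """Moving-average type-token ratio over a list of word tokens.

    Assumption: F_i is the count of distinct word forms in the i-th
    window of `window` consecutive tokens.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(tokens)
    if n == 0:
        return 0.0
    if n < window:
        # Assumption: fall back to a plain type-token ratio for short texts.
        return len(set(tokens)) / n
    num_windows = n - window + 1
    total_types = sum(len(set(tokens[i:i + window])) for i in range(num_windows))
    return total_types / (window * num_windows)


def mean_dependency_distance(heads):
    """Mean absolute dependency distance for one sentence.

    `heads[k]` is the 1-based position of the head of token k+1,
    or 0 if that token is the root (assumption: the root link is skipped).
    """
    distances = [abs(pos - head)
                 for pos, head in enumerate(heads, start=1)
                 if head != 0]
    if not distances:
        return 0.0
    return sum(distances) / len(distances)


# Hypothetical toy inputs, used only to exercise the functions.
example_tokens = ["a", "a", "b", "b"]
example_heads = [3, 3, 0, 3]
mattr_value = mattr(example_tokens, window=2)
mdd_value = mean_dependency_distance(example_heads)
```

The ratio-based measures in the table (for example LD, MLC, or CT) reduce to a single division once the corresponding counts are available, so the counting step, which depends on the parser and tag set used, is where most of the work lies.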