Table 5 The weights of linguistic features in score prediction.

From: Applying large language models for automated essay scoring for non-native Japanese

| Measures | Calculation | Weight |
| --- | --- | --- |
| Lexical diversity | Moving average type token ratio per essay | 0.0391 |
| Lexical density | # of lexical words (tokens) / # of words | 0.0200 |
| Lexical sophistication | # of sophisticated word types / total # of words per essay | 0.0212 |
| Mean dependency distance | \(\mathrm{MDD}=\frac{1}{n}\sum_{i=1}^{n}\lvert \mathrm{DD}_{i}\rvert\) | 0.0388 |
| Mean length of clause | # of words / # of clauses | 0.0374 |
| Verb phrases per T-unit | # of verb phrases / # of T-units | 0.0112 |
| Clauses per T-unit | # of clauses / # of T-units | 0.0021 |
| Dependent clauses per T-unit | # of dependent clauses / # of T-units | 0.0029 |
| Complex nominals per T-unit | # of complex nominals / # of T-units | 0.0379 |
| Adverbial clauses rate | # of adverbial clauses / # of clauses | 0.0108 |
| Coordinate phrases rate | # of coordinate phrases / # of clauses | 0.0325 |
| Semantic similarity | Synonym overlap / paragraph (topic) | 0.0072 |
| Semantic similarity | Synonym overlap / paragraph (keywords) | 0.0075 |
| word2vec cosine similarity | (a) word2vec → (b) cosine similarity between sample and reference | 0.0295 |
| Metadiscourse marker rate | # of metadiscourse markers (types) / # of words | 0.0103 |
| Grammatical error rate | # of errors per essay | 0.0322 |
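
Three of the calculations in the table (moving average type token ratio, mean dependency distance, and cosine similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the window size of 50, the 1-based head-index encoding, and the choice to take n as the number of non-root dependency links are all assumptions made for illustration; the table itself does not specify them.

```python
import math
from typing import Sequence


def mattr(tokens: Sequence[str], window: int = 50) -> float:
    """Moving average type token ratio: mean TTR over all sliding windows.

    The window size (50) is an assumed default, not a value from the table.
    """
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        # Text shorter than one window: fall back to the plain TTR.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def mean_dependency_distance(heads: Sequence[int]) -> float:
    """MDD = (1/n) * sum(|DD_i|) over the n dependency links of a sentence.

    heads[i] is the 1-based position of the head of token i+1 (assumed
    encoding); 0 marks the root, which has no link and is skipped.
    """
    distances = [
        abs(head - pos)
        for pos, head in enumerate(heads, start=1)
        if head != 0
    ]
    return sum(distances) / len(distances) if distances else 0.0


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

In practice the token list and head indices would come from a Japanese morphological analyzer and dependency parser, and the vectors from a trained word2vec model; those components are left out here because the table does not name them.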