Table 5. The weights of linguistic features in score prediction.
From: Applying large language models for automated essay scoring for non-native Japanese
Measures | Calculation | Weight |
|---|---|---|
Lexical diversity | Moving-average type-token ratio (MATTR) per essay | 0.0391 |
Lexical density | # of lexical words (tokens) / # of words | 0.0200 |
Lexical sophistication | # of sophisticated word types / total # of words per essay | 0.0212 |
Mean dependency distance | \(\mathrm{MDD}=\frac{1}{n}\sum_{i=1}^{n}\lvert \mathrm{DD}_{i}\rvert\) | 0.0388 |
Mean length of clause | # of words / # of clauses | 0.0374 |
Verb phrases per T-unit | # of verb phrases / # of T-units | 0.0112 |
Clauses per T-unit | # of clauses / # of T-units | 0.0021 |
Dependent clauses per T-unit | # of dependent clauses / # of T-units | 0.0029 |
Complex nominals per T-unit | # of complex nominals / # of T-units | 0.0379 |
Adverbial clauses rate | # of adverbial clauses / # of clauses | 0.0108 |
Coordinate phrases rate | # of coordinate phrases / # of clauses | 0.0325 |
Semantic similarity | Synonym overlap / paragraph (topic) | 0.0072 |
Semantic similarity | Synonym overlap / paragraph (keywords) | 0.0075 |
word2vec cosine similarity | (a) word2vec → (b) cosine similarity between sample and reference | 0.0295 |
Metadiscourse marker rate | # of metadiscourse markers (types) / # of words | 0.0103 |
Grammatical error rate | # of errors per essay | 0.0322 |
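
Three of the calculations in the table (the moving-average type-token ratio, the mean dependency distance, and the cosine similarity between two vectors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the window size of 50, the 1-based head-index encoding with 0 for the root, and the choice to treat n as the number of non-root dependency links are all assumptions made for illustration; tokenization, parsing, and the word2vec model itself are left out.

```python
import math


def mattr(tokens, window=50):
    """Moving-average type-token ratio: mean TTR over every sliding window.

    The window size is an assumption; the table does not state one.
    """
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        # Text shorter than one window: fall back to the plain TTR.
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def mean_dependency_distance(heads):
    """MDD = (1/n) * sum(|DD_i|) over the n non-root dependency links.

    heads[i] is the 1-based position of the head of token i + 1, and 0
    marks the root (a hypothetical encoding chosen for this sketch).
    """
    distances = [
        abs(head - position)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    return sum(distances) / len(distances) if distances else 0.0


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

In practice the `heads` list would come from a dependency parser and the vectors passed to `cosine_similarity` from averaged word2vec embeddings of the sample and reference essays; both steps are outside this sketch.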