Table 4 Sample analysis of the differences between model and human scoring.

From: Intelligent text analysis for effective evaluation of English language teaching based on deep learning

| Sample ID | S1 | S2 | S3 |
| --- | --- | --- | --- |
| Prompt | Should governments invest more in public transportation? | Is it better to live in a city or a rural area? | Should college education be free? |
| Excerpt | “While some may argue that cars offer greater freedom, I firmly believe that investing in public transportation leads to a greener, more efficient society. Isn’t it better to reduce traffic jams and pollution?” | “Living in a city has many benefits. You can go to museums, restaurants, or hospitals easily. Everything is close.” | “College education should be free so that everyone can access knowledge. However, the government needs a sustainable plan to fund it.” |
| Human Score | 4.5 | 3.0 | 4.8 |
| Model Score | 3.7 | 4.1 | 4.5 |
| Score Gap | −0.8 | +1.1 | −0.3 |
| Analysis of Deviation | The model misinterpreted the rhetorical question and contrastive reasoning, underestimating the strength of the author’s stance and giving a lower score. | Despite the fluent language and clear structure, the essay lacked critical analysis. The model over-weighted surface fluency and failed to penalize the lack of argument depth, resulting in an inflated score. | Minor spelling and grammar issues were over-penalized by the model, leading to a slight underestimation of the overall quality. |
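
The Score Gap values in Table 4 are consistent with the model score minus the human score, so a negative gap means the model scored the essay lower than the human rater. The following is a minimal sketch of that arithmetic, assuming this sign convention; the scores are copied from the table, while the dictionary layout and the `score_gap` helper are illustrative names, not part of the original study.

```python
# Scores copied from Table 4; the data layout and names are illustrative only.
scores = {
    "S1": {"human": 4.5, "model": 3.7},
    "S2": {"human": 3.0, "model": 4.1},
    "S3": {"human": 4.8, "model": 4.5},
}


def score_gap(human: float, model: float) -> float:
    """Return model minus human (assumed convention), rounded to one decimal."""
    return round(model - human, 1)


gaps = {sid: score_gap(s["human"], s["model"]) for sid, s in scores.items()}

for sid, gap in gaps.items():
    if gap > 0:
        direction = "model scored higher than the human rater"
    elif gap < 0:
        direction = "model scored lower than the human rater"
    else:
        direction = "model and human rater agree"
    print(f"{sid}: gap {gap:+.1f} ({direction})")
```

Rounding to one decimal place matches the precision of the table and avoids floating-point residue such as `-0.7999999999999998`.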