Table 4. Text preprocessing statistics.
| Preprocessing step | Average tokens per article | Stopwords removed | Unique lemmas |
|---|---|---|---|
| Raw text | 500 | - | - |
| After stopword removal | 350 | 30% | 10,000 |
| After lemmatization | 340 | - | 9,500 |