Table 7 Number of features in the dataset when converted to TF-IDF.

From: Open source Arabic research paper dataset for natural language processing

| Preprocessing | # Features |
|---|---|
| Without preprocessing | (2011-1516485) |
| Stop word removal and tashkeel removal | (2011-195465) |
| Stop word removal, tashkeel removal, and Arabic normalization | (2011-190576) |
| Stop word removal, tashkeel removal, stemming, term pruning, and Arabic normalization | (2011-114255) |
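The feature counts above depend on which preprocessing steps run before vectorization. As a rough illustration, the Python sketch below applies tashkeel removal, stop word removal, and Arabic normalization to two toy sentences and reports the TF-IDF matrix shape as (documents, features). It assumes that the parenthesized pairs in the table are also (documents, features) shapes, which the table does not state explicitly. The stop word list, the normalization rules (alef variants to bare alef, ta marbuta to ha, alef maqsura to ya), the whitespace tokenizer, and the smoothed IDF formula are illustrative assumptions, not the dataset authors' pipeline. Stemming and term pruning are omitted.

```python
import math
import re
from collections import Counter

# Tashkeel (Arabic diacritics): fathatan U+064B through sukun U+0652.
TASHKEEL_RE = re.compile("[\u064B-\u0652]")

# Hypothetical stop word list, far smaller than a real one; illustration only.
STOP_WORDS = {"في", "من", "على", "إلى"}


def remove_tashkeel(text):
    """Strip short-vowel and related diacritic marks."""
    return TASHKEEL_RE.sub("", text)


def normalize_arabic(text):
    """One common normalization scheme (an assumption, not the paper's rules)."""
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef with madda/hamza -> bare alef
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    return text


# Stop words are compared in normalized form so spelling variants still match.
STOP_NORM = {normalize_arabic(w) for w in STOP_WORDS}


def preprocess(doc, tashkeel=False, stop=False, normalize=False):
    """Apply the selected steps and return a list of tokens."""
    if tashkeel:
        doc = remove_tashkeel(doc)
    if normalize:
        doc = normalize_arabic(doc)
    tokens = doc.split()  # naive whitespace tokenization
    if stop:
        tokens = [t for t in tokens if normalize_arabic(t) not in STOP_NORM]
    return tokens


def tfidf_matrix(token_docs):
    """Dense TF-IDF matrix with one column (feature) per distinct term."""
    vocab = sorted({t for d in token_docs for t in d})
    col = {t: j for j, t in enumerate(vocab)}
    n_docs = len(token_docs)
    df = Counter(t for d in token_docs for t in set(d))  # document frequency
    matrix = []
    for d in token_docs:
        row = [0.0] * len(vocab)
        for t, count in Counter(d).items():
            idf = math.log((1 + n_docs) / (1 + df[t])) + 1  # smoothed IDF variant
            row[col[t]] = count * idf  # raw term count times IDF
        matrix.append(row)
    return matrix, vocab


def tfidf_shape(docs, **opts):
    """Return (documents, features) of the TF-IDF matrix after preprocessing."""
    matrix, vocab = tfidf_matrix([preprocess(d, **opts) for d in docs])
    return len(matrix), len(vocab)


# Usage on two toy sentences (hypothetical data, not from the dataset).
docs = ["ذَهَبَ الطالبُ إلى المدرسة", "ذهب الطالب الى المدرسه في الصباح"]
configs = [
    ("Without preprocessing", {}),
    ("Stop words + tashkeel", {"tashkeel": True, "stop": True}),
    ("Stop words + tashkeel + normalization",
     {"tashkeel": True, "stop": True, "normalize": True}),
]
for label, opts in configs:
    print(label, tfidf_shape(docs, **opts))
```

In this sketch each added step only merges or drops tokens, so the feature count cannot grow, which is consistent with the decreasing counts in the table.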