Table 7 Number of features in the dataset after conversion to TF-IDF.
From: Open source Arabic research paper dataset for natural language processing
Preprocessing | # Features |
---|---|
Without preprocessing | (2011-1516485) |
Stop word removal and tashkeel | (2011-195465) |
Stop word removal, tashkeel, and Arabic normalization | (2011-190576) |
Stop word removal, tashkeel, stemming, term pruning, and Arabic normalization | (2011-114255) |
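
To illustrate why the feature count drops as preprocessing steps are added, here is a minimal sketch in plain Python. It relies on the fact that a unigram TF-IDF matrix has one feature per distinct term, so counting the vocabulary gives the feature count. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names (`strip_tashkeel`, `normalize_arabic`, `vocabulary_size`), the four-sentence toy corpus, the one-word stop list, and the specific normalization rules (unifying alef forms, alef maqsura to ya, ta marbuta to ha). Stemming and term pruning are left out.

```python
import re

# Arabic diacritics (tashkeel): fathatan through sukun, plus superscript alef.
TASHKEEL = re.compile(r"[\u064B-\u0652\u0670]")


def strip_tashkeel(text):
    """Remove Arabic diacritic marks from the text."""
    return TASHKEEL.sub("", text)


def normalize_arabic(text):
    """Apply one common normalization scheme (an assumption, not the paper's)."""
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return text


def vocabulary_size(docs, stop_words=frozenset(), remove_tashkeel=False, normalize=False):
    """Count distinct whitespace-separated terms, i.e. unigram TF-IDF features."""
    vocab = set()
    for doc in docs:
        if remove_tashkeel:
            doc = strip_tashkeel(doc)
        if normalize:
            doc = normalize_arabic(doc)
        for token in doc.split():
            if token not in stop_words:
                vocab.add(token)
    return len(vocab)


# Hypothetical toy corpus: the same words appear with and without diacritics,
# and with different spellings of alef and ta marbuta.
docs = [
    "العِلْمُ نُورٌ",
    "العلم نور في الحياة",
    "أحمد في المدرسة",
    "احمد في المدرسه",
]
stop_words = frozenset({"في"})  # illustrative one-word stop list

raw = vocabulary_size(docs)
cleaned = vocabulary_size(docs, stop_words, remove_tashkeel=True)
normalized = vocabulary_size(docs, stop_words, remove_tashkeel=True, normalize=True)

print("Without preprocessing:", raw)
print("Stop word removal and tashkeel:", cleaned)
print("Stop word removal, tashkeel, and normalization:", normalized)
```

Each step merges surface forms that differ only in diacritics or spelling variants, so the vocabulary, and with it the TF-IDF feature count, shrinks monotonically in the same direction as the rows of the table.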