Fig. 6: The influence of the number of most frequent words, used as text features, on learning the thematic signal, measured with ARI.
From: Computational thematics: comparing algorithms for clustering the genres of literary fiction

The Adjusted Rand Index (ARI) rises both with the number of most frequent words (MFWs) used as features and with the level of thematic foregrounding. However, the middle values of both parameters (5,000 MFWs and medium foregrounding) should be sufficient for most analyses.
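
To make the two quantities in the caption concrete, the following is a minimal sketch of how an MFW feature set can be built and how ARI can be computed from two labelings. It is not the authors' pipeline: the function names (`most_frequent_words`, `relative_frequencies`, `adjusted_rand_index`) are hypothetical, documents are assumed to be pre-tokenized lists of words, and thematic foregrounding is not modeled here.

```python
from collections import Counter
from math import comb


def most_frequent_words(docs, n):
    """Return the n most frequent words across a list of tokenized documents."""
    counts = Counter(word for doc in docs for word in doc)
    return [word for word, _ in counts.most_common(n)]


def relative_frequencies(doc, vocabulary):
    """Represent one tokenized document as relative frequencies of the vocabulary words."""
    counts = Counter(doc)
    total = len(doc) or 1  # avoid division by zero for an empty document
    return [counts[word] / total for word in vocabulary]


def adjusted_rand_index(labels_true, labels_pred):
    """Compute ARI by pair counting over the contingency table of two labelings."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)
```

In an experiment like the one summarized in the figure, one would vary `n` (for example 1,000, 5,000, and 10,000), cluster the resulting feature vectors with an algorithm of choice, and pass the genre labels and cluster assignments to `adjusted_rand_index`. An ARI of 1 means the clustering reproduces the genre labels exactly, and values near 0 mean agreement no better than chance.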