Fig. 3: Distribution of the consensus matrix hits for two cases: a semantically rich corpus represented by 350 features (red), and a semantically poor corpus represented by only 145 features (blue).
From: Topic detection with recursive consensus clustering and semantic enrichment

The distribution of the hits has been rescaled by the number of topics l, with l ranging from 5 to 50 in steps of 5; the individual curves at fixed l appear as thin lines within each distribution. The effect of the semantic richness of a corpus is evident in the more peaked distribution: the richer the corpus, the smaller the fraction of words needed to describe a topic.