Fig. 1: Corpora sizes and the impact of model parameters on attribution accuracy.
From: Inference through innovation processes tested in the authorship attribution task

Panel (a) gives a pictorial overview of the sizes of the corpora considered. The size of each triangle is proportional to the logarithm of the corpus size, measured as the number of documents. For each corpus, the x axis shows the distribution of the number of texts per author and the y axis shows the distribution of the number of characters per author. Solid bars mark the interquartile range of each distribution, and dotted lines mark the 95% interval, highlighting the long tails. Panels (b–f) report the attribution accuracy as a function of the fragment length and of the δ value. The colour scale shows the difference relative to the maximum attribution accuracy obtained in each dataset. In the upper band, each fragment consists of a single token. In the lower band, the text is not partitioned into fragments (full text).
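As an illustration of the quantities summarised in the caption, the following minimal sketch computes an interquartile range and a 95% interval for a list of per-author counts, and the gap between each accuracy value and the best one in a dataset. It is not the authors' code. The function names (`percentile`, `spread_summary`, `gap_to_best`) are hypothetical, the 95% interval is assumed to run from the 2.5th to the 97.5th percentile, and the "difference relative to the maximum" is assumed to be a plain subtraction; the caption does not specify either convention.

```python
from typing import Dict, List, Tuple


def percentile(values: List[float], q: float) -> float:
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def spread_summary(values: List[float]) -> Dict[str, Tuple[float, float]]:
    """Interquartile range and 95% interval of a per-author distribution.

    Assumption: the 95% interval spans the 2.5th to 97.5th percentiles.
    """
    return {
        "iqr": (percentile(values, 25), percentile(values, 75)),
        "interval_95": (percentile(values, 2.5), percentile(values, 97.5)),
    }


def gap_to_best(
    accuracy: Dict[Tuple[int, float], float]
) -> Dict[Tuple[int, float], float]:
    """Difference between the best accuracy in a dataset and each entry.

    Keys are hypothetical (fragment_length, delta) pairs.
    Assumption: the difference is best minus value, not a ratio.
    """
    best = max(accuracy.values())
    return {key: best - value for key, value in accuracy.items()}


if __name__ == "__main__":
    # Made-up per-author text counts, for illustration only.
    texts_per_author = [1, 2, 3, 4, 5]
    print(spread_summary(texts_per_author))

    # Made-up accuracies indexed by (fragment_length, delta).
    grid = {(1, 0.5): 0.8, (10, 1.0): 0.9}
    print(gap_to_best(grid))
```

Under these assumptions, the entry with the highest accuracy in each dataset has a gap of zero, which would correspond to one end of the colour scale in panels (b–f).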