Fig. 1: Four steps of the analysis.
From: Computational thematics: comparing algorithms for clustering the genres of literary fiction

The workflow consists of two nested loops. The outer loop iterates over combinations of thematic foregrounding (Step 1a), feature type (Step 1b), and distance metric (Step 1c). For each combination, an inner loop is run: it randomly draws a genre-stratified sample of 120 novels (Step 2), clusters the novels with Ward's algorithm (Step 3), and evaluates the clusters on the resulting dendrogram against the novels' genres using the Adjusted Rand Index (ARI; Step 4). As a result, each combination receives an ARI score that measures how well it recovers genres.
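The two nested loops can be sketched in plain Python (standard library only). This is a minimal illustration under stated assumptions, not the authors' code. The feature matrices, genre names, foregrounding labels, and the two distance metrics are hypothetical toy inputs. "Genre-stratified" is read as an equal number of novels per genre. The dendrogram is assumed to be cut into as many clusters as there are genres, and a combination's score is assumed to be the mean ARI over the inner-loop iterations. Ward's method is implemented here with the Lance-Williams update on squared dissimilarities. Applying it to non-Euclidean distances such as cosine is common practice, but it departs from Ward's original variance-minimizing interpretation.

```python
import itertools
import math
import random
from collections import defaultdict


# Two candidate distance metrics for Step 1c (illustrative, not the paper's list)
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def stratified_sample(genres, n_sample, rng):
    """Step 2: draw the same number of novel indices from every genre (assumed equal strata)."""
    by_genre = defaultdict(list)
    for idx, g in enumerate(genres):
        by_genre[g].append(idx)
    per_genre = n_sample // len(by_genre)
    return [i for g in sorted(by_genre) for i in rng.sample(by_genre[g], per_genre)]


def ward_clusters(points, metric, k):
    """Step 3: Ward merges (Lance-Williams update) until k clusters remain, i.e. a k-cluster cut."""
    clusters = {i: [i] for i in range(len(points))}
    # Squared dissimilarities between active clusters, keyed by (lower_id, higher_id)
    d2 = {(i, j): metric(points[i], points[j]) ** 2
          for i, j in itertools.combinations(range(len(points)), 2)}
    next_id = len(points)
    while len(clusters) > k:
        i, j = min(d2, key=d2.get)  # cheapest merge under Ward's criterion
        ni, nj = len(clusters[i]), len(clusters[j])
        dij = d2.pop((i, j))
        members = clusters.pop(i) + clusters.pop(j)
        for m, m_members in clusters.items():
            nm = len(m_members)
            dim = d2.pop((min(i, m), max(i, m)))
            djm = d2.pop((min(j, m), max(j, m)))
            d2[(m, next_id)] = ((ni + nm) * dim + (nj + nm) * djm - nm * dij) / (ni + nj + nm)
        clusters[next_id] = members
        next_id += 1
    labels = [0] * len(points)
    for label, members in enumerate(clusters.values()):
        for p in members:
            labels[p] = label
    return labels


def adjusted_rand_index(truth, pred):
    """Step 4: Hubert-Arabie adjusted Rand index between true genres and cluster labels."""
    def pairs(n):
        return n * (n - 1) / 2
    table, rows, cols = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(truth, pred):
        table[(t, p)] += 1
        rows[t] += 1
        cols[p] += 1
    index = sum(pairs(n) for n in table.values())
    sum_rows = sum(pairs(n) for n in rows.values())
    sum_cols = sum(pairs(n) for n in cols.values())
    expected = sum_rows * sum_cols / pairs(len(truth))
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are a single cluster
        return 1.0
    return (index - expected) / (max_index - expected)


def evaluate_grid(feature_sets, genres, metrics, n_iter=10, n_sample=120, seed=0):
    """Outer loop over settings x metrics; inner loop of n_iter resamples; mean ARI (assumed)."""
    rng = random.Random(seed)
    k = len(set(genres))  # assumption: cut the dendrogram into one cluster per genre
    scores = {}
    for (foregrounding, feature_type), matrix in feature_sets.items():
        for metric_name, metric in metrics.items():
            aris = []
            for _ in range(n_iter):
                idx = stratified_sample(genres, n_sample, rng)
                pred = ward_clusters([matrix[i] for i in idx], metric, k)
                aris.append(adjusted_rand_index([genres[i] for i in idx], pred))
            scores[(foregrounding, feature_type, metric_name)] = sum(aris) / len(aris)
    return scores


if __name__ == "__main__":
    # Toy data only: 4 hypothetical genres x 40 novels, 10-dim vectors around genre centroids
    toy = random.Random(1)
    genres = [g for g in ("genre_A", "genre_B", "genre_C", "genre_D") for _ in range(40)]
    centroids = {g: [toy.uniform(1, 3) for _ in range(10)] for g in sorted(set(genres))}
    # Hypothetical (foregrounding, feature type) settings; noise levels are arbitrary
    feature_sets = {
        (fg, "toy_features"): [[c + toy.gauss(0, noise) for c in centroids[g]] for g in genres]
        for fg, noise in (("variant_A", 1.0), ("variant_B", 0.3))
    }
    metrics = {"euclidean": euclidean, "cosine": cosine}
    for combo, ari in evaluate_grid(feature_sets, genres, metrics, n_iter=3).items():
        print(combo, round(ari, 3))
```

Running the script prints one mean ARI per (foregrounding, feature type, metric) tuple. In a real analysis, `feature_sets` would hold the document-feature matrices produced in Steps 1a-1b. The clustering is hand-rolled only to keep the sketch dependency-free. SciPy's `linkage(..., method='ward')` with `fcluster(..., criterion='maxclust')` and scikit-learn's `adjusted_rand_score` are the usual off-the-shelf equivalents.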