Figure 8
From: Machine learning based attribution mapping of climate related discussions on social media

Final optimized clusters for Sample 2, shown as word clouds [50]. Each word cloud is generated from the top 30 bigrams [68] in the corresponding optimized cluster, ranked by Tfidf-Vectorizer [69] weights. The panels give a bird's-eye view of the keywords underlying all 12 clusters (a–l), showing how distinct the optimized clusters are in composition and how closely they match those of Sample 1 (Fig. 7). As in Sample 1, each optimized cluster in Sample 2 has its own distinct group of bigrams, which expresses the cluster's underlying theme clearly. The only difference between Samples 1 and 2 is in the Agriculture cluster (h): in Sample 2 the bigrams express the agriculture theme clearly, whereas in Sample 1 (Fig. 7h) they mix agriculture and administration themes, indicating that the agriculture-related posts in that sample also contain administration-related text. The scatter plots (Fig. 4a, c) show the same pattern: in Sample 1, part of the Agriculture cluster (yellow) overlaps the Administration cluster (black), whereas in Sample 2 the two clusters are clearly separate.
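The bigram ranking described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the Tfidf-Vectorizer named above refers to scikit-learn's TfidfVectorizer and mirrors that class's default weighting (smoothed IDF, L2-normalized rows) in plain Python, and it assumes each cluster's bigram score is the sum of per-post weights. The function names, the whitespace tokenizer, and the toy posts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigrams(text):
    """Split a post into lowercase word bigrams (naive whitespace tokenizer)."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def top_bigrams_per_cluster(posts, labels, top_n=30):
    """Rank each cluster's bigrams by summed TF-IDF weight (assumed aggregation)."""
    counts = [Counter(bigrams(post)) for post in posts]
    n_docs = len(posts)

    # Document frequency: number of posts that contain each bigram.
    doc_freq = Counter(b for c in counts for b in c)

    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {b: math.log((1 + n_docs) / (1 + df)) + 1 for b, df in doc_freq.items()}

    totals = defaultdict(Counter)
    for c, label in zip(counts, labels):
        weights = {b: tf * idf[b] for b, tf in c.items()}
        # L2-normalize each post so long posts do not dominate the ranking.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        for b, w in weights.items():
            totals[label][b] += w / norm

    return {
        label: [b for b, _ in scores.most_common(top_n)]
        for label, scores in totals.items()
    }


# Hypothetical toy posts and cluster labels, for illustration only.
posts = [
    "crop yield drought",
    "crop yield farmers",
    "carbon tax policy",
    "carbon tax bill",
]
labels = ["agriculture", "agriculture", "administration", "administration"]

for cluster, ranked in top_bigrams_per_cluster(posts, labels, top_n=2).items():
    print(cluster, ranked)
```

The ranked bigrams and their weights for each cluster could then be passed to a word-cloud renderer, with font size scaled by weight. Comparing the top bigrams of two clusters, or of the same cluster across samples, is one simple way to check the kind of thematic mixing noted for the Agriculture cluster.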