Figure 7
From: Machine learning based attribution mapping of climate related discussions on social media

Final optimized clusters for Sample 1, shown as word clouds [50]. Each word cloud is generated from the top 30 bigrams [68] in the corresponding cluster, ranked by Tfidf-Vectorizer [69] weights. The panels give a bird's-eye view of the keywords underlying all 12 clusters (a–l) and show how distinct the optimized clusters are in composition. For instance, “global warming” appears in both clusters 4 and 5 (d, e). However, the other bigrams in the respective word clouds show that terms such as “climate science/scientists” and “climate denial” dominate the Climate science cluster (d), while “sea ice”, “level rise” and similar terms, which mainly describe the impacts of global warming, dominate the Global warming cluster (e). Moreover, each optimized cluster has a clearly distinct group of bigrams that expresses its underlying theme, except for the General posts (k) and Unidentifiable (l) clusters. The General posts cluster (k) mainly contains Reddit posts that span a wide range of climate themes rather than discussing a specific theme, as the other identified clusters do. The Unidentifiable cluster (l) mostly contains posts that discuss topics other than climate and thus do not belong to any specific climate theme or cluster.
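
The ranking step described in the caption can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses only the Python standard library and assumes a smoothed inverse document frequency with per-document L2 normalization, which is one common TF-IDF formulation. The function names (`bigrams`, `top_bigrams`), the simple letter-only tokenizer, and the choice to sum weights over the posts in a cluster are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Split text into lowercase word tokens and return adjacent word pairs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def top_bigrams(cluster_docs, all_docs, k=30):
    """Return the k highest-weighted (bigram, weight) pairs for one cluster.

    Assumptions (not taken from the paper):
    - idf = ln((1 + n) / (1 + df)) + 1, computed over all_docs
    - each document vector is L2-normalized
    - a bigram's cluster weight is the sum of its weights over cluster_docs
    """
    n = len(all_docs)

    # Document frequency: number of documents containing each bigram.
    df = Counter()
    for doc in all_docs:
        df.update(set(bigrams(doc)))

    scores = Counter()
    for doc in cluster_docs:
        tf = Counter(bigrams(doc))
        weights = {
            g: count * (math.log((1 + n) / (1 + df[g])) + 1)
            for g, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm == 0:
            continue  # document has fewer than two tokens
        for g, w in weights.items():
            scores[g] += w / norm

    return scores.most_common(k)
```

The returned (bigram, weight) pairs have the shape that word cloud tools typically accept as a frequency mapping, so `dict(top_bigrams(cluster_docs, all_docs))` could serve as input to such a tool.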