Figure 5
From: Diversity in gut bacterial community of school-age children in Asia

Random forest clustering of 303 Asian children using species composition data.
(a) Multidimensional plot of the proximity matrix calculated by random forest analysis of the phylotype composition data of the 303 children. Relative-abundance data for all species in the children were subjected to random forest analysis to perform machine-learning clustering and identify the country of origin of the samples. The ensemble comprised 5,000 trees. The calculated proximities are plotted with each sample colored by its city.
(b) Heat map of the species-level bacterial composition of the microbiota of the 303 Asian children. The 30 species with the highest Gini scores in the random forest analysis performed in (a) were selected for the heat map, and the dendrogram shows the clustering of these species. The relative abundances of these species in each participant were converted to log10 values and subjected to Pearson correlation analysis, followed by hierarchical clustering with complete linkage. The population densities of the species are indicated by the color scale.
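
The two computations named in the caption can be illustrated with a minimal pure-Python sketch. It is not the authors' code, and the caption does not say which software was used. The leaf indices and abundance values below are invented toy data, all function names are hypothetical, and the pseudocount added before the log10 transform and the use of 1 - r as the clustering distance are assumptions. In practice the leaf indices would come from a fitted random forest.

```python
import math
from itertools import combinations

def proximity_matrix(leaves):
    """leaves[i][t] is the terminal node that sample i reaches in tree t.
    Proximity of two samples = fraction of trees in which they share a leaf."""
    n, n_trees = len(leaves), len(leaves[0])
    prox = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox

def log10_abundance(row, pseudocount=1e-6):
    """log10 transform; the pseudocount (an assumption) avoids log10(0)."""
    return [math.log10(v + pseudocount) for v in row]

def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant vectors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def complete_linkage(dist):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    largest pairwise distance is smallest. Returns (members_a, members_b, height)."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = max(dist[i][j] for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges

# Toy leaf assignments: 3 samples, 3 trees (hypothetical values).
prox = proximity_matrix([[1, 2, 3], [1, 2, 4], [5, 6, 7]])

# Toy relative abundances: 4 species (rows) across 5 participants (hypothetical).
abundance = [
    [0.10, 0.20, 0.05, 0.01, 0.02],    # species 0
    [0.08, 0.15, 0.04, 0.01, 0.015],   # species 1
    [0.01, 0.005, 0.02, 0.10, 0.08],   # species 2
    [0.02, 0.01, 0.03, 0.20, 0.12],    # species 3
]
logged = [log10_abundance(row) for row in abundance]
n = len(logged)
# Correlation distance 1 - r (an assumed choice of dissimilarity).
dist = [[1.0 - pearson(logged[i], logged[j]) if i != j else 0.0
         for j in range(n)] for i in range(n)]
merges = complete_linkage(dist)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

In this toy example, species 0 and 1 rise and fall together across participants, as do species 2 and 3, so each pair merges at a small height and the two pairs join only at the final step. This is the pattern a heat-map dendrogram displays.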