Extended Data Fig. 2: Intersections and distribution of biosynthetic diversity values among different ecosystem types.
From: Compendium of specialized metabolite biosynthetic diversity encoded in bacterial genomes

The bar plot on the left depicts the number of Gene Cluster Families (GCFs, as defined by BiG-SLiCE with T = 0.4) found in each biome type. The bar plot on top shows the size (number of GCFs) of each intersection. The matrix below the top bar plot indicates which sets (biome types) belong to each intersection: dark dots mark the included sets, and connecting lines are drawn for better visibility when an intersection includes more than one set. The data presented in this graph come only from the MAGs in the GEMS dataset (see Supplementary Table 1), which was the only dataset with sufficient metadata. Only the 63 largest intersections are depicted here, and only the 35 ecosystem types (with the most GCFs out of the 63) that were part of them are shown on the left. The data indicate that there is barely any overlap between the ecosystem types: most GCFs (74.43%) are specific to a single biome (a complete overview of unique GCFs per ecosystem type can be found in Supplementary Table 7), while the largest intersection (the one including the most habitats, not visible in this figure) includes 50 of the 63 ecosystem types.
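The quantities in this kind of plot can be illustrated with a minimal sketch. It assumes that each GCF is counted in exactly one intersection, namely the full set of biomes in which it occurs (exclusive intersections); the caption does not state this explicitly. The function names, the toy GCF identifiers, and the biome labels are hypothetical and are not taken from the GEMS dataset.

```python
from collections import Counter


def set_sizes(gcf_biomes):
    """Left bar plot: number of GCFs found in each biome type."""
    counts = Counter()
    for biomes in gcf_biomes.values():
        counts.update(set(biomes))
    return counts


def intersection_sizes(gcf_biomes):
    """Top bar plot: number of GCFs per exact combination of biomes.

    Assumption: intersections are exclusive, so a GCF found in soil and
    marine samples counts only toward {soil, marine}, not toward {soil}.
    """
    return Counter(frozenset(b) for b in gcf_biomes.values() if b)


def single_biome_fraction(gcf_biomes):
    """Fraction of GCFs that occur in exactly one biome."""
    sizes = intersection_sizes(gcf_biomes)
    total = sum(sizes.values())
    single = sum(n for combo, n in sizes.items() if len(combo) == 1)
    return single / total if total else 0.0


# Hypothetical toy input: GCF identifier -> biomes in which it was observed.
toy = {
    "GCF_1": {"soil"},
    "GCF_2": {"soil"},
    "GCF_3": {"marine"},
    "GCF_4": {"soil", "marine"},
    "GCF_5": {"soil", "marine", "freshwater"},
}

if __name__ == "__main__":
    print(set_sizes(toy))
    # Sort intersections by size, largest first, as in the top bar plot.
    for combo, n in intersection_sizes(toy).most_common():
        print(sorted(combo), n)
    print(single_biome_fraction(toy))
```

In this toy input, three of the five GCFs occur in a single biome, which is the same kind of proportion as the 74.43% reported in the caption.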