Fig. 2: Intersection and frequency of species across studies. | Nature Biotechnology

From: A unified catalog of 204,938 reference genomes from the human gut microbiome

a, Number of species found across genome study sets, ordered by their level of overlap. Vertical bars represent the number of species shared between the specific study sets highlighted with colored dots in the lower panel. Horizontal bars in the lower panel indicate the total number of species contained in each study set. Different shades of green denote the study sets represented exclusively by MAGs, whereas those in blue represent studies only containing isolate genomes. b, Rarefaction curves of the number of species detected as a function of the number of nonredundant genomes analyzed. Curves are depicted both for all the UHGG species and after excluding singleton species (represented by only one genome). c, Number of nonredundant genomes detected per species (left) alongside the degree of geographic diversity (calculated with the Shannon diversity index; right). Only the 25 most represented species clusters are depicted. d, Left, proportion of metagenomic reads from 1,005 independent datasets classified with Kraken 2 against the UHGG species representatives. Right, the degree of classification improvement provided over the standard Kraken 2 RefSeq database. The following correspond to the number of datasets analyzed per country: Cameroon, n = 54; Ethiopia, n = 25; Germany, n = 56; Ghana, n = 40; India, n = 105; Italy, n = 50; Luxembourg, n = 26; Russia, n = 4; Tanzania, n = 61; United Kingdom, n = 210; United States, n = 374. Box lengths represent the IQR of the data, and whiskers extend to the lowest and highest values within 1.5 times the IQR from the first and third quartiles, respectively.
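The geographic diversity in panel c is described as a Shannon diversity index. As a rough illustration of that calculation, the sketch below computes H = -Σ p_i ln p_i over the countries of origin of the genomes in one species cluster. The function name `shannon_diversity`, the use of the natural logarithm, and the example country lists are assumptions made for illustration; the caption does not state the log base or the exact grouping used.

```python
import math
from collections import Counter


def shannon_diversity(labels):
    """Shannon index H = -sum(p_i * ln(p_i)) over the observed categories.

    `labels` is any iterable of hashable category labels, e.g. the country
    of origin of each genome in a species cluster (hypothetical input).
    The natural logarithm is an assumption; the source gives no log base.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each term -p * ln(p) is non-negative because 0 < p <= 1.
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


# Hypothetical species clusters: one spread evenly over four countries,
# one dominated by a single country.
even = ["Germany", "India", "Ghana", "Italy"]
skewed = ["Germany"] * 9 + ["India"]

print(shannon_diversity(even))
print(shannon_diversity(skewed))
```

With this definition, a species whose genomes all come from one country scores 0, and the score grows as genomes are spread more evenly over more countries, which matches the reading of the right-hand side of panel c.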
