Fig. 2: Ecology of isolated human gut bacteria and their proteins.
From: HiBC: a publicly available collection of bacterial strains isolated from the human gut

a Comparison of the number of strains and genomes produced by eight major isolation projects and by HiBC. Strains were deemed requestable if this was claimed in the original publication, although these claims were not substantiated. They were deemed deposited if culture collection identifiers were included in the original paper and were confirmed to exist. Genomes were deemed high quality if they were >90% complete and <5% contaminated. The number of strains within each study is stated, while the percentage meeting each criterion is plotted. Red dots highlight datasets with barriers to accessibility, i.e., data available only upon request or access limited to specific countries. Strain collections: GMbC, Global Microbiome Conservancy [2]; BIO-ML, Broad Institute-OpenBiome Microbiome Library [24]; CGMR, Chinese Gut Microbial Reference [25]; CAMII, Culturomics by Automated Microbiome Imaging and Isolation [26]; hGMB, Human Gut Microbial BioBank [3]; HBC, Human Gastrointestinal Bacterial Collection [1]; HiBC, Human Intestinal Bacteria Collection (this study); IHU, collection of the Institut Hospitalier Universitaire Méditerranée Infection [27, 103]; HMP, Human Microbiome Project at ATCC.
b Number of species per isolate collection, determined either by manual curation (HiBC) or by dereplication of the available genomes (average nucleotide identity (ANI) values >95% indicated the same species).
c Cumulative relative abundance in gut metagenomes from 4,624 individuals (Leviatan et al. [28]) covered by all isolated bacteria across studies, including HiBC (global isolates, dark blue), by HiBC alone (green), or by the 29 novel taxa described in this work (light blue), which had matches in 4,583 of the samples.
d Relative abundance of dominant (mean relative abundance >0.25%) novel taxa across the 4,624 individuals, with the number of positive samples stated. Each strain represents a distinct novel species, described in detail in the protologues at the end of the “Methods” section.
e, f Genomic location of proteins that were significantly differentially prevalent between Crohn’s disease (CD) samples and healthy controls (inner ring), or between ulcerative colitis (UC) samples and healthy controls (outer ring). The delta-prevalence (prevalence in healthy donors − prevalence in the corresponding patients) is shown in blue (more prevalent in healthy controls), red (more prevalent in UC), or mauve (more prevalent in CD). The species, the strain, the number of proteins predicted within the genome, and the number significantly differentially prevalent between health conditions are shown within the circle. Only contigs >10 kb were plotted. In panels c and d, the boxplots show the median as a centre line, the boxes represent the interquartile range, and the whiskers represent the minimum and maximum values, excluding outliers.
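The genome quality thresholds in panel a and the delta-prevalence in panels e and f can be illustrated with a minimal Python sketch. The function names and the representation of each protein as a per-sample presence/absence list are assumptions made for illustration; they are not taken from the study, and the sketch does not cover the significance testing.

```python
from typing import Sequence


def is_high_quality(completeness: float, contamination: float) -> bool:
    """Caption criterion: >90% complete and <5% contaminated (values in percent)."""
    return completeness > 90.0 and contamination < 5.0


def prevalence(presence: Sequence[bool]) -> float:
    """Fraction of samples in which a protein was detected (assumed encoding)."""
    if not presence:
        raise ValueError("at least one sample is required")
    return sum(presence) / len(presence)


def delta_prevalence(healthy: Sequence[bool], patients: Sequence[bool]) -> float:
    """Prevalence in healthy donors minus prevalence in the patient group.

    Positive values mean the protein is more prevalent in healthy controls;
    negative values mean it is more prevalent in the patient group.
    """
    return prevalence(healthy) - prevalence(patients)


# Hypothetical presence/absence calls for one protein, four samples per group.
healthy_calls = [True, True, True, False]
patient_calls = [True, False, False, False]
example_delta = delta_prevalence(healthy_calls, patient_calls)
```

With these made-up calls, the protein is detected in three of four healthy samples and one of four patient samples, so the delta-prevalence is positive and would fall in the blue (more prevalent in healthy controls) category.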