Metagenomic global survey and in-depth genomic analyses of Ruminococcus gnavus reveal differences across host lifestyle and health status

Nooij, S.; Plomp, N.; Sanders, I. M. J. G.; Schout, L.; van der Meulen, A. E.; Terveer, E. M.; Norman, J. M.; Karcher, N.; Larralde, M. F.; Vossen, R. H. A. M.; Kloet, S. L.; Faber, K. N.; Harmsen, H. J. M.; Zeller, G. F.; Kuijper, E. J.; Smits, W. K.; Ducarmon, Q. R.

doi:10.1038/s41467-025-56449-x

Published: 30 January 2025

Nature Communications volume 16, Article number: 1182 (2025)

Abstract

Ruminococcus gnavus is a gut bacterium found in > 90% of healthy individuals, but its increased abundance is also associated with chronic inflammatory diseases, particularly Crohn’s disease. Nevertheless, its global distribution and intraspecies genomic variation remain understudied. By surveying 12,791 gut metagenomes, we recapitulated known associations with metabolic diseases and inflammatory bowel disease. We uncovered a higher prevalence and abundance of R. gnavus in Westernized populations and observed bacterial relative abundances up to 83% in newborns. Next, we built a resource of R. gnavus isolates (N = 45) from healthy individuals and Crohn’s disease patients and generated complete R. gnavus genomes using PacBio circular consensus sequencing. Analysis of these genomes and publicly available high-quality draft genomes (N = 333 genomes) revealed multiple clades which separated Crohn’s-derived isolates from healthy-derived isolates. Presumed R. gnavus virulence factors could not explain this separation. Bacterial genome-wide association study revealed that Crohn’s-derived isolates were enriched in genes related to mobile elements and mucin foraging. Together, we present a large R. gnavus resource that will be available to the scientific community and provide novel biological insights into the global distribution and genomic variation of R. gnavus.

Introduction

The human gut microbiome is a topic of intense research interest and many bacterial species have been associated with specific diseases¹. One such species is Ruminococcus gnavus, for which associations with human health have been reported in the context of various ailments^2,3,4,5,6,7. Officially, its taxonomic status has been revised and R. gnavus is now a member of the genus Mediterraneibacter, but it has also been termed Faecalicatena gnavus⁸. Here, we will designate the species as Ruminococcus gnavus. R. gnavus is a non-spore-forming Gram-positive member of the bacterial phylum Bacillota (formerly Firmicutes) and was first described in 1976⁹. It is considered a prevalent member of the human gut microbiome (present in > 90% of healthy European and North-American adults), but can also be found in the gastrointestinal tract of a variety of animal species^10,11. Its median relative abundance in humans is reported to be ~0.1%–0.3%, although it should be noted that these estimates were based on small and geographically restricted studies^12,13.

In microbiome association studies, increases in R. gnavus relative abundance have consistently been linked to diseases including metabolic syndrome, type 2 diabetes mellitus and Crohn’s disease (CD, a form of inflammatory bowel disease (IBD))^2,3,14. Furthermore, its relative abundance increased concomitantly with symptomatic flares in CD, where it reached up to 69.5% of the gut microbiome². While it remains unknown if R. gnavus causally contributes to disease development or whether the increased abundance is a result of the changing intestinal environment, several molecular mediators have been identified that potentially contribute to disease. For instance, the cell-surface exposed polysaccharide glucorhamnan has been described as pro-inflammatory, with a strain-dependent effect, depending on whether the R. gnavus isolate carried a capsular polysaccharide that promoted a more tolerogenic response^15,16. However, these observations are limited by the fact that they were made using one or few isolates and strain variation remains underexplored in many gut microbes, including R. gnavus.

Not only mechanistic, but also genomic studies of R. gnavus have suffered from a limited scope. One study divided R. gnavus into two clades based on genome sequences and noted that one was enriched in IBD patients². However, this study was limited by a low number of draft isolate genomes (N = 11) and a scarcity of knowledge on experimentally verified virulence factors of R. gnavus at the time^{15,16,17,18,19,20}. A more recent study based on 152 draft genomes identified three major lineages, but genomes of different host organisms were mixed and this study did not investigate associations of genetic features with metadata¹¹. Therefore, an important outstanding question remains whether proposed R. gnavus virulence factors are enriched in IBD-derived isolates, or whether different genes and functions could separate IBD-derived R. gnavus isolates from controls.

In this work, we surveyed global R. gnavus prevalence and abundance across thousands of gut metagenomes to provide a more nuanced picture across human lifespan, different lifestyles, and disease, thereby revealing striking differences. Next, through extensive culturing efforts we established a resource of 45 R. gnavus isolates and applied PacBio circular consensus sequencing (CCS) to generate complete genomes. This collection of isolates and their complete genomes provides ample scope for targeted experimental follow-up work and will be available as a resource for the scientific community. We complemented this unique collection with publicly available (short-read draft) genomes, which allowed us to perform large-scale comparative genomics at the level of both phylogeny and predicted gene functions.

Results

Intestinal colonization with R. gnavus is associated with age, health, geography and lifestyle

In order to provide a nuanced view of R. gnavus prevalence and abundance across health and disease, geography, and lifestyle, we screened 12,791 publicly available metagenomes from all over the world with manually curated metadata (Fig. 1, Supplementary Data 1; full per-sample metadata are available through https://waldronlab.io/curatedMetagenomicData/)²¹. We observed R. gnavus in 50.58% of all included subjects and the prevalence in 9126 healthy individuals was 43.09% (Fig. 1a). As R. gnavus has been robustly associated with disease, especially with metabolic disease and IBD^2,3, we compared R. gnavus prevalence and abundance between patients with these diseases and healthy subjects (or asymptomatic control subjects) in a meta-analysis. Compared to healthy subjects, R. gnavus was ~1.6 times more prevalent in IBD patients (70.2%; logistic regression, p < 2.2 × 10⁻¹⁶, odds ratio (OR [95% confidence interval]) = 3.1 [2.6–3.7]), 1.3 times more prevalent in patients with hypertension (58.0%; p = 0.00127, OR = 1.8 [1.3–2.5]), 1.5 times more prevalent in patients with type-2 diabetes (T2D; 62.9%; p = 1.52 × 10⁻⁹, OR = 2.2 [1.8–2.8]), and 2.2 times more prevalent in patients with atherosclerotic cardiovascular diseases (ACVD; 96.2%; p < 2.2 × 10⁻¹⁶, OR = 33.4 [17.6–74.1]). Furthermore, the relative abundance of R. gnavus was also higher in these conditions than in healthy subjects (Fig. 1b; healthy: median [1st–3rd quartile] = 0% [0–0.08%]; IBD: median = 0.11% [0–1.04%], linear model, p < 2.2 × 10⁻¹⁶; T2D: median = 0.027% [0.0–0.22%], p = 1.9 × 10⁻¹⁰; ACVD: median = 0.78% [0.09–3.14%], p < 2.2 × 10⁻¹⁶), except in hypertension (median = 0.01% [0–0.07%], p = 0.399). Together, we thus recapitulated that R. gnavus occurs more frequently in the gut microbiome of patients suffering from IBD, hypertension and T2D, and in higher abundances in IBD and T2D. Additionally, our analysis uncovered a striking novel enrichment in ACVD, which had the highest prevalence and abundance of any disease group.
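As a rough cross-check of the figures above, the prevalence ratios and unadjusted odds ratios can be recomputed from the reported prevalences alone. The following Python sketch is illustrative and is not the authors' analysis code: the function names are hypothetical, and the reported odds ratios come from logistic regression, so agreement with this simple two-group calculation should not be assumed in general.

```python
# Illustrative cross-check only; names are hypothetical, not from the paper.
# Prevalences are the proportions quoted in the text.
HEALTHY_PREVALENCE = 0.4309
DISEASE_PREVALENCE = {
    "IBD": 0.702,
    "hypertension": 0.580,
    "T2D": 0.629,
    "ACVD": 0.962,
}


def prevalence_ratio(p_disease: float, p_healthy: float) -> float:
    """How many times more prevalent the species is in the disease group."""
    return p_disease / p_healthy


def odds_ratio(p_disease: float, p_healthy: float) -> float:
    """Unadjusted odds ratio computed from two proportions."""
    odds_disease = p_disease / (1.0 - p_disease)
    odds_healthy = p_healthy / (1.0 - p_healthy)
    return odds_disease / odds_healthy


if __name__ == "__main__":
    for name, p in DISEASE_PREVALENCE.items():
        ratio = prevalence_ratio(p, HEALTHY_PREVALENCE)
        odds = odds_ratio(p, HEALTHY_PREVALENCE)
        print(f"{name}: prevalence ratio {ratio:.1f}, unadjusted OR {odds:.1f}")
```

With the prevalences quoted in the text, the unadjusted values round to the same prevalence ratios and odds ratios as those reported.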

Subsequently, we investigated prevalence and relative abundance of R. gnavus across countries (Fig. 1c, d and Supplementary Fig. 1a). We show only healthy individuals to exclude possible confounding by diseases such as IBD and metabolic disease. We observed large differences in prevalence, which ranged between 10–90% across countries (overall median: 41%) and mean relative abundance per country ranged between 0.0078-4.05% (overall mean = 0.67% ±3.20 standard deviation; Supplementary Fig. 1a). This variation could be partly explained by Westernization status; this binary classification of Westernized / non-Westernized lifestyles is based on, among others, access to medical care and pharmaceuticals, livestock exposure and diet²². Westernized individuals had higher prevalence and abundance of R. gnavus compared to non-Westernized individuals (Fig. 1c-e; prevalence: logistic regression, p < 2.2 × 10⁻¹⁶; abundance: linear model, p < 2.2 × 10⁻¹⁶). As these data were generated in multiple studies, we cannot exclude effects of technical differences (e.g., DNA extraction method). To partially check for this, we investigated sequencing depth and found that higher prevalence and abundance were not the result of higher sequencing depth in Westernized countries as non-Western samples were sequenced deeper (Supplementary Fig. 1b; t-test p = 6.9 × 10⁻²⁰). These differences hold true for any 10% quantile of sequencing depth (Supplementary Fig. 1c, Methods). We also checked for possible correlations between sequencing depth and R. gnavus abundance and found a weakly negative correlation in both Westernized and non-Westernized metagenomes (Supplementary Fig. 1d). In conclusion, R. gnavus colonization is vastly different between countries, and Westernization (lifestyle) may be a major factor contributing to these differences.
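The depth check described above amounts to comparing the two lifestyle groups within bins of similar sequencing depth. A minimal Python sketch follows, assuming each sample is represented by a hypothetical tuple of (read depth, Westernized flag, R. gnavus detected flag). Binning by deciles with `statistics.quantiles` is an assumption about how the 10% quantiles could be formed, not a description of the authors' R code.

```python
from bisect import bisect_right
from statistics import quantiles


def prevalence_by_depth_decile(samples):
    """Prevalence per (depth decile, Westernized) group.

    `samples` is a list of (read_depth, westernized, rg_present) tuples;
    this record layout is hypothetical and chosen for illustration.
    Returns a dict mapping (decile index 0-9, westernized) to prevalence.
    """
    depths = [depth for depth, _, _ in samples]
    cut_points = quantiles(depths, n=10)  # nine decile boundaries

    counts = {}  # (decile, westernized) -> [n_present, n_total]
    for depth, westernized, rg_present in samples:
        decile = bisect_right(cut_points, depth)  # 0 = shallowest bin
        cell = counts.setdefault((decile, westernized), [0, 0])
        cell[0] += int(rg_present)
        cell[1] += 1

    return {key: present / total for key, (present, total) in counts.items()}
```

Comparing the Westernized and non-Westernized prevalence within each decile then shows whether a difference persists when depth is held roughly constant.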

We noted extremely high R. gnavus abundance values in healthy people, up to a relative abundance of 83%. Metagenomes with the highest abundances were often samples collected from newborns and children up to age 2, most of whom were recorded not to have received antibiotics. This motivated a further analysis of age-related patterns of R. gnavus colonization (Fig. 1f)²¹. R. gnavus abundances were higher in newborns (linear model, p < 2.2 × 10⁻¹⁶), children up to 11 years old (p < 2.2 × 10⁻¹⁶), and adolescents between 12 and 18 years old (‘schoolage’; p = 0.0164) as compared to adults. Abundances were also higher in seniors (65–92 years old) than in adults (p = 1.37 × 10⁻⁶). We observed similar patterns regarding R. gnavus prevalence (Fig. 1g), where newborns (logistic regression, p = 1.14 × 10⁻⁶), children aged 1–11 (p < 2.2 × 10⁻¹⁶) and seniors (p = 2.92 × 10⁻⁴) were more likely to carry R. gnavus than adults. Adolescents and adults did not have different prevalence of R. gnavus (p = 0.0797).
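According to the Methods, relative abundances were log-transformed after adding a pseudocount of 1.3 × 10⁻⁵, which keeps samples with zero abundance defined on the log scale. The Python sketch below illustrates that transformation. The logarithm base (10 here) and the scale of the input values are assumptions, since the text does not state them, and the function name is hypothetical.

```python
from math import log10

PSEUDOCOUNT = 1.3e-5  # value stated in the Methods


def log_abundance(relative_abundance: float) -> float:
    """Log-transform a relative abundance after adding the pseudocount.

    Base 10 is an assumption; the source does not specify the base.
    """
    if relative_abundance < 0:
        raise ValueError("relative abundance cannot be negative")
    return log10(relative_abundance + PSEUDOCOUNT)
```

A sample in which R. gnavus is not detected maps to the logarithm of the pseudocount rather than to negative infinity, so it can still enter a linear model.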

The high abundances of R. gnavus in infants instigated a closer inspection of abundance over age and in relation to breastfeeding, as breastfeeding was recently reported to have a strong impact on R. gnavus colonization²³. Looking at R. gnavus abundance in the first ten years of life (Supplementary Fig. 2a), we see a rapid increase after the first half year, followed by a decline and rebound around 8 years. We found the shift in the first half year to strongly correlate with feeding practice (Chi-square, p = 1.49 × 10⁻¹⁵). Specifically, infants that were breastfed had lower R. gnavus abundance than children that received no breastfeeding (linear model; exclusive breastfeeding, p = 6.49 × 10⁻¹¹; mixed feeding, p = 4.69 × 10⁻⁴; Supplementary Fig. 2b). To exclude possible confounding and identify other associated factors, we also tested for associations of R. gnavus abundance with feeding practice (n = 184), mode of delivery (n = 170) and antibiotics use (n = 94) in infants up to two years of age, for whom feeding practice data had been recorded, using multivariable linear modeling. This indicated that only feeding practice was significantly associated with R. gnavus abundance in infants (Chi-square of total variable effect, p = 8.93 × 10⁻⁶). In summary, we find evidence that breastfeeding delays R. gnavus colonization in infants, corresponding with previous reports²³.

Together, colonization with R. gnavus appears to be dynamic across the lifespan in healthy individuals, with the highest abundances observed in newborns. While these metagenomic analyses provide important insight into the global distribution of R. gnavus, in-depth genomic analyses are required to investigate whether genomic content differs across described factors such as disease and geography.

Newly generated complete genomes have superior assembly characteristics and cover phylogenetic diversity

For our large-scale genomic analysis of R. gnavus, we first established an isolate collection through extensive culturing efforts and by collecting available isolates, from which we sequenced the genome of 45 isolates using PacBio circular consensus sequencing (CCS) to yield complete, circular genomes and potential extrachromosomal elements (Fig. 2, Methods; Supplementary Data 2). We next complemented these with 208 available MAGs for which sufficient metadata could be retrieved and short-read genome data of an additional 79 isolates (Methods). To obtain assemblies of optimal quality, we tested five long-read de novo assemblers and selected the result with the longest contig (Methods, Supplementary Fig. 3, Supplementary Data 3). We also comprehensively analyzed methylation patterns for the sequenced isolates (Supplementary Methods, Supplementary Fig. 4, Supplementary Data 4). Comparing the quality of these genomes, we observed that MAGs were worse in every aspect of genome assembly when compared to isolate assemblies (Fig. 2a). While total length and number of genes were lower for MAGs as expected, GC content clearly differed between MAGs and isolate genomes, suggesting that current MAG binning techniques may fail to capture AT-rich regions. We further observed that isolates that underwent PacBio CCS were often assembled into single circular contigs, in contrast to a mean of 107 ( ± 58.4 standard deviation) contigs per short-read isolate genome. Additionally, we found four circular extrachromosomal elements predicted to be plasmids with 99.9% confidence (Fig. 2a,b, Supplementary Fig. 5), demonstrating the added value of PacBio CCS. These four putative plasmids comprise two different large sequences of 191 kb and 164 kb, which derived from two distinct isolates from healthy individuals (i.e., QRD006, QRD009 and QRD010 contain one plasmid, QRD011 the other), and have not been described in R. gnavus to date. 
The plasmids are modular and highly related, that is, they are identical except for one gene cluster that is missing from the shorter 164 kb plasmid (Supplementary Fig. 5a). They do not contain evident predicted antibiotic resistance or virulence genes (Methods). The plasmids are likely conjugative or mobilizable based on identified putative transposase genes, which is consistent with their geographically distinct origins (USA and Japan). The plasmids contain a putative ParABS segregation system, annotated as ‘Soj’ (ParA) and ‘ParB domain containing protein’ (ParB). A key feature is a (hypothetical) non-ribosomal peptide synthetase (NRPS) cluster with no known homologs (Supplementary Fig. 5b). However, upstream of it we identified with moderate confidence a transcription factor binding site for CatR, an H₂O₂-responsive repressor.

Leveraging our large genome collection, we then investigated the phylogenetic diversity of R. gnavus (Fig. 2c). This revealed no continent or genome source-specific clustering, but importantly, demonstrated that our R. gnavus isolate collection captures the full breadth of phylogenetic diversity across the tree (Fig. 2c).

R. gnavus motility possibly restricted to infant-derived strains

In order to characterize the functional capacity of R. gnavus, we annotated our genomes with functional orthologs, modules and pathways (from KEGG²⁴) and used linear modeling to identify associations between microbial functions and metadata. Using this methodology, we observed flagellum biosynthesis exclusively in newborns and infants up to 1 year of age, and this association was also statistically significant (p = 0.008). We further investigated flagellum biosynthesis together with chemotaxis, as these are functionally closely related, and found both pathways in ten out of 333 genomes. These ten genomes are all MAGs originating from newborns and infants up to 1 year of age (Supplementary Fig. 6a) and contained (almost) complete operons (Supplementary Fig. 6b). To ensure this finding was not a technical assembly artifact, we traced the origin of these genomes, which revealed that these MAGs derive from infants sampled in three studies and five geographically separated locations (Estonia, Finland, Italy, Russia, and Sweden). Eight out of ten genomes with flagellum genes belong to a phylogenetic clade that is associated with newborns and infants (17/19 genomes in that clade derive from infants of 1 year old or younger; Fig. 2c, clade highlighted in gray), suggesting that motility might be associated with a specific infant-associated clade of R. gnavus. The absence of isolates in this clade precludes experimental verification of flagellum functionality, but strain differences in flagella and motility have been described⁹.

We also screened all genomes for antibiotic resistance genes and found that resistance against tetracycline is the most common among R. gnavus (75/125 isolates; Supplementary Fig. 7). A minority of genomes contains resistance genes against aminoglycosides (n = 19), chloramphenicol (n = 8), trimethoprim (n = 11), lincosamide/macrolide (n = 24), and one and two genomes contain genes related to beta-lactamase and streptothricin resistance, respectively. For selective culturing of R. gnavus we therefore deem tetracycline the most helpful and in vitro validation confirmed that at least isolates containing the tet(O) and/or tet(40) genes, which account for the majority of the observed tetracycline resistance determinants, indeed have increased minimum inhibitory concentrations compared to isolates without tet gene (Supplementary Data 5).

Genomic differences between isolates from healthy individuals and Crohn’s patients indicate a Crohn’s-specific subspecies

To evaluate whether CD-derived R. gnavus isolates genomically differ from healthy-derived isolates, we first placed our genomes into a core genome-based phylogenetic tree (Fig. 3a). As this tree contains practically identical isolates derived from the same person, we also constructed a tree of deduplicated genomes to facilitate statistical testing (Supplementary Fig. 8). This revealed three main clades with a strong enrichment of Crohn’s-derived isolates in the two more basal clades (Fisher’s exact test, p = 3.7 × 10⁻⁴, OR = 12.1 [2.5 – 69.8]). As our phylogenetic tree was reconstructed from only the core genome, we next performed whole-genome ANI analysis and accessory genome comparisons to also assess differences in the other genomic loci, which resulted in a highly similar clustering (Fig. 3a,b). As all R. gnavus genomes included here share at least 95% similarity with one another, which is often considered the species boundary^25,26, we consider that these clades represent subspecies. Together, these results demonstrate that R. gnavus isolates from CD patients are often, but not always, genomically distinct from isolates from healthy controls based both on their core and accessory genome. The phylogeny indicates that most healthy-derived isolates form a monophyletic subspecies clade, while the CD isolates appear polyphyletic and may be categorized into multiple groups.
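The clade enrichment above is assessed with Fisher’s exact test on a 2 × 2 table (host phenotype by clade membership). The following Python sketch shows how a two-sided p-value of this kind can be computed from the hypergeometric distribution. It is an illustration under stated assumptions: the function name is hypothetical, the underlying counts are not given in this section and are therefore not reproduced, and the sketch does not compute the conditional odds ratio estimate that some statistics packages report alongside the p-value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table. Integer weights avoid
    floating-point ties.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def weight(k: int) -> int:
        # Unnormalized hypergeometric probability of top-left cell == k.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    extreme = sum(
        weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed
    )
    return extreme / total_tables
```

For a table of this design, `a` and `b` would hold the numbers of Crohn’s-derived isolates inside and outside the clades of interest, and `c` and `d` the corresponding numbers of healthy-derived isolates.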

Host phenotypes cannot be explained by previously identified putative virulence factors in R. gnavus

A previous study established that R. gnavus can secrete a glucorhamnan polysaccharide with pro-inflammatory properties¹⁵. However, another study found the putative gene cluster encoding the production machinery for this polysaccharide varied between strains, but direct comparison was not possible with short-read sequencing data²⁷. Such insights into genomic variations may be crucial to understand immunogenicity of different isolates, motivating a more detailed analysis of this gene cluster and other genes with similar putative functions. We therefore tested whether previously suggested R. gnavus virulence factors could explain the association with CD (Supplementary Fig. 8). First, we observed that four genes or gene clusters (superantigens, tryptophan decarboxylase, bilirubin reductase and selenium-dependent xanthine dehydrogenase) were present in all complete R. gnavus genomes and are therefore part of the core genome (Fig. 3a). While we saw variation in several other gene clusters (glucorhamnan-producing gene cluster, Fisher’s exact test, p = 0.19; and the nan gene cluster, p = 0.35), only one, namely the capsular polysaccharide gene (cps) cluster, was associated with the distinction and was detected exclusively in isolates from the healthy-associated clade (p = 8 × 10⁻⁴). In conclusion, only the cps cluster, which leads to a more tolerogenic immune response¹⁶, could partially distinguish host phenotype groups.

Genomic architecture of gene cluster producing the proinflammatory polysaccharide glucorhamnan reveals genomic variations

Previous studies have highlighted the relevance and genomic architecture of the gene cluster producing inflammatory glucorhamnan based on complete, intermediate, or limited short-read coverage²⁷. Here, we re-examined in our diverse collection of complete genomes whether these clusters derive from the same genomic locus and are likely to be homologous (Supplementary Fig. 9). Compared to the isolate in which the gene cluster was experimentally verified (QRD039 = RJX1121)¹⁵, we saw variations in multiple genes, including several glycosyltransferases (Supplementary Fig. 9a). We observed 13 out of 45 long-read genomes to have the complete original cluster as identified in RJX1121, while 30 genomes had 20/23 genes as annotated in NZ_AAYG02000032.1 and two had 21/23 genes (those with 20 or 21 hits are subsequently called ‘partially complete’)^15,27. These genomes lacked the same genes: a glycosyltransferase (RUMGNA_03519; present in the two genomes with 21 genes found), a transporter (RUMGNA_03522) and a polyphosphoglycerol synthesis gene (RUMGNA_03523). These partially complete cluster variants lack the genes in positions that were reported to have low coverage and we think they are therefore the same as those described in Sorbara et al., 2020 as ‘intermediate coverage’. To elucidate whether these genomes contain a truly different gene cluster at a different genomic location, the flanking genes were determined to map the genomic neighborhood. All investigated genomes had the same neighboring genes, thereby revealing a conserved genomic locus (the 3’ and 5’-flanking genes are annotated as ‘HPr family phosphocarrier protein’ and ‘glutamine-fructose-6-phosphate transaminase’). By closer inspection of the genomic loci, we found that the operon lacking RUMGNA_03519, RUMGNA_03522 and RUMGNA_03523 had other genes inserted instead (Supplementary Fig. 9a).
Moreover, the variability at protein level compared to the reference gene (30-70% identity) suggests that this whole locus may be subject to positive selection or adaptation pressure. Nevertheless, based on similarity in genomic architecture we expect that all these strains still produce polysaccharides, although it remains to be established whether all of them induce pro-inflammatory effects.
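The completeness labels used above follow directly from the number of reference cluster genes detected per genome. The following Python sketch restates that rule and tallies the counts reported in the text. The function and label names are hypothetical, and the catch-all "other" category is an addition for robustness rather than a class described in the paper.

```python
from collections import Counter

REFERENCE_GENE_COUNT = 23  # genes in the reference cluster, per the text


def classify_cluster(genes_found: int) -> str:
    """Label a glucorhamnan cluster by how many reference genes were found."""
    if genes_found == REFERENCE_GENE_COUNT:
        return "complete"
    if genes_found in (20, 21):
        return "partially complete"
    return "other"  # not a category from the paper; defensive fallback


# Hits per long-read genome as reported: 13 with 23, 30 with 20, 2 with 21.
reported_hits = [23] * 13 + [20] * 30 + [21] * 2
tally = Counter(classify_cluster(n) for n in reported_hits)
```

The tally accounts for all 45 long-read genomes: 13 complete and 32 partially complete clusters.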

A similar comparative genomics analysis for the nan gene cluster, responsible for releasing 2,7-anhydro-Neu5Ac from mucin²⁰, showed some genomes with nan-like genes in a different locus (Supplementary Fig. 9b). All these alternative nan-like clusters had the same genomic architecture, which importantly lacked the nanH (intramolecular trans sialidase) gene, suggesting that this partial cluster does not confer the same function. Together, these data show that strain differences across functionally relevant gene clusters are common, indicating that statements regarding virulence of R. gnavus based on single isolates should be interpreted with caution. Our collection of well-characterized isolates allows researchers to assess the relevance of strain differences in future experiments.

GWAS reveals genes related to healthy or Crohn’s-associated phenotype

In order to find genes that could explain differences in the genomic repertoire of Crohn’s-derived versus healthy-derived isolates, we conducted a bacterial GWAS using Hogwash, which incorporates genomic relatedness information (Methods). On a technical note, we confirmed high correlation between core and accessory genomes (Fig. 3b), and high pangenome size similarity between the Crohn’s-associated and healthy-associated groups (Fig. 3c). We deemed including MAGs for this analysis to be inappropriate, as both the core and accessory genome of MAGs are substantially smaller than that of isolates (Supplementary Fig. 10, p < 2 × 10⁻¹⁶). Thus, their inclusion may increase false negatives or otherwise lead to spurious results.

Our bacterial GWAS analysis revealed 163 genes that were robustly associated with Crohn’s isolates (FDR < 0.05, stricter synchronous model) through a high epsilon value, which quantifies the correlation between genotype and phenotype (Fig. 3d)²⁸. We visualized and counted the presence of these genes in all R. gnavus genomes to better understand their possible correlation with host phenotype (Supplementary Fig. 11,12). Among the genes enriched in Crohn’s-derived isolates we found nineteen genes related to mobile genetic elements (transposases and excisionases), a predicted fucosidase which might be involved in cleaving off the terminal fucose residue on mucin, a response regulator that Bakta annotated as ‘spo0A’, and a holin gene (Supplementary Data 6). We screened the consensus sequence of this putative fucosidase gene for CAZyme domains (Methods) to gain more functional insight and indeed found a GH29 domain encoding a fucosidase. We also compared fucosidase domains between Crohn’s and healthy isolates using CAZyme annotations for GH29 and GH95 (CAZymes with known fucose-cleaving functionality of mucin molecules), but found no significant differences (Wilcoxon rank sum test, p = 0.098 and p = 0.39, respectively; Supplementary Fig. 13). On the other hand, healthy-derived isolates were especially enriched for galactosidases and other genes involved in sugar metabolism (Fig. 3d, Supplementary Data 5). Taken together, we find novel gene-phenotype associations and provide a set of candidate genes for follow-up research on the role of R. gnavus in CD.

Discussion

Host phenotype-microbe association studies are often restricted to single diseases, age groups and geographic regions, which has also been the case for R. gnavus^12,13. In this work we provide a detailed global image of both the relative abundance and prevalence of R. gnavus, while we also investigate genomic variation within R. gnavus isolates in depth. In both aspects, this is to our knowledge the largest investigation to date. Key findings are the remarkably high relative abundance in newborns and young infants (Fig. 1f), which is inversely associated with breastfeeding (Supplementary Fig. 2), and the increased prevalence and abundance of R. gnavus in Westernized populations (Fig. 1c,d). Given the robust associations of increased relative abundance of R. gnavus with several inflammatory diseases and allergies, many of which have high incidence in high-income countries and rapidly increasing incidence in newly industrialized countries^29,30,31,32, this raises the question of whether R. gnavus can have detrimental immunogenic effects on the host and whether this is strain-dependent. We show extensive genetic variation between strains in immunomodulating gene clusters, and our genetically well-characterized isolate resource can be used for experimental validation of differences in immunogenicity. The high prevalence of R. gnavus across both healthy and diseased individuals suggests that the consequences of being colonized with R. gnavus per se are unlikely to be exclusively negative, prompting the question of whether disease associations become apparent when distinguishing R. gnavus strains. This hypothesis is in line with what we observed in clustering of our isolate genomes (Fig. 3a), where we see that isolates deriving from healthy individuals generally cluster apart from those isolated from Crohn’s patients. Indeed, there have also been examples in literature of a positive health influence of R. gnavus, for example with healthy weight gain in undernourished children³³. It would therefore be crucial that future intervention studies using R. gnavus determine if the used isolates belong to a healthy-associated or disease-associated clade.

In the past decade MAGs have been increasingly used in large-scale gut bacterial genomics studies^{34,35,36,37,38}, especially because culturing of specific gut bacteria can be highly laborious and challenging. While these MAGs have led to important biological advances, we show here that even high-quality MAGs (as defined by international standards³⁹) remain of substantially worse quality than isolate genomes in multiple aspects (lower genome size and missing genes, higher GC content, amongst others, Fig. 2a)⁴⁰. In the case of bacterial GWAS analyses, which aim to associate bacterial genes or genomic features with a phenotype of interest, including MAGs may therefore lead to biases and spurious associations caused by (non-)randomly missing genes due to binning and assembly artifacts. Extrachromosomal elements such as plasmids are generally not represented in MAGs, as they cannot be confidently binned, while these may be the most relevant in connection to disease and treatment options^41,42.

Through bacterial culture combined with PacBio CCS, we have generated high-quality genome data that lead to novel insights into R. gnavus biology. Two aspects that highlight this are the identification of large plasmids and a conserved methylated sequence motif. To date, only one 7 kb plasmid of Ruminococcus gnavus is described in GenBank (accession number NZ_CP084015.1)⁴³. The two related novel plasmids we identified in the present study are much larger (164 kb and 191 kb; Supplementary Fig. 5) and likely conjugative, indicating a diversity of plasmids in R. gnavus that is as yet underexplored. The methylated DNA motifs that are identified here are different from those known so far (http://rebase.neb.com/cgi-bin/pacbioget?10929; Supplementary Fig. 4)⁴⁴, in line with the high variability in motifs we found per genome. Nevertheless, we find a single m⁴C-methylated motif that is almost universally conserved across R. gnavus genomes (VNNVNCTGVNCAN). These results are reminiscent of those described for Clostridioides difficile⁴⁵.

We demonstrated that R. gnavus is a polyphyletic species, divided into multiple (genotypically and phenotypically distinct) subspecies clades. Notably, Crohn’s-derived isolates were overrepresented in specific phylogenetic groups, while previously suggested virulence factors could not explain this separation. This suggests that these virulence factors may not play a significant role in CD symptomatology. Instead, by bacterial GWAS we identified 163 genes that could be targets for experimental validation of their role in CD development (Fig. 3, Supplementary Data 5). Among these genes are 56 that we find overrepresented in CD. However, we advise further validation of these genes in larger numbers of Crohn’s-derived R. gnavus genomes before conducting laborious in vitro or in vivo experiments. Validations with the currently available data indicate that some presumably Crohn’s-associated genes are also common among R. gnavus derived from healthy people. We listed the more noticeable candidates for which functions could be predicted. The most striking candidate is a putative fucosidase gene, as this could be directly involved in relevant cellular processes such as cell adhesion and immune system regulation⁴⁶. Secondly, we hypothesize that genomic rearrangements and horizontal gene transfer may play an important role in the evolution of CD-associated R. gnavus, given the enrichment of predicted transposase and excisionase genes. Thirdly, we find a predicted holin gene which, although highly speculative, might play a role in suppressing competing bacteria⁴⁷. A previous study identified 199 IBD-specific genes², based on a pangenome of 17 draft genomes. Those draft genomes include multiple IBD-related strains and genomes from the type strain, which we find to be phylogenetically distant in our core genome phylogeny based on a pangenome of 333 genomes. This increase in genome number in the current work particularly expands the accessory genome, where the largest differences in functionality are expected. Both the previous report and our results indicate predicted functional differences in, e.g., mobile elements such as transposases and (putative) mucus utilization genes, underscoring the robustness of the results and narrowing down the set of target genes for IBD-specific research². Furthermore, IBD research on R. gnavus could benefit from considering the host and possible complex host-microbe interplay for the proposed virulence factors. For example, in antibiotic-treated mice the genetic background determined whether R. gnavus would ameliorate or exacerbate colitis⁴⁸.

In conclusion, we present one of the largest collections of complete genomes and associated extrachromosomal elements of any gut microbe not usually causing acute infection⁴⁹, and provide important novel biological insight into the global epidemiology and genomic variation of R. gnavus. R. gnavus has an ambiguous relationship with human health⁵⁰, and different strains may exert different effects on their host. Our resource of complete genomes and isolates opens promising avenues for experimental validation and further bioinformatic scrutiny, and we expect this to be valuable to the broad gut microbiome research community.

Methods

Assessing prevalence and abundance of R. gnavus across human populations

We used the publicly available ‘curatedMetagenomicData’ (version 3.6.2) resource to screen 21,030 fecal metagenomes from 86 studies on all habitable continents for the prevalence and abundance of R. gnavus²¹. We used R (version 4.0.2; https://www.R-project.org/) to interrogate this dataset and calculate statistical parameters. We focused our analyses on metagenomes with a sequencing depth of at least five million reads and retained only the first sample per subject ID, after which 12,791 samples remained. We used the accompanying curated metadata to assess prevalence and abundance among healthy individuals across age, geography, lifestyle, and health states (Supplementary Data 1). Prevalence of R. gnavus was compared using logistic regression. Relative abundances were compared after adding a pseudocount of 1.3 × 10⁻⁵, followed by log-transformation and multivariable linear modeling. To identify suitable variables for logistic and linear models, we calculated collinearity between variables using Variance Inflation Factors (VIF), computed with the ‘vif()’ function from the ‘car’ package. Variables with VIF values above 2 were excluded by removing age (in years) and country from the models, leaving disease, age category, gender and westernization included as informative variables. Rows with missing values were discarded when building the models. For the final models, the association of each variable with R. gnavus prevalence or abundance was tested with Chi-square using the ‘drop1()’ R function. For infants, linear models were built using the same approach, including the variables feeding_practice, born_method and antibiotics_current_use. Correlation between feeding practice and age under or over half a year was tested using Chi-square. Sequencing depth (number of reads) was also log-transformed and compared using parametric t-test. P-values ≤ 0.05 were considered significant. To compare differences in R. gnavus prevalence in relation to sequencing depth, we divided all Westernized and non-Westernized metagenomes in ten equal groups (quantiles) based on sequencing depth (number of reads). Relative abundances of R. gnavus are shown as quantiles, as adapted from previous publications⁵¹,⁵².
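The abundance transformation and the depth-based grouping described above can be sketched as follows. This is a minimal Python illustration, not the R code used in the study: the function names and toy values are hypothetical, the logarithm base (10) is an assumption because the text only states "log-transformation", and the grouping simply ranks samples by read count and cuts the ranking into equal-sized groups.

```python
import math

PSEUDOCOUNT = 1.3e-5  # pseudocount stated in the Methods


def log_abundance(rel_abundance):
    """Log-transform a relative abundance after adding the pseudocount.

    Base 10 is an assumption; the Methods do not state the base.
    """
    return math.log10(rel_abundance + PSEUDOCOUNT)


def depth_groups(read_counts, n_groups=10):
    """Assign each sample to one of n_groups equal-sized groups by depth.

    Returns group indices (0 = shallowest) in the same order as the input.
    """
    order = sorted(range(len(read_counts)), key=lambda i: read_counts[i])
    groups = [0] * len(read_counts)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(read_counts)
    return groups


def prevalence(abundances):
    """Fraction of samples in which the species is detected (abundance > 0)."""
    return sum(a > 0 for a in abundances) / len(abundances)
```

Prevalence can then be computed per depth group and per lifestyle category, which is one way to check whether a difference in prevalence persists at comparable sequencing depth.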

Mapping the distribution of R. gnavus across environments

To map the spread of R. gnavus across different environments, we searched publications and online resources that link the presence of R. gnavus to an environment or biome. R. gnavus has been described to reside in the intestinal tract of different animals: cats and dogs¹⁰, chickens⁵³, lambs⁵⁴, rodents and pigs¹¹, and cattle⁵⁵. Furthermore, we have downloaded and screened the dataset related to the 2022 Microbiome publication by Ruscheweyh and colleagues to visualize prevalence and abundance of R. gnavus (Supplementary Fig. 14)⁵⁶.

Collection and curation of publicly available genome datasets

To compose a collection of R. gnavus metagenome-assembled genomes (MAGs) and isolate genomes, we queried a large, recent collection of gut MAGs³⁴. Here, we specifically selected high-quality (HQ) MAGs annotated as Ruminococcus gnavus or its synonym Faecalicatena gnavus (with completeness > 90% and contamination < 5%)³⁹. As the metadata from Almeida et al. does not contain curated information on disease status of the individual and this is of prime interest to our study³⁴, we matched identifiers to those present in the curatedMetagenomicData package. HQ-MAGs were only included if at least both disease status and geographic origin of the original sample could be traced back. This led to a collection of 201 HQ R. gnavus MAGs with associated metadata.

In order to obtain additional isolate genomes to complement the MAG collection, we queried the NCBI database in December 2021 and associated metadata to retrieve at least information on disease status and geographic origin of the isolate, like the HQ-MAGs. This yielded an additional 65 R. gnavus isolate genomes, which all originated from China or the USA. Furthermore, we included the type strain as reference genome (ATCC 29149, accession number GCA_009831375.1)²,²⁷,⁵⁷.

Metagenome-assembled genome generation from fecal metagenomes derived from multiple recurrent Clostridioides difficile-infected patients

We used an in-house metagenomic dataset of multiple recurrent Clostridioides difficile-infected patients to generate seven additional HQ R. gnavus MAGs – the metagenomic data of which are available in the European Nucleotide Archive under project number PRJEB44737⁵⁸. To produce high-quality metagenome-assembled genomes (MAGs), we adapted a previously published protocol⁵⁹.

The workflow is available as Snakemake⁶⁰ on Zenodo (https://doi.org/10.5281/zenodo.14628195) and works as follows. Raw metagenomic sequencing reads, from which human reads had already been removed, were preprocessed using fastp (version 0.20.1, parameters: ‘--cut_right --cut_window_size 4 --cut_mean_quality 20 -l 75 --detect_adapter_for_pe -y’) to trim low-quality ends, remove reads shorter than 75 bases, remove adapter sequences and remove low-complexity reads⁶¹. (Note: preprocessing is not part of the workflow as described on Zenodo.) Remaining, high-quality reads were assembled into scaffolds using metaSPAdes (version 3.15.4, parameters: ‘--only-assembler’)⁶². Scaffolds were binned with metaWRAP⁶³ (version 1.3.2) using three binning tools: MaxBin2⁶⁴ (version 2.2.6), MetaBAT2⁶⁵ (version 2.12.1) and CONCOCT⁶⁶ (version 1.0.0) using a minimum contig length of 2500 bp (‘-l’ option). Bins were then refined using metaWRAP’s ‘bin_refinement’ function, which uses CheckM⁶⁷ (version 1.0.12) to assess bin quality, setting completeness and contamination cut-offs of 75% and 10%, respectively (‘-c’ and ‘-x’ options). After refinement, bins were reassembled using metaWRAP’s ‘reassemble_bins’ function with assemblers MEGAHIT⁶⁸ (version 1.1.3) and metaSPAdes (version 3.13.0), again setting the minimum completeness to 75% and contamination to 10%, and the minimum length to 2000 (‘-l’ option). The resulting refined and reassembled bins were classified with the Genome Taxonomy Database toolkit (GTDB-Tk; version 2.1.0)⁶⁹. Bins classified as Ruminococcus gnavus with > 90% completeness and < 5% contamination were included for further analyses.

Culturing of R. gnavus from feces of healthy donors and patient material

We ordered R. gnavus strain H2_28 (DSM number 108212) from the German Collection of Microorganisms and Cell Cultures (DSMZ, Braunschweig, Germany), resuspended it in Brain Heart Infusion broth (bioMérieux, Marcy-l'Étoile, France) and streaked it on Tryptic Soy agar +5% Sheep blood (TSS; bioMérieux) to isolate pure cultures. Two unique cultures (QRD001-QRD002) were isolated from feces by streaking on Columbia Nalidixic acid Agar (bioMérieux; Supplementary Data 2). These were all cultured in an anaerobic cabinet (Whitley A35, Don Whitley Scientific Limited, UK) with an anaerobic gas mixture (10% H₂, 10% CO₂, 80% N₂) at 37 °C. These samples were cultured from two different sample collections. First, healthy-derived isolates were obtained from donor fecal samples of Netherlands Donor Feces Bank donors; written informed consent was obtained for using these samples and clinical data, and the use was approved by the Medical Ethics Committee at Leiden University Medical Center (P15.145). Second, CD-derived isolates from LUMC were obtained from fecal samples of patients aged above 18 years with a planned fistula surgery at LUMC and material was collected between July 2019 and June 2021. The study was approved by the Central Committee on Research involving Human Subjects and the local Medical Ethical Committee of the Leiden University Medical Center (study number P18.069). All patients gave written informed consent.

To further expand our R. gnavus genome collection, we cultured fourteen R. gnavus isolates from fecal samples of healthy feces donors that were available at Vedanta Biosciences (Supplementary Data 2). Human donor samples were obtained from both university hospitals and commercial sources. In all instances, informed consent language was reviewed and approved by the local ethics and regulatory authorities. Consent for the use of the sample was obtained from each subject. These were isolated and identified as follows: R. gnavus strains were isolated from various healthy donor stools by generating spore and non-spore fractions. Briefly, the non-spore fraction was generated by resuspending 1 g of fecal material in 10 mL sterile, pre-reduced PBS. The spore fraction was generated by adding 100% ethanol to the PBS fecal suspension to achieve a 50% (v/v) ethanol concentration. The fecal ethanol suspension was incubated at 25 °C for 1 hr while shaking. Following incubation, the fecal ethanol suspension was centrifuged at 3400 × g for 20 minutes and the cell pellet resuspended in 1 mL of sterile, reduced PBS. Serial dilutions of the spore and non-spore fraction were plated on either Eggerth-Gagnon + 5% horse blood agar, Brucella Blood Agar (Anaerobe Systems, Inc., Morgan Hill, California, USA), MSAT (Anaerobe Systems), or chocolate agar and incubated at 37 °C anaerobically for 72 hr. Isolated colonies were identified by Sanger sequencing of the 16S amplicon using 8F and 1492R primers and Illumina shotgun sequencing. Isolated colonies were inoculated into 1.2 mL of Peptone Yeast Extract Broth with Glucose (PYG; Anaerobe Systems) in a 96-deep well plate and incubated at 37 °C anaerobically for 48 hr. After incubation, colony identity was determined by performing PCR from 200 µL of the culture using universal 16S primers 8F and 1492R. Selected isolates were then sub-cultured from the 96-deep well plate onto the appropriate agar medium and incubated at 37 °C anaerobically for 72 hr. An isolated colony from this plate was inoculated into 5 mL of PYG and incubated at 37 °C anaerobically for 24 hr. 1 mL of the culture was pelleted by centrifuging at 10000 × g for 5 minutes. DNA was extracted from the pellet using the DNeasy blood and tissue kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. Colony identity was determined again by Sanger sequencing of the 16S gene amplicon using 8F and 1492R primers and Illumina shotgun sequencing.

Furthermore, fourteen isolates were cultured and collected at the University Medical Center Groningen as follows. Brucella blood agar medium (Mediaproducts BV, Groningen, The Netherlands) was used to cultivate the R. gnavus strains QRD024, QRD025 and QRD028 from human clinical specimens (Supplementary Data 2). QRD024, QRD025 and QRD028 were obtained from clinical samples and isolated bacteria were used for research purposes as no objections were raised by patients and no patient data was used. The plates were transferred to an anaerobic workstation (Whitley A45) after inoculation and incubated for one to three days at 37 °C. The anaerobic medium YCFA supplemented with either apple pectin or porcine mucin type III (4.5 g/l) was used for the isolation of QRD026, QRD027, and QRD029-QRD031 as described earlier⁷⁰. Fecal samples of healthy volunteers were used for inoculation on pre-reduced medium and the plates were incubated at 37 °C in an anaerobic chamber (Whitley A35 Workstation) with an anaerobic gas mixture (10% H₂, 10% CO₂, 80% N₂). The strains QRD032-QRD037 were isolated from fecal samples of IBD patients on either phenylethyl alcohol agar (Mediaproducts BV, Groningen, The Netherlands), brain heart infusion agar (Oxoid Limited, Cheshire, UK) supplemented with yeast (2.5 g/l), hemin (0.001% w/v) and cysteine (1 g/l) or YCFA medium supplemented with glucose (4.5 g/l). Ethical approval for collecting and using biological material was obtained as previously described for QRD026, QRD027 and QRD029-QRD037 (local ethics committee of the University Medical Center Groningen METc2014.236 and METc2014.291, respectively)⁷¹. Additional details on logistics and sample collection can be found in Plomp et al. for QRD026, QRD027 and QRD029-QRD031⁷², and in von Martels et al. (study was registered on ClinicalTrials.gov under NCT02538354) for QRD032-QRD037⁷¹.

Moreover, isolates as cultured in their respective publications were obtained from the Broad Institute¹⁵, and Sanger Institute⁷³. All cultures from outside the Leiden University Medical Center (LUMC) were sent to the LUMC as frozen glycerol stocks and anaerobically cultured on TSS. After obtaining pure colonies, all isolates were independently confirmed to be R. gnavus in our laboratory using matrix-assisted laser desorption/ionization coupled to a time-of-flight mass spectrometer (MALDI-TOF; Bruker Daltonics GmbH, Bremen, Germany). All isolates were able to grow on TSS, CNA and Chocolate agar PolyViteX (bioMérieux) and the colony morphology appeared on plates as round, glassy white colonies with a bright white center. Sometimes colonies displayed concentric circles, reminiscent of checker game pieces.

Data processing of Illumina-sequenced R. gnavus isolates

The fourteen isolates cultured at Vedanta Biosciences were sequenced on the Illumina NextSeq platform using 150 bp paired-end reads. These data were included with the isolate short-read-based genomes, increasing the number to 79 short-read isolates. Raw Illumina sequence data were cleaned and trimmed using fastp (v0.23.2) and sequence quality was inspected using FastQC (v0.11.9; https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and MultiQC⁷⁴ (v1.8). Cleaned reads were assembled by first using SKESA⁷⁵ (v2.4.0) and subsequently SPAdes (v3.15.3) with “--untrusted-contigs” and “--isolate” parameters.

Quality control and annotation of short-read-based genome collection

We have collected a total of 287 short-read-based genomes of R. gnavus, consisting of 79 assembled whole-genome sequences from cultured isolates and 208 metagenome-assembled genomes (MAGs). We also added the one available reference sequence in our analyses (NCBI GenBank accession number GCF_009831375.1). We filtered out contigs shorter than 1000 bp using BBtools’ reformat.sh (version 37.62; https://sourceforge.net/projects/bbmap/). We estimated completeness and contamination of all genomes using CheckM (version 1.0.13) and verified that all genomes taxonomically classify as R. gnavus using GTDB-Tk (version 2.1.0). Assembly length statistics were determined using QUAST⁷⁶ (version 5.0.2). Finally, genomes were annotated using Bakta⁷⁷ (version 1.6.1), which also provides the number of open reading frames, or predicted genes, per genome.

DNA isolation of R. gnavus isolates and generation of complete genomes using PacBio circular consensus sequencing

To generate complete genomes, 45 isolates were subjected to long read sequencing on the Pacific Biosciences (PacBio, Menlo Park, California, USA) Sequel IIe platform at the Leiden Genome Technology Center. To prepare high molecular weight total DNA, isolates were cultured anaerobically overnight in 10 mL BHI at 37 °C. Cells from 5 mL of culture were pelleted and processed using the Qiagen Genomic-tip 100/G, according to the manufacturer’s instructions. SMRTbell® libraries were generated as follows. Genomic DNA was sheared with the Megaruptor 3 system (Diagenode LLC, Denville, New Jersey, USA) using 35 cycles. Libraries were generated according to the following manufacturer’s procedure and checklist: Preparing whole genome and metagenome libraries using SMRTbell® prep kit 3.0 (PN 102-166-600 REV02 MAR2023), thereby using barcoded adapters. Size-selection was performed on library sub-pools using either diluted AMPure PB beads (PacBio, 35% beads, 3.1x v/v ratio) or Blue Pippin (Sage Science, Beverly, Massachusetts, USA), depending on the insert-size of the libraries. The libraries were sequenced on a PacBio Sequel IIe platform with a 30 hour movie time using Sequel II Binding Kit 3.2 and Sequel II sequencing kit 2.0.

Long-read assembler mini-benchmark

Given the relative infancy of assembly algorithms for PacBio CCS data of microbial genomes, we performed a mini-benchmark of five long-read de novo assemblers: Canu⁷⁸,⁷⁹ (version 2.2), Flye⁸⁰ (version 2.9.2), Raven⁸¹ (version 1.8.1), Hifiasm⁸² (version 0.19.6-r595) and IPA (version 1.8.0; https://github.com/PacificBiosciences/pbipa). In this benchmark, each assembler was provided 8 processor threads on the Shark high-performance computing cluster of the Leiden University Medical Center. Shark runs on Rocky Linux 8.7, with SLURM version 23.02.7. The available processors include Intel Xeon E5-2697, E5-2690 and E5-4650. Each assembler was provided as much memory as it needed to complete the assembly. The tools exhibited clear differences in number of contigs generated, processing time and memory use (Supplementary Fig. 3). Note that sample QRD034 was sequenced much deeper than the rest and subsampled to 30% of reads (= 277× coverage) to facilitate assembly. Contigs were taxonomically classified using the Contig Annotation Tool (CAT version 5.2.3)⁸³ to verify if they derived from R. gnavus. Canu, Flye, Hifiasm and IPA report if assembled contigs are linear or circular. From the different assemblies, we selected the assembly that yielded the longest contig and the longest total assembly length (all exceeding 3 Mb), giving Flye precedence as it provides the most extensive statistics (Supplementary Data 3). This resulted in 38 assemblies from Flye, 3 from Hifiasm, and 2 each from IPA and Raven. All contigs from selected assemblies were reoriented using dnaapler⁸⁴ (version 0.3.0) to start at the dnaA, repA or terL gene for chromosomes, plasmids and bacteriophages, respectively. Raven and Hifiasm produce assembly graphs, which were viewed to assess if contigs were linear or circular. Assemblies with a smaller secondary circular contig were analyzed with geNomad⁸⁵ (version 1.7.4) to predict the probability of it being a plasmid, using the built-in score calibration module with aggregated results from both the marker-based and neural net-based classifications.
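The selection among assemblers can be illustrated with a short Python sketch. The exact decision rule is not spelled out in the text, so this is one possible reading under stated assumptions: assemblies are ranked by longest contig, then by total length, and Flye wins only when it ties with the best alternative. The function name and the contig lengths in the example are hypothetical.

```python
def pick_assembly(candidates, preferred="Flye"):
    """Choose one assembler's output for a sample.

    candidates: dict mapping assembler name -> list of contig lengths (bp).
    Ranking (an assumed interpretation of the Methods): longest contig first,
    then total assembly length, then the preferred assembler as tie-breaker.
    """
    def score(name):
        lengths = candidates[name]
        return (max(lengths), sum(lengths), name == preferred)

    return max(candidates, key=score)


# Toy example with made-up contig lengths.
example = {
    "Flye": [3_300_000, 28_000],
    "Hifiasm": [3_300_000, 28_000],
    "Raven": [2_400_000, 900_000],
}
chosen = pick_assembly(example)
```

In this toy case Flye and Hifiasm tie on both length criteria, so the tie-breaker selects Flye.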

We included two isolates derived from the strain DSMZ 108212, of which one we obtained directly from the DSMZ (QRD005) and the other was cultured at the Sanger Institute (QRD022). Assembly with Hifiasm yielded a 3.3 Mb contig and a 28 kb contig for QRD022, while QRD005 could not be resolved to fewer than three contigs, with the longest being 2.4 Mb. These two assemblies were not completely identical and we decided to use a reference-based assembly of the unresolved one against the 3.3 Mb contig using minimap2⁸⁶ (version 2.29) and samtools⁸⁷ consensus (version 1.19; parameters: ‘--min-MQ 5 --min-depth 10’) to generate an improved assembly of QRD005. This resulted in two contigs of 3.3 Mb and 178 bp. We manually removed the 178 bp fragment and used the single 3.3 Mb contig assembly as representative of the ‘DSMZ-108212’ = QRD005 isolate (Supplementary Data 3).

Final genome assemblies were annotated with DNA methylation information from the PacBio SMRT Link Microbial Genome Analysis platform.

Antibiotic resistance screening of isolate genomes

To assess the genotypic antibiotic resistances in isolate genomes, we screened 79 short-read genome sequences of isolates, the 45 newly generated long-read genomes, and the one reference genome for the presence of antibiotic resistance genes using ABRicate (version 0.8.13; https://www.github.com/tseemann/abricate) with NCBI’s AMRFinderPlus database (downloaded 11 November 2022, containing 5735 sequences)⁸⁸. Genes were assumed present if at least 95% of the gene matched with at least 95% identity to the gene in the database. For in vitro validation, ten isolates—five with tet tetracycline resistance genes and five without—were assessed for tetracycline minimum inhibitory concentrations (MIC at 48 h) using an ETEST (bioMérieux) on TSS medium at 37 °C in a Whitley A35 anaerobic cabinet. However, since we managed to isolate R. gnavus without the use of antibiotic selection and tetracycline resistance is also common among other human gut commensals, we did not pursue this further.

Search for previously described inflammatory factors of R. gnavus

Several R. gnavus genes have previously been associated with intestinal inflammation. We screened our collection of genomes for the presence of two superantigen genes (accession numbers WP_105084811.1 and WP_105084812.1)¹⁷, 23 genes encoding the machinery to produce a proinflammatory (glucarhamnan) polysaccharide (NZ_AAYG02000032.1)¹⁵, one tryptophan decarboxylase gene (RUMGNA_01526 from UniProt)¹⁸, and 20 genes encoding a capsule polysaccharide (RUMGNA_02411 – RUMGNA_02392 from UniProt)¹⁶. We used protein BLAST⁸⁹ (blastp; version 2.13.0) to screen the genomes for the presence of each of these genes. Only hits that covered at least half of the gene of interest (‘-qcov_hsp_perc 50’) with an E-value of 1 × 10⁻²⁰ or smaller (‘-evalue 1e-20’) were considered for further analysis. Gene clusters were considered present when all the genes were detected.
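The presence call for a gene cluster can be sketched in a few lines of Python. In the study the coverage and E-value thresholds were passed to blastp directly; the sketch below instead applies the same two thresholds to already parsed hits, purely for illustration. The tuple layout, function names and gene labels are hypothetical.

```python
def detected_genes(hits, min_qcov=50.0, max_evalue=1e-20):
    """Return the set of query genes with at least one acceptable hit.

    hits: iterable of (query_gene, query_coverage_percent, evalue) tuples,
    a hypothetical layout for parsed tabular BLAST output.
    """
    return {
        gene
        for gene, qcov, evalue in hits
        if qcov >= min_qcov and evalue <= max_evalue
    }


def cluster_present(cluster_genes, hits):
    """A cluster counts as present only when every one of its genes is detected."""
    return set(cluster_genes) <= detected_genes(hits)
```

For example, a two-gene cluster with one hit at 40% query coverage would be called absent, even if the other gene is matched over its full length.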

Using the same method, we also screened genomes for the presence of the bilirubin reductase gene (bilR, WP_009244284.1)⁹⁰, selenium-dependent xanthine dehydrogenase (sd-XDH, QHB24869.1)¹⁹, and the nan cluster for sialic acid metabolism (RUMGNA_02691 through RUMGNA_02701 from UniProt)²⁰. Gene operons were visualized using clinker⁹¹.

Annotation of functional pathway genes

We annotated carbohydrate-active enzymes (CAZymes) by comparing the genomes to dbCAN⁹² (version 10) using HMMer⁹³ (version 3.3.2). Within the CAZyme families, we focused on two glycosyl hydrolase families that include fucosidases, GH29 and GH95, which have been described as important for mucus utilization⁹⁴, a main feature of R. gnavus. Genomes were also annotated using KEGG-Decoder⁹⁵. Pathways for chemotaxis and flagellum biosynthesis were annotated using the KOALA definitions available online²⁴. Moreover, genomes were screened for the presence of annotated biosynthetic gene clusters (BGC) using antiSMASH⁹⁶ (version 6.1.1).

Comparison of whole genomes to find clusters of genomic variants

Whole genomes were compared to one another using average nucleotide identity (ANI) with fastANI (version 1.33)²⁶. Furthermore, genomes were subjected to a pangenome analysis using Panaroo (version 1.3.0; parameters ‘--clean-mode strict -a core --aligner mafft --core_threshold 0.95’)⁹⁷. For the pangenome, we considered genes that occur in at least 95% of genomes to be core genes, as recommended when including MAGs⁹⁸. The core genes were concatenated and, using MAFFT⁹⁹ (version 7.505), a core genome multiple sequence alignment was generated, which was automatically trimmed using trimAl¹⁰⁰ (version 1.4.1). A maximum likelihood phylogeny was inferred from the trimmed multiple alignment using IQ-tree¹⁰¹ (version 2.2.0.3), including ModelFinder Plus¹⁰² to automatically select the best fitting evolutionary model and ultrafast bootstrap (1000 replicates) to calculate branch support¹⁰³. The selected models were: short-read genomes GTR + F + I + I + R9; long-read genomes GTR + F + R7; all genomes GTR + F + R10. Trees were visualized in iTOL¹⁰⁴.
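The 95% core-gene definition amounts to a simple frequency cut-off on a gene presence/absence table. The Python sketch below illustrates the idea only; it is not Panaroo's implementation, and the data structure, function name and gene labels are hypothetical.

```python
def core_genes(presence, n_genomes, threshold=0.95):
    """Return genes found in at least `threshold` of all genomes.

    presence: dict mapping gene name -> set of genome IDs carrying the gene
    (a hypothetical stand-in for a gene presence/absence matrix).
    """
    return sorted(
        gene
        for gene, genomes in presence.items()
        if len(genomes) / n_genomes >= threshold
    )


# Toy example with 20 genomes: geneA in all, geneB in 19, geneC in 18.
genome_ids = [f"g{i}" for i in range(20)]
toy = {
    "geneA": set(genome_ids),
    "geneB": set(genome_ids[:19]),
    "geneC": set(genome_ids[:18]),
}
core = core_genes(toy, n_genomes=20)
```

Relaxing the threshold below 100% keeps a gene in the core set even when it is missing from a few incomplete MAGs, which is the stated reason for the 95% cut-off.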

Bacterial genome-wide association study (GWAS)

To identify genes that are putatively associated with CD, we subjected genomes of R. gnavus isolates to a bacterial genome-wide association study using Hogwash (version 1.2.6; parameters: ‘fdr = 0.05, bootstrap = 0.875, grouping_method = “post-ar” ’)²⁸. Hogwash implements a more stringent version of the homoplasy-based PhyC method introduced in 2013¹⁰⁵. Hogwash reconstructs the evolutionary history of the genomes of interest using a phylogenetic tree and predicts where genotype and phenotype transitions occurred to assess where genotype and phenotype transitions coincide. We made use of the high correlation between core and accessory genome to use these two as input, together with phenotype of either CD or healthy. Genomes were assigned healthy or CD phenotype based on available metadata on health status from the person from whom the R. gnavus isolate was cultured. We included short-read sequencing isolate draft genomes as well as our in-house generated PacBio complete genomes. If multiple sequences of the same isolate existed, we deduplicated based on ANI > 99.9%. Of these duplicates, we picked the first based on alphabetic order as representative, and we preferentially selected long-read-based genomes when available. This resulted in fourteen R. gnavus isolate genomes derived from CD patients and 41 from healthy people (total N = 55). We used a matrix of (accessory) gene presence and absence generated by Panaroo as input for Hogwash. As phylogenetic tree, we pruned the tree of all R. gnavus genomes inferred by IQ-tree to include only this set of 55 deduplicated genomes and midpoint rooted the tree. Associations between genotype and phenotype are evaluated both by p-value, indicating statistical significance, and by epsilon value, which quantifies the strength of genotype-phenotype association on a 0-1 scale (Supplementary Data 6).
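The deduplication step can be sketched as follows. This Python illustration makes two assumptions that the text does not state: genomes are grouped by single linkage (any pair with ANI above 99.9% ends up in the same group), and within a group a long-read genome is preferred before falling back to alphabetical order. All names and ANI values are hypothetical.

```python
def deduplicate(genomes, ani_pairs, long_read, threshold=99.9):
    """Pick one representative genome per group of near-identical genomes.

    genomes: list of genome IDs.
    ani_pairs: dict mapping (genome_a, genome_b) -> ANI in percent.
    long_read: set of genome IDs assembled from long reads.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Single-linkage grouping (an assumption): merge pairs above the threshold.
    for (a, b), ani in ani_pairs.items():
        if ani > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)

    # Prefer long-read genomes, then the alphabetically first ID.
    return sorted(
        min(members, key=lambda g: (g not in long_read, g))
        for members in groups.values()
    )


# Toy example with made-up IDs and ANI values.
reps = deduplicate(
    ["A_short", "B_long", "C_short", "D_short"],
    {("A_short", "B_long"): 99.95, ("C_short", "D_short"): 99.99,
     ("A_short", "C_short"): 97.0},
    long_read={"B_long"},
)
```

Here the first pair collapses to the long-read genome and the second pair to the alphabetically first short-read genome, leaving two representatives.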

To further validate the genes found to be significantly associated with either a healthy or Crohn’s host phenotype, we counted the prevalence of each group of genes in both healthy-derived (n = 123) and IBD-derived MAGs (Crohn’s n = 8; ulcerative colitis n = 1; Supplementary Fig. 11). Furthermore, we visualized the prevalence of these genes among genomes, annotated by their host disease phenotype, as a heatmap to visually inspect the predicted gene associations (Supplementary Fig. 12).

Statistical analyses

All tools were run with default parameters unless stated otherwise. Statistical analyses and visualization were done in R (version 4.0.2) using RStudio (https://posit.co/). A p-value of 0.05 or smaller was considered significant. Data were visualized using the R package ggplot2 (version 3.5.0)¹⁰⁶, with the publication theme from ggembl (version 0.1.2; https://git.embl.de/grp-zeller/ggembl). Figures were polished manually using Inkscape (version 0.92.5; https://inkscape.org/).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The long-read whole-genome sequencing data generated in this study and corresponding assemblies of isolates presented in this study are available from the European Nucleotide Archive under accession number PRJEB76407. Raw metagenomic data used for additional MAG building, and the MAGs themselves, are available under accession number PRJEB44737. A complete set of short and long-read genomes together with metadata, along with processed data, is available through Zenodo, https://doi.org/10.5281/zenodo.13907031. Isolates will be made available upon request to the corresponding author (q.r.ducarmon@lumc.nl). The biological materials generated in this study by Vedanta Biosciences can be made available for research purposes under a material transfer agreement. Correspondence should be sent to jnorman@vedantabio.com and legal@vedantabio.com and will be addressed within 2 weeks. Source data are provided with this paper.

Code availability

Scripts of both the whole-genome annotation and comparative genomics analyses, as well as further downstream and statistical analyses are available on Zenodo, https://doi.org/10.5281/zenodo.14628203. The code used to generate MAGs is also available on Zenodo (https://doi.org/10.5281/zenodo.14628195).

References

VanEvery, H., Franzosa, E. A., Nguyen, L. H. & Huttenhower, C. Microbiome epidemiology and association studies in human health. Nat. Rev. Genet. 24, 109–124 (2023).
Hall, A. B. et al. A novel Ruminococcus gnavus clade enriched in inflammatory bowel disease patients. Genome Med. 9, 103 (2017).
Grahnemo, L. et al. Cross-sectional associations between the gut microbe Ruminococcus gnavus and features of the metabolic syndrome. Lancet Diab. Endocrinol. 10, 481–483 (2022).
De Filippis, F. et al. Specific gut microbiome signatures and the associated pro-inflamatory functions are linked to pediatric allergy and acquisition of immune tolerance. Nat. Commun. 12, 5958 (2021).
Wirbel, J., Essex, M., Forslund, S. K. & Zeller, G. Evaluation of microbiome association models under realistic and confounded conditions. bioRxiv https://doi.org/10.1101/2022.05.09.491139 (2022).
Berland, M. et al. Both disease activity and HLA-B27 status are associated with gut microbiome dysbiosis in spondyloarthritis patients. Arthritis Rheumatol. 75, 41–52 (2023).
Watanabe, N. et al. Clinical and microbiological characteristics of Ruminococcus gnavus bacteremia and intra-abdominal infection. Anaerobe 85, 102818 (2024).
Togo, A. H. et al. Description of Mediterraneibacter massiliensis, gen. nov., sp. nov., a new genus isolated from the gut microbiota of an obese patient and reclassification of Ruminococcus faecis, Ruminococcus lactaris, Ruminococcus torques, Ruminococcus gnavus and Clostridium glycyrrhizinilyticum as Mediterraneibacter faecis comb. nov., Mediterraneibacter lactaris comb. nov., Mediterraneibacter torques comb. nov., Mediterraneibacter gnavus comb. nov. and Mediterraneibacter glycyrrhizinilyticus comb. nov. Antonie Van. Leeuwenhoek 111, 2107–2128 (2018).
Moore, W. E. C., Johnson, J. L. & Holdeman, L. V. Emendation of bacteroidaceae and butyrivibrio and descriptions of desulfomonas gen. nov. and ten new species in the genera desulfomonas, butyrivibrio, eubacterium, clostridium, and ruminococcus. Int. J. Syst. Evolut. Microbiol. 26, 238–252 (1976).
Branck, T. et al. Comprehensive profile of the companion animal gut microbiome integrating reference-based and reference-free methods. ISME Journal https://doi.org/10.1093/ismejo/wrae201 (2024).
Abdugheni, R. et al. Comparative genomics reveals extensive intra-species genetic divergence of the prevalent gut commensal Ruminococcus gnavus. Microb. Genom. 9, mgen001071 (2023).
Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65 (2010).
Kraal, L., Abubucker, S., Kota, K., Fischbach, M. A. & Mitreva, M. The prevalence of species and strains in the human microbiome: a resource for experimental efforts. PLoS One 9, e97279 (2014).
Xu, T. et al. Microbiome features differentiating unsupervised-stratification-based clusters of patients with abnormal glycometabolism. mBio 14, e0348722 (2023).
Henke, M. T. et al. Ruminococcus gnavus, a member of the human gut microbiome associated with Crohn’s disease, produces an inflammatory polysaccharide. Proc. Natl Acad. Sci. 116, 12672–12677 (2019).
Henke, M. T. et al. Capsular polysaccharide correlates with immune response to the human gut microbe Ruminococcus gnavus. Proc. Natl. Acad. Sci. 118, e2007595118 (2021).
Bunker, J. J. et al. B cell superantigens in the human intestinal microbiota. Science Translational Medicine 11, eaau9356 (2019).
Zhai, L. et al. Ruminococcus gnavus plays a pathogenic role in diarrhea-predominant irritable bowel syndrome by increasing serotonin biosynthesis. Cell Host Microbe 31, 33–44.e35 (2023).
Yan, Y. et al. Commensal bacteria promote azathioprine therapy failure in inflammatory bowel disease via decreasing 6-mercaptopurine bioavailability. Cell Rep. Med. 4, 101153 (2023).
Bell, A. et al. Elucidation of a sialic acid metabolism pathway in mucus-foraging Ruminococcus gnavus unravels mechanisms of bacterial adaptation to the gut. Nat. Microbiol. 4, 2393–2404 (2019).
Pasolli, E. et al. Accessible, curated metagenomic data through ExperimentHub. Nat. Methods 14, 1023–1024 (2017).
Valles-Colomer, M. et al. The person-to-person transmission landscape of the gut and oral microbiomes. Nature 614, 125–135 (2023).
Shenhav, L. et al. Microbial colonization programs are structured by breastfeeding and guide healthy respiratory development. Cell 187, 5431–5452.e5420 (2024).
Tully, B. J. KEGGDecoder KOALA definitions, https://github.com/bjtully/BioData/blob/master/KEGGDecoder/KOALA_definitions.txt (2021).
Goris, J. et al. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int. J. Syst. Evolut. Microbiol. 57, 81–91 (2007).
Jain, C., Rodriguez, R. L., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).
Sorbara, M. T. et al. Functional and genomic variation between human-derived isolates of Lachnospiraceae reveals inter- and intra-species diversity. Cell Host Microbe 28, 134–146.e134 (2020).
Saund, K. & Snitkin, E. S. Hogwash: three methods for genome-wide association studies in bacteria. Microb. Genom. 6, mgen000469 (2020).
GBD 2017 Inflammatory Bowel Disease Collaborators. The global, regional, and national burden of inflammatory bowel disease in 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet Gastroenterol. Hepatol. 5, 17–30 (2020).
Fogarty, A. W. What have studies of non-industrialized countries told us about the cause of allergic disease? Clin. Exp. Allergy 45, 87–93 (2015).
Tian, J., Zhang, D., Yao, X., Huang, Y. & Lu, Q. Global epidemiology of systemic lupus erythematosus: a comprehensive systematic analysis and modelling study. Ann. Rheum. Dis. 82, 351–356 (2023).
Gacesa, R. et al. Environmental factors shaping the gut microbiome in a Dutch population. Nature 604, 732–739 (2022).
Blanton, L. V. et al. Gut bacteria that prevent growth impairments transmitted by microbiota from malnourished children. Science 351, aad3311 (2016).
Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).
Karcher, N. et al. Analysis of 1321 Eubacterium rectale genomes from metagenomes uncovers complex phylogeographic population structure and subspecies functional adaptations. Genome Biol. 21, 138 (2020).
Karcher, N. et al. Genomic diversity and ecology of human-associated Akkermansia species in the gut microbiome revealed by extensive metagenomic assembly. Genome Biol. 22, 209 (2021).
Tett, A. et al. The Prevotella copri complex comprises four distinct clades underrepresented in westernized populations. Cell Host Microbe 26, 666–679.e667 (2019).
Blanco-Miguez, A. et al. Extension of the Segatella copri complex to 13 species with distinct large extrachromosomal elements and associations with host conditions. Cell Host Microbe 31, 1804–1819.e1809 (2023).
Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).
Meziti, A. et al. The reliability of metagenome-assembled genomes (MAGs) in representing natural populations: insights from comparing MAGs against isolate genomes derived from the same fecal sample. Appl. Environ. Microbiol. 87, e02593–20 (2021).
Castañeda-Barba, S., Top, E. M. & Stalder, T. Plasmids, a molecular cornerstone of antimicrobial resistance in the one health era. Nat. Rev. Microbiol. 22, 18–32 (2024).
Zorea, A. et al. Plasmids in the human gut reveal neutral dispersal and recombination that is overpowered by inflammatory diseases. Nat. Commun. 15, 3147 (2024).
Tourlousse, D. M. et al. Characterization and demonstration of mock communities as control reagents for accurate human microbiome community measurements. Microbiol. Spectr. 10, e0191521 (2022).
Blow, M. J. et al. The epigenomic landscape of prokaryotes. PLoS Genet. 12, e1005854 (2016).
Oliveira, P. H. et al. Epigenomic characterization of Clostridioides difficile finds a conserved DNA methyltransferase that mediates sporulation and pathogenesis. Nat. Microbiol. 5, 166–180 (2020).
Li, J., Hsu, H. C., Mountz, J. D. & Allen, J. G. Unmasking fucosylation: from cell adhesion to immune system regulation and diseases. Cell Chem. Biol. 25, 499–512 (2018).
Backman, T. et al. A phage tail-like bacteriocin suppresses competitors in metapopulations of pathogenic bacteria. Science 384, eado0713 (2024).
Yu, S. et al. Paneth cell-derived lysozyme defines the composition of mucolytic microbiota and the inflammatory tone of the intestine. Immunity 53, 398–416.e398 (2020).
Bartlett, A., Padfield, D., Lear, L., Bendall, R. & Vos, M. A comprehensive list of bacterial pathogens infecting humans. Microbiology 168 https://doi.org/10.1099/mic.0.001269 (2022).
Crost, E. H., Coletto, E., Bell, A. & Juge, N. Ruminococcus gnavus: friend or foe for human health. FEMS Microbiol. Rev. 47, fuad014 (2023).
Wirbel, J. et al. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat. Med. 25, 679–689 (2019).
Wirbel, J. et al. Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox. Genome Biol. 22, 93 (2021).
Li, Z. et al. Effects of herbal dregs supplementation of Salvia miltiorrhiza and Isatidis Radix residues improved production performance and gut microbiota abundance in late-phase laying hens. Front. Vet. Sci. 11, 1381226 (2024).
Xiao, H. et al. The effect of early colonized gut microbiota on the growth performance of suckling lambs. Front. Microbiol. 14, 1273444 (2023).
Chen, Z. et al. Differences in meat quality between Angus cattle and Xinjiang brown cattle in association with gut microbiota and its lipid metabolism. Front. Microbiol. 13, 988984 (2022).
Ruscheweyh, H. J. et al. Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments. Microbiome 10, 212 (2022).
Zou, Y. et al. 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses. Nat. Biotechnol. 37, 179–185 (2019).
Nooij, S. et al. Fecal Microbiota Transplantation Influences Procarcinogenic Escherichia coli in Recipient Recurrent Clostridioides difficile Patients. Gastroenterology 161, 1218–1228.e1215 (2021).
Saheb Kashaf, S., Almeida, A., Segre, J. A. & Finn, R. D. Recovering prokaryotic genomes from host-associated, short-read shotgun metagenomic sequencing data. Nat. Protoc. 16, 2520–2541 (2021).
Molder, F. et al. Sustainable data analysis with Snakemake. F1000Research 10, 33 (2021).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 158 (2018).
Wu, Y. W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).
Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015).
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Li, D. et al. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 102, 3–11 (2016).
Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2019).
Lopez-Siles, M. et al. Cultured representatives of two major phylogroups of human colonic Faecalibacterium prausnitzii can utilize pectin, uronic acids, and host-derived substrates for growth. Appl. Environ. Microbiol. 78, 420–428 (2012).
von Martels, J. Z. H. et al. Riboflavin Supplementation in Patients with Crohn’s Disease [the RISE-UP study]. J. Crohns. Colitis. 14, 595–607 (2020).
Plomp, N. et al. A convenient and versatile culturomics platform to expand the human gut culturome of Lachnospiraceae and Oscillospiraceae. Benef. Microbes 1–16 (2024).
Forster, S. C. et al. A human gut bacterial genome and culture collection for improved metagenomic analyses. Nat. Biotechnol. 37, 186–192 (2019).
Ewels, P., Magnusson, M., Lundin, S. & Kaller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Souvorov, A., Agarwala, R. & Lipman, D. J. SKESA: strategic k-mer extension for scrupulous assemblies. Genome Biol. 19, 153 (2018).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
Schwengers, O. et al. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microbial Genomics 7, 000685 (2021).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Vaser, R. & Sikic, M. Time- and memory-efficient genome assembly with Raven. Nat. Computational Sci. 1, 332–336 (2021).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
von Meijenfeldt, F. A. B., Arkhipova, K., Cambuy, D. D., Coutinho, F. H. & Dutilh, B. E. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biol. 20, 217 (2019).
Bouras, G., Grigson, S., Papudeshi, B., Mallawaarachchi, V. & Roach, M. J. Dnaapler: a tool to reorient circular microbial genomes. https://github.com/gbouras13/dnaapler (2023).
Camargo, A. P. et al. Identification of mobile genetic elements with geNomad. Nat. Biotechnol. 42, 1303–1312 (2023).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Feldgarden, M. et al. AMRFinderPlus and the reference gene catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Sci. Rep. 11, 12728 (2021).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinforma. 10, 421 (2009).
Hall, B. et al. BilR is a gut microbial enzyme that reduces bilirubin to urobilinogen. Nat. Microbiol. 9, 173–184 (2024).
Gilchrist, C. L. M. & Chooi, Y. H. clinker & clustermap.js: automatic generation of gene cluster comparison figures. Bioinformatics 37, 2473–2475 (2021).
Zhang, H. et al. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 46, W95–W101 (2018).
Eddy, S. R. Accelerated Profile HMM Searches. PLoS Computational Biol. 7, e1002195 (2011).
Berkhout, M. D., Plugge, C. M. & Belzer, C. How microbial glycosyl hydrolase activity in the gut mucosa initiates microbial cross-feeding. Glycobiology 32, 182–200 (2022).
Graham, E. D., Heidelberg, J. F. & Tully, B. J. Potential for primary productivity in a globally-distributed bacterial phototroph. ISME J. 12, 1861–1866 (2018).
Blin, K. et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 49, W29–W35 (2021).
Tonkin-Hill, G. et al. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biol. 21, 180 (2020).
Li, T. & Yin, Y. Critical assessment of pan-genomic analysis of metagenome-assembled genomes. Briefings in Bioinformatics 23, bbac413 (2022).
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
Capella-Gutierrez, S., Silla-Martinez, J. M. & Gabaldon, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evolution 37, 1530–1534 (2020).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evolution 35, 518–522 (2018).
Letunic, I. & Bork, P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids. Res. 52, W78–W82 (2024).
Farhat, M. R. et al. Genomic analysis identifies targets of convergent positive selection in drug-resistant Mycobacterium tuberculosis. Nat. Genet. 45, 1183–1189 (2013).
Wickham, H. et al. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag New York, 2016).


Acknowledgements

We thank all members of the scientific community who generously provided us with R. gnavus isolates. This study was supported by the Leiden University Fund / Dr. F.F. Hofman Fonds (www.luf.nl) to Q.D. Further funding was provided by the LUMC (LUMC Fellowship to G.Z.), the Health + Life Science Alliance Heidelberg Mannheim through state funds approved by the State Parliament of Baden-Württemberg (postdoctoral fellowships to Q.D. and N.K.) and an EMBO postdoctoral fellowship (ALTF 1030-2022 to Q.D.). The Graduate School of the Medical Sciences of the University of Groningen provided a grant to N.P. The NDFB (S.N. and E.M.T.) received an unrestricted research grant from Vedanta Biosciences.

Author information

Authors and Affiliations

Leiden University Center for Infectious Diseases (LUCID), Leiden University Medical Center, Leiden, The Netherlands
S. Nooij, I. M. J. G. Sanders, L. Schout, E. M. Terveer, M. F. Larralde, G. F. Zeller, E. J. Kuijper, W. K. Smits & Q. R. Ducarmon
Center for Microbiome Analyses and Therapeutics, Leiden University Medical Center, Leiden, The Netherlands
S. Nooij, L. Schout, E. M. Terveer, G. F. Zeller, E. J. Kuijper, W. K. Smits & Q. R. Ducarmon
Netherlands Donor Feces Bank (NDFB), Leiden University Medical Center, Leiden, The Netherlands
S. Nooij & E. M. Terveer
Department of Medical Microbiology and Infection Prevention, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
N. Plomp & H. J. M. Harmsen
Department of Gastroenterology and Hepatology, Leiden University Medical Center, Leiden, The Netherlands
A. E. van der Meulen
Vedanta Biosciences, Inc., Cambridge, Massachusetts, USA
J. M. Norman
Molecular Systems Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
N. Karcher, G. F. Zeller & Q. R. Ducarmon
Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
R. H. A. M. Vossen & S. L. Kloet
Department of Gastroenterology and Hepatology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
K. N. Faber

Authors

S. Nooij
N. Plomp
I. M. J. G. Sanders
L. Schout
A. E. van der Meulen
E. M. Terveer
J. M. Norman
N. Karcher
M. F. Larralde
R. H. A. M. Vossen
S. L. Kloet
K. N. Faber
H. J. M. Harmsen
G. F. Zeller
E. J. Kuijper
W. K. Smits
Q. R. Ducarmon

Contributions

Conceptualization, E.K., W.K.S., and Q.D.; methodology, N.P., I.S., L.S., A.v.d.M., E.T., J.N., N.K., R.V., S.K., H.H., and K.F.; investigation, S.N., N.P., R.V., I.S., L.S., M.L., and Q.D.; formal analysis, S.N. and Q.D.; writing—original draft, S.N. and Q.D.; writing—review & editing, all authors; supervision, S.K., G.Z., E.K., W.K.S., and Q.D.; funding acquisition, Q.D.

Corresponding author

Correspondence to Q. R. Ducarmon.

Ethics declarations

Competing interests

J.N. is an employee of Vedanta Biosciences Inc. The other authors report no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1-6

Reporting Summary

Transparent Peer Review file

Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Nooij, S., Plomp, N., Sanders, I.M.J.G. et al. Metagenomic global survey and in-depth genomic analyses of Ruminococcus gnavus reveal differences across host lifestyle and health status. Nat Commun 16, 1182 (2025). https://doi.org/10.1038/s41467-025-56449-x


Received: 12 July 2024
Accepted: 17 January 2025
Published: 30 January 2025
DOI: https://doi.org/10.1038/s41467-025-56449-x

This article is cited by

Unveiling the diagnostic and pro-inflammatory role of Crohn’s disease: insights from 16S-guided discovery and species-specific validation
- Yao Xu
- Runxiang Xie
- Wei Liang
BMC Gastroenterology (2025)