Ecological dynamics of Enterobacteriaceae in the human gut microbiome across global populations

Yin, Qi; da Silva, Ana C.; Zorrilla, Francisco; Almeida, Ana S.; Patil, Kiran R.; Almeida, Alexandre

doi:10.1038/s41564-024-01912-6

Analysis
Open access
Published: 10 January 2025

Nature Microbiology volume 10, pages 541–553 (2025)

Abstract

Gut bacteria from the Enterobacteriaceae family are a major cause of opportunistic infections worldwide. Given their prevalence among healthy human gut microbiomes, interspecies interactions may play a role in modulating infection resistance. Here we uncover global ecological patterns linked to Enterobacteriaceae colonization and abundance by leveraging a large-scale dataset of 12,238 public human gut metagenomes spanning 45 countries. Machine learning analyses identified a robust gut microbiome signature associated with Enterobacteriaceae colonization status, consistent across health states and geographic locations. We classified 172 gut microbial species as co-colonizers and 135 as co-excluders, revealing a genus-wide signal of colonization resistance within Faecalibacterium and strain-specific co-colonization patterns of the underexplored Faecalimonas phoceensis. Co-exclusion is linked to functions involved in short-chain fatty acid production, iron metabolism and quorum sensing, while co-colonization is linked to greater functional diversity and metabolic resemblance to Enterobacteriaceae. Our work underscores the critical role of the intestinal environment in the colonization success of gut-associated opportunistic pathogens with implications for developing non-antibiotic therapeutic strategies.

Main

The human gut microbiota comprises a diverse microbial community playing a fundamental role in human health. An increasing number of studies are revealing the importance of a healthy gut microbiome not only for digestion and immune regulation, but also in controlling the colonization of exogenous and opportunistic pathogens^1,2. Consequently, disruption to the human gut microbiome composition and function has been associated with numerous pathologies^3,4,5.

Although the intestinal tract is colonized by a varied community of commensal microorganisms with beneficial roles to human health, many gut microbial species also have the potential to cause disease. Species from the Enterobacteriaceae family, such as Escherichia coli and Klebsiella pneumoniae, represent opportunistic pathogens with the potential to cause severe, life-threatening infections. An overabundance of Enterobacteriaceae in the gut not only increases infection risk but has also been linked to non-communicable diseases such as Crohn’s disease⁶ and even a higher all-cause mortality⁷. Notably, multidrug- and extended-spectrum β-lactamase (ESBL)-producing Enterobacteriaceae have been classified as priority 1 pathogens by the World Health Organization. Moreover, transmission rates of ESBL-producing Enterobacteriaceae in household settings (23% for E. coli and 25% for K. pneumoniae) were found to surpass those in healthcare environments⁸. Therefore, there is a global need to control the outgrowth and transmission of Enterobacteriaceae in the human population beyond traditional antibiotic-based therapies. Microbiome-derived therapeutics utilizing beneficial species and/or functions found in the human gut represent promising alternative strategies to mitigate pathogen colonization. The most successful example of a microbiome-based therapy thus far has been in the treatment of Clostridioides difficile infection, where faecal microbiota transplantation from healthy donors to infected patients has been shown to resolve ~90% of cases⁹.

Most research studying the pathogenic potential, antimicrobial resistance and evolution of Enterobacteriaceae has focused on clinical isolates, limiting our understanding of their ecology within the surrounding intestinal microbiome. Exploring the relationship between Enterobacteriaceae and the gut microbiome may provide ecological insights into controlling the colonization of Enterobacteriaceae and ultimately reducing disease risk. Advancements in sequencing technologies and culture-independent metagenomic analyses have provided substantial improvements in our ability to characterize the composition and diversity of the human microbiome. However, previous studies investigating the relationship between Enterobacteriaceae and the gut microbiome have been limited by small sample sizes and/or have been solely based on 16S rRNA genotyping^10,11, a technique with reduced taxonomic and functional resolution. Furthermore, recent metagenomic developments have enabled the creation of a more comprehensive sequence catalogue of the human gut microbiome, comprising >200,000 genomes that include thousands of previously uncharacterized species¹². This has now opened opportunities to uncover the intricate dynamics and interactions of opportunistic pathogens within the human gut microbiome at an unprecedented resolution.

Here we perform a large-scale, high-resolution metagenomic analysis investigating the ecology of Enterobacteriaceae within the human gut microbiome across >12,000 human gut metagenomic samples distributed worldwide. We identified over 300 candidate species significantly associated with Enterobacteriaceae colonization dynamics and used gene functional analyses combined with metabolic modelling to obtain mechanistic insights into these interspecies interactions. Our results expand our understanding of the role of the microbiome and the intestinal environment in the colonization success and abundance of Enterobacteriaceae in the human gut.

Results

Distribution of Enterobacteriaceae worldwide

To perform a comprehensive global characterization of human gut microbiome signatures linked to Enterobacteriaceae colonization, we retrieved 12,238 public human gut metagenomic samples from 65 studies across 45 countries (Fig. 1 and Supplementary Table 1). Samples were collected primarily from Europe (n = 4,284, 35%) and North America (n = 3,367, 27.5%), followed by Asia (n = 2,844, 23.2%) and Africa (n = 1,024, 8.4%). The majority of metagenomic datasets were from adults (n = 8,275, 67.6%) and healthy individuals (n = 7,606, 62.2%) (Extended Data Fig. 1a).

**Fig. 1: Exploring the global ecological landscape of Enterobacteriaceae.**

Using the Unified Human Gastrointestinal Genome (UHGG) catalogue¹², we employed a mapping-based approach to accurately detect the presence and abundance of 4,612 gut microbial species (113 from the Enterobacteriaceae family) in the 12,238 global metagenomes on the basis of their level of breadth, depth and expected coverage (see Methods); the chosen thresholds were further validated with an experimental mock community (Extended Data Fig. 1b). Applying these parameters to synthetic metagenomes showed the detection limit of our metagenomic approach to be within a relative abundance of 0.003–0.01% (Extended Data Fig. 1c). The overall prevalence observed for Enterobacteriaceae was 66%, which is in line with a previous culturing-based surveillance study of Escherichia coli found in stool¹³. The distribution of Enterobacteriaceae was generally well balanced across metadata categories (Fig. 1c) and studies (Extended Data Fig. 1d), with a median prevalence of 57% across continents, 67% across age groups and 71% across different health states. The genera Escherichia, Klebsiella and Enterobacter were the most prevalent across various age groups, health states and continents, with E. coli, Klebsiella pneumoniae and Enterobacter hormaechei representing the most frequent species in their respective genera. The prevalence of E. coli was highest among African samples (88%), infant metagenomes (74%) and rheumatoid arthritis patients (96%) (Fig. 2a).
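
As a minimal sketch of how breadth, depth and expected coverage can be combined into a presence call, the snippet below uses the standard Poisson approximation for expected breadth under uniformly random read placement. The function names and both thresholds (`min_breadth`, `min_ratio`) are hypothetical placeholders for illustration; the thresholds actually used are those described in the Methods.

```python
import math

def coverage_stats(per_base_depth):
    """Return (breadth, mean_depth, expected_breadth) for one reference genome.

    per_base_depth: list with the number of reads covering each genome position.
    """
    n = len(per_base_depth)
    breadth = sum(1 for d in per_base_depth if d > 0) / n
    mean_depth = sum(per_base_depth) / n
    # If reads land uniformly at random, the probability that a base is
    # covered at least once is 1 - exp(-mean_depth) (Poisson approximation).
    expected_breadth = 1.0 - math.exp(-mean_depth)
    return breadth, mean_depth, expected_breadth

def is_detected(per_base_depth, min_breadth=0.1, min_ratio=0.5):
    """Call a species present when reads are spread across its genome.

    min_breadth and min_ratio are illustrative values, not the study's thresholds.
    """
    breadth, _, expected = coverage_stats(per_base_depth)
    if expected == 0:
        return False
    return breadth >= min_breadth and breadth / expected >= min_ratio
```

Comparing observed with expected breadth helps reject cases where many reads pile up on a small conserved region, which inflates depth without indicating that the whole genome is present.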

**Fig. 2: Distribution and diversity of the most prevalent Enterobacteriaceae species.**

We further investigated the co-distribution of Enterobacteriaceae species across this gut metagenomic collection to characterize patterns of polymicrobial colonization (Fig. 2b). Although most samples were found to be uniquely colonized by E. coli, we identified a statistically significant co-colonization of E. coli with K. pneumoniae predominantly among samples from Asia (observed vs expected proportion of 16.2% vs 10.9%, binomial exact test, P < 0.001). Moreover, we detected a significant co-occurrence of E. coli, K. pneumoniae and Enterobacter hormaechei primarily in samples from Africa and Oceania (observed vs expected proportion of 5.3% vs 0.7%, binomial exact test, P < 0.001). Overall, these distinct geographic patterns of Enterobacteriaceae co-colonization might be reflective of variations in environmental conditions, dietary habits, lifestyle and/or healthcare practices.
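
The observed-versus-expected comparison can be illustrated with an exact binomial upper-tail probability. This is a sketch under stated assumptions: the test is taken to be one-sided, and the sample count `n_samples` is a hypothetical value chosen for illustration. Only the proportions 16.2% and 10.9% come from the text.

```python
import math

def log_binom_pmf(i, n, p):
    """Log of the binomial probability P(X = i) for X ~ Binomial(n, p), 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))

def binom_upper_tail(k, n, p):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))

# Hypothetical sample count; the proportions are those reported for Asia.
n_samples = 1000
observed = round(0.162 * n_samples)   # samples carrying both species
expected_proportion = 0.109           # co-occurrence expected by chance
p_value = binom_upper_tail(observed, n_samples, expected_proportion)
```

Working in log space keeps the calculation stable for the large binomial coefficients that arise with thousands of samples.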

As E. coli is the most prevalent Enterobacteriaceae in the gut, we performed a dedicated analysis of the strain diversity of this species. To reduce the effect of host state and better understand the subspecies diversity circulating asymptomatically in the human population, we focused on 5,128 metagenomic samples collected from healthy adults. Through a metagenomic multilocus sequence typing (MLST)¹⁴ we identified 585 distinct E. coli sequence types (STs) (Extended Data Fig. 2a). The most prevalent known STs were classified as ST10, ST95, ST131 and ST73, which represent dominant E. coli lineages worldwide^13,15,16. However, 76.5% of detected strains belonged to unknown STs, including two STs (here labelled as 100024 and 100083) found to be among the 10 most frequent STs (Extended Data Fig. 2a). These unknown lineages were found to be overrepresented particularly among samples from Africa (Extended Data Fig. 2b). Given current reference biases towards E. coli clinical isolates and the extent of unknown subspecies diversity uncovered here, these results suggest that a substantial global diversity of E. coli remains uncharacterized.

Microbiome structure linked to colonization dynamics

Having access to a global collection of >12,000 human gut metagenomes enabled us to explore the relationship between the gut microbiome composition and Enterobacteriaceae colonization status. First, we built machine learning classifiers to distinguish samples with or without Enterobacteriaceae on the basis of the abundance and prevalence of the remaining non-Enterobacteriaceae microbiome species (Fig. 3a and Extended Data Fig. 3a). We tested three supervised learning methods (ridge regression, random forest and gradient boosting) using as outcome variables the colonization status of Enterobacteriaceae as a whole, or that of E. coli and K. pneumoniae in particular. Results across all methods and variables showed a consistently good performance (median area under the receiver operating characteristic curve, AUROC = 0.788), with gradient boosting outperforming ridge regression and random forest for the classification of Enterobacteriaceae (median AUROC = 0.812), E. coli (median AUROC = 0.797) and K. pneumoniae status (median AUROC = 0.773). Model performance was consistent between the entire metagenomic dataset and when only considering samples from healthy adults (Extended Data Fig. 3a). Given the variation in gut microbiome composition between geographic regions, we also investigated whether models were generalizable within and across different continents. Focusing on samples from healthy adults, models tested on a per continent basis had overall good performance (AUROC > 0.7 for all continents tested, Extended Data Fig. 3b), with the highest performance observed for samples from Africa. Cross-validation between continents also revealed that models specifically trained on metagenomes from Asia, Europe or North America performed well across other regions (AUROC > 0.7 in at least 2 other continents, Extended Data Fig. 3c), showing that models trained on larger sample sizes are more generalizable. Overall, these results indicate that the human gut microbiome harbours compositional differences linked to Enterobacteriaceae colonization, even across different health states and geographic locations.
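
For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen colonized sample receives a higher classifier score than a randomly chosen non-colonized one. The sketch below computes it directly from that definition. The function name and toy data are illustrative only and are unrelated to the models trained in this study.

```python
def auroc(labels, scores):
    """Area under the ROC curve from pairwise comparisons (ties count 0.5).

    labels: 1 for Enterobacteriaceae-positive samples, 0 for negative ones.
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported medians of about 0.8 indicate that colonized samples outrank non-colonized ones in roughly four out of five pairs.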

**Fig. 3: Gut microbiome composition is associated with Enterobacteriaceae colonization and abundance.**

We next performed diversity analyses to investigate overall community differences relating to Enterobacteriaceae colonization and abundance (Extended Data Fig. 4). Beta diversity estimates revealed higher pairwise distance among samples with Enterobacteriaceae compared with those without (Wilcoxon rank-sum test, P < 0.0001, Extended Data Fig. 4a). These differences were independent of alpha diversity and sample read depth, as we observed a low correlation between Enterobacteriaceae abundance and Shannon diversity estimates (Pearson’s coefficient of determination, R² = 0.03, Extended Data Fig. 4b), even after subsampling to 500,000 mapped reads—the minimum depth needed for accurate beta and alpha diversity estimates¹⁷.
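
The alpha diversity comparison can be sketched as follows: Shannon diversity is computed per sample from species counts, and R² is the square of the Pearson correlation between that value and Enterobacteriaceae abundance. The function names are hypothetical, and the use of the natural logarithm is an assumption.

```python
import math

def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Coefficient of determination for a simple linear relationship."""
    return pearson_r(x, y) ** 2
```

An R² of 0.03 means that Shannon diversity explains about 3% of the variance in Enterobacteriaceae abundance.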

To identify the microbiome species associated with Enterobacteriaceae presence (co-colonizers) or absence (co-excluders), we performed a differential abundance analysis on the basis of the intersection of a generalized (ALDEx2 (ref. ¹⁸)) and mixed-effects model (MaAsLin2 (ref. ¹⁹)), while accounting for study, age group, health state, continent and read depth (see Methods for further details). In addition, beyond investigating the differential abundance of species according to Enterobacteriaceae colonization status at a binary level (presence/absence), we used a network-based approach^20,21 to model Enterobacteriaceae–microbiome co-abundance patterns. We analysed all 12,238 human gut metagenomes, as well as the subset of 5,128 samples from healthy adults to further control for microbiome differences related to age and health state. This revealed 307 prokaryotic species (12% of prevalence-filtered species) significantly associated with Enterobacteriaceae, E. coli and/or K. pneumoniae colonization and abundance (Fig. 3b, Extended Data Fig. 5a and Supplementary Table 2): 172 were identified as Enterobacteriaceae co-colonizers and 135 as Enterobacteriaceae co-excluders.
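
The intersection step can be sketched as keeping only species that both methods call significant with the same direction of effect. The input layout (a dictionary mapping species to an effect size and adjusted P value), the sign convention and the `alpha` cutoff are assumptions made for this example. They do not reproduce the ALDEx2 or MaAsLin2 output formats.

```python
def consensus_associations(method_a, method_b, alpha=0.05):
    """Intersect two differential abundance results.

    method_a, method_b: dict mapping species -> (effect, adjusted_p), where a
    positive effect is assumed to mean higher abundance when
    Enterobacteriaceae are present.
    Returns a dict mapping species -> "co-colonizer" or "co-excluder".
    """
    consensus = {}
    for species in method_a.keys() & method_b.keys():
        effect_a, p_a = method_a[species]
        effect_b, p_b = method_b[species]
        significant = p_a < alpha and p_b < alpha
        same_direction = (effect_a > 0) == (effect_b > 0)
        if significant and same_direction:
            consensus[species] = "co-colonizer" if effect_a > 0 else "co-excluder"
    return consensus
```

Requiring agreement between two statistical frameworks trades sensitivity for specificity: a species flagged by only one method, or with conflicting directions, is left out.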

At a taxonomic level, species from the orders Lachnospirales, Oscillospirales and Bacteroidales were overrepresented among the co-excluders (Fisher’s exact test, adjusted P < 0.05). In contrast, co-colonizers were significantly associated with the orders Lactobacillales, Veillonellales and Actinomycetales. Analysis of the 1,000 most prevalent species revealed that 17 bacterial orders contained neither co-excluders nor co-colonizers (Extended Data Fig. 5b), even though two taxa in particular (RF39 and Burkholderiales) were represented by >10 species each. At a species level, 89% of the 307 candidate species showed a consistent signal across datasets (all or healthy adults only) and taxa (Enterobacteriaceae, or E. coli and K. pneumoniae individually) (Extended Data Fig. 5c), showing that the identified microbiome signatures are robust to differences in host state and analysis resolution. Species from the Faecalibacterium genus were among the strongest co-excluders (Fig. 3c and Extended Data Fig. 5d), with one uncharacterized Faecalibacterium species (Faecalibacterium sp.900539885) identified as the top antagonistic candidate. Previous studies have shown that species from the Faecalibacterium genus (for example, F. prausnitzii) carry important beneficial functions in the intestinal tract such as the production of short-chain fatty acids (SCFAs)²². This in turn has been shown to directly inhibit the growth of Enterobacteriaceae species, including E. coli and K. pneumoniae²³. Thus, the overrepresentation of Faecalibacterium species in gut microbiomes with low levels of Enterobacteriaceae might be reflective of a genus-wide mechanism of colonization resistance. Focusing on the co-colonization patterns, we found that members of the Intestinibacter, Veillonella and Enterococcus genera were the strongest candidates. E. faecalis has been previously shown to promote the growth and survival of E. coli in vitro and in vivo through the production of l-ornithine²⁴. However, the relationship between species of the Intestinibacter and Veillonella genera with Enterobacteriaceae has not been previously explored and may potentially underlie uncharacterized mechanisms associated with Enterobacteriaceae colonization success and outgrowth.
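
The taxonomic overrepresentation test can be illustrated with a one-sided Fisher's exact test on a 2×2 table (species in the order versus not, co-excluder versus not). This is a minimal sketch: the function name is hypothetical, the one-sided alternative is an assumption, and the multiple-testing adjustment mentioned in the text is not included.

```python
from math import comb

def fisher_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for overrepresentation.

    2x2 table layout (rows: in order / not in order;
                      columns: co-excluder / other species):
        a  b
        c  d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(col1, x) * comb(n - col1, row1 - x)
                    for x in range(a, upper + 1))
    return numerator / comb(n, row1)
```

Because the calculation uses exact integer arithmetic for the binomial coefficients, it remains accurate for the small counts typical of less common bacterial orders.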

To further evaluate the clinical relevance of our findings, we compared our results with an independent study that investigated the longitudinal dynamics of carbapenemase-producing Enterobacteriaceae (CPE)¹⁰. This study tracked for up to 12 timepoints a cohort of CPE-positive individuals that were later decolonized, as well as CPE-negative household controls (n = 46 participants; 361 samples). Comparison of differentially abundant species between CPE-positive individuals and household controls showed a statistically significant overlap (χ² test, P = 0.0071) in relation to those we detected to be associated with Enterobacteriaceae in general (Extended Data Fig. 6a). However, the overlap was not significant when comparing CPE-positive to CPE-negative individuals that were previously colonized. This aligns with findings from the original CPE study¹⁰, which noted that individuals recently colonized by CPE are still undergoing microbiome recovery. Nevertheless, we suggest that the overlapping co-excluder and co-colonizer species here discovered (Extended Data Fig. 6b) are not only involved in Enterobacteriaceae colonization as a whole but may also be related to colonization of CPE lineages in particular.

Strain-specific patterns of Enterobacteriaceae colonization

Our analyses revealed a strong association between the microbiome species composition and Enterobacteriaceae colonization patterns. However, subspecies (that is, strain)-level differences could also be linked to Enterobacteriaceae–microbiome interactions. We therefore investigated strain-specific signatures of 39 gut microbiome species that were identified as either co-colonizers or co-excluders of Enterobacteriaceae among healthy adults, and that were represented by at least 10 genomes with CheckM²⁵ statistics of >90% completeness and <5% contamination within the UHGG. Using the Unified Human Gastrointestinal Protein (UHGP)¹² catalogue, we characterized the accessory genome (that is, genes detected in <90% of conspecific genomes) of these 39 selected species to identify subspecies populations associated with Enterobacteriaceae colonization. A total of 213 accessory genes were significantly associated with Enterobacteriaceae colonization status (207 positively- and 6 negatively-associated genes; adjusted P < 0.05). These were distributed across 15 of the 39 species and overrepresented in functions involved in nucleotide transport and metabolism (Fisher’s exact test, adjusted P = 7.93 × 10⁻⁵), with the majority belonging to the species Ruminococcus B gnavus and Faecalimonas phoceensis (Extended Data Fig. 7). We further investigated the phylogenetic similarity of strains with the highest number of accessory genes associated with Enterobacteriaceae (top 10% strains; Fig. 4). This revealed a much stronger population structure associated with Enterobacteriaceae co-colonization among F. phoceensis strains (permutational multivariate analysis of variance (PERMANOVA), R² = 0.74, P < 0.001) compared with R. gnavus genomes (PERMANOVA, R² = 0.02, P < 0.001; Fig. 4). Interestingly, only 2 Faecalimonas species among the 307 candidate species were identified as Enterobacteriaceae co-colonizers. In contrast to other genera such as Veillonella and Streptococcus, which harboured >10 candidate species each, these results suggest that there is a more specific, strain-level association between the diversity of Faecalimonas and Enterobacteriaceae co-colonization.
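
The accessory genome definition used above (genes detected in <90% of conspecific genomes) can be expressed as a short filter over a gene presence table. The function name and input layout are hypothetical. Only the 90% cutoff comes from the text.

```python
from collections import Counter

def accessory_genes(genomes, max_fraction=0.9):
    """Return gene families present in fewer than max_fraction of genomes.

    genomes: dict mapping genome ID -> set of gene family IDs for one species.
    """
    n = len(genomes)
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    return {gene for gene, c in counts.items() if c / n < max_fraction}

# Toy species with 10 genomes: geneA in all, geneB in 9, geneC in 5.
toy = {f"g{i}": {"geneA"} for i in range(10)}
for i in range(9):
    toy[f"g{i}"].add("geneB")
for i in range(5):
    toy[f"g{i}"].add("geneC")
print(accessory_genes(toy))  # {'geneC'}
```

With a strict less-than comparison, a gene found in exactly 90% of genomes (geneB in the toy example) is treated as core rather than accessory.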

**Fig. 4: *Faecalimonas phoceensis* exhibits strain-specific co-colonization patterns.**

Co-colonizers are more functionally diverse

Complementing our taxonomic results, we analysed the functional capacity and diversity of the 245 (out of 307) non-Enterobacteriaceae species that consistently showed co-colonizing or co-excluding patterns across datasets and taxa. On the basis of the annotation of protein-coding sequences with KEGG²⁶, we found that co-colonizers exhibited a greater functional diversity compared with co-excluders (Wilcoxon rank-sum test, P = 0.0049; Fig. 5a), which was found to be independent of annotation coverage and genome quality differences (Extended Data Fig. 8a). A previous study showed that metabolic independence drives gut microbial colonization and resilience of disease-associated species, particularly under inflammatory conditions²⁷. Here we show that co-colonization of gut bacteria with Enterobacteriaceae is also associated with higher metabolic independence, even among healthy individuals. Statistical analysis of the KEGG Orthologs (KOs) associated with co-colonization or co-exclusion showed that functions involved in drug resistance and DNA regulation (for example, major facilitator superfamily transporter, 16S rRNA methyltransferase and HTH-type transcriptional regulators) are among the functions most strongly linked to co-colonization (Extended Data Fig. 8b and Supplementary Table 3). In contrast, genes related to iron metabolism and transport (for example, rubrerythrin, ferredoxin and ferrous iron transport protein B) show stronger association with colonization resistance. As iron is an essential nutrient for many human pathogens²⁸, an enrichment of iron-utilizing genes among co-excluders suggests that there is competition for iron availability between Enterobacteriaceae and co-excluder species.

**Fig. 5: Functional differences between co-excluders and co-colonizers.**

On a broader level, we characterized the main functional categories (Clusters of Orthologous Groups, COGs) associated with Enterobacteriaceae co-colonization and co-exclusion (Fig. 5b). In general, functions involved in metabolism (for example, amino acids, nucleotides and inorganic ions) were enriched among co-colonizers (Fisher’s exact test, adjusted P < 0.05), which further supports the hypothesis that co-colonizers exhibit greater metabolic independence. On the other hand, co-excluders encoded a higher number of genes involved in signal transduction mechanisms, which includes functions such as sporulation, motility and quorum sensing. We also identified more genes with unknown functions among co-excluders, suggesting that they may carry uncharacterized mechanisms with potential roles in Enterobacteriaceae colonization resistance. These results were consistent even when only considering species from the Bacillota phylum, which contains the highest number of both co-excluders and co-colonizers (Extended Data Fig. 8c).

We further investigated the distribution of specialized primary metabolic pathways associated with Enterobacteriaceae colonization patterns using the gutSMASH²⁹ algorithm. We found that co-excluders were primarily enriched in metabolic gene clusters involved in the production of the three major SCFAs (acetate, propionate and butyrate), as well as in other pathways involving the Rnf complex and 2-hydroxyglutaryl-CoA dehydratases (Fig. 5c). These results reinforce the genus-wide signal we observed for Faecalibacterium, one of the notable short-chain fatty acid producers in the gut. In addition, the Rnf complex signature further supports the importance of iron among co-excluders, as the Rnf complex consists of a ferredoxin:NAD⁺ oxidoreductase involved in energy production in anaerobic bacteria³⁰. These analyses reveal that species co-occurring with Enterobacteriaceae are more functionally diverse, with iron metabolism and SCFAs potentially playing important roles in regulating gut environmental conditions and modulating Enterobacteriaceae colonization and abundance.

Gut colonization is largely driven by habitat filtering

To investigate the relationship between co-colonization patterns and interspecies metabolic interactions between Enterobacteriaceae species and the rest of the gut microbiome, we generated genome-scale metabolic models³¹ of all candidate co-colonizers and co-excluders (only considering those with a consistent signal across datasets and taxa), together with all Enterobacteriaceae species from the UHGG detected at >1% prevalence (n = 282). Using a phylogenetically adjusted quantification method³², we calculated metabolic competition and complementarity indices between all pairwise Enterobacteriaceae–microbiome combinations. Metabolic competition was calculated on the basis of the overlap between two given metabolic networks, while metabolic complementarity measured the potential of one species to utilize the metabolic output of another. Values were distributed across two main clusters based on high or low competition/complementarity indices (Extended Data Fig. 9a). However, within each cluster, there was a significant negative correlation between metabolic competition and complementarity (cluster 1: Pearson’s r = −0.85, P < 0.0001; cluster 2: Pearson’s r = −0.20, P < 0.0001). We combined both parameters to estimate a metabolic distance score and observed that co-colonizers showed a lower metabolic distance to Enterobacteriaceae species compared with co-excluders (Wilcoxon rank-sum test, P < 0.0001; Fig. 5d and Extended Data Fig. 9b). These results indicate that habitat filtering, a process that favours the coexistence of functionally similar species, is probably the main driver of colonization success and microbiome assembly among Enterobacteriaceae-associated species. In addition, metabolic comparisons were made both within and between the groups of co-excluders and co-colonizers. These analyses revealed that within-group metabolic distances were smaller than between-group differences (Wilcoxon rank-sum test, P < 0.0001; Extended Data Fig. 9c), indicating shared niche preference. Importantly, results were consistent even when simulating metabolic models under various media compositions reflecting differences in diets (Extended Data Fig. 9d). This supports a previous study showing that gut microbiome species with similar nutritional requirements tend to co-occur across individuals³³.
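
To make the two indices concrete, the sketch below uses a simplified set-based formulation: competition as the fraction of one species' required nutrients that another species also requires, and complementarity as the fraction that the other species can supply. This is an assumption-laden illustration, not the phylogenetically adjusted method applied here. The function names and metabolite sets are hypothetical, and no formula for the combined metabolic distance score is implied.

```python
def competition_index(needs_a, needs_b):
    """Fraction of the nutrients required by species A that B also requires."""
    if not needs_a:
        raise ValueError("species A must require at least one nutrient")
    return len(needs_a & needs_b) / len(needs_a)

def complementarity_index(needs_a, secretes_b):
    """Fraction of the nutrients required by species A that B can secrete."""
    if not needs_a:
        raise ValueError("species A must require at least one nutrient")
    return len(needs_a & secretes_b) / len(needs_a)

# Hypothetical metabolite sets for illustration only.
enterobacteriaceae_needs = {"glucose", "serine", "iron", "nitrate"}
candidate_needs = {"glucose", "serine", "iron", "fructose"}
candidate_secretes = {"acetate", "nitrate"}

print(competition_index(enterobacteriaceae_needs, candidate_needs))        # 0.75
print(complementarity_index(enterobacteriaceae_needs, candidate_secretes)) # 0.25
```

Both indices are directional: swapping the two species generally changes the value, because each is normalized by the requirements of the first species.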

To further explore functional differences between co-excluders and co-colonizers inferred from metabolic models, we compared the number and types of metabolites predicted from uptake and secretion fluxes within each species population. By simulating the models under a rich gut medium (M3)³⁴, we observed that the number of metabolites predicted to be secreted was significantly higher among co-colonizers (Wilcoxon rank-sum test, P < 0.0001, Extended Data Fig. 10a), further supporting that higher functional diversity and versatility is associated with Enterobacteriaceae co-colonization. Statistical analyses identified 7 and 77 candidate metabolites from uptake or secretion fluxes, respectively, as differentially abundant between co-colonizers and co-excluders (Extended Data Fig. 10b). With regard to estimated uptake fluxes, l-serine and indole were the most significant metabolites enriched in co-colonizers and co-excluders, respectively. Interestingly, dietary l-serine has been previously described to provide a competitive fitness advantage to Enterobacteriaceae under inflammatory conditions³⁵, while indole has been shown to alleviate intestinal inflammation through modulation of the gut microbiome composition³⁶. Indole is also recognized as a signalling molecule among indole-producing bacteria, such as E. coli³⁷. Therefore, a higher uptake of indole among co-excluders could impair intercellular signal communication of Enterobacteriaceae. In terms of secretion, we observed most notably an overrepresentation of the metabolites undecaprenyl phosphate/undecaprenyl-diphosphatase and thymine among co-excluders, concomitant with a higher secretion of oxidized glutathione and β-alanine among co-colonizers. Undecaprenyl phosphate is involved in the biogenesis of the bacterial cell wall³⁸, while thymine is essential for DNA synthesis, repair and bacterial growth. In contrast, detection of oxidized glutathione among co-colonizers may be indicative of adaptation to an environment with higher oxygen tension and oxidative stress. Lastly, β-alanine, which we found to be overrepresented among co-colonizers, was previously identified as significantly increased in Crohn’s disease patients³⁹. Overall, these results highlight metabolic differences between co-excluders and co-colonizing species that may reflect differences in colonization and adaptation to distinct gut niches. This further supports that modulation of the gut environment, for instance through diet, may affect susceptibility to Enterobacteriaceae colonization.

Co-excluders encode gene families involved in quorum sensing

As the production of secondary metabolites can influence bacterial fitness and interspecies ecological interactions, we performed a dedicated analysis of biosynthetic gene clusters (BGCs) among Enterobacteriaceae co-colonizers and co-excluders using the antiSMASH⁴⁰ prediction tool. The BGCs identified as most significantly overrepresented among co-excluders belonged to a class of cyclic lactone autoinducers (Fisher’s exact test, adjusted P < 0.05; Fig. 6a). Autoinducers represent signalling molecules that play a role in bacterial communication through quorum sensing⁴¹. This is in line with the COG results which showed an enrichment of signal transduction mechanisms among co-excluders. By investigating the taxonomic distribution of all 147 autoinducer BGCs detected, we find that the majority were harboured by species from the Lachnospiraceae family (97/147, 66%). Further grouping of the BGCs by their genetic similarity (>50% nucleotide identity over >50% coverage; Supplementary Table 4) revealed 110 unique BGC families, segregated between co-excluders and co-colonizers (Fig. 6b). We compared all genes from each family to experimentally characterized BGCs in the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) database⁴² and found an overall low amino acid identity to known clusters (interquartile range, IQR = 25–30%; Fig. 6c), suggesting that these autoinducer BGCs represent uncharacterized sequences. Quorum sensing molecules have been previously implicated as a means for members of the gut microbiome to provide colonization resistance against external pathogens⁴³. Therefore, an enrichment of these autoinducer BGCs among co-excluders indicates that they could play a role in controlling the colonization and abundance of Enterobacteriaceae in the human gut.
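
The grouping of BGCs into families by pairwise similarity can be sketched as single-linkage clustering over all pairs that pass both thresholds. The thresholds (>50% identity over >50% coverage) come from the text. The choice of single linkage, the function name and the input layout are assumptions made for illustration.

```python
def cluster_families(bgc_ids, hits, min_identity=50.0, min_coverage=50.0):
    """Group BGCs into families by single linkage over similar pairs.

    bgc_ids: iterable of BGC identifiers.
    hits: iterable of (bgc_a, bgc_b, percent_identity, percent_coverage).
    Returns a list of sets, one per family.
    """
    bgc_ids = list(bgc_ids)
    parent = {i: i for i in bgc_ids}

    def find(x):
        # Follow parent links to the family representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        if identity > min_identity and coverage > min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    families = {}
    for i in bgc_ids:
        families.setdefault(find(i), set()).add(i)
    return sorted(families.values(), key=sorted)
```

Under single linkage, two BGCs can fall in the same family without being directly similar, provided a chain of similar intermediates connects them. Stricter linkage rules would yield more, smaller families.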

**Fig. 6: Co-excluders harbour biosynthetic gene clusters involved in quorum sensing.**

Discussion

We performed a large, global characterization of gut microbiome signatures linked to Enterobacteriaceae colonization and abundance, revealing taxonomic and functional shifts related to co-colonization and co-exclusion. In addition to confirming previous associations (for example, negative signal of Enterobacteriaceae with SCFA producers) in a diverse large-scale cohort, our study provides several important insights. First, we reveal significant and consistent microbiome differences associated with Enterobacteriaceae colonization across health states and geographic regions, uncovering a large uncharacterized subspecies diversity of E. coli among healthy adults in Africa. Moreover, our findings suggest that species from the Faecalibacterium genus beyond the well-established F. prausnitzii may play a critical role in colonization resistance against Enterobacteriaceae. We also identified notable co-colonization patterns involving underexplored taxa such as Intestinibacter and F. phoceensis that may underlie biological mechanisms linked to Enterobacteriaceae outgrowth. Finally, we discovered that co-excluder species harbour a range of uncharacterized BGCs involved in quorum sensing that could be modulating Enterobacteriaceae abundance.

Despite representing a large metagenomic investigation of Enterobacteriaceae–microbiome dynamics, our study has some inherent limitations. Due to the compositional nature of sequencing data, we were unable to differentiate between absolute and relative abundance estimates. Metagenomic samples may show varying relative abundances of Enterobacteriaceae, yet reflect similar absolute colonization levels. In addition, it remains unclear whether co-excluders are a cause or consequence of reduced Enterobacteriaceae levels, as species such as E. coli could potentially inhibit co-excluder growth through mechanisms such as antimicrobial production. Lastly, although we analysed metagenomic data from 45 countries, many regions of the world, particularly South America and Africa, remain undersampled.

A recent study of two large-scale population cohorts identified that baseline gut microbiome composition is an important risk factor for infection-related hospital admission⁴⁴. In line with our results, lower relative abundances of Veillonella and higher abundances of butyrate producers were associated with protection against hospitalization. These findings suggest that some of the associations here described might not only be correlated with Enterobacteriaceae colonization, but could also be predictive of distinct health outcomes and overall infection risk.

Previous studies in animal models⁴⁵⁻⁴⁹ and perturbed microbiomes⁵⁰ have described competitive interactions among Enterobacteriaceae species, including between E. coli strains and other Enterobacteriaceae species or between members of the Klebsiella genus specifically. However, in our study we detected a significant co-occurrence of species that are predicted to metabolically compete, including those within the Enterobacteriaceae family. Therefore, our results suggest that habitat filtering has a stronger effect on Enterobacteriaceae colonization success in the human gut than that observed in animal models, where the impact of direct interspecies competition may be more pronounced. In addition, most of our findings were inferred from healthy populations, which exhibit substantial microbiome differences compared with disturbed microbiomes such as those undergoing transplantation or antibiotic treatment. Altogether, these data indicate that gut environmental conditions, for instance those mediated by diet, play a key role in the risk of Enterobacteriaceae outgrowth. Large-scale longitudinal and/or interventional studies that combine metagenomics, metatranscriptomics and metabolomics to test the role of diet, medication and the environment in Enterobacteriaceae colonization and abundance represent promising future directions.

In summary, our research provides important insights into the biological role of the human gut microbiome in the colonization success of Enterobacteriaceae in the human gut. With the global rise of multidrug-resistant Enterobacteriaceae and the negative health outcomes associated with Enterobacteriaceae outgrowth, our findings could guide future research towards developing microbiome-based therapeutic strategies.

Methods

Human gut metagenomic datasets

We compiled 12,238 human gut metagenomic samples available in the European Nucleotide Archive (ENA)⁵¹, encompassing 65 different studies from 45 countries (Supplementary Table 1). Samples were selected on the basis of the following criteria: (1) containing at least 500,000 paired-end metagenomic reads; (2) with available metadata on health state, age group and country of origin; (3) from individuals with no diagnosed acute infections; and (4) no reported antibiotic usage in the previous month. Metagenomic datasets were first downloaded from ENA using fastq-dl v.2.0.4 (https://github.com/rpetit3/fastq-dl) and further quality-filtered with TrimGalore (v.0.6.0)⁵². Human contamination was removed by aligning the reads using BWA MEM (v.0.7.16a-r1181)⁵³ against human genome GRCh38. The custom pipeline used for downloading and quality-filtering the human gut metagenomic samples can be found in GitHub at https://github.com/alexmsalmeida/metagen-fetch.

Species prevalence and abundance

Quality-filtered metagenomes were mapped and quantified against the UHGG (v.1.0) catalogue¹². Before read mapping, genomes from the UHGG were curated to filter those matching all of the following criteria: (1) singletons (that is, species represented by only one genome); (2) <90% completeness based on CheckM (v.1.0.11)²⁵; and (3) classified by GUNC (v.1.0.3)⁵⁴ as chimaeric (‘clade_separation_score’ >0.45, ‘contamination_portion’ >0.05 and ‘reference_representation_score’ >0.5). This removed 32 species, resulting in a total of 4,612 species representatives. In addition, UHGG species representatives were taxonomically reclassified using GTDB-Tk (v.2.3.2)⁵⁵ (database release 214). The final curated database was indexed using BWA v.0.7.16a-r1181 (‘bwa index’), and metagenomic sequence reads were mapped using ‘bwa mem’. Aligned reads were filtered using Samtools (v.1.9)⁵⁶ to solely keep alignments where >60% of the read matched with >90% identity against any species representative. Breadth of coverage, depth of coverage, total read counts and counts of uniquely mapped reads were calculated per sample for each species in the reference database. In addition, to account for differences in sample sequencing depth, an expected breadth of coverage (E) per sample per species was calculated using a previously established formula⁵⁷:

$$E = 1 - e^{-0.883D} \tag{1}$$

where D corresponds to the average depth of coverage of the genome. Each species was considered present in a sample if the ratio between the breadth of coverage and the expected breadth of coverage was >30% (given that genomes from the UHGG were originally clustered using a 30% aligned fraction) and if the breadth of coverage was >5% (to account for the presence of metagenome-assembled genomes, MAGs, with up to 5% contamination). Read counts were transformed to 0 when a species was considered absent. Parameters of genome coverage and coverage ratio were validated with the ZymoBIOMICS Microbial Community Standard. In addition, we generated synthetic metagenomic communities to establish the detection limit of our metagenomic analysis approach. Briefly, metagenomes containing the 50 most prevalent species detected in our dataset at equal relative abundances were simulated with ‘wgsim’ in Samtools⁵⁶. Thereafter, we spiked-in one Enterobacteriaceae species at various known relative abundances (from 0.0001% to 1% at 8 intervals) and processed the sample with the mapping approach described above. Each analysis was repeated for five Enterobacteriaceae species representing the top five most prevalent genera (E. coli, K. pneumoniae, E. hormaechei, Citrobacter freundii and Kluyvera ascorbata) and three levels of sequencing depth based on the distribution observed for our dataset: low depth (first quartile) = 13 million reads; medium depth (median) = 31 million reads; high depth (third quartile) = 51 million reads. This represented a total of 120 synthetic communities. The detection limit was defined as the minimum relative abundance at which all five species were detected. The custom read mapping and species quantification workflow is publicly available as a snakemake⁵⁸ pipeline in GitHub at https://github.com/alexmsalmeida/metamap.
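
The presence rule described above can be sketched in a few lines. This is a minimal illustration rather than the code of the metamap pipeline: the function names are hypothetical, breadth is assumed to be expressed as a fraction between 0 and 1, and the example values are invented.

```python
import math

def expected_breadth(depth: float) -> float:
    """Expected breadth of coverage, E = 1 - exp(-0.883 * D), for mean depth D."""
    return 1.0 - math.exp(-0.883 * depth)

def is_present(breadth: float, depth: float,
               min_ratio: float = 0.30, min_breadth: float = 0.05) -> bool:
    """Apply both presence criteria (hypothetical helper, breadth as a fraction)."""
    expected = expected_breadth(depth)
    if expected <= 0.0:
        return False  # no coverage at all, so the species cannot be called present
    return (breadth / expected) > min_ratio and breadth > min_breadth

# Invented values for illustration only.
print(is_present(breadth=0.40, depth=1.0))   # passes both thresholds
print(is_present(breadth=0.04, depth=0.05))  # fails the 5% breadth floor
```

Species failing either criterion would then have their read counts set to 0, as stated in the text.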

To account for study-specific batch effects, species abundances were subsequently corrected using a conditional quantile algorithm implemented in ConQuR (v.1.2.0)⁵⁹. Each of the 65 studies was tested as a reference in the batch correction process, and we ultimately used study ERP111320 as the final reference, which yielded the lowest study effect size (R² = 0.068) based on a PERMANOVA.

Microbiome community differences

To evaluate the potential of the gut microbial community structure in classifying the colonization status of Enterobacteriaceae across all 12,238 metagenomes, we tested three supervised machine learning algorithms (ridge regression, random forest and gradient boosting) using a custom workflow (https://github.com/alexmsalmeida/ml-microbiome) derived from the mikropml⁶⁰ R package. The abundance of each filtered microbiome species was transformed into centred-log ratios (clr) and used as features in each machine learning model. Features were further pre-processed to exclude those with prevalence <1% and with zero variance. Model training and hyperparameter tuning were performed on 80% of the data using a 5-fold cross-validation, while the other 20% was used for testing with the best hyperparameter setting. The whole procedure was then repeated 10 times with independent seeds. We performed the analysis using, as outcome variable, the presence or absence of any Enterobacteriaceae species, or of E. coli and K. pneumoniae in particular. The study source was used as a grouping factor to ensure that samples from the same study were kept together in either the training or testing dataset. Model performance was evaluated using the AUROC.
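
The grouping constraint (all samples from a study kept on the same side of the split) can be illustrated as follows. This is a simplified sketch and not the mikropml implementation: `grouped_split` is a hypothetical helper, and the greedy assignment of whole studies to the test set is an assumption made for brevity.

```python
import random
from collections import defaultdict

def grouped_split(study_of_sample, test_fraction=0.2, seed=0):
    """Split sample IDs so each study lands entirely in train or in test.

    study_of_sample maps a sample ID to its study ID. Studies are shuffled and
    assigned to the test set until it holds at least `test_fraction` of all
    samples, so the realized fraction depends on study sizes.
    """
    by_study = defaultdict(list)
    for sample, study in study_of_sample.items():
        by_study[study].append(sample)

    studies = sorted(by_study)              # fixed order before shuffling
    random.Random(seed).shuffle(studies)    # one independent seed per repeat

    target = test_fraction * len(study_of_sample)
    train, test = [], []
    for study in studies:
        bucket = test if len(test) < target else train
        bucket.extend(by_study[study])
    return train, test

# Hypothetical toy data: 50 samples spread evenly over 5 studies.
samples = {f"sample{i}": f"study{i % 5}" for i in range(50)}
train, test = grouped_split(samples, seed=1)
print(len(train), len(test))
```

Repeating the call with ten different seeds would mirror the repeated procedure described above, with model fitting and AUROC evaluation handled by the actual workflow.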

Diversity estimates were inferred from the species abundance data. Alpha diversity was calculated using the Shannon index in the vegan⁶¹ R package, on the basis of the number of unique read counts mapped to each species per sample. Beta diversity between samples was estimated from the Aitchison distance of the clr-transformed species abundances. Correlation between alpha diversity and Enterobacteriaceae abundance was assessed with the Pearson’s coefficient of determination (R²), while differences in pairwise beta diversity were calculated using a two-sided Wilcoxon rank-sum test. To further account for differences in sequencing depth, results were confirmed by rarefying the number of mapped reads to 500,000.
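
For readers less familiar with these measures, the sketch below computes the Shannon index and the Aitchison distance (the Euclidean distance between clr-transformed profiles) in plain Python. The pseudocount used to handle zero counts is an assumption of this example; the source does not state how zeros were treated before the clr transformation.

```python
import math

def shannon_index(counts):
    """Shannon index with natural logarithms (the default in vegan's diversity())."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform; the pseudocount is an assumed zero-handling choice."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(counts_a, counts_b, pseudocount=0.5):
    """Euclidean distance between the clr-transformed profiles of two samples."""
    a, b = clr(counts_a, pseudocount), clr(counts_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Invented read counts for two samples over the same four species.
sample_1 = [120, 30, 0, 50]
sample_2 = [10, 200, 40, 5]
print(shannon_index(sample_1), aitchison_distance(sample_1, sample_2))
```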

Identification of candidate species

To identify differentially abundant microbiome species associated with the presence/absence of Enterobacteriaceae (colonization status), we utilized a combination of generalized (ALDEx2 (v.1.32.0)¹⁸) and mixed-effects (MaAsLin2 (v.1.14.1)¹⁹) models. To be able to appropriately control for potential confounders, the differential analyses spanned two datasets: (1) a complete collection of all 12,238 metagenomes with metadata including age group, health state (that is, specific disease name or ‘Healthy’), continent and study, which were incorporated as model covariates; and (2) a subset of 5,128 metagenomes exclusively obtained from healthy adults, controlled for both study and continent. In both datasets, read counts were included as an additional covariate to account for differences in sequencing depth between samples. Analyses were conducted using Enterobacteriaceae, E. coli or K. pneumoniae presence/absence as outcome variables. For ALDEx2, we used the ‘aldex.clr’ function followed by ‘aldex.glm’ to model abundance differences after a centred-log ratio transformation. For MaAsLin2, we used a log transformation combined with relative abundance (total sum scaling) normalization and accounting for genome length. A minimum prevalence threshold of 1% was used to filter out rare species from both analyses. P values were corrected for multiple testing using the Benjamini–Hochberg method and only species with a false discovery rate (FDR) < 5% were considered significant.
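
The Benjamini–Hochberg adjustment used throughout can be written compactly. The function below is a generic illustration of the procedure, not code taken from ALDEx2 or MaAsLin2, and the P values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented P values; species with an adjusted value < 0.05 would be retained.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```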

We used FastSpar (v.1.0)²¹ (a C++ implementation of SparCC²⁰) to generate correlation networks between Enterobacteriaceae species abundances and the abundances of other microbiome members found at >1% prevalence. As above, batch-corrected read counts were used as species abundances. Exact P values were calculated from 1,000 bootstrap correlations and corrected for multiple testing using the Benjamini–Hochberg method. Only correlations with an FDR < 5% were kept. To select the final candidate species, only those that were statistically significant across ALDEx2, MaAsLin2 and FastSpar were chosen. On the basis of the overall direction of the signal (positive or negative), species were classified as either Enterobacteriaceae co-colonizers or co-excluders.
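
The consensus step can be pictured as a set intersection followed by a sign check. In the sketch below, each input is assumed to hold only the FDR-significant species of one method, mapped to a signed effect. Requiring all three signs to agree is an assumption of this example, since the source refers only to the overall direction of the signal. The species labels are hypothetical.

```python
def classify_candidates(aldex2, maaslin2, fastspar):
    """Keep species significant in all three methods and label them by direction.

    Each argument maps species -> signed effect for significant hits only.
    Species whose direction differs between methods are skipped (assumption).
    """
    shared = set(aldex2) & set(maaslin2) & set(fastspar)
    result = {}
    for species in sorted(shared):
        signs = {aldex2[species] > 0, maaslin2[species] > 0, fastspar[species] > 0}
        if len(signs) == 1:  # all three methods agree on the direction
            result[species] = "co-colonizer" if signs.pop() else "co-excluder"
    return result

# Hypothetical effect sizes for four placeholder species.
aldex2 = {"sp_A": 1.2, "sp_B": -0.8, "sp_C": 0.5, "sp_D": 0.3}
maaslin2 = {"sp_A": 0.4, "sp_B": -0.2, "sp_C": -0.1}
fastspar = {"sp_A": 0.3, "sp_B": -0.5, "sp_C": 0.2, "sp_D": 0.1}
print(classify_candidates(aldex2, maaslin2, fastspar))
```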

Results were further compared with the dataset of ref. ¹⁰ that specifically investigated the longitudinal dynamics of carbapenemase-producing Enterobacteriaceae (CPE). A total of 361 samples were processed from study ERP133829 and mapped to the UHGG as described above. To identify differentially abundant species while accounting for the longitudinal study design, we used the mixed-effects modelling implemented in MaAsLin2. The CPE status (CPE-positive, CPE-negative control or CPE-negative index) was used as a fixed effect, and the individual participant was used as a random effect. Samples from individuals who received antibiotics or were hospitalized since their last visit were excluded from the analysis. As above, the minimum prevalence threshold was set at 1% and statistical significance was determined with an FDR < 5%. The overlap with the species identified as Enterobacteriaceae co-excluders or co-colonizers was inferred with a χ² test using four estimates: (1) number of species associated with CPE and Enterobacteriaceae; (2) number of species associated only with CPE; (3) number of species associated with Enterobacteriaceae but not CPE; and (4) number of species not associated with either.
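
The four estimates listed above form a 2 × 2 contingency table. The sketch below computes the Pearson χ² statistic for such a table and its P value with one degree of freedom. It assumes no continuity correction, which the source does not specify, and the counts are invented.

```python
import math

def chi_square_2x2(both, cpe_only, entero_only, neither):
    """Pearson chi-square test (no continuity correction) on a 2 x 2 table.

    Rows: associated with CPE (yes/no); columns: associated with
    Enterobacteriaceae (yes/no). All row and column totals must be non-zero.
    """
    table = [[both, cpe_only], [entero_only, neither]]
    n = both + cpe_only + entero_only + neither
    rows = [both + cpe_only, entero_only + neither]
    cols = [both + entero_only, cpe_only + neither]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Invented counts for illustration only.
print(chi_square_2x2(both=20, cpe_only=5, entero_only=5, neither=20))
```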

Strain-specific colonization patterns

A metagenomic multilocus sequence typing (metaMLST¹⁴) analysis was performed to characterize E. coli subspecies diversity. Metagenomic reads from 5,128 healthy adults were aligned using bowtie2 (v.2.5.3)⁶² (option ‘--very-sensitive-local’) against the 7 housekeeping genes from the Achtman E. coli MLST scheme (containing 13,253 STs as of April 2024). Mapping results were further processed using the ‘metamlst.py’ (option ‘--min_accuracy 0.5’) and ‘metamlst-merge.py’ (option ‘-z 10’) Python scripts in the metaMLST (v.1.2.3) tool. Results were further filtered to only include STs with a confidence score >90. A minimum spanning tree was built using the igraph⁶³ R package on the basis of the Euclidean distances of the ST allelic profiles detected.
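
The tree-building step can be illustrated with Prim's algorithm applied to pairwise Euclidean distances between allelic profiles. The study used the igraph R package, so this pure-Python version is only a stand-in for the same idea, and the ST labels and allele numbers are hypothetical.

```python
import math

def minimum_spanning_tree(profiles):
    """Prim's algorithm on Euclidean distances between allelic profiles.

    profiles maps an ST label to a tuple of allele numbers (one per locus).
    Returns a list of (st_in_tree, st_added, distance) edges.
    """
    names = sorted(profiles)
    if not names:
        return []

    def dist(a, b):
        return math.dist(profiles[a], profiles[b])

    tree_nodes = [names[0]]
    edges = []
    while len(tree_nodes) < len(names):
        # Cheapest edge connecting the current tree to a node outside it.
        u, v = min(
            ((a, b) for a in tree_nodes for b in names if b not in tree_nodes),
            key=lambda edge: dist(*edge),
        )
        edges.append((u, v, dist(u, v)))
        tree_nodes.append(v)
    return edges

# Hypothetical 7-locus profiles for three placeholder STs.
profiles = {
    "ST_a": (1, 1, 1, 1, 1, 1, 1),
    "ST_b": (1, 1, 1, 1, 1, 1, 2),
    "ST_c": (5, 5, 5, 5, 5, 5, 5),
}
for edge in minimum_spanning_tree(profiles):
    print(edge)
```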

We further explored the strain-specific genetic diversity of species identified as either co-colonizers or co-excluders of Enterobacteriaceae among healthy adult samples. To account for differences in genome quality and overall number of strains, only gut species with at least 10 genomes with >90% completeness and <5% contamination represented within the UHGG database were chosen. We extracted all accessory genes (<90% prevalence) exclusive to each candidate species from the UHGP catalogue¹² clustered at 90% amino acid identity (UHGP-90). In parallel, we generated metagenome assemblies of the 5,128 samples from healthy adults using MEGAHIT (v.1.2.9)⁶⁴ (option ‘--min-contig-len 500’), followed by protein prediction with Prodigal (v.2.6.3)⁶⁵ (option ‘-p meta’). Thereafter, we used DIAMOND (v.2.1.8)⁶⁶ (function ‘blastp’) to align all extracted UHGP genes against the predicted proteins of the healthy adult metagenomes (using thresholds of 90% amino acid identity and 80% coverage of the shortest sequence). This resulted in a binary (presence/absence) matrix of all genes across all metagenomic samples. Subsequently, a generalized linear model (glm R function, family = ‘binomial’) was employed to investigate the association of the accessory genes with Enterobacteriaceae colonization status (presence/absence), using ‘Study’ as a covariate. P values were corrected for multiple testing using the Benjamini–Hochberg method and only genes with an FDR < 5% were considered significant.
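
The conversion of alignment hits into a presence/absence matrix might look as follows. This sketch makes several assumptions not stated in the source: the thresholds are treated as inclusive, coverage is taken as the alignment length divided by the length of the shorter sequence, and each hit record is a hypothetical tuple of gene, sample, percent identity, alignment length, query length and subject length.

```python
def passes_thresholds(pident, aln_len, qlen, slen, min_ident=90.0, min_cov=0.80):
    """True if a hit meets the identity and shortest-sequence coverage cut-offs."""
    return pident >= min_ident and aln_len / min(qlen, slen) >= min_cov

def presence_matrix(hits, genes, samples):
    """Build a gene -> [0/1 per sample] mapping from filtered alignment hits.

    hits is an iterable of (gene, sample, pident, aln_len, qlen, slen) tuples.
    """
    present = {
        (gene, sample)
        for gene, sample, pident, aln_len, qlen, slen in hits
        if passes_thresholds(pident, aln_len, qlen, slen)
    }
    return {gene: [int((gene, s) in present) for s in samples] for gene in genes}

# Invented hits for two placeholder genes in two placeholder samples.
hits = [
    ("g1", "s1", 95.0, 90, 100, 120),   # kept: 95% identity, 90% coverage
    ("g1", "s2", 85.0, 100, 100, 100),  # dropped: identity below 90%
    ("g2", "s2", 92.0, 70, 100, 200),   # dropped: coverage below 80%
]
print(presence_matrix(hits, genes=["g1", "g2"], samples=["s1", "s2"]))
```

Each row of the resulting matrix could then serve as the response in a per-gene binomial model with study as a covariate, as described above.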

On the basis of the number of significant genes identified, two species (Ruminococcus B gnavus and Faecalimonas phoceensis) were chosen for further phylogenetic analysis. After extracting all conspecific genomes (>90% complete) available in the UHGG, we used Panaroo (v.1.3.3)⁶⁷ to perform a species-specific core genome alignment (options: ‘--clean-mode strict --remove-invalid-genes -c 0.9 -f 0.5 --merge_paralogs --core_threshold 0.9 -a core’). FastTree (v.2.1.11)⁶⁸ was subsequently used to build the phylogenetic tree, which was visualized with the interactive Tree of Life (iTOL) (v.6)⁶⁹ online tool. A PERMANOVA was used to assess the association between the cophenetic distances in the phylogeny and the number of significant accessory genes per genome.

Genome functional analyses

Functional analyses were performed on candidate microbiomes with a consistent positive or negative signal in relation to Enterobacteriaceae colonization and abundance. First, protein-coding sequences were predicted from the genome assemblies using Prokka (v.1.14.16)⁷⁰. Thereafter, we used a custom workflow (https://github.com/alexmsalmeida/genofan) to comprehensively annotate each genome using eggNOG-mapper (v.2.1.3)⁷¹, dbCAN2 (v.2.0.11)⁷², KOFam²⁶ (release 2021-11), gutSMASH (v.1.0)²⁹ and antiSMASH (v.6.0.1)⁴⁰. The diversity of KEGG Orthologs (KOs) among co-excluders and co-colonizers was derived from the KOFam results and further assessed using the Shannon diversity index implemented in the vegan R package. A generalized linear model was used to identify KOs significantly associated with co-exclusion or co-colonization (glm R function, family = ‘binomial’) while controlling for genome type (MAG or isolate) as a covariate. COGs were extracted from the eggNOG results and compared using a two-tailed Wilcoxon rank-sum test. Lastly, the proportions of gene clusters, retrieved using gutSMASH and antiSMASH, between co-colonizers and co-excluders were evaluated using Fisher’s exact test. In all statistical analyses, exact P values were corrected for multiple testing and filtered on the basis of an FDR < 5%. BGCs belonging to the class of cyclic lactone autoinducers were further analysed using an all-against-all blast analysis, and a network was built using the igraph R package, linking nodes (BGCs) with >50% nucleotide identity over >50% coverage. Sequence identity to known clusters from the MIBiG database was derived using the ‘KnownClusterBlast’ algorithm implemented in antiSMASH.
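
One way to picture the BGC network is as a graph whose connected components correspond to BGC families. The sketch below applies the >50% identity and >50% coverage thresholds to pairwise hits and groups BGCs with a union-find structure. Equating families with connected components is an assumption of this example, and the BGC labels and hit values are hypothetical.

```python
def bgc_families(bgcs, hits, min_identity=50.0, min_coverage=50.0):
    """Group BGCs into connected components of the similarity network.

    hits is an iterable of (bgc_a, bgc_b, percent_identity, percent_coverage).
    """
    parent = {bgc: bgc for bgc in bgcs}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        if a != b and identity > min_identity and coverage > min_coverage:
            parent[find(a)] = find(b)  # link the two components

    families = {}
    for bgc in bgcs:
        families.setdefault(find(bgc), []).append(bgc)
    return sorted(sorted(members) for members in families.values())

# Hypothetical all-against-all hits between four placeholder BGCs.
bgcs = ["bgc1", "bgc2", "bgc3", "bgc4"]
hits = [
    ("bgc1", "bgc2", 80.0, 90.0),  # linked
    ("bgc2", "bgc3", 60.0, 40.0),  # coverage too low, not linked
    ("bgc3", "bgc4", 55.0, 70.0),  # linked
]
print(bgc_families(bgcs, hits))
```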

Genome-scale metabolic models were generated for each genome using CarveMe (v.1.5.2)³¹ with default parameters and gap-filled with gut-specific rich media (M3)³⁴. From the reconstructed models, we estimated metabolic competition and complementarity indices between all candidate species and all Enterobacteriaceae species detected at >1% prevalence using PhyloMint (v.0.1.0)³². Given the negative correlation observed between both measures, we also calculated a combined metabolic distance score (1 − (competition index − complementarity index)). Results were further confirmed using flux balance analysis simulation of metabolic models under defined gut media (M1)³⁴ supplemented with various diet-based media available in the Virtual Metabolic Human database⁷³. The number and types of metabolites predicted from secreted or uptake fluxes among co-colonizers and co-excluders were derived using the COBRApy (v.0.29)⁷⁴ Python package and further compared using a generalized linear model as described above for the identification of candidate KOs.
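
The combined metabolic distance score is a simple function of the two indices. The snippet below evaluates it for two invented index pairs, with both indices assumed to lie between 0 and 1. The function name is hypothetical and no PhyloMint calls are shown.

```python
def metabolic_distance(competition: float, complementarity: float) -> float:
    """Combined score: 1 - (competition index - complementarity index)."""
    return 1.0 - (competition - complementarity)

# Invented index pairs: high competition gives a small distance,
# high complementarity gives a large one.
print(metabolic_distance(0.8, 0.1))
print(metabolic_distance(0.2, 0.6))
```

Under this definition, a pair with high competition and low complementarity receives a low score, which corresponds to the metabolic resemblance discussed for co-colonizers.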

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All the metagenomic datasets used in this study are publicly available in the European Nucleotide Archive (see Supplementary Table 1 for all associated accession codes). The sequence databases used were retrieved from the Unified Human Gastrointestinal Genome (UHGG) catalog v.1.0 and the Unified Human Gastrointestinal Protein (UHGP-90) catalog v.1.0. Abundance data estimated for the UHGG species and all metagenomic samples here included are available in figshare at https://doi.org/10.6084/m9.figshare.27044341.v1 (ref. ⁷⁵). FASTA files of the BGCs detected with antiSMASH for all co-excluders and co-colonizers can be accessed in figshare at https://doi.org/10.6084/m9.figshare.27044335.v1 (ref. ⁷⁶). Accession code of the human reference genome used for decontamination (GRCh38) is GCA_000001405.15.

Code availability

Custom scripts and pipelines used in this work are publicly available in GitHub at https://github.com/microfundiv-lab/EnteroEco (ref. ⁷⁷).

References

Pickard, J. M., Zeng, M. Y., Caruso, R. & Núñez, G. Gut microbiota: role in pathogen colonization, immune responses, and inflammatory disease. Immunol. Rev. 279, 70–89 (2017).
Leshem, A., Liwinski, T. & Elinav, E. Immune–microbiota interplay and colonization resistance in infection. Mol. Cell 78, 597–613 (2020).
Hou, K. et al. Microbiota in health and diseases. Signal Transduct. Target. Ther. 7, 135 (2022).
Duvallet, C., Gibbons, S. M., Gurry, T., Irizarry, R. A. & Alm, E. J. Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nat. Commun. 8, 1784 (2017).
Armour, C. R., Nayfach, S., Pollard, K. S. & Sharpton, T. J. A metagenomic meta-analysis reveals functional signatures of health and disease in the human gut microbiome. mSystems 4, e00332-18 (2019).
Khorsand, B. et al. Overrepresentation of Enterobacteriaceae and Escherichia coli is the major gut microbiome signature in Crohn’s disease and ulcerative colitis; a comprehensive metagenomic analysis of IBDMDB datasets. Front. Cell. Infect. Microbiol. 12, 1015890 (2022).
Salosensaari, A. et al. Taxonomic signatures of cause-specific mortality risk in human gut microbiome. Nat. Commun. 12, 2671 (2021).
Hilty, M. et al. Transmission dynamics of extended-spectrum β-lactamase-producing Enterobacteriaceae in the tertiary care hospital and the household setting. Clin. Infect. Dis. 55, 967–975 (2012).
Bakken, J. S. et al. Treating Clostridium difficile infection with fecal microbiota transplantation. Clin. Gastroenterol. Hepatol. 9, 1044–1049 (2011).
Kang, J. T. L. et al. Long-term ecological and evolutionary dynamics in the gut microbiomes of carbapenemase-producing Enterobacteriaceae colonized subjects. Nat. Microbiol. 7, 1516–1524 (2022).
Korach-Rechtman, H. et al. Intestinal dysbiosis in carriers of carbapenem-resistant Enterobacteriaceae. mSphere 5, e00173-20 (2020).
Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).
Ludden, C. et al. Defining nosocomial transmission of Escherichia coli and antimicrobial resistance genes: a genomic surveillance study. Lancet Microbe 2, e472–e480 (2021).
Zolfo, M., Tett, A., Jousson, O., Donati, C. & Segata, N. MetaMLST: multi-locus strain-level bacterial typing from metagenomic samples. Nucleic Acids Res. 45, e7 (2017).
Doumith, M. et al. Rapid identification of major Escherichia coli sequence types causing urinary tract and bloodstream infections. J. Clin. Microbiol. 53, 160–166 (2015).
Matsui, Y. et al. Multilocus sequence typing of Escherichia coli isolates from urinary tract infection patients and from fecal samples of healthy subjects in a college community. Microbiologyopen 9, 1225–1233 (2020).
Hillmann, B. et al. Evaluating the information content of shallow shotgun metagenomics. mSystems 3, e00069-18 (2018).
Fernandes, A. D., Macklaim, J. M., Linn, T. G., Reid, G. & Gloor, G. B. ANOVA-like differential expression (ALDEx) analysis for mixed population RNA-seq. PLoS ONE 8, e67019 (2013).
Mallick, H. et al. Multivariable association discovery in population-scale meta-omics studies. PLoS Comput. Biol. 17, e1009442 (2021).
Friedman, J. & Alm, E. J. Inferring correlation networks from genomic survey data. PLoS Comput. Biol. 8, e1002687 (2012).
Watts, S. C., Ritchie, S. C., Inouye, M. & Holt, K. E. FastSpar: rapid and scalable correlation estimation for compositional data. Bioinformatics 35, 1064–1066 (2019).
Louis, P. & Flint, H. J. Diversity, metabolism and microbial ecology of butyrate-producing bacteria from the human large intestine. FEMS Microbiol. Lett. 294, 1–8 (2009).
Chang, K. C., Nagarajan, N. & Gan, Y.-H. Short-chain fatty acids of various lengths differentially inhibit Klebsiella pneumoniae and Enterobacteriaceae species. mSphere 9, e00781-23 (2024).
Keogh, D. et al. Enterococcal metabolite cues facilitate interspecies niche modulation and polymicrobial infection. Cell Host Microbe 20, 493–503 (2016).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).
Watson, A. R. et al. Metabolic independence drives gut microbial colonization and resilience in health and disease. Genome Biol. 24, 78 (2023).
Cassat, J. E. & Skaar, E. P. Iron in infection and immunity. Cell Host Microbe 13, 509–519 (2013).
Pascal Andreu, V. et al. gutSMASH predicts specialized primary metabolic pathways from the human gut microbiota. Nat. Biotechnol. 41, 1416–1423 (2023).
Westphal, L., Wiechmann, A., Baker, J., Minton, N. P. & Müller, V. The Rnf complex is an energy-coupled transhydrogenase essential to reversibly link cellular NADH and ferredoxin pools in the acetogen Acetobacterium woodii. J. Bacteriol. 200, e00357-18 (2018).
Machado, D., Andrejev, S., Tramontano, M. & Patil, K. R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res. 46, 7542–7553 (2018).
Lam, T. J., Stamboulian, M., Han, W. & Ye, Y. Model-based and phylogenetically adjusted quantification of metabolic interaction between microbial species. PLoS Comput. Biol. 16, e1007951 (2020).
Levy, R. & Borenstein, E. Metabolic modeling of species interaction in the human microbiome elucidates community-level assembly rules. Proc. Natl Acad. Sci. USA 110, 12804–12809 (2013).
Tramontano, M. et al. Nutritional preferences of human gut bacteria reveal their metabolic idiosyncrasies. Nat. Microbiol. 3, 514–522 (2018).
Kitamoto, S. et al. Dietary l-serine confers a competitive fitness advantage to Enterobacteriaceae in the inflamed gut. Nat. Microbiol. 5, 116 (2020).
Wang, G. et al. Microbiota-derived indoles alleviate intestinal inflammation and modulate microbiome by microbial cross-feeding. Microbiome 12, 59 (2024).
Lee, J. H. & Lee, J. Indole as an intercellular signal in microbial communities. FEMS Microbiol. Rev. 34, 426–444 (2010).
Workman, S. D. & Strynadka, N. C. J. A slippery scaffold: synthesis and recycling of the bacterial cell wall carrier lipid. J. Mol. Biol. 432, 4964–4982 (2020).
Santoru, M. L. et al. Cross sectional evaluation of the gut-microbiome metabolome axis in an Italian cohort of IBD patients. Sci. Rep. 7, 9523 (2017).
Blin, K. et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 49, W29–W35 (2021).
Miller, M. B. & Bassler, B. L. Quorum sensing in bacteria. Annu. Rev. Microbiol. 55, 165–199 (2001).
Kautsar, S. A. et al. MIBiG 2.0: a repository for biosynthetic gene clusters of known function. Nucleic Acids Res. 48, D454–D458 (2020).
Hsiao, A. et al. Members of the human gut microbiota involved in recovery from Vibrio cholerae infection. Nature 515, 423–426 (2014).
Kullberg, R. F. J. et al. Association between butyrate-producing gut bacteria and the risk of infectious disease hospitalisation: results from two observational, population-based microbiome studies. Lancet Microbe 5, 100864 (2024).
Eberl, C. et al. E. coli enhance colonization resistance against Salmonella Typhimurium by competing for galactitol, a context-dependent limiting carbon source. Cell Host Microbe 29, 1680–1692.e7 (2021).
Sassone-Corsi, M. et al. Microcins mediate competition among Enterobacteriaceae in the inflamed gut. Nature 540, 280–283 (2016).
Oliveira, R. A. et al. Klebsiella michiganensis transmission enhances resistance to Enterobacteriaceae gut invasion by nutrition competition. Nat. Microbiol. 5, 630–641 (2020).
Osbelt, L. et al. Klebsiella oxytoca causes colonization resistance against multidrug-resistant K. pneumoniae in the gut via cooperative carbohydrate competition. Cell Host Microbe 29, 1663–1679.e7 (2021).
Furuichi, M. et al. Commensal consortia decolonize Enterobacteriaceae via ecological control. Nature https://doi.org/10.1038/s41586-024-07960-6 (2024).
Schluter, J. et al. The TaxUMAP atlas: efficient display of large clinical microbiome data reveals ecological competition in protection against bacteremia. Cell Host Microbe 31, 1126–1139.e6 (2023).
Yuan, D. et al. The European Nucleotide Archive in 2023. Nucleic Acids Res. 52, D92–D97 (2024).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754 (2009).
Orakov, A. et al. GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biol. 22, 178 (2021).
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2019).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Olm, M. R. et al. inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains. Nat. Biotechnol. 39, 727–736 (2021).
Köster, J. et al. Sustainable data analysis with Snakemake. F1000Res. 10, 33 (2021).
Ling, W. et al. Batch effects removal for microbiome data via conditional quantile regression. Nat. Commun. 13, 5418 (2022).
Topçuoğlu, B. D., Lesniak, N. A., Ruffin, M. T., Wiens, J. & Schloss, P. D. A framework for effective application of machine learning to microbiome-based classification problems. mBio 11, e00434-20 (2020).
Oksanen, J. et al. Vegan: community ecology package. R package version 2.6-4. https://doi.org/10.32614/CRAN.PACKAGE.VEGAN (2022).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJ. Complex Syst. 1695, 1–9 (2006).
Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Tonkin-Hill, G. et al. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biol. 21, 180 (2020).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 47, W256–W259 (2019).
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
Zhang, H. et al. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 46, W95–W101 (2018).
Noronha, A. et al. The Virtual Metabolic Human database: integrating human and gut microbiome metabolism with nutrition and disease. Nucleic Acids Res. https://doi.org/10.1093/nar/gky992 (2018).
Ebrahim, A., Lerman, J. A., Palsson, B. O. & Hyduke, D. R. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst. Biol. 7, 74 (2013).
Almeida, A. UHGG abundance read counts. figshare https://doi.org/10.6084/m9.figshare.27044341.v1 (2024).
Almeida, A. Antismash BGCs. figshare https://doi.org/10.6084/m9.figshare.27044335.v1 (2024).
Yin, Q. & Almeida, A. EnteroEco. GitHub https://github.com/microfundiv-lab/EnteroEco (2024).

Acknowledgements

We thank J. Parkhill and all members of the Microbiome Function and Diversity group for helpful feedback and suggestions; and all authors who collected and made the gut metagenomic datasets used in this study publicly available. Funding was provided by a Career Development Award from the Medical Research Council (MR/W016184/1) to A.A.; and 2021.02791.CEECIND (Fundação para a Ciência e Tecnologia) to A.S.A. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
Qi Yin, Ana C. da Silva & Alexandre Almeida
College of Public Health, Chongqing Medical University, Chongqing, China
Qi Yin
Medical Research Council Toxicology Unit, University of Cambridge, Cambridge, UK
Francisco Zorrilla & Kiran R. Patil
GIMM - Gulbenkian Institute for Molecular Medicine, Lisbon, Portugal
Ana S. Almeida
Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
Ana S. Almeida

Authors

Qi Yin
Ana C. da Silva
Francisco Zorrilla
Ana S. Almeida
Kiran R. Patil
Alexandre Almeida

Contributions

Q.Y. and A.A. performed the metagenomic analyses and wrote the paper. A.C.d.S. helped in the curation of the sample metadata. F.Z. and K.R.P. assisted with the genome-scale metabolic modelling. A.S.A. processed and provided the mock community samples. A.A. supervised the work and provided funding. All authors read, edited and approved the paper.

Corresponding author

Correspondence to Alexandre Almeida.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Microbiology thanks Sean Gibbons, Till Strowig and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Sample distribution and mapping quality control.

a, Distribution of age groups, health states and continents of the 12,238 gut metagenomic samples. b, Comparison of taxonomic profiles and abundances of three mock community samples in relation to their expected proportions, estimated using the read mapping filtering parameters used in this study. c, Detection limit of our metagenomic approach evaluated with 120 synthetic metagenomes consisting of the top 50 most prevalent gut species and one Enterobacteriaceae species at a defined abundance across three levels of sequencing depth. The horizontal dashed line represents the minimum relative abundance at which the five Enterobacteriaceae species tested were detected. Abundance values are log-scaled. d, Two-sided Pearson correlation between the number of samples with or without Enterobacteriaceae across the 65 studies. The error band represents the 95% confidence interval.

Extended Data Fig. 2 Strain diversity of Escherichia coli in the human gut microbiome among healthy adults.

a, Minimum spanning tree of the E. coli sequence types (STs) detected across 5,128 human gut metagenomes from healthy adults. The most prevalent STs are labelled next to their respective nodes (ST100024 and ST100083 represent unknown STs). b, Geographical distribution of samples containing known or unknown STs.

Extended Data Fig. 3 Machine learning models to classify Enterobacteriaceae colonization status.

a, Area Under the ROC Curve (AUROC) performance results of different machine learning methods, datasets and outcome variables (taxa) relating the gut microbiome composition with Enterobacteriaceae colonization status (n = 10 independent seeds per analysis). Box lengths represent the IQR of the data, the central line represents the median value, and the whiskers depict the lowest and highest values within 1.5 times the IQR of the first and third quartiles, respectively. b, ROC curve of the machine learning results linking the gut microbiome composition with Enterobacteriaceae status. AUROC values represent the median of gradient boosting models across 10 independent seeds, stratified by continent and only considering samples from healthy adults. c, All-against-all performance results comparing models trained and tested using microbiome samples across different continents. All models were generated with the gradient boosting algorithm using samples from healthy adults only to classify Enterobacteriaceae colonization status.
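The box-plot convention stated in this legend (and reused in Extended Data Figs. 8–10) can be illustrated with a minimal Python sketch. It is not the plotting code used in the study: the function name `box_stats`, the example AUROC values and the choice of the `"inclusive"` quartile method are assumptions made here for illustration only.

```python
import statistics

def box_stats(values):
    """Summarize values the way the legend describes a box plot.

    The box spans the first to third quartile (the IQR), the central line is
    the median, and each whisker ends at the most extreme observation lying
    within 1.5 x IQR of the nearest quartile. The quartile interpolation
    method ("inclusive") is an assumption of this sketch.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond the fences would be drawn individually, if at all
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical AUROC values for 10 seeds (not data from the study)
auroc_by_seed = [0.70, 0.72, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.80, 0.95]
summary = box_stats(auroc_by_seed)
print(summary["whisker_low"], summary["whisker_high"], summary["outliers"])
```

Under this convention the whiskers always end at observed data points, not at the fences themselves, so a single extreme seed does not stretch the whisker.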

Extended Data Fig. 4 Microbiome diversity metrics based on Enterobacteriaceae colonization status and abundance.

a, Distribution of pairwise beta diversity estimates (Aitchison distance) between samples with or without Enterobacteriaceae. b, Two-sided Pearson correlation between Enterobacteriaceae abundance (transformed to centred log-ratio) and gut microbiome alpha diversity (Shannon index). Sample depths were rarefied to 500,000 reads.
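For readers unfamiliar with the metrics named in this legend, the following Python sketch shows how the centred log-ratio (CLR) transform, the Aitchison distance and the Shannon index are commonly defined. It is an illustration under stated assumptions, not the study's pipeline: the function names, the pseudocount of 0.5 and the example counts are hypothetical, and rarefaction to a fixed depth is not shown.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's counts.

    A pseudocount is added because log(0) is undefined; the value 0.5 is an
    assumption of this sketch, not a parameter taken from the study.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(counts_a, counts_b, pseudocount=0.5):
    """Euclidean distance between the CLR-transformed profiles of two samples."""
    a = clr(counts_a, pseudocount)
    b = clr(counts_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical read counts per species for two samples (same species order)
sample_a = [120, 30, 0, 50]
sample_b = [10, 80, 40, 70]

print(aitchison_distance(sample_a, sample_b))
print(shannon_index(sample_a), shannon_index(sample_b))
```

The CLR values of a sample sum to zero by construction, which is why the transform is suited to compositional data such as relative abundances.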

Extended Data Fig. 5 Candidate gut microbiome species associated with Enterobacteriaceae colonization and abundance.

a, Heatmap depicting all statistically significant microbiome species linked to Enterobacteriaceae, E. coli or K. pneumoniae colonization and/or abundance across the entire dataset or strictly among healthy adults. b, Number of species among the 1000 most prevalent detected that were classified as co-excluders, co-colonizers or not significant according to their order affiliation. c, Proportion of candidate species per taxon classified according to whether they were consistently associated with different taxa and/or across different datasets. d, Phylogenetic tree of representative genomes from all Faecalibacterium species detected in this study and their estimated association with Enterobacteriaceae, E. coli or K. pneumoniae. Species without a labelled effect size were not associated with any of the Enterobacteriaceae species tested.

Extended Data Fig. 6 Co-excluders and co-colonizers of carbapenemase-producing Enterobacteriaceae.

a, Number of species differentially abundant between individuals colonized by carbapenemase-producing Enterobacteriaceae (CPE) compared to household negative controls (left) and compared to CPE-negative index subjects that were decolonized within the previous year (right). Species are coloured according to their result in the analysis of the whole Enterobacteriaceae family: significant in the same direction (green), missing (grey) or significant but in the opposite direction (red). b, Bar height represents the effect size derived from MaAsLin2 of species that were associated with both Enterobacteriaceae and CPE status using household controls (left) or using CPE-negative index subjects (right). Positive effect sizes denote co-colonizers, while negative effect sizes denote co-excluders. Error bars represent the standard error.

Extended Data Fig. 7 Accessory genes significantly linked to Enterobacteriaceae status.

a, Number and annotation of all accessory genes per species identified as significantly associated with Enterobacteriaceae colonization. Analysis was performed with 39 gut microbiome species that were identified as either co-colonizers or co-excluders of Enterobacteriaceae among healthy adults, but only 15 species contained significantly associated accessory genes. b, COG functional category significantly overrepresented (two-sided Fisher’s exact test, adjusted P = 7.93 × 10⁻⁵) among the accessory genes associated with Enterobacteriaceae colonization.
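As an illustration of the enrichment test named in panel b, the sketch below computes a two-sided Fisher's exact test for a 2 × 2 table using only the Python standard library. It is not the study's code: the function name and the example counts are hypothetical, the two-sided P value is defined here as the summed probability of all tables no more likely than the observed one, and the multiple-testing adjustment mentioned in the legend is not shown.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, genes inside or outside a COG category, and
    columns genes that are or are not associated with colonization status.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables as likely as, or less likely than,
    # the observed table; the small tolerance guards against rounding error.
    total = sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)

# Hypothetical counts (not data from the study):
# 12 associated and 30 non-associated genes in the category,
# 28 associated and 570 non-associated genes outside it.
p_value = fisher_exact_two_sided(12, 30, 28, 570)
print(p_value)
```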

Extended Data Fig. 8 Functional diversity and candidate orthologs among co-excluders and co-colonizers.

a, Distribution of the number of genes annotated with KEGG (left) and Shannon diversity estimates (right) among co-excluders (n = 122) and co-colonizers (n = 96). Only genomes with >90% completeness were included. Box lengths represent the IQR of the data, the central line represents the median value, and the whiskers depict the lowest and highest values within 1.5 times the IQR of the first and third quartiles, respectively. P values were derived from a two-sided Wilcoxon rank-sum test. b, Heatmap depicting the distribution of the top 20 KEGG Orthologs (KOs) associated with co-excluders or co-colonizers. Columns represent bacterial species coloured by their taxonomic affiliation, genome type and classification (co-colonizer or co-excluder). KOs are grouped using a complete linkage hierarchical clustering on the basis of their presence/absence patterns. c, COG functional categories significantly associated with co-colonizers (positive effect size) or co-excluders (negative effect size), only considering genomes belonging to the Bacillota phylum.

Extended Data Fig. 9 Metabolic indices estimated between gut microbiome species and Enterobacteriaceae.

a, Metabolic competition and complementarity indices estimated with PhyloMint between co-excluders or co-colonizers and all Enterobacteriaceae species detected at >1% prevalence. b, Distribution of metabolic distance scores between co-colonizers (n = 4292 comparisons) and co-excluders (n = 4773 comparisons) in relation to Enterobacteriaceae. c, Comparison of metabolic distances within and between co-excluders and co-colonizers. Co-excluders vs. co-excluders: n = 8256 comparisons; co-colonizers vs. co-colonizers: n = 6670 comparisons; co-colonizers vs. co-excluders: n = 14,964 comparisons. d, Reproducibility of metabolic distance scores of co-colonizers (n = 4292 comparisons) and co-excluders (n = 4773 comparisons) compared to Enterobacteriaceae after simulating models with defined gut media (M1) supplemented with diets from the Virtual Metabolic Human database, or with the M3 rich growth media. All comparisons were statistically significant (P < 0.0001). Box lengths represent the IQR of the data, the central line represents the median value, and the whiskers depict the lowest and highest values within 1.5 times the IQR of the first and third quartiles, respectively. P values were derived from a two-sided Wilcoxon rank-sum test.

Extended Data Fig. 10 Distribution of predicted metabolites among co-excluders and co-colonizers.

a, Distribution of the number of metabolites predicted from uptake (top) or secretion (bottom) fluxes among co-excluders (n = 129) and co-colonizers (n = 116). P values were derived from a two-sided Wilcoxon rank-sum test. Box lengths represent the IQR of the data, the central line represents the median value, and the whiskers depict the lowest and highest values within 1.5 times the IQR of the first and third quartiles, respectively. b, Metabolites significantly associated with either co-excluders or co-colonizers. Columns represent bacterial species coloured by their taxonomic affiliation, genome type and classification (co-colonizer or co-excluder). Metabolites are grouped using a complete linkage hierarchical clustering on the basis of their presence/absence patterns and coloured based on the type of metabolic flux (uptake or secretion).

Supplementary information

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 1–4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yin, Q., da Silva, A.C., Zorrilla, F. et al. Ecological dynamics of Enterobacteriaceae in the human gut microbiome across global populations. Nat Microbiol 10, 541–553 (2025). https://doi.org/10.1038/s41564-024-01912-6

Received: 30 July 2024
Accepted: 12 December 2024
Published: 10 January 2025
Issue date: February 2025
DOI: https://doi.org/10.1038/s41564-024-01912-6
