The influence of environment on bacterial co-abundance in the gut microbiomes of healthy human individuals

Boetto, Christophe; Romero, Violeta Basten; Henches, Léo; Frouin, Arthur; Auvergne, Antoine; Patin, Etienne; Bredon, Marius; Kennedy, Sean P.; Duffy, Darragh; Quintana-Murci, Lluis; Sokol, Harry; Aschard, Hugues

doi:10.1038/s42003-025-08895-y

Article
Open access
Published: 06 November 2025

Communications Biology volume 8, Article number: 1537 (2025)

Abstract

The gut microbiome is a complex ecosystem characterized not only by its marginal taxonomic composition but also by its emergent properties. Bacteria develop local interactions to form coherent functional communities, whose effects on health and diseases cannot be predicted from the behavior of individual members. Understanding the factors underlying variability in these communities may therefore provide critical insights on the biological links between the gut microbiome and human phenotypes. Here, we examined the effect of a range of host factors, including demographics, medical history, and dietary habits, on these communities in 938 healthy individuals using MANOCCA, a covariance-based approach developed to address existing limitations. Increased age and smoking were associated with a significant overall decrease in co-abundance, and conversely a higher body mass index was associated with increased co-abundance. At the taxon level, a core of 200 genera were systematically impacted in their co-abundance with other taxa, suggesting a central role in structuring the network. Finally, we demonstrate that our approach offers a powerful framework for prediction purposes, with taxa co-abundance being able to predict the age of participants with an accuracy three-fold higher than a model based on abundance only.

Introduction

There is an increasing corpus of research describing features associated with the gut microbiome composition in both healthy individuals and disease cases^1,2,3,4. Prominent features include age^5,6, sex^7,8,9, long-term diet^10,11,12, and host genetics^13,14,15,16, but many other factors have also been found to be associated^17,18. Yet, some aspects of the host-microbiome relationship remain difficult to characterize^19,20. In particular, handling the high-dimensional nature of microbiome data presents methodological challenges. As a result, most existing studies have focused on univariate approaches, evaluating the relationship between single taxa and host features one by one. Moving toward more comprehensive approaches that can capture the dynamics, connectivity, and high-order characteristics of the microbiome is an active area of research. Beyond established descriptive multivariate metrics (such as alpha and beta diversity indexes, which measure intra- and inter-sample distances between microbiome samples^13,21), various multivariate methods have been proposed to jointly test associations between host factors and the abundance of multiple taxa^22,23,24. One emerging research topic is the study of the co-abundance of taxa in metagenomes^25,26,27, and in the gut microbiome in particular^28,29,30,31,32.

The gut microbiome is a complex ecosystem whose constituents form sub-communities through interactions between individual taxa. Those sub-communities, sometimes referred to as guilds²⁹ or cliques, display co-abundances because they exploit the same class of resources or work together as a coherent functional group^25,30. Studying taxa co-abundance, and more generally the global connectivity within the gut microbiome, can help identify emergent properties that would be missed by univariate methods. Previous studies have already illustrated differences in co-abundance networks across inflammatory bowel disease status and body mass index²⁸, and across geographically diverse populations³¹. However, there is currently no gold standard method to screen for factors associated with changes in the co-abundance of bacteria across individuals in a population. Existing approaches are typically based on the inference of taxa networks across conditions, using a threshold-based design to define significant co-abundances even though the phenomenon is likely continuous. Furthermore, previous work showed that the inferred networks can vary substantially across approaches³³. More problematically, those methods are by construction restricted to categorical predictors, do not allow for covariate adjustment, and typically derive the significance of the observed association through permutations.

The recently developed MANOCCA³⁴ test offers a formal statistical framework to test the effect of both categorical and continuous predictors on the covariance matrix of a multivariate outcome (a metric directly proportional to the co-abundance), thus addressing all these methodological limitations. Briefly, MANOCCA models pairwise taxa covariances at the individual level (i.e. each individual has its own taxa covariance matrix) and uses them as dependent variables in a standard multivariate regression to evaluate their association with predictors of interest. Existing tools³⁵ assume the observed covariances for a given discrete condition (e.g. male or female) are drawn from a single underlying distribution and aim to infer that distribution. In comparison, MANOCCA treats the covariances as random variables that vary across environmental factors, and aims at estimating the effect of those factors. Note that by construction, the test is independent of the baseline covariance, and does not aim at identifying strongly co-abundant taxa whose interactions are constant across environmental conditions. Here, we applied MANOCCA to characterize the host factors associated with variability in the gut microbiome co-abundance network of healthy participants. We further performed an in-depth examination of the effect of each associated factor on the taxa interaction network, highlighting the key taxa impacted and how these factors shape the microbiome co-abundance. We conclude by illustrating how the proposed framework can be used to develop predictive models of host factors based on taxa co-abundances.

Results

Identifying host factors associated with changes in the taxa co-abundance structure

We evaluated variability in the gut microbiome co-abundance structure of healthy individuals conditional on host factors using samples from the Milieu Intérieur (MI) consortium³⁶. MI is a population-based cohort including 1,000 healthy participants recruited in France, split equally by sex and stratified across five decades of life (from 20 to 69 years old). Gut microbiota composition was obtained from shotgun metagenomics sequencing, and taxonomic levels were derived by summing the normalized abundances (Fig. S1). We used MANOCCA to screen for association between 80 host features collected at baseline, including demographics, dietary habits, medical history and biomarker measurements (Supplementary Data 1), and taxa co-abundance at the species, genus, and family levels using 938 MI participants with complete data (Fig. 1a-b). We focused on the most common taxa³⁷, for which MANOCCA maintained calibration despite strong data sparsity (Fig. S1c-d). After quality control filtering, this left a total of 675 species (present in at least 40% of samples), 718 genera (at least 5% of samples), and 151 families (at least 5% of samples) (Supplementary Data 2). For each feature and each taxonomic level, a single MANOCCA test was performed, evaluating whether the overall co-abundance (for example the 675 × 675 covariance matrix at the species level) varies with that feature. Except when used as predictors, age, sex, and body mass index (BMI) were included as adjustment covariates in all analyses.

**Fig. 1: Screening for environmental and clinical factors.**

MANOCCA identified associations at a stringent Bonferroni correction threshold (P < 6.25 × 10⁻⁶) at all three taxonomic levels with age (P_species = 2.0 × 10⁻⁵⁵, P_genus = 3.5 × 10⁻⁵⁶, P_family = 9.2 × 10⁻³⁷), sex (P_species = 2.2 × 10⁻¹⁷, P_genus = 6.3 × 10⁻²², P_family = 3.1 × 10⁻¹⁸), smoking (P_species = 2.8 × 10⁻¹⁴, P_genus = 1.6 × 10⁻²⁰, P_family = 5.6 × 10⁻¹³), and at the genus level for BMI (P_species = 1.6 × 10⁻⁵, P_genus = 5.9 × 10⁻⁶, P_family = 5.3 × 10⁻⁵) (Fig. 1c-d and Fig. S2). Several features displayed suggestive associations (P < 6.25 × 10⁻⁴), including appendicectomy (P_genus = 6.3 × 10⁻⁶, P_family = 4.2 × 10⁻⁵), cholesterol (P_species = 8.6 × 10⁻⁶) and dairy products (P_genus = 8.1 × 10⁻⁶), and at a slightly lower level, raw fruits, Nutrinet factor 1, income, dried pulses, cured cooked meats, bread, steady job, traditional nutrient profile, typhoid vaccination and level of urea. Besides significant and suggestive signals, we observed an enrichment for association with taxa co-abundance among nutrition-related features, suggesting a modest but systematic impact of diet on the taxa interaction network. For example, 23 out of the 30 variables of the diet panel and 5 of the 7 factors of the Nutrinet panel displayed association below the nominal significance level of 0.05 at the genus level (Supplementary Data 3). We also conducted sensitivity analyses, assessing the variability in the results when applying MANOCCA to random subsets of available taxa. Overall, the larger the set of taxa included, the stronger the association signal (Fig. S2e-h), highlighting a global effect of the four factors on the microbiome co-abundance network.

We applied two alternative multivariate approaches for comparison purposes: a standard MANOVA, testing for association between each host factor and the joint abundance of taxa, and alpha diversity using the Shannon and Simpson indices (Fig. S2i-k, Supplementary Data 3). Some factors were significant after correction for multiple testing, but at a much lower significance level. The MANOVA identified an association with age at the species and family levels (P_species = 5.2 × 10⁻⁶, P_family = 1.0 × 10⁻²⁹) and with sex at the family level (P = 5.2 × 10⁻¹⁰). Both the Simpson and Shannon indices identified a signal with age at the family level (P_Simpson = 3.4 × 10⁻⁴, P_Shannon = 3.9 × 10⁻⁵), and the Shannon index also identified age at the genus level (P_Shannon = 1.6 × 10⁻⁵). Altogether, the signals observed exclusively with MANOCCA for smoking and BMI, and the stronger association of age and sex as compared to mean-based and diversity index approaches, point towards a substantially larger information content of the co-abundances of taxa over these existing metrics to describe the relationship between the gut microbiome of healthy individuals and these host variables.
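For readers less familiar with the two diversity indices used in this comparison, the following is a minimal sketch of how they are commonly computed from the abundances of a single sample. It assumes the natural-log Shannon index and the Gini-Simpson form (one minus the sum of squared proportions); the text does not state which variants were used, and the function names are illustrative only.

```python
import math

def shannon_index(abundances):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)

def simpson_index(abundances):
    """Gini-Simpson index 1 - sum(p^2); an assumed variant, others exist."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)

# Toy sample with four equally abundant taxa (hypothetical values)
even_sample = [25, 25, 25, 25]
h_even = shannon_index(even_sample)
d_even = simpson_index(even_sample)
```

Both indices summarize a sample with a single number, which is one reason they may carry less information than the full pairwise covariance structure tested by MANOCCA.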

Contribution of taxa to the co-abundance association signal

By construction, the MANOCCA association statistic is a weighted sum of the contributions from each pair of taxa considered (see Methods). We derived those contribution weights for the age, sex, smoking and BMI signals, focusing on the genus level, which displayed the strongest association. As shown in Fig. S3a, most taxa display a non-zero contribution to the association, highlighting again the global effect of these factors on the microbiome composition. Nevertheless, these contributions show substantial heterogeneity, with a limited number of taxa pairs displaying outstanding weights as compared to those expected under the null (Fig. S3b). To assess potential links between effects on co-abundance and effects on relative abundance, we compared the MANOCCA weights against the univariate mean effect P-values derived using a standard univariate linear regression (see Methods and Supplementary Data 4). As shown in Fig. S3c, we observed a positive and significant correlation between the two results for all four variables (age, P = 1.6 × 10⁻¹²; sex, P = 1.3 × 10⁻⁴⁰; smoking, P = 2.1 × 10⁻⁵; BMI, P = 4.6 × 10⁻¹⁵), suggesting a dual impact of these factors on the abundance and the co-abundance of many of these genera, in agreement with the existing literature^5,6,7,8,9,17,38,39,40,41. Several top genera contributing to the co-abundance also passed a stringent Bonferroni correction threshold (P < 8.7 × 10⁻⁷) for univariate association. These include Bacteroides (P = 1.9 × 10⁻⁷), Coprococcus B (P = 7.1 × 10⁻⁹), Anaerotruncus (P = 8.5 × 10⁻⁷), Agathobacter (P = 3.0 × 10⁻⁸), Alistipes (P = 2.2 × 10⁻⁸), and Intestinimonas (P = 1.9 × 10⁻⁹).

We next examined, at the family level, the top 5% of genus pairs contributing most to the variability in co-abundance. Out of 151 families, overlapping subsets of 10, 8, 11 and 7 families covered 50% or more of those top contributing genera for age, sex, smoking and BMI, respectively. Those key families include the ones with the highest relative abundance, Lachnospiraceae (23.6%), Bacteroidaceae (22.4%), Ruminococcaceae (7.8%), Acutalibacteraceae (6.4%), Oscillospiraceae (5.6%), but also some rare ones: Eggerthellaceae (0.5%), Peptostreptococcaceae (0.6%), Muribaculaceae (0.5%), and four unspecified Co-Abundance Groups (CAG-74, CAG-508, CAG-272, CAG-138) (Fig. 2a). While the representation of families involved in co-abundance variability was somewhat proportional to their relative abundance, we noted several major differences. Some families, such as Bacteroidaceae, are largely underrepresented in the co-abundance signal. Conversely, co-abundances involving the Oscillospiraceae family are strongly impacted by all four factors, and by BMI in particular. Other families also display factor-specific enrichment, including Peptostreptococcaceae and Muribaculaceae with smoking, two families already reported to be associated with smoking status^42,43.

**Fig. 2: Cross- and intra-family interactions for age, sex, smoking and BMI.**

We examined the composition of the top contributing pairs of genera to determine whether they were involved in changes in interaction within the same family (intra-family co-abundance) or interaction between genera from different families (cross-family co-abundance) (Fig. 2b). Within-family co-abundance represented a small fraction of all interactions, with the vast majority of interactions taking place between genera of different families (Fig. 2c-f). Besides a few exceptions (e.g. variability in the co-abundance between taxa from Oscillospiraceae and Acutalibacteraceae families for BMI), we did not observe any marked pattern.

Network of impacted taxa

We formed a network of co-abundance variation from the top 1000 pairs of genera contributing to the MANOCCA association signal with each of age, sex, smoking, and BMI (Fig. 3a, and Supplementary Material). Altogether, these 4000 pairs involved a total of 476 unique genera. As shown in Fig. 3b, there was a substantial overlap in pairs of co-abundant taxa impacted by sex and BMI (N = 658 pairs, approximately 66% of sex- and BMI-associated pairs), and by age and smoking habits (N = 306 pairs, approximately 31% of age- and smoking-associated pairs). Conversely, the overlap across the nine other combinations of factors was null or negligible. At the taxa level, a core of 200 genera was shared across all predictors (Fig. 3c). Other genera were evenly spread across factors, except for age and smoking, which involved 49 (13%) and 54 (15%) genera specific to those two factors, respectively. Together, this suggests that the four factors partly control the interacting partners of these core genera. As shown in Fig. 3d, increased age and smoking are mostly associated with a decrease in co-abundances, with 86% and 75% of top pairs displaying negative associations with these two factors, respectively. BMI exhibited an opposing trend, with 72% of top pairs showing an increase in co-abundance with increasing BMI. The sex predictor displayed a more balanced distribution, with 60% of top pairs showing a decrease and 40% an increase of co-abundance in males as compared to females.

**Fig. 3: Co-abundance network influenced by age, sex, smoking and BMI.**

Multiple patterns emerged when exploring the contributing genera. Those shared between smoking and age are especially enriched in the Oscillospirales order (e.g. Massillioclostridium, CAG-180, CAG-1427, Marseille-P4683, and MGYG-HGUT-03297), and consistently exhibited reduced co-abundances with the core taxa. Among genera unique to smoking, the Bacteroides A genus was by far the most impacted, showing a reduction of co-abundances with many of the core taxa. Interestingly, the relative abundance of this common genus (detected in 99% of participants, Supplementary Data 2) was not associated with smoking status in our data (P = 0.26 from a linear regression, Supplementary Data 4). This suggests that smoking might only break some of its interactions with other genera without impacting the presence of this genus itself, highlighting the ability of our approach to detect taxa missed by standard abundance-based approaches. A subset of genera contributing to association with BMI, sex and smoking was enriched in the Lachnospiraceae family (Ruminococcus A, Dorea, Coprococcus B, GCA-900066135, Agathobacter), displaying both increased and decreased co-abundances across predictors. Both Lachnospiraceae relative abundance and co-abundance with other taxa have already been found to be associated with human diseases and obesity in particular^28,44.

To assess the relevance of the covariance-based co-abundance network, we considered two alternative approaches: a naïve permutation-based approach, inspired by existing work³³, that produces an empirical comparison of pairwise covariance between all taxa (see Methods), and the commonly used SparCC⁴⁵ approach (see Supplementary Note 4 for a detailed description of the two approaches). Note that both the SparCC and the permutation-based approaches, like all existing methods, are limited to binary predictors and use a threshold to define co-abundance in each group studied. Both methods are meant to detect significant differences in pairwise taxa correlations across values of a categorical predictor, and should in theory detect effects on co-abundance variability similar to those detected by MANOCCA. However, as shown in the simulation from Fig. S4a-d, the permutation-based approach shows poor specificity as compared to MANOCCA, and SparCC shows the poorest performance in this simulation, with almost no power. We applied both methods to the two binary predictors, sex and smoking, at the genus level and crossed the results with MANOCCA’s top contributing products. The overlap between MANOCCA and the two alternative methods was very modest but highly significant, with minimum P-values of 1 × 10⁻¹⁴⁵ and 2 × 10⁻¹⁵³ for sex and smoking respectively for the permutation approach (Fig. S4e and g), and 1 × 10⁻⁷⁰ and 1 × 10⁻⁸² for sex and smoking respectively for the SparCC approach (Fig. S4f, h), thus confirming that the three methods do detect some similar network components.

Prediction of individual features based on taxa correlation

Our framework is built on a linear model where the covariance is defined at the individual level. This is a major advantage over existing correlation approaches³³, as it allows for a range of complementary analyses. One particularly important extension is the possibility of training a predictive model of an outcome based on taxa covariance, so that the outcome in question can be predicted for any new individual based on their microbiome (and conversely). Here, we assessed the accuracy of MANOCCA to predict the four most associated features (age, sex, smoking and BMI), using taxa from the species, genus and family levels and a 30-fold cross-validation. Accuracy was derived using the squared correlation (\({r}^{2}\)) for continuous outcomes (age and BMI), and using the area under the receiver operating characteristic curve (AUC) for binary outcomes (smoking and sex). We compared the covariance-based prediction model against a standard linear model based on the relative abundance of each single taxon.
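The two accuracy metrics named above can be sketched as follows. This is an illustrative standard-library implementation, not the authors' code: the function names are hypothetical, the squared correlation is taken as the squared Pearson correlation between observed and predicted values, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as one half).

```python
from statistics import fmean

def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = fmean(observed), fmean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return (sxy * sxy) / (sxx * syy)

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical held-out predictions for one cross-validation fold
r2_example = squared_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
auc_example = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a cross-validation setting, these metrics would be computed on the held-out fold only, and then summarized across folds (for example by their median).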

As shown in Fig. 4, MANOCCA strongly outperforms the standard mean-based prediction model, being significantly more accurate in all scenarios we considered. The gain in prediction was especially large for age, with up to a three-fold increase in accuracy. The median \({r}_{{age}}^{2}\) from MANOCCA equals 0.27, 0.25 and 0.18 for models based on species, genus and family, respectively. In comparison, the mean-based model \({r}_{{age}}^{2}\) equals 0.10, 0.07 and 0.05, respectively. Prediction accuracy was also significantly higher for sex, with AUCs of 0.66, 0.64, and 0.64 for the mean-based model at the species, genus and family levels, respectively, and AUCs of 0.67, 0.69 and 0.70 for the covariance-based model. This confirms the higher information content of co-abundance as compared to abundance, and demonstrates the validity of using covariance-based co-abundances for prediction purposes.

**Fig. 4: Predictive power of covariance-based models.**

Discussion

There is a strong rationale for studying changes in microbiome co-abundances. There is now increasing evidence that species form functionally coherent groups that work together to exploit the same resources from the local environment³⁰. Studying those groups, rather than each single taxon, might help better understand the role of the microbiome in human health outcomes. Following the same argument, it has already been proposed to study those groups through variability in the network of co-abundances^28,31. Although simple in principle, the implementation of this objective can be challenging in practice. Here, we applied MANOCCA³⁴, a recently developed method that enables formal statistical testing of associations between taxa covariance and any predictor, whether continuous, categorical or binary. We used it to investigate host features associated with gut microbiome co-abundance in 938 healthy individuals from the Milieu Intérieur cohort.

We identified highly significant associations between taxa co-abundance variability and age, sex, smoking status and body mass index (BMI). Except for BMI, these associations were detected at all three taxonomic levels studied: species, genus, and family. In comparison, mean-based multivariate and diversity-based analyses identified associations only with age, and one with sex at the family level, but at a much lower significance. For the four associated features, there was a significant correlation between contribution to co-abundance and univariate effect on relative abundance, suggesting that those features impact both the abundance and co-abundance of taxa. The network of top contributing genera shows that interaction variability was concentrated in a limited number of families. These interactions primarily occurred between genera of different families, rather than within the same family. The overlap of top contributing taxa over the four features was substantial, especially between age and smoking, and between sex and BMI, suggesting potentially shared mechanisms. Finally, we demonstrate that the MANOCCA framework can be used to build predictive models. In this study, we applied it to age, sex, smoking and BMI. For all features, the predictive power based on co-abundances was significantly and systematically higher than for a standard mean-based multivariate model, with up to a three-fold increase in r-squared for age.

Our study also has limitations. First, the approach is not applicable to microbiome data of small sample sizes. Despite the data reduction steps through principal component analysis, the number of PCs analysed should remain substantially smaller than the sample size, thus limiting the application to datasets of 100 participants or more. Hopefully, this will become less of an issue thanks to the increasingly large cohorts available. Second, the proposed approach does not model the compositional aspect of the data per se⁴⁶. However, under reasonable assumptions, the change in covariance assessed by MANOCCA is independent of the marginal covariance. As a result, any potential bias due to compositional effects acts as an offset, without impacting our test. Additionally, when the dimension of the data is large enough, as for the analysis of species, genus or family conducted in this study, the compositional effect on correlation becomes negligible (Supplementary Note 3 and Fig. S4i-n). Third, new bacterial species databases have been published since our analysis, potentially offering novel insights. Fourth, we demonstrated that covariance can be used for prediction purposes. However, implementing such predictive models will require further exploration. As with prediction models based on relative abundance, some species might not be quantified in the samples targeted for prediction. This issue will likely be exacerbated when working with thousands of covariance terms. One possible solution is to develop sparse predictive models focusing on pairs of taxa that are fairly common, instead of using the entire covariance matrix. Furthermore, we used simple linear predictive models for both abundance and co-abundance. Future work might investigate the use of more complex methods⁴⁷ to incorporate the proposed covariance terms into prediction models.

Through the characterization of the links between variability in gut microbiome taxa co-abundance and host factors in healthy individuals, this study addresses three major limitations of existing approaches. First, the proposed approach allows for a formal statistical test of association between taxa co-abundance and both binary and continuous host features. In contrast, existing methods are restricted to ad hoc comparisons of inferred networks across a limited number of conditions. Second, the framework allows for covariate adjustment, so that the respective effects of correlated factors can be disentangled from one another. Third, the covariance-based approach provides a means to derive a co-abundance metric at the individual level, allowing for a range of secondary analyses, including the development of co-abundance-based predictive models. Altogether, the proposed approach opens paths for various co-abundance analyses. It is highly complementary to recent efforts to develop experimental designs to study co-abundance (e.g.⁴⁸).

Overall, because MANOCCA is built on a linear least-squares framework, it inherits its assumptions and robustness, and can be extended to more complex scenarios using established solutions. Future work may examine the analysis of longitudinal data by modelling data structure through linear mixed models, or the use of regularization to account for data-specific issues. Conceptually, nothing prevents applying MANOCCA to other scenarios, such as comparing taxa co-abundance between inflammatory bowel disease cases and controls. However, we highly recommend validating any new use case through calibration plots. Altogether, this approach can be used to produce new working hypotheses, and to assess statistical evidence for effects on co-abundance from both observational and experimental data.

Methods

Milieu Intérieur gut microbiome data

The Milieu Intérieur Consortium is a population-based cohort initiated in September 2012³⁶. It comprises 1,000 healthy volunteers, all recruited in the suburban Rennes area (Ille-et-Vilaine, Bretagne, France), with a 1:1 sex ratio (500 males, 500 females) and an equal distribution across 5 decades of life (20 to <30 y, 30 to <40 y, 40 to <50 y, 50 to <60 y, 60 to <70 y). The primary objectives of the MI Consortium are to define the naturally occurring variability in a healthy population’s immune phenotypes and to characterize genetic, environmental, and clinical factors driving this variability. The cohort collected a broad range of variables, including genetic, genomic, and environmental data, on most participants. On their first visit, the volunteers were also asked to fill in an extended form about socio-demographic, lifestyle and family health history, all recorded in an electronic case report form (eCRF). Gut microbiota composition was obtained from shotgun metagenomics sequencing, and taxonomic levels were reconstructed by summing the normalized abundances within a branch at a given level (Fig. S1), resulting in a total of 13,446 unique bacterial species. Further description of the data generation is provided in Supplementary Note 1.

Covariance method

To test for changes in co-abundance we used the Multivariate Analysis of Conditional Covariance (MANOCCA) test, as previously published in Boetto et al.^34,49. Briefly, the variability in the covariance between two standardized outcomes \({Y}_{1}\) and \({Y}_{2}\) can be investigated through the element-wise product of those outcomes. For two outcomes \({Y}_{1}\) and \({Y}_{2}\), the covariance, or co-abundance, can be expressed as \(\mathrm{cov}\left({Y}_{1},{Y}_{2}\right)=\mathbb{E}\left[{Y}_{1}{Y}_{2}\right]-\mathbb{E}\left[{Y}_{1}\right]\mathbb{E}\left[{Y}_{2}\right]\), and for standardized outcomes and a sample size \(N\), it can be re-expressed as the average of the element-wise product across individuals: \(\mathrm{cov}\left({Y}_{1},{Y}_{2}\right)=\frac{1}{N}{\sum }_{i=1}^{N}{Y}_{1i}{Y}_{2i}\). It follows that the effect of a predictor \(X\) on \(\mathrm{cov}\left({Y}_{1},{Y}_{2}\right)\) can be tested using a standard least-squares regression framework where \(X\) is treated as a predictor and the product \({Y}_{1}{Y}_{2}\) as the outcome. One can demonstrate that, under reasonable assumptions, this test is independent of mean and variance effects³⁴.
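
The two-outcome case can be illustrated with a short simulation. The sketch below is not the authors' implementation: it uses only the Python standard library, simulated data in which the coupling between two outcomes grows with a predictor, and hypothetical function names. It standardizes both outcomes, forms their element-wise product, and estimates the least-squares slope of that product on the predictor.

```python
import random
from statistics import fmean, pstdev

def standardize(values):
    """Center to mean 0 and scale to unit (population) variance."""
    mean, sd = fmean(values), pstdev(values)
    return [(v - mean) / sd for v in values]

def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def covariance_effect(x, y1, y2):
    """Slope of the product of standardized y1 and y2 regressed on x."""
    z1, z2 = standardize(y1), standardize(y2)
    products = [a * b for a, b in zip(z1, z2)]
    return ols_slope(x, products), products

# Simulated toy data: the covariance between y1 and y2 increases with x
rng = random.Random(0)
n = 2000
x = [rng.uniform(0.0, 1.0) for _ in range(n)]
y1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
y2 = [xi * a + rng.gauss(0.0, 1.0) for xi, a in zip(x, y1)]

slope, products = covariance_effect(x, y1, y2)
overall_covariance = fmean(products)  # equals the correlation of y1 and y2
```

A positive slope here indicates that the covariance between the two outcomes increases with the predictor; a formal test would additionally require a standard error for the slope.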
Extending the method to more than two outcomes can be done through the following four steps: i) starting with \(K\) centered outcomes \({Y}_{1},\ldots ,{Y}_{K}\), all the pairwise products are computed: \({P}_{ij}={Y}_{i}{Y}_{j}\) for \(i,j\in [\![1,K]\!]\) and \(i<j\); ii) the \({P}_{ij}\) products are then mapped to the quantiles of a normal distribution using an inverse-rank normal transformation; iii) to reduce the dimension of the product matrix, \(\mathbf{P}\) is projected onto a reduced latent space of dimension \(p\ll \frac{K\left(K-1\right)}{2}\) using a Principal Component Analysis transformation: \(P{C}_{r}={Q}_{r}={\sum }_{i=1}^{K}{\sum }_{j>i}^{K}{\lambda }_{ij}^{(r)}{Y}_{i}{Y}_{j}\); the resulting principal components (PCs) are then mapped to the quantiles of a normal distribution using an inverse-rank transformation, and scaled. This gives, for \(N\) individuals, a matrix \(\mathbf{Q}\) of dimension \(N\times p\) that can be used for the test. A detailed explanation of each step is available in Boetto et al.³⁴. Finally, iv) given a scaled predictor \(X\) and a scaled covariate matrix \(\mathbf{C}\), which can be continuous, categorical or binary, the test for association between the predictor and the covariance matrix is conducted using a standard multivariate model: \(\mathbf{Q}\sim X+\mathbf{C}\).
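
Steps i and ii above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the published implementation: the function names are hypothetical, ties are broken by order of appearance, and the rank offset `(rank - 0.5) / n` is one common choice for the inverse-rank normal transformation (the source does not specify which offset is used). The PCA projection and the multivariate test of steps iii and iv are deliberately left out.

```python
from itertools import combinations
from statistics import NormalDist, fmean

def center(column):
    """Subtract the column mean from every value."""
    mean = fmean(column)
    return [v - mean for v in column]

def pairwise_products(outcomes):
    """Step i: element-wise products Y_i * Y_j for all pairs with i < j.

    outcomes is a list of K columns, each holding one value per individual.
    """
    centered = [center(col) for col in outcomes]
    return {
        (i, j): [a * b for a, b in zip(centered[i], centered[j])]
        for i, j in combinations(range(len(centered)), 2)
    }

def inverse_rank_normal(values):
    """Step ii: map values to standard normal quantiles according to rank."""
    n = len(values)
    order = sorted(range(n), key=lambda idx: values[idx])
    transformed = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        transformed[idx] = NormalDist().inv_cdf((rank - 0.5) / n)
    return transformed

# Toy data: K = 3 taxa measured in N = 5 individuals (hypothetical values)
outcomes = [
    [1.0, 2.0, 4.0, 3.0, 5.0],
    [2.0, 1.0, 0.5, 3.0, 4.0],
    [0.1, 0.4, 0.2, 0.9, 0.3],
]
products = pairwise_products(outcomes)
transformed = {pair: inverse_rank_normal(p) for pair, p in products.items()}
```

With \(K\) taxa this produces \(K(K-1)/2\) product columns, which is why a dimension reduction step is needed before testing when \(K\) is in the hundreds.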

In this application, we varied the number of principal components used in MANOCCA from two to one hundred, but limited the number of PCs analysed for each predictor based on the guidelines provided in ref. ³⁴, and used a stringent multiple testing significance threshold to account for the varying numbers of PCs considered. Further description of the impact of dimension reduction on the MANOCCA test is provided in Supplementary Note 2.

Contribution of taxa to covariance association signal

All the steps in the derivation of the statistical test are linear operations, which means that the contributions of individual features to the MANOCCA association signal can be summed. Two types of contributions can be derived: the covariance contribution from each pair of taxa, and the sum of the covariance contributions assigned to each single taxon. The contribution of a given pair of taxa \(i\) and \(j\) to the covariance signal, \(\phi(P_{ij})\), is defined as the sum, across principal components, of the squared PCA loading multiplied by the squared univariate association coefficient \(\hat{\beta}_r^2\) of the corresponding principal component with the considered predictor: \(\phi(P_{ij})=\sum_{r=1}^{p}\hat{\beta}_r^2\left(\lambda_{ij}^{(r)}\right)^2\), where \(\lambda_{ij}^{(r)}\) is the loading of the \(ij\) pair of taxa for PC \(r\), and \(p\) is the total number of PCs included in the analysis. The single-taxon contribution, \(\psi(Y_i)\), can be derived by summing its contributions across all pairs: \(\psi(Y_i)=\sum_{j=1,\,j\ne i}^{K}\phi(P_{ij})\), where \(K\) is the total number of taxa.
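
As a small worked example of these two quantities, the sketch below computes the pair and single-taxon contributions from a loading matrix and a vector of per-PC coefficients. It assumes NumPy; the function names are hypothetical, and the pairs are assumed to be ordered as the upper triangle of the taxa-by-taxa matrix, i.e. (0,1), (0,2), (1,2), and so on.

```python
import numpy as np

def pair_contributions(loadings, beta):
    """phi for each pair: sum over PCs of beta_r^2 * loading_r^2.

    loadings: (n_pairs, p) PCA loadings; beta: (p,) per-PC coefficients.
    """
    return (loadings ** 2) @ (beta ** 2)

def taxon_contributions(phi, n_taxa):
    """psi for each taxon: sum of phi over all pairs that include it."""
    i_idx, j_idx = np.triu_indices(n_taxa, k=1)   # assumed pair ordering
    psi = np.zeros(n_taxa)
    np.add.at(psi, i_idx, phi)
    np.add.at(psi, j_idx, phi)
    return psi

# Toy example: 3 taxa, hence 3 pairs ordered (0,1), (0,2), (1,2), and 2 PCs.
loadings = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
beta = np.array([2.0, 3.0])
phi = pair_contributions(loadings, beta)
psi = taxon_contributions(phi, n_taxa=3)
print(phi)   # [ 4.  9. 13.]
print(psi)   # [13. 17. 22.]
```

Each pair is counted once for each of its two taxa, so the single-taxon contributions sum to twice the total of the pair contributions.
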

Environmental association screening using MANOCCA

We applied MANOCCA to identify environmental factors associated with a change in covariances between taxa at the family, genus and species level. Milieu Interieur volunteers were asked to fill in a 44-page questionnaire covering multiple panels such as demographics, lifestyle, and vaccination history. We selected the most relevant panels for the study, leading to the selection of 102 environmental factors. These included diet information collected as part of the Nutrinet study⁵⁰: the top three factors from the Nutrinet factor analysis, and the Nutrinet profiles, which were binarized to yes/no. We filtered out variables with missing values for more than half of the sample, and binary predictors with frequencies smaller than 5%. For categorical predictors displaying highly skewed distributions, outliers, defined as values more than three standard deviations away from the mean, were merged with a lower-occurring category. A total of 80 environmental factors remained for analysis. After filtering, we ended up with a cohort of 938 individuals with complete shotgun sequencing, age, sex and body mass index (BMI) data. For the genus and family levels, we kept taxa abundant in at least 5% of the cohort, leading to a drop from 1192 genera to 718 genera, and from 216 families to 151 families. At the species level, to avoid having too many species with regard to the sample size, we set the threshold to 40% of the cohort, leading to a drop from 3885 species to 675 species.
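
The taxa filtering step can be sketched as a prevalence filter. This is a minimal illustration assuming NumPy and assuming that "abundant" means a non-zero abundance in an individual, which is our interpretation; the function name `filter_by_prevalence` and the toy matrix are hypothetical.

```python
import numpy as np

def filter_by_prevalence(abundance, min_fraction):
    """Keep taxa (columns) detected in at least `min_fraction` of individuals (rows)."""
    prevalence = (abundance > 0).mean(axis=0)   # fraction of individuals with the taxon
    keep = prevalence >= min_fraction
    return abundance[:, keep], keep

# Toy abundance table: 4 individuals x 3 taxa.
abundance = np.array([[5, 0, 0],
                      [3, 0, 0],
                      [8, 2, 0],
                      [1, 0, 0]])
filtered, keep = filter_by_prevalence(abundance, min_fraction=0.5)
print(keep)            # [ True False False]
print(filtered.shape)  # (4, 1)
```

Under this reading, the thresholds described above would correspond to `min_fraction=0.05` at the genus and family levels and `min_fraction=0.40` at the species level.
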

Significance thresholds and sensitivity analyses

We used two significance levels to account for multiple testing: a very stringent Bonferroni-corrected threshold, \(B_{stringent}=0.05/(80\times 100)=6.25\times 10^{-6}\), which accounts for the total number of tests (80 host features and 100 principal component (PC) models) without accounting for the strong correlation between the PC models, and a suggestive threshold, \(B_{suggestive}=0.05/80=6.25\times 10^{-4}\), which assumes strong correlation across the PC models.
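
The arithmetic behind the two thresholds can be checked with a few lines of Python. The constant names and the `classify` helper are hypothetical and only serve to illustrate how a P-value would be compared with each threshold.

```python
ALPHA = 0.05
N_FACTORS = 80        # host features tested
N_PC_MODELS = 100     # principal component models per feature

stringent = ALPHA / (N_FACTORS * N_PC_MODELS)   # 6.25e-06
suggestive = ALPHA / N_FACTORS                  # 6.25e-04

def classify(p_value):
    """Label a P-value against the two Bonferroni-type thresholds."""
    if p_value < stringent:
        return "stringent"
    if p_value < suggestive:
        return "suggestive"
    return "not significant"

print(f"{stringent:.2e}, {suggestive:.2e}")
print(classify(1e-7), classify(1e-4), classify(0.01))
```
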

Comparison with MANOVA and alpha diversity

For comparison purposes, we considered three alternative multivariate methods: a standard MANOVA, and the alpha diversity measured using the Shannon and Simpson indexes. The screening methodology was the same as the one used for MANOCCA, though some pre-processing adjustments were made to match the expected assumptions of each method. The MANOVA was applied to the taxa relative abundances at a given taxonomic level, which were processed following standards from the literature⁵¹: conversion to proportions, followed by an arcsine square-root transformation, followed by scaling. With \(\mathbf{Y}\) the matrix of resulting taxa, \(X\) the considered predictor, and \(\mathbf{C}\) a matrix of covariates, we applied the Wilks' lambda test to the model \(\mathbf{Y}\sim X+\mathbf{C}\). In more detail, with \(\widetilde{\mathbf{Y}}=\mathbf{Y}-\beta_c\mathbf{C}\) the residual matrix after adjustment for the covariates, \(\hat{\beta}=(X^TX)^{-1}(X^T\widetilde{\mathbf{Y}})\) the regression coefficient and \(N\) the sample size, we computed a P-value for the statistic: \(\det\left(\widetilde{\mathbf{Y}}^T\widetilde{\mathbf{Y}}-N\hat{\beta}\hat{\beta}^T\right)/\det\left(\widetilde{\mathbf{Y}}^T\widetilde{\mathbf{Y}}\right)\sim F(p,N-p-1)\).
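
The pre-processing applied before the MANOVA can be sketched as below. This is a minimal illustration assuming NumPy; the function name `arcsine_sqrt_scaled` and the simulated counts are hypothetical, and the Wilks' lambda test itself is not shown.

```python
import numpy as np

def arcsine_sqrt_scaled(counts):
    """Proportions per individual, arcsine square-root transform, then column scaling."""
    proportions = counts / counts.sum(axis=1, keepdims=True)
    transformed = np.arcsin(np.sqrt(proportions))
    return (transformed - transformed.mean(axis=0)) / transformed.std(axis=0)

# Hypothetical count table: 30 individuals x 4 taxa.
rng = np.random.default_rng(2)
counts = rng.poisson(lam=[50, 20, 5, 100], size=(30, 4)).astype(float)
Y = arcsine_sqrt_scaled(counts)
print(Y.shape)   # (30, 4)
```
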

For the alpha diversity indexes, the raw abundances were used to compute the corresponding metrics: \(\alpha_{Shannon}=-\sum_{i=1}^{K}\frac{x_i}{\sum_{j=1}^{K}x_j}\log\left(\frac{x_i}{\sum_{j=1}^{K}x_j}\right)\) and \(\alpha_{Simpson}=1-\sum_{i=1}^{K}\frac{x_i\left(x_i-1\right)}{\left(\sum_{j=1}^{K}x_j\right)\left(\left(\sum_{j=1}^{K}x_j\right)-1\right)}\), where \(x_i\) is the raw abundance of taxon \(i\) in a given individual and the sums run over the \(K\) taxa. The resulting \(\alpha\) was tested in a standard univariate linear regression adjusted for the covariates: \(\alpha\sim\delta_XX+\delta_CC\). The effect of \(X\) was assessed using a Wald test on \(\hat{\delta}_X\).
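
Both indexes can be computed for one individual with the standard library alone. The sketch below uses the conventional sign for the Shannon index (the negative of the sum of \(p\log p\)) and the count-based form of the Simpson index; the function names and the example counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon index: -sum(p * log(p)) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Simpson index: 1 - sum(x * (x - 1)) / (total * (total - 1))."""
    total = sum(counts)
    return 1.0 - sum(c * (c - 1) for c in counts) / (total * (total - 1))

even_sample = [10, 10, 10, 10]        # four equally abundant taxa
print(shannon_index(even_sample))     # log(4), about 1.386
print(simpson_index(even_sample))     # 1 - 360/1560, about 0.769
```

For a perfectly even sample of four taxa, the Shannon index reaches its maximum \(\log 4\), which gives a quick sanity check on the sign and normalisation.
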

Deriving the covariance network

Networks of variation in the covariance were built using the top 1,000 co-abundant pairs derived using the MANOCCA test. In representing the network, we included three parameters: the total number of connections (quantified through the node size), the actual pairwise taxa connections (edges in the graph), and the direction of the host factor effect on the covariance (decrease or increase of co-abundance). For each pair \(Y_iY_j\) adjusted for covariates \(C\): \(\widetilde{Y_iY_j}=Y_iY_j-C\left(C^TC\right)^{-1}C^TY_iY_j\), the direction of effect was derived using the sign of the regression coefficient for the predictor \(X\): \(\beta_{ij}=\left(X^TX\right)^{-1}X^T\widetilde{Y_iY_j}\). For shared pairs with mixed direction of effect, the edge was coloured in black. To facilitate the reading of the network, we coloured the nodes according to their association with each of the four predictors of interest, or shared among them. The 'viridis' colormap from matplotlib was used as the colour scheme, with each shared taxon being a combination of the original predictor colours.
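
The direction of effect for one pair can be sketched as a two-step least-squares computation: residualise the product on the covariates, then regress the residual on the predictor. The sketch assumes NumPy; the function name `effect_direction` is hypothetical, and adding an intercept column to the covariates is our assumption.

```python
import numpy as np

def effect_direction(x, product, covariates):
    """Sign of the slope of the covariate-adjusted product on a scaled predictor x."""
    design = np.column_stack([np.ones(len(product)), covariates])  # intercept assumed
    coef, *_ = np.linalg.lstsq(design, product, rcond=None)
    residual = product - design @ coef            # product adjusted for covariates
    beta = (x @ residual) / (x @ x)               # (X^T X)^{-1} X^T residual
    return beta, int(np.sign(beta))

# Hypothetical simulated pair whose product increases with x.
rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
x = (x - x.mean()) / x.std()
covariates = rng.normal(size=(n, 2))
product = 0.5 * x + covariates @ np.array([0.3, -0.2]) + rng.normal(size=n)

beta_up, sign_up = effect_direction(x, product, covariates)
beta_down, sign_down = effect_direction(x, -product, covariates)
print(sign_up, sign_down)
```

A sign of +1 would be drawn as an increase of co-abundance with the predictor and -1 as a decrease.
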

Comparison with other network-based approaches

For comparison purposes, we derived a permutation-based network inference approach for binary predictors, which we used to validate the MANOCCA network. In brief, we derive the pairwise covariance matrix for each value of the predictor, and then derive the empirical distribution of the correlation under the null by simulating \(N_{permutations}\) covariances after shuffling the abundances of a bacterial taxon across individuals. Using a fixed detection threshold, we can then select pairs of taxa with extreme covariances. Since we are interested in the variability of the covariance, we only keep the pairs uniquely detected across all values of the given predictor. When applied to compare results from the sex and smoking analyses, we ran 100,000 permutations, and retrieved the unique pairs detected in either group (women vs men, and never-smokers vs ever-smokers). We also ran the SparCC⁴⁵ correlation analysis, as this approach is commonly used and performed relatively well in a review of existing approaches³³. We ran SparCC using the recommended parameters, deriving the P-values using 1000 permutations, on both the simulated data (Fig. S4c) and the real data (Fig. S4, h). Further descriptions of both approaches are provided in Supplementary Note 4.
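
The permutation step for a single pair of taxa within one group can be sketched as follows. This is a simplified illustration assuming NumPy, not the procedure of Supplementary Note 4: the function name `permutation_pvalue`, the two-sided empirical P-value and the simulated data are our assumptions, whereas the approach described above uses a fixed detection threshold.

```python
import numpy as np

def permutation_pvalue(y1, y2, n_permutations, rng):
    """Empirical two-sided P-value for the covariance of two standardized taxa."""
    z1 = (y1 - y1.mean()) / y1.std()
    z2 = (y2 - y2.mean()) / y2.std()
    observed = np.mean(z1 * z2)
    # Null distribution: shuffle one taxon across individuals to break the pairing.
    null = np.array([np.mean(rng.permutation(z1) * z2)
                     for _ in range(n_permutations)])
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_permutations + 1)
    return observed, p_value

# Hypothetical pair of strongly co-abundant taxa in one group of 300 individuals.
rng = np.random.default_rng(4)
n = 300
shared = rng.normal(size=n)
taxon_a = shared + 0.5 * rng.normal(size=n)
taxon_b = shared + 0.5 * rng.normal(size=n)
observed, p_value = permutation_pvalue(taxon_a, taxon_b, 999, rng)
print(f"observed covariance: {observed:.2f}, P-value: {p_value:.3f}")
```

In the setting described above, such a test would be run separately within each predictor group, and only pairs detected in a single group would be retained.
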

Using co-abundance for prediction purposes

We assessed the performance of a predictive model based on covariance across taxa. The implementation of the predictive model follows the standard approach used for multivariate linear models. For a given outcome \(A\) to be predicted, the coefficients \(\hat{\boldsymbol{\beta}}=(\hat{\beta}_1,\ldots,\hat{\beta}_L)\) between \(A\) and the principal components \(PC_{i=1\ldots L}^{(train)}\), estimated in a training dataset with MANOCCA, are applied to the principal components of an independent test dataset and summed to form a predictive score \(S=\sum_{i=1}^{L}\hat{\beta}_iPC_i^{(test)}\). Note that the dimensionality of the covariance data and the principal component analysis (PCA) step make the implementation slightly more complex. In particular, the principal components derived from the same variables in two independent samples might not always match, with structure in the data being captured by different components. To avoid this issue, PCA is not applied to the test data. Instead, the \(PC_i^{(test)}\) are derived by projecting the test data onto the loadings from the training set: \(PC_i^{(test)}=\sum_{j}\lambda_j^{(i)}X_j\), where \(X_j\) is the \(j\)-th product of taxa in the test data, \(\lambda_j^{(i)}\) is the loading of that product on \(PC_i^{(train)}\) obtained in the training data, and the sum runs over all products of taxa. It also implies that the test dataset should have the same dimension (i.e. approximately the same list of taxa) as the training dataset.
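
The projection of test data onto training loadings can be sketched as below. This is a simplified outline assuming NumPy: the function names `fit_projection` and `predictive_score` are hypothetical, the inverse-rank transformations are omitted, the per-PC coefficients are estimated by univariate least squares, and the matrix `P` stands for simulated products of taxa rather than real data.

```python
import numpy as np

def fit_projection(P_train, a_train, n_components):
    """Learn the PCA loadings and per-PC coefficients in the training set only."""
    center = P_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(P_train - center, full_matrices=False)
    loadings = Vt[:n_components].T                     # (n_products, L)
    pcs_train = (P_train - center) @ loadings
    a_centered = a_train - a_train.mean()
    betas = (pcs_train.T @ a_centered) / np.sum(pcs_train ** 2, axis=0)
    return center, loadings, betas

def predictive_score(P_test, center, loadings, betas):
    """Project test products onto the training loadings and sum beta_i * PC_i."""
    pcs_test = (P_test - center) @ loadings            # no PCA on the test data
    return pcs_test @ betas

# Hypothetical data: 10 products driven by one latent factor that also drives A.
rng = np.random.default_rng(5)
n, m = 300, 10
latent = rng.normal(size=n)
P = np.outer(latent, np.ones(m)) + 0.3 * rng.normal(size=(n, m))
a = latent + 0.3 * rng.normal(size=n)

train, test = slice(0, 200), slice(200, 300)
center, loadings, betas = fit_projection(P[train], a[train], n_components=2)
S = predictive_score(P[test], center, loadings, betas)
r2 = np.corrcoef(S, a[test])[0, 1] ** 2
print(f"squared correlation in the test set: {r2:.2f}")
```

Because the loadings and the centering both come from the training set, the sign and ordering of the components are identical in the two datasets, which is the issue the projection is meant to avoid.
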

We applied this approach to the prediction of age, BMI, smoking and sex using taxa from the three lowest taxonomic levels (species, genus and family), using a 30-fold cross-validation, and without including other factors as covariates. For each of the 30 cross-validation iterations, the dataset was randomly split into two independent sets: a training set including 90% of the data and a test set including the remaining 10% of samples. We measured the accuracy of the predictive model using the squared correlation for continuous outcomes, derived as \(\mathrm{cor}\left(S,A^{(test)}\right)^2\), and the area under the receiver operating characteristic curve (AUC) for binary outcomes. The AUC is a common metric to quantify predictive power for binary outcomes. It equals the probability that a randomly chosen case from the test data receives a higher score than a randomly chosen control.
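
The two accuracy metrics can be written in a few lines of plain Python. This is an illustrative sketch with hypothetical function names; the AUC is computed through its pairwise definition, counting ties as one half.

```python
def squared_correlation(score, outcome):
    """Squared Pearson correlation between a predictive score and a continuous outcome."""
    n = len(score)
    mean_s, mean_o = sum(score) / n, sum(outcome) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(score, outcome))
    var_s = sum((s - mean_s) ** 2 for s in score)
    var_o = sum((o - mean_o) ** 2 for o in outcome)
    return cov ** 2 / (var_s * var_o)

def auc(score, label):
    """Fraction of (case, control) pairs in which the case has the higher score."""
    cases = [s for s, l in zip(score, label) if l == 1]
    controls = [s for s, l in zip(score, label) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

print(squared_correlation([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))          # 0.75
```

In the second example, three of the four case-control pairs are correctly ordered, which gives an AUC of 0.75.
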

Statistics and reproducibility

All statistical analyses were performed using the MANOCCA framework for multivariate covariance analysis, the MANOVA for multivariate additive analysis, and otherwise a standard linear regression. All models were fitted using linear regression under least squares assumptions.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The dataset supporting the conclusions of this article is available in the European Genome-Phenome Archive under accession code EGAC00001001785. Figure 1 was generated using data from Supplementary Data 1 and Supplementary Data 3. Figure 2 was generated using data from Supplementary Data 2 and Supplementary Data 3. Figure 3 was generated using data from Supplementary Data 2 and Supplementary Data 3. Figure 4 was generated using data from Supplementary Data 4.

Code availability

All code is available in Python and R at: https://gitlab.pasteur.fr/statistical-genetics/manocca and at https://doi.org/10.5281/zenodo.16945401.

References

1. Hou, K. et al. Microbiota in health and diseases. Signal Transduct Target Ther 7, 135 (2022).
2. Daniel, N., Lecuyer, E. & Chassaing, B. Host/microbiota interactions in health and diseases-Time for mucosal microbiology! Mucosal Immunol 14, 1006–1016 (2021).
3. Vijay, A. & Valdes, A. M. Role of the gut microbiome in chronic diseases: a narrative review. Eur J Clin Nutr 76, 489–501 (2022).
4. VanEvery, H., Franzosa, E. A., Nguyen, L. H. & Huttenhower, C. Microbiome epidemiology and association studies in human health. Nature Reviews Genetics 24, 109–124 (2023).
5. O’Toole, P. W. & Jeffery, I. B. Gut microbiota and aging. Science 350, 1214–1215 (2015).
6. Xu, C., Zhu, H. & Qiu, P. Aging progression of human gut microbiota. BMC Microbiol 19, 236 (2019).
7. Zhang, X. et al. Sex- and age-related trajectories of the adult human gut microbiota shared across populations of different ethnicities. Nature Aging 1, 87–100 (2021).
8. Kim, Y. S., Unno, T., Kim, B. Y. & Park, M. S. Sex Differences in Gut Microbiota. World J Mens Health 38, 48–60 (2020).
9. Min, Y. et al. Sex-specific association between gut microbiome and fat distribution. Nat Commun 10, 2408 (2019).
10. Stege, P. B. et al. Impact of long-term dietary habits on the human gut resistome in the Dutch population. Sci Rep 12, 1892 (2022).
11. Rinninella, E. et al. The role of diet in shaping human gut microbiota. Best Pract Res Clin Gastroenterol 62-63, 101828 (2023).
12. Perler, B. K., Friedman, E. S. & Wu, G. D. The Role of the Gut Microbiota in the Relationship Between Diet and Human Health. Annu Rev Physiol 85, 449–468 (2023).
13. Turpin, W. et al. Association of host genome with intestinal microbial composition in a large healthy cohort. Nat Genet 48, 1413–1417 (2016).
14. Kurilshikov, A. et al. Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat Genet 53, 156–165 (2021).
15. Lopera-Maya, E. A. et al. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project. Nat Genet 54, 143–151 (2022).
16. Qin, Y. et al. Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort. Nat Genet 54, 134–142 (2022).
17. Zahavi, L. et al. Bacterial SNPs in the human gut microbiome associate with host BMI. Nat Med 29, 2785–2792 (2023).
18. Fishbein, S. R. S., Mahmud, B. & Dantas, G. Antibiotic perturbations to the gut microbiome. Nat Rev Microbiol 21, 772–788 (2023).
19. Awany, D. et al. Host and Microbiome Genome-Wide Association Studies: Current State and Challenges. Front Genet 9, 637 (2018).
20. Sanna, S., Kurilshikov, A., van der Graaf, A., Fu, J. & Zhernakova, A. Challenges and future directions for studying effects of host genetics on the gut microbiome. Nat Genet 54, 100–106 (2022).
21. Ruhlemann, M. C. et al. Genome-wide association study in 8,956 German individuals identifies influence of ABO histo-blood groups on gut microbiome. Nature Genetics 53, 147 (2021).
22. Raimondi, S. et al. Multivariate Analysis in Microbiome Description: Correlation of Human Gut Protein Degraders, Metabolites, and Predicted Metabolic Functions. Front Microbiol 12, 723479 (2021).
23. Le Cao, K. A. et al. MixMC: A Multivariate Statistical Framework to Gain Insight into Microbial Communities. PLoS One 11, e0160169 (2016).
24. Banerjee, K., Chen, J. & Zhan, X. Adaptive and powerful microbiome multivariate association analysis via feature selection. NAR Genom Bioinform 4, lqab120 (2022).
25. Faust, K. & Raes, J. Microbial interactions: from networks to models. Nat Rev Microbiol 10, 538–550 (2012).
26. Kumar, M., Ji, B., Zengler, K. & Nielsen, J. Modelling approaches for studying the microbiome. Nat Microbiol 4, 1253–1267 (2019).
27. Schmidt, T. S. B., Raes, J. & Bork, P. The Human Gut Microbiome: From Association to Modulation. Cell 172, 1198–1215 (2018).
28. Chen, L. et al. Gut microbial co-abundance networks show specificity in inflammatory bowel disease and obesity. Nat Commun 11, 4018 (2020).
29. Banerjee, S., Schlaeppi, K. & van der Heijden, M. G. A. Keystone taxa as drivers of microbiome structure and functioning. Nat Rev Microbiol 16, 567–576 (2018).
30. Wu, G., Zhao, N., Zhang, C., Lam, Y. Y. & Zhao, L. Guild-based analysis for understanding gut microbiome in human health and diseases. Genome Med 13, 22 (2021).
31. Jackson, M. A. et al. Detection of stable community structures within gut microbiota co-occurrence networks from different human populations. PeerJ 6, e4303 (2018).
32. Bayer, G., Ganobis, C. M., Allen-Vercoe, E. & Philpott, D. J. Defined gut microbial communities: promising tools to understand and combat disease. Microbes Infect 23, 104816 (2021).
33. Weiss, S. et al. Correlation detection strategies in microbial data sets vary widely in sensitivity and precision. ISME J 10, 1669–1681 (2016).
34. Boetto, C. et al. MANOCCA: a robust and computationally efficient test of covariance in high-dimension multivariate omics data. Brief Bioinform. 25, bbae272 (2024).
35. Matchado, M. S. et al. Network analysis methods for studying microbial communities: A mini review. Comput Struct Biotechnol J 19, 2687–2698 (2021).
36. Thomas, S. et al. The Milieu Interieur study - an integrative approach for study of human immunological variance. Clin Immunol 157, 277–293 (2015).
37. Lozupone, C. A. et al. Meta-analyses of studies of the human microbiota. Genome Res 23, 1704–1714 (2013).
38. Saraswati, S. & Sitaraman, R. Aging and the human gut microbiota-from correlation to causality. Front Microbiol 5, 764 (2014).
39. Dominianni, C. et al. Sex, body mass index, and dietary fiber intake influence the human gut microbiome. PLoS One 10, e0124599 (2015).
40. Gui, X., Yang, Z. & Li, M. D. Effect of Cigarette Smoke on Gut Microbiota: State of Knowledge. Front Physiol 12, 673341 (2021).
41. Fan, J. et al. Cross-talks between gut microbiota and tobacco smoking: a two-sample Mendelian randomization study. BMC Med 21, 163 (2023).
42. Tam, A. et al. Effects of sex and chronic cigarette smoke exposure on the mouse cecal microbiome. PLoS One 15, e0230932 (2020).
43. Antonello, G. et al. Smoking and salivary microbiota: a cross-sectional analysis of an Italian alpine population. Sci Rep 13, 18904 (2023).
44. Meehan, C. J. & Beiko, R. G. A phylogenomic view of ecological specialization in the Lachnospiraceae, a family of digestive tract-associated bacteria. Genome Biol Evol 6, 703–713 (2014).
45. Friedman, J. & Alm, E. J. Inferring Correlation Networks from Genomic Survey Data. PLOS Computational Biology 8, e1002687 (2012).
46. Gloor, G. B., Macklaim, J. M., Pawlowsky-Glahn, V. & Egozcue, J. J. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol 8, 2224 (2017).
47. Marcos-Zambrano, L. J. et al. Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment. Front Microbiol 12, 634511 (2021).
48. Ho, P. Y., Nguyen, T. H., Sanchez, J. M., DeFelice, B. C. & Huang, K. C. Resource competition predicts assembly of gut bacterial communities in vitro. Nat Microbiol 9, 1036–1048 (2024).
49. Boetto, C. MANOCCA. (2024).
50. Hercberg, S. et al. The Nutrinet-Sante Study: a web-based prospective study on the relationship between nutrition and health and determinants of dietary patterns and nutritional status. BMC Public Health 10, 242 (2010).
51. Sokal, R. R. & Rohlf, F. J. Biometry: The Principles and Practice of Statistics in Biological Research, (W. H. Freeman, 1981).

Acknowledgements

This research was supported by the Agence Nationale pour la Recherche (ANR-20-CE15-0012-01). This work has been conducted as part of the INCEPTION program (Investissement d’Avenir grant ANR-16-CONV-0005). The Milieu Interieur consortium was also supported by the Agence Nationale pour la Recherche (ANR-10-LABX-69-01). More specifically, authors C.B., V.B.R., L.H., A.F., A.A., M.B., S.P.K., L.Q.M., H.S. and H.A. were funded using ANR-20-CE15-0012-01. Authors E.P., D.D. and L.Q.M. were funded using ANR-10-LABX-69-01. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Institut Pasteur, Université Paris Cité, Department of Computational Biology, F-75015, Paris, France
Christophe Boetto, Violeta Basten Romero, Léo Henches, Arthur Frouin, Antoine Auvergne, Sean P. Kennedy & Hugues Aschard
Human Evolutionary Genetics Unit, Institut Pasteur, Université Paris Cité, CNRS UMR2000, 75015, Paris, France
Etienne Patin & Lluis Quintana-Murci
Sorbonne Université, INSERM UMRS-938, Centre de Recherche Saint-Antoine, CRSA, AP-HP, Paris, 75012, France
Marius Bredon & Harry Sokol
Translational Immunology Unit, Institut Pasteur, Université de Paris Cité, 75015, Paris, France
Darragh Duffy
Chair of Human Genomics and Evolution, Collège de France, 75005, Paris, France
Lluis Quintana-Murci
Gut, Liver & Microbiome Research (GLIMMER) FHU, Paris, France
Harry Sokol
Université Paris-Saclay, INRAe, AgroParisTech, Micalis Institute, Jouy-en-Josas, 78350, France
Harry Sokol
Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA
Hugues Aschard
Hôpital Necker, Paris, France
Laurent Abel
Trinity College, Dublin, Ireland
Andres Alcover, Hugues Aschard, Philippe Bousso & Nollaig Bourke
Karolinska Institutet, Stockholm, Sweden
Petter Brodin, Ivo Gomperts-Boneca, Milena Hasan & Gunilla Karlsson Hedestam
INSERM UMR 1163 – Institut Imagine, Paris, France
Pierre Bruhns, Nadine Cerf-Bensussan, Mickaël Ménager, Etienne Patin, Sandra Pellegrini, Antonio Rausell & Frédéric Rieux-Laucat
EPFL, Lausanne, Switzerland
Ana Cumano, Christophe D’Enfert, Ludovic Deriano, Marie-Agnès Dillies, James Di Santo, Gérard Eberl, Jost Enninga & Jacques Fellay
Université Paris 13, Paris, France
Serge Hercberg & Mathilde Touvier
Institut Cochin and Institut Pasteur, Paris, France
Molly A. Ingersoll
Institut Curie, Paris, France
Olivier Lantz
Trinity College Dublin, Dublin, Ireland
Rose Anne Kenny, Frédérique Michel, Hugo Mouquet & Cliona O’Farrelly
Institut Roche, Paris, France
Lars Rogge & Magnus Fontes
Hôpital Saint-Louis, Paris, France
Anavaj Sakuntabhai, Olivier Schwartz, Benno Schwikowski, Spencer Shorte, Frédéric Tangy & Antoine Toubert
In Sitro, San Francisco, California, US
Marie-Noëlle Ungeheuer, Christophe Zimmer & Matthew L. Albert

Authors

Christophe Boetto
Violeta Basten Romero
Léo Henches
Arthur Frouin
Antoine Auvergne
Etienne Patin
Marius Bredon
Sean P. Kennedy
Darragh Duffy
Lluis Quintana-Murci
Harry Sokol
Hugues Aschard

Consortia

Milieu Intérieur Consortium

Laurent Abel, Andres Alcover, Hugues Aschard, Philippe Bousso, Nollaig Bourke, Petter Brodin, Pierre Bruhns, Nadine Cerf-Bensussan, Ana Cumano, Christophe D’Enfert, Ludovic Deriano, Marie-Agnès Dillies, James Di Santo, Gérard Eberl, Jost Enninga, Jacques Fellay, Ivo Gomperts-Boneca, Milena Hasan, Gunilla Karlsson Hedestam, Serge Hercberg, Molly A. Ingersoll, Olivier Lantz, Rose Anne Kenny, Mickaël Ménager, Frédérique Michel, Hugo Mouquet, Cliona O’Farrelly, Etienne Patin, Sandra Pellegrini, Antonio Rausell, Frédéric Rieux-Laucat, Lars Rogge, Magnus Fontes, Anavaj Sakuntabhai, Olivier Schwartz, Benno Schwikowski, Spencer Shorte, Frédéric Tangy, Antoine Toubert, Mathilde Touvier, Marie-Noëlle Ungeheuer, Christophe Zimmer, Matthew L. Albert, Darragh Duffy & Lluis Quintana-Murci

Contributions

C.B. performed analyses and drafted the manuscript. V.B.R. worked on data selection and preprocessing. L.H. worked on bioinformatics pipelines. A.F. and A.A. contributed to addressing methodological questions. E.P., M.B. and S.P.K. worked on the microbiome data processing and analysis. D.D., L.Q.M. and H.S. supervised the microbiome data generation and analyses. H.A. ran some analyses, supervised the study and drafted the manuscript.

Corresponding authors

Correspondence to Christophe Boetto or Hugues Aschard.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks James D. Brunner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Sabina La Rosa, Tobias Goris and David Favero. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Transparent Peer Review file (PDF)

Supplementary Information (PDF)

Description of Additional Supplementary Materials (PDF)

Supplementary Data 1-4 (XLSX)

Reporting Summary (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Boetto, C., Romero, V.B., Henches, L. et al. The influence of environment on bacterial co-abundance in the gut microbiomes of healthy human individuals. Commun Biol 8, 1537 (2025). https://doi.org/10.1038/s42003-025-08895-y

Received: 21 March 2025
Accepted: 16 September 2025
Published: 06 November 2025
Version of record: 06 November 2025
DOI: https://doi.org/10.1038/s42003-025-08895-y