Introduction

The ongoing coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been a major cause of death worldwide in the last three years (World Health Organization). Although COVID-19 is a respiratory illness, about 10–20% of patients suffer from gastrointestinal symptoms, including loss of appetite, diarrhea, nausea, and vomiting1,2. These symptoms are often associated with high levels of fecal calprotectin (an indicator of intestinal inflammation) and interleukin 18 (IL-18, a cytokine that mediates intestinal inflammatory reactions) in patients’ stools3,4. SARS-CoV-2 RNA and, more rarely, infectious virus can be detected in infected patients’ fecal samples, suggesting that the gastrointestinal tract is a major target organ of SARS-CoV-2 infection5,6. Moreover, angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease 2 (TMPRSS2), both critical proteins for entry of SARS-CoV-2 into host cells, are expressed at high levels in gastrointestinal epithelial cells7.

Increasing evidence from preclinical models and clinical studies suggests that SARS-CoV-2 infection leads to alterations of the gut microbiota and that gut dysbiosis could be involved in COVID-19 severity8,9,10,11,12,13,14,15. Patients experiencing severe COVID-19 have significant alterations in gut microbiota composition, characterized by a loss of microbial diversity and richness. In several cohorts, SARS-CoV-2 infection was associated with depletion of beneficial taxa, including butyrate producers (e.g., genera from the Ruminococcaceae family) and bacterial species with known immunomodulatory potential (e.g., Faecalibacterium prausnitzii and Eubacterium rectale)10,16,17,18. In contrast, enrichment of potential opportunistic pathobionts, such as Bacteroides nordii, Rothia, Actinomyces, Ruminococcus, and Clostridium hathewayi, was reported in COVID-19 patients10,16,17,18,19. Of interest, a limited number of studies, mostly in Asian populations, pointed out stronger microbiota alterations in patients developing severe COVID-19 (refs. 15,20,21). Beyond taxonomy, recent studies have shown functional alterations linked to COVID-19, including in metabolic pathways related to short-chain fatty acid (SCFA) production and bile acid metabolism12,22. However, only European, North American, and Asian populations have been studied in this regard, and no data are available for African patients with COVID-19.

In the current study, we compared the gut microbiota of 200 patients with COVID-19 and 102 healthy subjects from Moroccan and French cohorts. Through shotgun metagenomics and targeted quantitative metabolomics, we showed that COVID-19 infection is associated with alterations in gut microbiota diversity, composition, and functions in both populations. Interestingly, we found common signals in the patients from the two continents. The alterations were more marked in patients with severe COVID-19. Finally, we showed that a machine learning-based approach using only bacterial taxa could predict COVID-19 severity with high accuracy, but it was not transposable from one population to the other.

Results

Patient cohorts

From April 2020 to January 2021, 316 fecal samples were collected from two populations: 163 from Morocco on admission to the hospital (123 COVID-19 patients and 40 healthy subjects) and 153 from France at different times during hospitalization (91 COVID-19 samples from 77 unique patients and 62 healthy subjects). Fewer than half of the Moroccan patients with COVID-19 had a severe infection (51 patients, 41.5%), of whom five died during hospitalization, and 47% had a comorbidity (obesity was the most common, affecting 10 patients). Among the French COVID-19 patients with a comorbidity (64.9%), cardiovascular diseases were the most common (68%), followed by high blood pressure (60%) and obesity (42%). Ten French COVID-19 patients (13%) died during hospitalization. Moroccan COVID-19 patients were characterized by significantly higher levels of D-dimers, C-reactive protein, ALT, AST, and LDH, and lower platelet and lymphocyte counts compared to healthy subjects. Moreover, Moroccan patients with severe infection were older and hospitalized longer than non-severe patients, and they had higher levels of D-dimers, C-reactive protein, platelets, and LDH than the latter. In the French cohort, COVID-19 patients were older and had a higher BMI than healthy subjects. The characteristics of these two populations are described in Supplementary Table 1.

COVID-19 is associated with alterations of the microbiota composition

Average compositions and relative abundance of the bacterial community in the Moroccan and French populations at the phylum level showed that Bacteroidetes and Firmicutes were dominant in all patients (Fig. 1a). Among all the studied metadata, COVID-19 was the factor that explained the most variation in microbiota composition (Fig. 1b), accounting for 40.0% of the observed variance in the Moroccan population and 12.9% in the French one, far ahead of other factors such as age (not significant in the Moroccan population, 4.0% in the French population), comorbidity (0.4% in the Moroccan population, 1.6% in the French population) or moisture in the feces (information available only for the French population, not significant). However, the taxonomic composition, richness, and diversity of the Moroccan and French populations were significantly different (Fig. 1c, d and Supplementary Fig. 1). According to the Shannon and Chao1 indices, species richness was similar between the two cohorts, but their distribution differed; the Moroccan cohort appeared to be more homogeneous than the French cohort (Supplementary Fig. 1). The patient’s origin explained 37.9% of the variance in microbiota when all the samples were compared together, far ahead of the other significant factors (3.7% for COVID-19 and 0.4% for sex). These differences may also result from variations in sample processing between the two cohorts. At the phylum level, Bacteroidetes dominated the French microbiota (representing 57% of the samples on average), while Firmicutes dominated the Moroccan ones (representing 74% of the samples on average). Oscillospiraceae and Veillonellaceae were the most important Firmicutes families in the Moroccan population (representing 38% and 20% of the samples on average, respectively), whereas Bacteroidaceae (Bacteroidetes) dominated the French population (representing 21% of the samples on average).
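For readers less familiar with the two alpha diversity indices used here, the following minimal Python sketch shows how Shannon's H and Chao1 can be computed from a vector of per-taxon read counts. It is an illustration only and not the pipeline used in this study: the function names are hypothetical, the natural logarithm is assumed for Shannon's H, and the bias-corrected form of Chao1 is assumed.

```python
import math
from collections.abc import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon's H (natural log, an assumption) from per-taxon read counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # relative abundance of this taxon
            h -= p * math.log(p)
    return h


def chao1_index(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)  # taxa seen exactly once
    doubletons = sum(1 for c in counts if c == 2)  # taxa seen exactly twice
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Toy samples (invented counts): same observed richness, different evenness.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]

print(shannon_index(even_sample))    # equals ln(4), about 1.386
print(shannon_index(skewed_sample))  # lower, because one taxon dominates
print(chao1_index(skewed_sample))    # 7.0: three singletons imply unseen taxa
```

The toy samples show why the two indices can disagree: both have four observed taxa, but the skewed sample has a lower Shannon's H and a higher Chao1 estimate.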

Fig. 1: Association between COVID-19 infection and the microbiota composition of Moroccan and French cohorts.

a Relative abundance of prokaryotic taxa in microbiota of Moroccan and French COVID-19 patients and healthy subjects at phylum, family, and genus levels. b Explained variance in Bray-Curtis distance (R²) calculated from PERMANOVA tests in the Moroccan and French populations. No significant associations were observed (FDR p value > 0.05) for other demographic details, covariates, or hospital course information. c Alpha diversity indices (Shannon’s H and Chao1) calculated from the raw taxonomic tables of Moroccan and French COVID-19 patients and healthy subjects. Wilcoxon tests were used to compare the groups. *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001. d PCoAs were built from the Bray-Curtis dissimilarity matrices constructed from the normalized abundance of species of each microbiota. Ellipses were drawn around the centroids of each emerging community at 95% (inner) and 97% (outer) confidence intervals.

Despite significant differences in the composition of their microbiota, COVID-19 patients from the two populations both showed a significant loss of richness and diversity (Fig. 1c) and an alteration of the overall microbiota composition compared with healthy subjects (Fig. 1d, Morocco: PERMANOVA; R² = 37.8%, F = 97.69, df = 1, P = 0.001 – France: PERMANOVA; R² = 9.65%, F = 16.02, df = 1, P = 0.003). Furthermore, compared with healthy subjects, Firmicutes were less abundant in French COVID-19 patients (representing 20% and 38% on average of the COVID-19 and healthy samples, respectively) in favor of Bacteroidetes, while the opposite was true for Moroccan COVID-19 patients (representing 76% and 71% on average of the COVID-19 and healthy samples, respectively) (Fig. 1a).
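The PERMANOVA statistics reported above (R², pseudo-F, and a permutation P value) can be illustrated with a short, self-contained Python sketch. It is a simplified one-factor version written for clarity, not the implementation used for the analyses in this study; the function names, the toy abundance profiles, and the choice of 999 permutations are all assumptions made for the example.

```python
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def _ss_within(dist, labels):
    """Within-group sum of squares computed from pairwise distances."""
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    ss = 0.0
    for idx in groups.values():
        pair_sum = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss += pair_sum / len(idx)
    return ss


def permanova(dist, labels, n_perm=999, seed=0):
    """One-factor PERMANOVA; returns (pseudo-F, R2, permutation p value)."""
    n = len(labels)
    k = len(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n

    def stats(lab):
        ss_w = _ss_within(dist, lab)
        ss_a = ss_total - ss_w  # among-group sum of squares
        f = float("inf") if ss_w == 0 else (ss_a / (k - 1)) / (ss_w / (n - k))
        return f, ss_a / ss_total

    f_obs, r2 = stats(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between samples and groups
        if stats(shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)


# Toy relative abundances of three taxa (invented values).
healthy = [[30, 50, 20], [28, 52, 20], [33, 47, 20], [31, 49, 20], [29, 50, 21]]
covid = [[70, 10, 20], [68, 12, 20], [72, 9, 19], [66, 14, 20], [71, 11, 18]]
samples = healthy + covid
groups = ["healthy"] * 5 + ["covid"] * 5
dist = [[bray_curtis(a, b) for b in samples] for a in samples]
f_stat, r2, p_value = permanova(dist, groups)
print(f_stat, r2, p_value)
```

With this formula the smallest attainable P value is 1/(n_perm + 1), so 999 permutations cannot yield a value below 0.001.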

Multivariable analyses (after accounting for gender, age, presence of various comorbidities, smoking status and treatment with chemotherapy, immunosuppressant, metformin or antibiotics) showed that the relative abundance of a large number of microbiota species was significantly associated with COVID-19 infection in Moroccan and French patients (Fig. 2, Supplementary Fig. 2, and Supplementary Data 1). Among the most abundant bacteria of Moroccan patients, several known to be beneficial to host physiology were significantly depleted in COVID-19 patients, such as some Bacteroides species, Faecalibacterium prausnitzii, Flavonifractor plautii, Ruthenibacterium lactatiformans, Dysosmobacter welbionis, Intestinimonas butyriciproducens, Ruminococcus bicirculans, Adlercreutzia equolifaciens, Collinsella aerofaciens, and Coprococcus comes (Fig. 2 and Supplementary Data 1). On the other hand, compared with healthy subjects, the gut microbiota of COVID-19 patients was significantly enriched in bacterial pathobionts, including Ruminococcus gnavus, Klebsiella pneumoniae, Klebsiella variicola, and Bacteroides ovatus (Fig. 2 and Supplementary Data 1). In addition, in the Moroccan population, 17 prokaryotic taxa were found only among COVID-19 patients (Supplementary Fig. 3), including some Enterobacter and Klebsiella species and Citrobacter freundii complex sp. CFNIH9. Of note, Candidatus Mancarchaeum acidiphilum, an archaeon identified in acidophilic microbiomes23, was the only microorganism present in all COVID-19 patients and no healthy subjects.

Fig. 2: Most abundant prokaryotic taxa in microbiota of Moroccan population associated with COVID-19.

Colors on the heatmap represent the scaled (Z-score) abundance of each taxon in samples. The arrows above the heatmap correspond to the results of the multivariable analyses: the direction of each arrow indicates whether the taxon is more or less abundant in COVID-19 patients compared to healthy subjects, and in severe patients compared to non-severe patients. The heatmap was divided into three clusters corresponding to healthy subjects, non-severe patients, and severe patients. On the right-hand side of the heatmap, each patient’s hospitalization time, age, presence of comorbidities, ALT value, C-reactive protein level, and lymphocyte count are displayed. The histograms in the top right illustrate the differences in ALT value, C-reactive protein level, and lymphocyte count between healthy subjects, non-severe patients, and severe patients.

Despite significant differences in the composition of the microbiota of French and Moroccan populations, 3778 common taxa were negatively associated with the disease (qval < 0.05, Supplementary Fig. 4 and Supplementary Data 1), including commensal bacteria such as Bacteroides species, F. prausnitzii, and C. comes. Among the most abundant bacteria in the French population, other bacteria important for the host’s metabolism were also decreased in COVID-19 patients, such as Odoribacter splanchnicus, Phascolarctobacterium faecium, Bifidobacterium longum, Roseburia intestinalis, Dysosmobacter welbionis, and Intestinimonas butyriciproducens (Supplementary Data 1). No taxa unique to COVID-19 patients compared to healthy subjects were identified in the French cohort.

The greater the severity of COVID-19 infection, the greater the alteration of the gut microbiota

Samples from the Moroccan patients were all obtained at hospital admission before any treatment, while samples from French patients were much more heterogeneous, notably because different types of treatment had already been started. Therefore, the following analyses were performed exclusively on the Moroccan population. The gut microbiota alterations were greater in patients with severe COVID-19 infection (Fig. 3a). The richness of the microbiota of severe and non-severe patients was equivalent, but the Shannon index, which takes into account both richness and evenness, was lower in severe patients compared to non-severe patients and healthy subjects (Fig. 3b). The relative abundance of Bacteroidetes decreased more in favor of the Firmicutes in severe patients compared to non-severe ones and healthy subjects. Beneficial bacteria, such as F. prausnitzii, Roseburia hominis, and Bacteroides uniformis, were even more reduced in severe patients compared to non-severe patients and healthy subjects (Figs. 2, 3a) after accounting for gender, age, presence of various comorbidities, smoking status, and medication. No taxa were exclusive to severe patients, but some taxa found only in COVID-19 patients exhibited differential abundance between severe and non-severe cases (Supplementary Fig. 3), including many Klebsiella and Enterobacter species. Within the COVID-19 population, disease severity explained a statistically significant proportion of variance in Bray–Curtis distances (19.8%, FDR p value = 0.015), ahead of the other significant tested factors (Fig. 3c). At the whole microbiome community level, the microbiota of severe and non-severe patients were distinct from those of healthy controls (Fig. 3d, PERMANOVA: R² = 48.7%, F = 75.889, df = 2, P = 0.001), and also differed between severe and non-severe patients (PERMANOVA: R² = 19.8%, F = 29.825, df = 2, P = 0.001).

Fig. 3: Association between the severity of the disease and microbiota composition of Moroccan patients.

a Relative abundance of prokaryotic taxa in Moroccan microbiota of severe, non-severe patients and healthy subjects at phylum, family, and genus levels. b Alpha diversity indices (Shannon’s H and Chao1) calculated from the raw taxonomic tables of severe, non-severe patients and healthy subjects from Morocco. Kruskal-Wallis tests with Dunn’s test post-hoc (Benjamini-Hochberg p-value correction method) were used to compare the three groups. *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001. c Explained variance in Bray-Curtis distance (R²) calculated from PERMANOVA tests in Moroccan patients. No significant associations were observed (FDR p-value > 0.05) for other demographic details, covariates, or hospital course information. d PCoA built from the Bray-Curtis dissimilarity matrices constructed from the normalized abundance of species of each microbiota. Ellipses were drawn around the centroids of each emerging community at 95% (inner) and 97% (outer) confidence intervals.

COVID-19 infection is also associated with functional alterations of the gut microbiota

To further investigate the functional roles of the altered intestinal microbiome in COVID-19 patients, the HUMAnN 3 software was used to annotate the potential functions of the gut microbiota and calculate MetaCyc pathway abundances. At the whole population level, including all COVID-19 patients and healthy subjects, pathways identified in French and Moroccan microbiota were significantly different (Fig. 4a, PERMANOVA: R² = 5.6%, F = 18.478, df = 1, P = 0.001). In both populations, there were two distinct clusters according to COVID-19 status (COVID-19 patients/healthy subjects) (Morocco: PERMANOVA; R² = 68.1%, F = 344.54, df = 1, P = 0.001 – France: PERMANOVA; R² = 6.1%, F = 9.6895, df = 1, P = 0.003). Multivariable analyses (after accounting for gender, age, presence of various comorbidities, smoking status and treatment with chemotherapy, immunosuppressant, metformin or antibiotics) showed that the relative abundance of 304 pathways was significantly different between COVID-19 and healthy Moroccan subjects (qval < 0.05), while the abundance of only 31 pathways was different between COVID-19 and healthy French subjects at a qval of 0.2 (Fig. 5, Supplementary Fig. 5, and Supplementary Data 2). Among the most abundant pathways, most of those associated with COVID-19 infection were related to amino acid biosynthesis, nucleoside biosynthesis, glycolysis, and pyruvate SCFAs (Fig. 5). Among those, the three pathways most associated with COVID-19 infection in the Moroccan cohort were coenzyme A biosynthesis, phosphopantothenate biosynthesis, and L-tryptophan biosynthesis (Supplementary Data 2). The coenzyme A biosynthesis and phosphopantothenate biosynthesis pathways are directly connected (phosphopantothenate biosynthesis is required for coenzyme A biosynthesis) and are crucial for energy metabolism, particularly for the citric acid cycle downstream of glycolysis.

Fig. 4: Association of the COVID-19 and its severity with the functions of the microbiota of Moroccan and French patients.

a, b PCoAs were built from Bray-Curtis dissimilarity matrices constructed from the normalized abundance of pathways in microbiota (severity analyses are based on the Moroccan population only). Ellipses were drawn around the centroids of each emerging community at 95% (inner) and 97% (outer) confidence intervals. c Result of the MaAsLin2 analysis for the L-tryptophan biosynthesis pathway. d Tryptophan metabolites whose abundance was significantly different between healthy subjects, severe COVID-19 patients, and non-severe ones. Kruskal-Wallis tests with Dunn’s test post-hoc (Benjamini-Hochberg p-value correction method) were used to compare the three groups. *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001. e Spearman’s correlation between microbiota species and the abundance of L-tryptophan metabolites in Moroccan patients. Only correlations with a q-value < 0.1 are represented in the heatmap. Colors represent the strength of the correlation. The numbers correspond to clusters based on correlation similarity.

Fig. 5: Most abundant pathways in microbiota of Moroccan population associated with COVID-19 infection.

Colors on the heatmap represent the scaled (Z-score) abundance of each pathway in samples. The arrows above the heatmap correspond to the results of the multivariable analyses: the direction of each arrow indicates whether the pathway is more or less abundant in COVID-19 patients compared to healthy subjects, and in severe patients compared to non-severe patients. The heatmap was divided into three clusters corresponding to healthy subjects, non-severe patients, and severe patients. On the right-hand side of the heatmap, each patient’s hospitalization time, age, presence of comorbidities, ALT value, C-reactive protein level, and lymphocyte count are displayed. The histograms in the top right illustrate the differences in ALT value, C-reactive protein level, and lymphocyte count between healthy subjects, non-severe patients, and severe patients.

In line with taxonomic data, many microbiota functions were more altered in severe than non-severe patients (Fig. 5). PCoA showed that the pathway repertoires of severe and non-severe patients were distinct from those of healthy controls (Fig. 4b, PERMANOVA: R² = 80.9%, F = 339.11, df = 2, P = 0.001), and also differed between severe and non-severe patients (PERMANOVA: R² = 44.5%, F = 97.0, df = 1, P = 0.001). In total, the abundance of 239 pathways differed between COVID-19 patients and healthy subjects (qval < 0.05, Supplementary Data 2).

Multivariable analyses showed that L-tryptophan-related pathways were strongly associated with COVID-19 infection in Moroccan patients (Fig. 4c). As others have observed alterations of tryptophan metabolism in the setting of COVID-19 infection24,25,26, we performed targeted quantitative metabolomics focusing on 20 tryptophan metabolites in the serum of Moroccan patients obtained at the same time as stool samples. Tryptophan can be metabolized into a multitude of bioactive molecules through three major pathways: the kynurenine pathway, the serotonin pathway, and the indole pathway. While the first two occur in mammalian cells, the last pathway takes place in the gut microbiota and leads to the production of aryl hydrocarbon receptor (AhR) agonists that exhibit many beneficial effects for the host27,28. The abundance of several tryptophan metabolites was significantly different between COVID-19 patients and healthy subjects, and between non-severe and severe patients (Fig. 4d). These metabolites belonged to all three tryptophan pathways (kynurenine, serotonin, and indole pathways), suggesting that the disease is associated with L-tryptophan metabolism as a whole. Interestingly, indoles, such as indole-3-acetic acid or indole-3-aldehyde, were increased in patients with COVID-19. This might seem paradoxical, as these molecules are known to have beneficial effects on intestinal homeostasis and immunity. The increase may be related to a greater availability of tryptophan in the colon, as suggested by shotgun sequencing data showing enhanced tryptophan synthesis (TRPSYN-PWY) by the gut bacteria of patients with COVID-19 (Fig. 4c).

The abundance of several bacterial species correlated with the abundance of six tryptophan metabolites from the three different metabolism pathways (Fig. 4e). Interestingly, most of those bacteria (22 out of 31) were significantly associated with COVID-19 infection (Supplementary Data 1), suggesting some link between gut microbiota alterations, tryptophan metabolism and COVID-19 infection.
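Because many species-metabolite pairs are tested at once, the correlations are filtered on q-values rather than raw p values. The sketch below shows one common way to obtain q-values, the Benjamini-Hochberg procedure. Whether this exact procedure was applied to the correlation analysis is an assumption here (it is the correction named for the post-hoc group comparisons), and the function name and the example p values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p values for five species-metabolite correlations.
p_values = [0.001, 0.008, 0.039, 0.041, 0.2]
q_values = benjamini_hochberg(p_values)

# Keep only the correlations passing the q < 0.1 threshold used for the heatmap.
kept = [i for i, q in enumerate(q_values) if q < 0.1]
print(q_values)  # [0.005, 0.02, 0.05125, 0.05125, 0.2] up to rounding
print(kept)      # [0, 1, 2, 3]
```

In this toy example the third p value (0.039) receives the same q-value as the fourth, because the procedure enforces that q-values never decrease as p values increase.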

Machine learning models found that gut microbiota is associated with COVID-19 severity

A random forest classifier model was constructed based on genus-level microbiota composition to identify gut microbial markers associated with COVID-19 severity (see Methods for severity definition) using data from the Moroccan population (51 severe and 72 non-severe COVID-19 patients). For this purpose, we used an approach that allowed us to validate the obtained results internally. Eighty percent of the initial population was first randomly sampled 200 times, and VSURF was applied on each subsample (modeling of the outcomes was performed on the remaining 20% of the population) to perform variable selection and rank the variables according to their respective importance in the models. Then, we kept only the bacterial genera selected at least 100 times (Fig. 6a). We then determined the optimal number of variables to include in the predictive model. We tested one to more than 300 variables and evaluated the AUC in 100 trials, showing that the optimal number of variables to include in the model was only six (Fig. 6b). To refine the results, ROC curves were generated by random forests on 1000 random trials (Fig. 6c) using the first six variables selected by VSURF (Supplementary Fig. 6). The selected variables were the genera Enterococcus, Klebsiella, Salmonella, Phascolarctobacterium, Kluyvera, and Raoultella (Fig. 6d). The model was very accurate, with an AUC of 96% (confidence interval = 0.89–0.99).
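The resampling logic described above (repeated 80% subsamples, a selection step on each, and retention of genera selected in at least half of the runs) can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not the method used in this study: VSURF is an R package built on random forests, whereas the sketch ranks each genus by its single-variable AUC so that it runs with the standard library only. All names, the toy data, and the `top_k` parameter are hypothetical.

```python
import random
from collections import Counter


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney) of one variable; label 1 = severe."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def selection_counts(X, y, names, n_runs=200, frac=0.8, top_k=1, seed=0):
    """Count how often each variable ranks in the top_k across subsamples."""
    rng = random.Random(seed)
    n = len(y)
    counts = Counter()
    for _ in range(n_runs):
        idx = rng.sample(range(n), int(frac * n))  # 80% subsample, no replacement
        sub_y = [y[i] for i in idx]
        if len(set(sub_y)) < 2:
            continue  # AUC is undefined without both classes
        # Stand-in for VSURF: distance of the univariate AUC from chance (0.5).
        strength = {
            name: abs(auc([X[i][j] for i in idx], sub_y) - 0.5)
            for j, name in enumerate(names)
        }
        ranked = sorted(names, key=lambda nm: strength[nm], reverse=True)
        counts.update(ranked[:top_k])
    return counts


# Toy data: 20 severe (1) and 20 non-severe (0) patients, three made-up genera.
y = [1] * 20 + [0] * 20
names = ["genus_A", "genus_B", "genus_C"]
X = []
for i, label in enumerate(y):
    informative = (0.5 + 0.01 * i) if label == 1 else (0.1 + 0.01 * (i - 20))
    X.append([informative, (i * 7 % 11) / 11, (i * 3 % 7) / 7])

n_runs = 200
counts = selection_counts(X, y, names, n_runs=n_runs)
kept = [name for name in names if counts[name] >= n_runs / 2]
print(kept)  # ['genus_A']
```

In the toy data only genus_A separates the two classes, so it is retained in every run, while the two uninformative genera never reach the 100-run threshold.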

Fig. 6: Model of COVID-19 severity.

a Number of times a variable was selected by VSURF in 200 trials. b Average AUC on 100 trials using random forests according to the number of variables in the model. c Average AUC on 1000 trials using random forests according to the number of variables in the model. d Bacterial genera used to generate predictive models of COVID-19 severity.

We then tried to apply the model to the French population. However, likely because the French and Moroccan microbiota were very different and because the French patients were heavily treated, the model failed to classify patients as severe or non-severe: only two French patients were classified as non-severe, while all the others were classified as severe. As an experiment, we tried to build a new model from the Moroccan data using the same method as described above, but this time using as input the bacterial genera that were associated with COVID-19 in the same way in both populations. The selected variables were the genera Thiothrix, Iodobacter, Alienimonas, Salmonella, Methylotenera, Lelliottia, and Limosilactobacillus, and the AUC was 96% (confidence interval = 0.91–0.99). This model was then applied to the French data to test its ability to identify patients with a severe infection phenotype. At the whole microbiome community level, the patients predicted as severe, the patients predicted as non-severe, and the healthy subjects formed three distinct clusters (PERMANOVA: R² = 13.1%, F = 10.214, df = 2, P = 0.001). Nevertheless, regarding the severity parameters available to us (respiratory rate, CRP levels, transfer to intensive care or hospitalization time), there was no significant difference between the severe and non-severe predicted phenotypes.

Discussion

This study is the first one to investigate and compare the relationship between the gut microbiome and COVID-19 in two cohorts from two different continents, including Africa, which was studied for the first time in this context. Using shotgun metagenomic analysis of 316 stool samples from Moroccan and French subjects, we showed that the gut microbiota and its functions were disturbed in patients with COVID-19 infection, and in an even greater way in patients with severe phenotype. These alterations were characterized by reduced diversity, depletion of beneficial bacteria, increased pathobionts, and alterations in several microbiome functions, including tryptophan-related metabolites. Using random forest machine learning, we further show that the severity of the disease can be accurately identified by incorporating these bacteria into a model. However, the predictive model generated for the Moroccan population was not applicable to the French one, suggesting either a “population specificity” in microbiome-based predictors or an effect of differences in treatment between the two populations.

The French and Moroccan populations differed in several ways. Firstly, the composition of their microbiota was very different: Firmicutes dominated in the Moroccan population, while Bacteroidetes dominated in the French population. Secondly, patients with COVID-19 were not under the same treatment at sampling. In particular, while all samples from Moroccan patients were obtained before the initiation of a standardized treatment, samples from French patients were obtained at different time points, and many patients were already under treatment, including antibiotics that disturb the gut microbiota29. This led to significant heterogeneity within the French cohort. Despite these differences, we observed many similarities in the gut microbiota disruption associated with COVID-19 infection in the two populations. The relative abundance of SCFA-producing bacteria, especially butyrate producers, including F. prausnitzii, D. welbionis, I. butyriciproducens, C. aerofaciens, and C. comes, was negatively associated with COVID-19 infection30. Also, the abundance of bacteria producing secondary metabolites, such as F. plautii and A. equolifaciens, was decreased in COVID-19 patients. Some of these signals have also been observed in other studies investigating patients from Asia and North America, particularly with regard to SCFA-producing bacteria and those with immunomodulatory potential10,15,16,17,18,31. In contrast, the abundance of known pathogenic bacteria was increased in infected patients, in accordance with previous studies10,11,16, but several were identified here, for the first time, in the Moroccan cohort (e.g., Klebsiella species, Bacteroides ovatus).

The severity of the disease seems to be linked with gut microbiota alterations, since the gut microbiota of patients with severe COVID-19 was more altered than that of non-severe ones. These data align with previous studies, particularly regarding the decreased abundance of beneficial microorganisms, including F. prausnitzii, R. hominis, and C. comes9,10,15,16. Our machine learning model based on bacterial genera showed that the severity of COVID-19 can be identified with high accuracy (AUC = 96%) from relatively simple microbiome data. Nevertheless, this predictive model, which was generated and validated in the Moroccan population with the VSURF approach, did not work well on the French population. This might be due to the significant heterogeneity within the French population, notably the effect of underlying treatments, including antibiotics. Another possibility is that a model generated and validated in a given population is poorly applicable to another population with a very different gut microbiota structure. Including in the model only bacterial genera that are similarly associated with the disease in all studied populations, as we did here, could be successful in other contexts. Studies have already shown that it is possible to identify COVID-19 severity in a given population based on clinical data15,32,33,34, but these approaches appear to be much less accurate than our microbiome-based one. Recently, Nguyen et al.15 created a random forest classifier that discriminates between moderate and severe cases of COVID-19 using only gut microbial features. They also tried to include clinical features, but this did not improve the classification accuracy of the model.

As a whole, functions identified in microbiota from France and Morocco were significantly different. This may seem surprising, as functional redundancy has been observed in microbial systems even when the communities are different35,36, but it likely illustrates the very different environments in which French and Moroccan subjects are living. In the two populations, COVID-19 infection was associated with a disturbance of the microbiota functional pathways and their relative abundance. As for the taxa, functions identified in severe patients were more disrupted than in non-severe patients. Of note, the L-tryptophan biosynthesis pathway was one of the most positively associated with COVID-19 infection. Guided by this result, we performed targeted metabolomics focusing on tryptophan metabolism and showed for the first time clear alterations associated with COVID-19 infection and a link with the microbiome. Our metabolomics analysis showed that both microbiome and host tryptophan metabolism pathways are associated with COVID-19 infection. Several metabolites from the kynurenine pathway, known to be activated in inflammatory processes, were higher in COVID-19 patients and even more so in severe COVID-19 patients, as previously observed by others24,25,37,38,39. More surprisingly, we observed that indoles were also increased in patients with COVID-19 infection and even more so in severe COVID-19 patients. Indoles are produced from tryptophan by members of the gut microbiota. In several diseases involving an alteration of the gut microbiome, such as IBD28 or metabolic syndrome40, their production is impaired, and this impairment is thought to play a role in the pathogenesis. In the present case, the increased production of indoles might be related to the higher capacity of colonic microorganisms to produce tryptophan (as shown by shotgun results, Fig. 4c), which is then available for the production of indoles. Interestingly, these microbiota-derived indoles are found in the systemic circulation and are known to activate AhR, which plays a role in the anti-viral immune response41, suggesting a potential role in COVID-19 infection.

Finally, our study demonstrates for the first time that alterations of the gut microbiota composition and function are associated with COVID-19 infection in both European and African populations. Moreover, our shotgun metagenomics-based analysis provides knowledge on both taxa and functions altered in COVID-19-infected patients. Infection with COVID-19 is associated with a profound disruption of the microbiota composition and function, including pathways central to the host, such as tryptophan metabolism. Altogether, although our study cannot establish causality, the results support a role of the gut microbiota in COVID-19 infection severity and suggest it could be targeted from a preventive or therapeutic perspective.

Methods

Moroccan cohort

Consecutive Moroccan patients with confirmed COVID-19 (123 patients), including severe and non-severe forms, were admitted to Cheikh Zaïd Hospital (Rabat, Morocco) from 07 Jul 2020 to 09 Oct 2020 (first three waves of the COVID-19 pandemic) (CEFCZ/PR/2020-PR04). Patients were categorized as having severe COVID-19 on the basis of the clinical criteria of the American Thoracic Society guidelines for community-acquired pneumonia42. Forty healthy subjects were recruited in the same geographical area.

Serum was obtained from healthy subjects and patients with COVID-19 in accordance with the guidelines of the Ethics Committee of Cheikh Zaïd Hospital of Rabat. Each stool sample was stored at −80 °C until analysis. Stool samples from COVID-19 patients were collected at the time of hospital admission before any treatment was started.

French cohort

French patients with confirmed COVID-19 (total of 77 patients) were admitted to the “Assistance Publique-Hôpitaux de Paris” network of 39 hospitals of the Greater Paris area (20 April 2020 to 22 Jan 2021, first three waves of the COVID-19 pandemic). These patients were part of the COVIDeF cohort. Fecal samples were collected in RNAlater at Days 0, 3, 7, or hospital discharge, and stored at −80 °C until analysis. For most of the patients, only one sample was available and analyzed. For machine learning analysis, only the first sample was considered for a given patient. Informed consent was obtained from the patients, or from their relatives in case of inability to consent. The study was approved by a research ethics committee (CPP Ile de France XI, advice N°20026-80727, Clinical Trial number NCT04352348). Sixty-two healthy subjects (without symptoms) were recruited in the framework of the Suivitheque study (Comité de Protection des Personnes Ile-de-France IV, IRB 00003835, registration number 2012/05NICB).

Patients and the public were not involved in this study.

Sequencing

Stool samples were resuspended in MGIEasy Stool Sample Collection Kit (1000005265/1000003702), and DNA extraction was performed using MagPure Stool DNA LQ Kit (384 RXN). A 700 µl aliquot of the stool sample suspension was transferred to 2.0 ml deepwell plates containing MagPure grinding beads using MGI-STP7000, and extraction was performed using the MGI-SP960 automation robot. Cell lysis was performed by bead-beating the plate for 1 min at 1600 rpm, followed by thermal lysis at 65 °C for 20 min. For DNA purification, 340 µl of sample was used.

DNA libraries were prepared using MGIEasy FS DNA Library Prep Set (1000006988), circularized using MGIeasy circularization module V2.0 (1000005260), and sequenced on DNBSEQ-T7 using High-throughput Sequencing Set (FCL PE150) (1000016106).

Read processing and quality control

The size of the sequenced paired-end libraries ranged from 57,705,898 bp to 261,629,626 bp, representing a total of over 43 billion 150 bp reads. Read quality was checked with FastQC (version 0.11.9), and low-quality reads and sequencing adapters were removed using Trimmomatic43 (version 0.39). Reads shorter than 75 base pairs were discarded. Host reads were removed using KneadData with default parameters (version 0.10.0; http://huttenhower.sph.harvard.edu/kneaddata) by mapping reads to the Homo sapiens reference database44 (build hg37dec_v0.1).

Taxonomic profiling and macrodiversity calculations

The samples were taxonomically profiled using Kraken45 (version 2.1.2) with the “PlusPF” database (05/2021). The number of reads originating from each species was then estimated by Bracken46 (version 2.5). For the subsequent analyses (except for alpha-diversity calculations), the abundance of each taxon present in a sample was normalized to relative abundance to allow sample-to-sample comparison. Taxa whose average abundance and prevalence were less than 0.1% and 3%, respectively, were discarded.
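As an illustration of the normalization and filtering step, the following minimal Python sketch converts per-sample read counts to relative abundances and discards low-abundance, low-prevalence taxa. It is not the code used in the study (the analysis was run in R). The function names, the definition of prevalence as the fraction of samples in which a taxon is detected, and the reading that a taxon is discarded only when both thresholds are missed are assumptions made for the example.

```python
def relative_abundance(counts):
    """Convert one sample's {taxon: read count} into {taxon: fraction of reads}."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def filter_taxa(samples, min_mean_abundance=0.001, min_prevalence=0.03):
    """Return relative-abundance profiles restricted to the retained taxa.

    Assumption: a taxon is discarded when its mean relative abundance is
    below 0.1% AND it is detected in fewer than 3% of samples.
    """
    profiles = [relative_abundance(sample) for sample in samples]
    n_samples = len(profiles)
    taxa = set().union(*profiles)
    kept = set()
    for taxon in taxa:
        values = [profile.get(taxon, 0.0) for profile in profiles]
        mean_abundance = sum(values) / n_samples
        prevalence = sum(value > 0 for value in values) / n_samples
        if not (mean_abundance < min_mean_abundance and prevalence < min_prevalence):
            kept.add(taxon)
    # Abundances are not renormalized after filtering in this sketch.
    return [{taxon: profile.get(taxon, 0.0) for taxon in sorted(kept)}
            for profile in profiles]
```

Whether the two thresholds are combined with "and" or "or" changes how many taxa survive, so that choice should be checked against the original analysis before reuse.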

Taxonomic profiles were analyzed in R (version 4.0.5) using the Phyloseq package47 (version 1.34.0). Statistical analyses were performed using rstatix48 (version 0.7), and figures were plotted with the ggplot249 and ComplexHeatmap50 packages.

Principal coordinates analysis (PCoA) was carried out on the Bray-Curtis dissimilarity matrices constructed from the abundance of species. Differences between the groups that emerged were tested using a PERMANOVA test with the Vegan package51 (version 2.5–7).
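For readers unfamiliar with the metric, the Bray-Curtis dissimilarity between two samples a and b is the sum over taxa of |a_i − b_i| divided by the sum over taxa of (a_i + b_i). The Python sketch below computes it for abundance profiles stored as dictionaries and builds the pairwise matrix on which an ordination would operate. It is only an illustration with assumed function names; the ordination itself and the PERMANOVA test are not reproduced here.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    # Two empty profiles are treated as identical (a convention of this sketch).
    return numerator / denominator if denominator > 0 else 0.0


def dissimilarity_matrix(profiles):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = bray_curtis(profiles[i], profiles[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

The value is 0 for identical profiles and 1 for profiles that share no taxa.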

Multivariable associations between microbial community abundance and disease status (COVID-19 infection status and severity were analyzed independently) were examined with MaAsLin252, with each population analyzed independently. The following potential confounding factors were considered in each analysis: gender, age, presence of diabetes, obesity, hypertension, smoking status, and treatment with chemotherapy, immunosuppressant, metformin, or antibiotics.

Functional annotation

Functional potential analysis of the metagenomic samples (pathway profiles and gene-family abundances) was performed using HUMAnN353 (version 3.0.0, UniRef database release 07-2021). Multivariable associations and PCoA were calculated in the same way as described above.

Spearman correlation analyses were conducted to associate L-tryptophan metabolites and microbiota species using the R package energy (version 1.7–8). Correlations with adjusted p values < 0.1 (Benjamini-Hochberg procedure) were considered significant.
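The two ingredients of this step, a rank-based correlation coefficient and the Benjamini-Hochberg adjustment, can be sketched in a few lines of Python. This is an illustrative reimplementation with assumed function names, not the R code used in the study. The p values themselves are taken as given, because computing them requires a reference distribution that is outside the scope of the sketch.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

With these helpers, a metabolite-species pair would be retained when its adjusted p value falls below 0.1, the threshold stated above.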

Targeted quantitative metabolomics

Targeted quantitative metabolomics was performed on all Moroccan patients (n = 123) and healthy subjects (n = 40). Samples were lyophilized (3 mg) and weighed. The quantification of 20 tryptophan metabolites was based on liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS), and has been described previously54.

Machine Learning

A random forest method was used to build a model that can identify factors associated with severity in Moroccan COVID-19 patients. First, the R package VSURF55 (version 1.1.0) was used to perform variable selection and prediction from the normalized table of abundance of bacterial genera of the Moroccan population. To avoid overfitting of the data, 80% of the initial population was randomly sampled several times, and VSURF was trained on each subsample. The final score of each variable was the average value obtained over all subsamples. The final model only contained variables that had been selected at least 50% of the time. These variables were then ranked in decreasing order of importance to decide how many would be kept in the final model. This number was chosen by determining which subset of variables gave the best prediction score and what minimum number of variables was needed to reach a reasonable performance.
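The aggregation across subsamples can be sketched as follows in Python. The sketch assumes that each subsample run yields a dictionary mapping the selected variables to an importance score. The function name is hypothetical, and counting an unselected variable as zero when averaging is an assumption; the text does not specify how unselected runs enter the average.

```python
def stable_variables(runs, min_frequency=0.5):
    """Keep variables selected in at least `min_frequency` of the runs.

    `runs` is a list of {variable: importance} dictionaries, one per
    subsample. Returns (variable, mean importance) pairs sorted by
    decreasing mean importance. Assumption: a variable that was not
    selected in a run contributes 0 to its mean.
    """
    n_runs = len(runs)
    totals, counts = {}, {}
    for run in runs:
        for variable, importance in run.items():
            totals[variable] = totals.get(variable, 0.0) + importance
            counts[variable] = counts.get(variable, 0) + 1
    kept = [(variable, totals[variable] / n_runs)
            for variable in totals
            if counts[variable] / n_runs >= min_frequency]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

The ranked list returned here corresponds to the ordering from which the final number of variables would then be chosen.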

The performance criterion considered was the area under the Receiver Operating Characteristic (ROC) curve, referred to as AUC. For a subsample of 80% of the initial population, the algorithm described above was applied, and the outcomes were then predicted for the remaining 20% of the population and compared with the actual outcomes. This procedure also allowed the computation of empirical confidence intervals, the bounds of which enclosed 95% of the AUC values obtained over all subsamples. The R package pROC56 (version 1.18.0) was used to create the ROC curves.
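A minimal Python sketch of this evaluation loop is given below. The AUC is computed from its rank interpretation (the probability that a severe case receives a higher score than a non-severe case, with ties counted as one half), and the empirical interval is taken from the sorted AUC values. The `fit_predict` argument is a hypothetical placeholder for the variable selection and random forest step, which is not reimplemented here. The number of repeats, the seed, and the percentile convention are likewise assumptions.

```python
import math
import random


def auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def repeated_holdout_auc(labels, fit_predict, n_repeats=100,
                         train_fraction=0.8, seed=0):
    """AUC on the held-out 20% for repeated random 80/20 splits.

    `fit_predict(train_indices, test_indices)` must return one score per
    test index; it stands in for the model used in the study.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    n_train = round(train_fraction * len(indices))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        test_labels = [labels[i] for i in test]
        if len(set(test_labels)) < 2:
            continue  # AUC is undefined when only one class is held out
        aucs.append(auc(test_labels, fit_predict(train, test)))
    return aucs


def empirical_interval(values, coverage=0.95):
    """Bounds enclosing roughly `coverage` of the values (simple percentiles)."""
    ordered = sorted(values)
    tail = (1 - coverage) / 2
    lower = ordered[math.floor(tail * (len(ordered) - 1))]
    upper = ordered[math.ceil((1 - tail) * (len(ordered) - 1))]
    return lower, upper
```

Calling `empirical_interval(repeated_holdout_auc(labels, fit_predict))` would then give the kind of interval described above, under the stated assumptions.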