Main

Chronic kidney disease (CKD) affects 10–15% of the global population and is the tenth leading cause of death, responsible for ~1.5 million deaths annually1. CKD is marked by a gradual decline in kidney function or kidney damage lasting over 3 months2. The Kidney Disease Improving Global Outcomes (KDIGO) classification recognizes 5 stages of this condition based on glomerular filtration rate (GFR), with stage 5 being near-total kidney failure necessitating kidney replacement therapy (KRT) such as dialysis or transplantation for survival3. In advanced stages of CKD, diminished kidney function leads to significant accumulation of uraemic toxins (UTs) such as indoxyl sulfate (IxS), p-cresyl sulfate (pCS) and p-cresyl glucuronide (pCG) in the systemic circulation, contributing to the uraemic environment and increased cardiovascular risk4,5. The precursors of these UTs, indole and p-cresol (pC), are produced by gut microbiota through fermentation of endogenous and dietary amino acids4.

A surge in interest in the CKD-associated microbiome has led to the identification of various taxonomic changes linked to CKD over the past decade6,7,8,9,10,11,12,13,14,15,16. However, these findings are far from uniform, with reported taxonomic shifts varying widely across studies. Conflicting associations for taxa such as the mucin-degrading Ruminococcus torques7,14 exemplify this heterogeneity and highlight the incoherence of taxonomic signatures in CKD. Adding another level of complexity are the inherent diversity among individuals with CKD, differences in study inclusion and exclusion criteria, and methodological differences, all of which can influence study outcomes. Given these multiple layers of complexity, standardized workflows that cover the whole process from patient recruitment and sample collection to data processing, together with covariate control specific to CKD microbiome studies, are imperative17.

Historically, the field of gut microbiology has leaned heavily on relative microbiome profiling (RMP), which presents microbial populations in relative proportions18. While normalization algorithms addressing the compositional nature of the microbiome or absolute count prediction approaches from sequencing data have been developed19,20, RMP can obscure crucial disease-associated variations in microbial abundance. By integrating experimental cell counting techniques (for example, flow cytometry) in high-throughput sequencing workflows, the resulting quantitative microbial profiling (QMP) enables the representation of microbial communities in absolute values, reducing both false-positive and false-negative associations in downstream analyses18.
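The distinction between relative and quantitative profiling can be illustrated with a minimal sketch. The function names, taxa, read counts and microbial loads below are hypothetical and are not taken from this study. The sketch only assumes that a quantitative profile is obtained by scaling relative proportions by a measured microbial load, and it omits the other processing steps of a full QMP workflow.

```python
def relative_profile(counts):
    """Convert raw read counts per taxon into relative proportions (RMP-style)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def quantitative_profile(counts, cells_per_gram):
    """Scale relative proportions by a measured microbial load (simplified QMP)."""
    rel = relative_profile(counts)
    return {taxon: p * cells_per_gram for taxon, p in rel.items()}


# Hypothetical samples with identical read proportions but different loads.
sample_a = {"Bacteroides": 600, "Faecalibacterium": 400}
sample_b = {"Bacteroides": 600, "Faecalibacterium": 400}
load_a = 1.0e11  # cells per gram, illustrative value only
load_b = 5.0e10  # cells per gram, illustrative value only

rmp_a = relative_profile(sample_a)
rmp_b = relative_profile(sample_b)
qmp_a = quantitative_profile(sample_a, load_a)
qmp_b = quantitative_profile(sample_b, load_b)
```

In this toy case the two relative profiles are indistinguishable, whereas the quantitative profiles differ twofold for every taxon, which is the kind of variation in abundance that RMP alone cannot show.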

Furthermore, while earlier studies predominantly utilized 16S ribosomal (r)RNA gene amplicon sequencing, more recent efforts employ whole-genome shotgun metagenomics to obtain a more in-depth view of composition and the functional potential of the gut microbiota. These studies often rely on databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG)21, which may provide incomplete insights into the genetic potential of microbial communities, particularly concerning pathways relevant for CKD.

Here we used 16S rRNA gene profiling to characterize broad compositional patterns and ecological structure in a cohort of 130 individuals across CKD stages and non-CKD controls. We complemented these analyses with data processed according to the QMP approach18, while adjusting for host-related explanatory factors of microbiota variation, to profile a clinically well-characterized cohort representing the different stages of CKD. We linked taxonomic variation and inferred functional profiles to host-related markers and plasma concentrations of UTs and their precursors. Our newly generated CKD-specific microbial pathway interpretation framework was applied to investigate clinically and ecologically relevant gut microbiome patterns in this heterogeneous patient population. We complemented our cohort-specific results with a cross-study biomarker analysis based on 11 previous studies. Finally, we explored potential associations of gut microbiome composition and functional capacity with individual CKD progression over a 4-year follow-up.

Our study identified intestinal transit time (ITT) as the primary driving force of microbial variance in this CKD cohort and pointed to an overall shift towards a more proteolytic microbial metabolism in advanced stages of CKD. No microbial markers for CKD progression were found.

Results

Host and microbial community feature variations in CKD

In this cohort of 130 participants (109 non-dialysed CKD patients, 10 CKD 5 patients on peritoneal dialysis (PD) and 11 non-CKD controls), faecal calprotectin concentrations were higher in the CKD 5(PD) group (48.73 μg g−1 [interquartile range (IQR), 33.49–117.66]) compared with the control (9.94 μg g−1 [IQR, 5.69–16.58]; Padj < 0.05). Faecal moisture content was reduced in CKD 5(PD) (50.92% [IQR, 42.64–65.98]) compared with CKD 1/2 (72.55% [IQR, 63.59–73.90]; Padj < 0.05), suggesting a prolonged ITT. No significant differences in microbial load were detected between any of the groups. The number of medications classified by the Anatomical Therapeutic Chemical (ATC) subgroup22 increased with CKD stage from 1.22 ± 1.20 in controls to 9.52 ± 4.38 in CKD 4/5 and 10.8 ± 2.66 in CKD 5(PD) (Extended Data Fig. 1). Antibiotic use was significantly higher in CKD 5(PD) compared with all other groups (Padj < 0.05). Serum urea and creatinine and all serum UTs were increased in advanced disease stages (Supplementary Table 1).

To capture gut microbiota composition patterns in CKD, we performed principal coordinates analysis (PCoA) on Bray–Curtis dissimilarities of genus-level 16S rRNA gene profiles. The first 18 principal coordinates (PC) explained 90% of the total community variance, with PC1 and PC2 explaining 20.8% and 15.8%, respectively (Extended Data Fig. 2a). Notably, PC2 was positively associated with faecal moisture content and short-chain fatty acids, and negatively associated with faecal pC and plasma p-cresyl conjugates (all Padj < 0.001; Fig. 1a), as well as with Clostridiales and Alistipes, while showing positive association with Faecalibacterium. Together, these indicate a functional shift from saccharolytic towards proteolytic microbial metabolism23,24.
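As a point of reference for the ordination input, the Bray–Curtis dissimilarity between two samples is the sum of absolute abundance differences divided by the sum of all abundances. The sketch below computes a pairwise dissimilarity matrix in plain Python. The genus counts are invented for illustration, and the subsequent ordination step (PCoA) is not shown.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


def distance_matrix(samples):
    """Pairwise Bray-Curtis dissimilarities for a list of samples."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical genus-level counts (rows: samples, columns: genera).
genus_counts = [
    [10, 0, 5],
    [10, 0, 5],
    [0, 8, 0],
    [5, 5, 5],
]
dm = distance_matrix(genus_counts)
```

Identical samples yield a dissimilarity of 0, and samples sharing no genera yield 1.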

Fig. 1: Enterotype profiles are associated with clinical parameters in CKD.

a, Heat map of Spearman rank correlations (ρ) between principal coordinates that showed at least one significant association with any clinical or microbial feature. Blue tiles indicate positive correlations; red tiles indicate negative correlations. P values were adjusted for multiple comparisons using Benjamini–Hochberg (BH) correction; *Padj < 0.05, **Padj < 0.01, ***Padj < 0.001 (two-sided Spearman correlation). b, Proportional abundance of 16S rRNA gene genus-based enterotypes (CKD cohort n = 130, FGFP cohort n = 211). Differences in enterotype proportions by CKD stage were tested against non-CKD groups using two-sided Fisher’s exact test. P values shown were adjusted for multiple testing using the BH procedure. Only significant comparisons (FGFP_Control versus CKD 4/5 (Padj: Bact1 = 0.0000614; Rum = 0.0108); FGFP_Control versus CKD 5(PD) (Padj: Bact1 = 0.0452; Bact2 = 0.0000356, Prev = 0.000223); Control versus CKD 4/5 (Padj: Bact1 = 0.000000447, Rum = 0.0108); Control versus CKD 5(PD) (Padj: Bact1 = 0.00179, Bact2 = 0.0000356, Prev = 0.0169)) are displayed. c, Faecal moisture content (%) across enterotypes. Boxplots show the median and interquartile range (box; 25th–75th percentiles); whiskers extend to 1.5× the IQR. Group differences were assessed using Dunn’s test with BH adjustments for multiple comparisons; *Padj < 0.05, **Padj < 0.01. d, Association between enterotype prevalence and eGFR in individuals with CKD not on peritoneal dialysis (PD) (multinomial logistic regression, n = 109, likelihood-ratio χ² test, two-sided, P = 0.007). e, Residuals from the linear regression between eGFR and faecal moisture content across the enterotypes in individuals with CKD not on PD (n = 109). Boxplots show the median and interquartile range (box; 25th–75th percentiles); whiskers extend to 1.5× the IQR. Group differences were assessed using Kruskal–Wallis test with BH adjustment; Padj > 0.05. K–W, Kruskal–Wallis. 
f, Residuals from the linear regression between faecal moisture content and eGFR across enterotypes in individuals with CKD not on PD (n = 109). Boxplots show the median and interquartile range (box; 25th–75th percentiles); whiskers extend to 1.5× the IQR. Differences between the groups were assessed using Dunn’s test with BH correction for multiple testing; *Padj < 0.05, **Padj < 0.01. Obs, observed; Exp, expected.

Ruminococcus enterotype increases across CKD stages and tracks ITT, whereas the dysbiotic Bacteroides 2 enterotype is enriched in PD patients

To capture recurring compositional patterns of gut microbiota configurations underlying the continuous gradients observed, we next applied 16S rRNA gene-based community typing to summarize non-discrete, dominant modes of microbial community structure (enterotypes)25, which have been shown to capture reproducible associations with environmental and host factors in health and disease25. Dirichlet-multinomial mixture (DMM) analysis identified four enterotype configurations: Ruminococcus (Rum), Bacteroides 1 (Bact1), Bacteroides 2 (Bact2) and Prevotella (Prev), consistent with previously reported enterotype structures25,26, and reproducible under random subsampling and 10-fold cross-validation (Extended Data Figs. 2b and 3b, and Supplementary Table 2). Enterotype distribution differed significantly between non-CKD control and CKD groups (Fig. 1b and Supplementary Table 3). The CKD 5(PD) group showed a reduced prevalence of Bact1 and increased prevalence of the inflammation-associated Bact2 enterotype27,28 (Padj < 0.05), with Prev being absent in this group. Bact2 was detected in 50% of CKD 5(PD) individuals, making it the dominant configuration. This pattern was replicated in an independent Austrian cohort6 (n = 36), in which CKD (PD) also exhibited a higher prevalence of Bact2 compared with controls (Padj < 0.05; Extended Data Fig. 2d and Supplementary Table 3). This pattern is also consistent with previous research linking PD treatment to systemic inflammation29 and associating Bact2 with inflammation-related conditions28,30. The rise in Bact2 in CKD 5(PD) aligned with elevated calprotectin concentrations (Supplementary Table 1) and a positive correlation between plasma C-reactive protein and faecal calprotectin (P = 0.038). In line with previous findings28,31, Bact2 individuals also demonstrated reduced microbial load and species richness (Padj < 0.05; Supplementary Table 3).

Across progressive CKD stages, Bact1 prevalence decreased (CKD 3 and CKD 4/5 versus Control; both Padj < 0.05), whereas Rum prevalence increased (CKD 4/5 versus Control; Padj < 0.05). Similar dynamics across the Bact1–Rum axis were observed in age, sex and body mass index (BMI)-matched participants from the Flemish Gut Flora Project32 (FGFP; n = 211), indicating that these changes exceed expected variation within the non-CKD population33 (Fig. 1b and Supplementary Table 3).

Moreover, Rum-dominated communities exhibited higher microbial load and species richness, and the lowest moisture content (Padj < 0.05; Fig. 1c and Supplementary Table 3). These results are consistent with PCoA-based associations (Fig. 1a) in which Rum-associated taxa (Extended Data Fig. 3b) were negatively correlated with PC2 values, and lower PC2 scores were associated with decreased faecal moisture (Fig. 1a). In non-dialysed individuals with CKD, enterotype distribution along the Bact1–Rum axis increased with estimated GFR (eGFR) (Pr(Chi) = 0.007; Fig. 1d and Supplementary Table 3). However, this trajectory attenuated after adjustment for faecal moisture (Padj = 0.08; Fig. 1e and Supplementary Table 3). Conversely, the association between enterotype distribution and faecal moisture remained intact after adjusting for eGFR (Padj < 0.05; Fig. 1f and Supplementary Table 3). These relationships were replicated in FGFP participants32 with reduced kidney function (n = 990), further reinforcing the robustness of our results (Supplementary Table 3). This observation is in line with previously reported ITT-associated ecosystem maturation across the Bact1–Rum axis in healthy individuals33,34,35, and aligns with the ecological gradient captured by PC2, which similarly links reduced faecal moisture with proteolytic community composition.
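The adjustment logic used here, comparing groups on what remains of one variable after regressing out another, can be sketched with a simple least-squares fit. The helper name and the values below are hypothetical. The sketch fits a single-predictor linear regression and returns its residuals, which could then be compared across enterotypes.

```python
def ols_residuals(x, y):
    """Residuals of y after a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical values: faecal moisture (%) and eGFR for four participants.
moisture = [50.0, 60.0, 70.0, 80.0]
egfr = [20.0, 45.0, 55.0, 80.0]

# Part of eGFR not explained by faecal moisture.
egfr_adjusted = ols_residuals(moisture, egfr)
```

By construction the residuals sum to zero and are uncorrelated with the regressor, so any remaining group difference cannot be attributed to a linear effect of the variable that was regressed out.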

Reduction of species richness in the dysbiotic PD group

To further characterize microbiome alterations in CKD, we analysed shotgun metagenomic data using QMP species-level abundance profiles. Species richness was reduced in the CKD 5(PD) group compared with CKD 4/5 (Padj < 0.05), whereas in non-dialysed individuals, richness negatively correlated with eGFR (P < 0.05; Fig. 2a). Shannon diversity and Pielou’s evenness did not differ between CKD groups.
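For readers less familiar with these indices, the following sketch computes observed richness, the Shannon index (natural logarithm) and Pielou's evenness from a vector of species abundances. The abundance vectors are invented, and the choice of logarithm base is an assumption that may differ from the one used in the analysis.

```python
import math


def richness(abundances):
    """Observed richness: number of species with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon(abundances):
    """Shannon index H' = -sum(p_i * ln p_i) over species present."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample has no counts")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def pielou(abundances):
    """Pielou's evenness J' = H' / ln(S); undefined for fewer than 2 species."""
    s = richness(abundances)
    if s < 2:
        raise ValueError("evenness is undefined for fewer than 2 species")
    return shannon(abundances) / math.log(s)


# Hypothetical communities with equal richness but different evenness.
even_community = [25, 25, 25, 25, 0]
skewed_community = [97, 1, 1, 1, 0]
```

Both toy communities contain four species, but the skewed community has a much lower Shannon index and evenness, showing that richness can differ between groups while the other two indices do not, and vice versa.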

Fig. 2: Microbial species diversity across CKD stages.

a, Alpha diversity indices (observed richness, Shannon index, Pielou’s evenness) across CKD cohort groups (n = 130). Boxplots show the median and interquartile range (box; 25th–75th percentiles); whiskers extend to 1.5× the IQR, with individual data points overlaid. Group differences were assessed with Kruskal–Wallis test, followed by two-sided Dunn’s post hoc test with BH correction for multiple testing; **Padj < 0.05. The grey shaded area highlights samples included in the correlation analysis between richness and eGFR, which was assessed using Spearman rank correlation (r = −0.223; exact P = 0.0147). b, Beta diversity based on species-level Bray–Curtis dissimilarities between samples categorized by disease group. Ordination was performed using PCoA. Ellipses represent 95% confidence intervals around group centroids. Differences between groups were assessed using Kruskal–Wallis test. Axis 1, 6.55% variance explained; Axis 2, 5.41% variance explained.

The overall species composition differed across CKD stages based on Bray–Curtis dissimilarities (P = 0.02), with comparable within-group dispersion (P = 0.5; Fig. 2b). Pairwise differences did not remain significant after multiple-testing correction (Supplementary Table 4).
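The effect of multiple-testing correction on nominally significant comparisons can be shown with a short implementation of the Benjamini–Hochberg (BH) step-up adjustment. The P values below are hypothetical and serve only to show how raw values under 0.05 can exceed that threshold once adjusted.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four pairwise comparisons.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
```

Here the raw values 0.03 and 0.04 both adjust to about 0.053, so only the first comparison would remain below 0.05 after correction.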

Faecal moisture content and medication are key covariates in the association between microbiota and CKD stages

To identify host-related factors affecting microbiota variations, we performed a distance-based redundancy analysis (db-RDA) on metagenomic profiles. After data curation, the included variables (Supplementary Table 5) explained 9.2% of between-sample variation (Fig. 3a), consistent with previous studies30,33,36. Thirteen non-redundant host variables accounted for a cumulative effect size of 5.3% (Fig. 3b), with faecal moisture content emerging as the strongest contributor (Padj < 0.05).

Fig. 3: Covariates influencing microbiome composition in the CKD cohort.

a, Effect size of univariate db-RDA models grouped on the basis of metadata categories (medications, faecal parameters, health status, blood parameters, anthropometrics, diet and lifestyle). Effect sizes represent the proportion of variance explained by each category, assessed using PERMANOVA. b, Cumulative effect size of non-redundant covariates selected by stepwise db-RDA (right bars) compared with individual effect sizes assuming independence (left bars). Effect sizes are expressed as percentage of variance explained and were evaluated using permutation-based forward selection (up to 9,999 permutations). c, PCoA based on Bray–Curtis dissimilarity showing host factors and microbial metabolites associated with microbial community composition in the CKD cohort (n = 117). Samples with >10% missing values were excluded before the analysis. Arrows represent fitted covariate vectors from db-RDA, with direction indicating the gradient of increasing values and length proportional to the effect size. Ordination axes show the percentage of variance explained (Axis 1, 7.2%; Axis 2, 5.3%).

In addition to the previously identified moisture content covariate30,33, eGFR also contributed significantly to the cumulative model, alongside medication use (psychoanaleptics, drugs used in diabetes, psycholeptics, anti-anaemic preparations and antithrombotic agents), axial spondyloarthritis (AxSpA), faecal uric acid and calprotectin concentrations, and a diabetic diet (Supplementary Table 5 and Extended Data Fig. 4).

Within this CKD cohort, eGFR, faecal short-chain fatty acids (SCFAs) and faecal moisture content were oppositely associated with microbiome composition compared with the protein-bound uraemic toxin (PBUT) precursor pC and its plasma conjugates (Fig. 3c). When excluding the CKD 5(PD) group to minimize KRT-related effects, faecal moisture content remained the strongest non-redundant covariate. Psychoanaleptics, AxSpA, drugs used in diabetes and faecal uric acid also remained significant contributors (Supplementary Table 5).

Covariate-controlled quantitative markers of CKD severity show E. coli and Alistipes sp. enrichment and B. adolescentis depletion

We next assessed associations between species abundances and eGFR using covariate-adjusted models (Supplementary Table 5). In QMP analyses, the abundances of E. coli (ref_mOTU_v3_00095) and Alistipes sp. incertae_sedis (meta_mOTU_v3_12829) were negatively associated with eGFR, whereas B. adolescentis (ref_mOTU_v3_02703) showed a positive association (all Padj = 0.08; Fig. 4 and Supplementary Table 6). Notably, the latter only becomes apparent after adjustment for host variables, highlighting the influence of inter-individual differences in host factors in CKD.

Fig. 4: Comparison of CKD microbial signatures across previous studies and the present study.

The first two panels summarize microbial species previously reported in more than one study over the past decade, which are positively or negatively associated with CKD stage or eGFR, based on 16S rRNA gene sequencing or shotgun metagenomics. These previously reported associations are benchmarked against results from the current cohort obtained using relative microbiome profiling (RMP) and quantitative microbiome profiling (QMP) methods, shown with and without adjustment for covariates in the last two panels. Red and blue boxes indicate enrichment in the CKD group or the control group, respectively. Covariate adjustment levels are indicated as follows: #demographic and microbiome-related factors, socioeconomic, behavioural and cardiometabolic factors; ##hypertension, diabetes, serum albumin, serum haemoglobin; ###faecal moisture content, psychoanaleptics, faecal uric acid concentration, axial spondyloarthritis, drugs used in diabetes. Species detected at abundances above the overall mean and present in >10% of samples in the current cohort are indicated with a bullet point (·).

Previous studies have reported reduced abundances of butyrate-producing taxa including Faecalibacterium6,7,8,9,12,16 and Roseburia7,8,9,10,16 in CKD. Consistent with this, RMP analysis identified a positive association between Roseburia hominis (ref_mOTU_v3_00861) and eGFR (Padj = 0.08; Fig. 4 and Supplementary Table 6). However, this association disappeared after covariate adjustment and was not observed in QMP analyses. Given that Faecalibacterium and Roseburia have previously been associated with faster ITT37, these findings further highlight the importance of covariates such as faecal moisture content in CKD.

Cross-study comparison shows extremely poor replication of microbial CKD markers

We next compared our findings with previously reported CKD-associated taxa. A total of 24 species-level taxa (including those from the present study) proposed as CKD markers across 11 published cohorts6,7,8,9,10,11,12,13,14,15,16 were compiled, of which more than 75% were detected in at least 10% of samples in our cohort. However, most of these taxa neither retained significant associations with eGFR in the covariate-controlled QMP analysis, nor did they show consistent replication across multiple non-quantitative, non-covariate-controlled studies (Fig. 4 and Supplementary Table 7).

Advanced CKD stages exhibit reduced microbial capacity to utilize plant carbohydrates

Guided by PCoA-based associations suggesting a microbial community shift from saccharolytic to proteolytic metabolism with declining kidney function, we assessed the genomic potential for carbohydrate metabolism through QMP-based analysis of Carbohydrate-Active enZymes (CAZymes)38 derived from the metagenomic gene catalogue. CAZymes were broadly grouped by substrate origin (plant, animal and mucin derived)39, and their ratios were used as indirect proxies of dietary habits and of overall saccharolytic and proteolytic capacity. We observed a progressive decrease in the plant-to-animal CAZymes ratio with advancing CKD stage, with the most pronounced reduction observed from CKD stage 3 onwards, compared with non-CKD control and CKD 1/2 (Padj < 0.1; Fig. 5a and Supplementary Table 8).
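The ratio used as a dietary proxy can be sketched as follows: family-level abundances are summed by substrate origin, and the plant total is divided by the animal total. All family labels, the substrate mapping and the abundances below are placeholders invented for illustration and do not correspond to actual CAZyme families or to values from this cohort.

```python
def substrate_totals(family_abundance, family_to_substrate):
    """Sum CAZyme family abundances by their assigned substrate origin."""
    totals = {}
    for family, abundance in family_abundance.items():
        substrate = family_to_substrate.get(family)
        if substrate is None:
            continue  # families without an assigned substrate are ignored
        totals[substrate] = totals.get(substrate, 0.0) + abundance
    return totals


def plant_to_animal_ratio(totals):
    """Ratio of plant-derived to animal-derived carbohydrate utilization potential."""
    animal = totals.get("animal", 0.0)
    if animal == 0:
        raise ValueError("no animal-associated CAZymes detected")
    return totals.get("plant", 0.0) / animal


# Placeholder mapping and abundances (hypothetical, for illustration only).
family_to_substrate = {
    "fam_plant_1": "plant",
    "fam_plant_2": "plant",
    "fam_animal_1": "animal",
    "fam_mucin_1": "mucin",
}
early_stage = {"fam_plant_1": 30.0, "fam_plant_2": 30.0,
               "fam_animal_1": 20.0, "fam_mucin_1": 10.0, "fam_other": 5.0}
advanced_stage = {"fam_plant_1": 20.0, "fam_plant_2": 10.0,
                  "fam_animal_1": 30.0, "fam_mucin_1": 10.0}

ratio_early = plant_to_animal_ratio(substrate_totals(early_stage, family_to_substrate))
ratio_advanced = plant_to_animal_ratio(substrate_totals(advanced_stage, family_to_substrate))
```

A lower ratio in the second toy sample mirrors the direction of the shift described above, without implying anything about its magnitude.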

Fig. 5: Metagenomically inferred microbial functional capacity in CKD.

a, Ratios of CAZymes associated with plant- versus animal-derived carbohydrates, and mucin glycan versus plant utilization potential across progressive stages of CKD. Boxplots show the median and interquartile range (box; 25th–75th percentiles); whiskers extend to 1.5× the IQR, with individual data points overlaid. Group differences were assessed on covariate-adjusted linear model residuals using Kruskal–Wallis test, followed by Dunn’s post hoc test with BH correction for multiple comparisons; *Padj < 0.1. b, Associations of gut metabolic (GMM), gut–brain (GBM) and gut–kidney (GKM) module abundance with eGFR in individuals not on dialysis (n = 120) after partialling out the effects of significant microbiome covariates identified. Points represent the regression coefficient (β) from a generalized linear model (GLM), reflecting changes in module abundance with decrease in eGFR; horizontal black lines represent 95% confidence intervals. Only modules with BH-adjusted P < 0.1 (two-sided Wald test) are shown. c, Presence of functional modules significantly associated with eGFR within microbial species identified to be associated with kidney function within the cohort. Yellow squares, GMM; green squares, GBM; purple squares, GKM. GABA, gamma-aminobutyric acid.

Negative association of microbial pC and indole biosynthetic potential with eGFR is lost when accounting for CAZyme dietary proxy

To investigate CKD-associated microbiome functional alterations beyond taxonomic composition, we applied a module-based analytical framework for targeted profiling of microbial metabolic potential. Building on an existing framework40, we manually curated additional gut–kidney axis modules (GKM) encompassing pathways involved in CKD-associated metabolites, including PBUT precursors and SCFAs (Supplementary Table 9). After covariate adjustment, 11 modules showed increased quantitative abundance with declining eGFR (Padj < 0.1; Fig. 5b and Supplementary Table 10). Among these, putrescine metabolism (MF0082) exhibited one of the strongest associations (Padj = 0.04), likely reflecting the metabolism of undigested proteins. Two modules involved in the synthesis of PBUT precursors, pC (module gut–kidney 004 (MGK004)) and indole (MGK013), were also negatively associated with eGFR (Padj = 0.04 for both).

Notably, inclusion of the plant-to-animal CAZymes ratio as an indirect proxy of dietary habits abolished the associations between eGFR and PBUT precursor synthesis modules (MGK004: Padj = 0.3; MGK013: Padj = 0.5), reflecting the influence of dietary carbohydrates on the microbial functional repertoire, and ensuring independence between CAZyme and GKM gene annotations. In contrast, no significant associations were observed between eGFR and SCFA-related pathways, suggesting preservation of SCFA metabolic potential across CKD stages. We next examined the functional repertoires of taxa significantly associated with eGFR (Fig. 5c). Among taxa negatively associated with eGFR, E. coli encoded pathways for both pC and indole biosynthesis, whereas Alistipes sp. harboured only the indole synthesis pathway module. In contrast, B. adolescentis (positively associated with eGFR) lacked detectable pathways for PBUT precursor synthesis (Fig. 5c). To validate these findings, we screened intestinal reference genomes in the Integrated Microbial Genomes (IMG) database41. Consistent with our findings, 99% of E. coli genomes derived from the human large intestine (n = 692) encode both PBUT precursor biosynthesis pathways, while 60% of all Alistipes species genomes (n = 25) encode indole biosynthesis only (Supplementary Table 11). Analysis of 4,438 genomes from the Unified Human Gastrointestinal Genome (UHGG) catalogue42 further showed that only 4.2% and 9.9% of genomes harboured genes linked to pC and indole biosynthesis, respectively, with fewer than 1% of genomes (mostly within Bacteroidales, Oscillospiraceae and Gammaproteobacteria) encoding both pathways (Supplementary Table 12). This shows that the genomic potential for these pathways is limited to a narrow subset of taxa.

Presence of microbial pC biosynthetic potential relates to faecal pC concentrations, which are associated with its end metabolites in the plasma

We next linked genome-based functional profiling to targeted measurements of faecal and plasma UT concentrations. Faecal pC concentrations were higher in samples harbouring the pC synthesis IV module (MGK004) than in samples lacking this module (P < 0.01; Fig. 6a). Among samples in which MGK004 was present, module abundance showed a borderline positive association with faecal pC concentration (P = 0.057; Fig. 6b). In addition, faecal pC concentrations were positively correlated with circulating pCS and pCG (P = 0.0048; Fig. 6c), and negatively correlated with faecal moisture content (P < 0.01; Fig. 6d), consistent with previous reports linking ITT to UT accumulation43.

Fig. 6: Associations between the pC biosynthesis module and related end-metabolites in the non-dialysed subpopulation.

a, Faecal pC concentration in samples in which the pC biosynthesis module (MGK004) was absent versus present (n = 120). Boxplots show the median and interquartile range (box; 25th–75th percentiles); whiskers extend to 1.5× the IQR, with individual data points overlaid. Groups were compared using a two-sided Mann–Whitney U-test (exact P = 0.00036), ***P < 0.001. b, Association between MGK004 abundance (log10-transformed) and faecal pC concentrations in samples in which modules were present (n = 77). Correlation was assessed using two-sided Spearman rank correlation (r = 0.22, exact P = 0.057). c, Association between faecal pC concentrations and plasma pCS+pCG concentrations, assessed using two-sided Spearman rank correlation (r = 0.26, exact P = 0.0048). d, Association between faecal pC concentrations and faecal moisture content using two-sided Spearman rank correlation (r = −0.37, exact P = 0.00003). Shaded areas indicate 95% confidence intervals for the fitted trend lines.

In contrast, all samples harboured the genomic potential for indole synthesis I (MGK013), but module abundance was not associated with faecal indole concentration (P = 0.53). Nevertheless, faecal indole did correlate with plasma IxS (P = 0.022).
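The rank-based correlations reported in this section can be reproduced in principle with a short Spearman implementation: values are converted to ranks (averaging ranks within ties) and a Pearson correlation is computed on the ranks. The measurements below are hypothetical, and the sketch returns only the correlation coefficient, not a P value.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired measurements (arbitrary units).
faecal_metabolite = [1.0, 2.0, 3.0, 4.0, 5.0]
plasma_conjugate = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman_rho(faecal_metabolite, plasma_conjugate)
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic transformations such as the log10 scaling applied to module abundances.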

Gut microbiota composition and genomic potential are not associated with CKD progression

Among 109 non-dialysed CKD patients, 38 were categorized as progressors and 71 as non-progressors (Supplementary Table 1). Apart from age and eGFR slope, no gut–kidney axis-related variables differed significantly between the groups. Alpha diversity indices and overall microbial community composition were comparable between progressors and non-progressors. Covariate analysis identified faecal moisture content and psychoanaleptics use as significant contributors to microbiome variance in this subset. However, after adjusting for these covariates and correcting for multiple testing, no individual taxa or functional modules were significantly associated with disease progression (Supplementary Tables 6 and 10).

Discussion

CKD is marked by a gradual decline in kidney function and systemic accumulation of uraemic toxins, many of which originate from gut microbial metabolism44. Here, faecal quantitative shotgun metagenomic sequencing and covariate analysis were employed to identify compositional and inferred functional potential changes in patient microbiomes in different stages of CKD. With ITT as the primary driving force of microbial variance in this CKD cohort, an overall shift towards a more proteolytic microbial metabolism in advanced stages of CKD was observed. No microbial markers for CKD progression were found.

16S rRNA gene-based enterotype analysis showed that the Bact2 enterotype was proportionally dominant among patients with CKD 5 on peritoneal dialysis, both in our study and in a validation cohort. This enterotype profile is associated with gut dysbiosis and systemic and/or local inflammation and is significantly increased in several inflammatory and metabolic diseases28,45, as confirmed by the higher faecal calprotectin levels observed in the CKD 5(PD) group. While previous studies have reported elevated systemic inflammation markers in advanced CKD stages46, we extend these observations by demonstrating increased intestinal inflammation.

The present study noted a predominance of the Bact1 enterotype in individuals without CKD or in earlier stages of CKD, which was confirmed in independent cohorts. In contrast, patients in advanced stages showed a higher prevalence of the Rum enterotype. Consistent with previous reports linking Rum to firmer stools and proteolytic fermentation47, Rum-type participants in this cohort exhibited lower faecal moisture content. Notably, after accounting for moisture content, the association of Rum with declining kidney function was no longer evident in CKD patients, indicating a dominant role for ITT in the transition to a proteolytic environment. Accounting for host-related factors potentially obscuring meaningful microbiome associations is crucial. Among 13 host-related variables explaining gut species variation (cumulative 5.3%), faecal moisture content (proxy of ITT) emerged as the strongest individual contributor, consistent with previous observations33. Overall, collected host variables explained 9.2% of microbiome variation, comparable to earlier studies33,36,48, indicating that additional, unmeasured factors probably contribute to microbiome alterations in CKD development and progression.

Compared with other recent studies, our quantitative and covariate-controlled approach revealed relatively few significant microbial markers. After excluding patients undergoing peritoneal dialysis and adjusting for cohort-specific covariates, unclassified species of Alistipes and E. coli were negatively associated with eGFR, whereas B. adolescentis showed a positive association. Although several studies have reported increased E. coli abundance in advanced CKD11,12,49, this observation has not been consistently replicated across cohorts. Previous studies have also described a decline in butyrate producers (for example, Butyricicoccus, Faecalibacterium and Roseburia) in patients with impaired kidney function10,37,50,51,52. In contrast, no such associations were detected in our QMP covariate-controlled analysis. While RMP analysis identified a positive correlation between Roseburia hominis and eGFR, this signal was lost after covariate adjustment, consistent with previous reports linking Roseburia abundance to faecal moisture content37. Together, these findings suggest that the abundance of key butyrate-producing taxa remains stable throughout CKD stages when host-related factors are accounted for.

The use of quantitative shotgun metagenomics enabled us to investigate genome-inferred functional potential and link it to identified species. Among the 11 metabolic modules associated with kidney function decline, two newly curated GKMs involved in pC and indole biosynthesis were identified. These findings are consistent with previous studies reporting altered aromatic amino acid metabolism potential in CKD9,10,13, as well as with genomic evidence from the IMG/M41 and UHGG42 databases.

While our genome-based functional profiling provides important leads for further mechanistic research, its predictive power is constrained by downstream regulatory processes affecting gene expression and enzymatic activity53. Nevertheless, in vitro and in vivo studies support the capacity of E. coli and Alistipes sp., but not B. adolescentis, to synthesize pC and indole49,54. A recent ex vivo fermentation study reported increased generation of PBUT precursors by gut microbiota from CKD patients compared with controls, suggesting increased proteolytic catabolism in CKD55. Importantly, the association of pC- and indole-related modules with kidney function disappeared after accounting for effects of (proxy) dietary habits, supporting the role of colonic dynamics and substrate availability in UT accumulation43,56.

The plant-to-animal CAZymes ratio observed in advanced CKD may partly reflect dietary management strategies aimed at limiting potassium and phosphorus intake through reduced consumption of plant-derived carbohydrates at the time of patient recruitment3. Along with increased abundance of protein-associated modules (that is, putrescine), this pattern is consistent with a shift towards a more proteolytic gut environment in advanced disease stages, given that protein in Western diets is predominantly of animal origin. In line with this, microbial proteolysis products, such as pC and indole, are known to increase with high meat and protein consumption57,58. These observations support our hypothesis that the CKD-associated alterations in CAZymes profiles and PBUT precursor modules are, at least in part, mediated by dietary intake. Such dietary effects may also contribute to the observed enterotype shifts and species-level associations with kidney function, as increased prevalence of Rum and reduced abundance of B. adolescentis have been reported in gut environments characterized by enhanced proteolytic activity and prolonged ITT43,59,60. Although dietary data were not collected in our cohort to substantiate this, previous intervention studies have shown that increasing fibre intake alters plant-to-animal carbohydrate metabolism61, causes enterotype shifts and reduces UT levels in CKD62. Moreover, dietary fibre supplementation has been shown to inhibit microbial tryptophanase expression and reduce indole production63. Our results, in combination with previous findings, suggest that dietary modulation may influence gut ecosystem function and microbial metabolite production in CKD64,65.

This study applies quantitative microbiome profiling paired with covariate-controlled analyses in the context of the gut–kidney axis within the field of nephrology, extending beyond the non-quantitative and non-covariate-adjusted approaches predominantly used in previous studies. Our results underscore the necessity of quantitative, covariate-controlled approaches for accurately identifying microbial signatures of CKD and suggest that some previous conclusions may warrant re-evaluation. Several limitations should nevertheless be acknowledged. The relatively modest cohort size and uneven distribution across disease stages, although representative of the distribution of patients in clinical practice, may have reduced sensitivity to detect subtle microbiome changes, especially in relation to key covariates such as medication use and diet. In addition, the subgroup of patients classified as CKD progressors was relatively small, which limits power to detect microbiome features specifically associated with CKD progression. However, post hoc power analyses indicated that most significant associations observed were adequately powered in the cross-sectional cohort. To further strengthen inference, future studies would benefit from larger, more balanced CKD cohorts, including adequately powered longitudinal progressor groups, similar to consortium-based efforts in other disease areas66.

The limited replication observed across studies is likely driven by high heterogeneity in patient selection, physiology, covariates, technical factors and microbiome analysis protocols, which hampers the detection of strong, generalizable disease markers. Addressing this will require a geographically matched multicentre approach with standardized sample handling and (quantitative) microbiome analysis approaches17 to minimize regional lifestyle and dietary variability. Incorporating comprehensive diet and lifestyle analyses in future studies may shed further light on the variations observed67. Finally, integration of transcriptional and in vitro metabolic pathway validation will be essential to improve mechanistic understanding of enzyme-mediated pathways involved in the gut–kidney axis.

Collectively, this study advances understanding of the gut–kidney axis by showing that CKD-associated changes in gut microbiome composition and predicted metabolic potential are largely mediated by intestinal transit time and diet-related factors, and that apparent associations with eGFR or disease stage are attenuated after accounting for these variables. These findings highlight a potential role for nutritional strategies and/or transit-time modulation in reducing patient UT burden. Accordingly, preserving and/or restoring the balance between saccharolytic and proteolytic fermentation, through approaches such as dietary fibre supplementation or modulation of intestinal transit time, emerges as a plausible route for future investigation. Given that prolonged ITT has been recognized as a risk factor for CKD68, systematic evaluation of ITT management effects, combined with adequate fibre intake, remains a viable strategy to regulate the intestinal environment and limit the generation of uraemic cardiotoxins in CKD.

Methods

Study cohort

The procedures of the study were approved by the Ethics Committee of Ghent University Hospital (EC2012/063, B670201214999 and EC2010/033, B67020107926). All study participants provided written informed consent. Exclusion criteria were as previously described69: active infection (C-reactive protein >20 mg l−1), immunosuppressive therapy, body mass index >35 kg m−2, inflammatory bowel disease, active malignancy, cardiovascular event in the past 3 months, pregnancy, transplantation, use of non-steroidal anti-inflammatory drugs within the past month and age <18 years.

The present study cohort (n = 130) consists of 109 non-dialysed patients with CKD, who were grouped on the basis of disease stage into CKD 1/2 (n = 36), CKD 3 (n = 44) and CKD 4/5 (n = 29); 10 patients in CKD 5 receiving peritoneal dialysis treatment (CKD 5(PD)), of whom 3 were recruited by the Antwerp University Hospital, Belgium; and 11 non-CKD controls recruited by the Nephrology unit of the Ghent University Hospital in Belgium. Overall, we covered all stages of the disease, following the classification outlined by the Kidney Disease Outcomes Quality Initiative (KDOQI) and the KDIGO guidelines3,70. Each participant provided blood, urine and faecal samples that were aliquoted and stored at −80 °C until analysis. Additional biometrical data and information on lifestyle and medication use were collected during patient visits by means of an extensive questionnaire as previously described69.

Defining CKD progression

For each patient in groups CKD 1/2, CKD 3 and CKD 4/5, disease progression was defined as a change in estimated (e)GFR over a follow-up period of 4 years after study inclusion. Using a linear model based on eGFR measures from ambulant hospital visits, eGFR slopes were calculated. Progression was defined as either a loss of 2.5 ml min−1 year−1, or progression to kidney replacement therapy71.
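The slope-based definition above can be illustrated with a minimal Python sketch. It assumes an ordinary least-squares fit of eGFR against time in years and reads "a loss of 2.5 ml min−1 year−1" as a slope of −2.5 or steeper; the function names, the `started_krt` flag and the inclusive threshold are illustrative assumptions, not the study's code.

```python
def egfr_slope(times_years, egfr_values):
    """Ordinary least-squares slope of eGFR against time (ml/min per year)."""
    n = len(times_years)
    if n < 2 or n != len(egfr_values):
        raise ValueError("need at least two paired measurements")
    mean_t = sum(times_years) / n
    mean_e = sum(egfr_values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    if sxx == 0:
        raise ValueError("all measurements share the same time point")
    sxy = sum((t - mean_t) * (e - mean_e)
              for t, e in zip(times_years, egfr_values))
    return sxy / sxx


def is_progressor(times_years, egfr_values, started_krt=False,
                  loss_threshold=2.5):
    """Flag progression: start of KRT, or an eGFR slope at or below -threshold.

    Assumption: the threshold is treated as inclusive.
    """
    if started_krt:
        return True
    return egfr_slope(times_years, egfr_values) <= -loss_threshold


# Yearly visits over 4 years with a steady loss of 3 ml/min per year.
print(is_progressor([0, 1, 2, 3, 4], [60, 57, 54, 51, 48]))
```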

Sample analyses

Blood, urine and faecal samples used in this study were collected as part of an established clinical cohort69. Quantification of general blood parameters, uraemic toxins, their precursors and SCFAs was performed previously on these samples and has been reported elsewhere69,72. The resulting measurements were reused for the current analyses. Briefly, faecal suspensions were prepared by dissolving 0.5 g of faecal sample in 2.5 ml of phosphate buffered saline (PBS) solution and vortexing for 5 min. The faecal suspension was centrifuged at 10,000 × g for 30 min, after which the supernatant was filtered through a 0.22 µm filter and stored at −80 °C. The total concentrations of tryptophan, tyrosine and phenylalanine, indole and pC in faecal suspensions were quantified using high-performance liquid chromatography (HPLC). In faecal suspensions, urine and plasma, concentrations of pCS, pCG and IxS were quantified using ultra-performance liquid chromatography (UPLC). Faecal concentrations of acetate, butyrate and propionate were determined following liquid–liquid extraction of the samples using diethyl ether. UPLC was used for quantification. Chromatographic separation was performed on an XBridge BEH C18 XP column (particle size of 2.5 µm) and the diode array detector set to a wavelength of 210 nm. Data processing was done in Open Lab CDS ChemStation Edition for LC and LC/MS Systems Rev C.01.07.SR2, using a peak width of 5 Hz. Full experimental protocols, validation procedures and analytical parameters are described in the original publications69,72.

Faecal calprotectin measurement

Calprotectin concentrations were determined in faecal suspensions69 using the Bühlmann fCAL ELISA kit (Bühlmann Diagnostics). The optical density was measured at 450 nm, allowing for the calculation of the final calprotectin concentration, expressed as µg per gram of faeces.

Faecal moisture content

Faecal moisture content, a proxy for ITT, was determined by weighing each frozen, non-homogenized faecal aliquot before and after lyophilising for 48 h. Moisture content was calculated as the percentage of mass lost after lyophilisation.
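As a worked illustration of the calculation described above, the following Python sketch computes moisture content from the aliquot mass before and after lyophilisation. The function and parameter names are hypothetical.

```python
def moisture_percent(wet_mass_g, dry_mass_g):
    """Percentage of mass lost on lyophilisation, relative to the wet mass."""
    if wet_mass_g <= 0:
        raise ValueError("wet mass must be positive")
    if not 0 <= dry_mass_g <= wet_mass_g:
        raise ValueError("dry mass must lie between 0 and the wet mass")
    return 100.0 * (wet_mass_g - dry_mass_g) / wet_mass_g


# A 1.00 g aliquot weighing 0.25 g after freeze-drying lost 75% of its mass.
print(moisture_percent(1.0, 0.25))
```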

Microbial load

Faecal microbial load was quantified as previously described27. In brief, each frozen faecal aliquot was weighed and diluted 1:100 in a 0.9% saline solution (VWR). Next, 1 ml of the diluted faecal suspension was stained with 1 µl SYBR Green I (1:100 dilution in dimethyl sulfoxide) and incubated in the absence of light at 37 °C for 20 min. Quantitative analysis of the stained microbial cells was performed using the CytoFLEX S flow cytometer (Beckman Coulter) operating at a flow rate of 30 µl min−1 over 40 s. Fluorescence was recorded on the fluorescein isothiocyanate channel (525/40 nm), in conjunction with the side-scatter channel (SSC-A). To ensure accurate distribution of detected cells/events, a backward gating was performed on the forward scatter (FSC-A) and the SSC-A dot plot (Extended Data Fig. 5). The weight of the faecal aliquot served as a conversion factor for calculation of microbial cell count per gram of faeces.
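The conversion from gated events to cells per gram can be sketched as follows. The acquired volume follows directly from the stated flow rate and acquisition time (30 µl min−1 for 40 s gives 20 µl). The step from that volume to the whole aliquot is an assumption: the sketch takes the total volume of diluted suspension prepared from the weighed aliquot as an explicit, hypothetical parameter (`suspension_volume_ml`) rather than inferring it from the 1:100 dilution.

```python
def acquired_volume_ml(flow_rate_ul_per_min, acquisition_s):
    """Volume analysed by the cytometer during acquisition, in millilitres."""
    return flow_rate_ul_per_min * acquisition_s / 60.0 / 1000.0


def cells_per_gram(gated_events, flow_rate_ul_per_min, acquisition_s,
                   suspension_volume_ml, aliquot_mass_g):
    """Scale gated events to the whole suspension, then divide by aliquot mass.

    Assumption: suspension_volume_ml is the total volume of diluted
    suspension made from the weighed aliquot (illustrative parameter).
    """
    if aliquot_mass_g <= 0:
        raise ValueError("aliquot mass must be positive")
    volume = acquired_volume_ml(flow_rate_ul_per_min, acquisition_s)
    events_per_ml = gated_events / volume
    return events_per_ml * suspension_volume_ml / aliquot_mass_g


# 20,000 gated events in 20 µl, from 20 ml of suspension made with 0.2 g.
print(cells_per_gram(20000, 30, 40, suspension_volume_ml=20.0,
                     aliquot_mass_g=0.2))
```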

Unsupervised ordination of microbial community composition

Previously described 16S rRNA gene sequence data of the cohort50 were taxonomically classified using the Ribosomal Database Project classifier73. Thereafter, genus-level abundance matrices were constructed and rarefied to 10,000 reads per sample.
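Rarefaction to a fixed depth amounts to drawing reads at random without replacement. The Python sketch below illustrates the idea on a single sample; the genus counts are invented, and returning `None` for samples below the target depth is an assumption about how shallow samples would be handled.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from {taxon: read count}."""
    total = sum(counts.values())
    if total < depth:
        return None  # assumption: samples below the target depth are dropped
    rng = random.Random(seed)
    # Expand the table into one entry per read, then draw without replacement.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] += 1
    return rarefied


sample = {"Bacteroides": 12000, "Prevotella": 6000, "Ruminococcus": 2000}
print(rarefy(sample, 10000))
```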

PCoA was performed on Bray–Curtis dissimilarity matrices using the ‘cmdscale’ function in combination with the vegan package (v.2.6-4)74. Ordination was conducted in an unsupervised manner without previous stratification. Principal coordinates cumulatively explaining ~90% of the total inter-individual variance were subsequently evaluated for associations with microbial taxa, as well as with clinical and biochemical metadata of the cohort.

Microbiome community profiling

Amplicon sequencing (16S rRNA gene) data of the cohort50 were additionally employed for microbiome-based community profiling. Identification of enterotype configurations was performed with the DMM approach in R using the DirichletMultinomial (v.1.40.0) package26 on the combined 16S rRNA gene-based FGFP (n = 2,998)33 and CKD (n = 130) cohorts genus-level abundance matrices (rarefied to 10,000 reads) taxonomically classified using the Ribosomal Database Project (RDP) classifier73. To mitigate batch effects, microbiome abundance profiles were harmonized using MMUPHin (v.1.12.1)75. The optimal number of components was determined using the Bayesian information criterion (BIC). Clustering robustness was evaluated by repeating DMM analyses on multiple random subsamples, with BIC-based model selection consistently supporting a four-component solution (k = 4) for our dataset (Extended Data Fig. 2b). The non-discrete clusters were labelled Bacteroides 1 (Bact1), Bacteroides 2 (Bact2), Prevotella (Prev) and Ruminococcus (Rum) as described previously27 (Extended Data Fig. 3b). Reproducibility of DMM clustering and enterotype assignment was assessed by 10-fold cross-validation, implemented with the DirichletMultinomial (v.1.40.0)26 and caret (v.6.0-94)76 packages. Inter-individual variation in microbiome community profiles was visualized using PCoA based on Bray–Curtis dissimilarity (Extended Data Fig. 2c).

Validation datasets

To validate enterotype patterns and reduce cohort-specific biases, two independent cohorts were integrated alongside the core dataset.

FGFP

The FGFP cohort (n = 2,998) originates from a population-level, cross-sectional study of adults in Flanders, Belgium32. From this dataset, a subset of individuals with healthy kidney function (eGFR ≥90) was selected to mimic the non-CKD controls. To minimize demographic biases, propensity score matching with the MatchIt (v.4.5.5) package77 was performed on sex, age and BMI, yielding a matched subset of 211 participants. Ethics approval for the FGFP sampling was granted by the Commissie Medische Ethiek UZ-VUB (B.U.N.143201215505) and the Ethische Commissie Onderzoek UZ/KU Leuven (S58125). Inclusion and exclusion criteria from the original cohort were applied to ensure consistency and comparability across datasets.

External validation of the PD group

An external dataset comprising faecal 16S rRNA gene profiles from PD (n = 15) and non-CKD individuals (n = 21) was obtained from a study conducted at the Medical University of Graz, Austria6. The cohort demographics are broadly comparable to those of the current cohort and, given the Western European origin of both cohorts, the two are unlikely to differ substantially in terms of ethnic or geographical background. Raw sequencing data (NCBI SRA accession: PRJNA390475) were processed using the same pipeline and taxonomy classifier as applied to our primary cohort to ensure methodological consistency.

Shotgun metagenomic sequencing

Shotgun metagenomic data were generated from the same faecal samples analysed by 16S rRNA gene amplicon sequencing50. DNA was extracted from faecal suspension samples utilizing the QIAGEN RNeasy kit (QIAGEN), with minor adaptations to the protocol as described earlier33 to allow DNA recovery. DNA concentrations were measured with the DropQuant spectrometer. Samples were sent to Novogene Europe for shotgun metagenomic HiSeq 4000 paired-end Illumina sequencing. Low-quality reads, adapters, PhiX contaminants and human DNA were removed using BBmap (v.38.51)78. The cleaned reads were assembled into contigs using metaSPAdes (v.3.14.1)79 and further clustered into metagenome-assembled genomes (MAGs) using metaBAT (v.2.15)80 with default settings. Contigs of <1,500 bp were excluded from further analyses to avoid inaccurate mapping stemming from repetitive and/or low-quality sequences, and MAGs with >90% completeness and <10% contamination rate were selected for functional and taxonomical annotation using the reference genome-independent taxonomic profiler mOTU (v.3.1.0)81. Gene prediction from these contigs was performed using Prodigal (v.2.6.1)82, and genes were then clustered across all samples with an identity threshold of 95% using CD-HIT (v.4.8.1)83. In total, we constructed a catalogue of 7,316,772 gene clusters. Gene clusters were subsequently annotated with EggNOG-mapper (v.2.1.12) in DIAMOND mode84.
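The length and quality thresholds stated above can be expressed as simple filters. This Python sketch uses a hypothetical `Mag` record with completeness and contamination percentages; it illustrates the cut-offs only and is not part of the pipeline used in the study.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    name: str
    completeness: float   # percent
    contamination: float  # percent


def keep_contig(length_bp, min_length_bp=1500):
    """Contigs shorter than 1,500 bp are excluded."""
    return length_bp >= min_length_bp


def high_quality(mags, min_completeness=90.0, max_contamination=10.0):
    """Keep MAGs with >90% completeness and <10% contamination."""
    return [m for m in mags
            if m.completeness > min_completeness
            and m.contamination < max_contamination]


mags = [Mag("bin_1", 97.2, 1.4), Mag("bin_2", 88.0, 0.5), Mag("bin_3", 95.0, 12.0)]
print([m.name for m in high_quality(mags)])
```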

To construct the RMP matrix, MAG coverage was calculated using CoverM (v.0.6.1)85 with the trimmed_mean method and cover_fraction of 0.3. The raw abundances were subsequently normalized on the basis of base pairs per read library86 and transformed to relative abundances. For the assembly of the QMP matrix, clean reads were rarefied using sequencing depth and total cell count in a gram of faeces as rarefaction factor as previously described27. Rarefied samples were subsequently mapped back to our constructed gene catalogue using BWA aligner (v.0.7.13)87, sorted and indexed with samtools (v.1.9)88, and taxonomically profiled using mOTU profiler (v.3.1.0)81.
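The idea behind the QMP rarefaction step can be sketched in Python as rarefying every sample to the same ratio of reads to cells, then scaling to the measured cell count. This is a simplified illustration under stated assumptions: it operates on a taxon count table rather than on raw reads, and the function names and example values are invented.

```python
import random


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from {taxon: count}."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    out = {taxon: 0 for taxon in counts}
    for taxon in rng.sample(pool, depth):
        out[taxon] += 1
    return out


def quantitative_profile(counts_by_sample, cells_per_gram, seed=0):
    """Rarefy to an even sampling depth (reads per cell), then scale to cells."""
    rng = random.Random(seed)
    depth = {s: sum(c.values()) / cells_per_gram[s]
             for s, c in counts_by_sample.items()}
    min_depth = min(depth.values())
    profiles = {}
    for s, counts in counts_by_sample.items():
        load = cells_per_gram[s]
        # Reads to keep so that every sample has the same reads-per-cell ratio.
        target = min(sum(counts.values()), round(min_depth * load))
        rarefied = subsample(counts, target, rng)
        scale = load / target  # cells represented by each retained read
        profiles[s] = {t: n * scale for t, n in rarefied.items()}
    return profiles


counts = {"A": {"x": 600, "y": 400}, "B": {"x": 500, "y": 1500}}
loads = {"A": 1e9, "B": 4e9}
print(quantitative_profile(counts, loads)["B"])
```

After scaling, the abundances of each sample sum to its measured microbial load, so between-sample differences in total cell count are preserved rather than normalized away.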

Manual curation of a CKD-specific module annotation platform

Genome-based functional profiling of the faecal microbiome was performed using 139 pre-defined modules from published frameworks covering general gut metabolic and gut–brain axis pathways40,89. In addition, 27 new gut–kidney axis modules (GKM) representing the genetic capacity for synthesis and degradation of PBUT precursors and for the production and metabolism of SCFAs were curated. The new set of modules also included phytate degradation given its role in dietary phosphorus management and its implications in CKD90. The newly curated modules were developed by incorporating information from a comprehensive review of the existing literature and the MetaCyc database91 (Supplementary Table 9), followed by detailed definition of each module and alternative pathways. These modules, characterizing enzymatic activities, are annotated using prokaryotic and archaeal KEGG Orthology (KO) terms21. Given that a single enzymatic function may be represented by multiple orthologues, the GKMs incorporate variations of the same reaction, usually involving identical substrates and products. Each GKM comprises a modular representation of the underlying biochemical pathway, which represents the multiple chemical reactions that might be required in the pathway and in which alternative reaction routes are delineated by TAB separation, whereas KOs that are necessary for the completion of a process are listed with RETURN and COMMA separations.
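The module layout described above (one step per line, TAB-separated alternatives, COMMA-separated KOs that are all required) can be parsed as in the following Python sketch. The KO identifiers are placeholders and do not represent a curated GKM; the coverage measure (fraction of steps with at least one satisfied alternative) is an assumption about how completeness might be scored.

```python
def parse_module(text):
    """Return a list of steps; each step is a list of alternative KO sets."""
    steps = []
    for line in text.strip().split("\n"):
        if not line.strip():
            continue
        alternatives = [
            {ko.strip() for ko in alt.split(",") if ko.strip()}
            for alt in line.split("\t") if alt.strip()
        ]
        steps.append(alternatives)
    return steps


def coverage(steps, detected_kos):
    """Fraction of steps with at least one fully detected alternative."""
    satisfied = sum(
        1 for alternatives in steps
        if any(alt <= detected_kos for alt in alternatives)
    )
    return satisfied / len(steps)


# Placeholder module: step 1 is K00001 OR (K00002 AND K00003); steps 2 and 3
# each need a single KO.
module = parse_module("K00001\tK00002,K00003\nK00004\nK00005")
print(coverage(module, {"K00002", "K00003", "K00004"}))
```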

Previously reported CKD microbial markers

To assemble a list of biomarkers of CKD identified in publications between 2014 and 2024, a Scopus search query with keywords ‘CKD OR eGFR OR Chronic Kidney Disease OR renal OR kidney AND microbiome OR microbiota AND human’ was conducted. The criteria for selecting biomarkers included: (1) clearly defined groups of CKD and non-CKD controls and (2) taxa classified at species level.

Microbiota diversity analysis

All subsequent analyses were performed on the basis of aforementioned QMP species-level data. For each cohort group, median alpha diversity indices including richness, Pielou’s evenness and Shannon diversity were calculated using the R vegan package (v.2.6-4)74. Beta diversity, reflecting compositional differences between samples was calculated as Bray–Curtis dissimilarities using the ‘vegdist’ function of the vegan package (v. 2.6-4)74, and differences between the CKD groups were assessed with permutation-based analysis of variance (PERMANOVA) with the ‘adonis2’ function of the vegan package (v.2.6-4)74.
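For reference, the diversity indices named above follow standard definitions, sketched here in plain Python. The natural logarithm is used for Shannon diversity; returning NaN for Pielou's evenness when only one species is present is a choice made for this illustration.

```python
import math


def richness(counts):
    """Number of taxa with non-zero abundance."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over non-zero proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou(counts):
    """Pielou's evenness J = H / ln(S); undefined when S < 2."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else float("nan")


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


print(richness([10, 0, 5]), bray_curtis([10, 0], [5, 5]))
```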

Host factors explaining variation in gut microbiota

The contribution of host-associated variables, as recorded in the collected metadata, on variations in the intestinal microbiome community was evaluated using db-RDA. This analysis was conducted with the ‘capscale’ function in the R vegan package (v.2.6-4)74. The cumulative contribution of host variables was ascertained by forward step model selection on the db-RDA using the ‘ordiR2step’ function in the vegan R package (v.2.6-4)74. All colinear variables (Pearson’s |r| > 0.8), bacterial metabolite concentrations (SCFAs, PBUT precursors and their derivatives in plasma and urine), variables with <1% variance, variables with >10% not available (NAs) and topical medications were not included in these analyses (Supplementary Table 5). Non-redundant, significant variables from the forward step model as well as microbial metabolites (excluding >10% NAs) previously associated with CKD were used for PCoA to visualize their association with microbiome composition.
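The collinearity criterion (Pearson's |r| > 0.8) can be illustrated with a small greedy filter in Python. Which member of a collinear pair is retained is not specified in the text; keeping the first variable encountered is an assumption of this sketch, as are the example variable names.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(variables, threshold=0.8):
    """Keep a variable only if |r| <= threshold against all kept variables.

    Assumption: variables are considered in dictionary order and the first
    member of each collinear pair is the one retained.
    """
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


data = {
    "age": [1, 2, 3, 4, 5],
    "age_proxy": [2, 4, 6, 8, 10.5],  # nearly a rescaled copy of "age"
    "moisture": [5, 1, 4, 2, 3],
}
print(drop_collinear(data))
```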

Next, analyses were conducted (Supplementary Table 5) with the exclusion of the CKD 5(PD) group to account for the potential variability in microbial composition that could additionally arise from treatment-related factors such as the dialysis procedure per se, dialysis fluid and medication use. Here, the objective was to minimize the influence of kidney replacement therapy-related factors in the overall cohort where the majority of patients are not on dialysis.

Microbiome features associated with kidney function and CKD progression

Taxa present in fewer than 10% of samples and those with mean abundance below the overall mean abundance across taxa were excluded. Associations between species abundance and kidney function (eGFR) or CKD progression (eGFR slope) were assessed using generalized linear models (GLMs):

$${\rm{M}}1={\rm{response\; variable}} \sim {\rm{species}}({\rm{abundance}}),$$
$${\rm{M}}2={\rm{response\; variable}} \sim {\rm{species}}({\rm{abundance}})+{\rm{significant\; covariates}},$$

where the response variable denotes either eGFR (quasi-Poisson family) or eGFR slope (Gaussian family). The M1 model does not account for the effects of covariates, while the M2 model includes all significant covariates found in the CKD cohort, with the exclusion of the CKD 5(PD) group.
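The taxon filter applied before model fitting (prevalence of at least 10% of samples, and mean abundance not below the overall mean across taxa) might be sketched as follows in Python. Interpreting the "overall mean abundance" as the mean of the per-taxon means is an assumption, and the table values are invented.

```python
def filter_taxa(table, min_prevalence=0.10):
    """Filter {taxon: [abundance per sample]} by prevalence and mean abundance.

    Assumption: the overall mean is the mean of the per-taxon mean abundances.
    """
    taxon_means = {t: sum(v) / len(v) for t, v in table.items()}
    overall_mean = sum(taxon_means.values()) / len(taxon_means)
    kept = {}
    for taxon, values in table.items():
        prevalence = sum(1 for x in values if x > 0) / len(values)
        if prevalence >= min_prevalence and taxon_means[taxon] >= overall_mean:
            kept[taxon] = values
    return kept


table = {
    "common_abundant": [60] * 20,     # everywhere, above the overall mean
    "common_rare": [1] * 20,          # everywhere, but low abundance
    "sporadic": [2000] + [0] * 19,    # abundant in 1 of 20 samples (5%)
}
print(list(filter_taxa(table)))
```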

Functional pathway analysis was executed using KO annotation from the predicted genes to quantify their abundance in each sample and each genome. The abundance of modules was determined using omixer-rpm (v.0.3.2)92 as previously described89 and was calculated as the mean of KO abundances with a set detection threshold of 0.6 to compensate for potential misannotations and gaps in the genomes.

The aforementioned M2 model was used to investigate the association between module abundance and eGFR as a continuous variable. Colinear modules (Pearson’s |r| > 0.8) were excluded from the analysis. A non-parametric bootstrapping93 with 1,000 iterations (R = 1,000) was performed in R using boot (v.1.3-28.1)94, extracting the 95% confidence intervals for each regression coefficient. Post hoc power (α = 0.05) for all kidney function associations under the M2 model (covariate-adjusted) was estimated from the partial R2 of full versus reduced linear models and computed using the ‘pwr.f2.test’ function in the pwr (v.1.3-0) R package95 as previously described96 (Supplementary Tables 6 and 10).
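The non-parametric bootstrap of a regression coefficient can be illustrated with a case-resampling sketch in Python. It uses a single-predictor least-squares slope and a percentile interval; the interval type, the simple model and the synthetic data are assumptions made for illustration and do not reproduce the covariate-adjusted M2 model.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def bootstrap_slope_ci(x, y, iterations=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope undefined when all resampled x values coincide
        slopes.append(ols_slope(xs, ys))
    slopes.sort()
    lower = slopes[round(alpha / 2 * iterations)]
    upper = slopes[round((1 - alpha / 2) * iterations) - 1]
    return lower, upper


# Synthetic data: a slope of 2 with small alternating noise.
x = list(range(20))
y = [2 * v + 1 + (0.5 if v % 2 == 0 else -0.5) for v in x]
print(bootstrap_slope_ci(x, y))
```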

To assess the genome-encoded capacity for microbial utilization of various carbohydrate sources, eggNOG-mapper84 was used to annotate the CAZymes38 in each sample. The proportions of CAZymes associated with the microbial metabolism of plant-derived versus animal-derived carbohydrates and mucin (host-derived) versus plant-derived carbohydrates39 were evaluated across all groups while controlling for covariates in the M2 model.

Statistical analysis

Statistical analyses were executed using R (v.4.2.1) and RStudio (v.2023.06.2 + 561, x86_64-apple-darwin17.0)97, employing various packages including tableone (v.0.13.2)98, FSA (v.0.9.5)99, phyloseq (v.1.42.0)100, rstatix (v.0.7.2)101, pairwiseAdonis (v.0.4.1)102, vegan (v.2.6-4)74, DirichletMultinomial (v.1.40.0)26, microbiome (v.1.20.0)103 and omixer-rpm (v.0.3.2)92. Shapiro–Wilk test was used to assess normality of metadata variables. Medians and interquartile ranges (25th–75th percentiles) or means ± s.d. were used to describe the distribution of the different variables. Non-parametric tests were utilized to accommodate non-normal gene and taxa distributions. The Kruskal–Wallis test, complemented by post hoc Dunn tests via the rstatix package (v.0.7.2)101, was utilized to identify group variations. To compare enterotype prevalences in each CKD group, Fisher’s exact test was employed against a non-CKD control. In addition, LoreplotR (v.0.2.1)104 was used to compare the frequency of enterotypes with eGFR in the non-dialysed individuals with CKD. Species composition disparities across groups were quantified using the PERMANOVA test with a Bray–Curtis index, while within-group variances were assessed with the ‘betadisper’ function of the vegan package (v.2.6-4)74. Statistical significance was determined using the BH correction method, applying a false discovery rate (FDR) threshold of <0.05 for non-metagenomic data analysis and <0.1 for QMP-based analysis.
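The Benjamini–Hochberg adjustment referred to above follows a standard step-up procedure, shown here as a short Python sketch. The p values are invented; the 0.05 and 0.1 thresholds are those stated in the text.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(pvals)
# Threshold of 0.05 for non-metagenomic analyses, 0.1 for QMP-based analyses.
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```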

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.