Fig. 5: Sum of relative abundances of typically oral taxa in the human gut microbiome is a potential indicator of disease. Per-study sample size is shown in Fig. 4.
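The score named in the title can be sketched as follows, assuming species-level relative abundance tables stored as dictionaries. Here an oral species is assumed to be one detected (abundance > 0) in at least a given fraction of oral-cavity samples (0.01 would correspond to the 1% threshold used in panels c and d), and an individual is assumed to score positively when the summed abundance is above zero. The function names and data layout are hypothetical and are not the authors' pipeline.

```python
from typing import Dict, List, Set

# Hypothetical data layout: one sample maps species name -> relative abundance (%).
Sample = Dict[str, float]


def oral_species(oral_samples: List[Sample], min_prevalence: float) -> Set[str]:
    """Species detected (abundance > 0) in at least `min_prevalence` of oral samples."""
    counts: Dict[str, int] = {}
    for sample in oral_samples:
        for species, abundance in sample.items():
            if abundance > 0:
                counts[species] = counts.get(species, 0) + 1
    n = len(oral_samples)
    return {sp for sp, c in counts.items() if c / n >= min_prevalence}


def oral_score(gut_sample: Sample, oral_set: Set[str]) -> float:
    """Summed relative abundance of the oral-typical species in one gut sample."""
    return sum(a for sp, a in gut_sample.items() if sp in oral_set)


def fraction_positive(gut_samples: List[Sample], oral_set: Set[str]) -> float:
    """Share of gut samples with a strictly positive score (assumed definition)."""
    positives = sum(1 for s in gut_samples if oral_score(s, oral_set) > 0)
    return positives / len(gut_samples)
```

Raising `min_prevalence` shrinks the oral species set, which is one way to read the progressive thresholds in panel a: stricter definitions leave fewer species that can contribute to the score.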

a Boxplots showing the percentage of positively scoring individuals in a dataset of 6891 gut microbiomes from healthy adult participants, under different definitions of the score that use progressive prevalence thresholds to define an oral species. Boxplots span the median, interquartile range and 1.5 times the interquartile range or the most extreme value. Values outside of this range are plotted as points. b Distribution of the per-population mean score in 6891 gut microbiomes from healthy adults (25 studies) and in 3632 gut microbiomes (48 studies) from adults who have received a specific diagnosis. Asterisks mark significant differences between distributions (disease vs healthy; two-sided Mann–Whitney test, P < 0.05); exact P values are reported in Supplementary Data 8. The AUROC of the per-dataset average score for predicting the diseased state is reported. Boxplots span the median, interquartile range and 1.5 times the interquartile range or the most extreme value. Values outside of this range are plotted as points. c log10 distributions of the summed relative abundance of typical oral-cavity microbial species (defined using a prevalence threshold of 1% of oral samples) in 10 cohorts of CRC patients (orange) and corresponding controls (blue). Asterisks mark significant differences between distributions (disease vs healthy; two-sided Mann–Whitney test, FDR < 0.1); exact P values are reported in Supplementary Data 8. AUROCs of the oral enrichment score for predicting CRC versus controls are shown. Boxplots span the median, interquartile range and 1.5 times the interquartile range or the most extreme value. Values outside of this range are plotted as points. d Boxplots showing the log10 distributions of the summed relative abundance of typical oral-cavity microbial species (defined using a prevalence threshold of 1% of oral samples) in 14 diseases (20 cohorts), split into disease (orange) and controls (blue). 
Asterisks mark significant differences between distributions (disease vs healthy; two-sided Mann–Whitney test, FDR < 0.1); exact P values are reported in Supplementary Data 8. AUROCs of the oral enrichment score for predicting disease versus healthy conditions are shown. Boxplots span the median, interquartile range and 1.5 times the interquartile range or the most extreme value. Values outside of this range are plotted as points. e Forest plot showing the meta-analysis, across 30 cohorts, of the association between disease status (disease vs corresponding healthy controls) and the summed relative abundance of oral species. Effect sizes for single datasets are computed as the mean difference (beta coefficient) estimated by a linear model adjusting for sex, age, BMI, number of reads and antibiotic usage where available. The score is natural-log transformed, and zeros are imputed using the minimum value in each dataset. Purple/gold: coefficient significantly different from zero (Wald FDR = 0.01 in the single cohorts; meta-analysis P < 0.05); blue/purple: coefficient not significantly different from zero.
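The AUROCs in panels b to d are closely tied to the Mann–Whitney test: the AUROC of a score equals the U statistic of the disease group divided by the product of the two group sizes, i.e. the probability that a random disease sample scores higher than a random control (ties counting one half). The sketch below computes it from first principles, together with Benjamini–Hochberg adjusted P values; the caption does not name its FDR procedure, so Benjamini–Hochberg is an assumption here, and both function names are hypothetical.

```python
from typing import List, Sequence


def auroc(disease: Sequence[float], control: Sequence[float]) -> float:
    """AUROC = U(disease) / (n_disease * n_control), with ties counted as 1/2."""
    u = 0.0
    for d in disease:
        for c in control:
            if d > c:
                u += 1.0
            elif d == c:
                u += 0.5
    return u / (len(disease) * len(control))


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice, `scipy.stats.mannwhitneyu(disease, control, alternative="two-sided")` returns the U statistic and the P value; in recent SciPy versions its `statistic` refers to the first sample, so dividing it by `len(disease) * len(control)` should reproduce `auroc`.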
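Panel e can be sketched in two stages, assuming NumPy. First, within each cohort, the natural log of the score is regressed by ordinary least squares on a disease indicator plus covariates; the disease coefficient and its standard error give the per-cohort effect size and Wald statistic (beta / SE). Second, cohort estimates are pooled. The caption does not specify the pooling model, so an inverse-variance DerSimonian–Laird random-effects meta-analysis is used here as an assumption, as is reading "minimum value" as the smallest non-zero score. All function names are hypothetical.

```python
import math

import numpy as np


def log_score_with_imputation(scores):
    """Natural log of the score; zeros replaced by the smallest non-zero value (assumption)."""
    scores = np.asarray(scores, dtype=float)
    floor = scores[scores > 0].min()
    return np.log(np.where(scores > 0, scores, floor))


def disease_effect(log_score, disease, covariates):
    """OLS coefficient and standard error of the disease indicator.

    `covariates` is an (n, k) array, e.g. sex, age, BMI, read count, antibiotics.
    """
    y = np.asarray(log_score, dtype=float)
    n = y.shape[0]
    X = np.column_stack([np.ones(n), np.asarray(disease, dtype=float), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the estimates
    return float(beta[1]), math.sqrt(cov[1, 1])


def random_effects_meta(betas, ses):
    """DerSimonian-Laird pooling; returns pooled effect, its SE and a two-sided P value."""
    b = np.asarray(betas, dtype=float)
    v = np.asarray(ses, dtype=float) ** 2
    w = 1.0 / v                                      # inverse-variance weights
    fixed = np.sum(w * b) / np.sum(w)
    q = np.sum(w * (b - fixed) ** 2)                 # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(b) - 1)) / c) if c > 0 else 0.0  # between-cohort variance
    w_re = 1.0 / (v + tau2)
    pooled = np.sum(w_re * b) / np.sum(w_re)
    se = math.sqrt(1.0 / np.sum(w_re))
    p = math.erfc(abs(pooled / se) / math.sqrt(2.0))  # two-sided normal P value
    return float(pooled), se, p
```

Per-cohort Wald P values from `disease_effect` would then be FDR-adjusted across cohorts before colouring the forest plot, while the pooled estimate is judged at P < 0.05.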