Fig. 2: Identification of species-level bacterial biomarkers for diagnosing IBD through cross-cohort analysis.

a Overall composition of the population across 9 metagenomic datasets (n = 1363). b, c Alpha diversity of IBD (red, n = 795) and control (blue, n = 395) samples, measured using the Shannon index and Simpson index. Adjusted P values (two-sided test) were calculated using MMUPHin. In the boxplots, boxes show the interquartile range (IQR), the horizontal line shows the median, and the whiskers extend to the most extreme points within 1.5 times the IQR. Exact P values are provided in the Source Data file. d Principal coordinate analysis (PCoA) shows significant differences in microbial composition between groups (P = 0.001) and between cohorts (P = 0.001). The significance of beta diversity, based on Bray-Curtis distance, was calculated using PERMANOVA with 999 permutations (two-sided test, n = 1190). In the boxplots, boxes show the IQR, the black horizontal line shows the median, and the whiskers extend to the most extreme points within 1.5 times the IQR. e The top bar graph displays the 74 gut bacterial species with the most significant differences (P < 0.0001), as calculated using a two-sided Wilcoxon test with FDR-corrected P values. Among these species, 31 are highlighted in dark gray (Confirmed) as feature species for subsequent random forest modeling. The middle bar graph shows the generalized fold change (gFC) of these 74 significant species, with red indicating the 11 species enriched in IBD and blue indicating the 63 species depleted in IBD. At the bottom, heatmaps in gray and in color display the species-level significance and the gFC, respectively, within individual cohorts.
f Classification accuracy for IBD, measured as the AUROC of classifiers trained on species abundance profiles, was assessed by 10-fold cross-validation within each cohort (gray boxes along the diagonal) and by cohort-to-cohort model transfer (external validations, off-diagonal). g Classification accuracy, evaluated by AUROC on a hold-out cohort, improves when taxonomic data from all other cohorts are combined for training using leave-one-cohort-out (LOCO) validation, compared with models trained on data from a single cohort (cohort-to-cohort transfer). Error bars indicate the mean ± sd, n = 5. Source data are provided as a Source Data file.
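
As an illustration of the LOCO scheme in panel g, the following minimal Python sketch shows how the train/test partitions can be formed: each cohort is held out in turn while all remaining cohorts are pooled for training. It is not the authors' pipeline; the function name `leave_one_cohort_out` and the cohort labels are hypothetical, and model fitting and AUROC scoring are deliberately omitted.

```python
from collections import defaultdict


def leave_one_cohort_out(cohort_labels):
    """Yield (held_out_cohort, train_idx, test_idx) for every cohort.

    cohort_labels holds one cohort name per sample. The names and this
    helper are hypothetical and used only for illustration.
    """
    by_cohort = defaultdict(list)
    for idx, cohort in enumerate(cohort_labels):
        by_cohort[cohort].append(idx)

    for held_out in sorted(by_cohort):
        # Test on the held-out cohort only.
        test_idx = by_cohort[held_out]
        # Train on the pooled samples of all other cohorts.
        train_idx = sorted(
            i
            for cohort, indices in by_cohort.items()
            if cohort != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical example: six samples drawn from three cohorts.
labels = ["A", "A", "B", "C", "B", "C"]
splits = list(leave_one_cohort_out(labels))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Cohort-to-cohort transfer (panel f, off-diagonal) differs only in that the training indices come from a single cohort rather than from all remaining cohorts.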