Extended Data Fig. 3: The generalized fold change extends the established (median-based) fold change to provide higher resolution in sparse microbiome data. | Nature Medicine

From: Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer

a, In the top row, the logarithmic relative abundances for Bacteroides dorei/vulgatus, Parvimonas micra, and F. nucleatum subspecies animalis—examples for one high-prevalence and two low-prevalence species—are shown as swarm plots for the CTRL and CRC groups. The thick vertical lines indicate the medians in the different groups and the black horizontal line shows the difference between the two medians, which corresponds to the classical (median-based) fold change. Since F. nucleatum subspecies animalis is not detectable in more than 50% of cancer cases, there is no difference between the CTRL and CRC median; thus, the fold change is 0. The lower row shows the same data, but instead of only the median (or 50th percentile), 9 quantiles ranging from 10 to 90% are shown by the thinner vertical lines. The generalized fold change is indicated by the horizontal black line again, computed as the mean of the differences between the corresponding quantiles in both groups. In the case of the sparse data (for example, F. nucleatum), the differences in the 70, 80, and 90% quantiles cause the generalized fold change to be higher than 0. b, The median fold change is plotted against the newly developed generalized fold change for all microbial species. (The core set of microbial CRC marker species is highlighted in orange.) Marginal histograms visualize the distribution for both fold change and generalized fold change. c, Scatter plots showing the relationship between fold change and generalized fold change and the area under the ROC curve (AUROC) or the shift in prevalence between CRC and CTRL, with Spearman’s rank correlations (rho) added in the top left corners; the generalized fold change provides higher resolution (wider distribution around 0) and better correlation with the non-parametric AUROC effect size measure as well as prevalence shift, which captures the difference in prevalence of a species in CRC metagenomes relative to CTRL metagenomes.
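The two effect sizes described in panel a can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the CRC-minus-CTRL sign convention, the default linear-interpolation quantiles from NumPy, and the toy log10 abundances (with -6 standing in for "not detected" after a pseudo-count) are all assumptions made for the example.

```python
import numpy as np

def median_fold_change(log_ctrl, log_crc):
    """Classical fold change: difference of the group medians on the log scale.

    Assumed sign convention: CRC minus CTRL.
    """
    return float(np.median(log_crc) - np.median(log_ctrl))

def generalized_fold_change(log_ctrl, log_crc, probs=None):
    """Mean difference between matching quantiles of the two groups.

    By default the 9 quantiles from 10% to 90% are used, as in the caption.
    """
    if probs is None:
        probs = np.linspace(0.1, 0.9, 9)
    q_ctrl = np.quantile(log_ctrl, probs)
    q_crc = np.quantile(log_crc, probs)
    return float(np.mean(q_crc - q_ctrl))

# Hypothetical sparse species: undetected in all controls and in 7 of 10 cases.
log_ctrl = np.full(10, -6.0)
log_crc = np.array([-6.0] * 7 + [-3.0] * 3)

fc = median_fold_change(log_ctrl, log_crc)
gfc = generalized_fold_change(log_ctrl, log_crc)
print(f"median fold change: {fc:.3f}, generalized fold change: {gfc:.3f}")
```

In this toy case both medians sit at the detection floor, so the median-based fold change is 0, while the upper quantiles of the CRC group pull the generalized fold change above 0. This mirrors the behavior the caption describes for F. nucleatum.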
