Fig. 1

Association of diverse microbial CAGs with CRC and health. (A) Stacked histogram depicting numbers of genes assembled per sample (top) and CAGs per sample (bottom). (B) Correlation between numbers of CAGs and genes across the aggregated training dataset, color-coded by cohort. The gray dots, which represent metagenomes from individuals with adenomas, were not used in generating our CAG-based model. (C) Bacterial CAGs that are enriched in CRC or in health are encoded in the genomes of phylogenetically diverse bacteria observed at varying abundances, from extremely rare to highly prevalent. (D) Phylum-level differences between CRC-associated and health-associated CAGs. (E) CRC-association Wald statistics of all identified CAGs, collapsed by species-level taxonomic classification and rank-ordered by mean Wald statistic.