Fig. 3: Phylogenetic regression identifies genes associated with mucin-carrier enrichment.

a Phylogenetic regression identifies significant associations between log-carrier-enrichment score (red/blue indicates positive/negative carrier enrichment, respectively) and gene presence–absence patterns (lighter/darker shades of gray indicate gene presence/absence, respectively) across the 80 most prevalent strains detected in passaged samples. We use this model to test a total of 9857 KEGG KO gene families determined using kofamscan (ref. 29), accounting for phylogenetic relatedness between strains by assuming Brownian motion along evolutionary branches. b Volcano plot from the two-sided phylogenetic linear regression with a Brownian motion covariance model. Each dot represents one KEGG KO; the horizontal line marks FDR = 0.01. The horizontal axis is clipped at the 0.1 and 99.9 percentiles, and highlighted gene families are colored red. c Bacteroides dorei 5-1-36-D4 and DSM-17855 both harbor a coenzyme F420 dehydrogenase gene (KEGG KO K00441) colocalized among LPS/EPS-related gene clusters; these features are collectively missing from the corresponding region of the Bacteroides sp. 9-1-42FAA genome. Plotted genome coordinates: Bacteroides dorei 5-1-36-D4, 4619828–4663627 bp; Bacteroides dorei DSM-17855, 4750505–4793496 bp; Bacteroides sp. 9-1-42FAA, 4767641–4780151 bp.
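
The per-gene test summarized in panels a and b can be illustrated with a minimal sketch of generalized least squares under a Brownian motion covariance. This is not the authors' code: the function names (`cholesky`, `forward_solve`, `phylo_gls`, `benjamini_hochberg`), the four-tip toy tree, and the toy data are all hypothetical, and the Benjamini–Hochberg procedure is an assumption, since the caption states only an FDR threshold. Under Brownian motion, the covariance between two strains is proportional to the branch length they share from the root, so the sketch whitens the data with the Cholesky factor of that matrix and then fits an ordinary two-parameter regression.

```python
import math

def cholesky(C):
    """Lower-triangular L with L L^T = C (C symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phylo_gls(y, x, C):
    """Fit y = b0 + b1 * x with error covariance proportional to C.

    Returns (b0, b1, t), where t is the t statistic of the slope b1
    on len(y) - 2 degrees of freedom.
    """
    n = len(y)
    L = cholesky(C)
    # Whitening: multiplying by L^-1 turns GLS into ordinary least squares.
    yw = forward_solve(L, y)
    x0 = forward_solve(L, [1.0] * n)   # whitened intercept column
    x1 = forward_solve(L, x)           # whitened gene presence column
    # Closed-form solution of the 2x2 normal equations.
    a, b, d = dot(x0, x0), dot(x0, x1), dot(x1, x1)
    e, f = dot(x0, yw), dot(x1, yw)
    det = a * d - b * b
    b0 = (d * e - b * f) / det
    b1 = (a * f - b * e) / det
    rss = sum((yw[i] - b0 * x0[i] - b1 * x1[i]) ** 2 for i in range(n))
    se_b1 = math.sqrt(rss / (n - 2) * a / det)
    return b0, b1, b1 / se_b1

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):       # walk from largest to smallest p
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q

# Hypothetical toy tree ((A,B),(C,D)) of depth 1; sister tips share 0.5.
C_toy = [[1.0, 0.5, 0.0, 0.0],
         [0.5, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.5],
         [0.0, 0.0, 0.5, 1.0]]
enrichment = [0.2, 1.1, -0.4, 0.9]     # toy log-carrier-enrichment scores
presence = [0.0, 1.0, 0.0, 1.0]        # toy presence/absence of one KO
b0, b1, t_stat = phylo_gls(enrichment, presence, C_toy)
```

In a full analysis the covariance matrix would come from the strain phylogeny, the fit would be repeated for each KO, each t statistic would be converted to a two-sided p-value, and the adjusted values would be compared against the 0.01 threshold.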