Extended Data Fig. 7: Multiple data transformations identify a nonlinear relationship between phylogenetic and metabolomic distance.
From: A metabolomics pipeline for the mechanistic interrogation of the gut microbiome

a, Heat map comparing phylogenetic and metabolomic tree topologies. Each cell records the number of tips whose phylogenetic and metabolomic neighbourhoods overlap more than expected (P < 0.05; one-sided permutation test). Data are stratified by fractional overlap of neighbourhoods and permutation probability (see Supplementary Methods, ‘Distance comparisons’). b, Histogram of chemical similarity scores (Tanimoto similarity of 2D structures) between each unique pair of compounds (by PubChem CID) detected in the in vitro dataset. This pairwise comparison used 359 non-co-eluting compounds. c, Metabolomic distance tree with each metabolite weighted by its chemical similarity (left) and unweighted control metabolomic distance tree (right). The weighted and unweighted matrices were calculated from uniquely detected, non-co-eluting compounds in the in vitro dataset to which a unique PubChem CID could be assigned. Two-sided Mantel test comparing the weighted and unweighted distance matrices: r² = 0.863, P = 0.001. d, Left, correlation of phylogenetic and metabolomic distance across pairs of strains, coloured by lowest shared taxonomic rank, with a LOESS fit. The dashed vertical line marks x = 0.11, as referenced in the text. Right, metabolomic distance between pairs of strains binned by lowest shared taxonomic rank: species (n = 111), genus (n = 1,386), family (n = 159), order (n = 1,222), class (n = 34), phylum (n = 1,442) and kingdom (n = 8,442). Box, median, 25th and 75th percentiles; whiskers, Tukey’s method. e–i, Internal-standard-corrected fold-change data (e–g) and internal-standard-corrected total ion count data (h, i) were log-transformed and used to calculate pairwise metabolomic distances between microbial taxa. These distances were compared with the corresponding pairwise phylogenetic distances from a tree built with the V4 region of the 16S rRNA gene (left) or with the full-length 16S rRNA gene (right).
Data are plotted with a LOESS fit. Set 1, microorganisms grown simultaneously in at least one experiment. Set 2, microorganisms grown in the same experiment only. j, Phylogenetic tree constructed from the full-length 16S rRNA gene sequences of a subset of the strains grown in mega medium. Only strains with available full-length 16S sequences are shown (Supplementary Table 6). k, Left, schematic of the pathway that synthesizes citrulline and ornithine, or agmatine and/or putrescine. Right, the top six matches identified by the comparative genomics tool MultiGeneBlast within a 40-kb search window when searched against a genomic database of the sequenced genomes in our strain collection. Horizontal dashed lines between genes represent multiple other genes within the search window.
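The legend compares distance matrices with a two-sided Mantel test (c) and derives pairwise metabolomic distances from log-transformed data (e–i). The following is a minimal Python sketch of that kind of comparison, not the authors' pipeline. The log base (2), the absence of a pseudocount, the Euclidean distance metric, the Pearson statistic, the permutation count, the helper names (`log_profiles`, `distance_matrix`, `mantel`) and all strain names and values are hypothetical choices made for illustration.

```python
import math
import random
from itertools import combinations


def log_profiles(profiles, base=2.0):
    """Log-transform each strain's metabolite values.

    ASSUMPTION: values are strictly positive; log base 2; no pseudocount.
    """
    return {strain: [math.log(v, base) for v in values]
            for strain, values in profiles.items()}


def distance_matrix(profiles, strains):
    """Symmetric matrix of Euclidean distances (ASSUMED metric) between strains."""
    n = len(strains)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(profiles[strains[i]], profiles[strains[j]])
    return d


def upper_triangle(d, order):
    """Upper-triangle entries of d after reordering rows and columns together."""
    return [d[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, n_perm=999, seed=0):
    """Two-sided Mantel test returning (r, P).

    The null distribution comes from jointly permuting the rows and columns
    of d2; the observed arrangement is counted, so P >= 1 / (n_perm + 1).
    """
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)
        # Two-sided: count permutations at least as extreme in absolute value.
        if abs(pearson(x, upper_triangle(d2, order))) >= abs(r_obs):
            extreme += 1
    return r_obs, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical fold-change profiles (three metabolites per strain).
    profiles = {
        "strain_A": [1.0, 2.0, 4.0],
        "strain_B": [1.0, 2.0, 8.0],
        "strain_C": [4.0, 2.0, 1.0],
        "strain_D": [8.0, 8.0, 1.0],
    }
    strains = list(profiles)
    d_met = distance_matrix(log_profiles(profiles), strains)
    # Hypothetical phylogenetic distances in the same strain order.
    d_phylo = [
        [0.00, 0.05, 0.20, 0.30],
        [0.05, 0.00, 0.25, 0.28],
        [0.20, 0.25, 0.00, 0.10],
        [0.30, 0.28, 0.10, 0.00],
    ]
    r, p = mantel(d_met, d_phylo)
    print(f"r^2 = {r ** 2:.3f}, P = {p:.3f}")
```

Counting the observed arrangement in the null means that, with 999 permutations, the smallest attainable P is 1/1,000 = 0.001. With n strains there are only n! distinct joint orderings, so the four-strain toy input (24 orderings) exercises the code but cannot yield a meaningful P value.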