Extended Data Fig. 6: Phylogenetic analysis of the bhc gene cluster.
From: Marine Proteobacteria metabolize glycolate via the β-hydroxyaspartate cycle

a, Genome-based maximum likelihood phylogenetic tree of bacterial strains with the bhc gene cluster. The bhc gene cluster is found in Gammaproteobacteria (green) and in the alphaproteobacterial orders Rhizobiales (blue) and Rhodobacterales (red), as well as in one member each of Sphingomonadales and Kiloniellales. The phylogenetic tree is based on an alignment of 120 bacterial marker genes from 264 publicly available bacterial genomes and 5 metagenome-assembled genomes (MAGs), and was calculated using GTDB-Tk (ref. 64; https://github.com/Ecogenomics/GtdbTk). If several strains from the same genus cluster together, nodes are collapsed at the genus level, and the size of the resulting circle corresponds to the respective number of strains. Loktanella*: collapsed node contains the MAGs 20110516_Bin_8_1 and 20110523_Bin_9_1; Planktotalea**: collapsed node contains the MAG 20110523_Bin_97_1; Litoricola***: collapsed node contains the MAG 20110526_Bin_19_1.
b, Maximum likelihood phylogenetic tree of concatenated β-hydroxyaspartate cycle (BHAC) enzyme sequences. Colour code is the same as in a. Phylogenetic groups that were mostly isolated from terrestrial or freshwater habitats are marked with a black dot. Comparison with a shows that the phylogeny of the BHAC enzymes does not follow the genome-based phylogeny: for example, alpha- and gammaproteobacterial sequences form a common branch, and sequences from terrestrial or freshwater Rhizobiales and Rhodobacterales form another common branch. This suggests that the bhc gene cluster might have been subject to horizontal gene transfer between distantly related strains in shared habitats. The environmental bhc gene cluster sequence that could not be binned successfully is marked in bold and clusters together with isolated representatives of Pseudoruegeria, Litoreibacter and Pseudooceanicola. The phylogenetic tree is based on the concatenated alignments of the 4 enzymes (BhcA–BhcD) from 264 publicly available bacterial genomes and from 6 metagenome contigs, and was calculated using raxmlGUI (ref. 67). Bootstrap values of at least 50 are given on the respective nodes; calculated branch lengths of the tree are ignored for the sake of better visualization. If several strains from the same genus cluster together, nodes are collapsed at the genus level, and the size of the resulting circle corresponds to the respective number of strains. If strains from the same genus cluster in more than one node, the respective branches are labelled as Genus_1, Genus_2, and so on, in a clockwise manner. Loktanella_2*: collapsed node contains the MAGs 20110516_Bin_8_1 and 20110523_Bin_9_1; Planktotalea**: collapsed node contains the MAG 20110523_Bin_97_1; Litoricola***: collapsed node contains the MAG 20110526_Bin_19_1.
a, b, Taxonomy is based on GTDB (release 03-RS86; http://gtdb.ecogenomic.org/). All strains contained in the phylogenetic trees are listed in Supplementary Data 1.
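The tree in b rests on concatenated alignments of the four enzymes BhcA–BhcD. As an illustration only, the concatenation step could look roughly like the sketch below. It is not the authors' pipeline: the function name, the toy sequences and the choice to pad missing proteins with gap characters are all assumptions made for this example.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Join per-protein alignments column-wise into one sequence per strain.

    Each alignment maps a strain name to its aligned sequence. A strain
    missing from one alignment is padded with gap characters (an assumption
    of this sketch, not something stated in the legend).
    """
    # Collect every strain that appears in at least one alignment.
    strains = sorted(set().union(*(aln.keys() for aln in alignments)))
    parts: Dict[str, List[str]] = {strain: [] for strain in strains}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one alignment must have the same length")
        width = lengths.pop()
        for strain in strains:
            parts[strain].append(aln.get(strain, gap * width))
    return {strain: "".join(chunks) for strain, chunks in parts.items()}


# Toy data: two hypothetical strains and two short, made-up alignments.
bhc_a = {"strain_1": "MKT-A", "strain_2": "MRTGA"}
bhc_b = {"strain_1": "LLV", "strain_2": "L-V"}
concatenated = concatenate_alignments([bhc_a, bhc_b])
print(concatenated)
```

The resulting one-sequence-per-strain matrix is the kind of input a maximum likelihood program would then take; partitioning by protein and the substitution model are left out of this sketch.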