Extended Data Fig. 7: Phylogenetic reconstruction of AprA peptides.

The maximum-likelihood tree was built with IQ-TREE. A total of 421 Arenicellales genomes were predicted to encode AprA, whereas 528 were predicted to contain the single-copy gene recA. The aprA gene was found in the families UBA868 (398 genomes), UBA5680 (17 genomes), BMS3Bbin11 (represented by one genome) and LS-SOB (present in all four genomes), as well as in one unclassified Arenicellales genome. UBA868 peptides fall into three clusters, whereas UBA5680, BMS3Bbin11 and LS-SOB AprA peptides are placed within the Gammaproteobacteria. This reference tree was used to predict the function and taxonomy of the metatranscriptome reads by placing the corresponding peptides on the tree. The smaller tree at the upper right shows part of the UBA868 sequences together with some Ocean Microbial Reference Gene Catalog (OM-RGC) peptides that were placed within this subcluster. For simplicity in Figs. 1 and 6, subclusters whose names are shown in larger font include the taxa shown in smaller font. Circles at nodes are proportional to bootstrap support values ≥70% (100 replicates). The scale bar indicates the number of substitutions per site.