Fig. 1: Effect of different species-level references on the success of genus-level identification of Phaeocystis.
From: Missing microbial eukaryotes and misleading meta-omic conclusions

A Abundance of metagenomic proteins, coassembled by ocean basin from the Tara Oceans dataset, that were annotated as Phaeocystis using a combined database of the colony-forming references (left bar in each group; purple), a combined database of the free-living references (middle bar in each group; pink), or a combined database of all Phaeocystis references (right bar in each group; black). Each group of bars represents either the large (>20 μm) or the small (0.8–5 μm) size fraction samples. Abundance is shown as read coverage (TPM) of the annotated metagenomic contigs. B Phylogenetic tree of Phaeocystis references and genomic and transcriptomic outgroups. The bars to the right of the tree show the total number of orthogroups in each species that are (a, pink or lavender) shared with other members of the same ecotype (colony-forming or free-living), (b, maroon) shared among multiple Phaeocystis species regardless of ecotype, or (c, white) present in only one species. C Of the sequences from the Southern Ocean Tara Oceans coassembly that were annotated as Phaeocystis by any of the databases, the percentage annotated as Phaeocystis using a combined reference database of all free-living Phaeocystis references (top group of bars), a combined reference database of all colony-forming Phaeocystis references (middle group of bars), or a combined reference database of all Phaeocystis references (bottom group of bars). The top bar in each group (brown) corresponds to the smallest Tara Oceans size fraction, and the bottom bar in each group (blue) corresponds to the largest Tara Oceans size fraction. D Identical to Panel C, but for the Tara Oceans samples from the Mediterranean Sea.
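
The percentage plotted in Panels C and D can be read as the share of the union of all Phaeocystis-annotated sequences that a given database recovers. The following is a minimal Python sketch of that calculation under stated assumptions: the function name `percent_recovered`, the database labels, and the sequence IDs are hypothetical and invented for illustration, and the denominator is assumed to be the union of hits across the three databases. It is not the authors' pipeline.

```python
def percent_recovered(hits_by_db):
    """Return, for each database, the percentage of the union of all hits it recovers.

    hits_by_db maps a database label to the set of sequence IDs that the
    database annotated as Phaeocystis (hypothetical input format).
    """
    # Denominator: sequences annotated as Phaeocystis by ANY database.
    union = set().union(*hits_by_db.values())
    if not union:
        return {name: 0.0 for name in hits_by_db}
    return {
        name: 100.0 * len(hits) / len(union)
        for name, hits in hits_by_db.items()
    }


# Toy, made-up sequence IDs; not data from the study.
hits = {
    "free_living": {"seq1", "seq2"},
    "colony_forming": {"seq2", "seq3", "seq4"},
    "all_references": {"seq1", "seq2", "seq3", "seq4", "seq5"},
}
percentages = percent_recovered(hits)
```

In this toy input, the free-living database recovers 2 of the 5 sequences in the union, the colony-forming database 3 of 5, and the combined database all 5. In practice the calculation would be repeated per size fraction to obtain the two bars in each group.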