Fig. 1: Expanding the set of phylogenetic markers with TMarSel. | Nature Communications


From: Augmenting microbial phylogenomic signal with tailored marker gene sets


a Summary statistics of KEGG gene families annotated from 1510 WoL2 genomes (top) and 793 EMP MAGs (bottom). The left panels show the gene family space. Each gene family is placed by two values: the proportion of genomes in which it is present, and the proportion of those genomes in which it occurs as a single copy. Each dot corresponds to a gene family. Dashed orange lines mark the traditional marker selection criteria: present in at least 90% of genomes and single-copy in at least 95% of them. The orange region highlights the area these criteria define. The middle panels show the number of gene families outside and within that region. The right panels show the distribution of gene families across genomes (top) and MAGs (bottom), with dashed vertical lines indicating the arithmetic mean. b Illustration of the pipeline used for species tree inference (top) and a detailed overview of TMarSel's marker selection steps (bottom). c A toy example with 5 genomes and 32 gene families, one for each of the 2^5 = 32 combinations of presence (black squares) and absence (white squares). TMarSel iteratively selects k = 10 markers (crimson shade) that maximize the generalized mean of the number of selected genes per species, as shown on top. Source data are provided as a Source Data file.
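
The greedy idea behind panel c can be sketched in a few lines of Python. This is a minimal illustration, not TMarSel's implementation. It builds all 32 presence/absence patterns over 5 genomes. At each step it adds the gene family that most increases the generalized (power) mean M_p of the per-genome counts of selected families. The exponent p = 0.5, the lowest-index tie-breaking rule, and the function names power_mean and greedy_select are hypothetical choices made for this sketch.

```python
from itertools import product
import math


def power_mean(values, p):
    """Generalized (power) mean M_p of non-negative values (hypothetical helper)."""
    # Sort first so permutations of the same counts give bit-identical scores,
    # which makes tie-breaking deterministic.
    values = sorted(values)
    n = len(values)
    if p == 0:
        # Geometric mean; a single zero makes it zero.
        if any(v == 0 for v in values):
            return 0.0
        return math.exp(sum(math.log(v) for v in values) / n)
    if p < 0 and any(v == 0 for v in values):
        # For p < 0, M_p tends to 0 as any value tends to 0.
        return 0.0
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def greedy_select(presence, k, p):
    """Greedily pick k gene families (rows) maximizing M_p of per-genome counts."""
    counts = [0] * len(presence[0])
    selected = []
    remaining = list(range(len(presence)))
    for _ in range(k):
        best, best_score = None, -1.0
        for g in remaining:
            trial = [c + x for c, x in zip(counts, presence[g])]
            score = power_mean(trial, p)
            if score > best_score:  # strict '>' keeps the lowest index on ties
                best, best_score = g, score
        selected.append(best)
        remaining.remove(best)
        counts = [c + x for c, x in zip(counts, presence[best])]
    return selected, counts


# Toy case: 5 genomes, one gene family per presence/absence pattern (2**5 = 32).
presence = [list(bits) for bits in product([0, 1], repeat=5)]
selected, counts = greedy_select(presence, k=10, p=0.5)
print("selected families:", selected)
print("selected genes per genome:", counts)
```

Lower values of p penalize genomes that are left with few selected genes more strongly. The greedy loop is therefore pushed toward families that fill gaps in poorly covered genomes rather than simply picking the most prevalent ones.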
