Fig. 3: Evaluating the performance and functional makeup of TMarSel-selected markers on genomes from the WoL2 dataset.
From: Augmenting microbial phylogenomic signal with tailored marker gene sets

a Topological accuracy, measured as the proportion of confident branches, of trees inferred from 1510 WoL2 genomes. b Normalized Robinson–Foulds (nRF) distance between inferred trees and the WoL2 reference phylogeny. Left panels indicate the tree quality of previous marker sets, whereas middle and right panels show the tree quality of KEGG and EggNOG markers, respectively. Each dot corresponds to a tree inferred from a different set of markers (color-coded). Solid lines and filled symbols indicate trees derived from genomes containing at least 25% of markers, while dashed lines and unfilled symbols indicate trees from genomes with at least one marker. c Relative abundance of KEGG higher functional categories (color-coded) across markers selected from KEGG gene families. d Topological accuracy of trees inferred from taxonomically imbalanced datasets generated from the WoL2 genomes across n = 10 replicates and e nRF distance between inferred trees and the WoL2 reference phylogeny. Unfilled bars indicate that trees were inferred from genomes with at least one marker, and their color displays the marker set (color-coded). Data are presented as the mean ± standard deviation. Asterisks show the significance level of two-sided two-sample Mann–Whitney U tests after Benjamini–Hochberg correction between trees inferred from KEGG (left) and EggNOG (right) markers versus previous markers. Significance levels: n.s.: p > 0.05; *: p ≤ 0.05; **: p ≤ 0.01; ***: p ≤ 0.001. Source data and test statistics are provided as a Source Data file.
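The significance annotation described in the legend can be illustrated with a minimal sketch. It assumes the raw p-values have already been obtained from the two-sided Mann–Whitney U tests (for example with `scipy.stats.mannwhitneyu`), applies a standard Benjamini–Hochberg adjustment, and maps each adjusted value to the legend's thresholds. The function names `benjamini_hochberg` and `significance_label` and the example p-values are hypothetical and are not taken from the Source Data file or the authors' code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significance_label(p):
    """Map an adjusted p-value to the thresholds listed in the legend."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "n.s."


# Hypothetical raw p-values, one per comparison against a previous marker set.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
labels = [significance_label(p) for p in adjusted]
```

Each comparison of a KEGG or EggNOG marker set against a previous marker set would contribute one raw p-value, and the adjustment is applied to the whole family of comparisons before the asterisks are assigned.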