Extended Data Fig. 4: Subspecies-level classification performance by clade size.
From: Metabuli: sensitive and specific metagenomic classification via joint analysis of amino acid and DNA

All 2,382 query subspecies used in Extended Data Fig. 1a were divided into groups according to the number of sibling subspecies they had in the reference database, that is, by their species clade size. The average F1 score of the queries in each group decreases as clade size increases, indicating that a larger number of sibling subspecies makes classification harder for all tools. Precision and recall are available in Supplementary Table 10.
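
The grouping described in this caption can be illustrated with a minimal sketch. It assumes each query already has a per-query F1 score and a known clade size. The function names `f1_score` and `mean_f1_by_clade_size` and the toy values are hypothetical, chosen only for illustration. They are not taken from the paper, and the authors' exact binning and scoring procedure may differ.

```python
from collections import defaultdict
from statistics import mean


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f1_by_clade_size(records):
    """Group (clade_size, f1) pairs by clade size and average F1 per group.

    Returns a dict ordered by increasing clade size.
    """
    groups = defaultdict(list)
    for clade_size, f1 in records:
        groups[clade_size].append(f1)
    return {size: mean(scores) for size, scores in sorted(groups.items())}


# Hypothetical toy data: (clade size, per-query F1). Not results from the paper.
toy_records = [(1, 0.9), (1, 0.7), (2, 0.6), (4, 0.5), (4, 0.3)]
toy_result = mean_f1_by_clade_size(toy_records)
```

With real data, `records` would hold one pair per query subspecies for a given tool, and the resulting per-group means could be plotted against clade size to compare tools.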