Extended Data Fig. 8: Proportions of top 10 species (their corresponding batches) in the 661k collection before and after phylogenetic compression. | Nature Methods

From: Efficient and robust search of microbial genomes via phylogenetic compression

The plot shows the proportions of the top 10 species, the Dustbin pseudo-cluster, and the remaining species (grouped as Others) for four quantitative characteristics: the number of genomes, their cumulative length, the size of the phylogenetically compressed assemblies, and the size of the phylogenetically compressed COBS indexes (for k = 31). Moving from the number of genomes to their cumulative length changes the proportions only slightly (reflecting the different mean genome lengths of individual species). However, the divergent genomes occupy a substantially higher proportion of the collection after compression. Moreover, although genome assemblies and k-mer COBS indexes are fundamentally different genome representations (horizontal vs. vertical, respectively), their observed post-compression proportions were nearly identical.
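
The grouping behind such a comparison can be illustrated with a minimal Python sketch. The function names, species labels, and numbers below are hypothetical toy values, not data from the study. The sketch also assumes that the top species are ranked once by genome count and that the same grouping is then reused for every characteristic; the caption does not state the ranking criterion.

```python
def top_species(genome_counts, n, dustbin="Dustbin"):
    """Rank species by genome count (assumed criterion), excluding the dustbin group."""
    ranked = sorted(
        (name for name in genome_counts if name != dustbin),
        key=lambda name: genome_counts[name],
        reverse=True,
    )
    return ranked[:n]


def grouped_proportions(values, top, dustbin="Dustbin"):
    """Return the share of each top species, the dustbin group, and all Others."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total must be positive")
    shares = {name: values.get(name, 0) / total for name in top}
    shares[dustbin] = values.get(dustbin, 0) / total
    # Everything that is neither a top species nor the dustbin group.
    covered = sum(values.get(name, 0) for name in top) + values.get(dustbin, 0)
    shares["Others"] = (total - covered) / total
    return shares


# Hypothetical toy inputs (arbitrary units), not values from the paper.
genome_counts = {"species_a": 60, "species_b": 25, "species_c": 10, "Dustbin": 5}
compressed_sizes = {"species_a": 10, "species_b": 10, "species_c": 20, "Dustbin": 60}

top = top_species(genome_counts, n=2)
before = grouped_proportions(genome_counts, top)
after = grouped_proportions(compressed_sizes, top)

for label, shares in (("genomes", before), ("compressed", after)):
    print(label, {name: round(share, 3) for name, share in shares.items()})
```

In this toy input the dustbin group holds a small share of the genomes but a large share of the compressed size, mirroring the qualitative pattern the caption describes for divergent genomes.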
