Extended Data Fig. 1: Batching strategies for the 661k (a) and BIGSIdata (b) collections.
From: Efficient and robust search of microbial genomes via phylogenetic compression

Genomes are clustered by species, and clusters that are too small are placed into a common pseudo-cluster called a dustbin. The resulting clusters and the dustbin are then divided into size- and diversity-balanced batches. For more information on batching, see Methods and Supplementary Note 5.
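The batching steps described in this legend can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `min_cluster_size` and `max_batch_size` parameters, and the batch naming scheme are all hypothetical, and the sketch balances batches by size only. Diversity balancing, which the legend also mentions, is not modeled here; see Methods and Supplementary Note 5 for the actual procedure.

```python
from collections import defaultdict


def split_balanced(items, max_size):
    """Split items into the fewest contiguous chunks of at most max_size,
    with chunk sizes differing by at most one."""
    n_chunks = -(-len(items) // max_size)  # ceiling division
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def batch_genomes(genome_species, min_cluster_size, max_batch_size):
    """Hypothetical sketch: cluster genomes by species, pool small clusters
    into a 'dustbin' pseudo-cluster, then split everything into batches.

    genome_species: dict mapping genome ID -> species name.
    Returns a dict mapping batch name -> list of genome IDs.
    """
    # Step 1: cluster genomes by species (sorted for reproducible output).
    clusters = defaultdict(list)
    for genome_id, species in sorted(genome_species.items()):
        clusters[species].append(genome_id)

    # Step 2: move clusters below the (assumed) size threshold to the dustbin.
    kept, dustbin = {}, []
    for species, members in clusters.items():
        if len(members) < min_cluster_size:
            dustbin.extend(members)
        else:
            kept[species] = members
    if dustbin:
        kept["dustbin"] = dustbin

    # Step 3: divide each cluster, and the dustbin, into size-balanced batches.
    batches = {}
    for name, members in kept.items():
        for i, chunk in enumerate(split_balanced(members, max_batch_size), 1):
            batches[f"{name}__{i:02d}"] = chunk
    return batches
```

For example, with five genomes of species `A`, one each of species `B` and `C`, `min_cluster_size=2` and `max_batch_size=3`, species `A` is split into two batches of three and two genomes, while the `B` and `C` genomes share a single dustbin batch.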