Fig. 2: Conservation of gene clusters identified by Spacedust predicts functional association.
From: De novo discovery of conserved gene clusters in microbial genomes with Spacedust

a, Distribution of cluster sizes across all 106.6 million pairwise cluster matches among 1,308 bacterial genomes. b, Number of all (green), annotated (blue) and unannotated (orange) genes that are part of a cluster match in at least the number of genomes shown on the x axis. c,e, Precision of functional-association predictions for gene pairs separated by up to four genes in Spacedust cluster matches, plotted against the number of genomes in which the pair is conserved. True positives are gene pairs whose two genes have the same KEGG module ID. c, Foldseek+MMseqs search. e, Foldseek-only search with ProstT5. d,f, Precision versus recall of functional association for gene pairs separated by up to four genes. The analysis excludes ribosomal genes; see Extended Data Fig. 1 for the analysis including ribosomal genes. d, Foldseek+MMseqs search. f, Foldseek-only search with ProstT5. AUC, area under the curve.
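The precision and precision-recall evaluation in panels c to f can be illustrated with a minimal Python sketch. It assumes that each gene pair is a hypothetical `GenePair` record holding the KEGG module IDs of its two genes and the number of genomes in which the pair is conserved. It also assumes that "same KEGG module ID" means the two genes share at least one module ID, that a minimum-conservation threshold is swept to trace the curve, that recall is measured against all true-positive pairs in the input, and that the AUC uses the trapezoidal rule. The caption does not specify these details. The names `GenePair`, `precision_recall_curve` and `pr_auc` are illustrative only and are not part of Spacedust. Filtering out ribosomal genes is left out of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenePair:
    """Hypothetical record for one gene pair taken from a cluster match."""
    modules_a: frozenset  # KEGG module IDs annotated on the first gene
    modules_b: frozenset  # KEGG module IDs annotated on the second gene
    n_genomes: int        # number of genomes in which the pair is conserved


def is_true_positive(pair: GenePair) -> bool:
    # Assumption: "same KEGG module ID" is read as sharing at least one module ID.
    return bool(pair.modules_a & pair.modules_b)


def precision_recall_curve(pairs):
    """Return (threshold, precision, recall) tuples, sweeping the minimum
    conservation from the highest observed genome count down to the lowest."""
    total_tp = sum(is_true_positive(p) for p in pairs)
    points = []
    for t in sorted({p.n_genomes for p in pairs}, reverse=True):
        kept = [p for p in pairs if p.n_genomes >= t]  # never empty: t is observed
        tp = sum(is_true_positive(p) for p in kept)
        precision = tp / len(kept)
        # Assumption: recall is measured against all true-positive pairs in the input.
        recall = tp / total_tp if total_tp else 0.0
        points.append((t, precision, recall))
    return points


def pr_auc(points):
    """Trapezoidal area under the precision-recall curve, starting at recall 0."""
    if not points:
        return 0.0
    area = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for _, precision, recall in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


if __name__ == "__main__":
    # Illustrative pairs with placeholder module IDs (not real data).
    pairs = [
        GenePair(frozenset({"M00001"}), frozenset({"M00001"}), 10),  # shared module
        GenePair(frozenset({"M00002"}), frozenset({"M00003"}), 10),  # different modules
        GenePair(frozenset({"M00004"}), frozenset({"M00004"}), 5),   # shared module
        GenePair(frozenset({"M00005"}), frozenset({"M00006"}), 2),   # different modules
    ]
    curve = precision_recall_curve(pairs)
    for t, p, r in curve:
        print(f"n_genomes >= {t}: precision={p:.3f}, recall={r:.3f}")
    print("AUC:", round(pr_auc(curve), 4))  # AUC: 0.5417
```

Sweeping a minimum-conservation threshold ties the two plot types together. Raising the threshold keeps only pairs conserved in more genomes. In the toy data this trades recall for precision, which is the relationship panels c and e display against genome count and panels d and f summarize as an AUC.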