Fig. 4: We examined Kraken2 and MMseqs2 thresholds and their impact on the number of classified instances. | Communications Biology

From: Enhancing nucleotide sequence representations in genomic analysis with contrastive optimization

As shown in (b), the number of classified instances varies across datasets for each method. In the “Gene Out” dataset, Kraken2 classified 2.8% of instances, while MMseqs2 classified 28%. We compared our model’s precision across these different thresholds (c–e), with green representing Kraken2-like thresholds, blue representing MMseqs2-like thresholds, and red representing our thresholds. In (a), at a 2.8% classification rate (green), our Scorpio model achieved 94% precision, compared to Kraken2’s 43%. At a 28% classification rate (blue), Scorpio achieved 90% precision, while MMseqs2 achieved 50%. This analysis shows that our model maintains high precision on novel sequences while balancing the number of instances it classifies.
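
The comparison at matched classification rates can be illustrated with a minimal sketch. It assumes each instance has a confidence score and a flag saying whether its predicted label was correct; the function names `threshold_for_rate` and `precision_at_rate` and the toy data are hypothetical and are not taken from the paper or from Scorpio's code. The idea is to pick the score cutoff that classifies roughly the same fraction of instances as a reference tool, then measure precision only on the instances that pass the cutoff.

```python
import math


def threshold_for_rate(scores, rate):
    """Return the score cutoff that classifies roughly `rate` of all instances.

    Hypothetical helper: `scores` are per-instance confidence values,
    `rate` is the target fraction of instances to classify (0 < rate <= 1).
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    # Keep at least one instance so precision is always defined.
    k = max(1, math.floor(rate * len(ranked)))
    return ranked[k - 1]


def precision_at_rate(scores, correct, rate):
    """Return (precision, achieved_rate) at the cutoff matching `rate`.

    Ties at the cutoff are all kept, so the achieved rate can be
    slightly higher than the requested one.
    """
    cutoff = threshold_for_rate(scores, rate)
    kept = [ok for s, ok in zip(scores, correct) if s >= cutoff]
    return sum(kept) / len(kept), len(kept) / len(scores)


# Toy data for illustration only (not results from the paper).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
correct = [True, True, True, False, True, False, False, True, False, False]

precision, achieved = precision_at_rate(scores, correct, rate=0.5)
print(f"classified {achieved:.0%} of instances at precision {precision:.0%}")
```

Under these assumptions, a stricter cutoff (a lower classification rate) keeps only the most confident predictions, which is why precision is reported separately for the Kraken2-like and MMseqs2-like operating points.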
