Fig. 3: Low-dimensional (t-SNE) representation of Scorpio embeddings for fine-grained taxonomic analysis.
From: Enhancing nucleotide sequence representations in genomic analysis with contrastive optimization

We visualized embeddings of a genomic region containing the uvrA gene in Fig. 2a. Points are color-coded by (a) the 10 most common genes and, at successive taxonomic levels, by (b) phylum, (c) class, (d) order, (e) family, and (f) genus. Scorpio clusters short fragments from the same taxonomic group together, indicating strong taxonomic consistency across all hierarchical levels.
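The figure assesses clustering visually from a 2-D t-SNE projection. One way such taxonomic consistency could also be quantified, sketched here under stated assumptions, is a k-nearest-neighbour label-agreement score computed in the original embedding space: for each fragment, take the fraction of its k nearest neighbours (by cosine similarity) that share its label at a given rank, then average over fragments. The function names `cosine` and `knn_label_agreement`, the use of cosine similarity, and the value of k are illustrative assumptions, not the evaluation used for Scorpio.

```python
import math
from typing import Hashable, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_label_agreement(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    k: int = 5,
) -> float:
    """Hypothetical metric: mean fraction of each point's k nearest neighbours
    (by cosine similarity, excluding the point itself) that share its label."""
    n = len(embeddings)
    if n != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    total = 0.0
    for i in range(n):
        # Rank all other points by similarity to point i, most similar first.
        sims = [(cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)
        neighbours = [j for _, j in sims[:k]]
        total += sum(labels[j] == labels[i] for j in neighbours) / k
    return total / n


if __name__ == "__main__":
    # Toy 2-D "embeddings": two well-separated groups of three fragments each.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
           [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
    phyla = ["Proteobacteria"] * 3 + ["Firmicutes"] * 3
    print(knn_label_agreement(emb, phyla, k=2))  # → 1.0
```

A score near 1.0 at a given rank means fragments mostly sit next to fragments from the same group; repeating the calculation with labels at phylum, class, order, family, and genus level would give one number per panel. Computing it on the full-dimensional embeddings rather than on the t-SNE coordinates avoids depending on the perplexity and random seed of the projection.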