Fig. 4: Annotating and aligning unclassified putative bacteriocins using TM-Vec.
From: Protein remote homology detection and structural alignment using deep learning

a, Visualization of TM-Vec embeddings using t-SNE for 689 proteins across three classes of bacteriocins in addition to 28 unclassified putative bacteriocins. For 94% of the annotated bacteriocins, the nearest neighbor to a classified bacteriocin was in the same bacteriocin class. b, Visualization of class 1 bacteriocins by subclass, highlighting how TM-Vec can recover multiple levels of manual annotation without protein structures. c, Comparison of TM-Vec’s TM-score predictions with the TM-scores produced by running TM-align on structures predicted by AlphaFold2, OmegaFold and ESMFold for 238,000 pairs of bacteriocins. Using a TM-score of 0.5 as a structural similarity cutoff, TM-Vec could distinguish pairs of proteins that were in the same class versus different classes and in the same subclass for class 1 bacteriocins, whereas TM-align on predicted structures from AlphaFold2, OmegaFold and ESMFold could not. Bounds of the boxplots denote 25% and 75% percentiles, the center is the 50% percentile and the whiskers are denoted by the 1.5× interquartile range. d, DeepBLAST alignments for a putative bacteriocin, YP_006656667, and its three nearest neighbors in embedding space (that is, those with the highest predicted TM-scores).