Fig. 2: DHR outperforms previous methods on the SCOPe dataset regarding both speed and sensitivity. | Nature Biotechnology

From: Fast, sensitive detection of protein homologs using deep dense retrieval

All methods used a single sequence as input unless otherwise noted. a, A t-distributed stochastic neighbor embedding visualization of the seven largest SCOPe family embeddings from DHR. b, Violin plots comparing the recall (sensitivity) of DHR with that of other methods on the SCOPe dataset of 5,065 queries. Right, detailed performance of these methods on the hard cases (MMseqs2 recall rate below 0.34, colored in red). c, Head-to-head comparison of sensitivity between DHR and other methods. d, Comparison of the AUC-1FP metric across methods at the SCOPe family level (n = 5,065 queries) and the superfamily level (n = 2,065 queries). Data are presented as bar plots where the height of each bar represents the mean and error bars indicate the s.d. e, Regression plot of the recall rate against the SCOPe family size. f, Scatter plot of AUC-1FP against the number of sequence hits with the SCOPe dataset on UniRef90 (training set). Further visualization of the AUC-1FP of the zero-hit samples (n = 990) is provided as the mean value, with error bars indicating the s.d. g, The r.m.s.d. and TM-score relationship between the structure retrieved by DHR and the query structure according to the rank. Scores were averaged at each rank; the central curve connects the mean values and the two surrounding curves represent the s.d. A higher TM-score or lower r.m.s.d. indicates better structural similarity. h, Failure of k-mer-based methods on the c.55.3.5 family. The predicted structure similarity using DHR is much higher than that using k-mer-based methods, indicating that sequence-based methods using k-mers do not consider structural information. i, Two samples ranked highly by DHR for the c.55.3.5 family. SCOPe did not include them in the family but SCOP2 did. More examples of potentially problematic classifications are shown in Supplementary Figs. 1–3 and Supplementary Table 2.
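
The recall and AUC-1FP values reported in panels b and d can be illustrated with a minimal sketch. It assumes the commonly used definition of AUC-1FP as the fraction of a query's true homologs that are ranked ahead of the first false positive; the caption itself does not spell out the definition, and the function names and the boolean hit-list representation below are hypothetical choices for illustration, not the authors' evaluation code.

```python
from typing import Sequence


def recall(ranked_is_homolog: Sequence[bool], n_homologs: int) -> float:
    """Fraction of the query's true homologs that appear anywhere in the hit list."""
    if n_homologs <= 0:
        raise ValueError("n_homologs must be positive")
    return sum(ranked_is_homolog) / n_homologs


def auc_1fp(ranked_is_homolog: Sequence[bool], n_homologs: int) -> float:
    """Fraction of true homologs ranked above the first false positive.

    Assumed definition (not stated in the caption): walk the ranked hit
    list from the top and stop at the first hit that is not a homolog.
    """
    if n_homologs <= 0:
        raise ValueError("n_homologs must be positive")
    found = 0
    for is_true_positive in ranked_is_homolog:
        if not is_true_positive:
            break  # first false positive ends the count
        found += 1
    return found / n_homologs


# Hypothetical ranked hit list for one query whose family has 4 other members:
# two true homologs, then a false positive, then one more true homolog.
hits = [True, True, False, True]
print(recall(hits, 4))   # 3 of 4 homologs retrieved
print(auc_1fp(hits, 4))  # 2 of 4 homologs ranked before the first false positive
```

Under this assumed definition, a per-method bar such as those in panel d would be the mean of the per-query values, with the s.d. taken over the same queries.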