Fig. 4: The pre-filtering results of PfamFamily & PfamClan.
From: PLMSearch: Protein language model powers accurate and fast sequence search for remote homology

a We evaluated the pre-filtering results of PfamFamily and PfamClan on the SCOPe40-test, Swiss-Prot to Swiss-Prot, and SCOPe40-test to Swiss-Prot search tests (see “Datasets”). PfamClan achieves a higher recall rate. b Same (1) or different (0) fold on SCOPe40-test. c, d TM-score distributions obtained by kernel density estimation (a smoothed histogram using a Gaussian kernel whose width is determined automatically). c Swiss-Prot to Swiss-Prot; d SCOPe40-test to Swiss-Prot. The PfamFamily distribution is shifted to the right overall: because PfamFamily imposes a stricter requirement than PfamClan, the protein pairs it recalls are more likely to be in the same fold and to share a higher TM-score. However, this strictness also gives PfamFamily a lower recall rate, so it misses some homologous protein pairs, as shown in Supplementary Table 11. Note that recall is the more important criterion in the initial pre-filtering stage. Source data are provided as a Source Data file.
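To illustrate why a clan-level filter recalls more pairs than a family-level one, here is a minimal sketch under stated assumptions: each protein is annotated with a set of Pfam family accessions, a pair passes PfamFamily if the two proteins share a family and passes PfamClan if they share a clan, and a family without a clan is treated as its own clan. All accessions, proteins and "homologous" pairs below are hypothetical placeholders, not data or code from the paper.

```python
from itertools import product

# Hypothetical family -> clan mapping (placeholder IDs, not real Pfam data).
FAMILY_TO_CLAN = {"PF_A": "CL_1", "PF_B": "CL_1", "PF_C": "CL_2"}

# Hypothetical annotations: protein -> set of Pfam families.
ANNOTATIONS = {
    "q1": {"PF_A"}, "q2": {"PF_C"},
    "t1": {"PF_A"}, "t2": {"PF_B"}, "t3": {"PF_C"}, "t4": set(),
}
QUERIES = ["q1", "q2"]
TARGETS = ["t1", "t2", "t3", "t4"]
# Hypothetical ground-truth homologous pairs.
POSITIVES = {("q1", "t1"), ("q1", "t2"), ("q2", "t3"), ("q2", "t4")}


def clans_of(families):
    # Assumption: a family with no clan counts as its own clan.
    return {FAMILY_TO_CLAN.get(f, f) for f in families}


def family_filter(q, t):
    # Stricter: the two proteins must share at least one Pfam family.
    return bool(ANNOTATIONS[q] & ANNOTATIONS[t])


def clan_filter(q, t):
    # Looser: the two proteins must share at least one Pfam clan.
    return bool(clans_of(ANNOTATIONS[q]) & clans_of(ANNOTATIONS[t]))


def kept_pairs(pred):
    # All query-target pairs that survive the pre-filter.
    return {(q, t) for q, t in product(QUERIES, TARGETS) if pred(q, t)}


def recall(pred):
    # Fraction of true homologous pairs that the pre-filter keeps.
    return len(kept_pairs(pred) & POSITIVES) / len(POSITIVES)


family_recall = recall(family_filter)  # 0.5: misses (q1, t2), same clan but different family
clan_recall = recall(clan_filter)      # 0.75: recovers (q1, t2)
```

In this toy setup every pair kept by the family filter is also kept by the clan filter, so the clan filter can only gain recall; neither filter recovers the unannotated target `t4`.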
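The smoothed TM-score curves in panels c and d come from a Gaussian kernel density estimate. The caption does not name the bandwidth rule, so the sketch below assumes Scott's rule for one-dimensional data (bandwidth = sample standard deviation × n^(-1/5)), the default of common KDE tools; the function names and the TM-score lists are hypothetical illustrations, not the figure's source data.

```python
import math
import statistics


def scott_bandwidth(samples):
    # Assumed bandwidth rule (Scott): h = sample std (ddof=1) * n^(-1/5).
    return statistics.stdev(samples) * len(samples) ** (-1 / 5)


def gaussian_kde(samples, bandwidth=None):
    # Returns a density function: the average of Gaussian bumps centred on each sample.
    h = bandwidth if bandwidth is not None else scott_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


def grid_argmax(density, lo=0.0, hi=1.0, steps=1000):
    # Location of the density peak on an evenly spaced grid over [lo, hi].
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(xs, key=density)


# Hypothetical TM-scores of pairs recalled by each pre-filter.
family_scores = [0.55, 0.60, 0.65, 0.70, 0.75]
clan_scores = [0.30, 0.40, 0.50, 0.60, 0.70]

family_density = gaussian_kde(family_scores)
clan_density = gaussian_kde(clan_scores)
family_peak = grid_argmax(family_density)  # near 0.65
clan_peak = grid_argmax(clan_density)      # near 0.50
```

A distribution "shifted to the right", as described for PfamFamily, shows up here as a peak at a higher TM-score. A plain Gaussian KDE also places some density outside the valid TM-score range [0, 1], which is worth keeping in mind when reading the tails of such plots.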