Extended Data Fig. 5: The frequency that an embedding is ranked as the best across 34 datasets using NDCG. | Nature Computational Science

From: Persistent spectral theory-guided protein engineering

a–d, The analog of Extended Data Fig. 3, measured by NDCG instead. Histograms show how often each embedding is ranked as the best across 34 datasets with 24, 96, 168 and 240 training samples, respectively. For each dataset, an embedding counts as the best if its average NDCG over n = 20 repeats lies within the 95% confidence interval of the embedding with the highest average NDCG. Comparisons were performed for: a, sequence-based embeddings; b, structure- and sequence-based embeddings; c, structure-based embeddings, sequence-based embeddings and evolutionary scores; and d, structure-based embeddings, sequence-based embeddings, evolutionary scores and two sets of TopFit (VAE+PST+ESM and VAE+PST+eUniRep). For evolutionary scores, absolute values of NDCG are shown and used.
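
The ranking rule in the caption can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the NDCG values per repeat are already computed, and it assumes the 95% confidence interval is the normal approximation (mean ± 1.96 × standard error), which the caption does not specify. The function names `best_embeddings` and `best_frequency` and the embedding labels are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev


def best_embeddings(ndcg_by_embedding, z=1.96):
    """Return the embeddings counted as 'best' on one dataset.

    ndcg_by_embedding maps an embedding name to its NDCG values over repeats.
    Assumption: the 95% CI of the top embedding is mean +/- z * s / sqrt(n).
    """
    means = {name: mean(vals) for name, vals in ndcg_by_embedding.items()}
    top = max(means, key=means.get)  # embedding with the highest average NDCG
    top_vals = ndcg_by_embedding[top]
    half_width = z * stdev(top_vals) / math.sqrt(len(top_vals))
    lower = means[top] - half_width
    # Every embedding whose mean reaches the top embedding's CI counts as best.
    return {name for name, m in means.items() if m >= lower}


def best_frequency(datasets):
    """Count, over all datasets, how often each embedding is among the best."""
    counts = Counter()
    for ndcg_by_embedding in datasets.values():
        counts.update(best_embeddings(ndcg_by_embedding))
    return counts


# Made-up toy values for illustration only (4 repeats instead of 20).
toy = {
    "dataset_1": {
        "emb_A": [0.80, 0.82, 0.78, 0.80],
        "emb_B": [0.79, 0.80, 0.80, 0.79],
        "emb_C": [0.60, 0.62, 0.61, 0.59],
    },
    "dataset_2": {
        "emb_A": [0.50, 0.52, 0.51, 0.49],
        "emb_B": [0.70, 0.72, 0.71, 0.69],
        "emb_C": [0.55, 0.56, 0.54, 0.55],
    },
}
print(best_frequency(toy))
```

Under this rule several embeddings can be counted as best on the same dataset, so the histogram bars in a panel need not sum to 34.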
