Extended Data Fig. 9: Comparison of TopFit with other methods. TopFit combines the VAE score, the ESM embedding and the PST embedding.

This figure supplements Fig. 4b with results for different training data sizes. The average Spearman correlation over n = 20 repeats is shown, and datasets are grouped by the structure modality used: X-ray, nuclear magnetic resonance (NMR), AlphaFold (AF) and cryogenic electron microscopy (EM). A one-sided rank-sum test assesses the statistical significance of TopFit performing better than each of the other strategies, except for the comparison with VAE at 24 training data, where the null hypothesis is that TopFit performs worse than VAE. The P values are shown in the corresponding subfigures. For training data sizes of 24, 96 and 168, respectively, they are: (1) TopFit versus VAE, P = 4 × 10⁻⁶, 2 × 10⁻⁵ and 1 × 10⁻⁶; (2) TopFit versus ESM, P = 2 × 10⁻⁷, 2 × 10⁻⁷ and 2 × 10⁻⁷; and (3) TopFit versus PST, P = 2 × 10⁻⁷, 2 × 10⁻⁷ and 2 × 10⁻⁷.
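As an illustration of the kind of one-sided rank-sum comparison described in this legend, the following is a minimal sketch using only the Python standard library. It is not the authors' analysis code. The function names (`average_ranks`, `ranksum_greater`) and the two lists of Spearman correlations are hypothetical, and the P value uses the normal approximation without a tie correction, which may differ from the procedure used for the figure.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def ranksum_greater(x, y):
    """One-sided Wilcoxon rank-sum test, alternative: x tends to exceed y.

    Uses the normal approximation without a tie correction (an assumption
    of this sketch). Returns the z statistic and the upper-tail P value.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2          # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 0.5 * math.erfc(z / math.sqrt(2))    # P(Z > z) for standard normal Z
    return z, p

# Hypothetical per-dataset Spearman correlations (illustrative values only).
topfit_rho = [0.61, 0.58, 0.66, 0.70, 0.63, 0.59]
baseline_rho = [0.52, 0.55, 0.49, 0.60, 0.51, 0.47]

z, p = ranksum_greater(topfit_rho, baseline_rho)
print(f"z = {z:.3f}, one-sided P = {p:.4f}")
```

Swapping the two arguments tests the opposite direction, which corresponds to the reversed hypothesis mentioned for the VAE comparison at 24 training data. In practice, a library routine such as `scipy.stats.ranksums` with `alternative="greater"` provides the same kind of test.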