Fig. 2: Protein foundation models predict homo-oligomer symmetry more accurately than current template-based methods.
From: Rapid and accurate prediction of protein homo-oligomer symmetry using Seq2Symm

a Performance, measured as the area under the precision-recall curve (AUC-PR), of the various methods on the held-out test split of our dataset. The AUC-PR shown is the macro-average over class-wise AUC-PR; class-weighted AUC-PR results and validation-set results are in Supplementary Fig. 1. b Performance of representative models on two other completely unseen datasets: the “UniFold test set” from prior work (see Methods for dataset details, Supplementary Table 7) and “PDB 2024”, comprising homo-oligomers released by the PDB in 2024 (see Supplementary Table 13 for details). The AUC-PR is a macro-average over class-wise AUC-PR for the classes in the respective dataset. c Confusion matrices for one of the baselines, HHSearch, and for Seq2Symm (a fine-tuned ESM2-based model), showing which symmetries are confused with one another. The matrices include only proteins with a single label (i.e. multi-label examples are excluded). d Test AUC-PR for each homo-oligomer symmetry for the best model, a fine-tuned ESM2-based model. e Class-wise AUC-PR on the test set, averaged over sequence-based models (orange bars) and MSA-based models (blue bars), with individual model performances in each category shown as points. Models using a sequence-only representation (triangles, n = 3) achieve a higher AUC-PR for nearly every symmetry class than models based on the MSA representation (circles, n = 3). The largest gains are on higher-order symmetries such as C4, C5, C7–C9 and D5. f Inference and training time of each protein foundation model, per input example. The rightmost plot compares the time taken for full structure prediction using a brute-force search with AF2 multimer against an approach that first uses Seq2Symm to obtain the homo-oligomer symmetry and then generates the structure with AF2. Total time is shown in seconds, averaged over 10 proteins with C5 symmetry.
g Distribution of homo-oligomer symmetries in our test set.
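
The distinction between the macro-averaged and class-weighted AUC-PR mentioned for panel a can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: it assumes AUC-PR is estimated as step-wise average precision (the paper may use a different estimator), it assumes "class-weighted" means weighting each class by its number of positive examples, and the function names and toy scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve for one class.

    Assumption: ties in score are broken by input order (no special handling).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("class has no positive examples")
    # Rank examples from the highest to the lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def summarize(
    y_true: Dict[str, List[int]], y_score: Dict[str, List[float]]
) -> Tuple[Dict[str, float], float, float]:
    """Return class-wise AUC-PR, its macro-average, and a support-weighted average."""
    per_class = {c: average_precision(y_true[c], y_score[c]) for c in y_true}
    # Macro-average: every symmetry class counts equally, however rare it is.
    macro = sum(per_class.values()) / len(per_class)
    # Weighted average (assumed definition): weight by number of positives per class.
    support = {c: sum(y_true[c]) for c in y_true}
    weighted = sum(per_class[c] * support[c] for c in per_class) / sum(support.values())
    return per_class, macro, weighted


# Made-up toy data: four proteins, binary labels and scores for two classes.
toy_true = {"C2": [1, 1, 0, 1], "C3": [0, 0, 1, 0]}
toy_score = {"C2": [0.9, 0.8, 0.7, 0.1], "C3": [0.2, 0.6, 0.5, 0.1]}
per_class, macro, weighted = summarize(toy_true, toy_score)
print(per_class, round(macro, 4), round(weighted, 4))
```

In this toy case the rare class (C3) scores lower than the common one (C2), so the macro-average is lower than the weighted average. That is why a macro-average is the stricter summary when the symmetry distribution is imbalanced, as in panel g.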