Table 3. Accuracy of fold class assignment on SCOPe.

From: Real-time structure search and structure classification for AlphaFold protein models

| Method | Overall accuracy | \(\alpha\) | \(\beta\) | \(\alpha \beta\) | Small proteins |
|---|---|---|---|---|---|
| Expert handmade (without optimization) | 0.852 | 0.683 | 0.771 | 0.961 | 0.357 |
| Expert handmade (optimized) | 0.880 | 0.759 | 0.889 | 0.928 | 0.500 |
| Multinomial logistic regression | 0.863 | 0.916 | 0.861 | 0.851 | 0.818 |
| SVM (linear) | 0.445 | 0.991 | 0.927 | 0.069 | 0.548 |
| SVM (RBF kernel) | 0.896 | 0.947 | 0.869 | 0.896 | 0.861 |
| Bagged SVM (RBF kernel) | 0.915 | 0.943 | 0.882 | 0.937 | 0.621 |

  1. Fold classes were assigned to AlphaFold2 models based on secondary structure content and sequence length. Here we show the benchmark results from optimizing these classifiers on the original, manually curated SCOPe fold classes. For the expert handmade classifiers, secondary structure content and protein length conditions were defined for each fold class. The first classifier (without optimization) used the following conditions: \(\mathrm{length} < 50\,\mathrm{aa} \to \mathrm{small}\); else \(\mathrm{helix} \ge 60\% \to \alpha\); else \(\mathrm{sheet} \ge 35\%\) and \(\mathrm{helix} < 20\% \to \beta\); else \(\to \alpha\beta\). The second classifier optimized the threshold values by a parameter sweep in increments of 5% for secondary structure content and 5 aa for sequence length. The optimized mapping was: \(\mathrm{length} < 55\,\mathrm{aa} \to \mathrm{small}\); else \(\mathrm{helix} \ge 55\% \to \alpha\); else \(\mathrm{sheet} \ge 25\%\) and \(\mathrm{helix} < 20\% \to \beta\); else \(\to \alpha\beta\). For the other classifiers, sequence length and secondary structure proportions were used directly as features. For each classifier, accuracy is shown overall and per class.
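The expert handmade rule cascade and its threshold sweep can be sketched in Python as below. This is a minimal illustration, not the authors' code. The names `Thresholds`, `assign_fold_class`, `accuracy` and `sweep` are hypothetical, and several details are assumptions: secondary structure content is given in percent (0 to 100), per-class accuracy is scored as the fraction of each true SCOPe class assigned correctly, overall accuracy is the sweep objective, and the grid bounds are arbitrary. The footnote specifies only the rule order, the threshold values, and the 5% / 5 aa step sizes.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# One protein = (length in residues, helix %, sheet %); percentages on a 0-100 scale (assumption).
Features = Tuple[int, float, float]


@dataclass(frozen=True)
class Thresholds:
    small_len: int       # length < small_len aa          -> "small"
    alpha_helix: int     # else helix >= alpha_helix %    -> "alpha"
    beta_sheet: int      # else sheet >= beta_sheet % and
    beta_max_helix: int  #      helix < beta_max_helix %  -> "beta"; otherwise "alphabeta"


# Threshold values as listed in the table footnote.
UNOPTIMIZED = Thresholds(small_len=50, alpha_helix=60, beta_sheet=35, beta_max_helix=20)
OPTIMIZED = Thresholds(small_len=55, alpha_helix=55, beta_sheet=25, beta_max_helix=20)


def assign_fold_class(f: Features, t: Thresholds) -> str:
    """Apply the rules in order; the first matching rule decides the class."""
    length, helix, sheet = f
    if length < t.small_len:
        return "small"
    if helix >= t.alpha_helix:
        return "alpha"
    if sheet >= t.beta_sheet and helix < t.beta_max_helix:
        return "beta"
    return "alphabeta"


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Overall accuracy plus, for each true class, the fraction of its members predicted correctly."""
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class: Dict[str, float] = {}
    for c in sorted(set(y_true)):
        hits = [p == c for t, p in zip(y_true, y_pred) if t == c]
        per_class[c] = sum(hits) / len(hits)
    return overall, per_class


def sweep(data: List[Features], labels: List[str],
          len_grid: Iterable[int] = range(30, 81, 5),
          pct_grid: Iterable[int] = range(0, 101, 5)) -> Optional[Thresholds]:
    """Exhaustive grid search: 5 aa steps for length, 5 % steps for each content threshold."""
    pct = list(pct_grid)
    best: Optional[Thresholds] = None
    best_acc = -1.0
    for combo in product(len_grid, pct, pct, pct):
        t = Thresholds(*combo)
        acc = sum(assign_fold_class(f, t) == y for f, y in zip(data, labels)) / len(labels)
        if acc > best_acc:  # strict '>' keeps the first best combination found
            best, best_acc = t, acc
    return best
```

For example, a 52-residue protein with 10% helix and 40% sheet is assigned \(\beta\) under the unoptimized thresholds but "small" under the optimized ones, because the length cutoff moved from 50 aa to 55 aa. Integer percentages keep the grid values exact, which avoids floating-point drift when stepping thresholds in 5% increments.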