Fig. 2: The accuracy of different classifiers for predicting the substrate specificity of AT domains.

a The notation Bk+ denotes the bin of data points whose Hamming distance to every training data point is at least k. While most methods achieve similar accuracy, the extra-tree classifier generalizes better: it achieves higher accuracy on testing samples that are dissimilar to the training samples. b Confusion matrix for the extra-tree predictions. The results are averaged across five different shuffles (those used in fivefold cross-validation) and rounded to the nearest integer.
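
The binning in panel a can be illustrated with a minimal sketch. It assumes that each sample is an aligned, equal-length string (for example, residues at fixed positions), which the caption does not state. The function names and the `thresholds` parameter are hypothetical and are not taken from the original work.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_training(sample, train_seqs):
    """Smallest Hamming distance from one sample to any training sequence."""
    return min(hamming(sample, t) for t in train_seqs)


def accuracy_by_bin(test_seqs, y_true, y_pred, train_seqs, thresholds):
    """Accuracy within each bin Bk+, for every k in `thresholds`.

    A test sample belongs to Bk+ when its minimum Hamming distance to the
    training set is at least k, so the bins are nested (B0+ holds everything).
    An empty bin is reported as None rather than as an accuracy of zero.
    """
    dists = [min_distance_to_training(s, train_seqs) for s in test_seqs]
    result = {}
    for k in thresholds:
        members = [i for i, d in enumerate(dists) if d >= k]
        if members:
            correct = sum(y_true[i] == y_pred[i] for i in members)
            result[k] = correct / len(members)
        else:
            result[k] = None
    return result
```

Because the bins are nested, a curve that stays flat as k grows indicates a classifier whose accuracy holds up on samples far from the training data, which is the sense of "generalizes better" used in the caption.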