Fig. 4: Accurate predictions even for enzymes with low sequence identity to enzymes in the training data.
From: A general model to predict small molecule substrates of enzymes based on machine and deep learning

We divided the test set into subsets with different levels of enzyme sequence identity relative to enzymes in the training set. a ESP accuracies, calculated separately for enzyme-small molecule pairs in which the small molecule occurred in the training set and pairs in which it did not. b ESP ROC curves. The dotted line displays the ROC curve expected for a completely random model. Source data are provided as a Source Data file.
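
The stratified evaluation described for panel a could be sketched roughly as follows. This is only an illustration, not the authors' code: the bin edges in `BINS`, the record field names (`max_identity`, `molecule_in_training`, `label`, `score`), and the 0.5 decision threshold are all assumptions introduced here, since the caption does not state them.

```python
from collections import defaultdict

# Hypothetical identity bins in percent; the caption does not give the actual cut-offs.
BINS = [(0, 40), (40, 60), (60, 80), (80, 100)]


def identity_bin(max_identity):
    """Return the (low, high) bin for a maximal sequence identity in percent.

    Bins are half-open [low, high), except that the last bin also includes 100.
    """
    for low, high in BINS:
        if low <= max_identity < high:
            return (low, high)
    if max_identity == BINS[-1][1]:
        return BINS[-1]
    raise ValueError(f"identity out of range: {max_identity}")


def stratified_accuracy(records, threshold=0.5):
    """Compute accuracy per (identity bin, molecule seen in training) group.

    Each record is a dict with the assumed keys:
      max_identity          -- highest identity (%) to any training enzyme
      molecule_in_training  -- True if the small molecule occurs in the training set
      label                 -- 1 for a true substrate pair, 0 otherwise
      score                 -- model output in [0, 1]
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (identity_bin(r["max_identity"]), r["molecule_in_training"])
        predicted = 1 if r["score"] >= threshold else 0
        correct[key] += int(predicted == r["label"])
        total[key] += 1
    return {key: correct[key] / total[key] for key in total}
```

Calling `stratified_accuracy` on a list of test-set records returns one accuracy value per combination of identity bin and seen/unseen small molecule, which corresponds to the grouping plotted in panel a. For panel b, a model that assigns scores at random is expected to trace the diagonal of the ROC plot, which is what the dotted line represents.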