Fig. 7: Example surrogate decision tree model.

The task is to predict reaction success from tabular input data, for which an SVM model achieves good accuracy. A surrogate decision tree is then trained on the SVM model's predictions, rather than on the original labels, to help explain the original model. The surrogate tree is shown above. Ovals represent decision nodes, rectangles represent reaction-outcome bins, and triangles represent excised subtrees. Each reaction-outcome bin contains a reaction-outcome value (3 or 4) and the number of reactions assigned to that bin (shown in parentheses). The bins containing the most successful reactions, together with their associated synthesis paths, are identified and colored. By inspecting the green, blue and red subtrees, the authors generate the following new testable chemical hypotheses: (1) Small, low-polarizability amines require the absence of competing Na+ cations and longer reaction times. (2) Spherical, low projection-size amines require V4+-containing reagents such as VOSO4. (3) Long tri- and tetramines require oxalate reactants. Figure reprinted from ref. 20 with permission. Copyright 2016 Springer Nature.
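
The surrogate-tree workflow described in this caption can be sketched in a few lines. What follows is a minimal, dependency-free Python illustration under stated assumptions, not the procedure used in ref. 20: `black_box` is a hypothetical stand-in for the trained SVM, the grid of inputs is synthetic, and `fit_tree`, `predict`, and `fidelity` are illustrative names. The sketch fits a small Gini-based decision tree to the black box's predictions (not to ground-truth labels) and then measures how often the tree agrees with the black box.

```python
from collections import Counter


def black_box(x):
    """Hypothetical stand-in for the trained SVM; returns outcome 3 or 4."""
    # A nonlinear (circular) rule that an axis-aligned tree can only approximate.
    return 4 if (x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2 < 0.1 else 3


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_tree(X, y, depth=0, max_depth=3):
    """Greedy tree: a leaf is a label, a node is (feature, threshold, left, right)."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            # Size-weighted impurity of the two children.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no valid split exists
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            fit_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def fidelity(tree, X, targets):
    """Fraction of inputs on which the surrogate matches the black box."""
    return sum(predict(tree, x) == t for x, t in zip(X, targets)) / len(X)


# Synthetic inputs (assumption): a 15 x 15 grid over the unit square.
X = [(i / 14, j / 14) for i in range(15) for j in range(15)]
y_bb = [black_box(x) for x in X]       # the surrogate's targets are model outputs
surrogate = fit_tree(X, y_bb, max_depth=3)
print("fidelity to black box:", fidelity(surrogate, X, y_bb))
```

The fidelity here is computed on the same points the tree was fitted to, so it is an optimistic estimate; a held-out set would give a fairer picture of how faithfully the tree mirrors the model it is meant to explain. In practice a library tree learner would likely replace the hand-written `fit_tree`.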