Fig. 4: Performance of each regression method tested in this work using bond lengths as input features compared with results obtained using Marvin.
From: Aqueous pKa prediction for tautomerizable compounds using equilibrium bond lengths

a The 7-fold RMSEE of every model tested for each method, where “Model ID” denotes one of the 31 feature combinations that can be formed from the 5 bonds i–v chosen for consideration (see Supplementary Table 8 for the full list). Model ID numbers shaded in blue use the C–O bond (ii) as a feature. b Experimental pKa variation across the test set (dark blue), along with Marvin predictions using the diketo state with tautomer consideration turned on (blue) and using the keto-enol state with tautomer consideration turned off (magenta), and predictions of the AIBL-pKa C–O bond model (green). c Root-mean-squared error of prediction (RMSEP, blue) and mean absolute error (MAE, green) on the test set for each prediction method. Marvin predictions are omitted from the inset so that the AIBL models can be compared. d The structure of Profoxydim, for which the literature experimental pKa value (5.91) and Marvin’s prediction (5.44; tautomer/resonance not considered, keto-enol form used) deviated significantly from our prediction. The new experimental value of 4.82, measured in this work, matches our initial prediction more closely.
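The count of 31 models in panel a is the number of non-empty subsets of 5 bonds (2^5 − 1), and 16 of those contain bond ii. The following minimal Python sketch enumerates the subsets and gives the standard textbook definitions of root-mean-squared error and mean absolute error, as used for RMSEP and MAE in panel c. The function names and the subset ordering are assumptions made for illustration; the actual Model ID numbering is given in Supplementary Table 8 and is not reproduced here.

```python
from itertools import combinations
from math import sqrt

# Bond labels as used in the caption; the ordering of subsets below is
# illustrative and is NOT claimed to match the paper's Model ID numbering.
BONDS = ["i", "ii", "iii", "iv", "v"]


def feature_subsets(bonds):
    """Return every non-empty subset of `bonds`, smallest subsets first."""
    return [
        combo
        for size in range(1, len(bonds) + 1)
        for combo in combinations(bonds, size)
    ]


def rmse(observed, predicted):
    """Root-mean-squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


subsets = feature_subsets(BONDS)
with_ii = [s for s in subsets if "ii" in s]

print(len(subsets))   # 31 candidate feature sets (2**5 - 1)
print(len(with_ii))   # 16 of them include the C-O bond ii

# Single-compound illustration using the Profoxydim values from panel d:
# Marvin's prediction (5.44) against the new experimental value (4.82).
print(round(mae([4.82], [5.44]), 2))  # 0.62
```

For a single compound the MAE reduces to the absolute deviation, so the last line simply restates the 0.62 pKa-unit gap between Marvin's prediction and the remeasured value.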