Fig. 6: Sources of error in the ML surrogate model and strategies for tuning redox potential.

a, Outliers in ML-predicted stability scores relative to DFT values. For each method, the predicted location of maximum spin is highlighted and the corresponding stability score is given. b, An example of a redox-potential prediction error caused by a lack of similar molecules in the training database. The input radical is shown on the left and the five closest training-set radicals on the right. c, Left: distribution of redox potentials for all training-set molecules, coloured by each radical's SOMO energy. Right: distribution of SOMO energies of radicals in the training database; the grey region spans the 5th to 95th percentiles of the SOMO energies observed among RL-optimized candidates. Example structures from the radical database are shown for a range of SOMO energies.