Fig. 2: Analysis of the dataset generated in this work (ANI-1xnr) with nanoreactor active learning.

a–d, Comparison between the ANI-1xnr dataset (blue points) and a non-reactive, near-equilibrium, molecule-in-vacuo active learning (AL) dataset from the literature (ANI-1x; red points), shown as two-dimensional visualizations of the local atomic environments for the elements H (a), C (b), N (c) and O (d). The ANI-1xnr dataset not only encompasses the vast majority of the regions sampled in the ANI-1x dataset but also interpolates between these regions and extends substantially beyond them. For visual clarity and to manage memory loads, only random subsets of the ANI-1x and ANI-1xnr datasets are depicted in a–d. e, Five examples of the more than 1,000 unique molecules that formed during AL. Reaction pathways that form these molecules must, therefore, be present in the ANI-1xnr dataset.