Extended Data Fig. 8: Analysis of Random Forest Generated and Selected Sets. | Nature Machine Intelligence

From: Generative AI for designing and validating easily synthesizable and structurally novel antibiotics

(a) The percentage of building blocks that appear at different frequencies among the compounds generated or selected by SyntheMol with the random forest model. Building blocks are assigned to bins on the x-axis based on the number of generated or selected compounds that contain that building block; the final bin includes building blocks that appear in at least six compounds (max 212). (b) The distribution of chemical reactions used by the compounds generated or selected by SyntheMol with the random forest model. (c–f) A comparison of the properties of the 27,396 molecules generated by SyntheMol with the random forest antibacterial model and the 50 molecules selected from that set after applying post hoc filters. (c) The distribution of nearest-neighbor Tversky similarities between the generated or selected compounds and the active molecules in the training set. (d) The distribution of nearest-neighbor Tversky similarities between the generated or selected compounds and the known antibacterial compounds from ChEMBL. (e) The distribution of random forest antibacterial model scores on the generated or selected compounds, as well as on a random set of 25,000 REAL molecules. (f) The distribution of nearest-neighbor Tanimoto similarities among the generated or selected compounds.
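The nearest-neighbor similarities in panels (c), (d) and (f) can be illustrated with a minimal sketch. It assumes each molecule is represented as a set of fingerprint feature indices; the toy sets below are made up. The caption does not state the Tversky weights, so `alpha` and `beta` are explicit parameters, and the values used in the example are illustrative assumptions rather than the authors' settings. All function names are hypothetical.

```python
from typing import List, Set


def tversky(a: Set[int], b: Set[int], alpha: float, beta: float) -> float:
    """Tversky index: |A∩B| / (|A∩B| + alpha*|A\\B| + beta*|B\\A|)."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom > 0 else 0.0


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity is the Tversky index with alpha = beta = 1."""
    return tversky(a, b, alpha=1.0, beta=1.0)


def nearest_neighbor_to_reference(
    queries: List[Set[int]],
    references: List[Set[int]],
    alpha: float,
    beta: float,
) -> List[float]:
    """For each query, the maximum Tversky similarity to any reference
    molecule (the kind of quantity plotted in panels c and d)."""
    return [max(tversky(q, r, alpha, beta) for r in references) for q in queries]


def nearest_neighbor_within(molecules: List[Set[int]]) -> List[float]:
    """For each molecule, the maximum Tanimoto similarity to any *other*
    molecule in the same set (the kind of quantity plotted in panel f)."""
    result = []
    for i, m in enumerate(molecules):
        others = [tanimoto(m, o) for j, o in enumerate(molecules) if j != i]
        result.append(max(others) if others else 0.0)
    return result


# Toy fingerprints (made-up feature indices, not real molecules).
generated = [{1, 2, 3, 4}, {2, 3}, {7, 8}]
actives = [{1, 2, 3}, {8, 9, 10}]

# alpha=1, beta=0 is an assumed, illustrative weighting.
to_actives = nearest_neighbor_to_reference(generated, actives, alpha=1.0, beta=0.0)
within_set = nearest_neighbor_within(generated)
print(to_actives)
print(within_set)
```

Low nearest-neighbor similarity to the reference sets suggests structural novelty, while low similarity within the set suggests diversity. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.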
