Fig. 3: The retrospective experiment for DRD2.
From: Accelerating discovery of bioactive ligands with pharmacophore-informed generative models

a A t-SNE plot of 108-bit pharmacophore fingerprints of 7939 active and 7939 inactive ligands of DRD2. b Data preparation in this experiment. “GuacaMol-DRD2 clean” refers to the subset of the original GuacaMol dataset with every known active ligand of DRD2 excluded. Baseline conditions consist of 3717 compounds randomly sampled from “GuacaMol-DRD2 clean”. DRD2 actives were divided into two subsets by scaffold clustering (see Section “Settings for DRD2 recall experiment”). One subset consists of 3717 ligands of DRD2, which are visible to TransPharmer during training and used as active conditions during generation, while the other subset contains the remaining 4222 actives to be rediscovered. c The recall rate and precision count of generated molecules using active and baseline conditions. d Some recalled active ligands and their most similar counterparts that TransPharmer saw during training. An example of recalling active ligands using shortcuts is shown in the top panel, while an example of recalling actives beyond any obvious shortcuts is shown in the bottom panel. The molecular similarity between each recalled ligand and its most similar seen counterpart is shown. Source data are provided as a Source Data file.
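
As a rough illustration of the recall measurement in panel c, the following minimal sketch computes the fraction of held-out actives that reappear among generated molecules. It rests on several assumptions not stated in the caption: molecules are compared as exact matches of already-canonicalized SMILES strings, duplicates among generated molecules are collapsed, and the function name `recall_of_held_out` is hypothetical. The caption does not define "precision count", so the sketch does not attempt to reproduce it and reports only the number of distinct recalled actives.

```python
from typing import Iterable, Tuple


def recall_of_held_out(
    generated: Iterable[str], held_out_actives: Iterable[str]
) -> Tuple[float, int]:
    """Return (recall rate, number of distinct recalled actives).

    Assumption: both inputs are canonical SMILES strings produced by the
    same canonicalization routine, so string equality means molecular identity.
    """
    held_out = set(held_out_actives)
    if not held_out:
        raise ValueError("held_out_actives must not be empty")

    # Collapse duplicate generations so each molecule is counted once.
    unique_generated = set(generated)

    # Held-out actives that the generator reproduced exactly.
    recalled = unique_generated & held_out

    recall_rate = len(recalled) / len(held_out)
    return recall_rate, len(recalled)


if __name__ == "__main__":
    # Toy placeholder identifiers, not real DRD2 ligands.
    generated = ["mol_B", "mol_B", "mol_C", "mol_X"]
    held_out = ["mol_B", "mol_C", "mol_D", "mol_E"]
    rate, count = recall_of_held_out(generated, held_out)
    print(f"recall rate = {rate:.2f}, recalled actives = {count}")
```

In the setting described by the caption, `held_out_actives` would correspond to the 4222 actives withheld from training, and the function would be called once for molecules generated under active conditions and once for those generated under baseline conditions.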