Fig. 3: Structural analysis of generated compounds by G2D-Diff and potential candidates. | Nature Communications

From: A genotype-to-drug diffusion model for generation of tailored anti-cancer small molecules

a Frequency-based cluster map of the scaffolds of the ground-truth very sensitive compounds. b Frequency-based cluster map of the scaffolds of the generated very sensitive compounds. For (a, b), each column corresponds to a cell line in evaluation set 1, and each row shows the frequency counts of a unique scaffold. Scaffolds with higher frequencies are shown in brighter colors. c Comparison of maximum structural and pharmacophore similarity between compounds randomly generated by Chemical VAE and compounds generated by G2D-Diff. The sample size N is 60,000 for each response class, and statistical significance from the one-sided Mann–Whitney U test is indicated by asterisks. Violin plots show the distribution density (width of the plot) along with the median (white dot), interquartile range (25th to 75th percentile; thick bar), and the minimum and maximum values within 1.5 times the interquartile range (thin line). d 2D PCA plot of physicochemical properties for very sensitive compounds. e 2D PCA plot of physicochemical properties for sensitive compounds. For (d, e), RDKit descriptors, after standardization, are used as the physicochemical properties. Generated and ground-truth compounds are shown in blue and orange, respectively. Marginal distributions of the samples are shown along both PC axes. f Criteria for selecting potential drug-like hit candidates for each cell line. g Selected potential drug-like hit candidates corresponding to the query cell lines. Source data are provided as a Source Data file.
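
The violin-plot summary described for panel c (median, interquartile range, and the most extreme values within 1.5 times the interquartile range) can be illustrated with a minimal sketch. This is not the authors' plotting code: the function name `violin_summary` and the sample values are hypothetical, the 1.5 × IQR distance is assumed to be measured from the quartiles (the common Tukey convention), and the quartiles use linear interpolation, which may differ slightly from the convention used for the figure.

```python
from statistics import quantiles


def violin_summary(values):
    """Return the summary markers drawn inside a violin plot.

    Hypothetical helper for illustration only. Assumes the whiskers end at
    the most extreme data points lying within 1.5 * IQR of the quartiles.
    """
    data = sorted(values)
    # Quartiles with linear interpolation over the observed range.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whisker ends are actual data points, not the fences themselves.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,          # white dot
        "q1": q1,                  # lower end of thick bar
        "q3": q3,                  # upper end of thick bar
        "whisker_low": inside[0],  # lower end of thin line
        "whisker_high": inside[-1] # upper end of thin line
    }


# Made-up similarity-like values with one extreme point.
summary = violin_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

In this toy input the value 100 lies beyond the upper fence, so the thin line would stop at 9 while the density outline of the violin would still extend to cover the extreme point.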