Fig. 4: Distribution of SAScore for the training dataset and sets of generated molecules for ChEMBL and ZINC. | Nature Communications

From: t-SMILES: a fragment-based molecular representation framework for de novo ligand design

Additional curves and Wasserstein distance metrics are given in sections SI.E.4 and SI.E.5. a, b, c Baseline, Singleton t-SMILES and Hybrid t-SMILES models on ChEMBL. d, e, f Baseline, Singleton t-SMILES and Hybrid t-SMILES models on ZINC. The letter “H” indicates a hybrid model, while the letters “J”, “B”, “M”, and “S” indicate the fragmentation algorithms JTVAE, BRICS, MMPA, and Scaffolds, respectively. “V” indicates TS_Vanilla. The figures suggest that some singleton or hybrid models in the t-SMILES family capture these physicochemical properties as effectively as, or better than, SMILES, DSMILES, and SELFIES in a similar experiment. Furthermore, the TSDY and TSID models fit the training data much better than the TSSA models. The baseline models hG2G and JTVAE show limited ability to learn patterns from the training data and perform worst.
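To make the Wasserstein distance metric mentioned in the caption concrete, the following is a minimal sketch of how a one-dimensional Wasserstein-1 distance between two empirical score distributions could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `wasserstein_1d` and the sample values are hypothetical, and the scores are toy numbers rather than SAScore values from the paper.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two empirical 1-D distributions.

    Computed as the integral of |CDF_u(x) - CDF_v(x)| over x, which for
    step-function CDFs reduces to a finite sum over the sorted sample points.
    """
    u_sorted = sorted(u)
    v_sorted = sorted(v)
    if not u_sorted or not v_sorted:
        raise ValueError("both samples must be non-empty")

    # Every point where either empirical CDF can change value.
    all_values = sorted(u_sorted + v_sorted)

    total = 0.0
    for left, right in zip(all_values, all_values[1:]):
        # Empirical CDFs are constant on [left, right).
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Toy scores only (hypothetical, not taken from the paper).
training_scores = [2.1, 2.8, 3.0, 3.4, 4.2]
generated_scores = [2.3, 2.9, 3.1, 3.9, 4.8]
distance = wasserstein_1d(training_scores, generated_scores)
```

A smaller distance means the generated set's score distribution lies closer to that of the training set, which is the sense in which a model "fits" the training data in the caption. The samples do not need to be the same size.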
