Fig. 3: Novelty and FCD (Fréchet ChemNet Distance) scores against SMILES, DSMILES, SELFIES, TSSA_S, TSSA_HSV, and TSSA_Hybrid on different training epochs. | Nature Communications

From: t-SMILES: a fragment-based molecular representation framework for de novo ligand design

Fig. 3

For each code, seven data points, ordered from the largest to the smallest marker, correspond to 50, 100, 200, 300, 500, 1000, and 2000 training epochs, respectively; the number of training epochs is encoded by marker size and color gradient. TSSA_Hybrid denotes the hybrid codes, which include TS_Vanilla, TSSA_J (JTVAE-based TSSA), TSSA_B (BRICS-based TSSA), TSSA_M (MMPA-based TSSA), and TSSA_S (Scaffold-based TSSA). In general, the FCD scores of all models increase consistently with the number of training epochs. After 200 training epochs (data points marked with a stroke), the novelty scores of the SMILES, DSMILES, and SELFIES models decline sharply and approach zero. In contrast, the single-code and hybrid t-SMILES models keep much more stable novelty scores, remaining above 0.8, 0.6, and 0.4 for TSSA_S, TSSA_HSV, and TSSA_Hybrid, respectively. These results indicate that the t-SMILES models outperform the SMILES, DSMILES, and SELFIES models because they avoid “striking similarity” to the training dataset and achieve “better novelty with reasonable similarity”.
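The trade-off described in this caption (novelty collapsing toward zero while FCD scores keep rising) is easier to read with the two metrics spelled out. The sketch below shows one common way they are computed; it is an illustration under stated assumptions, not the authors' evaluation code. It assumes that novelty is the fraction of unique generated molecules absent from the training set, that all inputs are already valid canonical SMILES (canonicalization, for example with RDKit, is left out), and that the FCD score follows the GuacaMol convention S = exp(-0.2 · FCD), so that a higher score means generated molecules are distributionally closer to the training data. The function names `novelty` and `fcd_score` are hypothetical, and computing the raw FCD itself (which needs ChemNet activations) is not covered.

```python
import math
from typing import Iterable, Set


def novelty(generated: Iterable[str], training: Set[str]) -> float:
    """Fraction of unique generated molecules that are absent from the training set.

    Assumption: every string is already a valid, canonical SMILES, so identical
    molecules compare equal as plain strings.
    """
    unique = set(generated)
    if not unique:
        return 0.0
    novel = unique - training  # molecules the model did not simply reproduce
    return len(novel) / len(unique)


def fcd_score(fcd: float) -> float:
    """Map a raw Frechet ChemNet Distance to a score in (0, 1]; higher is better.

    Assumption: the GuacaMol-style transform S = exp(-0.2 * FCD).
    """
    return math.exp(-0.2 * fcd)


if __name__ == "__main__":
    training = {"CCO", "c1ccccc1", "CC(=O)O"}
    memorized = ["CCO", "c1ccccc1", "CCO"]  # a model that only reproduces training data
    diverse = ["CCO", "CCN", "c1ccncc1"]    # a model that also proposes new molecules
    print(novelty(memorized, training))            # 0.0
    print(round(novelty(diverse, training), 3))    # 0.667
    print(fcd_score(0.0))                          # 1.0
```

A model that memorizes its training set can still reach a high FCD score while its novelty drops to zero, which is the pattern the caption reports for the SMILES, DSMILES, and SELFIES models after 200 epochs.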