Table 2 Comparison of four DRAGONFLY methods, namely ligand-SMILES, ligand-SELFIES, structure-SMILES, and structure-SELFIES

From: Prospective de novo drug design with deep interactome learning

| DRAGONFLY method | Valid and unique molecules / % | Valid, unique and novel molecules / % | RAScore ≥ 0.5 / % | Average Jaccard distance to other molecules |
|---|---|---|---|---|
| Ligand-SMILES | 93.3 (±0.4) | 92.2 (±0.4) | **93.4 (±0.6)** | 0.778 (±0.001) |
| Ligand-SELFIES | **99.9 (±0.1)** | **99.7 (±0.1)** | 84.0 (±1.0) | **0.805 (±0.002)** |
| Structure-SMILES | 90.2 (±0.8) | 87.4 (±0.9) | **90.0 (±1.0)** | 0.773 (±0.004) |
| Structure-SELFIES | **99.9 (±0.1)** | **99.6 (±0.1)** | 78.0 (±2.0) | **0.811 (±0.003)** |

| DRAGONFLY method | Unique atom scaffolds / % | Unique and novel atom scaffolds / % | Unique carbon scaffolds / % | Unique and novel carbon scaffolds / % |
|---|---|---|---|---|
| Ligand-SMILES | 85.0 (±0.1) | 53.0 (±0.2) | 98.4 (±0.3) | 58.0 (±0.2) |
| Ligand-SELFIES | **96.9 (±0.4)** | **86.0 (±0.1)** | **99.8 (±0.1)** | **83.0 (±0.1)** |
| Structure-SMILES | 84.0 (±0.1) | 55.0 (±0.3) | 98.3 (±0.3) | 56.0 (±0.2) |
| Structure-SELFIES | **96.0 (±0.1)** | **81.0 (±0.1)** | **99.9 (±0.1)** | **83.0 (±0.2)** |

  1. Bold indicates which representation (SELFIES or SMILES) achieves the higher value for the investigated property in both the structure- and ligand-based models. The columns report: (i) the percentage of valid and unique molecules, (ii) the percentage of valid, unique, and novel molecules, (iii) the percentage of molecules with an RAScore ≥ 0.5, (iv) the average Jaccard distance to other generated molecules from the same run (indicating diversity), and (v)–(viii) the scaffold metrics, namely the percentages of unique and of unique and novel atom and carbon scaffolds. Values are presented as mean and standard deviation over three DRAGONFLY runs (N = 3), each sampling 2000 SMILES strings.
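The metrics described in the footnote can be illustrated with a minimal Python sketch. It is not the authors' evaluation code, and it rests on several assumptions that the table does not settle: each molecule is represented as a set of fingerprint feature identifiers (the fingerprint type is not stated here), invalid samples are passed in as `None`, percentages are taken relative to the number of sampled strings, and the spread is the sample standard deviation. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def jaccard_distance(a, b):
    """Jaccard distance between two feature sets: 1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def average_pairwise_distance(feature_sets):
    """Mean Jaccard distance over all pairs of molecules from one run.

    Averaging each molecule's mean distance to all others gives the same
    value as averaging over all unordered pairs, which is done here.
    """
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two molecules")
    return mean(jaccard_distance(a, b) for a, b in pairs)


def percent_unique_and_novel(generated, known):
    """Return (% valid and unique, % valid, unique and novel).

    `generated` holds canonical strings, with None for invalid samples.
    Assumption: both percentages use the number of sampled strings as
    the denominator.
    """
    valid_unique = {s for s in generated if s is not None}
    novel = valid_unique - set(known)
    n = len(generated)
    return 100.0 * len(valid_unique) / n, 100.0 * len(novel) / n


def summarize_runs(values):
    """Mean and sample standard deviation across runs (here N = 3)."""
    return mean(values), stdev(values)
```

For example, `summarize_runs` would be called once per table cell with the three per-run values of a metric, and `average_pairwise_distance` once per run with the feature sets of the molecules sampled in that run.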