Table 2 Comparison of the four DRAGONFLY methods: ligand-SMILES, ligand-SELFIES, structure-SMILES, and structure-SELFIES

DRAGONFLY method	Valid and unique molecules / %	Valid, unique and novel molecules / %	RAScore ≥ 0.5 / %	Average Jaccard distance to other molecules
Ligand-SMILES	93.3 ( ± 0.4)	92.2 ( ± 0.4)	93.4 ( ± 0.6)	0.778 ( ± 0.001)
Ligand-SELFIES	99.9 ( ± 0.1)	99.7 ( ± 0.1)	84.0 ( ± 1.0)	0.805 ( ± 0.002)
Structure-SMILES	90.2 ( ± 0.8)	87.4 ( ± 0.9)	90.0 ( ± 1.0)	0.773 ( ± 0.004)
Structure-SELFIES	99.9 ( ± 0.1)	99.6 ( ± 0.1)	78.0 ( ± 2.0)	0.811 ( ± 0.003)

DRAGONFLY method	Unique atom scaffolds / %	Unique and novel atom scaffolds / %	Unique carbon scaffolds / %	Unique and novel carbon scaffolds / %
Ligand-SMILES	85.0 ( ± 0.1)	53.0 ( ± 0.2)	98.4 ( ± 0.3)	58.0 ( ± 0.2)
Ligand-SELFIES	96.9 ( ± 0.4)	86.0 ( ± 0.1)	99.8 ( ± 0.1)	83.0 ( ± 0.1)
Structure-SMILES	84.0 ( ± 0.1)	55.0 ( ± 0.3)	98.3 ( ± 0.3)	56.0 ( ± 0.2)
Structure-SELFIES	96.0 ( ± 0.1)	81.0 ( ± 0.1)	99.9 ( ± 0.1)	83.0 ( ± 0.2)

Bold marks the representation (SELFIES or SMILES) that achieves the higher value for a given property in both the structure- and ligand-based models. The reported metrics are: (i) the percentage of valid and unique molecules, (ii) the percentage of valid, unique, and novel molecules, (iii) the percentage of molecules with an RAScore ≥ 0.5, (iv) the average Jaccard distance to the other molecules generated in the same run (a measure of diversity), and (v)–(viii) scaffold metrics, namely the percentages of unique and of unique and novel atom and carbon scaffolds. Values are given as mean (± standard deviation) over three DRAGONFLY runs (N = 3), each sampling 2000 SMILES strings.
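
The uniqueness, novelty, and diversity metrics in the table can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: molecules are given as already-canonicalized strings, percentages are taken relative to the number of sampled strings, each fingerprint is represented as a set of "on" bit indices (the fingerprint type is not stated in the table), and the spread over runs is the sample standard deviation. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Jaccard distance 1 - |A n B| / |A u B| between two bit sets."""
    union = a | b
    if not union:
        # Assumption: two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_jaccard_distance(fingerprints: list) -> float:
    """Mean Jaccard distance over all unordered pairs from one run."""
    if len(fingerprints) < 2:
        raise ValueError("need at least two fingerprints")
    return mean(jaccard_distance(a, b) for a, b in combinations(fingerprints, 2))


def percent_unique_and_novel(valid: list, training: set, n_sampled: int) -> tuple:
    """Percentages of unique and of unique-and-novel molecules.

    Assumptions: `valid` holds canonical strings of the valid samples,
    `training` holds canonical strings of the reference set, and both
    percentages are relative to `n_sampled`.
    """
    unique = set(valid)
    novel = unique - training
    return 100.0 * len(unique) / n_sampled, 100.0 * len(novel) / n_sampled


def summarize_runs(values: list) -> tuple:
    """Mean and sample standard deviation over independent runs."""
    return mean(values), stdev(values)
```

For three runs, `summarize_runs` would be called once per metric with the three per-run values, which corresponds to the "mean (± standard deviation)" entries reported in each cell.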
