Table 3 Comparison of SELFIES, SMILES, t-SMILES and fragSMILES, across different augmentation levels and based on various properties, for a set of generated strings (using a ChEMBL subset, across five cross-validation folds)
From: fragSMILES as a chemical string notation for advanced fragment and chirality representation
| | 6000 (×5 folds) sampled strings | | | 6000 (×5 folds) sampled novel molecules | | | | | 6000 (×5 folds) sampled strings (chiral set) | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
Notation | Validity (↑) | Uniqueness (↑) | Novelty (↑) | FCD·10¹ (↓) | ΔlogP·10¹ (↓) | ΔSA·10² (↓) | ΔQED·10² (↓) | ΔMW (↓) | Invalidity (↓) | Validity (↑) | Uniqueness (↑) | Novelty (↑) |
SMILES 1× | 4930 ± 70* (82%)* | 4920 ± 70* (100%) | 4770 ± 60* (97%) | 8 ± 1* | 0.8 ± 0.3 | 5 ± 3 | 2 ± 1 | 14 ± 4 | 400 ± 40* (22%)* | 1370 ± 40 (78%)* | 1370 ± 40 (100%) | 1320 ± 40* (96%) |
SELFIES 1× | 6000 ± 0* (100%)* | 5999 ± 2* (100%)* | 5971 ± 2* (100%)* | 55 ± 2* | 2.0 ± 0.9 | 74 ± 4* | 1.9 ± 0.3 | 5 ± 3 | 670 ± 40* (37%)* | 1150 ± 20* (63%)* | 1150 ± 20* (100%)* | 1140 ± 20* (99%)* |
t-SMILES 1× | 6000 ± 0* (100%)* | 5880 ± 10* (98%)* | 5860 ± 10* (100%)* | 15.6 ± 0.8* | 2 ± 1 | 5 ± 1 | 3.8 ± 0.5* | 38 ± 3* | 1010 ± 50* (55%)* | 830 ± 50* (45%)* | 830 ± 50* (100%)* | 830 ± 50* (100%)* |
fragSMILES 1× | 5280 ± 20 (88%) | 5270 ± 30 (100%) | 5110 ± 40 (97%) | 6.9 ± 0.5 | 1.1 ± 0.6 | 5 ± 3 | 1 ± 1 | 9 ± 5 | 330 ± 30 (19%) | 1440 ± 70 (81%) | 1440 ± 60 (100%) | 1400 ± 60 (97%) |
SMILES 5× | 5300 ± 40* (88%)* | 5300 ± 40* (100%)* | 5280 ± 40 (100%)* | 9.9 ± 0.7* | 1.1 ± 0.4 | 6 ± 2 | 2 ± 2 | 15 ± 9 | 320 ± 50 (17%)* | 1500 ± 100 (83%)* | 1500 ± 100 (100%)* | 1500 ± 100 (100%)* |
SELFIES 5× | 6000 ± 0* (100%)* | 6000 ± 0* (100%)* | 5997 ± 1* (100%)* | 34 ± 1* | 1.2 ± 0.5 | 53 ± 2* | 1.7 ± 0.5 | 5 ± 2 | 520 ± 40* (27%)* | 1380 ± 80* (73%)* | 1380 ± 80* (100%)* | 1370 ± 80* (100%)* |
t-SMILES 5× | 6000 ± 0* (100%)* | 5930 ± 10* (99%)* | 5880 ± 10* (99%)* | 13.7 ± 0.6* | 1.4 ± 0.6 | 5 ± 2 | 3 ± 1* | 36 ± 4* | 1000 ± 100* (53%)* | 890 ± 60* (47%)* | 890 ± 60* (100%)* | 880 ± 60* (99%)* |
fragSMILES 5× | 5420 ± 60 (90%) | 5410 ± 60 (100%) | 5300 ± 60 (98%) | 7.2 ± 0.6 | 1.5 ± 0.7 | 5 ± 2 | 1.5 ± 0.7 | 7 ± 4 | 290 ± 30 (15%) | 1700 ± 100 (85%) | 1700 ± 100 (100%) | 1600 ± 100 (98%) |
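
The percentages in the "sampled strings" columns appear to be nested ratios: validity relative to the 6000 sampled strings, uniqueness relative to the valid strings, and novelty relative to the unique strings. This nesting is an inference from the reported numbers, not something stated in the table. The minimal Python sketch below recomputes the percentages for three of the 1× rows under that assumption; the function name `percentages` and the dictionary layout are illustrative only.

```python
# Mean counts copied from the "sampled strings" columns of Table 3 (1x rows).
SAMPLED = 6000  # strings sampled per fold

rows = {
    "SMILES 1x": (4930, 4920, 4770),      # (valid, unique, novel)
    "fragSMILES 1x": (5280, 5270, 5110),
    "t-SMILES 1x": (6000, 5880, 5860),
}


def percentages(valid, unique, novel, sampled=SAMPLED):
    """Return (validity, uniqueness, novelty) as whole-number percentages.

    Assumed nesting (inferred from the table, not stated in it):
    validity = valid / sampled, uniqueness = unique / valid,
    novelty = novel / unique.
    """
    return (
        round(100 * valid / sampled),
        round(100 * unique / valid),
        round(100 * novel / unique),
    )


results = {name: percentages(*counts) for name, counts in rows.items()}

for name, (validity, uniqueness, novelty) in results.items():
    print(f"{name}: validity {validity}%, uniqueness {uniqueness}%, novelty {novelty}%")
```

Under this assumption the recomputed values match the percentages printed in parentheses for these rows, which suggests that each metric is normalized by the count in the column to its left, not by the full 6000 samples.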