Table 2 Results on JNK3 active molecules using MolGPT with different training epochs

From: t-SMILES: a fragment-based molecular representation framework for de novo ligand design

| Model3 | Valid | Novelty | FCD | Active Novel | FBT Novel | Frag Novel |
|---|---|---|---|---|---|---|
| SMILES[R200] | 0.795 | 0.120 | 0.584 | 0.072 | N/A | N/A |
| SMILES[R2000] | 1.000 | 0.001 | 0.765 | 0.004 | N/A | N/A |
| DSMILES[R200] | 0.677 | 0.076 | 0.510 | 0.043 | N/A | N/A |
| DSMILES[R2000] | 0.999 | 0.001 | 0.778 | 0.001 | N/A | N/A |
| SELFIES[R200] | 1.000 | 0.238 | 0.544 | 0.148 | N/A | N/A |
| SELFIES[R2000] | 1.000 | 0.008 | 0.767 | 0.050 | N/A | N/A |
| TSSA_S[R300] | 1.000 | 0.833 | 0.564 | 0.582 | 2.655 | 0.962 |
| TSSA_S[R5000] | 1.000 | 0.817 | 0.608 | 0.564 | 2.534 | 0.049 |
| TSSA_S[R50000] | 1.000 | 0.824 | 0.572 | 0.571 | 2.379 | 0.023 |
| TSSA_HSV[R200] | 1.000 | 0.483 | 0.680 | 0.350 | 2.086 | 5.044 |
| TSSA_HSV[R2000] | 1.000 | 0.447 | 0.716 | 0.319 | 1.810 | 0.365 |
| TSSA_Hybrid[R200] | 1.000 | 0.683 | 0.622 | 0.374 | 2.310 | 25.978 |
| TSSA_Hybrid[R2000] | 1.000 | 0.657 | 0.619 | 0.437 | 2.672 | 23.745 |
| TF_SMILES[R5] | 0.887 | 0.707 | 0.523 | 0.526 | N/A | N/A |
| TF_SMILES[R100] | 0.999 | 0.033 | 0.764 | 0.023 | N/A | N/A |
| TF_TSSA_S[R5] | 1.000 | 0.932 | 0.483 | 0.710 | 2.897 | 9.105 |
| TF_TSSA_S[R100] | 1.000 | 0.849 | 0.570 | 0.569 | 2.431 | 0.208 |
| SMILES_Aug50[R10] | 0.807 | 0.570 | 0.566 | 0.483 | N/A | N/A |
| SMILES_Aug50[R100] | 0.995 | 0.049 | 0.750 | 0.047 | N/A | N/A |
| TSSA_S_Rec50[R10] | 1.000 | 0.962 | 0.389 | 0.829 | 2.414 | 1.757 |
| TSSA_S_Rec50[R100] | 1.000 | 0.960 | 0.411 | 0.809 | 2.448 | 0.655 |

  1. “Active Novel” is the share of novel generated molecules that the AttentiveFP model predicts to be active. “FBT Novel” counts generated FBTs (Full Binary Trees) that differ from those in the training data. “Frag Novel” counts generated fragments that differ from those in the training data. “R” denotes the number of training epochs. “Aug” denotes augmenting the training data by enumerating SMILES. “Rec” denotes reconstructing directly from active molecules to generate new active molecules as training data. “TF” denotes transfer learning. TSSA_HSV is a hybrid model on TS_Vanilla and TSSA_S. TSSA_Hybrid is a hybrid model on all TSSA codes, including JTVAE, BRICS, MMPA, and scaffold-based TSSA, together with TS_Vanilla. See SI.D.3 and SI.D.4 for more detailed results.
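
The model labels in this table follow a compact naming pattern: an optional "TF_" prefix, a representation name, an optional "_Aug" or "_Rec" suffix followed by a number, and the epoch count in "[R…]". The following is a minimal Python sketch of how such labels could be split into their parts when analysing the table programmatically. The names `ModelLabel` and `parse_label` are hypothetical and not taken from the paper, and the sketch does not interpret the number after "Aug" or "Rec", since its meaning is not stated in the footnote.

```python
import re
from typing import NamedTuple, Optional


class ModelLabel(NamedTuple):
    """Parts of a model label such as 'TF_TSSA_S[R100]' (hypothetical helper)."""
    representation: str             # e.g. "SMILES", "TSSA_S", "TSSA_Hybrid"
    transfer_learning: bool         # True when the label starts with "TF_"
    data_strategy: Optional[str]    # "Aug", "Rec", or None
    strategy_number: Optional[int]  # number after Aug/Rec; meaning not stated in the footnote
    epochs: int                     # number after "R" inside the brackets


# Assumed label grammar, inferred only from the labels that appear in the table.
_LABEL = re.compile(
    r"^(?P<tf>TF_)?"
    r"(?P<rep>.+?)"
    r"(?:_(?P<strategy>Aug|Rec)(?P<number>\d+))?"
    r"\[R(?P<epochs>\d+)\]$"
)


def parse_label(label: str) -> ModelLabel:
    """Split a table label into representation, flags, and epoch count."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised model label: {label!r}")
    number = match.group("number")
    return ModelLabel(
        representation=match.group("rep"),
        transfer_learning=match.group("tf") is not None,
        data_strategy=match.group("strategy"),
        strategy_number=int(number) if number is not None else None,
        epochs=int(match.group("epochs")),
    )


if __name__ == "__main__":
    for label in ("SMILES[R200]", "TF_TSSA_S[R5]", "TSSA_S_Rec50[R100]"):
        print(label, "->", parse_label(label))
```

Parsing the labels this way makes it straightforward to group rows by representation and compare, for example, how Novelty changes with the number of epochs for a given representation.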