Table 1 Results by directly reconstructing molecules using random reconstruction algorithm

From: t-SMILES: a fragment-based molecular representation framework for de novo ligand design

Code Algorithm  Dataset  Model   Valid  Unique  Novelty  KLD    FCD    FBTs
TSSA            ChEMBL   TSSA_J  1.000  0.982   0.833    0.974  0.704  5029
TSSA            ChEMBL   TSSA_B  1.000  0.992   0.682    0.981  0.720  610633
TSSA            ChEMBL   TSSA_M  1.000  0.993   0.856    0.986  0.823  88938
TSSA            ChEMBL   TSSA_S  1.000  1.000   0.882    0.969  0.816  515329
TSSA            Zinc     TSSA_J  1.000  0.971   0.835    0.985  0.827  61786
TSSA            Zinc     TSSA_B  1.000  0.971   0.755    0.975  0.740  1197
TSSA            Zinc     TSSA_M  1.000  0.970   0.842    0.988  0.858  18989
TSSA            Zinc     TSSA_S  1.000  0.976   0.876    0.972  0.840  485
TSSA            QM9      TSSA_J  1.000  0.929   0.304    0.977  0.971  279
TSSA            QM9      TSSA_B  1.000  0.945   0.056    0.998  0.983  32
TSSA            QM9      TSSA_M  1.000  0.911   0.141    0.997  0.979  238
TSSA            QM9      TSSA_S  1.000  0.898   0.156    0.996  0.975  69
TSDY            ChEMBL   TSDY_B  1.000  0.996   0.210    0.997  0.915  4267
TSDY            ChEMBL   TSDY_M  1.000  0.996   0.711    0.987  0.897  85409
TSDY            ChEMBL   TSDY_S  1.000  0.996   0.459    0.995  0.913  39
TSDY            Zinc     TSDY_B  1.000  0.978   0.365    0.995  0.909  190
TSDY            Zinc     TSDY_M  1.000  0.978   0.688    0.996  0.916  5681
TSDY            Zinc     TSDY_S  1.000  0.978   0.393    0.997  0.939  18
TSID            ChEMBL   TSID_B  1.000  0.996   0.004    0.998  0.925  4267
TSID            Zinc     TSID_B  1.000  0.978   0.010    0.999  0.945  190

  1. The TSSA codes achieve the highest novelty scores, the TSDY codes achieve lower scores, and the TSID codes score almost zero. All codes receive reasonable FCD scores. “KLD” stands for Kullback–Leibler divergence and “FCD” for Fréchet ChemNet Distance. “FBTs” refers to the types of Full Binary Trees in the training dataset. The suffix letters J, B, M, and S in the t-SMILES code names denote the fragmentation algorithm: JTVAE, BRICS, MMPA, and Scaffold, respectively.
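
The table does not spell out how the Unique and Novelty columns are computed. As a rough illustration only, the sketch below assumes the definitions commonly used in molecular generation benchmarks: uniqueness as the fraction of generated molecules that are distinct, and novelty as the fraction of distinct generated molecules that do not appear in the training set. The function names and the toy SMILES lists are hypothetical and are not taken from the paper; the strings are assumed to be already valid and canonicalized, so plain string comparison is enough.

```python
def uniqueness(generated):
    """Fraction of generated molecules that are distinct (assumed definition)."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated molecules absent from the training set
    (assumed definition)."""
    distinct = set(generated)
    if not distinct:
        return 0.0
    return len(distinct - set(training)) / len(distinct)


# Toy data, hypothetical and not from the paper.
# The strings are assumed to be canonical SMILES already.
training = ["CCO", "CCN", "c1ccccc1"]
generated = ["CCO", "CCO", "CCC", "c1ccccc1", "CCCl"]

print(round(uniqueness(generated), 3))         # 4 distinct out of 5 generated
print(round(novelty(generated, training), 3))  # 2 of 4 distinct are unseen
```

Under these assumed definitions, a code that reconstructs training molecules almost exactly would score high on uniqueness but near zero on novelty, which is the pattern the TSID rows show.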