Table 3 Results of the GuacaMol goal-directed optimization task

From: Leveraging tree-transformer VAE with fragment tokenization for high-performance large chemical model generation

| Task | FRATTVAE + MSO | MoLeR* + MSO | Best of Dataset | GraphGA‡ | CReM‡ |
| --- | --- | --- | --- | --- | --- |
| Celecoxib rediscovery | 0.835 | 0.868 | 0.505 | 1.000 | 1.000 |
| Troglitazone rediscovery | 0.817 | 0.544 | 0.419 | 1.000 | 1.000 |
| Thiothixene rediscovery | 0.589 | 0.649 | 0.456 | 1.000 | 1.000 |
| Aripiprazole similarity | 0.976 | 0.894 | 0.595 | 1.000 | 1.000 |
| Albuterol similarity | 0.946 | 0.920 | 0.719 | 1.000 | 1.000 |
| Mestranol similarity | 0.804 | 0.741 | 0.629 | 1.000 | 1.000 |
| C11H24 | 0.967 | 0.969 | 0.684 | 0.971 | 0.966 |
| C9H10N2O2PF2Cl | 0.841 | 0.881 | 0.747 | 0.982 | 0.940 |
| Median molecules 1 | 0.358 | 0.353 | 0.334 | 0.406 | 0.371 |
| Median molecules 2 | 0.342 | 0.317 | 0.351 | 0.432 | 0.434 |
| Osimertinib MPO | 0.899 | 0.875 | 0.839 | 0.953 | 0.995 |
| Fexofenadine MPO | 0.891 | 0.890 | 0.817 | 0.998 | 1.000 |
| Ranolazine MPO | 0.855 | 0.866 | 0.792 | 0.920 | 0.969 |
| Perindopril MPO | 0.695 | 0.566 | 0.575 | 0.792 | 0.815 |
| Amlodipine MPO | 0.791 | 0.663 | 0.696 | 0.894 | 0.902 |
| Sitagliptin MPO | 0.613 | 0.736 | 0.509 | 0.891 | 0.763 |
| Zaleplon MPO | 0.584 | 0.612 | 0.547 | 0.754 | 0.770 |
| Valsartan MPO | 0.823 | 0.747 | 0.259 | 0.990 | 0.994 |
| Deco Hop | 0.935 | 0.941 | 0.933 | 1.000 | 1.000 |
| Scaffold Hop | 0.880 | 0.817 | 0.738 | 1.000 | 1.000 |
| Average | 0.772 | 0.742 | 0.607 | 0.899 | 0.896 |
| Quality | 0.782 | 0.652 | 0.786 | 0.398 | 0.524 |
| SA Score | 3.126 | 3.291 | 2.989 | 3.975 | 3.506 |

Performance comparison of FRATTVAE and MoLeR on the GuacaMol goal-directed optimization tasks. Scores are reported for each task; higher scores indicate better outcomes. The average score, Quality rating, and SA score of the top 100 optimized molecules are also shown. Best of Dataset is the property value of the best-scoring molecule in the GuacaMol dataset. Models marked with ‘*’ use pretrained models. For GraphGA and CReM, literature-reported values (refs. 12, 42) are included and marked with ‘‡’.
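The caption does not say how the Average row is derived. As a quick consistency check, the sketch below assumes it is the unweighted arithmetic mean of the 20 per-task scores and recomputes it for the two MSO-optimized models. The variable names are illustrative only; the numbers are copied from the table.

```python
# Per-task scores copied from Table 3, in row order
# (Celecoxib rediscovery ... Scaffold Hop). Names are illustrative.
frattvae_mso = [
    0.835, 0.817, 0.589, 0.976, 0.946, 0.804, 0.967, 0.841, 0.358, 0.342,
    0.899, 0.891, 0.855, 0.695, 0.791, 0.613, 0.584, 0.823, 0.935, 0.880,
]
moler_mso = [
    0.868, 0.544, 0.649, 0.894, 0.920, 0.741, 0.969, 0.881, 0.353, 0.317,
    0.875, 0.890, 0.866, 0.566, 0.663, 0.736, 0.612, 0.747, 0.941, 0.817,
]


def mean_score(scores):
    """Unweighted arithmetic mean, rounded to three decimals like the table.

    Assumption: the Average row is a plain mean over all listed tasks.
    """
    return round(sum(scores) / len(scores), 3)


frattvae_avg = mean_score(frattvae_mso)
moler_avg = mean_score(moler_mso)

print(f"FRATTVAE + MSO average: {frattvae_avg}")  # 0.772
print(f"MoLeR* + MSO average:   {moler_avg}")     # 0.742
```

Both results agree with the reported Average row, which is consistent with a plain mean over the 20 tasks. Quality and SA Score are separate metrics and are not part of this average.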