Table 1 Comparison of classification accuracy (AUC-ROC) between fine-tuned MLM-FG and existing pre-trained/self-supervised baselines on multiple classification benchmarks

From: Pre-trained molecular language models with random functional group masking

 

| | BBBP | BACE | ClinTox | Tox21 | SIDER | HIV | MUV |
|---|---|---|---|---|---|---|---|
| No. molecules | 2039 | 1513 | 1478 | 7831 | 1427 | 41,127 | 93,087 |
| No. prediction tasks | 1 | 1 | 2 | 12 | 27 | 1 | 17 |
| **Pre-trained models from existing literature** | | | | | | | |
| MolCLR-gin | 0.9307 | 0.7873 | 0.8005 | 0.7644 | 0.5826 | 0.7768 | 0.7386 |
| MolCLR-gcn | 0.8432 | 0.7194 | 0.7997 | 0.7179 | 0.5353 | 0.7616 | 0.6701 |
| GROVER-base | 0.9022 | 0.7700 | 0.6847 | 0.7187 | 0.5579 | 0.6950 | 0.6265 |
| GROVER-large | 0.8861 | 0.7795 | 0.6082 | 0.7155 | 0.5283 | 0.6956 | 0.5132 |
| GEM | 0.9103 | **0.8603** | 0.8506 | 0.7791 | **0.6279** | 0.7500 | 0.7253 |
| MoLFormer | 0.9037 | 0.8275 | 0.9451 | 0.7734 | 0.5826 | 0.7630 | 0.7599 |
| **MoLFormer and RoBERTa models without pre-training** | | | | | | | |
| MoLFormer (from scratch) | 0.8636 | 0.7728 | 0.7317 | 0.7461 | 0.5667 | 0.6991 | 0.6863 |
| RoBERTa (from scratch) | 0.8711 | 0.7445 | 0.8858 | 0.7369 | 0.5285 | 0.5575 | 0.6674 |
| **RoBERTa models pre-trained by random subsequence masking** | | | | | | | |
| RoBERTa (10M, rand. subseq.) | 0.8572 | 0.8253 | 0.9284 | 0.7533 | 0.6111 | 0.7006 | 0.6234 |
| RoBERTa (20M, rand. subseq.) | 0.9068 | 0.8135 | 0.9011 | 0.7635 | 0.5799 | 0.7477 | 0.6481 |
| RoBERTa (100M, rand. subseq.) | 0.9048 | 0.8248 | 0.9167 | 0.7852 | 0.5860 | 0.7683 | 0.6909 |
| **MoLFormer and RoBERTa models pre-trained by MLM-FG** | | | | | | | |
| MLM-FG (MoLFormer, 10M) | 0.8980 | 0.8044 | **0.9669** | 0.7765 | 0.5811 | 0.7633 | 0.6829 |
| MLM-FG (MoLFormer, 20M) | 0.8976 | 0.8088 | 0.9436 | 0.7793 | 0.5992 | 0.7801 | 0.7185 |
| MLM-FG (MoLFormer, 100M) | 0.9055 | 0.8040 | 0.9270 | 0.7893 | 0.5786 | 0.7690 | 0.6017 |
| MLM-FG (RoBERTa, 10M) | 0.8870 | 0.8265 | 0.9258 | 0.7545 | 0.6054 | 0.7106 | 0.6103 |
| MLM-FG (RoBERTa, 20M) | **0.9378** | 0.8458 | 0.8919 | 0.7603 | 0.5908 | 0.7594 | 0.6428 |
| MLM-FG (RoBERTa, 100M) | 0.9237 | 0.7981 | 0.9606 | **0.7896** | 0.6042 | **0.7807** | **0.7990** |

  1. Bold values indicate the best result in each column.
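
For benchmarks with more than one prediction task (e.g., Tox21 with 12 tasks or SIDER with 27), each column reports a single AUC-ROC value. The table itself does not state how per-task scores are aggregated; a common MoleculeNet-style convention is to average AUC-ROC over tasks, skipping molecules with missing labels and tasks where only one class is present. The sketch below illustrates that convention only; the function name `multitask_auc` and the array layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def multitask_auc(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Average AUC-ROC over prediction tasks (illustrative MoleculeNet-style aggregation).

    y_true  : (n_molecules, n_tasks) array of {0, 1} labels, NaN where a label is missing.
    y_score : (n_molecules, n_tasks) array of predicted probabilities.
    """
    per_task_auc = []
    for task in range(y_true.shape[1]):
        # Ignore molecules without a label for this task.
        mask = ~np.isnan(y_true[:, task])
        labels, scores = y_true[mask, task], y_score[mask, task]
        # AUC is undefined when only one class is present in the labels.
        if np.unique(labels).size < 2:
            continue
        per_task_auc.append(roc_auc_score(labels, scores))
    return float(np.mean(per_task_auc))
```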