Table 1 Predictive performance results of MoleculeFormer on 10 commonly used public datasets

From: MoleculeFormer is a GCN-transformer architecture for molecular property prediction

| Dataset | Split type | Metric | MoleculeNet (Graph) | Chemprop (optimized) | Attentive FP | HRGCN+ | XGBoost | FP-GNN | MoleculeFormer |
|---|---|---|---|---|---|---|---|---|---|
| BACE | Random | ROC-AUC | - | 0.898 | 0.876 | 0.891 | 0.889 | 0.881 | 0.913 |
| BACE | Scaffold | ROC-AUC | 0.806 | 0.857 | - | - | - | 0.860 | 0.874 |
| HIV | Random | ROC-AUC | - | 0.827 | 0.822 | 0.824 | 0.816 | 0.825 | 0.837 |
| HIV | Scaffold | ROC-AUC | 0.763 | 0.794 | - | - | - | 0.824 | 0.830 |
| BBBP | Random | ROC-AUC | - | - | 0.887 | 0.926 | 0.926 | 0.935 | 0.932 |
| BBBP | Scaffold | ROC-AUC | 0.690 | 0.886 | - | - | - | 0.916 | 0.924 |
| Tox21 | Random | ROC-AUC | 0.832 | 0.897 | 0.852 | 0.848 | 0.836 | 0.815 | 0.839 |
| ClinTox | Random | ROC-AUC | 0.832 | 0.897 | 0.904 | 0.899 | 0.911 | 0.840 | 0.883 |
| SIDER | Random | ROC-AUC | 0.638 | 0.658 | 0.623 | 0.641 | 0.642 | 0.661 | 0.707 |
| MUV | Random | PRC-AUC | 0.109 | 0.053 | 0.038 | 0.082 | 0.068 | 0.09 | 0.144 |
| FreeSolv | Random | RMSE | 1.150 | 1.009 | 1.091 | 0.926 | 1.025 | 0.905 | 1.022 |
| ESOL | Random | RMSE | 0.580 | 0.587 | 0.587 | 0.563 | 0.582 | 0.675 | 0.645 |
| Lipophilicity | Random | RMSE | 0.655 | 0.563 | 0.553 | 0.603 | 0.574 | 0.625 | 0.649 |

  1. Each dataset is split into training, validation, and test sets with a ratio of 8:1:1. Bold font indicates the best-performing model for each dataset. Attentive FP and HRGCN+ results are from Wu et al. (ref. 26). FP-GNN results are from Cai et al. (ref. 27).
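
The 8:1:1 partition mentioned in the footnote can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `random_split` and `group_split`, the seed, and the greedy largest-group-first assignment are all assumptions made for illustration; the footnote does not say how the scaffold split was carried out. `group_split` expects a caller-supplied function that returns a scaffold key for each molecule (for example a Murcko scaffold string computed elsewhere) and keeps all molecules with the same key in one subset.

```python
import random
from collections import defaultdict


def random_split(items, parts=(8, 1, 1), seed=0):
    """Shuffle items, then cut them into train/validation/test by integer parts."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility (assumed seed)
    total = sum(parts)
    n_train = len(items) * parts[0] // total
    n_valid = len(items) * parts[1] // total
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


def group_split(items, group_of, parts=(8, 1, 1)):
    """Assign whole groups (e.g. scaffolds) to one subset each.

    Hypothetical greedy scheme: the largest groups are placed first, filling
    train, then validation; anything that does not fit goes to test.
    """
    items = list(items)
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    total = sum(parts)
    train_cap = len(items) * parts[0] // total
    valid_cap = len(items) * parts[1] // total

    train, valid, test = [], [], []
    for members in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        elif len(valid) + len(members) <= valid_cap:
            valid.extend(members)
        else:
            test.extend(members)
    return train, valid, test
```

With a group-wise split the subset sizes only approximate 8:1:1, because a group is never divided. This is one reason the scaffold-split scores in the table are not directly comparable with the random-split scores for the same dataset.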