Table 2 Overall performance comparison to the state-of-the-art methods on molecular property prediction regression tasks.

From: Pharmacophoric-constrained heterogeneous graph transformer model for molecular property prediction

Regression (RMSE, lower is better ↓)

| Dataset | ESOL | FreeSolv | Lipophilicity | ESOL | FreeSolv | Lipophilicity |
|---|---|---|---|---|---|---|
| Molecules | 1128 | 642 | 4200 | 1128 | 642 | 4200 |
| Tasks | 1 | 1 | 1 | 1 | 1 | 1 |
| Splitting strategy | Random | Random | Random | Scaffold | Scaffold | Scaffold |
| AttentiveFP | 0.853 (0.060) | 2.030 (0.420) | 0.650 (0.030) | 0.877 (0.029) | 2.073 (0.183) | 0.721 (0.001) |
| FragGAT | 0.878 (0.124) | 1.538 (0.640) | 0.645 (0.042) | 0.884 (0.041) | 2.065 (0.201) | 0.750 (0.013) |
| MPNN | 1.167 (0.430) | 2.185 (0.952) | 0.672 (0.051) | 1.541 (0.630) | 2.430 (0.821) | 0.730 (0.063) |
| DMPNN | 0.980 (0.258) | 2.177 (0.914) | 0.653 (0.046) | 1.050 (0.008) | 2.182 (0.183) | 0.683 (0.016) |
| CMPNN | 0.789 (0.112) | 2.007 (0.442) | 0.614 (0.029) | 0.845 (0.039) | 1.833 (0.580) | 0.658 (0.029) |
| CoMPT | 0.774 (0.058) | 1.855 (0.578) | 0.592 (0.048) | 0.915 (0.042) | 1.959 (0.808) | 0.646 (0.028) |
| GROVERbase | 0.888 (0.116) | 1.592 (0.072) | 0.660 (0.061) | 1.185 (0.160) | 2.001 (0.081) | 0.817 (0.008) |
| GROVERlarge | 0.831 (0.120) | 1.544 (0.397) | 0.643 (0.030) | 1.098 (0.178) | 1.987 (0.072) | 0.823 (0.010) |
| PharmHGT | 0.680 (0.137) | 1.266 (0.239) | 0.583 (0.026) | 0.839 (0.049) | 1.689 (0.516) | 0.638 (0.040) |

  1. We obtained the baseline results ourselves using 5-fold cross-validation with either a random split or a scaffold split, running the experiment once per task. Each value in this table is the mean RMSE, with the standard deviation in parentheses. The best performance is marked in bold and the second best is underlined.
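
The "mean (standard deviation)" cells described in the footnote can be sketched as follows. This is only an illustration of the arithmetic, not the authors' evaluation code: the function names `rmse`, `summarize_folds`, and `format_cell` are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the footnote does not say which estimator was used.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def summarize_folds(fold_rmses):
    """Return (mean, standard deviation) of the per-fold RMSE values.

    Assumption: sample standard deviation (n - 1 denominator); the source
    does not specify the estimator.
    """
    mean = statistics.fmean(fold_rmses)
    std = statistics.stdev(fold_rmses)
    return mean, std


def format_cell(mean, std):
    """Format a result the way the table cells are written."""
    return f"{mean:.3f} ({std:.3f})"
```

With five per-fold RMSE values from one cross-validation run, `format_cell(*summarize_folds(fold_rmses))` yields a string in the same layout as the table entries.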