Table 1 Comparison of supervised and self-supervised baselines on representative MoleculeNet benchmarks considered in previous work, measured by the area under the receiver operating characteristic curve (ROC-AUC). All values are scaled by a factor of 100 for reader convenience. All methods are evaluated using scaffold splits to minimize the molecular similarity between the training and testing sets. All reported HDC models (HDBind and MoleHD [33]) use dimension \(D=10\text{k}\). *Denotes our implementation. "-" denotes no value reported in the original work. Values in parentheses denote the standard deviation across the 10 trials per task in each dataset that are averaged to give the reported value. Results above the horizontal line correspond to state-of-the-art (SOA) supervised and self-supervised baselines; results below it correspond to HDC methods.
From: HDBind: encoding of molecular structure with hyperdimensional binary representations
| Method | BBBP | Tox21 | ClinTox | HIV | BACE | SIDER |
|---|---|---|---|---|---|---|
| Molecules | 2,039 | 7,831 | 1,478 | 41,127 | 1,513 | 1,427 |
| Tasks | 1 | 12 | 2 | 1 | 1 | 27 |
| RF [9] | 71.4 | 76.9 | 71.3 | 78.1 | 86.7 | 68.4 |
| SVM [9] | 72.9 | 81.8 | 66.9 | 79.2 | 86.2 | 68.2 |
| MLP | 79.0 | 67.2 | 82.2 | 73.1 | 70.3 | 58.6 |
| MGCN [65] | 85.0 | 70.7 | 63.4 | 73.8 | 73.4 | 55.2 |
| D-MPNN [66] | 71.2 | 68.9 | 90.5 | 75.0 | 85.3 | 63.2 |
| N-gram [61] | 91.2 | 76.9 | 85.5 | 83.0 | 87.6 | 63.2 |
| GeomGCL [62] | - | 85.0 | 91.9 | - | - | 64.8 |
| \(\text{MolCLR}_{\text{GIN}}\) [10] | 73.6 | 79.8 | 93.2 | 80.6 | 89.0 | 68.0 |
| MoLFormer-XL [9] | 93.7 | 84.7 | 94.8 | 82.2 | 88.21 | 69.0 |
| MoleHD [33] | 84.4 | - | 98.7 | - | - | 56.6 |
| HDB-RPFP | 94.8 (0.3) | 70.8 (0.9) | 86.3 (4.0) | 71.8 (1.3) | 71.3 (0.7) | 55.2 (2.0) |
| HDB-MolCLR | 66.8 (0.4) | 68.0 (0.8) | 71.2 (4.0) | 70.6 (0.7) | 82.4 (0.5) | 61.2 (1.9) |
| HDB-MoLFormer | 99.2 (0.1) | 67.3 (1.0) | 98.8 (0.0) | 79.2 (0.6) | 66.8 (0.4) | 55.4 (1.9) |
| HDB-DECFP | 93.8 (0.2) | 69.6 (0.8) | 90.6 (4.0) | 77.8 (0.3) | 74.7 (1.1) | 61.4 (1.6) |
| HDB-Combo | 97.4 (0.3) | 70.1 (1.2) | 90.7 (3.4) | 77.4 (0.8) | 67.0 (2.7) | 58.8 (2.8) |
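
For readers who want to reproduce the reporting convention of Table 1, the following minimal sketch shows how a cell of the form "mean (standard deviation)" on the ×100 scale could be obtained from per-trial ROC-AUC values. It is not the evaluation code used in this work: the function names `roc_auc` and `format_cell` are hypothetical, the ROC-AUC is computed through the standard rank-sum identity using only the Python standard library, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity; tied scores share an average rank."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    # Sort sample indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalized by the number of (pos, neg) pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def format_cell(trial_aucs):
    """Format per-trial AUCs in [0, 1] as 'mean (std)' scaled by 100.

    Assumption: sample standard deviation (n - 1 denominator) over the trials.
    """
    scaled = [100 * a for a in trial_aucs]
    return f"{mean(scaled):.1f} ({stdev(scaled):.1f})"


# Usage: one AUC per trial (for example, 10 random seeds), then one table cell.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
example_cell = format_cell([0.5, 0.75, 1.0])
```

For multi-task datasets such as Tox21 or SIDER, one plausible convention is to average the per-task ROC-AUC within each trial before calling `format_cell`; the caption does not specify the exact aggregation order, so this remains an assumption.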