Table 1. Comparison of supervised and self-supervised baselines on representative MoleculeNet benchmarks considered in previous work, measured by the area under the receiver operating characteristic curve (ROC-AUC). All values are scaled by a factor of 100 for reader convenience. All methods are evaluated using scaffold splits to minimize the molecular similarity between the training and testing sets. All reported HDC models (HDBind and MoleHD [33]) use dimension \(D=10\)k. * denotes our implementation. ‘–’ denotes no value reported in the original work. Values in parentheses denote the standard deviation of the average over 10 trials per task in each dataset. Results above the horizontal line correspond to state-of-the-art (SOA) supervised and self-supervised baselines; results below it correspond to HDC methods.

From: HDBind: encoding of molecular structure with hyperdimensional binary representations

| Method | BBBP | Tox21 | ClinTox | HIV | BACE | SIDER |
| --- | --- | --- | --- | --- | --- | --- |
| Molecules | 2039 | 7831 | 1478 | 41,127 | 1513 | 1427 |
| Tasks | 1 | 12 | 2 | 1 | 1 | 27 |
| RF [9] | 71.4 | 76.9 | 71.3 | 78.1 | 86.7 | 68.4 |
| SVM [9] | 72.9 | 81.8 | 66.9 | 79.2 | 86.2 | 68.2 |
| MLP | 79.0 | 67.2 | 82.2 | 73.1 | 70.3 | 58.6 |
| MGCN [65] | 85.0 | 70.7 | 63.4 | 73.8 | 73.4 | 55.2 |
| D-MPNN [66] | 71.2 | 68.9 | 90.5 | 75.0 | 85.3 | 63.2 |
| N-gram [61] | 91.2 | 76.9 | 85.5 | 83.0 | 87.6 | 63.2 |
| GeomGCL [62] | – | 85.0 | 91.9 | – | – | 64.8 |
| \(\text{MolCLR}_{\text{GIN}}\) [10] | 73.6 | 79.8 | 93.2 | 80.6 | 89.0 | 68.0 |
| MoLFormer-XL [9] | 93.7 | 84.7 | 94.8 | 82.2 | 88.21 | 69.0 |
| *HDC methods* | | | | | | |
| MoleHD [33] | 84.4 | – | 98.7 | – | – | 56.6 |
| HDB-RPFP | 94.8 (0.3) | 70.8 (0.9) | 86.3 (4.0) | 71.8 (1.3) | 71.3 (0.7) | 55.2 (2.0) |
| HDB-MolCLR | 66.8 (0.4) | 68.0 (0.8) | 71.2 (4.0) | 70.6 (0.7) | 82.4 (0.5) | 61.2 (1.9) |
| HDB-MoLFormer | 99.2 (0.1) | 67.3 (1.0) | 98.8 (0.0) | 79.2 (0.6) | 66.8 (0.4) | 55.4 (1.9) |
| HDB-DECFP | 93.8 (0.2) | 69.6 (0.8) | 90.6 (4.0) | 77.8 (0.3) | 74.7 (1.1) | 61.4 (1.6) |
| HDB-Combo | 97.4 (0.3) | 70.1 (1.2) | 90.7 (3.4) | 77.4 (0.8) | 67.0 (2.7) | 58.8 (2.8) |
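
To make the reporting convention in the caption concrete, the following is a minimal sketch of how an entry of the form "mean (std)" could be produced from per-trial ROC-AUC scores. It is not the authors' evaluation code. The function names `roc_auc` and `summarize_trials` are hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC-AUC computed as the probability that a randomly chosen positive
    is scored above a randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_trials(aucs):
    """Format per-trial AUCs as 'mean (std)', scaled by 100 as in the table.
    Assumption: sample standard deviation (n - 1 denominator)."""
    scaled = [100.0 * a for a in aucs]
    return f"{mean(scaled):.1f} ({stdev(scaled):.1f})"


# Toy example: two negatives and two positives with predicted scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)                             # 0.75
print(summarize_trials([0.70, 0.80]))  # 75.0 (7.1)
```

The pairwise formulation is quadratic in the number of examples, so it is only suitable for illustration. For a dataset the size of HIV, a rank-based implementation from an established library would be the practical choice.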