Table 2: Common benchmark datasets for graph learning tasks.

From: Graph neural networks for materials science and chemistry

| Molecules | Size | Tasks | Type | Description |
|---|---|---|---|---|
| QM7 [89] | 7,165 | 1 | R | DFT quantum calculations |
| QM7b [90] | 7,211 | 13 | R | DFT quantum calculations |
| QM9 [62] | 133,885 | 12 | R | DFT quantum calculations |
| PDBBind [91] | 23,496 | 1 | R | protein binding affinity |
| MD17 [92,93] | >100,000 | ≥1 | R | molecular dynamics trajectories |
| FreeSolv [94] | 643 | 1 | R | solvation free energy |
| Lipop [95] | 4,200 | 1 | R | lipophilicity |
| Tox21 [95] | 8,014 | 12 | C | qualitative toxicity measurements |
| ToxCast [96] | 8,615 | 617 | C | qualitative toxicity measurements |
| BBBP [97] | 2,053 | 1 | C | blood–brain barrier penetration |
| HIV [95] | 41,913 | 1 | C | inhibition of HIV replication |
| SIDER [98,99] | 1,427 | 27 | C | adverse drug reactions |

| Crystals | Size | Tasks | Type | Description |
|---|---|---|---|---|
| MP [100] | ~144,595 | ≥1 | R, C | Materials Project (MP) |
| OQMD [101] | ~1,022,603 | ≥1 | R, C | Open Quantum Materials Database |
| OC20 [102] | ~133,934,018 | ≥1 | R | Open Catalyst Project |

Note: this list is not complete. It merely gives an overview of different dataset sizes and supervised learning tasks, each of which is either regression (R) or classification (C).
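In the table, the Type column separates regression (R) from classification (C) tasks. The Tasks column gives the number of targets predicted per entry, e.g. 12 for QM9 or 617 for ToxCast. The sketch below is not taken from the source. It shows one common way these two columns translate into a training objective: mean squared error for regression targets, and binary cross-entropy (on logits) for each binary classification label, averaged over the tasks. The function name `multitask_loss` is hypothetical, as are the assumption that classification labels are binary and the choice to skip missing labels marked `None`.

```python
import math
from typing import Optional, Sequence


def multitask_loss(
    preds: Sequence[float],
    targets: Sequence[Optional[float]],
    task_type: str,
) -> float:
    """Average per-task loss for one molecule or crystal (hypothetical helper).

    task_type "R": squared error between raw outputs and targets.
    task_type "C": binary cross-entropy, with preds given as logits and
    targets given as 0/1 labels (assumed binary).
    Targets set to None are treated as missing labels and skipped (an assumption).
    """
    if len(preds) != len(targets):
        raise ValueError("one prediction per task is required")
    losses = []
    for p, y in zip(preds, targets):
        if y is None:
            continue  # unlabelled task: contributes nothing to the loss
        if task_type == "R":
            losses.append((p - y) ** 2)
        elif task_type == "C":
            # Numerically stable BCE with logits:
            # max(p, 0) - p * y + log(1 + exp(-|p|))
            losses.append(max(p, 0.0) - p * y + math.log1p(math.exp(-abs(p))))
        else:
            raise ValueError(f"unknown task type: {task_type!r}")
    if not losses:
        raise ValueError("no labelled tasks for this entry")
    return sum(losses) / len(losses)


# Regression with two targets (QM9-style, 12 in practice).
print(multitask_loss([1.0, 2.0], [0.0, 2.0], "R"))  # 0.5
# Classification with one labelled and one missing task (Tox21-style).
print(multitask_loss([0.0, 5.0], [1.0, None], "C"))  # log(2) ≈ 0.693
```

Averaging over labelled tasks keeps the loss on a comparable scale whether a dataset has 1 task (BBBP) or hundreds (ToxCast). In practice, frameworks provide equivalent built-in losses.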