Table 1. Description of the structural subsets that constitute the MAD dataset.

From: Massive Atomic Diversity: a compact universal dataset for atomistic machine learning

| Subset name | Description | # structures | # atoms |
| --- | --- | --- | --- |
| MC3D | Bulk crystals from the Materials Cloud 3D crystals database [42] | 33596 | 738484 |
| MC3D-rattled | Rattled analogs of the original MC3D crystals, with Gaussian noise added to all atomic positions | 30044 | 599675 |
| MC3D-random | Artificial structures from MC3D with randomized atomic species sampled from the list of 85 elements | 2800 | 25095 |
| MC3D-surface | Surface slabs generated from MC3D by cleaving along random low-index crystallographic planes | 5589 | 205185 |
| MC3D-cluster | Nanoclusters (2-8 atoms) cut from MC3D and MC3D-rattled crystals as random atomic environments | 9071 | 44829 |
| MC2D | Two-dimensional crystals from the Materials Cloud 2D database [43, 44] | 2676 | 43225 |
| SHIFTML-molcrys | Curated SHIFTML molecular crystals from the Cambridge Structural Database [45, 46] | 8578 | 852044 |
| SHIFTML-molfrags | Neutral molecular fragments from the SHIFTML dataset [47] | 3241 | 72120 |
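
The counts in Table 1 can be tallied to check the overall size of the dataset and the typical structure size in each subset. The following is a minimal sketch in Python, not code from the article: the dictionary `MAD_SUBSETS` and the function `summarize` are hypothetical names introduced here for illustration, and the numbers are simply copied from the table.

```python
# Subset name -> (number of structures, number of atoms), copied from Table 1.
# MAD_SUBSETS and summarize are illustrative names, not part of any MAD tooling.
MAD_SUBSETS = {
    "MC3D": (33596, 738484),
    "MC3D-rattled": (30044, 599675),
    "MC3D-random": (2800, 25095),
    "MC3D-surface": (5589, 205185),
    "MC3D-cluster": (9071, 44829),
    "MC2D": (2676, 43225),
    "SHIFTML-molcrys": (8578, 852044),
    "SHIFTML-molfrags": (3241, 72120),
}


def summarize(subsets):
    """Return total structures, total atoms, and mean atoms per structure by subset."""
    total_structures = sum(n_structures for n_structures, _ in subsets.values())
    total_atoms = sum(n_atoms for _, n_atoms in subsets.values())
    mean_atoms = {
        name: n_atoms / n_structures
        for name, (n_structures, n_atoms) in subsets.items()
    }
    return total_structures, total_atoms, mean_atoms


if __name__ == "__main__":
    total_structures, total_atoms, mean_atoms = summarize(MAD_SUBSETS)
    print(f"Total structures: {total_structures}")
    print(f"Total atoms:      {total_atoms}")
    for name, mean in mean_atoms.items():
        print(f"{name:18s} {mean:8.1f} atoms per structure on average")
```

With the values as transcribed in Table 1, the totals come to 95595 structures and 2580657 atoms. The per-subset means also give a quick consistency check: the MC3D-cluster average, for example, should fall within the stated 2-8 atom range.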