Table 1: Description of the structural subsets that constitute the MAD dataset.
From: Massive Atomic Diversity: a compact universal dataset for atomistic machine learning
| Subset name | Description | No. of structures | No. of atoms |
|---|---|---:|---:|
| MC3D | Bulk crystals from the Materials Cloud 3D crystals database<sup>42</sup> | 33,596 | 738,484 |
| MC3D-rattled | Rattled analogs of the original MC3D crystals, with Gaussian noise added to all atomic positions | 30,044 | 599,675 |
| MC3D-random | Artificial structures from MC3D with randomized atomic species sampled from the list of 85 elements | 2,800 | 25,095 |
| MC3D-surface | Surface slabs generated from MC3D by cleaving along random low-index crystallographic planes | 5,589 | 205,185 |
| MC3D-cluster | Nanoclusters (2–8 atoms) cut from MC3D and MC3D-rattled crystals as random atomic environments | 9,071 | 44,829 |
| MC2D | Two-dimensional crystals from the Materials Cloud 2D database<sup>43,44</sup> | 2,676 | 43,225 |
| SHIFTML-molcrys | Curated SHIFTML molecular crystals from the Cambridge Structural Database<sup>45,46</sup> | 8,578 | 852,044 |
| SHIFTML-molfrags | Neutral molecular fragments from the SHIFTML dataset<sup>47</sup> | 3,241 | 72,120 |
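The counts in Table 1 can be tallied in a few lines to get the dataset totals and the average structure size per subset. The same sketch also shows what the MC3D-rattled description means in practice, which is independent Gaussian noise added to every Cartesian coordinate. This is an illustration, not the authors' code. The names `MAD_SUBSETS`, `dataset_totals`, `mean_atoms_per_structure` and `rattle` are hypothetical. The noise amplitude `sigma` is an assumption because the table does not state it.

```python
import random

# Subset sizes copied from Table 1: name -> (number of structures, number of atoms)
MAD_SUBSETS = {
    "MC3D": (33596, 738484),
    "MC3D-rattled": (30044, 599675),
    "MC3D-random": (2800, 25095),
    "MC3D-surface": (5589, 205185),
    "MC3D-cluster": (9071, 44829),
    "MC2D": (2676, 43225),
    "SHIFTML-molcrys": (8578, 852044),
    "SHIFTML-molfrags": (3241, 72120),
}


def dataset_totals(subsets):
    """Return (total structures, total atoms) summed over all subsets."""
    n_structures = sum(s for s, _ in subsets.values())
    n_atoms = sum(a for _, a in subsets.values())
    return n_structures, n_atoms


def mean_atoms_per_structure(subsets):
    """Average number of atoms per structure in each subset, rounded to 0.1."""
    return {name: round(a / s, 1) for name, (s, a) in subsets.items()}


def rattle(positions, sigma, seed=None):
    """Hypothetical sketch of a 'rattled' structure: add isotropic Gaussian
    noise with standard deviation `sigma` to every Cartesian coordinate.
    `sigma` is an assumed parameter; Table 1 does not give its value."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in atom] for atom in positions]


if __name__ == "__main__":
    total_s, total_a = dataset_totals(MAD_SUBSETS)
    print(f"{total_s} structures, {total_a} atoms in total")
    for name, mean in mean_atoms_per_structure(MAD_SUBSETS).items():
        print(f"{name:>18}: {mean} atoms/structure")
```

Summed over all eight subsets, the table gives 95,595 structures and 2,580,657 atoms. The per-subset averages also make a quick consistency check. For example, MC3D-cluster averages about 4.9 atoms per structure, which fits the stated 2–8 atom range.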