Table 1. Classification of the atoms with the highest 5% of Eact (the H-Eact problem) and the lowest 5% of Eact (the L-Eact problem) in the combined Eact spectrum merged from six MGs (Fig. 1b), using various purely structural representations or physical signatures.

From: Predicting the propensity for thermally activated β events in metallic glasses via interpretable machine learning

| Representation | Feature number* | H-Eact (AUC-ROC) | L-Eact (AUC-ROC) |
| --- | --- | --- | --- |
| Interstice distribution | 80 | 0.942 ± 0.004 | 0.888 ± 0.012 |
| <0, 0, 12, 0, 0> + <0, 0, 12, 4, 0> (baseline 1) | 2 | 0.673 ± 0.010 | 0.557 ± 0.006 |
| Voronoi index (baseline 2) | 5 | 0.750 ± 0.009 | 0.628 ± 0.016 |
| A group of SRO features | 18 | 0.807 ± 0.010 | 0.634 ± 0.017 |
| SRO + MRO features | 72 | 0.908 ± 0.005 | 0.801 ± 0.011 |
| Radial symmetry functions | 70ᵃ | 0.905 ± 0.008 | 0.770 ± 0.009 |
| Bispectrum coefficients | 30ᵇ | 0.901 ± 0.008 | 0.761 ± 0.010 |
| Moment tensor potential (MTP)ᶜ | 288ᵈ | 0.918 ± 0.008 | 0.784 ± 0.011 |
| Smooth-overlap of atomic positions (SOAP) | 567ᵉ | 0.927 ± 0.007 | 0.812 ± 0.011 |
| Flexibility volume, Vflex | 1 | 0.845 ± 0.009 | 0.784 ± 0.010 |
| Atomic shear moduli, G | 1 | 0.690 ± 0.012 | 0.629 ± 0.013 |
| Coarse-grained G | 1 | 0.736 ± 0.014 | 0.674 ± 0.015 |

  1. The area under the receiver operating characteristic curve (AUC-ROC) on the test set is used as the scoring metric. Each reported AUC-ROC is the average over three rounds of undersampling, with fivefold cross-validation on each undersampled dataset; the standard deviation of the AUC-ROC over the resulting fifteen cross-validation splits is also given.
  2. *Except for the interstice distribution representation, an additional feature indicating whether the atom is Cu (0) or Zr (1) is added to each representation to aid the ML decisions, so the actual feature number is one more than listed. This is especially helpful for representations whose features cannot by themselves distinguish the atom types well.
  3. ᵃ Nfeature = Nelems × Nr, where Nelems is the number of species in the system and Nr is the number of r values selected (Methods). Here Nelems = 2 and Nr = 35, thus Nfeature = 70.
  4. ᵇ For an even twojmax = 2(m − 1), Nfeature = m(m + 1)(2m + 1)/6, where twojmax is the band limit for bispectrum components (Methods). Here twojmax is set to 6, thus m = 4 and Nfeature = 30.
  5. ᶜ As MTP has fittable parameters designed to be optimized by regression, we train MTP by regression and then derive the classification AUC-ROC from the predicted Eact on the test set (i.e., we obtain the TPRs and FPRs by varying the Eact threshold and calculate the area under the resulting curve).
  6. ᵈ The levmax of MTP is set to 20 (Methods), and the number of basis functions is 288.
  7. ᵉ Nfeature = (Nelems × (Nelems + 1)/2) × (lmax + 1) × nmax × (nmax + 1)/2, where nmax and lmax are the number of radial basis functions and the maximum degree of spherical harmonics, respectively (Methods). Here Nelems = 2, lmax = 8 and nmax = 6, thus Nfeature = 567.
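
The feature counts quoted in the footnotes for the radial symmetry functions, bispectrum coefficients and SOAP can be checked with a few lines of arithmetic. The sketch below only restates the formulas given in those footnotes; the function and argument names are hypothetical and chosen here for illustration, not taken from the paper or from any descriptor library.

```python
def radial_symmetry_count(n_elems: int, n_r: int) -> int:
    """Radial symmetry functions: Nfeature = Nelems * Nr."""
    return n_elems * n_r


def bispectrum_count(twojmax: int) -> int:
    """Bispectrum: for even twojmax = 2(m - 1), Nfeature = m(m + 1)(2m + 1) / 6."""
    if twojmax < 0 or twojmax % 2 != 0:
        raise ValueError("the closed form is stated only for even twojmax")
    m = twojmax // 2 + 1
    # The product of m, m + 1 and 2m + 1 is always divisible by 6.
    return m * (m + 1) * (2 * m + 1) // 6


def soap_count(n_elems: int, l_max: int, n_max: int) -> int:
    """SOAP: (element pairs) * (lmax + 1) * (radial-basis pairs)."""
    element_pairs = n_elems * (n_elems + 1) // 2
    radial_pairs = n_max * (n_max + 1) // 2
    return element_pairs * (l_max + 1) * radial_pairs


print(radial_symmetry_count(n_elems=2, n_r=35))    # 70
print(bispectrum_count(twojmax=6))                 # 30
print(soap_count(n_elems=2, l_max=8, n_max=6))     # 567
```

With the settings stated in the footnotes, the three calls reproduce the tabulated feature numbers 70, 30 and 567.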
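
The MTP footnote describes turning a regression output into a classification score by sweeping an Eact threshold. A minimal sketch of that idea follows, assuming the predicted Eact is used directly as the ranking score for the H-Eact problem (for the L-Eact problem one would presumably rank by the negated prediction). The function name and the toy numbers are hypothetical and are not data from the paper.

```python
def roc_auc_from_scores(labels, scores):
    """AUC-ROC obtained by sweeping a threshold over the scores.

    labels: 1 for the positive class, 0 otherwise.
    scores: a larger value means "more likely positive".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Visit samples from the highest to the lowest score,
    # which is equivalent to lowering the threshold step by step.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    area = 0.0
    i = 0
    while i < len(order):
        # Samples with tied scores cross the threshold together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0  # trapezoid rule
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return area


# Toy example: 1 marks an atom in the high-Eact class (hypothetical values).
is_high_eact = [1, 1, 0, 1, 0, 0]
predicted_eact = [2.1, 1.9, 1.8, 1.2, 1.0, 0.7]
print(round(roc_auc_from_scores(is_high_eact, predicted_eact), 3))  # 0.889
```

In the toy example, 8 of the 9 positive-negative pairs are ranked correctly, which is why the area comes out as 8/9.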
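
The evaluation protocol in the first footnote (three rounds of undersampling, fivefold cross-validation on each) can be sketched as index bookkeeping. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes the majority class is randomly undersampled to the size of the minority class, uses plain (non-stratified) folds, and all function names are hypothetical.

```python
import random


def undersample_indices(labels, rng):
    """Keep every minority-class sample plus an equal-sized random subset
    of the majority class (assumed balancing scheme)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return kept


def kfold_splits(indices, k):
    """Yield (train, test) index lists for k folds over shuffled indices."""
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


def repeated_undersampled_cv(labels, n_repeats=3, k=5, seed=0):
    """Return the n_repeats * k (train, test) splits."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_repeats):
        sampled = undersample_indices(labels, rng)
        splits.extend(kfold_splits(sampled, k))
    return splits


# Toy labels with a 5% positive class, mirroring the H-Eact / L-Eact setup.
labels = [1] * 50 + [0] * 950
splits = repeated_undersampled_cv(labels)
print(len(splits))  # 15
```

Training a classifier on each `train` list and scoring it on the matching `test` list would give fifteen AUC-ROC values, whose mean and standard deviation correspond to the two numbers reported in each table cell.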