Fig. 2: Distribution of the defect dataset generated with first-principles calculations.
From: Machine-learning structural reconstructions for accelerated point defect calculations

a) Example host structures and their respective space groups. b) Number of configurations containing each element. c) Two-dimensional projection of structural similarity for defect configurations. Each configuration is represented by the feature vector generated by the M3GNet model [79, 80] (trained on the bulk formation energies of the Materials Project database), and the vector dimensions are reduced using t-distributed stochastic neighbour embedding (t-SNE) [131, 132]. The defect configurations are coloured by their host composition (with similar colours indicating compositions with similar MEGNet [133] feature vectors), showing that related chemical systems cluster near each other. For clarity, in (b) and (c), 10 evenly spaced steps are selected from each relaxation trajectory.
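The subsampling of relaxation trajectories mentioned in the caption could look roughly like the sketch below. The caption does not say how the 10 steps were chosen, so the function name `evenly_spaced_indices`, the rounding rule, and the choice to always keep the first and last steps are assumptions made for illustration only.

```python
def evenly_spaced_indices(n_steps: int, n_keep: int = 10) -> list[int]:
    """Return up to n_keep indices spread evenly over range(n_steps).

    Hypothetical helper: the first and last steps are always kept when
    n_keep >= 2, and short trajectories are returned unchanged.
    """
    if n_steps <= 0 or n_keep <= 0:
        return []
    if n_steps <= n_keep:
        return list(range(n_steps))
    if n_keep == 1:
        return [0]
    # Map k = 0..n_keep-1 linearly onto 0..n_steps-1, then round to a step.
    return [round(k * (n_steps - 1) / (n_keep - 1)) for k in range(n_keep)]


# Placeholder trajectory: 100 labelled ionic steps instead of real structures.
trajectory = [f"step_{i}" for i in range(100)]
selected = [trajectory[i] for i in evenly_spaced_indices(len(trajectory))]
```

Selecting by index rather than by energy or displacement keeps the sample independent of how quickly a given relaxation converges, although it may under-sample the early steps where most of the structural change usually happens.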