Fig. 3: Interpretability for CrystalTransformer-generated universal atomic embeddings (ct-UAE) includes clustering elements and statistically validating the clustering results. | Nature Communications


From: Transformer-generated atomic embeddings to enhance prediction accuracy of crystal properties with machine learning

Fig. 3

a UMAP (Uniform Manifold Approximation and Projection) projects the ct-UAEs onto two dimensions, denoted Component 1 and Component 2, and K-means clustering groups them into three clusters, shown in three colors. The shaded background reflects how many elements of each cluster lie in a given region; darker shading indicates more elements. b, c Elbow plot and silhouette score graph used to determine the optimal number of clusters. The dashed lines in (b) and (c) are both located at 3: at three clusters the silhouette score is relatively high, and the slope of the Sum of Squared Errors (SSE) curve is relatively steep. Results are averaged over five random seeds. d–f Violin plots of formation energy, bandgap, and total magnetization for oxide compounds and oxygen allotropes from the Materials Project dataset, categorized into Classes A, B, and C using the MT@4p embedding with UMAP. The total numbers of samples in Classes A, B, and C shown in (d–f) are 2197, 2719, and 7752, respectively. Violin-plot parameters such as outliers and centers are listed in the Source Data. Source data are provided as a Source Data file.
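The cluster-number selection in (b, c) amounts to running K-means for several candidate cluster counts k, recording the SSE and the mean silhouette score for each, and averaging over five random seeds. The snippet below is a minimal pure-Python sketch of that procedure under stated assumptions, not the authors' code: the helper names, the best-of-10 restarts per seed, and the toy three-blob data (a hypothetical stand-in for the 2-D UMAP coordinates) are all illustrative choices.

```python
import math
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, labels, centroids):
    """Sum of squared errors: squared distance of each point to its centroid."""
    return sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def _lloyd(points, centroids, max_iter=100):
    """One run of Lloyd's K-means starting from the given centroids."""
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids


def kmeans(points, k, seed, n_init=10):
    """Best-of-n_init K-means (lowest SSE) from random initial centroids."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        init = [list(p) for p in rng.sample(points, k)]
        labels, centroids = _lloyd(points, init)
        err = sse(points, labels, centroids)
        if best is None or err < best[0]:
            best = (err, labels, centroids)
    return best  # (sse, labels, centroids)


def silhouette(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    if len(set(labels)) < 2:
        return 0.0  # undefined for a single cluster
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.get(labels[i], [])
        if not own:
            scores.append(0.0)  # singleton cluster: s is defined as 0
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def scan_k(points, k_values, seeds=(0, 1, 2, 3, 4)):
    """Average SSE and silhouette over several random seeds for each k."""
    results = {}
    for k in k_values:
        runs = [kmeans(points, k, s) for s in seeds]
        mean_sse = sum(r[0] for r in runs) / len(runs)
        mean_sil = sum(silhouette(points, r[1]) for r in runs) / len(runs)
        results[k] = (mean_sse, mean_sil)
    return results


# Hypothetical toy data: three well-separated 2-D blobs standing in for
# UMAP-projected embeddings (not the paper's data).
_rng = random.Random(42)
points = [(cx + _rng.uniform(-0.5, 0.5), cy + _rng.uniform(-0.5, 0.5))
          for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(20)]

results = scan_k(points, range(2, 6))
for k, (mean_sse, mean_sil) in results.items():
    print(f"k={k}: mean SSE={mean_sse:.2f}, mean silhouette={mean_sil:.3f}")
```

On this toy data the mean silhouette should peak at k = 3, with the SSE curve flattening beyond it, which is the kind of pattern the dashed lines in (b, c) mark. Keeping the lowest-SSE result of several restarts per seed mirrors the common practice of guarding K-means against poor local minima.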
