Fig. 3: Heterogeneity metric between datasets: M\T, M ∩ T and T\M.
From: Machine learning on multiple topological materials datasets

The heterogeneity from dataset A (y-axis) to dataset B (x-axis) was computed as the average distance from each point in A to its five nearest neighbors in B, in a feature space defined by the top 47 Matminer features (44 continuous, 3 discrete). These features were selected based on their importance in training XGBoost models. Because the metric is measured from A to B, it is directional: the value from A to B need not equal the value from B to A. Larger distances indicate greater dissimilarity and reveal compositional differences between the datasets.
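
The metric described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation. The caption does not state the distance function, any feature scaling, or how a dataset is compared with itself, so the Euclidean distance, the function name `heterogeneity`, and the random toy data below are all assumptions.

```python
import numpy as np


def heterogeneity(A, B, k=5):
    """Mean distance from each row of A to its k nearest rows of B.

    Assumes Euclidean distance on already-prepared feature vectors
    (the caption does not specify the metric or any scaling).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if k > len(B):
        raise ValueError("k cannot exceed the number of points in B")
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    diff = A[:, None, :] - B[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep the k smallest distances in each row; their order does not matter.
    nearest = np.partition(dist, k - 1, axis=1)[:, :k]
    # Average over neighbors and over all points in A.
    return float(nearest.mean())


# Toy data standing in for two datasets with 47 features each (hypothetical).
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 47))
B = rng.normal(loc=1.0, size=(300, 47))

print("A -> B:", heterogeneity(A, B))
print("B -> A:", heterogeneity(B, A))
```

The two printed values generally differ, which is why the figure distinguishes the y-axis (source) dataset from the x-axis (target) dataset. When a dataset is compared with itself using this sketch, each point's nearest neighbor is the point itself at distance zero; whether the original analysis excludes that self-match is not stated in the caption.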