Fig. 1
From: TBHubbard: tight-binding and extended Hubbard model dataset for metal-organic frameworks

(a) Illustration of the TBHubbard dataset. The QMOF dataset [38,39], shown in pink, provides over 20,000 MOF structures. From this collection, the TBHubbard dataset comprises two subsets of materials: the tight-binding (TB) subset, shown in green, with ≈ 10,000 materials, and the extended Hubbard (EH) subset, shown in blue, with ≈ 200 materials. (b) t-SNE projection of the tight-binding matrices, with points colored according to the dataset they belong to. (c) t-SNE projection of the SOAP-3 Å descriptors for metal atoms across the dataset. Before the t-SNE projection, a PCA step reduced the descriptor dimensionality to 8 components, retaining 97 % of the total variance. In both t-SNE plots, pink denotes the QMOF dataset, blue the EH subset, and green the TB subset.
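
The PCA-then-t-SNE procedure described for panel (c) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn, replaces the SOAP descriptors with synthetic low-rank random data, and the array sizes, noise level, and t-SNE settings (`perplexity`, `init`, `random_state`) are hypothetical choices. Only the reduction to 8 principal components comes from the caption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for per-metal-atom descriptors (NOT real TBHubbard data).
# Rows are atoms, columns are descriptor features; the sizes are arbitrary.
n_atoms, n_features, n_latent = 300, 60, 8
latent = rng.normal(size=(n_atoms, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
descriptors = latent @ mixing + 0.05 * rng.normal(size=(n_atoms, n_features))

# Step 1: PCA down to 8 components, as stated in the caption.
pca = PCA(n_components=8)
reduced = pca.fit_transform(descriptors)

# Fraction of the total variance kept by the 8 components.
retained = pca.explained_variance_ratio_.sum()

# Step 2: t-SNE on the PCA-reduced features to get a 2D map for plotting.
# These t-SNE settings are assumptions, not values from the paper.
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
embedding = tsne.fit_transform(reduced)

print(f"variance retained by 8 components: {retained:.3f}")
print("embedding shape:", embedding.shape)
```

With real descriptors, `retained` is the quantity to compare against the 97 % quoted in the caption, and each row of `embedding` would be colored by the dataset its material belongs to (QMOF, TB, or EH).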