Fig. 8
From: High resolution synthetic residential energy use profiles for the United States

Summary of the two case scenarios. Orange denotes the findings of case 1, in which we cluster the real data set \({\mathcal{R}}\) and assign cluster labels to the synthetic data set \({\mathcal{S}}\). Blue denotes the findings of case 2, in which we cluster the synthetic data set \({\mathcal{S}}\) and assign cluster labels to the real data set \({\mathcal{R}}\). (a) shows 100% coverage in both cases for every value of k; that is, in each case every cluster contains at least one data point for a given k. (b) shows the closeness between two distance vectors: the distances of the real data points in a cluster to their centroid, and the distances of the synthetic data points in a cluster to their centroid. Closeness is measured by the Hellinger distance, for which a value of 0 indicates identical distributions. The distances are close to 0 for all values of k in both cases, although they trend upward as k increases. Overall, the results are robust with respect to k.
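The two metrics in this figure, cluster coverage and the Hellinger distance between centroid-distance distributions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes k-means clustering (a plain Lloyd's loop), nearest-centroid label assignment, distances pooled across all clusters, and a shared 20-bin histogram for discretizing the distance vectors before computing the Hellinger distance. The function names (`kmeans`, `assign_labels`, `coverage`, `hellinger`, `evaluate`) and the random stand-in data are hypothetical.

```python
import numpy as np

def assign_labels(points, centroids):
    # Nearest-centroid assignment by Euclidean distance (assumption).
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1), d.min(axis=1)

def kmeans(X, k, iters=50, seed=0):
    # Plain Lloyd's k-means; the choice of k-means here is an assumption.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels, _ = assign_labels(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def coverage(labels, k):
    # Fraction of the k clusters that receive at least one assigned point.
    return len(np.unique(labels)) / k

def hellinger(x, y, bins=20):
    # Hellinger distance between histograms of x and y on shared bins:
    # H = sqrt(0.5 * sum((sqrt(p) - sqrt(q))^2)), 0 = identical, 1 = disjoint.
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    if hi == lo:
        return 0.0  # all values identical, so both histograms coincide
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def evaluate(ref, other, k, seed=0):
    # Cluster `ref`, assign `other` to the same centroids, then compare.
    centroids = kmeans(ref, k, seed=seed)
    _, ref_dist = assign_labels(ref, centroids)
    other_labels, other_dist = assign_labels(other, centroids)
    return coverage(other_labels, k), hellinger(ref_dist, other_dist)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    R = rng.normal(size=(500, 24))               # stand-in "real" daily profiles
    S = R + rng.normal(scale=0.1, size=R.shape)  # stand-in "synthetic" profiles
    for k in (2, 4, 8):
        print(k, "case 1:", evaluate(R, S, k), "case 2:", evaluate(S, R, k))
```

Case 1 corresponds to `evaluate(R, S, k)` and case 2 to `evaluate(S, R, k)`. Computing the Hellinger distance per cluster and then averaging would be an equally plausible reading of panel (b); the pooled version is used here only for brevity.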