Fig. 4: Illustration of convex hull and dendrogram-based heterogeneity indices for non-categorical systems.

Panel a illustrates the basic concept of a convex hull on synthetic 2-dimensional (2D) data; the volume of the hull is taken as an index of heterogeneity. Panel b shows one problem with the convex hull method, which arises when the data lie along a lower-dimensional surface (here, a curve). In this example, the data are all concentrated along the outer border of the hull, leaving the core unoccupied, yet the convex hull volume index still counts the empty space toward the heterogeneity value. Panel c illustrates the effect of outliers on convex hull volume. Because a convex hull is found by creating a “shell” around one’s data, outlying points expand this shell in ways that leave much of the hull empty, although the empty region still counts toward the heterogeneity value. Panel d shows the dendrogram computed using agglomerative clustering for a simple mixture of five 2D Gaussians. The functional diversity (FD) measure, shown in the title, is the sum of all branch lengths in this tree. Panel e shows a simple simulation in which five 2D Gaussians (standardized to lie within the bounds [−1.5, 1.5] on both axes) were progressively separated. The FD measure decreases as the distributions become more distinct. This is the opposite of the effect demonstrated by the convex hull volume, insofar as FD increases as the space becomes more densely populated with data points.
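
The two indices described in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used to produce the figure. It assumes SciPy's `ConvexHull` and `linkage`, and it assumes average linkage with Euclidean distance, because the caption does not specify either. The function names `hull_volume` and `functional_diversity` and the toy data are hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull
from scipy.cluster.hierarchy import linkage


def hull_volume(points):
    """Convex hull volume of an (n, d) array; for 2D data this is the enclosed area."""
    return float(ConvexHull(points).volume)


def functional_diversity(points, method="average"):
    """Sum of all branch lengths in an agglomerative-clustering dendrogram.

    Assumption: average linkage on Euclidean distances. Leaves sit at height 0
    and each internal node sits at its merge distance.
    """
    n = points.shape[0]
    Z = linkage(points, method=method)
    # Node heights: indices 0..n-1 are leaves, index n+i is the node made by merge i.
    heights = np.concatenate([np.zeros(n), Z[:, 2]])
    left = Z[:, 0].astype(int)
    right = Z[:, 1].astype(int)
    # Each merge adds two branches, one from the new node down to each child.
    return float(np.sum(2.0 * Z[:, 2] - heights[left] - heights[right]))


# Panel b analogue: points on a circle leave the interior empty,
# yet the hull area is close to that of the full disc.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
ring_area = hull_volume(ring)

# Panel c analogue: a single outlier inflates the hull of a unit square.
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
with_outlier = np.vstack([square, [[10.0, 10.0]]])
square_area = hull_volume(square)
outlier_area = hull_volume(with_outlier)

# Panel d analogue on a tiny example: three collinear points.
line_fd = functional_diversity(np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]))
```

In SciPy, the `volume` attribute of a 2D hull is the enclosed area, whereas `area` is the perimeter, so `volume` is the quantity that corresponds to the index in panels a to c. Other linkage methods or a different height convention for the dendrogram would change the absolute FD values.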