Figure 5

Evaluation of unsupervised clustering. (A) The similarity κ of the results obtained for any pair of feature sets, for all 60 data sets studied, and (B) the similarity κm between the feature set-specific results and the manually labelled layers (indicated with ML). The bottom label of panel (B) shows the data set names and the top label of panel (A) their area types. (C) The first six columns and first six rows show the mean values of κ over all 60 locations, and the seventh column and seventh row show the mean values of κm over all 60 locations. The element at (ML, ML) is manually set to 1. (D) Detailed statistics of Log(κm). The bars show the mean values, the error bars the standard deviations, and the blue dots the maximum values. The minimum values are close to zero or even negative and are therefore not shown. Pairwise differences among F1, F2, F3, and F4 are not significant (p > 0.2), whereas the differences between any of F1, F2, F3, or F4 and F5 or F6 are significant (p < 0.001). To illustrate the meaning of the obtained κm values, we randomly shifted the manually assigned layer boundaries, with the boundary shift values drawn from the range [-Nsa, Nsa] slices. We repeated the runs with Nsa = 1, 2, …, 6. For each Nsa value, each of the 60 locations was re-labelled and the artificial labelling result was compared to the original manual labelling. The corresponding mean values are plotted in panel (D). The olive line depicts the respective correlation coefficient obtained when the layers manually labelled by the second expert reader were used as the test result. An Nsa value of 5 approximately corresponds to the interobserver variability of layer labelling; data points at or above this line can therefore be considered to indicate ideal performance.
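
The boundary-shift procedure described in the caption can be sketched as follows. This is a minimal illustration only, not the authors' implementation. The function names, the example boundaries, and the number of slices are hypothetical. The caption does not say how shifted boundaries that cross or leave the profile are handled, so clamping and re-sorting are assumptions here. The plain fraction of identically labelled slices stands in for κm, whose definition is not given in the caption.

```python
import random


def shift_boundaries(boundaries, n_sa, n_slices, rng):
    """Shift each layer boundary by a random integer drawn from [-n_sa, n_sa]."""
    shifted = [b + rng.randint(-n_sa, n_sa) for b in boundaries]
    # Assumption: clamp to the valid slice range and re-sort so that
    # boundaries stay ordered even if two of them cross after shifting.
    shifted = [min(max(b, 0), n_slices) for b in shifted]
    return sorted(shifted)


def labels_from_boundaries(boundaries, n_slices):
    """Give every slice the index of the layer it falls into."""
    labels = []
    layer = 0
    for s in range(n_slices):
        # Advance to the next layer each time a boundary is reached.
        while layer < len(boundaries) and s >= boundaries[layer]:
            layer += 1
        labels.append(layer)
    return labels


def agreement(labels_a, labels_b):
    """Fraction of slices with identical labels (a stand-in for kappa_m)."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def mean_agreement(boundaries, n_sa, n_slices, n_runs, rng):
    """Average agreement between the original and randomly shifted labelling."""
    reference = labels_from_boundaries(boundaries, n_slices)
    total = 0.0
    for _ in range(n_runs):
        perturbed = shift_boundaries(boundaries, n_sa, n_slices, rng)
        total += agreement(reference, labels_from_boundaries(perturbed, n_slices))
    return total / n_runs


if __name__ == "__main__":
    rng = random.Random(0)
    manual = [10, 25, 40, 60, 80]  # hypothetical manual boundaries (slice indices)
    for n_sa in range(1, 7):       # Nsa = 1, 2, ..., 6 as in the caption
        print(n_sa, mean_agreement(manual, n_sa, 100, 200, rng))
```

In this sketch a larger `n_sa` allows larger boundary displacements, which gives a rough scale against which an agreement value can be read, in the same spirit as the Nsa curves in panel (D).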