Table 2 Results for the held-out test sets across all cross-validation folds, comparing the SPM and machine learning (ML) approaches.

From: Human-centered evaluation of statistical parametric mapping and explainable machine learning for outlier detection in plantar pressure data

 

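| | SPM approach | ML approach |
| --- | --- | --- |
| Confusion matrix, actual valid (predicted valid / predicted outlier) | **754 (754)** / 44 (44) | **783 (783)** / 15 (15) |
| Confusion matrix, actual outlier (predicted valid / predicted outlier) | 237 (27) / **1763 (206)** | 30 (4) / **1970 (229)** |
| MCC (min; max) | 0.76; 0.81 (0.74; 0.83) | 0.95; 0.98 (0.92; 0.98) |
| MCC (mean ± std) | 0.78 ± 0.02 (0.81 ± 0.03) | 0.96 ± 0.01 (0.95 ± 0.01) |
| F1-score (min; max) | 0.92; 0.94 (0.86; 0.93) | 0.98; 1.00 (0.94; 0.99) |
| F1-score (mean ± std) | 0.93 ± 0.01 (0.88 ± 0.03) | 0.99 ± 0.00 (0.96 ± 0.02) |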

  1. For comparability, predictions of the multiclass ML approach were reduced to a binary classification of outlier vs. non-outlier. The confusion matrices show actual classes on the rows and predicted classes on the columns, where the top row corresponds to valid samples and the bottom row corresponds to outliers. Correct predictions (true positives and true negatives) are highlighted in bold. For the Matthews Correlation Coefficient (MCC) and the F1-score, the table reports the minimum (min) and maximum (max) values as well as the mean ± standard deviation (std). Values shown in parentheses represent performance calculated exclusively on the real test data, excluding the synthetically generated outliers, which were used only during training and validation.
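
As a rough cross-check of how the reported metrics relate to the confusion matrices, the following minimal sketch computes MCC and F1 from the counts summed over all folds. It rests on two assumptions that the table does not state explicitly: that "outlier" is treated as the positive class, and that F1 is the binary F1-score of that class. The names `mcc`, `f1`, and `POOLED` are illustrative only. Because the table reports per-fold statistics, values computed from pooled counts can only approximate the reported means.

```python
from math import sqrt


def mcc(tn: int, fp: int, fn: int, tp: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1(fp: int, fn: int, tp: int) -> float:
    """Binary F1-score of the positive class."""
    return 2 * tp / (2 * tp + fp + fn)


# Pooled counts over all folds, full test sets (values outside parentheses).
# Order: (tn, fp, fn, tp), ASSUMING "outlier" is the positive class.
POOLED = {
    "SPM": (754, 44, 237, 1763),
    "ML": (783, 15, 30, 1970),
}

if __name__ == "__main__":
    for name, (tn, fp, fn, tp) in POOLED.items():
        print(f"{name}: MCC={mcc(tn, fp, fn, tp):.2f}  F1={f1(fp, fn, tp):.2f}")
```

Under these assumptions, the pooled MCC comes out at roughly 0.78 for SPM and 0.96 for ML, in line with the reported fold means. The same functions can be applied to the counts in parentheses to look at the real test data only.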