Figure 3

Performance evaluation of TOF, LOF, and Keogh’s discord detection algorithms on four simulated datasets. (A) Mean Receiver Observer Characteristic Area Under Curve (ROC AUC) score and SD for TOF (orange) and LOF (blue) are shown as a function of neighborhood size (k). TOF showed the best results for small neighborhoods. In contrast, LOF showed better results for larger neighborhoods in the case of the logistic map and ECG datasets but did not reach reasonable performance on random walk with linear outliers. (B) Mean \(\mathrm{F}_1\) score for TOF (orange), LOF (blue), and Keogh’s discord detection (red) algorithms as a function of the expected anomaly length (for TOF) given in either data percentage (for LOF) or window length parameter (for discord). Black dashed lines show the theoretical maximum of the mean \(\mathrm{F}_1\) score for algorithms with prefixed detection numbers or lengths (LOF and discord), but this upper limit does apply for TOF. The \(\mathrm{F}_1\) score of TOF was very high for the linear anomalies and slightly lower for logistic map—tent map anomaly and ECG datasets, but it was higher than the \(\mathrm{F}_1\) score of the two other methods and their theoretical limits in all cases. Note, that the only comparable performance was shown by discord detection on ECG anomaly, while neither algorithms based on discord nor LOF were able to detect the linear anomaly on random background.