Fig. 5: Evaluation metrics for studies classifying four sleep stages using wEEGs. | npj Biomedical Innovations

From: Evaluating the performance of wearable EEG sleep monitoring devices: a meta-analysis approach

Each panel (a, c–g) includes a boxplot of metric distributions and a bar chart showing the top studies by MCC score for that specific stage. Asterisks (*) indicate statistically significant differences (p < 0.05) from MCC (a, c–g) or between micro- and macro-averaged versions of the same metric in (b). a Overall metrics: Studies showed an average ACC of 0.80 ± 0.08, significantly higher than κ (0.70 ± 0.11) and MCC (0.70 ± 0.11). b Comparison of micro- and macro-averaged metrics: Macro-averaged ACC was significantly higher than micro-averaged ACC. c Wake: Demonstrated κ = 0.68 ± 0.14, MCC = 0.69 ± 0.16, highest NPV and lowest SE (0.68 ± 0.18), suggesting strong ability to exclude Wake epochs but weaker sensitivity. d Light sleep: The most prevalent stage (53% of epochs) had the lowest κ (0.61 ± 0.11) and MCC (0.65 ± 0.11), along with the lowest ACC, SP, and NPV, but highest PPV. e Deep sleep: Showed the strongest performance overall, with the highest κ (0.77 ± 0.13), MCC (0.75 ± 0.13), ACC, and SE (0.80 ± 0.07), indicating robust detection. f REM: Achieved high SE (0.78 ± 0.14), κ (0.65 ± 0.18), and MCC (0.72 ± 0.10), but had the lowest PPV, suggesting higher false positives in REM classification. g Mean stage distribution across studies: Light sleep (53%), Deep sleep (17%), REM (18%), and Wake (12%). ACC: accuracy; κ: Cohen’s kappa; MCC: Matthews correlation coefficient; SE: sensitivity; SP: specificity; PPV: positive predictive value; NPV: negative predictive value.
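
The per-stage metrics named in the caption can all be derived from a single four-stage confusion matrix by treating each stage as a one-vs-rest binary problem. The sketch below is illustrative only: the matrix `CONFUSION` is hypothetical (its row totals mirror the stage mix in panel g, but every count is invented), and the function names are not taken from the article. The article does not state how its micro- and macro-averages were computed, so the averaging step assumes the standard definitions and is shown for sensitivity rather than accuracy.

```python
from math import sqrt

STAGES = ["Wake", "Light", "Deep", "REM"]

# HYPOTHETICAL confusion matrix: rows = reference stage, columns = predicted stage.
# Row totals follow the 12/53/17/18% stage mix from panel g; every individual
# count is invented for illustration and is not taken from any study.
CONFUSION = [
    [90, 25, 0, 5],      # Wake
    [20, 450, 30, 30],   # Light
    [0, 30, 140, 0],     # Deep
    [5, 35, 0, 140],     # REM
]


def one_vs_rest(cm, k):
    """Collapse the multiclass matrix to binary counts for stage k."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # stage-k epochs scored as another stage
    fp = sum(row[k] for row in cm) - tp  # other epochs scored as stage k
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def stage_metrics(cm, k):
    """Compute the seven caption metrics for stage k (assumes no empty row or column)."""
    tp, fp, fn, tn = one_vs_rest(cm, k)
    n = tp + fp + fn + tn
    acc = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from the binary marginals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": acc,
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "kappa": (acc - p_e) / (1 - p_e),
        "MCC": (tp * tn - fp * fn) / mcc_den,
    }


def averaged_sensitivity(cm):
    """Macro = unweighted mean over stages; micro = pooled one-vs-rest counts."""
    counts = [one_vs_rest(cm, k) for k in range(len(cm))]
    macro = sum(tp / (tp + fn) for tp, _, fn, _ in counts) / len(counts)
    micro = sum(tp for tp, _, _, _ in counts) / sum(tp + fn for tp, _, fn, _ in counts)
    return macro, micro


for k, stage in enumerate(STAGES):
    m = stage_metrics(CONFUSION, k)
    print(stage, {name: round(value, 3) for name, value in m.items()})

print("macro SE, micro SE:", averaged_sensitivity(CONFUSION))
```

With pooled one-vs-rest counts, micro-averaged sensitivity reduces to the diagonal sum divided by the total epoch count, so it is weighted by stage prevalence and dominated by Light sleep. Macro-averaging weights each stage equally, which is one reason the two versions of a metric can differ on imbalanced sleep data.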
