Fig. 6: Study-level factors significantly associated with variations in wEEG performance.
From: Evaluating the performance of wearable EEG sleep monitoring devices: a meta-analysis approach

This figure shows only the comparisons and correlations that reached statistical significance (p < 0.05). Full analyses across all sleep stages and study-level variables, including non-significant results, are provided in the Supplementary figures. Asterisks (*) mark statistically significant group differences according to Mann–Whitney U tests.

I Influence of participants’ health status on wEEG performance: Boxplots show statistically significant differences among healthy, clinical, and mixed populations in macro-averaged metrics (Ia) and in the Wake (Ib), N1 (Ic), N3 (Id), and REM (Ie) stages. Clinical and mixed-population studies generally reported higher performance than studies of healthy participants, especially for κ, SE, F1, and MCC.

II Influence of number of participants: Pearson correlation analysis revealed a significant negative correlation between the number of participants and the N3 F1 score, suggesting reduced consistency in larger, more heterogeneous cohorts.

III Influence of number of epochs: Significant positive correlations were found between the number of epochs and Wake-stage κ, F1, and MCC, indicating improved Wake classification performance with greater data volume.

IV Influence of study environment: Boxplots comparing controlled settings (e.g., sleep lab or hospital) with home-based settings show significantly better performance in home studies for N1 ACC (IVa) and for N3-stage κ, SE, F1, and MCC (IVb). Controlled settings yielded higher REM-stage ACC and NPV (IVc).

ACC: accuracy; κ: Cohen’s kappa; MCC: Matthews correlation coefficient; SE: sensitivity; SP: specificity; PPV: positive predictive value; NPV: negative predictive value; F1: F1 score.
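The caption reports per-stage and macro-averaged metrics but does not state how they were derived. A common convention, assumed here rather than taken from the paper, is to treat each stage one-vs-rest as a 2×2 table (TP, FP, FN, TN) against the reference scoring, compute every metric from that table, and take the unweighted mean over the five AASM stages as the macro average. The paper's macro-averaged κ could instead be a multi-class κ. The function names below are hypothetical, and the sketch is illustrative only.

```python
import math
from statistics import fmean

# Assumed AASM five-stage label set (hypothetical choice for this sketch).
STAGES = ("Wake", "N1", "N2", "N3", "REM")


def one_vs_rest_counts(reference, predicted, stage):
    """Return (TP, FP, FN, TN) with `stage` as the positive class."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    tp = fp = fn = tn = 0
    for ref, pred in zip(reference, predicted):
        if ref == stage and pred == stage:
            tp += 1
        elif pred == stage:
            fp += 1  # predicted this stage, reference says another
        elif ref == stage:
            fn += 1  # reference says this stage, prediction missed it
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def stage_metrics(tp, fp, fn, tn):
    """Binary metrics named in the caption, computed from one 2x2 table."""
    n = tp + fp + fn + tn
    acc = safe_div(tp + tn, n)
    # Chance agreement for Cohen's kappa, from the row and column marginals.
    p_e = safe_div((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": acc,
        "kappa": safe_div(acc - p_e, 1 - p_e),
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
        "SE": safe_div(tp, tp + fn),
        "SP": safe_div(tn, tn + fp),
        "PPV": safe_div(tp, tp + fp),
        "NPV": safe_div(tn, tn + fn),
        "F1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


def macro_average(reference, predicted, stages=STAGES):
    """Per-stage metrics plus their unweighted mean, skipping undefined (NaN) values."""
    per_stage = {
        s: stage_metrics(*one_vs_rest_counts(reference, predicted, s)) for s in stages
    }
    macro = {}
    for name in next(iter(per_stage.values())):
        values = [m[name] for m in per_stage.values() if not math.isnan(m[name])]
        macro[name] = fmean(values) if values else float("nan")
    return per_stage, macro
```

For example, `macro_average(ref_epochs, wEEG_epochs)` on two equal-length lists of 30-s epoch labels returns a dictionary of per-stage metrics and a dictionary of macro averages. Skipping NaN values is one design choice; a stage that never occurs in a recording would otherwise make some macro averages undefined.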
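Panels II and III are study-level correlations: each study contributes one point, pairing a design variable (number of participants or number of epochs) with a reported metric (e.g., N3 F1 or Wake κ). The sketch below computes Pearson's r directly. For the p-value it substitutes Fisher's z-transform normal approximation for the usual t-distribution test, because the Python standard library has no t CDF. The two agree closely for moderate numbers of studies but are not identical. Both function names are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z transform."""
    if n < 4:
        raise ValueError("the Fisher z approximation needs n >= 4")
    if abs(r) >= 1:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately N(0, 1) under H0
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

A negative `pearson_r(n_participants, n3_f1)` with a small p-value would correspond to the pattern described for panel II, where `n_participants` and `n3_f1` are hypothetical per-study lists.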