Introduction

Dementia is a major global health challenge, with an estimated 57.4 million people affected in 2019 and a projected 152.8 million by 2050 (ref. 1). Mild cognitive impairment (MCI) is a transitional stage between normal aging and dementia in which individuals experience measurable cognitive difficulties but can still manage daily activities2,3. Given that 40–60% of MCI cases are linked to underlying Alzheimer’s disease (AD) pathology4,5, early identification of MCI is crucial for delaying cognitive decline and preventing progression to AD6. However, accurate diagnosis of MCI remains a critical challenge in clinical practice.

Cognitive screening tools such as the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA) are widely used7, but they often lack sensitivity and are influenced by examiner expertise and subjective interpretation8. Advanced modalities such as magnetic resonance imaging (MRI) and positron emission tomography (PET)7,9,10 provide higher diagnostic accuracy but are expensive and less accessible in low-resource settings, while cerebrospinal fluid (CSF) biomarkers11, though highly informative, require invasive lumbar puncture. Together, these limitations underscore the need for non-invasive, cost-effective, and widely accessible MCI screening tools.

With advances in biomedical engineering, non-invasive wearable technologies capturing physiological parameters, including gait patterns12, sleep quality13, heart rate variability (HRV)14, and others15, have emerged as valuable cognitive assessment tools. Among these approaches, wearable EEG-based biomarkers show particular promise because they provide direct, real-time measurement of neural oscillatory patterns and functional connectivity networks16. Increasing evidence indicates that resting-state EEG reflects distinct alterations associated with AD and MCI, including a shift in spectral power from high-frequency (alpha, beta, gamma) to low-frequency (delta, theta) components17, reduced complexity of brain electrical activity18, altered cortical connectivity patterns19, and cholinergic system dysfunction20. A recent systematic review reported EEG-based diagnostic accuracies for MCI or AD ranging from 62 to 98%21. However, that review focused primarily on electrophysiological biomarkers and classification algorithms for MCI or AD. Most of its included studies used clinical EEG systems, which are expensive, complex, and require trained personnel, limiting their feasibility for routine use or large-scale screening in non-clinical or non-laboratory settings. Moreover, because its search period covered 2018–2022, recent advances in EEG technology were not captured.

The emergence of wearable EEG technology has addressed these limitations22,23. In this study, we adopt a widely used definition24,25, describing wearable EEG as compact, lightweight, and easy-to-use head-mounted devices capable of recording brain activity over extended periods in natural or daily-life settings. These devices do not require specialized personnel or complex electrode preparation and are not limited to specific electrode types, emphasizing portability and usability outside the laboratory. Wearable EEG has been widely applied in epilepsy26, fatigue detection27, and emotion research28. By virtue of their non-invasive nature, portability, and ease of use in non-clinical settings, these devices have attracted significant research attention as potential screening tools for MCI. Several studies have demonstrated the feasibility of using wearable EEG devices for MCI detection29,30. Noda et al.31 recently conducted a scoping review summarizing the potential of diverse wearable technologies, including fitness trackers, smartwatches, EEG systems, and other biosensors, for diagnosing, monitoring, and managing AD and MCI. Although this review provided valuable insights into the expanding field of mHealth wearables, its broad scope precluded an in-depth evaluation of EEG-specific evidence for MCI detection. Unlike other wearable modalities, EEG directly records neuronal electrical activity, offering a unique capacity to capture subtle neurophysiological changes associated with early cognitive decline. Accordingly, a targeted synthesis of studies employing wearable EEG for MCI detection is warranted.

Despite ongoing technological advances, significant challenges remain in standardizing wearable EEG applications for practical use. Variability in device hardware, electrode configurations, data acquisition protocols, and analytical methods continues to hinder reproducibility and reliability. Furthermore, to our knowledge, no systematic review has yet been conducted on the use of wearable EEG devices for detecting MCI. Therefore, a quantitative comparison of existing wearable EEG systems is essential to identify key factors influencing diagnostic performance.

To address this gap, we conducted a systematic review of wearable EEG performance in MCI detection, examining how various factors in the EEG system and data flow affect performance. This synthesis highlights wearable EEG’s potential as an accessible tool for early MCI detection, identifies methodological limitations, and proposes future research directions.

Results

Search results

A total of 1562 literature records were identified, with 643 duplicates removed. After screening titles and abstracts, an additional 735 articles were excluded, leaving 184 publications. Among these, 26 full texts could not be retrieved and were therefore excluded. Following the application of predefined inclusion and exclusion criteria, an additional 138 studies were removed. One further study was identified through reference list screening, resulting in a final inclusion of 21 publications for this systematic review30,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51 (Fig. 1).

Fig. 1: PRISMA flow chart of the study selection process.

Characteristics of the included studies

The included 21 studies were published between 2019 and 2025, with the majority (17/21, 81.0%) published in the last 3 years (2022–2025) (Table 1). Most studies were published as journal articles (15/21, 71.4%), with the remainder being conference papers (6/21, 28.6%). These studies were conducted in eight different countries: China (n = 7)32,35,36,37,42,48,49, Japan (n = 5)43,44,45,46,50, Korea (n = 3)38,39,41, Germany (n = 2)34,40, Spain (n = 1)30, Poland (n = 1)33, the United States (n = 1)51, and the United Kingdom (n = 1)47. Regarding EEG data collection settings, most studies were carried out in hospitals (7/21, 33.3%)30,38,39,41,42,47,50 or research laboratories (7/21, 33.3%)33,36,37,43,44,45,46. Two studies (2/21, 9.5%)32,49 were community-based, conducted at community health service centers. The remaining studies (5/21, 23.8%)34,35,40,48,51 did not report their data collection settings.

Table 1 Characteristics of included studies evaluating wearable EEG for MCI detection

A total of 1660 participants were included across all studies, comprising 828 individuals diagnosed with MCI and 832 healthy controls (HC). Sample sizes varied considerably across studies, with a median of 45 participants per study (interquartile range (IQR): 34, 97; range: 13–336). Among the 20 studies (95.2%) reporting age data, the median age across all participants was 70.9 years (IQR: 69.6, 73.6; range: 62.4–74.3 years). Gender distribution was reported in 18 studies (85.7%), with a median proportion of 54.1% female participants (IQR: 45.2%, 66.7%; range: 34.2–96.3%). Educational background was reported in 10 studies (47.6%), with a median of 8.9 years of formal education (IQR: 7.3, 10.3; range: 6.1–15.0 years). When stratified by group, MCI participants were slightly older (median: 72.1 vs 72.0 years) and had lower educational attainment (median: 8.1 vs 9.6 years) than HC participants, while gender distribution was comparable between groups (44.4% vs 52.2% female).

A variety of diagnostic criteria were employed to identify MCI. The MoCA was the most frequently used screening instrument (9/21, 42.9%)33,35,36,37,43,44,45,46,48, with cutoff scores ranging from ≤25 to <26. Three studies (3/21, 14.3%)32,39,47 applied the Petersen or McKhann diagnostic criteria. The Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) task-based scoring was used in two studies (2/21, 9.5%)34,40. Two studies (2/21, 9.5%) utilized multimodal approaches incorporating CSF biomarkers and neuroimaging techniques (MRI or PET)30,42. The remaining five studies (5/21, 23.8%) employed other standardized assessment tools, including the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) criteria51, the Structured Clinical Interview for DSM Disorders (SCID) and Mini-International Neuropsychiatric Interview (MINI)38, Seoul Neuropsychological Screening Battery-Core (SNSB-C)41, the 2018 Chinese Dementia Guidelines49, and Clinical Dementia Rating (CDR) scale50.

Features of wearable EEG

This review organized wearable EEG device features into three fundamental domains based on a comprehensive evaluation framework: device fundamentals, signal acquisition, and physical design and portability (Fig. 2). Detailed technical specifications for individual devices are provided in Supplementary Table 14.

Fig. 2: Schematic overview of the main characteristics of wireless EEG devices.

Categorization of Mobile EEG (CoME) S (System Specifications): Technical performance scoring (4–20 scale) based on bit resolution, sampling rate, battery life, and electrode type, where higher scores indicate superior system specifications. CoME D (Device Mobility): Device portability assessment (0–5 scale) from off-body tethered systems (0D) to fully integrated head-mounted devices (5D). When information needed to calculate the score was missing, the lowest value was used. Scores are not calculated if over 10 items are unavailable.

Device fundamentals

Our review identified 16 distinct wearable EEG devices evaluated across the 21 studies. The rationale for including these devices, based on the predefined inclusion and exclusion criteria for wearable EEG systems, is detailed in Supplementary Table 3. The MUSE 2 headband was the most frequently used device (7/21, 33.3%)35,36,37,43,44,45,46, followed by the EPOC X (3/21, 14.3%)34,38,40. One study (Lee et al.)38 evaluated five different devices concurrently for comparative analysis. These devices originated from eight countries, with the United States contributing the largest proportion (7/16, 43.8%) (Fig. 3). Market development accelerated from 2013 to 2021, with the majority of wearable EEG devices (14/16, 87.5%) released during this period. Regarding regulatory status, 7 devices (43.8%) had obtained medical certification, 5 (31.3%) had no medical certification, and 4 (25.0%) did not report certification status. Safety classification standards were documented for 7 devices (43.8%), and technical standards compliance was reported for 10 devices (62.5%) (Supplementary Table 6). The cost of these devices varied substantially, ranging from $200 for the MUSE 2 to $6289 for the g.Nautilus (converted from €5717), with price information unavailable for 8 devices (50%).

Fig. 3: Global distribution and certification status of wearable EEG devices in the included studies.

Distribution of 16 wearable EEG devices by release year (2009–2021) and country of origin. Point colors indicate medical certification status (blue: certified; purple: not certified; pink: unknown). Point sizes represent price ranges (small: <$500; medium: $500–$2000; large: >$2000; smallest: unknown). Device names show corresponding prices in USD. The USA leads in device development (n = 7). Across all devices, prices range from $200 (MUSE 2) to $6289 (g.Nautilus). Medical certification was obtained by 43.8% of devices (n = 7). *Open-cEEGrids: open-source platform (https://doi.org/10.1038/srep16743). †STAT™ X24: release year unknown; 2021 represents the study inclusion year.

Electrode and signal

Analysis of the 16 devices revealed that dry electrodes predominated (8/16, 50.0%), followed by wet (6/16, 37.5%) and hybrid designs (2/16, 12.5%) (Fig. 4). Most devices (12/16, 75.0%) featured fixed electrode placements. To enhance signal quality, devices implemented preamplification and shielding. Electrode preamplification involves circuitry at the electrode site to amplify signals and reduce noise. This was implemented as active (8/16, 50.0%) or passive (6/16, 37.5%), while two devices (12.5%) did not report this specification. Electrode shielding, which minimizes electromagnetic interference, was incorporated in half of the devices (8/16, 50.0%), explicitly absent in one (n = 1/16, 6.3%), and not reported in the remaining seven (7/16, 43.8%). Both preamplification and shielding technologies are independent of electrode type and can be applied to wet or dry sensors (Supplementary Table 7). To augment EEG data and account for movement artifacts, 62.5% (n = 10/16) of devices integrated accelerometers, and 43.8% (7/16) included gyroscopes. Other integrated hardware included ports for external triggers (5/16, 31.3%) and auxiliary channels for other physiological signals (4/16, 25.0%) (Supplementary Table 9).

Fig. 4: Sankey diagram illustrating interrelationships among technical specifications of wearable EEG acquisition systems.

Sankey diagram illustrating technical specification relationships for 16 wearable EEG devices across six dimensions: electrode type, electrode placement, preamplification, shielding, channel count, and sampling rate. Flow widths represent device proportions for each specification. Electrode types were classified based on manufacturer descriptions: wet electrodes require conductive medium (gel, saline, or water); dry electrodes enable direct scalp contact or use polymer-based semi-dry sensors; hybrid electrodes can function in either mode. Sampling rates represent maximum values when available, otherwise standard/fixed rates.

Configurations of 4 channels were most common (3/16, 18.8%), followed by 8-channel (2/16, 12.5%), 19-channel (2/16, 12.5%), and 2-channel (2/16, 12.5%) configurations. Sampling rates ranged from 128 to 600 Hz, with 256 Hz (4/16, 25%) being the most prevalent (Supplementary Table 8). Across the 23 unique electrode configurations identified in the studies, 44 unique electrode positions were used. AF7, AF8, F3, F4, T7, and T8 were the most frequently used recording sites (9/23, 39.1%) (Supplementary Fig. 3). In terms of regional coverage, the Anterior Frontal (14/23, 60.9%) and Frontal (11/23, 47.8%) regions were the most common sites for electrode placement (Fig. 5).

Fig. 5: Electrode usage frequency and brain region coverage across wearable EEG device configurations.

Frequencies are based on N = 23 unique device configurations (sourced from 19 studies; e.g., Lee et al.38 contributed 5 configurations). Two non-standard (non-10-20 system) configurations were excluded. A Electrode Frequency is the absolute count (out of 23) of configurations using each specific electrode. B Region Coverage Frequency is the absolute count (out of 23) of configurations using at least one electrode from the specified region.

Physical design and mobility

Bluetooth technology, including Bluetooth Low Energy (BLE), was the most common wireless protocol (13/16, 81.3%) in these wearable EEG devices. Two devices used proprietary 2.4 GHz wireless protocols, and one device did not report connectivity specifications (Supplementary Table 10). Local storage capabilities were available in 5 devices (31.3%), while most supported smartphone integration (13/16, 81.3%) (Supplementary Table 10). Battery life ranged from 3 to >12 h, and device weights ranged from 27 to 1000 g (Supplementary Table 11). We further applied the Categorization of Mobile EEG (CoME) framework52 to evaluate device portability and technical performance. CoME assessments were completed for 13 of 16 devices (81.3%) with sufficient technical documentation (Supplementary Table 11). Most devices demonstrated high portability, with 61.5% (8/13) achieving the top mobility rating, indicating the predominance of fully integrated, head-mounted wireless designs. System specification scores (CoME S) varied substantially (5–13) and were generally aligned with channel density. High-density systems such as Versatile (16 channels, S = 13) and g.Nautilus (32 channels, S = 10) showed superior sampling precision and electrode quality, whereas low-channel consumer devices (e.g., Focusband, 2 channels, S = 5) offered limited performance. Furthermore, a group analysis of these devices showed that 4–8 channel configurations strike an optimal balance, maintaining high mobility (Mean = 3.8) while delivering robust system quality (Mean = 9.3), which was notably higher than low-channel (≤3) configurations (Supplementary Fig. 4).

EEG recording protocols

Our analysis revealed that among the 21 studies reviewed, 47.6% (n = 10) utilized task-based protocols for EEG data acquisition33,34,40,42,43,44,45,46,47,49, 23.8% (n = 5) relied on resting-state recordings30,38,41,50,51, and 28.6% (n = 6) adopted a mixed protocol combining both approaches32,35,36,37,39,48 (Supplementary Table 4). Among the five resting-state studies, three were conducted with eyes closed38,41,50, one with eyes open30, and one included both open- and closed-eye conditions51, with durations ranging from 2 to 10 min. In the six mixed protocols, resting-state segments lasted from 1 to 5 min, with four studies using eyes closed32,36,37,39, one using eyes open35, and one not reporting the eye condition48.

Task-based EEG recording paradigms varied across studies in complexity and targeted cognitive domains. Some investigations employed domain-specific tasks during EEG recording, such as a delayed match-to-sample paradigm to assess short-term memory42,49 and a visual tracking task to evaluate attention32. However, the majority of studies recorded EEG signals during multi-domain cognitive stimulation. For example, Boudaya et al.34,40 used the CERAD test to evaluate executive function, visual perception, short- and long-term memory, visual construction, and verbal episodic memory. Similarly, Li et al.36 implemented a custom cognitive battery that assessed verbal fluency, memory, attention, auditory processing, visuospatial function, and executive functioning. Additionally, Chai et al.37 incorporated fine motor and visuospatial tasks by having participants complete four graphomotor exercises on a digital tablet. Segaert et al.47 employed a two-word phrase comprehension task probing lexical retrieval and semantic binding, extending EEG-based MCI research to linguistic processing. Several studies introduced novel EEG acquisition paradigms. Rutkowski et al. conducted a series of studies refining their task framework over time. Initial work utilized an oddball paradigm based on visual stimuli featuring indoor scenes for working memory assessment46. Subsequent studies incorporated facial emotion videos designed to probe visuospatial processing, short-term memory, and attention43,44,45. Finally, they combined these paradigms into a single composite task that involved facial emotion recognition and oddball-based working memory assessment33. Virtual reality (VR)-based cognitive tasks for EEG signal acquisition emerged as a novel methodological advancement. Wu et al.35 employed immersive VR environments to deploy language-based tasks, wherein participants were asked to explore interactive scenes and describe them in detail, integrating both visual and auditory stimuli. Participants could navigate through VR spaces and interact with objects, enhancing ecological validity. Lee et al.39 developed a VR-based cognitive protocol encompassing four domains: sustained attention, selective attention, working memory, and depth perception. Similarly, Xue et al.48 designed a series of VR tasks, including a sensitive response task, a spatial perception task, and an item placement task, collectively targeting attention, visuospatial working memory, and executive functions.

Overall, the task-stimulated EEG recordings primarily engaged various cognitive domains, with visuospatial processing being the most frequently assessed (10/21, 47.6%)33,34,35,36,37,40,43,44,45,48, followed by working memory (9/21, 42.9%)33,36,39,43,44,45,46,48,49, attention (7/21, 33.3%)32,36,39,43,44,45,48, and short-term memory (5/21, 23.8%)33,34,36,40,42 (Supplementary Table 12).

Wearable EEG signal processing

The overall EEG signal processing pipeline included segmentation, band-pass filtering, artifact removal, and feature preprocessing or selection. Based on a prior study, EEG feature extraction for MCI detection can generally be categorized into three main phenomena related to brain function: EEG slowing, complexity, and connectivity alterations21. Given the differences in signal processing techniques for these features, we summarize the current wearable EEG-based signal processing approaches for MCI detection based on these three categories (Supplementary Table 13).

Our results showed that most studies targeting EEG slowing applied band-pass filters with considerable variability but generally broad ranges39,41,42,50, extending up to 0.5–95 Hz50, and segmentation windows of 2–8 s42,50. Artifact removal was primarily performed using independent component analysis (ICA) or Auto ICA combined with manual inspection41,42,47,49. The extracted feature methods primarily included power spectral density (PSD)39,41,42,47,49, event-related potentials (ERP)42, and short-time Fourier transform (STFT)50 measures. Feature selection commonly relied on Chi-square42 or Wilcoxon tests39 while logistic regression models41 were occasionally used for statistical validation. Only a few studies reported feature preprocessing, with Segaert et al. applying Principal Component Analysis (PCA) for PSD-based dimensionality reduction47. Overall, these pipelines optimized the quantification of cortical slowing, characterized by increased δ/θ power and decreased α/β activity in MCI.
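
As a concrete illustration of this class of pipeline, the sketch below computes absolute and relative band power from a single preprocessed epoch using Welch’s method. It is a minimal sketch only: the sampling rate, band limits, and synthetic data are illustrative assumptions rather than parameters taken from any included study.

```python
import numpy as np
from scipy.signal import welch

# Hypothetical parameters: a 256 Hz device and canonical band limits.
FS = 256
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(epoch, fs=FS, bands=BANDS):
    """Absolute and relative band power for one preprocessed EEG epoch (1-D array)."""
    freqs, psd = welch(epoch, fs=fs, nperseg=min(len(epoch), 2 * fs))
    df = freqs[1] - freqs[0]
    total = psd[(freqs >= 1) & (freqs <= 30)].sum() * df   # broadband reference power
    features = {}
    for name, (lo, hi) in bands.items():
        absolute = psd[(freqs >= lo) & (freqs < hi)].sum() * df
        features[name] = absolute
        features[f"rel_{name}"] = absolute / total
    return features

# Synthetic 4-s epoch standing in for one cleaned wearable EEG channel.
epoch = np.random.default_rng(0).standard_normal(4 * FS)
features = band_powers(epoch)
```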

For studies extracting complexity-based EEG features to detect MCI, band-pass filters were relatively narrow, typically ranging from 1 to 30 Hz43,44,46, with segmentation windows of 3–8 s43,46. In most studies, artifact removal was not reported, with the exception of Rutkowski et al.45, who applied Empirical Mode Decomposition (EMD). Feature extraction methods primarily included multifractal detrended fluctuation analysis (MFDFA)43,44 and multiscale entropy (MSE)46, while topological data analysis (TDA)45 was also used to capture nonlinear signal complexity. Methods for feature selection were generally not reported. Preprocessing of extracted features was typically performed using Z-score normalization43,46, aimed at standardizing feature distributions across subjects. Overall, these pipelines focused on quantifying the loss of signal complexity in MCI, reflecting reduced adaptability and dynamical richness of cortical activity.
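
For readers unfamiliar with these complexity measures, the following minimal sketch illustrates multiscale entropy via coarse-graining and a naive sample-entropy estimate. The embedding dimension, tolerance, and synthetic signal are illustrative choices, not settings reported by the included studies.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Naive O(n^2) sample entropy of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def match_count(length):
        templates = np.array([x[i:i + length] for i in range(len(x) - length)])
        matches = 0
        for i in range(len(templates)):
            dist = np.max(np.abs(templates - templates[i]), axis=1)
            matches += np.sum(dist <= r) - 1      # exclude the self-match
        return matches

    b, a = match_count(m), match_count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.nan

def multiscale_entropy(x, max_scale=5):
    """Coarse-grain the signal at increasing scales and compute sample entropy at each."""
    values = []
    for scale in range(1, max_scale + 1):
        n = len(x) // scale
        coarse = np.asarray(x[:n * scale], dtype=float).reshape(n, scale).mean(axis=1)
        values.append(sample_entropy(coarse))
    return np.array(values)

# Synthetic 4-s epoch at 256 Hz standing in for one cleaned wearable EEG channel.
epoch = np.random.default_rng(1).standard_normal(4 * 256)
mse_curve = multiscale_entropy(epoch)
```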

Studies extracting connectivity-based EEG features were limited. They employed band-pass filters ranging from 1 to 49 Hz with segmentation windows of 2–20 s33,51. Artifact removal methods included ICA51 or EMD33, depending on the study. Extracted features comprised functional connectivity metrics and EEG power distribution difference functions (PDDF)51, with dimensionality reduction or standardization performed via uniform manifold approximation and projection (UMAP)33 or PCA51. These pipelines primarily aimed to capture network-level alterations alongside spectral slowing associated with MCI.

Most studies employing multi-domain features (EEG slowing, complexity, and/or connectivity) used band-pass filters typically ranging from 1 to 45 Hz36,37,48. Artifact handling generally involved ICA with manual inspection35,36,37,48, or more advanced approaches, such as those used by Perez-Valero et al.30, who combined ICA with Autoreject. Autoreject uses a data-driven process with cross-validation and Bayesian search to set channel-specific rejection thresholds, automatically detecting high-amplitude or widespread artifacts and performing channel interpolation rather than discarding epochs. Residual ocular and muscular artifacts are subsequently removed via ICA, creating a two-stage correction pipeline that improves signal quality for connectivity analysis. Extracted features spanned spectral power, entropy measures, multiscale and multifractal complexity metrics, Hjorth parameters, and connectivity-informed indices. Feature selection methods varied across studies and included techniques such as recursive feature elimination with cross-validation (RFECV), binary swarm algorithms (BSA)35,36,37,48, stepwise regression32, Chi-square tests38, and Fisher scores30. For feature preprocessing, several studies applied PCA40, or Z-score normalization32 to reduce dimensionality and standardize the features. These pipelines integrated multi-domain EEG data, enabling enhanced characterization of both cortical slowing and reduced signal complexity, ultimately improving MCI classification performance.
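
A minimal sketch of such a two-stage cleaning pipeline, implemented here with the MNE-Python and autoreject packages, is shown below. The file name, filter band, epoch length, and excluded-component index are placeholders, and the exact parameters used by the included studies may differ.

```python
import mne
from autoreject import AutoReject

# Load a hypothetical wearable EEG recording and apply a broad band-pass filter.
raw = mne.io.read_raw_fif("wearable_eeg_raw.fif", preload=True)  # placeholder file
raw.filter(l_freq=1.0, h_freq=45.0)
epochs = mne.make_fixed_length_epochs(raw, duration=2.0, preload=True)

# Stage 1: Autoreject learns channel-specific thresholds by cross-validation and
# interpolates bad channels within epochs instead of discarding the epochs.
ar = AutoReject(n_interpolate=[1, 2, 4], random_state=42)
epochs_clean, reject_log = ar.fit_transform(epochs, return_log=True)

# Stage 2: ICA removes residual ocular and muscular components.
ica = mne.preprocessing.ICA(n_components=0.99, random_state=42)
ica.fit(epochs_clean)
ica.exclude = [0]   # placeholder: indices would come from visual or automated labeling
epochs_final = ica.apply(epochs_clean.copy())
```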

Across studies, Z-score normalization34,45,46 and PCA-based standardization40,51 were the most recurrent preprocessing techniques. This consistency underscores a unified framework in which each EEG feature category (spectral, nonlinear, or network-level) dictates its own optimized signal-handling strategy while converging on comparable normalization and dimensionality-reduction principles in post-processing.

Classification algorithms

Among the 21 studies included in this systematic review, a total of 11 different classification algorithms were employed for the detection of MCI using wearable EEG devices. The majority of studies adopted traditional machine learning techniques. Support Vector Machine (SVM) was the most frequently used classifier (13/21, 61.9%)30,33,34,35,37,38,40,42,43,44,45,46,48 (Table 2). Of these, four studies implemented specialized SVM variants43,44,45,46, including linear, radial basis function (RBF), polynomial, and sigmoid kernel functions. Other commonly used traditional algorithms included Random Forest (RF) (12/21, 57.1%)33,34,35,36,37,39,40,43,44,45,46,49, Logistic Regression (LR) (10/21, 47.6%)30,32,33,40,41,43,44,45,46,49, Linear Discriminant Analysis (LDA) (5/21, 23.8%)33,43,44,45,46, Decision Tree (DT) (6/21, 28.6%)34,35,36,40,42,48, K-Nearest Neighbors (KNN) (4/21, 19.0%)34,36,40,42, XGBoost (4/21, 19.0%)35,36,37,48, Gradient Boosting (GB) (2/21, 9.5%)34,36, and Naive Bayes (NB) (1/21, 4.8%)36. Deep learning approaches were less frequently employed than traditional machine learning techniques in MCI detection, with 23.8% of studies (n = 5) utilizing deep feedforward neural networks (DFNN/FNN)33,43,44,45,46. Additionally, one study employed a customized Transformer-based deep learning model (1/21, 4.8%)50.
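
For illustration, the snippet below sets up the most commonly reported classifiers with scikit-learn and evaluates them with 10-fold cross-validation on a placeholder feature matrix; it is a generic sketch, not a reproduction of any included study’s model or data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder EEG feature matrix (n_subjects x n_features) and MCI (1) / HC (0) labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 12))
y = rng.integers(0, 2, size=60)

models = {
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "SVM (linear)": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.2f} ± {scores.std():.2f}")
```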

Table 2 Classification algorithms, validation methods, and performance metrics of machine learning models for MCI detection using wearable EEG

Validation strategies and performance metrics

In the 21 included studies, 5 different validation methodologies were employed. 10-fold cross-validation was the most common approach (9/21, 42.9%)34,36,37,40,43,44,46,48,50 (Table 2). Various leave-one-out strategies were also frequently utilized, including Leave-One-Out Cross-Validation (LOOCV) (5/21, 23.8%)33,35,39,42,51, Leave-One-Subject-Out (LOSO) cross-validation (2/21, 9.5%)30,45, and Leave-Pair-Out (LPO-CV) (n = 1)38. Additionally, one study implemented Randomized CV32, two studies used ROC curve analysis (2/21, 9.5%)32,41. One study did not report the validation method used (1/21, 4.8%)47.
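
The sketch below illustrates how subject-aware validation such as LOSO can be implemented with scikit-learn, holding out all epochs from one participant at a time to avoid within-subject leakage. The data shapes and labels are placeholders chosen only to demonstrate the mechanics.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 20 subjects, 10 epochs each, 8 features per epoch.
rng = np.random.default_rng(1)
n_subjects, epochs_per_subject, n_features = 20, 10, 8
X = rng.standard_normal((n_subjects * epochs_per_subject, n_features))
y = np.repeat(rng.integers(0, 2, size=n_subjects), epochs_per_subject)  # one label per subject
groups = np.repeat(np.arange(n_subjects), epochs_per_subject)           # subject identifiers

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, groups=groups,
                         cv=LeaveOneGroupOut(), scoring="accuracy")
print(f"LOSO accuracy: {scores.mean():.2f} across {len(scores)} held-out subjects")
```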

Performance metric reporting varied considerably across studies. Accuracy was the most commonly reported metric (16/21, 76.2%)30,32,33,34,35,36,37,38,39,40,42,43,45,46,48,50, followed by AUC (8/21, 38.1%)32,33,41,44,47,49,50,51, F1 scores (6/21, 28.6%)30,33,35,36,37,39, recall (5/21, 23.8%)30,33,35,36,39, sensitivity (5/21, 23.8%)32,38,42,49,50, specificity (5/21, 23.8%)32,38,42,49,50, and precision (4/21, 19.0%)30,35,36,39 (Supplementary Fig. 7). Chai et al.37 and Lee et al.39 reported Kappa and the Matthews Correlation Coefficient (MCC), respectively, as additional performance metrics (Table 2). Four studies reported accuracy, F1, precision, and recall30,35,36,39. Four studies reported accuracy, sensitivity, and specificity32,38,42,50. Ten studies reported only a single metric (accuracy or AUC)34,40,41,43,44,45,46,47,48,51, limiting the assessment of classification performance.
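
The following snippet shows how these complementary metrics can be derived from a single set of held-out predictions; the prediction values are fabricated solely to demonstrate the calculations.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical held-out predictions (1 = MCI, 0 = HC).
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
y_prob = np.array([0.9, 0.7, 0.4, 0.2, 0.1, 0.6, 0.8, 0.3, 0.55, 0.45])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "auc": roc_auc_score(y_true, y_prob),
    "f1": f1_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall/sensitivity": recall_score(y_true, y_pred),  # sensitivity = recall for the MCI class
    "specificity": tn / (tn + fp),
}
```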

Wearable EEG biomarker results in MCI detection

Our systematic review identified three primary neurophysiological phenomena across 21 studies: EEG slowing (16/21, 76.2%), complexity alterations (13/21, 61.9%), and connectivity/asymmetry changes (7/21, 33.3%) (Supplementary Fig. 6).

EEG slowing

MCI is associated with slowed EEG dynamics, reflected by a characteristic shift of spectral power toward lower frequency bands, specifically increased delta (δ) and theta (θ) power and reduced alpha (α) and beta (β)/gamma (γ) power. This phenomenon, often referred to as EEG slowing in previous studies21,53, is typically quantified using time–frequency analyses. In the studies included in our review, power spectral density (PSD) analysis was the most commonly used method for extracting EEG biomarkers related to this phenomenon (13/21, 61.9%)34,35,36,37,38,39,40,41,42,48,49,50,51. Increased slow-wave activity, with delta and/or theta power elevations, was observed in 28.6% (n = 6) of studies35,36,37,39,48,51 (Supplementary Tables 15 and 16). Conversely, high-frequency oscillations were reduced, with altered alpha power in 14.3% (n = 3)39,42,47 and diminished beta/gamma power in 9.5% (n = 2)32,39. Band power ratios, particularly the theta/alpha ratio (TAR, 3/21, 14.3%)32,39,51 and delta/alpha ratio (DAR, 2/21, 9.5%)32,39, served as discriminative biomarkers. Additional methods for feature extraction included time-domain features (2/21, 9.5%)34,40, ERPs (1/21, 4.8%)42, and time-frequency analyses (1/21, 4.8%)50.
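
As a worked example of these ratio biomarkers, the snippet below computes TAR and DAR from per-band power values; the numbers are invented purely to show the arithmetic and are not drawn from any included study.

```python
# Hypothetical per-band power values for one channel (e.g., integrated PSD in µV²).
powers = {"delta": 22.0, "theta": 18.0, "alpha": 12.0, "beta": 8.0}

tar = powers["theta"] / powers["alpha"]   # theta/alpha ratio; tends to rise with EEG slowing
dar = powers["delta"] / powers["alpha"]   # delta/alpha ratio
print(f"TAR = {tar:.2f}, DAR = {dar:.2f}")
```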

Complexity alterations

Entropy measures were the most frequently used complexity metrics, employed in 38.1% (n = 8) of studies30,32,35,36,37,38,46,48. Among these, spectral entropy was most common (6/21, 28.6%)30,35,36,37,38,48, followed by multiscale entropy (2/21, 9.5%)32,46 and fuzzy entropy (2/21, 9.5%)37,48. Xue et al.48 revealed state-dependent complexity changes: increased fuzzy entropy at rest but decreased during cognitive tasks. Multifractal detrended fluctuation analysis (MFDFA) (2/21, 9.5%)43,44 showed elevated scores in temporal-prefrontal regions and theta band, indicating disorganized neural dynamics. Hjorth complexity (4/21, 19.0%)30,34,38,40 and Lempel-Ziv complexity (2/21, 9.5%)32,38 were also employed. Topological data analysis using persistent homology (1/21, 4.8%)45 demonstrated simplified network architecture with fewer topological cycles in MCI.
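
For reference, the sketch below implements the standard Hjorth activity, mobility, and complexity definitions on a synthetic epoch; it illustrates the computation only and does not reproduce any included study’s preprocessing or parameter choices.

```python
import numpy as np

def hjorth_parameters(x):
    """Hjorth activity, mobility, and complexity of a 1-D EEG epoch."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)        # first derivative (sample differences)
    ddx = np.diff(dx)      # second derivative
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / np.var(x))
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

# Synthetic 4-s epoch at 256 Hz standing in for one cleaned channel.
epoch = np.random.default_rng(2).standard_normal(4 * 256)
activity, mobility, complexity = hjorth_parameters(epoch)
```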

Connectivity and asymmetry

Asymmetry indices (Differential Asymmetry (DASM)/Rational Asymmetry (RASM)) were the most prevalent (5/21, 23.8%)35,36,37,38,48, revealing altered interhemispheric coordination. Wu et al.35 found greater asymmetry during language tasks, Chai et al.37 showed decreased synchrony in temporal regions, and Lee et al.38 reported increased delta DASM frontally. Single studies examined coherence analysis (1/21, 4.8%)51, showing decreased coherence in the delta, theta, and alpha bands; phase-amplitude coupling (1/21, 4.8%)38, revealing reduced theta-high beta coupling; and network analysis (1/21, 4.8%)33, demonstrating fewer network nodes and edges in MCI patients.
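
A brief sketch of how DASM and RASM are typically derived from symmetric electrode pairs is given below; the electrode pairs and power values are placeholders chosen for illustration rather than values reported by the included studies.

```python
# Hypothetical band power per channel (e.g., alpha power) for two symmetric pairs.
pairs = [("F3", "F4"), ("T7", "T8")]
band_power = {"F3": 10.2, "F4": 8.7, "T7": 6.1, "T8": 7.4}

# DASM: left-minus-right difference; RASM: left-over-right ratio for each pair.
dasm = {f"{left}-{right}": band_power[left] - band_power[right] for left, right in pairs}
rasm = {f"{left}/{right}": band_power[left] / band_power[right] for left, right in pairs}
```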

Classification results

Our systematic review of 21 studies revealed significant performance heterogeneity (Fig. 6). Accuracy (Acc), reported in 16 studies, ranged from 0.46 to 0.95, with an interquartile range of 0.74–0.85. Notably, 9 of these studies reported accuracies exceeding 85%. The remaining five studies reported only AUC values, which ranged from 0.45 to 0.90.

Fig. 6: Performance metrics of wearable EEG systems for detecting mild cognitive impairment.

Lollipop chart showing classification performance ranges for 21 studies (n = 13–336). Studies ranked by median performance with sample sizes indicated (n = X). Lines connect minimum-maximum performance ranges; points represent individual methods colored by metric (blue: accuracy; orange: AUC), with opacity indicating performance level and size scaled by sample size. Labels show min, max, and median (bold) values. Performance spans 0.45–0.95, with median values 0.62–0.94. Heterogeneity reflects dataset, protocol, and methodological variations. MCI mild cognitive impairment, AUC area under curve.

Machine learning algorithm performance comparison for EEG-only MCI classification

In the classification results using wearable EEG signals, 17 of the 21 studies compared different methods for detecting MCI31,33,34,35,36,37,39,40,41,42,43,44,45,46,47,48,49 (Fig. 7). Of these, 13 studies evaluated various classification algorithms, with four reporting that SVM achieved the highest accuracy, followed by RF in three studies.

Fig. 7: Comparative analysis of classification performance for MCI detection models using wearable EEG.

Each chart represents one study; axes show algorithms or conditions; distance from center indicates accuracy (concentric circles: 25%, 50%, 75%, 100%). Colors distinguish experimental conditions in multi-condition studies. Highlighted labels indicate best-performing methods. Studies systematically compared 3–9 algorithms, including traditional machine learning (SVM, RF, kNN) and deep learning approaches. Results demonstrate task-dependent optimal algorithm selection, with ensemble methods and SVMs frequently achieving competitive performance. Row 1: task/algorithm comparisons; Row 2: device configurations and feature combinations; Row 3: EEG modalities and frequency bands; Row 4: encoding/decoding and emotion recognition tasks. MCI mild cognitive impairment, RF random forest, SVM support vector machine.

Six studies examined classification performance under different paradigms. Rutkowski et al.33 found that the Reminiscent Interior Oddball task achieved the highest accuracy, outperforming emotion and learning tasks. Lee et al.39 showed that combining resting-state and task-based EEG features outperformed resting-state EEG alone. Similarly, Xue et al.48 found task-based EEG superior to resting-state EEG. Rutkowski et al.43 reported similar results between the encoding and decoding phases of emotional facial video training. In a subsequent 2022 study, they found that the inhibited emotional expression task outperformed other paradigms44. Segaert et al.47 compared conditions within their two-word phrase paradigm, showing that single-word retrieval outperformed semantic binding.

Several studies examined the impact of EEG features on classification, with most showing that band selection algorithms yielded the highest accuracy in the theta band35,37,48,49. Xue et al.48 reported better classification performance with beta-band features in task-based EEG, while Chen et al.42 found high classification performance in both the alpha and theta bands. Chen et al. also noted that ERP features outperformed power spectral density entropy (PSDE) features, and that combining both improved accuracy. Jin-Young et al.41 found that high-beta power from resting-state EEG had a slightly higher AUC (0.68) than gamma power (0.67).

In nearly half of the studies (9/21, 42.9%)30,32,35,36,37,38,39,40,42, feature selection results showed that the optimal classification feature subsets typically included asymmetry indices (DASM, RASM), mean power (MP), spectral entropy (e.g., fuzzy entropy), phase-amplitude coupling (PAC), relative power (RP), and Hjorth complexity (HC). A comparison of features categorized as “connectivity,” “complexity,” and “EEG slowing” (temporal and frequency domain) showed that connectivity-based features generally yielded higher classification accuracy (Supplementary Fig. 8).

Further analysis of factors potentially influencing the classification performance of wearable EEG in MCI detection (Fig. 8D) showed that studies using fronto-parietal placements achieved significantly higher accuracy, AUC, and precision (all P < 0.001) compared with other configurations. Logistic regression confirmed electrode placement as a strong independent predictor of performance (Supplementary Table 17).

Fig. 8: Technical specifications impact on wearable EEG-based MCI detection performance.

Violin plots comparing classification performance across EEG system characteristics. Shapes show density distributions; boxplots show quartiles; red diamonds indicate means; p-values from statistical tests. A Protocol type: Multi-domain protocols outperform single-domain (accuracy P = 0.001; AUC P = 0.001) with reduced variability. B Feature integration: Multi-domain features show improved accuracy (P = 0.03) and F1 scores (P = 0.29) versus single-domain features. C Channel count: Higher channel counts (9–20) yield better accuracy than lower counts (1–3: P = 0.01; 4–8 vs 9–20: P < 0.001), with reduced variability. D Electrode placement: Fronto-parietal montages significantly outperform other configurations across all metrics (all P < 0.001) with greater consistency. Multi-domain protocols, higher channel counts, and optimized electrode placement enhance MCI classification performance and consistency. MCI mild cognitive impairment, AUC area under curve.

Channel count also had a significant effect. Systems with 1–3 channels showed lower accuracy than both 4–8 channel (P = 0.01) and 9–20 channel configurations (P < 0.001) (Fig. 8C). Regression analysis indicated that 1–3 channel systems had markedly reduced odds of achieving high performance (OR = 0.12, 95% CI 0.03–0.40) relative to the 4–8 channel reference group (Supplementary Table 17). No significant difference was observed between 4–8 and 9–20 channel systems (AUC P = 0.57), suggesting a performance plateau. In univariate analyses, both multi-domain EEG protocols (Fig. 8A) and multi-domain features (Fig. 8B) were associated with higher accuracy (P < 0.001 and P = 0.03, respectively). However, in the multivariate logistic regression model, the effect of multi-domain protocols was not statistically significant (OR = 0.91, 95% CI 0.25–3.30; Supplementary Table 17). These findings indicate that electrode placement and channel count are the most robust independent predictors of classifier performance.

Performance enhancement through multimodal feature integration

We examined two key strategies reported in the included studies for improving MCI classification: multimodal feature fusion and ensemble learning.

Eight studies integrated EEG with additional data modalities. Our analysis showed that multimodal classifiers (e.g., EEG combined with other features) achieved significantly higher accuracy than EEG-only models (Fig. 9B; P = 0.0025). Intra-study comparisons corroborated this trend, with multimodal fusion consistently outperforming EEG-only baselines (Fig. 10A). Reported accuracy improvements ranged from 5.3 to 36.8% (Supplementary Fig. 9). The fused features included physiological signals (e.g., HRV, EDA), motor-control parameters (e.g., handwriting, gait), and cognitive or clinical measures (e.g., VR task performance, eye tracking, digital cognitive markers).

Fig. 9: Comparative analysis of classifier architectures and feature integration strategies for EEG-based mild cognitive impairment detection.

Violin plots showing performance distributions. A Ensemble classifiers superior to individual methods (P = 0.005). B Multi-domain features outperform single-domain EEG (P = 0.0025). Both approaches enhance accuracy and consistency. MCI mild cognitive impairment.

Fig. 10: Comparative analysis of classification performance for MCI detection models using wearable EEG.

Radar charts comparing classification approaches across studies. Distance from center indicates performance (circles: 25%, 50%, 75%, 100%). Colors distinguish methodological approaches. A Unimodal versus multimodal features (8 studies, 2 rows): Studies systematically compared EEG-only features against multimodal integration incorporating physiological signals (HRV), behavioral measures (handwriting, eye-tracking), cognitive assessments, or multi-paradigm EEG. Multimodal fusion consistently achieved 5–15 percentage point improvements across studies. B Individual versus ensemble classifiers (12 implementations, 4 studies): Systematic comparison of individual machine learning algorithms (blue/cyan/green) against ensemble soft voting methods (red/orange). Ensemble approaches demonstrated consistent superiority with balanced performance across evaluation criteria. Multimodal integration and ensemble learning both substantially improve MCI classification, with combined approaches optimizing detection systems. MCI mild cognitive impairment, HRV heart rate variability, ERP event-related potential.

Four studies directly compared individual classifiers with ensemble methods (e.g., majority voting) (Fig. 10B). Our results demonstrated that ensemble models provided a statistically significant increase in accuracy over single models (Fig. 9A; P = 0.005). Although absolute gains were smaller than those from multimodal fusion, improvements were consistent across all studies, ranging from 1.7 to 7.7% (Supplementary Fig. 8). Case-level analyses further highlighted the positive impact of ensemble strategies.
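
The sketch below illustrates a soft-voting ensemble of the kind compared against single classifiers in these studies, implemented with scikit-learn on placeholder data; it demonstrates the strategy rather than any specific published model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder EEG feature matrix and MCI/HC labels.
rng = np.random.default_rng(3)
X = rng.standard_normal((80, 15))
y = rng.integers(0, 2, size=80)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
rf = RandomForestClassifier(n_estimators=200, random_state=0)
lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Soft voting averages predicted class probabilities across the base models.
ensemble = VotingClassifier(estimators=[("svm", svm), ("rf", rf), ("lr", lr)],
                            voting="soft")

single_acc = cross_val_score(svm, X, y, cv=5).mean()
ensemble_acc = cross_val_score(ensemble, X, y, cv=5).mean()
```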

Risk of bias assessment

Our analysis showed low risk of bias in participant selection in only 23.8% (5/21) of studies, with low participant applicability concerns in 61.9% (13/21) and high concerns in 33.3% (7/21) (Supplementary Figs. 1 and 2). The index test domain showed low risk of bias in 61.9% (13/21) of studies, with all studies (100%) demonstrating low index test applicability concerns. For reference standards, 71.4% (15/21) showed low risk of bias, and 95.2% (20/21) had low applicability concerns. Overall, only 9.5% (2/21) of studies exhibited low risk of bias across all domains.

Discussion

To our knowledge, this is the first systematic review examining wearable EEG devices for MCI, identifying 21 relevant studies investigating their detection capabilities. Our findings suggest that wearable EEG technologies hold significant promise as tools for identifying MCI. However, MCI detection accuracy varied considerably across these studies, ranging from 0.46 to 0.95. We identified substantial methodological heterogeneity across multiple aspects, including participant selection procedures, device specifications, EEG acquisition protocols, signal preprocessing pipelines, feature extraction methodologies, and classification algorithms. These findings underscore the urgent need for standardization of technical infrastructure, data processing workflows, study protocols, and reporting guidelines to advance the practical implementation of wearable EEG technologies for MCI detection.

Challenges of wearable EEG system and data flow

Device differences

This study highlights substantial heterogeneity among wearable EEG devices currently used in MCI research (Fig. 11). A major limitation lies in the lack of transparency regarding sensor components, technical specifications, and reported variables across different devices. This issue significantly hinders the reproducibility and comparability of findings across studies. Other systematic reviews of wearable EEG devices have identified similar problems with current technology54,55. The substantial price range of these devices, from $200 to $6289, also raises concerns about equity in research and clinical accessibility. Furthermore, most existing wearable EEG devices were not initially designed to detect cognitive impairment in the elderly. For example, while the MUSE 2 has been validated in many studies, it is primarily used for stress management and meditation56. Our analysis showed heterogeneity in electrode configurations, with prefrontal and frontal regions most frequently monitored. This aligns with evidence of early frontal involvement in MCI pathophysiology, but the low representation of occipital areas may limit detection of early visual processing changes linked to cognitive decline. Our findings indicate that the majority of wearable EEG devices utilize dry electrode systems, which enhance out-of-laboratory usability by eliminating the need for conductive gel. However, dry electrodes may compromise signal quality to some extent compared to traditional wet electrodes57. Additionally, most wearable EEG devices feature fixed electrode placement designs. While this presents a limitation in adapting to diverse individual skull morphologies, potentially reducing signal quality, it also offers the advantage of minimizing inter-operator variability, particularly when non-clinical staff are involved in electrode positioning. Regarding medical certification, only about one-third of the wearable EEG devices in our review have CE certification, raising concerns about their standardization and reliability for practical applications. Concurrently, to meet practical demands, these devices are trending toward miniaturization and simplification. However, this may limit their ability to extract the nuanced biomarkers needed for precise MCI research.

Fig. 11: Challenges in wearable EEG system and data flow for MCI detection.

Six key challenges: device heterogeneity, protocol inconsistency, signal quality issues, biomarker gaps, algorithmic limitations, and validation weaknesses. Issues span hardware specifications, recording standardization, artifact management, feature engineering, machine learning interpretability, and statistical rigor. Comprehensive solutions needed for clinical deployment. MCI mild cognitive impairment.

A lack of consensus for signal acquisition protocols

The heterogeneity in recording protocols observed in our systematic review presents a significant challenge for standardizing the use of wearable EEG devices in MCI detection. Our research identified three primary categories of protocols: task-based, resting-state (rsEEG), and mixed approaches. This distribution underscores an ongoing debate and a lack of consensus regarding the most sensitive and reliable signal acquisition strategies for MCI detection. While rsEEG protocols are often preferred by elderly populations due to their reduced participant burden58, they are also prone to factors such as uncontrolled mind-wandering and fluctuations in mental states immediately before recording59. These issues can negatively impact the test-retest reliability of the extracted EEG markers. Conversely, task-based paradigms, particularly those assessing multiple cognitive domains simultaneously, may provide more sensitive biomarkers by directly evaluating cognitive functions affected in MCI. However, inherent inter-participant variability during task execution can introduce noise, thereby compromising classification accuracy. For example, the significant differences in cognitive tasks used across studies, including variations in stimulus presentation frequency, task complexity, and duration, can substantially alter EEG results. This makes direct comparisons and the validation of biomarkers across different research settings exceedingly challenging. In the absence of standardized task batteries specifically validated for MCI screening with wearable EEG, the field struggles with issues related to reproducibility and the establishment of robust, generalizable MCI-specific neural signatures.

Signal quality and artifact contamination issues

Wearable EEG devices offer flexibility for use in less controlled environments. However, this advantage is coupled with heightened susceptibility to external interferences, which can introduce artifacts into EEG recordings60. Mitigating these contaminants remains a critical challenge in the practical application of wearable EEG systems. Our systematic review revealed that many included studies did not report their artifact removal strategies or provide details on temporal segmentation. Among those that specified epoch durations, few offered a rationale for their selection. ICA with manual inspection was the most commonly employed artifact removal technique across the reviewed studies, aligning with conventional EEG research practices61. While this semi-automated approach retains the benefit of expert judgment, it also introduces operator dependency and limits scalability. This limitation can be mitigated by leveraging ICA component classification tools such as ICLabel, which facilitate automated identification of artifact components and reduce reliance on manual inspection62. Temporal windowing and segmentation present additional methodological challenges. Short windows may fail to capture sufficient neural information, while overly long windows can obscure transient but clinically significant events through temporal averaging. We found in our review that studies using wearable EEG to assess MCI showed considerable variation in epoch durations. Research indicates that the optimal epoch length depends on the specific task33. For instance, shorter epochs are more suitable for memory tasks with precise stimuli, whereas longer epochs are preferable for resting-state or emotion processing analyses63. Additionally, the alignment of these epochs, especially when synchronized with stimulus onset or offset, introduces significant variability, particularly if the durations of the stimuli differ substantially64. This heterogeneity in epoch duration and alignment complicates cross-study comparability and hinders standardized methodology development. Therefore, the establishment of transparent, standardized signal processing pipelines is essential to ensure data integrity and reproducibility in wearable EEG research aimed at detecting MCI.
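
As an example of such automation, the sketch below uses the mne-icalabel package to label ICA components and exclude those classified as likely artifacts. The recording file, filter settings, and exclusion rule are illustrative assumptions and would need adaptation to a specific device and montage.

```python
import mne
from mne_icalabel import label_components

# Hypothetical wearable recording; ICLabel expects 1–100 Hz filtering and an average reference.
raw = mne.io.read_raw_fif("wearable_eeg_raw.fif", preload=True)   # placeholder file
raw.filter(l_freq=1.0, h_freq=100.0).set_eeg_reference("average")

# Extended infomax ICA is the decomposition recommended for ICLabel.
ica = mne.preprocessing.ICA(method="infomax", fit_params=dict(extended=True),
                            random_state=0)
ica.fit(raw)

labels = label_components(raw, ica, method="iclabel")
# Exclude components not classified as brain activity or "other".
ica.exclude = [idx for idx, lab in enumerate(labels["labels"])
               if lab not in ("brain", "other")]
raw_clean = ica.apply(raw.copy())
```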

Neurophysiological biomarker complexity and validation challenges

Current efforts in wearable EEG-based MCI detection predominantly focus on biomarkers reflecting EEG slowing and reduced signal complexity, whereas markers capturing functional connectivity alterations remain comparatively underexplored. Recent studies, however, have begun to adopt more integrative approaches to characterize brain network disorganization. Rutkowski et al.33 applied Ordinal Partition Network (OPN) analysis and reported markedly fewer network nodes and edges in MCI, indicating reduced microstate complexity and impaired network integration, consistent with early synaptic loss and hub degradation. Meghdadi et al.51 observed pronounced θ–α power increases across temporal regions, suggesting maladaptive hypersynchronization associated with cortical disconnection. Similarly, Rutkowski et al.45 used TDA to demonstrate higher persistence entropy but fewer topological cycles in MCI, reflecting a simplified functional architecture and diminished integration.

Despite recent progress, several promising EEG biomarker domains remain largely unexplored in wearable EEG research. Cross-frequency coupling (CFC), particularly PAC between low- and high-frequency oscillations such as theta–gamma and delta–gamma, has demonstrated strong associations with cognitive decline and disease progression in laboratory EEG studies65,66. However, these complex interactions are rarely examined in wearable EEG due to computational constraints and vulnerability to motion artifacts. Graph-theoretical network metrics, such as global efficiency, small-worldness, and nodal centrality, consistently reveal disrupted network organization in MCI and Alzheimer’s disease when derived from high-density EEG or magnetoencephalography (MEG) data67. Yet, adapting these measures to the low-channel configurations typical of wearable devices remains challenging. Similarly, phase-based synchrony measures, including phase-locking value and coherence, effectively capture fronto-parietal and default-mode network decoupling associated with cognitive impairment68,69. Nonetheless, their application in wearable EEG studies remains limited.

Beyond analytical gaps, technological and methodological constraints further limit the robustness of current evidence. Most MCI-related EEG biomarkers have been developed using clinical-grade systems under controlled conditions. In contrast, wearable EEG devices often suffer from restricted electrode coverage, greater motion artifacts, and lower signal fidelity due to variable electrode–skin contact and environmental noise60,70. Furthermore, most studies are small-scale and cross-sectional, restricting reproducibility and the capacity to capture longitudinal trajectories of cognitive decline.

These challenges highlight the need for future research to incorporate multi-dimensional neural descriptors that integrate cross-frequency interactions, network topology, and dynamic phase synchrony to better characterize system-level reorganization in MCI. Addressing these challenges will require interdisciplinary collaboration and the development of innovative methodological solutions.

Algorithmic barriers and computational constraints

Our review identified eleven classification algorithms across wearable EEG studies, with SVM being the most frequently employed classifier. SVM demonstrates superior performance through kernel-based transformations that map features to higher-dimensional spaces, effectively capturing subtle non-linear EEG alterations characteristic of MCI71. However, SVM implementation faces substantial challenges, including intensive parameter optimization requirements and high computational demands that compromise real-time processing capabilities essential for wearable applications72. In addition to SVMs, traditional machine learning algorithms such as RF and LR are frequently employed for MCI classification. Although these methods have demonstrated utility, they are prone to overfitting specific datasets, which compromises their generalizability73,74. This limitation is particularly problematic in real-world scenarios characterized by high inter-individual variability. As a result, the robustness of these classifiers is often called into question, underscoring the need for the development of more generalizable models capable of maintaining consistent performance across diverse populations75. Deep learning methods, particularly feedforward neural networks, hold substantial potential for MCI detection but remain underutilized. The primary limitation is data scarcity: most studies (17/21, 81.0%) included in this review had small sample sizes insufficient to train generalizable deep neural networks. Heterogeneity in electrode configurations, sampling rates, and preprocessing across wearable EEG devices further hinders cross-study data integration. While interpretability issues of “black-box” models can be alleviated using explainable artificial intelligence (XAI) techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM), SHapley Additive exPlanations (SHAP), or attention mechanisms76,77, addressing dataset scale and device standardization is essential for clinical translation. Future efforts should focus on establishing large, multi-center EEG databases and unified acquisition protocols to enable effective deep learning applications in MCI detection.
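
As an illustration of one such XAI approach, the sketch below applies SHAP to a random-forest classifier trained on a placeholder EEG feature matrix; the feature names and data are hypothetical, and the output format varies with the SHAP version.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Hypothetical EEG feature matrix and MCI/HC labels; feature names are illustrative only.
rng = np.random.default_rng(4)
feature_names = ([f"theta_rel_power_ch{i}" for i in range(4)]
                 + [f"alpha_rel_power_ch{i}" for i in range(4)])
X = rng.standard_normal((80, 8))
y = rng.integers(0, 2, size=80)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X)
# Depending on the SHAP version, the result is a list of per-class arrays or a single
# (n_samples, n_features, n_classes) array; either way it attributes each prediction
# to individual EEG features, supporting feature-level interpretation.
```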

Issues in model validation and performance evaluation

Our systematic review found methodological limitations in validation strategies, especially the use of 10-fold cross-validation with small sample sizes (<100 participants). These small cohorts reduce statistical power, resulting in unreliable estimates and limiting generalizability to larger, diverse populations and clinical settings78. A critical issue is the frequent underreporting of comprehensive performance metrics. The majority of studies relied solely on accuracy as a performance measure, which fails to adequately capture model performance, particularly in imbalanced datasets where healthy controls often outnumber MCI cases79,80. The omission of essential complementary metrics, such as F1 scores, AUC, sensitivity, and specificity, results in an incomplete and potentially misleading assessment of model efficacy. This oversight not only hinders the ability to evaluate the true clinical utility of these models but also complicates meaningful comparisons across studies.

Determinants of superior classification performance

Our systematic review revealed substantial variability in the classification performance of wearable EEG-based systems for MCI detection, with reported accuracies ranging from 46 to 95%. Based on the criteria described in the “Methods” section, eleven studies were identified as demonstrating superior classification performance. By examining the shared characteristics of these high-performing studies, including EEG system configurations and data processing pipelines, and integrating insights from quantitative analyses, we identified several key factors associated with improved classification accuracy (Fig. 12).

Fig. 12: Key design features optimizing diagnostic performance, portability, and affordability for wearable EEG-based MCI screening.

Five-stage pipeline using moderate-channel wearable EEG (4–8 channels) with multi-domain tasks, multimodal integration, adaptive preprocessing, and ensemble classification. Balances high accuracy with portability and affordability, enabling population-level screening. MCI mild cognitive impairment, EEG electroencephalography.

Moderate channel density

Among the twelve high-performing studies identified, 66.7% (8/12) employed wearable EEG devices with four or more channels. Comparative analysis across channel configurations revealed that systems using 4–8 channels achieved significantly better classification performance than those with 1–3 channels (Supplementary Table 17 and Fig. 8C). However, no significant improvement was observed when compared to devices with nine or more channels, suggesting a performance plateau beyond this range. Marginal effect analysis of channel number against accuracy and AUC identified an optimal performance peak at approximately eight channels, with additional electrodes providing diminishing returns (Supplementary Fig. 5). Similarly, Lee et al.38 reported that classification accuracy increased with electrode count but reached saturation around eight channels, indicating limited benefit from further expansion. Although increasing the number of electrodes can enhance performance to some extent, practical considerations must also be addressed. Our analysis of mobility and system quality scores across channel configurations showed that 4–8 channel setups achieve the best balance between portability and signal integrity (Supplementary Fig. 4). Moreover, moderate-channel systems are generally more cost-effective.

Taken together, these findings suggest that wearable EEG devices with a moderate number of channels (~4–8) provide an optimal trade-off between diagnostic accuracy, portability, and affordability. For broader real-world adoption and accessibility, future development should prioritize mid-density wearable EEG systems, while further large-scale cost–benefit studies are warranted to refine the optimal channel configuration for MCI detection.

Frontal and parietal electrode placement

Our analysis indicated that studies achieving superior MCI classification performance predominantly positioned electrodes over the frontal and parietal cortices, regions critically involved in attention, executive control, and working memory, which are among the earliest functions affected in MCI. Analysis of electrode usage frequency across wearable EEG studies (Fig. 5 and Supplementary Fig. 3) revealed that 60.9% of high-performing studies included at least one frontal electrode, whereas 47.8% incorporated at least one parietal electrode. Quantitative comparisons (Supplementary Table 17 and Fig. 8D) demonstrated that fronto-parietal configurations yielded significantly higher classification performance relative to alternative configurations (P < 0.05). Consistently, Lee et al.38 and Chen et al.42 directly compared electrode placement strategies, reporting that channels contributing most strongly to MCI discrimination were concentrated in frontal and parietal regions. Jiang et al.49 similarly observed that electrodes exhibiting significant group differences between MCI and healthy controls were primarily distributed across prefrontal and parietal cortices. These findings likely reflect the pivotal role of fronto-parietal networks in cognitive control and memory encoding, with MCI-associated disruptions manifesting as reduced alpha synchronization and altered theta–gamma coupling, impairing top-down attentional modulation and working memory integration81,82. Collectively, these results emphasize that fronto-parietal electrode placements not only capture the most discriminative neural signatures for MCI detection but also align with established neurophysiological models of early cognitive decline.

Multi-domain cognitive task protocols

Our review found that studies with superior MCI classification performance typically used paradigms from multiple cognitive domains, including attention, working memory, and visuospatial abilities. These protocols often included oddball paradigms33,43,45, memory-related tasks46, the CERAD battery34,40, and handwriting motor tasks37. A study by Borhani et al.29 found that during rest, higher alpha and beta activity in the right parietal region correlated with lower memory retrieval accuracy. During tasks, better working memory performance was associated with increased delta and theta power in the left parietal region. Seo et al. further showed that visuospatial reproduction and working memory assessments contribute to early MCI detection83. Previous research has shown better MCI detection performance during memory tasks than during eyes-closed resting state84. Additionally, several MCI-related EEG features, such as ERPs, can only be extracted during specific cognitive paradigms because they are not present during resting states85. This evidence suggests that collecting EEG signals during cognitive task performance offers advantages for MCI detection. However, task-based protocols require more participant cooperation and consume more time and energy than resting-state data collection. Therefore, designing simple and comprehensive task paradigms covering key cognitive domains is crucial for improving wearable EEG detection accuracy while maintaining user-friendliness for elderly participants.

Adaptive signal preprocessing

Given the inherent signal quality limitations of wearable EEG devices and substantial heterogeneity in current preprocessing methodologies, standardized automated pipelines with predefined quality thresholds are essential for cross-device comparability in MCI classification.

Frequency-domain filtering is particularly critical, as MCI is consistently associated with characteristic spectral alterations. In resting-state EEG, elevated theta power constitutes the most robust biomarker51,86, whereas task-related paradigms, such as memory encoding or attention tasks, often reveal discriminative alpha-band changes48, underscoring the need to align preprocessing strategies with cognitive state. Bandpass selection should further reflect the complexity of the extracted features. Advanced nonlinear metrics, including MFDFA, TDA, and MSE, benefit from a narrower 1–30 Hz filter combined with preliminary denoising via EMD or wavelet-based methods, which preserve the multiscale temporal structure critical for these analyses87,88. Conventional time- and frequency-domain features, in contrast, are typically analyzed using a broader 1–45 Hz filter, preserving both low-frequency oscillations and higher-frequency cognitive correlates while attenuating muscle artifacts (>45 Hz) and slow drifts (<1 Hz); artifact removal generally combines ICA with visual inspection, representing the current gold standard89. When focusing on specific frequency bands for classification, such as theta or alpha, subsequent band-specific filtering after broadband preprocessing can further enhance signal-to-noise ratio and feature discriminability.
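A hedged sketch of such a pipeline is given below using MNE-Python: broadband 1–45 Hz filtering, ICA-based artifact removal, and optional band-specific filtering. The channel montage, sampling rate, and simulated signal are assumptions for illustration only.

```python
# Sketch of broadband preprocessing for a moderate-channel wearable EEG recording:
# 1-45 Hz band-pass filtering, ICA-based artifact removal, then an optional
# theta-band (4-8 Hz) filter for band-specific analyses.
import numpy as np
import mne

sfreq, n_channels, n_samples = 250.0, 8, 250 * 120        # 2 min of 8-channel data
ch_names = ["Fp1", "Fp2", "F3", "F4", "P3", "P4", "O1", "O2"]
info = mne.create_info(ch_names, sfreq, ch_types="eeg")
raw = mne.io.RawArray(np.random.randn(n_channels, n_samples) * 1e-5, info)

raw.filter(l_freq=1.0, h_freq=45.0)                        # broadband filter
ica = mne.preprocessing.ICA(n_components=8, random_state=0)
ica.fit(raw)
# In practice, artifactual components (eye blinks, muscle) are identified by
# visual inspection or automated detectors before exclusion.
ica.exclude = []                                            # placeholder: none excluded here
raw_clean = ica.apply(raw.copy())

theta = raw_clean.copy().filter(l_freq=4.0, h_freq=8.0)    # band-specific filtering
```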

Multi-domain feature extraction

We further examined which EEG feature domains most strongly contributed to the superior classification performance observed across studies. Several studies directly compared spectral features across frequency bands and consistently reported that theta (4–8 Hz) activity provided the highest discriminatory power for MCI detection. This finding is consistent with established evidence that cortical slowing, characterized by increased theta power, represents a core pathophysiological hallmark of MCI, reflecting underlying cholinergic dysfunction and early neurodegenerative processes90,91. In contrast, task-based paradigms revealed different optimal profiles. Xue et al.48 reported that beta-band (13–30 Hz) features achieved superior performance in task-related recordings. These state-dependent differences highlight the importance of aligning frequency-band selection with the recording paradigm.
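For illustration, relative theta power, the resting-state marker highlighted above, can be estimated from a Welch power spectral density as in the sketch below; the sampling rate and simulated signal are placeholders.

```python
# Relative theta (4-8 Hz) power for one channel, normalized to 1-45 Hz power.
import numpy as np
from scipy.signal import welch

sfreq = 250.0
signal = np.random.randn(int(sfreq) * 60)                  # 1 min of one-channel EEG

freqs, psd = welch(signal, fs=sfreq, nperseg=int(sfreq) * 2)

def band_power(fmin, fmax):
    mask = (freqs >= fmin) & (freqs < fmax)
    return np.trapz(psd[mask], freqs[mask])

relative_theta = band_power(4, 8) / band_power(1, 45)
print(f"relative theta power: {relative_theta:.3f}")
```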

When categorized into connectivity, complexity, and EEG-slowing domains, connectivity-based features consistently yielded the highest classification accuracy (Supplementary Fig. 8). This superiority supports the network disruption hypothesis of MCI, which posits that cognitive decline arises from impaired functional integration across distributed brain networks rather than isolated regional dysfunction92,93. Connectivity measures effectively capture the loss of large-scale neuronal coordination characteristic of early Alzheimer’s pathology, particularly within the default mode and frontoparietal networks, where functional disintegration precedes substantial structural atrophy67,94.
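One widely used connectivity measure of this kind is the phase-locking value (PLV); the sketch below estimates an alpha-band PLV between two channels via band-pass filtering and the Hilbert transform, using simulated signals as placeholders.

```python
# Alpha-band (8-13 Hz) phase-locking value between two simulated channels.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

sfreq = 250.0
x = np.random.randn(int(sfreq) * 30)                        # channel 1
y = np.random.randn(int(sfreq) * 30)                        # channel 2

b, a = butter(4, [8 / (sfreq / 2), 13 / (sfreq / 2)], btype="band")
phase_x = np.angle(hilbert(filtfilt(b, a, x)))
phase_y = np.angle(hilbert(filtfilt(b, a, y)))

plv = np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))     # 1 = perfect phase locking
print(f"alpha-band PLV: {plv:.3f}")
```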

Our analysis further revealed that studies incorporating multi-domain feature combinations, integrating spectral, entropy-based, and network-derived metrics, achieved substantially higher classification accuracy compared with those relying on single-domain features (Fig. 8B). Such integration provides a more comprehensive representation of MCI-related neurophysiology by encompassing both linear and non-linear dynamics. The combined use of frequency-domain features (neural oscillatory abnormalities), connectivity measures (network disruptions), and complexity metrics (information-processing efficiency) offers a holistic characterization of MCI’s multifaceted pathophysiology95,96.
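Conceptually, such integration amounts to concatenating per-subject descriptors from each domain into a single feature vector, as in the sketch below, where spectral entropy stands in for richer complexity measures such as MSE and the numerical values are placeholders.

```python
# Multi-domain feature integration: spectral, complexity, and connectivity
# descriptors are concatenated into one feature vector per subject.
import numpy as np
from scipy.signal import welch

def spectral_entropy(sig, sfreq):
    _, psd = welch(sig, fs=sfreq, nperseg=int(sfreq) * 2)
    p = psd / psd.sum()                                     # normalize PSD to a distribution
    return -np.sum(p * np.log2(p + 1e-12))

sfreq = 250.0
sig = np.random.randn(int(sfreq) * 60)

spectral = np.array([0.18, 0.27, 0.35, 0.20])               # relative delta/theta/alpha/beta power
complexity = np.array([spectral_entropy(sig, sfreq)])
connectivity = np.array([0.42])                             # e.g., an alpha-band PLV

feature_vector = np.concatenate([spectral, complexity, connectivity])
print(feature_vector)                                        # one multi-domain vector per subject
```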

Overall, these findings indicate that multi-domain feature integration, particularly the combination of connectivity and complexity measures with band-limited spectral power, provides the most robust and biologically informed framework for distinguishing MCI from healthy ageing.

Ensemble classifiers

Our review revealed that the application of ensemble classifiers substantially enhanced the accuracy of MCI detection across multiple studies, with reported improvements ranging from 5.3 to 36.8%34,35,36,37. Individual ML classifiers may produce different results on the same dataset because of differences in their underlying assumptions, their capacity to learn local versus global patterns, and their vulnerability to overfitting. The integration of ensemble methods, particularly voting strategies combining predictions from multiple classifiers, consistently enhanced classification outcomes and offers a practical approach to address inter-individual variability in neurophysiological patterns34,35,36,37. Additionally, since EEG signals are collected under different cognitive tasks with varying contributions to MCI detection, weighting each task in proportion to its share of the overall classification accuracy can better reflect the influence of different cognitive tasks on MCI classification performance35.
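The following sketch illustrates one such voting strategy with a soft-voting ensemble over heterogeneous base classifiers; the feature matrix, labels, and weights are placeholders rather than configurations reported by the included studies.

```python
# Soft-voting ensemble combining SVM, random forest, and logistic regression.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 12))               # placeholder per-subject EEG features
y = rng.integers(0, 2, size=80)             # 0 = HC, 1 = MCI

ensemble = VotingClassifier(
    estimators=[("svm", SVC(kernel="rbf", probability=True)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft",                # average predicted probabilities
    weights=[2, 1, 1],            # e.g., weight members (or tasks) by validation accuracy
)
print(cross_val_score(ensemble, X, y, cv=5, scoring="accuracy").mean())
```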

Multimodal integration

Our review further found that substantial improvements in MCI classification performance (1.7–7.7%) were achieved through multimodal integration. Most studies combining EEG data with additional physiological signals (heart rate variability34, electrodermal activity36), motor parameters (handwriting kinematics37, gait metrics41), or cognitive performance indicators35 outperformed unimodal EEG approaches. These findings suggest that wearable EEG devices achieve optimal detection capability when embedded within comprehensive digital phenotyping frameworks rather than deployed as standalone screening tools.
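A simple form of such integration is feature-level fusion, sketched below by concatenating EEG-derived features with features from a second, hypothetical wearable modality (HRV) before classification; all data are simulated placeholders.

```python
# Feature-level fusion: EEG features concatenated with HRV features before classification.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 80
eeg_features = rng.normal(size=(n, 10))     # e.g., band powers, PLV, entropy
hrv_features = rng.normal(size=(n, 4))      # e.g., SDNN, RMSSD, LF/HF ratio (hypothetical)
y = rng.integers(0, 2, size=n)

clf = LogisticRegression(max_iter=1000)
unimodal = cross_val_score(clf, eeg_features, y, cv=5).mean()
multimodal = cross_val_score(clf, np.hstack([eeg_features, hrv_features]), y, cv=5).mean()
print(f"EEG only: {unimodal:.2f}  EEG + HRV: {multimodal:.2f}")
```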

Recommendations for future research

This review underscores substantial methodological variability in wearable EEG-based MCI detection studies, emphasizing the need for a unified framework to enhance implementation and generalizability (Fig. 13).

Fig. 13: Conceptual framework for future study designs in wearable EEG-based MCI detection.

This figure outlines a comprehensive approach to advancing wearable EEG-based MCI detection, encompassing crucial aspects from participant recruitment and diagnostic standardization to EEG data processing, model validation, and implementation guidelines.

Community-based recruitment and population representativeness

Future research should focus on community-based recruitment to improve ecological validity and generalizability. The predominance of laboratory-based settings in reviewed studies constrains real-world applicability and limits the translational potential for wearable EEG-based MCI detection34,35,36,37,43,44,45. Across included studies, sample sizes have varied considerably (13–336 participants), with studies reporting the highest classification accuracies (>90%) frequently utilizing smaller samples30,33,34,40,43,45,46, suggesting potential overfitting and limited external validity. Future investigations should use adequately powered study designs with formal sample size calculations, enrolling a sufficient number of participants per diagnostic group to ensure reliable biomarker validation, considering effect sizes, study design, and expected attrition rate. Rigorous demographic matching between MCI and control groups across age, sex, education, and socioeconomic factors is essential to minimize confounding and enhance diagnostic specificity. Community-based recruitment via primary care networks, senior centers, and population registries enhances participant diversity and ecological representativeness while reducing selection bias from clinic-based sampling.
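As an example of the formal sample-size calculation recommended above, the sketch below estimates the required number of participants per group for a two-group comparison under an assumed medium effect size, significance level, power, and attrition rate; these values are illustrative assumptions, not recommendations derived from the review.

```python
# A priori sample-size estimate for a two-group (MCI vs. HC) comparison,
# inflated for expected attrition.
from math import ceil
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5,   # Cohen's d (assumed)
                                          alpha=0.05,
                                          power=0.80,
                                          ratio=1.0)
attrition = 0.15                                              # assumed drop-out rate
n_enrolled = ceil(n_per_group / (1 - attrition))
print(f"analysable n per group: {ceil(n_per_group)}, enrol per group: {n_enrolled}")
```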

Standardization of MCI diagnostic frameworks

Standardizing MCI diagnostic frameworks is essential for cross-study comparability and clinical translation. The use of MoCA screening with inconsistent cutoff thresholds (≤25 to <26) creates diagnostic inconsistency33,35,36,37,43,44,45,46, hindering meta-analysis and biomarker validation. The lack of confirmatory biomarker validation in reviewed studies, including neuroimaging, cerebrospinal fluid analysis, and positron emission tomography, is a fundamental limitation for establishing robust diagnostic criteria. Future research should use consensus-driven diagnostic algorithms that combine cognitive assessment with biomarker confirmation, following frameworks like the National Institute on Aging–Alzheimer’s Association (NIA-AA) guidelines97. Standardized protocols incorporating neuroimaging evidence of neurodegeneration, tau pathology, or amyloid burden would improve diagnostic accuracy, enable inter-study comparisons, and support regulatory approval for clinical use.

Device usability optimization and technical standardization

The heterogeneity in device specifications requires comprehensive technical standards for wearable EEG devices used in cognitive assessment. Implementation of mandatory medical device certification (CE marking, FDA clearance) should become prerequisite for research applications, ensuring signal quality benchmarks and safety compliance. A standardized reporting framework should encompass device specifications, signal quality metrics, and performance validation protocols. This framework should mandate transparent documentation of sampling rates, electrode impedance characteristics, signal-to-noise ratios, and artifact rejection capabilities to enable meaningful cross-study comparisons.

Standardized acquisition paradigms

The heterogeneity in recording protocols can be addressed by developing standardized paradigms validated for wearable EEG-based MCI detection. Virtual reality-based protocols represent a promising solution, as demonstrated by Wu et al.’s immersive language assessment environments that enhance ecological validity while maintaining experimental control35. These approaches address the fundamental limitation of traditional laboratory recordings by creating realistic assessment contexts that better reflect real-world cognitive demands98. However, challenges including cybersickness (dizziness, nausea, disorientation), age-related difficulties with interface controls, and technology acceptance barriers must be carefully addressed, particularly as MCI patients may experience more severe initial discomfort than cognitively normal older adults99. Simplified controls, structured training, and gradual exposure may help mitigate these limitations100. Future studies should adopt multi-domain assessment batteries that systematically evaluate memory, attention, and executive functions within standardized timeframes. The integration of oddball paradigms, memory-related tasks, and naturalistic cognitive challenges within VR environments can provide comprehensive neurophysiological profiles while ensuring participant engagement and protocol adherence.

Harmonization of data processing pipeline

Methodological standardization and signal processing

Given the inherent signal quality limitations of wearable EEG devices and substantial heterogeneity in current preprocessing methodologies, standardized automated processing pipelines with predefined quality thresholds are critical for cross-device comparability in MCI classification. Neurophysiologically-informed frequency-domain filtering targeting canonical spectral bands (delta, theta, alpha, beta, gamma) is essential, as MCI exhibits characteristic spectral alterations, including increased theta activity and reduced alpha/beta power.

Feature selection and biomarker validation

Feature extraction should prioritize neurobiologically-grounded markers reflecting established MCI pathophysiology, including spectral slowing101,102, reduced signal complexity103, and disrupted inter-regional connectivity104,105. Rigorous cross-validation with gold-standard MCI biomarkers, including amyloid/tau PET imaging, CSF markers (Aβ42, p-tau), and MRI-derived structural/functional metrics, is essential to confirm the neurobiological validity of wearable EEG-derived features.

Deep learning integration and interpretability

Recent advancements in end-to-end deep learning models, such as Convolutional EEG Encoder–Decoder Network (CEEDNet) and Gated Recurrent Unit (GRU)-based frameworks, capitalize on the temporal properties of EEG signals without the constraints of signal length106. These developments offer a promising avenue for improving the accuracy of wearable EEG-based MCI detection. Despite the “black box” nature of deep learning, emerging explainable AI methods are enhancing model interpretability. Techniques like LIME (Local Interpretable Model-agnostic Explanations) and gradient-based attribution methods are facilitating the identification of neurobiologically relevant EEG features, in alignment with established cognitive neuroscience principles107. Such advancements enhance the interpretability and clinical applicability of deep learning models, enabling reliable and transparent detection of MCI with wearable EEG devices.
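To indicate the kind of recurrent architecture referenced above, the sketch below defines a generic bidirectional GRU classifier over raw EEG epochs; it is not a re-implementation of CEEDNet or any specific published framework, and all dimensions are assumptions.

```python
# Generic bidirectional GRU classifier over raw EEG epochs (illustrative only).
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, n_channels=8, hidden=64, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(input_size=n_channels, hidden_size=hidden,
                          num_layers=2, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                     # x: (batch, time, channels)
        _, h = self.gru(x)                    # h: (num_layers * 2, batch, hidden)
        last = torch.cat([h[-2], h[-1]], dim=1)   # final forward/backward hidden states
        return self.head(last)

model = GRUClassifier()
epoch = torch.randn(4, 500, 8)                # 4 epochs of 2 s at 250 Hz, 8 channels
logits = model(epoch)                          # (4, 2) class scores
print(logits.shape)
```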

Data infrastructure and validation standards

Large-scale, rigorously-annotated wearable EEG datasets with standardized performance benchmarks are essential for field advancement. Comprehensive methodological documentation should include machine learning model selection rationale, hyperparameter optimization protocols, neurophysiologically-justified feature selection criteria, and robust validation frameworks employing nested cross-validation or independent holdout datasets to ensure generalizability and prevent overfitting, thereby enhancing transparency and reproducibility across studies.
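The nested cross-validation mentioned above can be sketched as follows: hyperparameters are tuned in an inner loop while generalization is estimated on held-out outer folds; the data, model, and parameter grid are placeholders.

```python
# Nested cross-validation: inner loop tunes hyperparameters, outer loop estimates
# generalization, avoiding the optimistic bias of tuning and testing on the same folds.
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 15))
y = rng.integers(0, 2, size=100)

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

tuned = GridSearchCV(SVC(kernel="rbf"),
                     {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
                     cv=inner, scoring="roc_auc")
nested_auc = cross_val_score(tuned, X, y, cv=outer, scoring="roc_auc")
print(f"nested CV AUC: {nested_auc.mean():.2f} ± {nested_auc.std():.2f}")
```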

External validation and cost-effectiveness analysis

Another major challenge is the limited external validation of wearable EEG devices in real-world settings. Laboratory studies offer insights into feasibility, but performance in clinical environments remains largely unexplored. We recommend future studies prioritize external validation with diverse, independent cohorts reflecting real-world patient populations. Furthermore, the adoption of wearable EEG devices for MCI detection should be evaluated from a cost-effectiveness perspective. Cost-effectiveness analyses are crucial to determine the feasibility of integrating these devices into clinical practice. These analyses should consider direct costs (e.g., purchase and maintenance) and indirect costs (e.g., training and logistics management). Incorporating such analyses in future research will help assess the economic and practical viability of using wearable EEG devices.

Comprehensive reporting guidelines

Adherence to comprehensive reporting guidelines represents a critical step toward enhancing methodological transparency and research reproducibility. These guidelines should establish minimum reporting standards encompassing all aspects of wearable EEG-based MCI detection studies, from participant recruitment strategies and diagnostic criteria through signal processing methodologies and statistical analysis approaches108. Standardized reporting frameworks should mandate detailed documentation of device specifications, recording protocols, preprocessing pipelines, feature extraction methodologies, and validation strategies. Implementation of structured reporting templates, analogous to CONSORT guidelines for clinical trials, would facilitate systematic review and meta-analysis while improving study quality and comparability109. These guidelines should also require transparent reporting of negative results and methodological limitations to prevent publication bias and support evidence-based clinical translation.

Our review has several limitations. First, the rapid evolution of wearable EEG technologies means that recent developments may not be fully captured despite our comprehensive search across six databases. Second, methodological heterogeneity existed among included studies, with some reporting only accuracy ranges or not featuring accuracy as a primary outcome, potentially affecting our tabulated results and subsequent interpretations. Finally, publication bias favoring positive findings may have inflated the apparent diagnostic utility of wearable EEG systems for MCI detection.

In conclusion, our systematic review identified key characteristics associated with superior MCI detection performance using wearable EEG devices: 4–8 channel configurations, strategic frontal-parietal electrode placement, multi-domain cognitive protocols, advanced preprocessing techniques, comprehensive feature extraction, ensemble classification methods, and multimodal data integration. These findings provide essential guidance for optimizing wearable EEG usability and enhancing MCI detection accuracy. To address current methodological limitations, we recommend that future research prioritize: (1) standardizing MCI diagnostic frameworks; (2) increasing sample diversity; (3) optimizing device usability and technical specifications; (4) standardizing recording protocols; (5) harmonizing data processing pipelines; (6) validating performance in real-world settings; (7) assessing cost-effectiveness; and (8) implementing comprehensive reporting guidelines. These insights advance the translational potential of wearable EEG-based MCI detection, establishing a foundation for scalable, user-friendly systems that could transform cognitive health monitoring and early intervention from specialized clinical settings to community and primary care environments.

Methods

The systematic review adhered to the PRISMA Guidelines for preferred reporting110 (Supplementary Table 18) and is registered with PROSPERO (CRD42025637128).

Search strategy

We conducted a comprehensive search across IEEE, Web of Science, PsycINFO, PubMed, EMBASE, and CINAHL through September 20, 2025, targeting studies using wearable devices for MCI identification. The search strategy combined broad search terms and Medical Subject Headings (MeSH), grouped into four categories: terms related to portable devices (e.g., wireless, mobile, consumer), terms related to electroencephalography (e.g., Electroencephalography, electroencephalogram, EEG), terms related to mild cognitive impairment (e.g., Cognitive Dysfunction, Mild Neurocognitive Disorder, Mild Cognitive Impairment), and terms related to detection (e.g., predict, diagnosis, distinguish). Boolean operators (AND, OR) were applied to refine and combine these terms. The detailed search strategy is provided in Supplementary Table 1. Additionally, we manually screened reference lists of relevant studies and systematic reviews to identify any eligible studies missed during database searching.

Eligibility criteria

Studies were included if they: (1) employed wearable EEG devices, defined as miniaturized, portable, wireless, and user-friendly head-mounted systems capable of recording brain activity in naturalistic or daily-life environments, without restrictions on electrode configuration or material; (2) enrolled participants with a clinical diagnosis of MCI and age-matched HC; (3) reported EEG-derived metrics or classification performance for MCI detection, providing at least one quantitative indicator (e.g., accuracy, sensitivity, specificity, F1-score, or AUC) in a binary MCI vs. HC design; and (4) were original, peer-reviewed articles published in English.

Studies were excluded if they: (1) used wired, invasive, or semi-invasive EEG systems (e.g., intracranial, subdural, or electrocorticographic electrodes), or devices described by manufacturers or study protocols as intended solely for experimental or hospital-based diagnostic use rather than for real-world applications; (2) employed recording setups that required participant immobilization or continuous connection to stationary hardware (e.g., bedside amplifiers, rack-mounted consoles, or wired signal hubs); (3) focused on therapeutic or interventional outcomes, conducted multi-class classification (e.g., MCI vs. AD vs. HC), or primarily investigated dementia subtypes without isolating MCI; or (4) were duplicate publications, reviews, preprints, conference abstracts, editorials, commentaries, or non-English articles. Detailed inclusion and exclusion criteria are provided in Supplementary Table 2.

Study selection

Study selection followed a two-stage process conducted by two independent reviewers (C.C.H. and X.R.Y.). First, duplicates were removed using EndNote software, and titles and abstracts were screened blindly against eligibility criteria. Discrepancies were resolved by consensus or advanced to full-text evaluation. Subsequently, full-text articles were retrieved and independently assessed, with disagreements adjudicated by a third reviewer (N.J.). Reference lists of included studies were manually screened to identify additional relevant publications. The selection process is shown in the PRISMA flow diagram (Fig. 1).

Data extraction

Two independent reviewers (C.C.H. and X.R.Y.) extracted data using predetermined forms. Data included: demographics, MCI diagnostic criteria, participant characteristics, EEG acquisition parameters (tasks, electrode specifications, device features), signal processing methods, and MCI detection performance metrics. Discrepancies were resolved by a third investigator.

Statistical analysis

Given the significant heterogeneity among studies using wearable EEG for MCI detection, we employed a narrative synthesis framework to analyze the entire wearable EEG system and data flow for detecting MCI.

Our comprehensive evaluation framework examined three key dimensions of wearable EEG systems: device fundamentals, signal acquisition, and physical and mobility characteristics (Fig. 2). To systematically and objectively evaluate the physical and mobility dimension of wearable EEG devices, we additionally applied the CoME scheme52 as a complementary assessment tool. The CoME framework is a validated and standardized method designed to quantify system specifications and device mobility. In this review, we specifically used two CoME parameters most relevant to assessing these characteristics: CoME S (System Specifications) and CoME D (Device Mobility). The detailed scoring criteria for these two parameters are provided in the Supplementary Tables 19 and 20.

Our analysis examined key components of the detection pipeline: hardware specifications, electrode configurations, signal acquisition protocols, preprocessing methods, feature extraction approaches, classification algorithms, and validation strategies. To identify the features of wearable EEG studies that achieved superior performance in detecting MCI, studies were classified as demonstrating superior classification performance when their reported accuracy or AUC was ≥80%. Because reported superior classification performance may not always reflect methodological robustness or adequate validation, a modified version of the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to assess the risk of bias and applicability. Studies were included only when fewer than three domains were rated as high risk. Those with two high-risk domains and three or more domains rated as unclear were excluded. Potential determinants of classification performance were then examined based on the shared characteristics of high-performing studies. Factors included electrode placement, number of EEG channels, extracted EEG features, integration of EEG with other modalities, and the use of ensemble classification methods. Group differences were analyzed using the Mann-Whitney U test, which is suitable for small samples and non-normally distributed data. Finally, multivariable binary logistic regression analyses were conducted to evaluate the independent associations of these factors with classification performance, with the dependent variable defined as superior versus non-superior performance according to the above criteria. All data analyses were conducted in R (version 4.5; R Foundation for Statistical Computing, Vienna, Austria). A two-sided P < 0.05 was considered statistically significant.
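A minimal Python analogue of the described analyses is sketched below (the review itself used R): a Mann-Whitney U test comparing classification accuracies between configuration groups, followed by a multivariable logistic regression on superior versus non-superior performance; all values are illustrative placeholders, not data extracted from the included studies.

```python
# Mann-Whitney U test plus multivariable logistic regression on study-level factors.
import pandas as pd
from scipy.stats import mannwhitneyu
import statsmodels.api as sm

acc_low_channel = [0.62, 0.70, 0.68, 0.74, 0.66]            # placeholder: 1-3 channel studies
acc_mid_channel = [0.81, 0.85, 0.79, 0.90, 0.88]            # placeholder: 4-8 channel studies
print(mannwhitneyu(acc_mid_channel, acc_low_channel, alternative="two-sided"))

df = pd.DataFrame({
    "superior":      [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0],  # superior vs. non-superior
    "channels_4to8": [1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1],
    "multimodal":    [0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
})
model = sm.Logit(df["superior"], sm.add_constant(df[["channels_4to8", "multimodal"]]))
print(model.fit(disp=0).summary())
```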

Quality assessment

The authors (X.R.Y. and Y.H.Z.) employed a modified version of the QUADAS-2 tool to assess the risk of bias and applicability concerns in all included studies (Supplementary Table 5). This adapted tool has been previously validated for evaluating wearable AI in depression detection111.