Introduction

As the body’s central organ, the brain plays a pivotal role in regulating most of the physical and mental processes that underlie human behavior. The information encoded in brain waves has therefore garnered significant attention and holds substantial promise. Electroencephalography1 stands out among brain imaging methods for its relatively high temporal resolution and cost effectiveness2. Thus, electroencephalograms (EEGs) are used in a wide variety of fields, including the study of cognitive functions, healthcare, and mental state monitoring3.

Typically, EEGs are measured in a controlled environment to minimize the effects of external factors, such as noise, electromagnetic interference, and light4,5. The subject’s skin is prepared before electrodes are attached to reduce impedance6. High-frequency sampling rates are recommended for obtaining quality data, and wired connections are preferred for stable recording. However, wired connections between the device and computer present challenges in real-life applications because they restrict the subject’s mobility and convenience, limiting the usability of EEG devices outside the laboratory7.

Researchers and EEG device manufacturers have been actively working to overcome these issues and enhance EEG devices’ universal usability8,9. They have developed wireless devices to facilitate unrestricted movement, as well as gel-free sensors that enable users to quickly and comfortably wear the device. These commercial EEG devices are widely used in various fields, such as the brain-computer interface (BCI) field and clinical applications10. However, there are concerns about the signal quality of these devices.

Previous studies have reported the evaluation of EEG devices and the validation of their signal quality through diverse approaches. Some studies assessed consumer-grade EEG devices’ usability11, whereas others evaluated temporal delays and signal distortion through experiments12. Cognitive tasks commonly used in BCI research—such as the oddball task, 0-back task, and stop-signal task—have also been used to examine whether characteristic EEG features, such as quantitative electroencephalography (QEEG) or event-related potential (ERP) components, appear as expected12,13. Resting-state EEGs have been used to evaluate device performance by examining whether characteristic EEG features are reliably observed13,14. Some studies assessed the detectability of large signals, such as eye blinks9,15. Although various evaluation methods have been explored, no studies have compared practical dry wireless commercial devices that have four or fewer electrodes. In addition, basic-level comparisons spanning multiple consumer-grade and research-grade EEG devices have not been reported.

Our work compares the signal quality and user experience of wireless gel-free consumer-grade EEG devices, proposing a validation paradigm specifically designed for this purpose. We selected four widely used consumer-grade EEG devices and one research-grade device and conducted a comparative evaluation. The paradigm assesses the devices’ ability to detect signals progressively, starting from large non-neural physiological artefact signals and moving to smaller brain wave signals. Additionally, it evaluates each device’s performance in noisy environments to ensure reliable signal acquisition under real-world conditions. This approach enables a comprehensive comparison of whether consumer-grade EEG devices can accurately measure EEG signals and how they perform in various challenging scenarios.

Methods and materials

Level of signal quality

Signal quality can be checked at various levels. For example, many researchers have directly tested EEG devices’ applicability in BCI applications16,17. On the other hand, one recent study18 introduced an EEG phantom test that uses a rubber model, a function generator, and an oscilloscope to check signal quality. The procedures in these studies are reasonable in the context of brain signal validity; however, they do not provide the granularity required to assess the specific signal quality that each EEG device captures. The testing procedure needs to be segmented to evaluate device-level signal quality more accurately. EEG device sensors typically measure electrical potentials on the scalp; however, the measurement never reads zero micro-volts because the device always records a residual analog signal19,20. Thus, experimenters usually set EEG electrode impedance below a certain level and monitor for artefactual signals generated from known actions such as eye blinking and jaw clenching21. Once these are confirmed, brain wave acquisition proceeds according to a predefined experimental protocol. Another important consideration in EEG usage is its high sensitivity to noise from movement or environmental changes. Because most consumer-grade EEG devices are designed for user convenience, they are less stable and relatively more sensitive to physical movement22,23,24. Thus, noise robustness is one of the important features of a usable EEG device25,26.
Considering these points, three criteria seem important and straightforward for evaluating consumer-grade EEG devices: the quality of “signal detection,” “brain wave detection,” and “noise robustness.” “Signal detection” refers to the detection of non-neural physiological signals (e.g., eye blinks, jaw clenching) that exhibit greater amplitude than brain waves, whereas “brain wave detection” refers to the detection of characteristic features inherent in brain waves, such as changes in brain rhythms (e.g., alpha, beta, and gamma).

Summarizing the above issues, we introduce an evaluation framework for checking the signal quality of consumer-grade EEG devices, as shown in Table 1. The first level of evaluation should assess basic functionality, ensuring that EEG devices can reliably detect significant fluctuations in scalp potentials, even when they do not originate from brain wave activity. As commonly practiced, artefactual signals generated from known movements such as eye blinking, jaw clenching, and head tilting15 serve as effective benchmarks for confirming EEG devices’ sensitivity to these signal fluctuations.

The second level of evaluation pertains to brain wave activity. Given that the scalp electrical potential is not zero micro-volts even in the absence of brain wave information, it is important to confirm signal variations resulting from neural processes or specific brain rhythm changes. Conventional paradigms such as motor imagery27, ERPs28, and mental arithmetic tasks29 are commonly used to elicit measurable brain responses. In addition, power shifts in the alpha rhythm—called the “Berger effect”—offer a simple yet effective means of validating this level. The primary goal of this level of evaluation is to confirm the device’s capacity for measuring brain activity.

The central question for the last level of evaluation is whether the device is robust against various sources of noise. The assessment methods at this stage should evaluate the EEG device’s robustness against noise from physical movement, environmental changes, and prolonged recording durations. A simple method for this evaluation involves comparing EEG characteristics between normal and noisy situations. For example, a subject may be instructed to remain in a relaxed state, perform physical movements, and then relax again. The resting-state EEGs recorded before and after the movement task can then be compared to see whether they show similar spectral patterns.

Table 1 Evaluation framework of signal quality for consumer-grade EEG devices. For each level, the main question, the evaluation, and an exemplary methodology and experiment are described.

Evaluation study

EEG devices

In this study, we evaluated four consumer-grade EEG devices that are widely used for research and application development, as well as one research-grade device, DSI-24, for comparison. All consumer-grade and research-grade EEG devices used in this study are dry-electrode systems. These devices are shown in Fig. 1.

Fig. 1

EEG devices evaluated in this study. (A) BrainLink Pro, (B) NeuroNicle Fx2, (C) Mindwave Mobile2, (D) Muse2, and (E) DSI-24.

BrainLink Pro (BLP) is a product that Macrotellect, Inc. released in 2018. It has a single channel at Fp1 and one reference electrode on the left ear (Fig. 1-A). The maximum sampling rate is 512 Hz.

NeuroNicle FX2 (FX2) is a product by Laxtha, Inc. It has two electrodes (named EEG1 and EEG2) on the left and right frontal areas (Fig. 1-B), analogous to Fp1 and Fp2. It includes a reference electrode on the left ear and has a maximum sampling rate of 250 Hz.

Mindwave Mobile2 (MW2), which NeuroSky released in 2018, also has a single channel at Fp1 and a reference electrode (Fig. 1-C), with a maximum sampling rate of 512 Hz.

Muse2 is a product that InteraXon, Inc. released in 2018. It has a total of four channels, AF7, AF8, TP9, and TP10 (Fig. 1-D), with a maximum sampling rate of 256 Hz.

DSI-24, a product from Wearable Sensing, Inc., has 21 electrodes and can be used both wired and wirelessly (Fig. 1-E), with a maximum sampling rate of 300 Hz. It is suitable for a wide range of studies30 and has been applied in various BCI paradigms, including P300-based spellers31, motor imagery32, and QEEG33, consistently demonstrating the intended outcomes within these paradigms.

Subjects

A total of 30 subjects (16 females and 14 males, aged 19–27 years, with a mean age of 23.2) participated in the study. The study received approval from the Institutional Review Board (IRB) of Handong Global University (No. [2023-HGUR026]). The experiments were conducted with the subjects’ full understanding and their written consent. All procedures were performed in accordance with the relevant guidelines and regulations.

Experiment

The experiment was designed to verify whether consumer-grade EEG devices can measure brain waves. The composition of the experimental paradigm is shown in Fig. 2. First, we verified whether the devices could detect artefactual signals, which show relatively larger amplitudes than brain waves. Subsequently, to evaluate the validity of the measured EEG signals, we verified whether the devices could detect alpha power changes and the alpha peak frequency. Finally, we verified the devices’ sensitivity to the wearer’s movements, as consumer-grade EEG devices are intended for use not only in controlled environments but also in everyday public situations where movements are likely to occur during use. For checking the devices’ signal quality, we followed the levels described in Table 1.

Fig. 2

Experimental procedure. The experiments were designed to address the level 1 to level 3 settings. Each session consists of three blocks (pre-rest, task, and post-rest). For precise timing of the task, a beep sound was provided during the task block.

For level 1, “signal detection,” we used artefactual signals generated from eye blinking and mandibular contraction (jaw clenching). These actions produce significantly larger amplitudes than brain waves, making them important indicators of artefactual signals in the recordings34. Assessing eye blinking and jaw clenching as initial steps allowed us to verify whether an EEG device was functioning correctly and could accurately capture signal variations on the scalp. The eye-blinking and jaw-clenching paradigm proceeded as follows (see Fig. 2). A one-minute resting-state EEG (pre-rest) was recorded while the subject remained in a comfortable state. The subjects then performed 20 physical movements (eye blinks or jaw clenches), one at the sound of a beep every three seconds, within one minute. After this task, the resting-state EEG (post-rest) was recorded for one minute in a comfortable state. Except for the eye-blinking task, signals were recorded in the eyes-closed condition. During the jaw-clenching task, the action was performed by lightly biting a coffee straw with the teeth; we monitored the movement of the straw to ensure that the subjects performed the task correctly.

For level 2, “brain wave detection,” alpha shifts between the eyes-open and eyes-closed conditions were used. For level 3, “noise robustness,” head movement was chosen because it is a relatively light movement that a subject can easily perform, and because head movement is likely to occur in normal situations involving an EEG headset, possibly influencing signal quality35,36. Thus, we incorporated eyes-open/closed conditions and head movement into the experiment. The head movement paradigm proceeded as follows. A pre-resting-state EEG was measured for one minute in a comfortable state. After that, the subject moved their head from left to right or from right to left at each beep, presented at three-second intervals, for one minute. Sub-beep sounds at one-second intervals were played to prevent the subject from moving the neck too quickly and to encourage an even speed. The movement was performed 20 times in one minute. Afterward, the post-resting-state EEG was measured for one minute while the subject remained in a comfortable state. This procedure was performed in both the eyes-open and eyes-closed states.

The experiment consisted of a total of five sessions, with each session using one randomly assigned device. However, the comparison device, DSI-24, was used in Session 3. Each session included four paradigms: eye blinking, jaw clenching, head movements to the left and right with eyes open, and head movements to the left and right with eyes closed.

All paradigms were conducted for a total of three minutes, consisting of a one-minute pre-rest period, during which an EEG was measured in a resting state before the task; a one-minute task performance; and a one-minute post-rest period, during which an EEG was measured in a resting state after the task (Fig. 2). Participants performed the task at the sound of a beep that played every three seconds. Each experiment required approximately 2.5 h, including EEG setup, user evaluation, and the completion of four paradigms across the five devices.

Questionnaire-based evaluation

A questionnaire-based study was also conducted using an adapted version of the System Usability Scale (SUS)37 to assess user evaluation. At the end of each session, after participants had completed all four paradigms with a single EEG device, they were asked to rate the device on a scale from 1 to 10 across five dimensions, including comfort while wearing, willingness to wear again, design, and familiarity with the device. At the end of the experiment, the participants were asked to rank the five devices by difficulty of wearing and by preference. The survey was conducted in Korean (the participants’ native language), and Table 2 provides a translated version of the survey questions.

Table 2 Questionnaire items for user evaluation.

Data analysis

Data preprocessing

For data analysis, we used the EEG signals obtained from the electrode positioned over the left forehead: the Fp1 channel for BLP, MW2, and DSI-24; AF7 for Muse2; and EEG1 for FX2. For Muse2, bipolar re-referencing was performed by referencing AF7 to TP10.

Not all of the data were usable due to heavy noise, so some data were excluded from the analysis. The number of subjects excluded for each device is given in Table 3. Subject exclusion was categorized into two cases: Case 1, where the eye-blinking or jaw-clenching paradigm could not be used, and Case 2, where the head movement paradigm could not be used. Due to Case 1, three subjects were excluded for Muse2. Due to Case 2, five subjects were excluded for Muse2, and one subject was excluded for each of the other four devices. In Case 2, the data from subject S26, measured with all devices, were excluded.

For the non-neural physiological artefact signal evaluation, the one-minute data obtained during the tasks (Fig. 2-A, B) were used. The full one-minute pre-rest data (Fig. 2-C, D) were used for the EEG signal validity evaluation, whereas both the pre-rest and post-rest one-minute data (Fig. 2-D, E) were used for the EEG signal motion sensitivity evaluation. Data analysis was conducted in MATLAB (MathWorks Inc., R2022b) using the EEGLAB library (version 2022.1).

Table 3 Number of excluded subjects. Subjects were excluded when the raw data contained excessive noise or unstable electrode contact.

Evaluation of signal detection

Figure 3 shows an exemplary plot of the EEG recorded during the eye-blinking and jaw-clenching tasks. We manually counted the peaks stemming from blinking and jaw clenching to verify how well these non-neural physiological artefact signals were detected in the recordings. The time series data for each task were visualized and reviewed by three experimenters to ensure accuracy.

Fig. 3

EEG signal from all five devices, recorded during the eye-blinking and jaw-clenching tasks of subject 03. No preprocessing is applied to the four consumer-grade devices (BrainLink Pro, NeuroNicle Fx2, MindWave Mobile2, and Muse2). For DSI-24 only, a 3-Hz high-pass filter is applied to suppress the prominent low-frequency drift observed in this subject. Normalization is not applied.

Evaluation of brain wave detection

Figure 4 shows an exemplary EEG recording obtained during resting conditions with eyes closed and eyes open. We utilized the alpha rhythm shifts (known as the Berger effect) and the alpha peak frequency to evaluate the capacity of brain wave detection. The Berger effect refers to the phenomenon in which visual stimuli, such as opening the eyes, suppress or reduce stable alpha waves38,39. Thus, an increase in alpha power occurs when the eyes are closed compared with when they are open. Alpha peak frequency denotes the frequency of the highest alpha peak, which varies individually40. These two features were computed as follows.

Fig. 4

EEG signal from all five devices, recorded during the eyes closed and eyes open resting state of subject 03. Signals from the four consumer-grade devices are displayed without any preprocessing. The DSI-24 data are additionally high-pass filtered at 3 Hz to remove strong low-frequency components. No normalization is applied.

First, the Berger effect index was obtained by dividing the alpha power of the eyes-closed condition by the alpha power of the eyes-open condition, i.e., the ratio of alpha powers between the two conditions. The Berger effect index is expected to be higher than 1, as the eyes-closed condition produces a larger alpha rhythm amplitude. To calculate this index without bias, we removed the 1/f trend from each power spectral density (PSD) by fitting the trend (using the Curve Fitting Toolbox in MATLAB), and the alpha power was computed by summing the powers within the 8–13 Hz frequency range.
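
The analysis here was performed in MATLAB; the following is a minimal Python sketch of the same computation. The Welch parameters, the 1–40 Hz fitting range, and the log-log linear fit used to approximate the 1/f trend are illustrative choices, not the exact procedure used in this study.

```python
import numpy as np
from scipy.signal import welch

def berger_index(eeg_closed, eeg_open, fs, alpha_band=(8.0, 13.0)):
    """Ratio of 1/f-detrended alpha power: eyes-closed over eyes-open."""
    def detrended_alpha_power(x):
        freqs, psd = welch(x, fs=fs, nperseg=int(2 * fs))
        # Fit the 1/f background as a line in log-log space over 1-40 Hz,
        # excluding the alpha band so the alpha peak does not bias the fit.
        in_alpha = (freqs >= alpha_band[0]) & (freqs <= alpha_band[1])
        fit = (freqs >= 1) & (freqs <= 40) & ~in_alpha
        coef = np.polyfit(np.log10(freqs[fit]), np.log10(psd[fit]), 1)
        trend = 10 ** np.polyval(coef, np.log10(freqs[1:]))  # skip 0 Hz
        residual = psd[1:] - trend
        return residual[in_alpha[1:]].sum()  # summed alpha-band power
    return detrended_alpha_power(eeg_closed) / detrended_alpha_power(eeg_open)
```

With a clear alpha rhythm in the eyes-closed recording, the function returns a value above 1, matching the expected direction of the Berger effect.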

Second, we picked the frequency of the alpha peak within 6–15 Hz, because the alpha peak can appear in a slightly wider range than the traditional alpha band (8–13 Hz)40. The experimenter manually identified and determined the peak, referring to open-source code41 to calculate the alpha peak frequency. Figure 5 shows exemplary power spectral densities for the two conditions. As shown, alpha power is higher in the eyes-closed condition, and the peak frequency at 10.3 Hz is observable.
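
In this study the peak was identified manually with reference to open-source code41; a simplified automated version of the same search can be sketched as follows, with illustrative Welch settings.

```python
import numpy as np
from scipy.signal import welch

def alpha_peak_frequency(eeg, fs, search_band=(6.0, 15.0)):
    """Return the frequency of the largest PSD value in the search band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))
    band = (freqs >= search_band[0]) & (freqs <= search_band[1])
    return freqs[band][np.argmax(psd[band])]
```

Note that an automated argmax can latch onto noise when no clear alpha peak exists, which is one reason manual inspection was used here.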

Fig. 5

Representative power spectral densities of two conditions (eyes open/closed) from subject 04 across all five devices. No filtering or amplitude normalization is applied. PSD values are expressed in µV²/Hz. Alpha peaks are clearly observable in eyes-closed condition at 9.2, 9.5, 9.2, 9.0, and 9.2 Hz, respectively.

Evaluation of noise robustness

We compared the pre-/post-movement data to evaluate the device’s sensitivity to movement. Based on the assumption that devices may exhibit varying levels of robustness against artefactual movements, we hypothesized that differences would manifest in the EEG spectral patterns. Thus, the power spectral density was calculated, and Pearson correlation analysis was conducted to determine the similarity between the two time points.
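
This similarity measure can be sketched as follows; the 1–45 Hz comparison band and the Welch parameters are assumptions for illustration, not the exact settings used in this study.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import pearsonr

def psd_similarity(eeg_pre, eeg_post, fs, band=(1.0, 45.0)):
    """Pearson correlation between pre- and post-movement PSDs."""
    def band_psd(x):
        freqs, psd = welch(x, fs=fs, nperseg=int(2 * fs))
        mask = (freqs >= band[0]) & (freqs <= band[1])
        return psd[mask]
    r, _ = pearsonr(band_psd(eeg_pre), band_psd(eeg_post))
    return r
```

A correlation near 1 indicates that the spectral pattern recovered after the movement task, i.e., the device was robust to the intervening motion.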

Statistical analysis

Repeated measures analysis of variance (ANOVA) was employed to compare differences between devices, accounting for within-subject variability across multiple measurements. This approach was chosen because repeated measurements were collected from each subject across the different devices, allowing a more accurate assessment of device differences while controlling for individual variability. We additionally used the Wilcoxon signed-rank test when normality was not satisfied.
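
The pairwise post-hoc comparison with Bonferroni correction can be sketched as follows using SciPy's Wilcoxon signed-rank test (the repeated measures ANOVA itself is omitted here); the input layout is an illustrative assumption.

```python
from itertools import combinations

from scipy.stats import wilcoxon

def pairwise_wilcoxon(scores):
    """Bonferroni-corrected pairwise Wilcoxon signed-rank tests.

    `scores` maps a device name to a list of per-subject values; the
    lists must be paired (same subjects, same order). Returns
    {(device_a, device_b): Bonferroni-adjusted p-value}."""
    pairs = list(combinations(scores, 2))
    adjusted = {}
    for a, b in pairs:
        _, p = wilcoxon(scores[a], scores[b])
        adjusted[(a, b)] = min(p * len(pairs), 1.0)  # Bonferroni correction
    return adjusted
```

Multiplying each raw p-value by the number of comparisons (capped at 1) is the standard Bonferroni adjustment used for the post-hoc tests reported below.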

Results

Evaluation of signal detection

Eye blinking and jaw clenching were performed to evaluate the detection of non-neural physiological artefact signals. Each participant performed each task 20 times per device, and we confirmed that all devices showed counts of 20, or close to 20 (FX2 and Muse2), on average (Table 4). This confirms that all five devices can detect non-neural physiological artefact signals.

Table 4 Counts of eye blinking (EB) and jaw clenching (JC). Data excluded from this analysis due to heavy noise are marked with “-”. Note that the maximum number of events is 20 per task.

Evaluation of brain wave detection

We checked the Berger effect index, which is the ratio between the alpha powers of the eyes-closed and eyes-open conditions. Figure 6 shows the results. The average values of the index are 4.023 (BLP), 5.442 (FX2), 4.284 (MW2), 3.779 (Muse2), and 7.962 (DSI-24). Repeated measures ANOVA across the devices yielded a p-value less than 0.001, indicating a statistically significant difference. A post-hoc pairwise comparison with Bonferroni correction revealed significant differences between device 5 and devices 1, 2, 3, and 4 (adjusted p < 0.05). All devices showed a reasonable score beyond index = 1, although some subjects (shown as dots) fell below the y = 1 line (black dashed horizontal line in Fig. 6). This result indicates that the tested devices are capable of detecting brain wave shifts (here, the alpha rhythm).

Fig. 6

Berger effect index across devices. The Berger effect index is the ratio between the alpha powers of the eyes-closed and eyes-open conditions. Dots denote individual subjects’ recordings, and the black dashed horizontal line marks y = 1 for reference.

Secondly, we picked the individual alpha peak frequency and compared it with the value obtained from the DSI-24 (research-grade device). Table 5 shows the results, and Table 6 presents the average difference in alpha peak frequency between each device and the DSI-24. We observed average differences of Δf = 0.24 Hz (BLP), 0.26 Hz (FX2), 0.20 Hz (MW2), and 0.32 Hz (Muse2). A repeated measures ANOVA revealed no statistically significant difference between them (p > 0.05).

Table 5 Individual alpha peak frequency (Hz). Data excluded from this analysis due to heavy noise are marked with “-”.
Table 6 The average difference of alpha peak frequencies (Hz) between each device and DSI-24.

Evaluation of noise robustness

Movement sensitivity was evaluated through a correlation analysis of the power spectral densities of the two EEG recordings from pre- and post-movement. The results are shown in Fig. 7. The correlation coefficients are r = 0.94 (DSI-24), 0.95 (BLP), 0.94 (MW2), and 0.91 (FX2), with a relatively lower value observed for Muse2 (0.89). Repeated measures ANOVA across the devices resulted in a p-value of 0.038, indicating a statistically significant difference. A post-hoc pairwise comparison using the Wilcoxon signed-rank test with Bonferroni correction revealed significant differences between device 1 and device 4, and between device 3 and device 4 (adjusted p < 0.05).

Fig. 7

Results of the correlation analysis. The white dots represent the average correlation coefficients, whereas colored dots denote individual samples. The estimated distribution is also presented as a violin plot.

User evaluation

Figure 8 presents the user evaluation survey results. The research-grade device, DSI-24, scored lower than the four consumer-grade EEG devices. The average score across the five categories was 7.24 (BLP), 7.11 (FX2), 7.67 (MW2), 7.11 (Muse2), and 4.15 (DSI-24). MW2 received the highest scores across all five question items. Notably, the level of comfort while wearing the DSI-24 was markedly lower, at 3.33, compared with the consumer-grade EEG devices’ average score of 7.9. Repeated measures ANOVA was performed across the devices per question item, and the p-values for all question items were less than 0.05. Post-hoc pairwise comparisons with Bonferroni correction showed that the DSI-24 differed significantly from the four other devices across all questions (all p < 0.001), whereas no significant differences were observed among the four consumer-grade devices. The only exception was a significant difference on the design-related question between MW2, which received the highest score among the consumer-grade devices, and BLP, which received the lowest.

Fig. 8

User evaluation results. Participants rated comfort, usability, and perceived stability of each device using a 10-point Likert scale (1–10). Each bar represents the average value.

In the survey responses regarding the maximum wearable duration shown in Table 7, at least 66.6% of subjects responded that they could wear the consumer-grade EEG devices for 60 min or more, whereas only 33.3% of subjects responded similarly for the DSI-24. Notably, whereas at most 10% of respondents reported being able to wear a consumer-grade EEG device for less than 30 min, 36.7% reported this for the DSI-24, indicating difficulties with prolonged wear.

In the post-experiment survey, 29 out of 30 participants indicated that the DSI-24 was the most difficult device to wear. Additionally, 24 out of 30 participants reported it as the most uncomfortable device to wear. When asked about their preferred devices, nine participants chose BLP, four chose FX2, seven chose MW2, six chose Muse2, and four chose DSI-24. The reasons for choosing a preferred device included comfort during wear, stability while wearing, fit, absence of a reference, and weight.

Table 7 Maximum wearable duration per equipment (number of respondents = 30). The number of respondents is presented in each pair of device and time duration.

Discussion

An experimental paradigm was proposed to verify whether consumer-grade EEG devices accurately measure EEG signals and to assess data quality. First, the devices’ ability to detect artifacts with relatively large amplitudes was assessed to verify accurate measurement, and the validity of the recorded EEG signals was confirmed by analyzing alpha power changes and the alpha peak frequency. Furthermore, the devices’ sensitivity to movement was evaluated to check noise robustness, considering consumer-grade equipment’s diverse applications.

All devices successfully detected non-neural physiological artifacts. In this study, the artifacts observed across 20 trials were visually confirmed by the experimenter. However, when the number of repetitions increases in future sessions, an automated algorithm would be required to efficiently identify such artifacts.
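
As an illustration of such an automated approach, a simple threshold-based peak counter could look like the following; the 100 µV threshold and the one-second refractory gap are hypothetical values that would need tuning per device and artefact type.

```python
import numpy as np
from scipy.signal import find_peaks

def count_artifact_events(eeg_uv, fs, threshold_uv=100.0, min_gap_s=1.0):
    """Count large-amplitude artefact peaks (e.g., eye blinks, jaw clenches).

    threshold_uv and min_gap_s are placeholder values; real settings
    depend on the device's amplitude scale and the artefact type."""
    # Absolute value catches both positive and negative deflections;
    # `distance` keeps only one peak per refractory window.
    peaks, _ = find_peaks(np.abs(eeg_uv), height=threshold_uv,
                          distance=int(min_gap_s * fs))
    return len(peaks)
```

On a recording with 20 well-separated blink-sized deflections, such a counter would report 20 events, replacing the manual visual count used here.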

An increase in alpha power was observed in all devices when the eyes were closed. Additionally, the individual alpha peak frequency, which varies among individuals, was detected similarly across all devices within the same participant. This indicates that the EEG data measured with all devices are valid. The BLP device showed lower sensitivity to movement than the other devices, whereas Muse2 exhibited relatively higher sensitivity. These results may relate to each device’s noise level. We checked the available vendor documentation and found some relevant information: the NeuroNicle FX2 has internal noise below 0.8 µV rms42, and the DSI-24 is reported to have less than 3 µV peak-to-peak within the frequency range of 1–50 Hz43. However, noise floor values are not specified in the vendor documentation for consumer-grade systems such as MindWave Mobile2, BrainLink Pro, and Muse2. Thus, we could not interpret our results further.

The absolute amplitude varied across devices because the hardware configurations of the five systems are not identical. Differences in factors such as reference scheme, electrode characteristics, impedance behavior, and analog front-end properties (including gain and bandwidth) can influence the scale of the recorded voltage even when the same neural activity is measured. As these device-specific elements can affect amplitude independently of signal quality, the absolute magnitude should not be interpreted as a direct measure of performance. Instead, more stable indicators—such as spectral patterns, alpha peak identification, and task-related modulations—provide more meaningful bases for comparing device behavior.

During the experiments, multiple retests were conducted with the Muse2 device due to heavy noise, and some experiments were excluded when the issues could not be resolved. These problems appeared to be related to the participants’ head shapes. When the experimenter wore the device for verification, the device worked normally, but it failed to measure normally when the participant wore it again. Although Muse2 is somewhat sensitive to the participant’s head shape, all five devices seem capable of measuring brain waves (tested here by using the alpha rhythm).

In the user evaluation, a significant difference in responses was found between the consumer-grade EEG devices and the research device. Additionally, in the survey conducted after testing all five devices, the majority of respondents indicated that the DSI-24 was the most difficult and uncomfortable to wear. This suggests that the research-grade device, DSI-24, faces challenges in consumerization not only due to its high cost but also due to user discomfort and difficulties in continuous use from a usability perspective.

In terms of signal characteristics, the four consumer-grade EEG devices demonstrated generally similar performance. However, each device had its own advantages and disadvantages in terms of the number of electrodes and connection methods. In practice, when evaluating and selecting EEG devices, it is important to consider not only signal quality but also whether the device specifications are suitable for the experimental purpose. In this study, we prioritized the use of a program called OpenViBE to connect the EEG devices. OpenViBE is a free software platform widely used in EEG experiments, as it enables the use of multiple EEG devices through a unified interface and facilitates experimental design and execution44.

MW2 was easily connected via OpenViBE, enabling smooth signal acquisition and experiment execution, with no re-recordings due to connection issues. It also received the highest user evaluation scores, reflecting its relatively favorable usability and comfort. However, because it uses a single electrode at Fp1, the amount of information that can be obtained is limited. BLP, like MW2, could also be connected via OpenViBE and allowed easy signal acquisition; however, re-recordings due to connection issues occurred for 10 out of 30 participants. BLP likewise provides limited information due to having only one electrode at Fp1. Since many frontal EEG studies aim to examine asymmetry45,46, MW2 and BLP cannot meet these experimental requirements.

FX2 consists of two electrodes (around Fp1 and Fp2) but has the disadvantage of being incompatible with OpenViBE. Although it provides its own software, stimuli cannot be presented, and event markers cannot be inserted, which substantially limits its use beyond resting-state recordings. Lastly, Muse2 comprises four electrodes and supports Lab Streaming Layer (LSL) communication through external programs, allowing it to be used in OpenViBE-based experiments. However, it showed the highest number of re-recordings due to signal instability, as well as exclusions caused by severe noise, which appeared to be influenced by individual differences in head shape affecting electrode contact quality.

When choosing among several consumer-grade EEG devices, if there is no significant difference in signal quality, the device specifications may become an important factor in determining suitability for the intended experimental purpose.

Future work should include EEG device validation using paradigms widely employed in practical BCI applications. For instance, tasks such as mental arithmetic, cognitive workload assessment, and the oddball paradigm can be analyzed using a relatively small number of channels, making them promising candidates for practical applications with consumer-grade EEG devices. These tasks can be used to compare whether EEG features obtained from consumer-grade EEG devices resemble the features extracted with research-grade devices. This comparison would help clarify not only consumer-grade EEG devices’ practical applicability but also the scope and limitations of the analyses that can be performed with them.

Conclusion

In this study, we designed a conceptual evaluation framework for EEG devices comprising three concrete levels of testing and applied it to assess five EEG devices. The results demonstrated that the four consumer-grade EEG devices were capable of reliably measuring both non-neural physiological artefact signals and brain waves. Moreover, these consumer-grade devices exhibited superior usability compared with the research-grade device. Future research should further assess these devices’ suitability for BCI applications by employing well-established paradigms such as the P300 and other ERPs.