Abstract
Steady-state visual evoked potential (SSVEP) signals can be decoded by either a traditional machine learning algorithm or a deep learning network. Combining the two methods is expected to enhance the performance of an SSVEP-based brain-computer interface (BCI) by exploiting their respective advantages. However, an efficient strategy for integrating the two methods has not yet been established. To address this issue, we propose a classification framework named eTRCA + sbCNN that combines an ensemble task-related component analysis (eTRCA) algorithm and a sub-band convolutional neural network (sbCNN) for recognizing the frequency of SSVEP signals. The two models are first trained separately, then their classification score vectors are added together, and finally the frequency corresponding to the maximal summed score is taken as the frequency of the SSVEP signal. The proposed framework can effectively exploit the complementarity between the two kinds of feature signals and significantly improve the classification performance of SSVEP-based BCIs. The performance of the proposed method is validated on two SSVEP BCI datasets and compared with that of eTRCA, sbCNN and other state-of-the-art models. Experimental results indicate that the proposed method significantly outperforms the compared algorithms, and thus helps to promote the practical application of SSVEP-BCI systems.
Introduction
Brain-computer interface (BCI) technology opens the door to direct communication and control of the biological brain with peripheral devices1. Electroencephalogram (EEG) acquired on the scalp is the most commonly employed neurophysiological signal for constructing a BCI system owing to its low cost and high time resolution. So far, EEG-based BCIs have been widely applied in many fields such as neural rehabilitation, control of assistive technologies, entertainment and intraoperative awareness detection2,3.
The choice of an experimental paradigm is crucial in the study of BCI systems. Motor imagery (MI)4,5, steady-state visual evoked potential (SSVEP)6,7 and event-related potentials8,9 are widely studied and applied paradigms. In particular, SSVEP-based BCI (SSVEP-BCI) has become a hot spot in current research due to its high information transfer rate (ITR) and short training time1. SSVEP is the brain signal evoked by a repetitive visual stimulus and its maximal amplitude is located over the occipital region. SSVEP comprises the fundamental and harmonic signals of a stimulus frequency. In general, an SSVEP-BCI system includes multiple stimuli. When a user gazes at different stimuli, he/she can send varying instructions for controlling an electronic device.
EEG decoding is a core issue in EEG-based BCI systems. Due to the low signal-to-noise ratio (SNR) and high inter-trial variability of EEG signals, accurately decoding them and recognizing the stimulus frequency poses a huge challenge, limiting the real-world applications of SSVEP-BCIs. EEG signals can be decoded by either a traditional machine learning (ML) algorithm or a deep learning (DL) network. According to the requirement for training data, the decoding methods are divided into training-free and training-based ones. Compared to the latter, the former requires a long data segment to achieve an acceptable classification accuracy, and thus degrades the information transfer rate (ITR), which is a major performance metric of an SSVEP-BCI system. For example, on two typical datasets, Benchmark and BETA32,33, the training-based system achieved the highest ITRs of 250 bits/min and 186.76 bits/min respectively, whereas the highest ITRs of the training-free system were only 210.81 bits/min and 129.41 bits/min respectively29. Thereby, the existing BCI studies focus mainly on the training-based method. Traditional ML algorithms aim to boost the SNR of multi-channel EEG signals through spatial filtering. Accordingly, the emphasis is placed on the optimization of spatial filters using training data. So far, several typical spatial filtering algorithms have been proposed for SSVEP-BCIs such as minimum energy combination10, canonical correlation analysis (CCA) and its variants11,12,13,14,15,16 and task-related component analysis (TRCA)17. Among them, TRCA achieved better performance than the other algorithms and has become a benchmark method for SSVEP classification. TRCA suppresses noise in SSVEP by maximizing the inter-trial covariance. By incorporating filter bank analysis15 and an ensemble spatial filter17, the performance of TRCA can be further improved.
Recently, deep learning (DL) techniques have provided new solutions for the BCI classification task18,19,20,21,22,23,24,25,26,27,28,29. DL can automatically extract classification features from the original signals and improve classification accuracy. Feature extraction and pattern recognition are combined in a single framework to avoid the information loss caused by different objective functions in the two stages. Lawhern et al.18 proposed a compact CNN named EEGNet for EEG-based BCIs and achieved great success in four BCI paradigms. They employed depthwise and separable convolutions to create an EEG-specific network that encapsulates feature extraction concepts including spatial and temporal filtering. Subsequently, Waytowich et al.19 revised the EEGNet and applied it to an asynchronous SSVEP-BCI system. Guney et al.20 proposed a DNN structure (renamed sbCNN in this study) that processes sub-band SSVEP signals with convolutions across the sub-bands of harmonics, channels and time, and classifies them with a fully connected layer. A fine-tuning strategy was used for training the network to boost the intra-subject classification. The sbCNN network achieved the highest-ever ITRs on two benchmark SSVEP datasets. Zhang et al.21 proposed a bidirectional Siamese correlation analysis (bi-SiamCA) method for the detection of SSVEP signals. Two long short-term memory (LSTM)-based parallel subnetworks are used to extract features from the SSVEP signal and the template signals, and the similarity between the outputs of the two branches is then calculated. The experimental results on two SSVEP datasets indicate that the network can significantly improve the classification accuracy compared with prominent traditional and DL methods, especially at short data lengths. Pan et al.22 proposed an efficient SSVEP network (SSVEPNet) for frequency recognition, which is based on one-dimensional convolution and an LSTM module. Spectral normalization and label smoothing techniques are utilized to enhance network performance. Experimental results on two datasets validated the effectiveness of the network for SSVEP classification. Recently, the attention mechanism23 based on the Transformer model has attracted wide interest in the DL field and has been applied to SSVEP-BCIs24,25. Bagchi and Bathula24 proposed an EEG-ConTransformer network that incorporates multi-head self-attention modules to capture inter-region interaction patterns and convolutional filters to learn temporal patterns. Chen et al.25 proposed a Transformer-based DNN model (SSVEPformer) for SSVEP classification. They adopted the complex spectrum features of SSVEP data as the model input to enable the model to jointly explore the spectral and spatial information. The two models achieved better classification accuracies and ITRs than other baseline methods.
Although both traditional ML methods and DL methods have achieved high accuracy in decoding SSVEP signals, the classification performance of neither can fully meet the needs of practical applications. Both methods have their own strengths and limitations. The former are able to extract discriminative features based on the expertise of the researchers, but have a weak learning capability; the latter have a stronger ability to automatically represent high-level abstractions and can accurately model complex EEG signals, but require large amounts of labeled data, which are difficult to obtain in BCI research. Thereby, combining these two methods for decoding SSVEP signals is expected to improve the frequency detection accuracy by making use of their advantages.
Recently, it has been recognized that single SSVEP classification models often suffer from overfitting or underfitting problems and may have low classification performance26. Yao et al.27 proposed a model called FB-EEGNet for SSVEP frequency detection, which fuses the features of multiple neural networks, resulting in much higher classification accuracy than a single model. Meanwhile, Li et al.28 proposed a new DNN named Conv-CA that combines a DL model with a traditional ML model. It achieves higher classification accuracy and ITR by correlating the outputs of its two parallel branches, one used for the signal and the other for the reference, to decode the SSVEP. This approach of combining DL models with traditional ML models provides new ideas for the processing of EEG signals and demonstrates its feasibility. Deng et al.29 proposed a new algorithm termed TRCA-Net to increase the classification performance of SSVEP signals. It first utilizes the TRCA algorithm to create spatial filters for extracting task-related features, then rearranges the features from different filters as new multi-channel signals, and finally classifies the rearranged features with a deep CNN.
Motivated by the above-mentioned three studies, we propose a novel classification framework named eTRCA + sbCNN that combines an ensemble task-related component analysis (eTRCA)17 and a sub-band convolutional neural network (sbCNN)20 for recognizing the frequency of SSVEP signals. In principle, any traditional ML method and any DL method can be selected as the two base methods for the combination. eTRCA and sbCNN are adopted in the study because they are state-of-the-art ML and DL methods respectively. Specifically, based on sub-band filtered EEG data, the eTRCA and the sbCNN are first trained separately with training data, then the trained models are each used for classifying a single-trial testing signal, next the two classification score vectors are fused by an addition operation, and finally the frequency corresponding to the maximal summed score is taken as the stimulus frequency of the SSVEP signal. The contributions of this article are as follows:
(1) In order to fully exploit the knowledge of eTRCA and the learning ability of sbCNN, a parallel model-combining framework eTRCA + sbCNN is proposed for detecting the frequencies of SSVEP signals. A feature fusion method is further proposed for enhancing SSVEP classification by summing the two classification score vectors of a testing trial yielded by eTRCA and sbCNN;
(2) The performance of eTRCA + sbCNN is analyzed in depth and validated using two SSVEP datasets containing a total of 105 subjects and compared with that of eTRCA, sbCNN and other state-of-the-art traditional ML and DL methods in terms of classification accuracy and ITR at different lengths of data, numbers of channels and training trials. The experimental results validated the superiority of the method.
Methods
The flowchart of the eTRCA + sbCNN framework is shown in Fig. 1, which includes the components of data preprocessing, model training and testing-trial classification of eTRCA and sbCNN, addition of two classification score vectors, and frequency recognition. We will detail all these components in the following subsections except for the data preprocessing, which will be elaborated in the section Data Acquisition and Preprocessing.
Figure 1. The algorithmic flowchart of the eTRCA + sbCNN framework.
Sub-band convolutional neural network (sbCNN)
(1) Network structure and parameters. The sbCNN20 is an end-to-end system that receives temporally filtered signals with three sub-bands. The network architecture is shown in Fig. 2a. The network consists mainly of four convolutional layers and one fully connected layer: the first convolutional layer is used for sub-band combining, the second for channel combining, the third and fourth for extracting features, and the fully connected layer predicts the stimulus frequency by selecting the frequency with the highest probability returned by the final softmax function. This sbCNN has 12 layers in total, whose parameters are shown in Fig. 2b.
(2) Model training and classification. The model training and classification method for sbCNN is shown in Fig. 3. The sbCNN is trained in two stages using a transfer learning method. In the first stage, the training data from all subjects are used for training. The weights of the first layer are initialized to 1, and the weights of the other layers are drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 0.01. In addition, dropout probabilities of 0.1, 0.1, and 0.95 are applied between the second and third layers, the third and fourth layers, and the fourth and fifth layers respectively. In each iteration, the network is trained on the training batch \(\left\{ \left( x_{i} ,y_{i} \right) \right\}_{i = 1}^{D_{b}}\), where \(D_{b}\) is the number of trials in the batch, by minimizing the categorical cross-entropy loss with L2 regularization via the Adam optimizer with a learning rate of \(\nu = 0.0001\). Here, \(\lambda = 0.001\) is the constant of the L2 regularization, \(s_{i} \in [0,1]^{N_{F} \times 1}\) is the softmax output for the instance \(x_{i}\), \(N_{F}\) is the number of class labels, \(s_{i} (y_{i})\) is the \(y_{i}\)-th entry of \(s_{i}\), and \(W\) denotes the weights of all layers in the sbCNN. The final prediction is given by \(\hat{y}_{i} = \arg \max_{j} s_{i} (j)\).
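A form of the loss that is consistent with the symbols defined above is sketched below. The averaging over the batch and the use of the squared L2 norm of the weights are assumptions, since only the symbols are specified here:

\[
\mathcal{L} = - \frac{1}{D_{b}} \sum_{i = 1}^{D_{b}} \log s_{i} (y_{i}) + \lambda \left\| W \right\|_{2}^{2}
\]

The first term is the categorical cross-entropy over the batch and the second term is the L2 penalty weighted by \(\lambda\).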
Figure 2. (a) sbCNN network architecture; (b) Parameters of the sbCNN network.
Figure 3. Model training and classification method for sbCNN.
In the second stage, only the training data from the testing subject are used for training. The weights and biases of each layer are initialized with the values yielded by the first-stage training. The dropout probabilities between the second and third layers and between the third and fourth layers are changed to 0.6 for the Benchmark dataset and 0.7 for the BETA dataset because of the reduced amount of training data.
For both stages, training stops when the maximal number of epochs is reached, which is 1000 for the Benchmark dataset and 800 for the BETA dataset. For the Benchmark dataset, the batch size is 100 in the first-stage training and 200 in the second-stage training, whereas for the BETA dataset it is 100 and 120 respectively.
After the second-stage training, the subject-specific model \(net\) is obtained, which is employed for classifying the test data. For a single-trial testing signal \(\tilde{X}_{t} ,t = 1,2, \ldots ,N_{t}\), where \(N_{t}\) is the number of testing trials, the classification score vector \(s_{t}^{sbCNN} \in R^{1 \times N_{F}}\), where \(N_{F}\) is the number of classes or stimulus frequencies, and the stimulus frequency \(f_{t}^{sbCNN}\) are obtained by applying the neural network classification function of \(net\) to \(\tilde{X}_{t}\).
Ensemble task-related component analysis (eTRCA)
(1) eTRCA Algorithm: TRCA16 is one of the most popular algorithms for recognizing SSVEP signals. It creates a spatial filter by maximizing the reproducibility of task-related components. Assume that the individual training data from the nth stimulus and the kth sub-band are denoted as \(x_{n}^{k} \in R^{N_{C} \times N_{S} \times N_{T}} ,n = 1,2, \ldots ,N_{F} ,k = 1,2, \ldots ,N_{K}\), where \(N_{C}\), \(N_{S}\), \(N_{T}\), \(N_{F}\) and \(N_{K}\) are the numbers of channels, sampling points in a trial, trials for each stimulus, visual stimuli and sub-bands respectively, and let \(X_{n,i}^{k} \in R^{N_{C} \times N_{S}}\) denote the ith trial of \(x_{n}^{k}\). TRCA aims to optimize a spatial filter for each sub-band and each stimulus by maximizing the sum of inter-trial covariances after projecting the multi-channel signal onto a single channel with the spatial filter. Thereby, the objective of this algorithm is to find a spatial filter \(w_{n}^{k} \in R^{N_{C} \times 1}\) that maximizes the covariance as follows
\[
\sum_{i \ne j} \mathrm{Cov}\left( (w_{n}^{k})^{T} X_{n,i}^{k} ,\; (w_{n}^{k})^{T} X_{n,j}^{k} \right) = (w_{n}^{k})^{T} S_{n}^{k} w_{n}^{k}
\]
where \(S_{n}^{k}\) denotes the sum of cross-covariance matrices between all pairs of trials for the nth stimulus and kth sub-band. To yield a finite solution, the variance of the spatially filtered signal is normalized to one
\[
(w_{n}^{k})^{T} Q_{n}^{k} w_{n}^{k} = 1
\]
where \(Q_{n}^{k}\) denotes the sum of self-covariance matrices for the nth stimulus and kth sub-band. With this constraint, the optimization boils down to a Rayleigh–Ritz eigenvalue problem and the spatial filter is estimated as
\[
w_{n}^{k} = \arg \max_{w} \frac{w^{T} S_{n}^{k} w}{w^{T} Q_{n}^{k} w}
\]
The solution of the above equation is the eigenvector of the matrix \((Q_{n}^{k})^{-1} S_{n}^{k}\) corresponding to the maximum eigenvalue.
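The ensemble TRCA (eTRCA) is an extended version of TRCA. The ensemble spatial filter for the kth sub-band, \(w^{k} \in R^{N_{C} \times N_{F}}\), is obtained by concatenating the spatial filters from all stimuli, \(w^{k} = [w_{1}^{k} ,w_{2}^{k} , \ldots ,w_{N_{F}}^{k}]\), where \(w_{n}^{k} \in R^{N_{C} \times 1}\) is the TRCA spatial filter for the nth stimulus and the kth sub-band.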
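The TRCA filter and its ensemble version described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `trca_filter` and `ensemble_filter` and the array layouts are hypothetical, and each trial is mean-centered per channel so that matrix products act as covariances.

```python
import numpy as np


def trca_filter(x):
    """Return a TRCA spatial filter for one stimulus and one sub-band.

    x has shape (n_channels, n_samples, n_trials) (assumed layout).
    """
    # Remove the per-trial channel means so that products act as covariances.
    x = x - x.mean(axis=1, keepdims=True)
    # Q: sum over trials of the self-covariance X_i X_i^T.
    q = np.einsum("csi,dsi->cd", x, x)
    # S: sum over all pairs i != j of X_i X_j^T, computed as
    # (sum_i X_i)(sum_j X_j)^T minus the i == j terms.
    total = x.sum(axis=2)
    s = total @ total.T - q
    # Eigenvector of Q^{-1} S associated with the largest eigenvalue.
    vals, vecs = np.linalg.eig(np.linalg.solve(q, s))
    return vecs[:, np.argmax(vals.real)].real


def ensemble_filter(x_all):
    """Concatenate per-stimulus filters into an (n_channels, n_stimuli) matrix.

    x_all has shape (n_stimuli, n_channels, n_samples, n_trials).
    """
    return np.stack([trca_filter(x) for x in x_all], axis=1)
```

Calling `ensemble_filter` once per sub-band yields one ensemble filter per sub-band. The sign and scale of each eigenvector are arbitrary, which does not matter for the correlation-based scoring used later.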
(2) Model training and classification: Unlike the sbCNN, eTRCA is trained with a subject-specific method, i.e., only the training data from the testing subject are used for training the eTRCA model. The model training and classification for eTRCA is illustrated in Fig. 4.
Figure 4. Model training and classification for eTRCA.
The spatial filter is used for filtering a testing trial from the kth sub-band \(\tilde{X}_{t}^{k} \in R^{N_{C} \times N_{S}}\) and the template signal from the kth sub-band and the nth stimulus, which is the average of all training trials, i.e., \(\overline{X}_{n}^{k} = (1/N_{T} )\sum\nolimits_{i = 1}^{N_{T}} X_{n,i}^{k} \in R^{N_{C} \times N_{S}}\). Their Pearson correlation coefficient can be calculated as
\[
\rho_{t,n}^{k} = \rho \left( (w^{k})^{T} \tilde{X}_{t}^{k} ,\; (w^{k})^{T} \overline{X}_{n}^{k} \right)
\]
where \(\rho ( \cdot , \cdot )\) denotes the Pearson correlation coefficient between the two spatially filtered signals.
The classification score \(s_{t,n}^{eTRCA}\) for the nth stimulus is yielded by integrating the correlation coefficients from the \(N_{K}\) sub-bands16.
The classification score vector \(s_{t}^{eTRCA} = [s_{t,1}^{eTRCA} ,s_{t,2}^{eTRCA} , \cdots ,s_{t,N_{F}}^{eTRCA} ]\) is yielded accordingly. Finally, the stimulus frequency of the testing trial predicted by eTRCA is the frequency with the maximal score, \(f_{t}^{eTRCA} = f_{n^{*}}\) with \(n^{*} = \arg \max_{n} s_{t,n}^{eTRCA}\), where \(f_{n}\) denotes the nth stimulus frequency.
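The eTRCA scoring step can be sketched as below. The integration weights are not stated in this text; the sketch assumes the convention common in the filter-bank SSVEP literature, \(s_{t,n}^{eTRCA} = \sum_{k} a(k) (\rho_{t,n}^{k})^{2}\) with \(a(k) = k^{-1.25} + 0.25\), which should be treated as an assumption. The function name `etrca_scores` and the array layouts are hypothetical.

```python
import numpy as np


def etrca_scores(test, templates, filters):
    """Score one testing trial against every stimulus template.

    test:      (n_bands, n_channels, n_samples)
    templates: (n_bands, n_stimuli, n_channels, n_samples), trial averages
    filters:   (n_bands, n_channels, n_stimuli), ensemble spatial filters
    Returns a score vector of length n_stimuli.
    """
    n_bands, n_stimuli = templates.shape[:2]
    # Assumed sub-band weights a(k) = k^-1.25 + 0.25 (not given in the text).
    k = np.arange(1, n_bands + 1, dtype=float)
    weights = k ** -1.25 + 0.25
    scores = np.zeros(n_stimuli)
    for b in range(n_bands):
        w = filters[b]
        # Spatially filter the testing trial and flatten it to one vector.
        y = (w.T @ test[b]).ravel()
        for n in range(n_stimuli):
            z = (w.T @ templates[b, n]).ravel()
            rho = np.corrcoef(y, z)[0, 1]
            scores[n] += weights[b] * rho ** 2
    return scores
```

The predicted class is then `int(np.argmax(scores))`, which corresponds to the decision rule for \(f_{t}^{eTRCA}\).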
Feature combination
Currently, sbCNN and eTRCA are state-of-the-art DL and traditional ML algorithms respectively, and both have yielded good performance in SSVEP-BCIs. Nevertheless, the eTRCA algorithm cannot effectively utilize information from other subjects, which is precisely the strength of DL methods. Therefore, effective integration of the eTRCA and sbCNN approaches is expected to improve the classification performance of SSVEP-BCIs. In this study, a feature combination framework was developed to detect the stimulus frequency of SSVEP signals. With small training sets, it is expected to improve the classification performance of SSVEP-BCIs and the robustness of frequency detection, thus promoting their practical application.
We treat the two classification score vectors of a testing trial yielded by eTRCA and sbCNN respectively as two feature vectors, and fuse them into one feature vector for SSVEP classification. There are two commonly used methods for feature fusion: (1) Normalized sum. The two vectors are first normalized by their respective maximal score values and then summed up; (2) Weighted sum. The two vectors are first weighted by the training accuracies generated by eTRCA and sbCNN respectively and then summed up. Unfortunately, the classification results of these two methods were not satisfactory. Instead, we fuse the two feature vectors by direct summation.
Specifically, for a testing trial \(t\), the two classification score vectors \(s_{t}^{sbCNN} \in R^{1 \times N_{F}}\) and \(s_{t}^{eTRCA} \in R^{1 \times N_{F}}\) derived from the sbCNN and the eTRCA classification respectively are summed together
\[
s_{t} = s_{t}^{sbCNN} + s_{t}^{eTRCA}
\]
where \(N_{F}\) is the number of stimuli (or stimulus frequencies). Then the stimulus frequency \(f_{t}\) of the testing trial \(\tilde{X}_{t}\) is decided as the frequency with the maximal score value
\[
f_{t} = f_{n^{*}} ,\quad n^{*} = \arg \max_{n \in \{ 1, \ldots ,N_{F} \}} s_{t,n}
\]
where \(s_{t,n}\) is the nth entry of \(s_{t}\) and \(f_{n}\) is the nth stimulus frequency.
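The direct-sum fusion and the final decision amount to an element-wise addition followed by an argmax. A minimal sketch, assuming both score vectors are already available as sequences of length \(N_{F}\) (the function name `fuse_and_decide` and the example numbers are hypothetical):

```python
import numpy as np


def fuse_and_decide(s_sbcnn, s_etrca, stim_freqs):
    """Sum the two score vectors and return (fused scores, chosen frequency)."""
    s_sbcnn = np.asarray(s_sbcnn, dtype=float)
    s_etrca = np.asarray(s_etrca, dtype=float)
    if s_sbcnn.shape != s_etrca.shape:
        raise ValueError("score vectors must have the same shape")
    # Direct sum: no normalization and no weighting, as described in the text.
    fused = s_sbcnn + s_etrca
    # The stimulus frequency with the maximal summed score is selected.
    return fused, stim_freqs[int(np.argmax(fused))]


# Illustrative values only, for three candidate frequencies.
fused, freq = fuse_and_decide([0.1, 0.7, 0.2], [0.3, 0.2, 0.25], [8.0, 8.2, 8.4])
```

Because no rescaling is applied, the relative scale of the two score vectors determines how much each model influences the decision.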
Remark 1
Model combining is a strategy commonly used in the field of machine learning and data analytics30,31, aiming at obtaining comprehensive performance that is more accurate, robust, or generalizable by integrating the outputs of several different models or methods. The starting point of this approach is that individual models may have unique strengths in different aspects, and by combining them effectively, they are able to compensate for their respective shortcomings and thus improve the overall performance. Common model combination methods include sequential combination29 and parallel combination27,28, among which the latter improves the overall performance, reduces the risk of overfitting, and improves robustness by integrating the outputs of several different models. Thereby, the parallel combination is used in the study.
Remark 2
Although the method for feature combination is straightforward, it is highly effective. The reason can be analyzed from three aspects as follows.
(a) If each of the two scores corresponding to the correct stimulus frequency takes the maximum value in its classification score vector, the sum of the two scores must take the maximum value and thereby the combined model can correctly identify the stimulus frequency of a single-trial testing signal;
(b) If one of the two scores corresponding to the correct stimulus frequency does not take the maximum value in its classification score vector, it is very likely to take the second largest value or a relatively large value. The sum of the two scores corresponding to the correct stimulus frequency can still take the maximum value and thus the combined model can correctly identify the stimulus frequency;
(c) If neither of the two scores corresponding to the correct stimulus frequency takes the maximum value in its classification score vector, each of them is very likely to take the second largest value or a relatively large value. In this case, the sum of the two scores may still take the maximum value and thus the combined model may still correctly identify the stimulus frequency as long as the two largest scores do not occur at the same stimulus frequency.
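Case (c) can be made concrete with a small numeric example. The score values below are invented purely for illustration and do not come from either dataset; the correct class is assumed to be index 1.

```python
def argmax(scores):
    """Index of the largest entry in a list of scores."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up score vectors for four candidate frequencies.
s_sbcnn = [0.40, 0.38, 0.02, 0.20]  # largest score at index 0
s_etrca = [0.05, 0.50, 0.55, 0.30]  # largest score at index 2
correct = 1

# Each model alone ranks the correct class second, at a different wrong class.
sbcnn_pick = argmax(s_sbcnn)
etrca_pick = argmax(s_etrca)

# Direct sum of the two score vectors.
fused = [a + b for a, b in zip(s_sbcnn, s_etrca)]
fused_pick = argmax(fused)
```

Here both individual decisions are wrong, yet the fused decision selects the correct class because the two wrong maxima fall on different frequencies while the correct class scores consistently high in both vectors.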
Data acquisition and preprocessing
Data acquisition
The proposed method is evaluated on two publicly available SSVEP datasets, Benchmark32 and BETA33. The main differences between them lie in the number of subjects and the number of blocks performed by each subject. In addition, their experimental settings were also different.
(1) Benchmark dataset. The dataset was acquired using 64 EEG channels from an SSVEP-based spelling experiment containing 40 stimulus targets (or frequencies), which were modulated by a joint frequency and phase coding approach. The stimulus frequencies ranged from 8 Hz to 15.8 Hz at 0.2 Hz intervals, while the stimulus phases ranged from 0 to 1.5π rad at 0.5π rad intervals. The sampling rate was 1000 Hz. 35 healthy subjects (17 females, average age of 22 years) took part in the experiment. The experiment contained 6 blocks and each block consisted of 40 trials, which correspond to 40 stimulus targets prompted in randomized order, i.e., every target contained 6 trials. Every trial lasted 6 s, including 0.5 s for visual cue, 5 s for visual stimulus and 0.5 s for relaxing. The dataset was collected in the laboratory with electromagnetic shielding.
(2) BETA dataset. The dataset was acquired in a similar SSVEP-based spelling experiment to the Benchmark dataset. Seventy healthy subjects (42 males, average age of 25 years) participated in the experiment. The differences from the Benchmark dataset are as follows: Each subject performed 4 blocks and each block contained 40 trials corresponding to 40 stimulus targets prompted in randomized order, i.e., each target contained 4 trials. Every trial lasted 3 s for the first 15 subjects or 4 s for the remaining subjects, including 0.5 s for visual cue, 2 s or 3 s for visual stimulus and 0.5 s for relaxing. Besides, the BETA dataset was collected outside the laboratory without electromagnetic shielding.
Data preprocessing
For each of the two datasets, EEG data from the 9 electrodes over the occipital lobe, Pz, PO5, PO3, POz, PO4, PO6, O1, Oz and O2, were employed for the study. The raw EEG data were down-sampled from 1000 to 250 Hz. Single-trial data were segmented in the temporal window [0.64 s, (0.64 + d) s] and [0.63 s, (0.63 + d) s] for the Benchmark and BETA dataset respectively, where d denotes the data length used for frequency recognition, 0.5 s is the time for gaze shifting, and 0.14 s and 0.13 s are the latency delays in the visual system for the Benchmark and BETA dataset respectively30,31. All data segments from the nine channels were filtered in the frequency range [m × 8 Hz, 90 Hz] with an IIR filter of Chebyshev type I, where \(m = 1,2, \cdots ,N_{SB}\) is the index of sub-bands and \(N_{SB}\) is the number of sub-bands (denoted \(N_{K}\) in the Methods section). Five and three sub-bands are used for the eTRCA and sbCNN methods respectively17,20. The temporal filtering was done forward and backward using the Matlab function filtfilt to avoid phase distortion.
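The segmentation and sub-band filtering can be sketched in Python with SciPy as follows. The text reports a Matlab filtfilt implementation; here `scipy.signal.sosfiltfilt` plays the same zero-phase role. The filter order (4) and passband ripple (0.5 dB) are assumptions because the text does not state them, and the function names `extract_segment` and `filter_bank` are hypothetical.

```python
import numpy as np
from scipy.signal import cheby1, sosfiltfilt

FS = 250  # Hz, sampling rate after down-sampling (from the text)


def extract_segment(trial, d, latency, fs=FS, gaze_shift=0.5):
    """Cut the window [gaze_shift + latency, gaze_shift + latency + d] seconds.

    trial has shape (n_channels, n_samples), measured from trial onset.
    latency is 0.14 s (Benchmark) or 0.13 s (BETA) according to the text.
    """
    start = int(round((gaze_shift + latency) * fs))
    length = int(round(d * fs))
    return trial[:, start:start + length]


def filter_bank(segment, n_subbands, fs=FS, order=4, ripple_db=0.5):
    """Band-pass a segment into sub-bands [m * 8 Hz, 90 Hz], m = 1..n_subbands.

    order and ripple_db are assumed values, not taken from the text.
    Returns an array of shape (n_subbands, n_channels, n_samples).
    """
    out = []
    for m in range(1, n_subbands + 1):
        sos = cheby1(order, ripple_db, [8.0 * m, 90.0],
                     btype="bandpass", output="sos", fs=fs)
        # Forward-backward filtering gives zero phase distortion.
        out.append(sosfiltfilt(sos, segment, axis=-1))
    return np.stack(out)
```

With the settings in the text, `filter_bank(segment, 5)` would feed eTRCA and `filter_bank(segment, 3)` would feed sbCNN. Second-order sections are used instead of transfer-function coefficients only for numerical robustness.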
Results
The performance of the eTRCA + sbCNN framework was evaluated by comparing it with those of eTRCA and sbCNN from the following aspects: classification accuracy, simulated ITR, feature distribution and confusion matrix in terms of different lengths of data, numbers of channels and numbers of training trials used for classification. In addition, the accuracy and ITR of eTRCA + sbCNN were also compared with those of several state-of-the-art traditional ML and DL models to further verify its superiority. Classification accuracy is the ratio of the number of testing trials correctly recognized to the total number of testing trials, whereas ITR in bits/min1 is formulated as
\[
\mathrm{ITR} = \left[ \log_{2} M + P \log_{2} P + (1 - P) \log_{2} \frac{1 - P}{M - 1} \right] \times \frac{60}{T}
\]
where M is the number of stimuli, P is the classification accuracy of stimuli, and T is the mean time in seconds for a detection. For the calculation of ITRs, the 0.5 s for gaze shifting was added to the time for target detection.
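A sketch of the ITR computation using the standard Wolpaw formula with M, P and T as defined above. The function name `itr_bits_per_min` is hypothetical, and the handling of the limits P = 0 and P = 1 (where a term of the form 0 · log 0 is taken as 0) is an implementation choice.

```python
import math


def itr_bits_per_min(n_targets, accuracy, detection_time_s):
    """ITR = [log2 M + P log2 P + (1 - P) log2((1 - P) / (M - 1))] * 60 / T."""
    m, p, t = n_targets, accuracy, detection_time_s
    if m < 2 or not 0.0 <= p <= 1.0 or t <= 0:
        raise ValueError("need M >= 2, 0 <= P <= 1 and T > 0")
    bits = math.log2(m)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (m - 1))
    return bits * 60.0 / t


# Detection time for 0.4 s of data plus the 0.5 s gaze-shifting time.
T = 0.4 + 0.5
upper_bound = itr_bits_per_min(40, 1.0, T)  # ITR at perfect accuracy
```

Note that the ITR values reported in the paper are averages over subjects, so applying this function to an averaged accuracy will generally not reproduce them exactly.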
The ITR represents the amount of information transmitted per unit time by a communication system. It is positively correlated with the number of stimuli and the detection accuracy, and negatively correlated with the detection time of a stimulus. In contrast, classification accuracy reflects only the proportion of correctly detected stimuli, without considering the detection time. Thereby, ITR is a more useful performance measure of SSVEP BCI systems than classification accuracy in real-world scenarios.
Classification accuracy and ITR
Since the performance of an SSVEP BCI is severely affected by the length of data, the number of channels and the number of the training trials used for frequency detection, we investigated the relationship between classification accuracy and ITR and these parameters. It is noted that both accuracy and ITR are the averaged accuracy and ITR across all subjects in a dataset.
Figure 5 shows the accuracies and ITRs yielded in each of the two datasets at five different lengths of data ranging between 0.2 s and 1 s with a stride of 0.2 s. The number of channels was fixed at 9 for the two datasets, whereas the number of training blocks was fixed at 5 for the Benchmark dataset and 3 for the BETA dataset. It is easily observed from the figure that for each dataset, the accuracies of the three models increase with the data length, whereas the ITRs of the three models first increase with the data length, reach a peak and then drop continually. eTRCA + sbCNN significantly outperformed both eTRCA and sbCNN in terms of accuracy and ITR at all lengths of data except for 0.2 s, at which eTRCA + sbCNN and sbCNN show no statistically significant difference in accuracy. The reason is that the performance of eTRCA is too poor at the data length of 0.2 s compared to sbCNN, so that their complementarity is not reflected. The highest ITRs, 249.47 bits/min and 188.55 bits/min, are achieved by eTRCA + sbCNN at the data length of 0.4 s for the Benchmark and the BETA dataset respectively. These results validate that the method for model combination is effective for improving the performance of SSVEP BCIs.
Figure 5. Classification accuracies (a) and ITRs (c) of the three models at five data lengths for the Benchmark dataset; Classification accuracies (b) and ITRs (d) of the three models at five data lengths for the BETA dataset. Error bars denote standard errors. The statistically significant difference between two algorithms (i.e., p value yielded by the paired t-test) is indicated by asterisks: * p < 0.05; ** p < 0.01; *** p < 0.001.
Figure 6 illustrates the accuracies and ITRs for each of the two datasets at four groups of channels, containing 3, 5, 7 and 9 channels respectively. The data length was fixed at 0.6 s for the two datasets, whereas the number of training blocks was fixed at 5 for the Benchmark and 3 for the BETA dataset. For simplicity, the channels in each group were sequentially chosen from the nine channels employed for the study. It is seen from the figure that for each dataset, both the accuracies and the ITRs of the three models rise with the number of channels used for frequency recognition. In terms of accuracy and ITR, eTRCA + sbCNN is significantly superior to both eTRCA and sbCNN at 5, 7, and 9 channels, but is significantly inferior to, or shows no statistically significant difference from, sbCNN at 3 channels. These results demonstrate that with more than 3 channels, the method for model combination is effective for improving BCI performance. The reason is that too few EEG recording channels severely degrade the performance of the eTRCA algorithm, and hence that of the combination of eTRCA and sbCNN.
Figure 6. Classification accuracies (a) and ITRs (c) of the three models at four groups of channels for the Benchmark dataset; Classification accuracies (b) and ITRs (d) of the three models at four groups of channels for the BETA dataset.
Figure 7 depicts the classification accuracies and ITRs of the three models at four numbers of training trials for the Benchmark dataset and two numbers of training trials for the BETA dataset. The data length and the number of channels used for frequency recognition were fixed at 0.6 s and 9 respectively for the two datasets. As shown in the figure, both the accuracies and ITRs of the three models rise with the number of training trials. For either dataset, eTRCA + sbCNN significantly outperforms both eTRCA and sbCNN in terms of accuracy and ITR at each number of training trials. These results indicate that the method for model combination is effective for improving BCI performance across different training efforts.
Figure 7. Classification accuracies (a) and ITRs (c) of the three models at four numbers of training trials for the Benchmark dataset; Classification accuracies (b) and ITRs (d) of the three models at two numbers of training trials for the BETA dataset.
Feature distribution
In order to further explore the performance of the proposed framework, we employed a 2-dimensional t-SNE34 to compare the 40-dimensional features, i.e., the classification score vectors. Figure 8 shows the two-dimensional feature distributions of the three models for the two datasets. In the figure, only the feature distributions of the first eight stimulus frequencies (i.e., the first 8 categories), starting from 8 Hz with an interval of 1 Hz, are shown for easy observation. The raw 40-dimensional feature vectors were generated with 0.6 s-long data, 9 channels and 5 training blocks for the Benchmark dataset and 0.6 s-long data, 9 channels and 3 training blocks for the BETA dataset. Each point in the figure represents one testing trial and the colors denote different categories. Based on leave-one-block-out cross validation, each category includes a total of 35 (subjects) × 6 (blocks) = 210 testing trials for the Benchmark dataset or 70 (subjects) × 4 (blocks) = 280 testing trials for the BETA dataset. The results show that in each row, from left to right, each model produces tighter clustering and more separable categories in the 2-dimensional feature space than the model to its left. This arises from the fact that compared to eTRCA, sbCNN leverages other subjects’ data for training the DL model, and the additional information improves the quality of feature signals; on the other hand, eTRCA + sbCNN exploits the advantages of the two models, reduces the overall model bias and improves the robustness of feature signals.
For the first eight stimulus frequencies from 8 to 15 Hz with a stride of 1 Hz, 2-dimensional feature distributions of testing trials from all blocks and all subjects yielded by t-SNE for the three models in the Benchmark dataset (a) and the BETA dataset (b).
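The visualization step described above can be sketched as follows. This is a minimal illustration that assumes scikit-learn's `TSNE` is used and that the score vectors are stacked into an array of shape (trials, 40); the synthetic scores, the perplexity value and the helper names `make_synthetic_scores` and `embed_scores` are hypothetical and are not taken from the study.

```python
import numpy as np
from sklearn.manifold import TSNE


def make_synthetic_scores(n_classes=40, shown=8, trials_per_class=20, seed=0):
    """Generate stand-in score vectors: noise plus a bump at the true class."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(shown), trials_per_class)
    scores = rng.normal(0.0, 0.3, size=(labels.size, n_classes))
    scores[np.arange(labels.size), labels] += 1.0  # true class scores highest on average
    return scores, labels


def embed_scores(scores, perplexity=30.0, seed=0):
    """Project (n_trials, n_classes) score vectors onto 2 dimensions with t-SNE."""
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(scores)


scores, labels = make_synthetic_scores()
points = embed_scores(scores)  # one 2-D point per testing trial
```

Each row of `points` can then be plotted and colored by `labels`, so that tighter, better separated clusters indicate more discriminative score vectors.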
Confusion matrix
Due to space limitations, only the Benchmark dataset was used to calculate the confusion matrix of eTRCA + sbCNN averaged across subjects, which is shown in Fig. 9. The confusion matrix was obtained using 1 s-long data, 9 channels and 5 training blocks. The total number of testing trials for each stimulus frequency or category equals 210 (i.e., 6 (blocks) × 35 (subjects)). The experimental results show that the correct recognition rates of these stimuli were high and that there were no large differences in accuracy among different stimuli.
Averaged confusion matrix of the proposed eTRCA + sbCNN model across 35 subjects in the Benchmark dataset yielded at the data length of 1.0 s.
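One way to compute a subject-averaged confusion matrix is sketched below. It assumes each subject's matrix is row-normalized before averaging; the source does not state whether matrices were normalized per subject or pooled, although the two give the same result when every subject contributes the same number of trials per category, as is the case here. The function names are illustrative.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true classes on rows and predicted classes on columns."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def mean_normalized_confusion(per_subject, n_classes):
    """Average row-normalized confusion matrices over subjects.

    per_subject -- iterable of (y_true, y_pred) label sequences, one pair per subject
    """
    mats = []
    for y_true, y_pred in per_subject:
        cm = confusion_matrix(y_true, y_pred, n_classes).astype(float)
        row_sums = cm.sum(axis=1, keepdims=True)
        # Rows without any trials stay all-zero instead of dividing by zero.
        mats.append(np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0))
    return np.mean(mats, axis=0)


# Toy usage with 3 classes and 2 subjects (labels are made up).
subject_a = ([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2])
subject_b = ([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
avg_cm = mean_normalized_confusion([subject_a, subject_b], n_classes=3)
```

The diagonal of `avg_cm` holds the per-frequency recognition rates, so similar diagonal values correspond to the observation that accuracy does not vary much across stimuli.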
Comparison with other models
To further validate the performance of eTRCA + sbCNN, we compare it with other state-of-the-art methods, including traditional ML algorithms and DL networks. For the Benchmark dataset, the following methods are compared: (a) CCA11; (b) FBCCA15; (c) TRCA17; (d) compact-CNN18; (e) Conv-CA28; (f) bi-SiamCA21. Their classification accuracies and ITRs are reported in Table 1. The table shows that both the accuracies and the ITRs of the proposed method are higher than those of all other methods at the five different data lengths. Paired t-tests at the 95% confidence level show that eTRCA + sbCNN is significantly better than every other method in both accuracy and ITR at each of the five data lengths, with all p values smaller than 0.05.
For the BETA dataset, the following methods are compared: (a) FBCCA15; (b) ms-eCCA35; (c) ITCCA13; (d) TRCA17; (e) Conv-CA28. Since most of these methods provided classification results only at the data length of 1.0 s, their accuracies and ITRs for the dataset at this data length are reported in Table 2. Paired t-tests at the 95% confidence level show that eTRCA + sbCNN is significantly better than every other method in both accuracy and ITR at this data length, with all p values smaller than 0.05.
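The paired comparison used above can be illustrated with a short sketch based on SciPy's `ttest_rel`, which tests whether the mean per-subject difference between two methods is zero. The accuracy values are synthetic, the helper name `paired_comparison` is hypothetical, and the default two-sided test is an assumption, since the text does not specify the sidedness or any correction for multiple comparisons.

```python
import numpy as np
from scipy import stats


def paired_comparison(proposed, baseline, alpha=0.05):
    """Paired t-test on per-subject metrics of two methods.

    Returns the t statistic, the two-sided p value, and whether p < alpha.
    """
    result = stats.ttest_rel(proposed, baseline)
    return float(result.statistic), float(result.pvalue), bool(result.pvalue < alpha)


# Synthetic per-subject accuracies for 35 subjects (illustration only).
rng = np.random.default_rng(0)
baseline_acc = rng.uniform(0.6, 0.9, size=35)
proposed_acc = baseline_acc + 0.05 + rng.normal(0.0, 0.02, size=35)

t_stat, p_value, significant = paired_comparison(proposed_acc, baseline_acc)
```

A paired test is appropriate here because both methods are evaluated on the same subjects, so between-subject variability cancels in the per-subject differences.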
Computational complexity
The total training time of eTRCA + sbCNN includes that of sbCNN and that of eTRCA. The former is long because sbCNN comprises the two stages of global training and fine-tuning, whereas the latter is short and negligible compared to the former. The testing stage includes classifying a testing signal with the trained sbCNN and eTRCA models, summing the two classification score vectors and predicting the label of the testing trial. This experiment was performed in MATLAB R2023a on a computer configured with a 12th Gen Intel(R) Core(TM) i5-12400F CPU @ 2.50 GHz, 32 GB RAM, a GeForce 3080 GPU and 64-bit Windows 10. Training takes about 2.4 h and the testing time of a single trial is less than 0.1 s. Such a training time is too long for online applications. However, the training time of sbCNN can be significantly reduced by reusing a global model pre-trained on other subjects’ data, so eTRCA + sbCNN remains applicable to online experiments.
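The testing stage described above reduces to an element-wise sum followed by an argmax, which is why it adds almost no cost beyond the two forward evaluations. The sketch below is a minimal illustration that assumes both models already output score vectors on comparable scales; any scaling applied before the summation is specified in the Methods section and is not reproduced here. The function name and the toy scores are hypothetical.

```python
import numpy as np


def fuse_and_predict(etrca_scores, sbcnn_scores, frequencies):
    """Sum two classification score vectors and pick the top-scoring frequency.

    etrca_scores, sbcnn_scores -- arrays of shape (n_classes,) or (n_trials, n_classes)
    frequencies                -- array of shape (n_classes,) with stimulus frequencies in Hz
    """
    etrca_scores = np.asarray(etrca_scores, dtype=float)
    sbcnn_scores = np.asarray(sbcnn_scores, dtype=float)
    if etrca_scores.shape != sbcnn_scores.shape:
        raise ValueError("both models must score the same set of classes")
    summed = etrca_scores + sbcnn_scores
    best = np.argmax(summed, axis=-1)  # index of the maximal summed score
    return np.asarray(frequencies)[best], summed


# Toy example with 4 classes: the two models disagree, and the sum decides.
freqs = np.array([8.0, 9.0, 10.0, 11.0])
etrca = np.array([0.20, 0.50, 0.45, 0.10])   # eTRCA alone would pick 9 Hz
sbcnn = np.array([0.10, 0.30, 0.55, 0.05])   # sbCNN alone would pick 10 Hz
predicted_freq, summed_scores = fuse_and_predict(etrca, sbcnn, freqs)
```

In this toy case the summed scores favor 10 Hz, because the margin by which sbCNN prefers 10 Hz outweighs the margin by which eTRCA prefers 9 Hz.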
Discussion
Model combining is regarded as a promising approach to address the problem of how to improve the classification performance of a pattern recognition system with small samples in the training set. In this study, we propose a novel model combining-based classification method for improving the performance of SSVEP-BCI systems. Based on the addition of two feature vectors derived from two state-of-the-art models eTRCA and sbCNN respectively, the combined model eTRCA + sbCNN achieves significantly higher classification accuracy and ITR than both eTRCA and sbCNN on two commonly used SSVEP datasets.
Model combination can be done by either a sequential method29 or a parallel method27,28. The method proposed by Deng et al.29 belongs to the former, in which an eTRCA model is utilized for feature extraction and a serially connected sbCNN model is employed for feature abstraction and classification. The method proposed in this study belongs to the latter, in which the two models eTRCA and sbCNN are connected in parallel and their feature signals are summed for classification. The advantage of the former is that introducing eTRCA filters prior to the sbCNN network improves the SNR of its input data, whereas that of the latter is that it avoids the loss of useful information in the raw EEG data.
The proposed method for model combination works well because of the complementary nature of the two models. The DL-based model sbCNN is entirely data-driven, has a strong ability to automatically represent high-level abstractions through multiple convolutional layers, and is able to extract generic features by exploiting the data from other subjects. On the other hand, the traditional model eTRCA is built on neurophysiological knowledge of a specific BCI paradigm and can extract features tailored to this paradigm. Their combination takes into account both the universality and the specificity of the feature signals and consequently can achieve better performance than either single model.
This paper provides a framework for combining a traditional ML model and a DL model to improve the performance of SSVEP BCIs. In principle, any traditional ML model and any DL model can be combined in an appropriate fashion. However, the performance of the two models must not differ too much; otherwise their combination will not improve and may even degrade the overall classification performance. As shown in Figs. 5 and 6, when the data length is 0.2 s, the classification accuracy of eTRCA is much lower than that of sbCNN, so the accuracy of eTRCA + sbCNN does not differ significantly from that of sbCNN for either dataset. Likewise, when the number of EEG channels is 3, the classification accuracy of eTRCA is much lower than that of sbCNN, so eTRCA + sbCNN does not differ significantly from sbCNN in accuracy for the Benchmark dataset, is inferior to sbCNN in accuracy for the BETA dataset, and does not differ significantly from sbCNN in ITR for either dataset. In the future, we will explore new methods for model combination, such as weighting the two models, to further improve the classification of SSVEP signals.
The limitation of the proposed method is that the training time is long when the sbCNN model is trained using transfer learning. For example, the total training time is about 2.4 h with 0.4 s-long data for the Benchmark dataset comprising 35 subjects. This is impractical for a new user because it makes the calibration procedure tedious. However, most of the training time is spent in the first stage, which involves a large amount of data from other subjects. In practice, the first-stage training can be skipped by directly transferring a global model pre-trained on an existing dataset to a new user, so that only the second-stage fine-tuning is performed with his/her calibration data. In addition, only an offline analysis of the proposed algorithm was conducted in this study. We will explore its online implementation in the future.
Conclusion
In this paper, we present a model-combining framework, eTRCA + sbCNN, specialized for frequency identification in SSVEP BCIs. For a testing trial, the framework combines a traditional ML model, eTRCA, and a sub-band signal-based DL model, sbCNN, by summing their classification score vectors, and predicts the testing label (i.e., stimulus frequency) by choosing the label corresponding to the maximal summed score. We apply the proposed framework to two SSVEP BCI datasets including a total of 105 subjects and evaluate it by classification accuracy and ITR at different data lengths, channel subsets and numbers of training blocks. The experimental results demonstrate that the eTRCA + sbCNN framework significantly outperforms both eTRCA and sbCNN as well as several state-of-the-art traditional ML and DL models, and shows promise for the decoding of EEG signals in SSVEP BCIs.
Data availability
The two datasets analyzed in the study are available at http://bci.med.tsinghua.edu.cn/download.html.
References
Wolpaw, J. R., Birbaumer, N., Mcfarland, D. J., Pfurtscheller, G. & Vaughan, T. M. Brain-computer interfaces for communication and control. Clin. Neurophysiol. 113, 767–791 (2002).
Mridha, M. F. et al. Brain-computer interface: Advancement and challenges. Sensors 21(17), 5746 (2021).
Curran, E. A. & Stokes, M. J. Learning to control brain activity: A review of the production and control of EEG components for driving brain-computer interface (BCI) systems. Brain Cognit. 51, 326–336 (2003).
Ortiz-Rosario, A., Adeli, H. & Buford, J. A. Wavelet methodology to improve single unit isolation in primary motor cortex cells. J. Neurosci. Methods 246, 106–118 (2015).
George, S. H., Rafiei, M. H., Borstad, A., Adeli, H. & Gauthier, L. Gross motor ability predicts response to upper extremity rehabilitation in chronic stroke. Behav. Brain Res. 333, 314–322 (2017).
Vialatte, F. B., Maurice, M., Dauwels, J. & Cichocki, A. Steady-state visual evoked potentials: Focus on essential paradigms and future perspectives. Prog. Neurobiol. 90(4), 418–438 (2010).
Middendorf, M., McMillan, G., Calhoun, G. & Jones, K. S. Brain-computer interfaces based on the steady-state visual-evoked response. IEEE Trans. Rehabil. Eng. 8(2), 211–214 (2000).
Farwell, L. A. & Donchin, E. Talking off the top of your head: A mental prosthesis utilizing event-related brain potentials. Electroencephalogr. Clin. Neurophysiol. 70, 355–372 (1988).
Donchin, E., Spencer, K. M. & Wijesinghe, R. The mental prosthesis: Assessing the speed of a P300-based brain-computer interface. IEEE Trans. Rehabil. Eng. 8, 174–179 (2000).
Friman, O., Volosyak, I. & Graser, A. Multiple channel detection of steady-state visual evoked potentials for brain–computer interfaces. IEEE Trans. Biomed. Eng. 54(4), 742–750 (2007).
Lin, Z., Zhang, C., Wu, W. & Gao, X. Frequency recognition based on canonical correlation analysis for SSVEP-based BCIs. IEEE Trans. Biomed. Eng. 53(12), 2610–2614 (2006).
Zhang, Y. et al. L1-regularized multiway canonical correlation analysis for SSVEP-based BCI. IEEE Trans. Neural Syst. Rehabil. Eng. 21(6), 887–896 (2013).
Wang, Y., Nakanishi, M., Wang, Y. & Jung, T. Enhancing detection of steady-state visual evoked potentials using individual training data. In Proc. 36th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., Chicago, IL, USA, Aug. 2014, pp. 3037–3040.
Zhang, Y., Zhou, G., Jin, J., Wang, X. & Cichocki, A. Frequency recognition in SSVEP-based BCI using multiset canonical correlation analysis. Int. J. Neural Syst. 4(4), 1450013 (2014).
Chen, X., Wang, Y., Gao, S., Jung, T.-P. & Gao, X. Filter bank canonical correlation analysis for implementing a high-speed SSVEP-based brain–computer interface. J. Neural Eng. 12(4), 046008 (2015).
Wei, Q. et al. A training data-driven canonical correlation analysis algorithm for designing spatial filters to enhance performance of SSVEP-based BCIs. Int. J. Neural Syst. 30(5), 2050020 (2020).
Nakanishi, M. et al. Enhancing detection of SSVEPs for a high-speed brain speller using task-related component analysis. IEEE Trans. Biomed. Eng. 65(1), 104–112 (2017).
Lawhern, V. J. et al. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 15(5), 056013 (2018).
Waytowich, N. et al. Compact convolutional neural networks for classification of asynchronous steady-state visual evoked potentials. J. Neural Eng. 15(6), 066031 (2018).
Guney, O. B., Oblokulov, M. & Ozkan, H. A deep neural network for SSVEP-based brain-computer interfaces. IEEE Trans. Biomed. Eng. 69(2), 932–944 (2022).
Zhang, X. et al. Bidirectional Siamese correlation analysis method for enhancing the detection of SSVEPs. J. Neural Eng. 19, 046027 (2022).
Pan, Y., Chen, J., Zhang, Y. & Zhang, Y. An efficient CNN-LSTM network with spectral normalization and label smoothing technologies for SSVEP frequency recognition. J. Neural Eng. 19, 056014 (2022).
Vaswani, A. et al. Attention is all you need. In Proc. Adv. Neural Inf. Process. Syst. 30, Red Hook, NY, USA: Curran Associates, 2017, pp. 1–11.
Bagchi, S. & Bathula, D. R. EEG-ConvTransformer for single-trial EEG-based visual stimulus classification. Pattern Recognit. 129, 108757 (2022).
Chen, J., Zhang, Y., Pan, Y., Xu, P. & Guan, C. A transformer-based deep neural network for SSVEP classification. Neural Netw. 164, 521–534 (2023).
Dang, W. et al. MHLCNN: Multi-harmonic linkage CNN model for SSVEP and SSMVEP signal classification. IEEE Trans. Circuits Syst. II: Express Briefs 69(1), 244–248 (2021).
Yao, H. et al. FB-EEGNet: A fusion neural network across multi-stimulus for SSVEP target detection. J. Neurosci. Methods 379, 109674 (2022).
Li, Y., Xiang, J. & Kesavadas, T. Convolutional correlation analysis for enhancing the performance of SSVEP-based brain-computer interface. IEEE Trans. Neural Syst. Rehabil. Eng. 28(12), 2681–2690 (2020).
Deng, Y., Sun, Q., Wang, C., Wang, Y. & Zhou, K. TRCA-Net: Using TRCA filters to boost the SSVEP classification with convolutional neural network. J. Neural Eng. 20, 046005 (2023).
Large, J., Lines, J. & Bagnall, A. A probabilistic classifier ensemble weighting scheme based on cross-validated accuracy estimates. Data Min. Knowl. Disc. 33(6), 1674–1709 (2019).
Tsai, C. F. et al. Predicting stock returns by classifier ensembles. Appl. Soft Comput. 11(2), 2452–2459 (2011).
Wang, Y. et al. A benchmark dataset for SSVEP-based brain–computer interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 25(10), 1746–1752 (2016).
Liu, B. et al. BETA: A large benchmark database toward SSVEP-BCI application. Front. Neurosci. 14, 544547 (2020).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Sun, Y. et al. Cross-subject fusion based on time-weighting canonical correlation analysis in SSVEP-BCIs. Measurement 199, 111524 (2022).
Acknowledgements
The authors would like to thank anonymous reviewers for their valuable comments.
Author information
Contributions
Q.W. conceived the study, designed the experiment and revised the manuscript. C.L. performed the experiment, analyzed the data and wrote the manuscript. Y. W. and X. G. gave some advice and guidance about the algorithm. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wei, Q., Li, C., Wang, Y. et al. Enhancing the performance of SSVEP-based BCIs by combining task-related component analysis and deep neural network. Sci Rep 15, 365 (2025). https://doi.org/10.1038/s41598-024-84534-6