Introduction

Relaxation methods have proved helpful for patients with various illnesses and mental disorders. Oncological patients were reported to respond better to treatment when they used relaxation techniques1. It is therefore beneficial to develop relaxation techniques in order to improve quality of life.

Guided Imagery (GI) is a technique that harnesses the power of imagination to bring about changes in the physical, emotional, or spiritual state of an individual2. It is a common practice in psychotherapy, where relaxation methods are combined with the creation of mental images that engage all five senses: sight, sound, touch, taste, and smell. The purpose of this technique is to intentionally create specific images that can alter physiological and emotional states using the subject’s imagination3. As a relaxation technique, Guided Imagery is widely applied and has proved effective in reducing test anxiety and in dealing with stress of different origins4,5,6. In a study on reducing state anxiety, Nguyen et al.6 compared nature-based visualizations (for example, forests) with non-nature-based ones (such as urban or office surroundings) and showed that nature-based visualizations reduced anxiety more effectively. Guided Imagery is also beneficial in chronic pain conditions7, where calming images are used to interrupt the pain signal. Since Guided Imagery is used to reduce cancer-related pain, it would also be valuable to know whether GI affects the patient’s state8. Finally, there are findings showing promising results of Guided Imagery in treating sleep disorders, where visualization of serene environments, such as a tranquil beach or forest, reduces the physical and cognitive arousal that interferes with sleep9.

Guided Imagery utilizes mental visualization to evoke sensory experiences and is recognized as one of the most ancient healing practices. It is described as an internal perception of an event without any real external stimuli, involving both sensory and cognitive aspects10. Guided Imagery and meditation are alike in that they both incorporate relaxation methods and mental visualization to influence physical and emotional conditions. These practices are effective in soothing the mind and body, alleviating stress and anxiety, and fostering a sense of well-being. In relation to the Focused Attention (FA) and Open Monitoring (OM) types of meditation, Guided Imagery can include aspects of both. During Guided Imagery, individuals may concentrate on specific mental images or scenarios (FA) while also staying receptive to the sensory experiences and emotions that emerge during the visualization process (OM). Thus, Guided Imagery can be understood as a blend of FA and OM meditation techniques, offering a structured approach to visualization while permitting open awareness of internal experiences11. Guided Imagery is recognized for its effects on various physiological systems, including the respiratory, cardiovascular, metabolic, gastrointestinal, and immune systems. It influences these systems by regulating the activity of the hypothalamic-pituitary-adrenal axis and encouraging a state of relaxation and overall well-being12,13.

Electroencephalography (EEG) is a good method for determining whether patients are in a state of relaxation. Scalp EEG is a non-invasive method of measuring the bio-electrical activity of the human brain. Moreover, it is less expensive and less stressful for patients than other brain activity measuring techniques, such as PET or MRI14,15. On the other hand, manual multichannel EEG signal analysis can be a difficult and time-consuming process. Machine learning and deep learning tools are commonly used to classify various types of data, from images16 to different kinds of signals17,18. The aim of this study is to propose an EEG signal classifier based on 1D Convolutional Neural Networks (CNNs) that uses the raw signal, with only basic filtering, as input data.

Classical machine learning (ML) methods, such as Support Vector Machines (SVMs), have been used for different types of EEG signals19. In the classification of relaxation and concentration states based on the electroencephalographic signal, SVMs can achieve around 80% accuracy (ACC)20.

State-of-the-art classification methods applied to the EEG signal have already used Convolutional Neural Networks (CNNs) with success21. Furthermore, the above-mentioned classical ML methods are increasingly being replaced by deep learning approaches. Convolutional Neural Networks are applicable in EEG signal analysis, for instance, in motor imagery processing22, epileptic seizure detection23, emotion recognition24, and research topics devoted to Brain-Computer Interfaces based on EEG feature extraction using CNNs25, among others, even for identity authentication26.

The most common approach is to classify signals by feeding the classifier with frequency band data. The EEG signal is commonly partitioned into discrete frequency ranges: delta waves below 4 Hz, theta waves ranging from 4 to 7 Hz, alpha waves spanning 8 to 12 Hz, beta waves between 13 and 30 Hz, and gamma waves above 30 Hz. It has been shown that an SVM classifier can be built using specific selected bands of the EEG signal20,27. This requires calculating power across specific frequency bands; it would therefore be beneficial to skip manual feature extraction and use CNN-based feature extraction from the raw signal. Some researchers have used this approach successfully for emotion recognition28,29. The experiments described by Baydemir et al. showed that it is possible to classify EEG signals of low and high cognitive load using a 1D-CNN with high accuracy30. Classification of fNIRS-EEG mental workload signals using a CNN has also been reported, with a good accuracy of 89%31. However, there are only a few papers applying 1D Convolutional Neural Networks specifically to the binary classification of relaxation and mental workload from the raw EEG signal, which still needs to be investigated.
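For illustration, the band-power features that such a manual pipeline computes (and that the raw-signal CNN approach skips) could be obtained as in the following minimal sketch, assuming a NumPy array of shape (n_channels, n_samples) sampled at 250 Hz; the band edges follow the conventional ranges listed above.

```python
import numpy as np
from scipy.signal import welch

# Approximate conventional EEG bands in Hz (delta, theta, alpha, beta, gamma).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_powers(eeg, fs=250):
    """Mean spectral power per band for each channel; eeg has shape (n_channels, n_samples)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs)   # Welch PSD estimated per channel
    return {name: psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
            for name, (lo, hi) in BANDS.items()}
```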

In our previous research, a classical classification method was used for the Guided Imagery and Mental Task groups32. The Generalized Linear Model (GLM) used in that research achieved 81% accuracy on a very specific time segment, 779–839 s, extracted from the complete recording. Achieving this level of accuracy required feeding the classifier with five EEG bands (alpha, beta, delta, theta, and gamma) extracted from the raw signal of 60 seconds duration. On the full-length recording, however, an accuracy of 90.77% was achieved.

The objective of this study is to compare four approaches to the classification of EEG signals of two mental states: the Guided Imagery relaxation technique and Mental Workload tasks. For this research, a 1D Convolutional Neural Network (1D-CNN), Long Short-Term Memory (LSTM), a 1D-CNN-LSTM hybrid model, and a 2D-CNN (EEGNet) are taken into consideration. Signals were filtered and split into 1-second segments. Bad channels were marked automatically and interpolated, so that all 256 channels could be used for training. No further preprocessing or artifact removal was done, and no features were extracted from the signal manually.

Materials and methods

The signal for this study was obtained from a cohort of 26 males, aged 19–24 years. They were all right-handed and short-haired; being right- or left-handed could influence the results due to brain lateralization. The described experiments were reviewed and approved by the Maria Curie-Skłodowska University Bioethical Commission. The experiments were conducted according to the best experimental practices and guidelines and under the supervision of qualified psychologists. All participants agreed to the EEG signal recording, were informed about the purpose of the experiment, and signed written consent before taking part in it.

Inclusion and exclusion criteria

The criteria for selecting participants in this study required being a healthy, right-handed male, aged 19 to 24, with short hair and fluency in Polish. Participants should have no history of chronic diseases, no current use of prescribed or recreational drugs, and should be able to attend study appointments without specific technological requirements. Additionally, participants were required to abstain from alcohol and medication for at least 72 hours before the experiment.

On the other hand, exclusion criteria encompassed individuals younger than 19 or older than 24, left-handed individuals, those with long hair, limited proficiency in Polish, serious or chronic illnesses, current use of medications or drugs, recent medical treatments, or inability to attend study appointments. Participants failing to meet the inclusion criteria or declaring serious diseases, including mental disorders, were automatically excluded. Prior to participation, participants were informed about the EEG research and technology and consented to take part in the study.

There were several reasons for recruiting participants aged 19–24 and only males. Firstly, the majority of individuals in this age range are students, particularly those pursuing first and second degrees. Secondly, in the Institute of Computer Science, there is a predominant male student population, making it challenging to form both target and control groups including women.

Moreover, it was noted that a substantial majority of female computer science students had lengthy hair. It is noteworthy that the research has also highlighted differences in electroencephalogram patterns between males and females33,34, and the objective was to achieve a relatively balanced representation from the participant pool. Consequently, the study’s findings are limited to male participants, which we recognize as a significant limitation of the work.

They all signed a written consent. Half of the group listened to the Guided Imagery relaxation recording prepared by the psychologist. The other half were asked to recall specific kinds of information: the names of Polish administrative units (voivodships), the names of the Zodiac signs, the names of US states, etc. (Mental Task group or MT group). Tasks were given by the same psychologist on the recording. After each task there was a period of silence during which participants thought about the answer. The GI group was supposed to relax during the experiment, while the MT group was supposed to be put under mental workload. At the beginning of the experiment, the MT group was told that after its completion they would be asked to write down all the information they had recalled. A specific amount of time was allotted for each task. The idea that this task would induce mental workload was based on research showing that task complexity and time pressure cause mental workload35. The Guided Imagery and the Mental Task recordings were of the same length of 20 min. The participants were asked to close their eyes, and each trial was conducted in the lying position with lights turned off to reduce the effects of muscle artifacts, power line noise, and distractions on the EEG signal.

The experiments were conducted in the EEG Laboratory of the Department of Neuroinformatics and Biomedical Engineering of Maria Curie-Skłodowska University (UMCS) in Lublin, Poland (Fig. 1). All trial signals were recorded at the sampling frequency of 250 Hz with the use of a 256-channel dense array EGI GSN 130 series cap (Fig. 1). For signal acquisition, the EGI Net Station 4.5.4 software was used.

Fig. 1
figure 1

On the left: EGI 256-channel EEG cap. On the right: the overview of the whole EEG Laboratory at UMCS, Lublin, Poland.

Our dense array amplifier recorded the signal from all 256 electrodes. However, based on previous experience in the analysis of EEG signals related to cognitive processing36,37,38,39, we expected to find differences on the so-called cognitive electrodes. These electrodes are described in the EGI 256-channel cap specification40,41,42 as the best for cognitive ERP observations, cover the scalp regularly, and are numbered as follows: E98, E99, E100, E101, E108, E109, E110, E116, E117, E118, E119, E124, E125, E126, E127, E128, E129, E137, E138, E139, E140, E141, E149, E150, E151, and E152 (see Fig. 2).

Fig. 2
figure 2

Electrode placement on the HydroCel GSN 130 Geodesic Sensor Net41. The 26 electrodes mentioned above for the investigation of cognitive ERP observations are highlighted in blue.

Signal preprocessing and data sets preparation

The recorded EEG signals were pre-processed using the mne Python toolkit 1.3.043. Noisy channels were removed from the signal and interpolated to maintain the same data size in each sample. For automatic bad channel rejection, the RANSAC algorithm implemented in the pyprep toolkit44 was used. This toolkit is based on the PREP pipeline originally designed for EEG signal preprocessing in MATLAB45. MATLAB was not used for this research, but the results obtained with the MATLAB and Python toolkits were similar. The signal from each trial was filtered with a band-pass filter of 1–45 Hz. Each signal was cropped from the 10th to the 12th minute of the recording, which gives 120 seconds per subject. The placement of the selected time segment in the recordings is shown in Fig. 3. The time segment was chosen based on previous experience with the GI relaxation method: it was shown that the period between the 10th and 14th minute of the recording has the greatest significance for distinguishing the relaxation and mental workload states32. Each cropped signal was split into 1-second segments. This gives a total of 3,120 recording samples (1,560 samples from the Guided Imagery group and 1,560 samples from the Mental Task group). Figure 6 shows the data preparation steps. Sample 1-s segments for both the GI and MT states are shown in terms of the power spectral densities of the individual frequency bands in Figs. 4 (GI) and 5 (MT).
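A minimal sketch of this preprocessing chain with the mne and pyprep toolkits is given below; the file name and reader function are illustrative, the pyprep NoisyChannels interface is assumed, and the exact call parameters may differ from those used in the study.

```python
import mne
from pyprep import NoisyChannels   # RANSAC-based bad-channel detection (assumed interface)

raw = mne.io.read_raw_egi("subject_01.mff", preload=True)   # illustrative EGI recording
raw.filter(l_freq=1.0, h_freq=45.0)                         # 1-45 Hz band-pass filter

nc = NoisyChannels(raw)
nc.find_bad_by_ransac()                                      # mark noisy channels
raw.info["bads"] = nc.get_bads()
raw.interpolate_bads()                                       # keep the full 256-channel montage

raw.crop(tmin=600.0, tmax=720.0)                             # minutes 10-12 of the recording
epochs = mne.make_fixed_length_epochs(raw, duration=1.0)     # 1-second segments
X_subject = epochs.get_data()                                # (120, n_channels, 250)
```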

Fig. 3
figure 3

Selected time placement in the recording.

Fig. 4
figure 4

Power spectral density of the different frequency bands shown for a 1-s segment of the signal from a sample GI subject.

Fig. 5
figure 5

Power spectral density of the different frequency bands shown for a 1-s segment of the signal from a sample MT subject.

Fig. 6
figure 6

Data science pipeline - steps of data preparation for training.

Two sets of electrodes were selected for the experiments. The first one included the full set of 256 channels of the EEG signal. The second one contained a subset of 26 electrodes from the central-parietal and central-occipital regions, in order to reduce the amount of data subjected to training. Based on previous research on the analysis of cognitive processing in EEG signals39,46,47, variations were expected to be observed specifically on the above-mentioned 26 cognitive electrodes. These electrodes, specified as optimal for observing cognitive phenomena according to the EGI 256-channel cap specifications48, are positioned in the central-parietal and central-occipital regions and numbered: E98, E99, E100, E101, E108, E109, E110, E116, E117, E118, E119, E124, E125, E126, E127, E128, E129, E137, E138, E139, E140, E141, E149, E150, E151, and E152. The topographical map showing the placement of these electrodes on the scalp can be found in the EGI documentation48 and in36, Fig. 1. It was also shown that they cover the region of the greatest significance for alpha band-based research, as this band is correlated with the relaxation state15. Finally, both data sets consisted of 3,120 signal samples. Each sample included 256 EEG channels in data set 1 (FULL-256) or 26 EEG channels in data set 2 (COGN-26), and 250 timesteps per second. No further pre-processing or feature extraction was done.
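Selecting the COGN-26 subset from the full montage could then look like the following short sketch, assuming the epochs object from the preprocessing sketch above and EGI channel naming.

```python
# The 26 cognitive electrodes listed above, in EGI channel naming.
COGN_26 = ["E98", "E99", "E100", "E101", "E108", "E109", "E110", "E116", "E117",
           "E118", "E119", "E124", "E125", "E126", "E127", "E128", "E129", "E137",
           "E138", "E139", "E140", "E141", "E149", "E150", "E151", "E152"]

X_full = epochs.get_data()                       # FULL-256: (n_segments, 256, 250)
X_cogn = epochs.copy().pick(COGN_26).get_data()  # COGN-26:  (n_segments, 26, 250)
```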

The data set was split into 2,640 training samples and 480 testing samples. 6-fold cross-validation was used to confirm the performance of the models. The StratifiedGroupKFold method from scikit-learn49 was used to prevent data from one subject from being placed in the training and validation data sets at the same time. At the same time, StratifiedGroupKFold keeps the data set balanced with respect to the number of samples from each group. The data set was shuffled to prevent the model from learning data from only one subject in one batch. The folds were saved for benchmarking purposes.
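A minimal sketch of this subject-aware split with scikit-learn is shown below; the arrays X, y, and groups are illustrative, with groups holding the subject identifier of each 1-second segment.

```python
from sklearn.model_selection import StratifiedGroupKFold

# X: 3,120 one-second segments, y: binary labels (GI = 0, MT = 1),
# groups: subject id per segment, so no subject spans training and validation.
sgkf = StratifiedGroupKFold(n_splits=6, shuffle=True, random_state=42)
folds = list(sgkf.split(X, y, groups=groups))    # saved for benchmarking purposes

for train_idx, val_idx in folds:
    X_train, y_train = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]
    # ... fit and evaluate one of the models described below ...
```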

EEGNet

The first method of classification of the EEG signal in this research was the 2D-CNN architecture called EEGNet, proposed by Lawhern et al.50. The network was implemented using tensorflow and keras. The architecture remained as presented in the original research, and the parameters were adjusted as suggested by the EEGNet authors. All parameters are described in Table 1 and given in Fig. 7.

The learning rate was set to 0.001, the optimizer was Adam, and the loss function was binary cross-entropy. This choice of loss function resulted in changing the activation function of the output layer from the original softmax to sigmoid.
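As a sketch, this training configuration corresponds to the following keras calls, where model stands for the EEGNet implementation (not reproduced here) and the epoch and batch settings are illustrative.

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",   # paired with a single sigmoid output unit instead of softmax
    metrics=["accuracy"],
)
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=100, batch_size=64)   # illustrative settings
```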

EEGNet performance in terms of validation accuracy and validation loss was selected as the reference for all other methods of binary classification described in this research. Using the COGN-26 data set, the model had 2,153 parameters; after training on the FULL-256 data set, the model had 6,753 parameters.

Table 1 Parameters set for EEGNet architecture according to original paper50.
Fig. 7
figure 7

EEGNet detailed model architecture with parameters for specific layers: f - number of filters, k - kernel size, pool sizes and dropout rates.

LSTM

Long short-term memory (LSTM) is a type of Recurrent Neural Network cell introduced as a solution for learning features from long time sequences including noisy data51.

A simple LSTM-based network was tested as a second reference method. It has been shown that a Bidirectional LSTM-based (BiLSTM) model can perform well in EEG classification tasks such as emotion classification52 or seizure classification53.

The architecture presented here contained one BiLSTM layer with 64 units (cells) for each of the backward and forward directions. The number of units was selected according to52. As the input signals included 250 samples each, we decided to take one quarter of the sampling rate as the unit number; the closest power of 2 was 64. With both the backward and forward directions, this means that our model included 128 units in the BiLSTM configuration. Two fully-connected (also called dense) layers, of 32 and 1 nodes, followed the BiLSTM layer. Between those layers, a dropout layer was used as the regularization method, with the dropout rate set to 0.5. The activation function in the output fully-connected layer was sigmoid. The selection of a power of two as the unit number in the LSTM layer was motivated by connecting CNNs and LSTM in the next step. The selection of 32 nodes in the first fully-connected layer was supported by trials with sizes of 16, 32, 64, and 128; that number gave the best results in this BiLSTM configuration.

The learning rate was set to 0.001, the optimizer was Adam, and the loss function was binary cross-entropy. Learning rates from 0.001 down to 10e-6 were tested, and we found that a learning rate of 0.001 provides the best classification results for this architecture.

Using the COGN-26 data set, the model had 50,753 parameters. After training using the FULL-256 data set, the model had 168,513 parameters. Detailed architecture is given in Fig. 8.
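A sketch of this BiLSTM classifier in keras is shown below; the time axis is assumed to be the sequence dimension and the activation of the hidden dense layer is an assumption. With 26 input channels this layout reproduces the reported 50,753 parameters.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_bilstm(n_channels, n_timesteps=250):
    """BiLSTM sketch: 64 units per direction, dense 32, dropout 0.5, sigmoid output."""
    inputs = keras.Input(shape=(n_timesteps, n_channels))  # time as the sequence axis (assumption)
    x = layers.Bidirectional(layers.LSTM(64))(inputs)      # 128 units in total (forward + backward)
    x = layers.Dense(32, activation="relu")(x)             # hidden activation is an assumption
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```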

Fig. 8
figure 8

LSTM detailed model architecture with parameters of specific layers.

1D-CNN

The proposed CNN model included 4 convolutional layers. The model was built systematically by adding layers one at a time, from 1 up to 6; the 4-layer model performed best in the described task. The convolutional layer is the main element of a Convolutional Neural Network: it contains a set of filters that adjust their parameters during the model training phase. A LeakyReLU activation layer was used after each convolutional layer to provide non-linearity54. Moreover, a Batch Normalization layer was used in each convolution block containing a convolution layer and an activation layer. The purpose of Batch Normalization is to normalize data within a batch to improve learning speed and performance. Batch Normalization was omitted in the third convolution block because Spatial Dropout (called SpatialDrop in Fig. 9) with a dropout rate of 0.25 was used before it. Spatial Dropout is a regularization method that randomly drops features learned by a convolution layer during training to reduce overfitting55. Instead of using pooling layers, strided convolution was applied; it can provide a simpler architecture with better accuracy in some applications56, and in the case of the proposed CNN model it was the best choice in terms of achieved accuracy. A Flatten layer was placed in front of two fully-connected layers, which are responsible for the binary classification of the features extracted by the convolutional layers. A dropout layer was used between the fully-connected layers as a regularization method; it randomly deactivates certain weights during the training process to reduce overfitting57. The dropout rate was set to 0.5.

For the 1D-CNN model, the loss function and optimizer remained the same as for the EEGNet and LSTM-based models. The learning rate was reduced to 0.00001 from the default value of 0.001. Learning rates from 0.001 down to 10e-6 were tested, and we found that a learning rate of 0.00001 provides the best classification results for this architecture.

The numbers of parameters in the model for the COGN-26 and FULL-256 data sets were 165,649 and 176,689, respectively. Figure 9 shows the model architecture in detail.
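A structural sketch of this model in keras is given below; the filter counts, kernel sizes, and the hidden dense layer size are illustrative placeholders, as the exact values are specified in Fig. 9.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_1d_cnn(n_channels, n_timesteps=250):
    """1D-CNN sketch: four strided convolution blocks with LeakyReLU and BatchNorm,
    SpatialDropout before the third block, then Flatten and two dense layers."""
    inputs = keras.Input(shape=(n_timesteps, n_channels))
    x = layers.Conv1D(16, kernel_size=7, strides=2)(inputs)  # strided conv replaces pooling
    x = layers.LeakyReLU()(x)
    x = layers.BatchNormalization()(x)

    x = layers.Conv1D(32, kernel_size=5, strides=2)(x)
    x = layers.LeakyReLU()(x)
    x = layers.BatchNormalization()(x)

    x = layers.SpatialDropout1D(0.25)(x)                     # regularization before block 3
    x = layers.Conv1D(32, kernel_size=3, strides=2)(x)       # no BatchNorm in this block
    x = layers.LeakyReLU()(x)

    x = layers.Conv1D(64, kernel_size=3, strides=2)(x)
    x = layers.LeakyReLU()(x)
    x = layers.BatchNormalization()(x)

    x = layers.Flatten()(x)
    x = layers.Dense(32, activation="relu")(x)               # illustrative hidden layer size
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```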

Fig. 9
figure 9

Detailed 1D-CNN model architecture with parameters of specific layers: f - number of filters, k - kernel size, pool sizes and dropout rates.

1D-CNN-LSTM

It has been shown that a 1D-CNN-LSTM can be applied to EEG signals successfully; this kind of approach has been reported to be beneficial for epileptic seizure classification58 and motor imagery classification59.

A decision was made to connect the 1D-CNN network model with the LSTM one described in the previous sections. In order to pass the Flatten output as input to the BiLSTM layer and maintain the same model weights for all output data, a Time Distributed layer was used (referenced in Fig. 10 as TimeD). Moreover, as the data are already processed in the CNN layers and the input size for the LSTM part is thereby reduced, we decided to reduce the number of nodes in the first fully-connected layer from 64 to 32. This resulted in the model architecture shown in Fig. 10.

The numbers of parameters in the model for the COGN-26 and FULL-256 data sets were 77,777 and 88,817, respectively. The learning rate, optimizer, and loss function were set as for the 1D-CNN model.
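A sketch of one possible wiring of this hybrid model in keras follows; the convolutional part, filter counts, and kernel sizes are illustrative, and the exact layer ordering and sizes are those given in Fig. 10.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn_lstm(n_channels, n_timesteps=250):
    """Hybrid sketch: 1D convolutional feature extractor, TimeDistributed Flatten to keep
    the time axis, BiLSTM, then the dense classification head with a 32-node layer."""
    inputs = keras.Input(shape=(n_timesteps, n_channels))
    x = layers.Conv1D(16, kernel_size=7, strides=2)(inputs)
    x = layers.LeakyReLU()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv1D(32, kernel_size=5, strides=2)(x)
    x = layers.LeakyReLU()(x)

    x = layers.TimeDistributed(layers.Flatten())(x)   # preserve the time axis for the recurrent part
    x = layers.Bidirectional(layers.LSTM(64))(x)      # 64 units per direction, as in the LSTM model
    x = layers.Dense(32, activation="relu")(x)        # reduced first fully-connected layer (32 nodes)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```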

Fig. 10
figure 10

1D-CNN-LSTM detailed model architecture with parameters of specific layers: f - number of filters, k - kernel size, pool sizes and dropout rates.

Evaluation metrics

Validation accuracy was selected as the main performance metric because balanced data sets were used for the binary classification. Validation loss was also monitored during the model design phase. The F1-score, precision, and recall averaged over 6 folds are also reported for all tested models. These metrics are defined as follows60:

$$\begin{aligned} Accuracy = \frac{TP+TN}{TP+TN+FP+FN} \end{aligned}$$
(1)

Here TP is defined as True Positives, TN - True Negatives, FP - False Positives and FN - False Negatives.

$$\begin{aligned} Precision = \frac{TP}{TP+FP} \end{aligned}$$
(2)

Precision quantifies the proportion of correctly predicted positive labels among all labels predicted as belonging to the positive class.

$$\begin{aligned} Recall = \frac{TP}{TP+FN} \end{aligned}$$
(3)

Recall is a measure of the proportion of actual positive labels that are correctly classified.

$$\begin{aligned} F1 = \frac{2*Precision*Recall}{Precision+Recall} = \frac{2*TP}{2*TP+FP+FN} \end{aligned}$$
(4)

F1 is defined as the harmonic mean between the recall and precision values.
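For reference, these fold-level metrics can be computed from the sigmoid outputs of any of the models with scikit-learn, as in the following sketch (array names are illustrative).

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def fold_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1-score for one validation fold."""
    y_pred = (y_prob >= threshold).astype(int)
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0)
    return acc, prec, rec, f1

# Averaging over the 6 folds, given a list of per-fold (acc, prec, rec, f1) tuples:
# mean = np.mean(fold_results, axis=0); std = np.std(fold_results, axis=0)
```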

Grad-CAM

To gain insight into how the 1D-CNN architecture predicts a specific class, the Grad-CAM (Gradient-weighted Class Activation Mapping) method was used. It is an interpretability method for CNNs that highlights the input regions most influential for a model’s prediction. It computes the gradients of the target class score with respect to the feature maps of a convolutional layer, averages them to obtain importance weights, and uses these to generate a class-discriminative heat map. The method is model-agnostic, widely applicable to tasks such as image classification, time-series analysis, and medical imaging, and provides visually interpretable, class-specific explanations. A ReLU ensures that only positive contributions to the target class are visualized61. The Grad-CAM method has already been shown to work well for EEG- or EMG-signal-based models62,63.
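A minimal Grad-CAM sketch for a 1D-CNN in tensorflow, following the generic recipe described above, is shown below; the target convolutional layer name and the model object are placeholders, and the returned map is a per-time-step relevance profile.

```python
import tensorflow as tf

def grad_cam_1d(model, x, conv_layer_name, class_index=0):
    """Grad-CAM: gradients of the class score w.r.t. a conv layer's feature maps,
    averaged into per-filter weights, combined and passed through ReLU."""
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(x[None, ...])   # add a batch dimension
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)           # shape (1, time, filters)
    weights = tf.reduce_mean(grads, axis=1)          # importance weight per filter
    cam = tf.nn.relu(tf.reduce_sum(conv_out * weights[:, None, :], axis=-1))[0]
    cam = cam.numpy()
    return cam / (cam.max() + 1e-8)                  # normalized heat map over time
```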

Results

All architectures were tested using the keras and tensorflow 2.15 packages with Python 3.11. The hardware used for testing was an Intel i7-based machine with 64GB of DDR5 RAM, equipped with an Nvidia GeForce RTX 4070-based graphics card with 12GB of RAM. The operating system was Ubuntu 23.10. None of the setup elements were overclocked.

EEGNet was chosen as a reference because of its well-documented architecture. The 6-fold cross-validation procedure was performed using this model. The results for each fold and the validation metrics (accuracy, loss, F1-score, precision, and recall), as well as their average values with standard deviations, are presented in Tables 2 and 3 for the FULL-256 and COGN-26 data sets, respectively. On average, after the 6-fold cross-validation (CV), EEGNet obtained accuracies of 0.7615 and 0.7646, respectively. With all average precision, recall, and F1-score values exceeding 0.75 on both data sets, the model can be considered a good reference point.

Table 2 EEGNet Model Validation Metrics for the FULL-256 data set for each fold.
Table 3 EEGNet Model Validation Metrics for the COGN-26 data set for each fold.

The LSTM model with only one LSTM layer followed by dropout was chosen as the second reference point. The results of the 6-fold cross-validation and the validation metrics (accuracy, loss, F1-score, precision, and recall), as well as their average values with standard deviations, are presented in Tables 4 and 5. On the FULL-256 data set, precision, recall, and F1-score achieved fold-averaged values above 0.72, and the averaged ACC was 0.7250 on the full set of channels. The model performed worse than EEGNet in terms of all described metrics. On the data set containing only 26 electrodes, it performed the worst of all compared models, with a cross-validated accuracy of 0.6833. Overall, it achieved the worst cross-validated accuracy on both data sets.

Table 4 LSTM Model Validation Metrics on FULL-256 data set for each fold.
Table 5 LSTM Model Validation Metrics on COGN-26 data set for each fold.

The 6-fold cross-validation procedure was also applied to the 1D-CNN model. The results for each fold and the validation metrics (accuracy, loss, F1-score, precision, and recall), as well as their average values with standard deviations, are presented in Tables 6 and 7. The fold-averaged accuracy of this model on the FULL-256 data set was 0.7682, which can be considered comparable to that of the EEGNet model. On the cognitive electrodes subset, it achieved 0.8094 accuracy, which outperforms all described architectures for this case. Also in terms of F1-score, precision, and recall, this model performs best in this research on the COGN-26 data set.

Table 6 1D-CNN Model Validation Metrics on FULL-256 data set for each fold.
Table 7 1D-CNN Model Validation Metrics on COGN-26 data set for each fold.

The 6-fold cross-validation procedure was applied to the hybrid 1D-CNN-LSTM model. The results for each fold and the validation metrics mentioned earlier (accuracy, loss, F1-score, precision, and recall), as well as their average values with standard deviations, are presented in Tables 8 and 9. The fold-averaged validation accuracy of this model trained on the FULL-256 data set was 0.7726, the best accuracy result for the full set of channels among all approaches discussed in this paper. On the cognitive electrodes subset, it achieved 0.7556 accuracy, which outperforms only the plain LSTM model in this case. For the COGN-26 data set, the results are worse than those of the 1D-CNN and comparable to EEGNet.

Table 8 1D-CNN-LSTM Model Validation Metrics for FULL-256 data set for each fold.
Table 9 1D-CNN-LSTM Model Validation Metrics for COGN-26 data set for each fold.

The averaged 6-fold cross-validated metrics for all models are reported in Table 10 for the FULL-256 data set and in Table 11 for the COGN-26 data set. It can be seen that the worst model for classification on the full set of electrodes is the one-layer LSTM-based model. The other models obtained comparable results in terms of accuracy, the best one being the 1D-CNN-LSTM hybrid model. On the other hand, for the signal collected from the subset of cognitive electrodes, the 1D-CNN-based model outperformed all other approaches in terms of validation accuracy, loss, F1-score, and precision, with an accuracy of 0.8094, an F1-score of 0.7806, and a precision close to 0.8970.

Table 10 Evaluation of Metrics for Different Models for the FULL-256 data set. The best result for every metric is reported in bold.
Table 11 Evaluation of Metrics for Different Models using the COGN-26 data set. The best result for each metric is reported in bold.

After the 6-fold cross-validation and obtaining results for ACC, loss, F1-score, precision, and recall, accuracy was also tested in Leave-One-Subject-Out cross-validation. The results are shown in Tables 12 and 13.
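A sketch of the Leave-One-Subject-Out procedure with scikit-learn's LeaveOneGroupOut is given below; X, y, and groups are the segment array, labels, and per-segment subject identifiers used earlier (here with segments shaped (n_segments, 250, n_channels) to match the keras sketches), and the training settings are illustrative.

```python
from sklearn.model_selection import LeaveOneGroupOut

logo = LeaveOneGroupOut()
loso_accuracies = []
for train_idx, test_idx in logo.split(X, y, groups=groups):
    model = build_1d_cnn(n_channels=X.shape[-1])     # or any of the compared models
    model.fit(X[train_idx], y[train_idx], epochs=50, batch_size=64, verbose=0)
    _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
    loso_accuracies.append(acc)                      # one held-out subject per fold
```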

Table 12 Results of accuracy for Leave-One-Subject-Out cross-validation for COGN-26 dataset across different models.
Table 13 Results of accuracy metrics for Leave-One-Subject-Out cross-validation for FULL-256 dataset across different models.

In terms of accuracy, EEGNet performs consistently well across both datasets, making it a robust choice, especially for more specific tasks such as COGN-26. The model achieves an accuracy of 77.50% on COGN-26 and 74.04% on FULL-256, demonstrating its adaptability and reliability. In contrast, LSTM underperforms on both datasets, with low accuracy and high variability. This may be due to its reliance on sequential data, which may not be well suited to these datasets. On the other hand, 1D-CNN shows good accuracy performance, especially on the COGN-26 dataset, where it achieves the highest accuracy of 78.87% with a relatively low standard deviation. This indicates that 1D-CNN can effectively extract features from more task-specific datasets such as COGN-26. Meanwhile, 1D-CNN-LSTM is competitive, but its higher variability across folds suggests that it may not generalize as well as other models, which could be a limitation in practical applications.

Model explainability

Fig. 11
figure 11

Sample Grad-CAM heat map for GI subject with activation intensities averaged over all validation samples.

Fig. 12
figure 12

Sample Grad-CAM heat map for MT subject with activation intensities averaged over all validation samples.

Grad-CAM sample heat maps for both the guided imagery state and the mental workload state are presented in Figs. 11 (GI) and 12 (MT). Activation intensities were calculated for each subject and each 1-second sample when using Leave-One-Subject-Out cross-validation. To make the heat maps more informative, we decided to average the intensities over all 120 samples. That way the impact of possible noisy samples or outliers is lowered, and the consistently important parts of the given signals are shown.

Figure 11 represents the Guided Imagery state, characterized by sparse and localized activations, with lower intensity and rhythmic patterns typical of relaxation. Figure 12 corresponds to the mental workload state, as it shows sustained, broad activations across time and channels with higher overall intensity, reflecting heightened cognitive engagement. Mental workload involves more distributed brain activity, particularly in frontal and parietal regions, while relaxation primarily engages parietal and occipital areas. These distinctions align with the expected neural dynamics for each state15. The differences in temporal and spatial activation provide visible differentiation between the two conditions. These differences are distinguishable to the model despite some possibly noisy fragments of the signal, such as the signal from electrodes near the eyes or cheeks (higher channel numbers).

Discussion

There are known approaches to using convolutional neural networks in biometrics64 and other cybernetic tasks65, and more and more of them concern EEG signal classification64. Deep learning methods are also increasingly applied in biomedical engineering systems to help patients with numerous disorders, such as sleep apnea66.

The aim of this paper was to compare the effectiveness of four different architectures in the classification of EEG signals originating from a psychological experiment involving Guided Imagery. The EEGNet, LSTM, 1D-CNN, and 1D-CNN-LSTM approaches were used in the case of the dense array amplifier setup with 256 electrodes and the so-called cognitive setup with 26 electrodes.

Training all of these models is relatively fast and does not require extensive resources, so the trained models can be incorporated into less demanding computational environments. It is also beneficial that, even though the EEG signal can vary over time and between subjects, it is possible to train the models with high accuracy using smaller segments of 1 second instead of 1 minute or even the full-length signal. A further benefit of this work is that all the models learn features from all 256 EEG channels as well as from the simplified set of 26 cognitive channels.

Indeed, the results obtained in this study show that manual feature extraction (EEG bands, wavelets, etc.) can be omitted when using the CNN-based, LSTM, and hybrid model architectures.

Simple filtration and interpolation of the signal seem to be sufficient. The binary signal classifiers described above perform well on raw data, resulting in a level of accuracy comparable to that of state-of-the-art methods and to our previous paper on the Generalized Linear Model in EEG signal classification32.

In the case of the full signal recorded from 256 electrodes, the 1D-CNN-LSTM performs best in terms of accuracy and precision. Almost as good is the 1D-CNN, especially as it has better loss and F1-score values. The one-layer LSTM accuracy is the worst in this experiment, but still higher than 0.70, with the best recall of 0.79. The reference EEGNet model has an accuracy of 0.76 (compared with the best value of 0.77 discussed here) and generally lower values of the remaining three metrics. The collected and compared results of all discussed classifiers are presented in Table 10.

In the case of the signal collected from the 26 cognitive electrodes, the best model is evidently the proposed 1D-CNN, achieving 80% (6-fold cross-validation) and 78.87% (Leave-One-Subject-Out cross-validation) accuracy with the best loss, F1-score, and precision characteristics. The one-layer LSTM has a much lower accuracy (68% in 6-fold CV and 63% in LOSO CV), but its recall is the highest, reaching 0.83. The accuracy of EEGNet reached 76% and that of the 1D-CNN-LSTM 0.75, both lower by about 5 percentage points than the best model, the 1D-CNN. The other metrics, such as F1-score and precision, are of the same order and relatively similar, but none is as good as those of the 1D-CNN. The collected and compared results of all discussed classifiers are presented in Table 11.

The better performance on 26 electrodes (accuracy of 81% for the 1D-CNN vs. 77% for the 1D-CNN-LSTM) may be the result of feeding the feature extraction with more influential data: the manually selected subset already contains the channels of greater significance for the task, rather than all 256 electrodes. This is also supported by the presented Grad-CAM heat maps, which show that some EEG channels can introduce noise while others are highly influential.

There is still room for improving these models by training them with more data from more subjects. It also needs to be tested whether the best models work well for data gathered from female subjects. Finding a new architecture for this task could also be a way to reduce the number of model parameters. Finally, it needs to be investigated how other electrode subsets, such as the 10–20 international system67, affect the classification performance of the described architectures.

Another possible improvement is parameter tuning of the models. In our opinion, based on previous experience36, this could increase the accuracy of the models by 3–5%.

Furthermore, more complex hybrid architectures can be designed, involving other methods of EEG signal analysis68,69 or, for example, the fuzzy logic approach70,71.

The main limitation of this work is the small amount of data collected for training. Secondly, the results were shown for only 4 different neural network approaches to data classification; further research should be done on other models and architectures. The selection of channels for this task can also be crucial for better performance: the selected subset may omit some relevant regions of the human brain, while using the full set of electrodes is easy but may include channels irrelevant to this kind of task.

Conclusions

It was shown that, from the computational point of view, it can even be more beneficial to collect less data for Guided Imagery and mental workload classification, and that expanding the cap to 256 electrodes does not always add significant value.

EEGNet stands out as the most reliable performer in terms of accuracy performance across both datasets, making it a strong candidate for EEG-based tasks. 1D-CNN also shows potential, particularly when handling more focused datasets like COGN-26, due to its high accuracy and consistency. LSTM, however, struggles with both datasets, suggesting that it may not be the best fit for these tasks unless significant improvements or tuning are applied. Finally, while 1D-CNN-LSTM can deliver competitive performance, its high variability suggests that further tuning is required to enhance its reliability and reduce its inconsistency across different validation folds.

The results show that all four models performed comparably with respect to metrics such as accuracy, F1-score, precision, and recall. The LSTM model performed the worst of all four approaches, especially in Leave-One-Subject-Out cross-validation on the COGN-26 dataset. On this dataset, the best accuracy was achieved by the 1D-CNN model, whereas on the FULL-256 dataset the EEGNet (2D-CNN) model performed best in terms of accuracy. These results support the 6-fold cross-validation, which indicated that the CNN-based models, 1D-CNN and EEGNet, are a promising choice for EEG signal classification of Guided Imagery relaxation and mental task states.

Research using electroencephalography (EEG) to study Guided Imagery (GI) could significantly enhance the development of brain-computer interfaces (BCIs) tailored for therapeutic applications. By analyzing EEG patterns associated with the distinct physiological and neurological states induced by GI, researchers can identify specific biomarkers indicative of a GI-prone state. These biomarkers may include alterations in brainwave frequencies, such as increased alpha and theta activity, which are commonly linked to relaxation and heightened imaginative engagement. By integrating these EEG-derived biomarkers into a BCI, therapists could receive real-time feedback on a patient’s readiness or depth of immersion in the GI process, allowing them to optimize the timing and effectiveness of therapeutic interventions.

Furthermore, a BCI designed with EEG insights could enable therapists to personalize GI sessions more precisely. As Guided Imagery has been shown to influence various physiological systems through the modulation of the hypothalamic-pituitary-adrenal axis, tracking EEG changes could help therapists gauge the intensity and impact of GI on the patient’s overall state. For instance, if EEG data reveals that a patient is not achieving the desired brainwave patterns, therapists could adjust the imagery or relaxation techniques accordingly. Ultimately, this technology could lead to more effective and efficient GI sessions, enhancing therapeutic outcomes by ensuring that patients are consistently engaged in a GI-prone state during their therapy.

As shown above, the research presented here can shed new light on the engineering of new brain-computer interfaces for psychotherapists and neurotherapists using relaxation techniques and the Guided Imagery method.

Future plans

The research group will be expanded in the near future. We are going to recruit a larger group and, aware of the limitations, we will try to include female subjects in this extended research. We are also planning to develop a model that deals with noisy channels without filtration, as filtration is a time-consuming process.