Introduction

Pain is a phenomenon created as an internal feeling and a mental experience due to a harmful stimulus1. Many premature babies hospitalized in the intensive care unit experience frequent pain. Premature babies are susceptible to pain2,3,4, And because of their immature nervous system, they feel more pain and experience short-term and long-term effects of pain5. Since babies are in the category of people who cannot express pain, repetition of these effects and painful methods in babies can cause changes in their neural development and behavioural abnormalities6,7,8. Also, many premature babies need oxygen after birth in the intensive care unit and are connected to a ventilator, which is a painful process that they experience for several days. Whether and to what extent analgesia should be used for all these infants has received attention in recent years9,10. In adults, the issue of pain assessment is also essential to reduce its effects. For example, for those people who are exposed to long-term operations and after the procedure and in a semi-conscious state, it is necessary to measure the level of their pain. And as it was said, the experience of pain is entirely subjective and individual, and a certain level of painful stimulation does not result in the same intensity of pain for different people11. Therefore, the pain level of people should be measured by a method without the need for individual reports.

In infants, there are no usual numerical and visual scoring methods for pain assessment and measurement tools such as premature infant pain profile (PIPP), CRIES (Cry, Requires O2 for oxygen saturation (SpO2) above 95, Increased vital signs and BP, Expression, and Sleeplessness index), FLACC (Face, Legs, Activity, Cry, and Consolability) which includes changes in physiological characteristics and behavioural characteristics is used12,13,14,15. Also, the usual measures of pain measurement are not continuous and real-time, and many behavioural and physical reactions are not necessarily specific reactions caused by pain16.

Many studies have been conducted on brain EEG signal processing during pain12,17,18,19. Since the mechanism of pain starts from pain receptors in the whole body and reaches the brain, and the response to pain is initiated by the brain, the amount of pain can be determined by using brain signals in babies. Recording the electrical activity of the brain or electroencephalography is one of the easy and available techniques that has an excellent temporal or spatial resolution and is readily available, and it is possible to measure the intensity of pain by analysing this signal20. Some researcher have used neonatal EEG to investigate multiple sleep stage, because neonates spend 80% of their time sleeping in a resting state and sleep play an important role in neonatal brain21,22.

Among the articles that have focused on the classification of pain levels in adults and children, pain assessment has been done in two-class, multi-class and continuous rating situations. In the case of two-class classification, it is possible to separate two levels of pain and absence of pain, or two levels of low pain and high pain; Mari et al.23 used a new method to evaluate pain intensity using EEG signal; in this research, an attempt was made to overcome the weakness of external validation; they first created two levels of low and high pain using pneumatic pressure, then trained a random forest using a dataset containing 25 subjects and cross-validated Then, using two separate data sets, (1) the number of 15 subjects with pain similar to the stimulation of study 1 and (2) the number of 15 subjects with the same pain but stimulation with the new parameters of the trained model were evaluated that the obtained accuracies were 73.18%, 68.32% and 60.42%, respectively, which show a promising performance that generalizes to new samples and paradigms. Vaart et al.24 Using the EEG signal of 109 babies, which included two pain groups with heel lance stimulation and a control group without pain, they separated these two levels of pain, using a random forest model and using a set of behavioural, physiological and neurophysiological features such as heart rate, blood oxygen, facial changes and brain activity, this article was able to distinguish these two groups from each other with 81% accuracy.

In three-class classification, wider levels of pain are separated; these levels can include low pain, Moderate and high pain. In the continuous rating mode, a score should be assigned to each pain level. Boostani et al.12 classified pain into three and five levels. In this study, 24 healthy subjects underwent the cold-water press test stimulation, and during this, their EEG signals were recorded using 30 electrodes. In this study, average brain maps in alpha and delta bands were obtained for each pain level. And by designing a decision tree in which KNN and SVM classifiers were used, they reached the accuracies of 60 ± 5% and 80 ± 5% and the accuracies of 62 ± 5% and 83 ± 5% in three levels and five levels, respectively. Yu et al.18 presented a new method for the classification of pain intensity, In this method, which was called convolutional neural networks based on various frequency bands, at first, the EEG signal of 32 subjects was recorded under cold stimulation and at three levels of pain Then, different combinations of delta, theta, alpha, beta and gamma frequency bands were given to the convolutional neural network, and finally the best accuracy of 97.37% was obtained in the combination of alpha, beta and gamma frequency bands in three intensities of no pain, moderate pain and thigh pain. The main contributions of this article described below:

  • Designing an algorithm to decode the level of pain and avoid the jumbling of different levels of neonatal pain.

  • Our study focused on enhancing the accuracy of neonatal pain intensity classification through feature extraction and selection. Additionally, we have identified the specific brain channels and regions that are most closely associated with pain perception.

  • Furthermore, we aimed to identify the effective frequency bands for pain assessment to improve and develop clinical evaluation methods.

The rest of the article is arranged as follows: Sect. 2 presents the dataset and methods, Sect. 3 proposes the classification and evaluation. The classification results using the proposed methods are reported in Sect. 4 and discussed in Sect. 5. Finally in Sect. 6 concludes the paper.

Method and material

In this study, the EEG signal of 112 infants was used to determine the pain intensity in two groups. In the first group, the intensity of pain was evaluated in three levels: no pain, low pain, and moderate pain, and in the second group, the separation of two states of pain and no pain was evaluated. By using the wavelet transform as a time-frequency domain analysis method, Statistical features were extracted and then, smote algorithm was used as one of the methods to fix unbalanced data, after that by using wrapper methods, the best features were selected, In the next step, some classifier such as the support vector machine, K-Nearest Neighbors and Bootstrap aggregating were used to distinguish different levels of pain in each group and among the selected features. In this section, data and participants, pre-processing, description of features and auxiliary methods, and design of classifiers are presented. Flowchart of the proposed method for pain decoding has been shown in Fig. 1.

Data and participants

In this research, we used the EEG signal of 112 neonates (52 females; 29–47 weeks corrected age, median 36 weeks + 5 days) aged between 0.5 and 96 days, who were recruited from the postnatal, special care, or intensive care wards at the Elizabeth Garrett Anderson Obstetric Wing, University College London Hospital (UCLH)from June 2015 to June 201715. As mentioned in15, prior to study, informed consent was obtained from the parents and the study was approved by the NHS Health Research Authority (London – Surrey Borders) and conformed to the standards set by the Declaration of Helsinki. It was funded by the Medical Research Council UK (MR/M006468/1 and MR/L019248/1), with support from the UCLH/UCL Comprehensive Biomedical Research Centre, and was performed at the National Institute for Health Research/Wellcome UCLH Clinical Research Facility and UCLH Neonatal Unit15.

These infants, who were hospitalized in the special ward, were subjected to three different tasks, which included Auditory control, heel lance, and sham control, respectively, using a lancet and in different conditions of sleep or wake or in the arms of their parents, then The brain activity of these infants was recorded under heel incision using a lancet and using up to 20 electrodes that were scattered on the head according to the standard of 10–10 by Ag/AgCl type electrodes These channels included. The reference electrode is placed in, and the ground electrode is placed in or depending on the position of the baby, and the data is recorded with a sampling frequency of 2 kHz. In this data, the signal of and channels was not available for all neonates, so this study was conducted on 18 channels and the data of those channels were not used in this study. After recording, this data has been divided into 4-second parts, including 2 s before the moment of stimulation and 2 s after the moment of impulse and has been provided to the researcher. Also, by using the Premature Infant Pain Profile (PIPP), it was determined that the level of pain in infants is 21 according to the parameters in this criterion. On the PIPP scale, the rating is between 0 and 6 for no pain or little pain, 7–12 for moderate pain and a rating above 12 for severe pain. Considering that this research only focuses on pain processing, among the 112 available data files, only the data related to 107 neonates, which includes the information of 18 channels (the information of and channels was not available for all neonates), was used. Many of these data lacked PIPP rating for reasons such as incomplete behavioural and physiological information; in some of the data, there was a high movement noise that could not be resolved, so in total, out of 107 subjects, 70 subjects with clear pain levels according to the PIPP criteria and 37 subjects who were subjected to heel incision stimulation, but their pain levels were not known, were used for the pain-free condition. In order to clear any ambiguity and for the absence of data related to the level of no pain, in this research, we considered the EEG signal for the level of no pain from a period of 2 s before applying the stimulation. A sample of EEG signals with and without pain has been shown in Fig. 2.

Fig. 1
figure 1

Flowchart of the electroencephalography (EEG) analysis strategy for pain decoding.

Fig. 2
figure 2

Neonatal EEG signal recorded over 4 s; (a) with pain, (b) without pain.

Preprocessing

At first, the signal was filtered using a second-order Butterworth filter in the frequency range of 1.5 to 40 Hz. Then, we used a 4th-order Butterworth filter in the frequency range of 48–50 Hz to remove the residual effect of city electricity noise. After that, we cleaned the data by using the average signal in the period of 400 milliseconds before the moment of stimulation; in this way, we subtracted the average signal in this interval from the rest of the signal and then divided the data into parts with a length of negative 400 milliseconds before and Positively, we divided 1300 milliseconds after the moment of stimulation, But regarding the data related to the level of no pain, only the first 2 s of the signal that was before the stimulation was kept, and according to the average of the signals in the time period of 400 milliseconds before the moment of stimulation, it was used to clean the data, and then the signal was investigated for feature extraction25,26,27,28.

Time-frequency analysis of EEG signal

Many biological signals, such as the EEG signal, are unstable signals and are constantly changing with time and frequency, so in this research, we used the time-frequency representation of the EEG signal to extract features. For this purpose, at first, we reduced the sampling frequency to 1000 Hz to reduce the sampling frequency and the levels of the wavelet transform, and then we took the wavelet to transform from the EEG signal of each baby and from each channel and then extracted the features from the violet transform output.

Wavelet transform is a time-frequency transform that provides a better representation of details for data, such as the non-stationary EEG signal. In the wavelet transformation, the scaled and shifted forms of the basic functions called the mother wavelet are used, which display the details and approximation coefficients of the signal in the output, and these coefficients determine the degree of similarity between the signal and the mother wavelet. In the wavelet transform, the detail coefficients at each stage of signal decomposition are obtained by passing a high-pass filter and approximation coefficients by passing a low-pass filter with a factor of 2. In this research, the Violet transformation continued up to 7 levels, and the coefficients D5, D6, D7, D8, and A8 represent the frequency bands of delta {1.5–3.9 Hz}, theta {3.9–7.8 Hz}, alpha {7.8–15.6 Hz}, beta {15.6–31.25 Hz} and gamma {40–31.25 Hz} (Fig. 3). Also, to select the most effective mother wavelet, we checked the performance of different wavelets29,30,31 such as the Daubechies wavelet of order four “db4” and order seven “db7”, Demi family and Symlets order8 and Symlets order4 through the accuracy of the classifiers, and the sym8 wavelet was chosen because of the best performance among them (Fig. 4). Both continuous and discrete wavelet transforms (DWT) can be considered for feature extraction. Given the success of discrete wavelet transforms in previous studies on pain classification and their ability to extract features from specific frequency bands, the discrete wavelet transform was selected for this study31.

Fig. 3
figure 3

Extracting electroencephalography (EEG) features using discrete wavelet transform (DWT) with the mother wavelet sym8 and level 8.

Using the detail and approximation coefficients obtained from the Violet transformation, we received characteristics in each frequency band and channel. These features included statistical features such as mean, variance, elongation, and skewness12,29of the signal. Another feature that was used was Shannon’s entropy32, which is the entropy (H) of a random vector in information theory with probability distribution \(\:{p}_{x}\left(\gamma\right)\) defined as follows:

$$\:H\left(X\right)=-\int\:{p}_{x}\left(\gamma\right)log{p}_{x}\left(\gamma\right)d\gamma\:,$$
(1)

Entropy is a measure of the amount of disorder in the sequence. Another feature that was used in this study was the signal power spectrum In this research, the Welch method was used to obtain the signal power spectrum for each channel and each frequency band; in the Welch method, the signal is divided into a maximum of 8 parts that overlap each other by 50% and the average power spectrum of these parts is calculated. Statistical features such as mean, maximum and standard deviation of the power spectrum in each frequency band and each channel were used as features along with other features to classify pain intensity.

Fig. 4
figure 4

Choosing best mother wavelet with estimating SVM classifier.

Feature selection

Feature selection is one of the most common steps in classification methods. Feature selection is essential because it simplifies models for researchers and users, reduces learning time, prevents the problem of large dimensions and reduces over-fitting, and thus, by choosing important and meaningful features, it also increases classification accuracy. There are two general feature selection approaches: wrapper and filter methods, each of which has its advantages. In filter methods, the importance and resolution of features are checked separately, and this makes less time spent. And in wrapper methods, unlike scalar methods, they consider the relationship between the features and choose the best combination among the available features. In this research, we filter methods to select optimal features, saving time and determining whether a feature is important33.

Statistical analysis

This research used the Kruskal-Walli’s test to select the features. In this test, a generalized sample of the Wilcoxon test and a non-parametric sample of the variance test, the null hypothesis emphasizes the absence of differences between the groups according to the averages. A significance level of 0.05 was considered. In this test, it is not necessary for the data distribution to be expected, and it is also used for communities with more than two classes.

Fisher score

Among the features that had a sufficient significance level, they were used for scoring based on Fisher’s algorithm34. In this algorithm, the features that increase the difference between groups and decrease the difference in a group are scored and used using the average and variance parameters. With this assumption, the within-group variance matrix for a feature \(\:{\stackrel{-}{x}}_{ij}\) is defined as follows according to the average of each group \(\:{\:\stackrel{-}{\mu\:}}_{i}\):

$$\bar{\bar{S}}_{w} = \sum\limits_{{i = 1}}^{c} {p_{i} \left( {\frac{1}{{n_{i} }}\sum\limits_{{j = 1}}^{{n_{i} }} {\left( {\bar{x}_{{ij}} - \bar{\mu }_{i} } \right)\left( {\bar{x}_{{ij}} - \bar{\mu }_{i} } \right)^{t} } } \right)} .$$
(2)

The inter-class variance matrix is ​​also defined according to the average of each class and the average of all groups as follows:

$$\:{\bar{\bar{S}}}_{b}=\sum\limits_{i=1}^{c}{p}_{i}\left(\frac{1}{{n}_{i}}\sum\limits_{i=1}^{{n}_{i}}\left({\stackrel{-}{\mu\:}}_{i}-\stackrel{-}{\mu\:}\right){\left({\stackrel{-}{\mu\:}}_{i}-\stackrel{-}{\mu\:}\right)}^{t}\right).$$
(3)

Finally, for the kth feature, the Fisher score is equal to:

$$\:Fisher\left(k\right)=\frac{{S}_{b}\left(k\right)}{{S}_{w}\left(k\right)}.$$
(4)

Classification model and evaluation

Unbalanced data

Another thing that should be considered in the classification problem is to pay attention to the equality of the number of subjects in each group. The issue of unbalanced data in classes can cause inaccuracy in the obtained results and overfitting of the data. The imbalance of data causes the number of samples in one class to be low or high compared to others, and as a result, the problem of data dependence occurs during the training of the model, and it is not able to correctly predict new samples Therefore, SMOTE (Synthetic Minority Oversampling Technique) method was used in this research due to the imbalance in the number of samples of the classes and to solve this issue35. In this method, data is created for the minority class in proportion to the number of samples of the majority class so that the number of samples of the two classes is equal to each other. This method uses over-sampling, and instead of copying samples, it generates samples from them in their neighbourhood. With the help of distance measurement, this algorithm selects two or more similar samples and creates a new sample from them in the neighbourhood until the number of minority and majority class samples is equal to each other. In this data, the number of samples from the no-pain and moderate pain groups increased to the maximum group, which was the low-pain class group, and the total number increased from 107 to 132 through oversampling.

Different classifiers have been used to classify pain levels36,37,38. In this research, we used SVM, KNN and Bagging collective learning classifiers to evaluate the performance of each classifier in pain level classification from EEG signal. The selection of SVM, KNN, and Bagging classifiers for this research was based on several technical and empirical reasons. SVM, with its ability to create complex decision boundaries and its robustness to high-dimensional data, is well-suited for classifying the intricate patterns found in EEG signals. KNN, due to its simplicity and interpretability, was a suitable choice for initial data analysis. Additionally, Bagging, by reducing variance and improving accuracy, helped to increase confidence in the results. Experimental results demonstrated that these classifiers outperformed other methods considered for pain intensity classification. This choice is also consistent with findings from other related studies in this field.

In Support Vector Machine (SVM) classification, a hyperplane is constructed in the feature space to optimally separate different classes. The goal is to maximize the margin, which is the distance between the hyperplane and the closest data points from each class.

While linear kernels can efficiently separate linearly separable data in the original feature space, many real-world datasets, including EEG signals, exhibit complex patterns that are not linearly separable. To address this, SVM employs the kernel trick, which implicitly maps the data into a higher-dimensional feature space. This allows for the construction of non-linear decision boundaries, such as those defined by Gaussian kernels. Based on our experimental results on a dataset of 107 EEG recordings, with 720 features extracted from each recording, and among the kernels that are used such as linear, Radial Basis Function and polynomial, the Gaussian kernel demonstrated superior classification accuracy for our EEG-based pain intensity classification task.

The K-Nearest Neighbors (KNN) algorithm is a widely used non-parametric method for classification tasks. In KNN, the classification of a new data point is determined by a majority vote of its k nearest neighbors in the feature space. The distance metric used to find these neighbors can vary, with Euclidean distance being a common choice. One of the key advantages of KNN is its simplicity and interpretability, making it a valuable tool for initial data exploration. Additionally, KNN does not make any assumptions about the underlying data distribution, making it suitable for a wide range of datasets.

In this study, we employed the KNN classifier to classify pain intensity. Through experimentation with various k values 3, 5, and 10 and distance metrics such as Chebyshev, Minkowski, Euclidean, exhaustive, we observed that the KNN classifier achieved the highest accuracy in classifying levels of pain when using k = 3 and the Chebyshev distance metric.

Ensemble learning is a machine learning technique that combines multiple models to improve predictive performance. By creating diverse models and aggregating their predictions, ensemble methods can reduce overfitting, improve generalization, and enhance robustness. Bagging is a popular ensemble method, involves training multiple models on different bootstrap samples of the training data. Bootstrap sampling involves randomly sampling with replacement from the original dataset. This creates diverse training sets, leading to diverse models. When making predictions, the predictions of all base models are combined, typically through majority voting. In pain intensity classification, the number of samples in each class is often imbalanced. Bagging can help mitigate the impact of imbalanced data by creating multiple models, each with a slightly different distribution of samples. This can lead to more robust and accurate predictions, especially for the minority class. By creating multiple models and combining their predictions, bagging can reduce overfitting, which is a common problem in machine learning, especially when dealing with complex models and limited data. Ensemble methods like bagging can improve the generalization ability of a model, making it more capable of accurately predicting on unseen data. Bagging can help to stabilize the model’s performance, making it less sensitive to variations in the training data. bagging is a powerful technique for pain intensity classification due to its ability to handle imbalanced datasets, reduce overfitting, and improve generalization. By combining multiple models, bagging can provide more robust and accurate predictions, leading to better clinical decision-making.

To evaluate the classification algorithm in a range of features, a 10-fold cross-validation method was adopted in the proposed study. This evaluation method has better validity due to the limited number of subjects. As shown in Fig. 1, the subjects are divided into ten equal portions using random selection in the tenfold validation method. The ML model is trained using nine portions for testing with the remaining portion. This process is repeated ten times, changing the test and training portion at each stage. In each repetition, the values of different performance parameters are obtained, and the average of the results is defined as the performance of the ML model, and this algorithm repeated for 50 times and the end the average of the result calculated. The classification results have been evaluated using five metrics: accuracy (%), sensitivity (%), specificity (%), AUC scores and Cohen’s kappa. confusion matrix21 for two pain level classification Given the basic statistics of true positive (TP) represents the number of pain subject detected correctly, true negative (TN) measures the number of no-pain averse no-pain predicted correctly, false negative (FN) represents the number of pain group detected as no-pain group, and false positive (FP) represents the number of no-pain predicted as pain group39,40. Therefore, the evaluation metrics can be calculated as follows:

$$\:\text{A}\text{c}\text{c}\text{u}\text{r}\text{a}\text{c}\text{y}=\frac{\text{T}\text{P}\:+\:\text{T}\text{N}}{\text{T}\text{P}+\text{T}\text{N}\:+\:\text{F}\text{P}\:+\:\text{F}\text{N}},$$
(5)
$$\:\text{S}\text{e}\text{n}\text{s}\text{i}\text{t}\text{i}\text{v}\text{i}\text{t}\text{y}=\frac{\text{T}\text{P}\:}{\text{T}\text{P}+\:\text{F}\text{N}},$$
(6)
$$\:\text{S}\text{p}\text{e}\text{c}\text{i}\text{f}\text{c}\text{i}\text{t}\text{y}=\frac{\:\text{T}\text{N}}{\text{T}\text{N}\:+\:\text{F}\text{P}\:},$$
(7)
$$\:\text{P}\text{r}\text{e}\text{c}\text{i}\text{s}\text{i}\text{o}\text{n}=\frac{\text{T}\text{P}}{\text{T}\text{P}+\:\text{F}\text{P}\:},$$
(8)
$$\:Kappa=\:\frac{2(TP.TN-FP.FN)}{\left(TP+FP\right).\left(FP+TN\right)+\left(TP+FN\right).(FN+TN)}.$$
(9)

Results

In this study, we classified the pain intensity of 107 infants with painful stimulation during heel lancing and using EEG signals. At first between pain and no-pain group, the data on the level of low pain and moderate pain are placed in one group, and the data related to the state of no pain are also placed in one group after that In order to eliminate the effect of the different data of the two classes, from the pain class, which has a larger number, only the number of samples of the no-pain class were randomly selected, so that the pain-free group has 37 subjects, so in total, to evaluate two levels in total We had 74 subjects. About second, we investigate three level of pain of 107infants, this group contain no-pain, low pain and moderate pain. SMOTE method was used to solve the effect of unbalanced data in classes, so that the number of samples in classes are equal by using interpolation so the total number increased from 107 to 132 through oversampling. After this, those features that had a significance value greater than 0.05 and a high Fisher score were divided into test and training data, and then SVM, KNN and Bagging classifiers were trained and tested. The results of the average repetition of 50 times of 10-fold validation are as follows:

Classification results of three level pain: low pain, no pain, and moderate pain

In this case, the time-frequency features of the EEG signal were obtained, and the number of these features was equal to 720 (18 electrodes*8 features*5 frequency bands), which was reduced to 569 numbers after selecting the feature using the statistical test and Fisher’s score, and the results of the classifiers in the classification of three levels are as shown in Table 1.

The results show that the accuracy of all classifiers is higher than the chance level. The SVM classifier has the best performance with an accuracy of 84.4%, and KNN and Bagging classifiers have 72.8% and 82.9% accuracy, respectively. The confusion matrix of all applied that shows in Fig. 4a, the matrix shows the best performance of SVM classifier.

Table 1 The results of decoding in three levels of pain.

Classification results of two levels of pain and no pain

Between group of pain and no-pain, according to classification of three level of pain, 720 features extracted, among them those had p-value < 0.05 and high Fisher’s score, selected and were used for classification. The ML algorithm was repeated for 50 times so the average of 50 repetitions of 10 folds is as shown in Table 2 and confusion matrix of best classifier shown in Fig. 4b.

Table 2 The results of pain decoding in two levels of pain and no pain.

Figure 5a and b respectively show the average confusion matrices of the best classifier for the two-level and three-level classifications.

Fig. 5
figure 5

Average Confusion matrix for (a) two-class (1: no pain, 2: pain), (b) three-class (1: no pain, 2: low pain, 3: moderate pain).

The result showed the best performance is for bagging classifier and is 100%, due to the number of subjects is equal in each group the overfitting not happened, and other classifier have good performance.

Fig. 6
figure 6

Classification performance for each frequency band (delta, theta, alpha, beta, and gamma).

Discussion

As it was said, the experience of feeling pain is completely unique and subjective, so the method that can be used to measure the intensity of pain without the intervention of a person is important. One of the most important groups studied in the field of assessing the severity of pain is newborns. Considering that the feeling of pain is associated with the activation of a part of the brain and the connection between different parts of the brain is dynamic, then it is possible to obtain important features by considering the activity of the brain at different levels of pain. This study presents a classification framework based on neonatal EEG that not only classifies pain and non-pain states, but also measures different levels of pain. In accordance with the classification method presented using electroencephalogram signals of newborns and using machine learning algorithms, the pain intensity is initially in three states of no pain, mild pain, and moderate pain, and then in two states of pain and absence of pain in a non-invasive approach was investigated. Due to the number of subjects in each level wasn’t equal and it cased to bad effect in result and overfitting so we used the SMOTE Technique, this algorithm using oversampling increases the subject of minority class. Time-frequency features were extracted using WDT from five frequency bands, those features which have significant effect on classification were selected by statistical analysis and fisher score algorithm. For SVM classifier the best performance of investigating three level pain, this method was evaluated among 107 infants and in three level with an accuracy of 89.4%, a Sensitivity of 80.9%, a Specificity of 86.2% and AUC of 0.94. For bagging classifier, the best performance of investigating two class with 74 subjects, the best performance of this method was evaluated with an accuracy of 100%, a Sensitivity of 100%, a Specificity of 100% and AUC of 1.

According to the features that had significance and high scores for pain classification after the two stages of Fisher’s score and statistical test, according to the order of placement of the features when extracting the features from each band(delta, theta, alpha, betta and gamma), these features belong to the delta frequency band. Also, by examining the best features of each frequency band separately (144 features of all channels), as showed in Fig. 6, it is clear that the features related to the delta band are more effective in diagnosing and classifying pain. These results were in compliance with the results reported by M. Kaur et al.30.

By examining the features that were used in each iteration as features with a high Fisher score and significance less than 0.05 (P < 0.05) in the classification, that these features are often from the P8 and CPz channels which are also in compliance with past research30,41. Therefore, it can be concluded that the parietal lobe of the brain is more effective in evaluating the level of pain. The parietal lobe is one of the four main lobes of the brain of living beings and is located in the upper part of the middle hemisphere of the brain, which is between the frontal lobe and the occipital lobe and above the temporal lobe. The sense of touch, the recognition of colour, size and shape from each other, and the feeling of pain from the functions of this lobe are also the main sensory inputs of the skin (temperature, touch and pain receptors) are transferred to the parietal lobe through the thalamus.

In order to compare the proposed method with state-of-the-art methods in this field, their results in terms of number of pain classes, the utilized EEG features and the classification accuracy are demonstrated in Table 3. It should be considered that the best result for each method is shown. In this study the accuracy of classification in three levels is more than chance level and is acceptable and in two levels the proposed method has better result than other, it showed these methods are very responsible and can use for other studies.

Table 3 Comparison with the state of the art pain classification approaches.

The study was limited by the short duration of the EEG signal, which was only 4 s. A longer signal would have allowed for the development of a more robust preprocessing pipeline capable of automatically eliminating a significant portion of artifacts, thereby enhancing the overall performance. Furthermore, a larger dataset with a greater number of samples across various pain intensity levels would have enabled a more comprehensive analysis and potentially revealed finer distinctions in infant pain perception. Additionally, collecting data from a larger and more diverse population of infants could help to generalize the findings and identify potential biomarkers for pain assessment in this population.

Conclusion

In this article, we extracted the effective features in the classification of different levels of pain from the EEG signal and from infants, and using it, we separated the pain intensity in three levels and two levels. In this study, several classifiers were used and the results obtained are the result of several stages of evaluation. The two groups of pain and no-pain were separated from each other with 100% accuracy, and the three groups (low pain, moderate pain, and no pain) were also well separated from each other with 84.2% accuracy. Moreover, results of the proposed method for different number of pain classes statistically outperform state-of-the-art schemes for the same number of classes.