Introduction

Chronic pain affects 50 million Americans (20.4% of US adults) according to estimates by the Centers for Disease Control and Prevention (CDC), and it has been linked to high healthcare costs as well as decreased quality of life1. Spinal cord stimulation (SCS), a widely used FDA-approved treatment, has been associated with improved clinical outcomes when compared to opioids2. However, a significant portion of pain patients receive suboptimal care due to the lack of clear criteria for parameter and device selection, and patient selection remains a challenge3. The absence of objective biomarkers further complicates pain management and treatment4.

To address this challenge, EEG offers several advantages for understanding chronic pain and its treatment with SCS, making it a valuable tool in both clinical and research settings5. One of the most important aspects might be its compatibility with all clinically available SCS devices and its integration with other assessment tools. Intraoperative EEG is a commonly used method in neurosurgery. It has been used for dorsal root ganglion stimulation lead placement6 and studies have observed nociceptive activation in intraoperative EEG while patients are under general anesthesia7. However, its application in chronic pain management remains novel, with limited studies utilizing intraoperative EEG for comprehensive analysis of neural pain patterns8,9.

There are currently a variety of self-reporting methods to quantify pain10,11,12,13,14,15,16. However, due to the complex and multifactorial nature of pain, these methods often fail to capture its full scope, highlighting the need for machine learning (ML) to integrate diverse physiological, neural, and clinical data for more objective and comprehensive pain assessment. ML is a rapidly growing field in pain assessment. Studies have found that ML models can approximate self-reported pain scores, the current clinical standard, solely based on physiological features recorded with real-world context17,18. For example, researchers demonstrated that ML models trained on baseline features, including demographics, diagnostic features, and self-reported pain scores, without incorporating any neural data, could predict responders to SCS with an accuracy of 50–73%19,20. A recent study recorded intracranial EEG and showed that self-reported pain scores could be predicted with high sensitivity from neural features, but it was limited by the small sample size of just four participants21.

The successful prediction of SCS outcomes for chronic pain patients would represent a significant stride forward for personalized medicine and science as a whole. In the short term, intraoperative prediction would accelerate precise device selection, avoiding the need for patients to have multiple operations to sample several devices. Predicting responders would also empower patients by giving them insight into their most likely outcomes before they go through the months-long process of having their SCS settings clinically optimized. In the long term, models trained to predict responders could be used to unlock the neurophysiological bases of chronic pain, helping us understand and design more personalized SCS. The current study aims to predict which chronic pain patients will respond positively to SCS based on intraoperative neural patterns. To date, no studies have published results for this task in the literature.

Methods

Participants

Twenty chronic pain patients who underwent spinal cord stimulation implant surgery for their chronic back and/or leg pain as standard of care were included in the present study. All patients provided written informed consent to participate in the study. The experimental protocol was approved by the Institutional Review Board of Albany Medical Center (IRB Number: 4973), and the research was conducted in accordance with the Helsinki Declaration.

Patient reported outcome measures

To monitor the patients’ levels of pain over time, a number of self-reported scores were recorded as per routine protocol. These included the Numeric Rating Scale (NRS), which assesses pain intensity; the Beck Depression Inventory (BDI), which measures the extent of depressive symptoms due to chronic pain13; the Oswestry Disability Index (ODI), which quantifies the degree of disability associated with chronic pain14; the McGill Pain Questionnaire (MPQ), which assesses many aspects of pain, including its emotional impact and intensity15; and the Pain Catastrophizing Scale (PCS), which measures how much a patient’s chronic pain induces thoughts about worst-case scenarios or extremely pessimistic futures16. These scores were collected preoperatively and 3 months (on average) after surgery. To add depth to the measurements, the patients were also asked about their absolute worst, best, and average levels of pain over the preceding week on the NRS at each follow-up. To obtain a more robust pain intensity score, NRS-now and NRS-average were averaged, and responders were defined as those with a decrease in this score (labeled as NRS-average) of at least 50% within the first three months. This threshold was used because it is a widely accepted standard and was included in the IMMPACT recommendations as the criterion for substantial improvement in pain22.
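The responder definition above can be written as a short rule. The following is a minimal sketch, assuming the composite score is the plain arithmetic mean of NRS-now and NRS-average and that the reduction is computed relative to the preoperative composite; the function names and the example patient are hypothetical and are not taken from the study.

```python
def composite_nrs(nrs_now: float, nrs_average: float) -> float:
    """Mean of the current and past-week average NRS ratings (0-10 scale)."""
    return (nrs_now + nrs_average) / 2.0


def is_responder(pre_score: float, post_score: float, threshold: float = 0.5) -> bool:
    """True when the composite score fell by at least `threshold` (50% by default)."""
    if pre_score <= 0:
        raise ValueError("preoperative score must be positive to compute a relative change")
    reduction = (pre_score - post_score) / pre_score
    return reduction >= threshold


# Hypothetical patient: ratings of 8 and 7 before surgery, 3 and 4 at follow-up.
pre = composite_nrs(8, 7)    # 7.5
post = composite_nrs(3, 4)   # 3.5
print(is_responder(pre, post))  # True: reduction = 4.0 / 7.5, about 53%
```

A patient whose composite score drops from 8.0 to 5.0 (a 37.5% reduction) would be labeled a nonresponder under the same rule.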

Surgical procedure and neuro monitoring

Surgeries were performed while patients were under general anesthesia at appropriate levels for intraoperative neurophysiological monitoring (IONM) per a widely accepted routine23. Leads were placed covering T9-10 and T8-9 either via laminectomy or percutaneously using standard-of-care surgical methods as previously described23,24. Briefly, the incision site and lead placement were planned preoperatively with respect to the “sweet spot” from the trial, which identified the ideal stimulation location. An initial midline incision was made over the posterior spinous process, guided by fluoroscopic localization. Electrocautery and surgical dissection were used to expose the spinous process and lamina. Following intraoperative fluoroscopic confirmation of the correct level, a laminectomy adequate for placement of the SCS paddle was performed. Serial fluoroscopic imaging and IONM confirmed placement and intact neurological status. The routine IONM protocol included monitoring muscle activity in the abdominals, quadriceps, tibialis anterior, abductor hallucis, medial gastrocnemius, biceps femoris, and gluteus maximus under stimulation23,25. To check laterality, each lead was temporarily connected to a manufacturer-specific testing device, a contact of interest was selected as the active contact based on the predicted ‘sweet spot’, and compound muscle action potentials (CMAPs) in response to tonic stimulation (60 Hz, 300 µs) at gradually increased amplitudes (0 to 10 mA) were monitored. Once placement was deemed adequate for long-term therapy and the implantable pulse generator (IPG) was placed subcutaneously, our EEG experiment was performed by connecting the commercial paddle to its pulse generator and delivering stimulation to the same active contacts. The surgery was completed in standard fashion23,26.

EEG recording and analysis

Intraoperative EEG data was recorded at 10 channels (F3-4, FP1-2, C3-4, CP3-4, and T3-4) in accordance with the extended 10–20 system using the Cadwell Cascade Pro IONM systems (Cadwell Inc., Kennewick, Wash., USA). EEG signals were bandpass filtered at 1–58 Hz and sampled at 128 Hz. A 90-second baseline (stimulation OFF) was recorded at the beginning of each session, followed by randomized, 60-second-long alternating stimulation ON and OFF conditions. The following types of stimulation were recorded twice each: (i) Tonic at 60 Hz/300 µs and (ii) High-density (HD) Stimulation at 1 kHz/30 µs. The stimulation OFF periods separated the stimulation ON periods to allow for signal recovery. The contacts and current amplitude were determined by SCS screening trials.

Data was exported for offline analysis using custom scripts in MATLAB R2021a (Mathworks, Natick, MA). All EEG recordings were visually examined to identify corrupted channels/trials, as well as mechanical artifacts. After corrupted data was rejected, signals were filtered via a second-order Butterworth high-pass filter (2 Hz). Independent component analysis (ICA) was performed to remove heartbeat artifacts27. Subsequently, a common average reference (CAR) filter was applied to each trial to remove common activity across all channels28.
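The common average reference step can be illustrated with a few lines of code. This is a minimal sketch in plain Python rather than the authors' MATLAB scripts; it assumes a trial is stored as a list of channels with equal-length sample lists, and the function name is hypothetical.

```python
from statistics import fmean


def common_average_reference(trial):
    """Subtract the across-channel mean from every channel at each sample.

    `trial` is a list of channels; each channel is a list of samples.
    """
    n_samples = len(trial[0])
    if any(len(channel) != n_samples for channel in trial):
        raise ValueError("all channels must have the same number of samples")
    # Mean over channels, computed separately for each time point.
    means = [fmean(channel[t] for channel in trial) for t in range(n_samples)]
    return [[channel[t] - means[t] for t in range(n_samples)] for channel in trial]


# Two-channel toy trial: activity shared by both channels is removed.
print(common_average_reference([[1.0, 2.0], [3.0, 6.0]]))
# [[-1.0, -2.0], [1.0, 2.0]]
```

After re-referencing, the channels sum to zero at every sample, which is the property that removes activity common to all channels.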

To estimate the power spectral density of EEG signals in each condition, a fast Fourier transform (FFT) was computed in 5-second epochs with a 1-second sliding Hanning window29. Next, the average magnitude of Fourier coefficients across trials was computed for the theta (4–7 Hz), alpha (8–12 Hz), beta (13–33 Hz), and gamma (35–50 Hz) subbands, as well as the overall spectrum (1–50 Hz). Subband power values were normalized for each stimulation condition relative to baseline on a logarithmic decibel (dB) scale. Additionally, the band-specific peak frequency and the peak power ratio between the alpha and theta bands were calculated. Each of these EEG features was extracted for all contacts, allowing for an investigation of the topographical distributions. This feature extraction approach yielded a total of 100 relative power features (2 SCS conditions × 10 channels × 5 bands), 30 peak power ratio features (3 SCS conditions × 10 channels × 1 band), and 150 peak frequency features (3 SCS conditions × 10 channels × 5 bands) per subject.
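The spectral features described above can be sketched as follows. This is an illustrative, dependency-free example, not the study's MATLAB code. It assumes that "5-second epochs with a 1-second sliding window" means 5 s Hann-windowed epochs advanced in 1 s steps, and that the dB normalization is 10·log10 of the stimulation-to-baseline ratio; both are assumptions, as are all function names. A plain discrete Fourier transform stands in for the FFT to avoid external libraries.

```python
import cmath
import math

FS = 128  # sampling rate in Hz, as stated in the recording description
BANDS = {"theta": (4, 7), "alpha": (8, 12), "beta": (13, 33),
         "gamma": (35, 50), "global": (1, 50)}


def sliding_epochs(signal, epoch_s=5, step_s=1):
    """Yield epochs of `epoch_s` seconds, advancing by `step_s` seconds (assumed scheme)."""
    size, step = epoch_s * FS, step_s * FS
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def magnitude_spectrum(epoch):
    """Hann-windowed DFT magnitudes of one epoch; returns (freqs, mags)."""
    n = len(epoch)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(epoch, window)]
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        freqs.append(k * FS / n)
        mags.append(abs(coeff) / n)
    return freqs, mags


def band_features(freqs, mags, band):
    """Return (mean magnitude, peak frequency) inside the named band."""
    lo, hi = BANDS[band]
    idx = [i for i, f in enumerate(freqs) if lo <= f <= hi]
    mean_mag = sum(mags[i] for i in idx) / len(idx)
    peak = max(idx, key=lambda i: mags[i])
    return mean_mag, freqs[peak]


def relative_db(stim_value, baseline_value):
    """Stimulation value relative to baseline in dB (10*log10 is an assumption)."""
    return 10 * math.log10(stim_value / baseline_value)


# Feature counts per subject, following the arithmetic given in the text.
N_FEATURES = 2 * 10 * 5 + 3 * 10 * 1 + 3 * 10 * 5  # 100 + 30 + 150 = 280

# Synthetic check: a 5 s, 10 Hz sine should peak at 10 Hz in the alpha band.
epoch = [math.sin(2 * math.pi * 10 * i / FS) for i in range(5 * FS)]
freqs, mags = magnitude_spectrum(epoch)
_, alpha_peak = band_features(freqs, mags, "alpha")
print(round(alpha_peak, 1))  # 10.0
```

With 5 s epochs at 128 Hz the frequency resolution is 0.2 Hz, which bounds how finely peak frequencies can be resolved under these assumptions.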

Statistics

Participant data was summarized by mean ± standard error of mean. A Mann-Whitney U test was used to compare age, sex, and all PROMs between responders and nonresponders. The Fisher’s exact test was used to compare diagnosis (4 categories) and pain location (5 categories) between responders and nonresponders and identify the significant variables. To compare the preoperative and postoperative PROMs within groups, a two-tailed Wilcoxon signed-rank test was conducted.

The Shapiro-Wilk test and Levene’s test were used to assess the normality of the feature distributions and the homogeneity of variances, respectively. Spectral features that passed these tests were analyzed using an independent samples t-test and a 3-way analysis of variance (ANOVA) with the independent factors of group (responders and nonresponders), SCS condition (OFF, tonic, HD), and cortical region (prefrontal, frontal, motor, somatosensory, and temporal). Features with non-normal distributions were analyzed using the Mann-Whitney U test. Post hoc comparisons were Bonferroni-corrected. Descriptive statistics were reported as mean ± standard deviation. Analysis was conducted using IBM SPSS version 29.0.1.1 (SPSS Inc., Chicago, IL, United States), with a significance threshold of 0.05. Correlations between the neural features in each region and preoperative PROMs were analyzed via Spearman’s rank order correlation using MATLAB statistical toolbox (Mathworks, Natick, Massachusetts).
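For readers unfamiliar with Spearman’s rank order correlation, the sketch below shows the computation in plain Python: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. It is an illustration only, not the MATLAB routine used in the study, and it does not compute p-values; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: a strictly decreasing relationship gives rho = -1.
print(round(spearman_rho([1, 2, 3, 4], [10, 9, 7, 2]), 3))  # -1.0
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic rescaling of either variable, which suits ordinal questionnaire scores.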

Machine learning

To determine EEG features that account for the most variance, we applied principal component analysis (PCA) to rank-order the features, enabling objective selection of inputs for the ML classifiers30. We then used leave-one-out cross-validation (LOOCV) across all patients, training on all but one patient in each iteration and testing on the excluded patient. To promote model generalizability, feature selection was conducted only once prior to LOOCV, ensuring that the same features were used to predict across all folds.
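The leave-one-out scheme described above can be sketched in a few lines. This is a minimal illustration, assuming a nearest-centroid rule as a dependency-free stand-in for the classifiers actually compared in the study; the data, labels, and function names are hypothetical. Feature selection is deliberately absent from the loop, mirroring the description that it was done once beforehand.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: assign x to the class with the closest mean vector."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda label: dist2(centroids[label], x))


def leave_one_out(X, y, predict=nearest_centroid_predict):
    """Train on all patients but one, predict the held-out patient, repeat for each."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions


# Hypothetical single-feature data for six patients (1 = responder).
X = [[0.0], [0.2], [0.1], [1.0], [1.2], [0.9]]
y = [0, 0, 0, 1, 1, 1]
print(leave_one_out(X, y))  # [0, 0, 0, 1, 1, 1]
```

Each patient contributes exactly one held-out prediction, so the list returned by `leave_one_out` has one entry per patient and can be scored against the true labels directly.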

For hyperparameter tuning and random seed selection, we conducted a grid search before beginning LOOCV, which ensured consistent hyperparameters across all iterations and avoided tailoring model settings or initializations to specific data subsets. We trained various classifiers, including logistic regression, decision trees, support vector machines (SVM), XGBoost, and random forests31,32,33,34. Models were evaluated using accuracy, F1 score, and area under the curve (AUC) of receiver operating characteristic (ROC) aggregated across folds. For the best architecture from the initial set of experiments, we performed a second grid search to optimize the number of EEG features that were to be used as inputs from the rank-ordered list from PCA, continually adding the remaining features with the highest rank.
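With leave-one-out cross-validation each fold contains a single test patient, so a ROC curve cannot be computed within a fold. One way to read "aggregated across folds" is to pool the held-out scores from all folds and compute a single AUC; this reading is an assumption, not a detail stated in the text. The sketch below computes AUC as the probability that a randomly chosen responder receives a higher score than a randomly chosen nonresponder, with ties counted as one half. The function name and example values are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC from pooled held-out scores; labels are 1 (positive) or 0 (negative)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical pooled scores from four folds: 3 of 4 pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

This pairwise formulation is equivalent to the area under the empirical ROC curve and is practical for cohorts of this size.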

Results

Patient cohort and outcome measures

A total of 20 patients with chronic pain (10 female, aged 54.4 ± 2.95 years on the date of surgery) were recruited for this study. Three patients were excluded from the study due to insufficient postoperative NRS data for classification. Of the 17 patients included, 7 subjects were grouped as responders with a mean age of 54.14 ± 5.82 years and 10 subjects were grouped as nonresponders with a mean age of 53.1 ± 3.41 years (Table 1). All patients were diagnosed with chronic neuropathic pain (CNP), including specific types such as chronic low back pain (CLBP), persistent spinal pain syndrome (PSPS), and complex regional pain syndrome (CRPS). Among responders, 3 patients were treated for CLBP, 2 for CNP, 1 for PSPS, and 1 for CRPS, while among nonresponders, 3 were treated for CLBP, 6 for PSPS, and 1 for CNP. Most of the participants suffered from lower back and leg pain (responder/nonresponder: 5/4), while 2 responders suffered from lower back pain only, 2 nonresponders suffered from bilateral leg pain only, and 4 nonresponders suffered from pain in another body area(s). Statistical analyses did not show a significant difference in age (U = 31, p = 0.696), sex (U = 32.5, p = 0.778), diagnosis (p = 0.199), or pain location (p = 0.111) between responders and nonresponders.

Table 1 Demographic and clinical data for responders and nonresponders.

PROMs were collected from each patient preoperatively and at 3 months. None of the preoperative outcome measures, including NRS-worst (U = 19.5, p = 0.118), NRS-average (U = 21, p = 0.169), BDI (U = 31.5, p = 0.733), MPQ (U = 26.5, p = 0.404), PCS (U = 15, p = 0.317), and ODI (U = 32, p = 0.768), demonstrated a statistically significant difference between responders and nonresponders. Postoperative scores were significantly lower in responders for NRS-worst (U = 13, p = 0.028), NRS-average (U = 1, p < 0.001), and ODI (U = 10, p = 0.030), while BDI, MPQ, and PCS did not show a significant difference (U = 28.5, p = 0.524; U = 19.5, p = 0.128; and U = 23.5, p = 0.396, respectively). Within-group comparisons in responders demonstrated significant improvement in NRS-average (p = 0.018), PCS (p = 0.043), and ODI (p = 0.046) from the preoperative to the 3-month postoperative period, while no significant improvement was noted in NRS-worst (p = 0.088), BDI (p = 0.343), or MPQ (p = 0.089). In nonresponders, the only significant reduction following surgery was in NRS-worst (p = 0.041; NRS-average: p = 0.192; BDI: p = 0.125; MPQ: p = 0.089; ODI: p = 0.672; PCS: p = 0.469). Additional patient characteristics can be found in Table 1.

Spectral features with correlations

Global activity

When examining global activity changes across different conditions (baseline, tonic, HD), we observed that the average spectrum (1–50 Hz, Fig. 1A, B) in responders showed stronger and faster alpha trends with greater variance (5.39 ± 4.43 µV²/Hz; 5.36 ± 4.84 µV²/Hz; 5.59 ± 5.05 µV²/Hz, respectively) compared to nonresponders (4.43 ± 4.67 µV²/Hz; 4.65 ± 2.19 µV²/Hz; 4.75 ± 2.07 µV²/Hz, respectively). However, these differences were not statistically significant in any condition (p > 0.05). While none of the differences in global activity between responders and nonresponders reached statistical significance, higher trends in variance in HD may suggest varying stimulation-induced cortical responses. The global theta trends (Supplementary Fig. 1) were higher in nonresponders (5.09 ± 1.86 µV²/Hz; 5.15 ± 2.17 µV²/Hz; 5.24 ± 2.39 µV²/Hz, respectively) compared to responders (4.50 ± 2.76 µV²/Hz; 4.08 ± 2.76 µV²/Hz; 4.02 ± 2.69 µV²/Hz, respectively) in all conditions; however, these changes were not statistically different (p > 0.05). Global beta band activity (13–33 Hz) decreased with both stimulation conditions in responders, more uniformly with HD resulting in a smaller variance (0.50 ± 0.41 µV²/Hz). The highest variance in gamma band (35–50 Hz) was localized to nonresponders and noted in tonic stimulation (0.39 ± 1.03 µV²/Hz).

Fig. 1

Global activity. (A) Power spectrum density (PSD) estimates. PSDs were averaged across EEG regions and patients per group during stimulation OFF, tonic and HD. Blue: Responders. Red: Nonresponders. Shaded area indicating ± standard deviation. HD: high-density. (B) Global alpha band activity. Boxplots indicating global alpha activity (8–12 Hz) during baseline (top row), tonic stimulation (middle row), and HD stimulation (bottom row). Blue: Responders. Red: Nonresponders. Black lines indicating the groups’ mean.

Alpha-theta peak power ratio

A three-way ANOVA showed a marginal effect of the group factor, suggesting that responder status might affect the alpha-theta peak power ratio (F(1,225) = 3.44, p = 0.055). Responders showed a higher alpha-theta ratio than nonresponders (1.47 ± 0.26 dB vs. 0.84 ± 0.22 dB). Additionally, pairwise comparisons demonstrated a significant difference in alpha-theta peak power ratio between CP3-CP4 and T3-T4 (adjusted p = 0.019), with the strongest activity in T3-T4 during HD stimulation (Fig. 2A, bottom-row), and the lowest in CP3-CP4 during tonic stimulation in nonresponders (Fig. 2A, middle-row). Alpha-theta peak power ratio was notably higher in frontal (F3-F4) and prefrontal (FP1-FP2) regions in responders; however, this difference did not survive multiple-comparison correction. No significant group × region × SCS interaction was found (p > 0.05).

Fig. 2

Comparison of alpha/theta peak power ratio between responders and nonresponders. (A) Topographical maps of alpha-theta peak power ratio in responders and nonresponders during stimulation OFF (top-row), tonic (middle-row) and HD (bottom-row). Color bar indicating alpha-theta peak power ratio in decibel (dB) scale. (B) Correlation analysis between alpha/theta peak power ratio in FP1-FP2 during baseline and preoperative Numeric Rating Scale (NRS) worst pain scores. Blue circle: Responders. Red diamond: Nonresponders. (C) Correlation analysis between alpha/theta peak power ratio in C3-C4 during baseline and preoperative Beck Depression Inventory (BDI) scores.

As shown in Fig. 2B, activity in FP1-FP2 during baseline was significantly correlated with preoperative NRS-worst and NRS-average scores in responders (r = −0.829, p = 0.030; r = −0.800, p = 0.041, respectively). This indicated that patients with severe pain scores tended to have smaller prefrontal region alpha-theta peak power ratio. While trends were similar in the nonresponders, correlation was not significant. Figure 2C presents the significant correlation between alpha-theta peak power ratio in C3-C4 during baseline and preoperative BDI scores in nonresponders (r = −0.709, p = 0.028). Similarly, a significant correlation was found between baseline CP3-CP4 and preoperative BDI scores in nonresponders (r = −0.660, p = 0.044). These indicated that patients with severe depression scores tended to have smaller central-centroparietal alpha-theta peak power, indicating stronger theta activity. While trends were similar in the responders, correlation was not significant.

Global relative power

The relative global power (1–50 Hz), computed with respect to baseline, also demonstrated a significant main effect of group (F(1,150) = 5.99, p = 0.016). The relative global activity was significantly smaller in responders (adjusted p = 0.016), and the smallest activity was localized to CP3-CP4 under HD (−0.72 ± 1.25 dB) followed by tonic stimulation (−0.62 ± 1.25 dB). While not statistically significant (p > 0.05) following multiple-comparison correction, descriptive statistics of within-group distribution indicated that responders demonstrated lower activity in central and frontal-prefrontal regions under tonic stimulation (Fig. 3A). While trends in the central area remained low under HD, similar to tonic, HD induced stronger activity in frontal-prefrontal regions. Trends in nonresponders did not show a change in response to tonic or HD stimulation. Analysis did not show a significant interaction across group, region, and SCS condition.

Fig. 3

Comparison of relative power between responders and nonresponders. (A) Topographical distribution of relative global (1–50 Hz) power in tonic (top-row) and HD (bottom-row). Color bar indicating relative power in decibel (dB) scale. (B) Correlation analysis between relative global power in CP3-CP4 during HD stimulation and preoperative Numeric Rating Scale (NRS) average pain scores. Blue circle: Responders. Red diamond: Nonresponders. (C) Correlation analysis between relative theta (4–7 Hz) power in CP3-CP4 during tonic stimulation and preoperative Oswestry Disability Index (ODI) scores.

Correlation analysis between global relative power and the subjective scores demonstrated opposite trends between the groups under HD, limited to the centroparietal region. In particular, preoperative NRS-average scores in nonresponders were positively correlated with CP3-CP4 relative power (r = 0.695, p = 0.026; Fig. 3B).

Subband relative power

Analyzing relative theta power (4–7 Hz), a significant main effect of group (F(1,150) = 13.54, p < 0.001) was observed. The relative theta activity was significantly smaller in responders (adjusted p < 0.001) and the smallest activity was localized to CP3-CP4 under HD (-0.90 ± 0.99 dB and −0.24 ± 1.13 dB) followed by tonic stimulation (−0.84 ± 1.22 dB and −0.17 ± 0.99 dB) in both groups (Supplementary Fig. 2A). No significant interaction was found among group, region, and SCS conditions. Relative theta power showed significant correlation only in CP3-CP4 region under tonic stimulation with preoperative ODI scores in nonresponders (r=−0.654, p = 0.040; Fig. 3C).

Relative alpha power (8–12 Hz) displayed no significant main effect of group or interaction. While not significant, descriptive statistics showed higher alpha activity in HD compared to tonic condition (0.0 ± 0.12 dB vs. −0.13 ± 0.12 dB). Interestingly, the smallest relative alpha activity was induced by HD in responders in CP3-CP4 region while it was tonic stimulation in nonresponders in the same cortical area (Supplementary Fig. 2B). In beta band (13–33 Hz), a significant group effect (F(1,150) = 8.43, p = 0.004) indicated reduced relative beta power in responders (adjusted p = 0.004), with no significant group-region-SCS interaction. The weakest beta activity was observed in CP3-CP4 with HD stimulation for both groups (Supplementary Fig. 2C). No significant correlation was found between the relative alpha or beta power and the PROMs. In contrast, the lowest gamma activity (35–50 Hz) was localized to FP1-FP2 region with HD stimulation in responders (−0.91 ± 1.99 dB), and to CP3-CP4 region (−0.91 ± 2.40 dB) in nonresponders (Supplementary Fig. 2D). Tonic stimulation produced the strongest gamma activity in C3-C4 for both groups. Relative gamma power showed no significant group main effect or 3-factor interaction. A significant correlation was found between the tonic-induced relative gamma power in C3-C4 and the preoperative NRS-average scores in responders (r = −0.855, p = 0.021).

Subband peak frequencies

Three-way ANOVA revealed a significant main effect of group in theta frequency (F(1,225) = 34.29, p < 0.001); however, there was no significant interaction noted between group, region, and SCS condition (p = 0.965). Subband peak frequencies computed per region were compared between groups across SCS conditions (Fig. 4A). While not statistically significant (p > 0.05) following multiple-comparison correction, descriptive statistics indicated a trend in which responders exhibited faster theta peak frequency (5.3 ± 0.09 Hz) compared to nonresponders (4.6 ± 0.07 Hz). Further, the fastest theta rhythm was noted in the C3-C4 area for both groups (Responder: 5.4 ± 0.2 Hz; Nonresponder: 4.7 ± 0.2 Hz), while the slowest theta rhythm was found in F3-F4 for responders (5.1 ± 0.2 Hz) and CP3-CP4 for nonresponders (4.5 ± 0.2 Hz). HD stimulation showed faster theta frequency in CP3-CP4 (Responder: 5.5 ± 0.3 Hz; Nonresponder: 5.1 ± 0.3 Hz) compared to baseline (p = 0.046).

Fig. 4

Subband peak frequencies between responders and nonresponders. (A) Heatmaps showing the subband peak frequency per SCS condition in each EEG region in responders (top-row) and nonresponders (bottom-row). Color bar indicating the frequency in Hz. (B) Correlation analysis between theta peak frequency in CP3-CP4 during HD stimulation and preoperative McGill Pain Questionnaire (MPQ) and Oswestry Disability Index (ODI) scores. (C) Correlation analysis between alpha peak frequency in CP3-CP4 and FP1-FP2 regions during HD stimulation and preoperative Numeric Rating Scale (NRS) scores. Blue circle: Responders. Red diamond: Nonresponders.

Analysis of peak frequency showed no significant interaction between group, region, and SCS for any subband or the entire spectrum (p > 0.05). Descriptive statistics demonstrated that the fastest alpha rhythm was in C3-C4 under HD stimulation (9.3 ± 0.2 Hz), and in T3-T4 under tonic stimulation for both responders (9.4 ± 0.3 Hz) and nonresponders (9.4 ± 0.2 Hz). Beta peak frequency averaged 13.5 ± 0.2 Hz in responders and 13.8 ± 0.1 Hz in nonresponders, with the fastest beta rhythm in F3-F4 during baseline for responders (14.1 ± 0.6 Hz) and in FP1-FP2 with tonic stimulation for nonresponders (14.5 ± 0.5 Hz). The slowest beta frequency occurred in HD T3-T4 for responders (13.2 ± 0.6 Hz) and baseline C3-C4 for nonresponders (13.2 ± 0.5 Hz). Under HD stimulation, trends in gamma rhythms were slower in responders (36.7 ± 0.3 Hz) than in nonresponders (37.2 ± 0.3 Hz). For responders, trends in global rhythms averaged 5.7 ± 0.7 Hz at baseline and increased to 6.1 ± 0.7 Hz with tonic stimulation, while nonresponders showed no change in trends (6.3 ± 0.6 Hz).

Further analysis demonstrated significant correlations localized to the alpha band in responders and to the theta band in nonresponders. More specifically, theta peak frequency under HD stimulation in nonresponders was negatively correlated with preoperative MPQ and ODI scores in both C3-C4 (r = −0.878, p < 0.001; r = −0.819, p = 0.004, respectively) and CP3-CP4 (r = −0.744, p = 0.014; r = −0.673, p = 0.033, respectively) regions (Fig. 4B). In the alpha band, significant correlations were found between HD-induced alpha peak frequency and preoperative NRS-average scores in responders, localized to CP3-CP4 (r = −0.982, p = 0.002) and FP1-FP2 (r = −0.844, p = 0.029) regions (Fig. 4C).

Machine learning

PCA was performed on the EEG data to rank the hundreds of extracted neural features by explained variance. Our feature selection approach resulted in 12 features that can be considered indicators of response to SCS (Fig. 5A). The five most important EEG features were baseline theta peak frequency in C4, HD-induced alpha peak frequency in C4, baseline alpha peak frequency in C4, HD-induced global relative power in F4, and tonic-induced gamma relative power in F3. A list of the top EEG features included as inputs to the classifier is provided in Table 2. The top EEG features, along with demographics, clinical features, and self-reported pain scores, were used as inputs to an ML model that predicted responders.

Fig. 5

Machine learning pipeline. (A) Diagram of full pipeline. EEG data was collected during SCS implant surgery and was subsequently exported to MATLAB for offline analysis. The data was preprocessed and the relevant neural features were extracted. Principal component analysis (PCA) was employed to rank these features by explained variance. The top features, along with demographics, clinical features, and pain scores, were used as inputs to a machine learning model that predicted responders. (B) Comparison of different numbers of EEG features as input to the decision tree classifier. Accuracy, F1 score, and area under the receiver operating characteristic curve (AUROC) are shown. The optimal number of EEG features was defined as that with the highest AUROC, indicated by a vertical arrow. (C) ROC for the optimal version of each of the architectures compared. AUROC is noted in the figure legend. (D) Confusion matrix for the optimal decision tree classifier.

Table 2 Top-ranked EEG features based on PCA.

The decision tree classifier performed best, with an accuracy of 88.2%, F1 score of 0.857, and AUC-ROC of 0.879 (Table 3). Importantly, the optimal number of EEG features was relatively consistent across the majority of the architectures compared (see Supplementary Fig. 3 for further information). For the decision tree, 12 EEG features as inputs led to the best prediction as measured by AUC-ROC, providing enough information without diluting the signal with noise (Fig. 5B). The decision tree also had the highest accuracy and F1 score at this input setting. When each model was given its optimal number of EEG features, the decision tree had the maximal AUC-ROC (Fig. 5C). The confusion matrix of the optimized decision tree is shown in Fig. 5D. The model had only one false positive and one false negative across the 17 patients included for the ML experiments.
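The reported accuracy and F1 score follow directly from the confusion matrix. The sketch below reproduces that arithmetic; the cell counts are derived here by assuming responders are the positive class, so that 7 responders with one false negative give 6 true positives and 10 nonresponders with one false positive give 9 true negatives. The function name is hypothetical.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Counts derived from the text under the assumption stated above.
acc, prec, rec, f1 = metrics_from_counts(tp=6, fp=1, fn=1, tn=9)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")  # accuracy=0.882, F1=0.857
```

Here 15 of 17 patients are classified correctly (88.2%), and precision and recall are both 6/7, which gives the F1 score of 0.857.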

Table 3 Performance metrics of predictive models: responders vs. nonresponders.

Discussion

The present study develops predictive models for SCS outcomes using intraoperative EEG features as inputs for a machine learning classifier. Selecting candidates for SCS surgery is challenging due to a lack of clear biomarkers for chronic pain, but our work brings the field one step closer towards objective patient and device selection.

Numerous works have highlighted variable neural responses, particularly in areas like the prefrontal and somatosensory cortices, associated with pain processing in patients suffering from conditions like CRPS, PSPS, and NP35,36. Our study has revealed distinct neural patterns associated with responders to SCS. For example, our findings suggest that the responder status may influence the alpha-theta peak power ratio, with responders showing higher values compared to nonresponders. The observed differences in the alpha-theta peak power ratio between regions, particularly the stronger difference between somatosensory and temporal area, highlight the region-specific effects of SCS. A similar group effect was observed in features such as relative global power, relative theta power, and theta peak frequency, where the relative activity was significantly lower in responders, but peak frequency was significantly higher. Furthermore, the smallest activity in these spectral features was consistently localized to the somatosensory region under HD. While no significant interaction between group, region, and SCS was found, these findings indicate that SCS may modulate neural activity differently in responders, potentially reflecting distinct neurophysiological responses to treatment.

Alpha-theta peak power ratio in prefrontal cortex and central regions during baseline was inversely correlated with pain intensity, indicating that responders experiencing severe pain displayed slower rhythms before stimulation. Chronic pain and depression are closely related37 and high levels of psychological distress are associated with altered pain processing and perception38. Although no significant differences in BDI scores were observed between groups, nonresponders with elevated depression scores showed a tendency toward theta activity in central-centroparietal regions, potentially indicating altered attentional and emotional processing of pain stimuli. Correlations between objective and subjective measures showed parallel trends in most features between the groups, except for somatosensory global power and theta peak frequency in HD. Particularly, relative global power tended to be higher in nonresponders with higher pain intensity; and slower theta rhythms with stronger sensory experience of pain. Overall, these findings provide insights into the neurophysiological mechanisms underlying the response to SCS in individuals with chronic pain, highlighting potential EEG measures for predicting treatment response and identifying differences in neural processing between responders and nonresponders.

The use of ML in neurosurgery is notable for its ability to handle large, complex datasets and generate predictive outputs crucial for patient-specific, AI-driven decision-making. In spine surgery, ML applications include predicting post-surgical outcomes, diagnosing complex conditions from CT or MRI, and even assisting in surgeries such as pedicle screw placement, fostering personalized medicine39,40,41. ML has also shown considerable promise in the chronic pain literature. For instance, ML models can approximate self-reported pain scores based solely on physiological features, which could lead to more holistic, multidimensional approaches to pain assessment than conventional self-reporting methods17,18. Researchers have predicted individual chronic pain severity scores via intracranial orbitofrontal cortex signals21. Our findings are consistent with and extend the work of Gram et al., who showed that ML based on EEG signals could distinguish between responders and nonresponders to morphine, even when conventional statistics revealed no significant differences between the two groups42. Levitt et al. trained an SVM to distinguish between healthy controls, patients with chronic lumbar radiculopathy, and those with treatment-resistant chronic back pain who were considered candidates for SCS by a healthcare committee. Their model predicted clinical decisions for SCS implantation, thus highlighting the relevance of EEG features for assessing patient suitability for SCS43.

Interpretability is essential when applying ML to clinical decision-making. Beyond showing that responders can be predicted with high accuracy, this work highlights which EEG features, out of a relatively large candidate set, are the most significant, and shows that this is consistent across a large cohort of patients in the intraoperative setting. Very few studies have investigated EEG differences across baseline, tonic, and HD stimulation8,9, and none have applied ML to successfully predict which patients will be responders to SCS. The twelve most discriminative features identified in this study, selected through PCA, hold significant promise for future application in distinguishing responders from nonresponders.

The decision tree’s highly accurate performance for this task is reasonable given the lack of statistically significant differences in individual features between the two groups. Decision trees, like XGBoost and random forest classifiers, excel at handling non-linear relationships. These three tree-based models performed better than the other architectures, which may have struggled with the complex relationships between the EEG features we extracted. Without extensive kernel tuning, the SVM may have struggled with the non-linear separability. Furthermore, while the dataset of 20 patients is large in the context of intraoperative EEG studies9, it is relatively small for ML problems. This likely gave the decision tree an edge over XGBoost, which is a more complex model that could be prone to overfitting.

At present, SCS device settings are manually tuned for several months after surgery in a trial-and-error manner, and patients cannot know whether they will be responders until this process is complete. Intraoperative EEG data provide dynamic, real-time insight into the patient’s neurophysiological state, free from potential confounds related to cognitive conditions. A recent study used intraoperative frontal EEG signatures to predict postoperative delirium44. This underlines the potential of this modality to predict surgical outcomes.

In the surgical context, both percutaneous and paddle SCS leads offer distinct advantages and disadvantages. The existing SCS literature includes numerous clinical trials reporting successful outcomes with both implantation techniques (laminectomy vs. percutaneous)45,46. However, studies directly comparing these approaches in terms of patient outcomes remain limited47. One study reported that patients who underwent laminectomy for paddle lead placement had a higher rate of successful outcomes at a 1.9-year average follow-up, but this advantage was not observed at the 2.9-year follow-up, where success rates were similar between the two groups48. A retrospective review demonstrated significant pain relief for both groups based on visual analog scale (VAS) scores within an 8.6- to 10.3-month postoperative period, while long-term follow-up (up to 66 months) revealed significantly greater pain relief in patients who underwent laminectomy than in those who had percutaneous leads49. In our dataset, the distribution of percutaneous versus paddle leads was comparable between responders and nonresponders: 3 vs. 4 in responders and 5 vs. 5 in nonresponders, respectively. Using a 50% reduction in NRS scores as the threshold for response, the implantation technique (percutaneous vs. paddle) did not significantly affect patient outcomes at the 3-month follow-up. This suggests that while certain studies indicate potential advantages for paddle leads in long-term pain relief, the short-term outcomes in our cohort were independent of the surgical method.
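
To make the responder threshold and the lead-type comparison concrete, the sketch below encodes a 50% NRS reduction rule and checks the reported lead counts with a two-sided Fisher exact test. The choice of Fisher's exact test is an assumption made here for illustration, since this section does not name the test that was applied; the function names are likewise hypothetical.

```python
from math import comb


def is_responder(nrs_baseline, nrs_followup, threshold=0.5):
    """True when the NRS score fell by at least `threshold` (50% by default)."""
    if nrs_baseline <= 0:
        raise ValueError("baseline NRS must be positive")
    reduction = (nrs_baseline - nrs_followup) / nrs_baseline
    return reduction >= threshold


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed test for illustration; not necessarily the one used in the study.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)

    def ways(k):
        # Number of tables with k in the top-left cell, margins held fixed.
        return comb(col1, k) * comb(n - col1, row1 - k)

    observed = ways(a)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(ways(k) for k in range(lowest, highest + 1)
                  if ways(k) <= observed)
    return extreme / comb(n, row1)


# Lead counts reported above as (percutaneous, paddle):
# responders 3 vs. 4, nonresponders 5 vs. 5.
p_value = fisher_exact_two_sided(3, 4, 5, 5)  # evaluates to 1.0 for these counts
```

Under this assumed test the observed table is the single most probable one given its margins, so the counts give no indication of an association between lead type and responder status, which agrees with the statement above.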

Limitations and future directions

Some limitations should be considered when interpreting our results. To be reliably integrated into clinical practice, ML algorithms need to be tested across multiple institutions and in larger cohorts. Although our model’s high performance across a variety of metrics adds credibility, external validation remains a critical step for broader application.

In our study, we focused on a 3-month follow-up because this timeframe is clinically critical for optimizing SCS settings. Early response to SCS often guides subsequent parameter adjustments, which are crucial for long-term management of chronic pain. Several pivotal trials also use a primary evaluation at 3 months, underscoring the importance of this period in determining initial efficacy and patient-specific adjustments. For example, the EVOKE study conducted its primary analysis at 3 months to assess the effectiveness of closed-loop SCS50. Deer et al. similarly assessed the efficacy of dorsal root ganglion stimulation at this interval51. However, future studies should follow patients for longer periods after surgery and attempt to predict 12- or even 24-month outcomes.

Although our sample size is relatively large compared to other recent intraoperative SCS studies52,53,54,55, we acknowledge that a multi-center study with a larger sample size would further strengthen our findings. Specifically, a study with more patients across different etiologies could help test the generalizability of our algorithms and explore how varying etiologies influence responses to different SCS waveforms. Such a follow-up study would provide a broader base for understanding how these treatments impact diverse patient populations and could ultimately enhance the clinical utility of our model. Similarly, in our study, the average age and female-male ratios were comparable between groups, with no significant differences. While age and sex might influence SCS-related EEG changes, their comparable distribution likely minimizes such effects. Thus, strict age- and sex-matching may not be essential; however, future studies with larger sample sizes could consider strict matching to ensure fully unbiased analyses.

Finally, intraoperative mapping (IOM) poses unique challenges and opportunities for EEG recordings, impacting their accuracy and interpretation. In our study, IOM was performed after positioning the SCS lead according to routine surgical procedures. EEG testing began after lead and IPG implantation, following surgical closure, approximately 15 min before surgery concluded. To ensure consistency, stimulation was delivered through the “sweet spot” contacts identified during the trial period, which provided > 50% pain relief and were also used for laterality testing. Motor mapping involved evoked responses from 18 muscles (9 per side), alongside somatosensory evoked potentials (SSEPs). While the specific stimulation site along the dorsal column may affect neural responses54, our findings reflect stimulation consistently delivered at a clinically validated location, ensuring reliability and relevance.

Future research should focus on exploring preoperative neural signatures to predict surgical outcomes, aiding in better patient selection and enhancing the efficacy of neuromodulation therapy. There is potential to apply the present study’s predictive model in the preoperative setting, allowing for a more informed selection of candidates for SCS before patients enter the operating room. Leveraging preoperative data could reduce unnecessary procedures and help identify patients who are most likely to benefit from SCS. This approach aligns with the growing trend of using AI for predictive modeling in medicine to avoid unnecessary invasive procedures or surgeries. The effective application of machine learning algorithms in clinical settings has the potential to bring medicine into an era of more personalized, precise, and accessible care.

Conclusion

Neural signatures in response to various SCS waveforms were extracted in chronic pain patients who were undergoing SCS implant surgery for lower back/leg pain. Using clinically recognized outcome measures and objective EEG markers, ML algorithms were developed to predict responders to SCS therapy. A decision tree model with 12 selected spatio-spectral features performed best and predicted responders with 88% accuracy. Our research offers insights into the basic science behind chronic pain by identifying specific EEG patterns that correlate with pain relief, and it applies ML to EEG signals from chronic pain patients undergoing SCS surgery. Our novel approach provides a potentially powerful tool for better patient selection and could improve chronic pain management.