Introduction

Epilepsy is a neurological disorder that affects over 70 million people worldwide and is characterized by repeated unprovoked seizures arising from abnormal neuronal activity. Despite pharmacological and surgical interventions, about one-third of people with epilepsy have drug-resistant epilepsy (DRE), defined as the failure to achieve sustained seizure control despite adequate trials of antiseizure medications1. DRE carries increased risks of cognitive decline, psychiatric comorbidities, accidental injury, and premature mortality, making it a substantial clinical and public health concern2. Evidence that neuroinflammation is a key mediator of epileptogenesis, especially in drug-resistant disease, is rapidly accumulating. Preclinical models and clinical studies consistently show increased expression of proinflammatory cytokines, such as IL-1β, TNF, and IL-6, in seizure-initiating brain regions and in peripheral blood3,4. Microglial activation, complement system dysregulation, and leukocyte infiltration are now recognized as key contributors to seizure initiation, propagation, and resistance to therapy5,6. These findings have motivated the search for immune-related biomarkers and for treatments that target neuroinflammatory pathways5,6.

Despite this, the transcriptomic landscape of drug-resistant epilepsy remains incompletely defined. Most studies rely on small sample sizes or focus on a limited number of candidate cytokines, making it difficult to identify gene-level signatures that are shared across experiments7,8. Moreover, heterogeneity in tissue sources, batch effects, and data preprocessing across studies hampers direct comparison and generalization9. The lack of integrative, high-dimensional analyses with model interpretability has limited the clinical application of transcriptomic findings in drug-resistant epilepsy. A systems-level design that combines transcriptomic integration, machine learning, and model explainability is well suited to address these limitations. SHAP (SHapley Additive Explanations), a post-hoc, model-agnostic explanation method, quantifies the contribution of each feature to a model’s predictions and thereby improves transparency in biomarker discovery and prioritization, although the underlying models themselves remain complex10. Furthermore, combining gene signatures and drug-target interactions via in silico docking provides a promising framework for therapeutic repurposing in epilepsy. Hence, this study was designed to identify and validate immune-inflammatory gene signatures from drug-resistant epilepsy datasets using integrated transcriptomic profiling and interpretable machine learning; the overall workflow is depicted in Fig. 1. This study hypothesized that immune-related genes would reliably distinguish epileptic from non-epileptic tissue and that top-ranked biomarkers would be druggable, offering translational potential for biomarker-guided therapy.

Fig. 1

Graphical abstract of the current study.

Methods

Data acquisition

Publicly available RNA sequencing datasets (GSE256068, GSE213488, and GSE186334) were retrieved from the NCBI Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). GSE256068 contains RNA sequencing data derived from hippocampal and cortical brain tissues of patients with drug-resistant epilepsy. It includes 64 samples from patients with temporal lobe epilepsy and hippocampal sclerosis (TLE-HS), 33 samples from patients with focal cortical dysplasia type IIb (FCD2b), 21 samples from patients with tuberous sclerosis complex (TSC), and 27 non-epileptic controls, comprising both hippocampal and cortical regions. GSE213488 includes gray- and white-matter transcriptomic profiling of focal cortical dysplasia type IIa (FCD2a) and IIb (FCD2b), with 17 epileptic and four control samples. These datasets were generated using Illumina sequencing platforms and are supported by well-annotated metadata, making them well suited to integrative and comparative transcriptomic analyses. The unified dataset comprises 197 samples, including 162 from epileptic patients and 35 from non-epileptic controls. To validate the robustness and generalizability of the top features, GSE186334, which includes cortical expression profiles from epileptic (n = 46) and non-epileptic (n = 22) individuals, was used for external validation. It is an entirely independent cohort, generated in a separate study with cortical-only sampling and distinct sequencing and batch characteristics, and was not used for differential expression analysis, feature selection, hyperparameter tuning, or model training. All datasets were accessed in May 2025, and raw count matrices and metadata were downloaded in their original form to ensure reproducibility.

Data preprocessing

Raw RNA counts from GSE256068 and GSE213488 were merged into a single dataset using gene symbols as common identifiers. Low-abundance genes, defined as those with fewer than 10 total reads across all samples, were removed to reduce noise9. Sample metadata were curated to include batch identifiers, tissue type (hippocampal or cortical), and clinical condition (epileptic or non-epileptic). Metadata alignment was verified by checking concordance between the metadata and the columns of the count matrix. Data integrity checks were conducted to remove duplicate entries and missing annotations. No attention-based integration methods were used; datasets were merged strictly by shared gene symbols, and subsequent batch-aware normalization used the variance-stabilizing transformation (VST) in DESeq2 (version 1.42.0, R version 4.3.1) to normalize gene counts and mitigate the influence of sequencing depth11. Specifically, DESeq2 models counts using a negative binomial distribution, estimates gene-wise dispersions, and fits a smooth mean–dispersion curve; the variance-stabilizing transformation is then derived as an approximate integral over this mean–variance relationship, yielding transformed values with approximately constant variance across expression levels. Principal component analysis (PCA) was performed to confirm that batch and tissue signals were adequately captured in the metadata.
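The merging and low-count filtering steps described above can be illustrated with a minimal, dependency-free sketch. It assumes each cohort is held as a mapping from gene symbol to a list of raw counts; the function name merge_and_filter and the toy counts are hypothetical and are not taken from the study's code or data.

```python
from typing import Dict, List

Counts = Dict[str, List[int]]  # gene symbol -> raw counts, one value per sample


def merge_and_filter(a: Counts, b: Counts, min_total: int = 10) -> Counts:
    """Merge two count tables on shared gene symbols, then drop low-count genes."""
    merged: Counts = {}
    for gene in sorted(a.keys() & b.keys()):  # keep only genes present in both cohorts
        row = a[gene] + b[gene]               # concatenate the sample columns
        if sum(row) >= min_total:             # drop genes with fewer than min_total reads
            merged[gene] = row
    return merged


# Toy example with made-up counts (illustrative only).
cohort_a = {"TNF": [12, 30, 7], "IL1B": [0, 1, 2], "ACY3": [5, 5, 5]}
cohort_b = {"TNF": [9, 4], "IL1B": [1, 0], "XIRP1": [40, 2]}
merged = merge_and_filter(cohort_a, cohort_b)
print(merged)
```

In this toy case ACY3 and XIRP1 are dropped because they are not shared, and IL1B is dropped because its total count is below 10.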

Differential expression analysis

DESeq2 (version 1.42.0) was used to perform differential expression analysis (DEA), with batch and tissue specified explicitly as covariates. Variance partitioning quantified the contribution of these covariates so that their impact could be appropriately adjusted for, supporting reliable identification of disease-associated genes12,13. Wald tests were used to assess differential expression between epileptic and non-epileptic samples. The apeglm method (version 1.24.0) was used to shrink effect sizes, yielding more stable log₂ fold change estimates, especially for lowly expressed genes. Genes were defined as differentially expressed if they satisfied two criteria14,15: an adjusted p-value (Benjamini-Hochberg false discovery rate) < 0.05 and an absolute log₂ fold change (|log₂FC|) ≥ 1. Together, these thresholds control false discoveries and require a biologically meaningful effect size between epileptic and control samples.
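As an illustration of the two-criterion rule, the following sketch applies a plain Benjamini-Hochberg adjustment and then the significance and effect-size cutoffs. It is a simplified stand-in, not DESeq2's implementation (which also applies independent filtering before adjustment); the function names, gene labels, and numbers are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, pvals, log2fc, alpha=0.05, lfc_cutoff=1.0):
    """Keep genes with adjusted p < alpha and |log2FC| >= lfc_cutoff."""
    padj = benjamini_hochberg(pvals)
    return [g for g, q, lfc in zip(genes, padj, log2fc)
            if q < alpha and abs(lfc) >= lfc_cutoff]


# Toy inputs (illustrative only).
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
pvals = [0.01, 0.04, 0.03, 0.005]
log2fc = [1.8, 2.5, -0.4, -1.2]
degs = call_degs(genes, pvals, log2fc)
print(degs)
```

Here gene_c passes the significance threshold but is excluded because its absolute log₂ fold change is below 1.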

Feature selection and ensemble modeling

Feature selection was performed using an ensemble approach that integrated five supervised learning algorithms: Random Forest (RF), Extra Trees (ETs), XGBoost, LASSO-penalized logistic regression (LASSO LR), and a linear support vector machine (SVM)16. These classifiers were implemented in Python (scikit-learn version 1.4.2 and XGBoost version 2.1.0). To address class imbalance (~ 84% epileptic), stratified 5-fold cross-validation was applied, and the Synthetic Minority Oversampling Technique (SMOTE) was used exclusively on the training folds to augment the minority class without data leakage17. Feature importance scores were obtained from each model using either coefficient magnitudes (linear models) or Gini-based impurity measures (tree-based models). Hyperparameters for RF, ETs, XGBoost, LASSO LR, and SVM were tuned using grid search with inner cross-validation on the training data only. For example, RF/ETs were tuned over n_estimators (100–1000) and max_depth (3 to None, i.e. unlimited); XGBoost over n_estimators (100–800), max_depth (3–10), and learning_rate (0.01–0.3); and the C/penalty parameters for LASSO LR and SVM over log-spaced grids. All procedures used a fixed random_state value (42) for cross-validation splits, tree initializations, and SMOTE to ensure reproducibility.
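The key point of the resampling design, that synthetic minority samples are generated from training folds only, can be sketched without external libraries. The code below is a simplified, hand-written SMOTE-style interpolation for binary labels, not the library implementation used in the study; all function names and the toy data are hypothetical, and at least two minority samples per training set are assumed.

```python
import random
from typing import List, Sequence

Vector = List[float]


def stratified_folds(labels: Sequence[int], k: int, seed: int = 42) -> List[List[int]]:
    """Assign sample indices to k folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 42) -> List[Vector]:
    """Create n_new synthetic points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])   # one of the k nearest minority points
        t = rng.random()                     # position along base -> neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balanced_training_set(X, y, train_idx, minority_label=0):
    """Oversample the minority class using training samples only (binary labels)."""
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    minority = [x for x, lab in zip(X_tr, y_tr) if lab == minority_label]
    deficit = len(y_tr) - 2 * len(minority)  # majority count minus minority count
    new_points = smote(minority, max(deficit, 0))
    return X_tr + new_points, y_tr + [minority_label] * len(new_points)


# Toy data: 1 = epileptic, 0 = control (illustrative only).
data_rng = random.Random(0)
y = [1] * 40 + [0] * 10
X = [[data_rng.gauss(lab, 1.0), data_rng.gauss(-lab, 1.0)] for lab in y]
folds = stratified_folds(y, k=5)
for test_idx in folds:
    train_idx = [i for f in folds if f is not test_idx for i in f]
    X_bal, y_bal = balanced_training_set(X, y, train_idx)
    # A classifier would be fitted on (X_bal, y_bal) and scored on the
    # untouched held-out fold test_idx, which is never oversampled.
```

Because oversampling happens inside the loop, after the split, no synthetic point is ever derived from a held-out sample.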

SHAP (SHapley Additive Explanations, version 0.45.0) and model-agnostic permutation importance were computed to validate feature relevance across different model classes. Genes were retained if their cumulative SHAP contribution reached at least 50% of the model’s total explanatory power, using a fixed cutoff across classifiers to ensure consistency18. This reduced model-specific biases, minimized redundancy, and improved generalizability to capture both linear and non-linear gene–disease relationships while ensuring that feature selection remained reproducible and interpretable.
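One way to read the cumulative-contribution rule is as follows: rank genes by importance and keep the smallest top-ranked set whose normalized importance reaches the cutoff. The sketch below assumes that interpretation; the function name and toy scores are hypothetical.

```python
from typing import Dict, List


def select_by_cumulative_share(importance: Dict[str, float], share: float = 0.5) -> List[str]:
    """Return the smallest top-ranked gene set whose importance reaches `share`."""
    total = sum(importance.values())
    ranked = sorted(importance, key=importance.get, reverse=True)
    selected: List[str] = []
    running = 0.0
    for gene in ranked:
        selected.append(gene)
        running += importance[gene] / total  # normalized cumulative contribution
        if running >= share:
            break
    return selected


# Toy importance scores (illustrative only).
toy = {"gene_a": 0.30, "gene_b": 0.25, "gene_c": 0.20, "gene_d": 0.15, "gene_e": 0.10}
selected = select_by_cumulative_share(toy)
print(selected)
```

With these toy scores, the first two genes already account for 55% of the total, so only they are retained.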

Post-hoc model explainability with SHAP and permutation importance

To quantify gene contributions to the model’s predictions, explainability assessments were performed using SHAP and permutation importance. SHAP was also implemented for each model type separately: TreeExplainer for RF, ETs, and XGBoost; and LinearExplainer and KernelExplainer for LASSO LR and SVM, respectively. This ensured consistent, model-specific interpretation of gene relevance. SHAP values were aggregated across samples to compute overall feature importance, while local interpretability was assessed by examining SHAP distributions in misclassified samples. Permutation importance was also applied across models as a sanity check to confirm whether predictive performance declined when specific genes were randomized. These methods were implemented using SHAP (version 0.45.0) and scikit-learn (version 1.4.2). We computed mean absolute SHAP values per gene and normalized them so that the sum of SHAP contributions within a model equaled 1. Normalized SHAP scores were then averaged across the five classifiers to obtain a consensus importance for each gene.
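The normalization and consensus steps can be written out as a short sketch. It assumes that each model's SHAP output is available as a samples-by-genes matrix; the helper names, model labels, and values are hypothetical and serve only to make the arithmetic concrete.

```python
from typing import Dict, List

ShapMatrix = List[List[float]]  # rows = samples, columns = genes


def normalized_mean_abs(shap_values: ShapMatrix) -> List[float]:
    """Mean |SHAP| per gene, rescaled so the values sum to 1 within a model."""
    n_samples = len(shap_values)
    n_genes = len(shap_values[0])
    mean_abs = [sum(abs(row[j]) for row in shap_values) / n_samples
                for j in range(n_genes)]
    total = sum(mean_abs)
    return [v / total for v in mean_abs]


def consensus_importance(per_model: Dict[str, ShapMatrix], genes: List[str]) -> Dict[str, float]:
    """Average the normalized scores across models to get one score per gene."""
    normalized = [normalized_mean_abs(m) for m in per_model.values()]
    return {g: sum(model[j] for model in normalized) / len(normalized)
            for j, g in enumerate(genes)}


# Toy SHAP matrices for two models and two genes (illustrative only).
genes = ["gene_a", "gene_b"]
per_model = {
    "model_1": [[0.2, -0.1], [-0.4, 0.1]],   # normalized -> [0.75, 0.25]
    "model_2": [[1.0, 1.0], [-1.0, 1.0]],    # normalized -> [0.50, 0.50]
}
consensus = consensus_importance(per_model, genes)
print(consensus)
```

Normalizing within each model before averaging prevents a model whose SHAP values are on a larger numerical scale from dominating the consensus.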

External validation

External validation was performed by applying the best-performing classifiers trained on internal cross-validation folds to the independent GSE186334 dataset. For each of the five folds, a separate model was trained, predictions were made on the external dataset, and performance metrics, including accuracy, precision, recall, F1 score, and area under the ROC curve (AUC), were computed. An ensemble strategy was implemented by averaging predicted probabilities across the fold models, thereby producing a consensus classifier. This ensemble was chosen to reduce fold-to-fold variability and improve generalization. Performance was evaluated using ROC and precision–recall curves, and results were compared with internal cross-validation outcomes to quantify consistency. All analyses were conducted in Python using scikit-learn (version 1.4.2).
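The consensus step, averaging predicted probabilities across fold models before scoring, can be sketched as below. The AUC is computed with a pairwise rank formulation rather than a library routine, and the function names and toy probabilities are hypothetical.

```python
from typing import List, Sequence


def average_probabilities(fold_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sample probabilities across fold models."""
    n_models = len(fold_probs)
    return [sum(col) / n_models for col in zip(*fold_probs)]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy external cohort: 1 = epileptic, 0 = control (illustrative only).
y_ext = [1, 1, 1, 0, 0]
fold_probs = [
    [0.9, 0.6, 0.4, 0.5, 0.1],   # fold model 1 misranks one pair
    [0.8, 0.7, 0.6, 0.3, 0.2],
    [0.7, 0.8, 0.5, 0.4, 0.3],
]
ensemble = average_probabilities(fold_probs)
fold_aucs = [roc_auc(y_ext, p) for p in fold_probs]
ensemble_auc = roc_auc(y_ext, ensemble)
print(fold_aucs, ensemble_auc)
```

In this toy case the first fold model misranks one positive-negative pair, while the averaged probabilities rank all pairs correctly, which illustrates how averaging can reduce fold-to-fold variability.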

Drug–gene interaction mapping

To assess the translational potential of top features, drug–gene interaction mapping was performed using the Drug–Gene Interaction Database (DGIdb, version 4.3.0, https://www.dgidb.org/). The top-ranked genes from SHAP analyses were submitted to the database, and all reported interactions were retrieved19. To ensure reproducibility and clinical relevance, interactions were filtered according to the following criteria: (1) only FDA-approved drugs were retained, and (2) interaction scores > 2. These thresholds were chosen to minimize inclusion of low-confidence or preclinical interactions while maintaining potential therapeutic relevance. Additional prioritization was applied to drugs with established roles in neurological or inflammatory pathways, given their significance to epilepsy pathophysiology.
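The two filtering criteria amount to a simple record filter. The sketch below assumes a hypothetical record layout with the keys gene, drug, fda_approved, and score; actual DGIdb exports use their own field names, and the entries shown are placeholders rather than real interactions.

```python
from typing import Dict, List

# Hypothetical records; field names and values are placeholders.
interactions: List[Dict] = [
    {"gene": "GENE_A", "drug": "drug_1", "fda_approved": True, "score": 3.4},
    {"gene": "GENE_A", "drug": "drug_2", "fda_approved": False, "score": 5.1},
    {"gene": "GENE_B", "drug": "drug_3", "fda_approved": True, "score": 1.2},
    {"gene": "GENE_B", "drug": "drug_4", "fda_approved": True, "score": 2.6},
]


def filter_interactions(records: List[Dict], min_score: float = 2.0) -> List[Dict]:
    """Keep FDA-approved drugs with interaction score > min_score, best first."""
    kept = [r for r in records if r["fda_approved"] and r["score"] > min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


shortlist = filter_interactions(interactions)
print([(r["gene"], r["drug"]) for r in shortlist])
```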

Molecular docking and molecular dynamics simulations

Molecular docking and molecular dynamics (MD) simulations were performed to evaluate the structural feasibility of drug–gene interactions. Protein structures corresponding to prioritized genes were retrieved from the Protein Data Bank (PDB, https://www.rcsb.org/) when available, while unresolved structures were modeled using SWISS-MODEL (version 4.2.0, https://swissmodel.expasy.org/). Ligand structures of FDA-approved drugs were prepared using PyRx (version 0.9.8) and screened for pharmacokinetic and toxicity properties with ADMETlab (version 2.0, https://admetmesh.scbdd.com/) to ensure drug-likeness and safety relevance. Docking studies were then carried out using AutoDock Vina (version 1.2.5), focusing on biologically annotated binding sites. Candidate complexes were ranked according to binding affinity scores, and the top-scoring interactions were selected for MD simulations. MD simulations were performed for 100 ns under physiological conditions using Schrödinger Desmond (Schrödinger Release 2024-1, https://www.schrodinger.com/). System stability was assessed using root mean square deviation (RMSD) and root mean square fluctuation (RMSF), which provided insights into temporal stability, binding persistence, and protein–ligand flexibility.
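For readers less familiar with the two stability metrics, the following sketch computes RMSD per frame and RMSF per atom from raw coordinates. It assumes that frames have already been superposed on the reference, so no alignment is performed, and it is not the Desmond analysis routine; the types, function names, and toy trajectory are hypothetical.

```python
import math
from typing import List, Tuple

Frame = List[Tuple[float, float, float]]  # one (x, y, z) position per atom


def rmsd(frame: Frame, reference: Frame) -> float:
    """Root mean square deviation of a frame from a reference structure."""
    sq = sum((a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(reference))


def rmsf(trajectory: List[Frame]) -> List[float]:
    """Per-atom root mean square fluctuation around the time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    out = []
    for i in range(n_atoms):
        mean = [sum(f[i][d] for f in trajectory) / n_frames for d in range(3)]
        msf = sum((f[i][d] - mean[d]) ** 2
                  for f in trajectory for d in range(3)) / n_frames
        out.append(math.sqrt(msf))
    return out


# Toy trajectory: atom 0 is rigid, atom 1 oscillates along y (illustrative only).
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0)],
]
rmsd_series = [rmsd(f, trajectory[0]) for f in trajectory]
rmsf_per_atom = rmsf(trajectory)
print(rmsd_series, rmsf_per_atom)
```

RMSD summarizes how far the whole structure drifts from the starting frame over time, whereas RMSF localizes flexibility to individual atoms or residues.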

Results

Integrated transcriptome profiling

RNA sequencing samples from two GEO datasets (GSE256068 and GSE213488) were included; after filtering, 18,642 protein-coding genes were retained for downstream analysis. The cohort distribution indicated that the majority of epileptic cases originated from hippocampal and temporal lobe tissues (Fig. 2A), whereas control samples were more evenly distributed across regions. Heatmap visualization of sample-to-sample distances further confirmed clear separation between epileptic and control groups, although tissue-specific clustering was also observed (Fig. 2B). Data integration highlighted contributions from two major batches, with 82.2% of samples derived from GSE256068 and 17.8% from GSE213488 (Fig. 2C). Variance decomposition showed that batch effects accounted for 27.3% of overall variation, tissue type contributed 10.5%, and clinical condition explained 2.4% (Fig. 2D). Principal component analysis demonstrated batch-driven segregation before correction; after correction with the variance-stabilizing transformation in DESeq2, samples aligned with epileptic status, indicating effective removal of batch effects (Fig. 2E–F). The first two principal components explained 49.3% of the variance, underscoring the influence of both technical and biological factors (Fig. 2G).

Fig. 2

Data integration, quality control, and batch correction. (A) Distribution of samples across epileptic and control conditions, stratified by tissue origin. (B) Heatmap of sample-to-sample distances demonstrating separation between epileptic and non-epileptic groups. (C) Batch distribution showing 82.2% of samples from GSE256068 and 17.8% from GSE213488. (D) Variance partitioning, with batch contributing 27.3%, tissue 10.5%, and condition 2.4% of variance. (E) PCA before batch correction, highlighting batch-driven segregation. (F) PCA after batch correction, with improved condition-driven clustering. (G) Variance explained by the top principal components, showing PC1 (29.7%) and PC2 (19.6%) capturing ~ 50% of variation.

Immuno-inflammatory axis dominates differential expression

Differential expression analysis identified 1,220 genes significantly dysregulated between epileptic and control tissues (adjusted p < 0.05, |log2FC| ≥ 1; Fig. 3A, B). Genes meeting statistical thresholds displayed stable effect sizes across the expression spectrum, as demonstrated in MA plots. Of these, 672 were upregulated and 548 were downregulated (Fig. 3C). Excluding pseudogenes and unannotated loci yielded 897 high-confidence differentially expressed genes for further analyses (Fig. 3D). Prominent upregulated genes included TNF, CCL3, XIRP1, and ACY3, whereas notable downregulated genes included IL1RL1, IL1R2, and CXCL11 (Fig. 3E). Volcano plots showed strong enrichment of inflammatory mediators among the upregulated genes, whereas anti-inflammatory regulators, such as IL1R2, showed reduced expression.

Fig. 3

Differential expression analysis of epileptic versus control tissues. (A) Volcano plot of differentially expressed genes (DEGs), with significantly upregulated and downregulated genes highlighted. (B) MA plot showing distribution of fold changes across mean expression levels. (C) Counts of upregulated and downregulated DEGs. (D) Gene filtering process, reducing the initial 1,220 DEGs to 897 after exclusion of pseudogenes and unmapped loci. (E) Top DEGs identified, including upregulated TNF, CCL3, XIRP1, and downregulated IL1RL1, IL1R2, CXCL11.

Ensemble-driven biomarker prioritisation

To refine predictive features, five machine learning classifiers were applied under stratified five-fold cross-validation. Class imbalance was evident, with 84% of samples classified as epileptic and 16% as non-epileptic (Fig. 4A). All models achieved robust performance, with PR-AUC values above 0.95 (Fig. 4B) and mean ROC-AUC values ranging from 0.98 to 0.99 (Fig. 4C). Calibration analysis showed that logistic regression achieved the lowest Brier score (0.035), while tree-based models exhibited slight overconfidence (Fig. 4D). Aggregated metrics demonstrated accuracy exceeding 0.95, recall above 0.93, and F1-scores consistently above 0.94 across classifiers (Fig. 4E).
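For reference, the Brier score used in the calibration analysis is the mean squared difference between predicted probabilities and observed binary outcomes, with lower values indicating better calibration. A minimal sketch with made-up numbers (not values from the study) is:

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)


# Toy labels and predicted probabilities (illustrative only).
y_toy = [1, 1, 0, 1]
p_toy = [0.9, 0.8, 0.2, 0.6]
score = brier_score(y_toy, p_toy)
print(round(score, 4))
```

An overconfident model that pushes probabilities towards 0 or 1 is penalized heavily whenever it is wrong, which is why overconfidence tends to raise the score.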

Fig. 4

Performance of machine learning classifiers under internal cross-validation. (A) Distribution of epileptic and control samples showing class imbalance (~ 84% epileptic). (B) Precision–recall curves across models (Random Forest, Extra Trees, XGBoost, LASSO, SVM). (C) ROC curves demonstrating high discriminatory performance with mean AUC values of 0.98–0.99. (D) Calibration curves showing the best calibration for logistic regression (lowest Brier score). (E) Evaluation metrics (accuracy, precision, recall, F1 score, ROC-AUC) across models.

High-fidelity classification of epileptic tissue

External validation using GSE186334 (n = 68 samples; 46 people with epilepsy, 22 controls) confirmed the generalizability of Random Forest classifiers. Accuracy remained stable across folds at 0.84 ± 0.03, while recall was 0.82 ± 0.04, confirming sensitivity in epileptic tissue classification (Fig. 5A). Precision–recall curves demonstrated external PR-AUC values of 0.95–0.97 (Fig. 5B). Fold-specific models achieved ROC-AUC values ranging from 0.89 to 0.94, with an ensemble classifier yielding an AUC of 0.93 (Fig. 5C). Comparative bar plots showed slightly higher performance in internal cross-validation (AUC 0.96–0.99) than in external testing (AUC 0.80–0.93; Fig. 5D). Despite modest reductions, external validation confirmed the reproducibility of identified biomarkers across independent cohorts.

Fig. 5

External validation of Random Forest classifiers on GSE186334. (A) Fold-specific accuracy, recall, precision, F1, and ROC-AUC values are summarized as boxplots. (B) Precision–recall curves for external validation folds and ensemble model. (C) ROC curves showing fold-specific AUCs (0.89–0.94) and ensemble AUC (0.93). (D) Comparison of internal versus external validation metrics, highlighting modest performance reductions in external testing.

SHAP explanation of top features

SHAP analysis was performed across all classifiers to identify gene-level contributions to epileptic classification. The top contributors included XIRP1, P2RY13, CCL3-AS1, ACY3, TNF, DHRS9, CX3CR1, and P2RY12, each displaying mean absolute SHAP values above 0.005 (Fig. 6A). Cumulative SHAP plots showed that the top 50 genes accounted for more than 50% of the overall explanatory power (Table 1). Summary plots showed that inflammatory and microglial genes, such as IL1B, CD163, S100A8/A9, SELPLG, and CCL4, exerted a strong directional influence on epileptic classification (Fig. 6B). Gene-specific dependence plots further illustrated the predictive behavior of these features, with high IL1B and TNF expression driving epileptic predictions, while reduced IL1R2 expression was associated with controls (Fig. 6C–G). These results established a reproducible, multi-model consensus of discriminative gene features central to the epileptic transcriptomic landscape. While some highly ranked genes (e.g., XIRP1) currently lack a clear immune or epilepsy annotation, we emphasize TNF, IL1B, P2RY12, and other immune–microglial genes in the main text because they are both SHAP-prioritized and strongly supported by pathway, cell-type enrichment, and prior literature. In contrast, genes such as XIRP1 are highlighted as exploratory candidates for future mechanistic work.

Table 1 Ranking of top gene features based on SHAP analysis, showing mean absolute SHAP values, normalized contributions, and cumulative proportions used to define the prioritized biomarker set.
Fig. 6

SHAP-based interpretability of classifier features. (A) Top genes ranked by mean absolute SHAP values, including XIRP1, P2RY13, CCL3-AS1, ACY3, and TNF. (B) SHAP summary plot showing directional influence of features such as TNF, CX3CR1, and IL1B. (C–G) Gene-specific SHAP dependence plots demonstrating predictive effects of IL1B, TNF, P2RY12, ACY3, and CD163 on epileptic classification.

Biological validation through ontologies and pathways

Post-hoc functional enrichment analyses of SHAP-prioritized features confirmed their biological plausibility, with no overlap between enrichment inputs and model training procedures. Protein–protein interaction (PPI) networks revealed strong clustering of immune and inflammatory mediators, including TNF, CX3CR1, CCL4, IL1A, and CD163 (Fig. 7A). Gene Ontology enrichment analysis revealed overrepresentation of biological processes related to leukocyte adhesion, the inflammatory response, cytokine secretion, and chemotaxis (Fig. 7B). Molecular function terms were dominated by cytokine and chemokine receptor activity, while cellular component enrichment included plasma membrane and extracellular vesicle localization (Fig. 7C). Reactome pathway analysis identified significant enrichment of IL-10 signaling, purinergic receptor signaling, and antimicrobial response pathways (Fig. 7D). Transcription factor analysis revealed regulatory influence from SPI1, RUNX1, NFκB1, and GATA family members, highlighting upstream immune regulation (Fig. 7E). These findings supported the functional plausibility of prioritized biomarkers, linking them to known immune-inflammatory mechanisms relevant to epilepsy. Supplementary File F1 revealed that prioritized epilepsy biomarkers were strongly enriched in immune-related cell types, particularly microglia, monocytes, and macrophages across brain and peripheral tissues. Key genes driving these enrichments included CX3CR1, P2RY12, CSF1R, IL1B, TNF, CD163, and S100A8/9, all of which are central to neuroinflammatory processes. The enrichment scores and adjusted p-values highlighted a dominant role of microglial activation and monocyte/macrophage infiltration, confirming that the identified transcriptomic biomarkers reflect a robust immune–inflammatory cell-type signature underlying epileptic pathology.

Fig. 7

Biological validation through protein networks, ontologies, and pathways. (A) Protein–protein interaction network of prioritized features highlighting TNF, CX3CR1, CCL4, IL1A. (B) Biological process enrichment showing leukocyte adhesion, cytokine secretion, and chemotaxis. (C) Molecular function enrichment, including cytokine receptor activity and chemokine binding. (D) Reactome pathways enriched for IL-10 signaling, purinergic signaling, and antimicrobial response. (E) Transcription factor network identifying SPI1, RUNX1, NFκB1, and GATA factors as upstream regulators.

Therapeutic re-positioning opportunities

Drug–gene interaction analysis identified 43 druggable targets among the top SHAP contributors, involving a total of 300 medicines (Supplementary File F2). Epilepsy-associated biomarkers, particularly TNF, IL1B, and P2RY12, were each linked to multiple drug interactions. Many candidate compounds were FDA-approved, including cromolyn sodium, alteplase, and ticagrelor, while others were investigational biologics targeting cytokine signaling. Interaction scores highlighted high-confidence drug–gene pairs involving key inflammatory mediators, reinforcing the translational relevance of immune pathways. The majority of interactions represented inhibitory mechanisms, with agonists and monoclonal antibodies forming a smaller subset (Fig. 8A). Approximately 43% of mapped compounds were FDA-approved, while the remainder were investigational agents (Fig. 8B). Network representation identified high-confidence interactions, including P2RY12–prasugrel/ticagrelor, TNF–golimumab, and IL1B–canakinumab (Fig. 8C).

Fig. 8

Drug–gene interactions and prioritization of therapeutic candidates. (A) Distribution of interaction types across prioritized drug–gene pairs, with inhibitors most common. (B) FDA approval status of interacting drugs (43% approved, 57% investigational). (C) Drug–gene network highlighting key pairs: P2RY12–prasugrel/ticagrelor, TNF–golimumab, IL1B–canakinumab.

Virtual screening and molecular docking

Virtual screening and docking analyses were performed to evaluate the structural feasibility of prioritized drug–gene interactions. Among the candidates, the strongest binding affinities were observed for prasugrel–P2RY12 (–8.7 kcal/mol), pentamidine–IL1B (–8.5 kcal/mol), and clopidogrel–P2RY12 (–8.4 kcal/mol), while clodronic acid–CCL4 demonstrated a weaker interaction (–6.9 kcal/mol) (Table 2). Three-dimensional docking visualizations confirmed that ligands were positioned securely within predicted binding pockets, forming a combination of hydrogen bonds, hydrophobic contacts, and π–π interactions that stabilized ligand conformations (Fig. 9A–F). For instance, prasugrel engaged residues within the orthosteric pocket of P2RY12, while pentamidine formed stable hydrogen bonds with IL1B residues, supporting predicted binding stability. Two-dimensional interaction maps corroborated these observations by highlighting residue-specific contact patterns. To enhance translational relevance, all docked candidates were screened using ADMETlab (version 2.0) to assess pharmacokinetic and toxicological profiles (Table 3). Compounds with unfavorable absorption, distribution, metabolism, or excretion properties, or those showing predicted toxicities, were excluded. After filtering, several FDA-approved agents, including prasugrel and clopidogrel, were retained as high-confidence docking candidates. Collectively, these docking studies demonstrated that prioritized epilepsy-associated inflammatory mediators, particularly P2RY12 and IL1B, were structurally druggable, with clinically relevant compounds showing favorable binding energies and stable interaction profiles suitable for downstream validation.

Table 2 Docking results of prioritized drug–gene interactions. Binding affinities (kcal/mol) of FDA-approved or high-confidence drug candidates docked to their respective protein targets.
Table 3 ADMET profiling of candidate drugs: pharmacokinetic and toxicity properties of drug–gene candidates retrieved from ADMETlab (version 2.0). Parameters include Lipinski’s rule of five, molecular weight, logP, topological polar surface area (tPSA), blood–brain barrier permeability, human intestinal absorption (HIA), bioavailability, AMES mutagenicity, clinical toxicity, hERG inhibition, and CYP3A4 inhibition. Drugs such as prasugrel and fluticasone demonstrated favorable CNS penetration and high absorption, whereas clodronic acid exhibited limited bioavailability.

Molecular dynamics simulations

Molecular dynamics (MD) simulations were conducted to assess the temporal stability of the top-ranked drug–protein complexes identified in docking studies. The prasugrel–P2RY12 and pentamidine–IL1B complexes were simulated for 100 ns under physiologically relevant conditions. Root-mean-square deviation (RMSD) analysis revealed that both complexes equilibrated within the first 20 ns and remained stable throughout the trajectories, confirming convergence of the protein–ligand conformations (Fig. 9G–H). Root-mean-square fluctuation (RMSF) profiles showed localized flexibility in surface-exposed loop regions, whereas residues in the binding pockets exhibited reduced fluctuations, indicating restricted dynamics upon ligand binding (Fig. 10A–B). Interaction fraction analysis further demonstrated the persistence of key stabilizing contacts, with > 70% of frames retaining hydrogen bonds and hydrophobic interactions across the trajectories (Fig. 10C–D). For the prasugrel–P2RY12 complex, persistent hydrogen bonding with residues GLU81 and THR93 stabilized ligand orientation, while pentamidine maintained long-lived contacts with ASN86 and GLU101 in IL1B. These findings confirmed that docking-predicted interactions were preserved under dynamic conditions. Together, the MD simulations supported the stability of the prioritized drug–gene complexes over the simulated timescale and their potential translational relevance as therapeutic candidates for epilepsy.

Fig. 9

Docking of prasugrel–P2RY12 (4NTJ) and pentamidine–IL1B (4G6J) complexes. Binding pockets, 2D interactions, and RMSD trajectories demonstrated stable protein–ligand conformations, with equilibrium reached within ~ 20 ns and sustained stability across the 100 ns simulations.

Fig. 10

MD simulations of prasugrel–P2RY12 and pentamidine–IL1B. RMSF plots showed localized flexibility outside the binding pockets, while interaction-fraction profiles confirmed persistent hydrogen bonds and hydrophobic contacts. Both complexes maintained stable drug–protein interactions throughout 100 ns trajectories, supporting docking-derived binding stability.

Discussion

In this study, immune-inflammatory gene signatures associated with drug-resistant epilepsy (DRE) were identified and validated by integrating multi-cohort transcriptomic data using interpretable machine learning approaches1. Key features such as TNF, IL1B, and P2RY12 were consistently prioritized across models, with strong discriminatory power between epileptic and non-epileptic brain tissue. Functional enrichment analyses linked these genes to innate immune and microglial activation pathways, while drug–gene interaction and docking studies identified potential FDA-approved compounds for therapeutic repurposing. Together, these findings position immune signaling as both a mechanistic contributor and a therapeutic target in epilepsy. Expression of these biomarkers may be influenced by genetic variants, promoter methylation, histone marks, and microRNAs that regulate neuroinflammatory pathways in epilepsy. The results underscore the biological plausibility of a neuroimmune axis in epileptogenesis and drug resistance. Elevated expression of inflammatory mediators such as TNF and IL1B in epileptic tissue supports growing evidence that chronic neuroinflammation contributes to seizure persistence and refractoriness3. Our observation that top SHAP-prioritized genes were enriched in microglia and monocyte-related pathways aligns with prior studies demonstrating increased peripheral and central pro-inflammatory cytokines in DRE patients4,20. The presence of immune infiltration and dysregulated cytokine signaling, particularly involving CX3CR1, CSF1R, and P2RY12, reinforces the hypothesis that glial activation plays a central role in epileptogenic remodeling21. 
These findings extend prior ML transcriptomic studies in epilepsy by demonstrating cross-tissue and cross-cohort reproducibility of an immune–microglial gene set and, to our knowledge, providing the first SHAP-guided, structure-supported prioritization of P2RY12-targeting antiplatelet agents as repurposable candidates in DRE.

These findings are consistent with recent reports identifying elevated Th17 responses20, peripheral complement deficiencies6, and microglial activation22 in DRE. Our study extends this literature by integrating high-dimensional RNA-seq data with machine learning and explainability tools to offer a unified gene-level understanding of immune involvement. Unlike prior studies focused on single cohorts or blood-based markers, our multi-dataset approach leverages both cortical and hippocampal tissues, increasing generalizability and biological resolution. However, because the datasets were cross-sectional, longitudinal immune-inflammatory dynamics and the predictive value for treatment response could not be assessed. Nevertheless, several strengths support the robustness of our results23,24. First, the study performed batch-aware normalization and addressed class imbalance using stratified cross-validation with SMOTE, reducing common biases in transcriptomic studies. Second, our ensemble modeling captured both linear and non-linear feature relationships, enhancing interpretability across algorithms. Third, external validation in an independent cohort demonstrated strong predictive consistency, reinforcing the translational potential of the identified biomarkers. Finally, integrating SHAP-based feature interpretation with drug–gene docking provides a practical framework for prioritizing drug repurposing candidates. Our use of SHAP and permutation importance provides post-hoc explanations of model predictions rather than entirely intrinsic interpretability; we therefore interpret these results as transparent feature attributions that must be cross-validated with independent biological evidence, rather than as proofs of causal mechanisms.
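One way to combine stratified cross-validation with SMOTE without leaking synthetic samples into evaluation is to oversample only the training portion of each fold. The sketch below illustrates that idea in plain Python. It is not the pipeline used in this study: the function names, the simplified nearest-neighbour interpolation (a reduced form of SMOTE with a single neighbour), and the toy data are all assumptions made for illustration.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def smote_like(minority, n_new, rng):
    """Simplified SMOTE: interpolate between a minority sample and its
    nearest minority neighbour (illustrative, single-neighbour variant)."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        others = [m for m in minority if m is not a]
        b = min(others, key=lambda m: sum((p - q) ** 2 for p, q in zip(a, m)))
        t = rng.random()
        synthetic.append([p + t * (q - p) for p, q in zip(a, b)])
    return synthetic


def oversampled_training_sets(X, y, k=3, seed=0):
    """Yield (X_train, y_train, test_idx) per fold; synthetic samples are
    added to the training split only, never to the held-out fold."""
    rng = random.Random(seed)
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        counts = {c: y_train.count(c) for c in set(y_train)}
        minority_cls = min(counts, key=counts.get)
        deficit = max(counts.values()) - counts[minority_cls]
        minority = [x for x, c in zip(X_train, y_train) if c == minority_cls]
        if deficit > 0 and len(minority) >= 2:
            X_train += smote_like(minority, deficit, rng)
            y_train += [minority_cls] * deficit
        yield X_train, y_train, test_idx


# Toy, made-up data: 9 samples of class 1 and 6 of class 0, two features each.
X = [[float(i), float(i % 4)] for i in range(15)]
y = [1] * 9 + [0] * 6
for X_tr, y_tr, test_idx in oversampled_training_sets(X, y, k=3):
    print(len(X_tr), y_tr.count(0), y_tr.count(1), sorted(test_idx))
```

Keeping the held-out fold free of synthetic samples means that performance estimates reflect only real observations, which is the usual motivation for resampling inside rather than before cross-validation.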

Several of our observations, including upregulation of TNF, IL1B, and microglial markers, confirm prior reports of neuroinflammation in drug-resistant epilepsy. Beyond corroborating these mechanisms, our study provides novel contributions by integrating hippocampal and cortical RNA-seq cohorts with batch-aware modeling, using multi-model SHAP attributions to define a robust immune–microglial gene set, and systematically nominating P2RY12-centered drug–target pairs for repurposing.

Prior studies have highlighted immune and inflammatory processes as central to drug-resistant epilepsy (DRE). Elevated TNF and IL1B levels in serum and brain tissue suggest their role in sustaining seizure activity24. Microglial activation and dysregulation of P2RY12 and CX3CR1 signaling contribute to epileptogenic remodeling25. Complement pathway abnormalities have been observed in refractory patients, indicating peripheral immune dysfunction26. Evidence also points to increased Th17 responses and pro-inflammatory cytokines in DRE20. Additionally, glial-driven neuroinflammation has been repeatedly confirmed in both experimental and clinical models27. Our study confirms these findings but advances the field by integrating multi-cohort transcriptomics with interpretable machine learning, prioritizing TNF, IL1B, and P2RY12 across tissues, and uniquely linking them to repurposable FDA-approved compounds.

Nonetheless, bulk RNA-seq precludes single-cell resolution, potentially masking cell-specific expression patterns. Although the study infers immune cell-type enrichment via deconvolution, future single-cell RNA-seq analyses could validate these findings with greater precision. Additionally, although the study controlled for batch and tissue heterogeneity, unmeasured confounders, such as medication history or comorbid inflammation, could influence expression profiles. Finally, in silico docking and dynamics simulations offer only preliminary evidence of druggability; experimental validation is needed to confirm biological efficacy and safety.

These findings have important implications. Clinically, immune-related biomarkers such as P2RY12, TNF, and IL1B may serve as diagnostic indicators or predictors of treatment response, particularly in patients considered for immunomodulatory therapy. Our identification of repurposable drugs like prasugrel and clopidogrel is timely, given the growing interest in anti-inflammatory strategies for epilepsy27,28.

This work has several limitations. First, we analyzed bulk RNA-seq, which cannot resolve cell-type–specific or spatially restricted expression. Second, clinical variables (e.g., detailed treatment history, seizure frequency, comorbid inflammation) were incompletely available, limiting clinicogenomic correlation. Third, all biomarker and drug-repurposing findings are based on in silico analyses; we did not perform qPCR, immunohistochemistry, Western blot/ELISA, or in vitro drug-treatment assays to experimentally validate gene or protein expression or the functional impact of candidate drugs. Accordingly, our results should be viewed as hypothesis-generating: TNF, IL1B, P2RY12, and the proposed drug–target interactions are prioritized hypotheses requiring confirmation in patient tissue, experimental epilepsy models, cell-based assays, and independent single-cell and proteomic datasets.

Future work will benchmark docking- and MD-derived binding strengths against experimental affinity measurements (e.g., SPR, ITC, radioligand assays) and assess the correlation between predicted energies and Kd/IC₅₀ values for prioritized complexes. Therapeutic modulation of neuroinflammation could complement or augment current antiseizure medications, particularly in patients with refractory seizures. Experimental in vitro and in vivo validation, including in animal models and patient-derived tissues, is required to confirm the candidate targets and the efficacy and safety of the repurposable FDA-approved candidates identified in this study. Integration of spatial transcriptomics and multi-omics (e.g., epigenomics, proteomics, metabolomics) will enable localization of immune-inflammatory programs within epileptogenic microenvironments and link transcriptional signatures to upstream regulation and downstream functional states, thereby refining mechanistic insight into neuroimmune contributions to drug-resistant epilepsy.
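A benchmark of predicted against measured affinities could, for example, convert each experimental dissociation constant to a binding free energy via ΔG = RT ln(Kd) (Kd in mol/L, relative to a 1 M standard state) and then compute a rank correlation with the docking scores. The sketch below shows this under stated assumptions: the function names are illustrative, and every score and Kd value in it is a made-up placeholder, not a result of this study. IC50 values are not handled here because converting them to an affinity requires assay-specific information.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholders only: four ligand-target complexes with
# invented docking scores (kcal/mol) and invented Kd values (mol/L).
docking_scores = [-9.1, -7.4, -8.2, -6.0]
measured_kd = [5e-9, 2e-6, 8e-8, 5e-7]

experimental_dg = [kd_to_delta_g(kd) for kd in measured_kd]
rho = spearman(docking_scores, experimental_dg)
print(f"Spearman rho = {rho:.2f}")
```

A rank correlation is used in this sketch because docking scores are generally more reliable for ordering ligands than for reproducing absolute free energies; a Pearson correlation on the same quantities would test a stricter linear agreement.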

Additionally, longitudinal studies are needed to assess whether inflammatory markers can predict disease progression or treatment resistance. The use of single-cell technologies and spatial transcriptomics will further refine our understanding of cellular heterogeneity and tissue microenvironment in epilepsy.

Conclusion

This study presents a comprehensive transcriptomic analysis of drug-resistant epilepsy, integrating multi-cohort RNA-sequencing datasets with explainable machine learning to identify robust, biologically relevant biomarkers. By controlling for batch effects, tissue heterogeneity, and class imbalance, the study prioritized a set of immune-inflammatory genes, most notably TNF, IL1B, CX3CR1, and P2RY12, that consistently differentiated epileptic from non-epileptic brain tissue. These biomarkers demonstrated high classification performance across internal and external datasets, underscoring their potential diagnostic utility. Functional enrichment analyses confirmed that these genes are enriched in microglial and monocyte-related pathways, implicating neuroimmune activation as a key component of epilepsy pathology. Drug–gene interaction mapping and structure-based screening further revealed that several of these targets are actionable, with high-affinity interactions identified for FDA-approved agents such as prasugrel and clopidogrel. These findings suggest promising avenues for therapeutic repurposing, particularly in targeting inflammatory mechanisms underlying epilepsy. Despite the strengths of this multi-method framework, including external validation, interpretable feature selection, and integrative drug screening, limitations remain. These include the use of bulk RNA-seq, which may mask cell-type–specific dynamics, and the retrospective nature of dataset harmonization. Future studies should pursue single-cell validation and in vivo functional assays to further elucidate causal mechanisms. Accordingly, our study should be viewed as a computational prioritization framework that nominates immune-inflammatory biomarkers and repurposable agents for downstream experimental testing, rather than as definitive mechanistic validation of these targets.
Overall, this study demonstrates that immune-inflammatory transcriptomic signatures can reliably stratify epileptic pathology and may serve as both biomarkers and therapeutic entry points. These results provide a foundation for future precision medicine efforts in epilepsy, bridging computational discovery with translational potential. Integrative transcriptomic modeling reveals immune signatures central to epilepsy and highlights repurposable drugs with therapeutic promise, advancing the path toward precision diagnostics and treatment.