Explainable machine learning identifies immune-inflammatory biomarkers and therapeutic candidates in drug-resistant epilepsy

Ijaz, Tayyab; Maqsood, Hamna; Rehman, Abdur; Tahir ul Qamar, Muhammad; Ashfaq, Usman Ali

doi:10.1038/s41598-025-30401-x


Article
Open access
Published: 25 December 2025


Scientific Reports volume 16, Article number: 965 (2026)


Abstract

Drug-resistant epilepsy (DRE) affects approximately one-third of people with epilepsy and remains a major clinical challenge. Growing evidence implicates neuroinflammation as a key contributor to epileptogenesis and therapeutic resistance, but comprehensive, reproducible transcriptomic biomarkers are lacking. This study aimed to identify immune-inflammatory gene signatures associated with DRE using integrated transcriptomic profiling and machine-learning classifiers coupled with SHAP-based post-hoc explainability. We curated and integrated 197 publicly available RNA-sequencing samples from cortical and hippocampal tissues (162 epileptic and 35 non-epileptic control samples) from two Gene Expression Omnibus (GEO) datasets, with a third GEO dataset reserved for external validation. After preprocessing and batch correction, we performed differential expression analysis followed by ensemble-based feature selection with five supervised classifiers: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Extra Trees (ETs), and XGBoost. SHAP (SHapley Additive Explanations) values were used to prioritize features. External validation was conducted on the independent dataset (n = 68). Drug–gene interaction mapping and molecular docking were applied to the top-ranked genes. We identified 897 differentially expressed genes (DEGs), 659 upregulated and 238 downregulated, which were enriched for immune-inflammatory processes. The classifiers achieved high internal performance (mean ROC-AUC: 0.98–0.99) and robust external performance (ensemble ROC-AUC: 0.93). SHAP analysis consistently prioritized genes including TNF, IL1B, and P2RY12, which were enriched in microglial and monocyte-related pathways. Drug–gene interaction analysis identified multiple repurposable compounds, and prasugrel and pentamidine showed the strongest predicted binding affinities in docking studies.
This study reveals reproducible immune-related transcriptomic biomarkers of drug-resistant epilepsy, highlights actionable targets for therapeutic repurposing, and provides a framework for precision medicine approaches in epilepsy. Code and processed data are available at: https://github.com/Tayyab-Ijaz/EpilepsyBiomarkerDrugs.


Introduction

Epilepsy is a chronic neurological disorder that affects more than 70 million people worldwide and is characterized by recurrent unprovoked seizures caused by abnormal neuronal activity. Despite the availability of pharmacological and surgical treatments, approximately one-third of people with epilepsy have drug-resistant epilepsy (DRE), defined as failure to achieve sustained seizure control after adequate trials of appropriately chosen antiseizure medications¹. DRE carries increased risks of cognitive decline, psychiatric comorbidities, accidental injury, and premature mortality, making it a substantial clinical and public health concern². Evidence is rapidly accumulating that neuroinflammation is a key mediator of epileptogenesis, especially in drug-resistant disease. Both preclinical models and clinical studies consistently demonstrate increased expression of proinflammatory cytokines, such as IL-1β, TNF, and IL-6, in seizure-initiating brain regions and in peripheral blood^3,4. Microglial activation, complement system dysregulation, and leukocyte infiltration are now recognized as key contributors to seizure initiation, propagation, and resistance to therapy^5,6. These results have motivated the search for immune-related biomarkers and for treatments that target neuroinflammatory pathways^5,6.

Despite this, the transcriptomic landscape of DRE remains incompletely defined; most studies rely on small sample sizes or focus on a limited number of candidate cytokines, making it difficult to identify gene-level signatures that are reproducible across studies^7,8. Moreover, heterogeneity in tissue sources, batch effects, and data preprocessing across studies hampers direct comparison and generalization⁹. The lack of integrative, high-dimensional analyses with model interpretability has limited the clinical application of transcriptomic findings in DRE. A systems-level design that combines transcriptomic integration, machine learning, and model explainability offers a way to address these limitations. SHAP (SHapley Additive Explanations), a post-hoc, model-agnostic explanation method, quantifies the contribution of each feature to a model's predictions, improving transparency in biomarker discovery and prioritization even though the underlying models themselves remain complex¹⁰. Furthermore, combining gene signatures with drug–target interactions and in silico docking provides a promising framework for therapeutic repurposing in epilepsy. This study was therefore designed to identify and validate immune-inflammatory gene signatures from DRE datasets using integrated transcriptomic profiling and interpretable machine learning, following the overall workflow shown in Fig. 1. We hypothesized that immune-related genes would reliably distinguish epileptic from non-epileptic tissue and that top-ranked biomarkers would be druggable, offering translational potential for biomarker-guided therapy.

Methods

Data acquisition

Three publicly available RNA sequencing datasets (GSE256068, GSE213488, and GSE186334) were retrieved from the NCBI Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). GSE256068 contains RNA sequencing data from hippocampal and cortical brain tissues of patients with drug-resistant epilepsy. It includes 64 samples from patients with temporal lobe epilepsy and hippocampal sclerosis (TLE-HS), 33 samples from patients with focal cortical dysplasia type IIb (FCD2b), 21 samples from patients with tuberous sclerosis complex (TSC), and 27 non-epileptic controls, spanning both hippocampal and cortical regions. GSE213488 includes gray- and white-matter transcriptomic profiles of focal cortical dysplasia type IIa (FCD2a) and IIb (FCD2b), with 17 epileptic and four control samples. These datasets were generated on Illumina sequencing platforms and are accompanied by well-annotated metadata, making them suitable for integrative and comparative transcriptomic analyses. The unified discovery dataset constructed from GSE256068 and GSE213488 comprises 197 samples, including 162 from epileptic patients and 35 from non-epileptic controls. To assess the robustness and generalizability of the top features, GSE186334 was used for external validation; it includes cortical expression profiles from 46 epileptic and 22 non-epileptic individuals. GSE186334 is an entirely independent cohort, generated in a separate study with cortical-only sampling and distinct sequencing and batch characteristics, and it was not used for differential expression analysis, feature selection, hyperparameter tuning, or model training. All datasets were accessed in May 2025, and raw count matrices and metadata were downloaded in their original form to ensure reproducibility.

Data preprocessing

Raw RNA counts from GSE256068 and GSE213488 were merged into a single count matrix using gene symbols as common identifiers. Low-abundance genes, defined as those with fewer than 10 total reads across all samples, were removed to reduce noise from weakly expressed transcripts⁹. Sample metadata were curated to include batch identifiers, tissue type (hippocampal or cortical), and clinical condition (epileptic or non-epileptic). Metadata alignment was verified by confirming that sample identifiers matched the columns of the count matrix, and data integrity checks were performed to remove duplicate entries and samples with missing annotations. No attention-based integration methods were used; datasets were merged strictly by shared gene symbols, and batch-aware normalization was then performed with the variance-stabilizing transformation (VST) in DESeq2 (version 1.42.0, R version 4.3.1) to normalize gene counts and mitigate the influence of sequencing depth¹¹. Specifically, DESeq2 models counts with a negative binomial distribution, estimates gene-wise dispersions, and fits a smooth mean–dispersion trend; the VST is then derived as an (approximate) integral over this mean–variance relationship, yielding transformed values with approximately constant variance across expression levels. Principal component analysis (PCA) was performed to assess how strongly batch and tissue, as recorded in the metadata, drove the major axes of variation.
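
The filtering and transformation steps above can be sketched in NumPy. This is a minimal illustration, not the authors' DESeq2 pipeline: the helper names are hypothetical, size factors follow the median-of-ratios idea, and the VST uses the closed form obtained by integrating over a parametric dispersion trend α(μ) = a0 + a1/μ, analogous to what DESeq2 applies when that trend is fitted parametrically. The coefficients `asympt_disp` and `extra_pois` are assumed inputs here, whereas DESeq2 estimates them from the data.

```python
import numpy as np


def filter_low_counts(counts, min_total=10):
    """Keep genes (rows) with at least `min_total` reads summed over all samples."""
    keep = counts.sum(axis=1) >= min_total
    return counts[keep], keep


def median_of_ratios_size_factors(counts):
    """Per-sample size factors: median ratio of each sample to the per-gene geometric mean."""
    # Only genes with non-zero counts in every sample have a finite geometric mean.
    expressed_everywhere = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed_everywhere])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


def parametric_vst(counts, size_factors, asympt_disp, extra_pois):
    """Variance-stabilizing transform for a negative binomial with
    dispersion trend alpha(mu) = asympt_disp + extra_pois / mu.

    Integrating 1 / sqrt(mu + alpha(mu) * mu**2) gives a log-like curve;
    the constants below scale it so that large counts map to roughly log2(q).
    """
    q = counts / size_factors  # normalized counts: each column divided by its size factor
    a0, a1 = asympt_disp, extra_pois
    numerator = 1 + a1 + 2 * a0 * q + 2 * np.sqrt(a0 * q * (1 + a1 + a0 * q))
    return np.log2(numerator / (4 * a0))
```

For large normalized counts the transform approaches log2(q), while low counts are compressed, which is what keeps the variance roughly constant across expression levels.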

Differential expression analysis

Differential expression analysis (DEA) was performed with DESeq2 (version 1.42.0), with batch and tissue specified explicitly as covariates. Variance partitioning was used alongside this model to quantify the contribution of these factors, supporting reliable identification of disease-associated genes^12,13. Wald tests were used to assess differential expression between epileptic and non-epileptic samples. Log2 fold-change estimates were shrunk with apeglm (version 1.24.0), which yields more stable effect sizes, especially for lowly expressed genes. Genes were defined as differentially expressed if they met two criteria: a Benjamini–Hochberg adjusted p-value (false discovery rate) < 0.05 and an absolute log₂ fold change (|log₂FC|) ≥ 1^14,15. Together, these thresholds control false discoveries and require at least a two-fold change in expression between epileptic and control samples.
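
The DEG call above reduces to a Benjamini–Hochberg adjustment followed by two thresholds. The NumPy sketch below illustrates that logic on hypothetical inputs; the function names are illustrative, and it is not DESeq2's `results()` function, which additionally applies independent filtering and can return missing adjusted p-values.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


def call_degs(log2fc, pvalues, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Flag up- and downregulated genes using padj < cutoff and |log2FC| >= cutoff."""
    padj = benjamini_hochberg(pvalues)
    lfc = np.asarray(log2fc, dtype=float)
    significant = (padj < padj_cutoff) & (np.abs(lfc) >= lfc_cutoff)
    return padj, significant & (lfc > 0), significant & (lfc < 0)
```

In this example a gene with a small p-value but |log₂FC| = 0.5 is excluded, which is the effect of requiring at least a two-fold change.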

Feature selection and ensemble modeling

Feature selection was performed using an ensemble approach that integrated five supervised learning algorithms: RF, ETs, XGBoost, LASSO LR, and linear SVM¹⁶. These classifiers were implemented in Python (scikit-learn version 1.4.2 and XGBoost version 2.1.0). To address class imbalance (~84% epileptic samples), stratified 5-fold cross-validation was applied, and the Synthetic Minority Oversampling Technique (SMOTE) was applied only to the training folds, so that synthetic minority samples never entered the held-out folds¹⁷. Feature importance scores were obtained from each model as coefficient magnitudes (linear models) or impurity-based importances (tree-based models). Hyperparameters for RF, ETs, XGBoost, LASSO, and SVM were tuned by grid search with inner cross-validation on the training data only. For example, RF and ETs were tuned over n_estimators (100–1000) and max_depth (3 to None, i.e., unlimited depth); XGBoost over n_estimators (100–800), max_depth (3–10), and learning_rate (0.01–0.3); and the C and penalty parameters for LASSO and SVM over log-spaced grids. A fixed random_state (42) was used for cross-validation splits, tree initialization, and SMOTE to ensure reproducibility.
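
Applying SMOTE only inside the training folds is what keeps synthetic samples out of the evaluation data. The sketch below shows the idea with NumPy alone; the helper functions are hypothetical stand-ins for library implementations such as scikit-learn's `StratifiedKFold` and imbalanced-learn's `SMOTE`, and it assumes the minority (control) class is labeled 0 and has at least two training samples per fold.

```python
import numpy as np


def stratified_folds(y, n_splits=5, seed=42):
    """Split sample indices into folds that preserve class proportions."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_splits)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        for f, chunk in enumerate(np.array_split(idx, n_splits)):
            folds[f].extend(chunk.tolist())
    return [np.sort(np.array(f, dtype=int)) for f in folds]


def smote(X_min, n_new, k=5, seed=42):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def oversample_training_fold(X_train, y_train, minority=0, seed=42):
    """Balance one training fold; the held-out fold is never passed in."""
    X_min = X_train[y_train == minority]
    n_new = int((y_train != minority).sum()) - len(X_min)
    if n_new <= 0:
        return X_train, y_train
    X_syn = smote(X_min, n_new, seed=seed)
    return (np.vstack([X_train, X_syn]),
            np.concatenate([y_train, np.full(n_new, minority)]))
```

Inside a cross-validation loop, each test fold comes from `stratified_folds`, the remaining indices are balanced with `oversample_training_fold`, and the model is fit on the balanced set and scored on the untouched test fold.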

SHAP (version 0.45.0) values and model-agnostic permutation importance were computed to validate feature relevance across model classes. For each classifier, genes were ranked by their SHAP contribution and retained in rank order until their cumulative contribution reached 50% of the model's total explanatory power; the same cutoff was applied to all classifiers to ensure consistency¹⁸. Combining linear and tree-based models in this way was intended to reduce model-specific bias and redundancy and to capture both linear and non-linear gene–disease relationships, while keeping feature selection reproducible and interpretable.
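
The 50% cumulative-contribution rule can be expressed in a few lines. This sketch assumes one mean absolute SHAP value per gene for a single classifier; the function name and example values are hypothetical.

```python
import numpy as np


def select_by_cumulative_share(genes, mean_abs_shap, share=0.5):
    """Rank genes by mean |SHAP| and keep the smallest top set whose
    cumulative share of the total reaches `share`."""
    values = np.asarray(mean_abs_shap, dtype=float)
    order = np.argsort(values)[::-1]                 # most important first
    cumulative = np.cumsum(values[order]) / values.sum()
    n_keep = int(np.searchsorted(cumulative, share)) + 1  # first rank reaching the share
    return [genes[i] for i in order[:n_keep]]
```

With the example values in the check below, the two top-ranked genes already account for 70% of the total, so only they are retained at the 50% cutoff.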

Post-hoc model explainability with SHAP and permutation importance

To quantify gene contributions to the models' predictions, explainability assessments were performed using SHAP and permutation importance. SHAP was implemented separately for each model type: TreeExplainer for RF, ETs, and XGBoost, and LinearExplainer and KernelExplainer for LASSO LR and SVM, respectively, giving a model-appropriate interpretation of gene relevance. SHAP values were aggregated across samples to compute global feature importance, while local interpretability was assessed by examining SHAP distributions in misclassified samples. Permutation importance was applied across models as a sanity check, confirming whether predictive performance declined when the values of specific genes were randomly permuted. These methods were implemented using SHAP (version 0.45.0) and scikit-learn (version 1.4.2). Mean absolute SHAP values were computed per gene and normalized so that the contributions within each model summed to 1. Normalized SHAP scores were then averaged across the five classifiers to obtain a consensus importance for each gene.
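
The per-model normalization and cross-model averaging described above can be sketched as follows. It assumes each model contributes a single SHAP matrix of shape (samples, genes) for the epileptic class, with gene columns in the same order across models; the function name is hypothetical and not part of the SHAP library.

```python
import numpy as np


def consensus_importance(shap_by_model):
    """Average of per-model normalized mean |SHAP| values.

    shap_by_model: dict mapping a model name to an array of SHAP values with
    shape (n_samples, n_genes); gene columns must be in the same order for all models.
    """
    normalized = []
    for values in shap_by_model.values():
        mean_abs = np.abs(values).mean(axis=0)        # global importance per gene
        normalized.append(mean_abs / mean_abs.sum())  # contributions within a model sum to 1
    return np.mean(normalized, axis=0)                # equal weight for every classifier
```

Normalizing before averaging keeps a model whose SHAP values happen to be on a larger scale from dominating the consensus.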

External validation

External validation was performed by applying the best-performing classifiers trained on internal cross-validation folds to the independent GSE186334 dataset. For each of the five folds, a separate model was trained, predictions were made on the external dataset, and performance metrics, including accuracy, precision, recall, F1 score, and area under the ROC curve (AUC), were computed. An ensemble strategy was implemented by averaging predicted probabilities across the fold models, thereby producing a consensus classifier. This ensemble was chosen to reduce fold-to-fold variability and improve generalization. Performance was evaluated using ROC and precision–recall curves, and results were compared with internal cross-validation outcomes to quantify consistency. All analyses were conducted in Python using scikit-learn (version 1.4.2).
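
The fold-ensemble evaluation reduces to averaging each fold model's predicted probabilities on the external samples and scoring the average. The sketch below uses the rank (Mann–Whitney) formulation of ROC-AUC, which equals the area under the empirical ROC curve, plus threshold-based metrics at an assumed cutoff of 0.5; the function names and example probabilities are hypothetical, and in practice `sklearn.metrics.roc_auc_score` would typically be used.

```python
import numpy as np


def ensemble_probabilities(fold_probabilities):
    """Average the external-set probabilities predicted by each fold model."""
    return np.mean(np.vstack(fold_probabilities), axis=0)


def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def threshold_metrics(y_true, prob, threshold=0.5):
    """Accuracy, precision, recall and F1 after thresholding probabilities."""
    y = np.asarray(y_true).astype(bool)
    pred = np.asarray(prob, dtype=float) >= threshold
    tp = int(np.sum(pred & y))
    fp = int(np.sum(pred & ~y))
    fn = int(np.sum(~pred & y))
    accuracy = float(np.mean(pred == y))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

In the toy check below, one fold model misranks a sample (AUC 0.75) while the averaged probabilities rank every sample correctly, which illustrates how averaging can dampen fold-to-fold variability.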

Drug–gene interaction mapping

To assess the translational potential of top features, drug–gene interaction mapping was performed using the Drug–Gene Interaction Database (DGIdb, version 4.3.0, https://www.dgidb.org/). The top-ranked genes from the SHAP analyses were submitted to the database, and all reported interactions were retrieved¹⁹. To ensure reproducibility and clinical relevance, interactions were filtered according to two criteria: (1) only FDA-approved drugs were retained, and (2) only interactions with an interaction score > 2 were retained. These thresholds were chosen to minimize the inclusion of low-confidence or preclinical interactions while maintaining potential therapeutic relevance. Additional priority was given to drugs with established roles in neurological or inflammatory pathways, given their relevance to epilepsy pathophysiology.
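
The two filtering rules can be applied to retrieved interactions as shown below. The `Interaction` record and its fields are hypothetical placeholders, not DGIdb's actual response schema, and the gene and drug names in the example are dummies rather than real query results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical flattened record for one drug-gene interaction."""
    gene: str
    drug: str
    fda_approved: bool
    score: float


def filter_interactions(interactions, min_score=2.0):
    """Rule (1): FDA-approved drugs only; rule (2): interaction score strictly above min_score."""
    return [i for i in interactions if i.fda_approved and i.score > min_score]


def drugs_by_gene(interactions):
    """Group retained interactions by gene, highest score first."""
    grouped = defaultdict(list)
    for i in sorted(interactions, key=lambda rec: rec.score, reverse=True):
        grouped[i.gene].append(i.drug)
    return dict(grouped)
```

Note that a score of exactly 2 is excluded, matching the strict "> 2" threshold.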

Molecular docking and molecular dynamics simulations

Molecular docking and molecular dynamics (MD) simulations were performed to evaluate the structural feasibility of drug–gene interactions. Protein structures corresponding to prioritized genes were retrieved from the Protein Data Bank (PDB, https://www.rcsb.org/) when available, while unresolved structures were modeled using SWISS-MODEL (version 4.2.0, https://swissmodel.expasy.org/). Ligand structures of FDA-approved drugs were prepared using PyRx (version 0.9.8) and screened for pharmacokinetic and toxicity properties with ADMETlab (version 2.0, https://admetmesh.scbdd.com/) to ensure drug-likeness and safety relevance. Docking studies were then carried out using AutoDock Vina (version 1.2.5), focusing on biologically annotated binding sites. Candidate complexes were ranked according to binding affinity scores, and the top-scoring interactions were selected for MD simulations. MD simulations were performed for 100 ns under physiological conditions using Schrödinger Desmond (Schrödinger Release 2024-1, https://www.schrodinger.com/). System stability was assessed using root mean square deviation (RMSD) and root mean square fluctuation (RMSF), which provided insights into temporal stability, binding persistence, and protein–ligand flexibility.
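
The two stability metrics can be illustrated in NumPy. This is not Desmond's implementation: it assumes plain (atoms × 3) coordinate arrays, uses the Kabsch algorithm to superpose a frame onto a reference before computing RMSD, and assumes the trajectory passed to the hypothetical `rmsf` helper has already been superposed.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return float(np.sqrt(np.mean(np.sum((P @ R.T - Q) ** 2, axis=1))))


def rmsf(trajectory):
    """Per-atom RMSF for an already superposed trajectory of shape (n_frames, n_atoms, 3)."""
    mean_positions = trajectory.mean(axis=0)
    return np.sqrt(((trajectory - mean_positions) ** 2).sum(axis=2).mean(axis=0))
```

RMSD summarizes how far a whole structure drifts from a reference over time, whereas RMSF reports how much each atom or residue fluctuates around its average position, which is why it is used to compare loop and binding-pocket flexibility.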

Results

Integrated transcriptome profiling

RNA sequencing samples from two GEO datasets (GSE256068 and GSE213488) were included; after filtering, 18,642 protein-coding genes were retained for downstream analysis. Most epileptic samples originated from hippocampal and temporal lobe tissues (Fig. 2A), whereas control samples were more evenly distributed across regions. A heatmap of sample-to-sample distances showed clear separation between epileptic and control groups, although tissue-specific clustering was also observed (Fig. 2B). The integrated data comprised two batches, with 82.2% of samples derived from GSE256068 and 17.8% from GSE213488 (Fig. 2C). Variance decomposition showed that batch accounted for 27.3% of overall variation, tissue type for 10.5%, and clinical condition for 2.4% (Fig. 2D). Principal component analysis showed batch-driven segregation before correction, whereas after batch-aware variance-stabilizing transformation in DESeq2 the samples aligned closely with epileptic status, indicating effective reduction of batch effects (Fig. 2E–F). The first two principal components explained 49.3% of the variance, reflecting the influence of both technical and biological factors (Fig. 2G).
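
The variance explained by the leading principal components, as reported for Fig. 2G, is the share of squared singular values of the column-centered (samples × genes) matrix. A minimal sketch, assuming VST-transformed expression values as input; the function name is hypothetical.

```python
import numpy as np


def pca_explained_variance(X, n_components=2):
    """Fraction of total variance captured by the leading principal components."""
    Xc = X - X.mean(axis=0)                   # center each gene (column)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, largest first
    var = s ** 2
    return var[:n_components] / var.sum()
```

Summing the first two returned fractions gives the kind of combined figure quoted above.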

Immuno-inflammatory axis dominates differential expression

Differential expression analysis identified 1,220 genes significantly dysregulated between epileptic and control tissues (adjusted p < 0.05, |log2FC| ≥ 1; Fig. 3A, B), of which 672 were upregulated and 548 downregulated (Fig. 3C). MA plots showed that genes meeting these thresholds had stable effect sizes across the expression range. Excluding pseudogenes and unannotated loci yielded 897 high-confidence differentially expressed genes for further analyses (Fig. 3D). Prominent upregulated genes included TNF, CCL3, XIRP1, and ACY3, whereas notable downregulated genes included IL1RL1, IL1R2, and CXCL11 (Fig. 3E). Volcano plots showed strong enrichment of inflammatory mediators among upregulated genes, whereas anti-inflammatory regulators such as IL1R2 showed reduced expression.

Ensemble-driven biomarker prioritization

To refine predictive features, five machine learning classifiers were evaluated under stratified five-fold cross-validation. Class imbalance was evident, with 84% of samples labeled epileptic and 16% non-epileptic (Fig. 4A). All models performed robustly, with PR-AUC values above 0.95 (Fig. 4B) and mean ROC-AUC values ranging from 0.98 to 0.99 (Fig. 4C). Calibration analysis showed that logistic regression achieved the lowest Brier score (0.035), while tree-based models were slightly overconfident (Fig. 4D). Aggregated metrics showed accuracy above 0.95, recall above 0.93, and F1-scores above 0.94 across classifiers (Fig. 4E).
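
The Brier score used for calibration is the mean squared difference between predicted probabilities and observed 0/1 labels, so lower is better. The sketch below also bins predictions into a simple reliability table, which is one common way to inspect the overconfidence noted for tree-based models; the binning scheme, function names, and example values are illustrative assumptions.

```python
import numpy as np


def brier_score(y_true, prob_positive):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(prob_positive, dtype=float)
    return float(np.mean((p - y) ** 2))


def reliability_table(y_true, prob_positive, n_bins=5):
    """(bin, mean predicted probability, observed positive rate) for each non-empty equal-width bin."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(prob_positive, dtype=float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)  # p = 1.0 falls in the last bin
    rows = []
    for b in np.unique(bins):
        in_bin = bins == b
        rows.append((int(b), float(p[in_bin].mean()), float(y[in_bin].mean())))
    return rows
```

A well-calibrated model has mean predicted probabilities close to the observed rates in each bin; an overconfident model pushes predictions toward 0 and 1 more than the observed rates justify.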

High-fidelity classification of epileptic tissue

External validation using GSE186334 (n = 68 samples; 46 epileptic, 22 control) confirmed the generalizability of the Random Forest classifiers. Accuracy remained stable across folds at 0.84 ± 0.03, and recall was 0.82 ± 0.04, indicating good sensitivity for epileptic tissue (Fig. 5A). Precision–recall curves showed external PR-AUC values of 0.95–0.97 (Fig. 5B). Fold-specific models achieved ROC-AUC values of 0.89–0.94, and the ensemble classifier yielded an AUC of 0.93 (Fig. 5C). Comparative bar plots showed slightly higher performance in internal cross-validation (AUC 0.96–0.99) than in external testing (AUC 0.80–0.93; Fig. 5D). Despite these modest reductions, external validation supported the reproducibility of the identified biomarkers in an independent cohort.

SHAP explanation of top features

SHAP analysis was performed across all classifiers to identify gene-level contributions to epileptic classification. The top contributors included XIRP1, P2RY13, CCL3-AS1, ACY3, TNF, DHRS9, CX3CR1, and P2RY12, each with a mean absolute SHAP value above 0.005 (Fig. 6A). Cumulative SHAP plots showed that the top 50 genes accounted for more than 50% of the overall explanatory power (Table 1). Summary plots showed that inflammatory and microglial genes, such as IL1B, CD163, S100A8/A9, SELPLG, and CCL4, exerted a strong directional influence on epileptic classification (Fig. 6B). Gene-specific dependence plots further illustrated the predictive behavior of these features: high IL1B and TNF expression drove epileptic predictions, while reduced IL1R2 expression was associated with controls (Fig. 6C–G). These results establish a reproducible, multi-model consensus of discriminative gene features in the epileptic transcriptomic landscape. Some highly ranked genes (e.g., XIRP1) currently lack a clear immune or epilepsy annotation. We therefore emphasize TNF, IL1B, P2RY12, and other immune–microglial genes in the main text, because they are both SHAP-prioritized and strongly supported by pathway analysis, cell-type enrichment, and prior literature, and we treat genes such as XIRP1 as exploratory candidates for future mechanistic work.

Table 1 Ranking of top gene features based on SHAP analysis, showing mean absolute SHAP values, normalized contributions, and cumulative proportions used to define the prioritized biomarker set.


Biological validation through ontologies and pathways

Functional enrichment analyses of the SHAP-prioritized features were performed post hoc and supported their biological plausibility; enrichment results were not used in any model training step. Protein–protein interaction (PPI) networks revealed strong clustering of immune and inflammatory mediators, including TNF, CX3CR1, CCL4, IL1A, and CD163 (Fig. 7A). Gene Ontology enrichment analysis revealed overrepresentation of biological processes related to leukocyte adhesion, the inflammatory response, cytokine secretion, and chemotaxis (Fig. 7B). Molecular function terms were dominated by cytokine and chemokine receptor activity, while cellular component enrichment included plasma membrane and extracellular vesicle localization (Fig. 7C). Reactome pathway analysis identified significant enrichment of IL-10 signaling, purinergic receptor signaling, and antimicrobial response pathways (Fig. 7D). Transcription factor analysis indicated regulatory influence from SPI1, RUNX1, NFKB1, and GATA family members, highlighting upstream immune regulation (Fig. 7E). These findings support the functional plausibility of the prioritized biomarkers by linking them to known immune-inflammatory mechanisms relevant to epilepsy. Cell-type enrichment analysis (Supplementary File F1) showed that the prioritized biomarkers were strongly enriched in immune-related cell types, particularly microglia, monocytes, and macrophages, across brain and peripheral tissues. Key genes driving these enrichments included CX3CR1, P2RY12, CSF1R, IL1B, TNF, CD163, and S100A8/9, all of which are central to neuroinflammatory processes. The enrichment scores and adjusted p-values pointed to a dominant role of microglial activation and monocyte/macrophage infiltration, consistent with an immune–inflammatory cell-type signature underlying epileptic pathology.

Therapeutic re-positioning opportunities

Drug–gene interaction analysis identified 43 druggable targets among the top SHAP contributors, together linked to a total of 300 drugs (Supplementary File F2); epilepsy-associated biomarkers, particularly TNF, IL1B, and P2RY12, each interacted with multiple drugs. Many candidate compounds were FDA-approved, including cromolyn sodium, alteplase, and ticagrelor, while others were investigational biologics targeting cytokine signaling. Interaction scores highlighted high-confidence drug–gene pairs involving key inflammatory mediators, reinforcing the translational relevance of immune pathways. Most interactions represented inhibitory mechanisms, with agonists and monoclonal antibodies forming a smaller subset (Fig. 8A). Approximately 43% of mapped compounds were FDA-approved, while the remainder were investigational agents (Fig. 8B). The interaction network highlighted high-confidence pairs, including P2RY12–prasugrel/ticagrelor, TNF–golimumab, and IL1B–canakinumab (Fig. 8C).

Virtual screening and molecular docking

Virtual screening and docking analyses were performed to evaluate the structural feasibility of prioritized drug–gene interactions. The strongest predicted binding affinities were observed for prasugrel–P2RY12 (−8.7 kcal/mol), pentamidine–IL1B (−8.5 kcal/mol), and clopidogrel–P2RY12 (−8.4 kcal/mol), whereas clodronic acid–CCL4 showed a weaker interaction (−6.9 kcal/mol) (Table 2). Three-dimensional docking poses showed ligands positioned within the predicted binding pockets, where a combination of hydrogen bonds, hydrophobic contacts, and π–π interactions stabilized ligand conformations (Fig. 9A–F). For instance, prasugrel engaged residues within the orthosteric pocket of P2RY12, while pentamidine formed hydrogen bonds with IL1B residues, supporting the predicted binding stability. Two-dimensional interaction maps corroborated these observations by highlighting residue-specific contact patterns. To enhance translational relevance, all docked candidates were screened using ADMETlab (version 2.0) to assess pharmacokinetic and toxicological profiles (Table 3). Compounds with unfavorable absorption, distribution, metabolism, or excretion properties, or with predicted toxicities, were excluded. After filtering, several FDA-approved agents, including prasugrel and clopidogrel, were retained as high-confidence docking candidates. Collectively, these docking studies suggest that the prioritized epilepsy-associated inflammatory mediators, particularly P2RY12 and IL1B, are structurally druggable, with clinically relevant compounds showing favorable predicted binding energies and interaction profiles that merit downstream validation.

Table 2 Docking results of prioritized drug–gene interactions. Binding affinities (kcal/mol) of FDA-approved or high-confidence drug candidates docked to their respective protein targets.


Table 3 ADMET profiling of candidate drugs. Pharmacokinetic and toxicity properties of drug–gene candidates retrieved from ADMETlab (version 2.0). Parameters include Lipinski's rule of five, molecular weight, logP, topological polar surface area (tPSA), blood–brain barrier permeability, human intestinal absorption (HIA), bioavailability, AMES mutagenicity, clinical toxicity, hERG inhibition, and CYP3A4 inhibition. Drugs such as prasugrel and fluticasone showed favorable predicted CNS penetration and high absorption, whereas clodronic acid exhibited limited bioavailability.


Molecular dynamics simulations

Molecular dynamics (MD) simulations were conducted to assess the temporal stability of the top-ranked drug–protein complexes identified in docking. The prasugrel–P2RY12 and pentamidine–IL1B complexes were simulated for 100 ns under physiologically relevant conditions. Root-mean-square deviation (RMSD) analysis showed that both complexes equilibrated within the first 20 ns and remained stable throughout the trajectories, indicating convergence of the protein–ligand conformations (Fig. 9G–H). Root-mean-square fluctuation (RMSF) profiles showed localized flexibility in surface-exposed loop regions, whereas binding-pocket residues exhibited reduced fluctuations, indicating restricted dynamics upon ligand binding (Fig. 10A–B). Interaction fraction analysis further demonstrated the persistence of key stabilizing contacts, with hydrogen bonds and hydrophobic interactions retained in more than 70% of frames across the trajectories (Fig. 10C–D). In the prasugrel–P2RY12 complex, persistent hydrogen bonding with residues GLU81 and THR93 stabilized ligand orientation, while pentamidine maintained long-lived contacts with ASN86 and GLU101 in IL1B. These findings indicate that the docking-predicted interactions were preserved under dynamic conditions. Together, the MD simulations support the stability of the prioritized drug–protein complexes and their potential as therapeutic candidates for epilepsy.
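
The interaction fraction reported above is, at its simplest, the share of frames in which a given contact is present. The sketch below approximates contact presence with a single distance cutoff; the 3.5 Å cutoff, the helper names, and the example distances are assumptions, and Desmond's own interaction analysis may apply additional geometric criteria (for example, hydrogen-bond angles).

```python
import numpy as np


def contact_occupancy(distances, cutoff=3.5):
    """Fraction of frames in which each ligand–residue distance (Å) is within `cutoff`.

    distances: array of shape (n_frames, n_contacts).
    """
    d = np.asarray(distances, dtype=float)
    return (d <= cutoff).mean(axis=0)


def persistent_contacts(distances, cutoff=3.5, min_fraction=0.7):
    """Boolean mask of contacts present in more than `min_fraction` of frames."""
    return contact_occupancy(distances, cutoff) > min_fraction
```

Here `min_fraction=0.7` mirrors the "more than 70% of frames" criterion used in the text.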

Discussion

In this study, immune-inflammatory gene signatures associated with drug-resistant epilepsy (DRE) were identified and validated by integrating multi-cohort transcriptomic data with interpretable machine learning approaches¹. Key features such as TNF, IL1B, and P2RY12 were consistently prioritized across models and showed strong discriminatory power between epileptic and non-epileptic brain tissue. Functional enrichment analyses linked these genes to innate immune and microglial activation pathways, while drug–gene interaction and docking studies identified potential FDA-approved compounds for therapeutic repurposing. Together, these findings suggest that immune signaling is both a mechanistic contributor to epilepsy and a candidate therapeutic target. Expression of these biomarkers may be influenced by genetic variants, promoter methylation, histone marks, and microRNAs that regulate neuroinflammatory pathways in epilepsy. The results underscore the biological plausibility of a neuroimmune axis in epileptogenesis and drug resistance. Elevated expression of inflammatory mediators such as TNF and IL1B in epileptic tissue supports growing evidence that chronic neuroinflammation contributes to seizure persistence and refractoriness³. Our observation that the top SHAP-prioritized genes were enriched in microglia- and monocyte-related pathways aligns with prior studies demonstrating increased peripheral and central pro-inflammatory cytokines in DRE patients^4,20. The presence of immune infiltration and dysregulated cytokine signaling, particularly involving CX3CR1, CSF1R, and P2RY12, reinforces the hypothesis that glial activation plays a central role in epileptogenic remodeling²¹.
These findings extend prior ML transcriptomic studies in epilepsy by demonstrating cross-tissue and cross-cohort reproducibility of an immune–microglial gene set and, to our knowledge, providing the first SHAP-guided, structure-supported prioritization of P2RY12-targeting antiplatelet agents as repurposable candidates in DRE.

These findings are consistent with recent reports of elevated Th17 responses²⁰, peripheral complement deficiencies⁶, and microglial activation²² in DRE. Our study extends this literature by integrating high-dimensional RNA-seq data with machine learning and explainability tools to offer a unified, gene-level view of immune involvement. Unlike prior studies focused on single cohorts or blood-based markers, our multi-dataset approach leverages both cortical and hippocampal tissues, increasing generalizability and biological resolution. However, because the datasets were cross-sectional, longitudinal immune-inflammatory dynamics and the predictive value for treatment response could not be assessed. Nevertheless, several strengths support the robustness of our results^23,24. First, the study performed batch-aware normalization and addressed class imbalance using stratified cross-validation with SMOTE, reducing common biases in transcriptomic studies. Second, the ensemble modeling captured both linear and non-linear feature relationships, so that feature rankings did not depend on a single algorithm. Third, external validation in an independent cohort showed strong predictive consistency, reinforcing the translational potential of the identified biomarkers. Finally, integrating SHAP-based feature interpretation with drug–gene docking provides a practical framework for prioritizing drug repurposing candidates. SHAP and permutation importance provide post-hoc explanations of model predictions rather than intrinsic interpretability; we therefore interpret these results as transparent feature attributions that must be corroborated by independent biological evidence, not as proof of causal mechanisms.

Several of our observations, including upregulation of TNF, IL1B, and microglial markers, are consistent with prior reports of neuroinflammation in drug-resistant epilepsy. Beyond corroborating these mechanisms, our study contributes by integrating hippocampal and cortical RNA-seq cohorts with batch-aware modeling, using multi-model SHAP attributions to define a robust immune–microglial gene set, and systematically nominating P2RY12-centered drug–target pairs for repurposing.
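The multi-model SHAP attributions mentioned above can be combined in several ways, and this section does not specify the aggregation rule. One hypothetical option, sketched below, computes each gene's mean absolute SHAP value per model, converts these to within-model ranks, and orders genes by their mean rank across models. Working with ranks avoids comparing raw SHAP magnitudes, which can sit on different scales depending on the model and explainer. All SHAP values, and the placeholder gene `GENE_X`, are made up for illustration.

```python
def mean_abs_shap(shap_matrix, features):
    """Mean |SHAP| per feature; rows of shap_matrix are samples, columns follow `features`."""
    n = len(shap_matrix)
    return {f: sum(abs(row[j]) for row in shap_matrix) / n for j, f in enumerate(features)}


def consensus_ranking(importance_by_model, top_k):
    """Order features by mean within-model rank (1 = most important); ties broken by name."""
    rank_sum = {}
    for importance in importance_by_model.values():
        ordered = sorted(importance, key=lambda f: (-importance[f], f))
        for rank, f in enumerate(ordered, start=1):
            rank_sum[f] = rank_sum.get(f, 0) + rank
    n_models = len(importance_by_model)
    mean_rank = {f: total / n_models for f, total in rank_sum.items()}
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))[:top_k]


# Hypothetical SHAP values (2 samples x 4 genes) for three of the study's classifier types
genes = ["TNF", "IL1B", "P2RY12", "GENE_X"]
shap_by_model = {
    "LR": [[0.9, -0.5, 0.4, 0.05], [-0.7, 0.6, -0.3, 0.02]],
    "RF": [[0.4, 0.8, 0.6, -0.1], [0.5, -0.6, 0.5, 0.0]],
    "XGBoost": [[-0.9, 0.3, 0.9, 0.1], [0.8, 0.4, -0.7, -0.05]],
}
importance = {model: mean_abs_shap(values, genes) for model, values in shap_by_model.items()}
print(consensus_ranking(importance, top_k=3))  # ['TNF', 'IL1B', 'P2RY12']
```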

Prior studies have highlighted immune and inflammatory processes as central to drug-resistant epilepsy (DRE). Elevated TNF and IL1B levels in serum and brain tissue suggest a role for these cytokines in sustaining seizure activity²⁴. Microglial activation and dysregulated P2RY12 and CX3CR1 signaling contribute to epileptogenic remodeling²⁵. Complement pathway abnormalities observed in patients with refractory epilepsy point to peripheral immune dysfunction²⁶, and increased Th17 responses and pro-inflammatory cytokines have also been reported in DRE²⁰. Glial-driven neuroinflammation has been repeatedly documented in both experimental models and clinical studies²⁷. Our results are consistent with these findings and extend them by integrating multi-cohort transcriptomics with interpretable machine learning, prioritizing TNF, IL1B, and P2RY12 across tissues, and linking them to repurposable FDA-approved compounds.

Nonetheless, bulk RNA-seq lacks single-cell resolution and may mask cell-specific expression patterns. Although we inferred immune cell-type enrichment by deconvolution, future single-cell RNA-seq analyses could validate these findings with greater precision. In addition, although we controlled for batch and tissue heterogeneity, unmeasured confounders such as medication history or comorbid inflammation could influence expression profiles. Finally, in silico docking and molecular dynamics simulations offer only preliminary evidence of druggability; experimental validation is needed to confirm biological efficacy and safety.

These findings have potential clinical implications. If validated, immune-related biomarkers such as P2RY12, TNF, and IL1B could serve as diagnostic indicators or predictors of treatment response, particularly in patients considered for immunomodulatory therapy. The identification of candidate repurposable drugs such as prasugrel and clopidogrel is timely, given the growing interest in anti-inflammatory strategies for epilepsy²⁷,²⁸.
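The deconvolution step mentioned above is not specified further in this section. As a hedged illustration of the general idea behind reference-based deconvolution, the sketch below estimates non-negative cell-type weights w that minimize ½‖Sw - m‖² for a signature matrix S (genes × cell types) and a bulk expression profile m, using projected gradient descent, and then rescales the weights to proportions. The signature values and the "microglia"/"neurons" labels are hypothetical, and this is not necessarily the method used in the study.

```python
def deconvolve(signature, mixture, n_iter=2000):
    """Non-negative least squares by projected gradient descent, returned as proportions.

    signature: genes x cell types (list of rows); mixture: bulk expression, one value per gene.
    Minimizes 0.5 * ||S w - m||^2 subject to w >= 0.
    """
    n_types = len(signature[0])
    # ||S||_F^2 upper-bounds the largest eigenvalue of S^T S, so 1/||S||_F^2 is a safe step size
    step = 1.0 / sum(v * v for row in signature for v in row)
    w = [0.0] * n_types
    for _ in range(n_iter):
        residual = [sum(s * wj for s, wj in zip(row, w)) - m for row, m in zip(signature, mixture)]
        # Gradient of the objective: S^T (S w - m)
        grad = [sum(row[j] * r for row, r in zip(signature, residual)) for j in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant
        w = [max(0.0, wj - step * gj) for wj, gj in zip(w, grad)]
    total = sum(w)
    return [wj / total for wj in w] if total > 0 else w


# Hypothetical 4-gene signature for two cell types (columns: "microglia", "neurons")
signature = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0], [2.0, 0.0]]
# Simulated bulk profile built from a 70:30 mix of the two reference profiles
mixture = [0.7 * a + 0.3 * b for a, b in signature]
print([round(p, 3) for p in deconvolve(signature, mixture)])  # [0.7, 0.3]
```

Production tools add refinements such as marker selection and robust loss functions; the sketch only shows the core constrained regression.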

This work has several limitations. First, we analyzed bulk RNA-seq, which cannot resolve cell-type–specific or spatially restricted expression. Second, clinical variables (e.g., detailed treatment history, seizure frequency, comorbid inflammation) were incompletely available, limiting clinicogenomic correlation. Third, and most significantly, all biomarker and drug-repurposing findings derive from in silico analyses: we did not perform qPCR, immunohistochemistry, Western blot/ELISA, or in vitro drug-treatment assays to validate gene or protein expression or the functional impact of candidate drugs. TNF, IL1B, P2RY12, and the proposed drug–target interactions should therefore be regarded as prioritized, hypothesis-generating findings that require confirmation in patient tissue, experimental epilepsy models, cell-based assays, and independent single-cell and proteomic datasets.

Future work will benchmark docking- and MD-derived binding strengths against experimental affinity measurements (e.g., SPR, ITC, radioligand assays) and assess the correlation between predicted binding energies and Kd/IC₅₀ values for the prioritized complexes. Experimental validation of the candidate targets in animal models and patient-derived tissue, together with in vitro and in vivo testing, is required to confirm the efficacy and safety of the repurposable FDA-approved candidates identified in this study. If confirmed, therapeutic modulation of neuroinflammation could complement current antiseizure medications, particularly in patients with refractory seizures. Integrating spatial transcriptomics and multi-omics data (e.g., epigenomics, proteomics, metabolomics) would localize immune-inflammatory programs within epileptogenic microenvironments and link transcriptional signatures to upstream regulation and downstream functional states, refining mechanistic insight into neuroimmune contributions to drug-resistant epilepsy.
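The planned benchmarking of predicted binding strength against measured affinity could be summarized with a rank correlation. The minimal sketch below computes Spearman's rho (the Pearson correlation of tie-averaged ranks). It assumes docking scores in kcal/mol, where more negative means stronger predicted binding, and experimental affinities expressed as pKd = -log10(Kd), where higher means tighter binding, so good agreement appears as a strongly negative coefficient. The scores and pKd values are invented for illustration and are not results from this study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (inputs must not be constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical docking scores (kcal/mol) and experimental pKd values for five complexes
docking_scores = [-9.1, -8.4, -7.9, -7.2, -6.5]
pkd = [8.2, 7.5, 7.6, 6.1, 5.3]
rho = spearman_rho(docking_scores, pkd)
print(round(rho, 3))  # -0.9
```

A rank-based coefficient is a reasonable default here because docking scores are generally better at ordering ligands than at reproducing absolute affinities.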

Additionally, longitudinal studies are needed to assess whether inflammatory markers can predict disease progression or treatment resistance. Single-cell and spatial transcriptomic technologies could further refine our understanding of cellular heterogeneity and the tissue microenvironment in epilepsy.

Conclusion

This study presents a comprehensive transcriptomic analysis of drug-resistant epilepsy that integrates multi-cohort RNA-sequencing datasets with explainable machine learning to identify robust, biologically relevant biomarkers. By controlling for batch effects, tissue heterogeneity, and class imbalance, the study prioritized a set of immune-inflammatory genes, most notably TNF, IL1B, CX3CR1, and P2RY12, that consistently differentiated epileptic from non-epileptic brain tissue. These biomarkers achieved high classification performance in both internal and external datasets, underscoring their potential diagnostic utility. Functional enrichment analyses showed that these genes are enriched in microglial and monocyte-related pathways, implicating neuroimmune activation as a key component of epilepsy pathology. Drug–gene interaction mapping and structure-based screening further indicated that several of these targets are actionable, with high predicted binding affinities for FDA-approved agents such as prasugrel and clopidogrel. These findings suggest promising avenues for therapeutic repurposing, particularly for targeting the inflammatory mechanisms underlying epilepsy. Despite the strengths of this multi-method framework, including external validation, interpretable feature selection, and integrative drug screening, limitations remain, notably the use of bulk RNA-seq, which may mask cell-type–specific dynamics, and the retrospective harmonization of public datasets. Future studies should pursue single-cell validation and in vivo functional assays to elucidate causal mechanisms. Accordingly, this study should be viewed as a computational prioritization framework that nominates immune-inflammatory biomarkers and repurposable agents for downstream experimental testing, rather than as definitive mechanistic validation of these targets.
Overall, this study indicates that immune-inflammatory transcriptomic signatures can reliably distinguish epileptic from non-epileptic brain tissue and may serve both as biomarkers and as therapeutic entry points. Integrative transcriptomic modeling of this kind provides a foundation for future precision-medicine efforts in epilepsy, bridging computational discovery and translational research toward precision diagnostics and treatment.

Data availability

All code used for data preprocessing, machine learning, SHAP interpretation, and molecular docking analysis is publicly available at https://github.com/Tayyab-Ijaz/EpilepsyBiomarkerDrugs. The repository also includes processed datasets and instructions for reproducing the primary analyses and figures.

References

Kwan, P. et al. Definition of drug resistant epilepsy: consensus proposal by the ad hoc Task Force of the ILAE Commission on Therapeutic Strategies. Epilepsia 51(6), 1069–1077 (2010).
Nguyen, D. L. et al. Longitudinal positron emission tomography imaging of glial cell activation in a mouse model of mesial temporal lobe epilepsy: toward identification of optimal treatment windows. Epilepsia 59(6), 1234–1244 (2018).
Vezzani, A., Balosso, S. & Ravizza, T. Neuroinflammatory pathways as treatment targets and biomarkers in epilepsy. Nat. Rev. Neurol. 15(8), 459–472 (2019).
Milano, C. et al. Increased production of inflammatory cytokines by circulating monocytes in mesial temporal lobe epilepsy: a possible role in drug resistance. J. Neuroimmunol. 386, 578272 (2024).
Bazhanova, E. D., Kozlov, A. A. & Litovchenko, A. V. Mechanisms of drug resistance in the pathogenesis of epilepsy: role of neuroinflammation. A literature review. Brain Sci. 11(5), 663 (2021).
Pinzon-Hoyos, N. et al. Sex-specific complement and cytokine imbalances in drug-resistant epilepsy: biomarkers of immune vulnerability. bioRxiv (2024).
Lariviere, S., Bernasconi, A., Bernasconi, N. & Bernhardt, B. C. Connectome biomarkers of drug-resistant epilepsy. Epilepsia 62(1), 6–24 (2021).
Avberšek, A. Genomics of Drug Resistance in Epilepsy (University College London, 2019).
Sánchez-Jiménez, P. et al. Transcriptomic profiling unveils novel therapeutic options for drug-resistant temporal lobe epilepsy. medRxiv https://doi.org/10.1101/2024.06.26.24309519 (2024).
Wang, Y. et al. Automatic localization of seizure onset zone based on multi-epileptogenic biomarkers analysis of single-contact from interictal SEEG. Bioengineering 9(12), 769 (2022).
Johnson, K. A. & Krishnan, A. Robust normalization and transformation techniques for constructing gene coexpression networks from RNA-seq data. Genome Biol. 23(1), 1 (2022).
Anders, S. & Huber, W. Differential expression analysis for sequence count data. Nat. Precedings https://doi.org/10.1038/npre.2010.4282.1 (2010).
Ramachandra, H., Ali, A., Ambili, P., Thota, S. & Asha, P. An optimization on bicluster algorithm for gene expression data. In 2023 4th IEEE Global Conference for Advancement in Technology (GCAT) 1–6 (IEEE, 2023).
Varet, H., Brillet-Guéguen, L., Coppée, J. Y. & Dillies, M. A. SARTools: a DESeq2- and EdgeR-based R pipeline for comprehensive differential analysis of RNA-Seq data. PLoS One 11(6), e0157022 (2016).
Ali, A., Mohan, J., Nadaf, T. A. A., Ravishankar, H. & Deepa, K. Bioinformatics-driven discovery of signaling pathways and genes influencing cervical cancer. SN Comput. Sci. 5(8), 989 (2024).
Demir, S. & Şahin, E. K. Liquefaction prediction with robust machine learning algorithms (SVM, RF, and XGBoost) supported by genetic algorithm-based feature selection and parameter optimization from the perspective of data processing. Environ. Earth Sci. 81(18), 459 (2022).
Larsen, B. S. Synthetic minority over-sampling technique (SMOTE). GitHub https://github.com/dkbsl/matlab_smote/releases/tag/1.0 (2022).
Yap, M. et al. Verifying explainability of a deep learning tissue classifier trained on RNA-seq data. Sci. Rep. 11(1), 2641 (2021).
Cannon, M. et al. DGIdb 5.0: rebuilding the drug–gene interaction database for precision medicine and drug discovery platforms. Nucleic Acids Res. 52, D1227–D1235 (2024).
Ouédraogo, O. et al. The role of Th17 lymphocytes in drug-resistant epilepsy. J. Immunol. 202, 181–188 (2019).
Saito, K. et al. Reactive astrocyte-driven epileptogenesis is induced by microglia initially activated following status epilepticus. JCI Insight 6, e135391 (2019).
Toledo, A. et al. Temporal lobe epilepsy: evaluation of central and systemic immune-inflammatory features associated with drug resistance. Seizure 91, 447–455 (2021).
Kuksin, M. et al. Applications of single-cell and bulk RNA sequencing in onco-immunology. Eur. J. Cancer 149, 193–210 (2021).
Qi, R. & Zou, Q. Machine learning methods in single-cell immune and drug response prediction. Vol. 14, 1233078 (Frontiers Media SA, 2023).
Kumar, P. et al. Single-cell transcriptomics and surface epitope detection in human brain epileptic lesions identifies pro-inflammatory signaling. Nat. Neurosci. 25(7), 956–966 (2022).
Pinzon-Hoyos, N. et al. Drug-resistant epilepsy associated with peripheral complement decreases and sex-specific cytokine imbalances: a pilot study. Sci. Rep. 15(1), 5096 (2025).
Vezzani, A., French, J., Bartfai, T. & Baram, T. Z. The role of inflammation in epilepsy. Nat. Rev. Neurol. 7(1), 31–40 (2011).
Yu, N., Liu, H. & Di, Q. Modulation of immunity and the inflammatory response: a new target for treating drug-resistant epilepsy. Curr. Neuropharmacol. 11(1), 114–127 (2013).


Author information

Authors and Affiliations

Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad, 38000, Pakistan
Tayyab Ijaz, Hamna Maqsood, Muhammad Tahir ul Qamar & Usman Ali Ashfaq
Center of Bioinformatics, College of Life Sciences, Northwest Agriculture and Forestry University, Yangling, 712100, Shaanxi, China
Abdur Rehman


Contributions

T.I., H.M., and A.R. contributed equally to the analysis and experiments of the study and wrote the first draft of the manuscript. M.T.Q. assisted in the data interpretation and provided technical expertise and support for the experiments. U.A.A. conceptualized and supervised the project and secured funding. All authors reviewed, revised, and approved the final version of the manuscript.

Corresponding author

Correspondence to Usman Ali Ashfaq.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Ijaz, T., Maqsood, H., Rehman, A. et al. Explainable machine learning identifies immune-inflammatory biomarkers and therapeutic candidates in drug-resistant epilepsy. Sci Rep 16, 965 (2026). https://doi.org/10.1038/s41598-025-30401-x


Received: 09 October 2025
Accepted: 25 November 2025
Published: 25 December 2025
Version of record: 08 January 2026
DOI: https://doi.org/10.1038/s41598-025-30401-x
