Abstract
Precise epileptogenic zone (EZ) localization remains challenging for epilepsy surgery planning. While seizure semiology provides valuable localization information, subjective interpretation and inter-observer variability limit clinical utility. We developed a computational framework utilizing knowledge graph architectures to analyze ictal semiology-EZ relationships systematically. We constructed a semiology-EZ knowledge graph from 852 clinical cases extracted from peer-reviewed literature. GPT-4o facilitated automated extraction and standardization of semiological terminology. Statistical modeling, including Gaussian mixture modeling and Bayesian inference, quantified association strengths between semiological features and anatomical regions. SeeKr, a query platform, was developed to generate EZ localization predictions from patient symptoms with confidence measures. Expert epileptologists evaluated key semiology-to-brain region mappings using a four-point assessment scale. The framework achieved an average correctness score of 2 (1 = strongly agree, 4 = strongly disagree), indicating general clinical plausibility, with most associations falling within the “likely agree” range. Minor inaccuracies involved partial identification when seizures affected multiple regions and slight misclassifications of relationship intensity. This represents the first knowledge graph-based systematic analysis of semiology-EZ relationships. The framework provides a data-driven approach for objective semiological analysis with reasonable clinical accuracy. The methodology offers potential utility as a supplementary diagnostic tool for surgical planning, though further clinical validation is warranted.
Introduction
The epileptogenic zone (EZ) is defined as the cortical region necessary and sufficient for seizure initiation, whose complete removal or disconnection results in seizure freedom1. Precise EZ localization is therefore critical for successful epilepsy surgery. During presurgical evaluation, epileptologists systematically delineate five distinct cortical zones to infer the anatomical boundaries and spatial extent of the EZ: the irritative zone, seizure-onset zone, symptomatogenic zone, epileptogenic lesion, and functional deficit zone1. The characterization of these zones, with the exception of the symptomatogenic zone, relies predominantly on multimodal neuroimaging and electrophysiological investigations, encompassing neurological examination, long-term scalp interictal and ictal video electroencephalography (EEG), brain magnetic resonance imaging (MRI), neuropsychological testing, and, when indicated, advanced noninvasive modalities such as positron emission tomography (PET), ictal single-photon emission computed tomography (SPECT), magnetoencephalography (MEG), functional MRI (fMRI), and EEG-fMRI. In select cases requiring further delineation of the EZ or functional networks, intracranial EEG (iEEG) is indicated2,3. Recent research has shown that multimodal approaches, which combine different imaging and analytical modalities, are of significant importance for epilepsy-related lesion localization and brain network analysis, both of which are critical for surgical targeting and are strongly associated with post-surgical seizure freedom4,5,6,7. However, while these advanced techniques provide comprehensive diagnostic information, they often require expensive equipment, specialized expertise, and may not be readily accessible in all clinical environments or suitable for real-time assessment. Semiological analysis requires no specialized instrumentation and yields direct neuroanatomical insights into the cortical regions activated during seizure propagation. 
Critically, seizure semiology enables precise localization of the symptomatogenic zone, the cortical area whose functional activation generates the earliest recognizable ictal manifestations1,2. This region frequently overlaps with, or lies close to, the EZ, which makes semiological assessment particularly valuable for seizure localization, especially in resource-constrained clinical environments or in cases of non-lesional epilepsy where conventional neuroimaging fails to identify structural abnormalities8. Through meticulous analysis of the earliest ictal manifestations, epileptologists can reliably infer the anatomical location of the seizure-onset zone, as these primary symptoms typically originate from the cortical region initially recruited into the epileptic discharge network8,9,10. Conversely, secondary ictal phenomena resulting from seizure propagation provide diminished localizing value, as they predominantly reflect the spatiotemporal dynamics of epileptic spread rather than the primary seizure focus11,12. Consequently, the initial seconds of seizure evolution, captured through systematic semiological evaluation, constitute the most diagnostically informative seizure component for precise EZ localization and subsequent surgical planning13. Seizure semiology not only provides independent localizing information but also complements and cross-validates findings from both invasive and non-invasive EEG and imaging modalities14. Notably, comparative analyses of patients who achieved post-surgical seizure freedom following resective epilepsy surgery have demonstrated that semiological assessment provides robust lateralizing and localizing accuracy that is comparable to, and in some instances superior to, that obtained through interictal and ictal scalp EEG recordings and structural MRI15,16.
Despite its clinical value, the study of seizure semiology poses several challenges. Accurate characterization, localization, and lateralization of seizures depend heavily on video-EEG monitoring and direct observation during presurgical evaluation, which has several limitations: high costs and limited accessibility in resource-constrained settings, technical constraints in detecting seizures from deep brain structures, and the inherent subjectivity in interpreting complex semiological features despite objective recording17,18. In recent years, numerous studies have evaluated the localizing value of individual semiologies or analyzed sequential patterns of semiological features, and have demonstrated the value of artificial intelligence, including machine learning19,20,21 and large language models22,23, in inferring the EZ from semiology. However, no systematic, data-driven approach has yet comprehensively modeled the variance and uncertainty in these relationships. This gap limits the development of objective and probabilistic models capable of accurately identifying underlying epileptogenic networks based on observed semiological patterns.
Knowledge graphs (KGs) were introduced by Google in 2012 to transform string-based searches into semantic explorations of interconnected entities24. They have since emerged as a promising methodology for addressing these challenges: by improving search accuracy, relevance, and comprehensiveness, they have been rapidly adopted in question-answering systems, recommendation engines, social networks, and healthcare applications25. In recent years, KGs have been extensively utilized to systematically integrate and represent complex biomedical data. By organizing data into structured triplets, wherein entities of interest (e.g., drugs, diseases, symptoms) are represented as nodes and their semantic relationships (e.g., disease-symptom associations) constitute the connecting edges26, KGs consolidate heterogeneous information from diverse domains and disparate data sources into machine-interpretable formats. This structured representation facilitates knowledge discovery and computational analysis across distinct biological systems, thereby enabling advanced healthcare analytics applications, including disease phenotyping, risk prediction, and drug repurposing27.
Given these demonstrated advantages, we hypothesize that the integration of KGs into epilepsy research can facilitate systematic representation and analysis of the complex and heterogeneous relationships between seizure semiology and epileptogenic zone localization. To our knowledge, no prior KG-based investigation or systematic, data-driven review has explicitly examined the associations between seizure semiology and EZ localization, despite their close clinical and conceptual interdependence. Therefore, this study aims to provide a comprehensive, data-driven, systematic review of the relationships between seizure semiology and the epileptogenic zone. By leveraging knowledge graph methodologies, we seek to elucidate and systematically represent the underlying semantic and clinical associations, thereby offering a novel computational framework for objective reasoning and hypothesis generation in epilepsy diagnostics.
Public cohorts curation
The construction of the knowledge graph was conducted using a seizure semiology dataset systematically derived from peer-reviewed literature following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2009 guidelines28. To ensure data quality and integrity, we constructed this dataset through a comprehensive literature search adhering to established systematic review protocols. Specifically, a systematic search was performed in PubMed, encompassing articles published within the past two decades, utilizing targeted keywords including “seizure,” “seizure semiology,” “epilepsy,” and “epileptogenic zones”. To ensure methodological transparency and reproducibility, we provide the full search strategy used in this systematic review. The PubMed search was conducted using the following Boolean query:
(“seizure”[Title/Abstract] OR “ictal” OR “epileptic seizure”)
AND (“semiology” OR “seizure semiology”)
AND (“epileptogenic zone” OR “EZ”)
The search covered publications from January 1, 2003 to December 31, 2023, corresponding to the past two decades. Studies were included if they: (i) were peer-reviewed human clinical case reports or small cohort studies; (ii) were written in English; (iii) provided explicit, descriptive seizure semiology; and (iv) reported a validated epileptogenic zone determined through surgical outcome or multidisciplinary presurgical consensus. Exclusion criteria comprised: (i) EZ localization reported only at the hemispheric level; (ii) non-specific or vague semiology descriptions (e.g., “non-specific aura”); (iii) semiology descriptors fewer than two words; and (iv) studies reporting only aggregated, cohort-level semiology without individual case resolution. These criteria ensured that only cases with interpretable, high-quality semiology–EZ mappings were retained. Each selected article provided detailed clinical information about seizure semiology and validated EZ, documented through individual or group patient reports. Patient-specific data such as demographics (age, sex, and handedness), along with seizure semiology descriptions, were systematically extracted from both the main text and tables of the selected papers. Importantly, each extracted case was verified by surgical outcomes, confirming the EZ localization14. If multiple EZ regions were reported for a single case, all relevant areas were noted accordingly.
Initially, 309 articles were identified from the PubMed database, as illustrated in Fig. 1A. Following initial screening, 116 articles were excluded as they reported only general statistical data on seizure semiology without specific EZ localization definitions. The remaining 193 articles underwent comprehensive evaluation, yielding 895 individual patient records. Of these, 43 records were subsequently excluded due to ambiguous semiological descriptions or indeterminate EZ localizations, resulting in 852 eligible records for analysis. The final systematic review encompassed 193 articles containing 852 meticulously documented cases, each characterized by detailed semiology descriptions and surgically validated EZ localizations.
The distribution of EZ-semiology cases across distinct brain regions within the systematically curated cohort is illustrated in Fig. 1B. The temporal lobe (T) represents the largest proportion, comprising 37.9% (418 cases), followed by the frontal lobe (F) at 31.1% (343 cases). The parietal lobe (P) and occipital lobe (O) demonstrate moderate representation at 13.1% (144 cases) and 9.3% (103 cases), respectively. The smallest proportions are observed in the insular cortex (INS) at 5.4% (60 cases) and the cingulate cortex (CING) at 3.1% (34 cases).
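The reported percentages can be reproduced from the counts when the denominator is the sum of the per-region counts (1102) rather than the 852 cases. One plausible reading, stated here as an assumption rather than something the text confirms, is that a case with several EZ regions is counted once per region. A minimal arithmetic check:

```python
# Region counts as reported in the text (Fig. 1B).
region_counts = {
    "T": 418, "F": 343, "P": 144, "O": 103, "INS": 60, "CING": 34,
}

# Denominator assumed to be the total number of region assignments,
# not the number of cases (852).
total_assignments = sum(region_counts.values())

# Percentage share of each region, rounded to one decimal place.
shares = {
    region: round(100 * count / total_assignments, 1)
    for region, count in region_counts.items()
}

for region, pct in shares.items():
    print(f"{region}: {pct}%")
```

With this denominator the six shares match the values quoted above.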
Ethics statement
All data analyzed in this study were obtained exclusively from previously published, peer-reviewed publications. As these datasets originate from openly accessible publications, all ethical approvals and informed consent procedures were completed by the investigators of the original studies. No new human participants were recruited, contacted, or accessed, and no identifiable patient information was collected for this secondary analysis. Therefore, additional institutional review board approval was not required. All methods were carried out in accordance with relevant guidelines and regulations.
Method
To systematically capture and represent the intricate relationships identified in the curated dataset above, we developed a specialized knowledge graph framework. Fig. 2 illustrates the comprehensive workflow for constructing the KG, focusing on epilepsy semiology, brain regions, and their interconnected relationships.
Node extraction
Following data collection, we employed GPT-4o to extract and standardize seizure semiologies from narrative clinical descriptions. GPT-4o processed the text in batches of 100 cases and generated a preliminary list of standardized semiological terms. These outputs were aggregated into a consolidated repository, allowing us to identify overlapping descriptors and maintain a consistent vocabulary for downstream knowledge graph modeling.
To ensure transparency and methodological reproducibility, the GPT-4o standardization step followed a fixed prompting template that required (i) extraction of seizure semiology exactly as it appears in the paper, (ii) no inference beyond the original clinical text, and (iii) mapping only to a predefined candidate list derived from the ILAE 2022 Seizure Semiology Glossary7. All model-generated labels were manually reviewed, and any ambiguous or inconsistent mappings were corrected prior to inclusion. A subset of GPT-4o outputs required refinement, mainly involving subtle distinctions between closely related experiential or sensory descriptors and the preservation of features such as laterality or body-part specificity when present in the original report. Consolidation of terms was performed only when supported by ILAE-defined semiological groupings; for example, limb-specific somatosensory pain descriptors were unified under the broader category of focal somatosensory pain (upper limb), whereas phenomenologically distinct cognitive auras were retained as separate entities.
Following this preprocessing pipeline, we identified a total of 172 unique seizure semiologies (Semi), 6 general brain regions (GR), and 69 specific brain regions (SR) for knowledge graph construction. Of the 69 SR nodes, 48 correspond to cortical regions and 21 to subcortical regions, based on anatomical definitions from the Harvard–Oxford atlas29.
Relation extraction
The resulting KG includes seven distinct types of directed edges representing the relationships: Semi\(\rightarrow\)GR, GR\(\rightarrow\)Semi, Semi\(\rightarrow\)Semi, Semi\(\rightarrow\)SR, SR\(\rightarrow\)Semi, SR\(\rightarrow\)GR, and SR\(\rightarrow\)SR. To extract and quantify the strength of these multifaceted relationships, we implemented five complementary modeling approaches, each designed to capture different aspects of the underlying semantic and anatomical associations.
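The node and edge schema described above can be sketched as a small data structure. This is an illustrative sketch, not the study's implementation: the class names, the placeholder node names, and the three intermediate strength labels are assumptions (the text names only the endpoints "very low" and "very high").

```python
from dataclasses import dataclass

# Node types named in the text: seizure semiology, general region, specific region.
NODE_TYPES = {"Semi", "GR", "SR"}

# The seven directed edge types listed in the text.
ALLOWED_EDGES = {
    ("Semi", "GR"), ("GR", "Semi"), ("Semi", "Semi"),
    ("Semi", "SR"), ("SR", "Semi"), ("SR", "GR"), ("SR", "SR"),
}

# Five ordered strength levels; the three middle names are assumed.
STRENGTH_LEVELS = ("very low", "low", "medium", "high", "very high")


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # one of NODE_TYPES


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    strength: str  # one of STRENGTH_LEVELS

    def __post_init__(self):
        # Reject edges whose (source type, target type) pair is not in the schema.
        if (self.source.kind, self.target.kind) not in ALLOWED_EDGES:
            raise ValueError(
                f"edge type {self.source.kind}->{self.target.kind} is not in the schema"
            )
        if self.strength not in STRENGTH_LEVELS:
            raise ValueError(f"unknown strength level: {self.strength}")


# Placeholder nodes; names and the strength value are hypothetical.
semi = Node("semiology_A", "Semi")
general = Node("region_X", "GR")
specific = Node("subregion_Y", "SR")
edge = Edge(semi, general, "high")
```

Validating the type pair at construction keeps edges such as GR to SR, which are not among the seven listed types, out of the graph.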
(a) Semi \(\rightarrow\) Semi and SR \(\rightarrow\) SR relationships
We first construct two co-occurrence matrices:
- Semi \(\rightarrow\) Semi, which quantifies the co-occurrence frequency between seizure semiologies;
- SR \(\rightarrow\) SR, which quantifies how frequently each specific brain region co-occurs with other specific regions within the same seizure presentations.
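For each pair of rows i and j in these matrices, let:
\[\mathbf {x}_i = (x_{i,1}, x_{i,2}, \dots , x_{i,n}), \qquad \mathbf {x}_j = (x_{j,1}, x_{j,2}, \dots , x_{j,n})\]
represent the vectors corresponding to rows i and j, where n is the number of columns. Their sample means are calculated as:
\[\bar{x}_i = \frac{1}{n}\sum _{m=1}^{n} x_{i,m}, \qquad \bar{x}_j = \frac{1}{n}\sum _{m=1}^{n} x_{j,m}.\]
The absolute Pearson correlation coefficient \(r_{i,j}\) is computed as:
\[r_{i,j} = \left| \frac{\sum _{m=1}^{n} (x_{i,m} - \bar{x}_i)(x_{j,m} - \bar{x}_j)}{\sqrt{\sum _{m=1}^{n} (x_{i,m} - \bar{x}_i)^2}\, \sqrt{\sum _{m=1}^{n} (x_{j,m} - \bar{x}_j)^2}} \right| .\]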
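A minimal sketch of this computation on a toy co-occurrence matrix follows. The matrix values are hypothetical, and returning 0 for a constant row is an assumption made here to avoid division by zero; the text does not say how such rows are handled.

```python
import math


def abs_pearson(x, y):
    """Absolute Pearson correlation between two equal-length count vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant row is treated as uncorrelated with everything.
        return 0.0
    return abs(cov / math.sqrt(var_x * var_y))


# Hypothetical co-occurrence rows (three items, four columns).
cooc = [
    [4, 0, 2, 1],
    [8, 0, 4, 2],  # exactly twice row 0, so the correlation is 1
    [0, 3, 1, 2],  # moves opposite to row 0; the absolute value hides the sign
]

r01 = abs_pearson(cooc[0], cooc[1])
r02 = abs_pearson(cooc[0], cooc[2])
```

Because the absolute value is taken, strongly anti-correlated rows (such as rows 0 and 2 here) receive a high score just as strongly correlated rows do.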
Subsequently, we apply a Gaussian Mixture Model (GMM) with five Gaussian components \((\mathcal {K}=5)\) to categorize these correlation coefficients into five distinct clusters representing varying strengths of relationships. The GMM estimates the mixture distribution:
\[p(x) = \sum _{k=1}^{\mathcal {K}} \pi _k\, \mathcal {N}(x\mid \mu _k, \Sigma _k),\]
subject to constraints \(\sum _{k=1}^{\mathcal {K}}\pi _k = 1\) and \(\pi _k \ge 0\) for all k. Each component \(\mathcal {N}(x\mid \mu _k, \Sigma _k)\) is characterized by mean \(\mu _k\) and variance \(\Sigma _k\). Each correlation coefficient x is then assigned to the most likely component by:
\[\hat{k}(x) = \mathop {\arg \max }\limits _{k \in \{1,\dots ,\mathcal {K}\}}\; \pi _k\, \mathcal {N}(x\mid \mu _k, \Sigma _k).\]
We chose \(\mathcal {K} = 5\) components to balance interpretability and granularity, enabling categorization of relationship strengths into five ordered levels. The GMM is fit to the global distribution of all correlation coefficients across the complete dataset rather than to individual subsets (e.g., specific epilepsy subtypes or isolated brain regions). Under this global fitting approach, subsets exhibiting uniformly low correlations relative to the overall distribution will have their pairs assigned predominantly to lower GMM components, appropriately reflecting weak associations; similarly, subsets with uniformly high correlations will cluster in upper components. This strategy ensures that the five-level categorization captures inherent variation in the data without artificially fragmenting homogeneous subsets into spurious distinctions. The resulting clusters represent five distinct relationship-strength levels, ranging semantically from “very low” to “very high,” based on the fitted Gaussian component means \(\mu _k\). A schematic of the workflow is illustrated in Fig. 3.
Correlation-based clustering workflow for Semi\(\rightarrow\)Semi relationship extraction. Step 1: Co-occurrence matrix showing raw frequency counts. Step 2: Absolute Pearson correlation matrix revealing standardized association patterns. Step 3: Gaussian Mixture Model with five components fitted to correlation distribution. Step 4: Illustrative example showing cluster assignment for \(r=0.52\) (red dashed line). Step 5: Hard assignment strategy: each correlation coefficient is assigned to the Gaussian component k that maximizes the weighted likelihood \(\pi _k\, \mathcal {N}(|r| \mid \mu _k,\Sigma _k)\).
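The hard-assignment step can be sketched as follows, assuming the mixture has already been fitted. The weights, means, and variances below are made-up values for illustration and are not the fitted parameters of this study; the three intermediate label names are likewise assumed. Components are ranked by mean so that the labels follow increasing relationship strength.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def assign_level(x, weights, means, variances, labels):
    """Return the label of the component k that maximizes pi_k * N(x | mu_k, var_k)."""
    n_components = len(means)
    # Rank components by mean so that labels map to increasing strength.
    order = sorted(range(n_components), key=lambda k: means[k])
    scores = [
        weights[k] * normal_pdf(x, means[k], variances[k])
        for k in range(n_components)
    ]
    best = max(range(n_components), key=lambda k: scores[k])
    return labels[order.index(best)]


# Hypothetical fitted parameters (not taken from the paper).
weights = [0.40, 0.25, 0.15, 0.12, 0.08]
means = [0.05, 0.25, 0.50, 0.70, 0.90]
variances = [0.005] * 5
labels = ["very low", "low", "medium", "high", "very high"]

level_low = assign_level(0.03, weights, means, variances, labels)
level_mid = assign_level(0.52, weights, means, variances, labels)
level_high = assign_level(0.88, weights, means, variances, labels)
```

Under these assumed parameters, 0.52 falls in the middle component; with the study's actual fit the boundaries would differ.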
(b) SR \(\rightarrow\) GR relationships
Since each specific region is inherently a subset of a general region, the correlation between SR and its corresponding GR is naturally high.
(c) Semi \(\rightarrow\) GR relationships
We constructed a co-occurrence matrix \(M \in \mathbb {R}^{S \times G}\) that quantifies how frequently each seizure semiology (rows indexed by \(i\in \{1,\dots ,S\}\)) co-occurs with each general region (columns indexed by \(j\in \{1,\dots ,G\}\)).
Next, we column-normalize M. Specifically, for each column j,
\[\tilde{M}_{i,j} = \frac{M_{i,j}}{\sum _{i'=1}^{S} M_{i',j}}, \qquad i = 1,\dots ,S.\]
This ensures that each column of \(\tilde{M}\) sums to 1. In other words, within each region j, the entries \(\tilde{M}_{i,j}\) form a probability-like distribution over seizure semiologies.
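As a small illustration of the column normalization, the sketch below uses a hypothetical count matrix with three semiologies and two regions. Mapping an all-zero column to zeros is an assumption made here to avoid division by zero.

```python
def column_normalize(matrix):
    """Divide each entry by its column sum so every non-empty column sums to 1."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_sums = [sum(matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        # Assumption: an all-zero column stays all zero.
        [matrix[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical counts: rows are semiologies, columns are general regions.
M = [
    [6, 1],
    [3, 1],
    [1, 2],
]
M_tilde = column_normalize(M)
column_totals = [sum(row[j] for row in M_tilde) for j in range(2)]
```

Each column of `M_tilde` can then be read as the share of a region's co-occurrences attributed to each semiology.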
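We subsequently applied a five-component univariate GMM to fit the normalized values. Each \(\tilde{M}_{i,j}\) is then assigned to whichever Gaussian component k maximizes the posterior probability, which is proportional to \(\pi _k\,\mathcal {N}(\tilde{M}_{i,j}\mid \mu _k,\Sigma _k)\):
\[\hat{k}_{i,j} = \mathop {\arg \max }\limits _{k \in \{1,\dots ,5\}}\; \pi _k\,\mathcal {N}(\tilde{M}_{i,j}\mid \mu _k,\Sigma _k).\]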
These five Gaussian components categorize relationships between semiology and GR into a spectrum of intensities, ranging from very low to very high.
(d) GR \(\rightarrow\) Semi, Semi \(\rightarrow\) SR, and SR \(\rightarrow\) Semi relationships
We seek to infer inter-entity relationships from co-occurrence matrices. However, such matrices are typically characterized by high sparsity, containing numerous zero entries. These zeros do not necessarily indicate the absence of relationships between entities; rather, they may reflect limitations in data collection or insufficient sampling depth. To address this sparsity challenge, we implement a Poisson-Gamma Bayesian framework that applies statistical smoothing to the raw count data. This approach treats zero or low-magnitude counts as potentially under-sampled observations rather than definitive evidence of non-association, thereby providing a more robust foundation for relationship inference. The Bayesian model effectively regularizes sparse data by incorporating prior beliefs about the underlying count distribution, enabling more reliable estimation of inter-entity associations even in data-limited scenarios.
The Poisson-Gamma formulation offers several advantages that make it particularly appropriate for our sparse co-occurrence matrices. First, the observed counts are non-negative integers, making the Poisson distribution a mathematically natural choice for the likelihood model, while the conjugate Gamma prior enables closed-form posterior inference without requiring numerical approximation. Second, the resulting posterior mean provides sample-size–dependent shrinkage: pairs with few observations are automatically regularized toward the prior, while well-supported pairs are driven primarily by empirical counts. This adaptive behavior contrasts favorably with ad-hoc smoothing methods (e.g., additive constants) that apply uniform corrections regardless of local sample size. Third, alternative statistical approaches prove less suitable for our specific requirements. Classical hypothesis tests (\(\chi ^2\), Fisher’s exact test) are designed for binary significance decisions rather than continuous association strength estimation, and would introduce severe multiple-comparison issues given the thousands of semiology–region pairs in our graph. Regression-based methods (logistic, multinomial models) are better suited for directional prediction tasks rather than constructing symmetric pairwise association matrices. More complex Bayesian hierarchies (zero-inflated, negative-binomial models) would increase computational burden and reduce interpretability without substantial benefit given our current dataset scale. The Poisson–Gamma conjugate model thus provides an optimal balance between mathematical rigor, computational tractability, and clinical interpretability.
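We begin with the fundamental assumption that each observed count \(C_{i,j}\) follows a Poisson distribution:
\[C_{i,j} \sim \textrm{Poisson}(\lambda _{i,j}),\]
where \(\lambda _{i,j} > 0\) is the (unknown) rate parameter for the pair (i, j).
We subsequently impose a Gamma prior distribution on the parameter \(\lambda _{i,j}\):
\[\lambda _{i,j} \sim \textrm{Gamma}(\alpha , \beta ),\]
with shape parameter \(\alpha > 0\) and rate parameter \(\beta > 0\). This prior specification reflects a priori assumptions about the rate parameter \(\lambda _{i,j}\) before incorporating observed data. The Gamma distribution is chosen as the conjugate prior to the Poisson likelihood, ensuring that the resulting posterior distribution also follows a Gamma distribution, which facilitates exact Bayesian inference.
Given the rate parameter \(\lambda _{i,j} > 0\), the observed co-occurrence count \(C_{i,j}\) follows a Poisson distribution with probability mass function (pmf):
\[P(C_{i,j} \mid \lambda _{i,j}) = \frac{\lambda _{i,j}^{C_{i,j}}\, e^{-\lambda _{i,j}}}{C_{i,j}!}.\]
Before observing \(C_{i,j}\), we assume a Gamma\((\alpha , \beta )\) distribution as a prior on \(\lambda _{i,j}\):
\[P(\lambda _{i,j}) = \frac{\beta ^{\alpha }}{\Gamma (\alpha )}\, \lambda _{i,j}^{\alpha - 1}\, e^{-\beta \lambda _{i,j}}.\]
By Bayes’ theorem, the posterior distribution \(P(\lambda _{i,j} \mid C_{i,j})\) is proportional to the product of the Poisson likelihood and the Gamma prior:
\[P(\lambda _{i,j} \mid C_{i,j}) \propto P(C_{i,j} \mid \lambda _{i,j})\, P(\lambda _{i,j}).\]
Substituting the Poisson likelihood and Gamma prior yields:
\[P(\lambda _{i,j} \mid C_{i,j}) \propto \frac{\lambda _{i,j}^{C_{i,j}}\, e^{-\lambda _{i,j}}}{C_{i,j}!} \cdot \frac{\beta ^{\alpha }}{\Gamma (\alpha )}\, \lambda _{i,j}^{\alpha - 1}\, e^{-\beta \lambda _{i,j}}.\]
Simplifying terms, combining exponents, and dropping factors that do not depend on \(\lambda _{i,j}\), we obtain:
\[P(\lambda _{i,j} \mid C_{i,j}) \propto \lambda _{i,j}^{\alpha + C_{i,j} - 1}\, e^{-(\beta + 1)\lambda _{i,j}}.\]
This expression matches the kernel of a Gamma distribution with the updated shape parameter \(\alpha + C_{i,j}\) and rate parameter \(\beta + 1\). Therefore, the posterior distribution is:
\[\lambda _{i,j} \mid C_{i,j} \sim \textrm{Gamma}(\alpha + C_{i,j},\, \beta + 1).\]
Although one can use the entire posterior distribution for inference, a common choice is to take the posterior mean of \(\lambda _{i,j}\) as the new, smoothed value. The posterior mean is:
\[\mathbb {E}[\lambda _{i,j} \mid C_{i,j}] = \frac{\alpha + C_{i,j}}{\beta + 1}.\]
Thus we replace the raw count \(C_{i,j}\) in the co-occurrence matrix with
\[\overline{C_{i,j}} = \frac{\alpha + C_{i,j}}{\beta + 1}.\]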
This ensures zero entries in \(C_{i,j}\) become small but non-zero in \(\overline{C_{i,j}}\) (reflecting the uncertainty that these two items might be related, despite not having observed it in the limited data).
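The smoothing step reduces to a closed-form update per cell. The sketch below applies it to a toy count matrix; the hyperparameter values are hypothetical, since the values of \(\alpha\) and \(\beta\) used in the study are not stated here.

```python
def posterior_mean(count, alpha, beta):
    """Posterior mean of a Poisson rate under a Gamma(alpha, beta) prior
    (shape/rate parameterization) after observing one count."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if count < 0:
        raise ValueError("count must be non-negative")
    return (alpha + count) / (beta + 1)


def smooth_counts(counts, alpha, beta):
    """Replace every raw count with its posterior-mean estimate."""
    return [[posterior_mean(c, alpha, beta) for c in row] for row in counts]


# Hypothetical co-occurrence counts and hyperparameters.
C = [
    [0, 5],
    [2, 0],
]
C_bar = smooth_counts(C, alpha=0.5, beta=1.0)
```

With these assumed values the zero cells become 0.25, small but non-zero, while the cell with five observations becomes 2.75.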
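Finally, we apply min-max normalization to scale the posterior mean values \(\overline{C_{i,j}}\) into the range [0,1]:
\[\tilde{C}_{i,j} = \frac{\overline{C_{i,j}} - \min _{i',j'} \overline{C_{i',j'}}}{\max _{i',j'} \overline{C_{i',j'}} - \min _{i',j'} \overline{C_{i',j'}}}.\]
We then fit a GMM with \(\mathcal {K} = 5\) components to the flattened normalized values to categorize the intensity of the inferred relationships. Each normalized value \(\tilde{C}_{i,j}\) is assigned to the Gaussian component k that maximizes the posterior probability:
\[\hat{k}_{i,j} = \mathop {\arg \max }\limits _{k \in \{1,\dots ,\mathcal {K}\}}\; \pi _k\, \mathcal {N}(\tilde{C}_{i,j}\mid \mu _k, \Sigma _k).\]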
These clusters correspond naturally to distinct relationship intensities, thereby furnishing an objective, data-driven framework for quantifying the strength of inferred inter-entity associations.
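A minimal sketch of the min-max step, assuming the minimum and maximum are taken over the whole matrix (consistent with fitting the GMM to the flattened values). One property is worth noting: if a single \((\alpha , \beta )\) pair is shared by every entry, which is an assumption because the hyperparameter values are not given here, the posterior mean is an affine function of the raw count, so min-max normalization of the smoothed matrix returns the same values as min-max normalization of the raw counts. The toy example checks this.

```python
def min_max_normalize(matrix):
    """Scale all entries of a matrix into [0, 1] using the global min and max."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Assumption: a constant matrix carries no ranking information; map it to 0.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


# Hypothetical hyperparameters and counts.
alpha, beta = 0.5, 1.0
C_raw = [
    [0, 5],
    [2, 0],
]

# Posterior-mean smoothing with shared hyperparameters, then normalization.
C_bar = [[(alpha + c) / (beta + 1) for c in row] for row in C_raw]
C_tilde = min_max_normalize(C_bar)

# Normalizing the raw counts directly gives the same matrix.
C_tilde_raw = min_max_normalize(C_raw)
```

The prior would change the normalized values only if the hyperparameters varied across rows, columns, or pairs.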
Application
We implemented the Seizure Semiology Knowledge Graph (SeeKr) Query Platform as a practical application of our methodology. The SeeKr platform demonstrates the clinical translational potential of this methodology by establishing a computational framework that directly maps recorded ictal symptoms to candidate EZs. The platform architecture and demonstration interface are presented in Figs. 4 and 5, respectively. The system architecture encompasses three core functional modules.
User Input Module: In this primary module, users input epilepsy-related symptomatology, such as “tonic activity, dizziness, nausea, loss of consciousness,” with the option to apply temporal decay weighting to capture symptom progression when clinically indicated.
Statistical Analysis Results: This analytical component, positioned in the upper-right quadrant, encompasses three complementary visualization panels. The first chart displays the computed likelihood distribution across potential EZs, with regions ranked in descending order of likelihood from left to right. Notably, this example demonstrates a case where the frontal and temporal lobes exhibit comparable posterior probabilities. Rather than forcing selection of a single “best” region, the system is designed to transparently present this diagnostic uncertainty, providing clinicians with a ranked distribution of candidate EZs alongside confidence measures. The second panel identifies the ten most correlated standardized seizure semiologies relative to the reported symptoms, ordered by Pearson correlation coefficient in descending magnitude. The third chart presents the EZ likelihood distribution derived exclusively from these ten highest-ranking standardized semiologies. The underlying semiology-to-semiology and semiology-to-anatomical region associations employed in this analysis are computed using the correlation-based and Bayesian methodologies delineated in the preceding sections. Collectively, these integrated visualizations provide epileptologists with quantitative decision support for precise EZ localization and subsequent characterization of ictal propagation dynamics across cerebral regions.
Knowledge Graph Visualization: The platform additionally generates an interactive visualization of a curated subset of the knowledge graph, dynamically tailored to the specific user query. Within this network representation, red nodes denote user-inputted symptoms, blue nodes represent standardized seizure semiologies from the previously defined semiology set (see Fig. 2), purple nodes correspond to specific anatomical brain regions, and orange nodes indicate general brain regions. The computational methods underlying the construction and validation of these inter-nodal relationships are detailed comprehensively in the preceding methodological sections. This knowledge graph effectively captures the complex, multi-layered associations linking patient-reported symptomatology, standardized semiological classifications, and neuroanatomical substrates.
In aggregate, these features underscore the platform’s capability as a comprehensive translational tool that converts subjective clinical observations into objective, statistically grounded predictions for EZ identification. To support this functionality, the SeeKr platform is implemented as a web-based query interface using Gradio, supported by a modular inference pipeline comprising three stages. First, user-provided semiologies are parsed and matched against a precomputed Pearson correlation matrix to quantify their statistical similarity to all semiologies represented in the KG. Second, the top-10 most correlated semiologies are identified and their weighted contributions to each brain region are aggregated from the KG, with optional temporal decay for sequential symptoms. Third, the system dynamically generates three outputs: an EZ likelihood distribution derived directly from the input symptoms, a bar plot of the top-10 related semiologies, and an extended region-likelihood profile estimated from one-hop symptom neighbors. The feasibility of this approach was demonstrated through the development of a fully operational interactive web platform. This methodology represents a significant advancement in computational epileptology and provides an experimental platform for future research in seizure localization and network propagation studies, with public access to be provided following institutional review procedures.
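The three-stage pipeline can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the SeeKr implementation: the function names, the placeholder semiology and region names, the numeric edge weights, and the exponential form of the temporal decay are all hypothetical, since the text does not specify them.

```python
import math


def top_k_related(query, corr, k=10):
    """Rank non-query semiologies by their highest absolute correlation
    with any query term and return the top k as (name, score) pairs."""
    scores = {}
    for q in query:
        for other, r in corr.get(q, {}).items():
            if other in query:
                continue  # skip the query terms themselves
            scores[other] = max(scores.get(other, 0.0), abs(r))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def region_likelihood(query, semi_to_region, decay=0.0):
    """Aggregate region weights over the query symptoms and normalize to sum to 1.

    Assumption: the symptom at position t is weighted by exp(-decay * t),
    so earlier symptoms count more when decay > 0.
    """
    totals = {}
    for t, semi in enumerate(query):
        weight = math.exp(-decay * t)
        for region, strength in semi_to_region.get(semi, {}).items():
            totals[region] = totals.get(region, 0.0) + weight * strength
    norm = sum(totals.values())
    return {region: v / norm for region, v in totals.items()} if norm else {}


# Hypothetical precomputed correlations and semiology-to-region weights.
corr = {
    "s1": {"s2": 0.8, "s3": -0.3},
    "s2": {"s1": 0.8, "s3": 0.5},
}
semi_to_region = {
    "s1": {"F": 0.6, "T": 0.4},
    "s2": {"T": 0.9, "F": 0.1},
}

query = ["s1", "s2"]  # symptoms in order of appearance
related = top_k_related(query, corr, k=10)
no_decay = region_likelihood(query, semi_to_region)
with_decay = region_likelihood(query, semi_to_region, decay=1.0)
```

In this toy example, turning on the decay shifts weight toward the region favored by the first symptom, which mirrors the clinical emphasis on the earliest ictal signs.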
Demonstration of query results from the SeeKr platform interface. The platform comprises a user input module and an analytical output framework that delivers EZ predictions, statistical analyses, and query-specific knowledge graph visualizations. In this example query, frontal and temporal regions exhibit similar posterior probabilities, illustrating the system’s design to expose diagnostic uncertainty and present ranked candidate regions rather than forcing a single definitive localization when data support is equivocal.
Discussion
The robustness and validity of our research outcomes are supported by evaluations from expert epileptologists. Important semiology-to-region relationships identified in our analysis underwent expert review using a structured four-point assessment scale, where lower scores indicate higher agreement (1 = strongly agree; 2 = likely agree; 3 = unlikely; 4 = strongly disagree). Expert epileptologists evaluated the clinical plausibility of key semiology-to-brain region mappings within our knowledge graph framework. The evaluations yielded an average correctness score of 2, indicating that the semiology-region relationships established in our knowledge graph are generally well-supported by clinical expertise, with most associations falling within the “likely agree” range. This score suggests that EZ localization predictions based on our semiological analysis framework are clinically reasonable and align with expert knowledge, albeit with minor inaccuracies. A sample of the evaluations and the overall distribution of scores are presented in Table 1 and Fig. 6, respectively. The minor inaccuracies typically involve partial identification of the activated regions when seizures involve multiple areas simultaneously, or slight misclassifications of the intensity levels of certain semiology-to-region relationships. Overall, expert assessments support the accuracy, robustness, and clinical relevance of our predictive framework.
The clinical significance of our findings can be contextualized within the broader landscape of epilepsy diagnostics. The clinical implementation of established diagnostic modalities is frequently constrained by institutional resource limitations30,31, economic considerations32, and temporal constraints33, resulting in incomplete or absent diagnostic datasets that compromise the precision of presurgical planning34. Moreover, even when these established assessment tools are comprehensively applied under optimal conditions, surgical outcomes remain suboptimal in a substantial proportion of cases35,36. This suggests that the concordance of findings across multiple presurgical evaluation modalities may be the critical factor influencing favorable postoperative seizure outcomes37,38. In this context, our knowledge graph-based approach offers a complementary framework that can enhance existing diagnostic workflows by providing systematic, data-driven insights into semiology-EZ relationships. To maximize the clinical utility of this framework, several practical considerations should be noted regarding optimal query formulation. While SeeKr operates on any symptom count, prediction reliability depends on both the number and specificity of input semiologies. Users should interpret results cautiously when fewer than two or three symptoms are provided, when symptoms reflect late seizure propagation rather than onset, or when only non-specific symptoms are entered. Importantly, the platform’s confidence intervals, likelihood distribution width, and entropy measures provide quantitative feedback on reliability: cases with few or non-specific symptoms naturally produce broader distributions, signaling reduced certainty. In clinical practice, focusing on the earliest observable symptoms typically yields the most localizing information.
A formal determination of optimal symptom thresholds requires larger, more homogeneous datasets with standardized symptom documentation protocols.
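To make the role of an entropy measure concrete, the sketch below computes a normalized Shannon entropy over a likelihood distribution across brain regions. It is a minimal illustration under stated assumptions: the function name `normalized_entropy` and the two example distributions are hypothetical, and SeeKr's actual reliability computation may differ.

```python
import math


def normalized_entropy(likelihoods):
    """Shannon entropy of a region-likelihood distribution, scaled to [0, 1].

    Values near 0 indicate a concentrated (more localizing) distribution;
    values near 1 indicate a nearly uniform (less certain) distribution.
    """
    if len(likelihoods) < 2:
        return 0.0
    total = sum(likelihoods.values())
    probs = [v / total for v in likelihoods.values() if v > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(likelihoods))


# Hypothetical outputs for a specific versus a non-specific symptom query.
specific_query = {"temporal": 0.85, "frontal": 0.05, "insular": 0.05, "parietal": 0.05}
nonspecific_query = {"temporal": 0.25, "frontal": 0.25, "insular": 0.25, "parietal": 0.25}

h_specific = normalized_entropy(specific_query)
h_nonspecific = normalized_entropy(nonspecific_query)
print(round(h_specific, 3), round(h_nonspecific, 3))
```

Under this sketch, a query built from few or non-specific symptoms would tend toward the higher-entropy case, which is the signal of reduced certainty described above.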
The complexity of seizure semiology presents inherent challenges that extend beyond methodological considerations. The clinical interpretation of semiological symptoms is susceptible to individual epileptologist expertise and cognitive biases, potentially introducing systematic variability in diagnostic assessments that may subsequently propagate through computational models employed for EZ prediction22,39. Additionally, seizure semiology encompasses a broad phenotypic spectrum, extending from elementary sensorimotor manifestations that adhere to well-defined neuroanatomical somatotopy to complex behaviors and automatisms arising from distributed neural network dynamics1. These sophisticated semiological patterns likely result from intricate spatiotemporal interactions involving both excitatory and inhibitory processes across multiple, anatomically disparate networks, particularly encompassing higher-order associative cortical regions, thereby substantially complicating their neuroanatomical interpretation and precise localization14. Our knowledge graph framework addresses these challenges by providing a systematic, objective representation of semiology-region relationships that can reduce subjective interpretation variability.
The application of knowledge graphs in biomedical research has demonstrated remarkable versatility across diverse domains. For instance, a study on drug discovery developed a literature-based knowledge graph (Global Network of Biomedical Relationships, GNBR) to generate hypotheses for drug repurposing in rare diseases40. Another recent study constructed a knowledge graph integrating microbes, genes, and metabolites to enable scalable generation of mechanistic hypotheses explaining host-microbe interactions and their associations with disease phenotypes, such as inflammatory bowel disease and Parkinson’s disease41. Additionally, a radiology-focused study demonstrated that a dynamic knowledge graph could effectively mitigate severe visual and textual biases, significantly enhancing the accuracy of automated chest X-ray report generation42. These applications highlight the potential for knowledge graphs to transform complex biomedical data into actionable clinical insights, supporting our approach’s relevance to epilepsy research.
Fig. 6. Distribution of expert evaluations across relation strength and correctness ratings. The heatmap shows the number of Semi-GR associations evaluated by expert epileptologists, categorized by relation strength (1 = low, 2 = medium, 3 = strong) and correctness assessment (1 = strongly agree, 2 = likely agree, 3 = unlikely, 4 = strongly disagree). The majority of cases received high correctness ratings (1-2), with medium-strength relations being most frequently evaluated and stronger relations demonstrating higher levels of expert agreement.
A primary limitation of this research arises from issues related to both data quality and quantity. Notably, the least plausible model outputs (correctness ratings 3–4) tend to occur in cases involving multiple general regions simultaneously. In such cases, both clinical interpretation and data coverage pose greater challenges, making it difficult to attribute unique symptoms to a single EZ. Additionally, deep structures such as the insular and cingulate cortex are substantially under-represented in our dataset. As demonstrated by the pie chart in Fig. 1B, the dataset displays a pronounced imbalance, with the INS and CING accounting for only 5.4% and 3.12% of the total data, respectively, substantially less than commonly represented regions like the temporal lobe. Given their anatomical positions at the intersection of large-scale cortical networks, the INS and CING are frequently activated during seizure propagation from adjacent regions, especially the frontal and temporal lobes43,44,45. However, this co-activation often reflects secondary involvement rather than primary seizure onset. As a result, seizure activity in these regions may present diagnostic complexities that could influence their representation in clinical datasets. This distributional variation poses analytical challenges for data-driven approaches, which may demonstrate enhanced performance on well-documented EZs while requiring additional consideration for cases involving less frequently reported regions such as INS and CING. To avoid over-interpretation, it is important to note that the associations represented in the knowledge graph reflect statistical co-occurrence rather than causal symptom generation, and the current framework cannot differentiate primary symptomatogenic regions from propagated activity or broader network co-activation.
Moreover, finer distinctions such as mesial versus lateral temporal involvement or peri-rolandic subregions cannot be reliably inferred from the existing literature, and the KG therefore treats SR-level relationships as exploratory while GR-level associations form the most stable layer of the framework. Incorporating higher-resolution anatomical mapping will require future access to consistently annotated, patient-level datasets.
Another limitation pertains to data granularity and precision. During the data collection process, the explicit correspondence between individual semiologies and particular brain regions was not consistently delineated. Instead, semiologies and associated general brain regions were recorded collectively as aggregated sets without clear, individual mapping. For example, when multiple semiologies (e.g., cognitive dysfunction, hyperkinetic automatism, and awareness alteration) co-occurred with multiple general regions (e.g., Frontal lobe, Cingulate cortex) in a single patient, whether semiologies map individually or jointly to specific regions remained undetermined. Furthermore, a critical consequence of this lack of clarity is that occasionally only a single semiology is recorded despite multiple brain regions being activated during seizures. For instance, speech arrest may be recorded together with the frontal lobe, temporal lobe, and insular cortex, yet speech arrest alone is unlikely to reflect independent activation of all these regions. This ambiguity presents challenges for subsequent analyses and interpretations of semiology-region relationships, potentially affecting the precision of EZ identification and localization. Additionally, rare semiologies are sparsely represented or absent in the dataset, further complicating interpretation and modeling efforts related to these uncommon seizure manifestations and their associated brain regions. These limitations could be mitigated through larger, more diverse datasets in future studies, and enhanced data collection strategies and collaborative research efforts should improve model performance and clinical applicability.
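The aggregation issue described above can be illustrated with a minimal sketch. When a case lists a set of semiologies and a set of regions without individual mapping, one simple counting scheme credits every semiology-region pair in the case. The `cases` records and the `cooccurrence_counts` helper below are hypothetical and are not the authors' pipeline; the example only shows why a single semiology such as speech arrest ends up linked to every region recorded for that patient.

```python
from collections import Counter
from itertools import product

# Hypothetical case records, each holding aggregated sets with no
# one-to-one mapping between a semiology and a region.
cases = [
    {
        "semiologies": {"cognitive dysfunction", "hyperkinetic automatism", "awareness alteration"},
        "regions": {"Frontal lobe", "Cingulate cortex"},
    },
    {
        "semiologies": {"speech arrest"},
        "regions": {"Frontal lobe", "Temporal lobe", "Insular cortex"},
    },
]


def cooccurrence_counts(cases):
    """Count every (semiology, region) pair that appears together in a case."""
    counts = Counter()
    for case in cases:
        # Without individual mapping, each semiology is paired with each region.
        for pair in product(sorted(case["semiologies"]), sorted(case["regions"])):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(cases)
for (semiology, region), n in sorted(counts.items()):
    print(f"{semiology} -> {region}: {n}")
```

In this sketch the first case yields six pairs and the second yields three, so the counts capture co-occurrence only and cannot say which region generated which symptom.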
Data availability
We will release a sample dataset supporting the scientific claims of this research. The entire dataset will be made available upon reasonable request to the corresponding author.
References
Lüders, H. O., Najm, I., Nair, D., Widdess-Walsh, P. & Bingman, W. The epileptogenic zone: General principles. Epileptic Disord. 8, 1–9 (2006).
Rosenow, F. & Lüders, H. Presurgical evaluation of epilepsy. Brain 124(9), 1683–1700 (2001).
Baumgartner, C., Lehner-Baumgartner, E. The functional deficit zone: General principles. In Textbook of Epilepsy Surgery 821–831. (CRC Press, 2008).
Jeong, J.-W. et al. Multi-scale deep learning of clinically acquired multi-modal MRI improves the localization of seizure onset zone in children with drug-resistant epilepsy. IEEE J. Biomed. Health Inform. 26(11), 5529–5539 (2022).
Yang, S., Liu, F. Inference of whole brain electrophysiological networks through multimodal integration of simultaneous scalp and intracranial EEG. In The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025). https://openreview.net/forum?id=6UAeCPQPwP
Jiao, M., Yang, S., Xian, X., Fotedar, N. & Liu, F. Multi-modal electrophysiological source imaging with attention neural networks based on deep fusion of EEG and MEG. IEEE Trans. Neural Syst. Rehabil. Eng. (2024).
Beniczky, S. et al. Seizure semiology: ILAE glossary of terms and their significance. Epileptic Disord. 3, 447–495 (2022).
Tufenkjian, K. & Lüders, H. O. Seizure semiology: Its value and limitations in localizing the epileptogenic zone. J. Clin. Neurol. (Seoul, Korea) 8(4), 243 (2012).
Cámpora, N. E., Mininni, C. J., Kochen, S. & Lew, S. E. Seizure localization using pre ictal phase-amplitude coupling in intracranial electroencephalography. Sci. Rep. 9(1), 20022 (2019).
Shahabi, H., Nair, D. R. & Leahy, R. M. Multilayer brain networks can identify the epileptogenic zone and seizure dynamics. eLife 12, 68531 (2023).
Proix, T., Jirsa, V. K., Bartolomei, F., Guye, M. & Truccolo, W. Predicting the spatiotemporal diversity of seizure propagation and termination in human focal epilepsy. Nat. Commun. 9(1), 1088 (2018).
Foldvary-Schaefer, N. & Unnwongse, K. Localizing and lateralizing features of auras and seizures. Epilepsy Behav. 20(2), 160–166 (2011).
Andrews, J. P. et al. Early seizure spread and epilepsy surgery: A systematic review. Epilepsia 61(10), 2163–2172 (2020).
Alim-Marvasti, A. et al. Probabilistic landscape of seizure semiology localizing values. Brain Commun. 4(3), 130 (2022).
Elwan, S., Alexopoulos, A., Silveira, D. C. & Kotagal, P. Lateralizing and localizing value of seizure semiology: Comparison with scalp EEG, MRI and PET in patients successfully treated with resective epilepsy surgery. Seizure 61, 203–208 (2018).
Khoo, A. et al. Value of semiology in predicting epileptogenic zone and surgical outcome following frontal lobe epilepsy surgery. Seizure 106, 29–35 (2023).
Lim, K.-S. et al. 48-hour video-EEG monitoring for epilepsy presurgical evaluation is cost-effective and safe in resource-limited setting. Epilepsy Res. 162, 106298 (2020).
Bonini, F. et al. Frontal lobe seizures: from clinical semiology to localization. Epilepsia 55(2), 264–277 (2014).
Ahmedt-Aristizabal, D. et al. Automated analysis of seizure semiology and brain electrical activity in presurgery evaluation of epilepsy: A focused survey. Epilepsia 58(11), 1817–1831 (2017).
Mora, S. et al. NLP-based tools for localization of the epileptogenic zone in patients with drug-resistant focal epilepsy. Sci. Rep. 14(1), 2349 (2024).
Alim-Marvasti, A. The value of seizure semiology in epilepsy surgery: Epileptogenic-zone localisation in presurgical patients using machine learning and semiology visualisation tool. PhD thesis, UCL (University College London, 2022).
Luo, Y. et al. Clinical value of ChatGPT for epilepsy presurgical decision-making: Systematic evaluation of seizure semiology interpretation. J. Med. Internet Res. 27, 69173 (2025).
Yang, S., Luo, Y., Jiao, M., Fotedar, N., Rao, V. R., Ju, X., Wu, S., Xian, X., Sun, H., Karakis, I. et al. EpiSemoLLM: A fine-tuned large language model for epileptogenic zone localization based on seizure semiology with a performance comparable to epileptologists. medRxiv, 2024–09 (2024).
Singhal, A. et al. Introducing the knowledge graph: Things, not strings. Official Google Blog 5(16), 3 (2012).
Peng, C., Xia, F., Naseriparsa, M. & Osborne, F. Knowledge graphs: Opportunities and challenges. Artif. Intell. Rev. 56(11), 13071–13102 (2023).
Geiger, A., Lenz, P., & Urtasun, R. Are we ready for autonomous driving? the kitti vision benchmark suite. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012, 3354–3361, (IEEE, 2012).
Hänsel, K., Dudgeon, S. N., Cheung, K.-H., Durant, T. J. & Schulz, W. L. From data to wisdom: Biomedical knowledge graphs for real-world data insights. J. Med. Syst. 47(1), 65 (2023).
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G. & PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Int. J. Surg. 8(5), 336–341 (2010).
Smith, S. M. et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23, 208–219 (2004).
Jukkarwala, A. et al. Establishment of low cost epilepsy surgery centers in resource poor setting. Seizure 69, 245–250 (2019).
Wu, X.-T. et al. How to effectively constraint the cost of presurgical evaluation for resective surgery in low-income population: Clinically oriented opinions. Seizure 20(5), 425–427 (2011).
Kwon, C.-S., Chang, E. F. & Jetté, N. Cost-effectiveness of advanced imaging technologies in the presurgical workup of epilepsy. Epilepsy Curr. 20(1), 7–11 (2020).
Hill, C. E. et al. Addressing barriers to surgical evaluation for patients with epilepsy. Epilepsy Behav. 86, 1–5 (2018).
Ryvlin, P. & Rheims, S. Epilepsy surgery: Eligibility criteria and presurgical evaluation. Dialogues Clin. Neurosci. 10(1), 91–103 (2008).
Vakharia, V. N. et al. Getting the best outcomes from epilepsy surgery. Ann. Neurol. 83(4), 676–690 (2018).
Wiebe, S., Blume, W. T., Girvin, J. P. & Eliasziw, M. A randomized, controlled trial of surgery for temporal-lobe epilepsy. N. Engl. J. Med. 345(5), 311–318 (2001).
Baumgartner, C., Koren, J. P., Britto-Arias, M., Zoche, L. & Pirker, S. Presurgical epilepsy evaluation and epilepsy surgery. F1000Research 8, 1000 (2019).
Tonini, C. et al. Predictors of epilepsy surgery outcome: A meta-analysis. Epilepsy Res. 62(1), 75–87 (2004).
Beniczky, S. A., Fogarasi, A., Neufeld, M., Andersen, N. B., Wolf, P., Emde Boas, W. & Beniczky, S. Seizure semiology inferred from clinical descriptions and from video recordings. How accurate are they? Epilepsy Behav. 24(2), 213–215 (2012).
Sosa, D. N., Derry, A., Guo, M., Wei, E., Brinton, C. & Altman, R. B. A literature-based knowledge graph embedding method for identifying drug repurposing opportunities in rare diseases. In Pacific Symposium on Biocomputing 2020, 463–474 (World Scientific, 2019).
Santangelo, B. E., Bada, M., Hunter, L. E. & Lozupone, C. Hypothesizing mechanistic links between microbes and disease using knowledge graphs. Sci. Rep. 15(1), 6905. https://doi.org/10.1038/s41598-025-91230-6 (2025).
Li, M., Lin, B., Chen, Z., Lin, H., Liang, X., & Chang, X. Dynamic graph enhanced contrastive learning for chest x-ray report generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3334–3343 (2023).
Koubeissi, M. Z., Jouny, C. C., Blakeley, J. O. & Bergey, G. K. Analysis of dynamics and propagation of parietal cingulate seizures with secondary mesial temporal involvement. Epilepsy Behav. 14(1), 108–112 (2009).
Aljafen, B. N. Insular epilepsy, an under-recognized seizure semiology: A review for general neurologist. Neurosci. J. 25(4), 262–268 (2020).
Ekman, F. R. & González-Martínez, J. A. Insular epilepsy: Functions, diagnostic approaches, and surgical interventions. J. Integr. Neurosci. 23(11), 209 (2024).
Author information
Contributions
Shihao Yang: Conceptualization, methodology, data curation, software, visualization, manuscript writing. Zirui Wen: Methodology, data curation, software, manuscript writing. Wenxin Zhan: Manuscript writing. Neel Fotedar: Manuscript writing, evaluation. Yen-Cheng Shih: Data resource, evaluation. Jun-En Ding: Methodology. Danilo Bernardo: Manuscript writing, evaluation. Elisa Kallioniemi: Manuscript writing, evaluation. Weil Alexander G: Manuscript writing, evaluation. Xiying Huang: Manuscript writing. Felix Rosenow: Manuscript writing, supervision. Hai Sun: Manuscript writing, supervision. Yo-Tsen Liu: Data resource, evaluation, supervision. Shasha Wu: Evaluation, supervision. Feng Liu: Conceptualization, methodology, manuscript writing, supervision.
Ethics declarations
Competing interests
Mr. Shihao Yang and Dr. Feng Liu’s effort is partially sponsored by the National Institutes of Health under grant number R21NS135482 (PI: Liu). The other authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, S., Wen, Z., Zhan, W. et al. Knowledge graph representation of the mappings between seizure semiology and epileptogenic zones. Sci Rep 16, 3004 (2026). https://doi.org/10.1038/s41598-025-32920-z
DOI: https://doi.org/10.1038/s41598-025-32920-z







