Introduction

Electroencephalography (EEG) recordings capture brain electrical activity in many different environments. In the clinical setting, a clinician reviews the recordings and, typically using free text, annotates events of interest to identify normal and abnormal findings. Currently, visual analysis and interpretation of the EEG data are critical for diagnosis, and agreement between experts interpreting the same data is higher when they use standardized terminology1,2,3. To facilitate consistency, the International League Against Epilepsy (ILAE) and the International Federation of Clinical Neurophysiology (IFCN) have provided terms and definitions to annotate data features in EEG recordings. However, these terms are not yet machine-readable, which complicates large-scale automated analyses.

Recently, the Open Science community developed a framework to share definitions of terms for event annotations. The Hierarchical Event Descriptors (HED) framework (https://www.hedtags.org/, https://doi.org/10.5281/zenodo.7930927) enables researchers to annotate events in their data using a formally specified vocabulary4. In the HED framework, events are described using tags from a hierarchy of terms. Each event annotation consists of a comma-separated string of terms drawn from the HED vocabulary. These HED event annotations can be incorporated into data formatted in the Brain Imaging Data Structure (BIDS), which is a growing specification for magnetic resonance imaging (MRI), EEG, intracranial EEG (iEEG), magnetoencephalography (MEG) and positron emission tomography (PET) datasets5,6,7,8,9. The HED framework not only ensures the machine readability of event annotations, but the HED infrastructure also provides an extensive tool set for validation, analysis, and event description-based search across datasets. BIDS and HED thus align with FAIR (Findable, Accessible, Interoperable, and Reusable) guiding principles10 and have been endorsed by the INCF.

The basic HED schema was developed to standardize event descriptions during cognitive scientific experiments. The HED framework has evolved significantly over the past few years11,12,13 and one of its key features is the extensibility through HED library schemas. HED library schemas allow researchers to define new event types, attributes, and values in a structured vocabulary, including terms unique to specific research fields4,14. Moreover, HED library schemas are open source and machine actionable, providing communities with a standard to share structured vocabularies. While previous studies show the value of using EEG annotations15, these studies struggle with accessibility of data and annotations and currently no HED terms exist for EEG signal annotations. We therefore developed a HED library schema for structured, machine readable terminology for EEG signal annotations.

These annotations are based on terms and definitions provided by the ILAE and IFCN that can be used to characterize normal as well as pathological (ictal) EEG data as encapsulated in the Standardized Computer-based Organized Reporting of EEG (SCORE)16. SCORE is based on broad international consensus in defining graphoelements (EEG waveforms with distinct characteristics). The first SCORE version was endorsed by ILAE-Europe, and the second SCORE version was endorsed as a reporting guideline by the IFCN17. SCORE aims to give experts standard terminology that can be used in clinical practice to annotate EEG data and maximize interobserver agreement. SCORE was built on multiple previously proposed guidelines and vocabularies18,19,20,21,22,23,24,25,26 and on influential EEG textbooks27,28. The second SCORE version added terms drawn from numerous classifications, glossaries, and standard terminologies21,29,30,31,32,33,34. Since its publication, studies have used SCORE to better document and study normal and abnormal EEG patterns in different neurological conditions35,36,37,38,39,40,41,42,43,44,45. Studies have reported that using SCORE terms can improve the consistency and accuracy of EEG reports46,47,48,49. However, without a standard structure, different groups may abbreviate SCORE terms in different ways or use different metadata fields for annotations, resulting in a lack of consistency and lack of machine readability.

To allow the broad scientific and clinical community to add EEG event annotations to BIDS-formatted EEG data in a standardized fashion, we developed a HED-SCORE library schema. The library schema includes terms for modulators (e.g. hyperventilation, medication), background activity (e.g. mu rhythms), EEG patterns in critically ill patients, episodes (e.g. epileptic seizures), feature properties (e.g. sensor list, location), interictal activity, physiological patterns (e.g. frontal arousal rhythm), polygraphic channel features (e.g. ECG, EOG), sleep and drowsiness (e.g. K-complex) and patterns of uncertain significance (e.g. Wicket spikes). Terms for artifacts were added to the partnered basic HED schema to facilitate use across electrophysiological measurement modalities. Figure 1 shows a schematic view of how a few terms from the extensive HED-SCORE library schema can be used to annotate graphoelements, including artifacts and seizure activity occurring in an electrophysiology time series recording. We further show that HED-SCORE annotations can be used and validated in public BIDS examples.

Fig. 1

Schematic of the HED-SCORE library schema implementation. EEG signals are recorded from the scalp and visualized on a computer screen. The signals can be annotated using HED-SCORE tags. These tags can be given for, e.g., interictal activity, seizures, and artifacts. The data used for this schematic are from the TUH EEG Artifact Corpus (TUAR)50,54, Version: v2.0.0, Patient: 1027, Session: s002 (https://isip.piconepress.com/projects/nedc/data/tuh_eeg/tuh_eeg_artifact/edf/03_tcp_ar_a/010/00001027/s002_2004_01_27/00001027_s002_t004.edf)61.

Results

We developed a HED library schema for SCORE. This library schema is an extension of the standard HED schema and allows neurology researchers to annotate electrophysiology recordings per international standards with 396 unique additional terms. An interactive view of the HED-SCORE library schema is available through an expandable HTML viewer (https://www.hedtags.org/display_hed.html) by selecting the “Schema: score” and “Version: HED_score_Latest”. The HTML viewer shows the hierarchical structure of the HED-SCORE library schema (Fig. 2). The top level shows the main types of EEG graphoelements (Fig. 2A). Each level can be expanded to show the lower-level nodes, such as shown for interictal activity. As the artifact terms are relevant across many neuroimaging fields, these were integrated in the partnered main HED schema, which can be used in conjunction (Fig. 2B). Hovering over each tag displays the description of the SCORE term (Fig. 2C).

Fig. 2

The expandable HTML browser for HED library schemas allows users to browse the schema terms and descriptors. (A) HED-SCORE library schema top-level describing the main types of EEG graphoelements included in SCORE. (B) Top-level term ‘Artifact’ is expanded to show the lower-level nodes of ‘Biological-artifact’. For data annotations, short forms are typically used (e.g., ‘Chewing-artifact’), and each short form can easily be mapped to its long-form path (‘Property/Data-property/Data-artifact/Biological-artifact/Chewing-artifact’). (C) The extended description for the ‘Epileptiform-interictal-activity’ node shown when hovering over the term. The ‘suggestedTag’ section shows recommended additional terms that a user might want to include along with this tag.

HED-SCORE implementation in example BIDS datasets

We provide a set of BIDS examples to test the implementation and validation of HED-SCORE in BIDS (https://github.com/bids-standard/bids-examples/tree/master/xeeg_hed_score). These examples show the HED-SCORE implementation in three settings: annotating seizures, artifacts, and modulators. Examples are based on data from the Temple University Hospital EEG Corpus (TUEG)50, an open-source collection of clinical EEG recordings performed at Temple University Hospital (TUH), and on data from the Mayo Clinic, Rochester, MN. These examples were generated following the guidelines on the HED resources webpage on how to add HED annotations (https://www.hed-resources.org/en/latest/HowCanYouUseHed.html#adding-hed-annotations-anchor). The process has four steps and results in a BIDS dataset where each file is accompanied by a metadata file that describes events occurring at certain times during the corresponding recording.

HED-SCORE implementation step 1: defining the schema

When using HED-SCORE annotations in BIDS events files, the first step is to define the HED-SCORE library schema at the project level in the dataset description JSON file. The BIDS dataset description file must have a field with the HED-SCORE library schema and version (e.g., “HEDVersion”: “score_2.0.0”). As the HED-SCORE library schema is partnered with the standard HED schema, any annotations in the data can include tags from both the HED-SCORE library schema and the standard schema. This avoids duplication of terms: tags from the standard schema, such as ‘Left’ and ‘Right’, can be used directly to annotate events with HED-SCORE in BIDS.
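As a minimal sketch of this first step, the snippet below writes a dataset description file containing the “HEDVersion” field quoted above. The dataset name, the BIDS version string, and the helper function name are placeholders assumed for illustration; only the “HEDVersion” value comes from the text.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values: only the "HEDVersion" entry comes from the text above.
dataset_description = {
    "Name": "Example HED-SCORE dataset",  # hypothetical dataset name
    "BIDSVersion": "1.8.0",               # assumed BIDS version
    "HEDVersion": "score_2.0.0",          # HED-SCORE library schema and version
}


def write_dataset_description(bids_root: Path) -> Path:
    """Write dataset_description.json at the root of a BIDS dataset."""
    out_path = bids_root / "dataset_description.json"
    out_path.write_text(json.dumps(dataset_description, indent=2))
    return out_path


if __name__ == "__main__":
    # Write into a temporary directory so the sketch leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_dataset_description(Path(tmp))
        print(path.read_text())
```

In a real dataset the file would be written once at the dataset root, next to the subject folders.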

HED-SCORE implementation step 2: selecting tags for annotation

After the schema is defined in the BIDS dataset description, the second step is to define the HED-SCORE tags that are added to BIDS events files. BIDS events files (…_events.tsv) are tab-separated files that can be accompanied by a human- and machine-readable field-value JSON sidecar (…_events.json). Per BIDS principles, the JSON sidecar describes the columns in the tab-separated file and defines the HED-SCORE tags associated with event types in these columns. In this step, the schema is browsed (Fig. 2) and relevant terms are defined (examples provided in Figs. 3D–5).
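The structure of such a sidecar can be sketched as follows. The column name annotation_type matches the artifact example, but the level names (“chew”, “ied”) and the descriptions are invented here for illustration. The two HED tags are ones named in this paper.

```python
import json

# Hypothetical sidecar for a column named "annotation_type".
# Level names and descriptions are illustrative assumptions; the HED tags
# ("Chewing-artifact", "Epileptiform-interictal-activity") appear in the paper.
events_sidecar = {
    "annotation_type": {
        "Description": "Type of annotated EEG event.",
        "Levels": {
            "chew": "Chewing artifact.",
            "ied": "Epileptiform interictal activity.",
        },
        "HED": {
            "chew": "Chewing-artifact",
            "ied": "Epileptiform-interictal-activity",
        },
    }
}

# Serialize to the JSON text that would be saved as ..._events.json.
sidecar_text = json.dumps(events_sidecar, indent=2)
print(sidecar_text)
```

Each value in the events file column is thereby tied to one HED annotation, so the tags need to be written only once per dataset rather than once per event.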

Fig. 3

Schematic example of a section of a BIDS events file (…_events.tsv) with HED-SCORE annotations. (A) Electrode location and channel labels of the TUH EEG Seizure Corpus. (B) Temporal representation of the example annotations on a subset of selected channels. (C) Two example annotations as they appear in the BIDS tab-separated value events file with columns for onset and duration in seconds and a HED column to annotate the seizure information. The column ‘channel’ is added to indicate which channels in the BIDS _channels.tsv file the annotation corresponds to. The corresponding _channels.tsv file (not visible here, but can be seen in the online example) contains the montage in the dataset. (D) An excerpt from the accompanying JSON sidecar describes the HED-SCORE tag used in the column with seizure information (seizure_info).

Fig. 4

Example of an accompanying events JSON sidecar that lists the HED tags used in annotating the artifacts example (sub-eegArtifactTUH). Required BIDS columns are onset and duration. When annotating HED-SCORE events, the column annotation_type is added, and the sidecar describes its levels and the HED tags that are used. The column ‘channel’ is added to describe which channels in the BIDS _channels.tsv file the annotation corresponds to. This _channels.tsv file contains the montage in the dataset.

Fig. 5

Example of BIDS events file and JSON sidecar excerpts where the HED-SCORE library schema was used to annotate modulators. The events file (left) shows required BIDS onset and duration columns in seconds and an “event_type” column that combines HED-SCORE and HED tags to describe when the subject opened and closed their eyes and when photic stimulation occurred. The JSON sidecar (right) describes the HED-SCORE and HED tags and is necessary for correct validation of the annotations.

HED-SCORE implementation step 3: data annotation

In the third step, data can be annotated with the tags defined in the JSON sidecar. Data can be reviewed with open software tools such as CTagger12 (an EEGLAB51 toolbox) or EEGNet.org (http://eegnet.org/), which have incorporated the HED-SCORE library schema. Other clinical and research software tools also provide options to add annotations to a dataset. Consideration should be given to the reference scheme that is used for reviewing EEG data. Clinical EEG data often use a bipolar or referential reference scheme, whereas research data more commonly use a common average or Laplacian reference. The BIDS structure provides metadata fields that can indicate the montage and reference scheme. Moreover, the ‘channel’ column in the BIDS events file can be used to annotate the channels associated with a specific event (e.g. artifact or graphoelement). Annotations saved in BIDS events files (…_events.tsv) are then ready for validation.
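For illustration, annotations exported from a review tool could be serialized to a tab-separated events file along these lines. This is a sketch under stated assumptions: the onsets, durations, channel labels, and level names are invented and do not come from the example datasets, and the function name is hypothetical.

```python
import csv
import io

# Hypothetical annotations; all values are invented for illustration.
rows = [
    {"onset": 12.5, "duration": 3.0, "annotation_type": "chew", "channel": "F7"},
    {"onset": 40.0, "duration": 0.2, "annotation_type": "ied", "channel": "C3"},
]


def to_events_tsv(rows):
    """Serialize annotation rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["onset", "duration", "annotation_type", "channel"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Text that would be saved as ..._events.tsv.
events_tsv = to_events_tsv(rows)
print(events_tsv)
```

The onset and duration columns come first, as required by BIDS, and the remaining columns carry the level names that the JSON sidecar maps to HED tags.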

HED-SCORE implementation step 4: validation

In the fourth and last step, the HED-SCORE tags are validated using the online HED validator (https://hedtools.org/hed or www.hedtags.org/hed-javascript), which can be used separately or as part of the BIDS online validator (https://bids-standard.github.io/bids-validator/). The validators return a list of errors when, for example, a typo is made in the HED-SCORE tags.
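The idea behind catching typos can be illustrated with a toy check that compares each tag against a known vocabulary. This is not the HED validator and does not use its API. It is a simplified stand-in that assumes a tiny vocabulary of tags mentioned in this paper and ignores HED syntax such as parenthesized groups.

```python
# Tiny stand-in vocabulary made of tags mentioned in this paper; the real
# validators check against the full schema and apply many additional rules.
KNOWN_TAGS = {
    "Chewing-artifact",
    "Epileptiform-interictal-activity",
    "Aware-focal-onset-epileptic-seizure",
    "Left",
    "Right",
}


def find_unknown_tags(hed_string, known_tags=KNOWN_TAGS):
    """Return the comma-separated tags in hed_string that are not in known_tags."""
    tags = [tag.strip() for tag in hed_string.split(",") if tag.strip()]
    return [tag for tag in tags if tag not in known_tags]


# A correctly spelled annotation yields no unknown tags.
print(find_unknown_tags("Chewing-artifact, Left"))
# A misspelled tag is reported back to the user.
print(find_unknown_tags("Chewing-artefact, Left"))
```

For actual datasets, the online validators named above should be used, since they check the complete schema and the annotation syntax.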

HED-SCORE can be used to annotate seizure information in BIDS data

SCORE includes many terms to annotate EEG data in the clinical setting of epilepsy. To test whether HED-SCORE can be used for BIDS data that fit this use case, the HED-SCORE BIDS example set includes one subject where seizures are annotated (sub-eegSeizureTUH). This example is based on one subject (41, male) from the TUH EEG Seizure Corpus52. The TUH Corpus uses 27 labels53 that were matched to their corresponding HED-SCORE tags (Supplemental Table 1). Figure 3 shows a subset of seizure annotations in the BIDS events file (…_events.tsv) where tonic-clonic seizure activity spread through different EEG channels listed in the channel column. The accompanying JSON sidecar (…_events.json) is used to note how the abbreviation describes the HED-SCORE tag for a tonic-clonic seizure. The sidecar is fully validated in BIDS, and if a typing error is made in the tag describing the tonic-clonic seizure, the BIDS validator throws an error.

HED can be used to annotate artifacts in BIDS data

SCORE includes several standardized terms to annotate data artifacts from biological and non-biological sources. The annotation of artifacts is relevant for many neuroimaging data types in BIDS, and artifacts were incorporated in the partnered standard HED schema (Fig. 2B). The HED-SCORE BIDS example set includes a subject where artifacts are annotated. This example is based on one subject (55, female) from the TUH EEG Artifact Corpus54, a subset of TUEG that contains different artifacts. The artifacts were matched to HED tags (Supplemental Table 1). The dataset includes annotations of 3 different types of artifacts: eye movements, electrode artifacts, and muscle artifacts. An example of how these artifact annotations are described in the JSON sidecar for the BIDS events file is shown in Fig. 4.

HED-SCORE can be used to annotate modulators in BIDS data

During EEG recordings for epilepsy, data are commonly modulated by the effects of medication, eye closure, cognitive tasks or specific manipulations such as hyperventilation or photic stimulation. The HED-SCORE BIDS example set includes a subject where iEEG data were recorded during photic stimulation. This example is based on one subject (46, male) monitored at Mayo Clinic (Rochester, MN) in the Epilepsy Monitoring Unit. The subject provided informed consent, and the research was performed in accordance with the Mayo Clinic Institutional Review Board. It is important to note that the SCORE terminology is intended for scalp EEG, not intracranial EEG (iEEG). Many terms in SCORE do not translate to iEEG due to, e.g., differences in signal properties and standard scalp electrode locations with EEG (i.e., 10–20 system). However, some HED-SCORE tags may be used cautiously in iEEG settings, such as the modulators shown in this example. This example includes photic stimulus procedure annotations where event types and photic stimulation frequencies were annotated in the BIDS events file (…_events.tsv) (Fig. 5).

Discussion

We implemented SCORE in a HED library schema, such that the SCORE terms are described in structured fields in a human and machine-readable format. This implementation adheres to the HED design principles of uniqueness, clarity, structural sparsity, and orthogonality. The HED-SCORE library schema is compatible with annotating events in BIDS format, and we show several examples of HED-SCORE annotations integrated with the BIDS metadata standards. The HED-SCORE library schema is available in several formats on GitHub and can be viewed in an expandable HTML viewer.

Demonstrating BIDS compatibility is essential for several large efforts where EEG data are shared in BIDS, such as the EEGManyLabs project55, and open repositories, including OpenNeuro56, the Cuban Human Brain Mapping Project57 and data sharing platforms such as LORIS58. The HED-SCORE library schema includes not only clinical EEG terms but also terminology that can be used for EEG signal preprocessing, such as artifact annotations. A dataset enriched with the HED-SCORE library schema can be validated using the BIDS validator (https://github.com/bids-standard/bids-validator) or the HED validator (https://hedtools.org/hed/). This opens an excellent opportunity for researchers to develop BIDS apps59 and efficiently run automated analysis pipelines with BIDS datasets annotated using the HED-SCORE library schema.

The development of the HED-SCORE library schema is the first step towards developing annotation tools in response to various clinical and scientific needs. For example, CTagger12 is a user-friendly interface for annotating datasets with HED and the HED-SCORE library schema. EEGLAB51 tools already incorporate HED and HED-SCORE support, and several initiatives are piloting these integrations, including EEGNet.org (http://eegnet.org/), NEMAR (https://nemar.org/)60 and the Global Brain Consortium (https://globalbrainconsortium.org/).

Annotating large datasets will be necessary to fully realize the benefits from the open-source implementation of the HED-SCORE library schema. Here, we showed examples of the HED-SCORE library schema applied to existing datasets. Labels used in two datasets from the TUH archive50,52,53,54,61,62 were matched to their corresponding HED-SCORE library schema tags. This mapping allows the automatic conversion of the entire TUH data labels to HED-SCORE tags. Moreover, it demonstrates the ease of matching labels in existing annotated databases to their corresponding HED-SCORE library schema tags, and how HED-SCORE can be used to help develop and evaluate automated seizure detection algorithms63. The developed HED-SCORE library schema with 396 additional terms provides a base for future tagging efforts by other groups.

SCORE was designed and intended for clinical reports on EEG signals. However, to diagnose and study seizures in individuals with neurological conditions, various methods such as EEG, iEEG, or MEG can be used to capture and record electromagnetic brain activity for both clinical and research purposes. The main distinction between these measurement types is the spatial resolution of the recordings. Where EEG and MEG give a global view, they have a relatively lower spatial resolution, while iEEG has a much higher spatial resolution at the cost of sparse coverage64,65. The signals, therefore, have very different features. While some of the scalp EEG graphoelements and findings may apply to iEEG, many do not. Therefore, while some terms from the HED-SCORE library schema may be used for iEEG and MEG data (as in the example with modulators or in general categories of seizures), caution is warranted when describing graphoelements. There are ongoing efforts65,66,67 to review and characterize iEEG activity in a standardized manner for clinical purposes, but these have not yet achieved broad consensus.

The HED-SCORE library schema is focused on describing normal and abnormal EEG graphoelements. Therefore, this work does not detail the description of patient information, information related to referral and recording conditions, and administrative data, which are discussed in SCORE papers16,17. Moreover, we do not yet include neonatal SCORE templates. The American Clinical Neurophysiology Society standardized EEG terminology and categorization to describe continuous EEG monitoring in neonates34. Given the structure of HED, these terms can be included in future extensions of the HED-SCORE library schema or separate library schemas where appropriate.

The HED-SCORE library schema implementation facilitates the sharing of well-annotated data for large-scale analysis. Furthermore, the HED-SCORE library schema can be integrated with existing tools to create new automated analyses, including approaches implemented in future BIDS apps.

Methods

The HED-SCORE library schema follows HED design principles

The HED-SCORE library schema adheres to HED design principles, basic rules, and form requirements, described in detail in the HED specification (https://www.hedtags.org/hed-specification). Following these principles, the tags are orthogonal, and terms that are used independently are represented in separate hierarchies. The content (library schema) is separate from the presentation and the validation tools. Moreover, every node is unique within the library schema and node names are clear and meaningful on their own. The terms also follow form requirements where individual schema terms begin with a capital letter followed by lowercase letters, and spaces are not allowed. Accordingly, when terms contain multiple words, hyphens are used to separate the words.
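The form requirement stated above can be approximated with a short pattern check. The regular expression below is an assumed simplification of that one sentence (a capital letter, then lowercase letters, with hyphens between words). The HED specification remains the authoritative source and may permit forms this sketch rejects.

```python
import re

# Assumed simplification of the form rule stated above: one capital letter,
# then lowercase letters, with single hyphens separating further lowercase words.
TERM_PATTERN = re.compile(r"[A-Z][a-z]*(?:-[a-z]+)*")


def follows_form_rule(term):
    """Return True if the whole term matches the simplified form rule."""
    return TERM_PATTERN.fullmatch(term) is not None


# Terms taken from the paper, plus two deliberately malformed variants.
for term in ["Chewing-artifact", "Feature-property", "chewing-artifact", "Chewing artifact"]:
    print(term, follows_form_rule(term))
```

The first two terms satisfy the rule, while the lowercase-initial variant and the variant containing a space do not.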

HED-SCORE library schema development

The HED standard schema is organized into subtrees representing aspects of events, such as the type of event, agents involved, actions, items, properties, and relationships between these elements. The HED-SCORE library schema has a top-level organization focused on the identification of EEG features and the top levels of the HED-SCORE library schema correspond to the main types of events described in the SCORE papers16,17. These top levels represent background EEG activity, sleep and drowsiness, interictal activity, clinical episodes and electrographic seizures, physiological patterns, patterns of uncertain significance and polygraphic channel features (EMG, EOG, and ECG channels). The top level also includes modulators that pertain to external stimuli and interventions that can change EEG activity. The GitHub commit history reflects the development process of the HED-SCORE library schema, review of internal version HED-SCORE 1.0.0 to current version HED-SCORE 2.0.0 and the design decisions that were made during the development of the HED-SCORE library schema. The pull requests and library schema were reviewed with the HED Working Group, verified against the SCORE papers16,17 and documented on Zenodo68.

Adhering to orthogonality and avoiding repetition

To adhere to HED’s orthogonality rule and avoid repetition, a Feature-property top level was added to the HED-SCORE library schema. This top-level Feature-property is analogous to the HED standard schema ‘Property’, which includes descriptive elements, such as adjectives and adverbs. These can be used to describe the other tags in more detail. This allows general descriptors to be used alongside several different elements. For example, feature properties include a tag for Provocative-factor to annotate one of the modulators that provoked a seizure. In addition, the feature properties include tags for rhythms like ‘Gamma-activity’ to describe different types of rhythmic signals like interictal or background activity.

Hierarchy follows is-a relationships

HED requires that child nodes in the schema hierarchy satisfy an ‘is-a’ relationship with the parent level. Further, every tag in the schema must be unique. These constraints allow users to use ‘short-form’ annotations rather than the full path in the hierarchy and to search on general terms. For example, because Aware-focal-onset-epileptic-seizure is a type of Epileptic-seizure, a search for Epileptic-seizure may return events annotated with Aware-focal-onset-epileptic-seizure as well as all other annotated seizure events. HED tools can convert all short-form tags (e.g., Aware-focal-onset-epileptic-seizure) to long-form full-path tags (e.g., /Episode/Epileptic-seizure/Focal-onset-epileptic-seizure/Aware-focal-onset-epileptic-seizure).
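The link between unique node names, short forms, and hierarchical search can be sketched with the two long-form paths quoted in this paper. The lookup table and function names are hypothetical and are not part of the HED tools. Real conversions are done by the HED tools against the full schema.

```python
# The two long-form paths quoted in this paper (leading slash kept as printed).
LONG_FORMS = [
    "Property/Data-property/Data-artifact/Biological-artifact/Chewing-artifact",
    "/Episode/Epileptic-seizure/Focal-onset-epileptic-seizure/"
    "Aware-focal-onset-epileptic-seizure",
]

# Because every node name is unique, the last path element identifies the path.
# Any leading slash is stripped so both paths share one style.
SHORT_TO_LONG = {
    path.strip("/").split("/")[-1]: path.strip("/") for path in LONG_FORMS
}


def to_long_form(short_tag):
    """Look up the full path for a short-form tag."""
    return SHORT_TO_LONG[short_tag]


def matches_search(short_tag, query):
    """True if query is the tag itself or one of its ancestors in the hierarchy."""
    return query in to_long_form(short_tag).split("/")


print(to_long_form("Chewing-artifact"))
print(matches_search("Aware-focal-onset-epileptic-seizure", "Epileptic-seizure"))
```

Searching on the ancestor Epileptic-seizure thus finds the more specific seizure tag, while it would not match Chewing-artifact.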

Partnering with the main HED schema

Some of the terms used in SCORE terminology were already present in the standard HED schema (e.g., Increasing, Decreasing, Symmetrical, Hand) and were therefore not included in the HED-SCORE library schema. To make these terms available, the HED-SCORE library schema was partnered with the standard HED schema. The HED-SCORE library schema is thus designed not to overlap with the standard HED schema, and any terms from the standard HED schema can be used in conjunction. Moreover, partnering allowed the tags for artifacts, which apply to many neuroimaging modalities beyond EEG, to be placed in the standard HED schema, where they can be used either together with the partnered HED-SCORE library schema or with the standard HED schema alone.

Suggested tags to capture interrelated terms

To maintain the interrelated structure of SCORE while adhering to the HED orthogonality principle (whereby each schema term can be used in only one place), ‘suggestedTag’ schema attributes were included for many SCORE terms. Suggested tags indicate other HED tags that should be considered to complete the annotation when a given term is used. For example, Epileptiform-interictal-activity has suggested morphology tags, such as Spike and Sharp-wave. Similarly, Generalized-onset-epileptic-seizure includes suggested tags for specific seizure types, such as Tonic-seizure and Myoclonic-seizure, to follow the 2017 seizure classification30. This guides users toward the additional tags that are recommended for a given term.
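How an annotation tool might surface suggested tags can be sketched as follows. The dictionary holds only the examples named in the text. In the schema itself, ‘suggestedTag’ is an attribute of each term and lists more entries than shown here. The function name is hypothetical.

```python
# Only the suggested tags named in the text; the schema lists more per term.
SUGGESTED_TAGS = {
    "Epileptiform-interictal-activity": ["Spike", "Sharp-wave"],
    "Generalized-onset-epileptic-seizure": ["Tonic-seizure", "Myoclonic-seizure"],
}


def suggestions_for(annotation):
    """Return suggested tags not yet present in a comma-separated annotation."""
    present = [tag.strip() for tag in annotation.split(",") if tag.strip()]
    suggestions = []
    for tag in present:
        for suggested in SUGGESTED_TAGS.get(tag, []):
            # Skip suggestions the annotator already used or already received.
            if suggested not in present and suggested not in suggestions:
                suggestions.append(suggested)
    return suggestions


print(suggestions_for("Epileptiform-interictal-activity, Spike"))
```

Here the annotation already contains Spike, so only Sharp-wave remains as a prompt for the annotator.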

Sources

The SCORE standard was developed over many years by international working groups16,17. Every HED-SCORE tag has a permanent GUID and the exact source table or appendix is indicated in the library schema for proper reference and future compatibility.

HED-SCORE library schema validation

HED tools can be applied to any library schema. Following the standard HED development process for (library) schemas and using online HED tools, the HED-SCORE library schema was first developed in the MediaWiki format, a line-oriented markup language that is easy to read and edit. It was validated to ensure its compliance with current HED (8.0.0+) requirements. As further HED tools for event validation, search and analysis use the XML format, HED tools were then used to convert the MediaWiki library schema to XML format, and more recently to a spreadsheet format. The formats are equivalent.

HED-SCORE example dataset validation

Several HED resources are available to validate BIDS datasets with HED tags (https://www.hed-resources.org). The example HED-SCORE event and JSON files were validated using the HED tools Python package (https://pypi.org/project/hedtools/), which can also be used through an online server (available at https://hedtools.org/hed). The BIDS HED-SCORE example dataset was validated using the online BIDS validator (https://bids-standard.github.io/bids-validator/).