Background & Summary

Sleep is an indispensable and ubiquitous process associated with physiological changes that involve multiple organ systems in both humans and animals1. It plays a vital role in many physiological functions, such as memory consolidation2, synaptic plasticity3, and brain metabolic waste clearance4. As sleep is a central part of maintaining overall mental and physical well-being, disruption of the complex interplay between sleep and other physiological processes can lead to significant medical consequences5,6,7.

Consequently, the sleeping brain activity of healthy human subjects serves both as a powerful resource for understanding normative brain activity and as a useful baseline for comparing and identifying abnormal activity in a wide range of neurological and psychiatric disorders8,9. It also allows researchers to better understand sleep macro- and micro-architecture10 from non-rapid eye movement (NREM) sleep to rapid eye movement (REM) sleep11, as well as region-specific rhythms and patterns across different sleep stages12.

The gold standard for measuring physiological sleep is polysomnography (PSG)13, which combines several scalp electroencephalography (EEG)14 electrodes to record neural activity, electrooculography (EOG) to record eye movements, electromyography (EMG) to monitor muscle tone and activity during sleep, peripheral oxygen saturation (SpO2), thoraco-abdominal respiratory effort, oro-nasal airflow, and video monitoring. PSG recordings provide clinically useful information about sleep stages, sleep disruption, respiratory status across sleep stages, limb movements, behavioural abnormalities, body position, and changes in cardiac rate and rhythm during sleep.

While some open PSG sleep datasets acquired from healthy participants are freely available (summarized in Supplementary Table 1)15,16,17,18,19,20, they were mainly collected for specific research aims, which limits their generalizability. All currently available open datasets were acquired with a low-density EEG set-up (3 to 18 electrodes) at sampling frequencies below 512 Hz under laboratory or ambulatory conditions, which limits the exploration of the spatial characteristics of sleep21.

Herein, we present an open-access dataset of high-density EEG (HD-EEG) sleep recordings from 29 healthy subjects. The data were collected and phenotyped at the Montreal Neurological Institute and are freely available online in anonymized form. The ANPHY-Sleep database includes overnight HD-EEG recordings with 83 channels, EOG, EMG, electrocardiogram (ECG), average electrode positions obtained by co-registration, and sleep scoring annotations. Additionally, we describe participant demographics, sleep macrostructure, and spectral density across sleep stages and brain regions at both the individual and group levels.

Emphasis was placed on acquiring a high-quality sleep database of healthy subjects that could serve as a valuable resource for researchers investigating physiological and pathological sleep patterns. We expect this dataset to be particularly valuable for (a) understanding healthy human sleep at high spatio-temporal resolution22, (b) developing automatic algorithms for sleep feature detection in healthy subjects (e.g. sleep stage classification)23, and (c) benchmarking against sleep pathology5,6,7.

Methods

Experiment design

The overall pipeline is summarized in Fig. 1. Healthy subjects were screened for inclusion in the study. The combined HD-EEG and PSG setup was installed by a certified EEG technologist and a sleep technician. Specifically, 83 HD-EEG electrodes were placed according to the 10-10 system by a trained EEG technologist with assistance from a trainee in the lab. The additional sensors required to record PSG according to AASM standards, including ECG, EOG, nasal cannula and thermistor, fingertip SpO2 sensor, and thoracic and abdominal bands for plethysmography, were installed by a trained sleep technician.

Fig. 1

The workflow of study design and analysis. (A) Healthy subjects were screened for study enrolment. (B) Combined HD-EEG and PSG setups were applied by certified technicians. (C) Overnight sleep recordings were collected in the sleep lab. (D) Sleep was scored by a certified sleep technician. (E) Data were preprocessed to identify and remove artifacts. (F) The dataset was technically validated. This figure was created with BioRender.com.

During the experimental session, participants spent approximately 7 hours in the lab, from 9–11 PM until 5–7 AM the next morning. After the HD-EEG electrodes were set up and participants were ready to sleep, overnight EEG recording began at “lights out”. Participants slept without any intervention until the next morning. The recordings were scored by a certified sleep technician, and data preprocessing and technical validation were then performed to confirm that the data quality met the standards for public sharing.

Study participants

Healthy adults aged 18 to 45 years were recruited by advertisement between 03/2022 and 12/2023 and underwent a screening interview before study enrolment. Participants were eligible if they met all of the following criteria: no obesity (defined as body mass index ≥ 30), no general sleep complaints, no irregular sleep patterns, no diagnosis of substance use disorders affecting mood or sleep, no history of psychiatric illness (mood disorder, anxiety disorder, neurodevelopmental disorder, or other psychiatric disorder), no neurological illness (traumatic brain injury, stroke, epilepsy, or other neurological disorder), and no history of sleep disorders such as insomnia, sleep-related breathing disorders (e.g. obstructive sleep apnea), central disorders of hypersomnolence, circadian rhythm sleep-wake disorders, sleep-related movement disorders, and parasomnias (e.g. sleepwalking). Participants also completed the Pittsburgh Sleep Quality Index (PSQI)24, which consists of 19 items grouped into seven components (subjective sleep quality, sleep latency, sleep duration, habitual sleep efficiency, sleep disturbances, use of sleeping medication, and daytime dysfunction) that sum to a global PSQI score ranging from 0 to 21. A global score below 5, indicating ‘good sleep quality’, was required for eligibility.
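As a small illustration of the PSQI eligibility rule, the Python sketch below sums the seven component scores (each scored 0 to 3) into the global score and applies this study's threshold of a global score below 5. It assumes the component scores have already been derived from the 19 items (that derivation is not shown), and the function and key names are hypothetical.

```python
from typing import Dict

# Hypothetical keys for the seven PSQI components; each is assumed to be an
# already-computed integer score between 0 and 3.
PSQI_COMPONENTS = (
    "subjective_sleep_quality",
    "sleep_latency",
    "sleep_duration",
    "habitual_sleep_efficiency",
    "sleep_disturbances",
    "use_of_sleeping_medication",
    "daytime_dysfunction",
)


def psqi_global(components: Dict[str, int]) -> int:
    """Sum the seven component scores into the global PSQI score (0 to 21)."""
    missing = set(PSQI_COMPONENTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in PSQI_COMPONENTS:
        if not 0 <= components[name] <= 3:
            raise ValueError(f"{name} must be between 0 and 3")
    return sum(components[name] for name in PSQI_COMPONENTS)


def psqi_eligible(components: Dict[str, int], threshold: int = 5) -> bool:
    """Eligibility rule used in this study: global score strictly below 5."""
    return psqi_global(components) < threshold
```

For example, a participant with component scores summing to 3 is eligible, whereas a sum of 5 is not.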

After the screening interview, study participants underwent a combined HD-EEG and PSG recording at the research sleep laboratory of the McGill University Health Centre Research Institute (MUHC-RI). Participants with an Apnea-Hypopnea Index (AHI) > 15/h were further excluded; information and data from these subjects will be made available upon written request. In total, 29 healthy adults with overnight sleep recordings were included in the ANPHY-Sleep database (Fig. 2).
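The AHI is the number of apneas plus hypopneas per hour of sleep. A minimal sketch of the exclusion rule follows, assuming the event counts and total sleep time are already available from the scored PSG. The function names are hypothetical.

```python
AHI_EXCLUSION_THRESHOLD = 15.0  # events per hour; exclusion criterion used in this study


def apnea_hypopnea_index(n_apneas: int, n_hypopneas: int, total_sleep_time_min: float) -> float:
    """AHI = (apneas + hypopneas) per hour of sleep (counts assumed already scored)."""
    if total_sleep_time_min <= 0:
        raise ValueError("total sleep time must be positive")
    return (n_apneas + n_hypopneas) / (total_sleep_time_min / 60.0)


def excluded_for_ahi(ahi: float) -> bool:
    """A participant is excluded when AHI is strictly greater than 15/h."""
    return ahi > AHI_EXCLUSION_THRESHOLD
```

For instance, 60 events over 6 hours of sleep give an AHI of 10/h, which does not meet the exclusion criterion.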

Fig. 2

Post-screening subject exclusion flowchart.

Ethics statement

All study participants provided written informed consent to the open publication of the data according to the study protocol approved by the Research Ethics Board of the Montreal Neurological Hospital and Institute (MP-37-2018-3886).

Combined HD-EEG and PSG recordings

Overnight PSG signals and HD-EEG were recorded with a Nihon Kohden JE-120 256-channel HD-EEG/PSG amplifier system (Tokyo, Japan). Electrodes were placed according to the 10–10 system of the International Federation of Clinical Neurophysiology, as in our previous work25. The 83 electrodes were glued with collodion, and electrode impedances were kept below 5 kΩ. EOG was recorded with four additional bipolar electrodes: two placed above and below the left orbit to record vertical EOG, and two placed lateral to the left and right orbits to record horizontal EOG. EMG was recorded with six electrodes, typically placed on the chin and over the left and right anterior tibialis muscles. ECG was recorded with a single modified D-II lead placed on the torso. The continuous recordings were sampled at 1000 Hz. Details of the signals recorded from healthy participants are listed in Table 1.

Table 1 Description of shared signals included in the ANPHY-Sleep dataset.

Sleep scoring

Sleep stages were manually scored in 30-s epochs in Polysmith software version 12.0 (Neurotronics, Gainesville, FL, USA) by a board-certified polysomnography technologist according to the American Academy of Sleep Medicine (AASM) guidelines26, using a mastoid reference montage, EOG, and chin EMG. Additionally, we developed a MATLAB script that extracts the sleep stage labels from the raw PSG sleep scoring report, which we used to validate the data and ensure accurate sleep stage annotations (see “Code Availability”).
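The MATLAB script itself is not reproduced here. As a hedged illustration of one step such a conversion typically involves, the Python sketch below collapses per-epoch stage labels into (label, start, duration) segments, which is the form used by the shared annotation files. The label strings ("W", "N1", "N2", "N3", "R") and the function name are assumptions.

```python
from typing import List, Tuple

EPOCH_S = 30.0  # AASM scoring epoch length in seconds


def epochs_to_segments(labels: List[str], epoch_s: float = EPOCH_S) -> List[Tuple[str, float, float]]:
    """Collapse runs of identical epoch labels into (label, start_s, duration_s)."""
    segments: List[Tuple[str, float, float]] = []
    for i, label in enumerate(labels):
        if segments and segments[-1][0] == label:
            # Same stage as the previous epoch: extend the current segment.
            prev_label, prev_start, prev_dur = segments[-1]
            segments[-1] = (prev_label, prev_start, prev_dur + epoch_s)
        else:
            # Stage change: open a new segment starting at this epoch.
            segments.append((label, i * epoch_s, epoch_s))
    return segments
```

Run-length encoding keeps the annotation compact while preserving the epoch grid, because every start and duration is a multiple of 30 s.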

Co-registration

The digitized electrode locations were averaged across all participants and co-registered by projecting each electrode perpendicularly onto the nearest position on the head surface. The quality of the co-registration was visually inspected and adjusted where necessary. This allows head models to be computed more accurately than with the template electrode positions used in most studies.
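The co-registration step can be sketched with NumPy, under two loudly stated simplifications. First, the head surface is assumed to be available as a point cloud of scalp vertices. Second, "nearest position" is approximated by snapping each electrode to its closest vertex; on a smooth surface this corresponds to a perpendicular projection. This is not the Brainstorm implementation, and the function names are hypothetical.

```python
import numpy as np


def average_positions(subject_positions: np.ndarray) -> np.ndarray:
    """Average digitized positions: (n_subjects, n_channels, 3) -> (n_channels, 3)."""
    return np.asarray(subject_positions, dtype=float).mean(axis=0)


def project_to_surface(electrodes: np.ndarray, scalp_points: np.ndarray) -> np.ndarray:
    """Snap each electrode to its nearest scalp vertex (Euclidean distance).

    electrodes: (n_channels, 3); scalp_points: (n_vertices, 3), assumed input.
    """
    # Pairwise distances between every electrode and every vertex: (n_channels, n_vertices)
    d = np.linalg.norm(electrodes[:, None, :] - scalp_points[None, :, :], axis=-1)
    return scalp_points[np.argmin(d, axis=1)]
```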

Data pre-processing and analysis

The scalp HD-EEG signals were pre-processed and analyzed using the Brainstorm27 toolbox in MATLAB (R2020b, MathWorks, MA, USA), as shown in Fig. 3a. First, the sleep database protocol and subjects were created in Brainstorm. Second, the physiological signals from each participant were inspected to ensure data quality, and movement artifacts were identified and removed by visual inspection. The data were then high-pass filtered at 0.3 Hz and low-pass filtered at 45 Hz, and the sensor time series were imported into the protocol. Finally, the sleep scoring files were linked to the raw EDF files to display the sleep stages.
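The filtering step can be approximated outside Brainstorm. The sketch below uses a zero-phase Butterworth band-pass filter (0.3 to 45 Hz) from SciPy as a stand-in. Brainstorm's own filter design differs, and the filter order chosen here is an assumption.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1000.0  # Hz, sampling rate of the recordings


def bandpass(data: np.ndarray, fs: float = FS, low: float = 0.3,
             high: float = 45.0, order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth band-pass along the last axis (channels x samples).

    Butterworth design and order=4 are assumptions, not the Brainstorm settings.
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering cancels the phase shift of the IIR filter.
    return sosfiltfilt(sos, data, axis=-1)
```

Second-order sections are used because they are numerically more stable than transfer-function coefficients for a band edge as low as 0.3 Hz at a 1000 Hz sampling rate.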

Fig. 3

Data processing and analysis workflow. (a) The data processing procedure employed within the Brainstorm database. (b) For spectral analysis, the 83 electrodes were grouped into 11 regions.

Demographic and sleep parameter analyses were performed in Python 3.9 and R 4.3.1, and sleep parameters were defined according to AASM guidelines following best practices for sleep research26. The sex distribution was tested with a chi-square goodness-of-fit test to assess whether the proportions of men and women differed significantly, and the age distribution was tested with a t-test to assess whether age differed significantly between men and women. Sleep parameters were tested for normality with the Shapiro-Wilk test and reported as mean ± standard deviation (SD) for normally distributed data, or as median [range] otherwise. P-values < 0.05 were considered statistically significant.
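The reporting rule above (mean ± SD if the Shapiro-Wilk test does not reject normality, otherwise median [range]) can be sketched as a single helper. The function name and the two-decimal formatting are assumptions. This is not the authors' Python/R code.

```python
import numpy as np
from scipy.stats import shapiro


def summarize(values, alpha: float = 0.05) -> str:
    """Report mean ± SD if Shapiro-Wilk p > alpha, else median [min–max]."""
    x = np.asarray(values, dtype=float)
    _, p = shapiro(x)
    if p > alpha:
        # Sample SD (ddof=1), as is conventional for descriptive statistics.
        return f"{x.mean():.2f} ± {x.std(ddof=1):.2f}"
    return f"{np.median(x):.2f} [{x.min():.2f}–{x.max():.2f}]"
```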

The percentage of good channels and epochs depends on the criteria used by the artifact identification algorithm. Here we used a validated, open-source artifact detector28, “High-Density-SleepCleaner”, to perform the inspection. Channels were classified as “poor” if fewer than 97% of sleep epochs were artifact-free, and epochs were classified as “bad” if even a single channel contained artifacts during that epoch. Given this strict threshold, users may opt for a different artifact detection method depending on their specific needs.
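The two classification rules can be expressed on a binary channels × epochs matrix. The sketch below applies them literally. It assumes the input contains only sleep epochs, with 1 marking artifact-free and 0 marking artifact. It does not reproduce the High-Density-SleepCleaner detector itself, and the function name is hypothetical.

```python
import numpy as np


def classify_channels_epochs(clean: np.ndarray, channel_threshold: float = 0.97):
    """Apply the good-channel / good-epoch rules to a binary matrix.

    clean: channels x epochs (sleep epochs only, assumed), 1 = artifact-free.
    Returns boolean arrays (good_channels, good_epochs).
    """
    clean = np.asarray(clean, dtype=bool)
    # A channel is good if at least 97% of sleep epochs are artifact-free.
    good_channels = clean.mean(axis=1) >= channel_threshold
    # An epoch is bad if even a single channel has an artifact in it.
    good_epochs = clean.all(axis=0)
    return good_channels, good_epochs
```

Because a single contaminated channel marks the whole epoch as bad, the epoch rule is much stricter than the channel rule. This is consistent with the lower proportion of good epochs than good channels reported in the technical validation.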

For the spectral analysis, EEG data were segmented into 30-s epochs within each sleep stage, and spectra were computed with the Brainstorm toolbox. First, the power spectral density (PSD) of each channel in each individual was estimated with Welch’s method, i.e. by averaging the squared magnitude of the discrete-time Fourier transform of 83 overlapping blocks of 2 s duration with a 1 s step, each weighted by a Hamming window9. The PSD of each channel was then normalized to a total power of one, making it independent of EEG signal amplitude. Next, electrodes were grouped into 11 regions covering the frontal, central, temporal, parietal, and occipital areas of the left hemisphere, right hemisphere, and midline (Fig. 3b)29, and the PSD was averaged across all channels within each region. Because of signal degradation caused by the electrical properties of skin, tissue, and bone, as well as by artifacts30, the analysis was limited to frequencies below 45 Hz.
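A minimal SciPy sketch of the per-epoch spectral estimate follows. It uses Welch's method with 2-s Hamming-windowed blocks and a 1-s step, keeps frequencies up to 45 Hz, normalizes each channel to unit total power, and then averages within regions. Several details are assumptions rather than the Brainstorm settings: the normalization is applied as a sum over the retained frequency bins, and the region-to-channel mapping is a hypothetical input.

```python
from typing import Dict, List

import numpy as np
from scipy.signal import welch

FS = 1000.0  # Hz


def epoch_psd(epoch: np.ndarray, fs: float = FS, fmax: float = 45.0):
    """Welch PSD of one 30-s epoch (channels x samples), normalized per channel."""
    nperseg = int(2 * fs)             # 2-s blocks
    noverlap = nperseg - int(1 * fs)  # 1-s step between block starts
    freqs, psd = welch(epoch, fs=fs, window="hamming",
                       nperseg=nperseg, noverlap=noverlap, axis=-1)
    keep = freqs <= fmax
    freqs, psd = freqs[keep], psd[..., keep]
    # Normalize so each channel's spectrum sums to 1 (assumed definition of "total power").
    psd = psd / psd.sum(axis=-1, keepdims=True)
    return freqs, psd


def region_average(psd: np.ndarray, regions: Dict[str, List[int]]) -> Dict[str, np.ndarray]:
    """Average normalized PSDs over the channel indices of each (hypothetical) region."""
    return {name: psd[idx].mean(axis=0) for name, idx in regions.items()}
```

Averaging normalized spectra within a region preserves unit total power, so regional spectra remain directly comparable across subjects with different signal amplitudes.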

Data Records

Data are publicly available from the Open Science Framework (OSF)31 archive at https://doi.org/10.17605/OSF.IO/R26FH (see32) as a single compressed file (ANPHY-Sleep.zip); the specifications of the ANPHY-Sleep dataset are presented in Table 2. The database structure is shown in Fig. 4a. There are 29 subfolders, one per study participant, each containing the overnight sleep recording, the sleep annotations, and the artifact detector matrix. Separate files contain the average co-registered electrode positions and detailed demographic information for the healthy participants. A general description of the database is provided in the following paragraphs.

Table 2 Specifications of ANPHY-Sleep dataset.

Combined PSG and HD-EEG signals are stored in each subject folder in the European Data Format (EDF+D)33,34, as depicted in Fig. 4a. The EDF files contain the channels listed in Table 1, with the corresponding sampling frequency and digital and physical minimum and maximum values. An example recording is shown in Fig. 4b. The sleep scoring annotation is saved as a text (.txt) file that includes the sleep stage label, the start time, and the duration of each sleep stage (Fig. 4c). In addition, each folder includes the artifact detection matrix (channels × epochs) of 0s and 1s, where 0 denotes that a channel contained artifacts or that the epoch did not belong to the sleep stage of interest, and 1 indicates that the channel is artifact-free and the epoch belongs to the sleep stage of interest (Fig. 4d).
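A hedged sketch for reading an annotation file into a per-epoch hypnogram follows. The exact file layout is not specified here, so the parser assumes each data line holds three whitespace-separated fields: a stage label, a start time in seconds from recording start, and a duration in seconds. Lines that do not match, such as a possible header row, are skipped. If the actual files store clock times or use a different delimiter, the parser must be adapted. The function names are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, float, float]  # (label, start_s, duration_s), assumed layout


def parse_annotations(text: str) -> List[Record]:
    """Parse 'label start duration' lines; skip blank and non-numeric lines."""
    records: List[Record] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # blank or malformed line
        label, start, duration = parts
        try:
            records.append((label, float(start), float(duration)))
        except ValueError:
            continue  # e.g. a header row (assumed possible)
    return records


def to_epoch_labels(records: List[Record], epoch_s: float = 30.0) -> List[str]:
    """Expand contiguous segments into one label per 30-s epoch."""
    labels: List[str] = []
    for label, _start, duration in sorted(records, key=lambda r: r[1]):
        labels.extend([label] * int(round(duration / epoch_s)))
    return labels
```

The per-epoch list can then be aligned column-by-column with the channels × epochs artifact matrix, assuming both share the same 30-s epoch grid.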

Fig. 4

Dataset structure. (a) The whole dataset is organized in separate folders. (b) Example trace of all available data types, including HD-EEG, EOG, EMG, and ECG signals. (c) Example of the sleep annotation file. (d) Example of the artifact detector matrix.

The average co-registered electrode locations from all subjects are stored in a text file named “Co-registered average positions.pos”. The sex, age, and sleep properties of each subject are stored in the .xlsx file named “Detailed information of subjects”. Anonymized participant IDs are consistent across data formats and with the participant information table and data quality assessment figures available in the repository.

Technical Validation

A data quality assessment routine was implemented as an initial technical validation following the data processing workflow in Fig. 3a. We validated the dataset at the group level from three perspectives: (1) demographic analysis, (2) sleep macrostructure analysis, and (3) spectral analysis of different regions across subjects. Individual demographic and sleep macrostructure information is provided in Supplementary Table 2.

Demographic features of participants in the dataset

A major limitation in the current use of PSG data in related studies and system development is the imbalanced composition of the subjects included in the datasets. Variability among participants in factors such as sex and age may noticeably alter features of the recorded brain activity, such as the distribution of sleep stages35. In the present dataset, the sex distribution of study participants is relatively balanced (chi-square test, p = 0.58), with males representing 55.17% and females 44.83% of the dataset (Fig. 5a). Participant ages range from 23 to 44 years for females and from 20 to 43 years for males. The mean ages are 34.50 ± 5.83 years for females and 30.63 ± 6.41 years for males (t-test, p = 0.30), with no significant age difference between the sexes (Fig. 5b).
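The reported sex-balance p-value can be checked with SciPy under the assumption that a chi-square goodness-of-fit test against an equal split was used. The counts of 16 males and 13 females are derived from the reported percentages (55.17% and 44.83% of 29 participants).

```python
from scipy.stats import chisquare

# Counts derived from the reported percentages of 29 participants.
n_male, n_female = 16, 13

# Goodness-of-fit against equal expected frequencies (14.5 each, the default).
res = chisquare([n_male, n_female])
print(f"chi2 = {res.statistic:.3f}, p = {res.pvalue:.2f}")  # chi2 = 0.310, p = 0.58
```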

Fig. 5

Variability of the dataset with regard to subject age and sex. (a) Sex distribution. (b) Age density distribution, plotted separately for men and women across all participants.

The statistical analysis of sleep features for the dataset

Here we report summary statistics of the sleep macrostructure parameters defined by the AASM guidelines, including total recording time (TRT), total sleep time (TST), sleep efficiency (SE), sleep onset latency (SOL), REM latency (REML), wake after sleep onset (WASO), and the distribution of sleep stages (N1, N2, N3, REM) (Table 3). TRT, TST, SE, N1, N2, N3, WASO, and REM time are normally distributed (Shapiro-Wilk test, p > 0.05), whereas SOL and REML are not (p < 0.05).
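The macrostructure parameters can be derived from a per-epoch hypnogram. The sketch below uses common definitions, stated here as assumptions rather than the exact conventions behind Table 3. The recording is assumed to start at lights out. SOL is measured to the first epoch of any sleep stage, and REML is measured from sleep onset to the first REM epoch. WASO counts all wake epochs after sleep onset until the end of the recording. Stage labels ("W", "N1", "N2", "N3", "R") are assumed.

```python
from typing import Dict, List

SLEEP_STAGES = {"N1", "N2", "N3", "R"}  # assumed label set; "W" = wake


def sleep_parameters(labels: List[str], epoch_s: float = 30.0) -> Dict[str, float]:
    """Compute TRT, TST, SE, SOL, REML, WASO and stage durations (minutes; SE in %)."""
    m = epoch_s / 60.0  # minutes per epoch
    sleep_idx = [i for i, s in enumerate(labels) if s in SLEEP_STAGES]
    if not sleep_idx:
        raise ValueError("no sleep epochs in hypnogram")
    onset = sleep_idx[0]
    trt = len(labels) * m
    tst = len(sleep_idx) * m
    rem_idx = [i for i in sleep_idx if labels[i] == "R"]
    params = {
        "TRT": trt,
        "TST": tst,
        "SE": 100.0 * tst / trt,
        "SOL": onset * m,
        "REML": (rem_idx[0] - onset) * m if rem_idx else float("nan"),
        "WASO": sum(1 for s in labels[onset:] if s == "W") * m,
    }
    for stage in ("N1", "N2", "N3", "R"):
        params[stage] = labels.count(stage) * m
    return params
```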

Table 3 Sleep parameters for the healthy subjects, significant values are in bold.

The sleep efficiency distribution across subjects is presented as a heatmap in Fig. 6, with an average sleep efficiency of 81.52% ± 10.11%; dark blue indicates higher and dark red lower sleep efficiency. The durations of the sleep macrostructure parameters are illustrated in Fig. 7a, and Fig. 7b shows a typical hypnogram of a healthy young adult (EPCTL16). We expect the normal sleep macrostructure parameters of the healthy adults in this dataset to serve as a useful control reference for clinicians and for future research in which healthy controls are difficult to obtain.

Fig. 6

Sleep efficiency distribution across subjects. Dark blue represents high sleep efficiency, and the size of each box represents TST.

Fig. 7

Sleep properties of study participants. (a) Boxplot of sleep parameters including total recording time (TRT), total sleep time (TST), sleep onset latency (SOL), REM latency (REML), N1, N2, N3, and REM duration, and wake after sleep onset (WASO). (b) Typical hypnogram of sleep stages in a 32-year-old healthy young man.

The proportion of good channels and good epochs

The “High-Density-SleepCleaner” algorithm was applied to the 29 overnight HD-EEG sleep recordings (83 channels), and the percentages of “good” channels and “good” epochs were assessed. Overall, the percentage of good channels is 98.10% ± 1.70% and the percentage of good epochs is 81.50% ± 9.80% (see Supplementary Information Figs. S1, S2).

Spectrum analysis in different regions from sleep stages

A group-level spectral analysis of sleep stages in different brain regions was performed at high spatial resolution. We analyzed the mean spectral density of channels in 11 anatomical brain regions across 29 healthy adults during NREM sleep, REM sleep, and wakefulness, as illustrated in Fig. 8.

Fig. 8

Average power spectral density in 11 regions across NREM, REM, and wake stages. In each plot, NREM is shown in blue, REM in red, and wake in green. The 11 regions cover the frontal, central, temporal, parietal, and occipital areas of the left and right hemispheres and the midline. Lines represent the mean, and shaded areas represent the standard error.

The power spectra in specific brain regions showed a distinct pattern of change across NREM sleep, REM sleep, and wakefulness. Overall, NREM sleep showed larger power peaks at low frequencies in all regions. In each region, low-frequency peaks increased from wakefulness to NREM sleep, with a return to near-wake conditions in REM sleep. In addition, variations in normalized spectral density across five stages (N1, N2, N3, REM, wake) were plotted to examine expected differences in each frequency band of interest (see Supplementary Information Figs. S3–S8).
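Stage differences per frequency band can be quantified by summing a normalized spectrum within each band. The band edges below are conventional choices assumed for illustration. They are not the bands used in the Supplementary figures, and the function name is hypothetical.

```python
from typing import Dict, Tuple

import numpy as np

# Conventional band edges in Hz (assumed); lower edge inclusive, upper edge exclusive.
BANDS: Dict[str, Tuple[float, float]] = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "sigma": (12.0, 16.0),
    "beta": (16.0, 30.0),
    "gamma": (30.0, 45.0),
}


def band_power(freqs, psd, bands: Dict[str, Tuple[float, float]] = BANDS) -> Dict[str, np.ndarray]:
    """Relative power per band from a normalized PSD (frequency on the last axis)."""
    freqs = np.asarray(freqs)
    psd = np.asarray(psd)
    return {name: psd[..., (freqs >= lo) & (freqs < hi)].sum(axis=-1)
            for name, (lo, hi) in bands.items()}
```

Because the input spectra are normalized to unit total power, each band value is a fraction of total power and can be compared directly between stages and regions.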

Similarly, Kalamangalam et al. proposed a spectrum-based model of human sleep-wake intracranial EEG (iEEG) based on a publicly available normative database provided by our group9,36. They investigated spectral variation in the iEEG and found that it is continuous in space (across cortical regions) and time (stage of the circadian cycle)36. Further validation of sleep-wake patterns with an iEEG atlas could support a ‘continuum’ hypothesis of the generative processes of sleep and wakefulness in the human brain37,38.

Usage Notes

The overnight combined HD-EEG and PSG recordings of 29 healthy adults are provided. Here, we introduce a few examples of research topics using this dataset.

Healthy human sleep properties research at high spatial resolution levels

Sleep is crucial for reorganizing the brain for memory consolidation and restoring synaptic homeostasis3,39,40. These processes have been attributed to global sleep oscillations occurring in large-scale brain networks. HD-EEG recordings provide excellent spatiotemporal resolution of cortical activity during sleep. The present dataset can be used to study the spatial signature of many sleep features, including but not limited to slow waves41, sleep spindles37,42, traveling waves43, and interactions among different oscillatory activities44.

Automatic sleep scoring using HD-EEG signals (sleep stage classification)

Sleep scoring divides sleep into REM and NREM stages and further divides NREM into light sleep (stages N1 and N2) and deep sleep (stage N3)26. In typical clinical settings, this is a time-consuming and tedious process performed by a board-certified technician. Deep learning has shown promise for sleep stage classification23 and sleep feature detection45, and algorithms that combine multimodal electrophysiological time series such as HD-EEG and ECG warrant further investigation46.

A comparison study with sleep-related disorders

Benchmarking patients with sleep disorders against healthy controls to expose pathology is common in neuroimaging and is gaining attention in neurophysiology11. To account for normal spatial variation, this database can be used to construct a normative map describing the healthy spatial profile and range of the features of interest. Patients with sleep-related disorders are often normalized against healthy controls to highlight abnormal brain morphology or connectivity. For example, sleep and epilepsy are linked in a complex, bidirectional way5. Research has revealed that REM sleep inhibits epileptic activity47,48,49, while focal epilepsy disrupts sleep spindle structure and function29. By comparing sleep features in patients with epilepsy to those of healthy controls, we may be able to better disentangle physiological and pathological changes in sleep spindle structure and function.
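One simple way to use such a normative map is to express each patient feature as a z-score relative to the healthy distribution. The sketch below assumes that features such as regional band power have already been computed per subject. The |z| > 1.96 flagging threshold and the function name are illustrative assumptions, not a validated clinical criterion.

```python
import numpy as np


def normative_z(patient: np.ndarray, controls: np.ndarray, threshold: float = 1.96):
    """z-score patient features against healthy controls.

    patient: (n_features,); controls: (n_subjects, n_features).
    Returns (z, flag) where flag marks |z| above the (assumed) threshold.
    """
    mu = controls.mean(axis=0)
    sd = controls.std(axis=0, ddof=1)  # sample SD across healthy subjects
    z = (np.asarray(patient, dtype=float) - mu) / sd
    return z, np.abs(z) > threshold
```

With 29 controls, the sample mean and SD are only rough estimates of the healthy range, so percentile-based or leave-one-out approaches may be preferable for features that are not normally distributed.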