A multimodal dataset for training deep learning models aimed at detecting and analyzing sleep apnea

Tao, Jing; Huang, Jingjing; Miao, Beiping; Yang, Long

doi:10.1038/s41597-025-05583-8

Data Descriptor
Open access
Published: 18 July 2025

Jing Tao¹, Jingjing Huang²,³, Beiping Miao¹ & Long Yang²,⁴

Scientific Data volume 12, Article number: 1263 (2025)

Abstract

Sleep Apnea Syndrome (SAS) is a serious respiratory disorder that can lead to a range of complications, including hypertension, arrhythmias, cognitive impairment, and metabolic disturbances. Due to the insidious nature of its symptoms, patients often fail to recognize the condition, and clinical screening is both time-consuming and resource-intensive. To address these challenges, we have developed a comprehensive dataset that integrates data from Polysomnography (PSG) devices with synchronized audio recordings. This dataset has been rigorously annotated by expert medical professionals based on PSG monitoring data, ensuring its accuracy and reliability. Our objective is to provide a publicly available, standardized, and high-quality data resource for the development and application of deep learning models in the field of sleep apnea syndrome. This dataset is designed to enhance diagnostic accuracy and efficiency while promoting advanced scientific research and technological innovation in this domain.

Background & Summary

Sleep Apnea Syndrome (SAS) is a serious respiratory disorder with a complex pathophysiology. It is primarily caused by partial or complete obstruction of the upper airway, and it can also result from the brain failing to send appropriate respiratory signals, leading to hypoventilation or apnea¹,². In some extreme cases, patients may exhibit both factors, making the condition more complex³. The pathophysiology of SAS involves multiple aspects, including anatomical abnormalities of the upper airway, neurological dysregulation, and muscular dysfunction⁴. According to large-scale community cohort studies, the prevalence of Obstructive Sleep Apnea (OSA) is 26%, with major risk factors including age, gender, body mass index (BMI), hypertension, and smoking history²,³. During nocturnal episodes, SAS patients typically experience reduced airflow and significant decreases in blood oxygen levels⁵. These episodes of hypoxemia and apnea occur frequently, severely impacting the quality of nighttime sleep and leading to frequent awakenings⁶,⁷.

Chronic sleep fragmentation and hypoxemia not only affect daytime functioning but can also lead to various physiological and psychological issues, such as cognitive dysfunction, mood swings, and a decline in quality of life⁸,⁹,¹⁰,¹¹. SAS is not only a disruptor of sleep quality but is also a significant risk factor for multiple cardiovascular diseases¹²,¹³. Studies have shown that SAS is closely associated with conditions like hypertension, arrhythmias, coronary artery disease, and stroke⁹,¹⁴. Repeated episodes of hypoxemia and sleep fragmentation trigger systemic inflammatory responses and oxidative stress, which damage cardiovascular function and increase the risk of cardiovascular events⁹,¹⁵. Furthermore, SAS is linked to metabolic disorders, such as insulin resistance and type 2 diabetes, further exacerbating the health burden on patients¹⁶.

Currently, the accurate diagnosis of Obstructive Sleep Apnea (OSA) primarily relies on polysomnography (PSG) as the gold standard. PSG provides clinicians with detailed data on sleep and respiratory function by comprehensively monitoring physiological parameters such as airflow, blood oxygen saturation, respiratory effort, and brain activity, enabling an effective assessment of the severity of OSA¹⁷,¹⁸. However, although PSG is a reliable diagnostic tool, its limitations significantly hinder its widespread application. Firstly, PSG must be conducted in a hospital or sleep laboratory, where patients are required to wear multi-sensor devices, which increases discomfort, especially during long-term monitoring. This complexity greatly reduces patient compliance. Secondly, unfamiliar sleep environments (e.g., hospital beds, lighting, and noise) can reduce sleep quality, thereby affecting the accuracy of the monitoring results. Additionally, wearing PSG equipment may induce or exacerbate snoring, disrupting natural sleep patterns and causing the collected data to deviate from real conditions. These issues highlight the inconvenience of PSG for daily home monitoring, making it difficult to meet the urgent demand for convenient, continuous sleep health management. Therefore, developing portable OSA monitoring algorithms based on deep learning has become a solution with significant engineering application potential, using portable devices (such as smartphones and smart bands) to collect data and achieve non-invasive, real-time detection of sleep apnea. This approach can reduce patient inconvenience in a home environment, ensure more natural sleep patterns, and enhance diagnostic accuracy and efficiency through deep learning algorithms, providing more options and support for continuous patient management and treatment¹⁹,²⁰.

However, most publicly available datasets for OSA research currently rely on high-quality audio and multimodal data collected via PSG. For example, Georgia Korompili et al. collected a public dataset containing 212 PSG records and their synchronized audio, recorded in controlled environments with low background noise, high sensor sensitivity, and precise annotations, making it suitable for training high-performance models²¹. Similarly, Andrea Bernardini et al. developed a PSG dataset for stroke ward patients, including data related to obstructive sleep apnea, but with a limited sample size²². The idealized collection conditions of these datasets make them difficult to apply directly to real-world engineering scenarios, while audio data collected by portable devices (such as smartphone microphones) is significantly lower in quality compared to PSG. Smartphone audio is constrained by factors such as lower sampling rates, noise, device placement, and signal attenuation, resulting in higher variability and noise interference. Models trained on PSG datasets often experience a significant drop in performance when handling these low-quality, complex-background smartphone audio recordings, failing to meet the practical application needs of portable devices in home environments. Therefore, constructing a dataset that closely mirrors real-world usage scenarios for smartphone-collected data has become a key foundation for developing robust, engineering-compatible OSA monitoring algorithms.

To address these needs, we constructed a new OSA dataset to support the development of deep learning-based portable snore monitoring algorithms, specifically targeting real-world smartphone audio applications. Data were collected from 50 patients over more than 400 hours of sleep, with audio captured using smartphone microphones and professional digital recorders placed naturally on bedside tables. Simultaneously, PSG devices were used to collect various physiological data during the night to assist professional annotators in labeling snore-related events. Finally, key physiological data such as blood oxygen levels, heart rate, sleep structure, and airflow were retained for research purposes. The dataset is stored in formats such as WAV, MP3, CSV, and JSON, and has been uploaded to the Science Data Bank for cloud access. This dataset is intended to help bridge the performance gap between PSG and smartphone data, enabling the training of high-performance models and supporting the engineering application of portable OSA monitoring devices.

Methods

Ethical issues management

This study strictly adheres to international and Chinese national ethical standards to ensure transparency, integrity, and respect for participants. The study design and implementation have been approved by the Ethics Committee of Shenzhen Second People’s Hospital (Approval No. 2023-113-01YJ), with all methods and data analyses undergoing rigorous review. Each participant involved in data collection is required to sign an informed consent form, agreeing to participate in the study and to the recording of audio signals and polysomnography (PSG) data during sleep. Additionally, the informed consent form specifies that all de-identified data may be used for scientific research purposes. For individuals who decline to participate in data collection, their physiological data generated during hospital visits will not be extracted or used for any research purposes. We are committed to protecting participants’ personal privacy and data security. All data will be stored and analyzed in an anonymized manner, with all personally identifiable information (such as names, ID numbers, and contact details) removed from the collected files. All researchers have signed conflict-of-interest declarations to ensure the objectivity and impartiality of the study. We will regularly report study progress and any ethical issues to the Ethics Committee. Should any ethical violations be identified, immediate corrective actions will be taken, and the matter will be reported to the Ethics Committee.

Data collection and storage

Data collection was conducted at Shenzhen Second People’s Hospital, involving 50 patients diagnosed with sleep apnea. To ensure data consistency, we used the Embla SDx Polysomnography (PSG) equipment to record the patients’ physiological data throughout the night. Simultaneously with the PSG data collection, we employed an OPPO Reno8 smartphone and a Newamy V03 digital voice recorder to capture the patients’ respiratory sounds during sleep. To ensure that the collected data could be aligned on the same timeline, we synchronized the Embla SDx PSG equipment, smartphone, and voice recorder to Beijing time, calibrating the timelines of these three data sources. Finally, we anonymized the collected data for processing. Table 1 lists the data channels recorded by the PSG equipment along with their respective sampling rates, and Fig. 1 displays the complete data for one of the patients.

Table 1 Data Sample Rate.

Data annotation

We selected a professional team from the hospital, trained in using the Embla RemLogic software, to annotate all collected PSG data; all annotators are clinical doctors. The annotation standards follow the American Academy of Sleep Medicine (AASM) Manual for the Scoring of Sleep and Associated Events¹⁸. Based on the physiological data collected by PSG, four types of snoring states were primarily annotated: obstructive apnea, central apnea, mixed apnea, and hypopnea. Since the smartphone, voice recorder, and PSG equipment were all calibrated to Beijing time, the snoring sound data can directly utilize the PSG annotation files.
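As a rough illustration of how the shared annotations might be turned into training targets, the sketch below converts a list of events into one label per second of audio. The label ids, the dictionary keys (`type`, `start`, `duration`), and the event-type strings are assumptions made for this example; the actual annotation files may use different names.

```python
import math
from typing import Dict, List

# Illustrative label ids; this mapping is an assumption, not part of the dataset.
LABELS: Dict[str, int] = {
    "normal": 0,
    "obstructive apnea": 1,
    "central apnea": 2,
    "mixed apnea": 3,
    "hypopnea": 4,
}


def events_to_second_labels(events: List[dict], total_seconds: int) -> List[int]:
    """Return one label per second; seconds not covered by an event are 'normal'.

    Each event is a dict with hypothetical keys:
      type     - one of the four annotated event types
      start    - seconds from the start of the audio
      duration - event length in seconds
    """
    labels = [LABELS["normal"]] * total_seconds
    for ev in events:
        first = max(0, int(ev["start"]))
        # Round the end up so a partially covered second is still labeled.
        last = min(total_seconds, math.ceil(ev["start"] + ev["duration"]))
        for second in range(first, last):
            labels[second] = LABELS[ev["type"]]
    return labels


demo_events = [
    {"type": "hypopnea", "start": 2.0, "duration": 3.0},
    {"type": "central apnea", "start": 6.5, "duration": 10.0},
]
demo_labels = events_to_second_labels(demo_events, total_seconds=8)
```

One label per second is only one possible granularity; a model that works on longer audio windows could aggregate these labels per window instead.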

Data synchronization

Data collected from multiple devices often face synchronization challenges. Even in PSG-audio datasets with a maximum tolerable error of 2 seconds, misalignment can produce inconsistencies across thousands of data points. To address this, we synchronized all devices to Beijing time, using the millisecond-precise timestamps from the PSG data and the audio file creation times from the smartphones and digital voice recorders to align all data to a unified timeline.
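The alignment described above amounts to a fixed offset between two clocks that share the same time base. The sketch below shows that arithmetic with made-up timestamps. It assumes the audio file creation time marks the start of the recording, which may not hold for every device; the function names are hypothetical.

```python
from datetime import datetime


def audio_offset_seconds(psg_start: datetime, audio_start: datetime) -> float:
    """Seconds from the start of the audio file to the start of the PSG record.

    Positive when the audio recording started before the PSG recording.
    """
    return (psg_start - audio_start).total_seconds()


def psg_to_audio_time(psg_elapsed: float, offset: float) -> float:
    """Map a time measured from the PSG start onto the audio timeline."""
    return psg_elapsed + offset


# Made-up example values, both expressed in the same time zone.
psg_start = datetime(2023, 9, 1, 22, 30, 5, 250000)  # millisecond-precise PSG timestamp
audio_start = datetime(2023, 9, 1, 22, 29, 50)       # assumed audio start time

offset = audio_offset_seconds(psg_start, audio_start)
event_in_audio = psg_to_audio_time(100.0, offset)
```

With these values the PSG record starts 15.25 s into the audio file, so an event 100 s after the PSG start falls at 115.25 s in the audio.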

Data Records

The dataset is organized into individual folders for each of the 50 patients and can be accessed at Science Data Bank (https://doi.org/10.57760/sciencedb.19070)²³. Each patient’s folder contains a smartphone audio recording, two digital voice recorder recordings, sleep structure data, heart rate data, blood oxygen saturation data, and annotation data. In later experiments, we added airflow measurements for 36 of these patients, so their folders also include this additional physiological data.

The original PSG data included EEG and ECG recordings. However, as the dataset focuses on detecting sleep apnea events via respiratory sounds, only the relevant physiological data (blood oxygen, airflow, and heart rate) were retained. This serves two purposes: providing a reference for annotation and enabling analysis of the relationship between physiological data (e.g., blood oxygen, airflow) and respiratory sounds. These data were exported from Embla RemLogic software into CSV files, with columns for patient data index, timestamp, and physiological data or sleep stage.

For the respiratory sound data recorded during the patients’ nighttime sleep, we directly exported the recordings from the smartphone and voice recorder, saving them in MP3 and WAV formats.

The annotation data is exported from the Embla RemLogic software in TXT format. All event information is then extracted from the annotations and saved as a JSON file, structured into three main components: record start, awake intervals, and events.

record start: Time, in seconds, marking the start of the PSG recording (24-hour format).

awake intervals: A list of time intervals during which the patient was awake, according to PSG sleep staging.

events: A list of dictionaries, each detailing a snoring event with its type (event type), start time (event start), duration (event duration), and sleep stage (sleep stage).

To align the annotation data with the snoring data, the start time (event start) of each event can be adjusted by subtracting record start.
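A minimal sketch of that adjustment follows. The underscore key names (`record_start`, `event_start`, and so on), the event-type and sleep-stage strings, and the sample values are assumptions for illustration; the keys in the published JSON files should be checked before use. The modulo step is also an assumption: it keeps the result non-negative if start times are expressed as seconds of the day and the recording crosses midnight.

```python
import json

SECONDS_PER_DAY = 24 * 60 * 60


def align_events(annotation: dict) -> list:
    """Return a copy of each event with its start relative to the record start.

    Key names are hypothetical; adapt them to the actual annotation files.
    """
    record_start = annotation["record_start"]
    aligned = []
    for ev in annotation["events"]:
        # Subtract the record start; the modulo handles a midnight rollover.
        offset = (ev["event_start"] - record_start) % SECONDS_PER_DAY
        aligned.append({**ev, "offset_s": offset})
    return aligned


# Made-up annotation with a recording that starts at 22:00:00 (79200 s).
example = json.loads("""
{
  "record_start": 79200,
  "awake_intervals": [[79200, 80100]],
  "events": [
    {"event_type": "hypopnea", "event_start": 82800,
     "event_duration": 18.5, "sleep_stage": "N2"},
    {"event_type": "obstructive apnea", "event_start": 3600,
     "event_duration": 22.0, "sleep_stage": "REM"}
  ]
}
""")

aligned_events = align_events(example)
offsets = [ev["offset_s"] for ev in aligned_events]
```

Here the first event starts 3600 s into the recording, and the second, logged at 01:00 the next day, starts 10800 s in.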

Technical Validation

This dataset synchronously collects audio through a smartphone microphone and a professional digital recorder, with all timestamps calibrated to Beijing time. The synchronization error with the PSG device is less than 1 second, ensuring the alignment of the multimodal data. Fig. 2 presents box plots of SpO₂ (blood oxygen saturation) and heart rate data for the 50 patients. The plots indicate that, overall, the patients’ SpO₂ levels and heart rates fall within normal ranges; however, for some patients the SpO₂ and heart rate values deviate significantly below the third quartile (Q3), with a similar distribution pattern in SpO₂ levels. Specifically, patients 25 and 45 exhibit SpO₂ values well below Q3, indicating a substantial number of low SpO₂ recordings. This suggests prolonged periods of hypoxemia during sleep. The average heart rate was 64.14 BPM (±8.24 BPM), with a coefficient of variation of 12.85%. Additionally, the average SpO₂ was 92.59% (±2.58%), with a coefficient of variation of 2.79%. Such observations reflect the reliability of the data in characterizing the severity of obstructive sleep apnea.
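The coefficients of variation quoted above follow from the reported means and standard deviations, since the coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. A short check, using only the figures given in the text (the function name is a placeholder):

```python
def coefficient_of_variation(mean: float, std: float) -> float:
    """Coefficient of variation as a percentage: 100 * std / mean."""
    return 100.0 * std / mean


hr_cv = coefficient_of_variation(mean=64.14, std=8.24)    # heart rate, BPM
spo2_cv = coefficient_of_variation(mean=92.59, std=2.58)  # SpO2, percent

print(f"heart rate CV: {hr_cv:.2f}%")
print(f"SpO2 CV: {spo2_cv:.2f}%")
```

Both values round to the reported 12.85% and 2.79%.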

The last box plot in the figure illustrates airflow data during sleep for the final 36 patients. The data reveal that the dynamic range of airflow primarily spans between 40 and 80 units, depending on the severity of the patient’s sleep apnea. Comparing this with patients 25 and 45 from the previous two plots (patient 29 in the airflow box plot), it is evident that this patient has a broad dynamic range for both SpO2 and airflow. Conversely, the adjacent patients to this individual exhibit smaller interquartile ranges in their airflow data, which correspondingly results in narrower ranges for SpO2 levels in their datasets. Similar analyses of other patients’ airflow data suggest the severity of sleep-disordered breathing. Therefore, we can conclude that the dataset provides reliable insights into the condition of the patients.
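For readers who want to reproduce this kind of per-patient comparison, the sketch below computes the quartiles and interquartile range behind a box plot and flags low readings. The 1.5 × IQR fence is the common box-plot convention and is an assumption here, since the text does not state how Fig. 2 was drawn; the sample values are made up.

```python
import statistics
from typing import Dict, List


def box_stats(values: List[float]) -> Dict[str, object]:
    """Quartiles, interquartile range, and low outliers for one patient's series."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Conventional lower whisker limit; readings below it are treated as outliers.
    low_fence = q1 - 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "low_outliers": [v for v in values if v < low_fence],
    }


# Made-up SpO2 readings (percent) containing one desaturation.
spo2_readings = [95, 96, 94, 95, 97, 96, 80]
stats = box_stats(spo2_readings)
```

A wider interquartile range or a larger number of low outliers for a patient would correspond to the broader dynamic range discussed above.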

Code availability

In this study, we utilized Embla’s Polysomnography (PSG) equipment and its accompanying software for data collection and event annotation. The dataset and data processing code are available at https://doi.org/10.57760/sciencedb.19070²³. This ensures transparency, facilitating peer review and future research.

References

1. Budhiraja, R., Siddiqi, T. A. & Quan, S. F. Sleep disorders in chronic obstructive pulmonary disease: etiology, impact, and management. Journal of Clinical Sleep Medicine 11(3), 259–270 (2015).
2. Peppard, P. E. et al. Increased prevalence of sleep-disordered breathing in adults. American Journal of Epidemiology 177(9), 1006–1014 (2013).
3. Young, T. et al. The occurrence of sleep-disordered breathing among middle-aged adults. New England Journal of Medicine 328(17), 1230–1235 (1993).
4. Netzer, N. C. et al. Prevalence of symptoms and risk of sleep apnea in primary care. Chest 124(4), 1406–1414 (2003).
5. Cowie, M. R., Linz, D., Redline, S., Somers, V. K. & Simonds, A. K. Sleep disordered breathing and cardiovascular disease: JACC state-of-the-art review. Journal of the American College of Cardiology 78(6), 608–624 (2021).
6. Guilleminault, C. et al. Determinants of daytime sleepiness in obstructive sleep apnea. Chest 94(1), 32–37 (1988).
7. Kapsimalis, F. & Kryger, M. Sleep breathing disorders in the US female population. Journal of Women’s Health 18(8), 1211–1219 (2009).
8. Dempsey, J. A., Veasey, S. C., Morgan, B. J. & O’Donnell, C. P. Pathophysiology of sleep apnea. Physiological Reviews 90(1), 47–112 (2010).
9. Punjabi, N. M. et al. Sleep-disordered breathing and mortality: a prospective cohort study. PLoS Medicine 6(8), e1000132 (2009).
10. Yaffe, K. et al. Sleep-disordered breathing, hypoxia, and risk of mild cognitive impairment and dementia in older women. JAMA 306(6), 613–619 (2011).
11. Ancoli-Israel, S. et al. Cognitive effects of treating obstructive sleep apnea in Alzheimer’s disease: a randomized controlled study. Journal of the American Geriatrics Society 56(11), 2076–2081 (2008).
12. Marin, J. M. et al. Association between treated and untreated obstructive sleep apnea and risk of hypertension. JAMA 307(20), 2169–2176 (2012).
13. Gonzaga, C., Bertolami, A., Bertolami, M., Amodeo, C. & Calhoun, D. Obstructive sleep apnea, hypertension and cardiovascular diseases. Journal of Human Hypertension 29(12), 705–712 (2015).
14. Shahar, E. et al. Sleep-disordered breathing and cardiovascular disease: cross-sectional results of the Sleep Heart Health Study. American Journal of Respiratory and Critical Care Medicine 163(1), 19–25 (2001).
15. Akset, M., Poppe, K. G., Kleynen, P., Bold, I. & Bruyneel, M. Endocrine disorders in obstructive sleep apnoea syndrome: A bidirectional relationship. Clinical Endocrinology 98(1), 3–13 (2023).
16. Tasali, E., Van Cauter, E. & Ehrmann, D. A. Relationships between sleep disordered breathing and glucose metabolism in polycystic ovary syndrome. The Journal of Clinical Endocrinology & Metabolism 91(1), 36–42 (2006).
17. Kushida, C. A. et al. Practice parameters for the indications for polysomnography and related procedures: an update for 2005. Sleep 28(4), 499–523 (2005).
18. Berry, R. B. et al. Rules for scoring respiratory events in sleep: update of the 2007 AASM Manual for the Scoring of Sleep and Associated Events: deliberations of the Sleep Apnea Definitions Task Force of the American Academy of Sleep Medicine. Journal of Clinical Sleep Medicine 8(5), 597–619 (2012).
19. Kapur, V. K. et al. Clinical practice guideline for diagnostic testing for adult obstructive sleep apnea: an American Academy of Sleep Medicine clinical practice guideline. Journal of Clinical Sleep Medicine 13(3), 479–504 (2017).
20. Adult Obstructive Sleep Apnea Task Force of the American Academy of Sleep Medicine. Clinical guideline for the evaluation, management and long-term care of obstructive sleep apnea in adults. Journal of Clinical Sleep Medicine 5(3), 263–276 (2009).
21. Korompili, G. et al. PSG-Audio, a scored polysomnography dataset with simultaneous audio recordings for sleep apnea studies. Scientific Data 8(1), 197 (2021).
22. Bernardini, A., Brunello, A., Gigli, G. L., Montanari, A. & Saccomanno, N. OSASUD: A dataset of stroke unit recordings for the detection of obstructive sleep apnea syndrome. Scientific Data 9(1), 177 (2022).
23. Tao, J. et al. A multimodal dataset for training deep learning models aimed at detecting and analyzing sleep apnea [DS/OL]. V5. Science Data Bank. https://doi.org/10.57760/sciencedb.19070 (2025).

Acknowledgements

We acknowledge the contribution of the clinical team from the Department of Otolaryngology at Shenzhen Second People’s Hospital in conducting nighttime sleep breathing tests and collecting related data for the patients. This research was supported by the Shenzhen Clinical Medical Research Center for Otolaryngology Diseases (Grant No. 20220819120540004), the Sanming Project of Medicine in Shenzhen (Grant No. SZSM202111016), the Guangdong Province Medical Science and Technology Research Foundation (Grant No. A2024601), and the Shenzhen Second People’s Hospital Clinical Research Fund of Shenzhen High-level Hospital Construction Project (project code: 2023yjlcyj013).

Author information

These authors contributed equally: Jing Tao, Jingjing Huang.

Authors and Affiliations

Department of Otorhinolaryngology, Shenzhen Second People’s Hospital, 3002 Sun Gang West Road, Shenzhen, 518035, Guangdong, China
Jing Tao & Beiping Miao
ENT Institute and Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
Jingjing Huang & Long Yang
Sleep Medicine Center, Eye & ENT Hospital, Fudan University, Shanghai, China
Jingjing Huang
State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
Long Yang

Authors

Jing Tao
Jingjing Huang
Beiping Miao
Long Yang

Contributions

Jing Tao: Conceptualization, Writing - Original Draft, Visualization, Funding acquisition. Beiping Miao: Data Curation, Funding acquisition, Resources, Project administration. Jingjing Huang: Data Curation, Writing - Review & Editing. Long Yang: Methodology, Software, Validation, Formal analysis, Investigation, Supervision. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Beiping Miao or Long Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Tao, J., Huang, J., Miao, B. et al. A multimodal dataset for training deep learning models aimed at detecting and analyzing sleep apnea. Sci Data 12, 1263 (2025). https://doi.org/10.1038/s41597-025-05583-8

Received: 13 January 2025
Accepted: 08 July 2025
Published: 18 July 2025
Version of record: 18 July 2025
DOI: https://doi.org/10.1038/s41597-025-05583-8