Abstract
In critical care medicine, accurate analysis of physiological waveform data is crucial for clinical decision-making. Although existing public databases provide a wealth of physiological waveform data, publicly available ventilator waveform data remain limited. We have established a publicly accessible clinical and ventilator waveform dataset that includes demographic information, blood gas analyses, biochemical indicators, and high-resolution respiratory waveform data from critically ill patients. This dataset aims to provide a valuable resource for respiratory mechanics research and to offer stronger scientific support for future studies.
Background & Summary
In critical care medicine, high-resolution physiological waveform data are indispensable for characterizing patients’ physiological status and for guiding clinical interventions precisely1,2,3,4,5,6,7,8,9,10. Existing databases such as MIMIC-III and MIMIC-IV catalog a variety of physiological waveforms, including monitor waveforms and electrocardiograms, and are valuable resources for medical research and clinical decision-making, but they lack ventilator waveform data11,12. Ventilator waveforms hold distinctive clinical value: they reflect the patient’s respiratory mechanics, such as airway pressure13 and tidal volume14, and they reveal the interactions between the patient and the ventilator, including synchrony15,16 and pressure support17. These data are important for optimizing mechanical ventilation parameters, enhancing patient comfort, and guiding weaning strategies18,19,20. The collection and analysis of ventilator waveform data nevertheless face several challenges. First, publicly available data are scarce, which limits their use in both clinical practice and research despite their potential. Second, because of their high temporal resolution and large volume, ventilator waveforms require efficient technical support for acquisition and processing. At the same time, the complexity of these waveforms is a source of their clinical and research value: with the support of artificial intelligence technologies, they can reveal underlying respiratory patterns and help predict clinical outcomes.
Publicly available ventilator waveform data are currently scarce, which limits further research into respiratory mechanics. Moreover, the collection, storage, and processing of such data must adhere to privacy and security standards for medical data. Making ventilator waveform data publicly available therefore has significant clinical implications: it would help fill the existing gap and facilitate both the advancement of respiratory mechanics research and the innovation of clinical applications.
Methods
This study constructed a comprehensive clinical-ventilator waveform dataset for critically ill patients (CCVW-ICU) through retrospective data collection. The dataset integrates structured clinical data, text data from patients’ electronic medical records (EMR), and high-resolution, continuously recorded ventilator waveform data. The process includes data acquisition, extraction, transformation, and de-identification to ensure data quality and consistency. Figure 1 provides an overview of the entire process. The data source of this study was approved by the ethics committee of Capital Medical University (ethics number: ky2023-005-02). Since the study was designed as retrospective, the Institutional Review Board granted a waiver of informed consent. This study was conducted in accordance with the Declaration of Helsinki.
The CCVW-ICU dataset is derived from the Hospital Information System (HIS), Electronic Medical Record System (EMR), Clinical Monitoring System (CMS), and Laboratory Information System (LIS). It includes ventilator waveform data, demographic information, and clinical inpatient data. After conversion and formatting, these data were stored in data archives and underwent de-identification processing.
Sample collection
This study consecutively included adult patients from the Intensive Care Unit (ICU) and Surgical Intensive Care Unit (SICU) of Beijing Shijitan Hospital, Capital Medical University, between June and December 2024. Potentially eligible participants were screened during daily morning rounds. In our center, the initial strategy for mechanical ventilation typically adopts control modes (volume control or pressure control ventilation), while pressure support ventilation (PSV) is the preferred mode for assisted ventilation. During the morning rounds, the attending physician assesses the patient’s ventilation mode. For patients triggering breaths in control ventilation modes, the attending physician will evaluate their tolerance to switching to PSV.
To ensure the homogeneity and reliability of the research data, the following inclusion and exclusion criteria were established:
Inclusion Criteria:
1. Age ≥ 18 years.
2. Endotracheal intubation and receiving PSV treatment.
3. Expected mechanical ventilation duration ≥ 48 hours as assessed by the attending physician at the time of screening.
Exclusion Criteria:
1. PSV duration > 48 hours at the time of screening.
2. Known history of neuromuscular disease.
3. Known diaphragm dysfunction or history of diaphragm surgery.
4. Contraindications for esophageal balloon catheter placement, including severe coagulation disorders, diagnosed or suspected esophageal varices, history of esophageal/gastric/lung surgery.
5. Evidence of active pulmonary leaks (e.g., bronchopleural fistula, pneumothorax, mediastinal emphysema) or presence of a chest tube.
6. Pregnant or breastfeeding women.
7. Currently participating in other clinical trials that may interfere with the results of this study.
8. The patient or their legal representative refuses to sign the informed consent.
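For readers who want to apply the same criteria to their own screening logs, the rules above can be written as a simple check. This is an illustrative sketch only, not part of the study workflow (screening was performed by attending physicians during morning rounds); the `Candidate` class and all of its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One screened patient. All field names are hypothetical."""
    age_years: int
    intubated_on_psv: bool
    expected_mv_hours: float        # attending physician's estimate at screening
    psv_hours_at_screening: float
    neuromuscular_disease: bool = False
    diaphragm_dysfunction_or_surgery: bool = False
    esophageal_catheter_contraindicated: bool = False
    active_air_leak_or_chest_tube: bool = False
    pregnant_or_breastfeeding: bool = False
    in_interfering_trial: bool = False
    consent_refused: bool = False


def is_eligible(c: Candidate) -> bool:
    """Return True if all inclusion criteria hold and no exclusion criterion applies."""
    included = (
        c.age_years >= 18
        and c.intubated_on_psv
        and c.expected_mv_hours >= 48
    )
    excluded = (
        c.psv_hours_at_screening > 48
        or c.neuromuscular_disease
        or c.diaphragm_dysfunction_or_surgery
        or c.esophageal_catheter_contraindicated
        or c.active_air_leak_or_chest_tube
        or c.pregnant_or_breastfeeding
        or c.in_interfering_trial
        or c.consent_refused
    )
    return included and not excluded
```

For example, `is_eligible(Candidate(65, True, 72, 12))` evaluates to `True`, whereas the same patient with 60 hours of PSV at screening would be excluded.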
Clinical data collection
In this dataset, all clinical information was meticulously curated manually and subjected to rigorous quality control to ensure data accuracy and completeness. More than 100 variables were extracted and systematically recorded, encompassing demographic characteristics (e.g., age, sex), vital signs (e.g., blood pressure, respiratory rate, body temperature), arterial blood gas parameters (e.g., partial pressure of oxygen and carbon dioxide), and key laboratory biomarkers (e.g., complete blood count, electrolytes). In addition, unstructured clinical narratives—including admission notes and discharge diagnoses—were manually transcribed and standardized. All information was reviewed item by item by Xiaozhu Liu and Xu An to ensure accuracy, consistency, and suitability for research purposes.
Acquisition and extraction of ventilator waveform data
All enrolled patients were monitored for esophageal pressure (Pes) using commercially available esophageal balloon catheters (Cooper Catheter, LOT 177405, Cooper Surgical, USA). Prior to catheter insertion, an esophageal balloon leak test was performed using the ventilator’s esophageal function to confirm the integrity of the balloon. To ensure that the balloon accurately reflects pleural pressure changes, the balloon was positioned in the lower third of the esophagus within the thoracic cavity. The balloon inflation volume was automatically determined and maintained by the ventilator according to a preset algorithm. To optimize the quality of the Pes signal and confirm the catheter’s position, all patients underwent the standard Baydur occlusion test. This test involved brief, intermittent abdominal pressure to observe synchronized changes in airway pressure (Paw) and Pes during inspiratory effort (ΔPaw/ΔPes ≈ 1), thereby verifying the appropriateness of the catheter’s position.
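The ratio check used in the occlusion test can be sketched in a few lines. The example below assumes that the Paw and Pes samples from a single occluded effort have already been isolated, and it uses an acceptance window of 0.8 to 1.2 that is commonly cited in the esophageal manometry literature; that window is an assumption here and is not stated in this article, which only requires a ratio close to 1. The function names are hypothetical.

```python
def occlusion_ratio(paw, pes):
    """Return the swing ratio (delta Paw / delta Pes) for one occluded effort.

    `paw` and `pes` are equal-length sequences of pressures in cmH2O
    sampled during the occlusion.
    """
    d_paw = max(paw) - min(paw)
    d_pes = max(pes) - min(pes)
    if d_pes == 0:
        raise ValueError("no esophageal pressure swing detected")
    return d_paw / d_pes


def position_acceptable(ratio, low=0.8, high=1.2):
    """Check the ratio against an assumed acceptance window (not from the article)."""
    return low <= ratio <= high
```

A ratio well outside the window would suggest repositioning the catheter or re-checking the balloon before recording.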
In this study, mechanical ventilation for patients was provided using the Dräger V500 ventilator (Drägerwerk AG & Co. KGaA, Lübeck, Germany). The acquisition of respiratory mechanics data was carried out by an independent, high-fidelity signal acquisition system. The key components of this system include:
1. Flow Sensor: A heated Fleisch-type flowmeter (Vitalograph Inc., Lenexa, Kansas, USA) placed at the patient end of the ventilator circuit Y-connector, used to measure respiratory flow (Flow).
2. Pressure Sensors: One sensor measures airway opening pressure (Paw), while another, connected to the esophageal balloon catheter, measures esophageal pressure (Pes).
3. Signal Acquisition Hardware: The signals from the aforementioned sensors are digitized through a high-precision data acquisition card.
All raw physiological signals (Flow, Paw, Pes) were continuously sampled at 200 Hz. For each patient, waveform data were recorded once daily, with each session lasting 1 hour, resulting in a total of 24 sessions over the data collection period. The signals were displayed in real time and synchronized for storage on a dedicated laptop. Data acquisition, real-time visualization, and raw data storage (in .dat format) were carried out using ICU-Lab software (version 2.6, KleisTEK Engineering, Bari, Italy).
Subsequent analysis was conducted offline. Using ICU-Lab 2.6 software, complete waveform data from the target time segment were extracted based on stable respiratory cycles, typically yielding segments of several minutes in duration for detailed analysis (Fig. 2). The extracted raw waveform data were then exported into structured formats (e.g., .csv, .xlsx) for further analysis using specialized statistical or analytical software. These curated, analysis-ready segments constitute the waveform data included in the final published dataset.
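To give a sense of the raw data volume implied by these acquisition settings, the short calculation below derives the number of samples in one recording session from the stated sampling rate and session length. The constant names are illustrative.

```python
SAMPLING_RATE_HZ = 200            # stated sampling rate
SESSION_SECONDS = 60 * 60         # one 1-hour recording session
CHANNELS = ("Flow", "Paw", "Pes")

# Time between consecutive samples, in seconds.
sample_interval_s = 1 / SAMPLING_RATE_HZ

# Samples recorded per channel and across all three channels in one session.
samples_per_channel = SAMPLING_RATE_HZ * SESSION_SECONDS
samples_per_session = samples_per_channel * len(CHANNELS)

print(sample_interval_s, samples_per_channel, samples_per_session)
```

Each one-hour session therefore holds 720,000 samples per channel, which is why the acquisition software and storage format matter at this resolution.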
De-identification
In processing and preparing the dataset for this study, we ensured compliance with privacy protection standards by de-identifying potential Protected Health Information (PHI) in the file headers. Specifically, we removed all direct personal identifiers, such as patient names, identification numbers, phone numbers, and detailed addresses. The modified files were then saved for future research use. This method is simple and efficient, ensuring that our data processing complies with the Health Insurance Portability and Accountability Act (HIPAA) requirements.
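The header-scrubbing step can be pictured as a filter over header fields. The sketch below is a hypothetical illustration, not the procedure actually used: it assumes the file header has already been parsed into a dictionary, and the field names in `PHI_FIELDS` are invented for the example.

```python
# Hypothetical header keys that carry direct personal identifiers.
PHI_FIELDS = {"patient_name", "id_number", "phone_number", "address"}


def deidentify_header(header: dict) -> dict:
    """Return a copy of `header` without direct identifiers.

    Keys are compared case-insensitively; the input is not modified.
    """
    return {
        key: value
        for key, value in header.items()
        if key.lower() not in PHI_FIELDS
    }
```

Returning a new dictionary rather than editing in place keeps the original header available for audit until the scrubbed file has been verified.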
Data Records
The complete CCVW-ICU dataset has been deposited in the Science Data Bank and is publicly accessible under the Digital Object Identifier (DOI) https://doi.org/10.57760/sciencedb.26222 (ref. 21). This DOI serves as the assembly accession for the entire, integrated collection of data. The dataset is structured around individual patient profiles, with the Patient ID (P01–P07) acting as the fundamental and unique accession identifier that links all data from a single subject.
For each patient, the data is organized into two primary components:
1. Clinical Data Library: A collection of structured and unstructured clinical data files that provide comprehensive contextual information for the patient.
2. Waveform Read Set: A single, high-resolution time-series file containing the ventilator waveform data, which corresponds directly to the clinical data library of the same patient.
The specific file composition and the correspondence between each patient’s clinical data library and waveform read set are detailed in Table 1. All files are accessible and can be retrieved via the persistent data repository link: https://doi.org/10.57760/sciencedb.26222.
Patient demographic and treatment information table
This table systematically records demographic characteristics and treatment trajectories, as shown in Figs. 3 and 4.
Clinical diagnosis and text data table
This dataset includes complete clinical diagnosis records and medical text data. The diagnostic information is categorized into admission diagnosis, discharge/death diagnosis, and primary diagnosis, recorded in free-text format, covering the patient’s primary diseases and complications. For example, patient P01 has 32 diagnostic entries, including “multifocal cerebral hemorrhage, bacterial pneumonia, sepsis,” etc. The chief complaint records the patient’s main symptoms at the time of consultation, such as “sudden onset of consciousness disorder for 4 hours” or “coma for over 10 months after head injury,” providing concise descriptions. The current medical history details the course of the disease, including the onset time, symptom characteristics, treatment process, and current status. For instance, patient P03’s record includes the complete process from trauma occurrence, emergency treatment, to multiple hospital transfers. The past medical history systematically records the patient’s underlying diseases, surgical history, and medication history, such as “bronchial asthma history for over 40 years, thyroid nodule resection over 50 years ago” for patient P02. To preserve the data as accurately as possible, all text data are saved in the original Chinese format, without translation or standardized coding. Users may utilize translation software to translate the data as needed.
Vital signs and blood gas analysis table
This table contains records of the patient’s vital sign monitoring and blood gas analysis parameters. Vital signs include temperature (°C), pulse rate (beats/min), respiratory rate (breaths/min), systolic and diastolic blood pressure (mmHg), height (cm), and weight (kg). Blood gas analysis includes blood pH, partial pressure of carbon dioxide (PCO₂, in mmHg), partial pressure of oxygen (PO₂, in mmHg), actual bicarbonate (HCO₃−, in mmol/L), standard bicarbonate (SBC, in mmol/L), base excess (BE, in mmol/L), oxygen saturation (SpO₂, in %), and oxygenation index. Additionally, it includes extracellular fluid base excess, total carbon dioxide (TCO₂), and hemoglobin-related parameters (oxyhemoglobin, carboxyhemoglobin, and methemoglobin percentages). All values retain their original measurement precision.
Laboratory indicators table
The laboratory indicators include PH(T), PCO₂(T), PO₂(T), PO₂(A-a)(T), PO₂(a/A)(T), RI(T), sodium ion, potassium ion, calcium ion, chloride ion, anion gap (AnGap), osmolality (mOsm), blood glucose, lactic acid, fraction of inspired oxygen (FiO₂), leukocytes, lymphocyte percentage, monocyte percentage, granulocyte percentage, eosinophil percentage, lymphocytes, monocytes, neutrophils, eosinophils, basophils, erythrocytes, hemoglobin concentration, hematocrit, MCV, MCH, mean corpuscular hemoglobin concentration (MCHC), red blood cell distribution width (RDW), platelets, mean platelet volume (MPV), PCT, platelet distribution width (PDW), alanine aminotransferase (ALT), total bile acids, aspartate aminotransferase (AST), total bilirubin, direct bilirubin, indirect bilirubin, total protein, globulin, globulin ratio, albumin, alkaline phosphatase, GGT, urea, creatinine, uric acid, total cholesterol, triglycerides, glucose, creatine kinase, lactate dehydrogenase, alpha-hydroxybutyrate dehydrogenase, total calcium, inorganic phosphorus, iron, magnesium, amylase, high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), potassium, sodium, chloride, prealbumin, corrected calcium, estimated glomerular filtration rate (eGFR), plasma prothrombin time, plasma prothrombin activity, international normalized ratio (INR), plasma fibrinogen, APTT, thrombin time, plasma D-dimer, N-terminal B-type natriuretic peptide precursor, procalcitonin, myoglobin, troponin I, creatine kinase MB isoenzyme mass, and C-reactive protein (CRP). These indicators reflect various aspects of the patient’s metabolism, electrolyte balance, hematological and inflammatory status.
Waveform data
The waveform data are stored in XLSX format in a total of 7 files, each corresponding to one patient and recording respiratory mechanics parameters over time. Each file contains several columns of data, with common column names such as “Time [s]”, “Flow [l/s]”, “Pao [cm H2O]” (the airway opening pressure, referred to as Paw in the Methods), and “Pes [cm H2O]”. The data are arranged in increasing time order with a time interval of 0.005 seconds, corresponding to the 200 Hz sampling frequency. Each row represents one time point and holds the value of each parameter at that time. The data within each file are continuously sampled, forming complete waveforms over multiple respiratory cycles that reflect the patient’s respiratory mechanics throughout the recording period. The structure of all files is consistent, with parameter names and units standardized.
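Once the columns of a waveform file have been read into plain Python lists (for instance with a spreadsheet reader of the user's choice), a few basic checks and derived signals follow directly from the layout described above. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, transpulmonary pressure is taken as the standard difference between airway opening pressure and esophageal pressure, and volume is obtained by trapezoidal integration of flow, which ignores sensor drift and leak correction.

```python
def is_uniformly_sampled(time_s, dt=0.005, tol=1e-6):
    """Check that consecutive time stamps are `dt` seconds apart."""
    return all(abs((b - a) - dt) <= tol for a, b in zip(time_s, time_s[1:]))


def transpulmonary_pressure(pao, pes):
    """Pao minus Pes, sample by sample, in cmH2O."""
    return [a - e for a, e in zip(pao, pes)]


def cumulative_volume(flow_l_per_s, dt=0.005):
    """Integrate flow (l/s) with the trapezoidal rule; returns litres."""
    if not flow_l_per_s:
        return []
    volume = [0.0]
    for f0, f1 in zip(flow_l_per_s, flow_l_per_s[1:]):
        volume.append(volume[-1] + 0.5 * (f0 + f1) * dt)
    return volume
```

In practice the integration would be restarted at the beginning of each breath, since small flow offsets accumulate over long segments.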
Mechanical ventilation
This table presents parameter information for each patient undergoing mechanical ventilation, covering the clinical reasons for initiating mechanical ventilation and the corresponding initial ventilator settings. The table consists of five columns: “Reason for Mechanical Ventilation Parameters,” “PS,” “PEEP,” “FiO2,” and “ETS.” The “Reason for Mechanical Ventilation Parameters” column records the clinical indications for initiating mechanical ventilation, such as “poor oxygenation,” “hypoxia,” and “anesthesia recovery failure.” The subsequent columns list the initial ventilator settings chosen in these clinical contexts: pressure support (PS), positive end-expiratory pressure (PEEP), fraction of inspired oxygen (FiO2), and expiratory trigger sensitivity (ETS).
Data Overview
This section provides a summary of the key demographic and clinical characteristics of the patient cohort included in the dataset, as derived from the Patient Demographic and Treatment Information Table. The average age of the patients was 63.1 ± 23.1 years, with 71.4% (5/7) being male and 28.6% (2/7) aged over 80 years. The median length of hospital stay was 17 days (range: 6–39 days), with an in-hospital mortality rate of 28.6% (2/7). Key descriptive fields include place of origin (42.9% from Beijing) and marital status (71.4% married).
Technical Validation
To ensure the technical quality and reliability of the CCVW-ICU dataset, a multi-faceted validation approach was implemented throughout the data lifecycle, from acquisition to curation.
Clinical data quality control
All structured clinical data were manually extracted from the hospital’s electronic medical record (EMR) system and subsequently underwent a rigorous, two-person verification process. Two independent researchers (Xiaozhu Liu and Xu An) cross-checked all entries against the source records to ensure accuracy, consistency, and completeness. Any discrepancies were resolved through consensus and, when necessary, by consulting the attending physician.
Waveform data fidelity
The fidelity of the raw physiological signals (Flow, Paw, Pes) was ensured by using a high-fidelity data acquisition system with sensors calibrated according to manufacturer specifications. The position of the esophageal balloon catheter was verified in all patients using the standardized Baydur occlusion test to ensure accurate measurement of esophageal pressure (Pes), a key metric for assessing respiratory effort. Only data segments with stable signals and correct catheter positioning, as confirmed by this test (ΔPaw/ΔPes ≈ 1), were selected for extraction and inclusion in the final dataset.
Preservation of data integrity in unstructured text
A conscious decision was made to preserve all unstructured clinical narratives (e.g., current medical history, diagnosis records) in their original Chinese to maintain semantic and contextual integrity. While this may present an initial barrier to non-Chinese speakers, it prevents the introduction of errors and loss of clinical nuance that can occur with automated translation. This approach ensures that the data remain an unaltered source for natural language processing (NLP) tasks. For users requiring translation, we recommend employing specialized academic or professional translation services for their specific research objectives.
Data completeness and integration
For each of the seven included patients, we verified the presence and internal consistency of all corresponding data files. The dataset is characterized by its completeness, with no missing data files for any patient, ensuring that each clinical data library is fully paired with its corresponding high-resolution waveform read set.
Usage Notes
The CCVW-ICU dataset is publicly accessible under a Creative Commons Attribution 4.0 International (CC BY 4.0) license via the Science Data Bank at https://doi.org/10.57760/sciencedb.26222. Users are free to share and adapt the material, provided appropriate credit is given.
Intended Use Cases and Scope: This dataset is best suited for:
1. Methodological Development and Benchmarking: The high-resolution, multi-parameter waveform data paired with detailed clinical context provides an ideal benchmark for developing and validating new algorithms for signal processing, feature extraction, and event detection (e.g., patient-ventilator asynchrony) in mechanical ventilation.
2. In-depth Physiological Studies: The richness of the data for each patient allows for deep, granular analysis of respiratory mechanics and patient-ventilator interactions within individuals.
3. Pilot and Feasibility Studies: The dataset serves as a valuable resource for piloting new research questions or computational methods before applying them to larger, less-annotated cohorts.
4. Foundation for Annotation Platforms: This carefully curated, high-quality dataset is particularly apt to serve as a foundational core for building standardized ventilator waveform annotation platforms, which can facilitate future large-scale data labeling initiatives.
Note on cohort size
Users should note that the dataset’s strength lies in the depth and resolution of data per patient rather than the size of the cohort (n = 7). It is not intended for large-scale epidemiological studies but rather for intensive, within-subject analyses and methodological work that requires meticulously aligned multimodal data.
Working with Chinese Text Data: Researchers intending to analyze the clinical text data (e.g., for NLP tasks) will need to handle the original Chinese narratives. A variety of mature NLP toolkits (e.g., BERT-based models pre-trained on Chinese corpora) and translation APIs are available for this purpose.
Data integration
To facilitate analysis, users can merge the structured clinical data from the various Excel tables using the provided Patient ID (P01-P07) as the unique key, and then link this integrated clinical profile to the corresponding waveform data file.
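The merge described here amounts to a key-based join on the Patient ID. The sketch below shows one way to do it with plain dictionaries, assuming each table has been loaded as a list of row dictionaries with one row per patient; the key name `"Patient ID"`, the other column names, and the function name are assumptions, and tables with repeated measurements per patient would need a different strategy because later rows would overwrite earlier ones.

```python
def merge_by_patient_id(*tables, key="Patient ID"):
    """Combine rows from several tables into one record per patient.

    Each table is a list of dicts that all contain `key`.
    Returns a dict mapping patient ID to the merged record.
    """
    merged = {}
    for table in tables:
        for row in table:
            patient_id = row[key]
            merged.setdefault(patient_id, {key: patient_id}).update(row)
    return merged


# Example with made-up values for two hypothetical tables.
demographics = [{"Patient ID": "P01", "Age": 70}]
ventilation = [{"Patient ID": "P01", "PEEP": 5}]
profiles = merge_by_patient_id(demographics, ventilation)
```

The resulting per-patient record can then be paired with that patient's waveform file. Users working in pandas can obtain the same result with `DataFrame.merge` on the ID column.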
Data availability
The dataset generated and analysed during the current study is available in the Science Data Bank repository under the DOI https://doi.org/10.57760/sciencedb.26222.
Code availability
No custom code was used.
References
Gaspard, N., Westover, M. B. & Hirsch, L. J. Assessment of a Study of Continuous vs Repeat-Spot Electroencephalography in Patients With Critical Illness. JAMA Neurol 78, 369 (2021).
Papazian, L., Munshi, L. & Guérin, C. Prone position in mechanically ventilated patients. Intensive Care Med 48, 1062–1065 (2022).
Singh, T. et al. Exercise Electrocardiography and Computed Tomography Coronary Angiography for Patients With Suspected Stable Angina Pectoris: A Post Hoc Analysis of the Randomized SCOT-HEART Trial. JAMA Cardiol 5, 920–928 (2020).
Yavarimanesh, M. et al. Abdominal aortic aneurysm monitoring via arterial waveform analysis: towards a convenient point-of-care device. NPJ Digit Med 5, 168 (2022).
Mills, K. EMG waveforms: video companion to electromyography and neuromuscular disorders. J Neurol Neurosurg Psychiatry 72, 130, https://doi.org/10.1136/jnnp.72.1.130-a (2002).
Scharffenberg, M. et al. Respiratory mechanics and mechanical power during low vs. high positive end-expiratory pressure in obese surgical patients - A sub-study of the PROBESE randomized controlled trial. J Clin Anesth 92, 111242 (2024).
Rubulotta, F. et al. Mechanical Ventilation, Past, Present, and Future. Anesth Analg 138, 308–325 (2024).
Guay, C. S. et al. Postoperative Delirium Severity and Recovery Correlate With Electroencephalogram Spectral Features. Anesth Analg 136, 140–151 (2023).
Silva, D. O. et al. Impact on the ability of healthcare professionals to correctly identify patient-ventilator asynchronies of the simultaneous visualization of estimated muscle pressure curves on the ventilator display: a randomized study (P(mus) study). Crit Care 27, 128 (2023).
Nakornnoi, B., Tscheikuna, J. & Rittayamai, N. The effects of real-time waveform analysis software on patient ventilator synchronization during pressure support ventilation: a randomized crossover physiological study. BMC Pulm Med 24, 212 (2024).
Moody, B., Moody, G., Villarroel, M., Clifford, G. D. & Silva, I. MIMIC-III Waveform Database (version 1.0). (PhysioNet, 2020).
Moody, B. et al. MIMIC-IV Waveform Database (version 0.1.0). (PhysioNet, 2022).
da Cruz, M. R. et al. Positive end-expiratory pressure induced changes in airway driving pressure in mechanically ventilated COVID-19 Acute Respiratory Distress Syndrome patients. Crit Care 27, 118 (2023).
Gobbi, A. et al. Effects of increasing tidal volume and end-expiratory lung volume on induced bronchoconstriction in healthy humans. Respir Res 25, 298 (2024).
Colombo, S. M. et al. Neural pressure support ventilation as a novel strategy to improve patient-ventilator synchrony in adult respiratory distress syndrome. Br J Anaesth 130, e430–e432 (2023).
Damiani, L. F. & Goligher, E. C. Lung and Diaphragm Protection During Mechanical Ventilation: Synchrony Matters. Crit Care Med 51, 1618–1621 (2023).
Grieco, D. L. et al. Patient-ventilator interaction with conventional and automated management of pressure support during difficult weaning from mechanical ventilation. J Crit Care 48, 203–210 (2018).
Hernández, G. et al. Effect of postextubation noninvasive ventilation with active humidification vs high-flow nasal cannula on reintubation in patients at very high risk for extubation failure: a randomized trial. Intensive Care Med 48, 1751–1759 (2022).
Telias, I. et al. Magnitude of Synchronous and Dyssynchronous Inspiratory Efforts during Mechanical Ventilation: A Novel Method. Am J Respir Crit Care Med 207, 1239–1243 (2023).
Wright, J. M. et al. Prone Position Ventilation in Neurologically Ill Patients: A Systematic Review and Proposed Protocol. Crit Care Med 49, e269–e278 (2021).
Liu, X. et al. Clinical and ventilator waveform datasets of critically ill patients in China. Science Data Bank https://doi.org/10.57760/sciencedb.26222 (2025).
Author information
Contributions
J.X.Z. designed the research. X.Z.L., P.W. and C.J.H. collected and analyzed the data. X.Z.L. drafted the manuscript. S.S.X., Y.F.W., H.L.L., and Z.M.T. contributed to the critical revision of the manuscript. All authors contributed to the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, X., Wang, P., Hao, C. et al. Clinical and ventilator waveforms of Chinese patients during pressure support ventilation. Sci Data 13, 48 (2026). https://doi.org/10.1038/s41597-025-06364-z
DOI: https://doi.org/10.1038/s41597-025-06364-z






