Introduction

Ventilator-associated events (VAE) are among the most common complications associated with invasive mechanical ventilation (IMV) in critically ill patients, leading to increased mortality, prolonged ICU and hospital stays, longer durations of mechanical ventilation, and higher healthcare costs1,2,3,4. Reported incidence rates range widely, from 7 to 107 events per 1000 ventilator-days1,2,3,5,6,7.

One of the major contributors to VAE is inappropriate airway management, particularly respiratory circuit events such as fluid accumulation (including condensate and patient secretions) and air leakage due to inadequate endotracheal cuff sealing8,9,10,11,12. Condensate in the ventilator circuit often becomes contaminated over time through patient secretions and healthcare manipulation9,12. During patient repositioning or suctioning, contaminated fluid may flow into the lower respiratory tract, promoting bacterial colonization and increasing the risk of infection8. Similarly, underinflated endotracheal tube cuffs allow oropharyngeal and gastric secretions to bypass the airway seal and enter the lower airway, serving as another route for VAE10,11.

To mitigate these risks, airway care bundles have been widely adopted and include strategies such as head-of-bed elevation, oral care, subglottic suctioning, cuff pressure management, and routine drainage of ventilator condensate13,14,15. Although effective and low-cost, these interventions are largely dependent on timely bedside monitoring and nursing vigilance. In practice, continuous surveillance of respiratory circuit conditions is difficult to achieve due to low nurse-to-patient ratios in ICUs and variable proficiency among staff in interpreting ventilator waveforms16.

Notably, these circuit events do produce identifiable waveform patterns on ventilators. Fluid accumulation often results in irregular oscillatory “sawtooth” flow waveforms, while circuit leakage typically causes the volume waveform to fail to return to baseline at end-expiration17. However, such changes may be subtle, not consistently accompanied by ventilator alarms, and are easily missed without real-time monitoring.

While certain waveform morphologies are clinically recognized as being associated with these events, there is currently no widely implemented system for automated, continuous detection of abnormal ventilator waveforms at the bedside. Most monitoring still relies on intermittent visual assessment and manual intervention, which is prone to delays or omissions. This represents a critical gap in the early detection and prevention of VAE.

To address this gap, our team previously developed and tested a pilot algorithm for recognizing waveform abnormalities based on flow and volume patterns18. Building on this foundation, the present study aims to: (1) validate an intelligent detection algorithm based on image processing and deep learning techniques capable of identifying fluid-accumulation-like and leakage-like waveform patterns across multiple ventilator platforms in real time; and (2) quantify the incidence of fluid-accumulation-like patterns in ICU patients receiving IMV, along with their association with changes in airway pressure (ΔPaw). This work seeks to enable closed-loop surveillance of ventilator circuit events and provide actionable insights for early intervention to prevent progression to VAE.

Results

Patient characteristics

Between December 2024 and April 2025, a total of 48 mechanically ventilated patients were enrolled. Demographic and clinical characteristics are summarized in Table 1. After signal quality appraisal, 2794 h of ventilator waveform recordings were accepted, comprising 3,142,576 individual breaths. The median number of respiratory cycles per patient was 48,990 (IQR 29,297–90,075). Following label screening and disagreement resolution, 135,088 breaths were ultimately included for analysis (Fig. 1).

Fig. 1: Flow diagram of patient and breath selection for algorithm development and validation.
figure 1

A total of 48 patients were enrolled, with waveform data collected from 30 patients for training (1755 h) and 18 patients for external validation (1039 h). After expert review, breaths without agreement were excluded. A total of 26,768 breaths were included for model development and internal validation, and 30,528 for external validation. Third-expert adjudication was applied to resolve annotation discrepancies before final inclusion.

Table 1 Patients’ demographic and clinical characteristics

Among the 48 patients, 37 (77.1%) demonstrated fluid-accumulation-like patterns, and 26 (54.2%) experienced leakage-like patterns. A total of 44 patients (91.7%) had at least one respiratory circuit event during monitoring, and 19 patients (39.6%) exhibited both. This high prevalence highlights the clinical relevance of continuous detection, as many events could otherwise go unnoticed under routine observation.

The training dataset included 26,768 annotated breaths: 8440 (31.5%) normal, 13,296 (49.7%) fluid-accumulation-like patterns, and 5032 (18.8%) leakage-like patterns. Approximately 30% were used for internal validation. The external validation dataset comprised 30,528 breaths: 7976 (26.1%) normal, 10,984 (36.0%) fluid-accumulation-like patterns, and 11,568 (37.9%) leakage-like patterns (Table 2). Across the 57,296 annotated breaths in total, the median number of annotated breaths per patient was 932 (IQR 524–1434).

Table 2 Distribution of labeled breaths by event type and dataset

Algorithm performance and interpretation

The training process analysis indicates the model approached convergence without overfitting, as evidenced by the stable descending trends with gradually reduced slopes but no plateaus in the loss curves (training bounding box loss, training distribution focal loss, validation bounding box loss, and validation classification loss). The training classification loss decreased smoothly, indicating stabilized learning for classification tasks. Performance metrics demonstrated that precision, recall, and mAP@50 reached consistently high levels, while mAP@50-95 maintained upward trends, reflecting ongoing improvements in localization precision, particularly for stricter IoU thresholds (Fig. 2). The synchronous improvements in training/validation metrics and steadily decreasing validation losses confirmed good generalization without overfitting. These convergence characteristics suggest appropriate epoch settings and effective feature learning.

Fig. 2: Training process analysis and model performance.
figure 2

The top row shows the stable descending trends in key loss function curves over training epochs, including training bounding box loss, training distribution focal loss, training classification loss, validation bounding box loss, and validation classification loss. The training process analysis indicates the model approached convergence without overfitting. The bottom row demonstrates improved model performance, with recall, precision, mAP@50, and mAP@50-95 all reaching consistently high levels. Solid lines indicate raw training results, while dashed lines represent smoothed trends.

In internal validation, the model achieved excellent classification results. For fluid-accumulation-like patterns, the recall was 99.80% (3936/3944), precision reached 100%, and the F1 score was 99.90%. For leakage-like patterns detection, the algorithm achieved perfect performance, with 100% recall, 100% precision, and an F1 score of 100% (1552/1552). The overall classification accuracy across all event types was 99.93%, indicating high model reliability under controlled validation conditions.

In external validation, the model maintained strong performance. For fluid-accumulation-like patterns, the recall was 85.92% (9224/10,736), precision remained high at 99.83%, and the F1 score was 92.35%, resulting in an overall classification accuracy of 91.82%. For leakage-like patterns, the model continued to perform with high reliability, achieving 99.23% recall (11,272/11,360), 100% precision, an F1 score of 99.61%, and overall accuracy of 99.52%. Across the entire external dataset, the total misclassification rate was 4.35% (1616 out of 37,112 breaths) (Table 3, Fig. 3).
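
As a consistency check, the F1 score reported for fluid-accumulation-like patterns in external validation is simply the harmonic mean of the reported precision and recall; a minimal sketch of that arithmetic:

```python
# F1 as the harmonic mean of the reported external-validation precision and
# recall for fluid-accumulation-like patterns.
recall = 9224 / 10_736        # 85.92% recall, as reported
precision = 0.9983            # 99.83% precision, as reported
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2%}")       # ~92.35%, matching the value in the text
```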

Fig. 3: Confusion matrices for internal and external validation of the classification model.
figure 3

Internal (A) and external (B) validation results are shown. Rows represent the predicted labels, and columns represent the true waveform labels. Diagonal elements indicate correctly classified samples, with darker shades reflecting higher counts. The grey dashed lines demarcate the analysis boundaries between flow and volume waveform categories; performance metrics were computed independently within these groups. The model demonstrated strong classification consistency, particularly in identifying sawtooth flow and non-zero volume patterns.

Table 3 Detection performance metrics of the classification algorithm

The algorithm showed particularly high consistency in detecting leakage-like patterns, likely due to the distinct and easily recognizable waveform feature of sustained expiratory volume, which serves as a reliable morphological cue. In contrast, fluid-accumulation-like patterns presented with more variable waveform signatures, particularly in breaths influenced by secretion mobilization, patient effort, or intermittent partial obstruction. These complex and less uniform flow patterns may account for the modest decline in recall observed in the external validation cohort.

Nevertheless, the very high precision observed for both event types—exceeding 99%—suggests a low false-positive rate. This characteristic is especially valuable in clinical settings, where excessive false alarms can contribute to alert fatigue. Taken together, these results support the potential utility of the proposed algorithm for real-time respiratory circuit event detection, with strong generalizability and minimal risk of over-alerting.

While deep learning models demonstrate strong performance in ventilator waveform analysis, their opaque decision-making process remains a barrier to clinical adoption. Grad-CAM visualization revealed that normal waveforms produced uniform activations across respiratory phases, capturing fundamental respiratory rhythm characteristics; leakage-like patterns focused attention on volume-time curve discontinuities; and fluid-accumulation-like patterns elicited strong responses to flow-time sawtooth oscillations (Fig. 4). These heatmaps confirmed that the model’s attention aligns with clinically meaningful waveform features and provided clinicians with interpretable decision evidence, helping to address the black-box concern.

Fig. 4: Grad-CAM visualization of the model’s decision-making process.
figure 4

The heatmaps generated using the Grad-CAM method highlight the key areas the model focuses on when identifying normal waveforms, leakage-like patterns, and fluid-accumulation-like patterns. For normal respiratory waveforms, the heatmap is evenly distributed across the inspiratory and expiratory phases, capturing the respiratory rhythm characteristics. In the case of leakage-like patterns, the model concentrates on areas where the volume-time curve fails to return to baseline. For fluid-accumulation-like patterns, the model responds strongly to the flow-time sawtooth oscillations.

Incidence of fluid-accumulation-like patterns and associated airway pressure changes

Fluid-accumulation-like patterns occurred in 37 patients, with individual rates ranging from 0.12% to 11.89% of recorded breaths. The median event frequency was 2.58% [IQR 1.04–4.14%], underscoring that while some patients experienced minimal fluid-accumulation-like patterns, others had persistent or recurrent events.

To evaluate the potential association between fluid-accumulation-like patterns and respiratory mechanics, ΔPaw—the change in airway pressure during fluid-accumulation-like patterns versus baseline—was calculated. The median ΔPaw was 2 cmH₂O [IQR 1–6], with a wide inter-patient range of 0 to 23 cmH₂O. In relative terms, this translated to a median pressure increase of 9.52% [IQR 5.56–28.57%], with extremes reaching over 150%.
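
For clarity, the calculation underlying these values can be sketched as follows; the pressure values in the snippet are hypothetical and serve only to illustrate how the absolute and relative ΔPaw were derived per patient.

```python
import numpy as np

# Hypothetical per-patient peak airway pressures (cmH2O); illustrative values only.
paw_baseline = np.array([18.0, 22.0, 15.0])       # during normal breaths
paw_fluid_like = np.array([20.0, 28.0, 16.0])     # during fluid-accumulation-like breaths

delta_paw = paw_fluid_like - paw_baseline          # absolute change, cmH2O
delta_paw_pct = 100 * delta_paw / paw_baseline     # relative change, %

print("median ΔPaw (cmH2O):", np.median(delta_paw))
print("median ΔPaw (%):", np.median(delta_paw_pct))
```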

Patients were stratified into high and low fluid-accumulation-like patterns incidence groups. Those in the high-incidence group had significantly greater ΔPaw (Z = −2.32, p = 0.02, r = −0.38) and greater percentage increases in pressure (Z = −2.71, p = 0.007, r = −0.44) (Fig. 5), suggesting a possible dose-response relationship between fluid accumulation burden and airway pressure variability.

Fig. 5: Comparison of airway pressure changes (ΔPaw) between low-incidence and high-incidence fluid-accumulation-like patterns groups.
figure 5

Data distributions are visualized as boxplots. The left panel displays absolute ΔPaw in cmH₂O, while the right shows the percentage increase in ΔPaw. The lines within the boxes represent the medians, and the boxes span the 25th to 75th percentiles. p-values were derived from the Mann–Whitney U test.

This finding implies that frequent or unrecognized fluid accumulation may disrupt airway dynamics, elevate ventilatory workload, and potentially trigger inappropriate ventilator compensation, particularly in pressure-targeted modes. The ability to detect such events in real time could inform timely suctioning, circuit inspection, or mode adjustments—contributing to safer, more responsive respiratory support.

Discussion

The main findings of our study were: (1) the deep learning algorithm accurately identified respiratory circuit events from ventilator waveforms, with excellent precision and specificity across both internal and external validation datasets. Although a small number of false negatives occurred, particularly in identifying fluid-accumulation-like patterns, the overall F1 score remained high. (2) Fluid-accumulation-like patterns were frequently observed among ICU patients receiving IMV, with 77.1% of patients exhibiting at least one such event. Notably, a higher incidence of fluid-accumulation-like patterns was associated with elevated ΔPaw. This finding should be interpreted cautiously, as it may reflect underlying ventilatory changes requiring further validation.

Most ICU patients under invasive mechanical ventilation bypass natural airway humidification due to the use of endotracheal or tracheostomy tubes19,20. Heated humidifiers (HHs), while effective in maintaining appropriate airway humidity, promote condensate formation due to temperature gradients between delivered gases and ambient conditions21,22. Although fluid accumulation in ventilator circuits is well-recognized, few studies have quantified its real-world incidence. Ricard et al. noted condensation as a marker of humidification efficiency21, and Craven et al. reported mean condensate accumulation rates of 30 mL/h12. Our study quantified the incidence of breath-level waveform patterns visually consistent with fluid accumulation, showing a median of 2.58% across cycles.

Accumulated condensate can serve as a bacterial reservoir. Contamination rates increase over time, and studies have shown up to 80% of condensates contain high-density bacterial growth after 24 h8,12. Frequently isolated organisms include Gram-negative bacilli such as Acinetobacter baumannii and Pseudomonas aeruginosa, both of which are commonly implicated in infectious VAEs9,12,23. In our cohort, 7 out of 37 patients with fluid-accumulation-like patterns developed VAE, compared to only 1 out of 11 patients without such events. These findings support the hypothesis that fluid-accumulation-like patterns contribute not only to airway obstruction but also to infection risk, particularly when contaminated condensate is inadvertently introduced into the lower respiratory tract during suctioning, repositioning, or circuit manipulation8,24.

Beyond infection risk, fluid-accumulation-like patterns were associated with increased airway resistance, turbulence, and elevated ΔPaw, with observed values reaching up to 23 cmH₂O. This finding is clinically relevant, as elevated ΔPaw reflects increased breathing effort, reduced tidal volume, and potential risk of ventilator-patient asynchrony. Moreover, circuit fluid can induce ventilator auto-triggering, disrupting synchrony and increasing sedation needs17,25. Our data revealed significantly higher ΔPaw in patients with frequent fluid-accumulation-like patterns, implying a potential association with circuit fluid accumulation. These waveform patterns were associated with measurable changes in ventilator parameters; however, their physiological origin and clinical impact remain to be determined. Notably, such changes often remain unnoticed until triggering ventilator alarms, underscoring the need for continuous surveillance.

The endotracheal tube prevents glottis closure, requiring an optimally inflated cuff to ensure an effective seal and minimal tracheal mucosal injury. Clinically, the tracheal diameter around the cuff varies with factors such as patient mobilization, swallowing and coughing26,27, frequently resulting in cuff underinflation as demonstrated by manometric measurements28,29. This underinflation may result in the aspiration of oropharyngeal and gastric secretions into the lower respiratory tract, with each event potentially increasing the risk of infection30. These findings underscore the importance of continuous monitoring and prompt intervention in airway management.

The inclusion of a clinically heterogeneous cohort was intentional, as our goal was to develop a robust and generalizable algorithm that could perform well across a wide spectrum of ICU patients, rather than being tailored to a specific pathology. Representative flow and volume waveforms across different diagnoses (e.g., acute respiratory distress syndrome (ARDS), pneumonia, postoperative) are provided to illustrate the diversity of input signals (Fig. 6). The model’s high external validation performance suggests it successfully learned the core visual features of the target events, independent of the underlying disease state.

Fig. 6: Representative ventilator waveform examples across different clinical diagnoses.
figure 6

Each column illustrates representative flow and volume waveforms from a patient with a different clinical diagnosis, including ARDS, pneumonia, and postoperative conditions. Each row shows waveform segments exhibiting normal patterns, non-zero volume (characteristic of leakage-like patterns), or sawtooth oscillations (characteristic of fluid-accumulation-like patterns). The comparative visualization highlights both disease-specific waveform alterations and common waveform patterns across different clinical scenarios.

The selection of YOLOv8 for ventilator waveform anomaly detection was primarily based on its superior real-time performance and precise small-object detection capability. As a single-stage object detection algorithm, YOLOv8’s end-to-end prediction architecture provides significantly faster inference speeds compared to two-stage models, making it particularly suitable for time-sensitive clinical monitoring applications. The anchor-free design of YOLOv8 simplifies parameter optimization and demonstrates better adaptability to the multi-scale characteristics of waveform anomalies than the predefined anchor boxes used in SSD. Furthermore, the C2f module enhances shallow feature extraction, enabling effective capture of subtle waveform details, while the dynamic receptive field adjustment in the SPPF module allows for better matching of temporal span variations in abnormal signals. Although the introduction of temporal attention mechanisms could potentially improve the model’s ability to capture long-range waveform dependencies in future work, the current architecture already fully meets the fundamental requirements for clinical monitoring applications.

The algorithm demonstrated robust performance. Precision consistently exceeded 99%, with F1 scores above 92% across all event types. Recall for leakage-like patterns remained high (>99%), while recall for fluid-accumulation-like patterns was lower in external validation, albeit still acceptable (>85%). This discrepancy is likely attributable to the consistent waveform patterns of leakage (e.g., non-zero volume return), compared to the variable morphology of fluid accumulation signals influenced by secretions, airflow dynamics, and patient effort. While this may lead to occasional missed detections, the high precision minimizes false positives and reduces unnecessary clinical intervention.

Compared with previous studies that used pressure waveform analysis or raw waveform signals via proprietary ventilator software31,32,33,34,35,36, our approach offers several advantages. First, we used video-based waveform data, allowing compatibility across different ventilator models without requiring interface integration. Second, the image-based method enables deployment with simple bedside camera setups. Third, the model supports real-time detection, overcoming the limitations of offline analysis. In our preliminary tests, the complete system latency (encompassing image capture, processing, and display operations) is maintained below 500 ms, which is significantly faster than the duration of a single respiratory cycle (typically 2–5 s). This performance ensures reliable waveform detection while providing sufficient time for clinical intervention, with multiple detection opportunities guaranteed within each respiratory cycle to prevent missed events. Together, these features support its feasibility for clinical integration and scalability.

Automated detection of fluid-accumulation-like patterns and leakage-like patterns offers a practical tool for improving airway care and reducing VAE risk. Integrating such algorithms into bedside monitoring systems could enable continuous surveillance, prompt timely interventions (e.g., suctioning or drainage), and reduce reliance on manual inspection. Given the high frequency of unrecognized circuit events, intelligent alerts may improve adherence to airway management protocols and minimize ventilator-related complications. The observed association between fluid-accumulation-like patterns and ΔPaw elevation provides preliminary evidence for developing physiology-based alarm thresholds, though clinical implementation would require further validation. Combining waveform detection with ΔPaw monitoring may provide a closed-loop feedback system for circuit maintenance and VAE prevention. Furthermore, the CNN-based waveform analysis framework developed in this study exhibits significant extensibility. This pattern recognition approach is directly applicable to patient-ventilator asynchrony detection, as such events also generate distinct visual features in ventilator waveform morphologies. This highlights the potential of our approach as a versatile platform for comprehensive, real-time ventilator monitoring.

This study has several limitations. First, the labels for our deep learning model were based on visual inspection of waveform morphology by experts, rather than on a concurrently validated physiological ground truth. Without simultaneous confirmation of fluid in the circuit via drainage or other objective measures, the specificity of waveform interpretation remains limited, as patterns such as sawtooth oscillations can arise from other phenomena like cardiogenic oscillations or patient effort. Therefore, our study should be interpreted as the development and validation of an algorithm to detect specific waveform patterns, and the association of these patterns with clinical events requires further validation. Second, the observed association between a higher incidence of fluid-accumulation-like waveforms and elevated ΔPaw does not definitively establish causality. The observational nature of this study and potential confounding factors (e.g., changes in compliance, circuit resistance, or patient effort) necessitate cautious interpretation. The data demonstrate an association rather than definitive causation, and further physiological studies are needed to elucidate the precise causal pathways. Third, this study focused on patients ventilated in PCV and PSV modes. Since waveform manifestations of these events could differ in volume control ventilation (VCV), our model cannot be directly applied to this mode without additional training and validation. Future studies should extend this work to VCV to improve the tool’s generalizability. Fourth, the recorded video resolution was sufficient for detecting the macroscopic waveform distortions characteristic of significant fluid accumulation and circuit leakage, which were the focus of our study. However, detecting very subtle abnormalities might be challenging with this method and could benefit from higher-resolution signal analysis. To further improve detection of subtle abnormalities, future enhancements could explore upgrading to 4K resolution for better waveform detail preservation, implementing temporal attention mechanisms to capture minute variations across consecutive frames, and optimizing imaging geometry to reduce angular deviation, distortion, and exposure variations while preserving non-invasive operation. Fifth, while our video-based approach offers compatibility in scenarios where direct ventilator interface data access is unavailable, it inherently carries limitations related to image resolution constraints and the inability to derive precise ventilator parameters or respiratory mechanics. For a comprehensive decision-support system that aims to optimize ventilation settings beyond event detection, these data would be beneficial. We are actively developing solutions to capture and integrate these ventilator data. Sixth, the model was developed and validated in a single-center ICU setting. Although external validation showed good performance, further evaluation across different ventilator brands, ventilation strategies, and clinical environments is needed. These limitations highlight areas for future research, including prospective validation with concurrent physiological confirmation, extension to VCV and other modes, integration of ventilator data, multicenter validation, and exploration of additional waveform anomalies, to further enhance generalizability and clinical integration.

This study developed a deep learning algorithm that detects visually defined ventilator waveform patterns that may be consistent with fluid accumulation or circuit leakage, based on expert visual labeling. These events were common in ICU patients and associated with increased airway pressure, potentially impacting patient safety. The algorithm enables real-time, automated monitoring, offering a practical tool to support early intervention, reduce the risk of ventilator-associated events, and enhance clinical airway management at the bedside.

Methods

Study design and participants

This prospective cohort study was conducted in the ICU of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine (Trial registration: ChiCTR2500095298). Adult patients (≥18 years) who were expected to receive invasive mechanical ventilation (IMV) for ≥24 h were eligible. Exclusion criteria included pregnancy, bronchopleural fistula, and bronchoesophageal fistula. The study protocol was approved by the Ethics Committee of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine (RJ2024-457). Sample size was determined based on the availability of eligible patients during the study period. Informed consent was obtained from all participants or their legally authorized representatives prior to waveform data collection.

Data acquisition

Waveform data were recorded using a HIKVISION auto-focus 2K-resolution (1920 × 1080 pixels) external camera focused on the ventilator display interface (Puritan Bennett 840, Puritan Bennett 980, and Comen V8). The resolutions of the acquired data were calculated to be 0.4 L/min per pixel for flow (80 L/min range/200 vertical pixels), 0.2 cmH₂O per pixel for pressure (40 cmH₂O range/200 vertical pixels), and 5 mL per pixel for volume (1000 mL range/200 vertical pixels). Video recordings were performed continuously at 30 frames per second for at least 24 h per patient and included flow, volume, and pressure waveforms across various ventilation modes, including pressure support ventilation (PSV) and pressure control ventilation (PCV). Waveform types were selected based on known signal alterations associated with circuit events: fluid accumulation primarily affects flow, while leakage affects volume.
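
These per-pixel resolutions follow directly from dividing the displayed axis range by the vertical pixel count of the waveform area, as sketched below (the arithmetic only, not the acquisition software):

```python
# Per-pixel resolution = displayed axis range / vertical pixels of the waveform area.
vertical_pixels = 200

flow_res = 80 / vertical_pixels       # 0.4 L/min per pixel (80 L/min range)
pressure_res = 40 / vertical_pixels   # 0.2 cmH2O per pixel (40 cmH2O range)
volume_res = 1000 / vertical_pixels   # 5 mL per pixel (1000 mL range)

print(flow_res, pressure_res, volume_res)
```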

Two experienced researchers independently screened and classified the waveform recordings for each patient. Segments with excessive artifacts were excluded. Representative examples of three waveform categories (normal, fluid-accumulation-like patterns and leakage-like patterns) were selected, reviewed, and verified by a third expert in cases of disagreement. Only segments with agreement from at least two reviewers were included for further analysis. The labeling process was based on expert visual pattern recognition of morphologies that are commonly associated with these events in clinical practice and literature17. A blinded protocol prevented reviewers from accessing the external validation outputs. Breath counts were estimated by multiplying the number of images by the corresponding respiratory frequency.

Image processing and annotation

To ensure waveform clarity and minimize geometric distortion, all raw images underwent a standardized preprocessing pipeline. First, non-relevant areas (e.g., ventilator frame and background) were segmented and removed using an ultra-lightweight detection model (NanoDet-Plus). Subsequently, a custom algorithm based on the Hough Transform was used for skew correction, which included grayscale conversion, Canny edge detection, Hough line detection, and image rotation18. Finally, the corrected waveforms were resized with aspect-ratio-preserving padding to standardize dimensions while preventing distortion. The acquired raw data undergoes this image processing to generate the final model input data, as illustrated in Fig. 7.
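
A simplified sketch of the skew-correction and padding steps using OpenCV is shown below; the NanoDet-Plus cropping stage is omitted, and the thresholds are illustrative rather than the values used in the study pipeline.

```python
import cv2
import numpy as np

def deskew_and_pad(image_bgr, out_size=640):
    """Sketch of the preprocessing steps: grayscale -> Canny edges -> Hough lines ->
    rotation -> aspect-ratio-preserving resize with padding. Thresholds are illustrative."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)

    # Estimate the dominant skew angle from detected near-horizontal lines.
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=100,
                            minLineLength=100, maxLineGap=10)
    angle = 0.0
    if lines is not None:
        angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1))
                  for x1, y1, x2, y2 in lines[:, 0]
                  if abs(x2 - x1) > abs(y2 - y1)]   # keep near-horizontal lines
        if angles:
            angle = float(np.median(angles))

    h, w = image_bgr.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    corrected = cv2.warpAffine(image_bgr, rot, (w, h), flags=cv2.INTER_LINEAR)

    # Resize with aspect-ratio-preserving padding to out_size x out_size.
    scale = out_size / max(h, w)
    resized = cv2.resize(corrected, (int(w * scale), int(h * scale)))
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    canvas[:resized.shape[0], :resized.shape[1]] = resized
    return canvas
```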

Fig. 7: Workflow for waveform image preprocessing.
figure 7

The original images (1920 × 1080 pixels) are first processed using the ultra-lightweight object detection model NanoDet-Plus to extract waveform regions. Then, Hough line detection and image rotation are applied for skew correction, which enhances the quality of the training dataset, thereby contributing to better model performance and generalization. Finally, resizing and padding are performed to ensure the images maintain their original aspect ratio without distortion, producing the final model input images (640 × 640 pixels).

Processed images were annotated using LabelImg software. Waveforms were labeled as: (1) Normal Flow, (2) Sawtooth Flow (indicating fluid-accumulation-like patterns), (3) Normal Volume, and (4) Non-zero Volume (indicating leakage-like patterns), based on predefined criteria (Table 4, Fig. 8). To reduce patient-specific bias, images were proportionally sampled to ensure diversity within the training and validation datasets.

Fig. 8: Representative ventilator waveforms under normal conditions, fluid accumulation, and leakage.
figure 8

Flow, volume, and pressure waveforms are shown during pressure control ventilation. Fluid accumulation presents with irregular oscillations in flow and pressure. Leakage is characterized by the volume waveform failing to return to baseline. These patterns can assist in the automated detection of circuit events.

Table 4 Definitions of labels used in the annotation process

Algorithm development and training

A YOLOv8n-based convolutional neural network (CNN) model was developed to detect respiratory circuit events through image recognition and classification37. The YOLOv8 model architecture builds upon YOLOv5 with three core components: a Backbone for feature extraction using C2f modules with bottleneck blocks and residual connections, a Neck with spatial pyramid pooling fast (SPPF) for multi-scale feature fusion, and a Head for detection and classification tasks. The model processes 640 × 640 RGB input images and outputs multi-scale predictions including bounding box coordinates, confidence scores, and class probabilities after non-maximum suppression (NMS) processing. For training, we used 75 epochs with a batch size of 8 and 640 × 640 input resolution. The optimizer was SGD with momentum (0.937) and L2 weight decay (0.0005), with an initial learning rate of 0.004 adjusted via cosine annealing. A 3-epoch warmup phase (momentum = 0.8) stabilized initial training. The loss function weighted box regression at 0.08 and classification at 0.8. Early stopping (patience = 30 epochs) prevented overfitting while maintaining detection performance. The architecture diagram of the YOLOv8 model is shown in Fig. 9. The training environment consists of Ubuntu 20, CUDA 11.8, and a GeForce RTX 308018.
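
Under the reported settings, a training run with the Ultralytics YOLOv8 interface might look roughly as follows; the dataset configuration file is a placeholder, and the argument names follow the Ultralytics convention rather than the study’s actual scripts, so this is a sketch, not the authors’ code.

```python
from ultralytics import YOLO  # Ultralytics YOLOv8 package

# Sketch of a training run matching the reported settings; "waveforms.yaml"
# is a placeholder dataset config, not a file from the study.
model = YOLO("yolov8n.pt")
model.train(
    data="waveforms.yaml",
    epochs=75,
    batch=8,
    imgsz=640,
    optimizer="SGD",
    lr0=0.004,            # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,  # L2 regularization
    cos_lr=True,          # cosine annealing schedule
    warmup_epochs=3,
    warmup_momentum=0.8,
    patience=30,          # early stopping
    box=0.08,             # box regression loss weight, as reported
    cls=0.8,              # classification loss weight, as reported
    fliplr=0.5,           # horizontal flip probability
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,  # HSV augmentation
    scale=0.5,
    mosaic=1.0,
)
```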

Fig. 9: Architecture diagram of the YOLOv8 model.
figure 9

The network structure consists of three main components: Backbone, Neck, and Head. The Backbone is responsible for feature extraction, using a series of convolutional and deconvolutional layers with residual connections and bottleneck structures to reduce network size and enhance performance. The Conv block includes Conv2d layers, BatchNorm2d layers, and the SiLU activation function. The C2f module processes intermediate feature maps, splitting them into two parts, one of which is directly passed to the final Concat block, while the other is further processed through multiple Bottleneck blocks before concatenation. The Bottleneck block consists of a 1 × 1 convolution layer followed by a 3 × 3 convolution, with skip connections to improve feature extraction and increase the receptive field. The Neck performs multi-scale feature fusion, utilizing the Spatial Pyramid Pooling Fast (SPPF) module for pooling operations at different scales. The output feature maps are concatenated and processed through a convolutional layer. The Head handles the final detection and classification tasks, with the detection head generating bounding box and class predictions, while the classification head uses global average pooling for class probabilities.
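
For readers less familiar with these building blocks, a simplified PyTorch sketch of the Conv, Bottleneck, C2f, and SPPF modules described above is given below; channel counts and other details are illustrative and omit parts of the full Ultralytics implementation.

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Conv block: Conv2d -> BatchNorm2d -> SiLU, as described in the legend."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """1x1 convolution followed by a 3x3 convolution, with an optional skip connection."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = Conv(c, c, 1)
        self.cv2 = Conv(c, c, 3)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C2f(nn.Module):
    """Split features in two; pass one part straight to the concat, process the other
    through n Bottleneck blocks, then concatenate and fuse with a 1x1 convolution."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = Conv(c_in, 2 * self.c, 1)
        self.blocks = nn.ModuleList(Bottleneck(self.c) for _ in range(n))
        self.cv2 = Conv((2 + n) * self.c, c_out, 1)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))
        for block in self.blocks:
            y.append(block(y[-1]))
        return self.cv2(torch.cat(y, dim=1))

class SPPF(nn.Module):
    """Spatial Pyramid Pooling Fast: repeated max-pooling with a fixed kernel,
    concatenated to approximate pooling over multiple receptive fields."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = Conv(c_in, c_hidden, 1)
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)
        self.cv2 = Conv(4 * c_hidden, c_out, 1)

    def forward(self, x):
        x = self.cv1(x)
        p1 = self.pool(x)
        p2 = self.pool(p1)
        p3 = self.pool(p2)
        return self.cv2(torch.cat([x, p1, p2, p3], dim=1))
```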

To address potential imbalance issues, we implemented several mitigation strategies during model training. First, we employed loss weighting by setting cls_pw to 2.0 for classification loss and obj_pw to 1.2 for objectness loss, which helps balance gradient updates across classes. Second, we incorporated comprehensive data augmentation techniques including random horizontal flipping (probability = 0.5), HSV color space perturbations (hsv_h: 0.015, hsv_s: 0.7, hsv_v: 0.4), scale transformations (scale: 0.5), and mosaic augmentation (mosaic: 1.0). These augmentations significantly enhance feature diversity, particularly for minority classes, without requiring explicit oversampling.

During model training, we carefully optimized several hyperparameters to ensure stable convergence and balanced performance. The initial learning rate was set to 0.004 to prevent training oscillations, while the classification loss weight (cls_pw) was increased to 2.0 to address class imbalance. To enhance feature representation, we implemented robust data augmentation techniques including mosaic augmentation (mosaic = 1.0) and HSV color space transformations (hsv_s = 0.7). Additionally, L2 weight decay (weight_decay = 0.0005) was applied to regularize the model and mitigate overfitting.

The input data comprised the annotated waveform images. The training set was randomly partitioned into 70% for model training and 30% for internal validation. The goal of the model was to automatically classify breaths as normal, fluid accumulation, or leakage based on waveform patterns.

To enhance model interpretability, Gradient-weighted Class Activation Mapping (Grad-CAM) was implemented to visualize the decision-making basis of the detection model38. Heatmap generation enabled precise localization of key regions of interest during identification of normal waveforms, fluid-accumulation-like patterns and leakage-like patterns. This visualization approach validated the alignment between the model’s decision logic and clinically relevant waveform features.
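
Conceptually, Grad-CAM weights each feature map of a chosen convolutional layer by the spatially averaged gradient of the class score and sums the weighted maps; the following minimal PyTorch sketch illustrates the procedure on a generic torchvision backbone as a stand-in for the study’s detector.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Minimal Grad-CAM sketch on a stand-in CNN (ResNet-18); the study applied the
# same idea to its YOLOv8-based detector rather than this backbone.
model = models.resnet18(weights=None).eval()
target_layer = model.layer4[-1]

activations, gradients = {}, {}
target_layer.register_forward_hook(
    lambda m, i, o: activations.update(value=o))
target_layer.register_full_backward_hook(
    lambda m, gi, go: gradients.update(value=go[0]))

image = torch.randn(1, 3, 224, 224)            # placeholder input image tensor
scores = model(image)
scores[0, scores.argmax()].backward()          # gradient of the predicted class score

# Channel weights = spatial mean of gradients; weighted sum of activations -> ReLU.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to a [0, 1] heatmap
```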

External validation

The final external validation dataset consisted of waveform tracings from 18 patients who were not included in the training set. The same preprocessing and annotation procedures were applied. A blinded annotation protocol was implemented, where experts performed labeling without access to the algorithm’s results. Performance of the trained algorithm was evaluated against expert visual assessments, considered the reference standard.

Patient characteristics and clinical outcomes

Demographic and clinical data were collected for each participant, including age, sex, reason for intubation, Acute Physiology and Chronic Health Evaluation II (APACHE II), Sequential Organ Failure Assessment (SOFA) scores, ICU and hospital length of stay (LOS), duration of mechanical ventilation, ICU discharge status (survived or deceased), and occurrence of VAE according to the Centers for Disease Control and Prevention-National Healthcare Safety Network (CDC-NHSN)39. ΔPaw, defined as the difference in peak airway pressure between periods with fluid-accumulation-like patterns and periods of normal ventilation, was calculated to evaluate the potential association between these waveform manifestations and airway pressure changes.

Statistical analysis

Descriptive statistics were used to summarize patient characteristics and outcomes. Continuous variables were expressed as median [interquartile range], and categorical variables as frequency (percentage). Model performance was continuously monitored during training and validation through tracking of five key loss functions on the training and validation sets: (1) training bounding box loss, (2) training classification loss, (3) training distribution focal loss, (4) validation bounding box loss, and (5) validation classification loss. The object detection tasks were quantitatively assessed through standard performance metrics: accuracy, precision, recall, F1 score, mean average precision at intersection over union (IoU) threshold 0.50 (mAP@50), and mean average precision at IoU threshold 0.50 to 0.95 (mAP@50-95)18. Patients were stratified into low and high fluid accumulation groups based on the median incidence rate. Group comparisons of ΔPaw and percentage increase in ΔPaw were conducted using either a two-tailed independent samples t-test or the Mann–Whitney U test, depending on normality assessed by the Shapiro–Wilk test. Statistical analyses were performed using IBM SPSS Statistics version 27.0.1. A two-sided p-value <0.05 was considered statistically significant. Effect size r was calculated manually from Z scores and sample size. Graphical representations were performed using GraphPad Prism version 10.1.2.
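
The effect size follows the common convention r = Z/√N; a minimal SciPy sketch is shown below, where the ΔPaw values are hypothetical and the normal approximation for Z ignores tie correction, so it illustrates the calculation rather than reproducing the SPSS output.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical ΔPaw values (cmH2O) for the low- and high-incidence groups.
low = np.array([1, 1, 2, 2, 3, 1, 0, 2])
high = np.array([3, 6, 4, 8, 2, 10, 5, 7])

u_stat, p_value = mannwhitneyu(low, high, alternative="two-sided")

# Normal approximation for Z (no tie correction), then r = Z / sqrt(N).
n1, n2 = len(low), len(high)
mu_u = n1 * n2 / 2
sigma_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u_stat - mu_u) / sigma_u
r = z / np.sqrt(n1 + n2)

print(f"U={u_stat:.1f}, p={p_value:.3f}, Z={z:.2f}, r={r:.2f}")
```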