Abstract
Obstructive Sleep Apnea Hypopnea Syndrome (OSAHS) is a prevalent systemic disorder affecting approximately 1 billion people worldwide, associated with severe outcomes such as sudden death and traffic accidents. Despite its significant impact, OSAHS is frequently underdiagnosed. The current gold standard for assessing OSAHS severity, overnight polysomnography, is both costly and inconvenient. This study aims to develop an automated method for detecting apnea and hypopnea events using a temporal convolutional network (TCN) to improve diagnostic accuracy and reduce computational costs. We introduce a novel Temporal Convolutional Network with a Linearly Scalable Attention Mechanism (ECG-TCN) designed to simultaneously detect both apnea and hypopnea events. The model was trained and validated using the University College Dublin Sleep Apnea Database. The performance of ECG-TCN was evaluated based on per-segment classification accuracy and generalization capacity. The ECG-TCN model achieved an accuracy of 91.6% in per-segment classification, demonstrating superior performance compared to traditional classification models. Additionally, the model exhibited high generalization capacity, indicating its robustness across different datasets. This study presents the first application of a temporal convolutional network with a linearly scalable attention mechanism for the simultaneous detection of apnea and hypopnea events. The ECG-TCN model offers a cost-effective and accurate alternative to traditional diagnostic methods, with the potential to enhance the early detection and management of OSAHS.
Introduction
Obstructive Sleep Apnea Hypopnea Syndrome (OSAHS) is a prevalent symptomatic disorder defined by recurrent episodes of partial or complete airway obstruction during sleep, leading to intermittent hypoxemia and sleep fragmentation. Epidemiological data indicate a global prevalence of approximately 1 billion cases, with regional variations demonstrating that the disorder affects over 50% of the population in certain countries, underscoring its substantial public health burden1. The condition affects 4% of adult males and 2% of adult females, making it more frequent in males than females2. However, among apnea patients, 93% of middle-aged women and 82% of middle-aged men have undetected moderate-to-severe sleep apnea3. Sleep apnea affects children and teenagers as well, with a 3% frequency among preschoolers4. These figures demonstrate the need to identify such patients in the population. According to several studies, sleep apnea triples the risk of heart disease, quadruples the risk of stroke, and dramatically increases the likelihood of traffic accidents5,6,7. Research into detecting OSAHS is therefore critical.
The gold standard approach for determining the severity of sleep apnea is overnight laboratory polysomnography (PSG), which counts the number of apneic and hypopneic events per hour of sleep, known as the apnea-hypopnea index (AHI)8. However, PSG is highly uncomfortable because of the many cumbersome cables and intrusive sensors attached to the subject’s body, and the test is exceedingly expensive to conduct, so it cannot be made widely accessible to the general population9. Furthermore, according to numerous studies10,11, the PSG-based assessment of sleep apnea severity is fundamentally flawed because it relies solely on a single night’s records in an unfamiliar and inherently unpleasant clinical environment, which often fails to reflect the patient’s real-world sleep patterns. Moreover, the number of medical institutions equipped with specialized PSG devices and qualified specialists capable of interpreting PSG results and detecting sleep apnea is relatively small, especially in underdeveloped regions12,13, which creates a significant bottleneck for the timely detection and effective treatment of OSAHS. As such, it is critical to investigate more convenient and cost-effective detection approaches for OSAHS.
Through extensive research, various physiological indicators including electroencephalogram (EEG), electrocardiogram (ECG), and peripheral oxygen saturation (SpO2) have been employed for the identification of OSAHS. However, these detection methods are not without defects: EEG detection is often cumbersome due to the need for multiple electrode placements on the scalp, which can cause discomfort for patients and may lead to signal interference from hair or movement; SpO2 monitoring, while non-invasive, primarily reflects oxygenation changes and lacks direct correlation with the underlying respiratory events, resulting in limited specificity for OSAHS identification. Among these, ECG has been extensively utilized as a fundamental tool and a significant indicator in OSAHS detection14,15. Yet, processing ECG signals for OSAHS detection encounters several notable problems. ECG records the cardiac activity over a given period by capturing electrical signals through external electrodes attached to the skin, but the signals are highly susceptible to noise interference from factors such as muscle movement, electrode displacement, and power line artifacts, which can distort the original waveform and affect the accuracy of subsequent analysis. The resulting ECG signal comprises a sequence of voltage values corresponding to successive occurrences in time, thus reflecting a time series nature inherent in the ECG signal, but this time series data often contains non-stationary components and individual variations in cardiac rhythm, making it challenging to establish universal analytical models that can adapt to different populations.
The temporal convolutional network (TCN) model performs well in time series forecasting and large-scale data processing16, but it still faces the challenge of effectively filtering out noise and adapting to the non-stationary characteristics of ECG signals when applied to OSAHS detection based on ECG data. In this research, we propose a TCN-based model named ECG-TCN for OSAHS detection. Our approach processes the ECG signal through ECG-TCN and maps it to different labels using a linear layer. Causal convolution helps the model capture the context of the sequence, while dilated convolution processes data efficiently and reduces information loss. Additionally, the inclusion of a linearly scalable attention mechanism17 accelerates training and enables the model to handle longer sequences, in line with the requirements of training on a large number of ECG signals. The final experimental results confirm that our model outperforms other standard models in detecting OSAHS. Moreover, two ablation experiments demonstrate the effectiveness of our model and its superiority to TCN.
Our contributions can be highlighted as follows:
1. To the best of our knowledge, this is the first study to use TCN to detect apnea and hypopnea durations.
2. A novel model, ECG-TCN, is proposed. The experimental results show that ECG signals can be classified accurately, and the proposed model can help improve the detection efficiency of OSAHS in primary hospitals.
3. ECG-TCN is a hybrid model that combines TCN with Fast Attention Via Positive Orthogonal Random Feature (FAVOR+), and ablation experiments indicate that this attention mechanism is effective for ECG signal prediction.
The rest of the paper is organized as follows. Section “Related work” summarizes related work on the topic of OSAHS detection and its relevance to deep learning. Section “Materials and methods” describes the dataset and the proposed model in detail. Section “Results” presents the experimental results and analysis. Finally, Section “Discussion” concludes the paper and provides insights for future work.
Related work
OSAHS is a medical term that refers to upper airway collapse during sleep, apnea or hypoventilation induced by blockage, frequent incidence of snoring and sleep structural disturbance, and low blood oxygen saturation. The most effective medical diagnosis approach for OSAHS is PSG18, which can identify the severity of the patient’s condition by monitoring the changes of numerous symptoms of the patient throughout sleep. PSG signals include EEG, eye movement, ECG, electromyogram (EMG), SpO2, mouth and nose airflow signal, and pharyngeal vibration signal19,20. The current detection approach is resource-intensive and inefficient, inhibiting its suitability for widespread implementation. Delayed recognition and treatment of the condition could lead to severe consequences.
In previous research, OSAHS was recognized using several physiological signals such as EEG, ECG, and SpO2, which considerably lowered the detection difficulty21,22. Kuan23 employed height, weight and gender to predict and prevent OSAHS. Waxman24 devised a system for predicting individual bouts of apnea and hypopnea, pointing out that the submental electromyogram is the most relevant signal for apnea prediction. Sharma25 developed a method for detecting sleep apnea based on a patient’s single-lead ECG. However, because of the tiny number of features utilized for classification, it is computationally expensive and thus is not widely employed. Cai26 introduced an ECG classification method that converts one-dimensional ECG data into three-channel pictures, providing a more general idea for ECG classification. Figure 1 visualizes the various signals required in PSG detection technology.
This waveform was drawn with a tool provided by the PhysioNet database, and the PSG signals can be observed in it. The graph is useful for doctors to analyze, but the signals must be converted into usable sequence data for modeling27.
ECG, a technique that captures the electrical activity pattern of the heart during each cardiac cycle, has emerged as a valuable tool in cardiovascular disease detection. ECG signals are represented as a random variable through the arrangement of statistical indices in their sequential occurrence. Previous studies have introduced several approaches for ECG classification. Traditional statistical learning approaches28,29 and machine learning methods30,31 can be used to classify ECG signals. Manish32 developed an innovative and portable OSA-CAD system utilizing a single-channel ECG. This system employs an optimal dual-band filter bank for automatic recognition of obstructive sleep apnea (OSA). Fernando33 conducted a study on automated analysis of nocturnal SpO2 signals, aiming to simplify the diagnosis of OSA. However, the study primarily focused on the automatic detection of apneic events versus normal breathing, without specifically distinguishing between apnea and hypopnea. Mashrur34 proposed a scale graph-based convolutional neural network (SCNN) to detect OSA using single-lead ECG signals. This method also does not distinguish between apnea and hypopnea, which may lead to misjudgment in practical applications. Moreover, these methods are all based on one-dimensional time series modeling, which cannot fully exploit the intrinsic characteristics of ECG signals.
According to a review of existing literature, there is a lack of research on the quick identification of two events: apnea and hypopnea. Additionally, the utilization of multi-channel ECG signals for OSAHS detection is uncommon due to limited access to relevant data. Consequently, proposing and implementing the use of multi-channel ECG signals for OSAHS detection is both necessary and advantageous. In response, we present a concrete scheme to address this issue and evaluate its accuracy and practicality using a publicly available dataset.
Materials and methods
Experimental dataset
The Sleep Apnea Database from St. Vincent’s University Hospital and University College Dublin has been added to PhysioNet27. This database contains 25 full overnight polysomnography recordings from adult participants with suspected sleep-disordered breathing, as well as a simultaneous three-channel Holter ECG. Although the dataset consists of 25 patients, which may be considered limited for drawing broad conclusions, we addressed this issue by segmenting the long sequence data into 900 segments, thereby generating a larger dataset for analysis. This approach enhances the generalizability of our findings. The subjects in this dataset were enrolled in a study conducted over a 6-month period (September 2002 to February 2003) at the Sleep Disorders Clinic in St Vincent’s University Hospital, Dublin. They were referred for potential diagnosis of obstructive sleep apnea, central sleep apnea, or primary snoring. The recordings were collected at the Sleep Disorders Clinic of St. Vincent’s University Hospital under expert supervision. Onset time and duration of respiratory events (obstructive, central, mixed apneas, hypopneas, and periodic breathing episodes) were annotated by the same sleep technologist. The database labeled the events of each sequence, which is suitable for us to predict sequence data and evaluate our results. The study population consisted of 21 males and 4 females, with a mean age of 50 (± 10) years, mean height of 173.3 (± 9.6) cm, mean weight of 95.0 (± 14.7) kg, and mean BMI of 31.6 (± 4.0) kg/m². These demographic characteristics are critical for interpreting the results and ensuring reproducibility. Additionally, the number of apnea/hypopnea events per subject was recorded, which is essential for clinical interpretation and validation of our findings. Figure 2 visualizes a sample of the ECG signals of apnea, hypopnea, and normal respiration.
Data processing
In line with previous studies, most ECG signals required filtering, segmentation, noisy epoch removal, and signal transformation. Generally, continuous wavelet transform (CWT) is used to process non-stationary signals, and empirical mode decomposition (EMD) is used to decompose the signal into eigenmodes24,34. In the present study, our approach focuses on discerning ECG signals within the database based on three distinct annotations: apnea, hypopnea, and normal. Segmentation was performed strictly according to the time intervals marked by clinicians for each event. Due to the inherent variability in event durations, each processed signal corresponds to a unique time interval determined by the clinical annotations, with durations ranging from 15 to 30 s. The database contains three-channel ECG signals sampled at a fixed interval of 0.007812 s. Consequently, the data length for each signal varies based on its annotated duration, with lengths ranging from 1900 to 3840 samples.
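The annotation-driven segmentation described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the `(onset_s, duration_s, label)` tuple layout and the function name are hypothetical, and the sampling interval is assumed to be exactly 1/128 s (the text quotes the rounded value 0.007812 s).

```python
# Assumed exact sampling interval (128 Hz); the text quotes 0.007812 s.
SAMPLE_INTERVAL_S = 0.0078125


def segment_by_annotations(signal, annotations, sample_interval_s=SAMPLE_INTERVAL_S):
    """Cut one labelled segment per annotated event.

    signal      -- list of per-sample values (each value may be a 3-channel tuple)
    annotations -- iterable of (onset_s, duration_s, label); hypothetical layout
    Returns a list of (label, samples) pairs whose lengths follow the
    annotated durations, so segments are variable-length.
    """
    segments = []
    for onset_s, duration_s, label in annotations:
        start = round(onset_s / sample_interval_s)
        stop = start + round(duration_s / sample_interval_s)
        if start < 0 or stop > len(signal):
            continue  # skip events that run past the end of the recording
        segments.append((label, signal[start:stop]))
    return segments
```

Under these assumptions a 30 s event yields 3840 samples, which matches the upper bound of the segment lengths quoted above.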
Architecture of the proposed model
Convolutional neural networks (CNN) are widely utilized in medical natural language processing and have been extended for sequence modeling and prediction35,36. In the area of deep learning, sequence modeling is closely related to recurrent neural network designs such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)37,38. Bai16 demonstrated that this view is out of date and that convolutional networks should be one of the primary candidates when modeling sequence data. They show that convolutional networks outperform RNNs in several tasks while avoiding common flaws of recurrent models, such as exploding or vanishing gradients and limited memory retention. Using a convolutional network can also enhance performance because the output can be computed in parallel. The architecture they proposed is called TCN. As shown in Fig. 3, our model is mainly based on TCN with Fast Attention Via Positive Orthogonal Random Feature.
The overall architecture of ECG-TCN. (1) Left: a dilated causal convolution with Fast Attention Via Positive Orthogonal Random Feature. The receptive field can cover all values from the input sequence. (2) Middle: a residual block. A 1 × 1 convolution is added when the residual input and output have different dimensions. (3) Right: the basic layer, which is a causal convolutional layer.
The convolutions in TCN are causal, which means there is no information “leakage” from the future to the past. This architecture could process sequences of any length and map them to an output sequence of the same length. Furthermore, by utilizing deep networks with residual layers and dilated convolutions, TCN achieves substantial effective history sizes. This makes it particularly effective for predicting longer time series data, such as ECG signals.
After subjecting the three-channel ECG signals to data preprocessing, a natural language processing method is employed, which involves performing word embedding to transform the signals into fixed-length numerical sequences. Specifically, each channel of the ECG signal is mapped to a vector of length 2600, chosen as the median length to standardize the data, and the three channels are combined into a 2600 × 3 matrix. This matrix, with a batch size of 16, is then fed into the model for further analysis and interpretation. Figure 3 shows our final ECG-TCN model with \(l\) equal to input length, \(k\) equal to kernel size, \(b\) equal to dilation base, \(k > b\), and the minimum number of residual blocks with full history coverage \(n\), where \(n\) can be calculated from the above parameters. By leveraging word embedding, the ECG signals are effectively structured into a format suitable for the model, enabling efficient feature extraction and analysis.
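As a rough illustration of how \(n\) might follow from \(l\), \(k\) and \(b\), the sketch below assumes each residual block holds two dilated causal convolutions and that block \(i\) uses dilation \(b^{i}\), a common TCN layout that the text does not spell out. The helper `fix_length` is likewise hypothetical: it standardizes a channel to 2600 samples by truncation or zero padding, which is only one possible reading of the length-standardization step. The values \(k = 3\) and \(b = 2\) in the usage note are illustrative, not the authors' settings.

```python
def receptive_field(n_blocks, kernel_size, dilation_base, convs_per_block=2):
    """Number of input steps visible to the last output step.

    Assumes block i applies `convs_per_block` causal convolutions with
    dilation dilation_base ** i (an assumption, not stated in the text).
    """
    span = sum(dilation_base ** i for i in range(n_blocks))
    return 1 + convs_per_block * (kernel_size - 1) * span


def min_residual_blocks(input_length, kernel_size, dilation_base, convs_per_block=2):
    """Smallest n whose receptive field covers the whole input sequence."""
    if kernel_size < 2:
        raise ValueError("kernel_size must be at least 2 to look back in time")
    n = 0
    while receptive_field(n, kernel_size, dilation_base, convs_per_block) < input_length:
        n += 1
    return n


def fix_length(channel, target_length=2600, pad_value=0.0):
    """Truncate or right-pad one channel to target_length samples (hypothetical helper)."""
    trimmed = list(channel[:target_length])
    return trimmed + [pad_value] * (target_length - len(trimmed))
```

With these illustrative settings, `min_residual_blocks(2600, 3, 2)` gives 10 blocks, since nine blocks would see only 2045 steps while ten see 4093.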
To make ECG-TCN more than merely an unnecessarily sophisticated linear regression model, activation functions must be placed on top of the convolutional layers to introduce nonlinearities. ReLU activations follow each of the two convolutional layers in the residual block. Weight normalization is applied to each convolutional layer to normalize the input to the hidden layer (which prevents problems such as exploding gradients). Regularization is introduced after each convolutional layer of each residual block to prevent overfitting. The model’s normalization strategy is the weight norm, which is better suited to sequence problems than the typical pooling layer. To achieve our classification goal, we first flatten the multidimensional matrix into a two-dimensional tensor using the flatten layer, and then use the linear layer to map it to the category label.
The following subsections introduce the different parts of the model:
Causal convolutions
TCN is based on two key tenets: (1) TCN uses a one-dimensional fully convolutional network (FCN) architecture, where each hidden layer is the same length as the input layer and zero padding of length (kernel size - 1) is added to keep subsequent layers the same length as previous layers. (2) No information leaks from the future into the past. With causal convolution, the output at time \(t\) is convolved only with elements from time \(t\) and earlier in the previous layer. Essentially, TCN is designed as a one-dimensional FCN with causal convolution.
Dilated convolutions
A simple causal convolution can only look back at a history whose size is linear in the depth of the network. To handle tasks that require a longer history, we use dilated convolutions to achieve exponentially large receptive fields39.
Formula 1 is a standard discrete convolution calculation in which \(F\) denotes a discrete function, \(K\) denotes a discrete filter, \(s\) denotes the step size, and \(k\) is the size of the convolution kernel. Formula 2 is the dilated convolution calculation, which widens the convolution kernel by inserting gaps between its elements. The dilation parameter \(l\) (hole rate) defines how far apart the kernel elements are spread. When \(l = 1\), the dilated convolution reduces to a standard convolution. As \(l\) grows, the network obtains a wider receptive field at the same computational cost.
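A dilated causal convolution of the kind described in the two subsections above can be sketched in plain Python for a single channel. The form \(y_t = \sum_{i=0}^{k-1} w_i \, x_{t - l \cdot i}\), with the input treated as zero before \(t = 0\), is one common way to write it and is an assumption here, since the formulas themselves are not reproduced in this text. The `dilation` argument plays the role of the dilation rate \(l\).

```python
def dilated_causal_conv1d(x, kernel, dilation=1):
    """Single-channel dilated causal convolution.

    y[t] = sum_i kernel[i] * x[t - dilation * i], with x taken as 0 before t = 0.
    Left zero padding of (len(kernel) - 1) * dilation keeps the output the same
    length as the input and guarantees y[t] never depends on x[t + 1:].
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    return [
        sum(kernel[i] * padded[t + pad - dilation * i] for i in range(k))
        for t in range(len(x))
    ]
```

With `dilation=1` this is an ordinary causal convolution. Raising the dilation widens the span each output sees without adding kernel weights, which is why the cost per output stays the same.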
Residual connections
Given that the receptive field of TCN is contingent upon the network’s depth, convolution kernel size \(k\), and dilation factor \(d\), residual connections become critical to ensure stability when dealing with deeper and larger TCNs. For instance, in scenarios where predictions rely on sizable input sequences of historical and high-dimensional data, with a dimension of \(2 \times 12\), architectures comprising up to 12 layers may be indispensable. Each individual layer within this TCN framework comprises multiple convolutional kernels dedicated to extracting salient features. Notably, instead of employing a conventional convolutional layer, a generic residual module is utilized as a key component within the TCN model.
Fast attention via positive orthogonal random feature
Unlike the traditional self-attention mechanism, the attention mechanism we use is an approximate one. It relies on a low-rank approximation of self-attention, which reduces the complexity of the attention calculation from \(O\left({L}^{2}\right)\) to \(O\left(L\right)\), as shown in Fig. 4, where \(L\) is the length of the input sequence. This low-rank decomposition computes attention weights more efficiently and reduces the computational burden, so computational efficiency improves greatly while the performance of the model is maintained.
The overall architecture of Fast Attention Via Positive Orthogonal Random Feature. Left: standard attention module computation, which computes the final expected result by matrix multiplication using the attention matrix A and value tensor V. Right: by decoupling the matrices Q′ and K′ used in the low-rank decomposition of A, and performing the matrix multiplications in the order indicated in the dashed box, we obtain linear attention without the need to explicitly construct A or its approximation.
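The reordering shown in the right-hand panel can be checked numerically with a small pure-Python sketch. It is a simplified, bidirectional illustration rather than the authors' implementation: the positive feature map \(\phi(x)_j = \exp(w_j \cdot x - \lVert x \rVert^2 / 2) / \sqrt{m}\) uses plain Gaussian projections `w`, and the orthogonalization step that gives FAVOR+ its name is omitted. All function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def positive_features(x, w):
    """Map each row of x to m strictly positive random features."""
    m = len(w)
    feats = []
    for row in x:
        half_sq_norm = sum(v * v for v in row) / 2.0
        feats.append([
            math.exp(sum(wi * v for wi, v in zip(w_j, row)) - half_sq_norm) / math.sqrt(m)
            for w_j in w
        ])
    return feats


def attention_quadratic(q_feat, k_feat, v):
    """Builds the L x L matrix A = Q'K'^T explicitly: O(L^2) time and memory."""
    a = matmul(q_feat, transpose(k_feat))
    out = matmul(a, v)
    return [[o / sum(a_row) for o in o_row] for o_row, a_row in zip(out, a)]


def attention_linear(q_feat, k_feat, v):
    """Computes Q'(K'^T V) instead, never forming A: O(L) in sequence length."""
    kv = matmul(transpose(k_feat), v)            # m x d summary of keys and values
    k_sum = [sum(col) for col in zip(*k_feat)]   # column sums used for normalization
    numer = matmul(q_feat, kv)
    denom = [sum(qi * si for qi, si in zip(row, k_sum)) for row in q_feat]
    return [[n / d for n in n_row] for n_row, d in zip(numer, denom)]
```

Both functions return the same values up to floating-point rounding. Only the multiplication order differs, and that reordering is what removes the quadratic dependence on \(L\).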
Results
Evaluation metrics
Accuracy, precision, recall, F1-score, and the confusion matrix are the evaluation metrics employed. These metrics are crucial for assessing the performance of classification models in machine learning. Accuracy is defined as the proportion of correct predictions to total predictions. It provides a general sense of how often the model is correct across all classes. Precision is used to analyze the validity of the results. It is calculated as the ratio of true positives (TP) to the sum of true positives and false positives (FP), as shown in Formula 3. Precision is particularly important in scenarios where false positives are costly, such as in medical diagnoses. A high precision indicates that the model is good at identifying relevant instances, but it does not account for false negatives. Recall, also known as sensitivity, measures the completeness of the results as the ratio of true positives to the sum of true positives and false negatives (FN), as shown in Formula 4. Recall is crucial in scenarios where missing a positive instance is detrimental, such as in fraud detection. A high recall means the model captures most of the relevant instances, but it may also include more false positives. F1-score is the harmonic mean of precision and recall, as shown in Formula 5. It balances the trade-off between precision and recall, providing a single metric that considers both false positives and false negatives. The F1-score is particularly useful when the class distribution is imbalanced.
The output of a classifier is summarized using a \(k \times k\) confusion matrix, where \(k\) is the number of classes. Each row represents the true category of the data, and the row total gives the number of instances that actually belong to that category. Each column represents a predicted category, and the column total gives the number of instances the model assigned to that category. The entry in a given row and column is therefore the number of instances of the row’s true class that were predicted as the column’s class.
This paper is the first study to use a convolutional network to distinguish apnea from hypopnea in addition to normal breathing, so our confusion matrix is a 3-by-3 matrix whose categories are Apnea, Hypopnea, and Normal.
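The metrics above can be derived directly from a confusion matrix laid out as described, with rows as true classes and columns as predicted classes. The sketch below is illustrative only: the function name is hypothetical, and the counts in the usage note are made up rather than taken from this study.

```python
def per_class_metrics(confusion, labels):
    """Accuracy plus per-class precision, recall and F1 from a k x k matrix.

    confusion[i][j] = number of instances of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    report = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                  # rest of the true-class row
        fp = sum(row[i] for row in confusion) - tp   # rest of the predicted column
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

For example, calling `per_class_metrics([[8, 2, 0], [1, 7, 2], [0, 1, 9]], ["Apnea", "Hypopnea", "Normal"])` on these made-up counts gives an accuracy of 0.8 and a Hypopnea precision, recall and F1 of 0.7 each.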
Experiment analysis
To evaluate the efficacy of the methodology used in this study, several classification model comparison experiments, including those using the LSTM, Transformer, and TCN models, are set up in this section. To validate the effectiveness of the Fast Attention Via Positive Orthogonal Random Feature and TCN combination in our model, we added a comparative experiment that combines Fast Attention Via Positive Orthogonal Random Feature and TCN in another way (in series), referred to as TCN-ATTENTION. We compare the model used in this study with the other models when classifying different events (Hypopnea, Apnea and Normal) during sleep.
Table 1 displays the accuracy of each model. ECG-TCN reaches an accuracy of 91.6%, the highest among the models compared. Among the remaining models, LSTM has the highest accuracy at 89.00%. The Transformer model with its attention mechanism reaches only 73.90%, the TCN model reaches 79.60%, and TCN-ATTENTION, which also combines TCN and Fast Attention Via Positive Orthogonal Random Feature, reaches only 82.60%. This demonstrates that our technique is effective in identifying and detecting OSAHS.
The F1-scores of the model in this study are 87.90%, 88.20%, and 98.40% for Hypopnea, Apnea, and Normal events, respectively, showing that our model performs well in terms of classification accuracy and breadth (as demonstrated in Table 1). It should be noted that recall reflects the model’s thoroughness, whereas precision reflects its exactness. Because the F1-score is the harmonic mean of the two, it is high only when precision and recall are both high and close to each other, so it gives a more balanced judgment of the model’s strengths and flaws.
Ablation experiments analyzed the effectiveness of the core components. While the basic TCN model maintained a high F1-score of 98.70% for Normal events, its classification capabilities for Hypopnea and Apnea events were insufficient, with F1-scores of only 63.30% and 74.40%, respectively. The TCN-ATTENTION model, which introduced a tandem attention mechanism, improved the Hypopnea F1-score by 10.5 percentage points to 73.80%, but improved the Apnea F1-score by only 1.7 percentage points to 76.10%, demonstrating limited optimization effectiveness. ECG-TCN achieves a significant performance improvement through a superior Fast Attention and TCN combination strategy: compared to the basic TCN, the Hypopnea F1-score increases by 24.6 percentage points to 87.90%, and the Apnea F1-score increases by 13.8 percentage points to 88.20%. While maintaining a high F1-score of 98.40% for Normal events, the overall accuracy reaches 91.60%, outperforming the other models and supporting the effectiveness of the proposed combination strategy.
Model efficiency analysis shows that TCN, thanks to its convolutional parallel computing architecture, is the most efficient model, achieving benchmark performance in terms of parameter count, memory usage, and training/inference time. Transformer, due to its self-attention mechanism’s high O(n²) computational complexity and numerous redundant parameters, is the least efficient model. It has approximately three times the number of parameters, five times the training time, and four times the memory usage of TCN, making it difficult to meet the demands of real-time scenarios. In contrast, ECG-TCN achieves significant performance improvements while maintaining balanced efficiency: The number of trainable parameters is only 1.2 times that of TCN, the training time is approximately 1.1 times, and the inference time and memory usage are 1.05 times and 1.1 times, respectively. Through its optimized Fast Attention mechanism, the model effectively controls computational overhead, achieving an optimal balance between accuracy and efficiency.
Figure 5 illustrates the confusion matrix for each model, showcasing their respective recognition performance. All models perform well in correctly identifying the Normal label. LSTM demonstrates strong performance in classifying both the Apnea and Normal labels. However, for the Hypopnea label, LSTM achieves only an 80.05% accuracy rate. In comparison, TCN-ATTENTION’s overall classification effect falls short of TEXT-CNN. In particular, TCN’s accuracy in categorizing the Hypopnea label is merely 55.7%, indicating weaker performance. Furthermore, the Transformer model achieves the lowest overall accuracy, suggesting that it is poorly suited to ECG signal recognition and categorization. In contrast, our ECG-TCN model achieves an accuracy of 91.6%, surpassing all other models, and it is especially strong at classifying the challenging Hypopnea label.
This study demonstrated the advantages of the ECG-TCN model through sleep event classification experiments. In terms of performance, ECG-TCN ranked first with an overall accuracy of 91.60% and an average F1-score of 91.50% across the three event categories. It was particularly effective in classifying the clinically critical hypopnea and apnea events, achieving F1-scores of 87.90% and 88.20%, respectively, exceeding those of the LSTM, Transformer, basic TCN, and TCN-ATTENTION models. Ablation studies confirmed that its core advantage stems from its optimized combination of Fast Attention and TCN, which achieves more effective feature fusion than a simpler cascaded attention mechanism. In terms of efficiency, ECG-TCN adds only slight computational and storage overhead relative to TCN while outperforming the more complex Transformer and the less powerful LSTM. It strikes a balance between classification accuracy and practical efficiency, providing an efficient and reliable solution for the automated monitoring of sleep respiratory events.
Discussion
PSG is the gold standard for diagnosing OSAHS, but conducting PSG tests for neurologically impaired patients in environments with electrical interference, such as stroke wards, is challenging. Additionally, the number of stroke patients far exceeds the availability of PSG equipment and specialized medical personnel. A novel convolutional deep learning architecture has been proposed40 that reduces the temporal resolution of raw waveform data to extract key features and detects OSAHS events in the monitoring records of unscreened stroke ward patients. However, because it focuses primarily on stroke patients, its scope of applicability is narrow. In previous research, Cen41 employed deep learning techniques, specifically CNN, as a feature detector to learn the interrelationships between visible data and labels. The study connected the final fully connected layer of the CNN to the output layer for sleep apnea event classification. However, the experimental results indicated modest accuracy rates of 53.61% and 66.24% for hypopnea and apnea, respectively, suggesting limited practical applicability. Mashrur34 developed a new scale graph-based CNN model to identify obstructive sleep apnea. A signal conversion pipeline was also created to transform ECG signals into a form suitable for OSA detection. Although its accuracy reached 81.86%, it identifies only OSA, which limits its practical use. A recent study combined a CNN and a structured state-space model to detect sleep apnea from ECG spectrograms42, overcoming the high computational complexity and memory consumption of traditional methods when processing long-term data series. The method achieved high accuracy on the Apnea-ECG dataset. However, its drawbacks include its reliance on specific datasets, limited generalization, and underutilization of multimodal data.
In contrast, our research focuses on identifying apnea and hypopnea events within each segment, demonstrating greater generalization and adaptability to diverse environments and populations and providing a more robust and flexible solution for practical applications. In this study, we integrate the local and contextual components of sequences with an attention mechanism to produce global sequence features for determining sequence relatedness. Experimental results show that the model outperforms LSTM, TCN, Transformer and TCN-ATTENTION in terms of classification accuracy and performance stability. It also achieves a higher accuracy rate and a higher generalization capacity than classic classification models.
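The idea behind a linearly scalable attention mechanism can be illustrated with a minimal sketch. It assumes a simple positive feature map, phi(x) = elu(x) + 1, in place of the random-feature map used by Performers17, and it is not the ECG-TCN implementation; the function names and toy inputs are hypothetical. The sketch computes the same kernelized attention in two ways: once by summarizing all keys and values a single time, and once through the explicit pairwise weight matrix as a reference.

```python
import math


def feature_map(vec):
    """Positive feature map phi(x) = elu(x) + 1 (an assumed, simple choice)."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def linear_attention(queries, keys, values):
    """Kernelized attention in O(L): keys and values are summarized once."""
    d_feat, d_val = len(keys[0]), len(values[0])
    kv = [[0.0] * d_val for _ in range(d_feat)]  # sum_j phi(k_j) v_j^T
    k_sum = [0.0] * d_feat                       # sum_j phi(k_j)
    for k, v in zip(keys, values):
        pk = feature_map(k)
        for a in range(d_feat):
            k_sum[a] += pk[a]
            for b in range(d_val):
                kv[a][b] += pk[a] * v[b]
    outputs = []
    for q in queries:
        pq = feature_map(q)
        norm = sum(pq[a] * k_sum[a] for a in range(d_feat))
        outputs.append([
            sum(pq[a] * kv[a][b] for a in range(d_feat)) / norm
            for b in range(d_val)
        ])
    return outputs


def quadratic_attention(queries, keys, values):
    """Same kernel attention via explicit L x L weights (reference, O(L^2))."""
    outputs = []
    for q in queries:
        pq = feature_map(q)
        weights = [sum(a * b for a, b in zip(pq, feature_map(k))) for k in keys]
        total = sum(weights)
        outputs.append([
            sum(w * v[b] for w, v in zip(weights, values)) / total
            for b in range(len(values[0]))
        ])
    return outputs


# Toy sequence of 4 time steps with 2-dimensional queries, keys and values.
Q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.8], [0.0, 0.0]]
K = [[0.2, 0.1], [-0.3, 0.4], [0.6, -0.5], [0.1, 0.9]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

fast = linear_attention(Q, K, V)
slow = quadratic_attention(Q, K, V)
max_diff = max(abs(a - b) for rf, rs in zip(fast, slow) for a, b in zip(rf, rs))
print(f"largest difference between the two computations: {max_diff:.2e}")
```

Because the key-value summary is built once and reused for every query, the cost grows linearly with the number of time steps rather than quadratically, which matters for long ECG segments; the two computations agree up to floating-point error.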
One of the limitations of our model stems from the availability of data. The performance of any model greatly relies on the quality and quantity of training data. If the dataset is limited in size or lacks diversity, our model may struggle to capture the full range of variations and patterns in the ECG signal, leading to subpar predictive performance. Furthermore, the presence of noise and interference in the ECG signals presents another constraint, potentially impacting the accuracy of our predictions. To address these limitations, our future research will prioritize advancements in data collection techniques and the implementation of robust preprocessing methods, aiming to enhance the overall performance and efficacy of our model.
This study demonstrates that OSAHS can be diagnosed and detected more reliably and conveniently, which has significant clinical implications. This will help save resources and time, allow patients to be diagnosed and treated earlier, and relieve the burden on doctors and practitioners. This research can also be extended in several directions. For instance, integrated with portable mobile devices and sensor technology, our model could enable patients to detect OSAHS conveniently without a hospital visit. Furthermore, ECG signal detection technology can be extended to the diagnosis of other diseases, facilitating the use of artificial intelligence technologies in the medical sector.
Conclusion
In this research, we present a temporal convolutional network with a linearly scalable attention mechanism (ECG-TCN). To validate the effectiveness of ECG-TCN, we conducted experiments on the University College Dublin Sleep Apnea Database. The results demonstrate that our ECG-TCN model achieves an accuracy of 91.6% in per-segment classification. Compared with previous classification models, our proposed model exhibits superior evaluation metrics, indicating significant improvements in the accuracy of OSAHS identification and detection. This advance in the diagnosis and detection of OSAHS offers enhanced reliability and convenience, enabling earlier-stage diagnosis and treatment for patients while reducing the burden on healthcare professionals.
Data availability
The St. Vincent’s University Hospital/University College Dublin Sleep Apnea Database can be accessed via PhysioNet at the following link: https://physionet.org/content/ucddb/1.0.0/.
References
Benjafield, A. V. et al. Estimation of the global prevalence and burden of obstructive sleep apnoea: a literature-based analysis. Lancet Respiratory Med. 7, 687–698 (2019).
Heinzer, R., Marti-Soler, H. & Haba-Rubio, J. Prevalence of sleep apnoea syndrome in the middle to old age general population. Lancet Respiratory Med. 4, e5–e6 (2016).
Young, T., Evans, L., Finn, L. & Palta, M. Estimation of the clinically diagnosed proportion of sleep apnea syndrome in middle-aged men and women. Sleep 20, 705–706 (1997).
Gislason, T. & Benediktsdottir, B. Snoring, apneic episodes, and nocturnal hypoxemia among children 6 months to 6 years old: an epidemiologic study of lower limit of prevalence. Chest 107, 963–966 (1995).
Ancoli-Israel, S. et al. The relationship between congestive heart failure, sleep apnea, and mortality in older men. Chest 124, 1400–1405 (2003).
Vgontzas, A. N. et al. Sleep apnea and daytime sleepiness and fatigue: relation to visceral obesity, insulin resistance, and hypercytokinemia. J. Clin. Endocrinol. Metab. 85, 1151–1158 (2000).
Malhotra, A. & White, D. P. Obstructive sleep apnoea. Lancet 360, 237–245 (2002).
American Academy of Sleep Medicine Task Force. Sleep-related breathing disorders in adults: recommendations for syndrome definition and measurement techniques in clinical research. The report of an American Academy of Sleep Medicine Task Force. Sleep 22, 667 (1999).
de Chazal, P., Penzel, T. & Heneghan, C. Automated detection of obstructive sleep apnoea at different time scales using the electrocardiogram. Physiol. Meas. 25, 967 (2004).
Javaheri, S. et al. Sleep apnea: types, mechanisms, and clinical cardiovascular consequences. J. Am. Coll. Cardiol. 69, 841–858 (2017).
Kushida, C. A. et al. Practice parameters for the indications for polysomnography and related procedures: an update for 2005. Sleep 28, 499–523 (2005).
Hillman, D. R., Murphy, A. S., Antic, R. & Pezzullo, L. The economic cost of sleep disorders. Sleep 29, 299–305 (2006).
AlGhanim, N., Comondore, V. R., Fleetham, J., Marra, C. A. & Ayas, N. T. The economic impact of obstructive sleep apnea. Lung 186, 7–12 (2008).
Mostafa, S. S., Mendonça, F., Ravelo-García, A. G. & Morgado-Dias, F. A systematic review of detecting sleep apnea using deep learning. Sensors 19, 4934 (2019).
Lin, C. Y., Wang, Y. W., Setiawan, F., Trang, N. T. H. & Lin, C. W. Sleep apnea classification algorithm development using a machine-learning framework and bag-of-features derived from electrocardiogram spectrograms. J. Clin. Med. 11, 192 (2021).
Bai, S., Kolter, J. Z. & Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271 (2018).
Choromanski, K. et al. Rethinking attention with performers. arXiv preprint arXiv:2009.14794 (2020).
Yang, J. et al. The PSG challenge: towards comprehensive scene understanding. Natl. Sci. Rev. 10, nwad126 (2023).
Chesson, A. L. et al. Practice parameters for the indications for polysomnography and related procedures. Sleep 20, 406–422 (1997).
Haidar, R., Koprinska, I. & Jeffries, B. Sleep apnea event detection from nasal airflow using convolutional neural networks, in: Neural Information Processing: 24th International Conference, ICONIP 2017, Guangzhou, China, November 14–18, 2017, Proceedings, Part V, Springer, pp. 819–827 (2017).
Bsoul, M., Minn, H. & Tamil, L. Apnea medassist: real-time sleep apnea monitor using single-lead ECG. IEEE Trans. Inf Technol. Biomed. 15, 416–427 (2010).
Nguyen, H. D., Wilkins, B. A., Cheng, Q. & Benjamin, B. A. An online sleep apnea detection method based on recurrence quantification analysis. IEEE J. Biomedical Health Inf. 18, 1285–1293 (2013).
Kuan, Y. C. et al. Logistic regression and artificial neural network-based simple predicting models for obstructive sleep apnea by age, sex, and body mass index. Math. Biosci. Eng. 19, 11409–11421 (2022).
Waxman, J. A., Graupe, D. & Carley, D. W. Automated prediction of apnea and hypopnea, using a LAMSTAR artificial neural network. Am. J. Respir. Crit Care Med. 181, 727–733 (2010).
Sharma, H. & Sharma, K. An algorithm for sleep apnea detection from single-lead ECG using Hermite basis functions. Comput. Biol. Med. 77, 116–124 (2016).
Cai, H., Xu, L., Xu, J., Xiong, Z. & Zhu, C. Electrocardiogram signal classification based on mix time-series imaging. Electronics 11, 1991 (2022).
Goldberger, A. L. et al. PhysioBank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. Circulation 101, e215–e220 (2000).
Yang, W., Si, Y., Wang, D. & Zhang, G. A novel method for identifying electrocardiograms using an independent component analysis and principal component analysis network. Measurement 152, 107363 (2020).
Venkatesh, N. & Jayaraman, S. Human electrocardiogram for biometrics using DTW and FLDA, in: 2010 20th International Conference on Pattern Recognition, IEEE, pp. 3838–3841 (2010).
Zhu, Q. et al. Automatic kidney stone identification: an adaptive feature-weighted LSTM model based on urine and blood routine analysis. Urolithiasis 52, 145 (2024).
Kumari, L. et al. Classification of ECG beats using optimized decision tree and adaptive boosted optimized decision tree. Signal. Image Video Process. 16, 695–703 (2022).
Sharma, M., Raval, M. & Acharya, U. R. A new approach to identify obstructive sleep apnea using an optimal orthogonal wavelet filter bank with ECG signals. Inf. Med. Unlocked. 16, 100170 (2019).
Vaquerizo-Villar, F. et al. Deep-learning model based on convolutional neural networks to classify apnea–hypopnea events from the oximetry signal, in: Advances in the Diagnosis and Treatment of Sleep Apnea: Filling the Gap Between Physicians and Engineers, Springer, pp. 255–264 (2022).
Mashrur, F. R., Islam, M. S., Saha, D. K., Islam, S. R. & Moni, M. A. Scalogram-based convolutional neural network to detect obstructive sleep apnea using single-lead electrocardiogram signals. Comput. Biol. Med. 134, 104532 (2021).
Mao, C., Zhu, Q., Chen, R. & Su, W. Automatic medical specialty classification based on patients’ description of their symptoms. BMC Med. Inf. Decis. Mak. 23, 1–9 (2023).
Ma, T. et al. T-BERTSum: topic-aware text summarization based on BERT. IEEE Trans. Comput. Soc. Syst. 9, 879–890 (2021).
López Seguí, F. et al. Teleconsultations between patients and healthcare professionals in primary care in Catalonia: the evaluation of text classification algorithms using supervised machine learning. Int. J. Environ. Res. Public Health 17, 1093 (2020).
Murad, A. & Pyun, J. Y. Deep recurrent neural networks for human activity recognition. Sensors 17, 2556 (2017).
Wang, Y., Hu, S., Wang, G., Chen, C. & Pan, Z. Multi-scale dilated convolution of convolutional neural network for crowd counting. Multimed. Tools Appl. 79, 1057–1073 (2020).
Bernardini, A., Brunello, A., Gigli, G. L., Montanari, A. & Saccomanno, N. An approach to the automatic identification of obstructive sleep apnea events based on deep learning. Artif. Intell. Med. 118, 102133 (2021).
Cen, L., Yu, Z. L., Kluge, T. & Ser, W. Automatic system for obstructive sleep apnea events detection using convolutional neural network, in: 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, pp. 3975–3978 (2018).
Zan, H. ModelS4Apnea: leveraging structured state space models for efficient sleep apnea detection from ECG signals. Physiol. Meas. (2025).
Acknowledgements
We appreciate the public database provider and maintenance staff. We also acknowledge the valuable insights provided by all reviewers.
Funding
Not applicable.
Author information
Contributions
Study design (LC, RS); data collection (LC, JB); data analysis and interpretation (LC, AL, SF); drafting the manuscript (LC, JB); manuscript revision (LC, SF, RS). All authors gave final approval and agreed to be accountable for all aspects of the work ensuring integrity and accuracy.
Ethics declarations
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Consent to participate
Not applicable.
Human and animal ethics
Not applicable.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Cheng, L., Bai, J., Liu, A. et al. Automated OSAHS detection from ECG using temporal convolutional network. Sci Rep 15, 35915 (2025). https://doi.org/10.1038/s41598-025-19833-7