An explainable deep learning framework for trustworthy arrhythmia detection from ECG signals

Talukder, Md. Alamin; Talaat, Amira Samy; Muna, Nusrat Jahan; Alazab, Ammar; Kazi, Mohsin; Das, Utpal Kanti

doi:10.1038/s41598-025-22986-0

Article
Open access
Published: 11 November 2025

Scientific Reports volume 15, Article number: 39496 (2025)

Abstract

Cardiovascular diseases (CVDs) constitute a foremost global health challenge, with cardiac arrhythmias significantly increasing both mortality and morbidity. Early and precise detection of these arrhythmias from Electrocardiogram (ECG) signals is paramount but inherently complex due to the vast volume, diverse characteristics and variability of ECG data. While Deep Learning (DL) models offer transformative potential for automated ECG analysis, their widespread clinical adoption is hindered by issues such as susceptibility to overfitting, high computational demands and a notable lack of interpretability, resulting in black-box systems. This paper presents an explainable DL framework for accurate and reliable arrhythmia detection. Our innovative approach integrates advanced DL architectures, specifically Convolutional Neural Network (CNN) and Dense Neural Network (DNN), within a sophisticated multi-stage pipeline. This pipeline encompasses meticulous data preparation, state-of-the-art signal preprocessing and robust multi-strategy data balancing techniques, including ADASYN, SMOTE, SMOTETomek and Random Over-Sampling (ROS), to maximize model performance and generalization. Crucially, the framework incorporates Explainable Artificial Intelligence (XAI) methodologies, namely SHAP, LIME and Feature Importance Analysis (FIA), to provide transparent insights into the model’s decision-making process. Rigorous evaluation on the benchmark ECG datasets MITDB, PTBDB and NSTDB demonstrates superior classification accuracy, with our ROS+CNN model achieving 99.74%, 99.43% and 99.98%, respectively. The embedded XAI components offer actionable interpretability, fostering clinical trust and paving the way for more reliable and impactful AI-driven cardiovascular diagnostics.

Introduction

Cardiovascular diseases (CVDs) represent the foremost global health challenge, responsible for approximately 17.9 million deaths in 2019, accounting for 32% of all global fatalities¹. About 85% of these deaths are attributed to heart attacks and strokes, highlighting the critical impact of cardiac conditions. Alarmingly, over 75% of CVD-related mortalities occur in low- and middle-income countries, exacerbating global health disparities². Moreover, CVDs contribute to 38% of premature deaths among individuals under 70 years of age due to non-communicable diseases³. Cardiac arrhythmias, including ventricular flutter and ventricular fibrillation, are closely linked to the onset of CVDs and can lead to severe events such as sudden death, hemodynamic collapse and cardiac arrest⁴. Therefore, the timely and precise detection of these arrhythmias is paramount for preventing life-threatening outcomes. Electrocardiogram (ECG) signals are an indispensable non-invasive diagnostic tool for monitoring heart health and identifying cardiovascular abnormalities⁵. An ECG records the electrical activity of the heart, with a typical waveform comprising P waves, QRS complexes and T waves. The QRS complex, in particular, is critical for arrhythmia detection, as abnormalities in this segment often indicate irregular heart rhythms⁶. Despite their diagnostic utility, the inherent volume, complexity and variability of ECG data, influenced by subject differences, time and environmental conditions, pose significant challenges for accurate analysis and interpretation by healthcare practitioners.

Traditional ECG monitoring methods, including Holter monitors, telemetry and episodic monitors, face significant challenges in the detection of arrhythmias⁷. These methods often involve manual interpretation, which can cause delays in diagnosis and they also rely heavily on patient compliance. Furthermore, their use is typically limited to clinical settings, making them less effective for long-term or remote monitoring⁸. For example, although wearable ECG devices improve monitoring convenience, traditional systems still struggle to handle the variability of ECG signals across patients and environments⁹. Implantable devices offer continuous monitoring for arrhythmias but are invasive and not suitable for all patients⁷. However, recent innovations in deep learning (DL), notably Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs), have presented a powerful avenue to address these challenges. These models excel at processing complex, high-dimensional data and are adept at capturing subtle temporal patterns within ECG signals. Their ability to automatically extract meaningful features from raw data has demonstrably improved diagnostic accuracy in various medical applications, including arrhythmia detection and classification¹⁰. By reducing manual interpretation, these models enhance diagnostic accuracy, enabling continuous monitoring even outside clinical environments. Furthermore, DL-powered systems can help monitor and analyze ECG signals more efficiently, addressing issues related to noise, data complexity and the variability of ECG signals across different patients¹¹. This shift towards automated analysis is making ECG monitoring more accessible, cost-effective and adaptable to non-clinical settings, offering significant improvements over traditional methods⁷.

Despite the advancements made in DL for ECG analysis, generalization and robustness continue to be significant challenges. Many models tend to overfit on specific datasets, which compromises their ability to perform well on diverse or unseen ECG data, a key concern for real-world applicability¹². For instance, while noise elimination techniques have improved, their ability to generalize across various types of noise and signal distortions remains a challenge. Traditional methods often fail to handle multi-source noise effectively, whereas DL models can adapt but still face limitations in noisy environments¹³. Moreover, computational complexity is a major hurdle for advanced DL models, such as hybrid LSTM-CNNs, which are highly accurate but require significant computational resources¹⁴. This makes them less suited for real-time applications or use on resource-constrained devices, such as mobile or wearable ECG monitors. Additionally, dataset balancing techniques like SMOTE (Synthetic Minority Over-sampling Technique) help mitigate class imbalance but sometimes introduce synthetic data that may not accurately reflect real-world variations, leading to potential biases in the model’s predictions¹⁵. Feature extraction is another area where DL methods still face challenges. Existing techniques may not consistently identify the most relevant features from complex ECG signals, leading to the omission of critical patterns that are vital for accurate arrhythmia detection¹⁶. Furthermore, the black-box nature of DL models limits their transparency. This lack of interpretability is a significant barrier to their clinical adoption, as healthcare professionals require an understanding of the model’s decision-making process to trust and use it in patient care¹⁷.

To overcome the aforementioned limitations, this study introduces a novel, explainable DL framework designed for accurate and reliable arrhythmia detection from ECG signals. Our proposed approach integrates cutting-edge DL models, specifically CNN and DNN, within a sophisticated multi-stage pipeline. This pipeline encompasses cutting-edge data preparation, meticulous signal preprocessing and robust multi-strategy data balancing techniques to enhance model performance and robustness. Furthermore, to address the critical interpretability gap and foster clinical applicability and trust, the framework incorporates XAI methodologies. This comprehensive combination aims to achieve superior performance, provide crucial insights into model decision-making and ultimately push the boundaries of cardiovascular diagnostics.

Contributions

The key contributions of this study are as follows:

Novel explainable deep learning framework: We propose an innovative framework that seamlessly integrates cutting-edge DL architectures (CNN and DNN) with a meticulously designed data processing pipeline and crucial XAI techniques (SHAP, LIME and FIA). This integration provides both high-performance arrhythmia detection and transparency in model predictions, a critical aspect for clinical adoption.
Advanced ECG data preparation and preprocessing: The study employs a systematic data preparation pipeline, including dynamic loading from multiple benchmark ECG datasets (MITDB, NSTDB, PTBDB), precise 1-second segment extraction around R-peaks and parallel processing for efficiency. This is coupled with state-of-the-art signal processing techniques (Butterworth bandpass, 50 Hz notch and high-pass filtering, followed by Z-score normalization) to significantly reduce noise and enhance signal quality for more effective analysis.
Robust multi-strategy data balancing: To effectively mitigate the inherent class imbalance in ECG datasets, our approach comprehensively investigates and leverages multiple sophisticated data balancing techniques, including ADASYN, SMOTE, SMOTETomek and ROS. The selection of the most effective method ensures a balanced dataset, addressing bias and improving the robustness and sensitivity of the model towards rare but critical arrhythmia events.
Comprehensive and interpretable evaluation: The proposed framework is rigorously evaluated on publicly available arrhythmia datasets using a diverse set of performance metrics. Beyond quantitative performance, the integration of XAI techniques provides qualitative interpretability, demonstrating the effectiveness and reliability of our approach while offering actionable insights into the model’s detection, thereby outpacing traditional methods and enhancing clinical utility.

This paper is structured to present our novel framework comprehensively. Following this introduction, Section “Related works” provides an overview of existing research in arrhythmia detection, highlighting its strengths and limitations. Section “Methodology” details our proposed explainable AI framework, including data preparation, preprocessing, balancing techniques and deep learning model architecture. Section “Result analysis” presents the experimental results and a thorough analysis of the framework’s performance, complemented by insights from the XAI methods. Section “Discussion” compares our proposed work with state-of-the-art works. Finally, Section “Conclusion” summarizes our findings, discusses the implications of our work and outlines directions for future research.

Related works

This section reviews recent advancements in arrhythmia detection from ECG signals, focusing on studies that have utilized the benchmark datasets relevant to our research. We categorize these works by the primary dataset employed, highlighting their methodologies, key contributions and reported performance metrics.

Arrhythmia detection on MITDB dataset

Srivastava et al.¹⁸ introduced rECGnition_v1.0, a multi-modal DL model that integrates ECG morphological features with patient characteristics (age, gender, BMI) for enhanced arrhythmia classification. Their model, combining a CNN for ECG feature extraction and a Squeeze and Excitation-based Patient characteristic Encoding Network (SEPcEnet), achieved an accuracy of 98.56% on the MITDB, demonstrating improved performance through the correlation of patient-specific data with ECG morphology.

Kim et al.¹⁹ presented a Local-Global Temporal Fusion Network incorporating an attention mechanism for multi-class arrhythmia classification from single-lead ECGs. This framework integrated temporal convolutional networks (TCN) with multiscale temporal information fusion (TIF) and temporal multi-head attention (MHA), achieving an F1-score of 96.45% for duration classification and 96.31% for episode classification on the MITDB. While effective, the model was noted for its high computational cost and hardware requirements.

Tudjarski et al.²⁰ explored a transformer-based approach for Atrial Fibrillation (AFIB) detection. They utilized a bidirectional transformer model (RoBERTa), pre-trained on a large unlabeled ECG dataset and fine-tuned on a smaller labeled one. This method achieved a notable accuracy of 98.81%, a sensitivity of 98.81% and an F1 score of 91.57% on the MITDB, showcasing the power of transformer models with self-supervised pre-training.

Zhou et al.²¹ introduced mRMEBP, a unified framework for online AF detection that combines statistical inference and probabilistic modeling of cardiac interbeat intervals. Their model leverages five robust features analyzed by a Back Propagation Neural Network (BPNN). The mRMEBP achieved an accuracy of 95.42% for the MITDB, demonstrating the importance of robust feature selection for improved accuracy in online monitoring scenarios.

He et al.²² proposed a dynamic ECG signal quality assessment method based on a hybrid CNN-LSTM model. Utilizing the MITDB, their model achieved an accuracy of 98.65% for classifying signal quality into excellent, qualified and failed categories. This work underscores the efficacy of combining CNN and LSTM for noise reduction and enhanced diagnostic accuracy in ECG signals.

El-Ghaish et al.²³ introduced ECGTransForm, a DL framework that integrates multi-scale convolutions, channel recalibration and a bidirectional transformer for arrhythmia classification. This architecture adeptly models both past and future temporal dependencies, thereby enhancing the detection of subtle arrhythmic patterns. Evaluated on the MITDB and PTBDB datasets, ECGTransForm achieved an impressive accuracy of 99.35% and a macro F1-score of 94.26%. Furthermore, the incorporation of Context-Aware Loss (CAL) significantly improved class balance, enabling robust and reliable detection across a wide spectrum of arrhythmia types.

Di et al.²⁴ introduced a multimodal CNN with adaptive attention for ECG arrhythmia classification. Using Hilbert space-filling curves and recurrence plots to convert ECG signals into images, their model achieved 98.48% accuracy and an F1 score of 81.91% for interpatient classification and 99.70% accuracy with a 97.64% F1 score for intrapatient classification on the MITDB. Dual-lead input (MLII and V1) and attention refinement notably improved detection of supraventricular arrhythmias.

Islam et al.²⁵ proposed CAT-Net, a hybrid DL model combining convolution, channel attention and transformer encoders for single-lead ECG arrhythmia classification. On the MITDB, it achieved 99.14% accuracy and a macro F1-score of 94.69%. Using SMOTETomek for class balancing, CAT-Net improved the detection of minority classes like supraventricular and fusion beats. Its lightweight design and single-lead input make it suitable for real-time wearable and IoT-based applications.

Berrahou et al.²⁶ proposed a 1D CNN-based model for arrhythmia detection that integrates morphological ECG features with RR interval and entropy rate descriptors. Evaluated on the MITDB, the model achieved 99.17% accuracy for intra-patient and 98.73% for inter-patient classification, as well as 98.20% accuracy on the INCART dataset. The approach demonstrated strong generalization and effective handling of class imbalance across diverse ECG sources.

Issa et al.²⁷ proposed a deep neural network with residual blocks (DNN-RB) for single-lead ECG heartbeat classification. Using MLII signals from the MITDB, their model achieved 99.51% accuracy, 99.70% sensitivity and 98.20% specificity. The architecture outperformed existing methods and proved effective for mobile ECG devices and real-time monitoring, even under class imbalance conditions.

Anitha et al.²⁸ developed a hybrid DL model combining an ensemble CNN-RNN for feature extraction with a bidirectional capsule network (Bi-CapsNet) for arrhythmia classification. Tested on the MITDB, the model achieved 97.19% accuracy, outperforming CNN (89.87%), FTBO (85%) and standalone capsule networks (97.0%). Its robustness against noisy ECG signals and ability to capture spatial-temporal features make it suitable for clinical deployment.

Kumar et al.²⁹ introduced an improved Hawks Optimizer (HO)-based stacked ensemble model for CVD classification. The framework incorporated a Neural Network Reasoning component and addressed class imbalance in the Kaggle CVD dataset. Evaluated on a dataset collected from Kaggle, the model achieved 97% accuracy. The HO optimizer enhanced global search capability, leading to superior predictive performance compared to benchmark models, with notable improvements in Matthews Correlation Coefficient (MCC), accuracy and F-measure.

Arunachalam et al.³⁰ developed a novel ML-based model for CVD risk assessment, using K-Nearest Neighbor (KNN) as a baseline alongside ensemble methods such as XGBoost, AdaBoost and Random Subspace. With the aid of Linear Support Vector Feature Measure (LSVFM) for feature prediction, the system demonstrated strong performance on the MITDB, achieving 96% accuracy and 97% precision. The results highlighted the model’s effectiveness as a potential clinical decision-support tool.

Saranya et al.³¹ introduced DenseNet-ABiLSTM, a hybrid DL model combining densely connected convolutional networks with Attention-based Bidirectional LSTM for multiclass arrhythmia detection from ECG signals. The model uses 1D convolutional kernels for multiscale feature extraction and an attention-enhanced BiLSTM for temporal analysis. On the MITDB, it achieved an average F1 score of 87.74% and an accuracy of 89.14%, outperforming traditional ECG-based approaches.

Finally, Rajagopal et al.³² investigated the role of unsupervised dimensionality reduction (DR) methods in arrhythmia classification using ECG signals. By comparing techniques such as PCA, fastICA, kernel PCA, hierarchical nonlinear PCA and principal polynomial analysis (PPA) with a probabilistic neural network (PNN) classifier, the study highlighted the advantages of nonlinear DR.

Arrhythmia detection on PTBDB dataset

Research on the PTB Diagnostic ECG Database (PTBDB) has also seen significant advancements in arrhythmia detection and ECG signal analysis. Mondal et al.³³ proposed a lightweight Convolutional Neural Network (CNN) that utilizes derivative ECG (dECG) signals for automatic ECG signal quality assessment (ECG-SQA). Their dECG-based CNN, designed to mitigate noise challenges in wearable devices, achieved an accuracy of 97.59% on the PTBDB, demonstrating its feasibility for real-time applications, including on a Raspberry Pi platform.

Khan et al.³⁴ introduced a hybrid GRU-CNN model for cardiac abnormality prediction using the PTBDB. By combining Gated Recurrent Units (GRU) with CNNs, their model effectively processed long sequences and learned non-linear features from ECG signals. They further enhanced performance with an RB-GRU-CNN model, achieving an RMSE of 0.02679, which highlighted the effectiveness of incorporating residual bias for error reduction in time-series ECG data.

Subhiyakto et al.³⁵ focused on addressing class imbalance in ECG classification using a CNN-based model on the PTBDB. They extensively experimented with CNN, Transformer and LSTM architectures in conjunction with various SMOTE techniques, including SMOTE Borderline, ADASYN, Tomek and ENN. The highest accuracy of 99.36% was achieved with CNN combined with SMOTE Borderline, underscoring the importance of resampling methods in imbalanced datasets.

Bai et al.³⁶ developed a hybrid deep learning model named CBGM, which integrates CNN, Bidirectional Gated Recurrent Units (BiGRU) and a multi-head attention mechanism. This model was validated on both MITDB and PTBDB, achieving an accuracy of 98.82% on the latter. The CBGM model effectively captures both spatial and temporal features, making it suitable for real-time ECG screening and clinical decision support.

Lee et al.³⁷ proposed a novel cross-database learning framework for ECG arrhythmia classification, utilizing a two-dimensional beat-score-map (BSM) representation. Their approach addressed generalization challenges across heterogeneous databases by employing both fine-grained and coarse-grained annotations. While validated on PTB-XL (a related dataset) among others, their framework demonstrated an F1 score of 0.9267, showcasing improved generalization performance for complex cardiac arrhythmias.

Finally, Padmavathi et al.³⁸ explored hybrid deep learning models for automated cardiovascular disease identification. They proposed a 1D CNN combined with a Recurrent Hopfield Neural Network (RHNN) and another with a Residual Network (ResNet). Evaluated on the PTBDB, the 1D-CNN-RHNN model achieved a 96.62% accuracy for a 4-class classification system, demonstrating the potential of such hybrid approaches in improving ECG signal analysis for real-time medical diagnostics.

Arrhythmia detection on NSTDB dataset

Singh et al.³⁹ introduced an Attention-Based Convolutional Denoising Autoencoder (ACDAE) model, enhanced with a lightweight channel attention (ECA) module, for robust ECG signal denoising and arrhythmia classification. This model employs skip-layer connections to minimize information loss during reconstruction and demonstrated high performance under noise stress across four ECG databases, achieving an impressive 98.88% accuracy for ECG beat classification.

Kumari et al.⁴⁰ developed a computational model for classifying ECG signals into normal and abnormal categories, tested on both the MITDB and NSTDB. Their methodology integrates Superlet Transform (SLT) for pre-processing and noise filtering, VGG18 for feature extraction via transfer learning and KNN for classification. This approach achieved an impressive 99.46% accuracy in noisy environments, highlighting its robustness for real-world arrhythmia detection.

He et al.²² proposed a dynamic ECG signal quality assessment method based on a hybrid CNN and LSTM network. Aimed at improving diagnostic accuracy in noisy environments, their model categorizes ECG signals into three quality levels and was validated on the MITDB and NSTDB. It achieved an accuracy of 98.65% with a macro-averaged F1 score of 98.50%, demonstrating its effectiveness in robust ECG signal analysis for heart disease diagnosis.

Lee et al.³⁷ presented a robust method for arrhythmia detection from wearable ECG devices, specifically addressing noise contamination. Their system combines an adaptive-threshold QRS detector with a hybrid neural network comprising LSTM and Artificial Neural Networks (ANN), along with SMOTE for class imbalance. Evaluated on the MITDB and NSTDB, the model achieved 97.38% sensitivity and 97.08% precision on the NSTDB, showcasing strong performance even under noisy conditions.

Wei et al.⁴¹ proposed a DL-based denoising model for multichannel ECG signals. Their Fully Convolutional Network-based Denoising Autoencoder (FCN-DAE) with Jacobian regularization aimed at noise removal while preserving critical local information. The model achieved up to 97.02% noise removal accuracy and outperformed traditional methods, providing a robust solution for clinical applications by effectively preserving vital ECG features like the QRS complex.

Finally, Nurmaini et al.⁴² introduced a DL-based stacked denoising autoencoder (DAE) and autoencoder (AE) model integrated with Deep Neural Networks (DNNs) for ECG heartbeat classification. Tested on the MITDB and NSTDB under varying noise levels, their model achieved high performance with an accuracy of 99.34%, demonstrating that the DAE and AE architecture significantly improved feature extraction and denoising capabilities compared to conventional ML models.

Methodology

This study proposes a framework based on Explainable Artificial Intelligence (XAI) and Deep Learning (DL) for detecting arrhythmias from ECG signals, as illustrated in Fig. 1. The methodology consists of several key stages: data acquisition from publicly available datasets, comprehensive preprocessing, and the development and evaluation of DL models. To address the potential class imbalance commonly found in medical datasets, various data balancing techniques were explored, and the most effective method was selected to optimize model performance. Specifically, Dense Neural Networks (DNN) and Convolutional Neural Networks (CNN) were employed to learn the discriminative features necessary for arrhythmia detection. This systematic approach emphasizes robust data preparation, proper data balancing, effective model building, and rigorous evaluation to ensure a reliable framework for detecting arrhythmias.

Dataset description

To develop a robust and generalizable arrhythmia classification model, we utilized three benchmark ECG datasets: the MIT-BIH Arrhythmia Database (MITDB), the Noise Stress Test Database (NSTDB) and the PTB Diagnostic ECG Database (PTBDB). These datasets were selected for their clinical relevance, diverse arrhythmia types and varying levels of signal quality. Together, they provide a comprehensive training and evaluation foundation for DL-based ECG analysis under both ideal and noisy conditions.

The MITDB is the primary dataset used in this study. It contains 48 half-hour, two-channel ambulatory ECG recordings from 47 subjects, sampled at 360 Hz. Each recording is annotated beat-by-beat by expert cardiologists, covering both normal and a wide range of arrhythmic classes such as left/right bundle branch blocks (L/R), atrial premature beats (A) and premature ventricular contractions (V). This dataset offers a rich variety of real-world heartbeat morphologies, making it highly suitable for supervised arrhythmia classification tasks.

The NSTDB provides clean ECG signals combined with various types of synthetic noise, including baseline wander, muscle artifact and electrode motion. All signals are sampled at 360 Hz to match MITDB. Although NSTDB does not include annotated arrhythmia labels, it is valuable for testing model robustness under noise conditions. By adding NSTDB noise to MITDB signals during training and testing, we can simulate real-world ECG interference and enhance the model’s noise resilience.
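The noise-mixing idea described above can be sketched as follows. This is a minimal illustration, not the authors' code: the source does not state which signal-to-noise ratios were used, so the function name `add_noise_at_snr`, the `snr_db` parameter and the synthetic stand-in signals are all assumptions made for the example.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB.

    Hypothetical helper: the SNR levels used in the paper are not given.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]
    if len(noise) < len(clean):
        raise ValueError("noise record must be at least as long as the clean signal")
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # From SNR_dB = 10 * log10(p_clean / p_scaled_noise), solve for the scale factor.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Stand-ins for an MITDB lead and an NSTDB noise record (both assumed at 360 Hz).
fs = 360
t = np.arange(10 * fs) / fs
clean = np.sin(2 * np.pi * 1.2 * t)
noise = np.random.default_rng(0).normal(size=t.size)
noisy = add_noise_at_snr(clean, noise, snr_db=6)
```

Scaling by power rather than amplitude keeps the mixing level comparable across records whose baseline amplitudes differ.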

The PTBDB contains high-resolution ECG recordings (sampled at 1,000 Hz) from 290 patients and healthy volunteers. It includes multiple ECG leads and covers a wide range of cardiac conditions such as myocardial infarction, conduction blocks and hypertrophy. For consistency, the signals were resampled to 360 Hz. Although PTBDB does not provide beat-level annotations like MITDB, its pathological variety helps in augmenting training data and validating generalization across patient populations. This dataset strengthens the model’s capacity to detect arrhythmias beyond the scope of MITDB alone.
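As a rough illustration of the resampling step, the sketch below converts a 1,000 Hz signal to 360 Hz with polyphase resampling from SciPy. The source does not say which resampling method was used, so the choice of `scipy.signal.resample_poly` and the helper name `resample_ecg` are assumptions. The ratio 360/1000 reduces to 9/25, which is what the function computes.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

def resample_ecg(signal, fs_in, fs_out):
    """Resample a 1-D signal from fs_in to fs_out (integer rates in Hz).

    Hypothetical helper; polyphase resampling is one common choice, not
    necessarily the one used in the paper.
    """
    g = gcd(fs_in, fs_out)
    up, down = fs_out // g, fs_in // g   # 1000 Hz -> 360 Hz gives up=9, down=25
    return resample_poly(np.asarray(signal, dtype=float), up, down)

# Five seconds of a synthetic 1,000 Hz signal become five seconds at 360 Hz.
x_1000 = np.sin(2 * np.pi * 1.0 * np.arange(5000) / 1000)
x_360 = resample_ecg(x_1000, 1000, 360)
```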

Dataset preparation

The dataset preparation involved systematic processing of multiple ECG databases such as MITDB, NSTDB and PTBDB to create a unified, well-structured input for DL models. The WFDB Python library was used to load raw ECG signals and their corresponding heartbeat annotations from each dataset dynamically, avoiding hardcoding and improving flexibility.

For each dataset, heartbeat-centered signal segments were extracted using a fixed time window of one second. This window size captures sufficient temporal context around each annotated beat. Heartbeats were filtered and labeled according to a predefined mapping scheme for binary classification, where normal beats are labeled as 0 and various arrhythmias are grouped under label 1. Beats outside these categories were ignored to maintain label consistency and reduce noise.

To efficiently handle large amounts of data, parallel processing was employed using the joblib library, which sped up segment extraction across multiple records. After segmentation, each ECG window was flattened into a feature vector and combined into a single DataFrame with corresponding class labels. This DataFrame formed the basis for model training and evaluation, ensuring consistent representation across all three databases.

ECG signal extraction process

To extract ECG signal segments for arrhythmia detection, we used several key Python libraries: wfdb for reading waveform and annotation files, NumPy for numerical operations, pandas for data structuring and joblib for parallel processing.

Step-by-step process:

1. Read ECG and annotation data: For each record, the ECG signal was read using wfdb.rdrecord() and corresponding annotations were retrieved with wfdb.rdann().
2. Windowing: A 1-second window (WINDOW_SEC = 1) was applied around each annotated beat. Each segment was extracted symmetrically around the annotated R-peak. The number of samples was computed as w = int(WINDOW_SEC * rec.fs) based on the sampling frequency.
3. Class mapping: Each annotation symbol was mapped to a binary label using the dictionary LABEL_MAP = {'N':0, 'L':1, 'R':1, 'A':1, 'V':1}. This merges all arrhythmic types into a single abnormal class.
4. Segment extraction: For each annotated beat of interest, a segment was extracted using indexing. If the window extended beyond signal boundaries, zero-padding was used to maintain segment length consistency.
5. Parallel processing: The function process_record was executed in parallel across all records using joblib.Parallel to accelerate data loading.
6. Flattening and storage: All extracted segments were concatenated, reshaped into flat feature vectors and stored in a pandas DataFrame. Each row represents a single beat segment, with the class label stored in the Target column.

This structured pipeline ensures uniform segment length, consistent sampling and accurate beat alignment for input into DL models.
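The windowing, class mapping, zero-padding and storage steps can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: it works on a single channel, takes the signal and annotation arrays as plain inputs (in practice they would come from wfdb.rdrecord() and wfdb.rdann()), and the helper names `extract_segments` and `to_frame` are hypothetical. `WINDOW_SEC`, `LABEL_MAP` and the `Target` column follow the text.

```python
import numpy as np
import pandas as pd

WINDOW_SEC = 1
LABEL_MAP = {"N": 0, "L": 1, "R": 1, "A": 1, "V": 1}

def extract_segments(signal, fs, beat_samples, beat_symbols):
    """Cut one window per mapped beat, zero-padding at the record edges."""
    signal = np.asarray(signal, dtype=float)
    w = int(WINDOW_SEC * fs)              # samples per window
    half = w // 2
    segments, labels = [], []
    for center, symbol in zip(beat_samples, beat_symbols):
        if symbol not in LABEL_MAP:
            continue                      # beats outside the mapping are ignored
        start, stop = center - half, center - half + w
        lo, hi = max(start, 0), min(stop, len(signal))
        segment = np.zeros(w)             # zeros remain where the window leaves the record
        segment[lo - start : hi - start] = signal[lo:hi]
        segments.append(segment)
        labels.append(LABEL_MAP[symbol])
    return np.array(segments).reshape(-1, w), np.array(labels, dtype=int)

def to_frame(segments, labels):
    """One row per beat segment, class label in the Target column."""
    df = pd.DataFrame(segments)
    df["Target"] = labels
    return df

# Synthetic record: 10 s at 360 Hz with four annotated beats, one unmapped ("+").
fs = 360
record = np.sin(2 * np.pi * 1.2 * np.arange(10 * fs) / fs)
segs, labs = extract_segments(record, fs, [100, 1000, 2000, 3550], ["N", "V", "+", "A"])
beats = to_frame(segs, labs)
# A per-record function like this could be dispatched across records with
# joblib.Parallel, as the text describes; a plain call is used here for brevity.
```

The first and last beats in the example sit close to the record boundaries, so their windows are partly zero-padded while keeping the same length as the others.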

Data preprocessing

Effective preprocessing of ECG signals is critical to enhance signal quality and improve the performance of DL models for arrhythmia classification. Raw ECG data often contain various types of noise and artifacts, such as baseline wander, powerline interference and high-frequency noise, which can obscure important cardiac features. To address these issues, a series of filtering and normalization steps was applied systematically.

Bandpass filtering: A Butterworth bandpass filter with cutoff frequencies at 0.5 Hz and 45 Hz was used to remove baseline drift and high-frequency noise. The low cutoff frequency of 0.5 Hz helps in eliminating slow baseline wander caused by respiration and movement, while the high cutoff at 45 Hz removes muscle noise and other high-frequency artifacts. This range preserves the relevant ECG frequency components critical for detecting arrhythmic patterns.

Notch filtering: Powerline interference at 50 Hz is a common source of noise in ECG recordings, especially in regions where the electrical grid operates at this frequency. To mitigate this, a notch filter centered at 50 Hz was applied. This filter selectively attenuates the narrowband noise without significantly affecting the ECG signal, ensuring cleaner recordings.

Baseline wander removal: Although the bandpass filter reduces baseline drift, residual slow fluctuations may remain. To further correct this, a high-pass Butterworth filter with a cutoff frequency of approximately 0.5 Hz was applied. This step ensures that baseline drift, which can interfere with accurate heartbeat delineation, is minimized.

Normalization: Following noise and artifact removal, each ECG segment was normalized using Z-score standardization. This technique centers the data by subtracting the mean and scales it by the standard deviation, resulting in features with zero mean and unit variance. Normalization is essential to bring all input features to a common scale, improving the stability and convergence speed of DL models during training.
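A minimal sketch of per-segment Z-score standardization is given below. The guard for constant segments (zero standard deviation) and the `eps` parameter are added assumptions; the text does not say how such segments were handled.

```python
import numpy as np

def zscore(segment, eps=1e-8):
    """Standardize one segment to zero mean and unit variance."""
    segment = np.asarray(segment, dtype=float)
    std = segment.std()
    if std < eps:
        # Assumed behaviour: a flat segment carries no shape information.
        return np.zeros_like(segment)
    return (segment - segment.mean()) / std

normalized = zscore([1.0, 2.0, 3.0, 4.0, 5.0])
```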

All these preprocessing steps were implemented as a pipeline, consistently applied to all ECG segments to produce clean, normalized data ready for model training and evaluation.

Figure 2 illustrates the impact of the preprocessing pipeline on ECG heart signals, highlighting changes in signal quality across two representative samples.

Data balancing

Imbalanced datasets, where some arrhythmia classes are significantly underrepresented compared to others, pose a major challenge for developing robust DL models. Without addressing this imbalance, models tend to be biased towards the majority class, resulting in poor detection of rare but clinically important arrhythmia events. To mitigate this, several data balancing techniques were applied to the training dataset to synthetically increase minority class samples and improve class distribution.

Adaptive synthetic sampling (ADASYN): ADASYN generates synthetic samples for minority classes by adaptively focusing on harder-to-learn examples near class boundaries. By emphasizing these complex samples, ADASYN helps the model better distinguish subtle arrhythmic patterns that are often overlooked, improving sensitivity to rare classes.

Synthetic minority over-sampling technique (SMOTE): SMOTE creates new synthetic samples for minority classes by interpolating between existing minority instances. This approach balances the dataset without simply duplicating samples, which helps reduce overfitting and enhances the generalization capability of the model.
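The interpolation idea behind SMOTE can be sketched in a few lines of NumPy. This is a simplified illustration, not the library implementation used in the study; the function name `smote_like` and the defaults `k=5` and `seed=0` are hypothetical.

```python
import numpy as np

def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points between minority samples and their neighbours.

    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    k = min(k, len(minority) - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base sample, one of its k nearest neighbours, and a random gap in [0, 1).
    base = rng.integers(0, len(minority), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gaps = rng.random((n_new, 1))
    return minority[base] + gaps * (minority[chosen] - minority[base])

toy_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like(toy_minority, n_new=10, k=2)
```

Each synthetic point lies on the line segment between two existing minority samples, which is why SMOTE adds variety without leaving the region already occupied by the minority class.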

SMOTE combined with Tomek links (SMOTETomek): This hybrid method combines SMOTE oversampling with Tomek Links undersampling to both synthesize minority samples and remove borderline majority samples that overlap with minorities. The result is a cleaner and more balanced dataset, which can improve classifier performance by reducing class overlap.

Random over-sampling (ROS): ROS balances the dataset by duplicating existing minority class samples, so that the minority class is adequately represented and the model is less biased toward the majority class. Because no synthetic data is introduced, the original feature space and the natural distribution of the minority samples are preserved, which makes results more interpretable. ROS is also simpler and faster to implement than SMOTE, ADASYN and SMOTETomek. Although duplication makes it prone to overfitting, ROS avoids the higher computational cost and complexity of the other oversampling methods and serves as a strong baseline for comparison.
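Random over-sampling is simple enough to sketch directly in NumPy. The function below is an illustrative stand-in rather than the implementation used in the study; its name and the `seed` parameter are hypothetical. It is meant to be applied to the training split only.

```python
import numpy as np

def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until all classes match the largest."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # every original row is kept
    for cls, count in zip(classes, counts):
        if count < target:
            pool = np.flatnonzero(y == cls)
            # Sampling with replacement yields exact copies of existing rows.
            keep.append(rng.choice(pool, size=target - count, replace=True))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

X_toy = np.arange(12).reshape(6, 2)
y_toy = np.array([0, 0, 0, 0, 0, 1])
X_bal, y_bal = random_over_sample(X_toy, y_toy)
```

Because every added row is an exact copy, the balanced set contains no values that were absent from the original data.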

By applying these balancing methods, the dataset’s class distribution was adjusted to reduce bias and improve the DL models’ ability to accurately detect and classify arrhythmic events across all classes.

The impact of data balancing with ROS, the technique adopted in our proposed framework, is demonstrated in Figs. 3, 4 and 5, which illustrate the class distributions before and after balancing across the MITDB, PTBDB and NSTDB datasets.

Deep learning models

For arrhythmia classification, two DL architectures were developed and evaluated: a fully connected Dense Neural Network (DNN) and a one-dimensional Convolutional Neural Network (CNN). Both models were designed for binary classification to distinguish normal from arrhythmic ECG segments.

Dense neural network (DNN):

The DNN model consists of an input layer matching the feature dimension of the preprocessed ECG segments, followed by three hidden dense layers with 128, 64 and 32 neurons respectively. Each hidden layer uses the ReLU activation function to introduce non-linearity. Dropout layers with rates of 0.3 and 0.2 were added after the first and second dense layers, respectively, to reduce overfitting by randomly disabling neurons during training. The output layer contains a single neuron with a sigmoid activation function, providing a probability score for binary classification. The model was compiled with the Adam optimizer and binary cross-entropy loss, tracking accuracy and the Area Under the ROC Curve (AUC) as evaluation metrics. Training ran for a maximum of 30 epochs with a batch size of 32, and early stopping based on validation loss with a patience of 10 epochs was employed to prevent overfitting.
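The interaction between the patience of 10 epochs and the 30-epoch limit can be illustrated with a framework-independent sketch. The function below is hypothetical and only mimics the stopping rule; it is not the callback used by the authors.

```python
def early_stopping_epoch(val_losses, patience=10, max_epochs=30):
    """Return (epochs actually run, index of the best epoch).

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return epochs_run, best_epoch

# Validation loss improves for three epochs and then plateaus.
plateau = [1.0, 0.8, 0.6] + [0.7] * 27
stopped = early_stopping_epoch(plateau)
```

With the plateau above, the best epoch is the third one (index 2) and training halts after 13 epochs, well before the 30-epoch limit.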

Convolutional neural network (CNN):

The CNN model leverages the temporal structure of ECG signals by applying one-dimensional convolutional layers. Input segments were reshaped to include a channel dimension, enabling convolutional operations. The architecture comprises two convolutional layers with 128 and 64 filters respectively, each followed by max-pooling layers that downsample the feature maps, reducing computational complexity and extracting dominant features. After flattening the output, a dropout layer with a rate of 0.3 was applied before a dense layer of 32 neurons with ReLU activation. The final output layer uses a sigmoid activation function for binary classification. The CNN was compiled and trained with the same optimizer, loss function and early stopping criteria as the DNN model.

Both models were evaluated on a held-out test set. Predictions were thresholded at 0.5 to convert probabilities into binary class labels. The performance of each model was then assessed using several evaluation metrics. The CNN typically benefits from capturing local temporal dependencies in ECG data, while the DNN exploits global features from the flattened input. Their comparative results provide insights into the efficacy of feature extraction approaches for arrhythmia detection.
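The thresholding step can be written as a short NumPy sketch. Treating a probability of exactly 0.5 as the negative class is an assumption, since the text does not say how ties are handled; the function names are hypothetical.

```python
import numpy as np

def binarize(probabilities, threshold=0.5):
    """Map sigmoid outputs to class labels (1 if strictly above the threshold)."""
    return (np.asarray(probabilities) > threshold).astype(int)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

predicted = binarize([0.1, 0.5, 0.51, 0.9])
```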

Model size and memory footprint discussion

The DNN model comprises approximately 103K trainable parameters, with a total memory footprint (including optimizer states) of around 1.17 MB. This relatively small size makes it computationally efficient and suitable for deployment in resource-constrained environments. The fully connected architecture effectively learns from flattened input features but may have limited ability to capture temporal dependencies in ECG signals (Table 1).

In contrast, the CNN model contains about 390K trainable parameters and requires approximately 4.46 MB of memory, including optimizer states. The increased size is largely due to the convolutional layers and to the dense layer that follows flattening, whose weight matrix grows with the size of the flattened feature map. The convolutional layers enable the model to automatically extract hierarchical temporal features from ECG data. This capacity often translates into improved classification performance but requires more computational resources and memory (Table 2).

The trade-off between model size and performance should be considered when selecting an architecture. The DNN offers faster training and lower memory usage, while the CNN provides more powerful feature extraction at the cost of increased resource demand.
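The reported sizes can be sanity-checked with simple arithmetic. The sketch below relies on values the text does not state: an input length of 720 samples, convolution kernels of size 3 without padding, max-pooling with a pool size of 2, 32-bit floats, and an Adam optimizer that stores two extra values per parameter. These assumptions were chosen only because they reproduce the reported figures; the configuration actually used is the one listed in Tables 1 and 2.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv1d_params(in_channels, filters, kernel_size):
    """Weights plus biases of a 1D convolutional layer."""
    return in_channels * filters * kernel_size + filters

def footprint_mib(n_params, bytes_per_value=4, copies=3):
    """Memory for the weights plus two Adam moment estimates (assumed), in MiB."""
    return n_params * bytes_per_value * copies / 2**20

INPUT_LEN = 720      # assumed segment length, not stated in this section
KERNEL, POOL = 3, 2  # assumed kernel and pool sizes

# DNN: input -> 128 -> 64 -> 32 -> 1 (dropout adds no parameters).
dnn_params = (dense_params(INPUT_LEN, 128) + dense_params(128, 64)
              + dense_params(64, 32) + dense_params(32, 1))

# CNN: two conv + max-pool stages, then flatten -> 32 -> 1.
length = (INPUT_LEN - KERNEL + 1) // POOL  # after first conv and pool
length = (length - KERNEL + 1) // POOL     # after second conv and pool
cnn_params = (conv1d_params(1, 128, KERNEL) + conv1d_params(128, 64, KERNEL)
              + dense_params(length * 64, 32) + dense_params(32, 1))

print(dnn_params, round(footprint_mib(dnn_params), 2))
print(cnn_params, round(footprint_mib(cnn_params), 2))
```

Under these assumptions the dense layer after flattening accounts for 364,576 of the CNN's parameters, which is consistent with the observation that most of the CNN's size comes from that layer rather than from the convolutions.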

Table 1 Summary of DNN model architecture and parameters.
