Abstract
Cardiovascular arrhythmia, characterized by irregular heart rhythms, poses significant health risks, including stroke and heart failure, making accurate and early detection critical for effective treatment. Traditional detection methods often struggle with challenges such as imbalanced datasets, limiting their ability to identify rare arrhythmia types. This study proposes a novel hybrid approach that integrates ConvNeXt-X deep learning models with advanced data balancing techniques to improve arrhythmia classification accuracy. Specifically, we evaluated three ConvNeXt variants—ConvNeXtTiny, ConvNeXtBase, and ConvNeXtSmall—combined with Random Oversampling (RO) and SMOTE-TomekLink (STL) on the MIT-BIH Arrhythmia Database. Experimental results demonstrate that the ConvNeXtTiny model paired with STL achieved the highest accuracy of 99.75%, followed by ConvNeXtTiny with RO at 99.72%. The STL technique consistently enhanced minority class detection and overall performance across models, with ConvNeXtBase and ConvNeXtSmall achieving accuracies of 99.69% and 99.72%, respectively. These findings highlight the efficacy of ConvNeXt-X models, when coupled with robust data balancing techniques, in achieving reliable and precise arrhythmia detection. This methodology holds significant potential for improving diagnostic accuracy and supporting clinical decision-making in healthcare.
Introduction
Cardiovascular disease (CVD) is one of the leading causes of mortality worldwide, significantly impacting global health by affecting the heart and blood vessels and increasing the risk of stroke and heart attack. According to the World Health Organization (WHO), CVDs accounted for more than 17.6 million deaths in 2016, with projections suggesting this figure may rise to 23.6 million by 20301. Notably, over 75% of CVD-related deaths occur in low- and middle-income countries2. Among CVDs, cardiac arrhythmias—irregular heart rhythms—are particularly dangerous, as they can lead to sudden cardiac arrest, hemodynamic collapse, or even death if untreated3. Arrhythmias include conditions such as bradycardia, tachycardia, and other rhythm abnormalities, making their early detection crucial4. Electrocardiograms (ECGs) are widely used to diagnose CVDs, offering a non-invasive and reliable means to monitor heart activity. However, the large volumes and complexities of ECG data can make manual interpretation challenging, particularly in clinical settings with high workloads. Additionally, ECG signals are affected by individual, temporal, and environmental variations, which complicate diagnosis5. One key segment of ECG data is the QRS complex, which reflects ventricular depolarization; deviations here, as well as in the associated P and T waves, are often indicative of arrhythmias like atrial fibrillation (AF), ventricular fibrillation (VF), and premature ventricular contractions (PVCs)6. Traditional methods for extended ECG monitoring, such as Holter monitors, mobile cardiac outpatient telemetry (MCOT), and implantable devices, offer continuous heart monitoring to varying extents7,8. While these devices collect valuable data for arrhythmia detection, they do not always provide real-time diagnostic insights, which can limit timely intervention. Furthermore, manual interpretation of ECG recordings remains a time-consuming task, adding to the burden on clinicians.
In recent years, artificial intelligence (AI) and machine learning (ML) have shown great potential in automating arrhythmia detection in ECG signals. These technologies can improve the accuracy and efficiency of arrhythmia detection, alleviating some of the pressures on healthcare systems. A growing body of research has focused on using deep learning (DL) for automatic arrhythmia classification in ECG data9,10. DL models have a particular advantage over traditional ML techniques, as they can learn relevant features directly from raw data without the need for manual feature extraction11. In the medical domain, DL has already achieved notable success in areas like ultrasound imaging for breast cancer detection, cardiovascular assessments, and carotid artery evaluations12,13,14. These advances highlight the potential of DL for ECG-based CVD diagnosis, where capturing complex patterns in time-series data is essential.
In response to these challenges, this study explores the following research questions:
-
Q1: How can arrhythmias in ECG signals be accurately identified?
-
Q2: What procedures are necessary to build an automated model for reliable arrhythmia detection?
-
Q3: How can the effectiveness of the developed model be evaluated for arrhythmia detection?
This study aims to develop a high-performance hybrid DL approach for accurate arrhythmia detection from ECG signals, address class imbalance issues using data balancing (DB) techniques, and create a framework adaptable for clinical integration to enhance patient outcomes.
In this study, we propose a novel hybrid DL approach that employs the ConvNeXtTiny, ConvNeXtBase, and ConvNeXtSmall models for arrhythmia detection from ECG signals. To address class imbalances in ECG datasets, we incorporate SMOTE-TomekLink (STL) and Random Oversampling (RO) techniques, ensuring a balanced and representative dataset for training. By leveraging the capabilities of these ConvNeXt models, our approach captures intricate ECG patterns, thereby enhancing the accuracy and reliability of arrhythmia detection.
The key contributions of this study are:
-
Hybrid Approach for ECG Analysis: We propose a novel hybrid methodology that combines DB techniques with fine-tuned ConvNeXt models (Tiny, Base, Small) to enhance arrhythmia detection accuracy from ECG signals.
-
Advanced Data Balancing (DB) Techniques: To mitigate the challenges posed by class imbalances, we apply SMOTE-TomekLink (STL) and Random Oversampling (RO), which help generate a more balanced dataset, thereby enhancing model robustness and generalizability.
-
Improved Clinical Applicability: The fine-tuned ConvNeXt-X models demonstrate high accuracy and reliability in detecting arrhythmias, showcasing their potential for integration into clinical settings to facilitate early diagnosis and improve patient outcomes.
This paper’s remaining sections are organized as follows: The literature on arrhythmia detection is reviewed in the next section, followed by a description of our research methodology. The performance analysis of our experiments is then presented. Subsequently, the model complexity is analyzed, the research questions are validated, and comparisons with previous works are discussed. The paper concludes by summarizing the research findings and offering recommendations for future research directions.
Literature review
This section presents an in-depth review of recent advancements in the identification and classification of ECG arrhythmias, with a particular focus on the application of DL models. Additionally, it explores various data augmentation strategies employed to address the challenge of data imbalance in ECG datasets, highlighting their effectiveness in improving model performance and ensuring robust arrhythmia detection.
Bechinia et al.15 introduced a novel deep-learning method for classifying and detecting arrhythmias in ECG signals. The study utilized a transfer learning (TL) model featuring MobileNet-V2 architecture and a Lightweight Custom Convolutional Neural Network (LC-CNN). MobileNet-V2 employed pre-trained features, while LC-CNN, equipped with three convolutional layers, aimed to enhance performance. Data preprocessing included R-peak detection for accurate beat segmentation and noise reduction using a Butterworth filter. To tackle class imbalance, an Auxiliary Classifier Generative Adversarial Network (ACGAN) was employed for data augmentation on the MIT-BIH dataset. LC-CNN achieved a superior classification accuracy of 99.22% compared to MobileNet-V2’s 98.69%.
Zhang et al.16 proposed a unique approach for automatic heart rate detection using a two-dimensional convolutional neural network (2D-CNN). The method converted multi-lead single-channel ECG data into dual-channel ECG data before feeding it into the 2D-CNN. A multi-scale pyramid module integrated into the network facilitated comprehensive contextual information extraction from images to improve diagnostic accuracy. Experimental results utilizing public datasets such as Sudden Cardiac Death Holter and MIT-BIH demonstrated classification accuracies of 99.12% and 98.40%, respectively. This approach efficiently monitored abnormal heart rates, potentially reducing the risk of sudden death associated with such conditions, without requiring human preprocessing.
Zubair et al.17 presented a novel deep representation learning technique designed to address imbalanced data distributions for efficient arrhythmic beat detection. Unlike traditional oversampling methods, their approach utilized a unique translation loss function to transform majority-class data into minority-class samples, improving the model’s ability to generalize minority-class representations. An augmented attention module further enhanced performance by focusing on crucial information. After applying the oversampling strategy, the method significantly improved classification performance, achieving 96.19% accuracy when evaluated using an inter-patient classification model on the MIT-BIH arrhythmia database.
Chen et al.18 introduced a multiperceptive region spatial-temporal graph convolutional shrinking network (MPR-STSGCN) for intelligent arrhythmia recognition. Existing graph convolutional networks (GCNs) often struggle with fixed perceptual zones and noise susceptibility in ECG data, hindering efficient signal correlation capture. The MPR-STSGCN addressed these challenges by combining features from multiple perceptual regions and utilizing a shrinkage block incorporating channel attention and soft thresholding modules to filter unnecessary features dynamically. Tested on the MIT-BIH Arrhythmia dataset, the MPR-STSGCN outperformed state-of-the-art techniques, demonstrating its potential for highly accurate arrhythmia identification with impressive results.
Tahmid et al.19 presented an advanced DL architecture designed for the automatic categorization of CVDs using ECG inputs. Traditional DL models often lose time-sequence relationships and inter-channel information. To address these challenges, their model extracted volumetric, spatial, and temporal information from multi-lead ECG signals using multidimensional convolutions. Evaluation of a large public dataset with ECG signals from over 10,000 patients demonstrated the effectiveness of the design, achieving an impressive classification accuracy of 97.3%. This research provides valuable insights into enhancing DL-based ECG-based CVD classification, setting the stage for future advancements in the field.
Aphale et al.20 introduced ArrhyNet, a custom 15-layer CNN tailored for accurate arrhythmia detection using the MIT-BIH Arrhythmia Database. ArrhyNet addresses dataset imbalance through noise elimination via low-pass and baseline wander filters, and employs the Daubechies Wavelet Transform for feature extraction. Classifying 16 arrhythmia types into 5 classes according to AAMI standards, ArrhyNet comprises six 1-D convolution layers, four max-pooling layers, one global max-pooling layer, a flattened layer, and three dense layers. Evaluation metrics include top-1 accuracy of 92.73%, and macro-average precision, recall, and F1 scores of 91%, 92%, and 91% respectively, with a weighted average of 93%. Precision measures correct positive predictions, recall identifies actual positives, and the F1 score balances precision and recall harmonically. The Classification Report (Table IV) highlights ArrhyNet’s efficacy, particularly in achieving high true positives for class 0 among 1572 test beats, underscoring its robust performance in arrhythmia detection.
Katal et al.21 evaluated techniques for discerning arrhythmia patterns, focusing on metrics like accuracy, specificity, precision, and F1 score. A novel CNN was proposed and meticulously designed for this purpose, compared against established models such as GoogleNet. The evaluation utilized three PhysioNet databases: MIT-BIH Arrhythmia, MIT-BIH Normal Sinus Rhythm, and BIDMC Congestive Heart Failure, each offering unique insights into cardiac health across different conditions. The study comprehensively assessed the CNN’s performance across varied cardiac scenarios. Results highlighted the small CNN’s superiority with an overall accuracy of 91.20% and validation accuracy of 92.31%, surpassing GoogleNet. This underscores its robust performance in accurately identifying arrhythmia patterns, which is essential for prompt diagnosis and intervention in clinical settings.
Shi et al.22 presented a novel automatic classification system for ECG heartbeat classification, aimed at improving diagnostic performance. The system integrated a CNN and long short-term memory (LSTM) network within a deep architecture with multiple input layers. Four input layers were designed based on distinct regions of a heartbeat and RR interval features. The first three inputs underwent convolution with varying strides, and their outputs were concatenated and processed through an LSTM network. Subsequently, two fully connected layers followed, and their output was combined with the fourth input. The final fully connected layer produced the predicted label. Evaluation on the MIT-BIH arrhythmia database, using both class-oriented and subject-oriented schemes, achieved high accuracies of 94.20%. The study emphasized the effectiveness of integrating automatic and handcrafted features in heartbeat classification, highlighting the system’s potential for clinical applications.
Banos et al.23 introduced a novel computational model for cardiac arrhythmia detection, which integrated the particle swarm optimization (PSO) algorithm with a CNN. Using the MIT-BIH Arrhythmia Dataset (MITDB), the model aimed to optimize CNN hyperparameters to improve accuracy and reduce categorical cross-entropy errors. The proposed approach successfully identified an optimal layered architecture within 17.68 h, achieving 97% accuracy and low error rates in both the training and testing phases. This innovative method eliminated the need for manual hyperparameter selection, showcasing reliability and presenting a promising approach for advancing arrhythmia detection techniques.
Huang et al.24 introduced a novel method for ECG arrhythmia detection using a two-dimensional (2D) deep CNN, leveraging recent advancements in DL for feature extraction. Time-domain ECG signals representing five heartbeat types were transformed into time-frequency spectrograms via short-time Fourier transform. These spectrograms served as inputs to the 2D-CNN, facilitating the identification and classification of arrhythmia types. Model parameter optimization revealed optimal performance with a learning rate of 0.001 and a batch size of 2500, resulting in highest accuracy and minimal loss. Furthermore, the proposed 2D-CNN was compared with a conventional one-dimensional CNN model, where the 1D-CNN achieved an average accuracy of 90.93%. These results underscored the effectiveness of the novel method in enhancing ECG arrhythmia detection accuracy compared to traditional approaches.
NEO-CNN was introduced as a robust arrhythmia detection algorithm tailored for wearable applications and simple micro-controller implementation25. The algorithm accurately detected the QRS complex and precisely located the R-peak using an adaptive time-dependent thresholding technique, significantly improving accuracy and sensitivity in arrhythmia detection. Employing an optimized compact 1D-CNN network with 9,701 parameters, the method achieved a notable classification accuracy of 97.83%. During training, a QRS complex augmentation method was introduced to reduce R-peak location errors. The algorithm’s robustness was rigorously evaluated using a nested k1k2-fold cross-validation method. Implemented on the STM32F407 microcontroller, NEO-CNN demonstrated exceptional performance, achieving high accuracy and sensitivity.
Farag et al.26 introduced a novel approach for real-time arrhythmia detection at the edge using a lightweight, self-contained model based on a short-time Fourier Transform (STFT) CNN. The model utilized an STFT-based 1D convolutional layer to extract spectrograms from ECG signals, which were then transformed into 2D heatmap images for classification using a Conv2D neural network. Training and evaluation employed the MIT-BIH arrhythmia database, optimizing four model variants initially on a cloud platform and subsequently on a Raspberry Pi for edge deployment. Techniques such as weight quantization and pruning reduced the model size to 90 KB, achieving up to 99.1% classification accuracy and 95% F1-score. This study highlighted the potential of deploying efficient, real-time arrhythmia classifiers on edge devices, addressing the need for privacy-preserving ECG analysis in decentralized healthcare applications.
Singh et al.27 introduced an attention-based convolutional denoising autoencoder (ACDAE) model for rapid and accurate denoising and classification of low-quality ECG signals. The model employed skip-layer connections and a lightweight efficient channel attention (ECA) module to efficiently reconstruct ECG signals from severe noise conditions and enhance relevant features. Training and evaluation utilized four widely available databases, including the MIT-BIH noise stress test database (NSTDB) and the MIT-BIH arrhythmia (BIHA) database. The evaluation involved mixing ECG signals with simulated additive white Gaussian noise (AWGN) and NSTDB noise at various dB levels. The ACDAE model demonstrated significant improvements, achieving an average signal-to-noise ratio (SNR) enhancement of 19.07 ± 1.67 and a percentage-root-mean-square difference (PRD) of 11.0% at 0-dB SNR. For classification, the model achieved 98.88% ± 0.42% accuracy, 98.76% ± 0.44% precision, and 98.48% ± 0.58% recall using stratified fivefold cross-validation on 60,000 beats. This study comprehensively validated the effectiveness of ACDAE in denoising low SNR ECG signals and improving atrial fibrillation (AF) classification compared to current benchmarks.
The automated heartbeat classification system was developed to address the global rise in cardiovascular diseases, leveraging the MIT-BIH arrhythmia database for training and testing in28. The approach extracted 61 features using the time-series feature extraction library (TSFEL) and applied various techniques such as feature scaling, correlation removal, and random forest (RF) recursive feature elimination for enhanced classification. TSFEL integration during feature extraction, synthetic minority oversampling, and an ensemble of RF and support vector machines (SVM) with weighted majority voting were key innovations. Hyperparameter optimization via grid search was performed for RF and SVM classifiers. Evaluation under a “subject-specific” scheme demonstrated high sensitivity for arrhythmic heartbeat classes: N (99.50%), SVEB (74.20%), VEB (94.22%), F (73.21%), and Q (0%). Comparative analysis with state-of-the-art methods showed significant efficiency gains, achieving an impressive 98.21% accuracy. This study emphasized the pivotal role of automated heartbeat classification in accelerating cardiovascular disease diagnosis, potentially reducing both healthcare costs and societal impacts.
Pokaprakarn et al.29 proposed a novel DL architecture that combines CNN and Recurrent Neural Networks (RNN) for segmenting and classifying five cardiac rhythms based on ECG recordings. Operating in a sequence-to-sequence setting, the model processed five-second ECG signal windows as input and produced cardiac rhythm labels as output. Notably, the architecture handled both spectrograms and heartbeat signal waveforms, ensuring robust performance against label noise. Validation on an external database demonstrated an average F1 score of 0.89 across five classes. The approach achieved classification performance comparable to that of existing methods with significantly fewer training parameters. Models A (Co-teaching) and B (PENCIL) achieved impressive test accuracies of 97.60% and 97.43%, respectively.
Gill et al.30 focused on the critical task of arrhythmia categorization in cardiology and biomedical engineering research. Accurate categorization is crucial for diagnosis, therapy planning, and patient care due to the severe health impacts of irregular cardiac rhythms. The study explored various feature extraction and selection strategies to identify informative aspects of ECG signals essential for precise classification. Feature engineering played a pivotal role in crafting effective arrhythmia detection models by discovering novel features or combinations that enhanced accuracy. The study aimed to develop an ECG classification system using TL, where the DenseNet121 model demonstrated notable classification performance with an accuracy rate of 81% in detecting arrhythmias.
Proposed methodology
In this study, we present a novel methodology for arrhythmia detection from ECG signals. Our approach is designed to tackle common challenges in ECG signal analysis, such as noise, missing data, and class imbalance, while leveraging the power of advanced DL models to achieve high accuracy in arrhythmia classification. The proposed framework, illustrated in Fig. 1, follows a systematic process, which includes the following key steps: dataset preparation, data preprocessing, data balancing, model development, and performance evaluation.
Dataset Preparation: The process begins with selecting a comprehensive ECG dataset containing both normal and arrhythmic heart rate data. This dataset is critical for training a model capable of detecting various arrhythmia types. By utilizing a diverse dataset, the model is better equipped to generalize across different arrhythmic patterns.
Data Preprocessing: To prepare the raw ECG signals for model training, several preprocessing techniques are applied. These include noise removal using bandpass filtering, handling of missing values, and normalization to standardize the data. These preprocessing steps are essential for ensuring that the input signals are clean, consistent, and ready for accurate modeling.
Data Balancing: Given the inherent class imbalance in ECG datasets, we apply advanced data balancing techniques to address this issue. Specifically, we use SMOTE-TomekLink (STL) and Random Oversampling (RO) methods to generate a more balanced dataset. This ensures that both normal and arrhythmic signals are adequately represented during training, which improves model robustness and performance.
Model Development: The core of our methodology involves the development of deep learning models using the ConvNeXt architecture. Three variants of ConvNeXt—ConvNeXtTiny, ConvNeXtBase, and ConvNeXtSmall—are utilized to capture spatial and temporal dependencies in ECG signals. These models are fine-tuned to ensure optimal performance in detecting arrhythmias.
Performance Evaluation: Finally, the trained models are evaluated using several performance metrics, including accuracy, precision, recall, f1-score, sensitivity and specificity. These metrics provide a comprehensive assessment of the model’s ability to accurately classify arrhythmias and its overall effectiveness in real-world clinical applications.
By following this structured methodology, we aim to significantly enhance the accuracy of arrhythmia detection while addressing the inherent challenges in ECG signal analysis. The integration of advanced DL techniques with robust preprocessing and data balancing strategies enables the development of an accurate and reliable arrhythmia detection system. Ultimately, this approach seeks to improve the performance of ECG-based diagnostic tools, supporting more informed clinical decision-making and contributing to better patient outcomes.
Dataset description
The MIT-BIH Arrhythmia Database is a widely used benchmark dataset for the study of ECG arrhythmia detection31. It contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects, including both healthy individuals and patients with arrhythmias. The dataset was recorded at a sampling rate of 360 Hz with 11-bit resolution over a 0.1-volt range, providing a detailed representation of ECG signals. These recordings encompass a wide variety of arrhythmias, making the dataset highly valuable for training and evaluating models designed to detect abnormal heart rhythms. The MIT-BIH dataset includes a total of 23 different types of arrhythmias, including but not limited to premature ventricular contractions (PVCs), atrial premature beats (APBs), and other complex arrhythmias. The ECG signals are manually annotated by experts, with annotations indicating the type and location of each arrhythmic event. The dataset is divided into two groups: one for training and the other for testing, with a balanced distribution of normal and abnormal rhythms. This structure allows for both supervised training of machine learning models and evaluation of their generalization capabilities.
In this study, we utilized the MIT-BIH Arrhythmia Database to train and test ConvNeXt-X models for arrhythmia detection. The diverse set of arrhythmic patterns in the dataset provides a challenging yet essential resource for developing robust ECG classification systems, making it an ideal choice for exploring advanced machine learning techniques in the field of ECG arrhythmia detection.
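The paper does not detail its data ingestion pipeline. As a hedged illustration, the snippet below shows one common way to read a MIT-BIH record and its expert beat annotations in Python using the open-source wfdb package; record “100” is used purely as an example and is not necessarily how the authors loaded the data.

```python
import wfdb

# Read record 100 and its reference annotations directly from PhysioNet ("mitdb").
record = wfdb.rdrecord("100", pn_dir="mitdb")          # two-channel ECG sampled at 360 Hz
annotation = wfdb.rdann("100", "atr", pn_dir="mitdb")  # expert beat annotations

signal = record.p_signal          # array of shape (n_samples, 2), in physical units (mV)
r_peak_locations = annotation.sample  # sample indices of annotated beats
beat_symbols = annotation.symbol      # beat type codes, e.g. 'N', 'A', 'V', 'F', '/'
```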
One ECG sample was randomly selected from each category in the dataset to illustrate the variations between the different arrhythmia categories in Fig. 2.
Figure 3 shows the count of each label, providing a clear view of the number of samples per arrhythmia category.
Data preprocessing
Data preprocessing is a crucial step in any machine learning or DL pipeline, especially when working with biomedical signals like Electrocardiogram (ECG) data. Raw signals often contain noise and artifacts that can hinder the performance of machine learning models, making preprocessing essential to ensure high-quality data. Effective data preprocessing enhances model accuracy, ensures faster convergence during training, and reduces the risk of overfitting by presenting the model with clean and standardized data. In our study, preprocessing of the raw ECG signals is necessary to remove noise, handle missing values, and prepare the data for optimal input into our DL models. The preprocessing steps for the ECG signals are outlined below:
Handling missing values
Incomplete or missing data points in the ECG dataset can introduce bias or reduce the effectiveness of the learning model. To address this, we employed specific imputation techniques:
-
For numerical columns, missing values were replaced with the mean of the respective column, ensuring a consistent signal amplitude range across samples.
-
For categorical columns, missing values were filled using the mode (most frequent category) to preserve the distribution of categorical features.
Signal processing techniques
To further enhance the quality of the ECG signals, the following signal processing methods were applied:
-
Bandpass Filtering: ECG signals often contain noise from high-frequency electrical interference, which can mask important signal characteristics. We applied a bandpass filter with a frequency range of 0.5 Hz to 40 Hz, effectively removing noise outside this range. This ensures that only the relevant frequency components of the ECG signal are preserved, improving the signal-to-noise ratio.
-
Baseline Drift Removal: Baseline drift refers to low-frequency changes in the ECG signal, which can occur due to patient movement or electrode placement. We utilized detrending techniques to eliminate this drift, ensuring that the signal remains centered around a baseline, making it easier to analyze specific arrhythmia patterns.
-
Normalization: After filtering, normalization was applied to standardize the ECG signal. Using the StandardScaler, we scaled the signals so that they have a zero mean and unit variance. This step is essential for DL models as it ensures that features have comparable ranges, leading to faster convergence and improved model performance.
One-hot encoding
For categorical data present in the dataset (e.g., arrhythmia class), we applied one-hot encoding. This method transforms categorical variables into binary vectors, making the data suitable for machine learning models. By representing categorical data in this format, the model can interpret and differentiate between various classes more effectively during training.
These preprocessing steps ensure that the ECG signals are clean, free from noise, and ready for efficient input into the DL models.
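A minimal sketch of the preprocessing steps above is given below, assuming the beats are stored in a pandas DataFrame whose label column holds an integer-coded class (0–4). The column name, the filter order of 4, and the use of to_categorical are illustrative assumptions, not the authors’ exact implementation.

```python
import pandas as pd
from scipy.signal import butter, filtfilt, detrend
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.utils import to_categorical

FS = 360  # MIT-BIH sampling rate in Hz

def preprocess(df: pd.DataFrame, label_col: str = "label"):
    """Impute missing values, filter, detrend, normalize, and one-hot encode labels."""
    # Mean imputation for numerical columns, mode imputation for categorical columns.
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].mean())
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])

    X = df.drop(columns=[label_col]).to_numpy(dtype=float)
    y = df[label_col].to_numpy()

    # 0.5-40 Hz Butterworth bandpass (order 4 assumed), applied zero-phase along each beat.
    b, a = butter(4, [0.5, 40.0], btype="bandpass", fs=FS)
    X = filtfilt(b, a, X, axis=1)

    # Remove residual baseline drift, then scale to zero mean and unit variance.
    X = detrend(X, axis=1)
    X = StandardScaler().fit_transform(X)

    # One-hot encode the five arrhythmia classes.
    y_onehot = to_categorical(y, num_classes=5)
    return X, y_onehot
```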
Data balancing (DB) techniques
In ML and DL, imbalanced datasets—where certain classes significantly outnumber others—pose a challenge by leading to biased models that disproportionately favor the majority class. This imbalance is particularly critical in arrhythmia detection from ECG signals, as rare arrhythmias, despite their clinical significance, are often underrepresented in the dataset. A failure to adequately address this imbalance can hinder the model’s ability to accurately detect such rare conditions, reducing its overall robustness and clinical utility. To mitigate this issue, we adopted two advanced data balancing techniques: SMOTE-Tomek Link (STL) and Random Oversampling (RO). These methods were carefully selected for their ability to enhance the representation of minority classes in the training data while minimizing the risk of introducing noise or overfitting. STL combines the Synthetic Minority Oversampling Technique (SMOTE) with Tomek Link undersampling to generate synthetic samples for minority classes and simultaneously eliminate borderline or overlapping samples, ensuring a cleaner dataset. RO, on the other hand, duplicates existing samples from minority classes to balance the dataset, offering a simpler yet effective approach. Together, these techniques play a pivotal role in improving the model’s capacity to detect rare arrhythmias with higher precision and reliability.
Random oversampling (RO)
RO is a straightforward yet effective technique where the minority class is augmented by duplicating existing samples until the dataset becomes balanced32. This method increases the representation of under-represented classes without altering the feature space. Mathematically, if the minority class has \(n\) samples, RO increases the number of samples to match the majority class by drawing with replacement:
$$\begin{aligned} \{x_1, x_2, \dots , x_n\} \;\rightarrow \; \{x_1, x_2, \dots , x_n\} \cup \{x_{i_1}, x_{i_2}, \dots , x_{i_k}\}, \quad i_1, \dots , i_k \in \{1, \dots , n\} \end{aligned}$$
where \(\{x_1, x_2, \dots , x_n\}\) are the original minority class samples and \(\{x_{i_1}, x_{i_2}, \dots , x_{i_k}\}\) are the newly added duplicated samples, with \(k\) determined by the difference in class sizes. While this technique is computationally inexpensive and ensures class balance, care must be taken to prevent overfitting due to repeated instances.
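As a hedged illustration, RO can be applied with the imbalanced-learn library as sketched below; the sampling strategy and random seed are assumptions made for reproducibility, not the authors’ reported settings.

```python
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

# X: (n_beats, n_features) array, y: integer class labels (imbalanced)
ros = RandomOverSampler(sampling_strategy="auto", random_state=42)
X_ro, y_ro = ros.fit_resample(X, y)  # minority beats duplicated by sampling with replacement

print("Before:", Counter(y))     # e.g. heavily skewed toward the Normal class
print("After :", Counter(y_ro))  # every class raised to the majority-class count
```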
Figure 4 shows the distribution of labels after applying RO to balance the dataset. The bar plot illustrates the counts and percentages of each label, respectively. The categories include Normal, Atrial Premature, Premature Ventricular Contraction, Fusion of Ventricular and Normal, and Fusion of Paced and Normal. The resampling process has successfully equalized the number of instances for each class, ensuring a more balanced dataset for training the model. This balance is crucial for improving the model’s performance across all classes, particularly the minority ones.
For our arrhythmia disease detection task, both STL and RO were employed, each evaluated separately, to ensure that rare arrhythmias were adequately represented in the training set. These techniques allowed us to generate training datasets that not only maintained a balanced distribution but also minimized noise and potential bias toward the majority class. They were crucial in improving the model’s ability to detect minority arrhythmias, as reflected in the performance metrics during evaluation.
SMOTE-TomekLink (STL)
STL is a hybrid resampling technique that combines the Synthetic Minority Over-sampling Technique (SMOTE) with the Tomek Link algorithm33. This method helps in addressing imbalances by over-sampling the minority class while simultaneously cleaning the majority class. The process can be mathematically broken down as follows:
SMOTE (Synthetic Minority Over-sampling Technique): SMOTE generates synthetic samples for the minority class by interpolating between existing minority class samples. Given a minority class sample \(x_i\), the SMOTE algorithm selects one of its \(k\)-nearest neighbors \(x_{i, k}\), and a synthetic sample is created as:
$$\begin{aligned} x_{\text {new}} = x_i + \lambda \, (x_{i, k} - x_i) \end{aligned}$$
where \(\lambda\) is a random number between 0 and 1. This process results in new data points that lie along the line segments connecting \(x_i\) and its neighbors, helping to balance the class distribution by creating new, realistic samples.
Tomek Link: After applying SMOTE, the Tomek Link algorithm is used for cleaning the dataset by identifying and removing instances that form Tomek Links. A pair of data points \((x_i, x_j)\) forms a Tomek Link if:
$$\begin{aligned} \nexists \, x_l : \quad d(x_i, x_l)< d(x_i, x_j) \;\; \text {or} \;\; d(x_j, x_l) < d(x_i, x_j) \end{aligned}$$
where \(d(\cdot )\) represents the distance between two samples. If such a pair exists where \(x_i\) belongs to the majority class and \(x_j\) to the minority class, the majority class sample \(x_i\) is removed. This process helps to reduce class overlap and clean the decision boundary, improving classification performance.
By combining SMOTE and Tomek Link, STL overcomes the weaknesses of over-sampling methods by not only generating synthetic samples but also removing noisy examples from the majority class.
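A minimal sketch of STL with the imbalanced-learn library is shown below; the SMOTE neighborhood size (k = 5) and the random seed are illustrative assumptions rather than the authors’ reported settings.

```python
from imblearn.combine import SMOTETomek
from imblearn.over_sampling import SMOTE

# SMOTE interpolates synthetic minority beats; Tomek links then prune overlapping
# majority/minority pairs near the decision boundary.
stl = SMOTETomek(smote=SMOTE(k_neighbors=5, random_state=42), random_state=42)
X_stl, y_stl = stl.fit_resample(X, y)
```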
Figure 5 demonstrates the effects of using STL to balance our dataset. Initially, we combined the training and test data and then applied the STL technique, which both oversamples the minority class and removes Tomek links to enhance class separation. The bar plot illustrates the counts and percentages of each label, respectively. The categories include Normal, Atrial Premature, Premature Ventricular Contraction, Fusion of Ventricular and Normal, and Fusion of Paced and Normal.
ConvNeXt overview
TL in ConvNeXt allows pre-trained ConvNeXt models, originally trained on large datasets, to be fine-tuned for specific tasks like ECG signal classification34,35,36. This approach leverages learned features from similar domains, reducing training time and improving performance even with limited ECG data. ConvNeXt itself is a modern convolutional neural network (CNN) architecture designed to streamline and improve traditional CNN-based models37,38,39. It builds upon the foundations of classic CNNs by modernizing their design to handle more complex data structures and patterns while maintaining computational efficiency. In particular, it has proven effective in tasks requiring the analysis of sequential and spatial information, such as ECG signals. It introduces a deeper and more flexible architecture, consisting of convolutional layers optimized for feature extraction, normalization techniques that stabilize learning, and pooling layers that reduce dimensionality while preserving important information. This makes ConvNeXt highly suitable for tasks involving time-series data, where both local and global patterns need to be captured. ConvNeXt is particularly advantageous for ECG signal classification, where detecting both short-term variations (such as individual heartbeats) and long-term trends (such as rhythm patterns) is crucial for accurate arrhythmia detection.
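For illustration only, the snippet below sketches how a pre-trained ConvNeXtTiny backbone could be loaded and fine-tuned with Keras (ConvNeXtSmall and ConvNeXtBase can be swapped in the same way). Since the paper does not specify how the one-dimensional beats are mapped to the backbone’s input, the (224, 224, 3) image-shaped input, the global average pooling, and the single softmax head here are assumptions, not the authors’ exact configuration.

```python
import tensorflow as tf

# Pre-trained ConvNeXtTiny backbone without its ImageNet classification head.
backbone = tf.keras.applications.ConvNeXtTiny(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = True  # fine-tune the whole backbone at a low learning rate

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(5, activation="softmax"),  # five arrhythmia classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```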
ConvNeXtTiny
The ConvNeXtTiny model represents the smallest variant within the ConvNeXt family, tailored for scenarios involving smaller datasets or constrained computational resources. Despite its compact design, ConvNeXtTiny maintains the ability to capture critical features in ECG signals through its hierarchical convolutional architecture. Its simplicity and efficiency make it particularly well-suited for applications requiring low latency or real-time performance. With fewer parameters, it mitigates the risk of overfitting, ensuring robust feature extraction while minimizing computational overhead. This efficiency makes ConvNeXtTiny an optimal choice for low-resource environments or preliminary analysis of ECG signals.
ConvNeXtSmall
Positioned as the mid-sized variant of the ConvNeXt architecture, ConvNeXtSmall is engineered to address the needs of medium-scale datasets. By incorporating a greater number of parameters and deeper layers compared to ConvNeXtTiny, this model is better equipped to identify intricate patterns within ECG signals. ConvNeXtSmall achieves a balanced trade-off between computational efficiency and predictive performance, offering improved accuracy for arrhythmia classification. It is particularly advantageous for projects with moderate computational resources, as its architectural enhancements provide superior sensitivity to subtle variations in ECG data while maintaining reasonable training times.
ConvNeXtBase
The ConvNeXtBase model is the largest and most sophisticated variant employed in this study, designed to process large datasets with high-dimensional and complex features. Its deep architecture, characterized by an extensive number of parameters and layers, enables the model to learn intricate features and dependencies within ECG signals. This makes ConvNeXtBase ideal for tasks requiring high accuracy and robustness, such as detailed arrhythmia classification. While the model demands significant computational resources and longer training durations, its superior capacity for capturing both short-term and long-term patterns in ECG data justifies its use in projects prioritizing precision and reliability. ConvNeXtBase represents the pinnacle of performance among the three variants, excelling in scenarios where dataset size and computational power are not limiting factors.
Fine-tuning of ConvNeXt-X models
Fine-tuning involves adapting pre-trained models to a specific task by training them further on a domain-specific dataset. In this study, we fine-tuned ConvNeXtTiny, ConvNeXtSmall, and ConvNeXtBase models, originally trained on large-scale image datasets, to optimize them for the unique requirements of ECG signal classification. The architecture of these models was reconfigured by introducing additional layers to accommodate the one-dimensional nature of ECG data while preserving the pre-trained feature extraction capabilities of ConvNeXt. The process included using the Adam optimizer with a learning rate of 0.0001 to achieve stable and efficient convergence. Metrics such as accuracy, precision, and confusion matrix evaluations were employed to assess the models’ performance on arrhythmia detection. This approach enabled the models to effectively adapt to ECG signal patterns, demonstrating their robustness in distinguishing between various cardiac conditions. The detailed structural modifications and layer configurations used for this adaptation are outlined below; a Keras sketch of this layer stack follows the list.
-
Input Layer—This layer defines the shape of the input data, which includes the sequence length (timesteps) and the number of features per timestep. No computation is performed here.
First Convolutional Block:
-
Conv1D—This layer applies 64 convolution filters, each with a kernel size of 3, to the input data. The filters slide along the time dimension (sequence length), performing element-wise multiplication and summing up the results. The purpose of this layer is to extract local patterns in the input sequences. The ReLU (Rectified Linear Unit) activation function is applied to introduce non-linearity, enabling the network to learn more complex patterns. The operation performed is:
$$\begin{aligned} y_{i,j} = \text {ReLU}\left( \sum _{k=1}^{K} x_{i,j+k} \cdot w_{j,k} + b_j\right) \end{aligned}$$where \(y_{i,j}\) is the output feature map, \(x_{i,j+k}\) is the input sequence, \(w_{j,k}\) are the filter weights, \(b_j\) is the bias, \(K\) is the kernel size (3 in this case), and ReLU is the activation function defined as:
$$\begin{aligned} \text {ReLU}(z) = \max (0, z) \end{aligned}$$
-
BatchNormalization—This layer normalizes the output of the convolutional layer to have zero mean and unit variance:
$$\begin{aligned} \hat{y}_{i,j} = \frac{y_{i,j} - \mu _j}{\sqrt{\sigma _j^2 + \epsilon }} \end{aligned}$$where \(\mu _j\) and \(\sigma _j^2\) are the mean and variance of the feature map \(y_{i,j}\), and \(\epsilon\) is a small constant to prevent division by zero.
-
MaxPool1D—This layer performs max pooling, reducing the dimensionality of the feature maps by taking the maximum value in each window of size 2:
$$\begin{aligned} z_{i,j} = \max (y_{i,2j}, y_{i,2j+1}) \end{aligned}$$where \(z_{i,j}\) is the output of the max pooling operation.
Second Convolutional Block:
-
Conv1D—Similar to the first convolutional layer, this layer applies 64 filters with a kernel size of 3. The same convolution operation is performed:
$$\begin{aligned} y_{i,j} = \text {ReLU}\left( \sum _{k=1}^{K} x_{i,j+k} \cdot w_{j,k} + b_j\right) \end{aligned}$$ -
BatchNormalization—Batch normalization is applied again:
$$\begin{aligned} \hat{y}_{i,j} = \frac{y_{i,j} - \mu _j}{\sqrt{\sigma _j^2 + \epsilon }} \end{aligned}$$ -
MaxPool1D—Max pooling is performed with the same parameters (pool size = 2, strides = 2, padding = “same”):
$$\begin{aligned} z_{i,j} = \max (y_{i,2j}, y_{i,2j+1}) \end{aligned}$$
Dense Layer:
-
Flatten()—This layer converts the 3D output of the last max pooling layer into a 1D vector. This is necessary because the dense (fully connected) layers that follow expect 1D input. Flattening preserves the spatial structure of the data, arranging it into a single vector of features.
-
First Dense Layer—This fully connected layer has 128 neurons. Each neuron receives input from all neurons of the previous layer (the flattened vector). The ReLU activation function is used to introduce non-linearity. This layer allows the model to learn complex feature representations by combining the features extracted by the convolutional layers in different ways.
-
Second Dense Layer—Another fully connected layer with 64 neurons and ReLU activation. This layer further refines the feature representations learned by the previous dense layer.
-
Third Dense Layer—A fully connected layer with 32 neurons and ReLU activation. This layer reduces the dimensionality of the feature representations, making the model more efficient while still retaining important information.
-
Output Layer—The final dense layer corresponds to the 5 classes in the classification task. Each neuron in this layer represents the probability of the input belonging to one of the classes. The class with the highest probability is chosen as the model’s prediction.
$$\begin{aligned} y_i = \frac{e^{z_i}}{\sum _{j=1}^{5} e^{z_j}} \end{aligned}$$where \(z_i\) is the logit produced for class \(i\) and \(y_i\) is the probability of the input belonging to class \(i\). The softmax function ensures that the output values are between 0 and 1 and sum to 1, representing a probability distribution over the 5 classes. The model follows a typical convolutional neural network architecture with convolutional layers for feature extraction, followed by fully connected layers for classification. Batch normalization and max pooling layers help in stabilizing the training process and reducing dimensionality, respectively. The output layer with softmax activation provides a probability distribution over the target classes.
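The sketch below assembles the layer stack described above into a Keras model. The “valid” Conv1D padding, the categorical cross-entropy loss, and the single-feature input channel are assumptions, since the text does not state them explicitly; the Adam learning rate of 0.0001 follows the fine-tuning description above.

```python
from tensorflow.keras import layers, models, optimizers

def build_head(timesteps: int, n_features: int = 1, n_classes: int = 5) -> models.Model:
    """1D-CNN head as described above: two Conv-BN-MaxPool blocks, then dense layers."""
    inputs = layers.Input(shape=(timesteps, n_features))

    # First convolutional block
    x = layers.Conv1D(64, kernel_size=3, activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPool1D(pool_size=2, strides=2, padding="same")(x)

    # Second convolutional block
    x = layers.Conv1D(64, kernel_size=3, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPool1D(pool_size=2, strides=2, padding="same")(x)

    # Dense classifier
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dense(32, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```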
Performance analysis
The performance analysis evaluates the effectiveness of the ConvNeXt models—ConvNeXtBase, ConvNeXtSmall, and ConvNeXtTiny—in ECG arrhythmia detection. The analysis focuses on key evaluation metrics such as accuracy, precision, recall, F1-score, sensitivity and specificity, providing a comprehensive assessment of the models’ ability to classify ECG signals accurately and efficiently. Additionally, the analysis considers the impact of different data balancing techniques, namely RO and STL, on model performance. By comparing the results across various models and balancing techniques, this section aims to highlight the strengths and limitations of each approach in handling imbalanced ECG data and detecting rare arrhythmia classes. This performance evaluation serves as a foundation for understanding the models’ practical applicability in real-world clinical settings, where accurate and reliable detection of arrhythmias is crucial for patient care and diagnosis.
Experiment setup
The experiments were conducted on a personal computer equipped with an 8GB NVIDIA graphics card, 8GB of RAM, and an Intel Core i5 processor clocked at 1.80 GHz, running a 64-bit version of Windows 10. The model implementation and training were carried out using Python (version 3.12.0), utilizing essential ML and DL libraries such as Scikit-learn (Sklearn), TensorFlow, and Keras. For data preprocessing, manipulation, and analysis, we used pandas and NumPy. To visualize the results, we employed Matplotlib and Seaborn. Performance metrics, including accuracy, precision, recall, F1 score, sensitivity, and specificity, were calculated using Sklearn’s built-in functions.
Performance metrics
The confusion matrix is constructed using the four detection variables: true positive (TP), true negative (TN), false positive (FP), and false negative (FN), as illustrated in Table 1. The following performance metrics were used to evaluate the effectiveness of the proposed model:
-
Accuracy: Accuracy represents the proportion of correctly predicted outcomes out of all observations. It is a general indicator of model performance and is defined as:
$$\begin{aligned} Accuracy = \frac{TP + TN}{TP + FP + FN + TN} \end{aligned}$$(1)
-
Precision: Precision measures the proportion of correctly predicted positive observations among all predicted positive observations. It is an important indicator of the model’s ability to avoid false positives and is defined as:
$$\begin{aligned} Precision = \frac{TP}{TP + FP} \end{aligned}$$(2)
-
Recall (Sensitivity): Recall (also known as sensitivity) measures the proportion of correctly predicted positive observations out of all actual positive observations. It indicates the model’s ability to detect true positives and is calculated as:
$$\begin{aligned} Recall = Sensitivity = \frac{TP}{TP + FN} \end{aligned}$$(3)
-
F1-Score: The F1 score is the harmonic mean of precision and recall, providing a balanced measure of both metrics. It is especially useful when the class distribution is imbalanced:
$$\begin{aligned} F1\_score = 2 \times \frac{(precision \times recall)}{(precision + recall)} \end{aligned}$$(4)
-
Specificity: Specificity, also known as the true negative rate, measures the proportion of correctly predicted negative observations out of all actual negative observations. It indicates the model’s ability to correctly identify negative cases (i.e., to avoid false positives) and is defined as:
$$\begin{aligned} Specificity = \frac{TN}{TN + FP} \end{aligned}$$(5)
These metrics were calculated to assess the model’s performance in terms of both its ability to correctly identify positive and negative cases, as well as to balance the trade-off between precision and recall.
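As a hedged sketch, the metrics above can be computed from predicted and true labels with scikit-learn as shown below. Macro averaging over the five classes is an assumption made for illustration, since the paper does not state its averaging scheme.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def evaluate(y_true, y_pred, n_classes: int = 5) -> dict:
    """Compute accuracy, precision, recall/sensitivity, F1, and specificity."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(n_classes)))
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - (tp + fp + fn)

    return {
        "accuracy":    accuracy_score(y_true, y_pred),
        "precision":   precision_score(y_true, y_pred, average="macro"),
        "recall":      recall_score(y_true, y_pred, average="macro"),  # = sensitivity
        "f1_score":    f1_score(y_true, y_pred, average="macro"),
        "specificity": float(np.mean(tn / (tn + fp))),                 # macro-averaged
    }
```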
Result analysis
The results of our experiments highlight the effectiveness of using ConvNeXt-X models in combination with DB techniques for arrhythmia detection on the MIT-BIH Arrhythmia database. Two resampling methods, RO and STL, were applied to address class imbalances and enhance the detection of minority arrhythmia classes. Table 2 presents the classification performance of ConvNeXtTiny, ConvNeXtSmall, and ConvNeXtBase models across different performance metrics, including accuracy, precision, recall, F1-score, sensitivity, and specificity. The findings indicate that both resampling techniques substantially improved the performance of the ConvNeXt models, with STL consistently outperforming RO in all metrics. The ConvNeXtTiny model demonstrated the best performance across all metrics, achieving a peak accuracy of 99.75% when paired with STL. This model also exhibited superior recall and F1 scores, indicating its ability to balance precision and recall effectively, particularly for minority classes. The ConvNeXtSmall and ConvNeXtBase models also performed exceptionally well, with STL boosting their respective accuracies to 99.72% and 99.69%. Between the two resampling techniques, STL proved to be more effective than RO in improving classification outcomes. While RO merely duplicates minority class instances, STL combines synthetic sample generation with Tomek link removal, creating a more refined and balanced training dataset. This enhancement is evident in the higher sensitivity and specificity values achieved by all three ConvNeXt models when trained with STL (Fig. 6).
Table 3 details the classification performance of ConvNeXtBase, ConvNeXtSmall, and ConvNeXtTiny models on the MIT-BIH Arrhythmia dataset, evaluated using precision, recall, and F1-score for five arrhythmia classes: Normal, Fusion of Paced and Normal, Premature Ventricular Contraction, Atrial Premature, and Fusion of Ventricular and Normal. Under RO, ConvNeXtTiny achieved exceptional F1-scores, including 99.79% for Normal, 99.74% for Fusion of Paced and Normal, 99.63% for Premature Ventricular Contraction, 99.70% for Atrial Premature, and 99.73% for Fusion of Ventricular and Normal. ConvNeXtSmall also demonstrated strong performance, attaining F1-scores of 99.73%, 99.68%, 99.60%, 99.65%, and 99.68%, respectively, for the same classes. ConvNeXtBase provided consistent results, with F1-scores of 99.70%, 99.67%, 99.57%, 99.62%, and 99.69%, respectively. With STL, the models generally improved, particularly for minority classes. ConvNeXtTiny led the performance, achieving F1-scores of 99.82% for Normal, 99.76% for Fusion of Paced and Normal, 99.67% for Premature Ventricular Contraction, 99.75% for Atrial Premature, and 99.77% for Fusion of Ventricular and Normal. ConvNeXtSmall followed closely, with F1-scores of 99.78%, 99.74%, 99.63%, 99.70%, and 99.74%, respectively. ConvNeXtBase also showed enhanced performance under STL, achieving F1-scores of 99.74%, 99.71%, 99.60%, 99.64%, and 99.74%, respectively. Overall, STL outperformed RO in balancing class distributions and improving classification metrics, with ConvNeXtTiny consistently demonstrating the highest accuracy across all classes and resampling methods. These results highlight the effectiveness of STL and the robustness of ConvNeXtTiny in arrhythmia detection.
Figure 7 showcases the training and validation accuracy and loss curves for the ConvNeXt-X models on the MIT-BIH Arrhythmia dataset using the STL resampling technique. Each subfigure highlights the performance dynamics of a specific model in terms of accuracy and loss across the training epochs. In subfigure (a), the ConvNeXtBase accuracy curve exhibits a smooth ascent, reaching good accuracy levels for both training and validation sets, with minimal divergence, indicating effective learning and generalization. Subfigure (b) illustrates the corresponding loss curve, which steadily decreases, stabilizing at low values, further confirming the model’s robustness. Subfigures (c) and (d) represent the ConvNeXtSmall model’s accuracy and loss curves. The accuracy graph indicates rapid convergence with stable validation performance, while the loss graph displays a consistent reduction in values, underscoring efficient optimization and a well-calibrated model. Subfigures (e) and (f) detail the performance of the ConvNeXtTiny model. The accuracy curve demonstrates the quickest and smoothest convergence among the three models, achieving the highest validation accuracy. Similarly, the loss curve for ConvNeXtTiny decreases sharply and stabilizes at the lowest value, signifying exceptional generalization capability. Overall, the graphs highlight the proficiency of the ConvNeXt models when trained using STL. ConvNeXtTiny outperforms its counterparts in achieving superior accuracy and minimized loss, reaffirming its suitability for arrhythmia classification tasks under the STL resampling approach.
Figure 8 presents the confusion matrices for the ConvNeXt-X models, trained on the MIT-BIH Arrhythmia dataset after applying the STL. With STL applied, both the training and test datasets are now balanced, ensuring that each arrhythmia class has an almost equal representation, which helps mitigate the bias caused by class imbalance in model training. For ConvNeXtBase, the model correctly predicted 9011 “Normal,” 9169 “Fusion of Paced and Normal,” 9008 “Premature Ventricular Contraction,” 8860 “Atrial Premature,” and 9118 “Fusion of Ventricular and Normal.” The misclassifications included 6 “Normal” predicted as “Fusion of Paced and Normal,” 11 “Normal” predicted as “Premature Ventricular Contraction,” 5 “Normal” predicted as “Atrial Premature,” and 4 “Fusion of Ventricular and Normal” predicted as “Normal.” For ConvNeXtSmall, the model predicted 9014 “Normal,” 9172 “Fusion of Paced and Normal,” 9010 “Premature Ventricular Contraction,” 8862 “Atrial Premature,” and 9124 “Fusion of Ventricular and Normal.” The misclassifications included 5 “Normal” predicted as “Fusion of Paced and Normal,” 9 “Fusion of Paced and Normal” predicted as “Premature Ventricular Contraction,” 4 “Premature Ventricular Contraction” predicted as “Atrial Premature,” and 6 “Atrial Premature” predicted as “Fusion of Ventricular and Normal.” Lastly, for ConvNeXtTiny, the model predicted 9004 “Normal,” 9166 “Fusion of Paced and Normal,” 9006 “Premature Ventricular Contraction,” 8857 “Atrial Premature,” and 9118 “Fusion of Ventricular and Normal.” The misclassifications included 5 “Normal” predicted as “Fusion of Paced and Normal,” 14 “Normal” predicted as “Premature Ventricular Contraction,” 3 “Fusion of Paced and Normal” predicted as “Premature Ventricular Contraction,” and 9 “Atrial Premature” predicted as “Fusion of Ventricular and Normal.” In all three models, the confusion matrices reveal that the STL balancing technique significantly reduced the number of misclassifications, as most of the predicted labels closely match the true labels, particularly along the diagonal, demonstrating improved classification accuracy across all classes.
Overall, the results underscore the importance of using advanced resampling methods like STL in conjunction with state-of-the-art DL models to achieve robust and accurate arrhythmia classification. This study demonstrates that the ConvNeXtTiny model, with its smaller architecture and efficient processing, offers the best trade-off between computational efficiency and predictive accuracy, making it a promising tool for clinical applications in arrhythmia detection.
Complexity analysis
The complexity analysis evaluates the computational efficiency of three ConvNeXt models—ConvNeXtBase, ConvNeXtSmall, and ConvNeXtTiny—when applied to ECG data, using two different data balancing (DB) techniques: Random Oversampling (RO) and SMOTE-TomekLink (STL). Table 4 summarizes the build time (in seconds) required to train the models and the prediction time (in seconds) for each model under both DB techniques. Under the RO technique, the ConvNeXtBase model exhibited a build time of 1736.39 s and a prediction time of 2.24 s. In contrast, the ConvNeXtTiny model, being the most efficient, had a build time of 1724.67 s and a slightly faster prediction time of 2.14 s. Similarly, under the STL technique, ConvNeXtBase had a build time of 1746.03 s and a prediction time of 2.31 s, while ConvNeXtTiny again demonstrated superior efficiency, with a build time of 1725.19 s and a prediction time of 2.25 s.
Overall, the ConvNeXtTiny model consistently outperformed the other models in terms of both build and prediction times, highlighting its computational efficiency in ECG signal classification tasks.
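For reference, build and prediction times of the kind reported in Table 4 can be measured with a simple wall-clock timer, as sketched below; the model, data splits, and training settings here are placeholders rather than the authors’ exact configuration.

```python
import time

# "Build" (training) time in seconds
start = time.perf_counter()
model.fit(X_train, y_train, validation_split=0.1, epochs=epochs, batch_size=batch_size)
build_time = time.perf_counter() - start

# Prediction (inference) time in seconds
start = time.perf_counter()
y_prob = model.predict(X_test)
prediction_time = time.perf_counter() - start
```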
Discussion
In this study, we introduced a novel hybrid DL approach for arrhythmia detection, integrating ConvNeXtTiny, ConvNeXtBase, and ConvNeXtSmall models with advanced data balancing techniques, such as STL and RO. These techniques were specifically chosen to address the challenge of class imbalance in ECG datasets, which can negatively impact model performance, especially for detecting rare arrhythmic events. By applying STL and RO, our model was able to mitigate the imbalance, enhancing its ability to accurately detect both common and rare arrhythmic patterns. This approach proved to be effective in improving the robustness and reliability of arrhythmia detection, a critical task in clinical environments where precise detection of infrequent arrhythmic events can be life-saving. The results of our comparative analysis, presented in Table 5, demonstrate that our proposed method outperforms several state-of-the-art techniques. Notably, the ConvNeXtTiny model combined with STL achieved an impressive accuracy of 99.75%, significantly surpassing other models such as ArrhyNet (92.73%), CNN (99.22%), CNN+LSTM (94.20%), 2D-CNN (99.12%), and MPR-STSGCN (99.71%). Furthermore, models like RF + SVM (98.21%) and ACDAE (98.88%) showed strong performance but still lagged behind our approach. The exceptional performance of our model can be attributed to the ability of the ConvNeXt architecture to effectively capture intricate patterns in ECG data, while STL and RO ensured that the model remained robust against class imbalance. These findings validate the efficacy of our hybrid approach in extracting meaningful arrhythmic features from ECG signals, which is crucial for accurate diagnosis and patient monitoring.
Our approach demonstrates a substantial improvement over existing methods, particularly in handling imbalanced datasets, which is a common challenge in ECG analysis. While models like CNN and CNN+LSTM showed promising results, they did not match the performance of our hybrid model. The success of our model underscores the importance of combining advanced DL architectures with effective DB techniques to achieve high accuracy and generalizability. The promising results suggest that our model has the potential to significantly enhance the accuracy and efficiency of arrhythmia detection systems, improving clinical decision-making and patient outcomes.
Pros of the proposed method
Our work provides a clear advantage in addressing class imbalance, a significant issue in ECG data that can adversely affect model training and classification performance. The incorporation of STL and RO in our preprocessing pipeline plays a critical role in mitigating this issue, as evidenced by the near-perfect classification accuracy across most classes with the ConvNeXtTiny model. This improvement in data representativeness allows the DL models to generalize better across both majority and minority classes. Compared with existing works, many previous studies, such as those by Bechinia et al. (99.22%) and Zhang et al. (99.12%), achieved high accuracy but did not explicitly address the class imbalance issue or use hybrid architectures for arrhythmia detection. Our approach stands out by incorporating both advanced DL architectures and targeted DB techniques, which together improve the model’s robustness and performance.
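Both balancing steps are available off the shelf in the imbalanced-learn library. The sketch below shows how RO and STL could be applied to flattened beat vectors before training; it is a plain illustration of the two resamplers under assumed array names (X, y), not the authors’ exact preprocessing pipeline.

```python
from collections import Counter

from imblearn.over_sampling import RandomOverSampler
from imblearn.combine import SMOTETomek

# X: 2-D array of beat feature vectors, y: integer class labels (imbalanced)
ro = RandomOverSampler(random_state=42)
X_ro, y_ro = ro.fit_resample(X, y)        # duplicate minority samples at random

stl = SMOTETomek(random_state=42)
X_stl, y_stl = stl.fit_resample(X, y)     # SMOTE synthesis + Tomek-link cleaning

print("Original distribution: ", Counter(y))
print("After RO:              ", Counter(y_ro))
print("After SMOTE-TomekLink: ", Counter(y_stl))
```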
Motivation behind ConvNeXt-X models
The primary motivation for selecting the three models (ConvNeXtTiny, ConvNeXtSmall, and ConvNeXtBase) was to evaluate the effect of model complexity on ECG arrhythmia classification, which has not been extensively explored in existing research. ConvNeXt-X introduces a deeper and more flexible architecture, consisting of convolutional layers optimized for feature extraction, normalization techniques that stabilize learning, and pooling layers that reduce dimensionality while preserving important information. This makes ConvNeXt highly suitable for tasks involving time-series data, where both local and global patterns need to be captured. ConvNeXt is particularly advantageous for ECG signal classification, where detecting both short-term variations (such as individual heartbeats) and long-term trends (such as rhythm patterns) is crucial for accurate arrhythmia detection.
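In Keras, the three ConvNeXt-X backbones are exposed under keras.applications, which makes it straightforward to attach a small classification head for the five beat classes. The sketch below assumes the ECG beats have already been converted into three-channel, 224x224 image-like inputs; this input representation, the dropout rate, and the optimizer settings are illustrative assumptions rather than the paper’s exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ConvNeXtTiny  # ConvNeXtSmall / ConvNeXtBase analogously

NUM_CLASSES = 5
IMG_SHAPE = (224, 224, 3)   # assumed image-like representation of each ECG beat

# Pretrained backbone without the ImageNet classification head
backbone = ConvNeXtTiny(include_top=False, weights="imagenet",
                        input_shape=IMG_SHAPE, pooling="avg")
backbone.trainable = True    # fine-tune the whole backbone

model = models.Sequential([
    backbone,
    layers.Dropout(0.3),                           # illustrative regularization
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```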
Moreover, these models differ in the number of parameters, allowing us to assess their capacity to generalize across various DB techniques, such as RO and STL. Among the models, ConvNeXtTiny, when combined with STL, demonstrated the highest performance, achieving a 99.75% accuracy rate. This model struck an optimal balance between computational efficiency and classification accuracy, particularly in addressing class imbalances in the dataset. Our results highlight the critical role of selecting appropriate models and DB methods to achieve superior performance in ECG classification tasks.
Validation of research questions
- Q1: Accurate Identification of Arrhythmias in ECG Signals: Our hybrid approach effectively captures intricate patterns in ECG signals using ConvNeXt models. The combination of advanced DB techniques like STL ensures that minority classes are well-represented, enhancing the model’s ability to accurately identify arrhythmias.
- Q2: Building a Reliable Automated Model: By fine-tuning ConvNeXt-X models, we have developed a robust methodology that leverages the strengths of each model. This fine-tuning process, combined with DB, results in a highly accurate and reliable automated model for arrhythmia detection.
- Q3: Assessing Model Effectiveness: The effectiveness of our model is demonstrated through comprehensive evaluation against other DL algorithms. The superior accuracy and performance metrics indicate that our approach significantly improves the identification of arrhythmias, supporting more effective clinical decision-making.
Implications for clinical practice
The hybrid approach using ConvNeXt-X models and advanced DB techniques shows great potential for improving arrhythmia detection from ECG signals. By automating the precise identification of cardiovascular arrhythmias, this method can help healthcare providers make faster, more accurate diagnoses, particularly in early detection where timely intervention is critical. The ConvNeXtTiny model, paired with the STL technique, excels in handling imbalanced datasets and detecting rare but severe arrhythmias. This reduces the risk of misdiagnosis, refines decision boundaries, and enhances model generalizability. By integrating this system into clinical workflows, diagnostic accuracy improves, reducing healthcare professionals’ workload and allowing for more personalized, data-driven patient care.
Conclusions
In this study, we leveraged the MIT-BIH Arrhythmia Database and applied two DB techniques, RO and STL, to enhance the performance of ConvNeXt-X models in arrhythmia detection. Our findings demonstrate that the ConvNeXtTiny model, when using RO, achieved an accuracy of 99.72%, with precision, recall, and F1-scores all at 99.72%. Similarly, the ConvNeXtBase and ConvNeXtSmall models achieved accuracies of 99.65% and 99.67%, respectively. The application of STL resulted in a further performance boost: the ConvNeXtTiny model attained an accuracy of 99.75%, with precision, recall, and F1-scores of 99.75%, while the ConvNeXtBase and ConvNeXtSmall models reached accuracies of 99.69% and 99.72%, respectively. These results highlight the superior effectiveness of the STL technique over RO. The advantage of STL is that it generates synthetic samples for the minority classes and removes ambiguous borderline samples, creating a more diverse and representative dataset. This mitigates the risk of overfitting and enables the model to learn more robust decision boundaries, leading to improved performance, higher accuracy, and better generalization to unseen data.
Cons of the proposed method
Despite the strong performance of our proposed hybrid DL approach, certain limitations warrant consideration. Notably, our study did not incorporate attention mechanisms within the ConvNeXt-X models, nor did we explore the integration of machine learning models or feature fusion techniques alongside our DL architectures. Moreover, while our model addresses class imbalance effectively, further research is needed to establish how it performs on larger, more diverse datasets or when applied to physiological signals beyond the ECG.
Future works
To overcome these constraints, future research will examine how attention mechanisms could be used to improve feature extraction. Furthermore, we aim to evaluate the performance of various machine learning models in conjunction with DL techniques and to explore feature fusion strategies that leverage the strengths of both approaches. Additionally, we plan to validate our model on more diverse datasets and to extend the research to other physiological signals beyond the ECG, thereby enhancing the generalizability and applicability of our findings in clinical settings.
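As one possible direction, an attention block could be inserted between the ConvNeXt feature maps and the classification head. The snippet below is a purely hypothetical sketch of such an extension (Keras MultiHeadAttention over the backbone’s spatial tokens); it is not part of the present study, and the layer sizes are illustrative.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ConvNeXtTiny

# Backbone without pooling so the spatial feature map is preserved
backbone = ConvNeXtTiny(include_top=False, weights="imagenet",
                        input_shape=(224, 224, 3))

inputs = layers.Input(shape=(224, 224, 3))
feat = backbone(inputs)                               # (7, 7, 768) feature map
tokens = layers.Reshape((-1, feat.shape[-1]))(feat)   # 49 spatial tokens of dim 768
attn = layers.MultiHeadAttention(num_heads=4, key_dim=64)(tokens, tokens)
attn = layers.LayerNormalization()(attn + tokens)     # residual connection + norm
pooled = layers.GlobalAveragePooling1D()(attn)
outputs = layers.Dense(5, activation="softmax")(pooled)

model = models.Model(inputs, outputs)
```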
In conclusion, our study demonstrates that the hybrid DL approach, combining ConvNeXt models with DB techniques, offers a substantial improvement in ECG-based arrhythmia detection. The success of this approach suggests it can be further optimized for real-world clinical applications where accurate, efficient arrhythmia detection is critical.
Data availability
The dataset is sourced from the free, open-access MIT-BIH Arrhythmia Database, available at https://physionet.org/content/mitdb/1.0.0/.
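The MIT-BIH records can also be pulled directly from PhysioNet with the wfdb Python package. The following is a small sketch of reading one record and its beat annotations; the record number is chosen for illustration only.

```python
import wfdb

# Stream record 100 and its beat annotations from PhysioNet's 'mitdb' directory
record = wfdb.rdrecord("100", pn_dir="mitdb")
annotation = wfdb.rdann("100", "atr", pn_dir="mitdb")

signal = record.p_signal            # (650000, 2) array: two ECG leads sampled at 360 Hz
beat_locations = annotation.sample  # sample indices of annotated beats
beat_symbols = annotation.symbol    # beat type codes (e.g. 'N', 'V', 'A', 'F', '/')

print(record.fs, signal.shape, len(beat_symbols))
```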
References
Liu, X., Wang, H., Li, Z. & Qin, L. Deep learning in ECG diagnosis: A review. Knowl.-Based Syst. 227, 107187 (2021).
Hussain, M. M., Rafi, U., Imran, A., Rehman, M. U. & Abbas, S. K. Risk factors associated with cardiovascular disorders. Pak. BioMed. J. 03–10 (2024).
Luo, S. & Johnston, P. A review of electrocardiogram filtering. J. Electrocardiol. 43(6), 486–496 (2010).
Ma, S., Cui, J., Chen, C.-L., Chen, X. & Ma, Y. An effective data enhancement method for classification of ECG arrhythmia. Measurement 203, 111978 (2022).
Majnaric, L. & Sabanovic, S. Cardiovascular disease research by using data from electronic health records. Atherosclerosis 252, 41 (2016).
Sahoo, S., Dash, M., Behera, S. & Sabut, S. Machine learning approach to detect cardiac arrhythmias in ECG signals: A survey. IRBM 41(4), 185–194 (2020).
Sampson, M. Ambulatory electrocardiography: Indications and devices. Br. J. Cardiac Nurs. 14(3), 114–121 (2019).
Norlock, V. et al. Comparing the outcomes and costs of cardiac monitoring with implantable loop recorders and mobile cardiac outpatient telemetry following stroke using real-world evidence. J. Comp. Eff. Res. 13(6), 240008 (2024).
Sun, J.-Y., Shen, H., Qu, Q., Sun, W. & Kong, X.-Q. The application of deep learning in electrocardiogram: Where we came from and where we should go?. Int. J. Cardiol. 337, 71–78 (2021).
Ansari, Y., Mourad, O., Qaraqe, K. & Serpedin, E. Deep learning for ECG arrhythmia detection and classification: An overview of progress for period 2017–2023. Front. Physiol. 14 (2023).
Jibon, F. A., Tasbir, A., Talukder, M. A., Uddin, M. A., Rabbi, F., Uddin, M. S., Alanazi, F. K. & Kazi, M. Parkinson’s disease detection from EEG signal employing autoencoder and RBFNN-based hybrid deep learning framework utilizing power spectral density. Digit. Health 10, 20552076241297355 (2024).
Savaş, S., Topaloğlu, N., Kazcı, Ö. & Koşar, P. Comparison of deep learning models in carotid artery intima-media thickness ultrasound images: CAIMTUSNet. Bilişim Teknolojileri Dergisi 15(1), 1–12 (2022).
Rasa, S. M. et al. Brain tumor classification using fine-tuned transfer learning models on magnetic resonance imaging (MRI) images. Digit. Health 10, 20552076241286140 (2024).
Rana, M. M. et al. A robust and clinically applicable deep learning model for early detection of Alzheimer’s. IET Image Process. 17(14), 3959–3975 (2023).
Bechinia, H., Benmerzoug, D. & Khlifa, N. Approach based lightweight custom convolutional neural network and fine-tuned mobilenet-v2 for ECG arrhythmia signals classification. IEEE Access 12, 40827–40841 (2024).
Zhang, F., Wang, J., Li, M. & Wang, B. Multi-scale and multi-channel information fusion for exercise electrocardiogram feature extraction and classification. IEEE Access (2024).
Zubair, M., Woo, S., Lim, S. & Kim, D. Deep representation learning with sample generation and augmented attention module for imbalanced ECG classification. IEEE J. Biomed. Health Inform. (2023).
Chen, Y., Qiu, S., Wang, Z., Zhao, H. & Cao, X. Multiperceptive region of spatial-temporal graph convolutional shrinkage network for arrhythmia recognition. IEEE Trans. Instrum. Meas. (2024).
Tahmid, M. T., Kader, M. E., Mahmud, T. & Fattah, S. A. MD-CardioNet: A multi-dimensional deep neural network for cardiovascular disease diagnosis from electrocardiogram. IEEE J. Biomed. Health Inform. (2023).
Aphale, S. S., John, E. & Banerjee, T. ArrhyNet: A high accuracy arrhythmia classification convolutional neural network. In 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), 453–457 (IEEE, 2021).
Katal, N., Gupta, S., Verma, P. & Sharma, B. Deep-learning-based arrhythmia detection using ECG signals: A comparative study and performance evaluation. Diagnostics 13(24), 3605 (2023).
Shi, H., Qin, C., Xiao, D., Zhao, L. & Liu, C. Automated heartbeat classification based on deep neural network with multiple input layers. Knowl.-Based Syst. 188, 105036 (2021).
Baños, F. S., Romero, N. H., Mora, J. C. S. T., Marín, J. M., Vite, I. B. & Fuentes, G. E. A. A novel hybrid model based on convolutional neural network with particle swarm optimization algorithm for classification of cardiac arrhythmias. IEEE Access (2023).
Huang, J., Chen, B., Yao, B. & He, W. ECG arrhythmia classification using STFT-based spectrogram and convolutional neural network. IEEE Access 7, 92871–92880 (2020).
Sabor, N., Gendy, G., Mohammed, H., Wang, G. & Lian, Y. Robust arrhythmia classification based on QRS detection and a compact 1d-CNN for wearable ECG devices. IEEE J. Biomed. Health Inform. 26(12), 5918–5929 (2022).
Farag, M. M. A self-contained STFT CNN for ECG classification and arrhythmia detection at the edge. IEEE Access 10, 94469–94486 (2022).
Singh, P. & Sharma, A. Attention-based convolutional denoising autoencoder for two-lead ECG denoising and arrhythmia classification. IEEE Trans. Instrum. Meas. 71, 1–10 (2022).
Bhattacharyya, S., Majumder, S., Debnath, P. & Chanda, M. Arrhythmic heartbeat classification using ensemble of random forest and support vector machine algorithm. IEEE Trans. Artif. Intell. 2(3), 260–268 (2021).
Pokaprakarn, T. et al. Sequence to sequence ECG cardiac rhythm classification using convolutional recurrent neural networks. IEEE J. Biomed. Health Inform. 26(2), 572–580 (2021).
Gill, K. S., Anand, V. & Gupta, R. Arrhythmia classification using ECG image dataset using machine learning approach on DenseNet121 model. In 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 1–4 (IEEE, 2023).
Mark, R., Schluter, P., Moody, G., Devlin, P. & Chernoff, D. An annotated ECG database for evaluating arrhythmia detectors. IEEE Trans. Biomed. Eng. 29, 600 (1982).
Talukder, M. A., Khalid, M. & Uddin, M. A. An integrated multistage ensemble machine learning model for fraudulent transaction detection. J. Big Data 11, 168 (2024).
Talukder, M. A., Sharmin, S., Uddin, M. A., Islam, M. M. & Aryal, S. MLSTL-WSN: Machine learning-based intrusion detection using SMOTETomek in WSNs. Int. J. Inf. Secur. 23(3), 2139–2158 (2024).
Liu, F. et al. An improved covid-19 lung x-ray image classification algorithm based on convnext network. Int. J. Image Graph. 24(03), 2450036 (2024).
Benchallal, F., Hafiane, A., Ragot, N. & Canals, R. Convnext based semi-supervised approach with consistency regularization for weeds classification. Expert Syst. Appl. 239, 122222 (2024).
Mo, H. & Wei, L. SA-ConvNeXt: A hybrid approach for flower image classification using selective attention mechanism. Mathematics 12(14), 2151 (2024).
Talukder, M. A., Layek, M. A., Kazi, M., Uddin, M. A. & Aryal, S. Empowering covid-19 detection: Optimizing performance through fine-tuned efficientnet deep learning architecture. Comput. Biol. Med. 168, 107789 (2024).
Islam, M. M. et al. A deep learning model for cotton disease prediction using fine-tuning with smart web application in agriculture. Intell. Syst. Appl. 20, 200278 (2023).
Islam, M. M., Talukder, M. A., Uddin, M. A., Akhter, A. & Khalid, M. Brainnet: Precision brain tumor classification with optimized efficientnet architecture. Int. J. Intell. Syst. 2024(1), 3583612 (2024).
Acknowledgements
The authors would like to extend their sincere appreciation to the Researchers Supporting Project Number (RSPD2024R994), King Saud University, Riyadh, Saudi Arabia.
Funding
This research is supported by the RSPD2024R994, King Saud University, Riyadh, Saudi Arabia.
Author information
Contributions
Md. Alamin Talukder: Conceptualization, Data curation, Methodology, Software, Resources, Visualization, Formal analysis, Supervision, Writing-original draft, Writing-review & editing. Majdi Khalid: Formal analysis, Visualization, Validation, Investigation, Writing-review & editing. Mohsin Kazi: Formal analysis, Visualization, Validation, Investigation, Writing-review & editing. Nusrat Jahan Muna: Data curation, Methodology, Software, Resources, Visualization, Formal analysis. Mohammad Nur-e-Alam: Visualization, Validation, Investigation, Writing-review & editing. Sajal Halder: Visualization, Validation, Resources, Formal analysis, Investigation, Writing-review & editing. Nasrin Sultana: Visualization, Validation, Investigation, Writing-review & editing.
Ethics declarations
Competing interests
The authors declare no conflicts of interest relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.