Introduction

Atrial fibrillation (AF) is the most widespread cardiac arrhythmia, with a prevalence estimated at \(3\%\) in adults1. Although it is not directly life-threatening in most cases, it correlates with an increased risk of all-cause mortality and of stroke2. A central challenge is thus the early detection of AF, which enables preemptive care and prophylactic treatments, thereby mitigating the risk of aggravated conditions and reducing the public health burden1. Yet, a significant portion of the population living with AF remains undiagnosed, primarily those with subclinical AF2, even though this condition can be identified from a single-lead electrocardiogram (ECG). Therefore, efficient and broadly accessible screening for early AF detection has been identified as crucial to reducing AF-related complications1.

Automated methods combined with low-cost sensors hold the potential to enable large-scale AF screening campaigns. They can assist physicians through fast, automatic analysis of patient data, thereby increasing throughput. However, this requires methods with unprecedented robustness, as low-cost sensors are notoriously noisy and the populations on which screening algorithms are deployed often differ statistically from those on which they were calibrated, an effect known as distribution shift.

Methods for automatic AF detection can be broadly classified as being either rule-based3,4,5,6,7,8,9 or learning-based10,11,12,13,14,15,16,17,18,19,20,21,22,23. The former are historically the most widespread due to their simplicity, low computational cost, and interpretability. They are also relatively robust, as they rely on established biomarkers. On the other hand, they can fail to distinguish certain conditions due to the limited number of features they use, and increasing this number makes them difficult to tune. In contrast, learning-based methods hold the promise of increased performance, by allowing more features—some learned from data—and tuning their relative weights and decision thresholds based on examples. While numerous recent studies highlight the excellent performance of such methods, the question of their robustness is much less explored. In fact, recent evidence points to the fact that many methods lack such robustness13,14, which is a clear hindrance to their deployment in screening campaigns.

In this study, we propose an algorithm for the robust screening of AF from single-lead ECGs. We find that our method is more sensitive than the tested alternatives24,25, an essential quality for screening. A further methodological contribution of our work is the systematic cross-data-set evaluation of algorithms, meaning that we evaluate performance on test sets from a source different from that of the training set. We use the DiagnoStick25, SPH26, and CinC 201724 data sets, as they have varied lengths and noise levels, and do not use the MIT-BIH data set27, as its high-quality recordings are not representative of screening conditions. This enables us to evaluate robustness, an aspect that other studies typically neglect13. In fact, we find in our comparative study that the performance of almost all top performers of the Computing in Cardiology competition24 collapses cross-data-set. In contrast, we achieve our robust performance by relying on a single feature, namely R–R intervals (RRis). This strongly supports the idea that screening should rely on few, but medically relevant, features with a high signal-to-noise ratio, so that even low-cost sensors can capture them reliably. Such features then need to be exploited to their maximum, requiring a powerful representation of their distribution. While the classical approach consists of computing empirical moments in a window of fixed size, we leverage a novel class of algorithms called distributional support vector machines (SVMs)28,29 that operate on and compare probability distributions directly. This allows them to work with inputs of arbitrary size, refining the estimates of the distributions as more samples become available, which is essential for accommodating different data sources and, thus, for screening. Furthermore, our SVM is competitive with the recent neural networks to which we compare, achieving marginally lower accuracy but higher robustness and sensitivity. Our work thus demonstrates that excellent screening performance can be achieved with simpler and more interpretable algorithms such as distributional SVMs, contrasting with previous studies14 where a clear supremacy of neural networks emerged, as seen by comparing the reported performances of neural networks10,11,12,13,14,15,16 and of SVMs14,17,18,19,20. Finally, to our knowledge this work is the first implementation of distributional SVMs, which constitutes an additional methodological contribution.

Results

We propose the first implementation of SVMs for distribution classification to detect the presence of AF from a short ECG. We evaluate our model on three data sets: the DiagnoStick25, SPH26, and CinC 201724 data sets; their sizes are listed in Table 1. The exact length of the ECG depends on the specific data set at hand and may vary between training and inference time. We remind the reader that “in-data-set” refers to the configuration where the training and testing sets come from the same original data set, while “cross-data-set” refers to that where they come from different data sets. The model takes as input the outcome of a preliminary peak detection (Fig. 1).

Table 1 Sizes of the training, validation, and test sets of each data set.
Fig. 1

Summary of our method, an SVM for distribution classification. It classifies a new, unseen patient by comparing the full empirical distribution of their RRis to those of the training set, enhancing robustness and adaptability to inputs of varying sizes.

Performance and cross-data-set stability

We report accuracy, F1-score, sensitivity, and specificity, respectively defined as

$$\begin{aligned} \text {Accuracy}&= \frac{\text {TP}+ \text {TN}}{N},&\text {F}1\text {-Score}&=\frac{2\text {TP}}{2\text {TP}+ \text {FP}+ \text {FN}},\\ \text {Sensitivity}&= \frac{\text {TP}}{\text {TP}+ \text {FN}},&\text {Specificity}&=\frac{\text {TN}}{\text {TN}+\text {FP}}, \end{aligned}$$

where N is the number of samples and \(\text {TP}\), \(\text {TN}\), \(\text {FP}\), and \(\text {FN}\) are respectively the numbers of true positives, true negatives, false positives, and false negatives. We also report the area under the receiver operating characteristic curve (AUROC) and provide the corresponding ROC curves (Fig. 2). We achieve excellent performance in-data-set; for comparison, we outperform both the proprietary algorithm of the MyDiagnostick medical device and the neural network and SVM of14 on the DiagnoStick and SPH data sets, respectively. The only exception is the CinC data set, where our F1-score is lower than that of the top performers of the competition24. Furthermore, our performance is robust cross-data-set; for each test set, the attained F1-score fluctuates by at most 2 points upon switching the training set, with the exception of the SPH data set, where in-data-set training yields better results. This is not the case for the baselines, whose performances all collapse cross-data-set, with the notable exception of30, which achieves performance similar to ours but moderately lower sensitivity. This stability across different training sets demonstrates the robustness of our method, as the classification performance is only mildly affected by the distribution shift between training and inference. Finally, a visual inspection of the misclassified examples of the DiagnoStick test set highlights that peak extraction is of poor quality on those examples (Supplementary Table S1).
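
For illustration, a minimal sketch of how these metrics can be computed from predicted and true labels (assuming labels coded as 1 for AF and \(-1\) for noAF, as in the Methods, and using scikit-learn only for the AUROC):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def screening_metrics(y_true, y_pred, scores=None):
    """Accuracy, F1-score, sensitivity, and specificity from binary labels
    (+1 = AF, -1 = noAF); optionally AUROC from continuous decision scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    metrics = {
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
    if scores is not None:  # e.g. SVM decision-function values
        metrics["auroc"] = roc_auc_score(y_true, scores)
    return metrics
```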

Fig. 2

Cross-data-set ROC curves.

Confusion matrices (Fig. 3) reveal that the confusions are mainly independent of the training set and arise from conditions that are hard to tell apart from AF purely based on RRi, such as sinus arrhythmia (SA) and atrial flutter (AFlut). Normal sinus rhythm, however, is almost perfectly separated.

Fig. 3

Confusion matrices of the classifier trained on different data sets and evaluated on the SPH data set. True positives and negatives are circled in gray. Acronyms on the y-axis are those defined in the documentation of the SPH data set26, with the exception of atrial flutter and atrial fibrillation, which we rename as “AFlut” and “AF” instead of “AF” and “AFib”, respectively. In short, AF is atrial fibrillation, SR is normal sinus rhythm, and other labels are conditions other than AF. We see that training cross-data-set on the DiagnoStick or CinC data set only marginally changes the matrix, indicating that most examples are classified identically and showing robustness. Further, normal sinus rhythm is almost perfectly separated from AF.

Data efficiency

We examine the data efficiency of our algorithm by training it on sets of increasing sizes (Fig. 4). Each training is repeated several times to account for the random choice of the training subset. The results show that excellent and consistent performance is already reached with training sets containing about 50 to 100 positive examples.
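
For illustration, a minimal sketch of this resampling protocol (the subset sizes shown are placeholders, and `evaluate` stands in for our training-and-testing routine; this is not our exact implementation):

```python
import numpy as np

def data_efficiency_curve(n_train, evaluate, sizes=(25, 50, 100, 200, 400),
                          n_repeats=100, seed=0):
    """Repeatedly refit on random training subsets of increasing size.

    `evaluate(indices)` is a placeholder for a routine that trains the
    classifier on the selected subset and returns a test-set score.
    Each entry of the returned dictionary yields one box plot as in Fig. 4.
    """
    rng = np.random.default_rng(seed)
    scores = {size: [] for size in sizes}
    for size in sizes:
        for _ in range(n_repeats):
            idx = rng.choice(n_train, size=size, replace=False)
            scores[size].append(evaluate(idx))
    return scores
```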

Table 2 Performance metrics on test sets, in percentage points, for our method, the proprietary algorithm, and the baselines30,31,32,33.
Fig. 4

Testing performance with increasing training set size. Each box plot is obtained with 100 independent resamplings of the training set.

Influence of peak detection

We run our pipeline with different peak detection algorithms on all validation data sets and find that the performance depends strongly on the choice of algorithm (Table 3). Moreover, some algorithms perform particularly poorly on noisy data sets. The XQRS algorithm34 matches or outperforms all others on all data sets and is the one we use for the rest of the results.

Table 3 In-data-set F1-score, in percentage points, on validation data sets.

Discussion

Our method achieves state-of-the-art performance on the SPH and DiagnoStick data sets, outperforming the proprietary algorithm of the MyDiagnostick medical device and the SVM and neural network of14. It is not directly competitive in-data-set with the top performers of the CinC 2017 competition24, but it significantly outperforms them cross-data-set, demonstrating unprecedented robustness to distribution shift (with the exception of30, which is also robust). Furthermore, our method is the most sensitive of all evaluated alternatives.

A first conclusion our results support is that SVMs can be competitive with neural networks for AF screening, an observation that deviates from recent results on learning-based detection of AF. This is particularly important for deployment, as SVMs based on well-understood features are notoriously more interpretable than black-box neural networks trained end-to-end. The essential component enabling this improvement is the distributional embedding step of our classifier. It extracts more information from this feature than a classical SVM would and bases the classification decision on the full distribution of the feature, as opposed to only a few of its moments. This step also provides a principled and theory-backed answer to a recurring challenge of cross-data-set evaluation, namely adapting the size of the input to match the expectations of the classifier. Indeed, our distributional SVM works with inputs of arbitrary size, possibly differing from that seen at training.

A second conclusion is that it can be beneficial for screening to rely on few, but medically relevant, prominent features that are exploited to their maximum, rather than to multiply statistical features. Indeed, this provides the resulting classifiers with inherent robustness and data efficiency, as even low-quality sensors can capture such features reliably and sensor- or population-specific statistical artifacts are filtered out. Classifiers relying on so few features may of course fail to separate conditions that require additional information to be told apart; yet, this is not necessarily problematic for screening, where false positives are, to an extent, preferred over false negatives and the gained robustness and data efficiency can be essential.

The robustness of our classifier is largely influenced by that of the peak detection algorithm it uses. We find high variability in our results depending on the choice of such an algorithm, despite all of them reporting excellent performance on their respective benchmarks. An examination of our classifier’s mistakes on the DiagnoStick data set (supplementary material) suggests that misclassifications mainly occur together with inaccurate peak detection, which is in turn primarily due to noise. This highlights the crucial role of these algorithms for robust and interpretable AF detection, as they enable extracting the features effectively used by practitioners. Perhaps a part of the effort spent on end-to-end AF detection could thus be redirected to extracting these features reliably when existing algorithms fail. Such a pipeline including separate feature extraction has already shown promising results for robustness in cancer mutation prediction42, and our results support that this robustness carries over to AF detection.

Methods

Data sets

The DiagnoStick data set consists of 7209 1-min-long ECGs recorded at a sampling frequency of \(200\,\textrm{Hz}\) with the MyDiagnostick medical device, a hand-held, single-lead ECG device25. The data were collected in pharmacies in Aachen for research assessing the diagnostic performance of the MyDiagnostick medical device. Each ECG comes with a binary label indicating whether or not it shows AF. The data set comprises 383 recordings with AF (labeled “AF” in the remainder of this work) and 6745 recordings without AF (labeled “noAF”). The remaining 81 recordings carry the label “Unknown” and are discarded in this study. These labels were determined by medical practitioners, as reported in25.

The MyDiagnoStick medical device also comes with a proprietary algorithm that classifies recorded ECGs. We refer to these labels as the “Proprietary” ones; they may differ from the ones outlined above. We use the proprietary labels to compare against this proprietary algorithm.

The SPH data set consists of 10646 recordings of 12-lead ECGs collected for research on various cardiovascular diseases26. Each recording lasts \(10\,\textrm{s}\), is sampled at a frequency of \(500\,\textrm{Hz}\), and was annotated by medical professionals with one of ten rhythm labels, which include AF. In the original presentation of the SPH data set, the acronym AF refers to atrial flutter. We reserve AF for atrial fibrillation and denote atrial flutter by AFlut. We only consider the first lead and collapse the annotations to AF and noAF to perform binary classification. In particular, the noAF class includes normal sinus rhythm and other conditions that are not AF.

The CinC data set consists of 8528 recordings of single-lead ECGs, lasting from about \(30\,\textrm{s}\) to \(60\,\textrm{s}\) and sampled at \(300\,\textrm{Hz}\)24. Each ECG is labeled with one of four labels among “Normal”, “AF”, “Other rhythm”, and “Noisy”. We disregard “Noisy” ECGs, and collapse the labels “Normal” and “Other rhythm” into a class labeled “noAF” to perform binary classification.

We use the denoised version of the SPH data set26 and do not apply further filtering. This denoising is a low-level filter available on most ECG acquisition devices26. The DiagnoStick and CinC data sets are not further denoised.

The three data sets are independently split into stratified training (\(60\%\)), validation (\(20\%\)), and testing (\(20\%\)) sets. For the SPH and CinC data sets, the stratification is performed before we collapse the labels to “AF” and “noAF”, such that the training, validation, and test sets all have the same proportions of each class with respect to the original labels. A summary of the sizes of the final sets is available in Table 1. All reported results refer to the testing data, which were never used to train or tune the model, with the exception of the results of Table 3, which use the validation and test sets merged together.
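
For illustration, a minimal sketch of such a split using scikit-learn (here `labels` are the original, non-collapsed rhythm labels used for stratification; the binary AF/noAF labels are derived afterwards):

```python
from sklearn.model_selection import train_test_split

def stratified_split(records, labels, seed=0):
    """Stratified 60/20/20 split on the original (non-collapsed) labels."""
    # 60% training, stratified on the original rhythm labels
    rec_train, rec_rest, lab_train, lab_rest = train_test_split(
        records, labels, train_size=0.6, stratify=labels, random_state=seed)
    # split the remaining 40% evenly into validation and test (20% each overall)
    rec_val, rec_test, lab_val, lab_test = train_test_split(
        rec_rest, lab_rest, test_size=0.5, stratify=lab_rest, random_state=seed)
    return (rec_train, lab_train), (rec_val, lab_val), (rec_test, lab_test)
```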

Data preprocessing

Each recording is processed by an automatic peak detection algorithm that identifies the time indices of the R-peaks. We then calculate the RRis of each ECG as the differences between successive time indices. The ECG is then discarded and represented instead by its collection of RRis. Each collection is normalized independently (see below). In what follows, we omit the adjective “normalized” when referring to RRis or their distributions for conciseness. We discard recordings with fewer than 8 (SPH data set), 51 (DiagnoStick data set), or 20 (CinC data set) peaks.
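
For illustration, a minimal sketch of this preprocessing, assuming the XQRS detector from the wfdb Python package34 (the minimum peak counts are the per-data-set thresholds given above):

```python
import numpy as np
from wfdb import processing

MIN_PEAKS = {"SPH": 8, "DiagnoStick": 51, "CinC": 20}  # per-data-set thresholds

def extract_rris(signal, fs, dataset):
    """Detect R-peaks with XQRS and return the R-R intervals in seconds,
    or None if the recording has too few peaks and must be discarded."""
    peak_indices = processing.xqrs_detect(sig=signal, fs=fs, verbose=False)
    if len(peak_indices) < MIN_PEAKS[dataset]:
        return None
    # the unit cancels out in the subsequent normalization
    return np.diff(peak_indices) / fs
```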

Feature vector

Let \(N\in \mathbb {N}\) be the number of patients, and denote for each \(n\in \{1,\dots ,N\}\) the time index of the location of the i-th R-peak of patient n by \(t_i^{(n)}\). The RRis of patient n are then \(\delta ^{(n)}_i = t_{i+1}^{(n)} - t^{(n)}_i\). We aggregate them in a single vector, which we then normalize to obtain the feature vector

$$\begin{aligned} x^{(n)} = \begin{pmatrix}x_i^{(n)}\end{pmatrix}_{i=1,\dots ,d_n} = \begin{pmatrix}\frac{\delta ^{(n)}_{i}}{\frac{1}{d_n}\sum _{j=1}^{d_n}\delta _{j}^{(n)}}\end{pmatrix}_{i=1,\dots ,d_n}\in \mathbb {R}^{d_n}, \end{aligned}$$

where \(d_n\) is the number of intervals available for patient n. This normalization sets the average value of each feature vector to 1 and makes it dimensionless. A data set \(\mathcal {D}\) is then a collection of feature vectors with their corresponding labels, \(\mathcal {D}=\{(x^{(n)}, y^{(n)})\mid n\in \{1,\dots ,N\}\}\), where \(y^{(n)}=1\) represents AF and \(y^{(n)} = -1\) represents noAF. This corresponds to the step “Feature Extraction” in Fig. 1.
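
For illustration, this normalization is a direct transcription of the formula above:

```python
import numpy as np

def feature_vector(rris):
    """Normalize the R-R intervals of one patient so that their mean equals 1."""
    rris = np.asarray(rris, dtype=float)
    return rris / rris.mean()
```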

SVM for distribution classification

The most robust marker for assessing AF from an ECG is the irregularity of the intervals between R-peaks2,43. In other words, the distribution of RRis is more spread in the presence of AF than without it. The goal of our SVM for distribution classification is to determine whether a patient’s RRi distribution is more similar to that of a patient with or without AF. Unfortunately, we do not have access to that distribution, but only to the samples \(x^{(n)}_i\), which we assume to be independent and identically distributed (i.i.d.) according to \(P^{(n)}\), the RRi distribution of patient n. We explicitly note here that the vectors \(x^{(n)}\) may have different lengths, contrary to the inputs of classical SVMs. We use these i.i.d. samples to approximate a representation of \(P^{(n)}\) through its empirical distribution, and then leverage an SVM for classification on this representation. This procedure is called two-stage sampling28,29.

We now detail the distributional SVM, which builds on ideas and concepts from classical SVMs, of which a complete introduction can be found in 44, Chapter 7. We begin with a symmetric, positive semi-definite kernel function \(k:\mathbb {R}\times \mathbb {R}\rightarrow \mathbb {R}\). The core idea is to represent each distribution \(P^{(n)}\) with its kernel mean embedding (KME) \(\mu ^{P^{(n)}}\), defined as

$$\begin{aligned} \mu ^{P^{(n)}} = \int _{\mathbb {R}} k(\cdot ,x)\textrm{d} P^{(n)}(x). \end{aligned}$$
(1)

We would like to run an SVM for binary classification on the set of input-output pairs \({\mathscr {D}}_\mu = \{(\mu ^{P^{(n)}},y^{(n)})\}\). Such an SVM would then require another kernel function \(K:{\mathscr {H}}\times {\mathscr {H}}\rightarrow \mathbb {R}\), where \(K(\mu ^{P^{(n)}},\mu ^{P^{(m)}})\) is the similarity between two kernel mean embeddings \(\mu ^{P^{(n)}}\) and \(\mu ^{P^{(m)}}\). Here, \({\mathscr {H}}\) is the set in which all of the kernel mean embeddings lie. Unfortunately, we do not have access to \(\mu ^{P^{(n)}}\), and are thus unable to compute the corresponding value of K. The second sampling stage then consists of approximating \(\mu ^{P^{(n)}}\) with its empirical estimator \(\mu ^{(n)}\), defined as

$$\begin{aligned} \mu ^{(n)} = \frac{1}{d_n}\sum _{i=1}^{d_n} k\left( \cdot ,\,x^{(n)}_i\right) , \end{aligned}$$
(2)

and to approximate the Gram matrix of K with the approximate Gram matrix \((K(\mu ^{(n)}, \mu ^{(m)}))_{m,n}\). This procedure corresponds to the step “Distributional Embedding” in Fig. 1. Summarizing, the full learning algorithm consists of first performing feature extraction to compute the vector \(x^{(n)}\) for each patient, then estimating the KME through (2), and finally performing SVM classification on the empirical KMEs with the kernel K (Fig. 1).

This algorithm requires choosing two different kernel functions, k and K. We use for both a Gaussian kernel,

$$\begin{aligned} k(z_1, z_2) = \exp \left[ -\frac{(z_1 - z_2)^2}{\sigma ^2}\right] ,\quad \text {and}\quad K(\mu _1,\mu _2) = \exp \left[ -\frac{\Vert \mu _1 - \mu _2\Vert ^2_{{\mathscr {H}}}}{2\gamma ^2}\right] \end{aligned}$$

where \(\sigma ^2\) and \(\gamma ^2\) are hyperparameters whose choices are discussed next, and

$$\begin{aligned} \Vert \mu ^{(n)} - \mu ^{(m)}\Vert ^2_{{\mathscr {H}}} = \frac{1}{d_n^2}\sum _{i,j=1}^{d_n} k\left( x_i^{(n)},x_{j}^{(n)}\right) + \frac{1}{d_m^2}\sum _{i,j=1}^{d_m} k\left( x_i^{(m)},x_{j}^{(m)}\right) - \frac{2}{d_n d_m}\sum _{i=1}^{d_n}\sum _{j=1}^{d_m} k\left( x_i^{(n)},x_{j}^{(m)}\right) . \end{aligned}$$
(3)

We refer to45 for more details on (3).
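
For illustration, a minimal sketch of this construction with the Gaussian kernels above and a precomputed-kernel SVM from scikit-learn (a sketch of the approach rather than our exact implementation; caching and class weights are omitted):

```python
import numpy as np
from sklearn.svm import SVC

def gaussian_gram(a, b, sigma2):
    """Pairwise Gaussian kernel values k(a_i, b_j) between two 1-D sample vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / sigma2)

def kme_sq_distance(x_n, x_m, sigma2):
    """Squared RKHS distance between the empirical KMEs of two patients, Eq. (3)."""
    return (gaussian_gram(x_n, x_n, sigma2).mean()
            + gaussian_gram(x_m, x_m, sigma2).mean()
            - 2 * gaussian_gram(x_n, x_m, sigma2).mean())

def distributional_gram(X_a, X_b, sigma2, gamma2):
    """Gram matrix of the outer kernel K between two lists of feature vectors."""
    G = np.empty((len(X_a), len(X_b)))
    for i, x_n in enumerate(X_a):
        for j, x_m in enumerate(X_b):
            G[i, j] = np.exp(-kme_sq_distance(x_n, x_m, sigma2) / (2 * gamma2))
    return G

# Training and prediction with the precomputed kernel:
# clf = SVC(kernel="precomputed").fit(distributional_gram(X_train, X_train, s2, g2), y_train)
# y_pred = clf.predict(distributional_gram(X_test, X_train, s2, g2))
```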

Hyperparameter tuning

All hyperparameters are selected via in-data-set 5-fold cross-validation on the merged training and validation sets. Specifically, we perform a grid search over \(\sigma\), \(\gamma\), and the misclassification costs of the loss function, and choose the combination maximizing the mean AUROC over the held-out folds. We apply this procedure to each candidate peak-extraction algorithm and ultimately select the XQRS algorithm34, as it performs best across data sets (see Table 3). Detailed instructions to reproduce the hyperparameter study are available in the code.
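
For illustration, a minimal sketch of this selection procedure, reusing the `distributional_gram` helper sketched above (the grids and the class-weight parametrization shown here are placeholders, not the values used in our study):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

def select_hyperparameters(X, y, sigma2_grid, gamma2_grid, weight_grid, n_folds=5):
    """Grid search maximizing the mean held-out AUROC over stratified folds.

    X is a list of (variable-length) feature vectors and y a numpy array of
    labels in {+1, -1}."""
    best_params, best_auroc = None, -np.inf
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    for s2 in sigma2_grid:
        for g2 in gamma2_grid:
            for w in weight_grid:
                aurocs = []
                for tr, va in cv.split(np.zeros((len(y), 1)), y):
                    X_tr = [X[i] for i in tr]
                    X_va = [X[i] for i in va]
                    G_tr = distributional_gram(X_tr, X_tr, s2, g2)
                    G_va = distributional_gram(X_va, X_tr, s2, g2)
                    clf = SVC(kernel="precomputed", class_weight={1: w, -1: 1.0})
                    clf.fit(G_tr, y[tr])
                    aurocs.append(roc_auc_score(y[va], clf.decision_function(G_va)))
                if np.mean(aurocs) > best_auroc:
                    best_params, best_auroc = (s2, g2, w), np.mean(aurocs)
    return best_params, best_auroc
```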

Baselines

The baselines30,31,32,33 are the four best-scoring models of the CinC 2017 competition24, excluding46, whose code available on the competition’s website did not run. In particular, we used the published models directly and did not re-train them.