Introduction

Atrial fibrillation (AF) is the most common sustained arrhythmia in adults, currently affecting more than 37.6 million persons worldwide, with the number increasing annually due to the growing prevalence of risk factors such as advanced age, obesity and hypertension1.

AF is a major risk factor for ischaemic stroke (IS). Approximately 25% of strokes are attributed to previously undetected asymptomatic AF. Timely detection could have prevented many of these strokes through the initiation of anticoagulation therapy2,3. However, as AF can be paroxysmal, completely asymptomatic at onset, or occur when conventional monitoring is not available, its diagnosis remains challenging. While AF-associated stroke is currently the most actionable complication of AF, heart failure (HF) is twice as common following an AF diagnosis4, with approximately one in five patients receiving their first HF diagnosis at the same time as their initial AF diagnosis5.

The current clinical routine for AF detection often consists of Holter recording, which remains the gold standard for ambulatory ECG monitoring, with the likelihood of detecting AF in those with paroxysms dependent on the burden of AF and the duration of monitoring6,7. Although Holter monitoring extends up to 14 days or more in some clinical settings, effectively enhancing paroxysmal AF detection, the sporadic nature of AF and variable access to long-duration monitors still lead to numerous undetected cases. An alternative approach is continuous long-term monitoring using implantable loop recorders (ILRs), overcoming the limitations of intermittent monitoring7. However, their invasive and costly nature has spurred interest in exploring non-invasive alternatives for monitoring silent arrhythmias such as AF8. Moreover, the consequences and treatment of atrial fibrillation may be influenced by the number, duration, severity and burden of AF episodes, with longer durations of monitoring providing a clearer picture of all of these variables.

Over the last few years, various mobile devices and smartwatches equipped with artificial intelligence (AI) algorithms to detect AF have been introduced9,10. AI-based devices primarily employ two technologies for automatic AF detection: photoplethysmography (PPG) and single-lead ECG. PPG relies on light absorption and reflection by blood vessels, offering a non-invasive and reliable means to measure blood flow, typically at the skin’s surface11. Single-lead ECG methods involve wearable devices recording a single-lead ECG. Individuals are instructed to maintain contact between specific body parts and the device for a predetermined duration12. For both techniques, the recorded data are transmitted to an AI application, which classifies recordings into categories such as “possible AF” and “no AF”13.

These wearable technologies offer distinct advantages compared to the current gold standards as they enable long-term remote monitoring, are low in cost, are non-invasive, and allow simultaneous monitoring of various vital parameters10. For paroxysmal AF, in particular, the prolonged monitoring times can increase the chance of AF detection13. However, not all available devices and techniques have been thoroughly evaluated, even though high sensitivity and specificity are required to limit false results14. False positives have been shown to lead to stress among patients and increased healthcare burden and cost15.

Considering these benefits and risks of wearable technologies for AF detection and quantification, this study compared the performance of a wrist-worn sensor device and the corresponding PPG- and single-lead ECG-based AF algorithms to conventional 24-h Holter recordings. First, we performed 24 h of head-to-head comparisons, and second, an extended 28-day evaluation.

Results

Informed consent was obtained from all 173 enrolled patients. Among them, ten withdrew from the study due to reasons including a high study burden (n = 5), or an adverse event being either an allergic reaction to the bracelet strap (n = 4) or hospitalization (n = 1). Two patients with an adverse event, an allergic reaction (n = 1) and a concussion (n = 1), chose to remain in the study. Furthermore, insufficient quality PPG data was observed for thirteen patients. To adhere to the study protocol, participants who withdrew or had inadequate data were replaced, ensuring a total inclusion of 150 patients. Patient characteristics are detailed in Table 1.

Table 1 Patient characteristics

PPG-based atrial fibrillation detection vs. 24-h Holter recordings

Across the 24-h Holter period a total of 103,368 90-s PPG windows were collected by CW2, of which 56,781 (54.9%) met the quality criteria for analysis. All other windows were excluded due to movement, poor signal quality, or indeterminate outcomes (TMM, BADQ, UND). For each analysable window, the result of the CW2’s PPG algorithm (AF or NSR) was individually compared to the Holter data from the corresponding time window. A confusion matrix showing the results for both the CW2’s PPG algorithm and the Holter per analysable window is provided in Table 2. Across all analysed windows, CW2’s PPG algorithm obtained a specificity of 98.2% and sensitivity of 98.8% compared to the Holter. The PPV and NPV were 80.7% and 99.9%, respectively.

Table 2 Confusion matrix showing results for all analysed 90-s windows based on the CW2’s PPG algorithm in comparison to a Holter reference across 24 h

Single-lead ECG-based atrial fibrillation detection vs. 24-h Holter recordings

In total, 162 successful CW2 single-lead ECGs were performed, each lasting 30 s. For each successful CW2 single-lead ECG, the result of the CW2’s ECG algorithm (AF or NSR) was individually compared to the Holter data from the corresponding time window. Table 3 provides a confusion matrix showing the results for both the CW2’s ECG algorithm and the Holter per successful ECG.

Table 3 Confusion matrix showing results for all successful CW2 single-lead ECGs based on the CW2’s ECG algorithm in comparison to a Holter reference across 24 h

Across all single-lead ECGs, the CW2’s ECG algorithm obtained a specificity of 98.6% and sensitivity of 95.8% compared to the Holter. The PPV and NPV were 92.0% and 99.3%, respectively.

Atrial fibrillation detection over 28 days

Patient adherence to the CW2 was 90.9%, corresponding to a wear time of approximately 22 h a day. Adherence did not decrease throughout the study period. Throughout the 28 days, AF detection based on PPG was possible each day across 17.4 of 24 h (72.5%) on average; the remaining hours comprise data with poor signal quality, device recharge time and non-wearing (1.2 h of recharge time and non-wearing on average). After AF detection by the PPG-based algorithm, the CW2 sent a total of 721 ECG alerts to participants. Patients who received an alert received 9 alerts on average (median of 4 alerts, range of 1–32 alerts per patient with alert). The ECG alerts were, by design, generated no more than once every 4 h, and reflect that the AF detection algorithm detected AF 3 or more times in a 10-minute timespan.

Patients reacted to these ECG alerts in 69.8% of the cases when considering the average reaction percentage across all patients. On average, 1.7 single-lead ECG attempts (± 3.8 SD) were performed before a successful ECG occurred. After an ECG alert following PPG-based AF detection, the single-lead ECG algorithm confirmed AF in 66.8% of the cases when only considering single-lead ECGs performed within one hour after the ECG alert.

Atrial fibrillation prevalence over 28 days

Based on the 24-h Holter recordings, 22 out of 150 patients were diagnosed with AF. Therefore, the prevalence of AF within this population, according to the 24-h Holter, was 14.7%. In paroxysmal AF, the 24-h Holter period did not always overlap with an AF period (Fig. 1). Consequently, the AF prevalence increased to 19.3% (29 out of 150 patients) when including the entire 28-day CW2 period, considering single-lead ECG confirmed AF only. When considering all cases where the CW2’s PPG algorithm detected AF, the prevalence of AF increased further to 26.7% (40 out of 150 patients).
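
The prevalence figures above follow directly from the reported patient counts. As a minimal arithmetic check (the helper name `prevalence_percent` is ours, not part of the study code):

```python
def prevalence_percent(cases: int, total: int) -> float:
    """Prevalence as a percentage, rounded to one decimal place."""
    return round(100 * cases / total, 1)


TOTAL_PATIENTS = 150  # total inclusion reported in the study

holter_24h = prevalence_percent(22, TOTAL_PATIENTS)       # 24-h Holter
ecg_confirmed_28d = prevalence_percent(29, TOTAL_PATIENTS)  # ECG-confirmed, 28 days
ppg_detected_28d = prevalence_percent(40, TOTAL_PATIENTS)   # PPG-detected, 28 days

print(holter_24h, ecg_confirmed_28d, ppg_detected_28d)
```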

Fig. 1: Example of gathered data across 28 days in three patients with (paroxysmal) atrial fibrillation (Afib).

Each panel represents the data of a single patient. AF was detected in all three patients according to CardioWatch 287-2’s photoplethysmography and single-lead-ECG algorithms, but in only two (top two panels) according to 24-h Holter. The 24-h Holter period is marked by a red box. The red/blue line represents the photoplethysmography (PPG)-based categorisation based on the Happitech algorithm, where red represents normal sinus rhythm (NSR), and blue represents Afib. Black dots represent the instances where an ECG was performed. The blue/red dots below the PPG-based diagnosis represent the categorisation made by the Cardiolyse Analytics platform using the single-lead ECG; again, red represents NSR, and blue represents Afib.

Atrial fibrillation burden

The AF burden recorded during the 24-h Holter period ranged from 0.42 to 100% according to the Holter reference, with the shortest AF period lasting 6 min. The RMSE between the AF burden according to 24-h PPG recordings and the AF burden determined by 24-h Holter was 2.2%. The PPG-based AF burden estimation was highly correlated with the Holter reference measurements according to the Pearson correlation coefficient (R² = 0.99, Fig. 2a). Bland-Altman analysis showed a mean difference of 0.17% and 95% limits of agreement of −4.2 to 4.5% (Fig. 2b). In both calculations, cases where no AF was detected were included with an AF burden of 0%.

Fig. 2: Correlation and Bland-Altman plot comparing photoplethysmography-based and Holter-based atrial fibrillation burden.

Comparison between photoplethysmography (PPG)-based and Holter-based atrial fibrillation (AF) burden across 24 h, showing (a) a correlation of R² = 0.9937 between the PPG-based and Holter-based AF burden across 24 h, and (b) a Bland-Altman plot showing a mean difference in AF burden between PPG and Holter of 0.17% with lower and upper limits of agreement of −4.19% (−1.96 SD) and 4.53% (+1.96 SD), respectively.

Human factors validation questionnaire

Across the 89 responders to the human factors validation questionnaire, all questions received either the highest or second highest score when considering the median score (Table 4).

Table 4 Responses to the human factors validation questionnaire assessing the user-friendliness of the wrist-worn PPG device and corresponding mobile application

Discussion

This study firstly shows that both the CW2’s PPG and the single-lead ECG algorithm have a high specificity and sensitivity for AF detection compared to the gold standard across 24 h. Secondly, the 24-h PPG-based AF burden showed a high correlation and low error compared to the AF burden determined by the standard 24-h Holter. Importantly, the prevalence of AF increased from 14.7% (24-h Holter) to 26.7% and 19.3% when considering the CW2’s PPG algorithm alone or in combination with the single-lead ECG algorithm across 28 days, respectively.

Various studies have previously investigated the performance of PPG and single-lead ECG algorithms for AF detection. When considering PPG-based AF detection algorithms alone, the median sensitivity and specificity, according to several (systematic) reviews, ranged from 95.1 to 100.0% and 95.0% to 97.7%, respectively9,10,16. For single-lead ECG-based AF detection algorithms, the sensitivity and specificity range from 75.1 to 98.0% and 84.0 to 98.2%, respectively16,17,18. However, many of these studies were conducted in a controlled clinical setting and not in an ambulatory environment. The median sensitivity and specificity achieved in an ambulatory setting were often lower for both PPG- and single-lead ECG-based evaluations16,17,18. The sensitivity and specificity obtained by the CW2’s PPG- and single-lead ECG-based algorithms are comparable to those achieved within clinical studies, yet the CW2’s algorithms achieved them in an ambulatory, real-world setting.

The novelty of this device lies especially in the combination of accurate non-invasive AF monitoring by means of both continuous PPG and single-lead ECG spot-checks. This combination allows for accurate and long-term AF burden evaluations via PPG while still offering cardiologists the familiarity of an ECG report for the confirmation of the AF diagnosis through a full strip ECG in combination with an annotation tool. Furthermore, only a small number of active measurements is necessary, because an active ECG recording is only required in those cases where AF is already detected by PPG, while still allowing cardiologists to evaluate AF burden by PPG and confirm AF diagnosis by ECG.

Reporting of PPV was less frequent in the existing literature. In an ambulatory setting, the PPV ranged from 39.9% to 97.7% when considering both PPG-based and single-lead ECG-based algorithms17,19,20. Both CW2’s algorithms, with 81% and 92%, respectively, are again on the high end of this spectrum. Additionally, analyses considering AF burden and the correspondence between estimate and reference are often lacking in the current literature. To our knowledge, only two recent studies reported AF burden results, presenting correlation coefficients in line with the results presented here (Pearson correlation coefficient: 0.986; intraclass correlation coefficient: 0.88)19,21.

Of the 90-s PPG windows collected by CW2 across the 24-h Holter period, only 54.9% met the quality criteria for analysis due to movement, poor signal quality, or indeterminate outcomes. For the 28-day period this percentage increased to 72.5%. This increase in the percentage of analysable data may be due to patients becoming accustomed to the device and, as a result, wearing it more consistently according to the instructions. This selective inclusion of more limited high-quality PPG data relative to continuous ECG data underscores the practical challenges in long-term, patient-friendly, continuous monitoring, including inevitable periods of noise interference reflecting real-world settings. The high exclusion rate of PPG data in the study primarily stems from the inherent sensitivity of PPG technology to motion artifacts. Nonetheless, the inclusion of high-quality data alone decreases the frequency of false positives, while still allowing AF detection whenever high-quality data is available. Furthermore, with an average of 17.4 analysable hours a day for 28 days across 150 patients, we still gathered a vast amount of analysable data, demonstrating the advantages of continuous monitoring using a device such as the CW2 compared to standard Holter monitoring.

As the CW2 can be used over an extended period, patient compliance, the number of ECG attempts before a successful ECG, and the AF confirmation rate according to the single-lead ECG algorithm are essential to the eventual clinical implementation. However, these results were difficult to compare to the literature, as the measurement period used here is one of the longest considered10,16. This extended measurement period proved especially useful in cases of paroxysmal AF, as this led to an increase in AF detection and prevalence compared to standard Holter recordings.

Not only does the CW2 provide accurate AF monitoring across extended periods, but the single wristband is water-resistant, very thin, and light. As reflected by the high score across the user-friendliness questionnaire, these characteristics improve the wearability and usability of the CW2 and prevent disturbances in daily activities and sleep. Furthermore, as the CW2 can measure multiple other parameters (including pulse rate, SpO2, breathing frequency, non-invasive blood pressure, and accelerometer data), early detection and monitoring of various (cardiac) health conditions is enabled. These qualities set the CW2 apart from many similar AF-detection devices using a PPG-based or single-lead ECG-based algorithm, which focus solely on AF detection and/or consist of multiple or large wearable devices19,20. Furthermore, the continuous measurement capabilities and non-intrusive nature of the device distinguish the CW2 from devices which use spot measurement or adhesive patches. Finally, the CW2’s high accuracy and ease of use make it an ideal alternative for AF screening and monitoring, offering a less burdensome option compared to standard-of-care methods whilst allowing an extended monitoring period.

Despite the valuable insights gained from this study, several limitations must be acknowledged. First, the ambulatory nature of the study led to a significant portion of the collected PPG data being discarded. Additionally, an average of 1.7 attempts (± 3.8 SD) were required to obtain a successful and analysable single-lead ECG. Moreover, only 66.8% of the performed single-lead ECGs confirmed AF while the sensitivity and specificity of both the single-lead and PPG-based algorithms remained above 95%. This suggests that some cases of paroxysmal AF may have reverted to sinus rhythm before the single-lead ECG was recorded. As such, further research is needed to explore the impact of the time gap between the alert and the single-lead ECG on detection accuracy. Additionally, the exclusion of unanalysable data may have affected the outcomes of this study. Nonetheless, the amount of unanalysable PPG data and the number of single-lead ECG attempts were in line with the results presented by other studies16. Even though a large number of analysable 90-s PPG windows were available, the number of AF episodes was relatively small due to the limited number of patients with AF. Furthermore, the period with a direct comparison to the gold standard was relatively short. Additionally, no Fitzpatrick score V or VI inclusions were present within this evaluation. Therefore, a cohort study across a sizeable AF population with a longer gold standard reference duration and representation of all Fitzpatrick scores is recommended to confirm this study’s promising initial results. Furthermore, post-market studies should be considered to further evaluate the effect of demographic characteristics such as skin colour. Finally, although the CW2 was considered to be user-friendly by the participants of this study, the potential influence of user bias should be considered, as patients willing to participate in a wearables study may be more familiar with these devices than those who do not participate.

To conclude, this study demonstrates that the CW2’s PPG-based and single-lead ECG-based algorithms have a high specificity and sensitivity for AF detection compared to standard 24-h Holter monitoring. Furthermore, the PPG-based AF burden showed a high correlation and low error compared to the gold standard. Notably, AF detection rates increased compared to the gold standard when considering 28 days of CW2 data. Despite some limitations, the study suggests promising prospects for using wearable devices like the CW2 in long-term AF monitoring, potentially improving patient care and outcomes.

Methods

Study design

The presented analytical validation research was part of a prospective, single-centre study conducted at the teaching hospital Reinier de Graaf Gasthuis, Delft, the Netherlands, between June 2023 and December 2023. This study was registered on ClinicalTrials.gov under registration number NCT05899959 on June 2, 2023. The study adhered to the principles outlined in the Declaration of Helsinki and received approval from the regional Dutch medical ethical committee (METC Leiden Den Haag Delft, NL83281.000.22).

Study population

The study population consisted of consecutive patients receiving a 24-h Holter as standard of care at the Reinier de Graaf Gasthuis. Following the study protocol, participants with insufficient data quality and participants who withdrew from the study were replaced to ensure a total inclusion of 150 participants.

Exclusion criteria encompassed being unable to wear the Corsano CardioWatch 287, not being able to receive BP measurements per cuff, pregnancy, breastfeeding, upper arm circumference outside the cuff range (22–42 cm), being unable or unwilling to provide informed consent, or having significant mental or cognitive impairment. Written informed consent was obtained from all participants before their inclusion in the study.

Sample size calculations for the assessment of AF detection were based on a previous study that we conducted at an outpatient referral clinic, where we observed an atrial fibrillation incidence of 3% after 48 h of Holter monitoring21. To observe a 7% absolute increase in atrial fibrillation detection with a power of 0.80 at a two-sided α = 0.05, a total of 125 patients is required (McNemar test on dependent proportions)22. Assuming an attrition rate of 20%, a total of 150 patients were required.

Devices

The investigational device consisted of the CardioWatch 287-2 (CW2), a rechargeable, wearable device developed by Corsano Health (The Hague, the Netherlands) and manufactured by MMT (Geneva, Switzerland)23. The CW2 is currently available on the market and utilises PPG to monitor multiple vital signs on the wrist, combining signals from light sources, light sensors, electrodes, and an accelerometer. AF can be determined in two ways, either directly via the PPG data or via the single-lead ECG. PPG-based AF detection was performed using the Happitech algorithm (Rotterdam, the Netherlands)24. An example result for the PPG-based AF detection algorithm is provided in Fig. 3. Single-lead ECGs were analysed through the Cardiolyse analytics platform (Helsinki, Finland)25. Figure 4 illustrates an example of the CW2’s utilisation chain and its corresponding cloud-based portal system demonstrating the results of a single-lead ECG.

Fig. 3: Demonstration of the CardioWatch 287-2 photoplethysmography (PPG)-based atrial fibrillation detection method.

This method includes the algorithm developed by Happitech (Rotterdam, the Netherlands) which, here, is applied to a 100-s time window and shows the detection of an irregular heart rhythm. In the upper panel a Poincaré plot is included, demonstrating the instability of RR-intervals. An RR tachogram follows, showing the variation in RR-intervals over a time period of 100 s. Finally, the raw PPG data of the 100-s window is shown.

Fig. 4: Demonstration of CardioWatch 287-2 atrial fibrillation detection method.

This demonstration includes (a) the use chain showing the steps from health care provider, through patient, arrhythmia detection, data transfer, to the eventual data report, and (b) the results of a single-lead ECG for a patient with atrial fibrillation within the Corsano cloud-based portal showing irregular RR-intervals (red highlights) and classification as atrial fibrillation.

The Happitech PPG-based AF-detection algorithm has been validated in several clinical studies, such as Mol et al. 2020 and Schäck et al. 201726,27. The algorithm (Heart Rhythm SDK) utilizes a classification machine learning (ML) algorithm. This algorithm evaluates various factors such as signal quality, RR intervals, curve shape, and motion to determine if the user has a normal heart rhythm, atrial fibrillation (AFib), or if the measurements need to be repeated due to low quality. The algorithm processes signals from two sensors: the PPG sensors and the accelerometer. Data from these sensors is synchronized and combined into a single frame, which is then fed into the algorithm. The algorithm comprises several relatively independent sub-algorithms. These sub-algorithms include individual ML models, each supported by a labelling tool with a graphical user interface (LabGUI) for dataset labelling. This complex system ensures accurate heart rhythm detection by evaluating signal quality and other critical factors, providing users with reliable results. The final outputs include heart rate, heart rate variability, and heart rhythm classification, which are visually displayed on the user interface for easy interpretation.

The Cardiolyse single-lead ECG-based AF-detection algorithm is an integral part of the proprietary electrocardiogram scaling technology described in Chaikovsky et al. 202028. This algorithm is hybrid, i.e. most of its steps are rule-based, but there are also elements of ML. The first step consists of determining the level of signal noise and the presence of artifacts, for which ML methods are used. As a result, the recording section is considered suitable or unsuitable for further analysis. Secondly, QRS complexes are identified and the type of each QRS complex is determined, excluding ectopic beats. Following this, the degree of order or, conversely, chaos in the series of R-R intervals is determined. For this purpose, a number of methods are used, including time-domain (upgraded RMSSD) and nonlinear dynamics (upgraded Sample Entropy) methods, among others. Finally, the P and T waves of each QRST complex are identified, and their ordering and type of connection with the QRS complex are assessed, eventually leading to a conclusion about the presence or absence of AF.
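
To illustrate the kind of time-domain irregularity measure mentioned above, the following is a minimal sketch of the standard RMSSD (root mean square of successive R-R interval differences). It is not the proprietary "upgraded RMSSD" used by Cardiolyse, whose details are not given here; the function name and the example R-R series are hypothetical.

```python
import math


def rmssd(rr_intervals_ms: list[float]) -> float:
    """Standard RMSSD over a series of R-R intervals in milliseconds.

    Textbook definition only; the Cardiolyse variant is not public in this text.
    """
    if len(rr_intervals_ms) < 2:
        raise ValueError("at least two R-R intervals are required")
    # Differences between each interval and the one that follows it
    successive_diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in successive_diffs) / len(successive_diffs))


# Hypothetical example series: a regular rhythm and a highly irregular one
regular_rr = [800.0, 810.0, 800.0, 810.0]
irregular_rr = [600.0, 950.0, 700.0, 1100.0]

print(rmssd(regular_rr), rmssd(irregular_rr))
```

An irregular rhythm such as AF typically yields a much larger RMSSD than a regular rhythm, which is why measures of this type are combined with other features rather than used alone.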

The reference 24-h Holter recordings were performed using the custo flash 500/510/510 V (custo med GmbH, Ottobrunn, Germany)29. The custo flash 500/510/510 V is a classic Holter ECG recorder with three channels and integrated ECG cables for continuous ECG recording. Recordings were first analysed automatically, after which a manual check by a human overreader was performed, according to standard-of-care.

Study endpoints

Primary endpoints include:

  1. AF detection: sensitivity, specificity, positive and negative predictive value (PPV/NPV) for:

    a. 24-h PPG-based (Happitech) vs. 24-h Holter

    b. 24-h single-lead ECG (Cardiolyse) vs. 24-h Holter

  2. 28-day CW2 period results including:

    a. Compliance with ECG alerts and number of ECG attempts before successful ECG

    b. Confirmation of AF: single-lead ECG vs PPG-based (ECGs within one hour of alert)

    c. Prevalence of AF: 28-day PPG-based (Happitech) vs. 24-h Holter

Secondary endpoints include:

  1. AF burden: root mean squared error (RMSE), correlation and Bland-Altman for:

    a. 24-h PPG-based (Happitech) AF burden vs. 24-h Holter AF burden

  2. User-friendliness of the device during the 28 days.

Study protocol

For this study, the accuracy of the CW2’s AF detection during 24 h was directly compared to simultaneous monitoring by conventional 24-h Holter. The AF detection and AF burden were then evaluated for 27 additional days. To achieve this, study participants were equipped with the CW2, a wrist-worn PPG device, on the non-dominant wrist, directly following 24-h Holter placement. Patients were instructed to wear the CW2 continuously for 28 days. Throughout this duration, PPG data, collected using multi-wavelength sensors (paired green, red, and infrared) with a sampling frequency of 32 Hz or 128 Hz, was continuously transmitted to a secure cloud via a Bluetooth-connected smartphone. Patients who were not wearing the device were actively encouraged to do so by means of an automatic alert on the patient’s smartphone every 12 h. Additionally, if no data was collected by the CW2 for two or more consecutive days, the researcher contacted the patient to remind them of the importance of wearing the CW2 and to address any potential issues.

If AF was detected within the PPG data, a notification was sent to the patient’s smartphone, prompting them to perform a single-lead ECG by holding the rim of the bracelet with two fingers of the contralateral hand. This notification was only sent if AF was detected three times within 10 minutes and only if the previous notification was more than 4 h ago. Furthermore, no notifications were sent at night (between 23:00 and 07:00). The maximum of one notification per 4 h was established to minimize the burden on patients experiencing prolonged or continuous AF.
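
The notification rules described above can be sketched as follows. This is an illustrative reconstruction from the stated rules only, not the CW2 firmware or app logic; the function and constant names are hypothetical, and details such as how boundary cases (exactly 10 minutes, exactly 4 h) are handled are assumptions.

```python
from datetime import datetime, timedelta

DETECTION_SPAN = timedelta(minutes=10)  # AF must be detected 3 times within this span
MIN_DETECTIONS = 3
MIN_ALERT_GAP = timedelta(hours=4)      # previous notification must be more than 4 h ago
NIGHT_START_HOUR = 23                   # no notifications between 23:00 and 07:00
NIGHT_END_HOUR = 7


def is_night(ts: datetime) -> bool:
    """True if the timestamp falls in the quiet period (23:00 to 07:00)."""
    return ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR


def alert_times(af_detections: list[datetime]) -> list[datetime]:
    """Return the timestamps at which an ECG notification would be sent."""
    detections = sorted(af_detections)
    alerts: list[datetime] = []
    for i, ts in enumerate(detections):
        # AF detections in the 10 minutes up to and including this one
        recent = [d for d in detections[: i + 1] if ts - d <= DETECTION_SPAN]
        if len(recent) < MIN_DETECTIONS:
            continue
        if is_night(ts):
            continue
        if alerts and ts - alerts[-1] <= MIN_ALERT_GAP:
            continue
        alerts.append(ts)
    return alerts


# Hypothetical example: 90-s windows flagged as AF around noon and again 5 h later
noon = datetime(2023, 6, 1, 12, 0)
step = timedelta(seconds=90)
example = [noon + k * step for k in range(4)]
example += [noon + timedelta(hours=5) + k * step for k in range(3)]
example_alerts = alert_times(example)
```

In this example the first alert fires at the third detection (12:03), the fourth detection is suppressed by the 4-h rule, and a second alert fires at 17:03.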

At the start of the study period, baseline parameters, including skin type (Fitzpatrick scale) and arm hair density, were collected for all patients. The researchers documented all adverse events and study withdrawals. In case of study withdrawal or insufficient data quality, participants were replaced, following the study protocol, to ensure a total inclusion of 150 patients.

Halfway through the study, between days 12 and 28, participants were asked to review the CW2’s user-friendliness using a human factors validation questionnaire (Supplementary Table 1). This nine-question survey was developed and validated across studies in the USA in 2022 and 2023 in the process of acquiring FDA clearance for the CardioWatch 287-2, and assesses the CW2’s wearability and usability, its charging process, and the accessory mobile application.

Statistical design and analysis

All statistical analyses were performed using Python (version 3.10).

For the 24-h Holter period, data collected with the CW2 and Holter were synchronised in time by aligning the timestamps of the data gathered from both devices. Following this synchronisation, PPG and Holter data were split into windows of 90 s. For each 90-s window, the PPG-based AF algorithm provides one of the following predictions: atrial fibrillation (AF), normal sinus rhythm (NSR), undetermined (UND), too much movement (TMM) or poor signal quality (BADQ). For all AF and NSR outcomes, the PPG-based prediction was compared to the Holter results from the corresponding 90-s window. Holter windows were scored as AF when more than 50% of the window included AF. The average sensitivity and specificity of AF detections by the investigational device (PPG- and single-lead ECG-based) were calculated using the 24-h Holter as a reference.
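
The window-level comparison described above can be illustrated with a short sketch. It assumes per-window labels are already time-aligned; the function names and the example labels are hypothetical and are not taken from the study code.

```python
def holter_window_label(af_seconds: float, window_seconds: float = 90.0) -> str:
    """Score a Holter window as AF when more than 50% of the window contains AF."""
    return "AF" if af_seconds / window_seconds > 0.5 else "NSR"


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division; returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(ppg_labels: list[str], holter_labels: list[str]) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV over analysable windows only."""
    tp = fp = tn = fn = 0
    for ppg, ref in zip(ppg_labels, holter_labels):
        if ppg not in ("AF", "NSR"):
            continue  # UND, TMM and BADQ windows are excluded from the analysis
        if ppg == "AF" and ref == "AF":
            tp += 1
        elif ppg == "AF":
            fp += 1
        elif ref == "AF":
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical labels for six aligned 90-s windows
ppg = ["AF", "AF", "NSR", "NSR", "TMM", "NSR"]
holter = ["AF", "NSR", "NSR", "AF", "AF", "NSR"]
metrics = diagnostic_metrics(ppg, holter)
```

Because PPV depends on how many windows truly contain AF, a PPV lower than the sensitivity and specificity is expected when AF windows are a small share of all analysable windows.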

The AF burden according to the 24-h PPG-based measurements was compared to the AF burden determined by the 24-h Holter. For PPG-based measurements, the AF burden was determined by dividing the number of analysable windows with AF by the total number of analysable windows. For the 24-h Holter measurements, the AF burden was calculated by dividing the time with AF by the total recording time. If no AF was detected, the AF burden was set to zero percent. The RMSE between the PPG- and Holter-based AF burden was calculated according to Formula 1. Additionally, the Pearson correlation coefficient between the PPG- and Holter-based AF burden was determined. To further compare the 24-h PPG-based AF burden to the Holter AF burden, a Bland-Altman plot, pooled across all subjects, was constructed according to Bland et al. (2007)30.

$$\mathrm{RMSE}=\sqrt{\sum_{i=1}^{N}\frac{{\left(\mathrm{Predicted}_{i}-\mathrm{Actual}_{i}\right)}^{2}}{N}}$$
(1)
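
A minimal sketch of the AF burden comparison is given below, using only the Python standard library. The function names and the example burden values are hypothetical; the use of the sample standard deviation for the Bland-Altman limits and the handling of a recording without analysable windows are assumptions of this sketch, not details confirmed by the study.

```python
import math
from statistics import mean, stdev


def ppg_af_burden(window_labels: list[str]) -> float:
    """AF burden (%) = analysable AF windows / all analysable windows."""
    analysable = [label for label in window_labels if label in ("AF", "NSR")]
    if not analysable:
        return 0.0  # assumption of this sketch: no analysable data yields 0%
    return 100.0 * analysable.count("AF") / len(analysable)


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean squared error as in Formula 1."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def bland_altman(predicted: list[float], actual: list[float]) -> tuple[float, float, float]:
    """Mean difference and 95% limits of agreement (mean ± 1.96 SD)."""
    diffs = [p - a for p, a in zip(predicted, actual)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1); an assumption of this sketch
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical per-patient AF burden values (%)
ppg_burden = [0.0, 10.0, 50.0, 100.0]
holter_burden = [0.0, 12.0, 48.0, 100.0]

error = rmse(ppg_burden, holter_burden)
r = pearson_r(ppg_burden, holter_burden)
bias, lower, upper = bland_altman(ppg_burden, holter_burden)
```

Note that patients without AF enter all three calculations with a burden of 0%, which pulls the correlation upwards when most of the cohort has no AF.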

For the whole 28-day period, patient compliance with performing a single-lead ECG after an ECG alert was calculated by dividing the number of ECG alerts that were followed by a performed ECG by the total number of ECG alerts per individual and averaging across all individuals. Furthermore, it was determined how many times an ECG alert led to the confirmation of AF within one hour after the ECG alert, averaged across all individuals. The prevalence of AF according to the 24-h Holter was compared to the AF prevalence across 28 days according to the CW2 PPG and single-lead ECG algorithms individually. Finally, the median and range of the replies to each of the questions in the user-friendliness survey were reported.
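
As an illustration of averaging per individual rather than pooling all alerts, the sketch below assumes compliance is the share of a patient's alerts that were followed by an ECG. The function names and the example counts are hypothetical.

```python
def mean_patient_compliance(per_patient: list[tuple[int, int]]) -> float:
    """Average of per-patient compliance percentages.

    Each tuple is (alerts_received, alerts_followed_by_ecg). Patients who
    received no alerts are skipped, since their compliance is undefined.
    """
    rates = [followed / alerts for alerts, followed in per_patient if alerts > 0]
    return 100.0 * sum(rates) / len(rates)


def pooled_compliance(per_patient: list[tuple[int, int]]) -> float:
    """Compliance over all alerts pooled together, shown for contrast."""
    total_alerts = sum(alerts for alerts, _ in per_patient)
    total_followed = sum(followed for _, followed in per_patient)
    return 100.0 * total_followed / total_alerts


# Hypothetical counts for three patients
counts = [(10, 5), (2, 2), (0, 0)]
per_patient_average = mean_patient_compliance(counts)
pooled = pooled_compliance(counts)
```

The two figures differ whenever patients receive different numbers of alerts: averaging per individual gives each patient equal weight, whereas pooling gives more weight to patients with many alerts.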

Subgroup analyses for subgroups based on gender, age and AF burden were performed, but were found to be unrevealing and are therefore not included here.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.