Abstract
Artificial intelligence (AI)-derived electrocardiographic (ECG) age is a promising marker of atrial fibrillation (AF) risk. We developed PROPHECG-Age Single, an AI model estimating ECG age from wearable single-lead ECGs, and examined whether the ECG-age gap (predicted minus chronological age) is associated with AF presence and burden in a real-world self-monitoring context. One million 12-lead ECGs from a hospital were converted to synthetic single-lead signals via a Cycle-Consistent Generative Adversarial Network and used to train a residual network-based model. Validation in two independent wearable cohorts (S-Patch [ClinicalTrials.gov: NCT05119725, registered November 2021]; Memo Patch [ClinicalTrials.gov: NCT05355948, registered May 2022]) showed mean absolute errors of 10.01 and 11.88 years, respectively. The pooled association with AF presence was significant (odds ratio 1.03 per 1-year gap), and for AF burden, each 1-year increase in the gap corresponded to a 0.8-percentage-point rise. These findings support wearable-based AI-ECG age as a potential digital biomarker for proactive cardiovascular monitoring.
Introduction
Physiological aging is a fundamental contributor to the onset and progression of cardiovascular diseases, notably atrial fibrillation (AF)1,2, whose global prevalence continues to rise significantly3,4. Electrocardiography (ECG), a non-invasive and broadly accessible diagnostic tool, captures the electrical activity of the heart and has recently emerged as an intriguing digital biomarker for evaluating cardiovascular aging. Advances in artificial intelligence (AI) now allow precise estimation of an individual’s “electrocardiographic age” (AI-ECG age) during sinus rhythm from standard 12-lead ECG recordings. This AI-derived biomarker is especially insightful because discrepancies between AI-predicted ECG age and chronological age—termed the “AI-ECG age gap”—have been strongly correlated with worsening cardiovascular health, heightened mortality, and increased AF risk, including recurrence post-intervention5,6,7,8,9.
However, existing AI-ECG age models have primarily relied on episodic, hospital-based 12-lead ECG recordings. Such snapshots inherently miss the continuous electrophysiological variations occurring in everyday life, potentially overlooking subtle yet clinically relevant changes in cardiac aging. Consequently, the episodic nature of these assessments limits the temporal resolution and real-world applicability of AI-derived age metrics. Recent developments in wearable single-lead ECG technology offer promising opportunities for continuous, longitudinal cardiac monitoring outside clinical settings. Yet, a major barrier has been the limited availability of large-scale, high-quality single-lead ECG datasets necessary to train robust AI models suited for continuous wearables10.
To overcome this critical limitation, we introduce the PROPHECG-Age Single (PRediction Of PHenotypes using ElectroCardioGraphy-Age Single) model—a novel deep-learning framework utilizing large-scale synthetic data specifically designed to estimate AI-ECG age from wearable single-lead ECG recordings (Fig. 1). Through this study, we assess the potential of continuous single-lead AI-ECG age estimation as a promising digital biomarker, shifting wearable self-monitoring from episodic event detection toward precision-driven, proactive AF risk management.
Schematic representation illustrating the data integration, model training, validation, and downstream clinical analyses. Three distinct datasets were utilized: the Severance Hospital ECG archive (1,008,566 standard 12-lead ECGs), the S-Patch registry (1980 participants), and the Memo Patch registry (582 participants). A Cycle-Consistent Generative Adversarial Network (CycleGAN) was first trained on 50,000 standard 12-lead ECGs (Severance) and 100,000 single-lead ECGs (S-Patch), and subsequently generated synthetic single-lead waveforms from the full Severance dataset (n = 1,008,566). These synthetic single-lead waveforms were used to train the PROPHECG-Age Single model, a one-dimensional ResNet architecture designed to estimate electrophysiological (AI-ECG) age from 10-s single-lead ECG segments. Internal validation utilized 1502 eligible S-Patch participants, whereas external validation employed 529 eligible Memo Patch participants. For clinical evaluation, analyses of prevalent atrial fibrillation (AF) presence were conducted for all eligible participants (S-Patch, n = 1502; Memo Patch, n = 529). Participants demonstrating at least one AF episode underwent additional AF burden analyses (S-Patch, n = 233; Memo Patch, n = 24). Abbreviations: AF atrial fibrillation, ECG electrocardiogram, CycleGAN Cycle-Consistent Generative Adversarial Network.
Results
Clinical characteristics of registry participants
Baseline demographic, clinical, and ECG-derived metrics for participants from the S-Patch and Memo Patch registries are summarized in Table 1. The S-Patch cohort had a notably higher proportion of participants with AF compared to Memo Patch (81% [n = 1,217] vs. 5% [n = 24]; p < 0.001). Although S-Patch participants were younger (62.2 ± 11.0 vs. 67.4 ± 9.6 years; p < 0.001), their AI-ECG–predicted age at sinus rhythm was paradoxically higher (60.4 ± 9.3 vs. 58.2 ± 8.8 years; p < 0.001), resulting in a higher AI-ECG age gap (−1.8 ± 12.4 vs. −9.2 ± 11.0 years; p < 0.001). Clinically significant differences included a higher prevalence of congestive heart failure among S-Patch participants (13% vs. 5%; p < 0.001), whereas Memo Patch users exhibited higher rates of hypertension (66% vs. 55%; p < 0.001) and diabetes mellitus (27% vs. 18%; p < 0.001), resulting in a correspondingly higher CHA₂DS₂-VASc risk score (2.9 ± 1.1 vs. 2.0 ± 1.4; p < 0.001). Additionally, substantial differences were observed in demographic characteristics: the S-Patch registry was predominantly male (66% vs. 29%; p < 0.001) and had a higher proportion of severe alcohol consumption (22% vs. 6%; p < 0.001).
Model performance
The CycleGAN architecture successfully generated high-fidelity single-lead ECG signals from standard 12-lead recordings, closely matching actual waveforms acquired from wearable S-Patch single-lead devices (Fig. S1). Training the PROPHECG-Age Single model using these synthetic single-lead ECGs resulted in a mean squared error (MSE) of 203.4 in the training set and 215.9 in the internal validation split, corresponding to a mean absolute error (MAE) of 11.15 years (Fig. S2). In real-world wearable ECG data, the PROPHECG-Age Single algorithm demonstrated consistent predictive performance. Internal validation within the S-Patch cohort (n = 1502) demonstrated an MAE of 10.01 years and a significant correlation with chronological age (Pearson’s r = 0.26; p < 0.001; Fig. 2a). External validation in the independent Memo Patch cohort (n = 529) yielded a slightly higher but comparable MAE (11.88 years) with significant correlation (Pearson’s r = 0.30; p < 0.001; Fig. 2b). By comparison, an alternative workflow—in which single-lead wearable ECG data were first reconstructed into 12-lead ECGs using CycleGAN and then processed with the original 12-lead PROPHECG-Age model—achieved a similar MAE (8.86 years) but exhibited weaker correlation (Pearson’s r = 0.13; p < 0.001) and notable regression toward the mean (Fig. S3).
Hexbin scatter plots showing the relationship between AI-predicted ECG age (derived from continuous single-lead, sinus-rhythm segments) and true chronological age in (a) the internal validation cohort (S-Patch registry, n = 1,502; Pearson r = 0.26, p < 0.001; MAE = 10.01 years) and (b) the external validation cohort (Memo Patch registry, n = 529; r = 0.30, p < 0.001; MAE = 11.88 years). The red dashed line indicates perfect prediction (y = x), and darker hexagons denote regions of higher sample density. Abbreviations: AF atrial fibrillation, ECG electrocardiogram.
Performance contextualization and baseline comparison
Lead ablation experiments demonstrated a structural error floor with single-lead inputs: the same architecture achieved a mean absolute error (MAE) of ~7.8 years with 8-lead data but approximately 10 years with 1-lead data (Table S1). By contrast, wearable-like artifacts (baseline drift, motion bursts, EMG-like noise) and sampling-rate changes between 200 and 500 Hz produced only minor variation in performance (Tables S2, S3). In baseline comparisons, feature-based linear regression models yielded a lower global MAE (~8.2 years) but exhibited pronounced regression to the mean, with poor calibration slopes (~0.08–0.11). The deep-learning model, despite a higher global MAE of ~10.0 years, achieved a superior calibration slope (~0.21) and outperformed the linear baselines in younger participants (≤40 years) by approximately 5–7 years (Table S4, Fig. S4). Subgroup and bias assessments, including Bland–Altman analyses and sex- and age-bin summaries, are provided in Fig. S5.
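The trade-off between global MAE and calibration slope described above can be illustrated with a small sketch. It assumes the calibration slope is the ordinary least-squares slope of predicted age regressed on chronological age (the text does not spell out the definition), and all ages below are invented toy values, not study data.

```python
from statistics import fmean

def mae(pred, true):
    """Mean absolute error in years."""
    return fmean([abs(p - t) for p, t in zip(pred, true)])

def calibration_slope(pred, true):
    """OLS slope of predicted age on chronological age (assumed definition)."""
    mean_true, mean_pred = fmean(true), fmean(pred)
    sxy = sum((t - mean_true) * (p - mean_pred) for t, p in zip(true, pred))
    sxx = sum((t - mean_true) ** 2 for t in true)
    return sxy / sxx

# Toy cohort concentrated around 60 years (hypothetical values).
true_age = [50, 55, 58, 60, 62, 65, 70]

# A predictor that regresses strongly toward the cohort mean.
shrunk = [60 + 0.1 * (t - 60) for t in true_age]

# A predictor that tracks age one-to-one but with +/- 8-year errors.
offsets = [8, -8, 8, -8, 8, -8, 8]
tracking = [t + o for t, o in zip(true_age, offsets)]

for name, pred in (("shrunk", shrunk), ("tracking", tracking)):
    print(name, round(mae(pred, true_age), 2), round(calibration_slope(pred, true_age), 2))
```

In this toy setting the shrunken predictor wins on MAE while carrying almost no age information, which is the pattern the linear baselines showed; MAE alone therefore cannot rank the two model families.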
Association between AI-ECG age gap and prevalent atrial fibrillation
In the internal validation cohort (S-Patch, n = 1,502), participants with AF demonstrated a significantly greater AI-ECG age gap compared to those without AF (–1.2 ± 12.3 vs. –4.2 ± 12.8 years; p < 0.001; Fig. 3a). In multivariable logistic regression adjusted for sex and all CHARGE-AF covariates, each additional 1-year increment in the AI-ECG age gap independently corresponded to 2% higher odds of prevalent AF (adjusted OR: 1.02; 95% CI: 1.01–1.04; Fig. 3b). Analysis by AF subtype revealed a stepwise increase in AI-ECG age gap from Non-AF (–4.2 ± 12.8 years) through paroxysmal AF (–1.4 ± 12.3 years) to persistent AF (–0.1 ± 12.3 years; p < 0.001 for trend; Table S5). Additionally, dichotomization of the AI-ECG age gap at the cohort mean (–1.8 years) identified an elevated gap as uniquely associated with AF (OR: 1.76; 95% CI: 1.35–2.30; p < 0.001), without significant associations for other cardiovascular comorbidities (Fig. S6). The baseline characteristics of the external validation cohort (Memo Patch, n = 529), stratified by the presence of AF, are detailed in Table S6. In this external sample, each additional year of AI-ECG age gap conferred a non-significant 3% increase in the odds of prevalent AF (adjusted OR 1.03, 95% CI 0.98–1.09; Fig. S7a). Pooling both cohorts yielded a common-effect OR of 1.03 per 1-year increment in the AI-ECG age gap (95% CI 1.01–1.04) with no evidence of between-study heterogeneity (I² = 0%, p = 0.760; Fig. S7b).
a Distribution of the 48-hour average AI-ECG age gap during sinus rhythm according to atrial fibrillation (AF) status. Overlaid histograms display the proportion of participants within each ECG-age gap bin for those without AF (Non-AF, n = 285; gray) and with AF (AF, n = 1217; blue). The AF group shows a right-shifted distribution, indicating a larger positive ECG-age gap compared with the non-AF group. Vertical dashed lines indicate the mean ECG-age gap for each group, and a Welch’s t-test confirmed a significant difference between groups (t = −3.61, p < 0.001). b Forest plot of adjusted odds ratios (ORs) and 95% confidence intervals (CIs) from a multivariable logistic regression for AF presence in the S-Patch cohort. Each 1-year increment in the AI-ECG sinus age gap is associated with an OR of 1.02 (95% CI: 1.01–1.04), indicating a significant independent predictor of AF. Other covariates include male sex, heart failure, weight, chronological age, smoking status, systolic and diastolic blood pressure, height, hypertension, diabetes mellitus, and prior myocardial infarction. The dashed vertical line marks OR = 1; blue markers denote predictors with p < 0.05.
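For readers unfamiliar with common-effect pooling, the following is a minimal sketch of inverse-variance weighting on the log-odds scale, using the two published cohort estimates as inputs. The standard errors are back-calculated from the rounded 95% confidence intervals, which is an assumption of this sketch; because the inputs are rounded to two decimals, the output need not reproduce the reported pooled estimate exactly. Function names are illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile

def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover the log-OR and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(odds_ratio), se

def pool_common_effect(studies):
    """Inverse-variance (common-effect) pooling of odds ratios.

    studies: list of (odds_ratio, ci_low, ci_high) tuples.
    Returns (pooled_or, ci_low, ci_high).
    """
    weight_sum = 0.0
    weighted_sum = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        log_or, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1.0 / se ** 2
        weight_sum += weight
        weighted_sum += weight * log_or
    pooled = weighted_sum / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return (math.exp(pooled),
            math.exp(pooled - Z_95 * pooled_se),
            math.exp(pooled + Z_95 * pooled_se))

# Per-1-year adjusted ORs as reported for S-Patch and Memo Patch.
studies = [(1.02, 1.01, 1.04), (1.03, 0.98, 1.09)]
pooled_or, pooled_low, pooled_high = pool_common_effect(studies)
print(f"pooled OR {pooled_or:.3f} (95% CI {pooled_low:.3f}-{pooled_high:.3f})")
```

The much narrower S-Patch interval gives that cohort most of the weight, so the pooled estimate sits close to the internal-cohort value.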
Association between AI-ECG age gap and atrial fibrillation burden
The AI-ECG age gap was positively correlated with AF burden in the internal cohort (S-Patch, n = 233 with AF episodes; Pearson’s r = 0.13; p = 0.048; Fig. 4a). The clinical characteristics of these patients, stratified by AF subtype, were otherwise similar (Table S7). In a multivariable fractional logit model adjusted for sex and CHARGE-AF covariates, each 1-year increase in the AI-ECG age gap was independently associated with a 0.74-percentage-point higher AF burden (average marginal effect 0.0074; 95% CI 0.001–0.014; Fig. 4b). Although the direction of effect was consistent in the external cohort (Memo Patch, n = 24 with AF episodes; average marginal effect 0.017, corresponding to a 1.7-percentage-point increase in AF burden per 1-year increase in the age gap), the association did not reach statistical significance, likely because of the small sample size (95% CI –0.007 to 0.042; Fig. S8a). Combining both cohorts in a meta-analysis (heterogeneity I² = 0%; p = 0.34) yielded a significant average marginal effect of 0.008 per 1-year increase in the AI-ECG age gap (95% CI 0.002–0.014; Fig. S8b), translating to a 0.8-percentage-point increase in AF burden.
This analysis was conducted in the S-Patch cohort for all subjects with AF burden >0% detected by the wearable device (including both device-detected and clinically adjudicated AF episodes). a Scatter plot showing the relationship between AF burden proportion (x-axis) and the 48-h average AI-ECG age gap during sinus rhythm (y-axis). The solid red line and shaded band represent the fitted linear trend and its 95% confidence interval (Pearson’s r = 0.130, p = 0.048). b Forest plot of average marginal effects (AMEs) and 95% confidence intervals derived from a multivariable fractional logit regression model (generalized linear model with binomial family and logit link) with AF burden proportion as the outcome. Each point estimate represents the absolute change in AF burden (proportion) associated with a one-unit increase in the corresponding predictor. For the AI-ECG age gap, each additional 1-year increase in the gap was associated with an AME of 0.0074 (95% CI 0.001–0.014), corresponding to a 0.74-percentage-point higher AF burden. Covariates include heart failure, diabetes mellitus, chronological age, height, diastolic and systolic blood pressure, body weight, prior myocardial infarction, antihypertensive medication, sex, and smoking status. The dashed vertical line indicates AME = 0; blue markers denote predictors with p < 0.05.
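To make the average marginal effect (AME) concrete, the sketch below shows how an AME is obtained from a fitted logit-link model for a continuous predictor: the coefficient is scaled by the mean of p(1 − p) over participants. The coefficient and linear predictors are hypothetical placeholders, not the fitted values from this study, and the helper names are illustrative.

```python
import math
from statistics import fmean

def sigmoid(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def average_marginal_effect(beta_j, linear_predictors):
    """AME of a continuous predictor in a logit-link model.

    For each participant the marginal effect is beta_j * p * (1 - p),
    where p is the fitted proportion; the AME is the mean over participants.
    """
    slopes = [sigmoid(eta) * (1.0 - sigmoid(eta)) for eta in linear_predictors]
    return beta_j * fmean(slopes)

def to_percentage_points(ame):
    """Convert an AME on the proportion scale to percentage points."""
    return 100.0 * ame

# Hypothetical fitted linear predictors (log-odds of AF burden) and a
# hypothetical coefficient for the AI-ECG age gap.
etas = [-2.0, -1.0, 0.0, 0.5, 1.5]
beta_gap = 0.03
ame_gap = average_marginal_effect(beta_gap, etas)
print(f"AME = {ame_gap:.4f} -> {to_percentage_points(ame_gap):.2f} percentage points per year")
```

Applied to the reported AME of 0.0074, `to_percentage_points` gives the 0.74 percentage points quoted in the caption.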
Within-subject reproducibility of AI-ECG age gap
To evaluate within-subject temporal consistency, each 7–14 day Memo Patch recording (n = 214, non-AF participants) was segmented into six consecutive 48-hour epochs. The mean AI-ECG age gap exhibited high linear reproducibility between adjacent epochs, with Pearson’s correlations ranging from 0.90 to 0.98 (epoch 1 vs. epoch 2: r = 0.96; p < 0.001; Fig. 5a, b). The overall reliability across all six epochs remained very high (two-way mixed-effects ICC[A,1] = 0.93). When dichotomized at the cohort mean from epoch 1 (–7.5 years), binary agreement with the baseline epoch was substantial but diminished slightly over the 14-day interval, decreasing from 92% in epoch 2 to 71% by epoch 6. Corresponding Cohen’s κ values similarly declined, from 0.84 (epoch 2) to 0.39 (epoch 6), reflecting substantial-to-moderate categorical agreement (Fig. 5c, d).
In the Memo Patch cohort, continuous single-lead ECG recordings were divided into six sequential 48-h segments (period 1 = first 48 h, period 2 = second 48 h, …, period 6). At each segment, the AI-ECG age gap was computed as (predicted ECG age − chronological age). To assess categorical consistency, each period_gap was binarized using the threshold of –7.5 years (the mean AI-ECG age gap in Memo Patch period 1): “high” if > –7.5 years and “low” if ≤ –7.5 years. a Scatter plot of period 1 versus period 2 AI-ECG age gaps (years), demonstrating excellent linear agreement (Pearson r = 0.96, p < 0.001, red dashed line = y = x). b Pearson correlation matrix for AI-ECG age gaps across all six periods, with values indicating pairwise r. c Pairwise simple agreement matrix (proportion concordant) for these binary categories. d Pairwise Cohen’s κ matrix quantifying beyond-chance agreement for the same binary classification. Abbreviations: AI artificial intelligence, ECG electrocardiogram, r Pearson correlation coefficient, κ Cohen’s kappa.
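The categorical agreement statistics used here (simple agreement and Cohen’s κ after binarizing at –7.5 years) can be sketched as follows. The per-period gaps are invented toy values; only the threshold and the "high if greater than threshold" rule come from the text.

```python
def binarize(gaps, threshold=-7.5):
    """Label each AI-ECG age gap as high (True) if it exceeds the threshold."""
    return [gap > threshold for gap in gaps]

def simple_agreement(labels_a, labels_b):
    """Proportion of participants with the same label in both periods."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two binary label vectors."""
    n = len(labels_a)
    observed = simple_agreement(labels_a, labels_b)
    p_a = sum(labels_a) / n  # share labelled high in period A
    p_b = sum(labels_b) / n  # share labelled high in period B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:      # degenerate case: everyone in one category
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical age gaps (years) for eight participants in two periods.
period_1 = [-12.0, -9.1, -7.0, -3.2, -8.4, -6.9, -10.5, -2.0]
period_2 = [-11.5, -8.0, -7.9, -2.8, -8.8, -6.0, -9.9, -1.5]

high_1, high_2 = binarize(period_1), binarize(period_2)
print(simple_agreement(high_1, high_2), cohens_kappa(high_1, high_2))
```

Participants whose gap lies near the threshold (the third one in this toy data) can flip category with only a small change in the continuous value, which is one reason κ can fall while the Pearson correlation stays high.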
Discussion
In this dual-registry cohort study, we developed and validated PROPHECG-Age Single, a novel AI model that estimates “electrocardiographic age” from wearable single-lead ECG recordings using synthetic training data. Across two prospective wearable cohorts, each 1-year increase in the AI-ECG age gap was associated with 3% higher odds of prevalent atrial fibrillation (AF; pooled adjusted odds ratio 1.03, 95% confidence interval [CI] 1.01–1.04) and a 0.8-percentage-point increase in AF burden (pooled adjusted average marginal effect 0.008, 95% CI 0.002–0.014) after comprehensive covariate adjustment. This study is the first to provide AI-ECG age estimates in a single-lead wearable environment, demonstrating a modest yet statistically significant association of the AI-ECG age gap with both AF presence and burden. Taken together, these results support the single-lead AI-ECG age gap as a potential, accessible digital biomarker for wearable, continuous AF risk assessment and may contribute to a gradual shift from traditional episodic, hospital-based evaluations toward more personalized, patient-centered cardiovascular monitoring. To advance this field and facilitate broader adoption, we are making both the trained model and its weights publicly available to the research community.
The development of AI models based on single-lead ECGs has emerged as a highly promising avenue for wearable cardiovascular monitoring, yet several structural challenges have limited further progress. First, single-lead recordings are inherently noisier than standard 12-lead ECGs and lack the comprehensive spatial information that multi-lead systems provide across different cardiac regions, fundamentally limiting signal quality and interpretability. Second, unlike standard 12-lead ECG databases, which have accumulated millions of recordings over decades, available single-lead datasets remain dramatically smaller with limited patient diversity, and the absence of dominant vendors introduces substantial inter-device heterogeneity. To address these limitations, prior studies have attempted to reconstruct 12-lead ECGs from single-lead inputs before applying existing algorithms11,12. However, these approaches face fundamental limitations in attempting to extrapolate limited information into richer representations, resulting in loss of individual physiological variability and convergence toward population averages13,14, as confirmed by our own experiments showing weaker correlations (r = 0.13 vs. 0.26–0.30) and regression-to-the-mean artifacts (Fig. S3).
To our knowledge, this is the first study to adopt the reverse strategy: rather than expanding limited single-lead information, we leverage the rich information content of established 12-lead ECG archives by transforming them into single-lead formats using a forward CycleGAN–based domain adaptation strategy15. This information-preserving approach circumvents the fundamental extrapolation problem. Although the MAE (10.01–11.88 years) derived from PROPHECG-Age Single is modestly higher than in our previous 12-lead studies (4.7–7.9 years)7, it represents a reproducible level of performance in the context of continuous single-lead monitoring, as validated across two prospective cohorts using devices from different vendors. Furthermore, our ablation studies confirmed that this elevated error is not attributable to wearable-like noise or lower sampling rates. Instead, it reflects a structural error floor (~10-year MAE) imposed by the fundamental “cost” of losing spatial information—translating the heart’s complex three-dimensional view (12-lead) into a one-dimensional signal (single-lead). Compared with feature-based linear models, our deep-learning model appears to better capture complex, nonlinear physiological aging patterns while preserving contextual relevance to real-world wearable acquisition.
Building on our previous research demonstrating the association between AI-derived ECG age and AF risk in 12-lead settings7, this study shows that the single-lead AI-ECG age gap maintains correlations with both AF presence and burden in continuous monitoring environments. Conventional sinus-rhythm ECG interpretation rarely reveals underlying arrhythmogenic conditions, yet AI algorithms can extract subtle, high-dimensional electrophysiological features (such as changes in P-wave morphology and rhythm regularity) that reflect early or latent atrial pathology even during sinus rhythm and in the absence of overt AF episodes16. The positive relationship between AF burden and the AI-ECG age gap is biologically plausible, aligning with the “AF-begets-AF” paradigm in which sustained arrhythmic episodes accelerate atrial remodeling, reflected in heightened electrophysiological aging17. This also suggests that continuous single-lead monitoring can capture cumulative atrial remodeling processes, which may facilitate early AF substrate detection before overt clinical manifestations.
Our model offers a step towards a more patient-centered approach by providing individualized electrophysiological age information. This could serve as an intuitive and representative biomarker for cardiac health that patients can readily understand and engage with. Wearable ECG devices primarily capture sinus rhythm, even in patients with paroxysmal AF (median annual AF burden ≈0.1%)18. Our model leverages this abundant sinus rhythm data by estimating electrophysiological age, providing continuous risk assessment beyond episodic AF detection. The PROPHECG-Age Single model capitalizes on these intervals by converting subtle electrophysiological variations into a continuous AI-derived age estimate, offering a personalized measure of AF propensity and disease burden that is not achievable with conventional episodic rhythm detection alone. By making our model and weights publicly accessible and demonstrating robustness across different vendor devices, we facilitate truly democratized, patient-centered care that transcends institutional and technological barriers, potentially accelerating future developments in personalized cardiovascular monitoring.
Our study has several critical limitations, which must be clearly stated. First, our model demonstrated a high MAE (≈ 10–12 years). As our supplementary analyses suggest (Tables S1–S4, Figs. S4–S5), this appears to reflect not random error but a structural consequence of the information loss inherent to the single-lead setting. Nevertheless, we observed consistent and statistically significant associations under these constraints. Second, the cross-sectional design cannot establish causality or temporality. Our findings represent cross-sectional associations that do not imply predictive or causal relationships. Third, although ECGs were recorded continuously, our analysis used high-quality 10-second segments extracted every five minutes. This high-frequency sampling is best described as quasi-episodic and is conceptually distinct from real-time, beat-to-beat continuous tracking. Fourth, the sample size for the AF burden analysis in the external cohort (n = 24) was insufficient to draw robust conclusions, and that analysis must be considered strictly exploratory. However, we believe our multi-registry validation design, which used different vendors and populations (including both AF-rich and AF-poor registries), strengthens internal validity. Fifth, while the registries were multi-center, they consisted almost entirely of a single (East Asian) ethnicity, which severely limits generalizability to other populations. Sixth, a direct comparison between the single-lead and a standard 12-lead AI-ECG age gap within the same participants was not feasible, as 12-lead ECGs were not collected in the registries. Finally, the model provides a point estimate of age, and the lack of uncertainty estimates limits the ability to differentiate true physiological variability from model prediction error at the individual level. Future multi-ethnic, prospective, longitudinal studies and larger-scale external validations are essential to overcome these limitations.
PROPHECG-Age Single extends electrocardiographic age estimation from episodic 12-lead ECGs into the more challenging setting of continuous wearable single-lead monitoring. By translating sinus rhythm segments into an AI-derived age gap, the model provides a patient-centered biomarker that shows modest but statistically significant and consistent associations with AF, and may complement traditional event-driven AF assessment. As such, this approach may provide a foundation for future work toward more personalized atrial health monitoring and, with prospective validation, could ultimately contribute to improved AF risk stratification and preventive care.
Methods
Ethical approval and clinical trial registration
The study protocol regarding the development dataset was reviewed and approved by the Institutional Review Board (IRB) of Severance Hospital, Yonsei University Health System (IRB No. 4-2024-1455), which waived the requirement for informed consent due to the retrospective nature of the analysis. For the prospective wearable validation cohorts, data were obtained from two pre-registered clinical trials approved by the respective Institutional Review Boards: the S-Patch registry (IRB No. 1-2021-0002; ClinicalTrials.gov identifier: NCT05119725, registered November 2021) and the Memo Patch registry (IRB No. 1-2022-0008; ClinicalTrials.gov identifier: NCT05355948, registered May 2022). Informed consent was obtained from all participants in these prospective registries. The study was performed in accordance with the Declaration of Helsinki.
Data sources and study populations
This analysis integrated three complementary datasets comprising both retrospective and prospective sources: the Severance Hospital 12-lead ECG archive (model development), the S-Patch registry (internal validation), and the Memo Patch registry (external validation).
The development dataset was extracted from a large institutional repository containing 3,672,020 ECGs from 837,666 individuals who visited or were referred to Severance Hospital, a major tertiary referral center in South Korea, between January 2006 and September 2021 (ref. 7). For the present study, strict inclusion criteria were applied: a standard 12-lead configuration, 500 Hz sampling frequency, 10-s duration, and availability of age metadata for participants aged 20–90 years. ECGs lacking proper waveform structure or failing to meet these parameters were excluded. Following this quality assurance process, 1,008,566 high-quality 12-lead ECGs were randomly sampled. These recordings were employed exclusively to train the deep learning model and to generate synthetic single-lead ECGs via a CycleGAN-based domain conversion framework.
For internal validation, the S-Patch dataset was obtained from a prospective multicenter registry (ClinicalTrials.gov: NCT05119725) that enrolled 1980 adults across 15 tertiary and general hospitals in South Korea from September 2021 to August 2024. Continuous single-lead ECG monitoring was performed using the S-Patch EX device (Wellysis Corp., Seoul, Korea) for up to 72 h per participant in ambulatory settings. For analysis, the first consecutive 48-h segment (starting at 00:00 h) was extracted for each subject. For age estimation, only sinus-rhythm 10-s segments were used as model input. AF detection involved a two-step process: an automated deep learning-based detection algorithm followed by manual verification by trained ECG technicians. Participants who had not received a prior diagnosis but exhibited AF rhythm patterns on ECG recordings were designated as “Device-Detected AF.” Subjects who were not diagnosed with AF and exhibited no AF recordings during monitoring were classified as “Non-AF.”
For external validation, the Memo Patch registry (ClinicalTrials.gov: NCT05355948) served as an external cohort, comprising 582 adults enrolled at 13 Korean centers between September 2022 and November 2023. Participants were monitored using the MEMO Patch device (HUINNO Co., Ltd., Seoul, Korea), which provided uninterrupted single-lead ECG recording for up to 14 days. Prior to enrollment, all participants underwent clinical evaluation confirming the absence of AF. However, during monitoring, AF episodes were detected in 24 participants, who were subsequently classified as the AF group for validation of AF burden. Participants with no AF rhythm during monitoring were classified as non-AF. The extended monitoring durations (7–14 days in most cases) enabled a detailed assessment of the temporal stability of the AI-ECG age gap.
Data preprocessing
To ensure consistency and high signal fidelity across all single-lead recordings, a standardized ECG preprocessing pipeline was applied19. Signals were first filtered to remove baseline wander and noise. Thereafter, the recordings were segmented into 10-second epochs every five minutes. Only segments with a signal quality index (SQI) ≥ 0.5 were retained for downstream analysis.
Model development and training
We constructed the AI model through a two-step deep-learning workflow. Initially, an attention-guided Cycle-Consistent Generative Adversarial Network (CycleGAN)20 was developed to translate standard 12-lead ECG recordings (source domain, Domain A; using the standard 8 independent leads: I, II, V1–V6) into synthetic single-lead waveforms that closely matched the signal characteristics of the wearable S-Patch device (target domain, Domain B).
The CycleGAN architecture comprised two generator networks and two discriminator networks. Generator G_AB translated 12-lead ECGs into synthetic single-lead outputs, while G_BA mapped single-lead signals back into 12-lead representations. Both generators were based on a 1D Residual Network (ResNet-1D) architecture. The generators combined an initial convolution with sine activation functions and Instance Normalization, followed by five residual blocks using 1D convolutions with Batch Normalization and ReLU activations. A multi-head self-attention mechanism was incorporated to enhance context-aware synthesis. Both discriminators (D_A and D_B) followed a 1D PatchGAN configuration employing sine activation functions and Instance Normalization to assess local waveform fidelity.
The training objective combined three loss functions: (1) Adversarial binary cross-entropy loss; (2) Cycle-Consistency Loss to ensure identity preservation; and (3) Identity Loss. The overall loss was computed as a weighted sum of these components and optimized using the Nadam optimizer with a batch size of 4. Training utilized 50,000 randomly sampled 10-second 12-lead ECG segments from the Severance archive (Domain A) and 100,000 single-lead ECGs from the S-Patch registry (Domain B). Once trained, the CycleGAN was applied to the entire Severance dataset (n = 1,008,566), generating synthetic single-lead ECG recordings.
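As a reading aid, the weighted-sum objective described above can be written out as follows. The weighting symbols λ_cyc and λ_id and the use of an L1 norm in the cycle term follow the standard CycleGAN formulation and are assumptions here; the text reports neither the weight values nor the exact form of the identity term, so the latter is left abstract.

```latex
\begin{aligned}
\mathcal{L}_{\text{total}}
  &= \mathcal{L}_{\text{adv}}(G_{AB}, D_B)
   + \mathcal{L}_{\text{adv}}(G_{BA}, D_A)
   + \lambda_{\text{cyc}}\,\mathcal{L}_{\text{cyc}}(G_{AB}, G_{BA})
   + \lambda_{\text{id}}\,\mathcal{L}_{\text{id}}(G_{AB}, G_{BA}), \\[4pt]
\mathcal{L}_{\text{cyc}}(G_{AB}, G_{BA})
  &= \mathbb{E}_{a \sim A}\bigl[\lVert G_{BA}(G_{AB}(a)) - a \rVert_1\bigr]
   + \mathbb{E}_{b \sim B}\bigl[\lVert G_{AB}(G_{BA}(b)) - b \rVert_1\bigr].
\end{aligned}
```

Here a denotes a 12-lead segment from Domain A and b a single-lead segment from Domain B; the adversarial terms use binary cross-entropy as stated in the text.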
Subsequently, these synthetic single-lead ECG recordings were utilized to train a one-dimensional ResNet model (“PROPHECG-Age Single”), adapted from our previously validated PROPHECG-Age architecture7. The model accepts 10-second single-lead ECGs sampled at 200 Hz (2,000 time points). From the synthesized dataset of 1,008,566 single-lead ECGs, data were partitioned into training (80%), validation (5%), and testing (15%) sets. The network architecture begins with an initial projection via a Conv1D layer (kernel size = 7, padding = “same”), followed by Batch Normalization. This is followed by five sequential residual blocks with feature channel dimensions progressing as 32 → 64 → 128 → 256 → 512. Temporal resolution is reduced via strided convolutions within the blocks. Each residual block applies Batch Normalization, ReLU activations, and dropout layers (rate = 0.4). Finally, the feature maps from the last residual block are flattened, and a fully connected dense layer generates the continuous age prediction. Training was performed using the Adam optimizer (initial learning rate 1 × 10⁻³) with a ReduceLROnPlateau scheduler and early stopping triggered after 20 epochs without validation loss improvement. A fixed random seed (42) was used. Training was performed on NVIDIA GPUs using PyTorch DataParallel with a batch size of 256. The loss function was a sample-weighted mean squared error (MSE); sample weights were calculated inversely proportional to the frequency of each age in the combined training and validation sets to mitigate data imbalance.
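The sample-weighted MSE described above can be sketched in plain Python. This assumes ages are binned by integer year and that weights are rescaled to average 1; neither detail is stated in the text, and the function names and toy values are illustrative only.

```python
from collections import Counter

def inverse_frequency_weights(ages):
    """Weight each sample by 1 / (number of samples sharing its age).

    Weights are rescaled to average 1 so the loss magnitude stays
    comparable to an unweighted MSE (a choice made for this sketch).
    """
    counts = Counter(ages)
    raw = [1.0 / counts[age] for age in ages]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def weighted_mse(predicted, target, weights):
    """Sample-weighted mean squared error."""
    total = sum(w * (p - t) ** 2 for p, t, w in zip(predicted, target, weights))
    return total / sum(weights)

# Toy batch: 60-year-olds are over-represented (hypothetical values).
ages = [60, 60, 60, 60, 30, 85]
preds = [58.0, 61.0, 63.0, 57.0, 45.0, 70.0]

weights = inverse_frequency_weights(ages)
print(weights)
print(weighted_mse(preds, ages, weights))
```

With this weighting every age value contributes the same total weight, so errors on the rare 30- and 85-year-old samples are not swamped by the common 60-year-old ones.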
Validation of AI-ECG age estimation
The PROPHECG-Age Single model was validated in two independent wearable single-lead ECG cohorts using distinct devices. Performance was assessed by comparing the AI-predicted ECG age with the participant’s chronological age.
Internal validation used the S-Patch registry; because the CycleGAN was trained on single-lead waveforms from the S-Patch device, validation within this registry was treated as internal. Of 1980 enrolled participants, 1502 satisfied the analytic inclusion criteria: age 20–90 years, continuous recording duration ≥48 h, presence of at least one valid sinus-rhythm epoch, and available clinical AF status. To calculate the AI-ECG age for each participant, the 48-h recording was segmented into non-overlapping 5-min windows. From each window, the highest-quality 10-s sinus-rhythm ECG segment was extracted (maximum 576 segments per participant). Segments failing predefined quality thresholds (SQI < 0.5) were discarded. The final AI-ECG age for a participant was computed as the arithmetic mean of the predicted ages from all valid remaining segments. Model accuracy was quantified using the mean absolute error (MAE) and Pearson correlation coefficient (r) relative to chronological age.
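The participant-level aggregation described above can be sketched as follows. The input layout (a list of `(start_time_s, sqi, predicted_age)` tuples for candidate 10-s sinus-rhythm segments) is a hypothetical convenience, and the signal-quality index and age predictions are assumed to be computed upstream.

```python
import math
from statistics import mean

WINDOW_S = 5 * 60   # non-overlapping 5-min windows (576 windows in 48 h)
SQI_MIN = 0.5       # segments with SQI below this are discarded

def participant_ecg_age(segments):
    """Mean predicted age over the best valid segment of each 5-min window."""
    best = {}
    for start_s, sqi, age in segments:
        window = int(start_s // WINDOW_S)
        if window not in best or sqi > best[window][0]:
            best[window] = (sqi, age)          # keep highest-quality segment
    kept = [age for sqi, age in best.values() if sqi >= SQI_MIN]
    return mean(kept) if kept else None

def mae(pred, true):
    """Mean absolute error between predicted and chronological age."""
    return mean(abs(p - t) for p, t in zip(pred, true))

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A participant whose windows all fail the quality threshold yields `None`, which corresponds to having no valid sinus-rhythm epoch.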
External validation utilized the Memo Patch registry to assess model generalizability across different devices. The Memo Patch registry enrolled 582 participants, comprising a 7–14 day monitoring group (n = 280) and a 24-hour group (n = 302). Of these, 529 participants met the analytic inclusion criteria (age 20–90 years, ≥ 1 valid sinus-rhythm epoch, and available AF status). All available ECG data were processed using the identical analytical pipeline applied in the internal validation: extraction of the highest-quality 10-second sinus-rhythm segment every five minutes, removal of low-quality segments, and calculation of the participant-level AI-ECG age as the mean of the valid predictions. Performance was likewise evaluated using MAE and Pearson’s r.
Performance contextualization, baseline comparison, and systematic bias
To contextualize single-lead performance, we performed ablation experiments using a dedicated “contextualization set” of 100,000 resting 12-lead ECGs randomly sampled from the Severance Hospital archive7. Only ECGs with normal sinus rhythm were included, comprising 10-s synchronous signals sampled at 500 Hz without waveform preprocessing. To avoid data leakage, a patient-wise split was used (81% training, 9% validation, 10% test). For these experiments, we utilized our previously validated state-of-the-art architecture, PROPHECG-Age7, which is a one-dimensional convolutional neural network (1D-CNN) with consecutive residual blocks. The models were trained using the Adam optimizer (batch size 64) for up to 100 epochs, with early stopping (patience = 30). All experiments were repeated three times with distinct random seeds (42, 2025, 2026) to assess stability.
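A patient-wise split assigns all ECGs from one patient to the same partition, which is what prevents leakage. The sketch below is one simple way to do this; it assumes the 81/9/10 fractions are applied at the patient level, and the identifiers and seed are illustrative.

```python
import random

def patient_wise_split(record_patient_ids, fractions=(0.81, 0.09, 0.10), seed=42):
    """Return a partition label per record so that no patient spans partitions."""
    patients = sorted(set(record_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)                      # reproducible patient order
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    label = {}
    for i, pid in enumerate(patients):
        if i < n_train:
            label[pid] = "train"
        elif i < n_train + n_val:
            label[pid] = "val"
        else:
            label[pid] = "test"
    return [label[pid] for pid in record_patient_ids]

# Toy example: 300 recordings from 100 hypothetical patients.
ids = [f"p{i % 100}" for i in range(300)]
parts = patient_wise_split(ids)
```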
Three specific ablation experiments were conducted to quantify performance degradation relative to this 12-lead benchmark. First, a lead-ablation experiment quantified spatial information loss by comparing models trained on 8-lead (I, II, V1–V6), 4-lead (I, II, V2, V5), 2-lead (II, V2), and 1-lead (II) sets. Second, a wearable-noise robustness experiment assessed 1-lead (Lead II, 200 Hz) performance under five conditions: clean (no noise), baseline drift (0.25 Hz sinusoidal, 5% amplitude), motion-like bursts (Hann-windowed low-frequency perturbations, 8–10% amplitude), EMG-like broadband noise (high-frequency, 1.5–2% amplitude), and a combination of all noise types. Third, a sampling-rate ablation experiment evaluated the impact of temporal resolution by comparing the 500 Hz standard against downsampled rates common in wearables (400, 300, and 200 Hz), with kernel sizes scaled proportionally to preserve the temporal receptive field (~34 ms).
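Two of these manipulations are easy to illustrate. The sketch below scales a convolution kernel to cover roughly 34 ms at a given sampling rate and adds a 0.25 Hz sinusoidal baseline drift. Rounding to the nearest odd kernel size and defining the 5% amplitude relative to the signal's peak-to-peak range are assumptions, since the text does not state either convention.

```python
import math

def kernel_size_for(fs_hz, window_ms=34.0):
    """Nearest odd kernel length spanning about window_ms at fs_hz."""
    n = fs_hz * window_ms / 1000.0
    return 2 * round((n - 1) / 2) + 1

def add_baseline_drift(signal, fs_hz, freq_hz=0.25, rel_amp=0.05):
    """Add a sinusoidal drift scaled to the peak-to-peak range (assumption)."""
    ptp = max(signal) - min(signal)
    return [x + rel_amp * ptp * math.sin(2 * math.pi * freq_hz * i / fs_hz)
            for i, x in enumerate(signal)]

kernels = {fs: kernel_size_for(fs) for fs in (500, 400, 300, 200)}
```

Under this rounding rule the kernel spans 17 samples at 500 Hz and 7 samples at 200 Hz, the latter matching the kernel size of the first convolution in PROPHECG-Age Single.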
To contextualize the deep learning (DL) model’s performance against a transparent baseline, we compared it with feature-based linear regression models using the independent S-Patch registry cohort (N = 1502). A feature pipeline involving band-pass filtering (0.5–40 Hz), R-peak detection, and morphology delineation was used to extract beat-level and heart rate variability (HRV) features. We defined three feature sets: Minimal-3 (QRS median, QTc Fridericia median, Heart Rate median), Core-6 (Minimal-3 plus PR median, SDNN, RMSSD), and Extended-12 (Core-6 plus QT median, QRS area, amplitudes, pNN50, and Sample Entropy). To ensure robust comparison, we performed 1000 resampled runs. In each run, a bootstrap sample was split into 80% training (for linear regression) and 20% testing. Performance was compared against the DL model (PROPHECG-Age Single) using paired differences on the same test sets.
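Several of the listed features have standard closed-form definitions, sketched below from RR intervals and QT durations in milliseconds. This is an illustration of the textbook formulas, not the study's feature pipeline; the use of the sample standard deviation for SDNN is an assumption.

```python
import math
from statistics import mean, median, stdev

def qtc_fridericia(qt_ms, rr_ms):
    """Fridericia correction: QT divided by the cube root of RR in seconds."""
    return qt_ms / (rr_ms / 1000.0) ** (1.0 / 3.0)

def heart_rate_median(rr_ms):
    """Median heart rate in beats per minute."""
    return median(60000.0 / rr for rr in rr_ms)

def sdnn(rr_ms):
    """Standard deviation of RR intervals (sample SD, an assumption)."""
    return stdev(rr_ms)

def rmssd(rr_ms):
    """Root mean square of successive RR differences."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(mean(d * d for d in diffs))

def pnn50(rr_ms):
    """Percentage of successive RR differences exceeding 50 ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
```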
Uncertainty in performance metrics was quantified using nonparametric bootstrapping with 1000 resamples to generate 95% confidence intervals (CIs). This procedure was applied to the S-Patch cohort overall and within prespecified strata (sex and 10-year age bins). Primary performance metrics included mean absolute error (MAE), Pearson’s correlation coefficient (r), and calibration parameters (slope and intercept). To mitigate distributional imbalance across age groups, we computed Macro-MAE (the arithmetic mean of MAE within 10-year age bins). We also evaluated extreme-age MAE (≤40 years and ≥75 years) to characterize error structure at the boundaries of the lifespan. Systematic bias was evaluated using Bland–Altman analysis (mean bias and 95% limits of agreement). Fairness was assessed by stratifying performance metrics by sex and age groups. All statistical comparisons were two-sided, and p-values for the baseline comparisons were derived from the distribution of paired differences across the 1000 resampled runs.
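The main metrics in this paragraph can be sketched in a few lines. The percentile bootstrap below is one common way to form a 95% CI; the exact percentile convention, the binning of ages by floor division, and the 1.96 multiplier for the limits of agreement are assumptions rather than details taken from the study code.

```python
import random
from statistics import mean, stdev

def mae(pred, true):
    return mean(abs(p - t) for p, t in zip(pred, true))

def macro_mae(pred, true, bin_width=10):
    """Arithmetic mean of the MAE within each chronological-age bin."""
    bins = {}
    for p, t in zip(pred, true):
        bins.setdefault(int(t // bin_width), []).append(abs(p - t))
    return mean(mean(errors) for errors in bins.values())

def bootstrap_ci(pred, true, metric, n_boot=1000, seed=0):
    """Percentile 95% CI from resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(pred)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([pred[i] for i in idx], [true[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]

def bland_altman(pred, true):
    """Mean bias and 95% limits of agreement of predicted minus true age."""
    diff = [p - t for p, t in zip(pred, true)]
    bias, sd = mean(diff), stdev(diff)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Macro-MAE differs from the ordinary MAE whenever age bins are unevenly populated: errors in a sparsely sampled bin carry the same weight as errors in a densely sampled one.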
Association of AI-ECG age gap with AF presence and burden
The clinical relevance of the AI-ECG age gap was investigated in the S-Patch (internal) and Memo Patch (external) cohorts. First, we assessed the associations between the AI-ECG age gap during sinus rhythm and AF presence. In the S-Patch cohort, 1,502 participants were evaluated based on device-detected rhythm and clinical records. We compared the mean AI-ECG age gap between participants with and without AF using Welch’s t-test. To quantify the independent predictive value, we fitted multivariable logistic regression models adjusted for sex and all components of the CHARGE-AF risk score: chronological age, height, weight, systolic and diastolic blood pressure, smoking status, antihypertensive treatment, diabetes mellitus, heart failure, and prior myocardial infarction21. An identical analytic pipeline was applied to the external Memo Patch cohort (n = 529), and cohort-specific odds ratios per 1-year increase in AI-ECG age gap were pooled via meta-analysis using fixed- or random-effects methods, depending on inter-study heterogeneity.
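The pooling step can be illustrated with an inverse-variance fixed-effect sketch on the log odds ratio scale. The study used the R meta package; the code below only shows the underlying arithmetic, and using Cochran's Q and I² as the heterogeneity summary is an assumption about how the fixed- versus random-effects choice might be informed.

```python
import math

def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover the log odds ratio and its standard error from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return math.log(odds_ratio), se

def pool_fixed_effect(log_ors, ses):
    """Inverse-variance pooled estimate with Cochran's Q and I-squared."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 and df > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se_pooled),
               math.exp(pooled + 1.96 * se_pooled)),
        "q": q,
        "i2": i2,
    }
```

When I² is large, a random-effects model adds a between-cohort variance term to each weight, which widens the pooled confidence interval.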
We further investigated whether the magnitude of the AI-ECG age gap correlated with AF burden among those in whom AF was detected. In the S-Patch cohort, 233 participants had at least one AF episode during the first 48 h, allowing us to define AF burden as the percentage of total recording time spent in AF (range 0–100%). Unadjusted relationships were first explored with Pearson’s correlation coefficient. To account for the bounded nature of the AF burden outcome, we then employed fractional logit regression (binomial family, logit link), with AF burden as the dependent variable and AI-ECG age gap as the independent variable, adjusting again for sex and all CHARGE-AF covariates. Model results were computed as average marginal effects, representing the change in percentage points of AF burden per 1-year increment in AI-ECG age gap. In the Memo Patch cohort, 24 participants with recorded AF episodes underwent the same analysis. Cohort-specific marginal effects were subsequently combined through meta-analysis.
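For a logit link, the marginal effect of a covariate on the expected outcome is the coefficient times p(1 − p), averaged over participants. The sketch below computes this average marginal effect from already-fitted coefficients; the coefficient values in the example are hypothetical, and model fitting itself is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def average_marginal_effect(beta_gap, intercept, other_betas, rows):
    """Average change in AF burden (percentage points) per 1-year age gap.

    rows: list of (age_gap, [covariate values]) per participant
    """
    effects = []
    for gap, covariates in rows:
        eta = intercept + beta_gap * gap
        eta += sum(b * x for b, x in zip(other_betas, covariates))
        p = sigmoid(eta)                       # predicted burden as a fraction
        effects.append(beta_gap * p * (1.0 - p))
    return 100.0 * sum(effects) / len(effects)

# Hypothetical coefficients: at a predicted burden of 50%, a log-odds slope
# of 0.04 per year corresponds to 1 percentage point per year.
ame = average_marginal_effect(0.04, 0.0, [], [(0.0, [])])
```

Because p(1 − p) peaks at 0.5, the same coefficient implies a smaller absolute change for participants whose predicted burden is near 0% or 100%.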
Temporal consistency analysis
To assess the temporal consistency of the AI-ECG age gap as a stable, personalized biomarker, we analyzed 214 non-AF participants from the Memo Patch registry who underwent continuous single-lead ECG monitoring for at least 7 days (typically 7–14 days). For each participant, the recording was partitioned into up to six consecutive, non-overlapping 48-hour epochs, depending on the available monitoring duration. For every epoch, we computed the mean AI-ECG age gap from all valid sinus-rhythm segments. Continuous reproducibility was quantified in two ways: first, by calculating Pearson’s correlation coefficients between the mean age gaps of adjacent epochs to gauge short-term stability; and second, by estimating a two-way mixed-effects intraclass correlation coefficient (ICC) for absolute agreement across all available epochs per participant (maximum of six) to capture overall repeatability. Additionally, to evaluate categorical reliability, we dichotomized the age gap at –7.5 years (the mean gap observed during the first epoch) and determined pairwise percentage agreement and Cohen’s κ for every possible epoch pair.
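Both reliability measures can be sketched directly from their definitions. The ICC below follows the single-measurement, absolute-agreement form, ICC(A,1), computed from two-way ANOVA mean squares; it assumes a complete participants × epochs matrix, whereas handling participants with fewer than six epochs is not shown. Treating a gap above the threshold as the positive class is also an assumption.

```python
def icc_a1(matrix):
    """ICC(A,1) from an n-participants x k-epochs matrix with no missing cells."""
    n, k = len(matrix), len(matrix[0])
    grand = sum(map(sum, matrix)) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ms_r = ss_rows / (n - 1)                               # between participants
    ms_c = ss_cols / (k - 1)                               # between epochs
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

def dichotomize(gaps, threshold=-7.5):
    """Label each epoch-level age gap as above (True) or not above the cut."""
    return [g > threshold for g in gaps]

def cohen_kappa(a, b):
    """Cohen's kappa for two binary label sequences of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)
```

Because ICC(A,1) measures absolute agreement, a constant shift between epochs lowers it even when the ranking of participants is perfectly preserved, which Pearson's correlation alone would not detect.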
Statistical analysis
All quantitative variables were summarized as mean ± standard deviation (SD) or median (interquartile range) according to distributional normality, and categorical variables as counts (percentages). Between-group comparisons of continuous outcomes employed Welch’s t-test, with multivariable linear or logistic regression used for adjusted analyses as appropriate. Categorical differences were tested by χ² or Fisher’s exact test. Within-subject reproducibility was quantified with ICC(A,1). Statistical analyses were conducted in Python (NumPy, pandas, SciPy) and R v4.4.3, with meta-analyses performed using the meta package v6.5-0. A two-sided p < 0.05 was deemed statistically significant.
Data availability
The anonymised ECG datasets used in this study are not publicly available due to patient privacy restrictions and institutional data protection policies. However, the data will be made available to qualified investigators for the purpose of replicating the analyses and findings upon reasonable request to the corresponding authors, subject to appropriate ethical approvals and institutional authorizations.
Code availability
The complete AI algorithm (PROPHECG-Age Single), including source code and pre-trained model weights, is openly available on GitHub (https://github.com/dr-you-group/PROPHECG-Age-Single) and archived on Zenodo with the identifier https://doi.org/10.5281/zenodo.18218561. Any additional custom scripts used for data preprocessing are available from the corresponding authors upon reasonable request.
References
Roberts, J. D. et al. Epigenetic age and the risk of incident atrial fibrillation. Circulation 144, 1899–1911 (2021).
Hamczyk, M. R., Nevado, R. M., Barettino, A., Fuster, V. & Andrés, V. Biological versus chronological aging: JACC focus seminar. J. Am. Coll. Cardiol. 75, 919–930 (2020).
Linz, D. et al. Atrial fibrillation: epidemiology, screening and digital health. Lancet Reg. Health Eur. 37, 100786 (2024).
Freedman, B. et al. World Heart Federation roadmap on atrial fibrillation—a 2020 update. Glob. Heart 16, 41 (2021).
Lima, E. M. et al. Deep neural network-estimated electrocardiographic age as a mortality predictor. Nat. Commun. 12, 5117 (2021).
Saleh, G. et al. Artificial intelligence electrocardiogram-derived heart age predicts long-term mortality after transcatheter aortic valve replacement. JACC Adv. 3, 101171 (2024).
Cho, S. et al. Artificial intelligence–derived electrocardiographic aging and risk of atrial fibrillation: a multi-national study. Eur. Heart J. 46, 839–852 (2025).
Park, H. et al. Artificial intelligence estimated electrocardiographic age as a recurrence predictor after atrial fibrillation catheter ablation. npj Digit. Med. 7, 234 (2024).
Attia, Z. I. et al. Age and sex estimation using artificial intelligence from standard 12-lead ECGs. Circ. Arrhythm. Electrophysiol. 12, e007284 (2019).
Mossavarali, S. et al. Determinants of artificial intelligence electrocardiogram-derived age and its association with cardiovascular events and mortality: a systematic review and meta-analysis. npj Digit. Med. 8, 1–13 (2025).
Gundlapalle, V. & Acharyya, A. Proc. IEEE 13th Latin America Symposium on Circuits and System (LASCAS) 01–04 (IEEE, 2022).
Seo, H.-C., Yoon, G.-W., Joo, S. & Nam, G.-B. Multiple electrocardiogram generator with single-lead electrocardiogram. Comput. Methods Programs Biomed. 221, 106858 (2022).
Obianom, E. N., Ng, G. A. & Li, X. Reconstruction of 12-lead ECG: a review of algorithms. Front. Physiol. 16, 1532284 (2025).
Presacan, O. et al. Evaluating the feasibility of 12-lead electrocardiogram reconstruction from limited leads using deep learning. Commun. Med. 5, 139 (2025).
Shin, S. J. et al. Style transfer strategy for developing a generalizable deep learning application in digital pathology. Comput. Methods Programs Biomed. 198, 105815 (2021).
Attia, Z. I. et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 394, 861–867 (2019).
Rivner, H., Mitrani, R. D. & Goldberger, J. J. Atrial myopathy underlying atrial fibrillation. Arrhythm. Electrophysiol. Rev. 9, 61 (2020).
Charitos, E. I., Pürerfellner, H., Glotzer, T. V. & Ziegler, P. D. Clinical classifications of atrial fibrillation poorly reflect its temporal persistence: insights from 1,195 patients continuously monitored with implantable devices. J. Am. Coll. Cardiol. 63, 2840–2848 (2014).
Schlesinger, D. E. et al. Artificial intelligence for hemodynamic monitoring with a wearable electrocardiogram monitor. Commun. Med. 5, 4 (2025).
Mohebbian, M. R. et al. Fetal ECG extraction from maternal ECG using attention-based CycleGAN. IEEE J. Biomed. Health Inform. 26, 515–526 (2021).
Alonso, A. et al. Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: the CHARGE-AF consortium. J. Am. Heart Assoc. 2, e000102 (2013).
Acknowledgements
This study was funded by the Ministry of Health & Welfare, Republic of Korea (RS-2022-KH125397, RS-2022-KH129902, RS-2023-00265440, RS-2024-00397290, HI22C0452), the Ministry of Science and ICT, Republic of Korea (RS-2025-24533659), Samjin Pharmaceutical, Yuhan Corporation, Wellysis, and HUINNO. We express our gratitude to Severance Hospital for providing invaluable ECG data that made this research possible. S.H.P. acknowledges support from the Yonsei University College of Medicine MSTP Scholarship. We also appreciate Daeun Joung for her support.
Author information
Contributions
S.H.P. led the study, taking primary responsibility for manuscript drafting (including tables and figures), AI model development, and data analysis. J.H.J. contributed to data analysis and provided research assistance. B.J. designed and supervised the cohort studies. S.C.Y. contributed to model development, designed the statistical analysis plan, and provided overall supervision. The study concept was developed jointly by S.C.Y., H.T.Y., and B.J., who also offered critical feedback throughout the project. J.K., D.L., D.K., and J.J. (Industry collaborators) secured, processed, and provided the single-lead wearable ECG data. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
S.H.P. declares no competing interests beyond the institutional funding reported above. J.H.J., D.L., and H.T.Y. declare no competing interests. J.K. is a shareholder of Wellysis Corp. and reports pending patent applications related to atrial fibrillation prediction using AI (United States Application No. 18/636,402, filed 15 April 2024; Republic of Korea Application No. 10-2023-0069397, filed 30 May 2023). D.K. and J.J. are shareholders of HUINNO Corp. S.C.Y. serves as the Chief Executive Officer of PHI Digital Healthcare, reports grants from Daiichi Sankyo, and is a coinventor of granted Korean Patents (DP-2023-1223, DP-2023-0920) and pending Patent Applications (DP-2024-0909, DP-2024-0908, DP-2022-1658, DP-2022-1478, DP-2022-1365, PATENT-2025-0039190, PATENT-2025-0039191, PATENT-2025-0039192, PATENT-2025-0039193, PATENT-2025-0039194), all unrelated to the present work. B.J. has served as a speaker for Bayer, BMS/Pfizer, Medtronic, and Daiichi-Sankyo, and received research funds from Samjin, Yuhan, Medtronic, Boston Scientific, and Abbott Korea.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Park, S.H., Jin, J.H., Kim, J. et al. Wearable device derived electrocardiographic age and its association with atrial fibrillation. npj Digit. Med. 9, 157 (2026). https://doi.org/10.1038/s41746-026-02344-8