Background & Summary

Atrial fibrillation (AF) is the most prevalent arrhythmia; it diminishes quality of life and increases the risk of severe complications such as heart failure and stroke1,2. Early and accurate diagnosis is crucial for two reasons: (1) maintaining sinus rhythm in patients with AF (so-called “rhythm control”) by either drugs or catheter ablation not only improves quality of life but also reverses structural abnormalities of the heart and extends lifespan, at least partly by reducing major cardiovascular events3,4,5,6,7,8,9; and (2) early treatment appears to enhance the treatment effect10,11,12,13. However, current diagnostic tools for AF are suboptimal, leading to a lack of awareness and underdiagnosis of the condition.

ECG is the standard diagnostic tool for heart disease and is required to confirm the diagnosis of AF. A challenge in diagnosing AF at an early stage is that the patient is in sinus rhythm most of the time. In this context, diagnostic sensitivity depends crucially on the duration of the ECG recording, and across modalities there is a trade-off between sensitivity and availability. The diagnostic sensitivity of the standard 12-lead ECG is markedly hampered by brief recording times (10 seconds to 3 min) and limited access (it is only available at a healthcare facility). The Holter ECG monitor, which is capable of continuous monitoring for ~24 hours, therefore plays a crucial role and is widely used as a major diagnostic tool1,14,15,16,17,18,19,20.

Prior works

A few AF ECG databases are publicly available (Table 1). The MIT-BIH Arrhythmia Database (MITDB) from the USA includes 48 two-channel ECG recordings of 30 minutes each, sampled at 360 Hz with 11-bit resolution over a 10 mV range21. The MIT-BIH Atrial Fibrillation Database (AFDB) includes 23 long-term two-channel ECG recordings with AF, mostly paroxysmal, each 10 hours in duration and sampled at 250 Hz with 12-bit resolution over a 10 mV range21,22. Recordings were manually annotated for AF, atrial flutter (AFL) and AV junctional rhythms. The Long Term AF Database (LTAFDB) contains 84 long-term two-channel ECG recordings from subjects with paroxysmal or sustained AF. These recordings are approximately 24 hours long and digitized at 128 Hz with 12-bit resolution over a 20 mV range23. Rhythm annotations were automatically generated and manually verified by an experienced team of ECG technicians.

Table 1 Comparison of SHDB-AF to previous works.

More recently, the IRIDIA-AF database24, comprising 167 paroxysmal AF tracings from 152 patients, each 19–95 hours long and sampled at 200 Hz, emerged as a new resource for machine learning (ML) and deep learning (DL) research. Icentia11k is a database from Canada in which 1–2-week-long continuous raw ECGs, some containing AF events, were recorded with the single-lead CardioSTAT patch device25. CPSC2021 includes a total of 47 patients with AF events of varied duration, recorded with a single-lead device26.

Research gaps, objectives, and summary

A major challenge in translational medical AI is generalizability. In DL, generalizability refers to the ability of a model to perform well on new, previously unseen data that vary in geography, ethnicity, acquisition device, age group and disease group. Databases covering such diversity are scarce.

In particular, more data from north-east Asia (e.g., Japan, Korea and China) are critically needed: the region is known for the rapid aging of its large population, and the number of patients with age-related disorders, including AF, is expected to increase markedly in the near future27.

In clinical practice, every AF ECG should be interpreted together with the surrounding clinical information to support better decision making. AF is understood not only as an ECG abnormality but also as a chronic, progressive disease that should be managed with a patient-centered approach, which includes (1) anticoagulation to prevent stroke in selected patients, (2) lifestyle modification, and (3) symptom relief by anti-arrhythmic drugs and ablation. However, none of the prior databases provides detailed clinical information, leaving a critical research gap.

In the present work, we developed a new 24-hour Holter AF database from Japan, an island nation in north-east Asia. In the majority of tracings, AF segments were manually annotated by certified cardiologists. Detailed clinical information is provided for each tracing, including age, body weight and height, clinical category (paroxysmal vs. persistent), time since first diagnosis of AF, and the presence or absence of anti-arrhythmic drugs, anticoagulants and ablation.

Methods

Ethics

The present work was approved by the institutional ethics committee at Saitama Medical University International Medical Center (IRB number 2023-145). Owing to the retrospective and descriptive nature of the study, the requirement for written informed consent was waived.

Patient characteristics

Table 2 presents patient characteristics from the SHDB-AF database. The database includes 128 patients, with a mean age of 68.0 years (SD: 11.3) and 36.7% female. Key metrics include a mean height of 1.6 m (SD: 0.1), a mean weight of 62.4 kg (SD: 13.9), and a mean BMI of 23.0 kg/m² (SD: 4.1). AF types are distributed as paroxysmal (62.5%), persistent (8.6%), and non-AF cases (25.8%). The median time between the first diagnosis of AF and the Holter recording was 9.0 months (IQR: 0.0–36.0; Fig. 1). A history of atrial flutter is noted in 13.3%, and 32.0% had undergone previous ablation. Antiarrhythmic drugs are used sparingly, with 72.7% of patients on none, whereas beta blockers and anticoagulants show diverse usage patterns. Comorbidities include hypertension (33.6%), stroke (14.8%), and vascular disease (11.7%).

Table 2 Patient characteristics of the SHDB-AF database.
Fig. 1 Time between first diagnosis and Holter. The bin “0” indicates that AF was diagnosed for the first time by the Holter ECG included in SHDB-AF.

The Supplemental Table provides detailed per-tracing data covering patient characteristics, medical history, and comorbidities. Key values include (1) basic demographics: age, sex, height, weight, and BMI; (2) medical details: AF type, comorbidities (e.g., congestive heart failure, hypertension, diabetes), and anticoagulant and anti-arrhythmic drug use; (3) diagnostic and treatment history: dates and indications for Holter monitoring, first diagnosis of AF/AFL, AF ablations, and echocardiographic findings (e.g., LAD, LVEF, valve assessments); and (4) procedural details: information on ablations, redo procedures, and permanent pacemaker status. Together, these data facilitate analysis of AF management and outcomes in relation to procedural timelines, clinical characteristics, and patient demographics.

AF burden

The AF burden (AFB) is defined as the percentage of time spent in AF (Eq. 1).

$$\mathrm{AFB}=100\times \frac{\text{Time spent in AF}}{\text{Total monitoring time}}\qquad (1)$$

Time spent in AF is the total duration a patient spends in atrial fibrillation during the monitoring period and Total monitoring time is the overall duration of the monitoring period. The result is multiplied by 100 to express the AF burden as a percentage, ranging from 0 to 100.
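As an illustration of Eq. 1, the following minimal sketch computes AFB from a list of AF episodes. The function name `af_burden` and the representation of episodes as (start, end) times in seconds are assumptions made for this example; they are not part of the database tooling.

```python
from typing import Iterable, Tuple


def af_burden(episodes: Iterable[Tuple[float, float]], total_seconds: float) -> float:
    """Return the AF burden (%) following Eq. 1.

    `episodes` holds hypothetical (start, end) times in seconds of
    non-overlapping AF episodes; `total_seconds` is the total monitoring time.
    """
    if total_seconds <= 0:
        raise ValueError("total monitoring time must be positive")
    # Time spent in AF is the summed duration of all episodes.
    time_in_af = sum(end - start for start, end in episodes)
    return 100.0 * time_in_af / total_seconds


# Example: two AF episodes (30 min and 90 min) in a 24-hour recording.
DAY_SECONDS = 24 * 3600
example_afb = af_burden([(3600, 5400), (40000, 45400)], DAY_SECONDS)
print(f"AFB = {example_afb:.2f}%")
```

A recording without any AF episode yields an AFB of 0, and a recording entirely in AF yields 100.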

The distribution of AFB across the entire database is shown in Fig. 2, with a mean ± SD of 19.5 ± 27.6%. Based on the expert annotations, AFB is computed as the summed duration of the annotated AF rhythm divided by the overall recording length. In the histogram, the first bin represents exactly 0%, while all subsequent bins have a width of 2%, covering 0–2% (excluding 0%), 2–4%, 4–6%, and so on. This binning separates the absence of AF from a minimal AF burden. Note that, for a 24-hour recording, an AFB of 1% corresponds to 14.4 min, a short-lasting but not necessarily negligible AF episode.
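The binning described above can be sketched as follows. The function name `afb_bin` is hypothetical, and the choice of right-closed intervals, i.e. (0, 2], (2, 4], and so on, is an assumption, because the text does not state which edge of each bin is inclusive.

```python
import math


def afb_bin(afb: float, width: float = 2.0) -> int:
    """Map an AFB value (%) to a histogram bin index.

    Bin 0 holds exactly 0%. Bin k (k >= 1) holds the interval
    (width * (k - 1), width * k]; the closed right edge is an assumption.
    """
    if not 0.0 <= afb <= 100.0:
        raise ValueError("AFB must lie between 0 and 100")
    if afb == 0.0:
        return 0
    return math.ceil(afb / width)


# An AFB of 1% in a 24-hour recording, expressed in minutes.
minutes_per_percent = 24 * 60 / 100

print(afb_bin(0.0), afb_bin(0.5), afb_bin(2.0), afb_bin(2.1), afb_bin(100.0))
print(minutes_per_percent)
```

Keeping 0% in its own bin means that a recording with a single brief episode is never counted together with recordings that contain no AF.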

Fig. 2 Histogram of the AFB (AF burden) for SHDB-AF.

Inclusion criteria

The most common indication for Holter monitoring was diagnostic workup for palpitations (53/128), followed by routine Holter monitoring following AF with and without ablation (44/128 and 8/128, respectively) and other indications (Fig. 3). Patients received no specific instructions other than to spend the 24 hours as naturally as possible and to note any symptomatic events.

Fig. 3 Holter indication.

Definition

In the present work, AF is categorized based on clinical diagnosis into paroxysmal, persistent, or permanent as described below28:

  • paroxysmal AF (PAF: episodes of arrhythmia that terminate spontaneously)

  • persistent AF (PeAF: episodes that continue for >7 days and are not self-terminating)

  • permanent AF (not included in the present study: ongoing long-term episodes)

Note that some patients with persistent AF presented with ongoing AF lasting >24 hours on the Holter recording and restored sinus rhythm later, whereas others had been treated by ablation, electrical cardioversion or antiarrhythmic drugs and showed either sinus rhythm with short-lasting AF or no AF at all on the day of the Holter recording. All such cases are categorized as persistent based on clinical history: given the chronic and progressive nature of AF, persistent AF is not reclassified as paroxysmal once diagnosed, even if treatment restores sinus rhythm.

Data preparation

To avoid the potential risk of a security breach, we generated two types of arbitrary IDs, i.e., “Study_ID” (a unique ID for each Holter tracing) and “Subject_ID” (a unique ID for each patient), both of which are unrelated to the original IDs in the clinical record. Subject_ID may be used to identify multiple Holter tracings recorded from the same patient on different occasions.

Holter recordings were acquired using a Fukuda Holter monitor and digitized at 125 Hz from the two available leads, CC5 and NASA (Fig. 4). The least significant bit corresponds to 9.76 μV. Hardware (analog) low-pass and high-pass filters were applied (LPF 40 Hz and HPF 0.51 Hz). The powerline frequency in the study area is 50 Hz.
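The acquisition parameters above translate into a few simple conversions, sketched below. The constant and function names are hypothetical, and the conversion assumes a zero baseline (ADC zero at 0 counts), which is not stated in the text.

```python
LSB_MICROVOLTS = 9.76   # one ADC count in microvolts (from the text)
FS_HZ = 125             # sampling frequency in Hz (from the text)


def counts_to_millivolts(counts):
    """Convert raw ADC counts to millivolts, assuming a zero baseline."""
    return [c * LSB_MICROVOLTS / 1000.0 for c in counts]


def sample_to_seconds(sample_index: int) -> float:
    """Convert a sample index to elapsed time in seconds."""
    return sample_index / FS_HZ


# Equivalent gain in ADC counts per millivolt.
gain_counts_per_mv = 1000.0 / LSB_MICROVOLTS

# Number of samples per lead in a full 24-hour recording.
samples_per_day = FS_HZ * 24 * 3600

# Nyquist frequency; the 40 Hz analog low-pass lies below it.
nyquist_hz = FS_HZ / 2

print(counts_to_millivolts([0, 100, -100]))
print(round(gain_counts_per_mv, 2), samples_per_day, nyquist_hz)
```

Because the 40 Hz low-pass cutoff is below the 50 Hz powerline frequency, powerline interference is already attenuated by the analog front end before digitization.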

Fig. 4 Electrode placement of Holter leads used in this research.

Overall, 147 Holter recordings were collected between November 2019 and January 2022 (Fig. 5). ECG tracings and the relevant clinical information (Supplemental Table) were gathered and reviewed for quality. Of 145 Holter tracings gathered, 128 were included in the present database, while 17 were excluded because of duplication, missing clinical information or low signal quality. As a result, 128 recordings belonging to 122 unique patients were selected. While most tracings were from unique patients, two Holter tracings from different occasions were collected from each of 5 patients (Study_ID #5 and #20 from Subject_ID 4899921, Study_ID #15 and #47 from Subject_ID 4339581, Study_ID #35 and #36 from Subject_ID 573723, Study_ID #52 and #128 from Subject_ID 5196228, Study_ID #66 and #118 from Subject_ID 5133906).
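Because some patients contribute two tracings, users who train models on the database may wish to split data by Subject_ID rather than by Study_ID, so that no patient appears in both training and test sets. The sketch below illustrates this using only the repeated pairs listed above; the function names and the list-of-pairs representation are assumptions made for this example.

```python
from collections import defaultdict

# (Study_ID, Subject_ID) pairs for the repeated patients listed in the text.
REPEATED_PAIRS = [
    (5, 4899921), (20, 4899921),
    (15, 4339581), (47, 4339581),
    (35, 573723), (36, 573723),
    (52, 5196228), (128, 5196228),
    (66, 5133906), (118, 5133906),
]


def group_by_subject(pairs):
    """Collect the Study_IDs that belong to each Subject_ID."""
    groups = defaultdict(list)
    for study_id, subject_id in pairs:
        groups[subject_id].append(study_id)
    return dict(groups)


def split_by_subject(pairs, test_subjects):
    """Assign tracings to train/test so that no subject is in both sets."""
    train, test = [], []
    for study_id, subject_id in pairs:
        (test if subject_id in test_subjects else train).append(study_id)
    return train, test


groups = group_by_subject(REPEATED_PAIRS)
train_ids, test_ids = split_by_subject(REPEATED_PAIRS, {4899921, 573723})
print(groups)
print(train_ids, test_ids)
```

In practice the same grouping would be applied to the full Study_ID/Subject_ID mapping of the database rather than to the repeated pairs alone.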

Fig. 5 Inclusion criteria.

Manual rhythm inspection and annotation

Among the 128 included tracings, 98 were manually annotated at the beat level by certified cardiologists using the PhysioZoo software29,30. Supraventricular arrhythmias were divided and annotated into five categories: (1) AF, (2) AFL, (3) atrial tachycardia (AT), (4) other supraventricular tachycardias, such as Wolff-Parkinson-White and intranodal tachycardias, and (5) other rhythms, such as normal sinus rhythm (NSR), which were left unlabeled. Because the distinction between AF and AFL is often difficult or even impossible, we used AF_l as a combined label for AF and AFL. The reading time by a clinical fellow was estimated to be 45 min on average per 24 h Holter recording.
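The merging of AF and AFL into the combined AF_l label can be sketched as a simple mapping over beat-level labels. The label strings used below ("AFIB", "AFL", "AT", "SVT" and the empty string for unlabeled beats) are hypothetical placeholders; the actual strings stored in the annotation files should be checked before use.

```python
# Hypothetical beat-level label strings; only the AF/AFL merge comes from the text.
MERGE_MAP = {"AFIB": "AF_l", "AFL": "AF_l"}


def merge_af_afl(labels):
    """Replace AF and AFL labels by the combined label AF_l."""
    return [MERGE_MAP.get(label, label) for label in labels]


def to_binary(labels):
    """Return 1 for beats labeled AF_l after merging, 0 otherwise."""
    return [1 if label == "AF_l" else 0 for label in merge_af_afl(labels)]


beat_labels = ["", "AFIB", "AFIB", "AFL", "AT", "SVT", ""]
print(merge_af_afl(beat_labels))
print(to_binary(beat_labels))
```

Beats in AT or other supraventricular tachycardias keep their own labels, so they can be handled separately from the combined AF_l class.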

Clinical information

Detailed clinical information regarding each Holter tracing (Supplemental Table) was retrieved from medical records and reviewed by certified cardiologists.

Data Records

SHDB-AF31 is deposited and available at PhysioNet32.

Database description

The data files are provided in the open WFDB standard format. The ECG waveforms are stored in .dat files. Each header file (.hea) is associated with a specific recording file and specifies attributes such as the associated .dat file name, number of leads, sampling frequency, and recording date and time. The .qrs files contain the R-peak annotations detected by the epltd algorithm33, which were used as input to a DL algorithm for AF detection described in the Performance section. An ECG example is given in Fig. 6.
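To illustrate what a header file carries, the sketch below parses the record line and signal lines of a WFDB header in plain Python. The header text is a made-up example (record name "001", storage format 16, gain 102.46 counts/mV), not a file from the database, and only the simplest form of the header syntax is handled. For real use, the `wfdb` Python package provides `rdrecord` and `rdann` for reading records and annotations.

```python
# Made-up header for illustration only; values are hypothetical.
EXAMPLE_HEADER = """\
001 2 125 10800000
001.dat 16 102.46/mV 16 0 0 0 0 CC5
001.dat 16 102.46/mV 16 0 0 0 0 NASA
"""


def parse_wfdb_header(text: str) -> dict:
    """Parse the record line and signal lines of a simple WFDB header."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    # Record line: name, number of signals, sampling frequency, sample count.
    name, n_sig, fs, n_samples = lines[0].split()[:4]
    signals = []
    for ln in lines[1:1 + int(n_sig)]:
        # Eight fixed fields, then a free-text description (the lead name).
        fields = ln.split(maxsplit=8)
        signals.append({
            "file": fields[0],
            "lead": fields[8] if len(fields) > 8 else "",
        })
    return {
        "record": name,
        "n_sig": int(n_sig),
        "fs": float(fs),
        "n_samples": int(n_samples),
        "signals": signals,
    }


header = parse_wfdb_header(EXAMPLE_HEADER)
duration_hours = header["n_samples"] / header["fs"] / 3600
print(header["fs"], [s["lead"] for s in header["signals"]], duration_hours)
```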

Fig. 6 Segment of an ECG example extracted as part of this research, shown using PhysioZoo29. AFIB: atrial fibrillation.

Column headings of clinical detail

The column headings of the supplemental table, which details the clinical information, are listed in Table 3.

Table 3 Description of column names of the supplemental table that contains the clinical details associated with each tracing.

Technical Validation

The quality assessment for the waveform data was done during the data selection process. As stated previously, the data was validated by a board-certified cardiologist.

The signal quality index (bSQI)34 was computed for non-overlapping windows of 60 beats. Reference beat annotations were detected using the epltd implementation of the Pan and Tompkins algorithm33 and compared against test peaks detected with xqrs32, with an agreement window of 50 ms. Windows with a bSQI lower than 0.8 were pre-defined as low quality. Recordings whose exclusion rate, i.e. the number of excluded windows over the total number of windows, exceeded 75% were considered corrupted by noise and were to be discarded. In the present database, the median (Q1–Q3) bSQI was 1.0 (1.0–1.0) for 60-beat windows and 0.999 (0.997–0.999) for entire recordings, so all recordings passed the pre-defined rejection criteria.
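The quality check above can be sketched as follows. This is a minimal illustration under stated assumptions: bSQI is computed here as the F1-style agreement between the two detectors (one common formulation; the exact formula used for the database is not given in the text), peak times are in seconds, and all function names are hypothetical.

```python
def match_beats(ref, test, tol):
    """Count one-to-one matches between two sorted peak lists within tol."""
    i = j = matched = 0
    while i < len(ref) and j < len(test):
        diff = test[j] - ref[i]
        if abs(diff) <= tol:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # test peak lies too early: skip it
        else:
            i += 1  # reference peak has no partner: skip it
    return matched


def bsqi(ref, test, tol=0.05):
    """F1-style agreement between reference and test peaks (assumed form)."""
    total = len(ref) + len(test)
    return 2.0 * match_beats(ref, test, tol) / total if total else 0.0


def window_bsqi(ref, test, beats=60, tol=0.05):
    """bSQI for non-overlapping windows of `beats` reference beats."""
    scores = []
    for k in range(0, len(ref) - beats + 1, beats):
        win = ref[k:k + beats]
        lo, hi = win[0] - tol, win[-1] + tol
        scores.append(bsqi(win, [t for t in test if lo <= t <= hi], tol))
    return scores


def is_corrupted(scores, sqi_threshold=0.8, max_rate=0.75):
    """Flag a recording whose share of low-quality windows exceeds max_rate."""
    if not scores:
        return False
    low = sum(1 for s in scores if s < sqi_threshold)
    return low / len(scores) > max_rate


# Synthetic peaks: 120 beats at 0.8 s spacing, test detector offset by 10 ms.
ref_peaks = [i * 0.8 for i in range(120)]
good_scores = window_bsqi(ref_peaks, [t + 0.01 for t in ref_peaks])
bad_scores = window_bsqi(ref_peaks, [t + 0.2 for t in ref_peaks])
print(good_scores, is_corrupted(good_scores))
print(bad_scores, is_corrupted(bad_scores))
```

With the 10 ms offset every beat is matched and both windows score 1.0, whereas the 200 ms offset exceeds the 50 ms agreement window and the recording is flagged.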

We validated SHDB-AF using two DL algorithms: ArNet235 and RawECGNet36. ArNet2 classifies 60-beat windows of RR intervals as AF or non-AF, using residual blocks for feature extraction and gated recurrent units (GRUs) for temporal learning. Trained on UVFDB (a separate database provided by the University of Virginia), it was tested on four databases from different regions (SHDB-AF and databases from China, Israel, and the USA) and on a combined set, achieving an F1 score of 0.92, which supports the validity of SHDB-AF. RawECGNet, trained and evaluated similarly, uses 30-second raw ECG windows and yielded an F1 score of 0.93. Full details are in Biton et al.35 and Ben-Moshe et al.36.
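For readers preparing inputs of the kind described above, the sketch below derives RR-interval windows from R-peak sample indices and computes an F1 score from binary window labels. It is not the ArNet2 or RawECGNet pipeline: the function names are hypothetical, a window is assumed to hold 60 consecutive RR intervals without overlap, and the native 125 Hz sampling rate is assumed (the published models may resample the signal).

```python
def rr_windows(peaks, fs=125, n=60):
    """Split RR intervals (s) into non-overlapping windows of n intervals."""
    rr = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return [rr[k:k + n] for k in range(0, len(rr) - n + 1, n)]


def f1_score(y_true, y_pred):
    """F1 score for binary labels (1 = AF, 0 = non-AF)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Synthetic R peaks exactly one second apart (122 peaks, 121 intervals).
peaks = list(range(0, 125 * 121 + 1, 125))
windows = rr_windows(peaks)

# Samples in a 30-second raw ECG window at the native sampling rate.
raw_window_samples = 30 * 125

print(len(windows), windows[0][:3], raw_window_samples)
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```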