Fig. 1: Study design and analytical workflow. | Nature Aging


From: Subtyping Alzheimer’s disease and Parkinson’s disease using longitudinal electronic health records


Longitudinal prediagnostic EHRs from the CPRD and UK Biobank were used to subtype AD and PD. Each patient’s time-stamped EHRs were tokenized by visit, age and calendar year to construct sequential inputs for a transformer model. Patient embeddings derived from the model were clustered using K-means to identify data-driven subtypes. The CPRD cohort was split into derivation (80%) and validation (20%) sets based on GP identifiers. UK Biobank served as an external validation dataset. Subsequent analyses compared clusters with respect to prognosis (5-year follow-up for mortality and hospitalization), comorbidities (prediagnostic EHRs and medication history), symptoms (pre- and post-diagnosis disease-related symptoms within ±5 years) and genetics (disease-related PRS and SNPs).
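The split by GP identifiers described in the caption can be illustrated with a minimal sketch. It assumes that "based on GP identifiers" means whole practices are assigned to one set, so that no practice contributes patients to both; the caption does not confirm this reading. The function `split_by_practice`, the identifier formats and the toy data are all hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def split_by_practice(patient_to_practice, derivation_fraction=0.8, seed=0):
    """Assign whole practices to a derivation or a validation set.

    patient_to_practice maps a patient ID to a GP practice ID
    (both are hypothetical identifiers used only for this sketch).
    """
    # Group patients under the practice they belong to.
    by_practice = defaultdict(list)
    for patient, practice in patient_to_practice.items():
        by_practice[practice].append(patient)

    # Shuffle practices reproducibly, then take the first 80% of them.
    practices = sorted(by_practice)
    random.Random(seed).shuffle(practices)
    n_derivation = round(derivation_fraction * len(practices))
    derivation_practices = set(practices[:n_derivation])

    # Every patient follows their practice into one of the two sets.
    derivation, validation = [], []
    for practice, patients in by_practice.items():
        target = derivation if practice in derivation_practices else validation
        target.extend(patients)
    return sorted(derivation), sorted(validation)


# Toy data: 50 patients spread evenly over 10 practices.
toy = {f"patient_{i}": f"gp_{i % 10}" for i in range(50)}
derivation, validation = split_by_practice(toy)
```

Note that this sketch takes 80% of practices rather than exactly 80% of patients; the two coincide here only because every toy practice has the same number of patients. scikit-learn's `GroupShuffleSplit` offers a comparable group-wise split for real data.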
