Fig. 2: Cluster characteristics and prognostic outcomes for AD and PD.
From: Subtyping Alzheimer’s disease and Parkinson’s disease using longitudinal electronic health records

a,b, Cluster labels and main characteristics for AD (a) and PD (b). c, AD population distribution in the CPRD validation dataset (total n = 22,664) and 5-year mortality and hospitalization rates for AD; mortality global log-rank P = 2.7 × 10⁻⁵⁰, hospitalization global log-rank P = 3.0 × 10⁻¹⁸. d, PD population distribution (total n = 8,946); mortality global log-rank P = 2.1 × 10⁻²⁴, hospitalization global log-rank P = 1.5 × 10⁻⁴. In c and d, solid lines represent the estimated survival or hospitalization rates, and shaded regions represent the 95% CIs. e, Mean MMSE scores across AD clusters over the 5 years after diagnosis. Data are shown as violin plots with a narrow box-and-whisker overlay indicating the median (center line), upper and lower quartiles (box limits) and whiskers extending 1.5× the IQR beyond the quartiles; individual points beyond the whiskers represent outliers. Statistical differences between clusters were assessed using two-sided Mann–Whitney U tests. Number of samples: cluster 1, 1,618; cluster 2, 1,435; cluster 3, 863; cluster 4, 545; cluster 5, 989. f, 10-year trend in MMSE scores; error bars represent the mean ± standard error of the mean (s.e.m.) at each time point. The dotted line marks AD diagnosis. Sample size per cluster (patients with more than one MMSE score in 10 years): cluster 1, 1,767; cluster 2, 1,623; cluster 3, 1,046; cluster 4, 768; cluster 5, 1,179.
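The summary statistics named in panels e and f (pairwise two-sided Mann–Whitney U tests between clusters, box-plot whiskers at 1.5× the IQR, and mean ± s.e.m.) can be illustrated with a minimal standard-library Python sketch. It is not the authors' analysis code. It assumes each cluster's MMSE scores are available as a plain list of numbers. The function names (`mann_whitney_u`, `tukey_fences`, `mean_sem`, `pairwise_mwu`) are hypothetical. The p-value uses the large-sample normal approximation with tie and continuity corrections. No multiple-comparison adjustment is applied, because the caption does not mention one. The toy scores in the demo are invented for illustration and are not study data.

```python
import math
from itertools import combinations
from statistics import NormalDist, mean, quantiles, stdev


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Ties get average ranks, the variance is tie-corrected and a 0.5
    continuity correction is applied. Returns (U for sample x, p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool both samples, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of t^3 - t over each group of t tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    sigma = math.sqrt(max(var, 0.0))
    if sigma == 0:
        return u_x, 1.0  # every value tied: no evidence of a difference
    z = max(abs(u_x - mu) - 0.5, 0.0) / sigma
    p = 2 * (1 - NormalDist().cdf(z))
    return u_x, min(p, 1.0)


def tukey_fences(values, k=1.5):
    """Quartiles, whisker limits k x IQR beyond the quartiles, and outliers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return q1, q3, lower, upper, outliers


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def pairwise_mwu(groups):
    """Two-sided Mann-Whitney U p-value for every pair of clusters."""
    return {
        (a, b): mann_whitney_u(groups[a], groups[b])[1]
        for a, b in combinations(sorted(groups), 2)
    }


if __name__ == "__main__":
    # Invented toy MMSE scores per cluster, for illustration only.
    toy = {1: [24, 26, 27, 25, 28], 2: [18, 20, 19, 22, 21], 3: [23, 25, 24, 30, 12]}
    for pair, p in pairwise_mwu(toy).items():
        print(pair, round(p, 4))
    print(tukey_fences(toy[3]))
    print(mean_sem(toy[1]))
```

Plotting libraries such as matplotlib usually draw each whisker at the most extreme data point that lies inside these limits, not at the limit itself. For real analyses, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` implements the same test and can compute exact p-values for small samples.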