Introduction

White matter hyperintensities (WMHs) are among the most common brain abnormalities observed on brain MRI and serve as key imaging biomarkers of brain health. They are typically linked to cerebral small vessel disease (SVD) due to arteriolosclerosis, lipohyalinosis, or other chronic vascular alterations, particularly in older individuals with or without vascular risk factors such as hypertension and diabetes1,2,3,4,5,6,7,8,9,10,11. WMH burden is a risk factor for stroke and poor functional outcomes following stroke12,13,14,15,16,17,18,19,20,21. However, WMH distribution varies in both location and size22,23, potentially reflecting its diverse underlying pathologies24,25,26,27,28,29. This complexity limits the utility of WMH as a cerebrovascular biomarker. Identifying distinct spatial subtypes of WMH may contribute to more tailored approaches in stroke prevention and treatment.

The most widely used spatial classification was first introduced by Fazekas et al.30. It divides WMH into periventricular white matter region (PVWM) and deep white matter region (DWM), grading their progression independently by using semi-quantitative visual rating scales. Recent approaches have employed more detailed spatial subdivisions based on functional lobes23,31,32 or arterial territories23,33, along with volumetric measurements. Yet these methods assess WMH progression using only cross-sectional approaches, without considering longitudinal changes34,35,36.

Recently, Subtype and Stage Inference (SuStaIn) modeling37, a machine learning-based probabilistic algorithm, has been proposed to identify distinct spatiotemporal trajectories of disease progression by applying longitudinal inference to large cross-sectional datasets. The algorithm assumes that disease progression follows distinct subtypes with subtype-specific stages and that each individual’s phenotype reflects a snapshot of progression at a particular stage for a given subtype. This method has successfully revealed distinct disease progression trajectories for cortical atrophy, tau pathology, brain lesion volume, and T1/T2 ratio, reflecting unique genotypes, demographics, cognitive profiles, and prognoses in various types of dementia37,38 and multiple sclerosis39. To date, this method has not been used to explore trajectories of WMH progression.

In this study, we employed SuStaIn modeling to identify distinct spatiotemporal trajectories of WMH progression in 9179 consecutive patients admitted with acute ischemic stroke to 11 academic stroke centers in Korea. We found that, indeed, subtyping WMH severity and location defined distinct spatiotemporal patterns of WMH progression, identified subtypes enriched for certain demographics and vascular risk factors, and proved useful for predicting distinct stroke outcomes.

Results

Three distinct spatiotemporal trajectories of WMH progression: fronto-parietal, radial, and temporo-occipital lesion growth

The study population flowchart is shown in Supplementary Fig. 1, and baseline characteristics of the study groups are summarized in Table 1 and Supplementary Table 1. We used SuStaIn modeling37 to identify distinct spatiotemporal patterns of WMH progression in 9179 (5395 males, 59%) consecutive patients with acute ischemic stroke (Korean MRI-based stroke database11,40,41,42). SuStaIn modeling simulates disease progression as a series of discrete, probabilistic events with a disease-related biomarker reaching predefined levels classified as abnormal37. First, we utilized fluid-attenuated inversion recovery (FLAIR) MRIs of low-risk controls (n = 13,811, UK Biobank) without vascular risk factors to establish baseline distributions of WMH volumes in 20 regions of interest (ROIs) across five functional brain lobes (Supplementary Fig. 2). Next, we assessed WMH severity in stroke patients and high-risk controls (n = 22,399, UK Biobank) by normalizing each group’s WMH volumes to the age-adjusted WMH volumes of low-risk controls. Severity can reach a threshold, predefined as a z-score of 1.5 or greater, in each of the 20 ROIs. Therefore, there were 20 total WMH progression stages spanning all 20 ROIs.
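The z-score event definition described above can be illustrated with a minimal Python sketch. It is not the SuStaIn algorithm itself, which infers subtypes and stages probabilistically, and it omits the age adjustment. The function names, ROI labels, toy volumes, and the plain control mean and standard deviation are illustrative assumptions, not the authors' pipeline.

```python
from statistics import mean, stdev

Z_THRESHOLD = 1.5  # abnormality threshold stated in the text


def roi_z_scores(subject_volumes, control_volumes_by_roi):
    """Z-score each ROI volume against the control distribution for that ROI.

    Assumption: a plain mean/SD reference. The study additionally adjusts
    the reference volumes for age, which this sketch leaves out.
    """
    z = {}
    for roi, volume in subject_volumes.items():
        controls = control_volumes_by_roi[roi]
        z[roi] = (volume - mean(controls)) / stdev(controls)
    return z


def abnormal_rois(z_scores, threshold=Z_THRESHOLD):
    """Return the ROIs whose z-score has reached the threshold."""
    return sorted(roi for roi, z in z_scores.items() if z >= threshold)


# Hypothetical toy volumes (mL) for three of the 20 ROIs.
controls = {
    "frontal_pv": [0.8, 1.0, 1.2, 1.0, 1.0],
    "parietal_pv": [0.4, 0.5, 0.6, 0.5, 0.5],
    "occipital_pv": [0.2, 0.3, 0.4, 0.3, 0.3],
}
subject = {"frontal_pv": 1.6, "parietal_pv": 0.55, "occipital_pv": 0.3}

z = roi_z_scores(subject, controls)
flagged = abnormal_rois(z)  # ROIs counted as having reached the event
```

In this toy case only the frontal ROI crosses the threshold. In the actual model, the order in which the 20 ROIs cross it is what defines a subtype-specific stage sequence.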

Table 1 Baseline characteristics in study groups

Based on a cross-validation (CV) process (Supplementary Fig. 3a), we found that a three-subtype model was optimal for addressing the spatial heterogeneity of WMH progression. Although models with more subtypes showed higher log likelihoods and lower CV-based information criteria (CVIC), indicating better model fits, these improvements began to plateau beyond three subtypes. Models with more than three subtypes also showed a substantial decrease in progression pattern similarity between CV folds within the same subtypes, signifying a decline in the robustness of detecting subtype-specific progression patterns. Moreover, pattern similarity between different subtypes within CV folds increased substantially, which signals poor spatiotemporal discrimination between the progression patterns of different subtypes.

The three-subtype model identified distinct subtypes of spatiotemporal WMH progression (Fig. 1a) across five different brain lobes (Supplementary Fig. 2). WMH growth was primarily from the frontal lobe to the parietal lobe in Subtype 1 (fronto-parietal, FP), radial across all lobes in Subtype 2 (radial), and from the temporal lobe to the occipital lobe in Subtype 3 (temporo-occipital, TO). In all three subtypes, WMH grew outward from the periventricular regions. These findings were corroborated by voxel-wise frequency mapping of WMH (Supplementary Fig. 4).

Fig. 1: Distinct spatiotemporal trajectories of white matter hyperintensity (WMH) progression in stroke patients.

a Spatiotemporal patterns of distinct WMH progression subtypes (fronto-parietal [FP]; radial; temporo-occipital [TO]). Bullseye presentations (F: frontal; B: basal ganglia; T: temporal; P: parietal; O: occipital) and median WMH severity maps are presented for each WMH subtype at WMH stages 2, 6, 10, and 14. The z-axis coordinates in the Montreal Neurological Institute space appear in the bottom right corner of the first three brain slices in the FP subtype. b Left, Sankey diagram illustrating patient distributions across the number of subtypes in Subtype and Stage Inference (SuStaIn) modeling. Main results in this study utilized the three-subtype model (*). Groups in models were assigned the same color as the subtype with the highest patient proportion based on the three-subtype model. Middle, stage distributions across subtypes. Patients at stage 0 were not included in any other subtypes (FP, radial, and TO subtype). Right, scatter plot showing maximum likelihood (ML) subtype probabilities (prob.) of individuals, in a 2D projection on a triangular plane. c Left, distributions of follow-up (FU) years across subtypes. Middle, longitudinal subtype stability rates (number of patients) by subtypes at admission and FU. Right, simple linear regression graph of observed vs. predicted stage changes in stroke patients after FU. Two-sided P and R-squared values for the correlation are provided (P = 2.94 × 10−22). Scatter dot sizes represent number of patients (n = 1, 2 ≤ n ≤ 5, 6 ≤ n ≤ 10, and 11 ≤ n ≤ 18). These findings suggest baseline assignment to a subtype is mostly stable, with relatively few patients crossing out of their initially assigned group. Source data are provided as a Source Data file.

Of all patients (n = 9179), 443 (5%) were assigned as stage 0 and were not classified to any subtype (FP, radial, and TO subtype). Among the remaining 8736 patients, 1870 (21%), 4014 (46%), and 2852 (33%) were classified as FP, radial, and TO subtypes, respectively. In addition to these three-subtype model data, the Sankey diagram (Fig. 1b, left) shows how the proportional distribution of each subtype changes as the number of subtypes expands, ranging from 1 to 5. The selected three-subtype model effectively captured the full progression of WMH, as patients were distributed across all 20 stages within each subtype (Fig. 1b, middle). Of note, patient distribution across WMH stages differed among WMH subtypes. Thus, WMH stage was included as a covariate in subsequent analyses to compare demographics, risk factors, stroke etiology, stroke severity, and post-stroke outcomes across WMH subtypes. As shown in the probability of maximum likelihood (ML) subtype plot (Fig. 1b, right; a 2D projection on a triangular plane) presenting each patient’s probability of being classified as the most likely subtype, approximately 99% (n = 8673) of all patients showed good fits to the model (ML subtype probability ≥ 0.5, outside of the gray-shaded triangle).
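The maximum likelihood assignment and the 0.5 good-fit criterion described above can be sketched in a few lines of Python. The function name and the two example probability vectors are hypothetical. Only the subtype counts are taken from the text, and they are used here to re-derive the reported percentages.

```python
def ml_subtype(probabilities, min_prob=0.5):
    """Return (most likely subtype, its probability, good-fit flag).

    `probabilities` maps subtype name -> probability; a fit is called
    good when the winning probability is at least `min_prob`.
    """
    subtype = max(probabilities, key=probabilities.get)
    prob = probabilities[subtype]
    return subtype, prob, prob >= min_prob


# Hypothetical subtype probabilities for two patients.
clear = ml_subtype({"FP": 0.72, "radial": 0.20, "TO": 0.08})
ambiguous = ml_subtype({"FP": 0.40, "radial": 0.35, "TO": 0.25})

# Counts reported in the text for the 8736 patients beyond stage 0.
counts = {"FP": 1870, "radial": 4014, "TO": 2852}
total = sum(counts.values())
shares = {name: round(100 * n / total) for name, n in counts.items()}
```

The second patient would fall inside the gray-shaded triangle of the ML subtype plot, because no single subtype reaches a probability of 0.5.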

We also examined whether an assigned subtype remained consistent as WMH stage advanced over time; this subtype stability served to validate the longitudinal robustness of classifying distinct WMH growth trajectories in individual patients. To evaluate this, we used a longitudinal dataset of 264 patients’ FLAIR MRI at admission and follow-up (FU) (median [interquartile range; IQR] years: 4.5 [2.8‒7.3]; Fig. 1c, left). Five patients (2%) were assigned to stage 0 at admission and were excluded from the analysis. In the majority (>77%) of the 259 included patients, subtypes at admission did not change at FU (Fig. 1c, middle), indicating that subtype categorization is consistent over time. This subtype stability was not related to FU period duration (Supplementary Table 2; T [251] = −0.19, P = 0.847, odds ratio [OR] and 95% confidence interval [CI] = 0.99 [0.89–1.11]). Notably, patients with diabetes mellitus (T [251] = −2.58, P = 0.010, OR [95% CI] = 0.43 [0.22–0.82]) or those who underwent revascularization therapy (T [251] = −2.21, P = 0.028, OR [95% CI] = 0.27 [0.08–0.87]) had lower subtype stability at FU than patients without diabetes mellitus or revascularization therapy. Next, we predicted stage progression from admission to FU by employing a mixed-effects regression model that incorporated demographic and vascular risk factors, FU years, and WMH subtype and stage at admission (Fig. 1c, right). In leave-one-out CV, the predicted stage progressions correlated moderately with the observed stage progressions (F [1, 257] = 113.95, P = 2.94 × 10−22, R2 [95% CI] = 0.31 [0.21–0.40]).

Distinct demographic and risk factor profiles in different WMH subtypes

Table 2 shows baseline characteristics of patients with different WMH progression subtypes. Figure 2 presents raw means (scatter dots) and estimated marginal means (EMM; a curved line with shaded 95% CI, obtained from mixed-effects regression models after adjusting for age and sex) along the stage in each subtype. Detailed statistical information is presented in Supplementary Table 3. If the P for the subtype-stage interaction was less than 0.1, we conducted subgroup analyses by dividing patients into two groups, namely those with stage ≤ 9 (median) and those with stage > 9, to assess how subtype differences varied across stages.

Fig. 2: Distinct demographics, risk factor, and etiology profiles of stroke patients with different white matter hyperintensity (WMH) subtypes.

Graphs of estimated marginal means (EMMs; curved line with shaded 95% confidence interval) for demographics, vascular risk factors, and etiologies across WMH stages within each WMH subtype (fronto-parietal [FP]; radial; temporo-occipital [TO]) in stroke patients. Scatter plots of raw means were superimposed. The graph explicitly presents significant inter-subtype differences across all stages. For factors showing significant subtype–stage interactions, we further stratified stages into early and late periods based on median stage (9 for stroke patients) in order to derive inter-subtype differences across these periods. a Age, hypertension (HTN), diabetes mellitus (DM), hyperlipidemia (HL), and current smoking. b Atrial fibrillation (AF) and coronary artery disease (CAD). c Presence of silent brain infarcts (SBIs), cerebral microbleeds (CMBs), and WMH volume. d Stroke subtypes: large artery atherosclerosis (LAA), small vessel stroke (SVS), and cardioembolism (CE). e Acute infarct volume and admission National Institute of Health Stroke Scale (NIHSS) score. Source data are provided as a Source Data file.

Table 2 Demographics, vascular risk factors, and stroke etiologies by subtype and stage of white matter hyperintensity progression in stroke patients

Mean age showed a significant subtype–stage interaction (Table 2; F [2, 8729] = 97.67, P = 1.13 × 10−42), although age increased as WMH stage advanced in all subtypes (Fig. 2a and Supplementary Table 3). At earlier stages (≤9), the mean age was higher in FP subtype, a pattern that indicated relatively delayed WMH onset in this subtype compared to radial and TO subtype (Supplementary Table 4). At later stages (>9), mean ages were similar between FP and radial subtype but still lower in TO subtype (Supplementary Table 5). As shown in Table 2, the proportion of men was lower in FP (46.7%) and radial (54.7%) than in TO subtype (72.3%; T [8731] = −14.52, P = 3.20 × 10−47, OR [95% CI] = 0.39 [0.34–0.44] and T [8731] = −14.33, P = 4.69 × 10−46, OR [95% CI] = 0.44 [0.39–0.49], respectively).

After adjusting for age and sex, hypertension was most frequently observed in FP subtype (at earlier stages), while atrial fibrillation (AF) was most frequently observed in TO subtype (Fig. 2a, b). Hypertension showed a trend towards rising prevalence in all subtypes, particularly at earlier stages (≤9), and subsequently reached a plateau without significant inter-subtype differences in frequency (Supplementary Tables 3, 4 and 5). AF exhibited a decreasing trend across all stages in all subtypes. These findings suggest that patients with AF are less likely to remain stroke-free until they develop severe WMH, compared to those without AF. Our findings were similar for coronary artery disease. There was no significant inter-subtype difference in the prevalence of diabetes mellitus, hyperlipidemia, or current smoking. Of note, only diabetes mellitus showed an increasing trend with advancing stages in all subtypes. Unadjusted data of demographics and risk factor profiles are shown in Supplementary Fig. 5a, b and Supplementary Table 6.

Chronic and acute stroke-related lesions have distinct etiological features in different WMH subtypes

Silent brain infarcts and cerebral microbleeds, key indicators of SVD43, appeared most frequently in FP subtype and least frequently in TO subtype (Table 2 and Fig. 2c). In all subtypes, the proportion of patients with chronic lesions (silent brain infarcts and cerebral microbleeds) increased with advancing stages. Detailed statistical information is presented in Supplementary Table 7. Interestingly, total WMH volumes differed significantly across subtypes and increased exponentially with advancing stage (Table 2 and Fig. 2c). Overall WMH volume was highest in FP subtype, followed by radial subtype, and lowest in TO subtype. Moreover, a significant subtype-stage interaction was observed: the rate of the volume increase with stage was slowest in FP, intermediate in radial, and fastest in TO (Supplementary Table 7).

In line with the association between FP subtype and SVD markers, acute infarction in FP subtype was more likely to be due to small vessel stroke (SVS) (Fig. 2d) than that in TO subtype. Reflecting the aforementioned higher prevalence of AF in TO subtype, cardioembolic (CE) stroke occurred more frequently in TO subtype across all stages than in other subtypes. Similar to FP subtype, radial subtype showed a higher proportion of SVS and a lower proportion of CE than TO subtype. The proportion of SVS increased with progressing WMH stages after adjusting for age and sex. This finding is in line with the rising hypertension prevalence with advancing WMH stages noted above; after age, hypertension is the most important risk factor for SVD44 and WMH11,45,46. As with AF, the proportion of CE strokes decreased with progressing WMH stages. Large artery atherosclerotic strokes did not show notable trends with advancing WMH stages. As expected, acute infarct volumes were relatively high in TO subtype (Fig. 2e). Further, acute infarct volumes diminished as WMH stages advanced in all WMH subtypes with or without adjusting for age and sex (Supplementary Fig. 5e and Supplementary Table 6). As WMH stages advanced, the etiologic proportion shifted, with SVS increasing and CE declining, consistent with the decreasing trend observed in AF (Fig. 2b, d). Given that SVS typically produces smaller lesions than those caused by CE, this shift likely contributed to the reduction in infarct volume at higher WMH stages. Admission National Institute of Health Stroke Scale (NIHSS) scores appeared to corroborate the infarct volume data when adjusted for age and sex (Fig. 2e); without such adjustment, however, NIHSS scores showed an increasing trend as WMH stages advanced (Supplementary Fig. 5e and Supplementary Table 6).

Different WMH subtypes have distinct stroke outcomes

We compared 3-week (in-hospital), 3-month, and 1-year outcomes in the three different WMH progression subtypes after adjusting for age, sex, and revascularization therapy (Table 3 and Fig. 3a–c). Early neurological deterioration within 3 weeks and its most common cause, stroke progression, occurred more frequently with advancing WMH stages in all subtypes, without significant inter-subtype differences (Fig. 3a). Symptomatic hemorrhagic transformation appeared more often in TO subtype than in other subtypes. Prior anticoagulant use was slightly more frequent in TO subtype (2.8%) compared to FP (2.2%) and radial (2.5%) subtypes, but the differences were not statistically significant (logistic mixed-effects model; F [2, 8732] = 0.97, P = 0.378). Thus, the higher rate of symptomatic hemorrhagic transformation in TO subtype is unlikely to be attributable to differences in anticoagulant use across subtypes. Intriguingly, the incidence of symptomatic hemorrhagic transformation tended to decrease with WMH progression. Stroke recurrence did not change as WMH stages advanced, with no significant differences observed within or between subtypes. At 3 months, the proportion of patients with unfavorable functional outcome (modified Rankin Scale [mRS] score >3) was larger in TO subtype than in other subtypes (Fig. 3b). The proportion with unfavorable functional outcome was higher with advancing WMH stages in all subtypes. The 1-year recurrence of ischemic stroke was notably higher in FP subtype than in other WMH subtypes, regardless of their stages (Fig. 3c and Table 3). In contrast, hemorrhagic stroke recurrence, which was less frequent than ischemic stroke recurrence, increased with WMH progression in all subtypes, showing no significant inter-subtype difference. All-cause death and nonvascular death within 1 year rose with advancing WMH stages, but these increases did not differ among WMH subtypes. Vascular death was not related to either WMH subtypes or stages (Table 3).

Fig. 3: Post-stroke outcomes by subtype and stage of white matter hyperintensity (WMH) progression in stroke patients.

Graphs of estimated marginal means (EMMs; curved line with shaded 95% confidence interval) for post-stroke outcomes by WMH stage in each WMH subtype (fronto-parietal [FP]; radial; temporo-occipital [TO]). Scatter plots of raw means were superimposed. a Early (<3 weeks) neurological deterioration and its causes (stroke recurrence, stroke progression, and symptomatic hemorrhagic transformation [HT]). b Unfavorable functional outcome at 3 months (modified Rankin Scale score >3). c Stroke recurrence (ischemic and hemorrhagic) and mortality (all-cause and nonvascular death) within 1 year. Cox proportional hazards regression models were used; hazard ratio (HR) and two-sided P for stage (hemorrhagic stroke recurrence: 3.74 × 10−5; all-cause death: 1.52 × 10−4; nonvascular death: 6.24 × 10−6) are explicitly presented. * indicates significant (P < 0.05) between-group differences (FP vs radial, Z = 2.49, P = 0.013; FP vs TO, Z = 2.52, P = 0.012). Note that ischemic stroke recurrence and mortalities for radial (black) and TO subtype (blue) are very similar, with closely overlapping curves. Source data are provided as a Source Data file.

Table 3 Mixed-effects and Cox proportional hazards regression models for post-stroke outcomes according to white matter hyperintensity subtype and stage in stroke patients

To assess whether WMH subtypes and stages provide prognostic information beyond demographics, vascular risk factors, and initial stroke outcomes, we conducted additional analyses adjusting for a comprehensive set of clinical variables, including hypertension, diabetes mellitus, hyperlipidemia, current smoking, AF, coronary artery disease, stroke subtype, admission NIHSS score, and acute infarct volume as well as age, sex, and revascularization therapy (Supplementary Table 8). Compared to the original main analyses, which adjusted only for age, sex, and revascularization therapy to minimize potential overfitting, the results remained largely consistent: the associations between WMH subtype and post-stroke outcomes, and those between stage and post-stroke outcomes remained significant (3 of 10 and 7 of 10 outcomes, respectively, in the main analyses [Table 3]). The only exception was the association between WMH subtype and 3-month unfavorable functional outcome, where the previously observed higher risk in TO subtype was no longer statistically significant. Instead, this outcome was better explained by clinical variables, including stroke subtype (SVS; T [7815] = −3.49, OR [95% CI] = 0.62 [0.47–0.81], P = 4.89 × 10−4), NIHSS score (T [7815] = 26.24, OR = 1.23 [1.21–1.25], P = 1.51 × 10−145), and infarct volume (T [7815] = 9.23, OR = 1.28 [1.22–1.35], P = 3.47 × 10−20). These results indicate that the worse 3-month outcome in the TO subtype was attributable to a lower frequency of SVS and greater baseline stroke severity, rather than an independent effect of the WMH subtype. Notably, unlike the main analysis, the fully adjusted model identified WMH subtype as a predictor of 1-year all-cause and nonvascular mortality, independent of vascular risk factors and other clinical variables, with the radial subtype showing a significantly higher risk.

Alignment and extension of the WMH subtyping–staging model relative to the Fazekas scale in stroke risk and outcomes

We assessed how Fazekas scale scores (PVWMH 0–3, DWMH 0–3, and total 0–6) relate to demographics, vascular risk factors, stroke etiology and severity, and post-stroke outcomes. PVWMH and total Fazekas scale scores were associated with age, sex, hypertension, diabetes mellitus, AF, and the presence of silent brain infarcts and cerebral microbleeds (Supplementary Tables 9 and 10), patterns that are consistent with connections observed between these vascular risk factors and WMH stages in our model (Supplementary Table 3). DWMH score was associated with age, hypertension, and the presence of silent brain infarcts and cerebral microbleeds. Current smoking and hyperlipidemia were not significantly associated with either Fazekas scale scores or WMH stage in our model. Similar to WMH stage in our model, PVWMH and total Fazekas scale scores were associated with SVS and CE, although DWMH score was associated only with SVS (Supplementary Table 10). Acute infarct volume and admission NIHSS score were associated with PVWMH and total Fazekas scale scores but not with DWMH score. Total Fazekas scale score was associated with early neurological deterioration and its causes, including stroke progression and symptomatic hemorrhagic transformation, but not stroke recurrence within 3 weeks (Supplementary Table 11). However, PVWMH and DWMH scale scores were not significantly associated with early neurological deterioration or its causes, except that the PVWMH scale score was negatively associated with symptomatic hemorrhagic transformation. Unlike our WMH subtypes or stages, total and regional Fazekas scales did not significantly predict unfavorable functional outcome at 3 months. Total Fazekas scale predicted hemorrhagic stroke recurrence (but not ischemic stroke recurrence) and all-cause and nonvascular mortality (but not vascular mortality) within 1 year, whereas regional Fazekas scales predicted none of these outcomes. These 1-year outcome-related findings were, as described previously, consistent with results for our WMH stages. For additional context on how the WMH subtyping–staging model extends beyond the Fazekas scale in capturing stroke risk and outcomes, please refer to the Supplementary Information.

Similar spatiotemporal patterns of WMH progression but distinct WMH stage distributions in high-risk controls and stroke patients

Using the three-subtype model we developed by employing data from our stroke patients (vs. data from the low-risk controls in the UK Biobank), we assigned the high-risk controls (without a history of neurological diseases but with vascular risk factors, n = 22,399 in the UK Biobank) to their most likely WMH subtype and stage (Fig. 4a). The spatiotemporal patterns of WMH progression, paralleling those observed in stroke patients (Fig. 1a), demonstrated fronto-parietal (FP subtype), radial, and temporo-occipital (TO subtype) WMH progression. After excluding ~56% of the high-risk controls classified as stage 0, the remaining 9967 were categorized as FP (n = 4558, 46%), radial (n = 4155, 42%), and TO subtype (n = 1254, 13%) (Fig. 4b, left). In all subtypes, high-risk controls were predominantly concentrated in the early stages (median [IQR] = 3 [1‒6]), and their frequencies decreased approximately exponentially with advancing stage (Fig. 4b, middle). The stroke population-based three-subtype model effectively classified 86% (n = 8600) of high-risk controls (ML subtype probability ≥0.5, outside of the gray-shaded triangle; Fig. 4b, right). As noted previously, the value was 99% for the stroke population itself.

Fig. 4: Similar spatiotemporal trajectories of white matter hyperintensity (WMH) progression in high-risk controls (vs. stroke patients).

a Spatiotemporal patterns of distinct WMH progression subtypes (fronto-parietal [FP]; radial; temporo-occipital [TO]) in high-risk controls. Bullseye presentations (F frontal, B basal ganglia, T temporal, P parietal, O occipital) and median WMH severity maps are presented for each WMH subtype at WMH stages 2, 6, 10, and 14. The z-axis coordinates in the Montreal Neurological Institute space are displayed in the bottom right corner of the first three brain slices in the FP subtype. b Left, Sankey diagram illustrating distributions of high-risk controls across the number of subtypes in Subtype and Stage Inference (SuStaIn) modeling. Main results in this study utilized the three-subtype model (*). Groups in models were assigned the same color as the subtype with the highest proportion of patients based on the three-subtype model derived from stroke patients. Middle, distributions of stages across the subtypes. Individuals at stage 0 were not included in any other subtypes (FP, radial, and TO subtype). Right, scatterplot showing maximum likelihood (ML) subtype probabilities (prob.) of individuals, in a 2D projection on a triangular plane. c Trajectory deviations in spatiotemporal WMH progression between stroke patients and high-risk controls across stages at each subtype, showing the median values (dots) and their 95th percentiles (shaded area) for Euclidean distances from the median of the stroke group in the regions-of-interest [ROI]-based 20-dimensional space. Within-population 95th percentiles for the stroke group are also displayed. Gray bars at the top of the graph represent, among high-risk controls, the proportion of outliers, defined as those with trajectory deviations exceeding the 95th percentile observed in stroke patients for the corresponding subtype and stage. d Graphs of estimated marginal means (EMMs; curved line with a shaded 95% confidence interval) for demographics and vascular risk factors across stages within each subtype in high-risk controls: age, hypertension (HTN), diabetes mellitus (DM), hyperlipidemia (HL), and current smoking. Scatter plots of raw means were superimposed. The graph explicitly presents significant inter-subtype differences across all stages. For factors showing significant subtype–stage interactions, we further stratified the stages into early and late periods based on the median stage (3 for high-risk controls), thus defining inter-subtype differences across these periods. See Supplementary Fig. 6 for atrial fibrillation and coronary artery disease. Source data are provided as a Source Data file.

In addition, spatiotemporal patterns of WMH progression in high-risk controls appeared comparable to those observed in stroke patients, as indicated by quantifying spatial and severity-related differences in WMH between the two respective maps at each stage within each subtype to estimate the degree of alignment in their progression trajectories (Fig. 4c). To estimate the trajectory deviation between the stroke group and the high-risk control group in WMH spatiotemporal progression, we calculated the Euclidean distances between the ROI-based 20-dimensional coordinates of individual high-risk controls and the corresponding median value of the stroke group for each stage in each subtype, deriving a total of 60 (20 stages × 3 subtypes) median values and their 95th percentiles. We also calculated within-population median values and 95th percentiles for the stroke patients. As shown in Fig. 4c, the trajectories reflecting the spatiotemporal progression of WMH closely overlapped between the stroke and high-risk control populations.
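The trajectory-deviation calculation described above, for a single subtype and stage, might look like the following Python sketch. The component-wise median used as the stroke reference point, the linear-interpolation percentile, the function names, and the two-dimensional toy profiles (the study uses 20 ROI dimensions) are all assumptions made for illustration.

```python
import math
from statistics import median


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def median_profile(profiles):
    """Component-wise median across ROI severity profiles (an assumption)."""
    return [median(column) for column in zip(*profiles)]


def trajectory_deviation(control_profiles, stroke_profiles):
    """Distances of controls from the stroke median for one subtype and stage."""
    reference = median_profile(stroke_profiles)
    control_d = [math.dist(p, reference) for p in control_profiles]
    stroke_d = [math.dist(p, reference) for p in stroke_profiles]
    cutoff = percentile(stroke_d, 95)  # within-population 95th percentile
    return {
        "median_distance": median(control_d),
        "p95_distance": percentile(control_d, 95),
        "stroke_p95": cutoff,
        # Share of controls lying beyond the stroke group's 95th percentile.
        "outlier_fraction": sum(d > cutoff for d in control_d) / len(control_d),
    }


# Toy 2-D profiles standing in for the 20 ROI z-scores.
stroke = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
controls = [[1.0, 1.0], [4.0, 5.0]]
result = trajectory_deviation(controls, stroke)
```

Repeating this for each of the 20 stages in each of the three subtypes would yield the 60 median values and 95th percentiles mentioned in the text.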

The high-risk control group and the stroke group exhibited similarities in demographic and risk factor profiles across WMH subtypes, including older age at earlier stages (<3) and higher proportions of females and hypertension (Fig. 4d and Supplementary Table 12). We observed no significant inter-subtype differences in the prevalence of diabetes mellitus, hyperlipidemia, or current smoking. Age and the prevalence of hypertension and diabetes mellitus increased with advancing stages, consistent with patterns observed in the stroke group. We also found some dissimilarities between the groups. In the high-risk control group, the proportions of AF and coronary artery disease did not differ significantly across subtypes (Supplementary Fig. 6 and Supplementary Table 12 vs. Fig. 2b and Table 2). Also, the proportions of hyperlipidemia and current smoking significantly changed with advancing stages (Fig. 4d and Supplementary Table 12 vs. Fig. 2a and Table 2).

Although the WMH progression trajectories were concordant, WMH stage distributions clearly diverged between high-risk controls and stroke patients (Fig. 5a). As mentioned above, the stage distribution of high-risk controls predominantly clustered in the early stages (Fig. 4b, middle), but that of stroke patients was markedly different (Fig. 1b, middle): FP subtype presented consistently high frequencies up to stage 9, followed by a linear reduction; radial subtype showed increasing frequencies up to stage 9, followed by a plateau and then an initially gradual but later sharp decrease; and TO subtype displayed relatively low frequencies up to stage 14, followed by a late sharp rise and subsequent decline. This clear divergence in WMH stage distributions was further supported by receiver operating characteristic (ROC) analysis, in which WMH stage discriminated stroke patients from high-risk controls more strongly than WMH volume did (Fig. 5b). A similar advantage of WMH stage was observed in differentiating stroke patients from low-risk controls. However, both measures showed limited discriminative power between low-risk and high-risk controls (area under the ROC curve: 0.58 and 0.60 for WMH stage and volume, respectively).
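For readers less familiar with the ROC comparison, the area under the curve for a single marker equals the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half. The Python sketch below computes it that way on hypothetical stage values. It does not reproduce the DeLong comparison between markers, and the function name and data are illustrative.

```python
def auc(positive_scores, negative_scores):
    """AUC as the probability that a positive outranks a negative.

    Ties contribute 0.5, which matters for a discrete marker such as
    WMH stage (integers from 0 to 20).
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical WMH stages, not study data.
patient_stages = [9, 12, 3, 15]
control_stages = [1, 3, 0, 6]
stage_auc = auc(patient_stages, control_stages)
```

A value near 0.5, such as the 0.58 and 0.60 reported for low-risk versus high-risk controls, means the marker ranks the two groups only slightly better than chance.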

Fig. 5: Comparing white matter hyperintensity (WMH) stage and volume in order to discriminate stroke patients from controls.

a Distributions of WMH stages and volumes in controls and stroke patients. Two-sided P values were calculated using Mann-Whitney U-tests to compare WMH stages and volumes between the groups: stage (patients vs. high-risk controls: 1.03 × 10⁻²⁶⁴⁰; patients vs. low-risk controls: 3.99 × 10⁻²⁸³⁹; high-risk controls vs. low-risk controls: 7.76 × 10⁻¹⁷⁸); volume (patients vs. high-risk controls: 1.61 × 10⁻¹³³⁴; patients vs. low-risk controls: 2.71 × 10⁻¹⁷³¹; high-risk controls vs. low-risk controls: 2.97 × 10⁻²²³). b Receiver operating characteristic curves for distinguishing stroke patients from controls based on either WMH stage or WMH volume. Areas under the curves (AUCs) were calculated and statistically compared using the DeLong test; two-sided P values are reported (patients vs. high-risk controls: 8.88 × 10⁻⁹⁷⁵; patients vs. low-risk controls: 1.50 × 10⁻⁵⁷¹; high-risk controls vs. low-risk controls: 6.50 × 10⁻³²). Source data are provided as a Source Data file.

Using cerebral arterial territories provides an alternative but less robust model of spatiotemporal WMH progression

In addition to our proposed model based on ROIs representing frontal, basal ganglia, temporal, parietal, and occipital lobes, we explored an alternative WMH subtyping–staging approach (Fig. 6) based on ROIs representing cerebral arterial territories (anterior, middle, posterior cerebral arteries [ACA, MCA, PCA], and their border zones) and distance from the ventricular surface (Supplementary Fig. 7). SuStaIn modeling and a CV process led to the development of a two-subtype model (Fig. 6a). Models with more than two subtypes showed substantially reduced progression pattern similarity across CV folds, a result that indicates lower model robustness, along with markedly elevated maximum pattern similarity between different subtypes, a result that indicates poorer spatial discrimination between progression patterns of different subtypes (Supplementary Fig. 3b). However, the two-subtype model’s CV evaluation metrics (log likelihood, CVIC, and pattern similarities) also demonstrated poorer model fit, robustness, and spatial discrimination between subtypes than our primary proposed model (Supplementary Fig. 3a). Subtype 1 (T1’) showed major WMH progression starting in the MCA territory and border zones of MCA-PCA and MCA-ACA, while Subtype 2 (T2’) began with major WMH progression in the MCA-PCA border zone and the PCA territory. Nevertheless, these progression patterns were less distinct than those in our proposed model (Fig. 1a). Moreover, these subtypes did not show significant differences in incidence of early (<3 weeks) neurological deterioration (symptomatic hemorrhagic transformation; linear mixed-effects regression model, P for subtype = 0.561), unfavorable functional outcome at 3 months (linear mixed-effects regression model, P for subtype = 0.347), or stroke recurrence within 1 year (Cox proportional hazards model, P = 0.794 for T2’ [vs. T1’]) (Fig. 6b). 
Furthermore, in the longitudinal subtype-stability analysis for this two-subtype model, only 67% of patients who exhibited the T2’ subtype at admission remained in the same subtype at FU (Fig. 6c), which is a particularly low rate given that the model includes only two subtypes. Our proposed three-subtype model, by contrast, showed subtype stabilities of more than 77% for all subtypes, despite having one more subtype (Fig. 1c, middle).
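The subtype-stability rate referred to above is the fraction of patients assigned to a given subtype at admission who are assigned to the same subtype at follow-up. A minimal sketch follows; the function name and the toy labels are hypothetical and do not reproduce study data.

```python
from collections import Counter


def subtype_stability(baseline, follow_up):
    """Per-subtype fraction of patients keeping their admission subtype at follow-up."""
    totals = Counter(baseline)
    unchanged = Counter(b for b, f in zip(baseline, follow_up) if b == f)
    return {subtype: unchanged[subtype] / totals[subtype] for subtype in totals}


# Hypothetical toy labels (not from the study).
baseline = ["T1'", "T1'", "T2'", "T2'", "T2'"]
follow_up = ["T1'", "T1'", "T2'", "T1'", "T2'"]

rates = subtype_stability(baseline, follow_up)
for subtype, rate in rates.items():
    print(f"{subtype}: {rate:.0%} stable")
```

With only two subtypes, a patient relabeled at random would still keep the same label about half the time, which is why a 67% rate is considered low for a two-subtype model.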

Fig. 6: Cerebral arterial territory model for spatiotemporal patterns of white matter hyperintensity (WMH) progression in stroke patients.

a Spatiotemporal patterns of WMH progression across arterial territories (anterior cerebral artery [A], middle cerebral artery [M], posterior cerebral artery [P], and their border zones [A-M and M-P]). Bullseye presentations and median WMH severity maps are presented for each WMH subtype at WMH stages 2, 6, 10, and 14. The z-axis coordinates in the Montreal Neurological Institute space are displayed in the bottom right corner of the first three brain slices in the FP subtype. b Estimated marginal means (EMMs; a curved line with a shaded 95% confidence interval) of proportions of symptomatic hemorrhagic transformation (HT) leading to early neurological deterioration within 3 weeks and unfavorable functional outcome at 3 months (modified Rankin Scale score >3) by stage across subtypes (T1’ and T2’) are presented. Cumulative events of ischemic stroke recurrence within 1 year across subtypes are shown, as are hazard ratio (HR) and two-sided P for WMH stage (hemorrhagic stroke recurrence: 3.14 × 10⁻⁴; all-cause death: 9.85 × 10⁻⁵; nonvascular death: 4.33 × 10⁻⁵) from Cox proportional hazards regression. c Longitudinal subtype stability rates by subtype at admission and follow-up (FU). Source data are provided as a Source Data file.

Using propensity score matching for sensitivity analysis on WMH trajectories

To minimize potential modeling bias arising from population differences between stroke patients and low-risk controls, we applied propensity score matching and conducted SuStaIn modeling on the matched stroke patient cohort (Supplementary Fig. 1). The analysis included 7311 stroke patients and 7311 age- and sex-matched low-risk controls. A total of 412 (6%) patients were assigned to stage 0 and not classified to any subtype. The sensitivity analysis yielded results that were consistent with main analysis findings in terms of the three distinct spatiotemporal patterns of WMH progression (Supplementary Fig. 8a, b), baseline characteristics of the three subtypes (Supplementary Fig. 8c–g and Supplementary Table 13), and their post-stroke outcomes (Supplementary Fig. 9 and Supplementary Table 14). Unlike in the main results, radial and TO subtype showed a higher incidence of nonvascular death than FP subtype (Supplementary Fig. 9c).
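The matching procedure itself is not detailed in this section, so the following is only a generic sketch of 1:1 propensity score matching under stated assumptions: a logistic model of group membership on age and sex fitted by Newton's method, followed by greedy nearest-neighbor matching without replacement. The function names, the caliper value, and the synthetic data are all hypothetical and should not be read as the procedure actually used in the sensitivity analysis.

```python
import numpy as np


def propensity_scores(covariates, is_patient, n_iter=25):
    """Logistic regression of group membership on standardized covariates (Newton's method)."""
    z = (covariates - covariates.mean(axis=0)) / covariates.std(axis=0)
    X = np.column_stack([np.ones(len(z)), z])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (is_patient - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))


def greedy_match(ps, is_patient, caliper=0.05):
    """Pair each patient with the closest unused control within the caliper (assumed value)."""
    free_controls = list(np.flatnonzero(is_patient == 0))
    pairs = []
    for t in np.flatnonzero(is_patient == 1):
        if not free_controls:
            break
        j = min(range(len(free_controls)), key=lambda k: abs(ps[free_controls[k]] - ps[t]))
        if abs(ps[free_controls[j]] - ps[t]) <= caliper:
            pairs.append((int(t), int(free_controls.pop(j))))
    return pairs


# Synthetic illustration only (not study data).
rng = np.random.default_rng(0)
n_pat, n_ctl = 100, 300
age = np.concatenate([rng.normal(68, 10, n_pat), rng.normal(63, 8, n_ctl)])
sex = rng.integers(0, 2, n_pat + n_ctl).astype(float)
is_patient = np.concatenate([np.ones(n_pat), np.zeros(n_ctl)])

ps = propensity_scores(np.column_stack([age, sex]), is_patient)
pairs = greedy_match(ps, is_patient)
print(f"{len(pairs)} matched pairs out of {n_pat} patients")
```

Patients without a control inside the caliper are left unmatched, which is one way a matched cohort can end up smaller than the original one.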

Discussion

We identified distinct spatiotemporal trajectories of WMH progression by using statistical modeling with longitudinal inference37 based on a large cross-sectional multi-center dataset of consecutive stroke patients (n = 9179). Consecutive enrollment and large sample size allowed us to reliably estimate WMH progression trajectories from cross-sectional data. Different WMH progression subtypes showed: (i) distinct demographic and vascular risk factor profiles (delayed WMH onset, more women, and more frequent hypertension in FP subtype vs. earlier WMH onset, more men, and more frequent AF in TO subtype); (ii) distinct stroke etiology-related features (more frequent SVS in FP subtype vs. more frequent CE in TO subtype); and (iii) distinct patterns of stroke outcomes (more frequent early symptomatic hemorrhagic transformation and 3-month unfavorable functional outcome in TO subtype vs. more frequent 1-year ischemic stroke recurrence in FP subtype). These data complemented or accorded with total or regional Fazekas scale-related findings. Longitudinal validation confirmed that the modeling closely aligned with observed WMH progression trajectories. Furthermore, the concordant WMH progression trajectories between high-risk controls and stroke patients, the divergence in their WMH stage distributions, and the consequent higher area under the curve in the ROC (compared to WMH volume) for distinguishing stroke patients from high-risk controls warrant further investigation to confirm whether WMH stage can better predict future stroke occurrence.

FP subtype (Fronto-parietal WMH progression) was characterized by older age or late onset of severe WMH as well as higher proportions of female patients, hypertension, silent brain infarcts, and cerebral microbleeds. A previous study47 has also reported anterior-dominant distribution of WMH in older and hypertensive adults. Another study33 has shown that patients who are older, female, hypertensive, or have SVS tend to have more extensive WMH in the ACA territory than in MCA and PCA territories. FP subtype patients had the highest incidence of 1-year ischemic stroke recurrence, which might be partly attributed to their older age and higher prevalence of hypertension48,49,50. In addition, our study revealed that subtype differences in age distribution and hypertension prevalence were greater in early stages than in late stages. Differences in fronto-parietal WMH severity across subtypes were more pronounced in early stages. In late stages, radial and TO subtype also showed WMH progression in fronto-parietal regions, likely due to age and hypertension. As a result, late-stage radial and TO subtype patients might also be older and have higher rates of hypertension, similar to late-stage FP subtype patients.

Radial subtype, characterized by radial WMH progression across all lobes, was the predominant subtype, accounting for 46% of stroke patients in our study. Notably, vascular risk factors such as hypertension, AF, and coronary artery disease had the lowest prevalence in radial subtype. This subtype had the highest prevalence of SVS. In addition, both acute infarct volume and NIHSS score were lower in radial subtype than in the other subtypes.

TO subtype (temporo-occipital WMH progression) had the highest proportions of men and AF, along with the largest acute infarct volumes and the highest NIHSS scores. A previous study has found that temporal and posterior WMH is more frequently observed in patients with AF than in those without AF51. Moreover, CE strokes52 predominantly caused by AF are strongly associated with larger acute infarct volumes and elevated NIHSS scores53,54. Additionally, earlier studies have shown that AF, larger acute infarct volume, and higher admission NIHSS score are associated with increased risk of symptomatic hemorrhagic transformation leading to early neurological deterioration55,56. These findings aligned with the relatively higher incidence of symptomatic hemorrhagic transformation and early neurological deterioration in TO subtype observed in our study, a connection which suggests TO subtype WMH might have potential clinical implications, particularly in situations, such as revascularization therapy, where hemorrhagic complications are a concern57. The incidence of unfavorable functional outcome at 3 months rose with advancing WMH stages across all subtypes but was highest in TO subtype, likely due to larger acute infarct volume, higher NIHSS score, and more frequent hemorrhagic transformation.

The Fazekas scale, the most widely used WMH staging measure, can estimate overall WMH burden and predict stroke occurrence and prognosis25,58,59. Additionally, regional Fazekas scales for PVWMH and DWMH have been applied based on the hypothesis that PVWMH and DWMH have different vascular pathologies. However, it has been challenging to clearly differentiate regionally distinct pathophysiological mechanisms5,60,61,62.

Previous studies have shown that total Fazekas scale score is associated with the majority of risk factors related to both PVWMH and DWMH5,60,61,63,64, suggesting that PVWMH and DWMH likely represent a continuum of the same pathology. In a similar vein, there was a strong correlation between PVWMH and DWMH scores5,60,61, a finding the present study also corroborates (R² = 0.50). We also showed that both total and PVWMH Fazekas scale scores were associated with age, female sex, hypertension, diabetes mellitus, AF, the presence of silent brain infarcts and cerebral microbleeds, stroke subtype (SVS and CE), acute infarct volume, and admission NIHSS score. The DWMH score was associated with age and hypertension (major risk factors for WMH) as well as the presence of silent brain infarcts and cerebral microbleeds. In our model, temporal information about WMH stages mirrored the relationships total and PVWMH Fazekas scale scores have with demographic and cerebrovascular risk factors, stroke etiology, and stroke severity. Additionally, our model provided the following spatial information with distinct WMH progression trajectories: (i) older, hypertensive, and SVS patients were more likely to be FP subtype; (ii) AF, coronary artery disease, and CE patients were more likely to be TO subtype; and (iii) most clinical factors were less prevalent in radial subtype.

In terms of stroke outcomes, total Fazekas scale score, but not PVWMH or DWMH scores, and WMH stage in our model were both associated with early neurological deterioration and its causes, including stroke progression and symptomatic hemorrhagic transformation. Moreover, unlike our WMH subtyping–staging model, total and regional Fazekas scale scores did not significantly predict poor functional outcome at 3 months. This further suggests that our new WMH subtyping–staging model aligns with and complements the conventional Fazekas scale in profiling stroke risk and outcomes.

Intriguingly, total and PVWMH Fazekas scale scores, as well as WMH stage in our model, all showed negative associations with the prevalence of CE etiology. Previous studies have reported that patients with larger WMH volumes are more likely to experience SVS than those with other stroke subtypes, particularly CE65,66, which could partly explain the negative associations observed in the present study. We also found that only DWMH Fazekas scale score did not show a significant association with diabetes mellitus or AF, although previous studies have reported inconsistent associations between these factors and both PVWMH and DWMH63,67,68,69.

Our WMH subtyping–staging model offers a more refined spatiotemporal resolution for assessing WMH burden, encompassing 21 stages (0–20) for each of the three regional progression subtypes. This contrasts with the Fazekas scale, which has total scores ranging from 0 to 6, with 0 to 3 assigned to each of the two regional subtypes. Such increased granularity may enhance sensitivity in detecting small yet significant changes in WMH severity, thereby improving prognostic accuracy and contributing to personalized medicine. While the Fazekas scale is limited by fewer staging categories and qualitative assessment, it has been widely adopted due to its simplicity: anyone can easily assess it by visually inspecting images. Compared to the Fazekas scale, our model involves more complex steps, including the following processes: (1) delineating WMH based on brain ROIs, (2) constructing a statistical model, and (3) assigning individuals to specific subtypes and stages based on the model. These processes require a modest amount of computer processing, which currently limits user accessibility. However, this challenge could be addressed by developing an automated algorithm and implementing it in software. Automating the pipeline would allow users to fully benefit from the model without the current complexity. Our research team is actively developing this automated pipeline by using deep-learning algorithms. We plan to release the software before long, thus enabling users to easily analyze individual WMH subtype and stage directly from raw images. Through these efforts, we aim to maximize the strengths of our model and boost its applicability.

Our proposed WMH stage was associated with various demographics and vascular risk factors, including age, male sex, hypertension, diabetes mellitus, AF, coronary artery disease, and chronic SVD markers such as silent brain infarcts and microbleeds. These findings were largely consistent with previous studies using the Fazekas scales or WMH volume63,65,66. Interestingly, however, we found a positive association between male sex and WMH stage but a negative association between AF and WMH stage. The observed association between male sex and WMH stage might be attributed to an overcompensation for older female patients in the multivariate analysis. Regarding the inverse relationship between AF and WMH stage, as we previously discussed when reporting similar findings in elderly hypertensive patients11, AF could cause ischemic stroke (i.e., CE stroke) in the absence of other vascular risk factors that could contribute to elevated WMH stage70. Moreover, because our study found fewer CE strokes at higher WMH stages, and previous studies demonstrated that hemorrhagic transformation tended to occur more frequently in CE strokes (vs. strokes from other etiologies)71, it seems reasonable that the incidence of symptomatic hemorrhagic transformation decreased as WMH stages increased. Supporting this, the trend reversed and increased after we adjusted for stroke etiology (T [8053] = 7.31, P = 2.89 × 10⁻¹³, OR [95% CI] = 1.17 [1.12–1.21]).

We included only patients with first-ever stroke whose WMH were not attributable to clinically evident prior stroke, with FLAIR MRI scans acquired early (median [IQR] = 14 [6‒38] h) after symptom onset. Additionally, we made considerable efforts to enroll patients consecutively to minimize selection bias. Despite potential inherent differences between stroke patients and high-risk controls, as well as racial differences between the two datasets used (i.e., the Korean MRI-based stroke database and the UK Biobank), we observed concordant spatiotemporal WMH progression trajectories in both groups. These findings suggest that our subtyping–staging model may not only be applicable to stroke populations but may also extend as a continuum into high-risk normal populations. In other words, our model could be broadly applicable as a potentially generalizable biomarker for cerebrovascular brain health. Future studies should confirm whether our current findings in a first-ever stroke patient population can be extrapolated to high-risk, stroke-naïve populations.

In our study, the mean [95% CI] WMH volume (mL) in the UK Biobank controls was 4.20 [4.14‒4.26], which closely aligns with findings from a meta-analysis of 17 studies involving 9716 healthy subjects72. In both high-risk and low-risk controls, WMH volumes showed right-skewed distribution, and this result is consistent with the typical right-skewed distribution of WMH volumes observed in population-based cohorts73,74. Stroke patients from our Korean MRI-based stroke database had less right-skewed and significantly higher WMH volumes (Median [IQR]: 8.90 [4.39‒18.17]) compared to UK Biobank controls (2.45 [1.61‒4.37]), in accordance with previous studies on stroke patients23,33,65,66. Since the UK Biobank represents a general (low- and high-risk) control population and the Korean MRI-based stroke database represents a stroke population, the significantly different stage distributions between these groups have notable implications for stroke risk assessment, particularly in high-risk individuals.

High-risk controls were predominantly concentrated in the early stages and had a significantly more left-skewed distribution than stroke patients. This finding suggests the model can estimate subclinical WMH burden that is a precursor to symptomatic stroke. Previous longitudinal studies have shown that larger baseline WMH volumes were associated with elevated stroke risk, with hazard ratios ranging from 1.4 to 1.79 per log-unit increase in WMH volume75,76. In our ROC analysis, WMH stage demonstrated significantly greater discriminative power than WMH volume for distinguishing stroke patients from high-risk controls. These results indicate that the WMH stage derived from our WMH subtyping–staging model provides a more precise and reliable tool for predicting symptomatic stroke risk compared to traditional WMH volume-based approaches.

The WMH subtypes and stages identified based on functional lobes effectively predicted post-stroke outcomes. However, the alternative WMH progression model based on arterial territories did not effectively predict symptomatic hemorrhagic transformation, 3-month unfavorable functional outcomes, or 1-year ischemic stroke recurrence. Moreover, when reassessed at an FU time point in the same patients, the WMH subtypes were more likely to show changes and are thus less reliable predictors of WMH progression trajectories. Because WMH is closely linked to vascular risk factors, arterial territories would be expected to be a better basis for modeling (compared to functional lobes of the brain8,33,77), yet our findings suggest that pathological processes of WMH (a marker of SVD) might be less affected by mechanisms related to large artery disease.

Despite the strengths of this study, such as the use of large datasets of consecutively enrolled stroke patients and robust validation, the study has some limitations. First, we used data from low-risk controls in the UK Biobank to normalize WMH volumes into z-scores (WMH severity) for each ROI in stroke patients from the Korean MRI-based stroke database. Since the UK Biobank dataset primarily consisted of Caucasian individuals with only a small proportion of Asians, we made efforts to minimize bias due to differences in the datasets’ racial composition. We constructed a low-risk control dataset from the UK Biobank, excluding individuals with hypertension, diabetes mellitus, hyperlipidemia, current smoking, AF, or coronary artery disease, to mitigate racial differences in the effects vascular risk factors exert on WMH progression. Additionally, we conducted a sensitivity analysis using propensity score matching for demographics, such as age and sex, between datasets. Repeating our main analyses with these matched datasets produced nearly identical results. Nevertheless, a previous study has shown that, even after controlling for demographics and risk factors, a Chinese population had a higher prevalence of severe WMH than a white population78. Thus, caution should be exercised when applying these findings to other racial groups. Further validation with a large racially matched dataset should be performed in future studies. Second, the lack of association between WMH subtypes and AF in the high-risk control group may reflect differences in cohort selection. Controls in the UK Biobank were included according to age strata, whereas patients in the Korean MRI-based stroke database had to have had a stroke, some of which were due to AF, to enter the study. 
It is likely that AF is more thoroughly investigated or monitored in stroke patients (e.g., prolonged electrocardiogram monitoring) due to its clinical implications, resulting in more accurate AF detection79 and thereby revealing its potential association with WMH subtypes more clearly than in controls. Third, SuStaIn modeling37, which assumes multiple discrete progression sequences and infers longitudinal trajectories from cross-sectional data, may artificially construct subtype trajectories by linking unrelated disease states. However, this concern may be partially addressed by our longitudinal validation, which showed that the model consistently assigned the same subtypes over time, and by its stronger predictive ability for post-stroke outcomes compared to conventional WMH measures. Fourth, given the associative nature of our stroke prediction findings, caution is warranted in interpreting the results, and prospective studies are needed to confirm them.

In conclusion, we developed and validated a new WMH subtyping–staging model by identifying distinct spatiotemporal trajectories of WMH progression, and we demonstrated that this model can not only reliably reflect demographics and vascular risk factors but also predict stroke outcomes across various WMH subtypes and stages. Moreover, the aligned WMH progression trajectories observed in high-risk controls and stroke patients, combined with their distinct stage distributions, suggest this model could potentially predict stroke risk in the general population.

Methods

This study was approved by the institutional review boards of all participating centers (DUIH2010-01-083-020). All analyses were conducted using MATLAB R2021b (MathWorks, Natick, MA, USA), unless otherwise noted. All statistical tests were two-sided.

We used the term “sex” to refer to biological attributes, which were determined based on self-reporting. We assessed the proportion of males across different progression subtypes to explore potential sex-related effects on subtype classification. Additionally, sex was included as a covariate in the analysis of risk factors and stroke outcomes.

Dataset

The flow diagram of the study data is presented in Supplementary Fig. 1. The main data on stroke patients (n = 9179) were obtained from the Korean MRI-based stroke database11,40,41,42, a subproject of Clinical Research Collaboration for Stroke-Korea (CRCS-K, a nationwide stroke registry; http://crcs-k.strokedb.or.kr/eng/). The database consecutively enrolled patients (n = 13,186) with acute stroke who were admitted to 11 participating centers within 7 days of symptom onset between May 2011 and February 2014. Consecutive data from one center (n = 748) were separated and assigned to validation data. Patients (n = 4007 in the main data, n = 191 in the validation data) were excluded based on the following criteria: non-ischemic stroke, previous stroke history, poor-quality FLAIR MRI, and no white matter hyperintensity in FLAIR MRI. The validation data were further refined by excluding 280 patients without FU FLAIR MRI and 13 patients with poor-quality FU FLAIR MRI, resulting in a total of 264 patients (median [interquartile range] years of follow-up: 4.5 [2.8‒7.3]). Of the 280 excluded patients (50% of 557), 49 (8.8% of 557) were lost to FU, 178 (32.0% of 557) were not offered FU MRI by the physician, and 53 (9.5% of 557) declined it. Of the 264 included patients, 132 patients (50% of 264) without new neurological symptoms underwent FU MRI for routine monitoring, with only one patient (0.8% of 132) showing new diffusion-weighted imaging-confirmed lesions. The remaining 132 patients (50% of 264) underwent FU MRI based on the presence or clinical suspicion of new neurological symptoms, and 39 (29% of 132) were found to have diffusion-weighted imaging-confirmed lesions.

We collected demographic and clinical data, including medication history, laboratory results, and vascular risk factors, by using a standardized protocol11,40. Acute ischemic stroke subtypes were determined by consensus among experienced vascular neurologists at each center, all using a validated MRI-based algorithm80 based on the Trial of Org 10,172 in Acute Stroke Treatment (TOAST) classification81. The mRS score82 before and 3 months after stroke (FU loss in n = 229) and NIHSS score83 at admission were collected prospectively. Neurological worsening and new neurological symptoms within 3 weeks of stroke onset (defined as early neurological deterioration) were assessed using previously published criteria40,42,84,85 (FU loss in n = 4): (1) an increase in total NIHSS score by ≥2 points; (2) an increase in level-of-consciousness items (NIHSS 1a–1c) by ≥1 point; (3) an increase in motor items (NIHSS 5a–6b) by ≥1 point; or (4) the occurrence of any new neurological deficit not captured by the NIHSS. Causes of early neurological deterioration were then classified as stroke recurrence, stroke progression, symptomatic hemorrhagic transformation, other, or unknown. Data quality was ensured through regular audits, including monthly monitoring and on-site inspections to review medical records, conducted by an outcome adjudication committee. All patients underwent standard clinical evaluation, treatment, and rehabilitation in accordance with current guidelines for stroke, as published in CRCS-K annual report: http://crcs-k.strokedb.or.kr/eng/reporting/report.asp. Stroke recurrence and mortality within 1 year were prospectively collected using a predefined protocol86 during hospitalization, or after discharge through routine clinic visits or telephone interviews (FU loss in n = 93). 
Vascular death was defined as any death that occurred during the qualifying stroke admission period, including fatal stroke recurrence, fatal myocardial infarction, fatal congestive heart failure, and any sudden death without an identifiable nonvascular cause after discharge. Nonvascular death was defined as any death not caused by vascular causes.
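The four criteria for early neurological deterioration listed above combine as a logical OR, which can be written as a small predicate. This is a sketch only: the function and argument names are hypothetical, and each change is assumed to be the follow-up score minus the admission score for the relevant NIHSS items.

```python
def early_neurological_deterioration(total_nihss_change, loc_change,
                                     motor_change, new_deficit_outside_nihss):
    """Return True if any of the four published criteria is met.

    total_nihss_change: change in total NIHSS score
    loc_change: change in level-of-consciousness items (NIHSS 1a-1c)
    motor_change: change in motor items (NIHSS 5a-6b)
    new_deficit_outside_nihss: any new deficit not captured by the NIHSS
    """
    return (
        total_nihss_change >= 2
        or loc_change >= 1
        or motor_change >= 1
        or bool(new_deficit_outside_nihss)
    )


# A 1-point motor worsening qualifies even though the total rises by only 1 point.
print(early_neurological_deterioration(1, 0, 1, False))
```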

From the UK Biobank database87, we initially included individuals (n = 37,070) with both (baseline) T1-weighted and FLAIR MRI scans in this study. For control data, individuals with neurological diseases or without white matter hyperintensity in FLAIR MRI were excluded (n = 861). Only individuals with no vascular risk factors, including hypertension, diabetes mellitus, hyperlipidemia, current smoking, AF, and coronary artery disease, were included (n = 13,811) to construct a low-risk control population in relation to stroke. These low-risk control data were used to estimate baseline distributions of WMH volumes in each ROI, and these estimated distributions were then used to calculate WMH severity as z-scores. Binarized diagnoses were determined by assessing all relevant data fields (e.g., ICD-10, self-report, medications). Diagnoses included hypertension (data field numbers: 41270, 20002, 6150), diabetes (41270, 20002, 6153, 6177), hyperlipidemia (41270, 20002), smoking status (20116), AF (41270, 41272, 20002, 20004, 12653), and coronary artery disease (41270, 41272, 20002, 20004, 6150). Data on current smoking status were missing for 328 individuals.

MRI acquisition and processing

Multicenter MRI protocols for stroke patient data were as follows. Diffusion-weighted MRI was acquired with b-values of 0 and 1000 s/mm², 2400–10,400 ms repetition time, 47–111 ms echo time, 0.6–2 × 0.6–2 × 3–6 mm³ voxel size, 3–7.5 mm interslice gap, and 200–260 × 200–260 mm field of view (FOV). FLAIR MRI was acquired with 6000–13,000 ms repetition time, 76–169 ms echo time, 0.3–0.9 × 0.3–0.9 × 5–6 mm³ voxel size, 3–7.5 mm interslice gap, and 160–250 × 200–250 mm FOV. T2-weighted MRI was acquired with 1500–8800 ms repetition time, 45–126 ms echo time, 0.2–0.9 × 0.2–0.9 × 3–6 mm³ voxel size, 3–7.5 mm interslice gap, and 160–250 × 200–250 mm FOV. Gradient-echo MRI was acquired with 350–1230 ms repetition time, 14–30 ms echo time, 0.4–0.9 × 0.4–0.9 × 3–6 mm³ voxel size, 4–7.5 mm interslice gap, and 160–250 × 200–250 mm FOV. As previously described11,40, stroke-related lesions on MR images were semi-automatically segmented and registered onto a standard brain template88 in Montreal Neurological Institute (MNI) space by research assistants operating a custom-built software package (Image_QNA)89 under slice-by-slice supervision by an experienced vascular neurologist (author W.-S.R.). This quantitative method demonstrated excellent inter- and intra-observer reliability11. Of note, when chronic lesions on FLAIR and acute lesions on diffusion-weighted MRI overlapped, the extent and distribution of FLAIR WMH contralateral to the location of acute infarct served as a reference to determine what volumes to include and exclude, by assuming a symmetric distribution of WMH across the midline11,12.

UK Biobank structural imaging included 3D MPRAGE T1-weighted MRI acquisition with 1 mm isotropic resolution (repetition time = 2000 ms, echo time = 2 ms, slice thickness = 1 mm, field of view (FOV) = 192 × 256 × 256) and 3D SPACE FLAIR MRI acquisition with 1.05 × 1 × 1 mm resolution (repetition time = 5000 ms, echo time = 395 ms, slice thickness = 1.05 mm, FOV = 192 × 256 × 256). T1-weighted MRI was preprocessed using FreeSurfer90 v7.1 and the bias-corrected image was linearly registered to the FLAIR using six degrees of freedom. Both T1-weighted and FLAIR images were used as inputs for WMH segmentation following the method described by Park et al.91. This deep-learning-based algorithm, an ensemble U-Net with multi-scale highlighting foregrounds, demonstrated exceptional performance in detecting WMH and placed first among 57 teams in the WMH segmentation challenge (https://wmh.isi.uu.nl/#_Results)92. The segmented WMH lesion in the native space was then registered to the standard brain template in MNI space by using a non-linear registration method (FMRIB Software Library [FSL], FNIRT93).

Brain parcellation

For the main analyses, we used a bullseye parcellation method31,94 to define white matter ROIs for WMH. We applied the bullseye pipeline (https://github.com/gsanroma/bullseye_pipeline) to the standard brain template in MNI space, segmenting the brain into 20 ROIs (Supplementary Fig. 2). Each functional lobe (frontal, basal ganglia, temporal, parietal, and occipital) was divided into four layers based on the relative distance between the lateral ventricles and the cortical surface: layer 1 (closest to the ventricles) ranged from 0 to 0.25, layer 2 from 0.25 to 0.5, layer 3 from 0.5 to 0.75, and layer 4 (closest to the cortical surface) from 0.75 to 1. We chose bullseye parcellation for its excellent spatial resolution across the axial, sagittal, and coronal planes of the brain, plus its capacity to incorporate conventional parcellation based on distance from the ventricles. Additionally, we generated another bullseye parcellation using cerebral arterial territories instead of functional lobes (Supplementary Fig. 7). We have previously specified cerebral arterial territories of major arteries (ACA, MCA, and PCA) and their border zones (ACA-MCA and MCA-PCA) in MNI space41, derived from diffusion-weighted MRI of 1160 patients with supratentorial infarction due to isolated major artery stenosis (>50%) or occlusion. We again divided each arterial territory into four layers, resulting in a total of 20 ROIs.
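As an illustration of the layer definition above, the following minimal sketch maps a normalized ventricle-to-cortex distance to one of the four layers and to one of the 20 ROIs. It is not the bullseye pipeline itself. The function names, the region ordering, and the treatment of exact boundary values (a distance of exactly 0.25 is placed in layer 2) are assumptions made for illustration only.

```python
from bisect import bisect_right

# Region order is illustrative; the text lists these five regions.
REGIONS = ["frontal", "basal_ganglia", "temporal", "parietal", "occipital"]
LAYER_EDGES = [0.25, 0.5, 0.75]  # interior boundaries of the normalized distance


def layer_of(norm_dist):
    """Map a normalized distance in [0, 1] (0 = ventricle, 1 = cortex) to layer 1-4."""
    if not 0.0 <= norm_dist <= 1.0:
        raise ValueError("normalized distance must lie in [0, 1]")
    # Assumption: a value exactly on a boundary belongs to the outer layer.
    return bisect_right(LAYER_EDGES, norm_dist) + 1


def roi_index(region, norm_dist):
    """Return a 0-based ROI index in 0..19 (region-major, layer-minor)."""
    return REGIONS.index(region) * 4 + (layer_of(norm_dist) - 1)
```

For example, `roi_index("frontal", 0.1)` refers to the innermost frontal layer, and the five regions times four layers yield the 20 ROIs used as SuStaIn biomarkers.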

SuStaIn modeling

SuStaIn modeling is a form of event-based modeling95,96 that describes disease progression as a sequence of discrete events, wherein the total number of stages equals the number of events. Each event is defined as the point at which a disease-related biomarker reaches a predefined abnormal level (cutoff). For a detailed description of the mathematical framework, see previously published work by Young et al.37. In the present study, to identify subtypes and stages of spatiotemporal WMH progression, we estimated subtype-specific WMH severity evolution by using cross-sectional datasets, in which each individual contributed a single progression pattern derived from 20 prespecified ROIs. Thus, the total number of WMH progression stages was set to 20. Because baseline WMH burden and ROI size vary across regions, WMH severity must be normalized to ensure that the abnormal level (cutoff) reflects comparable pathological significance across all ROIs. Otherwise, raw WMH volumes may confound the event sequence by exaggerating or underestimating regional involvement. To address this, and in line with previous SuStaIn studies, we evaluated abnormal levels by calculating ROI-wise WMH severity in stroke patients (from the Korean MRI-based stroke database) relative to a low-risk control subpopulation (from the UK Biobank). First, we calculated ROI-wise WMH volumes in both low-risk controls and stroke patients by counting lesioned voxels. Poisson regression was performed on WMH volumes in each ROI for the low-risk controls, with age as an independent variable. The effect of age was then regressed out of the raw WMH volumes in each ROI to obtain age-adjusted WMH volumes for low-risk controls. The mean and standard deviation of these age-adjusted WMH volumes in low-risk controls were then used to normalize raw WMH volumes in stroke patients, thereby yielding ROI-specific WMH severities (z-scores). Regressions and normalizations were performed separately for men and women.
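The normalization steps above can be sketched for a single ROI and a single sex as follows. This is a minimal standard-library illustration, not the code used in the study. The text does not specify how the age effect was removed, so the adjustment shown here (observed volume minus the fitted value, re-centered at the fitted value for the mean control age) is an assumption, as are all function names.

```python
import math
from statistics import mean, stdev


def fit_poisson_age(ages, volumes, n_iter=50, tol=1e-10):
    """Fit log(E[volume]) = b0 + b1 * (age - mean_age) by Newton-Raphson."""
    age_bar = mean(ages)
    x = [a - age_bar for a in ages]
    b0, b1 = math.log(mean(volumes)), 0.0  # requires a positive mean volume
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(volumes, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, volumes, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1, age_bar


def age_adjust(ages, volumes, b0, b1, age_bar):
    """Assumed adjustment: subtract the fitted age effect, keep the level at mean age."""
    ref = math.exp(b0)
    return [v - math.exp(b0 + b1 * (a - age_bar)) + ref
            for a, v in zip(ages, volumes)]


def severity_z(patient_volumes, adjusted_control_volumes):
    """Z-score raw patient volumes against age-adjusted control volumes."""
    m = mean(adjusted_control_volumes)
    s = stdev(adjusted_control_volumes)
    return [(v - m) / s for v in patient_volumes]
```

In this sketch the procedure would be repeated for each of the 20 ROIs, and separately for men and women, to obtain one severity z-score per ROI per patient.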

We applied SuStaIn modeling to the stroke patient dataset, with a WMH severity (z-scores) cutoff set to 1.5 and the maximum WMH severity threshold set to the median of 95th percentile values derived from ROI-wise WMH severities in stroke patients (8.5 for the main model). Model fitting and uncertainty estimation methods were adopted from previous work by Young et al.37. The expectation-maximization algorithm for model fitting was run with 25 different start points for random cluster assignments. Model uncertainty was then estimated through 100,000 Markov chain Monte Carlo iterations. The maximum number of subtypes was set to five. The optimal number of subtypes was decided by 10-fold CV. Out-of-sample model fits were evaluated by calculating the log-likelihood and CVIC37,38 across each N-subtype model. Progression pattern similarities within the same subtype between CV folds were assessed using the Kendall rank correlation coefficient for estimated event sequences. Similarly, progression pattern similarities across different subtypes were assessed using the Kendall rank correlation coefficient within each CV fold, and the maximum similarity for all subtype pairs was identified within each fold. We determined the optimal number of subtypes by considering all CV evaluation metrics. Then, individual subtypes were estimated by integrating likelihood across all stages and selecting the subtype with the highest likelihood. Each individual’s stage was then assigned based on the stage with the highest likelihood for the selected subtype. All likelihoods were integrated across all Markov chain Monte Carlo samples. Probability of ML subtype was calculated as the likelihood of the assigned subtype divided by the sum of all subtype likelihoods for each individual.
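The assignment rule and the sequence-similarity measure described above can be sketched as follows. The input is assumed to be a per-individual table of likelihoods, indexed by subtype and stage and already averaged over the Markov chain Monte Carlo samples. The function names are illustrative and are not taken from any SuStaIn implementation, and the Kendall coefficient shown is the tie-free form, which suffices for comparing two orderings of the same events.

```python
def assign_subtype_stage(lik):
    """lik[s][k]: likelihood of subtype s at stage k for one individual.

    Returns (subtype index, stage index, probability of the assigned subtype).
    """
    # Integrate (sum) the likelihood across stages for each subtype.
    subtype_lik = [sum(row) for row in lik]
    subtype = max(range(len(lik)), key=subtype_lik.__getitem__)
    # Within the chosen subtype, pick the most likely stage.
    stage = max(range(len(lik[subtype])), key=lik[subtype].__getitem__)
    prob = subtype_lik[subtype] / sum(subtype_lik)
    return subtype, stage, prob


def kendall_tau(seq_a, seq_b):
    """Kendall rank correlation between two orderings of the same events (no ties)."""
    pos_b = {event: i for i, event in enumerate(seq_b)}
    ranks = [pos_b[event] for event in seq_a]
    n = len(ranks)
    concordant = sum(ranks[i] < ranks[j]
                     for i in range(n) for j in range(i + 1, n))
    pairs = n * (n - 1) // 2
    # tau = (concordant - discordant) / pairs, with discordant = pairs - concordant.
    return (2 * concordant - pairs) / pairs
```

A value of `kendall_tau` near 1 between folds indicates a stable event sequence for a subtype, whereas a high value between different subtypes within a fold suggests that the two subtypes are not well separated.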

Prediction of longitudinal change in WMH stage

We applied multivariable linear regression models with leave-one-out CV to the validation data in order to predict the change in WMH stage from admission to FU. Models included FU duration (in years); baseline characteristics such as age, sex, hypertension, diabetes mellitus, hyperlipidemia, current smoking, AF, coronary artery disease, and revascularization therapy; plus WMH subtype and WMH stage at admission. After calculating predicted changes in WMH stage, we performed a linear regression analysis comparing observed WMH stage changes to predicted changes.
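The leave-one-out procedure can be sketched as below: each patient's stage change is predicted from a model fitted to all other patients. This is a minimal standard-library illustration that uses ordinary least squares via the normal equations. The function names are hypothetical, and the covariate coding (for example, how WMH subtype is dummy-coded) is left to the caller.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    xtx = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(xtx, xty)


def loo_predictions(X, y):
    """Predict each observation from a model fitted to all other observations."""
    preds = []
    for i in range(len(y)):
        beta = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(beta[0] + sum(b * v for b, v in zip(beta[1:], X[i])))
    return preds
```

The resulting out-of-sample predictions are what would then be regressed against the observed stage changes.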

Subtype characterization of demographics, vascular risk factors, stroke etiology, stroke severity, and post-stroke outcomes

Subtype differences in demographics, vascular risk factors, and all stroke outcomes were evaluated. We used mixed-effects regression models (linear regression for continuous variables and logistic regression for categorical variables) to analyze all clinical assessments, except 1-year stroke recurrence and mortality, as dependent variables (outcomes). The mixed-effects regression model for each clinical assessment included age and sex (each omitted when it was itself the dependent variable), subtype, stage, and subtype–stage interaction as fixed effects, with imaging center as a random effect. Inclusion of subtype–stage interaction was determined by comparing restricted ML (for continuous variables) and Akaike information criterion (for categorical variables) between models with and without the interaction term. Even if the model with the interaction term provided a better fit, the model without the interaction term was finally selected if the P for the interaction exceeded 0.1. When subtype–stage interaction was included in the model, subgroup analyses were extended by dividing patients into stage ≤9 (median) or stage >9 to further evaluate subtype differences in clinical assessments along stage. Onset to MRI time was included as a fixed factor for acute infarct volume analysis. Revascularization therapy was included as a fixed factor for post-stroke outcomes such as early neurological deterioration within 3 weeks, early neurological deterioration causes, and unfavorable functional outcome at 3 months. For 3-month unfavorable functional outcome, patients with pre-stroke mRS scores ≥2 were excluded, and pre-stroke mRS score was added as a covariate. EMMs across stages for each dependent variable were calculated by marginal effects at average values of covariates97. WMH volume, acute infarct volume, and NIHSS scores (as dependent variables) were log-transformed to account for skewed distributions that caused model residuals to deviate from normality. EMMs and CIs were back-transformed to their original scale.

We performed Cox proportional hazards regression models for 1-year stroke recurrence and mortality. These models included WMH subtype and stage with covariates such as age, sex, and revascularization therapy. Inclusion of subtype–stage interaction was determined by comparing Akaike information criterion between models, with the interaction term excluded if its P exceeded 0.1.

Fazekas scale assessment

We assessed the Fazekas scale in 9179 stroke patients from the main dataset. Five experienced vascular neurologists visually rated PVWMH and DWMH on a 0–3 scale, each rating approximately 2000 patients. All raters followed the same instructions, based on the report by Fazekas et al. in 198730. To evaluate inter-rater reliability, all raters assessed the same 100 patients. Intraclass correlation coefficient values and 95% CIs for total, PVWMH, and DWMH Fazekas scales were 0.97 [0.96–0.98], 0.95 [0.93–0.96], and 0.95 [0.93–0.96], respectively.

Discrimination between high-risk controls and stroke patients by WMH stage vs. volume

We compared the distributions of WMH stage and volume between high-risk controls and stroke patients by using Mann–Whitney U tests. Next, we generated ROC curves for each measure (WMH stage and volume) to assess their capability to discriminate between the two groups, using the pROC package in R software (version 4.3.1). Finally, we compared the areas under the ROC curves by using the DeLong test98.
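The Mann–Whitney U statistic and the area under the ROC curve are directly related: the AUC equals U divided by the number of patient–control pairs. The sketch below computes the AUC in that way with ties counted as one half. It is a standard-library illustration only and does not reproduce the pROC package or the DeLong test; the function name is hypothetical.

```python
def auc_from_pairs(scores_patients, scores_controls):
    """AUC as the fraction of patient-control pairs in which the patient scores higher.

    Ties contribute 0.5, so the result equals U / (n_patients * n_controls).
    """
    wins = sum((p > c) + 0.5 * (p == c)
               for p in scores_patients for c in scores_controls)
    return wins / (len(scores_patients) * len(scores_controls))
```

Applying the same function once to WMH stage and once to WMH volume gives the two AUC values being compared.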

Analysis of trajectory deviations in spatiotemporal WMH progression between stroke patients and high-risk controls

To estimate trajectory deviations in WMH spatiotemporal progression in the high-risk control group compared to the stroke group, we calculated the Euclidean distances between the ROI-based 20-dimensional coordinates of individual high-risk controls and the corresponding median coordinates of the stroke group for each stage and subtype, thereby generating 60 (20 stages × 3 subtypes) median values with their 95th percentiles. Additionally, we computed within-population 95th percentiles for the stroke group. For each subtype and stage, outliers among high-risk controls were identified using the 95th percentile of Euclidean distances observed in the stroke group.
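For a single subtype and stage, the outlier rule above can be sketched as follows. Each individual is represented by a vector of ROI-wise severities (20 values in the study; two are used in the usage note for brevity). The percentile definition (nearest rank) and the function names are assumptions made for illustration, since the text does not specify them.

```python
import math
from statistics import median


def percentile_nearest_rank(values, q):
    """q-th percentile by the nearest-rank method (an assumed definition)."""
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[k - 1]


def flag_outliers(controls, patients, q=95):
    """Flag controls lying farther from the patient median than the patients' q-th percentile."""
    # Coordinate-wise median of the stroke group for this subtype and stage.
    center = [median(col) for col in zip(*patients)]
    threshold = percentile_nearest_rank(
        [math.dist(p, center) for p in patients], q)
    return [math.dist(c, center) > threshold for c in controls]
```

For instance, with patients at the four corners of the unit square, a control at `[0.5, 0.5]` is not flagged while a control at `[5, 5]` is. In the study this comparison would be repeated for each of the 60 subtype and stage combinations.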

Cerebral arterial territory model for WMH progression

SuStaIn modeling was performed for stroke patients by using cerebral arterial territory parcellation (Supplementary Fig. 7), with WMH severity cutoff set to 1.5 and maximum WMH severity set to 11.5 (the median of the 95th percentiles of ROI-wise WMH severities in stroke patients). We evaluated the cerebral arterial territory model using the same procedure as for the main model.

Sensitivity analysis

We used propensity score matching to extract age- and sex-matched low-risk controls and stroke patients (Supplementary Fig. 1). Propensity scores were calculated using logistic regression based on age and sex, followed by 1:1 matching with a caliper width of 0.2. Matching was performed using the MatchIt99 and cobalt100 packages in R software (version 4.3.1). After matching, 7311 stroke patients and 7311 low-risk controls were included in the sensitivity analysis, for which we repeated the same analyses as for the main model.
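The 1:1 caliper matching step can be sketched as below, taking precomputed propensity scores as input. This is a greedy nearest-neighbour illustration without replacement, not the MatchIt implementation. The text does not state the units of the caliper, so expressing it as 0.2 standard deviations of the pooled propensity scores is an assumption, as are the function and identifier names.

```python
from statistics import stdev


def match_1to1(treated, controls, caliper=0.2):
    """Greedy nearest-neighbour 1:1 matching without replacement.

    treated, controls: dicts mapping subject id -> propensity score.
    caliper: maximum allowed score difference, in pooled standard deviations
             (an assumed convention).
    """
    width = caliper * stdev(list(treated.values()) + list(controls.values()))
    available = dict(controls)
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Closest remaining control for this treated subject.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= width:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs
```

Subjects without a control inside the caliper remain unmatched, which is why the matched sample is smaller than the original cohorts.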

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.