Introduction

Parkinson’s disease (PD) is a chronic and progressively debilitating neurodegenerative disorder. It is clinically characterized by cardinal motor symptoms including bradykinesia, resting tremor, muscle rigidity, and postural instability1. These movement symptoms significantly impair patients’ quality of life, leading to progressive disability and compromised activities of daily living. As the disease advances, PD patients exhibit extensive neurodegenerative changes throughout the brain, manifesting as progressive functional and structural alterations at both macroscopic and microscopic levels2,3. However, due to the inherent pathogenic heterogeneity and diverse progression patterns, the development of effective clinical interventions remains challenging. Emerging evidence suggests that clinical performance and neural degeneration patterns exhibit distinct characteristics across different disease stages4. Consequently, establishing validated longitudinal biomarkers and implementing dynamic monitoring frameworks for neurodegeneration quantification arises as an imperative research priority. Decoding patterns of progressive neural reorganization at the whole-brain level across distinct stages of the disease course will contribute to clarifying the neural mechanism of PD over time.

Magnetic resonance imaging (MRI) is a non-invasive technique with standardized acquisition parameters that enables the quantification of macroscopic changes in brain regions. Previous studies have reported gray matter (GM) volume alterations in both cortical and subcortical structures5,6, which demonstrate significant associations with PD motor-symptom severity. Past investigations have consistently identified reduced GM volume in both brain hemispheres of PD patients, particularly affecting key brain regions including the amygdala7, putamen8,9, frontal lobe10, and thalamus11. However, these cross-sectional studies, focusing on specific time points, are inherently limited in tracking how brain structures evolve over time in PD. This limitation becomes particularly apparent given the substantial differences in individual progression rates, which sometimes even lead to contradictory findings across studies12,13.

Longitudinal studies offer a possible way to address these limitations4. Findings from studies with follow-up periods ranging from 1.5 to 3 years in PD patients and healthy controls consistently indicate significant reductions in total GM volume among patients with PD in early to mild stages14,15. Blair et al.16 assessed GM density in patients across early and late disease stages. They found that patients exhibited GM atrophy in the bilateral hippocampus in advanced stages. Additionally, a separate longitudinal study17 revealed a gradual GM volume shrinkage in the bilateral caudate nucleus in PD from baseline to 12-month follow-up. These reports of localized GM atrophy in the PD brain fail to account for the complex interplay and pattern-level synergistic effects across the whole brain. This may be because the standard voxel-based morphometry (VBM) method analyzes voxels discretely, focusing solely on local structural changes and neglecting the correlations between brain regions that often characterize latent co-degeneration. Consequently, there remains a gap in understanding the latent pattern of progressive changes in GM volume in the brain in PD. A comprehensive investigation of whole-brain-level GM alterations across distinct disease stages using longitudinal neuroimaging data is needed.

Non-negative matrix factorization (NMF) is an unsupervised multivariate analysis method18, similar in some aspects to principal component analysis and independent component analysis19,20,21,22,23,24,25,26. However, owing to their algorithms, the results of these methods are less interpretable than those of NMF, even when prediction accuracy is high. Non-negativity constraints ensure that the decomposed matrices are free of negative components and weights, enabling the data to be described as a simple additive reconstruction of each decomposed component, which enhances the identification of potential structural patterns. In neuroimaging, NMF has been widely used in MRI image segmentation27,28, disease heterogeneity analysis29, and data dimensionality reduction30. Compared with the VBM method, NMF can take advantage of the differences between brain regions: correlation information is used to cluster voxels with similar information into latent factors, facilitating the identification of potential distribution patterns of brain structure. With biologically meaningful features, predictions of clinical scales will be more reliable, practical, and interpretable.

Therefore, the present study used NMF to obtain the GM latent factors and identify the longitudinal structural changes in PD. Since this pattern-based approach provides a more comprehensive and biologically plausible representation of GM patterns than isolated regional measures, we first hypothesize that GM patterns (as captured by NMF-derived factors) remain relatively stable in HC during their sixties and seventies31, thus providing a reliable normative basis. Secondly, we hypothesize that deviations in these factor weights observed in PD relative to the established HC normative trajectory would reflect heterogeneous pathological change patterns specific to PD progression, especially in GM patterns associated with motor cognition. Therefore, we further hypothesize that the longitudinal factor weights in PD would significantly predict longitudinal changes in motor-symptom severity, which would function as clinically relevant neuroimaging biomarkers for tracking the longitudinal trajectory of motor symptoms in PD. To this end, we first decomposed GM images into latent factors from healthy adults in the Open Access Series of Imaging Studies 3 (OASIS-3), and then validated the stability on an independent dataset by verifying the similarity of the structural factors. The latent factors were subsequently applied as a basis for the GM images of the PD patients for further longitudinal analysis and prediction. Figure 1 shows the flowchart of data analysis.

Fig. 1: The flowchart of data analysis.

a T1w images were processed in the standard SPM workflow to construct a GM matrix for NMF. b The NMF reconstruction algorithm was performed on PPMI under the most stable factors from HC_1. c After performing longitudinal ComBat, group-level and longitudinal analyses of the weights were conducted to investigate the trajectory of disease progression associated with each factor. The XGBoost predictive model was employed to validate the utility of these factors as biomarkers. GM: gray matter; TIV: total intracranial volume; NMF: non-negative matrix factorization; NNBP: non-negative basis pursuit.

Results

The decomposed latent GM factors

Following the NMF procedures outlined in the “Methods” section, we observed that the reconstruction errors in both datasets exhibited similar distributions (Fig. S1a, b). Notably, as the number of decomposition factors exceeded 5 and the sparsity surpassed 0.3, the NMF reconstruction errors gradually stabilized. Specifically, with 7 decomposition factors and a sparsity of 0.4, the average similarity of the decomposition factors between the HC_1 and HC_2 datasets reached 0.75 (Fig. S1c).

Employing the optimal decomposition parameters (k = 7, λ = 0.4), we obtained the most stable decomposed latent GM factors, denoted as WHC_1 (Fig. 2a). The decomposed results of the HC_2 dataset under the same parameters are shown in Fig. S2 for comparison. Factor 1 predominantly occupied the frontal lobe area, while Factor 2 was situated in the supplementary motor area and precentral gyrus. Factor 3 covered the middle temporal gyrus, precuneus, and inferior occipital gyrus, and Factor 4 spanned the pericalcarine cortex and superior occipital gyrus. Factor 5 encompassed mainly the basal ganglia, while Factor 6 included the amygdala, parahippocampal gyrus, and inferior temporal gyrus. Factor 7 was primarily distributed in the cerebellum area.

Fig. 2: Spatial distribution of the seven NMF-derived factors, longitudinal differences in factor weights, and NeuroSynth meta-analytic decoding of the factors.

a Factor 1: higher cognitive processing; Factor 2: motor function; Factor 3: perceptual processing; Factor 4: visual processing; Factor 5: subcortical basal ganglia; Factor 6: emotion processing; Factor 7: cerebellum. The darker color indicates a higher contribution at the spatial location for the factors. b The patterns of longitudinal weight change of each factor. *Significant between-group differences after Tukey HSD (p < 0.05) *<0.05; **<0.01; ***<0.001; ****<0.0001. c Heatmap shows the correlation coefficients for each factor with the 36 terms of interest. A darker color represents a high correlation coefficient.

Longitudinal change trajectory of factor weights in PD

A linear mixed-effects model (LMM) with covariates regressed out was applied to the longitudinal PPMI data after harmonizing site/scanner confounds with longitudinal ComBat32. Post-hoc pairwise comparisons were corrected by Tukey’s honest significant difference (Tukey HSD). PD GM volume, as shown in Fig. 2b, revealed a declining trajectory as the disease progressed. Specifically, several factor weights decreased significantly from baseline to the 1-year follow-up: Factors 1, 3, and 4: p < 0.0001; Factor 6: p < 0.001 (Tukey HSD: q < 0.05). Furthermore, Factors 4 and 6 showed a significant decline at the 2-year follow-up compared to the 1-year follow-up (Factors 4 and 6: p < 0.05, Tukey HSD: q < 0.05). All factors showed a significant weight decrease over 2 years compared with baseline (Factors 1, 3, 4 & 6: p < 0.0001; Factor 2: p < 0.01; Factor 5: p < 0.05, Tukey HSD: q < 0.05), except for Factor 7, which demonstrated a unique trajectory.

Meta-analytic function decoding of factors

To decode the psychological and physiological functions of the derived factors, we compared the spatial pattern of factors to the functional anatomy of the human brain using NiMARE. A total of 36 terms with strong correlations to the factors were selected, each demonstrating distinct functional profiles. A heatmap was subsequently generated to visually assess the potential functions associated with each factor (Fig. 2c). Specifically, Factor 1 was associated with higher cognitive processing, such as decision-making, personality, social behavioral regulation, executive function, and cognitive control; Factor 2 was correlated with motor control concepts, including motor and actions; Factor 3 centered around perceptual processing, including visual, auditory, action and observation; Factor 4 exhibited relatively concentrated correlations in terms associated with visual perception and navigation; Factor 5 was linked to concepts related to incentive and reward; Factor 6 highlighted affective processing and emotion regulation; Factor 7 demonstrated the peak in navigation and motor.

Longitudinal motor-symptom severity prediction

The XGBoost regression model showed that the factor weights successfully predicted the longitudinal motor-symptom severity measured by MDS-UPDRS II and III. When baseline factor weights were used to predict 1-year follow-up MDS-UPDRS-II (Spearman’s ρ = 0.4715, 95% CI: [0.2671, 0.6759], p < 0.001, MSE = 8.9928), Factors 3 and 7 demonstrated a predominant contribution (Fig. 3a). When baseline and 1-year follow-up factor weights were used to predict 2-year follow-up MDS-UPDRS-II (Spearman’s ρ = 0.4543, 95% CI: [0.2457, 0.6629], p < 0.001, MSE = 19.4617), both Factors 2 and 3 at baseline exhibited notable importance, and Factor 4 at the 1-year follow-up showed relative importance (Fig. 3c).

Fig. 3: Reconstruction weights of PD predict the MDS-UPDRS Ⅱ/Ⅲ scores in the first/second year by XGBoost.

For each set, the components are presented from left to right as follows: correlation graph of CV results; feature contribution (blue: baseline; red: 1 year). a Baseline weights ->1-year MDS-UPDRS-II. b Baseline weights ->1-year MDS-UPDRS-Ⅲ. c Baseline and 1-year follow-up weights ->2-year MDS-UPDRS-Ⅱ. d Baseline and 1-year follow-up weights ->2-year MDS-UPDRS-Ⅲ. e Summary of the results above. Respectively, the longitudinal statistical results for MDS-UPDRS-II and III are presented, with the most powerful predictive factor identified for each disease stage through the prediction model on CV.

When baseline factor weights were used to predict 1-year follow-up MDS-UPDRS-III (Spearman’s ρ = 0.4984, 95% CI: [0.3008, 0.6959], p < 0.0001, MSE = 59.3074), Factor 3 emerged as the leading predictor (Fig. 3b). When baseline and 1-year follow-up factor weights were used to predict 2-year follow-up MDS-UPDRS-III (Spearman’s ρ = 0.5625, 95% CI: [0.3828, 0.7422], p < 0.0001, MSE = 61.1864), Factor 5 at baseline emerged as a critical contributor, while Factor 3 at baseline showed relative importance, and its 1-year follow-up values retained substantial influence (Fig. 3d). The prediction behavior is summarized in Fig. 3e.

Discussion

Our study explored the latent structure of GM in healthy elderly brains using the NMF method and identified 7 factors corresponding to different covariance patterns. The factors decomposed the GM into the frontal lobe area, the motor area, the perceptual processing area, the visual processing area, the subcortical basal ganglia, the emotion processing area, and the cerebellum area. These factors demonstrated robustness when applied to an independent dataset (Fig. S2). Through further longitudinal analysis in PD, we found that the weights of these factors exhibited consistently gradual reductions in GM volume over the 2-year follow-up, except for Factor 7, the cerebellum, which exhibited an inverted U-shaped trajectory (Fig. 2). The XGBoost model showed that the weights were able to predict the longitudinal clinical scores of MDS-UPDRS-Ⅱ & Ⅲ in PD, and the important factors contributing to the prediction were detected. The findings revealed distinct patterns in how these factors contribute to predicting symptom severity as the disease progresses.

The MDS-UPDRS-Ⅱ captures motor-related daily living experiences in Parkinson’s disease33. In both longitudinal prediction models of MDS-UPDRS-Ⅱ, Factor 3 (perceptual processing) demonstrated a marked persistence in feature importance during disease progression (feature importance from 0.2596 at 1-year prediction to 0.4535 at 2-year prediction). It was a hub for motion perception34, and the inferior occipital gyrus integrated visual inputs for motor planning35. While some argued that occipital-temporal atrophy primarily reflects comorbid Lewy body pathology36,37, we found that Factor 3 specifically predicted motor (MDS-UPDRS-Ⅱ and Ⅲ) scores, supporting its role in visuomotor integration instead of pure dementia progression38,39,40. These regions played a crucial role in motor control, visual feedback, and cognitive-motor coordination. The contribution of Factor 7 (cerebellum) suggested an important role of the cerebellum in early PD progression. Notably, in the longitudinal GM volume analysis, the weights of Factor 7 followed a distinct trajectory without significant differences between time points: an inverted U-shaped trajectory, instead of a decline in cerebellar GM volume, was observed over time. Other studies on patients with movement disorders have also reported increased cerebellar volume in this age group, attributing it to a possible compensatory mechanism in response to functional impairments41,42,43,44,45,46. Further functional studies are needed to clarify the causal relationship between cerebellar activity and early PD pathology.

In the second-year prediction of MDS-UPDRS-Ⅱ, the growing significance of baseline Factor 2 (motor function) highlighted the progressive disruption of premotor cortical networks, which were critical for self-initiated movement and autonomous action initiation47. This dysfunction likely exacerbated difficulties in executing routine motor tasks, as the brain’s ability to generate spontaneous movement became increasingly compromised48,49. Concurrently, the emerging importance of 1-year atrophy in Factor 4 (visual processing) introduced a new layer of complexity: reduced gray matter volumes in these domains would impair visual-motor coordination and navigation50, and consequently affect motor symptoms in the following year as the disease progresses. Over a 2-year follow-up, the weights associated with Factors 2, 3, and 4 showed a significant progressive decline, indicating increasing difficulties in the executive integration of action plans51. Together, these findings underscore how PD progression transforms motor behavior from a relatively simple system into a complex one that becomes increasingly reliant on frontal-parietal-cerebellar-cortical interactions to sustain functional independence52.

Factors 3 and 5 demonstrated robust predictive utility across disease progression for MDS-UPDRS-III scores, a relatively objective clinician-rated scale assessing motor impairment severity in PD. Factor 3 showed consistent importance across all MDS-UPDRS-Ⅲ predictions, while Factor 5 played the most critical role in the 2-year prediction of the scale. Their combined degeneration could impair visuomotor coordination, a known contributor to overt gait dysfunction in PD53,54. Factor 5 was mostly composed of the basal ganglia, one of the crucial subcortical structures in the human central nervous system that influences motor ability, cognitive function, and emotional behavior at multiple levels55,56. Previous studies have revealed that patients with PD experience significant loss of GM volume in basal ganglia regions such as the nucleus accumbens, amygdala, and caudate nucleus as the disease progresses5,6. The temporal shift in primary baseline predictors probably suggested that patients suffered more severe and obvious motor dysfunction as the disease progressed.

The present study has several limitations. Firstly, we assumed the stability of the structural pattern factors obtained by NMF within the 60–70 age range, and the factors acted as a normative basis in our study. Although it has been verified that gray matter atrophy patterns in healthy individuals exhibit high stability in the 60–70 age range31, a strictly matched age range across all datasets and matched longitudinal controls could quantify the subtle age-related GM loss and provide cleaner and more detailed comparisons. The absence of age-matched healthy controls in our study implied that long-term changes might not be conclusively separated from normative aging effects. Secondly, regarding the method itself, NMF derives the normative basis from the 5,855,005 GM voxels, which serve as the elements of the distributed GM pattern. In our study, seven factors/elements were detected in HC, which represent the main skeleton of the GM in HC. To balance the size of the detailed structure and the main basis, we used a smoothing kernel of 8 mm, as prior studies suggested57,58. Therefore, unlike the standard VBM analysis, which focuses solely on local brain structural changes, some small structures, such as the substantia nigra, might not be included in the factors obtained by NMF. Finally, the predictive performance for clinical scales, while statistically significant, remained moderate in our limited sample. Although the feature importance remained relatively stable across the different models, our predictive models would be better considered as exploratory tools with limited clinical applicability.

In sum, our study leverages NMF to map the heterogeneity of longitudinal gray matter change patterns in PD. We identified seven distinct neuroanatomical factors in HC that serve as normative reference patterns, which were consistent when using an independent dataset. The factors were found to be functionally associated with (1) higher cognitive processing; (2) motor function; (3) perceptual processing; (4) visual processing; (5) subcortical basal ganglia; (6) emotion processing; (7) cerebellum. Crucially, the weights corresponding to the factors exhibited disease-specific longitudinal trajectories in PD, which demonstrated significant predictive power for motor-symptom progression. Factors 2, 3, and 7 played pivotal roles in longitudinally predicting MDS-UPDRS-Ⅱ scores, whereas Factors 3 and 5 accounted for most change in MDS-UPDRS-Ⅲ, suggesting differentiated GM elements that characterized the progressive changes of motor-related daily living experiences and motor impairment severity, respectively. The proposed data-driven framework provides a novel approach for characterizing disease heterogeneity progression in this neurodegenerative disease, and shows potentially quantitative neuroimaging biomarkers of pathological progression of PD.

Methods

Participants

Three datasets were included in this study, all of which were approved by the ethical review boards of the respective research institutions.

(1) Dataset 1 (HC_1) was sourced from OASIS-3. OASIS-3 is a publicly available neuroimaging database developed by Washington University in St. Louis, encompassing multiple age groups59. It includes MRI data from healthy adults, individuals with mild cognitive impairment, and patients with Alzheimer’s disease, offering a substantial repository of high-quality brain imaging data for research purposes. Following OASIS-3’s inclusion criteria for healthy subjects, this study included 199 participants aged between 50 and 85 years, with Mini-Mental State Examination (MMSE) > 24 and Clinical Dementia Rating Scale (CDR) = 0, indicating healthy controls.

(2) Dataset 2 (HC_2) was derived from the collaborative efforts of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Neuroimaging in Frontotemporal Dementia (NIFD). ADNI is a longitudinal research endeavor spanning multiple research centers, aimed at identifying and validating clinical indicators, imaging characteristics, genetic markers, and biochemical indicators for the early identification and monitoring of Alzheimer’s disease60. NIFD, on the other hand, provides longitudinal clinical and imaging data related to frontotemporal lobar degeneration61. By combining the enrollment criteria of these two studies concerning healthy elderly subjects, we included 163 healthy controls aged between 50 and 85, with MMSE > 24 and CDR = 0. Clinical information for HC_1 and HC_2 is summarized in Table 1.

    Table 1 HC_1 and HC_2 subject information table

The Parkinson’s data were sourced from the PPMI. PPMI is a large-scale, multinational, multicenter study dedicated to collecting and publicly sharing clinical data, genomic information, patient-reported outcomes, and imaging study results related to Parkinson’s disease62. PPMI baseline inclusion criteria for PD patients included: (1) exhibiting at least two motor symptoms; (2) being diagnosed with PD no more than 2 years ago and in the early clinical stage of the disease at baseline; (3) no symptomatic treatment within 6 months post-baseline; (4) presence of Dopamine Transporter (DAT) deficiency. Healthy subjects included in the study must exhibit no obvious neurological impairment, have no first-degree family member with PD, and score above 26 on the Montreal Cognitive Assessment (MoCA). Based on these criteria, we selected 48 healthy individuals (baseline data only) and 78 PD patients (with data at baseline, 1-year follow-up, and 2-year follow-up) from the PPMI database. Table 2 summarizes baseline demographic and clinical data.

Table 2 Clinical results at baseline in PD and control samples

Ethical approval was obtained from local ethics committees for each original study. For the HC_1 and HC_2 data (OASIS-3, ADNI, and NIFD), approval was granted by the Institutional Review Board of Washington University School of Medicine, the WW-ADNI Resource Allocation Review Committee, and the Trial Innovation Network at Johns Hopkins University, and local ethics committees at all sites approved the studies. PPMI data were approved by the ethical standards committee on human research at each participating institution. All subjects gave written informed consent in accordance with the Declaration of Helsinki prior to enrollment. As this study involved secondary analysis of existing de-identified data, no new ethical approval was required from the ethics committees for the current report.

MRI acquisition

All magnetic resonance imaging scans adhered to the standard protocols established by their respective studies. Sagittal 3D T1-weighted (T1w) images were acquired using the gradient echo/inversion recovery (GR/IR) sequence. Scanning parameters per study are detailed in Table 3, in accordance with the data inclusion criteria of their respective research plans.

Table 3 MRI data acquisition parameters

Image preprocessing

Each participant’s T1w images were processed using MATLAB 2018a and the CAT12 toolbox in SPM 12 (https://neuro-jena.github.io/cat/). The preprocessing steps included the following: (1) Manual correction of the origin of all T1w images to align the anterior commissure and posterior commissure on the same horizontal line. (2) Segmenting the images to extract three tissue components: gray matter, white matter, and cerebrospinal fluid. (3) Normalizing the gray matter images to the Montreal Neurological Institute template. (4) Modulating the gray matter voxel density into volumes. (5) Correcting the gray matter volume by dividing it by the total intracranial volume to mitigate the impact of multi-center site acquisition on the results. (6) Smoothing the gray matter images using an 8 mm full-width at half maximum Gaussian kernel (Fig. 1a).

Data harmonization

We implemented the validated longitudinal ComBat method to harmonize GM factor weights across multicenter PPMI datasets using its implementation in R with parametric empirical Bayes estimation (https://github.com/jcbeer/longCombat?tab=readme-ov-file). This technique, extended from cross-sectional ComBat63, effectively removes non-biological variance induced by differing MRI scanners and acquisition protocols while preserving longitudinal within-subject dependencies. Harmonization was performed on all seven GM factor weights, with diagnostic group (PD/HC), age, sex, total intracranial volume (TIV), and education years specified as biological covariates to retain. Scanner site and longitudinal time points were modeled as batch effects.

Non-negative matrix factorization

NMF produces a sparse, part-based data representation under the constraint of non-negativity18, and the results of NMF can be viewed as an additive combination of factors and weights. We used sparse non-negative matrix decomposition under \({\ell }^{0}\) norm constraints64, which can specify the number of non-zero elements in the decomposition factor or weight. The package for NMF is available at https://github.com/smatmo/l0-sparse-NMF. Its mathematical definition is as follows:

$$\min {\Vert X-WH\Vert }_{2}\quad \text{s.t.}\quad \left\{\begin{array}{l}W(:)\ge 0\\ H(:)\ge 0\\ \sum \left(W\left(:,k\right) > 0\right)\le L\end{array}\right.$$
(1)

In the formula, X is an m × n non-negative GM matrix, where m is the number of GM voxels and n is the number of subjects. This process is specifically carried out for the GM voxels in the brain, using a GM template sourced from the GM probability map in the SPM12 toolbox, with a probability threshold set to 0.25. The final output of segmentation, for each subject, was a 3D image registered to the GM template and with a size of 169 × 205 × 169. W is an m × k matrix whose columns are the GM decomposition factors, where k is the number of factors and k ≤ min(m, n). H is a k × n matrix containing the weight of each subject on each GM latent factor. L is the maximum number of non-zero elements in each column of W.

NMF followed a two-stage iterative approach, as illustrated in the Algorithm64. We first calculated an optimal, unconstrained solution for the basis matrix W (with fixed H) in Step 3 using sparseNNLS.m. The \({\ell }^{0}\)-constraints were satisfied by projecting the basis vectors onto the closest non-negative vector in Euclidean space (Steps 4–6). Step 7 enhanced H, maintaining the sparse structure and updating the non-zero entries of W. The technique for unconstrained NMF does not increase \({\Vert X-WH\Vert }_{2}\) and typically reduces the objective through the following multiplicative update rules18:

$$H\leftarrow H\otimes \frac{\left({W}^{T}X\right)}{\left({W}^{T}{WH}\right)}$$
(2)

and

$$W\leftarrow W\otimes \frac{\left({{XH}}^{T}\right)}{\left({{WHH}}^{T}\right)}$$
(3)

where \(\otimes\) and / denote element-wise multiplication and division, respectively. Therefore, Step 7 in the Algorithm can be implemented by executing Eqs. (2) and (3) for several iterations.
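The update rules in Eqs. (2) and (3) can be illustrated with a minimal NumPy sketch. This is not the l0-sparse-NMF package used in the study; the function name, the iteration count, and the small constant `eps` (added to avoid division by zero) are assumptions made for illustration only.

```python
import numpy as np

def multiplicative_updates(X, W, H, n_iter=50, eps=1e-12):
    """Apply the multiplicative updates of Eqs. (2) and (3) n_iter times.

    n_iter and eps are illustrative choices, not values from the paper.
    """
    for _ in range(n_iter):
        # Eq. (2): H <- H * (W^T X) / (W^T W H), element-wise
        H = H * (W.T @ X) / (W.T @ W @ H + eps)
        # Eq. (3): W <- W * (X H^T) / (W H H^T), element-wise
        W = W * (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: 60 "voxels", 25 "subjects", 4 factors (all sizes hypothetical).
rng = np.random.default_rng(0)
X = rng.random((60, 25))
W0 = rng.random((60, 4))
W0[:20, 0] = 0.0  # entries that start at zero stay zero under the updates
H0 = rng.random((4, 25))

W1, H1 = multiplicative_updates(X, W0, H0)
err_before = np.linalg.norm(X - W0 @ H0)
err_after = np.linalg.norm(X - W1 @ H1)
```

Because each update multiplies the current entry by a non-negative ratio, entries of W that were set to zero by the sparsity projection remain zero, which is consistent with the statement that the sparse structure is maintained while the non-zero entries are refined.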

In our script, the “num” was set as 30. Reproducibility was quantified by the similarity of factors between the HC_1 and the HC_2 cohorts, computed as the mean correlation between corresponding factor pairs (using the Hungarian matching algorithm, which is accessible by https://github.com/ondrejdee/hungarian/).

Reconstruction error and decomposition parameters

We aimed for the NMF decomposition latent factor to be consistent across different datasets. Therefore, we initially computed the reconstruction error across varying numbers of decompositions (k: 2–20) and sparsity values (λ: 0.1–0.9) in both the HC_1 and HC_2 datasets. The NMF reconstruction error was quantified as the Frobenius norm between the original input gray matter matrix and its corresponding reconstructed matrix.

$$\left\{\begin{array}{l}{\epsilon }_{\lambda }^{k}={\Vert X-WH\Vert }_{F}^{2}\\ \lambda =\frac{L}{m}\end{array}\right.$$
(4)

In the formula, \({\epsilon }_{\lambda }^{k}\) represents the decomposition error when the number of decompositions is k and the sparsity is λ.
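As a rough illustration of how the sparsity λ, the non-zero budget L, and the reconstruction error relate, the sketch below converts λ into L = λ·m, projects each column of W onto its L largest non-negative entries, and evaluates the squared Frobenius error. The helper names are hypothetical, and the projection is only one plausible reading of the column-wise constraint; it is not taken from the l0-sparse-NMF package.

```python
import numpy as np

def max_nonzeros(sparsity, m):
    """Convert sparsity lambda = L / m into the per-column budget L."""
    return int(round(sparsity * m))

def project_l0(W, L):
    """Clip negatives, then keep only the L largest entries in each column."""
    W = np.maximum(W, 0.0)
    if L >= W.shape[0]:
        return W
    out = np.zeros_like(W)
    # Row indices of the L largest entries of every column.
    idx = np.argpartition(W, -L, axis=0)[-L:, :]
    cols = np.arange(W.shape[1])
    out[idx, cols] = W[idx, cols]
    return out

def reconstruction_error(X, W, H):
    """Squared Frobenius norm of the residual X - WH."""
    return np.linalg.norm(X - W @ H, ord="fro") ** 2

W_demo = np.array([[3.0, 1.0],
                   [-1.0, 5.0],
                   [2.0, 4.0]])
W_proj = project_l0(W_demo, L=2)
```

In this toy case the negative entry is clipped and the smallest entry of the second column is zeroed, so every column ends with at most two non-zero elements.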

The Hungarian matching algorithm was then used to match the decomposition factors of the two datasets, and the average Pearson similarity of the matching factors was calculated to measure the repeatability of NMF in different datasets. As shown in Equation [5]:

$$\overline{{r}_{k,\lambda }}=\frac{{\sum }_{i=1}^{k}r\left({W}_{H{C}_{1}}^{i},{W}_{H{C}_{2}}^{i}\right)}{k}$$
(5)

In the formula, \(\overline{{r}_{k,\lambda }}\) represents the average Pearson similarity of the two datasets when the number of decompositions is k and the sparsity is λ. The values of k and λ that maximized \(\overline{{r}_{k,\lambda }}\) were selected as the optimal decomposition parameters. In line with previous research65, the NMF procedure was performed 100 times to reduce the impact of random initializations. The final factors were determined by selecting the run that was most similar to the decompositions obtained in the remaining 99 runs.
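The matching step and Eq. (5) can be sketched as follows. The study used a separate Hungarian implementation; here SciPy's `linear_sum_assignment` is substituted as an assumption, and the function name and toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mean_matched_similarity(W_a, W_b):
    """Match factors (columns) of two decompositions and average their Pearson r.

    Returns the mean correlation over matched pairs and, for each column of
    W_a, the index of its matched column in W_b.
    """
    k = W_a.shape[1]
    # Cross-correlation block between columns of W_a and columns of W_b.
    corr = np.corrcoef(W_a.T, W_b.T)[:k, k:]
    # Hungarian algorithm minimizes cost, so negate to maximize correlation.
    rows, cols = linear_sum_assignment(-corr)
    return corr[rows, cols].mean(), cols

# Toy check: W_b is a column-permuted copy of W_a.
rng = np.random.default_rng(1)
W_a = rng.random((200, 7))
perm = np.array([3, 0, 6, 1, 5, 2, 4])
W_b = W_a[:, perm]
r_mean, match = mean_matched_similarity(W_a, W_b)
```

Repeating this computation over a grid of k and λ and taking the maximum of the returned mean would correspond to the parameter selection described above.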

Finally, we used the reconstruction algorithm to calculate the decomposition weight of the gray matter matrix of PPMI baseline healthy subjects and PD at different time points under the most stable decomposition result WHC_1 constraint, as shown in Equation [6].

$${X}_{PPMI\_HC}={W}_{HC\_1}\times {H}_{PPMI\_HC}$$
$${X}_{PPMI\_PD}={W}_{HC\_1}\times {H}_{PPMI\_PD}$$
(6)
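One way to picture Equation [6] is as a non-negative least-squares problem solved per subject with the factors held fixed. The sketch below uses `scipy.optimize.nnls` as a stand-in; the study's figure refers to non-negative basis pursuit (NNBP), so this solver choice, the function name, and the toy dimensions are assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import nnls

def project_onto_factors(X, W):
    """Estimate non-negative weights H such that X is approximately W @ H.

    X: (m, n) matrix of voxels by subjects; W: (m, k) fixed factor matrix.
    """
    k, n = W.shape[1], X.shape[1]
    H = np.zeros((k, n))
    for j in range(n):
        # Solve min ||W h - x_j||_2 subject to h >= 0 for subject j.
        H[:, j], _ = nnls(W, X[:, j])
    return H

# Toy check with known weights (sizes are hypothetical).
rng = np.random.default_rng(2)
W_fixed = rng.random((100, 3))
H_true = rng.random((3, 5))
X_new = W_fixed @ H_true
H_est = project_onto_factors(X_new, W_fixed)
```

Keeping W fixed means the weights of healthy controls and PD patients at every time point are expressed in the same factor space, which is what makes the longitudinal comparison of weights meaningful.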

Meta-analytic functional association mapping

Probabilistic functional profiles of the identified factors were decoded using Neurosynth v0.4.1, which is available at https://github.com/neurostuff/NiMARE. Neurosynth is a large-scale meta-analysis platform synthesizing over 15,000 published fMRI studies66. We selected the representative regions of each factor for meta-analysis to mitigate confounding effects and enhance analytical specificity. We quantified associations with 36 curated functional terms spanning affective, cognitive control, sensory, and motor domains with a frequency threshold of 0.001. The output values were scaled to the range 0–1 for better visualization.

MDS-UPDRS for model validation

The MDS-UPDRS was developed to provide a comprehensive assessment of PD symptoms and their impact on daily functioning33. This scale was established in response to the need for a standardized tool that could effectively capture the multifaceted nature of PD, encompassing both motor and non-motor symptoms, and it can reflect patients’ quality of life67. The MDS-UPDRS consists of four distinct parts: Part I evaluates non-motor experiences of daily living, Part II assesses motor experiences of daily living, Part III focuses on the motor examination, and Part IV addresses motor complications. It allows a thorough evaluation of disease progression and treatment effects. The effectiveness of the MDS-UPDRS in longitudinal datasets has been demonstrated68,69. Since our research mainly focused on the evolution of motor symptoms over time, Parts II and III of the scale were selected for this study to characterize and evaluate each patient’s disease trajectory.

XGBoost prediction model

To investigate whether the weights of the NMF decomposition factors can predict longitudinal clinical scale scores, XGBoost70 was used to predict the MDS-UPDRS scores. This choice was based on a comparison of a range of models, including linear regression, Decision Tree, Extra Tree, AdaBoost, and Random Forest (Supplementary Material 1). We implemented a multi-stage validation approach to develop and evaluate our predictive model. First, we randomly partitioned the dataset into a training set and an independent test set using a 3:1 ratio. To optimize model performance while mitigating the risk of overfitting, we applied 5-fold cross-validation (CV) to the training set with grid-search hyperparameter tuning. The performance of the prediction models during cross-validation was assessed using the Mean Squared Error (MSE) between the predicted and observed scale scores, with a significance level of p = 0.05. The optimally configured model, selected based on the cross-validation results, was then trained on the entire training set and evaluated on the independent test set. Given the limited sample size, CV can provide more stable performance estimates, but we report performance on the independent test set as the measure of generalization (Supplementary Material 2). To further validate the significance of our predictive model, we conducted a permutation test with 5000 iterations, using a threshold of p < 0.001 to determine the statistical significance of the observed predictive performance. Finally, we computed the feature importance of each predictive model across the entire dataset using the weight method, with results visualized as radar charts. All clinical scale scores violated normality assumptions (Kolmogorov–Smirnov test, p < 0.01 for all scales). We therefore adopted Spearman’s rank correlation for all correlation analyses between the predicted and true scores.
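The permutation test described above can be illustrated with a simplified sketch. It assumes that the test statistic is Spearman's correlation between observed and predicted scores and that the null distribution is built by shuffling the observed scores against fixed predictions; the study's exact permutation scheme (for example, refitting the model on permuted labels) is not specified here and may differ. The function name and the synthetic scores are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def permutation_pvalue(y_true, y_pred, n_perm=5000, seed=0):
    """One-sided permutation p-value for Spearman's rho between y_true and y_pred."""
    rng = np.random.default_rng(seed)
    # Spearman's rho equals Pearson's r computed on (average) ranks.
    r_true, r_pred = rankdata(y_true), rankdata(y_pred)
    observed = np.corrcoef(r_true, r_pred)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the observed ranks breaks any true association.
        null[i] = np.corrcoef(rng.permutation(r_true), r_pred)[0, 1]
    # The +1 terms count the observed statistic as one permutation.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


# Synthetic scores: predictions are noisy copies of the observed values.
rng = np.random.default_rng(42)
y_true = rng.normal(size=80)
y_pred = y_true + 0.5 * rng.normal(size=80)
rho, p = permutation_pvalue(y_true, y_pred)
```

With 5000 permutations the smallest attainable p-value is 1/5001, which is below the 0.001 threshold used in the text.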

Statistical analysis

Following the projection of the model trained on HC_1 (mean age: 69.01) onto the younger PPMI cohort (mean age: HC 61.75; PD 61.88), we applied a linear adjustment to the factor weights (Supplementary Material 2)31. Two-sample t-tests and χ² tests were used to examine group differences in basic demographic variables. To assess differences in factor values across time points, we performed post-hoc pairwise comparisons for each factor using a linear mixed-effects model (LMM):

$${H}_{ijt}={\beta }_{0}+{\beta }_{1}\cdot {\mathrm{time}}_{1}+{\beta }_{2}\cdot {\mathrm{time}}_{2}+{\beta }_{3}\cdot \mathrm{age}+{\beta }_{4}\cdot \mathrm{sex}+{\beta }_{5}\cdot \mathrm{education}+{\beta }_{6}\cdot \mathrm{TIV}+{\beta }_{7}\cdot {\mathrm{site}}_{1}+{\beta }_{8}\cdot {\mathrm{site}}_{2}+{u}_{i}+{\epsilon }_{ij}$$
(7)

In the formula, \({H}_{ijt}\) represents the projected weight of the j-th factor for subject i at time t. Age, education, and total intracranial volume (TIV) were normalized. \({u}_{i}\) and \({\epsilon }_{ij}\) are the random intercept and the error term, respectively, and both are assumed to follow a normal distribution. The model accounts for within-subject correlations through random intercepts while adjusting for key covariates, including age, sex, education, and TIV. Least squares means were used to estimate the expected factor values at each time point:

$${\hat{\mu }}_{t}=E\left[{H}_{ijt}\mid \mathrm{time}=t\right]\quad \text{for }t\in \left\{0,1,2\right\}$$
(8)
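As a worked illustration, suppose that the two time terms in Equation [7] are dummy indicators for the second and third time points, with baseline (t = 0) as the reference level. This coding is an assumption and is not stated explicitly above. Because the covariates are held at common values in each least squares mean, their terms cancel in the differences, and the pairwise comparisons reduce to contrasts of the fixed-effect estimates:

$${\hat{\mu }}_{1}-{\hat{\mu }}_{0}={\hat{\beta }}_{1},\qquad {\hat{\mu }}_{2}-{\hat{\mu }}_{0}={\hat{\beta }}_{2},\qquad {\hat{\mu }}_{2}-{\hat{\mu }}_{1}={\hat{\beta }}_{2}-{\hat{\beta }}_{1}$$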

Tukey’s HSD test was then used to correct for multiple comparisons. All analyses were performed in R.