Abstract
Parkinson’s disease (PD) is the second most common neurodegenerative disease, with progressive structural alterations throughout the brain that result in motor symptoms seriously affecting patients’ daily life. The present study aimed to explore the progressive co-changes in gray matter patterns in PD and to identify longitudinal neuroimaging biomarkers that could predict the progression of motor symptoms. Non-negative Matrix Factorization (NMF) was first used to decompose gray matter images from healthy samples into 7 latent factors, and the stability of these structural factors was then verified on an independent dataset. Parkinson’s patients (with baseline, 1-year follow-up, and 2-year follow-up data) and healthy controls (HC) from the Parkinson’s Progression Markers Initiative (PPMI) were used to examine the correlation between factor weights and motor-symptom-related Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) scores. Factor weights decreased with increasing disease duration in the first 6 factors. The XGBoost prediction model demonstrated that Factors 2 (motor function), 3 (perceptual processing), and 7 (cerebellum) played pivotal roles in longitudinally predicting MDS-UPDRS-Ⅱ scores, whereas Factors 3 and 5 (subcortical basal ganglia) accounted for most of the change in MDS-UPDRS-Ⅲ. Our research indicated that the NMF factors could capture the progressive alterations of structural architectures in PD, and that the factor weights were capable of predicting clinical motor symptoms. This provides new perspectives for exploring the neural mechanisms underlying the disease and for future clinical diagnostic and therapeutic approaches associated with disease progression.
Introduction
Parkinson’s disease (PD) is a chronic and progressively debilitating neurodegenerative disorder. It is clinically characterized by cardinal motor symptoms including bradykinesia, resting tremor, muscle rigidity, and postural instability1. These movement symptoms significantly impair patients’ quality of life, leading to progressive disability and compromised activities of daily living. As the disease advances, PD patients exhibit extensive neurodegenerative changes throughout the brain, manifesting as progressive functional and structural alterations at both macroscopic and microscopic levels2,3. However, due to the inherent pathogenic heterogeneity and diverse progression patterns, the development of effective clinical interventions remains challenging. Emerging evidence suggests that clinical performance and neural degeneration patterns exhibit distinct characteristics across different disease stages4. Consequently, establishing validated longitudinal biomarkers and implementing dynamic monitoring frameworks for neurodegeneration quantification arises as an imperative research priority. Decoding patterns of progressive neural reorganization at the whole-brain level across distinct stages of the disease course will contribute to clarifying the neural mechanism of PD over time.
Magnetic resonance imaging (MRI) is a non-invasive technique with standardized acquisition parameters that enables the quantification of macroscopic changes in brain regions. Previous studies have reported gray matter (GM) volume alterations in both cortical and subcortical structures5,6, which demonstrate significant associations with PD motor-symptom severity. Past investigations have consistently identified reduced GM volume in both brain hemispheres of PD patients, particularly affecting key brain regions including the amygdala7, putamen8,9, frontal lobe10, and thalamus11. However, these cross-sectional studies, focusing on specific time points, are inherently limited in tracking how brain structures evolve over time in PD. This limitation becomes particularly apparent given the substantial differences in individual progression rates, which sometimes even lead to contradictory findings across studies12,13.
Longitudinal studies offer a possible way to address these limitations4. Studies with follow-up periods ranging from 1.5 to 3 years in PD patients and healthy controls consistently indicate significant reductions in total GM volume among patients with PD in early to mild stages14,15. Blair et al.16 assessed GM density in patients across early and late disease stages and found that patients exhibited GM atrophy in the bilateral hippocampus in advanced stages. Additionally, a separate longitudinal study17 revealed gradual GM volume shrinkage in the bilateral caudate nucleus in PD from baseline to 12-month follow-up. These reports of localized GM atrophy in the PD brain fail to account for the complex interplay and pattern-level synergistic effects across the whole brain. This may be because the standard voxel-based morphometry (VBM) method analyzes voxels discretely: it focuses solely on local structural changes and neglects the correlations between brain regions that often signal latent co-degeneration. Consequently, there remains a gap in understanding the latent pattern of progressive changes in GM volume in PD. A comprehensive investigation of whole-brain-level GM alterations across distinct disease stages using longitudinal neuroimaging data is needed.
Non-negative matrix factorization (NMF) is an unsupervised multivariate analysis method18, similar to principal component analysis and independent component analysis19,20,21,22,23,24,25,26 in some aspects. However, due to their algorithms, these methods have limitations in the interpretability of their results compared to NMF, even with high prediction accuracy. Non-negativity constraints ensure that the decomposed matrices are free of negative components and weights, enabling the data to be described as a simple additive reconstruction of each decomposed component, which enhances the identification of potential structural patterns. In neuroimaging, NMF has been widely used in MRI image segmentation27,28, disease heterogeneity analysis29 and data dimensionality reduction30. Compared with the VBM method, NMF can take advantage of the differences between brain regions. Correlation information is used to cluster voxels with similar information into latent factors, facilitating the identification of potential distribution patterns of brain structure. With biologically meaningful features, predictions of clinical scales will be more reliable, practical, and interpretable.
Therefore, the present study used NMF to obtain the GM latent factors and identify the longitudinal structural changes in PD. Since this pattern-based approach provides a more comprehensive and biologically plausible representation of GM patterns than isolated regional measures, we first hypothesize that GM patterns (as captured by NMF-derived factors) remain relatively stable in HC during their sixties and seventies31, thus providing a reliable normative basis. Secondly, we hypothesize that deviations in these factor weights observed in PD relative to the established HC normative trajectory would reflect heterogeneous pathological change patterns specific to PD progression, especially in GM patterns associated with motor cognition. Therefore, we further hypothesize that the longitudinal factor weights in PD would significantly predict longitudinal changes in motor-symptom severity, which would function as clinically relevant neuroimaging biomarkers for tracking the longitudinal trajectory of motor symptoms in PD. To this end, we first decomposed GM images into latent factors from healthy adults in the Open Access Series of Imaging Studies 3 (OASIS-3), and then validated the stability on an independent dataset by verifying the similarity of the structural factors. The latent factors were subsequently applied as a basis for the GM images of the PD patients for further longitudinal analysis and prediction. Figure 1 shows the flowchart of data analysis.
a T1w images were processed in the standard SPM workflow to construct a GM matrix for NMF. b The NMF reconstruction algorithm was performed on PPMI under the most stable factors from HC_1. c After performing longitudinal ComBat, group-level and longitudinal analyses of the weights were conducted to investigate the trajectory of disease progression associated with each factor. The XGBoost predictive model was employed to validate the utility of these factors as biomarkers. GM: gray matter; TIV: total intracranial volume; NMF: non-negative matrix factorization; NNBP: non-negative basis pursuit.
Results
The decomposed latent GM factors
Following the NMF procedures outlined in the “Methods” section, we observed that the reconstruction errors in both datasets exhibited similar distributions (Fig. S1a, b). Notably, as the number of decomposition factors exceeded 5 and the sparsity surpassed 0.3, the NMF reconstruction errors gradually stabilized. Specifically, with 7 decomposition factors and a sparsity of 0.4, the average similarity of the decomposition factors between the HC_1 and HC_2 datasets reached 0.75 (Fig. S1c).
Employing the optimal decomposition parameters (k = 7, λ = 0.4), we obtained the most stable decomposed latent GM factors, denoted as \(W_{\mathrm{HC\_1}}\) (Fig. 2a). The decomposed results of the HC_2 dataset under the same parameters are shown in Fig. S2 for comparison. Factor 1 predominantly occupied the frontal lobe area, while Factor 2 was situated in the supplementary motor area and precentral gyrus. Factor 3 covered the middle temporal gyrus, precuneus, and inferior occipital gyrus, and Factor 4 spanned the pericalcarine cortex and superior occipital gyrus. Factor 5 encompassed mainly the basal ganglia, while Factor 6 included the amygdala, parahippocampal gyrus, and inferior temporal gyrus. Factor 7 was primarily distributed in the cerebellum area.
a Factor 1: higher cognitive processing; Factor 2: motor function; Factor 3: perceptual processing; Factor 4: visual processing; Factor 5: subcortical basal ganglia; Factor 6: emotion processing; Factor 7: cerebellum. The darker color indicates a higher contribution at the spatial location for the factors. b The patterns of longitudinal weight change of each factor. *Significant between-group differences after Tukey HSD (p < 0.05) *<0.05; **<0.01; ***<0.001; ****<0.0001. c Heatmap shows the correlation coefficients for each factor with the 36 terms of interest. A darker color represents a high correlation coefficient.
Longitudinal change trajectory of factor weights in PD
A linear mixed-effects model (LMM) with covariates regressed out was applied to the PPMI longitudinal data after site/scanner confounds were harmonized with longitudinal ComBat32. Post-hoc pairwise comparisons were corrected using Tukey’s honest significant difference (Tukey HSD). As shown in Fig. 2b, PD GM volume followed a declining trajectory as the disease progressed. Specifically, several factors decreased significantly from baseline to the 1-year follow-up (Factors 1, 3, and 4: p < 0.0001; Factor 6: p < 0.001; Tukey HSD: q < 0.05). Furthermore, Factors 4 and 6 showed a significant decline from the 1-year to the 2-year follow-up (Factors 4 and 6: p < 0.05; Tukey HSD: q < 0.05). All factors showed a significant weight decrease over 2 years compared with baseline (Factors 1, 3, 4, and 6: p < 0.0001; Factor 2: p < 0.01; Factor 5: p < 0.05; Tukey HSD: q < 0.05), except for Factor 7, which demonstrated a unique trajectory.
Meta-analytic function decoding of factors
To decode the psychological and physiological functions of the derived factors, we compared the spatial pattern of each factor with the functional anatomy of the human brain using NiMARE. A total of 36 terms strongly correlated with the factors were selected, and each factor showed a distinct functional profile. A heatmap was then generated to visualize the potential functions associated with each factor (Fig. 2c). Specifically, Factor 1 was associated with higher cognitive processing, such as decision-making, personality, social behavioral regulation, executive function, and cognitive control; Factor 2 was correlated with motor control concepts, including motor and action terms; Factor 3 centered on perceptual processing, including visual, auditory, action, and observation terms; Factor 4 showed relatively concentrated correlations with terms related to visual perception and navigation; Factor 5 was linked to concepts related to incentive and reward; Factor 6 highlighted affective processing and emotion regulation; and Factor 7 peaked for navigation and motor terms.
Longitudinal motor-symptom severity prediction
The XGBoost regression model showed that the factor weights successfully predicted the longitudinal motor-symptom severity measured by MDS-UPDRS II and III. When baseline factor weights were used to predict 1-year follow-up MDS-UPDRS-II (Spearman’s ρ = 0.4715, 95% CI: [0.2671, 0.6759], p < 0.001, MSE = 8.9928), Factors 3 and 7 made the predominant contributions (Fig. 3a). When baseline and 1-year follow-up factor weights were used to predict 2-year follow-up MDS-UPDRS-II (Spearman’s ρ = 0.4543, 95% CI: [0.2457, 0.6629], p < 0.001, MSE = 19.4617), baseline Factors 2 and 3 exhibited notable importance, and 1-year follow-up Factor 4 showed relative importance (Fig. 3c).
For each set, the components are presented from left to right as follows: correlation graph of CV results; feature contribution (blue: baseline; red: 1 year). a Baseline weights ->1-year MDS-UPDRS-II. b Baseline weights ->1-year MDS-UPDRS-Ⅲ. c Baseline and 1-year follow-up weights ->2-year MDS-UPDRS-Ⅱ. d Baseline and 1-year follow-up weights ->2-year MDS-UPDRS-Ⅲ. e Summary of the results above. Respectively, the longitudinal statistical results for MDS-UPDRS-II and III are presented, with the most powerful predictive factor identified for each disease stage through the prediction model on CV.
When baseline factor weights were used to predict 1-year follow-up MDS-UPDRS-III (Spearman’s ρ = 0.4984, 95% CI: [0.3008, 0.6959], p < 0.0001, MSE = 59.3074), Factor 3 emerged as the leading predictor (Fig. 3b). When baseline and 1-year follow-up factor weights were used to predict 2-year follow-up MDS-UPDRS-III (Spearman’s ρ = 0.5625, 95% CI: [0.3828, 0.7422], p < 0.0001, MSE = 61.1864), baseline Factor 5 emerged as a critical contributor, while baseline Factor 3 showed relative importance, and its 1-year follow-up values retained substantial influence (Fig. 3d). The prediction behavior is summarized in Fig. 3e.
Discussion
Our study explored the latent structure of GM in healthy elderly brains using NMF and identified 7 factors corresponding to different covariance patterns. The factors decomposed the GM into the frontal lobe area, the motor area, the perceptual processing area, the visual processing area, the subcortical basal ganglia, the emotion processing area, and the cerebellum area. These factors were robust when applied to an independent dataset (Fig. S2). Through further longitudinal analysis in PD, we found that the weights of these factors declined gradually and consistently over the 2-year follow-up, indicating reductions in GM volume, except for Factor 7 (cerebellum), which exhibited an inverted U-shaped trajectory (Fig. 2). The XGBoost model showed that the weights could predict the longitudinal clinical scores of MDS-UPDRS-Ⅱ and Ⅲ in PD, and the factors contributing most to the prediction were identified. The findings revealed distinct patterns in how these factors contribute to predicting symptom severity as the disease progresses.
The MDS-UPDRS-Ⅱ captures motor-related daily living experiences in Parkinson’s disease33. In both longitudinal prediction models of MDS-UPDRS-Ⅱ, Factor 3 (perceptual processing) retained marked feature importance during disease progression (feature importance from 0.2596 at 1-year prediction to 0.4535 at 2-year prediction). Factor 3 covered a hub for motion perception34, and its inferior occipital gyrus component integrated visual inputs for motor planning35. While some have argued that occipital-temporal atrophy primarily reflects comorbid Lewy body pathology36,37, we found that Factor 3 specifically predicted motor (MDS-UPDRS-Ⅱ and Ⅲ) scores, supporting its role in visuomotor integration rather than pure dementia progression38,39,40. These regions played a crucial role in motor control, visual feedback, and cognitive-motor coordination. Factor 7 (cerebellum) suggested an important role of the cerebellum in early PD progression. Notably, in the longitudinal GM volume analysis, the weights of Factor 7 exhibited a distinct trajectory without significant differences between time points: an inverted U-shaped trajectory, rather than a decline in cerebellar GM volume, was observed over time. Other studies on patients with movement disorders have also reported increased cerebellar volume in this age group, attributing it to a possible compensatory mechanism in response to functional impairments41,42,43,44,45,46. Further functional studies are needed to clarify the causal relationship between cerebellar activity and early PD pathology.
In the second-year prediction of MDS-UPDRS-Ⅱ, the growing significance of baseline Factor 2 (motor function) highlighted the progressive disruption of premotor cortical networks, which are critical for self-initiated movement and autonomous action initiation47. This dysfunction likely exacerbated difficulties in executing routine motor tasks, as the brain’s ability to generate spontaneous movement became increasingly compromised48,49. Concurrently, the emerging importance of 1-year atrophy in Factor 4 (visual processing) introduced a new layer of complexity: reduced gray matter volumes in these domains would impair visual-motor coordination and navigation50 and consequently affect motor symptoms in the following year as the disease progressed. Over the 2-year follow-up, the weights associated with Factors 2, 3, and 4 showed a significant progressive decline, indicating increasing difficulties in the executive integration of action plans51. Together, these findings underscore how PD progression transforms motor behavior from a relatively simple system into a complex one that becomes increasingly reliant on frontal-parietal-cerebellar-cortical interactions to sustain functional independence52.
Factors 3 and 5 demonstrated robust predictive utility across disease progression for MDS-UPDRS-III scores, a relatively objective clinician-rated scale assessing motor impairment severity in PD. Factor 3 showed consistent importance across all MDS-UPDRS-Ⅲ predictions, while Factor 5 played the most critical role in the 2-year prediction of the scale. Their combined degeneration could impair visuomotor coordination, a known contributor to overt gait dysfunction in PD53,54. Factor 5 was mostly composed of the basal ganglia, one of the crucial subcortical structures in the human central nervous system, which influences motor ability, cognitive function, and emotional behavior at multiple levels55,56. Previous studies have revealed that patients with PD experience significant loss of GM volume in basal ganglia regions such as the nucleus accumbens, amygdala, and caudate nucleus as the disease progresses5,6. The temporal shift in the primary baseline predictors probably suggests that patients suffered more severe and more obvious motor dysfunction as the disease progressed.
The present study has several limitations. Firstly, we assumed that the structural pattern factors obtained by NMF are stable within the 60–70 age range, and the factors acted as a normative basis in our study. Although gray matter atrophy patterns in healthy individuals have been shown to be highly stable in the 60–70 age range31, strictly matching the age range across all datasets and including matched longitudinal controls would help quantify subtle age-related GM loss and yield cleaner, more detailed comparisons. The absence of age-matched healthy controls in our study means that long-term changes could not be conclusively separated from normative aging effects. Secondly, regarding the method itself, NMF seeks a normative basis from the 5,855,005 GM voxels, which serve as the elements of the distributed GM pattern. In our study, seven factors were detected in HC, which represent the main skeleton of the GM in HC. To balance fine structural detail against the main basis, we used a smoothing kernel of 8 mm, as prior studies suggested57,58. Therefore, unlike standard VBM analysis, which focuses solely on local brain structural changes, some small structures, such as the substantia nigra, might not be included in the factors obtained by NMF. Finally, the predictive performance for clinical scales, while statistically significant, remained moderate in our limited sample. Although the feature importance remained relatively stable across the different models, our predictive models are better considered exploratory tools with limited clinical applicability.
In sum, our study leverages NMF to map the heterogeneity of longitudinal gray matter change patterns in PD. We identified seven distinct neuroanatomical factors in HC that serve as normative reference patterns and that were consistent in an independent dataset. The factors were found to be functionally associated with (1) higher cognitive processing; (2) motor function; (3) perceptual processing; (4) visual processing; (5) subcortical basal ganglia; (6) emotion processing; (7) cerebellum. Crucially, the weights corresponding to the factors exhibited disease-specific longitudinal trajectories in PD and demonstrated significant predictive power for motor-symptom progression. Factors 2, 3, and 7 played pivotal roles in longitudinally predicting MDS-UPDRS-Ⅱ scores, whereas Factors 3 and 5 accounted for most of the change in MDS-UPDRS-Ⅲ, suggesting differentiated GM elements that characterize the progressive changes in motor-related daily living experiences and in motor impairment severity, respectively. The proposed data-driven framework provides a novel approach for characterizing heterogeneity in disease progression in this neurodegenerative disease and points to potential quantitative neuroimaging biomarkers of the pathological progression of PD.
Methods
Participants
Three datasets were included in this study, all of which were approved by the ethical review boards of the respective research institutions.
(1) Dataset 1 (HC_1) was sourced from OASIS-3. OASIS-3 is a publicly available neuroimaging database developed by Washington University in St. Louis, encompassing multiple age groups59. It includes MRI data from healthy adults, individuals with mild cognitive impairment, and patients with Alzheimer’s disease, offering a substantial repository of high-quality brain imaging data for research purposes. Following OASIS-3’s inclusion criteria for healthy subjects, this study included 199 participants aged between 50 and 85 years, with Mini-Mental State Examination (MMSE) > 24 and Clinical Dementia Rating Scale (CDR) = 0, indicating healthy controls.
(2) Dataset 2 (HC_2) was derived from the collaborative efforts of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Neuroimaging in Frontotemporal Dementia (NIFD). ADNI is a longitudinal research endeavor spanning multiple research centers, aimed at identifying and validating clinical indicators, imaging characteristics, genetic markers, and biochemical indicators for the early identification and monitoring of Alzheimer’s disease60. NIFD, on the other hand, provides longitudinal clinical and imaging data related to frontotemporal lobar degeneration61. By combining the enrollment criteria of these two studies concerning healthy elderly subjects, we included 163 healthy controls aged between 50 and 85, with MMSE > 24 and CDR = 0. Clinical information for HC_1 and HC_2 is summarized in Table 1.
Table 1 HC_1 and HC_2 subject information table
The Parkinson’s data were sourced from the PPMI. PPMI is a large-scale, multinational, multicenter study dedicated to collecting and publicly sharing clinical data, genomic information, patient-reported outcomes, and imaging study results related to Parkinson’s disease62. PPMI baseline inclusion criteria for PD patients included: (1) exhibiting at least two motor symptoms; (2) being diagnosed with PD no more than 2 years ago and in the early clinical stage of the disease at baseline; (3) no symptomatic treatment within 6 months post-baseline; (4) presence of Dopamine Transporter (DAT) deficiency. Healthy subjects included in the study had to exhibit no obvious neurological impairment, have no first-degree family member with PD, and score above 26 on the Montreal Cognitive Assessment (MoCA). Based on these criteria, we selected 48 healthy individuals (baseline data only) and 78 PD patients (with data at baseline, 1-year follow-up, and 2-year follow-up) from the PPMI database. Table 2 summarizes baseline demographic and clinical data.
Ethical approval was obtained from the local ethics committees for each original study. For the HC_1 and HC_2 data (OASIS-3, ADNI, and NIFD), approval was granted by the Institutional Review Board of Washington University School of Medicine, the WW-ADNI Resource Allocation Review Committee, and the Trial Innovation Network at Johns Hopkins University, and local ethics committees at all sites approved the studies. PPMI data were approved by the ethical standards committee on human research at each participating institution. All subjects gave written informed consent in accordance with the Declaration of Helsinki prior to enrollment. As this study involved secondary analysis of existing de-identified data, no new ethical approval was required from the ethics committees for the current report.
MRI acquisition
All magnetic resonance imaging scans adhered to the standard protocols established by their respective studies. Sagittal 3D T1-weighted (T1w) images were acquired using the gradient echo/inversion recovery (GR/IR) sequence. Scanning parameters per study are detailed in Table 3, in accordance with the data inclusion criteria of their respective research plans.
Image preprocessing
Each participant’s T1w images were processed using MATLAB 2018a and the CAT12 toolbox in SPM 12 (https://neuro-jena.github.io/cat/). The preprocessing steps included the following: (1) Manual correction of the origin of all T1w images to align the anterior commissure and posterior commissure on the same horizontal line. (2) Segmenting the images to extract three tissue components: gray matter, white matter, and cerebrospinal fluid. (3) Normalizing the gray matter images to the Montreal Neurological Institute template. (4) Modulating the gray matter voxel density into volumes. (5) Correcting the gray matter volume by dividing it by the total intracranial volume to mitigate the impact of multi-center site acquisition on the results. (6) Smoothing the gray matter images using an 8 mm full-width at half maximum Gaussian kernel (Fig. 1a).
Data harmonization
We implemented the validated longitudinal ComBat method to harmonize GM factor weights across multicenter PPMI datasets using its implementation in R with parametric empirical Bayes estimation (https://github.com/jcbeer/longCombat?tab=readme-ov-file). This technique, extended from cross-sectional ComBat63, effectively removes non-biological variance induced by differing MRI scanners and acquisition protocols while preserving longitudinal within-subject dependencies. Harmonization was performed on all seven GM factor weights, with diagnostic group (PD/HC), age, sex, total intracranial volume (TIV), and education years specified as biological covariates to retain. Scanner site and longitudinal time points were modeled as batch effects.
Non-negative matrix factorization
NMF produces a sparse, part-based data representation under the constraint of non-negativity18, and its result can be viewed as an additive combination of factors and weights. We used sparse non-negative matrix factorization under \(\ell^{0}\)-norm constraints64, which can specify the number of non-zero elements in the decomposition factor or weight. The package for NMF is available at https://github.com/smatmo/l0-sparse-NMF. Its mathematical definition is as follows:
\[\min_{W,H}\ \lVert X - WH\rVert_{F}^{2}\quad \text{subject to}\quad W \ge 0,\ H \ge 0,\ \lVert W_{:,i}\rVert_{0} \le L\ \ \forall i \tag{1}\]
In the formula, X is an m × n-dimensional non-negative GM matrix, where m represents the number of GM voxels, and n represents the number of subjects. This process is specifically carried out for the GM voxels in the brain, using a GM template sourced from the GM probability map in the SPM12 toolbox, with a probability threshold set to 0.25. The final output of segmentation, for each subject, was a 3D image registered to the GM template and with a size of 169 × 205 × 169. W is an m × k matrix, where k is the number of GM matrix decomposition factors and \(k \le \min(m, n)\). H has dimensions of k × n and represents the weight of each subject on each GM latent factor. L represents the maximum number of non-zero elements in each column of W.
NMF followed a two-stage iterative approach, as illustrated in the Algorithm64. We first calculated an optimal, unconstrained solution for the basis matrix W (with fixed H) in Step 3 using sparseNNLS.m. The \(\ell^{0}\)-constraints were then satisfied by projecting each basis vector onto the closest non-negative vector in Euclidean space that meets the constraint (Steps 4–6). Step 7 updated H while maintaining the sparse structure of W and updating its non-zero entries. The technique for unconstrained NMF does not increase \(\lVert X - WH\rVert^{2}\) and typically reduces the objective through the following multiplicative update rules18:
\[H \leftarrow H \otimes \left(W^{T}X\right) / \left(W^{T}WH\right) \tag{2}\]
and
\[W \leftarrow W \otimes \left(XH^{T}\right) / \left(WHH^{T}\right) \tag{3}\]
where ⊗ and / denote element-wise multiplication and division, respectively. Therefore, Step 7 in the Algorithm can be implemented by executing these updates for several iterations.

In our script, the “num” was set to 30. Reproducibility was quantified by the similarity of factors between the HC_1 and HC_2 cohorts, computed as the mean correlation between corresponding factor pairs (matched using the Hungarian algorithm, available at https://github.com/ondrejdee/hungarian/).
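As an illustration of the two-stage procedure described above, the following Python sketch runs a toy version of \(\ell^{0}\)-constrained NMF. It is not the MATLAB package used in this study: the function names (`project_l0_columns`, `sparse_nmf`), the use of `scipy.optimize.nnls` for the unconstrained step, the iteration counts, and the direct use of L (rather than a sparsity ratio λ) are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def project_l0_columns(W, L):
    """Keep the L largest entries of each column of W and zero the rest."""
    W = W.copy()
    m = W.shape[0]
    if L < m:
        # Row indices of the (m - L) smallest entries in every column.
        drop = np.argsort(W, axis=0)[: m - L, :]
        np.put_along_axis(W, drop, 0.0, axis=0)
    return W


def sparse_nmf(X, k, L, n_outer=30, n_inner=10, eps=1e-12, seed=0):
    """Toy l0-constrained NMF: X (m x n) ~ W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    H = rng.random((k, n))
    W = np.zeros((m, k))
    for _ in range(n_outer):
        # Unconstrained non-negative least squares for W with H fixed,
        # solved one voxel (row of X) at a time.
        for i in range(m):
            W[i, :] = nnls(H.T, X[i, :])[0]
        # Enforce at most L non-zero entries per column of W.
        W = project_l0_columns(W, L)
        # Multiplicative updates: zeros in W stay zero, so the sparse
        # structure is kept while the non-zero entries are refined.
        for _ in range(n_inner):
            H *= (W.T @ X) / (W.T @ W @ H + eps)
            W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The row-wise NNLS loop is only practical for small matrices; at whole-brain scale a dedicated solver such as the one in the cited package would be needed.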
Reconstruction error and decomposition parameters
We aimed for the NMF latent factors to be consistent across different datasets. Therefore, we initially computed the reconstruction error across varying numbers of decompositions (k: 2–20) and sparsity values (λ: 0.1–0.9) in both the HC_1 and HC_2 datasets. The NMF reconstruction error was quantified as the Frobenius norm of the difference between the original input gray matter matrix and its corresponding reconstructed matrix:
\[\epsilon_{\lambda}^{k} = \lVert X - W_{\lambda}^{k} H_{\lambda}^{k} \rVert_{F} \tag{4}\]
In the formula, \({\epsilon }_{\lambda }^{k}\) represents the decomposition error when the number of decompositions is k and the sparsity is λ, and \(W_{\lambda}^{k}\) and \(H_{\lambda}^{k}\) are the corresponding factor and weight matrices.
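The Hungarian matching algorithm was then used to match the decomposition factors of the two datasets, and the average Pearson similarity of the matched factors was calculated to measure the repeatability of NMF across datasets, as shown in Equation [5]:
\[\bar{r}_{k,\lambda} = \frac{1}{k}\sum_{i=1}^{k} r\!\left(w_{i}^{\mathrm{HC\_1}},\, w_{\pi(i)}^{\mathrm{HC\_2}}\right) \tag{5}\]
Here \(r(\cdot,\cdot)\) is the Pearson correlation, \(w_{i}\) is the i-th factor (column of W), and \(\pi(i)\) is the HC_2 factor matched to the i-th HC_1 factor.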
In the formula, \(\bar{r}_{k,\lambda}\) represents the average Pearson similarity of the two datasets when the number of decompositions is k and the sparsity is λ. The k and λ that maximized \(\bar{r}_{k,\lambda}\) were then selected as the optimal decomposition parameters. In line with previous research65, the NMF procedure was performed 100 times to reduce the impact of random initializations. The final factors were taken from the run that was most similar to the decompositions obtained in the remaining 99 runs.
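The matching and averaging step can be sketched in a few lines of Python. This is an illustrative stand-in, not the implementation used here: it relies on SciPy's `linear_sum_assignment` as the Hungarian-type solver, and the function name `matched_similarity` and the toy matrices are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def matched_similarity(W1, W2):
    """Mean Pearson correlation between optimally matched columns of W1 and W2."""
    k = W1.shape[1]
    # Cross-correlation block: entry (i, j) is corr(W1[:, i], W2[:, j]).
    corr = np.corrcoef(W1.T, W2.T)[:k, k:]
    # One-to-one assignment that maximizes the total correlation.
    rows, cols = linear_sum_assignment(corr, maximize=True)
    return corr[rows, cols].mean(), cols


# Toy check: W2 is a column-permuted copy of W1.
rng = np.random.default_rng(0)
W1 = rng.random((50, 4))
perm = [2, 0, 3, 1]
W2 = W1[:, perm]
mean_r, match = matched_similarity(W1, W2)
```

Repeating this over a grid of (k, λ) values and keeping the pair with the largest mean similarity mirrors the selection rule described above.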
Finally, we used the reconstruction algorithm to calculate the decomposition weights of the gray matter matrices of PPMI baseline healthy subjects and of PD patients at different time points, with the basis constrained to the most stable decomposition result \(W_{\mathrm{HC\_1}}\), as shown in Equation [6].
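To make the idea of projecting new subjects onto a fixed basis concrete, the sketch below estimates non-negative weights for each subject by plain non-negative least squares. This is a simplified stand-in under stated assumptions: Fig. 1 names non-negative basis pursuit (NNBP) for this step, whereas the code uses `scipy.optimize.nnls` without any sparsity penalty, and the function name `project_weights` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def project_weights(X_new, W):
    """For each subject (column of X_new), solve min_{h >= 0} ||x - W h||_2."""
    cols = [nnls(W, X_new[:, j])[0] for j in range(X_new.shape[1])]
    return np.column_stack(cols)  # shape: (k, n_subjects)


# Toy check: data generated exactly from a known non-negative H.
rng = np.random.default_rng(0)
W = rng.random((30, 4))        # fixed basis (voxels x factors)
H_true = rng.random((4, 6))    # weights (factors x subjects)
X_new = W @ H_true
H_est = project_weights(X_new, W)
```

Because the basis is held fixed, weights from different groups and time points are expressed in the same coordinate system and can be compared directly.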
Meta-analytic functional association mapping
Probabilistic functional profiles of the identified factors were decoded using Neurosynth v0.4.1 (available at https://github.com/neurostuff/NiMARE), a large-scale meta-analysis platform synthesizing over 15,000 published fMRI studies66. To mitigate confounding effects and enhance analytical specificity, we selected the representative regions of each factor for meta-analysis. We quantified associations with 36 curated functional terms spanning affective, cognitive control, sensory, and motor domains, with a frequency threshold of 0.001. The output values were scaled to the range 0–1 for visualization.
MDS-UPDRS for model validation
The MDS-UPDRS was developed to provide a comprehensive assessment of PD symptoms and their impact on daily functioning33. The scale was established in response to the need for a standardized tool that captures the multifaceted nature of PD, encompassing both motor and non-motor symptoms, and it reflects patients’ quality of life67. The MDS-UPDRS consists of four parts: Part I evaluates non-motor experiences of daily living, Part II assesses motor experiences of daily living, Part III focuses on the motor examination, and Part IV addresses motor complications. Together, these parts allow a thorough evaluation of disease progression and treatment effects. The utility of the MDS-UPDRS in longitudinal datasets has been demonstrated68,69. Since our research focused mainly on the evolution of motor symptoms over time, Parts Ⅱ and Ⅲ of the scale were selected to characterize and evaluate each patient’s disease trajectory.
XGBoost prediction model
To investigate whether the weights of the NMF factors can predict longitudinal clinical scale scores, XGBoost70 was used to predict the MDS-UPDRS scores. This choice was based on a comparison with a range of models, including linear regression, Decision Tree, Extra Tree, AdaBoost, and Random Forest (Supplementary Material 1). We implemented a multi-stage validation approach to develop and evaluate the predictive model. First, we randomly partitioned the dataset into a training set and an independent test set at a 3:1 ratio. To optimize model performance while mitigating the risk of overfitting, we ran 5-fold cross-validation (CV) on the training set with grid-search hyperparameter tuning. Model performance during cross-validation was assessed using the Mean Squared Error (MSE) between the predicted and observed scale scores, with a significance level of p = 0.05. The best configuration, selected from the cross-validation results, was then trained on the entire training set and evaluated on the independent test set. With limited data, CV can provide more stable performance estimates, but we report results on the independent test set as the measure of generalization (Supplementary Material 2). To further validate the significance of the predictive model, we conducted a permutation test with 5000 iterations, using a threshold of p < 0.001 for statistical significance of the observed predictive performance. Finally, we computed the feature importance of each predictive model across the entire dataset using the weight method, and visualized the results as radar charts. All clinical scale scores violated normality assumptions (Kolmogorov–Smirnov test, p < 0.01 for all scales). We therefore adopted Spearman’s rank correlation for all correlation analyses between the predicted and true scores.
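The permutation step can be sketched as follows. This is a simplified illustration, assuming that only the pairing between true and predicted scores is shuffled; the original analysis may instead have retrained the model on each permuted dataset. The function name and the synthetic scores are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def permutation_p_value(y_true, y_pred, n_perm=5000, seed=0):
    """Permutation p-value for the Spearman correlation of true vs. predicted scores."""
    rng = np.random.default_rng(seed)
    observed = spearmanr(y_true, y_pred)[0]
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the true scores breaks any real link to the predictions.
        null[i] = spearmanr(rng.permutation(y_true), y_pred)[0]
    # Add-one correction keeps the estimate away from an impossible p = 0.
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p


# Synthetic scores standing in for observed and predicted MDS-UPDRS values.
rng = np.random.default_rng(1)
y_true = rng.normal(size=60)
y_pred = y_true + rng.normal(scale=0.5, size=60)
rho, p = permutation_p_value(y_true, y_pred)
```

With 5000 permutations the smallest attainable p-value is 1/5001, so a threshold of p < 0.001 remains reachable.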
Statistical analysis
After reconstructing the model trained on HC_1 (mean age: 69.01) onto the younger PPMI cohort (mean age: HC 61.75; PD 61.88), we applied a linear adjustment to the factor weights (Supplementary Material 2)31. Two-sample t-tests and χ² tests were used to examine group differences in basic demographic variables. To assess differences in factor values across time points, we performed post-hoc pairwise comparisons for each factor using a linear mixed model (LMM):
In the formula, \({H}_{ijt}\) represents the projected weight of the j-th factor for subject i at time t. Age, education, and total intracranial volume (TIV) were normalized. \({u}_{i}\) and \({\epsilon }_{ij}\) are the random intercept and the error term, respectively, and both are assumed to be normally distributed. The models account for within-subject correlations through random intercepts while adjusting for key covariates, including age, sex, education, and TIV. Least squares means were used to estimate expected factor values at each time point:
Tukey’s HSD test was then used to control for multiple comparisons. All analyses were performed in R.
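For orientation, a random-intercept model consistent with the description above could take the following form. This is an assumed sketch, not the published equation: the coefficient names, the treatment of time as a fixed effect with one coefficient per time point, and the variance symbols are all assumptions, and the residual is indexed here by time as well as by subject and factor.

```latex
H_{ijt} = \beta_{0} + \beta_{t}\,\mathrm{Time}_{t} + \beta_{a}\,\mathrm{Age}_{i}
        + \beta_{s}\,\mathrm{Sex}_{i} + \beta_{e}\,\mathrm{Education}_{i}
        + \beta_{v}\,\mathrm{TIV}_{i} + u_{i} + \epsilon_{ijt},
\qquad u_{i} \sim \mathcal{N}(0, \sigma_{u}^{2}),
\quad \epsilon_{ijt} \sim \mathcal{N}(0, \sigma^{2})
```

Under this form, the least squares mean for a time point is the model-predicted factor weight at that time with the covariates held at their reference values, and the pairwise comparisons contrast these means between time points.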
Data availability
Data for this study are freely available in the public domain through https://ida.loni.usc.edu/. Specifically, the data were obtained from the following sources: Parkinson’s Progression Markers Initiative (https://www.ppmi-info.org/); Open Access Series of Imaging Studies 3 (https://sites.wustl.edu/oasisbrains/home/oasis-3/); Alzheimer’s Disease Neuroimaging Initiative (https://adni.loni.usc.edu/data-samples/adni-data/); Neuroimaging in Frontotemporal Dementia (https://www.allftd.org/data). For more detailed information, please see the “Methods” section.
Code availability
All code used to perform the analyses can be found at https://github.com/UESTC-nuero-lab/Longitudinal-NMF-identifies-the-altered-trajectory-of-motorsymptoms-in-Parkinsons-disease.
References
Williams-Gray, C. H. & Barker, R. A. Parkinson disease: defining PD subtypes - a step toward personalized management? Nat. Rev. Neurol. 13, 454–455 (2017).
Morris, H. R., Spillantini, M. G., Sue, C. M. & Williams-Gray, C. H. The pathogenesis of Parkinson’s disease. Lancet 403, 293–304 (2024).
Camacho, M. et al. Exploiting macro- and micro-structural brain changes for improved Parkinson’s disease classification from MRI data. npj Parkinsons Dis. 10, 43 (2024).
Filippi, M. et al. Progressive brain atrophy and clinical evolution in Parkinson’s disease. Neuroimage Clin. 28, 102374 (2020).
Charroud, C. & Turella, L. Subcortical grey matter changes associated with motor symptoms evaluated by the Unified Parkinson’s disease Rating Scale (part III): a longitudinal study in Parkinson’s disease. Neuroimage Clin. 31, 102745 (2021).
Lewis, M. M. et al. The pattern of gray matter atrophy in Parkinson’s disease differs in cortical and subcortical regions. J. Neurol. 263, 68–75 (2016).
Ibarretxe-Bilbao, N. et al. Progression of cortical thinning in early Parkinson’s disease. Mov. Disord. 27, 1746–1753 (2012).
Lisanby, S. H. et al. Diminished subcortical nuclei volumes in Parkinson’s disease by MR imaging. J. Neural Transm. Suppl. 40, 13–21 (1993).
Summerfield, C. et al. Structural brain changes in Parkinson disease with dementia: a voxel-based morphometry study. Arch. Neurol. 62, 281–285 (2005).
Gerrits, N. J. et al. Gray matter differences contribute to variation in cognitive performance in Parkinson’s disease. Eur. J. Neurol. 21, 245–252 (2014).
Cui, X. et al. Gray matter atrophy in Parkinson’s disease and the Parkinsonian variant of multiple system atrophy: a combined ROI- and voxel-based morphometric study. Clinics 75, e1505 (2020).
Karagulle Kendi, A. T., Lehericy, S., Luciana, M., Ugurbil, K. & Tuite, P. Altered diffusion in the frontal lobe in Parkinson disease. Am. J. Neuroradiol. 29, 501 (2008).
Potgieser, A. R. E. et al. Anterior temporal atrophy and posterior progression in patients with Parkinson’s disease. Neurodegener. Dis. 14, 125–132 (2014).
Mak, E. et al. Longitudinal whole-brain atrophy and ventricular enlargement in nondemented Parkinson’s disease. Neurobiol. Aging 55, 78–90 (2017).
Mollenhauer, B. et al. Monitoring of 30 marker candidates in early Parkinson disease as progression markers. Neurology 87, 168–177 (2016).
Blair, J. C. et al. Brain MRI reveals ascending atrophy in Parkinson’s disease across severity. Front. Neurol. 10, 1329 (2019).
Jia, X. et al. Longitudinal study of gray matter changes in Parkinson disease. AJNR Am. J. Neuroradiol. 36, 2219–2226 (2015).
Lee, D. D. & Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999).
Calhoun, V. D., Adali, T., Pearlson, G. D. & Pekar, J. J. A method for making group inferences from functional MRI data using independent component analysis. Hum. Brain Mapp. 14, 140–151 (2001).
Kawaguchi, H. et al. Principal component analysis of multimodal neuromelanin MRI and dopamine transporter PET data provides a specific metric for the nigral dopaminergic neuronal density. PLoS ONE 11, e0151191 (2016).
Salmanpour, M. R., Shamsaei, M., Hajianfar, G., Soltanian-Zadeh, H. & Rahmim, A. Longitudinal clustering analysis and prediction of Parkinson’s disease progression using radiomics and hybrid machine learning. Quant. Imaging Med. Surg. 12, 906–919 (2022).
Seki, M. et al. Diagnostic potential of multimodal MRI markers in atypical Parkinsonian disorders. J. Parkinsons Dis. 9, 681–691 (2019).
Steel, A., Garcia, B. D., Silson, E. H. & Robertson, C. E. Evaluating the efficacy of multi-echo ICA denoising on model-based fMRI. Neuroimage 264, 119723 (2022).
Ashburner, J. & Kloppel, S. Multivariate models of inter-subject anatomical variability. Neuroimage 56, 422–439 (2011).
McIntosh, A. R. & Misic, B. Multivariate statistical analyses for neuroimaging data. Annu. Rev. Psychol. 64, 499–525 (2013).
Tang, J. et al. Artificial neural network-based prediction of outcome in Parkinson’s disease patients using DaTscan SPECT imaging features. Mol. Imaging Biol. 21, 1165–1173 (2019).
Sauwen, N. et al. Semi-automated brain tumor segmentation on multi-parametric MRI using regularized non-negative matrix factorization. BMC Med. Imaging 17, 29 (2017).
Sun, P. et al. Tissue segmentation using sparse non-negative matrix factorization of spherical mean diffusion MRI data. Comput. Diffus. MRI 2019, 69–76 (2019).
Han, S. et al. Parsing altered gray matter morphology of depression using a framework integrating the normative model and non-negative matrix factorization. Nat. Commun. 14, 4053 (2023).
Anderson, A. et al. Non-negative matrix factorization of multimodal MRI, fMRI and phenotypic data reveals differential changes in default mode subnetworks in ADHD. Neuroimage 102, 207–219 (2014).
Bethlehem, R. A. I. et al. Brain charts for the human lifespan. Nature 604, 525–533 (2022).
Beer, J. C. et al. Longitudinal ComBat: a method for harmonizing longitudinal multi-scanner imaging data. Neuroimage 220, 117129 (2020).
Goetz, C. G. et al. Movement disorder society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS): scale presentation and clinimetric testing results. Mov. Disord. 23, 2129–2170 (2008).
Chen, Y. et al. Structural and functional differences of the thalamus between drug-naïve Parkinson’s disease motor subtypes. Front. Neurol. 14, 1102927 (2023).
Gan, L. et al. Alterations of structure and functional connectivity of visual brain network in patients with freezing of gait in Parkinson’s disease. Front. Aging Neurosci. 14, 978976 (2022).
Mehraram, R. et al. Functional and structural brain network correlates of visual hallucinations in Lewy body dementia. Brain 145, 2190–2205 (2022).
Gasca-Salas, C., Clavero, P., García-García, D., Obeso, J. A. & Rodríguez-Oroz, M. C. Significance of visual hallucinations and cerebral hypometabolism in the risk of dementia in Parkinson’s disease patients with mild cognitive impairment. Hum. Brain Mapp. 37, 968–977 (2016).
Camicioli, R. et al. Ventricular dilatation and brain atrophy in patients with Parkinson’s disease with incipient dementia. Mov. Disord. 26, 1443–1450 (2011).
Compta, Y. et al. Combined dementia-risk biomarkers in Parkinson’s disease: a prospective longitudinal study. Parkinsonism Relat. Disord. 19, 717–724 (2013).
Zhu, L. J. et al. Dentate nNOS accounts for stress-induced 5-HT(1A) receptor deficiency: implication in anxiety behaviors. CNS Neurosci. Ther. 26, 453–464 (2020).
Gellersen, H. M. et al. Cerebellar atrophy in neurodegeneration-a meta-analysis. J. Neurol. Neurosurg. Psychiatry 88, 780–788 (2017).
Linder, J. et al. Degenerative changes were common in brain magnetic resonance imaging in patients with newly diagnosed Parkinson’s disease in a population-based cohort. J. Neurol. 256, 1671–1680 (2009).
Messina, D. et al. Patterns of brain atrophy in Parkinson’s disease, progressive supranuclear palsy and multiple system atrophy. Parkinsonism Relat. Disord. 17, 172–176 (2011).
Ballanger, B. et al. Motor urgency is mediated by the contralateral cerebellum in Parkinson’s disease. J. Neurol. Neurosurg. Psychiatry 79, 1110–1116 (2008).
Yu, H., Sternad, D., Corcos, D. M. & Vaillancourt, D. E. Role of hyperactive cerebellum and motor cortex in Parkinson’s disease. Neuroimage 35, 222–233 (2007).
Cao, H. et al. A voxel-based magnetic resonance imaging morphometric study of cerebral and cerebellar gray matter in patients under 65 years with essential tremor. Med. Sci. Monit. 24, 3127–3135 (2018).
Haggard, P. Human volition: towards a neuroscience of will. Nat. Rev. Neurosci. 9, 934–946 (2008).
Wu, T., Hallett, M. & Chan, P. Motor automaticity in Parkinson’s disease. Neurobiol. Dis. 82, 226–234 (2015).
Rosenberg-Katz, K. et al. Gray matter atrophy distinguishes between Parkinson disease motor subtypes. Neurology 80, 1476–1484 (2013).
Wang, L. et al. Gray matter structural and functional brain abnormalities in Parkinson’s disease: a meta-analysis of VBM and ALFF data. J. Neurol. 272, 276 (2025).
Kostić, V. S. et al. Pattern of brain tissue loss associated with freezing of gait in Parkinson disease. Neurology 78, 409–416 (2012).
Melloni, M. et al. Cortical dynamics and subcortical signatures of motor-language coupling in Parkinson’s disease. Sci. Rep. 5, 11899 (2015).
Jin, C. et al. Abnormal functional connectivity density involvement in freezing of gait and its application for subtyping Parkinson’s disease. Brain Imaging Behav. 17, 375–385 (2023).
Nantel, J., McDonald, J. C., Tan, S. & Bronte-Stewart, H. Deficits in visuospatial processing contribute to quantitative measures of freezing of gait in Parkinson’s disease. Neuroscience 221, 151–156 (2012).
Stoessl, A. J., Lehericy, S. & Strafella, A. P. Imaging insights into basal ganglia function, Parkinson’s disease, and dystonia. Lancet 384, 532–544 (2014).
Blandini, F., Nappi, G., Tassorelli, C. & Martignoni, E. Functional changes of the basal ganglia circuitry in Parkinson’s disease. Prog. Neurobiol. 62, 63–88 (2000).
Varikuti, D. P. et al. Evaluation of non-negative matrix factorization of grey matter in age prediction. Neuroimage 173, 394–410 (2018).
Solis-Urra, P. et al. Early life factors and structural brain network in children with overweight/obesity: the ActiveBrains project. Pediatr. Res. 95, 1812–1817 (2024).
Marcus, D. S., Fotenos, A. F., Csernansky, J. G., Morris, J. C. & Buckner, R. L. Open access series of imaging studies (OASIS): longitudinal MRI data in nondemented and demented older adults. J. Cogn. Neurosci. 22, 2677–2684 (2010).
Petersen, R. C. et al. Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology 74, 201–209 (2010).
Heuer, H. W. et al. ALLFTD: characterization of Frontotemporal Lobar Degeneration (FTLD) disease trajectories through longitudinal assessment. Alzheimers Dement. 20, e093231 (2024).
Parkinson Progression Marker Initiative. The Parkinson Progression Marker Initiative (PPMI). Prog. Neurobiol. 95, 629–635 (2011).
Fortin, J.-P. et al. Harmonization of cortical thickness measurements across scanners and sites. Neuroimage 167, 104–120 (2018).
Peharz, R. & Pernkopf, F. Sparse nonnegative matrix factorization with ℓ0-constraints. Neurocomputing 80, 38–46 (2012).
Tang, S. et al. Reconciling dimensional and categorical models of autism heterogeneity: a brain connectomics and behavioral study. Biol. Psychiatry 87, 1071–1082 (2020).
Salo, T. et al. NiMARE: Neuroimaging Meta-Analysis Research Environment. Aperture Neuro 3, 1–32 (2023).
Skorvanek, M. et al. Relationship between the MDS-UPDRS and quality of life: a large multicenter study of 3206 patients. Parkinsonism Relat. Disord. 52, 83–89 (2018).
Skorvanek, M. et al. Differences in MDS‐UPDRS scores based on Hoehn and Yahr stage and disease duration. Mov. Disord. Clin. Pract. 4, 536–544 (2017).
Holden, S. K., Finseth, T., Sillau, S. H. & Berman, B. D. Progression of MDS-UPDRS scores over five years in de novo Parkinson disease from the Parkinson’s Progression Markers Initiative Cohort. Mov. Disord. Clin. Pract. 5, 47–53 (2018).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).
Acknowledgements
This study was funded by the National Natural Science Foundation of China (62173070, 62036003, 62333003, and 82372085), the Autonomous Region Science and Technology Plan Project (2024D01A15), the Science and Technology Bureau of Chengdu Program (2024-YF05-00873-SN), and Medical Research Project of Chengdu (2024150). We are also grateful to the participants in the MRI scans used here.
Author information
Contributions
X.H., Q.G., K.Z., and Y.W. designed the algorithm and analyzed the data; Q.G., X.H., and R.L. were mainly responsible for preparing the manuscript; K.Z., R.L., J.Y., and Q.C. revised the manuscript; F.L., H.C., and Q.G. supervised the project. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Hou, X., Zhou, K., Wu, Y. et al. Longitudinal non-negative matrix factorization identifies the altered trajectory of motor symptoms in Parkinson’s disease. npj Parkinsons Dis. 11, 263 (2025). https://doi.org/10.1038/s41531-025-01127-4