Abstract
Predicting outcomes for people with Parkinson’s (PwP) can enable better information provision, personalised treatments, and enhanced trial design. It is unclear which prognostic models are optimal for use. We systematically reviewed previously published prognostic models for PwP, assessed their quality, and made recommendations. We searched MEDLINE and EMBASE for studies developing or validating models predicting clinical outcomes in PwP. We assessed risk of bias and applicability using the PROBAST tool. We screened 1024 references and identified 25 studies (41 prognostic models). The most common outcomes were falls (11 studies), dementia (7), and motor complications (4). Most models made short-term predictions (60% ≤2 years). All studies had concerns about bias, e.g., inadequate population details (n = 16), suboptimal methods for missing data (n = 21), and no external validation (n = 22). 13 models had sufficient information to be used in practice. Further development and validation of prognostic models, following existing guidelines to reduce risk of bias, is needed.
Introduction
Parkinson’s disease (PD) is a progressive disorder, which often leads to poor outcomes, including falls, dementia, and shortened survival. Being able to predict individualised risk of such outcomes in PD has many advantages: (i) informing people with PD (PwP) how they may be impacted; (ii) improving recruitment, randomisation, and analysis of randomised controlled trials; (iii) enabling clinicians to offer targeted personalised treatment to PwP; and (iv) allowing case-mix correction when comparing outcomes over different hospitals or regions1,2. These benefits can best be realised with prognostic models. A prognostic model is a statistical tool which combines an individual’s characteristics to predict the probability of a specific outcome within a period of time.
Given the importance of model validation, we first clarify related terminology. Internal validation involves resampling from the same development dataset to test the model performance in the underlying population, while external validation involves assessing model performance in another independent dataset3. Calibration and discrimination are measures of model performance in validation. Calibration refers to the agreement between predicted risks from the model and observed outcomes. Three popular methods to assess calibration are mean calibration (comparing the overall observed outcome fraction with the average predicted risk), the calibration slope (which assesses under- or over-prediction in high- and low-risk PwP), and calibration plots4. Discrimination refers to the model’s ability to distinguish predicted risk between PwP who developed the outcome and those who did not, often measured with the C-statistic5.
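To make these measures concrete, here is a minimal sketch (on simulated data invented for illustration, with no connection to any reviewed model) of computing the C-statistic, mean calibration, and calibration slope:

```python
# Illustrative only: discrimination and calibration metrics for a binary
# outcome, computed on simulated data (not from any reviewed model).
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
true_lp = rng.normal(0, 1.5, 1000)                 # true linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-true_lp)))    # observed binary outcome
p = 1 / (1 + np.exp(-(0.3 + 0.8 * true_lp)))       # predicted risks from a (deliberately miscalibrated) model

# Discrimination: the C-statistic equals the area under the ROC curve for binary outcomes
c_stat = roc_auc_score(y, p)

# Mean calibration: overall observed outcome fraction relative to average predicted risk
mean_cal = y.mean() / p.mean()

# Calibration slope: regress the outcome on the logit of the predicted risk;
# a slope of 1 indicates no systematic over/under-prediction at the extremes
logit_p = np.log(p / (1 - p))
slope = sm.Logit(y, sm.add_constant(logit_p)).fit(disp=0).params[1]

print(f"C-statistic={c_stat:.2f}, mean calibration={mean_cal:.2f}, calibration slope={slope:.2f}")
```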
To date, there has been no published systematic review of prognostic models in PD. A systematic review of studies which identified PD subtypes using cluster analysis has been published6, but the aim of these studies is to make group-level, rather than individualised predictions. We therefore performed a systematic review of studies of prognostic models in PD to comprehensively describe existing prognostic models, assess their methodological quality and make recommendations for use in clinical practice.
Results
We identified 560 papers in MEDLINE and 1087 papers in EMBASE; one further paper was identified outside the formal search strategy. We removed 569 duplicates and excluded 994 papers by title and abstract screening. 84 papers were selected for full-text screening. 25 papers7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31, comprising 41 prognostic models, were eligible for inclusion (see Fig. 1).
Study populations and designs
15 studies (60%) were published since 20157,9,11,14,15,17,18,19,20,23,25,27,28,29,30,31 and one before 20108 (Table 1, Fig. 2). Most studies included European (40%)8,12,13,15,17,19,20,24,25,28, North American (12%)10,23,29, or Australian (12%)11,16,22 populations, or a combination of these (16%)14,18,26,27. 20 studies (80%) were prospective observational cohort studies7,8,9,10,11,13,14,15,16,17,19,20,21,22,23,24,27,28,29,31 and 7 studies (28%) were inception cohort studies14,15,19,20,24,27,28. Models from 7 studies (28%)14,15,19,20,27,28,29 had a defined time-point at which they could be used (i.e. at diagnosis or in early PD) (Table 1). 18 studies (72%)7,8,9,10,11,12,13,16,17,18,21,22,23,24,25,26,30,31 recruited PwP at various disease stages or did not define which PwP were recruited, so we were unable to identify at which time-points in the disease course the models were designed to be used. However, one study23 recruited PwP with disease durations ranging from 0 to 30 years and included disease duration as a predictor variable in the model, so its model could potentially be used throughout the disease course if adequately validated.
Outcomes of study
The most common prognostic outcome was falls/recurrent falls, which was predicted in 11 studies (44%)7,8,9,10,12,13,16,17,19,21,22. 7 studies (28%)12,18,19,23,27,28,31 predicted cognitive impairment/dementia, 4 studies (16%)12,15,25,26 predicted motor complications, 3 studies (12%)11,12,19 predicted freezing of gait, 3 studies (12%) predicted imbalance12,19,30, 2 studies (8%)18,20 predicted functional disability, 2 studies (8%)20,28 predicted a composite poor outcome, and single studies predicted depression14, mortality20, fracture risk24, difficulty doing hobbies19, and several other symptoms and signs12,29. The follow-up duration over which predictions were made varied from 3 months8 to 12 years20; most models (60%) made predictions over <2 years, and 4 studies18,20,25,28 had 5 or more years’ follow-up (Table 1).
Predictors in study
The number of predictors per model ranged from 3 to 998 (Table 1). 17 studies, comprising 24 prognostic models (59%), used variables which are simple to collect in clinical practice, but 7 studies, comprising 11 prognostic models (27%), included predictors that are not always routinely available in clinical practice, such as DAT imaging measurements, CSF biomarkers, or genetic polymorphism data (supplementary Table 1)13,14,18,23,25,27,31. In one study, 6 models (15%) were based on smartphone features, and the corresponding app/analysis pipelines are not available for routine use in clinical practice19. 8 studies dichotomised or categorised continuous/discrete predictors7,10,12,13,17,22,24,31. Across the 24 studies with 35 final models which specified the predictors, the most common predictors were age/age at onset (n = 25), sex (n = 15), and the original or Movement Disorder Society revision of the UPDRS (n = 12) (supplementary Table 2). In Fig. 3 we show the percentages of predictors included in the models for the two most common outcomes (falls/recurrent falls [13 models] and cognitive impairment/dementia [7 models]). We question the usefulness of previous falls as a predictor of future falls (included in 11 models7,8,9,10,13,17,21,22), because once PwP have started to fall, fracture risk is already present and physiotherapy interventions for falls and balance are already indicated.
Study sample sizes
5 studies (20%) had fewer than 100 participants8,9,13,15,30 (Table 1). Only 4 studies (16%) had an events per variable (EPV) ratio of at least 1010,17,18,20 (Table 1), the usual rule of thumb for the minimum EPV required for Cox or logistic regression modelling32, and many of the other studies had EPVs much less than 107,8,9,11,13,14,16,19,25,27,28,30,31. 4 studies (16%) did not give information about the number of events18,24,26,29 (Table 1).
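For illustration, the EPV calculation is simply the number of outcome events divided by the number of candidate predictor parameters; a minimal sketch with hypothetical numbers:

```python
# Illustrative events-per-variable (EPV) check with hypothetical numbers:
# the rule of thumb cited above asks for at least ~10 events per candidate
# predictor parameter in logistic or Cox regression models.
def events_per_variable(n_events: int, n_candidate_parameters: int) -> float:
    """EPV = number of outcome events / number of candidate predictor parameters."""
    return n_events / n_candidate_parameters

# e.g. a hypothetical falls model with 45 fallers and 8 candidate predictors
print(events_per_variable(45, 8))  # 5.6 -> well below the rule-of-thumb threshold of 10
```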
Model development
12 studies (48%) did not provide information on the number of participants lost to follow-up9,10,11,12,15,18,20,22,24,25,26,29,31 and 11 studies (44%) did not report the number of participants with missing data9,11,12,15,16,17,21,22,24,26,31 (supplementary Tables 3 and 4). 10 studies (40%) gave full information on missing data (number and imputation method)7,10,13,14,18,23,25,27,28,29. The most common method of handling missing data was complete case analysis (28%)7,10,13,15,18,25,29. 2 studies (8%) handled missing data with multiple imputation14,28 (Table 2). 8 studies (32%) transformed continuous predictors into dichotomous or categorical variables7,10,12,13,17,22,24,31 and 10 studies (40%) selected predictors by univariable analysis7,9,13,14,16,20,22,25,27,31 (supplementary Tables 1 and 5).
12 studies (48%) used logistic regression8,9,10,11,13,14,16,17,21,22,27,28 and 3 studies (12%) used machine learning (decision trees, XGBoost, and random forests) to build the prognostic model12,14,19. None of the machine learning models reported key predictor importance (e.g., SHAP values) or provided sufficient details for independent validation. 8 studies (32%) did not account for censoring and simply excluded censored participants from the analysis8,13,14,16,17,21,27,28. 10 studies (40%) used time-to-event survival analysis to build the prognostic models: 6 studies used Cox regression7,15,24,25,26,31; others used a frailty Cox model18,23, a Weibull parametric survival model20, and a dynamic prediction model29 (Table 2). Three studies reported checking the proportional hazards assumption in survival analysis7,18,20 (Table 2 and supplementary Table 5).
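As an illustration of the survival modelling approaches described above, the following sketch fits a Cox model and runs a proportional hazards assumption check using the lifelines library; the data and variable names (age, UPDRS) are hypothetical, not taken from any reviewed study.

```python
# Sketch: fitting a Cox model and checking the proportional hazards assumption
# (which only three of the reviewed studies reported doing). Data are simulated.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 300
df = pd.DataFrame({
    "age": rng.normal(65, 8, n),       # hypothetical predictors
    "updrs": rng.normal(30, 10, n),
})
lp = 0.03 * (df["age"] - 65) + 0.02 * (df["updrs"] - 30)
df["time"] = rng.exponential(6 / np.exp(lp))   # event times depend on the linear predictor
df["event"] = rng.binomial(1, 0.7, n)          # ~30% censored

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()

# Schoenfeld-residual-based check of the proportional hazards assumption
cph.check_assumptions(df, p_value_threshold=0.05)
```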
Model evaluation and performance
Two studies10,17 that aimed to externally validate previously published models did not use the original model equation to make predictions for PwP in their validation dataset3. Therefore, these 2 studies10,17 were not truly external validation studies. We classed these studies as model development in the PROBAST assessment (Tables 1 and 3).
Internal validation and model equation assessment apply only to model development studies (n = 24) (Table 1). 7 studies (28%) did not perform internal validation8,9,10,11,17,21,26, 7 studies (28%) did not provide clear information about whether internal validation had been applied in all model development procedures13,14,16,23,24,29,31, and 3 studies (12%) used split-data methods14,27,30 (supplementary Table 6). 15 studies (60%) used cross-validation or bootstrap resampling to assess optimism in model performance7,12,13,16,18,19,20,22,23,24,25,27,28,29,31 (supplementary Table 6). Only 3 studies (12%) performed both internal and external validation after model development18,20,28 (supplementary Table 6). One study18 did not give the number of events in the development and validation datasets (Table 1).
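For readers unfamiliar with optimism assessment, the sketch below illustrates a Harrell-style bootstrap optimism correction of the C-statistic on simulated data; the model and data are placeholders, not any reviewed model.

```python
# Sketch of bootstrap optimism correction for the C-statistic, one of the
# internal validation approaches discussed above. Simulated data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1]))))

def auc_of_fit(X_fit, y_fit, X_eval, y_eval):
    model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    return roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])

apparent = auc_of_fit(X, y, X, y)  # performance on the development data itself

optimism = []
for _ in range(200):                                       # bootstrap replicates
    idx = rng.integers(0, len(y), len(y))                  # resample with replacement
    boot_auc = auc_of_fit(X[idx], y[idx], X[idx], y[idx])  # apparent AUC in the bootstrap sample
    test_auc = auc_of_fit(X[idx], y[idx], X, y)            # bootstrap model tested on original data
    optimism.append(boot_auc - test_auc)

corrected = apparent - np.mean(optimism)
print(f"apparent C={apparent:.3f}, optimism-corrected C={corrected:.3f}")
```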
3 studies (12%) did not evaluate model performance8,12,21 (supplementary Table 7). 12 studies (48%) reported internal discrimination performance but did not report calibration performance7,13,16,18,19,23,24,25,26,29,30,31 and one external validation study15 reported discrimination performance without reporting calibration (Table 2). 6 studies (24%) used the Hosmer-Lemeshow goodness-of-fit test to assess internal calibration performance9,10,11,17,22,27 (supplementary Table 7). One study (4%) used both calibration plots and the calibration slope to present internal and external calibration performance28, one (4%) used calibration plots to present internal and external calibration performance20, and one (4%) used a calibration plot to present internal calibration performance only14 (supplementary Table 7).
Model reporting
9 studies (36%), including 13 models (32%), gave sufficient information for the models to be used in clinical practice11,14,18,20,26,27,28,29,30 (Table 2). 10 studies (40%) did not report the intercept or baseline hazard7,8,9,10,13,17,18,22,25,31. 5 studies (20%) did not provide the model equation or sufficient details to replicate the model12,19,21,23,24, and one study provided a plot of estimated coefficients instead of specific values16.
Risk of bias/applicability
We found 8 studies (32%) with inclusion and exclusion criteria that would be broadly generalisable to unselected populations with PD14,15,18,19,20,24,27,28 (supplementary Table 8), and therefore low concern about applicability (supplementary Table 9). 16 studies (64%) lacked details of important aspects of study design (e.g. recruitment methods/dates, diagnostic criteria)7,8,10,11,12,13,16,17,21,22,23,25,26,29,30,31 and 7 studies (28%) had selection criteria that could bias the studies towards healthier participants (e.g., excluding on the basis of comorbidities or older age), raising concerns about generalisability or risk of bias7,8,9,16,17,30,31 (supplementary Tables 8, 9 and 10).
Supplementary Table 11 contains the risk of bias results relating to the predictors studied. One study (4%) had risk of bias in the predictors as it used a retrospective cohort without stating how subjective predictors (e.g., depression, olfactory dysfunction) were measured25. 7 studies (28%) included predictors that may not be routinely available in clinical practice, such as CSF biomarkers or imaging data13,14,18,23,25,27,31, so these models may not be feasible in clinical practice, especially in resource-poor settings.
For the risk of bias relating to the outcomes in studies, one study (4%) had unclear risk of bias as it did not state the outcome definition12 (supplementary Tables 12 and 13). Outcome definitions in 2 studies (8%) may have been biased by determination with knowledge of predictor information, as the outcome definitions were subjective19,25 (supplementary Tables 12 and 13).
Discussion
We identified 25 prognostic model studies, comprising 41 prognostic models, published with the aim of predicting the individualised risk of future outcomes in PD. A wide range of clinical outcomes were studied; the most common was falls/recurrent falls. Most models made short-term predictions. None of the prognostic models had low risk of bias. The common analysis issues leading to risk of bias were: mishandling of missing data, including incorrect imputation (potentially leading to biased predictions and biased model performance); selecting predictors by univariable screening and overfitting from low EPV ratios (leading to over-estimated discrimination performance and to biased predictions, with over-estimation in those at higher risk of the outcome and under-estimation in those at lower risk33); and lack of external validation (leading to potential bias in model performance if used in different populations). Many of the included studies did not provide sufficient details of the models to enable use in clinical practice or research.
The review showed that some studies omitted basic information about the study population, which made it difficult to assess selection bias and applicability. Other studies had selection criteria which skewed study populations towards healthier subjects. Most studies were performed in Europe, the United States, and Australia, so non-Caucasian populations are under-represented.
Half of the studies did not report the number of participants lost to follow-up. As PD is slowly progressive, long follow-up durations are needed, and losses to follow-up are therefore common. Most models had too many predictors for the number of events, which carries a high risk of overfitting32,34 and therefore a high risk of poor performance.
The recommended method for handling missing data when data are missing at random is multiple imputation32. Missing at random means that systematic differences between the observed and missing data can be explained by associations with the observed data35. In this scenario, using single imputation, or deleting participants with missing values and conducting a complete case analysis, may cause selection bias. 12 studies did not mention missing data at all8,9,11,12,16,17,19,21,24,26,30,31; 8 studies deleted observations with missing data or used single imputation with no justification10,13,15,18,20,22,25,27; 2 studies assumed data were missing at random but deleted missing data7,29; 1 study imputed missing data with a restricted Boltzmann machine with adequate justification23; and 2 studies used multiple imputation with no justification14,28. Researchers should be aware that multiple imputation may lead to biased results when data are not missing at random and that a complete case analysis may be appropriate even when data are not missing completely at random36.
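As an illustration of the recommended approach, the sketch below creates several imputed datasets and pools logistic regression estimates with Rubin’s rules; the data are simulated, and sklearn’s IterativeImputer is only one of several possible imputation engines.

```python
# Minimal sketch of multiple imputation with Rubin's rules pooling, for data
# assumed missing at random. Simulated data; not from any reviewed study.
import numpy as np
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
n, m = 500, 10
X = rng.normal(size=(n, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.8, -0.5, 0.3])))))
X[rng.random((n, 3)) < 0.15] = np.nan  # introduce ~15% missingness in predictors

coefs, variances = [], []
for i in range(m):  # one imputation per iteration, varying the random seed
    X_imp = IterativeImputer(random_state=i, sample_posterior=True).fit_transform(X)
    fit = sm.Logit(y, sm.add_constant(X_imp)).fit(disp=0)
    coefs.append(fit.params)
    variances.append(np.diag(fit.cov_params()))

# Rubin's rules: pooled estimate is the mean across imputations; total variance
# combines within-imputation and between-imputation components.
coefs, variances = np.array(coefs), np.array(variances)
pooled = coefs.mean(axis=0)
total_var = variances.mean(axis=0) + (1 + 1 / m) * coefs.var(axis=0, ddof=1)
print(pooled, np.sqrt(total_var))
```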
Time-to-event models assume that censoring is uninformative, i.e., that the probability of being censored is independent of the outcome (the probability of experiencing the outcome in those who are censored is the same as in those who remain under follow-up). An example of informative censoring is when patients who drop out have more severe disease than those who remain under follow-up: their unobserved survival times would likely be systematically shorter than the survival times of those who remain, resulting in biased estimates. In our review, only one paper7 reported the number of patients lost to follow-up. Only 4 patients were lost to follow-up and the reason for the loss was not stated; while it is not clear whether this censoring was uninformative, the small number lost means it is unlikely to bias the predictions. All other studies that used time-to-event methods to account for censoring did not provide information about censored patients. We suggest that researchers report the number of patients censored before the end of the study (i.e. non-administrative censoring) and, if possible, provide reasons why. Methods to account for informative censoring include using inverse probability weights in the Cox model or joint models, and these should be considered in studies with higher rates of loss to follow-up37,38,39.
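A simplified sketch of inverse probability of censoring weighting is shown below; real applications typically use time-varying weights, and the dropout model, data, and variable names (e.g., “severity”) here are entirely hypothetical.

```python
# Sketch of inverse probability of censoring weighting (IPCW) in a Cox model,
# assuming dropout depends on a measured baseline covariate. This time-fixed
# version is a simplification of the time-varying weights used in practice.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 400
severity = rng.normal(size=n)
time = rng.exponential(5 / np.exp(0.5 * severity))
dropout = rng.binomial(1, 1 / (1 + np.exp(-severity)))  # sicker patients drop out more
event = np.where(dropout == 1, 0, 1)                    # dropouts are censored
df = pd.DataFrame({"time": time, "event": event, "severity": severity})

# Model the probability of remaining under follow-up given covariates,
# then weight each subject by the inverse of that probability.
p_obs = LogisticRegression().fit(df[["severity"]], 1 - dropout).predict_proba(df[["severity"]])[:, 1]
df["ipcw"] = 1.0 / p_obs

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event", weights_col="ipcw", robust=True)
cph.print_summary()
```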
None of the studies considered competing risks in their analysis. A competing risk occurs when one or more events preclude the occurrence of the event of interest (e.g., death precluding dementia). Ignoring competing risks can result in biased predictions. They can be accounted for using methods such as cause-specific Cox regression or the Fine and Gray model40.
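The lifelines library does not implement the Fine and Gray model (R’s cmprsk package is the usual choice for subdistribution-hazard regression), but its non-parametric Aalen-Johansen estimator illustrates how cumulative incidence is estimated once competing events are accounted for; the sketch below uses simulated data.

```python
# Sketch: cumulative incidence of an event of interest in the presence of a
# competing risk (e.g., dementia with death as a competing event), using the
# non-parametric Aalen-Johansen estimator. Simulated data only.
import numpy as np
from lifelines import AalenJohansenFitter

rng = np.random.default_rng(4)
n = 300
t1 = rng.exponential(8, n)   # latent time to event of interest (code 1)
t2 = rng.exponential(12, n)  # latent time to competing event (code 2)
c = rng.uniform(0, 10, n)    # administrative censoring time

time = np.minimum.reduce([t1, t2, c])
event = np.select([c <= np.minimum(t1, t2), t1 <= t2], [0, 1], default=2)

ajf = AalenJohansenFitter()
ajf.fit(time, event, event_of_interest=1)
print(ajf.cumulative_density_.tail())  # cumulative incidence accounting for the competing risk
```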
Several studies dichotomised or categorised continuous variables, which loses information and can reduce predictive performance32,41. Most studies selected predictors inappropriately, with univariable analysis or backward/forward selection. A predictor which has no association with the outcome in univariable analysis may become statistically significant in multivariable analysis due to confounding42,43, and selection of predictors based on statistical significance, as in backward/forward selection, can lead to model overfitting, miscalibrated risks, and biased predictions33. It is recommended that known clinically important predictors be included in the modelling regardless of statistical significance32. In our view, the selection of predictors should primarily be based on clinical knowledge rather than solely on statistical significance, and we recommend that researchers collaborate with clinicians to select predictors, combining clinical and statistical expertise. If previous research or clinical knowledge indicates that a predictor is associated with the outcome, it should be included in the analysis even if not statistically significant42.
The performance of most prognostic models was unclear and many lacked external validation, which is essential before a model can be applied in clinical practice44. Half of the studies reported only discrimination performance via the C-statistic, which provides limited information (a model with a high C-statistic may still estimate absolute risk poorly45). Ideally a prognostic model predicts an individual’s risk of a specific outcome within a period of time. Two papers7,24, however, stratified patients into risk groups rather than estimating individual predictions; in such cases, reporting only the C-statistic may be sufficient. For the other 10 studies13,16,18,19,23,25,26,29,30,31, which did develop models to provide individual predictions, the C-statistic alone is not enough to assess predictive performance: without also assessing calibration, we cannot determine how well the predicted probabilities align with observed outcomes. Calibration performance is critical for ensuring that a prognostic model’s predictions are accurate and reliable, which is essential for clinical decision-making. For calibration performance, most studies only used the Hosmer-Lemeshow test, which has limited statistical power to detect miscalibration32. 3 studies used calibration plots or the calibration slope to present their model’s calibration performance, as recommended, and no study used the gold-standard approach (a flexible calibration plot) to assess calibration46.
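A minimal sketch of a flexible calibration plot, using a LOWESS smoother on simulated predicted risks and outcomes (not from any reviewed model):

```python
# Sketch of a flexible (smoothed) calibration plot, the recommended approach
# mentioned above. Predicted risks and outcomes are simulated.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(5)
p = rng.uniform(0.02, 0.9, 800)                      # predicted risks from a hypothetical model
y = rng.binomial(1, np.clip(0.8 * p + 0.05, 0, 1))   # outcomes from a slightly miscalibrated truth

smooth = lowess(y, p, frac=0.4)                      # flexible estimate of observed risk
plt.plot(smooth[:, 0], smooth[:, 1], label="flexible calibration curve")
plt.plot([0, 1], [0, 1], "--", label="perfect calibration")
plt.xlabel("predicted risk")
plt.ylabel("observed proportion")
plt.legend()
plt.show()
```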
While prognostic factor studies only need to report the estimated coefficients of predictors, a prognostic model study must report additional details (e.g. the constant) so that the model can be replicated by independent researchers for external validation, or used by clinicians to predict probabilities in clinical practice. 8 studies’ models gave full model details, and another study presented an online risk calculator which could be applied in clinical practice.
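For illustration, a fully reported logistic prognostic model specifies the intercept alongside the predictor coefficients, so that an absolute risk can be computed; the coefficients and predictors below are hypothetical, not taken from any reviewed study:

```latex
% Hypothetical example of a fully reported logistic prognostic model:
% the intercept \beta_0 must be reported, not only the predictor coefficients.
\mathrm{LP} = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{UPDRS}
            = -4.2 + 0.05\,\mathrm{age} + 0.03\,\mathrm{UPDRS},
\qquad
\hat{p}(\text{outcome}) = \frac{1}{1 + e^{-\mathrm{LP}}}
```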
None of the included studies had low risk of bias as per the PROBAST criteria, so we cannot recommend any models without reservations. It is vital that models are externally validated to demonstrate generalisability before use in contexts other than the local geographical context in which they were originally developed43. Only 3 studies with external validation reported sufficient information for the models to be used by other researchers or clinicians; these models could therefore be considered for use in practice, ideally following further validation work18,20,28. These models all had some concerns about bias, relating only to the analysis domain (potentially leading to bias in predictions and in model performance).
The first of these is the prognostic model predicting risk of dementia by ten years by Liu et al.18, who performed an individual-participant-data meta-analysis of nine prospective cohorts with a very large sample size, using a frailty Cox model to account for heterogeneity between studies47. The study did not report calibration performance and used complete case analysis for missing data; we recommend that calibration is fully assessed in future validation studies of this model. Another issue for the model’s use in clinical practice is that predictor information was collected at widely varying disease durations, without a variable for disease duration in the model, so it is unclear when it is valid to use the model, although the majority (61%) of participants were recruited within two years of diagnosis. Although measures of disease severity may account, to a degree, for differences in disease duration, rates of disease progression vary substantially between individuals; a disease duration variable combined with disease severity is therefore important in a prediction model2. Further work to clarify the validity in inception cohorts is needed.
The second is a set of prognostic models predicting functional dependency, mortality, or a composite outcome “death or dependency” by Macleod et al.20. This study developed parametric survival models in a UK incidence cohort and performed external validation in a Norwegian incidence cohort. The models had reasonable discrimination and showed a calibration plot with lower baseline risk in the Norwegian cohort. The authors reported recalibrated values of the model which could be used in the Norwegian setting. Concerns about risk of bias relate to the use of univariable analysis for predictor selection and low events-per-variable ratios in the validation cohort. Further validation of these models in a larger cohort would therefore be useful.
The third is a prognostic model to predict a composite poor outcome at five years from diagnosis by Velseboer et al.28, developed in an inception cohort in the Netherlands using logistic regression, with external validation in a UK incidence cohort. The model demonstrated good discrimination (C-statistic 0.85) and adequate calibration (calibration slope 1.13) in external validation. There were some concerns about risk of bias due to the use of logistic regression, which does not account for censoring, and the low events-per-variable ratio, raising concerns about overfitting. Further validation in larger cohorts would again be useful.
These models may be of use in research, for example for stratification in clinical trial randomisation, adjustment for confounding in analyses of randomised controlled trials, or case-mix correction. However, the use of prognostic models in clinical practice can potentially lead to harms as well as benefits, so, given their limitations, we hesitate to recommend their use for individual prognostication in PwP without further external validation followed by rigorous testing to ensure that any benefits of using model predictions in clinical care are not outweighed by harms.
This is the first systematic review of prognostic models in PD that aim to make individual-level predictions. The main strength of this review is that we assessed study quality rigorously using the PROBAST checklist. Other strengths include identifying studies with all types of clinical outcomes, applying no language restriction, using a comprehensive search strategy across multiple databases, and displaying the results of the screening process in a PRISMA flow diagram.
There are also limitations to this review. The main limitations are the lack of grey literature searching and not contacting authors for missing information in the included reports. Due to the time taken to perform this review and prepare it for publication, the searches are now over three years out of date. An updated search carried out on 05/02/2025 found 1118 additional papers in MEDLINE and EMBASE, representing a 104% increase, so future work is needed to update this review.
None of the prognostic models we identified had low risk of bias across all aspects of study design, so there is clearly a need for further prognostic modelling studies in PD. There is clear guidance for carrying out prognostic model studies, including a reporting checklist (TRIPOD)48 and practical guidance for assessing prognostic model performance and clinical usefulness4, and these should be considered in the design, analysis, and reporting stages of future prognostic modelling studies. We would also draw attention to recent research on sample size calculations for prognostic modelling studies49.
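As a worked example of that sample size guidance49, the sketch below implements one of the Riley et al. criteria (targeting an expected shrinkage factor of at least 0.9); the inputs are hypothetical, and a full calculation (e.g., using the pmsampsize package in R or Stata) checks several criteria and takes the largest required sample size.

```python
# Sketch of one Riley et al. (2020) sample size criterion for developing a
# prediction model: the minimum n such that the expected uniform shrinkage
# factor S is at least a target (e.g. 0.9), given p candidate predictor
# parameters and an anticipated Cox-Snell R-squared. Inputs are hypothetical.
import math

def min_n_for_shrinkage(p_params: int, r2_cs: float, shrinkage: float = 0.9) -> int:
    """Minimum sample size so that the expected shrinkage factor >= `shrinkage`."""
    n = p_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    return math.ceil(n)

# e.g. 10 candidate parameters and an anticipated Cox-Snell R^2 of 0.1
print(min_n_for_shrinkage(10, 0.10))  # -> 850 participants
```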
To enable prognostic models to be used in clinical settings, regardless of the prediction performance, we recommend researchers give full details about their data source (recruitment methods and dates, diagnostic and inclusion/exclusion criteria) and clear definitions of outcomes and predictors. We strongly recommend researchers present the full equations of prognostic models so they can be replicated or used by others. It is also important that researchers reporting prognostic model development make clear what time point in the disease course the models are to be used (e.g. at diagnosis or at another specified time point). Furthermore, to enhance the feasibility of clinical use of prognostic models, we recommend researchers choose predictors that are routinely available in clinical practice, unless there is clear additional prognostic value of particular biomarkers that are more expensive or invasive to collect. When models are used in clinical practice it is important to evaluate the impact of the model. We did not find any papers describing the use of prognostic models in clinical practice or evaluating the impact of any prognostic model in PD.
In conclusion, there are many methodological shortcomings in existing prognostic model studies in PD and many were published with insufficient detail to allow them to be used by other researchers or clinicians. We have made recommendations for the limited use of three prognostic models that have been externally validated but these all have some concerns about risk of bias and are probably not appropriate for individual use at present without further evaluation. There is therefore a pressing need for further prognostic model development and validation studies using high quality methodology to ensure low risk of bias and for clinical use of high-quality models to be evaluated thoroughly before widespread use.
Methods
Literature search
We searched MEDLINE (1946 to latest update) and EMBASE (1947 to latest update) on 20 Feb 2021 to identify primary articles that developed and/or validated prognostic models in PD. The search strategy is detailed in Supplementary Appendix 1.
Eligibility criteria
We sought to include all published studies of prognostic models in PD predicting clinical outcomes. We did not set inclusion/exclusion criteria relating to timing or definition of outcomes other than to exclude models predicting surrogate measures of outcomes such as measurement scales (e.g. impairment or cognitive scales) or imaging changes. No language restriction was applied.
PD subtyping studies which did not aim to make individualised predictions were excluded. We also excluded prognostic models for use in highly selected groups of PwP, such as those with deep brain stimulation.
Screening process
References were imported into Endnote and de-duplicated. Two reviewers independently reviewed titles and abstracts for eligibility (YL, MM). The full text papers of the articles were obtained for relevant studies or where relevance was unclear from the abstract. Full text papers were assessed by the same two reviewers independently. Disagreements on inclusion/exclusion of full text papers were discussed with a third or fourth reviewer (ADM, DJM). Reference lists of included papers were reviewed to identify any relevant papers missed from the database searches.
Data extraction
Two reviewers independently performed the data extraction and recorded it in an electronic data collection form in Microsoft Excel (YL and either MM, ADM, or DJM). Any disagreement was discussed with another reviewer (ADM or DJM). The data extraction form was based on the CHARMS (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies) checklist50, and risk of bias was assessed using the PROBAST (Prediction model Risk Of Bias ASsessment Tool) checklist32. We categorised models into three groups (model development only; model development with external validation; and external validation with or without model updating) and extracted 10 domains based on the CHARMS checklist from each model:
(1) Study location and data source;
(2) Recruitment methods, diagnostic criteria;
(3) Inclusion/exclusion criteria;
(4) Outcomes predicted, follow-up duration, losses to follow-up;
(5) Prognostic factors;
(6) Sample size, events per variable;
(7) Missing data frequency and methods for dealing with missing data;
(8) Model development methods;
(9) Model performance: internal validation methods, results of calibration and discrimination;
(10) Model evaluation: whether external validation was done and results of external calibration and discrimination.
Synthesis
We assessed the risk of bias and applicability of published prognostic models using the PROBAST tool and tabulated key aspects of study design, model development, model validation, and risk of bias. We tabulated the most commonly used prognostic factors from the studies. We made recommendations about the usefulness of existing prognostic models. Lastly, we made recommendations for future prognostic model development.
Registration
The protocol of this systematic review is registered in the PROSPERO international prospective register of systematic reviews (registration number CRD42021247039). All data collected are presented in the supplementary files.
Data availability
For this systematic review we did not have access to the patient data from the original studies. The information we extracted from the published articles of the included studies is available in this article and its supplementary information files.
References
Moons, K. G. M., Royston, P., Vergouwe, Y., Grobbee, D. E. & Altman, D. G. Prognosis and prognostic research: what, why, and how? BMJ 338, b375 (2009).
Steyerberg, E. W. et al. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med. 10, e1001381 (2013).
Ramspek, C. L., Jager, K. J., Dekker, F. W., Zoccali, C. & van Diepen, M. External validation of prognostic models: what, why, how, when and where? Clin. Kidney J. 14, 49–58 https://doi.org/10.1093/ckj/sfaa188 (2020).
McLernon, D. J. et al. Assessing performance and clinical usefulness in prediction models with survival outcomes: practical guidance for Cox proportional hazards models. Ann. Intern. Med. 176, 105–114 https://doi.org/10.7326/M22-0844 (2023).
Pencina, M. J. & D’Agostino, R. B. Evaluating discrimination of risk prediction models the C statistic. J. Am. Med. Assoc. 314, 1063–1064 (2015).
van Rooden, S. M. et al. The identification of Parkinson’s disease subtypes using cluster analysis: a systematic review. Mov. Disord. 25, 969–978 (2010).
Almeida, L. R. S., Valenca, G. T., Negreiros, N. N., Pinto, E. B. & Oliveira-Filho, J. Predictors of recurrent falls in people with Parkinson’s disease and development of a predictive tool. Mov. Disord. 31, S511–S511 https://doi.org/10.1002/mds.26688 (2016).
Ashburn, A., Stack, E., Pickering, R. M. & Ward, C. D. Predicting fallers in a community-based sample of people with Parkinson’s disease. Gerontology 47, 277–281 (2001).
Custodio, N. et al. Predictive model for falling in Parkinson disease patients. eNeurologicalSci 5, 20–24 https://doi.org/10.1016/j.ensci.2016.11.003 (2016).
Duncan, R. P. et al. External validation of a simple clinical tool used to predict falls in people with Parkinson disease. Parkinsonism Relat. Disord. 21, 960–963 https://doi.org/10.1016/j.parkreldis.2015.05.008 (2015).
Ehgoetz Martens, K. et al. Predicting the onset of freezing of gait: a longitudinal study. Mov. Disord. 32, 28 https://doi.org/10.1002/mds.27087 (2017).
Exarchos, T. P. et al. Using partial decision trees to predict Parkinson’s symptoms: a new approach for diagnosis and therapy in patients suffering from Parkinson’s disease. Comput. Biol. Med. https://doi.org/10.1016/j.compbiomed.2011.11.008
Gervasoni, E. et al. Clinical and stabilometric measures predicting falls in Parkinson disease/parkinsonisms. Acta Neurol. Scand. 132, 235–241 https://doi.org/10.1111/ane.12388 (2015).
Gu, S.-C., Zhou, J., Yuan, C.-X. & Ye, Q. Personalized prediction of depression in patients with newly diagnosed Parkinson’s disease: a prospective cohort study. J. Affect. Disord. 268, 118–126, https://doi.org/10.1016/j.jad.2020.02.046 (2020).
Kelly, M. J. et al. Predictors of motor complications in early Parkinson’s disease: a prospective cohort study. Mov. Disord. 34, 1174–1183 https://doi.org/10.1002/mds.27783 (2019).
Kerr, G. K. et al. Predictors of future falls in Parkinson disease. Neurology 75, 116–124 https://doi.org/10.1212/WNL.0b013e3181e7b688 (2010).
Lindholm, B., Nilsson, M. H., Hansson, O. & Hagell, P. External validation of a 3-step falls prediction model in mild Parkinson’s disease. J. Neurol. 263, 2462–2469 (2016).
Liu, G. et al. Prediction of cognition in Parkinson’s disease with a clinical-genetic score: a longitudinal analysis of nine cohorts. Lancet Neurol. 16, 620–629 https://doi.org/10.1016/S1474-4422(17)30122-9 (2017).
Lo, C. et al. Predicting motor, cognitive & functional impairment in Parkinson’s. Ann. Clin. Transl. Neurol. 6, 1498–1509 https://doi.org/10.1002/acn3.50853 (2019).
Macleod, A. D., Dalen, I., Tysnes, O.-B., Larsen, J. P. & Counsell, C. E. Development and validation of prognostic survival models in newly diagnosed Parkinson’s disease. Mov. Disord. 33, 108–116 https://doi.org/10.1002/mds.27177 (2018).
Mak, M. K., Wong, A. & Pang, M. Y. Impaired executive function can predict recurrent falls in Parkinson’s disease. Arch. Phys. Med. Rehabil. 95, 2390–2395 https://doi.org/10.1016/j.apmr.2014.08.006 (2014).
Paul, S. S. et al. Three simple clinical tests to accurately predict falls in people with Parkinson’s disease. Mov. Disord. 28, 655–662 https://doi.org/10.1002/mds.25404 (2013).
Phongpreecha, T. et al. Multivariate prediction of dementia in Parkinson’s disease. NPJ Parkinson’s Dis. 6, 20 https://doi.org/10.1038/s41531-020-00121-2 (2020).
Pouwels, S. et al. Five-year fracture risk estimation in patients with Parkinson’s disease. Bone 56, 266–270 https://doi.org/10.1016/j.bone.2013.06.018 (2013).
Redensek, S., Jenko Bizjan, B., Trost, M. & Dolzan, V. Clinical-pharmacogenetic predictive models for time to occurrence of levodopa related motor complications in Parkinson’s Disease. Front. Genet. 10, 461 https://doi.org/10.3389/fgene.2019.00461 (2019).
Schapira, A. H. et al. Development of a risk calculator based on the STRIDE-PD study for predicting dyskinesia in patients with Parkinson’s disease. Mov. Disord. 27, S138–S138 https://doi.org/10.1002/mds.25051 (2012).
Schrag, A., Siddiqui, U. F., Anastasiou, Z., Weintraub, D. & Schott, J. M. Clinical variables and biomarkers in prediction of cognitive impairment in patients with newly diagnosed Parkinson’s disease: a cohort study. Lancet Neurol. 16, 66–75 https://doi.org/10.1016/S1474-4422(16)30328-3 (2017).
Velseboer, D. C. et al. Development and external validation of a prognostic model in newly diagnosed Parkinson disease. Neurology 86, 986–993 https://doi.org/10.1212/WNL.0000000000002437 (2016).
Wang, J., Luo, S. & Li, L. Dynamic prediction for multiple repeated measures and event time data: an application to Parkinson’s disease. Ann. Appl. Stat. 11, 1787–1809 https://doi.org/10.1214/17-Aoas1059 (2017).
Wang, M. et al. Predicting the multi-domain progression of Parkinson’s disease: a Bayesian multivariate generalized linear mixed-effect model. BMC Med. Res. Methodol. 17, 147 https://doi.org/10.1186/s12874-017-0415-4 (2017).
Ye, B. S. et al. Dementia-predicting cognitive risk score and its correlation with cortical thickness in Parkinson disease. Dement. Geriatr. Cogn. Disord. 44, 203–212 https://doi.org/10.1159/000479057 (2017).
Moons, K. G. M. et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann. Intern. Med. 170, W1–W33 (2019).
Van Calster, B. et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 17, 230 https://doi.org/10.1186/s12916-019-1466-7 (2019).
Vittinghoff, E. & McCulloch, C. E. Relaxing the rule of ten events per variable in logistic and Cox regression. Am. J. Epidemiol. 165, 710–718, https://doi.org/10.1093/aje/kwk052 (2007).
Sterne, J. A. C. et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338, b2393 https://doi.org/10.1136/bmj.b2393 (2009).
Hughes, R. A., Heron, J., Sterne, J. A. C. & Tilling, K. Accounting for missing data in statistical analyses: multiple imputation is not always the answer. Int. J. Epidemiol. 48, 1294–1304 https://doi.org/10.1093/ije/dyz032 (2019).
Buchanan, A. L., Hudgens, M. G., Cole, S. R., Lau, B. & Adimora, A. A. Worth the weight: using inverse probability weighted Cox Models in AIDS research. AIDS Res. Hum. Retroviruses 30, 1170–1177 https://doi.org/10.1089/aid.2014.0037 (2014).
Matsouaka, R. A. & Atem, F. D. Regression with a right-censored predictor using inverse probability weighting methods. Stat. Med. 39, 4001–4015 https://doi.org/10.1002/sim.8704 (2020).
Zhou, J., Zhao, X. & Sun, L. A new inference approach for joint models of longitudinal data with informative observation and censoring times. Stat. Sin. 23, 571–593 (2013).
van Geloven, N. et al. Validation of prediction models in the presence of competing risks: a guide through modern methods. BMJ 377, e069249 https://doi.org/10.1136/bmj-2021-069249 (2022).
Royston, P. et al. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat. Med. 25, 127–141 https://doi.org/10.1002/sim.2331 (2006).
Harrell, F. E. Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis, 2nd Edition. Springer Ser. Stat. https://doi.org/10.1007/978-3-319-19425-7 (2015).
Sun, G. W., Shook, T. L. & Kay, G. L. Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis. J. Clin. Epidemiol. 49, 907–916 https://doi.org/10.1016/0895-4356(96)00025-X (1996).
Moons, K. G. M. et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart 98, 691–698 https://doi.org/10.1136/heartjnl-2011-301247 (2012).
Alba, A. C. et al. Discrimination and calibration of clinical prediction models users’ guides to the medical literature. J. Am. Med. Assoc. 318, 1377–1384 (2017).
Van Calster, B. et al. A calibration hierarchy for risk models was defined: from utopia to empirical data. J. Clin. Epidemiol. 74, 167–176 https://doi.org/10.1016/j.jclinepi.2015.12.005 (2016).
Riley, R. D. & Debray, T. P. A. Individual Participant Data Meta-Analysis 127−162 (Wiley, 2021).
Moons, K. G. et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann. Intern. Med. 162, W1–W73 https://doi.org/10.7326/M14-0698 (2015).
Riley, R. D. et al. Calculating the sample size required for developing a clinical prediction model. BMJ 368, m441 https://doi.org/10.1136/bmj.m441 (2020).
Moons, K. G. M. et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 11, e1001744 (2014).
Acknowledgements
Yan Li is funded by a studentship from the Meikle Foundation.
Author information
Authors and Affiliations
Contributions
Y.L.: design, execution, analysis, writing first draft of the manuscript. M.M.W.: execution, review of final version of the manuscript. D.J.M.: design, execution, analysis, review of final version of the manuscript. C.E.C.: design, review of final version of the manuscript. A.D.M.: design, execution, analysis, review of final version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, Y., McDonald-Webb, M., McLernon, D.J. et al. Systematic review of prognostic models in Parkinson’s disease. npj Parkinsons Dis. 11, 266 (2025). https://doi.org/10.1038/s41531-025-01112-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41531-025-01112-x