Abstract
Background
Parkinson’s disease is a progressive neurodegenerative disorder with both motor and non-motor symptoms. Mental and behavioural non-motor symptoms such as cognitive impairment, sleep disturbances, depression, and anxiety greatly affect quality of life but remain difficult to assess with traditional tools. Artificial intelligence has shown potential in healthcare, yet its role in evaluating these symptoms in Parkinson’s disease remains under-reviewed. This systematic review aims to evaluate the performance of artificial intelligence tools in diagnosing, assessing, and managing these symptoms.
Methods
Five databases (Medline, Embase, Scopus, Web of Science and PubMed) were searched up to June 2024 for peer-reviewed studies applying artificial intelligence to mental or behavioural symptoms in adults with Parkinson’s disease. Studies published before 2010 or lacking artificial-intelligence technologies were excluded. Study quality and risk of bias were assessed using QUADAS-2. Extracted data include study objectives, data sources, algorithms, best model, and diagnostic performance (accuracy, sensitivity, specificity). The study received no external financial support.
Results
Here we show that sixteen studies examine cognitive impairment and seven examine sleep disorders. However, only three studies focus on depression and one on anxiety, revealing a research gap. No meta-analysis was performed due to heterogeneity.
Conclusions
Artificial intelligence shows promise for assessing mental and behavioural symptoms in Parkinson’s disease, particularly cognitive and sleep disorders. Multimodal models demonstrate higher accuracy than single-source models, though external validation is necessary. The limited studies on depression and anxiety reflect existing diagnostic challenges and data limitations. Future research should refine diagnostic tools and expand multimodal approaches to these symptoms.
Plain Language Summary
Parkinson’s disease causes both movement problems and non-movement symptoms such as difficulties with thinking, decision-making, memory, sleep, depression, and anxiety. These symptoms are common and deeply affect quality of life for people living with Parkinson’s, but they are often missed or underestimated by traditional clinical assessments. We systematically examined whether artificial intelligence could help detect and monitor these symptoms. We analysed twenty-seven studies and found that artificial intelligence shows promise for identifying thinking and sleep-related problems, but very few studies examined depression or anxiety. The findings suggest that combining different types of patient data improves accuracy, but more validation is needed before these tools can be used in clinical settings to help doctors provide better care.
Introduction
Parkinson’s disease (PD) is a progressive neurodegenerative disorder characterised by both motor symptoms and non-motor symptoms (NMS)1. While motor symptoms, such as tremors, rigidity, impaired balance, freezing of gait, and hypokinetic dysarthria, remain the primary focus of PD diagnosis and management, NMS, such as cognitive impairment, sleep disorders, anxiety, and depression, can significantly affect patient well-being. Yet, these are often overlooked2. In 2019, over 8.5 million people were diagnosed with PD globally3, with studies reporting that up to 75% of PD patients experience sleep disorders, 50% experience depression, 40% experience anxiety, and 20–50% have cognitive impairment4. These symptoms not only contribute to disease burden but also reduce quality of life and complicate care delivery.
Moreover, the global burden of PD has risen sharply, with 5.8 million disability-adjusted life years (DALYs) and 329,000 deaths in 2019, reflecting an over 100% increase since 20005. PD also carries a significant economic burden, with annual costs of $52 billion in the US and £20,123 per person in the UK, including direct and indirect costs6,7. Due to PD’s progressive nature1, symptom severity may worsen over time, making treatment more challenging. As a result, PD increasingly challenges healthcare systems and caregivers, requiring significant resources for disease management.
Current assessment of NMS relies heavily on clinical interviews, standardised questionnaires, and medical tests. For example, depression is commonly diagnosed using the Hamilton Depression Rating Scale (HDRS), while anxiety is evaluated with the Hamilton Anxiety Rating Scale (HAM-A) or the Hospital Anxiety and Depression Scale (HADS-A). Cognitive impairment is typically assessed with the Montreal Cognitive Assessment (MoCA), and sleep disturbances are diagnosed through polysomnography (PSG). However, these tools were not designed specifically for PD, which may limit their relevance. For instance, some mental health tools fail to distinguish PD-related mental health symptoms from primary psychiatric disorders8,9. Additionally, cognitive tests such as the MoCA can be biased by education levels even after variable adjustments, and the standard 1-point correction may not be sufficient across different settings10. Although PSG is accurate, it is costly and resource-intensive, with inconsistencies arising from subjective reporting and clinician interpretation.
To address these challenges, Parkinson-specific instruments, such as the Parkinson Anxiety Scale (PAS), the PD Cognitive Rating Scale (PD-CRS), and the Movement Disorder Society-Unified PD Rating Scale (MDS-UPDRS), have been developed to better assess relevant PD symptoms. Nevertheless, the challenges described above remain. Therefore, AI has been further explored as a complement to current assessments, to make better use of the data they generate and potentially improve diagnostic accuracy.
AI has shown promise in PD research for its ability to process large datasets and identify patterns. While much of the existing work has focused on motor symptoms, researchers have also begun investigating its application in NMS. For example, a systematic review by Sun et al.11 examined how machine learning models combining clinical and magnetic resonance imaging (MRI) data could detect cognitive impairment in PD. However, such efforts have typically focused on individual symptoms. To our knowledge, no comprehensive review has systematically evaluated how AI has been applied to mental and behavioural NMS as a group, particularly cognitive impairment, anxiety, depression, and sleep disorders, despite their substantial prevalence and clinical impact.
This systematic review therefore aims to evaluate how AI tools have been applied to diagnose, classify, or predict these four key NMS in PD. We focus on summarising the reported performance of these tools, including their accuracy, sensitivity, and specificity, for outcomes such as symptom classification, risk prediction, and disease diagnosis.
In this systematic review of 27 studies, we find that artificial intelligence shows promising performance for identifying cognitive impairment and sleep disorders in PD, especially when multimodal approaches are used. However, the application of AI to depression and anxiety in PD remains limited, highlighting a research gap. While multimodal models combining different data types demonstrate superior performance compared to single-modality approaches, the contribution of individual features varies across contexts, emphasising the importance of optimising feature selection during model development. Nevertheless, variability in model performance between training and testing sets raises concerns about overfitting and limited generalisability. These findings suggest that AI tools have the potential to improve assessment of NMSs in PD, but further validation in real-world settings is essential before clinical implementation.
Methods
Search strategy
A literature search was conducted across five electronic databases up to June 2024: Medline, Embase, Scopus, Web of Science and PubMed. Additional systematic review searches were performed in the PROSPERO and Cochrane Library databases. The search strategy was developed in consultation with a research librarian and incorporated relevant keywords and MeSH terms related to PD, AI, and mental and behavioural NMS (Supplementary Data 1). The strategy was guided by the PICO framework (Table 1) and applied database-specific filters, yielding 5222 articles (Supplementary Data 2).
Inclusion and exclusion criteria
This systematic review included peer-reviewed studies from 2010 onward that applied AI to diagnosing and managing mental and behavioural NMSs in adult PD patients. Key exclusion criteria included non-PD conditions, non-AI interventions, pre-2010 studies, and non-English publications. The full breakdown of inclusion and exclusion criteria is listed in Supplementary Data 3.
Screening process
A three-stage screening process was conducted based on the protocol by Bounsall et al.12, with modifications made to the original protocol (Table 2), and the PRISMA guidelines (Fig. 1). Electronic screening was first applied to filter articles based on the presence of keywords across different fields (any field, abstract, and title), as detailed in Supplementary Data 4, followed by title and abstract, and full-text screening. Reasons for exclusion at the full-text stage are summarised in Supplementary Data 5. All screening steps were performed independently by one reviewer (SC).
Quality assessment
The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to evaluate the quality of included studies13. QUADAS-2 assesses the risk of bias across four domains: patient selection, index test, reference standard, and flow and timing. Applicability concerns were evaluated for the first three domains (Supplementary Data 6). Each study was rated as “low,” “high,” or “unclear” risk based on predefined criteria (Supplementary Data 7). High-risk ratings suggested significant biases, low-risk ratings suggested minimal concerns, and unclear ratings reflected insufficient information.
Data extraction
Data extraction was conducted by one reviewer, SC, using a structured table (Table 3). The table was developed based on the research objectives and the PICOS framework, capturing study objectives and reported performance metrics.
The following performance metrics were used to evaluate the AI models:
Accuracy: the proportion of correct predictions (both true positives and true negatives) among all predictions.
Sensitivity: the proportion of actual positive cases correctly identified (true positive rate).
Specificity: the proportion of actual negative cases correctly identified (true negative rate).
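The three metrics defined above can be written directly in terms of confusion-matrix counts. The following is a minimal sketch for illustration only; the class and function names and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary classifier's confusion matrix."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def accuracy(c: ConfusionCounts) -> float:
    """Correct predictions (TP + TN) over all predictions."""
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP over all actual positives (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN over all actual negatives (TN + FP)."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts, for illustration only.
example = ConfusionCounts(tp=40, tn=45, fp=5, fn=10)
print(accuracy(example), sensitivity(example), specificity(example))
```

Because accuracy pools both classes, a model can report high accuracy while sensitivity or specificity is low when the classes are imbalanced, which is one reason all three metrics were extracted.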
Data analysis and synthesis
Due to heterogeneity in outcome measures and methodologies, a meta-analysis was not feasible. Instead, a descriptive analysis summarised study characteristics, methodologies, and outcomes across the four symptom categories (cognitive impairment, depression, sleep disorders, and anxiety). Supplementary Data 8 presents the extraction of the included studies, documenting their objectives, data sources, algorithms, performance metrics, and best performing model. This allows the synthesis of observed patterns in algorithm performance, providing a comprehensive overview of the current literature on AI’s diagnostic performance in analysing mental and behavioural NMS in PD. Table 4 summarises the best-performing model from each study. Models were excluded if multiple models performed equally well, or if the best-performing model varied across different tasks within the same study. This systematic review received no external financial support and was not registered.
SPIDER technique
The SPIDER technique was used to review reference lists from included studies to retrieve more relevant studies. This approach identified one additional article14 on cognitive impairment. However, no further studies focusing on AI applications for depression or anxiety in PD were identified through this approach.
Statistics and reproducibility
Statistical meta-analysis was not performed due to heterogeneity in study methodologies and outcome measures. Instead, a descriptive synthesis was used to identify trends in accuracy, sensitivity, and specificity across studies. To ensure reproducibility, the search strategy, PICO framework, inclusion and exclusion criteria, and data extraction framework were predefined, with details provided in the Supplementary Information. Quality assessment was performed using the QUADAS-2 tool to evaluate risk of bias and applicability concerns across domains.
Ethics approval and consent to participate
No ethical approval was required for this study as it did not involve human or animal subjects. All sources of data were publicly available.
Results
Study selection
In total, 5222 articles were retrieved from five electronic databases, with one additional record identified using the SPIDER method. After duplicate removal and multi-stage screening (Fig. 1), 27 studies met the inclusion criteria and were included in the final review. During full-text screening, 26 articles were excluded for reasons including lack of access (n = 7), need for purchase (n = 2), duplication (n = 5), and not following PICO criteria (n = 12) (Supplementary Data 5).
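As a quick consistency check on the full-text exclusion counts reported above, the reasons can be tallied against the stated total. This is a small sketch using only the numbers given in the text; the variable names are illustrative.

```python
# Full-text exclusion reasons and counts as reported in the text.
full_text_exclusions = {
    "lack of access": 7,
    "need for purchase": 2,
    "duplication": 5,
    "not following PICO criteria": 12,
}

# Total number of full-text exclusions stated in the text.
reported_total_excluded = 26

total_excluded = sum(full_text_exclusions.values())
counts_are_consistent = total_excluded == reported_total_excluded
print(total_excluded, counts_are_consistent)
```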
Included studies and study characteristics
The characteristics of the 27 studies are summarised in Supplementary Data 8, which outlines each study’s objectives, data sources, AI algorithms used, reported accuracy, sensitivity, specificity, best-performing model, and relevant notes. Sixteen focused on cognitive impairment14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29, involving 2241 participants; seven on sleep disorders with a total of 687 participants30,31,32,33,34,35,36; three on depression37,38,39 with 265 participants, and one on anxiety40 with 219 participants.
Data extraction table
Table 4 summarises the best-performing AI models, defined as those achieving the highest accuracy compared to other models tested within the same study. Support Vector Machine (SVM) was the most frequently reported best performer, identified in nine studies, followed by Random Forest (RF) in five studies. However, the overall heterogeneity in best-performing models suggests that the best AI approach depends on the NMS analysed, available data, and patient characteristics.
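The selection rule described here (highest accuracy within a study, with studies left out when several models perform equally well) can be sketched as follows. This is an illustrative reading of the rule rather than the procedure actually used to build Table 4; the function name and the per-study accuracies are hypothetical.

```python
from typing import Optional


def best_model(accuracies: dict[str, float]) -> Optional[str]:
    """Return the model with the highest accuracy in one study.

    Returns None when no models are given or when two or more models
    tie for the highest accuracy, mirroring the exclusion of ties.
    """
    if not accuracies:
        return None
    top = max(accuracies.values())
    winners = [name for name, acc in accuracies.items() if acc == top]
    return winners[0] if len(winners) == 1 else None


# Hypothetical per-study results, for illustration only.
study_a = {"SVM": 0.82, "RF": 0.86, "ANN": 0.79}
study_b = {"SVM": 0.90, "RF": 0.90}  # tie, so no single best model

print(best_model(study_a), best_model(study_b))
```

Ranking on accuracy alone keeps the rule simple, but it ignores differences in sensitivity and specificity, so the resulting "best" label is best read alongside the full metrics in Supplementary Data 8.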
Supplementary Data 9 summarises the input data sources used across the included studies, categorised into four main categories: neuroimaging, electrophysiological, wearable and portable devices, and clinical assessments or biomarkers. The table allows readers to identify which studies used single-modality input and which combined different types of data, showing the diversity of input data and the use of multimodal approaches in AI model development for PD-related NMS.
The included studies employed various algorithms, with some studies comparing multiple approaches. SVM and RF were most prevalent; SVM was used in 15/27 studies (55.6%), followed by RF in 11/27 studies (40.7%), while newer approaches like deep learning were less commonly used. The full range of algorithms employed is provided in Supplementary Data 10.
The following subsections present results categorised by conditions, summarising the AI methods and their reported accuracy, sensitivity, and specificity. A detailed overview of each study’s objective, input data, algorithms used, and performance metrics is presented in Supplementary Data 8, which can be used to contextualise the findings discussed below.
Cognitive impairment
Sixteen of the 27 studies (59.3%) focused on cognitive impairment; these studies varied in study objectives, data sources, and AI models, leading to heterogeneity in reported performance metrics. Among these, eight studies focused on binary classification between PD patients with and without cognitive impairment, with classification accuracy ranging from 71.9 to 100%, sensitivity from 50 to 99.2%, and specificity from 81 to 100%. Five studies predicted future cognitive decline, with accuracy ranging from 74 to 86.7%, sensitivity from 67.7 to 91%, and specificity from 70 to 96.1%, though they varied in their follow-up periods (ranging from 4 to 8 years). The remaining three studies had different objectives: one differentiated between multiple cognitive states (PD-Dementia (PDD), PD-Mild Cognitive Impairment (PD-MCI), PD-Cognitive Impairment (PDCI)); another aimed to predict MoCA scores at year 4 and achieved 83% accuracy; and the third distinguished PD-MCI patients from healthy controls, with no other groups included, and achieved 83.3% accuracy.
Several studies17,20,24,28 showed that multimodal approaches combining different data types outperformed models using only one data type. This trend was also observed in studies combining multiple features within the same data type, such as combining intravoxel and intervoxel diffusion metrics in Diffusion Tensor Imaging (DTI)22 or combining graph frequency features in functional near-infrared spectroscopy (fNIRS)29. Furthermore, RF achieved the highest accuracy in two studies that directly compared multiple machine learning algorithms: one classified PD with Normal Cognition (PDNC) vs. PD-MCI and PDD, and the other classified PDNC vs. PD-MCI among non-demented patients. However, some studies showed a drop in performance from training to testing sets18,25,26, raising concerns about potential overfitting.
Sleep disorders
Seven studies (25.9%) focused on sleep disorders in PD. For PSG-based studies, Sorensen et al.30 achieved 89.8% sensitivity in sleep arousal detection using Artificial Neural Networks (ANN). Bisgin et al.31 reported 87.84% accuracy in Rapid Eye Movement (REM) sleep classification using RF combined with feature selection, showing particular strength in REM sleep stage detection. As REM sleep disturbances are considered crucial early evidence of neurodegeneration and may support the diagnosis of PD31, PSG-based AI models targeting sleep patterns may provide valuable insights for early PD detection.
Two studies focused on Rapid Eye Movement Sleep Behaviour Disorder (RBD) prediction, and both demonstrated the superiority of RF over other algorithms despite having different study objectives. Byeon’s32 model achieved 71% accuracy, 79% sensitivity, and 67% specificity in identifying high-risk RBD-PD patients. Chong-wen’s34 RF model demonstrated higher accuracy (83.05%) and specificity (93.06%) but lower sensitivity (67.39%), which was 100% in the training set. Both studies identified predictive factors for RBD in PD patients with significant overlap, such as age, cognitive function, and motor scores. As RBD can precede PD for several years and is associated with more aggressive PD phenotypes35, AI-based models for RBD detection and prediction may offer valuable opportunities for earlier intervention and more accurate risk prediction.
Studies using wearable devices also showed promising results: Ko et al.33 achieved 84.5% accuracy in awake/sleep detection using smartwatch sensors, Raschella et al.35 reported 96.2–100% accuracy in RBD classification using wrist actigraphy, and Rechichi et al.36 achieved 96.2% accuracy for PD detection and 85.7% for sleep quality classification using Inertial Measurement Units (IMU). These wearables are valuable for continuously tracking PD-related sleep disorders.
Depression
Three studies (11.1%) focused on differentiating depression in PD (DPD) from non-depressed PD (NDPD), healthy control (HC), and non-PD depression. Zhang et al.37 showed high accuracy across different classification tasks: DPD vs HC at 100% (SVM), NDPD vs HC at 96% (Lasso), and DPD vs NDPD at 90% (RF). With Linear predictive coding of EEG Algorithm for PD (LEAPD), Espinoza et al.38 reported 97% accuracy in distinguishing DPD from NDPD and 100% accuracy for DPD vs non-PD depression. Using SVM, Yang et al.39 reported 73% accuracy, 88% sensitivity, and 57% specificity for differentiating DPD from NDPD in the test set, with a significant drop in specificity compared to the training set (from 73% to 57%).
Anxiety
Only one study40 focused on anxiety in PD (3.7%). Combining clinical and structural MRI features, the study reported 88.0% accuracy, 86.0% precision, and 81.0% sensitivity in identifying anxiety in PD using an SVM. This multimodal model showed improved accuracy compared to models using either data type alone, highlighting the value of combining data types.
Overall findings
Overall, this systematic review shows that cognitive impairment in PD has received the most attention in AI-based studies, followed by sleep disorders. In contrast, the application of AI to depression and anxiety remains relatively under-analysed, suggesting a gap in current research.
Furthermore, RF and SVM are commonly used across studies. RF was evaluated in 11/27 studies (40.7%) with multiple-algorithm comparisons, ranking as the best-performing model in 5 studies based on their specific objectives. This suggests RF may be well-suited to analysing the complex nature of mental and behavioural NMS in PD. SVM appeared in 15/27 studies (55.6%), achieving the best performance in 9 studies according to their specific objectives. However, 7 of these 9 studies used SVM as the only model, introducing potential limitations for comparison with other models.
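The proportions quoted above follow directly from the study counts. The sketch below reproduces them and makes the SVM caveat explicit; it uses only counts stated in the text, and the variable and function names are illustrative.

```python
TOTAL_STUDIES = 27

# Counts reported in the text: studies using each algorithm, and studies
# in which that algorithm was the best performer.
usage = {"SVM": 15, "RF": 11}
best_performer = {"SVM": 9, "RF": 5}
svm_only_best = 7  # of the 9 SVM "best" studies, SVM was the sole model tested


def share(count: int, total: int = TOTAL_STUDIES) -> float:
    """Percentage of studies, rounded to one decimal place."""
    return round(100 * count / total, 1)


usage_share = {name: share(n) for name, n in usage.items()}

# SVM studies in which SVM was best after comparison with another model.
svm_best_with_comparison = best_performer["SVM"] - svm_only_best

print(usage_share, svm_best_with_comparison)
```

Seen this way, only two of the nine SVM "best performer" results come from head-to-head comparisons, which is why the SVM count is harder to interpret than the RF count.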
Discussion
This systematic review of 27 studies examined AI applications across four major NMSs in PD: cognitive impairment (59.3%), sleep disorders (25.9%), depression (11.1%), and anxiety (3.7%). Cognitive impairment and sleep disorders in PD demonstrated more extensive AI applications, characterised by comparisons of multiple algorithms and promising results. Evaluations of AI with cognitive applications have evolved from simple algorithm comparisons to multimodal approaches over the years. Similarly, sleep disorder research shows an evolution from complex PSG data analysis to more accessible wearable devices that can offer continuous monitoring. This trend is supported by Iakovakis et al.41, who showed that sleep data from smartwatches in real-world settings could accurately distinguish early PD, highlighting the potential of wearable devices. Prashanth et al.42 also showed that self-reported olfactory loss and RBD questionnaires could successfully distinguish early PD, demonstrating the value of accessible, non-invasive data sources for AI models. In contrast, AI applications for depression and anxiety in PD (though demonstrating positive results) are in earlier stages, with most studies (2/3 for depression, 1/1 for anxiety) evaluating a single algorithm for a given objective rather than systematically comparing multiple models. The limited research on anxiety in PD highlights this as an under-reviewed area needing more investigation.
A meta-analysis was not performed in this systematic review due to substantial heterogeneity identified across studies. Supplementary Data 8 presents the data necessary for a descriptive analysis of AI model performance by summarising reported metrics for each study. Across studies, several key trends that have significant implications for future research were observed. Five studies17,20,24,28,40 in this systematic review showed that multimodal models combining different data types demonstrated better performance compared to single-modality models. Improvements were reported across various metrics: accuracy, sensitivity, and specificity17,20; accuracy alone24,40; and AUC and specificity28. Several studies in this systematic review demonstrated that the contribution of different input features to predictive performance can vary, highlighting the need for careful feature selection and evaluation during model development. For example, Hosseinzadeh et al.24 found that clinical features were the dominant predictors, with imaging features providing only marginal improvement in accuracy, whereas Huang et al.27 showed that certain neuroimaging features alone could achieve performance comparable to models combining both clinical and imaging data. Similarly, Jeon et al.14 found that combining MoCA domain scores with cognitive complaints outperformed the traditional MoCA cutoff, but adding depression scores did not improve accuracy, suggesting that not all variables contribute equally to model performance. In another example, Chen et al.19 reported that dimensionality reduction using Principal Component Analysis (PCA) improved model accuracy from 84.6 to 100%. Collectively, these findings align with Altham et al.’s43 finding on the importance of optimising feature selection.
Another observation was the variability in model performance between training and testing sets in some studies18,25,26,39, raising concerns about potential overfitting and limited generalisability of certain models. This highlights the need for further validation before clinical applications.
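One simple way to make the training-to-testing variability concrete is to express it as a drop in percentage points. The sketch below applies this to two figures reported in the Results of this review; the function name and the flagging threshold are hypothetical, since the review does not define a cut-off for what counts as overfitting.

```python
def generalisation_gap(train_pct: float, test_pct: float) -> float:
    """Drop in a metric from training to testing, in percentage points."""
    return round(train_pct - test_pct, 2)


# Values reported in the Results section of this review.
yang_specificity_gap = generalisation_gap(train_pct=73.0, test_pct=57.0)
chongwen_sensitivity_gap = generalisation_gap(train_pct=100.0, test_pct=67.39)

# Hypothetical threshold for flagging a model for closer scrutiny;
# the review does not specify one.
FLAG_THRESHOLD_POINTS = 10.0

gaps = {
    "Yang et al. (specificity)": yang_specificity_gap,
    "Chong-wen (sensitivity)": chongwen_sensitivity_gap,
}
flagged = [name for name, gap in gaps.items() if gap > FLAG_THRESHOLD_POINTS]

print(gaps, flagged)
```

A large gap on an internal test split does not by itself prove overfitting, but it is a reasonable prompt for evaluation on an independent dataset.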
Previous systematic reviews of AI applications in PD have typically focused on individual symptoms. In contrast, this systematic review synthesises research across multiple prevalent NMS, including cognitive impairment, sleep disorders, depression, and anxiety. Our findings address a critical research gap identified by Sun et al.11, who observed that almost no qualified studies had integrated radiomics and other biomarkers into machine learning models for cognitive impairment diagnosis in PD. Several recent studies in our systematic review, such as those by Hosseinzadeh et al.24 and Jian et al.28, successfully demonstrated the superior performance of multimodal models that integrate radiomics features with clinical features. Our findings align with Altham et al.’s43 conclusions that RF algorithms effectively handle complex data, as evidenced by their strong performance across multiple studies14,21,31,32,34.
Strengths of this systematic review included the comprehensive search strategy across five major databases (Embase, Medline, PubMed, Web of Science, Scopus), the use of the population, intervention, comparison, outcomes (PICO) and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) frameworks to structure the review, and the identification of additional references via spidering. These strategies reduced the risk of missing potentially relevant papers in the initial search. Inclusion and exclusion criteria were clearly defined before commencing screening to increase consistency, and a systematic multi-stage screening process was conducted to focus the review on the most relevant studies to address the research question.
However, several limitations should be noted. Restriction to studies published in English could introduce language bias, but was necessary given available resources. Excluding grey literature may also have affected the findings; given the rapid advancement of AI in healthcare, some of the most recent and innovative work might first appear in non-peer-reviewed sources. This is particularly relevant for AI applications for anxiety and depression in PD, which are at an earlier stage, as the limited number of studies could have led to an underestimation of current research. This exclusion criterion was implemented to ensure the systematic review focused on high-quality evaluations.
Conducting a meta-analysis would have strengthened our ability to quantify AI performance across studies, but was not possible due to the heterogeneity of models and outcomes. Future systematic reviews should consider conducting a meta-analysis, which would benefit from a less heterogeneous set of included studies, particularly in terms of comparable populations, study objectives, outcomes, and index tests or models. Another limitation is that the systematic review was conducted by a single reviewer, which may have increased the risk of bias and human error in quality assessment. Finally, publication bias may have influenced the findings, as studies with positive results are more likely to be published, potentially leading to an overestimated assessment of AI’s role in managing NMS in PD.
Multimodal models repeatedly demonstrated higher accuracy than models using single data sources for identifying NMS in PD, which indicates that investing in developing these models could help improve PD care. This aligns with a previous study by Prashanth et al.44, who showed that while prior studies had used non-motor, CSF, or imaging markers individually, their study was among the first to combine all three, and resulted in enhanced accuracy in classifying early PD from HC. However, to ensure reliability in real-world settings, external validation of AI tools in PD diagnosis is necessary to support clinical implementation, as several studies observed performance drops in testing sets. Future studies should emphasise rigorous validation using independent datasets and longitudinal studies to capture symptom progression. There is also evidence that NMS are interconnected and often co-occur; for example, Pearson et al. showed that sleep disorders can worsen cognitive impairment and increase depression risk45. Therefore, future research should also focus on early symptom prediction using baseline data and explore AI models that can assess multiple NMS.
The limited number of AI studies identified that focused on depression (n = 3) and anxiety (n = 1) in PD may be due to a broader research gap in the field. For example, Yang et al.39 found that over 60% of self-reported depression cases in PD went unrecognised by neurologists using standard tools like the UPDRS. This could result in insufficient reliable training data for AI development, contributing to the low number of published AI studies addressing these symptoms. Similarly, Jia et al.40 reported in 2024 that no previous studies had used machine learning to integrate clinical and neuroimaging data for identifying PD-related anxiety. Together, these findings indicate an imbalance in the current literature. As such, future research should develop more accurate diagnostic tools for depression and anxiety in PD, and extend promising multimodal AI approaches from cognitive and sleep disorders to these symptoms. Furthermore, future reviews should extend beyond English-language publications and include grey literature to identify additional relevant studies. These approaches will help develop more reliable and clinically useful AI tools for managing NMS in PD patients, ultimately supporting better clinical decision-making and patient outcomes.
This systematic review examined 27 studies assessing the ability of AI tools to diagnose, assess, and support the management of mental and behavioural NMS in PD patients. The available research focuses primarily on cognitive impairment and sleep, while NMS such as depression and anxiety are under-explored despite their high prevalence in PD patients. SVM and RF were the most commonly used algorithms and showed promising results across different NMS, though SVM was the only model tested in many studies. We found evidence of the superior performance of multimodal models that combine different data types compared to single-modality models, although the contribution of features varied by context. This indicates that while integrating multiple data sources can enhance diagnostic accuracy, feature selection needs careful planning based on the context. This systematic review suggests potential for AI applications in PD care. The included studies show that AI tools have achieved promising diagnostic performance in research settings for assessing NMS. This is valuable given that traditional diagnostic methods often overlook these symptoms3,39. However, to ensure this potential can be realised, robust external validation of models before clinical implementation must be prioritised, as several studies observed performance drops between training and testing sets. Given the interconnected nature of NMS in PD, developing AI models capable of simultaneously assessing multiple symptoms could substantially benefit comprehensive patient care. By doing so, future research could improve how NMS are identified and managed in PD patients, leading to more comprehensive and effective patient care.
Data availability
All data supporting the findings of this study are available within the article and its Supplementary Information. The detailed extraction table for all included studies is provided as Supplementary Data 8. Further inquiries can be directed to the corresponding author.
References
Ba, F., Obaid, M., Wieler, M., Camicioli, R. & Martin, W. R. W. Parkinson disease: the relationship between non-motor symptoms and motor phenotype. Can. J. Neurol. Sci. J. Canadien des. Sci. Neurologiques 43, 261–267 (2015).
Todorova, A., Jenner, P. & Ray Chaudhuri, K. Non-motor Parkinson’s: integral to motor Parkinson’s, yet often neglected. Pract. Neurol. 14, 310–322 (2014).
World Health Organization (2023). Parkinson disease. Available at: https://www.who.int/news-room/fact-sheets/detail/parkinson-disease (Accessed: June 2024).
Parkinson’s Foundation. Non-Movement Symptoms. Available at: https://www.parkinson.org/understanding-parkinsons/non-movement-symptoms/ (Accessed: 5 July 2024).
World Health Organization (2022). Parkinson disease: A public health approach. Technical brief. Geneva: World Health Organization. Available at: https://www.who.int/publications/i/item/9789240050983 (Accessed: 10 July 2024).
Yang, W. et al. Current and projected future economic burden of Parkinson’s disease in the U.S. npj Parkinson’s Disease 6, https://doi.org/10.1038/s41531-020-0117-1 (2020).
Gumber, A. et al. Economic, Social and Financial Cost of Parkinson’s on Individuals, Carers and their Families in the UK. Sheffield Hallam University (2017). Available at: https://shura.shu.ac.uk/15930/ (Accessed: 5 July 2024).
Chen, J. J. & Marsh, L. Anxiety in Parkinson’s disease: identification and management. Therapeutic Adv. Neurol. Disord. 7, 52–59 (2013).
Thompson, A. W. et al. Diagnostic accuracy and agreement across three depression assessment measures for Parkinson’s disease. Parkinsonism Relat. Disord. 17, 40–45 (2011).
Koshimoto, B. H. B. et al. Floor and ceiling effects on the Montreal Cognitive Assessment in patients with Parkinson’s disease in Brazil. Dementia & Neuropsychologia 17, https://doi.org/10.1590/1980-5764-dn-2023-0022 (2023).
Sun, M. et al. Predictive value of machine learning in diagnosing cognitive impairment in patients with Parkinson’s disease: a systematic review and meta-analysis. Ann. Palliat. Med. 11, 3775–3784 (2022).
Bounsall, K., Milne-Ives, M., Hall, A., Carroll, C. & Meinert, E. Artificial Intelligence Applications for Assessment, Monitoring, and Management of Parkinson Disease Symptoms: Protocol for a Systematic Review. JMIR Research Protocols 12, https://doi.org/10.2196/46581 (2023).
Whiting, P. F. et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies.
Jeon, J. et al. Accuracy of machine learning using the montreal cognitive assessment for the diagnosis of cognitive impairment in Parkinson’s Disease. J. Mov. Disord. 15, 132–139 (2022).
Morales, D. A. et al. Predicting dementia development in Parkinson’s disease using Bayesian network classifiers. Psychiatry Res. Neuroimag. 213, 92–98 (2013).
Ricciardi, C. et al. Machine learning can detect the presence of Mild cognitive impairment in patients affected by Parkinson’s Disease. (IEEE, 2020).
Zhang, J. et al. Identifying Parkinson’s disease with mild cognitive impairment by using combined MR imaging and electroencephalogram. Eur. Radiol. 31, 7386–7394 (2021).
Booth, S., Park, K. W., Lee, C. S. & Ko, J. H. Predicting cognitive decline in Parkinson’s disease using FDG-PET–based supervised learning. J. Clin. Investig. 132, https://doi.org/10.1172/jci157074 (2022).
Chen, P.-H., Hou, T.-Y., Cheng, F.-Y. & Shaw, J.-S. Prediction of Cognitive Degeneration in Parkinson’s Disease Patients Using a Machine Learning Method. Brain Sci. 12, https://doi.org/10.3390/brainsci12081048 (2022).
Harvey, J. et al. Machine learning-based prediction of cognitive outcomes in de novo Parkinson’s disease. npj Parkinson’s Disease 8, https://doi.org/10.1038/s41531-022-00409-5 (2022).
Shibata, H. et al. Machine learning trained with quantitative susceptibility mapping to detect mild cognitive impairment in Parkinson’s disease. Parkinsonism Relat. Disord. 94, 104–110 (2022).
Chen, B. et al. Detection of mild cognitive impairment in Parkinson’s disease using gradient boosting decision tree models based on multilevel DTI indices. J. Transl. Med. 21, https://doi.org/10.1186/s12967-023-04158-8 (2023).
Parajuli, M., Amara, A. W. & Shaban, M. Deep-learning detection of mild cognitive impairment from sleep electroencephalography for patients with Parkinson’s disease. Plos One 18, https://doi.org/10.1371/journal.pone.0286506 (2023).
Hosseinzadeh, M. et al. Prediction of Cognitive Decline in Parkinson’s Disease Using Clinical and DAT SPECT Imaging Features, and Hybrid Machine Learning Systems. Diagnostics 13, https://doi.org/10.3390/diagnostics13101691 (2023).
Anjum, M. F. et al. Resting-state EEG measures cognitive impairment in Parkinson’s disease. npj Parkinson’s Disease 10, https://doi.org/10.1038/s41531-023-00602-0 (2024).
Beheshti, I. & Ko, J. H. Predicting the occurrence of mild cognitive impairment in Parkinson’s disease using structural MRI data. Front. Neurosci. 18, https://doi.org/10.3389/fnins.2024.1375395 (2024).
Huang, X. et al. Structural connectivity from DTI to predict mild cognitive impairment in de novo Parkinson’s disease. NeuroImage Clin. 41, https://doi.org/10.1016/j.nicl.2023.103548 (2024).
Jian, Y. et al. Prediction of cognitive decline in Parkinson’s disease based on MRI radiomics and clinical features: A multicenter study. CNS Neurosci. Therapeutics 30, https://doi.org/10.1111/cns.14789 (2024).
Shu, Z. L. et al. fNIRS-based graph frequency analysis to identify mild cognitive impairment in Parkinson’s disease. J. Neurosci. Methods 402, https://doi.org/10.1016/j.jneumeth.2023.110031 (2024).
Sorensen, G. L., Kempfner, J., Jennum, P. & Sorensen, H. B. Detection of arousals in Parkinson’s disease patients. Annu. Int. Conf. IEEE Eng. Med Biol. Soc. 2011, 2764–2767 (2011).
Bisgin, P., Houta, S., Burmann, A. & Lenfers, T. REM sleep stage detection of Parkinson’s disease patients with RBD. 389 LNBIP, 35–45 (2020).
Byeon, H. Exploring the predictors of rapid eye movement sleep behavior disorder for Parkinson’s Disease patients using classifier ensemble. Healthcare 8, https://doi.org/10.3390/healthcare8020121 (2020).
Ko, Y.-F. et al. Quantification analysis of sleep based on smartwatch sensors for Parkinson’s Disease. Biosensors 12, https://doi.org/10.3390/bios12020074 (2022).
Chong-Wen, W., Sha-Sha, L. & Xu, E. Predictors of rapid eye movement sleep behavior disorder in patients with Parkinson’s disease based on random forest and decision tree. Plos One 17, https://doi.org/10.1371/journal.pone.0269392 (2022).
Raschellà, F., Scafa, S., Puiatti, A., Martin Moraud, E. & Ratti, P. L. Actigraphy enables home screening of rapid eye movement behavior disorder in Parkinson’s Disease. Ann. Neurol. 93, 317–329 (2022).
Rechichi, I., Gangi, L. D., Zibetti, M. & Olmo, G. Home Monitoring of Sleep Disturbances in Parkinson’s Disease: A Wearable Solution. 106–111 https://doi.org/10.1109/PerComWorkshops59983.2024.10502893 (2024).
Zhang, X. et al. Aberrant functional connectivity and activity in Parkinson’s disease and comorbidity with depression based on radiomic analysis. Brain Behav. 11, e02103 (2021).
Espinoza, A. I. et al. A pilot study of machine learning of resting-state EEG and depression in Parkinson’s disease. Clin. Parkinsonism Relat. Disord. 7, 100166 (2022).
Yang, Y. et al. Identifying depression in Parkinson’s Disease by using combined diffusion tensor imaging and support vector machine. Front. Neurol. 13, 878691 (2022).
Jia, M. et al. Early identification of Parkinson’s disease with anxiety based on combined clinical and MRI features. Front. Aging Neurosci. 16, 1414855 (2024).
Iakovakis, D. et al. Smartwatch-based activity analysis during sleep for early Parkinson’s disease detection. Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2020, 4326–4329 https://doi.org/10.1109/EMBC44109.2020.9176412 (2020).
Prashanth, R. et al. Parkinson’s disease detection using olfactory loss and REM sleep disorder features. Proceedings of the 2014 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2014, 5764–5767 https://doi.org/10.1109/EMBC.2014.6944937 (2014).
Altham, C., Zhang, H. & Pereira, E. Machine learning for the detection and diagnosis of cognitive impairment in Parkinson’s Disease: a systematic review. Plos One 19, https://doi.org/10.1371/journal.pone.0303644 (2024).
Prashanth, R. et al. High-accuracy detection of early Parkinson’s Disease through multimodal features and machine learning. Int. J. Med. Inform. 90, 13–21 (2016).
Pearson, O. et al. The relationship between sleep disturbance and cognitive impairment in mood disorders: a systematic review. J. Affect. Disord. 327, 207–216 (2023).
Author information
Contributions
S.C. conceived the topic, drafted the protocol, conducted the searches, screening, data extraction and analysis, and wrote the first draft of the manuscript. R.B.-S., C.C., and M.M.-I. contributed to the revisions. E.M. performed the final review and supervised the study. All authors contributed to the article and approved the submitted version.
Ethics declarations
Competing interests
E.M. is an Editorial Board Member of Nature Scientific Reports. E.M. is the Co-Founder and Chief Executive Officer of Gnosis Health Limited, a company specialising in the design and development of digital tools for chronic disease management. Newcastle University and the University of Plymouth are shareholders in Gnosis Health Limited. All other authors declare no competing interests.
Peer review
Peer review information
Communications Medicine thanks Oury Monchi, Artur Chudzik and Justyna Skibińska for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chou, S.C., Cong, C., Brownson-Smith, R. et al. Assessment of mental and behavioural non-motor symptoms of Parkinson’s Disease using Artificial Intelligence (AI): a systematic review. Commun Med 6, 101 (2026). https://doi.org/10.1038/s43856-025-01304-9
DOI: https://doi.org/10.1038/s43856-025-01304-9



