Abstract
Managing Parkinson’s disease (PD) through medication can be challenging due to varying symptoms and disease duration. This study aims to demonstrate the potential of sequence-by-sequence algorithms in recommending personalized medication combinations for patients with PD based on their previous visits. Our proposed method employs a gated recurrent unit model to predict accurate combinations of critical medication types for PD based on each patient’s motor symptoms and prescribed medication from previous visits. We built a multi-label model with gated recurrent units on two data architectures: (1) personalized input using each patient’s previous visits as a sample and (2) non-personalized input treating each visit as an independent sample. The 10-fold cross-validation results showed that the personalized architecture model outperforms the non-personalized model in accuracy (0.92), precision (0.94), recall (0.94), F1-score (0.94), Hamming loss (0.03), and macro average area under the receiver operating characteristic (0.94). To interpret the model’s predictions, we employed SHapley Additive exPlanations (SHAP) values, which provide insights into the importance of variables both globally (across the entire model) and at the individual patient level. The results contribute to the sequential-based decision support system potentially enhancing the remote management of PD pharmacologic issues.
Similar content being viewed by others
Introduction
Parkinson’s disease (PD) is a complex neurodegenerative disorder that progresses with loss of nerve cells in the substantia nigra of the brain1. Interaction of different factors such as genetics, age, gender, and lifestyle may lead to PD2,3. Patients with PD face various motor and non-motor symptoms during their lives4while there is no definite treatment for PD5. However, a multidisciplinary care program can approach complex diseases and improve patients’ quality of life6.
A critical aspect of the care and treatment program for PD is medication management, which effectively suppresses motor symptoms. As the severity of symptoms and disease duration for every patient differ, selecting the optimized combination of medication based on each patient is critical. PD medication can be categorized into three main groups: levodopa (LD), dopamine agonists (DA), and other medicines, including monoamine oxidase-B inhibitors (MAOBi), catechol-O-methyltransferase inhibitors (COMTi), amantadine, and anticholinergics7.
Although numerous guidelines are available for treating PD8,9the intricate nature of the disease can make it critical to extract relevant information from a comprehensive database encompassing a broad range of symptoms observed during successive visits. Decision support systems (DSSs) are invaluable10. In addition, in recent years, some researchers have focused on extracting disease severity using wearable sensors and smartphones, and achieving clinically matched results11,12,13 Integrating wearable systems with DSS can lead to the management of patients with Parkinson’s under the supervision of physicians. Artificial intelligence (AI) could speed up the achievement of accurate automated DSSs by using data from patients with PD.
Using actual patient data from multiple visits can offer physicians more precise insights and assist them in making optimized decisions based on prior experience, surpassing traditional guidelines. Because of the sequential nature of consecutive patient visits, we applied a branch of AI called recurrent neural networks (RNNs) to fill the gap. RNNs, with their internal memory, are perfect for processing sequential data step by step. Furthermore, multi-label classification is well performed by RNN, and therefore, it is suitable for our multi-medication study14.
Gated recurrent units (GRUs) and long-short-term memory (LSTM) cells are the most used types of RNNs that solve gradient vanishing issues of standard RNNs. However, the GRU model was selected in this study because it has fewer parameters and, as a result, faster calculations. Each unit of this network, having a reset gate and update gate, helps to capture short-term and long-term dependencies in sequences, respectively.
Although AI can provide more accurate solutions for decision support systems, the challenges associated with its functioning make it difficult to trust. We propose an explainable artificial intelligence (XAI) model to overcome this issue. The XAI helps users to understand the reasoning behind the AI decisions and trust the model by understanding which features have the most critical role in output decisions. Furthermore, using a local interpretation of XAI, the user can dive into the elements with a binding effect on the output for each instance, which is incredibly useful for clarifying our case of personalized decisions.
This work uses recurrent neural networks to predict medication combinations for PD patients. A personalized algorithm is more accurate than assuming each visit as an independent sample. An in-depth analysis was conducted using explainable AI to identify the parameters significantly impacting the model’s performance. The local interpreter determined the features significantly, predicting each patient’s medication combination.
Literature review
In recent years, substantial research has been performed by applying decision support systems to automate medication recommendations for patients with PD. Bohanec et al.15 developed expert models to suggest medication changes for patients. Researchers in the EU Horizon 2020 project PD-manager developed models to investigate when medication changes are needed, using motor and non-motor symptoms and epidemiological data as input16.
In another work, Boshkoska et al.17 developed two models based on expert knowledge and data. The researchers examined medication switching among eight states, focusing on symptoms such as dyskinesia severity, OFF periods, and mental health issues. They also considered factors like age, activity level, and comorbidities like hypertension and cardiovascular disease. Their proposed state transitions considered all potential LD, DA, and MAOBi combinations. An expert questionnaire and the Parkinson’s Progression Markers Initiative (PPMI)18 dataset were used to evaluate the model. In this group’s studies, the a-priori accuracy results in the expert model outperformed the data model.
Employing AI makes a significant change to develop decision support systems in areas such as disease diagnosis19,20,21selecting optimized treatment plans22,23,24,25automating surgery26,27risk management for disease prevention28,29,30enhancing medication decisions31,32,33,34drug discovery35,36,37,38and precision medicine39,40,41,42,43. Hong et al.44 employed recurrent neural networks to predict Alzheimer’s disease stages by considering patients’ information in consecutive visits. They used the LSTM model to connect patients’ previous temporal information to predict the disease’s future stage. Komorowski et al.22 optimized medication dosage to reduce sepsis mortality using a reinforcement learning (RL) algorithm.
Kim et al.45 employed artificial intelligence for the first time to predict an optimized medication regimen for patients with PD. They used the Markov decision process to consider the previous visit of each patient to predict the combination of medication in three classes of LD, DA, and all other medications to minimize motor symptoms (UPDRS part III) of the disease. Kim’s proposed model incorporates disease states at the current visit, actions involving combinations of three medication classes, and rewards representing the patient’s response to the treatment. To optimize future rewards, a virtual agent in a specific state selects an action through trial and error from previous options. The model’s objective is to minimize the overall UPDRS score. Using PPMI data, UPDRS part III was utilized to assess the effect of medication on each patient.
In a 2023 study, Baucum et al.46 extended the application of reinforcement learning (RL) to develop the first “patient-in-the-loop” framework to recommend personalized medication regimens using wearable wrist-mounted movement trackers. Using a dataset of 26 patients with PD, they paired a movement symptoms response to a medication simulation model to an RL policy. The model utilizes bradykinesia and dyskinesia scores derived from the wearable sensors, time since the last dose, three types of levodopa, and total equivalent dosage as state variables. It incorporated 22 actions, ranging from no medication to a combination of up to two levodopa dosages. Despite being tested on a limited dataset, the wearable-based RL policies demonstrated superior performance compared to policies derived from PPMI data.
In our previous research47we used recurrent neural networks to predict medication dosage for patients with PD. Recurrent neural networks, with their memory blocks, help to keep crucial information from previous patient visits. We input parameters such as all sub-items of the Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS)48Activities of Daily Living (ADL), age, and the Hoehn and Yahr scale into the recurrent neural networks over consecutive visits to predict medication doses. This study explores the potential of sequence-by-sequence time series methods to predict medication dosage using the PPMI dataset. This approach has not been extensively investigated in the context of PD’s treatment optimization and offers a promising way to model the temporal progression of the disease and its impact on medication needs. This method could provide insights into tailoring therapies for patients with PD by analyzing how dosage requirements may change over time.
Objectives
This study introduces a personalized approach to Parkinson’s disease (PD) medication management using a GRU-based model that leverages sequential data from each patient’s past visits. Unlike previous methods, such as quantitative multi-criteria analyses or earlier AI attempts, our model captures individual patient histories to achieve more accurate medication predictions. Furthermore, by incorporating explainable AI, we provide insights into the influence of past medications and current symptoms on model recommendations. This novel combination of personalized longitudinal modeling, multi-label classification, and explainable AI significantly advances decision support for optimizing PD medication regimens.
Data description and visualization
The dataset used in this study was extracted from the Parkinson’s Progression Markers Initiative (PPMI) database18 collected from various locations, making it unbiased from a single center. Table 1 shows the statistical parameters of extracted data. We employed the MDS-UPDRS Part III (motor symptoms) in “ON state,” age, gender, ADL, and medication. Patients with one visit without any follow-up were removed from the dataset. The remaining patients had an average of 6.4 visits, with a minimum of 2 and a maximum of 13 visits. Out of the 632 patients, all had at least two visits, while only five had at least 13 visits. The patients’ severity scores of UPDRS-Part III from the MDS-UPDRS were divided into four groups of symptoms: axial (sum of items 1, 9–13), rigidity (item 3), tremor (sum of items 15–18), and bradykinesia (sum of items 2, 4–9 and 14). Patients’ medication in each visit was extracted from the Levodopa Equivalent Daily Dosage (LEDD) concomitant medication log. This log contains patients’ medication usage with their commercial brands, LEDD dosage, and start and stop dates. All types of PD-related medication are categorized into three main groups: LD, DA, and other medications. The other medications consist of MAOBi, COMTi, and amantadine. Anticholarinegics were not included in the PPMI. Spelling mistakes were corrected, and different representations of medication names were considered the same. For example, “LEVODOP,” “LEVODAP,” and “L-DOPA,” which were recorded differently, were matched. The medication log’s start and stop dates were compared with patients’ visit dates.
Figure 1 shows the distribution of medication usage in the data set. There are eight combinations of LD, DA, and other medications from No medication (NoMed in Fig. 1) to all medications given (LDO in Fig. 1). As presented, the most common combinations are levodopa as a standalone medication (L), levodopa alongside other medications (LO), levodopa alongside dopamine agonists, and other medications (LDO).
Our study focuses on patients with early to moderate PD (mean Hoehn and Yahr score of 1.90 ± 0.56). The study population, drawn from the PPMI database, exhibits diverse motor symptoms (quantified by MDS-UPDRS Part III) and is actively undergoing various PD medication regimens. With an average of 6.4 follow-up visits each, this cohort provides the longitudinal data necessary to develop a personalized decision-support system that addresses the critical need for optimized medication management in PD.
Materials and methods
We selected RNNs for their well-established effectiveness in modeling sequential and temporal data, which is a key characteristic of our dataset. These models are particularly suited for capturing dependencies over time, making them a natural choice for analyzing longitudinal patient data. Additionally, RNNs strike a balance between complexity and computational feasibility. Advanced architectures, while powerful, often require significantly more computational resources and larger datasets to achieve optimal performance. Using RNNs, we aimed to provide a transparent and interpretable analysis that aligns with the practical constraints of our study, such as dataset size and clinical relevance.
Gated recurrent units
RNNs are promising solutions for sequential inputs. Given a sequence of \(\:X=({x}_{1},{x}_{2},\:.\:.\:.,\:{x}_{t})\), an RNN updates a recurrent hidden state \(\:\left({h}_{t}\right);\) the activation is based on
the previous time. The hidden state is updated based on
where \(\:\sigma\:\) is a bounded and smooth function, and W and U are the coefficients of \(\:{x}_{t}\) and \(\:{h}_{t-1}\), respectively. However, the vanishing gradient of RNNs makes it challenging for them to train long-term dependencies. Recurrent units such as LSTM and GRU models were proposed to address this issue49,50. This study used the GRU model due to its faster computational performance over LSTM models.
The main concept behind the GRUs is to update the hidden state at each time step, selectively. The GRU architecture includes a reset gate, which determines how much of the previous hidden state should be forgotten, and an update gate, which specifies how much of the new input should be used to update the hidden state. The reset gate \(\:\left({r}_{t}\right)\) is denoted as
where \(\:{W}_{r}\) and \(\:{U}_{r}\) are weights of input and previous hidden states, and \(\:{b}_{r}\) is the bias term. The update gate \(\:\left({z}_{t}\right)\) can be calculated as
where \(\:{W}_{z}\) and \(\:{U}_{z}\) are weights of input and previous hidden states and \(\:{b}_{z}\) is the bias term. A candidate hidden state can be computed as
in which \(\:{W}_{\stackrel{\sim}{h}}\) and \(\:{U}_{\stackrel{\sim}{h}}\) are weights of input and previous hidden states, which are multiplied by the reset gate and \(\:{b}_{\stackrel{\sim}{h}}\) is the bias term.
Finally, the hidden state at time t can be written as
Multi-label GRU model
We propose a multi-label classification network to predict the use of LD, DA, or other types of medication. Multi-label refers to allowing a sample to have more than one label as a correct prediction, but distinct labels will be used to compute a single loss. To make personalized predictions based on each patient, we have built the model inputs by considering symptoms and medication in the patient’s last five visits47. The model architecture for the medication combination prediction includes an input layer, three hidden layers, and an output layer (Fig. 2).
The input layer involves twelve features and the last five visits of each patient (time-steps = 5). The input features were motor symptoms consisting of axial, rigidity, tremor, and bradykinesia in “ON state,” total UPDRS part III score, Hoehn and Yahr scale to assess functional disabilities associated with PD, Daily Living Activities, which presents how much a patient can do their routines and essential activities without help, gender, age, and the presence of each of three class of LD, DA and other medication in the previous visit (0 or 1). All features were normalized to [0,1] for more homogeneity. In constructing the input layer, the last five visits were considered for patients with more than five consecutive visits. For patients with fewer than five visits, the inputs were padded to five by inserting zeros for features at the start of the sequence. The proposed model’s input is a 3D array whose dimensions are number of patients (632), time-steps (5), and number of features (12).
The hidden layers in this model comprise two GRU layers and one fully connected dense layer. The first GRU has 100 units, while the second has 50 units. The fully connected dense layer has been optimized with 20 neurons using trial and error techniques. The L2 kernel regularization was applied to avoid overfitting. The Relu activation function was used for all hidden layers since it addresses the gradient vanishing.
Finally, the output layer for our multi-medication purpose has three neurons, each for a medication class, including LD, DA, and Other (MAOBi, COMTi, and amantadine). The ‘Sigmoid’ activation function was used for the output classification layer to predict any medication type’s presence (1) or absence (0). For example, the output for patients using a combination of LD and DA without any other medication would be “110”.
As a comparison, we applied a non-personalized model with another input form, in which all patients’ samples from different visits without considering each patient’s previous visit history (time-step = 1) were considered. In this case, the input is a 3D array with subsequent dimensions: the number of all visits (4041), time-steps of (1), and the number of features (12). Furthermore, we evaluated simpler machine learning models, namely logistic regression51 and random forest52in addition to the non-personalized baseline model.
Evaluation metrics
In multi-label classification, each sample can belong to more than one class. This means that accuracy may not be a fair evaluation metric. The metrics used are the same as in multi-class classification. Here, we describe a multi-label model’s most critical performance metrics53. Considering a multi-label classifier with n sample, k label of two classes 0,1, \(\:{Y}_{i}\) ground truth label of \(\:{i}^{th}\) sample and \(\:\widehat{{Y}_{i}}\:\)predicted labels of \(\:{i}^{th}\) sample, the example-based performance metrics will be as follows54.
Subset accuracy, also known as exact match ratio (MR), measures when the actual Y is the same as \(\:\widehat{Y}\), and it is an extension of accuracy for single-label classification. MR ignores partially matched as completely unmatched samples.
Precision is the average of matched predicted labels divided by the total number of predicted labels through all samples. It shows how many predicted true labels are actually in the ground truth.
Recall is the average of matched predicted labels divided by the total number of the actual labels through all samples and presents what proportion of the actual labels were predicted.
F1 measure combines precision and recall and indicates the correct prediction across the entire dataset.
Hamming loss is the prediction error and missing error normalized over the total number of samples and classes.
Confidence Interval (CI) is a statistical range used to estimate the uncertainty of a performance metric, providing a range within which the true value is likely to fall with a specified level of confidence (e.g., 95%). For a metric M, the CI is calculated as:
where \(\:\stackrel{-}{M}\) is the mean metric, \(\:{Z}_{\raisebox{1ex}{$\alpha\:$}\!\left/\:\!\raisebox{-1ex}{$2$}\right.}\) is the critical value for the desired confidence level (e.g., 1.96 for 95%), and SE is the standard error of the metric.
Receiver operating characteristic (ROC) is a measure to visually evaluate binary classification by measuring the area under the ROC curve (AUC). In the ROC curve, the y-axis indicates the true positive rate (TPR), and the x-axis is the false positive rate (FPR). As a result, the area under the curve and the steepness of it show essential information. The classification results become more accurate with a higher AUC and curve slope. In multi-class and multi-label classification problems, two methods exist to obtain TPR and FPR. The first one is one-vs-rest, which considers each class across all other classes, and another is one-vs-one, which considers every pairwise combination of classes.
To enhance our understanding of the relationships between medication classes and their effect on classification accuracy, we conducted an analysis using a co-occurrence matrix54. By tracking how frequently instances of each class overlap with instances of other medication classes, this matrix can provide insight into the extent of label confusion and potential misclassification.
Shapley additive explanation
SHapley Additive exPlanations (SHAP) is a framework based on game theory that explains black-box supervised learning algorithms. Estimating the importance of each feature in collaboration with others defines the positive and negative effects of input features on the output55. The importance of each input feature (\(\:{\phi\:}_{i}\)) in SHAP is defined as:
In Eq. 11, f(S) is the output of the model with a set of input features of S, N is the set involving all features, and the shapely value (\(\:{\phi\:}_{i}\)) of ith feature is calculated by averaging its contribution through all possible permutations of the feature set. Among different introduced methods to interpret artificial intelligence algorithms56,57,58we selected SHAP because it can demonstrate the positive and negative direction of effect alongside its magnitude (global interpretability); furthermore, it can calculate SHAP values for a single prediction without considering the whole dataset (local interpretability), which is very useful to investigate the effective parameters for each patient individually in our study. Finally, SHAP can explain a wide range of models, including linear models, tree-based models, and neural networks59. Therefore, in this study, we utilized SHAP values as the primary method for model interpretability. SHAP values provide insight into the contribution of each feature to the model’s predictions, offering a local interpretability approach.
Results
To ensure that the results are more dependable, we used k-fold cross-validation in this study. We divided the dataset into k non-overlapping folds and used each fold as a test set while the rest were combined to create a training set. After fitting and evaluating k models on the test sets, we report the mean performance. The method’s performance on personalized and non-personalized input with 10-fold cross-validation can be seen in Table 2. Our proposed model improves classification results with higher average accuracy, precision, recall, F1 scores, and lower Hamming Loss. The results in Table 2 are the average of a 10-fold cross-validation process to increase reliability. To further evaluate the potential of our personalized model to predict the multi-medication therapy of patients with PD, the ROC curves were drawn for both personalized and non-personalized models (Fig. 3).
In our 10-fold cross-validation with 3 repeats, we evaluated 1,890 true and predicted labels. The most frequent misclassification (observed in 25 repetitions) occurred between LD-only usage and LD combined with other medications.
This paper used the one-vs-rest (OVR) method to binarize the output. In the one-vs-rest method, the ROC curve is computed for all classes where each class is considered true, and all other classes are false. There are two notions in the ROC curve: macro-average and micro-average AUC. Macro- average AUC is the average of AUC through all classes, and micro-average is a weighted score characterized by more frequent classes. Figure 3(a) shows the ROC of our proposed sequential multi-label classification for personalized input structures. The macro-average and micro-average are 0.94 and 0.96 for personalized input, respectively. Moreover, the area under the curve for LD, DA, and other medications is 0.9, 0.96, and 0.97. Figure 3(b) presents the ROC curve for non-personalized input, in which the macro-average and micro-average are 0.85 and 0.84, respectively. Furthermore, the AUC for LD, DA, and other medications is 0.75, 0.87, and 0.9 less than our personalized model.
The co-occurrence matrix revealed overlaps between the LD class and other medication groups, particularly Other (315 instances) and DA (210 cases). This overlap suggests potential label confusion, as instances of LD may share similar feature distributions with Other and DA, leading to misclassification and a reduction in the model’s accuracy for the LD class.
The ROC of one vs. rest classification of personalized and non-personalized multi-label predictions. (a) Personalized input with macro-average and micro-average AUC of 0.94 and 0.96, respectively, showing higher AUC for individual medication classes. (b) Non-personalized input with macro-average and micro-average AUC of 0.85 and 0.84, demonstrating lower performance compared to the personalized model.
Feature importance using SHAP
Feature importance is a way of measuring how much each feature contributes to the model’s prediction. A SHAP values bar plot is a visualization type that can show a feature’s importance for a given model. It can help explain how the model works and the features most relevant to Its decisions. As demonstrated in Fig. 4, five key parameters for the LD class are LD, DA, gender, other medication usage in previous visits, and ADL. For the DA class, the DA, LD usage in the earlier visits, age, other medication, and gender are included. other medication class includes LD, other medication, and DA usage in previous medication, gender, and ADL.
Local interpretation
The SHAP force plot is a visualization that shows the impact of features on the model output for a specific instance or group of instances. It helps discover how features contribute to deciding each patient’s medication combination. Our study randomly selected a sample to represent how features impact its three labels. The SHAP force plot in Fig. 5 visualizes the most influential features of the prediction for the selected sample. The bold number on the x-axis is the baseline for a specific observation, the average of all predictions60. The feature contributions’ length and color indicate the effect’s magnitude and direction.
For example, a long and red contribution means that the feature has a positive and large impact on the model output. In contrast, a short and blue contribution means that the feature has a negative and small impact on the model output. The SHAP values were averaged through the five last patient visits for easier representation. As depicted, the baseline for LD is 0.90 for the randomly selected patients. The increasing probability of using LD effects, such as previous LD usage and age, are offset by decreasing impacts, such as previous DA and other medications. The low probability baseline of 0.34 for DA is affected by decreasing effects such as previous DA and LD usage, offset by the low, increasing effects of tremor and bradykinesia. In the same way, the other medication probability is also low for this patient (baseline = 0.47), which decreases effects such as previous medication usage and age, overcoming increasing effects such as bradykinesia and previous DA usage.
SHAP local interpretation for a sample patient’s medication usage: (a) LD, (b) DA, and (c) other medications. The SHAP values, averaged over the last five visits, highlight the positive and negative impacts of features such as previous medication usage, age, and symptoms like tremor and bradykinesia on medication probabilities.
Conclusion
In this study, we introduced a decision-making algorithm designed to suggest suitable medication combinations for patients with Parkinson’s disease based on their past visit history. We discovered that taking into account each patient’s medication history can significantly enhance the recommendations for their next visit. This kind of system could assist both physicians and patients in making informed decisions and could be integrated into an automated system for remotely managing Parkinson’s disease patients. We utilized an explainable AI system to analyze how input features could impact output results for each patient in both general and local contexts. This tool allows physicians to focus on the most crucial features of each patient, considering their previous medication and symptoms.
Discussion
By incorporating data from various regions, our analysis reflects a broader spectrum of patient characteristics, treatment practices, and disease progression patterns, all critical for developing robust and widely applicable predictive models. The study carries enhanced clinical relevance on a global scale due to the diversity of its findings, which mitigates the risk of results being disproportionately influenced by regional healthcare practices, socio-economic factors, or demographic-specific variations. The proposed algorithm can be evaluated for its predictive performance and clinical utility in real-world clinical settings in the future. Prospective validation would be achieved by integrating the algorithm into routine clinical workflows, where its recommendations can be evaluated alongside clinical decision-making processes. Confirming its practical applicability and identifying areas for refinement would be achieved by this approach, ensuring that the model aligns with the dynamic and multifaceted nature of patient care in Parkinson’s disease.
The gender imbalance (372 men vs. 260 women) and excluding anticholinergic medications may introduce biases that could affect model generalizability. These factors might limit our model’s applicability to more diverse populations and treatment regimens. Future work should aim to mitigate these biases through balanced datasets and the inclusion of all relevant medications to enhance the model’s robustness and broader applicability.
In our preliminary experiments comparing GRU and LSTM, both models demonstrated comparable performance across key metrics: accuracy (90%), precision (94%), recall (94%), F1-score (93%), and Hamming loss (3%). Given the similar results and GRU’s lower computational complexity, we selected GRU to optimize the modeling process. While the current study focuses on the use of RNNs to model longitudinal patient data, we acknowledge that there are more advanced models, such as Transformers and attention-based mechanism61and Temporal Convolutional Networks62that may offer improved performance and scalability, particularly for more complex datasets, including image documents. In future work, we plan to explore these advanced methods to assess their effectiveness compared to RNNs. These newer architectures may provide enhanced accuracy and the ability to capture more intricate patterns in the data, especially when dealing with longer sequences or more heterogeneous datasets. However, implementing such models would require additional computational resources and a larger dataset for optimal training. As part of future developments, we will aim to integrate these techniques to further enhance the robustness and generalizability of the model, building on the findings from this study.
The proposed GRU-based model using 3D data offers a personalized approach by incorporating sequential information from patients’ consecutive visits, outperforming the non-personalized baseline model that treated each visit as an independent observation. To further validate the robustness of our approach, we also compared it against simpler machine learning models, such as Logistic Regression and Random Forest, alongside the non-personalized model. These simpler models were applied to a 2D representation of the data and consistently demonstrated lower performance, highlighting the value of incorporating temporal dynamics and personalized modeling in predicting outcomes.
Feature importance rankings derived from SHAP values have significant potential to support clinical decision-making. These rankings can aid in risk stratification by identifying patients most likely to benefit from specific interventions based on their risk profile. Furthermore, by consistently identifying highly important features, SHAP analysis can direct clinical attention toward predictors that may warrant additional investigation or monitoring. For example, the key features identified in our SHAP importance analysis—LD, DA, gender, other PD medications, and ADL—are crucial for predicting whether levodopa should be included in subsequent medication regimens. These findings align with established clinical guidelines for PD management. Levodopa remains the cornerstone of PD treatment, and motor symptoms influence its future prescription. Dopamine agonists, on the other hand, help manage fluctuations and delay levodopa dependency. Moreover, gender differences impact treatment choices, with men often experiencing more motor symptoms. The use of other PD medications, such as MAOBi and COMTi inhibitors, also plays a role in optimizing therapy and preventing motor complications. Lastly, ADL scores, reflecting functional decline, are key indicators for adjusting therapy to maintain patient quality of life. These features underscore the importance of personalized decision-making in PD treatment planning.
The use of SHAP force plots in this study allowed us to visually demonstrate how individual features can influence the choice of LD, DA, or other medications for a specific patient. Clinical workflows can be directly integrated with these insights in multiple ways. In the first place, they provide physicians with explanations that are both interpretable and actionable, which help them make individualized treatment decisions and enhance shared decision-making with patients. They emphasize the primary factors that influence medication choice, enabling clinicians to prioritize these factors during patient assessments. Finally, aggregating these interpretations across patients could inform clinical guidelines and refine treatment protocols. By bridging the gap between predictive modeling and clinical application, SHAP-based local explanations can contribute to more transparent, data-driven, and patient-centered care.
Our analysis revealed misclassifications mostly between LD-only and LD combination alongside other therapies; clinically, other medications such as MAOBi, COMTi and amantadine are often prescribed to enhance levodopa’s effects or manage motor symptoms in PD, resulting in comparable therapeutic outcomes. Consequently, patients on LD monotherapy and those on combination therapies may exhibit overlapping symptom profiles, making differentiation challenging. Co-occurrence matrix results further showed label confusion between LD and Other classes, reducing model accuracy. The underrepresentation of subgroups contributed to these errors, highlighting the need for better model sensitivity and balanced datasets. These findings emphasize the clinical complexity of PD treatment and the importance of addressing overlaps in future predictive models.
While our data-driven approach offers significant advantages over expert-driven systems like PD_manager15as it adapts to individual patient histories and predicts specific medication combinations, it is not without limitations. Our model’s performance depends on sufficient longitudinal patient data, which may be a challenge in specific clinical settings or for newly diagnosed patients. Furthermore, while our explainable AI component enhances transparency, it does not capture the deep causal relationships between symptoms, medications, and outcomes that expert clinicians develop over years of experience. Therefore, future research should focus on hybrid approaches, integrating expert-elicited knowledge and data-driven methods, as well as exploring causal inference techniques to understand the more profound impact of interventions.
While our GRU-based decision-support system demonstrates strong predictive accuracy in research populations—as evidenced by its performance on the PPMI dataset—its generalizability to real-world clinical settings remains uncertain. As underscored by Beaulieu-Jones et al.63disease progression and treatment patterns differ markedly between research cohorts and clinical practice. Their findings reveal that real-world PD populations are typically older, exhibit faster cognitive decline, carry more significant comorbidity burdens, and experience delayed treatment initiation compared to research cohorts like PPMI. These discrepancies, particularly in treatment timing and disease trajectory, suggest that our model’s performance in complex, real-world scenarios requires further validation. To address this limitation, future work will focus on two critical steps: (1) adapting the algorithm’s architecture to account for delayed treatment initiation and accelerated cognitive decline observed in clinical populations, and (2) rigorous evaluation on diverse, real-world datasets that better reflect the heterogeneity of broader PD populations, including socioeconomic and demographic variability. Such efforts are essential to bridge the gap between research validation and clinical deployment.
As part of our future work, we plan to investigate additional fairness metrics and techniques, such as adversarial debiasing or re-weighting the training samples, to reduce potential bias further. Additionally, we aim to collaborate with healthcare professionals to explore the ethical and clinical implications of deploying the model in real-world settings.
The training process took approximately 202.4 s and consumed a peak memory of 606.64 MB, performed on a MacBook Air with an Apple M1 chip and 16 GB of memory. While the GRU model with 10-fold cross-validation used in this study demonstrates reasonable computational efficiency for the current dataset size, we acknowledge that resource demands may increase significantly with larger datasets or more frequent model retraining in real-world applications. To mitigate these challenges, we propose potential strategies such as model optimization, dimensionality reduction, or employing distributed computing resources to enhance scalability.
Data availability
The dataset generated and analyzed during the current study was extracted from the PPMI database. Up-to-date information is available at www.ppmi-info.org by request. For further inquiries regarding data access, please contact the corresponding author, Mehdi Delrobaei, at delrobaei@kntu.ac.ir.
References
Armstrong, M. J. & Okun, M. S. Diagnosis and treatment of Parkinson disease. JAMA 323, 548 (2020).
Marras, C., Canning, C. G. & Goldman, S. M. Environment, lifestyle, and parkinson’s disease: implications for prevention in the next decade. Mov. Disord. 34, 801–811 (2019).
Picillo, M. et al. The relevance of gender in parkinson’s disease: a review. J. Neurol. 264, 1583–1607 (2017).
Sveinbjornsdottir, S. The clinical symptoms of parkinson’s disease. J. Neurochem. 139, 318–324 (2016).
Kalia, L. V. & Lang, A. E. Parkinson’s disease. Lancet 386, 896–912 (2015).
Tosserams, A., de Vries, N. M., Bloem, B. R. & Nonnekes, J. Multidisciplinary care to optimize functional mobility in Parkinson disease. Clin. Geriatr. Med. 36, 159–172 (2020).
Grosset, D. G., Macphee, G. J. A. & Nairn, M. Diagnosis and Pharmacological management of parkinson’s disease: summary of SIGN guidelines. BMJ 340, b5614–b5614 (2010).
Oertel, W. & Schulz, J. B. Current and experimental treatments of Parkinson disease: A guide for neuroscientists. J. Neurochem. 139, 325–337 (2016).
Olanow, C. W., Watts, R. L. & Koller, W. C. An algorithm (decision tree) for the management of parkinson’s disease (2001):: treatment. Neurology 56, S1–S88 (2001).
Schaefer, A. J., Bailey, M. D., Shechter, S. M. & Roberts, M. S. Modeling medical treatment using Markov decision processes. in 593–612 (2005).
Chen, O. Y. et al. Building a Machine-Learning framework to remotely assess parkinson’s disease using smartphones. IEEE Trans. Biomed. Eng. 67, 3491–3500 (2020).
Delrobaei, M. et al. Towards remote monitoring of parkinson’s disease tremor using wearable motion capture systems. J. Neurol. Sci. 384, 38–45 (2018).
Pulliam, C. L. et al. Continuous assessment of Levodopa response in parkinson’s disease using wearable motion sensors. IEEE Trans. Biomed. Eng. 65, 159–164 (2018).
Lipton, Z. C., Kale, D. C., Elkan, C. & Wetzel, R. Learning to Diagnose with LSTM Recurrent Neural Networks. (2015).
Bohanec, M. et al. A decision support system for Parkinson disease management: expert models for suggesting medication change. J. Decis. Syst. 27, 164–172 (2018).
[Online] http://www.parkinson-manager.eu/ (Accessed: May 30, 2025).
Boshkoska, B. M. et al. Decision support for medication change of parkinson’s disease patients. Comput. Methods Programs Biomed. 196, (2020).
Marek, K. et al. The Parkinson progression marker initiative (PPMI). Prog Neurobiol. 95, 629–635 (2011).
Orchi, H., Sadik, M. & Khaldoun, M. On using artificial intelligence and the internet of things for crop disease detection: A contemporary survey. Agriculture 12, 9 (2021).
Kumar, Y., Koul, A., Singla, R. & Ijaz, M. F. Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. J. Ambient Intell. Humaniz. Comput. 14, 8459–8486 (2023).
Amato, F. et al. Artificial neural networks in medical diagnosis. J. Appl. Biomed. 11, 47–58 (2013).
Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. C. & Faisal, A. A. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 24, 1716–1720 (2018).
Kearney, V., Chan, J. W., Valdes, G., Solberg, T. D. & Yom, S. S. The application of artificial intelligence in the IMRT planning process for head and neck cancer. Oral Oncol. 87, 111–116 (2018).
Wang, C., Zhu, X., Hong, J. C. & Zheng, D. Artificial intelligence in radiotherapy treatment planning: present and future. Technol. Cancer Res. Treat. 18, 153303381987392 (2019).
Kiser, K. J., Fuller, C. D. & Reed, V. K. Artificial intelligence in radiation oncology treatment planning: a brief overview. J. Med. Artif. Intell. 2, 9–9 (2019).
Wall, J. & Krummel, T. The digital surgeon: how big data, automation, and artificial intelligence will change surgical practice. J. Pediatr. Surg. 55, 47–50 (2020).
Gumbs, A. A. et al. Artificial intelligence surgery: how do we get to autonomous actions in surgery?? Sensors 21, 5526 (2021).
Newby, D. et al. Artificial Intelligence for Dementia Prevention (Alzheimer’s & Dementia, 2023).
Al-Maini, M. et al. Artificial intelligence-based preventive, personalized and precision medicine for cardiovascular disease/stroke risk assessment in rheumatoid arthritis patients: a narrative review. Rheumatol. Int. 43, 1965–1982 (2023).
Dong, J. et al. Application of big data and artificial intelligence in COVID-19 prevention, diagnosis, treatment and management decisions in China. J. Med. Syst. 45, 84 (2021).
Iancu, A., Leb, I., Prokosch, H. U. & Rödle, W. Machine learning in medication prescription: a systematic review. Int. J. Med. Inf. 105241 (2023).
Poweleit, E. A., Vinks, A. A. & Mizuno, T. Artificial intelligence and machine learning approaches to facilitate therapeutic drug management and Model-Informed precision dosing. Ther. Drug Monit. 45, 143–150 (2023).
Eggerth, A., Hayn, D. & Schreier, G. Medication management needs information and communications technology-based approaches, including telehealth and artificial intelligence. Br. J. Clin. Pharmacol. 86, 2000–2007 (2020).
Romm, E. L. & Tsigelny, I. F. Artificial intelligence in drug treatment. Annu. Rev. Pharmacol. Toxicol. 60, 353–369 (2020).
Zhu, H. Big data and artificial intelligence modeling for drug discovery. Annu. Rev. Pharmacol. Toxicol. 60, 573–589 (2020).
Chan, H. C. S., Shan, H., Dahoun, T., Vogel, H. & Yuan, S. Advancing drug discovery via artificial intelligence. Trends Pharmacol. Sci. 40, 592–604 (2019).
Jiménez-Luna, J., Grisoni, F., Weskamp, N. & Schneider, G. Artificial intelligence in drug discovery: recent advances and future perspectives. Expert Opin. Drug Discov. 16, 949–959 (2021).
Deng, J., Yang, Z., Ojima, I., Samaras, D. & Wang, F. Artificial intelligence in drug discovery: applications and techniques. Brief. Bioinform 23, (2022).
Bhinder, B., Gilvary, C., Madhukar, N. S. & Elemento, O. Artificial intelligence in Cancer research and precision medicine. Cancer Discov. 11, 900–915 (2021).
Ahmed, Z., Mohamed, K., Zeeshan, S. & Dong, X. Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine. Database (2020). (2020).
Hamamoto, R. et al. Application of artificial intelligence technology in oncology: towards the establishment of precision medicine. Cancers (Basel). 12, 3532 (2020).
Subramanian, M. et al. Precision medicine in the era of artificial intelligence: implications in chronic disease management. J. Transl Med. 18, 472 (2020).
Filipp, F. V. Opportunities for artificial intelligence in advancing precision medicine. Curr. Genet. Med. Rep. 7, 208–213 (2019).
Hong, X. et al. Predicting alzheimer’s disease using LSTM. IEEE Access. 7, 80893–80901 (2019).
Kim, Y., Suescun, J., Schiess, M. C. & Jiang, X. Computational medication regimen for parkinson’s disease using reinforcement learning. Sci. Rep. 11, (2021).
Baucum, M., Khojandi, A., Vasudevan, R. & Ramdhani, R. Optimizing Patient-Specific medication regimen policies using wearable sensors in parkinson’s disease. Manage. Sci. 69, 5964–5982 (2023).
Riasi, A., Delrobaei, M. & Salari, M. A decision support system based on recurrent neural networks to predict medication dosage for patients with parkinson’s disease. Sci. Rep. 14, 8424 (2024).
Goetz, C. G. et al. MDS-UPDRS 1, (2008).
Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. (2014).
Hochreiter, S. & Schmidhuber, J. Long Short-Term memory. Neural Comput. 9, 1735–1780 (1997).
Nick, T. G. & Campbell, K. M. Logistic regression. Top. Biostatistics, 273–301 (2007).
Breiman, L. Random Forests Mach. Learn. 45, 5–32 (2001).
Sorower, M. S. A literature survey on algorithms for Multi-Label learning. Or. State Univ. Corvallis. 18, 25 (2010).
Kafrawy, P., El, Mausad, A. & Esmail, H. Experimental comparison of methods for Multi-Label classification in different application domains. Int. J. Comput. Appl. vol. 114 (2015).
Winter, E. Chapter 53 The shapley value. Handbook of Game Theory with Economic Applications 3, 2025–2054 (2002).
Štrumbelj, E. & Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 41, 647–665 (2014).
Ribeiro, M. T., Singh, S. & Guestrin, C. ‘Why should i trust you?’ Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 1135–1144 (2016).
Shrikumar, A., Greenside, P. & Kundaje, A. Learning Important Features Through Propagating Activation Differences. In International conference on machine learning, 3145–3153 (2017).
Chen, S. Interpretation of multi-label classification models using Shapley values. ArXiv Preprint arXiv :210410505 (2021).
Lubo-Robles, D. et al. Machine learning model Interpretability using SHAP values: application to a seismic facies classification task. In SEG technical program, D021S008R006 (2020).
Vaswani, A. et al. Attention is all you need. Adv. Neural. Inf. Process. Syst. 30, I (2017).
Hao, H. et al. Temporal Convolutional Attention-based Network For Sequence Modeling. arXiv preprint arXiv:2002.12530 (2020).
Beaulieu-Jones, B. K. et al. Disease progression strikingly differs in research and real-world parkinson’s populations. NPJ Parkinsons Dis. 10, (2024).
Acknowledgements
This work was supported by K. N. Toosi University of Technology’s Chancellor’s Visionary Grant.
Author information
Authors and Affiliations
Contributions
A.R. performed the research and analyzed the data. A.R. and M.D. designed the conceptual framework and drafted the manuscript. M.S. provided the clinical perspective. All authors contributed to the interpretation of data and reviewed and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Riasi, A., Delrobaei, M. & Salari, M. Personalized medication recommendations for Parkinson’s disease patients using gated recurrent units and SHAP interpretability. Sci Rep 15, 19074 (2025). https://doi.org/10.1038/s41598-025-04217-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-025-04217-8