Introduction

Cognitive impairment (CI) is a common non-motor manifestation of Parkinson’s disease (PD), often emerging in the early stages of the disease and significantly impacting patient quality of life and caregiver burden1,2. It primarily affects the executive/attention, memory, visuospatial and language domains2. Approximately 30% of patients with PD have mild cognitive impairment (PD-MCI), which may progress to PD with dementia (PDD)3. Because the widely used Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA) show imperfect sensitivity and specificity for PD-related CI and may miss milder or domain-specific deficits4,5, early detection of CI is challenging. Consequently, there is an urgent need for objective biomarkers that can assist in early diagnosis and monitoring of CI in PD.

Recent studies have linked bile acid (BA) dysregulation with neurodegeneration in both PD and Alzheimer’s disease (AD)6,7. However, it is essential to distinguish the disease-specific mechanisms that drive cognitive decline in these two neurodegenerative disorders. In AD, BA dysregulation has been primarily associated with the accumulation of amyloid plaques and tau. In contrast, PD is characterized by the accumulation of misfolded α-synuclein within the central nervous system, initially involving the dorsal motor nucleus of the vagus and anterior olfactory structures, and subsequently progressing in a stereotyped caudo-rostral pattern toward cortical regions8. Notably, α-syn pathology has been detected in the gastrointestinal tract before motor symptoms appear, suggesting a role for the gut-brain axis in PD pathogenesis9,10. Human studies report dysbiosis accompanied by elevations of secondary, microbially derived BAs in PD, suggesting a microbiota–BA link11.

BAs, synthesized in the liver from cholesterol and subsequently metabolized by the gut microbiota, act as signaling molecules that regulate metabolic, inflammatory, and neuronal processes12,13. Evidence from non-PD models suggests that BAs can influence cognitive function and blood–brain barrier (BBB) integrity14, though this remains hypothesis-generating and requires direct validation in PD. Several reviews have indicated that specific BAs can signal within the central nervous system (CNS)14,15,16, and circulating BA concentrations correlate with brain levels17, suggesting that peripheral BA alterations may modulate CNS physiology. Recent studies indicate that BA metabolism is disrupted in prodromal PD animal models18, and human investigations have revealed altered serum levels of specific BAs—most notably reduced unconjugated primary BAs such as cholic acid (CA) and chenodeoxycholic acid (CDCA), along with elevated conjugated species like glycochenodeoxycholic acid (GCDCA)—which correlate with the severity of CI. In addition, neuroprotective BAs such as ursodeoxycholic acid (UDCA) and tauroursodeoxycholic acid (TUDCA) have demonstrated mitochondrial rescue and anti-apoptotic effects in PD experimental models and early clinical studies19,20,21,22.

In PD, dysregulated BA metabolism may contribute to cognitive decline through the accumulation of neurotoxic secondary BAs, particularly deoxycholic acid (DCA), and the depletion of protective BAs such as UDCA20,23. Although evidence from AD supports an association between altered BA profiles and cognitive deterioration, experimental studies suggest that secondary BAs such as DCA may exacerbate neurodegeneration in AD through neuroinflammatory and oxidative stress pathways24. However, their specific contribution to CI associated with PD remains to be elucidated. Prior studies investigating BA dysregulation in PD have often been constrained by small sample sizes, inconsistent results, and limited analytical sensitivity of conventional assay platforms. Moreover, few studies have integrated advanced computational approaches, such as machine learning, to assess the discriminative utility of BA profiles for distinguishing cognitive subtypes of PD. Variability in cognitive assessment tools has further hindered reproducibility and interpretability.

To address these gaps, the present study employed ultra-performance liquid chromatography–tandem mass spectrometry (UPLC–MS/MS) for targeted quantification of serum BAs, combined with machine learning models to classify PD patients with normal cognition (PD-NC) and cognitive impairment (PD-CI). We hypothesize that alterations in serum BA profiles, particularly involving secondary BAs such as DCA, are associated with cognitive decline in PD and may serve as non-invasive biomarkers for early detection and monitoring of CI. By integrating metabolomic profiling with computational modeling, this study aims to clarify the role of bile acid dysregulation in PD and provide a foundation for biomarker-guided approaches to PD-related CI.

Results

Participant characteristics and serum bile acid dysregulation

The study included 63 participants in the healthy control (HC) group, 154 in the PD-NC group, and 146 in the PD-CI group. As illustrated in Fig. 1A, all participants underwent targeted UPLC-MS/MS profiling of 15 serum BAs, followed by integration of these metabolomic data with clinical variables and multivariate analyses to identify a diagnostic BA panel. Demographic and clinical characteristics are summarized in Table 1. Groups did not differ significantly in sex distribution, age, smoking status, alcohol use, hypertension or diabetes prevalence (all p > 0.05). By contrast, PD-CI patients had fewer years of education (p < 0.0001) and a lower proportion with high school education or above (p = 0.0497). Disease duration was longer (p = 0.0204) and Hoehn-Yahr (H-Y) stage higher (p = 0.0022) in PD-CI than PD-NC. MMSE and MoCA scores decreased stepwise across HC, PD-NC and PD-CI (p < 0.0001).

Fig. 1: Overview of study design and serum bile acid profiles in HC and PD groups.

A Schematic of the multi-fluid biomarker discovery workflow. B Serum BA composition, including 15 distinct types of BAs, in the HC and PD groups. Data are presented as the mean ± SEM. **p < 0.01, ***p < 0.001 (Mann-Whitney U test).

Table 1 Demographic and clinical data for the PD participants in the cohort

Beyond these clinical differences, routine biochemical indices showed a mixed pattern across groups (Table 1). A significant overall group effect was observed for body mass index (BMI), total cholesterol (TC), and low-density lipoprotein cholesterol (LDL), driven primarily by higher levels in HC compared with both PD-NC and PD-CI groups (all p < 0.05). In contrast, PD-NC and PD-CI did not differ in BMI, fasting glucose, serum lipid profiles, or liver enzyme levels, including alanine aminotransferase (ALT) and aspartate aminotransferase (AST), and all liver-function parameters remained within normal physiological ranges. Thus, while selective metabolic differences were present between HC and PD groups, the two PD subgroups were metabolically comparable, suggesting that PD-related BA abnormalities are unlikely to result from hepatic dysfunction or metabolic impairment.

We next compared overall serum BA levels between HC and all PD patients, regardless of cognitive status (Fig. 1B). PD patients showed marked dysregulation of primary, secondary and conjugated BAs: CA, CDCA, DCA, glycolithocholic acid (GLCA) and taurocholic acid (TCA) were all significantly elevated in PD compared with HC (all p < 0.001). In contrast, lithocholic acid (LCA), taurolithocholic acid (TLCA), UDCA, glycoursodeoxycholic acid (GUDCA) and TUDCA were significantly reduced in PD patients (all p < 0.05) (Table 2). These findings reveal a pronounced disturbance of BA metabolism in PD patients.

Table 2 Comparison of serum bile acid concentrations between HC and PD patients

Comparison of serum bile acid profiles between PD-CI and PD-NC groups

Building on our observation that several BAs are dysregulated in PD versus healthy controls, we next sought to determine which of these, and which additional BA species, distinguish PD-CI from PD-NC. We quantified 15 serum BAs in 154 PD-NC and 146 PD-CI patients using UPLC-MS/MS. Univariate comparisons revealed that CA (p = 0.008), DCA (p < 0.001) and TUDCA (p = 0.018) were significantly elevated in PD-CI, whereas GLCA (p = 0.024), TCA (p = 0.045) and UDCA (p = 0.002) were decreased (Fig. 2A, Table 3). No significant differences were observed for total BA concentration or for the ratios of primary to secondary and conjugated to unconjugated BAs (all p > 0.05; Fig. S1A–H).

Fig. 2: Serum bile acid profiles in PD-CI and PD-NC groups.

A Serum BA composition, including 15 distinct types of BAs, in the PD-CI and PD-NC groups. B OPLS-DA of serum BA profiles between the PD-CI and PD-NC groups. C VIP plot identifying the key BAs contributing to the observed group differences. D Random Forest “Mean Decrease Accuracy” values highlighting the most discriminative BAs for distinguishing PD-CI from PD-NC. E SVM “Average Importance” plot further confirming the contribution of each BA to the classification. Data are presented as the mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 (Mann-Whitney U test).

Table 3 Comparison of serum bile acid levels between PD-NC and PD-CI groups

To further explore metabolic differences between PD-CI and PD-NC, we performed orthogonal partial least squares discriminant analysis (OPLS-DA), which revealed a modest yet distinct separation between the two groups (R²Y = 0.478, Q² = 0.293; permutation test with 200 permutations, p = 0.005; Fig. 2B, C, Fig. S2). The permutation-derived intercepts (R²Y = −0.086, Q² = −0.198) further confirmed that the model was not overfitted. Given its exploratory nature, the primary purpose of applying OPLS-DA was to identify BA species contributing to group separation. Variable importance in projection (VIP) scores from the OPLS-DA model identified DCA (VIP = 1.761), UDCA (VIP = 1.412) and GLCA (VIP = 1.411) as the top contributors (Table S1). Consistent with these findings, Random Forest analysis (10-fold CV; Fig. 2D, Table S2) ranked DCA, GLCA and UDCA highest by mean decrease in accuracy, and support vector machine (SVM) modeling (radial kernel; Fig. 2E, Table S3) again highlighted DCA as the most discriminatory feature.

Correlation between DCA levels and cognitive performance

Participants were stratified into High-DCA (n = 150) and Low-DCA (n = 150) groups at the median serum DCA concentration (27.7279 nmol/g). Chi-square tests indicated that the High-DCA group had a significantly higher rate of CI than the Low-DCA group by both MMSE (p = 0.0055) and MoCA (p = 0.03755) (Table 4). Spearman’s rank correlation (two-tailed) further revealed an inverse relationship between serum DCA levels and global cognitive performance: MMSE (r = −0.30, p < 0.0001) and MoCA (r = −0.26, p < 0.0001) (Fig. 3A, B). This negative association remained significant after controlling for education using rank-based partial correlation analysis (partial ρ = −0.30, p < 0.0001 for MMSE; partial ρ = −0.23, p < 0.0001 for MoCA), indicating that the association between DCA and cognitive performance in PD is not explained by educational differences (Table S4).
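
As an illustration of the rank-based analysis described above, the following minimal sketch computes a Spearman correlation and a first-order partial correlation on ranks. It is not the study's code: the source does not state the exact procedure used, the partial-correlation formula shown is one common choice, and the function names and toy values are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y after controlling for one covariate z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy values, not study data.
dca = [12.0, 18.5, 25.1, 30.2, 41.7, 55.3]
mmse = [29, 28, 27, 27, 24, 22]
education = [9, 12, 16, 6, 9, 12]

rho = spearman(dca, mmse)
rho_partial = partial_spearman(dca, mmse, education)
```

When the covariate is only weakly related to both variables, as in this toy example, the partial coefficient stays close to the unadjusted one, which mirrors the pattern reported for MMSE.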

Fig. 3: Correlation between serum DCA levels and cognitive domain scores in PD patients.

A Spearman correlation plot showing the relationship between serum DCA levels and MMSE scores. B Spearman correlation plot showing the relationship between serum DCA levels and MoCA scores. C Violin plots comparing the cognitive domain scores, including orientation, memory, attention, visuospatial ability, language/naming, executive function, and calculation, between High-DCA and Low-DCA groups. Data are presented as the mean ± SEM. ****p < 0.0001 (Mann-Whitney U test).

Table 4 Comparison of cognitive status between Low-DCA and High-DCA groups

Building on this inverse association, we next examined domain-specific cognitive performance. High-DCA participants scored significantly lower than Low-DCA participants across all seven MMSE and MoCA domains—orientation, memory recall, attention, visuospatial ability, language/naming, executive function, and calculation (all p < 0.0001; Fig. 3C, Table S5). These domain-wide deficits reinforce the association between elevated DCA levels and both the prevalence and severity of cognitive impairment in PD.

Predictive value of serum DCA levels for cognitive impairment in PD patients

To evaluate the predictive value of serum BAs for cognitive normalcy in PD (MMSE ≥ 27; MoCA ≥ 26), we first assessed DCA alone by ROC analysis, which yielded moderate discrimination (MMSE: AUC = 0.645, 95% CI 0.588–0.702; MoCA: AUC = 0.725, 95% CI 0.672–0.778; Fig. 4A, B; Table 5). Building on this finding, we performed univariate logistic regressions on all 15 BAs, identifying five significant predictors: CA, DCA, GLCA, TCA and TLCA (Tables 6 and 7). We then incorporated these five markers into a forward-stepwise multivariate model, which improved discrimination to AUC = 0.741 (95% CI 0.5318–0.7395) for MMSE and AUC = 0.751 (95% CI 0.5901–0.7897) for MoCA (Fig. 4C, D; Tables 5–7).
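
For readers less familiar with ROC analysis, the AUC values above can be read as the probability that a randomly chosen cognitively normal patient receives a higher model score than a randomly chosen impaired patient. The sketch below computes that quantity directly from pairwise comparisons. It is illustrative only: the function name and toy values are hypothetical, and negated DCA is used as the score purely because higher DCA was associated with impairment.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative case.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy values, not study data.
dca = [10.0, 15.0, 22.0, 28.0, 35.0, 50.0]
normal = [1, 1, 0, 1, 0, 0]          # 1 = cognitively normal
scores = [-value for value in dca]   # lower DCA -> higher score for normalcy

toy_auc = auc(scores, normal)
```

The pairwise form is quadratic in sample size, which is acceptable for a cohort of a few hundred patients; rank-based formulas give the same value more efficiently.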

Fig. 4: Sequential modeling and LASSO-based refinement for predicting cognitive normalcy in PD.

A ROC curve for the DCA-only logistic model predicting MMSE ≥ 27. B ROC curve for the DCA-only model predicting MoCA ≥ 26. C ROC curve for the forward-stepwise five BA model (CA, DCA, GLCA, TCA, TLCA) predicting MMSE ≥ 27. D ROC curve for the same five BA model predicting MoCA ≥ 26. E LASSO coefficient paths for the full 15 BA panel in the MMSE model; curves are labeled at λ min (red dashed line) and λ 1se (blue dashed line). F Ten-fold cross-validation binomial deviance versus log(λ) for the MMSE LASSO model; vertical lines mark λ min and λ 1se. G Bubble plot of absolute LASSO coefficients (importance) and coefficient sign for variables retained at λ min in the MMSE model. H ROC curve for the refitted, unpenalized MMSE logistic model using the nine LASSO-selected BAs. I LASSO coefficient paths for the full 15 BA panel in the MoCA model, with λ min (red) and λ 1se (blue) indicated. J Ten-fold CV binomial deviance versus log(λ) for the MoCA LASSO model, with vertical lines at λ min and λ 1se. K Bubble plot of absolute LASSO coefficients for the seven acids retained at λ min in the MoCA model. L ROC curve for the refitted, unpenalized MoCA logistic model using the seven LASSO-selected acids.

Table 5 Performance of bile acid models for predicting cognitive normalcy in PD patients
Table 6 Logistic regression of bile acid biomarkers for predicting MMSE-defined cognitive status in PD patients
Table 7 Logistic regression of bile acid biomarkers for predicting MoCA-defined cognitive status in PD patients

To further refine this multibiomarker panel, we applied LASSO logistic regression with 10-fold cross-validation to the full 15-acid profile (training 70%: n = 210; test 30%: n = 90). For MMSE prediction, the minimal CV error occurred at λ_min = 0.0258, retaining nine BAs (DCA, LCA, CDCA, GLCA, TCA, TDCA, CA, TLCA, GCA), and the refitted test-set model achieved AUC = 0.715 (95% CI 0.608–0.821; Fig. 4E–H; Table 8). Similarly, for MoCA prediction, λ_min = 0.0284 preserved seven acids (DCA, LCA, GLCA, TCA, CA, TLCA, GCA), with a test-set AUC = 0.737 (95% CI 0.634–0.839; Fig. 4I–L; Table 8). A bubble plot illustrates the absolute LASSO coefficients (bubble size = |coefficient|; color = direction) for each metabolite (Fig. 4G, K). Finally, by intersecting the MMSE- and MoCA-derived LASSO panels, we defined a robust seven BA signature (DCA, LCA, GLCA, TCA, CA, TLCA, GCA) that consistently discriminates cognitive status in PD while balancing model parsimony and performance.
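
The intersection step can be checked directly from the two panels listed above. The short sketch below uses the BA names exactly as reported; the variable names are hypothetical.

```python
# BA panels retained at lambda_min, as listed in the text.
mmse_panel = {"DCA", "LCA", "CDCA", "GLCA", "TCA", "TDCA", "CA", "TLCA", "GCA"}
moca_panel = {"DCA", "LCA", "GLCA", "TCA", "CA", "TLCA", "GCA"}

# BAs selected by both models form the shared signature.
shared = sorted(mmse_panel & moca_panel)

# BAs selected only for MMSE are dropped from the signature.
mmse_only = sorted(mmse_panel - moca_panel)
```

Because the MoCA panel is a subset of the MMSE panel, the intersection equals the MoCA panel, and CDCA and TDCA are the two acids retained only in the MMSE model.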

Table 8 Optimal LASSO Regularization Parameters for MMSE and MoCA Prediction Models

Machine learning classification of cognitive status using the seven-acid signature

Building on our LASSO-derived 7 BA signature (DCA, LCA, GLCA, TCA, CA, TLCA, GCA), we trained five classifiers: random forest (RF), XGBoost, radial-basis SVM, linear SVM and multilayer perceptron (MLP). Models were trained on 70% of subjects (n = 210) and tested on the remaining 30% (n = 90), with 10-fold cross-validation. In the test cohort, RF achieved the highest discrimination for MMSE ≥ 27 (AUC = 0.777, 95% CI 0.680–0.875), followed by XGBoost (AUC = 0.729, 95% CI 0.624–0.834), linear SVM (0.715, 95% CI 0.605–0.826), MLP (0.700, 95% CI 0.589–0.811) and radial SVM (0.696, 95% CI 0.584–0.808) (Fig. 5A; Table 9). Calibration analysis demonstrated that the random forest model achieved the best agreement between predicted and actual MMSE status, exhibiting the narrowest distribution of residual errors and the highest proportion of predictions falling within a small error margin (Fig. 5B, C); quantitatively, MMSE Brier scores ranged from 0.197 to 0.232, and Hosmer–Lemeshow (HL) p-values were >0.05 for SVMs and RF, whereas XGBoost and MLP showed marginal miscalibration (Table S6). Feature-importance analysis across cross-validation folds highlighted DCA, GLCA, and CA as the leading contributors (Fig. 5D; Table S7). Bootstrap-based stability analysis further confirmed robust rankings for key predictors: DCA showed perfect stability across algorithms, with GLCA/TCA/TLCA/CA/LCA exhibiting generally high stability (Table S8).
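
To help interpret the Brier scores reported above, the sketch below shows how the score is computed: it is the mean squared difference between predicted probabilities and observed binary outcomes, so lower is better. The function name and toy values are hypothetical and are not study data.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical toy predictions for four patients (1 = cognitively normal).
predicted = [0.9, 0.2, 0.7, 0.4]
observed = [1, 0, 1, 1]
toy_brier = brier_score(predicted, observed)

# Reference point: predicting 0.5 for everyone scores 0.25 regardless of outcomes.
uninformative = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

A model that always predicts 0.5 scores 0.25, which gives a rough reference point for reading the reported values.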

Fig. 5: Machine-learning performance and normalized feature-importance profiles for MMSE and MoCA prediction.

A ROC curves for classifying MMSE ≥ 27 versus <27 based on our 7 BA panel (DCA, LCA, GLCA, TCA, CA, TLCA and GCA), using five classifiers: RBF SVM, linear SVM, RF, XGBoost and MLP. B Absolute-residual boxplots showing the distribution of the prediction error (observed minus predicted probability) for each MMSE model. C Reverse-cumulative residual curves for MMSE models, depicting the proportion of predictions within increasing error thresholds. D Stacked bar chart of normalized importance scores (0–100%) for each BA across the five MMSE classifiers, based on RF mean-decrease-Gini, XGBoost gain, and absolute weight magnitudes for the SVMs and MLP. E ROC curves for classifying MoCA ≥ 26 versus <26 with the same 7 BA panel and classifiers. F Absolute-residual boxplots for MoCA models. G Reverse-cumulative residual curves for MoCA predictions. H Stacked bar chart of normalized importance scores for each bile acid across the five MoCA classifiers.

Table 9 Performance of machine learning classifiers for predicting MMSE ≥ 27 using the seven-acid signature

Applying the same models to predict MoCA ≥ 26 produced a nearly identical pattern: RF again led with AUC = 0.833 (95% CI 0.750–0.916), followed by XGBoost (0.803, 95% CI 0.710–0.895), MLP (0.784, 95% CI 0.687–0.881), linear SVM (0.777, 95% CI 0.678–0.875) and radial SVM (0.740, 95% CI 0.637–0.844) (Fig. 5E; Table 10). RF also demonstrated superior MoCA calibration (Fig. 5F, G); quantitatively, MoCA Brier scores ranged from 0.207 to 0.278, with HL p > 0.05 for SVMs and RF and < 0.05 for XGBoost and MLP (Table S6). The stability-validated importance ranking again prioritized DCA, GLCA, and CA (Fig. 5H; Tables S9–10). These two complementary analyses indicate that our compact 7 BA signature delivers robust discrimination, reliable calibration, and clear biochemical interpretability for both MMSE and MoCA outcomes in PD.

Table 10 Performance of machine learning classifiers for predicting MoCA ≥ 26 using the seven-acid signature

Random forest-based integration of clinical covariates and bile-acid biomarkers

To assess the predictive value of clinical covariates alone, we first trained an RF model using the four variables that differed significantly between PD-CI and PD-NC: disease duration, H–Y stage, UPDRS-III score and years of education. In the test cohort, this clinical-only RF achieved an AUC of 0.697 (95% CI 0.586–0.807) for predicting MMSE ≥ 27 (Fig. 6A, Table 11), with variable importance (mean decrease in Gini) ranking UPDRS-III highest, followed by disease duration, education and H–Y stage (Fig. 6B). When the same clinical model was applied to MoCA ≥ 26, discrimination improved to an AUC of 0.783 (95% CI 0.688–0.879) (Fig. 6C), and the importance order remained essentially unchanged (Fig. 6D).

Fig. 6: Random Forest performance and feature importance for clinical-only versus clinical and bile acid models.

A ROC curve for the RF model trained on four clinical covariates to predict MMSE ≥ 27. B Variable importance (mean decrease in Gini) for each clinical feature in (A). Bubble color indicates the magnitude of mean decrease in accuracy (MDI; warmer = higher). C ROC curve for the same clinical-only RF model predicting MoCA ≥ 26. D Variable importance for each clinical feature in (C) (mean decrease in Gini; bubble color as in B). E Cross-validated accuracy versus number of features retained during RF-RFE for MMSE prediction. F ROC curve of the RF model trained on the optimal RFE-selected feature subset for MMSE ≥ 27. G Mean Decrease in Accuracy importance for each feature in the MMSE RF-RFE final model. H Cross-validated accuracy versus feature subset size during RF-RFE for MoCA prediction. I ROC curve of the RF model trained on the optimal RFE-selected feature subset for MoCA ≥ 26. J Mean Decrease in Accuracy importance for each feature in the MoCA RF-RFE final model.

Table 11 Performance of RF models integrating clinical covariates and bile-acid biomarkers for predicting cognitive normalcy in PD patients

Building on this foundation, we then evaluated whether adding our 7 BA signature (DCA, LCA, GLCA, TCA, CA, TLCA, GCA) could further boost performance via RF-based recursive feature elimination (RF-RFE). For MMSE prediction, RF-RFE accuracy peaked when retaining 11 features (Fig. 6E), and the optimized RF model achieved an AUC of 0.841 (95% CI 0.750–0.932) on the test set (Fig. 6F, Table 11). In its importance profile, DCA ranked first, followed by education, TLCA, UPDRS-III, CA and LCA (Fig. 6G). A parallel analysis for MoCA showed maximum accuracy with the same 11 features (Fig. 6H) and yielded an RF AUC of 0.901 (95% CI 0.801–1.000) (Fig. 6I, Table 11), with DCA again leading the feature ranking alongside GLCA, CA and education (Fig. 6J). We next examined whether BA features conferred additional predictive value beyond demographic covariates by comparing a clinical-only model (age, sex, and education) with a combined clinical + BA model using the DeLong test for correlated ROC curves. The combined model achieved significantly higher discrimination (ΔAUC = 0.144, p < 0.001 for MMSE; ΔAUC = 0.118, p < 0.001 for MoCA; Supplementary Table S11), demonstrating that BA profiles add independent predictive power beyond clinical variables. These results demonstrate that integrating BA biomarkers with key clinical covariates and applying RF-RFE markedly enhances discrimination of cognitive normalcy in PD, with DCA emerging as the most powerful single predictor.

Discussion

BAs are cholesterol-derived metabolites that are transformed by the gut microbiota and recirculated via the enterohepatic circulation. Their interaction with receptors and signaling pathways within the CNS is biologically plausible, though direct evidence for comprehensive CNS BA metabolism remains limited15,25. The CNS contains enzymes and receptors relevant to BA signaling; their activation influences neuronal energy metabolism, survival, and function, thereby playing important roles in neurological diseases26,27. Consequently, BAs are emerging as key mediators through which the gut microbiota modulate CNS activity.

Recent studies have proposed that PD may originate in the gut, with misfolded α-synuclein propagating to the brain via neural pathways, underscoring the significance of peripheral systems in PD pathogenesis28. Clinical investigations further suggest that dysregulation of peripheral BA homeostasis contributes to PD development29. Beyond metabolic regulation, BAs and their receptors exert immunomodulatory effects that may influence neuroinflammation and microglial activity30. In this context, our study provides a systematic evaluation of serum BA remodeling in PD and its relationship with CI. Using a targeted UPLC–MS/MS platform combined with machine learning, we identified significant alterations in peripheral BA profiles among PD patients. Compared with cognitively normal participants, the PD-CI group exhibited higher levels of the secondary BA DCA and its conjugated form GLCA, along with decreased levels of the primary BA TCA and the neuroprotective secondary BA UDCA. This pattern indicates an imbalance in BA synthesis and conversion within the enterohepatic cycle, leading to overproduction of neurotoxic secondary BAs and reduced generation of protective species. Similar alterations have been reported in AD, including enhanced conversion of CA to DCA and depletion of UDCA31. However, such parallels should be interpreted cautiously, as the mechanisms in PD are likely disease-specific and require direct experimental validation.

Our findings suggest a coexistence of neurotoxic BA accumulation and depletion of neuroprotective BAs that may collectively contribute to CI in PD. Serum DCA exhibited moderate yet consistent inverse correlations with global cognition, and these associations remained significant after adjusting for education, indicating that the observed effect is unlikely to be driven by demographic differences. Although higher DCA levels were associated with worse cognitive outcomes, this relationship does not imply a causal or driving role, but rather reflects a potential biochemical signature of cognitive impairment. While a single-metabolite model for DCA showed only moderate discriminative ability, the multivariate bile acid model substantially enhanced overall prediction performance. Combining the seven LASSO-selected BAs with key clinical covariates (disease duration, H–Y stage, UPDRS-III, education) within an RF-RFE framework further enhanced performance. These results emphasize the advantage of integrating metabolic and clinical dimensions using non-linear ensemble algorithms.

Notably, several primary BAs (CA, CDCA) were elevated in PD compared with controls but did not differ between PD-NC and PD-CI, suggesting that altered primary BA synthesis represents a generalized metabolic signature of PD rather than a cognition-specific feature. Conversely, reduced GUDCA levels across PD groups may reflect systemic cholestatic or mitochondrial dysfunction unrelated to CI. The marked variability observed in certain BAs (Fig. S1) further indicates that a subset of PD patients may exhibit disproportionate metabolic alterations, potentially influenced by inter-individual differences in gut microbiota composition, small intestinal bacterial overgrowth, genetic variation in BA-synthetic enzymes, or medication use32,33,34,35. These findings highlight the importance of personalized metabolic profiling and caution against overgeneralization from group-level averages.

Despite these insights, several limitations warrant consideration. First, the cross-sectional design precludes causal or temporal inference. While our models accurately classified cognitive status based on current serum BA profiles, longitudinal validation will be essential to determine whether these signatures can predict future CI. Second, although medications known to alter BA metabolism (e.g., statins, BA sequestrants, proton pump inhibitors, cholinesterase inhibitors) were reviewed and users excluded, unmeasured factors—such as diet, microbiota composition, renal function, and comorbidities—may still contribute to variability. Routine hepatic indices (ALT, AST, bilirubin) were comparable across groups, but detailed microbiome and dietary assessments were not conducted. Future multi-omics studies integrating metagenomic and liver-function analyses are needed to delineate gut–liver–brain mechanisms underlying BA dysregulation in PD. Third, although rigorous calibration, pooled quality control, and isotope-labeled internal standards were used to minimize analytical bias, inter-laboratory reproducibility remains to be established. Validation across analytical centers will be necessary to confirm the robustness and clinical portability of UPLC–MS/MS BA quantification. Fourth, the clinical-only model (age, sex, education) showed moderate discrimination, and adding the seven-acid signature yielded a significant incremental improvement, indicating additive value of BA features while not yet meeting thresholds for clinical deployment. These results suggest that serum BA profiles may serve as adjunctive, rather than standalone, biomarkers for cognitive stratification in PD. Future studies should refine feature selection, expand population diversity, and include external validation cohorts to assess clinical applicability. 

Finally, our study did not incorporate established CSF or plasma biomarkers of neurodegeneration (e.g., Aβ, tau, NfL), which primarily reflect neuronal injury. BAs instead represent a complementary metabolic axis linking peripheral and central systems. Future multi-biomarker frameworks should evaluate whether BA signatures provide incremental value beyond traditional neuronal markers for predicting cognitive outcomes in PD.

In conclusion, our targeted metabolomics and machine-learning framework identified several BAs, particularly DCA, as potential biochemical correlates of CI in PD. These findings suggest that peripheral BA dysregulation reflects, rather than drives, PD-related cognitive changes. Comprehensive longitudinal, multi-center, and multi-omics studies integrating microbiome, hepatic, and neurodegenerative markers will be essential to validate these associations and determine the translational potential of BA signatures for early detection and monitoring of cognitive decline in PD.

Methods

Participants and clinical assessment

A total of 300 patients diagnosed with idiopathic PD were consecutively recruited from the Neurology Department of Xuzhou Medical University Affiliated Hospital between October 2022 and March 2024. Diagnosis was independently confirmed by two senior neurologists according to the 2015 Movement Disorder Society (MDS) Clinical Diagnostic Criteria for PD. Inclusion criteria included the ability to complete H-Y staging and MDS-UPDRS III scoring with sufficient language and cognitive function. Exclusion criteria were: secondary or atypical parkinsonism; comorbid neurological or psychiatric disorders (e.g., schizophrenia, stroke, major depression); systemic diseases affecting major organs or metabolic function; active liver disease, gastrointestinal disorders, or other major metabolic conditions (e.g., hyperlipidemia, diabetes), which can affect BA metabolism; and use of medications known to influence BA metabolism (e.g., statins, BA sequestrants, proton pump inhibitors, cholinesterase inhibitors).

In parallel, 63 age- and sex-matched HCs with no history of neurological or psychiatric illness were recruited from the local community. Clinical parameters were measured, including BMI, fasting blood glucose, lipid levels (TC, TG, LDL, HDL), and liver function markers (AST, ALT). Participant characteristics are shown in Table 1. All participants provided written informed consent. The study protocol was approved by the Ethics Committee of Xuzhou Medical University (approval no. XYFY2021-KL054-01).

Cognitive and motor evaluation

Cognitive assessments were conducted on the second morning of admission while participants were in a stable condition. Trained neurologists administered the MMSE and MoCA36,37. One point was added to the MoCA score for participants with ≤12 years of education. Cognitive normality was defined as MMSE scores ≥27 and MoCA scores ≥26, with values below these thresholds indicating CI. Based on these criteria, PD patients were classified into PD-CI or PD-NC. Motor function was assessed using MDS-UPDRS part III and disease stage using the H-Y scale. Non-motor symptoms were evaluated with the Non-Motor Symptoms Scale (NMSS). Cognitive assessments were cross-validated by a third neurologist in cases of disagreement, with final classification by consensus. All assessors received standardized training before the study.
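
The scoring rule described above can be written as a small function. This is a sketch under stated assumptions rather than the study's procedure: the 30-point cap on the adjusted MoCA score follows standard MoCA scoring and is not stated in the text, the two flags are returned separately because the text does not specify how MMSE- and MoCA-based results were combined into a single PD-CI label, and all names are hypothetical.

```python
def adjusted_moca(raw_moca, education_years):
    """Add one point for <=12 years of education.

    The cap at 30 points is an assumption based on standard MoCA scoring.
    """
    bonus = 1 if education_years <= 12 else 0
    return min(raw_moca + bonus, 30)


def cognitive_status(mmse, raw_moca, education_years):
    """Return MMSE- and MoCA-defined normalcy flags using the stated cutoffs."""
    return {
        "mmse_normal": mmse >= 27,
        "moca_normal": adjusted_moca(raw_moca, education_years) >= 26,
    }


# Hypothetical patient: MMSE 27, raw MoCA 25, 12 years of education.
example = cognitive_status(mmse=27, raw_moca=25, education_years=12)
```

In this hypothetical case the education adjustment raises the MoCA score from 25 to 26, which moves the patient across the cutoff.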

Serum sampling and bile acid quantification

Peripheral venous blood (4–5 mL) was collected on admission. After centrifugation at 3500 rpm for 10 min at 4 °C, serum was aliquoted and stored at –80 °C. Biochemical parameters were measured on a Beckman AU5800 autoanalyzer. For BA profiling, a targeted UPLC–MS/MS platform (Waters Corp., USA) was used to quantify 15 BA species (6 primary and 9 secondary BAs). Deuterated internal standards (IS) were obtained from Sigma-Aldrich (Germany). Serum (50 μL) was mixed with 5 μL of IS and 500 μL of precooled methanol and centrifuged at 12,000 × g; the extract was dried under nitrogen and reconstituted in 100 μL of water. Chromatographic separation was performed on a UPLC BEH C18 column (1.7 μm, 100 × 2.1 mm) with a mobile phase of ammonium acetate and formic acid in acetonitrile, gradient-eluted at 0.4 mL/min. Quantification used multiple-reaction monitoring (MRM) in negative ESI mode. All analyses were performed in technical triplicate, with calibration curves (R² > 0.995) and pooled QC samples included for batch correction.

Blood samples were processed within 2 h, and serum was stored at –80 °C in single-use aliquots to prevent repeated freeze–thaw cycles. Stability testing confirmed <8% intra-sample variation over 6 months. Quantification was performed by targeted UPLC–MS/MS (Waters Corp.) using 18 stable isotope-labeled internal standards (e.g., DCA-d₄, CA-d₄, and CDCA-d₄; Table S12) to correct for matrix effects and procedural variation. Each analytical batch was independently calibrated using freshly prepared standard curves (all with R² > 0.99). Every run included low-, mid-, and high-concentration quality control (QC) samples, with both intra- and inter-batch coefficients of variation maintained below 15%.
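As a rough illustration of the QC acceptance criterion, the coefficient of variation (CV) is the standard deviation divided by the mean, expressed as a percentage. The sketch below assumes the sample standard deviation (n − 1 denominator) and uses hypothetical function names and example values; it is not the vendor software's calculation.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: 100 * SD / mean.

    Assumption: sample SD (n - 1 denominator) is used.
    """
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def qc_passes(qc_runs, limit=15.0):
    """True when every QC level has a CV below the limit (15% in the text).

    qc_runs maps a QC level name to its repeated measurements.
    """
    return all(cv_percent(v) < limit for v in qc_runs.values())


# Hypothetical repeated measurements for two QC levels.
example = {"low": [9.0, 10.0, 11.0], "mid": [90.0, 100.0, 110.0]}
print(qc_passes(example))
```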

Statistical analysis

Data were analyzed using R (version 4.4.3), SPSS (version 28.0), and GraphPad Prism (version 8.0). Continuous variables were tested for normality using the Shapiro–Wilk test. Normally distributed variables were compared using t-tests or one-way ANOVA, while non-normally distributed data were analyzed with the corresponding non-parametric tests (Mann–Whitney U or Kruskal–Wallis). Categorical variables were assessed using the Chi-square (χ²) test or Fisher’s exact test. Correlations between bile acids and clinical measures were assessed using Spearman correlation and rank-based partial correlation analyses, with the latter controlling for key demographic factors. Given the number of targeted bile acids and subgroup analyses, we applied the Benjamini–Hochberg false discovery rate (FDR) correction to control for multiple comparisons. Both raw and adjusted p-values are reported, and only FDR-adjusted p < 0.05 was considered statistically significant.
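For readers unfamiliar with the Benjamini–Hochberg procedure, a minimal standard-library sketch follows. It is not the R code used in the study; the function name and example p-values are illustrative. Each p-value is multiplied by m/rank (m tests, rank in ascending order) and the result is made monotone from the largest p-value downward.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four bile acids.
raw = [0.01, 0.04, 0.03, 0.005]
significant = [q < 0.05 for q in benjamini_hochberg(raw)]
```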

Multivariate analyses were performed using OPLS-DA in SIMCA (version 13.0) to visualize group separation and identify BAs contributing to discrimination. Model robustness was assessed through seven-fold internal cross-validation and permutation testing (200 permutations). BAs with VIP > 1.0 and FDR-adjusted p < 0.05 were considered significant contributors. Low permutation intercepts (R²Y < 0.3, Q² < 0.05) indicated the absence of overfitting. To identify independent variables associated with cognitive impairment, we used Cox regression with forward selection as an exploratory feature-screening step. Final multivariable models were constructed using LASSO regression, which applies penalized likelihood estimation to improve stability and reduce overfitting. Optimal regularization parameters were determined via 10-fold cross-validation using the glmnet package in R.
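The idea behind permutation testing can be illustrated with a generic label-permutation sketch: group labels are shuffled many times, the statistic is recomputed, and the observed value is compared with the shuffled distribution. This is a simplification under stated assumptions. SIMCA refits the OPLS-DA model and reports R²Y/Q² intercepts rather than an empirical p-value, and the statistic here (absolute difference in group means) and all names are hypothetical stand-ins.

```python
import random


def mean_difference(values, labels):
    """Absolute difference between the means of group 1 and group 0."""
    g1 = [v for v, l in zip(values, labels) if l == 1]
    g0 = [v for v, l in zip(values, labels) if l == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def permutation_p_value(values, labels, n_permutations=200, seed=0):
    """Empirical p-value: share of shuffles scoring at least the observed value."""
    rng = random.Random(seed)
    observed = mean_difference(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between labels and values
        if mean_difference(values, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)
```

A statistic that stays well above its shuffled counterparts suggests the group separation is unlikely to be a chance artifact of the labeling.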

Machine learning classifiers (logistic regression, random forest, support vector machine) were trained on 80% of the data and validated using stratified 10-fold cross-validation and bootstrap resampling (1,000 iterations). Model discrimination was evaluated using the AUC, sensitivity, and specificity. To determine whether BA signatures provided incremental value beyond demographic variables, DeLong’s test for correlated ROC curves was used to compare (i) clinical-only and (ii) combined clinical + BA models; a DeLong p < 0.05 indicated a significant improvement in classification performance. All statistical tests were two-sided, and FDR-adjusted p < 0.05 was considered statistically significant. For all bile-acid comparisons, fold change (PD-CI/PD-NC) was computed in addition to p-values to describe the magnitude of between-group differences. For classification models, ROC analyses included 95% bootstrapped confidence intervals for AUCs and displayed bootstrapped confidence bands to visualize model uncertainty.
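A percentile bootstrap confidence interval for the AUC can be sketched as follows. This is a minimal standard-library illustration, not the study's R implementation: the function names are hypothetical, the percentile method is an assumption (the text does not say which bootstrap interval was used), and resamples containing only one class are redrawn because the AUC is undefined for them.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:  # one class missing: AUC undefined, redraw
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```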

Machine learning modeling

All predictors, comprising seven BAs (DCA, LCA, GLCA, TCA, CA, TLCA, GCA) and four clinical variables (disease duration, education years, Hoehn–Yahr stage, and UPDRS-III), were standardized (z-score) after median imputation of missing values. Participants were classified by cognitive status using both the MMSE (≥27 = normal cognition, <27 = impairment) and MoCA (≥26 = normal cognition, <26 = impairment) thresholds. To capture both linear and non-linear relationships between BAs and cognition, five algorithmic families were implemented: logistic regression (baseline linear model), support vector machine with radial and linear kernels, random forest (500 trees), extreme gradient boosting (XGBoost), and a multilayer perceptron (MLP) neural network. Hyperparameters for each classifier were optimized within a nested 10-fold cross-validation framework (inner loop for grid-search tuning, outer loop for model evaluation). This design reduces bias from random data partitioning and limits overfitting. Final performance metrics represent the mean ± SD across 100 randomized resampling iterations.
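The preprocessing step (median imputation followed by z-scoring) can be sketched per predictor column as below. The function name is hypothetical, missing values are assumed to be encoded as None or NaN, and the population standard deviation is an assumed choice. The text does not state whether the median, mean, and SD were estimated within each training fold; doing so is the usual way to avoid information leaking into the evaluation folds.

```python
import math
import statistics


def impute_and_standardize(column):
    """Median-impute missing entries (None or NaN), then z-score the column."""

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in column if not is_missing(v)]
    median = statistics.median(observed)
    filled = [median if is_missing(v) else v for v in column]

    mean = statistics.mean(filled)
    sd = statistics.pstdev(filled)  # assumption: population SD
    if sd == 0:
        return [0.0 for _ in filled]  # constant column carries no information
    return [(v - mean) / sd for v in filled]
```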

Model performance was quantified by AUC, accuracy, sensitivity, and specificity. Model calibration was assessed both graphically and quantitatively using the Brier score and Hosmer–Lemeshow goodness-of-fit test. Feature importance was computed for each algorithm (permutation importance for tree-based models; absolute weight coefficients for linear models). To evaluate robustness, 1,000 bootstrap resamples were used to estimate 95% confidence intervals and rank-stability indices for each feature’s importance. Recursive feature elimination based on the random-forest algorithm (RF-RFE) was then applied to identify the smallest predictor subset achieving maximal accuracy. The RF-RFE-derived feature set was used to retrain the random forest classifier, and its classification and calibration metrics were re-evaluated under the same cross-validation procedure. To assess incremental discrimination contributed by bile acids beyond demographic covariates, DeLong’s test for correlated ROC curves compared the AUCs of (i) clinical-only and (ii) combined clinical + BA models. A two-sided p < 0.05 was considered statistically significant.
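The Brier score used for calibration is the mean squared difference between the predicted probability and the observed binary outcome; lower is better, and a model that always predicts 0.5 scores 0.25. A minimal sketch with hypothetical names and example values:

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical labels (1 = PD-CI) and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.8, 0.3, 0.6, 0.1]
score = brier_score(labels, probs)
```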