Decreased SNCA expression in whole-blood RNA analysis of Parkinson’s disease adjusting for neutrophils

Xu, Kayla Y.; Violich, Ivo; Hutchins, Elizabeth; Alsop, Eric; Nalls, Mike A.; Moore, Anni; Blauwendraat, Cornelis; Raphael Gibbs, J.; Cookson, Mark R.; Van Keuren-Jensen, Kendall; Craig, David W.

doi:10.1038/s41531-025-01062-4

Article
Open access
Published: 14 October 2025

Kayla Y. Xu¹,
Ivo Violich²,
Elizabeth Hutchins^3,4,
Eric Alsop⁵,
Mike A. Nalls^3,4,
Anni Moore⁶,
Cornelis Blauwendraat⁶,
J. Raphael Gibbs⁶,
Mark R. Cookson⁶,
Kendall Van Keuren-Jensen⁵ &
…
David W. Craig¹

npj Parkinson's Disease volume 11, Article number: 292 (2025)

Abstract

Blood-based RNA transcriptomics offers a promising avenue for identifying biomarkers of Parkinson’s disease (PD) progression and mechanisms of pathogenesis. Previous work uncovered an age-related increase of neutrophil-enriched gene expression in PD whole blood, which may obscure disease-relevant transcriptomic signals. To better capture PD-associated molecular differences, we analyzed PD whole-blood RNA sequencing data using a differential expression approach that accounts for neutrophil composition. We built a model to estimate neutrophil percentages in 6897 Parkinson’s Progression Markers Initiative and Parkinson’s Disease Biomarkers Program samples from gene expression. By incorporating predicted neutrophil percentages as a covariate, we see significant SNCA downregulation in all PD cohorts, a signal previously obscured by immune cell-related effects. Lowered SNCA expression was observed in individuals with known PD-linked gene mutations (e.g., SNCA, GBA1, LRRK2) and those without known pathogenic variants. These findings suggest that decreased SNCA expression in whole blood may be a defining transcriptomic feature of PD.

Introduction

Parkinson’s disease (PD) is among the fastest-growing neurological conditions globally, with an anticipated 12 million individuals affected by 2040¹. A lack of accurate biomarkers and diagnostic measures significantly limits the timely and effective treatment of PD, contributing to its growing global impact². PD diagnosis is predominantly based on clinical features that frequently overlap with those of other neurological disorders, leading to high rates of misdiagnosis^2,3. Consequently, dependable diagnostic biomarkers could substantially improve patient prognosis by enabling early detection and revealing possible therapeutic targets^2,3.

Recent progress in PD diagnostic biomarkers includes the cerebrospinal fluid (CSF)-based alpha-synuclein (α-syn) seed amplification assay (SAA), which exhibits high sensitivity^4,5,6. Blood-derived transcriptomic biomarkers in PD, however, are still under extensive investigation^7,8,9. The benefits of blood-based biomarkers include the less invasive and more broadly applicable nature of blood draws, in contrast to the more invasive lumbar puncture necessary for acquiring CSF samples⁷. Examining transcriptomic variation in PD may also enhance our understanding of the disorder’s fundamental biological underpinnings. Previous GWAS analyses have identified 90 distinct signals that account for 16–36% of PD’s heritable risk, suggesting a strong genetic component likely to be seen in the transcriptome¹⁰. Prior research has revealed significant gene-level alterations in PD blood samples, including in pathways associated with immune activity, inflammation, mitochondrial function, and cell death⁹. However, the small sample sizes of these studies reduce statistical power and make the reproduction of differentially expressed genes difficult⁹.

The Parkinson’s Progression Markers Initiative (PPMI) and Parkinson’s Disease Biomarkers Program (PDBP) are two multicenter, longitudinal observational studies developed to identify PD biomarkers^11,12. The two datasets include whole blood RNA-sequencing data for control and PD participants spanning multiple visits, including PD participants with known PD risk mutations (i.e., mutations in SNCA, GBA1, and LRRK2)^11,12,13. In a prior study limited to the PPMI cohort, Craig et al. found a significant enrichment of immune-related genes and pathways, including upregulated ‘Neutrophil Degranulation’¹³. This finding aligns with previous research demonstrating a connection between PD and immunological activity, with multiple studies identifying neutrophil count or the neutrophil-to-lymphocyte ratio as potential biomarkers for PD¹⁴.

The role of neutrophils in the etiology of PD remains ambiguous. Certain studies suggest PD may result from dysregulated inflammatory responses that induce α-syn aggregates or an overall overexpression of the protein in dopaminergic neurons. Increased α-syn expression may worsen inflammation, triggering a cycle of protein aggregation that eventually leads to neurodegeneration^15,16,17. In the brain, the buildup of α-syn has been linked to pro-inflammatory factors, alterations in astrocyte activity, and microglia hyperactivity¹⁸. However, evidence linking α-syn and neutrophil expression is scarce. According to the Human Protein Atlas, blood cell types such as neutrophils, monocytes, and dendritic cells do express the SNCA gene. SNCA expression is, however, relatively low in neutrophils (9.8 protein-coding transcripts per million, or pTPM) and highest in plasmacytoid dendritic cells (115.5 pTPM) (https://v19.proteinatlas.org/ENSG00000145335-SNCA/blood)¹⁹.

In this study, we examined the role of neutrophil expression in PD by analyzing 3965 PPMI and 2932 PDBP whole blood RNA-seq samples from 1497 PPMI and 1279 PDBP participants. Samples were collected from control participants with no diagnosis of PD (sample n = 2325), participants with idiopathic PD (n = 3006), and those with a germline mutation in SNCA (n = 57), LRRK2 (n = 834), or GBA1 (n = 675). Because only 1254 PPMI samples had complete blood counts (CBC), we built a regression model to estimate neutrophil percentage for the remaining 5643 samples. We performed a differential gene expression analysis, adjusted for predicted neutrophil percentage, on control and case cohorts only (n = 6490), then examined the results to find potentially relevant pathways. When we included neutrophil percentage as a covariate, SNCA showed a strong and significant differential expression signal, suggesting that SNCA is suppressed in the PD whole blood transcriptome independently of neutrophil-related inflammation. We additionally identified a potential mitochondrial gene expression signature in PD cohorts distinct from control samples.

Results

Out of 8461 total samples, we identified 6897 whole-blood RNA-sequencing profiles from 2776 participants in PPMI and PDBP that passed quality control metrics and met our filtering criteria (Table 1 and Supplementary Tables 1 and 2). These criteria included excluding BioFIND samples due to their limited sample size (see “Methods”). Among the qualifying data, 1254 PPMI samples included documented neutrophil percentages from complete blood count (CBC) data. We used these known values to train machine learning models that predicted neutrophil percentages for the remaining 5643 PPMI and PDBP samples without CBC data. Among the 6897 screened samples, 407 had diagnoses other than ‘Case’ (clinical PD) or ‘Control’ (no PD diagnosis) at enrollment and were excluded from subsequent differential expression and pathway analyses. Separately, participants were tested for PD risk variants in SNCA, LRRK2, and GBA1 through clinical assessments and whole-genome sequencing.

Table 1 Demographic information and genetic status of PPMI and PDBP participants and samples

Neutrophil percentage linear modeling and prediction

We constructed several machine learning models from whole blood gene expression and evaluated their performance to identify the optimal model for neutrophil percentage prediction. To avoid data leakage between the training and testing sets, we ensured that samples from the same participant were not present in both sets simultaneously. This was achieved by performing a 0.8–0.2 train-test split based on participant IDs, after which all samples belonging to a given participant were assigned exclusively to the training or testing set. We additionally considered splitting samples by visit to prevent sample dependencies within sets (i.e., only samples taken at month 12 are in the training set). The resulting models demonstrated performance that was functionally equal or marginally inferior to that of models segmented by individuals (Supplementary Fig. 1). As a result, we chose to divide the samples by participant rather than by visit month.
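The participant-level split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_by_participant`, the sample ID format, and the seed are assumptions made for the example.

```python
import random

def split_by_participant(sample_to_participant, test_fraction=0.2, seed=0):
    """Assign whole participants, not individual samples, to train or test."""
    participants = sorted(set(sample_to_participant.values()))
    rng = random.Random(seed)
    rng.shuffle(participants)
    n_test = max(1, round(len(participants) * test_fraction))
    test_ids = set(participants[:n_test])
    train, test = [], []
    for sample, participant in sample_to_participant.items():
        # Every sample follows its participant, so no one appears in both sets.
        (test if participant in test_ids else train).append(sample)
    return train, test

# Hypothetical IDs: 10 participants with 3 visits each.
samples = {f"P{p:02d}_V{v}": f"P{p:02d}" for p in range(10) for v in range(3)}
train, test = split_by_participant(samples)
```

Splitting on participant IDs rather than on samples keeps repeated visits from the same person on one side of the split, which is the leakage the text is guarding against.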

For both the training and testing sets, gene counts were normalized and transformed independently using the DESeq2 variance stabilizing transformation (VST) with the design formula set to ~1. This ensured that the normalization process was not influenced by any sample metadata.

The initial model we created was a linear model founded on genes known to be enriched in blood cell transcriptomes (Fig. 1a). Genes enriched in blood cells were identified using the Human Blood Atlas RNA-seq data for neutrophils, lymphocytes (T-cells and B-cells), monocytes, eosinophils, basophils, and dendritic cells¹⁹. To address linear dependencies between cell types, separate linear models were initially developed for neutrophils, basophils, eosinophils, monocytes, lymphocytes, and dendritic cells. These models were used to identify the genes most predictive of neutrophil percentage within each cell type. We applied backward elimination until all genes in each model exhibited a p-value of less than 0.05, ensuring that only genes having a significant association with neutrophil percentage were included in the model. The 118 significant genes were combined to create a final linear model, once again employing backward elimination until all remaining genes exhibited a p-value less than 0.05. The final blood cell-based linear model contained 27 significant genes (Supplementary Table 3).

**Fig. 1: Workflow diagram of regression model development for predicting neutrophil percentage from gene expression data.**

The second model was a linear model that utilized data-driven feature selection to investigate whether genes not enriched in blood cells could be informative of neutrophil percentage (Fig. 1b). We employed mutual information (MI) feature selection to determine which of the 58,780 total genes exhibited the greatest dependency on neutrophil percentage. We implemented an MI score threshold of 0.3, yielding 352 genes with high dependency. Like the first model, these 352 genes were utilized to develop a linear model, and backward elimination was applied until all p values were less than 0.05. The final MI-based linear model contained 17 genes, of which only 2 were present in the blood cell-based model (Supplementary Table 4). However, all genes in the model were enriched in neutrophils, based on the Human Blood Atlas¹⁹.
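To illustrate thresholding genes on an MI score, the sketch below uses a simple histogram (plug-in) estimator. The estimator, the bin count, the toy values, and the gene names are all assumptions; the text specifies only the 0.3 threshold, not how MI was estimated, and a different estimator would produce scores on a different scale.

```python
import math
from collections import Counter

def binned_mutual_information(x, y, bins=4):
    """Plug-in MI estimate (in nats) after equal-width binning of x and y."""
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum of p(i,j) * log(p(i,j) / (p(i) * p(j))) over observed bin pairs.
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

# Toy data (hypothetical): one gene tracks neutrophil %, one does not.
neutrophil_pct = [40 + i for i in range(40)]
expression = {
    "GENE_A": [0.1 * v + 2 for v in neutrophil_pct],  # depends on neutrophil %
    "GENE_B": [(i * 7) % 4 for i in range(40)],       # unrelated pattern
}
scores = {g: binned_mutual_information(neutrophil_pct, e)
          for g, e in expression.items()}
# Keep genes at or above the 0.3 threshold (inclusive bound is an assumption).
selected = [g for g, s in scores.items() if s >= 0.3]
```

Unlike a correlation coefficient, MI also picks up non-linear dependence, which is one reason it suits a data-driven first pass before the linear model is fitted.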

Alongside linear modeling, we employed an XGBoost regression model, which utilized all 58,780 genes to predict neutrophil percentage (Fig. 1c). XGBoost is a method of gradient tree boosting that has demonstrated enhanced predictive performance across various fields and applications, including expression-related predictions^20,21. Nonetheless, this improved predictive performance may come at the cost of decreased interpretability. In our implementation, we used the XGBoost R package to create a regression model with hyperparameters nrounds = 10, eta = 0.3, and max depth = 3.

We developed a third and final linear model using the significant genes from the preceding blood cell-based and MI-based models (Fig. 1d). The 42 unique genes across both models were used to train the combined model, once again applying backward elimination until all p values were less than 0.05. The combined model contained 31 genes: 21 from the blood cell-based model, 8 from the MI-based model, and the 2 genes that were present in both (Supplementary Table 5).

We evaluated and compared the four models by calculating the average R-squared, root mean squared error (RMSE), and mean absolute error (MAE) in test sets across 100 train-test splits (0.8–0.2). The MI-based model exhibited the worst performance across all three measures, with statistically significant deficiencies in each category (Fig. 2). When evaluated on test set R-squared, the combined linear model markedly surpassed all alternative models (Fig. 2a). Regarding RMSE, although the combined model remained a leading performer, the difference between the combined and cell-based models was not statistically significant (Fig. 2b). The same applied to MAE (Fig. 2c). The combined model occasionally showed outlier behavior in terms of RMSE and MAE across certain train-test splits. This is likely due to the inclusion of genes from the MI-based model, which may contribute to higher prediction errors, and/or the use of underpowered train-test splits. Increasing the sample size could help mitigate these issues. Nonetheless, the test set R-squared values of the combined model remained consistently higher than those of all other models. Consequently, we opted for the combined linear model to predict neutrophil percentage in our samples with no CBC data.
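For reference, the three evaluation measures can be computed as in the sketch below. It assumes R-squared is defined as one minus the ratio of residual to total sum of squares; the text does not state which definition was used, and the numbers are toy values.

```python
import math

def regression_metrics(observed, predicted):
    """R-squared, RMSE, and MAE for one test set."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
    }

# Toy neutrophil percentages (hypothetical values, not study data).
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
metrics = regression_metrics(observed, predicted)
```

Averaging these values over repeated splits, as described above, gives a more stable comparison than any single split.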

**Fig. 2: Comparison of different machine learning models to predict neutrophil percentage in PPMI and PDBP patients.**

A final linear model was fitted utilizing all 1254 samples and 31 genes. Subsequently, we employed the model to predict neutrophil percentage in the 5643 samples without known blood cell counts. A total of 1254 known neutrophil percentages and 5236 predicted values were utilized in downstream analyses. Four hundred seven predicted neutrophil percentages were not used, as the associated samples were obtained from participants with a diagnosis of ‘Other’ at baseline.
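The sample counts reported so far can be checked with a few lines of arithmetic. The variable names are illustrative, and every number is taken from the text.

```python
# Cohort sample counts from the Introduction.
cohorts = {"control": 2325, "idiopathic_pd": 3006,
           "snca": 57, "lrrk2": 834, "gba1": 675}
total_samples = sum(cohorts.values())           # all samples passing QC

with_cbc = 1254                                 # measured neutrophil percentage
without_cbc = total_samples - with_cbc          # need a predicted value
other_diagnosis = 407                           # excluded from DE analyses
predicted_used = without_cbc - other_diagnosis  # predictions used downstream
de_samples = with_cbc + predicted_used          # samples in DE analysis
```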

Analysis of sample variation

Before performing differential expression analysis, we used principal component analysis (PCA) to assess the validity of our covariate design and the effect of incorporating predicted neutrophil percentage. PCAs were generated using the plotPCA() function in DESeq2. Initially, we conducted PCA on VST counts from all 6897 samples, revealing a strong and statistically significant correlation between PC1 and sex of the participant (Supplementary Fig. 2a, b). We observed notable correlations between PC2 to PC5 and sample QC metrics such as percent intronic bases, percent mRNA bases, and percent usable bases. A slight correlation with percent chimeric reads was observed; therefore, we implemented a filter for >3% chimeric reads in the passing samples (see “Methods”). The predicted neutrophil percentages were most highly correlated with PC8, suggesting that the neutrophil percentages predicted by our model correlate with variation in gene expression. In a second PCA, we applied the limma removeBatchEffect() function in R to the VST counts to correct for our chosen design covariates (i.e., disease status, sex, percent mRNA bases, participant age, and neutrophil percentage), which effectively mitigated the influence of neutrophil percentage and other covariates/confounders (Supplementary Fig. 2c, d). Eliminating these effects significantly reduced the percentage of total variation explained by the first nine principal components, illustrating the beneficial impact of accounting for these covariates in the differential expression design matrix (Supplementary Fig. 2e, f).

Differential gene expression analysis with predicted neutrophil percentage

Craig et al. previously conducted neutrophil-corrected differential expression analysis using 1254 PPMI samples with known neutrophil percentage¹³. This study aimed to extend their analysis to a larger number of samples by using the neutrophil percentages predicted by our linear model. We performed differential gene expression analysis using 6490 samples on multiple PD cohort vs. control comparisons. The full design = ~clinical diagnosis + sex + age squared + percent mRNA bases + predicted neutrophil percentage was determined through variance analysis and design testing (Supplementary Table 6). For each cohort comparison, we conducted the DE analyses using all PPMI and PDBP samples, as well as with samples segregated by study and restricted to baseline (i.e., the initial blood draw upon enrollment into PPMI or PDBP) (Figs. 3 and 4 and Supplementary Figs. 3–5).

**Fig. 3: Differential expression in patients comparing cases to controls, with and without controlling for predicted neutrophil percentage.**

**Fig. 4: Differential expression analysis of all samples by genetic cohort.**

After adjusting for predicted neutrophil percentage, we observed a substantial reduction in the number of differentially expressed genes between case vs. control samples from both PPMI and PDBP studies (Fig. 3a, b). A subset of the genes eliminated by correction were neutrophil-enriched genes, indicating that the predicted neutrophil percentage serves as an effective proxy in DE analysis for actual neutrophil percentage (Fig. 3c, d). Neutrophil correction additionally appeared to correct for leukocyte-enriched genes overall, not exclusively those within neutrophils (Supplementary Fig. 6).
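The effect of adding a cell-composition covariate can be illustrated with a toy ordinary least-squares example. This is only a sketch of the general principle: the study fitted its models in DESeq2, not with the code below, and all names and numbers here are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Least-squares slope of y regressed on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Part of y left over after removing its linear dependence on x."""
    b, mx, my = slope(x, y), mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

def case_effect(expression, is_case, neutrophil_pct=None):
    """Case-vs-control coefficient, optionally adjusted for neutrophil %."""
    if neutrophil_pct is None:
        return slope(is_case, expression)
    # Frisch-Waugh-Lovell: regress both variables on the covariate first.
    return slope(residuals(neutrophil_pct, is_case),
                 residuals(neutrophil_pct, expression))

# Toy data (hypothetical): cases have higher neutrophil percentages on average.
is_case = [0, 0, 0, 0, 1, 1, 1, 1]
neutrophil_pct = [50, 55, 60, 65, 60, 65, 70, 75]
neutrophil_gene = [0.1 * n for n in neutrophil_pct]  # tracks neutrophils only
snca_like_gene = [5.0 - 0.5 * c for c in is_case]    # tracks diagnosis only

raw_neut = case_effect(neutrophil_gene, is_case)
adj_neut = case_effect(neutrophil_gene, is_case, neutrophil_pct)
raw_snca = case_effect(snca_like_gene, is_case)
adj_snca = case_effect(snca_like_gene, is_case, neutrophil_pct)
```

In this toy setting the neutrophil-driven gene shows an apparent case effect that vanishes after adjustment, while the diagnosis-driven gene keeps its effect. This mirrors the pattern reported above for neutrophil-enriched genes.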

SNCA was the only identified PD causal gene that maintained differential expression after neutrophil correction. Not only did SNCA retain statistical significance, but the exclusion of genes related to neutrophil percentage further enhanced the DE signal of SNCA: SNCA ranked as the 973rd most significant DE gene without neutrophil correction and the 27th after neutrophil correction (Fig. 3b and Supplementary Table 6). Two other PD causal genes, LRRK2 and LRP10, were no longer differentially expressed after neutrophil correction (Fig. 3a, b).

Neutrophil correction eliminated the DE signals of blood cell-enriched or related genes while simultaneously improving the DE signal of SNCA in our case vs. control comparison. We further established that SNCA expression appeared uncorrelated with predicted neutrophil percentage, suggesting that the mechanisms behind lower SNCA expression in PD whole blood occur independently from immune cell-related activity (Supplementary Fig. 7). The absence of correlation was further corroborated by the increased significance of SNCA downregulation in DE analyses after neutrophil correction, as genes known to be associated with neutrophil expression (i.e., neutrophil-enriched genes, LRRK2, and LRP10) were predominantly rendered statistically insignificant (Fig. 3).

Subsequently, we performed DE analysis among distinct PD disease cohorts by genetic status (i.e., PD-SNCA+, PD-GBA1+, and PD-LRRK2+) and compared these case samples to healthy control (HC) samples, defined as participants classified as ‘Control’ at baseline without SNCA/GBA1/LRRK2 mutations. Only HC samples were employed to isolate the impact of known variants on disease expression. We continued to see a statistically significant downregulation of SNCA in each PD cohort relative to HC, with a similar improvement in the SNCA DE signal compared to uncorrected analysis (Fig. 4a–d and Supplementary Figs. 3–5). In baseline samples, SNCA was significantly downregulated in the DE analysis of case vs. controls; however, this was not observed in comparisons split by genetic cohort vs. HC, likely due to insufficient sample sizes (Supplementary Fig. 5). The statistically significant depression of SNCA was also observed when graphing logged counts per million (CPM) of the gene counts, corrected for predicted neutrophil percentage (Fig. 4g and Supplementary Fig. 8). When analyzed by age, SNCA depression was largely insignificant in younger participant cohorts, such as those younger than 50, most likely due to limited sample size (Supplementary Fig. 8b).

SNCA exhibited no differential expression in the PD-LRRK2+ vs. PD-LRRK2− and PD-GBA1+ vs. PD-GBA1− comparisons, aligning with our observation that SNCA was solely downregulated in contrast to control samples (Fig. 4e, f). The number of blood cell-enriched genes was also not meaningfully reduced in PD-LRRK2+ vs. PD-LRRK2− after correcting for predicted neutrophil percentage (Supplementary Fig. 6k, l). A reduction was noted in PD-GBA1+ vs. PD-GBA1− (Supplementary Fig. 6g, h). LRRK2 appeared to associate with genes enriched in blood cells, somewhat independent of PD-related blood cell activity²².

Pathway analysis highlights mitochondrial dysfunction in PD

To identify pathways with altered expression in the PD cohorts, we performed Ingenuity Pathway Analysis (IPA) on differential expression results from each PD cohort vs. control cohort analysis. As identified in Craig et al., the most significantly enriched pathway in all cases vs. controls without correcting for neutrophil percentage was ‘Neutrophil Degranulation’ (Supplementary Fig. 9). The IPA of neutrophil-corrected case vs. control DE results successfully eliminated enrichment of the ‘Neutrophil Degranulation’ pathway, with the most significantly enriched pathway identified as ‘Mitochondrial RNA Degradation’ (Supplementary Fig. 10). In PD-LRRK2+ vs. HC and PD-SNCA+ vs. HC, the ‘Neutrophil Degranulation’ pathway was statistically significant, but exhibited depression rather than enrichment in the PD cohorts (Supplementary Figs. 13 and 14).

In the case vs. control contrast, neutrophil correction reduced the number of pathways meeting our thresholds (z-score greater than 2 or less than −2 and p value less than 0.05) from 44 to 12. Notably, the ‘Parkinson’s Signaling Pathway’ was significant in case vs. control and PD-LRRK2+ vs. HC with neutrophil correction (Supplementary Figs. 9, 10, 14). However, in the idiopathic PD (IPD), PD-GBA1+, and PD-SNCA+ pathway analyses, ‘Parkinson’s Signaling Pathway’ was present in the results but did not meet our z-score and p value thresholds (Supplementary Figs. 11–13). The absence of significance was likely due to the small sample size, given that numerous ‘Parkinson’s Signaling Pathway’ genes exhibit differential expression in each cohort.
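The significance thresholds for pathways amount to a simple filter, sketched below. The pathway names and values are placeholders, not study results, and the dictionary layout is an assumption for illustration.

```python
def significant_pathways(results, z_cutoff=2.0, p_cutoff=0.05):
    """Keep pathways with |z-score| > z_cutoff and p value < p_cutoff."""
    return [r["pathway"] for r in results
            if abs(r["z"]) > z_cutoff and r["p"] < p_cutoff]

# Placeholder rows (hypothetical), one per pathway.
results = [
    {"pathway": "Pathway A", "z": 2.6, "p": 0.001},   # kept: enriched
    {"pathway": "Pathway B", "z": 1.4, "p": 0.010},   # dropped: |z| too small
    {"pathway": "Pathway C", "z": -2.3, "p": 0.020},  # kept: depressed
    {"pathway": "Pathway D", "z": 3.1, "p": 0.200},   # dropped: p too large
]
kept = significant_pathways(results)
```

Taking the absolute value of the z-score keeps both enriched and depressed pathways, which matters here because some pathways in the text are significant in the negative direction.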

In pathway analyses by genetic cohort, IPA continued to indicate significant enrichment of blood cell and immune-related pathways, even following neutrophil correction. Because the RNA was sequenced from blood samples, high enrichment of these pathways could be attributed to tissue type. Considering this possibility, we identified alternative categories of pathways associated with PD, specifically those related to mitochondrial function. Pathways related to mitochondrial dysfunction were significantly enriched in every PD cohort vs. HC analysis (Supplementary Figs. 10–14). We observed positive enrichment of the ‘Mitochondrial Dysfunction’ pathway in PD-GBA1+, PD-SNCA+, and PD-LRRK2+ samples. Additionally, ‘Granzyme A Signaling’ in PD-LRRK2+, ‘NAFLD Signaling Pathway’ in PD-SNCA+, and ‘Coronavirus Pathogenesis Pathway’ in PD-GBA1+ were pathways associated with mitochondrial dysfunction and positively enriched in PD cohorts. In the IPD vs. HC analysis, ‘BBSome Signaling Pathway’, a pathway positively associated with mitochondrial function, was one of nine significant pathways and was depressed in IPD samples⁸. ‘Leukocyte Extravasation Signaling’, another mitochondria-related pathway enriched in the IPD analysis, is involved in mitochondrial fission and fusion. This association between PD and mitochondrial dysfunction is further supported by differential expression analysis, as genes in the mitochondrial genome were consistently downregulated in the case vs. control contrasts (Fig. 3b). Mitochondrial gene downregulation strengthened after neutrophil correction (Fig. 3d). We again saw a similar overall downregulation of mitochondrial genes in DE analyses split by PD cohort (Fig. 4a–d). This downregulation was not as apparent in DE analyses split by PD cohort and stratified by study (i.e., PPMI and PDBP), especially in GBA1+ samples, likely due to smaller sample sizes (Supplementary Fig. 3).

PD-LRRK2+ samples appeared to have both lower mitochondrial gene expression and a larger number of significantly enriched mitochondria-related pathways compared to all other PD cohorts (Fig. 4c and Supplementary Fig. 14). Overall, we see consistent evidence of altered mitochondrial activity in relation to PD, with a specific trend towards increased dysfunction in disease samples.

PD cohorts exhibit niches of gene expression in unsupervised analysis

The presence of differentially expressed genes and enriched pathways from DE and IPA analyses indicated that PD cohorts may exhibit distinct overarching transcriptomic profiles. To further investigate any global transcriptomic differences between PD and control cohorts, we applied UMAP (Uniform Manifold Approximation and Projection) dimensionality reduction to identify clusters of HC, IPD, PD-SNCA+, PD-GBA1+, and PD-LRRK2+ samples. As IPA revealed an enrichment of mitochondrial dysfunction pathways, we performed UMAP dimensionality reduction on VST counts of genes in the ‘Parkinson’s Signaling Pathway’ (Fig. 5a), the ‘Mitochondrial Dysfunction’ pathway (Fig. 5b), ‘BBSome Signaling Pathway’ (Fig. 5c), and ‘Leukocyte Extravasation Signaling’ (Fig. 5d), as listed in the IPA database.

**Fig. 5: UMAP dimensionality reduction with pathway-specific genes by disease status and genetic cohort.**

None of the four UMAPs demonstrated discernible clustering by cohort, including between control and case samples. As hierarchical clustering similarly failed to discriminate between cohorts, we can infer that PD whole blood gene expression may have no unique global transcriptomic structure compared to controls, even in disease-specific pathways (Supplementary Fig. 15). However, by generating density plots stratified by sample cohort for each UMAP, specific niches of gene expression emerged, particularly within PD-SNCA+, PD-GBA1+, and PD-LRRK2+ samples. In the UMAP of ‘Parkinson’s Signaling Pathway’ genes, we observed that HC and IPD samples were distributed consistently throughout the plot, whereas genetic cohorts clustered near the lower right quadrant (Fig. 5a). In the ‘Mitochondrial Dysfunction’ genes, we observed an even more pronounced division between HC/IPD and PD-SNCA+/PD-GBA1+/PD-LRRK2+, where samples from the genetic cohorts predominantly clustered in a central band, while HC/IPD samples clustered towards the upper and lower extremes of the UMAP (Fig. 5b). PD-SNCA+ samples typically exhibited the most distinct niche of expression for both the ‘Parkinson’s Signaling Pathway’ and ‘Mitochondrial Dysfunction’ genes. Differences between IPD and HC samples, however, were comparatively minor. Subtle distinctions were still discernible, including in the UMAP density plots derived from the IPD-enriched ‘BBSome Signaling Pathway’ and ‘Leukocyte Extravasation Signaling’ (Fig. 5c, d).

Demographic, clinical, and biological factors influencing SNCA gene expression

The most notable finding from our analysis of PD whole blood RNA sequencing data was the significant downregulation of SNCA in DE analyses. We further investigated which phenotypic characteristics of PD could be responsible for the SNCA signal by plotting CPM normalized and logged SNCA gene counts by mutation status and age at baseline.
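As a reminder of what CPM normalization and log transformation involve, here is a minimal sketch. The base-2 logarithm and the pseudocount of 1 are assumptions, since the text does not specify either, and the counts are hypothetical.

```python
import math

def log_cpm(gene_count, library_size, pseudocount=1.0):
    """Counts per million for one gene in one sample, then log2-transformed."""
    cpm = gene_count / library_size * 1_000_000
    return math.log2(cpm + pseudocount)

# Hypothetical counts: the same relative abundance at two sequencing depths.
shallow = log_cpm(gene_count=150, library_size=30_000_000)
deep = log_cpm(gene_count=300, library_size=60_000_000)
```

Scaling by library size makes samples sequenced at different depths comparable, so the two hypothetical samples above get the same value.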

The reduction of SNCA expression relative to all controls was most apparent in PD-SNCA+ samples, both in those from participants with a clinical PD diagnosis and in those without (Fig. 6a). This decreased expression was statistically significant even when solely analyzing baseline samples (Fig. 6b). However, the SNCA+ sample size is much smaller than that of other PD cohorts, so these results should be interpreted with caution and merit additional validation. Mutations in SNCA are typically missense, indicating that transcriptomic depression of the gene may be due to some indirect mechanism that potentially becomes active in PD²³. The observation that PD-SNCA− samples also demonstrated SNCA downregulation suggests that SNCA mutations alone are not responsible for diminished gene expression. When examining other potential genetic drivers, we observed a more modest decrease in SNCA expression in PD-LRRK2+ and PD-GBA1+ samples. This decrease was statistically significant when analyzing all samples but became non-significant in analyses of only samples at baseline (Figs. 4g and 6a, b). This is likely due to the relatively smaller pool of PD-LRRK2+ and PD-GBA1+ participants compared to controls and a more subtle decrease in SNCA expression compared to PD-SNCA+ participants (Fig. 6b). The overall behavior of SNCA expression was consistent with our findings in differential expression analysis by genetic cohort (Supplementary Fig. 3). It remains likely that LRRK2 and GBA1, along with SNCA, may indirectly impact the mechanism responsible for decreased SNCA in PD whole blood.

**Fig. 6: *SNCA* expression stratified by demographic, clinical, and biological factors.**

Further analysis of SNCA expression by participant age at baseline also indicated a possible SNCA-age dependency in PD. Notably, though SNCA expression was only moderately decreased in PD-GBA1+ participants overall, younger PD-GBA1+ participants (i.e., those in their 30s and 40s) appeared to exhibit a larger decrease in SNCA expression, similar to PD-SNCA+ participants (Fig. 6a). In IPD samples (i.e., cases with SNCA−/GBA1−/LRRK2−), there was a statistically significant decrease in SNCA expression as the baseline age of the participant increased (Fig. 6c and Supplementary Fig. 8). This steady age-related decrease of SNCA was not present in the control samples of any genetic cohort, indicating a possible disease-specific behavior of SNCA expression in whole blood. Overall, it appears that some combination of genetic and age-related factors may contribute to a systemic downregulation of SNCA in the PD whole blood transcriptome.

Discussion

This study aimed to examine the influence of blood cell-enriched gene expression on the differential expression analysis of RNA sequencing samples from whole blood in Parkinson’s disease. We previously identified an over-representation of neutrophil-enriched genes and the ‘Neutrophil Degranulation’ pathway in case-versus-control differential expression (DE) and IPA analyses. To address this, we developed a linear model using 1254 PPMI samples with CBC data to predict neutrophil percentages in an additional 5643 PPMI and PDBP samples. We constructed four models: a linear model using blood cell-enriched gene expression, a second linear model with genes selected through MI feature selection, a third linear model using a combined set of the most significant genes from the prior two models, and an XGBoost regression model trained on the entire gene expression dataset. We chose the combined model for neutrophil prediction due to strong performance in the R-squared, RMSE, and MAE assessments.

We included the 1254 known neutrophil percentages together with the predicted percentages (5236 of the 5643 predictions, after excluding samples with a diagnosis of ‘Other’) as a design covariate in our differential expression analysis. This approach successfully eliminated many neutrophil-enriched genes, as well as genes enriched in other blood cells (e.g., lymphocytes, monocytes, basophils, eosinophils, and dendritic cells). After neutrophil correction, IPA analysis of case-versus-control DE results no longer showed an enrichment of the ‘Neutrophil Degranulation’ pathway. This finding suggests that using predicted neutrophil percentages successfully adjusts for neutrophil-related pathway activity. PD-SNCA+ and PD-LRRK2+ did, however, exhibit significant depression of the ‘Neutrophil Degranulation’ pathway, which may be a consequence of neutrophil correction or an area for further investigation. We additionally saw significant upregulation of the ‘Parkinson’s Signaling Pathway’ post-correction, further highlighting how disease-specific features of the whole blood transcriptome are more apparent after neutrophil correction. The ‘Parkinson’s Signaling Pathway’ is present, though non-significant, in pathway analysis stratified by PD cohort, likely due to insufficient sample size.

While lowered SNCA has been observed previously in cerebrospinal fluid (CSF)²², this study demonstrates that whole blood not only expresses SNCA at substantial levels but also shows a marked decrease in PD patients. With neutrophil correction, we continued to see a consistent depression and DE signal improvement of SNCA expression in the whole blood transcriptome for all disease cohorts. We no longer see differential expression of the PD risk genes LRRK2 and LRP10 in the case vs. control contrast, likely due to the strong association both genes have with neutrophil expression. LRRK2 was identified as a contributor to neutrophil chemotaxis in Mazaki et al.²⁴. LRP10 has been demonstrated to associate with genes implicated in neutrophil degranulation in RNA analysis performed by the Human Blood Atlas¹⁹. Consequently, reduced differential expression following neutrophil correction aligns with the biological function of LRRK2 and LRP10.

Why SNCA downregulation occurs is an open question. For one, SNCA appears to be relevant in PD beyond the gene’s involvement in blood cell function. While SNCA is not highly expressed in neutrophils, SNCA is expressed in plasmacytoid dendritic cells (115.5 pTPM), classical monocytes (62.7 pTPM), and basophils (36.8 pTPM)²⁵. We demonstrated that neutrophil correction decreased the number of differentially expressed blood cell-enriched genes across all cell types, which may explain the greater magnitude and significance of SNCA downregulation after correction. Including neutrophil percentage in our analyses uncovered a stronger gene expression signal of SNCA in PD cohorts that occurs independently of immune cells. In each PD cohort, we observed consistent SNCA downregulation relative to healthy controls, including in IPD samples lacking SNCA+/GBA1+/LRRK2+. This pattern suggests that lowered SNCA expression in whole blood may be a significant hallmark of PD.

Although this reduction reaches statistical significance, its utility as a biomarker is limited: mRNA-based measures are prone to degradation, and the lack of specificity is problematic for a relatively rare disorder such as PD. We also investigated whether PD might involve specific SNCA isoforms. However, because our random-priming assay detects both spliced and pre-spliced mRNA, we found no clear evidence of differential splicing. It should be noted, though, that ruling out isoform-specific variations conclusively is challenging. Importantly, our results suggest that other protein-based biomarkers in blood may warrant further exploration, especially considering promising α-synuclein seed amplification assays (SAA) in CSF. More recently, Kluge and colleagues have shown potential for SAA in PRKN-PD²⁶. Despite the limited promise of RNA-based biomarkers in this study, these data underscore important biological insights into SNCA and its role in PD.

We further identified multiple mitochondria-related pathways enriched in PD cohorts with IPA. Most notable was the positive enrichment of the ‘Mitochondrial Dysfunction’ pathway in PD-SNCA+, PD-GBA1+, and PD-LRRK2+ samples. SNCA is associated with mitochondrial function and is present in the ‘Mitochondrial Dysfunction’ pathway, suggesting that mitochondrial activity may contribute to the SNCA DE signal in PD. Dysfunction of mitochondria has previously been tied to neuronal cell death and neurodegenerative disease^27,28,29. Mitochondrial inhibition in DA neurons was also demonstrated to induce Parkinsonian motor symptoms in both primates and humans³⁰. In the same PPMI samples used in our study, Barnhoorn et al. observed reduced mitochondrial function in PD patients and found that the severity of mitochondrial dysfunction correlates with disease severity in gene set enrichment analysis³¹. UMAP density plots also provide evidence of altered mitochondrial activity, with PD cohorts clustering in specific regions of gene expression rather than forming a distinct transcriptomic profile. The lack of distinct global structural differences between PD cohorts and HC samples corresponds with the high degree of complexity in PD development and expression. More sophisticated methods of unsupervised analysis may provide a better understanding of the highly nuanced transcriptomic differences in PD related to mitochondrial activity and dysfunction.

This study does have notable limitations. Although we used the largest compiled PD transcriptomic dataset available, our analyses were confined to clinically diagnosed PD participants and controls. The relatively small number of prodromal and SWEDD patients prevented detailed investigation of those groups. We also conducted analyses by sample rather than by participant, since individuals with SNCA/GBA1/LRRK2 mutations compose a relatively small proportion of our total pool of participants. As such, participant-associated sample dependencies may be unaccounted for in some statistical tests. While SNCA may be a strong signal of PD, significant transcriptomic differences that typify IPD are still unclear and will likely require further single-cell level analysis to elucidate. Long-read RNA-sequencing may also provide a more detailed picture of what specific transcripts are being expressed. Transcriptomic differences between control and PD samples may exist in gene isoforms that are difficult to parse from short-read sequencing, even with splice junction information. Overall, future research should employ both single-cell and long-read transcriptomic analyses to pinpoint which differentially expressed genes and pathways are causally linked to PD pathogenesis, as opposed to only associated with the disease.

In conclusion, this study incorporated neutrophil percentage correction into differential transcriptomic analysis of PD whole blood samples. We see a stronger depression of SNCA expression, which may be caused by mitochondrial dysfunction and other related mitochondrial pathway activity. Correcting for immune cell-enriched genes in PD whole blood RNA analysis can uncover more relevant pathways in the PD transcriptomic profile, which will help guide future work in blood-based analysis of the disease.

Methods

Data

All data used in the study were collected and processed according to the protocols outlined by the Parkinson’s Progression Markers Initiative (PPMI) and Accelerating Medicines Partnership Parkinson’s Disease (AMP PD®) program^11,12. Patient samples from both PPMI and the Parkinson’s Disease Biomarkers Program (PDBP) from AMP PD® were labeled by their disease status (PD or non-PD). Genetic group (GBA1+/−, LRRK2+/−, and SNCA+/−) germline mutation status was determined by genetic testing or whole genome sequencing variant data, as provided within AMP PD® and assigned by Craig et al.¹³. LRRK2+ participants carried at least one of the following mutations: LRRK2^R1441G, LRRK2^R1441C, LRRK2^R1628P, LRRK2^Y1699C, LRRK2^G2019S, or LRRK2^G2385R. GBA1+ participants carried at least one of the following mutations: GBA1^N370S, GBA1^T408M, GBA1^E365K, GBA1^IVS2, GBA1^84GG, or GBA1^L444P. SNCA+ participants carried at least one of the following mutations: SNCA^A53T, SNCA^E46K, or SNCA^A30P. A subset of 1254 PPMI patient samples included complete blood count (CBC) and neutrophil percentage.

Written informed consent to data collection and sharing was obtained from all participants by both PPMI and PDBP. Ethical guidelines on human data collection were adhered to. The PPMI project was approved by the Institutional Review Board or Independent Ethics Committee of all participating sites in Europe, including Attikon University Hospital (Greece), Hospital Clinic de Barcelona and Hospital Universitario Donostia (Spain), Innsbruck University (Austria), Paracelsus-Elena-Klinic Kassel/University of Marburg (Germany), Imperial College London (UK), Pitié-Salpêtrière Hospital (France), University of Salerno (Italy), and in the USA, including Emory University, Johns Hopkins University, University of Alabama at Birmingham, PD and Movement Disorders Center of Boca Raton, Boston University, Northwestern University, University of Cincinnati, Cleveland Clinic Foundation, Baylor College of Medicine, Institute for Neurodegenerative Disorders, Columbia University Medical Center, Beth Israel Medical Center, University of Pennsylvania, Oregon Health and Science University, University of Rochester, University of California at San Diego, and University of California, San Francisco. The PDBP project was approved by the Institutional Review Board or Independent Ethics Committee of all participating sites in the USA, including University of Pennsylvania, University of Texas Southwestern Medical Center, Johns Hopkins University, Penn State Milton S. Hershey Medical Center, Harvard Medical School, Brigham and Women’s Hospital, University of Florida, University of Washington, Columbia University, Pacific Northwest National Laboratory, and University of Alabama at Birmingham.

Genome and transcriptome alignment, quantification, and quality control

Samples were sequenced, aligned, and quantified as part of PPMI. Sequencing was done using the Illumina NovaSeq 6000 platform, after which FASTQ files were aligned to the GRCh38 human genome using STAR 2.6.1d. Gene counts were created using featureCounts 1.6.2 and GENCODE 29 annotations.

A total of 8461 samples were provided by PPMI (labeled as PPMI) and AMP PD® (labeled as either PDBP or BioFIND). Only samples with a RIN value of greater than 6, usable bases percentage between 20% and 65%, and chimeric reads percentage below 3% were included in both the neutrophil percentage prediction model development and differential gene expression analysis. Additionally, 166 BioFIND samples were removed due to the relatively small sample size compared to PPMI and PDBP. Of the 6897 passing samples, 1254 were used for neutrophil prediction model development. A total of 6490 samples from participants labeled as ‘Case’ or ‘Control’ at baseline were included in differential expression analyses.
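The inclusion criteria above can be expressed as a simple filter. The following is a minimal sketch rather than the study's pipeline: the `SampleQC` record, its field names, and the example values are hypothetical, and treating the 20% and 65% usable-bases bounds as inclusive is an assumption, since the text does not say.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Hypothetical per-sample QC record; field names are illustrative."""
    sample_id: str
    study: str                  # "PPMI", "PDBP", or "BioFIND"
    rin: float                  # RNA integrity number
    usable_bases_pct: float     # percentage of usable bases
    chimeric_reads_pct: float   # percentage of chimeric reads


def passes_qc(s: SampleQC) -> bool:
    """Apply the stated thresholds; BioFIND samples are excluded outright."""
    return (
        s.study in ("PPMI", "PDBP")
        and s.rin > 6
        and 20 <= s.usable_bases_pct <= 65   # inclusive bounds are an assumption
        and s.chimeric_reads_pct < 3
    )


# Made-up example records, one per failure mode.
samples = [
    SampleQC("S1", "PPMI", 7.8, 42.0, 1.1),     # passes all criteria
    SampleQC("S2", "PDBP", 5.9, 40.0, 0.9),     # fails RIN
    SampleQC("S3", "PPMI", 8.1, 70.0, 0.5),     # fails usable bases
    SampleQC("S4", "BioFIND", 8.5, 45.0, 0.7),  # excluded study
    SampleQC("S5", "PDBP", 7.2, 30.0, 3.4),     # fails chimeric reads
]
passing_ids = [s.sample_id for s in samples if passes_qc(s)]
print(passing_ids)
```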

Neutrophil percentage linear modeling and prediction

Genes enriched in white blood cells (neutrophils, eosinophils, basophils, monocytes, lymphocytes, and dendritic cells) were identified using annotations from the Human Blood Atlas²⁵. Linear models were developed using the lm() function in R. Backward elimination was applied recursively until every gene remaining in the model had a p value below 0.05. Mutual information feature selection was conducted using SelectKBest() and mutual_info_regression from sklearn in Python. The XGBoost regression model was built using the xgboost R package with the parameters nrounds = 10, eta = 0.3, and max_depth = 3. All four models were compared using the Pearson R-squared, root mean squared error, and mean absolute error across 100 train-test splits.
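To make the three comparison metrics concrete, the sketch below computes the Pearson R-squared, RMSE, and MAE over repeated random train-test splits. It is only an illustration under stated assumptions: it uses a one-predictor least-squares fit on synthetic numbers in place of the multi-gene models, the 20% test fraction and the fixed seed are hypothetical choices not given in the text, and all function names are invented for this example.

```python
import math
import random


def pearson_r2(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def fit_line(x, y):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def evaluate_splits(x, y, n_splits=100, test_fraction=0.2, seed=0):
    """Refit on each random training split and score on the held-out samples."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_test = max(2, int(len(x) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        obs = [y[i] for i in test]
        pred = [slope * x[i] + intercept for i in test]
        scores.append((pearson_r2(obs, pred), rmse(obs, pred), mae(obs, pred)))
    return scores


# Synthetic data: a stand-in "expression" value and a stand-in neutrophil percentage.
x = [float(v) for v in range(20)]
y = [40.0 + 2.0 * v + 0.5 * (-1) ** v for v in range(20)]
scores = evaluate_splits(x, y)
mean_r2, mean_rmse, mean_mae = (sum(s[k] for s in scores) / len(scores) for k in range(3))
print(round(mean_r2, 3), round(mean_rmse, 3), round(mean_mae, 3))
```

Averaging each metric over the splits, as in the last lines, is one reasonable way to summarize the comparison; the text does not state how the 100 splits were aggregated.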

Variance analysis

PCA was calculated using the plotPCA() function from DESeq2, and batch correction was conducted with the removeBatchEffect() function from limma. PCs were correlated to the technical (study, plate, usable bases, etc.) and biological (neutrophil percentage, age, sex, etc.) variables of the samples using a Spearman’s rank correlation or intraclass correlation if the variable was continuous or categorical, respectively. The significance of each correlation was evaluated using either a Spearman’s or an ANOVA test p value for continuous or categorical variables, respectively. Categorical variables with singular unique values (sample_id, participant_id, diagnosis_at_baseline, diagnosis_latest) could not be evaluated using either method; as such, p values were set to 0 for all PCs.
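For continuous variables, the correlation step amounts to ranking both vectors and computing a Pearson correlation on the ranks. The sketch below shows that calculation in plain Python as an illustration only; the function names and the toy values are hypothetical, and the study's analysis was run in R.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


# Toy example: scores on one principal component vs. neutrophil percentage.
pc_scores = [-3.1, -0.4, 0.2, 1.9, 4.4]
neutrophil_pct = [48.0, 55.0, 53.0, 61.0, 70.0]
rho = spearman(pc_scores, neutrophil_pct)
print(round(rho, 3))
```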

Differential expression analysis

All differential expression analyses were conducted using the well-developed limma-voom 3.58.1 framework in R. We used a p value threshold of 0.05 and a log fold change threshold of 0.1. Each comparison used the design = ~0 + clinical diagnosis + sex + percent mRNA bases + predicted neutrophil percentage + age squared, where age is determined by the age at patient enrollment. The design matrix was determined through successive testing and variance analysis to identify the most significant and biologically relevant covariates (Supplementary Fig. 2 and Supplementary Table 6).
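As an illustration of how one sample maps onto the covariates in this design, the sketch below builds a single design-matrix row. It is a simplified stand-in for what R's formula interface produces, not the study's code: the column names, the two diagnosis levels, the indicator coding of sex, and the helper `neutrophil_covariate` (which uses a measured CBC value when one exists and the predicted value otherwise, following the Discussion) are all assumptions made for this example.

```python
DIAGNOSIS_LEVELS = ("Case", "Control")  # assumed levels for the no-intercept term


def neutrophil_covariate(measured_pct, predicted_pct):
    """Use the measured CBC neutrophil percentage when available, else the prediction."""
    return measured_pct if measured_pct is not None else predicted_pct


def design_row(diagnosis, sex, pct_mrna_bases, neutrophil_pct, age_at_enrollment):
    """Build one row: an indicator per diagnosis level (no intercept) plus covariates."""
    if diagnosis not in DIAGNOSIS_LEVELS:
        raise ValueError(f"unknown diagnosis: {diagnosis!r}")
    row = {f"diagnosis_{lvl}": 1.0 if diagnosis == lvl else 0.0 for lvl in DIAGNOSIS_LEVELS}
    row["sex_male"] = 1.0 if sex == "Male" else 0.0      # hypothetical coding
    row["pct_mrna_bases"] = float(pct_mrna_bases)
    row["neutrophil_pct"] = float(neutrophil_pct)
    row["age_squared"] = float(age_at_enrollment) ** 2
    return row


# Made-up sample with no CBC measurement, so the predicted percentage is used.
row = design_row(
    diagnosis="Case",
    sex="Female",
    pct_mrna_bases=45.0,
    neutrophil_pct=neutrophil_covariate(None, 61.5),
    age_at_enrollment=60,
)
print(row)
```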

Genes were labeled in volcano plots by category of interest. Causal variants of PD were determined by evidence in previous studies. Mitochondrial genes were labeled based on genes in the HUGO Gene Nomenclature Committee’s mitochondrial genome list³². Pathway genes were compiled and identified from statistically significant pathways in Ingenuity Pathway Analysis (IPA). Leukocyte-enriched genes were identified per the Human Blood Atlas.

Ingenuity pathway analysis

Differential expression analysis results from multiple comparisons were used to conduct pathway analysis with QIAGEN Ingenuity Pathway Analysis (IPA) software (QIAGEN Inc., https://digitalinsights.qiagen.com/IPA). For DE results from each comparison, we ran IPA Expression Analysis, which compares the supplied differential gene expression results to IPA’s libraries of well-characterized pathways. Based on a given p value and log fold change threshold, IPA calculates an activation z-score for each pathway. Pathways with an absolute activation z-score greater than 2 and a p value less than 0.5 were considered significantly expressed.

For IPA of DE results without neutrophil correction, an adjusted p value threshold of 0.05 and log fold change thresholds of <−0.1 and >0.1 were applied to identify significant genes. For IPA of DE results with neutrophil correction, an adjusted p value threshold of 0.05 and log fold change thresholds of <−0.085 and >0.065 were used to avoid biased z-score calculations (i.e., to ensure that the number of upregulated genes is approximately equal to the number of downregulated genes) and to include a similar number of differentially expressed genes as the uncorrected analysis (~500 upregulated and ~500 downregulated). Log fold change thresholds of <−0.1 and >0.075 and an adjusted p value threshold of 0.05 were used for the IPD vs. HC analysis (394 downregulated and 396 upregulated genes). The PD-GBA1+ vs. HC analysis was conducted with the same adjusted p value threshold and log fold change thresholds of <−0.1 and >0.09 (1140 downregulated and 1115 upregulated genes). The PD-LRRK2+ vs. HC analysis was conducted with the same adjusted p value threshold and log fold change thresholds of <−0.14 and >0.15 (1420 downregulated and 1421 upregulated genes). Finally, the PD-SNCA+ vs. HC analysis was conducted with the same adjusted p value threshold and log fold change thresholds of <−0.1 and >0.14 (1618 downregulated and 1625 upregulated genes). Genes from significant pathways related to mitochondrial function were then compiled and used in UMAP dimensionality reduction.
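The effect of using separate upper and lower log fold change cutoffs can be seen by counting the genes that fall on each side. The sketch below is illustrative only: the gene names and values are made up, `split_de_genes` is a hypothetical helper, and it does not reproduce how the thresholds reported above were actually chosen.

```python
def split_de_genes(results, lfc_down, lfc_up, padj_max=0.05):
    """Return (upregulated, downregulated) gene lists for the given cutoffs.

    `results` maps gene name -> (log fold change, adjusted p value).
    """
    up = [g for g, (lfc, padj) in results.items() if padj < padj_max and lfc > lfc_up]
    down = [g for g, (lfc, padj) in results.items() if padj < padj_max and lfc < lfc_down]
    return up, down


# Made-up DE results for five placeholder genes.
toy_results = {
    "GENE_A": (-0.12, 0.001),
    "GENE_B": (0.07, 0.010),
    "GENE_C": (0.09, 0.200),   # fails the adjusted p value cutoff
    "GENE_D": (-0.05, 0.030),  # effect too small under either threshold set
    "GENE_E": (0.20, 0.004),
}

# Symmetric cutoffs, as in the uncorrected analysis.
up_sym, down_sym = split_de_genes(toy_results, lfc_down=-0.1, lfc_up=0.1)

# Asymmetric cutoffs, as in the neutrophil-corrected analysis.
up_asym, down_asym = split_de_genes(toy_results, lfc_down=-0.085, lfc_up=0.065)

print(len(up_sym), len(down_sym))
print(len(up_asym), len(down_asym))
```

In practice one would compare the two counts and adjust the cutoffs until they are roughly balanced, which is the stated purpose of the asymmetric thresholds.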

Dimensionality reduction and density plots

Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction was conducted using the umap() function in the R package umap version 0.2.10.0. Gene counts were normalized and transformed using DESeq2 vst() before applying dimensionality reduction. To make the density plots, samples were labeled ‘Healthy Control’ if the participant was not diagnosed with PD and was SNCA−/GBA1−/LRRK2−; ‘Idiopathic PD’ if the participant was diagnosed with PD and was SNCA−/GBA1−/LRRK2−; ‘PD-SNCA+’ if the participant was diagnosed with PD and had an SNCA mutation; ‘PD-GBA1+’ if the participant was diagnosed with PD and had a GBA1 mutation; and ‘PD-LRRK2+’ if the participant was diagnosed with PD and had an LRRK2 mutation. Density plots were created using ggplot2 and the stat_density_2d() function. Mitochondrial genes were identified using the HUGO Gene Nomenclature Committee (HGNC) Mitochondrial genome gene group. ‘Mitochondrial Dysfunction’, ‘BBSome Signaling Pathway’, ‘Leukocyte Extravasation Signaling’, and ‘Parkinson’s Signaling Pathway’ genes were identified based on the corresponding molecule list in IPA.
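The cohort labeling rules can be written as a small function. This is a sketch under assumptions: the function name and boolean arguments are hypothetical, and because the text does not say how participants with more than one mutation were handled, the function returns every label that applies rather than choosing one. Non-PD mutation carriers match none of the stated rules and receive no label here.

```python
def cohort_labels(has_pd, snca_pos, gba1_pos, lrrk2_pos):
    """Return all density-plot cohort labels that apply to one participant."""
    any_mutation = snca_pos or gba1_pos or lrrk2_pos
    if not has_pd:
        # Only mutation-negative participants without PD are healthy controls.
        return [] if any_mutation else ["Healthy Control"]
    if not any_mutation:
        return ["Idiopathic PD"]
    labels = []
    if snca_pos:
        labels.append("PD-SNCA+")
    if gba1_pos:
        labels.append("PD-GBA1+")
    if lrrk2_pos:
        labels.append("PD-LRRK2+")
    return labels


print(cohort_labels(False, False, False, False))
print(cohort_labels(True, False, False, False))
print(cohort_labels(True, False, True, True))
```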

Statistical software

Statistical analyses were conducted in either R version 4.3.1 or Python version 3.7.16. Linear models were created in R using the lm() function from the R stats package, and the XGBoost model was built using the R package xgboost version 1.7.8.1. The package scikit-learn version 1.0.2 was used for mutual information feature selection in Python. Differential expression analysis and variance analysis were conducted in R using DESeq2 version 1.40.2, limma version 3.58.1, and edgeR version 3.42.4. Plots were made using either ggplot2 version 3.4.4 or EnhancedVolcano version 1.18.0. The geom_signif() function in ggpubr version 0.6.0 was used whenever a two-tailed Wilcoxon rank sum test was applied.

Data availability

Raw sequencing data (FASTQ files), alignment files (BAM files), TPM data and counts for each sample are available at the LONI IDA (https://fairsharing.org/, IDA; LONI IDA, https://doi.org/10.25504/FAIRsharing.r4ph5f). Data are also available through the AMP PD® (https://amp-pd.org/). Data for this study were obtained through the AMP-PD® Terra platform. These are the requirements for downloading from the AMP PD®: (1) personal and institutional or company details; (2) description of intended data use, for example, proposed analyses; (3) institutional signature on the AMP PD® Data Use Agreement (for researchers requesting access to individual level, ’omics data). Additional data, including but not limited to study arm, motor assessments, DaTscan and MRI imaging, genetic testing results, whole-exome and genome sequencing data, patient history, and standardized techniques and protocols for data collection are also available through the IDA. To access complete data, researchers need to fill out a data-use agreement. Data are available in a public (institutional, general, or participant-specific) repository that does not issue datasets with DOIs (non-mandated deposition).

Code availability

All code for data analysis in this study is available on GitHub and can be accessed via this link: https://github.com/kaylaxu/pd_wb_rnaseq_snca_paper.

References

1. Dorsey, E. R., Sherer, T., Okun, M. S. & Bloem, B. R. The emerging evidence of the Parkinson pandemic. J. Parkinsons Dis. 8, S3–S8 (2018).
2. Tolosa, E., Garrido, A., Scholz, S. W. & Poewe, W. Challenges in the diagnosis of Parkinson’s disease. Lancet Neurol. 20, 385–397 (2021).
3. Rizzo, G. et al. Accuracy of clinical diagnosis of Parkinson disease: a systematic review and meta-analysis. Neurology 86, 566–576 (2016).
4. Oftedal, L., Maple-Grodem, J., Tysnes, O. B., Alves, G. & Lange, J. Seed amplification assay as a diagnostic tool in newly-diagnosed Parkinson’s disease. J. Parkinsons Dis. 13, 841–844 (2023).
5. Concha-Marambio, L., Pritzkow, S., Shahnawaz, M., Farris, C. M. & Soto, C. Seed amplification assay for the detection of pathologic alpha-synuclein aggregates in cerebrospinal fluid. Nat. Protoc. 18, 1179–1196 (2023).
6. Fairfoul, G. et al. Alpha-synuclein RT-QuIC in the CSF of patients with alpha-synucleinopathies. Ann. Clin. Transl. Neurol. 3, 812–818 (2016).
7. Whittle, B. J. et al. Early-stage idiopathic Parkinson’s disease is associated with reduced circular RNA expression. NPJ Parkinsons Dis. 10, 25 (2024).
8. Guo, D. F. et al. The BBSome regulates mitochondria dynamics and function. Mol. Metab. 67, 101654 (2023).
9. Borrageiro, G., Haylett, W., Seedat, S., Kuivaniemi, H. & Bardien, S. A review of genome-wide transcriptomics studies in Parkinson’s disease. Eur. J. Neurosci. 47, 1–16 (2018).
10. Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
11. Marek, K. et al. The Parkinson’s progression markers initiative (PPMI)—establishing a PD biomarker cohort. Ann. Clin. Transl. Neurol. 5, 1460–1477 (2018).
12. Iwaki, H. et al. Accelerating medicines partnership: Parkinson’s disease. Genetic resource. Mov. Disord. 36, 1795–1804 (2021).
13. Craig, D. W. et al. RNA sequencing of whole blood reveals early alterations in immune cells and gene expression in Parkinson’s disease. Nat. Aging 1, 734–747 (2021).
14. Munoz-Delgado, L. et al. Peripheral inflammatory immune response differs among sporadic and familial Parkinson’s disease. NPJ Parkinsons Dis. 9, 12 (2023).
15. Grozdanov, V. & Danzer, K. M. Intracellular alpha-synuclein and immune cell function. Front. Cell Dev. Biol. 8, 562692 (2020).
16. Kasen, A. et al. Upregulation of alpha-synuclein following immune activation: possible trigger of Parkinson’s disease. Neurobiol. Dis. 166, 105654 (2022).
17. Tansey, M. G. et al. Inflammation and immune dysfunction in Parkinson disease. Nat. Rev. Immunol. 22, 657–673 (2022).
18. Cardinale, A., Calabrese, V., de Iure, A. & Picconi, B. Alpha-synuclein as a prominent actor in the inflammatory synaptopathy of Parkinson’s disease. Int. J. Mol. Sci. 22, https://doi.org/10.3390/ijms22126517 (2021).
19. Uhlen, M. et al. A genome-wide transcriptomic analysis of protein-coding genes in human blood cells. Science 366, https://doi.org/10.1126/science.aax9198 (2019).
20. Chen, S. et al. A novel XGBoost method to infer the primary lesion of 20 solid tumor types from gene expression data. Front. Genet. 12, 632761 (2021).
21. Chen, T. & Guestrin, C. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery, 2016).
22. Mollenhauer, B. et al. Longitudinal analyses of cerebrospinal fluid alpha-Synuclein in prodromal and early Parkinson’s disease. Mov. Disord. 34, 1354–1364 (2019).
23. Siddiqui, I. J., Pervaiz, N. & Abbasi, A. A. The Parkinson Disease gene SNCA: evolutionary and structural insights with pathological implication. Sci. Rep. 6, 24475 (2016).
24. Mazaki, Y., Handa, H., Fumoto, Y., Horinouchi, T. & Onodera, Y. LRRK2 is involved in the chemotaxis of neutrophils and differentiated HL-60 cells, and the inhibition of LRRK2 kinase activity increases fMLP-induced chemotactic activity. Cell Commun. Signal. 21, 300 (2023).
25. Uhlen, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
26. Kluge, A. et al. alpha-Synuclein pathology in PRKN-linked Parkinson’s disease: new insights from a blood-based seed amplification assay. Ann. Neurol. 95, 1173–1177 (2024).
27. Banerjee, K. et al. Alpha-synuclein induced membrane depolarization and loss of phosphorylation capacity of isolated rat brain mitochondria: implications in Parkinson’s disease. FEBS Lett. 584, 1571–1576 (2010).
28. Abramov, A. Y. et al. Mechanism of neurodegeneration of neurons with mitochondrial DNA mutations. Brain 133, 797–807 (2010).
29. Martin, L. J., Semenkow, S., Hanaford, A. & Wong, M. Mitochondrial permeability transition pore regulates Parkinson’s disease development in mutant alpha-synuclein transgenic mice. Neurobiol. Aging 35, 1132–1152 (2014).
30. Burns, R. S. et al. A primate model of parkinsonism: selective destruction of dopaminergic neurons in the pars compacta of the substantia nigra by N-methyl-4-phenyl-1,2,3,6-tetrahydropyridine. Proc. Natl. Acad. Sci. USA 80, 4546–4550 (1983).
31. Barnhoorn, S. et al. Orthogonal analysis of mitochondrial function in Parkinson’s disease patients. Cell Death Dis. 15, 243 (2024).
32. Seal, R. L. et al. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res. 51, D1003–D1009 (2023).

Acknowledgements

This research was supported in part by a grant from NINDS (U01-NS120260) (K.X., D.W.C., and I.V.). This research was supported and funded by the MJFF under grant numbers 12749 (K.V.K.-J.), 12749.01 (K.V.K.-J., M.R.C.) and 14696 (K.V.K.-J., A.M., D.W.C.). Many MJFF staff assisted in harmonizing and transferring data. This research was supported in part by the Intramural Research Program of the National Institutes of Health, National Institute on Aging. Data used in the preparation of this article were obtained on 2022-12-09 from the Parkinson’s Progression Markers Initiative (PPMI) database (https://www.ppmi-info.org/access-data-specimens/download-data), RRID:SCR_006431. For up-to-date information on the study, visit http://www.ppmi-info.org. PPMI, a public-private partnership, is funded by the Michael J. Fox Foundation for Parkinson’s Research and funding partners, including 4D Pharma, Abbvie, AcureX, Allergan, Amathus Therapeutics, Aligning Science Across Parkinson's, AskBio, Avid Radiopharmaceuticals, BIAL, BioArctic, Biogen, Biohaven, BioLegend, BlueRock Therapeutics, Bristol-Myers Squibb, Calico Labs, Capsida Biotherapeutics, Celgene, Cerevel Therapeutics, Coave Therapeutics, DaCapo Brainscience, Denali, Edmond J. Safra Foundation, Eli Lilly, Gain Therapeutics, GE HealthCare, Genentech, GSK, Golub Capital, Handl Therapeutics, Insitro, Jazz Pharmaceuticals, Johnson & Johnson Innovative Medicine, Lundbeck, Merck, Meso Scale Discovery, Mission Therapeutics, Neurocrine Biosciences, Neuron23, Neuropore, Pfizer, Piramal, Prevail Therapeutics, Roche, Sanofi, Servier, Sun Pharma Advanced Research Company, Takeda, Teva, UCB, Vanqua Bio, Verily, Voyager Therapeutics, the Weston Family Foundation and Yumanity Therapeutics. We also thank the AMP PD® for allowing us access to PDBP data. The PDBP consortium is supported by the National Institute of Neurological Disorders and Stroke at the National Institutes of Health.
A full list of PDBP investigators can be found at https://pdbp.ninds.nih.gov/policy. PDBP investigators have not participated in reviewing the data analysis or content of the text. Data used in the preparation of this article were obtained from the AMP PD® Knowledge Platform. For up-to-date information on the study, visit https://www.amp-pd.org. The AMP PD®, a public–private partnership, is managed by the FNIH and funded by Celgene, GlaxoSmithKline, the MJFF, the National Institute of Neurological Disorders and Stroke, Pfizer, Sanofi and Verily. We thank all people with PD and families for participating in the study and donating their samples and time. We would like to thank the group at IU for RNA isolation, QC metrics and safe shipment of samples.

Author information

Authors and Affiliations

Integrative Translational Sciences, Beckman Research Institute City of Hope, Duarte, CA, USA
Kayla Y. Xu & David W. Craig
Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Ivo Violich
DataTecnica LLC, Washington, DC, USA
Elizabeth Hutchins & Mike A. Nalls
Center for Alzheimer’s and Related Dementias, NIH, Bethesda, MD, USA
Elizabeth Hutchins & Mike A. Nalls
Translational Genomics Research Institute, Phoenix, AZ, USA
Eric Alsop & Kendall Van Keuren-Jensen
Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
Anni Moore, Cornelis Blauwendraat, J. Raphael Gibbs & Mark R. Cookson

Contributions

C.B., M.R.C., K.V.K.-J., and D.W.C. contributed to the overall study design. J.R.G. contributed to analysis strategy. K.X., I.V., E.H., E.A., M.A.N., A.M., C.B., and J.R.G. contributed to the primary analyses of the project. A.M. conducted genomic analyses. I.V., E.H., and E.A. contributed to data processing. K.X. and D.W.C. contributed to writing the manuscript. All authors contributed to data interpretations and review and editing of the manuscript.

Corresponding author

Correspondence to David W. Craig.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xu, K.Y., Violich, I., Hutchins, E. et al. Decreased SNCA expression in whole-blood RNA analysis of Parkinson’s disease adjusting for neutrophils. npj Parkinsons Dis. 11, 292 (2025). https://doi.org/10.1038/s41531-025-01062-4

Received: 19 November 2024
Accepted: 22 June 2025
Published: 14 October 2025
DOI: https://doi.org/10.1038/s41531-025-01062-4
