Fig. 1: Workflow diagram of regression model development for predicting neutrophil percentage from gene expression data. | npj Parkinson's Disease


From: Decreased SNCA expression in whole-blood RNA analysis of Parkinson’s disease adjusting for neutrophils


1254 passing samples with complete blood count (CBC) test results were used to create machine learning regression models to predict neutrophil percentage. a, b, d Train-test splits for regression model development were created by randomly splitting the 600 unique participants into an 80% train set and a 20% test set, then assigning each participant's samples to the corresponding set. Three linear models were created to compare feature selection methods: a biology-based, selecting only blood cell enriched genes; b data-driven, using mutual information feature selection from all genes; and d combined, including genes from both biology-based and data-driven selection. c Additionally, an XGBoost regression model was developed with all 58,780 transformed gene counts. We used the best-performing model to predict neutrophil percentage for 2932 PDBP samples and 2711 PPMI samples with no known neutrophil percentage.
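
The participant-level split described in the caption can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_by_participant`, the sample and participant identifiers, and the seed are all hypothetical, and only the Python standard library is used. The idea shown is that participants, not samples, are shuffled and divided 80/20, so every sample from one participant lands in the same set.

```python
import random


def split_by_participant(sample_to_participant, test_fraction=0.2, seed=0):
    """Split samples so all samples from one participant share a set.

    sample_to_participant: dict mapping sample ID -> participant ID
    (hypothetical structure, assumed for illustration).
    """
    # Shuffle the unique participants, not the samples themselves.
    participants = sorted(set(sample_to_participant.values()))
    rng = random.Random(seed)
    rng.shuffle(participants)

    # Reserve the requested fraction of participants for the test set.
    n_test = round(len(participants) * test_fraction)
    test_participants = set(participants[:n_test])

    # Assign each sample to the set its participant belongs to.
    train, test = [], []
    for sample, participant in sample_to_participant.items():
        if participant in test_participants:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


# Toy data (made up): 10 participants with 2 samples each.
samples = {f"S{p}_{v}": f"P{p}" for p in range(10) for v in range(2)}
train_samples, test_samples = split_by_participant(samples)
```

Splitting at the participant level prevents repeated samples from the same person from appearing in both sets, which would otherwise make test performance look better than it is.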
