Fig. 2: Schematic representation of the data pipeline from data collection to feature ranking. | npj Mental Health Research

From: A systematic exploration of digital biomarkers for the detection of depressive episodes in bipolar disorder

a Variable preprocessing, missing data imputation, and feature extraction: Multiple imputation is used to generate 10 imputed datasets, after which the main statistical characteristics (i.e., features) are extracted from each variable at time scales ranging from five minutes to one month.

b Variables across various time scales used to generate matrices of features: Each variable, at temporal resolutions from five minutes to one month, yields seven features. The features of all participants are assembled into two matrices, one corresponding to euthymic states and one corresponding to depressive states. These matrices serve as input for the binary classifier.

c Classification and generation of performance metrics: Input features and confounding factors are passed to the ensemble-based binary classifier, which distinguishes clinical polarity (euthymia vs. depression). This panel also illustrates the classifier's performance metrics, including Receiver Operating Characteristic (ROC) curves and Precision-Recall (PR) curves for both the Decision Tree model and the XGBoost model. Performance is quantified by the Area Under the ROC Curve (AU-ROC) and the Area Under the Precision-Recall Curve (AU-PRC).

d Feature ranking: Shapley value plots show each feature's average impact on the model's predictions, together with class-specific impacts for the features extracted during euthymia and depression. The plots indicate the relative importance of each feature in driving the classifier's outputs, which aids model interpretability.
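The two summary metrics named in panel c can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `au_roc` and `au_prc`, the label coding (1 = depression, 0 = euthymia), and the toy scores are assumptions made only for illustration. AU-ROC is computed through its pairwise-ranking interpretation, and AU-PRC is approximated by average precision, one common estimator of the area under the PR curve.

```python
def au_roc(labels, scores):
    """AU-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Assumed coding: 1 = depression, 0 = euthymia (illustrative only).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    # A pair counts fully if the positive scores higher, half if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def au_prc(labels, scores):
    """AU-PRC approximated by average precision.

    Precision is averaged over the ranks at which a positive is retrieved.
    Tied scores are not grouped in this simple version.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy example: four samples with hypothetical classifier scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(au_roc(labels, scores))  # 0.75
print(au_prc(labels, scores))  # 0.8333...
```

With imbalanced classes, the AU-PRC baseline equals the prevalence of the positive class rather than 0.5, which is one reason to report both metrics.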
