Figure 4 | Scientific Reports

Figure 4

From: Disease severity classification using passively collected smartphone-based keystroke dynamics within multiple sclerosis

The classification pipeline for a given clinical outcome inference is a three-stage non-linear mapping, formally \(\Phi : \Re ^{f \times d} \rightarrow \Re\). For a single subject, the pipeline takes as input a matrix containing f composite scores generated over d consecutive days. During the learning phase, the time window of length d is centred on the clinical visit at which the EDSS, NHPT, and SDMT scores are recorded. In the first stage, the matrix values are normalized, and missing values (due to insufficient data within a day) are imputed via chained equations (refs. 22,23), a.k.a. Iterative Imputer. The second stage delivers an output \(\tilde{y} \in \Re ^{d}\) of d predicted probabilities, produced by an ensemble of three classifiers (two for the clinical diagnosis) feeding a soft-voting meta-learner. The third and last stage yields the actual prediction \(\hat{y} \in \Re\), obtained by averaging the probabilities, i.e. \(\hat{y} = \frac{1}{d}\sum _{i = 1}^{d} \tilde{y}_{i}\). Depending on the target and the corresponding feature set, \(\hat{y}\) provides the estimate for the clinical diagnosis (HC versus pwMS), the disease severity level based on EDSS, or the manual dexterity and cognitive function levels based on NHPT and SDMT, respectively.
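The second and third stages described in the caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the input has already been normalized and imputed, that each of the d days (one column of f composite scores) is classified independently, and that the soft vote is an unweighted mean of the classifiers' probabilities. The names `soft_vote`, `predict_subject`, and the three toy classifiers are hypothetical stand-ins for the trained models.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a per-day classifier maps one day's f composite
# scores to the predicted probability of the positive class.
DayClassifier = Callable[[Sequence[float]], float]


def soft_vote(classifiers: Sequence[DayClassifier],
              day_scores: Sequence[float]) -> float:
    """Stage 2 for one day: unweighted mean of the classifiers'
    predicted probabilities (assumption: equal voting weights)."""
    return fmean([clf(day_scores) for clf in classifiers])


def predict_subject(classifiers: Sequence[DayClassifier],
                    days: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Stages 2 and 3 for one subject.

    `days` holds d per-day vectors of f scores each (the columns of the
    f x d input matrix), assumed already normalized and imputed.
    Returns (y_tilde, y_hat): the d daily probabilities and their mean.
    """
    y_tilde = [soft_vote(classifiers, day) for day in days]
    y_hat = fmean(y_tilde)  # y_hat = (1/d) * sum_i y_tilde_i
    return y_tilde, y_hat


# Toy stand-ins for trained classifiers (purely illustrative).
def clf_a(day: Sequence[float]) -> float:
    return 1.0 if day[0] > 0 else 0.0


def clf_b(day: Sequence[float]) -> float:
    return 0.5


def clf_c(day: Sequence[float]) -> float:
    return 1.0 if day[1] > 0 else 0.0


# f = 2 composite scores over d = 3 days.
days = [[1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y_tilde, y_hat = predict_subject([clf_a, clf_b, clf_c], days)
print(y_tilde, y_hat)
```

Dividing by d in the last stage keeps \(\hat{y}\) in [0, 1], so it can still be read as a probability and compared against a decision threshold.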