Figure 4

The classification pipeline for a given clinical outcome inference is a three-stage non-linear mapping, formally \(\Phi : \Re ^{f \times d} \rightarrow \Re\). For a single subject, the input is a matrix of f composite scores computed on each of d consecutive days. During the learning phase, the time window of length d is centred on the clinical visit at which the EDSS, NHPT, and SDMT scores are recorded. In the first stage, the matrix values are normalized, and missing values (caused by insufficient data within a day) are imputed via chained equations [22,23], also known as the Iterative Imputer. The second stage delivers an output \(\tilde{y} \in \Re ^{d}\) of d predicted probabilities, one per day, produced by an ensemble of three classifiers (two in the case of clinical diagnosis) whose outputs feed a soft-voting meta-learner. The third and last stage yields the actual prediction \(\hat{y} \in \Re\), obtained by averaging the daily probabilities, i.e. \(\hat{y} = \frac{1}{d}\sum _{i = 1}^{d} \tilde{y}_{i}\). Depending on the target and the corresponding feature set, \(\hat{y}\) estimates the clinical diagnosis (HC versus pwMS), the disease severity level based on EDSS, or the manual dexterity and cognitive function levels based on NHPT and SDMT, respectively.
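
The second and third stages can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `soft_vote` and `aggregate_days`, the equal weighting of the classifiers in the soft vote, and the example probabilities are all assumptions made for illustration. Normalization and imputation (the first stage) are omitted.

```python
from statistics import fmean


def soft_vote(per_classifier_probs):
    """Stage 2 (sketch): average the classifiers' probabilities day by day.

    per_classifier_probs holds one list of d daily probabilities per
    classifier. Equal classifier weights are an assumption here.
    """
    n_days = len(per_classifier_probs[0])
    if any(len(probs) != n_days for probs in per_classifier_probs):
        raise ValueError("every classifier must score the same d days")
    # zip(*...) regroups the values so that each tuple holds one day's
    # probabilities across all classifiers.
    return [fmean(day) for day in zip(*per_classifier_probs)]


def aggregate_days(daily_probs):
    """Stage 3: y_hat = (1/d) * sum of the d daily probabilities."""
    return fmean(daily_probs)


# Hypothetical probabilities from three classifiers over d = 4 days.
clf_probs = [
    [0.9, 0.8, 0.7, 0.6],
    [0.7, 0.8, 0.5, 0.6],
    [0.8, 0.8, 0.6, 0.6],
]

y_tilde = soft_vote(clf_probs)     # d daily probabilities
y_hat = aggregate_days(y_tilde)    # single subject-level estimate
print(y_tilde, y_hat)
```

Averaging over the d days means that a single atypical day moves the subject-level estimate by at most 1/d.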