Fig. 1: Schematic overview of the study workflow.

The input data types used for the cross-validated machine learning analyses are highlighted on the left and the prediction tasks on the right. The prediction tasks cover four groups of outcomes, roughly sorted by increasing complexity: (1) disease diagnosis, (2) motor score severity, (3) gait and mobility impairments, and (4) comorbidities, non-motor outcomes, and progression rate. Progression rate is measured as the average annual change in the MDS-UPDRS III motor score over four years of follow-up and categorized as slow or fast, depending on whether the change falls in the lower or upper quartile, respectively. For the first three groups of tasks, unimodal machine learning models were built using gait data only, as this was sufficient to achieve satisfactory cross-validation performance. For the more challenging fourth group of tasks (detecting comorbidities, non-motor outcomes, and progression rate subgroups), multimodal models combining gait, omics, and clinical data were built in addition to comparing the individual data modalities.
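
The slow/fast progression categorization described above can be illustrated with a minimal Python sketch. It is not the study's actual code: the function names `annual_change` and `label_progression` are hypothetical, the annual change is assumed here to be the difference between the last and first MDS-UPDRS III scores divided by the years of follow-up (the study may have used a different estimator, such as a regression slope), and treating the quartile boundaries as inclusive is also an assumption.

```python
import numpy as np

def annual_change(baseline_score, final_score, years=4.0):
    """Average annual change in motor score (assumed: endpoint difference / years)."""
    baseline = np.asarray(baseline_score, dtype=float)
    final = np.asarray(final_score, dtype=float)
    return (final - baseline) / years

def label_progression(change):
    """Label patients 'slow' (lower quartile) or 'fast' (upper quartile).

    Patients between the quartiles get None, i.e. they are left out of
    the slow-versus-fast comparison.
    """
    change = np.asarray(change, dtype=float)
    q1, q3 = np.percentile(change, [25, 75])
    labels = np.full(change.shape, None, dtype=object)
    labels[change <= q1] = "slow"   # assumed inclusive lower-quartile boundary
    labels[change >= q3] = "fast"   # assumed inclusive upper-quartile boundary
    return labels

# Illustrative values only, not study data.
example_change = annual_change([20, 22, 18, 30, 25], [22, 34, 19, 38, 45])
example_labels = label_progression(example_change)
```

Under these assumptions, only patients in the two outer quartiles receive a label, so a classifier trained on this task would see roughly half of the cohort.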