Fig. 2: Overall machine-learning framework using both simulated and real data.

In the outer CV loop, the data were randomly split, leaving out 25% of the subjects for testing, with 10 replications for the simulated data and 100 replications for the real data. The splits were stratified with respect to cohort (short-term treatment response) or outcome (long-term treatment response and diagnostic classification). Values missing at random were imputed using estimates derived from the training set. The training and test sets were standardized using the mean and standard deviation derived from the training set. Both training and test data were split into submodalities. We used the following conventional covariates: sex, age, cohort, and handedness. In the inner CV loop, the training data were further split into training and test sets using threefold CV. Threefold CV was selected as a trade-off between limited sample size and computation time. Algorithm parameters and ensembles were optimized in the inner CV loop with two different approaches (see text). The best-performing model was applied to the outer CV loop test set, and the predictions of the individual submodalities were combined in a late integration scheme to provide the final prediction. The analysis of the real data followed the same framework as the simulated data, except that only the best-, median-, and poorest-performing algorithms, parameter settings, and methods learned from the simulated data were applied to the real data. CV, cross-validation.
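The splitting and preprocessing steps described in the caption can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes mean imputation (the caption only says "estimates derived from the training set"), encodes missing values as `None`, and uses hypothetical function names (`stratified_split`, `fit_preprocessing`, `apply_preprocessing`, `inner_folds`). Model fitting, parameter optimization, and late integration are omitted.

```python
import math
import random
from collections import defaultdict


def stratified_split(strata, test_fraction=0.25, seed=0):
    """Outer loop: hold out about test_fraction of the subjects in each stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, label in enumerate(strata):
        groups[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(groups):
        members = groups[label][:]
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_preprocessing(train_rows):
    """Estimate per-feature mean and SD from observed training values only.

    Assumes every feature has at least one observed value in the training set.
    """
    means, sds = [], []
    for j in range(len(train_rows[0])):
        observed = [row[j] for row in train_rows if row[j] is not None]
        mean = sum(observed) / len(observed)
        var = sum((v - mean) ** 2 for v in observed) / len(observed)
        means.append(mean)
        sds.append(math.sqrt(var) or 1.0)  # guard against zero variance
    return means, sds


def apply_preprocessing(rows, means, sds):
    """Impute missing values with the training mean (an assumption), then standardize."""
    return [
        [((means[j] if v is None else v) - means[j]) / sds[j]
         for j, v in enumerate(row)]
        for row in rows
    ]


def inner_folds(train_idx, k=3, seed=0):
    """Inner loop: yield (inner_train, inner_test) index lists for k-fold CV."""
    rng = random.Random(seed)
    shuffled = train_idx[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        inner_train = [idx for j, fold in enumerate(folds) if j != i
                       for idx in fold]
        yield inner_train, folds[i]


# Toy usage: 20 subjects in two cohorts, two features with some missing values.
cohort = ["A"] * 8 + ["B"] * 12
data = [[float(i), None if i % 5 == 0 else float(i % 3)] for i in range(20)]

for replication in range(10):  # 10 replications, as for the simulated data
    train_idx, test_idx = stratified_split(cohort, 0.25, seed=replication)
    means, sds = fit_preprocessing([data[i] for i in train_idx])
    x_train = apply_preprocessing([data[i] for i in train_idx], means, sds)
    x_test = apply_preprocessing([data[i] for i in test_idx], means, sds)
    for inner_train, inner_test in inner_folds(train_idx, k=3, seed=replication):
        pass  # parameter and ensemble optimization would go here
```

The point of the sketch is that the imputation and standardization statistics are estimated from the training subjects only and then reused unchanged on the held-out subjects, so no information from the test set leaks into preprocessing.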