Figure 1

Effect of threshold selection on the relative proportion of missing data and the number of remaining variables. With increasing threshold levels (‘Minimum Number of Observations’), variables with few observations are removed, which decreases both the proportion of missing data (A) and the number of remaining variables (B). A large number of variables has to be removed to achieve a substantial reduction in missing data (C). The dashed lines indicate the thresholds used to create the additional data sets for evaluating model performance and the effect of model settings. These thresholds were chosen based on visual inspection and the resulting drop in missing data.
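The trade-off described in this caption can be illustrated with a minimal sketch. It assumes that a variable is dropped when its count of non-missing observations falls below the threshold and that missing values are encoded as `None`. The function names (`filter_by_min_observations`, `missing_fraction`, `threshold_sweep`) and the toy data are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: variable name -> observations, with None marking a missing value.
Table = Dict[str, List[Optional[float]]]


def filter_by_min_observations(table: Table, min_obs: int) -> Table:
    """Keep only variables with at least `min_obs` non-missing observations."""
    return {
        name: values
        for name, values in table.items()
        if sum(v is not None for v in values) >= min_obs
    }


def missing_fraction(table: Table) -> float:
    """Proportion of missing cells among all cells of the table."""
    total = sum(len(values) for values in table.values())
    if total == 0:
        return 0.0
    missing = sum(v is None for values in table.values() for v in values)
    return missing / total


def threshold_sweep(table: Table, thresholds: List[int]) -> List[Tuple[int, int, float]]:
    """For each threshold, return (threshold, remaining variables, missing fraction)."""
    rows = []
    for threshold in thresholds:
        kept = filter_by_min_observations(table, threshold)
        rows.append((threshold, len(kept), missing_fraction(kept)))
    return rows


# Toy data, invented for illustration only.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, None, 3.0, 4.0],
    "c": [None, None, 3.0, None],
    "d": [None, None, None, None],
}

sweep = threshold_sweep(toy, [0, 1, 2, 4])
for threshold, n_vars, frac in sweep:
    print(f"min_obs={threshold}: {n_vars} variables, {frac:.3f} missing")
```

In this toy table, raising the threshold from 0 to 4 lowers the missing fraction from 0.5 to 0 but leaves only one of the four variables, which mirrors the relationship shown in panel C.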