Figure 2: Model performance.

Competitive Phase: (a) Bootstrap distributions for each of the 27 models submitted to the classification subchallenge, ordered by overall rank. The top 11 models were significantly better than random at a Bonferroni-corrected P value<0.05. Collaborative Phase: (b) Distributions of scores for the models built with randomly sampled SNPs, by team, along with the scores for each team’s full model, which combines data-driven SNP selection with clinical variable selection (pink), and clinical model, which contains clinical variables but excludes SNP data (blue). For 5 of 7 teams, the full models are nominally significantly better than the random SNP models for AUPR, AUROC or both (enrichment P value 4.2e−5). (c) AUPR and AUROC of each collaborative phase team’s full model, containing SNP and clinical predictors, versus their clinical model, which does not consider SNP predictors. There was no significant difference in either metric between models developed in the presence or absence of genetic information (paired t-test P value=0.85 and 0.82 for AUPR and AUROC, respectively).
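
The comparisons summarized in panels (a) and (b) can be illustrated with a minimal sketch. The caption does not state how nominal significance against the random SNP models was computed, so the one-sided empirical P value below is an assumption, not the authors' procedure. The function names `empirical_p_value` and `bonferroni_threshold` are hypothetical, and all scores are made-up placeholders rather than results from the challenge.

```python
import random


def empirical_p_value(observed, null_scores):
    """One-sided empirical P value (assumed procedure, not from the caption).

    Counts how many null scores are at least as large as the observed score,
    with a +1 correction so the estimate is never exactly zero.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Panel (a): 27 submitted models tested at a family-wise level of 0.05.
per_model_threshold = bonferroni_threshold(0.05, 27)
print(f"Per-model threshold for 27 tests: {per_model_threshold:.5f}")

# Panel (b): made-up AUROC scores standing in for one team's
# random-SNP models (placeholder data, not challenge results).
rng = random.Random(0)
random_snp_auroc = [rng.gauss(0.60, 0.01) for _ in range(999)]
full_model_auroc = 0.62  # placeholder score for the team's full model

p = empirical_p_value(full_model_auroc, random_snp_auroc)
print(f"Empirical P value: {p:.3f}; nominally significant: {p < 0.05}")
```

Here "nominally significant" is read as an uncorrected P value below 0.05, which is why the Bonferroni threshold is applied only to the 27-model comparison in panel (a).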