Supplementary Figure 8: Machine learning and predictor informativeness. | Nature Genetics

From: A genetics-led approach defines the drug target landscape of 30 immune-related traits

Supplementary Figure 8

a, Performance comparison between machine learning algorithms. Area under the curve (AUC) is shown with 95% confidence intervals, based on 10 repeats of 3-fold cross-validation per algorithm with optimized tuning parameters. Per fold, two thirds of the gold-standard positives (GSPs) and gold-standard negatives (GSNs) were used for training and one third for performance evaluation. GSPs are based on target genes of drugs at phase 2 and above (that is, clinical proof-of-concept targets). Random forest consistently outperforms, or is competitive with, the top performer among the other algorithms (followed by state-of-the-art boosting algorithms, classical ones, and generalized linear algorithms). b, Relative importance of predictors, measured as the decrease in accuracy when a predictor is disabled, scaled relative to the maximum decrease, as estimated by random forest. Judged by the actual ‘usage’ of predictors by random forest, annotation predictors (which draw on knowledge of genomic seed genes) are in general more informative than genomic predictors; this is consistent with the performance measured directly for individual predictors (Fig. 3b).
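
The evaluation scheme described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, a toy difference-of-class-means scorer (`centroid_scorer`, a hypothetical helper) stands in for random forest, and the 95% confidence interval uses a normal approximation over the 30 fold-level AUCs, since the caption does not say how the intervals were derived. All function names are illustrative.

```python
import random
import statistics


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_folds(labels, n_folds=3, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each test fold holds about 1/n_folds of each class."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        for cls in (0, 1):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for k, i in enumerate(idx):
                folds[k % n_folds].append(i)
        for k in range(n_folds):
            train = [i for j in range(n_folds) if j != k for i in folds[j]]
            yield train, folds[k]


def mean_with_ci95(values):
    """Mean with a normal-approximation 95% CI (an assumption, see the lead-in)."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / len(values) ** 0.5
    return m, m - half, m + half


def scaled_importance(decreases):
    """Scale each predictor's decrease in accuracy relative to the maximum decrease."""
    top = max(decreases.values())
    return {name: d / top for name, d in decreases.items()}


def centroid_scorer(X, y, train):
    """Toy stand-in for a trained model: project rows onto the difference of class means."""
    def class_mean(cls, j):
        vals = [X[i][j] for i in train if y[i] == cls]
        return sum(vals) / len(vals)

    w = [class_mean(1, j) - class_mean(0, j) for j in range(len(X[0]))]
    return lambda row: sum(wj * xj for wj, xj in zip(w, row))


# Synthetic data: 30 positives (GSP-like) and 60 negatives (GSN-like).
# Column 0 is informative, column 1 is pure noise.
rng = random.Random(1)
y = [1] * 30 + [0] * 60
X = [[label + rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

# 10 repeats x 3 folds = 30 AUC estimates: train on two thirds, evaluate on one third.
fold_aucs = []
for train, test in repeated_stratified_folds(y):
    score = centroid_scorer(X, y, train)
    fold_aucs.append(auc([y[i] for i in test], [score(X[i]) for i in test]))

mean_auc, lo, hi = mean_with_ci95(fold_aucs)
print(f"AUC = {mean_auc:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

For panel b, `scaled_importance` takes raw per-predictor decreases in accuracy, however they were estimated, and divides each by the largest one, so the most important predictor scores 1 and the rest are read as fractions of it.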
