Table 1 Parameters used for models when comparing to EBMs

From: StratoMod: predicting sequencing and variant calling errors with interpretable machine learning

| Model | Implementation | Hyperparameter levels |
| --- | --- | --- |
| Decision tree | rpart (R) | cost_complexity: 0.00001, 0.0001, 0.001, 0.01, 0.1 |
| Logistic regression | glmnet (R) | penalty: 0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10; mixture: 0, 0.5, 1 |
| Random forest | ranger (R) | mtry: 1, 4, 7; trees: 500, 1000, 2000 |
| XGBoost | xgboost (Python, GPU-accelerated) | max_depth: 3, 6, 9; n_estimators: 100, 500, 1000; gamma: 1, 10, 100 |

  1. All models (including the EBMs) were trained on a compute cluster with 512 GB of memory, two 20-core Intel Xeon E5-2698 v4 CPUs, and eight Nvidia Tesla V100 GPUs per node. Each job was allowed 3 days of compute time. Of all the algorithms used (including the EBMs), only XGBoost was able to take advantage of GPU acceleration.
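For concreteness, the sketch below shows how the XGBoost grid from Table 1 might be searched. This is an illustrative reconstruction, not the paper's code: the placeholder data `X` and `y`, the 5-fold cross-validation, and the ROC-AUC scoring are assumptions, and `device="cuda"` (xgboost >= 2.0) stands in for the GPU acceleration noted in footnote 1.

```python
# Illustrative grid search over the XGBoost hyperparameter levels in Table 1.
# X, y, the CV scheme, and the scoring metric are placeholders; the paper
# does not specify the search strategy.
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 7))      # placeholder feature matrix
y = rng.integers(0, 2, size=1000)   # placeholder binary labels

param_grid = {
    "max_depth": [3, 6, 9],          # levels from Table 1
    "n_estimators": [100, 500, 1000],
    "gamma": [1, 10, 100],
}

search = GridSearchCV(
    # device="cuda" requests GPU training (xgboost >= 2.0);
    # drop it to run the same search on CPU.
    XGBClassifier(tree_method="hist", device="cuda", eval_metric="logloss"),
    param_grid,
    scoring="roc_auc",  # assumed metric for illustration
    cv=5,               # assumed fold count
)
search.fit(X, y)
print(search.best_params_)
```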