Table 4 Hyperparameter search spaces for each classification model used in GridSearchCV. (Abbreviations: kNN = k-nearest neighbors, LR = logistic regression, NB = naïve Bayes, DT = decision tree, SVM = support vector machine, RF = random forest, XGBoost = extreme gradient boosting. "Default parameters" indicates a model that was run with its default settings rather than tuned.)
| Models | Scikit-learn parameter names | Hyperparameter grid |
|---|---|---|
| kNN | n_neighbors | [3, 5, 7, 9, 11, 13, 15] |
| LR | C | [0.001, 0.01, 0.1, 1, 10, 100] |
| NB | – | Default parameters (GaussianNB) |
| DT | max_depth | [5, 10, 15, 20, None] |
| | min_samples_split | [2, 5, 10, 15] |
| SVM | C | [0.01, 0.1, 1, 10, 100] |
| | kernel | ['linear', 'rbf', 'poly'] |
| | degree | [2, 3, 4, 5] |
| RF | n_estimators | [50, 100, 200, 500] |
| | max_depth | [5, 10, 15, 20, None] |
| | min_samples_split | [2, 5, 10, 15] |
| XGBoost | n_estimators | [50, 100, 200, 500] |
| | max_depth | [3, 6, 9] |
| | learning_rate | [0.01, 0.1, 0.2] |
| | subsample | [0.6, 0.8, 1.0] |
| | colsample_bytree | [0.6, 0.8, 1.0] |
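
The grids in Table 4 can be written as plain Python dictionaries of the kind accepted by the `param_grid` argument of `GridSearchCV`. The following is a minimal sketch, not the authors' code: the names `PARAM_GRIDS`, `n_candidates`, and `GRID_SIZES` are illustrative assumptions, and the block only counts how many parameter combinations an exhaustive search over each grid would evaluate.

```python
from math import prod

# Grids transcribed from Table 4. The dictionary and function names here are
# illustrative assumptions, not taken from the original study.
PARAM_GRIDS = {
    "kNN": {"n_neighbors": [3, 5, 7, 9, 11, 13, 15]},
    "LR": {"C": [0.001, 0.01, 0.1, 1, 10, 100]},
    "NB": {},  # GaussianNB with default parameters: nothing to search
    "DT": {
        "max_depth": [5, 10, 15, 20, None],
        "min_samples_split": [2, 5, 10, 15],
    },
    "SVM": {
        "C": [0.01, 0.1, 1, 10, 100],
        "kernel": ["linear", "rbf", "poly"],
        "degree": [2, 3, 4, 5],
    },
    "RF": {
        "n_estimators": [50, 100, 200, 500],
        "max_depth": [5, 10, 15, 20, None],
        "min_samples_split": [2, 5, 10, 15],
    },
    "XGBoost": {
        "n_estimators": [50, 100, 200, 500],
        "max_depth": [3, 6, 9],
        "learning_rate": [0.01, 0.1, 0.2],
        "subsample": [0.6, 0.8, 1.0],
        "colsample_bytree": [0.6, 0.8, 1.0],
    },
}


def n_candidates(grid):
    """Return the number of parameter combinations in a full grid."""
    # The product over an empty grid is 1: a single fit with default settings.
    return prod(len(values) for values in grid.values())


GRID_SIZES = {name: n_candidates(grid) for name, grid in PARAM_GRIDS.items()}

if __name__ == "__main__":
    for name, size in GRID_SIZES.items():
        print(f"{name}: {size} combinations")
```

Each count is multiplied by the number of cross-validation folds to get the total number of model fits. Note that scikit-learn's `SVC` uses `degree` only with the polynomial kernel, so some of the SVM combinations in a single flat grid are functionally identical fits.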