Table 1. List of hyperparameters systematically evaluated during the 10-fold cross-validation grid search to find the best-performing configuration for each model.
| Model | Hyperparameters |
|---|---|
| Logistic regression (LR) | C = 0.1, 0.5, 1, 2, 5, 10 |
| | Penalty = l1, l2, elasticnet |
| | Solver = liblinear, newton-cg, newton-cholesky, sag, saga |
| Support vector machine (SVM) | C = 0.1, 0.5, 1, 2, 5, 10 |
| | Kernel = linear, rbf, poly |
| Decision tree | Criterion = gini, entropy, log_loss |
| | Max depth = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| | Min samples leaf = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| | Min samples split = 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| Random forest | Criterion = gini, entropy, log_loss |
| | Max depth = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| | Min samples leaf = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| | Min samples split = 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| K-nearest neighbors | Algorithm = auto, ball_tree, kd_tree, brute |
| | Metric = euclidean, manhattan |
| | Num neighbors = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| | Weights = uniform, distance |
| XGBoost | Learning rate = 0.001, 0.01, 0.1, 0.2, 0.3, 0.5 |
| | Max depth = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |
| | Min child weight = 1, 2 |
| | Num estimators = 20, 30, 40, 50, 100, 150 |
| | Subsample = 0.1, 0.5, 1, 2, 5, 10 |
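
The parameter names and values in Table 1 follow scikit-learn and xgboost conventions, so the search maps directly onto an exhaustive grid search with 10-fold cross-validation. The sketch below shows one way this could be run for the logistic regression grid, assuming scikit-learn's `GridSearchCV` and synthetic placeholder data (the paper does not specify the implementation); the other five models follow the same pattern with their own grids.

```python
# Minimal sketch of the Table 1 grid search for logistic regression only.
# Assumptions: scikit-learn >= 1.2 (for the newton-cholesky solver) and
# synthetic placeholder data standing in for the study's feature matrix.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Placeholder data (assumption); replace with the study's features and labels.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Logistic regression grid, copied verbatim from Table 1.
param_grid = {
    "C": [0.1, 0.5, 1, 2, 5, 10],
    "penalty": ["l1", "l2", "elasticnet"],
    "solver": ["liblinear", "newton-cg", "newton-cholesky", "sag", "saga"],
}

# Some penalty/solver pairs are unsupported (e.g. elasticnet outside saga);
# GridSearchCV's default error_score=np.nan skips them with a FitFailedWarning
# instead of aborting the search.
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=5000),  # high max_iter helps sag/saga converge
    param_grid=param_grid,
    cv=10,       # 10-fold cross-validation, as in the paper
    n_jobs=-1,   # evaluate grid points in parallel
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The grid above spans 6 × 3 × 5 = 90 candidate configurations, each fitted 10 times under cross-validation; the best-performing configuration is the one with the highest mean validation score.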