Table 4 Hyperparameter search spaces for each classification model used in GridSearchCV. (Abbreviations: kNN = k-nearest neighbors, LR = logistic regression, NB = naïve Bayes, DT = decision tree, SVM = support vector machine, RF = random forest, and XGBoost = extreme gradient boosting. "Default parameters" indicates a model without tunable hyperparameters.)

From: A baseline study of interpretable machine learning using GC-MS breath VOCs for classifying asthma, bronchiectasis, and COPD

| Models | Scikit-learn parameter names | Hyperparameter grid |
|---|---|---|
| kNN | n_neighbors | [3, 5, 7, 9, 11, 13, 15] |
| LR | C | [0.001, 0.01, 0.1, 1, 10, 100] |
| NB | Default parameters (GaussianNB) | |
| DT | max_depth | [5, 10, 15, 20, None] |
| | min_samples_split | [2, 5, 10, 15] |
| SVM | C | [0.01, 0.1, 1, 10, 100] |
| | kernel | ['linear', 'rbf', 'poly'] |
| | degree | [2, 3, 4, 5] |
| RF | n_estimators | [50, 100, 200, 500] |
| | max_depth | [5, 10, 15, 20, None] |
| | min_samples_split | [2, 5, 10, 15] |
| XGBoost | n_estimators | [50, 100, 200, 500] |
| | max_depth | [3, 6, 9] |
| | learning_rate | [0.01, 0.1, 0.2] |
| | subsample | [0.6, 0.8, 1.0] |
| | colsample_bytree | [0.6, 0.8, 1.0] |
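
To give a sense of the size of each search space, the grids in Table 4 can be written as Python dictionaries and the number of candidate settings counted per model. This is a minimal sketch rather than the authors' code: the names `PARAM_GRIDS`, `n_candidates`, and `SVM_SPLIT_GRID` are hypothetical, and the split SVM grid is an assumed optimization that the table does not describe.

```python
from math import prod

# Grids transcribed from Table 4. The dictionary name and layout are
# illustrative assumptions, not taken from the paper.
PARAM_GRIDS = {
    "kNN": {"n_neighbors": [3, 5, 7, 9, 11, 13, 15]},
    "LR": {"C": [0.001, 0.01, 0.1, 1, 10, 100]},
    "NB": {},  # GaussianNB with default parameters: nothing to search
    "DT": {
        "max_depth": [5, 10, 15, 20, None],
        "min_samples_split": [2, 5, 10, 15],
    },
    "SVM": {
        "C": [0.01, 0.1, 1, 10, 100],
        "kernel": ["linear", "rbf", "poly"],
        "degree": [2, 3, 4, 5],
    },
    "RF": {
        "n_estimators": [50, 100, 200, 500],
        "max_depth": [5, 10, 15, 20, None],
        "min_samples_split": [2, 5, 10, 15],
    },
    "XGBoost": {
        "n_estimators": [50, 100, 200, 500],
        "max_depth": [3, 6, 9],
        "learning_rate": [0.01, 0.1, 0.2],
        "subsample": [0.6, 0.8, 1.0],
        "colsample_bytree": [0.6, 0.8, 1.0],
    },
}


def n_candidates(grid):
    """Count the parameter combinations in one grid (dict) or a list of grids."""
    if isinstance(grid, dict):
        # An empty grid yields 1: the single default-parameter model.
        return prod(len(values) for values in grid.values())
    return sum(n_candidates(sub_grid) for sub_grid in grid)


# Assumed variant: `degree` only affects the polynomial kernel, so the SVM
# grid can be split to avoid refitting identical linear and rbf models.
SVM_SPLIT_GRID = [
    {"C": PARAM_GRIDS["SVM"]["C"], "kernel": ["linear", "rbf"]},
    {
        "C": PARAM_GRIDS["SVM"]["C"],
        "kernel": ["poly"],
        "degree": PARAM_GRIDS["SVM"]["degree"],
    },
]

if __name__ == "__main__":
    for model, grid in PARAM_GRIDS.items():
        print(f"{model}: {n_candidates(grid)} candidate(s)")
    print(f"SVM (split grid): {n_candidates(SVM_SPLIT_GRID)} candidate(s)")
```

Each dictionary, and the list of dictionaries in `SVM_SPLIT_GRID`, has the shape that scikit-learn's `GridSearchCV` accepts as its `param_grid` argument. The total number of model fits is the candidate count multiplied by the number of cross-validation folds, which this table does not specify.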