Table 3 Parameter settings used to train the classifiers.

From: An open automation system for predatory journal detection

| Model | Parameters |
| --- | --- |
| GNB | priors = None |
| MNB | alpha = 1.0 |
| Logistic Regression | solver = 'liblinear' |
| Random Forest | n_estimators = 50, random_state = 1 |
| SGD | penalty = 'l2' |
| SVM | kernel = 'linear', probability = True |
| KNN | n_neighbors = 4 |
| Voting | estimators = [('gnb', clf1), ('mnb', clf2), ('svm', clf3), ('sgd', clf4), ('lr', clf5), ('rf', clf6), ('knn', clf7)], voting = 'hard' |
| Voting (no GNB) | estimators = [('mnb', clf2), ('svm', clf3), ('sgd', clf4), ('lr', clf5), ('rf', clf6), ('knn', clf7)], voting = 'hard' |
| Voting (top 3 models) | estimators = [('sgd', clf4), ('lr', clf5), ('rf', clf6)], voting = 'hard' |

  1. Abbreviations: GNB, Gaussian naïve Bayes; MNB, multinomial naïve Bayes; SGD, stochastic gradient descent; SVM, support vector machine; KNN, K-nearest neighbor.
  2. clf1 = GNB, clf2 = MNB, clf3 = SVM, clf4 = SGD, clf5 = LogisticRegression, clf6 = RandomForest, clf7 = KNeighborsClassifier.
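The settings in Table 3 map directly onto scikit-learn estimator constructors. The sketch below is a minimal reconstruction (not the authors' released code) assuming scikit-learn, showing how the base classifiers and the three voting ensembles could be instantiated with these parameters; the training data `X_train`, `y_train` and test data `X_test` are hypothetical placeholders.

```python
# Minimal sketch assuming scikit-learn; parameter values follow Table 3.
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Base classifiers, numbered as in footnote 2 of the table
clf1 = GaussianNB(priors=None)                                  # GNB
clf2 = MultinomialNB(alpha=1.0)                                 # MNB
clf3 = SVC(kernel='linear', probability=True)                   # SVM
clf4 = SGDClassifier(penalty='l2')                              # SGD
clf5 = LogisticRegression(solver='liblinear')                   # Logistic Regression
clf6 = RandomForestClassifier(n_estimators=50, random_state=1)  # Random Forest
clf7 = KNeighborsClassifier(n_neighbors=4)                      # KNN

# Hard-voting ensembles corresponding to the three "Voting" rows
voting_all = VotingClassifier(
    estimators=[('gnb', clf1), ('mnb', clf2), ('svm', clf3), ('sgd', clf4),
                ('lr', clf5), ('rf', clf6), ('knn', clf7)],
    voting='hard')

voting_no_gnb = VotingClassifier(
    estimators=[('mnb', clf2), ('svm', clf3), ('sgd', clf4),
                ('lr', clf5), ('rf', clf6), ('knn', clf7)],
    voting='hard')

voting_top3 = VotingClassifier(
    estimators=[('sgd', clf4), ('lr', clf5), ('rf', clf6)],
    voting='hard')

# Usage with hypothetical feature matrix and labels:
# voting_top3.fit(X_train, y_train)
# predictions = voting_top3.predict(X_test)
```

With voting = 'hard', each ensemble predicts the majority class label of its member classifiers; probability = True on the SVM is only needed if soft (probability-averaged) voting were used instead.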