Table 4 Single-classifier representations.

From: Mitigating class imbalance in churn prediction with ensemble methods and SMOTE

Classifier

Representation

Description

Parametric classifiers

 Logistic regression (LR)

\(\hat{y} = \sigma(w \cdot x + b)\)

\(\hat{y}\) is the predicted probability of the positive class.

\(\sigma(z)\) is the sigmoid function, defined as \(\sigma(z) = \frac{1}{1 + e^{-z}}\).

\(w\) is the vector of weights.

\(x\) is the input feature vector.

\(b\) is the bias term.
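A minimal NumPy sketch of this rule, with made-up weights, bias, and feature values for illustration:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z}) maps the linear score to (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.4, -1.2, 0.7])  # made-up weights
b = 0.1                         # made-up bias
x = np.array([1.0, 0.5, 2.0])   # made-up feature vector

y_hat = sigmoid(np.dot(w, x) + b)  # predicted probability of the positive class
```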

 Linear discriminant analysis (LDA)

\(\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k\)

\(x\) is the input feature vector.

\(\mu_k\) is the mean vector of class \(k\).

\(\Sigma\) is the common covariance matrix.

\(\pi_k\) is the prior probability of class \(k\).
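A sketch of the discriminant for two toy classes sharing one covariance matrix (all values made up); the predicted class is the one with the largest \(\delta_k(x)\):

```python
import numpy as np

def lda_score(x, mu_k, Sigma_inv, pi_k):
    # delta_k(x) = x^T Sigma^{-1} mu_k - (1/2) mu_k^T Sigma^{-1} mu_k + log(pi_k)
    return x @ Sigma_inv @ mu_k - 0.5 * mu_k @ Sigma_inv @ mu_k + np.log(pi_k)

Sigma_inv = np.linalg.inv(np.array([[1.0, 0.2], [0.2, 1.0]]))  # shared covariance
means = [np.array([0.0, 0.0]), np.array([1.0, 2.0])]
priors = [0.7, 0.3]

x = np.array([0.5, 1.5])
scores = [lda_score(x, mu, Sigma_inv, pi) for mu, pi in zip(means, priors)]
y_hat = int(np.argmax(scores))
```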

 Naive Bayes (NB)

\(\hat{y} = \operatorname{argmax}_k \left( \log P(C_k) + \sum_{i=1}^{n} \log P(x_i \mid C_k) \right)\)

\(\hat{y}\) is the predicted class label.

\(P(C_k)\) is the prior probability of class \(C_k\).

\(P(x_i \mid C_k)\) is the conditional probability of feature \(x_i\) given class \(C_k\).
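In log space the rule is a sum followed by an argmax; a toy sketch with made-up priors and likelihoods for two classes and three features:

```python
import numpy as np

priors = np.array([0.6, 0.4])              # P(C_k), made up
likelihoods = np.array([[0.5, 0.2, 0.3],   # P(x_i | C_0), made up
                        [0.1, 0.6, 0.3]])  # P(x_i | C_1), made up

# log P(C_k) + sum_i log P(x_i | C_k), then argmax over classes.
log_posterior = np.log(priors) + np.log(likelihoods).sum(axis=1)
y_hat = int(np.argmax(log_posterior))
```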

 Support vector machines (with linear kernel) (LSVM)

\(\hat{y} = \operatorname{sign}(w \cdot x + b)\)

\(\hat{y}\) is the predicted class label (+1 or −1).

\(w\) is the vector of weights.

\(x\) is the input feature vector.

\(b\) is the bias term.

\(\operatorname{sign}(z)\) is the sign function, which returns +1 if \(z > 0\) and −1 otherwise.
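The same decision rule in a few lines, again with made-up parameters:

```python
import numpy as np

w = np.array([0.8, -0.3])  # made-up weights
b = -0.2                   # made-up bias
x = np.array([1.0, 2.0])   # made-up feature vector

z = np.dot(w, x) + b
y_hat = 1 if z > 0 else -1  # sign(z): +1 if z > 0, -1 otherwise
```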

 Quadratic discriminant analysis (QDA)

\(\delta_k(x) = -\frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k\)

\(x\) is the input feature vector.

\(\mu_k\) is the mean vector of class \(k\).

\(\Sigma_k\) is the covariance matrix of class \(k\).

\(\pi_k\) is the prior probability of class \(k\).

\(|\Sigma_k|\) is the determinant of the covariance matrix \(\Sigma_k\).
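A sketch of the QDA discriminant; unlike LDA, each class carries its own covariance matrix (toy values throughout):

```python
import numpy as np

def qda_score(x, mu_k, Sigma_k, pi_k):
    # delta_k(x) = -(1/2) log|Sigma_k| - (1/2) (x - mu_k)^T Sigma_k^{-1} (x - mu_k) + log(pi_k)
    diff = x - mu_k
    return (-0.5 * np.log(np.linalg.det(Sigma_k))
            - 0.5 * diff @ np.linalg.inv(Sigma_k) @ diff
            + np.log(pi_k))

x = np.array([0.5, 1.5])
classes = [(np.array([0.0, 0.0]), np.eye(2), 0.7),  # (mu_k, Sigma_k, pi_k), made up
           (np.array([1.0, 2.0]), np.array([[2.0, 0.3],
                                            [0.3, 1.0]]), 0.3)]
scores = [qda_score(x, mu, Sigma, pi) for mu, Sigma, pi in classes]
y_hat = int(np.argmax(scores))
```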

 Multi-layer perceptron (MLP)

\(\hat{y} = \phi(W^{(L)} h^{(L-1)} + b^{(L)})\)

\(L\) is the total number of layers.

\(W^{(L)}\) is the weight matrix of layer \(L\).

\(h^{(L-1)}\) is the output of hidden layer \(L-1\).

\(b^{(L)}\) is the bias vector of layer \(L\).

\(\phi(\cdot)\) is the activation function.
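A minimal forward pass for a two-layer network; the ReLU hidden activation, sigmoid output, and random weights are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)  # a common hidden-layer activation

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # made-up layer 1: 3 inputs -> 4 hidden
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # made-up layer 2: 4 hidden -> 1 output

x = np.array([1.0, 0.5, -0.2])
h = relu(W1 @ x + b1)         # h^(1): output of the hidden layer
y_hat = sigmoid(W2 @ h + b2)  # phi(W^(L) h^(L-1) + b^(L)) at the output layer
```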

Non-parametric classifiers

 K-nearest neighbors (KNN)

\(\hat{y} = \operatorname{mode}\left( \{ y_i \mid i \in \mathrm{KNN}(x) \} \right)\)

\(\hat{y}\) is the predicted class label.

\(\mathrm{KNN}(x)\) is the set of indices of the \(k\) nearest neighbors of \(x\).

\(y_i\) are the class labels of the nearest neighbors.

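A small sketch of the majority-vote rule; Euclidean distance is an assumption here, since the table does not fix a metric:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # KNN(x): indices of the k nearest training points by Euclidean distance.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # mode: majority vote over the neighbors' labels.
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])  # made-up data
y_train = np.array([0, 0, 1, 1])
y_hat = knn_predict(X_train, y_train, np.array([0.9, 1.0]))
```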
 Decision trees (DT)

\(\text{if } x_1 \le \theta_1 \text{ then}\)

\(\quad \text{if } x_2 \le \theta_2 \text{ then } \hat{y} = y_1\)

\(\quad \text{else } \hat{y} = y_2\)

\(\text{else}\)

\(\quad \text{if } x_3 \le \theta_3 \text{ then } \hat{y} = y_3\)

\(\quad \text{else } \hat{y} = y_4\)

\(x_i\) are the feature values.

\(\theta_i\) are the threshold values for the splits.

\(\hat{y}\) is the predicted class label.
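The pseudocode above as a runnable sketch; the thresholds \(\theta_i\) and leaf labels are made up:

```python
def tree_predict(x1, x2, x3):
    # Hard-coded two-level tree mirroring the pseudocode above.
    theta1, theta2, theta3 = 0.5, 1.0, 2.0  # made-up split thresholds
    if x1 <= theta1:
        return "y1" if x2 <= theta2 else "y2"
    return "y3" if x3 <= theta3 else "y4"

y_hat = tree_predict(0.3, 1.4, 0.0)  # follows the left branch, returns "y2"
```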

 Support vector machines (with non-linear kernel) (KSVM)

\(\hat{y} = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right)\)

\(x\) is the input feature vector.

\(\alpha_i\) are the Lagrange multipliers (support vector coefficients).

\(y_i\) are the class labels of the training examples (±1).

\(K(x_i, x)\) is the kernel function.

\(b\) is the bias term.
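A sketch of the kernel decision rule; the RBF kernel is an illustrative choice (the table leaves \(K\) unspecified), and the support vectors, multipliers, and bias are made up:

```python
import numpy as np

def rbf_kernel(xi, x, gamma=0.5):
    # K(x_i, x) = exp(-gamma * ||x_i - x||^2), one common non-linear kernel.
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def ksvm_predict(X_sv, y_sv, alpha, b, x):
    # sign( sum_i alpha_i * y_i * K(x_i, x) + b )
    z = sum(a * y * rbf_kernel(xi, x) for a, y, xi in zip(alpha, y_sv, X_sv)) + b
    return 1 if z > 0 else -1

X_sv = np.array([[0.0, 0.0], [1.0, 1.0]])  # made-up support vectors
y_sv = np.array([-1, 1])
alpha = np.array([0.5, 0.5])
y_hat = ksvm_predict(X_sv, y_sv, alpha, b=0.0, x=np.array([0.9, 0.8]))
```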