Table 4 Single classifiers representation.
Classifier | Representation | Description |
---|---|---|
Parametric classifiers | | |
 Logistic regression (LR) | \(\widehat{y}=\sigma(w\cdot x+b)\) | \(\widehat{y}\) is the predicted probability of the positive class. \(\sigma(z)\) is the sigmoid function, defined as \(\sigma(z)=\frac{1}{1+e^{-z}}\). \(w\) is the vector of weights. \(x\) is the input feature vector. \(b\) is the bias term. |
 Linear discriminant analysis (LDA) | \(\delta_{k}(x)=x^{T}\Sigma^{-1}\mu_{k}-\frac{1}{2}\mu_{k}^{T}\Sigma^{-1}\mu_{k}+\log\pi_{k}\) | \(x\) is the input feature vector. \(\mu_{k}\) is the mean vector of class \(k\). \(\Sigma\) is the covariance matrix common to all classes. \(\pi_{k}\) is the prior probability of class \(k\). |
 Naive Bayes (NB) | \(\widehat{y}=\operatorname{argmax}_{k}\left(\log P(C_{k})+\sum_{i=1}^{n}\log P(x_{i}\mid C_{k})\right)\) | \(\widehat{y}\) is the predicted class label. \(P(C_{k})\) is the prior probability of class \(C_{k}\). \(P(x_{i}\mid C_{k})\) is the conditional probability of feature \(x_{i}\) given class \(C_{k}\). |
 Support vector machines (with linear kernel) (LSVM) | \(\widehat{y}=\operatorname{sign}(w\cdot x+b)\) | \(\widehat{y}\) is the predicted class label (+1 or −1). \(w\) is the vector of weights. \(x\) is the input feature vector. \(b\) is the bias term. \(\operatorname{sign}(z)\) is the sign function, which returns +1 if \(z>0\) and −1 otherwise. |
 Quadratic discriminant analysis (QDA) | \(\delta_{k}(x)=-\frac{1}{2}\log|\Sigma_{k}|-\frac{1}{2}(x-\mu_{k})^{T}\Sigma_{k}^{-1}(x-\mu_{k})+\log\pi_{k}\) | \(x\) is the input feature vector. \(\mu_{k}\) is the mean vector of class \(k\). \(\Sigma_{k}\) is the covariance matrix of class \(k\). \(\pi_{k}\) is the prior probability of class \(k\). \(|\Sigma_{k}|\) is the determinant of the covariance matrix \(\Sigma_{k}\). |
 Multi-layer perceptron (MLP) | \(\widehat{y}=\phi(W^{(L)}h^{(L-1)}+b^{(L)})\) | \(L\) is the total number of layers. \(W^{(L)}\) is the weight matrix of the output layer. \(h^{(L-1)}\) is the output of the last hidden layer. \(b^{(L)}\) is the bias of the output layer. \(\phi(\cdot)\) is the activation function. |
Non-parametric classifiers | | |
 K-nearest neighbors (KNN) | \(\widehat{y}=\operatorname{mode}\left(\{y_{i}\mid i\in\mathrm{KNN}(x)\}\right)\) | \(\widehat{y}\) is the predicted class label. \(\mathrm{KNN}(x)\) is the set of indices of the \(k\) nearest neighbors of \(x\). \(y_{i}\) are the class labels of the nearest neighbors. |
 Decision trees (DT) | \(\text{if }x_{1}\le\theta_{1}\text{ then}\) \(\quad\text{if }x_{2}\le\theta_{2}\text{ then }\widehat{y}=y_{1}\) \(\quad\text{else }\widehat{y}=y_{2}\) \(\text{else}\) \(\quad\text{if }x_{3}\le\theta_{3}\text{ then }\widehat{y}=y_{3}\) \(\quad\text{else }\widehat{y}=y_{4}\) | \(x_{i}\) are the feature values. \(\theta_{i}\) are the threshold values for the splits. \(\widehat{y}\) is the predicted class label. |
 Support vector machines (with non-linear kernel) (KSVM) | \(\widehat{y}=\operatorname{sign}\left(\sum_{i=1}^{n}\alpha_{i}y_{i}K(x_{i},x)+b\right)\) | \(x\) is the input feature vector. \(\alpha_{i}\) are the Lagrange multipliers (support vector coefficients). \(y_{i}\) are the class labels of the training examples (\(\pm 1\)). \(K(x_{i},x)\) is the kernel function. \(b\) is the bias term. |
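
To make the logistic-regression row concrete, the minimal NumPy sketch below evaluates \(\widehat{y}=\sigma(w\cdot x+b)\) for a single input. The weights, bias, and feature values are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the paper's fitted model.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid function: sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters (illustration only).
w = np.array([0.8, -1.2, 0.3])   # weight vector w
b = -0.5                         # bias term b
x = np.array([1.0, 0.2, 2.5])    # input feature vector x

# Predicted probability of the positive (churn) class: y_hat = sigma(w.x + b)
y_hat = sigmoid(np.dot(w, x) + b)
print(f"P(churn) = {y_hat:.3f}")  # sigma(1.31 - 0.5) = sigma(0.81) ~= 0.692
```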
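The LDA and QDA rows both classify by taking the argmax of a discriminant \(\delta_{k}(x)\) over the classes \(k\). A sketch of the QDA form (LDA is the special case where every class shares one \(\Sigma\)), with made-up means, covariances, and priors standing in for estimated parameters:

```python
import numpy as np

def qda_discriminant(x, mu_k, Sigma_k, pi_k):
    # delta_k(x) = -1/2 log|Sigma_k| - 1/2 (x - mu_k)^T Sigma_k^{-1} (x - mu_k) + log pi_k
    diff = x - mu_k
    _, logdet = np.linalg.slogdet(Sigma_k)
    return -0.5 * logdet - 0.5 * diff @ np.linalg.solve(Sigma_k, diff) + np.log(pi_k)

# Hypothetical two-class problem: (mean vector, covariance matrix, prior) per class.
x = np.array([1.0, 2.0])
classes = [
    (np.array([0.0, 0.0]), np.eye(2), 0.7),        # class 0
    (np.array([2.0, 2.0]), 2.0 * np.eye(2), 0.3),  # class 1
]
scores = [qda_discriminant(x, mu, Sigma, pi) for mu, Sigma, pi in classes]
y_hat = int(np.argmax(scores))  # predicted class = argmax_k delta_k(x)
print(scores, y_hat)
```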
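The KSVM decision rule can be written out the same way. The sketch below assumes a Gaussian RBF kernel and hypothetical support vectors, coefficients \(\alpha_{i}\), labels, and bias; it illustrates the formula, not a trained model from the paper.

```python
import numpy as np

def rbf_kernel(xi, x, gamma=0.5):
    # One common choice of non-linear kernel: K(x_i, x) = exp(-gamma * ||x_i - x||^2)
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def ksvm_predict(x, support_vectors, alpha, y_sv, b):
    # y_hat = sign( sum_i alpha_i * y_i * K(x_i, x) + b )
    score = sum(a * yi * rbf_kernel(xi, x)
                for a, yi, xi in zip(alpha, y_sv, support_vectors)) + b
    return 1 if score > 0 else -1

# Hypothetical support set: two support vectors with labels +1 and -1.
sv = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
alpha = [0.9, 0.4]
y_sv = [1, -1]
print(ksvm_predict(np.array([0.5, 0.5]), sv, alpha, y_sv, b=0.1))  # -> 1
```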
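The nine single classifiers in Table 4 also map one-to-one onto standard scikit-learn estimators. The sketch below is one plausible instantiation using library defaults; the table does not specify hyperparameters, so these settings are assumptions rather than the authors' configuration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC, SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# One estimator per row of Table 4. Hyperparameters are library defaults
# (plus a fixed random_state for reproducibility); the paper's settings may differ.
single_classifiers = {
    "LR":   LogisticRegression(max_iter=1000),
    "LDA":  LinearDiscriminantAnalysis(),
    "NB":   GaussianNB(),
    "LSVM": LinearSVC(),                       # sign(w.x + b)
    "QDA":  QuadraticDiscriminantAnalysis(),
    "MLP":  MLPClassifier(random_state=0),
    "KNN":  KNeighborsClassifier(n_neighbors=5),
    "DT":   DecisionTreeClassifier(random_state=0),
    "KSVM": SVC(kernel="rbf"),                 # non-linear (RBF) kernel
}

for name, clf in single_classifiers.items():
    print(name, "->", clf.__class__.__name__)
```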