Table 4 Single-classifier representations.

From: Mitigating class imbalance in churn prediction with ensemble methods and SMOTE

Classifier

Representation

Description

Parametric classifiers

 Logistic regression (LR)

\(\hat{y} = \sigma(w \cdot x + b)\)

\(\hat{y}\) is the predicted probability of the positive class.

\(\sigma(z)\) is the sigmoid function, defined as \(\sigma(z) = \frac{1}{1 + e^{-z}}\).

\(w\) is the vector of weights.

\(x\) is the input feature vector.

\(b\) is the bias term.
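A minimal NumPy sketch of this rule, with made-up weights, bias, and feature values for illustration:

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z}) maps the linear score to (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.4, -1.2, 0.7])  # made-up weights
b = 0.1                         # made-up bias
x = np.array([1.0, 0.5, 2.0])   # made-up feature vector

y_hat = sigmoid(np.dot(w, x) + b)  # predicted probability of the positive class
```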

 Linear discriminant analysis (LDA)

\(\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k\)

\(x\) is the input feature vector.

\(\mu_k\) is the mean vector of class \(k\).

\(\Sigma\) is the common covariance matrix.

\(\pi_k\) is the prior probability of class \(k\).
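A sketch of the discriminant for two toy classes sharing one covariance matrix (all values made up); the predicted class is the one with the largest \(\delta_k(x)\):

```python
import numpy as np

def lda_score(x, mu_k, Sigma_inv, pi_k):
    # delta_k(x) = x^T Sigma^{-1} mu_k - (1/2) mu_k^T Sigma^{-1} mu_k + log(pi_k)
    return x @ Sigma_inv @ mu_k - 0.5 * mu_k @ Sigma_inv @ mu_k + np.log(pi_k)

Sigma_inv = np.linalg.inv(np.array([[1.0, 0.2], [0.2, 1.0]]))  # shared covariance
means = [np.array([0.0, 0.0]), np.array([1.0, 2.0])]
priors = [0.7, 0.3]

x = np.array([0.5, 1.5])
scores = [lda_score(x, mu, Sigma_inv, pi) for mu, pi in zip(means, priors)]
y_hat = int(np.argmax(scores))
```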

 Naive Bayes (NB)

\(\hat{y} = \operatorname{argmax}_k \left( \log P(C_k) + \sum_{i=1}^{n} \log P(x_i \mid C_k) \right)\)

\(\hat{y}\) is the predicted class label.

\(P(C_k)\) is the prior probability of class \(C_k\).

\(P(x_i \mid C_k)\) is the conditional probability of feature \(x_i\) given class \(C_k\).
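In log space the rule is a sum followed by an argmax; a toy sketch with made-up priors and likelihoods for two classes and three features:

```python
import numpy as np

priors = np.array([0.6, 0.4])              # P(C_k), made up
likelihoods = np.array([[0.5, 0.2, 0.3],   # P(x_i | C_0), made up
                        [0.1, 0.6, 0.3]])  # P(x_i | C_1), made up

# log P(C_k) + sum_i log P(x_i | C_k), then argmax over classes.
log_posterior = np.log(priors) + np.log(likelihoods).sum(axis=1)
y_hat = int(np.argmax(log_posterior))
```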

 Support vector machines (with linear kernel) (LSVM)

\(\hat{y} = \operatorname{sign}(w \cdot x + b)\)

\(\hat{y}\) is the predicted class label (+1 or −1).

\(w\) is the vector of weights.

\(x\) is the input feature vector.

\(b\) is the bias term.

\(\operatorname{sign}(z)\) is the sign function, which returns +1 if \(z > 0\) and −1 otherwise.
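The same decision rule in a few lines, again with made-up parameters:

```python
import numpy as np

w = np.array([0.8, -0.3])  # made-up weights
b = -0.2                   # made-up bias
x = np.array([1.0, 2.0])   # made-up feature vector

z = np.dot(w, x) + b
y_hat = 1 if z > 0 else -1  # sign(z): +1 if z > 0, -1 otherwise
```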

 Quadratic discriminant analysis (QDA)

\(\delta_k(x) = -\frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k\)

\(x\) is the input feature vector.

\(\mu_k\) is the mean vector of class \(k\).

\(\Sigma_k\) is the covariance matrix of class \(k\).

\(\pi_k\) is the prior probability of class \(k\).

\(|\Sigma_k|\) is the determinant of the covariance matrix \(\Sigma_k\).
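A sketch of the QDA discriminant; unlike LDA, each class carries its own covariance matrix (toy values throughout):

```python
import numpy as np

def qda_score(x, mu_k, Sigma_k, pi_k):
    # delta_k(x) = -(1/2) log|Sigma_k| - (1/2) (x - mu_k)^T Sigma_k^{-1} (x - mu_k) + log(pi_k)
    diff = x - mu_k
    return (-0.5 * np.log(np.linalg.det(Sigma_k))
            - 0.5 * diff @ np.linalg.inv(Sigma_k) @ diff
            + np.log(pi_k))

x = np.array([0.5, 1.5])
classes = [(np.array([0.0, 0.0]), np.eye(2), 0.7),  # (mu_k, Sigma_k, pi_k), made up
           (np.array([1.0, 2.0]), np.array([[2.0, 0.3],
                                            [0.3, 1.0]]), 0.3)]
scores = [qda_score(x, mu, Sigma, pi) for mu, Sigma, pi in classes]
y_hat = int(np.argmax(scores))
```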

 Multi-layer perceptron (MLP)

\(\hat{y} = \phi(W^{(L)} h^{(L-1)} + b^{(L)})\)

\(L\) is the total number of layers.

\(W^{(L)}\) is the weight matrix of layer \(L\).

\(h^{(L-1)}\) is the output of hidden layer \(L-1\).

\(b^{(L)}\) is the bias vector of layer \(L\).

\(\phi(\cdot)\) is the activation function.
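A minimal forward pass for a two-layer network; the ReLU hidden activation, sigmoid output, and random weights are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)  # a common hidden-layer activation

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # made-up layer 1: 3 inputs -> 4 hidden
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # made-up layer 2: 4 hidden -> 1 output

x = np.array([1.0, 0.5, -0.2])
h = relu(W1 @ x + b1)         # h^(1): output of the hidden layer
y_hat = sigmoid(W2 @ h + b2)  # phi(W^(L) h^(L-1) + b^(L)) at the output layer
```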

Non-parametric classifiers

 K-nearest neighbors (KNN)

\(\hat{y} = \operatorname{mode}\left( \{ y_i \mid i \in \mathrm{KNN}(x) \} \right)\)

\(\hat{y}\) is the predicted class label.

\(\mathrm{KNN}(x)\) is the set of indices of the \(k\) nearest neighbors of \(x\).

\(y_i\) are the class labels of the nearest neighbors.

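A small sketch of the majority-vote rule; Euclidean distance is an assumption here, since the table does not fix a metric:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # KNN(x): indices of the k nearest training points by Euclidean distance.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # mode: majority vote over the neighbors' labels.
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])  # made-up data
y_train = np.array([0, 0, 1, 1])
y_hat = knn_predict(X_train, y_train, np.array([0.9, 1.0]))
```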
 Decision trees (DT)

\(\text{if } x_1 \le \theta_1 \text{ then}\)

\(\quad \text{if } x_2 \le \theta_2 \text{ then } \hat{y} = y_1\)

\(\quad \text{else } \hat{y} = y_2\)

\(\text{else}\)

\(\quad \text{if } x_3 \le \theta_3 \text{ then } \hat{y} = y_3\)

\(\quad \text{else } \hat{y} = y_4\)

\(x_i\) are the feature values.

\(\theta_i\) are the threshold values for the splits.

\(\hat{y}\) is the predicted class label.
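The pseudocode above as a runnable sketch; the thresholds \(\theta_i\) and leaf labels are made up:

```python
def tree_predict(x1, x2, x3):
    # Hard-coded two-level tree mirroring the pseudocode above.
    theta1, theta2, theta3 = 0.5, 1.0, 2.0  # made-up split thresholds
    if x1 <= theta1:
        return "y1" if x2 <= theta2 else "y2"
    return "y3" if x3 <= theta3 else "y4"

y_hat = tree_predict(0.3, 1.4, 0.0)  # follows the left branch, returns "y2"
```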

 Support vector machines (with non-linear kernel) (KSVM)

\(\hat{y} = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right)\)

\(x\) is the input feature vector.

\(\alpha_i\) are the Lagrange multipliers (support vector coefficients).

\(y_i\) are the class labels of the training examples (±1).

\(K(x_i, x)\) is the kernel function.

\(b\) is the bias term.
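A sketch of the kernel decision rule; the RBF kernel is an illustrative choice (the table leaves \(K\) unspecified), and the support vectors, multipliers, and bias are made up:

```python
import numpy as np

def rbf_kernel(xi, x, gamma=0.5):
    # K(x_i, x) = exp(-gamma * ||x_i - x||^2), one common non-linear kernel.
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def ksvm_predict(X_sv, y_sv, alpha, b, x):
    # sign( sum_i alpha_i * y_i * K(x_i, x) + b )
    z = sum(a * y * rbf_kernel(xi, x) for a, y, xi in zip(alpha, y_sv, X_sv)) + b
    return 1 if z > 0 else -1

X_sv = np.array([[0.0, 0.0], [1.0, 1.0]])  # made-up support vectors
y_sv = np.array([-1, 1])
alpha = np.array([0.5, 0.5])
y_hat = ksvm_predict(X_sv, y_sv, alpha, b=0.0, x=np.array([0.9, 0.8]))
```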