Table 3 Comparison of models (n = 212).

Methods	Three mainstream variable encoding methods			Four data imbalance processing methods				Four ML models				Four XGB models
Methods	WOE	One-Hot	CE	NearMiss	ENN	SMOTE	FL	XGB	SVM	Random Forest	Logistic Regression	XGB	XGB + FC	XGB + FL	XGB + FL + FC
Accuracy	0.967	0.958	0.967	0.967	0.962	0.965	0.967	0.967	0.972	0.962	0.953	0.972	0.976	0.920	0.967
F1	0.851	0.816	0.851	0.796	0.811	0.824	0.851	0.851	0.856	0.810	0.770	0.850	0.878	0.679	0.851
Recall	0.870	0.870	0.870	0.786	0.812	0.835	0.870	0.870	0.820	0.780	0.740	0.739	0.783	0.783	0.870
AUC	0.994	0.990	0.990	0.907	0.932	0.945	0.994	0.994	0.952	0.993	0.972	0.985	0.993	0.976	0.994

Note: AUC, area under the ROC curve; ENN, edited nearest neighbor; FC, feature construction; FL, focal loss; LR, logistic regression; ML, machine learning; ROC, receiver operating characteristic; RFC, random forest classifier; SMOTE, synthetic minority oversampling technique; SVM, support vector machine; WOE, weight of evidence; XGB, extreme gradient boosting (XGBoost).
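
The four metric rows of Table 3 can be computed from a binary classifier's outputs roughly as follows. This is a minimal pure-Python sketch, assuming 0/1 labels with 1 as the positive class and a 0.5 decision threshold for the threshold-dependent metrics (accuracy, recall, F1); the threshold and positive-class definition behind Table 3 are not stated, and all function names here are hypothetical. AUC is computed from the raw scores using its rank (Mann-Whitney) interpretation, so it does not depend on any threshold.

```python
# Minimal sketch: the Table 3 metrics for a binary classifier.
# Assumptions: labels are 0/1 with 1 = positive class; hard predictions for
# accuracy/recall/F1 come from thresholding scores at a hypothetical cut-off.
from typing import List, Sequence, Tuple


def threshold(scores: Sequence[float], cut: float = 0.5) -> List[int]:
    """Turn predicted probabilities into hard 0/1 labels."""
    return [1 if s >= cut else 0 for s in scores]


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred) -> float:
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def recall(y_true, y_pred) -> float:
    tp, _, fn, _ = confusion(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def f1(y_true, y_pred) -> float:
    # F1 = 2PR / (P + R), equivalently 2TP / (2TP + FP + FN)
    tp, fp, fn, _ = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores) -> float:
    # AUC = probability that a random positive scores higher than a random
    # negative; ties count as half (Mann-Whitney U formulation).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y = [1, 0, 1, 1, 0, 0, 0, 1]
s = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.3, 0.7]
pred = threshold(s)
print(accuracy(y, pred), recall(y, pred), f1(y, pred), roc_auc(y, s))
# → 0.75 0.75 0.75 0.9375
```

Because AUC ranks scores rather than thresholding them, a model can combine a high AUC with a more modest recall; the Random Forest column (AUC 0.993, recall 0.780) is one instance in Table 3.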
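
Weight of evidence (WOE), one of the three encoding methods compared in Table 3, replaces each level of a categorical variable with the log ratio of that level's share of non-events to its share of events. The sketch below follows this common credit-scoring sign convention (some references use the reciprocal) and adds an additive smoothing term `eps`; the smoothing, the helper names `woe_table` and `woe_encode`, and the toy data are assumptions for illustration, not details taken from the table.

```python
# Minimal sketch of weight-of-evidence (WOE) encoding for one categorical
# variable. Sign convention (assumption): WOE = ln(share of non-events /
# share of events); some references use the reciprocal.
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def woe_table(categories: Sequence[Hashable], y: Sequence[int],
              eps: float = 0.5) -> Dict[Hashable, float]:
    """Map each category level to its WOE value (hypothetical helper).

    eps is additive smoothing (an assumption) so that a level with zero
    events or zero non-events does not produce log(0) or division by zero.
    """
    events = Counter(c for c, t in zip(categories, y) if t == 1)
    non_events = Counter(c for c, t in zip(categories, y) if t == 0)
    levels = set(categories)
    k = len(levels)
    n_events = sum(events.values())
    n_non_events = sum(non_events.values())
    table = {}
    for level in levels:
        share_events = (events[level] + eps) / (n_events + eps * k)
        share_non_events = (non_events[level] + eps) / (n_non_events + eps * k)
        table[level] = math.log(share_non_events / share_events)
    return table


def woe_encode(categories: Sequence[Hashable],
               table: Dict[Hashable, float]) -> List[float]:
    """Replace each raw category with its WOE value."""
    return [table[c] for c in categories]


x = ["a", "a", "a", "b", "b", "b"]
y = [1, 1, 0, 0, 0, 1]
w = woe_table(x, y)
# Level "a" is event-heavy, so its WOE is negative; "b" gets a positive WOE.
print(w, woe_encode(x, w))
```

Since WOE is computed from the outcome labels, the table would normally be fitted on training data only and then applied unchanged to test data.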
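
Focal loss (FL), listed in Table 3 both as an imbalance-handling method and in combination with XGB, reweights the log loss as FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The sketch below only evaluates the mean loss; the values of γ and α, and how the loss was plugged into XGBoost (a custom objective would also need its gradient and Hessian), are not given in the table and are assumptions here. The function name `focal_loss` is hypothetical.

```python
# Minimal sketch of binary focal loss:
#   FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)
# The gamma and alpha values used below are illustrative assumptions.
import math
from typing import Optional, Sequence


def focal_loss(y_true: Sequence[int], p: Sequence[float],
               gamma: float = 2.0, alpha: Optional[float] = None,
               eps: float = 1e-12) -> float:
    """Mean focal loss over samples; p is the predicted P(y = 1)."""
    total = 0.0
    for t, prob in zip(y_true, p):
        prob = min(max(prob, eps), 1.0 - eps)   # keep log() finite
        p_t = prob if t == 1 else 1.0 - prob    # probability of the true class
        if alpha is None:
            a_t = 1.0                           # no class weighting
        else:
            a_t = alpha if t == 1 else 1.0 - alpha
        # (1 - p_t)**gamma shrinks the loss of well-classified samples
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


y = [1, 0, 1]
p = [0.9, 0.2, 0.3]
bce = focal_loss(y, p, gamma=0.0)   # gamma = 0, no alpha: plain log loss
fl = focal_loss(y, p, gamma=2.0)    # easy samples contribute far less
print(bce, fl)
```

Setting γ = 0 with no α recovers ordinary log loss, which is a quick sanity check; larger γ shifts the loss toward hard, misclassified samples, which is the motivation for using it on imbalanced data.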
