Fig. 1 | Scientific Reports

Fig. 1

From: Development and validation of a machine learning approach for screening new leprosy cases based on the leprosy suspicion questionnaire

Design of the machine learning algorithms. Records with missing data were dropped from the dataset. A data augmentation process was implemented by combining questions. The dataset was split into training and test sets in an 80:20 proportion. Based on the Φ coefficient (see the supplementary material for its definition), highly associated variables were dropped from the training set, and the same variables were then dropped from the test set. SMOTE was applied to a copy of the training set, and only the variables accepted by Boruta were kept in both the training and test sets. Four classifiers, Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF) and XGBoost (XGB), had their hyperparameters optimized (HyperOp) within a pipeline that included SMOTE. The trained models were then used to classify the test set, and their predictions were compared with the true values to generate several metrics used to compare performance.
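The split and Φ-based filtering steps in the caption can be illustrated with a minimal sketch using only the Python standard library. It uses the standard Φ coefficient for two binary variables, Φ = (n11·n00 − n10·n01) / √(n1•·n0•·n•1·n•0); the paper's own definition is in its supplementary material. The function names, the 0.8 threshold, and the rule of dropping the later variable of each associated pair are assumptions made for illustration, not the authors' settings. SMOTE, Boruta, and hyperparameter optimization are not shown.

```python
import math
import random


def phi_coefficient(x, y):
    """Phi coefficient between two equally long sequences of 0/1 values."""
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant column makes the denominator zero; treat it as no association.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test), 80:20 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def columns_to_drop(train_rows, names, threshold=0.8):
    """Hypothetical rule: for each pair with |phi| >= threshold, drop the later column.

    Associations are computed on the training rows only.
    """
    dropped = []
    for i, a in enumerate(names):
        if a in dropped:
            continue
        col_a = [row[a] for row in train_rows]
        for b in names[i + 1:]:
            if b in dropped:
                continue
            col_b = [row[b] for row in train_rows]
            if abs(phi_coefficient(col_a, col_b)) >= threshold:
                dropped.append(b)
    return dropped


def drop_columns(rows, dropped):
    """Remove the given columns from every row (used for both train and test)."""
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]
```

In this sketch, the list returned by `columns_to_drop` would be computed from the training rows and then passed to `drop_columns` for both the training and the test rows, which mirrors the order of operations described in the caption.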