Table 3 Performance of XGBoost, CatBoost, H2O Ensemble and logistic regression models in determining fasting status in the testing dataset (n = 70,644).

From: Application of machine learning methods for the prediction of true fasting status in patients performing blood tests

| Algorithm/modeling strategy | Feature | Sensitivity | Specificity | Precision | F1-score | Accuracy | AUC |
|---|---|---|---|---|---|---|---|
| Parsimonious modeling | | | | | | | |
| Logistic regression | Model 2* | 0.7608 | 0.8084 | 0.8081 | 0.7804 | 0.7845 | 0.868 (0.865–0.870) |
| XGBoost | Model 2* | 0.8261 | 0.7700 | 0.7844 | 0.8047 | 0.7982 | 0.887 (0.885–0.890) |
| CatBoost | Model 2* | 0.8415 | 0.7614 | 0.7813 | 0.8103 | 0.8017 | 0.889 (0.887–0.892) |
| H2O Ensemble | Model 2* | 0.8823 | 0.7093 | 0.7546 | 0.8135 | 0.7964 | 0.886 (0.884–0.889) |
| Full modeling | | | | | | | |
| XGBoost | 67 | 0.8394 | 0.7785 | 0.7934 | 0.8158 | 0.8092 | 0.896 (0.894–0.898) |
| CatBoost | 67 | 0.8511 | 0.7574 | 0.7805 | 0.8142 | 0.8046 | 0.892 (0.890–0.894) |
| H2O Ensemble | 67 | 0.8770 | 0.7399 | 0.7735 | 0.8220 | 0.8089 | 0.897 (0.894–0.899) |
| Feature selection modeling | | | | | | | |
| XGBoost | Top 45 | 0.8369 | 0.7789 | 0.7932 | 0.8145 | 0.8081 | 0.895 (0.892–0.897) |
| XGBoost | Top 35 | 0.8413 | 0.7735 | 0.7901 | 0.8149 | 0.8076 | 0.894 (0.892–0.897) |
| XGBoost | Top 25 | 0.8414 | 0.7706 | 0.7880 | 0.8138 | 0.8062 | 0.893 (0.891–0.896) |
| XGBoost | Top 10 | 0.8502 | 0.7496 | 0.7748 | 0.8108 | 0.8002 | 0.887 (0.885–0.890) |

*Model 2 includes the following features: glucose, age, male sex, time of day, interval between request and sampling, number of outpatient visits, distance from home to hospital, division, hypertension, diabetes, coronary artery disease, stroke, statin use, and concomitant lipid testing, as in Table 2.
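
For readers less familiar with the threshold-based metrics reported in Table 3, the following minimal sketch shows how sensitivity, specificity, precision, F1-score, and accuracy are conventionally derived from a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' actual computation is not described in this table.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive cases predicted as positive/negative;
    tn/fp: negative cases predicted as negative/positive.
    """
    sensitivity = tp / (tp + fn)          # recall: share of true positives found
    specificity = tn / (tn + fp)          # share of true negatives found
    precision = tp / (tp + fp)            # share of positive predictions that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = classification_metrics(tp=80, fp=20, tn=70, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AUC is not included here because it is computed from the ranked predicted probabilities across all thresholds rather than from a single confusion matrix.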