Table 4. Performance of the new and Korean undiagnosed diabetes screening methods in the development and validation datasets.

From: Comparisons of the prediction models for undiagnosed diabetes between machine learning versus traditional statistical methods

 

| Model | Screening method | Features | AUC (95% CI) | Youden index | Sensitivity (%) | Specificity (%) | PPV | NPV | PLR | NLR |
|---|---|---|---|---|---|---|---|---|---|---|
| Training and internal validation set | | | | | | | | | | |
| Lee* + RHR | Risk score | Sex, Age, WC, RHR, Family history of diabetes, Hypertension status, Smoking status, Alcohol consumption | 0.756 (0.728 to 0.784) | 39 | 70 | 69 | 0.09 | 0.98 | 2.24 | 0.44 |
| Logistic Regression | Logistic Regression | | 0.799 (0.775 to 0.823) | 45.4 | 83.20 | 62.20 | 0.09 | 0.99 | 2.21 | 0.27 |
| Random Forest | Random Forest Classifier | | 0.794 (0.770 to 0.818) | 48.3 | 86.60 | 61.70 | 0.09 | 0.99 | 2.3 | 0.22 |
| LGBM | LightGBM Classifier | | 0.802 (0.778 to 0.826) | 45.1 | 83.50 | 61.60 | 0.09 | 0.99 | 2.17 | 0.27 |
| XGB | XGBoost Classifier | | 0.796 (0.772 to 0.820) | 44.9 | 81.40 | 63.50 | 0.09 | 0.99 | 2.35 | 0.23 |
| Ada | AdaBoost Classifier | | 0.796 (0.772 to 0.820) | 44.3 | 80.80 | 63.50 | 0.09 | 0.99 | 2.21 | 0.3 |
| External validation set | | | | | | | | | | |
| Lee* + RHR | Risk score | Sex, Age, WC, RHR, Family history of diabetes, Hypertension status, Smoking status, Alcohol consumption | 0.765 (0.738 to 0.792) | 42 | 78 | 64 | 0.11 | 0.98 | 2.17 | 0.35 |
| Logistic Regression | Logistic Regression | | 0.808 (0.793 to 0.823) | 48.7 | 88.70 | 59.90 | 0.11 | 0.99 | 2.21 | 0.18 |
| Random Forest | Random Forest Classifier | | 0.807 (0.792 to 0.822) | 47.6 | 83.50 | 64.03 | 0.11 | 0.98 | 2.32 | 0.26 |
| LGBM | LightGBM Classifier | | 0.811 (0.796 to 0.826) | 48.3 | 84.00 | 64.30 | 0.11 | 0.99 | 2.35 | 0.25 |
| XGB | XGBoost Classifier | | 0.810 (0.795 to 0.825) | 48 | 85.20 | 63.00 | 0.11 | 0.99 | 2.29 | 0.23 |
| Ada | AdaBoost Classifier | | 0.800 (0.784 to 0.816) | 46.3 | 84.50 | 61.80 | 0.11 | 0.99 | 2.21 | 0.25 |

*Lee et al. 2012 [5] and Park et al. 2022 [6]. When the performance of the Lee model + RHR (Park et al., 2022) was tested, data from 2019 and 2020 were used to build the prediction model, and data from 2014, 2015, 2016, 2017, and 2018 were used for validation. WC: waist circumference; RHR: resting heart rate; LGBM: Light Gradient Boosting Machine; XGB: Extreme Gradient Boosting; Ada: AdaBoost; AUC: area under the receiver operating characteristic curve; CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value; PLR: positive likelihood ratio; NLR: negative likelihood ratio. For this study, five different machine learning classification algorithms were used to predict undiagnosed diabetes; based on their performance as assessed by AUC, results from the best-performing machine learning classifier were used.
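Several columns in Table 4 follow from standard definitions: Youden index J = sensitivity + specificity - 1 (reported here multiplied by 100), PLR = sensitivity / (1 - specificity), and NLR = (1 - sensitivity) / specificity. PPV and NPV additionally depend on the prevalence of undiagnosed diabetes, which this table does not report. The Python sketch below illustrates these textbook formulas; it is not the authors' code. It also includes a rank-based (Mann-Whitney) AUC and a brute-force search for the Youden-optimal cutoff. The function names, the "score >= cutoff means screen-positive" convention, and the toy scores are all hypothetical.

```python
"""Illustrative sketch of the screening metrics reported in Table 4.

Assumptions (not from the source): function names are hypothetical, and a
participant is screen-positive when score >= cutoff.
"""


def derived_metrics(sensitivity_pct: float, specificity_pct: float) -> dict:
    """Youden index (x100), PLR and NLR from sensitivity/specificity given in percent."""
    sens = sensitivity_pct / 100.0
    spec = specificity_pct / 100.0
    return {
        "youden": round((sens + spec - 1.0) * 100.0, 1),  # J = sens + spec - 1
        "plr": round(sens / (1.0 - spec), 2),  # true-positive rate / false-positive rate
        "nlr": round((1.0 - sens) / spec, 2),  # false-negative rate / true-negative rate
    }


def predictive_values(sensitivity_pct: float, specificity_pct: float,
                      prevalence: float) -> tuple:
    """PPV and NPV via Bayes' rule; prevalence is a fraction between 0 and 1."""
    sens = sensitivity_pct / 100.0
    spec = specificity_pct / 100.0
    tp = sens * prevalence                  # expected true-positive fraction
    fp = (1.0 - spec) * (1.0 - prevalence)  # expected false-positive fraction
    tn = spec * (1.0 - prevalence)          # expected true-negative fraction
    fn = (1.0 - sens) * prevalence          # expected false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


def auc(scores: list, labels: list) -> float:
    """Rank-based AUC: P(random case outscores random non-case); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: list, labels: list) -> tuple:
    """Try every observed score as a cutoff; return (cutoff, J) with the largest J."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict '>' keeps the lowest cutoff when J ties
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


if __name__ == "__main__":
    # External-validation LGBM row of Table 4: sensitivity 84.00 %, specificity 64.30 %
    print(derived_metrics(84.0, 64.3))
    # Hypothetical toy scores, for illustration only
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    toy_labels = [0, 0, 1, 1]
    print(auc(toy_scores, toy_labels), youden_cutoff(toy_scores, toy_labels))
```

For example, `derived_metrics(84.0, 64.3)` reproduces the external-validation LGBM row: Youden index 48.3, PLR 2.35, and NLR 0.25. `predictive_values` shows why PPV stays low while NPV stays near 0.99 in every row. When prevalence is small, even a modest false-positive rate produces many more false positives than true positives. Selecting the maximum of the AUC column picks LGBM in both datasets (0.802 and 0.811).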