Table 4. Performance of the new and Korean undiagnosed diabetes screening methods in the development and validation datasets.

Dataset	Model	Screening method	Features	AUC (95% CI)	Youden index	Sensitivity (%)	Specificity (%)	PPV	NPV	PLR	NLR
Training and internal validation set	Lee* + RHR	Risk score	Sex, Age, WC, RHR, Family history of diabetes, Hypertension status, Smoking status, Alcohol consumption	0.756 (0.728 to 0.784)	39	70	69	0.09	0.98	2.24	0.44
	Logistic Regression	Logistic Regression		0.799 (0.775 to 0.823)	45.4	83.20	62.20	0.09	0.99	2.21	0.27
	Random Forest	Random Forest Classifier		0.794 (0.770 to 0.818)	48.3	86.60	61.70	0.09	0.99	2.3	0.22
	LGBM	LightGBM Classifier		0.802 (0.778 to 0.826)	45.1	83.50	61.60	0.09	0.99	2.17	0.27
	XGB	XGBoost Classifier		0.796 (0.772 to 0.820)	44.9	81.40	63.50	0.09	0.99	2.35	0.23
	Ada	AdaBoost Classifier		0.796 (0.772 to 0.820)	44.3	80.80	63.50	0.09	0.99	2.21	0.3
External validation set	Lee* + RHR	Risk score	Sex, Age, WC, RHR, Family history of diabetes, Hypertension status, Smoking status, Alcohol consumption	0.765 (0.738 to 0.792)	42	78	64	0.11	0.98	2.17	0.35
	Logistic Regression	Logistic Regression		0.808 (0.793 to 0.823)	48.7	88.70	59.90	0.11	0.99	2.21	0.18
	Random Forest	Random Forest Classifier		0.807 (0.792 to 0.822)	47.6	83.50	64.03	0.11	0.98	2.32	0.26
	LGBM	LightGBM Classifier		0.811 (0.796 to 0.826)	48.3	84.00	64.30	0.11	0.99	2.35	0.25
	XGB	XGBoost Classifier		0.810 (0.795 to 0.825)	48	85.20	63.00	0.11	0.99	2.29	0.23
	Ada	AdaBoost Classifier		0.800 (0.784 to 0.816)	46.3	84.50	61.80	0.11	0.99	2.21	0.25

*Lee et al. 2012⁵ and Park et al. 2022⁶. To test the performance of the Lee model + RHR (Park et al., 2022), data from 2019 and 2020 were used to build the prediction model, and data from 2014 to 2018 were used to validate it. WC: waist circumference; RHR: resting heart rate; LGBM: Light Gradient Boosting Machine; XGB: Extreme Gradient Boosting; Ada: AdaBoost; AUC: area under the receiver operating characteristic curve; CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value; PLR: positive likelihood ratio; NLR: negative likelihood ratio. In this study, five machine learning classification algorithms were used to predict undiagnosed diabetes. Based on performance assessed by AUC, results from the best-performing algorithm were used.
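
The Youden index, PLR, and NLR columns are linked to the sensitivity and specificity columns by standard definitions. The following is a minimal Python sketch of those definitions, not the authors' code; the function name `derived_metrics` is hypothetical. Because it starts from the rounded percentages printed in the table, its output may differ from the tabulated PLR and NLR in the last digit.

```python
def derived_metrics(sensitivity_pct, specificity_pct):
    """Return (Youden index in %, PLR, NLR) from sensitivity and specificity in %.

    Standard definitions (assumed to match the table's columns):
      Youden index = sensitivity + specificity - 100
      PLR = sensitivity / (1 - specificity)
      NLR = (1 - sensitivity) / specificity
    """
    if not (0 < specificity_pct < 100):
        # PLR is undefined at 100% specificity, NLR at 0% specificity.
        raise ValueError("specificity must be strictly between 0 and 100")

    sens = sensitivity_pct / 100.0
    spec = specificity_pct / 100.0

    youden = sensitivity_pct + specificity_pct - 100.0
    plr = sens / (1.0 - spec)
    nlr = (1.0 - sens) / spec
    return round(youden, 1), round(plr, 2), round(nlr, 2)


# Logistic regression, internal validation row: sensitivity 83.20, specificity 62.20
youden, plr, nlr = derived_metrics(83.20, 62.20)
print(youden, plr, nlr)
```

For that row the table reports a Youden index of 45.4, a PLR of 2.21, and an NLR of 0.27, so the sketch can be used as a quick consistency check on the other rows.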
