Table 3 Performance of the new machine learning screening models and the existing Korean undiagnosed diabetes screening score in the development and validation datasets.

	Model	Screening method	Features	AUC (95% CI)	Youden index	Sensitivity (%)	Specificity (%)	PPV	NPV	PLR	NLR
Train & Internal Validation Set	Lee model*	Risk score	Sex, Age, WC, Family history of diabetes, Hypertension status, Smoking status, Alcohol consumption	0.750 (0.722 to 0.778)	36	86	51	0.07	0.99	1.74	0.28
	Logistic Regression	Logistic Regression		0.786 (0.761 to 0.811)	42.1	89.50	52.60	0.08	0.99	1.88	0.20
	Random Forest	Random Forest Classifier		0.781 (0.756 to 0.806)	43.5	82.70	60.80	0.08	0.98	2021	0.22
	LGBM	LightGBM Classifier		0.777 (0.751 to 0.803)	42.4	80.80	61.50	0.08	0.98	2.26	0.21
	XGB	XGBoost Classifier		0.786 (0.761 to 0.811)	42.7	82.80	61.20	0.08	0.98	2.31	0.18
	Ada	AdaBoost Classifier		0.785 (0.760 to 0.810)	42.4	80.30	62.10	0.08	0.99	2.12	0.32
External Validation Set	Lee model	Risk score	Sex, Age, WC, Family history of diabetes, Hypertension status, Smoking status, Alcohol consumption	0.759 (0.741 to 0.777)	36	90	46	0.08	0.99	1.67	0.21
	Logistic Regression	Logistic Regression		0.801 (0.786 to 0.816)	46.4	86.40	60.00	0.10	0.99	2.16	0.23
	Random Forest	Random Forest Classifier		0.792 (0.776 to 0.808)	46.1	83.00	63.10	0.11	0.99	2.25	0.27
	LGBM	LightGBM Classifier		0.795 (0.779 to 0.811)	45.8	81.90	64.00	0.11	0.98	2.27	0.28
	XGB	XGBoost Classifier		0.802 (0.787 to 0.817)	44.4	90.00	54.50	0.10	0.99	1.98	0.18
	Ada	AdaBoost Classifier		0.784 (0.768 to 0.800)	42.4	82.90	59.50	0.10	0.99	2.05	0.29

*Lee et al., 2012⁵. When the performance of the Lee model was tested, data from 2019 and 2020 were used to build the prediction model, and data from 2014 to 2018 were used for validation. WC: waist circumference; RHR: resting heart rate; LGBM: Light Gradient Boosting Machine; XGB: Extreme Gradient Boosting; Ada: AdaBoost; AUC: area under the receiver operating characteristic curve; CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value; PLR: positive likelihood ratio; NLR: negative likelihood ratio.
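The screening metrics in Table 3 all follow from a 2x2 confusion matrix at a chosen cutoff. The sketch below is a minimal illustration under stated assumptions: the `Confusion` class, the function names, and the counts in the usage lines are hypothetical and are not taken from the study code. Its Youden index is expressed in percentage points, which is consistent with the table (for logistic regression in the development set, 89.5 + 52.6 - 100 = 42.1). The cutoff used for the machine learning models is not stated in the table, so `youden_threshold` only shows one common choice, the threshold that maximizes sensitivity + specificity - 1, as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 counts at one screening cutoff."""
    tp: int  # undiagnosed diabetes, screened positive
    fp: int  # no diabetes, screened positive
    fn: int  # undiagnosed diabetes, screened negative
    tn: int  # no diabetes, screened negative


def screening_metrics(c: Confusion) -> dict:
    """Compute the Table 3 screening metrics from confusion-matrix counts."""
    sens = c.tp / (c.tp + c.fn)
    spec = c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        # Likelihood ratios; PLR is unbounded when specificity equals 1
        "plr": sens / (1 - spec) if spec < 1 else float("inf"),
        "nlr": (1 - sens) / spec,
        # Youden index in percentage points, matching the Table 3 convention
        "youden": 100 * (sens + spec - 1),
    }


def youden_threshold(scores, labels):
    """Assumed cutoff rule: the threshold maximizing sensitivity + specificity - 1.

    A subject is screened positive when score >= threshold; labels are 1 (case) or 0.
    Returns (threshold, Youden J as a fraction); the first best threshold wins ties.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    m = screening_metrics(Confusion(tp=90, fp=400, fn=10, tn=500))
    print({k: round(v, 3) for k, v in m.items()})
    print(youden_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of undiagnosed diabetes in the sample, which is why a low PPV can coexist with a high NPV when cases are rare.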
For this study, five different machine learning classification algorithms were used to predict undiagnosed diabetes. Based on their performance as assessed by AUC, the results from the best-performing classifier were used.
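Selecting the best classifier by AUC can be illustrated with a rank-based AUC, which equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, with ties counted as one half. The sketch below assumes each model's predicted probabilities are already available; the function names, model names, and toy scores are hypothetical. The method behind the 95% CIs is not specified in the table, so `hanley_mcneil_ci` uses the Hanley and McNeil standard-error approximation purely as an assumed example.

```python
import math


def auc_rank(scores, labels):
    """Rank-based (Mann-Whitney) AUC: P(case score > non-case score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC via the Hanley-McNeil standard error (assumed method)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))
    # Clip to the valid AUC range [0, 1]
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


def pick_best_by_auc(model_scores, labels):
    """Return the model with the highest AUC and all AUCs (first model wins on ties)."""
    aucs = {name: auc_rank(s, labels) for name, s in model_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = {  # hypothetical predicted probabilities
        "model_a": [0.1, 0.4, 0.35, 0.8],
        "model_b": [0.2, 0.3, 0.6, 0.9],
    }
    best, aucs = pick_best_by_auc(scores, labels)
    print(best, aucs)
    print(hanley_mcneil_ci(aucs[best], n_pos=2, n_neg=2))
```

The pairwise loop costs O(n_pos x n_neg) and is written for clarity; a sort-based rank computation, or a library routine such as scikit-learn's `roc_auc_score`, scales better to large survey datasets.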
