Fig. 4: Benchmarking of metabolic subphenotyping prediction from features extracted from OGTT glucose time series versus existing surrogate markers.

Bar graphs show the mean auROC of the best-performing model for each metabolic subphenotype and feature set; error bars show the standard deviation of the measured auROC. In total, nine feature sets including demographics were evaluated for each metabolic subphenotype: two sets of features derived from the OGTT glucose curve (OGTT_G_Features and OGTT_G_ReducedRep), the T2D polygenic risk score (PRS), and six measures in current use, namely demographics alone (age, sex, BMI, ethnicity and family history of T2D), lab measures (HbA1C and FPG), HOMA-B, HOMA-IR, the Matsuda Index (HOMA-IR and the Matsuda Index are both surrogate markers for insulin resistance), and incretins (total GIP and GLP-1 concentrations at OGTT_2h, which are optimized surrogate markers for the incretin effect). Four classifiers were trained on the training set, and the y axis shows the test-set auROC of the best-performing classifier for each metabolic subphenotype and feature set. Statistical significance of the difference in auROC between each tested feature set and OGTT_G_ReducedRep was determined using the Wilcoxon rank-sum test.
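
The summary procedure described in this legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the auROC values are made up, the names `reference_set`, `comparison_set`, `clf_a` and `clf_b` are hypothetical placeholders (with `reference_set` standing in for OGTT_G_ReducedRep), each classifier is assumed to have several repeated auROC measurements, and the best classifier is assumed to be the one with the highest mean auROC. The rank-sum test is written out with the normal approximation and no tie correction so that the block needs only the standard library.

```python
import math
from statistics import mean, stdev


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2              # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return z, p


# Made-up auROC values for illustration only:
# feature set -> classifier -> one value per repeated evaluation (assumed).
scores = {
    "reference_set": {
        "clf_a": [0.88, 0.91, 0.90, 0.89, 0.92],
        "clf_b": [0.84, 0.86, 0.85, 0.83, 0.87],
    },
    "comparison_set": {
        "clf_a": [0.74, 0.77, 0.76, 0.75, 0.78],
        "clf_b": [0.79, 0.81, 0.80, 0.78, 0.82],
    },
}

# For each feature set, keep the classifier with the highest mean auROC
# (assumed selection rule) and record the bar height and error bar.
summary = {}
for feature_set, per_clf in scores.items():
    best = max(per_clf, key=lambda name: mean(per_clf[name]))
    values = per_clf[best]
    summary[feature_set] = {
        "classifier": best,
        "mean": mean(values),   # bar height
        "sd": stdev(values),    # error bar
        "values": values,
    }

# Compare every other feature set against the reference feature set.
comparisons = {}
reference = summary["reference_set"]["values"]
for feature_set, row in summary.items():
    if feature_set == "reference_set":
        continue
    comparisons[feature_set] = rank_sum_test(reference, row["values"])

for feature_set, (z, p) in comparisons.items():
    print(f"{feature_set}: z = {z:.3f}, p = {p:.4f}")
```

With real data, `scores` would hold the measured auROC values for all nine feature sets and four classifiers, and a library routine such as `scipy.stats.ranksums` could replace the hand-written test.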