Table 3 Performance of the machine learning predictions when patients are targeted according to the calibration cutoff

From: Predicting control of cardiovascular disease risk factors in South Asia using machine learning

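| Outcome definition | Not achieving control | Not achieving control | Not achieving control | Not achieving improvements | Not achieving improvements | Not achieving improvements |
|---|---|---|---|---|---|---|
| CVD risk factor | HbA1c | SBP | LDL | HbA1c | SBP | LDL |
| Chosen specification | Logistic (M) | Logistic (S) | Logistic (M) | Logistic (L) | Tree (S) | Logistic (M) |
| Detection prevalence | 36% | 16% | 27% | 69% | 74% | 67% |
| [95% CI] | [31%, 41%] | [12%, 20%] | [23%, 32%] | [64%, 73%] | [70%, 78%] | [63%, 72%] |
| Precision | 75% | 30% | 23% | 71% | 71% | 74% |
| [95% CI] | [67%, 82%] | [19%, 41%] | [15%, 30%] | [66%, 77%] | [66%, 76%] | [68%, 79%] |
| Sensitivity | 53% | 26% | 42% | 82% | 88% | 86% |
| [95% CI] | [46%, 60%] | [16%, 37%] | [29%, 55%] | [76%, 86%] | [83%, 92%] | [82%, 90%] |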

  1. Performance, on the validation data, of the models with the highest median area under the curve across all cross-validation folds. Detection prevalence is the percentage of patients who would be selected by the model. Precision is the percentage of selected patients who do not achieve the outcome. Sensitivity is the percentage of patients not achieving the outcome who are selected. Confidence intervals were obtained by bootstrapping with 1000 draws.
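
The three metrics defined in the note can be computed from two boolean vectors per patient: whether the patient failed to achieve the outcome, and whether the model selected the patient. The following is a minimal sketch, not the authors' code. The function names `targeting_metrics` and `bootstrap_ci`, the toy data, and the choice of a percentile bootstrap are all assumptions made for illustration; the source states only that bootstrapping with 1000 draws was used, and it does not describe how the calibration cutoff that determines selection was applied.

```python
import numpy as np

def targeting_metrics(y_true, selected):
    """Detection prevalence, precision and sensitivity as defined in the table note.

    y_true   : True where the patient did NOT achieve the outcome (the "positive" class).
    selected : True where the model selects (targets) the patient.
    """
    y_true = np.asarray(y_true, dtype=bool)
    selected = np.asarray(selected, dtype=bool)
    true_pos = np.sum(y_true & selected)   # selected patients who did not achieve the outcome
    n_selected = selected.sum()
    n_positive = y_true.sum()
    return {
        "detection_prevalence": selected.mean(),
        "precision": true_pos / n_selected if n_selected else float("nan"),
        "sensitivity": true_pos / n_positive if n_positive else float("nan"),
    }

def bootstrap_ci(y_true, selected, n_draws=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for each metric (assumed variant, see lead-in)."""
    y_true = np.asarray(y_true, dtype=bool)
    selected = np.asarray(selected, dtype=bool)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    draws = {"detection_prevalence": [], "precision": [], "sensitivity": []}
    for _ in range(n_draws):
        idx = rng.integers(0, n, size=n)   # resample patients with replacement
        for name, value in targeting_metrics(y_true[idx], selected[idx]).items():
            draws[name].append(value)
    lo = 100 * (1 - level) / 2
    hi = 100 - lo
    # nanpercentile skips resamples where a metric was undefined (empty denominator)
    return {name: tuple(np.nanpercentile(vals, [lo, hi])) for name, vals in draws.items()}

# Toy data (hypothetical): 10 patients, 4 not achieving the outcome, 3 selected.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
selected = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = targeting_metrics(y_true, selected)
intervals = bootstrap_ci(y_true, selected)
```

On the toy data, 3 of 10 patients are selected, 2 of those 3 did not achieve the outcome, and 2 of the 4 patients not achieving the outcome are selected. Resampling whole patients keeps each patient's outcome and selection status paired, so the three intervals are derived from the same draws.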