Table 2 Evaluation results (mean values and 95% confidence intervals in brackets).

From: The Framing of machine learning risk prediction models illustrated by evaluation of sepsis in general wards

| Model | AUPRC | AUROC | Brier *100 (y = 1) | Brier *100 (y = 0) | ACE (%) |
|---|---|---|---|---|---|
| Fixed time to onset | | | | | |
|  Extra trees classifier | 0.466 (0.435–0.496) | 0.906 (0.897–0.915) | 67.574 (65.277–69.870) | 0.322 (0.286–0.358) | 39.18 (35.76–42.60) |
|  Random forest classifier | 0.449 (0.407–0.491) | 0.860 (0.852–0.869) | 66.446 (64.476–68.417) | 0.402 (0.374–0.430) | 42.82 (38.12–47.52) |
|  Light gradient boosting machine | 0.317 (0.270–0.363) | 0.815 (0.802–0.829) | 74.290 (72.327–76.252) | 0.415 (0.381–0.449) | 45.35 (43.43–47.28) |
|  XGBoost | 0.292 (0.244–0.339) | 0.777 (0.750–0.804) | 78.062 (75.776–80.347) | 0.326 (0.293–0.358) | 43.57 (42.24–44.90) |
|  Logistic regression | 0.239 (0.212–0.266) | 0.752 (0.739–0.764) | 77.662 (76.415–78.909) | 0.466 (0.433–0.498) | 46.63 (45.50–47.76) |
| Sliding windows | | | | | |
|  Extra trees classifier | 0.006 (0.005–0.007) | 0.566 (0.539–0.593) | 98.102 (97.811–98.394) | 0.012 (0.011–0.013) | 66.60 (63.21–69.98) |
|  Random forest classifier | 0.007 (0.007–0.008) | 0.612 (0.604–0.621) | 97.381 (97.170–97.592) | 0.014 (0.013–0.015) | 73.74 (73.06–74.43) |
|  Light gradient boosting machine | 0.004 (0.004–0.005) | 0.624 (0.596–0.653) | 98.785 (98.569–98.998) | 0.015 (0.013–0.018) | 75.22 (75.00–75.44) |
|  XGBoost | 0.007 (0.006–0.008) | 0.756 (0.741–0.771) | 98.852 (98.628–99.077) | 0.003 (0.003–0.004) | 75.14 (73.95–76.34) |
|  Logistic regression | 0.004 (0.004–0.004) | 0.703 (0.688–0.717) | 99.484 (99.462–99.506) | 0.001 (0.001–0.001) | 79.02 (73.57–84.47) |
| Sliding windows w. D.I. | | | | | |
|  Extra trees classifier | 0.009 (0.007–0.010) | 0.593 (0.582–0.604) | 97.580 (97.116–98.045) | 0.016 (0.014–0.018) | 64.56 (59.48–69.63) |
|  Random forest classifier | 0.011 (0.009–0.013) | 0.638 (0.625–0.651) | 96.610 (96.006–97.213) | 0.020 (0.018–0.022) | 72.66 (71.22–74.10) |
|  Light gradient boosting machine | 0.007 (0.006–0.008) | 0.665 (0.628–0.703) | 98.278 (97.859–98.698) | 0.023 (0.021–0.025) | 74.89 (74.21–75.57) |
|  XGBoost | 0.011 (0.009–0.013) | 0.747 (0.740–0.755) | 98.370 (98.180–98.560) | 0.005 (0.004–0.005) | 72.77 (70.57–74.96) |
|  Logistic regression | 0.006 (0.005–0.007) | 0.684 (0.656–0.713) | 99.285 (99.206–99.364) | 0.001 (0.001–0.001) | 82.13 (76.74–87.52) |
| On clinical demand | | | | | |
|  Extra trees classifier | 0.147 (0.139–0.155) | 0.719 (0.704–0.733) | 89.654 (89.217–90.090) | 0.013 (0.012–0.014) | 41.60 (38.40–44.80) |
|  Random forest classifier | 0.192 (0.154–0.231) | 0.742 (0.717–0.766) | 86.881 (85.482–88.281) | 0.017 (0.016–0.017) | 42.90 (39.20–46.60) |
|  Light gradient boosting machine | 0.056 (0.040–0.072) | 0.774 (0.751–0.797) | 91.376 (89.983–92.769) | 0.030 (0.024–0.036) | 62.89 (58.76–67.03) |
|  XGBoost | 0.114 (0.081–0.148) | 0.779 (0.752–0.806) | 91.799 (90.565–93.034) | 0.009 (0.008–0.009) | 46.79 (41.90–51.68) |
|  Logistic regression | 0.014 (0.011–0.016) | 0.735 (0.720–0.749) | 98.370 (98.167–98.572) | 0.005 (0.005–0.006) | 75.44 (74.25–76.63) |

Best-scoring machine learning models across framing structures and evaluation metrics

| Framing structure | AUPRC | AUROC | Brier *100 (y = 1) | Brier *100 (y = 0) | ACE (%) |
|---|---|---|---|---|---|
| Fixed time to onset | Extra trees | Extra trees | Random forest | Extra trees | Extra trees |
| Sliding windows | Random forest | XGBoost | Random forest | Logistic regression | Extra trees |
| Sliding windows w. D.I. | Random forest | XGBoost | Random forest | Logistic regression | Extra trees |
| On clinical demand | Random forest | XGBoost | Random forest | Logistic regression | Extra trees |

  1. AUPRC: Area under the precision–recall curve; AUROC: Area under the receiver operating characteristic curve; Brier *100 (y = 1): Stratified Brier score for the positive class, multiplied by 100; Brier *100 (y = 0): Stratified Brier score for the negative class, multiplied by 100; ACE: Average calibration error; Sliding windows w. D.I.: Sliding windows with dynamic inclusion.
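For readers reproducing these columns, the two less common metrics are the class-stratified Brier score (the squared-error loss restricted to samples of one class, scaled by 100) and the average calibration error (the mean absolute gap between predicted probability and observed event rate across probability bins). The sketch below is illustrative only: the binning scheme (equal-width, 10 bins) and function names are assumptions, not the paper's exact implementation.

```python
import numpy as np

def stratified_brier_x100(y_true, y_prob, cls):
    """Brier score over samples whose true label equals `cls`, times 100.
    For cls=1 this penalizes low predicted risk on true positives;
    for cls=0, high predicted risk on true negatives."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    mask = y_true == cls
    return 100.0 * float(np.mean((y_prob[mask] - y_true[mask]) ** 2))

def average_calibration_error(y_true, y_prob, n_bins=10):
    """ACE (%): mean absolute difference between mean predicted probability
    and observed event rate, averaged over non-empty equal-width bins.
    (The paper's exact binning may differ; this is a common variant.)"""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    gaps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Last bin is closed on the right so that y_prob == 1.0 is included.
        in_bin = (y_prob >= lo) & ((y_prob < hi) if hi < 1.0 else (y_prob <= hi))
        if in_bin.any():
            gaps.append(abs(y_prob[in_bin].mean() - y_true[in_bin].mean()))
    return 100.0 * float(np.mean(gaps))
```

Note that a near-perfect stratified Brier score on one class (e.g. logistic regression's 0.001 for y = 0 under sliding windows) can coexist with a very poor score on the other class, which is exactly why the table reports the two strata separately.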