Table 3 Comparison of the performance of ML models for YOCRC risk stratification in the temporal validation dataset
Model | AUC | Accuracy | Sensitivity (Recall) | Specificity | NPV | Precision (PPV) | F1 score | Brier score |
---|---|---|---|---|---|---|---|---|
LR | 0.799 | 0.782 | 0.577 | 0.790 | 0.979 | 0.098 | 0.167 | 0.200 |
RF | 0.888 | 0.779 | 0.872 | 0.775 | 0.994 | 0.133 | 0.231 | 0.163 |
KNN | 0.726 | 0.648 | 0.679 | 0.647 | 0.981 | 0.071 | 0.128 | 0.244 |
SVC | 0.827 | 0.757 | 0.744 | 0.757 | 0.987 | 0.108 | 0.188 | 0.192 |
DT | 0.779 | 0.848 | 0.705 | 0.853 | 0.987 | 0.159 | 0.260 | 0.172 |
XGBoost | 0.892 | 0.801 | 0.808 | 0.801 | 0.991 | 0.138 | 0.236 | 0.159 |
AdaBoost | 0.887 | 0.802 | 0.782 | 0.803 | 0.989 | 0.136 | 0.231 | 0.220 |
Stacking | 0.849 | 0.804 | 0.821 | 0.803 | 0.991 | 0.141 | 0.241 | 0.150 |
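
The F1 score column can be related to the precision and sensitivity columns through the standard harmonic-mean definition, F1 = 2 · PPV · Recall / (PPV + Recall). The following is a minimal sketch, not the authors' code, that recomputes F1 from the rounded values in Table 3; the names `f1_score`, `table3`, and `max_f1_gap` are illustrative assumptions. Because the inputs are already rounded to three decimals, small differences in the third decimal place are expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 3.
table3 = {
    "LR": (0.098, 0.577, 0.167),
    "RF": (0.133, 0.872, 0.231),
    "KNN": (0.071, 0.679, 0.128),
    "SVC": (0.108, 0.744, 0.188),
    "DT": (0.159, 0.705, 0.260),
    "XGBoost": (0.138, 0.808, 0.236),
    "AdaBoost": (0.136, 0.782, 0.231),
    "Stacking": (0.141, 0.821, 0.241),
}


def max_f1_gap(rows: dict) -> float:
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - f1) for p, r, f1 in rows.values())


if __name__ == "__main__":
    for model, (p, r, reported) in table3.items():
        print(f"{model:9s} recomputed={f1_score(p, r):.3f} reported={reported:.3f}")
```

With a positive class this rare, high sensitivity and NPV can coexist with low precision, which is why the F1 scores stay below 0.3 even for the models with the highest AUC.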