Table 2 Comparison of the performance of ML models for YOCRC risk stratification in the internal validation dataset

From: Development and validation of machine learning models for young-onset colorectal cancer risk stratification

 

| Model | AUC | Accuracy | Sensitivity (Recall) | Specificity | NPV | Precision (PPV) | F1 score | Brier score |
|---|---|---|---|---|---|---|---|---|
| LR | 0.768 | 0.741 | 0.650 | 0.745 | 0.978 | 0.108 | 0.185 | 0.219 |
| RF | 0.859 | 0.747 | 0.840 | 0.743 | 0.990 | 0.134 | 0.231 | 0.177 |
| KNN | 0.692 | 0.622 | 0.665 | 0.619 | 0.975 | 0.077 | 0.137 | 0.258 |
| SVC | 0.777 | 0.729 | 0.720 | 0.729 | 0.982 | 0.112 | 0.194 | 0.207 |
| DT | 0.732 | 0.830 | 0.625 | 0.840 | 0.979 | 0.156 | 0.250 | 0.182 |
| XGBoost | 0.871 | 0.790 | 0.775 | 0.790 | 0.987 | 0.149 | 0.251 | 0.166 |
| AdaBoost | 0.843 | 0.776 | 0.745 | 0.778 | 0.985 | 0.137 | 0.232 | 0.223 |
| Stacking | 0.821 | 0.780 | 0.800 | 0.779 | 0.988 | 0.147 | 0.248 | 0.163 |

  1. Accuracy \(\left(\frac{TP+TN}{TP+FP+TN+FN}\right)\), Sensitivity (Recall) \(\left(\frac{TP}{TP+FN}\right)\), Specificity \(\left(\frac{TN}{TN+FP}\right)\), Negative predictive value (NPV) \(\left(\frac{TN}{TN+FN}\right)\), Precision or Positive predictive value (PPV) \(\left(\frac{TP}{TP+FP}\right)\), F1 score \(\left(\frac{2 \times Precision \times Recall}{Precision+Recall}\right)\), and Brier score \(\left(Y, P\right) = \frac{1}{n}\sum_{i=1}^{n}\left(P_{i}-Y_{i}\right)^{2}\).
  2. ML machine learning, YOCRC Young-onset colorectal cancer, AUC area under the curve of ROC, LR logistic regression, RF random forest, KNN k-Nearest Neighbor, SVC support vector classification, DT decision tree, XGBoost eXtreme Gradient Boosting, AdaBoost Adaptive Boosting, TP true positive, TN true negative, FP false positive, FN false negative, P probability of model prediction, Y actual probability of occurrence (no occurrence recorded as 0), n number of predicted events.
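
The footnote formulas can be written as a short Python sketch. This illustrates the definitions only and is not the authors' evaluation code. The names `ConfusionCounts`, `classification_metrics`, `f1_from` and `brier_score`, and the confusion-matrix counts in the usage lines, are hypothetical and chosen for the example. The one figure taken from the table is the LR row's precision and recall, used to check that the F1 definition reproduces the reported F1 score.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of true/false positives and negatives (hypothetical container)."""
    tp: int
    tn: int
    fp: int
    fn: int


def f1_from(precision: float, recall: float) -> float:
    """F1 = 2 * Precision * Recall / (Precision + Recall)."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Threshold-based metrics exactly as defined in footnote 1.

    Assumes every denominator is non-zero; no zero-division handling is done.
    """
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "sensitivity": recall,
        "specificity": c.tn / (c.tn + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "precision": precision,
        "f1": f1_from(precision, recall),
    }


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability P_i and outcome Y_i."""
    if len(y) != len(p) or len(y) == 0:
        raise ValueError("y and p must be non-empty and of equal length")
    return sum((p_i - y_i) ** 2 for y_i, p_i in zip(y, p)) / len(y)


# Made-up counts for illustration only (not taken from the study).
example = ConfusionCounts(tp=8, tn=78, fp=12, fn=2)
metrics = classification_metrics(example)

# Consistency check against the LR row of the table:
# precision 0.108 and recall 0.650 give the reported F1 of 0.185.
lr_f1 = round(f1_from(0.108, 0.650), 3)
```

With a rare outcome, a small number of true positives is spread over many false positives, which is why precision (and hence F1) can stay low while NPV remains high, the pattern seen in every row of the table.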