Table 2 Comparison of the performance of ML models for YOCRC risk stratification in the internal validation dataset

Model	AUC	Accuracy	Sensitivity (Recall)	Specificity	NPV	Precision (PPV)	F1 score	Brier score
LR	0.768	0.741	0.650	0.745	0.978	0.108	0.185	0.219
RF	0.859	0.747	0.840	0.743	0.990	0.134	0.231	0.177
KNN	0.692	0.622	0.665	0.619	0.975	0.077	0.137	0.258
SVC	0.777	0.729	0.720	0.729	0.982	0.112	0.194	0.207
DT	0.732	0.830	0.625	0.840	0.979	0.156	0.250	0.182
XGBoost	0.871	0.790	0.775	0.790	0.987	0.149	0.251	0.166
AdaBoost	0.843	0.776	0.745	0.778	0.985	0.137	0.232	0.223
Stacking	0.821	0.780	0.800	0.779	0.988	0.147	0.248	0.163

Accuracy \(\left(\frac{TP+TN}{TP+FP+TN+FN}\right)\), Sensitivity (Recall) \(\left(\frac{TP}{TP+FN}\right)\), Specificity \(\left(\frac{TN}{TN+FP}\right)\), Negative predictive value (NPV) \(\left(\frac{TN}{TN+FN}\right)\), Precision or Positive predictive value (PPV) \(\left(\frac{TP}{TP+FP}\right)\), F1 score \(\left(\frac{2 \times Precision \times Recall}{Precision+Recall}\right)\), and Brier score \((Y, P) = \frac{1}{n}\sum_{i=1}^{n}(P_{i}-Y_{i})^{2}\).
ML machine learning, YOCRC young-onset colorectal cancer, AUC area under the receiver operating characteristic (ROC) curve, LR logistic regression, RF random forest, KNN k-nearest neighbor, SVC support vector classification, DT decision tree, XGBoost eXtreme Gradient Boosting, AdaBoost Adaptive Boosting, TP true positive, TN true negative, FP false positive, FN false negative, P probability of model prediction, Y observed outcome (occurrence recorded as 1, no occurrence recorded as 0), n number of predictions.
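
As an illustration of the footnote definitions, the following is a minimal Python sketch that computes each metric from confusion-matrix counts and the Brier score from predicted probabilities. The class and function names (`ConfusionCounts`, `classification_metrics`, `f1_from_precision_recall`, `brier_score`) and the example counts are hypothetical; they are not taken from the study's code or data.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for confusion-matrix counts."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the threshold-based metrics defined in the table footnote."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "sensitivity": recall,
        "specificity": c.tn / (c.tn + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "precision": precision,
        "f1": f1_from_precision_recall(precision, recall),
    }


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    n = len(y_true)
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / n


if __name__ == "__main__":
    # Made-up counts for illustration only (not from the study).
    counts = ConfusionCounts(tp=8, fp=2, tn=85, fn=5)
    for name, value in classification_metrics(counts).items():
        print(f"{name}: {value:.3f}")

    # Made-up outcomes and predicted probabilities.
    print(f"brier: {brier_score([1, 0, 0, 1], [0.9, 0.2, 0.1, 0.6]):.3f}")

    # Consistency check using the LR row of the table (PPV 0.108, recall 0.650).
    print(f"LR F1: {f1_from_precision_recall(0.108, 0.650):.3f}")
```

The last line applies the F1 formula to the precision and recall reported for LR; the result rounds to the 0.185 listed in the table. Because the table values are themselves rounded, such recomputations may differ in the last digit for other rows.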
