Table 3. 3-fold cross-validation (a) and hold-out test-set performance (b) of classifiers trained with radiomic, deep and hybrid datasets.

From: Value of handcrafted and deep radiomic features towards training robust machine learning classifiers for prediction of prostate cancer disease aggressiveness

(a) 3-fold cross-validation performance (95% CI)

| Training data | F2 | Cohen's kappa | AUC | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- |
| Rad | 0.8830 (0.8129, 0.9570) | 0.7350 (0.5300, 0.8562) | 0.8679 (0.7654, 0.9281) | 0.8920 (0.8164, 0.9762) | 0.8439 (0.6697, 0.9163) |
| Deep | 0.9577 (0.8267, 0.9675) | 0.8609 (0.6684, 0.9235) | 0.9311 (0.8338, 0.9615) | 0.9753 (0.8135, 0.9762) | 0.8870 (0.7701, 0.9639) |
| Hybrid | 0.9392 (0.8236, 0.9892) | 0.9164 (0.7175, 0.9444) | 0.9583 (0.8585, 0.9722) | 0.9306 (0.8056, 1) | 0.9861 (0.8282, 0.9868) |

(b) Hold-out test-set performance

| Training data | F2 | Cohen's kappa | AUC | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- |
| Rad | 0.7767 | 0.5055 | 0.8192 | 0.8421 | 0.7179 |
| Deep | 0.3333 | 0.2648 | 0.8131 | 0.2941 | 0.9355 |
| Hybrid | 0.5063 | 0.4062 | 0.8710 | 0.4706 | 0.9032 |
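For readers unfamiliar with the threshold-based metrics reported in Table 3, the following is a minimal sketch of their standard textbook definitions, computed from the counts of a binary confusion matrix. It is not the authors' evaluation code. The function name `binary_metrics` and the example counts are hypothetical and do not come from the study. AUC is left out because it requires the classifier's continuous scores rather than confusion-matrix counts. The sketch assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, fn, tn, beta=2.0):
    """Standard metric definitions from confusion-matrix counts.

    tp, fp, fn, tn: true/false positive/negative counts (hypothetical inputs).
    beta=2.0 gives the F2 score, which weights sensitivity (recall)
    more heavily than precision.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value

    # F-beta: weighted harmonic mean of precision and sensitivity
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "f_beta": f_beta,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for illustration only (not data from the study)
example = binary_metrics(tp=8, fp=2, fn=2, tn=8)
```

Because F2 weights sensitivity more heavily than precision, a classifier with low sensitivity scores a low F2 even when its specificity is high.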