Table 3 Outcomes of the internal validation test after training with two data sets and of the external validation test using the excluded data set (Phase 1).

From: Deep learning algorithms for detecting and visualising intussusception on plain abdominal radiography in children: a retrospective multicenter study

| Test | Training/internal validation data set | Internal AUC (95% CI) | Internal highest accuracy | Internal highest Youden index | Internal Sen | Internal Spe | Optimal cut-off value | External validation data set | External AUC (95% CI) | External Youden index | External Sen | External Spe | P-value of AUC difference between the two validations (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (A) | A + B | 0.966 (0.955, 0.975) | 0.952 | 0.818 | 0.913 | 0.905 | 0.02 | C | 0.811 (0.784, 0.835) | 0.421 | 0.793 | 0.628 | < 0.001* (0.128, 0.183) |
| (B) | B + C | 0.971 (0.959, 0.980) | 0.943 | 0.848 | 0.943 | 0.905 | 0.06 | A | 0.895 (0.874, 0.913) | 0.431 | 0.947 | 0.485 | < 0.001* (0.059, 0.102) |
| (C) | C + A | 0.946 (0.926, 0.961) | 0.927 | 0.764 | 0.913 | 0.851 | 0.01 | B | 0.844 (0.828, 0.858) | 0.493 | 0.651 | 0.842 | < 0.001* (0.080, 0.125) |

  1. (A) External validation with set C after training and internal validation with sets A + B; (B) external validation with set A after training and internal validation with sets B + C; (C) external validation with set B after training and internal validation with sets C + A; (D) internal validation after training with sets A + B + C. Positive, with intussusception; negative, without intussusception. AUC, area under the receiver operating characteristic (ROC) curve. Accuracy, the fraction of correct predictions over the total number of predictions. Youden index, sensitivity + specificity − 1, i.e., the vertical distance between the 45° diagonal and the corresponding point on the ROC curve. In the external validation tests, we used the optimal cut-off value selected by the highest Youden index in the corresponding internal validation test. CI, confidence interval; Sen, sensitivity; Spe, specificity.
  2. *P-values < 0.05 indicate a statistically significant difference.
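To make the notes' definitions concrete, the following is a minimal Python sketch (not the authors' code) of choosing a cut-off by the highest Youden index on internal-validation predictions and then applying that same cut-off, unchanged, to an external set. The function names, the convention that a score at or above the cut-off is called positive (intussusception), and the toy scores and labels are all hypothetical. The last line also checks that the table's Sen and Spe reproduce its Youden index, e.g. 0.913 + 0.905 − 1 = 0.818 for row (A).

```python
from typing import Optional, Sequence, Tuple


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def confusion_counts(scores: Sequence[float], labels: Sequence[int],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp). Assumption: score >= cutoff is called positive."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= cutoff
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity at a cut-off (both classes must be present)."""
    tp, fn, tn, fp = confusion_counts(scores, labels, cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def accuracy(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> float:
    """Fraction of correct predictions over the total number of predictions."""
    tp, fn, tn, fp = confusion_counts(scores, labels, cutoff)
    return (tp + tn) / (tp + fn + tn + fp)


def best_youden_cutoff(scores: Sequence[float],
                       labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Try every observed score as a cut-off; keep the first one with the highest J."""
    best_cutoff: Optional[float] = None
    best_j = float("-inf")
    for c in sorted(set(scores)):
        j = youden_index(*sens_spec(scores, labels, c))
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical toy data (1 = intussusception, 0 = no intussusception).
internal_scores = [0.05, 0.1, 0.3, 0.4, 0.7, 0.8, 0.9]
internal_labels = [0, 0, 0, 1, 0, 1, 1]
external_scores = [0.2, 0.5, 0.35, 0.6]
external_labels = [0, 0, 1, 1]

# The cut-off is chosen on internal data only, then reused on the external set.
cutoff, j_internal = best_youden_cutoff(internal_scores, internal_labels)
ext_sen, ext_spe = sens_spec(external_scores, external_labels, cutoff)
print(cutoff, j_internal, ext_sen, ext_spe)  # prints 0.4 0.75 0.5 0.5

# Consistency check against row (A) of the table: 0.913 + 0.905 - 1.
print(round(youden_index(0.913, 0.905), 3))  # prints 0.818
```

Scanning every observed score is the simplest exhaustive search. With scikit-learn, `sklearn.metrics.roc_curve` returns `fpr`, `tpr` and `thresholds`, and the Youden-optimal threshold is the one that maximizes `tpr - fpr`.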