Table 1 Performance of the model in binary and subtype classification on different datasets

From: An interpretable deep learning model for first-trimester fetal cardiac screening

Binary classification

| Dataset | AUROC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1 score (95% CI) |
|---|---|---|---|---|
| Validation set | 1.000 (1.000–1.000) | 0.995 (0.988–1.000) | 0.999 (0.997–1.000) | 0.996 (0.992–0.999) |
| External validation set 1 | 0.986 (0.980–0.992) | 0.994 (0.980–1.000) | 0.917 (0.895–0.937) | 0.849 (0.809–0.887) |
| External validation set 2 | 0.925 (0.881–0.963) | 0.974 (0.912–1.000) | 0.817 (0.738–0.890) | 0.794 (0.696–0.879) |
| External validation set 3 | 1.000 (0.999–1.000) | 0.816 (0.756–0.872) | 1.000 (1.000–1.000) | 0.898 (0.861–0.932) |

Subtype classification

| Dataset | Class | AUROC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1 score (95% CI) |
|---|---|---|---|---|---|
| Validation set | Normal | 0.978 (0.962–0.991) | 0.968 (0.950–0.985) | 0.985 (0.967–1.000) | 0.980 (0.970–0.989) |
| Validation set | AVSD | 0.928 (0.867–0.967) | 0.580 (0.400–0.750) | 0.952 (0.934–0.969) | 0.464 (0.320–0.597) |
| Validation set | HV | 0.875 (0.789–0.948) | 0.630 (0.435–0.815) | 0.995 (0.988–1.000) | 0.719 (0.556–0.857) |
| Validation set | FSV | 0.962 (0.943–0.978) | 0.846 (0.786–0.901) | 0.946 (0.925–0.966) | 0.840 (0.793–0.882) |
| External validation set | Normal | 0.956 (0.933–0.977) | 0.869 (0.821–0.913) | 0.999 (0.997–1.000) | 0.927 (0.899–0.953) |
| External validation set | AVSD | 0.836 (0.806–0.864) | 0.476 (0.370–0.581) | 0.897 (0.880–0.915) | 0.331 (0.255–0.407) |
| External validation set | HV | 0.913 (0.891–0.934) | 0.669 (0.612–0.726) | 0.991 (0.985–0.996) | 0.785 (0.742–0.826) |
| External validation set | FSV | 0.886 (0.866–0.905) | 0.858 (0.832–0.884) | 0.769 (0.733–0.803) | 0.842 (0.822–0.862) |

AUROC, area under the receiver operating characteristic curve; AVSD, atrioventricular septal defect; CI, confidence interval; FSV, functional single ventricle; HV, hypoplastic ventricles.
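The table reports AUROC, sensitivity, specificity and F1 score with 95% CIs, giving one row per class for the subtype task. The table itself does not state how these were computed. The Python sketch below assumes two common choices that the source does not confirm: a one-vs-rest reduction for each subtype (Normal, AVSD, HV, FSV) and a nonparametric percentile bootstrap for the 95% CIs. All helper names (`one_vs_rest_counts`, `bootstrap_ci`, and so on) are hypothetical. The labels in the usage block are toy data, not study data.

```python
import random
from typing import Callable, Sequence

NAN = float("nan")


def one_vs_rest_counts(y_true: Sequence, y_pred: Sequence, positive):
    """Return (TP, FP, TN, FN), treating `positive` as the positive class
    and every other label as negative (assumed one-vs-rest reduction)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(y_true, y_pred, positive):
    tp, _, _, fn = one_vs_rest_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else NAN  # TP / (TP + FN)


def specificity(y_true, y_pred, positive):
    _, fp, tn, _ = one_vs_rest_counts(y_true, y_pred, positive)
    return tn / (tn + fp) if tn + fp else NAN  # TN / (TN + FP)


def f1_score(y_true, y_pred, positive):
    tp, fp, _, fn = one_vs_rest_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else NAN  # harmonic mean of precision and recall


def auroc(y_true, scores, positive):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (Mann-Whitney form); ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        return NAN
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(metric: Callable[..., float], *columns: Sequence,
                 n_boot: int = 1000, alpha: float = 0.05, seed: int = 0):
    """Percentile bootstrap CI (assumed method): resample cases with
    replacement, recompute the metric, and take the alpha/2 and
    1 - alpha/2 quantiles. Resamples where the metric is NaN are skipped."""
    rng = random.Random(seed)
    n = len(columns[0])
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric(*[[col[i] for i in idx] for col in columns])
        if value == value:  # NaN is the only value not equal to itself
            stats.append(value)
    if not stats:
        return NAN, NAN
    stats.sort()
    lo = stats[int(alpha / 2 * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical toy labels, NOT data from the study.
    y_true = ["Normal", "Normal", "AVSD", "HV", "FSV", "AVSD", "Normal", "FSV"]
    y_pred = ["Normal", "AVSD", "AVSD", "HV", "FSV", "Normal", "Normal", "FSV"]
    for cls in ["Normal", "AVSD", "HV", "FSV"]:
        lo, hi = bootstrap_ci(lambda t, p: f1_score(t, p, cls), y_true, y_pred)
        print(f"{cls}: sens={sensitivity(y_true, y_pred, cls):.3f} "
              f"spec={specificity(y_true, y_pred, cls):.3f} "
              f"F1={f1_score(y_true, y_pred, cls):.3f} ({lo:.3f}-{hi:.3f})")
```

The same relations can be used to sanity-check the table. If the 1.000 specificity on External validation set 3 reflects zero false positives, precision is 1 and F1 reduces to 2·sensitivity/(1 + sensitivity). That gives 2(0.816)/1.816 ≈ 0.899, which matches the reported 0.898 up to rounding.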