Table 1 Performance of the model in binary and subtype classification on different datasets
From: An interpretable deep learning model for first-trimester fetal cardiac screening
Dataset | AUROC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1 score (95% CI) |
|---|---|---|---|---|
Binary classification | ||||
Validation set | 1.000 (1.000–1.000) | 0.995 (0.988–1.000) | 0.999 (0.997–1.000) | 0.996 (0.992–0.999) |
External validation set 1 | 0.986 (0.980–0.992) | 0.994 (0.980–1.000) | 0.917 (0.895–0.937) | 0.849 (0.809–0.887) |
External validation set 2 | 0.925 (0.881–0.963) | 0.974 (0.912–1.000) | 0.817 (0.738–0.890) | 0.794 (0.696–0.879) |
External validation set 3 | 1.000 (0.999–1.000) | 0.816 (0.756–0.872) | 1.000 (1.000–1.000) | 0.898 (0.861–0.932) |
Subtype classification | ||||
Validation set | ||||
Normal | 0.978 (0.962–0.991) | 0.968 (0.950–0.985) | 0.985 (0.967–1.000) | 0.980 (0.970–0.989) |
AVSD | 0.928 (0.867–0.967) | 0.580 (0.400–0.750) | 0.952 (0.934–0.969) | 0.464 (0.320–0.597) |
HV | 0.875 (0.789–0.948) | 0.630 (0.435–0.815) | 0.995 (0.988–1.000) | 0.719 (0.556–0.857) |
FSV | 0.962 (0.943–0.978) | 0.846 (0.786–0.901) | 0.946 (0.925–0.966) | 0.840 (0.793–0.882) |
External validation set | ||||
Normal | 0.956 (0.933–0.977) | 0.869 (0.821–0.913) | 0.999 (0.997–1.000) | 0.927 (0.899–0.953) |
AVSD | 0.836 (0.806–0.864) | 0.476 (0.370–0.581) | 0.897 (0.880–0.915) | 0.331 (0.255–0.407) |
HV | 0.913 (0.891–0.934) | 0.669 (0.612–0.726) | 0.991 (0.985–0.996) | 0.785 (0.742–0.826) |
FSV | 0.886 (0.866–0.905) | 0.858 (0.832–0.884) | 0.769 (0.733–0.803) | 0.842 (0.822–0.862) |
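
The table reports point estimates with 95% confidence intervals but does not say how the intervals were obtained. As a minimal sketch, assuming a percentile bootstrap over cases (an assumption, not a method confirmed by the source), sensitivity, specificity, and F1 score could be computed as follows. The labels and predictions are toy data invented for illustration, and the function names `confusion_counts`, `metrics`, and `bootstrap_ci` are hypothetical. AUROC is omitted because it needs continuous scores rather than hard predictions. For the subtype rows, the same code would apply to one-vs-rest labels for each class.

```python
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means abnormal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1 score from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


def bootstrap_ci(y_true, y_pred, name, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for one metric (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample cases with replacement, keeping label and prediction paired.
        idx = [rng.randrange(n) for _ in range(n)]
        value = metrics([y_true[i] for i in idx],
                        [y_pred[i] for i in idx])[name]
        if value == value:  # drop NaN from resamples that lack a class
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


# Toy data (invented): 20 abnormal cases, 80 normal cases.
y_true = [1] * 20 + [0] * 80
y_pred = [1] * 18 + [0] * 2 + [1] * 4 + [0] * 76

point = metrics(y_true, y_pred)
ci = bootstrap_ci(y_true, y_pred, "sensitivity")
print(point)
print("sensitivity 95% CI:", ci)
```

With few positive cases, bootstrap intervals become wide. That is consistent with the broad intervals in the AVSD and HV rows, although the table itself does not give the per-class sample sizes.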