Table 1 Model performance.

From: PENet—a scalable deep-learning model for automated diagnosis of pulmonary embolism using volumetric CT imaging

 

| Metric | Internal dataset: Stanford | Internal dataset: Stanford (real prevalence) | External dataset: Intermountain | External dataset: Intermountain (real prevalence) |
| --- | --- | --- | --- | --- |
| Accuracy | 0.77 [0.76–0.78] | 0.81 [0.80–0.82] | 0.78 [0.77–0.78] | 0.80 [0.79–0.81] |
| AUROC | 0.84 [0.82–0.87] | 0.84 [0.79–0.90] | 0.85 [0.81–0.88] | 0.85 [0.80–0.90] |
| Specificity | 0.82 [0.81–0.83] | 0.82 [0.82–0.83] | 0.80 [0.79–0.81] | 0.81 [0.80–0.82] |
| Sensitivity | 0.73 [0.72–0.74] | 0.75 [0.73–0.77] | 0.75 [0.74–0.76] | 0.75 [0.73–0.77] |
| PPV/precision | 0.81 [0.80–0.81] | 0.47 [0.45–0.48] | 0.77 [0.76–0.78] | 0.44 [0.43–0.46] |
| NPV | 0.75 [0.74–0.76] | 0.94 [0.94–0.95] | 0.78 [0.77–0.79] | 0.94 [0.94–0.95] |

  1. Model performance on the internal test set (Stanford) and the external test set (Intermountain). Values are point estimates with 95% confidence intervals in brackets, computed at a probability threshold of 0.55, the threshold that maximizes both sensitivity and specificity on the Stanford validation dataset. For the "real prevalence" columns, bootstrapping is used to simulate the real-world prevalence of PE (between 14 and 22%).
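
The sharp drop in PPV and rise in NPV in the real-prevalence columns follows from Bayes' rule: with sensitivity and specificity roughly fixed, predictive values depend strongly on how common PE is in the evaluated population. The sketch below is an analytic cross-check, not the authors' bootstrap procedure. The function names `ppv` and `npv` are hypothetical, and the prevalence of 0.18 is an assumption (the midpoint of the 14 to 22% range quoted in the footnote).

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence                   # P(test+, disease+)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(test+, disease-)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)           # P(test-, disease-)
    false_neg = (1.0 - sensitivity) * prevalence          # P(test-, disease+)
    return true_neg / (true_neg + false_neg)


# Sensitivity and specificity taken from the Stanford (real prevalence) column.
SENSITIVITY = 0.75
SPECIFICITY = 0.82

# Assumed prevalences spanning the 14-22% range given in the footnote.
for prevalence in (0.14, 0.18, 0.22):
    print(
        f"prevalence={prevalence:.2f}  "
        f"PPV={ppv(SENSITIVITY, SPECIFICITY, prevalence):.2f}  "
        f"NPV={npv(SENSITIVITY, SPECIFICITY, prevalence):.2f}"
    )
```

At the assumed prevalence of 0.18 these formulas give a PPV of about 0.48 and an NPV of about 0.94, close to the 0.47 and 0.94 reported for the Stanford real-prevalence column. The small gap is expected, since the table values come from bootstrapped samples rather than a single fixed prevalence.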