Table 3. Comparison between multimodality and the best-performing single-modality models.

From: Multimodal fusion with deep neural networks for leveraging CT imaging and electronic health record: a case-study in pulmonary embolism detection

| Evaluation metrics | Including subsegmental: Imaging model | Including subsegmental: EMR model | Including subsegmental: Late elastic average | Excluding subsegmental: Imaging model | Excluding subsegmental: EMR model | Excluding subsegmental: Late elastic average |
| --- | --- | --- | --- | --- | --- | --- |
| Operating threshold | 0.625 | 0.630 | 0.448 | 0.625 | 0.612 | 0.414 |
| Accuracy | 0.687 [0.685–0.689] | 0.834 [0.832–0.835] | 0.885 [0.884–0.886] | 0.756 [0.754–0.758] | 0.873 [0.871–0.874] | 0.902 [0.900–0.903] |
| AUROC | 0.791 [0.788–0.793] | 0.911 [0.910–0.913] | 0.947 [0.946–0.948] | 0.833 [0.830–0.835] | 0.921 [0.919–0.923] | 0.962 [0.961–0.963] |
| Specificity | 0.862 [0.860–0.865] | 0.875 [0.872–0.877] | 0.902 [0.900–0.904] | 0.863 [0.861–0.866] | 0.878 [0.876–0.880] | 0.849 [0.847–0.852] |
| Sensitivity | 0.559 [0.557–0.562] | 0.804 [0.801–0.806] | 0.873 [0.871–0.875] | 0.651 [0.647–0.654] | 0.867 [0.865–0.870] | 0.953 [0.951–0.954] |
| PPV | 0.848 [0.846–0.851] | 0.898 [0.896–0.899] | 0.924 [0.923–0.926] | 0.830 [0.827–0.833] | 0.879 [0.877–0.882] | 0.866 [0.864–0.869] |
| NPV | 0.588 [0.585–0.590] | 0.765 [0.761–0.767] | 0.838 [0.835–0.840] | 0.707 [0.705–0.710] | 0.866 [0.864–0.868] | 0.946 [0.945–0.948] |

  1. Model performance on the held-out test set with 95% confidence intervals, using the probability threshold that maximizes both sensitivity and specificity on the validation dataset. Best performance metrics in bold text.
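The threshold selection described in the table note can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "maximizes both sensitivity and specificity" means maximizing Youden's J (sensitivity + specificity − 1), which is one common reading, and that both classes are present in the data. The function names and the toy data are hypothetical.

```python
def confusion_counts(labels, probs, threshold):
    """Count TP, FP, TN, FN, predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return NaN instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def metrics_at(labels, probs, threshold):
    """Threshold-dependent metrics reported in the table (AUROC excluded)."""
    tp, fp, tn, fn = confusion_counts(labels, probs, threshold)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
    }


def pick_threshold(val_labels, val_probs):
    """Assumed criterion: the candidate with the largest Youden's J."""
    best_threshold, best_j = None, float("-inf")
    # Every distinct predicted probability is a candidate cut point.
    for t in sorted(set(val_probs)):
        m = metrics_at(val_labels, val_probs, t)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:  # strict comparison keeps the lowest tied threshold
            best_threshold, best_j = t, j
    return best_threshold


# Hypothetical toy data: choose the threshold on validation, apply it to test.
val_labels = [0, 0, 0, 0, 0, 1, 1, 1]
val_probs = [0.1, 0.2, 0.3, 0.4, 0.7, 0.5, 0.8, 0.9]
test_labels = [0, 0, 1, 1]
test_probs = [0.2, 0.6, 0.4, 0.9]

threshold = pick_threshold(val_labels, val_probs)
test_metrics = metrics_at(test_labels, test_probs, threshold)
print(threshold, test_metrics)
```

Fixing the threshold on the validation set and only then scoring the test set keeps the reported test metrics free of threshold tuning. It also explains why the operating thresholds in the table differ from model to model.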