Table 1 Comparison with baseline models in the task of pulmonary disease identification

From: A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics

| Method | Mean | COPD | Bronchiectasis | Pneumothorax | Pneumonia | ILD | Tuberculosis | Lung cancer | Pleural effusion |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Image-only | 0.805 (0.802, 0.808) | 0.847 (0.845, 0.851) | 0.746 (0.743, 0.748) | 0.789 (0.786, 0.791) | 0.845 (0.843, 0.848) | 0.799 (0.796, 0.801) | 0.769 (0.765, 0.772) | 0.825 (0.821, 0.830) | 0.819 (0.817, 0.822) |
| Early fusion | 0.835 (0.832, 0.839) | 0.895 (0.893, 0.898) | 0.772 (0.768, 0.775) | 0.810 (0.807, 0.812) | 0.873 (0.870, 0.875) | 0.824 (0.822, 0.827) | 0.793 (0.791, 0.796) | 0.871 (0.868, 0.875) | 0.842 (0.839, 0.845) |
| Late fusion | 0.826 (0.823, 0.828) | 0.888 (0.885, 0.890) | 0.765 (0.763, 0.767) | 0.822 (0.820, 0.825) | 0.870 (0.868, 0.872) | 0.804 (0.802, 0.805) | 0.770 (0.767, 0.772) | 0.839 (0.836, 0.841) | 0.850 (0.847, 0.852) |
| GIT | 0.848 (0.844, 0.850) | 0.911 (0.907, 0.913) | 0.798 (0.796, 0.800) | 0.824 (0.821, 0.827) | 0.895 (0.893, 0.898) | 0.819 (0.816, 0.821) | 0.807 (0.804, 0.810) | 0.872 (0.871, 0.873) | 0.858 (0.855, 0.860) |
| Perceiver | 0.858 (0.855, 0.861) | 0.910 (0.907, 0.912) | 0.788 (0.784, 0.791) | 0.846 (0.842, 0.850) | 0.903 (0.901, 0.906) | 0.830 (0.827, 0.833) | 0.825 (0.823, 0.828) | 0.890 (0.887, 0.892) | 0.872 (0.869, 0.874) |
| IRENE | 0.924 (0.921, 0.926) | 0.922 (0.920, 0.925) | 0.907 (0.903, 0.910) | 0.954 (0.952, 0.957) | 0.921 (0.918, 0.923) | 0.934 (0.929, 0.937) | 0.918 (0.917, 0.921) | 0.914 (0.911, 0.917) | 0.924 (0.921, 0.926) |

  1. The baseline models include the image-only model, the early fusion method, the late fusion approach and two recent transformer-based multimodal classification models (that is, GIT and Perceiver). The evaluation metric is AUROC, with 95% confidence intervals in brackets.
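
For readers unfamiliar with the metric, the following is a minimal sketch of how an AUROC value and a 95% confidence interval of the kind reported in the table could be computed for one disease. The table note does not state how the intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names `auroc` and `bootstrap_ci` are hypothetical helpers written for illustration, not the authors' code.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a positive case is scored above a
    negative case, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained one class only; AUROC is undefined
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Toy data: two negatives and two positives with made-up scores.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the toy call, three of the four positive-negative pairs are ranked correctly, which gives 0.75. The pairwise loop is quadratic in the number of cases, which is fine for illustration but would normally be replaced by a rank-based computation on a full test set.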