Table 5 External validation performance of the proposed methods with a ViT-S backbone under linear evaluation. We report AUROC and AUPRC, with 95% confidence intervals (CI), on the CheXpert and NIH-14 test sets. The best results are shown in bold. \(\uparrow\) and \(\downarrow\) indicate that a model performs 0.5–1.5 percentage points better or worse, respectively, than the reference model (i.e., vanilla MSN). \(\uparrow \uparrow\) and \(\downarrow \downarrow\) indicate a difference of at least 1.5 percentage points in the respective direction. − indicates a difference of less than 0.5 percentage points from the reference model.

From: Multimodal masked siamese network improves chest X-ray representation learning

| Model | CheXpert AUROC (95% CI) | CheXpert AUPRC (95% CI) | NIH-14 AUROC (95% CI) | NIH-14 AUPRC (95% CI) |
|---|---|---|---|---|
| MSN | 0.770 (0.755, 0.785) | 0.420 (0.409, 0.453) | 0.699 (0.695, 0.702) | 0.220 (0.217, 0.226) |
| MSN\(+ x_{sex}\) | 0.768 (0.754, 0.788) − | 0.417 (0.414, 0.457) − | 0.734\(^*\) (0.730, 0.738) \(\uparrow \uparrow\) | 0.265\(^*\) (0.261, 0.274) \(\uparrow \uparrow\) |
| MSN\(+ x_{age}\) | 0.772 (0.754, 0.797) − | 0.424 (0.416, 0.446) − | 0.736\(^*\) (0.732, 0.739) \(\uparrow \uparrow\) | 0.263\(^*\) (0.257, 0.270) \(\uparrow \uparrow\) |
| MSN\(+ x_{view}\) | **0.801**\(^{\dagger }\) (0.785, 0.825) \(\uparrow \uparrow\) | **0.445**\(^{\dagger }\) (0.420, 0.469) \(\uparrow \uparrow\) | 0.732\(^*\) (0.728, 0.736) \(\uparrow \uparrow\) | 0.261\(^*\) (0.258, 0.270) \(\uparrow \uparrow\) |
| MSN\(+ x_{pos}\) | 0.785 (0.776, 0.798) \(\uparrow \uparrow\) | 0.426 (0.412, 0.465) \(\uparrow\) | 0.734\(^*\) (0.730, 0.737) \(\uparrow \uparrow\) | 0.261\(^*\) (0.257, 0.270) \(\uparrow \uparrow\) |
| MSN\(+ x_{mort}\) | 0.778 (0.743, 0.811) \(\uparrow\) | 0.424 (0.412, 0.451) − | 0.737\(^*\) (0.734, 0.742) \(\uparrow \uparrow\) | 0.265\(^*\) (0.261, 0.275) \(\uparrow \uparrow\) |
| MSN\(+ x_{icu}\) | 0.773 (0.739, 0.789) − | 0.422 (0.408, 0.445) − | 0.727\(^*\) (0.722, 0.731) \(\uparrow \uparrow\) | 0.256\(^*\) (0.251, 0.264) \(\uparrow \uparrow\) |
| MSN\(+ x_{D}\) | 0.776 (0.757, 0.802) \(\uparrow\) | 0.419 (0.406, 0.439) − | 0.736\(^*\) (0.732, 0.740) \(\uparrow \uparrow\) | 0.267\(^*\) (0.262, 0.276) \(\uparrow \uparrow\) |
| MSN\(+ x_{SM}\) | 0.796\(^{\dagger }\) (0.775, 0.813) \(\uparrow \uparrow\) | 0.427 (0.418, 0.473) \(\uparrow\) | 0.732\(^*\) (0.728, 0.735) \(\uparrow \uparrow\) | 0.258\(^*\) (0.254, 0.268) \(\uparrow \uparrow\) |
| MSN\(+ x_{SI}\) | 0.771 (0.753, 0.788) − | 0.423 (0.402, 0.443) − | **0.738**\(^*\) (0.734, 0.741) \(\uparrow \uparrow\) | **0.270**\(^*\) (0.264, 0.278) \(\uparrow \uparrow\) |
| MSN\(+ x_{D+SM}\) | 0.776 (0.756, 0.791) \(\uparrow\) | 0.412 (0.397, 0.447) \(\downarrow\) | 0.728\(^*\) (0.725, 0.732) \(\uparrow \uparrow\) | 0.254\(^*\) (0.250, 0.263) \(\uparrow \uparrow\) |
| MSN\(+ x_{D+SI}\) | 0.757 (0.724, 0.769) \(\downarrow\) | 0.418 (0.404, 0.441) − | 0.728\(^*\) (0.724, 0.732) \(\uparrow \uparrow\) | 0.269\(^*\) (0.263, 0.279) \(\uparrow \uparrow\) |
| MSN\(+ x_{SM+D}\) | 0.782 (0.761, 0.803) \(\uparrow\) | 0.431 (0.418, 0.452) \(\uparrow\) | 0.734\(^*\) (0.730, 0.738) \(\uparrow \uparrow\) | 0.263\(^*\) (0.258, 0.273) \(\uparrow \uparrow\) |
| MSN\(+ x_{D+SM+SI}\) | 0.775 (0.753, 0.798) \(\uparrow\) | 0.427 (0.413, 0.452) \(\uparrow\) | 0.725\(^*\) (0.721, 0.728) \(\uparrow \uparrow\) | 0.251\(^*\) (0.248, 0.258) \(\uparrow \uparrow\) |

  1. \(^*\) Statistically significant difference from vanilla MSN (\(p < 0.001\)).
  2. \(^{\dagger }\) Statistically significant difference from vanilla MSN (\(p < 0.01\)).
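The caption's change-marker rule can be checked mechanically against the reported scores. The sketch below is an illustration, not the authors' code, and it rests on two assumptions. First, the 0.5 and 1.5 thresholds are absolute differences in percentage points (0.005 and 0.015 on the 0–1 scale), computed on the three-decimal values shown. Second, each band includes its lower bound. Under that reading, the rule reproduces every marker in Table 5. For example, MSN\(+ x_{pos}\) at 0.785 vs. 0.770 gets \(\uparrow \uparrow\), and MSN\(+ x_{D+SI}\) at 0.757 gets \(\downarrow\). The function name `change_marker` is hypothetical.

```python
"""Reproduce the Table 5 change markers from reported scores (illustrative sketch)."""

# Marker symbols: up arrow, down arrow, and the minus sign the table uses for "no change".
UP, DOWN, SAME = "\u2191", "\u2193", "\u2212"


def change_marker(score: float, reference: float) -> str:
    """Return the caption's marker for `score` relative to `reference`.

    Assumption (not stated in the source): thresholds are absolute
    differences in percentage points, and each band includes its lower bound.
    """
    # Compare in integer thousandths so boundary cases such as
    # 0.785 vs 0.770 (a difference of exactly 15) are not perturbed by float error.
    diff = round(score * 1000) - round(reference * 1000)
    if abs(diff) < 5:        # under 0.5 points
        return SAME
    arrow = UP if diff > 0 else DOWN
    if abs(diff) < 15:       # 0.5 up to (not including) 1.5 points
        return arrow
    return arrow * 2         # 1.5 points or more


if __name__ == "__main__":
    # Vanilla MSN reference scores on CheXpert (AUROC, AUPRC), taken from Table 5.
    ref_auroc, ref_auprc = 0.770, 0.420
    rows = {
        "MSN + x_pos": (0.785, 0.426),
        "MSN + x_D+SI": (0.757, 0.418),
        "MSN + x_D+SM+SI": (0.775, 0.427),
    }
    for name, (auroc, auprc) in rows.items():
        print(name, change_marker(auroc, ref_auroc), change_marker(auprc, ref_auprc))
```

Working in integer thousandths matters only at the band edges. Two rows in the table sit exactly there: MSN\(+ x_{pos}\) (a 0.015 AUROC gain) and MSN\(+ x_{D+SM+SI}\) (a 0.005 AUROC gain).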