Table 6 Performance results of external validation for linear evaluation of self-supervised methods using ViT-T as the backbone model. We report AUROC and AUPRC results on the CheXpert and NIH-14 test sets with 95% confidence intervals. The best results are shown in bold. \(\uparrow\) and \(\downarrow\) indicate that the performance of a given model is 0.5-1.5% better or worse, respectively, than the reference model (i.e., vanilla MSN). \(\uparrow \uparrow\) and \(\downarrow \downarrow\) indicate that the performance is at least 1.5% better or worse, respectively, than the reference model. − indicates that the performance difference is less than 0.5% compared to the reference model.

From: Multimodal masked siamese network improves chest X-ray representation learning

 

| Model | CheXpert AUROC (CI) | CheXpert AUPRC (CI) | NIH-14 AUROC (CI) | NIH-14 AUPRC (CI) |
|---|---|---|---|---|
| MSN | 0.740 (0.720, 0.763) | 0.396 (0.382, 0.421) | 0.676 (0.671, 0.680) | 0.199 (0.197, 0.203) |
| MSN\(+ x_{sex}\) | 0.742 (0.721, 0.764) − | 0.403 (0.389, 0.425) \(\uparrow\) | 0.711\(^*\) (0.707, 0.714) \(\uparrow \uparrow\) | 0.233\(^*\) (0.230, 0.242) \(\uparrow \uparrow\) |
| MSN\(+ x_{age}\) | 0.767 (0.746, 0.789) \(\uparrow \uparrow\) | 0.390 (0.378, 0.410) \(\downarrow\) | 0.711\(^*\) (0.707, 0.715) \(\uparrow \uparrow\) | 0.232\(^*\) (0.229, 0.238) \(\uparrow \uparrow\) |
| MSN\(+ x_{view}\) | 0.765 (0.742, 0.788) \(\uparrow \uparrow\) | **0.420** (0.405, 0.453) \(\uparrow \uparrow\) | 0.711\(^*\) (0.708, 0.715) \(\uparrow \uparrow\) | 0.236\(^*\) (0.232, 0.245) \(\uparrow \uparrow\) |
| MSN\(+ x_{pos}\) | 0.749 (0.728, 0.770) \(\uparrow\) | 0.392 (0.380, 0.411) − | 0.710\(^*\) (0.706, 0.713) \(\uparrow \uparrow\) | 0.235\(^*\) (0.232, 0.241) \(\uparrow \uparrow\) |
| MSN\(+ x_{mort}\) | 0.750 (0.728, 0.772) \(\uparrow\) | 0.413 (0.400, 0.435) \(\uparrow \uparrow\) | 0.709\(^*\) (0.705, 0.712) \(\uparrow \uparrow\) | 0.236\(^*\) (0.232, 0.244) \(\uparrow \uparrow\) |
| MSN\(+ x_{icu}\) | 0.744 (0.723, 0.765) − | 0.398 (0.386, 0.422) − | 0.704\(^*\) (0.700, 0.708) \(\uparrow \uparrow\) | 0.227\(^*\) (0.225, 0.235) \(\uparrow \uparrow\) |
| MSN\(+ x_{D}\) | 0.746 (0.726, 0.767) \(\uparrow\) | 0.379 (0.368, 0.399) \(\downarrow \downarrow\) | 0.713\(^*\) (0.709, 0.717) \(\uparrow \uparrow\) | 0.238\(^*\) (0.235, 0.245) \(\uparrow \uparrow\) |
| MSN\(+ x_{SM}\) | 0.766 (0.745, 0.786) \(\uparrow \uparrow\) | 0.402 (0.387, 0.427) \(\uparrow\) | **0.716**\(^*\) (0.712, 0.719) \(\uparrow \uparrow\) | **0.241**\(^*\) (0.236, 0.249) \(\uparrow \uparrow\) |
| MSN\(+ x_{SI}\) | 0.765 (0.745, 0.783) \(\uparrow \uparrow\) | 0.387 (0.375, 0.407) \(\downarrow\) | 0.712\(^*\) (0.708, 0.715) \(\uparrow \uparrow\) | 0.234\(^*\) (0.231, 0.241) \(\uparrow \uparrow\) |
| MSN\(+ x_{D+SM}\) | 0.752 (0.727, 0.778) \(\uparrow\) | 0.391 (0.379, 0.423) \(\downarrow\) | 0.704\(^*\) (0.700, 0.707) \(\uparrow \uparrow\) | 0.227\(^*\) (0.224, 0.234) \(\uparrow \uparrow\) |
| MSN\(+ x_{D+SI}\) | 0.755 (0.738, 0.777) \(\uparrow \uparrow\) | 0.388 (0.376, 0.408) \(\downarrow\) | 0.709\(^*\) (0.705, 0.712) \(\uparrow \uparrow\) | 0.233\(^*\) (0.230, 0.241) \(\uparrow \uparrow\) |
| MSN\(+ x_{SM+SI}\) | 0.769 (0.752, 0.794) \(\uparrow \uparrow\) | 0.406 (0.403, 0.432) \(\uparrow\) | 0.708\(^*\) (0.704, 0.711) \(\uparrow \uparrow\) | 0.236\(^*\) (0.232, 0.243) \(\uparrow \uparrow\) |
| MSN\(+ x_{D+SM+SI}\) | **0.770**\(^{\dagger }\) (0.746, 0.781) \(\uparrow \uparrow\) | 0.409 (0.403, 0.454) \(\uparrow\) | 0.711\(^*\) (0.706, 0.716) \(\uparrow \uparrow\) | 0.233\(^*\) (0.230, 0.240) \(\uparrow \uparrow\) |

  1. \(^*\)Statistically significant difference with respect to vanilla MSN (\(p < 0.001\)).
  2. \(^{\dagger }\)Statistically significant difference with respect to vanilla MSN (\(p < 0.01\)).
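
The arrow notation described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `arrow_marker` is hypothetical, and two assumptions are made that the caption does not state explicitly. First, the thresholds are read as absolute percentage points (e.g., 0.749 vs. 0.740 is a 0.9% difference). Second, differences of exactly 0.5% and 1.5% are assigned to the higher tier, which matches the rounded values reported in the table.

```python
def arrow_marker(score: float, reference: float) -> str:
    """Map a score difference to the arrow notation used in the table.

    Assumptions (not stated in the source): differences are absolute
    percentage points, and the 0.5 and 1.5 boundaries belong to the
    higher tier.
    """
    # Difference in percentage points; rounding removes floating-point noise
    # so that, e.g., 0.755 - 0.740 is treated as exactly 1.5.
    diff = round((score - reference) * 100, 6)
    magnitude = abs(diff)

    if magnitude < 0.5:
        return "−"  # difference below 0.5%: no meaningful change

    arrow = "↑" if diff > 0 else "↓"
    # A double arrow marks a difference of at least 1.5%.
    return arrow * 2 if magnitude >= 1.5 else arrow
```

For instance, with the vanilla MSN CheXpert AUROC of 0.740 as the reference, `arrow_marker(0.767, 0.740)` returns `"↑↑"` and `arrow_marker(0.742, 0.740)` returns `"−"`, in line with the MSN\(+ x_{age}\) and MSN\(+ x_{sex}\) rows.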