Table 3. Linear evaluation results for our proposed method with a ViT-S backbone on the MIMIC-CXR dataset. We report AUROC and AUPRC on the test set for the self-supervised methods, with 95% confidence intervals (CI) in parentheses. The best results are shown in bold. \(\uparrow\) and \(\downarrow\) indicate that a model performs 0.5-1.5% better or worse, respectively, than the reference model (i.e., vanilla MSN); \(\uparrow \uparrow\) and \(\downarrow \downarrow\) indicate that it performs at least 1.5% better or worse, respectively. Differences are absolute, so 1.5% corresponds to 0.015 on the scale of the reported scores.
From: Multimodal masked siamese network improves chest X-ray representation learning
Pretraining | AUROC (95% CI) | AUPRC (95% CI) |
|---|---|---|
ImageNet | 0.703 (0.699, 0.706) \(\downarrow \downarrow\) | 0.269 (0.266, 0.272) \(\downarrow \downarrow\) |
DINO | 0.714 (0.711, 0.718) \(\downarrow \downarrow\) | 0.278 (0.276, 0.281) \(\downarrow\) |
MAE | 0.649 (0.645, 0.653) \(\downarrow \downarrow\) | 0.223 (0.221, 0.225) \(\downarrow \downarrow\) |
MSN | 0.731 (0.727, 0.734) | 0.291 (0.289, 0.294) |
MSN\(+ x_{sex}\) | 0.751\(^*\) (0.748, 0.754) \(\uparrow \uparrow\) | 0.311\(^*\) (0.309, 0.314) \(\uparrow \uparrow\) |
MSN\(+ x_{age}\) | 0.746\(^*\) (0.743, 0.749) \(\uparrow \uparrow\) | 0.307\(^*\) (0.305, 0.310) \(\uparrow \uparrow\) |
MSN\(+ x_{view}\) | 0.747\(^*\) (0.744, 0.750) \(\uparrow \uparrow\) | 0.307\(^*\) (0.305, 0.310) \(\uparrow \uparrow\) |
MSN\(+ x_{pos}\) | 0.748\(^*\) (0.745, 0.752) \(\uparrow \uparrow\) | 0.306\(^*\) (0.303, 0.309) \(\uparrow \uparrow\) |
MSN\(+ x_{mort}\) | 0.748\(^*\) (0.744, 0.751) \(\uparrow \uparrow\) | 0.308\(^*\) (0.306, 0.312) \(\uparrow \uparrow\) |
MSN\(+ x_{icu}\) | 0.746\(^*\) (0.742, 0.749) \(\uparrow \uparrow\) | 0.305\(^*\) (0.303, 0.308) \(\uparrow\) |
MSN\(+ x_{SD}\) | 0.751\(^*\) (0.748, 0.754) \(\uparrow \uparrow\) | 0.310\(^*\) (0.308, 0.313) \(\uparrow \uparrow\) |
MSN\(+ x_{SM}\) | 0.744\(^*\) (0.741, 0.747) \(\uparrow\) | 0.306\(^*\) (0.303, 0.309) \(\uparrow \uparrow\) |
MSN\(+ x_{SI}\) | 0.749\(^*\) (0.746, 0.752) \(\uparrow \uparrow\) | 0.308\(^*\) (0.305, 0.311) \(\uparrow \uparrow\) |
MSN\(+ x_{SD+SM}\) | 0.742\(^*\) (0.739, 0.746) \(\uparrow\) | 0.302\(^*\) (0.300, 0.305) \(\uparrow\) |
MSN\(+ x_{SD+SI}\) | 0.744\(^*\) (0.740, 0.747) \(\uparrow\) | 0.306\(^*\) (0.304, 0.309) \(\uparrow \uparrow\) |
MSN\(+ x_{SM+SI}\) | 0.748\(^*\) (0.744, 0.751) \(\uparrow \uparrow\) | 0.307\(^*\) (0.304, 0.310) \(\uparrow \uparrow\) |
MSN\(+ x_{SD+SM+SI}\) | 0.739\(^{\dagger }\) (0.736, 0.743) \(\uparrow\) | 0.301\(^*\) (0.299, 0.305) \(\uparrow\) |
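
The arrow annotations described in the caption can be reproduced from the rounded scores in the table. The following is a minimal sketch, not the authors' code: the function name `arrow` and the constants are hypothetical, the thresholds are read as absolute differences (0.005 and 0.015), and a difference of exactly 0.015 is assumed to earn a double arrow, which is what the reported values suggest. `Decimal` is used so that three-decimal scores compare exactly.

```python
from decimal import Decimal

# Reference scores for vanilla MSN, copied from the table.
MSN_AUROC = Decimal("0.731")
MSN_AUPRC = Decimal("0.291")

# Assumed thresholds: 0.5% and 1.5% read as absolute score differences.
SINGLE = Decimal("0.005")
DOUBLE = Decimal("0.015")


def arrow(score: str, reference: Decimal) -> str:
    """Return the arrow marker for a score given as a decimal string."""
    diff = Decimal(score) - reference
    magnitude = abs(diff)
    if magnitude < SINGLE:
        # The caption defines no marker for differences below 0.5%.
        return ""
    symbol = "↑" if diff > 0 else "↓"
    # Assumption: a difference of exactly 0.015 counts as a double arrow.
    return symbol * 2 if magnitude >= DOUBLE else symbol


if __name__ == "__main__":
    print(arrow("0.746", MSN_AUROC))  # MSN + x_age, AUROC
    print(arrow("0.744", MSN_AUROC))  # MSN + x_SM, AUROC
    print(arrow("0.278", MSN_AUPRC))  # DINO, AUPRC
```

Because the table reports scores rounded to three decimals, markers computed this way may differ from ones computed on unrounded scores when a difference falls close to a threshold.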