Table 4 Performance results for linear evaluation of self-supervised methods using ViT-T as the backbone model. We report the AUROC and AUPRC on the test set for each pretraining method, with 95% confidence intervals in parentheses. The best results are shown in bold. \(\uparrow\) and \(\downarrow\) indicate that a model's performance is 0.5-1.5% better or worse, respectively, than the reference model (i.e., vanilla MSN). \(\uparrow \uparrow\) and \(\downarrow \downarrow\) indicate that it is at least 1.5% better or worse, respectively, than the reference model.
From: Multimodal masked siamese network improves chest X-ray representation learning
| Pretraining | AUROC (CI) | AUPRC (CI) |
|---|---|---|
| ImageNet | 0.684 (0.680, 0.688) \(\downarrow \downarrow\) | 0.251 (0.249, 0.254) \(\downarrow \downarrow\) |
| DINO | 0.694 (0.691, 0.698) \(\downarrow\) | 0.262 (0.259, 0.264) \(\downarrow\) |
| MAE | 0.633 (0.629, 0.637) \(\downarrow \downarrow\) | 0.211 (0.209, 0.213) \(\downarrow \downarrow\) |
| MSN | 0.708 (0.704, 0.711) | 0.272 (0.270, 0.275) |
| MSN+\(x_{sex}\) | 0.728\(^*\) (0.724, 0.731) \(\uparrow \uparrow\) | 0.290\(^*\) (0.288, 0.293) \(\uparrow \uparrow\) |
| MSN+\(x_{age}\) | 0.728\(^*\) (0.724, 0.732) \(\uparrow \uparrow\) | 0.289\(^*\) (0.287, 0.292) \(\uparrow \uparrow\) |
| MSN+\(x_{view}\) | 0.726\(^*\) (0.722, 0.730) \(\uparrow \uparrow\) | 0.288\(^*\) (0.286, 0.291) \(\uparrow \uparrow\) |
| MSN+\(x_{pos}\) | **0.729**\(^*\) (0.726, 0.733) \(\uparrow \uparrow\) | **0.292**\(^*\) (0.290, 0.295) \(\uparrow \uparrow\) |
| MSN+\(x_{mort}\) | 0.727\(^*\) (0.723, 0.730) \(\uparrow \uparrow\) | 0.291\(^*\) (0.288, 0.294) \(\uparrow \uparrow\) |
| MSN+\(x_{icu}\) | 0.726\(^*\) (0.722, 0.729) \(\uparrow \uparrow\) | 0.289\(^*\) (0.287, 0.292) \(\uparrow \uparrow\) |
| MSN+\(x_{D}\) | 0.727\(^*\) (0.723, 0.731) \(\uparrow \uparrow\) | 0.290\(^*\) (0.287, 0.293) \(\uparrow \uparrow\) |
| MSN+\(x_{SM}\) | 0.727\(^*\) (0.723, 0.730) \(\uparrow \uparrow\) | 0.290\(^*\) (0.288, 0.293) \(\uparrow \uparrow\) |
| MSN+\(x_{SI}\) | 0.725\(^*\) (0.721, 0.728) \(\uparrow \uparrow\) | 0.289\(^*\) (0.286, 0.292) \(\uparrow \uparrow\) |
| MSN+\(x_{D+SM}\) | 0.724\(^*\) (0.720, 0.727) \(\uparrow \uparrow\) | 0.285\(^*\) (0.283, 0.288) \(\uparrow\) |
| MSN+\(x_{D+SI}\) | 0.723\(^*\) (0.720, 0.727) \(\uparrow \uparrow\) | 0.287\(^*\) (0.285, 0.291) \(\uparrow \uparrow\) |
| MSN+\(x_{SM+SI}\) | 0.726\(^*\) (0.723, 0.729) \(\uparrow \uparrow\) | 0.288\(^*\) (0.286, 0.291) \(\uparrow \uparrow\) |
| MSN+\(x_{D+SM+SI}\) | 0.724\(^*\) (0.720, 0.727) \(\uparrow \uparrow\) | 0.287\(^*\) (0.284, 0.289) \(\uparrow \uparrow\) |
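
The arrow annotations in the table can be reproduced from the point estimates alone. The sketch below is illustrative and not taken from the paper: the function name `arrow` and its parameters are hypothetical, the percentages are read as absolute differences on the 0-1 metric scale (so 1.5% corresponds to 0.015), and both thresholds are assumed to be inclusive. These assumptions are consistent with the rows above, e.g. MSN+\(x_{D+SM}\) (AUPRC difference 0.013, single arrow) and MSN+\(x_{D+SI}\) (AUPRC difference 0.015, double arrow).

```python
def arrow(score, reference, low=5, high=15):
    """Return the arrow marker for `score` relative to `reference`.

    Scores are three-decimal metrics on the 0-1 scale. The thresholds are
    expressed in thousandths (5 = 0.5%, 15 = 1.5%) so the comparison is done
    on integers and avoids floating-point rounding at the boundaries.
    Assumption: both thresholds are inclusive.
    """
    diff = round(score * 1000) - round(reference * 1000)
    magnitude = abs(diff)
    if magnitude < low:
        return ""  # difference too small to annotate
    symbol = "↑" if diff > 0 else "↓"
    return symbol * 2 if magnitude >= high else symbol


if __name__ == "__main__":
    msn_auroc, msn_auprc = 0.708, 0.272  # vanilla MSN reference row
    rows = {
        "ImageNet": (0.684, 0.251),
        "DINO": (0.694, 0.262),
        "MSN+x_pos": (0.729, 0.292),
        "MSN+x_D+SM": (0.724, 0.285),
    }
    for name, (auroc, auprc) in rows.items():
        print(name, arrow(auroc, msn_auroc), arrow(auprc, msn_auprc))
```

Working in integer thousandths is a design choice for this sketch: a difference such as 0.287 - 0.272 is not exactly 0.015 in binary floating point, so a direct float comparison at the threshold could misclassify a row.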