Table 6 Performance results of external validation for linear evaluation of self-supervised methods using ViT-T as the backbone model. We summarize AUROC and AUPRC results on the CheXpert and NIH-14 test sets, including 95% confidence intervals (CI). The best results are shown in bold. We use \(\uparrow\) and \(\downarrow\) to indicate that the performance of a given model is 0.5-1.5% better or worse, respectively, than the reference model (i.e., vanilla MSN). \(\uparrow \uparrow\) and \(\downarrow \downarrow\) indicate that the performance is at least 1.5% better or worse than the reference model, respectively. − indicates that the performance difference from the reference model is less than 0.5%.
From: Multimodal masked siamese network improves chest X-ray representation learning
Model | CheXpert AUROC (CI) | CheXpert AUPRC (CI) | NIH-14 AUROC (CI) | NIH-14 AUPRC (CI) |
|---|---|---|---|---|
MSN | 0.740 (0.720, 0.763) | 0.396 (0.382, 0.421) | 0.676 (0.671, 0.680) | 0.199 (0.197, 0.203) |
MSN\(+ x_{sex}\) | 0.742 (0.721, 0.764) − | 0.403 (0.389, 0.425)\(\uparrow\) | 0.711\(^*\)(0.707, 0.714)\(\uparrow \uparrow\) | 0.233\(^*\)(0.230, 0.242)\(\uparrow \uparrow\) |
MSN\(+ x_{age}\) | 0.767 (0.746, 0.789)\(\uparrow \uparrow\) | 0.390 (0.378, 0.410)\(\downarrow\) | 0.711\(^*\)(0.707, 0.715)\(\uparrow \uparrow\) | 0.232\(^*\)(0.229, 0.238)\(\uparrow \uparrow\) |
MSN\(+ x_{view}\) | 0.765 (0.742, 0.788)\(\uparrow \uparrow\) | **0.420** (0.405, 0.453)\(\uparrow \uparrow\) | 0.711\(^*\)(0.708, 0.715)\(\uparrow \uparrow\) | 0.236\(^*\)(0.232, 0.245)\(\uparrow \uparrow\) |
MSN\(+ x_{pos}\) | 0.749 (0.728, 0.770)\(\uparrow\) | 0.392 (0.380, 0.411) − | 0.710\(^*\)(0.706, 0.713)\(\uparrow \uparrow\) | 0.235\(^*\)(0.232, 0.241)\(\uparrow \uparrow\) |
MSN\(+ x_{mort}\) | 0.750 (0.728, 0.772)\(\uparrow\) | 0.413 (0.400, 0.435)\(\uparrow \uparrow\) | 0.709\(^*\)(0.705, 0.712)\(\uparrow \uparrow\) | 0.236\(^*\)(0.232, 0.244)\(\uparrow \uparrow\) |
MSN\(+ x_{icu}\) | 0.744 (0.723, 0.765) − | 0.398 (0.386, 0.422) − | 0.704\(^*\)(0.700, 0.708)\(\uparrow \uparrow\) | 0.227\(^*\)(0.225, 0.235)\(\uparrow \uparrow\) |
MSN\(+ x_{D}\) | 0.746 (0.726, 0.767)\(\uparrow\) | 0.379 (0.368, 0.399)\(\downarrow \downarrow\) | 0.713\(^*\)(0.709, 0.717)\(\uparrow \uparrow\) | 0.238\(^*\)(0.235, 0.245)\(\uparrow \uparrow\) |
MSN\(+ x_{SM}\) | 0.766 (0.745, 0.786)\(\uparrow \uparrow\) | 0.402 (0.387, 0.427)\(\uparrow\) | **0.716**\(^*\)(0.712, 0.719)\(\uparrow \uparrow\) | **0.241**\(^*\)(0.236, 0.249)\(\uparrow \uparrow\) |
MSN\(+ x_{SI}\) | 0.765 (0.745, 0.783)\(\uparrow \uparrow\) | 0.387 (0.375, 0.407)\(\downarrow\) | 0.712\(^*\)(0.708, 0.715)\(\uparrow \uparrow\) | 0.234\(^*\)(0.231, 0.241)\(\uparrow \uparrow\) |
MSN\(+ x_{D+SM}\) | 0.752 (0.727, 0.778)\(\uparrow\) | 0.391 (0.379, 0.423)\(\downarrow\) | 0.704\(^*\)(0.700, 0.707)\(\uparrow \uparrow\) | 0.227\(^*\)(0.224, 0.234)\(\uparrow \uparrow\) |
MSN\(+ x_{D+SI}\) | 0.755 (0.738, 0.777)\(\uparrow \uparrow\) | 0.388 (0.376, 0.408)\(\downarrow\) | 0.709\(^*\)(0.705, 0.712)\(\uparrow \uparrow\) | 0.233\(^*\)(0.230, 0.241)\(\uparrow \uparrow\) |
MSN\(+ x_{SM+SI}\) | 0.769 (0.752, 0.794)\(\uparrow \uparrow\) | 0.406 (0.403, 0.432)\(\uparrow\) | 0.708\(^*\)(0.704, 0.711)\(\uparrow \uparrow\) | 0.236\(^*\)(0.232, 0.243)\(\uparrow \uparrow\) |
MSN\(+ x_{D+SM+SI}\) | **0.770**\(^{\dagger }\)(0.746, 0.781)\(\uparrow \uparrow\) | 0.409 (0.403, 0.454)\(\uparrow\) | 0.711\(^*\)(0.706, 0.716)\(\uparrow \uparrow\) | 0.233\(^*\)(0.230, 0.240)\(\uparrow \uparrow\) |
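
The arrow markers defined in the caption can be reproduced with a small helper. The following is a minimal sketch, not the authors' code. It rests on two assumptions: the percentages are absolute differences in percentage points (for example, 0.755 versus 0.740 is 1.5), and both cutoffs are inclusive. The function name `marker` and the constants are hypothetical. It works on the rounded values printed in the table, so a row whose unrounded difference sits close to a cutoff could come out differently.

```python
# Markers as used in the table caption.
UP2, UP, SAME, DOWN, DOWN2 = "↑↑", "↑", "−", "↓", "↓↓"


def marker(score: float, reference: float) -> str:
    """Map a score to the caption's marker relative to the reference model."""
    # Assumption: compare in tenths of a percentage point on the rounded,
    # three-decimal table values; integers avoid float noise at the cutoffs.
    tenths = round(score * 1000) - round(reference * 1000)
    if tenths >= 15:   # assumed inclusive: at least 1.5 points better
        return UP2
    if tenths >= 5:    # assumed inclusive: 0.5 up to 1.5 points better
        return UP
    if tenths <= -15:  # at least 1.5 points worse
        return DOWN2
    if tenths <= -5:   # 0.5 up to 1.5 points worse
        return DOWN
    return SAME        # difference below 0.5 points


# CheXpert values copied from the table; vanilla MSN is the reference.
MSN_AUROC, MSN_AUPRC = 0.740, 0.396
for label, score, ref in [
    ("x_{D+SI} AUROC", 0.755, MSN_AUROC),
    ("x_{icu} AUROC", 0.744, MSN_AUROC),
    ("x_{D+SM} AUPRC", 0.391, MSN_AUPRC),
    ("x_{D} AUPRC", 0.379, MSN_AUPRC),
]:
    print(label, marker(score, ref))
```

For these four rows the helper returns the same markers as the table. A relative reading would not: 0.744 is about 0.54% above 0.740 in relative terms, which would earn a \(\uparrow\), whereas the table shows −.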