Table 5 Performance results for external validation of our proposed methods using a ViT-S backbone under linear evaluation. We report AUROC and AUPRC on the CheXpert and NIH-14 test sets with 95% confidence intervals (CI). The best results are shown in bold. \(\uparrow\) and \(\downarrow\) indicate that a model performs 0.5-1.5% better or worse, respectively, than the reference model (i.e., vanilla MSN). \(\uparrow \uparrow\) and \(\downarrow \downarrow\) indicate that a model performs at least 1.5% better or worse, respectively, than the reference model. − indicates a performance difference of less than 0.5% relative to the reference model.
From: Multimodal masked siamese network improves chest X-ray representation learning
Model | CheXpert AUROC (CI) | CheXpert AUPRC (CI) | NIH-14 AUROC (CI) | NIH-14 AUPRC (CI) |
|---|---|---|---|---|
MSN | 0.770 (0.755, 0.785) | 0.420 (0.409, 0.453) | 0.699 (0.695, 0.702) | 0.220 (0.217, 0.226) |
MSN\(+ x_{sex}\) | 0.768 (0.754, 0.788) − | 0.417 (0.414, 0.457) − | 0.734\(^*\) (0.730, 0.738) \(\uparrow \uparrow\) | 0.265\(^*\) (0.261, 0.274) \(\uparrow \uparrow\) |
MSN\(+ x_{age}\) | 0.772 (0.754, 0.797) − | 0.424 (0.416, 0.446) − | 0.736\(^*\) (0.732, 0.739) \(\uparrow \uparrow\) | 0.263\(^*\) (0.257, 0.270) \(\uparrow \uparrow\) |
MSN\(+ x_{view}\) | **0.801**\(^{\dagger }\) (0.785, 0.825) \(\uparrow \uparrow\) | **0.445**\(^{\dagger }\) (0.420, 0.469) \(\uparrow \uparrow\) | 0.732\(^*\) (0.728, 0.736) \(\uparrow \uparrow\) | 0.261\(^*\) (0.258, 0.270) \(\uparrow \uparrow\) |
MSN\(+ x_{pos}\) | 0.785 (0.776, 0.798) \(\uparrow \uparrow\) | 0.426 (0.412, 0.465) \(\uparrow\) | 0.734\(^*\) (0.730, 0.737) \(\uparrow \uparrow\) | 0.261\(^*\) (0.257, 0.270) \(\uparrow \uparrow\) |
MSN\(+ x_{mort}\) | 0.778 (0.743, 0.811) \(\uparrow\) | 0.424 (0.412, 0.451) − | 0.737\(^*\) (0.734, 0.742) \(\uparrow \uparrow\) | 0.265\(^*\) (0.261, 0.275) \(\uparrow \uparrow\) |
MSN\(+ x_{icu}\) | 0.773 (0.739, 0.789) − | 0.422 (0.408, 0.445) − | 0.727\(^*\) (0.722, 0.731) \(\uparrow \uparrow\) | 0.256\(^*\) (0.251, 0.264) \(\uparrow \uparrow\) |
MSN\(+ x_{D}\) | 0.776 (0.757, 0.802) \(\uparrow\) | 0.419 (0.406, 0.439) − | 0.736\(^*\) (0.732, 0.740) \(\uparrow \uparrow\) | 0.267\(^*\) (0.262, 0.276) \(\uparrow \uparrow\) |
MSN\(+ x_{SM}\) | 0.796\(^{\dagger }\) (0.775, 0.813) \(\uparrow \uparrow\) | 0.427 (0.418, 0.473) \(\uparrow\) | 0.732\(^*\) (0.728, 0.735) \(\uparrow \uparrow\) | 0.258\(^*\) (0.254, 0.268) \(\uparrow \uparrow\) |
MSN\(+ x_{SI}\) | 0.771 (0.753, 0.788) − | 0.423 (0.402, 0.443) − | **0.738**\(^*\) (0.734, 0.741) \(\uparrow \uparrow\) | **0.270**\(^*\) (0.264, 0.278) \(\uparrow \uparrow\) |
MSN\(+ x_{D+SM}\) | 0.776 (0.756, 0.791) \(\uparrow\) | 0.412 (0.397, 0.447) \(\downarrow\) | 0.728\(^*\) (0.725, 0.732) \(\uparrow \uparrow\) | 0.254\(^*\) (0.250, 0.263) \(\uparrow \uparrow\) |
MSN\(+ x_{D+SI}\) | 0.757 (0.724, 0.769) \(\downarrow\) | 0.418 (0.404, 0.441) − | 0.728\(^*\) (0.724, 0.732) \(\uparrow \uparrow\) | 0.269\(^*\) (0.263, 0.279) \(\uparrow \uparrow\) |
MSN\(+ x_{SM+D}\) | 0.782 (0.761, 0.803) \(\uparrow\) | 0.431 (0.418, 0.452) \(\uparrow\) | 0.734\(^*\) (0.730, 0.738) \(\uparrow \uparrow\) | 0.263\(^*\) (0.258, 0.273) \(\uparrow \uparrow\) |
MSN\(+ x_{D+SM+SI}\) | 0.775 (0.753, 0.798) \(\uparrow\) | 0.427 (0.413, 0.452) \(\uparrow\) | 0.725\(^*\) (0.721, 0.728) \(\uparrow \uparrow\) | 0.251\(^*\) (0.248, 0.258) \(\uparrow \uparrow\) |
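
The marker rule described in the caption can be sketched as a small function. This is an illustrative reading of the caption, not the authors' code: the name `delta_marker` is hypothetical, the percentages are assumed to be absolute differences in percentage points (e.g., 0.785 vs. 0.770 is 1.5), and a difference of exactly 1.5 is assumed to receive the double arrow, as the MSN\(+ x_{pos}\) CheXpert AUROC entry suggests.

```python
def delta_marker(score: float, reference: float) -> str:
    """Return the caption's marker for a score compared with the reference model."""
    # Difference in percentage points (assumption: absolute, not relative).
    # Rounded to one decimal to avoid floating-point noise at the 0.5 and
    # 1.5 boundaries; the table reports scores to three decimals.
    diff = round((score - reference) * 100, 1)
    magnitude = abs(diff)
    if magnitude < 0.5:
        return "−"  # difference below 0.5: no arrow
    arrow = "↑" if diff > 0 else "↓"
    # Assumption: exactly 1.5 counts as a double arrow.
    return arrow * 2 if magnitude >= 1.5 else arrow


# Reference values from the vanilla MSN row (CheXpert AUROC and AUPRC).
MSN_AUROC, MSN_AUPRC = 0.770, 0.420

# A few CheXpert entries from the table: (label, score, reference).
examples = [
    ("x_sex AUROC", 0.768, MSN_AUROC),
    ("x_pos AUROC", 0.785, MSN_AUROC),
    ("x_mort AUROC", 0.778, MSN_AUROC),
    ("x_D+SM AUPRC", 0.412, MSN_AUPRC),
]
for label, score, reference in examples:
    print(label, delta_marker(score, reference))
```

Applied to the rows above, this reading reproduces the markers printed in the table for those entries.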