Fig. 1
From: Multimodal masked siamese network improves chest X-ray representation learning

Overview of the multimodal MSN pretraining framework. We use the EHR data \(x_{ehr}\) as an auxiliary modality during model pretraining. For downstream classification, we freeze the target encoder \(f_{target}\) and use it as a feature extractor, training only a classification head for multi-label image classification. Components of the vanilla MSN are outlined in black.
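
The downstream protocol described above (frozen \(f_{target}\) plus a trained classification head) can be sketched as follows. This is a minimal, hypothetical PyTorch illustration: the small MLP stands in for the pretrained target encoder (a ViT in practice), and the feature and label dimensions are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the pretrained target encoder f_target
# (in the actual framework this would be a ViT pretrained with the
# multimodal MSN objective).
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))

# Freeze the encoder: downstream it acts purely as a feature extractor.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

num_labels = 14  # illustrative number of chest X-ray findings
head = nn.Linear(16, num_labels)  # only the head is trained

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
# Multi-label classification: an independent sigmoid per label.
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, 32)                             # batch of image features
y = torch.randint(0, 2, (8, num_labels)).float()   # multi-hot label vectors

with torch.no_grad():
    feats = encoder(x)        # frozen forward pass, no gradients stored
logits = head(feats)
loss = criterion(logits, y)
loss.backward()               # gradients flow only into the head
optimizer.step()
```

Freezing the encoder keeps the pretrained representation intact and makes the downstream evaluation a probe of representation quality rather than of full fine-tuning.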