Table 2 Summary of datasets and data splits. We report the sizes of the training, validation, and test sets, in number of images, used in our experiments. MIMIC-CXR was used for self-supervised pretraining and downstream classification, while CheXpert served only as an external test set. For NIH-14, we also used its training set during downstream classification, since its labels differ from those of MIMIC-CXR.
From: Multimodal masked siamese network improves chest X-ray representation learning
| Dataset | Purpose | Training | Validation | Test |
|---|---|---|---|---|
| MIMIC-CXR | Internal validation | 325,188 | 15,282 | 36,625 |
| CheXpert | External validation | - | - | 688 |
| NIH-14 | External validation | 32,457 | 3,567 | 15,735 |