Table 2 Summary of datasets and data splits. We summarize the sizes of the training, validation, and test sets in terms of the number of images used in our experiments. We used MIMIC-CXR for self-supervised pretraining and downstream classification, and CheXpert only to obtain an external test set. For NIH-14, we used its training set during downstream classification because its labels differ from those of MIMIC-CXR.

From: Multimodal masked siamese network improves chest X-ray representation learning

Dataset     Purpose               Training   Validation   Test
MIMIC-CXR   Internal validation   325,188    15,282       36,625
CheXpert    External validation   -          -            688
NIH-14      External validation   32,457     3,567        15,735