Extended Data Table 4 Comparing four large-scale pretrained foundation models’ ability to tolerate sex-related bias

All models are evaluated on CheXpert³⁴ via linear probing, using sex-exclusive training/test splits following Larrazabal et al.³³. Sex bias is characterized by a significant drop in performance when the training and test data come from opposite sexes compared with when they come from the same sex. A robust model should therefore yield more unbiased results, that is, results showing no statistically significant performance difference (at p = 0.05) between datasets composed exclusively of male data and those composed exclusively of female data. Ark⁺ has 13 unbiased results (bolded), demonstrating the greatest resilience to sex-imbalanced data among the four models. All unbiased results are highlighted in green.
*Abbreviations: Dz: Disease, EC: Enlarged Cardiomediastinum, CM: Cardiomegaly, LO: Lung Opacity, LL: Lung Lesion, ED: Edema, CS: Consolidation, PN: Pneumonia, AT: Atelectasis, PX: Pneumothorax, PE: Pleural Effusion, PO: Pleural Other, FR: Fracture.
↑A greater number of unbiased results indicates that the model tolerates sex-related bias more effectively.
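
The "unbiased" criterion above can be illustrated with a minimal sketch. The caption does not name the statistical test used, so the exact two-sided permutation test below is an assumption swapped in for illustration, not the authors' procedure. The function names (`permutation_p_value`, `is_unbiased`, `count_unbiased`) and all scores are hypothetical; only the p = 0.05 threshold comes from the caption.

```python
from itertools import combinations
from statistics import mean

ALPHA = 0.05  # significance level stated in the caption


def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference in means.

    ASSUMPTION: the source does not specify its test; this is a stand-in.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    observed = abs(mean(a) - mean(b))
    pooled_sum = sum(pooled)
    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled scores as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (pooled_sum - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


def is_unbiased(same_sex_scores, opposite_sex_scores, alpha=ALPHA):
    """A result counts as unbiased when the difference is not significant."""
    return permutation_p_value(same_sex_scores, opposite_sex_scores) >= alpha


def count_unbiased(results, alpha=ALPHA):
    """Count unbiased entries in a {label: (same_sex, opposite_sex)} mapping."""
    return sum(is_unbiased(s, o, alpha) for s, o in results.values())


# Hypothetical per-run AUC scores, not taken from the table.
example = {
    "CM": ([0.80, 0.81, 0.79, 0.80, 0.82], [0.80, 0.81, 0.79, 0.80, 0.82]),
    "PX": ([0.80, 0.81, 0.79, 0.80, 0.82], [0.60, 0.61, 0.59, 0.60, 0.62]),
}
print(count_unbiased(example))
```

Under these assumptions, a model's tally in the table would correspond to applying `count_unbiased` across every disease and test-sex combination; the exact enumeration is only practical for a small number of runs per condition.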
