Extended Data Fig. 3: Trade-offs between the FPR gap and the FNR gap for each task and attribute, for models trained on MIMIC-CXR or CheXpert and evaluated on the same dataset for (a) No Finding and (b) Effusion prediction. | Nature Medicine

From: The limits of fair medical imaging AI in real-world generalization

Extended Data Fig. 3

We evaluate these metrics across age (‘80–100’ vs. ‘18–40’), sex (‘female’ vs. ‘male’), race (‘White’ vs. ‘Black’), and the intersection of sex and race (‘White male’ vs. ‘Black female’). For the most part, we find a positive correlation between the FPR gap and the FNR gap, indicating that fairer models achieve fairness with respect to both FPR and FNR (that is, equalized odds). All p values are calculated using a two-sided t-test. The center line and the shaded band denote the mean and the 95% CI, respectively.
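As an illustration of the quantities plotted, the following is a minimal sketch of how per-model FPR and FNR gaps between two subgroups could be computed, and how the correlation between those gaps across models could be tested. It rests on several assumptions that the caption does not state: predictions are already binarized at some threshold, each gap is the absolute difference between the two subgroups, and the reported p value tests whether the Pearson correlation differs from zero. The function names `fpr_fnr`, `odds_gaps`, and `pearson_t` are hypothetical and are not taken from the authors' code.

```python
import math
from typing import Sequence, Tuple


def fpr_fnr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (FPR, FNR) for binary labels and binarized predictions."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fp / (fp + tn), fn / (fn + tp)


def odds_gaps(y_true, y_pred, groups, group_a, group_b) -> Tuple[float, float]:
    """Absolute FPR and FNR gaps between two subgroups, e.g. '80-100' vs '18-40'.

    Assumption: a gap is |rate(group_a) - rate(group_b)|; the figure may use
    a different (e.g. signed) definition.
    """
    rates = {}
    for g in (group_a, group_b):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = fpr_fnr([y_true[i] for i in idx], [y_pred[i] for i in idx])
    fpr_gap = abs(rates[group_a][0] - rates[group_b][0])
    fnr_gap = abs(rates[group_a][1] - rates[group_b][1])
    return fpr_gap, fnr_gap


def pearson_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Pearson r between per-model FPR and FNR gaps, plus its t statistic.

    Under the null of zero correlation, t follows a t distribution with
    n - 2 degrees of freedom. Undefined when |r| == 1 or n < 3.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


# Toy example: one model's predictions for two subgroups 'A' and 'B'.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gaps = odds_gaps(y_true, y_pred, groups, "A", "B")

# Toy per-model gaps (one entry per trained model) for the correlation test.
r, t = pearson_t([1, 2, 3, 4], [1, 3, 2, 4])
```

In this toy case subgroup A has FPR 0.5 and FNR 0 while subgroup B has FPR 0 and FNR 0.5, so both gaps equal 0.5. If SciPy is available, a two-sided p value for the correlation can be obtained as `2 * scipy.stats.t.sf(abs(t), n - 2)`, where `n` is the number of models; `scipy.stats.pearsonr` returns an equivalent p value directly.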
