Extended Data Fig. 1: Evaluation results for models on the ISIC dataset for No Finding prediction. | Nature Medicine

From: The limits of fair medical imaging AI in real-world generalization

a, Dataset statistics. b, Subgroup fairness gaps of the empirical risk minimization (ERM) model, defined in terms of the false-positive rate (FPR). Each subgroup contains at least 100 samples for analysis (age: subgroup ‘60–80’ vs ‘18–40’; sex: subgroup ‘female’ vs ‘male’). Disparities in FPR are small and, in the case of age, not statistically significant. Each bar and its error bar indicate the mean and standard deviation across 3 independent runs. c, Trade-off between the fairness gap and overall AUROC for all trained models, evaluated with respect to the sensitive attributes age and sex, respectively. Most models, including ERM, achieve a good fairness-performance trade-off.
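As an illustration of the quantity plotted in panel b, the following is a minimal sketch of an FPR-based fairness gap, summarized as a mean and standard deviation over 3 runs. It is not the authors' code. It assumes the gap is the difference between the FPRs of the two named subgroups (first minus second) and that the error bar is a sample standard deviation; the function names `false_positive_rate` and `fpr_gap` and the toy labels are hypothetical.

```python
from statistics import mean, stdev


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), computed over samples whose true label is 0."""
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not negatives:
        raise ValueError("no negative samples in this subgroup")
    return sum(negatives) / len(negatives)


def fpr_gap(y_true, y_pred, groups, group_a, group_b):
    """Assumed definition: FPR(group_a) - FPR(group_b)."""
    def subgroup_fpr(name):
        idx = [i for i, g in enumerate(groups) if g == name]
        return false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return subgroup_fpr(group_a) - subgroup_fpr(group_b)


# Toy data (invented for illustration): 1 = positive prediction/label, 0 = negative.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
groups = ["female"] * 5 + ["male"] * 5

# Binary predictions from three hypothetical independent training runs.
runs = [
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
]

gaps = [fpr_gap(y_true, run, groups, "female", "male") for run in runs]
gap_mean = mean(gaps)   # height of the bar
gap_std = stdev(gaps)   # error bar (sample standard deviation, an assumption)
print(f"FPR gap: {gap_mean:.3f} +/- {gap_std:.3f}")
```

A gap near zero whose error bar overlaps zero corresponds to the "small" disparity described in the caption; a formal significance test would be needed to support a claim of non-significance.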