Fig. 2: AUC and PFD of DenseNet across subgroups of age (left), sex (middle), and race (right) in the four tasks.
From: Improving model fairness in image-based computer-aided diagnosis

a COVID-19 detection on the MIDRC dataset; b thorax abnormality detection on the MIMIC-CXR dataset; c POAG detection on the OHTS dataset; d AMD detection on the AREDS dataset. Results are averaged over five models, each trained and tested on a different 80%/20% training/testing split of the data; standard deviations are also shown. Race is not summarized for AMD detection because the Black subgroup in the AREDS dataset is too small (<3.7%) to be studied reliably.