Fig. 2: FlexFair achieves superior fairness and accuracy across diverse medical datasets.
From: Achieving flexible fairness metrics in federated medical imaging

We compare FlexFair with six baseline methods (FedAvg, FedNova, FedProx, SCAFFOLD, FairFed, and FairMixup) across four datasets: polyp, fundus vascular, cervical cancer, and skin disease. Each method is evaluated on fairness metrics (EA, DP, EO) and on accuracy metrics (Dice score for segmentation tasks and accuracy for diagnostic tasks). a–c show the Pareto fronts for the segmentation datasets, highlighting the trade-off between fairness and accuracy; FlexFair, highlighted in red, consistently achieves higher Dice scores and smaller fairness gaps. d–f show the maximum Dice-score gap across sites, where lower values indicate greater fairness; FlexFair outperforms the other methods by yielding the smallest maximum Dice gap. g–j analyze fairness and accuracy for the diagnostic task on the skin disease dataset, showing that FlexFair balances demographic parity (DP) and equal opportunity (EO) across the age and gender attributes. k–n confirm that FlexFair achieves the lowest maximum Dice gap values, indicating equitable performance across metrics and datasets. Source data are provided as a Source Data file.
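The caption scores fairness by the maximum Dice gap across sites and plots Pareto fronts of accuracy against fairness. The sketch below illustrates both quantities under stated assumptions: the maximum gap is taken as the best per-site Dice score minus the worst, and a method is placed on the Pareto front when no other method has an equal-or-higher Dice score and an equal-or-lower gap while being strictly better in at least one of the two. These definitions and the function names (`dice_score`, `max_gap`, `pareto_front`) are hypothetical and for illustration only; they are not the paper's code, and the site names, method names, and numbers in the usage example are made up.

```python
from typing import Dict, List, Sequence, Tuple


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement (assumption)
    return 2.0 * intersection / total


def max_gap(per_site: Dict[str, float]) -> float:
    """Assumed max gap: best per-site score minus worst; lower means fairer."""
    values = list(per_site.values())
    return max(values) - min(values)


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return methods not dominated on (mean Dice: higher is better, gap: lower is better)."""
    front = []
    for name, (acc, gap) in points.items():
        dominated = any(
            a2 >= acc and g2 <= gap and (a2 > acc or g2 < gap)
            for other, (a2, g2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    # Hypothetical toy inputs, not results from the paper.
    print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))  # prints 0.5
    sites = {"site_A": 0.90, "site_B": 0.80, "site_C": 0.85}
    print(round(max_gap(sites), 2))  # prints 0.1
    methods = {
        "method_1": (0.85, 0.05),
        "method_2": (0.80, 0.10),
        "method_3": (0.88, 0.12),
    }
    print(pareto_front(methods))  # prints ['method_1', 'method_3']
```

In this toy case `method_2` is dominated by `method_1`, which has both a higher Dice score and a smaller gap, while `method_1` and `method_3` each win on one axis, so both remain on the front.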