Figure 9

Predictions of the highest (2-client IID) and lowest (32-client IID) F1-scoring models, for the 10-class and 20-class models, compared with the data labels (ground truth). The two patches were selected from the centralised test set: one with small fields and high class diversity, and one with large fields and low class diversity.