Fig. 5
From: GastroHUN an Endoscopy Dataset of Complete Systematic Screening Protocol for the Stomach

Bootstrap distribution of Macro F1-score rankings across architectures, estimated by repeated resampling (b = 100 iterations). Each point is the rank that one architecture obtained in one bootstrap iteration. The spread of an architecture’s ranks indicates how stable its performance is under sampling variability.
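The ranking procedure summarized in this caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes that resampling is done per test sample with replacement, that rank 1 is the highest Macro F1, and that ties are broken by model order; the caption specifies none of these. The function names, the model names (`model_a`, `model_b`, `model_c`), and the synthetic labels are all hypothetical.

```python
import random
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (a class with no support and no predictions scores 0)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_rankings(y_true, predictions, b=100, seed=0):
    """Return {model_name: [rank in each bootstrap iteration]}, where rank 1 is best."""
    rng = random.Random(seed)
    labels = sorted(set(y_true))
    n = len(y_true)
    ranks = {name: [] for name in predictions}
    for _ in range(b):
        # Assumption: resample test items with replacement, same indices for every model.
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        scores = {
            name: macro_f1(yt, [pred[i] for i in idx], labels)
            for name, pred in predictions.items()
        }
        # Assumption: ties keep the insertion order of `predictions` (stable sort).
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            ranks[name].append(rank)
    return ranks


# Synthetic toy data: three hypothetical models with increasing label noise.
_rng = random.Random(42)
y_true = [_rng.randrange(3) for _ in range(200)]


def corrupt(labels, rate):
    """Replace each label with a random class with probability `rate`."""
    return [_rng.randrange(3) if _rng.random() < rate else y for y in labels]


predictions = {
    "model_a": list(y_true),          # perfect predictions
    "model_b": corrupt(y_true, 0.2),  # mild noise
    "model_c": corrupt(y_true, 0.5),  # heavy noise
}

ranks = bootstrap_rankings(y_true, predictions, b=100, seed=0)
for name, r in ranks.items():
    print(name, dict(Counter(r)))  # how often each rank was obtained
```

Plotting the per-model rank lists as points, one per iteration, would give a figure of the same kind: a model whose ranks cluster at a single value is stable under resampling, while one whose ranks spread over several values is sensitive to which samples are drawn.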