Figure 3
From: An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy

Out-of-sample generalization of participant detection methods. (a) Plot of detection (mAP_d) and generalization (mAP_g) mAP values per team: overall (gray) and per artefact class (colored; the legend is provided at the top). The black dashed line represents the ideal identity line. R denotes Pearson's correlation coefficient. P denotes the p-value for the null hypothesis that the slope of the least-squares regression line is zero. (b) Plot of deviation score (score_d) against generalization mAP (mAP_g) per team. Team markers in panels (a,b) are plotted large to small with decreasing detection score, score_d. (c) Paired bar plots of mean team detection and generalization mAP scores, denoted by 'd' and 'g', respectively, for each artefact class. Each bar shows the overall team mean score, and error bars show ± 1 standard deviation of team scores about that mean. A paired t-test was used to test for a difference in means: n.s., not significant; *p < 0.05; **p < 0.01. The same color scheme is used for individual artefact classes in all panels. Colored points in (a) and colored bars in (c) in red, green, blue, violet, orange, yellow and brown represent the specularity, artefact, saturation, blur, contrast, bubbles and instrument classes, respectively. Similarly, gray points in (a,b) represent the overall performance of each team. Star, diamond and square markers represent the baseline methods used for comparison.
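The statistics named in this caption can be illustrated with a minimal sketch. It assumes the standard textbook definitions of Pearson's R, the t statistic for a zero-slope null hypothesis (which, for simple linear regression, equals r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom), and the paired t statistic. The function names and the per-team scores below are hypothetical and are not taken from the paper, which does not state which software was used.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_t_statistic(x, y):
    """t statistic for H0: least-squares slope is zero (df = n - 2).

    Undefined when |r| == 1 (perfect fit), since the denominator vanishes.
    """
    r = pearson_r(x, y)
    n = len(x)
    return r * math.sqrt((n - 2) / (1 - r * r))


def paired_t_statistic(d, g):
    """Paired t statistic for H0: mean(d - g) is zero (df = n - 1)."""
    diffs = [a - b for a, b in zip(d, g)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


if __name__ == "__main__":
    # Hypothetical per-team scores, for illustration only (not from the paper).
    map_d = [0.30, 0.35, 0.25, 0.40, 0.28]  # detection mAP per team
    map_g = [0.28, 0.33, 0.26, 0.36, 0.27]  # generalization mAP per team
    print("R:", pearson_r(map_d, map_g))
    print("slope t:", slope_t_statistic(map_d, map_g))
    print("paired t:", paired_t_statistic(map_d, map_g))
```

Converting either t statistic to a p-value requires the Student t distribution, which the Python standard library does not provide; if SciPy is available, `scipy.stats.linregress` and `scipy.stats.ttest_rel` report these p-values directly.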