Figure 4
From: An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy

Artefact semantic segmentation performance. (a) Segmentation masks predicted by the top-ranked methods (black bordered) for a representative image from the test set. The ground truth (GT) mask (top left, blue bordered) and the baseline methods (middle and bottom left, green bordered) are shown for comparison. (b) Box plots of s-score for each artefact class over all teams. Classes are plotted in increasing order of region-of-interest area. Whiskers are plotted at 1.5 × the inter-quartile range beyond the upper and lower quartiles. Outliers are plotted as black points. (c) Error and swarm plots of s-score. Teams are ordered by decreasing mean s-score. Error bars show ± one standard deviation of class-specific scores relative to the global mean score for each team. The red dashed line marks the s-score obtained if blank segmentation masks were predicted. ‘*’ denotes a statistically significant difference (p < 0.05) in ranked performance relative to the U-Net baseline following the Friedman test with Bonferroni-Dunn post-hoc testing. (d) Average s-score rank of individual methods, considering artefact classes independently. Solid black lines join methods with no significant rank difference following Friedman-Nemenyi post-hoc analysis (p < 0.05). Colored annotation regions in (a), colored bars in (b) and colored points in (c) in red, green, blue, violet, orange, yellow and brown represent the specularity, artefact, saturation, blur, contrast, bubbles and instrument classes, respectively.
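The whisker and outlier convention described for panel (b) can be illustrated with a minimal sketch. It assumes the common Tukey convention, in which each whisker ends at the most extreme data point lying within 1.5 × the inter-quartile range of the nearer quartile and anything beyond that is drawn as an outlier. The function name `tukey_box_stats` and the example scores are hypothetical and are not taken from the paper or its data.

```python
from statistics import quantiles


def tukey_box_stats(scores, k=1.5):
    """Return quartiles, whisker ends and outliers under the k x IQR rule.

    Assumption: whiskers end at the most extreme data points that still lie
    within k * IQR of the lower and upper quartiles (Tukey convention).
    """
    data = sorted(scores)
    # The 'inclusive' method interpolates linearly between data points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical s-scores for one artefact class across nine teams.
example_scores = [0.10, 0.42, 0.45, 0.47, 0.50, 0.52, 0.55, 0.58, 0.95]
stats = tukey_box_stats(example_scores)
print(stats)
```

In this made-up example the two extreme scores fall outside the fences and would be drawn as black points, while the whiskers stop at the nearest scores inside the fences.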