Figure 3
From: Generalising from conventional pipelines using deep learning in high-throughput screening workflows

Measuring deep learning (DL) generalisation robustness with noisy label data for semantic segmentation. Results of the three complementary methods assessing the capacity of the Convolutional Neural Network (CNN) to overcome segmentation errors. Comparison of the CNN deep learning (CDL) prediction and conventional image processing (CIP, the method used for label generation). (A) Results of the qualitative analysis: expert rating. Violin plots represent the distribution of the scores given to each method in the double-blinded test. The CDL method scored significantly higher for all the experts. (B,C) Results of the quantitative analysis: detection as a surrogate metric. (B) Scatter plots representing the misclassification ratio of a method (CIP or CDL) compared to human annotations. Each point represents the result of a given expert for a given class and method. Therefore, there are 4 points, one for each expert, per class and method. The higher the error ratio, the worse the method's performance. (C) Plots represent the average number of bounding box intersections between the manual reference generated by the four experts and the evaluated method. Error bars represent the standard deviation. Overlap levels are split as follows: on the left, lower overlap (between 0.1 and 0.49); on the right, higher overlap (0.5 to 1). For both classes (Phagophore and Autolysosome) and both overlap ranges, the number of detected events is higher in the CDL segmentation than in the CIP one. (D,E) Results of the quantitative analysis: detection as a surrogate metric. CIP results are presented in light orange, CDL results in dark blue. (D) Aggregated metrics. (E) Metrics broken down per class. The CDL method showed better performance, in line with the results obtained with the previous methods.
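
The overlap binning described for panel (C) can be illustrated with a minimal Python sketch. The caption does not state how overlap is measured, so this sketch assumes intersection over union (IoU) of axis-aligned bounding boxes, and it assumes that any value from 0.1 up to (but not including) 0.5 falls in the lower range. All function names (`overlap_ratio`, `overlap_bin`, `count_detections`) are hypothetical and are not taken from the article.

```python
from typing import Dict, List, Optional, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (assumed overlap measure)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlap_bin(ratio: float) -> Optional[str]:
    """Map an overlap ratio to the 'low' or 'high' range; None means no detection."""
    if ratio >= 0.5:
        return "high"
    if ratio >= 0.1:
        return "low"
    return None


def count_detections(reference: List[Box], predicted: List[Box]) -> Dict[str, int]:
    """Count reference boxes whose best-matching predicted box falls in each range."""
    counts = {"low": 0, "high": 0}
    for ref in reference:
        best = max((overlap_ratio(ref, p) for p in predicted), default=0.0)
        label = overlap_bin(best)
        if label is not None:
            counts[label] += 1
    return counts
```

Under these assumptions, calling `count_detections` once per expert, class and method, then averaging the counts across the four experts, would give values comparable in spirit to the bars in panel (C).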