Figure 4
From: Generalising from conventional pipelines using deep learning in high-throughput screening workflows

Comparison of segmentation accuracy of CIP, CDL and MDL. (A) Box plot of the overall metric scores of the three approaches. Mean aggregation is recommended over Global and Weighted aggregation for our strongly class-imbalanced dataset. (B) Accuracy, Intersection over Union (IoU) and Mean BFS of the three methods for each class: Background, Phagophore and Autolysosome. The large class imbalance between the background and the other classes explains the small differences in performance for the background class. In this regard, Mean BFS is a more representative metric for imbalanced scenarios. (C) Confusion matrix for each approach. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class. In general, CIP scores lowest, followed by CDL, with MDL scoring highest.
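Why Mean aggregation is preferred over Global and Weighted aggregation under strong class imbalance can be illustrated from a confusion matrix. The sketch below is a minimal Python/NumPy example that assumes the common definitions of these metrics. Global accuracy is correct pixels over all pixels. Per-class accuracy is correct pixels over the actual pixels of that class. IoU is TP / (TP + FP + FN). Mean is the unweighted average over classes, and Weighted is the average weighted by each class's actual pixel count. The caption does not state the exact formulas used. The sketch follows the orientation described in (C), with rows as predicted classes and columns as actual classes. The function name `segmentation_metrics` and the pixel counts in the usage example are hypothetical and are not data from the study.

```python
import numpy as np


def segmentation_metrics(conf) -> dict:
    """Compute common segmentation metrics from a confusion matrix.

    Assumption: conf[i, j] counts pixels predicted as class i whose actual
    class is j (rows = predicted, columns = actual). A class with no actual
    pixels yields NaN for its per-class accuracy (0/0).
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                # correctly classified pixels per class
    actual = conf.sum(axis=0)         # column sums: pixels per actual class
    predicted = conf.sum(axis=1)      # row sums: pixels per predicted class
    fn = actual - tp                  # actual pixels of the class that were missed
    fp = predicted - tp               # pixels wrongly assigned to the class

    per_class_acc = tp / actual
    per_class_iou = tp / (tp + fp + fn)

    return {
        "global_accuracy": tp.sum() / conf.sum(),
        "mean_accuracy": per_class_acc.mean(),
        "mean_iou": per_class_iou.mean(),
        # Weighted IoU: per-class IoU weighted by actual pixel count,
        # so it is dominated by the largest (background) class.
        "weighted_iou": np.sum(per_class_iou * actual) / actual.sum(),
        "per_class_accuracy": per_class_acc,
        "per_class_iou": per_class_iou,
    }


if __name__ == "__main__":
    # Hypothetical counts, class order: Background, Phagophore, Autolysosome.
    conf = np.array([[9650, 100, 50],
                     [30, 100, 0],
                     [20, 0, 50]])
    m = segmentation_metrics(conf)
    for key in ("global_accuracy", "weighted_iou", "mean_accuracy", "mean_iou"):
        print(f"{key}: {m[key]:.3f}")
    # Prints:
    # global_accuracy: 0.980
    # weighted_iou: 0.963
    # mean_accuracy: 0.665
    # mean_iou: 0.610
```

With these illustrative counts, Global accuracy and Weighted IoU stay close to the background score. Mean accuracy and Mean IoU instead expose the weaker segmentation of the two minority classes, which is the behaviour that motivates preferring Mean aggregation for a strongly imbalanced dataset.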