Fig. 3

Comparison of segmentation and quantification methods on the Lucchi++ benchmark dataset, validating the efficiency of the mask-based automated quantification module. (A) Overview of the segmentation and quantification methods tested on the Lucchi++ dataset. The input image is segmented either manually with the semi-automated tool or with the deep learning (DL)-based probabilistic interactive model. The resulting segmentation masks are used for mask-based automated quantification, while manual quantification with ImageJ is performed directly on the input image. (B) Segmentation results compared to the gold standard. Segmentation masks are displayed for the semi-automated method and the probabilistic interactive model. Differences from the gold standard are shown, with false positives in red, false negatives in blue, true positives in white, and true negatives in black. (C) Use of the probabilistic interactive model reduced both false positives and false negatives (mean ± SD, p = 2.0 × 10⁻⁸ and p = 2.0 × 10⁻⁴, Mann–Whitney U test. N = 15). (D) Comparison of IoU values derived from semi-automated (SA) and probabilistic interactive (PI) segmentation (whiskers show min-to-max range, p = 4.4 × 10⁻⁶, independent two-sample t-test. N = 15). The arithmetic mean is indicated by a + symbol in each box plot. (E) Comparison of fold changes in quantified mitochondrial morphological parameters (height, width, area, and count) relative to the gold standard. Multiple parameters were extracted from each image. The gold standard is based on manual quantification using ImageJ, whereas the SA and PI results were derived from the automated quantification module applied to the segmentation masks obtained by each respective method (mean ± SD, independent two-sample t-test. N = 230–235).
(F) Box-and-whisker plot of the elapsed time for each method (whiskers show the 10th–90th percentiles, p = 2.0 × 10⁻⁸, Kruskal–Wallis test followed by Dunn’s multiple comparisons test. N = 15). The average time for manual quantification was 641.9 s, whereas probabilistic interactive segmentation combined with automated quantification required an average of 62.33 s, a 90.3% reduction in analysis time. Statistical significance is indicated by asterisks.
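
As an illustration of the quantities reported in panels B–D and F, the following minimal sketch shows one way to derive pixel-wise error counts, IoU, and a color-coded difference map from a predicted mask and a gold-standard mask, and to check the reported time reduction. It assumes binary 2D masks of equal shape held in NumPy arrays; the function names (`compare_masks`, `error_map`, `percent_reduction`) are hypothetical and are not taken from the authors' software.

```python
import numpy as np


def compare_masks(pred, gold):
    """Pixel-wise TP/FP/FN/TN counts and IoU for two binary masks (illustrative)."""
    pred = np.asarray(pred, dtype=bool)
    gold = np.asarray(gold, dtype=bool)
    if pred.shape != gold.shape:
        raise ValueError("masks must have the same shape")
    tp = int(np.sum(pred & gold))
    fp = int(np.sum(pred & ~gold))
    fn = int(np.sum(~pred & gold))
    tn = int(np.sum(~pred & ~gold))
    union = tp + fp + fn
    # Assumption: two empty masks are treated as a perfect match.
    iou = tp / union if union else 1.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "iou": iou}


def error_map(pred, gold):
    """RGB difference map using the panel B colors:
    TP white, TN black, FP red, FN blue."""
    pred = np.asarray(pred, dtype=bool)
    gold = np.asarray(gold, dtype=bool)
    out = np.zeros(pred.shape + (3,), dtype=np.uint8)  # TN stays black
    out[pred & gold] = (255, 255, 255)
    out[pred & ~gold] = (255, 0, 0)
    out[~pred & gold] = (0, 0, 255)
    return out


def percent_reduction(baseline, new):
    """Relative reduction of `new` versus `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    # Toy 2x2 masks, not data from the study.
    pred = [[1, 1], [0, 0]]
    gold = [[1, 0], [1, 0]]
    print(compare_masks(pred, gold))
    # Mean times quoted in panel F.
    print(round(percent_reduction(641.9, 62.33), 1))
```

Applied to the mean times in panel F, `percent_reduction(641.9, 62.33)` gives (641.9 − 62.33) / 641.9 ≈ 0.903, consistent with the stated 90.3% reduction.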