Fig. 4: Benchmarking and comparison of nuclear segmentation performance of pre-trained deep learning models on the merged dataset stratified by tissue type.

A Pancreas, B lung, C skin, D tonsil, E colon, F breast, and G tongue tissue. (i) Mean F1-score (averaged across ROIs) evaluated at an IoU threshold of 0.5, with error bars showing the 95% confidence interval. An IoU threshold of 0.5 is the most lenient threshold that ensures at most one true-positive predicted nucleus per ground-truth nucleus. (ii) Mean F1-score (averaged across ROIs) at varying IoU thresholds, with the area under the curve shown. A higher IoU threshold imposes a stricter condition for classifying a predicted nucleus as a true positive.
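
The two metrics in panels (i) and (ii) can be illustrated with a minimal sketch for a single ROI. It is not the authors' evaluation code. The function names, the toy masks, the threshold grid (0.5 to 0.9 in steps of 0.1), and the trapezoidal integration are all assumptions made for illustration. The sketch also assumes that a prediction counts as a true positive when its IoU with a ground-truth nucleus strictly exceeds the threshold, which keeps matches one-to-one for thresholds of 0.5 and above; the convention used in the figure may differ.

```python
import numpy as np

def iou_matrix(gt, pred):
    """Pairwise IoU between labelled instances (label 0 is background)."""
    gt_ids = np.unique(gt)
    gt_ids = gt_ids[gt_ids != 0]
    pred_ids = np.unique(pred)
    pred_ids = pred_ids[pred_ids != 0]
    iou = np.zeros((len(gt_ids), len(pred_ids)))
    for i, g in enumerate(gt_ids):
        g_mask = gt == g
        for j, p in enumerate(pred_ids):
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            iou[i, j] = inter / union
    return iou

def f1_at_threshold(iou, thr):
    """F1-score for one ROI; only valid for thr >= 0.5 (one-to-one matches)."""
    n_gt, n_pred = iou.shape
    # A ground-truth nucleus is detected if some prediction exceeds thr.
    tp = int((iou > thr).any(axis=1).sum())
    fp = n_pred - tp  # predictions without a matching ground-truth nucleus
    fn = n_gt - tp    # ground-truth nuclei without a matching prediction
    denom = 2 * tp + fp + fn
    # F1 is undefined when both masks are empty; this sketch returns 0.0.
    return 2 * tp / denom if denom else 0.0

# Toy ROI (hypothetical data): two ground-truth nuclei, two predictions.
gt = np.zeros((6, 6), dtype=int)
gt[0:2, 0:2] = 1          # 4-pixel nucleus
gt[3:6, 3:6] = 2          # 9-pixel nucleus
pred = np.zeros((6, 6), dtype=int)
pred[0:2, 0:2] = 1        # exact match, IoU = 1
pred[3:6, 3:5] = 2        # partial match, IoU = 6/9

iou = iou_matrix(gt, pred)
thresholds = np.array([0.5, 0.6, 0.7, 0.8, 0.9])  # assumed grid
f1_curve = np.array([f1_at_threshold(iou, t) for t in thresholds])

# Area under the F1-vs-threshold curve via the trapezoidal rule.
auc = float(np.sum((f1_curve[1:] + f1_curve[:-1]) / 2 * np.diff(thresholds)))
```

On these toy masks the F1-score is 1.0 at a threshold of 0.5 and falls to 0.5 at 0.7, because the partial match (IoU of 6/9) no longer qualifies as a true positive. The resulting area under the curve is about 0.275. Reproducing the panels would additionally require averaging these per-ROI values across ROIs within each tissue type.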