Fig. 3

Box plots of the Dice similarity coefficient (DSC), 95th percentile Hausdorff distance (HD95), and average Likert ratings for different organs-at-risk, comparing the three AI software tools. Asterisks indicate statistically significant differences between pairs of AI software tools based on the paired Wilcoxon signed-rank test with Bonferroni correction (∗: 0.01 < p < 0.05; ∗∗: 0.001 < p < 0.01).
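
The asterisk convention in the caption can be illustrated with a minimal sketch. It rests on several assumptions that the caption does not confirm: the Bonferroni correction is applied by multiplying each raw p-value by the number of pairwise comparisons (three, for three software tools), the thresholds are applied to the adjusted p-values, and boundary values are assigned to the less significant band. The software names `A`, `B`, `C`, the raw p-values, and the helper names `bonferroni` and `stars` are hypothetical and are not taken from the study.

```python
from itertools import combinations


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw p-value by the number of
    comparisons and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def stars(p_adj):
    """Map an adjusted p-value to the caption's asterisk notation.

    Assumption: boundary values (exactly 0.01 or 0.001) fall into the less
    significant band; the caption uses strict inequalities on both sides.
    """
    if p_adj >= 0.05:
        return ""            # not significant, no marker
    if p_adj >= 0.01:
        return "*"           # 0.01 < p < 0.05
    if p_adj >= 0.001:
        return "**"          # 0.001 < p < 0.01
    # The caption defines no symbol below 0.001; this label is a placeholder.
    return "p < 0.001"


# Hypothetical raw p-values for one organ and one metric, e.g. as returned by
# a paired Wilcoxon signed-rank test on per-patient DSC values.
software = ["A", "B", "C"]
raw_p = {("A", "B"): 0.002, ("A", "C"): 0.01, ("B", "C"): 0.03}

pairs = list(combinations(software, 2))  # (A,B), (A,C), (B,C)
adjusted = dict(zip(pairs, bonferroni([raw_p[pair] for pair in pairs])))
labels = {pair: stars(p) for pair, p in adjusted.items()}

for pair in pairs:
    print(pair, round(adjusted[pair], 4), repr(labels[pair]))
```

Under these assumptions, a raw p-value of 0.03 no longer earns an asterisk after correction, which is why fewer comparisons are flagged than an uncorrected test would suggest.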