Fig. 5: Distribution of segmentation metrics across the test cohort.

Left: Dice scores (higher is better); Right: 95th-percentile Hausdorff distance, HD95 (lower is better). Prompt-Mamba-AF exhibits a tighter distribution with fewer outliers than U-Net and TransUNet, indicating greater robustness.