Fig. 3: Distribution of cross-domain Dice scores across methods.

Each violin shows the distribution of case-level Dice scores for one method when models are trained on one dataset and evaluated on unseen domains. The AMAP distribution is narrower and shifted toward higher values than those of the baselines, indicating both a higher average Dice and lower variance across cases.
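For readers who want to reproduce this kind of summary, the following is a minimal sketch of how case-level Dice scores and their spread might be computed. It assumes binary segmentation masks and treats two empty masks as perfect agreement; the function names `dice_score` and `summarize` are hypothetical and do not come from the original work.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * intersection / denom)


def summarize(scores):
    """Mean, sample standard deviation, and median of case-level scores."""
    scores = np.asarray(scores, dtype=float)
    return {
        "mean": float(scores.mean()),
        "std": float(scores.std(ddof=1)),  # sample standard deviation
        "median": float(np.median(scores)),
    }


if __name__ == "__main__":
    # Toy masks, for illustration only (not data from the figure).
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print(dice_score(pred, target))
    print(summarize([0.8, 0.9]))
```

A list of per-case scores for each method could then be passed to a plotting routine such as matplotlib's `Axes.violinplot` to obtain one violin per method.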