Fig. 9: Ablation studies on generative models and generation strategies in GenSeg.
From: Generative AI enables medical image segmentation in ultra low-data regimes

a, b Ablation study evaluating the effectiveness of different generative models, including Pix2Pix (GAN-based), BBDM (diffusion-based), and Soft-Intro VAE (VAE-based), under separate and end-to-end training strategies. Evaluations were conducted under both in-domain (a) and out-of-domain (b) scenarios, using UNet as the segmentation model. For out-of-domain scenarios, datasets are labeled in the format X-Y, where X denotes the training dataset and Y denotes the test dataset. c Comparison of training time (left), measured on an A100 GPU, and model size (right) for Pix2Pix, BBDM, and Soft-Intro VAE within our end-to-end training framework, for skin lesion segmentation with 40 training examples from the ISIC dataset, using UNet as the segmentation model. d Impact of different mask-to-image GAN models on the performance of GenSeg-UNet, evaluated on the test datasets of ISIC, PH2, and DermIS for skin lesion segmentation. GenSeg-UNet was trained using 40 examples from the ISIC training dataset. e, f Ablation study comparing simultaneous image-mask generation with the two-step approach, in which masks are first augmented and then used to generate the corresponding images. The two-step strategy outperforms simultaneous generation. Experiments were conducted under both in-domain (e) and out-of-domain (f) settings. In all panels (except c), bar heights represent the mean, and error bars indicate the standard deviation across three independent runs with different random seeds. Results from individual runs are shown as dots. Source data are provided as a Source Data file.