Fig. 8: GenSeg consistently enhances segmentation performance across diverse tasks, domains, and data regimes.
From: Generative AI enables medical image segmentation in ultra low-data regimes

a GenSeg-UNet consistently outperforms nnUNet across a range of segmentation tasks in in-domain settings.
b GenSeg-UNet also consistently outperforms nnUNet across diverse segmentation tasks in out-of-domain settings. In the X-Y notation, X is the training dataset and Y is the test dataset, where X and Y come from distinct distributions.
c GenSeg-SwinUnet outperforms SwinUnet when both are trained on 40 examples from the ISIC dataset and evaluated on the test sets of ISIC, PH2, and DermIS.
d Extension of the GenSeg framework to 3D medical image segmentation tasks under different training data regimes. “Hippo.-low” refers to hippocampus segmentation trained in an ultra-low data setting, while “Hippo.-full” refers to training with the full available dataset. The same settings apply to the liver segmentation task.
e Comparison of model performance under ultra-low and high data regimes. “UNet-low” denotes the UNet model trained with an ultra-low amount of data, while “UNet-high” denotes the same model trained with the full available dataset. The same training settings apply to GenSeg-UNet.
f GenSeg’s performance on the ISIC and FetReg datasets can be further improved by several strategies: increasing the number of training examples, using task-appropriate segmentation models, and refining augmentation techniques.
g Runtime (in hours on an A100 GPU) of GenSeg-UNet, measured for lung segmentation with JSRT as the training data and for skin lesion segmentation with ISIC as the training data.
In all panels except g, bar heights represent the mean and error bars indicate the standard deviation across three independent runs with different random seeds. Results from individual runs are shown as dots. Source data are provided as a Source Data file.
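As a rough illustration of how the bar heights and error bars described above could be derived from three seeded runs, the following minimal Python sketch computes a mean and a standard deviation. The scores are hypothetical placeholders, not values from the paper, and the use of the sample (n - 1) standard deviation is an assumption, since the caption does not state which estimator was used.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) for per-seed scores.

    Assumption: the sample (n - 1) estimator is used; the caption does
    not say whether the sample or population form was applied.
    """
    return mean(scores), stdev(scores)


# Hypothetical scores from three runs with different random seeds
# (illustrative only; these are not results from the paper).
example_scores = [0.80, 0.82, 0.84]

bar_height, error_bar = summarize_runs(example_scores)
print(f"bar height = {bar_height:.3f}, error bar = {error_bar:.3f}")
```

With only three runs, the sample and population estimators differ noticeably, so the choice would need to be confirmed against the Source Data file before reproducing the error bars exactly.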