Fig. 1

The pipeline of the proposed MedFuseNet, which fuses local and global deep feature representations with hybrid attention mechanisms for medical image segmentation. MedFuseNet consists of three key components: (1) an encoder that integrates a CNN branch equipped with atrous spatial pyramid pooling (ASPP)19 for local feature learning, a Swin-Transformer (ST) branch for global feature learning, and a cross-attention module for fusing local and global features; (2) a decoder that incorporates a squeeze-and-excitation attention (SE-attention) module20; and (3) skip connections with three CNN branches equipped with an adaptive cross-attention (ACA) module.
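
To make the decoder's SE-attention step concrete, the following is a minimal sketch of a generic squeeze-and-excitation block in plain Python. It is not the MedFuseNet implementation: the function name `se_attention`, the weight matrices `w1` and `w2`, the omission of bias terms, and the toy input are all assumptions chosen for illustration, and the caption does not specify the reduction ratio or layer sizes actually used.

```python
import math


def se_attention(feature_map, w1, w2):
    """Generic squeeze-and-excitation gating (illustrative sketch only).

    feature_map: list of C channels, each an H x W list of lists of floats.
    w1: (C // r) x C weight matrix of the first fully connected layer.
    w2: C x (C // r) weight matrix of the second fully connected layer.
    Biases are omitted for brevity; both weight matrices are hypothetical.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Excitation, step 2: fully connected layer followed by a sigmoid,
    # giving one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates


# Toy input: C = 2 channels of size 2 x 2, reduced to 1 hidden unit.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
w1 = [[1.0, -1.0]]    # hypothetical weights, shape 1 x 2
w2 = [[0.0], [1.0]]   # hypothetical weights, shape 2 x 1
scaled, gates = se_attention(features, w1, w2)
```

In this toy case the first channel receives a gate of exactly 0.5 because its row in `w2` is zero, while the second channel receives a larger gate; the output keeps the shape of the input, so the block can be dropped between decoder stages without changing tensor dimensions.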