Fig. 11

Hierarchical processing of the refined feature maps through multiple transformer-based layers, including self-attention and feed-forward networks, leading to the final segmentation mask.
