Fig. 3

(a) Overview of the proposed WAN. \(X_l\): lower-level feature map; A: attention map; \({\widetilde{X}}_h\): transformed feature map; \(\odot\): element-wise multiplication. The top row shows the self-predictor trained in the first stage, which directly predicts the CTV segmentation. (b) The WA-block, which is inserted in the decoder stage to perform classification. Its purpose is to improve the CTV segmentation result through multi-task learning.