Fig. 1
From: Context-guided segmentation for histopathologic cancer segmentation

Overall architecture of our CGS-Net model. The model takes two inputs: a higher-resolution patch of the target area and a lower-resolution context patch with a larger field of view. It consists of two transformer-based encoders, one for each input, and a UNet-like decoder. At each encoder level, a cross-attention module incorporates the corresponding context information provided by the context patch encoder. In the cross-attention module, K (key) and V (value) are derived from the target area encoder, while Q (query) is obtained from the context encoder.
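
The Q/K/V assignment described in the caption can be illustrated with a minimal single-head cross-attention sketch in NumPy. This is not the authors' implementation: the function name `cross_attention`, the random projection matrices `w_q`, `w_k`, `w_v`, the token counts, and the feature width `d_model` are all assumptions chosen for illustration, and a real CGS-Net level would likely use learned, multi-head projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(context_tokens, target_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    Q comes from the context encoder features; K and V come from the
    target area encoder features, as stated in the caption.
    """
    q = context_tokens @ w_q            # (n_context, d)
    k = target_tokens @ w_k             # (n_target, d)
    v = target_tokens @ w_v             # (n_target, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores
    weights = softmax(scores, axis=-1)        # each query row sums to 1
    return weights @ v, weights

# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
d_model = 8
context_tokens = rng.normal(size=(4, d_model))  # features from the context encoder
target_tokens = rng.normal(size=(6, d_model))   # features from the target area encoder
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

fused, weights = cross_attention(context_tokens, target_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In this sketch the output has one row per query token, so its length follows the context features, while each row is a weighted mixture of the target area values.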