Fig. 5

The spatial local attention module. (A) is our proposed SLA. The input feature map X is processed by the MaxPool and AvgPool layers to obtain feature vectors of size \(1\times1\times C'\). Meanwhile, X is compressed by a \(1\times1\) Conv and then normalized by a Softmax layer, yielding a self-attention weight over the spatial dimension. This highlights the low-frequency regions of the image and reduces the loss of local features caused by max pooling. (B) is the Spatial Attention Module (SPA) in the Convolutional Block Attention Module (CBAM).
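For concreteness, a minimal PyTorch sketch of one plausible reading of panel (A) follows. The caption does not state how the pooled descriptors and the softmax-weighted spatial aggregation are fused, so the summation and sigmoid gating below are assumptions, as are the class name SpatialLocalAttention and the simplification of the reduced channel count \(C'\) to C.

```python
# Hypothetical sketch of the SLA module as read from the caption; the fusion
# step (summation + sigmoid gate) is an assumption, not the authors' design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialLocalAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution that compresses X to a single-channel map,
        # later normalized with softmax over the spatial positions.
        self.compress = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Global max and average pooling: per-channel descriptors (B, C, 1, 1).
        max_desc = F.adaptive_max_pool2d(x, 1)
        avg_desc = F.adaptive_avg_pool2d(x, 1)
        # 1x1 conv + softmax over all H*W positions: spatial attention weights.
        attn = self.compress(x).view(b, 1, h * w)
        attn = F.softmax(attn, dim=-1).view(b, 1, h, w)
        # Softmax-weighted spatial aggregation of X (B, C, 1, 1), intended to
        # retain the local features that max pooling alone would discard.
        weighted = (x * attn).sum(dim=(2, 3), keepdim=True)
        # Fuse the three descriptors (assumed: summation) and gate the input.
        desc = max_desc + avg_desc + weighted
        return x * torch.sigmoid(desc)
```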