Fig. 6

Structure of the MobileNetV2 backbone used for feature extraction. An input patch passes through a stem convolution followed by stacked inverted residual blocks with linear bottlenecks. Each block expands the channels with a 1×1 convolution, applies a depthwise convolution, and projects back to a narrow representation with a linear 1×1 convolution; a residual shortcut connects the block's input and output where their shapes match. This design preserves spatial detail at low computational cost. The resulting compact feature map is passed to the DA block for context refinement before classification.
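The expand, depthwise, project sequence described in the caption can be illustrated with a small shape and cost tracer. This is a minimal sketch, not the configuration used in this work: the function names, the 56×56×24 input size, the expansion ratio of 6, and the 3×3 kernel are all assumptions chosen for illustration, and the shortcut rule (stride 1 with equal input and output channels) follows the standard MobileNetV2 convention.

```python
from dataclasses import dataclass


@dataclass
class BlockCost:
    out_shape: tuple      # (height, width, channels) after the block
    macs: int             # multiply-accumulate operations, biases ignored
    uses_shortcut: bool   # True when the residual shortcut can be applied


def inverted_residual_cost(h, w, c_in, c_out, stride=1, expand_ratio=6, kernel=3):
    """Trace shapes and count MACs for one inverted residual block (hypothetical helper)."""
    c_mid = c_in * expand_ratio
    macs = 0
    # 1x1 expansion convolution; skipped when there is nothing to expand.
    if expand_ratio != 1:
        macs += h * w * c_in * c_mid
    # Depthwise k x k convolution with "same" padding: ceil(h / stride).
    h_out = -(-h // stride)
    w_out = -(-w // stride)
    macs += h_out * w_out * c_mid * kernel * kernel
    # 1x1 linear projection back to the bottleneck width (no activation).
    macs += h_out * w_out * c_mid * c_out
    # Assumed shortcut rule: only when input and output shapes match.
    uses_shortcut = stride == 1 and c_in == c_out
    return BlockCost((h_out, w_out, c_out), macs, uses_shortcut)


def standard_conv_macs(h, w, c_in, c_out, stride=1, kernel=3):
    """MACs of a dense k x k convolution, for comparison (hypothetical helper)."""
    h_out = -(-h // stride)
    w_out = -(-w // stride)
    return h_out * w_out * c_in * c_out * kernel * kernel


# Illustrative sizes only; not taken from the paper.
block = inverted_residual_cost(56, 56, c_in=24, c_out=24)
dense_at_expanded_width = standard_conv_macs(56, 56, c_in=144, c_out=144)
```

Under these assumed sizes, the 3×3 spatial filtering runs on the expanded 144-channel tensor, yet the whole block costs far fewer multiply-accumulates than a dense 3×3 convolution at that width, because the depthwise step filters each channel independently and the channel mixing is left to the 1×1 convolutions.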