Table 2 Algorithm of the proposed model.
From: Retinal vessel segmentation using multi scale feature attention with MobileNetV2 encoder
| No. | Step | Description |
|---|---|---|
| 1 | Input layer | - Define an input tensor of shape [256, 256, 3] to handle RGB images. |
| 2 | Downsampling | Use MobileNetV2 as the backbone for feature extraction:<br>- Extract feature maps from the layers block_1_expand_relu (64 × 64), block_3_expand_relu (32 × 32), block_6_expand_relu (16 × 16), block_13_expand_relu (8 × 8), and block_16_project (4 × 4).<br>- Freeze the backbone to prevent weight updates during training. |
| 3 | Bottleneck (MSFA) | Apply Multi-Scale Feature Aggregation (MSFA) to the lowest-resolution (4 × 4) feature map:<br>- Use convolutional layers with kernel sizes 1 × 1, 3 × 3, 5 × 5, and 7 × 7.<br>- Concatenate the resulting feature maps to capture multi-scale spatial features. |
| 4 | Upsampling with attention | Define four upsampling blocks, each including:<br>- Transposed convolution for upsampling (stride = 2).<br>- Batch normalization and ReLU activation.<br>- An attention block to enhance important features: Global Average Pooling and Global Max Pooling create attention weights; the pooled features pass through convolution layers with ReLU and sigmoid activations; the resulting attention weights are applied to the feature maps. |
| 5 | Skip connections with MSFA | For each upsampling step:<br>- Retrieve the corresponding downsampled feature map (skip connection).<br>- Pass the skip connection through Multi-Scale Feature Aggregation (MSFA).<br>- Concatenate the upsampled feature map with the aggregated skip connection.<br>- Add a residual connection by summing the concatenated features with the original skip connection. |
| 6 | Output layer | Apply a transposed convolution layer to upsample the final feature map to the original input size (256 × 256):<br>- The number of filters corresponds to the number of output channels (e.g., 1 for binary segmentation, N for multi-class segmentation). |
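The steps in Table 2 can be sketched in tf.keras as follows. This is a minimal illustrative reconstruction, not the authors' implementation: the encoder layer names match the stock tf.keras MobileNetV2, but the MSFA filter counts, the shared-convolution channel attention, the 1 × 1 projection before the residual sum, and `weights=None` (to avoid a pretrained-weight download) are all assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


def msfa(x, filters):
    """MSFA: parallel 1x1/3x3/5x5/7x7 convs, concatenated (Step 3 / Step 5)."""
    branches = [layers.Conv2D(filters, k, padding="same", activation="relu")(x)
                for k in (1, 3, 5, 7)]
    return layers.Concatenate()(branches)


def attention_block(x):
    """Channel attention from Global Average + Max Pooling (Step 4).
    A shared 1x1 conv fusing both pooled descriptors is an assumption."""
    c = x.shape[-1]
    avg = layers.GlobalAveragePooling2D(keepdims=True)(x)
    mx = layers.GlobalMaxPooling2D(keepdims=True)(x)
    shared = layers.Conv2D(c, 1, activation="relu")
    w = layers.Add()([shared(avg), shared(mx)])
    w = layers.Conv2D(c, 1, activation="sigmoid")(w)
    return layers.Multiply()([x, w])  # apply attention weights to features


def build_model(num_classes=1):
    inputs = layers.Input(shape=(256, 256, 3))                 # Step 1
    backbone = tf.keras.applications.MobileNetV2(              # Step 2
        input_tensor=inputs, include_top=False, weights=None)
    backbone.trainable = False                                 # freeze encoder
    skip_names = ["block_1_expand_relu", "block_3_expand_relu",
                  "block_6_expand_relu", "block_13_expand_relu"]
    skips = [backbone.get_layer(n).output for n in skip_names]
    x = msfa(backbone.get_layer("block_16_project").output, 128)  # Step 3

    for skip in reversed(skips):                               # Steps 4-5
        f = skip.shape[-1]
        x = layers.Conv2DTranspose(f, 3, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = attention_block(x)
        agg = msfa(skip, f // 4)                 # aggregated skip, f channels
        x = layers.Concatenate()([x, agg])
        x = layers.Conv2D(f, 1, padding="same")(x)  # project before residual
        x = layers.Add()([x, skip])                 # residual connection

    outputs = layers.Conv2DTranspose(               # Step 6: back to 256 x 256
        num_classes, 3, strides=2, padding="same", activation="sigmoid")(x)
    return Model(inputs, outputs)
```

Because the decoder infers skip-connection channel counts and spatial sizes from the backbone graph, the same loop works regardless of the exact feature-map resolutions the encoder produces for a given input size.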