Table 2. Algorithm of the proposed model.

From: Retinal vessel segmentation using multi scale feature attention with MobileNetV2 encoder

Step 1: Input layer

- Define an input tensor of shape [256, 256, 3] to handle RGB images.

Step 2: Downsampling

- Use MobileNetV2 as the backbone for feature extraction:
  - Extract feature maps from the layers: block_1_expand_relu (64 × 64), block_3_expand_relu (32 × 32), block_6_expand_relu (16 × 16), block_13_expand_relu (8 × 8), and block_16_project (4 × 4).

- Freeze the backbone to prevent weight updates during training.
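The encoder setup in steps 1–2 can be sketched in Keras using the layer names listed above. This is a sketch, not the authors' exact code: `weights=None` here simply avoids a pretrained-weight download, and the spatial sizes of the extracted maps depend on the input resolution.

```python
import tensorflow as tf

# Build MobileNetV2 on a 256 x 256 x 3 input (step 1) and tap the five
# intermediate layers named in step 2 as encoder outputs.
base = tf.keras.applications.MobileNetV2(
    input_shape=(256, 256, 3), include_top=False, weights=None)
layer_names = [
    "block_1_expand_relu", "block_3_expand_relu", "block_6_expand_relu",
    "block_13_expand_relu", "block_16_project",
]
encoder = tf.keras.Model(
    inputs=base.input,
    outputs=[base.get_layer(n).output for n in layer_names])
encoder.trainable = False  # freeze the backbone during training
```

Freezing the whole backbone keeps the pretrained features fixed, so only the decoder and attention parameters are updated.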

Step 3: Bottleneck (MSFA)

Apply Multi-Scale Feature Aggregation (MSFA) to the lowest-resolution (4 × 4) feature map:

- Use convolutional layers with kernel sizes 1 × 1, 3 × 3, 5 × 5, and 7 × 7.

- Concatenate the resulting feature maps to capture multi-scale spatial features.
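A minimal MSFA block matching this description could look as follows; the per-branch filter count and the ReLU activations are assumptions, since only the kernel sizes and the concatenation are specified above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def msfa(x, filters=64):
    """Multi-Scale Feature Aggregation: parallel convolutions with
    kernel sizes 1, 3, 5, and 7, concatenated along the channel axis."""
    branches = [
        layers.Conv2D(filters, k, padding="same", activation="relu")(x)
        for k in (1, 3, 5, 7)
    ]
    return layers.Concatenate()(branches)
```

Because the four branches are concatenated, the output has 4 × `filters` channels while keeping the input's spatial size.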

Step 4: Upsampling with attention

Define four upsampling blocks, each consisting of:

- Transposed convolution for upsampling (stride = 2).

- Batch normalization and ReLU activation.

- Attention block to enhance important features:
  - Use global average pooling and global max pooling to create attention weights.
  - Pass the pooled features through convolution layers with ReLU and sigmoid activations.
  - Apply the attention weights to the feature maps.
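One upsampling block could be sketched as below. The shared two-convolution bottleneck and its channel-reduction ratio of 8 are assumptions; the table only specifies the pooling, the ReLU/sigmoid convolutions, and the reweighting.

```python
import tensorflow as tf
from tensorflow.keras import layers

def attention_block(x):
    """Channel attention from global average and global max pooling."""
    c = x.shape[-1]
    # Shared conv bottleneck applied to both pooled descriptors
    # (reduction ratio of 8 is an assumption).
    shared = tf.keras.Sequential([
        layers.Conv2D(max(c // 8, 1), 1, activation="relu"),
        layers.Conv2D(c, 1),
    ])
    avg = layers.GlobalAveragePooling2D(keepdims=True)(x)
    mx = layers.GlobalMaxPooling2D(keepdims=True)(x)
    weights = layers.Activation("sigmoid")(shared(avg) + shared(mx))
    return x * weights  # reweight the feature maps channel-wise

def upsample_block(x, filters):
    # Transposed conv (stride 2) -> batch norm -> ReLU -> attention.
    x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return attention_block(x)
```

Each block doubles the spatial resolution, so four of them take the 4 × 4 bottleneck up by a factor of 16.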

Step 5: Skip connections with MSFA

For each upsampling step:

- Retrieve the corresponding downsampled feature map (skip connection).

- Pass the skip connection through Multi-Scale Feature Aggregation (MSFA).

- Concatenate the upsampled feature map with the aggregated skip connection.

- Add a residual connection by summing the concatenated features with the original skip connection.
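A self-contained sketch of one decoder stage follows (with a simplified upsampling path and the MSFA block repeated inline). Note one assumption: summing the concatenated features with the raw skip connection requires matching channel counts, so a 1 × 1 projection is added on the residual path to make the sum well-defined.

```python
import tensorflow as tf
from tensorflow.keras import layers

def msfa(x, filters=32):
    # Multi-scale aggregation (step 3), repeated here for self-containment.
    branches = [
        layers.Conv2D(filters, k, padding="same", activation="relu")(x)
        for k in (1, 3, 5, 7)
    ]
    return layers.Concatenate()(branches)

def decoder_step(x, skip, filters=32):
    """One decoder stage combining steps 4 and 5 (attention omitted
    here for brevity). The 1x1 projection on the residual path is an
    assumption needed to align channel counts before the sum."""
    up = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
    up = layers.ReLU()(layers.BatchNormalization()(up))
    merged = layers.Concatenate()([up, msfa(skip, filters)])
    residual = layers.Conv2D(merged.shape[-1], 1)(skip)
    return merged + residual
```

The concatenation fuses decoder and encoder information, while the residual sum preserves a direct path from the original skip features.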

Step 6: Output layer

Apply a transposed convolution layer to upsample the final feature map to the original input size (256 × 256):

- The number of filters corresponds to the number of output channels (e.g., 1 for binary segmentation, N for multi-class segmentation).
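The output head can be sketched as a single transposed convolution; the sigmoid activation for the binary case and softmax for the multi-class case are assumptions consistent with the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

def output_layer(x, num_classes=1):
    """Final transposed conv restoring the 256 x 256 input resolution.
    num_classes = 1 yields a binary vessel mask; N yields N-class maps."""
    activation = "sigmoid" if num_classes == 1 else "softmax"
    return layers.Conv2DTranspose(
        num_classes, 3, strides=2, padding="same",
        activation=activation)(x)
```

For binary vessel segmentation the result is a single-channel probability map the same size as the input image.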