Table 1 Architectures of the two CNN models. s stands for stride, the step of the convolution over the input; p stands for padding, the zeros added around the input so that it does not shrink during the convolution. Approach A is the ResNet CNN architecture; Approach B is the feature extractor (see Figure 2).
Approach A (ResNet):

| Layers | Type | Parameters |
|---|---|---|
| 1 | Convolutional layer | 64 kernels (7x7), s = 2, p = 3 |
| | Activation | ReLU |
| 1 | Max pooling layer | Size (3x3) |
| 2-5 | ConvBlock1 | [64 x (3x3)] x 2, s = 2, p = 1 |
| | Activation | ReLU |
| 6-9 | ConvBlock2 | [128 x (3x3)] x 2, s = 2, p = 1 |
| | Activation | ReLU |
| 10-13 | ConvBlock3 | [256 x (3x3)] x 2, s = 2, p = 1 |
| | Activation | ReLU |
| 14-17 | ConvBlock4 | [512 x (3x3)] x 2, s = 2, p = 1 |
| | Activation | ReLU |
| 18 | Average pooling layer | Size (1x1) |
| 19 | Fully connected layer | 1000 x (output feature map) |
| 20 | Softmax activation | Output probabilities |
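The listing for Approach A corresponds to the 18-layer ResNet layout (a 7x7/64 stem, four stages of two convolutional blocks with 64, 128, 256 and 512 kernels, global average pooling, and a 1000-way fully connected layer followed by a softmax). As a minimal sketch, it can therefore be instantiated from the stock torchvision model; the use of torchvision and the 224x224 input size are assumptions for illustration and are not stated in the table.

```python
# Minimal sketch of Approach A, assuming the stock torchvision ResNet-18.
# Input size (224x224) is an illustrative assumption, not taken from the table.
import torch
from torchvision import models

model = models.resnet18()                 # randomly initialized ResNet-18
model.eval()

x = torch.randn(1, 3, 224, 224)           # one dummy RGB image
with torch.no_grad():
    logits = model(x)                     # fully connected layer, 1000 outputs
    probs = torch.softmax(logits, dim=1)  # softmax activation (layer 20 in the table)
print(probs.shape)                        # torch.Size([1, 1000])
```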
Approach B (feature extractor):

| Layers | Type | Parameters |
|---|---|---|
| 1 | Convolutional layer | 8 kernels (3x3), s = 1, p = 1 |
| 2 | Activation | ReLU |
| 3 | Convolutional layer | 32 kernels (2x2), s = 1, p = 1 |
| 4 | Activation | ReLU |
| 5 | Convolutional layer | 64 kernels (2x2), s = 1, p = 1 |
| 6 | Activation | ReLU |
| 7 | Max pooling layer | Size (3x3) |
| 8 | Convolutional layer | 128 kernels (3x3), s = 1, p = 1 |
| 9 | Activation | ReLU |
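The Approach B feature extractor can be transcribed row by row from the table above. The sketch below assumes PyTorch, a 3-channel input, and the default pooling stride (none of which are specified in the table); the kernel counts, kernel sizes, strides, and padding come directly from the table rows.

```python
# Sketch of the Approach B feature extractor, transcribed from Table 1.
# Framework (PyTorch), 3-channel input, and input resolution are assumptions.
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1),    # layer 1
    nn.ReLU(),                                               # layer 2
    nn.Conv2d(8, 32, kernel_size=2, stride=1, padding=1),    # layer 3
    nn.ReLU(),                                               # layer 4
    nn.Conv2d(32, 64, kernel_size=2, stride=1, padding=1),   # layer 5
    nn.ReLU(),                                               # layer 6
    nn.MaxPool2d(kernel_size=3),                             # layer 7 (stride defaults to the kernel size)
    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),  # layer 8
    nn.ReLU(),                                               # layer 9
)

x = torch.randn(1, 3, 64, 64)      # dummy 64x64 RGB input (assumed size)
features = feature_extractor(x)
print(features.shape)              # torch.Size([1, 128, 22, 22]) for a 64x64 input
```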