Table 1 Detailed architecture of the encoder.
Encoder | |||||
---|---|---|---|---|---|
Block | Input | Input size | Output size | Channels | Elements |
Conv1 | RGB | H × W | H × W | 3 → 16 | 3 × 3, stride 1 |
Layer1 | RGB | H × W | H/2 × W/2 | 3 → 64 | 7 × 7, stride 2 |
Layer2 | F (Layer1) | H/2 × W/2 | H/4 × W/4 | 64 → 256 | 3 × 3 max pool, stride 2; \(\left[ {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} {1 \times 1,{\text{ }}128} \\ {3 \times 3,{\text{ 128}}} \\ {1 \times 1,{\text{ }}256} \end{array}}&{C=32} \end{array}} \right] \times 3\) |
Layer3 | F (Layer2) | H/4 × W/4 | H/8 × W/8 | 256 → 512 | \(\left[ {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} {1 \times 1,{\text{ }}256} \\ {3 \times 3,{\text{ 256}}} \\ {1 \times 1,{\text{ }}512} \end{array}}&{C=32} \end{array}} \right] \times 4\) |
Layer4 | F (Layer3) | H/8 × W/8 | H/16 × W/16 | 512 → 1024 | \(\left[ {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} {1 \times 1,{\text{ }}512} \\ {3 \times 3,{\text{ 512}}} \\ {1 \times 1,{\text{ }}1024} \end{array}}&{C=32} \end{array}} \right] \times 23\) |
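
As a reading aid, the input/output sizes and channel counts in Table 1 can be traced with a short script. This is only a minimal sketch, not the authors' implementation: the names `Stage`, `STAGES`, and `trace_shapes` are hypothetical, the internals of each block (kernel sizes, the C = 32 setting, repeat counts) are ignored, and H and W are assumed to be divisible by 16 so that every halving is exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One row of Table 1 (hypothetical helper type, not from the paper)."""
    name: str
    source: str     # tensor that feeds this stage: "RGB" or an earlier stage
    downscale: int  # spatial reduction factor applied by this stage
    in_ch: int
    out_ch: int


# Rows of Table 1. Conv1 and Layer1 both read the RGB image directly.
STAGES = [
    Stage("Conv1", "RGB", 1, 3, 16),
    Stage("Layer1", "RGB", 2, 3, 64),
    Stage("Layer2", "Layer1", 2, 64, 256),
    Stage("Layer3", "Layer2", 2, 256, 512),
    Stage("Layer4", "Layer3", 2, 512, 1024),
]


def trace_shapes(height, width):
    """Return {name: (channels, height, width)} for every stage in Table 1.

    Assumption: height and width are multiples of 16, so the sizes match
    the H/2 ... H/16 entries of the table without rounding.
    """
    if height % 16 or width % 16:
        raise ValueError("this sketch assumes H and W are multiples of 16")

    shapes = {"RGB": (3, height, width)}
    for stage in STAGES:
        ch, h, w = shapes[stage.source]
        # Check that the table's channel column is consistent with its input.
        if ch != stage.in_ch:
            raise ValueError(f"{stage.name}: expected {stage.in_ch} input channels, got {ch}")
        shapes[stage.name] = (stage.out_ch, h // stage.downscale, w // stage.downscale)
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes(256, 512).items():
        print(f"{name:7s} {shape}")
```

For a 256 × 512 input, for example, the sketch reports a Layer4 feature map with 1024 channels at 16 × 32, matching the H/16 × W/16 entry in the last row.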