Table 1 Detailed architecture of the encoder.

From: LapUNet: a novel approach to monocular depth estimation using dynamic laplacian residual U-shape networks

| Block | Input | Input size | Output size | Channels | Element |
|---|---|---|---|---|---|
| Conv1 | RGB | H × W | H × W | 3 → 16 | 3 × 3, stride 1 |
| Layer1 | RGB | H × W | H/2 × W/2 | 3 → 64 | 7 × 7, stride 2 |
| Layer2 | F (Layer1) | H/2 × W/2 | H/4 × W/4 | 64 → 256 | 3 × 3 max pool, stride 2; \(\left[\begin{array}{cc} \begin{array}{c} 1 \times 1,\ 128 \\ 3 \times 3,\ 128 \\ 1 \times 1,\ 256 \end{array} & C=32 \end{array}\right] \times 3\) |
| Layer3 | F (Layer2) | H/4 × W/4 | H/8 × W/8 | 256 → 512 | \(\left[\begin{array}{cc} \begin{array}{c} 1 \times 1,\ 256 \\ 3 \times 3,\ 256 \\ 1 \times 1,\ 512 \end{array} & C=32 \end{array}\right] \times 4\) |
| Layer4 | F (Layer3) | H/8 × W/8 | H/16 × W/16 | 512 → 1024 | \(\left[\begin{array}{cc} \begin{array}{c} 1 \times 1,\ 512 \\ 3 \times 3,\ 512 \\ 1 \times 1,\ 1024 \end{array} & C=32 \end{array}\right] \times 23\) |
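
The spatial sizes and channel counts listed in Table 1 can be checked with a few lines of Python. The sketch below is illustrative only and is not the authors' code: the names `ENCODER_STAGES` and `encoder_output_shapes` are hypothetical, and it assumes that H and W are multiples of 16 so that every stage divides evenly (the table does not state padding or rounding behaviour).

```python
# Stage name, input channels, output channels, cumulative downsampling factor,
# transcribed from Table 1. Conv1 and Layer1 both take the RGB image as input.
ENCODER_STAGES = [
    ("Conv1", 3, 16, 1),
    ("Layer1", 3, 64, 2),
    ("Layer2", 64, 256, 4),
    ("Layer3", 256, 512, 8),
    ("Layer4", 512, 1024, 16),
]


def encoder_output_shapes(height, width):
    """Return {stage: (channels, out_height, out_width)} for an H x W RGB input."""
    # Assumption: H and W are multiples of 16, so no rounding rule is needed.
    if height % 16 or width % 16:
        raise ValueError("height and width are assumed to be multiples of 16")
    return {
        name: (out_ch, height // factor, width // factor)
        for name, _in_ch, out_ch, factor in ENCODER_STAGES
    }


if __name__ == "__main__":
    # Hypothetical input resolution, chosen only for illustration.
    for stage, shape in encoder_output_shapes(256, 512).items():
        print(stage, shape)
```

For a 256 × 512 input, for example, Layer4 yields a 1024-channel feature map of size 16 × 32, which matches the H/16 × W/16 entry in the table.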