Table 2 An example instantiation of the SlowFast network.

From: A new dataset for video-based cow behavior recognition

| Stage | Slow pathway | Fast pathway | Output sizes T\(\times S^2\) |
| --- | --- | --- | --- |
| Raw clip | – | – | 64\(\times 224^2\) |
| Data layer | Stride 16, \(1^2\) | Stride 2, \(1^2\) | Slow: 4\(\times 224^2\); Fast: 32\(\times 224^2\) |
| conv\(_1\) | 1\(\times 7^2\), 64; stride 1, \(2^2\) | 5\(\times 7^2\), 8; stride 1, \(2^2\) | Slow: 4\(\times 112^2\); Fast: 32\(\times 112^2\) |
| pool\(_1\) | 1\(\times 3^2\), max; stride 1, \(2^2\) | 1\(\times 3^2\), max; stride 1, \(2^2\) | Slow: 4\(\times 56^2\); Fast: 32\(\times 56^2\) |
| res\(_2\) | \(\left[ \begin{array}{c} 1\times 1^2, 64\\ 1\times 3^2, 64\\ 1\times 1^2, 256 \end{array}\right] \times 3\) | \(\left[ \begin{array}{c} 3\times 1^2, \textbf{8}\\ 1\times 3^2, \textbf{8}\\ 1\times 1^2, \textbf{32} \end{array}\right] \times 3\) | Slow: 4\(\times 56^2\); Fast: 32\(\times 56^2\) |
| res\(_3\) | \(\left[ \begin{array}{c} 1\times 1^2, 128\\ 1\times 3^2, 128\\ 1\times 1^2, 512 \end{array}\right] \times 4\) | \(\left[ \begin{array}{c} 3\times 1^2, \textbf{16}\\ 1\times 3^2, \textbf{16}\\ 1\times 1^2, \textbf{64} \end{array}\right] \times 4\) | Slow: 4\(\times 28^2\); Fast: 32\(\times 28^2\) |
| res\(_4\) | \(\left[ \begin{array}{c} 3\times 1^2, 256\\ 1\times 3^2, 256\\ 1\times 1^2, 1024 \end{array}\right] \times 6\) | \(\left[ \begin{array}{c} 3\times 1^2, \textbf{32}\\ 1\times 3^2, \textbf{32}\\ 1\times 1^2, \textbf{128} \end{array}\right] \times 6\) | Slow: 4\(\times 14^2\); Fast: 32\(\times 14^2\) |
| res\(_5\) | \(\left[ \begin{array}{c} 3\times 1^2, 512\\ 1\times 3^2, 512\\ 1\times 1^2, 2048 \end{array}\right] \times 3\) | \(\left[ \begin{array}{c} 3\times 1^2, \textbf{64}\\ 1\times 3^2, \textbf{64}\\ 1\times 1^2, \textbf{256} \end{array}\right] \times 3\) | Slow: 4\(\times 7^2\); Fast: 32\(\times 7^2\) |

Significant values are in bold.
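
The output sizes in the table follow from the strides alone, which the short Python sketch below illustrates. It assumes a 64-frame raw clip (the length implied by temporal strides 16 and 2 giving 4 and 32 frames), a temporal stride of 1 in every stage after the data layer, and padding chosen so that each spatial stride divides the side length exactly. The names `SPATIAL_STRIDES` and `output_sizes` are hypothetical and are not taken from the paper or from any library.

```python
# Spatial stride applied by each stage, read off the table.
# res2 keeps the resolution left by pool1; res3 to res5 each halve it.
SPATIAL_STRIDES = [
    ("data layer", 1),
    ("conv1", 2),
    ("pool1", 2),
    ("res2", 1),
    ("res3", 2),
    ("res4", 2),
    ("res5", 2),
]


def output_sizes(raw_frames=64, raw_side=224, slow_stride=16, fast_stride=2):
    """Return {stage: {"slow": (T, S), "fast": (T, S)}} for every stage.

    T is the number of frames and S the side of the square feature map.
    Assumption: only the data layer subsamples in time, so T is fixed
    per pathway once the clip has been sampled.
    """
    slow_t = raw_frames // slow_stride  # 64 // 16 = 4 frames
    fast_t = raw_frames // fast_stride  # 64 // 2 = 32 frames
    side = raw_side
    sizes = {}
    for stage, stride in SPATIAL_STRIDES:
        side //= stride
        sizes[stage] = {"slow": (slow_t, side), "fast": (fast_t, side)}
    return sizes


if __name__ == "__main__":
    for stage, size in output_sizes().items():
        slow_t, side = size["slow"]
        fast_t, _ = size["fast"]
        print(f"{stage:>10}: Slow {slow_t}x{side}^2, Fast {fast_t}x{side}^2")
```

The same table shows that each Fast-pathway channel width is one eighth of the corresponding Slow-pathway width (for example 64 versus 8 in conv\(_1\), and 2048 versus 256 at the end of res\(_5\)), so the Fast pathway gains temporal resolution at the cost of channel capacity.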