Table 2 An example instantiation of the SlowFast network.
From: A new dataset for video-based cow behavior recognition
| Stage | Slow pathway | Fast pathway | Output sizes \(T\times S^2\) |
|---|---|---|---|
| Raw clip | – | – | 64\(\times 224^2\) |
| Data layer | Stride 16, \(1^2\) | Stride 2, \(1^2\) | Slow: 4\(\times 224^2\) |
| | | | Fast: 32\(\times 224^2\) |
| conv\(_1\) | 1\(\times 7^2\), 64 | 5\(\times 7^2\), 8 | Slow: 4\(\times 112^2\) |
| | Stride 1, \(2^2\) | Stride 1, \(2^2\) | Fast: 32\(\times 112^2\) |
| pool\(_1\) | 1\(\times 3^2\), max | 1\(\times 3^2\), max | Slow: 4\(\times 56^2\) |
| | Stride 1, \(2^2\) | Stride 1, \(2^2\) | Fast: 32\(\times 56^2\) |
| res\(_2\) | \(\left[ \begin{array}{c} 1\times 1^2, 64\\ 1\times 3^2, 64\\ 1\times 1^2, 256 \end{array}\right] \times 3\) | \(\left[ \begin{array}{c} 3\times 1^2, \textbf{8}\\ 1\times 3^2, \textbf{8}\\ 1\times 1^2, \textbf{32} \end{array}\right] \times 3\) | Slow: 4\(\times 56^2\) |
| | | | Fast: 32\(\times 56^2\) |
| res\(_3\) | \(\left[ \begin{array}{c} 1\times 1^2, 128\\ 1\times 3^2, 128\\ 1\times 1^2, 512 \end{array}\right] \times 4\) | \(\left[ \begin{array}{c} 3\times 1^2, \textbf{16}\\ 1\times 3^2, \textbf{16}\\ 1\times 1^2, \textbf{64} \end{array}\right] \times 4\) | Slow: 4\(\times 28^2\) |
| | | | Fast: 32\(\times 28^2\) |
| res\(_4\) | \(\left[ \begin{array}{c} 3\times 1^2, 256\\ 1\times 3^2, 256\\ 1\times 1^2, 1024 \end{array}\right] \times 6\) | \(\left[ \begin{array}{c} 3\times 1^2, \textbf{32}\\ 1\times 3^2, \textbf{32}\\ 1\times 1^2, \textbf{128} \end{array}\right] \times 6\) | Slow: 4\(\times 14^2\) |
| | | | Fast: 32\(\times 14^2\) |
| res\(_5\) | \(\left[ \begin{array}{c} 3\times 1^2, 512\\ 1\times 3^2, 512\\ 1\times 1^2, 2048 \end{array}\right] \times 3\) | \(\left[ \begin{array}{c} 3\times 1^2, \textbf{64}\\ 1\times 3^2, \textbf{64}\\ 1\times 1^2, \textbf{256} \end{array}\right] \times 3\) | Slow: 4\(\times 7^2\) |
| | | | Fast: 32\(\times 7^2\) |
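
The output-size column can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: the names `output_sizes` and `fast_width` are hypothetical, the spatial strides for res\(_3\) to res\(_5\) are inferred from the output-size column rather than stated in the table, and the 1:8 channel ratio is read off the Slow and Fast channel widths listed above.

```python
# Values read off the table; helper names are hypothetical.
RAW_FRAMES, RAW_SIZE = 64, 224
TEMPORAL_STRIDE = {"slow": 16, "fast": 2}  # data-layer temporal strides

# Spatial stride per stage (1 = resolution unchanged).
# res3-res5 strides are an assumption inferred from the output sizes.
SPATIAL_STRIDE = [
    ("data layer", 1),
    ("conv1", 2),
    ("pool1", 2),
    ("res2", 1),
    ("res3", 2),
    ("res4", 2),
    ("res5", 2),
]


def output_sizes(pathway):
    """Return a list of (stage, frames T, spatial side S) for one pathway."""
    frames = RAW_FRAMES // TEMPORAL_STRIDE[pathway]  # temporal stride is 1 afterwards
    side = RAW_SIZE
    sizes = []
    for stage, stride in SPATIAL_STRIDE:
        side //= stride
        sizes.append((stage, frames, side))
    return sizes


def fast_width(slow_width, ratio=8):
    """Fast-pathway channel width, assuming the 1:8 ratio seen in the table."""
    return slow_width // ratio


if __name__ == "__main__":
    for pathway in ("slow", "fast"):
        for stage, frames, side in output_sizes(pathway):
            print(f"{pathway:4s} {stage:10s} {frames} x {side}^2")
    # Slow-pathway output widths for conv1 and res2-res5.
    for width in (64, 256, 512, 1024, 2048):
        print(f"slow {width:4d} -> fast {fast_width(width)}")
```

Because neither pathway downsamples in time after the data layer, the frame count stays at 4 (Slow) and 32 (Fast) in every stage, and only the spatial side shrinks from 224 to 7.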