Supplementary Figure 3: Comparison of neural network architecture. | Nature Methods

From: Fast animal pose estimation using deep neural networks

a, Diagram of our neural network architecture. Raw images are provided as input to the network, which then computes a set of confidence maps of the same height and width as the input image (top row). The network consists of a set of convolutions, max pooling and transposed convolutions whose weights are learned during training (top middle). Estimated confidence maps are compared to ground truth maps generated from user labels using a mean squared error loss function, which is then minimized during training (bottom row). b, Accuracy comparison between network architectures. We compared the accuracy of our architecture with that of the hourglass and stacked hourglass versions of the network described in ref. 10. The accuracy of our network is equivalent to or better than that achieved when training with the hourglass (over all 32 body parts, n = 300 held-out frames, P < 1 × 10⁻¹⁰, Wilcoxon rank sum test, one-tailed, z = −74.65) and stacked hourglass (over all 32 body parts, n = 300 held-out frames, P < 1 × 10⁻¹⁰, Wilcoxon rank sum test, one-tailed, z = −53.21) versions. Dots and error bars denote median and 25th and 75th percentiles; violin plots denote full distributions of errors.
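
The training objective in panel a can be illustrated with a minimal sketch in plain Python. It builds one ground truth confidence map per labeled body part and scores a set of estimated maps against them with a mean squared error. The caption does not state how the ground truth maps are generated from user labels, so the 2D Gaussian centered on each label is an assumption here, as are the function names `confidence_map` and `mse_loss`, the 16 × 16 image size, the label coordinates and the `sigma` value; none of these are taken from the authors' code.

```python
import math


def confidence_map(height, width, cx, cy, sigma=2.0):
    """Build a height x width map that peaks at the labeled point.

    ASSUMPTION: a 2D Gaussian centered on the label (cx = column, cy = row).
    The caption only says the maps are "generated from user labels".
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def mse_loss(estimated, target):
    """Mean squared error over every pixel of every body-part map."""
    total = 0.0
    count = 0
    for est_map, true_map in zip(estimated, target):
        for est_row, true_row in zip(est_map, true_map):
            for e, t in zip(est_row, true_row):
                total += (e - t) ** 2
                count += 1
    return total / count


# Hypothetical labels for two body parts in a 16 x 16 image, as (x, y).
HEIGHT, WIDTH = 16, 16
labels = [(5, 3), (10, 12)]

# One ground truth map per body part, same height and width as the image.
target_maps = [confidence_map(HEIGHT, WIDTH, x, y) for x, y in labels]

# Stand-in for an untrained network's output: all-zero maps.
blank_maps = [[[0.0] * WIDTH for _ in range(HEIGHT)] for _ in labels]

loss_untrained = mse_loss(blank_maps, target_maps)
loss_perfect = mse_loss(target_maps, target_maps)
```

In this sketch `loss_perfect` is exactly zero and `loss_untrained` is positive; training as described in the caption would adjust the network weights to drive the loss on the estimated maps toward zero.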
