Figure 2
From: Learning on tree architectures outperforms a convolutional feedforward network

BP step on highly pruning Tree-3 architecture. (a) Scheme of a BP step in the first branch of a highly pruning Tree-3 architecture (Fig. 1d). The gray squares in the first layer represent convolutional hidden units, \(\sigma_{Conv}\), and max-pooling hidden units that are equal to zero, except for a few that are denoted by RGB dots. The non-zero tree output hidden units, \(\sigma_{Tree}\), are denoted by black dots. The updated weights with non-zero gradients in the first layer, \(W^{Conv}\), the second layer, \(W^{Tree}\), and the third, fully connected layer, \(W^{FC}\), are denoted by RGB lines. (b) Fraction of zero gradients, averaged over the test set, with standard deviations, for the tree layers of the Tree-3 architecture (K = 15, M = 16) after many epochs (“Methods” section).
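
As a rough illustration of the quantity plotted in panel (b), the sketch below counts the fraction of weight gradients that are exactly zero in one BP step. It is a toy example and not the Tree-3 implementation: it assumes a single dense layer with a ReLU activation, and the function name `zero_gradient_fraction`, the layer sizes, and the input values are all hypothetical.

```python
import numpy as np

def zero_gradient_fraction(x, W, upstream):
    """Fraction of entries of dL/dW that are exactly zero for y = relu(W @ x).

    Assumption: one dense ReLU layer, used only to illustrate the idea.
    """
    z = W @ x
    delta = upstream * (z > 0)    # ReLU passes the gradient only where z > 0
    grad_W = np.outer(delta, x)   # dL/dW[i, j] = delta[i] * x[j]
    return float(np.mean(grad_W == 0.0))

rng = np.random.default_rng(0)

# Sparse input: only 3 of 16 units are non-zero, mimicking the few
# non-zero hidden units marked by dots in panel (a).
x = np.zeros(16)
x[[2, 7, 11]] = [0.5, 1.2, 0.3]

W = rng.normal(size=(8, 16))      # hypothetical layer weights
upstream = rng.normal(size=8)     # hypothetical gradient from the next layer

frac = zero_gradient_fraction(x, W, upstream)
print(f"fraction of zero gradients: {frac:.3f}")
```

Every weight attached to a zero input unit receives a zero gradient, so with 13 of 16 inputs silent at least 13/16 of the entries vanish; output units switched off by the ReLU zero out further rows.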