Fig. 6: Error propagation and learning with GLE in a small hierarchical network.

a Network setup. A network with one output neuron and two hidden layers (red) learns to mimic the output of a teacher network (identical architecture, different input weights to the first hidden layer). Each hidden layer contains one instantaneous (\(\tau_{\mathrm{m}} = \tau_{\mathrm{r}} = 1\)) and one retrospective (\(\tau_{\mathrm{m}} = 1\), \(\tau_{\mathrm{r}} = 0.1\)) neuron. In GLE, the corresponding error pathway (blue) follows the microcircuit template in Fig. 4. The input is a superposition of three angular frequency components \(\omega \in \{0.49, 1.07, 1.98\}\). Here, we compare error propagation, synaptic plasticity and, ultimately, the convergence of learning under GLE and AM. b Input and output rates (\(r\), red), along with bottom-layer errors (\(e\), light blue) and adjoints (\(\lambda\), dark blue) before and during the late stages of learning. A running average over \(e\) is shown in orange. c Phase shifts (relative to the output error \(e_3\)) and d amplitudes of bottom-layer errors and adjoints across a wide range of their angular frequency spectrum, before and during learning. The moments “before” and “during” learning are marked by vertical dashed lines in (e). Top: \(e_1^{\mathrm{i}}\) and \(\lambda_1^{\mathrm{i}}\) for the instantaneous neuron. Bottom: \(e_1^{\mathrm{r}}\) and \(\lambda_1^{\mathrm{r}}\) for the retrospective neuron. Note that, due to the nonlinearity of neuronal outputs, the network output has a much broader distribution of frequency components than the input (whose three components are highlighted by red crosses). Error amplitudes are shown at two different moments during learning. e Evolution of the bottom weights (\(w_0^{\mathrm{i}}\), \(w_0^{\mathrm{r}}\)) and f of the loss during learning. The vertical dashed lines mark the snapshots at which adjoint and error spectra are plotted above.
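As a rough, self-contained illustration of the setup in panel a (not code from the paper), the sketch below constructs the three-component input and simulates one instantaneous and one retrospective neuron. It assumes the standard GLE picture of a leaky membrane, \(\tau_{\mathrm{m}}\dot{u} = -u + I\), with prospective output \(r = \varphi(u + \tau_{\mathrm{r}}\dot{u})\); the time constants and angular frequencies are those given above, while the nonlinearity (\(\tanh\)), the Euler integration scheme and all identifiers are illustrative choices.

```python
import numpy as np

# Illustrative sketch of Fig. 6a (not the authors' code): a three-component input
# drives a single GLE-style neuron with leaky membrane tau_m * du/dt = -u + I
# and prospective output r = phi(u + tau_r * du/dt). Names, nonlinearity and the
# Euler scheme are assumptions made for this example.

DT = 1e-3                                 # Euler time step
T = np.arange(0.0, 40.0, DT)              # simulation time
OMEGAS = (0.49, 1.07, 1.98)               # angular frequencies from the caption

# Input rate: superposition of the three frequency components.
r_in = sum(np.sin(w * T) for w in OMEGAS)

def gle_neuron(i_syn, tau_m, tau_r, phi=np.tanh, dt=DT):
    """Leaky membrane integration followed by a prospective readout."""
    u = np.zeros_like(i_syn)
    for k in range(1, len(i_syn)):
        u[k] = u[k - 1] + dt * (-u[k - 1] + i_syn[k - 1]) / tau_m
    du_dt = np.gradient(u, dt)
    return phi(u + tau_r * du_dt)         # r = phi(u + tau_r * du/dt)

# Instantaneous neuron (tau_r = tau_m): the prospective term cancels the
# membrane low-pass, so the output tracks phi(input) without lag.
r_inst = gle_neuron(r_in, tau_m=1.0, tau_r=1.0)

# Retrospective neuron (tau_r < tau_m): the cancellation is only partial,
# so the output lags behind and is attenuated at higher frequencies.
r_retro = gle_neuron(r_in, tau_m=1.0, tau_r=0.1)
```

Comparing r_inst and r_retro with phi(r_in), for instance via their Fourier spectra, gives a qualitative feel for the frequency-dependent phase shifts and attenuation that panels c and d quantify for the propagated errors and adjoints.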