Fig. 1: Illustration of the gradient update computation steps.
From: Fast and robust analog in-memory deep neural network training

The general structure of the gradient computation is shared by all improved learning algorithms discussed and is based on Tiki-Taka version 2 (TTv2) (see ref. 24). For each input vector \({\bf{x}}\) and backpropagated error vector \({\bf{d}}\), the weight gradient is first accumulated on a crossbar array \(\breve{A}\) using a parallel pulsed outer-product update with learning rate \(\lambda_A\) (ref. 13; see Supplementary Alg. 1). Note that the matrices are displayed here in transposed fashion, so that voltage inputs \({\bf{x}}\) are delivered from the left and \({\bf{d}}\) from the bottom. A single row of the accumulated gradient in \(\breve{A}\) is then read out intermittently, every \(n_s\) vector updates (looping through the rows over time), and digital computation is used to arrive at an FP vector \({\bf{z}}_k\) that is added to the digital storage \(H\) with learning rate \(\lambda_H\). Finally, the corresponding row of the actual weight matrix, represented by a second crossbar array \(\breve{W}\), is updated when a threshold is crossed, and the hidden matrix \(H\) is reset correspondingly. The newly proposed algorithms differ in the digital computation used to arrive at \(\hat{{\bf{x}}}\) and \({\bf{z}}_k\). For the TTv2 baseline algorithm, \(\hat{{\bf{x}}}\equiv {\bf{x}}\) and \({\bf{z}}_k\equiv (\breve{A}-\breve{R})\,{\bf{v}}_k\), where the reference crossbar array \(\breve{R}\) is programmed before DNN training and a fast differential analog MVM is used for readout (using the one-hot unit vector \({\bf{v}}_k\)). See "Methods" section "Fast and robust in-memory training" and Supplementary Fig. 2 for more details on the digital operations of the proposed algorithms.
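To make the three steps concrete, the following is a minimal NumPy sketch of the TTv2 baseline pipeline described above: outer-product accumulation on \(\breve{A}\), periodic row readout of \(\breve{A}-\breve{R}\) into \(H\), and thresholded transfer to \(\breve{W}\). The crossbar arrays are stood in by dense floating-point matrices, and all constants (lr_A, lr_H, n_s, threshold), the sign convention, pulse granularity, and the exact reset of \(H\) are illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch of the TTv2-style update pipeline (assumptions noted above):
# (1) outer-product accumulation on A, (2) periodic readout of one row of
# (A - R) into the digital hidden matrix H, (3) thresholded transfer to W.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 4, 6

A = np.zeros((d_out, d_in))              # gradient-accumulation array (stand-in for breve A)
W = rng.normal(0.0, 0.1, (d_out, d_in))  # weight array (stand-in for breve W)
R = np.zeros((d_out, d_in))              # reference array, programmed before training
H = np.zeros((d_out, d_in))              # digital hidden matrix

lr_A, lr_H = 0.1, 0.05   # lambda_A, lambda_H (assumed values)
n_s = 2                  # read out one row of A every n_s vector updates
threshold = 0.01         # transfer threshold for updating W (assumed value)
row = 0                  # row index k, cycled over time


def ttv2_step(x, d, step):
    """Process one (input, backpropagated-error) pair."""
    global row
    # (1) outer-product accumulation of the gradient d x^T on A
    #     (plain gradient-descent sign convention, an assumption here)
    A[...] += lr_A * np.outer(d, x)

    # (2) every n_s vector updates, read one row of (A - R) and add it to H
    if (step + 1) % n_s == 0:
        z_k = A[row] - R[row]            # differential readout with one-hot v_k
        H[row] += lr_H * z_k

        # (3) transfer to W wherever |H| crosses the threshold, then
        #     decrement H by the transferred amount (one possible "reset")
        mask = np.abs(H[row]) > threshold
        W[row, mask] -= np.sign(H[row, mask]) * threshold
        H[row, mask] -= np.sign(H[row, mask]) * threshold
        row = (row + 1) % d_out


for step in range(8):
    x = rng.normal(size=d_in)   # forward input vector x
    d = rng.normal(size=d_out)  # backpropagated error vector d
    ttv2_step(x, d, step)
```

In this sketch the hidden matrix \(H\) acts as a low-pass filter between the noisy accumulated gradient and the weight array, which is the role it plays in the caption; the analog non-idealities, pulsed transfer granularity, and the algorithm-specific choices of \(\hat{{\bf{x}}}\) and \({\bf{z}}_k\) are deliberately omitted.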