Fig. 3: Illustration of the gradient update mechanism of the algorithms assuming constant (negative) gradient input.
From: Fast and robust analog in-memory deep neural network training

A–C Reference conductance (\(\breve{R}\)) set to the symmetry point (SP) of \(\breve{A}\) without offset (\({\breve{r}}_{ij}={\breve{a}}_{ij}^{*}\)). D–F Reference conductance set to the symmetry point with added offset (\({\breve{r}}_{ij}={\breve{a}}_{ij}^{*}-0.8\)). A, D Tiki-Taka version 2 (TTv2) accumulates the gradient onto \({\breve{a}}_{ij}\) (blue curve), which is constantly updated in the direction of the net gradient. The hidden weight \({h}_{ij}\) (green curve) is updated intermittently with the readout of \({\breve{a}}_{ij}\) (indicated with dots; here every 25 updates). The weight (orange line) is updated once the threshold (dotted line) is reached. Note that the weight is updated correctly without offset (plot (A); blue area indicates correctly signed updates); however, with a reference offset (plot (D); blue dashed line) the weight update breaks down. B, E Chopped-TTv2 (c-TTv2) introduces a chopper (dashed gray lines) that switches the gradient accumulation direction (here at regular intervals). The weight is correctly updated without offset (plot (B)); however, a reference offset (plot (E)) causes a slowdown (but not a breakdown) of the weight learning, as the offset disturbs the zero point in one chopper cycle but recovers every other cycle (red areas indicate a wrong sign of the gradient readout due to the offset). C, F AGAD introduces an on-the-fly reference estimation (\({p}_{ij}\); red line) that is copied to the current reference (\({p}_{ij}^{{\rm{ref}}}\), violet line) when the chopper changes. In this case the reference is dynamically adjusted so that the weight update is correct both without (plot (C)) and with any offset (plot (F)). Parameter settings: 5 × 5 matrix size (only the first element is plotted), δ = 0.05, σb = σ± = σd-to-d = σc-to-c = 0.3, γ0 = 200, λ = 0.1, ns = 5, β = 0.5, ρ = 0.1, \({l}_{\max }=5\), λA = 1, and σr = 0.
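The mechanism shown in panels A, B, D, and E can be sketched as a toy simulation: a hidden accumulator integrates the (chopped) gradient, is read out periodically into a hidden weight, and the weight is updated whenever a threshold is crossed. All function names, parameter values, and the accumulator-reset simplification below are illustrative assumptions, not the paper's actual implementation.

```python
def accumulate(grad=-0.1, offset=0.0, use_chopper=False, steps=100,
               period=25, threshold=0.5, lr=0.05):
    """Toy model of (chopped) TTv2-style gradient accumulation.

    grad:   constant gradient input (negative, as in Fig. 3)
    offset: reference offset seen at readout (0.0 = reference at the
            symmetry point; 0.8 mimics r_ij = a*_ij - 0.8)
    """
    a, h, w, chop = 0.0, 0.0, 0.0, 1.0
    for t in range(1, steps + 1):
        a += lr * chop * grad              # accumulate (chopped) gradient
        if t % period == 0:
            # Readout sees the accumulator shifted by the reference offset;
            # de-chopping multiplies by the current chopper sign.
            h += chop * (a + offset)
            a = 0.0                        # simplification: reset accumulator
            while abs(h) >= threshold:     # transfer to the weight
                step = threshold if h > 0 else -threshold
                w += step
                h -= step
            if use_chopper:
                chop = -chop               # flip accumulation direction
    return w
```

With a negative gradient the weight should move negative. Without a chopper the offset reverses the readout sign and the weight drifts the wrong way (panel D); with the chopper the offset cancels over every pair of chopper cycles and only slows learning down (panel E).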