Extended Data Fig. 2: Estimation of the error ϵ1 arising from the difference between idealised (perfectly linear) and actual (non-linear) weight updates.
From: Actor–critic networks with analogue memristors mimicking reward-based learning

(a) Measured and fitted potentiation and depression curves for ten different cycles of a single memristor device. As described in the main text, an error term ϵ1 arises from the difference between the ideal (linear) potentiation/depression (black line) and the non-ideal measured curves (light and dark blue). ϵ1 is extracted for all ten cycles by subtracting the measurement data (b) and the fits (c) from the ideal linear curve. In both cases, the error is directly related to the non-linearity of the potentiation/depression and is largest at the beginning of the potentiation (light blue) and depression (dark blue) curves, as indicated by the red circles. Consequently, weight updates made while the memristor conductance is near the beginning of the potentiation or depression curve incur a larger error ϵ1. Because the weights in our actor–critic reinforcement learning tasks start at zero and are generally increased more often than decreased, the problem is more severe for potentiation than for depression (see, for example, Fig. 4b in the main text). The first update of each weight is therefore the most strongly affected.
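The link between non-linearity and a large error at the start of each curve can be illustrated with a minimal sketch. It assumes a saturating-exponential model for the fitted curves (the fit function actually used for panel (a) is not given here) and reads ϵ1 as the difference between the actual and the ideal (linear) conductance change per pulse. The names `saturating_curve` and `update_errors` and the parameters `n_pulses`, `g_min`, `g_max` and `a` are hypothetical, normalised illustration values, not device data.

```python
import numpy as np


def saturating_curve(n, n_pulses, a):
    """Hypothetical normalised non-linear response: 0 at n = 0, 1 at n = n_pulses.

    Smaller `a` means a stronger non-linearity (assumed model, not the paper's fit).
    """
    return (1 - np.exp(-n / a)) / (1 - np.exp(-n_pulses / a))


def update_errors(n_pulses=100, g_min=0.0, g_max=1.0, a=20.0):
    """Per-pulse error (actual step minus ideal linear step) for both sweeps."""
    n = np.arange(n_pulses + 1)
    span = g_max - g_min
    # Ideal (perfectly linear) curves: the same conductance step for every pulse.
    lin_pot = g_min + span * n / n_pulses
    lin_dep = g_max - span * n / n_pulses
    # Assumed non-linear curves: large steps at first, then saturation.
    fit_pot = g_min + span * saturating_curve(n, n_pulses, a)
    fit_dep = g_max - span * saturating_curve(n, n_pulses, a)
    # Per-pulse error: conductance change actually obtained minus the ideal change.
    eps1_pot = np.diff(fit_pot) - np.diff(lin_pot)
    eps1_dep = np.diff(fit_dep) - np.diff(lin_dep)
    return eps1_pot, eps1_dep


eps1_pot, eps1_dep = update_errors()
# Index of the pulse with the largest |error| for potentiation and depression.
print(int(np.argmax(np.abs(eps1_pot))), int(np.argmax(np.abs(eps1_dep))))  # 0 0
```

Under this assumed model the conductance steps shrink monotonically along each sweep, so the per-pulse error is largest at the very first pulse. A weight that starts at the beginning of the potentiation curve therefore incurs its largest error on its first update.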