Fig. 3: Emulation of three-factor synaptic plasticity and reinforcement learning.
From: Chalcogenide optomemristors for multi-factor neuromorphic computation

A Mixed-mode behavior of a non-volatile Ag/GeSe3/Ag device. In the absence of light (yellow trace), electrical pulses applied to the device do not induce a switching event. Under illumination (blue trace), however, electrical pulses can trigger switching from the high-resistance state (HRS) to the low-resistance state (LRS). B Sketch of a rodent in a maze. Place cells represent the rodent’s location in the maze, such that exactly one place cell is active at each location. Each action cell represents one of four movement directions, and each location triggers one of the four movements. Initially, all synaptic weights are zero; through exploration of the maze and reinforcement learning, the rodent learns the weights that enable correct navigation from the initial position to the cheese reward. Here, each synapse is emulated by a non-volatile memristive device, and its weight by the device conductance. Reinforcement learning emerges through a three-factor synaptic plasticity rule. The rule involves an eligibility flag, which in our case is the illumination of the corresponding memristor, and a reward, which is applied as an electrical signal sent to all memristors. C Example trials during the rodent’s training. Each time the rodent moves, an eligibility flag (optical signal) is raised at the synapse connecting the corresponding place and action cells (red trace). The optical flag alone does not update the three eligible synaptic weights, e.g. in an unsuccessful trial (top sequence). A successful trial provides the electrical reward that potentiates the eligible synapses (bottom sequence). D Results of training. In the learned weight matrix of the neural network, the electrical conductance (in µS) of the memristive synapses maps each place to an action. This learned mapping corresponds to the correct path to the cheese (inset).
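
The three-factor rule described in panels B and C can be illustrated with a minimal software sketch. It is not the authors' implementation or a device model. The 3x3 maze, the start and goal positions, the conductance value `G_LRS`, the exploration policy, and all function names are hypothetical choices made for illustration. One further simplifying assumption is that only the most recent action flagged at each place stays eligible within a trial.

```python
import random

# Hypothetical values, chosen only for illustration (not taken from the figure).
SIZE = 3                      # assumed 3x3 maze
START, GOAL = (0, 0), (2, 2)  # assumed start and cheese positions
G_LRS = 1.0                   # assumed conductance after HRS -> LRS switching
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # action cells: up, down, left, right


def place_index(pos):
    """Index of the single place cell that is active at grid position pos."""
    return pos[0] * SIZE + pos[1]


def step(pos, action):
    """Move one cell; a move into a wall leaves the position unchanged."""
    r, c = pos[0] + MOVES[action][0], pos[1] + MOVES[action][1]
    return (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else pos


def run_trial(weights, rng, max_steps=200):
    """Explore the maze once. Returns (eligibility flags, rewarded)."""
    eligible = {}  # place cell -> action cell flagged most recently (assumption)
    pos = START
    for _ in range(max_steps):
        row = weights[place_index(pos)]
        # Follow a learned synapse if one exists, otherwise explore at random.
        action = row.index(max(row)) if max(row) > 0 else rng.randrange(4)
        eligible[place_index(pos)] = action  # raise the optical eligibility flag
        pos = step(pos, action)
        if pos == GOAL:
            return eligible, True   # successful trial: reward will be delivered
    return eligible, False          # unsuccessful trial: no reward


def apply_reward(weights, eligible, rewarded):
    """Three-factor update: only flagged synapses switch, and only on reward."""
    if not rewarded:
        return  # the flag alone leaves every conductance unchanged
    for place, action in eligible.items():
        weights[place][action] = G_LRS  # reward reaches all, only flagged ones switch


def train(trials=50, seed=0):
    """Run several trials starting from an all-zero weight matrix."""
    rng = random.Random(seed)
    weights = [[0.0] * 4 for _ in range(SIZE * SIZE)]
    for _ in range(trials):
        eligible, rewarded = run_trial(weights, rng)
        apply_reward(weights, eligible, rewarded)
    return weights
```

Calling `train()` returns a place-by-action weight matrix analogous in spirit to panel D: reading the strongest entry in each visited row gives the action taken at that place. Modeling the update as a binary switch to `G_LRS`, rather than a graded increment, is a design choice that loosely mirrors the HRS to LRS switching in panel A.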