Fig. 4: Recall property in dynamic environments and power efficiency of the system.
From: A self-adaptive hardware with resistive switching synapses for experience-based neurocomputing

a Experimental results for 9 successive trials of a maze whose topological configuration changes every 3 trials. The system explores the environment to find the reward and recalls the first solution once the previous configuration is presented again. b The time to reach the solution improves from trial to trial as the policy is optimized. However, note that once the maze changes shape, the reward time increases accordingly, since a new solution must be found. When the maze returns to the previous configuration, the first solution is recalled. c Energy consumption trend for each core of the system. d When the initial point is changed from trial to trial, the energy consumption remains high, but a policy map of the whole environment is retrieved. e Map of the firing rate of the neurons, showing that the highest values are, on average, in the vicinity of the final reward. f Colour maps of the accuracy for standard Python-based deep Q-learning and the proposed bio-inspired approach under the same benchmarking conditions. Note that the bio-inspired hardware achieves better accuracy for every combination of exploration parameters (number of trials per experiment and number of steps per single trial, i.e., exploration time). g Comparison in terms of memory computing elements between the deep Q-learning procedure and the bio-inspired solution at increasing sizes of the environment to explore. Note that the power consumption is further improved in the bio-inspired solution thanks to the use of RRAM memory devices built in the back end of the line, which avoids the von Neumann bottleneck typical of standard computing platforms.
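
To illustrate the scaling argument of panel g, a minimal Python sketch is given below. It counts memory elements for a hypothetical fully connected deep Q-network (one-hot state input, one output per action) versus a bio-inspired mapping with one resistive synapse per state-action pair. The layer width, action count, and maze sizes are assumptions chosen for illustration, not values reported in the paper.

```python
import numpy as np

# Hypothetical comparison of memory/computing elements needed to learn an
# N x N maze; architectural parameters below are assumptions for illustration.

def dqn_memory_elements(n_states, n_actions=4, hidden=64):
    """Weights and biases of an assumed two-layer fully connected Q-network
    taking a one-hot state encoding and outputting one Q-value per action."""
    return (n_states * hidden + hidden) + (hidden * n_actions + n_actions)

def rram_memory_elements(n_states, n_actions=4):
    """Assumed bio-inspired mapping: one RRAM synapse per state-action pair."""
    return n_states * n_actions

for side in (4, 8, 16, 32):  # increasing environment sizes
    n_states = side * side
    print(f"{side:>2}x{side:<2} maze: "
          f"DQN ~{dqn_memory_elements(n_states):>6} elements, "
          f"RRAM ~{rram_memory_elements(n_states):>5} synapses")
```

Under these assumptions the state-action table grows linearly with the number of maze cells, while the fully connected network's input layer dominates its parameter count, which is the qualitative trend panel g compares.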