Extended Data Table 1 Energy consumption and latency of the actor–critic TD learning algorithm for three different scenarios: (i) a crossbar implementation of our framework (‘This work’); (ii) common approaches of using memristors within RL applications (‘Hybrid’); and (iii) a full software implementation executed on a GPU
From: Actor–critic networks with analogue memristors mimicking reward-based learning
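For readers unfamiliar with the algorithm being benchmarked, the following is a minimal sketch of tabular actor–critic TD(0) learning, in which a single TD error drives both the critic (value) and actor (policy) updates. The two-state toy environment, its reward table, and all hyperparameters here are illustrative assumptions, not taken from the paper or its hardware implementation.

```python
import numpy as np

# Minimal tabular actor-critic TD(0) sketch on a hypothetical 2-state,
# 2-action chain. Environment and hyperparameters are illustrative only.
rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
V = np.zeros(n_states)                    # critic: state-value estimates
theta = np.zeros((n_states, n_actions))   # actor: action preferences

P = np.array([[0, 1], [0, 1]])            # next state given (state, action)
R = np.array([[0.0, 1.0], [1.0, 0.0]])    # reward given (state, action)
gamma, alpha_v, alpha_pi = 0.9, 0.1, 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

s = 0
for _ in range(2000):
    pi = softmax(theta[s])                # actor's policy in state s
    a = rng.choice(n_actions, p=pi)
    s_next, r = P[s, a], R[s, a]
    delta = r + gamma * V[s_next] - V[s]  # TD error: shared learning signal
    V[s] += alpha_v * delta               # critic update
    grad = -pi
    grad[a] += 1.0                        # grad of log pi(a|s) under softmax
    theta[s] += alpha_pi * delta * grad   # actor update (policy gradient)
    s = s_next
```

In a crossbar realization of this scheme, the value and preference tables would be stored as analogue memristor conductances and the TD-error-scaled updates applied as programming pulses; the sketch above only shows the underlying learning rule in software form.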
