Extended Data Table 1 Energy consumption and latency of the actor-critic TD learning algorithm for three different scenarios: (i) a crossbar implementation of our framework ('This work'); (ii) common approaches of using memristors within RL applications ('Hybrid'); and (iii) a full software implementation executed on a GPU

From: Actor–critic networks with analogue memristors mimicking reward-based learning

  1. The energy consumption is given per single operation and is normalized to the number of weights. Here, we assume fp16 precision on the GPU. For both the activity and weight-update calculations on the GPU, we assumed a vector–matrix operation. In practice, the weight-update calculation would be a vector–vector multiplication, which cannot benefit from the same degree of parallelism and would be less efficient. The latency values for the operations performed on the GPU are taken from ref. 97 and were extracted for the specific matrix sizes in their algorithm; they can therefore serve only as a rough reference. The energy consumption and latency for the GPU vector–matrix multiplications already include fetching the data from memory and storing the results.