Fig. 1: Challenges and solutions for realizing a memristive Bellman solver.

a Schematic of the dynamic programming idea and its applications in various fields, such as reinforcement learning, autonomous driving and path planning. b A scene with three states (S1, S2, S3) in which each state can transition to any state, including itself. The optimized value of each state can be obtained by solving the Bellman equation for decision-making. c The action-implicit, backward-induction Bellman equation (without a time dimension) for optimizing each state value in b. Because there is no time dimension, V(Sn) (marked in red) may be indeterminate for any state. Therefore, although the equation appears compatible with MCIM, it cannot be determined at hardware deployment whether an output corresponds to the subsequent input. The MBS (right panel) optimizes each state value in b by incorporating the temporal dimension and transforming the iterative solving process into recurrent dot-product operations, which makes it compatible with MCIM. d In a digital computing system, the Bellman solution process is exact. However, when states have identical state transition probabilities, more iterations are needed to distinguish them. e The intrinsic noise of memristors facilitates the distinction of identical state transition probabilities, yielding an approximate solution process that requires fewer iterations.
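The recurrent dot-product form described in panel c can be illustrated with a minimal NumPy sketch. It assumes an action-implicit Bellman update V_{t+1} = R + γ P V_t, where P is a row-stochastic 3 × 3 transition matrix for S1 to S3, R is a per-state reward and γ is a discount factor. The values of P, R and γ, the function name bellman_iterate and the multiplicative Gaussian noise model (a crude stand-in for memristor conductance variation) are all hypothetical and are not taken from the paper. Each loop iteration is one matrix-vector product, the operation a memristor crossbar would perform in memory; here it is only emulated in software.

```python
import numpy as np


def bellman_iterate(P, R, gamma=0.9, steps=300, noise_std=0.0, rng=None):
    """Iterate V_{t+1} = R + gamma * P @ V_t as a recurrent matrix-vector product.

    P: (n, n) row-stochastic transition matrix (hypothetical example values).
    R: (n,) immediate reward per state (hypothetical).
    noise_std: std of a multiplicative Gaussian perturbation applied to P at
        every step, an assumed stand-in for memristor conductance noise.
    """
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    if rng is None:
        rng = np.random.default_rng(0)
    V = np.zeros_like(R)  # start from V_0 = 0 (no future value yet)
    for _ in range(steps):
        if noise_std > 0.0:
            # Each read of the emulated "crossbar" sees slightly perturbed weights.
            P_eff = P * (1.0 + noise_std * rng.standard_normal(P.shape))
        else:
            P_eff = P
        V = R + gamma * (P_eff @ V)  # one dot-product step per time slice
    return V


# Three states with identical off-diagonal transition probabilities (hypothetical).
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
R = np.array([1.0, 0.0, 0.5])

V_exact = np.linalg.solve(np.eye(3) - 0.9 * P, R)  # closed-form fixed point
V_ideal = bellman_iterate(P, R)
V_noisy = bellman_iterate(P, R, noise_std=0.02, rng=np.random.default_rng(1))
print("exact:", V_exact)
print("ideal:", V_ideal)
print("noisy:", V_noisy)
```

Without noise, the loop converges to the fixed point (I − γP)⁻¹R, so the recurrent form reproduces the exact solution of panel d. The noisy run only shows how perturbed weights enter the same recurrence; it does not by itself reproduce the reduction in iterations reported in panel e.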