Fig. 2: Workflow of the memristive Bellman solver for decision-making.

a The solving process of the recurrent dot product Bellman equation for the application scene in Fig. 1b. Two operators are defined here: the memristive Bellman dot operator (MBdot operator) and the memristive Bellman recurrent operator (MBr operator). The blue arrows indicate the current flow at each time step. b Memristive Bellman solver (MBS) and memristive decision optimization. The Bellman equation is solved by performing recurrent MBdot and MBr operations until the difference between two adjacent MBr operation results is lower than a specific threshold (ε). After the Bellman equation is solved by the MBS, the weights (conductance states of the memristors) are updated according to the ε-greedy rule to optimize the decision. The updated weights are compared with the previous weights until the difference is less than the threshold (τ), that is, until the weights approach stability, which means the decision optimization process has finished. c Variation of the number of value iterations with the size of the state space. d, e Comparison of the MBdot recurrent cycles (d) and the MBr recurrent cycles (e) for the approximate solution and the precise solution.
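The stopping rule described in panel b (repeat dot-product updates until two successive results differ by less than ε) can be illustrated in software with ordinary value iteration. The sketch below is a minimal illustration under stated assumptions, not the memristive implementation: the function name `solve_bellman`, the discount factor `gamma`, and the two-state toy problem are all hypothetical, and the outer weight-update loop governed by τ is not shown.

```python
def solve_bellman(P, R, gamma=0.9, eps=1e-6, max_iters=10_000):
    """Plain value iteration on a small MDP (illustrative only).

    P[a][s][s2] is the probability of moving from state s to s2 under
    action a; R[s] is the reward of state s. All names are assumptions.
    """
    n = len(R)
    V = [0.0] * n
    for k in range(1, max_iters + 1):
        V_new = []
        for s in range(n):
            # Dot product of each transition row with the current values,
            # a software stand-in for the dot-product step.
            q = [sum(p * v for p, v in zip(P[a][s], V)) for a in range(len(P))]
            V_new.append(R[s] + gamma * max(q))
        # Largest change between two successive iterations.
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff < eps:  # stop once successive results agree within eps
            return V, k
    return V, max_iters


# Toy problem: two states, actions "stay" and "move to state 1".
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # stay
    [[0.0, 1.0], [0.0, 1.0]],  # move to state 1
]
R = [0.0, 1.0]
V, iters = solve_bellman(P, R)
print(V, iters)
```

With `gamma = 0.9`, the fixed point of this toy problem is V(1) = 1/(1 - 0.9) = 10 and V(0) = 0.9 × 10 = 9. A looser `eps` stops earlier with a less accurate result, which is the same kind of trade-off between cycle count and accuracy that panels d and e compare for the approximate and precise solutions.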