A framework based on actor–critic temporal difference learning is discussed. It employs a biologically plausible network architecture that mimics reward-based learning on memristors and enables full in-memory training for navigation tasks.
- Kevin Portner
- Till Zellweger
- Alexandros Emboras