Fig. 4: Cliff walking task with PIC-RL.
From: High-efficiency reinforcement learning with hybrid architecture photonic integrated circuit

a The 4 × 12 cliff walking grid world task includes a starting point (S) and a goal point (G). The agent seeks the optimal path, marked by the green arrowed line. The inset shows the HyArch PIC configuration used to solve the cliff walking task. b Numerical (top) and experimental (bottom) similarity maps. The white arrow indicates the guiding direction. White numbers represent calculation errors in the similarity map. c Q-table of the 4 × 12 cliff walking task, displaying cumulative rewards for all state-action pairs. d Error map between the numerical and experimental Q-tables. e Training curves for the 4 × 12 cliff walking task based on 2000 agents. The SRF RL algorithm improves by 30.6% over the CRF RL algorithm. f A 10 × 10 grid world with a complex cliff environment and its optimal path. g Error map between the numerical and experimental similarity calculations in the 10 × 10 grid world. h Training curves for the RL algorithm in the complex cliff environment, with 2000 agents, indicating a 12.2% improvement for the SRF RL algorithm over the CRF RL algorithm.
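To make the Q-table (panel c) and training-curve (panels e, h) quantities concrete, the sketch below implements plain tabular Q-learning on a 4 × 12 cliff walking grid world. It is a minimal illustrative baseline only, assuming standard cliff walking rewards: the paper's photonic similarity computation and its SRF/CRF reward formulations are not reproduced, and all names and hyperparameters (ALPHA, GAMMA, EPSILON, episode count) are assumptions.

# Minimal tabular Q-learning sketch for a 4 x 12 cliff walking grid world.
# Illustrative only; hyperparameters and reward values are assumed, not taken
# from the paper, and the photonic SRF/CRF schemes are not modeled here.
import numpy as np

ROWS, COLS = 4, 12                    # grid size of the cliff walking task
START, GOAL = (3, 0), (3, 11)         # S at bottom-left, G at bottom-right
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1        # assumed hyperparameters


def step(state, action):
    """Apply an action; stepping onto the cliff returns the agent to S."""
    r, c = state
    dr, dc = ACTIONS[action]
    r = min(max(r + dr, 0), ROWS - 1)
    c = min(max(c + dc, 0), COLS - 1)
    if r == 3 and 0 < c < 11:          # cliff cells between S and G
        return START, -100.0, False
    if (r, c) == GOAL:
        return GOAL, -1.0, True
    return (r, c), -1.0, False


def train(episodes=500, rng=np.random.default_rng(0)):
    """Build a Q-table of cumulative rewards for all state-action pairs."""
    q = np.zeros((ROWS, COLS, len(ACTIONS)))
    returns = []                       # per-episode return, i.e. a training curve
    for _ in range(episodes):
        state, total, done = START, 0.0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < EPSILON:
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax(q[state]))
            nxt, reward, done = step(state, a)
            # standard Q-learning update rule
            q[state][a] += ALPHA * (reward + GAMMA * np.max(q[nxt]) - q[state][a])
            state, total = nxt, total + reward
        returns.append(total)
    return q, returns


if __name__ == "__main__":
    q_table, episode_returns = train()
    print("greedy action per cell:\n", np.argmax(q_table, axis=2))

Under these assumptions, q_table corresponds to the kind of state-action value map shown in panel c, and episode_returns plays the role of a single-agent training curve; the curves in panels e and h would be averages over many such agents (2000 in the figure).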