Figure 10
From: Relative importance sampling for off-policy actor-critic in deep reinforcement learning

(a,b) Training summary of RIS-off-PAC and RIS-off-PNAC, respectively, for different values of \(\beta \in [0,1]\). The x-axis shows the total number of training episodes. The y-axis shows the reward averaged over 100 episodes.
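
The y-axis quantity can be illustrated with a minimal sketch. The caption does not say whether the average is taken over a trailing window or over disjoint blocks of 100 episodes; the code below assumes a trailing window, and the function name `running_average` and its parameters are hypothetical, not taken from the paper.

```python
from collections import deque


def running_average(rewards, window=100):
    """Mean of the most recent `window` episode rewards, after each episode.

    Assumption: a trailing window. Early entries average over fewer
    than `window` episodes because the window is not yet full.
    """
    recent = deque(maxlen=window)
    total = 0.0
    averages = []
    for r in rewards:
        if len(recent) == window:
            total -= recent[0]  # oldest reward is evicted by the append below
        recent.append(r)
        total += r
        averages.append(total / len(recent))
    return averages
```

Calling `running_average(episode_rewards)` on a list of per-episode rewards gives one curve of the kind plotted for a single value of \(\beta\).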