Figure 12
From: Relative importance sampling for off-policy actor-critic in deep reinforcement learning

(a) Training summary of all algorithms on CartPole. The x-axis shows the total number of training episodes; the y-axis shows the reward averaged over 300 episodes. (b) Training summary of all algorithms on Humanoid-v2. The x-axis shows the total number of training episodes; the y-axis shows the reward averaged over 5000 episodes.
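The caption does not say exactly how the per-episode rewards were averaged. One common reading is a trailing moving average. The sketch below assumes that reading: each plotted point is the mean reward over the most recent `window` episodes, using fewer episodes at the start before the window fills. The function name `trailing_average` is hypothetical and is not taken from the paper.

```python
from collections import deque
from typing import Iterable, List


def trailing_average(episode_rewards: Iterable[float], window: int) -> List[float]:
    """Hypothetical helper: for each episode, return the mean reward over the
    most recent `window` episodes (or over all episodes so far, if fewer)."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent: deque = deque(maxlen=window)  # holds at most `window` rewards
    total = 0.0
    averages: List[float] = []
    for reward in episode_rewards:
        if len(recent) == window:
            # The deque is full, so append() will evict the oldest reward;
            # remove it from the running sum first.
            total -= recent[0]
        recent.append(reward)
        total += reward
        averages.append(total / len(recent))
    return averages
```

Under this assumption, panel (a) would correspond to `trailing_average(rewards, 300)` and panel (b) to `trailing_average(rewards, 5000)`. The running sum keeps each update O(1) rather than re-summing the whole window.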