Fig. 3: Evaluation results for sparse-reward tasks.
From: Discovery of the reward function for embodied reinforcement learning agents

a The environments used in the sparse-reward tasks: CartPole-v1, Acrobot-v1, FourRoom-v0, and LunarLander-v2. b The sizes of the observation and action spaces of the four tasks, illustrating their varying levels of complexity. c Reward curves of the DQN agent during learning with different reward functions on each task. The solid lines show the average rewards, and the error bars indicate the standard deviations across five trials with different random seeds. d Reward curves of the PPO agent during learning with different reward functions on each task. e The average rewards obtained by the embodied RL agent over all episodes, together with the standard deviations of the rewards, for each algorithm on each task.
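
The averaging described for panels c and d can be illustrated with a minimal sketch. It assumes each trial yields one reward value per episode and that all trials have the same length. The function name `aggregate_curves`, the toy data, and the use of the population standard deviation are assumptions made for illustration; the caption does not specify how the authors computed these statistics.

```python
from statistics import mean, pstdev


def aggregate_curves(curves):
    """Combine per-seed reward curves into a mean curve and a std curve.

    `curves` is a list of lists: one reward sequence per random seed,
    all of equal length. Returns (means, stds), one value per episode.
    """
    if not curves:
        raise ValueError("need at least one curve")
    length = len(curves[0])
    if any(len(curve) != length for curve in curves):
        raise ValueError("all curves must have the same length")

    # zip(*curves) groups the rewards of all seeds episode by episode.
    per_episode = list(zip(*curves))
    means = [mean(values) for values in per_episode]
    # Population standard deviation is an assumption; the sample
    # standard deviation (statistics.stdev) is an equally plausible choice.
    stds = [pstdev(values) for values in per_episode]
    return means, stds


# Hypothetical toy data: five seeds, three episodes each.
curves = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [4.0, 5.0, 6.0],
    [5.0, 6.0, 7.0],
]
means, stds = aggregate_curves(curves)
```

Here `means` would be drawn as the solid line and `stds` as the error bars around it.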