Fig. 3
From: Continual deep reinforcement learning with task-agnostic policy distillation

Performance evaluation in the task-agnostic phase. Environments are sampled uniformly at random, so there are no task boundaries. Results are averaged over 8 random seeds, with 300,000 timesteps between distillation rounds in the task-agnostic phases. Each average is taken over 100 episodes.
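
The aggregation described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the placeholder environment names, and the choice to average the most recent 100 episodes of each seed before averaging across seeds are all assumptions made here for clarity.

```python
import random
from statistics import mean


def sample_environment(env_names, rng):
    """Pick one environment uniformly at random (hypothetical helper).

    Because every draw is independent and uniform, the learner never
    sees an explicit task boundary.
    """
    return rng.choice(env_names)


def mean_over_seeds(returns_per_seed, window=100):
    """Average episode returns per seed, then across seeds.

    Assumption: the per-seed average uses the last `window` episodes.
    `returns_per_seed` is a list with one list of episode returns per seed.
    """
    per_seed = [mean(returns[-window:]) for returns in returns_per_seed]
    return mean(per_seed)


if __name__ == "__main__":
    rng = random.Random(0)
    envs = ["env_a", "env_b", "env_c"]  # placeholder names, not from the paper
    print(sample_environment(envs, rng))

    # Synthetic returns for 8 seeds, 100 episodes each (illustrative data only).
    fake_returns = [[float(seed)] * 100 for seed in range(8)]
    print(mean_over_seeds(fake_returns))
```

With 8 seeds and a window of 100, each plotted point would summarise 800 episode returns under these assumptions.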