Fig. 3: Maintaining plasticity in a non-stationary reinforcement-learning problem. | Nature

From: Loss of plasticity in deep continual learning

a, The reinforcement-learning agent controls torques at the eight joints of the simulated ant (red circles) to maximize forward motion and minimize penalties. b, Here we use a version of the ant problem in which the friction on contact with the ground is abruptly changed every 2 million time steps. c, The standard PPO learning algorithm fails catastrophically on the non-stationary ant problem. If the optimizer of PPO (Adam) is tuned in a custom way, then the failure is less severe, but adding continual backpropagation or L2 regularization is necessary to perform well indefinitely. These results are averaged over 100 runs; the solid lines represent the mean and the shaded regions represent the 95% bootstrapped confidence interval.
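The caption notes that adding L2 regularization (alongside a tuned Adam optimizer) lets the agent perform well indefinitely on the non-stationary ant problem. The minimal sketch below illustrates the mechanism of L2 regularization as weight decay in a gradient step; the function name, learning rate, and coefficient are illustrative assumptions, not values from the paper:

```python
import numpy as np

def l2_regularized_step(weights, grad, lr=0.01, l2_coeff=1e-4):
    # The L2 penalty 0.5 * l2_coeff * ||w||^2 contributes l2_coeff * w
    # to the gradient, shrinking every weight toward zero on each update.
    # Keeping weight magnitudes bounded in this way is one mechanism by
    # which L2 regularization can help preserve plasticity over long,
    # non-stationary training runs.
    return weights - lr * (grad + l2_coeff * weights)

# Illustrative example: with a zero task gradient, only the decay acts,
# so the weights shrink geometrically by a factor (1 - lr * l2_coeff).
w = np.ones(8)   # e.g. one weight per controlled ant joint (illustrative)
g = np.zeros(8)
for _ in range(1000):
    w = l2_regularized_step(w, g)
```

In a full PPO implementation the same effect is typically obtained by passing a weight-decay coefficient to the optimizer rather than modifying the gradient by hand.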
