Figure 3 | Scientific Reports

From: Towards biologically plausible model-based reinforcement learning in recurrent spiking networks by dreaming new experiences

Planning. (A) Example of online “planning”: while the world is being observed (gray lines), the model is used to predict \(n_{fut}\) steps into the future. (B) Average (dashed line), standard error (shading) and 80th percentile (solid line) of the reward as a function of the number of interactions with the environment, with planning (green) and without planning (black, policy gradient only). (C) Average final reward as a function of \(n_{fut}\).
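The three curves in panel (B) are per-interaction summaries of the reward. The sketch below shows one way to compute them. It assumes the statistics are taken across independent training runs at each number of interactions, which the caption does not state. The function name `summarize_runs` and the data layout are hypothetical. The 80th percentile uses linear interpolation, and the authors may have used a different convention.

```python
import math
import statistics


def summarize_runs(rewards_per_run):
    """Summarize reward curves across runs (hypothetical helper).

    Assumption: rewards_per_run[r][t] is the reward of run r after t
    interactions with the environment; at least two runs are required
    because the sample standard deviation needs two values.
    Returns a list of (mean, standard_error, p80) tuples, one per t.
    """
    n_runs = len(rewards_per_run)
    n_steps = len(rewards_per_run[0])
    summary = []
    for t in range(n_steps):
        values = [run[t] for run in rewards_per_run]
        mean = statistics.fmean(values)  # dashed line in panel (B)
        # standard error of the mean: sample std / sqrt(number of runs)
        sem = statistics.stdev(values) / math.sqrt(n_runs)  # shading
        # quantiles(n=5) returns the 20th, 40th, 60th and 80th percentiles;
        # "inclusive" applies linear interpolation between sorted samples
        p80 = statistics.quantiles(values, n=5, method="inclusive")[3]  # solid line
        summary.append((mean, sem, p80))
    return summary


# Usage: five hypothetical runs, each recorded at two interaction counts
example = summarize_runs([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
```

Plotting the green and black curves of panel (B) would then mean calling the same summary on the runs with and without planning, respectively.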