Figure 2 | Scientific Reports

Figure 2

From: Towards biologically plausible model-based reinforcement learning in recurrent spiking networks by dreaming new experiences

Dreaming. (A) Three consecutive frames of the reconstructed environment during the dreaming phase. (B) Left: example of the spiking activity in the two sub-networks during a dream. Right: example of the readouts of the two sub-networks, representing the policy and the predicted y position of the paddle. (C) Reward as a function of the number of interactions with the environment, with (orange) and without (black) the dreaming phase: average (dashed line), standard error (shading), and 80th percentile (solid line) over 10 independent realizations. Thin lines represent individual realizations. (D) Average (dashed line), standard error (shading), and 80th percentile (solid line) of the reward when the policy gradient is active only during sleep. (E) Same as in (D), but the update of the model is interrupted after 40 (\(\times\)5000) interactions with the environment (blue dashed vertical line).
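
The summary statistics named in panels (C) to (E) can be illustrated with a minimal sketch. It assumes each realization is a list of rewards sampled at the same checkpoints; the function name `summarize`, the use of the sample standard deviation, and the linear-interpolation percentile are assumptions made for illustration, not details taken from the article.

```python
import math
import statistics


def summarize(rewards_per_run):
    """Per-checkpoint mean, standard error, and 80th percentile across runs.

    rewards_per_run: list of runs (hypothetical layout), each a list of
    rewards recorded at the same checkpoints. Needs at least two runs.
    """
    n_runs = len(rewards_per_run)
    summary = []
    # zip(*...) groups the rewards of all runs at each checkpoint.
    for values in zip(*rewards_per_run):
        mean = statistics.fmean(values)
        # Standard error of the mean: sample std divided by sqrt(number of runs).
        sem = statistics.stdev(values) / math.sqrt(n_runs)
        # With n=5 the cut points are the 20th, 40th, 60th, and 80th percentiles;
        # "inclusive" interpolates linearly between the observed values.
        p80 = statistics.quantiles(values, n=5, method="inclusive")[3]
        summary.append((mean, sem, p80))
    return summary
```

Called on 10 realizations, this would return one (mean, standard error, 80th percentile) triple per checkpoint, corresponding to the dashed line, the shading, and the solid line respectively.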
