Fig. 5: H agents learn to mimic an RL policy when WM contributes to guiding choice. | Nature Human Behaviour

From: A habit and working memory model as an alternative account of human reward-based learning

Left: I simulated RLWM (top) and WMH (bottom) mixture agents on a simple probabilistic two-armed bandit task. Right: the policy learned by the H agent (bottom) resembles an RL policy (top) when the WM module contributes sufficiently to choice. I varied the parameters ρ (the weight of the WM module's contribution to choice) and β (the inverse temperature, controlling noise in the softmax policy). Error bars indicate the standard error of the mean across n = 1,000 simulations.
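The simulation described above can be sketched in minimal form. This is an illustrative reconstruction, not the paper's exact model: the learning rate `alpha`, the reward probabilities, the specific WM update (overwriting with the last observed outcome), and all function names are assumptions; only the policy mixture weighted by ρ and the softmax with inverse temperature β follow the caption.

```python
import math
import random


def softmax(values, beta):
    # Softmax policy; lower beta means a noisier (more exploratory) choice.
    exps = [math.exp(beta * v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_mixture_agent(rho=0.8, beta=5.0, alpha=0.1,
                           p_reward=(0.8, 0.2), n_trials=200, seed=0):
    """Hypothetical two-armed bandit agent mixing a WM policy and an
    incremental RL (delta-rule) policy: pi = rho*pi_WM + (1-rho)*pi_RL."""
    rng = random.Random(seed)
    q = [0.5, 0.5]    # RL action values, updated incrementally
    wm = [0.5, 0.5]   # WM trace: stores the last observed outcome per arm
    choices = []
    for _ in range(n_trials):
        p_rl = softmax(q, beta)
        p_wm = softmax(wm, beta)
        policy = [rho * w + (1 - rho) * r for w, r in zip(p_wm, p_rl)]
        a = 0 if rng.random() < policy[0] else 1
        reward = 1.0 if rng.random() < p_reward[a] else 0.0
        q[a] += alpha * (reward - q[a])   # delta-rule RL update
        wm[a] = reward                     # WM overwrites with latest outcome
        choices.append(a)
    return choices, q


choices, q = simulate_mixture_agent()
```

With the better arm rewarded 80% of the time, the agent comes to prefer it; sweeping `rho` and `beta`, as in the figure, would then show how the WM contribution and choice noise shape the learned policy.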