Fig. 4: Results replicate in a probabilistic learning task.
From: A habit and working memory model as an alternative account of human reward-based learning

a, Model comparison showing the results from a family of models that manipulate the subjective value of outcome 0, r0, for RL, WM or both, with r0 a free parameter unless its fixed value is given in the label. r0 = 0 corresponds to standard RL or WM computations; r0 = 1 corresponds to an H agent that treats both outcomes alike. Highlighted in pink are agents that can be interpreted as WMH, and in brown those that correspond to RL mixtures. The winning model, RL(r0 = 1); WM(r0 = 0), assumes r0 = 1 for RL and r0 = 0 for WM and is thus a WMH agent, replicating the findings in the deterministic version of the task. I further verified that the winning model performed better than the best single-process model, WMf (Methods). Data are plotted as individual AIC values (dots) and group-mean AIC (± standard error), baselined to the group mean of the best model; the right plot shows the proportion of participants best fit by each model. b, A set-size effect was also observed in a probabilistic version of the task; the winning model (third from the left) captures the pattern of learning curves better than the competing models. Error bars indicate the standard error of the mean across n = 34 individual participants (dots in a).
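As a reading aid, and not taken from the paper's analysis code, the sketch below illustrates how the quantities plotted in panel a could be computed from a per-participant AIC matrix: AIC baselined to the group mean of the best model, the standard error across participants, and the proportion of participants best fit by each model. The model labels, array shapes and simulated AIC values are hypothetical placeholders.

```python
import numpy as np

# Hypothetical per-participant AIC scores: rows = participants (n = 34),
# columns = candidate models (placeholder labels, not the paper's exact set).
rng = np.random.default_rng(0)
model_names = ["RL(r0=1);WM(r0=0)", "WMf", "RL;WM", "RLr0;WMr0"]
aic = rng.normal(loc=200.0, scale=10.0, size=(34, len(model_names)))

# Best model at the group level = lowest group-mean AIC.
group_mean = aic.mean(axis=0)
best_idx = int(np.argmin(group_mean))

# Baseline every participant's AIC to the group mean of the best model,
# so the winning model sits near zero and worse models take positive values.
delta_aic = aic - group_mean[best_idx]

# Group mean and standard error of the baselined AIC (error bars in panel a).
mean_delta = delta_aic.mean(axis=0)
sem_delta = delta_aic.std(axis=0, ddof=1) / np.sqrt(aic.shape[0])

# Proportion of participants best fit by each model (right plot in panel a).
per_subject_best = np.argmin(aic, axis=1)
proportion_best = np.bincount(per_subject_best, minlength=len(model_names)) / aic.shape[0]

for name, m, s, p in zip(model_names, mean_delta, sem_delta, proportion_best):
    print(f"{name:>20s}  dAIC = {m:6.1f} +/- {s:4.1f}   best fit: {p:.0%}")
```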