Fig. 1: Protocol, behaviour and predictions.
From: A habit and working memory model as an alternative account of human reward-based learning

a, RLWM experimental paradigm. Participants performed multiple independent blocks of an RL task, using deterministic binary feedback to identify which of three actions was correct (Cor.) for each of ns stimuli, where ns is the set size. Varying ns targets WM load and allows us to isolate its contribution (ref. 21). b, Behaviour (plotted as mean ± standard error) across six datasets on the RLWM task: CF12 (ref. 21), SZ (ref. 24), EEG (ref. 31), fMRI (ref. 30), Dev (ref. 34) and GL (novel dataset). Top: learning curves showing the probability of a correct action choice as a function of stimulus iteration number, plotted per set size, illustrating a strong set-size effect that highlights WM contributions to behaviour. Bottom: error trial analysis showing, as a function of set size, the number of previous errors that match the chosen error (purple) or the other possible error (unchosen; cyan). The large gap at low set sizes indicates that participants avoid errors they made previously more often than other errors; the absence of a gap at high set sizes indicates that participants are unable to learn to avoid their past errors (black arrows). c, Qualitative predictions for the RL, WM and H modules, based on the trial example in a. Only the WM module predicts a set-size effect (ref. 21). Only the H module predicts that participants are more likely to repeat a previous error (for example, selecting action A1 for the triangle) than to avoid it.
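
The error trial analysis in b can be illustrated with a minimal sketch. It assumes one block of trials stored as (stimulus, action, correct action) tuples with actions coded 0 to 2. The function name, the data layout and the example trials are hypothetical, chosen for illustration rather than taken from the authors' analysis code. For each error trial, the sketch counts how often the same stimulus previously drew the chosen error and how often it drew the other (unchosen) error.

```python
from collections import defaultdict

N_ACTIONS = 3  # the task offers three actions per stimulus


def error_trial_counts(trials):
    """Return one (n_prev_chosen, n_prev_unchosen) pair per error trial.

    `trials` is a list of (stimulus, action, correct_action) tuples in
    presentation order for a single block (hypothetical layout).
    """
    # past_errors[stimulus][action] = number of earlier errors with that action
    past_errors = defaultdict(lambda: defaultdict(int))
    counts = []
    for stimulus, action, correct_action in trials:
        if action == correct_action:
            continue  # only error trials enter the analysis
        # With three actions there is exactly one other incorrect action.
        unchosen = next(
            a for a in range(N_ACTIONS) if a not in (action, correct_action)
        )
        counts.append(
            (past_errors[stimulus][action], past_errors[stimulus][unchosen])
        )
        # Record the current error only after counting the earlier ones.
        past_errors[stimulus][action] += 1
    return counts


# Hypothetical block: the correct action for "triangle" is 2.
example_trials = [
    ("triangle", 0, 2),  # first error, no history yet
    ("triangle", 0, 2),  # repeats the earlier error
    ("triangle", 1, 2),  # avoids the earlier error
    ("triangle", 2, 2),  # correct, skipped
]
print(error_trial_counts(example_trials))
```

Averaging the two counts over error trials, separately for each set size, would give quantities comparable to the purple and cyan curves. A chosen count below the unchosen count means past errors are being avoided, whereas similar counts mean they are not.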