Fig. 5

VmPFC activity—experiment 3. In experiment 3, each of the three options was associated with a drifting reward probability. a Left panel: sigmoid functions illustrating the proportion of trials on which a stimulus (stimulus A) was chosen as a function of the difference between the values estimated for that stimulus and for the alternative option (stimulus B). The value estimates were derived from a standard reinforcement learning algorithm (METHODS: Reinforcement learning—experiment 3). a Right panel: normalized choice values used in the GLM analysis. As in Fig. 3c (right panel), normalization was carried out separately on the chosen option value estimates, the unchosen option value estimates, and the chosen minus unchosen value difference estimates. The effect shown in b is unpacked in c, demonstrating that activity in macaque vmPFC decreases as the value of the chosen option increases and increases as the value of the unchosen option increases. Note that in experiment 3 trials were performed quickly, so that activity in approximately the first seven seconds reflects the current trial (trial n), whereas later activity reflects decisions on the subsequent trial (trial n + 1). The gray vertical bar indicates the approximate boundary between trials n and n + 1. On 66% of occasions, the option chosen on trial n was offered again on trial n + 1 (and it was often chosen again), and on 66% of occasions the option unchosen on trial n was offered again (in which case it was frequently unchosen again); as a result, the contrasts for trial n also capture activity on trial n + 1.
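For readers unfamiliar with the modeling pipeline summarized in panel a, the sketch below illustrates, in Python, the kind of computation typically meant by a "standard reinforcement learning algorithm" with a sigmoid choice rule and separately normalized GLM regressors. The delta-rule update, the learning rate alpha, the inverse temperature beta, and all variable names are illustrative assumptions, not the fitted model or parameters reported here (see METHODS: Reinforcement learning—experiment 3 for the actual model).

```python
import numpy as np

def rescorla_wagner_values(chosen, unchosen, rewards, n_options=3, alpha=0.3):
    """Illustrative value learner (assumed form): only the chosen option's
    value is updated on each trial via a prediction-error (delta) rule."""
    V = np.full(n_options, 0.5)                 # initial value estimates
    v_chosen = np.empty(len(chosen))
    v_unchosen = np.empty(len(chosen))
    for t, (c, u, r) in enumerate(zip(chosen, unchosen, rewards)):
        v_chosen[t], v_unchosen[t] = V[c], V[u]  # pre-update values guide the choice
        V[c] += alpha * (r - V[c])               # update chosen option toward outcome
    return v_chosen, v_unchosen

def p_choose_a(v_a, v_b, beta=5.0):
    """Sigmoid choice rule: probability of choosing A as a function of the
    value difference V(A) - V(B), as plotted in panel a (left)."""
    return 1.0 / (1.0 + np.exp(-beta * (v_a - v_b)))

def zscore(x):
    """Normalize a regressor to zero mean and unit variance; applied
    separately to each regressor, as in panel a (right)."""
    return (x - x.mean()) / x.std()

# Example: build the three GLM regressors from simulated trial data
# (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n_trials = 200
chosen = rng.integers(0, 3, n_trials)                      # index of chosen option
unchosen = (chosen + rng.integers(1, 3, n_trials)) % 3     # a different offered option
rewards = rng.integers(0, 2, n_trials).astype(float)       # binary reward outcomes

v_c, v_u = rescorla_wagner_values(chosen, unchosen, rewards)
regressors = {
    "chosen_value": zscore(v_c),
    "unchosen_value": zscore(v_u),
    "chosen_minus_unchosen": zscore(v_c - v_u),
}
```

In such a sketch, each regressor is normalized on its own so that effect sizes for the chosen value, unchosen value, and their difference are on comparable scales in the GLM, which is the rationale implied by the right panel of a.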