Fig. 2

Methods and analysis. a Probabilistic reversal learning task. Subjects chose between a green and a blue stimulus with different reward magnitudes (displayed inside each stimulus). In addition to the random reward magnitude, stimulus value was determined by the probability of reward associated with each stimulus, which drifted over the course of the experiment and had to be learned from feedback. After the choice was made, the red bar moved from left to right if the chosen option was rewarded. Reaching the silver bar was rewarded with £10, the gold bar with £20. In the example shown, the subject had chosen the green stimulus (gray frame) but was not rewarded, so the red bar did not move. b Example of a reward probability schedule and the estimated volatility of the reward probability from a Bayesian learner when the stable phase came first¹. Each session had a stable phase of 60 trials, in which one stimulus was rewarded on 75% of trials and the other on 25%, and a volatile phase with reward probabilities of 80 vs. 20% that swapped every 20 trials. The order of phases was counterbalanced across subjects. Note that the reward rate and volatility estimates from the Bayesian learner are shown only to convey the task structure and the difference in volatility between sub-sessions; the Bayesian learner model was not used for the analysis. c Analysis. As in a conventional fMRI analysis, the blood-oxygen-level-dependent (BOLD) signal time course in every voxel was analyzed in a general linear model (GLM) with a design matrix containing the relevant regressors. Additionally, one regressor modeled the estimated reward probability of the chosen option during the decision phase, and another the prediction error during the outcome phase. The same learning rates (LRs) were used for deriving the reward probability estimates and the prediction error regressors (these two regressors are referred to collectively as LR regressors). This analysis was repeated 30 times, deriving the beta-values for probability estimates and prediction errors based on 30 different LRs and testing their ability to explain signal variance. d With equally spaced LRs across the LR spectrum (0.01–0.99), the regressors would be more strongly correlated at higher LRs; therefore, we derived 30 LRs with larger intervals between higher LRs, resulting in uniform correlation across the spectrum. e In a highly volatile environment, the stimulus–reward history should be discounted more steeply (higher LR), because information from many trials ago is likely to be outdated. The blue decay functions show the relative contribution of previous trials’ outcomes to the current reward probability estimate for different LRs. We derived the best-fitting LR for every voxel in every subject. For example, within the dorsal anterior cingulate cortex (dACC) the BOLD signal in some voxels is best explained by a low LR (red), in others by a high LR (yellow).
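
To make the schedule in panel b concrete, the sketch below generates one possible trial-by-trial reward probability sequence under the stated contingencies. The function name, the number of volatile blocks, and the sampling of outcomes are illustrative assumptions; the caption specifies only the stable-phase length (60 trials) and the 75/25 and 80/20 contingencies.

```python
import numpy as np

def make_schedule(stable_trials=60, volatile_blocks=6, block_len=20,
                  p_stable=0.75, p_volatile=0.80, stable_first=True, seed=0):
    """Sketch of the Fig. 2b task structure: per-trial reward probability
    of the 'green' stimulus plus sampled outcomes. The number of volatile
    blocks is an assumption made only for this illustration."""
    stable = np.full(stable_trials, p_stable)
    # Volatile phase: 80/20 contingencies swap every `block_len` trials.
    blocks = [np.full(block_len, p_volatile if i % 2 == 0 else 1 - p_volatile)
              for i in range(volatile_blocks)]
    volatile = np.concatenate(blocks)
    phases = (stable, volatile) if stable_first else (volatile, stable)
    p_green = np.concatenate(phases)
    rng = np.random.default_rng(seed)
    rewarded_green = rng.random(p_green.size) < p_green  # sampled trial outcomes
    return p_green, rewarded_green
```

Calling the function with stable_first=False would give the counterbalanced ordering in which the volatile phase comes first.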
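The LR regressors described in panel c can be sketched with a simple delta-rule (Rescorla–Wagner-style) update; the caption does not spell out the update equation, so the specific rule and the function name lr_regressors below are assumptions made for illustration.

```python
import numpy as np

def lr_regressors(outcomes, choices, lr):
    """Delta-rule probability estimates and prediction errors for one LR.

    outcomes: 1.0 if the chosen option was rewarded on that trial, else 0.0.
    choices:  0/1 index of the chosen stimulus on each trial.
    Returns the per-trial chosen-option probability estimate (decision-phase
    regressor) and prediction error (outcome-phase regressor).
    """
    v = np.full(2, 0.5)                    # initial reward probability estimates
    v_chosen = np.zeros(len(outcomes))
    pe = np.zeros(len(outcomes))
    for t, (c, o) in enumerate(zip(choices, outcomes)):
        v_chosen[t] = v[c]                 # estimate available at decision time
        pe[t] = o - v[c]                   # prediction error at outcome time
        v[c] += lr * pe[t]                 # delta-rule update of the chosen option
    return v_chosen, pe
```

Repeating this for each of the 30 LRs and entering the resulting probability-estimate and prediction-error time courses (convolved with a haemodynamic response function) into the design matrix would yield the 30 GLMs described in panel c.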
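The motivation for the unequal LR spacing in panel d can be illustrated with a small simulation: with equally spaced LRs, value traces built from neighbouring LRs become increasingly similar toward the high end of the spectrum. The fixed 75% reward rate and outcome sequence below are assumptions for this demonstration, not the authors' procedure for deriving the 30 LRs.

```python
import numpy as np

rng = np.random.default_rng(0)
outcomes = (rng.random(200) < 0.75).astype(float)   # toy binary outcome series

def value_trace(outcomes, lr, v0=0.5):
    """Delta-rule reward probability estimate over trials for one LR."""
    v, trace = v0, np.empty(outcomes.size)
    for t, o in enumerate(outcomes):
        trace[t] = v
        v += lr * (o - v)
    return trace

# Correlation between value traces from adjacent, equally spaced LRs.
equal_lrs = np.linspace(0.01, 0.99, 30)
r_adjacent = [np.corrcoef(value_trace(outcomes, a), value_trace(outcomes, b))[0, 1]
              for a, b in zip(equal_lrs[:-1], equal_lrs[1:])]
print(np.round(r_adjacent, 2))   # correlations tend to rise toward high LRs
```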
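The decay functions in panel e follow from the same delta-rule assumption: the outcome observed k trials in the past contributes to the current estimate with weight LR × (1 − LR)^k, so a higher LR discounts the stimulus–reward history more steeply. A minimal sketch, with example LR values chosen only for illustration:

```python
import numpy as np

# Relative contribution of the outcome k trials back to the current estimate,
# assuming a delta-rule update: weight = lr * (1 - lr) ** k.
trials_back = np.arange(20)
for lr in (0.1, 0.3, 0.7):
    weights = lr * (1 - lr) ** trials_back
    print(f"LR={lr}:", np.round(weights[:6], 3))   # steeper decay for higher LRs
```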