Fig. 1: Task, computational model and behavioural results.

A Probabilistic reversal learning (PRL) task. At the beginning of each trial, participants were shown a fixation cross of jittered duration (between 1 and 4 s), followed by two geometrical shapes sampled from a pool of 18 different shapes, which appeared randomly on either side of the screen for 1.25 s. Participants had 1 s to make a choice via a button press. In case of late responses, they were prompted to respond faster. After a second jittered interstimulus interval (ISI), the outcome of their decision was displayed for 0.65 s and was either positive (+10) or negative (−10). As shown in the box, the reinforcement schedule was probabilistic and asymmetrically skewed, so that one stimulus (the high-probability stimulus) was more likely to be associated with a positive outcome than the other (the low-probability stimulus). Crucially, reinforcement contingencies were reversed based on a fixed learning criterion. We also added a randomly generated number of buffer trials before reversals to prevent participants from easily inferring the underlying learning criterion. Prior to the experiment, participants were advised of the probabilistic nature of the task and that reinforcement contingencies might reverse based on their performance. They were also advised that they would be remunerated based on their task performance. Before fMRI scanning, they were allowed to practise the task for 5 min to become familiar with its speed requirements.

B The choice rule of the computational model was a pure drift-diffusion process. Solid EA traces represent the moment-by-moment ramping up of decision evidence at fast (blue) and slow (light blue) accumulation rates. The shaded areas under the EA ramps denote the amount of integrated decision evidence. Convolving a greater amount of integrated evidence with the haemodynamic response function (HRF) yields increasingly higher predicted peak BOLD responses. nDT stands for nondecision time.

C, D Raincloud plots showing pre-cCBT between-group comparisons of median RTs (C) and mean choice accuracies (D) recorded during the PRL task. Boxplots display the median and interquartile range. Colour-coded dots represent responders (R, green) and nonresponders (NR, red).
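The logic of panel B can be illustrated with a minimal sketch. It is not the authors' model: it assumes a noise-free linear evidence ramp in place of a full drift-diffusion simulation, a double-gamma HRF with conventional shape parameters (6 and 16, undershoot ratio 1/6), and hypothetical values for drift rate, decision boundary and nDT. All function names are illustrative.

```python
import math
import numpy as np

DT = 0.01  # simulation time step in seconds (assumed)


def ea_ramp(drift, boundary=1.0, ndt=0.3, duration=3.0, dt=DT):
    """Noise-free evidence ramp: zero during nDT, rising linearly at
    `drift` until `boundary` is reached, then zero again.
    All parameter values are hypothetical."""
    t = np.arange(0.0, duration, dt)
    evidence = drift * (t - ndt)
    active = (t >= ndt) & (evidence < boundary)
    return t, np.where(active, evidence, 0.0)


def double_gamma_hrf(length=30.0, dt=DT):
    """Double-gamma HRF with assumed shape parameters 6 (peak) and
    16 (undershoot) and an undershoot ratio of 1/6."""
    t = np.arange(0.0, length, dt)
    peak = t**5 * np.exp(-t) / math.gamma(6)
    undershoot = t**15 * np.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def integrated_evidence(evidence, dt=DT):
    """Area under the ramp (the shaded region in panel B)."""
    return float(np.sum(evidence) * dt)


def predicted_bold(evidence, dt=DT):
    """Predicted BOLD time course: the ramp convolved with the HRF."""
    return np.convolve(evidence, double_gamma_hrf(dt=dt)) * dt


# Fast and slow accumulation rates (hypothetical drift values).
_, fast = ea_ramp(drift=2.0)
_, slow = ea_ramp(drift=1.0)

fast_area = integrated_evidence(fast)
slow_area = integrated_evidence(slow)
fast_peak = float(predicted_bold(fast).max())
slow_peak = float(predicted_bold(slow).max())

print(f"integrated evidence: fast={fast_area:.3f}, slow={slow_area:.3f}")
print(f"predicted peak BOLD: fast={fast_peak:.4f}, slow={slow_peak:.4f}")
```

Under these assumptions, with a fixed boundary, the slower ramp takes longer to reach the boundary and therefore has the larger area, so its predicted peak BOLD response is higher.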