Fig. 1: Trial structure of the reinforcement learning task and Q-learning model.

a Trial structure: The task involved gain and loss trials. On gain trials, choosing the correct stimulus resulted in a win of 50 pence versus “nil”; on loss trials, choosing the correct stimulus resulted in “nil” versus a loss of 50 pence. b Q-learning model: The left panel (learning rate, \(\alpha\)) shows trajectories of the expected value of a given stimulus-action pair for an example time series of outcomes (green and red dashes) under a range of learning rates. The right panel (inverse temperature, \(\beta\)) shows the probability of performing a given action as a function of its expected value relative to that of the alternative action, under a range of inverse temperatures.
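
The roles of the two parameters in panel b can be illustrated with a minimal sketch. The caption does not state the model equations, so the sketch assumes the standard delta-rule value update and a two-option softmax choice rule. The function names, the starting value of 0, and the outcome coding (0.5 for a 50 pence win, 0.0 for "nil") are hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def update_q(q, outcome, alpha):
    """Assumed delta rule: move q toward the outcome by a fraction alpha."""
    return q + alpha * (outcome - q)


def choice_probability(q_action, q_alternative, beta):
    """Assumed two-option softmax, written as a logistic function
    of the value difference scaled by the inverse temperature beta."""
    return 1.0 / (1.0 + math.exp(-beta * (q_action - q_alternative)))


def q_trajectory(outcomes, alpha, q0=0.0):
    """Expected value after each outcome in the series, starting from q0."""
    trajectory = []
    q = q0
    for outcome in outcomes:
        q = update_q(q, outcome, alpha)
        trajectory.append(q)
    return trajectory


# Hypothetical outcome series for gain trials: 0.5 = win 50 pence, 0.0 = nil.
outcomes = [0.5, 0.5, 0.0, 0.5, 0.0, 0.5]

# Larger alpha makes the expected value track recent outcomes more closely.
for alpha in (0.1, 0.5, 0.9):
    print("alpha", alpha, [round(q, 3) for q in q_trajectory(outcomes, alpha)])

# Larger beta makes choice more deterministic for the same value difference.
for beta in (1.0, 5.0, 20.0):
    print("beta", beta, round(choice_probability(0.3, 0.1, beta), 3))
```

Under these assumptions, a low learning rate gives a smooth, slowly changing value trajectory and a high learning rate gives one that jumps with each outcome. A low inverse temperature gives choice probabilities near 0.5 regardless of the value difference, and a high inverse temperature gives a steep, near step-like choice curve.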