Figure 4

Description of the best model. The best model was selected using a group-level random-effects analysis. It included the third variant of RL, which behaves as if subjects had figured out that only the outcome valence, and not the monetary amount, was informative about cue value, and that the two cues always had opposite valence, so that both could be updated after every outcome. Confidence was based on outcome optimality and modulated both the learning rate and the choice temperature, with the same weight. Q is cue value; C is confidence; Op is outcome optimality (1 for winning £1 or losing 10p, −1 otherwise); Val is outcome valence (1 if positive, −1 otherwise); P risky/A is the probability of choosing the risky option when cue A is on screen; γ is the confidence learning rate; α0 is the learning rate when confidence = 0; β0 is the choice temperature when confidence = 0; κ is the weight of confidence on the learning rate and choice temperature.
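
As a reading aid, the structure described in this caption can be sketched as a single-trial update. The caption does not give the equations, so the functional forms below are assumptions rather than the authors' specification: a delta rule for confidence, additive modulation of the learning rate and temperature by κ·C, and a logistic choice rule. The function names (`clip`, `p_risky`, `step`) and the clipping bounds are hypothetical, and Val and Op are taken as given inputs.

```python
import math


def clip(x, lo, hi):
    """Keep x inside [lo, hi]."""
    return max(lo, min(hi, x))


def p_risky(q_cue, beta):
    """ASSUMED logistic choice rule: P risky/cue = 1 / (1 + exp(-Q / beta))."""
    return 1.0 / (1.0 + math.exp(-q_cue / beta))


def step(q, c, cue, val, op, gamma, alpha0, beta0, kappa):
    """One trial of the sketched model.

    q      : dict with cue values, e.g. {"A": 0.0, "B": 0.0}
    c      : current confidence C
    cue    : cue on screen, "A" or "B"
    val    : outcome valence Val (1 or -1)
    op     : outcome optimality Op (1 or -1)
    Returns (updated q, updated c, P risky/cue used on this trial).
    """
    # ASSUMPTION: confidence shifts both parameters additively with the
    # same weight kappa; alpha0 and beta0 are the values when C = 0.
    alpha = clip(alpha0 + kappa * c, 0.0, 1.0)
    beta = max(beta0 + kappa * c, 1e-6)  # temperature must stay positive

    p = p_risky(q[cue], beta)

    # Third RL variant: only valence is learned, and both cues are
    # updated after every outcome with opposite targets.
    other = "B" if cue == "A" else "A"
    new_q = dict(q)
    new_q[cue] = q[cue] + alpha * (val - q[cue])
    new_q[other] = q[other] + alpha * (-val - q[other])

    # ASSUMPTION: confidence tracks outcome optimality with a delta rule.
    new_c = c + gamma * (op - c)
    return new_q, new_c, p
```

Calling `step({"A": 0.0, "B": 0.0}, 0.0, "A", 1, 1, gamma=0.5, alpha0=0.5, beta0=1.0, kappa=0.2)` moves the two cue values in opposite directions and raises confidence. Under these assumed forms, that higher confidence then increases both the learning rate and the temperature on the next trial. The sign and shape of the modulation in the fitted model may differ.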