Figure 2

Model predictions. (a) In the unequal information condition (when information concerning the three decks was not always available), sRL (right frames) predicts a higher probability of choosing the highly rewarded option (i.e., exploitation), whereas kRL (left frames) predicts a higher probability of choosing the most informative, never-experienced option (i.e., 0seen). (b) Both models predict a decrement in directed exploration in the high reward context (generative mean set to 50 points) compared to the low reward context (generative mean set to 30 points). However, in the high reward context, kRL still chooses 0seen options more often than exploitative options (left frames), whereas sRL chooses exploitative options more often (right frames). (c) In the equal information condition (Baseline + Reward) under different reward contexts, both models predict increased random exploration in the high reward context compared to the low reward context.
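
The predicted decrement in directed exploration in panel (b) can be illustrated with a toy softmax choice rule. This is a minimal sketch, not the paper's sRL or kRL implementation: the fixed prior value for the never-sampled deck (40), the information bonus (10), and the inverse temperature (0.1) are hypothetical values chosen only to show how raising the generative mean erodes the relative advantage of the informative (0seen) option.

```python
import numpy as np

def choice_probs(q_seen, q_unseen, info_bonus, beta=0.1):
    # Softmax over utilities: learned values for the two sampled decks,
    # a fixed prior plus an information bonus for the never-sampled deck.
    # All parameter values are hypothetical illustrations.
    utilities = np.array([q_seen, q_seen, q_unseen + info_bonus])
    e = np.exp(beta * (utilities - utilities.max()))  # stable softmax
    return e / e.sum()

for mean in (30, 50):  # low vs. high reward context (generative mean)
    p = choice_probs(q_seen=mean, q_unseen=40.0, info_bonus=10.0)
    print(f"generative mean {mean}: P(choose 0seen) = {p[2]:.2f}")
```

Under these assumed parameters, P(choose 0seen) falls from about 0.79 in the low reward context to about 0.33 in the high reward context: as the sampled decks' learned values rise, the fixed information bonus no longer outweighs them, reproducing the qualitative pattern described above.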