Fig. 2: Best RL and RL-like models.
From: Hybrid neural–cognitive models reveal how memory shapes human reward learning

a–c, Model architectures. Best RL (a) is the best handcrafted model based on Q-learning, identified through extensive model comparison (Supplementary Results and Supplementary Table 2). It contains a standard Q-value update (left oval) with decay of unchosen action values (left oval, dotted recurrent arrows for Q(¬a)), as well as a reward-agnostic choice-perseveration mechanism (right oval). The outputs of the two computations are combined additively to sample the next choice. RL-ANN (b) has the same architecture as Best RL: a reward module that computes Q-values (left oval) and an action module that computes a perseveration kernel (right oval). However, RL-ANN uses ANNs so that each module can implement any update rule, making it a generalization of linear update models that encompasses Best RL as a special case. Vanilla RNN (c) is a standard RNN and the most flexible model; it provides an upper bound on behavioural prediction.

d, Model fits. Predictive accuracy was derived from the loss of each fitted model on held-out participants not seen during training (see ‘Model training’ in Methods). Best RL and RL-ANN predicted human choices significantly worse than Vanilla RNN, with no significant difference between the two, according to two-sided t-tests (see the main text for statistics). Data are presented as mean values over held-out blocks (n = 413) ± s.e.m.

e,f, Reward processing. In classic Q-learning as implemented by Best RL (e), updated values Qt+1(at) increase monotonically in both the observed reward rt and the previous value Qt(at) (colour), with strictly linear relationships (for model details and equations, see ‘Model architectures’ in Methods). After fitting to human behaviour, RL-ANN (f) acquired a qualitatively similar update rule, with monotonically increasing and near-linear relationships. For ease of visualization, we averaged sampled values Qt(at) (colour) within quantile groups to obtain discrete lines.
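The Best RL architecture described above can be sketched as a single-trial update: a delta-rule Q-value update for the chosen action, multiplicative decay of unchosen values, and a reward-agnostic perseveration kernel that drifts toward the last choice, with the two modules combined additively before a softmax. This is a minimal illustrative sketch; the parameter names and values (alpha, decay, pers, beta_q, beta_h) are assumptions for illustration, not the paper's fitted estimates or exact equations.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def best_rl_step(q, h, action, reward,
                 alpha=0.3, decay=0.1, pers=0.2,
                 beta_q=3.0, beta_h=1.0):
    """One trial of a Best RL-style update (illustrative sketch).

    q: Q-values (reward module); h: perseveration kernel (action module).
    All parameters are hypothetical, not the paper's fitted values.
    """
    q = q.copy()
    h = h.copy()
    # Standard delta-rule Q-value update for the chosen action
    q[action] += alpha * (reward - q[action])
    # Decay of unchosen action values toward zero
    unchosen = np.arange(len(q)) != action
    q[unchosen] *= (1.0 - decay)
    # Reward-agnostic perseveration: kernel drifts toward the last choice
    h = (1.0 - pers) * h
    h[action] += pers
    # Additive combination of both modules drives the next choice
    p_next = softmax(beta_q * q + beta_h * h)
    return q, h, p_next
```

Note that the perseveration kernel never sees the reward, matching the caption's description of the action module as reward-agnostic; only the Q-value module responds to rt.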