Fig. 4: Behavioural model validation. | Nature Human Behaviour

From: Hybrid neural–cognitive models reveal how memory shapes human reward learning

a–g, Comparison of human and simulated model behaviour. As shown in a, humans prefer increasing rewards. Humans (top left) repeated a choice (‘stayed’; y axis) more often when the rewards for this choice had increased (positive reward change; x axis) rather than decreased (negative reward change) on the previous two trials. Best-RL (Supplementary Fig. 11a) and RL-ANN (top right) showed the inverse pattern, whereas Context-ANN and Memory-ANN qualitatively reproduced the effect. The data are presented as mean values over blocks (n = 4,134) for each reward change bin ± 95% bootstrapped CIs. Panel b shows example action sequences. These are raw sequences of chosen actions (coloured circles) from humans and models performing the reward schedule shown in the top row of Supplementary Fig. 5a. Humans showed two common patterns: multiple repeats, extended periods of the same action (grey lines); and cyclic responses, sets of four sequential trials in which each action was sampled once (black lines). Panel c shows the effect of reward change on stay probability (see a). The data are presented as the regression coefficients from the model stay ~ reward × reward_change (n = 862 participants) ± standard errors of the coefficient estimates. As indicated in d, humans showed longer sequences of identical actions (average length, 6.9; 95% CI, (6.1, 7.6)) than expected by chance (chance length, 1.3; t(861) = 14.6, P < 0.001, d = 0.50) or seen in RL-ANN (average length, 4.5; 95% CI, (3.9, 5.1); t(861) = 9.4, P < 0.001, d = 0.32) and Context-ANN (average length, 5.5; 95% CI, (4.8, 6.1); t(861) = 5.4, P < 0.001, d = 0.19). Memory-ANN sequence length did not differ from that of humans (average length, 7.5; 95% CI, (6.8, 8.3); t(861) = −1.8, P = 0.075, d = 0.06).
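The two patterns quantified in b, d and e (multiple repeats and cyclic responses) can be sketched as follows. This is a minimal illustration, not the paper's analysis code; in particular, the use of non-overlapping four-trial windows for counting cyclic sequences is an assumption, as the exact windowing is defined in the Methods.

```python
from itertools import groupby

def repeat_run_lengths(actions):
    """Lengths of consecutive runs of the same action ('multiple repeats')."""
    return [sum(1 for _ in g) for _, g in groupby(actions)]

def count_cyclic_sequences(actions, n_actions=4):
    """Count windows of n_actions consecutive trials in which each of the
    n_actions actions is sampled exactly once ('cyclic responses').
    Assumes non-overlapping counting once a cycle is found."""
    count, i = 0, 0
    while i + n_actions <= len(actions):
        window = actions[i:i + n_actions]
        if len(set(window)) == n_actions:
            count += 1
            i += n_actions  # skip past the completed cycle
        else:
            i += 1
    return count

seq = [0, 0, 0, 1, 2, 3, 0, 1, 2, 3, 1]
print(repeat_run_lengths(seq))     # [3, 1, 1, 1, 1, 1, 1, 1, 1]
print(count_cyclic_sequences(seq)) # 2
```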
As shown in e, human choices contained more cyclic sequences than synthetic data (human mean, 5.37; 95% CI, (5.01, 5.68); RL-ANN mean, 2.68; 95% CI, (2.59, 2.76), t(861) = 10.9, P < 0.001, d = 0.37; Context-ANN mean, 3.87; 95% CI, (3.77, 3.97), t(861) = 7.77, P < 0.001, d = 0.26). Of the models, Memory-ANN produced the number of cyclic sequences closest to that of humans (Memory-ANN mean, 4.62; 95% CI, (4.48, 4.76), t(861) = 5.7, P < 0.001, d = 0.19). As shown in f, we used the Lempel–Ziv–Welch algorithm to compress human and model action sequences (see ‘Behavioural analyses’ in Methods), quantifying systematic temporal structure. Human sequences were substantially more compressible than those of RL-ANN and Context-ANN (human mean, 1.73; 95% CI, (1.70, 1.76); RL-ANN mean, 1.45; 95% CI, (1.44, 1.45), t(861) = 20.48, P < 0.001, d = 0.70; Context-ANN mean, 1.60; 95% CI, (1.59, 1.61), t(861) = 9.47, P < 0.001, d = 0.322). Memory-ANN compressibility did not differ from that of humans (mean, 1.74; 95% CI, (1.72, 1.76), t(861) = 0.69, P = 0.49, d = 0.02). In d–f, the data are presented as violin plots of the raw data distribution (n = 862 participants), with error bars indicating 95% CIs of the mean; P values are from paired, two-sided t-tests. Panel g shows the integration timescale: weights of trial-history regression models trained to predict future choices on the basis of past choices and rewards (see ‘Behavioural analyses’ in Methods). Only Memory-ANN qualitatively reproduced the patterns seen in human behaviour. The data are presented as mean values over participants (n = 862) for each trial ± 95% bootstrapped CIs. The corresponding plots for Simple RL, Best RL and Vanilla RNN for all these measures are shown in Supplementary Figs. 10 and 11.
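The compressibility measure in f can be sketched with a basic Lempel–Ziv–Welch dictionary pass over an action sequence. This is an illustrative implementation under assumed conventions: the normalization (raw sequence length divided by the number of emitted LZW codes, so that higher values mean more temporal structure) is an assumption, and the paper's exact definition is given in the Methods.

```python
def lzw_compress_len(seq):
    """Number of LZW dictionary codes needed to encode seq."""
    # Initialize the dictionary with all single-symbol phrases.
    dictionary = {(s,): i for i, s in enumerate(sorted(set(seq)))}
    w, n_codes = (), 0
    for s in seq:
        wc = w + (s,)
        if wc in dictionary:
            w = wc                          # extend the current phrase
        else:
            n_codes += 1                    # emit code for w
            dictionary[wc] = len(dictionary)  # learn the new phrase
            w = (s,)
    if w:
        n_codes += 1                        # emit code for the final phrase
    return n_codes

def compressibility(seq):
    """Raw length over compressed code count; higher = more structure
    (assumed normalization, for illustration only)."""
    return len(seq) / lzw_compress_len(seq)

# A pure repeat sequence compresses better than a cyclic one,
# which in turn compresses better than unstructured choices.
print(compressibility([0] * 20) > compressibility([0, 1, 2, 3] * 5))  # True
```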