Fig. 3: Goodness of fit of the optimised Q-learning parameters, with no replay. | Nature Communications

From: Post-learning replay of hippocampal-striatal activity is biased by reward-prediction signals

a Reliability diagram (trials pooled across all animals). Observed action probability indicates how often an action was chosen by the animal, averaged over similar predicted action probabilities. Data points represent per-rat percentile averages of action probabilities. b Histogram of residuals of the data in (a). Colour scale indicates, on average, which session the residuals within each bin occurred in. c Range of error scores for each trial (calculated from residuals) for each animal. An error of 0 reflects perfect modelling of action choices. Boxes represent 25th and 75th percentiles, circles represent median, whiskers represent range. n = 603–1394 error scores per rat. d Error scores pooled across rats and grouped into training sessions, normalised to the average error for each animal (shown in Table 1). n = 55–523 error scores per session. Data points show normalised error for all trials; solid line represents mean for all animals. Error bars represent s.e.m. e Change in error score, normalised to the optimised error score for each animal, with varying perturbations to the optimised parameter values. The optimised values for learning rate α, discount factor γ and exploration factor ϵ were individually perturbed by 1–50% above and below the optimised value, and the Q-learning algorithm was trained on behavioural data according to the perturbed parameter values 1000 times to obtain an average. Source data for (a–e) are provided as a Source Data file.
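The procedure summarised in (e) rests on a standard tabular Q-learning update with an ϵ-greedy policy. The sketch below is illustrative only, not the authors' implementation: the function names, state/action encoding, and numeric values are assumptions introduced for clarity.

```python
import numpy as np

def q_update(q, s, a, r, s_next, alpha, gamma):
    """One tabular Q-learning update (temporal-difference rule).
    alpha: learning rate; gamma: discount factor."""
    q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
    return q

def epsilon_greedy_prob(q_row, eps):
    """Predicted action probabilities under an epsilon-greedy policy:
    each action gets eps/n; the greedy action gets the remaining 1 - eps."""
    n = len(q_row)
    probs = np.full(n, eps / n)
    probs[np.argmax(q_row)] += 1.0 - eps
    return probs

def perturb(value, fraction, sign):
    """Perturb an optimised parameter up (+1) or down (-1) by a given
    fraction, as in the 1-50% sweep shown in panel (e)."""
    return value * (1.0 + sign * fraction)
```

In a perturbation analysis of this kind, one parameter (e.g. α) is moved off its optimised value with `perturb` while the others are held fixed, the model is re-fit to the behavioural data, and the resulting error score is compared to the optimised baseline.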
