Fig. 5: Prediction accuracy of Q-learning model with four alternative replay policies.
From: Post-learning replay of hippocampal-striatal activity is biased by reward-prediction signals

a Normalised error score for each of six rats, with varying numbers of samples replayed between sessions, averaged over all trials for each rat, for each of the four replay policies shown. Error scores are normalised to the average error with no replay; the dashed line represents this no-replay baseline. Error bars represent s.e.m. * indicates a score for RPE-prioritised replay statistically different from 1 (one-sided linear mixed-effects model, p < 0.05, uncorrected). b, c Average error for each session, normalised to the average no-replay error for each animal, with 1 sample (b) or 20 samples (c) replayed between sessions. Error bars represent s.e.m. d, e Average normalised error for each session, with varying numbers of samples replayed, under the RPE-prioritised (d) and RPE-proportional (e) replay policies. Source data for (a–e) are provided as a Source Data file.
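The caption does not spell out how an RPE-prioritised replay policy operates; the following is a minimal illustrative sketch, not the authors' implementation. It assumes a tabular Q-learning agent whose stored transitions are ranked by the magnitude of their reward-prediction error (RPE) under the current value estimates, with the top-n replayed between sessions; all function names, the learning rate, and the discount factor are hypothetical choices for illustration.

```python
def q_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update; returns the reward-prediction error."""
    rpe = r + gamma * max(Q[s2]) - Q[s][a]
    Q[s][a] += alpha * rpe
    return rpe

def rpe_prioritised_replay(Q, buffer, n_samples, gamma=0.9):
    """Replay the n_samples stored transitions with the largest |RPE|.

    buffer holds (s, a, r, s2) tuples; priorities are recomputed
    under the current Q before selection (an illustrative assumption).
    """
    ranked = sorted(
        buffer,
        key=lambda t: abs(t[2] + gamma * max(Q[t[3]]) - Q[t[0]][t[1]]),
        reverse=True,
    )
    for s, a, r, s2 in ranked[:n_samples]:
        q_update(Q, s, a, r, s2, gamma=gamma)

# Toy example: 3 states, 2 actions, 3 stored transitions.
Q = [[0.0, 0.0] for _ in range(3)]
buffer = [(0, 0, 1.0, 1), (1, 1, 0.0, 2), (0, 1, 0.5, 2)]
rpe_prioritised_replay(Q, buffer, n_samples=2)
```

An RPE-proportional policy, by contrast, would sample transitions with probability proportional to |RPE| rather than deterministically replaying the top-n.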