Abstract
Deep reinforcement learning algorithms are known for their sample inefficiency, requiring extensive episodes to reach optimal performance. Episodic reinforcement learning algorithms aim to overcome this issue by using extended memory systems to leverage past experiences. However, these memory augmentations are often used as mere buffers, from which isolated events are resampled for offline learning (for example, replay). In this Article, we introduce Sequential Episodic Control (SEC), a hippocampal-inspired model that stores entire event sequences in their temporal order and employs a sequential bias in their retrieval to guide actions. We evaluate SEC across various benchmarks from the Animal-AI testbed, demonstrating its superior performance and sample efficiency compared to several state-of-the-art models, including Model-Free Episodic Control, Deep Q-Network and Episodic Reinforcement Learning with Associative Memory. Our experiments show that SEC achieves higher rewards and faster policy convergence in tasks requiring memory and decision-making. Additionally, we investigate the effects of memory constraints and forgetting mechanisms, revealing that prioritized forgetting enhances both performance and policy stability. Further, ablation studies demonstrate the critical role of the sequential memory component in SEC. Finally, we discuss how fast, sequential hippocampal-like episodic memory systems could support both habit formation and deliberation in artificial and biological systems.
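To make the mechanism concrete, the following is a minimal sketch of a sequence-based episodic memory with a sequential retrieval bias and prioritized forgetting. The state encoding, similarity measure, per-sequence value and eligibility trace used here are simplifying assumptions made for illustration; they are not the published SEC implementation (see the Code availability statement for the actual code).

```python
# Minimal sketch of a sequential episodic memory (illustrative assumptions only).
import numpy as np


class SequentialEpisodicMemory:
    """Stores whole episodes as ordered event sequences and retrieves actions
    with a bias towards continuing sequences that were recently retrieved."""

    def __init__(self, max_sequences=500, trace_decay=0.9):
        self.max_sequences = max_sequences
        self.trace_decay = trace_decay
        self.sequences = []   # {"events": [(state, action), ...], "value": float, "trace": float}
        self.current = []     # events of the ongoing episode, kept in temporal order

    def store_step(self, state, action):
        # Append the latest (state, action) event to the ongoing sequence.
        self.current.append((np.asarray(state, dtype=float), action))

    def end_episode(self, episode_return):
        # Commit the whole episode as one ordered sequence.
        if self.current:
            self.sequences.append({"events": self.current,
                                   "value": float(episode_return),
                                   "trace": 0.0})
            self.current = []
        # Prioritized forgetting: when capacity is exceeded, drop the
        # lowest-valued sequence rather than a random or oldest one.
        if len(self.sequences) > self.max_sequences:
            worst = min(range(len(self.sequences)),
                        key=lambda i: self.sequences[i]["value"])
            self.sequences.pop(worst)

    def retrieve_action(self, state):
        # Score every stored event by its similarity to the query state plus its
        # sequence's value and eligibility trace; reinforce the trace of the
        # winning sequence so its later events are favoured on the next call
        # (the sequential retrieval bias).
        state = np.asarray(state, dtype=float)
        best_score, best_action, best_seq = None, None, None
        for seq in self.sequences:
            seq["trace"] *= self.trace_decay          # traces fade between retrievals
            for stored_state, action in seq["events"]:
                score = (-np.linalg.norm(stored_state - state)
                         + seq["value"] + seq["trace"])
                if best_score is None or score > best_score:
                    best_score, best_action, best_seq = score, action, seq
        if best_seq is not None:
            best_seq["trace"] += 1.0                   # bias towards this sequence
        return best_action
```

In an agent loop, store_step would be called once per environment step, end_episode with the episode's return, and retrieve_action to select the next action; when retrieve_action returns None (empty memory), the agent would fall back to exploratory behaviour.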
Data availability
The data sets supporting the findings of this study are available via Zenodo at https://doi.org/10.5281/zenodo.11506323 (ref. 77).
Code availability
The implementation of the SEC model used in this study is available in the GitHub repository at https://github.com/IsmaelTito/SEC. The specific version used to generate the results is also archived on Zenodo at https://doi.org/10.5281/zenodo.14014111 (ref. 78).
References
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Berner, C. et al. Dota 2 with large scale deep reinforcement learning. Preprint at http://arxiv.org/abs/1912.06680 (2019).
Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, 1–58 (2017).
Marcus, G. Deep learning: a critical appraisal. Preprint at http://arxiv.org/abs/1801.00631 (2018).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Baker, B. et al. Emergent tool use from multi-agent autocurricula. In International Conference on Learning Representations (ICLR, 2020).
Botvinick, M. et al. Reinforcement learning fast and slow. Trends Cogn. Sci. 23, 408–422 (2019).
Hansen, S., Pritzel, A., Sprechmann, P., Barreto, A. & Blundell, C. Fast deep reinforcement learning using online adjustments from the past. In Advances in Neural Information Processing Systems (eds Bengio, S. et al.) 10567–10577 (Curran Associates, 2018).
Zhu, G., Lin, Z., Yang, G. & Zhang, C. Episodic reinforcement learning with associative memory. In International Conference on Learning Representations (eds Zhu, G., Lin, Z., Yang, G. & Zhang, C.) 370–384 (Curran Associates, 2019).
Lin, Z., Zhao, T., Yang, G. & Zhang, L. Episodic memory deep Q-networks. In Proc. IJCAI International Joint Conference on Artificial Intelligence (ed. Lang, J.) 2433–2439 (IJCAI, 2018).
Lee, S. Y., Sungik, C. & Chung, S. Y. Sample-efficient deep reinforcement learning via episodic backward update. In Advances in Neural Information Processing Systems (eds Wallach, H. et al.) 2112–2121 (Curran Associates, 2019).
Blundell, C. et al. Model-free episodic control. Preprint at http://arxiv.org/abs/1606.04460 (2016).
Pritzel, A. et al. Neural episodic control. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 2827–2836 (ACM, 2017).
Yalnizyan-Carson, A. & Richards, B. A. Forgetting enhances episodic control with structured memories. Front. Comput. Neurosci. 16, 757244 (2022).
Davidson, T. J., Kloosterman, F. & Wilson, M. A. Hippocampal replay of extended experience. Neuron 63, 497–507 (2009).
Voegtlin, T. & Verschure, P. F. What can robots tell us about brains? A synthetic approach towards the study of learning and problem solving. Rev. Neurosci. 10, 291–310 (1999).
Lisman, J. E. & Idiart, M. A. Storage of 7+/-2 short-term memories in oscillatory subcycles. Science 267, 1512–1515 (1995).
Jensen, O. & Lisman, J. E. Dual oscillations as the physiological basis for capacity limits. Behav. Brain Sci. 24, 126 (2001).
Ramani, D. A short survey on memory based reinforcement learning. Preprint at http://arxiv.org/abs/1904.06736 (2019).
Buzsáki, G. & Tingley, D. Space and time: the hippocampus as a sequence generator. Trends Cogn. Sci. 22, 853–869 (2018).
Lisman, J. & Redish, A. D. Prediction, sequences and the hippocampus. Philos. Trans. R. Soc. B 364, 1193–1201 (2009).
Verschure, P. F., Pennartz, C. M. & Pezzulo, G. The why, what, where, when and how of goal-directed choice: neuronal and computational principles. Philos. Trans. R. Soc. B 369, 20130483 (2014).
Merleau-Ponty, M. et al. The Primacy of Perception: And Other Essays on Phenomenological Psychology, the Philosophy of Art, History, and Politics (Northwestern Univ. Press, 1964).
Bornstein, A. M. & Norman, K. A. Reinstated episodic context guides sampling-based decisions for reward. Nat. Neurosci. 20, 997–1003 (2017).
Wimmer, G. E. & Shohamy, D. Preference by association: how memory mechanisms in the hippocampus bias decisions. Science 338, 270–273 (2012).
Wu, C. M., Schulz, E. & Gershman, S. J. Inference and search on graph-structured spaces. Comput. Brain Behav. 4, 125–147 (2021).
Johnson, A. & Redish, A. D. Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. J. Neurosci. 27, 12176–12189 (2007).
Ludvig, E. A., Madan, C. R. & Spetch, M. L. Priming memories of past wins induces risk seeking. J. Exp. Psychol. Gen. 144, 24 (2015).
Wang, S., Feng, S. F. & Bornstein, A. M. Mixing memory and desire: How memory reactivation supports deliberative decision-making. Wiley Interdiscip. Rev. Cogn. Sci. 13, e1581 (2022).
Gershman, S. J. & Daw, N. D. Reinforcement learning and episodic memory in humans and animals: an integrative framework. Annu. Rev. Psychol. 68, 101–128 (2017).
Santos-Pata, D. et al. Epistemic autonomy: self-supervised learning in the mammalian hippocampus. Trends Cogn. Sci. 25, 582–595 (2021).
Santos-Pata, D. et al. Entorhinal mismatch: a model of self-supervised learning in the hippocampus. iScience 24, 102364 (2021).
Amil, A. F., Freire, I. T. & Verschure, P. F. Discretization of continuous input spaces in the hippocampal autoencoder. Preprint at http://arxiv.org/abs/2405.14600 (2024).
Rennó-Costa, C., Lisman, J. E. & Verschure, P. F. The mechanism of rate remapping in the dentate gyrus. Neuron 68, 1051–1058 (2010).
Estefan, D. P. et al. Coordinated representational reinstatement in the human hippocampus and lateral temporal cortex during episodic memory retrieval. Nat. Commun. 10, 1–13 (2019).
de Almeida, L., Idiart, M. & Lisman, J. E. A second function of gamma frequency oscillations: an E%-max winner-take-all mechanism selects which cells fire. J. Neurosci. 29, 7497–7503 (2009).
Skaggs, W. E., McNaughton, B. L., Wilson, M. A. & Barnes, C. A. Theta phase precession in hippocampal neuronal populations and the compression of temporal sequences. Hippocampus 6, 149–172 (1996).
Redish, A. D. Vicarious trial and error. Nat. Rev. Neurosci. 17, 147–159 (2016).
Clayton, N. S. & Dickinson, A. Episodic-like memory during cache recovery by scrub jays. Nature 395, 272–274 (1998).
Foster, D. J. & Knierim, J. J. Sequence learning and the role of the hippocampus in rodent navigation. Curr. Opin. Neurobiol. 22, 294–300 (2012).
Mattar, M. G. & Daw, N. D. Prioritized memory access explains planning and hippocampal replay. Nat. Neurosci. 21, 1609–1617 (2018).
Eichenbaum, H. Memory: organization and control. Annu. Rev. Psychol. 68, 19–45 (2017).
Estefan, D. P. et al. Volitional learning promotes theta phase coding in the human hippocampus. Proc. Natl Acad. Sci. USA 118, e2021238118 (2021).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).
Watkins, C. J. C. H. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992).
Kubie, J. L. & Fenton, A. A. Heading-vector navigation based on head-direction cells and path integration. Hippocampus 19, 456–479 (2009).
Mathews, Z. et al. Insect-like mapless navigation based on head direction cells and contextual learning using chemo-visual sensors. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems 2243–2250 (IEEE, 2009).
Amil, A. F. & Verschure, P. F. Supercritical dynamics at the edge-of-chaos underlies optimal decision-making. J. Phys. Complex. 2, 045017 (2021).
Verschure, P. F., Voegtlin, T. & Douglas, R. J. Environmentally mediated synergy between perception and behaviour in mobile robots. Nature 425, 620–624 (2003).
Vikbladh, O., Shohamy, D. & Daw, N. Episodic contributions to model-based reinforcement learning. In Annual Conference on Cognitive Computational Neuroscience (CCN, 2017).
Cazé, R., Khamassi, M., Aubin, L. & Girard, B. Hippocampal replays under the scrutiny of reinforcement learning models. J. Neurophysiol. 120, 2877–2896 (2018).
Gonzalez, C., Lerch, J. F. & Lebiere, C. Instance-based learning in dynamic decision making. Cogn. Sci. 27, 591–635 (2003).
Gonzalez, C. & Dutt, V. Instance-based learning: integrating sampling and repeated decisions from experience. Psychol. Rev. 118, 523 (2011).
Lengyel, M. & Dayan, P. Hippocampal contributions to control: the third way. In Proc. Advances in Neural Information Processing Systems (eds Platt, J. et al.) 889–896 (Curran Associates, 2008).
Freire, I. T., Moulin-Frier, C., Sanchez-Fibla, M., Arsiwalla, X. D. & Verschure, P. F. Modeling the formation of social conventions from embodied real-time interactions. PLoS ONE 15, e0234434 (2020).
Papoudakis, G., Christianos, F., Rahman, A. & Albrecht, S. V. Dealing with non-stationarity in multi-agent deep reinforcement learning. Preprint at http://arxiv.org/abs/1906.04737 (2019).
Freire, I. & Verschure, P. High-fidelity social learning via shared episodic memories can improve collaborative foraging. Paper presented at Intrinsically Motivated Open-Ended Learning Workshop@NeurIPS 2023 (2023).
Albrecht, S. V. & Stone, P. Autonomous agents modelling other agents: a comprehensive survey and open problems. Artif. Intell. 258, 66–95 (2018).
Freire, I. T., Arsiwalla, X. D., Puigbò, J.-Y. & Verschure, P. F. Limits of multi-agent predictive models in the formation of social conventions. In Proc. Artificial Intelligence Research and Development (eds Falomir, Z. et al.) 297–301 (IOS, 2018).
Freire, I. T., Puigbò, J.-Y., Arsiwalla, X. D. & Verschure, P. F. Modeling the opponent’s action using control-based reinforcement learning. In Proc. Conference on Biomimetic and Biohybrid Systems (eds Vouloutsi, V. et al.) 179–186 (Springer, 2018).
Freire, I. T., Arsiwalla, X. D., Puigbò, J.-Y. & Verschure, P. Modeling theory of mind in dyadic games using adaptive feedback control. Information 14, 441 (2023).
Kahali, S. et al. Distributed adaptive control for virtual cyborgs: a case study for personalized rehabilitation. In Proc. Conference on Biomimetic and Biohybrid Systems (eds Meder, F. et al.) 16–32 (Springer, 2023).
Freire, I. T., Guerrero-Rosado, O., Amil, A. F. & Verschure, P. F. Socially adaptive cognitive architecture for human-robot collaboration in industrial settings. Front. Robot. AI 11, 1248646 (2024).
Verschure, P. F. Distributed adaptive control: a theory of the mind, brain, body nexus. BICA 1, 55–72 (2012).
Rosado, O. G., Amil, A. F., Freire, I. T. & Verschure, P. F. Drive competition underlies effective allostatic orchestration. Front. Robot. AI 9, 1052998 (2022).
Daw, N. D. Are we of two minds? Nat. Neurosci. 21, 1497–1499 (2018).
Freire, I. T., Urikh, D., Arsiwalla, X. D. & Verschure, P. F. Machine morality: from harm-avoidance to human-robot cooperation. In Proc. Conference on Biomimetic and Biohybrid Systems (eds Vouloutsi, V. et al.) 116–127 (Springer, 2020).
Verschure, P. F. Synthetic consciousness: the distributed adaptive control perspective. Philos. Trans. R. Soc. B 371, 20150448 (2016).
Goode, T. D., Tanaka, K. Z., Sahay, A. & McHugh, T. J. An integrated index: engrams, place cells, and hippocampal memory. Neuron 107, 805–820 (2020).
Amil, A. F., Albesa-González, A. & Verschure, P. F. M. J. Theta oscillations optimize a speed-precision trade-off in phase coding neurons. PLoS Comput. Biol. 20, e1012628 (2024).
Tremblay, L. & Schultz, W. Relative reward preference in primate orbitofrontal cortex. Nature 398, 704–708 (1999).
Cromwell, H. C., Hassani, O. K. & Schultz, W. Relative reward processing in primate striatum. Exp. Brain Res. 162, 520–525 (2005).
Soldati, F., Burman, O. H., John, E. A., Pike, T. W. & Wilkinson, A. Long-term memory of relative reward values. Biol. Lett. 13, 20160853 (2017).
Beyret, B. et al. The Animal-AI environment: training and testing animal-like artificial cognition. Preprint at http://arxiv.org/abs/1909.07483 (2019).
Crosby, M., Beyret, B. & Halina, M. The Animal-AI Olympics. Nat. Mach. Intell. 1, 257 (2019).
Freire, I. T. Dataset for ‘Sequential memory improves sample and memory efficiency in episodic control’. Zenodo https://doi.org/10.5281/zenodo.11506323 (2024).
Freire, I. T. IsmaelTito/SEC: SEC v.1.0 release (v.1.0.0). Zenodo https://doi.org/10.5281/zenodo.14014111 (2024).
Acknowledgements
This study was funded by the Counterfactual Assessment and Valuation for Awareness Architecture (CAVAA) project (European Innovation Council’s Horizon programme, grant no. 101071178) awarded to P.F.M.J.V.
Author information
Authors and Affiliations
Contributions
Conceptualization: all authors. Methodology: I.T.F. and A.F.A. Software: I.T.F. and A.F.A. Data curation: I.T.F. Formal analysis: I.T.F. Resources: P.F.M.J.V. Writing—original draft: I.T.F. Writing—review and editing: all authors. Visualization: I.T.F. Supervision: P.F.M.J.V. Project administration: P.F.M.J.V. Funding acquisition: P.F.M.J.V. All authors have read and agreed to the published version of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Mehdi Khamassi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Performance increase of SEC over NSEC across different memory-limit conditions in the Double T-maze benchmark.
Units reported are the total mean performance of SEC over NSEC. Each column shows SEC's performance with a limited memory capacity of 125, 250, 500, and 1000 sequences, respectively. Each row shows the memory ratio between SEC and NSEC, ranging from 1/1 (equal memory limit) to 1/8 (SEC memory limit is 8 times smaller than NSEC).
Extended Data Fig. 2 Ablation studies of the Sequential Episodic Control (SEC) algorithm in the Double T-Maze task.
The left panel presents the single-mechanism ablations with average reward (top) and entropy (bottom). The right panel shows the double-mechanism ablations under the same metrics. In the single ablations, 'SEC-noDist' lacks the distance-to-goal component, 'SEC-noRR' lacks the relative reward component, and 'SEC-noGi' lacks the eligibility score component. In the double ablations, 'SEC-soloDist', 'SEC-soloRR', and 'SEC-soloGi' operate with only the distance-to-goal, relative reward, and eligibility score components, respectively. The full SEC model is included as a benchmark in both panels. These graphs demonstrate the comparative impact of individual and combined components of the SEC algorithm on its performance and decision-making uncertainty. Vertical bars represent the average episode around which the memory was filled. Average values were computed using a sliding window of 20 episodes. Error bars represent SE.
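To make the ablation conditions concrete, the sketch below shows one plausible way the three components could be combined into a single retrieval score, with each ablation implemented by switching its component off. The multiplicative weighting and the neutral value of 1 for disabled components are assumptions made for illustration, not the exact formulation used in SEC.

```python
# Illustrative sketch of combining the three ablated components into one
# retrieval score. The multiplicative form and the neutral value used for
# disabled components are assumptions, not the published SEC equations.
def retrieval_score(similarity, distance_to_goal, relative_reward, eligibility,
                    use_dist=True, use_rr=True, use_gi=True):
    """Score a stored event; a component excluded by an ablation is replaced by 1
    so it no longer modulates the score (e.g. use_dist=False mimics 'SEC-noDist',
    while use_rr=False and use_gi=False together mimic 'SEC-soloDist')."""
    dist_term = 1.0 / (1.0 + distance_to_goal) if use_dist else 1.0
    rr_term = relative_reward if use_rr else 1.0
    gi_term = eligibility if use_gi else 1.0
    return similarity * dist_term * rr_term * gi_term
```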
Supplementary information
Supplementary Information
Supplementary Tables 1–6.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Freire, I.T., Amil, A.F. & Verschure, P.F.M.J. Sequential memory improves sample and memory efficiency in episodic control. Nat Mach Intell 7, 43–55 (2025). https://doi.org/10.1038/s42256-024-00950-3
DOI: https://doi.org/10.1038/s42256-024-00950-3