Sequential memory improves sample and memory efficiency in episodic control

A preprint version of the article is available at arXiv.

Abstract

Deep reinforcement learning algorithms are known for their sample inefficiency, requiring extensive episodes to reach optimal performance. Episodic reinforcement learning algorithms aim to overcome this issue by using extended memory systems to leverage past experiences. However, these memory augmentations are often used as mere buffers, from which isolated events are resampled for offline learning (for example, replay). In this Article, we introduce Sequential Episodic Control (SEC), a hippocampal-inspired model that stores entire event sequences in their temporal order and employs a sequential bias in their retrieval to guide actions. We evaluate SEC across various benchmarks from the Animal-AI testbed, demonstrating its superior performance and sample efficiency compared to several state-of-the-art models, including Model-Free Episodic Control, Deep Q-Network and Episodic Reinforcement Learning with Associative Memory. Our experiments show that SEC achieves higher rewards and faster policy convergence in tasks requiring memory and decision-making. Additionally, we investigate the effects of memory constraints and forgetting mechanisms, revealing that prioritized forgetting enhances both performance and policy stability. Further, ablation studies demonstrate the critical role of the sequential memory component in SEC. Finally, we discuss how fast, sequential hippocampal-like episodic memory systems could support both habit formation and deliberation in artificial and biological systems.
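
To make the mechanism concrete, here is a minimal sketch, in Python, of an episodic memory that stores whole event sequences in their temporal order and biases retrieval toward continuing the sequence matched on the previous step. It illustrates the idea only and is not the published implementation (see Code availability); the class name SequentialMemory, the similarity argument and the fixed sequential_bonus are assumptions introduced here.

    from collections import deque

    class SequentialMemory:
        """Sketch of an episodic memory that stores whole event sequences
        in temporal order and biases retrieval toward continuing the
        sequence matched on the previous step (illustrative only)."""

        def __init__(self, max_sequences=500, sequential_bonus=0.5):
            self.sequences = deque(maxlen=max_sequences)  # oldest episodes evicted first
            self.sequential_bonus = sequential_bonus      # strength of the sequential bias
            self._last_match = None                       # (sequence index, step index)

        def store_episode(self, steps, total_reward):
            # `steps` is a list of (state, action) pairs kept in temporal order.
            self.sequences.append({"steps": list(steps), "reward": total_reward})

        def retrieve_action(self, state, similarity):
            # Score every stored step by state similarity weighted by the
            # episode's reward; add a bonus when a step directly continues
            # the sequence retrieved on the previous call.
            best_score, best_action, best_pos = float("-inf"), None, None
            for i, seq in enumerate(self.sequences):
                for t, (s, a) in enumerate(seq["steps"]):
                    score = similarity(state, s) * seq["reward"]
                    if self._last_match == (i, t - 1):
                        score += self.sequential_bonus
                    if score > best_score:
                        best_score, best_action, best_pos = score, a, (i, t)
            self._last_match = best_pos
            return best_action

Buffer-style episodic controllers such as MFEC instead score isolated state-action pairs; the sequential bonus above is, roughly, the ingredient that the NSEC control model used in the experiments lacks.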

Fig. 1: SEC architecture.
Fig. 2: SEC memory storage and retrieval phases.
Fig. 3: Illustration of the benchmarks from the Animal-AI environment.
Fig. 4: Comparative performance of SEC against benchmark algorithms DQN, MFEC, ERLAM and NSEC.
Fig. 5: Effect of memory constraints on SEC and NSEC in the Double T-Maze.
Fig. 6: Forgetting enhances episodic control performance and policy stability in the Double T-Maze benchmark.

Data availability

The data sets supporting the findings of this study are available via Zenodo at https://doi.org/10.5281/zenodo.11506323 (ref. 77).

Code availability

The implementation of the SEC model used in this study is available in the GitHub repository at https://github.com/IsmaelTito/SEC. The specific version used to generate the results is also archived on Zenodo at https://doi.org/10.5281/zenodo.14014111 (ref. 78).

References

  1. Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).

  2. Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  3. Berner, C. et al. Dota 2 with large scale deep reinforcement learning. Preprint at http://arxiv.org/abs/1912.06680 (2019).

  4. Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, 1–58 (2017).

  5. Marcus, G. Deep learning: a critical appraisal. Preprint at http://arxiv.org/abs/1801.00631 (2018).

  6. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  7. Baker, B. et al. Emergent tool use from multi-agent autocurricula. In International Conference on Learning Representations (ICLR, 2020).

  8. Botvinick, M. et al. Reinforcement learning fast and slow. Trends Cogn. Sci. 23, 408–422 (2019).

  9. Hansen, S., Pritzel, A., Sprechmann, P., Barreto, A. & Blundell, C. Fast deep reinforcement learning using online adjustments from the past. In Advances in Neural Information Processing Systems (eds Bengio, S. et al.) 10567–10577 (Curran Associates, 2018).

  10. Zhu, G., Lin, Z., Yang, G. & Zhang, C. Episodic reinforcement learning with associative memory. In International Conference on Learning Representations (ICLR, 2020).

  11. Lin, Z., Zhao, T., Yang, G. & Zhang, L. Episodic memory deep Q-networks. In Proc. IJCAI International Joint Conference on Artificial Intelligence (ed. Lang, J.) 2433–2439 (IJCAI, 2018).

  12. Lee, S. Y., Choi, S. & Chung, S. Y. Sample-efficient deep reinforcement learning via episodic backward update. In Advances in Neural Information Processing Systems (eds Wallach, H. et al.) 2112–2121 (Curran Associates, 2019).

  13. Blundell, C. et al. Model-free episodic control. Preprint at http://arxiv.org/abs/1606.04460 (2016).

  14. Pritzel, A. et al. Neural episodic control. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 2827–2836 (ACM, 2017).

  15. Yalnizyan-Carson, A. & Richards, B. A. Forgetting enhances episodic control with structured memories. Front. Comput. Neurosci. 16, 757244 (2022).

  16. Davidson, T. J., Kloosterman, F. & Wilson, M. A. Hippocampal replay of extended experience. Neuron 63, 497–507 (2009).

  17. Voegtlin, T. & Verschure, P. F. What can robots tell us about brains? A synthetic approach towards the study of learning and problem solving. Rev. Neurosci. 10, 291–310 (1999).

  18. Lisman, J. E. & Idiart, M. A. Storage of 7 ± 2 short-term memories in oscillatory subcycles. Science 267, 1512–1515 (1995).

  19. Jensen, O. & Lisman, J. E. Dual oscillations as the physiological basis for capacity limits. Behav. Brain Sci. 24, 126 (2001).

  20. Ramani, D. A short survey on memory based reinforcement learning. Preprint at http://arxiv.org/abs/1904.06736 (2019).

  21. Buzsáki, G. & Tingley, D. Space and time: the hippocampus as a sequence generator. Trends Cogn. Sci. 22, 853–869 (2018).

  22. Lisman, J. & Redish, A. D. Prediction, sequences and the hippocampus. Philos. Trans. R. Soc. B 364, 1193–1201 (2009).

  23. Verschure, P. F., Pennartz, C. M. & Pezzulo, G. The why, what, where, when and how of goal-directed choice: neuronal and computational principles. Philos. Trans. R. Soc. B 369, 20130483 (2014).

  24. Merleau-Ponty, M. et al. The Primacy of Perception: And Other Essays on Phenomenological Psychology, the Philosophy of Art, History, and Politics (Northwestern Univ. Press, 1964).

  25. Bornstein, A. M. & Norman, K. A. Reinstated episodic context guides sampling-based decisions for reward. Nat. Neurosci. 20, 997–1003 (2017).

  26. Wimmer, G. E. & Shohamy, D. Preference by association: how memory mechanisms in the hippocampus bias decisions. Science 338, 270–273 (2012).

  27. Wu, C. M., Schulz, E. & Gershman, S. J. Inference and search on graph-structured spaces. Comput. Brain Behav. 4, 125–147 (2021).

  28. Johnson, A. & Redish, A. D. Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. J. Neurosci. 27, 12176–12189 (2007).

  29. Ludvig, E. A., Madan, C. R. & Spetch, M. L. Priming memories of past wins induces risk seeking. J. Exp. Psychol. Gen. 144, 24 (2015).

  30. Wang, S., Feng, S. F. & Bornstein, A. M. Mixing memory and desire: how memory reactivation supports deliberative decision-making. Wiley Interdiscip. Rev. Cogn. Sci. 13, e1581 (2022).

  31. Gershman, S. J. & Daw, N. D. Reinforcement learning and episodic memory in humans and animals: an integrative framework. Annu. Rev. Psychol. 68, 101–128 (2017).

  32. Santos-Pata, D. et al. Epistemic autonomy: self-supervised learning in the mammalian hippocampus. Trends Cogn. Sci. 25, 582–595 (2021).

  33. Santos-Pata, D. et al. Entorhinal mismatch: a model of self-supervised learning in the hippocampus. iScience 24, 102364 (2021).

  34. Amil, A. F., Freire, I. T. & Verschure, P. F. Discretization of continuous input spaces in the hippocampal autoencoder. Preprint at http://arxiv.org/abs/2405.14600 (2024).

  35. Rennó-Costa, C., Lisman, J. E. & Verschure, P. F. The mechanism of rate remapping in the dentate gyrus. Neuron 68, 1051–1058 (2010).

  36. Estefan, D. P. et al. Coordinated representational reinstatement in the human hippocampus and lateral temporal cortex during episodic memory retrieval. Nat. Commun. 10, 1–13 (2019).

  37. de Almeida, L., Idiart, M. & Lisman, J. E. A second function of gamma frequency oscillations: an E%-max winner-take-all mechanism selects which cells fire. J. Neurosci. 29, 7497–7503 (2009).

  38. Skaggs, W. E., McNaughton, B. L., Wilson, M. A. & Barnes, C. A. Theta phase precession in hippocampal neuronal populations and the compression of temporal sequences. Hippocampus 6, 149–172 (1996).

  39. Redish, A. D. Vicarious trial and error. Nat. Rev. Neurosci. 17, 147–159 (2016).

  40. Clayton, N. S. & Dickinson, A. Episodic-like memory during cache recovery by scrub jays. Nature 395, 272–274 (1998).

  41. Foster, D. J. & Knierim, J. J. Sequence learning and the role of the hippocampus in rodent navigation. Curr. Opin. Neurobiol. 22, 294–300 (2012).

  42. Mattar, M. G. & Daw, N. D. Prioritized memory access explains planning and hippocampal replay. Nat. Neurosci. 21, 1609–1617 (2018).

  43. Eichenbaum, H. Memory: organization and control. Annu. Rev. Psychol. 68, 19–45 (2017).

  44. Estefan, D. P. et al. Volitional learning promotes theta phase coding in the human hippocampus. Proc. Natl Acad. Sci. USA 118, e2021238118 (2021).

  45. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018); https://doi.org/10.1109/tnn.2004.842673

  46. Watkins, C. J. C. H. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992).

  47. Kubie, J. L. & Fenton, A. A. Heading-vector navigation based on head-direction cells and path integration. Hippocampus 19, 456–479 (2009).

  48. Mathews, Z. et al. Insect-like mapless navigation based on head direction cells and contextual learning using chemo-visual sensors. In Proc. 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems 2243–2250 (IEEE, 2009).

  49. Amil, A. F. & Verschure, P. F. Supercritical dynamics at the edge-of-chaos underlies optimal decision-making. J. Phys. Complex. 2, 045017 (2021).

  50. Verschure, P. F., Voegtlin, T. & Douglas, R. J. Environmentally mediated synergy between perception and behaviour in mobile robots. Nature 425, 620–624 (2003).

  51. Vikbladh, O., Shohamy, D. & Daw, N. Episodic contributions to model-based reinforcement learning. In Annual Conference on Cognitive Computational Neuroscience (CCN, 2017).

  52. Cazé, R., Khamassi, M., Aubin, L. & Girard, B. Hippocampal replays under the scrutiny of reinforcement learning models. J. Neurophysiol. 120, 2877–2896 (2018).

  53. Gonzalez, C., Lerch, J. F. & Lebiere, C. Instance-based learning in dynamic decision making. Cogn. Sci. 27, 591–635 (2003).

  54. Gonzalez, C. & Dutt, V. Instance-based learning: integrating sampling and repeated decisions from experience. Psychol. Rev. 118, 523 (2011).

  55. Lengyel, M. & Dayan, P. Hippocampal contributions to control: the third way. In Proc. Advances in Neural Information Processing Systems (eds Platt, J. et al.) 889–896 (Curran Associates, 2008).

  56. Freire, I. T., Moulin-Frier, C., Sanchez-Fibla, M., Arsiwalla, X. D. & Verschure, P. F. Modeling the formation of social conventions from embodied real-time interactions. PLoS ONE 15, e0234434 (2020).

  57. Papoudakis, G., Christianos, F., Rahman, A. & Albrecht, S. V. Dealing with non-stationarity in multi-agent deep reinforcement learning. Preprint at http://arxiv.org/abs/1906.04737 (2019).

  58. Freire, I. & Verschure, P. High-fidelity social learning via shared episodic memories can improve collaborative foraging. Paper presented at the Intrinsically Motivated Open-Ended Learning Workshop at NeurIPS (2023).

  59. Albrecht, S. V. & Stone, P. Autonomous agents modelling other agents: a comprehensive survey and open problems. Artif. Intell. 258, 66–95 (2018).

  60. Freire, I. T., Arsiwalla, X. D., Puigbò, J.-Y. & Verschure, P. F. Limits of multi-agent predictive models in the formation of social conventions. In Proc. Artificial Intelligence Research and Development (eds Falomir, Z. et al.) 297–301 (IOS, 2018).

  61. Freire, I. T., Puigbò, J.-Y., Arsiwalla, X. D. & Verschure, P. F. Modeling the opponent’s action using control-based reinforcement learning. In Proc. Conference on Biomimetic and Biohybrid Systems (eds Vouloutsi, V. et al.) 179–186 (Springer, 2018).

  62. Freire, I. T., Arsiwalla, X. D., Puigbò, J.-Y. & Verschure, P. Modeling theory of mind in dyadic games using adaptive feedback control. Information 14, 441 (2023).

  63. Kahali, S. et al. Distributed adaptive control for virtual cyborgs: a case study for personalized rehabilitation. In Proc. Conference on Biomimetic and Biohybrid Systems (eds Meder, F. et al.) 16–32 (Springer, 2023).

  64. Freire, I. T., Guerrero-Rosado, O., Amil, A. F. & Verschure, P. F. Socially adaptive cognitive architecture for human-robot collaboration in industrial settings. Front. Robot. AI 11, 1248646 (2024).

  65. Verschure, P. F. Distributed adaptive control: a theory of the mind, brain, body nexus. BICA 1, 55–72 (2012).

  66. Rosado, O. G., Amil, A. F., Freire, I. T. & Verschure, P. F. Drive competition underlies effective allostatic orchestration. Front. Robot. AI 9, 1052998 (2022).

  67. Daw, N. D. Are we of two minds? Nat. Neurosci. 21, 1497–1499 (2018).

  68. Freire, I. T., Urikh, D., Arsiwalla, X. D. & Verschure, P. F. Machine morality: from harm-avoidance to human-robot cooperation. In Proc. Conference on Biomimetic and Biohybrid Systems (eds Vouloutsi, V. et al.) 116–127 (Springer, 2020).

  69. Verschure, P. F. Synthetic consciousness: the distributed adaptive control perspective. Philos. Trans. R. Soc. B 371, 20150448 (2016).

  70. Goode, T. D., Tanaka, K. Z., Sahay, A. & McHugh, T. J. An integrated index: engrams, place cells, and hippocampal memory. Neuron 107, 805–820 (2020).

  71. Amil, A. F., Albesa-González, A. & Verschure, P. F. M. J. Theta oscillations optimize a speed-precision trade-off in phase coding neurons. PLoS Comput. Biol. 20, e1012628 (2024).

  72. Tremblay, L. & Schultz, W. Relative reward preference in primate orbitofrontal cortex. Nature 398, 704–708 (1999).

  73. Cromwell, H. C., Hassani, O. K. & Schultz, W. Relative reward processing in primate striatum. Exp. Brain Res. 162, 520–525 (2005).

  74. Soldati, F., Burman, O. H., John, E. A., Pike, T. W. & Wilkinson, A. Long-term memory of relative reward values. Biol. Lett. 13, 20160853 (2017).

  75. Beyret, B. et al. The Animal-AI environment: training and testing animal-like artificial cognition. Preprint at http://arxiv.org/abs/1909.07483 (2019).

  76. Crosby, M., Beyret, B. & Halina, M. The Animal-AI Olympics. Nat. Mach. Intell. 1, 257 (2019).

  77. Freire, I. T. Dataset for ‘Sequential memory improves sample and memory efficiency in episodic control’. Zenodo https://doi.org/10.5281/zenodo.11506323 (2024).

  78. Freire, I. T. IsmaelTito/SEC: SEC v.1.0 release (v.1.0.0). Zenodo https://doi.org/10.5281/zenodo.14014111 (2024).

Acknowledgements

This study was funded by the Counterfactual Assessment and Valuation for Awareness Architecture (CAVAA) project (European Innovation Council’s Horizon programme, grant no. 101071178) awarded to P.F.M.J.V.

Author information

Contributions

Conceptualization: all authors. Methodology: I.T.F. and A.F.A. Software: I.T.F. and A.F.A. Data curation: I.T.F. Formal analysis: I.T.F. Resources: P.F.M.J.V. Writing—original draft: I.T.F. Writing—review and editing: all authors. Visualization: I.T.F. Supervision: P.F.M.J.V. Project administration: P.F.M.J.V. Funding acquisition: P.F.M.J.V. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Ismael T. Freire, Adrián F. Amil or Paul F. M. J. Verschure.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Mehdi Khamassi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance increase of SEC over NSEC across different memory-limit conditions in the Double T-Maze benchmark.

Units reported are the total mean performance of SEC over NSEC. Each column shows SEC's performance with a limited memory capacity of 125, 250, 500 and 1,000 sequences, respectively. Each row shows the memory ratio between SEC and NSEC, ranging from 1/1 (equal memory limits) to 1/8 (SEC's memory limit is eight times smaller than NSEC's).
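
The abstract reports that prioritized forgetting improves both performance and policy stability under such memory limits. As a hedged sketch of what a prioritized eviction rule can look like, the snippet below keeps the highest-valued sequences once capacity is exceeded, operating on episodes represented as dicts with a 'reward' field as in the sketch after the Abstract. Using an episode's total reward as the forgetting priority is an assumption made here for illustration, not necessarily the criterion used in the paper.

    def forget_prioritized(sequences, max_sequences):
        # Keep the max_sequences highest-priority episodes and forget the
        # rest. Priority is proxied by total episode reward (an assumption);
        # the non-prioritized baseline would be first-in, first-out eviction.
        if len(sequences) <= max_sequences:
            return list(sequences)
        ranked = sorted(sequences, key=lambda seq: seq["reward"], reverse=True)
        return ranked[:max_sequences]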

Extended Data Fig. 2 Ablation studies of the Sequential Episodic Control (SEC) algorithm in the Double T-Maze task.

The left panel presents the single-mechanism ablations with average reward (top) and entropy (bottom). The right panel shows the double-mechanism ablations under the same metrics. In the single ablations, 'SEC-noDist' lacks the distance-to-the-goal component, 'SEC-noRR' lacks the relative reward component, and 'SEC-noGi' lacks the eligibility score component. In the double ablations, 'SEC-soloDist', 'SEC-soloRR' and 'SEC-soloGi' operate with only the distance-to-the-goal, relative reward and eligibility score components, respectively. The full SEC model is included as a benchmark in both panels. These graphs demonstrate the comparative impact of individual and combined components of the SEC algorithm on its performance and decision-making uncertainty. Vertical bars represent the average episode around which the memory was filled. Average values were computed using a sliding window of 20 episodes. Error bars represent the standard error.
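
The ablation labels above name three value components: distance to the goal, relative reward and an eligibility score. As a hedged sketch of how such components could combine into a single sequence score, the snippet below simply multiplies them; the multiplicative form and the absence of weighting coefficients are assumptions for illustration, and the actual rule is defined in the paper's Methods.

    def sequence_score(distance_to_goal, relative_reward, eligibility):
        # 'SEC-noDist' would drop the proximity factor, 'SEC-noRR' the
        # relative reward, and 'SEC-noGi' the eligibility term; the 'solo'
        # ablations keep exactly one factor.
        proximity = 1.0 / (1.0 + distance_to_goal)  # nearer goal -> higher score
        return proximity * relative_reward * eligibility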

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Freire, I.T., Amil, A.F. & Verschure, P.F.M.J. Sequential memory improves sample and memory efficiency in episodic control. Nat Mach Intell 7, 43–55 (2025). https://doi.org/10.1038/s42256-024-00950-3
