Abstract
Recurrent neural networks (RNNs) are ubiquitously used in neuroscience to capture both neural dynamics and behaviours of living systems. However, when it comes to complex cognitive tasks, training RNNs with traditional methods can prove difficult and fall short of capturing crucial aspects of animal behaviour. Here we propose a principled approach for identifying and incorporating compositional tasks as part of RNN training. Taking as the target a temporal wagering task previously studied in rats, we design a pretraining curriculum of simpler cognitive tasks that reflect relevant subcomputations, which we term ‘kindergarten curriculum learning’. We show that this pretraining substantially improves learning efficacy and is critical for RNNs to adopt similar strategies as rats, including long-timescale inference of latent states, which conventional pretraining approaches fail to capture. Mechanistically, our pretraining supports the development of slow dynamical systems features needed for implementing both inference and value-based decision making. Overall, our approach helps endow RNNs with relevant inductive biases, which is important when modelling complex behaviours that rely on multiple cognitive functions.
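The core idea of the curriculum — train on simple subtasks first, advancing to the target task once each is mastered — can be sketched in a few lines. This is a minimal illustration only: the task names, thresholds and toy evaluator below are hypothetical stand-ins, not the tasks or training procedure used in the paper.

```python
# Minimal sketch of curriculum pretraining: train on each task in order,
# advancing once a performance threshold is met. Task names, thresholds and
# the toy evaluator are illustrative, not taken from the paper.

def run_curriculum(tasks, evaluate, max_epochs=100):
    """Run tasks sequentially; advance when the accuracy threshold for the
    current task is reached (or max_epochs is exhausted)."""
    history = []
    for name, threshold in tasks:
        for epoch in range(max_epochs):
            acc = evaluate(name, epoch)  # stand-in for a train step + eval
            if acc >= threshold:
                break
        history.append((name, epoch + 1))  # record epochs spent on this task
    return history

def toy_evaluate(name, epoch):
    """Toy evaluator: accuracy rises linearly with training epochs."""
    return min(1.0, (2 + epoch) / 10)

# A hypothetical three-stage curriculum ending in the full target task.
curriculum = [
    ("wait_for_cue", 0.8),       # simple timing subtask
    ("value_comparison", 0.8),   # compare offered rewards
    ("temporal_wagering", 0.9),  # full target task
]
print(run_curriculum(curriculum, toy_evaluate))
# → [('wait_for_cue', 7), ('value_comparison', 7), ('temporal_wagering', 8)]
```

In an actual implementation each stage would run gradient-based RNN training rather than the toy evaluator, but the scheduling logic — an ordered list of (task, criterion) pairs consumed before the target task — is the essence of the approach.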
Data availability
The rat behavioural data and the statistical model of behaviour are detailed in ref. 19 and are available via Zenodo at https://doi.org/10.5281/zenodo.10031483 (ref. 45). The data used to generate figures in this manuscript, as well as a repository of pretrained RNN model files for different curriculum learning strategies, are available via Zenodo at https://doi.org/10.5281/zenodo.14907819 (ref. 46).
Code availability
The data were analysed with code written in Python (Python v3.9.5, PyTorch v1.8.0), as well as MATLAB (v2023b). The code used to train RNNs, analyse data and generate figures is available at GitHub via https://github.com/Savin-Lab-Code/kind_cl (ref. 47) and as a Code Ocean capsule (ref. 48).
References
Yang, G. R. & Wang, X.-J. Artificial neural networks for neuroscientists: a primer. Neuron 107, 1048–1070 (2020).
Krueger, K. A. & Dayan, P. Flexible shaping: how learning in small steps helps. Cognition 110, 380–394 (2009).
Narvekar, S. et al. Curriculum learning for reinforcement learning domains: a framework and survey. J. Mach. Learn. Res. 21, 181 (2020).
Bengio, Y., Louradour, J., Collobert, R. & Weston, J. Curriculum learning. In Proc. 26th Annual Int. Conference on Machine Learning 41–48 (Association for Computing Machinery, 2009).
Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Int. Conference on Machine Learning 1126–1135 (PMLR, 2017).
Thrun, S. & Pratt, L. in Learning to Learn (eds Thrun, S. & Pratt, L.) 3–17 (Springer, 1998).
Harlow, H. F. The formation of learning sets. Psychol. Rev. 56, 51–65 (1949).
Skinner, B. F. How to teach animals. Sci. Am. 185, 26–29 (1951).
Savin, C. & Triesch, J. Emergence of task-dependent representations in working memory circuits. Front. Comput. Neurosci. 8, 57 (2014).
McAndrew, R. & Helms Tillery, S. I. Laboratory primates: their lives in and after research. Temperature https://doi.org/10.1080/23328940.2016.1229161 (2016).
Chowdhury, S. A. & DeAngelis, G. C. Fine discrimination training alters the causal contribution of macaque area MT to depth perception. Neuron 60, 367–377 (2008).
Arlt, C. et al. Cognitive experience alters cortical involvement in goal-directed navigation. eLife 11, e76051 (2022).
Wang, J. X. Meta-learning in natural and artificial intelligence. Curr. Opin. Behav. Sci. 38, 90–95 (2021).
Vanderschuren, L. J., Achterberg, E. M. & Trezza, V. The neurobiology of social play and its rewarding value in rats. Neurosci. Biobehav. Rev. 70, 86–105 (2016).
Vanderschuren, L. J. & Trezza, V. What the laboratory rat has taught us about social play behavior: role in behavioral development and neural mechanisms. In The Neurobiology of Childhood (eds Andersen, S. L. & Pine, D. S.) 189–212 (Springer, 2014).
Baarendse, P. J., Limpens, J. H. & Vanderschuren, L. J. Disrupted social development enhances the motivation for cocaine in rats. Psychopharmacology 231, 1695–1704 (2014).
Einon, D. F. & Morgan, M. A critical period for social isolation in the rat. Dev. Psychobiol. 10, 123–132 (1977).
Zador, A. M. A critique of pure learning and what artificial neural networks can learn from animal brains. Nat. Commun. 10, 3770 (2019).
Mah, A., Schiereck, S. S., Bossio, V. & Constantinople, C. M. Distinct value computations support rapid sequential decisions. Nat. Commun. 14, 7573 (2023).
Schiereck, S. S. et al. Neural dynamics in the orbitofrontal cortex reveal cognitive strategies. Preprint at bioRxiv https://doi.org/10.1101/2024.10.29.620879 (2024).
Constantino, S. M. & Daw, N. D. Learning the opportunity cost of time in a patch-foraging task. Cogn. Affect. Behav. Neurosci. 15, 837–853 (2015).
Charnov, E. L. Optimal foraging, the marginal value theorem. Theor. Popul. Biol. 9, 129–136 (1976).
McNamara, J. M. & Houston, A. I. Optimal foraging and learning. J. Theor. Biol. 117, 231–249 (1985).
Wang, J. X. et al. Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21, 860–868 (2018).
Wilson, R. C., Takahashi, Y. K., Schoenbaum, G. & Niv, Y. Orbitofrontal cortex as a cognitive map of task space. Neuron 81, 267–279 (2014).
Schuck, N. W., Cai, M. B., Wilson, R. C. & Niv, Y. Human orbitofrontal cortex represents a cognitive map of state space. Neuron 91, 1402–1412 (2016).
Bartolo, R. & Averbeck, B. B. Inference as a fundamental process in behavior. Curr. Opin. Behav. Sci. 38, 8–13 (2021).
Averbeck, B. & O’Doherty, J. P. Reinforcement-learning in fronto-striatal circuits. Neuropsychopharmacology 47, 147–162 (2022).
Miyashita, Y. & Chang, H. S. Neuronal correlate of pictorial short-term memory in the primate temporal cortex. Nature 331, 68–70 (1988).
Dubreuil, A., Valente, A., Beiran, M., Mastrogiuseppe, F. & Ostojic, S. The role of population structure in computations through neural dynamics. Nat. Neurosci. 25, 783–794 (2022).
Khona, M. & Fiete, I. R. Attractor and integrator networks in the brain. Nat. Rev. Neurosci. 23, 744–766 (2022).
Marschall, O. & Savin, C. Probing learning through the lens of changes in circuit dynamics. Preprint at bioRxiv https://doi.org/10.1101/2023.09.13.557585 (2023).
Sussillo, D. & Barak, O. Opening the black box: low-dimensional dynamics in high-dimensional recurrent neural networks. Neural Comput. 25, 626–649 (2013).
Elman, J. L. Learning and development in neural networks: the importance of starting small. Cognition 48, 71–99 (1993).
Kepple, D., Engelken, R. & Rajan, K. Curriculum learning as a tool to uncover learning principles in the brain. In Int. Conference on Learning Representations (2022).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Jensen, K. T., Hennequin, G. & Mattar, M. G. A recurrent network model of planning explains hippocampal replay and human behavior. Nat. Neurosci. 27, 1340–1348 (2024).
Braun, D. A., Mehring, C. & Wolpert, D. M. Structure learning in action. Behav. Brain Res. 206, 157–165 (2010).
Makino, H. Arithmetic value representation for hierarchical behavior composition. Nat. Neurosci. 26, 140–149 (2023).
Driscoll, L. N., Shenoy, K. & Sussillo, D. Flexible multitask computation in recurrent networks utilizes shared dynamical motifs. Nat. Neurosci. 27, 1349–1363 (2024).
Gupta, D., DePasquale, B., Kopec, C. D. & Brody, C. D. Trial-history biases in evidence accumulation can give rise to apparent lapses in decision-making. Nat. Commun. 15, 662 (2024).
Richards, B. A. et al. A deep learning framework for neuroscience. Nat. Neurosci. 22, 1761–1770 (2019).
Ma, W. J. & Peters, B. A neural network walks into a lab: towards using deep nets as models for human behavior. Preprint at https://arxiv.org/abs/2005.02181 (2020).
Goldman, M. S. Memory without feedback in a neural network. Neuron 61, 621–634 (2009).
Mah, A., Schiereck, S., Bossio, V. & Constantinople, C. Distinct value computations support rapid sequential decisions (version v1). Zenodo https://doi.org/10.5281/zenodo.10031483 (2023).
Hocker, D., Constantinople, C. M. & Savin, C. Composition of simple computational tasks captures the inductive biases of animals in network models (version v1). Zenodo https://doi.org/10.5281/zenodo.14907819 (2025).
Hocker, D., Constantinople, C. M. & Savin, C. Savin-Lab-Code/kind_cl: Nature Machine Intelligence code (v1.0.0). Zenodo https://doi.org/10.5281/zenodo.14907734 (2025).
Hocker, D., Constantinople, C. M. & Savin, C. Compositional pretraining improves computational efficiency and matches animal behavior on complex tasks. Code Ocean Capsule https://doi.org/10.24433/CO.3440797.v1 (2025).
Arnold, T. B. & Emerson, J. W. Nonparametric goodness-of-fit tests for discrete null distributions. R J. 3, 34–39 (2011).
Acknowledgements
We thank L. Driscoll, P. Glimcher, V. Goudar, O. Marschall, K. Miller and J. Wang for helpful discussions and comments on the manuscript. We also thank M. Chittireddy for fixed-point characterization of shaping-trained RNNs. This work was supported by the NIMH (grant nos. 1K01MH132043-01A1 to D.H. and 1R01MH125571-01 to C.M.C. and C.S.). This work was supported in part through the NYU IT High Performance Computing resources, services and staff expertise. We gratefully acknowledge use of the research computing resources of the Empire AI Consortium, Inc., with support from the State of New York, the Simons Foundation and the Secunda Family Foundation.
Author information
Authors and Affiliations
Contributions
C.M.C. designed the behavioural task. D.H., C.M.C. and C.S. designed the curriculum training protocol. D.H. created the RNN model. D.H., C.M.C. and C.S. analysed the data. D.H. prepared the figures. D.H., C.M.C. and C.S. wrote the manuscript. C.M.C. and C.S. supervised the project.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Maria Eckstein, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–7 and sections: hyperparameters, LSTM derivatives for linearized dynamics and additional MDP analyses.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hocker, D., Constantinople, C.M. & Savin, C. Compositional pretraining improves computational efficiency and matches animal behaviour on complex tasks. Nat Mach Intell 7, 689–702 (2025). https://doi.org/10.1038/s42256-025-01029-3