Abstract
Since the earliest proposals for artificial neural network models of the mind and brain, critics have pointed out key weaknesses in these models compared with human cognitive abilities. Here we review recent work that uses metalearning to overcome several classic challenges, which we characterize as addressing the problem of incentive and practice—that is, providing machines with both incentives to improve specific skills and opportunities to practice those skills. This explicit optimization contrasts with more conventional approaches that hope that the desired behaviour will emerge through optimizing related but different objectives. We review applications of this principle to address four classic challenges for artificial neural networks: systematic generalization, catastrophic forgetting, few-shot learning and multi-step reasoning. We also discuss how large language models incorporate key aspects of this metalearning framework (namely, sequence prediction with feedback trained on diverse data), which helps to explain some of their successes on these classic challenges. Finally, we discuss the prospects for understanding aspects of human development through this framework, and whether natural environments provide the right incentives and practice for learning how to make challenging generalizations.
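The "incentive and practice" framing can be made concrete with a toy sketch of the episode-based setup that metalearning uses. The following code does not appear in the article and every name in it (`make_episode`, `in_context_lookup`, the nonsense vocabulary) is invented for illustration: each episode samples a fresh word-to-symbol mapping, the support pairs supply the practice, and the query supplies the incentive. Because the mapping is resampled every episode, memorizing any fixed mapping cannot succeed; only acquiring the within-episode binding skill can. A simple lookup stands in for what a trained sequence model would have to learn to do in-context.

```python
import random

OUTPUT_SYMBOLS = ["RED", "GREEN", "BLUE", "YELLOW"]
VOCAB = ["dax", "wif", "lug", "zup", "fep", "blicket"]

def make_episode(n_support=3, rng=random):
    """Sample a fresh task: a random word -> symbol mapping.
    Support pairs are the 'practice'; the query is the 'incentive'."""
    words = rng.sample(VOCAB, n_support)
    mapping = {w: rng.choice(OUTPUT_SYMBOLS) for w in words}
    support = [(w, mapping[w]) for w in words]
    query = rng.choice(words)
    return support, query, mapping[query]

def in_context_lookup(support, query):
    # Stand-in for a trained sequence model: answer the query by
    # binding it to its support-pair output within this episode.
    return dict(support)[query]

# Meta-test: mappings change every episode, so only a learner that has
# acquired the binding *skill* (not any one mapping) scores well.
correct = 0
for _ in range(1000):
    support, query, target = make_episode()
    correct += in_context_lookup(support, query) == target
print(correct / 1000)  # → 1.0 for a perfect in-context binder
```

In an actual metalearning setup, a neural sequence model replaces `in_context_lookup` and the per-episode query loss is backpropagated, directly optimizing the desired few-shot behaviour rather than hoping it emerges from a related objective.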
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Lukas Galke, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
About this article
Cite this article
Irie, K., Lake, B.M. Overcoming classic challenges for artificial neural networks by providing incentives and practice. Nat Mach Intell 7, 1602–1611 (2025). https://doi.org/10.1038/s42256-025-01121-8