Abstract
Since the earliest proposals for artificial neural network models of the mind and brain, critics have pointed out key weaknesses in these models compared with human cognitive abilities. Here we review recent work that uses metalearning to overcome several classic challenges, which we characterize as addressing the problem of incentive and practice—that is, providing machines with both incentives to improve specific skills and opportunities to practice those skills. This explicit optimization contrasts with more conventional approaches that hope that the desired behaviour will emerge through optimizing related but different objectives. We review applications of this principle to address four classic challenges for artificial neural networks: systematic generalization, catastrophic forgetting, few-shot learning and multi-step reasoning. We also discuss how large language models incorporate key aspects of this metalearning framework (namely, sequence prediction with feedback trained on diverse data), which helps to explain some of their successes on these classic challenges. Finally, we discuss the prospects for understanding aspects of human development through this framework, and whether natural environments provide the right incentives and practice for learning how to make challenging generalizations.
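The "incentive and practice" framing can be made concrete with a toy sketch of the episode-based setup that metalearning uses. The following code does not appear in the article and every name in it (`make_episode`, `in_context_lookup`, the nonsense vocabulary) is invented for illustration: each episode samples a fresh word-to-symbol mapping, the support pairs supply the practice, and the query supplies the incentive. Because the mapping is resampled every episode, memorizing any fixed mapping cannot succeed; only acquiring the within-episode binding skill can. A simple lookup stands in for what a trained sequence model would have to learn to do in-context.

```python
import random

OUTPUT_SYMBOLS = ["RED", "GREEN", "BLUE", "YELLOW"]
VOCAB = ["dax", "wif", "lug", "zup", "fep", "blicket"]

def make_episode(n_support=3, rng=random):
    """Sample a fresh task: a random word -> symbol mapping.
    Support pairs are the 'practice'; the query is the 'incentive'."""
    words = rng.sample(VOCAB, n_support)
    mapping = {w: rng.choice(OUTPUT_SYMBOLS) for w in words}
    support = [(w, mapping[w]) for w in words]
    query = rng.choice(words)
    return support, query, mapping[query]

def in_context_lookup(support, query):
    # Stand-in for a trained sequence model: answer the query by
    # binding it to its support-pair output within this episode.
    return dict(support)[query]

# Meta-test: mappings change every episode, so only a learner that has
# acquired the binding *skill* (not any one mapping) scores well.
correct = 0
for _ in range(1000):
    support, query, target = make_episode()
    correct += in_context_lookup(support, query) == target
print(correct / 1000)  # → 1.0 for a perfect in-context binder
```

In an actual metalearning setup, a neural sequence model replaces `in_context_lookup` and the per-episode query loss is backpropagated, directly optimizing the desired few-shot behaviour rather than hoping it emerges from a related objective.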
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Lukas Galke, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
About this article
Cite this article
Irie, K., Lake, B.M. Overcoming classic challenges for artificial neural networks by providing incentives and practice. Nat Mach Intell 7, 1602–1611 (2025). https://doi.org/10.1038/s42256-025-01121-8