
  • Perspective

Aligning generalization between humans and machines

Abstract

Recent advances in artificial intelligence (AI), including generative approaches, have produced technology that can support humans in scientific discovery and decision-making, but that may also disrupt democracies and target individuals. The responsible use of AI and its participation in human–AI teams increasingly underscore the need for AI alignment, that is, for making AI systems act in accordance with human preferences. A crucial yet often overlooked aspect of these interactions is the different ways in which humans and machines generalize. In cognitive science, human generalization commonly involves abstraction and concept learning. By contrast, AI generalization encompasses out-of-domain generalization in machine learning, rule-based reasoning in symbolic AI, and abstraction in neurosymbolic AI. Here we combine insights from AI and cognitive science to identify key commonalities and differences across three dimensions: notions of generalization, methods for generalization, and evaluation of generalization. We map the different conceptualizations of generalization in AI and cognitive science along these three dimensions and consider their role in alignment for human–AI teaming. This yields interdisciplinary challenges across AI and cognitive science that must be tackled to achieve effective and cognitively supported alignment in human–AI teaming scenarios.
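To make one of these notions concrete, the following minimal sketch (illustrative only; the data and all names in it are our own assumptions, not drawn from the paper) shows out-of-distribution generalization in the machine learning sense: a nearest-neighbour classifier performs well on test data drawn from its training distribution but degrades sharply under a simple covariate shift.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy training data: class 0 centred at (0, 0), class 1 centred at (4, 0).
    X_train = np.vstack([rng.normal([0, 0], 1, (100, 2)),
                         rng.normal([4, 0], 1, (100, 2))])
    y_train = np.array([0] * 100 + [1] * 100)

    def nn_predict(X):
        # Exemplar-style generalization: label of the nearest stored training point.
        d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=-1)
        return y_train[d.argmin(axis=1)]

    # In-distribution test set: drawn from the same process as the training data.
    X_iid = np.vstack([rng.normal([0, 0], 1, (50, 2)),
                       rng.normal([4, 0], 1, (50, 2))])
    y_test = np.array([0] * 50 + [1] * 50)

    # Out-of-distribution test set: the same points under a covariate shift.
    X_ood = X_iid + np.array([2.5, 0.0])

    print("in-distribution accuracy:    ", (nn_predict(X_iid) == y_test).mean())
    print("out-of-distribution accuracy:", (nn_predict(X_ood) == y_test).mean())

On the shifted test set, the class-0 points land closer to class-1 exemplars, so accuracy collapses towards chance even though the task itself is unchanged. This is precisely the kind of failure that out-of-distribution benchmarks are designed to expose, and one that humans, who generalize via abstraction rather than stored similarity, often handle more gracefully.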


Fig. 1: Comparison of the strengths of humans and statistical machines, illustrating their complementary generalization in human–AI teaming scenarios.
Fig. 2: Illustrative examples of human generalization and its inspiration of rule-based, example-based and statistical ML approaches.
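In the same illustrative spirit as Fig. 2 (a hypothetical toy example with made-up features and exemplars; nothing here is code from the paper), the sketch below contrasts rule-based and example-based generalization. The two routes agree on typical category members but can diverge on an atypical one.

    # Toy concept learning: is x a bird?
    # Binary features: (has_feathers, can_fly, lays_eggs, swims).
    EXEMPLARS = {
        (1, 1, 1, 0): "bird",      # robin
        (0, 1, 0, 0): "not bird",  # bat
        (0, 0, 1, 1): "not bird",  # crocodile
    }

    def rule_based(x):
        # Rule-based generalization: an explicit definition of the concept.
        has_feathers, _, lays_eggs, _ = x
        return "bird" if has_feathers and lays_eggs else "not bird"

    def example_based(x):
        # Example-based generalization: label of the most similar stored
        # exemplar, scored by the number of matching binary features.
        nearest = max(EXEMPLARS, key=lambda e: sum(a == b for a, b in zip(x, e)))
        return EXEMPLARS[nearest]

    penguin = (1, 0, 1, 1)  # feathered, flightless, egg-laying, swims
    print(rule_based(penguin))     # 'bird': it satisfies the rule
    print(example_based(penguin))  # 'not bird': most similar to the crocodile

The rule abstracts away irrelevant features, whereas the exemplar route is driven by overall similarity, so the two generalize differently from the same experience. Mismatches of exactly this kind matter when humans and AI systems must anticipate each other's judgements in a team.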



Acknowledgements

The manuscript resulted from the May 2024 Dagstuhl seminar 'Generalization by People and Machines' (24192). K. Forbus, P. Vossen, D. Shahaf, W. Abd-Almageed, and M. Waldmann provided valuable insights during the seminar. F.I. is funded by the NWO AiNed project 'Human-Centric AI Agents with Common Sense'. B.H., B.P., and A.-C.N.N. gratefully acknowledge funding from the Ministry of Culture and Science of North Rhine-Westphalia (MKW NRW) through the project SAIL (grant no. NW21-059A-D).

Author information


Corresponding author

Correspondence to Filip Ilievski.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Mengmi Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ilievski, F., Hammer, B., van Harmelen, F. et al. Aligning generalization between humans and machines. Nat Mach Intell 7, 1378–1389 (2025). https://doi.org/10.1038/s42256-025-01109-4
