Abstract
Recent advances in artificial intelligence (AI)—including generative approaches—have resulted in technology that can support humans in scientific discovery and decision-making, but may also disrupt democracies and target individuals. The responsible use of AI and its participation in human–AI teams increasingly calls for AI alignment, that is, for making AI systems act in accordance with human preferences. A crucial yet often overlooked aspect of these interactions is the different ways in which humans and machines generalize. In cognitive science, human generalization commonly involves abstraction and concept learning. By contrast, AI generalization encompasses out-of-domain generalization in machine learning, rule-based reasoning in symbolic AI, and abstraction in neurosymbolic AI. Here we combine insights from AI and cognitive science to identify key commonalities and differences across three dimensions: notions of, methods for, and evaluation of generalization. We map the different conceptualizations of generalization in AI and cognitive science along these three dimensions and consider their role in alignment for human–AI teaming. This results in interdisciplinary challenges across AI and cognitive science that must be tackled to support effective and cognitively grounded alignment in human–AI teaming scenarios.
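The contrast the abstract draws between rule-based (symbolic) and exemplar-based generalization can be made concrete with a toy sketch. The following example is illustrative only and is not taken from the article: the data, the hand-written rule and the feature names (`size`, `roundness`) are invented. It shows how a symbolic rule and a nearest-neighbour (exemplar) learner, trained on the same examples, can disagree on an out-of-distribution input.

```python
# Toy illustration (invented, not from the article): two styles of
# generalization -- a hand-written symbolic rule versus exemplar-based
# nearest-neighbour classification -- applied to the same training data.

def rule_based(x):
    """Symbolic rule: an item is positive iff it is large AND round."""
    size, roundness = x
    return size > 0.5 and roundness > 0.5

def nearest_neighbour(x, examples):
    """Exemplar-based generalization: copy the label of the closest exemplar."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    closest = min(examples, key=lambda ex: dist(ex[0], x))
    return closest[1]

# Training exemplars: ((size, roundness), label). Both methods agree on these.
examples = [((0.9, 0.9), True), ((0.8, 0.7), True),
            ((0.2, 0.3), False), ((0.1, 0.6), False)]

# An out-of-distribution probe: far larger than anything seen in training.
probe = (3.0, 0.2)

print(rule_based(probe))                   # rule: not round, so False
print(nearest_neighbour(probe, examples))  # closest exemplar is positive: True
```

On in-distribution items the two methods coincide, but on the extreme probe the rule generalizes via its explicit condition while the exemplar model generalizes via similarity, yielding opposite answers, which is the kind of divergence the article argues matters for human–AI alignment.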
Acknowledgements
The manuscript resulted from the May 2024 Dagstuhl seminar 'Generalization by People and Machines' (24192). K. Forbus, P. Vossen, D. Shahaf, W. Abd-Almageed and M. Waldmann provided valuable insights during the seminar. F.I. is funded by the NWO AiNed project 'Human-Centric AI Agents with Common Sense'. B.H., B.P. and A.-C.N.N. gratefully acknowledge funding by the Ministry of Culture and Science of North Rhine-Westphalia (MKW NRW) through the project SAIL (grant no. NW21-059A-D).
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Mengmi Zhang, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ilievski, F., Hammer, B., van Harmelen, F. et al. Aligning generalization between humans and machines. Nat Mach Intell 7, 1378–1389 (2025). https://doi.org/10.1038/s42256-025-01109-4