Abstract
The rapid development of artificial intelligence (AI) systems has created an urgent need for their scientific quantification. While their fluency across a variety of domains is impressive, AI systems fall short on tests requiring algorithmic reasoning—a glaring limitation, given the necessity for interpretable and reliable technology. Despite a surge of reasoning benchmarks from the academic community, no theoretical framework exists for quantifying algorithmic reasoning in AI systems. Here we adopt a framework from computational complexity theory to quantify algorithmic generalization using algebraic expressions: algebraic circuit complexity. Algebraic circuit complexity theory—the study of algebraic expressions as circuit models—is a natural framework for studying the complexity of algorithmic computation, and it enables the study of generalization by defining benchmarks in terms of the computational requirements for solving a problem. Moreover, algebraic circuits are generic mathematical objects: an arbitrarily large number of samples can be generated for a specified circuit, making them an ideal experimental sandbox for the data-hungry models in use today. In this Perspective, we draw on tools from algebraic circuit complexity, apply them to formalize a science of algorithmic generalization, and address key challenges for its successful application to AI science.
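To make the sandbox point concrete, the following is a minimal sketch, assuming Python with the SymPy library, of how arbitrarily many input–output samples might be drawn from one fixed algebraic expression; the particular circuit, variable names and sampling range are illustrative choices, not the benchmark construction proposed in the Perspective.

```python
# Minimal sketch (illustrative only): treat a fixed algebraic expression as a
# "circuit" and draw as many input-output samples from it as desired.
import random
import sympy as sp

x, y, z = sp.symbols('x y z')

# An example circuit: a small polynomial built from addition and multiplication gates.
circuit = (x + y) * (y + z) + x * z

def sample(expr, variables, n, low=-10, high=10, seed=0):
    """Draw n random integer assignments and evaluate the expression on each."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        assignment = {v: rng.randint(low, high) for v in variables}
        data.append((assignment, int(expr.subs(assignment))))
    return data

# Because the circuit is a generic mathematical object, the dataset size is
# limited only by how many assignments we choose to draw.
for inputs, output in sample(circuit, (x, y, z), n=3):
    print(inputs, '->', output)
```

In such a setup, train and test splits could in principle be defined by the computational properties of the generating circuits (for example, their size or depth) rather than by surface statistics of the sampled data.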
Acknowledgements
We thank M. Carmosino and K. Srivastava for helpful discussions on earlier versions of the paper. We acknowledge funding support from the Exploratory Science Councils at IBM Research.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Martha Lewis, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ito, T., Campbell, M., Horesh, L. et al. Quantifying artificial intelligence through algorithmic generalization. Nat Mach Intell 7, 1195–1205 (2025). https://doi.org/10.1038/s42256-025-01092-w