

Perspective

Quantifying artificial intelligence through algorithmic generalization

Abstract

The rapid development of artificial intelligence (AI) systems has created an urgent need for their scientific quantification. While their fluency across a variety of domains is impressive, AI systems fall short on tests requiring algorithmic reasoning—a glaring limitation, given the necessity for interpretable and reliable technology. Despite a surge in reasoning benchmarks emerging from the academic community, no theoretical framework exists to quantify algorithmic reasoning in AI systems. Here we adopt a framework from computational complexity theory to quantify algorithmic generalization using algebraic expressions: algebraic circuit complexity. Algebraic circuit complexity theory—the study of algebraic expressions as circuit models—is a natural framework for studying the complexity of algorithmic computation. Algebraic circuit complexity enables the study of generalization by defining benchmarks in terms of the computational requirements for solving a problem. Moreover, algebraic circuits are generic mathematical objects; an arbitrarily large number of samples can be generated for a specified circuit, making it an ideal experimental sandbox for the data-hungry models that are used today. In this Perspective, we adopt tools from algebraic circuit complexity, apply them to formalize a science of algorithmic generalization, and address key challenges for its successful application to AI science.
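
As a concrete illustration of the framework the abstract describes, the following minimal sketch (not taken from the paper; it assumes the SymPy library, and the helper names `circuit_size`, `circuit_depth` and `sample` are purely illustrative) treats a small algebraic expression as a circuit, reads off two standard complexity measures, and draws arbitrarily many input-output samples from it. This is the sense in which algebraic circuits can act as an unbounded experimental sandbox for data-hungry models.

```python
# Minimal sketch, assuming SymPy: an algebraic expression viewed as a circuit,
# with size/depth as rough complexity measures and unlimited sample generation.
import random
import sympy as sp

x, y, z = sp.symbols("x y z")

# SymPy stores the expression as a tree of + and * nodes, which is essentially
# the circuit view used in algebraic circuit complexity.
expr = (x + y) * (y + z) + x * z

def circuit_size(e):
    """Count the arithmetic gates (+, *) in the expression tree.
    Note: SymPy flattens n-ary sums/products, so this only approximates
    the formal gate count."""
    if e.is_Atom:
        return 0
    return 1 + sum(circuit_size(arg) for arg in e.args)

def circuit_depth(e):
    """Longest path from an input leaf to the output gate."""
    if e.is_Atom:
        return 0
    return 1 + max(circuit_depth(arg) for arg in e.args)

def sample(e, variables, n, low=-10, high=10):
    """Draw n random integer assignments and evaluate the circuit on each."""
    data = []
    for _ in range(n):
        assignment = {v: random.randint(low, high) for v in variables}
        data.append((assignment, int(e.subs(assignment))))
    return data

print("size:", circuit_size(expr), "depth:", circuit_depth(expr))
for inputs, output in sample(expr, (x, y, z), 3):
    print(inputs, "->", output)
```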


Fig. 1: Examples of algebraic expressions represented as circuits.
Fig. 2: Commonly used AI evaluations for length generalization with arithmetic tasks.
Fig. 3: Algorithmic capabilities of modern AI systems and architectures can be studied with algebraic circuits.
Fig. 4: Analogues of common compositional generalization benchmarks in terms of algebraic circuits.
Fig. 5: Algebraic problems as machine learning challenges.
Fig. 6: Using an algebraic circuit’s adjacency matrix as a ground-truth comparison to interpret transformer attention representations.
Fig. 7: The algebraic generalization capability across circuit divergence metrics can be evaluated through few-shot prompting in LLMs.
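
To ground the ideas summarized in Figs. 6 and 7 above, here is a rough, hypothetical sketch (not the authors' code): it recovers the adjacency matrix of the circuit underlying an expression and scores it against a model-derived attention matrix, which is stood in for here by a random placeholder. The correlation-based alignment score is likewise an illustrative stand-in for the paper's divergence metrics; the same serialized expressions could then be embedded in few-shot prompts for LLM evaluation, as in Fig. 7.

```python
# Rough sketch of the Fig. 6 idea, assuming SymPy and NumPy: the circuit's
# adjacency matrix as a ground truth against which attention can be compared.
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")
expr = (x + y) * x + y

def circuit_adjacency(e):
    """Collect unique sub-expressions as circuit nodes (a DAG view, so shared
    terms appear once) and mark child -> parent edges."""
    nodes = list(dict.fromkeys(sp.postorder_traversal(e)))
    index = {node: i for i, node in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)))
    for node in nodes:
        for child in node.args:
            adj[index[child], index[node]] = 1.0
    return adj, nodes

adj, nodes = circuit_adjacency(expr)

# Placeholder "attention" matrix over the same nodes; in practice this would
# come from a trained transformer processing the serialized expression.
rng = np.random.default_rng(0)
attention = rng.random(adj.shape)

# One simple alignment score between ground-truth wiring and attention
# (the paper's actual divergence metrics may differ).
score = np.corrcoef(adj.ravel(), attention.ravel())[0, 1]
print(f"alignment with ground-truth circuit: {score:.3f}")
```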



Acknowledgements

We thank M. Carmosino and K. Srivastava for helpful discussions on earlier versions of the paper. We acknowledge funding support from the Exploratory Science Councils at IBM Research.

Author information

Corresponding author

Correspondence to Takuya Ito.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Martha Lewis, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ito, T., Campbell, M., Horesh, L. et al. Quantifying artificial intelligence through algorithmic generalization. Nat Mach Intell 7, 1195–1205 (2025). https://doi.org/10.1038/s42256-025-01092-w

