

Perspective

Quantifying artificial intelligence through algorithmic generalization

Abstract

The rapid development of artificial intelligence (AI) systems has created an urgent need for their scientific quantification. While their fluency across a variety of domains is impressive, AI systems fall short on tests requiring algorithmic reasoning—a glaring limitation, given the necessity for interpretable and reliable technology. Despite a surge in reasoning benchmarks emerging from the academic community, no theoretical framework exists to quantify algorithmic reasoning in AI systems. Here we adopt a framework from computational complexity theory to quantify algorithmic generalization using algebraic expressions: algebraic circuit complexity. Algebraic circuit complexity theory—the study of algebraic expressions as circuit models—is a natural framework for studying the complexity of algorithmic computation. Algebraic circuit complexity enables the study of generalization by defining benchmarks in terms of the computational requirements for solving a problem. Moreover, algebraic circuits are generic mathematical objects; an arbitrarily large number of samples can be generated for a specified circuit, making it an ideal experimental sandbox for the data-hungry models that are used today. In this Perspective, we adopt tools from algebraic circuit complexity, apply them to formalize a science of algorithmic generalization, and address key challenges for its successful application to AI science.
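
As a concrete illustration of the framework the abstract describes, the following minimal sketch (not taken from the paper; it assumes the SymPy library, and the helper names `circuit_size`, `circuit_depth` and `sample` are purely illustrative) treats a small algebraic expression as a circuit, reads off two standard complexity measures, and draws arbitrarily many input-output samples from it. This is the sense in which algebraic circuits can act as an unbounded experimental sandbox for data-hungry models.

```python
# Minimal sketch, assuming SymPy: an algebraic expression viewed as a circuit,
# with size/depth as rough complexity measures and unlimited sample generation.
import random
import sympy as sp

x, y, z = sp.symbols("x y z")

# SymPy stores the expression as a tree of + and * nodes, which is essentially
# the circuit view used in algebraic circuit complexity.
expr = (x + y) * (y + z) + x * z

def circuit_size(e):
    """Count the arithmetic gates (+, *) in the expression tree.
    Note: SymPy flattens n-ary sums/products, so this only approximates
    the formal gate count."""
    if e.is_Atom:
        return 0
    return 1 + sum(circuit_size(arg) for arg in e.args)

def circuit_depth(e):
    """Longest path from an input leaf to the output gate."""
    if e.is_Atom:
        return 0
    return 1 + max(circuit_depth(arg) for arg in e.args)

def sample(e, variables, n, low=-10, high=10):
    """Draw n random integer assignments and evaluate the circuit on each."""
    data = []
    for _ in range(n):
        assignment = {v: random.randint(low, high) for v in variables}
        data.append((assignment, int(e.subs(assignment))))
    return data

print("size:", circuit_size(expr), "depth:", circuit_depth(expr))
for inputs, output in sample(expr, (x, y, z), 3):
    print(inputs, "->", output)
```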


Fig. 1: Examples of algebraic expressions represented as circuits.
Fig. 2: Commonly used AI evaluations for length generalization with arithmetic tasks.
Fig. 3: Algorithmic capabilities of modern AI systems and architectures can be studied with algebraic circuits.
Fig. 4: Analogues of common compositional generalization benchmarks in terms of algebraic circuits.
Fig. 5: Algebraic problems as machine learning challenges.
Fig. 6: Using an algebraic circuit’s adjacency matrix as a ground-truth comparison to interpret transformer attention representations.
Fig. 7: The algebraic generalization capability across circuit divergence metrics can be evaluated through few-shot prompting in LLMs.
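
To ground the ideas summarized in Figs. 6 and 7 above, here is a rough, hypothetical sketch (not the authors' code): it recovers the adjacency matrix of the circuit underlying an expression and scores it against a model-derived attention matrix, which is stood in for here by a random placeholder. The correlation-based alignment score is likewise an illustrative stand-in for the paper's divergence metrics; the same serialized expressions could then be embedded in few-shot prompts for LLM evaluation, as in Fig. 7.

```python
# Rough sketch of the Fig. 6 idea, assuming SymPy and NumPy: the circuit's
# adjacency matrix as a ground truth against which attention can be compared.
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")
expr = (x + y) * x + y

def circuit_adjacency(e):
    """Collect unique sub-expressions as circuit nodes (a DAG view, so shared
    terms appear once) and mark child -> parent edges."""
    nodes = list(dict.fromkeys(sp.postorder_traversal(e)))
    index = {node: i for i, node in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)))
    for node in nodes:
        for child in node.args:
            adj[index[child], index[node]] = 1.0
    return adj, nodes

adj, nodes = circuit_adjacency(expr)

# Placeholder "attention" matrix over the same nodes; in practice this would
# come from a trained transformer processing the serialized expression.
rng = np.random.default_rng(0)
attention = rng.random(adj.shape)

# One simple alignment score between ground-truth wiring and attention
# (the paper's actual divergence metrics may differ).
score = np.corrcoef(adj.ravel(), attention.ravel())[0, 1]
print(f"alignment with ground-truth circuit: {score:.3f}")
```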



Acknowledgements

We thank M. Carmosino and K. Srivastava for helpful discussions on earlier versions of the paper. We acknowledge funding support from the Exploratory Science Councils at IBM Research.

Author information

Corresponding author

Correspondence to Takuya Ito.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Martha Lewis, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ito, T., Campbell, M., Horesh, L. et al. Quantifying artificial intelligence through algorithmic generalization. Nat Mach Intell 7, 1195–1205 (2025). https://doi.org/10.1038/s42256-025-01092-w

