Expert-level test is a head-scratcher for AI

Collins, Katherine M.; Tenenbaum, Joshua B.

doi:10.1038/d41586-025-04098-x

NEWS AND VIEWS
28 January 2026

Conventional benchmarks are becoming less effective at assessing AI performance, but a multi-disciplinary test has set AI systems a fresh challenge.

By Katherine M. Collins and Joshua B. Tenenbaum

Katherine M. Collins and Joshua B. Tenenbaum are in the Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

The quest to build increasingly powerful artificial-intelligence systems demands a clear definition of what counts as intelligence, and how it should be measured. AI systems are typically assessed using tests called benchmarks. These are often sets of question–answer pairs in which each question has a definitive, verifiable answer that enables the AI tool to be scored automatically. Benchmarks have been used to assess how quickly frontier AI models (such as those behind OpenAI’s ChatGPT and Google’s Gemini systems) are improving in capacities ranging from general common sense¹ and domain-specific knowledge² to code generation³ and mathematical problem-solving⁴. However, over time, many benchmarks become less effective at identifying genuine advances — a phenomenon known as benchmark saturation.
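
As a rough illustration of the automatic scoring described above, the following minimal sketch treats a benchmark as a list of question–answer pairs and reports the fraction of exact matches. The names `BenchmarkItem`, `normalize`, `score` and `toy_model`, along with the three toy questions, are hypothetical and chosen only for this example; real benchmarks often use more elaborate matching rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkItem:
    question: str
    answer: str  # the single verifiable reference answer


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def score(items, model) -> float:
    """Return the fraction of items whose model answer matches the reference."""
    if not items:
        raise ValueError("benchmark must contain at least one item")
    correct = sum(
        normalize(model(item.question)) == normalize(item.answer) for item in items
    )
    return correct / len(items)


# Toy items, invented for illustration only.
ITEMS = [
    BenchmarkItem("What is 7 * 8?", "56"),
    BenchmarkItem("What is the chemical symbol for gold?", "Au"),
    BenchmarkItem("How many sides does a hexagon have?", "6"),
]


def toy_model(question: str) -> str:
    """Stand-in for an AI system: answers two questions and misses the third."""
    canned = {
        "What is 7 * 8?": "56",
        "What is the chemical symbol for gold?": " au ",
    }
    return canned.get(question, "unknown")


accuracy = score(ITEMS, toy_model)
print(f"accuracy = {accuracy:.2f}")
```

Under this kind of scoring, a benchmark on which most systems already score close to 1.0 leaves little room to tell a genuine advance from a marginal one, which is one way to picture saturation.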

Nature 649, 1115-1116 (2026)

References

1. Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A. & Choi, Y. in Proc. 57th Annu. Meet. Assoc. Comput. Linguist. 4791–4800 (Association for Computational Linguistics, 2019).
2. Hendrycks, D. et al. in 8th Int. Conf. Learn. Represent. (ICLR, 2020).
3. Jimenez, C. E. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.06770 (2024).
4. Cobbe, K. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2110.14168 (2021).
5. Center for AI Safety, Scale AI & HLE Contributors Consortium Nature 649, 1139–1146 (2026).
6. Collins, K. M. et al. Nature Hum. Behav. 8, 1851–1863 (2024).
7. Chu, J., Tenenbaum, J. B. & Schulz, L. E. Trends Cogn. Sci. 28, 628–642 (2024).
8. Getzels, J. W. in Frontiers of Creativity Research: Beyond the Basics (ed. Isaksen, S. G.) 88–102 (Bearly, 1987).

Competing Interests

The authors know some colleagues who participated in the HLE question generation and review process.
