nature
search

Showing 1–2 of 2 results

Advanced filters: Author: Yael Moros-Daval

General scales unlock AI evaluation with explanatory and predictive power

A fully automated methodology based on rubrics capturing a broad range of cognitive and intellectual demands is illustrated using LLMs and tasks, demonstrating a new way to evaluate the capabilities of AI systems and anticipate their performance.

Lexin Zhou
Lorenzo Pacchiardi
José Hernández-Orallo
Research
Open Access
01 Apr 2026
Nature

Volume: 652, P: 58-67

Larger and more instructable language models become less reliable

Scaling up and shaping up large language models increased their tendency to provide sensible yet incorrect answers at difficulty levels humans cannot supervise, highlighting the need for a fundamental shift in artificial intelligence design towards reliability.

Lexin Zhou
Wout Schellaert
José Hernández-Orallo
Research
Open Access
25 Sept 2024
Nature

Volume: 634, P: 61-68
