A fully automated methodology, based on rubrics that capture a broad range of cognitive and intellectual demands, is illustrated on LLMs and tasks, demonstrating a new way to evaluate the capabilities of AI systems and anticipate their performance.
- Lexin Zhou
- Lorenzo Pacchiardi
- José Hernández-Orallo