Extended Data Table 1: Accuracy and RMS calibration error of frontier LLMs on the text-only questions of HLE

From: A benchmark of expert-level academic questions to assess AI capabilities