Table 5. Performance of the LLMs on the MMLU⁵⁰, GPQA⁴⁹, and DROP⁵¹ benchmarks, collected from references 38, 39, and 48. A dash (-) indicates that no score was reported for that prompting setting.

Benchmark	Prompting	GPT-4o	Claude 3.5 Sonnet	Gemini 1.5 Pro
MMLU (Undergraduate-Level Knowledge)	Zero-shot	88.70%	88.30%	-
MMLU (Undergraduate-Level Knowledge)	Five-shot	-	88.70%	85.90%
GPQA (Graduate-Level Reasoning)	Chain-of-thought	53.60%	59.40%	46.20%
DROP (Reasoning)	Three-shot	83.40%	87.10%	74.90%
