Table 5 The performance of the LLMs on the MMLU [50], GPQA [49], and DROP [51] benchmarks, collected from the following references [38,39,48].
| Benchmark | Prompt | GPT-4o | Claude 3.5 Sonnet | Gemini 1.5 Pro |
|---|---|---|---|---|
| MMLU (Undergraduate Level Knowledge) | Zero Shot | 88.70% | 88.30% | - |
| MMLU (Undergraduate Level Knowledge) | Five Shot | - | 88.70% | 85.90% |
| GPQA (Graduate Level Reasoning) | Chain of Thought | 53.60% | 59.40% | 46.20% |
| DROP (Reasoning) | Three Shot | 83.40% | 87.10% | 74.90% |