Extended Data Table 1 An illustrative example from the AIME dataset

From: DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

  1. The presented question is sourced from the 2024 AIME. The model is tasked with solving the problem and formatting its answer in a required format (for example, ANSWER). For evaluation, a rule-based grading system is used to determine correctness. The output of the model is considered correct if and only if it exactly matches the ground-truth solution.
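
The rule-based grading described above can be sketched as follows. This is a minimal illustration, not the authors' grader. The source only says the answer must follow "a required format", so the `\boxed{...}` wrapper assumed here is hypothetical, as are the function names `extract_answer` and `is_correct`.

```python
import re
from typing import Optional

# ASSUMPTION: the final answer is wrapped as \boxed{...}. The source does not
# specify the exact required format; this pattern is illustrative only.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(output: str) -> Optional[str]:
    """Return the content of the last boxed span in the output, or None."""
    matches = BOXED.findall(output)
    return matches[-1].strip() if matches else None


def is_correct(output: str, ground_truth: str) -> bool:
    """Rule-based check: correct only on an exact string match."""
    answer = extract_answer(output)
    return answer is not None and answer == ground_truth.strip()
```

Under these assumptions, an output with no answer in the expected format is graded as incorrect, and so is a numerically equal but differently written answer such as `0204` against `204`, because the comparison is an exact string match.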