Fig. 2: The multistage pipeline of DeepSeek-R1.
From: DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

A detailed background on DeepSeek-V3 Base and DeepSeek-V3 is provided in Supplementary Information, section 1.1. The models DeepSeek-R1 Dev1, Dev2 and Dev3 represent intermediate checkpoints in this pipeline.