Nature

Extended Data Fig. 2: Illustration of the proposed GRPO for RL-based training. | Nature

From: DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning
