Table 1 An interesting ‘aha moment’ of an intermediate version of DeepSeek-R1-Zero

Question: If \(a > 1\), then the sum of the real solutions of \(\sqrt{a-\sqrt{a+x}}=x\) is equal to
Response: <think>
To solve the equation \(\sqrt{a-\sqrt{a+x}}=x\), let’s start by squaring both …
\(\left(\sqrt{a-\sqrt{a+x}}\right)^{2}=x^{2}\Rightarrow a-\sqrt{a+x}=x^{2}.\)
Rearrange to isolate the inner square root term:
\((a-x^{2})^{2}=a+x\Rightarrow a^{2}-2ax^{2}+(x^{2})^{2}=a+x\Rightarrow x^{4}-2ax^{2}-x+(a^{2}-a)=0\)
…
Wait, wait. Wait. That’s an aha moment I can flag here.
Let’s reevaluate this step by step to identify whether the correct sum can be …
We started with the equation:
\(\sqrt{a-\sqrt{a+x}}=x\)
First, let’s square both sides:
\(a-\sqrt{a+x}=x^{2}\Rightarrow \sqrt{a+x}=a-x^{2}\)
Next, I could square both sides again, treating the equation: …
…

The model learns to rethink using an anthropomorphic tone. This is also an aha moment for us, allowing us to witness the power and beauty of RL.
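
The algebra quoted in the response can be checked numerically. The sketch below is not part of the source: the function names are hypothetical, and the factorization \((x^{2}-x-a)(x^{2}+x-a+1)\) of the quartic is an added assumption that can be confirmed by expanding it. It illustrates why squaring twice calls for a recheck: the quartic has roots that do not satisfy the original equation.

```python
import math


def original_residual(a, x):
    """Return sqrt(a - sqrt(a + x)) - x, or None if a square root is undefined."""
    inner = a + x
    if inner < 0:
        return None
    outer = a - math.sqrt(inner)
    if outer < 0:
        return None
    return math.sqrt(outer) - x


def quartic(a, x):
    """The polynomial x^4 - 2a x^2 - x + (a^2 - a) reached in the response."""
    return x**4 - 2 * a * x**2 - x + (a**2 - a)


def quadratic_roots(b, c):
    """Real roots of x^2 + b x + c = 0, in ascending order."""
    disc = b * b - 4 * c
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return [(-b - r) / 2, (-b + r) / 2]


def candidate_roots(a):
    """Real roots of the quartic, via the assumed factorization
    (x^2 - x - a)(x^2 + x - a + 1); not stated in the source."""
    return quadratic_roots(-1.0, -a) + quadratic_roots(1.0, 1.0 - a)


def genuine_solutions(a, tol=1e-9):
    """Keep only candidates that satisfy the original (unsquared) equation."""
    kept = []
    for x in candidate_roots(a):
        res = original_residual(a, x)
        if res is not None and abs(res) < tol:
            kept.append(x)
    return kept


if __name__ == "__main__":
    a = 3.0
    print(candidate_roots(a))    # four real roots of the quartic
    print(genuine_solutions(a))  # for a = 3.0 this prints [1.0]
```

For \(a = 3\), all four candidates make the quartic vanish, but only \(x = 1\) survives substitution into \(\sqrt{a-\sqrt{a+x}}=x\); the other three are extraneous roots introduced by squaring.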
