Extended Data Fig. 10: Logic Outline.
From: Mesolimbic dopamine adapts the rate of learning from action

A) Key points of the paper grouped by theme (left), with the figure locations of the primary supporting data (blue). B) In reinforcement learning, an agent learns iteratively from environmental feedback to improve a policy, which is defined by a set of parameters (Θ) and describes the action (a) performed given a state (s). In policy learning, the agent applies a learning rule.
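
As an illustration of the policy-learning loop described in B, the following is a minimal sketch and not the model used in the paper. It assumes a tabular softmax policy with parameters `theta` (standing in for Θ), a REINFORCE-style update as the learning rule, and a hypothetical toy environment in which action a is rewarded when it matches state s. The function names, the learning rate `alpha`, and the environment are all assumptions introduced for the example.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy(theta, s):
    """Probability of each action a given state s under parameters theta."""
    return softmax(theta[s])


def reinforce_update(theta, s, a, reward, alpha):
    """One REINFORCE step (assumed learning rule, not the paper's):
    theta[s][b] += alpha * reward * (1[b == a] - pi(b | s))."""
    probs = policy(theta, s)
    for b in range(len(theta[s])):
        grad_log = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha * reward * grad_log


def train(n_steps=2000, alpha=0.1, seed=0):
    """Iteratively improve the policy from environmental feedback."""
    rng = random.Random(seed)
    # Hypothetical toy environment: 2 states, 2 actions, reward 1 when a == s.
    theta = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_steps):
        s = rng.randrange(2)                                   # observe state
        a = rng.choices([0, 1], weights=policy(theta, s))[0]   # sample action
        reward = 1.0 if a == s else 0.0                        # feedback
        reinforce_update(theta, s, a, reward, alpha)           # learning rule
    return theta


if __name__ == "__main__":
    theta = train()
    for s in range(2):
        print(s, [round(p, 3) for p in policy(theta, s)])
```

In this sketch, `alpha` sets how strongly each piece of feedback changes Θ, so varying it is one simple way to explore how the rate of learning shapes the resulting policy.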