Fig. 4

RL with belief states explains dopamine reward responses and behavior better than standard RL. Individual dopamine responses to rewards were fit using either a standard RL model or an RL model computing values on belief states. a Fits to dopamine responses on trial 1. Both RL models fit the dopamine response, because on trial 1 there is not yet any evidence from which to infer the state. b Fits to dopamine responses on trial 2. Only RPEs computed using belief states reproduced the non-monotonic change in dopamine response across increasing rewards. c Model predictions of behavior. The value functions from both model fits were positively correlated with the mice’s anticipatory licking, but the RL model with belief states provided a better fit (signed rank test: p = 0.032), suggesting that the mice’s anticipatory licking tracks the value of the belief state. d Individual examples of the value function extracted from each model and of anticipatory licking across increasing rewards on trial 2. n = 11; data represent mean ± s.e.m.
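
The contrast in panel b can be illustrated with a minimal sketch. This is not the fitted model behind the figure; it only shows, under stated assumptions, how an RPE computed on a belief state can vary non-monotonically with reward size while a standard RPE cannot. Everything named here is hypothetical: two hidden states with assumed reward sizes `SMALL` and `BIG`, a Gaussian perceptual noise `SIGMA`, a flat prior, a fixed value estimate for the standard model, and the same reward size delivered on trials 1 and 2.

```python
import math

# Hypothetical parameters chosen for illustration only (not taken from the fits).
SMALL, BIG = 1.0, 10.0   # assumed reward sizes that define the two hidden states
SIGMA = 2.0              # assumed noise on the perceived reward size


def belief_big(reward, prior_big=0.5):
    """Posterior probability of the big-reward state after observing `reward`."""
    like_small = math.exp(-0.5 * ((reward - SMALL) / SIGMA) ** 2)
    like_big = math.exp(-0.5 * ((reward - BIG) / SIGMA) ** 2)
    evidence = prior_big * like_big + (1.0 - prior_big) * like_small
    return prior_big * like_big / evidence


def rpe_belief(reward):
    """Trial-2 RPE when value is the belief-weighted mix of the state values."""
    b = belief_big(reward)                 # belief inferred from the trial-1 reward
    value = (1.0 - b) * SMALL + b * BIG    # value of the belief state
    return reward - value


def rpe_standard(reward, value=0.5 * (SMALL + BIG)):
    """Trial-2 RPE with a single value estimate that ignores hidden states."""
    return reward - value


rewards = [1, 2, 4, 6, 8, 10]
belief_rpes = [rpe_belief(r) for r in rewards]
standard_rpes = [rpe_standard(r) for r in rewards]

for r, db, ds in zip(rewards, belief_rpes, standard_rpes):
    print(f"reward={r:>2}  belief RPE={db:+.2f}  standard RPE={ds:+.2f}")
```

With these assumed parameters, the belief-state RPE rises for small rewards, falls below zero once the reward is large enough to suggest the big-reward state, and returns toward zero at the largest reward. The standard RPE increases monotonically with reward size.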