Extended Data Fig. 9: Discounting heterogeneity explains ramping diversity in a common reward expectation model.

a, Uncertainty in reward timing decreases as mice approach the reward zone. Both the mean expected reward time and the standard deviation of the estimate decrease. Distributions in the bottom row are from fitted data (see panels c-e). b, A model in which each neuron has its own value function but all neurons share a common reward expectation predicts ramping heterogeneity across neurons. Left panel, as mice approach reward, the uncertainty in reward timing, quantified by the standard deviation, decreases. Second panel from left, the expectation of reward timing takes the form of a folded normal distribution. As the mice approach the reward, both the mean and the standard deviation of the expected reward timing distribution decrease. Third panel from left, each neuron computes a distinct value function given its individual discount factor and the common expected reward timing distribution. Right panel, the diverse value functions across neurons lead to ramping heterogeneity across neurons in the reward prediction error (see Methods ‘Common Reward Expectation model’). c, The inferred standard deviation of the reward expectation model decreases as a function of time to reward. The line indicates the mean inferred standard deviation and the shading indicates the standard error of the mean over 100 bootstraps. d, Expected timing of the reward as a function of true time to reward. As the mice approach the reward, not only does the mean expected time to reward decrease, but the uncertainty in reward timing, captured by the standard deviation shown in c, also decreases. This effect produces increasingly convex value functions, which give rise to the observed ramps in dopamine neuron activity. e, Value function for each individual neuron (same order as in h-i). f, Distribution of inferred discount factors under the common reward expectation model.
g, Although the range of discount factors differs between the fits from the common value (x axis) and common reward expectation (y axis) models, the inferred discount factors are strongly correlated across single neurons (Spearman’s ρ = 0.93, P < 1.0 × 10^−20, two-tailed Student’s t-test). h, Predicted ramping activity from the model fits under the common reward expectation model. i, Diversity of ramping activity across single neurons as mice approach reward (aligned by inferred discount factor in the common reward expectation model).
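The logic of panel b can be illustrated with a minimal sketch; it is not the fitting procedure from the Methods. It makes several assumptions that do not come from the legend: the reward has unit magnitude, each neuron's value is the expected discounted reward E[γ^T] with T folded normal, the standard deviation grows in proportion to time to reward (the hypothetical parameter `weber`), and the prediction error is a one-step temporal-difference error. All function names are illustrative.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def value(gamma, mu, sigma):
    """E[gamma ** |X|] for X ~ Normal(mu, sigma^2), so |X| is folded normal.

    Assumption: value is the expected discounted unit reward.
    """
    k = -math.log(gamma)  # discount rate per time step
    if sigma == 0.0:
        return math.exp(-k * abs(mu))  # no timing uncertainty
    half_var = 0.5 * (sigma * k) ** 2
    pos = math.exp(half_var - k * mu) * normal_cdf(mu / sigma - sigma * k)
    neg = math.exp(half_var + k * mu) * normal_cdf(-mu / sigma - sigma * k)
    return pos + neg

def value_curve(gamma, times_to_reward, weber=0.3):
    """Value at each time to reward; sigma = weber * time (assumed schedule)."""
    return [value(gamma, t, weber * t) for t in times_to_reward]

def td_errors(gamma, values):
    """One-step TD errors with no reward delivered before the final step."""
    return [gamma * values[j + 1] - values[j] for j in range(len(values) - 1)]

# Shared reward expectation, neuron-specific discount factors (illustrative values).
times = [10 - s for s in range(11)]  # 10, 9, ..., 0 steps from reward
curves = {g: value_curve(g, times) for g in (0.7, 0.8, 0.9)}
ramps = {g: td_errors(g, v) for g, v in curves.items()}
```

With `weber=0` the timing estimate is exact, each value function reduces to γ^t and the temporal-difference errors vanish. In this sketch, any pre-reward prediction error and its variation across discount factors therefore come from the shared timing uncertainty.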