Extended Data Fig. 6: Decoding reward timing from the hyperbolic model and exponential model simulations. | Nature


From: Multi-timescale reinforcement learning in the brain


a, Distribution of the inferred discount parameter k across neurons. b, Correlation between the discount factor inferred in the exponential model and the discount parameter k from the hyperbolic model (r = −0.9, P < 1.0 × 10⁻³⁰, Student’s t-test). Note that in the hyperbolic model a larger value of k implies faster discounting, hence the negative correlation. c, Discount matrix for the hyperbolic model. For each neuron, we plot the relative value of future events given its inferred discount parameter. Neurons are sorted by decreasing estimated value of the discount parameter. d, Decoded subjective expected timing of future reward \(E(r|t)\) using the discount matrix from the hyperbolic model (see Methods). e, Distribution of 1-Wasserstein distances between the reward timing and the reward timing predicted by decoding on the test data with the exponential model (shown in Fig. 2k, top row) and on the test data with the hyperbolic model (shown in d). Decoding is better for the exponential model from Fig. 2 than for the hyperbolic model, except at the shortest delay (P(t = 0.6 s) = 1, P(t = 1.5 s) < 1.0 × 10⁻³¹, P(t = 3.75 s) < 1.0 × 10⁻³³, P(t = 9.375 s) < 1.0 × 10⁻³; one-tailed Wilcoxon signed rank test, see Methods). f, Decoded subjective expected timing of future reward \(E(r|t)\) using simulated data based on the parameters of the exponential model (see Methods). g, Distribution of 1-Wasserstein distances between the reward timing and the reward timing predicted by decoding on the test data from exponential fits (shown in Fig. 2k, top row) and on the simulated data generated from the parameters of the exponential fits (shown in f). Decoding is marginally better for the data predictions (P(t = 0.6 s) = 0.002, P(t = 1.5 s) = 0.999, P(t = 3.75 s) < 1 × 10⁻¹², P(t = 9.375 s) = 0.027; one-tailed Wilcoxon signed rank test, see Methods), suggesting that decoding accuracy is limited by the number of trials.
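
As a rough illustration of the quantities in panels c and e, the sketch below builds a discount matrix and computes a 1-Wasserstein distance on a time grid. It assumes the standard discount forms \(1/(1+kt)\) (hyperbolic) and \(\gamma^{t}\) (exponential), which the caption does not state explicitly; the parameter values, the time grid, and all function names are hypothetical and chosen only for illustration. It is not the decoding procedure described in Methods.

```python
import numpy as np

def hyperbolic_discount(k, t):
    """Relative value of a reward at delay t (assumed form 1 / (1 + k t))."""
    return 1.0 / (1.0 + k * t)

def exponential_discount(gamma, t):
    """Relative value of a reward at delay t (assumed form gamma ** t)."""
    return gamma ** t

def discount_matrix(params, times, kernel):
    """Rows are neurons (one discount parameter each), columns are future times."""
    params = np.asarray(params, dtype=float)[:, None]
    times = np.asarray(times, dtype=float)[None, :]
    return kernel(params, times)

def wasserstein_1(p, q, times):
    """1-Wasserstein distance between two distributions on a shared, sorted time grid.

    Computed as the integral of the absolute difference between the two CDFs.
    """
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(times)))

# Hypothetical time grid (seconds) and discount parameters (not from the paper).
times = np.linspace(0.0, 10.0, 101)
k_values = np.sort(np.array([0.1, 0.5, 1.0, 3.0]))[::-1]  # sorted by decreasing k
D_hyp = discount_matrix(k_values, times, hyperbolic_discount)
D_exp = discount_matrix([0.5, 0.8, 0.95], times, exponential_discount)

# Toy comparison: true reward time at 1.5 s, decoded point estimate at 3.7 s.
true_timing = np.zeros_like(times)
true_timing[15] = 1.0
decoded_timing = np.zeros_like(times)
decoded_timing[37] = 1.0
distance = wasserstein_1(true_timing, decoded_timing, times)
```

For two point masses, the distance reduces to the gap between the two times, which makes the function easy to sanity-check. In the hyperbolic matrix, rows with larger k fall off faster with delay, which is the direction of the negative correlation noted in panel b.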
