Extended Data Fig. 4: Decoding reward timing using the regularized pseudo-inverse of the discount matrix.

From: Multi-timescale reinforcement learning in the brain

a–c, Singular value decomposition (SVD) of the discount matrix. a, Left singular vectors (in the neuron space). b, Singular values. The black line at 2 indicates the value of the regularization term α. c, Right singular vectors (in the time space). d, Decoding matrix based on the regularized pseudo-inverse. e, Distribution of 1-Wasserstein distances between the reward timing and the predicted reward timing from the decoding on the test data from the exponential fits (shown in Fig. 2k, top row) and on the average exponential model (shown in Fig. 2k, bottom row). Decoding is better for the exponential model from Fig. 2 than for the average exponential model, except for the shortest delay (P(t = 0.6 s) = 1, P(t = 1.5 s) < 1.0 × 10−31, P(t = 3.75 s) = 0.0135, P(t = 9.375 s) < 1.0 × 10−14; one-tailed Wilcoxon signed-rank test, see Methods). f, The ability to decode the timing of expected future reward is not a general property of the discount matrix and collapses if the identity of the cue responses is randomized (see Methods). g, Distribution of 1-Wasserstein distances between the reward timing and the predicted reward timing from the decoding on the test-data exponential fits (shown in Fig. 2k, top row) and on the shuffled data (shown in panel f). The predictions from the test data are better (smaller 1-Wasserstein distance) than those from the shuffled data (P = 1.2 × 10−4 for the 0.6 s reward delay, P < 1.0 × 10−20 for the other delays; one-tailed Wilcoxon signed-rank test, see Methods).
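The decoding step summarized in panels a–d rests on a simple linear picture: if each neuron discounts future reward with its own factor, the discount matrix maps reward times to cue responses, and a regularized pseudo-inverse built from its SVD (with singular values shrunk by the regularization term α) maps responses back to a profile over reward times. The following is a minimal NumPy sketch of that construction; the discount factors, the time grid, and the Tikhonov form s/(s² + α²) are illustrative assumptions, not the fitted values or the exact formula from the paper's Methods.

```python
import numpy as np

# Assumed setup: neuron i discounts future reward with its own factor gamma_i,
# so the discount matrix has entries D[i, t] = gamma_i ** t over a grid of delays.
# These gammas and the delay grid are hypothetical, not the paper's fitted values.
gammas = np.linspace(0.5, 0.99, 40)        # hypothetical per-neuron discount factors
times = np.arange(0.0, 10.0, 0.1)          # hypothetical delay grid (seconds)
D = gammas[:, None] ** times[None, :]      # discount matrix (neurons x time)

# SVD of the discount matrix (panels a-c: left singular vectors in neuron space,
# singular values, right singular vectors in time space).
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# Regularized pseudo-inverse (panel d). The legend states alpha = 2; the
# Tikhonov shrinkage s / (s^2 + alpha^2) used here is one standard choice and
# is an assumption, not quoted from the Methods.
alpha = 2.0
D_pinv = Vt.T @ np.diag(s / (s**2 + alpha**2)) @ U.T   # (time x neurons)

# Decoding: project a vector of cue responses (one value per neuron) through
# the regularized pseudo-inverse to recover a profile over reward times, which
# should peak near the true delay.
cue_responses = D[:, np.argmin(np.abs(times - 3.75))]  # synthetic 3.75 s responses
decoded = D_pinv @ cue_responses
print(times[np.argmax(decoded)])
```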
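Panels e and g score the decoder by the 1-Wasserstein distance between the true reward timing and the decoded timing, and compare paired distances across conditions with a one-tailed Wilcoxon signed-rank test. A hedged SciPy sketch of that comparison follows, using synthetic stand-in samples rather than the paper's decoder output:

```python
import numpy as np
from scipy.stats import wasserstein_distance, wilcoxon

rng = np.random.default_rng(0)
true_delay = 3.75   # one of the reward delays in the task (seconds)
n_trials = 100

# Hypothetical decoded reward-time samples per trial: a tighter distribution for
# the exponential-fit decoder and a broader one for a control (average model or
# shuffle). The 1-Wasserstein distance is taken against the true delay.
d_fits = [wasserstein_distance([true_delay], rng.normal(true_delay, 0.5, 50))
          for _ in range(n_trials)]
d_ctrl = [wasserstein_distance([true_delay], rng.normal(true_delay, 1.5, 50))
          for _ in range(n_trials)]

# One-tailed Wilcoxon signed-rank test: are the fit-based distances smaller
# than the control distances across paired trials?
stat, p = wilcoxon(d_fits, d_ctrl, alternative="less")
print(f"Wilcoxon signed-rank: statistic={stat:.1f}, one-tailed P={p:.3g}")
```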