Extended Data Fig. 10: Alternative model-agnostic, directly data-derived measures of asymmetric learning correlate with asymmetric learning derived from fitted model parameters.
From: Distributional reinforcement learning in prefrontal cortex

a) Similar to the analysis in Fig. 3e, in which we analysed neural firing rates around transitions in the sign of the RPE (as defined from the best-fitting model), we also analysed firing rates on the first and second trials following the highest reward option and the lowest reward option (that is, analogous to Fig. 3e but where the x-axis is trial number following the onset of consecutive trials of the highest, or lowest, reward level delivered). On these trial types we can be confident that all neurons (regardless of optimism) will have positive and negative RPEs, respectively, because these reward levels are at the extremes of the reward distribution. We observed diversity across the population of neurons in a per-neuron t-score measure, obtained from an unpaired t-test for a difference in the firing rate change between the first and second trial following the highest reward versus the corresponding change following the lowest reward. (Note this is the same measure that we used in the main text to provide a per-neuron measure capturing the asymmetries plotted in Fig. 3e, which we correlated with model-derived asymmetric learning.) These t-scores reflect the per-cell significance in rejecting the null hypothesis that the firing rate change from the first to the second trial after the highest reward equals that after the lowest reward. a shows a histogram of these t-scores and demonstrates significant diversity across the population. b) The t-score measure in a correlates across neurons with asymmetric learning derived from the best-fitting model (R = 0.21, P = 0.044). c) Additionally, we constructed a regression model to capture asymmetries in the effect of the highest vs. lowest reward level delivered on the previous trial on the current trial’s firing rate. This also captures asymmetries in learning.
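For concreteness, the per-neuron t-score in a can be sketched as an unpaired t-test on firing-rate changes. The array names and simulated values below are illustrative assumptions, not the paper's data or code; in the real analysis each entry would be one observed first-to-second-trial firing-rate change for one neuron.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical firing-rate changes (second trial minus first trial) for one
# neuron, after the highest or lowest reward level was delivered.
# Simulated here so the sketch runs: the change is expected to be negative
# after the highest reward (subsequent RPE suppressed) and positive after
# the lowest reward (subsequent RPE boosted).
delta_after_highest = rng.normal(-0.5, 1.0, size=40)
delta_after_lowest = rng.normal(0.5, 1.0, size=35)

# Unpaired t-test: does the firing-rate change after the highest reward
# differ from the change after the lowest reward for this neuron?
t_score, p_value = stats.ttest_ind(delta_after_highest, delta_after_lowest)
```

Repeating this per neuron gives the distribution of t-scores histogrammed in a.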
The regression model was: \(FR=\,{\beta }_{0}+{\beta }_{1}Rew(t)+{\beta }_{2}HighestRew(t-1)+{\beta }_{3}LowestRew(t-1)\), where Rew(t) is the reward on the current trial, HighestRew(t−1) is a binary regressor taking the value 1 if the previous trial delivered the highest reward level and 0 otherwise, and LowestRew(t−1) is similarly a binary regressor taking the value 1 if the previous trial delivered the lowest reward level and 0 otherwise. We then computed a [1 −1] contrast of β3 vs. β2 to capture differences in the effect of the highest vs. lowest reward delivered on the previous trial on the current trial’s firing rate. This value will be more positive if delivery of the highest reward level on the previous trial decreases firing on the current trial more than delivery of the lowest reward level increases it (the pattern expected from an optimistic neuron). Delivery of the highest reward level is expected to decrease firing on the subsequent trial (captured by β2) owing to the learning induced by positive outcomes: it should suppress subsequent RPEs because the value expectation is now higher (the same logic as in Bayer & Glimcher31). Similarly, delivery of the lowest reward level is expected to increase firing on the subsequent trial (captured by β3) owing to the learning induced by negative outcomes: it should increase subsequent RPEs because the value expectation is now lower. The [1 −1] contrast testing β3 vs. β2 captures the relative difference in these effects and is therefore another index of asymmetric learning: optimistic neurons should be more strongly affected by the highest reward level than by the lowest. We found the t-scores of this contrast were also diverse across the population (c), correlated with the other data-driven measure described above in a and b (d; R = 0.44, P < 0.001), and also correlated with asymmetric learning derived from the best-fitting model (e; R = 0.21, P = 0.039).
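The regression and its [1 −1] contrast can be sketched with ordinary least squares. The trial counts, number of reward levels, and coefficient values below are illustrative assumptions used only to make the sketch run; they are not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 500

# Hypothetical trials with 5 reward levels (0..4); level 4 = highest, 0 = lowest.
reward = rng.integers(0, 5, size=n_trials)
highest_prev = np.roll(reward == 4, 1).astype(float)  # HighestRew(t-1)
lowest_prev = np.roll(reward == 0, 1).astype(float)   # LowestRew(t-1)
highest_prev[0] = lowest_prev[0] = 0.0                # first trial has no predecessor

# Simulate an "optimistic-like" neuron: firing is suppressed after the highest
# reward (beta2 = -2) more than it is boosted after the lowest (beta3 = +0.5).
fr = 5.0 + 1.0 * reward - 2.0 * highest_prev + 0.5 * lowest_prev \
     + rng.normal(0.0, 1.0, size=n_trials)

# Design matrix: intercept, Rew(t), HighestRew(t-1), LowestRew(t-1).
X = np.column_stack([np.ones(n_trials), reward, highest_prev, lowest_prev])
beta, *_ = np.linalg.lstsq(X, fr, rcond=None)

# [1 -1] contrast of beta3 vs. beta2, with its t-score from the OLS
# parameter covariance: more positive when the highest reward on the
# previous trial moves firing more than the lowest reward does.
c = np.array([0.0, 0.0, -1.0, 1.0])
resid = fr - X @ beta
sigma2 = resid @ resid / (n_trials - X.shape[1])
cov_beta = sigma2 * np.linalg.inv(X.T @ X)
contrast = c @ beta
t_contrast = contrast / np.sqrt(c @ cov_beta @ c)
```

Repeating this per neuron gives the distribution of contrast t-scores shown in c.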
Combining both of these noisy measures from a and c into a hybrid measure (by averaging the t-scores) yields a summary model-agnostic measure that also correlates with model-derived asymmetric learning (R = 0.25, P = 0.016).
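The hybrid measure and its correlation with the model-derived asymmetry amount to averaging the two per-neuron t-score vectors and computing a Pearson correlation. The per-neuron scores below are simulated with a shared latent signal purely so the sketch runs; they are not the recorded data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_neurons = 100

# Hypothetical per-neuron scores; in the real analysis these come from the
# t-test measure in (a) and the regression-contrast measure in (c).
# Simulated with a shared latent asymmetry signal plus independent noise.
latent_asymmetry = rng.normal(size=n_neurons)
t_scores_a = latent_asymmetry + rng.normal(0.0, 0.5, size=n_neurons)
t_scores_c = latent_asymmetry + rng.normal(0.0, 0.5, size=n_neurons)

# Hybrid model-agnostic measure: the average of the two noisy t-scores.
hybrid = (t_scores_a + t_scores_c) / 2.0

# Model-derived asymmetric learning per neuron (simulated to share the same
# latent signal, standing in for the best-fitting model's fitted asymmetry).
model_asymmetry = latent_asymmetry + rng.normal(0.0, 0.5, size=n_neurons)

r, p = stats.pearsonr(hybrid, model_asymmetry)
```

Averaging is a simple way to pool two noisy estimates of the same quantity; because each t-score's noise is independent, the hybrid has a higher signal-to-noise ratio than either measure alone.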