Fig. 3: Diverse asymmetric learning.
From: Distributional reinforcement learning in prefrontal cortex

a,b, Asymmetric scaling and asymmetric learning are both predictions of distributional RL, but are dissociable. Asymmetric scaling reflects differences in the degree to which positive and negative RPEs are scaled to predict firing rate. Asymmetric learning reflects differences in the rate of state-value update after positive versus negative RPEs (which may or may not be affected by asymmetric scaling). These learning rates are denoted α+ and α−, respectively. \(\delta =r-V\) is the RPE, where r is the reward on the current trial and V is the value. a, Simulated examples demonstrating the difference between asymmetric scaling and asymmetric learning, as governed by the equations in b. The top shows predicted RPEs generated by asymmetric scaling with symmetric learning (equations (iii) and (ii)). In this extreme case, the scaling does not affect learning and the learned value converges on the expectation. The middle and bottom show the converse: RPEs and corresponding values generated by symmetric scaling with asymmetric learning (equations (i) and (iv)). We present them in this way to highlight that asymmetric scaling and asymmetric learning are dissociable phenomena that can be measured separately, not because we predict them to be unrelated. On the contrary, we show in g that they are related. c, Comparing cross-validated model fits revealed that a model with both asymmetric learning and asymmetric scaling (ALAS) best explains the ACC data, and the fully classic (symmetric) model (SLSS) explains it worst. Each bar shows the difference in R² between the pair of models being compared. Error bars denote s.e.m. Significance of the differences was determined by paired, two-sided Student's t-tests over neurons: *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001. d, Example model fits.
Top, RPE regressors generated using learning-rate parameters fitted to individual neuron data, for three different neurons from the same session. Different levels of optimism can be seen in the different rates at which RPEs tend back toward zero after changes in state value (denoted by the dashed black line in the bottom plot). Bottom, this is reflected in the corresponding values. The pessimistic neuron (blue), for example, is quick to devalue but slow to value. e, Example real neuron responses around transitions in the sign of the RPE, from three separate neurons. We used the best-fitting model to define trials on which the RPE switched from negative to positive, or vice versa. We then plotted the mean firing rate on the first trial of the switch and the subsequent trial, and observed asymmetries in the rate of change of the firing rate after the first positive versus negative RPE, as predicted by distributional RL in a. For example, the (pessimistic) neuron on the left changes its firing rate more after negative than after positive RPEs (the slope after negative RPEs is more positive than the slope after positive RPEs is negative), indicating that it has learnt more from negative than from positive RPEs. The converse pattern holds for the (optimistic) neuron on the right. Error bars denote s.e.m. f, The per-neuron asymmetry in learning derived from the model, defined as \({\alpha }^{+}/({\alpha }^{+}+{\alpha }^{-})\), estimated in one half of the data predicted the same quantity estimated in the other half (R = 0.62, P = 0.0001), demonstrating consistent diversity in asymmetric learning across the population of neurons, as predicted by distributional RL.
g, Asymmetric learning and asymmetric scaling were positively correlated, consistent with the theoretical proposal that asymmetric scaling drives asymmetric learning (R = 0.35, P = 0.04 for the correlation between asymmetric learning estimated in the first data partition and asymmetric scaling estimated in the second; R = 0.38, P = 0.03 for the converse correlation; averaging across partitions: mean R = 0.37, geometric mean P = 0.03).
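The dissociation sketched in a and b can be illustrated with a minimal simulation. The update and scaling forms below are our illustrative assumptions, not the paper's exact equations (i)–(iv): asymmetric scaling multiplies positive versus negative RPEs by different gains before read-out, while asymmetric learning applies different rates α+ and α− to the value update.

```python
def simulate(rewards, alpha_pos, alpha_neg, scale_pos=1.0, scale_neg=1.0, v=0.0):
    """Sketch of asymmetric scaling vs asymmetric learning (assumed forms).
    Returns the trajectory of learned values and of scaled RPEs."""
    values, scaled_rpes = [], []
    for r in rewards:
        delta = r - v                           # RPE: delta = r - V
        # asymmetric scaling: +/- RPEs scaled by different gains at read-out
        scaled_rpes.append(scale_pos * delta if delta > 0 else scale_neg * delta)
        # asymmetric learning: different update rates after +/- RPEs
        v += (alpha_pos if delta > 0 else alpha_neg) * delta
        values.append(v)
    return values, scaled_rpes

rewards = [1, 0] * 2000                         # 50% reward rate
sym, _ = simulate(rewards, alpha_pos=0.1, alpha_neg=0.1)     # symmetric learner
asym, _ = simulate(rewards, alpha_pos=0.2, alpha_neg=0.05)   # optimistic learner
```

With symmetric learning the value settles near the expectation (0.5) regardless of scaling, whereas the optimistic learner settles near α+/(α+ + α−) = 0.8, an expectile-like fixed point above the mean, mirroring the diversity of learned values in a.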