Fig. 3: Potential mechanisms for asymmetric learning.

a Schematic of Mechanism 1. Increases or decreases in baseline dopamine modulate the degree to which bursts and pauses in dopamine cause changes in D1R and D2R occupancy. Increases in baseline dopamine make dopamine pauses cause greater decreases in D2R occupancy than the increases in D1R occupancy caused by dopamine bursts. Conversely, decreases in baseline dopamine make dopamine bursts cause smaller increases in D1R occupancy than the decreases in D2R occupancy caused by dopamine pauses. b Schematic of the change in D1R and D2R occupancy for a given transient increase or decrease in dopamine, caused by a firing rate ‘burst’ or ‘pause’, respectively. The slope is modulated by the baseline dopamine (colormap) and is equivalent to the receptor’s sensitivity to dopamine transients (\({f}^{{{{\rm{D}}}}1}\) and \({f}^{{{{\rm{D}}}}2}\) in Eq. 9). Here, \({\sigma }^{{{{\rm{D}}}}1}\) and \({\sigma }^{{{{\rm{D}}}}2}\) correspond to the receptors’ dose-occupancy curves. The receptor sensitivities (\({f}^{{{{\rm{D}}}}1}\) and \({f}^{{{{\rm{D}}}}2}\)) act as asymmetric learning rates in our model (\({\alpha }^{+}\) and \({\alpha }^{-}\)). c Receptor sensitivity for D1R and D2R as a function of baseline dopamine. d Asymmetric scaling factor (\(\tau\)) as a function of baseline dopamine. Colors depict how ‘optimistic’ or ‘pessimistic’ the convergent value estimate will be when learning with a given \(\tau\). e Schematic of Mechanism 2. Left, the relationship between dopamine reward responses (spikes/s, denoted by \({\psi }^{+}\) and \({\psi }^{-}\) for dopamine bursts and pauses, respectively) and RPEs. The slopes of these response functions correspond to the asymmetric learning rates (\({\alpha }^{+}\), \({\alpha }^{-}\)) for positive and negative RPEs, respectively. Colors depict how optimistic or pessimistic the convergent value estimate will be when learning with a given asymmetric scaling factor. Source data are provided in ‘source_data/figure_3’.
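
The relationship among panels b to d can be illustrated with a minimal numerical sketch. It is not the model's implementation: the Hill-type form of the dose-occupancy curves, the EC50 values, the choice to take the slope on a linear concentration axis, and the definition \(\tau = {\alpha }^{+}/({\alpha }^{+}+{\alpha }^{-})\) are all assumptions made here for illustration, and Eq. 9 is not reproduced. All function and constant names are hypothetical.

```python
# Illustrative sketch only: curve shape, EC50 values, and the tau formula
# are assumptions, not parameters or equations taken from the paper.

def hill_occupancy(dopamine, ec50, hill=1.0):
    """Fraction of receptors occupied at a dopamine concentration (assumed Hill form)."""
    return dopamine**hill / (dopamine**hill + ec50**hill)


def sensitivity(occupancy_fn, baseline, delta=1e-4):
    """Central-difference slope of a dose-occupancy curve at the baseline concentration."""
    return (occupancy_fn(baseline + delta) - occupancy_fn(baseline - delta)) / (2 * delta)


def asymmetric_scaling_factor(alpha_plus, alpha_minus):
    """tau = alpha+ / (alpha+ + alpha-); 0.5 means balanced learning (assumed definition)."""
    return alpha_plus / (alpha_plus + alpha_minus)


# Hypothetical EC50 values in nM: D1R low affinity, D2R high affinity.
EC50_D1 = 1000.0
EC50_D2 = 10.0


def tau_at_baseline(baseline):
    """Treat the D1R and D2R sensitivities at baseline as alpha+ and alpha-."""
    f_d1 = sensitivity(lambda da: hill_occupancy(da, EC50_D1), baseline)
    f_d2 = sensitivity(lambda da: hill_occupancy(da, EC50_D2), baseline)
    return asymmetric_scaling_factor(f_d1, f_d2)


if __name__ == "__main__":
    for baseline in (20.0, 50.0, 100.0):
        print(baseline, tau_at_baseline(baseline))
```

The direction and size of the resulting bias depend on the assumed curve shapes and on whether slopes are taken against linear or logarithmic concentration, so the sketch shows only how the quantities are linked, not the values plotted in the figure.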