Fig. 5: Striatal dopamine tracks the expected reward value of the multi-trial POMDP RL model. | Nature Communications

From: Temporal regularities shape perceptual decisions and striatal dopamine signals

a Schematic of fiber photometry in the dorsolateral striatum (DLS), imaging dopamine release using ultra-fast dopamine sensors (GRABDA2m). b Psychometric curves of mice (n = 6) during the dopamine recording experiment. Gray lines show individual mice, whereas black data points show the group average. Error bars in all panels depict SEMs. c Trial-by-trial dopamine responses from all sessions of an example animal, aligned to stimulus onset (white dashed line) and sorted by trial type (left column) and outcome time (black dots). d Group-average dopamine response (n = 6 mice in d, e, f, and i), aligned to stimulus onset (gray dashed line), split by stimulus contrast (gray to black; correct trials only). Gray shaded area indicates the stimulus time period over which we averaged stimulus responses (e and i; excluding time points after reward delivery). e Average stimulus-evoked dopamine responses as a function of current absolute contrast (rewarded trials only; averaged over gray shaded area in (d)). f History kernel of the probabilistic choice model fit to mouse data (solid line) and the predicted history kernel of the multi-trial POMDP RL model (dashed line). Mice exhibit a higher 2-back than 1-back choice weight (inset, one-sided t-test, t(5) = −2.73, p = 0.02). Shaded region depicts SEMs. g Expected reward value Q (black) of the multi-trial POMDP RL model as a function of current contrast (absolute value, i.e., independent of its L or R position), shown separately for trials in which the current stimulus is on the same side (repeat, blue) or the opposite side (alternate, orange) as the 2-back stimulus (current and previous rewarded trials only). Q reflects the expected value before the choice, computed by summing Q_L and Q_R weighted by the probabilities of making a left and a right choice, respectively. For Q_C, the expected value after the choice, see Supplementary Fig. 13.
h Difference in Q between repetitions and alternations of stimulus side (ΔQ) as a function of n-back trial (current and previous rewarded trials only). The single-trial (blue) and multi-trial models (pink) make opposite predictions about the difference between 1- and 2-back trials. While the single-trial model predicts a higher ΔDA for the 1-back than for the 2-back trial, the multi-trial model predicts a higher ΔDA for the 2-back than for the 1-back trial. i Difference in stimulus-evoked dopamine responses between repetitions and alternations of stimulus side (ΔDA) as a function of n-back trial (current and previous rewarded trials only). Mice exhibit a higher 2-back than 1-back ΔDA (inset, two-sided t-test, t(5) = 3.51, p = 0.017). The 1- and 2-back ΔDA are each not significantly different from zero (n.s., two-sided t-tests). Source data are provided as a Source Data file.
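
The pre-choice value Q described for panel g, and the repeat-minus-alternate difference ΔQ in panel h, can be written out as a minimal sketch. The function names, argument names, and the numbers in the usage lines are hypothetical and chosen only for illustration; they are not taken from the authors' model code or data.

```python
def expected_value_before_choice(q_left, q_right, p_left, p_right):
    """Pre-choice expected value: Q_L and Q_R weighted by the choice probabilities.

    All four arguments are illustrative inputs (assumed names), not values
    from the paper.
    """
    for p in (p_left, p_right):
        if not 0.0 <= p <= 1.0:
            raise ValueError("choice probabilities must lie in [0, 1]")
    return p_left * q_left + p_right * q_right


def delta_q(q_repeat, q_alternate):
    """Difference in Q between stimulus-side repetitions and alternations."""
    return q_repeat - q_alternate


# Made-up example values, for illustration only.
q_rep = expected_value_before_choice(q_left=1.0, q_right=0.0, p_left=0.75, p_right=0.25)
q_alt = expected_value_before_choice(q_left=1.0, q_right=0.0, p_left=0.5, p_right=0.5)
print(q_rep, q_alt, delta_q(q_rep, q_alt))
```

In this sketch a positive ΔQ means the model assigns a higher pre-choice value when the current stimulus side repeats the n-back side than when it alternates.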
