Abstract
Goal-directed navigation requires animals to continuously evaluate their current direction and speed of travel relative to landmarks to discern whether they are approaching or deviating from their goal. Striatal dopamine release signals the reward-predictive value of cues1,2, probably contributing to motivation3,4, but it is unclear how dopamine incorporates an animal’s ongoing trajectory for effective behavioural guidance. Here we demonstrate that cue-evoked striatal dopamine release in mice encodes bidirectional trajectory errors reflecting the relationship between the speed and direction of ongoing movement relative to optimal goal trajectories. Trajectory error signals could be computed from locomotion or visual flow, and were independent from simultaneous dopamine increases reflecting learned cue value. Joint trajectory error and cue-value encoding were reproduced by the reward prediction error term in a standard reinforcement learning algorithm with mixed sensorimotor inputs. However, these two signals had distinct state space requirements, suggesting that they could arise from a common reinforcement learning algorithm with distinct neural inputs. Striatum-wide multifibre array measurements resolved overlapping, yet temporally and anatomically separable, representations of trajectory error and cue value, indicating how functionally distinct dopamine signals for motivation and guidance are multiplexed across striatal regions to facilitate goal-directed behaviour.
Data availability
All processed fluorescence, behavioural and fibre localization data have been deposited at Zenodo65 (https://doi.org/10.5281/zenodo.17653000). Additional data are available on reasonable request from the corresponding author.
Code availability
All code used in this study for key analyses was written in MATLAB or Python and is available at Zenodo65 (https://doi.org/10.5281/zenodo.17653000).
References
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Schultz, W. Multiple dopamine functions at different time courses. Annu. Rev. Neurosci. 30, 259–288 (2007).
Berridge, K. C. The debate over dopamine’s role in reward: the case for incentive salience. Psychopharmacology 191, 391–431 (2007).
Wise, R. A. Dopamine, learning and motivation. Nat. Rev. Neurosci. 5, 483–494 (2004).
Chersi, F. & Burgess, N. The cognitive architecture of spatial navigation: hippocampal and striatal contributions. Neuron 88, 64–77 (2015).
Joel, D., Niv, Y. & Ruppin, E. Actor–critic models of the basal ganglia: new anatomical and computational perspectives. Neural Netw. 15, 535–547 (2002).
Montague, P., Dayan, P. & Sejnowski, T. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).
Tobler, P. N., Fiorillo, C. D. & Schultz, W. Adaptive coding of reward value by dopamine neurons. Science 307, 1642–1645 (2005).
Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
Gerfen, C. R. & Surmeier, D. J. Modulation of striatal projection systems by dopamine. Annu. Rev. Neurosci. 34, 441–466 (2011).
Saunders, B. T., Richard, J. M., Margolis, E. B. & Janak, P. H. Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072–1083 (2018).
Lee, R. S., Sagiv, Y., Engelhard, B., Witten, I. B. & Daw, N. D. A feature-specific prediction error model explains dopaminergic heterogeneity. Nat. Neurosci. https://doi.org/10.1038/s41593-024-01689-1 (2024).
Gershman, S. J. & Daw, N. D. Reinforcement learning and episodic memory in humans and animals: an integrative framework. Annu. Rev. Psychol. 68, 101–128 (2017).
Hamid, A. A. et al. Mesolimbic dopamine signals the value of work. Nat. Neurosci. 19, 117–126 (2016).
Howe, M. W., Tierney, P. L., Sandberg, S. G., Phillips, P. E. M. & Graybiel, A. M. Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature 500, 575–579 (2013).
Kim, H. R. et al. A unified framework for dopamine signals across timescales. Cell 183, 1600–1616 (2020).
Krausz, T. A. et al. Dual credit assignment processes underlie dopamine signals in a complex spatial environment. Neuron 111, 3465–3478 (2023).
Farrell, K., Lak, A. & Saleem, A. B. Midbrain dopamine neurons signal phasic and ramping reward prediction error during goal-directed navigation. Cell Rep. 41, 111470 (2022).
Engelhard, B. et al. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature 570, 509–513 (2019).
Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).
Sousa, M. et al. A multidimensional distributional map of future reward in dopamine neurons. Nature 642, 691–699 (2025).
Masset, P. et al. Multi-timescale reinforcement learning in the brain. Nature 642, 682–690 (2025).
Dombeck, D. A., Harvey, C. D., Tian, L., Looger, L. L. & Tank, D. W. Functional imaging of hippocampal place cells at cellular resolution during virtual navigation. Nat. Neurosci. 13, 1433–1440 (2010).
Collins, A. L. & Saunders, B. T. Heterogeneity in striatal dopamine circuits: form and function in dynamic reward seeking. J. Neurosci. Res. 98, 1046–1069 (2020).
Vu, M.-A. T. et al. Targeted micro-fiber arrays for measuring and manipulating localized multi-scale neural dynamics over large, deep brain volumes during behavior. Neuron 112, 909–923 (2024).
Patriarchi, T. et al. Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors. Science 360, eaat4422 (2018).
Mohebi, A., Wei, W., Pelattini, L., Kim, K. & Berke, J. D. Dopamine transients follow a striatal gradient of reward time horizons. Nat. Neurosci. 27, 737–746 (2024).
Jørgensen, S. H. et al. Behavioral encoding across timescales by region-specific dopamine dynamics. Proc. Natl Acad. Sci. USA 120, e2215230120 (2023).
Morris, G., Nevet, A., Arkadir, D., Vaadia, E. & Bergman, H. Midbrain dopamine neurons encode decisions for future action. Nat. Neurosci. 9, 1057–1063 (2006).
Howe, M. W. & Dombeck, D. A. Rapid signalling in distinct dopaminergic axons during locomotion and reward. Nature 535, 505–510 (2016).
Dodson, P. D. et al. Representation of spontaneous movement by dopaminergic neurons is cell-type selective and disrupted in parkinsonism. Proc. Natl Acad. Sci. USA 113, E2180–E2188 (2016).
Da Silva, J. A., Tecuapetla, F., Paixão, V. & Costa, R. M. Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554, 244–248 (2018).
Sutton, R. S. Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988).
Lindsay, G. W., Rigotti, M., Warden, M. R., Miller, E. K. & Fusi, S. Hebbian learning in a random network captures selectivity properties of the prefrontal cortex. J. Neurosci. 37, 11021–11036 (2017).
Roy, N. A., Bak, J. H., Akrami, A., Brody, C. D. & Pillow, J. W. Extracting the dynamics of behavior in sensory decision-making experiments. Neuron 109, 597–610 (2021).
Graybiel, A. M. Habits, rituals, and the evaluative brain. Annu. Rev. Neurosci. 31, 359–387 (2008).
Yin, H. H. & Knowlton, B. J. The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7, 464–476 (2006).
Shadmehr, R., Smith, M. A. & Krakauer, J. W. Error correction, sensory prediction, and adaptation in motor control. Annu. Rev. Neurosci. 33, 89–108 (2010).
Gadagkar, V. et al. Dopamine neurons encode performance error in singing birds. Science 354, 1278–1282 (2016).
Greenstreet, F. et al. Dopaminergic action prediction errors serve as a value-free teaching signal. Nature https://doi.org/10.1038/s41586-025-09008-9 (2025).
Green, J. et al. A cell-type-specific error-correction signal in the posterior parietal cortex. Nature 620, 366–373 (2023).
He, Q., Liu, J. L., Eschapasse, L., Beveridge, E. H. & Brown, T. I. A comparison of reinforcement learning models of human spatial navigation. Sci. Rep. 12, 13923 (2022).
Foster, D. J., Morris, R. G. & Dayan, P. A model of hippocampally dependent navigation, using the temporal difference learning rule. Hippocampus 10, 1–16 (2000).
Watabe-Uchida, M., Zhu, L., Ogawa, S. K., Vamanrao, A. & Uchida, N. Whole-brain mapping of direct inputs to midbrain dopamine neurons. Neuron 74, 858–873 (2012).
Jeong, H. et al. Mesolimbic dopamine release conveys causal associations. Science 378, eabq6740 (2022).
Matsumoto, M. & Hikosaka, O. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 459, 837–841 (2009).
Matsuda, W. et al. Single nigrostriatal dopaminergic neurons form widely spread and highly dense axonal arborizations in the neostriatum. J. Neurosci. 29, 444–453 (2009).
Menegas, W., Babayan, B. M., Uchida, N. & Watabe-Uchida, M. Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice. eLife 6, e21886 (2017).
Lerner, T. N. et al. Intact-brain analyses reveal distinct information carried by SNc dopamine subcircuits. Cell 162, 635–647 (2015).
De Jong, J. W. et al. A neural circuit mechanism for encoding aversive stimuli in the mesolimbic dopamine system. Neuron 101, 133–151 (2019).
Van Elzelingen, W. et al. A unidirectional but not uniform striatal landscape of dopamine signaling for motivational stimuli. Proc. Natl Acad. Sci. USA 119, e2117270119 (2022).
Aragona, B. J. et al. Regional specificity in the real-time development of phasic dopamine transmission patterns during acquisition of a cue-cocaine association in rats. Eur. J. Neurosci. 30, 1889–1899 (2009).
Tsutsui-Kimura, I. et al. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task. eLife 9, e62390 (2020).
Barter, J. W. et al. Beyond reward prediction errors: the role of dopamine in movement kinematics. Front. Integr. Neurosci. 9, 39 (2015).
Coddington, L. T. & Dudman, J. T. Learning from action: reconsidering movement signaling in midbrain dopamine neuron activity. Neuron 104, 63–77 (2019).
Long, C. et al. Constraints on the subsecond modulation of striatal dynamics by physiological dopamine signaling. Nat. Neurosci. https://doi.org/10.1038/s41593-024-01699-z (2024).
Markowitz, J. E. et al. Spontaneous behaviour is structured by reinforcement without explicit reward. Nature 614, 108–117 (2023).
Lahiri, A. K. & Bevan, M. D. Dopaminergic transmission rapidly and persistently enhances excitability of D1 receptor-expressing striatal projection neurons. Neuron 106, 277–290 (2020).
Bouabid, S. et al. Distinct spatially organized striatum-wide acetylcholine dynamics for the learning and extinction of Pavlovian associations. Nat. Commun. 16, 5169 (2025).
Ragozzino, M. E., Mohler, E. G., Prior, M., Palencia, C. A. & Rozman, S. Acetylcholine activity in selective striatal regions supports behavioral flexibility. Neurobiol. Learn. Mem. 91, 13–22 (2009).
Claudi, F. Mouse top detailed. Zenodo https://doi.org/10.5281/zenodo.3925997 (2020).
Wang, Q. et al. The Mouse Brain Common Coordinate Framework: a 3D reference atlas. Cell 181, 936–953 (2020).
Aronov, D. & Tank, D. W. Engagement of neural circuits underlying 2D spatial navigation in a rodent virtual reality system. Neuron 84, 442–456 (2014).
Mardia, K. V. & Jupp, P. E. Directional Statistics (Wiley, 1999); https://doi.org/10.1002/9780470316979.
Brown, E. H. et al. Data and code for ‘Striatum-wide dopamine encodes trajectory errors separated from value’. Zenodo https://doi.org/10.5281/zenodo.17653000 (2026).
Acknowledgements
This work was supported by a Klingenstein-Simons Foundation fellowship, a Whitehall Foundation fellowship and National Institute of Mental Health grant R01 MH125835 to M.W.H.; NIMH F31NS127536-01A1 to E.H.B.; NIMH F32MH120894 to M.-A.V.; and a JSTPN Early Stage Training in Neuroscience Award to Y.Z. We thank the staff at Boston University Centers for Neurophotonics and Systems Neuroscience for financial and technical support; the members of the Micro CT core, especially S. Holder, for providing equipment and technical expertise for micro-CT scanning; the staff at the Boston University Animal Science Center for providing central laboratory and animal care and support resources; L. Tian and Y. Li for providing dLight1.3b and Ach3.0 viral vectors; and M. Valyear, C. Stern and M. Hasselmo for feedback on a draft of the manuscript.
Author information
Contributions
E.H.B. and M.W.H. conceived the project, designed experiments, interpreted the data and wrote the manuscript. E.H.B. performed experiments and conducted formal statistical analyses. Y.Z. and B.D. developed and wrote code for the learning analyses, with input from E.H.B. and M.W.H.; M.-A.V., S.B., E.H.B. and M.W.H. contributed to development and validation of the microfibre array method. J.L. and A.L.-K. conceived and wrote initial code for the tdRL model approach with contributions from E.H.B.; A.A. and C.G.-N. contributed to mouse behaviour training and fibre array construction.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Patricia Janak, Alexxai Kravitz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Locations of all fibers across individual mice.
Atlas-registered locations of all fibers (red circles) included in the analysis for each of the 10 mice, plotted in the coronal and horizontal planes relative to the atlas common coordinate framework (see Methods). Each coronal plane represents a slice with a 1 mm thickness. Numbers beneath each coronal section are AP coordinates of the most anterior section of the slice, in mm relative to bregma. A few fibers appear outside of the striatal boundaries (i.e. in mice 7, 8 and 9), due to slight deviations of the actual striatum boundaries of individual mice, as determined by the CT image, from the atlas striatum boundaries. Brain schematics were adapted from the Allen Mouse Brain Common Coordinate Framework (CCFv3) (https://atlas.brain-map.org/).
Extended Data Fig. 2 Signal-to-noise ratios for determining recording site inclusion.
a, Power spectral density (PSD) plots for example sites with high (included sites, top) and low (excluded sites, bottom) signal-to-noise ratios (SNRs). Horizontal line indicates the noise power density, estimated as the average power density within the 10–14 Hz frequency band (see Methods). b, Average baseline-corrected ΔF/F aligned to cue onset on all congruent trials for the same example fibers as in a. c, Peak cue-evoked ΔF/F as a function of log SNR; each data point represents one recording site (n = 638 sites across 10 mice). The vertical line marks the point where the signal is equal to the noise (SNR = 1). Recording sites to the left of the vertical line (SNR < 1) were excluded from subsequent analyses. d, Violin plots showing the log-transformed SNR for sites classified (via micro-CT localization) as within (n = 356 sites) or outside of (n = 182 sites) the striatum. SNR > 1 in blue, SNR < 1 in red. Thin lines represent average SNR for individual mice. Error bar shows mean and 95% confidence interval estimated from linear mixed effects models (in striatum: t355 = 9.60, p = 4.33 × 10−19, n = 356 sites, 10 mice; out striatum: t181 = 1.35, p = 0.53, n = 182 sites, 10 mice, two-sided t-test on model intercepts). Effect of recording site classification on SNR estimated from a linear mixed effects model, two-sided t-test, t636 = 8.62, p = 1.62 × 10−16, n = 638 sites, 10 mice. P-values were post hoc Bonferroni corrected for three comparisons.
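The inclusion rule described in panels a and c can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a PSD has already been computed for each site, takes the noise floor as the mean power density in the 10–14 Hz band as stated in the legend, and uses a hypothetical low-frequency signal band (0.5–4 Hz) because the actual signal definition is given only in Methods. All function names are illustrative.

```python
from statistics import mean

def band_mean(freqs, psd, lo, hi):
    """Mean power density over PSD samples whose frequency lies in [lo, hi] Hz."""
    vals = [p for f, p in zip(freqs, psd) if lo <= f <= hi]
    if not vals:
        raise ValueError("no PSD samples in the requested band")
    return mean(vals)

def snr_from_psd(freqs, psd, noise_band=(10.0, 14.0), signal_band=(0.5, 4.0)):
    """Ratio of signal-band to noise-band power density.

    noise_band follows the legend (10-14 Hz); signal_band is an ASSUMPTION
    made for illustration only.
    """
    noise = band_mean(freqs, psd, *noise_band)
    signal = band_mean(freqs, psd, *signal_band)
    return signal / noise

def include_site(snr):
    """Sites with SNR < 1 are excluded, as in the legend."""
    return snr >= 1.0
```

With a toy spectrum that is flat at 10 below 4 Hz and flat at 1 above it, `snr_from_psd` returns 10 and the site is kept.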
Extended Data Fig. 3 Trajectory error encoding is present across trial types and is not explained by encoding of other task variables.
a, Violin plots of the percent change in the Akaike information criterion (AIC) when each variable is removed from the model (removed variable indicated above each plot). Negative values indicate worse model fit when the variable is removed (see Methods for variable definitions). Reward history: t630 = 4.53, p = 2.85 × 10−5; upcoming reward: t630 = 6.64, p = 2.71 × 10−10; cue identity: t630 = 3.80, p = 6.36 × 10−4; locomotion: t630 = 5.42, p = 3.45 × 10−7, two-sided t-test on linear mixed effects model for main effect of model type on AIC, p-values Bonferroni corrected for 4 comparisons (n = 316 sites, 10 mice). Each data point represents one site. b, TE coefficients from models with (grey) and without (coloured) each variable, averaged across sites and mice using a linear mixed effects model (df = 315, n = 316 sites, 10 mice). c, Violin plots of the peak TE coefficient magnitudes across recording sites in models with (grey) and without (coloured) each variable. TE coefficients were unaffected by exclusion of each variable (except cue identity). Reward history: t630 = 0.74, p = 1.0; upcoming reward: t630 = −0.65, p = 1.0; cue identity: t630 = −2.69, p = 0.30; locomotion: t630 = −1.02, p = 1.0; two-sided t-test on main effect of model type on TE coefficient magnitude in a linear mixed effects model (n = 316 sites, 10 mice), Bonferroni corrected for 4 comparisons. d, Percent change in AIC comparing the full model to a model with the TE term replaced with a binary congruence only term (no scaling with angular speed), presented as in a. t630 = 4.52, p = 7.47 × 10−6, n = 316 sites, 10 mice, two-sided t-test on effect of model type in a linear mixed effects model. e, Model coefficient t-statistics for TE (orange) and congruence only (grey) averaged across recording sites and mice using a linear mixed effects model (df = 315, n = 316 sites, 10 mice).
f, Violin plot showing the maximum coefficient t-statistics for TE and congruence coefficients in the full model and congruence only model respectively across recording sites and mice. t630 = −4.24, p = 2.59 × 10−5, n = 316 sites, 10 mice, two-sided t-test on effect of model type on coefficient t-statistic in a linear mixed effects model. g, TE coefficients averaged across sites and mice using a linear mixed effects model for trials split by cue identity (left, df = 315, n = 316 sites, 10 mice), initial running direction (middle, df = 233, n = 234 sites, 7 mice, only mice with >30 trials in each direction were included), and congruence (right, df = 315, n = 316 sites, 10 mice) independently. h, Violin plots comparing TE coefficients computed on trials split by cue identity (left), initial rotational velocity direction (middle), and congruence (right). Cue identity: t630 = −1.23, p = 0.66, n = 316 sites, 10 mice; rotational velocity direction: t466 = −0.65, p = 1.00, n = 234 sites, 7 mice; congruence: t630 = −0.24, p = 1.00, n = 316 sites, 10 mice, two-sided t-test on effect of trial type on maximum TE coefficient in a linear mixed effects model, followed by a Bonferroni correction for three comparisons. Shaded regions and error bars in all plots are 95% confidence intervals. Thin lines in violin plots and cue aligned averages represent averages for individual mice. For box plots in a,c,d,f,h, the centre point is the median, the lower and upper bounds are the first and third quartiles, and the whiskers extend from the box bounds to the most extreme value no further than 1.5 × interquartile range from the bounds.
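The model comparison in panels a and d rests on a percent change in AIC between a full and a reduced model. A minimal sketch is given below, assuming the standard definition AIC = 2k − 2 ln L and a sign convention chosen so that a worse reduced model yields a negative value, as the legend describes. The exact normalization used by the authors is in Methods; the function names and the division by the full-model AIC are assumptions made for illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

def percent_change_aic(aic_full, aic_reduced):
    """Percent change in AIC when a variable is removed.

    ASSUMED convention: negative when the reduced model has a higher
    (worse) AIC than the full model, matching the legend's description.
    """
    return 100.0 * (aic_full - aic_reduced) / abs(aic_full)

# Hypothetical log-likelihoods for a full model and one with a regressor removed.
full = aic(log_likelihood=-100.0, n_params=5)      # 210.0
reduced = aic(log_likelihood=-120.0, n_params=4)   # 248.0
change = percent_change_aic(full, reduced)         # negative: fit got worse
```

Here the reduced model loses more likelihood than it gains from the smaller parameter penalty, so the change is negative.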
Extended Data Fig. 4 Relationships of dopamine signaling with task-independent locomotion cannot account for trajectory error encoding.
a, Average acceleration coefficients during running in directions contralateral (left) and ipsilateral (right) to the implant in a model describing ΔF/F as a function of continuous acceleration at different time lags during the intertrial interval (see Methods). Acceleration was calculated from the absolute value of the angular velocity for contralateral and ipsilateral running periods independently. Thin lines represent averages for each mouse; thick black line, average across mice (n = 193 sites, 6 mice). b, Left, average signed angular acceleration (+contralateral, −ipsilateral) aligned to cue onset on congruent (green) and incongruent (purple) contralateral cue trials, split into tertiles (dark to light colours indicate fast to slow) for one mouse. Accelerations on incongruent trials are oppositely signed for ipsilateral and contralateral trials, but TE coefficients are positive for both trial types (Extended Data Fig. 3g). c, Average absolute value of angular treadmill velocity aligned to cue onset across all trials (left) and trials where running was contralateral (middle) and ipsilateral (right) to the implant at cue onset during a non-directional, Pavlovian delay conditioning task (top left inset, see Methods). Trial averages are split into tertiles within 0.3 s before cue onset (dark to light colours indicate fast to slow trials). d, Average cue onset aligned ΔF/F for a single recording site on the trial types indicated in c. e, Violin plots showing maximum and minimum angular velocity coefficients across all sites for the trial types in c. For the box plot, the centre point is the median, the lower and upper bounds are the first and third quartiles, and the whiskers extend from the box bounds to the most extreme value no further than 1.5 × interquartile range from the bounds. Each data point represents a single site.
Inset text, proportion of fibers significant (two-sided t-test on model coefficients, p < 0.05 for 3 timepoints in a row within 1 s cue window, Bonferroni corrected). Shaded regions in all plots, 95% confidence intervals.
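The significance criterion in the inset text (p < 0.05 at three consecutive timepoints within the cue window, after Bonferroni correction) can be sketched as below. This is an illustrative reading of the legend, not the authors' implementation; in particular, the number of comparisons used for the correction is not stated here, so `n_comparisons` is a hypothetical parameter.

```python
def bonferroni(p_values, n_comparisons):
    """Bonferroni-adjusted p-values, capped at 1.0."""
    return [min(1.0, p * n_comparisons) for p in p_values]

def has_significant_run(p_values, alpha=0.05, min_run=3):
    """True if at least `min_run` consecutive p-values fall below `alpha`."""
    run = 0
    for p in p_values:
        run = run + 1 if p < alpha else 0
        if run >= min_run:
            return True
    return False

def site_is_significant(p_values_in_window, n_comparisons, alpha=0.05, min_run=3):
    """Apply the correction, then the consecutive-timepoint rule.

    p_values_in_window: per-timepoint p-values within the 1 s cue window.
    n_comparisons: ASSUMED correction factor (not specified in the legend).
    """
    adjusted = bonferroni(p_values_in_window, n_comparisons)
    return has_significant_run(adjusted, alpha=alpha, min_run=min_run)
```

Requiring a run of consecutive timepoints guards against isolated threshold crossings in a noisy coefficient time course.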
Extended Data Fig. 5 Trajectory error encoding can be based independently on visual flow and locomotion.
a, Schematic of the visual only task (vis-only) in which the associations between cue identity and rewarded cue position were the same as in the instrumental (loc + vis task, Fig. 1), but the mouse angular velocity was decoupled from movement of the visual cue. Cue direction and velocity on each trial were yoked to congruent and incongruent trials performed by separate mice (see Methods). b, Average lick indices on rewarded (blue) and unrewarded (red) trials for each training day for one example mouse showing successful discrimination of cue-direction associations after learning (n = 88, 92, 91, 69, 89, 91, 92, 89, 95 rewarded trials on days 1-9, respectively; n = 38, 39, 35, 26, 39, 39, 40, 38, 38 unrewarded trials on days 1-9, respectively.) c, Average lick indices across trials and days for rewarded (blue) and unrewarded (red) trials across mice. t3 = 72.50, p = 5.78 × 10−6, n = 4 mice, two-sided paired t-test on the effect of trial type on lick index. d, Top, average congruence-signed cue angular velocity (+congruent, −incongruent) aligned to cue onset on congruent (green) and incongruent (purple) trials, split into thirds by cue velocity magnitude for one example mouse. Bottom, same as top but with congruence signed relative to the angular mouse locomotion velocity at cue onset based on cue-direction contingencies in the loc + vis instrumental task. e, Same as d but for mouse angular velocity. Note the decoupling of cue and mouse velocity. f, Average cue aligned ΔF/F for an example recording site on congruent (green) and incongruent (purple) trials based on the cue (top) or mouse (bottom) congruence-signed angular velocity, split into thirds by magnitude, as in d and e respectively. g, Top, Trajectory error coefficients aligned to cue onset based on the congruence-signed angular cue (blue) and locomotion (green) velocity averaged across sites and mice with a linear mixed effects model. 
Bottom, violin plot of the maximum visual and locomotion based TE coefficients (t-stats) for all fibers and mice. t230 = −8.93, p = 1.4 × 10−16, n = 116 sites, 4 mice, two-sided t-test on main effect of TE type in a linear mixed effects model. h, Top, trajectory error coefficients aligned to cue onset based on congruence-signed angular velocity of the cue in the vis-only task (blue) and the loc + vis task (orange) averaged across sites and mice with a linear mixed effects model. Bottom, violin plot of the maximum TE coefficients (t-stats) for all sites and mice. t230 = −0.001, p = 0.999, two-sided t-test on main effect of task type in a linear mixed effects model, n = 116 sites, 4 mice. i, Schematic of the locomotion only task (loc-only) in which task rules were the same as loc + vis but the visual cue remained stationary at the centre of the screens. j, Average reward rates in the loc-only (pink) and loc + vis tasks (orange), for sessions above criterion. Mouse 1: χ2 = 1.36, p = 0.49, n = 762 and 535 trials for loc + vis and loc-only, respectively; mouse 2: χ2 = 2.64, p = 0.21, n = 792 and 786 trials for loc + vis and loc-only, respectively; chi-squared test for comparing two proportions, p-values Bonferroni corrected for 2 mice. k, Histogram of times to reward for each task. l, Average congruence-signed mouse angular velocity (left) and ΔF/F for a representative fiber (right) aligned to cue onset for congruent (green) and incongruent (purple) trials, split into thirds by magnitude, during the loc-only task. m, Trajectory error coefficients for the loc-only (pink) and loc + vis tasks (orange) averaged across fibers and mice (n = 79 sites, 2 mice). n, Violin plot of the max loc-only (pink) and loc + vis (orange) TE coefficients for all fibers and mice. t156 = 1.70, p = 0.093, two-sided t-test on main effect of task type in a linear mixed-effects model, n = 79 sites, 2 mice. Shaded regions and error bars in all plots are 95% confidence intervals.
Thin lines in all plots are averages for individual mice. For the box plots in g,h,n, the centre point is the median, the lower and upper bounds are the first and third quartiles, and the whiskers extend from the box bounds to the most extreme value no further than 1.5 × interquartile range from the bounds.
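The congruence signing (+congruent, −incongruent) and the split into thirds by magnitude used in panels d–f and l can be sketched as follows. The sketch assumes that angular velocity is signed by direction and that each cue has a rewarded direction coded as +1 or −1; this coding, the function names, and the handling of trial counts not divisible by three are illustrative assumptions rather than the authors' procedure.

```python
def congruence_signed(velocity, rewarded_direction):
    """Sign an angular velocity by congruence with the rewarded direction.

    velocity: signed angular velocity at cue onset (sign encodes direction).
    rewarded_direction: +1 or -1 (ASSUMED coding of the cue's rewarded direction).
    Returns a positive value for congruent movement, negative for incongruent.
    """
    return velocity * rewarded_direction

def split_into_thirds(values):
    """Indices of `values` grouped into slow, medium and fast thirds by magnitude."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    n = len(order)
    first, second = n // 3, (2 * n) // 3
    return order[:first], order[first:second], order[second:]

# Hypothetical per-trial velocities; in the legend, congruent and incongruent
# trials are split into thirds separately.
velocities = [5.0, -1.0, 3.0, -6.0, 2.0, -4.0]
slow, medium, fast = split_into_thirds(velocities)
```

Trial-averaged ΔF/F can then be computed within each group of indices to produce traces like those in panel f.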
Extended Data Fig. 6 Cue-evoked dopamine release encodes cue-value RPEs.
a, Non-TE DA cue responses averaged across sites and mice using a linear mixed-effects model (470: df = 315, n = 316 sites, 10 mice; 405: df = 230, n = 231 sites, 7 mice). b, Maximum cue-evoked z-scored ΔF/F during early learning for a subset of mice (df = 79, n = 80 sites, 3 mice). c, Violin plot of maximum non-TE coefficients for mice with quasi-simultaneous 470 nm and 405 nm recordings (two-sided t-test on wavelength effect: p = 1.33 × 10−18, t460 = –9.19, n = 231 sites, 7 mice). d, DA reward-history coefficients averaged as in a for mice with >30 unrewarded trials (470: df = 260, n = 261 sites, 8 mice; 405: df = 175, n = 176 sites, 5 mice). e, Z-scored ΔF/F aligned to cue onset following rewarded (red) or unrewarded (blue) trials, averaged across sites and mice (df = 260, n = 261 sites, 8 mice). f, Violin plot of maximum ΔF/F within 1 s from cue onset for trials in e (two-sided t-test on reward-history effect: p = 0.0052, t520 = –2.80, n = 261 sites). g, DA cue-identity coefficients averaged as in a (mixed-effects intercept: df = 315, n = 316 sites, 10 mice). Inset, maximum cue-identity coefficients vs contralateral bias index (Pearson r = 0.63, p = 0.025, n = 10 mice). h, Z-scored ΔF/F for cues signaling contralateral (red; bias cue in 9/10 mice) or ipsilateral (blue) directions, averaged as in e. i, Violin plot of maximum cue-identity coefficients (two-sided t-test on cue-identity effect: p = 6.77 × 10−7, t630 = –5.02, n = 316 sites, 10 mice). j, Mean number of trials with initial velocity directed contralaterally (red) or ipsilaterally (blue) to implant (two-sided paired t-test: p = 0.0037, t9 = −3.89, n = 10 mice). k, Reward rate for contralateral (red) vs ipsilateral (blue) cue trials (two-sided paired t-test: p = 0.0087, t9 = 3.33). l, Time to reward for contralateral (red) vs ipsilateral (blue) cue trials (two-sided paired t-test: p = 0.0066, t9 = −3.51). 
Shaded regions and error bars denote 95% confidence intervals; thin lines, individual-mouse averages. Box plots: centre, median; box bounds, first and third quartiles; whiskers, most extreme value no further than 1.5 × interquartile range from the bounds.
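The box-plot convention stated in these legends (median, quartile bounds, whiskers reaching the most extreme value within 1.5 × interquartile range of the box) can be made concrete with a short sketch. The quartile interpolation method is an assumption, since plotting packages differ; the function name is illustrative.

```python
from statistics import quantiles

def box_stats(data):
    """Median, quartiles and whisker ends under the 1.5 x IQR rule.

    Quartiles use the 'inclusive' interpolation of statistics.quantiles,
    which is an ASSUMPTION; other software may interpolate differently.
    """
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observed values inside the fences.
    lower_whisker = min(x for x in data if x >= low_fence)
    upper_whisker = max(x for x in data if x <= high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

In this toy data set the value 30 lies beyond the upper fence, so the upper whisker ends at 9 and 30 would be drawn as an outlier.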
Extended Data Fig. 7 RPE encoding of TEs requires mixed state-space representations of sensorimotor variables.
a, Initial running direction bias introduced into the tdRPE model to mimic mouse behaviour (Fig. 3j). Bars represent average trial numbers for bias and non-bias directions across 10 model simulations. b, Average reward rates (left) and times to reward (right) for cues associated with the pre-cue bias and non-bias initial running directions (compare to mouse behaviour data in Fig. 3k–l). c, Average simulated RPEs aligned to cue onset (t = 0) for trials presented with the cue associated with the bias (left) or non-bias (right) directions, for congruent (green) and incongruent (purple) trials split into thirds by simulated initial angular velocity magnitude. d, Average trial numbers for each cue for a model with no initial direction bias, presented as in a. e, Same as c for an unbiased model, split by cues associated with left and right directions. f, Average simulated RPEs aligned to cue onset on trials presented with the cue associated with the right (red) or left (blue) direction in an unbiased model. g-i, Results from a tdRL model (with the directional bias in a) in which the state only includes the cue identity variable with no mixing. g, Same as c. h, Same as f. i, Same as f with trials split by whether the previous trial was rewarded (red) or unrewarded (blue). RPEs are larger for the cue associated with the initial running bias and following previously rewarded trials, consistent with the mouse DA data and cue value RPE encoding (Fig. 3d–i). j-l, Results from a model (with the directional bias in a) in which the state includes all variables (Fig. 3b) but with no mixing. RPEs are plotted as in g-i. Shaded regions and error bars in all plots are 95% confidence intervals.
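For readers unfamiliar with the reward prediction error term that these simulations plot, the following is a minimal sketch of a temporal-difference (TD(0)) update with a linear value function over state features. It is not the authors' tdRL model: the two-step trial structure, the one-hot features, the learning rate and the discount factor are all hypothetical, and the mixed sensorimotor state representation described in the legend is not implemented here.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def td_step(w, x, x_next, reward, alpha=0.1, gamma=0.9):
    """One TD(0) update with linear value V(s) = w . x(s).

    Returns (rpe, new_weights), where
    rpe = reward + gamma * V(next state) - V(current state).
    """
    rpe = reward + gamma * dot(w, x_next) - dot(w, x)
    new_w = [wi + alpha * rpe * xi for wi, xi in zip(w, x)]
    return rpe, new_w

# Hypothetical trial: cue state -> delay state -> reward (terminal state).
CUE, DELAY, END = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]

def train(n_trials=500):
    """Run repeated trials; return final weights and per-trial RPEs."""
    w = [0.0, 0.0]
    rpes = []
    for _ in range(n_trials):
        rpe_cue, w = td_step(w, CUE, DELAY, reward=0.0)
        rpe_reward, w = td_step(w, DELAY, END, reward=1.0)
        rpes.append((rpe_cue, rpe_reward))
    return w, rpes

weights, rpes = train()
```

After training, the RPE at reward delivery shrinks toward zero while the learned value of the cue state approaches the discounted reward, which is the sense in which a cue acquires value in this family of models. Swapping the one-hot features for vectors that combine cue identity with movement variables is one way a state representation could be altered, offered here only as an illustration.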
Extended Data Fig. 8 Spatial maps and model coefficients for cue value measures across sites.
a, Horizontal (left) and sagittal (right) maps of the maximum reward history (top) and cue identity (bottom) coefficients for each recording site (bubble) across mice, normalized to the maximal coefficients across all recording sites. Empty bubbles, no significant coefficients. Black arrow is the axis of maximal variation, pointing in the direction of increasing magnitude based on the spatial coefficients in Fig. 5h. b, Trajectory error (orange) and non trajectory error (cue value, Fig. 5; blue) coefficients for three example sites in the same mouse (locations at right). Shaded regions are 95% confidence intervals around the FIR coefficient estimates (centre line). Brain schematics were adapted from the Allen Mouse Brain Common Coordinate Framework (CCFv3) (https://atlas.brain-map.org/).
Extended Data Fig. 9 Spatial organization of trial-trial performance encoding in trajectory error signaling.
a, Angular velocity, signed by congruence, aligned to cue onset averaged across correct (blue) and incorrect (red) incongruent trials for one example mouse. b, ΔF/F aligned to cue onset averaged across correct (blue) and incorrect (red) trials for two example recording sites in the same mouse. Recording site locations indicated in c. c, Horizontal (top) and sagittal (bottom) maps of the difference in dip magnitude between correct and incorrect trials. Colour of each bubble (recording site) represents the difference, with positive values indicating a larger dip for correct trials. d, Spatial coefficients representing the magnitude and direction of the variation in the difference in cue-evoked dip magnitude between correct and incorrect incongruent trials at recording sites with a significant dip on either trial type (ML: t127 = 4.74, p = 1.66 × 10−5; AP: t127 = 4.01, p = 3.06 × 10−4; DV: t127 = 5.21, p = 2.25 × 10−6; n = 131 sites, 10 mice, two-sided t-test on effect of location on dip magnitude difference in an LME, Bonferroni corrected). e, Vector showing the axis of maximal variation in dip magnitude difference, based on the coefficients in d. \(\underline{R}\), Rayleigh’s coefficient, S* = 22.38, p = 5.43 × 10−5. Thin lines represent individual mice. Shaded regions and error bars in all plots are 95% confidence intervals around the mean (a,b) or LME coefficient estimates (d).
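The Rayleigh statistic S* in panel e tests whether a set of direction vectors is uniformly distributed. The sketch below assumes three-dimensional unit vectors (one per mouse) and the modified Rayleigh statistic for the sphere, S* = (1 − 1/(2n))S + S²/(10n) with S = 3nR̄², referred to a χ² distribution with 3 degrees of freedom. Whether the authors used exactly this form is an assumption, although the reported p-value is close to the upper tail of χ² with 3 degrees of freedom at 22.38. Function names are illustrative.

```python
import math

def chi2_sf_3dof(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def rayleigh_test_3d(vectors):
    """Modified Rayleigh test of uniformity for 3D directions.

    vectors: iterable of non-zero 3D vectors; each is normalized to unit length.
    Returns (mean resultant length, S*, p-value). The spherical form of S*
    is an ASSUMPTION about the analysis reported in the legend.
    """
    units = []
    for v in vectors:
        norm = math.sqrt(sum(c * c for c in v))
        units.append([c / norm for c in v])
    n = len(units)
    mean_vec = [sum(u[k] for u in units) / n for k in range(3)]
    r_bar = math.sqrt(sum(c * c for c in mean_vec))
    s = 3 * n * r_bar ** 2
    s_star = (1 - 1 / (2 * n)) * s + s ** 2 / (10 * n)
    return r_bar, s_star, chi2_sf_3dof(s_star)
```

A small p-value indicates that the per-mouse axes cluster around a common direction rather than pointing uniformly in all directions.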
Extended Data Fig. 10 Spatial and temporal variations in TE expression across learning.
a, Dynamic regression model estimates of trial-by-trial reward rates (black) and TE coefficients for two example fibres across all trials and sessions through learning (see Methods). Dashed lines indicate the trials in which the TE coefficients and reward rates reached significance (p < 0.01, shuffle test). Fibre locations indicated in the coronal atlas schematic. Black line, reward rate, averaged with a moving average window of 50 trials. b, Histogram of lags between the earliest trial where TE and reward rate reached statistical significance for each fibre in the three included mice. Colours indicate different mice. Negative values indicate that TE becomes significant prior to reward rate. Mouse 1: t27 = −10.9, p = 7.0 × 10−11, n = 28 sites; Mouse 2: t27 = −76.2, p = 1.4 × 10−32, n = 28 sites; Mouse 3: t29 = −5.4, p = 2.6 × 10−5, n = 30 sites; t-test on difference in trials between significant trajectory error encoding and significant performance, p-values Bonferroni corrected for 3 mice. c, Non-negative matrix factors (NNMF) captured from trial-by-trial TE trajectories (estimated with dynamic regression) across sites within each mouse representing early (blue) and late (red) TE increases across training for each of the three mice recorded across learning (see Methods for details). d, TE coefficients across trials averaged across recording sites with a larger weight for the early (blue) or late (red) component and normalized to the maximum TE coefficient, for each mouse. Grey line, reward rate averaged with a moving average window of 50 trials. e, Horizontal (left) and sagittal (right) plane maps of the differences in the weights for the late and early components of the NNMF for each recording site (bubble) across mice (n = 88 fibres, 3 mice). f, Spatial coefficients representing the magnitude and direction of the variation in the difference between early and late NNMF weights in e. Data points represent individual mice. ML: t84 = 4.1, p = 3.11 × 10−4; AP: t84 = 3.6, p = 0.0016; DV: t84 = 1.4, p = 0.54, two-sided t-test on spatial coefficients, p-values Bonferroni corrected for 3 dimensions. Error bar, 95% confidence interval around LME model coefficients. g, Vector showing the axis of maximal variation in weight difference, based on the coefficients in f. \(\underline{R}\), Rayleigh’s coefficient, S* = 9.92, p = 0.02. Thin lines represent individual mice. h, Horizontal plane maps of the differences in the weights for the late and early components of the NNMF for each mouse as in e. The bubble sizes indicate the DV position (increasing size indicates more dorsal positions). Shaded regions in all plots are 95% confidence intervals. Brain schematics were adapted from the Allen Mouse Brain Common Coordinate Framework (CCFv3) (https://atlas.brain-map.org/).
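The reward rate traces in this figure are described as smoothed with a moving average window of 50 trials. A minimal sketch of one way to compute such a trace is given below. It assumes a trailing (causal) window and a shorter window for the first trials; the legend does not state how the window was aligned or how edges were handled, so both choices, along with the function name `moving_average_reward_rate`, are illustrative assumptions.

```python
import numpy as np


def moving_average_reward_rate(rewarded, window=50):
    """Trailing moving average of binary trial outcomes.

    rewarded: sequence of 0/1 outcomes, one per trial.
    window: number of trials averaged (50 in the legend).
    Assumption: the window trails the current trial and is truncated
    at the start of the session, so early trials average fewer outcomes.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    rewarded = np.asarray(rewarded, dtype=float)
    # Cumulative sum with a leading zero so any window sum is a difference.
    csum = np.cumsum(np.insert(rewarded, 0, 0.0))
    end = np.arange(1, len(rewarded) + 1)
    start = np.maximum(end - window, 0)
    return (csum[end] - csum[start]) / (end - start)


# Synthetic example: 50 unrewarded trials followed by 50 rewarded trials.
outcomes = [0] * 50 + [1] * 50
rate = moving_average_reward_rate(outcomes, window=50)
```

In this example the rate stays at 0 through the first 50 trials, climbs linearly as rewarded trials enter the window, and reaches 1 on the final trial.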
Supplementary information
Supplementary Tables 1 and 2
Supplementary Table 1: statistics. A table with additional statistical information for Figs. 4b,c,h,i,j,k and 5j,k that did not fit in the figure legends or text. Supplementary Table 2: Mouse information. Information about the sex, implant hemisphere, fluorescent sensor, number of fibres implanted and included in analyses and tasks performed for each included mouse.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Brown, E.H., Zi, Y., Vu, MA. et al. Striatum-wide dopamine encodes trajectory errors separated from value. Nature (2026). https://doi.org/10.1038/s41586-025-10083-1
DOI: https://doi.org/10.1038/s41586-025-10083-1


