Dopamine dynamics during stimulus-reward learning in mice can be explained by performance rather than learning

Bakhurin, Konstantin; Hughes, Ryan N.; Jiang, Qiaochu; Hossain, Meghdoot; Gutkin, Boris; Fallon, Isabella P.; Yin, Henry H.

doi:10.1038/s41467-025-64132-4

Article
Open access
Published: 13 October 2025

Nature Communications volume 16, Article number: 9081 (2025)

Abstract

The reward prediction error (RPE) hypothesis posits that phasic dopamine (DA) activity in the ventral tegmental area (VTA) encodes the difference between expected and actual rewards to drive reinforcement learning. However, emerging evidence suggests DA may instead regulate behavioral performance. Here, we used force sensors to measure subtle movements in head-fixed mice during a Pavlovian stimulus-reward task, while recording and manipulating VTA DA activity. We identified distinct DA neuron populations tuned to forward and backward force exertion. They are active during both spontaneous and conditioned behaviors, independent of learning or reward predictability. Variations in force and licking fully account for DA dynamics traditionally attributed to RPE, including variations in firing rates related to reward magnitude, probability, and omission. Optogenetic manipulations further confirmed that DA modulates force exertion and behavioral transitions in real time, without affecting learning. Our findings challenge the RPE hypothesis and instead suggest that VTA DA neurons dynamically adjust the gain of motivated behaviors, controlling their latency, direction, and intensity during performance.

Introduction

The ventral tegmental area (VTA) is the source of the mesolimbic dopamine (DA) pathway that has been implicated in reward and motivation¹. According to the influential reward prediction error (RPE) hypothesis, VTA DA neurons encode the difference between actual and predicted reward, providing a teaching signal for associative learning^2,3. This model has shaped our understanding of DA’s role in associative learning for decades. However, the RPE model cannot explain many experimental results^4,5,6. It has been challenged by many who propose that DA contributes to vigor, effort regulation, incentive salience, or movement kinematics, but there is no consensus on the contribution of DA to learning and behavior^6,7,8,9,10.

One reason for conflicting opinions on DA function is the lack of precise and continuous behavioral measurements. Many previous studies used measures that are usually limited to discrete time stamps (e.g., licking), or temporal durations (e.g., time spent in the reward port)^{11,12,13,14,15}. When continuous behavioral measures were used with high temporal and spatial resolution, DA activity in the substantia nigra pars compacta was found to be highly correlated with kinematic variables like head velocity, regardless of reward prediction or outcome valence, and selective stimulation of DA neurons increased movement velocity⁶. Recent work using sensitive measures of force also showed that VTA DA neurons represent and regulate force exertion in head-fixed mice¹⁶. These results suggest that previous observations in support of the RPE hypothesis could be explained by the contributions of DA signaling to online behavioral performance rather than learning.

In this study, we used force sensors to measure subtle movements in head-fixed mice during a Pavlovian stimulus-reward learning task previously used to study RPE signaling. Using in vivo electrophysiology and optogenetics, we showed that phasic DA activity in the VTA does not encode RPE, but is critical for modulating behavioral performance online as measured by force exertion and licking. We found two major populations of DA neurons that increased firing before forward and backward force exertion during the generation of anticipatory CRs and during spontaneous behaviors. Such force tuning is the same regardless of learning, reward predictability, or outcome valence. We also found an abrupt change in DA tuning when the mouse transitions to consummatory responding upon reward delivery, at which point phasic DA becomes more predictive of licking behavior. Based on these results, we propose a working model of phasic DA function, according to which DA modulates the adaptive gain in behavioral transition control. This model explains the pattern of phasic DA activity during Pavlovian conditioning.

Results

Force tuning in dopamine neurons

We investigated the relationship between force and VTA DA activity by training mice in a Pavlovian conditioning task while using a force-sensing head fixation apparatus (Fig. 1a and Supplementary Fig. 1)^16,17,18. Mice exerted forward forces after CS presentation, reflecting their anticipatory approach behavior. The force sensors also revealed small spontaneous movements during the inter-trial interval (ITI, Fig. 1b). Spontaneous movements were present throughout training and their amplitudes remained stable, even in well-trained mice (Fig. 1c and Supplementary Fig. 1).

**Fig. 1: Opponent signaling of force direction by two distinct DA neuron populations.**

We recorded single unit activity from the VTA in mice with moveable optrodes and used optogenetic stimulation to confirm cell type (n = 1683 single units; n = 948 from putative DA neurons; n = 98 from tagged DA neurons; Fig. 1d and Supplementary Figs. 2–4). We identified two major classes of DA neurons that exhibited distinct direction-specific tuning during spontaneous movements (Fig. 1e). They make up about 50% of recorded DA neurons (Fig. 1f). Both populations led force generation in their preferred direction (Fig. 1g). Forward DA neurons (n = 341) increased firing prior to spontaneous forward movements and decreased firing during backward movements (Fig. 1h). These neurons were tuned for spontaneous force exertion (Fig. 1i): when movements were forward, these neurons increased their firing rate and when movements were backward, they reduced their firing rate (Fig. 1j). In contrast, Backward DA neurons (n = 133) reduced firing during spontaneous forward movements and increased their firing during movements backward (Fig. 1k). Backward DA neurons were also tuned for force (Fig. 1l, m).

If the release of DA by VTA DA neurons plays a causal role in movement, it must do so through DA receptor activation in downstream neurons. To model the relationship between the firing rate of DA neurons and extracellular DA concentration, we generated a biophysical model using standard DA release and reuptake parameter values^19,20. This model can predict DA concentration using the firing rates of Forward and Backward DA neurons. We assumed that these populations target different striatal regions that are involved in generating forces in opposite directions (Fig. 1n). Because DA concentration is proportional to exerted force, the direction of force exerted is determined by the difference in DA concentrations in target regions. Without any tuning to fit the data or using any filters, the model predicted the direction of exerted force and its time course (Fig. 1o and Supplementary Fig. 5).
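The mapping from firing rate to extracellular DA described above can be sketched in a few lines. The following is a minimal illustration, not the model used in the study: it assumes a fixed amount of DA released per spike and Michaelis-Menten reuptake, and the parameter values (`release`, `vmax`, `km`) are placeholders chosen for illustration. The function names and the idea of reading force direction from the difference between two modeled concentrations are likewise assumptions made for the sketch.

```python
# Illustrative sketch only: parameter values are placeholders, not the study's.
def da_concentration(spike_counts, dt=0.001, release=0.1, vmax=4.0, km=0.2):
    """Forward-Euler integration of release plus Michaelis-Menten reuptake.

    spike_counts: number of spikes in each time bin of width dt (s).
    release: concentration added per spike (uM, assumed).
    vmax, km: reuptake parameters (uM/s and uM, assumed).
    Returns the modeled concentration (uM) at the end of each bin.
    """
    conc = 0.0
    trace = []
    for n_spikes in spike_counts:
        conc += release * n_spikes               # release driven by spikes
        conc -= dt * vmax * conc / (km + conc)   # saturable reuptake
        conc = max(conc, 0.0)
        trace.append(conc)
    return trace


def predicted_force_signal(forward_spikes, backward_spikes):
    """Difference in modeled DA between two hypothetical target regions.

    Positive values stand for net forward force, negative for backward.
    """
    fwd = da_concentration(forward_spikes)
    bwd = da_concentration(backward_spikes)
    return [f - b for f, b in zip(fwd, bwd)]


# Toy input: a 5-spike burst in the Forward population, Backward silent.
forward = [0] * 10 + [1] * 5 + [0] * 100
backward = [0] * 115
signal = predicted_force_signal(forward, backward)
```

With this toy input the signal rises during the burst and then decays as reuptake clears the modeled DA. Swapping the two spike trains flips its sign.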

Force direction and DA neuron activity during Pavlovian conditioning

To reveal force direction-related DA activity during the Pavlovian task, we took advantage of the flexibility of the force sensors and moved the location of the reward spout behind the mouth while keeping reward predictability constant (Fig. 2a). Although this change in spout location was only ~2 mm, it required a change in the direction of force exertion to access the reward (Fig. 2a and Supplementary Movie 2). Anticipatory licking rates (a conventional measure of the CR on this task) were similar for both spout positions (Fig. 2b). However, the slight change in spout position resulted in different movements generated by the mice: when the spout was moved backward by only 2 mm, mice generated more backward force and less forward force (Fig. 2c). Consequently, depending on the spout location, the force component of the CRs could be generated in different directions for the same reward (Fig. 2d). This allowed us to test whether VTA DA neurons can be modulated by force direction even when reward prediction is constant.

**Fig. 2: Direction-selective DA activity during anticipatory approach is revealed by aligning to force CR onset rather than CS.**

In traditional analyses of DA activity during Pavlovian conditioning, neural activity is always aligned to stimulus events. However, it is well known that phasic DA activity can be elicited by salient sensory events with very short latency, independent of learning^{21,22,23,24,25}. Any DA activity that may be direction-specific during the Pavlovian task may therefore be obscured by the large-magnitude, direction-independent CS response. Indeed, aligning the activity of Forward and Backward DA populations to the CS shows that CS activations are similar regardless of spout direction (Fig. 2e, f).

Because force CR latency varies from trial to trial, the direction selectivity of DA neurons could be revealed when we aligned their activity to CRs with longer latencies (Fig. 2g, h). Movement-related responses were therefore separated in time from the short-latency salience-related responses to the CS. Forward DA neurons showed higher activity for forward CRs than backward CRs, whereas Backward DA neurons showed higher activity for backward CRs than forward CRs (Fig. 2j). Importantly, force CR direction preference matched the direction preferences exhibited by these same cells during spontaneous forward and backward movements (Fig. 2k, l).
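The effect of the alignment event can be illustrated with a simple peri-event firing-rate histogram. This is a generic sketch rather than the analysis code used in the study. The function name, bin width, window, and the toy spike and event times are all assumptions made for illustration. In the toy data, one spike occurs 0.1 s before each CR onset, and CR latency relative to the CS varies across trials.

```python
def peri_event_rate(spike_times, event_times, window=(-0.5, 0.5), bin_width=0.25):
    """Mean firing rate (spikes/s) in bins around each event.

    spike_times, event_times: times in seconds.
    window: (start, end) relative to the event; bin_width in seconds.
    """
    n_bins = round((window[1] - window[0]) / bin_width)
    counts = [0] * n_bins
    for event in event_times:
        for spike in spike_times:
            # Floor division places the spike in its bin relative to window start.
            idx = int((spike - event - window[0]) // bin_width)
            if 0 <= idx < n_bins:
                counts[idx] += 1
    return [c / (len(event_times) * bin_width) for c in counts]


# Toy data (made up): CS onsets, variable-latency CR onsets, and one spike
# 0.1 s before each CR onset.
cs_times = [10.0, 20.0, 30.0]
cr_times = [10.2, 20.45, 30.4]
spikes = [10.1, 20.35, 30.3]

cs_aligned = peri_event_rate(spikes, cs_times)
cr_aligned = peri_event_rate(spikes, cr_times)
```

Aligned to CR onset, all three spikes fall in the same bin. Aligned to the CS, the same spikes are spread over several bins, so the movement-related peak is lower.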

We also identified two additional classes of putative DA neurons with no direction preference, increasing or decreasing firing during both forward and backward spontaneous movements (Supplementary Fig. 6). A large population of unclassified neurons had relatively wide waveforms and low firing rates, unlike GABAergic VTA neurons, which have narrow waveforms with high baseline firing rates (Supplementary Fig. 3)²⁶. These unclassified neurons were better correlated with force than with the change in force. They also showed reduced activity to the CS on trials lacking CRs (Supplementary Fig. 7), further indicating a role in generating force during both spontaneous and task-related movements.

DA activity and force generation during aversive stimuli

According to the RPE hypothesis, an unexpected aversive stimulus should produce a negative prediction error, reflected as a decrease or pause in VTA DA activity^27,28. If DA activity reflects force exertion, the same relationship between DA and force should be observed regardless of whether the outcome is rewarding or aversive. To test this possibility, we delivered aversive air puffs in separate sessions (Fig. 3a)¹⁸. Unexpected air puffs resulted in a backward movement, away from the source of the air puff, followed by rebound forward movement (Fig. 3b). This pattern was observed across all mice (Fig. 3c). Latency measures confirmed the backward component had a shorter latency than the rebound forward component (Fig. 3d).

**Fig. 3: DA activation to aversive air puff is explained by bidirectional force changes.**

The direction-selective DA neurons reflected the temporal patterns of force changes during an aversive air puff (Fig. 3e). The Backward DA population was activated first during the initial backward movement, followed by the Forward DA population that was activated prior to the onset of forward movement (Fig. 3f). Forward DA neurons and Backward DA neurons did not differ in their firing rates (Fig. 3g). Consistent with results shown in Figs. 1 and 2, Forward and Backward DA neurons maintained tuning for changes in force in their preferred direction (Fig. 3h). Our results agree with past research showing activation of DA neurons in response to aversive stimuli as well as rewarding stimuli^3,6,18,22, but we now show that previous observations could be explained by distinct types of DA neurons signaling force in different directions.

Changes in reward size and probability change force exertion and DA activity

Phasic DA signaling is known to change systematically with reward size manipulations^29,30. Previous studies did not quantify the behavioral changes associated with such manipulations other than licking. We found that the increase in reward size also altered movements generated by mice (Fig. 4a). Increasing reward size reduced the latency of force CRs (Fig. 4b) and the duration of URs (Fig. 4c), and increased the mean force exerted for both the CR and the UR (Fig. 4d). DA neurons increased their firing rates to the CS and to the US when mice were receiving larger rewards (Fig. 4e–g).

**Fig. 4: Changes in DA activity with reward size or probability manipulations can be explained by changes in force exertion.**

Prior work also found that phasic DA activity was modulated by reward probability³⁰. Higher reward probability increases DA signal after CS—a finding usually interpreted as providing support for the RPE hypothesis. We manipulated reward probability and compared its effect on force exerted and DA activation (Fig. 4h). UR duration increased when mice received rewards only 50% of the time (Fig. 4i). Mice produced lower CR force when reward probability was 50% but generated larger UR force upon reward delivery (Fig. 4j). These patterns were accompanied by similar changes in firing rate of DA neurons (Fig. 4k, l). Firing rates after the CS were lower when the reward probability was 50%, but once mice received the reward, DA neurons showed higher activity. The opposite pattern was observed when the reward was delivered at 100% probability (Fig. 4m).

Reward omission reveals dips in DA firing and force

Reduced firing after reward omission is thought to signal a negative RPE (actual reward less than predicted)³¹. In well-trained mice (n = 5), we also omitted the reward after CS presentation (Fig. 5a). DA neuron populations showed reduced activity after reward omission (Fig. 5b, c). Interestingly, after reward omission, mice abruptly terminated force exertion and did not generate the additional force that is normally observed after reward delivery (Fig. 5d). The change in force was therefore negative following reward omission (Fig. 5e). The characteristic “dip” in DA neuron activity after reward omission (Fig. 5f) coincided with a clear reduction in force exertion (Fig. 5g and Supplementary Movie 3). Together, these results show that activity after the reward is consistent with DA mediating the transition to consummatory behavior and determining its persistence. In the absence of reward, DA signaling is low, and the UR is terminated early. The dip in DA contributes to pausing the ongoing behavior.

**Fig. 5: Dip in DA activity after reward omission can be explained by changes in force exertion.**

CR force increases during Pavlovian conditioning

DA neurons are known to shift their activity from the US to the CS during stimulus-reward learning^31,32. Such changes have been explained by the RPE hypothesis, in particular by the temporal difference algorithm (Fig. 5h–j)^3,31,33. We found that the development of CS-evoked DA responses during learning was accompanied by an increase in force as mice gradually produced more anticipatory behavior (Fig. 6a, b). There was a corresponding increase in DA activity (Fig. 6c). Phasic DA activity strongly predicts CR force changes (Fig. 6d). In addition to increasing CR magnitude, force CR onset time also becomes less variable with training (Fig. 6e). With training, there was a significant decrease in both force CR latency and the variance in latency (Fig. 6f, g and Supplementary Fig. 8).
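For reference, the temporal difference account mentioned above can be reproduced with a textbook TD(0) simulation. This sketch illustrates the RPE model's prediction only, not any analysis from the study. The state representation (one value per time step between CS and US), the learning rate, the trial layout, and the function name are all assumptions made for illustration.

```python
def run_td(n_trials, n_steps=10, cs_step=2, us_step=7, alpha=0.1, gamma=1.0):
    """Simulate TD(0) over repeated CS-US trials.

    Values are learned only for time steps from CS onset up to the US
    (the CS itself is assumed to arrive unpredictably, so earlier steps
    keep a value of 0). Returns the prediction error at every step of
    every trial.
    """
    values = [0.0] * n_steps
    history = []
    for _ in range(n_trials):
        deltas = [0.0] * n_steps
        for t in range(n_steps - 1):
            reward = 1.0 if t + 1 == us_step else 0.0
            # TD error for the transition from step t to step t + 1.
            delta = reward + gamma * values[t + 1] - values[t]
            deltas[t + 1] = delta
            if cs_step <= t < us_step:
                values[t] += alpha * delta
        history.append(deltas)
    return history


history = run_td(n_trials=500)
first_trial, last_trial = history[0], history[-1]
```

On the first trial the prediction error appears at the US step. After training it appears at the CS step and is close to zero at the US, which is the shift traditionally attributed to learning.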

**Fig. 6: DA activity and force CRs become larger and earlier during learning.**

DA activity determines force onset during learning

In principle, the changes in CR timing during learning can be explained by the RPE hypothesis, which equates the development of reward prediction with the probability of CR generation^31,34. We found that a trial-by-trial analysis of the relationship between DA activity and behavior could disambiguate the role of DA in performance versus learning.

During intermediate training stages, mice frequently generated force CRs, though CR latency remained variable (Supplementary Fig. 8). During such sessions, the activity of many DA neurons after the CS varied with the onset time of the CR, i.e., they strongly predict when the force CR will be generated. When the force CR took longer to generate (Fig. 7a), these latency-predicting neurons showed lower firing rates than during trials when the CR was more rapidly produced (Fig. 7b). During the session, their activity did not vary with trial number, a pattern that we might expect if DA activity reflects increasing reward prediction during training (Fig. 7c, d). Overall, CR-predicting DA activity was better explained by movement onset latency (Fig. 7e and Supplementary Fig. 8) than time in session (Fig. 7f). Their activity appears to contribute to the production of the CR rather than signaling RPE (Fig. 7g and Supplementary Fig. 8). A separate population of DA neurons recorded during these same sessions was activated by the CS regardless of movement onset latency, showing no relationship between force CR latency and their firing rate (non-CR-predicting DA, Supplementary Fig. 8). The activity of this DA population was not explained by the amount of training either, as it did not vary with trial number during the session (Supplementary Fig. 8). This pattern suggests that this population may be related to CS salience rather than learning.
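The trial-by-trial comparison described above amounts to asking which variable better explains a neuron's CS-evoked firing rate: CR latency or trial number. A minimal sketch of that comparison follows, using Pearson correlation as a stand-in for whatever statistical model was actually fitted in the study. The numbers are synthetic toy values invented for illustration and are not data from the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Synthetic toy values (not from the study): firing rate falls with latency.
trial_number = [1, 2, 3, 4, 5, 6, 7, 8]
cr_latency_s = [0.2, 0.8, 0.3, 0.9, 0.25, 0.7, 0.4, 0.85]
firing_rate = [20.0 - 15.0 * lat for lat in cr_latency_s]

r_latency = pearson_r(cr_latency_s, firing_rate)
r_trial = pearson_r(trial_number, firing_rate)
```

In this constructed example the firing rate is tightly and negatively related to CR latency and only weakly related to trial number, mirroring the qualitative pattern reported for latency-predicting neurons.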

**Fig. 7: Variations in DA activity predict force CR latency rather than reward prediction.**

With further conditioning, as force CR latencies became more uniform (Fig. 7h and Supplementary Fig. 8), DA activity in response to the CS also became less variable (Fig. 7h–j). After extensive training, the activity of most DA neurons predicted neither CR latency nor time in session (Fig. 7k), although some DA neurons could still predict CR onset latency (Supplementary Fig. 8).

Reduced DA responses to the CS in the absence of CR, independent of learning

Even in well-trained mice, the force CR was occasionally not generated. DA neurons showed dramatically reduced responses to the CS if no CR was generated, regardless of the amount of training (Fig. 8a–e). Moreover, sometimes CSs were presented while mice were already engaged in spontaneous movement bouts (Fig. 8f), which resulted in smaller changes in CR force (Fig. 8g). This was also accompanied by reduced DA signaling after CS presentation (Fig. 8h) and was independent of the amount of training (Fig. 8i, j). Together, these results strengthen the connection between DA activity and CR initiation rather than reward prediction. While the CS always predicted reward in trained mice, reduced DA activity corresponded to a lack of CRs or CRs generated with reduced force.

**Fig. 8: Reduced DA whenever force CR was not generated, regardless of the amount of training.**

DA signals the transition to consummatory behavior upon reward delivery

DA activation upon US delivery is often described as mediating a reward signal, but prior work has not closely inspected the behavioral changes after the US. Early in training, the URs (force exertion and licking) were variable in their onset relative to the US (Supplementary Fig. 9). On trials where the UR was delayed, aligning data to the US showed reduced force and licking UR measures and reduced DA firing activity (Supplementary Fig. 9). However, by aligning to the delayed licking UR onset rather than the US timestamp, we showed that force generation, licking, and DA firing activity were restored in magnitude (Supplementary Fig. 9). DA activity therefore signals the initiation of consummatory behavior rather than US delivery.

With training, the UR (force after US delivery) becomes a continuation of the CR and cannot be clearly separated (Supplementary Fig. 9). The RPE hypothesis predicts a reduction in phasic DA in response to the US in trained mice, as the reward becomes well-predicted. While the UR force changes decreased with training, DA activity after the US did not decrease (Supplementary Fig. 9). Mice reliably produce subtle changes in licking and force generation after the US, even after extensive training. Once the CR was generated, upon detection of the sucrose reward, mice increased licking (Fig. 9a–c). DA activity just before licking strongly predicts subsequent licking UR (Fig. 9d, e). Mice also changed their force exertion pattern when the reward was delivered (Fig. 9f). Upon reward delivery, the force briefly reversed direction, resulting in a transient backward force as the mouse stopped the anticipatory approach behavior and transitioned to forward force exertion again while increasing licking behavior (Fig. 9g, h). Overall, DA neuron activation was coincident with the redirection of force but preceded consummatory behavior as measured by forward force generation and the increase in licking (Fig. 9i). DA activity after reward delivery predicts licking rate rather than either force or the integral of force (Fig. 9j and Supplementary Fig. 9). Both Forward and Backward DA neurons showed phasic increases in activity upon reward delivery (Supplementary Fig. 10), suggesting the direction-specific force tuning was only a feature of anticipatory CR reflecting approach behavior, but the same DA neurons contribute to the generation of the lick bout. In other words, upon reward feedback, these DA neurons can now adjust gain for consummatory behaviors rather than preparatory behaviors.

**Fig. 9: DA activity immediately after US reward delivery predicts transition to consummatory behavior.**

Bidirectional optogenetic manipulation altered force exertion without affecting learning

According to the RPE hypothesis, DA serves as a teaching signal for learning³⁵. We tested whether stimulating DA neurons in place of a sucrose reward after CS presentation would lead to learning of the CS-stimulation association, as reported previously¹². We trained both DAT-Cre mice expressing ChR2 in the VTA (n = 3) and WT mice (n = 3) by delivering a tone (CS) followed by stimulation instead of sucrose reward. There was no licking CR in the absence of reward (Fig. 10a, b). After 400 trials, we introduced a sucrose reward in addition to stimulation. When the reward was delivered, stimulation did not affect the learning rate compared to controls (Fig. 10b, c). In agreement with previous work^6,18, in well-trained mice, optogenetic stimulation 4 s after reward delivery also produced forward force (Fig. 10d–f).

**Fig. 10: VTA DA is neither necessary nor sufficient for learning the stimulus-reward association.**

To further test whether VTA DA signaling is necessary for learning, we used an inhibitory opsin (SIO-stGtACR2³⁶, N = 4 DAT-Cre mice, control: N = 6 WT mice) to inhibit VTA DA neurons during the CS-US interval (Fig. 10g). Inhibition produced backward movement, but did not impair learning (Fig. 10h). In fact, mice licked more after inhibition of DA neurons compared to controls, contrary to what the RPE hypothesis predicts (Fig. 10i). Inhibition also reduced their latency to backward movement (Fig. 10j). According to the RPE hypothesis, if DA neurons are inhibited immediately after reward delivery, there should be a negative RPE and reduced anticipatory licking on future trials³⁷. Contrary to this prediction, when we inhibited DA neurons at reward delivery to mimic a negative prediction error (Fig. 10k), anticipatory licking (CR) was not reduced on subsequent trials (Fig. 10l). Together, our optogenetic results show that phasic DA is neither necessary nor sufficient for stimulus-reward learning.

Discussion

We found systematic changes in force measures and the activity of VTA DA neurons during a stimulus-reward task. We identified two distinct groups of DA neurons with force tuning (Fig. 1). Force tuning is found during spontaneous movements during the ITI and during force CR production. It does not depend on learning. Altering reward size or probability, or omitting the reward, systematically alters phasic DA activity, as predicted by the RPE hypothesis, but the changes in DA activity can be explained by subtle changes in performance rather than learning (Figs. 4 and 5). DA activation after the CS predicts variations in both the latency and the presence of the CR (Figs. 6–8) and UR (Fig. 9 and Supplementary Fig. 9). Finally, stimulating or suppressing DA neurons can directly affect the direction of force exertion without affecting learning (Fig. 10).

Studies have shown short-latency DA bursts in response to salient stimuli, independent of learning^22,24. This has led to the proposal that there are two components of DA signaling, a short-latency salience signal and a delayed signal reflecting RPE²¹. We could also discern two components of phasic DA activity, though the early salience-related component can be merged with the slower component for approach behavior when CR latency is short. The first component can be evoked by a salient stimulus like the CS. Although this component can predict the latency of CR force onset, it does not show direction-specific force tuning. On the other hand, there is a second component of DA signaling with longer latency that is relatively independent of salience. This component can distinguish between forward and backward force exertion (Fig. 2i, j). Thus, aside from force tuning, most DA neurons can also signal stimulus salience. But both components of DA signaling can be explained by changes in performance rather than RPE, as they are present before learning.

Whereas the RPE model predicts gradual increases in phasic DA after CS as reward prediction grows with each reward delivery³¹, our single-trial analysis shows that DA activity does not increase uniformly across a session. Rather, the variations in phasic DA signaling following the CS are better explained by CR latency. Higher DA predicts earlier CR force, suggesting a faster rise to activation thresholds for CR generation (Fig. 7). Furthermore, DA activation was significantly reduced if force CRs were not generated prior to the US, regardless of training stage (Fig. 8). This suggests that increased DA signaling alters performance by initiating and speeding up responses, instead of signaling RPE.
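The idea that a stronger gain signal produces a faster rise to an activation threshold can be made concrete with a toy leaky integrator. This is an illustrative sketch under strong assumptions, not a model from the study: the integrator, its time constant, the threshold, the drive, and the function name are all hypothetical, and `gain` simply stands in for the level of DA signaling.

```python
def cr_latency(gain, drive=1.0, threshold=0.8, tau=0.2, dt=0.001, n_steps=2000):
    """Time (s) for a leaky integrator to reach threshold, or None if it never does.

    The unit integrates a constant drive scaled by `gain` (a hypothetical
    stand-in for DA level) and relaxes toward gain * drive with time constant tau.
    """
    activation = 0.0
    for step in range(n_steps):
        activation += dt * (gain * drive - activation) / tau
        if activation >= threshold:
            return (step + 1) * dt
    return None


high_gain_latency = cr_latency(gain=2.0)   # roughly 0.1 s
mid_gain_latency = cr_latency(gain=1.0)    # roughly 0.3 s
low_gain_latency = cr_latency(gain=0.5)    # never reaches threshold
```

In this toy system a higher gain shortens the latency to threshold, and a sufficiently low gain yields no threshold crossing at all, which parallels the observations that higher DA predicts earlier CRs and that DA is reduced on trials without a CR.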

DA response to reward and reward omission

While DA responses to the CS grow as the CR becomes more consistent, the DA response to the US does not decrease with training, contrary to the predictions of the RPE model (Supplementary Fig. 9). A recent study in mice found that DA activity aligned to the US increased with training, but did not show whether this is associated with reduced UR latency³². Our results suggest that, regardless of training, DA activity after the US promotes transition to consummatory behavior. By aligning to the reward delivery (US), we showed reduced DA activity if force and licking URs are not immediately generated after reward. Thus the putative “reward response” is actually used to generate the UR. This activity does not show force direction tuning, but reflects subtle performance changes, such as a brief pause associated with stopping forward approach (CR) as soon as the reward is detected, followed by a DA burst that predicts consummatory licking quantitatively (Fig. 9 and Supplementary Figs. 9 and 10). Thus DA is not just a reward signal; it is directly shaping ongoing action patterns. For rhythmic pattern generators with a relatively fixed frequency (e.g., licking at ~7 Hz in mice), this could mean controlling the duty cycle—the proportion of time spent licking, in agreement with our observations^38,39.
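The duty cycle mentioned above can be computed directly from lick timestamps. The following sketch assumes that licks separated by no more than a chosen gap belong to the same bout, and that time spent licking is the summed duration of those bouts. The gap threshold, the function name, and the example timestamps are assumptions made for illustration.

```python
def lick_duty_cycle(lick_times, window_start, window_end, max_gap=0.25):
    """Fraction of the window spent in lick bouts.

    Licks no more than max_gap seconds apart are grouped into one bout.
    A bout's duration runs from its first to its last lick, so an isolated
    lick contributes zero time.
    """
    licks = sorted(t for t in lick_times if window_start <= t <= window_end)
    if not licks:
        return 0.0
    licking_time = 0.0
    bout_start = previous = licks[0]
    for t in licks[1:]:
        if t - previous > max_gap:
            licking_time += previous - bout_start   # close the finished bout
            bout_start = t
        previous = t
    licking_time += previous - bout_start           # close the final bout
    return licking_time / (window_end - window_start)


# Made-up timestamps: a 0.5 s bout followed by a 0.25 s bout in a 3 s window.
licks = [0.0, 0.125, 0.25, 0.375, 0.5, 2.0, 2.125, 2.25]
duty = lick_duty_cycle(licks, window_start=0.0, window_end=3.0)
```

Because lick frequency within a bout is roughly fixed, a change in this fraction reflects how long licking is sustained rather than how fast individual licks occur.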

A key observation that has been used to support the RPE hypothesis is the “dip” in DA firing when an expected reward does not arrive³¹. This dip is conventionally interpreted as a negative RPE signal. We also found a dip in DA activity following omission of predicted reward. However, this pattern could be explained by a change in performance, as mice abruptly stop force exertion after reward omission. There was a corresponding reduction in force that parallels the dip in DA activity at the expected time of reward delivery (Fig. 5).

Finally, we showed that VTA DA neurons can increase firing to rewarding and aversive stimuli, but such activity is determined by differences in the patterns of force exertion. DA activity appears to be independent of outcome valence but reflects the direction and magnitude of anticipatory approach behavior^3,40. If VTA DA neurons signal RPE or reward value, they should be inhibited by the air puff. We found, however, that they were excited and modulated by the direction of movements evoked by the air puff (Fig. 3). Importantly, force tuning remains similar despite the change in outcome valence. These results are also consistent with previous research demonstrating that mesolimbic DA is responsive to aversive stimuli⁴¹.

Our results therefore contradict the RPE hypothesis of DA function. The basic assumption in traditional reinforcement learning models in general, and the RPE hypothesis in particular, is that learning is expressed directly and immediately in performance⁷. They therefore conflate learning with changes in performance. In addition, such models treat actions as simple, discrete choices, ignoring their spatial and temporal complexity. Recently, Lee et al. argued that a vector RPE model can explain results showing heterogeneous coding of task variables but uniform reward responses⁴². They suggest that DA heterogeneity reflects a high-dimensional state representation with a distributed RPE code. Our results suggest that DA signaling dynamically modulates behavior in real time, but its role is to facilitate transitions commanded by other brain areas, e.g., corticostriatal inputs. Consequently, it could play multiple roles to facilitate different behavioral demands depending on its timing.

Adaptive gain hypothesis

Our results provide support for an alternative hypothesis of DA function—the adaptive gain hypothesis⁴³. This hypothesis posits that dopamine acts as a dynamic gain signal in the cortico-basal ganglia networks. It can adjust the transitions between states or actions in transition control systems⁴⁴. In this framework, dopamine does not just signal reward or drive motivation in a static way; it shapes how behaviors unfold in real time, including their initiation, maintenance, and termination. The primary role of DA is to alter the activation function of downstream neurons, such as striatal projection neurons. Because it is a gain signal, in accord with its role as a neuromodulator, it does not unconditionally cause behavior all by itself, but works in concert with corticostriatal commands, and possibly other glutamatergic pathways, for action generation.

DA signaling could have two components with different timing^21,24. In response to a salient stimulus, DA neurons can be activated by early-stage sensory pathways and show a short-latency burst. The early dopamine burst acts as a global alert signal, amplifying the gain on downstream circuits to promote a new behavioral transition. This “preparatory gain” boosts sensitivity or readiness, lowering the threshold for action initiation and predicting when a behavioral transition can occur (CR latency). Virtually all recorded DA neurons show this property. In principle, their uniform activation by salient stimuli allows them to act as a general preparatory gain signal that primes all potential action modules without specifying details of the action. This could be mediated by changing the excitability of many striatal neurons in response to inputs⁴⁵. The short-latency response does not show force tuning, as the detailed action command with kinematic information has not been formulated. On the other hand, once a particular action has been commanded (e.g., move forward), the basal ganglia output can initiate actions, while simultaneously sending an efference copy signal that modulates DA signaling in real time⁴⁶. This can be done via axon collaterals of GABAergic projection neurons from the VTA and SNr^26,47. In turn, DA adjusts the gain in corticostriatal transmission. According to this hypothesis, force tuning is a consequence of the slightly delayed “execution gain” that uses an online estimate of the ongoing behavioral state to adjust the system gain, a common method used in control systems to update gain dynamically according to task demands. This model is also supported by previous work showing a role for DA in regulating effort¹⁰, movement vigor⁴⁸, kinematics⁶, or the regulation of impulse in approach behavior¹⁸.

Although the present results provide an alternative explanation of results that are traditionally used to support the RPE hypothesis (Figs. 4–7), they do not rule out a role for DA in learning. There is a close relationship between learning and performance, though one cannot simply interpret any change in performance as learning⁴⁹. Learning can be defined as long-term changes in system parameters, which are presumably implemented by long-term synaptic plasticity, rather than a transient change in system performance⁴³. Many factors, such as motivational state, effector properties, and environmental changes could contribute to performance changes, without requiring long-term changes in system parameters. On the other hand, many forms of learning still require repetitions, similar to the more artificial repetitive stimulation patterns used in induction protocols for synaptic plasticity in in vitro studies. As adaptive gain, DA is necessary for behavioral repetition and persistence, and as such could contribute to long-term plasticity and learning. In this sense, DA plays a permissive role in learning. This account is in accord with short-term effects of DA modulation, such as altering neuronal excitability, and long-term effects of DA signaling in gating synaptic plasticity^45,50,51. According to this account, changes in DA signaling during learning are due to changes in upstream neurons (e.g., from the prefrontal cortex) that directly project to DA neurons⁵².

Explaining previous results on DA signaling

Our results may seem to contradict a large body of work on the role of VTA DA in learning. In particular, recent studies using optogenetics have attempted to demonstrate a causal role for DA in learning^12,53. It is therefore important to discuss some of the most relevant results, to see if our results shed light on their interpretation.

Steinberg et al. claimed to provide evidence for a causal role of VTA DA neurons in learning using a blocking task⁵³. In blocking, learning of a stimulus-reward association is prevented (blocked) in the presence of an established reward predictor. Around six decades ago, this observation motivated the formulation of learning models using prediction errors, according to which existing associative strength of all available predictors reduces the RPE and hence learning about a new CS. Steinberg et al. argued that stimulation of DA neurons at the time of reward produced a positive RPE to restore learning to the blocked stimulus. However, in their optogenetic experiments, there were no controls for stimulation-induced changes in attention to the CS or US, stimulus generalization, or CR performance. Since the behavioral measure they used was just time spent in the reward port, it is unclear what the CR was or how it was affected by stimulation. Indeed, because blocking can be reversed by manipulations like spontaneous recovery, post-training reminders, and post-training extinction of the competing stimulus, some have argued that it is due to a performance deficit rather than a failure to learn⁵⁴.

Saunders et al. found that optogenetically mimicking phasic DA signaling, instead of a natural reward, can artificially generate CRs. According to them, stimulation of DA neurons paired with neutral cues transforms these cues into CSs that elicit locomotion CRs. After training, the CR can start even before stimulation. These results do not necessarily support the RPE model. Saunders et al. did not use reward omission or manipulations of reward predictability to test RPE. All DA bursts were experimenter-controlled, ensuring consistent “reward” (laser stimulation) during paired trials. Although after extensive training CRs can start before stimulation onset, we cannot rule out residual effects of DA from previous trials contributing to CR production. DA can have postsynaptic effects (e.g., on excitability and GPCR-dependent intracellular signaling) that last for seconds, even when the extracellular DA has declined⁴⁵. In their study, there is no evidence that the CS simply evokes a CR without any DA stimulation, since there are no CS-only probe trials. In fact, their observations are in accord with the predictions of the adaptive gain model, with the preparatory gain phase mapping onto DA bursts that create CSs and drive CRs. DA and excitatory corticostriatal drive together determine whether striatal projection neurons (e.g., those in the nucleus accumbens) reach the threshold for firing. DA can also increase in response to a salient stimulus (preparatory gain), which is sufficient for behavioral activation if the relevant top-down cortical command for approach or locomotion is present.

It should also be noted that in the Saunders et al. study TH-Cre rats were used to activate DA neurons, but TH may not be a selective marker for DA neurons. Previous work in mice has shown that TH is also expressed in non-dopaminergic neurons¹⁴. Consequently, whether the results could be attributed to the activation of DA neurons per se remains open to debate. In our optogenetic experiments, we did not observe the generation of force CRs when we stimulated DA neurons in DAT-Cre mice in place of sucrose reward. This inconsistency with the Saunders et al. study could be attributed to the different promoters (DAT vs TH), or due to differences in the overall number of stimulation sessions. Only after learning of the CS−Reward association were we able to evoke movements with DA activation alone outside of the task. Our results suggest a performance-modulation role for DA once learning is established rather than one in which new CRs are generated.

Another relevant study by Iino et al. manipulated the DA dip that is normally observed following omission of predicted rewards⁵⁵. DA dips detected by D2 DA receptors (D2Rs) occurred when rewards were omitted (CS− trials), disinhibiting adenosine A2A receptor (A2AR)-mediated signaling in D2-SPNs to suppress licking to CS−. In the adaptive gain model, DA reductions can suppress ongoing behavior, enabling transitions to alternative states (e.g., pausing or withholding licking). Importantly, Iino et al. found that extinction learning did not involve DA dips or D2-SPNs. According to RPE, DA dips (negative RPE) should lead to extinction, but this prediction was not supported by their results, which suggest that DA dips are specific to discrimination tasks where a non-rewarded cue (CS−) is contrasted with a rewarded one (CS+).

Lee et al. found that inhibition of DA during reward consumption reduced the likelihood of subsequent CR. They argued that the manipulation generated a negative RPE. According to the RPE model, negative RPE signals would lead to a progressive weakening of the association, and relearning with positive RPE signals will be required to restore the original level of performance. However, mice resumed full CR generation immediately after the end of inhibition, without relearning (Fig. 4a and Extended Data Fig. 2). This pattern does not support their claim that inhibiting DA neurons introduces a negative RPE that weakens the CS-US association, since such weakening would predict gradual relearning with CS-US pairings. Instead, it supports a performance account, where DA modulates gain for CR generation. We also performed a similar experiment where we used GtACR2 to inhibit DA neurons at the US. In previous work, we have shown GtACR2 can successfully inhibit DA spiking¹⁸. We did not observe changes in subsequent licking CRs (Fig. 10k–m). As Lee et al. did not measure force exertion, it is unclear exactly how their mice changed their behavior during learning. Differences in experimental procedures (e.g., using a different opsin (NpHR)) may also account for the conflicting results.

Recent work by Cai et al. has shown that abolishing phasic DA does not abolish spontaneous locomotion in mice⁴². They therefore argue that phasic DA is not necessary for movement per se, but contributes to the generation of reward-guided behavior. But their results are incompatible with the RPE hypothesis, as abolishing phasic DA did not affect learning. The finding that mice can still move without significant phasic DA signaling is not surprising given the adaptive gain model. As a gain signal, DA is expected to amplify the corticostriatal drive but is not unconditionally necessary for movement. With low gain, it is still possible to generate some behaviors, given sufficient driving signal from excitatory transmission. One would simply expect behavior to be less frequent and slowed, and indeed this was precisely what Cai et al. observed. Consequently, their results also contradict the RPE model but can be explained by the adaptive gain model.

In short, although results from these studies as well as many others are often considered to support the RPE hypothesis, they can in fact be better explained by the adaptive gain model. Previous studies often neglected subtle changes in behavioral performance during learning, which can be masked by the common practice of averaging across many trials with variable movement properties. A common practice is to align DA activity to experimenter-determined timestamps for events such as the CS and US, instead of aligning to behavioral transitions. This is not surprising since previous work did not use continuous behavioral measures. As we have shown, aligning to CS and US can lead to misinterpretation, and detailed behavioral analysis based on continuous behavioral measures is therefore critical for understanding the relationship between DA signaling and behavior^56,57.

Methods

Mice

All experimental procedures were approved by the Animal Care and Use Committee at Duke University (Protocol # 162-22-09). 11 DAT-ires-Cre mice, 6 DAT-Cre + Ai32 mice, 2 VGAT-Cre mice, and 8 wild-type male and female mice were used (Jackson Labs, Bar Harbor, ME). Sex was not considered in study design. DAT::Ai32 mice were generated by crossing Ai32 mice, which express channelrhodopsin (ChR2) in neurons with Cre recombinase, and DAT-Cre mice, which express Cre under the control of the dopamine transporter (DAT) promoter. Mice (2–8 months old) were group housed on a 12:12 light cycle, with experimentation occurring during the light phase. During testing, mice were put on water restriction and maintained at 85–90% of their initial body weights. They received free access to water for approximately two hours following daily experimental sessions.

Viral constructs

rAAV9.EF1α.DIO.hChR2(H134R) (Addgene plasmid # 35507) and pAAV-hSyn1-SIO-stGtACR2-FusionRed (Addgene # 105677-AAV1) were used in this study.

Surgery

Mice were anesthetized with 2.0–3.0% isoflurane and then placed into a stereotactic frame (David Kopf Instruments, Tujunga, CA) and maintained at 1.0–1.5% isoflurane for surgical procedures. A craniotomy was then drilled above the VTA (AP: 3.1–3.4 mm relative to bregma, ML: 0.4–0.6 mm relative to bregma, DV: 4.0–4.4 mm relative to brain surface). For electrophysiological recordings, drivable electrodes were placed just above the VTA (AP: 3.2–3.4 mm, ML 0.3–0.6 mm, DV, −3.8 mm) and 16-channel recording electrodes were lowered into the VTA (AP: 3.2–3.4 mm, ML 0.5 mm, DV, −4.0–4.4 mm). For optotagging experiments, an optic fiber was attached to the electrode array at an angle (~15°). For optogenetic experiments, 300 nL of DIO-ChR2 or SIO-stGtACR2 were bilaterally infused into the VTA of DAT-Cre or WT mice using a microinjector (Nanoject 3000, Drummond Scientific) at a rate of 1 nL per second. The injection pipette was left in place for 3–5 min to allow the virus to absorb into the brain tissue and to prevent leakage. Custom-made optic fibers (5–6 mm length below ferrule, >70% transmittance, 105 μm core diameter) were then implanted at an angle (15°) above the VTA (AP: 3.2–3.4 mm, ML: 1.6 mm, DV: 3.8 mm). Fibers and electrodes were secured to the skull using screws and dental acrylic and all mice were fitted with a titanium headbar implant for head fixation. All mice were allowed to recover for two weeks before beginning training on the Pavlovian task.

Histology

For brain collection, mice were anesthetized in an induction chamber with 3–5% isoflurane until immobile. Depth of anesthesia was confirmed by the absence of a toe-pinch response and by steady respiration. Under deep anesthesia, mice were euthanized by transcardial perfusion with 0.1 M phosphate-buffered saline (PBS) followed by 4% paraformaldehyde (PFA). Brains were then collected to confirm viral expression as well as optic fiber and electrode placement.

To confirm placement, brains were stored in 4% PFA with 30% sucrose for 72 h. Tissue was then post-fixed for 24 h in 30% sucrose before cryostat sectioning coronally (Leica CM1850) at 60 µm. Fiber and electrode implantation sites were then verified. To confirm eYFP and FusionRed expression in DAT+ cells in the VTA of DAT-ires-Cre and DAT + Ai32 transgenic mice, sections were rinsed in 0.1 M PBS for 20 min before being placed in a PBS-based blocking solution. The solution contained 5% goat serum and 0.1% Triton X-100 and was allowed to sit at room temperature for 1 h. Sections were then incubated with a primary antibody (polyclonal rabbit anti-TH 1:500 dilution, ThermoFisher, catalog no. P21962; polyclonal chicken anti-EGFP, 1:500 dilution, Abcam, catalog no. ab13970) in blocking solution overnight at 4 °C. Sections were then rinsed in PBS for 20 min before being placed in a blocking solution with secondary antibody used to visualize DAT neurons in the VTA (goat anti-rabbit Alexa Fluor 594, 1:1000 dilution, Abcam, catalog no. ab150080; goat anti-chicken Alexa Fluor 488, 1:1000 dilution, Life Technologies, catalog no. A11039) for 1 h at room temperature. Sections were mounted and immediately coverslipped with Fluoromount G with DAPI medium (Electron Microscopy Sciences; catalog no. 17984-24). Placement was validated using an Axio Imager.V16 upright microscope (Zeiss) and fluorescent images were acquired and stitched using a Z780 inverted microscope (Zeiss).

Head-fixed behavioral system

The head-fixation device for measuring forces exerted by mice during behavioral testing and stimulation was described previously^16,17. Briefly, the head was clamped via the headbar into the head-fixation frame, which contained force sensors (100 × g load-cells, RB-Phil-203, RobotShop.com). Load cells measure force by linearly translating mechanical deformations into a voltage signal. This voltage signal was then amplified using an INA125P (Texas Instruments) in a circuit configuration that allowed for bidirectional measurement of force. Load cell voltages (1 kHz sampling rate), electrophysiological data, and timestamps for licks, reward, and laser were recorded with a Cerebus data acquisition system (Blackrock Microsystems) for offline analysis. A spout connected to a reservoir with a 10% sucrose solution was positioned in front of the mouth or slightly below it (Behind condition). Reward delivery was controlled by opening a solenoid valve (161T010, NResearch, NJ) attached to the tubing connected to the spout. A capacitance-touch sensor (MPR121, AdaFruit.com) attached to the spout was used to detect licks.

Pavlovian stimulus-reward task

Mice were first allowed to habituate to the head-fixed condition for 5 min for approximately 1–2 days. Once habituated to head-fixation, they were trained on approximately 70–100 trials a day. A spout that delivers 10% sucrose was positioned in front of the mouth. White noise was continually present in the background. At the beginning of each trial, a 3 kHz tone that lasted 200 ms was presented, followed by delivery of 5–30 μL of sucrose 800 ms after the end of the tone. The delay between onset of the tone and reward delivery was therefore 1 s. The intertrial interval varied randomly (4–60 s) to prevent any anticipation of the onset of the tone.

Air puffs were delivered using an EFD 1500 XL pneumatic fluid dispenser. The puff lasted 20 ms and the output tube was aimed at the face. Air puff trials were conducted as distinct sessions outside of the cue-reward sessions. There was a 5-min break between the end of a reward session and the beginning of an air puff session.

For Spout Behind sessions, the spout tip was moved slightly underneath (2 mm change in position, Supplementary Movie 2) the chin of the mice so that they had to move backwards to obtain water reward. For reward probability manipulations, sessions contained reward delivery of either 100% or 50% reward probability. For reward magnitude manipulations, sessions contained either small rewards (20 ms reward duration, 5 µl) or large rewards (60 ms reward duration, 15 µl).

Wireless in vivo electrophysiology

Drivable electrodes were single-drive movable micro-bundles of tungsten electrodes (1 × 16; 23 μm diameter) placed within a guide cannula (Innovative Neurophysiology, Inc.). Electrophysiological data were recorded using a wireless head stage (Triangle Biosystems) that was interfaced with a Blackrock Cerebus data acquisition system (Blackrock Microsystems). A digital bandpass filter was applied to the electrophysiological data (250 Hz–5 kHz) and spike timestamps and waveforms were recorded at 30 kHz. Filtered data were sorted using Offline Sorter (Plexon). A 3:1 signal-to-noise ratio, and an 800 μs or greater refractory period were required for the neural data to be used for analysis. Single units were selected based on a principal component analysis of waveforms using 2 principal components. To obtain new neurons with drivable electrodes, the electrodes were lowered by ~25 μm after each behavioral session. By comparing spike waveforms from consecutive recordings, we estimated that only about 4% of the recorded units may come from repeated recording of the same neurons. These units were not omitted from the dataset, as it is not possible to conclusively establish that they are from the same units despite similar waveforms. All peri-event raster plots were generated using NeuroExplorer (Nex Technologies).

Optogenetic identification of VTA DA neurons

Several populations have wide waveforms and low firing rates characteristic of DA neurons (Supplementary Fig. 3). However, prior work has shown that waveforms and firing rate cannot establish neuronal identity in some VTA neurons^58,59. We therefore used optogenetic tagging to confirm DA neuron identity. We attached fiber optic implants to our drivable electrodes¹⁶. Only fibers with ≥ 70% light transmittance measured through the optic fiber tip (PM120VA, ThorLabs) were used. Light (5–8 mW, 5 ms pulse width, 10–20 Hz for 1 s for ChR2) was delivered via laser (470 nm DPSS laser, Shanghai Laser & Optics) at the end of each behavioral session. Neurons were classified as tagged if each pulse of light produced a spike occurring with a latency of ≤ 7 ms and on ≥ 70% of trials (Supplementary Fig. 2). A total of 1683 neurons were recorded, of which 98 were tagged using optogenetics.
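The tagging criterion can be illustrated with a minimal sketch. It assumes that the criterion is applied per light pulse, i.e., that a unit is tagged when at least 70% of pulses are followed by a spike within 7 ms. The function name `is_tagged` and its arguments are hypothetical and are not taken from the original analysis.

```python
from bisect import bisect_left

def is_tagged(spike_times, pulse_times, max_latency=0.007, min_fraction=0.70):
    """Return True if enough light pulses are followed by a short-latency spike.

    spike_times and pulse_times are in seconds; spike_times must be sorted.
    Assumed reading of the criterion: a pulse counts as a hit when the first
    spike at or after pulse onset occurs within max_latency.
    """
    if not pulse_times:
        return False
    hits = 0
    for pulse in pulse_times:
        i = bisect_left(spike_times, pulse)  # first spike at or after the pulse
        if i < len(spike_times) and spike_times[i] - pulse <= max_latency:
            hits += 1
    return hits / len(pulse_times) >= min_fraction
```

For a 10 Hz, 1 s train (10 pulses), a unit that responds within 4 ms on 8 pulses would pass this check, whereas a unit that responds on only 6 pulses would not.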

Classification of VTA neurons

For each unit, firing rates were estimated in a 3 s window beginning 0.5 s before the tone and lasting 3 s after the tone using 25 ms bins. Neural firing rates were only estimated on trials with the Spout in Front position and that contained CRs. Neuron firing rates were baseline-subtracted by the average of all pre-tone baseline firing rates (first 50 bins) and normalized to obtain the z-score for each unit around that baseline. These responses were concatenated along with 50 replicates of the baseline rate to generate a functional vector for each unit. The functional vectors representing the entire population were stacked together into one matrix. We then used agglomerative clustering on this matrix, which resulted in 5 different response profiles. All preprocessing was done in Matlab and clustering was performed with the function clusterdata using Euclidean distance and Ward linkage parameters. After clustering was complete, cells were manually checked for the consistency of the responses. Some neurons were found to be clustered incorrectly and were then manually changed to the appropriate cluster. We confirmed that these functional classes corresponded to distinct waveform profiles of DA and GABAergic populations. All three DA populations are distinct from our GABAergic populations and clustered together with the confirmed tagged DA neurons (Supplementary Fig. 4). All 3 populations were analyzed as DA neurons in the manuscript.
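The baseline normalization that precedes clustering can be sketched as follows. This is an illustrative sketch only: it assumes the z-score uses the mean and sample standard deviation of the pre-tone bins, and it does not reproduce the clustering step itself (Matlab `clusterdata` with Euclidean distance and Ward linkage). The name `zscore_to_baseline` is hypothetical.

```python
from statistics import mean, stdev

def zscore_to_baseline(rates, n_baseline=50):
    """Z-score a binned firing-rate trace against its first n_baseline bins.

    Assumption: the baseline mean and sample standard deviation (n - 1)
    are used for normalisation; the source does not state the exact formula.
    """
    baseline = rates[:n_baseline]
    mu = mean(baseline)
    sd = stdev(baseline)
    if sd == 0:
        raise ValueError("baseline has zero variance; z-score is undefined")
    return [(r - mu) / sd for r in rates]
```

Each unit's normalized trace would then form one row of the population matrix passed to the clustering routine.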

To classify DA neurons according to spontaneous responses, a similar approach was used. We first estimated the firing rates of all DA neurons around spontaneously generated forward and backward movements using 10 ms bins and baseline subtracted by the average of all pre-event baseline firing rates (first 50 bins). Mean firing rates during both forward and backward movements were then combined to produce a spontaneous activity vector. These vectors were assembled into a single matrix representing all DA neurons. We then used agglomerative clustering, which resulted in 6 different response profiles. All preprocessing was done in Matlab and clustering was performed with the function clusterdata using Euclidean distance and Ward linkage parameters. The resulting groups fell into either Forward DA, Backward DA, Increasing non-selective DA, decreasing non-selective DA neurons, in addition to a small group of unrelated cells.

Optogenetic parameters

Optogenetic stimulation sessions were identical to the Pavlovian conditioning tasks with the electrophysiological recordings, as described above. Pulses of light (Excitation: 5–8 mW measured at the tip of the optic fiber connected to the optic implant, 5 ms pulse width, 30 Hz 5–30 pulses; Inhibition: 5 mW, 1 s pulse width) were delivered via a laser (470 nm DPSS laser, Shanghai Laser & Optics) and controlled through an Arduino.

Force conversions

Force conversion of load cell signals was described previously^16,17. Briefly, we calibrated the load cell circuits using a conversion factor (expressed in Newtons per Volt) determined by the linear relationship between the voltage changes resulting from known masses placed on the sensor. Force was determined by multiplying voltage signal by the conversion factor to obtain a value in Newtons. Impulse was calculated using the Matlab function trapz to integrate the total area under the force curve over time from movement onset to movement termination¹⁸. The change in force was the first derivative of the force signal, calculated using the diff function in Matlab. The resulting values were divided by the bin size (1 ms) and smoothed by convolving the signal with a Gaussian filter with a standard deviation of 12 bins.
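The conversion steps above can be sketched in a few dependency-free Python functions. This is not the original Matlab code. The function names are hypothetical, the Gaussian kernel is truncated at 4 standard deviations, and the edge renormalization is an assumption that the source does not specify.

```python
import math

def volts_to_newtons(voltage, newtons_per_volt):
    """Scale load-cell voltage by the calibration factor (N per V)."""
    return [v * newtons_per_volt for v in voltage]

def impulse(force, dt=0.001):
    """Trapezoidal integral of force over time (N*s), analogous to trapz."""
    return sum((a + b) / 2.0 * dt for a, b in zip(force, force[1:]))

def change_in_force(force, dt=0.001):
    """First difference divided by bin size (N/s); one sample shorter than force."""
    return [(b - a) / dt for a, b in zip(force, force[1:])]

def gaussian_smooth(signal, sd_bins=12.0):
    """Smooth with a Gaussian kernel (SD in bins), renormalised near the edges."""
    half = int(math.ceil(4 * sd_bins))  # assumed truncation at 4 SD
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sd_bins) ** 2) for k in offsets]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        wsum = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                acc += w * signal[j]
                wsum += w
        out.append(acc / wsum)
    return out
```

With 1 kHz sampling, `dt=0.001` corresponds to the 1 ms bin size and `sd_bins=12.0` to the 12-bin standard deviation stated above.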

Detection of movement initiation

Forward movements were defined as force exertions that exceeded a threshold of 0.5 standard deviation of force greater than 0, lasted longer than 100 ms, and were also separated by at least 100 ms from another movement. Backward movements were defined as events lower than 1.5 standard deviations of all force values below 0. These events had to last longer than 100 ms and had to be separated by at least 500 ms. Backward movements had to occur independently of forward movements and so could not coincide with the end of any forward movement, as sometimes mice moved backward immediately after moving forward. CRs were defined as the first force exertion after the CS. URs were defined as first exertion events occurring after the reward. During early stages of training, URs were detected using first lick onset to overcome high variability in force direction. Spontaneous movements were identified as any forward or backward movements occurring in the intervals beginning 3 s after any stimulus (tones, rewards, or air puff) and ending 0.5 s before the next stimulus.
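A threshold-based detector of this kind might look like the sketch below. It rests on several assumptions that the text does not settle: the forward threshold is read as 0.5 sample standard deviations of the positive force samples, the separation rule is measured from the end of the previous suprathreshold run, and all names are hypothetical.

```python
from statistics import stdev

def forward_threshold(force, n_sd=0.5):
    """n_sd sample standard deviations of the positive force samples (assumed reading)."""
    positive = [f for f in force if f > 0]
    return n_sd * stdev(positive)

def detect_onsets(force, threshold, min_duration=100, min_gap=100):
    """Return onset indices of runs where force > threshold.

    A run is kept only if it lasts longer than min_duration samples and
    starts at least min_gap samples after the previous run ended.
    At 1 kHz sampling, one sample equals one millisecond.
    """
    onsets = []
    prev_end = None
    start = None
    # A trailing sentinel closes a run that is still open at the end.
    for i, f in enumerate(list(force) + [float("-inf")]):
        if f > threshold and start is None:
            start = i
        elif f <= threshold and start is not None:
            long_enough = (i - start) > min_duration
            isolated = prev_end is None or (start - prev_end) >= min_gap
            if long_enough and isolated:
                onsets.append(start)
            prev_end = i
            start = None
    return onsets
```

Backward movements could be handled by negating the force trace and calling `detect_onsets` with a threshold of 1.5 standard deviations of the negative samples and `min_gap=500`; the additional exclusion of events that coincide with the end of a forward movement is not implemented here.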

To see if DA activity can predict latency, trials with force exertion between 0 and 1 s following CS presentation were selected. Firing rate on a given trial was estimated by taking the inverse of the mean inter-spike-interval for all pre-movement spikes. Trials were binned according to latency into 20 bins. Neurons whose CS activation did not exceed a z-score value of 3 for two consecutive time bins were not included in the analysis. For each CS-modulated neuron, the mean firing rates observed during trials with a specific binned latency value were assigned to that bin and averaged. R² was computed using the averaged firing rates corresponding to binned latency values (Matlab function corr).
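The latency analysis can be sketched in three steps: estimate a per-trial rate from inter-spike intervals, average rates within latency bins, and square the Pearson correlation between bin latency and mean rate. The sketch assumes 20 equal-width bins spanning 0 to 1 s, which the text does not state explicitly, and all function names are hypothetical.

```python
from statistics import mean

def rate_from_isi(spike_times):
    """Firing rate (Hz) as the inverse of the mean inter-spike interval."""
    if len(spike_times) < 2:
        return None  # rate is undefined with fewer than two spikes
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return 1.0 / mean(isis)

def bin_means(latencies, rates, n_bins=20, lo=0.0, hi=1.0):
    """Average rates within equal-width latency bins (assumed binning scheme).

    Returns (bin_centers, mean_rates) for non-empty bins only.
    """
    width = (hi - lo) / n_bins
    buckets = [[] for _ in range(n_bins)]
    for lat, r in zip(latencies, rates):
        if lo <= lat < hi:
            k = min(int((lat - lo) / width), n_bins - 1)
            buckets[k].append(r)
    centers, means = [], []
    for k, b in enumerate(buckets):
        if b:
            centers.append(lo + (k + 0.5) * width)
            means.append(mean(b))
    return centers, means

def r_squared(x, y):
    """Square of the Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)
```

Calling `r_squared` on the output of `bin_means` would give one R² value per CS-modulated neuron under these assumptions.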

Training stages were defined as early, intermediate, and late by the following criteria. Early training: initial sessions with CR latency variance above the 95th percentile across all sessions recorded for the mouse. Intermediate training: sessions not classified as early but that were above the median CR latency variance across all sessions. Late training: sessions with CR latency variance below the median CR latency variance across all sessions.
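The staging rule can be expressed as a short sketch. It assumes a linearly interpolated percentile (the convention used in the original analysis is not stated), assigns sessions exactly at the median to the late stage, and ignores the requirement that early sessions be initial ones. The names `percentile` and `training_stages` are hypothetical.

```python
from statistics import median

def percentile(values, q):
    """Linearly interpolated percentile, q in 0-100 (assumed convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def training_stages(latency_variances):
    """Label each session of one mouse from its CR latency variance."""
    p95 = percentile(latency_variances, 95)
    med = median(latency_variances)
    stages = []
    for v in latency_variances:
        if v > p95:
            stages.append("early")
        elif v > med:
            stages.append("intermediate")
        else:
            stages.append("late")
    return stages
```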

Analysis of force tuning

Recent work found that VTA DA neurons can be identified based on their responses to forward and backward movements¹⁸. Because cell responses during Pavlovian conditioning are not entirely indicative of direction tuning, we also classified VTA DA neurons according to their preferred direction by comparing the neural activity of all recorded neurons during spontaneous forward and backward movement events, which were a consistent feature of mouse behavior independent of learning (Fig. 1 and Supplementary Fig. 1). This approach allowed us to identify force tuning independently of Pavlovian conditioning. Change in force during spontaneous movements from −500 ms to +500 ms around movement initiation was used (25 ms bins). Firing rate for each neuron was also binned into the same bins, creating two vectors. A cross-correlation was computed between the firing activity vectors and the change in force for the neuron’s preferred direction of force exertion to determine the time shift between the two signals. Neural activity was then shifted according to the lag between its activity and change in force in the preferred direction. We next sorted the change in force vector according to magnitude and excluded outliers (1st and 99th percentile), which contain very few replicates. We averaged neural activity obtained concurrently with forces from 10 evenly spaced monotonically increasing force bins that evenly spanned the range of force changes. This same procedure was used when analyzing the relationships between single unit activity and force in the non-preferred direction.

Cross-correlation analysis was performed in Matlab. Neuron firing was referenced to either Force or Change in force, depending on the signal best matched to the firing profile (broad modulation or bursting). 25 ms bins were used. The cross-correlogram was smoothed with a Gaussian filter with SD = 3, and the lag that corresponded to the peak of the cross-correlogram was collected for each neuron. For neurons that showed only decreasing changes in firing (DA decreasing), we used time to the minimum of the cross-correlogram as the lag.
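A lag estimate of this kind can be sketched as below. This is not the original Matlab analysis. It assumes mean-subtracted signals, an unnormalized cross-correlogram, and a Gaussian kernel truncated at 4 standard deviations with edge renormalization. The function names and the sign convention (positive lag means firing follows force) are choices made for illustration.

```python
import math

def cross_correlogram(rate, force, max_lag):
    """Mean-subtracted cross-correlation; positive lag means rate follows force."""
    n = len(rate)
    mr = sum(rate) / n
    mf = sum(force) / n
    r = [x - mr for x in rate]
    f = [x - mf for x in force]
    lags = list(range(-max_lag, max_lag + 1))
    cc = []
    for lag in lags:
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += f[i] * r[j]
        cc.append(total)
    return lags, cc

def gaussian_smooth(signal, sd_bins=3.0):
    """Gaussian smoothing (SD in bins), renormalised near the edges (assumption)."""
    half = int(math.ceil(4 * sd_bins))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sd_bins) ** 2) for k in offsets]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        wsum = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                acc += w * signal[j]
                wsum += w
        out.append(acc / wsum)
    return out

def peak_lag(lags, cc, sd_bins=3.0, use_min=False):
    """Lag (in bins) at the peak of the smoothed correlogram, or at its minimum."""
    smoothed = gaussian_smooth(cc, sd_bins)
    pick = min if use_min else max
    idx = pick(range(len(smoothed)), key=smoothed.__getitem__)
    return lags[idx]
```

With 25 ms bins, a returned lag of 3 would correspond to 75 ms; `use_min=True` mirrors the treatment of neurons that only decrease their firing.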

Analysis of DA activity in relation to consummatory behavior

For each trial, we calculated the lick rate and the area under the force curve (Impulse) over the 1 s window after reward. The average change in force was taken over a 500 ms window over reward. The estimated firing rate for each DA neuron was also calculated for each trial over a 250 ms window, and these estimates were averaged across the entire DA population to generate a single DA firing rate estimate per trial. Behavior variables and DA population rates were binned over all training sessions into 25 trial bins for linear regression analysis.

Licking activity for each trial was aligned to the US. Instantaneous change in licking rate was estimated by calculating the inverse of each inter-lick-interval. The instantaneous change in lick rate was given a timestamp by taking the midpoint between its corresponding two lick timestamps, and these values were binned into a time bin vector spanning −1 to +2 s from the US, with 25 ms bins. Estimated firing rates from DA neurons were determined in the same manner and were binned into the same time bins as licking. The final latency to change in lick rate for each subject was found by taking the latency from the US to the peak change in lick rate.
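The lick-rate estimate can be sketched as follows, under the assumption that values falling in the same 25 ms bin are averaged (the text does not say how multiple values per bin were combined). Times are in seconds relative to the US, and the function names are hypothetical.

```python
def instantaneous_lick_rate(lick_times):
    """Return (midpoint_time, rate_hz) for each pair of consecutive licks."""
    out = []
    for a, b in zip(lick_times, lick_times[1:]):
        if b > a:  # skip duplicate timestamps to avoid division by zero
            out.append(((a + b) / 2.0, 1.0 / (b - a)))
    return out

def bin_values(samples, t_start=-1.0, t_stop=2.0, bin_width=0.025):
    """Average (time, value) samples into fixed-width bins; empty bins are None."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in samples:
        k = int((t - t_start) // bin_width)
        if 0 <= k < n_bins:
            sums[k] += v
            counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

The same two functions could be applied to spike timestamps to obtain the firing rate estimate on the same time base.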

Computational model of DA concentration

Extrasynaptic concentrations over time (d[DA]/dt) are determined by spike rate (FR_t) and reuptake kinetics (K_m & V_max). Differences in concentration in target regions determine direction of force exertion. We assume that Forward and Backward VTA DA neurons have different target cells that contribute to force generation in a preferred direction. To approximate the resulting force, we compute the difference between predicted DA concentrations resulting from activity in these two DA populations. Following a previous model, we describe DA concentration dynamics in the striatum as follows:

$$\frac{d[DA]}{dt} = [DA]_{release} \cdot FR_{t} - \frac{[DA]_{t} \cdot V_{\max}}{K_{m} + [DA]_{t}} \qquad (1)$$

where [DA]_t is the average DA concentration at time t in the target striatal area (in µM), [DA]_release is the instantaneous increase in DA concentration (in µM) proportional to the DA population’s firing rate FR_t (in Hz), V_max is the maximal reuptake rate of DA in the striatum, and K_m is the Michaelis–Menten parameter that is the reciprocal of the affinity of the DA transporter. Value constraints for all parameters were determined using prior experimental and modeling work^19,20,60. We neglect the contribution of diffusion, degradation, and glial uptake on DA concentrations, and only consider reuptake in our model. No tuning was done to fit the data and no filters were applied. All curves were set at the same average starting baseline and the model curves are scaled so that they have the same maximum as the experimental data.
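Equation (1) can be integrated numerically with a forward-Euler scheme, as in the sketch below. The integration method, the time step, and in particular the default parameter values are illustrative assumptions and are not the values used in this study. The function names are hypothetical.

```python
def simulate_da(firing_rate_hz, dt=0.001, da_release=0.05, v_max=4.0, k_m=0.2, da0=0.0):
    """Forward-Euler integration of d[DA]/dt = release*FR - Vmax*[DA]/(Km + [DA]).

    firing_rate_hz: population firing rate per time step (Hz).
    da_release (uM per spike), v_max (uM/s) and k_m (uM) are placeholder
    values chosen for illustration only, not the authors' parameters.
    """
    da = da0
    trace = []
    for fr in firing_rate_hz:
        uptake = v_max * da / (k_m + da)
        da = max(0.0, da + dt * (da_release * fr - uptake))
        trace.append(da)
    return trace

def force_proxy(forward_rate_hz, backward_rate_hz, **params):
    """Difference between predicted DA driven by Forward and Backward populations."""
    fwd = simulate_da(forward_rate_hz, **params)
    bwd = simulate_da(backward_rate_hz, **params)
    return [f - b for f, b in zip(fwd, bwd)]
```

For a constant input, the concentration settles where release equals reuptake, [DA] = K_m·r/(V_max − r) with r = [DA]_release·FR, which provides a simple check on the integration.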

Quantification and statistical analyses

All analyses were performed with Matlab, NeuroExplorer, and Graphpad Prism. Statistical analyses were performed in Graphpad Prism. Post-hoc tests for multiple comparisons were reported for all significant one-way ANOVAs and all two-way ANOVA analyses with significant interactions between categories. A power analysis was not conducted a priori to determine sample size.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Data are available at https://doi.org/10.6084/m9.figshare.29485934. Source data are provided with this paper.

Code availability

No custom code was used for data analysis.

References

Morales, M. & Margolis, E. B. Ventral tegmental area: cellular heterogeneity, connectivity and behaviour. Nat. Rev. Neurosci. 18, 73–85 (2017).
Hollerman, J. R. & Schultz, W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat. Neurosci. 1, 304–309 (1998).
Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
Cagniard, B. et al. Dopamine scales performance in the absence of new learning. Neuron 51, 541–547 (2006).
Yin, H. H., Zhuang, X. & Balleine, B. W. Instrumental learning in hyperdopaminergic mice. Neurobiol. Learn Mem. 85, 283–288 (2006).
Barter, J. et al. Beyond reward prediction errors: the role of dopamine in movement kinematics. Front. Integr. Neurosci. 9, 39 (2015).
Berridge, K. C. The debate over dopamine’s role in reward: the case for incentive salience. Psychopharmacology 191, 391–431 (2007).
Randall, P. A. et al. Dopaminergic modulation of effort-related choice behavior as assessed by a progressive ratio chow feeding choice task: pharmacological studies and the role of individual differences. PLoS One 7, e47934 (2012).
Panigrahi, B. et al. Dopamine is required for the neural representation and control of movement vigor. Cell 162, 1418–1430 (2015).
Salamone, J. D. & Correa, M. The mysterious motivational functions of mesolimbic dopamine. Neuron 76, 470–485 (2012).
Mohebi, A. et al. Dissociable dopamine dynamics for learning and motivation. Nature 570, 65–70 (2019).
Saunders, B. T., Richard, J. M., Margolis, E. B. & Janak, P. H. Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072 (2018).
Maes, E. J. et al. Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nat. Neurosci. 23, 176–178 (2020).
Lammel, S. et al. Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons. Neuron 85, 429–438 (2015).
Keiflin, R., Pribut, H. J., Shah, N. B. & Janak, P. H. Ventral tegmental dopamine neurons participate in reward identity predictions. Curr. Biol. 29, 93–103.e103 (2019).
Bakhurin, K. I., Hughes, R. N., Barter, J. W., Zhang, J. & Yin, H. H. Protocol for recording from ventral tegmental area dopamine neurons in mice while measuring force during head-fixation. STAR Protocols 1, 100091 (2020).
Hughes, R. N., Bakhurin, K. I., Barter, J. W., Zhang, J. & Yin, H. H. A head-fixation system for continuous monitoring of force generated during behavior. Front. Integr. Neurosci. 14, 11 (2020).
Hughes, R. N. et al. Ventral tegmental dopamine neurons control the impulse vector during motivated behavior. Curr. Biol. 30, 1–14 (2020).
Wightman, R. M. & Zimmerman, J. B. Control of dopamine extracellular concentration in rat striatum by impulse flow and uptake. Brain Res. Rev. 15, 135–144 (1990).
Jones, S. R., Joseph, J. D., Barak, L. S., Caron, M. G. & Wightman, R. M. Dopamine neuronal transport kinetics and effects of amphetamine. J. Neurochem. 73, 2406–2414 (1999).
Schultz, W. Dopamine reward prediction-error signalling: a two-component response. Nat. Rev. Neurosci. 17, 183–195 (2016).
Horvitz, J. C. Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events. Neuroscience 96, 651–656 (2000).
Rossi, M. A., Fan, D., Barter, J. W. & Yin, H. H. Bidirectional modulation of substantia nigra activity by motivational state. PLoS ONE 8, e71598 (2013).
Redgrave, P., Prescott, T. J. & Gurney, K. Is the short-latency dopamine response too short to signal reward error? Trends Neurosci. 22, 146–151 (1999).
Kutlu, M. G. et al. Dopamine release in the nucleus accumbens core signals perceived saliency. Curr. Biol. 31, 4748–4761.e4748 (2021).
Jiang, Q. et al. GABAergic neurons in the ventral tegmental area represent and regulate force vectors. Cell Rep. 44, 115313 (2025).
Fiorillo, C. D. Two dimensions of value: dopamine neurons represent reward but not aversiveness. Science 341, 546–549 (2013).
Matsumoto, M. & Hikosaka, O. Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447, 1111–1115 (2007).
Eshel, N. et al. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246 (2015).
Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Coddington, L. T. & Dudman, J. T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat. Neurosci. 21, 1563 (2018).
Waelti, P., Dickinson, A. & Schultz, W. Dopamine responses comply with basic assumptions of formal learning theory. Nature 412, 43–48 (2001).
McClure, S. M., Daw, N. D. & Read Montague, P. A computational substrate for incentive salience. Trends Neurosci. 26, 423–428 (2003).
Schultz, W. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27 (1998).
Mahn, M. et al. High-efficiency optogenetic silencing with soma-targeted anion-conducting channelrhodopsins. Nat. Commun. 9, 4125 (2018).
Lee, K. et al. Temporally restricted dopaminergic control of reward-conditioned movements. Nat. Neurosci. 23, 209–216 (2020).
Rossi, M. A. & Yin, H. H. Elevated dopamine alters consummatory pattern generation and increases behavioral variability during learning. Front. Integr. Neurosci. 9, 37 (2015).
Bakhurin, K. I. et al. Opponent regulation of action performance and timing by striatonigral and striatopallidal pathways. eLife 9, e54831 (2020).
Choi, W. Y., Morvan, C., Balsam, P. D. & Horvitz, J. C. Dopamine D1 and D2 antagonist effects on response likelihood and duration. Behav. Neurosci. 123, 1279–1287 (2009).
McCullough, L. D., Cousins, M. S. & Salamone, J. D. The role of nucleus accumbens dopamine in responding on a continuous reinforcement operant schedule: a neurochemical and behavioral study. Pharmacol. Biochem. Behav. 46, 581–586 (1993).
Cai, X. et al. Dopamine dynamics are dispensable for movement but promote reward responses. Nature 635, 406–414 (2024).
Yin, H. The Integrative Functions of the Basal Ganglia (CRC Press, 2023).
Yin, H. H. How basal ganglia outputs generate behavior. Adv. Neurosci. 2014, 768313 (2014).
Lahiri, A. K. & Bevan, M. D. Dopaminergic transmission rapidly and persistently enhances excitability of D1 receptor-expressing striatal projection neurons. Neuron 106, 277–290.E6 (2020).
Kim, N. et al. A striatal interneuron circuit for continuous target pursuit. Nat. Commun. 10, 2715 (2019).
Barter, J. W. et al. Basal ganglia outputs map instantaneous position coordinates during behavior. J. Neurosci. 35, 2703–2716 (2015).
da Silva, J. A., Tecuapetla, F., Paixão, V. & Costa, R. M. Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554, 244 (2018).
Kimble, G. A. Hilgard and Marquis’ Conditioning and Learning 2nd edn (Appleton-Century-Crofts, 1961).
Thurley, K., Senn, W. & Luscher, H.-R. Dopamine increases the gain of the input-output response of rat prefrontal pyramidal neurons. J. Neurophysiol. 99, 2985–2997 (2008).
Calabresi, P., Picconi, B., Tozzi, A. & Di Filippo, M. Dopamine-mediated regulation of corticostriatal synaptic plasticity. Trends Neurosci. 30, 211–219 (2007).
Kim, I. H. et al. Spine pruning drives antipsychotic-sensitive locomotion via circuit control of striatal dopamine. Nat. Neurosci. 18, 883–891 (2015).
Steinberg, E. E. et al. A causal link between prediction errors, dopamine neurons and learning. Nat. Neurosci. 16, 966–973 (2013).
Blaisdell, A. P., Gunther, L. M. & Miller, R. R. Recovery from blocking achieved by extinguishing the blocking CS. Anim. Learn. Behav. 27, 63–76 (1999).
Iino, Y. et al. Dopamine D2 receptors in discrimination learning and spine enlargement. Nature 579, 555–560 (2020).
Yin, H. H. Aligning brain and behavior. Curr. Opin. Behav. Sci. 62, 101487 (2025).
Yin, H. H. The basal ganglia in action. Neuroscientist 23, 299–313 (2017).
Margolis, E. B., Coker, A. R., Driscoll, J. R., Lemaitre, A. I. & Fields, H. L. Reliability in the identification of midbrain dopamine neurons. PLoS ONE 5, e15222 (2010).
Ungless, M. A. & Grace, A. A. Are you or aren’t you? Challenges associated with physiologically identifying dopamine neurons. Trends Neurosci. 35, 422–430 (2012).
Di Volo, M., Morozova, E. O., Lapish, C. C., Kuznetsov, A. & Gutkin, B. Dynamical ventral tegmental area circuit mechanisms of alcohol-dependent dopamine release. Eur. J. Neurosci. 50, 2282–2296 (2019).

Acknowledgements

We would like to thank Fengxia Allen, Joseph Barter, Guozhong Yu, and Jinyong Zhang for technical assistance. This work was supported by NIH grants NS094754 and DA061556 to H.H.Y. This paper is dedicated to the memory of our co-author Ryan Hughes (1987–2021).

Author information

Authors and Affiliations

Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
Konstantin Bakhurin, Ryan N. Hughes, Qiaochu Jiang & Henry H. Yin
Group for Neural Theory, LNC2 INSERM U960, DEC, École Normale Supérieure, PSL Université, Paris, France
Meghdoot Hossain & Boris Gutkin
Département de Biologie, École Normale Supérieure, PSL Université, Paris, France
Meghdoot Hossain
Department of Neurobiology, Duke University School of Medicine, Durham, NC, USA
Isabella P. Fallon & Henry H. Yin

Contributions

K.B., R.N.H., and H.H.Y. designed the experiments. R.N.H. and K.B. performed surgeries, in vivo electrophysiological and optogenetic experiments, and histological analysis. K.B., R.N.H., I.P.F., and Q.J. performed data analysis. M.H. and B.G. built the model. K.B., R.N.H., and H.H.Y. wrote the manuscript. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Henry H. Yin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks John Salamone and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Movie 1

Supplementary Movie 2

Supplementary Movie 3

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Bakhurin, K., Hughes, R.N., Jiang, Q. et al. Dopamine dynamics during stimulus-reward learning in mice can be explained by performance rather than learning. Nat Commun 16, 9081 (2025). https://doi.org/10.1038/s41467-025-64132-4

Received: 08 June 2023
Accepted: 05 September 2025
Published: 13 October 2025
Version of record: 13 October 2025
DOI: https://doi.org/10.1038/s41467-025-64132-4
