Abstract
The nucleus accumbens, a highly integrative brain region controlling motivated behavior, receives various glutamatergic inputs, yet the relative functional specialization of these inputs is unclear. While circuit neuroscience commonly seeks specificity, redundancy can be highly adaptive and is a critical motif in circuit organization. Using dual-site fiber photometry in an operant reward task in mice, we simultaneously recorded from two accumbal glutamatergic afferents to assess circuit specialization. We identify a common neural motif integrating reward history in medial prefrontal cortex and ventral hippocampus inputs. By systematically degrading task complexity, dissociating reward from choice and action, we identify circuit-specificity in the behavioral conditions that recruit encoding. While input from the prefrontal cortex invariantly encodes reward, encoding in ventral hippocampal input is uniquely anchored to unrewarded outcomes. Optogenetic stimulation demonstrates that both inputs co-operatively modulate task engagement. We illustrate how similar encoding, differentially gated by behavioral state, supports state-sensitive tuning of reward-motivated behavior.
Introduction
Redundancy is a defining property of nervous system organization1,2,3 yet there has been limited consideration of the role of redundancy in neural circuit mechanisms of motivated behavior. Redundancy in neural circuits may confer various advantages, including increasing the robustness of cognition and behavior to perturbation, enhancing encoding accuracy, and facilitating the coherent integration of multiple inputs, suggesting it should be a frequently observed motif4,5,6. While the literature abounds with examples of apparent circuit-specific cognitive and behavioral functions, the potential for redundancy is rarely examined. To better understand these opposing motifs in nucleus accumbens (NAc) circuits we leveraged dual circuit recordings and computational modeling to rigorously test the specificity and redundancy of information processing in a fully controlled, within-animal comparison of two NAc glutamatergic inputs.
The NAc integrates glutamatergic inputs with dopaminergic input from the ventral tegmental area, with multiple glutamatergic inputs converging at the level of individual medium spiny neurons in the NAc medial shell7,8,9,10,11,12,13,14. Prominent theoretical perspectives hold that these inputs send qualitatively distinct information, which the NAc then integrates to orchestrate motivated behavior10,12,15,16. For example, the medial prefrontal cortex (mPFC) contributes information about rewarding events and executive control, while the ventral hippocampus (vHip) contributes emotional context and behavioral inhibition16,17,18,19,20,21,22,23,24,25. Despite predictions of distinct encoding and behavioral function for mPFC-NAc and vHip-NAc, strong evidence of functional specialization is lacking. To date, most studies have examined a single input, and the few studies that examined more than one input in the same task compared across animals, leaving open the possibility that inter-individual variation in behavior and other variables influence neural encoding7,26,27.
To systematically interrogate functional redundancy versus specialization, we simultaneously probed neural encoding using dual-site in vivo fiber photometry to record activity in two glutamatergic circuits during reward-guided choice in a two-armed bandit task. The mPFC-NAc is widely appreciated to mediate reward processing, and given that vHip-NAc inputs converge with mPFC-NAc, we asked if the vHip-NAc might also contribute to this function16,22,23. Using trial-by-trial modeling of neural activity, we identify a mechanism for integrating outcome information across trials that is common to both circuits. Analyzing the redundancy across signals revealed an additional dimension of uniqueness to vHip-NAc encoding. By sequentially degrading task complexity, we show that, despite sharing a common mechanism for outcome integration, each circuit is recruited in distinct behavioral states, with the vHip-NAc preferentially encoding reward after unrewarded outcomes. Optogenetically manipulating circuit-specific activity revealed that, once recruited, both inputs cumulatively mediate dynamic behavioral engagement. Our findings reveal a co-operative circuit organization in NAc wherein redundant encoding in two inputs is gated by circuit-specific mechanisms for state-sensitive tuning of reward-motivated behavior.
Results
mPFC-NAc and vHip-NAc similarly encode outcomes in a probabilistically rewarded environment
To assess redundancy versus specificity in outcome encoding in two distinct circuits under matched conditions and trial histories, we injected retrograding AAV-GCaMP7f in NAc medial shell and implanted optic fibers in mPFC and vHip to record Ca2+-associated fluorescence while mice engaged in reward-guided choice (Fig. 1E). We trained mice in a two-lever probabilistic reward learning task (i.e. a two-armed bandit task) in which lever pressing probabilistically earns a chocolate milk reward (Fig. 1A). Following each lever press, one of two distinct auditory cues signaled trial outcome (rewarded, unrewarded) and start of the inter-trial interval (ITI). To maintain a dynamic environment with robustly encountered rewarded and unrewarded outcomes, levers were probabilistically rewarded on 80% or 20% of presses with probabilities switched after five consecutive responses on the high probability lever. Female (n = 10) and male (n = 12) mice experienced similarly high numbers of unrewarded and rewarded trials and low numbers of omission trials (Fig. 1B–D). Examining behavior across sessions shows decreasing staying probability after unrewarded outcomes and increasing rewards earned, indicating animals use information about outcomes to guide behavior (Fig. S1).
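The reversal rule described above can be made concrete with a short simulation. The following is a minimal sketch, not the authors' task code: the function names, the `choose` callback, and the win-stay/lose-shift policy are illustrative assumptions, omission trials are not modeled, and the streak counter is assumed to reset whenever the low-probability lever is pressed.

```python
import random

def simulate_bandit(n_trials, choose, p_high=0.8, p_low=0.2,
                    switch_after=5, seed=0):
    """Simulate a two-lever probabilistic task with contingency reversals.

    `choose(history)` returns the lever pressed (0 or 1) given past trials.
    Returns a list of (choice, high_lever, rewarded) tuples, one per trial.
    """
    rng = random.Random(seed)
    high = 0      # lever currently rewarded with probability p_high
    streak = 0    # consecutive presses on the high-probability lever
    history = []
    for _ in range(n_trials):
        choice = choose(history)
        p = p_high if choice == high else p_low
        rewarded = rng.random() < p
        history.append((choice, high, rewarded))
        # Assumption: any press on the low lever resets the streak.
        streak = streak + 1 if choice == high else 0
        if streak == switch_after:
            high = 1 - high   # reverse the contingencies
            streak = 0
    return history

def win_stay_lose_shift(history):
    """Hypothetical policy: repeat a rewarded choice, otherwise switch."""
    if not history:
        return 0
    choice, _, rewarded = history[-1]
    return choice if rewarded else 1 - choice

session = simulate_bandit(200, win_stay_lose_shift, seed=1)
n_rewarded = sum(1 for _, _, rewarded in session if rewarded)
print(f"rewarded trials: {n_rewarded} of {len(session)}")
```

Because reversals are triggered by the animal's own choices, a policy that tracks the better lever keeps encountering both rewarded and unrewarded outcomes, which is the property the task design relies on.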
A Schematic of two-armed bandit task55. Mice lever press in a two lever task in which one lever is rewarded with chocolate milk on 80% of trials, and the other on 20%. Following a lever press, levers retract, and auditory cues signal the outcome and start of a 10 sec inter-trial interval (ITI). Contingencies switch after five consecutive responses on the high probability lever. Female (n = 10) and male (n = 12) mice robustly engage with the task, experiencing similar numbers of (B) unrewarded (C) rewarded, and (D) omission trials. E Retrograding jGCaMP7f is injected into the nucleus accumbens (NAc) medial shell and optic fibers implanted in medial prefrontal cortex (mPFC) and ventral hippocampus (vHip) to simultaneously probe neural activity indicated by Ca2+-associated fluorescence changes in (F) mPFC neurons projecting to NAc (representative image; scale bars, 200 μm) and (G) vHip neurons projecting to NAc (representative image; scale bars, 200 μm) as mice encounter reward and non-reward. Estimated mean mPFC-NAc activity across all rewarded and unrewarded trials in (H) female (n = 10) and (I) male (n = 12) mice. y = 0 is indicated by a dashed horizontal line. Analysis focused on 8–10 sec after lever press (ITI end, shaded). At ITI end, mPFC-NAc activity is suppressed by rewarded outcomes in female (J; n = 10; Z = 21.348, p = 8.14E−101) and male (K; n = 11; Z = 19.625, p = 1.89E−85) mice. Estimated mean vHip-NAc activity across all rewarded and unrewarded trials in (L) female (n = 10) and (M) male (n = 12) mice. At ITI end, vHip-NAc activity is suppressed by rewarded outcomes in female (N; n = 10; Z = 8.161; p = 6.65E−16) and male (O; n = 12; Z = 8.924; p = 8.99E−19) mice. Heatmap of mPFC-NAc activity to (P) rewarded outcomes and (Q) unrewarded outcomes in a representative animal across one session. Heatmap of vHip-NAc activity to (R) rewarded outcomes and (S) unrewarded outcomes in a representative animal across one session. 
Comparisons were performed using a two-sided Z-test and Sidak’s method to adjust for multiple comparisons. Error bars represent SEM around the estimated mean. Source data are provided as a Source Data file. ****p < 0.0001.
Trial-based tasks are ideal for probing neural encoding because they generate large numbers of trials. However, standard analysis approaches either analyze individual trials, failing to account for the within-animal nested data structure and inappropriately inflating effects, or average all trials within animals, thereby underestimating effects. Choice tasks are additionally challenging, with the number of instances of each trial type varying across animals. To preserve the power of trial-by-trial data while accounting for the nested structure and unbalanced observations, we used a linear mixed model approach28. To examine how the outcome is encoded in each projection, we modeled normalized Ca2+-associated fluorescence change as a function of trial outcome while controlling for inter-individual variability.
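One way to write such a model down is shown below. This is an illustrative form only: a single random intercept per animal is an assumption, and the exact fixed- and random-effects structure used in the analysis is not specified in this section. Here i indexes trials, j indexes animals, y is the normalized fluorescence change at ITI end, and Reward is 1 on rewarded trials and 0 otherwise.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{Reward}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this form, the fixed effect \beta_1 captures the outcome effect across all trials, while u_j absorbs animal-level offsets, so trials from the same animal are not treated as independent observations and animals with unequal trial counts can be analyzed together.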
Reward strongly suppressed mPFC-NAc and vHip-NAc activity in female and male mice. In mPFC-NAc, a peak following the lever press and outcome delivery is followed by gradually emerging reward-associated suppression across the ITI (Fig. 1H, I, P, Q). In vHip-NAc, an initial peak is followed by suppression after the lever press and outcome delivery, with suppression sustained following reward and activity gradually increasing following unrewarded outcomes (Fig. 1L, M, R, S). We focused analysis on the end of the ITI (8–10 sec after lever press) when the trial outcome has been integrated prior to the next trial start. By ITI end, reward robustly suppressed mPFC-NAc activity in female and male mice (Fig. 1J, K). Reward also robustly suppressed vHip-NAc activity in female and male mice (Fig. 1N, O). This indicates that outcome encoding emerges across the ITI with reward suppressing mPFC-NAc and vHip-NAc activity. To explore modulation by other task factors, we examined neural encoding time-locked to licking and decision-relevant behaviors. We did not observe clear neural encoding of licking (Fig. S2A, D), the decision to stay or shift (Fig. S2B, E), or the identity of the chosen lever (Fig. S2C, F), suggesting that outcome is the primary source of modulation in mPFC-NAc and vHip-NAc in this task.
Having observed that mPFC-NAc and vHip-NAc are similarly modulated by reward, we then examined if one circuit leads the other. We found that the time-lag for the maximum cross-correlation between mPFC-NAc and vHip-NAc did not significantly differ from zero in rewarded or unrewarded trials in either sex. This shows that neither circuit drives outcome encoding in the other (Fig. S3A, B). Interestingly, we note that although suppression emerges earlier in vHip-NAc than mPFC-NAc (Fig. S3C), the utility of this suppression in distinguishing rewarded vs unrewarded outcome emerges earlier in mPFC-NAc than vHip-NAc (~3 seconds post lever press in mPFC-NAc vs. ~ 4 seconds post lever press in vHip-NAc; Fig. S3D, E). This suggests that while the overall informational encoding is comparable, the underlying dynamics likely vary considerably between pathways.
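The lead/lag question can be illustrated by finding the time lag at which the cross-correlation between two simultaneously recorded traces peaks. The sketch below is a generic illustration on synthetic data, not the authors' pipeline; `pearson` and `peak_lag` are hypothetical helper names, and the normalization (Pearson correlation on the overlapping samples at each lag) is an assumption.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / math.sqrt(saa * sbb)

def peak_lag(x, y, max_lag):
    """Return (lag, r) maximizing corr(x[t], y[t + lag]).

    A positive lag means y follows x (x leads); lag is in samples.
    """
    best_lag, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        r = pearson(a, b)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Synthetic check: y is a copy of x delayed by 3 samples.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0, 0.0, 0.0] + x[:-3]
lag, r = peak_lag(x, y, max_lag=20)
print(lag, round(r, 3))
```

In an analysis of this kind, a peak lag would be computed per animal and trial type, and the resulting lags tested against zero; a distribution centered on zero is what indicates that neither signal systematically leads the other.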
mPFC-NAc and vHip-NAc integrate reward history
We find that mPFC-NAc and vHip-NAc similarly encode outcomes. Visualizing this encoding across a longer timespan shows that reward-mediated suppression can last across tens of seconds in mPFC-NAc and vHip-NAc (Fig. S4). We thus speculated that this enduring modulation might integrate reward information across successive trials and that this integration might be more prominent in mPFC-NAc than vHip-NAc, given prior evidence of enduring representation in mPFC16,23,29. To test this, we sorted trials by both prior and current outcome, identifying trial sequences that were rewarded then rewarded (R→R), rewarded then unrewarded (R→U), unrewarded then rewarded (U→R), and unrewarded then unrewarded (U→U). We then compared neural activity across the ITI on the most recent trial to determine how prior outcome modulates outcome encoding on the current trial. Analyzing males and females separately revealed similar modulation (Fig. S5), and we, therefore, report sex-combined analyses. Both previous and current outcome modulate mPFC-NAc activity (Fig. 2A). Following a given trial (t-1), reward suppresses mPFC-NAc activity (Fig. 2B), effectively resetting the baseline for the next trial. Reward on the subsequent trial (t0) similarly suppresses mPFC-NAc by ITI end, regardless of prior outcome. However, when mice are unrewarded on the subsequent trial (t0), suppression of mPFC-NAc by prior reward is maintained through ITI end (Fig. 2C). This suggests that a single reward maximally and enduringly suppresses mPFC-NAc activity and that, in the absence of subsequent reward, this suppression slowly dissipates.
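The trial sorting described above amounts to labeling each trial by its own outcome and the outcome(s) that preceded it. A minimal sketch follows; the function name, the single-letter outcome codes, and the decision to drop any sequence containing an omission trial ("O") are assumptions made for illustration. Labels are written with ASCII arrows, so "R->U" corresponds to R→U.

```python
VALID_OUTCOMES = {"R", "U"}   # rewarded, unrewarded

def label_histories(outcomes, n_back=1):
    """Label each trial with its outcome history.

    outcomes: per-trial codes in session order ("R", "U", or "O" for omission).
    n_back:   number of prior trials to include (1 gives t-1 -> t0 pairs).
    Returns a list of (index_of_t0, label) tuples, skipping any window
    that contains an omission (an assumption, not a documented rule).
    """
    labeled = []
    for i in range(n_back, len(outcomes)):
        window = outcomes[i - n_back:i + 1]
        if all(o in VALID_OUTCOMES for o in window):
            labeled.append((i, "->".join(window)))
    return labeled

session = list("RRUOUR")
print(label_histories(session))
print(label_histories(list("URRRU"), n_back=3))
```

Setting `n_back=3` yields the longer histories needed when considering outcomes up to three trials back; activity at ITI end on trial t0 can then be grouped by label before model fitting.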
A Estimated mean mPFC-NAc activity across pairs of consecutive trials (t-1→t0) showing rewarded+rewarded (R→R), rewarded+unrewarded (R→U), unrewarded+rewarded (U→R), and unrewarded+unrewarded (U→U) trial pairs in female (n = 10) and male (n = 11) mice. y = 0 is indicated by a dashed horizontal line. Analysis focused on 8–10 sec after lever press (ITI end, shaded). B On trial t-1, mPFC-NAc activity is significantly suppressed by reward (male n = 11, female n = 10; U→U vs R→U: Z = 28.99496, p = 1.52E−84; U→R vs R→R: Z = 25.6767, p = 4.25E−145). (C) On the subsequent trial, t0, mPFC-NAc activity is significantly suppressed by current reward (male n = 11, female n = 10; U→U vs U→R: Z = -28.5098, p = 5.31E−178; R→U vs R→R: Z = −19.8981, p = 2.53E−87, U→U vs R→R: Z = −29.0153, p = 2.53E−84). When trial t0 is unrewarded, mPFC-NAc activity remains significantly suppressed by reward experienced on the previous trial, t-1, (U→U vs R→U: Z = 9.1965, p = 2.22E−19; U→R vs R→R: Z = 1.9308, p = 0.2811; R→U vs U→R: Z = −18.7786, p = 6.78E−78). D Estimated mean vHip-NAc activity across pairs of consecutive trials (t-1→t0) showing rewarded+rewarded (R→R), rewarded+unrewarded (R→U), unrewarded+rewarded (U→R), and unrewarded+unrewarded (U→U) trial pairs in female (n = 10) and male (n = 12) mice. y = 0 is indicated by a dashed horizontal line. E On trial t-1, vHip-NAc activity is significantly suppressed by reward (male n = 12, female n = 10; U→U vs R→U: Z = 14.9372, p = 3.77E−50; U→R vs R→R: Z = 11.6962, p = 2.27E−30). F On the subsequent trial, t0, vHip-NAc activity is significantly suppressed by current reward (male n = 12, female n = 10; U→U vs U→R: Z = -17.4993, p = 8.70E−68; R→U vs R→R: Z = −7.1005, p = 7.46E−12; U→U vs R→R: Z = −18.0126, p = 9.31E−72). When trial t0 is unrewarded, vHip-NAc activity remains suppressed by reward experienced on the previous trial, t-1, (U→U vs R→U: Z = 11.4112, p = 2.21E−29; U→R vs R→R: Z = 1.4235, p = 0.6349; R→U vs U→R: Z = 5.9394, p = 0.6349). 
Individual-animal averages are indicated by circles for males and triangles for females. Comparisons were performed using a two-sided Z-test and Sidak’s method to adjust for multiple comparisons. Error bars represent SEM around the estimated mean. Source data are provided as a Source Data file. ****p < 0.0001.
We then examined if vHip-NAc similarly integrates outcomes (Fig. 2D). Following a given trial (t-1), reward suppresses vHip-NAc activity (Fig. 2E). As with mPFC-NAc, this resets the baseline for the next trial (t0), wherein reward suppresses vHip-NAc regardless of prior outcome. However, when the subsequent trial (t0) is unrewarded, suppression of vHip-NAc activity by prior reward is maintained through ITI end (Fig. 2F). Together, this shows that mPFC-NAc and vHip-NAc similarly integrate outcomes across trials. In both circuits, reward maximally suppresses neural activity, and activity gradually increases following subsequent unrewarded outcomes, such that, by ITI end, the relative degree of suppression represents an integrated reward outcome history.
mPFC-NAc and vHip-NAc are differentially sensitive to unrewarded outcomes
Analyzing neural encoding of reward and outcome integration revealed that mPFC-NAc and vHip-NAc similarly encode reward suggesting they may provide redundant information to the NAc. To test redundancy between mPFC-NAc and vHip-NAc we calculated the conditional entropy of mPFC-NAc given vHip-NAc (H(mPFC-NAc|vHip-NAc)) and vHip-NAc given mPFC-NAc (H(vHip-NAc|mPFC-NAc)). In this way, we assessed the information contributed by each circuit beyond that contributed by the other at ITI end, when the outcome is fully integrated (Fig. 3A). We contrasted entropy between rewarded and unrewarded outcomes as a function of prior outcome. Relative to unrewarded outcomes, the entropy of mPFC-NAc given vHip-NAc was reduced by rewarded outcomes, indicating that vHip-NAc and mPFC-NAc signals are more redundant after reward than non-reward (Fig. 3B). In contrast, following previous unreward, but not previous reward, current reward increased the entropy of vHip-NAc given mPFC-NAc (Fig. 3C), indicating that, under these conditions, mPFC-NAc explains less of the vHip-NAc signal. This shows that, after reward, vHip-NAc and mPFC-NAc encoding converges, becoming more redundant, but when reward is made more surprising by immediately following an unrewarded outcome, vHip-NAc carries additional information. That is, despite global redundancy in reward encoding motifs, we identify a dimension of circuit specificity and a potential unique role for vHip-NAc in encoding reward following unrewarded outcomes.
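Conditional entropy follows from the identity H(X|Y) = H(X,Y) − H(Y): it is the uncertainty remaining in X once Y is known, so a lower value means the two signals are more redundant. The sketch below illustrates the quantity with a simple histogram (plug-in) estimator. The equal-width binning, the bin count, and the function names are assumptions; the estimator actually used for the photometry signals is not described in this section.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def discretize(values, n_bins):
    """Map continuous values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def conditional_entropy(x, y, n_bins=8):
    """Estimate H(X|Y) = H(X,Y) - H(Y) in bits after binning both signals."""
    xb, yb = discretize(x, n_bins), discretize(y, n_bins)
    return entropy(list(zip(xb, yb))) - entropy(yb)

# Fully redundant signals: knowing y leaves no uncertainty about x.
a = [0.1, 0.4, 0.9, 0.3, 0.7, 0.2]
print(conditional_entropy(a, a))

# Independent binary signals: knowing y tells nothing about x.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
print(conditional_entropy(x, y, n_bins=2))
```

Note that H(X|Y) and H(Y|X) are generally unequal, which is why the two directions (mPFC-NAc given vHip-NAc, and vHip-NAc given mPFC-NAc) are reported separately.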
A Venn diagram representing the relationship between the mutual information and conditional entropy that exists between observed mPFC-NAc and vHip-NAc signals. Conditional entropy is a measure of the additional unique information contributed by a second signal given full knowledge of a first signal. B Conditional entropy in mPFC-NAc is reduced on rewarded relative to unrewarded trials regardless of previous outcome (male n = 11, female n = 10; U→U vs U→R: Z = 2.8644, p = 0.0083; R→U vs R→R: Z = 3.5185, p = 0.0009) indicating that less unique information is carried in mPFC-NAc after reward. C Conditional entropy in vHip-NAc is increased on rewarded relative to unrewarded trials only when the prior outcome was unrewarded (male n = 11, female n = 10; U→U vs U→R: Z = −3.8585, p = 0.0002) indicating that more unique information is carried in vHip-NAc when reward follows nonreward. Comparison of activity at ITI end on currently rewarded or unrewarded trials considering prior outcome history up to three trials back shows that (D) mPFC-NAc activity is suppressed on every currently rewarded trial indicating that mPFC-NAc consistently encodes current outcome via relative suppression regardless of outcome history (male n = 11, female n = 10). In contrast, (E) vHip-NAc activity is suppressed on currently rewarded trials except when current reward is preceded by two (Z = 1.2310, p = 0.8606) or three (Z = 0.8398, p = 0.9834) prior consecutive rewards indicating that vHip-NAc ceases to encode current outcome via relative suppression after consistent reward (male n = 12, female n = 10). See Supplementary Table 1 for all comparisons. Comparisons were performed using a two-sided Z-test and Sidak’s method to adjust for multiple comparisons. Error bars represent SEM around the estimated mean. Source data are provided as a Source Data file. **p < 0.01, ***p < 0.001, ****p < 0.0001.
If this is true, across outcome histories, vHip-NAc encoding should be most apparent when reward follows an unrewarded outcome, whereas, following consecutive rewards, vHip-NAc should become insensitive to outcome as rewards become less surprising. In contrast, mPFC-NAc encoding is predicted to be relatively invariant across outcome histories. To test this, we examined current outcome encoding at ITI end while considering prior outcomes up to three trials back. Consistent with our prediction, mPFC-NAc encoded current outcome regardless of prior outcome history (Fig. 3D; Supplementary Table 1) while vHip-NAc failed to encode current outcome after two or more consecutive rewards (Fig. 3E; Supplementary Table 1). This effect seems to be mostly mediated by differences in how mPFC-NAc and vHip-NAc respond to unrewarded outcomes. While reward continues to suppress activity in both pathways regardless of reward history, when encountering an unrewarded outcome following several rewarded outcomes, mPFC-NAc activity increases as expected but vHip-NAc activity fails to immediately increase.
Degrading task requirements reveals circuit-specific roles in reward integration
Analyzing informational redundancy and encoding across varying outcome histories suggested that, while mPFC-NAc and vHip-NAc encode and integrate reward via a common mechanism, each may nevertheless serve distinct functions in reward processing. To isolate the specific conditions under which each circuit integrates outcomes we recorded neural activity while degrading task requirements to sequentially eliminate choice and action. We first eliminated choice, extending only a single lever while maintaining the requirement to press to elicit an outcome. To hold outcome experience constant, the specific sequence of reward and unreward was yoked to each animal’s prior performance on the two-lever task (Fig. 4A). In the absence of choice, mPFC-NAc continued to encode previous and current outcome (Fig. 4B). On trial t0, by ITI end, current and prior outcomes were encoded, as in the two-lever task (Figs. 4C, 2C). Examining vHip-NAc in the one-lever task also revealed largely similar outcome-mediated modulation (Figs. 4D, 2D). At ITI end, prior and current outcomes were integrated, similar to the two-lever task (Figs. 4E, 2F). Despite conserved information encoding in both circuits, the shape of the vHip-NAc signal was more visibly altered than the mPFC-NAc. In particular, the vHip-NAc signal in the one-lever task appeared noisier and blunted with the expected peak following lever press largely absent, potentially suggesting heightened sensitivity to task structure. Overall, we find that both mPFC-NAc and vHip-NAc maintain similar graded representations of reward history that are largely independent of choice requirements.
A One-lever task schematic55. B Estimated mean mPFC-NAc activity across consecutive trial pairs (t-1→t0) showing rewarded+rewarded (R→R), rewarded+unrewarded (R→U), unrewarded+rewarded (U→R), and unrewarded+unrewarded (U→U) trial pairs (male n = 7, female n = 6). Dashed line y = 0. We analyzed 8–10 sec post-press (ITI end, shaded). C On trial t0, reward suppresses mPFC-NAc (male n = 7, female n = 6; U→U vs U→R: Z = 18.8757, p = 1.08E−78; R→U vs R→R: Z = 12.0687, p = 9.27E−23; U→U vs R→R: Z = 18.2004, p = 3.07E−73). When trial t0 is unrewarded, mPFC-NAc remains suppressed by prior reward, t-1, (U→U vs R→U: Z = 6.3467, p = 1.32E−9; U→R vs R→R: Z = −0.7826, p = 0.9671; U→R vs R→U: Z = 12.7865, p = 1.17E−36). D Estimated mean vHip-NAc activity across trial pairs (t-1→t0) (male n = 8, female n = 6). E On trial t0, reward suppresses vHip-NAc (male n = 8, female n = 6; U→U vs U→R: Z = 8.5245, p = 9.21E−17; R→U vs R→R: Z = 4.0519, p = 0.0003; U→U vs R→R: Z = 10.2097, p = 1.08E−23). When trial t0 is unrewarded, vHip-NAc remains suppressed by prior reward, t-1, (U→U vs R→U: Z = 6.2425, p = 2.58E−9; U→R vs R→R: Z = 1.7019, p = 0.4275; U→R vs R→U: Z = 2.3408, p = 0.1101). F No-lever task schematic55. G Estimated mean mPFC-NAc activity across trial pairs (t-1→t0) (male n = 7, female n = 6). H On trial t0, reward suppresses mPFC-NAc (male n = 7, female n = 6; U→U vs U→R: Z = 8.2136, p = 1.29E−15; R→U vs R→R: Z = 7.4647, p = 5.01E−13; U→U vs R→R: Z = 8.5242, p = 9.23E−17; U→U vs R→U: Z = 1.1662, p = 0.8126; U→R vs R→R: Z = 0.3493, p = 0.9996; U→R vs R→U: Z = 7.1124, p = 6.84E−12). I Estimated mean vHip-NAc activity across trial pairs (t-1→t0) (male n = 8, female n = 6). J On trial t0, reward suppresses vHip-NAc only if trial t-1 was unrewarded (male n = 8, female n = 6; U→U vs U→R: Z = 3.7413, p = 0.0011; R→U vs R→R: Z = 1.5289, p = 0.5551; U→U vs R→R: Z = 5.3913, p = 4.20E−7).
When trial t0 is unrewarded, vHip-NAc remains suppressed by prior reward, t-1, (U→U vs R→U: Z = 3.8661, p = 0.0007; U→R vs R→R: Z = 1.6584, p = 0.4587; U→R vs R→U: Z = −0.1282, p = 0.9999). Circles (male) and triangles (female) indicate individual-animal averages. Two-sided Z-test with Sidak’s correction for multiple comparisons. Error bars represent SEM of estimated mean. Source data in Source Data file. ***p < 0.001, ****p < 0.0001.
Removing the lever choice minimally impacted reward integration. We then asked if neural integration of outcome history is entirely independent of response requirements by removing both levers in a choice-free, response-free task. Trials continued to be signaled by cue-lights, but without lever extension and outcomes were passively delivered yoked to each animal’s individual performance on the full two-lever task (Fig. 4F). Eliminating the response requirement markedly and distinctly altered reward integration in both circuits. In mPFC-NAc (Fig. 4G), encoding of prior outcome was erased, and only the current outcome encoded (Fig. 4H). This differs from both the two-lever and one-lever tasks wherein mPFC-NAc encoded a graded representation of reward history and suggests that mPFC-NAc integrates reward history only in instrumental settings where a response elicits outcomes. However, even when rewards are passively encountered (i.e., when no lever press is required), mPFC-NAc continues to encode reward but with a shortened time constant, such that only the most recent outcome is retained.
vHip-NAc representation of reward history was also degraded, yet in a distinct manner (Fig. 4I). Current outcomes were encoded only when the previous trial, t-1, was unrewarded (Fig. 4J). This shift in encoding translates into vHip-NAc effectively overlooking isolated instances of non-reward, likely reflecting an extended time constant. Critically, this cannot be explained by changes in task engagement given that mPFC-NAc continued to represent reward in these same animals (Fig. 4G, H), and licking bouts were similarly maintained across task variants (Fig. S6). Following the removal of response requirements, we returned animals to the two-lever task and again observed encoding of integrated reward history (Fig. S7), confirming that the modulation of encoding across task degradation is indeed attributable to altered task requirements and is not artifactual (e.g., potential signal degradation over time). This reveals that task demands differentially shape neural encoding of reward in mPFC-NAc and vHip-NAc. When reward is passively encountered, independent of a required response, mPFC-NAc maintains a simplified reward representation across a shortened temporal window, limiting integration across trials. In contrast, vHip-NAc anchors encoding to unrewarded outcomes with an extended time constant, to preferentially represent surprising rewards. This suggests that while the fundamental function of mPFC-NAc in rewarding contexts is to encode outcomes, the fundamental function of vHip-NAc is to use information about unrewarded outcomes to tune outcome encoding.
mPFC-NAc and vHip-NAc modulate task engagement
Examining neural representation of outcomes identified both mechanistic redundancy and functional specificity in mPFC-NAc and vHip-NAc encoding. We then asked how this neural processing might integrate to modulate behavior. While in general encoding was similar in both circuits, reducing the requirement for engagement by making reward non-contingent revealed functional specialization. We hypothesized that outcome-associated neural activity in mPFC-NAc and vHip-NAc modulates task engagement. To test this, we examined if neural activity at ITI end predicted latency to lever press on the subsequent trial, a metric operationalizing engagement30,31,32,33,34. A linear mixed effects model revealed modest yet significant relationships between latency to lever press and mPFC-NAc, vHip-NAc, and the interaction of mPFC-NAc and vHip-NAc activity (Fig. 5A, Supplementary Table 2; Fig. S8). This suggests that increased activity during outcome integration in either circuit increases latency to lever press, indicating reduced behavioral engagement (Fig. S9).
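The latency analysis can be written out as a model with two main effects and their interaction. The form below is a sketch: the fixed effects follow the description above, whereas the random intercept per animal is an assumption and the full specification is given in Supplementary Table 2 rather than here. Latency is the time to lever press on trial i+1 for animal j, and mPFC and vHip denote each circuit's activity at ITI end on trial i.

```latex
\mathrm{Latency}_{(i+1)j} = \beta_0
  + \beta_1\,\mathrm{mPFC}_{ij}
  + \beta_2\,\mathrm{vHip}_{ij}
  + \beta_3\,\mathrm{mPFC}_{ij}\,\mathrm{vHip}_{ij}
  + u_j + \varepsilon_{ij}
```

In this notation, positive \beta_1 and \beta_2 correspond to longer latencies with higher activity in either circuit, and \beta_3 captures whether the effect of one circuit depends on the level of activity in the other.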
A Heatmap of estimated latency to respond on the subsequent trial given mPFC-NAc and vHip-NAc activity at ITI end shows that increased activity associates with longer latency. B Optogenetic stimulation in the two-armed bandit task55 is delivered for the duration of the ITI to either mPFC-NAc, vHip-NAc, or simultaneously to both circuits. C AAVrg-ChR2-mCherry or AAVrg-mCherry is injected into the NAc, and optic fibers implanted in mPFC and vHip to stimulate (D) mPFC-NAc neurons (representative image; scale bars, 200 μm) and (E) vHip-NAc neurons (representative image; scale bars, 200 μm). F Simultaneous 5 Hz stimulation of mPFC-NAc and vHip-NAc, but neither circuit individually, increased latency to respond in ChR2 animals (male n = 6, female n = 7) compared to mCherry controls (male n = 6, female n = 6; Z = −6.3611, p = 1.60E−9). G 8 Hz stimulation of mPFC-NAc (Z = −3.8398, p = 0.0010), vHip-NAc (Z = −5.3250, p = 8.08E−7), and simultaneous stimulation of both mPFC-NAc and vHip-NAc (Z = −6.4875, p = 6.98E−10) all increased latency in ChR2 animals (male n = 5, female n = 6) compared to mCherry controls (male n = 6, female n = 6). Individual-animal averages are indicated by circles for males and triangles for females. Comparisons were performed using a two-sided Z-test and Sidak’s method to adjust for multiple comparisons. Error bars represent SEM around the estimated mean. Source data are provided as a Source Data file. **p < 0.01, ****p < 0.0001.
From the association between neural activity and latency, we hypothesized that reward suppresses activity in mPFC-NAc and vHip-NAc to support behavioral engagement, defining a mechanism whereby recent reward history modulates engagement in reward-motivated behavior. We predicted that acutely increasing activity in either mPFC-NAc or vHip-NAc would suppress engagement. To test this, we injected retrograding AAV-ChR2 into NAc and implanted fibers above mPFC and vHip to deliver blue light stimulation during the ITI on a subset of trials in the two-armed bandit task (Figs. 5B–E; S10). To test if mPFC-NAc and vHip-NAc uniquely or redundantly control behavior, we stimulated each circuit alone or both simultaneously. Stimulating either circuit alone had no effect, whereas stimulating both simultaneously increased latency to lever press but did not alter choice behavior (Figs. 5F; S11A). This could indicate either a threshold for sufficient cumulative glutamatergic drive or a requirement for synergistic interaction between inputs. To differentiate these possibilities, we repeated the experiment with stronger stimulation. Strong stimulation of either circuit alone increased latency to lever press, again with no effect on choice (Figs. 5G, S11B). This shows that total glutamatergic input modulates engagement, independent of input identity. mPFC-NAc stimulation yielded a slightly weaker effect than vHip-NAc, consistent with previous findings that mPFC projections to NAc medial shell are sparser than those from vHip7. Stimulation during lever presentation did not yield any changes in latency or choice behavior, supporting the importance of neural integration of outcome during the ITI period, prior to action initiation (Fig. S12). Together, our results demonstrate that mPFC-NAc and vHip-NAc dynamically track outcome information to modulate behavioral engagement according to the recent history of reward. 
While each circuit is specialized to execute this function under distinct behavioral states, once engaged, they redundantly modulate behavior, pointing to complementary roles in the control of reward-seeking.
Discussion
We examined redundancy and specificity in the function of two distinct glutamatergic inputs to the NAc. Using dual-site fiber photometry to probe trial-by-trial outcome encoding simultaneously in two circuits in the same animal during reward-guided choice, we find that mPFC-NAc and vHip-NAc similarly integrate reward via suppression of neural activity. By then systematically manipulating the conditions in which outcomes are encountered, we revealed that each circuit executes this common function under distinct behavioral states. While the mPFC-NAc invariantly encodes outcome, vHip-NAc uses information about unrewarded outcomes to tune outcome encoding, effectively amplifying surprising reward. By comparing independent or synchronous circuit-specific optogenetic stimulation, we show that, once engaged, these circuits cooperatively execute a shared function, i.e., modulating task engagement. Taken together, we identify a redundant mechanism for outcome integration with circuit-specific gating. This supports the convergence of multiple inputs in tuning behavioral engagement to the recent history of reward.
Our finding that both mPFC-NAc and vHip-NAc integrate information about outcomes of reward-motivated actions is consistent with the well-established role of mPFC in reward processing. Critically, we demonstrate that this function is not specific or limited to the mPFC-NAc. Globally, the mPFC encodes information about previous actions and outcomes29, and mPFC projections to the NAc bridge information about current actions and outcomes across trials16,23. Our findings suggest these functions are not unique to mPFC-NAc and are shared by vHip-NAc. However, we identify state-dependent specialization in how reward integration is engaged in each circuit. We show that the mPFC-NAc fundamentally functions as a reward ledger, with reward suppressing neural activity no matter the behavioral state. In contrast, we find that vHip-NAc is tuned to preferentially encode outcome information after unrewarded outcomes.
Differential encoding between mPFC-NAc and vHip-NAc emerged upon degrading task requirements, a manipulation that minimizes cognitive and behavioral demands, effectively reducing the behavioral utility of representing integrated reward history. Under these circumstances, the base functionality of each circuit is revealed: mPFC-NAc encoding is anchored to reward whereas vHip-NAc is anchored to unrewarded outcomes. Layered on top of this base functionality, representation of reward history scales with task complexity in support of behavioral demands. When reward is passively encountered with limited utility for action-outcome associations, mPFC-NAc encoding is limited to the most recent outcome. In more complex environments wherein actions elicit reward and action-outcome associations have high utility, the mPFC-NAc encoding window extends to integrate reward history. In simpler task structures that no longer require active engagement with a lever to earn rewards, the time-constant of vHip-NAc encoding shifts such that activity no longer increases when a single unrewarded outcome follows a reward. As a result, the vHip-NAc effectively comes to encode consecutive loss against all other outcomes. Together, this suggests a role for vHip-NAc in providing information about the state of reward statistics in the environment, modulating behavior as a function of unrewarded outcomes, and reveals a role for this circuit as a parallel and distinct stream of outcome integration.
The NAc has long been implicated in reward processing, yet the precise neural circuit mechanisms are still being resolved. In the NAc medial shell, reward predominantly suppresses neural activity26. This suppression likely maintains reward-seeking as stimulation of either D1 or D2 medium spiny neurons bidirectionally controls reward-seeking behavior35. Here we show that reward suppresses both mPFC-NAc and vHip-NAc, two major excitatory inputs to NAc medial shell. Reward-associated suppression of these inputs would lead to reduced NAc activity. As such, our findings are consistent with reports that optogenetic stimulation of diverse glutamatergic inputs inhibits motivated behavior and the idea that glutamatergic input to NAc medial shell functions as a brake on motivated behavior27,35,36,37,38. We show that outcome integration in mPFC-NAc and vHip-NAc initiates parallel, temporally integrated neural signaling that may engage this ‘brake’ to align ongoing behavior with recent reward history and so tune behavioral engagement to prevailing environmental conditions.
Employing a redundant mechanism in mPFC-NAc and vHip-NAc may serve several functions. A common mechanism makes for simple integration of multiple inputs and ensures the robustness of the fundamental function of reward-guided engagement against insults. Further, modulating redundant encoding with state-dependent circuit-specific sensitivity may increase the granularity and range of encoding to ultimately amplify the behavioral impact of surprising rewards. We demonstrate that high levels of reward suppress activity in both mPFC-NAc and vHip-NAc to favor continued engagement. In contrast, strong activation of either input suppresses engagement, but, when weakly activated, synchronous recruitment of both circuits is required. Functionally, this may translate into a mechanism whereby moderate, balanced activity predominantly modulates task engagement while allowing for strong activation of either circuit to exert more direct behavioral control.
Preferential outcome encoding in vHip-NAc after unrewarded outcomes may serve to strengthen engagement in variably rewarding environments, driving increased engagement when reward is infrequently encountered. The sensitivity of vHip-NAc to consecutive unrewarded outcomes may also serve to gauge reward statistics of the environment, continually increasing with each consecutive unrewarded outcome to trigger task disengagement when activity reaches some threshold. Qualitatively, we see hints of this in the shape of the signal after experience with an unrewarded outcome: mPFC-NAc tends to plateau while vHip-NAc continues to increase. Ultimately, dysregulated outcome-encoding in either mPFC-NAc or vHip-NAc could alter behavioral sensitivity to reward. Relative to mPFC-NAc, the vHip-NAc is poised to exert an outsized effect on behavioral engagement both in the strength of its input to NAc medial shell7 and in its role in signaling unrewarded outcomes. For example, hyperactivity of vHip-NAc may erroneously signal a large number of consecutive unrewarded outcomes, causing premature disengagement. Given our finding that engagement is modulated by the cumulative glutamatergic input to NAc, a sufficiently strong vHip-NAc signal could effectively jam any reward signal from mPFC-NAc, compounding insensitivity to reward that manifests as anhedonia. Indeed, disruption of the balance between NAc inputs and increased vHip-NAc drive is observed following chronic stress17,21,39,40, as well as chronic alcohol41,42 and cocaine intake43,44,45,46, manipulations associated with aberrant reward processing.
Here, we examined the simultaneous encoding in two key neural circuits for motivated behavior. By considering outcome encoding within the context of recent outcome history and behavioral demands, we identified a common neural mechanism of sustained temporal integration of reward outcomes and revealed how the external environment differentially shapes internal representations within two neural circuits. We also revealed critical circuit specificity: while mPFC-NAc consistently tracks outcomes, vHip-NAc preferentially encodes outcome information after unrewarded outcomes. By illustrating the interplay of redundancy and specificity in circuit control of motivated behavior we demonstrate the need to contextualize events within varied behavioral states to fully understand neural encoding. Overall, our findings point to the importance of balanced suppression of NAc glutamatergic inputs during outcome integration to maintain reward-modulated behavioral engagement.
Methods
Animals
Mice were maintained on a 12-h light-dark cycle (lights on at 7:00 AM) at 22–25 °C and 50% humidity, group-housed with 3–4 same-sex cage-mates with ad libitum access to food and water. All experimental manipulations occurred during the light cycle, in accordance with guidelines of McGill University’s Comparative Medicine and Animal Resources Center and approved by the McGill Animal Care Committee. 7-week-old male and female C57BL/6J mice were obtained from Jackson Laboratories and habituated to the colony room one week prior to the start of manipulations. Mice were food-restricted to 85% of their free-feeding body weight during experimentation.
Surgeries
Stereotaxic surgery was performed under ketamine (100 mg/kg)/xylazine (10 mg/kg) anesthesia. To achieve projection-specific GCaMP7f expression in glutamatergic NAc-projecting cells, 0.3 μl pGP-AAVrg-syn-jGCaMP7f-WPRE virus (1.85 × 10¹³ GC/ml; Addgene) was infused into the NAc (A/P: +1.3, M/L: ±0.60, D/V: −4.9) at a rate of 0.1 μl per min, before raising the needle to D/V: −4.7 and infusing a further 0.4 μl virus, and allowed to diffuse for 10 min before withdrawing the needle. pGP-AAV-syn-jGCaMP7f-WPRE was a gift from Douglas Kim & GENIE Project (Addgene plasmid # 104488; http://n2t.net/addgene:104488; RRID:Addgene_104488)47. Chronically implantable optic fibers (Neurophotometrics) with 200 μm core and 0.37 NA threaded through ceramic ferrules were implanted above the ventral subiculum of the vHip (A/P: −3.40, M/L: ±3.00, D/V: −4.75) and infralimbic mPFC (A/P: +1.90, M/L: ±0.3, D/V: −2.80). Recordings began a minimum of 4 weeks after surgery to allow sufficient time for stable and robust retrograde virus expression. To achieve projection-specific ChR2 expression in glutamatergic NAc-projecting cells, 0.3 μl pGP-AAVrg-hSyn-hChR2(H134R)-EYFP virus (7 × 10¹² GC/ml; Addgene) or a fluorophore-only control, pGP-AAVrg-hSyn-mCherry (7 × 10¹² GC/ml; Addgene), was infused into the NAc (A/P: +1.3, M/L: ±0.60, D/V: −4.9) at a rate of 0.1 μl per min, before raising the needle to D/V: −4.7 and infusing a further 0.4 μl virus, and allowed to diffuse for 10 min before withdrawing the needle. pAAV-hSyn-hChR2(H134R)-EYFP was a gift from Karl Deisseroth (Addgene plasmid # 26973; http://n2t.net/addgene:26973; RRID:Addgene_26973). pAAV-hSyn-mCherry was a gift from Karl Deisseroth (Addgene plasmid # 114472; http://n2t.net/addgene:114472; RRID:Addgene_114472). Chronically implantable optic fibers (Neurophotometrics) with 200 μm core and 0.22 NA threaded through ceramic ferrules were implanted above the ventral subiculum of the vHip (A/P: −3.40, M/L: ±3.00, D/V: −4.75) and infralimbic mPFC (A/P: +1.90, M/L: ±0.3, D/V: −2.80). Optogenetic manipulations began a minimum of 4 weeks after surgery to allow sufficient time for stable and robust retrograde virus expression.
Histology
After completion of all behavioral testing, mice were deeply anesthetized with ketamine/xylazine and transcardially perfused with phosphate buffered saline (PBS) and paraformaldehyde (4%). Brains were removed and post-fixed in paraformaldehyde for 24 h and stored in PBS until sectioning on a vibratome (50 µm). Sections were mounted with Vectashield with DAPI (Vector Laboratories) and examined under a fluorescent microscope (Leica DM6000 B) to confirm viral expression and fiber placement. A confocal microscope (Zeiss LSM800) was used to obtain fluorescent images. Images were acquired as tiles with a 20x air objective (NA 0.8) using Zeiss Zen Blue imaging software. Images were collected in the McGill University Advanced BioImaging Facility (ABIF), RRID:SCR_017697. Mistargeted animals were excluded from analysis.
Apparatus
Behavioral experiments were performed in standard Med Associates operant boxes (15.24 × 13.34 × 12.7 cm) enclosed in sound-attenuating chambers outfitted with a programmable audio generator, two retractable levers, and cue lights on either side of a food port for delivering a liquid chocolate milk reward (30 μl, Nesquik) diluted with water in a 2:1 ratio. Boxes were controlled and data collected by a computer running MED-PC software (Med-Associates).
Lever Press Training
Training was completed in three stages, with all training sessions lasting 30 minutes. In the first stage, animals were presented with two levers, both of which delivered a chocolate milk reward with a 100% probability. To signal the start of the trial, both levers were extended and the cue lights above the levers turned on. Animals then had 60 seconds to make a response on either lever. A press on either lever resulted in lever retraction, immediate delivery of a 30 µL chocolate milk reward, and the start of a 3 second auditory cue (2 kHz pure tone or white noise). Following either a lever press or 60 seconds with no press (i.e., an omission), a 10 second intertrial interval (ITI) was triggered. After one session with over 25 responses, the animals progressed to the second stage. In this stage animals again were presented with two levers, but the reward was now delivered with a 50% probability on both levers. To signal the start of the trial, both levers extended and the cue lights above the levers turned on. Animals then had 60 seconds to make a response. A lever press resulted in lever retraction and immediate delivery of the outcome, either a 30 µL chocolate milk reward and a 3 second auditory cue (2 kHz pure tone or white noise, counterbalanced across animals) or just a 3 second auditory cue (white noise or 2 kHz pure tone). Following either a lever press or omission, a 10 second intertrial interval (ITI) was triggered. Following two consecutive sessions with over 40 responses, animals progressed to the third stage. This stage was the same as stage two except that animals now had only 10 seconds to make a response before an omission was registered. Following two consecutive sessions with over 100 responses, animals achieved criterion to progress to the two-armed bandit task.
Two-armed bandit task
The two-armed bandit task was performed over 6 days, with each session lasting one hour. In this task, animals were presented with two levers, with one lever rewarded on 80% of trials, and the other lever rewarded on 20% of trials. To signal the start of the trial, both levers were extended and the cue lights above the levers turned on. Animals then had 10 seconds to make a response on either lever or an omission was registered. A lever press resulted in lever retraction and immediate delivery of the outcome, either a 30 µL chocolate milk reward and a 3 second auditory cue (2 kHz pure tone or white noise, counterbalanced across animals) or simply a different 3 second auditory cue (white noise or 2 kHz pure tone) signaling non-reward. Following either a lever press or an omission, a 10 second intertrial interval (ITI) was triggered. To maintain a dynamic learning environment and high rates of rewarded and unrewarded outcomes, the probability of reward was switched between levers after five consecutive responses on the high probability lever.
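The reversal rule described above can be sketched in a few lines of Python. This is an illustrative simulation only, not the MED-PC program used in the study: the class name `TwoArmedBandit`, the starting lever assignment, the choice to draw the outcome before applying a reversal, and the omission of the 10 second response window are all assumptions made for the example.

```python
import random

class TwoArmedBandit:
    """Two levers rewarded at 80% and 20%; the probabilities swap after
    five consecutive presses on the currently high-probability lever."""

    def __init__(self, p_high=0.8, p_low=0.2, switch_after=5, seed=None):
        self.p_high = p_high
        self.p_low = p_low
        self.switch_after = switch_after
        self.high_lever = "left"   # assumed starting assignment
        self.streak = 0            # consecutive presses on the high lever
        self.rng = random.Random(seed)

    def press(self, lever):
        """Register a press on 'left' or 'right'; return True if rewarded."""
        on_high = lever == self.high_lever
        p_reward = self.p_high if on_high else self.p_low
        # Assumption: the outcome is drawn before any reversal is applied,
        # so the fifth consecutive press still uses the high probability.
        rewarded = self.rng.random() < p_reward
        self.streak = self.streak + 1 if on_high else 0
        if self.streak >= self.switch_after:
            self.high_lever = "right" if self.high_lever == "left" else "left"
            self.streak = 0
        return rewarded
```

Because a press on the low-probability lever resets the streak, an animal that alternates levers never triggers a reversal in this sketch; only sustained responding on the better lever does.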
One-lever forced choice task
The one-lever forced choice task was performed over 3 days, with each session lasting one hour. In this task, animals were presented with a single lever (counterbalanced across animals). Pressing this lever resulted in a probabilistic reward on a predetermined schedule. The outcome schedule was matched to each animal’s individual performance in the final three days of the two-armed bandit task, such that the first session in the one-lever task was yoked to the reward schedule experienced by the animal on day four in the two-armed bandit task, the second to day five, and the third to day six. To signal the start of the trial, the lever extended, and the cue light above the lever turned on. Animals then had 10 seconds to make a response. A lever press resulted in lever retraction and immediate delivery of the outcome, either a 30 µL chocolate milk reward and a 3 second auditory cue (2 kHz pure tone or white noise) or simply a different 3 second auditory cue (white noise or 2 kHz pure tone). Following either a lever press or an omission, a 10-second intertrial interval (ITI) was triggered.
No-lever response free task
The no-lever response free task was performed over the course of 3 days with each session lasting one hour. In this task, animals were able to retrieve non-contingently delivered rewards under a similar trial structure to both the two-armed bandit task and the one-lever forced choice task but with no levers available. To signal the start of the trial, cue lights above both levers turned on and remained illuminated for a period of time matched to each animal’s response time in the last three days of the two-armed bandit task. After cue lights turned off, outcomes were delivered, either a 30 µL chocolate milk reward and a 3 second auditory cue (2 kHz pure tone or white noise) or simply a different 3 second auditory cue (white noise or 2 kHz pure tone). As in the one-lever task, the outcome schedule was matched to each animal’s performance in the final three days of the two-armed bandit task. In addition, the latency to deliver the outcome was matched to the trial-by-trial latency to lever press on the two-armed bandit task. Each outcome was followed by a 10 second intertrial interval (ITI).
Frame independent projected fiber photometry
To measure calcium-associated changes in fluorescence in real time, recordings were made from vHip-NAc and mPFC-NAc-projecting cells during the two-armed bandit task, the one-lever forced choice task, and the no-lever response free task. Samples were collected at a frequency of 20 Hz using Neurophotometrics hardware through Bonsai and FlyCap software. Recordings were coupled to the start of behavioral analysis by interfacing Bonsai with MED-PC using a custom DAQ box (Neurophotometrics).
Photometry data extraction and normalization
Photometry data were extracted and analyzed using custom-written scripts in Python (3.6.10). To normalize the data, the control channel (415 nm) was fitted to the raw signal (470 nm). The fitted control was then subtracted from the raw trace. The resultant trace was divided by the fitted control, giving the ΔF/F, which was then converted to a Z-score. This calculation was performed over the entirety of the session to preserve dynamic fluctuation in population activity that persists beyond individual trials and to allow comparison across trials. For heatmaps, Z-scores were baseline-subtracted using the average activity in the two seconds prior to lever press to accommodate moving baselines. For analyses of reward history, Z-scores were baseline-subtracted using the average activity in the two seconds prior to lever press on trial t-1 to account for shifted baselines in trial t0.
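The normalization steps above can be illustrated with a short, dependency-free Python sketch. It is not the authors' script (which is available in the linked repository); the function names are hypothetical, the straight-line least-squares fit of the control channel is an assumption because the fit type is not stated, and the population standard deviation is used for the Z-score as a further assumption.

```python
from statistics import mean, pstdev

FS_HZ = 20  # sampling rate reported in the Methods

def fit_control(control, signal):
    """Least-squares line mapping the 415 nm control onto the 470 nm signal.
    A straight-line fit is an assumption; the source does not name the fit."""
    mc, ms = mean(control), mean(signal)
    sxx = sum((c - mc) ** 2 for c in control)
    sxy = sum((c - mc) * (s - ms) for c, s in zip(control, signal))
    slope = sxy / sxx
    intercept = ms - slope * mc
    return [slope * c + intercept for c in control]

def session_zscore(signal, control):
    """dF/F = (signal - fitted control) / fitted control, z-scored over the session."""
    fitted = fit_control(control, signal)
    dff = [(s - f) / f for s, f in zip(signal, fitted)]
    mu, sd = mean(dff), pstdev(dff)
    return [(v - mu) / sd for v in dff]

def baseline_subtract(z, press_idx, n_after, fs=FS_HZ, baseline_s=2.0):
    """Subtract the mean of the 2 s before the lever press from the samples after it."""
    n_base = int(round(baseline_s * fs))
    if press_idx < n_base:
        raise ValueError("not enough samples before the lever press")
    baseline = mean(z[press_idx - n_base:press_idx])
    return [v - baseline for v in z[press_idx:press_idx + n_after]]
```

For the reward-history analyses, the same `baseline_subtract` idea would apply with the baseline window taken before the lever press on trial t-1 rather than the current trial.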
Optogenetics in two-armed bandit task
Following lever press training, animals started the two-armed bandit task with optogenetic manipulations of mPFC-NAc and vHip-NAc activity for the duration of the ITI. Each day animals received either mPFC-NAc, vHip-NAc, or simultaneous mPFC-NAc and vHip-NAc stimulation on a subset of trials over the course of 9 days such that they received a total of 3 days of stimulation per condition for each stimulation protocol tested (5 Hz, 10 ms, 1–2 mW; 8 Hz, 10 ms, 2–3 mW). The order of stimulation days was fully counterbalanced within and between mice to avoid any order effects. Stimulation was delivered by 450 nm lasers controlled by a laser driver (Doric) running Doric studios software and triggered via a TTL (Med-Associates) at ITI start on a random subset of trials (30%) and terminated immediately prior to lever extension.
Ex vivo current-clamp electrophysiology
Brain slice preparation
Mice were deeply anesthetized with isoflurane. Transcardial perfusion was performed with 25–30 ml of ice-chilled carbogenated NMDG artificial cerebrospinal fluid (aCSF: containing in mM: 92 NMDG, 2.5 KCl, 1.25 NaH2PO4, 30 NaHCO3, 20 HEPES, 25 glucose, 2 thiourea, 5 Na-ascorbate, 3 Na-pyruvate, 0.5 CaCl2·4H2O and 10 MgSO4·7H2O; titrated to pH 7.3–7.4 with concentrated hydrochloric acid). Brain slices (200 μm) were prepared in ice-chilled carbogenated NMDG aCSF using a vibratome (Leica VT 1200S). All brain slices were recovered in 32–34 °C carbogenated NMDG aCSF for 10 min and then transferred into room-temperature carbogenated HEPES holding aCSF (containing in mM: 92 NaCl, 2.5 KCl, 1.25 NaH2PO4, 30 NaHCO3, 20 HEPES, 25 glucose, 2 thiourea, 5 Na-ascorbate, 3 Na-pyruvate, 2 CaCl2·4H2O and 2 MgSO4·7H2O; titrated to pH 7.3–7.4 with NaOH) for at least 1 hour before current-clamp recording.
Electrophysiology recordings
Current-clamp recordings were performed in room-temperature carbogenated aCSF (containing in mM: 128 NaCl, 3 KCl, 1.25 NaH2PO4, 2 MgCl2, 2 CaCl2, 24 NaHCO3 and 10 glucose; pH 7.2). The patch pipette solution was composed of (in mM) 115 K-gluconate, 20 KCl, 1.5 MgCl2, 10 Phosphocreatine-Tris, 2 Mg-ATP, 0.54 Na-GTP, and 10 HEPES. Blue light (wavelength: 470 nm) from an LED system (DC4100, Thorlabs) was used for optogenetic stimulation to evoke action potentials. The optogenetic stimulation protocol consisted of trains of 5 Hz (1–2 mW) or 8 Hz (2–3 mW) 10 ms light pulses for 5 s. All signals were amplified and digitized by Multiclamp 700B (Molecular Devices) and Digidata 1550B (Molecular Devices), respectively. Series and access resistance were monitored during the experiments and signals were Bessel filtered at 2 kHz.
Data analysis & statistics
Linear mixed effects regression
Linear Mixed Effects Regression Models are a powerful approach to probe variance attributable to variables of interest (e.g., trial outcome) while simultaneously controlling for random effects (e.g., session ID)28,48,49. This is useful for modeling instances where there is nonindependence in the structure of data, e.g. multiple trials recorded within multiple animals. Models were fit using the full interaction of the factors of interest (trial outcome, previous trial outcome, sex) and using animal ID and session ID as random effects using the lme4 package (1.1-30) in R (4.2.1)50. Where the dependent variable was latency, a Gamma link function was used to approximate the non-Gaussian distribution. The fitted models were used to calculate estimated marginal means using the emmeans package (1.8.0) in R (4.2.1)51. The effect of variables of interest was then examined by comparing estimated marginal means. Given the large number of samples generated using this approach (all trials x all animals), comparisons of estimated marginal means were conducted using a Z-test and Sidak’s method to adjust for multiple comparisons.
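The final comparison step, a Z-test on a contrast of estimated marginal means followed by Sidak's adjustment, reduces to simple arithmetic that can be sketched in Python. This is an illustration only: the models and contrasts in the study were computed with lme4 and emmeans in R, and the function names and example numbers below are hypothetical. For m comparisons, the Sidak-adjusted p-value is 1 − (1 − p)^m.

```python
from math import erfc, sqrt

def z_test_two_sided(estimate, se):
    """Return (z, p) for a contrast estimate and its standard error,
    with p the two-sided tail probability under a standard normal."""
    z = estimate / se
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

def sidak_adjust(p_values):
    """Sidak adjustment for a family of m comparisons: 1 - (1 - p)^m."""
    m = len(p_values)
    return [1 - (1 - p) ** m for p in p_values]

# Hypothetical contrasts (difference in estimated marginal means, SE).
contrasts = [(0.30, 0.10), (0.05, 0.08), (0.18, 0.09)]
raw_p = [z_test_two_sided(est, se)[1] for est, se in contrasts]
adjusted_p = sidak_adjust(raw_p)
```

The adjustment is slightly less conservative than Bonferroni's m × p while still controlling the family-wise error rate when the comparisons are independent.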
Cross-correlation time delay analysis
Time delay analysis was performed by first calculating the cross-correlation between mPFC-NAc and vHip-NAc during the ITI across a maximum lag of ± 5 seconds using the ccf function in R (4.2.1). The argument of the maximum (i.e., the time offset of peak correlation) of the resulting cross-correlation function was used to estimate the delay between mPFC-NAc and vHip-NAc on a trial-by-trial basis52. Linear mixed effects models were then fit to assess if the delay was non-zero (i.e., non-synchronous) using the following models to test for effects of sex [Time Delay~Sex-1+(1|ID)+(1|Day)] and for the interaction between sex and reward [Time Delay~Rewards:Sex-1+(1|ID)+(1|Day)]. The resulting regression coefficients from each model were examined to determine if the time delay was non-zero in any group (i.e., regression coefficient significantly different from zero).
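The delay estimate described above (the lag at which the cross-correlation peaks) can be sketched in plain Python. The analysis in the study was run in R; the function names here are hypothetical, and the sign convention, in which a positive delay means the first signal lags the second, is an assumption chosen for the example. The 20 Hz sampling rate and ± 5 second maximum lag are taken from the Methods.

```python
from statistics import mean

def cross_correlation(x, y, max_lag):
    """Normalized cross-correlation of two equal-length signals for lags
    -max_lag..+max_lag (in samples). Lag k pairs x[t + k] with y[t]."""
    if len(x) != len(y) or len(x) <= max_lag:
        raise ValueError("signals must be equal length and longer than max_lag")
    n = len(x)
    mx, my = mean(x), mean(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    norm = (sum(v * v for v in dx) * sum(v * v for v in dy)) ** 0.5
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            s = sum(dx[t + k] * dy[t] for t in range(n - k))
        else:
            s = sum(dx[t] * dy[t - k] for t in range(n + k))
        ccf[k] = s / norm
    return ccf

def peak_delay_seconds(x, y, fs=20, max_lag_s=5.0):
    """Time offset (s) of the peak cross-correlation; positive values
    mean x is a delayed copy of y under the convention above."""
    max_lag = int(round(max_lag_s * fs))
    ccf = cross_correlation(x, y, max_lag)
    best_lag = max(ccf, key=ccf.get)
    return best_lag / fs
```

Applied per trial to the two ITI traces, this yields one delay value per trial, which is the quantity the mixed effects models then test against zero.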
Conditional entropy analysis
Conditional entropy is an information measure used to estimate the amount of additional information needed to explain one signal given full knowledge of a second signal. This can be interpreted as the unique information contributed by a second signal beyond that contributed by a first, with smaller conditional entropy suggesting less unique information carried by the second signal. Conditional entropy was calculated on the first two seconds and the last two seconds of the ITI using the PyInform package (0.2.0) in Python to calculate the entropy (H) of the mPFC-NAc circuit given the vHip-NAc circuit, H(mPFC-NAc|vHip-NAc), and the entropy of the vHip-NAc circuit given the mPFC-NAc circuit, H(vHip-NAc|mPFC-NAc)53,54.
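To make the measure concrete, the following Python sketch computes conditional entropy directly from its definition, H(X|Y) = H(X,Y) − H(Y). It does not use or reproduce the PyInform API; the function names are hypothetical, and the equal-width binning used to turn a continuous photometry trace into discrete states (including the number of bins) is an assumption, since the discretization used in the study is not described here.

```python
from collections import Counter
from math import log2

def discretize(values, n_bins=4):
    """Map continuous samples to integer states using equal-width bins.
    The binning scheme and bin count are assumptions for illustration."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(states):
    """Shannon entropy (bits) of a sequence of hashable states."""
    n = len(states)
    return -sum((c / n) * log2(c / n) for c in Counter(states).values())

def conditional_entropy(x, y):
    """H(X | Y) = H(X, Y) - H(Y), in bits, for paired state sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return entropy(list(zip(x, y))) - entropy(y)
```

With `mpfc` and `vhip` as discretized traces from the same ITI window, `conditional_entropy(mpfc, vhip)` corresponds to H(mPFC-NAc|vHip-NAc) and `conditional_entropy(vhip, mpfc)` to H(vHip-NAc|mPFC-NAc). A value of zero means the first signal is fully determined by the second.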
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The raw, processed, and source data generated in this study have been deposited in the Open Science Framework (OSF) database under accession code https://osf.io/v89ey/?view_only=6cb3865aee944a658b589bc27ecf0d28. Source data are provided with this paper.
Code availability
Code used to perform analyses for all figures is available at https://github.com/eshaaniyer/mPFCvHip-NAc_RewardIntegration
References
Marder, E. Variability, compensation, and modulation in neurons and circuits. Proc. Natl. Acad. Sci. USA. 108, 15542–15548 (2011).
Marder, E. & Goaillard, J.-M. Variability, compensation and homeostasis in neuron and network function. Nat. Rev. Neurosci. 7, 563–574 (2006).
Mizusaki, B. E. P. & O’Donnell, C. Neural circuit function redundancy in brain disorders. Curr. Opin. Neurobiol. 70, 74–80 (2021).
Ghanbari, M., Li, G., Hsu, L. & Yap, P. Accumulation of network redundancy marks the early stage of Alzheimer’s disease. Hum. Brain Mapp. 44, 2993–3006 (2023).
Hiratani, N. & Fukai, T. Redundancy in synaptic connections enables neurons to learn optimally. Proc. Natl. Acad. Sci. 115, E6871–E6879 (2018).
Nguyen, A. T., Xu, J., Luu, D. K., Zhao, Q. & Yang, Z. Advancing system performance with redundancy: from biological to artificial designs. Neural Comput. 31, 555–573 (2019).
Britt, J. P. et al. Synaptic and behavioral profile of multiple glutamatergic inputs to the nucleus accumbens. Neuron 76, 790–803 (2012).
Carter, A. G., Soler-Llavina, G. J. & Sabatini, B. L. Timing and location of synaptic inputs determine modes of subthreshold integration in striatal medium spiny neurons. J. Neurosci. 27, 8967–8977 (2007).
Christoffel, D. J. et al. Selective filtering of excitatory inputs to nucleus accumbens by dopamine and serotonin. Proc. Natl. Acad. Sci. 118, e2106648118 (2021).
Floresco, S. B. The nucleus accumbens: an interface between cognition, emotion, and action. Annu. Rev. Psychol. 66, 25–52 (2015).
French, S. J. & Totterdell, S. Hippocampal and prefrontal cortical inputs monosynaptically converge with individual projection neurons of the nucleus accumbens. J. Comp. Neurol. 446, 151–165 (2002).
Lind, E. B. et al. A quadruple dissociation of reward-related behaviour in mice across excitatory inputs to the nucleus accumbens shell. Commun. Biol. 6, 1–12 (2023).
Muir, J. et al. Sex-biased neural encoding of threat discrimination in nucleus accumbens afferents drives suppression of reward behavior. Nat. Neurosci. 27, 1966–1976 (2024).
O’Donnell, P. & Grace, A. Synaptic interactions among excitatory afferents to nucleus accumbens neurons: hippocampal gating of prefrontal cortical input. J. Neurosci. 15, 3622–3639 (1995).
Grace, A. A., Floresco, S. B., Goto, Y. & Lodge, D. J. Regulation of firing of dopaminergic neurons and control of goal-directed behaviors. Trends Neurosci. 30, 220–227 (2007).
Parker, N. F. et al. Choice-selective sequences dominate in cortical relative to thalamic inputs to NAc to support reinforcement learning. Cell Rep. 39, 110756 (2022).
Bagot, R. C. et al. Ventral hippocampal afferents to the nucleus accumbens regulate susceptibility to depression. Nat. Commun. 6, 7062 (2015).
Barker, J. M., Bryant, K. G. & Chandler, L. J. Inactivation of ventral hippocampus projections promotes sensitivity to changes in contingency. Learn. Mem. 26, 1–8 (2019).
Hamel, L. et al. Cortico-striatal control over adaptive goal-directed responding elicited by cues signaling sucrose reward or punishment. J. Neurosci. 42, 3811–3822 (2022).
Lindenbach, D., Vacca, G., Ahn, S., Seamans, J. K. & Phillips, A. G. Optogenetic modulation of glutamatergic afferents from the ventral subiculum to the nucleus accumbens: Effects on dopamine function, response vigor and locomotor activity. Behav. Brain Res. 434, 114028 (2022).
Muir, J. et al. Ventral hippocampal afferents to nucleus accumbens encode both latent vulnerability and stress-induced susceptibility. Biol. Psychiatry 88, 843–854 (2020).
Otis, J. M. et al. Prefrontal cortex output circuits guide reward seeking through divergent cue encoding. Nature 543, 103–107 (2017).
Spellman, T., Svei, M., Kaminsky, J., Manzano-Nieves, G. & Liston, C. Prefrontal deep projection neurons enable cognitive flexibility via persistent feedback monitoring. Cell 184, 2750–2766.e17 (2021).
Wenzel, J. M. et al. Selective chemogenetic inactivation of corticoaccumbal projections disrupts trait choice impulsivity. Neuropsychopharmacology 48, 1821–1831 (2023).
Yoshida, K. et al. Opposing ventral striatal medium spiny neuron activities shaped by striatal parvalbumin-expressing interneurons during goal-directed behaviors. Cell Rep. 31, 107829 (2020).
Chen, G. et al. Distinct reward processing by subregions of the nucleus accumbens. Cell Rep. 42, 112069 (2023).
Reed, S. J. et al. Coordinated reductions in excitatory input to the nucleus accumbens underlie food consumption. Neuron 99, 1260–1273.e4 (2018).
Yu, Z. et al. Beyond t test and ANOVA: applications of mixed-effects models for more rigorous statistical analysis in neuroscience research. Neuron 110, 21–35 (2022).
Sul, J. H., Kim, H., Huh, N., Lee, D. & Jung, M. W. Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron 66, 449–460 (2010).
Bari, B. A. et al. Stable representations of decision variables for flexible behavior. Neuron 103, 922–933.e7 (2019).
Beierholm, U. et al. Dopamine modulates reward-related vigor. Neuropsychopharmacology 38, 1495–1503 (2013).
Cox, J. et al. A neural substrate of sex-dependent modulation of motivation. Nat. Neurosci. 26, 274–284 (2023).
Hamid, A. A. et al. Mesolimbic dopamine signals the value of work. Nat. Neurosci. 19, 117–126 (2016).
Niv, Y., Daw, N. D., Joel, D. & Dayan, P. Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology 191, 507–520 (2007).
Lafferty, C. K., Yang, A. K., Mendoza, J. A. & Britt, J. P. Nucleus accumbens cell type- and input-specific suppression of unproductive reward seeking. Cell Rep. 30, 3729–3742.e3 (2020).
Millan, E. Z., Kim, H. A. & Janak, P. H. Optogenetic activation of amygdala projections to nucleus accumbens can arrest conditioned and unconditioned alcohol consummatory behavior. Neuroscience 360, 106–117 (2017).
Yoshida, K., Drew, M. R., Mimura, M. & Tanaka, K. F. Serotonin-mediated inhibition of ventral hippocampus is required for sustained goal-directed behavior. Nat. Neurosci. 22, 770–777 (2019).
Yoshida, K. et al. Chronic social defeat stress impairs goal-directed behavior through dysregulation of ventral hippocampal activity in male mice. Neuropsychopharmacology 46, 1606–1616 (2021).
Pignatelli, M. et al. Cooperative synaptic and intrinsic plasticity in a disynaptic limbic circuit drive stress-induced anhedonia and passive coping in mice. Mol. Psychiatry 26, 1860–1879 (2021).
Williams, E. S. et al. Androgen-dependent excitability of mouse ventral hippocampal afferents to nucleus accumbens underlies sex-specific susceptibility to stress. Biol. Psychiatry 87, 492–501 (2020).
Griffin, W. C., Lopez, M. F., Woodward, J. J. & Becker, H. C. Alcohol dependence and the ventral hippocampal influence on alcohol drinking in male mice. Alcohol 106, 44–54 (2023).
Kircher, D. M., Aziz, H., Mangieri, R. A. & Morrisett, R. A. Ethanol experience enhances glutamatergic ventral hippocampal inputs to D1 receptor-expressing medium spiny neurons in the nucleus accumbens shell. J. Neurosci. 3051–18 https://doi.org/10.1523/JNEUROSCI.3051-18.2019 (2019).
Barrientos, C. et al. Cocaine-induced structural plasticity in input regions to distinct cell types in nucleus accumbens. Biol. Psychiatry 84, 893–904 (2018).
Cahill, M. E. et al. Bidirectional synaptic structural plasticity after chronic cocaine administration occurs through Rap1 small GTPase signaling. Neuron 89, 566–582 (2016).
Pascoli, V. et al. Contrasting forms of cocaine-evoked plasticity control components of relapse. Nature 509, 459–464 (2014).
Zinsmaier, A. K., Dong, Y. & Huang, Y. H. Cocaine-induced projection-specific and cell type-specific adaptations in the nucleus accumbens. Mol. Psychiatry 27, 669–686 (2022).
Dana, H. et al. High-performance calcium sensors for imaging activity in neuronal populations and microcompartments. Nat. Methods 16, 649–657 (2019).
Fetcho, R. N. et al. Regulation of social interaction in mice by a frontostriatal circuit modulated by established hierarchical relationships. Nat. Commun. 14, 2487 (2023).
Kato, T. et al. Oscillatory population-level activity of dorsal raphe serotonergic neurons is inscribed in sleep structure. J. Neurosci. 42, 7244–7255 (2022).
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. Preprint at https://arxiv.org/abs/1406.5823 (2014).
Lenth, R. et al. emmeans: estimated marginal means, aka least-squares means. R package version 1.8.0 https://CRAN.R-project.org/package=emmeans (2021).
Abboud, S. & Sadeh, D. The use of cross-correlation function for the alignment of ECG waveforms and rejection of extrasystoles. Comput. Biomed. Res. 17, 258–266 (1984).
Cover, T. M. & Thomas, J. A. Entropy, relative entropy and mutual information. Elem. Inf. Theory 2, 12–13 (1991).
Moore, D. G., Valentini, G., Walker, S. I. & Levin, M. Inform: efficient information-theoretic analysis of collective behaviors. Front. Robot. AI 5, 60 (2018).
Claudi, F. Mouse Top Detailed. Retrieved from: https://scidraw.io/drawing/183. https://doi.org/10.5281/zenodo.3925997 (2020).
Acknowledgements
We thank Dr. Becket Ebitz, Dr. Mihaela Iordanova, and Heike Schuler for their helpful comments and feedback throughout this project. The research was supported by an NSERC grant to R.C.B. (RGPIN-2017-04225). Mouse artwork used in Figs. 1A, 4A, 4F, 5B, and Supplementary Fig. 12A of this article was sourced from SciDraw and is the original artwork of Federico Claudi, distributed under the CC-BY 4.0 license55.
Author information
Contributions
Conceptualization, E.S.I. and R.C.B.; Methodology, E.S.I. and R.C.B.; Investigation, E.S.I., P.V., S.W., J.M., Y.C.T., V.C.; Writing – Original Draft, E.S.I. and R.C.B.; Writing – Review & Editing, E.S.I. and R.C.B.; Funding Acquisition, R.C.B.; Resources, R.C.B.; Supervision, R.C.B.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Sylvain Crochet, Jose Moron and the other, anonymous, reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Iyer, E.S., Vitaro, P., Wu, S. et al. Reward integration in prefrontal-cortical and ventral-hippocampal nucleus accumbens inputs cooperatively modulates engagement. Nat Commun 16, 3573 (2025). https://doi.org/10.1038/s41467-025-58858-4
DOI: https://doi.org/10.1038/s41467-025-58858-4