Introduction

In daily life, we often choose actions based on what we anticipate the outcomes will be. Such behaviors are flexible, modifiable, and based on expectations. For example, one might frequent a particular vending machine for one’s afternoon snack. If the machine malfunctions, one adapts by developing a new strategy – seeking a new machine or patronizing a coffee shop or bringing a snack from home. The orbitofrontal cortex (OFC) is thought to catalog information necessary for developing strategies to obtain desired outcomes, particularly when organisms must deviate from established routines1,2,3,4. Recently, neuron ensembles within the OFC were discovered that maintain representations of task structure5, cue-reward associations6, context-outcome associations7, and action-outcome expectations8, all supporting action flexibility. How these ensembles are formed and stabilized, and the extent to which they are functionally distinct, is incompletely understood.

Excitatory plasticity in the basolateral amygdala (BLA) has long been known to alter the coding properties of OFC neurons9. Excitatory projections from the BLA to OFC are necessary for reward expectations to motivate specific action plans10,11, and for new memory formation when familiar behaviors are unexpectedly not rewarded8. BLA inputs to the OFC are thus positioned to fine-tune ensembles in the OFC that form specific memory traces. We tested this possibility by training mice to reliably obtain rewards by engaging in two distinct behaviors. Then, one behavior ceased to be reinforced, requiring mice to form long-term memory to inhibit the non-reinforced behavior to favor the other, still-reinforced behavior in the future. We first replicate our recent finding that specific OFC neurons active during memory encoding are necessary for later action flexibility, suggesting that they form stable memory traces for action-outcome expectations8. We then report here that they are sufficient to trigger action flexibility and are functionally distinct from cells responsive to familiar reward information. We also report that reducing excitatory neural activity in the BLA during memory encoding obstructed later action flexibility. Thus, plasticity within the BLA is necessary for memory trace formation within the OFC.

Long-term storage of action variables requires new experiences to trigger durable cellular changes that allow prior learning to be later accessed12. Accordingly, we also examined the density and structure of dendritic spines, the principal sites of postsynaptic excitatory plasticity in the brain, specifically on memory trace neurons. We questioned whether the neurons undergo structural plasticity that is: 1) learning-related, 2) selective to conditions in which mice form novel reward-related memories, 3) unique relative to neighboring neurons not contained within the memory ensemble, and 4) like learning itself, dependent on concurrent neural activity within the BLA. Altogether, our findings indicate that amygdalo-OFC interactions form selective, durable memory ensembles within the OFC that are necessary for action flexibility, and they are characterized by distinct patterns of structural plasticity, providing a likely substrate for long-term memory storage.

Results

Neurons within the OFC are capable of forming and maintaining memories enabling organisms to deviate from previously learned action sequences and generate new strategies to dynamically seek reward8. Meanwhile, they can also maintain cue-reward and context-outcome memories, which can support reward pursuit in the presence of reward-predictive cues and contexts6,7. Potentially for this reason, the OFC has been implicated in habit-like behavior, referring to familiar action sequences that are elicited by cues13,14,15,16. Whether OFC neurons controlling these seemingly opposing processes (forming memory traces necessary for flexible action vs. engaging in habitual routines) form functional ensembles that are distinct or overlapping is unclear. To test this, mice were trained to nose poke equally on two distinct ports for food (supplementary Fig. 1A, B). Once mice responded proficiently on each port, they were given isolated access to one port; here, responding no longer resulted in food reward, and pellets were instead delivered non-contingently (non-reinforced session; Fig. 1A). In the next session, mice were given isolated access to the opposite port, and responding remained reinforced, as during training (reinforced session; Fig. 1A). Thus, for one nose-poking behavior, established reinforcement conditions were violated, whereas for the other, expectations were maintained. Finally, response preference was tested by placing the mice in the chambers with both ports available for a brief choice test conducted in extinction (Fig. 1A). Mice that prefer the behavior that had remained consistently reinforced are considered flexible, deviating from the equivalent responding established during training. Meanwhile, poking equivalently on both ports is considered inflexible (habit-like), that is, insensitive to new action-reward contingencies.

Fig. 1: Memory for novel action strategies is evident within the OFC.

A Mice were trained to perform two food-reinforced nose-poke responses (acquisition). Subsequently, one response was no longer reinforced, and food was delivered non-contingently (non-reinforced), triggering new memory encoding. The action-outcome association linked with the other food remained intact (reinforced). Reinforced (reinf) and non-reinforced (non-reinf) sessions were counterbalanced. Response preferences were measured upon memory retrieval during a subsequent choice test. Created in BioRender. Yount, S. (https://BioRender.com/qt89zub). B Viral targeting of active OFC neuron ensembles. C Representative induced mCherry in the OFC following the non-reinforced session and experiment timeline. Scale bar = 100 µm. D Training. E Choice test 1: Chemogenetically silencing memory encoding neurons obstructed flexible choice upon memory retrieval (no effect of group F < 1, main effect of reinforcement F(1,15) = 12.86, p = 0.0106, interaction group X reinforcement F(1,15) = 4.485, p = 0.05). F Choice test 2: Groups were insensitive to new action contingencies following RI training, as expected, with no impact of Gi-DREADDs (no effect of group or reinforcement Fs < 1, no interaction group X reinforcement F(1,15) = 1.586, p = 0.2272). G Response preferences differed only in choice test 1 (no effect of group F(1,14) = 1.514, p = 0.2388, no main effect of test F(1,14) = 1.931, p = 0.1863, interaction group X test F(1,14) = 7.027, p = 0.0190). Ctrl n = 9; Gi n = 8. H Representative induced mCherry in the OFC following the execution of familiar behaviors and experiment timeline. Scale bar=100 µm. I Training. J Choice test 1: Groups were insensitive to new action contingencies following RI training, with no impact of Gi-DREADDs (no effect of group or interaction Fs < 1, no main effect of reinforcement F(1,18) = 3.631, p = 0.0728). 
K Choice test 2: Groups were sensitive to new action contingencies following RR training, with no impact of Gi-DREADDs (no effect of group or interaction Fs < 1, main effect of reinforcement F(1,18) = 7.627, p = 0.0128). L Response preferences did not differ at any time (no effect of group F(1,18) = 1.391, p = 0.2536, main effect of test F(1,18) = 4.199, p = 0.0553, no interaction group X test F < 1). Ctrl n = 10; Gi n = 10. Data are presented as individual mice or mean ± s.e.m. *p ≤  0.05 (main effect or post hoc). ns, not significant, analyzed by ANOVA. Full statistics detailed in supp. Table 1.

First, we used Fos2A-iCreER mice, which express a 4-hydroxytamoxifen (4OHT)-inducible form of Cre recombinase (Cre) under the control of the Fos gene promoter. Thus, 4OHT induces Cre in cfos+ cells, allowing for their labeling and manipulation, for instance through Cre-dependent chemogenetic receptors like Designer Receptors Exclusively Activated by Designer Drugs (DREADDs). We first induced inhibitory chemogenetic receptors, Gi-coupled DREADDs, in OFC neurons expressing cfos when mice encountered and encoded new reward information (that a familiar behavior is no longer reinforced) by injecting 4OHT immediately following the non-reinforced training session (Fig. 1B, C). We later inhibited the activity of this neuron population using the chemogenetic ligand clozapine N-oxide (CNO) (supplementary Fig. 1C, D) during the choice test, when mice must retrieve newly learned information to flexibly deviate from established routines. Chemogenetically silencing this cell population blocked the ability of mice to update action strategies (Fig. 1D, E; see also ref. 8). Thus, this neuron population maintains a stable representation of novel information necessary for flexible choice.

The same mice were next trained to respond on both ports using a random interval (RI) schedule of reinforcement, which prompts habitual response strategies that are insensitive to action outcomes17 and then retested in their ability to flexibly deviate from routine. Chemogenetically inhibiting the same neuron population had no impact on the expression of habit-like behavior, in that both control and chemogenetic receptor-expressing mice responded non-selectively at a subsequent choice test (Fig. 1F). This phenomenon was further exemplified by calculating preference ratios (response rates on the reinforced port / response rate on the non-reinforced port), in which case, preferences >1 indicate flexible action, while preferences near 1 indicate equivalent responding and no preference for the reinforced response. As expected, control mice as a group demonstrated preferences >1 in the initial and not second choice test, while experimental mice as a group generated scores approximating 1 throughout (Fig. 1G). Thus, neural ensembles in the OFC stably represent new information necessary for action flexibility (that is, a memory trace; MT), but the same ensemble does not control the expression of habit-like behavior.
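The preference ratio described above can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis code: the function name `preference_ratio`, the example response rates, and the handling of a zero denominator are all assumptions made for illustration.

```python
import math


def preference_ratio(reinforced_rate: float, nonreinforced_rate: float) -> float:
    """Response rate on the reinforced port divided by the rate on the
    non-reinforced port.

    Values > 1 indicate a preference for the reinforced response (flexible
    action); values near 1 indicate equivalent responding (habit-like).
    """
    if reinforced_rate < 0 or nonreinforced_rate < 0:
        raise ValueError("response rates must be non-negative")
    if nonreinforced_rate == 0:
        # Assumption: the source does not say how a zero denominator is
        # handled; here the ratio is reported as inf (or NaN for 0/0).
        return math.inf if reinforced_rate > 0 else math.nan
    return reinforced_rate / nonreinforced_rate


# Hypothetical response rates (responses per minute), not data from the study.
flexible_mouse = preference_ratio(12.0, 4.0)   # clear preference
habitual_mouse = preference_ratio(8.0, 8.0)    # equivalent responding
print(flexible_mouse, habitual_mouse)
```

A group-level comparison against 1 (for example, the 1-sample t-test noted in the figure legends) would then operate on one such ratio per mouse.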

A separate cohort of mice was trained initially using RI schedules of reinforcement. Expression of Gi-coupled DREADDs was induced in OFC neurons expressing cfos late in training, during the execution of highly routinized behaviors (Fig. 1H), capturing a comparable number of neurons relative to the prior experiment (supplementary Fig. 1E–G). Again, inhibiting cell activity did not impact the later expression of habit-like behavior, indicated by non-preferential responding even when one familiar response was unlikely to be reinforced (Fig. 1I, J). Mice were then re-trained using a random ratio (RR) schedule of reinforcement to engender flexible responding. Inhibiting cell activity after contingencies were modified again had no impact, now on the expression of flexible responding (Fig. 1K, L). Thus, although some OFC neurons are apparently active during the execution of familiar, routinized behaviors and even reactivated upon a choice test (supplementary Fig. 1G), they are not necessary for shaping habitual behavior or flexible action selection.

We next examined the overlap between neuron populations expressing cfos upon learning new action-outcome information with those expressing cfos upon the expression of habit-like behavior, and between neuron populations expressing cfos upon RI training with neuron populations expressing cfos upon the expression of flexible behavior. To determine whether overlap differed from what was to be expected by chance, we performed a bootstrap analysis. In both circumstances, the overlap between cell populations fell within the 95% confidence intervals of the overlap expected by chance (supplementary Fig. 1H, I). Thus, cells expressing cfos upon the execution of routinized behavior are randomly distributed among MT cell ensembles.
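The logic of comparing observed overlap with chance can be sketched as a resampling procedure. The source does not describe how its bootstrap was implemented, so the following is only one plausible approach under stated assumptions: two ensembles are modeled as independent random subsets of a common pool of counted cells, and the function names, pool size, and ensemble sizes are hypothetical.

```python
import random


def chance_overlap_interval(n_cells, n_a, n_b, n_resamples=10000, seed=0):
    """Return the (2.5th, 97.5th) percentiles of the overlap count expected
    by chance when two ensembles of sizes n_a and n_b are drawn
    independently and at random from a pool of n_cells cells.
    """
    rng = random.Random(seed)
    cells = range(n_cells)
    overlaps = []
    for _ in range(n_resamples):
        ensemble_a = set(rng.sample(cells, n_a))
        ensemble_b = rng.sample(cells, n_b)
        # Count cells that landed in both randomly drawn ensembles.
        overlaps.append(sum(1 for cell in ensemble_b if cell in ensemble_a))
    overlaps.sort()
    lower = overlaps[int(0.025 * (n_resamples - 1))]
    upper = overlaps[int(0.975 * (n_resamples - 1))]
    return lower, upper


def within_chance(observed_overlap, interval):
    """True if the observed overlap falls inside the chance interval."""
    lower, upper = interval
    return lower <= observed_overlap <= upper


# Hypothetical counts, not data from the study.
interval = chance_overlap_interval(n_cells=1000, n_a=100, n_b=100, seed=1)
print(interval, within_chance(10, interval))
```

An observed overlap inside the interval, as reported here, is consistent with the two populations being drawn independently of one another.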

Bidirectional control of action strategies

To summarize, neural ensembles in the OFC stably represent new memories necessary for action flexibility. We next asked whether this neuron population was sufficient to induce action flexibility. To do this, we induced excitatory Gq-coupled chemogenetic constructs in MT neurons (Fig. 2A, B), increasing immediate-early gene levels in the presence of CNO (supplementary Fig. 2A, B). Mice were trained using ratio schedules, and response flexibility was first tested in the absence of CNO. All mice responded flexibly as expected, this test thus confirming that MTs had formed (Fig. 2C, D). Mice were next trained using an RI schedule of reinforcement to induce habit-biased responding, allowing us to detect improvements in action flexibility, if they existed. At choice test 2 in the presence of CNO, control mice responded inflexibly, not preferring one port over the other as expected, while mice expressing the excitatory chemogenetic receptor updated their action strategies (Fig. 2E, F). Thus, MT neurons are sufficient to induce action flexibility.

Fig. 2: Memory trace neurons exert bidirectional control over action strategies.

A Viral targeting of active OFC neuron ensembles and representative induced mCherry in the OFC following the non-reinforced session (upon new memory encoding). Scale bar, 100 µm. B Experiment timeline. C Responses across training. D Choice test 1: Mice were sensitive to changes in reward contingencies in the absence of CNO (no main effect of group F < 1, main effect of reinforcement F(1,18) = 39.22, p < 0.0001, no interaction group X reinforcement F < 1). E Choice test 2: Chemogenetic stimulation of MT neurons induced action flexibility, despite extended training that otherwise confers inflexible responding (no main effect of group F(1,18) = 2.577, p = 0.1258, main effect of reinforcement F(1,18) = 7.549, p = 0.013, interaction group X reinforcement F(1,18) = 7.343, p = 0.0144). F Accordingly, response preference differed only upon choice test 2 (main effect of test F(1,18) = 13.11, p = 0.0020, no main effect of group F < 1, interaction test X group F(1,18) = 4.699, p = 0.0438). veh n = 10; 4OHT n = 10. G Experiment timeline. H. Responses across training. I. Choice test 1: Mice were sensitive to changes in reward contingencies in the absence of CNO (no main effect of group F < 1, main effect of reinforcement F(1,14) = 14.04, p = 0.0022, no interaction group X reinforcement F < 1). J. Choice test 2: Chemogenetic stimulation of neurons expressing cfos following familiar task conditions had no impact on later choice (all F < 1). K. Accordingly, response preferences did not differ at any time (main effect of test F(1,14) = 8.058, p = 0.0131, no main effect of group F < 1, no interaction test X group F < 1). veh n = 8; 4OHT n = 8. Data are presented as individual mice or mean ± s.e.m. *p <  0.05 (main effect or post hoc). ns not significant. Data were analyzed by ANOVA. Full statistics detailed in supp. Table 2.

We hypothesized that MT neurons are specifically tuned to novel task parameters – specifically, that a familiar behavior is no longer reinforced. Accordingly, we conducted a parallel control experiment in which we induced excitatory chemogenetic receptors in OFC neurons expressing cfos following the session when responding remained reinforced (Fig. 2G), in other words, when task parameters remained familiar. Again, all mice responded flexibly during choice test 1 in the absence of CNO, a control measure confirming that MTs had formed (Fig. 2H, I). Unlike with stimulation of MT neurons, though, the later stimulation of this “familiar” neuron population did not impact behavior following RI training, as both groups of mice engaged in habit-like, non-preferential responding (Fig. 2J, K). Thus, the allocation of neurons into an MT necessary for action flexibility is gated by novel, not familiar, information.

New learning triggers dendritic spine plasticity on MT neurons

Long-term memory storage triggers lasting cellular changes that allow prior learning to be later accessed18. We sought to define synaptic plasticity-related mechanisms that may support flexible action. We isolated ribosome-bound mRNA transcripts from MT neurons and, in separate mice, from neurons that expressed cfos following familiar reinforcement conditions as in the prior figure, termed “familiar” neurons (Fig. 3A–C; and supplementary Fig. 3A, B). This allows for the measurement of mRNA being actively translated in MT vs. “familiar” neurons during the execution of choice behavior, which we expected to differ from bulk OFC tissue. In both conditions, neurons expressed less Pvalb (encoding parvalbumin, a marker of inhibitory interneurons) compared to bulk OFC tissue (Fig. 3D), indicating that isolated cells were primarily excitatory in nature, as expected. Unexpectedly, though, Dlg4 (encoding post-synaptic density protein 95; PSD95, an indicator of excitatory synapses) was also lower compared to bulk OFC tissue and, surprisingly, comparable between groups (Fig. 3E). This finding suggests that PSD95 is altered (reduced) in captured neuron populations. As a control, we lastly measured Gapdh, which did not differ between groups or from bulk tissue, as expected because Gapdh is expressed ubiquitously and would not necessarily change in response to task experience (supplementary Fig. 3C).

Fig. 3: Memory trace neurons undergo learning-related dendritic spine plasticity.

A Viral vector strategy. B 4OHT in FosCreER mice induces eGFP-L10a transgene in active neuron populations. EGFP-labeled polysomes are isolated to concentrate actively translating mRNAs. Created in BioRender. Yount, S. (https://BioRender.com/7fqnpc0). C Experiment timeline. D Pvalb (familiar p = 0.0015; MT p = 0.0127) and (E) Dlg4 (familiar p = 0.0463; MT p = 0.0094) were lower in both MT and “familiar” neurons relative to bulk tissue. F Strategy for high-resolution imaging of MT and “familiar” OFC neurons and representative cell co-labeled with mCherry and YFP. Scale bar = 10 µm. G Representative dendritic spine reconstructions. Scale bars=2 µm. H Thin-type spines were lower in density on both familiar and MT neurons relative to neighboring control neurons in the same mouse (main effect of Cfos status F(1,14) = 12.546, p = 0.003, no main effect of when trapped F(1,14) = 2.308, p = 0.151, no interaction Cfos status X when trapped F < 1). I MT neurons selectively displayed higher mushroom-type spine densities (no main effect of Cfos status F(1,14) = 1.708, p = 0.203, no main effect of when trapped F(1,14) = 2.994, p = 0.106, interaction Cfos status X when trapped F(1,14) = 10.906, p = 0.005). J Thus, MT neurons are distinguished by increased mushroom-to-thin-type spine ratios (main effect of Cfos status F(1,14) = 11.156, p = 0.005, main effect of when trapped F(1,14) = 7.800, p = 0.014, interaction Cfos status X when trapped F(1,14) = 7.157, p = 0.018). familiar n = 6; MT n = 10. K Representative induced mCherry in the OFC. Scale bar=100 µm. L Experiment timeline. M Representative dendritic spine reconstructions. Scale bars=2 µm. N Responses across training. O Cocaine obstructed flexible choice (main effect of cocaine F(1,29) = 4.483, p = 0.0429, main effect of reinforcement F(1,29) = 17.20, p = 0.0003, interaction group X reinforcement F(1,29) = 2.443, p = 0.1289; planned post hoc comparisons applied). saline n = 16; cocaine n = 15. 
P Thin-type spines were lower in density on MT cells, referred to as Cfos+, relative to neighboring neurons, referred to as Cfos-. Inset: Cocaine blunted this attrition in thin-type spines (p = 0.0128). Q Mushroom-type spine densities also increased on Cfos+ neurons, while (inset) cocaine blunted this elevation (p = 0.0135). saline n = 7; cocaine n = 7. R MT neurons (Cfos+) in cocaine-naïve mice had large spine heads (KS = 4.951, p < 0.0001 saline Cfos+ neurons vs. neighboring Cfos- neurons), while cocaine-exposed neurons did not (KS = 0.6669, p = 0.7653 cocaine vs. saline Cfos+ neurons). Individual spines: saline Cfos+ n = 862; saline Cfos- n = 1011; cocaine Cfos+ n = 935; cocaine Cfos- n = 1116. S Accordingly, only MT neurons (Cfos+) in cocaine-naïve mice developed the expected elevation in mushroom spine-type proportions (main effect of Cfos status F(1,12) = 54.728, p < 0.0001, main effect of cocaine F(1,12) = 10.321, p = 0.007, interaction Cfos status X cocaine F(1,12) = 13.455, p = 0.003). n = 7/group. Data are presented as pooled samples (top), individual mice, individual spine heads, or mean ± s.e.m. $p < 0.05 1-sample t-test vs. 1, *p < 0.05, **p < 0.001. ns not significant, s saline, c cocaine. Data analyzed by 1-sample t-test, ANOVA, or K-S comparison. Full statistics detailed in supp. Table 3.

Given that PSD95 can predict synaptic density19, and dendritic spines are the primary hosts of excitatory synapses in the brain20, we hypothesized that both MT and “familiar” neurons undergo dendritic spine plasticity relative to neurons not stimulated by task features. To investigate this, we crossed Fos2A-iCreER mice with Thy1-YFP mice. The Thy1-YFP H line results in sparse, Golgi-like labeling of layer V pyramidal tract neurons21,22. This low labeling density minimizes crossing fibers in each field of view, enabling clear single-cell visualization and 3D reconstruction. As before, MT neurons were identified through activity-driven mCherry (Fig. 3F, G), allowing us to examine YFP+/mCherry+ cells. Throughout, both MT and “familiar” YFP+ neurons had lower densities of thin-type spines relative to neighboring YFP+ cells lacking mCherry, referred to as Cfos- cells (Fig. 3H). This is consistent with lower Dlg4 expression in both groups (again, Fig. 3E). Interestingly, though, MT neurons contained greater densities of mushroom-type spines (Fig. 3I), distinguishing them from “familiar” neurons. Accordingly, MT neurons are characterized by an enhanced proportion of mushroom-shaped spines (Fig. 3J). This spine type is durable, housing stable excitatory contacts, relative to thin spines, which are transient in nature.

If spine plasticity is related to new learning, we reasoned that obstructing the capacity of mice to update action strategies should also obstruct spine plasticity on MT neurons. To that end, we exposed new mice chronically to either saline or cocaine, which reliably obstructs flexible responding, including in the present task (Fig. 3K–O) (see refs. 23,24,25,26). MT neurons (Cfos+ cells) in saline-treated mice contained fewer thin-type spines and more mushroom-type spines relative to neighboring control cells, as expected (Fig. 3P, Q). However, cocaine minimized these disparities, reducing the differences between Cfos+ and Cfos- cells (inset Fig. 3P, Q). This observation led us to also measure individual spine head diameters. Diameters differed between groups, with drug-naïve MT neurons having the largest heads. Cocaine obstructed this apparent growth (Fig. 3R). Accordingly, cocaine decreased the proportion of mushroom-shaped spines on MT neurons (Fig. 3S), again eliminating differences between MT and non-MT cells. In control mice, this proportion was elevated on MT neurons relative to neighboring cells, replicating our prior experiment (again, Fig. 3J). Thus, cocaine blunted learning-induced dendritic spine plasticity on MT neurons.
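Comparing distributions of individual spine head diameters, as in the Kolmogorov-Smirnov (K-S) comparison above, can be sketched as follows. This is an illustrative implementation of the standard two-sample K-S D statistic (the largest gap between two empirical cumulative distributions). It is an assumption that this corresponds to the authors' procedure, and the unscaled D computed here is not expected to match the scaling of the "KS" values reported in the figure legend. The function name and the example diameters are hypothetical.

```python
def ks_two_sample_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D statistic: the maximum absolute
    difference between the empirical CDFs of the two samples.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    n_a, n_b = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n_a and j < n_b:
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x so that ties are
        # handled together before the CDF gap is measured.
        while i < n_a and a[i] <= x:
            i += 1
        while j < n_b and b[j] <= x:
            j += 1
        d = max(d, abs(i / n_a - j / n_b))
    return d


# Hypothetical spine head diameters in micrometers, not data from the study.
cfos_pos = [0.52, 0.61, 0.66, 0.70, 0.75]
cfos_neg = [0.35, 0.40, 0.44, 0.50, 0.58]
print(ks_two_sample_d(cfos_pos, cfos_neg))
```

A D near 0 indicates nearly identical diameter distributions, while a D near 1 indicates almost no overlap between them; a p-value would additionally depend on the number of spines in each group.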

Notably, cocaine did not impact the assembly of MT ensembles, as indicated by the density of labeled cells, or immediate-early gene levels following the choice test—in other words, the reactivation of MT neurons during memory retrieval (supplementary Fig. 3D, E). This outcome is consistent with our prior report indicating that the size of the ensemble does not alone predict the ability of mice to learn new action strategies8. The present data suggest that spine plasticity more faithfully predicts action strategies.

Amygdalo-cortical interactions coordinate action strategies

We next investigated inputs that might relay new reward information to the OFC. To do so, we first identified inputs to the OFC that are active during the encoding of new action strategies by selectively inducing mCherry in neurons that both project to the OFC and express Cfos during new memory encoding (Fig. 4A, B; and supplementary Fig. 4A, B). We found robust mCherry labeling of neurons within the basolateral amygdala (BLA) at levels that exceeded mCherry induced during familiar task contingencies or when mice were at rest, in their home cages (Fig. 4C, D). We also noted sparse labeling throughout the cortex (supplementary Fig. 4C). As a negative control, we confirmed that no striatal neurons were labeled, as expected given that striatal cells do not project to the frontal cortex (supplementary Fig. 4C).

Fig. 4: Memory trace neurons synchronize with the BLA to coordinate flexible action strategies and associated neurosequelae in the OFC.

A Viral vector strategy. B Experiment timeline. C Representative induced mCherry in the BLA. Scale bar, 100 µm, inset scale bar, 50 µm. Blue reflects DAPI. D Percentage of DAPI+ cells expressing mCherry in the BLA (main effect of group F(2,18) = 59.14, p < 0.0001). home cage n = 8, familiar n = 9, novel n = 4. E Viral vector strategy. F Representative infusions of (top) PSAM-GlyR in the OFC (scale bar, 100 µm) and (bottom) Gi-DREADDs in the BLA (scale bar, 50 µm). G Experiment timeline. H Contralateral infusions impaired the ability of mice to update action strategies, with mice failing to respond flexibly during the choice test (no main effect of group F < 1, main effect of reinforcement F(1,19) = 6.492, p = 0.0196, interaction group X reinforcement F(1,19) = 9.418, p = 0.0063). ipsilateral n = 12; contralateral n = 9. I Viral vector strategy. J Experiment timeline. K Representative dendritic spine reconstructions of MT neurons in the hemisphere with unadulterated cellular activity within the BLA (mCherry) or suppressed neural activity (Gi). Scale bars, 2 µm. L Thin-type spine densities did not differ between hemispheres (p = 0.0950), but M mushroom-type spine densities were lower in the Gi hemisphere (p = 0.0545). N Thus, the ratio of mushroom-to-thin-type spines was impoverished on MT neurons in the OFC lacking typical BLA neuron activity during the encoding of novel reward information (p = 0.0151). n = 7 pairs. Data are presented as individual mice and group means. *p ≤ 0.05, **p < 0.001. ns not significant. Data were analyzed by t-test or ANOVA, as indicated by the test statistic. Full statistics detailed in supp. Table 4.

We next employed combinatorial chemogenetics to test the hypothesis that excitatory plasticity in the BLA during new memory encoding is necessary for MT formation in the OFC, i.e., for memories supporting the later retrieval of updated action strategies. We used Gi-coupled DREADDs to inhibit the activity of BLA neurons in one hemisphere during new memory encoding (supplementary Fig. 4D, E). In the ipsilateral or contralateral hemisphere, the inhibitory modified glycine receptor PSAM4-GlyR was induced in MT neurons and then used to silence those neurons during memory retrieval (the choice test) (Fig. 4E–G; and supplementary Fig. 4F, G). Importantly, we first confirmed in a control experiment that uPSEM792, the synthetic ligand for PSAM-GlyR, has no effect on choice behavior in the absence of PSAM-GlyR (supplementary Fig. 4H, I). Because BLA→OFC projections are largely ipsilateral, mice with contralateral infusions will have BLA neuron activity suppressed in one hemisphere during memory encoding, thus depriving MT neurons in that hemisphere of signal. The opposite hemisphere will have intact BLA neurons during memory encoding, but the MT neurons receiving their input will be inhibited during memory retrieval. If neural activity in the BLA is necessary for MT formation, mice in the contralateral group will be unable to retrieve updated action memories. Meanwhile, mice in the control ipsilateral group will have one hemisphere intact throughout. As hypothesized, mice in the contralateral condition were unable to execute flexible responding during the choice test, while ipsilateral mice were unaffected (Fig. 4H). Thus, BLA neuron activity is necessary during the encoding of new reward information for MT formation in the OFC and later engagement of adaptive action strategies. This likely occurs via output of MT neurons to the dorsal striatum, given that these MT neurons terminate in this brain region (supplementary Fig. 4J), and orbito-dorsostriatal projections are necessary for goal-directed action flexibility8,27,28,29.

This discovery leads to the prediction that excitatory neural activity in the BLA is necessary for dendritic spine rearrangement on MT neurons in the OFC. Accordingly, we again labeled MT neurons—those expressing Cfos upon new memory encoding. Also, during the memory encoding period, excitatory neurons in the BLA in one hemisphere were chemogenetically silenced, while the BLA in the opposite hemisphere carried only a control viral vector (Fig. 4I, J; and supplementary Fig. 4K, L). Following memory retrieval, MT neurons in the hemisphere with dampened BLA input had fewer mushroom-type spines compared to MT neurons receiving unadulterated BLA input (Fig. 4L, M) and accordingly, a lower proportion of mushroom-type spines (Fig. 4N). Thus, excitatory activity in the BLA triggers learning-related dendritic spine plasticity on MT neurons in the OFC.

Neurotrophin-related signaling within amygdalo-cortical circuits supports new learning and associated spine plasticity

Lastly, to refine our understanding of what kind of cell activities in the BLA may support MT formation in the OFC, we hypothesized that neurotrophin signaling within the BLA is necessary for MT formation in the OFC.

The high-affinity receptor for brain-derived neurotrophic factor (BDNF) is tropomyosin receptor kinase B (TrkB). TrkB forms dimers that initiate downstream signaling30,31. Full-length isoforms are necessary for long-term potentiation and other forms of excitatory plasticity, including in the BLA32. Meanwhile, the truncated isoform, TrkB.t1, lacks the intracellular tail necessary for signal propagation and thus serves as a dominant negative. Here we again capitalized on the ipsilateral nature of BLA-to-OFC connections. We overexpressed TrkB.t1 in one hemisphere of the BLA and induced Gi-coupled DREADDs in MT neurons in the ipsilateral or contralateral hemisphere (Fig. 5A–C). Thus, in one hemisphere of mice with contralateral infusions, BLA neurons with compromised TrkB signaling will project to healthy MT neurons. In the opposite hemisphere, BLA neurons with intact TrkB-mediated signaling will project to MT neurons that will be inhibited during memory retrieval. In the control (ipsilateral) group, one hemisphere is intact throughout. These control mice were capable of demonstrating flexible action strategies, while mice in the contralateral condition failed to respond flexibly during the choice test (Fig. 5D, E). Thus, neurotrophin-related signaling within the BLA is necessary for MT formation in the OFC.

Fig. 5: Neurotrophin signaling in the BLA is necessary for flexible OFC-dependent action strategies and associated dendritic spine plasticity.

A. Viral vector strategy. B. Representative infusions (left) of induced Gi-DREADDs in the OFC and (right) HA-tagged TrkB.t1 in the BLA. Scale bars, 100 µm. C. Experiment timeline. D. Responses across training. E. Contralateral infusions impaired the ability of mice to update action strategies, such that this group failed to respond flexibly during the choice test (no main effect of group F(1,14) = 1.323, p = 0.2692, no main effect of reinforcement F < 1, interaction group X reinforcement F(1,14) = 7.821, p = 0.0143). ipsilateral n = 7; contralateral n = 9. F. Representative MT neurons (red) co-labeled with YFP. MT neurons were identified using the same training and 4OHT administration timing as in C. Scale bars, 50 µm top, 10 µm bottom. G. Thin-type spine densities did not differ between hemispheres (p = 0.141), but H. mushroom-type spine densities were lower in the TrkB.t1 hemisphere (p = 0.007). I. The ratio of mushroom-to-thin-type spines trended lower on MT neurons in the OFC lacking typical BLA neuron activity (p = 0.078). n = 8 hemisphere pairs. Data are presented as individual mice or mean ± s.e.m. *p < 0.05. ns not significant. Data were analyzed by t-test or ANOVA, as indicated by the test statistic. Full statistics detailed in supp. Table 5.

A natural prediction is that neurotrophin signaling in the BLA would also impact dendritic spine profiles of MT neurons in the OFC. To test this possibility, we infused a control viral vector in the BLA of one hemisphere and the Trkb.t1-expressing viral vector in the opposite hemisphere of Fos2A-iCreER/Thy1-YFP mice crossed with mice expressing Cre-driven tdTomato. Thus, 4OHT injection immediately following the non-reinforced session again labeled MT neurons with a red fluorophore, while YFP allowed for their high-resolution 3D reconstruction (Fig. 5F). As with chemogenetically silencing BLA neurons, obstructing neurotrophin signaling in the BLA caused MT neurons within the same hemisphere to suffer attrition of mushroom-shaped dendritic spines (Fig. 5G–I). Thus, neurotrophin signaling in the BLA supports mature spine presence on MT neurons in the OFC.

Discussion

Here we investigated a cohesive neuron ensemble within the OFC that controls flexible behavior. This neuron ensemble is active during the encoding of new memories regarding action consequences and is both necessary and sufficient to drive the later expression of flexible choice when mice must retrieve those memories (see also8,24). This discovery is consistent with the cognitive map hypothesis of OFC function, indicating that the OFC tracks associations between cues and actions and events, creating models to predict the likely outcomes of future behaviors33. Thus, the OFC must be online for a “map” and predictions to form and then available to guide behavior later, and particularly in contexts of new and detailed learning33,34,35. Here, we provide evidence that a stable ensemble controls this function and that neurons therein are functionally and structurally distinct relative to neighboring cell populations not in the ensemble, as well as cell populations that are responsive to familiar reward information. Further, they undergo learning-related structural remodeling that requires concurrent excitatory neural activity and neurotrophin signaling in the BLA.

Across species, the OFC supports action flexibility, referring to the ability to modify familiar action strategies based on an awareness of the consequence of our behaviors27,36,37,38. This process often requires the encoding and retrieval of new long-term memories, which we investigated here. Mice were trained to generate multiple responses for food delivery. Then, one response became less likely to be reinforced, violating the established action-outcome contingency associated with this particular behavior and triggering new memory encoding. Mice were later given the opportunity to engage in varied responses, requiring them to retrieve reward memories to adaptively favor the more highly reinforced action. By genetically accessing neurons active during memory encoding and silencing them during memory retrieval, we first replicated evidence8 that a neuron population, which we refer to as memory trace (MT) neurons, resides within the OFC that encodes action strategy memory necessary for flexible choice.

In some circumstances, stimulation of excitatory OFC neurons paradoxically obstructs the very same flexible choice behavior, inducing habit-like actions8,13 and compulsive-like behavior39,40, often considered related to habit. These patterns raise the question: Could cell ensembles in the OFC control both goal seeking and habit-like behavior? To address this question, we utilized distinct schedules of reinforcement, with ratio schedules generally supporting goal-directed behavior that is sensitive to action outcomes. In contrast, interval schedules promote habitual behavior (non-selective responding) by weakening the association between actions and their outcomes17,41. Chemogenetically silencing MT neurons did not alter habit-based behavior. Meanwhile, stimulation of these cells augmented choice flexibility following interval training, over-riding habit-like behavior. Thus, MT neurons are both necessary and sufficient for flexible goal seeking, but do not impact routinized habit. This neuron ensemble appears durable, given that we recently found that chemogenetically stimulating MT neurons multiple weeks after their initial ‘trapping’ and in mice that experienced additional training still enriched action flexibility24. This multi-week interval exceeded the duration of time between tests here, suggesting that the patterns reported here cannot simply be attributed to the limited passage of time. Notably, MT cell stimulation also improves action flexibility in similar tasks – namely when response inhibition results in reward24. Thus, it appears that one function of these MT cells is to enable organisms to apply learned rules to similar circumstances in the future. We hypothesize that the nature of the memory may be to represent the causation between actions and outcomes and that this relationship is changeable, such that mice are primed to adaptively inhibit behaviors that fail to be reinforced in the future.

Here we also found that some OFC neurons expressed the immediate-early gene Cfos when mice engaged in familiar response strategies. Their inactivation had no impact on later goal-directed or habit-like behavior. Of course, this outcome does not contradict evidence that neural hyper-activity in the OFC can under some circumstances trigger habit-like action sequences8,13,15,16,42; rather, it suggests that OFC neurons active during familiar actions are not necessary for the same actions to be executed later and it is again consistent with the task space hypothesis of OFC function, which would not predict a role for OFC neurons in habitual behavior33. Indeed, the observation that OFC neurons are apparently active in response to familiar task conditions was unexpected (here and also8). What their function might be remains unclear.

Interestingly, both MT neurons and “familiar” neurons (those containing Cfos following the execution of familiar behaviors) expressed less Dlg4, encoding PSD95, compared to bulk OFC tissue. PSD95 is a marker of excitatory synapses, which led to the hypothesis that dendritic spines, the primary sites of excitatory synapses in the brain, underwent some sort of plasticity relative to neighboring cells. Thin-type dendritic spines were indeed lower in density, consistent with Dlg4 attrition, and consistent with prior reports of thin-type spine elimination on OFC neurons in instrumentally trained vs. yoked mice8,43. What could not be captured in prior reports, though, is the morphology of functionally defined sub-populations, which revealed that MT neurons contain high densities and proportions of mature mushroom-type spines, relative to “familiar” neurons or neighboring cells in the same mice. These spines house stable synapses44, which could facilitate the accessibility of these MT neurons for subsequent memory retrieval.

Dendritic spines are often considered substrates of learning and memory45,46,47, leading us to reason that if spine profiles are associated with new learning, then stimuli that obstruct learning should interfere with spine plasticity. Here we turned to the psychostimulant cocaine, which, across species, impairs the ability of organisms to learn from feedback—for instance whether certain choices are reinforced and should be pursued or not—hampering one’s ability to predict the consequences of one’s actions48,49,50,51,52. Cocaine obstructed the capacity of mice to update expectations, as expected, and also diminished learning-related spine plasticity on MT neurons. For instance, thin-type spines were pruned, but to a lesser degree following cocaine. Similarly, spine head enlargement, a process necessary for synapse stabilization53, was blunted. And finally, the high proportions of mature, mushroom-type spines on MT cells failed to manifest, suggesting that the conversion of spines into this mature form is essential for new learning. These effects of cocaine are likely attributable, at least in part, to the impact of cocaine on cortical cell adhesion factors, leading to spine stasis54, as well as to cocaine-elicited stress hormone release, destabilizing mushroom-type spines24.

Cocaine did not impact overall dendritic spine densities, despite prior evidence of spine attrition on OFC neurons following psychostimulant exposure54, including self-administered cocaine26,55,56. A possible explanation is the localization of imaging: Cocaine in adult subjects (as here) causes dendritic spine attrition on apical arbors in the lateral OFC24, but this phenomenon has not been well-investigated, to our knowledge, on basilar arbors. Basilar arbors were imaged here because dendritic spine features (spine head volumes and thin-type spine densities) on the basilar arbors of the same YFP + OFC neurons imaged here that receive BLA input and that have terminals in the dorsomedial striatum predict the capacity for flexible action in mice8.

The BLA is a major source of projections to the OFC57, and BLA-to-OFC projections are necessary for action memory updating writ large8, reinforcement learning58, and value-based choice59, among other processes. We found that like OFC neurons, BLA neurons projecting to the OFC were active during new memory encoding. Further, chemogenetically silencing BLA neurons during memory encoding obstructed later action flexibility and proliferation of mature spines on OFC neurons, indicating that excitatory neural activity in the BLA is necessary for MT formation and associated neurosequelae in the OFC. While we did not confirm that MT neurons receive direct input from BLA projections, we think it likely because OFC neurons that receive direct monosynaptic input from the BLA undergo virtually identical spine ratio changes as MT neurons here, which are also blocked by cocaine8. A related but distinct question is whether the mature spines apparently gained receive input from the BLA and/or elsewhere. It is conceivable that MT neurons receive direct BLA input and at the same time, new spines receive input from the same and/or other structures. Possibly, BLA input conveys salience signals, communicating outcome-related information to the OFC8,58,60. The allocation of neurons into an MT is tightly linked to relative neuronal excitability61. Potentially, the BLA biases which OFC neurons are allocated to the MT by providing excitatory input, reinforcing mature spines on recipient neurons.

We lastly investigated molecular activities in the BLA necessary for new learning. BDNF is a primary neurotrophic factor in the brain, necessary for long-term potentiation in the BLA32, as well as the serotonergic regulation of inhibitory GABAergic transmission62. TrkB is the high-affinity receptor for BDNF, essential for amygdala-dependent learning like fear conditioning63. TrkB dimerizes upon binding, with full-length isoforms initiating intracellular signaling30,31. Meanwhile, the truncated isoform lacks the intracellular tail necessary for signal propagation and suppresses BDNF-mediated signaling30,31,64. We unilaterally expressed TrkB.t1 in the BLA to disrupt endogenous neurotrophin signaling and chemogenetically inhibited MT neurons in the contralateral hemisphere during memory retrieval, disrupting action flexibility. In concert, MT neurons in the OFC suffered mature spine attrition if the associated ipsilateral BLA bore the truncated trkB isoform. Thus, TrkB-mediated signaling supports the intra-hemispheric amygdala-orbital interactions necessary for sustained action flexibility and mature spine stability on MT neurons. The source of BDNF is one open question: While the amygdala contains Bdnf mRNA65, suggesting that it is locally produced, BDNF protein is also subject to anterograde and retrograde transport, such that Bdnf depletion in the OFC deprives the amygdala of BDNF content66. This raises the possibility that BDNF within an amygdala-orbital circuit is necessary for MT formation, though projection-specific Bdnf depletion (a tool that would enable one to test this hypothesis) remains challenging67.

Interestingly, chemogenetically silencing BLA projections and suppressing trkB signaling in the BLA largely spared learning-related pruning of thin-type spines. Other inputs may contribute to the elimination of this immature spine population (or its conversion to mature types). Candidates include cortico-cortical projections, given that some were active during new memory encoding (supplementary Fig. 4c). Further, projections from the medial prefrontal cortex to OFC appear to optimize other forms of reward-related learning via interactions with spine-enriched cell adhesion proteins within excitatory OFC neurons68.

Memory engrams refer to groups of cells that 1) become activated through the process of learning, 2) are reactivated by a component of the initial stimulus to facilitate memory retrieval, and 3) exhibit lasting cellular modifications69. We report that MT neurons 1) are active after learning new reward information, 2) need to be reactivated to retrieve updated action strategy, and 3) exhibit unique learning-induced changes in dendritic spine profiles. Thus, memory engrams appear to form in the OFC to support flexible action strategy. To visualize dendritic spines, we used a mutant mouse that bears sparse, Golgi-like YFP labeling in a subset of layer V neurons. The sparse labeling is useful for visualizing dendrites in high resolution because it minimizes background interference from other cells traversing the field of view, but of course, it prohibits us from determining the cell types, and their relative proportions, making up the engram, which could be a topic of future investigation. Our findings also raise the question of whether structural changes on MT neurons are causally related to updated action strategies. We have demonstrated strong associations here supporting this possibility, and further, pharmacological interventions that restore response flexibility following cocaine require dendritic spine plasticity in the OFC25. Recently bioengineered actin cytoskeletal modulators70 could potentially be leveraged to allow for optical control of spine dynamics specifically in activity defined neuronal populations, thus causally linking dendritic spine plasticity on defined neurons—such as those forming an MT—with behavioral outcomes.

Methods

Subjects

Procedures were performed in accordance with National Institutes of Health Guidelines for the Care and Use of Laboratory Animals and approved by the Emory University IACUC, protocol 201700227. Mice were bred from Jackson Laboratory stock. Memory trace experiments were performed using Fostm2.1(iCre/ERT2)Luo/J mice (no. 030323)71. For dendritic spine imaging, they were crossed with B6.Cg-Tg(Thy1-YFP)HJrs/J mice (no. 003782)21. In a final experiment, these double-mutants were crossed with Cre-dependent tdTomato reporter mice (B6.Cg-Gt(ROSA)26Sortm14(CAG-tdTomato)Hze/J, no. 007914)72 to create triple mutant mice bred on a mixed C57BL/6 background. Mice were homozygous for the Fostm2.1(iCre/ERT2)Luo/J construct throughout. Male and female mice were evenly distributed, with no effects of sex or trends for such effects detected. Mice were group-housed on a 14-hour light cycle with temperatures between 17.8-26.1 °C and provided food and water ad libitum before testing. Mice were randomized to their viral vector groups. Then, throughout testing, mice were food-restricted until body weights were reduced to 90-93% of baseline to motivate food-reinforced responding. Mice were aged to at least postnatal day 56 (P56).

Instrumental response training

Mice were trained to nose poke 2 ports to receive food reinforcers delivered in an independent food magazine in illuminated Med-Associates conditioning chambers. All mice were initially trained using a fixed ratio 1 (FR1) schedule of reinforcement. Mice were trained daily, with 30 pellets available for responding on each active port, for a total of 60 pellets/session. Once 60 pellets were attained, or 70 min elapsed, the session ended. Mice required 7–10 days of training to become proficient, earning all available pellets. In cases when mice underwent only FR1 training, the final 7 sessions for each mouse are reported. Reinforced responses are expressed as responses/minute.

Some mice were then trained further according to either ratio schedules of reinforcement, which engender goal-directed behavior because reinforcement is predictable, or random interval (RI) schedules, which, with time, induce habitual responding that is insensitive to action contingency17,41. In this case, random intervals of time are inserted during which reinforcers are not available, thus weakening the association between actions and outcomes. Random ratio (RR) schedules were RR3 or RR6, as indicated in the figures. RI schedules were RI30 or RI60 seconds, also as indicated. Schedule parameters were established such that mice responded at equivalent rates and acquired the same number of pellets, regardless of schedule, to control for motor activity and reward experience between groups.
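The contrast between ratio and interval schedules can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the chamber software used here: the exponential waiting time for the RI schedule, the evenly spaced responses, and the names `random_ratio_reinforced`, `RandomInterval` and `pellets_earned` are all hypothetical choices made for illustration.

```python
import random


def random_ratio_reinforced(n: int, rng: random.Random) -> bool:
    """RR-n: each response is reinforced with probability 1/n."""
    return rng.random() < 1.0 / n


class RandomInterval:
    """RI-t: a reinforcer becomes available after a random wait (mean t seconds).

    The exponentially distributed wait is an assumption of this sketch,
    not a detail reported in the text.
    """

    def __init__(self, mean_s: float, rng: random.Random) -> None:
        self.mean_s = mean_s
        self.rng = rng
        self.available_at = rng.expovariate(1.0 / mean_s)

    def respond(self, now_s: float) -> bool:
        """Return True if a response at time now_s collects a reinforcer."""
        if now_s < self.available_at:
            return False
        # Reinforcer collected; draw the next period without reinforcement.
        self.available_at = now_s + self.rng.expovariate(1.0 / self.mean_s)
        return True


def pellets_earned(schedule: str, responses_per_min: float,
                   minutes: float = 30.0, seed: int = 0) -> int:
    """Count reinforcers for evenly spaced responses under RR3 or RI30."""
    if schedule not in ("RR3", "RI30"):
        raise ValueError(f"unknown schedule: {schedule}")
    rng = random.Random(seed)
    ri = RandomInterval(30.0, rng)
    spacing_s = 60.0 / responses_per_min
    earned = 0
    for i in range(int(responses_per_min * minutes)):
        now_s = (i + 1) * spacing_s
        if schedule == "RR3":
            earned += random_ratio_reinforced(3, rng)
        else:
            earned += ri.respond(now_s)
    return earned


if __name__ == "__main__":
    # Compare how pellet yield scales with response rate on each schedule.
    for schedule in ("RR3", "RI30"):
        slow = pellets_earned(schedule, responses_per_min=10)
        fast = pellets_earned(schedule, responses_per_min=20)
        print(schedule, "10/min:", slow, "20/min:", fast)
```

In a simulation of this kind, pellet yield under the ratio schedule scales with response rate, whereas yield under the interval schedule is capped by elapsed time. This is one way to picture why interval training weakens the action-outcome association.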

Initially (Fig. 1), each response (left vs. right) resulted in a distinct reinforcer because a hallmark feature of goal-directed behavior is linking an action with the sensory properties of the associated reward. Here, 20 mg sweet grain- vs. chocolate-based pellets were used (Bioserv) because mice display no systematic preference for either flavor (see supplementary data). After confirming evidence of an MT neuron population that supports reward seeking, subsequent experiments proceeded to use single pellet reinforcement (sweet grain). This approach controls for spatial associations (e.g., left port = sweet grain pellet; right port = chocolate), ensuring that mice cannot use this variable to solve the task.

Test of response flexibility

After training, we assessed response flexibility in a test adapted from classical instrumental contingency degradation73. Two 25 min sessions were conducted on 2 consecutive days: First, 1 nose poke port was occluded. Pellets were delivered at a rate equivalent to the pellet delivery rate from the prior day or at least 1 pellet per minute (whichever was higher), with no relation to responding on the available port. Thus, responding at this port no longer predicted reinforcement. On the following day, the opposite port was occluded, and responding at the available port remained reinforced according to a variable ratio 2 schedule. The port designated to be reinforced vs. non-reinforced was randomized, and the order of the sessions over the 2-day period was counterbalanced. On a final third day, both ports were available during a 5 min choice test conducted in extinction. Preferential responding on the reinforced port is evidence of action flexibility, while equal responding across ports reflects inflexible responding, a deferral to response strategies established during training.
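The delivery rule for the non-reinforced session and the readout of the choice test can be written out as a short sketch. The function names and the example numbers are hypothetical. The preference fraction is an illustrative summary only, since the analyses reported here compare response rates by ANOVA rather than using this index.

```python
def noncontingent_pellet_rate(prior_pellets: int, prior_minutes: float) -> float:
    """Pellets/min for the non-reinforced session.

    Uses the prior day's delivery rate or 1 pellet/min, whichever is higher.
    """
    if prior_minutes <= 0:
        raise ValueError("prior_minutes must be positive")
    return max(prior_pellets / prior_minutes, 1.0)


def preference_for_reinforced(reinforced_responses: int,
                              nonreinforced_responses: int) -> float:
    """Fraction of choice-test responses made on the still-reinforced port.

    Values near 0.5 indicate equal (inflexible) responding; values above
    0.5 indicate a preference for the reinforced port.
    """
    total = reinforced_responses + nonreinforced_responses
    if total == 0:
        raise ValueError("no responses recorded")
    return reinforced_responses / total


if __name__ == "__main__":
    # Hypothetical mouse: 30 pellets in 40 min on the prior day (0.75/min),
    # so the 1 pellet/min floor applies.
    print(noncontingent_pellet_rate(30, 40.0))
    # Hypothetical choice test: 30 responses on the reinforced port, 10 on the other.
    print(preference_for_reinforced(30, 10))
```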

Drug administration

Prior to injection, mice were administered saline daily for 3 days to habituate them to injection stress. Injections followed training sessions. 4OHT (40.0 mg/kg, in 2% Tween 80, 5% DMSO and saline, 2 mL/100 g, i.p.; Hello Bio) was used to induce Cre-recombinase in Cfos+ cells in Fostm2.1(iCre/ERT2)Luo/J mice. Injection timing is indicated in the figure timelines. In all experiments using 4OHT, mice were housed with cagemates in a dark, quiet room for ≥2 hr after injection to minimize task-unrelated cfos, before being returned to the vivarium. They had been previously habituated to this quiet room by placing them in the space for >2 hr before the day of injection. Mice were then left undisturbed in their home cages for 5 days to allow for expression of the newly recombined construct. Then, behavioral testing resumed.

The synthetic DREADD ligand CNO was administered 25 min prior to the choice test or immediately following the non-reinforced session (1.0 mg/kg, in 2% DMSO and saline, 1 mL/100 g, i.p.; Sigma). Importantly, this dose does not impact responding in this task8,43, and it does not produce detectable back-metabolized clozapine74. When CNO was administered in combination with 4OHT, it was administered as a drug cocktail at the doses indicated in 2% Tween 80, 5% DMSO, and saline, 2 mL/100 g, i.p.

The synthetic PSAM ligand, uPSEM792, was administered 25 min prior to the choice test (1.0 mg/kg, in saline in a volume of 1 mL/100 g, i.p.; Tocris). When uPSEM792 was administered in combination with CNO, it was administered as a drug cocktail at the doses indicated in 2% DMSO and saline, 1 mL/100 g, i.p.

In these experiments, all mice received 4OHT, CNO, and/or uPSEM792, regardless of viral vector, except for experiments using Gq-DREADDs. Here, vehicle-treated mice were included to control for any (unlikely) constitutive activity of excitatory DREADDs constructs.

Finally, cocaine (30.0 mg/kg in saline, 1 mL/100 g; Sigma-Aldrich) or saline was administered i.p. daily for 14 days. Behavioral testing began 14 days following the final injection (procedure from75).
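Doses in this section are given in mg/kg, with injection volumes in mL per 100 g body weight. The arithmetic linking these to a per-mouse injection is sketched below. The 25 g body weight and the name `injection_plan` are hypothetical and used only for illustration.

```python
from typing import Tuple


def injection_plan(body_weight_g: float, dose_mg_per_kg: float,
                   volume_ml_per_100g: float) -> Tuple[float, float, float]:
    """Return (injection volume in mL, drug mass in mg, solution concentration in mg/mL)."""
    if body_weight_g <= 0 or volume_ml_per_100g <= 0:
        raise ValueError("body weight and volume must be positive")
    volume_ml = body_weight_g / 100.0 * volume_ml_per_100g
    dose_mg = body_weight_g / 1000.0 * dose_mg_per_kg
    return volume_ml, dose_mg, dose_mg / volume_ml


if __name__ == "__main__":
    # Hypothetical 25 g mouse with the 4OHT parameters from the text
    # (40.0 mg/kg, 2 mL/100 g).
    volume_ml, dose_mg, conc = injection_plan(25.0, 40.0, 2.0)
    print(f"{volume_ml:.2f} mL containing {dose_mg:.2f} mg ({conc:.2f} mg/mL)")
```

Note that the required solution concentration depends only on the dose and the volume per unit body weight, not on the weight of the individual mouse.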

Surgery

Mice were anesthetized with ketamine (100.0 mg/kg, i.p., 1 mL/100 g) and dexmedetomidine (0.5 mg/kg, i.p., 1 mL/100 g). The analgesic meloxicam (5.0 mg/kg, s.c., 1 mL/100 g) was administered before and after surgery. Mice were placed in a digitized stereotaxic frame (Stoelting), the head was leveled, and the scalp was incised with a midsagittal cut to perform a craniotomy. Viral vector infusions were performed using a microliter syringe (Hamilton). Viral vectors were deposited into the OFC at stereotaxic coordinates: AP +2.6, ML ±1.2, DV −2.8; 0.5 µL/site over 5 min, and into the BLA at: AP −1.5, ML ±3.1, DV −5.0; 0.25 µL/site over 10 min. Needles were left in place for an additional 5 min following infusion. Next, needles were withdrawn, the scalp was sutured, and mice were revived with antisedan (3.0 mg/kg, i.p., 1 mL/100 g). Mice were allowed 3 weeks for recovery and viral vector expression before testing.

Chemogenetic manipulation of MT neuron populations

Viral vectors containing Cre-dependent Gi-coupled DREADDs (AAV5-hSyn-DIO-hM4D(Gi)-mCherry; Addgene, 44362), Cre-dependent mCherry (AAV5-hSyn-DIO-mCherry; Addgene, 50459) or Cre-dependent Gq-coupled DREADDs (AAV5-hSyn-DIO-hM3D(Gq)-mCherry; Addgene, 44361) were deposited into the OFC.

After recovery, mice with Cre-dependent Gi-coupled DREADDs were taken through RR training. Next, 4OHT was administered immediately following the non-reinforced session. Subsequently, CNO was administered prior to the choice test. Next, mice underwent RI training and another test of response flexibility. Again, CNO was administered prior to the choice test.

Separate mice with Cre-dependent Gi-coupled DREADDs were taken first through RI training. 4OHT was administered immediately following the last 3 sessions prior to the test of response flexibility, when mice were performing highly routinized behaviors. CNO was then administered prior to the choice test. Next, mice underwent RR training and another test of response flexibility. Again, CNO was administered prior to the choice test.

Mice with Cre-dependent Gq-coupled DREADDs were trained using an FR1 schedule of reinforcement. Mice then received vehicle or 4OHT immediately following the non-reinforced session of the response flexibility task, when memory traces for new action strategies are forming, or as a control, the reinforced session. Mice were then further trained using RI schedules to develop habit-like behavior, which is insensitive to changes in action contingencies, giving us the resolution to detect improvements, if any, in action flexibility upon stimulation of MT neurons. CNO was administered prior to the choice test.

vTRAP affinity purification

A Cre-dependent fluorescently-tagged ribosomal subunit (AAV-FLEX-EGFP-L10a, Addgene, 98747) was infused into the OFC as described. Mice were then trained using an FR1 schedule of reinforcement and administered 4OHT immediately after the reinforced or non-reinforced session, inducing the Cre-dependent fluorescently tagged ribosomal subunit in neurons that were active after either familiar or new reward information. One hour after the subsequent choice test, mice were briefly anesthetized by isoflurane and rapidly decapitated. Brains were frozen at −80 °C, sectioned into 1 mm coronal sections, and then 1 mm tissue punches were collected from the OFC. Tissue punches from 3 mice/group were pooled to ensure sufficient yield for subsequent analysis.

Immunopurification proceeded as previously described23, beginning with homogenization of frozen tissue punches in chilled lysis buffer [20 mM HEPES KOH (pH 7.4), 150 mM KCl, 10 mM MgCl2, EDTA-free protease inhibitor (Roche), 0.5 mM DL-Dithiothreitol (DTT; Sigma-Aldrich), 100 µg/mL cycloheximide (Sigma-Aldrich), 10 µl/ml rRNasin (Promega) and Superasin (Applied Biosystems)]. Before immunoprecipitation, an “input” control was aliquoted from gross tissue lysate. Remaining lysate was taken through immunoprecipitation. Lysates were gently mixed for 16 hr at 4 °C with an affinity matrix suspension [300 µl Streptavidin MyOne T1 Dynabeads (Invitrogen), 120 µl Biotinylated Protein L (1 µg/µl in PBS; Fisher Scientific), and 50 µg each of GFP antibodies Htz-GFP-19C8 and Htz-GFP-19F7 (Memorial-Sloan Kettering Monoclonal Antibody Facility; bioreactor supernatant purity)]. The Absolutely RNA Nanoprep Kit (Agilent) was used to perform final RNA purification.

Quantitative PCR

100 ng of purified RNA was reverse transcribed. Reverse transcription was performed using SuperScript IV VILO Master Mix with ezDNase enzyme (ThermoFisher Scientific, 11766050). 10 µL of water was added to each synthesized cDNA sample. cDNA was amplified using TaqMan™ Gene Expression Master Mix (ThermoFisher Scientific, 4369542) with probes of interest (Pvalb, Mm00443100_m1 and Dlg4, Mm00492193_m1) and an internal calibration probe (Gapdh, Mm99999915_g1). Non-template controls were performed for each experiment. The ΔCT method76 was used to create normalized mRNA expression quantities from raw CT values with Gapdh amplification as the reference for both immunopurified and input control samples. The fold change in levels of each transcript was calculated between normalized expression of immunopurified vs. input control samples.
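The normalization and fold-change calculation can be sketched as follows. This is an illustration rather than the analysis code used here: it assumes perfect doubling of product per PCR cycle (base 2), and the CT values and function names are hypothetical.

```python
def normalized_expression(ct_target: float, ct_reference: float) -> float:
    """Return 2^-(ΔCT), where ΔCT = CT(target) - CT(reference, e.g. Gapdh).

    Assumes 100% amplification efficiency (product doubles each cycle).
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


def fold_change(ip_target_ct: float, ip_reference_ct: float,
                input_target_ct: float, input_reference_ct: float) -> float:
    """Normalized expression in the immunopurified sample relative to the input control."""
    ip = normalized_expression(ip_target_ct, ip_reference_ct)
    inp = normalized_expression(input_target_ct, input_reference_ct)
    return ip / inp


if __name__ == "__main__":
    # Hypothetical CT values: the target amplifies one cycle later (relative
    # to Gapdh) in the immunopurified sample than in the input sample.
    print(fold_change(ip_target_ct=26.0, ip_reference_ct=20.0,
                      input_target_ct=25.0, input_reference_ct=20.0))
```

With these hypothetical values, ΔCT is 6 for the immunopurified sample and 5 for the input, so the transcript is half as abundant in the immunopurified fraction (fold change 0.5).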

Strategy to visualize dendritic spines on MT neurons

Mice in these experiments expressed sparse, Golgi-like YFP expression in pyramidal tract-type layer V neurons21,22, enabling high-resolution single-cell imaging and reconstruction. To identify which of these neurons to image, additional fluorophores were induced in an activity-dependent fashion as follows:

  1.

    For dendritic spine visualization demonstrated in Fig. 3F-S, Cre-dependent mCherry-expressing viral vectors were infused as described into the OFC. Mice were trained using an FR1 schedule of reinforcement and taken through the test of response flexibility. Mice were administered 4OHT immediately following the reinforced or non-reinforced session, inducing mCherry in neurons that expressed cfos after either familiar or new reward information. The choice test was conducted drug free.

  2.

    To observe the impact of excitatory neuron activity in the BLA on dendritic spine morphologies on MT neurons in the OFC (Fig. 4I-H), Cre-dependent mCherry-expressing viral vectors were deposited bilaterally into the OFC as described. During the same surgery, constitutively expressed Gi-coupled DREADDs (AAV5-CaMKIIα-hM4D(Gi)-mCherry; Addgene, 504777) were infused in one hemisphere of the BLA, with a control viral vector (AAV5-CaMKIIα-mCherry; Addgene, 114469) in the opposite hemisphere. Mice were then trained using an FR1 schedule of reinforcement. Mice were taken through the test of response flexibility and administered a cocktail of 4OHT and CNO immediately after the non-reinforced session to suppress excitatory neural activity in the BLA during the encoding of new reward information. The choice test was conducted drug free.

  3.

    Finally, to observe the impact of neurotrophin signaling in the BLA on dendritic spines on MT neurons (Fig. 5F-I), triple mutant mice were infused with a lentiviral vector expressing Trkb.t1 (LV-CMV-Trkb.t1) tagged with HA (Emory University Viral Vector Core) in the BLA to disrupt TrkB-mediated signaling in vivo. The contralateral BLA received an infusion of lenti-GFP as a control (Emory University Viral Vector Core). Mice were administered 4OHT immediately following the non-reinforced session, inducing tdTomato in neurons that expressed cfos upon new reward information. The choice test was conducted drug free.

Dendritic spine imaging, reconstruction, and quantification

Mice were briefly anesthetized by isoflurane and rapidly decapitated 1 hr after the choice test. Brains were collected and fixed in 4% paraformaldehyde (PFA) for 24 hr before transferring to 30% w/v sucrose. Brains were sectioned into 50 µm coronal sections on a freezing microtome held at −20 °C. Sections were mounted and cover slipped with Fluoromount-G™ mounting medium.

A spinning disk confocal Leica microscope (VisiTech International) was used to image secondary basilar dendritic branches on neurons in the anterior ventrolateral OFC. We specify the subregion because some evidence suggests that anterior vs. posterior OFC may have different functions in goal-seeking behavior that could correspond with spine densities or features77,78. Only YFP+ neurons were visualized. Basilar branches were selected because dendritic spine morphologies thereon correlate with the capacity of mice to select actions based on anticipated consequences8,40. Z-stacks were collected with a 100×, 1.4 numerical aperture objective using a 0.1 µm step size.

The Imaris software (Oxford Instruments, version 8) FilamentTracer module was used to perform three-dimensional reconstructions of dendrites. The semi-automated auto-depth function was used to identify dendritic spines on segments ranging between 20–30 µm. Mushroom-type spines were identified by having a head:neck diameter ratio ≥1.1 and head diameter ≥0.7 µm. Thin-type spines were classified as having a head:neck diameter ratio < 1.1 and a length:neck diameter ratio ≥2.5, or otherwise classified as stubby-type24,40. 5–8 independent segments were imaged, analyzed, and averaged per mouse. Dendrites in each independent experiment were imaged and reconstructed by a single rater blinded to group.
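The classification criteria above can be expressed as a short decision rule. The sketch below is illustrative and is not the Imaris workflow: the function names and example measurements are hypothetical, and it follows a literal reading in which any spine that meets neither the mushroom nor the thin criteria is labeled stubby.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (head diameter, neck diameter, length), all in micrometers
Spine = Tuple[float, float, float]


def classify_spine(head_diameter_um: float, neck_diameter_um: float,
                   length_um: float) -> str:
    """Label a spine as 'mushroom', 'thin' or 'stubby' using the stated thresholds."""
    if neck_diameter_um <= 0:
        raise ValueError("neck diameter must be positive")
    head_to_neck = head_diameter_um / neck_diameter_um
    length_to_neck = length_um / neck_diameter_um
    if head_to_neck >= 1.1 and head_diameter_um >= 0.7:
        return "mushroom"
    if head_to_neck < 1.1 and length_to_neck >= 2.5:
        return "thin"
    return "stubby"  # everything else, per a literal reading of the criteria


def spine_densities(spines: Iterable[Spine],
                    segment_length_um: float) -> Dict[str, float]:
    """Spines per micrometer of dendrite, split by type."""
    if segment_length_um <= 0:
        raise ValueError("segment length must be positive")
    counts = Counter(classify_spine(*s) for s in spines)
    return {kind: counts.get(kind, 0) / segment_length_um
            for kind in ("mushroom", "thin", "stubby")}


if __name__ == "__main__":
    # Hypothetical measurements from one 20 µm segment.
    segment = [(0.8, 0.4, 1.0), (0.3, 0.3, 1.2), (0.3, 0.3, 0.5), (0.9, 0.5, 1.1)]
    print(spine_densities(segment, segment_length_um=20.0))
```

Per-mouse values would then be obtained by averaging the densities across the 5–8 segments imaged for that mouse.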

Identifying active neurons projecting to the OFC during memory encoding

Cre-dependent retrograde mCherry (AAVRg-hSyn-DIO-mCherry; Addgene, 50459) was bilaterally infused into the OFC as described. Mice were then trained using an FR1 schedule of reinforcement and taken through the test of response flexibility. Mice were administered 4OHT immediately following the non-reinforced session, inducing mCherry in neurons that expressed cfos and that project to the OFC. Other mice received 4OHT immediately following the reinforced session as a “familiar” control group. A second control group remained in the home cage, received 4OHT, and was then returned to the home cage. “MT” and “familiar” groups were tested in the choice test 5 days later. Finally, 1 hr after this session, mice were deeply anesthetized by ketamine/xylazine (120 and 10 mg/kg, i.p.) and trans-cardially perfused with cold PBS and 4% PFA. Brains were collected and fixed in 4% PFA for 24 hr before transferring to 30% w/v sucrose. Brains were sectioned into 50 µm coronal sections on a freezing microtome held at −20 °C. Sections were mounted and cover slipped with DAPI Fluoromount-G™ mounting medium. Images were acquired at 2-40X magnification, with 0.8 µm step size, and 8 images per z-stack, on a Keyence BZ-X710 microscope. Quantification analyses were performed using CellProfiler software. The analysis pipeline included background subtraction, intensity thresholding (Otsu method), parent object overlay, relating objects to one another, and automated cell counting. DAPI was used to identify cell bodies (the parent objects). mCherry+ cells and DAPI+/mCherry+ cells were counted. Three images from 3 separate sections were quantified, and counts were averaged to give each mouse a single value per analysis. The experimenters were blinded to group throughout.

Asymmetric viral vector infusions

Mice underwent intracranial surgeries as described and received a unilateral infusion of a Cre-driven modified glycine receptor (AAV5-SYN-flex-PSAM4-GlyR-IRES-EGFP; Addgene, 119741) in the OFC. During the same surgery, they received a unilateral infusion of a Gi-coupled DREADD construct (AAV5-CaMKIIα-hM4D(Gi)-mCherry; Addgene, 50477) in either the ipsilateral or contralateral BLA. The infusion sides (left vs. right hemisphere) were randomly assigned, and roughly half of the mice had ipsilateral infusions and half contralateral. Mice underwent FR1 training and the test of response flexibility. CNO and 4OHT were administered immediately following the non-reinforced session to suppress excitatory neuron activity in the BLA during the encoding of new reward information. The modified glycine receptor ligand uPSEM792 was then administered prior to the choice test to silence MT neurons in the OFC. If excitatory neuron activity in the BLA during memory encoding is necessary for MT formation in the OFC, then contralateral infusions should ablate the ability of mice to update action strategies.

In a separate cohort, mice received a unilateral infusion of a Cre-driven Gi-coupled DREADD (AAV5-hSyn-DIO-hM4D(Gi)-mCherry; Addgene, 44362) in the OFC as described. In the same surgery, they received a unilateral infusion of a lentiviral vector expressing TrkB.t1 (LV-CMV-TrkB.t1) tagged with HA (Emory University Viral Vector Core) in the BLA to disrupt TrkB-mediated signaling in vivo. We selected this virus because it has been repeatedly validated: it reduces phosphorylation of the full-length receptor and excitatory plasticity marker p-ERK by ~15-25% at the infusion site in vivo64,79 and causes dendritic spine collapse on excitatory neurons80, but without impacting full-length TrkB or BDNF content or inducing obvious neuronal damage63,64. Mice underwent FR1 training and the test of response flexibility. 4OHT was administered immediately following the non-reinforced session. CNO was administered prior to the choice test. If neurotrophin-related signaling in the BLA is necessary for MT formation in the OFC, then contralateral infusions should ablate the ability of mice to update action strategies.

Throughout, left vs. right hemisphere infusion sites were randomly assigned, and roughly half of the mice had ipsilateral infusions and half contralateral, as detailed in supplementary table 6.

Histology

To verify viral vector placement, mice were deeply anesthetized by ketamine/xylazine (120.0 and 10.0 mg/kg, i.p.) and trans-cardially perfused with cold PBS and 4% PFA. Brains were collected and fixed in 4% PFA for 24 hr before transferring to 30% w/v sucrose. Brains were sectioned into 50 µm coronal sections on a freezing microtome held at −20 °C. mCherry or GFP were visualized to localize most chemogenetic constructs. Alternatively, localization of the modified glycine receptor constructs and TrkB.t1 constructs was visualized using immunohistochemistry. Free-floating brain sections were washed three times for 5 min each in TBS and then blocked in a 0.01% Triton X-100 and 5% normal goat serum (NGS) solution for 1 hr at room temperature. Tissues were then incubated for either 48 hr at 4 °C in primary antibody against GFP (to visualize the tag on the glycine receptor construct; Cell Signaling Technology; 1:250), or for 24 hr at 4 °C in primary antibody against HA (to visualize the tag on the TrkB.t1 construct; Millipore Sigma; 1:250) in 0.1% Triton X-100 and 1% NGS. Sections were washed 3 times for 10 min each in TBS and then incubated in a biotinylated secondary antibody (Vector Laboratories; 1:1,000) in 1% NGS in TBST (0.03% Triton X-100 in 1x TBS) for 1 hr at room temperature. Sections were washed 3 times for 10 min each in TBS and then incubated in 5 µg/mL streptavidin-fluorophore conjugate (Vector Laboratories SA-1500) in 1x TBS for 30 min at room temperature. Sections were mounted and cover slipped with DAPI Fluoromount-G™ mounting medium. Spread of viral vectors is documented in supplementary Fig. 5. Mice with mis-targeted viral vector infusion sites were excluded from analysis. Total number of mice excluded for mis-targeted viral vectors is listed in supplementary table 7. Experimenters making exclusions based on mis-targeted viral vectors were blinded to behavioral or dendritic spine outcomes.

Experimental design and tissue collection for validation of chemogenetic constructs

To validate chemogenetic constructs, cfos counts in mice that were transduced with Cre-driven Gi- or Gq-coupled DREADDs were compared to cfos counts in control mice expressing control viral vectors lacking DREADDs. 1 hr after the final choice test (when CNO was on-board), mice were deeply anesthetized by ketamine/xylazine (120 and 10 mg/kg, i.p.) and trans-cardially perfused with cold PBS and 4% PFA. Brains were collected and fixed in 4% PFA for 24 hr before transferring to 30% w/v sucrose. Brains were then sectioned into 50 µm coronal sections on a freezing microtome held at −20 °C.

To verify decreased cfos in the experiment utilizing both the Cre-driven modified glycine receptor and Gi-coupled DREADDs, mice were administered a drug cocktail of uPSEM792 and CNO 25 min prior to being exposed to forced swim stress. The purpose was to induce stress-related cfos, allowing for the resolution to detect lower cfos upon synthetic ligand administration (procedure from80). In this case, mice were placed in a glass cylinder (24 cm × 15.5 cm diameter) filled with 25 °C water for 6 min. Mice were then dried and placed in a warm cage. 1 hr after swimming, mice were euthanized and brains were prepared as above. cfos immunostaining proceeded as described below.

Determining the reactivation properties of neuron ensembles

Mice expressed Cre-driven mCherry. Co-labeling between mCherry and cfos was measured to determine reactivation of neuron ensembles upon the choice test. In this case, 1 hr after the final choice test, mice were euthanized and tissue was collected as described. cfos immunostaining proceeded as described below.

cfos immunostaining

Sectioned tissue was incubated for 90 min at room temperature in a blocking solution of 2% NGS, 1% BSA, and 0.03% Triton X-100 (Sigma). Next, sections were incubated at 4 °C overnight with the primary antibody solution containing anti-cfos (Abcam; 1:1000), 2% NGS, and 0.03% Triton X-100. Sections were then incubated at room temperature for 1 hr in a secondary antibody solution (Alexa Fluor 488 or 647 (Life Technologies; 1:1000), 2% NGS, and 0.03% Triton X-100). Sections were mounted and cover slipped with DAPI Fluoromount-G™ mounting medium.

Images were obtained at 40X magnification using an ROI of fixed size for cfos quantification. Uniform exposure parameters were used throughout. Analyses were performed using CellProfiler software with a pipeline including background subtraction, intensity thresholding (Otsu method), and automated cell counting. In analyses determining the % of cell populations that were co-labeled mCherry+/cfos+, the pipeline additionally included parent object overlay and relating objects to one another. mCherry was used to identify transduced cells – the parent objects. Total cfos+ cells, mCherry+ cells, and cfos+/mCherry+ cells were counted. Three images from 3 separate sections were averaged to give each mouse a single value. The experimenter was blinded to group throughout.
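The reduction from per-image counts to a single value per mouse can be sketched as follows. This is an illustration only: the function names are hypothetical, and computing the percentage for each image before averaging (rather than pooling counts across images) is an assumption, since the text does not specify the order of operations.

```python
from statistics import mean


def percent_colabeled(n_colabeled: int, n_parent: int) -> float:
    """Percent of parent objects (mCherry+ cells) that are also cfos+."""
    if n_parent == 0:
        raise ValueError("no parent objects in this image")
    return 100.0 * n_colabeled / n_parent


def mouse_value(image_counts):
    """Average per-image percentages into one value per mouse.

    image_counts: iterable of (n_colabeled, n_parent) pairs, one per image
    (three images from three separate sections in the text).
    Assumption: percentages are computed per image, then averaged.
    """
    return mean(percent_colabeled(c, p) for c, p in image_counts)
```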

Analysis of expected versus actual overlap between active neuron populations

To assess whether the observed overlap between cells containing cfos during 2 behavioral epochs differed from what would occur by chance, we conducted a bootstrap analysis (as described in81). The total number of neurons containing induced mCherry was first determined, as was the number of cells containing cfos following the choice test. Then, analyses proceeded. For instance, we calculated the total number of neurons containing mCherry upon learning new action-outcome information (NRR) and the total number of neurons containing cfos after the expression of habit-like behavior at the choice test (NRI). The proportion of neurons expressing both mCherry and cfos represents the overlapping population. To calculate an expected overlap, we randomly selected a sample of neurons (Nrandom) equal to NRI and calculated the percentage of overlap between Nrandom and NRR. This process was repeated 1000 times to generate a distribution of the expected overlap between action-outcome learning and habit expression neuron populations. The shaded rectangles in supplementary Fig. 1h represent the 95% confidence interval of this distribution. We classified the populations as either distinct (less overlap than chance), randomly overlapping (within expected chance), or overlapping (greater overlap than chance), if the actual overlap is less than, within, or greater than the 95% confidence interval of the expected overlap by chance, respectively.
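The resampling procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, `n_total` (the size of the neuron population from which random samples are drawn, for instance a DAPI-based count) is an assumed input, and overlap is expressed as a percentage of the mCherry+ population (NRR), which is one plausible reading of the text.

```python
import random


def expected_overlap_interval(n_total, n_rr, n_ri, n_iter=1000, seed=0):
    """Return the 95% interval of chance overlap (as % of the NRR population).

    n_total: assumed total number of neurons in the field (hypothetical input)
    n_rr:    number of mCherry+ neurons (tagged during new learning)
    n_ri:    number of cfos+ neurons after the choice test
    """
    rng = random.Random(seed)
    # Which neurons carry the tag is arbitrary under the null, so label
    # the first n_rr indices as mCherry+.
    tagged = set(range(n_rr))
    overlaps = []
    for _ in range(n_iter):
        # Draw Nrandom = NRI neurons at random, without replacement.
        sample = rng.sample(range(n_total), n_ri)
        shared = sum(1 for i in sample if i in tagged)
        overlaps.append(100.0 * shared / n_rr)
    overlaps.sort()
    lower = overlaps[int(0.025 * n_iter)]
    upper = overlaps[int(0.975 * n_iter) - 1]
    return lower, upper


def classify_overlap(actual_pct, lower, upper):
    """Label the observed overlap relative to the chance interval."""
    if actual_pct < lower:
        return "distinct"
    if actual_pct > upper:
        return "overlapping"
    return "randomly overlapping"
```

The observed overlap percentage is then passed to `classify_overlap` together with the interval returned by `expected_overlap_interval`.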

Statistical analysis

Statistical analyses were conducted with GraphPad Prism and SPSS, versions 10.2.3 and 29.0.0.0. n values are reported in the figure captions, and individual mice are represented except for a small number of cases noted. Response rates, preference ratios, dendritic spine densities, and dendritic spine type ratios were analyzed by ANOVA, with repeated measures when appropriate. The percent of mCherry-labeled cells out of the total cell population labeled with DAPI in the home cage, familiar, or novel (“MT”) condition in Fig. 4D was also compared using ANOVA. In the case of significant interactions or main effects between >2 groups, post-hoc Tukey’s or paired t-tests were applied as indicated in the supplementary tables, with results indicated graphically. Planned comparisons were applied based on a priori hypotheses regarding the behavioral consequences of cocaine exposure in Fig. 3O (see23,24,25,26). Preference ratios refer to the response rate on the reinforced / non-reinforced ports.

t-tests were used to compare mCherry+ cell counts, cfos+ cell counts, % mCherry+/cfos+, and dendritic spine densities and dendritic spine type ratios between two groups, with paired tests utilized when comparing between hemispheres within mice. One-sample t-tests were used to compare RNA transcript levels calculated as fold change from bulk input control, set at 1, and groups were compared to each other using t-tests. We also calculated the percent change of spine densities in labeled MT cells (cfos+) from non-labeled (cfos−) cells according to equation 1:

$$\%\,\text{change}=\frac{\text{cfos}^{+}\ \text{dendrite spine density}-\text{mean cfos}^{-}\ \text{dendrite density}}{\text{mean cfos}^{-}\ \text{dendrite density}}$$

Spine type ratios were calculated as mushroom / thin spines for each dendrite.
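As a minimal sketch of the two calculations above (function names are hypothetical), the change from the non-labeled baseline follows equation 1 as written, which yields a fractional value. Multiplying by 100 to express it as a percentage is an assumption and is left to the caller.

```python
from statistics import mean


def change_from_unlabeled(labeled_density, unlabeled_densities):
    """Equation 1: (cfos+ density - mean cfos- density) / mean cfos- density."""
    baseline = mean(unlabeled_densities)
    return (labeled_density - baseline) / baseline


def spine_type_ratio(n_mushroom, n_thin):
    """Mushroom / thin spine ratio for one dendrite."""
    return n_mushroom / n_thin
```

For instance, a cfos+ dendrite with a density of 1.5 spines/µm against a cfos− mean of 1.0 spines/µm gives a value of 0.5, i.e., a 50% increase.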

Throughout these analyses, alpha was set at 0.05 and comparisons were 2-tailed. Sample sizes were based on power analyses using data from similar experiments. One p value at 0.0545 is noted. Values falling >2 standard deviations outside the mean were considered outliers and excluded. Based on these parameters, a mouse in the Gi-DREADD group was excluded from preference ratio comparisons in Fig. 1G, a mouse in the Gi-DREADD group was excluded from analysis in Fig. 1I-L, and a mouse from each group was excluded from analysis in supplementary Fig. 1F. One mouse from the “familiar” group was excluded from dendritic spine analysis in Fig. 3H-J.
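The outlier criterion above can be sketched as follows. This is an illustration under assumptions the text does not state: the group mean and the sample standard deviation are computed from all values in the group, including the candidate outlier, and the function name is hypothetical.

```python
from statistics import mean, stdev


def exclude_outliers(values, n_sd=2.0):
    """Drop values falling more than n_sd standard deviations from the mean.

    Assumption: mean and sample SD are computed on the full group,
    including any value that is subsequently excluded.
    """
    group_mean = mean(values)
    group_sd = stdev(values)
    return [v for v in values if abs(v - group_mean) <= n_sd * group_sd]
```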

Lastly, dendritic spine head diameters were compared by Kolmogorov-Smirnov (K-S) comparisons, with p < 0.01 considered significant. Experiments were replicated at least once, with concordant results.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.