Main

In our lifetime we stably retain a myriad of motor skills. How learned actions are stored in motor memory remains poorly understood. In the motor cortex, specific learned actions are evoked by distinct patterns of preparatory activity7,9,10,11 (Fig. 1a). Preparatory activity is thought to provide the initial conditions for the ensuing dynamics dictating movement execution12,13,14,15,16, but its relationship to subsequent action remains obscure17,18,19. For example, it remains unknown whether preparatory activity states are linked to subsequent movement execution and therefore fixed for actions with identical kinematics; alternatively, preparatory activity might encode other cognitive variables associated with learned actions beyond the movement itself4,5,7,8,20.

Fig. 1: A behaviour paradigm for continual learning.

a, Preparatory states for different actions in activity space. b, Possible outcomes of preparatory states across time and new task learning. c, Top, mice living in the home-cage system voluntarily engage in head fixation and learn directional lick tasks. Bottom, behavioural data from an example mouse. Dark bands represent epochs of voluntary head fixation; grey bands represent rest. d, Mice report pole position using lick left or lick right after a delay epoch. Sensorimotor contingency is reversed across task contexts. e, Behaviour performance of an example mouse. Contingency reversals are introduced when performance is above 75%. Averaging window, 100 trials. f, Number of trials to reach 75% correct performance (mean ± s.e.m.). Individual lines show data from individual mice. Mice used for in-cage optogenetic (5 mice), imaging (13 mice) and behaviour testing only (5 mice) are combined. Learning task context 1 versus 2, *P = 0.0286 (23 mice); learning task context 2 versus 1′, *P = 0.0431 (19 mice); learning task context 1′ versus 2′, P = 0.3425 (13 mice), not significant (NS). Two-tailed paired t-test. g, Top, optogenetic approach to silence ALM activity in the home cage. Bottom, task and photoinhibition timelines. Photostimulation during the sample (S), delay (D) and response (R) epochs; power 0.35, 1.77 and 3.54 mW mm−2 for each epoch. h, Behaviour performance of an example mouse during ALM photoinhibition. Black, control trials. Red, photoinhibition during the delay epoch (3.54 mW mm−2). Red shaded area, photoinhibition blocks. Photostim., photostimulation. i, Behaviour performance during ALM photoinhibition (mean ± s.e.m.). Trial types by instructed lick direction. Left ALM photostimulation. Sample epoch, instructed lick right, *P = 0.0248, F = 0.7574 (1.77 mW mm−2), *P = 0.0349, F = 0.8402 (3.54 mW mm−2); instructed lick left, *P = 0.0360, F = 1.0334 (0.35 mW mm−2). Delay epoch, instructed lick right, **P = 0.0054, F = 0.7212 (1.77 mW mm−2), **P = 0.0012, F = 0.3909 (3.54 mW mm−2). Response epoch, instructed lick right, *P = 0.0249, F = 0.4940 (0.35 mW mm−2), **P = 0.0093, F = 0.6863 (3.54 mW mm−2); instructed lick left, *P = 0.0423, F = 1.0702 (3.54 mW mm−2). Two-tailed t-test against control.


A related question is how learned actions are maintained by motor circuits over time. Motor cortex circuits exhibit considerable plasticity during motor learning7,21,22,23,24,25,26,27,28,29,30,31,32. Given this plasticity, the neural mechanism underlying motor memory storage is unclear. Recent studies propose memory storage mechanisms based on unstable representations3,33: in a redundant neural network in which multiple network configurations produce the same output, activity patterns leading to the same motor output can change over time34,35. For example, if a pattern of activity drives our utterance of the word ‘cat’, a different pattern of activity may occur when we utter the word ‘cat’ a year later (Fig. 1b, left). This question remains under-explored, as motor cortical activity has rarely been tracked over periods of more than one month36,37,38,39,40.

Moreover, it is unknown how existing motor memories are protected from modification by continual learning of new motor skills. Theories of learning posit a modular approach, in which multiple parallel motor memories are formed for distinct contexts1,4,5,6, so that new learning takes place in separate modules. Neurophysiological studies of motor learning mostly examine single tasks. It remains poorly understood how the neural representation of an action is formed and maintained when we learn to use the same action in different contexts—for example, learning to speak the word ‘cat’ in different sentences (Fig. 1b, right).

To address these questions, we developed an automated home-cage training paradigm in which mice learned to perform directional licking in different task contexts. Learned directional licking depends on preparatory activity in the anterior lateral motor cortex (ALM)14,41,42. We tracked ALM activity across continual learning for multiple months using two-photon calcium imaging. We found that learned directional licking was stably encoded in preparatory activity with little representational drift. Across learning of multiple task contexts, multiple preparatory states were created that encoded the same licking action in a context-dependent manner. Our results show that motor memories encode learned actions in combination with their context, which we call a combinatorial code. A feedforward network that stored sensorimotor combinations in high-dimensional hidden layers was able to explain multiple aspects of the results. Context-specific motor memories may help reduce interference from new learning with previously learned representations6,8, thus protecting the existing motor repertoire from erasure in the face of continual learning.

A continual learning paradigm

To track the neural representation of the same movement across continual learning of new motor skills, we studied a stereotyped yet cortex-dependent movement: goal-directed directional licking in mice. We developed a home-cage system in which mice voluntarily engaged in head fixation and learned multiple licking tasks without human supervision43 (Fig. 1c). In a tactile-instructed licking task, mice discriminated the location of a pole during a sample epoch and reported their decision using ‘lick left’ or ‘lick right’ after a delay epoch (Fig. 1d). Mice initially learned to lick left for the anterior pole position and lick right for the posterior pole position (task context 1; Fig. 1d). After achieving more than 75% correct (Methods), the home-cage system automatically reversed the contingency between pole locations and lick directions (task context 2; Fig. 1d). The delay epoch separated the sensory stimulus from the motor response. Thus, in the two tasks, mice made identical actions in an identical external environment after the delay epoch, but with different stimulus histories and task rules. We therefore refer to these conditions as different ‘task contexts’.

Mice learned many rounds of reversals over several months (Fig. 1e). High-speed videography showed that tongue and jaw movements were consistent over time and across contingency reversals (Extended Data Fig. 1a,b). Mice reached criterion performance (>75% correct) faster with each subsequent reversal (Fig. 1f and Methods). Faster reversal learning occurred when mice re-learned a previously learned sensorimotor contingency, but was less correlated with the overall amount of prior training (Extended Data Fig. 1c), consistent with a saving effect typically associated with motor skill learning2.

The ALM is critical for the planning and execution of directional licking14,41,44,45. To test whether ALM is required for learned directional licking after extended training, we optogenetically silenced ALM activity during task performance in the home cage43 (Fig. 1g and Methods). We virally expressed a red-shifted channelrhodopsin46 (ChRmine) in ALM GABA (γ-aminobutyric acid)-expressing (GABAergic) neurons and photostimulated ALM through a clear skull implant during voluntary head fixation (Fig. 1g). ALM photoinhibition during the delay epoch disrupted behavioural performance, even after multiple rounds of contingency reversal (Fig. 1h). Left ALM photoinhibition biased subsequent licking to the ipsilateral direction (lick left) in a light dose-dependent manner (Fig. 1i and Extended Data Fig. 1d). These results show that directional licking consistently depends on ALM preparatory activity over time, enabling us to chronically track the neural activity that causally drives the learned licking actions.

Stable representation of action

To examine whether neural representations of learned actions drift over time (Fig. 2a), we performed longitudinal two-photon calcium imaging of ALM (GP4.3 mice; Extended Data Fig. 1e–g; imaging duration, 26–233 days). After mice attained high performance under task context 1 in the home cage, we transferred them to a two-photon microscope where they performed the same task in daily sessions (Methods). After brief acclimatization, mice maintained stable performance (Fig. 2b), with little performance change within session (Extended Data Fig. 1j). We imaged the same field of view across multiple days (Fig. 2c and Extended Data Fig. 2a; referred to as ‘expert-early’ or ‘expert-late’ sessions), covering different fields of view on interleaved days (Extended Data Fig. 2b). The imaged fields of view were remarkably stable. We identified 42,739 neurons that could be confidently matched across days based on their shapes and centroid locations47 (Extended Data Fig. 2c–i; 50 fields of view, 8 mice; Methods).
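
For illustration, cross-day matching of this kind can be sketched as follows, assuming registered fields of view and one binary ROI mask per neuron; the distance and shape-correlation thresholds shown are placeholders and not the parameters of the published matching method47.

```python
# Hedged sketch of cross-day ROI matching by centroid distance and mask shape
# similarity; thresholds, array formats and helper names are illustrative assumptions.
import numpy as np

def match_rois(masks_day1, masks_day2, max_dist_px=10, min_shape_corr=0.5):
    """masks_dayX: list of 2D binary arrays (one ROI mask per neuron), same image size."""
    def centroid(mask):
        ys, xs = np.nonzero(mask)
        return np.array([ys.mean(), xs.mean()])

    c1 = [centroid(m) for m in masks_day1]
    c2 = [centroid(m) for m in masks_day2]
    matches = []
    for i, m1 in enumerate(masks_day1):
        # candidate ROIs on day 2 whose centroids fall within max_dist_px
        dists = [np.linalg.norm(c1[i] - c) for c in c2]
        candidates = [j for j, d in enumerate(dists) if d < max_dist_px]
        best, best_corr = None, min_shape_corr
        for j in candidates:
            corr = np.corrcoef(m1.ravel().astype(float),
                               masks_day2[j].ravel().astype(float))[0, 1]
            if corr > best_corr:
                best, best_corr = j, corr
        if best is not None:
            matches.append((i, best))
    return matches  # list of (index_day1, index_day2) pairs of matched neurons
```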

Fig. 2: Stable task-related activity over time within the same task context.

a, Left, possible outcomes of preparatory states over time. Right, task context 1. b, Behaviour performance during imaging sessions. Each data point shows average performance in one imaging session. Colours indicate individual mice. c, Example field of view. Scale bar, 50 μm. d, Top, dF/F0 from two example neurons. Thick lines are the mean and thin lines show individual trials. Bottom, mean deconvolved activities from an example field of view (n = 386 neurons). Neurons are sorted based on their peak activities from different days. e, Selectivity index in expert-early and expert-late sessions for neurons showing significant selectivity (P < 0.001, two-tailed t-test) during the sample (top), delay (middle) and response epoch (bottom). Green, neurons preferring anterior pole position. Purple, neurons preferring posterior pole position. Red, neurons preferring lick left; blue, neurons preferring lick right. Significant selectivity and trial-type preferences are determined in expert-early session. f, Schematic of movement-specific activity trajectories in activity space. Coding direction (CDDelay) is estimated using activities during the late delay epoch (inset, yellow shade). Red and blue shading indicates preparatory states for lick left and lick right, respectively. g, Decoding scheme. Non-overlapping trials for training and testing within (solid arrows) and across sessions (dashed arrows). h, ALM activities from an example field of view projected on the CDDelay from day 1 (top) or day 16 (bottom). Thick lines are the mean and thin lines show single trials. a.u., arbitrary units. i, Lick direction decoding using the CDDelay as a function of delta days between imaging sessions. Colours represent different mice. Within session decoding shows the mean of two conditions (train expert-early and test expert-early, train expert-late and test expert-late). Across sessions decoding shows the mean of two conditions (train expert-early and test expert-late, train expert-late and test expert-early). Dashed lines are linear regressions of individual mouse data. Inset graph shows R values of linear regressions. FOVs, fields of view. j, Decoding accuracy within and across imaging sessions. n = 113 pairs of sessions, 10 mice. Data are mean ± s.e.m. k, Weight contributions of individual neurons to the CDDelay vectors from expert-early and expert-late sessions (35,420 neurons from 8 mice).


ALM neurons exhibited task-related activity (dF/F0; Fig. 2d, top). We deconvolved dF/F0 to avoid the spillover of slow-decaying calcium dynamics across task epochs48 (Extended Data Fig. 2j and Methods). Sorting neurons by their peak activities revealed similar task-related activity across days (Fig. 2d, bottom). We computed selectivity as the difference in activity between trial types divided by their sum (anterior versus posterior pole position for the sample epoch; lick left versus lick right for the delay and response epochs; correct trials; Methods). On error trials, when mice licked in the opposite direction to the instruction provided by pole location, ALM activity during the delay epoch predicted the licking direction (Extended Data Fig. 2k,l). Neurons showing significant trial-type selectivity (P < 0.001, two-tailed t-test) in expert-early sessions largely maintained their selectivity in expert-late sessions (Fig. 2e; Pearson’s correlation, sample epoch: R = 0.9404, P = 0; delay epoch: R = 0.8861, P = 0; response epoch: R = 0.9001, P = 0). A subset of ALM neurons exhibited altered activity across days, but these changes mainly occurred in non-selective neurons (Extended Data Fig. 3a–c). This suggests that lick direction encoding is selectively maintained.
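
As a worked illustration of this selectivity measure, the sketch below computes (A − B)/(A + B) from epoch-averaged, trial-wise activity of the two trial types, together with the two-tailed t-test used to define significantly selective neurons; array shapes and the synthetic data are assumptions for illustration only.

```python
# Hedged sketch of the selectivity index: (A - B) / (A + B), where A and B are
# trial-averaged activities of the two trial types within a task epoch.
import numpy as np
from scipy import stats

def selectivity_index(act_type1, act_type2):
    """act_typeX: (n_trials, n_neurons) epoch-averaged deconvolved activity,
    e.g. lick-left versus lick-right correct trials for the delay epoch."""
    a = act_type1.mean(axis=0)
    b = act_type2.mean(axis=0)
    sel = (a - b) / (a + b + 1e-12)          # small constant avoids division by zero
    # two-tailed t-test across trials, as used to define significantly selective neurons
    pvals = stats.ttest_ind(act_type1, act_type2, axis=0).pvalue
    return sel, pvals

# example with synthetic data
rng = np.random.default_rng(1)
sel, p = selectivity_index(rng.poisson(3.0, (80, 200)).astype(float),
                           rng.poisson(2.0, (80, 200)).astype(float))
selective = np.where(p < 0.001)[0]           # neurons with significant trial-type selectivity
```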

To investigate lick direction encoding at the population level, we analysed ALM activity in an activity space, where each dimension corresponds to the activity of one neuron14,49. We estimated a ‘coding direction’ (CDDelay) along which activity maximally discriminated future lick direction at the end of the delay epoch (‘preparatory state’; Methods). To examine population encoding over time (Fig. 2f), we estimated the CDDelay using 50% of the trials in a session (training dataset) and projected activity in non-overlapping trials from the same session or across sessions (testing dataset; Fig. 2g). ALM activity along the CDDelay was maintained over time (Fig. 2h), despite moderate changes in the population activity vector (Extended Data Fig. 3d–f). We used a decision boundary on the CDDelay to predict lick direction from ALM activity (Methods). A decoder defined in one session could accurately predict lick direction in other sessions regardless of the timespan between sessions, even up to 2 months apart (Fig. 2i; linear regression slope: −0.08 ± 0.11, mean ± s.e.m. across mice; P = 0.4870, t-test against 0). A decoder from expert-early or late sessions could similarly predict lick direction in expert-late or early sessions, respectively (Fig. 2j). The weight contributions of individual neurons to the CDDelay were highly correlated across sessions (Fig. 2k; Pearson’s correlation, R = 0.6053, P = 0).
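
The coding direction analysis can be summarized by the following sketch, in which the CDDelay is taken as the difference between the mean population activity vectors of the two lick directions at the end of the delay epoch, and a decision boundary at the midpoint of the projected training-trial means classifies held-out or cross-session trials; the normalization and boundary placement shown are simplifying assumptions rather than the exact procedure in Methods.

```python
# Hedged sketch of the CD_Delay analysis (assumed form: difference of trial-type
# means at the end of the delay epoch; boundary at the midpoint of projections).
import numpy as np

def coding_direction(act_left, act_right):
    """act_left/right: (n_trials, n_neurons) late-delay-epoch activity (training trials)."""
    cd = act_left.mean(axis=0) - act_right.mean(axis=0)
    return cd / np.linalg.norm(cd)

def fit_boundary(act_left, act_right, cd):
    proj_l, proj_r = act_left @ cd, act_right @ cd
    return 0.5 * (proj_l.mean() + proj_r.mean())   # midpoint decision boundary

def decode(act_trials, cd, boundary):
    """Predict lick direction (True = lick left) for held-out or cross-session trials."""
    return (act_trials @ cd) > boundary

# usage: train on 50% of trials of one session, test on non-overlapping trials
# from the same session or from a later session with the same matched neurons
rng = np.random.default_rng(0)
train_l, train_r = rng.normal(1, 1, (40, 300)), rng.normal(0, 1, (40, 300))
test_l,  test_r  = rng.normal(1, 1, (40, 300)), rng.normal(0, 1, (40, 300))
cd = coding_direction(train_l, train_r)
b = fit_boundary(train_l, train_r, cd)
accuracy = 0.5 * (decode(test_l, cd, b).mean() + (~decode(test_r, cd, b)).mean())
```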

We analysed ALM activity during the sample and response epochs and found similarly stable selectivity along the coding directions (Extended Data Fig. 4). These results show that ALM activity is selectively maintained along coding directions that encode learned directional licking for at least two months.

New representation emerges with learning

We next explored how motor memories form when new motor skills are acquired. A key question here is whether existing activity states are reused11,31 or whether entirely new activity states are formed (Fig. 3a). To address this question, we monitored ALM activity across two different task contexts. After imaging in task context 1, we returned mice to the home cage to learn the reversed sensorimotor contingency and then imaged them again in task context 2 (Fig. 3b; task context 1→2). Performance was similar in the two task contexts (85.59 ± 1.00% versus 84.06 ± 0.99% correct rate, mean ± s.e.m.; P = 0.1862, paired t-test), and video analysis showed that mice made the same tongue and jaw movements (Extended Data Fig. 1a,b, bottom). We identified 1,118 ± 500 matched neurons in each field of view (58 fields of view, 10 mice; 31.88 ± 13.88 days between imaging sessions, mean ± s.d. across sessions).

Fig. 3: New preparatory activity emerges by learning new task context.

a, Possible outcomes of preparatory states across different task contexts. b, Task contexts 1 and 2. Time interval between imaging sessions, 31.88 ± 13.88 days, mean ± s.d. across fields of view. c, Mean deconvolved activities from an example field of view (n = 1,112 neurons). Neurons are sorted on the basis of their selectivity during the delay epoch in either task context 1 (top) or 2 (bottom). d, Selectivity index in task context 1 (left) and 2 (right) for neurons showing significant trial-type selectivity (P < 0.001, two-tailed t-test) during the delay epoch. Red, neurons preferring lick left; blue, neurons preferring lick right. Significant selectivity and trial-type preferences are determined in task context 1. e, Schematic of movement-specific activity trajectories in activity space and CDDelay vectors across task contexts. Red and blue shading indicates preparatory states for lick left and lick right, respectively. f, ALM activities from an example field of view projected on the CDDelay from task context 1 (top) or 2 (bottom). Thick lines are the mean and thin lines show single trials. g, Decoding accuracy of the CDDelay within and across task contexts. n = 58 pairs of sessions, 10 mice. Data are mean ± s.e.m. Circles represent individual fields of view. h, Weight contribution of individual neurons to the CDDelay vectors from task contexts 1 and 2 (44,409 neurons from 10 mice).


We observed a profound reorganization of ALM preparatory activity in the new task context. Many ALM neurons lost or even reversed their lick direction selectivity in task context 2 (Fig. 3c, top), whereas other neurons retained their selectivity. In addition, new selective neurons emerged in task context 2 (Fig. 3c, bottom). Across the population, neuronal selectivity in the two task contexts was not correlated (Fig. 3d and Extended Data Fig. 5e; Pearson’s correlation, R = −0.0057, P = 0.6774).

We examined population encoding of future lick direction by calculating the CDDelay in each task context (Fig. 3e). Activity projected on the CDDelay reliably differentiated lick direction within a task context, but this separation collapsed when activity was projected on the CDDelay from the other task context (Fig. 3f). Across all fields of view, a CDDelay decoder predicted lick direction at near chance level on average in the other task context (Fig. 3g). The weight contributions of individual neurons to the CDDelay vectors in the two task contexts were only weakly correlated (Fig. 3h; Pearson’s correlation, R = 0.3; significantly less than the correlation within task context over time in Fig. 2k, P = 0, bootstrap). Thus, different task contexts yielded distinct CDDelay vectors. In contrast to the reorganization of ALM preparatory activity, selectivity during the sample and response epochs remained remarkably stable across task contexts (Extended Data Fig. 5). This ruled out the possibility that the change in preparatory activity was due to unstable imaging or changes in motor behaviour.

Although ALM preparatory activity was reorganized across task contexts on average, we found substantial individual variability across mice (Fig. 3g and Extended Data Fig. 6a–c). In some mice, the CDDelay vectors in the two task contexts were nearly orthogonal (Fig. 3f). In other mice, preparatory activity was maintained along the same CDDelay (Extended Data Fig. 6d) or even reversed direction along the CDDelay (Extended Data Fig. 6e). Within each mouse, a similar pattern of reorganization was consistently observed across different fields of view (Extended Data Fig. 6a–c), indicating that the variability was not due to heterogeneous sampling of neurons or location of imaging (Extended Data Fig. 6g). Task performance, uninstructed movements, task learning speed and the time interval between imaging sessions did not explain this individual variability (Extended Data Fig. 6f,g). Individual variability may instead result from differences in the underlying circuits (see modelling below).

Thus, new preparatory states form when mice learn to make the same licking actions under new task contexts. These results also show that distinct preparatory states in motor cortex can drive the same subsequent movement execution. Preparatory states could therefore encode a learned action in multiple representations that index distinct contexts.

Stable retention of learned representations

Encoding learned actions in combination with context could enable stable retention of motor memories over continual learning, because learning in different contexts forms parallel new representations without altering previously learned representations. To test this notion, we examined whether learned preparatory states in previous contexts were retained after intervening learning (Fig. 4a).

Fig. 4: Previous task context re-activates learned preparatory activity.

a, Possible outcomes of preparatory states for re-learning. b, Task contexts 1, 2 and 1′. Task contexts 1 and 1′ are identical. Time interval, 30.65 ± 8.73 days between task contexts 1 and 2, 57.19 ± 12.89 days between task contexts 1 and 1′; mean ± s.d. across fields of view. c, Mean deconvolved activities from an example field of view (n = 608 neurons). Neurons are sorted on the basis of their selectivity during the delay epoch in task context 1. d, Selectivity index in task context 1 (left), 2 (middle) and 1′ (right) for neurons showing significant trial-type selectivity (P < 0.001, two-tailed t-test) during the delay epoch. Red, neurons preferring lick left; blue, neurons preferring lick right. Significant selectivity and trial-type preferences are determined in task context 1. e, Schematic of movement-specific activity trajectories in activity space and CDDelay vectors across task contexts. Red and blue shades, preparatory states for lick left and lick right, respectively. f, ALM activities from an example field of view projected on the CDDelay from task context 1. Thick lines are the mean and thin lines show single trials. g, Same as f, but for the CDDelay from task context 2. h, Decoding accuracy of the CDDelay from task context 1 tested on task contexts 1, 2, 1′ and 2′. Grey circles and lines indicate fields of view imaged across task contexts 1, 2 and 1′ (n = 26 fields of view, 5 mice). Black circles and lines indicate fields of view imaged across task contexts 1, 2, 1′ and 2′ (n = 7 fields of view, 3 mice). Task context 1 versus 2, ***P = 1.12 × 10−10; task context 2 versus 1′, ***P = 3.46 × 10−7; task context 1′ versus 2′, **P = 0.0091. Two-tailed paired t-test. Data are mean ± s.e.m. i, Same as h, but for the CDDelay from task context 2. Task context 1 versus 2, ***P = 4.87 × 10−10; task context 2 versus 1′, ***P = 7.40 × 10−9; task context 1′ versus 2′, *P = 0.0496. Two-tailed paired t-test.


After imaging ALM activity in task contexts 1 and 2, mice were re-trained in task context 1 (denoted 1′ for re-learning) in the automated home cage (Extended Data Fig. 1f). We then imaged the same neuronal populations again (Fig. 4b; task context 1→2→1′). We observed a re-activation of the previous preparatory activity pattern, even though task contexts 1 and 1′ were tested 2 months apart on average (32–78 days; Fig. 4b and Extended Data Fig. 1f,g). Individual neurons showing lick direction selectivity in task context 1 had reconfigured selectivity in task context 2, but their original selectivity reappeared in task context 1′ (Fig. 4c,d and Extended Data Fig. 7h; Pearson’s correlation, task context 1 versus 1′, R = 0.7675, P = 0).

We examined whether ALM preparatory activity was re-activated along similar coding directions in activity space (Fig. 4e). Activity trajectories in lick left and lick right trials were well separated in task context 1′ when projected on the CDDelay from task context 1 (Fig. 4f). By contrast, the activity trajectories were poorly separated when projected on the CDDelay from task context 2 (Fig. 4g). Across all fields of view, a CDDelay decoder trained on task context 1 predicted lick direction at near chance level in task context 2, but performance recovered in task context 1′ (Fig. 4h). Together, these data indicate a re-activation of the previously learned preparatory states under task context 1′.

We also observed a similar re-activation of ALM preparatory states associated with task context 2. In a subset of mice (n = 3), we further imaged the same ALM populations across task contexts 1→2→1′→2′, spanning up to 3 months (59–97 days across mice; Extended Data Fig. 1f,g). We found consistent reorganization and re-activation of CDDelay vectors across the reversals (Fig. 4i). Thus, stable retention of preparatory states was not limited to any specific task context. Unlike preparatory activity, selectivity during the sample and response epochs was stably maintained across all task contexts (Extended Data Fig. 7).

In addition to the reorganization and re-activation of coding directions (CDDelay), we also observed activity changes along other dimensions of activity space across task contexts (Extended Data Fig. 8). Activity along these dimensions did not discriminate lick direction (Extended Data Fig. 8e; ‘movement-irrelevant subspace’), and activity did not recover in previous task contexts (Extended Data Fig. 8c,d). Therefore, preparatory activity is selectively maintained along coding directions encoding behaviour-related information, but activity drifts over time along other non-informative directions7,14,50.

Learning creates parallel representations

We next tested whether continual learning in new task contexts would keep creating new preparatory states. The experiments so far tested only two task contexts. We therefore tested whether additional preparatory states would emerge if mice learned to perform directional licking instructed by a novel stimulus (Fig. 5a).

Fig. 5: Learning new tasks results in new preparatory activity.

a, Possible outcomes of preparatory states over continual learning. b, Task structure of task contexts 1, 2 and 3. In task context 3, mice report the frequency of a pure tone using directional licking after a delay epoch. Time interval, 31.50 ± 9.65 days between task contexts 1 and 2, 66.75 ± 26.36 days between task contexts 1 and 3; mean ± s.d. across fields of view. c, Selectivity index in task contexts 1, 2 and 3 for neurons with significant trial-type selectivity (P < 0.001, two-tailed t-test) during the delay epoch. Red, neurons preferring lick left; blue, neurons preferring lick right. Significant selectivity and trial-type preferences are determined in task context 1. d, Schematic of movement-specific activity trajectories in activity space and CDDelay vectors across task contexts. Red and blue shades, preparatory states for lick left and lick right, respectively. e, ALM activities from an example field of view projected on the CDDelay from task context 1. Thick lines are the mean and thin lines show single trials. f, Decoding accuracy of the CDDelay from task context 1 tested on task contexts 1, 2 and 3 (n = 8 fields of view, 3 mice). Task context 1 versus 3, ***P = 5.56 × 10−5. Decoding accuracy of the CDDelay from task context 2 (n = 8 fields of view, 3 mice). Task context 2 versus 3, ***P = 6.08 × 10−4. Decoding accuracy of the CDDelay from task context 3 (n = 8 fields of view, 3 mice). Compared with the task context 1 decoder, **P = 0.0022; compared with the task context 2 decoder, **P = 0.002. Two-tailed paired t-test. Data are mean ± s.e.m.


We trained mice to perform an auditory-instructed licking task in the automated home cage after imaging ALM activity in the tactile tasks (Fig. 5b; task context 1→2→3; 40–118 days). Mice discriminated the frequency of a pure tone, licking left for 2 kHz and right for 10 kHz. We then imaged the same ALM populations in the auditory task. Individual neurons with significant lick direction selectivity during the delay epoch in the tactile task showed a distinct pattern of selectivity in the auditory task (Fig. 5c; Pearson’s correlation, task context 1 versus 3, R = 0.3435; significantly less than the correlation within task context over time in Fig. 2e, P = 0, bootstrap).

We further examined whether ALM preparatory activity encoded tactile- and auditory-instructed licking along different coding directions (Fig. 5d). Indeed, we found poor separation between activity trajectories in lick left and lick right trials when activity in the auditory task was projected on the CDDelay from the tactile task (Fig. 5e). Across all fields of view, the CDDelay decoders trained on the tactile tasks predicted lick direction poorly when tested on the auditory task (Fig. 5f). By contrast, a decoder trained within the auditory task decoded lick direction significantly better than the decoders from tactile tasks 1 and 2 (P = 0.0022 and P = 0.002, two-tailed paired t-test), indicating that the poor decoding performance of the tactile-task decoders in the auditory task was not due to a lack of neuronal selectivity.

Finally, ALM activity during the sample epoch was distinct across tactile and auditory tasks (Extended Data Fig. 9a,b). Lick direction selectivity during the response epoch remained stable across all task contexts (Extended Data Fig. 9e,f), which probably reflected conserved licking movement execution across tasks and ruled out the possibility of unstable imaging over time.

Together, these results show that motor learning produces context-specific preparatory states. Once learned, these activity states are stably stored and can be recalled after several months, despite intervening motor learning involving the same actions in other contexts. At the same time, activity related to movement execution remains the same across contexts. Preparatory states thus reflect context-specific motor memories that are stably retained over continual learning.

Preparatory activity reflects motor memory

We next explored how a context-specific neural code could support motor memory behaviour. Mice re-learned a previously learned sensorimotor contingency faster (Extended Data Fig. 1c). We examined whether preparatory states retained a memory trace that could facilitate this faster re-learning8.

We re-analysed the imaging data from tactile task 1→2→1′, in which we imaged ALM activity in the same task context before and after intervening learning. If learning of task context 2 left a memory trace, we should observe an activity change in task context 1′ compared with task context 1, and this change should support the performance of task 2. We calculated the CDDelay for task context 2 and projected ALM activity at the end of the delay epoch on this CDDelay (Extended Data Fig. 10a). ALM activity in task context 1′ exhibited increased lick direction selectivity along the CDDelay compared with task context 1 (Extended Data Fig. 10b; P = 0.005, paired t-test). To examine whether this activity change could support the performance of task 2, we decoded lick direction using activity projected on the CDDelay from task context 2. Decoding was near chance level in task context 1 (52.75 ± 5.24%, mean ± s.e.m. across sessions) but significantly increased to 58.66 ± 4.63% in task context 1′ (Extended Data Fig. 10c; P = 0.0199, paired t-test). Thus, learning of task context 2 left a subtle but persistent alteration of ALM preparatory activity along the CDDelay8.

If each task-specific CDDelay retains a memory trace of previous learning, distinct CDDelay vectors could provide a place to store task-specific motor memories while protecting them from interference. We tested this notion by taking advantage of the individual variability across mice: some mice exhibited distinct CDDelay vectors across task contexts, whereas others exhibited fixed CDDelay vectors (Extended Data Fig. 6a–c). Remarkably, mice with distinct CDDelay vectors in different task contexts (lower dot product) re-learned the previously learned task faster (Extended Data Fig. 10d; P = 0.0002, Pearson’s correlation).

These results suggest that task-specific motor memories are stored along distinct coding directions in activity space, which could help protect the memories from new learning and support faster re-learning of previously learned tasks.

A feedforward network for stable memory storage

We used network modelling to explore network architectures that might support the observed memory storage. Preparatory activity is mediated by interactions between ALM and multiple brain regions51. Our goal was not to map model components onto specific brain regions, but to explore which networks could explain the reorganization of preparatory activity by learning, specifically: (1) the formation of new preparatory activity across contingency reversal; and (2) the re-activation of learned preparatory activity patterns after intervening task learning.

We started with recurrent neural networks52 (RNNs) (Fig. 6a). RNNs were trained to generate linear ramps along the correct readout dimension and no activity along the incorrect readout dimension (Fig. 6b, task context 1; Methods). For contingency reversal, we trained the internal connections of learned RNNs to generate the opposite responses while keeping the input and output connections fixed (Methods). Contrary to the neural data in which a new pattern of selectivity emerged after contingency reversal (Fig. 3), RNN activity mostly followed the network output (that is, lick direction; Fig. 6b). Network units similarly contributed to the CDDelay defined by lick direction in both task contexts (Fig. 6c). We also tested RNNs in which only two internal units contributed to the output, yielding similar results (Extended Data Fig. 11a–c). RNN dynamics were therefore constrained to the previously learned CDDelay and the networks solved the contingency reversal by re-association (Fig. 6d).
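
A minimal sketch of this modelling approach is shown below, assuming a vanilla RNN trained to produce a ramp along the correct readout dimension; network size, trial timing and optimization settings are illustrative choices, and only the restriction of plasticity to recurrent weights during reversal follows the description above.

```python
# Hedged sketch (not the study's code): an RNN trained to ramp along the correct
# readout dimension during the delay epoch; sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

n_units, n_inputs, n_outputs, n_steps = 100, 2, 2, 60  # hypothetical sizes

class LickRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(n_inputs, n_units, nonlinearity='tanh', batch_first=True)
        self.readout = nn.Linear(n_units, n_outputs)

    def forward(self, x):
        h, _ = self.rnn(x)           # (batch, time, units)
        return self.readout(h), h    # output ramps and hidden activity

def make_trial(stimulus, contingency):
    """stimulus: 0 or 1 (pole position); contingency maps stimulus -> lick direction."""
    x = torch.zeros(1, n_steps, n_inputs)
    x[0, :20, stimulus] = 1.0                             # transient sample-epoch input
    y = torch.zeros(1, n_steps, n_outputs)
    lick = contingency[stimulus]
    y[0, 20:, lick] = torch.linspace(0, 1, n_steps - 20)  # ramp along the correct readout
    return x, y

model = LickRNN()
loss_fn = nn.MSELoss()

def train(contingency, recurrent_only=False, n_iter=500):
    # For contingency reversal, only the recurrent weights are plastic;
    # input and readout connections stay fixed, as described above.
    if recurrent_only:
        params = [model.rnn.weight_hh_l0, model.rnn.bias_hh_l0]
    else:
        params = model.parameters()
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(n_iter):
        stim = torch.randint(2, (1,)).item()
        x, y = make_trial(stim, contingency)
        out, _ = model(x)
        loss = loss_fn(out, y)
        opt.zero_grad(); loss.backward(); opt.step()

train({0: 0, 1: 1})                              # task context 1
train({0: 1, 1: 0}, recurrent_only=True)         # contingency reversal (task context 2)
```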

Fig. 6: Neural network modelling.

a, The RNN model. b, Activity of an RNN projected on the CDDelay from task context 1. The CDDelay is defined by lick direction. Blue, lick right; red, lick left. c, Weight contribution of the RNN units to the CDDelay vectors from task contexts 1 and 2 (left) or task contexts 1 and 1′ (right). d, Dot product between the CDDelay vectors from task contexts 1 and 2. Data from 50 randomly initialized RNNs. e, Schematic of the AFF network. f–h, Same as b–d, except for the AFF networks. Data from 50 randomly initialized AFF networks.

We next explored a class of amplifying feedforward (AFF) networks that generate persistent activity by passing activity through a chain of network states53,54 (Fig. 6e and Extended Data Fig. 11d), which can be modelled as a series of layers with feedforward connections. AFF networks learned feedforward amplifications to generate choice-specific persistent activity in response to transient inputs to the early layer (Fig. 6f). Feedback connections conveyed output signals to early layers and allowed the network to learn (Methods). In the hidden layers, AFF networks maintained persistent activity along multiple dimensions (Extended Data Fig. 11e,f). AFF networks readily captured both features of the neural data: (1) upon contingency reversal, the network learned a new CDDelay; (2) re-training in the previous sensorimotor contingency re-activated the previous CDDelay (Fig. 6f,g). Resetting the weights of the hidden layers before re-training prevented the CDDelay re-activation (Extended Data Fig. 11g,h). Thus, AFF networks stored sensorimotor mappings in hidden layers.
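
The sketch below illustrates the AFF architecture described above, assuming a chain of tanh layers with mildly amplifying feedforward weights, a transient sample-epoch input to the first layer, and feedback of the readout to the first layer (standing in for the early layers mentioned above); the learning rule for the feedforward amplification is omitted, and all parameters are illustrative.

```python
# Hedged architectural sketch of an amplifying feedforward (AFF) chain, not the
# trained model from the study; weights are random and sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_per_layer, n_steps = 8, 50, 60            # hypothetical sizes

# feedforward weights between successive layers, with slight (>1) gain
W_ff = [1.1 * rng.standard_normal((n_per_layer, n_per_layer)) / np.sqrt(n_per_layer)
        for _ in range(n_layers - 1)]
W_out = rng.standard_normal((2, n_per_layer)) / np.sqrt(n_per_layer)  # readout from last layer
W_fb = 0.1 * rng.standard_normal((n_per_layer, 2))                    # output feedback to first layer

def run_trial(stim_vec, fb_gain=1.0):
    """Propagate a transient stimulus through the chain; activity persists in later layers."""
    layers = [np.zeros(n_per_layer) for _ in range(n_layers)]
    out = np.zeros(2)
    activity = []
    for t in range(n_steps):
        inp = stim_vec if t < 20 else np.zeros(n_per_layer)   # sample-epoch input only
        new = [np.tanh(inp + fb_gain * (W_fb @ out))]          # first layer: input + output feedback
        for l in range(1, n_layers):
            new.append(np.tanh(W_ff[l - 1] @ layers[l - 1]))   # feedforward propagation
        layers = new
        out = W_out @ layers[-1]
        activity.append(np.concatenate(layers))               # hidden-layer activity at time t
    return np.array(activity), out

delay_activity, choice_readout = run_trial(rng.standard_normal(n_per_layer))
```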

We next examined the features that allowed AFF networks to create new CDDelay vectors upon contingency reversal learning while retaining previously learned CDDelay vectors. Owing to feedforward and feedback connections, intermediate layers contained mixtures of input and output representations. We decomposed AFF network activity into distinct modes. AFF networks learned a persistent stimulus mode and an output mode along orthogonal dimensions that together established the CDDelay (Extended Data Fig. 12a). Upon contingency reversal, the output mode combined with the new stimulus mode to form a new CDDelay (Extended Data Fig. 12a). Reversion to the previous contingency re-activated the original stimulus and output modes, which re-activated the previously learned CDDelay (Fig. 6g). By contrast, the persistent stimulus mode was absent in RNNs, which resulted in CDDelay vectors that were aligned only to the output mode (Extended Data Fig. 12b). This suggests that a high-dimensional circuit that can maintain multiple persistent activity modes is critical to support context-dependent CDDelay reorganization.
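
As a simplified illustration of this decomposition (ours, not a derivation from the model equations), suppose late-delay activity superimposes a stimulus mode ($s_A$ or $s_P$ for the two pole positions) and an output mode ($o_L$ or $o_R$ for the two lick directions). Then

$$\mathrm{CD}_1 \propto (s_A + o_L) - (s_P + o_R) = \Delta s + \Delta o, \qquad \mathrm{CD}_2 \propto (s_P + o_L) - (s_A + o_R) = -\Delta s + \Delta o,$$

so that, for orthogonal modes, $\mathrm{CD}_1 \cdot \mathrm{CD}_2 \propto \lVert \Delta o \rVert^2 - \lVert \Delta s \rVert^2$. Under this caricature, a dominant output mode keeps the CDDelay aligned across contexts, a dominant stimulus mode reverses it, and comparable mode strengths yield near-orthogonal CDDelay vectors, spanning the range of outcomes described above.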

This feature of AFF networks could also explain the individual variability across mice (Extended Data Fig. 6a–c). Individual networks could exhibit a range of CDDelay reorganization depending on the relative strength of input and output representations in the intermediate layers (Extended Data Fig. 12a). Networks with strong stimulus modes (due to weak feedback connections) exhibited reorganized CDDelay vectors; networks with strong output modes exhibited stable CDDelay vectors aligned to the network output (Fig. 6h and Extended Data Fig. 12c). This suggests an unexpected role of stimulus activity in the formation of motor memory. We tested whether ALM stimulus activity could explain the individual variability across mice in our data. Remarkably, stimulus activity strength measured in task context 1 predicted whether a mouse would exhibit context-dependent reorganization of the CDDelay across task contexts (Extended Data Fig. 12d). This suggests that individual mice differ in their underlying neural circuits.

In summary, an AFF network architecture that maintained multiple persistent activity modes to encode sensorimotor combinations in high-dimensional hidden layers could explain multiple aspects of the neural data. These results suggest that stable motor memory is rooted in high-dimensional representations. The AFF network is a subclass of RNNs, and other architectures may also produce these neural dynamics.

Discussion

Our study reveals a combinatorial neural code that stores learned actions in combination with their contexts. Within a task context, preparatory activity encoding lick direction is stably maintained over multiple months (Fig. 2), and even across intervening motor learning (Fig. 4). Across task contexts, the same action is preceded by distinct preparatory activity (Fig. 3), whereas selectivity related to the sensory stimulus and movement execution remains remarkably stable over time and across task contexts (Extended Data Figs. 4, 5, 7 and 9). These results suggest that the same action can be encoded by multiple preparatory states. This degree of freedom may allow motor circuits to create parallel representations of the same actions while indexing their contexts. Indeed, we find that new task learning continually creates new preparatory states for learned actions in a context-dependent manner (Fig. 5). Motor learning thus forms modular motor memories for each context.

Preparatory states in different task contexts are arranged along distinct coding directions in activity space. Each coding direction retains a memory trace of the previous learning in specific tasks (Extended Data Fig. 10a–c). Context-specific coding directions could help protect existing memories from interference by new learning: mice with distinct coding directions across task contexts were faster to re-learn previously learned tasks—that is, greater saving (Extended Data Fig. 10d). These properties of ALM preparatory activity indicate that it reflects motor memory and reveal the underlying neural code for stable motor skill retention. Context-specific memory, as we observed in the motor system, may provide a solution for stable memory storage throughout continual learning. Learning in new contexts produces parallel new representations instead of modifying existing representations, thus protecting existing motor memories from erasure6,8.

Motor cortical preparatory activity is thought to provide the initial conditions for subsequent movement execution16. Our results show that preparatory activity is not directly linked to the movement itself but reflects motor memories of learned actions and contexts5. The reorganization of preparatory activity across task contexts shares similarities with hippocampal place cells, which encode space and experience within a specific context and undergo global remapping across distinct contexts55. A context-specific code may be a general feature of learned cognitive representations.

Our findings suggest that when movement parameters and task context are controlled, neural representation of actions in motor cortex shows surprisingly little representational drift. Interestingly, preparatory activity is selectively maintained along coding directions, but activity drifts over time along other non-informative directions (Extended Data Figs. 3 and 8). Preparatory activity is maintained by recurrent networks in motor cortex and connected brain areas16,51. Our findings suggest that motor memories are stored in stable network configurations. Previous studies have reported representational drift in sensory, association, and memory-related brain regions34,56,57. However, little representational drift has been reported in motor areas38,39,40. Differences in brain areas and behavioural paradigms may explain some differences in these findings.

It was recently reported that motor learning induces a persistent change in preparatory activity7,8. Notably, this persistent change occurs outside of the activity subspace encoding specific movements (coding directions), whereas the geometry of activity states encoding specific movements is mostly preserved. Those studies examined activity changes within a session or across a few days, so the stability of the reorganized activity remained to be determined. By tracking activity over the long term, we find that learning a new task context induces a dramatic reorganization of the coding directions (Figs. 3 and 5), along with changes in the movement-irrelevant subspace (Extended Data Fig. 8). We also find that, once learned, the preparatory states are stably retained and can be recalled after multiple months (Fig. 4). Thus, multiple concerted changes, along both coding directions and movement-irrelevant subspaces, accompany motor skill learning and may work collectively to differentiate motor memories.

A combinatorial code requires high-capacity storage for motor memories, owing to the potentially many combinations of actions and contexts. Standard RNNs mostly reused output activity states in different tasks. The delay epoch, which separates sensory input from network output in time, together with training the networks to generate ramping output dynamics during the delay epoch, might have made it difficult for the RNNs to learn sensorimotor combinations. Our network modelling suggests that stable motor memory is rooted in high-dimensional representations and requires a network architecture that can readily acquire and store sensorimotor combinations (Fig. 6e–h). It remains to be determined how such high-dimensional representations map onto neural circuits. Preparatory activity is maintained by recurrent loops between ALM and subcortical regions51,58, including the thalamus59, midbrain60 and cerebellum61. The storage locus for such motor memories is unknown. We propose the cerebellum as a potential candidate. Cerebellar granule cells integrate inputs from the neocortex and form the basis for cerebellar output that influences preparatory activity62. Cerebellar granule cells are the most numerous cell type in the brain, which could provide a substrate for high-dimensional representations with minimal interference between motor memories63,64. Future work probing mechanisms of memory storage in the cerebellum may be of interest.

Methods

Mice

This study was based on data from 36 mice (older than postnatal day 60; both male and female). Fifteen GP4.3 mice (Thy1-GCaMP6s; Jackson Laboratory, JAX 024275) were used for longitudinal two-photon calcium imaging. Among them, one mouse was removed from subsequent neuronal data analyses owing to the low number of matched neurons across days (see ‘Preprocessing of two-photon imaging data’). Five GAD2-IRES-Cre mice (JAX 010802) were used for ALM photoinhibition in the home cage. Five additional GAD2-IRES-Cre mice were used only for behaviour training in the home cage. Eleven Slc17a7-Cre mice (JAX 023527) crossed to Cre-dependent GCaMP6f reporter Ai148 mice (JAX 030328) were used for behaviour training but not for calcium imaging owing to poor behavioural performance (Extended Data Fig. 1i).

All procedures were in accordance with protocols approved by the Institutional Animal Care and Use Committees at Baylor College of Medicine. Mice were housed in a 12:12 reversed light:dark cycle and tested during the dark phase. On days when mice were not tested, they received 0.5–1 ml of water. On testing days, mice received all of their water (0.5–1 ml) during experimental sessions lasting 1–2 h. If mice did not maintain a stable body weight, they received supplementary water65. All surgical procedures were carried out aseptically under 1–2% isoflurane anaesthesia. Buprenorphine Sustained Release (1 mg kg−1) and Meloxicam Sustained Release (4 mg kg−1) were used for preoperative and postoperative analgesia. A mixture of bupivacaine and lidocaine was administered topically before scalp removal. After surgery, mice were allowed to recover for at least 3 days with free access to water before water restriction began.

Surgery

Mice were prepared with a clear skull cap and a headpost41,65. The scalp and periosteum over the dorsal skull were removed. For ALM photoinhibition in GAD2-IRES-Cre mice, AAV8-Ef1a-DIO-ChRmine-mScarlet46 (Stanford Gene Vector and Virus Core; titre 8.44 × 10^12 viral genomes (vg) per ml) was injected into the left ALM (anterior 2.5 mm from bregma, lateral 1.5 mm, depth 0.5 and 0.8 mm, 200 nl at each depth) using a Nanoliter 2010 injector (World Precision Instruments) with bevelled glass pipettes (20–30 µm tip diameter). A layer of cyanoacrylate adhesive was applied to the skull. A custom headpost was placed on the skull and cemented in place with clear dental acrylic. A thin layer of clear dental acrylic was applied over the cyanoacrylate adhesive, covering the entire exposed skull.

For two-photon calcium imaging in GP4.3 mice, a glass window was additionally implanted over ALM. A circular craniotomy 3.2 mm in diameter was made over the left ALM (anterior 2.5 mm from bregma, lateral 1.5 mm). The dura inside the craniotomy was removed. A glass assembly was constructed from a single 4 mm diameter coverslip (Warner Instruments; CS-4R) placed on top of two 3 mm diameter coverslips (Warner Instruments; CS-3R), bonded with optical adhesive (Norland Products; NOA 61) cured under UV light (Kinetic Instruments; SpotCure-B6). The glass window was affixed to the skull surrounding the craniotomy using cyanoacrylate adhesive (Elmer; Krazy Glue) and dental acrylic (Lang Dental Jet Repair Acrylic; 1223-clear).

Behaviour tasks and training in home cage

Details of the behaviour task and training in the autonomous home-cage system have been described previously43. In brief, a headport (~20 × 20 mm) was located on the front side of the home cage. The two sides of the headport were fitted with widened tracks that guided a custom headpost (26.5 mm long, 3.2 mm wide) into a narrow slot, where the headpost could trigger two snap-action switches (D429-R1ML-G2, Mouser) mounted on both sides of the headport. Upon switch trigger, two air pistons (McMaster; 6604K11) were pneumatically driven (Festo; 557773) to clamp the headpost. A custom 3D-printed platform was placed inside the home cage in front of the headport. The platform was embedded with a load cell (Phidgets; CZL639HD) to record mouse body weight. This body-weight-sensing stage was also used to detect struggles during head fixation and to trigger self-release. A lickport with two lickspouts (5 mm apart) was placed in front of the headport. Each lickspout was electrically coupled to a custom circuit board that detected licks via completion of an electrical circuit upon licking contact41,66. Water rewards were dispensed by two solenoid valves (The Lee Company; LHDA1233215H). The sensory stimulus for the tactile-instructed licking task was a mechanical pole (1.5 mm diameter) on the right side of the headport. The pole was motorized by a linear motor (Actuonix; L12-30-50-12-I) and presented at different locations to stimulate the whiskers. The sensory stimuli for the auditory-instructed licking task were pure tones (2 kHz or 10 kHz) delivered by a piezo buzzer (CUI Devices; CPE-163) placed in front of the headport. The auditory ‘go’ cue (3.5 kHz) in both the tactile and auditory tasks was delivered by the same piezo buzzer.

Protocols stored on microcontrollers (Arduino; A000062) operated the home-cage system, autonomously trained mice in voluntary head fixation and behavioural tasks, and carried out optogenetic testing. In brief, mice were placed inside the home cage and could freely lick both lickspouts, which initially protruded into the home cage through the headport. The rewarded lickspout alternated between the left and right lickspouts (3 times each) to encourage licking on both lickspouts. This phase of training acclimatized mice to the lickport, and the lickport was then gradually retracted into the headport, away from the home cage. The lickport retraction continued until the tips of the lickspouts were approximately 14 mm from the headport. At this point, mice could only reach the lickspouts by entering the headport, with the headpost triggering the head-fixation switches. After 30 successful voluntary head-fixation switch triggers, the pneumatic pistons were activated to clamp the headpost upon switch trigger (‘voluntary head fixation’; Fig. 1c). The head-fixation training protocol continuously increased the pneumatic clamping duration (from 3 s to 30 s). The clamp was self-released when the body weight readings from the load-sensing platform exceeded either an upper (30 g) or lower (−1 g) threshold. Overt movements of the mice during head fixation typically produced large fluctuations in weight readings exceeding these thresholds. The thresholds were dynamically adjusted during the training process.

When mice completed the head-fixation training protocol by reaching the 30 s head-fixation duration, the next training protocol, for the tactile-instructed licking task, began. In the tactile-instructed licking task, mice used their whiskers to discriminate the location of a pole and reported their choice using directional licking for a water reward41,65 (Fig. 1d). The pole was presented at one of two positions that were 6 mm apart along the anterior–posterior axis. The posterior pole position was approximately 5 mm from the right whisker pad. The sample epoch was defined as the time between pole movement onset and 0.1 s after pole retraction onset (sample epoch, 1.3 s). A delay epoch followed, during which the mice had to keep the information in short-term memory (delay epoch, 1.3 s). An auditory ‘go’ cue (0.1 s duration) signalled the beginning of the response epoch, and mice reported their choice by licking one of the two lickspouts. Task training had three subprotocols that shaped mouse behaviour in stages. First, a ‘directional licking’ subprotocol trained mice to lick both lickspouts and switch between the two. Then, a ‘discrimination’ subprotocol taught mice to report pole position with directional licking. Finally, a ‘delay’ subprotocol taught mice to withhold licking during the delay epoch and initiate licking upon the ‘go’ cue by gradually (in 0.2 s steps) increasing the delay epoch duration up to 1.3 s. At the end of the delay subprotocol, the head-fixation duration was further increased from 30 s to 60 s, by 2 s after every 20 successful head fixations, to obtain more behavioural trials in each head fixation. The program also adjusted the probability of each trial type to correct biased licking.

Mice were first trained in one sensorimotor contingency (Fig. 1d, task context 1; anterior pole position→lick left, posterior pole position→lick right). Then, the correspondence between pole locations and lick directions was reversed (task context 2; anterior pole position→lick right, posterior pole position→lick left). Over multiple months, mice could learn multiple rounds of sensorimotor contingency reversal, depending on the experiment (see ‘Performance criteria for contingency reversals and acclimatization to imaging setup’).

For the auditory-instructed licking task, mice were trained to perform directional licking to report the frequency of a pure tone presented during the sample epoch (Fig. 5b, task context 3; 2 kHz (low tone)→lick left, 10 kHz (high tone)→lick right). Task structures such as the delay epoch (1.3 s) and auditory go cue (3.5 kHz, 0.1 s) were the same as in the tactile-instructed licking task.
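
The task parameters and training stages above can be summarized in a configuration sketch; the dictionaries below simply restate the values given in this section, with field names chosen for illustration rather than taken from the home-cage control software.

```python
# Configuration sketch restating the trial structure, training stages and
# sensorimotor contingencies described above; field names are illustrative.
TRIAL_EPOCHS = {
    'sample_s': 1.3,     # pole movement onset to 0.1 s after retraction onset (tactile task)
    'delay_s': 1.3,      # final delay duration (ramped up in 0.2 s steps during training)
    'go_cue': {'freq_hz': 3500, 'duration_s': 0.1},
}

TRAINING_STAGES = [
    {'name': 'directional_licking'},   # lick both lickspouts and switch between them
    {'name': 'discrimination'},        # report pole position with directional licking
    {'name': 'delay',                  # withhold licking until the go cue
     'delay_ramp_step_s': 0.2,
     'head_fixation_s': {'start': 30, 'end': 60, 'step': 2, 'every_n_fixations': 20}},
]

CONTINGENCIES = {
    'context_1': {'anterior_pole': 'lick_left',  'posterior_pole': 'lick_right'},
    'context_2': {'anterior_pole': 'lick_right', 'posterior_pole': 'lick_left'},
    'context_3': {'2_kHz_tone': 'lick_left', '10_kHz_tone': 'lick_right'},  # auditory task
}
```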

Performance criteria for contingency reversals and acclimatization to imaging setup

For mice that underwent the optogenetic experiment in the home cage, contingency reversal was automatically introduced when mice reached performance criteria of >75% correct and <50% early licks over 100 trials in a given task contingency (Fig. 1e,h). Mice learned multiple rounds of contingency reversals before the optogenetic experiment was initiated. The optogenetic experiment was initiated manually based on inspection of behavioural performance (Fig. 1h).

Mice for two-photon imaging were over-trained in each task context to reach performance criteria of >80–85% correct for 100 trials. Over-training facilitated faster habituation after transferring to the two-photon setup. After mice acquired this high level of task performance in home-cage training, we transferred the mice to the imaging setup where they performed the same task in daily sessions under the two-photon microscope. During this period, mice were singly housed outside of the automated home-cage system. A brief acclimatization period lasting for a few days was required to habituate the mice to perform the task under the microscope (Extended Data Fig. 1e–g). We started imaging sessions once mice recovered their task performance (typically >75%). After imaging across multiple sessions, mice were returned to the automated home cage again in which they learned other tasks. In this manner, we repeatedly transferred mice between the automated home cage and two-photon setup for as long as possible (Extended Data Fig. 1f,g).

For the tactile-instructed licking task, mice were first trained and imaged in one sensorimotor contingency (Fig. 3b, task context 1). After imaging under the two-photon microscope, we transferred the mice back to the home cage and reversed the sensorimotor contingency (Fig. 3b, task context 2). The mice were over-trained in the new task contingency before being transferred to the two-photon setup to re-image the same ALM populations across task contexts (task context 1→2; 10 mice). In a subset of mice, after imaging, we re-trained the mice in the previous contingency in the home cage (Fig. 4b, task context 1′). After they achieved proficient task performance, we transferred the mice to the two-photon setup and imaged the same ALM populations again (task context 1→2→1′; 5 mice). In a subset of mice, we further repeated the contingency reversal one more time and imaged across four task contexts (task context 1→2→1′→2′; 3 mice).

For auditory-instructed licking task, mice were imaged first in the tactile task contexts 1 and 2 before training in the auditory task to image the same ALM populations across task contexts (task context 1→2→3; 8 mice).

ALM photoinhibition in home cage

The procedure for ALM photoinhibition in the home cage has been described previously43. Light from a 633 nm laser (Ultralaser; MRL-III-633L-50 mW) was delivered via an optical fibre (Thorlabs; M79L005) placed above the headport (Fig. 1g). Photostimulation of the virus injection site was performed through the clear skull implant. The photostimulus was a 40 Hz sinusoid lasting 1.3 s, ending with a 100 ms linear ramp at photostimulus offset to reduce rebound neuronal activity67. Photostimulation was delivered in a random subset of trials (18%) during either the sample, delay or response epoch, starting at the beginning of the task epoch. Photostimulation power was 2.5, 12.5 or 25 mW, randomly selected in each trial; the probability of each photostimulation condition was therefore 2% (9 conditions in total). The size of the light beam on the skull surface was 7.07 mm2 (3.0 mm diameter), so 2.5, 12.5 and 25.0 mW corresponded to light intensities of 0.35, 1.77 and 3.54 mW mm−2. This range of light intensities was much lower than in previous studies41,42 (typically 1.5 mW with a light beam diameter of 0.4 mm, corresponding to 11.9 mW mm−2). To prevent the mice from distinguishing photostimulation trials from control trials using visual cues, a masking flash from a 627 nm LED was delivered near the eyes of the mice on all trials. The masking flash began at the start of the sample epoch and continued through the end of the response epoch, spanning the epochs in which photostimulation could occur.
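
The power-to-intensity conversion above is arithmetic over the beam area, and the photostimulus is a 40 Hz sinusoid with a 100 ms offset ramp; the short sketch below reproduces both, with the sampling rate being an assumed value.

```python
# Worked check of the power-to-intensity conversion and a sketch of the 40 Hz
# photostimulus with a 100 ms linear offset ramp; the 10 kHz sampling rate is assumed.
import numpy as np

beam_diameter_mm = 3.0
beam_area_mm2 = np.pi * (beam_diameter_mm / 2) ** 2        # ~7.07 mm^2
for power_mw in (2.5, 12.5, 25.0):
    print(f'{power_mw} mW -> {power_mw / beam_area_mm2:.2f} mW/mm^2')
# prints ~0.35, 1.77, 3.54 mW/mm^2, matching the intensities given above

fs = 10_000                                                # Hz, assumed sampling rate
t = np.arange(0, 1.3, 1 / fs)                              # 1.3 s photostimulus
envelope = np.ones_like(t)
ramp = t >= 1.2                                            # final 100 ms
envelope[ramp] = np.linspace(1, 0, ramp.sum())             # linear ramp-down at offset
waveform = envelope * 0.5 * (1 + np.sin(2 * np.pi * 40 * t))  # 40 Hz sinusoid, 0 to 1
```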

Videography

Two CMOS cameras (Teledyne FLIR; Blackfly BFS-U3-04S2M) were used to measure orofacial movements of the mouse from the bottom and side views (Extended Data Figs. 1a,b and 5e). Both views were acquired at 224 × 192 pixels and 400 frames per second. Mice performed the task in complete darkness, and videos were recorded under infrared 940 nm LED illumination (Luxeon Star; SM-01-R9). Custom-written software controlled video acquisition68.

Two-photon imaging

A Thorlabs Bergamo II two-photon microscope equipped with a tunable femtosecond laser (Coherent; Chameleon Discovery) was controlled by ScanImage 2016a (Vidrio). GCaMP6s was excited at 920 nm. Images were collected with a 16× water immersion lens (Nikon, 0.8 NA, 3 mm working distance) at 2× zoom (512 × 512 pixels, 600 × 600 µm). For all imaging sessions, we performed volumetric imaging by serially scanning five planes (equally spaced 30 or 40 μm apart along the z axis) at 6 Hz each. The imaging planes spanned depths of 120–500 μm below the pial surface, and laser power ranged from 80 to 225 mW, measured below the objective. To identify the spatial locations of individual fields of view (FOVs), we imaged at the pial surface before imaging during the task (Extended Data Fig. 2b). To monitor the same ALM neurons across days, we saved six reference images at 10 µm intervals around the most superficial imaging plane for all imaging sessions and identified the most similar imaging plane across sessions by visual inspection.

Multiple FOVs were imaged across multiple days in each task context. The same set of FOVs were imaged across multiple task contexts. Across all experiments, the total duration from the first imaging session to the last imaging session was 26–233 days (Extended Data Fig. 1g; 95.86 ± 71.95 days, mean ± s.d. across mice).

Behaviour data analysis

Performance was computed as the fraction of correct choices, excluding early-lick trials and no-lick trials. Mice whose performance never exceeded 70% after 35–40 days of training were considered unsuccessful in task learning (Extended Data Fig. 1h,i). Chance performance was 50%. Behavioural effects of photoinhibition were quantified by comparing performance in photoinhibition trials with control trials using a paired two-tailed t-test (Fig. 1i). To quantify the speed of task learning in a given task context (Fig. 1f and Extended Data Figs. 1c, 6g and 10d), we calculated the number of trials needed to reach performance criteria of >75% correct and <50% early licks over 100 trials. For a fair comparison, we excluded trials from the head-fixation training protocol when quantifying initial task learning.
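A minimal sketch of the trials-to-criterion measure, assuming trial-wise outcome arrays; the variable names are illustrative and not taken from the original analysis code:

```python
import numpy as np

def trials_to_criterion(correct, early_lick, window=100,
                        perf_thresh=0.75, early_thresh=0.5):
    """Return the first trial index at which the trailing window meets criterion,
    or None if it is never reached.
    correct: 1 = correct, 0 = incorrect, NaN for excluded (early-lick/no-lick) trials.
    early_lick: 1 = early-lick trial, 0 otherwise."""
    correct = np.asarray(correct, dtype=float)
    early_lick = np.asarray(early_lick, dtype=float)
    for t in range(window, len(correct) + 1):
        perf = np.nanmean(correct[t - window:t])     # fraction correct, excluded trials ignored
        early = np.mean(early_lick[t - window:t])    # fraction of early-lick trials
        if perf > perf_thresh and early < early_thresh:
            return t
    return None
```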

Video data analysis

We used DeepLabCut69 to track manually defined body parts. Separate models were used to track tongue and jaw movements (Extended Data Fig. 1a,b). The development dataset for model training and validation contained manually labelled videos from multiple mice and multiple sessions (correct trials only). For the tongue model, 6 markers were manually labelled in 500 video frames; for the jaw model, 5 markers were manually labelled in 300 video frames. The frames for labelling were automatically and uniformly selected by the program at different timepoints within trials. The labelled frames were split randomly into a training set (95%) and a test set (5%). Training was performed using the default settings of DeepLabCut. All models were trained for up to 500,000 iterations with a batch size of one. The trained models tracked the body features in the test data with an average tracking error of less than 2.5 pixels68.

To analyse tongue and jaw movements during the response epoch, we defined single lick events based on the continuous presence of the tongue in consecutive frames44. Tongue volume was determined from the internal area of the four tongue markers (Extended Data Fig. 1a, left), which were located at the corners of the tongue. Lick events were grouped by lick duration for subsequent time-bin-matched correlation analysis. The x and y pixel positions of the tongue tip trajectories were calculated by averaging the frontal tongue markers in each frame; the x and y pixel positions of the jaw tip trajectories were calculated by averaging the three frontal jaw markers in each frame. For each lick event, we obtained four time series (x position, y position, x velocity and y velocity) for the tongue (or jaw) tip trajectories (Extended Data Fig. 1a,b, middle). To calculate the similarity between tongue (or jaw) tip trajectories across lick events (within lick left or lick right), we computed the Pearson correlation of the time series for all pairs of lick events within and across sessions. We then calculated the average correlation for the four parameters (x position, y position, x velocity and y velocity) and compared them within session and across sessions (Extended Data Fig. 1a,b, right).
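A minimal sketch of the pairwise trajectory-similarity computation, assuming each duration-matched lick event is stored as a time-by-parameter array; the names and array layout are assumptions:

```python
import numpy as np

def trajectory_similarity(licks_a, licks_b):
    """licks_a, licks_b: lists of (T, 4) arrays holding x, y position and x, y velocity
    for duration-matched lick events (self-pairs should be excluded when the two lists
    come from the same session). Returns the mean pairwise Pearson correlation,
    averaged over the four kinematic parameters."""
    corrs = []
    for a in licks_a:
        for b in licks_b:
            r = [np.corrcoef(a[:, k], b[:, k])[0, 1] for k in range(4)]
            corrs.append(np.mean(r))
    return np.mean(corrs)
```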

To examine jaw movements during the delay epoch across task contexts, we calculated the x and y displacements of the jaw tip position by subtracting the average jaw position in a baseline period (1.57 s) before the sample epoch (Extended Data Fig. 6f).

Preprocessing of two-photon imaging data

Imaging data were preprocessed using the Suite2p package70 to perform motion correction and extract raw fluorescence signals (F) from automatically identified regions of interest (ROIs). ROIs with skewness >1 were used for further analyses. The neuropil-corrected trace was estimated as Fneuropil_corrected(t) = F(t) – 0.7 × Fneuropil(t). To visualize activity (Fig. 2d, top and Extended Data Fig. 2j, left), ΔF/F0 (type 1) was calculated separately in each trial as (F − F0)/F0, where F0 is the baseline fluorescence signal averaged over a 1.57 s period immediately before the start of each trial. For all other analyses, we calculated deconvolved activity to avoid the spillover of slow-decaying calcium dynamics across task epochs (Extended Data Fig. 2j). To calculate deconvolved activity, Fneuropil_corrected from all trials was concatenated and ΔF/F0 (type 2) was calculated as (F − F0)/F0, where F0 is a running baseline calculated as the median fluorescence within a sliding window of 60 s. Subsequently, ΔF/F0 (type 2) was deconvolved using the OASIS algorithm48 (Extended Data Fig. 2j) after estimating the time constant with an autoregressive model of order p = 1. Deconvolved activity was used for all analyses in this study, except in Fig. 2d (top) and Extended Data Fig. 2j (left), where ΔF/F0 (type 1) traces are shown. Type 1 and type 2 ΔF/F0 differed only in their F0 calculation.
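A minimal sketch of the ΔF/F0 (type 2) computation, assuming a 6 Hz frame rate; the deconvolution call at the end is indicative only (the OASIS Python package provides a deconvolve helper, but the exact arguments used here are an assumption):

```python
import numpy as np
from scipy.ndimage import median_filter

FRAME_RATE_HZ = 6.0

def dff_type2(f_roi, f_neuropil, baseline_window_s=60.0):
    f = f_roi - 0.7 * f_neuropil                      # neuropil correction
    win = int(baseline_window_s * FRAME_RATE_HZ)      # 60 s sliding window in frames
    f0 = median_filter(f, size=win, mode='nearest')   # running-median baseline
    return (f - f0) / f0

# dff = dff_type2(F, Fneu)                            # F, Fneu concatenated across trials
# from oasis.functions import deconvolve              # assumption: OASIS package installed
# c, s, b, g, lam = deconvolve(dff, penalty=1)        # s ~ deconvolved activity
```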

To track the activity of the same neurons across days, spatial footprints of individual ROIs from the same FOVs were aligned across imaging days using the CellReg pipeline47. This probabilistic algorithm computes the distributions of centroid distance and spatial correlation between nearest-neighbour neuronal pairs and all other neighbouring pairs within a 10 μm distance (Extended Data Fig. 2g,h). Based on the bimodality of these distributions (nearest neighbours versus other neighbours), the CellReg algorithm estimates false positive and false negative probabilities. By minimizing both estimated error rates for each pair of ROIs, the algorithm identifies co-registered neurons and quantifies registration scores for them (Extended Data Fig. 2i). If the mean squared errors of both the centroid distance and spatial correlation models are above 0.1 (a pre-determined hyperparameter), the CellReg algorithm generates an error and the FOV is considered a failure to find co-registered neurons across days. One mouse was removed from all subsequent neuronal data analyses because matched neurons could not be found across days in any imaging session, primarily owing to poor imaging window quality. Among co-registered neurons, only neurons with reliable responses in at least one imaging session (that is, Pearson correlation >0.5 between trial-averaged, trial-type-concatenated ΔF/F0 (type 1) peristimulus time histograms (PSTHs) calculated from the first versus second halves of the trials) were used for further analyses.
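A minimal sketch of the response-reliability criterion, assuming per-trial PSTH arrays for each trial type; the shapes and names are illustrative:

```python
import numpy as np

def is_reliable(psths_by_type, threshold=0.5):
    """psths_by_type: list of (n_trials, n_timepoints) ΔF/F0 (type 1) arrays, one per
    trial type. Correlate trial-averaged PSTHs built from the first versus second halves
    of trials, concatenated across trial types."""
    first, second = [], []
    for trials in psths_by_type:
        half = trials.shape[0] // 2
        first.append(trials[:half].mean(axis=0))
        second.append(trials[half:].mean(axis=0))
    r = np.corrcoef(np.concatenate(first), np.concatenate(second))[0, 1]
    return r > threshold
```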

In the experiment where we imaged the same FOV across multiple sessions in the same task context, we define the sessions as expert-early and expert-late sessions (Fig. 2). In cases where we imaged the same FOV twice over time, the 2 sessions were defined as expert-early and expert-late sessions accordingly. In cases where we imaged more than 2 sessions from the same FOV over time, the expert-early and expert-late sessions were defined for pairs of sessions. Specifically, for single neuron analyses (for example, Fig. 2e,k), we only compared the first and second imaging sessions to avoid inclusion of duplicate data points from the same session. These two sessions are defined as expert-early and expert-late sessions, respectively. For population level activity projection and decoding analyses (Fig. 2i,j), we included all the possible pairwise comparisons. For each pair, the two sessions used are defined as expert-early and expert-late sessions, respectively.

Two-photon imaging data analysis

Neurons were tested for significant trial-type selectivity during the sample, delay and response epochs, using deconvolved activity from different trial types (unpaired two-tailed t-test, P < 0.001; correct trials only). We used the early sample epoch (first 0.83 s, 5 imaging frames), late delay epoch (last 0.67 s, 4 frames) and early response epoch (first 1.33 s, 8 frames) as the respective time windows for these statistical comparisons and all following analyses (Extended Data Fig. 4a–c). To examine the stability of the single-neuron selectivity index, we first identified significantly selective neurons in each task epoch. We then determined each neuron’s preferred trial type (‘lick left’ versus ‘lick right’) using the earlier imaging session in task context 1. Next, the selectivity index was calculated as the difference in activity between trial types divided by their sum (anterior versus posterior pole position for sample epoch selectivity; lick left versus lick right for delay and response epoch selectivity; correct trials only). To define preferred trial types in earlier sessions, a portion of the trials was used for the statistical tests determining significant selectivity and the preferred trial type, and independent trials were then used to calculate the selectivity index within the same session. We then calculated selectivity for these defined neurons in later sessions or across different task contexts.
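A minimal sketch of the selectivity index, assuming epoch-averaged activity arrays for the preferred and non-preferred trial types (with the preferred type defined on independent trials); the names are illustrative:

```python
import numpy as np

def selectivity_index(act_pref, act_nonpref):
    """act_pref, act_nonpref: 1D arrays of epoch-averaged deconvolved activity per trial
    for the preferred and non-preferred trial types. Difference over sum of the means."""
    a, b = np.mean(act_pref), np.mean(act_nonpref)
    return (a - b) / (a + b)
```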

For error trial analysis (Extended Data Fig. 2k,l), only the imaging sessions with more than ten error trials for each trial type were analysed. Selectivity was calculated as the difference in trial-averaged activity (deconvolved calcium activity) between instructed lick right and lick left trials, using correct and error trials separately. Selectivity was calculated during the early sample epoch, late delay epoch, and response epoch.

To analyse the encoding of trial types in ALM population activity, we built linear decoders that were weighted sums of ALM neuron activities to best differentiate trial types. We examined the encoding of four kinds of trial types: (1) anterior versus posterior pole position trials for stimulus encoding during the sample epoch in the tactile-instructed lick task; (2) low tone (2 kHz) versus high tone (10 kHz) for stimulus encoding during the sample epoch in the auditory-instructed lick task; (3) lick left versus lick right for lick direction encoding during the delay epoch; and (4) lick left versus lick right for lick direction encoding during the response epoch.

To build the linear decoder for a population of n ALM neurons, we found an n × 1 coding direction (CD) vector in the n-dimensional activity space that maximally separates response vectors of different trial types during defined task epochs—that is, CDSample for stimulus encoding during the sample epoch, CDDelay for lick direction encoding during the delay epoch, and CDResponse for lick direction encoding during the response epoch. To estimate the CD vectors, we first computed CDt at different time points as:

$${{\bf{CD}}}_{t}(\text{tactile stimulus, sample epoch})={\bar{{\bf{x}}}}_{\text{posterior pole}}-{\bar{{\bf{x}}}}_{\text{anterior pole}}\quad \text{for}\ {{\bf{CD}}}_{{\bf{Sample}}}$$
$${{\bf{CD}}}_{t}(\text{auditory stimulus, sample epoch})={\bar{{\bf{x}}}}_{\text{high tone}}-{\bar{{\bf{x}}}}_{\text{low tone}}\quad \text{for}\ {{\bf{CD}}}_{{\bf{Sample}}}$$
$${{\bf{CD}}}_{t}(\text{lick direction, delay epoch})={\bar{{\bf{x}}}}_{\text{lick right}}-{\bar{{\bf{x}}}}_{\text{lick left}}\quad \text{for}\ {{\bf{CD}}}_{{\bf{Delay}}}$$
$${{\bf{CD}}}_{t}(\text{lick direction, response epoch})={\bar{{\bf{x}}}}_{\text{lick right}}-{\bar{{\bf{x}}}}_{\text{lick left}}\quad \text{for}\ {{\bf{CD}}}_{{\bf{Response}}}$$

where \(\bar{{\bf{x}}}\) are n × 1 trial-averaged response vectors that described the population response for each trial type at each time point, t, during the defined task epochs. Next, we averaged the CDt vectors within the defined task epoch to separately estimate the CDSample, CDDelay, and CDResponse. CDSample, CDDelay, and CDResponse were computed using 50% of trials and the remaining trials from the same session or from different sessions were used for activity projections and decoding (Fig. 2g; correct trials only).
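A minimal sketch of the CD estimation, assuming deconvolved activity arrays of shape trials × neurons × timepoints and using half of the trials; the names and shapes are assumptions:

```python
import numpy as np

def coding_direction(act_type1, act_type2, epoch_frames):
    """act_type1/2: (n_trials, n_neurons, n_timepoints) deconvolved activity for the two
    trial types (for example, lick right and lick left). epoch_frames: frame indices of
    the defined task epoch. Returns an (n_neurons,) CD vector."""
    mean1 = act_type1.mean(axis=0)              # (n_neurons, n_timepoints)
    mean2 = act_type2.mean(axis=0)
    cd_t = mean1[:, epoch_frames] - mean2[:, epoch_frames]   # time-resolved CD_t
    return cd_t.mean(axis=1)                    # average CD_t over the epoch
```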

To project the ALM population activity along the CDSample, CDDelay and CDResponse, we computed the deconvolved activity of individual neurons and assembled their single-trial activity at each time point into population response vectors, x (n × 1 vectors for n neurons). The activity projections in Figs. 2–5 and Extended Data Figs. 3–5, 7 and 9 were obtained as \({{\bf{CD}}}_{{\bf{Sample}}}^{{\rm{T}}}{\bf{x}}\), \({{\bf{CD}}}_{{\bf{Delay}}}^{{\rm{T}}}{\bf{x}}\) and \({{\bf{CD}}}_{{\bf{Response}}}^{{\rm{T}}}{\bf{x}}\).

To decode trial types using ALM population activity projected onto the CDSample, CDDelay and CDResponse (Figs. 2–5 and Extended Data Figs. 4, 5, 7 and 9), we calculated ALM activity projections (\({{\bf{CD}}}_{{\bf{Sample}}}^{{\rm{T}}}{\bf{x}}\), \({{\bf{CD}}}_{{\bf{Delay}}}^{{\rm{T}}}{\bf{x}}\) and \({{\bf{CD}}}_{{\bf{Response}}}^{{\rm{T}}}{\bf{x}}\)) within defined time windows and computed a decision boundary (DB) to best separate the different trial types:

$${\rm{DB}}(\text{tactile stimulus, sample epoch})=\frac{\langle {{\bf{CD}}}_{{\bf{Sample}}}^{{\rm{T}}}{{\bf{x}}}_{\text{posterior pole}}\rangle /{\sigma }_{\text{posterior pole}}^{2}+\langle {{\bf{CD}}}_{{\bf{Sample}}}^{{\rm{T}}}{{\bf{x}}}_{\text{anterior pole}}\rangle /{\sigma }_{\text{anterior pole}}^{2}}{1/{\sigma }_{\text{posterior pole}}^{2}+1/{\sigma }_{\text{anterior pole}}^{2}}$$
$${\rm{DB}}(\text{auditory stimulus, sample epoch})=\frac{\langle {{\bf{CD}}}_{{\bf{Sample}}}^{{\rm{T}}}{{\bf{x}}}_{\text{high tone}}\rangle /{\sigma }_{\text{high tone}}^{2}+\langle {{\bf{CD}}}_{{\bf{Sample}}}^{{\rm{T}}}{{\bf{x}}}_{\text{low tone}}\rangle /{\sigma }_{\text{low tone}}^{2}}{1/{\sigma }_{\text{high tone}}^{2}+1/{\sigma }_{\text{low tone}}^{2}}$$
$${\rm{DB}}(\text{lick direction, delay epoch})=\frac{\langle {{\bf{CD}}}_{{\bf{Delay}}}^{{\rm{T}}}{{\bf{x}}}_{\text{lick right}}\rangle /{\sigma }_{\text{lick right}}^{2}+\langle {{\bf{CD}}}_{{\bf{Delay}}}^{{\rm{T}}}{{\bf{x}}}_{\text{lick left}}\rangle /{\sigma }_{\text{lick left}}^{2}}{1/{\sigma }_{\text{lick right}}^{2}+1/{\sigma }_{\text{lick left}}^{2}}$$
$${\rm{DB}}(\text{lick direction, response epoch})=\frac{\langle {{\bf{CD}}}_{{\bf{Response}}}^{{\rm{T}}}{{\bf{x}}}_{\text{lick right}}\rangle /{\sigma }_{\text{lick right}}^{2}+\langle {{\bf{CD}}}_{{\bf{Response}}}^{{\rm{T}}}{{\bf{x}}}_{\text{lick left}}\rangle /{\sigma }_{\text{lick left}}^{2}}{1/{\sigma }_{\text{lick right}}^{2}+1/{\sigma }_{\text{lick left}}^{2}}$$

where σ2 is the variance of the activity projection \({{\bf{CD}}}^{{\rm{T}}}{\bf{x}}\) within each trial type. Decision boundaries were computed using the same trials used to compute the CD vectors, and independent trials were used to predict trial types. To examine decoding performance across task contexts, we restricted the analysis to decoders with an accuracy of >0.7 within the session in which they were trained (cross-validated performance), because a decoder with low decoding performance in its own session will generally also perform poorly in other sessions owing to poor training of the decoder.
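A minimal sketch of the variance-weighted decision boundary and the resulting decoder, assuming 1D arrays of CD projections; the names are illustrative, and the trials used for fitting are independent of those being decoded:

```python
import numpy as np

def decision_boundary(proj_type1, proj_type2):
    """proj_type1/2: 1D arrays of CD projections (CD^T x) for the two trial types.
    Variance-weighted average of the two projection means."""
    m1, m2 = proj_type1.mean(), proj_type2.mean()
    v1, v2 = proj_type1.var(), proj_type2.var()
    return (m1 / v1 + m2 / v2) / (1 / v1 + 1 / v2)

def decode(cd, db, x_trials, sign=+1):
    """x_trials: (n_trials, n_neurons) epoch-averaged activity. Returns predicted labels
    (True = trial type 1), with `sign` set so that type 1 projects above the boundary."""
    proj = x_trials @ cd
    return sign * (proj - db) > 0
```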

To analyse activity changes along other dimensions of activity space across task contexts, we defined a ‘uniform shift (US) axis7’ using trial-type-averaged activity:

$${{\bf{US}}}_{{\bf{context\; 1}}\to {\bf{2}}}=\left(\frac{{\bar{{\bf{R}}}}_{{\bf{context\; 2}}}+{\bar{{\bf{L}}}}_{{\bf{context\; 2}}}}{2}\right)-\left(\frac{{\bar{{\bf{R}}}}_{{\bf{context\; 1}}}+{\bar{{\bf{L}}}}_{{\bf{context\; 1}}}}{2}\right)$$

where \(\bar{{\bf{R}}}\) and \(\bar{{\bf{L}}}\) are n × 1 response vectors that described the trial-averaged population response for lick right and lick left trials, respectively, at the end of the delay epoch. We separately calculated a US axis for each task context change—that is, US1→2 for task context 1→2, US2→1′ for task context 2→1′ and US1′→2′ for task context 1′→2′ (Extended Data Fig. 8b). For activity projections (Extended Data Fig. 8c), the US axes were further orthogonalized against the CD vectors using the Gram–Schmidt process to capture activity changes along dimensions of activity space that were not selective for lick direction (‘movement-irrelevant subspace’). We computed the US vectors using 50% of the trials and used the remaining 50% of the trials for activity projections (Extended Data Fig. 8c). The dot products in Extended Data Fig. 8d were calculated without any orthogonalization.
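A minimal sketch of the US axis and its orthogonalization against the CD vectors, assuming trial-averaged end-of-delay response vectors; the names are illustrative:

```python
import numpy as np

def uniform_shift_axis(r_ctx2, l_ctx2, r_ctx1, l_ctx1):
    """Inputs: (n_neurons,) trial-averaged end-of-delay response vectors per trial type
    and context. Returns the US axis for the context 1 -> 2 change."""
    return (r_ctx2 + l_ctx2) / 2 - (r_ctx1 + l_ctx1) / 2

def orthogonalize(us, cds):
    """Project `us` onto the orthogonal complement of the subspace spanned by the CD
    vectors in `cds` (equivalent to Gram–Schmidt against an orthonormalized CD basis)."""
    Q, _ = np.linalg.qr(np.stack(cds, axis=1))   # orthonormal basis of the CD subspace
    return us - Q @ (Q.T @ us)
```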

Modelling

The instructed directional licking task with a delay epoch was modelled with simulations lasting two seconds. The first second of the simulation was the sample epoch, during which trial-specific external inputs were provided, and the last second was the delay epoch, during which the inputs were removed. The coding direction, \({{\bf{CD}}}_{{\bf{Delay}}}\), was calculated as the difference between network activity on lick left and lick right trials at the end of the delay epoch (\(t=0\)), similar to the neural data. The trial type was always defined by the instructed lick direction in different task contexts (across contingency reversals).

Recurrent neural networks

RNNs consisted of 50 units with dynamics governed by the equation

$$\tau \frac{{\rm{d}}{r}_{i}(t)}{{\rm{d}}t}=-{r}_{i}(t)+f\left(\sum _{j}{W}_{i,j}{r}_{j}(t)+{I}_{i}^{{\rm{TT}}}(t)\right)$$

where \({r}_{i}(t)\) is the spike rate of neuron i, the synaptic time constant \(\tau \) was set equal to 200 ms, \({W}_{i,j}\) is the synaptic strength from neuron j to neuron i, \({I}_{i}^{{\rm{TT}}}(t)\) is the trial-type (TT)-dependent external input to neuron i, and \(f(x)=\tanh (x)\) is the neural activation function.

The connection matrix W was randomly initialized from a Gaussian distribution and scaled to have a maximum eigenvalue equal to 0.9. To generate persistent activity, a network must have an eigenvalue greater than or equal to one, which the networks therefore had to acquire through training. Networks initialized with eigenvalues greater than one tended to learn the task with high-dimensional persistent activity, inconsistent with ALM dynamics14. Initializing with eigenvalues less than one tended to produce lower-dimensional persistent activity.

External input strengths \({I}_{i}^{{\rm{TT}}}\) were drawn from a Gaussian distribution with mean equal to zero and s.d. of 0.3. Two distinct input vectors were used for anterior \({I}_{i}^{A}\) and posterior \({I}_{i}^{P}\) pole position trials.

Behavioural readout B was given by the linear projection \(B={\sum }_{i}{r}_{i}(t=0){W}_{{\rm{out}}}^{R}-{\sum }_{i}{r}_{i}(t=0){W}_{{\rm{out}}}^{L}\), where \(t=0\) is the time at the end of the delay epoch, and \({W}_{{\rm{out}}}^{R}\) and \({W}_{{\rm{out}}}^{L}\) are Gaussian random readout vectors corresponding to rightward and leftward movements, respectively.
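For concreteness, a minimal simulation sketch of these dynamics using Euler integration; the unit count, time constant, input statistics and eigenvalue scaling follow the text, whereas the random seed, step size and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, tau, dt = 50, 0.2, 0.01                        # 50 units, tau = 200 ms, 10 ms steps
T_sample, T_delay = 1.0, 1.0                      # 1 s sample + 1 s delay

W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale largest eigenvalue magnitude to 0.9
I_A, I_P = rng.normal(0, 0.3, size=(2, N))        # anterior / posterior input vectors
W_out_R, W_out_L = rng.normal(size=(2, N))        # fixed Gaussian readout vectors

def run_trial(I_tt):
    r = np.zeros(N)
    n_steps = int((T_sample + T_delay) / dt)
    for step in range(n_steps):
        inp = I_tt if step * dt < T_sample else 0.0   # input only during the sample epoch
        r = r + dt / tau * (-r + np.tanh(W @ r + inp))
    return r                                          # activity at the end of the delay (t = 0)

r_end = run_trial(I_A)
B = r_end @ W_out_R - r_end @ W_out_L                 # behavioural readout
```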

RNNs were trained using backpropagation through time (BPTT). The input (\({I}_{i}^{{\rm{TT}}}\)) and readout weights (\({W}_{{\rm{out}}}^{R}\) and \({W}_{{\rm{out}}}^{L}\)) were fixed and only the recurrent weights \({W}_{i,j}\) internal to the RNN were trained. For each trial type, activity along the correct readout direction was trained to match a linear ramp starting at the beginning of the sample epoch, and the incorrect readout direction was trained to have zero activation. For task context 1, presentation of \({I}_{i}^{A}\) was associated with ramping along \({W}_{{\rm{out}}}^{L}\) and zero activation along \({W}_{{\rm{out}}}^{R}\), whereas presentation of \({I}_{i}^{P}\) was associated with the opposite mapping. These associations were reversed for task context 2. Networks were trained for 100 iterations.

In the RNNs, the behaviour readout relied on many units (dense \({W}_{{\rm{out}}}^{R}\) and \({W}_{{\rm{out}}}^{L}\)). Because only 2 units in the AFF networks contributed to behaviour output, this difference in readout may affect how these networks learned to produce reversed output. We therefore also tested RNNs in which we fixed the behaviour readout to only 2 units like the AFF network (sparse \({W}_{{\rm{out}}}^{R}\) and \({W}_{{\rm{out}}}^{L}\)), but all results remained unchanged.

Amplifying feedforward network

ALM circuitry contains an amplifying feedforward (AFF) circuit motif54. The AFF network is a recurrent circuit in which preparatory activity during the delay epoch flows through a sequence of activity states. Each activity state can be modelled as a layer within a feedforward network. In addition, the late layers in the network are connected to early layers through feedback connections. Here we develop a framework for training AFF networks to generate choice-selective persistent activity.

Before detailing the learning rules used for training AFF networks, we first introduce several features that make AFF networks advantageous for training. Training a neural network requires pathways linking input units to output units for computation, and pathways linking outputs to inputs for learning. In the simplest cases, output-to-input feedback may interfere with the input-to-output computation. AFF networks, and non-normal networks in general, do not generate reverberating feedback. For this reason, it is possible to construct AFF networks that bidirectionally link inputs to outputs through separate channels that do not interfere with each other.

AFF (also commonly referred to as non-normal) networks are constructed by applying orthonormal transformations to purely feedforward networks. Orthonormal transformations to feedforward networks serve two useful anatomical purposes: (1) they form feedback connections from late layers to early layers; and (2) they form stabilizing excitatory/inhibitory connections to eliminate any reverberation that may result from the newly formed feedback connections. In this model, we use the feedback connections from late layers to early layers to convey performance feedback signals allowing the AFF network to learn via error backpropagation.

We first constructed a purely feedforward network with 4 layers, referred to as input (n; 30 units), hidden layer 1 (h1; 200 units), hidden layer 2 (h2; 5 units) and output (o; 2 units) (Extended Data Fig. 11). Trial-type (TT)-dependent external inputs, \({I}_{i}^{{\rm{TT}}}(t)\), were provided only to the input layer. Feedforward connection matrices (\({W}_{i,j}^{n,{\rm{h1}}},{W}_{i,j}^{{\rm{h1,h2}}}\) and \({W}_{i,j}^{{\rm{h2}},o}\)) conveyed these inputs to downstream layers and were initialized from a uniform positive distribution. Next, we added feedback connections from o to h2 (\({W}_{j,i}^{o,{\rm{h2}}}\)) and from h2 to h1 (\({W}_{j,i}^{{\rm{h2,h1}}}\)) to provide performance feedback for training the feedforward connections. Feedback connections were matched to feedforward connections so that \({W}_{j,i}^{o,{\rm{h2}}}={W}_{i,j}^{{\rm{h2}},o}\). These feedback connections provide the scaffolding to precisely implement error backpropagation for training the feedforward connections. However, on their own, these feedback connections would introduce reverberation that interferes with the network’s feedforward computations.

To cancel out the reverberations caused by this feedback, we incorporated additional stabilization hidden layers s1 (200 units) and s2 (5 units) (Extended Data Fig. 11). Each hidden unit in layer h1 is matched with a stabilizing neuron in the stabilization layer s1, which receives the same feedback connections as its paired excitatory neuron and projects inhibitory connections of the same strength as its excitatory partner. Similarly, each neuron in h2 has a corresponding unit in s2. Mathematically this relationship is written as

$${W}_{i,j}^{({\rm{s1,h2}})}=-{W}_{i,j}^{({\rm{h1,h2}})}\,{\rm{and}}\,{W}_{i,j}^{({\rm{s2}},o)}=-{W}_{i,j}^{({\rm{h2}},o)}$$

and

$${W}_{j,i}^{({\rm{h2,s1}})}={W}_{j,i}^{({\rm{h2,h1}})}\,{\rm{and}}\,{W}_{j,i}^{(o,{\rm{s2}})}={W}_{j,i}^{(o,{\rm{h2}})}$$

Because of the precisely balanced excitation and inhibition, this recurrent network is non-normal; all eigenvalues are equal to zero. This non-normal network has two independent pathways: one linking the input layer to the output layer, useful for computation, and the other linking the output layer to the input layer, useful for learning.
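A minimal sketch of this construction, with layer sizes from the text; the uniform-positive initialization range is an assumption. The final check confirms that the full recurrent matrix is nilpotent (all eigenvalues zero), so the feedback does not reverberate:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h1, n_h2, n_out = 30, 200, 5, 2

W1 = rng.uniform(0, 0.1, size=(n_h1, n_in))     # n  -> h1 feedforward
W2 = rng.uniform(0, 0.1, size=(n_h2, n_h1))     # h1 -> h2 feedforward
W3 = rng.uniform(0, 0.1, size=(n_out, n_h2))    # h2 -> o  feedforward

# Unit ordering in the full recurrent matrix: [n, h1, s1, h2, s2, o]
sizes = [n_in, n_h1, n_h1, n_h2, n_h2, n_out]
idx = np.cumsum([0] + sizes)
W = np.zeros((idx[-1], idx[-1]))

def block(post, pre, w):
    W[idx[post]:idx[post + 1], idx[pre]:idx[pre + 1]] = w

block(1, 0, W1)        # n  -> h1 (feedforward)
block(3, 1, W2)        # h1 -> h2 (feedforward)
block(5, 3, W3)        # h2 -> o  (feedforward)
block(1, 3, W2.T)      # h2 -> h1 (feedback matched to feedforward)
block(3, 5, W3.T)      # o  -> h2 (feedback)
block(2, 3, W2.T)      # h2 -> s1 (same feedback as h1)
block(4, 5, W3.T)      # o  -> s2 (same feedback as h2)
block(3, 2, -W2)       # s1 -> h2 (stabilizing inhibition)
block(5, 4, -W3)       # s2 -> o  (stabilizing inhibition)

print(np.allclose(np.linalg.matrix_power(W, 6), 0))   # True: nilpotent, no reverberation
```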

The network is trained using error backpropagation: an error signal is computed and then sent back into each unit in the output layer. This error signal is conveyed to the early layers by the feedback connections, and the stabilizing network ensures that it does not reverberate. The backpropagated signals in neuron \(i\) of the hidden layers h1 and h2 are thus given by the equations

$$\tau \frac{{\rm{d}}{B}_{i}^{{\rm{h2}}}(t)}{{\rm{d}}t}=-{B}_{i}^{{\rm{h2}}}(t)+\sum _{j}{e}_{j}{(t)W}_{j,i}^{o,{\rm{h2}}}$$
$$\tau \frac{{\rm{d}}{B}_{i}^{{\rm{h1}}}(t)}{{\rm{d}}t}=-{B}_{i}^{{\rm{h1}}}(t)+\sum _{j}{B}_{j}^{{\rm{h2}}}(t){W}_{j,i}^{{\rm{h2,h1}}}$$

As in error backpropagation, feedforward weights (that is, \({W}_{i,j}^{{\rm{h1,h2}}}\)) are updated by taking the product of the forward pass activity and the backward pass activity. For example, connections from neuron i in layer h1 onto neuron j in layer h2 are updated according to the rule

$$\Delta {W}_{i,j}^{({\rm{h1,h2}})}={\sum }_{t}{r}_{i}^{{\rm{h1}}}(t){B}_{j}^{{\rm{h2}}}(t)\,{\rm{and}}\,\Delta {W}_{i,j}^{({\rm{h2}},o)}={\sum }_{t}{r}_{i}^{{\rm{h2}}}(t){B}_{j}^{o}(t)$$

This rule is applied to all feedforward connections (that is, \(n\to {\rm{h1}}\), \({\rm{h1}}\to {\rm{h2}}\) and \({\rm{h2}}\to o\)). Changing the feedforward weights will necessarily disrupt the precise balance in the network. To maintain stability, the stabilizing weights must be updated to precisely cancel the changes to the feedforward weights

$$\Delta {W}_{i,j}^{({\rm{s1,h2}})}=-\Delta {W}_{i,j}^{({\rm{h1,h2}})}$$

Compensatory weight changes based on this equation are applied to all connections in the stabilization layers (that is, \({\rm{s1}}\to {\rm{h2}}\) and \({\rm{s2}}\to o\)).
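A minimal sketch of one such update, assuming the forward-pass rates and backpropagated signals have been collected over a trial; the learning rate and names are illustrative:

```python
import numpy as np

def update(W_ff, W_stab, r_pre, B_post, lr=1e-3):
    """W_ff: feedforward weights (post x pre); W_stab: corresponding stabilizing weights.
    r_pre: (T, n_pre) forward-pass rates; B_post: (T, n_post) backpropagated signals."""
    dW = lr * (B_post.T @ r_pre)     # sum over time of r_i(t) * B_j(t)
    return W_ff + dW, W_stab - dW    # compensatory change preserves the precise balance
```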

The AFF network was trained to form the same associations as the RNN. Unlike the RNN, the AFF utilized a linear neuronal activation (\(f(x)=x\)) so that dynamics are governed by the equation

$$\tau \frac{{\rm{d}}{r}_{i}(t)}{{\rm{d}}t}=-{r}_{i}(t)+\sum _{j}{W}_{i,j}{r}_{j}(t)+{I}_{i}^{{\rm{TT}}}(t)$$

Additionally, because the AFF network naturally generates ramping signals54, the output units were not trained to match a ramping signal at all time points, but rather to reach a specific activation level at the end of the delay. For example, the target for the lick right output unit (TR) was TR(t = 0) = 6 on posterior trials and TR(t = 0) = 0 on anterior trials.

Analysis of neural dynamics within RNN and AFF networks

For each network, we calculated the selectivity of each unit as the activity difference between lick right and lick left trials in each task context. We calculated eigenvectors of the network selectivity matrix using singular value decomposition (SVD). The data for the SVD were an n × t matrix containing the selectivity of n units over t time bins (selectivity from task contexts 1 and 2 concatenated). Three vectors usually captured most of the network activity variance across both task contexts (Extended Data Fig. 11f). We then rotated the 3 eigenvectors so that the first vector was aligned with the dimension that maximized the difference in the network selectivity matrix between task contexts 1 and 2. Network activity projected onto the first vector was correlated with the network input across task contexts and is thus referred to as the stimulus mode (Extended Data Fig. 12a,b). Network activity projected onto the second vector was correlated with the network output across task contexts and exhibited ramping activity during the delay epoch; it is thus referred to as the output mode (Extended Data Fig. 12a,b).
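A minimal sketch of this mode extraction, assuming unit-by-time selectivity matrices for the two task contexts; the subsequent rotation that aligns the first vector with the between-context selectivity difference is omitted, and the names are illustrative:

```python
import numpy as np

def selectivity_modes(sel_ctx1, sel_ctx2, n_modes=3):
    """sel_ctx1/2: (n_units, n_timebins) selectivity (lick right minus lick left) in each
    task context. Returns the leading spatial modes and their variance explained."""
    S = np.concatenate([sel_ctx1, sel_ctx2], axis=1)     # n_units x (2 * n_timebins)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    var_explained = s[:n_modes] ** 2 / np.sum(s ** 2)
    return U[:, :n_modes], var_explained
```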

To examine the CDDelay reorganization across task contexts as a function of stimulus mode strength (Extended Data Fig. 12c), we summed the network activity projected on the stimulus mode across time. This activity strength was normalized to the mean activity of each network to enable comparisons across different networks.

Statistics and reproducibility

The sample sizes were similar to those used in the field: for behaviour and two-photon calcium imaging, three or more mice per condition. No statistical methods were used to determine sample size. All key results were replicated in multiple mice. Mice were allocated into experimental groups according to their strain or by experimenter. Unless stated otherwise, the investigators were not blinded to mouse group allocation during experiments and outcome assessment. Trial types were randomly determined by a computer program. Statistical comparisons using t-tests and other statistical tests are described above. All statistics are two-sided unless otherwise noted. We used Pearson’s correlation for the linear regression. Error bars indicate mean ± s.e.m. unless noted otherwise. Representative images in Fig. 2c and Extended Data Fig. 2a,c,d were reproduced across all FOVs (n = 78 fields of view, 14 mice).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.