Main

External context can change the processing of stimuli through recurrent neural dynamics1. In this process, the evolution of neural population activity depends on its own history as well as external inputs2, giving context-specific meaning to otherwise ambiguous stimuli3. To study how hypotheses can be held in memory and serve as internal signals to compute new information, we developed a task that requires sequential integration of spatially separated ambiguous landmarks4. In this task, the information needed to disambiguate the stimuli is not provided externally but must be computed, maintained over time and applied to the stimuli by the brain.

Results

We trained freely moving mice to distinguish between two perceptually identical landmarks, formed by identical dots on a computer-display arena floor, by sequentially visiting them and reasoning about their relative locations. The landmarks were separated by <180 degrees in an otherwise featureless circular arena (50-cm diameter), to create a clockwise (CW) (‘a’) and a counterclockwise (CCW) (‘b’) landmark. Across trials, the relative angle between landmarks was fixed and the same relative port was always the rewarded one; within trials, the locations of the landmarks were fixed. The mouse’s task was to find and nose-poke at the CCW ‘b’ landmark for water reward (‘b’ was near one of 16 identical reward ports spaced uniformly around the arena; other ports caused a time out). At most one landmark was visible at a time (enforced by tracking mouse position and modulating landmark visibility based on relative distance; Extended Data Fig. 1 and Methods). Each trial began with the mouse in the center of the arena in the dark (‘LM0’ phase; Fig. 1b), without knowledge of its initial pose. In the interval after first encountering a landmark (‘LM1’ phase), an ideal agent’s location uncertainty is reduced to two possibilities, but there is no way to disambiguate whether it saw ‘a’ or ‘b.’ After seeing the second landmark, an ideal agent could infer landmark identity (‘a’ or ‘b’; this is the ‘LM2’ phase; Fig. 1b) by estimating the distance and direction traveled since the first landmark and comparing those with the learned relative layout of the two landmarks; thus, an ideal agent can use sequential spatial reasoning to localize itself unambiguously. For most analyses, we ignored cases where mice might have gained information from not encountering a landmark, for example, as the artificial neural network (ANN) does in Fig. 2e (and Extended Data Fig. 2e). To randomize the absolute angle of the arena at the start of each new trial (and thus avoid use of any olfactory or other allocentric cues), mice had to complete a separate instructed visually guided dot-hunting task, after which the landmarks and rewarded port were rotated randomly together (Extended Data Fig. 1b).

Fig. 1: RSC represents spatial information conjunctively with hypothesis states during navigation with locally ambiguous landmarks.

a, Two perceptually identical landmarks are visible only from close up, and their identity is defined only by their relative location. One of 16 ports, at landmark ‘b,’ delivers reward in response to a nose-poke. The animal must infer which of the two landmarks is ‘b’ to receive reward; wrong pokes result in timeout. Tetrode array recordings in RSC yield 50–90 simultaneous neurons. b, Top, schematic example trial; bottom, best possible guesses of the mouse position. LM0, LM1 and LM2 denote task phases when the mouse has seen zero, one or two landmarks and could infer its position with decreasing uncertainty. c, Left, example training curve showing Phit/Pfalse-positive; random chance level is 1/16 for 16 ports. Mice learned the task at values >1, showing they could disambiguate between the two sequentially visible landmarks. This requires the formation, maintenance and use of spatial hypotheses. Asterisks denote per-session binomial 95% significance for the correct rate. Right, summary statistics show binomial CIs on the last half of sessions for all four mice. d, Mouse location heatmap from one session (red) with corresponding spatial firing rate profiles for five example cells; color maps are normalized per cell. e, Task phase (corresponding to hypothesis states in b) can be decoded from RSC firing rates. Horizontal line, mean; gray shaded box, 95% CI. f, Spatial coding changes between LM1 and LM2 phases (Euclidean distances between spatial firing rate maps, control within versus across condition; see Extended Data Fig. 2a for a test by decoding; median and CIs (bootstrap)). g, Spatial versus task phase information content of all neurons and position and state encoding for example cells. Gray, sum-normalized histograms (color scale as in d).

Fig. 2: Recurrent neural dynamics can be used to navigate through locally ambiguous landmarks by forming and employing multimodal hypotheses.

a, Schematic examples of hypothesis-dependent landmark interpretation. Left, mouse encounters first LM, then identifies the second as ‘a’ based on the short relative distance. Right, a different path during LM1 leads the mouse to a different hypothesis state, and to identify the perceptually identical second landmark as ‘b.’ Hypothesis states preceding LM2 are denoted LM1a and LM1b, depending on the identity of the second landmark. b, Structure of an ANN trained on the task. Inputs encode velocity and landmarks. Right, mean absolute localization error averaged across test trials for random trajectories. c, Activity of output neurons ordered by preferred location shows transition between LM0, LM1 and LM2 phases. Red, true location. During LM1 (when the agent has only seen one landmark), two hypotheses are maintained, with convergence to a stable unimodal location estimate in LM2 after encountering the second landmark. d, 3D projection from PCA of ANN hidden neuron activities. During LM2, angular position in neural state space reflects position estimate encoding. e, Example ANN trajectories for two trials show how identical visual input (black arrowheads) leads the activity to travel to different locations on the LM2 attractor because of different preceding LM1a/b states.

Mice learned the task (P < 0.0001 for all mice, binomial test versus random guessing; Fig. 1c), showing that they learned to form hypotheses about their position during the LM1 phase, to retain and update these hypotheses with self-motion information until they encountered the second (perceptually identical) landmark, and to use them to disambiguate location and determine the rewarded port. We hypothesized that RSC, which integrates self-motion5, position6,7,8, reward value9 and sensory10 inputs, could perform this computation. RSC is causally required to process landmark information11, and we verified that RSC is required for integrating spatial hypotheses with visual information but not for direct visual search with no memory component (Extended Data Fig. 1i–l).

Spatial hypotheses are encoded conjunctively with other navigation variables in RSC

We recorded 50–90 simultaneous neurons in layer 5 of RSC in four mice during navigational task performance using tetrode array drives12 and behavioral tracking (Fig. 1a and Extended Data Figs. 1 and 3; Methods). RSC neurons encoded information about both the mouse’s location (Fig. 1d) and about the task phase, corresponding to possible location hypotheses (Fig. 1d,e). This hypothesis encoding was not restricted to a separate population: most cells encoded both hypothesis state as well as the animal’s location (Fig. 1g).

This encoding was distinct from the encoding of landmark encounters in the interleaved dot-hunting task and was correlated per session with behavioral performance (Extended Data Fig. 4). The encoding of mouse location changed significantly across task phases (Fig. 1d,f), similar to the conjunctive coding for other spatial and task variables in RSC6. This mixed co-encoding of hypothesis, location and other variables suggests that RSC can transform new ambiguous sensory information into unambiguous spatial information through the maintenance and task-specific use of internally generated spatial hypotheses.

Hypothesis-dependent spatial computation using recurrent dynamics

To test whether recurrent neural networks can solve sequential spatial reasoning tasks that require hypothesis formation, and to provide insight into how this might be achieved in the brain, we trained a recurrent ANN on a simplified one-dimensional (1D) version of the task, since the relevant position variable for the landmarks was their angular position (inputs were random noisy velocity trajectories and landmark positions, but not their identity; Fig. 2b). The ANN performed as well as a near Bayes-optimal particle filter (Fig. 2b), outperforming path integration with correction (corresponding to continuous path integration13,14 with boundary/landmark resetting15,16). It represented multimodal hypotheses, transitioning from a no-information state (in LM0) to a bimodal two-hypothesis coding state (LM1) and finally to a full-information, one-hypothesis coding state (LM2) (Fig. 2c,d and Extended Data Fig. 5). Bimodal hypothesis states did not emerge when the ANN was given the landmark identity (Extended Data Fig. 5h–k). Together, this shows that recurrent neural dynamics are sufficient to internally generate, retain and apply hypotheses to reason across time based on ambiguous sensory and motor information, with no external disambiguating inputs.
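As a concrete illustration of this type of setup (not the network specification used in the paper), the sketch below assumes a GRU-based recurrent network that receives per-timestep velocity and identity-free landmark-visibility inputs and reports position through a population of location-tuned output units; layer sizes, the loss and the placeholder tensors are illustrative only.

```python
import torch
import torch.nn as nn

class LandmarkRNN(nn.Module):
    """Hypothetical stand-in for the recurrent ANN: velocity + landmark
    visibility in, population of location-tuned output units out."""
    def __init__(self, n_hidden=128, n_bins=64):
        super().__init__()
        self.core = nn.GRU(input_size=2, hidden_size=n_hidden, batch_first=True)
        self.readout = nn.Linear(n_hidden, n_bins)   # location-tuned output units

    def forward(self, velocity, landmark):
        x = torch.stack([velocity, landmark], dim=-1)   # (batch, time, 2)
        h, _ = self.core(x)                             # recurrent hidden states
        return self.readout(h)                          # logits over angular position bins

model = LandmarkRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder tensors to illustrate shapes only; in actual training, targets
# would be the true position bin obtained by integrating the noisy velocity.
vel = 0.1 * torch.randn(32, 200)                  # angular velocity per timestep
lm = (torch.rand(32, 200) < 0.02).float()         # identity-free landmark pulses
target = torch.randint(0, 64, (32, 200))          # placeholder position bins
logits = model(vel, lm)
opt.zero_grad()
loss = loss_fn(logits.reshape(-1, 64), target.reshape(-1))
loss.backward()
opt.step()
```

In such a setup, the bimodal LM1 code corresponds to two bumps of activity across the location-tuned output units, as described for Fig. 2c.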

Both ANN and RSC neurons encoded several navigation variables conjunctively (Extended Data Fig. 2b) and transitioned from encoding egocentric landmark-relative position during LM1 to a more allocentric encoding during LM2 (Extended Data Fig. 6). Instantaneous position uncertainty (variance derived from particle filter) could be decoded from ANN activity (Extended Data Fig. 5l), analogous to RSC (Fig. 1e). ANN neurons preferentially represented landmark locations (Extended Data Fig. 2c; consistent with overrepresentation of reward sites in hippocampus17,18), but we did not observe this effect in RSC. Average spatial tuning curves of ANN neurons were shallower in the LM1 state relative to LM2, corresponding to trial-by-trial ‘disagreements’ between neurons, evident as bimodal rates per location. RSC rates similarly became less variable across trials per location in LM2 (Extended Data Fig. 7), indicating that, in addition to the explicit encoding of hypotheses/uncertainty (Fig. 1e,g), there is a higher degree of trial-to-trial variability in RSC as a function of spatial uncertainty.

The ANN computed, retained and used multimodal hypotheses to interpret otherwise ambiguous inputs: after encountering the first landmark, the travel direction and distance to the second are sufficient to identify it as ‘a’ or ‘b’ (Figs. 1b and 2a). There are four possible scenarios for the sequence of landmark encounters: ‘a’ then ‘b’ or ‘b’ then ‘a’, each for CW or CCW travel. To understand the mechanism by which hypothesis encoding enabled disambiguation, we examined the moment when the second landmark becomes visible and can be identified (Fig. 2a). We designate LM1 states in which the following second landmark is ‘a’ as ‘LM1a’ and those that lead to ‘b’ as ‘LM1b.’ Despite trial-to-trial variance resulting from random exploration trajectories and initial poses, ANN hidden unit activity fell on a low-dimensional manifold (correlation dimension d ≈ 3; Fig. 3d) and could be well captured in a three-dimensional (3D) embedding using principal component analysis (PCA) (Fig. 2d). Activity states during the LM0,1,2 phases (green, blue and gray/red, respectively) were distinct, and transitions between phases (mediated by identical landmark encounters; black arrows) clustered into discrete locations. Examining representative trajectories (for the CCW case; Fig. 2e) reveals that LM1a and LM1b states are well-separated in activity space. If the second landmark appears at the shorter CCW displacement (corresponding to the ‘a’ to ‘b’ interval), the state jumps to the ‘b’ coding point on the LM2 attractor (Fig. 2e). On the other hand, the absence of a landmark at the shorter displacement causes the activity to traverse LM1a, until the second landmark causes a jump onto the ‘a’ coding location on the LM2 attractor. In both cases, an identical transient landmark input pushes the activity from distinct hypothesis-encoding regions of activity space onto different appropriate locations in the LM2 state, constituting successful localization.

Fig. 3: Stable low-dimensional dynamics for hypothesis-based stimulus disambiguation.

a, Correlation structure in ANN activity is maintained across task phases, indicating maintained low-dimensional neural dynamics across different computational regimes. Top, pairwise ANN tuning correlations in LM1 and LM2 (same ordering, by preferred location). Bottom, tuning curve pairs (normalized amplitude). b, Same analysis as a, but for RSC in one session (N = 64 neurons, computed on entire spike trains, sorted via clustering in LM1). The reorganization of spatial coding as hypotheses are updated (Fig. 1d,f) is constrained by the stable pairwise structure of RSC activity. Neurons remain correlated (first and second pair) or anticorrelated (third and fourth pair) across LM1 and LM2. c, Summary statistics (session median and quartiles) for maintenance of correlations across task phases. This also extends to a separate visually guided dot-hunting task (Extended Data Fig. 8). d, Activity in both the ANN and RSC is locally low-dimensional, as measured by the correlation dimension (the number of points in a ball of some radius grows with radius to the power of N if the data are locally N-dimensional), computed on 20 principal components. See Extended Data Fig. 8 for analysis by PCA.

We next consider the nature of the dynamics and representation that allows the circuit to encode the same angular position variables across LM1 and LM2 regimes while also encoding the different hypotheses required to disambiguate identical landmarks. Does the latter drive the network to functionally reorganize throughout the computation? Or, does the former, together with the need to maintain and use the internal hypotheses across time, require the network to exhibit stable low-dimensional recurrent attractor dynamics? To test this, we computed the pairwise correlations of the ANN activity states (Fig. 3a) and found them to be well conserved across LM1 and LM2 states. As these correlation matrices are the basis for projections into low-dimensional space, this shows that the same low-dimensional dynamics were maintained, despite spanning different computational and hypothesis-encoding regimes (metastable two-state encoding with path integration in LM1 versus stable single-state path integration unchanged by further landmark inputs in LM2; Extended Data Fig. 5). Low-dimensional pairwise structure was also conserved across different landmark configurations and varied ANN architectures, and the low-dimensionality of ANN states was robust to large perturbations (Extended Data Fig. 5w). In sum, these computations were determined by one stable set of underlying recurrent network dynamics, which, together with appropriate self-motion and landmark inputs, can maintain and update hypotheses to disambiguate identical landmarks over time, with no need for external inputs.

RSC fulfills requirements for hypothesis-dependent spatial computation using recurrent dynamics

We hypothesized that RSC and its reciprocally connected brain regions may, similarly to the ANN, use internal hypotheses to resolve landmark ambiguities using recurrent dynamics. Using the ANN as a template for a minimal dynamical system that can solve the task (Fig. 2), we asked whether neural activity in RSC is consistent with a system that could solve the task with the same mechanisms. To be described as a dynamical system, neural activity must first be sufficiently constrained by a stable set of dynamics, that is, the activity of neurons must be sufficiently influenced by that of other neurons, and these relationships must be maintained over time1. To test this property, we first computed pairwise rate correlations and found a preserved structure between LM1 and LM2, as in the ANN (median R (across sessions) of Rs (across cells) = 0.74 in RSC, versus 0.73 in ANN; Fig. 3c). Firing rates could be predicted from rates of other neurons, using pairwise rate relationships across task phases; this maintained structure also extended to the visual dot-hunting behavior (Extended Data Fig. 8). Because pairwise correlations form the basis of dimensionality reduction, this shows that low-dimensional RSC activity is coordinated by the constraints of stable recurrent neural dynamics and not a feature of a specific behavioral task or behavior.

To employ neural firing rates as states of a dynamical system that act as memory and computational substrates in the same manner as in the ANN, they should also be low-dimensional. Consistent with the stable relationships between neurons, most RSC population activity was low-dimensional (around six significant principal components and a correlation dimension of around 5.4; Fig. 3d and Extended Data Fig. 8), similar to findings in hippocampus19. Together, these results show that, despite significant changes in neural encoding as different hypotheses are entertained across task phases (Fig. 1d–f and Extended Data Figs. 3f and 2a) and across different tasks (Extended Data Fig. 4a–d), the evolution of firing rates in RSC is constrained by stable dynamics that could implement qualitatively similar states as the ANN.

To compute with a dynamical system, states that act as memory need to affect how the system reacts to further input. The ANN solves the task using distinct hypothesis states that are updated with visual inputs and locomotion, by placing them in the state space so that visual input arriving at different hypothesis states within LM1 (LM1a versus LM1b) pushes activity onto the correct states in LM2 (Fig. 2). We examined this process in RSC by first looking at the evolution of neural states during the spatial reasoning process. States evolved at speeds correlated with animal locomotion, consistent with hypotheses being updated by self-motion between landmark encounters, and were driven by landmark encounters, consistent with findings in head-fixed tasks11 (Extended Data Fig. 9a). Neural states were also driven by failures to encounter landmarks at expected positions, which can also be informative (Fig. 2e, right), albeit with a different neural encoding than we observed for encountering the landmarks (Extended Data Fig. 2e).

We next tested whether sufficiently separated neural states, LM1a and LM1b, together with stable low-dimensional attractor dynamics could resolve the identity of the second landmark. If so, this would suggest that, as in the ANN, the ensemble activity state in RSC can serve both as memory and affect future computations. We identified subsets of trials in which mouse motion around the LM1 to LM2 transition was matched closely and aligned them in time to the point when the second landmark became visible (Fig. 4a). In these trials, locomotion and visual inputs are matched, and only the preceding hypothesis state (LM1a or b) differs. RSC firing rates differed between LM1a and LM1b states, as did subsequent rates in LM2 (comparing within- to across-group distances in neural state space across matched trials, and by decoding state from firing rates: Fig. 4b and Extended Data Fig. 9i,j).

Fig. 4: RSC exhibits stable attractor dynamics sufficient for computing hypothesis-dependent landmark identity.

a, Top, to study hypothesis encoding and its impact without sensory or motor confounds, we used trials with matched egocentric paths just before and after the second landmark (‘a’ or ‘b’) encounter. One example session is shown. Bottom, 3D neural state space trajectories (isomap); RSC latent states do not correspond directly to those of the ANN. b, RSC encodes the difference between LM1a and LM1b, and between subsequent LM2 states, as in the ANN (Fig. 2e and Extended Data Fig. 5). Blue, within-group and gray, across-group distances in neural state space. Horizontal lines, mean; boxes, 95% CIs (bootstrap). State can also be decoded from raw spike rates (Extended Data Fig. 9j). c, Neural dynamics in RSC are smooth across trials: pairwise distances between per-trial spike counts in a 750-ms window before LM2 onset remain correlated with later windows; line, median; shading, CIs (bootstrap). d, RSC activity preceding the second landmark encounter predicts correct/incorrect port choice (horizontal line, mean; gray shaded box, 95% CI from bootstrap, cross-validated regression trees). e, Decoding of hypothesis states and position from RSC using ANNs to illustrate the evolution of neural activity in the task-relevant space (for statistics, see b–d, Fig. 1e,f and Extended Data Fig. 9). f, Schematic of potential computational mechanisms. Left, if RSC encodes only current spatial and sensorimotor states and no hypotheses beyond landmark count (LM1a or LM1b, derived from seeing the first landmark and self-motion integration that lead to identifying the second landmark as ‘a’ or ‘b’), an external disambiguating input is needed. Right, because task-specific hypotheses arising from the learned relative position of the landmarks are encoded (this figure), and activity follows stable attractor dynamics (Fig. 3), ambiguous visual inputs can drive the neural activity to different positions, disambiguating landmark identity in RSC analogously to the ANN.

To compute with the same mechanism as the ANN, neural states must be governed by stable dynamics consistently enough for current states to reliably influence future states, which requires that nearby states do not diffuse or mix too quickly1. We found that RSC firing rates were predictable across trials such that neighboring trials in activity space remained neighbors (Fig. 4c), which further confirms stable recurrent dynamics, shows that these states can be used as a computational substrate and indicates a topological organization of abstract task variables19. This indicates that stably maintained hypothesis-encoding differences in firing over LM1 could interact with ambiguous visual landmark inputs to push neural activity from distinct starting points in neural state space to points that correspond to correct landmark interpretations, as in the ANN.

The ANN achieved high correct rates, but mice make mistakes. If the dynamical systems interpretation holds, such mistakes would be explainable by LM1a or b states that are not in the right location, and lead to the wrong LM2 interpretation. Indeed, we observed that neural trajectories from LM1a that were close in activity space to LM1b were dragged along LM1b trajectories and vice-versa (they had similar movement directions; Extended Data Fig. 9g,h), suggesting that behavioral landmark identification outcomes might be affected by how hypotheses were encoded in RSC during LM1. We tested this hypothesis and found that RSC activity in LM1 (last 5 s preceding the transition to LM2) was predictive of the animal’s behavioral choice of the correct versus incorrect port (Fig. 4d). Notably, this behaviorally predictive hypothesis encoding was absent during training in sessions with low task performance (Extended Data Fig. 4), indicating that the dynamical structures and hypothesis states observed in RSC were task-specific and acquired during learning.

Our unrestrained nonstereotyped behavior is not amenable to direct comparison of activity trajectories between ANNs and the brain as others have done in highly stereotyped trials of macaque behavior1. Instead, we found that the dynamics of firing rates in mouse RSC are consistent with, and sufficient for, implementing hypothesis-based disambiguation of identical landmarks using a similar computational mechanism as observed in the ANN.

Discussion

We report that RSC represents internal spatial hypotheses, sensory inputs and their interpretation and fulfills the requirements for computing and using hypotheses to disambiguate landmark identity using stable recurrent dynamics. Specifically, we found that low-dimensional recurrent dynamics were sufficient to perform spatial reasoning (that is, to form, maintain and use hypotheses to disambiguate landmarks over time) in an ANN (Fig. 2; see also Extended Data Fig. 10 for non-negative ANNs and for networks given no map input). We then found that RSC fulfills the requirements for such dynamics, that is, encoding of the required variables (Figs. 1 and 4) with stable low-dimensional (Fig. 3) and smooth dynamics that predicted behavioral outcomes (Fig. 4). Due to the higher trial-to-trial variability and lower number of recorded cells, we do not draw direct connections between specific latent states of the ANN and neural data, as was done in previous studies in primates2,3,20 or simpler mouse tasks19,21.

We observed that local dynamics in RSC can disambiguate sensory inputs based on internally generated and maintained hypotheses without relying on external context inputs at the time of disambiguation (Fig. 4), indicating that RSC can derive hypotheses over time and combine these hypotheses with accumulating evidence from the integration of self-motion (for example, paths after the first landmark encounter) and sensory stimuli to solve a spatiotemporally extended spatial reasoning task. These results do not argue for RSC as an exclusive locus of such computations. There is evidence for parallel computations, likely at different levels of abstraction, across subcortical22 and cortical regions such as PFC3,23,24, PPC25, LIP26 and visual27,28 areas. Further, hippocampal circuits contribute to spatial computations beyond representing space by learning environmental topology29 and constraining spatial coding using attractor dynamics19,30,31 shaped by previous experience32. Finally, the landmark disambiguation that we observed probably interacts with lower sensory areas33, reward value9,34 and action selection computations21,35.

The emergence of conjunctive encoding, explicit hypothesis codes and similar roles for dynamics across RSC and the ANN suggests that spatial computations and, by extension, cognitive processing in neocortex may be constrained by simple cost functions36, similar to sensory37 or motor38 computations. The ANN does not employ sampling-based representations, which have been proposed as possible mechanisms for probabilistic computation39,40, showing that explicit representation of hypotheses and uncertainty as separate regions in rate space could serve as an alternative or supplementary mechanism to sampling.

A key open question is how learning a specific environment, task or behavioral context occurs. We observed that hypothesis coding emerges with task learning (Extended Data Fig. 4). Possible, and not mutually exclusive, mechanisms include: (1) changes of the stable recurrent dynamics in RSC, as is suggested in hippocampal CA1 (ref. 29); (2) modification of dynamics by context-specific tonic inputs3,20; or (3) changes in how hypotheses and sensory information are encoded and read out while maintaining attractor dynamics that generalize across environments or tasks, as indicated by the maintenance of recurrent structure across tasks in our data (Extended Data Fig. 8) and as has been shown in entorhinal30 and motor cortex38 and ANNs41,42, possibly helped by the high-dimensional mixed nature of RSC representations43,44. Further, how such processes are driven by factors such as reward expectation34 is an active area of research.

Our findings show that recurrent dynamics in neocortex can simultaneously represent and compute with task and environment-specific multimodal hypotheses in a way that gives appropriate meaning to ambiguous data, possibly serving as a general mechanism for cognitive processes.

Methods

Mouse navigation behavior and RSC recordings

Drive implants

Lightweight drive implants with 16 movable tetrodes were built as described previously12. The tetrodes were arranged in an elongated array of approximately 1,250 × 750 µm, with an average distance between electrodes of 250 µm. Tetrodes were constructed from 12.7-µm nichrome wire (Sandvik–Kanthal, QH PAC polyimide coated) with an automated tetrode twisting machine45 and gold-electroplated to an impedance of approximately 300 kΩ.

Surgery

Mice (male, C57BL/6 RRID: IMSR_JAX:000664) were aged 8–15 weeks at the time of surgery. Animals were housed in pairs or triples when possible and maintained on a 12-h cycle, at 65–70 °F with ~60% humidity. All experiments were conducted in accordance with the National Institutes of Health guidelines and with the approval of the Committee on Animal Care at the Massachusetts Institute of Technology (MIT). All surgeries were performed under aseptic conditions under stereotaxic guidance. Mice were anesthetized with isoflurane (2% induction, 0.75–1.25% maintenance in 1 l min−1 oxygen) and secured in a stereotaxic apparatus. A heating pad was used to maintain body temperature; additional heating was provided until the animal had fully recovered. The scalp was shaved, wiped with hair-removal cream and cleaned with iodine solution and alcohol. After intraperitoneal (IP) injection of dexamethasone (4 mg kg−1) and carprofen (5 mg kg−1), subcutaneous injection of slow-release buprenorphine (0.5 mg kg−1) and local application of lidocaine, the skull was exposed. The skull was cleaned with ethanol, and a thin base of adhesive cement (C&B Metabond and Ivoclar Vivadent Tetric EvoFlow) was applied. A stainless steel screw was implanted superficially anterior to bregma to serve as electrical ground.

A 3-mm craniotomy was drilled over central midline cortex, a durotomy was performed on one side of the central sinus and tetrode drives12 were implanted above RSC, at around anterior–posterior (AP) −1.25 to −2.5 mm and medio–lateral (ML) 0.5 mm, with the long axis of the tetrode array oriented AP and the tetrode array tilted inwards at an angle of ~15–20° and fixed with dental cement. The ground connection on the drive was connected to the ground screw, and the skin around the drive implant was brought over the base layer of adhesive as much as possible to minimize the resulting open wound, sutured and secured with surgical adhesive.

At the time of implant surgery, only two of the tetrodes were extended from the drive to serve as guides during the procedure. All other tetrodes were lowered into superficial layers of cortex within 3 days postsurgery. Mice were given 1 week to recover before the start of recordings.

Chronic electrophysiology

After implant surgery, individual tetrodes were lowered over the course of several days until a depth corresponding to layer 5 was reached and spiking activity was evident. Data were acquired with an Open Ephys46 ONIX47 prototype system at 30 kHz using the Bonsai software48 (v.2.2; https://bonsai-rx.org/). The tether connecting the mouse headstage to the acquisition system was routed through a commutator above the arena and was counterbalanced using a segment of flexible rubber tread. Tetrodes were occasionally lowered by small increments of ~50 µm to restore good recording conditions or to ensure sampling of new cells across sessions.

Spike sorting

Voltage data from the 16 tetrodes, sampled at 30 kHz, were bandpass filtered at 300–6,000 Hz, and a median of the voltage across all channels that were well connected to tetrode contacts was subtracted from each channel to reduce common-mode noise such as licking artifacts.
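A minimal sketch of this preprocessing, assuming a third-order zero-phase Butterworth bandpass and a common-median reference restricted to well-connected channels (the filter order and zero-phase filtering are assumptions; the text specifies only the 300–6,000 Hz band and the median subtraction):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_tetrodes(raw, fs=30_000, band=(300.0, 6_000.0), good_channels=None):
    """raw: (n_channels, n_samples) voltage array sampled at `fs` Hz.
    Bandpass each channel, then subtract the across-channel median
    (computed on well-connected channels) to reduce common-mode noise."""
    b, a = butter(3, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, raw, axis=1)
    if good_channels is None:
        good_channels = np.arange(raw.shape[0])
    common = np.median(filtered[good_channels], axis=0)   # common-mode estimate
    return filtered - common
```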

Spike sorting was then performed per tetrode using the Mountainsort software49 (https://github.com/flatironinstitute/mountainsort_examples), and neurons were included for further analysis if they had a noise overlap score <0.05, an isolation score >0.75 (provided by Mountainsort49), a clear refractory period (to ensure spikes originated from single neurons), a spike waveform with one peak and a clear asymmetry (to exclude recordings from passing axon segments) and a smooth voltage waveform and ISI (inter-spike interval) histogram (to exclude occasional spike candidates driven by electrical noise). Units were not excluded based on firing rates, tuning or any higher order firing properties. The number of simultaneously recorded cells per mouse for the main analyses was as follows: Blackdot, 52, 53, 54, 49; Gothmog, 55, 59, 52, 51, 51, 85; Nodot, 65, 86, 72, 69; Unnamed1, 67, 64; total, 984. For the entire dataset analyzed in the analysis over learning (Extended Data Fig. 4), a larger number of cells, and of simultaneously recorded cells, was collected, and sessions with <50 cells were also included.

Histology

To verify the localization of the recording sites (Extended Data Fig. 3), electrolytic lesions were created by passing currents of 20 µA through a subset of tetrodes (roughly four tetrodes per animal) for 30 s each under isoflurane anesthesia, and animals were perfused and brains processed 1 h later. Brains were mounted with 4′,6-diamidino-2-phenylindole and imaged.

Behavioral experiment hardware

Behavior was carried out in a circular arena of 50-cm diameter. The floor of the arena was formed by a clear acrylic sheet, under which a diffusion screen and a flat-screen TV were positioned, on which visual stimuli were displayed. The circular arena wall was formed by 32 flat black acrylic segments, every other one of which contained an opening for a recessed reward port, 16 in total. Each reward port contained an optical beam break (880-nm infrared (IR), invisible to the mouse) that detected if a mouse was holding its nose in the port, a computer-controlled syringe pump for water reward delivery and a dedicated beeper as a secondary reward indicator. The behavior arena was housed in a soundproof and light-insulated box with no indicators that could allow the mice to establish their heading. Video was acquired by a central overhead camera at 30 Hz using a low level of infrared light at 850 nm, and the mouse position was tracked using the Oat software50 (https://github.com/jonnew/Oat). A custom behavioral control state machine written in Python was triggered every time a new camera frame was acquired, and the position of the animal, time passed and port visits were used to transition the logic of the state machine (Extended Data Fig. 1). For analysis purposes, all behavioral data were resampled to 100 Hz and synchronized to the electrophysiological data.

Inactivation of RSC and causal necessity for hypothesis-based computations

For pharmacological inactivation of RSC (Extended Data Fig. 1i–l), four mice were trained on a simplified parametric task that permitted us to causally test the role of RSC in individual recording and inactivation sessions. The task required integration of an allocentric position hypothesis with visual landmarks (Extended Data Fig. 1i,j). After mice learned the task—quantified as reaching a hit rate of above 30% in the simple conditions (high eccentricity; Extended Data Fig. 1j)—they were given access to unrestricted water and implanted following the procedure described for the main experiment but, instead of a chronic drive implant, a removable cap was implanted and two burr holes were prepared above RSC and covered with dental cement (Extended Data Fig. 1k). After recovery from surgery, mice were put back on water restriction over the course of 1 week and reintroduced to the task. Before each experiment, mice were anesthetized briefly with isoflurane, the cap was opened temporarily and the exposed skull was wiped with lidocaine, and an injection of either 50 nl of 1 μg ml−1 muscimol solution in cortex buffer per side or the same volume of cortex buffer was performed through the existing burr holes. Mice were left to recover from anesthesia for 15 min and tested on the task. Performance was assessed as the hit rate on the first port visit per trial, and confidence levels were computed using the Clopper–Pearson method for binomial confidence intervals (CIs) at the 95% level.

Behavioral training

After mice had undergone surgery, they were given at least 1 week to recover before water scheduling began. Initially, mice received 3 ml of water per day in the form of 3 g of HydroGel (ClearH2O), which was reduced gradually to 1.0–1.5 g per day. During this period, mice were handled by experimenters and habituated to the arena. Throughout the entire experiment mice were given water rewards for completion of the task and were given additional water to maintain their total water intake at 1.25–1.5 ml.

After initial acclimation to the recording arena over 2 days, mice were trained on the task. Throughout the task we used white circular cues of ~30-mm diameter on the arena floor (referred to as landmarks) on a black background. These landmarks were the only source of light in the experiment. Mice were run every day or every other day, for a single session of 30 min to 3 h per day. Training progressed in several phases:

1. Initially, mice were trained that circular visual cues on the floor of the arena indicated reward locations. One of the 16 ports was selected randomly as reward port and a cue was shown in front of this port. Visiting an incorrect port resulted in a time out (~1 s initially, increased later), during which the entire arena floor was switched to gray, leading to a widespread visual stimulus. Visiting the correct port resulted in an audible beep from the beeper located in the port, and around 0.005 ml of water was delivered by the syringe pump. After a reward, a new reward port was chosen randomly, and the landmark was rotated together with the port, effectively performing a rotation of the entire task, and the next trial began. This meant that mice learned to not rely on any cues other than the visual landmark to locate the correct port. Mice usually completed this phase by day 4.

2. We then introduced a new task phase, referred to in the text as the ‘dot-hunting’ task: after each reward, the landmark disappeared and instead a blinking dot was shown in a random location in the arena. If the mouse walked over that dot, it disappeared and either a new dot in a new random location appeared, repeating the process, or the next trial was initiated. The number of required dot chases was sampled uniformly from a range and was increased to six to eight by the time recordings began, and the last dot was always positioned at the arena center. This task phase served to obfuscate the rotation of the task. Data acquired during this task phase were used during spike sorting but were not part of the main dataset in which we analyzed hypothesis representation. We analyze this task phase separately in Fig. 3c and Extended Data Figs. 3h,i, 4 and 8. Mice learned this task phase, with six to eight dots, by day 7 on average.

3. Throughout phases 1 and 2, we progressively introduced a requirement for the mice to hold their snouts in the reward port for increasing durations to trigger a reward or time out. For each port visit, the required duration was drawn randomly from a uniform distribution, so on any given trial the mice did not know exactly when to expect the outcome of the port visit. Initially, this hold time was 500 ms, and the time range was slowly increased throughout training, depending on animal performance. By the time recordings began, a range of around 4–6 s was used. Mice were able to tolerate this holding time by day 20 on average.

4. Next, we introduced an identical second landmark at a nonrewarded port. Initially, the two landmarks were set two ports apart (for example, ports 1 and 3), and this distance was progressively increased to four or five ports. As before, the rewarded port and landmarks were rotated randomly after each trial, but their relative positions remained stable. Visiting the reward port at the incorrect, ‘a’ landmark (and holding there for the required duration) was handled identically to visits to any other nonreward port and triggered the same time out. As a result, mice learned to visit the ‘b’ port. Mice learned to make an initial distinction between the ports approximately by day 14–16. In one mouse, we maintained this training phase until overall task performance was significant over entire sessions (Extended Data Fig. 1f), but we noticed that the mouse had trouble consistently relearning the next task phase. We therefore transitioned subsequent mice to the next phases before a stable behavior was established.

5. After the mice started learning to visit the port at the ‘b’ landmark, we introduced a view distance limitation that made landmarks invisible from far away: the mouse’s position was tracked at 30 Hz and, for each landmark, its brightness was modulated in real time as a function of the mouse’s distance from it (see the sketch after this list). The visibility was 0 for distances above a threshold, 1 for distances below a second threshold and transitioned linearly between the two values. For clarity, we draw only the first threshold, where landmarks initially become visible, in the illustrations. The second threshold was typically set to about 50% of the first, leading to a gradual brightening, but in the otherwise totally dark arena, almost any value >0 is clearly visible. Initially, thresholds were set so that both landmarks were visible from the arena center (~20 cm); they were then reduced progressively to values where, at any one time, only one of the landmarks was visible to the mouse (~10 cm). At this stage, mice that encounter a landmark after a new trial starts have no way of knowing whether this is the rewarded or nonrewarded landmark, unless they infer landmark identity via path integration (see Fig. 2e, right, or Extended Data Fig. 2e). Recordings began when mice were able to complete 100 trials per hour at a hit/miss rate >1. Mice reached this criterion level on average by day 30–40 of total training.

Statistics and reproducibility

Statistical tests were carried out in Matlab (Mathworks, v.2019) using built-in functions. Unless stated otherwise, CIs were computed at a 95% level using bootstrap, and P values were computed using a Mann–Whitney U test or Wilcoxon signed-rank test. In figures, significance values are indicated as nonsignificant (NS) (P > 0.05), *(P ≤ 0.05), **(P ≤ 0.01) or ***(P ≤ 0.001). No statistical method was used to predetermine sample sizes.

Behavior analysis

Recording sessions were included once mice performed the task well enough to achieve a session average hit/miss ratio >1, indicating that mice could infer the correct port between the ‘a’ and ‘b’ landmarks (a correct rate of >1/16 would indicate that they can associate landmarks with rewarded ports, but not that they can infer landmark identity). Because landmarks are visible sequentially only after full training, a ratio >1 shows that mice employed a memory-based strategy, in which they used a previous hypothesis derived from seeing or not seeing the first landmark, together with path integration, to infer the identity of the second landmark they encounter. Only sessions with at least 50 recorded single neurons and with at least 50 min of task performance were included. This yielded 16 sessions from four mice. For some analyses, particularly those where trajectories of the mice were matched across trial types to control for potential motor and sensory confounds, additional selection criteria were applied, yielding a lower number of usable sessions; this is stated for the respective analyses. For plots of the learning rates, we included trials where mice encountered their first landmark within 20 s, to exclude periods where mice were not engaged.

Behavioral epochs

For analysis, each trial was split into epochs: the time between the onset of a trial (right after the mouse completes the preceding reinitialization procedure and finds itself at the center of the arena, unsure of its orientation relative to the currently invisible landmarks) and the onset of the reward (the first time the mouse could know whether it reached the correct port, other than by process of elimination after visiting all other ports) was split up based on the amount of information the mouse could have accumulated: the initial state when mice had not seen any landmark was labeled ‘LM0,’ time after the first landmark encounter was labeled ‘LM1,’ and time after the second encounter was labeled ‘LM2.’ The timepoints when landmarks became visible and the mouse transitioned from LM0 to LM1 or from LM1 to LM2, referred to as ‘landmark encounters,’ were defined as the timepoints when landmark visibility exceeded 50%.
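A minimal sketch of this epoch labeling, assuming per-landmark visibility traces for a single trial (array shapes and names are illustrative):

```python
import numpy as np

def label_epochs(visibility):
    """visibility: (n_landmarks, n_timepoints) in [0, 1] for one trial.
    Returns an integer array with the LM state (0, 1 or 2) per timepoint,
    where an encounter is the first time a landmark's visibility exceeds 50%."""
    seen = np.zeros(visibility.shape[0], dtype=bool)
    state = np.zeros(visibility.shape[1], dtype=int)
    for t in range(visibility.shape[1]):
        newly = (visibility[:, t] > 0.5) & ~seen
        if newly.any():
            seen |= newly
        state[t] = min(int(seen.sum()), 2)
    return state
```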

For analyses of the correlation of neural state and eventual behavioral outcomes, each second landmark encounter was further categorized by whether it occurred at the ‘a’ or ‘b’ landmark. For behavioral analyses in Fig. 4d, trials were further categorized by whether they led to a correct port visit or to an incorrect visit and a time out.

Similarity of spatial tuning across conditions

Changes in spatial tuning in individual RSC neurons as mice encounter successive landmarks (Fig. 1f) were quantified by the Euclidean distance of their spatial tuning profiles (in an 8 × 8 map, resulting in a 64-element vector; for each comparison, nonvisited tiles were omitted). As an internal control, distances between tuning profiles within condition and across conditions were compared using nonoverlapping 1-min segments. The control levels are different between the cases because the amount of data per session, reliability of firing, and so on, is not constant, and each control is valid only for its test data. For each comparison (LM1 versus LM2 and LM0 versus LM1), the split spatial tuning maps were compared within conditions (for example, within LM1 and within LM2), and these within-condition distances were compared with distances between LM1 and LM2 maps.
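A minimal sketch of the map comparison, assuming 8 × 8 rate maps with NaN marking nonvisited tiles (map construction and the 1-min splitting are not shown):

```python
import numpy as np

def map_distance(map_a, map_b):
    """map_a, map_b: 8 x 8 spatial firing rate maps with NaN in nonvisited tiles."""
    a, b = map_a.ravel(), map_b.ravel()
    valid = ~np.isnan(a) & ~np.isnan(b)          # omit nonvisited tiles
    return np.linalg.norm(a[valid] - b[valid])   # Euclidean distance over shared tiles
```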

Neural decoding of mouse position

All decoding analyses were performed on the entire neural population with no preselection. To decode the mouse position from RSC firing rates, neural firing rates were first low-pass filtered at 1 Hz with a single-pole Butterworth filter. The resulting firing rate time series were used to predict the mouse position as one of 100 categories forming a 10 × 10 bin grid (bin width = 50 mm). The network was made up of a single long short-term memory (LSTM) layer with 20 units and a fully connected layer with a softmax output over the 100 possible categories. For analyses of the intermediate information content of the decoder, the network input into the final softmax layer was analyzed.

Decoding was reinitialized for each trial. For each decoded trial, all other trials served as training set. For analysis of how the neural coding of position was dependent on the landmark state of the mouse (Extended Data Fig. 2a), the same analysis was repeated with training and testing data further divided by landmark state. For analysis of the decoding performance, the output likelihood from the decoder was evaluated at the mouse’s true position for all positions that were shared across conditions for this session. Statistical analysis was then performed on a per session average likelihood (not weighted by number of trials per session).
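A minimal sketch of the decoder architecture described above, written here in PyTorch (the original implementation, training loop and leave-one-trial-out cross-validation are not reproduced; sizes other than the 20-unit LSTM and 100 output categories are illustrative):

```python
import torch
import torch.nn as nn

class PositionDecoder(nn.Module):
    def __init__(self, n_neurons, n_hidden=20, n_bins=100):
        super().__init__()
        self.lstm = nn.LSTM(n_neurons, n_hidden, batch_first=True)
        self.fc = nn.Linear(n_hidden, n_bins)    # softmax over 10 x 10 position bins

    def forward(self, rates):
        # rates: (batch, time, n_neurons), low-pass filtered firing rates
        h, _ = self.lstm(rates)
        return self.fc(h)    # pre-softmax outputs ("intermediate information content")

decoder = PositionDecoder(n_neurons=64)
logits = decoder(torch.randn(1, 500, 64))     # one held-out trial, placeholder rates
posterior = torch.softmax(logits, dim=-1)     # evaluated at the true position bin
```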

Neural decoding of landmark state

For the analysis of landmark state (Fig. 1e), trials with at least 0.5 s of data from all three states were used (16 sessions, 486 total trials) and individual trials were held out from training for decoding. Firing rates were low-pass filtered with a causal single-pole Butterworth filter at 0.05 Hz, and landmark state (0, 1 or 2) was decoded independently for each timepoint using a categorical linear decoder (dummy variable coding, (Nneurons + 1) × 3 parameters), or a neural network with no recurrence, consisting of a single 20-unit layer receiving instantaneous firing rates, a six-unit layer and three softmax outputs. Training data were balanced across conditions. For related analyses of hypothesis state decoding, see also Fig. 4 and Extended Data Fig. 9j, where we decode from position-matched timepoints to account for location, motor and visual confounds, and Extended Data Fig. 7, where we match for position.
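A minimal sketch of the per-timepoint categorical decoding, with scikit-learn's multinomial logistic regression standing in for the dummy-variable-coded linear decoder (the feedforward network variant and the exact condition-balancing procedure are not shown):

```python
from sklearn.linear_model import LogisticRegression

def decode_state(rates, states, trial_ids, held_out_trial):
    """rates: (n_timepoints, n_neurons) filtered rates; states: (n_timepoints,) in {0, 1, 2}."""
    train = trial_ids != held_out_trial
    # class_weight="balanced" is a stand-in for balancing training data across conditions
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(rates[train], states[train])
    return clf.predict(rates[~train])        # decoded state for the held-out trial
```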

Analysis of landmark ‘nonencounters’

To show that mice can gain information by not encountering a landmark (as is shown, for example, by the ANN example in Fig. 2), we analyzed cases where the mouse first encounters a landmark and then, in the LM1 state, encounters the position where another landmark could be, but fails to see one. We note that this analysis has unavoidable confounds, as in one condition the mouse gets salient visual input and in the other it does not. We consequently ignored these cases in the main analysis and instead concentrated on cases where visual input was matched but previous hypotheses differ (Fig. 4). We analyzed these nonencounters by decoding the associated state change with the same method as in ‘Neural decoding of landmark state,’ but with nonbalanced conditions (due to the lower trial count), and analyzed the prediction around the 0-s point.

Dimensionality analysis

PCA was performed by first computing the covariance matrices of the low-pass filtered (as before) firing rates, and plotting their eigenvalue spectra, normalized by sum (Extended Data Fig. 8c). Each scaled eigenvalue corresponds to a proportion of explained variance. Spectra are plotted together with a control spectrum computed from covariances of randomly shuffled data. For a description of the method used to compute the correlation dimension of RSC rates (Extended Data Fig. 8d), see the heading ‘Correlation dimension’ in the section about ANN methods below.
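A minimal sketch of the eigenvalue-spectrum comparison; the shuffle shown here permutes each neuron's time series independently, which is one plausible implementation of the 'randomly shuffled data' control:

```python
import numpy as np

def pca_spectrum(rates):
    """rates: (n_timepoints, n_neurons) low-pass filtered firing rates."""
    cov = np.cov(rates, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return eig / eig.sum()                    # proportion of variance per component

def shuffled_spectrum(rates, rng=None):
    """Control spectrum: permute each neuron's time series independently,
    destroying inter-neuron covariance while keeping single-cell statistics."""
    rng = np.random.default_rng(0) if rng is None else rng
    shuffled = np.column_stack([rng.permutation(col) for col in rates.T])
    return pca_spectrum(shuffled)
```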

Prediction of firing rates across RSC population

For quantification of the independence of individual RSC neurons from the surrounding RSC population (Extended Data Fig. 8f,g), the firing rates of each neuron were predicted from those of all other neurons using linear regression. Rates were first filtered at 0.01–0.5 Hz with a third-order Butterworth filter and subsampled to 3.3 Hz. Each neuron’s rate was predicted with L1 regularized linear regression51 (\({\rm{lasso}},\lambda \approx 0.0001\)) from the rates of all other neurons and preceding firing rates using eight lags (~0–2.5 s). Goodness of fit was quantified as the proportion of variance explained, \({R}^{2}=1-{\sum }_{i}{({Y}_{i}-{Y}_{i}^{{\rm{pred}}})}^{2}/{\sum }_{i}{\left({Y}_{i}-\bar{Y}\right)}^{2}\). Predictions were computed both within condition (LM1, LM2 and dot-hunting phase) and across conditions, where predictions were made using coefficients determined from the other conditions.
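A minimal sketch of this population-prediction analysis using scikit-learn's Lasso; the lag construction (current sample plus seven preceding samples of the other neurons) is an assumption, and the regularization strength is illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def predict_neuron(rates, target_idx, n_lags=8, alpha=1e-4):
    """rates: (n_timepoints, n_neurons), filtered and subsampled to ~3.3 Hz.
    Predict one neuron's rate from (lagged) rates of all other neurons."""
    others = np.delete(rates, target_idx, axis=1)
    # lagged design matrix: current rates plus n_lags - 1 preceding samples
    X = np.hstack([np.roll(others, lag, axis=0) for lag in range(n_lags)])[n_lags:]
    y = rates[n_lags:, target_idx]
    model = Lasso(alpha=alpha, max_iter=5000).fit(X, y)
    y_pred = model.predict(X)
    r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)
    return model, r2
```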

Computation of firing rate distribution entropies

Entropies of empirical firing rate distributions were computed in bits according to their Shannon entropy, \(H\left(X\right)=-{\sum }_{i=1}^{n}P\left({x}_{i}\right){\log }_{2}P({x}_{i})\), relative to a uniform histogram of the same size, \(\hat{H}\left(X\right)=-\left(H\left(X\right)-H\left({\rm{uniform}}\right)\right)\). In cases where zeros appeared, a small offset term <<1 was added, and all histograms were normalized to a sum of 1. For example, \(\hat{H}\left(\left[1,0\right]\right)=\hat{H}\left(\left[1,1,0,0\right]\right)=1\,{\rm{bit}}\), and \(\hat{H}\left(\left[1,1,1,1.3\right]\right)\cong 0.01\,{\rm{bit}}\). For the analysis in Fig. 1g, an 8 × 8 grid was used for spatial coding and three bins for the state coding. Although the 8 × 8 grid is coarse enough to allow accurate capture of the spatial firing rate profile even for low-rate cells, the resulting estimates could be minimally affected by firing rate differences between neurons.
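A minimal sketch of the relative-entropy measure, reproducing the worked examples above:

```python
import numpy as np

def relative_entropy(hist, eps=1e-9):
    """hist: nonnegative counts/weights; returns -(H(hist) - H(uniform)) in bits."""
    p = np.asarray(hist, dtype=float) + eps   # small offset where zeros appear
    p /= p.sum()                              # normalize to a sum of 1
    h = -np.sum(p * np.log2(p))
    h_uniform = np.log2(len(p))               # entropy of a uniform histogram of the same size
    return -(h - h_uniform)

relative_entropy([1, 0])          # ~1 bit
relative_entropy([1, 1, 0, 0])    # ~1 bit
relative_entropy([1, 1, 1, 1.3])  # ~0.01 bit
```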

Trial-to-trial variance of firing rates conditioned on position

For analysis of whether partial hypothesis representation in the LM1 state corresponds to trial-by-trial changes in firing rates, evident in bimodal firing rate histograms, histograms of hidden unit firing rates of the ANN, conditioned on binned 1D position, are displayed (Extended Data Fig. 7a). Data are from Experiment configuration 2 (‘Overview over experiment configurations used with ANNs’). Tuning curves were calculated using 20 bins of location/displacements and normalized individually for each neuron. The first timestep in each trial and timesteps with nonzero landmark input were excluded from the analysis. For histograms, each condition was binned in 100 column bins and neuron rates in ten row bins. Histograms were normalized to equal sum per column. For analysis of RSC firing rates (Extended Data Fig. 7b–d), we did not observe bimodal rate distributions and instead quantified the dispersion of the rate distributions according to their entropy: firing rates were low-pass filtered at 0.5 Hz to bring them into the timescale of navigation behavior, and firing rate histograms were computed with eight bins spanning from each neuron’s lowest to highest firing rate, for each spatial bin in a 4 × 4 grid. Because the computation of histogram entropy is biased by the number of samples, for each spatial bin, the same number of timepoints was used for the LM1 and LM2 conditions. The dispersion of the firing rate distribution was then computed as the average entropy per cell across all space bins and compared across the two conditions.

Analysis of encoding of angular position and displacement from last seen landmark

Firing rate profiles were analyzed in two reference frames, that is, the global angle of the mouse in the arena and the angle relative to the last visible landmark. Only timepoints from the foraging state where the distance of the mouse from the center of the arena exceeded 70% of the arena radius were included. Timepoints from the LM1 and LM2 conditions were subsampled to yield matched numbers of timepoints. Firing rates were analyzed in a −π to π range in six bins by computing their entropy as described before.

Pairwise correlation of firing rates

Recordings were split into LM[0,1,2] states as before, firing rates were low-pass filtered at 1 Hz, and the Pearson correlation coefficient between each pair of neurons was computed. For display purposes, the neurons were reordered by computing the correlation matrix for the LM1 state and applying hierarchical clustering52; the resulting reordering was applied to both LM1 and LM2 conditions. This reordering has no impact on any further analyses. For summary statistics, we computed the correlation of correlations for each session. We observed no systematic change in the results as a function of the low-pass cutoff frequency; see Extended Data Fig. 8e for a comparison of a 1-Hz versus a 5-Hz cutoff.
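A minimal sketch of the correlation-of-correlations summary, with an average-linkage clustering used for the display ordering (the specific linkage method is an assumption):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def corr_of_corrs(rates_lm1, rates_lm2):
    """rates_lm*: (n_timepoints, n_neurons) low-pass filtered firing rates."""
    c1 = np.corrcoef(rates_lm1, rowvar=False)
    c2 = np.corrcoef(rates_lm2, rowvar=False)
    order = leaves_list(linkage(c1, method="average"))   # display ordering from LM1
    iu = np.triu_indices_from(c1, k=1)                   # off-diagonal pairs only
    r = np.corrcoef(c1[iu], c2[iu])[0, 1]                # correlation of correlations
    return c1[np.ix_(order, order)], c2[np.ix_(order, order)], r
```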

Low-dimensional embedding of neural activity

Neural firing rates were bandpass filtered as before, and an initial smoothing and dimensionality reduction step was performed by training a small LSTM with a single layer of 30 units to decode the mouse position. The hidden unit activations were then embedded in 3D space with the isomap algorithm53, using the Toolbox for Dimensionality Reduction by Laurens van der Maaten54.
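A minimal sketch of the embedding step, with scikit-learn's Isomap standing in for the Matlab Toolbox for Dimensionality Reduction used here (the neighborhood size is illustrative, and the 30-unit LSTM smoothing stage is assumed to have been run upstream):

```python
from sklearn.manifold import Isomap

def embed_states(lstm_hidden, n_neighbors=10):
    """lstm_hidden: (n_timepoints, 30) hidden activations of the position-decoding LSTM."""
    return Isomap(n_neighbors=n_neighbors, n_components=3).fit_transform(lstm_hidden)
```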

Analysis of speed of neural state evolution

For quantification of how fast the neural state evolves, the firing rates of the entire population were computed by low-pass filtering the spike trains at 1 Hz (third-order Butterworth filter), and the speed of the five largest principal components of the resulting vector (in Hz s−1) was related to the running speed of the mouse (m s−1, also low-pass filtered at 1 Hz) or the change in landmark brightness (percent per second) (Extended Data Fig. 9a). For running speed, data were binned in 30 bins from 0 to 0.5 m s−1 and ten bins from 0.5 to 2 m s−1; for brightness change, ten bins from −50 to 50% and ten bins for ±50–200% were used. CIs were computed by treating median data from each session as independent samples.

Analysis of context-encoding in RSC across similar motor and sensory states

To study the encoding of context with minimal sensory and motor confounds (Fig. 4 and Extended Data Fig. 9), we split the appearances of the second landmark into two groups depending on whether the second landmark is ‘a’ or ‘b,’ as described in the main text. We then selected subsets of trials manually where egocentric paths just before the appearance of the second landmark are matched across the two groups. Figure 4a shows an example of such matched approach paths/trials. Sessions in which at least 16 trials could be matched were used for these analyses, yielding a total of 133 trials from six sessions (per session, 16, 23, 24, 24, 25 and 21). For each session, all of these trials were aligned to the time when the second landmark became visible, yielding a set of time ranges where the animals experienced similar visual inputs, performed similar locomotion behavior but potentially encoded different previous experience leading them to subsequently disambiguate the perceptually identical second landmark as ‘a’ or ‘b.’

To test whether there was consistent encoding of this context in RSC, we then compared the distances across these groups in 3D neural activity space (‘Low-dimensional embedding of neural activity’) to distances within the groups (Fig. 4b and Extended Data Fig. 9). This test was performed at the point where the second landmark became visible to assess encoding of previous context, as well as 200 ms afterwards to assess how the identity of the (now visible) landmark affects encoding in RSC.

Analysis of smooth neural trajectories across sessions

To assess whether neural trajectories were determined by population dynamics that were stable across trials and could therefore serve as a substrate for the computation performed by the mice, we tested whether neural trajectories behaved consistently with a laminar flow regime, in which neighboring particles (in our case, neural firing rate vectors) remain neighbors for a significant amount of time, or whether they decorrelate quickly (Fig. 4c and Extended Data Fig. 9e,f). To assess temporal dynamics of the neural spiking without imposing any smoothing, we investigated raw spike counts in 750-ms windows for this analysis. For each session, an initial set of pairwise high-dimensional distances in spike counts between the trials with egocentrically similar paths (‘Analysis of context-encoding in RSC across similar motor and sensory states’) was computed from the last 750 ms preceding the appearance of the second landmark. These distances were then correlated with those in a second sliding window (Extended Data Fig. 9f). An offset of 0 s was defined as the point where the two windows stopped overlapping. The correlation coefficient R was then computed for increasing window offsets up to 2 s. Summary statistics were computed across sessions by first shifting each session individually by its 95% level for R (obtained from a shuffled control that removed the relationship between cells); as a result, the summary plot shows a highest value for R of ~0.8 even for offsets at which the windows fully overlap and the uncorrected R value is 1. Because of this shift, the null level for each trial lies at R = 0. We then computed the CIs for the group by bootstrap relative to this level.
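
A minimal NumPy/SciPy sketch of the sliding-window distance-correlation measure (without the shuffle-based shift or bootstrap; names are hypothetical) is:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

def trajectory_flow_correlation(counts, t_vis, fs, win=0.75, max_offset=2.0):
    """Correlate pairwise inter-trial distances in a reference spike-count window
    (last 750 ms before second-landmark visibility) with those in a sliding window.

    counts: trials x neurons x timepoints (raw binned spike counts);
    t_vis:  sample index at which the second landmark became visible."""
    w = int(round(win * fs))
    ref = pdist(counts[:, :, t_vis - w:t_vis].sum(axis=2))
    offsets = np.arange(-w, int(round(max_offset * fs)) + 1)  # -w: fully overlapping,
    rs = []                                                   #  0: windows just stop overlapping
    for off in offsets:
        sl = counts[:, :, t_vis + off:t_vis + off + w].sum(axis=2)
        rs.append(pearsonr(ref, pdist(sl))[0])
    return offsets / fs, np.array(rs)
```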

Decoding of low-dimensional task-relevant states from RSC activity

To illustrate the joint encoding of position and task states (as sketched conceptually in Fig. 4f) using neural data, we decoded the hypothesis state, as well as x/y position, from firing rates (Fig. 4e). Individual trials were held out as a test set, an ANN was trained on the remaining trials and the resulting predictions in the test trial were plotted with the hypothesis state on the z axis and the x/y position on the x/y axes. True LM0, 1a and 1b states were indicated with the same colors as throughout the figure. Rates were low-pass filtered with a causal third-order Butterworth filter at 0.5 Hz to bring them to the behavioral timescale. For position decoding, the network architecture was filtered rates → 20-unit LSTM → 15-unit LSTM → 6-unit LSTM → two-element regression output (mouse x/y position). For hypothesis states, it was rates → 10-unit LSTM → single regression layer, with LM0 encoded as 0, and LM1a and LM1b as −1 and +1, respectively. This analysis was not used to make statistical statements. Instead, we tested x/y encoding in Fig. 1, and hypothesis encoding in Fig. 4 and Extended Data Fig. 9, with statistical methods.
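
The two decoder architectures could be written, for example, as follows (PyTorch stand-ins; the original framework is not specified and the class names are hypothetical):

```python
import torch
import torch.nn as nn

class PositionDecoder(nn.Module):
    """Filtered rates -> 20-unit LSTM -> 15-unit LSTM -> 6-unit LSTM -> x/y regression."""
    def __init__(self, n_neurons):
        super().__init__()
        self.lstm1 = nn.LSTM(n_neurons, 20, batch_first=True)
        self.lstm2 = nn.LSTM(20, 15, batch_first=True)
        self.lstm3 = nn.LSTM(15, 6, batch_first=True)
        self.out = nn.Linear(6, 2)

    def forward(self, x):                    # x: (batch, time, neurons)
        x, _ = self.lstm1(x)
        x, _ = self.lstm2(x)
        x, _ = self.lstm3(x)
        return self.out(x)

class HypothesisDecoder(nn.Module):
    """Rates -> 10-unit LSTM -> scalar regression (LM0 = 0, LM1a = -1, LM1b = +1)."""
    def __init__(self, n_neurons):
        super().__init__()
        self.lstm = nn.LSTM(n_neurons, 10, batch_first=True)
        self.out = nn.Linear(10, 1)

    def forward(self, x):
        x, _ = self.lstm(x)
        return self.out(x)
```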

Analysis of direction of neural trajectories

To further test whether neural trajectories were determined by population dynamics that were stable across trials, and were independent of the interpretation of the second (locally ambiguous) landmark, we tested whether neural activity evolved in similar directions across trials if it started close together in 3D neural activity space (‘Low-dimensional embedding of neural activity’) (Extended Data Fig. 9g,h). We therefore examined neural trajectories within the motor- and sensory-matched LM2 approaches whose neural state, at the point when the second landmark became visible, was close to that of trials from the opposing class. For example, for an LM2a trial, we examined whether this trial followed nearby LM2b trials. We computed neural proximity in the 3D neural embedding (see above) and defined close-by trials as ones that were within 1 a.u. in Euclidean distance in the isomap embedding around the time when the second landmark became visible, yielding a total of 42 out of 79 trials with close neighbors from opposing classes from the five sessions (one session was excluded because the neural activity in the relevant time ranges was collapsed onto a point in the LSTM embedding). As a control, we also selected the corresponding neurally most distant trials. Similarity of neural evolution was then quantified as the angular difference between the trials in (3D) LSTM space over time, to assess coevolution independently of the initial selection by distance. Significance was computed by bootstrap across trials versus random alignments corresponding to a 90-degree difference.
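
The angular difference between two trials’ trajectories could be computed, for example, as below (NumPy; function name hypothetical):

```python
import numpy as np

def trajectory_angle(traj_a, traj_b):
    """Angular difference (degrees) between the instantaneous directions of two
    trials' trajectories in the 3D embedding; traj_a, traj_b are time x 3 arrays
    aligned to second-landmark visibility. 90 degrees corresponds to random alignment."""
    da, db = np.diff(traj_a, axis=0), np.diff(traj_b, axis=0)
    cosang = np.sum(da * db, axis=1) / (
        np.linalg.norm(da, axis=1) * np.linalg.norm(db, axis=1) + 1e-12)
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
```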

Behavior prediction

For the behavior prediction analysis, sessions with at least five correct and at least five incorrect port visits after the second landmark visit were used (N = 11), and an equal number of hit and miss trials (the outcome of the next port visit being either correct or a timeout) was selected, leading to a chance prediction level of 0.5. The spike rates from the 5 s preceding the second landmark becoming visible, binned into 1-s bins, were used to predict the behavioral outcome with a binary classification decision tree with a minimum leaf size of six, determined previously using cross-validation. Predictions for each trial were obtained from a tree fit to all other trials.
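
A minimal scikit-learn sketch of this leave-one-out decision-tree prediction (hypothetical variable names) is:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import LeaveOneOut

def predict_outcome(X, y):
    """Leave-one-out behavior prediction with a binary decision tree.

    X: trials x features (spike rates from the 5 s before second-landmark
       visibility, in 1-s bins, concatenated across neurons);
    y: 1 = next port visit correct, 0 = timeout (balanced, so chance = 0.5)."""
    X, y = np.asarray(X), np.asarray(y)
    preds = np.empty_like(y)
    for train, test in LeaveOneOut().split(X):
        clf = DecisionTreeClassifier(min_samples_leaf=6)   # leaf size from prior cross-validation
        clf.fit(X[train], y[train])
        preds[test] = clf.predict(X[test])
    return (preds == y).mean()
```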

Specificity of landmark encounter coding to the foraging task

We trained a decoder to predict the number of encountered dots in either the main task or the dot-hunting task. These tasks were interleaved, and the same neurons were used (Extended Data Fig. 4). Training and test sets were split by trial, decoding was performed with a regression tree on low-pass-filtered firing rates as before and performance was quantified as the mean error on the number of landmarks. Only the first two landmarks were predicted in the dot-hunting task, to allow use of the same classifier across both tasks. Decoding performance was compared between the within-class (for example, decoding main-task encounters with a decoder trained on other trials in the main task) and cross-class (for example, decoding dot-hunting encounters with a decoder trained on the main task, and so on) conditions.

Analysis of neural coding as a function of task performance

To test whether the encoding of hypothesis states in RSC is specific to task performance, we analyzed a larger number of sessions from the entire period during which two landmarks with local visibility were used (92 recording sessions in total) (Extended Data Fig. 4). We analyzed the effect of task performance on the behavior prediction analysis (as described above; Extended Data Fig. 4). We also analyzed the more general decoding of landmark encounter count (same method as in ‘Specificity of landmark encounter coding to the foraging task’; Fig. 1) in all 92 sessions with two landmarks and correlated decoding performance with task performance on a per-session level. As an additional control, we performed the same analysis on the number of dots encountered in the interleaved dot-hunting task. For all of these analyses, we used methods analogous to those of the non-behavior-correlated analyses.

Correlation dimension in RSC

For details of the calculation of the correlation dimension for RSC data, see ‘Correlation dimension.’

Artificial neural networks

We chose a simple recurrent neural network as one of the simplest architectures that can learn to maintain state over time. Unless stated otherwise in the text, the default architecture consisted of rate neurons, with an input layer projecting to 128 recurrent hidden units (tanh nonlinearity), which in turn projected to 80 output neurons, trained on random velocity trajectories in random environments of up to four landmarks (see ‘Network architecture and training’ for details). For the analyses in the main text, landmark inputs were relayed to the ANN as a map that encoded their relative position but not identity (‘external map’ ANN, 80 input neurons). The findings were replicated with an ANN that received only binary landmark presence input (‘internal map’ ANN, 11 input neurons) and with non-negative ANNs (Extended Data Fig. 10), on a subset of environments. The ANN serves to establish whether and how recurrent systems could solve the task, and we draw no correspondence between the circuitry of RSC and the connectivity of the ANN.

A simulated animal runs with varying velocity in a circular environment starting from a random unknown position and eventually infers its position using noisy velocity information and two, three or four indistinguishable landmarks. A trial consists of a fixed duration of exploration in a fixed environment, starting from an unknown starting location; the environment can change between trials. Environments are generated by randomly drawing a constellation of two to four landmarks, and the network must generalizably localize in any of these environments when supplied with its map. The network must adjust its spatial inference computations on the basis of the configurations of the different environments, without changing its weights; thus, the adjustments must be dynamic. In the internal map scheme (Extended Data Fig. 10a–m), an input cell simply encodes by its activation whether the animal is at any landmark; it does not specify the location of the landmark, the identity of the environment, or the spatial configuration of the various landmarks in the environment. The task in the internal map scheme is substantially harder, since the network must infer the configuration of landmarks in the environment purely from the time sequence of landmark visits, while simultaneously localizing itself within the environment. Information about the maps must be acquired and stored within the network. To make the task tractable, we limit training and testing in the internal map setting to four specific environments.

In the external map task (Figs. 2 and 3 and Extended Data Figs. 2 and 57), landmark locations were random and the set of locations (map) were provided to the network, whereas in the internal map task (Extended Data Fig. 10a–m) one of four landmark configurations was used, but the maps were not provided to the network. Landmarks could be observed only for a short distance. A three-layer network with a recurrent hidden layer was trained to infer location. Velocity and landmark encounter information were encoded in the input layer, and all weights of the network were trained. The training target for the output layer was activation of a unit with von Mises tuning and preferred location matching the true location.

Network performance was compared with a number of alternative algorithms: path integration plus correction integrated the noisy velocity information starting from an initial location guess and corrected this estimate by a reset to the coordinates of the nearest landmark when a landmark was encountered. Particle filters approximated sequential Bayesian inference given the available velocity and landmark information, with each particle capturing a location hypothesis whose posterior probability is given by an associated weight. Particle locations are updated using velocity information and particles are reweighted after landmark encounters. The enhanced particle filter also reweights particles when a landmark is expected but not encountered, and can thus infer location not only from the presence but also from the absence of landmarks. The output and hidden representations of the trained network were evaluated in a variety of conditions involving both random and fixed landmark locations and trajectories with random and fixed velocities.

Definition of environments and trajectories

The task is defined by a simulated animal moving along a circular track of radius 0.5 m for 10 s. The animal begins at rest at a random, unknown position along the circle and then runs along a trajectory with nonconstant velocity. A trajectory is sampled every dt = 0.1 s in the following way: at each time t, acceleration at is sampled from a zero-mean Gaussian distribution with s.d. σa = π/4 m s−2 that is truncated if |at| > π/2 m s−2. Acceleration is integrated to obtain the velocity vt, which is truncated if |vt| > vmax = π/2 m s−1. The actual location on the track is the integral of this velocity.
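
A minimal NumPy sketch of this trajectory generator is shown below; truncation is implemented as clipping (an assumption), and linear velocity is converted to angular position via the 0.5-m radius.

```python
import numpy as np

def sample_trajectory(duration=10.0, dt=0.1, sigma_a=np.pi / 4, a_max=np.pi / 2,
                      v_max=np.pi / 2, radius=0.5, rng=np.random):
    """Sample one trajectory: truncated-Gaussian acceleration (m s^-2) integrated
    to a truncated velocity (m s^-1), integrated to angular position on the track.
    Truncation is implemented as clipping here (resampling would also be consistent
    with 'truncated'); linear velocity is converted to angle via the 0.5-m radius."""
    n = int(round(duration / dt))
    theta = rng.uniform(0.0, 2 * np.pi)            # random, unknown starting angle
    v = 0.0                                        # starts at rest
    thetas = np.empty(n)
    for t in range(n):
        a = np.clip(rng.normal(0.0, sigma_a), -a_max, a_max)
        v = np.clip(v + a * dt, -v_max, v_max)
        theta = (theta + (v / radius) * dt) % (2 * np.pi)
        thetas[t] = theta
    return thetas
```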

In a trial of the external map task, the locations of K = 2, 3 or 4 indistinguishable landmarks were determined sequentially: the first landmark was sampled from a uniform random distribution on the circle, with subsequent landmarks also sampled from a uniform random distribution but subject to the condition that the minimum angular distance from any previously sampled landmark is at least δ = π/9 rad.
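
Landmark placement for the external map task could be implemented, for example, as follows (NumPy; function name hypothetical):

```python
import numpy as np

def sample_landmarks(k, delta=np.pi / 9, rng=np.random):
    """Sequentially place k indistinguishable landmarks on the circle, enforcing a
    minimum angular distance delta from every previously sampled landmark."""
    lms = [rng.uniform(0.0, 2 * np.pi)]
    while len(lms) < k:
        cand = rng.uniform(0.0, 2 * np.pi)
        ang_dist = np.abs((np.array(lms) - cand + np.pi) % (2 * np.pi) - np.pi)
        if np.all(ang_dist >= delta):              # reject candidates that are too close
            lms.append(cand)
    return np.array(lms)
```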

The internal map task involved four environments, each with a unique configuration of landmarks: two environments had two landmarks, one had three and the last had four. Landmark locations in the four environments were chosen so that pairwise angular distances were sufficiently unique to allow the inference of environment identity. Landmark coordinates in environment ei were given by: e1 = {0, 2π/3} rad, e2 = {1.9562, 3.7471} rad, e3 = {0.2641, 1.2920, 3.7243} rad and e4 = {3.0511, 3.8347, 5.1625, 5.7165} rad.

Experiment configurations used with ANNs

After training, the networks were evaluated in different testing configurations that each consisted of a distribution over landmark configurations and trajectories:

Experiment configuration 1

Training distribution: this test set was generated exactly as in the training set, as described in ‘Definition of environments and trajectories’.

Experiment configuration 2

Fixed landmarks, random trajectories: the landmark configuration was given by two landmarks located at e = {0, 2π/3}; the trajectories were sampled in the same way as in the training distribution. Note that this landmark configuration corresponds to the first environment in the internal map task.

Experiment configuration 3

Fixed landmarks, constant velocity trajectories: the landmark configuration was given by two landmarks located at e = {0, 2π/3} and the trajectories were constant velocity trajectories with |vt| = vmax/2. The initial position and the direction of the trajectory were random.

Experiment configuration 4

Two variable landmarks, constant velocity trajectory: the landmark configuration was given by two landmarks located at e = {0, 2π/3 + απ/3}, where α ∈ [0, 1]. The trajectories were constant velocity trajectories with |vt| = vmax/2, and the initial position and the direction of the trajectory were random.

Experiment configuration 5

Two environments, random trajectories: the landmark configuration was given by either e1 or e2 of the internal map task; trajectories were random.

Landmark observation

The animal is considered to have encountered a landmark if it approached within dmin = vmax × dt/2 = π/40 m (equivalent to π/20 rad on the 0.5-m-radius track). This threshold is large enough to prevent an animal from ‘missing’ a landmark even if it is running at maximum velocity. This ‘visibility radius’ is smaller than the one we used for the mouse behavior experiments (Fig. 1). In the ANN experiments, landmark encounters therefore occurred roughly when the agent’s position coincided with the landmark, whereas in the mouse data, landmark encounters occur a significant distance away from the landmark, when it becomes visible (for example, Fig. 4a). As in the analysis of mouse behavior, hovering around the same landmark or approaching the same landmark consecutively triggered a landmark encounter only at the first approach; a new encounter was triggered only if the animal approached a landmark different from the previous one. Also, only trials in which the animal encountered at least two different landmarks were included.

Sensory noise

The largest sources of uncertainty in the tasks were the unknown starting position and the indistinguishability of the landmarks. In addition, we assumed that the velocity information and the landmark location memory (in the external map scenario) were corrupted by noise. At each timestep of size dt = 0.1 s, the velocity input to the network corresponded to the true displacement vdt corrupted by zero-mean Gaussian noise of standard deviation σv = vmaxdt/10. In the external map task, the landmark map provided to the network and particle filter was corrupted by zero-mean Gaussian noise with standard deviation σl = π/50 rad, without changing the relative landmark positions: the entire map was rotated slightly but coherently at each landmark encounter, with the rotation sampled independently at each encounter.

ANN preferred firing at landmark locations

This analysis was performed by evaluating the network of the external map task on experiment configuration 1 of the internal map task (Extended Data Fig. 2c). First, location tuning curves were determined after the second landmark encounter using 5,000 trials from distribution 1 and 50 location bins. Tuning curves were calculated separately for each of the four environments of the internal map task. The preferred location was defined as the location corresponding to the tuning curve maximum. The density of preferred locations within distance dmin of a landmark was then compared with the density of preferred locations farther away from landmarks.

Network architecture and training

The network consisted of three layers of rate neurons with input-to-hidden, hidden-to-hidden and hidden-to-output weights. All weights were trained.

Network input

The input layer consisted of 80 neurons in the external map case and 11 neurons in the internal map case. Ten neurons coded for velocity corrupted by noise (noise as described above). The velocity neurons had a minimum firing rate between 0 and 0.2 and a maximum firing rate between 0.8 and 1 in arbitrary units, and within this output range coded linearly for the whole range of velocity between −vmax and vmax. Negative and positive velocity here corresponds to CW and CCW travel, respectively.

The remaining neurons (70 in the external map case and 1 in the internal map case) coded for landmark input and were activated only at the timestep of a landmark encounter and for up to three timesteps afterwards. In the external map case, the landmark input simultaneously encoded the locations of all landmarks in the environment, thus supplying a map of the environment, but contained no information about which landmark was currently encountered. The landmark input neurons had von Mises tuning with preferred locations xj = (j − 1) × 2π/70 rad, j = 1…70, that tiled the circle equally. Given n landmarks at locations li, i = 1…n, the firing rate of the j-th landmark input neuron was given by

$${r}_{j}=\mathop{\sum }\limits_{i}\exp \left(\frac{\cos \left({x}_{j}-\widetilde{{l}_{i}}\right)-1}{2{\sigma }_{w}^{2}}\right),$$

where \(\widetilde{{l}_{i}}\sim N\left({l}_{i},{\sigma }_{l}^{2}\right)\) is the noise-corrupted landmark coordinate (‘Sensory noise’). This mixture of von Mises activation hills produces the pattern depicted as the ‘map’ input in Extended Data Fig. 5a.
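
A NumPy sketch of this landmark-map input (with the coherent map rotation described under ‘Sensory noise’) is shown below; the tuning width σw is defined elsewhere in the Methods, so its value must be supplied.

```python
import numpy as np

def landmark_map_input(landmarks, sigma_w, sigma_l=np.pi / 50, n_units=70, rng=np.random):
    """Firing rates of the 70 landmark-map input neurons: a sum of von Mises bumps
    centred on the coherently rotated landmark locations ('Sensory noise').
    sigma_w is the tuning width; its value is defined elsewhere in the Methods."""
    x = np.arange(n_units) * 2 * np.pi / n_units                 # preferred locations x_j
    l_tilde = np.asarray(landmarks) + rng.normal(0.0, sigma_l)   # coherent map rotation
    return np.sum(np.exp((np.cos(x[:, None] - l_tilde[None, :]) - 1.0)
                         / (2 * sigma_w ** 2)), axis=1)
```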

In the internal map case (Extended Data Fig. 10a–m), the landmark input neuron consisted of a single binary neuron that responded for four timesteps with activation 1 in arbitrary units whenever a landmark was encountered. This input encoded neither environment identity nor landmark location.

Hidden layer

The hidden layer consisted of 128 recurrently connected neurons. The activation ht of hidden layer neurons at timestep t was determined by ht = tanh(Wxxt + Whht−1 + b), where xt are the activations of input neurons at timestep t, Wx are the input-to-hidden weights, Wh are the hidden-to-hidden weights and b are the biases of hidden neurons. The nonlinearity should be considered as an effective nonlinearity at long times; since the timestep dt = 0.1 s was large compared with a typical membrane time constant (τ ≈ 0.02 s), we did not include an explicit leak term.

Hidden layer (non-negative network)

In the non-negative network (Extended Data Fig. 10n–t), the recurrent activation was determined by ht = tanh([Wxxt + Whht−1 + b]+), where [u]+ denotes rectification.

Output layer

The output layer consisted of a population of 70 neurons with activity ot given by ot = tanh(Woht + bo), where Wo are the output weights and bo the biases of the output neurons.
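
Putting the three layers together, a NumPy sketch of the forward pass (covering both the default and the non-negative variant; weight matrices are assumed to be given) is:

```python
import numpy as np

def rnn_forward(x, Wx, Wh, b, Wo, bo, nonneg=False):
    """Forward pass of the three-layer rate network.

    x: (T, n_in) input activations (velocity and landmark units);
    Wx: (128, n_in), Wh: (128, 128), b: (128,), Wo: (70, 128), bo: (70,).
    Returns hidden states (T, 128) and output activities (T, 70)."""
    h = np.zeros(Wh.shape[0])
    hs, os = [], []
    for xt in x:
        pre = Wx @ xt + Wh @ h + b
        if nonneg:                                 # non-negative variant: rectify first
            pre = np.maximum(pre, 0.0)
        h = np.tanh(pre)
        hs.append(h)
        os.append(np.tanh(Wo @ h + bo))            # output layer
    return np.array(hs), np.array(os)
```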

Network training

The training targets of the output layer were place cells with von Mises tuning of width σo = π/6 rad to the true location yt,

$${\widetilde{o}}_{\alpha ,t}=\exp \left(\frac{\cos \left({z}_{\alpha }-{y}_{t}\right)-1}{2{\sigma }_{o}^{2}}\right),$$

where zα, α = 1…70 are the equally spaced preferred locations of each training target.

The network was trained by stochastic gradient descent using the Adam algorithm55, to minimize the average square error between the output ot and the training targets \({\tilde{{o}}_{t}}\), with the average taken over neurons, time within each trial and trials. The gradients were clipped to 100. The training set consisted of 10⁶ independently generated trials. During training, performance was monitored on a validation set of 1,000 independent trials and the network parameters with the smallest validation error were selected. All results were cross-validated on a separate set of test trials to ensure that the network generalized across new random trajectories and/or landmark configurations.
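
A PyTorch stand-in for this training procedure is sketched below; the original framework is not specified, and whether gradients were clipped by norm or by value is also not specified, so norm clipping is used here as an assumption.

```python
import torch
import torch.nn as nn

class LocalizationRNN(nn.Module):
    def __init__(self, n_in=80, n_hidden=128, n_out=70):
        super().__init__()
        self.rnn = nn.RNN(n_in, n_hidden, nonlinearity='tanh', batch_first=True)
        self.out = nn.Linear(n_hidden, n_out)

    def forward(self, x):                          # x: (batch, time, n_in)
        h, _ = self.rnn(x)
        return torch.tanh(self.out(h))             # (batch, time, n_out)

def train(model, loader, n_epochs=1):
    """loader yields (inputs, targets); targets are the von Mises place-cell activations."""
    opt = torch.optim.Adam(model.parameters())
    for _ in range(n_epochs):
        for x, target in loader:
            loss = ((model(x) - target) ** 2).mean()    # average square error
            opt.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 100.0)  # gradient clipping
            opt.step()
```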

Network location estimate

Given the activity of the output layer at time t, we define the network location estimate at that time as the preferred location (as defined by the training targets) of the most active output neuron:

$${\hat{y}}_{t}={z}_{{\hat{\alpha }}_{t}},\quad {\hat{\alpha }}_{t}={\text{argmax}}_{\alpha }\,{o}_{\alpha ,t}$$

Performance comparisons

In Fig. 2b, we compared the performance of the network in the external map task with a number of alternative algorithms. To ensure a fair comparison, we made sure that each alternative algorithm had access to exactly the same information as the network: the landmarks were indistinguishable and both velocity and landmark location information were corrupted by the same small amount of sensory noise. Error statistics were computed from 5,000 trials.

Path integration and correction

This algorithm implements path integration and landmark correction using a single location estimate, similar to what is implemented in hand-designed continuous attractor networks that implement resets at boundaries or other landmarks15,16,56,57. The algorithm starts with an initial location estimate at y = 0 (despite the true initial location being random and unknown), and integrates the noise-corrupted velocity signal to obtain location. At each landmark encounter, the algorithm corrects its location estimate to equal the coordinates of the landmark nearest to its current estimate.
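
A minimal NumPy sketch of this baseline (angular units; function name hypothetical) is:

```python
import numpy as np

def path_integration_correction(v_noisy, encounters, landmarks, dt=0.1):
    """Single-estimate path integration with resets to the nearest landmark.

    v_noisy:    noise-corrupted angular velocity per timestep (rad s^-1);
    encounters: boolean per timestep, True at landmark encounters;
    landmarks:  (noisy) landmark coordinates from the map (rad)."""
    lms = np.asarray(landmarks)
    y = 0.0                                        # initial guess, regardless of true start
    est = np.empty(len(v_noisy))
    for t, (v, enc) in enumerate(zip(v_noisy, encounters)):
        y = (y + v * dt) % (2 * np.pi)             # integrate velocity
        if enc:                                    # reset to the nearest landmark
            d = np.abs((lms - y + np.pi) % (2 * np.pi) - np.pi)
            y = lms[np.argmin(d)]
        est[t] = y
    return est
```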

Basic particle filter

Particle filters implement approximate sequential Bayesian inference using a sampling-based representation of the posterior distribution. Here, the posterior distribution over location at each timepoint is represented by a cloud of weighted particles, each of which encodes through its weight a belief, or estimated probability, of being at a certain location. At the beginning of the trial, Np = 1,000 particles are sampled from a uniform distribution along the circle and weighted equally. In the prediction step, particles are propagated independently using a random walk whose mean is the noise-corrupted velocity update and whose s.d. is the velocity noise σv. In the absence of a landmark encounter, particle weights remain unchanged and the particle cloud diffuses. If a landmark is encountered, the importance weights wt,β of particles β = 1…Np are updated as

$${w}_{t,\beta }\propto {w}_{t-1,\beta }\bullet \mathop{\sum }\limits_{i}\exp \left(\frac{\cos \left({y}_{t,\beta }-{\widetilde{l}}_{i}\right)-1}{2{\sigma }_{l}^{2}}\right)$$

where yt,β are the current estimates of the particles, and the weights are subsequently normalized such that \({\sum }_{\beta }{w}_{t,\beta }=1\). If the effective number of particles becomes too small, that is, \({N}_{\text{eff}}=1/{\sum }_{\beta }{w}_{t,\beta }^{2} < {N}_{p}/5\), the particles are resampled using low variance sampling58 and the weights equalized. This resampling step both allows for better coverage of high-probability regions and permits the particle cloud to sharpen again. The particle filter estimate at a given timepoint is given by the weighted circular mean \({\hat{y}}_{t}={{\arg }}({\sum }_{\beta }{w}_{t,\beta }\exp (i{y}_{t,\beta }))\) of the particle locations. In addition, we also calculate the circular variance as \(\mathrm{var}\left({y}_{t}\right)=1-\left|{\sum }_{\beta }{w}_{t,\beta }\exp \left(i{y}_{t,\beta }\right)\right|\).
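
A compact NumPy sketch of the basic particle filter (angular units; multinomial resampling is used as a stand-in for low-variance resampling, and names are hypothetical) is:

```python
import numpy as np

def particle_filter(v_noisy, encounters, landmarks, dt=0.1, sigma_v=np.pi / 200,
                    sigma_l=np.pi / 50, n_p=1000, rng=np.random):
    """Basic particle filter over angular location (rad)."""
    lms = np.asarray(landmarks)
    p = rng.uniform(0.0, 2 * np.pi, n_p)           # particle locations
    w = np.full(n_p, 1.0 / n_p)                    # particle weights
    est = []
    for v, enc in zip(v_noisy, encounters):
        p = (p + v * dt + rng.normal(0.0, sigma_v, n_p)) % (2 * np.pi)   # prediction step
        if enc:                                    # reweight by landmark likelihood
            lik = np.sum(np.exp((np.cos(p[:, None] - lms[None, :]) - 1.0)
                                / (2 * sigma_l ** 2)), axis=1)
            w = w * lik
            w = w / w.sum()
            if 1.0 / np.sum(w ** 2) < n_p / 5:     # effective particle count too small:
                idx = rng.choice(n_p, n_p, p=w)    # resample (multinomial stand-in for
                p, w = p[idx], np.full(n_p, 1.0 / n_p)   # low-variance sampling)
        est.append(np.angle(np.sum(w * np.exp(1j * p))) % (2 * np.pi))   # weighted circular mean
    return np.array(est)
```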

Enhanced particle filter

This particle filter has the same initialization, prediction step and weight update at landmark encounters as the basic particle filter and proceeds in exactly the same way until the first landmark encounter. Subsequently, the enhanced particle filter can also use the absence of expected landmark encounters to narrow down its location posterior, similar to the network’s ability shown in Extended Data Fig. 5. This is implemented in the following way: if a particle comes within the observation threshold dmin of a possible landmark location but no landmark encounter occurs, the particle is deleted by setting its weight to zero; afterwards, the particle weights are renormalized. A complication is that a subsequent landmark encounter only occurs if the current landmark is different from the previous one (‘Landmark observation’); to prevent the deletion of particles that correctly report a landmark at the current position but do not receive a landmark encounter signal because it is the same landmark as previously encountered, particles are deleted only if they come within the observation threshold of a possible landmark that is different from the last encountered landmark and no encounter occurs. If all particles have been deleted, particles are resampled from a uniform distribution and their weights are equalized. As for the basic particle filter, particles are resampled whenever the effective number of particles becomes too small, \({N}_{\text{eff}}=1/{\sum }_{\beta }{w}_{t,\beta }^{2} < {N}_{p}/5\). The particle filter estimate \({\hat{y}}_{t}={{\arg }}({\sum }_{\beta }{w}_{t,\beta }\exp (i{y}_{t,\beta }))\) and the circular variance \(\mathrm{var}\left({y}_{t}\right)=1-\left|{\sum }_{\beta }{w}_{t,\beta }\exp \left(i{y}_{t,\beta }\right)\right|\) are calculated in an identical way.

Analysis of location disambiguation in output layer

The timing and accuracy of location disambiguation in Extended Data Fig. 5 were calculated in the following way: we first constructed the trajectory of the ‘alternative location hypothesis,’ corresponding to the location estimates of a model animal that made the wrong location disambiguation at the first landmark encounter but otherwise updated its location by the correct velocity. This trajectory is shifted relative to the true trajectory by a constant distance equal to the distance between the two landmarks. At each point in time, we then identified the two neurons in the output population whose preferred locations were closest to the true and alternative trajectory, respectively; the activations of these neurons roughly tracked the heights of the activation bumps for the true and alternative location hypotheses, as seen in Extended Data Fig. 5c,d,h. The disambiguation time was defined as the earliest time after which either the true or the alternative location bump height fell below a threshold of 0.1 and stayed below it until the end of the trial. To determine the accuracy of location disambiguation, the network estimate at the last landmark interaction was analyzed. If this estimate was closer to the true than to the wrong landmark location, the trial was categorized as correct; otherwise, it was categorized as incorrect.

State space analysis

We performed PCA on the hidden neuron states from training trials to obtain the top three principal directions. We then projected network states obtained from the distribution of testing configurations 2 or 3 (Supplementary Information) onto these principal directions. The resulting reduced-dimension versions of the hidden neuron states from testing trials are shown in Fig. 2 and Extended Data Figs. 5 and 10.

Correlation dimension

To calculate the correlation dimension for the ANN and RSC activity, we first performed linear dimensionality reduction (PCA) on hidden layer activations from the training trials, retaining 20 principal components. For RSC data, rates were first low-pass filtered at 0.5 Hz. In the 20-dimensional space, we randomly picked 1,000 base points (500 for RSC). From each of these base points, we estimated how the number of neighbors in a ball of radius R scales with R. The minimum ball radius was determined such that the logarithm of the number of neighbors averaged over base points was near 1. The maximum radius was set to ten times the minimum radius, and intermediate values of the radius were spaced equally on a log scale. The slope of the linear part of the relationship between the logarithm of the number of neighbors and the logarithm of the ball radius determined the fractal dimension.
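
A simplified NumPy/scikit-learn sketch of this estimate is shown below; the heuristic used to pick the minimum radius and the fit over all radii (rather than only the linear part) are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def correlation_dimension(activity, n_pc=20, n_base=1000, n_radii=20, rng=np.random):
    """Correlation-dimension estimate: slope of log(neighbor count) versus log(radius).

    activity: samples x units (hidden-layer activations or filtered RSC rates).
    The r_min heuristic and the fit over all radii (rather than only the linear
    part) are simplifications."""
    z = PCA(n_components=n_pc).fit_transform(activity)
    base = z[rng.choice(len(z), size=min(n_base, len(z)), replace=False)]
    d = np.linalg.norm(base[:, None, :] - z[None, :, :], axis=2)   # base x samples
    r_min = np.quantile(d, 3.0 / len(z))           # radius containing roughly e neighbors
    radii = np.logspace(np.log10(r_min), np.log10(10 * r_min), n_radii)
    counts = np.array([np.mean(np.sum(d < r, axis=1) - 1) for r in radii])  # exclude self
    return np.polyfit(np.log(radii), np.log(counts + 1e-12), 1)[0]
```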

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.