Speed modulations in grid cell information geometry

Ye, Zeyuan; Wessel, Ralf

doi:10.1038/s41467-025-62856-x

Article
Open access
Published: 19 August 2025

Nature Communications volume 16, Article number: 7723 (2025)

Abstract

Grid cells, with hexagonal spatial firing patterns, are thought critical to the brain’s spatial representation. High-speed movement challenges accurate localization as self-location constantly changes. Previous studies of speed modulation focus on individual grid cells, yet population-level noise covariance can significantly impact information coding. Here, we introduce a Gaussian Process with Kernel Regression (GKR) method to study neural population representation geometry. We show that increased running speed dilates the grid cell toroidal-like representational manifold and elevates noise strength, and together they yield higher Fisher information at faster speeds, suggesting improved spatial decoding accuracy. Moreover, we show that noise correlations impair information encoding by projecting excess noise onto the manifold. Overall, our results demonstrate that grid cell spatial coding improves with speed, and GKR provides an intuitive tool for characterizing neural population codes.

Introduction

In navigation, it is crucial that the brain forms a certain internal representation of the external space¹. Grid cells are widely regarded as an essential component of internal spatial representation^2,3. Their hexagonal spatial firing patterns are thought to form a coordinate system of external space⁴ and support the downstream hippocampal spatial representations (e.g., place cells)^5,6,7,8,9. Yet maintaining precise spatial coding is particularly challenging at high running speeds, when self-location changes rapidly¹⁰. The effect of running speed modulation on grid cell population coding remains unclear.

Previous literature offers two opposing predictions about how running speed modulates grid cell codes. On the one hand, speed may support grid cell spatial coding. Running speed is known to mostly increase grid cell firing rates^11,12,13. Rats running at high speeds (10 cm/s to 50 cm/s) are also known to have more medial entorhinal cortex (MEC) cells coding spatial information than when running at low speeds (2 cm/s to 10 cm/s)¹⁰. On the other hand, speed signals disrupt the phase differences between pairs of grid cells¹⁴. Increasing speed may also lead to larger input noise (possibly from the medial septum^11,15 or speed cells^16,17), causing larger noise error to accumulate over time, thus degrading spatial coding fidelity^18,19,20,21.

While these previous studies provide insights into speed modulations of grid cells, their analyses were limited to individual or pairs of grid cells^11,12,13,14 (although decoding analysis has been performed on MEC cell population¹⁰). Neurons in the brain represent information through their collective population activity. Population noise covariance can significantly impact information coding^{22,23,24,25,26,27,28}. Grid cells’ activities are especially known to be tightly coupled and change coherently^14,29. To study the speed modulation of grid cell code, it is important to analyze simultaneously recorded grid cell population activities, including the effect of noise covariance. Yet such a study is still lacking.

Because neural data is intrinsically high-dimensional, inferring the noise-covariance matrix can be difficult. A standard approach for discretely valued information is to compute the sample covariance across trial responses²⁴. To handle continuously varying stimuli (e.g., orientations of static grating stimuli), one typically first bins the continuous parameters, then collects repeated trials at each bin^30,31,32,33. From these trial-based data, the noise covariance can be estimated using the sample-covariance estimator or, more recently, via a Wishart-process model³⁴.

However, discretizing continuous stimuli and collecting trial-based data can be impractical for two main reasons. First, high-dimensional inputs—such as natural images—require an exponentially large set of discretized values²⁷. Second, many naturalistic experimental paradigms (e.g., navigation tasks) lack repeated trials^31,33,35. A study on retinal representations of natural images addressed these challenges by substituting retinal data with convolutional neural network (CNN) units, explicitly formulating the noise covariance²⁷. However, this approach relies on the observed similarities between retinal neurons and CNN units³⁶. There’s a trend in neuroscience to move beyond trial-based experiments, towards trial-free naturalistic experiments^31,33. Yet, to our knowledge, it remains challenging to reliably estimate noise covariance from high-dimensional neural data in naturalistic tasks without repeated trials.

In this paper, we introduce Gaussian Process with Kernel Regression (GKR), a method for inferring both the smooth mean (manifold) and noise covariance from high-dimensional neural data, including recordings from naturalistic tasks. The study of manifolds and noise covariance falls within the framework of information geometry^27,37. We applied GKR to simultaneously recorded grid cell activities³⁵. We found that: (1) Running speed both dilates the grid cells’ toroidal-like manifold and increases noise; (2) Nevertheless, the effect of manifold dilation outpaces the effect of noise increase, as indicated by the overall higher Fisher information at increasing speeds, and further supported by improved spatial coding accuracy at higher speeds; (3) Furthermore, compared to hypothetical independently firing grid cells, we found that noise correlations in real grid cells “shape” the noise structure such that more noise is projected onto the manifold surface, indicating that noise correlation in grid cells is information-detrimental. Overall, our results indicate that running speed enhances grid cell spatial coding through geometric modulations. GKR provides a useful tool to interpret noisy neural data from an intuitive information geometry perspective.

Results

Grid cell population spatial coding accuracy improves with increasing speed

We analyzed grid cell recordings from Gardner et al.³⁵, obtained as rats foraged in an open field (OF). The dataset included approximately 60–200 simultaneously recorded grid cells per module, with the exact number varying by rat, recording day, and module (see Methods). The experiment comprised nine configurations, each denoted by a short label; for example, “R1M2” refers to rat R on day 1, grid cell module 2. Grid cells within the same module shared a similar spatial period but differed in phase. Raw spiking data were converted into firing rates using kernel averaging (see Methods), with example rate maps shown in Fig. 1A and Supplementary Fig. 1.

**Fig. 1: Grid cell population spatial coding accuracy (SCA) improves with increasing speed.**

We analyzed the rats’ behavioral data and found a predominance of low-speed movement, resulting in a concentration of observations in the slow-speed range (Supplementary Fig. 2). Since the primary goal of this study is to compare grid cell representation properties across different speeds, it is essential to control nuisance factors, specifically by ensuring an equal number of data points across speed conditions. To achieve this, we randomly sampled data within discrete speed bins (bin width = 5 cm/s, see Methods) to create a balanced dataset, ${{{\mathscr{D}}}}_{s}$, with an equal number of data points per bin. We generated fifty such ${{{\mathscr{D}}}}_{s}$ datasets per experimental configuration to estimate uncertainty.
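
The balancing step can be sketched as follows. This is a minimal illustration rather than the procedure from the Methods: the function name `balanced_speed_sample`, the choice of eight bins starting at 0 cm/s, and the synthetic exponential speed distribution are all assumptions made here.

```python
import numpy as np

def balanced_speed_sample(speed, bin_width=5.0, n_bins=8, rng=None):
    """Return indices of a subsample with equal counts in each speed bin."""
    rng = np.random.default_rng(rng)
    speed = np.asarray(speed, dtype=float)
    # Bin index of every time point; points beyond the last bin are never selected.
    bin_idx = np.floor(speed / bin_width).astype(int)
    members = [np.flatnonzero(bin_idx == b) for b in range(n_bins)]
    # The least-populated bin sets the per-bin sample size.
    n_per_bin = min(len(m) for m in members)
    chosen = [rng.choice(m, size=n_per_bin, replace=False) for m in members]
    return np.concatenate(chosen)

# Synthetic speeds skewed toward slow movement (illustration only, not real data).
rng = np.random.default_rng(0)
speed = rng.exponential(scale=12.0, size=20000)
idx = balanced_speed_sample(speed, rng=1)
counts = np.bincount(np.floor(speed[idx] / 5.0).astype(int), minlength=8)
```

Calling the function fifty times with different seeds would give fifty balanced index sets, one per sampled dataset.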

One straightforward approach to evaluate the quality of neural spatial coding is by decoding location information from neural states. Good decoding performance indicates good neural spatial coding. We designed a locally linear classification accuracy to evaluate the quality of spatial coding, formally referred to as spatial coding accuracy (SCA) (Fig. 1B, see Methods). Specifically, at each speed bin value, several locations were randomly sampled. For each sampled location, we created two conjugate boxes centered near the sampled location but positioned in opposite directions, separated by a fixed distance of 10 cm. Data from these two boxes were collected, relabeled as class 1 and class 2, and then split into training and test sets. A logistic regression model was then trained to classify the data and evaluated on the test set. The classification accuracy, averaged over all randomly sampled spatial locations, is referred to as the SCA. SCA quantifies how well neural states corresponding to two nearby spatial locations are discriminable.
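
For one pair of boxes, the classification step can be sketched as below. The hand-rolled gradient-descent logistic regression, the 70/30 split, and the synthetic Gaussian "neural states" are assumptions for illustration; the Methods may use a different solver and split.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Plain gradient-descent logistic regression; returns weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(class 2)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def box_pair_accuracy(r1, r2, test_frac=0.3, rng=None):
    """Held-out accuracy for discriminating states from two boxes."""
    rng = np.random.default_rng(rng)
    X = np.vstack([r1, r2])
    y = np.concatenate([np.zeros(len(r1)), np.ones(len(r2))])
    perm = rng.permutation(len(y))
    n_test = int(test_frac * len(y))
    test, train = perm[:n_test], perm[n_test:]
    w, b = fit_logistic(X[train], y[train])
    pred = (X[test] @ w + b) > 0
    return np.mean(pred == y[test])

# Toy stand-in for states from two nearby boxes (20 hypothetical neurons).
rng = np.random.default_rng(0)
r1 = rng.normal(-0.5, 1.0, size=(200, 20))
r2 = rng.normal(+0.5, 1.0, size=(200, 20))
acc = box_pair_accuracy(r1, r2, rng=1)
```

Averaging `acc` over many randomly sampled box pairs at a given speed bin would give the SCA for that bin.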

For each sampled dataset ${{{\mathscr{D}}}}_{s}$, we computed SCA across eight speed bins as described above. Fifty ${{{\mathscr{D}}}}_{s}$—each yielding eight metric-speed pairs (where metric is SCA in this case)—produced a metric-speed dataset of 400 points (dots in Fig. 1C). A natural idea is to fit a simple linear regression (e.g., least-squares regression) to these 400 points and use the slope to quantify how metric varies with speed. However, this approach is problematic as it assumes all observations are independent, which here is violated: our 400 points come from fifty ${{{\mathscr{D}}}}_{s}$ drawn from the same original dataset D, so they are statistically related. If one were to increase the number of sampled datasets ${{{\mathscr{D}}}}_{s}$, a naïve linear regression would misleadingly drive the slope’s estimation uncertainty toward zero.

To address this, we introduced Bayesian Linear Ensemble Averaging (BLEA, see Methods). BLEA proceeds in two stages: first, it fits a Bayesian linear regression separately to the metric-speed data from each ${{{\mathscr{D}}}}_{s}$, yielding fifty posterior distributions over the regression weights; then it combines these posteriors via Bayesian model averaging^38,39. The result is a Gaussian-approximated posterior over the slope (and intercept) together with predictive distributions, from which confidence intervals (CIs), p-values, and other statistical measures can be estimated using a Bayesian framework (see Methods)⁴⁰.
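
The two stages can be sketched as follows, under assumptions that are not taken from the Methods: a zero-mean Gaussian prior with precision `alpha`, a known noise precision `beta`, equal model weights, and a moment-matched Gaussian summary of the resulting mixture. The function names are hypothetical.

```python
import numpy as np

def blr_posterior(x, y, alpha=1e-6, beta=1.0):
    """Gaussian posterior over (intercept, slope) for one sampled dataset."""
    Phi = np.column_stack([np.ones_like(x), x])
    S = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
    m = beta * S @ Phi.T @ y
    return m, S

def average_posteriors(means, covs):
    """Moment-match an equal-weight mixture of Gaussian posteriors."""
    means = np.asarray(means)
    covs = np.asarray(covs)
    m = means.mean(axis=0)
    # Mixture covariance = mean within-model covariance + between-model spread.
    second = covs + np.einsum('ki,kj->kij', means, means)
    S = second.mean(axis=0) - np.outer(m, m)
    return m, S

# Synthetic metric-speed data: fifty datasets, eight speed-bin centres each.
rng = np.random.default_rng(0)
speeds = 2.5 + 5.0 * np.arange(8)
means, covs = [], []
for _ in range(50):
    slope_k = 0.02 + 0.002 * rng.standard_normal()
    y = 0.6 + slope_k * speeds + 0.02 * rng.standard_normal(8)
    m_k, S_k = blr_posterior(speeds, y, beta=1.0 / 0.02**2)
    means.append(m_k)
    covs.append(S_k)
m, S = average_posteriors(means, covs)
slope_mean, slope_sd = m[1], np.sqrt(S[1, 1])
```

In this sketch the between-dataset spread of the fitted slopes stays in the averaged covariance, so adding more sampled datasets does not drive the slope uncertainty to zero the way a pooled least-squares fit would.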

Applying BLEA to SCA (Fig. 1C, D), we found that SCA increases with speed, with a slope significantly larger than that of the label-shuffled dataset. This result holds across different classifiers used for computing the SCA (support vector machine and perceptron, see Supplementary Fig. 3), indicating better self-location representations in grid cell population at higher speeds.

Gaussian Process with Kernel Regression (GKR) method for fitting manifold and covariance matrices from noisy neural states

What are the underlying neural mechanisms contributing to the improved spatial coding in grid cells? To explore this question, we need a tool to analyze the high-dimensional noisy neural states. A recently popular neural population geometry framework suggests that, instead of analyzing noisy high-dimensional data directly, it is more intuitive to extract the data’s underlying smooth manifold along with the noise covariance^34,41,42. The Wishart process is one such method that can infer the smooth manifold and covariance matrix³⁴. However, the recent implementation of the Wishart process requires a trial-based experimental paradigm, which prevents its use in broader and more complex naturalistic behavioral experiments³⁴. The OF task (Fig. 1A) is one such naturalistic experiment without strict repeated trials.

Therefore, we developed the Gaussian Process with Kernel Regression (GKR) method. The main purpose of GKR is to estimate the representation manifold relevant to the (known) labels of interest⁴². Response fluctuations arising from nuisance latent variables (e.g., emotional states) are captured as a noise covariance term. For example, in this paper, locations and speeds are the labels of interest, while neural response fluctuations due to other factors are treated as “noise,” summarized in the noise covariance matrix.

Here we illustrate the principles of GKR. A dataset (e.g., ${{{\mathscr{D}}}}_{s}$) contains noisy neural states r whose dimensionality equals the number of neurons; and labels x whose dimensionality equals the number of label variables. A label variable can be stimulus parameters (e.g., grating stimulus’ orientation), latent variables (e.g., internal decision factor) or behavior variables (e.g., x, y locations and speed). GKR assumes that r follows a Gaussian distribution

$${{\bf{r}}}\left({{\bf{x}}}\right)={{\mathbf{\mu }}}\left({{\bf{x}}}\right)+{{\mathscr{N}}}\left({{\mathbf{\epsilon }}}{;}0,\Sigma \left({{\bf{x}}}\right)\right)\tag{1}$$

where μ(x) is assumed to be a smoothly varying mean, also called a manifold in this paper, and Σ(x) is a smoothly varying covariance. The combination of manifold and noise covariance is referred to as a statistical manifold of neural response. The goal of GKR is to infer the manifold and covariance from a dataset $\{{{\bf{r}}},{{\bf{x}}}\}$.

GKR solves this inference problem in two steps (Fig. 2A, see Methods): (1) inferring the smooth manifold ${{\mathbf{\mu }}}\left({{\bf{x}}}\right)$ via Gaussian process regression³⁸; (2) inferring the covariance matrix by applying kernel averaging to the residuals ${{\bf{r}}}\left({{{\bf{x}}}}_{{{\rm{i}}}}\right)-{{\mathbf{\mu }}}\left({{{\bf{x}}}}_{{{\rm{i}}}}\right)$ (index i denotes the i-th data point). Kernel parameters are optimized to maximize the data log-likelihood.
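
A minimal sketch of these two steps is given below. It is not the authors' implementation: it assumes RBF kernels with fixed, hand-picked length scales (`len_mean`, `len_cov`) and a fixed observation-noise variance in place of the likelihood-based optimization, and the function names are hypothetical.

```python
import numpy as np

def rbf(a, b, length):
    """RBF kernel matrix between label sets a (n, d) and b (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gkr_fit_predict(x, r, x_query, len_mean=0.5, len_cov=0.5, noise_var=0.1):
    # Step 1: Gaussian process regression posterior mean (the manifold).
    K = rbf(x, x, len_mean)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(x)), r)
    mu_train = K @ alpha
    mu_query = rbf(x_query, x, len_mean) @ alpha
    # Step 2: kernel-weighted average of residual outer products (the covariance).
    res = r - mu_train
    W = rbf(x_query, x, len_cov)
    W /= W.sum(axis=1, keepdims=True)
    Sigma = np.einsum('qi,ia,ib->qab', W, res, res)
    return mu_query, Sigma

# Toy data: one label, two synthetic tuning curves, isotropic noise (sd 0.3).
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, size=(300, 1))
truth = lambda x: np.column_stack([np.cos(x[:, 0]), np.sin(x[:, 0])])
r = truth(x) + 0.3 * rng.standard_normal((300, 2))
xq = np.linspace(1.0, 5.0, 5)[:, None]
mu_q, Sigma_q = gkr_fit_predict(x, r, xq)
```

The covariance step reuses the same label-space smoothness idea as the mean step: residuals at labels close to the query contribute more to the estimate there.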

**Fig. 2: Gaussian Process with Kernel Regression (GKR) infers the smooth manifold and covariance from noisy data.**

GKR outperforms empirical estimation methods on synthetic datasets

We evaluated GKR on both a one-dimensional synthetic model (Fig. 2B–D) and a two-dimensional synthetic model (Supplementary Fig. 4). Each model consisted of a ground truth manifold, ${{\mathbf{\mu }}}\left({{\bf{x}}}\right)$, where each component represented a synthetic neural tuning curve, and a covariance matrix, $\Sigma \left({{\bf{x}}}\right)$. We generated data from these models using a Gaussian distribution (Eq. 1) and applied GKR to infer the manifold and covariance matrix.

For comparison, we also applied bin averaging and the Ledoit-Wolf (LW) method. In bin averaging, we discretized the label space ${{\boldsymbol{x}}}$ into small bins, treating all data within a bin as having the same label. The sample mean and covariance within each bin served as estimates of the inferred manifold and covariance matrix. The LW method builds on bin averaging by incorporating shrinkage regularization to improve covariance estimation (see Methods)⁴³.

Using the inferred manifold and covariance matrix, we computed other geometric quantities, including the Riemannian metric, precision matrix, and Fisher information (see Methods). To assess the inference performance, we compared these inferred quantities to their ground truth values by computing the relative estimation error, defined as the difference between the estimate and the ground truth, normalized by the ground truth. Across various experimental conditions and in both one-dimensional and two-dimensional synthetic datasets, GKR consistently outperformed bin averaging and LW methods (Fig. 2B–D, Supplementary Fig. 4).

Grid cell population activity manifold exhibits a toroidal-like topology

We then applied GKR to the grid cell sampled dataset ${{{\mathscr{D}}}}_{s}$. The inferred manifold is intrinsically three-dimensional, as it is parameterized by three label variables of interest: two spatial coordinates and one speed (notably, these three labels are largely uncorrelated, see Supplementary Fig. 5). To characterize the fitted manifold’s topology, we conducted a persistent homology analysis (see Methods) and found that the manifold possibly exhibits a toroidal topology (Supplementary Figs. 6A, B), consistent with previous findings³⁵.

Next, we examined how spatial locations were represented by grid cells. For each speed value, we defined a speed-slice manifold (SSM) as a cross-section of the full manifold, obtained by fixing speed while varying location (Fig. 3A). To visualize the SSM, we randomly sampled points from the manifold at a fixed speed of 20 cm/s, projected them onto the first six principal components (PCs) using principal component analysis (PCA), and further reduced the dimensionality to three using Uniform Manifold Approximation and Projection (UMAP)⁴⁴. The resulting visualization (Fig. 3B), along with persistent homology analysis (Supplementary Fig. 6C), suggests that the SSM possibly exhibits a toroidal topology.

**Fig. 3: Speed dilates the grid cell population toroidal-like manifold.**

Running speed dilates the grid cell toroidal-like speed-slice manifold

A key question is how speed modulates the geometry of the SSM. Direct visualization of SSMs at different speeds is challenging, so we instead examined speed modulation using an example lattice on the SSM. This lattice consists of four spatially adjacent points, centered at the OF center, with an inter-point distance of 2 cm (Fig. 3C). While keeping these four spatial locations fixed, we varied the speed component to construct a lattice manifold. PCA revealed that this manifold is low-dimensional, with three principal components accounting for over 90% of the variance (Supplementary Fig. 7). Based on this, we projected the lattice manifold into three or two dimensions for visualization (Fig. 3D). The results indicate that the lattice expands as speed increases.

In addition to the lattice manifold, we also visualized other manifold slices, including those obtained by fixing the rat’s x-coordinate and using a larger lattice. All visualizations consistently imply that the SSM dilates with increasing speed (Supplementary Fig. 7).

To quantify changes in SSM size, we used two metrics: (1) SSM radius, defined as the average distance from the SSM surface to its center, providing a global measure of SSM size; and (2) Lattice area, computed as the area of a parallelogram whose sides are tangent vectors of the SSM, capturing local manifold surface size. Across all experimental configurations examined, both SSM radius and lattice area increase with running speed, indicating that the grid cell SSM dilates as speed increases (Fig. 3E).
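
The lattice area can be illustrated with a short sketch: for tangent vectors stacked as the columns of a matrix J, the parallelogram area is the square root of det(JᵀJ). The central finite-difference scheme and the toy torus embedding below are assumptions for illustration, not the fitted grid cell manifold.

```python
import numpy as np

def tangent_vectors(mu, x, eps=1e-5):
    """Central finite-difference tangent vectors of mu at x, stacked as columns."""
    x = np.asarray(x, dtype=float)
    cols = []
    for k in range(len(x)):
        dx = np.zeros_like(x)
        dx[k] = eps
        cols.append((mu(x + dx) - mu(x - dx)) / (2 * eps))
    return np.column_stack(cols)

def lattice_area(J):
    """Area of the parallelogram spanned by the columns of J."""
    return np.sqrt(np.linalg.det(J.T @ J))

def torus(scale):
    """Toy torus embedded in four dimensions; 'scale' mimics manifold dilation."""
    return lambda x: scale * np.array(
        [np.cos(x[0]), np.sin(x[0]), np.cos(x[1]), np.sin(x[1])])

area_1 = lattice_area(tangent_vectors(torus(1.0), [0.3, 1.1]))
area_2 = lattice_area(tangent_vectors(torus(2.0), [0.3, 1.1]))
```

Doubling the scale of this toy manifold doubles each tangent vector and therefore quadruples the local area.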

Running speed increases grid cell population noise

SSM dilation intuitively enhances spatial coding. Consider a binary classification task distinguishing two neural state classes corresponding to nearby locations (e.g., Fig. 1B). By increasing the separation between these classes, SSM dilation makes their representations more distinguishable. However, discriminability is not solely determined by distance—noise strength also plays a crucial role. Increased noise in the grid cell population reduces spatial coding accuracy. This raises a question: what’s the effect of running speed on grid cell population noise?

To investigate this, we fitted manifolds and noise covariances for the sampled datasets ${{{\mathscr{D}}}}_{s}$ using GKR. Total noise is the trace of the covariance matrix (Fig. 4A). We found that total noise increases with increasing speed (Fig. 4B, C). Compared to total noise, noise projected onto the manifold may be more relevant to information coding²⁶. Therefore, we projected the covariance matrix onto the tangent plane of the SSM, and computed the trace of the projected covariance matrix as the projected noise. Consistent with total noise, projected noise also increases with speed (Fig. 4B, C).
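
Both noise measures can be sketched in a few lines. The projection below uses an orthonormal basis of the tangent plane obtained by QR decomposition, which is one reasonable reading of "projected onto the tangent plane" and may differ in detail from the Methods; the covariance and tangent vectors are made-up toy values.

```python
import numpy as np

def total_noise(Sigma):
    """Trace of the noise covariance matrix."""
    return np.trace(Sigma)

def projected_noise(Sigma, J):
    """Trace of the covariance restricted to the plane spanned by J's columns."""
    Q, _ = np.linalg.qr(J)          # orthonormal basis of the tangent plane
    return np.trace(Q.T @ Sigma @ Q)

# Toy example: three neurons, most variance along the first axis.
Sigma = np.diag([4.0, 1.0, 1.0])
J_along = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])    # plane contains axis 1
J_across = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # plane avoids axis 1

tot = total_noise(Sigma)                      # 6.0
proj_along = projected_noise(Sigma, J_along)  # 5.0
proj_across = projected_noise(Sigma, J_across)  # 2.0
```

The total noise is the same in both cases, while the projected noise depends on how the tangent plane is oriented relative to the noise.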

**Fig. 4: Running speed increases grid cell population noise.**

Fisher information increases with increasing speed, indicating that the effect of speed-slice manifold dilation outpaces the effect of increasing noise

Running speed has opposing effects on spatial coding. On the one hand, it expands the speed-slice manifold (SSM), pushing neural representations of nearby locations further apart (Fig. 3E), thereby improving spatial coding. On the other hand, it increases grid cell population noise (Fig. 4C), which degrades spatial coding. To assess the overall impact, we examined (linear) Fisher information, a commonly used metric that quantifies the local discriminability of neural population representations. It is defined as ${\left(\partial {{\mathbf{\mu }}}/\partial {{\bf{x}}}\right)}^{T}{\Sigma }^{-1}\left(\partial {{\mathbf{\mu }}}/\partial {{\bf{x}}}\right)$, which incorporates both the noise factor ($\Sigma$) and the lattice area factor (the lattice area is formed by the tangent vectors $\partial {{\mathbf{\mu }}}/\partial {{\bf{x}}}$) (Fig. 5A). The total Fisher information, given by the trace of the Fisher information matrix, measures the overall precision of the representation, with higher values indicating better spatial coding⁴⁵.
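
The definition translates directly into code. The sketch below uses random toy values for the tangent vectors and covariance (not fitted quantities) and shows the trade-off in its simplest form: if tangent vectors double while the noise covariance also doubles, the total Fisher information still doubles.

```python
import numpy as np

def fisher_information(J, Sigma):
    """Linear Fisher information matrix J^T Sigma^{-1} J."""
    return J.T @ np.linalg.solve(Sigma, J)

rng = np.random.default_rng(0)
J = rng.standard_normal((6, 2))          # toy tangent vectors: 6 neurons, 2 labels
A = rng.standard_normal((6, 6))
Sigma = A @ A.T + np.eye(6)              # toy symmetric positive-definite covariance

base = np.trace(fisher_information(J, Sigma))
# Dilate the manifold by 2 (tangents x2) and double the noise covariance.
fast = np.trace(fisher_information(2 * J, 2 * Sigma))
```

Here `fast` equals `2 * base` because the tangent vectors enter quadratically and the covariance enters inversely.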

**Fig. 5: Grid cells’ (linear) Fisher information increases with increasing running speed.**

It is well known that Fisher information is hard to estimate in a high-dimensional space²⁴. Therefore, besides using the original ${{{\mathscr{D}}}}_{s}$, we also projected ${{{\mathscr{D}}}}_{s}$ into the first six PCs, denoted as ${{{\mathscr{D}}}}_{s}^{\left(6\right)}$. Each ${{{\mathscr{D}}}}_{s}^{\left(6\right)}$ was then fed into GKR for fitting a GKR model. GKRs fitted from both ${{{\mathscr{D}}}}_{s}$ and ${{{\mathscr{D}}}}_{s}^{\left(6\right)}$ were analyzed identically to double-check our results on Fisher information.
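
The projection onto the first six PCs can be sketched with a singular value decomposition. The function name and the synthetic 40-neuron data are assumptions for illustration.

```python
import numpy as np

def project_to_pcs(R, n_pcs=6):
    """Project rows of R (samples x neurons) onto the leading principal components."""
    Rc = R - R.mean(axis=0)                      # centre each neuron
    U, s, Vt = np.linalg.svd(Rc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance fraction per component
    return Rc @ Vt[:n_pcs].T, explained[:n_pcs]

# Synthetic states: 500 samples of 40 hypothetical neurons with mixed sources.
rng = np.random.default_rng(0)
R = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 40))
Z, explained = project_to_pcs(R)
```

The label columns (location and speed) are left unchanged; only the neural states are replaced by `Z` before fitting.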

We computed the total Fisher information from the fitted GKRs (see Methods, Supplementary Fig. 8A for ${{{\mathscr{D}}}}_{s}$ and Fig. 5A for ${{{\mathscr{D}}}}_{s}^{\left(6\right)}$). Slope analyses indicate that Fisher information increases with running speed in both ${{{\mathscr{D}}}}_{s}$ and ${{{\mathscr{D}}}}_{s}^{\left(6\right)}$ (Fig. 5B for ${{{\mathscr{D}}}}_{s}^{\left(6\right)}$, Supplementary Fig. 8B for ${{{\mathscr{D}}}}_{s}$). Four out of nine experimental configurations in the ${{{\mathscr{D}}}}_{s}$ results do not show statistical significance, possibly due to the curse of dimensionality. This increase in Fisher information with running speed suggests that the effect of SSM dilation outweighs that of increasing noise, leading to improved spatial coding at higher speeds, which is qualitatively consistent with the results obtained using SCA (Fig. 1B–D).

Fisher information, derived from a purely bottom-up geometric approach, is in fact intrinsically linked to SCA (Fig. 1B). Specifically, we established a theoretical upper bound for SCA directly from Fisher information (see Methods). This bound is approximately a linear function of the square root of total Fisher information. Therefore, an increase in Fisher information implies a corresponding increase in the SCA upper bound.
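
A textbook relation, offered only as an illustration and not as the bound derived in the Methods, shows why such a bound grows with the square root of Fisher information. Assume two equal-covariance Gaussian classes whose locations differ by a small displacement $\delta {\bf{x}}$, and write $d^{\prime}$ for their discriminability and $I$ for the linear Fisher information matrix (the symbols $\delta {\bf{x}}$, $d^{\prime}$ and $\Phi$ are introduced here):

$$d^{\prime 2}\approx \delta {\bf{x}}^{T}\,I\left({\bf{x}}\right)\,\delta {\bf{x}},\qquad {\rm{accuracy}}\le \Phi \left(\frac{d^{\prime }}{2}\right)\approx \frac{1}{2}+\frac{d^{\prime }}{2\sqrt{2\pi }}$$

where $\Phi$ is the standard normal cumulative distribution function and the last approximation holds for small $d^{\prime}$. Under these assumptions the accuracy ceiling is, to first order, linear in $d^{\prime}$, which itself scales with the square root of the Fisher information along the displacement direction.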

We first tested this theoretical upper bound in synthetic datasets (Supplementary Fig. 9). We showed that the SCA computed directly from the synthetic datasets is well bounded by the theoretical upper bound predicted by the Fisher information. Moreover, the upper bound exhibits trends consistent with actual SCA across different dataset parameters (e.g., number of data points, dimensionality).

We then applied this analysis to grid cell datasets, computing the theoretical SCA upper bounds using Fisher information fitted from GKR. The computed upper bounds are well above the actual SCA values (Fig. 5C, Supplementary Fig. 8C). More importantly, both SCA and its upper bound exhibit similar speed modulation effects. The correlations between the predicted upper bounds and actual SCA are positive (Fig. 5D, Supplementary Fig. 8D).

Overall, Fisher information, derived from the geometric properties of the SSM and noise, quantitatively aligns with decoding performance measured via SCA. Both approaches support the conclusion that grid cell spatial coding improves with increasing running speed.

Results from GKR agree with those from a modified method that does not assume data normality

GKR approximates the distribution $p\left({{\bf{r}}}|{{\bf{x}}}\right)$ as a normal distribution, which might not always be valid (see Methods and Supplementary Fig. 10). To verify the results of the previous sections, we propose a modified approach, Gaussian process regression with kernel sampling (GKR-S), that does not assume data normality. GKR-S contains two steps: first, it uses Gaussian process regression to estimate the manifold (as in GKR); second, it resamples from neighboring labels ${{{\bf{x}}}}_{{{\boldsymbol{i}}}}$ to generate pseudo-samples and thereby estimate the conditional distribution (see Methods).

We evaluated GKR-S on a synthetic dataset and found that it performs well only for low-dimensional data (Supplementary Fig. 11A). Therefore, we projected the grid cell datasets onto their first six PCs and applied both GKR-S and GKR to estimate geometric properties (e.g., manifold size and Fisher information). The two methods yielded closely matching results, both revealing better spatial representation at higher running speeds. This indicates that, despite its normality assumption, GKR remains a reasonable method for estimating these geometric properties (Supplementary Fig. 11).

Speed modulation effects on grid cell information geometry can be qualitatively reproduced by a simulated independent Poisson Speed-Gain (IPSG) grid cell population

What do these population results imply about individual grid cells? Here, we propose a simple model—the Independent Poisson Speed-Gain (IPSG) grid cell population, which contains three key assumptions: (1) grid cell firings are independent; (2) grid cells are Poisson neurons; and (3) the effect of running speed on individual grid cell firing is a monotonically increasing gain factor¹¹.

Using these assumptions, we derived analytical expressions for manifold size, noise strength, and Fisher information as functions of running speed. Both these analytical results and additional numerical simulations demonstrate that the IPSG can qualitatively reproduce the observed positive speed modulation effects in Figs. 3E, 4C, 5B (Supplementary Fig. 12, see Methods).

Intuitively, in IPSG, increasing speed amplifies each cell’s firing without altering its spatial tuning. Because manifold size scales roughly with firing rate, it too grows with speed (see Methods, Fig. 3E, Supplementary Fig. 12). The Poisson-neuron assumption implies that noise (the standard deviation) scales as the square root of the firing rate, so noise increases more slowly than firing rate. As a result, each cell’s signal-to-noise ratio, and hence its Fisher information, rises with speed under the independent-firing assumption (see Methods). More loosely, IPSG suggests that faster running increases grid-cell firing (without disrupting the rate map too much) more than it increases noise, and that the effects of noise correlations are negligible, which would explain the observed speed modulations on the manifold (Figs. 3E, 4C, 5B). Whether this picture holds in real grid-cell data remains unclear. We leave more precise theoretical modeling and validation for future work, while this paper focuses on data-driven descriptive analysis.
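
The scaling argument can be written out under the three IPSG assumptions. The symbols $g(v)$ for the speed gain and $f_{i}({\bf{x}})$ for the speed-independent spatial tuning of cell $i$ are notation introduced here, and the variance is taken for spike counts in a unit time window; the derivation in the Methods may differ in detail.

$${\mu }_{i}\left({\bf{x}},v\right)=g\left(v\right){f}_{i}\left({\bf{x}}\right),\qquad \frac{\partial {\mu }_{i}}{\partial {\bf{x}}}=g\left(v\right)\frac{\partial {f}_{i}}{\partial {\bf{x}}}$$

$${\Sigma }_{ii}=g\left(v\right){f}_{i}\left({\bf{x}}\right),\qquad {\Sigma }_{ij}=0\ \left(i\ne j\right)$$

$${\rm{tr}}\,I=\sum _{i}\frac{{\left\Vert \partial {\mu }_{i}/\partial {\bf{x}}\right\Vert }^{2}}{{\Sigma }_{ii}}=g\left(v\right)\sum _{i}\frac{{\left\Vert \partial {f}_{i}/\partial {\bf{x}}\right\Vert }^{2}}{{f}_{i}\left({\bf{x}}\right)}$$

Tangent vectors scale with $g$, the noise variance scales with $g$ (so the standard deviation scales with $\sqrt{g}$), and the total Fisher information scales with $g$. All three therefore increase with speed when $g$ is monotonically increasing.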

Grid cell activity noise correlation is information-detrimental

Our results indicate that grid cell spatial coding improves at high speeds, based on our analysis of simultaneously recorded grid cell population activity. A key advantage of analyzing population activity, as opposed to individual neural responses, is that it inherently accounts for the effects of noise correlation on spatial coding. Here, we use the term ‘noise correlation’ specifically to refer to the cell-to-cell noise covariance (i.e., between two different cells). Noise correlation can be information-beneficial or information-detrimental, depending on the geometric relation between noise covariance and the information encoding manifold (Fig. 6A, also see a two-neuron toy example as an illustration in Supplementary Fig. 13)^24,26. In this section, we explicitly examine how noise correlation influences grid cell spatial coding.

**Fig. 6: Noise correlation in grid cells increases noise projection onto the manifold and impairs information coding.**

To explore the role of noise correlation in spatial coding, we compared outcomes from the original grid cell dataset (${{{\mathscr{D}}}}_{s}$) with those from a hypothetical population of independent firing grid cells (IFGC)²⁶. A conventional method to eliminate noise correlation is trial shuffling. For neural states recorded under identical conditions (e.g., the same spatial location) across different trials, one can randomly permute the firing profile of each neuron across trials. This shuffling preserves single-cell statistics while effectively disrupting intercellular noise correlation.

Although the OF task does not include repeated trials, the GKR serves as a generative model that can produce multiple data points for each condition. Specifically, for a given condition ${{\boldsymbol{x}}}$, the fitted GKR generates (theoretically) an infinite number of data points drawn from a Gaussian distribution (Eq. 1). After the trial-shuffling procedure is applied to these generated data points, the mean remains unchanged, but the covariance matrix becomes diagonal, with all off-diagonal elements set to zero. Thus, the IFGC’s GKR model is equivalent to the original GKR model with a purely diagonal covariance matrix.

We computed the total noise levels for both the GKR and IFGC GKR models (Fig. 6B). As expected, removing the off-diagonal elements of the covariance matrix does not alter its trace, leaving the overall noise unchanged. However, this manipulation affects the projected noise: the GKR model exhibits a higher projected noise than the IFGC GKR model (Fig. 6C). This finding indicates that cell-to-cell noise correlations in grid cell activity “reshape” the noise structure, directing a larger fraction of noise onto the torus surface. Consistent with this, the presence of noise correlation leads to smaller Fisher information (Fig. 6D), underscoring the detrimental impact of noise correlations on information encoding.
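
A two-neuron toy calculation makes the comparison concrete. The covariance values and tangent directions below are made up for illustration and are not taken from the data; the projection uses an orthonormal tangent basis, which is an assumption about the exact definition.

```python
import numpy as np

def projected_noise(Sigma, J):
    """Trace of the covariance restricted to the span of J's columns."""
    Q, _ = np.linalg.qr(J)
    return np.trace(Q.T @ Sigma @ Q)

def total_fisher(J, Sigma):
    """Trace of the linear Fisher information J^T Sigma^{-1} J."""
    return np.trace(J.T @ np.linalg.solve(Sigma, J))

Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])    # positively correlated pair
Sigma_ifgc = np.diag(np.diag(Sigma))          # independent-firing counterpart

J_aligned = np.array([[1.0], [1.0]])          # tangent along the correlated axis
J_orthogonal = np.array([[1.0], [-1.0]])      # tangent across the correlated axis

same_trace = np.isclose(np.trace(Sigma), np.trace(Sigma_ifgc))
proj_corr = projected_noise(Sigma, J_aligned)         # 1.8
proj_ifgc = projected_noise(Sigma_ifgc, J_aligned)    # 1.0
fi_corr = total_fisher(J_aligned, Sigma)              # about 1.11
fi_ifgc = total_fisher(J_aligned, Sigma_ifgc)         # 2.0
fi_corr_orth = total_fisher(J_orthogonal, Sigma)      # 10.0
```

When the tangent direction is aligned with the correlated noise axis, correlation raises projected noise and lowers Fisher information relative to the diagonal model; with the orthogonal tangent the same correlation would be beneficial, which is why the geometry has to be measured rather than assumed.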

To further validate that noise correlation is information-detrimental, we employed a top-down decoding approach based on linear classification accuracy of neural states from two spatial boxes (i.e., SCA in Fig. 1B), without using GKR. For the IFGC SCA, we randomly permuted each neuron’s firing rates within each box. This procedure preserves the single-cell statistics while effectively disrupting the noise correlations. Analysis of the permuted data showed that the IFGC SCA exceeded the SCA from the original datasets (Fig. 6E), thereby confirming that noise correlation is information-detrimental.
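
The within-box permutation can be sketched as below. The function name and the synthetic correlated data are assumptions for illustration.

```python
import numpy as np

def shuffle_within_box(R, rng=None):
    """Independently permute each neuron's values across samples from one box."""
    rng = np.random.default_rng(rng)
    out = np.empty_like(R)
    for j in range(R.shape[1]):
        out[:, j] = rng.permutation(R[:, j])
    return out

# Toy box: three hypothetical neurons sharing a common noise source.
rng = np.random.default_rng(0)
shared = rng.standard_normal((2000, 1))
R = shared + 0.5 * rng.standard_normal((2000, 3))
R_shuf = shuffle_within_box(R, rng=1)

corr_before = np.corrcoef(R.T)
corr_after = np.corrcoef(R_shuf.T)
```

Each column keeps exactly the same set of values, so single-cell statistics are preserved, while the pairing of values across neurons is broken. Applying this separately to each box before training the classifier would give the independent-firing version of the accuracy.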

Discussion

Accurate internal spatial representation is essential for navigation, and grid cells are widely regarded as a fundamental component of this process^2,3. Previous analyses of speed modulation on grid cell coding have predominantly focused on individual cells or cell pairs^11,12,13,14, thereby neglecting the influence of population noise covariance—a factor that can significantly impact coding fidelity²⁶. Here, we developed GKR to study the population coding from an information geometry perspective. We demonstrated that the grid cell manifold expands in size as speed increases. This manifold dilation effect exceeds the increase in noise, as indicated by the higher Fisher information observed at high speeds. Overall, our results favor the hypothesis that increasing running speed increases grid cell spatial coding accuracy. GKR can be a powerful tool to study neural population representation from an intuitive information geometric perspective.

However, GKR has its limitations. First, it does not perform well in very high-dimensional spaces, which may require a larger number of data points (Fig. 2 and Supplementary Fig. 4). This issue may be mitigated by first applying a dimensionality reduction method to the data⁴⁶. Second, GKR assumes that the data follow a normal distribution. While the normal distribution is a commonly used approximation^34,38, GKR may produce unreliable results if the true distribution deviates significantly from normality. It is advisable to perform a normality test (Supplementary Fig. 10), compute the likelihood of test data³⁴, or use an alternative method (e.g., GKR-S, Supplementary Fig. 11) to verify the results. Finally, GKR is only applicable when the data explicitly contain labels. In other words, its purpose is to evaluate the geometric representation properties of known labels of interest. For example, in vision⁴², navigation, or working memory⁴⁷ studies, the labels of interest are often defined by stimulus parameters. However, in cases where one aims to evaluate the representation structure of latent variables, it is necessary to first apply latent variable inference methods^46,48, and then apply GKR. Overall, future improvements to GKR could focus on enhancing its performance in high-dimensional settings, adapting it for non-normal data, and extending its applicability to scenarios without explicit labels.

We analyzed the population-level properties of grid cell activities, but what are the implications of these findings for individual grid cells? There are two major models of grid cells⁴⁹: the rate-based model and the oscillatory-interference model. First, in some rate-based models (continuous attractor networks²⁰), running speed serves as an input to the grid-cell network. Faster speeds may elevate firing rates^11,12, which can sharpen the spatial rate map, enhance the signal-to-noise ratio, and thus boost population Fisher information (Fig. 5). However, some other rate-based models (e.g., self-organizing models⁵⁰) contain normalization mechanisms that imply the opposite: the population mean firing rate should not change with speed. Second, in the oscillatory-interference model, running speed modulates the frequency of membrane potential oscillations, which might lead to more accurate spatial fields⁴⁹ (although a study suggests that MEC theta frequency is modulated by acceleration rather than speed⁵¹). Finally, higher running speeds imply more frequent encounters with environmental boundaries, allowing for more frequent corrections in grid coding¹⁸. Despite these conjectures, it should be noted that real grid cells are complicated and exhibit correlated noise. Models based solely on individual grid cells, without accounting for noise correlations, may result in substantial estimation errors, as shown by the pronounced discrepancies between the original data and the IFGC model in Fig. 6. The connection between population results and individual grid cells remains for future exploration.

Rats typically exhibit higher running speeds in novel environments⁵². Our results imply that this might enhance grid coding, thus supporting more effective adaptation to novel surroundings. Grid cell representations are known to change in novel environments^53,54,55. Some studies suggest that the grid pattern rescales in a novel environment (e.g., Barry et al. 2012⁵⁶); others propose that rats refine their grid coding by learning the environment’s boundaries^18,55. These alterations in individual grid cell patterns may reflect corresponding changes in the representation geometry, such as a rescaling of the toroidal structure or localized distortions near environmental boundaries. The effects of environmental modulation on population-level representations remain an open question for future investigation.

Beyond grid cells, how does running speed influence other cell types in the navigation system? Hardcastle et al. found that the spatial decoding accuracy of MEC neurons improves at higher speeds, suggesting that increased speed generally benefits MEC spatial representation¹⁰. This aligns with findings that, similar to grid cells, running speed predominantly increases the firing rates of other MEC cell types, including head direction cells, speed cells, and conjunctive cells^11,17. However, the modulation effects of running speed on hippocampal cells may be more complex. Grid cells are modeled as a primary feedforward input to the hippocampus^8,57, suggesting that running speed should also enhance place cell representation. However, this feedforward model is a simplification, as the hippocampus sends feedback projections to the MEC⁵⁸. Moreover, place cells receive inputs not only from grid cells but also from other sources, such as head direction cells⁵⁷. These additional mechanistic factors obscure how running speed modulates place cell activity. Indeed, earlier research suggests that the majority of place cells are not strongly modulated by running speed, at least not in an obvious manner⁵⁹. Nevertheless, beyond speed modulation, movement direction could influence the representational geometry of place cells, as it has been shown to reshape place fields⁶⁰. Extending geometric analyses to other cell types remains an interesting avenue.

Running speed modulation effects have been widely observed across other brain regions as well⁶¹. For example, locomotion primarily suppresses neural activities in the auditory cortex^62,63. In contrast, locomotion generally enhances V1 neuron activity^64,65, but may turn to suppression after certain high running speeds⁶⁶. In fact, the effect of locomotion modulation is usually entangled with other modulation factors^61,66,67. For example, V1 neural activities are jointly influenced by both animal’s running speed and visual stimuli movement speed⁶⁵. This influence can be mathematically expressed as a weighted sum of the two speed contributions, with weights varying diversely across neurons. The geometric approach has been shown to be a practically effective method to assist in understanding the diversity of individual neurons from a comprehensive population-level perspective^41,68,69. GKR can be a useful tool to understand the diversity of running speed modulations in different brain areas.

One advantage of GKR is its ability to provide detailed inspection of local geometry. Local geometry reveals the intricacies of information coding within a small range of values, which is particularly useful for comparing the representation bias of different information values. Representation bias has been observed in the navigation system^60,70. For instance, place and grid cells’ fields tend to shift towards reward locations, which has been interpreted as an overrepresentation of rewarded locations^70,71,72,73. From a geometric perspective, overrepresentation implies larger Fisher information, which can be attributed to either local manifold dilation, reduced projected noise, or both (Figs. 3, 4, 5). The concepts of local manifold dilation and reduced noise have been supported in working memory studies: (1) The working memory system may use attractors to reduce noise^47,74,75. (2) Recurrent neural networks (RNNs) trained on working memory tasks utilize larger state spaces to represent common values, thus yielding improved Fisher information⁴⁷. In RNNs, manifolds are often observed to be quite simple, usually taking the form of a low-dimensional ring structure⁴⁷. This simplicity allows the size of the encoding space to be measured using straightforward methods. However, in the actual brain, manifolds can be highly complex and high-dimensional⁶⁹. The GKR method illustrated in this paper can be particularly helpful in studying the local structure of these complex, high-dimensional manifolds, assisting the analysis of representation bias.

Methods

Experimental data

Experimental data were collected by Gardner et al.³⁵. Rats performed open-field foraging (OF) tasks in a 150 cm wide OF box. Three-dimensional motion capture tracked the rats’ head positions and orientations using five retroreflective markers attached to the implant during recordings. The 3D marker positions were then projected onto the horizontal plane to determine the rats’ 2D positions. Neuropixel probes recorded neural activity in the MEC. Neural activity was then processed using a clustering method to classify neurons into grid cells and non-grid cells³⁵. In total, these procedures yielded nine sets of simultaneously recorded grid cell population activities (i.e., nine experimental configurations): rat ‘R’ day 1 modules 1, 2, 3; rat ‘R’ day 2 modules 1, 2, 3; rat ‘S’ module 1; and rat ‘Q’ modules 1, 2. We used a shorthand notation, e.g., “R1M2”, to represent rat R (“R”) on day 1 (“1”) and grid cell module two (“M2”). Note that “R1” does not necessarily refer to the same day as “S1”. Day labels are used solely to distinguish recordings from the same rat. These processed data are available from Gardner et al. 2022³⁵.

Grid cell rate map

The grid cell rate maps shown in Fig. 1A and Supplementary Fig. 1 were computed as follows. Firing rate was estimated by dividing the spike counts in 10-ms time bins by the bin width and then convolving the result with a Gaussian filter with a standard deviation of 20 ms. To estimate the averaged firing rate at different locations, the OF box ($150\times 150$ cm) was digitized into small spatial bins of $3\times 3$ cm. Firing rates at each visited spatial bin were averaged, and those at each unvisited bin were set to 0. To correct the effect of unvisited bins, we created a mask $({M}_{0})$ with a value of 1 at the visited bins and 0 at unvisited bins. Next, both the firing rate and mask ${M}_{0}$ were spatially convolved with a 2D Gaussian filter with a standard deviation $\sigma=8.25$ cm. The convolved firing rate was divided by the convolved ${M}_{0}$ to obtain the final corrected rate map for each cell.
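The mask-normalized smoothing described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names (`gaussian_kernel_1d`, `smooth_2d`, `corrected_rate_map`), the kernel truncation at four standard deviations, and the zero padding at the box edges are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel_1d(sigma_bins, truncate=4.0):
    """Normalized 1D Gaussian kernel, truncated at `truncate` standard deviations."""
    radius = int(truncate * sigma_bins + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma_bins) ** 2)
    return k / k.sum()

def smooth_2d(arr, sigma_bins):
    """Separable 2D Gaussian smoothing with zero padding at the borders."""
    k = gaussian_kernel_1d(sigma_bins)
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, arr)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, out)

def corrected_rate_map(pos_xy, rate, box=150.0, bin_cm=3.0, sigma_cm=8.25):
    """pos_xy: (T, 2) positions in cm; rate: (T,) firing rate of one cell."""
    n = int(round(box / bin_cm))
    idx = np.clip((pos_xy // bin_cm).astype(int), 0, n - 1)
    rate_sum = np.zeros((n, n))
    counts = np.zeros((n, n))
    np.add.at(rate_sum, (idx[:, 0], idx[:, 1]), rate)
    np.add.at(counts, (idx[:, 0], idx[:, 1]), 1)
    visited = counts > 0
    mean_rate = np.zeros((n, n))                 # unvisited bins stay at 0
    mean_rate[visited] = rate_sum[visited] / counts[visited]
    mask = visited.astype(float)                 # M_0: 1 at visited bins, 0 elsewhere
    sigma_bins = sigma_cm / bin_cm
    num = smooth_2d(mean_rate, sigma_bins)
    den = smooth_2d(mask, sigma_bins)
    # Dividing by the smoothed mask removes the bias from unvisited bins.
    return np.divide(num, den, out=np.full((n, n), np.nan), where=den > 0)
```

Dividing by the smoothed mask means that a cell with a spatially constant rate yields a constant map even when some bins were never visited, which is the purpose of the correction.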

Gridness

Gridness measures how well a grid cell’s rate map conforms to a hexagonal pattern¹². Some grid cells’ rate maps have incomplete peaks at the OF box boundaries. To correct this boundary effect, the rate map was first padded by 30 cm on each side. This padding was performed by linearly ramping the firing rate at the edges to zero over the outer 30 cm of the padded area (implemented using the ‘numpy.pad’ function in Python, with ‘mode='linear_ramp'’). Autocorrelating the padded rate map produced an autocorrelation map. The boundary effect of the autocorrelation was corrected by padding with zeros on all sides (implemented using ‘scipy.signal.correlate2d(padded_rate_map, padded_rate_map, mode='same', boundary='fill', fillvalue=0)’).

The autocorrelation map was masked by two circles centered at the map’s center. The outer circle’s diameter matched the edge length of the autocorrelation map. The inner circle’s area was 15% of the outer circle’s area to filter out center peaks on the map. Only the regions between the two circles were kept; the rest were set to 0. Next, the masked autocorrelation map was correlated with its rotated versions (rotated by 30, 60, 90, 120, and 150 degrees, respectively). A well-defined grid cell should have peak correlation values at 60 and 120 degrees, and valleys at 30, 90, and 150 degrees. Gridness was calculated by subtracting the average valley values (30, 90, and 150 degrees) from the average peak values (60 and 120 degrees).
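The rotation-correlation score can be sketched as below, taking an already computed autocorrelation map as input. This is an illustrative sketch under stated assumptions: the nearest-neighbour rotation (`rotate_nearest`) is a dependency-free stand-in for an interpolating image rotation, the Pearson correlation is evaluated only over pixels inside the ring, and all function names are hypothetical.

```python
import numpy as np

def rotate_nearest(img, angle_deg):
    """Rotate a square image about its centre using nearest-neighbour sampling."""
    n = img.shape[0]
    c = (n - 1) / 2.0
    yy, xx = np.mgrid[0:n, 0:n]
    t = np.deg2rad(angle_deg)
    # Inverse mapping: for every output pixel, locate the source pixel.
    xs = np.cos(t) * (xx - c) + np.sin(t) * (yy - c) + c
    ys = -np.sin(t) * (xx - c) + np.cos(t) * (yy - c) + c
    xi = np.rint(xs).astype(int)
    yi = np.rint(ys).astype(int)
    inside = (xi >= 0) & (xi < n) & (yi >= 0) & (yi < n)
    out = np.zeros_like(img, dtype=float)
    out[inside] = img[yi[inside], xi[inside]]
    return out

def gridness(autocorr, inner_area_frac=0.15):
    """Mean correlation at 60/120 degrees minus mean correlation at 30/90/150 degrees."""
    n = autocorr.shape[0]
    c = (n - 1) / 2.0
    yy, xx = np.mgrid[0:n, 0:n]
    r = np.hypot(xx - c, yy - c)
    r_out = n / 2.0                              # outer diameter = map edge length
    r_in = np.sqrt(inner_area_frac) * r_out      # inner area = 15% of outer area
    ring = (r >= r_in) & (r <= r_out)
    masked = np.where(ring, autocorr, 0.0)       # everything outside the ring is 0

    def corr_at(angle):
        rot = rotate_nearest(masked, angle)
        return np.corrcoef(masked[ring], rot[ring])[0, 1]

    peaks = np.mean([corr_at(a) for a in (60, 120)])
    valleys = np.mean([corr_at(a) for a in (30, 90, 150)])
    return peaks - valleys
```

A map with six-fold rotational symmetry gives a score close to 1, whereas an unstructured map gives a score near 0.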

Data preprocessing

Time was binned in 10-ms intervals. The spike count in each time bin was computed and divided by 10 ms to estimate the firing rate. The firing rate was then temporally smoothed using a Gaussian kernel with a standard deviation of 20 ms. To estimate the rat’s speed, velocity was first computed from central finite differences of the rat’s positions, i.e., $\left({{{\bf{p}}}}_{i+1}-{{{\bf{p}}}}_{i-1}\right)/20$, where ${{{\bf{p}}}}_{i}$ is the rat’s position at time bin $i$ and 20 ms is the time spanned by two bins. The velocity’s L2 norm is the speed. This procedure provided a feature map of grid cell firing rates, with rows representing time bins and columns representing grid cell IDs, and a label with rows representing time bins and three columns indicating $x$ location, $y$ location, and speed. Data with speeds lower than 5 cm/s or higher than 45 cm/s were excluded. Grid cells with low gridness (below 0.1) were also excluded. The combined feature map and label are termed a grid cell dataset, denoted as ${{\mathscr{D}}}$. There are 9 grid cell datasets corresponding to different experimental configurations (different rats, grid cell modules, and different days). The number of grid cells in each dataset is: 113 in R1M1, 132 in R1M2, 51 in R1M3, 140 in R2M1, 153 in R2M2, 62 in R2M3, 96 in S1M1, 81 in Q1M1, 53 in Q1M2.
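The speed estimate and the speed filter can be illustrated with a short sketch. It assumes positions in cm sampled once per 10-ms bin and already smoothed firing rates; the function names and the handling of the first and last bins (set to NaN and dropped) are choices made for this example only.

```python
import numpy as np

def speed_from_positions(pos_cm, dt_s=0.01):
    """Central-difference speed in cm/s; the first and last bins are NaN."""
    vel = np.full_like(pos_cm, np.nan, dtype=float)
    vel[1:-1] = (pos_cm[2:] - pos_cm[:-2]) / (2 * dt_s)   # (p_{i+1} - p_{i-1}) / 20 ms
    return np.linalg.norm(vel, axis=1)                    # L2 norm of the velocity

def build_dataset(rates, pos_cm, v_min=5.0, v_max=45.0):
    """Return (feature map, label) with columns x, y, speed, keeping 5-45 cm/s."""
    speed = speed_from_positions(pos_cm)
    keep = (speed >= v_min) & (speed <= v_max)            # NaN compares False
    labels = np.column_stack([pos_cm[keep], speed[keep]])
    return rates[keep], labels
```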

The speed distribution in ${{\mathscr{D}}}$ is highly biased: it has more data in the low-speed region than in the high-speed region (Supplementary Fig. 2). This imbalance may bias the evaluation. To avoid this, we resampled the dataset as follows. Speeds ranging from 5 cm/s to 45 cm/s were binned into 5 cm/s bins, and the data in each speed bin were collected. Letting ${N}_{{sp}}^{\min }$ denote the minimum number of data points among all speed bins, we defined $K=\min \{{N}_{{sp}}^{\min },{\mathrm{10,000}}\}$. In each speed bin, we sampled $K$ data points (without replacement). Sampled data points from different speed bins were combined to create a single sampled dataset, denoted as ${{{\mathscr{D}}}}_{s}$. ${{{\mathscr{D}}}}_{s}$ has a roughly equal amount of data at each speed value. The above sampling procedure was repeated 50 times, resulting in 50 sampled datasets ${{{\mathscr{D}}}}_{s}$ per experimental configuration.
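One way to picture the speed-balanced resampling is the sketch below, which returns row indices for a single sampled dataset. The function name and the use of a NumPy random generator are assumptions for illustration.

```python
import numpy as np

def speed_balanced_sample(speed, rng, v_min=5.0, v_max=45.0, width=5.0, k_cap=10_000):
    """Draw K indices without replacement from each 5 cm/s speed bin."""
    edges = np.arange(v_min, v_max + width, width)        # 5, 10, ..., 45
    bin_id = np.digitize(speed, edges) - 1                # 0..7 for speeds in [5, 45)
    n_bins = len(edges) - 1
    members = [np.flatnonzero(bin_id == b) for b in range(n_bins)]
    k = min(min(len(m) for m in members), k_cap)          # K = min(N_sp^min, 10,000)
    picks = [rng.choice(m, size=k, replace=False) for m in members]
    return np.concatenate(picks)
```

Calling the function 50 times with the same generator would give 50 different index sets, one per sampled dataset.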

As a baseline comparison, we also shuffled the data ${{\mathscr{D}}}$ by permuting the label timestamps, thereby disrupting the relationship between neural states and labels. This permuted data was then processed using the same sampling procedure as described above, yielding 50 label-shuffled-sampled datasets.

The dimensionality of ${{{\mathscr{D}}}}_{s}$ is the number of grid cells, which can be more than 100. This can pose a challenge in accurately estimating covariance and Fisher information²⁴. Therefore, we also performed PCA on ${{{\mathscr{D}}}}_{s}$, projecting onto the first 6 principal components to obtain ${{{\mathscr{D}}}}_{s}^{\left(6\right)}$. The same projection procedure was applied to the shuffled datasets. These projected datasets were used for estimating Fisher information in Fig. 5 and for comparison with GKR-S in Supplementary Fig. 11.

Spatial coding accuracy

A common way to evaluate the quality of neural population representation is to assess how accurately a simple linear classifier can distinguish between neural population representations of two adjacent experimental conditions (e.g., stimulus parameters or locations in this paper)²². In this paper, this type of classification accuracy is referred to as spatial coding accuracy (SCA, Fig. 1B).

For a given sampled dataset ${{{\mathscr{D}}}}_{s}$, we split it into 8 speed-split datasets (SSD) based on speed values. Specifically, data with speed values within $\left[{v}_{i},{v}_{i}+5{\mbox{cm}}/{\mbox{s}}\right]$ were collected as one SSD, where ${v}_{i}={\mathrm{5,10}},\ldots,40{\mbox{cm}}/{\mbox{s}}$. For each SSD, we randomly sampled 300 spatial locations ${{{\bf{x}}}}_{c}$. For each location ${{{\bf{x}}}}_{c}$, we constructed two adjacent locations ${{{\bf{x}}}}_{\pm }={{{\bf{x}}}}_{c}\pm {{\rm{\delta }}}l\hat{{{\bf{e}}}}$, where $\hat{{{\bf{e}}}}$ is a unit vector with a random angle, ${{\rm{\delta }}}l=5{\mbox{cm}}$. Each ${{{\bf{x}}}}_{\pm }$ defines a small spatial box, centered at ${{{\bf{x}}}}_{\pm }$ with an edge length of 10 cm. Data from the two boxes were collected. To ensure fair classification, the data from the box with the larger number of data points were subsampled (without replacement) so that both boxes had an equal number of data points. The data from the two boxes were then concatenated. If the total number of data points was less than 50, this ${{{\bf{x}}}}_{c}$ was discarded due to insufficient data. Otherwise, the concatenated data was split into train and test sets (0.67:0.33). A logistic classifier (with an L2 regularization coefficient C = 1, using the scikit-learn package) was then trained on the train set and evaluated on the test set. The classification accuracy averaged across all valid ${{{\bf{x}}}}_{c}$ is defined as the SCA of that speed bin $\left[{v}_{i},{v}_{i}+5{\mbox{cm}}/{\mbox{s}}\right]$. This procedure was applied to all speed bins, ${{{\mathscr{D}}}}_{s}$ (or ${{{\mathscr{D}}}}_{s}^{\left(6\right)}$ in Fig. 5C), and the label-shuffled dataset.
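The SCA computation for a single centre location ${{{\bf{x}}}}_{c}$ can be sketched as follows. To keep the example dependency-free, a small gradient-descent logistic regression (`fit_logistic_l2`) stands in for the scikit-learn classifier used in the paper; this stand-in, the z-scoring of features on the training split, and all function names are assumptions of the sketch, not the authors' implementation.

```python
import numpy as np

def fit_logistic_l2(X, y, C=1.0, n_iter=500):
    """L2-penalized logistic regression by gradient descent (bias unpenalized)."""
    n, d = X.shape
    # Step size from an upper bound on the curvature of the objective.
    L = 0.25 * np.linalg.eigvalsh(X.T @ X / n).max() + 0.25 + 1.0 / (C * n)
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30, 30)
        p = 1.0 / (1.0 + np.exp(-z))
        w -= (X.T @ (p - y) + w / C) / n / L
        b -= np.mean(p - y) / L
    return w, b

def sca_for_location(states, xy, x_c, rng, dl=5.0, edge=10.0, min_points=50):
    """Accuracy of separating two 10-cm boxes centred at x_c +/- dl * e."""
    theta = rng.uniform(0, 2 * np.pi)
    e = np.array([np.cos(theta), np.sin(theta)])          # random unit vector
    boxes = []
    for centre in (x_c + dl * e, x_c - dl * e):
        in_box = np.all(np.abs(xy - centre) <= edge / 2, axis=1)
        boxes.append(states[in_box])
    n = min(len(b) for b in boxes)
    if 2 * n < min_points:
        return None                                       # insufficient data: discard x_c
    # Subsample the larger box so both classes have n points.
    boxes = [b[rng.choice(len(b), n, replace=False)] for b in boxes]
    X = np.vstack(boxes)
    y = np.repeat([0.0, 1.0], n)
    order = rng.permutation(2 * n)
    n_train = int(round(0.67 * 2 * n))                    # 0.67 : 0.33 split
    tr, te = order[:n_train], order[n_train:]
    mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0) + 1e-12
    w, b = fit_logistic_l2((X[tr] - mu) / sd, y[tr])
    pred = ((X[te] - mu) / sd) @ w + b > 0
    return np.mean(pred == (y[te] > 0.5))
```

Averaging the returned accuracy over all valid centres within one speed bin would give the SCA of that bin.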

Besides using logistic regression for classification (Fig. 1), we also computed SCA using a perceptron and support vector machines; see the results in Supplementary Fig. 3.

Bayesian linear ensemble averaging and statistical testing

Metrics considered in this paper include SCA (e.g., Figs. 1C, D, 5C), torus radius, lattice area (e.g., Fig. 3E), total and projected noise (e.g., Fig. 4B, C), and Fisher information (e.g., Fig. 5, Supplementary Figs. 8A, B), etc. For each sampled dataset ${{{\mathscr{D}}}}_{s}$, we computed the metric values at different speed bins, forming a metric-speed dataset consisting of metric value ${t}_{i}$ and the corresponding speed value ${v}_{i}$, where $i$ indexes the i th data point in the metric-speed dataset. For example, one dot in Fig. 1C is one data point in the SCA-speed dataset (with a corresponding ${{{\mathscr{D}}}}_{s}$). For convenience, we define ${{{\bf{x}}}}_{{{\boldsymbol{i}}}}=\left({v}_{i},1\right)$, which includes speed and a constant for a bias parameter. Currently, we limit our discussion to one ${{{\mathscr{D}}}}_{s}$, and later we will ensemble results from different ${{{\mathscr{D}}}}_{s}$ by Bayesian model averaging³⁹.

Given a metric-speed dataset from one ${{{\mathscr{D}}}}_{s}$, we used Bayesian linear regression (BLR) to fit the linear relationship between a metric and speed³⁸. The benefit of BLR over ordinary least squares is that BLR naturally provides a way to set the regularization parameter (by setting the prior) and offers the posterior distribution of inferred parameters (e.g., slope), thus enabling a pure Bayesian analysis of the data. We follow the implementation of BLR in Bishop 2006³⁸.

In BLR, the relationship between metric and speed is modeled as

$$t=y\left({{\bf{x}}},{{\bf{w}}}\right)+\epsilon$$

(2)

where $\epsilon \sim {{\mathscr{N}}}\left(\epsilon |0,{\beta }^{-1}\right)$, $\beta$ is a scalar representing precision, and $y={{{\bf{w}}}}^{T}{{\bf{x}}}$. This equation indicates that the conditional distribution $p\left(t|{{\bf{x}}},{{\bf{w}}}\right)$ is a Gaussian distribution with mean $y$ and variance ${\beta }^{-1}$. The prior for parameter ${{\bf{w}}}$ is ${{\bf{w}}}\sim {{\mathscr{N}}}\left({{\bf{w}}}|0,{\alpha }^{-1}{\mathbb{I}}\right)$ where ${\mathbb{I}}$ is a $2\times 2$ identity matrix, and $\alpha$ is a scalar. Given the prior and conditional distribution, we can derive the posterior distribution $p\left({{\bf{w}}}|{{\bf{t}}},X\right)$ and predictive distribution $p\left({t}_{q}|{{{\bf{x}}}}_{q},{{\bf{t}}},X\right)$, where ${{{\bf{x}}}}_{q}$ is the query label, ${{\bf{t}}}$ and $X$ are the data points in the metric-speed dataset, ${t}_{q}$ is the prediction. Both posterior and predictive distributions are Gaussian distributions. Hyperparameters $\alpha$ and $\beta$ were estimated by maximizing the marginal likelihood $p\left({{\bf{t}}}|\alpha,\beta,X\right)$ through an iterative method³⁸.

Overall, given a metric-speed dataset (obtained from one sampled dataset ${{{\mathscr{D}}}}_{s}$), BLR provides the posterior distribution $p\left({{\bf{w}}}|{{\bf{t}}},X\right)$ and predictive distribution $p\left({t}_{q}|{{{\bf{x}}}}_{q},{{\bf{t}}},X\right).$ To simplify the notation, the two distributions are written as $p\left({{\bf{w}}}|{{{\mathscr{D}}}}_{s}\right)$ and $p\left({t}_{q}|{{{\bf{x}}}}_{q},{{{\mathscr{D}}}}_{s}\right)$, which follow ${{\mathscr{N}}}\left({{\bf{w}}}|{{{\bf{m}}}}_{w,s},{\Sigma }_{w,s}\right)$ and ${{\mathscr{N}}}\left({t}_{q}|{m}_{t,s},{\Sigma }_{t,s}\right)$, respectively.
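For readers who want to see the BLR step concretely, the sketch below fits the slope and bias with the standard evidence-maximization updates from Bishop 2006³⁸ (re-estimating $\alpha$ and $\beta$ from the effective number of parameters $\gamma$). It is a minimal illustration under the assumption that these textbook updates correspond to the iterative method referred to above; the function names and the stopping tolerance are hypothetical.

```python
import numpy as np

def fit_blr(v, t, n_iter=100, tol=1e-8):
    """Bayesian linear regression t = w^T x + eps, with x = (v, 1)."""
    X = np.column_stack([v, np.ones_like(v)])
    n, m = X.shape
    alpha, beta = 1.0, 1.0
    eig0 = np.linalg.eigvalsh(X.T @ X)
    for _ in range(n_iter):
        S = np.linalg.inv(alpha * np.eye(m) + beta * X.T @ X)   # posterior covariance
        mean = beta * S @ X.T @ t                               # posterior mean
        lam = beta * eig0
        gamma = np.sum(lam / (alpha + lam))                     # effective parameters
        alpha_new = gamma / (mean @ mean)
        beta_new = (n - gamma) / np.sum((t - X @ mean) ** 2)
        done = (abs(alpha_new - alpha) < tol * alpha
                and abs(beta_new - beta) < tol * beta)
        alpha, beta = alpha_new, beta_new
        if done:
            break
    S = np.linalg.inv(alpha * np.eye(m) + beta * X.T @ X)
    mean = beta * S @ X.T @ t
    return mean, S, alpha, beta

def predict(x_q, mean, S, beta):
    """Mean and variance of the Gaussian predictive distribution at query x_q."""
    return mean @ x_q, 1.0 / beta + x_q @ S @ x_q
```

Here `mean[0]` is the posterior mean of the slope and `S[0, 0]` its posterior variance, which are the quantities carried forward to the model-averaging step.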

We aggregate results from sampled datasets ${{{\mathscr{D}}}}_{s}$, i.e., $p\left({{\bf{w}}}|{{\mathscr{D}}}\right)$ and $p\left({t}_{q}|{{{\bf{x}}}}_{q}{,}{{\mathscr{D}}}\right)$, by Bayesian model averaging³⁹. Since each ${{{\mathscr{D}}}}_{s}$ is a random subsampling (under the restriction of an equal number of data points in each speed bin, see Methods: Data Preprocessing) of the ${{\mathscr{D}}}$, $p\left({{\bf{w}}}|{{\mathscr{D}}}\right)={\sum }_{s}p\left({{\bf{w}}}|{{{\mathscr{D}}}}_{s}\right)p\left({{{\mathscr{D}}}}_{s}|{{\mathscr{D}}}\right)={\sum }_{s}p\left({{\bf{w}}}|{{{\mathscr{D}}}}_{s}\right)/B$ where $B=50$ is the total number of sampled datasets. $p\left({{\bf{w}}}|{{\mathscr{D}}}\right)$ is a mixture of Gaussian distributions. For simplicity, we approximate it as a single Gaussian distribution (see details in SI Methods). The mean of the approximated $p\left({{\bf{w}}}|{{\mathscr{D}}}\right)$ is

$${{{\bf{m}}}}_{w}=\frac{1}{B}{\sum }_{s}{{{\bf{m}}}}_{w,s}$$

(3)

The covariance is

$${\Sigma }_{w}=\frac{1}{B}{\sum }_{s}{\Sigma }_{w,s}+\frac{1}{B}{\sum }_{s}\left({{{\bf{m}}}}_{w,s}-{{{\bf{m}}}}_{w}\right){\left({{{\bf{m}}}}_{w,s}-{{{\bf{m}}}}_{w}\right)}^{T}$$

(4)

where the first term is the average of the covariances and the second term represents bias, i.e., the spread of the per-dataset means around ${{{\bf{m}}}}_{w}$.
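Equations (3) and (4) amount to moment-matching an equally weighted Gaussian mixture, which can be written in a few lines. The function name `average_posteriors` is hypothetical.

```python
import numpy as np

def average_posteriors(means, covs):
    """Moment-match an equally weighted mixture of B Gaussians (Eqs. 3 and 4)."""
    means = np.asarray(means, dtype=float)      # shape (B, 2): m_{w,s}
    covs = np.asarray(covs, dtype=float)        # shape (B, 2, 2): Sigma_{w,s}
    m_w = means.mean(axis=0)                    # Eq. 3
    dev = means - m_w
    between = dev.T @ dev / len(means)          # spread of the means across datasets
    return m_w, covs.mean(axis=0) + between     # Eq. 4
```

For example, two components with means (0, 0) and (2, 0) and identity covariances give a pooled mean of (1, 0) and a pooled covariance with diagonal (2, 1).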

Similarly, $p\left({t}_{q}|{x}_{q}{,}{{\mathscr{D}}}\right)$ can be approximated by a Gaussian distribution with a certain mean and covariance matrix (see Supplementary Methods). In fact, the form of this mean and covariance matrix implies that, after the Gaussian approximation of $p\left({t}_{q}|{x}_{q}{,}{{\mathscr{D}}}\right)$, ${t}_{q}$ is still a linear function of ${{\bf{w}}}$, which can be written explicitly as

$${t}_{q}={{{\bf{w}}}}^{{{\rm{T}}}}{{{\bf{x}}}}_{{{\rm{q}}}}+{{\boldsymbol{\epsilon }}}$$

(5)

where ${{\bf{w}}}\sim {{\mathscr{N}}}\left({{\bf{w}}}{|}{{{\bf{m}}}}_{w},{\Sigma }_{w}\right)$, ${{\boldsymbol{\epsilon }}}\sim {{\mathscr{N}}}\left({{\boldsymbol{\epsilon }}};0,{\sum }_{s}{\beta }_{s}^{-1}/B\right)$, and ${\beta }_{s}$ is the best hyperparameter fitted using an iteration method from a sampled dataset ${{{\mathscr{D}}}}_{s}$ (see above). Overall, we obtained $p\left({{\bf{w}}}|{{\mathscr{D}}}\right)$ and $p\left({t}_{q}|{{{\bf{x}}}}_{{{\boldsymbol{q}}}}{,}{{\mathscr{D}}}\right)$, which are both Gaussian distributions. This overall method pipeline is called Bayesian linear ensemble averaging (BLEA) in this paper. Mathematical details can be found in Supplementary Methods.

$p\left({{\bf{w}}}|{{\mathscr{D}}}\right)$ and $p\left({t}_{q}|{{{\bf{x}}}}_{q}{,}{{\mathscr{D}}}\right)$ allow us to estimate the confidence interval (CI) and assess statistical significance within a Bayesian framework⁴⁰. First, since the predictive distribution $p\left({t}_{q}|{{{\bf{x}}}}_{q}{,}{{\mathscr{D}}}\right)$ is Gaussian, the 95% CI of the prediction (two-tailed) is given by an interval [a, b] such that $\Phi \left(\left(a-\mu \right)/\sigma \right)=0.025$ and $\Phi \left(\left(b-\mu \right)/\sigma \right)=0.975$, where $\Phi \left(\cdot \right)$ is the cumulative distribution function of a standard Gaussian distribution, and $\mu$ and $\sigma$ are the predictive distribution mean and standard deviation. One example CI can be found in Fig. 1C, which covers most of the data points. The 95% CI is also called a credible interval in the Bayesian framework⁴⁰.

Similarly, knowing the posterior distribution of slope $p\left({{\bf{w}}}|{{\mathscr{D}}}\right)$, we can also compute its 95% CI. For example, this is illustrated by the error bars in Fig. 1D.

We are interested in whether the slope fitted from ${{\mathscr{D}}}$ is statistically different from that fitted from the label-shuffled dataset (e.g., Fig. 1D). Therefore, we also prepared label-shuffled ${{{\mathscr{D}}}}_{s}$ from ${{\mathscr{D}}}$ (see Methods: Data Preprocessing), and ran the above analysis to obtain their label-shuffled posterior and predictive distributions. Both posterior distributions (original and label-shuffled) are Gaussian; hence, the slope difference $d={w}^{{data}}-{w}^{{shuffle}}$ also follows a Gaussian distribution, with a mean equal to the difference between the two slope means and a variance equal to the sum of the two variances. Based on the distribution of $d$, the probability of direction, ${p}_{d}$, can be computed as the maximum of $P\left(d > 0\right)$ and $P\left(d < 0\right)$. ${p}_{d}$ represents the probability that $d$ is positive or negative (whichever is more probable). It directly relates to the two-sided p-value (from a frequentist framework) by $p=2\times \left(1-{p}_{d}\right)$, where the null hypothesis is that $d=0$ and the alternative hypothesis is that $d\ne 0$⁴⁰. Thus, statistical statements can be made based on the p-values. The one-sided p-value is ${p}_{d}$ or $1-{p}_{d}$, depending on the direction tested.
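The probability of direction and its two-sided p-value follow directly from the Gaussian distribution of $d$. The sketch below uses only the Python standard library; the function names are hypothetical, and the inputs are the posterior means and variances of the two slopes.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of a standard Gaussian."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_direction(m_data, var_data, m_shuffle, var_shuffle):
    """Return (p_d, two-sided p) for d = w_data - w_shuffle."""
    mu = m_data - m_shuffle                     # mean of d
    sd = math.sqrt(var_data + var_shuffle)      # variances add for the difference
    p_pos = 1.0 - normal_cdf((0.0 - mu) / sd)   # P(d > 0)
    p_d = max(p_pos, 1.0 - p_pos)
    return p_d, 2.0 * (1.0 - p_d)
```

When the two slope distributions coincide, $p_d = 0.5$ and $p = 1$; when the mean of $d$ lies 1.96 standard deviations from zero, $p$ is approximately 0.05.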

We are also interested in whether the speed-averaged metric computed under the ${{\mathscr{D}}}$ is statistically different from that computed under the hypothetical independent firing grid cell assumption (IFGC, Fig. 6). For each ${{{\mathscr{D}}}}_{s}$, we averaged the metric value across speed values. This gives one ${\bar{t}}_{s}$. Fifty ${\bar{t}}_{s}$ were concatenated and fitted by a Gaussian distribution using the maximum log-likelihood method, as an approximation of $p\left({\bar{t}}_{s}|{{\mathscr{D}}}\right)$. Therefore, we can use the same method above to determine whether the speed-averaged metric computed under original dataset is statistically different from that computed from the IFGC (Fig. 6B–E).

Bin average and Ledoit-Wolf estimator

We approximate the neural population responses (neural states for short) as a Gaussian distribution:

$${{\bf{r}}}\left({{\bf{x}}}\right)={{\mathbf{\mu }}}\left({{\bf{x}}}\right)+{{\mathscr{N}}}\left({{\boldsymbol{\epsilon }}};0,\Sigma \left({{\bf{x}}}\right)\right)$$

(6)

where ${{\bf{r}}}\in {{{\mathscr{R}}}}^{N}$ represents a neural state containing N neurons, and ${{\bf{x}}}\in {{{\mathscr{R}}}}^{M}$ represents M labels. Labels are defined broadly. They can be stimulus parameters (e.g., grating image orientation, object positions), an agent’s latent state (e.g., latent dynamics factor, emotion), or an agent’s behavioral labels (e.g., agent speed, agent position). ${{\mathbf{\mu }}}$ is the mean of neural state, modeled as a continuous function of the labels. ${{\mathbf{\mu }}}$ is also referred to as a manifold. ${{\boldsymbol{\epsilon }}}$ is white noise with a covariance $\Sigma \left({{\bf{x}}}\right)$. Given noisy neural states ${{\bf{r}}}$ and corresponding labels ${{\bf{x}}}$, our goal is to infer the smoothly varying manifold ${{\mathbf{\mu }}}$ and covariance $\Sigma$.

Bin averaging is a straightforward estimation method. This approach divides the entire range of label ${{\bf{x}}}$ into small bins. Data points ${{{\bf{r}}}}_{i}$ within each bin are considered to have the same label ${{{\bf{x}}}}_{i}$. Hence, the manifold can be estimated by sample average ${{\mathbf{\mu }}}\left({{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)={\left\langle {{\bf{r}}}\right\rangle }_{{{{\bf{x}}}}_{{{\boldsymbol{i}}}}}$, where ${\left\langle \cdot \right\rangle }_{{{{\bf{x}}}}_{{{\boldsymbol{i}}}}}$ denotes averaging over the data points within bin ${{{\bf{x}}}}_{{{\boldsymbol{i}}}}$. Similarly, the covariance $\Sigma$ can be estimated by the sample covariance matrix.

However, when the number of data points in each small bin is sparse and the neural state dimensionality is high (i.e., a large number of recorded neurons), bin averaging can lead to unreliable—and sometimes even non-invertible—estimation of the covariance matrix²⁴. To address this, the shrinkage method was proposed. This method is equivalent to adding L2 regularization to the maximum likelihood estimation of the covariance matrix, guiding the estimation towards a more structured assumption (e.g., an identity matrix)⁴³. In particular, this paper uses:

$$\Sigma=\left(1-{{\rm{\lambda }}}\right){{\rm{S}}}+{{\rm{\lambda }}}\frac{{{\rm{Tr}}}\left({{\rm{S}}}\right)}{N}{\mathbb{I}}$$

(7)

where $S$ is the sample covariance, ${{\rm{\lambda }}}$ is the shrinkage coefficient estimated by the Ledoit-Wolf (LW) shrinkage algorithm⁴³, $N$ is the number of neurons, and ${\mathbb{I}}$ is an identity matrix. This algorithm is implemented by a Python function ‘sklearn.covariance.LedoitWolf‘.
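Equation (7) itself is a one-line convex combination, shown below with the shrinkage coefficient passed in by hand. In the paper ${{\rm{\lambda }}}$ is estimated by the LW algorithm, so the fixed `lam` argument, the function name, and the use of the biased (1/n) sample covariance are assumptions of this sketch.

```python
import numpy as np

def shrink_covariance(R, lam):
    """Shrink the sample covariance of R (samples x neurons) towards a scaled identity."""
    S = np.cov(R, rowvar=False, bias=True)      # sample covariance
    n = S.shape[0]                              # number of neurons N
    target = np.trace(S) / n * np.eye(n)        # Tr(S)/N * identity
    return (1.0 - lam) * S + lam * target       # Eq. 7
```

The shrinkage leaves the trace unchanged, and any $\lambda > 0$ makes the estimate invertible even when there are fewer samples than neurons.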

Gaussian process with Kernel Regression

One disadvantage of the bin average and LW methods is that the estimation of one bin’s covariance does not use data from adjacent bins. Ideally, the manifold and covariance matrix are smooth over label values. Data in adjacent bins provide certain information about the current bin. Therefore, we developed the Gaussian Process with Kernel Regression (GKR) method to infer smoothly varying manifold and covariance from noisy neural states. GKR has two major steps: (1) inferring the manifold via a Gaussian process, and (2) inferring the covariance matrix. Note that while Gaussian processes have been used in previous studies to infer the firing of individual grid cells^76,77, our method, GKR, which is partially based on Gaussian processes, focuses on cell-to-cell statistics.

In step 1, each component of ${{\bf{r}}}$ across all time bins is standardized to have a mean of zero and a variance of one. We denote the standardized ${{\bf{r}}}$ as $\widetilde{{{\bf{r}}}}$, which is then modeled as $\widetilde{{{\mathbf{\mu }}}}+{\beta }^{2}{{\mathbf{\eta }}}$, where ${{\mathbf{\eta }}}$ is standard Gaussian noise, and $\beta$ is a scalar parameter. The manifold $\widetilde{{{\mathbf{\mu }}}}$ is modeled as $N$ independent Gaussian processes, written as $\widetilde{{{\mathbf{\mu }}}}\sim {{\mathscr{G}}}{{{\mathscr{P}}}}^{N}\left(0,{k}_{{{\rm{\mu }}}}\right)$, i.e., with zero mean and a kernel function ${k}_{{{\rm{\mu }}}}:{{{\mathscr{R}}}}^{M}\times {{{\mathscr{R}}}}^{M}\to {{\mathscr{R}}}$ to control the “closeness” of $\widetilde{{{\mathbf{\mu }}}}$ given two different labels ${{\bf{x}}}$³⁸. A shared kernel for all components of $\widetilde{{{\mathbf{\mu }}}}$ is used in this paper. Although the kernel is shared by all components of $\widetilde{{{\mathbf{\mu }}}}$, it has different parameters for different components of the label, i.e., ${k}_{{{\rm{\mu }}}}\left({{\bf{x}}},{{{\bf{x}}}}^{{{{\prime} }}}\right)={\prod }_{i=1}^{M}k\left({x}_{i},{x}_{i}^{{\prime} }\right)+c$, where ${x}_{i}$ is the i th component of a label ${{\bf{x}}}$, and $c$ is a constant parameter. The kernel for ${x}_{i}$ is

$$k\left({x}_{i},{x}_{i}^{{\prime} }\right)={{{\rm{\sigma }}}}_{i}^{2}\exp \left(-\frac{{\left({x}_{i}-{x}_{i}^{{\prime} }\right)}^{2}}{2{l}_{i}^{2}}\right)$$

(8)

where ${{{\rm{\sigma }}}}_{i}$ and ${l}_{i}$ are parameters. If the ${x}_{i}$ is a circular variable, a sine wrapping is applied:

$$k\left({x}_{i},{x}_{i}^{{\prime} }\right)={{{\rm{\sigma }}}}_{i}^{2}\exp \left(-\frac{{\sin }^{2}\left(2{{\rm{\pi }}}\left({x}_{i}-{x}_{i}^{{\prime} }\right)/{p}_{i}\right)}{2{l}_{i}^{2}}\right)$$

(9)

where ${p}_{i}$ represents the period of the circular variable. With this model, the problem of inferring $\widetilde{{{\mathbf{\mu }}}}$ from noisy data $\widetilde{{{\bf{r}}}},{{\bf{x}}}$ becomes a classical Gaussian process regression problem, where the parameters $\{\beta,{l}_{i},{{{\rm{\sigma }}}}_{i},c\}$ are optimized to maximize the log-likelihood of a joint Gaussian distribution for $\widetilde{{{\mathbf{\mu }}}}\sim {{\mathscr{G}}}{{{\mathscr{P}}}}^{N}\left(0,{k}_{{{\rm{\mu }}}}\right)$. Finally, $\widetilde{{{\mathbf{\mu }}}}$ is transformed back to the original scale to obtain ${{\mathbf{\mu }}}$.
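The product kernel of Eqs. (8) and (9) and the resulting regression can be illustrated with a small exact Gaussian process in NumPy. This sketch uses fixed hyperparameters and the exact posterior mean, not the optimized hyperparameters used in the paper; the function names and the `noise_var` argument (the observation noise variance) are assumptions for illustration.

```python
import numpy as np

def k_mu(X1, X2, sigma, length, c=0.0, period=None):
    """Product kernel over label components plus a constant c.

    period[i] is None for a non-circular label and p_i for a circular one.
    """
    X1, X2 = np.atleast_2d(X1), np.atleast_2d(X2)
    K = np.ones((len(X1), len(X2)))
    for i in range(X1.shape[1]):
        d = X1[:, i][:, None] - X2[:, i][None, :]
        if period is not None and period[i] is not None:
            d2 = np.sin(2 * np.pi * d / period[i]) ** 2    # sine wrapping as in Eq. 9
        else:
            d2 = d ** 2                                    # Eq. 8
        K *= sigma[i] ** 2 * np.exp(-d2 / (2 * length[i] ** 2))
    return K + c

def gp_posterior_mean(X_train, Y_train, X_query, noise_var, **kern):
    """Exact GP posterior mean; the same kernel is shared by all columns of Y_train."""
    K = k_mu(X_train, X_train, **kern) + noise_var * np.eye(len(X_train))
    K_q = k_mu(X_query, X_train, **kern)
    return K_q @ np.linalg.solve(K, Y_train)
```

Each column of `Y_train` plays the role of one standardized neuron, so the shared kernel gives one smooth function per neuron over the label space.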

In many scenarios, the label ${{\bf{x}}}$ spans a large continuous range rather than a few discretized values (e.g., possible positions of a rat in a navigation task). In this case, Gaussian process regression requires computing a large kernel matrix, leading to expensive matrix manipulations⁷⁸. To reduce this cost, we employed a variational inducing variable method⁷⁸. It approximates training label values with a smaller set of inducing points ${{\bf{z}}}$, thereby reducing the time complexity. In this paper, inducing points were initialized as a randomly sampled subset of the original training labels (200 inducing points), and were optimized during the optimization of Gaussian process regression. Gaussian process regression with the inducing variable method is implemented in the Python GPflow package⁷⁹.

Step one above infers the manifold ${{\mathbf{\mu }}}\left({{\bf{x}}}\right)$. Step two infers the covariance matrix $\Sigma \left({{\bf{x}}}\right)$. Defining the Gram matrix of a point $\left({{{\bf{r}}}}_{{{\boldsymbol{i}}}},{{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)$ as $C\left({{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right){{\boldsymbol{\equiv }}}\left({{{\bf{r}}}}_{{{\boldsymbol{i}}}}{-}{{\mathbf{\mu }}}\left({{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)\right){\left({{{\bf{r}}}}_{{{\boldsymbol{i}}}}{-}{{\mathbf{\mu }}}\left({{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)\right)}^{{{\boldsymbol{T}}}}$, we estimate the covariance matrix at ${{\bf{x}}}$ as

$$\Sigma \left({{\bf{x}}}\right)={\sum }_{i}{k}_{L}\left({{\bf{x}}},{{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)C\left({{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)+\eta {\mathbb{I}}$$

(10)

where $i$ sums over all training data points, and $\eta={10}^{-6}$ is a small number for numerical stability (keeping the covariance invertible even if the first term is small). ${k}_{L}\left({{\bf{x}}},{{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)$ is a weight kernel that represents the contribution of $C\left({{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)$ in estimating the covariance matrix at label ${{\bf{x}}}$. It is normalized such that ${\sum }_{i}{k}_{L}\left({{\bf{x}}},{{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)=1$. To gain intuition for this method, consider a simple case where (up to a normalization) ${k}_{L}\left({{\bf{x}}},{{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)=1$ if $\left|\left|{{{\bf{x}}}}_{i}-{{\bf{x}}}\right|\right| < {{\rm{\delta }}}$ and zero otherwise. In this case, step two is simply a sample covariance within a small bin of half-width ${{\rm{\delta }}}$.

Since we assumed covariance is a smooth function over ${{\bf{x}}}$, $C\left({{{\bf{x}}}}_{i}\right)$ of adjacent ${{{\bf{x}}}}_{i}$ should still contribute to the estimation of $\Sigma \left({{\bf{x}}}\right)$. Therefore, we used a gradually decaying weight kernel

$${k}_{L}\left({{\bf{x}}},{{{\bf{x}}}}^{{{{\prime} }}}\right)=\kappa \exp \left(-0.5{\left({{\bf{x}}}{-}{{{\bf{x}}}}^{{{{\prime} }}}\right)}^{T}L{L}^{T}\left({{\bf{x}}}{-}{{{\bf{x}}}}^{{{{\prime} }}}\right)\right)$$

(11)

with $\kappa$ as a normalization factor ensuring ${\sum }_{i}{k}_{L}\left({{\bf{x}}},{{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)=1$, and $L$ an $M\times M$ upper triangular matrix, interpreted as the Cholesky factor of a positive semi-definite precision matrix $L{L}^{T}$. Note that the precision matrix has off-diagonal terms, hence the interactions between different label components are considered.
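The kernel-weighted covariance estimate of Eqs. 10 and 11 can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function names are hypothetical, `residuals` is assumed to hold the rows ${{{\bf{r}}}}_{i}-{{\mathbf{\mu }}}\left({{{\bf{x}}}}_{i}\right)$, and the matrix `L` is taken as given rather than optimized.

```python
import numpy as np

def kernel_weights(x_query, x_train, L):
    """Normalized weights k_L(x, x_i) of Eq. 11; L has shape (M, M)."""
    diff = (x_train - x_query) @ L            # each row is (x_i - x)^T L
    log_w = -0.5 * np.sum(diff**2, axis=1)    # -0.5 (x_i - x)^T L L^T (x_i - x)
    w = np.exp(log_w - log_w.max())           # subtract the max for numerical stability
    return w / w.sum()                        # normalization plays the role of kappa

def weighted_covariance(x_query, x_train, residuals, L, eta=1e-6):
    """Sigma(x) = sum_i k_L(x, x_i) C(x_i) + eta * I (Eq. 10)."""
    w = kernel_weights(x_query, x_train, L)
    sigma = (residuals * w[:, None]).T @ residuals   # weighted sum of outer products
    return sigma + eta * np.eye(residuals.shape[1])
```

With `L` set to zero the weights become uniform and the estimate reduces to a global sample covariance, whereas a large `L` concentrates the weights on training points close to the query label.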

Parameter $L$ is optimized to maximize the Gaussian log-likelihood of the data

$${{\mathscr{L}}}\left(L\right)=-{\sum }_{j}\left[\log |\Sigma \left({{{\bf{x}}}}_{{{\boldsymbol{j}}}}\right)|+{\left({{{\bf{r}}}}_{{{\boldsymbol{j}}}}-{{\mathbf{\mu }}}\left({{{\bf{x}}}}_{{{\boldsymbol{j}}}}\right)\right)}^{T}{\Sigma }^{-1}\left({{{\bf{x}}}}_{{{\boldsymbol{j}}}}\right)\left({{{\bf{r}}}}_{{{\boldsymbol{j}}}}-{{\mathbf{\mu }}}\left({{{\bf{x}}}}_{{{\boldsymbol{j}}}}\right)\right)\right]$$

(12)

where terms irrelevant to covariance are omitted. Notably, while $\Sigma$ is the weighted average of the Gram matrices from the training set, the log-likelihood function ${{\mathscr{L}}}$ should be evaluated on the validation set; we use different indices $i$ and $j$ to distinguish the two sets (Eq. 10 uses the training set). Evaluating the log-likelihood function on the training set would result in $\Sigma \left({{\bf{x}}}\right)$ converging to the Gram matrix $C\left({{\bf{x}}}\right)$. This can be demonstrated by computing the $\Sigma \left({{\bf{x}}}\right)$ that satisfies the condition $\partial {{\mathscr{L}}}/\partial \Sigma=0$. Therefore, splitting the data into a training set for computing covariance and a validation set for computing likelihood is necessary.
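To illustrate the held-out objective, the sketch below evaluates the usual zero-mean Gaussian log-likelihood (with additive constants and the factor 1/2 dropped) for validation residuals, given one covariance matrix per validation point. The function name and argument layout are assumptions made for illustration.

```python
import numpy as np

def heldout_loglik(residuals_val, sigmas_val):
    """Sum over validation points j of -[log|Sigma_j| + e_j^T Sigma_j^{-1} e_j].

    residuals_val: (J, N) array of r_j - mu(x_j) on the validation set.
    sigmas_val:    (J, N, N) array of covariance matrices predicted at x_j.
    """
    total = 0.0
    for e, sigma in zip(residuals_val, sigmas_val):
        _, logdet = np.linalg.slogdet(sigma)     # stable log-determinant
        quad = e @ np.linalg.solve(sigma, e)     # Mahalanobis term without explicit inverse
        total -= logdet + quad
    return total
```

A covariance that is too narrow is penalized by the quadratic term and one that is too broad by the log-determinant, which is why the objective only discriminates between candidate $L$ values when it is scored on data not used to build $\Sigma$.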

Overall, we used the following procedure to fit the manifold and covariance from a dataset. In step one, the entire training dataset was used for Gaussian process regression, yielding a continuous manifold function ${{\mathbf{\mu }}}$. In step two, the training dataset was split into batches, each containing 3000 data points (except for the final batch). Each batch was further split into training and validation sets (0.66:0.33). The training set was used for computing the covariance matrix given an $L$ (initialized as an identity matrix), and the validation set was used to compute the log-likelihood function. The log-likelihood was maximized by an Adam optimizer (gradient applied on $L$). This batch training was repeated for 30 epochs. Finally, with the optimized $L$, the whole dataset was used for computing covariance (Eq. 10). The computer code for implementing GKR is provided at https://github.com/AgeYY/speed_grid_cell_information.git.

Testing the normality assumption in data

We are interested in whether $p\left({{\bf{r}}}|{{\bf{x}}}\right)$ follows a normal distribution, as assumed by GKR. We first inspected an example cube centered at (11 cm, 37 cm, 18 cm/s) with edge lengths of (10 cm, 10 cm, 10 cm/s) in the label space. Data (from a ${{{\mathscr{D}}}}_{s}$ sampled from R1M2) within the cube were collected and projected onto their PC1-PC2 plane, as well as onto the PC1 and PC2 axes, for visualization. Using the projected data, we fitted the optimal 2D and 1D normal distributions using maximum likelihood estimation and overlaid them on the projected data for direct visual comparison. To formally assess normality, we applied the Henze–Zirkler test to the projected 2D data and Shapiro–Wilk tests to the projected 1D data (see Supplementary Fig. 10). The data in this example cube, shown in Supplementary Fig. 10, do not follow a normal distribution.

Next, we examined normality across sampled cubes. For each ${{{\mathscr{D}}}}_{s}$ (one ${{{\mathscr{D}}}}_{s}$ was sampled for one experimental configuration, e.g., R1M1, R1M2 etc.), we sampled 300 cubes. Data within each cube were projected onto their first 20 PCs. For each cube, we randomly sampled one PC, and applied Shapiro–Wilk test to assess normality. A p-value below 0.05 was considered indicative of non-normality. The percentage of non-normal cubes is shown in Supplementary Fig. 10. We performed normality tests on the projected PC rather than the full high-dimensional space, as testing in the full high-dimensional space would require significantly more data and could lead to unreliable results.

Gaussian process regression with kernel sampling (GKR-S)

GKR assumes the conditional probability $p\left({{\bf{r}}}|{{\bf{x}}}\right)$ follows a normal distribution, whereas GKR-S employs a non-parametric approach that does not impose this assumption. The first step of GKR-S is identical to GKR, using Gaussian process regression to infer the mean ${{\mathbf{\mu }}}\left({{\bf{x}}}\right)$. We then computed the residual as ${{\boldsymbol{\epsilon }}}\left({{{\bf{x}}}}_{i}\right)={{{\bf{r}}}}_{i}-{{\mathbf{\mu }}}\left({{{\bf{x}}}}_{i}\right)$, where the subscript $i$ denotes a training data point. The residual vector ${{\boldsymbol{\epsilon }}}\left({{{\bf{x}}}}_{i}\right)$ is also referred to as ${{{\boldsymbol{\epsilon }}}}_{{{\boldsymbol{i}}}}$.

The goal is to infer the residual distribution $p\left({{\boldsymbol{\epsilon }}}|{{\bf{x}}}\right)$ (which was assumed to be Gaussian in GKR). To achieve this, we employ a resampling strategy. We assign a weight to each training data point, where points closer to the query label ${{{\bf{x}}}}_{q}$ receive higher weights. In this study, we use a Gaussian kernel to determine these weights

$$k\left({{\bf{x}}},{{{\bf{x}}}}^{{{{\prime} }}}\right)=\kappa \exp \left(-0.5{\left({{\bf{x}}}{-}{{{\bf{x}}}}^{{{{\prime} }}}\right)}^{T}{\Sigma }^{-1}\left({{\bf{x}}}{-}{{{\bf{x}}}}^{{{{\prime} }}}\right)\right)$$

(13)

with $\kappa$ as a normalization factor ensuring ${\sum }_{i}k\left({{\bf{x}}},{{{\bf{x}}}}_{{{\boldsymbol{i}}}}\right)=1$, and for simplicity, $\Sigma$ is a diagonal matrix. We determined the diagonal elements ${\Sigma }_{{jj}}$ heuristically using Silverman’s rule-of-thumb⁸⁰: $\sqrt{{\varSigma }_{{jj}}}={\left(\frac{4}{d+1}\right)}^{\frac{1}{d+4}}{n}^{\frac{-1}{d+4}}{\sigma }_{j}$ where ${\sigma }_{j}$ is the standard deviation of the $j$-th label, $n$ is the number of training data, and $d$ is the dimensionality of the label ${{\bf{x}}}$.

For a given query label ${{{\bf{x}}}}_{q}$, this kernel function assigns a weight for each training data point. These weights serve as priors for resampling the training data. The sampled data are thought to be drawn from the conditional distribution $p\left({{\boldsymbol{\epsilon }}}|{{\bf{x}}}\right)$. Consequently, the statistics of $p\left({{\boldsymbol{\epsilon }}}|{{\bf{x}}}\right)$ and $p\left({{\bf{r}}}|{{\bf{x}}}\right)$ can be empirically computed.
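The resampling step can be sketched as follows. This is a hedged illustration rather than the released implementation: the function name is hypothetical, and the per-component bandwidths $\sqrt{{\Sigma }_{jj}}$ are passed in as an argument instead of being computed inside the function.

```python
import numpy as np

def resample_residuals(x_query, x_train, residuals, bandwidth, n_samples, rng):
    """Draw residuals with probabilities from a diagonal Gaussian kernel (Eq. 13).

    bandwidth: array of sqrt(Sigma_jj), one entry per label component.
    Returns the resampled residuals and the normalized weights.
    """
    z = (x_train - x_query) / bandwidth
    log_w = -0.5 * np.sum(z**2, axis=1)
    w = np.exp(log_w - log_w.max())      # stabilized before normalization
    w /= w.sum()                         # weights sum to one over training points
    idx = rng.choice(len(x_train), size=n_samples, replace=True, p=w)
    return residuals[idx], w
```

Adding the fitted mean at the query label to the returned residuals gives approximate samples of ${{\bf{r}}}$ at that label, from which any statistic can be computed empirically.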

If one is interested only in the mean and covariance of $p\left({{\bf{r}}}|{{\bf{x}}}\right)$, they can be computed analytically without sampling data. Under the assumption of an infinite amount of resampling, the distribution $p\left({{\boldsymbol{\epsilon }}}|{{\boldsymbol{x}}}\right)$ is equivalent to

$$p\left({{\boldsymbol{\epsilon }}}|{{\boldsymbol{x}}}\right)={\sum }_{i}{{\rm{\delta }}}\left({{\boldsymbol{\epsilon }}}-{{{\boldsymbol{\epsilon }}}}_{i}\right)k\left({{{\bf{x}}}}_{i},{{\bf{x}}}\right)$$

(14)

where ${{\rm{\delta }}}\left(\cdot \right)$ is the Dirac delta function. Consequently, the mean of $p\left({{\bf{r}}}|{{\bf{x}}}\right)$ can be computed analytically as ${{\mathbf{\mu }}}\left({{\bf{x}}}\right)+{\sum }_{i}k\left({{{\bf{x}}}}_{{{\rm{i}}}},{{\bf{x}}}\right){{{\boldsymbol{\epsilon }}}}_{i}$.

When estimating the covariance, for simplicity, we assume that ${{\mathbf{\mu }}}\left({{\bf{x}}}\right)$ is a close estimate of the true mean, so that ${\sum }_{i}k\left({{{\bf{x}}}}_{i},{{\bf{x}}}\right){{{\boldsymbol{\epsilon }}}}_{{{\boldsymbol{i}}}}\approx 0$. This significantly simplifies the expression of noise covariance to ${\sum }_{i}k\left({{{\bf{x}}}}_{i},{{\bf{x}}}\right){{{\boldsymbol{\epsilon }}}}_{{{\boldsymbol{i}}}}{{{\boldsymbol{\epsilon }}}}_{i}^{{{\rm{T}}}}$. For computational efficiency, we dropped training data points for which $k\left({{{\bf{x}}}}_{i},{{\bf{x}}}\right) < {10}^{-6}$. It is worth noting that this mathematical form of covariance estimation is similar to the one used in GKR (Eq. 10), implying that the second step of GKR may be thought of as a special case of a resampling method.

Using the above formulas, we obtained both the mean and noise covariance, which allow further computation of other quantities as described in Methods: Computing Geometric Properties of the Speed-Slice Manifold at Different Speeds. In particular, the projected noise is estimated using Eq. (21), which is mathematically equivalent to resampling an infinite amount of data, projecting it onto the tangent plane, and then computing the projected data’s covariance matrix. This paper focuses on the linear Fisher information, which sets a fundamental limit on the variance of any unbiased linear estimator⁸¹. Both projected noise and linear Fisher information (referred to simply as Fisher information in this paper) are fully determined by the mean and covariance matrix.

One-dimensional and two-dimensional synthetic datasets

Synthetic datasets are modeled as Gaussian distributions

$${{\bf{r}}}\left({{\bf{x}}}\right)={{\mathbf{\mu }}}\left({{\bf{x}}}\right)+{{\mathscr{N}}}\left(0,\Sigma \left({{\bf{x}}}\right)\right)$$

(15)

The covariance matrix is $\Sigma \left({{\bf{x}}}\right)=L{L}^{T}$ where

$${L}_{{ij}}\left({{\bf{x}}}\right)=\left({{\rm{\alpha }}}{\mu }_{i}\left({{\bf{x}}}\right)+{{\rm{\nu }}}\right)\cdot {e}^{-{{\rm{\gamma }}}\left|i-j\right|}$$

(16)

with ${{\rm{\nu }}}$ representing a constant for stimuli-independent noise, and ${{\rm{\gamma }}}$ as the non-diagonal decay rate. Note that the covariance matrix $\Sigma$ depends on the manifold ${{\mathbf{\mu }}}$.

In the one-dimensional synthetic model, the $i$-th component of the manifold ${{\mathbf{\mu }}}$ is given by

$${\mu }_{i}\left(x\right)={g}_{i}\frac{{{\mathscr{V}}}{{\mathscr{M}}}\left(x-{z}_{i},1/{{{\rm{\sigma }}}}^{2}\right)}{{{\mathscr{V}}}{{\mathscr{M}}}\left(0,1/{{{\rm{\sigma }}}}^{2}\right)},$$

(17)

where ${{\mathscr{V}}}{{\mathscr{M}}}\left(\cdot \right)$ denotes a von Mises function, ${g}_{i}\sim U\left({\mathrm{0.5,1.5}}\right)$ is a random gain, ${z}_{i}=2i{{\rm{\pi }}}/N$ is the preferred label value for the $i$-th neuron, and $\sigma=0.3$ is the tuning width. $x$ is a circular scalar label ranging from 0 to $2{{\rm{\pi }}}$. Parameters for generating covariance matrix (Eq. 16) are: $\alpha=0.2,\nu=0.05,\gamma=1$.

In the two-dimensional synthetic model, the $i$-th component of the manifold ${{\mathbf{\mu }}}$ is

$${\mu }_{i}\left({{\bf{x}}}\right)=\exp \left(-\frac{{\left|\left|{{\bf{x}}}-{{{\bf{z}}}}_{{{\boldsymbol{i}}}}\right|\right|}^{2}}{2{\left({{\rm{\sigma }}}{{{\rm{\lambda }}}}_{i}\right)}^{2}}\right),$$

(18)

where ${{{\bf{z}}}}_{i}\sim U\left(\left[-{\mathrm{1,1}}\right],\left[-{\mathrm{1,1}}\right]\right)$ is the center of the receptive field of neuron $i$, ${{\rm{\sigma }}}=0.3$, and ${\lambda }_{i}\sim U\left({\mathrm{0.5,1.5}}\right)$ controls the tuning width. The label ${{\boldsymbol{x}}}$ is two-dimensional, with each component ranging from −1 to 1. Parameters for generating covariance matrix (Eq. 16) are: $\alpha=0.5,\nu=0.1,\gamma=1$.

To generate a synthetic dataset of size $T$, $T$ labels ${{\bf{x}}}$ were uniformly sampled from the entire range. Each sampled label ${{\bf{x}}}$ was then used to compute one manifold point ${{\mathbf{\mu }}}$ and one covariance matrix $\Sigma$, thus generating one ${{\bf{r}}}$ using a Gaussian distribution (Eq. 15). $T$ labels generate $T$ data points.
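The generation of the one-dimensional synthetic dataset (Eqs. 15 to 17) can be sketched as below. The sketch assumes that the von Mises function is the von Mises density with concentration $1/{\sigma }^{2}$, so that the ratio in Eq. 17 reduces to $\exp \left(\left(\cos \left(x-{z}_{i}\right)-1\right)/{\sigma }^{2}\right)$, and that neurons are indexed from 1 to $N$; all function names are illustrative.

```python
import numpy as np

def manifold_1d(x, gains, sigma=0.3):
    """Eq. 17: von Mises tuning curves whose peak value is the gain g_i."""
    n = len(gains)
    z = 2.0 * np.pi * np.arange(1, n + 1) / n     # preferred labels z_i = 2 i pi / N
    kappa = 1.0 / sigma**2
    # VM(x - z, kappa) / VM(0, kappa) = exp(kappa * (cos(x - z) - 1))
    return gains * np.exp(kappa * (np.cos(x - z) - 1.0))

def noise_factor(mu, alpha=0.2, nu=0.05, gamma=1.0):
    """Eq. 16: L_ij = (alpha * mu_i + nu) * exp(-gamma * |i - j|)."""
    idx = np.arange(len(mu))
    decay = np.exp(-gamma * np.abs(idx[:, None] - idx[None, :]))
    return (alpha * mu + nu)[:, None] * decay

def sample_dataset(t, n_neurons, rng):
    """Draw t labels uniformly on [0, 2 pi) and one response per label (Eq. 15)."""
    gains = rng.uniform(0.5, 1.5, size=n_neurons)
    xs = rng.uniform(0.0, 2.0 * np.pi, size=t)
    rs = np.empty((t, n_neurons))
    for k, x in enumerate(xs):
        mu = manifold_1d(x, gains)
        L = noise_factor(mu)
        rs[k] = mu + L @ rng.standard_normal(n_neurons)   # noise covariance is L L^T
    return xs, rs, gains
```

Because the factor $L$ depends on ${{\mathbf{\mu }}}$, the noise covariance changes along the manifold, which is the property the estimators are tested on.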

When visualizing the ground truth of synthetic datasets in Fig. 2B, 100 labels ${{\bf{x}}}$ were randomly sampled. Then manifold points ${{\mathbf{\mu }}}$ and covariance matrices were computed. Manifold points were then fed into a PCA and reduced to the first two or three dimensions (two for Fig. 2 and three for Supplementary Fig. 4). The covariance matrices were also projected onto the PCA subspace, yielding a 2 × 2 or 3 × 3 matrix. The eigenvalues of this matrix were visualized as the lengths of the ellipsoid’s principal axes (Fig. 2, Supplementary Fig. 4), and the eigenvectors were visualized as the directions of those axes.

Computing the relative prediction error of a metric to the ground truth in the synthetic dataset

We evaluated the performances of estimators (Bin average, LW, GKR) by comparing their predictions to ground truth. We evaluated several metrics: (1) manifold ${{\mathbf{\mu }}}$ (2) covariance matrix $\Sigma$ (3) Riemannian metric ${\left(\partial {{\mathbf{\mu }}}/\partial {{\bf{x}}}\right)}^{{{\rm{T}}}}\left(\partial {{\mathbf{\mu }}}/\partial {{\bf{x}}}\right)$ (4) Linear Fisher information ${\left(\partial {{\mathbf{\mu }}}/\partial {{\bf{x}}}\right)}^{{{\rm{T}}}}{\Sigma }^{-1}\left(\partial {{\mathbf{\mu }}}/\partial {{\bf{x}}}\right)$ (5) Precision matrix ${\Sigma }^{-1}$. $\partial {{\mathbf{\mu }}}/\partial {{\boldsymbol{x}}}$ was estimated numerically by finite difference.

For each condition (number of data points or number of neurons, Fig. 2), ten synthetic datasets were sampled. For each dataset, all data were used for training the estimator. The trained estimator then predicted the values of the metrics at 100 other randomly sampled labels. The relative estimation error is the mean of ${\|{M}_{i}-{\hat{M}}_{i}\|}_{F}/{\|{M}_{i}\|}_{F}$ over all 100 labels $i$, where ${M}_{i}$ is the ground truth quantity, ${\hat{M}}_{i}$ is the estimated quantity, and ${\|\cdot \|}_{F}$ is the Frobenius norm.
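The relative estimation error is simple to compute; a minimal NumPy sketch is given below, where the function name and the array layout (one quantity per leading index) are assumptions made for illustration.

```python
import numpy as np

def relative_error(truth, estimate):
    """Mean over labels of ||M_i - M_hat_i||_F / ||M_i||_F.

    truth, estimate: arrays of shape (n_labels, ...); the trailing axes hold
    the quantity being compared (a vector or a matrix per label).
    """
    n = truth.shape[0]
    diff = (truth - estimate).reshape(n, -1)   # flatten so the 2-norm equals the Frobenius norm
    ref = truth.reshape(n, -1)
    return float(np.mean(np.linalg.norm(diff, axis=1) / np.linalg.norm(ref, axis=1)))
```

The same function applies to all five metrics, since vectors and matrices are both flattened before the norm is taken.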

Fit and visualize grid cell population manifold

GKR was applied to ${{{\mathscr{D}}}}_{s}$ to fit a manifold and a covariance matrix. Since ${{{\mathscr{D}}}}_{s}$ has three labels (two for location and one for speed), the full manifold is an intrinsically three-dimensional object. For ease of visualization, we visualized slices of the manifold instead.

First, we visualized the manifold representing different locations but fixing the speed at 20 cm/s, i.e., speed-slice manifold (SSM). A 30 by 30 grid of positions was sampled from the entire OF space. Along with the fixed speed of 20 cm/s, these labels were fed into the fitted GKR to predict the manifold points on SSM. These 900 predictions were first reduced to 6 dimensions using PCA, then projected non-linearly into 3 dimensions using Uniform Manifold Approximation and Projection (UMAP), implemented by the Python umap-learn package. The parameters used were ‘n_neighbors’ = 100, ‘min_dist’ = 0.8, ‘metric’ = ‘cosine’, and ‘init’ = ‘spectral’. To visualize the continuous manifolds, we interpolated small surfaces of adjacent x, y coordinate predictions using the plot_surface function in the Matplotlib Python package (Fig. 3B).

We visualized other manifold slices similarly. Figure 3D considered four adjacent spatial points (centered at $x=75$ cm, $y=75$ cm, with an adjacent points’ distance of 2 cm) and varying speed values. Supplementary Fig. 7C considered four distant spatial points (centered at $x=75$ cm, $y=75$ cm, with an adjacent distance of 20 cm). Supplementary Fig. 7D considered a slice with a fixed $x=75$ ${{\rm{cm}}}$. PCA analysis on these slice manifolds suggested they are low-dimensional (3 PCs are sufficient to explain 80 percent of the variance). Hence, these manifold slices were directly visualized in the space of the first two/three principal components.

Persistent Homology Barcode

Persistent homology is a method to analyze the topological structure of data clouds⁸². Each point in the data cloud is replaced by a small ball of radius $r$. If the distance between two points is smaller than $2r$, they are connected. Roughly speaking, a graph of dots and connecting lines is called a simplicial complex. A simplicial complex can have several holes of different dimensions (a 0D hole means a single component connecting all points; a 1D hole is a loop; a 2D hole is a cavity). As the ball radius increases from 0 to infinity, more dots become connected, resulting in different simplicial complexes. During this process, some holes emerge while others die out. The birth and death times of the holes can be collected and represented as bars. All bars of the same hole dimension form a barcode for that dimension. Usually most bars are short and likely represent noise, while long-life bars indicate non-trivial topological structure of the data cloud. The number of long-life bars in each dimension is counted as a Betti number, written as ${{{\rm{\beta }}}}_{i}$. For example, a loop manifold should have one long-life 0D hole, one 1D hole and no 2D holes. Hence the corresponding Betti numbers should be $\left({\beta }_{0},{\beta }_{1},{\beta }_{2}\right)=\left({\mathrm{1,1,0}}\right)$. In particular, a torus should have Betti numbers $\left({\beta }_{0},{\beta }_{1},{\beta }_{2}\right)=\left({\mathrm{1,2,1}}\right)$. We used the software package Ripser to compute the barcode, together with approximate sparse filtrations to increase computational efficiency⁸³ (epsilon approximation constant = 0.2, see more detail in Ripser⁸⁴). Intuitively, instead of computing the distance matrix of all points in the data cloud, approximate sparse filtrations discard balls that are completely covered by other balls (at a given $r$).

To build an objective procedure for counting the number of long-life bars in the barcode, we defined a bar-length threshold to distinguish long-life bars. We defined this threshold heuristically, following a previous study³⁵. A data point (e.g., a neural state) is an n-dimensional array, where n is the number of grid cells. All data points form an m-by-n matrix, where m is the number of data points. We then randomly rolled (with periodic boundary) each column of the matrix. This shuffled dataset was then fed into persistent homology, and the maximum bar length was collected. This shuffling procedure was repeated 20 times, and the maximum bar length across the 20 shuffles was taken as the bar-length threshold.
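The shuffling procedure can be sketched as follows. The persistent homology step itself is left as a user-supplied callable `max_bar_length` (a hypothetical placeholder, for example a wrapper around Ripser), so only the column rolling and the maximum over shuffles are shown.

```python
import numpy as np

def roll_columns(data, rng):
    """Circularly shift each column (one grid cell) of an (m, n) matrix
    by an independent random offset, i.e., with periodic boundary."""
    m, n = data.shape
    shuffled = np.empty_like(data)
    for j in range(n):
        shuffled[:, j] = np.roll(data[:, j], rng.integers(0, m))
    return shuffled

def bar_length_threshold(data, max_bar_length, n_shuffles=20, seed=0):
    """Largest bar length observed over n_shuffles shuffled datasets.

    max_bar_length: callable mapping an (m, n) array to the longest bar
    in its barcode (assumed to be provided by the user).
    """
    rng = np.random.default_rng(seed)
    return max(max_bar_length(roll_columns(data, rng)) for _ in range(n_shuffles))
```

Rolling preserves each cell's activity distribution while breaking the alignment between cells, so bars longer than the threshold are unlikely to arise from single-cell statistics alone.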

Specifically, ${{{\mathscr{D}}}}_{s}$ was used to fit the GKR. When estimating the topological structure of the full three-dimensional manifold, 6,400 labels were randomly sampled and input to GKR to generate 6,400 manifold points. To simplify these data points, in accordance with Gardner et al. 2022³⁵, we first projected them onto the subspace of the first six PCs, and then used k-means to compute 1,200 cluster centers. These centers were then fed into Ripser, and Betti numbers were estimated using the above procedure. $\left({\beta }_{0},{\beta }_{1},{\beta }_{2}\right)=\left({\mathrm{1,2,1}}\right)$ suggests a signature of possible toroidal-like topology.

When estimating the topological structure of the SSM (Fig. 3B, speed = 20 cm/s), a 30-by-30 grid of locations was collected and fed into the GKR to make predictions. These 900 manifold points were then projected onto the subspace of the first six PCs, and then fed into Ripser to compute Betti numbers (the k-means approximation is unnecessary in this case because 900 data points are computationally manageable, unlike the full-manifold case above).

Computing Geometric Properties of speed-slice manifold at different speeds

${{{\mathscr{D}}}}_{s}$ was used to fit the GKR. The fitted manifold is a function of $x$, $y$ locations and speed $v$. For each fixed speed value, we randomly sample 500 locations denoted as $\left({x}_{i},{y}_{i}\right)$.

The SSM center is the averaged 500 manifold points, denoted as ${{{\mathbf{\mu }}}}_{c}\left(v\right)$.

The SSM radius is the average Euclidean distance of these random points to the SSM center, where $N=500$ here denotes the number of sampled points

$$R={\sum}_{i}{\Vert {{\mathbf{\mu }}}({x}_{i},{y}_{i},v)-{{{\mathbf{\mu }}}}_{c}(v)\Vert }_{F}/N$$

(19)

Tangent vector $\partial {{\mathbf{\mu }}}/\partial x{|}_{\left({x}_{i},{y}_{i},v\right)}$ measures how sensitive the neural population representation is to changes in the $x$ location²⁷. Let ${{{\bf{a}}}}_{i}\left(v\right)\equiv \partial {{\mathbf{\mu }}}/\partial x{|}_{\left({x}_{i},{y}_{i},v\right)}$ and ${{{\bf{b}}}}_{i}\left(v\right)\equiv \partial {{\mathbf{\mu }}}/\partial y{|}_{\left({x}_{i},{y}_{i},v\right)}$, and let ${a}_{i}\left(v\right),{b}_{i}\left(v\right)$ denote their respective lengths. The lattice area at a point $\left({x}_{i},{y}_{i},v\right)$ is the area of the parallelogram formed by the two tangent vectors, with ${{\rm{\theta }}}$ the angle between them

$$\begin{aligned}{A}_{i}\left(v\right) &={a}_{i}\left(v\right){b}_{i}\left(v\right)\sin {{\rm{\theta }}} \\ &={a}_{i}\left(v\right){b}_{i}\left(v\right)\sqrt{1-{\cos }^{2}{{\rm{\theta }}}} \\ &=\sqrt{{a}_{i}^{2}\left(v\right){b}_{i}^{2}\left(v\right)-{\left({{{\bf{a}}}}_{i}\left(v\right)\cdot {{{\bf{b}}}}_{i}\left(v\right)\right)}^{2}}\end{aligned}$$

(20)

The average of the lattice area over 500 random points is the (averaged) lattice area of an SSM, shown in Fig. 3E.

The Fisher information matrix is defined as ${J}^{T}{\Sigma }^{-1}J$, where $J$ is the Jacobian matrix with respect to spatial location. Total Fisher information is the trace of the Fisher information matrix, averaged over all 500 random points.

To compute the projected noise, Jacobian matrix was normalized to $\hat{U}$, such that each column (tangent vector) has a unit length. Projected noise matrix is

$${\Sigma }^{{proj}}={\hat{U}}^{T}\Sigma \hat{U}$$

(21)

Projected noise is the trace of the projected noise matrix, averaged across 500 randomly sampled points.
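For a single point on the SSM, the three local quantities defined above (lattice area, total Fisher information and projected noise) can be sketched as follows. The function name is hypothetical, and the Jacobian is assumed to be supplied as an array whose two columns are the tangent vectors with respect to $x$ and $y$.

```python
import numpy as np

def local_geometry(jacobian, sigma):
    """Return (lattice area, total Fisher information, projected noise).

    jacobian: (n_neurons, 2) array, columns d mu / dx and d mu / dy.
    sigma:    (n_neurons, n_neurons) noise covariance at the same point.
    """
    a, b = jacobian[:, 0], jacobian[:, 1]
    # Eq. 20: area of the parallelogram spanned by the two tangent vectors
    area = np.sqrt(max((a @ a) * (b @ b) - (a @ b) ** 2, 0.0))
    # Total Fisher information: trace of J^T Sigma^{-1} J
    fisher = np.trace(jacobian.T @ np.linalg.solve(sigma, jacobian))
    # Eq. 21: project the noise onto unit-length tangent vectors
    u_hat = jacobian / np.linalg.norm(jacobian, axis=0, keepdims=True)
    projected = np.trace(u_hat.T @ sigma @ u_hat)
    return area, fisher, projected
```

Averaging each returned value over the 500 sampled locations at a fixed speed gives the speed-dependent quantities described in this section.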

Independent Poisson Speed-Gain (IPSG) grid cells

We present an Independent Poisson Speed-Gain (IPSG) model of the grid cells, characterized by three key assumptions: (1) the grid cells fire independently; (2) grid cells exhibit Poisson firing; (3) running speed modulates the spatial rate map through an increasing gain factor. Mathematically, the firing of a grid cell $i$ is given by:

$${r}_{i}\left({{\bf{x}}},v\right)\sim {{\rm{Poisson}}}\left({f}_{i}\left(v\right){M}_{i}\left({{\bf{x}}}\right)\right)$$

(22)

where ${{\bf{x}}}$ represents the 2D position and ${f}_{i}\left(v\right)$ denotes the speed-dependent gain, which is a monotonically increasing function. Substituting this into Eq. (19), we obtain the manifold size:

$$R=\frac{1}{A}\int\limits_{{{\mathscr{X}}}} \Vert {{\mathbf{\mu }}} (x,v)-\bar{{{\mathbf{\mu }}}}(v)\Vert dx=\frac{1}{A}{\int }_{{{\mathscr{X}}}}\sqrt{{\sum }_{i=1}^{n}{f}_{i}^{2}(v){({M}_{i}({{\bf{x}}})-{\bar{M}}_{i})}^{2}}dx,$$

(23)

where $A$ is the area of the open field ${{\mathscr{X}}}$; $\bar{M}={\int }_{{{\mathscr{X}}}}M\left(x\right){dx}/A$; ${{\mathbf{\mu }}}$ is the vector of mean firing rates with ${{{\mathbf{\mu }}}}_{i}\left({{\bf{x}}},v\right)={f}_{i}\left(v\right){M}_{i}\left({{\bf{x}}}\right)$, and $\bar{{{\mathbf{\mu }}}}\left(v\right)$ is its spatial average over ${{\mathscr{X}}}$. Therefore, Eq. (23) states that the manifold size increases with running speed.

To evaluate the total noise, we approximate Poisson firing with Gaussian firing, leading to a population noise covariance matrix that is diagonal (because of the independent firing), with diagonal elements ${\Sigma }_{ii}={\mu }_{i}\left({{\bf{x}}},v\right)/\Delta t$, where $\Delta t$ denotes the time window over which the firing rate is measured. The total noise, defined as the trace of the covariance matrix and averaged over the space ${{\mathscr{X}}}$, is given by:

$${{{\rm{\sigma }}}}^{2}={\sum }_{i=1}^{n}{f}_{i}\left(v\right){\bar{M}}_{i}/\Delta t,$$

(24)

which also increases with speed.

Finally, the (linear) Fisher information averaged over ${{\mathscr{X}}}$ is

$$I\left(v\right)=\frac{1}{A}{\int }_{x}{\left(\frac{\partial {{\mathbf{\mu }}}}{\partial {{\bf{x}}}}\right)}^{T}{\Sigma }^{-1}\left(\frac{\partial {{\mathbf{\mu }}}}{\partial {{\bf{x}}}}\right){dA}=\frac{1}{A}{\sum }_{i=1}^{n}{f}_{i}\left(v\right){\int }_{x}\frac{{|}{|}\nabla {M}_{i}\left({{\bf{x}}}\right){{|}{|}}^{2}}{{M}_{i}\left({{\bf{x}}}\right)}{dA},$$

(25)

which also increases with running speed.

Simulation of IPSG grid cells

To further evaluate whether IPSG grid cells can reproduce the increase in manifold size, total noise and Fisher information with speed (Figs. 3, 4, and 5), we simulated simple IPSG grid cells. The rate map of each grid cell is ${M}_{i}\left({{\bf{x}}}\right)={\sum }_{a}\cos \left(\frac{2{{\rm{\pi }}}}{L}{k}^{a}{{\bf{x}}}+{\phi }_{i}^{a}\right)+4$, where ${k}^{a}$ ($a={\mathrm{1,2,3}}$) are unit vectors pointing at 0, 60, and 120 degrees, respectively. $L={{\rm{\pi }}}$ is the spatial period, and the simulated field spans $\left[-{{\rm{\pi }}},{{\rm{\pi }}}\right]\times \left[-{{\rm{\pi }}},{{\rm{\pi }}}\right]$. The phase offset ${\phi }_{i}^{a}$ was randomly sampled from a uniform distribution over $\left[{\mathrm{0,2}}{{\rm{\pi }}}\right)$. The additive factor of 4 ensures positive firing. All grid cells share the same speed gain function ${f}_{i}\left(v\right)\equiv f\left(v\right)=20-20\exp \left(-v/10\right)$, which models a saturating speed gain, as implied by Hinman et al. 2016¹¹.

We simulated 10 IPSG grid cells, and randomly sampled 10,000 ${{\bf{x}}}$ and $v$ from uniform distributions, with ${{\bf{x}}}$ values spanning the whole field and speed values ranging from 0 to 40. For each sampled ${{\bf{x}}}$ and $v$, we computed the mean firing rate of each grid cell, and then generated Poisson spikes, resulting in a simulated dataset containing spike counts along with their corresponding ${{\bf{x}}}$ and $v$.
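The simulation described above can be sketched in NumPy as follows. This is an illustrative sketch, not the released code: the function names are hypothetical, and the Poisson counts are drawn with the mean firing rate as the Poisson mean, which assumes a unit time window.

```python
import numpy as np

def speed_gain(v):
    """Saturating speed gain f(v) = 20 - 20 exp(-v / 10)."""
    return 20.0 - 20.0 * np.exp(-v / 10.0)

def rate_maps(xy, phases, period=np.pi):
    """M_i(x) = sum_a cos(2 pi / L * k^a . x + phi_i^a) + 4.

    xy: (T, 2) positions; phases: (n_cells, 3) phase offsets.
    Returns an array of shape (T, n_cells).
    """
    angles = np.deg2rad([0.0, 60.0, 120.0])
    k = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # (3, 2) unit vectors
    proj = (2.0 * np.pi / period) * xy @ k.T                  # (T, 3)
    return np.cos(proj[:, None, :] + phases[None, :, :]).sum(axis=2) + 4.0

def simulate_ipsg(n_cells=10, n_samples=10_000, seed=0):
    """Sample positions, speeds and independent Poisson counts (unit time window assumed)."""
    rng = np.random.default_rng(seed)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_cells, 3))
    xy = rng.uniform(-np.pi, np.pi, size=(n_samples, 2))
    v = rng.uniform(0.0, 40.0, size=n_samples)
    mean_rate = speed_gain(v)[:, None] * rate_maps(xy, phases)
    counts = rng.poisson(mean_rate)
    return xy, v, counts
```

The returned positions and speeds form the labels and the counts form the population responses, i.e., the same input format that GKR expects from recorded data.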

To estimate uncertainty, we performed bootstrap resampling on the dataset 10 times (with replacement), generating 10 resampled datasets. For each resampled dataset, we used GKR to fit the data and estimate the manifold size, total noise and Fisher information, as shown in Supplementary Fig. 12.

Fisher information provides the upper bound of spatial classification accuracy

Consider a classification problem involving data from two small boxes centered at ${{{\bf{x}}}}_{\pm }={{{\bf{x}}}}_{c}\pm {{\rm{\delta }}}{{\bf{x}}}$. Denote the two classes as ${{{\mathscr{C}}}}_{1}$ and ${{{\mathscr{C}}}}_{2}$. In the process of evaluating SCA, we subsampled the data points so that two boxes have equal data set sizes. In line with this, the prior probabilities of a data point belonging to either class are equal, $p\left({{{\mathscr{C}}}}_{1}\right)=p\left({{{\mathscr{C}}}}_{2}\right)=1/2$. We also assume the neural state ${{\boldsymbol{r}}}$ in box $i$ is approximately given by ${{\mathscr{N}}}\left({{\bf{r}}}{;}{{{\mathbf{\mu }}}}_{i},\Sigma \right)$, where $i$ can be 1 or 2. Here we derive the optimal classification accuracy if the classification boundary is linear (as used by the logistic classifier). A linear classification boundary means that the class is ${{{\mathscr{C}}}}_{1}$ if $y={w}^{T}r-{w}_{0} < 0$, and ${{{\mathscr{C}}}}_{2}$ otherwise.

Classification accuracy is the probability of a correct classification.

$$\begin{aligned}p\left({\mbox{correct}}\right) &=p\left(y < 0,{{\mathscr{C}}}{=}{{{\mathscr{C}}}}_{1}\right)+p\left(y\ge 0,{{\mathscr{C}}}{=}{{{\mathscr{C}}}}_{2}\right) \\ &=\frac{1}{2}\left[p\left(y < 0|{{{\mathscr{C}}}}_{1}\right)+p\left(y\ge 0|{{{\mathscr{C}}}}_{2}\right)\right] \\ &=\frac{1}{2}\left[p\left({{{\bf{w}}}}^{T}{{\bf{r}}} < {w}_{0}|{{{\mathscr{C}}}}_{1}\right)+p\left({{{\bf{w}}}}^{T}{{\bf{r}}}\ge {w}_{0}|{{{\mathscr{C}}}}_{2}\right)\right] \\ &=\frac{1}{2}\left[\Phi \left(\frac{{w}_{0}-{{{\boldsymbol{w}}}}^{T}{{{\mathbf{\mu }}}}_{1}}{\sqrt{{{{\boldsymbol{w}}}}^{T}\Sigma {{\boldsymbol{w}}}}}\right)+1-\Phi \left(\frac{{w}_{0}-{{{\boldsymbol{w}}}}^{T}{{{\mathbf{\mu }}}}_{2}}{\sqrt{{{{\boldsymbol{w}}}}^{T}\Sigma {{\boldsymbol{w}}}}}\right)\right]\end{aligned}$$

(26)

where $\Phi \left(\cdot \right)$ is the cumulative distribution function of a standard normal distribution. We used the fact that, if $p\left({{\bf{r}}}|{{{\mathscr{C}}}}_{{{i}}}\right)$ is a Gaussian, ${{{\bf{w}}}}^{T}{{\bf{r}}}$ is also a Gaussian with mean ${{{\bf{w}}}}^{T}{{{\mathbf{\mu }}}}_{i}$ and variance ${{{\bf{w}}}}^{T}\Sigma {{\bf{w}}}$.

Next, we find the optimal $p\left({\mbox{correct}}\right)$. Setting $\partial p\left({\mbox{correct}}\right)/\partial {w}_{0}=0$ gives ${w}_{0}={{{\bf{w}}}}^{T}\left({{{\mathbf{\mu }}}}_{1}+{{{\mathbf{\mu }}}}_{2}\right)/2$. Setting $\partial p\left({\mbox{correct}}\right)/\partial {{\bf{w}}}=0$ gives the equation $\Delta {{\mathbf{\mu }}}\left(2{{{\bf{w}}}}^{T}\Sigma {{\bf{w}}}\right)=2\Sigma {{\bf{w}}}\left({{{\bf{w}}}}^{T}\Delta {{\mathbf{\mu }}}\right)$, where $\Delta {{\mathbf{\mu }}}={{{\mathbf{\mu }}}}_{2}-{{{\mathbf{\mu }}}}_{1}$. This equation has the general solution ${{\bf{w}}}\propto {\Sigma }^{-1}\Delta {{\mathbf{\mu }}}$. Substituting this back into the accuracy, we find that the optimal accuracy for the two boxes is $\Phi \left(\sqrt{\Delta {{{\mathbf{\mu }}}}^{T}{\Sigma }^{-1}\Delta {{\mathbf{\mu }}}}/2\right)$.
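As a numerical sanity check of this result, the sketch below evaluates the accuracy of an arbitrary linear boundary (Eq. 26) and the closed-form optimum, so the two can be compared for ${{\bf{w}}}\propto {\Sigma }^{-1}\Delta {{\mathbf{\mu }}}$. Function names are illustrative assumptions.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_accuracy(w, w0, mu1, mu2, sigma):
    """Eq. 26: accuracy of the rule 'class 1 if w^T r < w0' under equal priors."""
    s = math.sqrt(w @ sigma @ w)
    return 0.5 * (std_normal_cdf((w0 - w @ mu1) / s)
                  + 1.0 - std_normal_cdf((w0 - w @ mu2) / s))

def optimal_accuracy(mu1, mu2, sigma):
    """Closed form Phi(sqrt(dmu^T Sigma^{-1} dmu) / 2)."""
    dmu = mu2 - mu1
    return std_normal_cdf(math.sqrt(dmu @ np.linalg.solve(sigma, dmu)) / 2.0)
```

For example, with `w = np.linalg.solve(sigma, mu2 - mu1)` and `w0 = w @ (mu1 + mu2) / 2`, `linear_accuracy` should agree with `optimal_accuracy` up to floating-point error, while other choices of `w` give a value that is no larger.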

In our case, boxes were chosen to be symmetric, hence ${{{\mathbf{\mu }}}}_{1}={{\mathbf{\mu }}}\left({{{\bf{x}}}}_{c}-{{\rm{\delta }}}{{\bf{x}}}\right)\approx {{\mathbf{\mu }}}\left({{{\bf{x}}}}_{c}\right)-\nabla {{\mathbf{\mu }}}{{\rm{\delta }}}{{\bf{x}}}$, and similarly ${{{\mathbf{\mu }}}}_{2}={{\mathbf{\mu }}}\left({{{\bf{x}}}}_{c}+{{\rm{\delta }}}{{\bf{x}}}\right)\approx {{\mathbf{\mu }}}\left({{{\bf{x}}}}_{c}\right)+\nabla {{\mathbf{\mu }}}{{\rm{\delta }}}{{\bf{x}}}$, where ${{\rm{\delta }}}{{\bf{x}}}={{\rm{\delta }}}l\hat{{{\bf{e}}}}$. Therefore, $\Delta {{\mathbf{\mu }}}=2\nabla {{\mathbf{\mu }}}{{\rm{\delta }}}{{\bf{x}}}$. The upper bound of accuracy becomes $\Phi \left(\sqrt{{{\rm{\delta }}}{{{\bf{x}}}}^{T}I{{\rm{\delta }}}{{\bf{x}}}}\right)$ where $I$ is the Fisher information. In our numerical procedure, $\hat{{{\bf{e}}}}$ is a random unit vector. The overall upper bound of accuracy is given by the integration across all angles of $\hat{{{\bf{e}}}}$

$$p{\left({\mbox{correct}}\right)}_{{\mbox{optimal}}}={\int }_{0}^{2{{\rm{\pi }}}}\Phi \left(\sqrt{{{\rm{\delta }}}{{{\bf{x}}}}^{T}I{{\rm{\delta }}}{{\bf{x}}}}\right)d{{\rm{\theta }}}/\left(2{{\rm{\pi }}}\right)$$

(27)

The relationship between optimal accuracy and the total Fisher information becomes intuitive if we assume the Fisher information is unbiased in all directions, which, biologically speaking, means that a rat has no directional bias in an open field. Two simplifications follow from this assumption. First, the integral is trivial because the integrand is independent of direction. Second, the Fisher information becomes proportional to the identity matrix, $I=\left({Tr}\left(I\right)/2\right){\mathbb{I}}$ in two dimensions. Without loss of generality, let ${{\rm{\delta }}}{{\bf{x}}}={{\rm{\delta }}}l{\left({\mathrm{1,0}}\right)}^{T}$. The optimal accuracy, Taylor expanded to first order around ${{\rm{\delta }}}l=0$, becomes $0.5+{{\rm{\delta }}}l{{\rm{\phi }}}\left(0\right)\sqrt{{Tr}\left(I\right)/2}$, where ${{\rm{\phi }}}\left(0\right)=1/\sqrt{2{{\rm{\pi }}}}$ is the standard normal density at zero. Therefore, for small ${{\rm{\delta }}}l$, optimal accuracy increases with the square root of the total Fisher information.

We used a Monte Carlo method to estimate the integral in Eq. (27). Specifically, we sampled 2D vectors from a two-dimensional standard Gaussian distribution, then rescaled each vector to have length ${{\rm{\delta }}}l$, yielding the sampled ${{\rm{\delta }}}{{\bf{x}}}$. This sampling is unbiased with respect to angle because the two-dimensional standard Gaussian distribution is isotropic. Given Fisher information $I$, the upper bound was estimated as the average of $\Phi \left(\sqrt{{{\rm{\delta }}}{{{\bf{x}}}}^{T}I{{\rm{\delta }}}{{\bf{x}}}}\right)$ across all ${{\rm{\delta }}}{{\bf{x}}}$.
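
A minimal sketch of this Monte Carlo estimate is given below. It is not the authors' implementation: the function name `upper_bound_accuracy`, the sample count, and the example Fisher information matrices are assumptions made for illustration.

```python
import math
import numpy as np

def phi_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper_bound_accuracy(fisher, delta_l, n_samples=10000, seed=0):
    """Monte Carlo estimate of Eq. (27) for a 2x2 Fisher information matrix."""
    rng = np.random.default_rng(seed)
    # Draw directions from a 2D standard Gaussian, then rescale to length delta_l.
    v = rng.standard_normal(size=(n_samples, 2))
    dx = delta_l * v / np.linalg.norm(v, axis=1, keepdims=True)
    # Quadratic form dx^T I dx for every sample; clip tiny negative round-off.
    quad = np.maximum(np.einsum("ni,ij,nj->n", dx, fisher, dx), 0.0)
    return float(np.mean([phi_cdf(math.sqrt(q)) for q in quad]))

# Hypothetical Fisher information matrices, for illustration only.
iso = upper_bound_accuracy(4.0 * np.eye(2), delta_l=0.1)
aniso = upper_bound_accuracy(np.diag([9.0, 1.0]), delta_l=0.1)
print(iso, aniso)
```

For an isotropic matrix every sampled direction gives the same value, so the estimate coincides with $\Phi \left({{\rm{\delta }}}l\sqrt{c}\right)$ for $I=c{\mathbb{I}}$; for an anisotropic matrix the estimate lies between the values along the two principal axes.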

Test upper bound of spatial classification accuracy on synthetic datasets and grid cell population responses

For a one-dimensional ${{\rm{\delta }}}x$, the upper bound (Eq. 27) reduces to

$$p{\left({\mbox{correct}}\right)}_{{\mbox{optimal}}}=\Phi \left({{\rm{\delta }}}l\sqrt{I}\right)$$

(28)

We tested upper bounds in both 1D and 2D synthetic datasets. For each parameter configuration (number of neurons $N$, number of data points $K$, and noise level ${{\rm{\alpha }}}$; the other parameters were fixed as described in Methods: One-Dimensional and Two-Dimensional Synthetic Datasets), we generated $K$ data points. SCA was computed as described in ‘Methods: Spatial classification accuracy’. In parallel, the same $K$ data points were used to fit GKR. The fitted GKR predicts the Fisher information, which was then converted to an upper bound (Eq. 27). Finally, the ground truth Fisher information of the synthetic datasets was also used to compute the upper bound. Results are shown in Supplementary Fig. 9.

We also inspected the upper bounds on the grid cell datasets. GKR provides predictions of Fisher information, which were then used to compute the upper bounds. The upper bounds and SCAs of R1M2 are shown in Supplementary Fig. 8C and Fig. 5C for ${{{\mathscr{D}}}}_{s}$ and ${{{\mathscr{D}}}}_{s}^{\left(6\right)}$ (projection to six PCs, see Methods), respectively. Each upper bound-speed or SCA-speed array has $50\times 8=400$ data points (fifty samplings of ${{{\mathscr{D}}}}_{s}$ or ${{{\mathscr{D}}}}_{s}^{\left(6\right)}$ times eight speed bins). For a quantitative comparison between upper bounds and SCA, we computed the Pearson correlation between the two arrays, denoted as $r$. The two-sided p-value and the confidence interval (via Fisher z-transform) can be computed accordingly with the Python function scipy.stats.pearsonr^85,86.
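
The correlation analysis can be sketched as follows. The arrays here are synthetic placeholders with the same $50\times 8$ shape, not the paper's data, and the helper `pearson_with_ci` is a hypothetical name. The correlation and two-sided p-value come from SciPy's `pearsonr`, while the confidence interval is computed by hand with the Fisher z-transform so that the calculation is explicit.

```python
import numpy as np
from scipy import stats

def pearson_with_ci(x, y, confidence=0.95):
    """Pearson r, two-sided p-value, and Fisher z-transform confidence interval."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    r, p = stats.pearsonr(x, y)
    # Fisher z-transform: arctanh(r) is approximately normal with std 1/sqrt(n-3).
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(x.size - 3)
    z_crit = stats.norm.ppf(0.5 + confidence / 2.0)
    lo, hi = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return float(r), float(p), (float(lo), float(hi))

# Placeholder arrays of shape (50 samplings, 8 speed bins); values are made up.
rng = np.random.default_rng(0)
upper_bound = rng.uniform(0.55, 0.95, size=(50, 8))
sca = upper_bound - 0.05 + rng.normal(0.0, 0.02, size=(50, 8))

r, p, (lo, hi) = pearson_with_ci(upper_bound, sca)
print(f"r = {r:.3f}, p = {p:.2e}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```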

A two-neuron example demonstrating the effects of noise correlation

We present a simple two-neuron system to illustrate that the neuronal noise correlation can have either beneficial or detrimental effects, depending on the geometry of the noise covariance.

Consider a two-neuron system encoding two classes. The mean firing rates of the two neurons under class 1 are ${{{\mathbf{\mu }}}}_{1}={\left({\mathrm{1,0}}\right)}^{T}$, and under class 2 are ${{{\mathbf{\mu }}}}_{2}={\left({\mathrm{0,1}}\right)}^{T}$. The neurons’ firing noise follows a Gaussian distribution with a covariance matrix given by:

$$\Sigma=\left[\begin{array}{cc}1 & \rho \\ \rho & 1\end{array}\right]$$

(29)

where the value 1 represents the variance of individual neural noise, and $\rho$ denotes the noise correlation. For $\Sigma$ to be a valid (positive-definite) covariance matrix, it must satisfy $1-{\rho }^{2} > 0$. Given the mean firing rates and noise covariance, we can compute the Fisher information:

$${{\rm{I}}}={(\Delta {{\mathbf{\mu }}})}^{T}{\Sigma }^{-1}(\Delta {{\mathbf{\mu }}})=\frac{2}{1-{{\rm{\rho }}}}$$

(30)

where $\Delta {{\mathbf{\mu }}}={{{\mathbf{\mu }}}}_{{{\bf{2}}}}-{{{\mathbf{\mu }}}}_{{{\bf{1}}}}$ represents the vector difference between the class-conditional firing means. The baseline Fisher information, when there is no correlation ($\rho=0$), is 2. Positive correlation ($\rho > 0$) increases the Fisher information, hence it is information-beneficial, while negative correlation ($\rho < 0$) reduces the Fisher information, hence it is information-detrimental. In general, negative correlation is not necessarily detrimental to information; the effect depends on the geometric relationship between the noise covariance and the signal. In this simple example, positive noise correlation “reshapes” the noise covariance so that more noise aligns with the classification decision boundary, whereas negative noise correlation “reshapes” it such that more noise is orthogonal to the decision boundary (Supplementary Fig. 13).
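
For completeness, the closed form in Eq. (30) can be worked out explicitly. With $\Delta {{\mathbf{\mu }}}={\left(-1,1\right)}^{T}$ and

$${\Sigma }^{-1}=\frac{1}{1-{\rho }^{2}}\left[\begin{array}{cc}1 & -\rho \\ -\rho & 1\end{array}\right]$$

the quadratic form is

$${(\Delta {{\mathbf{\mu }}})}^{T}{\Sigma }^{-1}(\Delta {{\mathbf{\mu }}})=\frac{1+2\rho+1}{1-{\rho }^{2}}=\frac{2\left(1+\rho \right)}{\left(1-\rho \right)\left(1+\rho \right)}=\frac{2}{1-\rho }$$

The same result follows geometrically: $\Sigma$ has eigenvectors ${\left(1,1\right)}^{T}/\sqrt{2}$ and ${\left(-1,1\right)}^{T}/\sqrt{2}$ with eigenvalues $1+\rho$ and $1-\rho$, respectively. The signal $\Delta {{\mathbf{\mu }}}$ lies entirely along the second eigenvector and has squared length 2, so only the noise variance $1-\rho$ along that axis limits the information. As a numerical illustration, $\rho=0.8$ gives a Fisher information of 10, whereas $\rho=-0.8$ gives $2/1.8\approx 1.11$.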

To show these effects in simulation, we generated 1000 total samples (500 per class) under three different noise conditions (see Supplementary Fig. 13): (1) beneficial correlation $\rho=0.8 > 0$: noise aligns with the decision boundary; (2) detrimental correlation $\rho=-0.8 < 0$: noise projects onto the signal axis ${{\boldsymbol{\Delta }}}{{\mathbf{\mu }}}$; and (3) independent noise $\rho=0$: no correlation between neurons. Note that the total noise (trace of the covariance matrix) is the same across the three conditions. With these generated data, we trained a logistic regression classifier to distinguish the two classes in each condition. The classification accuracy quantifies the “goodness of information coding”. We also indicate the theoretical Fisher information value in Supplementary Fig. 13.
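
The simulation can be approximated with the short sketch below. Note one deliberate substitution: instead of training a logistic regression classifier as described above, the sketch applies the known optimal linear readout ${{\bf{w}}}\propto {\Sigma }^{-1}\Delta {{\mathbf{\mu }}}$ with a midpoint threshold, which keeps the example dependency-free. The function name `simulate_two_neuron` and the random seed are assumptions.

```python
import math
import numpy as np

def phi_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_two_neuron(rho, n_per_class=500, seed=0):
    """Return (Fisher information, theoretical accuracy, empirical accuracy)."""
    rng = np.random.default_rng(seed)
    mu1, mu2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    cov = np.array([[1.0, rho], [rho, 1.0]])
    r1 = rng.multivariate_normal(mu1, cov, size=n_per_class)
    r2 = rng.multivariate_normal(mu2, cov, size=n_per_class)
    dmu = mu2 - mu1
    # Optimal linear readout (a stand-in for the trained logistic regression).
    w = np.linalg.solve(cov, dmu)
    w0 = w @ (mu1 + mu2) / 2
    n_correct = np.sum(r1 @ w < w0) + np.sum(r2 @ w > w0)
    fisher = float(dmu @ w)               # equals 2 / (1 - rho), Eq. (30)
    acc_theory = phi_cdf(math.sqrt(fisher) / 2)
    return fisher, acc_theory, float(n_correct) / (2 * n_per_class)

for rho in (0.8, 0.0, -0.8):
    fisher, acc_theory, acc_emp = simulate_two_neuron(rho)
    print(f"rho={rho:+.1f}  I={fisher:.2f}  "
          f"theory={acc_theory:.3f}  empirical={acc_emp:.3f}")
```

With only 500 samples per class the empirical accuracy fluctuates by a few percent around the theoretical value, so small differences between conditions should be read with that in mind.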

Independent Firing Grid Cells

We investigated the effect of grid cell activity correlation by comparing results from the original dataset ${{{\mathscr{D}}}}_{s}$ to those from hypothetical independent firing grid cells (IFGC). A classic method for generating independent firing cells involves shuffling trials within the same condition. Specifically, each cell’s firing profiles are randomly permuted across trials within a condition. This approach preserves single-cell firing statistics while disrupting cell-to-cell firing correlation. We adapted this method when computing SCA of the IFGC (Fig. 6E). Recall that the key idea of SCA is to compute the classification performance on data within two nearby (spatial) boxes. We treat data within each box as a single condition, where each data point (an N-dimensional vector of single-cell firing rates) represents one trial. We then randomly permute each cell’s firing rate across all data points within the box, breaking the cell-to-cell correlation. The SCA of this “trial-shuffled” data is called the SCA of IFGC (Fig. 6E).

We also adapted this “trial-shuffling” idea to compute the geometric metrics (total noise, projected noise, and Fisher information) for IFGC. Specifically, after fitting ${{{\mathscr{D}}}}_{s}$, GKR can predict the mean and covariance at a condition ${{\bf{x}}}$. Viewed as a generative model, GKR can generate arbitrarily many data points under the same condition ${{\bf{x}}}$. If we apply the above “trial-shuffling” procedure to these data points and recompute the mean and covariance matrix, the mean remains unchanged, while the covariance matrix retains only the diagonal components of the original covariance matrix, with all off-diagonal components set to zero. Therefore, IFGC’s GKR is the same as the original GKR except that its covariance matrix is diagonal. With IFGC’s GKR, the geometric metrics can be computed as previously described.
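
The effect of trial shuffling on the mean and covariance can be illustrated with a toy example. The sketch below uses a hypothetical three-cell Gaussian population (the covariance values and the helper name `trial_shuffle` are assumptions, not taken from the paper) and shows that shuffling leaves the means and single-cell variances intact while driving the off-diagonal covariances toward zero.

```python
import numpy as np

def trial_shuffle(responses, rng):
    """Independently permute each cell's responses (columns) across trials (rows)."""
    shuffled = np.empty_like(responses)
    for j in range(responses.shape[1]):
        shuffled[:, j] = rng.permutation(responses[:, j])
    return shuffled

rng = np.random.default_rng(0)
# Hypothetical correlated population under a single condition.
mean = np.array([2.0, 1.0, 3.0])
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])
data = rng.multivariate_normal(mean, cov, size=20000)
shuffled = trial_shuffle(data, rng)

cov_orig = np.cov(data, rowvar=False)
cov_shuf = np.cov(shuffled, rowvar=False)
off_diag = cov_shuf - np.diag(np.diag(cov_shuf))

print("max |off-diagonal| before:", np.abs(cov_orig - np.diag(np.diag(cov_orig))).max())
print("max |off-diagonal| after: ", np.abs(off_diag).max())
```

In the infinite-sample limit the shuffled covariance is exactly the diagonal of the original one, which is why the IFGC metrics can be computed from GKR directly by zeroing the off-diagonal entries rather than by actually shuffling samples.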

We compared speed-averaged metrics obtained from the original datasets to those obtained from IFGC. Methods of computing speed-averaged metrics along with statistical analysis can be found in the Methods: Bayesian linear ensemble averaging and statistical testing section.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Source data are provided with this paper. The grid cell spiking dataset collected by Gardner et al. (2022)³⁵ and used in this study is available in Figshare with the identifier: https://doi.org/10.6084/m9.figshare.16764508.v6.

Code availability

Code for reproducing the analyses in this article is available at https://github.com/AgeYY/speed_grid_cell_information.git.

References

Whittington, J. C. R., McCaffary, D., Bakermans, J. J. W. & Behrens, T. E. J. How to build a cognitive map. Nat. Neurosci. 25, 1257–1272 (2022).
Hafting, T., Fyhn, M., Molden, S., Moser, M. B. & Moser, E. I. Microstructure of a spatial map in the entorhinal cortex. Nature 436, 801–806 (2005).
Jacobs, J. et al. Direct recordings of grid-like neuronal activity in human spatial navigation. Nat. Neurosci. 16, 1188–1190 (2013).
Bush, D., Barry, C., Manson, D. & Burgess, N. Using grid cells for navigation. Neuron 87, 507–520 (2015).
Cueva, C. J. & Wei, X. X. Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. in Proc. 6th International Conference on Learning Representations (ICLR 2018) 1–19 https://openreview.net/forum?id=B17JTOe0- (2018).
Sorscher, B., Mel, G. C., Ganguli, S. & Ocko, S. A. A unified theory for the origin of grid cells through the lens of pattern formation. Advances in Neural Information Processing Systems 32, 1–18, https://proceedings.neurips.cc/paper_files/paper/2019/file/6e7d5d259be7bf56ed79029c4e621f44-Paper.pdf (2019).
Fyhn, M., Hafting, T., Treves, A., Moser, M. B. & Moser, E. I. Hippocampal remapping and grid realignment in entorhinal cortex. Nature 446, 190–194 (2007).
Sorscher, B., Mel, G. C., Ocko, S. A., Giocomo, L. M. & Ganguli, S. A unified theory for the computational and mechanistic origins of grid cells. Neuron 111, 121–137.e13 (2023).
Banino, A. et al. Vector-based navigation using grid-like representations in artificial agents. Nature 557, 429–433 (2018).
Hardcastle, K., Maheswaranathan, N., Ganguli, S. & Giocomo, L. M. A multiplexed, heterogeneous, and adaptive code for navigation in medial entorhinal cortex. Neuron 94, e7 (2017).
Hinman, J. R., Brandon, M. P., Climer, J. R., Chapman, G. W. & Hasselmo, M. E. Multiple running speed signals in medial entorhinal cortex. Neuron 91, 666–679 (2016).
Sargolini, F. et al. Conjunctive representation of position, direction, and velocity in entorhinal cortex. Science 312, 758–762 (2006).
Wills, T. J., Barry, C. & Cacucci, F. The abrupt development of adult-like grid cell firing in the medial entorhinal cortex. Front. Neural Circuits 6, 1–13 (2012).
Yoon, K. et al. Specific evidence of low-dimensional continuous attractor dynamics in grid cells. Nat. Neurosci. 16, 1077–1084 (2013).
Hayman, R. & Burgess, N. Disrupting the grid cells’ need for speed. Neuron 91, 502–503 (2016).
Iwase, M., Kitanishi, T. & Mizuseki, K. Cell type, sub-region, and layer-specific speed representation in the hippocampal–entorhinal circuit. Sci. Rep. 10, 15879 (2020).
Kropff, E., Carmichael, J. E., Moser, M. B. & Moser, E. I. Speed cells in the medial entorhinal cortex. Nature 523, 419–424 (2015).
Hardcastle, K., Ganguli, S. & Giocomo, L. M. Environmental boundaries as an error correction mechanism for grid cells. Neuron 86, 827–839 (2015).
Khona, M. & Fiete, I. R. Attractor and integrator networks in the brain. Nat. Rev. Neurosci. 23, 744–766 (2022).
Burak, Y. & Fiete, I. R. Accurate path integration in continuous attractor network models of grid cells. PLoS Comput. Biol. 5, e1000291 (2009).
Burak, Y. & Fiete, I. R. Fundamental limits on persistent activity in networks of noisy neurons. Proc. Natl. Acad. Sci. USA 109, 17645–17650 (2012).
Stringer, C., Michaelos, M., Tsyboulski, D., Lindo, S. E. & Pachitariu, M. High-precision coding in visual cortex. Cell 184, 2767–2778.e15 (2021).
Rumyantsev, O. I. et al. Fundamental bounds on the fidelity of sensory cortical coding. Nature 580, 100–105 (2020).
Kohn, A., Coen-Cagli, R., Kanitscheider, I. & Pouget, A. Correlations and neuronal population information. Annu. Rev. Neurosci. 39, 237–256 (2016).
Moreno-Bote, R. et al. Information-limiting correlations. Nat. Neurosci. 17, 1410–1417 (2014).
Averbeck, B. B., Latham, P. E. & Pouget, A. Neural correlations, population coding and computation. Nat. Rev. Neurosci. 7, 358–366 (2006).
Ding, X. et al. Information geometry of the retinal representation manifold. Adv. Neural Inf. Process. Syst. 36, 44310–44322 (2023).
Azeredo Da Silveira, R. & Rieke, F. The geometry of information coding in correlated neural populations. Annu. Rev. Neurosci. 44, 403–424 (2021).
Waaga, T. et al. Grid-cell modules remain coordinated when neural activity is dissociated from external sensory cues. Neuron 110, 1843–1856.e6 (2022).
Zohary, E., Shadlen, M. N. & Newsome, W. T. Correlated neuronal discharge rate and its implications for psychophysical performance. Nature 370, 140–143 (1994).
Huk, A., Bonnen, K. & He, B. J. Beyond trial-based paradigms: continuous behavior, ongoing neural activity, and natural stimuli. J. Neurosci. 38, 7551–7558 (2018).
Georgopoulos, A. P., Kalaska, J. F., Caminiti, R. & Massey, J. T. On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. J. Neurosci. 2, 1527–1537 (1982).
Cisek, P. & Green, A. M. Toward a neuroscience of natural behavior. Curr. Opin. Neurobiol. 86, 102859 (2024).
Nejatbakhsh, A., Garon, I. & Williams, A. H. Estimating noise correlations across continuous conditions with Wishart processes. Adv. Neural Inform. Process. Syst. 36, 54032–54045, https://proceedings.neurips.cc/paper_files/paper/2023/file/a935ba2236c6ba0fb620f23354e789ff-Paper-Conference.pdf (2023).
Gardner, R. J. et al. Toroidal topology of population activity in grid cells. Nature 602, 123–128 (2022).
Maheswaranathan, N. et al. Interpreting the retinal neural code for natural scenes: From computations to neurons. Neuron 111, 2742–2755.e4 (2023).
Amari, S. & Nagaoka, H. Methods of Information Geometry Vol. 191 (American Mathematical Soc., 2000).
Bishop, C. M. Pattern Recognition and Machine Learning (Springer New York, 2006).
Yao, Y., Vehtari, A., Simpson, D. & Gelman, A. Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Anal. 13, 917–1003 (2018).
Makowski, D., Ben-Shachar, M. S., Chen, S. H. A. A. & Lüdecke, D. Indices of effect existence and significance in the Bayesian framework. Front. Psychol. 10, 1–14 (2019).
Barack, D. L. & Krakauer, J. W. Two views on the cognitive brain. Nat. Rev. Neurosci. 22, 359–371 (2021).
Kriegeskorte, N. & Wei, X. X. Neural tuning and representational geometry. Nat. Rev. Neurosci. 22, 703–718 (2021).
Ledoit, O. & Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88, 365–411 (2004).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
Kanitscheider, I., Coen-Cagli, R., Kohn, A. & Pouget, A. Measuring Fisher information accurately in correlated neural populations. PLoS Comput. Biol. 11, 1–27 (2015).
Schneider, S., Lee, J. H. & Mathis, M. W. Learnable latent embeddings for joint behavioural and neural analysis. Nature 617, 360–368 (2023).
Ye, Z., Li, H., Tian, L. & Zhou, C. Beyond the delay neural dynamics: a decoding strategy for working memory error reduction. bioRxiv 2022.06.01.494426 https://doi.org/10.1101/2022.06.01.494426 (2022).
Higgins, I. et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. in Proceedings of the 5th International Conference on Learning Representations (ICLR) https://openreview.net/forum?id=Sy2fzU9gl (2017).
Giocomo, L. M., Moser, M.-B. & Moser, E. I. Computational models of grid cells. Neuron 71, 589–603 (2011).
Kropff, E. & Treves, A. The emergence of grid cells: Intelligent design or just adaptation. Hippocampus 18, 1256–1269 (2008).
Kropff, E., Carmichael, J. E., Moser, E. I. & Moser, M.-B. Frequency of theta rhythm is controlled by acceleration, but not speed, in running rats. Neuron 109, 1029–1039.e8 (2021).
Peer, N., Yamin, H. & Cohen, D. Multidimensional encoding of movement and contextual variables by rat globus pallidus neurons during a novel environment exposure task. iScience 25, 105024 (2022).
Carpenter, F., Manson, D., Jeffery, K., Burgess, N. & Barry, C. Grid cells form a global representation of connected environments. Curr. Biol. 25, 1176–1182 (2015).
Barry, C., Hayman, R., Burgess, N. & Jeffery, K. J. Experience-dependent rescaling of entorhinal grids. Nat. Neurosci. 10, 682–684 (2007).
Keinath, A. T., Epstein, R. A. & Balasubramanian, V. Environmental deformations dynamically shift the grid cell spatial metric. eLife 7, e38169 (2018).
Barry, C., Ginzberg, L. L., O’Keefe, J. & Burgess, N. Grid cell firing patterns signal environmental novelty by expansion. Proc. Natl Acad. Sci. 109, 17687–17692 (2012).
Bush, D., Barry, C. & Burgess, N. What do grid cells contribute to place cell firing. Trends Neurosci. 37, 136–145 (2014).
Bonnevie, T. et al. Grid cells require excitatory drive from the hippocampus. Nat. Neurosci. 16, 309–317 (2013).
McClain, K., Tingley, D., Heeger, D. J. & Buzsáki, G. Position–theta-phase model of hippocampal place cell activity applied to quantification of running speed modulation of firing rate. Proc. Natl. Acad. Sci. USA 116, 27035–27042 (2019).
Stachenfeld, K. L., Botvinick, M. M. & Gershman, S. J. The hippocampus as a predictive map. Nat. Neurosci. 20, 1643–1653 (2017).
Parker, P. R. L., Brown, M. A., Smear, M. C. & Niell, C. M. Movement-related signals in sensory areas: roles in natural behavior. Trends Neurosci. 43, 581–595 (2020).
Schneider, D. M., Nelson, A. & Mooney, R. A synaptic and circuit basis for corollary discharge in the auditory cortex. Nature 513, 189–194 (2014).
Schneider, D. M., Sundararajan, J. & Mooney, R. A cortical filter that learns to suppress the acoustic consequences of movement. Nature 561, 391–395 (2018).
Niell, C. M. & Stryker, M. P. Modulation of visual responses by behavioral state in mouse visual cortex. Neuron 65, 472–479 (2010).
Ayaz, A., Saleem, A. B., Schölvinck, M. L. & Carandini, M. Locomotion controls spatial integration in mouse visual cortex. Curr. Biol. 23, 890–894 (2013).
Saleem, A. B., Ayaz, A. I., Jeffery, K. J., Harris, K. D. & Carandini, M. Integration of visual motion and locomotion in mouse visual cortex. Nat. Neurosci. 16, 1864–1869 (2013).
Vinck, M., Batista-Brito, R., Knoblich, U. & Cardin, J. A. Arousal and locomotion make distinct contributions to cortical activity patterns and visual encoding. Neuron 86, 740–754 (2015).
Shenoy, K. V., Sahani, M. & Churchland, M. M. Cortical control of arm movements: a dynamical systems perspective. Annu. Rev. Neurosci. 36, 337–359 (2013).
Vyas, S., Golub, M. D., Sussillo, D. & Shenoy, K. V. Computation through neural population dynamics. Annu. Rev. Neurosci. 43, 249–275 (2020).
Sosa, M. & Giocomo, L. M. Navigating for reward. Nat. Rev. Neurosci. 22, 472–487 (2021).
Dupret, D., O’Neill, J., Pleydell-Bouverie, B. & Csicsvari, J. The reorganization and reactivation of hippocampal maps predict spatial memory performance. Nat. Neurosci. 13, 995–1002 (2010).
Hollup, S. A., Molden, S., Donnett, J. G., Moser, M. B. & Moser, E. I. Accumulation of hippocampal place fields at the goal location in an annular watermaze task. J. Neurosci. 21, 1635–1644 (2001).
Boccara, C. N., Nardin, M., Stella, F., O’Neill, J. & Csicsvari, J. The entorhinal cognitive map is attracted to goals. Science 363, 1443–1447 (2019).
Eissa, T. L. & Kilpatrick, Z. P. Learning efficient representations of environmental priors in working memory. PLoS Comput. Biol. 19, e1011622 (2023).
Panichello, M. F., DePasquale, B., Pillow, J. W. & Buschman, T. J. Error-correcting dynamics in visual working memory. Nat. Commun. 10, 3366 (2019).
Papastathopoulos, I., Auld, G., Lindgren, F., Gerlei, K. Z. & Nolan, M. F. Bayesian inference of grid cell firing patterns using Poisson point process models with latent oscillatory Gaussian random fields. Preprint at https://doi.org/10.48550/arXiv.2303.17217 (2023).
Rule, M. E. et al. Variational log‐Gaussian point‐process methods for grid cells. Hippocampus 33, 1235–1251 (2023).
van der Wilk, M. et al. A framework for interdomain and multioutput Gaussian processes. Preprint at https://arxiv.org/abs/2003.01115 (2020).
Matthews, A. G. D. G. et al. GPflow: a Gaussian process library using TensorFlow. J. Mach. Learn. Res. 18, 1–6 (2017).
Silverman, B. W. Density Estimation for Statistics and Data Analysis https://doi.org/10.1201/9781315140919 (Routledge, 2018).
Beck, J., Bejjanki, V. R. & Pouget, A. Insights from a simple expression for linear Fisher information in a recurrently connected population of spiking neurons. Neural Comput. 23, 1484–1502 (2011).
Chaudhuri, R., Gerçek, B., Pandey, B., Peyrache, A. & Fiete, I. The intrinsic attractor manifold and population dynamics of a canonical cognitive circuit across waking and sleep. Nat. Neurosci. 22, 1512–1520 (2019).
Cavanna, N. J., Jahanseir, M. & Sheehy, D. R. A geometric perspective on sparse filtrations. in Proceedings of the 27th Canadian Conference on Computational Geometry, (CCCG 2015) 116–121 http://research.cs.queensu.ca/cccg2015/CCCG15-papers/01.pdf (2015).
Tralie, C., Saul, N. & Bar-On, R. Ripser.py: a lean persistent homology library for Python. J. Open Source Softw. 3, 925 (2018).
Student. Probable error of a correlation coefficient. Biometrika 6, 302–310 (1908).
Pearson, K. On the probable error of a coefficient of correlation as found from a fourfold table. Biometrika 9, 22–33 (1913).
Hotelling, H. New light on the correlation coefficient and its transforms. J. R. Stat. Soc. Ser. B (Methodol.) 15, 193–225 (1953).

Acknowledgements

The authors gratefully acknowledge Robert Wang for reviewing the manuscript and providing insightful comments; Dr. Richard J. Gardner for clarifying details regarding the publicly available grid cell spiking dataset³⁵; and Dr. Haoran Li for testing our computer code. This work was supported by grants from Incubator for Transdisciplinary Futures: Toward a Synergy Between Artificial Intelligence and Neuroscience (RW).

Author information

Authors and Affiliations

Department of Physics, Washington University in St. Louis, St. Louis, MO, USA
Zeyuan Ye & Ralf Wessel

Authors

Zeyuan Ye
Ralf Wessel

Contributions

Conceptualization and writing: Z.Y. and R.W.; methodology and investigation: Z.Y.; supervision: R.W.

Corresponding author

Correspondence to Zeyuan Ye.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ye, Z., Wessel, R. Speed modulations in grid cell information geometry. Nat Commun 16, 7723 (2025). https://doi.org/10.1038/s41467-025-62856-x

Received: 03 October 2024
Accepted: 31 July 2025
Published: 19 August 2025
Version of record: 19 August 2025
DOI: https://doi.org/10.1038/s41467-025-62856-x