Extended Data Fig. 7: Temporal integration of neural activity improves decoding performance, but quadratic and linear decoding yield identical biological conclusions.
From: Fundamental bounds on the fidelity of sensory cortical coding

a–c, To identify how many partial least squares (PLS) dimensions were needed to determine d′ accurately, we divided data from each of 5 mice into three equally sized portions. We performed PLS analysis using trials in the first third. Onto the PLS dimensions thereby identified, we projected the neural ensemble activity in the second third of the data (training data). We retained only the first N_R dimensions of this projection and computed d′ in the reduced space (magenta data points) by identifying a hyperplane for optimal stimulus discrimination. Finally, we applied this discrimination strategy to the remaining third of the data (test data) and again calculated d′ (grey points). Plots show mean values of d′ as a function of N_R for the interval [0.83 s, 1.11 s] from stimulus onset (N = 5 mice; error bars denote s.d. across 100 different subsets of 1,000 neurons per mouse). We normalized d′ values to the value found for N_R = 5 on the test dataset. For N_R > 5, discrimination performance on the test data declines owing to overfitting for all discrimination strategies: instantaneous (a), cumulative (b) and integrated (c). Hence, throughout the rest of the study we used N_R = 5 for all calculations of d′. d, Pearson correlation coefficients between the optimal linear decoding weights attained using instantaneous decoding at different time bins after the onset of grating stimuli (±30° orientations). These weights were highly correlated for different time bins, especially across the interval [0.5 s, 2 s], during which d′ reaches a plateau. Further, optimal decoders for each time bin yielded nearly equivalent decoding performance when applied to data from other time bins. For instance, the optimal decoder for the fourth time bin (t = 0.97 s), when applied to any other of the last five time bins, yielded a performance within 2% of that of the optimal instantaneous decoder in all mice.
When applied to the first and second time bins, the decoder from the fourth time bin yielded decoding performances that were, respectively, 83 ± 11% and 90 ± 3% (mean ± s.d.; N = 5 mice; 217–332 trials per stimulus) of that of the optimal decoders. e, Plots of d′ versus time after stimulus onset, for instantaneous and cumulative decoding strategies (Fig. 3). For each mouse that viewed gratings oriented at ±30°, we chose 100 random subsets of 1,000 cells and normalized d′ values by those obtained using a time-integrated decoding strategy, which involved optimal linear discrimination over one interval, [0.28 s, 1.94 s], covering most of the visual stimulation period. Green traces, mean d′ values for individual mice using a time bin of 275 ms. Error bars, s.d. across 5 mice. f, In the five-dimensional space obtained by truncating ensemble neural responses to the five leading PLS dimensions, the distributions of noise in the responses to the two stimuli were highly similar. Specifically, off-diagonal elements, Σ_ij, of the noise covariance matrices for the two stimulus conditions were highly correlated (r: 0.81 ± 0.16; mean ± s.d.; N = 5 mice), as computed for the interval [0.83 s, 1.11 s] after stimulus onset. This similarity argues that a linear discrimination strategy to classify the two sets of ensemble neural responses is near optimal, as confirmed in h. Values of Σ_ij are plotted as mean ± s.d., computed across 100 different randomly chosen subsets of 1,000 neurons per mouse. g, Using optimal linear decoding, d′ values saturated as the number of trials analysed increased. Colours denote individual mice. Data points were calculated for the interval [0.83 s, 1.11 s] after stimulus onset. Error bars, s.d. across 100 different randomly chosen subsets of 1,000 cells per mouse and stimulation trials. h, To check whether our results depended on our use of linear decoding, we tested whether quadratic decoding might yield different conclusions.
We examined the Kullback–Leibler (KL) divergence (ref. 31), a generalization of (d′)² that makes no assumption about the statistical distributions under consideration. We computed the KL divergence, which equals (d′)² for linear decoders, by using Gaussian approximations to the distributions of ensemble neural responses to the two different stimuli, and we plotted the results as a function of the number of cells, n, in the ensemble. First, to recapitulate our determinations of (d′)² (magenta data points), we computed the KL divergence under the assumption that the two different response distributions had distinct means but identical noise covariance matrices, which we estimated as the mean noise covariance matrix averaged over the two different stimulus conditions. This is equivalent to computing (d′)². Next, we relaxed the assumption that the two noise covariance matrices were equal and computed the KL divergence between the distributions of neural responses to stimulus B relative to those to stimulus A (blue points), and vice versa (red points) (Methods). For all mice, KL divergence values saturated with increasing n and, except in one mouse, were not much larger than (d′)² values. Thus, quadratic decoders (which are optimal for discriminating two Gaussian distributions with different means and covariances) will yield the same basic conclusions as linear decoders (which are optimal for discriminating two Gaussian distributions with the same covariance matrix). Data points and error bars denote mean ± s.d. values computed in each mouse across 50 different randomly chosen subsets of cells and assignments of visual stimulation trials to decoder training and testing (Extended Data Fig. 5b). i, Mean neural responses, averaged across all cells, to stimuli A (top) and B (bottom) for the first and second halves of the experimental trials in each mouse. Error bars, s.d. across the set of trials.
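The comparison in h between the shared-covariance and separate-covariance cases can be illustrated with the closed-form KL divergence between two multivariate Gaussians. The following is a minimal sketch, not the authors' implementation: the function names `gaussian_kl` and `d_prime_squared` are hypothetical, and the normalization is an assumption. Under the standard definition (in nats), the KL divergence between two Gaussians with a shared covariance equals half the squared Mahalanobis distance between their means, so the sketch compares 2 × KL with (d′)²; the convention used in the paper is specified in its Methods and is not reproduced here.

```python
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """KL(P || Q) in nats for multivariate Gaussians P and Q (standard closed form)."""
    k = mu_p.shape[0]
    diff = mu_q - mu_p
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))   # tr(cov_q^-1 cov_p)
    maha_term = diff @ np.linalg.solve(cov_q, diff)         # Mahalanobis distance under Q
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term + maha_term - k + logdet_q - logdet_p)

def d_prime_squared(mu_a, mu_b, cov_shared):
    """(d')^2 for an optimal linear decoder under a shared noise covariance."""
    diff = mu_b - mu_a
    return float(diff @ np.linalg.solve(cov_shared, diff))

# Illustrative (made-up) means and covariances in a 3-dimensional reduced space.
mu_a = np.zeros(3)
mu_b = np.array([1.0, 0.5, -0.5])
cov_a = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])
cov_b = 1.2 * cov_a                      # assumed mild difference between conditions
cov_mean = 0.5 * (cov_a + cov_b)         # mean covariance over the two conditions

# Shared-covariance case: 2 * KL coincides with (d')^2.
shared = 2 * gaussian_kl(mu_a, cov_mean, mu_b, cov_mean)
# Separate-covariance case, computed in both directions.
a_to_b = 2 * gaussian_kl(mu_a, cov_a, mu_b, cov_b)
b_to_a = 2 * gaussian_kl(mu_b, cov_b, mu_a, cov_a)
print(shared, d_prime_squared(mu_a, mu_b, cov_mean), a_to_b, b_to_a)
```

When the two covariance matrices are similar, the directional values stay close to the shared-covariance value, which is the pattern the caption describes for most mice.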
j, d′ values computed for each mouse using instantaneous decoders trained on the first half of the trials and tested on the second half (x axis), plotted with d′ values for an instantaneous decoder trained on the second half of the trials and tested on the first half (y axis). a–j are based on 217–332 trials per stimulus condition in each of 5 mice.
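The three-way split described in a–c (PLS on one third, hyperplane fitting on a second third, evaluation on the last third) can be sketched as follows. This is an illustrative sketch on simulated data, not the authors' code: the single-response NIPALS variant of PLS, the Fisher-discriminant hyperplane, the function names and the simulated responses are all assumptions introduced here.

```python
import numpy as np

def pls1_rotation(X, y, n_components):
    """Single-response PLS (NIPALS). Returns (x_mean, R); scores are (X_new - x_mean) @ R."""
    x_mean = X.mean(axis=0)
    Xd = X - x_mean
    yd = y.astype(float) - y.mean()
    W, P = [], []
    for _ in range(n_components):
        w = Xd.T @ yd                     # direction of maximal covariance with the label
        w /= np.linalg.norm(w)
        t = Xd @ w                        # scores along this direction
        tt = t @ t
        p = Xd.T @ t / tt                 # loadings
        Xd = Xd - np.outer(t, p)          # deflate the data
        yd = yd - t * (t @ yd) / tt       # deflate the label
        W.append(w)
        P.append(p)
    W, P = np.column_stack(W), np.column_stack(P)
    return x_mean, W @ np.linalg.inv(P.T @ W)

def fit_hyperplane(Z, labels):
    """Normal vector of the Fisher linear discriminant (mean of the two class covariances)."""
    a, b = Z[labels == 0], Z[labels == 1]
    cov = 0.5 * (np.atleast_2d(np.cov(a, rowvar=False)) + np.atleast_2d(np.cov(b, rowvar=False)))
    return np.linalg.solve(cov, b.mean(axis=0) - a.mean(axis=0))

def d_prime(Z, labels, w):
    """Signed d' of the two classes after projection onto w."""
    a, b = Z[labels == 0] @ w, Z[labels == 1] @ w
    return (b.mean() - a.mean()) / np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))

def three_way_d_prime(X, labels, n_r, rng):
    """Return (training d', test d') when n_r PLS dimensions are retained."""
    pls_idx, train_idx, test_idx = np.array_split(rng.permutation(len(labels)), 3)
    x_mean, R = pls1_rotation(X[pls_idx], labels[pls_idx], n_r)
    Z_train = (X[train_idx] - x_mean) @ R
    Z_test = (X[test_idx] - x_mean) @ R
    w = fit_hyperplane(Z_train, labels[train_idx])
    return d_prime(Z_train, labels[train_idx], w), d_prime(Z_test, labels[test_idx], w)

# Simulated responses (assumption): 600 trials, 200 cells, unit-variance noise,
# and a small mean shift between the two stimuli.
rng = np.random.default_rng(0)
n_trials, n_cells = 600, 200
labels = rng.permutation(np.arange(n_trials) % 2)
X = rng.normal(size=(n_trials, n_cells)) + 0.3 * labels[:, None]
for n_r in (1, 5, 20):
    d_train, d_test = three_way_d_prime(X, labels, n_r, np.random.default_rng(1))
    print(n_r, round(d_train, 2), round(d_test, 2))
```

Keeping the PLS, training and test trials disjoint is what lets a gap between the training and test d′ values be read as overfitting when the number of retained dimensions grows.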