Main

Learning is the product of changes in the strength of synaptic connections between neurons6,7,8,9,10,11,12,13. Synaptic modifications can have difficult-to-predict effects on network output, particularly in complex hierarchical networks such as the brain. The challenge of determining how individual synapses should be altered to improve task performance is known as the credit assignment problem14,15,16,17,18. Whereas this problem is effectively solved in artificial neural networks (ANNs) by the backpropagation-of-error algorithm19, how credit assignment is solved in the brain remains unknown14,15.

Recent theoretical work has proposed several models by which biological circuits could solve credit assignment, including target learning and backpropagation-like algorithms1,2,3,4,5,20,21. Central to both artificial and biologically inspired solutions to credit assignment is the vectorization of instructive signals, as opposed to the broadcasting of a single scalar teaching signal14. Effective learning requires, in addition to vectorization, instructive signals to be separable from feedforward inputs to prevent interference15. In ANNs, this is achieved via temporal separation, which has long been thought to be biologically implausible. One hypothesis is that in cortex, credit-related information is spatially, rather than temporally, segregated in the apical dendrites of pyramidal neurons15. This aligns with anatomical and circuit evidence that feedforward inputs are received perisomatically and feedback inputs are received in the distal dendrites22,23,24,25,26,27,28,29,30,31. However, direct evidence regarding the subcellular mechanisms of credit assignment is lacking.

Vectorized teaching signals at the dendritic level should meet four experimentally testable conditions. First, dendritic activity should contain information that is not present in somatic activity alone (although somas could theoretically transmit gradients using qualitatively different spiking patterns2,4,32, the cable properties of dendrites predict some level of independence between somatic and dendritic activity). Second, dendritic activity should encode information about task performance that could serve as instructive signals, such as reward and error representations. Third, dendritic activity should reflect the contribution of that neuron to task performance (that is, the reward function). Fourth, disrupting vectorized instructive dendritic signals should impair learning.

Specifying a reward function using a BCI task

Evaluating credit assignment in biological neural networks has thus far proved impossible14,15. Teaching signals can only be defined relative to a reward function that maps neural activity to task performance. It is unclear whether such functions are explicitly represented in the brain. Even if they are, experimenters are blind to their specific formulation in terms of neural activity15. Neurofeedback brain–computer interface (BCI) tasks present a potential solution to this problem by directly coupling neural activity to task performance, thereby allowing the experimenter to specify the reward function to be optimized14,20,21. Previous studies have shown that mice are able to learn BCI tasks using a variety of feedback stimuli and brain areas and that learning induces changes in the activity of the neurons controlling the BCI, including in the hippocampus and various sensory and motor cortices33,34,35,36,37,38,39. Here we leveraged a visually guided neurofeedback BCI task in cortical pyramidal neurons to test subcellular mechanisms for error and reward-related signalling (Fig. 1a–c and Supplementary Figs. 1 and 2). We trained head-fixed mice under a 2-photon microscope to control the activity of two spatially intermingled sets of GCaMP7f-labelled layer 5 pyramidal neurons, in the retrosplenial cortex (RSC), designated P+ and P− (selection criteria in Extended Data Figs. 1 and 4b and Methods). The difference in mean somatic GCaMP activity of P+ versus P− neurons was coupled to rotation of a visual grating relative to a rewarded target angle33,34,35,36,38,39 (Fig. 1d–f and Supplementary Data Fig. 1). We selected RSC owing to the optical accessibility of layer 5 and previous demonstration of independent dendritic events in this area40. 
We recorded GCaMP activity at 15 Hz in the proximal trunk dendrite as a proxy for somatic activity; this allowed imaging of many neurons while reducing signal contamination owing to the more precise spatial footprint and faster signal kinetics of the apical trunk41,42,43. We measured task performance with two metrics: accuracy, which represented the fraction of rewarded trials; and speed, which represented the number of rewards obtained per minute. Mice (n = 6) learned the task by both metrics (Fig. 1g and Extended Data Figs. 2 and 3).
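
As a minimal sketch of the reward function described above, the following Python maps the difference between mean P+ and mean P− activity onto one of seven stimulus angles (0° to 90° in 15° steps, as stated in the Fig. 1b legend). The calibration bounds `lo` and `hi`, the equal-width binning and the expression of error in degrees are assumptions made for illustration; they are not taken from the Methods.

```python
from statistics import fmean

ANGLES = [0, 15, 30, 45, 60, 75, 90]  # 7 bins, 15 degrees apart (Fig. 1b legend)

def bci_angle(p_plus, p_minus, lo=-1.0, hi=1.0):
    """Map mean(P+) - mean(P-) onto one of 7 stimulus angles.

    lo and hi are hypothetical calibration bounds: the drive values that
    correspond to 0 and 90 degrees. They are assumptions, not values
    from the paper.
    """
    drive = fmean(p_plus) - fmean(p_minus)
    frac = (drive - lo) / (hi - lo)            # linear rescaling to [0, 1]
    frac = min(1.0, max(0.0, frac))            # clamp out-of-range drive
    idx = min(int(frac * len(ANGLES)), len(ANGLES) - 1)  # equal-width bins
    return ANGLES[idx]

def task_error(angle, target=90):
    """Error expressed as angular distance from the rewarded target."""
    return abs(target - angle)
```

Because the mapping is fixed by the experimenter, each neuron's contribution to the error is known by construction: raising P+ activity moves the angle towards the target, and raising P− activity moves it away.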

Fig. 1: Mice learn a neurofeedback BCI task through the differential regulation of P+ and P− neurons.

a, Schematic of the BCI setup. Mice were head-fixed and imaged under a 2-photon (2P) microscope and free to run on a cylindrical treadmill. Two user-defined populations of GCaMP7f-labelled layer 5 (L5) pyramidal neurons in RSC were imaged at the proximal apical trunk: P+ (red) and P− (blue) were selected to control the rotation of a Gabor patch. P0 neurons were designated as all other neurons in the field of view. Single frames were online-registered (motion-corrected). Activity in P+ neurons rotated the patch clockwise, towards the target angle of 90°. Activity in P− neurons rotated the Gabor patch stimulus counter-clockwise, towards a 0° angle. b, Schematic of the mapping between P+ and P− activity, stimulus angle, target activity and error. Error was the distance between current and target activation. The angle represents a binned (7 bins, 15° apart, from 0° to 90°) linear mapping between the mean activity in P+ neurons minus the activity in P− neurons. c, Trial structure: mice had 28 s to reach target activity and receive a reward, delivered 1 s later. In successful trials, the 90° Gabor patch was shown for 2 s, followed by 1 s of black screen presentation. In unsuccessful trials, a 3 s black screen was presented before the onset of the next trial. d, ΔF/F0 traces as recorded live for P+ (red) and P− (blue) neurons. Vertical dashed lines and triangles represent timepoints where the mouse reached target activity. e, Mean activity for the red (P+) and blue (P−) traces shown in d. The black trace shows the arithmetic subtraction of P+ and P− neurons (z-scored). The orange trace shows the corresponding visual stimulus angle as presented to the mouse. f, Mean ΔF/F0 for P+ and P− activity aligned to the time in which the mouse reached target activity (dashed, vertical line and black triangle) for the session highlighted in d,e. Reward was delivered 1 s later (solid vertical line with water reward). Shaded areas represent s.e.m. 
g, Mean performance over days quantified as the fraction of successful trials over the total number of trials and as the number of rewards per minute (one-way repeated measures ANOVA, P = 5 × 10−4 (accuracy) and P = 0.002 (rewards per minute); n = 6 mice). The dashed horizontal red line represents chance level for accuracy performance (Methods). Shaded areas represent s.e.m. h, ΔF/F0 traces for the same P+ and P− neurons on training day 1 and training day 14. i, Calcium transient frequency for P+, P− and P0 neurons across the 14 days of training normalized to the activity on day 1. All neurons were tracked over the full 14 days of imaging. Two-way repeated measures ANOVA, P = 0.012, P = 0.004 and P = 9.3 × 10−4 for the effect of population identity, days and an interaction between population identity and days. After Tukey’s multiple comparison, P = 0.027 (P+ versus P− neurons), P = 0.95 (P+ versus P0 neurons) and P = 0.01 (P− versus P0 neurons). n = 6 mice. Shaded areas represent s.e.m.

We compared activity levels of P+ and P− populations, as well as the population of surrounding neurons that were not directly involved in the rotation of the stimulus (termed P0), across days of task performance. We imaged the same neurons longitudinally throughout all experiments. We found that learning was accompanied by the differential regulation in the activity of P+ and P− neurons over days (Fig. 1h,i), with P+ neurons maintaining their activity levels while P− neurons were downregulated. Whereas, on average, changes in activity in P0 neurons resembled changes in P+ neurons (Fig. 1i), selecting the subpopulation of P0 neurons with matching activity levels of P+ and P− neurons on day 1 revealed that changes in activity in P0 neurons fell in between those of P+ and P− neurons (Extended Data Fig. 4). As the most active neurons on day 1 were also those that were most strongly downregulated (Extended Data Fig. 4c), our results are consistent with a model of learning by sparsification, an energy-efficient solution to the task44. Increases in task performance were not correlated with changes in locomotion across days (Extended Data Fig. 3). Moreover, the P+ and P− populations were spatially intermingled, and had the same GCaMP transient frequency on day 1 (Extended Data Figs. 1 and 4a), ruling out the possibility of learning the task by simply engaging a non-specific gain modulation mechanism.

Dendrites contain information not found in their somas

To determine whether apical dendritic activity contained information that was not encoded in parent somatic activity alone, we used an electrically tunable lens to semi-simultaneously (15 Hz per plane) record activity in proximal and distal trunk dendrites across learning (Fig. 2a). We paired proximal and distal dendrites on the basis of the Pearson correlation of their GCaMP signals, thresholded at r = 0.6 as in previous studies41,42,43. Previous work in brain slices demonstrated that dendritic GCaMP signals are larger when current is injected in the distal trunk and smaller when current is injected at the soma41 (controlling for the same number of triggered corresponding action potentials). This indicates that differences in somatic versus dendritic magnitude for coincident GCaMP events reflect the spatial bias of the different inputs that target these two compartments. To estimate the magnitude of somatic and dendritic events, we first deconvolved the GCaMP traces of somas and dendrites using CASCADE45. Deconvolution allowed us to correct for the well-described problem of different signal kinetics across dendritic compartments46. Next, we used an area-under-the-curve approach to quantify the magnitude of individual transients (all main results were also validated using a ΔF/F0-based approach to estimating the magnitude of transients; Methods and Supplementary Fig. 3) and defined events as coincident whenever they occurred within 500 ms of each other. As these coincident events represent the vast majority of GCaMP transients40,41,42,43,46,47,48,49,50,51,52, we focused all subsequent analysis on events for which a transient was detected in both compartments.
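
The 500 ms coincidence criterion can be illustrated with a short Python sketch that pairs event times detected in the two compartments. The greedy two-pointer sweep and the function name `pair_coincident` are illustrative assumptions; the text specifies only the 500 ms window, not the matching procedure.

```python
def pair_coincident(soma_times, dend_times, window=0.5):
    """Pair somatic and dendritic event times (s) that fall within `window` s.

    Both lists must be sorted in ascending order. Each event is used at
    most once. The greedy sweep is an illustrative choice, not the
    procedure used in the paper.
    """
    pairs, i, j = [], 0, 0
    while i < len(soma_times) and j < len(dend_times):
        dt = dend_times[j] - soma_times[i]
        if abs(dt) <= window:
            pairs.append((soma_times[i], dend_times[j]))
            i += 1
            j += 1
        elif dt < 0:
            j += 1   # dendritic event precedes any possible partner; skip it
        else:
            i += 1   # somatic event has no dendritic partner; skip it
    return pairs
```

Events left unpaired by such a procedure would be excluded, matching the restriction of the analysis to transients detected in both compartments.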

Fig. 2: Differences in somatic and dendritic magnitudes for coincident events are predicted by local network dynamics.

a, Schematic of two-plane 2-photon calcium imaging of a network of neurons at the proximal and distal trunk. b, ΔF/F0 traces recorded simultaneously in the soma and dendrite for a single neuron of interest (top; P+ and P− neurons across days 1–14) and corresponding activity in 5 surrounding neurons (bottom). Numbers 1–5 indicate identified GCaMP events. c, Relationship between integrals of somatic and dendritic transients for the example neuron shown in b. Data points represent individual events that were simultaneously detected in soma and dendrite (Methods). A least-squares linear model (dashed grey line) defined events as dendritically amplified (residual +) versus dendritically attenuated (residual −). Events 1–5 correspond to the transients shown in b. Subscript d denotes deconvolved. d, For each coincident event in the neuron of interest shown in b,c, we estimated the network activity vector in the preceding 2 s using all other neurons in the field of view. Here, the network activity vector was projected onto the first three principal components for visualization only. The shaded black hyperplane represents the decision boundary for binary classification (dendritically amplified versus dendritically attenuated) calculated using a linear SVM. Events 1–5 correspond to the network activity vector associated with transients 1–5 shown in b,c. e, The relationship between SD residuals estimated in c and the distance from the decision boundary (hyperplane distance) estimated in d for all coincident somato-dendritic events in the neuron of interest. Events 1–5 correspond to those shown in b–d. The dashed grey line represents the least-squares best-fit line. To maintain visual consistency with d, the distance from the hyperplane was estimated on the first three principal components (for visualization only). f, The relationship between SD residual as estimated in c and somatic event magnitude. Highlighted events 1–5 correspond to those shown in b–e.
The dashed grey line represents the least-squares best-fit line. g, Decoder performance as a function of the correlation between SD residuals and hyperplane distance (Pearson’s r = 0.74; P = 1.4 × 10−84, n = 466 neurons). Data points represent individual neurons. h, Distribution of P values for test data and a control randomly shuffled distribution, testing the correlation between SD residuals and distance from the hyperplane (or classification confidence, Wilcoxon signed rank test P = 1.3 × 10−9; n = 466 neurons) as estimated in e. i, Left, for all neurons, the Pearson’s r for SD residuals and somatic event magnitude as characterized in f. The residual-based approach perfectly decorrelates SD residual from somatic activity alone. Right, for test data, a zoomed-in version of the same histogram shown on the left. j, Decoding performance for neurons with a statistically significant correlation between SD residual and distance from the hyperplane (paired t-test, P = 8.6 × 10−9; 0.61 ± 0.006 (test) and 0.50 ± 0.007 (shuffle), mean ± s.e.m.; n = 82). Dashed grey line indicates chance level. k, Pearson’s r for neurons with a statistically significant correlation between SD residual and the distance between population vector and hyperplane (paired t-test, P = 3.35 × 10−25; 0.28 ± 0.01 (test) and −7.2 × 10−4 ± 0.01 (shuffle), mean ± s.e.m.; n = 82). l, Mean ΔF/F0 events for soma and dendrites for all dendritically amplified (left) and dendritically attenuated (right) events in a single neuron. ΔF/F0 traces are aligned to somatic peak time. Event latency is defined as the time between the somatic and dendritic peaks. Dendritically amplified events peaked earlier compared with dendritically attenuated events.
m, Pearson correlation value between the SD residual and the event latency between soma and its corresponding dendrite, indicating that the larger the SD residual, the earlier the dendritic peak is compared to the somatic one (paired t-test, P = 8 × 10−13; −0.075 ± 0.007 (test) and −0.005 ± 0.006 (shuffle), mean ± s.e.m; n = 466 neurons).

Empirically, we observed that the relative magnitude of coincident events in somas and dendrites varied substantially, despite event timing correlation being very high (Fig. 2b; consistent with prior studies40,41,43,46,47,49). As event magnitudes at soma and dendrites were best described by a linear relationship (Extended Data Figs. 5 and 6b), we assessed the relative degree of dendritic amplification versus attenuation with a best-fit line through all events and then calculated the somato-dendritic residual (SD residual) associated with individual transients43 (Fig. 2b,c). This captured the variance of dendritic responses for a given somatic event magnitude. We then defined positive and negative residuals as dendritically amplified and attenuated events, respectively.
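
The SD residual can be sketched in a few lines of Python: fit an ordinary least-squares line predicting dendritic from somatic event magnitude, then take each event's signed deviation from that line. The event magnitudes below are made-up toy numbers, and the helper names are hypothetical.

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def sd_residuals(soma_mag, dend_mag):
    """Signed deviation of each dendritic magnitude from the fitted line."""
    slope, intercept = fit_line(soma_mag, dend_mag)
    return [d - (slope * s + intercept) for s, d in zip(soma_mag, dend_mag)]

soma = [1.0, 2.0, 3.0, 4.0]   # toy somatic event magnitudes (made up)
dend = [1.5, 1.5, 3.5, 3.5]   # toy dendritic event magnitudes (made up)
res = sd_residuals(soma, dend)
labels = ["amplified" if r > 0 else "attenuated" for r in res]
```

A property of least-squares residuals is that they are uncorrelated with the regressor, so within the fitted data the SD residual carries no linear information about somatic magnitude.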

To test whether SD residuals contain information that is biologically meaningful, we used activity from all the somas in our field of view in the 2 s preceding individual GCaMP events in a neuron of interest (P+ and P− neurons on days 1 to 14) to predict whether these events were dendritically amplified or attenuated (Fig. 2d). To do so, we used a linear support vector machine (SVM), a common algorithm to both classify and regress using high-dimensional data. We found that the performance of our binary classifier on individual neurons strongly correlated with the ability of the decoder to capture the magnitude of dendritic amplification or attenuation in the classification confidence (Fig. 2e,g,h and Extended Data Figs. 6c,d and 7a,b). This was an emergent property, as the decoder was trained for binary classification only and had no information about the magnitude of dendritic amplification or attenuation. Among 466 neurons, approximately 20% showed a significant correlation between classification confidence and the magnitude of SD residual (Fig. 2h and Extended Data Figs. 6c,d and 7a,b). We found that in these neurons, we could accurately decode 61% of the events as being either amplified or attenuated, well above the 50% chance level (Fig. 2j and Extended Data Figs. 6e and 7c). Additionally, at the single-cell level, we found a statistically significant positive Pearson correlation between classification confidence and SD residual, demonstrating that the surrounding network of neurons can be used to predict the amplitude of the residual for coincident somato-dendritic transients (Fig. 2k and Extended Data Figs. 6f and 7d). Of note, our analysis approach completely decorrelates somatic event magnitude from SD residuals (Fig. 2f,i and Extended Data Fig. 6a), indicating that mismatches in somato-dendritic coupling are predicted independently from somatic activity and represent information encoded de novo in the dendrites.
Additionally, our results demonstrate that P0 neurons could be decoded at the same level as P+ and P− neurons (Extended Data Fig. 8), and that decoding does not depend on somatic responses to visual stimuli across the three subpopulations (Extended Data Fig. 9).
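
The decoding logic can be sketched as follows: train a linear classifier on the sign of the SD residual only, then ask whether the signed distance from the decision hyperplane (the classification confidence) tracks the residual magnitude. The sketch below assumes a plain full-batch subgradient solver for the regularized hinge loss with no intercept, which is not necessarily the SVM implementation used in the paper. The data are made-up toy values in which one network feature covaries with the residual by construction, and accuracy is reported on the training set for brevity; a real analysis would use held-out events and a shuffle control.

```python
from math import sqrt
from statistics import fmean

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, lr=0.1, steps=500):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    No intercept is fitted, so features are assumed to be centered.
    Labels must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [lam * wj for wj in w]                  # regularizer term
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1:                    # margin violated
                grad = [g - yi * xj / len(X) for g, xj in zip(grad, xi)]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w

def pearson(a, b):
    ma, mb = fmean(a), fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

# Toy data (made up): one 2-D "network activity vector" per coincident event.
residuals = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
nuisance = [0.3, -0.3, -0.3, 0.3, 0.3, -0.3, -0.3, 0.3]
X = [[r, c] for r, c in zip(residuals, nuisance)]
y = [1 if r > 0 else -1 for r in residuals]           # amplified vs attenuated

w = train_linear_svm(X, y)                            # sees labels only
norm = sqrt(dot(w, w))
distance = [dot(w, xi) / norm for xi in X]            # signed hyperplane distance
accuracy = fmean([1.0 if d * yi > 0 else 0.0 for d, yi in zip(distance, y)])
confidence_r = pearson(distance, residuals)
```

The classifier never receives residual magnitudes, so any correlation between `distance` and `residuals` arises only because the features that separate the two classes also scale with the residual.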

We further found that dendritically amplified events consistently peaked earlier than dendritically attenuated events compared with the soma (Fig. 2l,m and Extended Data Figs. 6g and 8e), congruent with results in brain slices41.

Experimental perturbation of SD residuals

Previous studies indicate that anaesthesia reduces top-down input and/or inhibits apical tuft dendrites in layer 5 pyramidal neurons22,53,54,55. We therefore hypothesized that the SD residual should be reduced during anaesthesia compared with wakefulness. To test this, we simultaneously recorded somatic and dendritic activity of layer 5 pyramidal neurons in RSC during these two conditions (Fig. 3a–c). Consistent with previous findings22, we observed a marked effect of anaesthesia on the frequency of GCaMP transients (Fig. 3d). For each neuron, we used all events detected during wakefulness to establish the distribution of SD residuals during awake periods. We then measured the effect of anaesthesia on the SD residual using the best-fit somato-dendritic line calculated during wakefulness. Anaesthesia strongly reduced the SD residual (Fig. 3c,e), consistent with previous observations of decreased top-down input22,55.
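
The baseline-referenced comparison described above can be sketched in Python: the somato-dendritic line is fitted on control (awake) events only, and events from the perturbed condition are scored against that fixed line. All numbers are made-up toy values and the function names are hypothetical.

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def residuals_vs_baseline(baseline, condition):
    """Residuals of `condition` events from the line fitted on `baseline`.

    Each argument is a (soma_magnitudes, dendrite_magnitudes) pair.
    """
    slope, intercept = fit_line(*baseline)
    soma, dend = condition
    return [d - (slope * s + intercept) for s, d in zip(soma, dend)]

awake = ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # toy: dendrite tracks soma
anaesthesia = ([1.0, 2.0], [0.5, 1.0])        # toy: dendrite attenuated
shift = fmean(residuals_vs_baseline(awake, anaesthesia))
```

By construction the mean baseline residual is zero, so a negative `shift` indicates that dendritic events are smaller than the awake relationship predicts for the same somatic magnitude.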

Fig. 3: Experimental manipulation of SD residuals.

a, Schematic of the experimental approach. First, we imaged neurons in RSC while the mouse was exposed to rotating stimuli identical to those presented during baseline estimation in the BCI task. Next, we anaesthetized the mice (using isoflurane) and imaged the same neurons. b, Schematic of the imaging approach. c, Top, ΔF/F0 GCaMP traces simultaneously recorded in the soma and its corresponding dendrite for two representative neurons during wakefulness and anaesthesia. Bottom, the mean ΔF/F0 signal in the somas and in the dendrites of two example neurons, for all events that occurred during wakefulness and anaesthesia. Compared to somatic activity, dendrites are preferentially inhibited during anaesthesia. Shading represents s.e.m. Max, maximum; norm., normalized. d, Mean somatic event rate (GCaMP7f) during wakefulness and anaesthesia. The dashed grey line represents the identity line. Paired t-test, P < 1.9 × 10−126; 0.002 ± 6.3 × 10−5 (wakefulness) and 0.0002 ± 1.8 × 10−5 (anaesthesia); n = 832 neurons from 8 imaging sessions in 4 different mice. e, Distribution of SD residual during wakefulness and anaesthesia. Paired t-test, P = 3.3 × 10−20; 0 ± 2.2 × 10−18 (wakefulness) and −0.52 ± 4.9 × 10−2 (anaesthesia), mean ± s.e.m.; n = 160 neurons. f, Targeting strategy schematic. A Cre-dependent version of ChRmine tagged with oScarlet was injected into layer 1 of the RSC of NDNF-Cre mice. GCaMP7f under the control of the synapsin promoter was injected into layer 5 of the RSC in the same mice. g, z-stack reconstruction of an imaged field of view. The image was acquired in vivo using 2-photon microscopy (1,000 nm laser wavelength). Scale bar, 50 μm. h, Same as c, for opto on and control conditions. Compared to somatic activity, dendrites are preferentially inhibited during opto on. i, Mean somatic event rate (GCaMP7f) during opto on and off. 
The dashed grey line represents the identity line (paired t-test, P < 2.2 × 10−308; 2.8 × 10−2 ± 4.5 × 10−4 (control) and 5.9 × 10−3 ± 2 × 10−4 (opto on), mean ± s.e.m.; n = 2,884 neurons). Shading represents s.e.m. j, Distribution of SD residual during control and opto on. Paired t-test, P = 9 × 10−214; n = 1,886 neurons from 56 sessions recorded from 4 mice; 0 ± 7.7 × 10−19 (control) and −0.87 ± 2.4 × 10−2 (opto on).

Prior work has also demonstrated that NDNF-positive layer 1 inhibitory interneurons can inhibit the apical dendrites of pyramidal neurons53,54. We therefore tested whether NDNF-mediated inhibition reduced SD residuals, indicative of a preferential effect on apical dendritic activity. To do so, we co-injected NDNF-Cre mice (n = 4) with both a Cre-dependent version of ChRmine in layer 1 and GCaMP7f expressed under the control of the synapsin promoter in layer 5 (Fig. 3f,g). We then recorded somatic and dendritic GCaMP activity of individual layer 5 neurons, in the presence and absence of layer 1 NDNF+ interneuron activation via an LED light (Fig. 3h and Extended Data Fig. 10). Similar to our approach during anaesthesia, we first established the control relationship between somatic and dendritic event amplitudes for each neuron and then compared this to the SD residuals of activity during optogenetic activation. NDNF+ interneuron activation reduced the frequency of GCaMP transients (Fig. 3i) and, consistent with a number of previous ex vivo and in vivo studies53,54,56, strongly decreased the SD residual in individual layer 5 pyramidal neurons (Fig. 3h,j). LED illumination alone (conducted in a control cohort of five mice) was not responsible for these results (Extended Data Fig. 10). Together, these results demonstrate that SD residual is predictably affected by two independent experimental manipulations in vivo, establishing it as a robust metric of dendritic versus somatic activity.

SD residuals decode reward and trial outcome

Next, we evaluated whether the SD residual contained information about task-related variables that could serve as putative teaching signals. We first tested whether changes in SD residual at the population level contained reward-related information. For each imaging session, we decoded rewarded versus unrewarded trials by comparing the 2 s following neural activity reaching target activation on rewarded trials with the analogous 2 s timeout period during unrewarded trials (Fig. 4a–c). Using a linear SVM trained on SD residuals (Methods), we were able to decode at 63% accuracy on average, above both chance and shuffle performance (Fig. 4d,e and Extended Data Figs. 6 and 11a,b).
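
To make the decoder input concrete, the Python sketch below builds one SD-residual population vector per behavioural epoch, with one entry per neuron. Averaging residuals within an epoch and assigning 0.0 to neurons without events are assumptions made for this sketch; the exact aggregation is described in the Methods, not here. The session data are made up.

```python
from statistics import fmean

def population_vector(events_by_neuron, t_start, t_end):
    """One SD-residual value per neuron for the epoch [t_start, t_end).

    events_by_neuron: list (one entry per neuron) of (onset_s, sd_residual).
    Averaging within the epoch and using 0.0 for silent neurons are
    assumptions made for this sketch.
    """
    vec = []
    for events in events_by_neuron:
        in_epoch = [r for t, r in events if t_start <= t < t_end]
        vec.append(fmean(in_epoch) if in_epoch else 0.0)
    return vec

# Toy session (made-up numbers): three neurons, target reached at t = 10 s.
events = [
    [(9.0, -0.4), (10.5, 0.8), (11.5, 0.4)],
    [(10.2, -0.3)],
    [(3.0, 0.1)],
]
post_target = population_vector(events, 10.0, 12.0)   # 2 s after target
pre_target = population_vector(events, 8.0, 10.0)     # 2 s before target
```

Vectors of this kind, one per trial epoch, would then be labelled as rewarded or unrewarded and passed to the linear classifier.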

Fig. 4: A population vector of SD residuals contains reward-related information.

a, Schematic of the experimental approach. We isolated all neurons in the field of view with a paired soma and dendrite and used the SD residual population vector to decode task-relevant variables. b, ΔF/F0 GCaMP traces simultaneously recorded in the soma and its corresponding dendrite for three representative neurons. c, Schematic of SD residual population vector for four different behavioural epochs. Left, somatic, dendritic and residual traces for three cartoon neurons. Green, cyan, purple and black boxes represent 4 different behavioural epochs: 2 s before and after target is reached for rewarded trials and 2 s before and after the end of unrewarded trials. The SD residual trace of each neuron was estimated by collapsing coincident soma–dendrite events into point values at the time of event onset. Resultant SD residual traces for all neurons in each behavioural epoch were used to estimate the n-dimensional vector of SD residuals, where n corresponded to the number of neurons with paired somas and dendrites (Fig. 2 and Methods). Right, 3D plot of SD residuals for the four behavioural epochs shown on the left, where x, y and z correspond to SD residual of neurons 1–3 (left). d, A 39-dimensional vector of SD residuals collapsed onto the first 3 principal components for visualization purposes only. Each dot corresponds to a vector of SD residuals. Cyan dots are vectors resulting from the 2 s following the reaching of target activity in rewarded trials. Dark grey dots are vectors calculated in the 2 s following the end of an unsuccessful trial (same as c). The shaded black hyperplane represents the decision boundary for binary classification calculated using a linear SVM. e, Decoding accuracy for test versus shuffle data for 83 sessions (paired t-test, P = 9.8 × 10−9; 0.63 ± 0.01 (test) and 0.52 ± 0.01 (shuffled), mean ± s.e.m.; n = 83).
f,g, Same as in d,e; a 264-dimensional vector of SD residual collapsed onto the first 3 principal components for visualization only. Green represents the last 2 s of a rewarded trial and purple represents the last 2 s of an unrewarded trial (paired t-test, P = 7.1 × 10−8; 0.57 ± 0.01 (test) and 0.49 ± 0.01 (shuffled), mean ± s.e.m.; n = 83). h, Schematic of the experimental approach: layer 1 NDNF+ interneurons (INs) were optogenetically activated during BCI task performance. i–l, Same as d–g, but during optogenetic activation of NDNF+ interneurons. j, Paired t-test, P = 0.18; 0.48 ± 0.02 (test) and 0.51 ± 0.02 (shuffled), mean ± s.e.m.; n = 53 sessions from 4 mice. l, Paired t-test, P = 0.13; 0.54 ± 0.03 (test) and 0.50 ± 0.01 (shuffled); n = 55 sessions. NS, not significant.

Next, we tested whether inputs onto the apical tuft dendrites represent instructive signals during learning. We used SD residuals to decode successful versus unsuccessful trials in the 2 s periods preceding successful target activation versus timeout, respectively. Once again, we found that our decoder performed significantly above chance at 57% accuracy on average (Fig. 4c,f,g and Extended Data Figs. 6 and 11c,d), demonstrating that individual neurons encode information about the network states that correspond to successful versus unsuccessful outcomes in their SD residuals both before and after reward delivery. As the trial time we analysed is pre-outcome, our results indicate that the SD residuals encode instructive signals based on the task-associated reward function.

Finally, we tested the role of layer 1 inhibition in controlling dendritic signals encoding reward and trial outcome (Fig. 4h). To do this, we performed experiments on a second set of four mice expressing ChRmine in NDNF+ layer 1 interneurons. Optogenetic activation of layer 1 NDNF+ neurons, but not LED illumination alone (conducted in a control group of five mice), abolished task and reward-related information in the apical dendrites of layer 5 pyramidal neurons (Fig. 4i–l and Extended Data Fig. 10), highlighting a potential role for local cortical inhibition in dendritic processing of task-related variables.

SD residuals reflect neuron-specific task error signals

We exploited the explicit definition of error and of functionally opposite classes of neurons in our experimental design to test whether error signals are received at apical dendrites and, if so, whether they differ between neurons according to each neuron’s causal role in the task (Fig. 5a,b). We reasoned that a scalar error signal would manifest as amplified dendritic activity during periods of error reduction for both P+ and P− neurons and as attenuated dendritic activity during times of error increase. However, a vectorized error signal would exhibit selective P+ versus P− dendritic activation, as the activity of each group is causally mapped to error in opposite ways. To disambiguate between these scenarios, we averaged the error in 2 s windows throughout the task and defined each window as an error increase or decrease epoch, given that the angle of the visual stimuli presented to the mice represented the instantaneous task-associated error (Fig. 5a). Next, we calculated the SD residuals for P+ and P− neurons for coincident soma–dendrite events in each window during error decrease and error increase epochs. As our analysis was restricted to time bins with coincident somato-dendritic events in P+ and/or P− neurons, any potential noise-driven flickering was not present in our analysis. We found that the dendrites of P+ neurons were relatively amplified during error reduction compared with error increase epochs (Fig. 5c). Dendrites in P− neurons exhibited the converse relationship: relative dendritic attenuation and amplification occurred during error reduction and error increase, respectively (Fig. 5d–f and Extended Data Figs. 12 and 6). This relationship could be observed in six out of six mice trained in the task (Extended Data Fig. 13e) and remained intact when we restricted our analysis to neurons whose somatic activity was the same during epochs of error increase and reduction (Extended Data Fig. 13). 
Additionally, the same inverted relationship between dendritic signals and task-associated errors was found in the dendrites of P0 neurons that were functionally correlated to P+ and P− neurons (Extended Data Fig. 14). Of note, SD residuals represented error derivatives, not errors (Extended Data Fig. 13), in contrast to instructive signals found in the classical implementations of backpropagation.
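
The epoch definition can be sketched in Python: the stimulus-angle trace is cut into 2 s bins and each bin is labelled by the sign of its mean derivative. Because the target is 90°, a rising angle corresponds to decreasing error. The 15 Hz sampling rate follows the imaging rate stated earlier; the "flat" label for bins with zero mean derivative and the function name are assumptions made for illustration.

```python
from statistics import fmean

def label_error_epochs(angle, fs=15.0, bin_s=2.0):
    """Label each 2 s bin by the sign of the mean angle derivative.

    The target is 90 degrees, so a rising angle means shrinking error.
    Bins with zero mean derivative are labelled "flat" (an assumption).
    Trailing samples that do not fill a whole bin are ignored.
    """
    n = int(fs * bin_s)                                        # samples per bin
    deriv = [(b - a) * fs for a, b in zip(angle, angle[1:])]   # degrees per s
    labels = []
    for start in range(0, len(deriv) - n + 1, n):
        m = fmean(deriv[start:start + n])
        if m > 0:
            labels.append("error_decrease")
        elif m < 0:
            labels.append("error_increase")
        else:
            labels.append("flat")
    return labels
```

SD residuals of P+ and P− neurons can then be grouped by these labels. Under a scalar teaching signal both populations would shift in the same direction across epoch types, whereas a vectorized signal predicts opposite shifts.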

Fig. 5: Dendritic error signals are cell-specific and depend on the causal contribution of the neuron to the task.

a, Left, experimental schematic. Top right, idealized somatic, dendritic and residual traces, respectively, for three neurons. Bottom right, relationship between stimulus angle, target and error. All closed-loop data from the BCI paradigm (which excludes rewards and timeout periods) were chunked into 2 s bins. Epochs of error decrease and increase were defined as bins in which the mean derivative of the angle increased and decreased, respectively. b, Three possible hypotheses: in the null hypothesis scenario, error is calculated at the population level through recurrent dynamics independent of dendrites. In the scalar hypothesis, a single error signal is broadcast through the dendrites of all neurons in the network. This model predicts that all neurons receive the same error signal. In the vector hypothesis, error signals received on the dendrites of individual neurons are tailored according to the causal involvement of each neuron in behaviour. This model predicts that neurons with opposite causal contributions to behaviour will receive different error signals onto their dendrites. c, For two individual P+ neurons, the mean ΔF/F0 signal in the somas (black) and in the dendrites (orange) for all events that occurred during epochs of error reduction and error increase, respectively. Compared to somatic activity, dendrites are relatively amplified during error reduction compared to error increase epochs. The bar graph represents mean ± s.e.m. of SD residual value (z-scored) for all events that occurred during error decrease and increase epochs. d, Same as c, for two P− neurons. In contrast to the P+ neurons, dendritic activity is relatively attenuated for error reduction epochs compared with error increase epochs. e, Top, during error reduction epochs, the cumulative distribution function for SD residuals (z-scored across all neurons) for P+ and P− neurons. The bar graph represents mean ± s.e.m.
for the population distribution shown in the cumulative distribution function for P+ and P− neurons. Dendrites of P+ neurons are relatively amplified compared to the dendrites of P− neurons during error reduction epochs (t-test; P = 5.3 × 10⁻⁷; 0.005 ± 0.01 (P+) and −0.14 ± 0.02 (P−), mean ± s.e.m.; n = 292 (P+) and 240 (P−) neurons). Bottom, dendrites in P+ neurons are more attenuated than those in P− neurons during error increase epochs (t-test; P = 1.2 × 10⁻⁷; −0.1 ± 0.02 (P+) and 0.05 ± 0.01 (P−), mean ± s.e.m.; n = 267 (P+) and 249 (P−)). f, Experimental schematic. GCaMP signals in the soma and dendrites of P+ and P− neurons were recorded during optogenetic activation of layer 1 NDNF+ neurons. g, As in e but during layer 1 NDNF+ neuron activation. Left, during error reduction epochs, the cumulative distribution function for SD residuals (z-scored across all neurons) for P+ and P− neurons. The bar graph represents mean ± s.e.m. for the population distribution shown in the cumulative distribution function for P+ and P− neurons (t-test; P = 0.58; 0.06 ± 0.04 (P+) and 0.02 ± 0.04 (P−), mean ± s.e.m.; n = 119 (P+) and 100 (P−) neurons). Right, as in e (bottom), for error increase epochs (t-test; P = 0.7; 0.02 ± 0.05 (P+) and 0 ± 0.03 (P−); n = 105 (P+) and 105 (P−)). h, BCI performance from four mice. Left, accuracy in early and late training for control (opto off) and opto on conditions. Dashed grey line represents chance level (paired t-test; P = 0.83; 0.51 ± 0.04, mean ± s.e.m.; n = 32 for opto on against chance during early training; paired t-test; P = 3 × 10⁻⁷; 0.63 ± 0.03; n = 48 for control against chance during early training; P = 0.1; 0.56 ± 0.02; n = 24 for opto on against chance during late training; P = 5.7 × 10⁻²³; 0.8 ± 0.01; n = 36 for control against chance during late training; t-test, P = 0.36 (opto on) and 1.3 × 10⁻⁷ (control) for early versus late training). 
Right, rewards per minute in early and late training for control (opto off) and opto on conditions. Dashed grey line represents rewards per minute for control on day 1 (paired t-test; P = 0.8; 1.6 ± 0.2, mean ± s.e.m.; n = 32 for opto on against control on day 1, during early training; paired t-test; P = 7 × 10⁻⁷; 0.63 ± 0.26; n = 48 for control against control on day 1, during early training; P = 0.23; 1.9 ± 0.26; n = 24 for opto on against control on day 1, during late training; P = 4.2 × 10⁻¹⁷; 3.8 ± 0.14; n = 36 for control against control on day 1, during late training; t-test, P = 0.42 (opto on) and 1.5 × 10⁻⁷ (control) for early versus late training).


Next, we tested whether vectorized error-related dendritic signals were necessary for learning by optogenetically activating NDNF+ layer 1 interneurons throughout the BCI task. This manipulation abolished vectorized error-related signalling in the apical tuft of layer 5 pyramidal neurons (Fig. 5f,g) and disrupted learning (Fig. 5h), whereas no such disruption was observed in our LED illumination control group (Extended Data Fig. 15). This demonstrates that local computation in the apical dendritic tuft is necessary for performance improvements in the BCI task.

Discussion

Here we demonstrate the use of neurofeedback brain–computer interfaces to study the mechanisms of biological credit assignment at the subcellular level. Our results provide—to our knowledge—the first biological evidence of a vectorized solution to the credit assignment problem in the brain via cortical dendrites. Our data are consistent with a model of credit assignment in which learning is instructed by instantaneous, vectorized teaching signals received onto the distal dendrites of pyramidal neurons1,2,3,4,5. This spatial segregation mechanism allows cortical circuits to overcome the biologically implausible temporal separation of feedforward and feedback streams conventionally used for computing teaching signals during vectorized learning in ANNs.

The data presented here reveal magnitude differences in coincident somato-dendritic events that can be predicted using activity in the surrounding network of neurons. At the population level, differences in somato-dendritic coupling encode de novo information relative to somatic activity. This information could be used by individual neurons as instructive signals, such as reward and task error, providing novel evidence that individual neurons can explicitly access the reward function of a learning task through independent dendritic computation. We further demonstrate that cell-specific changes in SD residuals correlate with the functional role of individual neurons as well as with subsequent changes in activity levels during learning. Finally, optogenetic activation of NDNF+ layer 1 interneurons disrupted both dendritic computation and learning, demonstrating that dendritic processing is necessary for learning.

Our results demonstrate the existence of a signed, vectorized dendritic input that is tailored in a condition-specific manner to individual neurons. The extent to which this dendritic activity reflects moment-to-moment computational signals, as opposed to teaching signals for synaptic weight changes, remains to be uncovered. This could be explored via manipulations of dendritic activity through different phases of learning57. Importantly, vectorized error signals need not be confined to learning. In control-theory-inspired frameworks of credit assignment, error signals are applied during task operation to steer the system online58,59. Further work is needed to assess whether these dendritic signals result from glutamatergic inputs from higher-order cortical areas, from neuromodulation, or as a product of recurrent excitatory and inhibitory local computation. Dopaminergic signalling specifically has been causally implicated in both error signalling and in learning neurofeedback BCI tasks in rodents and humans33,60,61,62 and thus represents a compelling target for future investigation. Further experiments are also needed to test whether error signals are calculated locally at each hierarchical layer or are transmitted across layers, as in the classical formulation of backpropagation19. Previous neurofeedback BCI studies have demonstrated that degrading the contingency between neuronal activity and feedback stimuli impairs learning33,34,35. Future work will have to determine whether external stimuli are always necessary for error representations or whether animals can access the cost function via internal states exclusively, and how the dendritic representation of error might change as a result.

The error signals we observed have appealing connections to the gradient calculations found in the backpropagation algorithm. In contrast to the classical implementation of backpropagation, however, we observed that dendrites received signals that bore signatures of error derivative rather than error itself. Intriguingly, our results could also be consistent with target propagation (specifically, difference target propagation)14,20,21. Indeed, our data indicate that dendritic activity contains a target signal for the parent soma in addition to task-related error information. Future approaches, built on the framework we present here, could be used to disentangle the specific learning algorithms used by the brain14,63.

Together, our results help to reconcile early findings and theories of dendritic function, which focused on single dendritic branches as the building blocks for independent computation, with later in vivo findings that have demonstrated prevalent co-occurrence of dendritic and somatic events15,24,51,64. By demonstrating that apical dendrites locally compute reward and error-related signals, our results present a framework for dendritic computation that does not require fully independent dendrites to perform credit assignment for adaptive behaviour and highlight new directions for the development of biologically inspired ANNs.

Methods

Animals

All experiments were compliant with guidance and regulation from the NIH and the Massachusetts Institute of Technology Committee on Animal Care. Male and female Rbp4-Cre and NDNF-Cre heterozygous mice were maintained on a 12 h:12 h light:dark cycle in a temperature- and humidity-controlled room with ad libitum food access and were used for experiments at 8–15 weeks of age. Except for anaesthesia experiments, mice were water-deprived by decreasing water intake from 3 ml to 1.2 ml over the course of 10–14 days and maintained at 1.2 ml thereafter, for 5–7 days before experiments and throughout training.

Surgery

Mice were initially anaesthetized using 4% isoflurane and subsequently maintained at 1–2% isoflurane through the rest of the surgery. Body temperature was maintained at physiological levels using a closed-loop heating pad. Additional heating was provided for post-surgical recovery. To protect eyes from dryness, eye cream (Bepanthen, Bayer) was applied. Mice were injected with dexamethasone (4 mg kg⁻¹), carprofen (5 mg kg⁻¹) and buprenorphine (slow release, 0.5 mg kg⁻¹) subcutaneously. The scalp was shaved using hair removal cream and cleaned afterwards using iodine solution and ethanol. Next, the skull was exposed. For in vivo imaging, a 3 mm-wide craniotomy was performed. In Rbp4-Cre mice, at 3–4 different sites, we injected 100 nl of AAV1-syn-FLEX-jGCaMP7f-WPRE (Addgene, 04492-AAV1, 2–5 × 10¹² viral genomes (vg) ml⁻¹ concentration after a 1:10 dilution from the original concentration) at 400 μm from the surface of the brain in the left hemisphere of the RSC (2.5 mm caudal of bregma). The same labelling approach was used to perform anaesthesia experiments. In NDNF-Cre mice, we injected 100 nl of AAV8-nEF-Con/Foff 2.0-ChRmine-oScarlet (Addgene, 137161-AAV8, 7 × 10¹² vg ml⁻¹ after a 1:5 dilution from the original concentration) 150 μm from the surface of the brain and 75 nl of AAV1-syn-jGCaMP7f-WPRE (Addgene, 104488-AAV1, 2–5 × 10¹² vg ml⁻¹ concentration after a 1:10 dilution from the original concentration) 500 μm from the surface of the brain. The dura was left intact. Cranial windows consisted of two stacked 3 mm coverslips (inserted within the craniotomy) attached to a larger 5 mm coverslip, which was subsequently fixed to the skull using cyanoacrylate glue and dental cement. A custom metal headplate was implanted to perform imaging under head-fixed conditions. At the end of the procedure, a single dose of 25 ml kg⁻¹ of Ringer’s solution was injected subcutaneously to rehydrate the mouse. Recordings started 4–6 weeks post-surgery.

Two-photon imaging

A Neurolabware 2-photon microscope equipped with GaAsP photomultiplier tubes was used for data acquisition. Imaging was performed at 980 nm using an ultrafast pulsed laser (Spectra-Physics, Insight DeepSee) coupled to a 4× pulse splitter to reduce photodamage and bleaching. For excitation and photon collection we used a 16× Nikon objective with 0.8 numerical aperture. Bidirectional scanning was performed (512 × 796 pixels) semi-simultaneously in two separate planes using an electrically tunable lens at 30.92 Hz (15.46 Hz for each plane). Laser intensity was independently optimized at each imaging plane using an electro-optical modulator. A custom light shield was attached to the headplate to avoid light contamination. Mice were habituated to human handling for 5–10 min every day and to head-fixation for 15 min a day for at least 3 days directly preceding imaging. Small 10% sucrose water rewards were randomly dispensed during habituation. Daily water intake of at least 1.2 ml was maintained throughout the behavioural experiments. The locomotion of the mouse was recorded using an optical encoder (E6, US Digital, 2500 cycles per revolution) tracking the rotation of a cylindrical treadmill with a radius of 19 cm and acquired using the Scanbox software interfaced to a custom-built Arduino system. To maximize the number of units recorded while simultaneously reducing signal contamination, we imaged the trunk of layer five pyramidal neurons at two different planes: proximal to the soma and right below the nexus (tuft bifurcation point).

Optogenetic stimulation

For optogenetic stimulation we used a Cyclops LED driver from Open Ephys (OEPS-6602) triggered using a direct 6 ms TTL pulse delivered via the Neurolabware Dual PSOC box. The driver controlled a fibre-coupled 595 nm LED (8.7 mW, 100 mA, Thorlabs M595F2). LED illumination was synchronized with the photomultiplier tubes (PMTs) of the imaging system using custom-made MATLAB scripts. In brief, for every new frame acquired by the 2-photon microscope, the LED was activated for the initial 6 ms of the frame, while the PMTs were kept shut off for an additional 1 ms (7 ms total of PMT off time). The PMTs were then reactivated to collect calcium data for the remaining approximately 24 ms of the approximately 31 ms frame.

BCI task

Similar to previous implementations of BCI learning paradigms34,35,38, mice were trained so that they obtained rewards by modulating the activity of eight or ten layer 5 pyramidal neurons in the RSC to control the rotation of a grating Gabor patch. The 8 or 10 neurons were equally divided into 2 subpopulations, P+ neurons whose activity rotated the stimulus towards a target angle of 90° (horizontal) and P− neurons whose activity rotated the stimulus away from the target angle, towards a 0° (vertical) orientation. Neural activity was transformed into a visual stimulus angle according to the following method: at the beginning of each session, we measured the baseline responses of P+ and P− neurons to 7 randomly presented oriented gratings (0°, 15°, 30°, 45°, 60°, 75° and 90°, passive viewing) for approximately 13 min (12,000 frames). ΔF/F0 was calculated for individual P+ and P− neurons and averaged across each population. The mean P− population signal was subtracted from the mean P+ population signal. Next, we randomly resampled 200 trials (435 frames each) from the aforementioned 12,000-frame baseline recording and iteratively searched (in 0.005 ΔF/F0 incremental steps) for the subtracted ΔF/F0 value producing a 50% success rate. That value was set as the threshold value for target activity during the closed-loop phase of the BCI task. Next, we calculated the mean and s.d. of the subtracted ΔF/F0 signal distribution and created a new distribution by mirroring the left side to the right. On day 1, we estimated the z-score corresponding to the ΔF/F0 threshold value on the mirrored distribution. On the following days, we estimated the subtracted ΔF/F0 signal distribution and its corresponding left-mirrored distribution in the same way as described above, and used the ΔF/F0 value corresponding to the z-score used on day 1 as the task target activity during the closed-loop phase of the task. 
In this way mice could learn the task by either decreasing activity of P− neurons or increasing activity in P+ neurons (or both). The mapping between neuronal activity and visual feedback angle was defined as follows: the 0° angle corresponded to the minimum value in the subtracted ΔF/F0 signal distribution, whereas the target (90°) angle was reached at the subtracted ΔF/F0 value corresponding to the threshold defined above. Activity in between was split into 7 equally spaced bins, each corresponding to a 15° interval between 0° and 90°. At each screen refresh, the angle presented reflected the mapping between angle bins and the subtracted ΔF/F0 signal averaged over the last 3 frames. The screen refreshed every time a 2-photon frame at the soma was acquired (at 15 Hz). In line with previous studies performing neurofeedback BCI in rodents33,34,35,36,38, we binned the visual stimulus to avoid noise-driven, frame-by-frame stimulus updates at the screen refresh rate, which is beyond the perceptual threshold in mice65,66. To avoid introducing a second, orthogonal dimension to our task that would disrupt the straightforward mapping between neuronal activity and task error, we did not introduce any requirement on the number of P+ or P− neurons required to be simultaneously active to trigger a reward or a stimulus update. In each trial, mice had 28 s to reach target activity. If they did, a reward consisting of 4 μl of 10% sucrose water was delivered 1 s later. Additionally, after target activity was reached, the stimulus froze at a 90° angle for 2 s. After that, mice saw a black screen for an additional 1 s and a new trial was initiated. All new trials were initiated by a 0.5 s isoluminant grey stimulus. If a mouse did not reach target activity within the 28 s of the trial, it received a 3 s timeout consisting of a black screen. 
To avoid the problem of drifting baselines, ΔF/F0 for each neuron was calculated as (Fi – Fi0)/Fi0 where Fi0 was the 10th percentile of fluorescence in the previous 30 s. For the optogenetics experiments, we recorded 2 different baselines (13 min each, during passive viewing of the Gabor patch, as described above). The first one, with the LED off (control), was used only for post hoc analysis of the data shown in Fig. 3. The second baseline was recorded during LED stimulation (opto on) and was used to map neuronal activity to angle and target during the closed-loop part of the BCI task. This mapping was crucial to ensure that the decoder was calibrated consistently for the closed-loop BCI task, which was recorded during the opto on condition only. Early and late training were defined as days 1–8 and 9–14, respectively, based on average performance (accuracy) remaining above 0.75 in the control (opto off) condition. For anaesthesia experiments, we recorded two passive viewing sessions (awake and anaesthetized, 24 min each) in which we presented the same set of stimuli used when recording the baseline session for the BCI task. We anaesthetized the mice in between these two sessions by initially administering (via inhalation) 4% isoflurane, which was subsequently decreased to 1% for the duration of the imaging session.
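The mapping from the subtracted ΔF/F0 signal to the displayed grating angle described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation. The function names are hypothetical, and the bin layout is an assumption: the range between the distribution minimum and the threshold is split into six equal-width bins for 0° to 75°, and 90° is shown only once the threshold is reached, which yields the seven displayed angles.

```python
ANGLE_STEP_DEG = 15
TARGET_ANGLE_DEG = 90


def activity_to_angle(signal, floor_value, threshold):
    """Map a subtracted dF/F0 value (mean P+ minus mean P-) to an angle.

    floor_value: minimum of the baseline subtracted dF/F0 distribution (0 deg).
    threshold:   target activity (90 deg).
    Assumption: six equal-width bins below threshold map to 0..75 deg.
    """
    if threshold <= floor_value:
        raise ValueError("threshold must exceed floor_value")
    if signal >= threshold:
        return TARGET_ANGLE_DEG
    if signal <= floor_value:
        return 0
    n_bins_below_target = TARGET_ANGLE_DEG // ANGLE_STEP_DEG  # 6
    fraction = (signal - floor_value) / (threshold - floor_value)
    return int(fraction * n_bins_below_target) * ANGLE_STEP_DEG


def displayed_angle(recent_signal, floor_value, threshold, n_frames=3):
    """Angle shown at a screen refresh: the signal is first averaged over
    the last n_frames somatic frames, then binned."""
    window = recent_signal[-n_frames:]
    mean_signal = sum(window) / len(window)
    return activity_to_angle(mean_signal, floor_value, threshold)
```

For example, with a hypothetical minimum of 0.0 and threshold of 0.6, a three-frame mean of 0.25 falls in the third bin and is displayed as 30°.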

P+ and P− neuron selection

On day 1, we drew 20–40 regions of interest (ROIs) in a single field of view prior to starting our baseline recording. Next, we recorded a session of passive visual stimuli that we would later use as our baseline recording for day 1. At the end of this recording, all ΔF/F0 traces for all drawn ROIs were plotted and visually inspected using a custom MATLAB script. The experimenter would then select either eight or ten of these traces based on event frequency, signal-to-noise ratio (SNR; determined as the ratio between noise band width and maximum event size), baseline stability and calcium transient dynamics (with a clear rise, peak and exponential decay, as opposed to plateau-like events). The best 8–10 neurons would then be selected from the available pool of neurons on which ROIs were drawn. No arbitrary parameter cut-off (for example, minimum event frequency or SNR) was introduced. The subdivision of these eight to ten neurons into the P+ and P− populations would then be determined by a random number generator. Once selected, P+ and P− neurons would remain the same for the entire duration of the experiment.

Online motion-correction

To avoid drifts in x and y out of our selected regions of interest, we used a fast Fourier transform approach to motion-correct our movies online. To do so, at the beginning of each recording session we acquired a reference image by averaging 20–40 s (300–600 frames) collected from our field of view. To motion-correct each subsequent frame, we selected four smaller central areas to register independently from one another (2D rigid translation) against the corresponding four areas in the reference image67. Finally, we rigidly translated the entire 2D image by the average translation in x and y across these four subregions.
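The logic of registering four subregions independently and then applying their average translation can be sketched as follows. This Python example is illustrative only: it replaces the FFT-based cross-correlation used in the study with a brute-force search over small integer shifts, and the function names, the shift range and the sign convention are assumptions.

```python
def best_shift(ref, img, max_shift=2):
    """Return the integer (dy, dx) maximizing the cross-correlation
    between ref and img, where the corrected image is defined as
    corrected[y][x] = img[y - dy][x - dx].

    Brute-force stand-in for FFT-based registration; ref and img are
    equally sized 2D lists of floats.
    """
    h, w = len(ref), len(ref[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Score only the interior so every shifted index is valid.
            score = sum(
                ref[y][x] * img[y - dy][x - dx]
                for y in range(max_shift, h - max_shift)
                for x in range(max_shift, w - max_shift)
            )
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def mean_shift(shifts):
    """Average the per-subregion (dy, dx) estimates into one rigid shift."""
    n = len(shifts)
    return (sum(dy for dy, _ in shifts) / n,
            sum(dx for _, dx in shifts) / n)
```

In use, `best_shift` would be called once per subregion against the matching subregion of the reference image, and `mean_shift` of the four results would give the translation applied to the whole frame.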

Visual stimuli

Visual stimuli were generated using the Psychophysics Toolbox package for MATLAB (MathWorks)68 and displayed on a monitor 20 cm away from the contralateral eye. Visual stimuli consisted of a rotating Gabor patch at 7 angles spaced 15° apart from 0° to 90°.

Offline image analysis and signal extraction

To correct for brain motion after image acquisition, as well as to automatically detect ROIs, we used the Suite2p pipeline69. For each field of view, we removed duplicates by excluding ROIs whose signal correlation was above 0.6 and whose centres were within 20 μm of each other. To separate trunk signals from potential neuropil contamination, fluorescence signals of our ROIs were processed using FISSA70 with the following parameters: 4 neuropil subregions and alpha = 0.1. To estimate ΔF/F0 after neuropil subtraction, we calculated ΔF/F0 at time point i as (Fi – F0)/F0, where F0 is defined as the tenth percentile of a 120-s-long sliding window, to remove fluorescence drifts over the course of imaging. Next, we performed spike inference using the CASCADE model Global_EXC_15Hz_smoothing200ms45.
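As an illustration of the ΔF/F0 calculation with a sliding tenth-percentile baseline, a minimal Python sketch is given below. It is not the analysis code used in the study. The function names are hypothetical, and two details are assumptions not stated in the text: the window is centred on each time point and truncated at the edges of the recording, and the percentile uses linear interpolation.

```python
def percentile(values, q):
    """Percentile q (0-100) of a non-empty list, linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def delta_f_over_f0(trace, frame_rate_hz, window_s=120.0, q=10.0):
    """Return (F_i - F0_i) / F0_i for each sample of a fluorescence trace.

    F0_i is the q-th percentile of a window of about window_s seconds
    centred on sample i (assumption), clipped to the trace boundaries.
    """
    half = int(window_s * frame_rate_hz) // 2
    result = []
    for i, f in enumerate(trace):
        window = trace[max(0, i - half): i + half + 1]
        f0 = percentile(window, q)
        result.append((f - f0) / f0)
    return result
```

The online calculation used during the BCI task differs in that the baseline is taken from the preceding 30 s only, which would correspond to a trailing rather than a centred window.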

Field of view matching and ROI registration across days

Registration of neurons across days for BCI training was performed manually at the beginning of each session with the help of custom-designed software. On day 1, a mean intensity reference image of our field of view of interest was acquired. Using this software, we manually drew 10–20 reference ROIs, which included any recognizable brain structure including dendrites, cell somas and sharp-contrast blood vessels. On the following days, after manually finding the same approximate area for the field of view imaged on day 1, a more accurate manual registration was performed by aligning our reference ROIs drawn on day 1 with their corresponding structures on following days. As the relative x and y distance between structures varies along the z-plane, our approach allowed us to consistently match our field of view on day 1 across x, y and z dimensions on any given day. Offline registration of ROIs across days, on the other hand, was initially performed using the ROIMatchPub implementation for Suite2p, followed by exhaustive manual curation.

Quantification of event frequency, magnitude and timing

Events were detected for each ROI using the MATLAB function findpeaks on the spike-inferred signal. For analysis of the spike-inferred signal, we estimated the integral of individual peaks by multiplying the height and width of individual transients. Event occurrence was defined as the time at which spike probability peaked. For ΔF/F0 analysis, once we found an event, we used a 2 s backward sliding window to identify the frame at which the derivative of the ΔF/F0 signal became consecutively positive for 300 ms. This was considered the transient onset frame, while the peak of the transient was considered the maximum ΔF/F0 value in the 2 s following peak detection. We therefore estimated the integral of the ΔF/F0 signal by multiplying the height (maximum ΔF/F0 value – ΔF/F0 value at transient onset) and the width (frame at maximum ΔF/F0 value – frame at transient onset) of the ΔF/F0 signal. The backward and forward detection windows were limited in time by the presence of a preceding or subsequent event detected using the spike-inferred signal. Proximal trunks were paired to their corresponding distal trunk whenever their ΔF/F0 signal correlation was equal to or above 0.6. For optogenetics and anaesthesia experiments, we matched proximal trunks to their corresponding distal trunk using activity during control (opto off) and wakefulness, respectively. Whenever we found more than one distal dendrite correlated with the same proximal trunk, we selected the one with the best signal-to-noise ratio, so as to always have a single distal dendrite associated with a proximal trunk. Coincident events were defined as two events occurring (independently detected) within a 500 ms window in the two compartments. To quantify the somato-dendritic magnitude mismatches of coincident events, we first fitted a best-fit line to the somatic and dendritic magnitudes of all events. 
For each event, we calculated the residual from the best-fit line, and defined residuals larger than 0 as dendritically amplified and residuals smaller than 0 as dendritically attenuated. To estimate the SD residual during optogenetic stimulation and anaesthesia, first we estimated the best-fit line using somato-dendritic activity during light off and wakefulness conditions, respectively. Next, we calculated the residual for all events detected during opto on and anaesthesia as the distance between these events and the previously calculated best-fit line.
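The SD residual calculation can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code. It assumes that the best-fit line is an ordinary least-squares fit with somatic magnitude as the predictor and that the residual is the vertical (dendritic) distance from that line; the function names are hypothetical.

```python
def fit_line(somatic, dendritic):
    """Ordinary least-squares line: dendritic ~ slope * somatic + intercept."""
    n = len(somatic)
    mean_s = sum(somatic) / n
    mean_d = sum(dendritic) / n
    sxx = sum((s - mean_s) ** 2 for s in somatic)
    sxy = sum((s - mean_s) * (d - mean_d)
              for s, d in zip(somatic, dendritic))
    slope = sxy / sxx
    return slope, mean_d - slope * mean_s


def sd_residuals(somatic, dendritic, line=None):
    """Residual of each coincident event from the best-fit line.

    Pass line=(slope, intercept) to reuse a line fitted on a reference
    condition (for example light off or wakefulness) when scoring events
    from another condition (opto on or anaesthesia).
    """
    slope, intercept = line if line is not None else fit_line(somatic, dendritic)
    return [d - (slope * s + intercept) for s, d in zip(somatic, dendritic)]


def classify(residual):
    """Residual > 0: dendritically amplified; < 0: attenuated."""
    if residual > 0:
        return "amplified"
    if residual < 0:
        return "attenuated"
    return "on line"
```

Reusing the reference-condition line for the manipulated condition keeps the definition of amplification and attenuation fixed, so that a global change in coupling appears as a shift in the residuals rather than being absorbed by a refit.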

Decoders

To decode whether individual transients would be amplified or attenuated, we trained an SVM binary classifier (linear kernel) using stochastic gradient descent71 (as implemented by MATLAB fitclinear). For each coincident event in the soma and dendrites, we averaged the spike-inferred activity of each neuron in our field of view (excluding the neuron of interest) in the preceding 2 s, and we used this average activity to create an n-dimensional population activity vector, where n corresponds to the number of isolated units in our field of view. The binary classifier was trained to separate dendritic amplification from dendritic attenuation (see above) using a leave-one-out approach. Accuracy was determined as the fraction of correctly classified events. For imbalanced datasets, we used a synthetic minority oversampling technique (SMOTE, k neighbours = 5) to train (not test) using a balanced dataset. SMOTE was applied after separating our train and test datasets. To control for any potential data leakage, our shuffle control went through the exact same procedure as our test dataset, including SMOTE oversampling, with the only difference that labels were randomly shuffled before separating the train and test data. We calculated the confidence of a prediction as the Euclidean distance from the hyperplane. Reward-associated and reward-instructive epochs were defined as 2 s before and 2 s after target activity was reached, respectively, for successful trials, and 2 s before and 2 s after the end of a trial for unsuccessful trials. To decode successful from unsuccessful trials, we generated an n-dimensional SD residual vector by taking the residual for each neuron for which we identified a somato-dendritic pair (see above) in these 2 s epochs. Neurons inactive in the 2 s epochs were assigned a value of 0. The binary classifier was trained in the same manner as described above.
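The ordering of the leave-one-out split and the SMOTE oversampling matters for avoiding data leakage, and it can be illustrated with a short Python sketch. This is not the authors' MATLAB pipeline: the classifier is passed in as a pair of hypothetical callables (`fit`, `predict`), the SMOTE routine is a simplified version written for this example, and the labels are assumed to be binary.

```python
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        seed = rng.choice(minority)
        others = [p for p in minority if p is not seed]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(seed, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(seed, neighbour)])
    return synthetic


def balance(X, y, k=5, rng=None):
    """Oversample the smaller class of a binary training set."""
    groups = {c: [x for x, lab in zip(X, y) if lab == c] for c in set(y)}
    if len(groups) < 2:
        return list(X), list(y)
    small, large = sorted(groups, key=lambda c: len(groups[c]))
    deficit = len(groups[large]) - len(groups[small])
    if deficit == 0 or len(groups[small]) < 2:
        return list(X), list(y)
    return (list(X) + smote(groups[small], deficit, k, rng),
            list(y) + [small] * deficit)


def leave_one_out_accuracy(X, y, fit, predict, k=5, seed=0):
    """Hold out one event, balance the training fold only, then test."""
    rng = random.Random(seed)
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        X_bal, y_bal = balance(X_train, y_train, k, rng)  # after the split
        model = fit(X_bal, y_bal)
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)
```

Because synthetic points are generated only from the training fold, the held-out event never contributes to its own classifier. Any binary classifier can be plugged in through `fit` and `predict`; a linear SVM would match the description above.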

Statistics

All analyses were performed in MATLAB 2020a with custom-written scripts and functions. All error bars in figures represent s.e.m. Statistical tests and independent samples are described in figure legends. All t-tests in the Article are two-sided.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.