Introduction

Long-range interactions between different brain regions via feedforward and feedback signals are thought to enable flexible behaviors in rapidly evolving environments. Cortical areas are highly interconnected1,2,3,4,5,6, suggesting that, across modalities, information about stimuli and their behavioral significance is widely shared, even in primary sensory cortical areas7,8,9,10. Consequently, cortical stimulus representations can be shaped by cognitive demand and contribute to the selection and generation of actions. Within this conceptual framework, cortical feedback has been proposed to support key computations. These range from extracting fine stimulus features11,12 to learning associations13,14 and generating sensorimotor predictions13,15,16,17,18,19,20,21,22,23, to modifying and executing motor actions in accordance with behaviorally relevant goals11,12,15,16,20,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39. While feedback from deeper brain regions reformats visual, auditory, and somatosensory neural representations to enable the differential evaluation of the same sensory inputs11,24, the degree to which this is the case for olfactory processing has been less scrutinized13,16,32,39.

Volatile compounds bind odorant receptors in the nasal epithelium, and olfactory sensory neurons relay odor information to the olfactory bulb (OB) glomeruli, which are sorted by odorant receptor type40,41,42. Glomerular responses are normalized and de-correlated by local circuits within the olfactory bulb43,44,45,46,47,48,49,50, and relayed to higher brain regions by two populations of output neurons, the mitral cells (MCs) and tufted cells (TCs)16,38,51,52,53,54,55,56,57,58,59,60,61. The major projection targets of the olfactory bulb include paleocortical areas such as the anterior (aPCx) and posterior piriform cortex (pPCx) and the anterior olfactory nucleus (AON), as well as the olfactory striatum (tubercle), cortical amygdala (CoA), and lateral entorhinal cortex (lENT)51. As in other sensory modalities, recent work revealed parallel long-range feedback loops for olfactory processing16,17,35,38,45,52,56,58,62,63,64,65,66,67,68. These functional streams specifically engage mitral and tufted cells and their preferred cortical targets (aPCx for mitral cells and AON for tufted cells) and may subserve different computations17,69. In this view, the TC↔AON loop mainly represents sensory features, such as odorant identity, intensity, and timing. In contrast, the MC↔PCx loop reshapes sensory representations to enable fine discrimination of learned odorants and sensorimotor integration as a function of behavioral contingency17. The primary bulbar recipients of cortical feedback are inhibitory interneurons, in particular granule cells (GCs)15,51,70,71,72,73,74. GCs integrate feedforward sensory input and mediate lateral and recurrent inhibition by forming reciprocal synapses with the lateral dendrites of mitral and tufted cells15,51,75,76,77,78. The olfactory bulb dynamically represents odor inputs depending on state and context14,45,56,62,63,64,65,67,79,80,81.
Cortical feedback has been a prominent candidate for shaping the odor representations of the olfactory bulb outputs. Indeed, previous work showed that cortical bulbar feedback is sparse and odor-specific in naïve animals, and is strengthened by learning relevant stimuli13,16,17,32,39. To date, however, how cortical bulbar feedback negotiates sudden changes in stimulus-reward associations, and whether it contributes to flexible updating of behavioral strategies is poorly understood.

It remains unknown whether cortical bulbar feedback relays specifically olfactory signals or conveys multimodal inputs extending beyond olfaction as a function of behavioral demands. Computational models82,83,84,85,86 based on anatomical and functional data54,87,88,89,90,91 proposed that distributed connectivity between the olfactory bulb and the piriform cortex enables long-term plasticity and sparse, concentration-invariant coding of odor identity, regardless of temporal variations, background, or stimulus-reward value92,93,94,95,96. Furthermore, classic work suggested that piriform cortex function extends beyond sensory feature extraction to more associative computations, such as spatial orientation and contextual processing69,93,95,97,98,99. Recent functional studies reported that anterior piriform cortex representations are largely unchanged upon learning of new stimulus-reward associations100 (but see refs. 99,101). In contrast, posterior piriform cortex activity is modulated by context and may bind together spatial information and actions related to olfactory behaviors102,103.

Here we investigated whether the piriform-to-olfactory bulb feedback axons represent changes in the reward contingency13,56,63,64, in addition to odor identity information as previously reported16,17,32,39. Specifically, we used multiphoton imaging of calcium signals to analyze the activity of cortical bulbar feedback axons in expert mice performing a rule-reversal task guided by olfactory and sound cues. Our results indicate that the piriform cortex-to-bulb feedback axons carry multimodal identity and reward contingency signals, which are rapidly reformatted according to changes in the behavioral contingencies.

Results

A rule-reversal Go/No-Go task to assess the role of cortical bulbar feedback axons in behavioral flexibility

To determine whether cortical bulbar feedback axons support behavioral flexibility104,105, we engaged mice in a rule-reversal Go/No-Go task while monitoring the dynamics of feedback axons and synaptic boutons via multiphoton imaging of GCaMP5/7b signals (Fig. 1a, b; Supplementary Figs. 1–5, “Methods” section). To investigate whether feedback axons represent stimulus contingency and/or trial outcome independently of the sensory modality, we used one olfactory and one auditory cue instead of two odors. We trained water-deprived head-fixed mice to discriminate between two brief (350 ms) sensory cues: a pure tone target (‘Go’) stimulus and a monomolecular odorant (‘No-Go’). We encouraged mice to respond to the ‘Go’ stimulus by licking to collect small water rewards from a spout placed in front of their mouth. Conversely, we trained them to withhold licking in response to the ‘No-Go’ stimulus: spurious licks triggered a white-noise sound and an additional time-out period before the next trial (“Methods” section). To disambiguate the neural signatures of reward contingency from motion artifacts related to licking and other signals, in a subset of experiments we imposed a 500 ms delay between cue offset and the reporting period, when the water reward became available (Fig. 1b, Supplementary Fig. 1a, ‘no-delay’ vs. ‘delay’ versions; see “Methods” section). Experiments from both versions of the task were analyzed in parallel and provided qualitatively similar results (see “Methods” section, Supplementary Information). For clarity, we focus on the delay version of the task in the analyses discussed below. To ensure that mice used both cues to solve the task and could not time their responses solely based on the delivery of a preferred cue (odor or sound), we used inter-trial intervals (ITIs) with a flat hazard rate, drawn from an exponential distribution (Supplementary Fig. 1f).
Combined with variable inter-cue intervals for consecutive trials (Supplementary Fig. 1g) that depended on each trial’s outcome (hit, correct rejection, false alarm, or miss), and with the absence of overt signals marking the start of a trial, this strategy ensured that mice could not predict the onset of the sensory cues (“Methods” section). Once mice had learned the stimulus-reward association and reached higher than 75% accuracy (behavioral performance), we switched the stimulus-reward contingency in blocks of contiguous trials between ‘Odor Go’ blocks (odorant rewarded) and ‘Sound Go’ blocks (tone rewarded; Fig. 1a, c; Supplementary Fig. 1c). No explicit cue marked the block transitions (rule-reversal events).
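The rationale for exponentially distributed ITIs can be made concrete with a short simulation. The exponential distribution is the continuous distribution with a constant (flat) hazard rate, so the probability that the cue arrives in the next moment does not depend on how long the mouse has already waited. The Python sketch below draws ITIs with a hypothetical mean of 5 s (the actual ITI parameters are given in Methods and are not reproduced here) and checks this memoryless property empirically; the function names are illustrative only.

```python
import math
import random
from typing import List


def sample_itis(n_trials: int, mean_iti_s: float, seed: int = 0) -> List[float]:
    """Draw inter-trial intervals (in s) from an exponential distribution.

    mean_iti_s is a hypothetical value chosen for illustration only.
    """
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_iti_s) for _ in range(n_trials)]


def survival(itis: List[float], t: float) -> float:
    """Fraction of ITIs longer than t seconds, i.e. P(ITI > t)."""
    return sum(x > t for x in itis) / len(itis)


def conditional_survival(itis: List[float], waited: float, t: float) -> float:
    """P(ITI > waited + t | ITI > waited): after `waited` s without a cue,
    the probability that the cue is still more than t s away."""
    still_waiting = [x for x in itis if x > waited]
    return sum(x > waited + t for x in still_waiting) / len(still_waiting)


itis = sample_itis(100_000, mean_iti_s=5.0)
# With a flat hazard rate, time already spent waiting carries no information:
# both empirical values should be close to the theoretical exp(-2 / 5).
print(round(survival(itis, 2.0), 2))
print(round(conditional_survival(itis, 3.0, 2.0), 2))
print(round(math.exp(-2.0 / 5.0), 2))
```

A timing strategy would exploit a rising hazard (e.g., fixed or uniformly distributed ITIs), where waiting longer makes the cue increasingly likely; the exponential choice removes that cue.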

Fig. 1: A Go/No-Go rule-reversal task using olfactory and auditory cues.

a Schematic of a behavior session and example FOV. (Left) An olfactory (1% ethyl valerate) or auditory (6.2 kHz tone) cue was delivered randomly in each trial, and each session was divided into stimulus-reward contingency blocks of ~45 trials. Stimulus-reward contingency alternated between ‘Sound Go blocks’ (containing Sound Go and Odor No-Go trials; pink) and ‘Odor Go blocks’ (containing Odor Go and Sound No-Go trials; purple). (Right) Mice virally expressing GCaMP5 in the anterior piriform cortex (aPCx), with a chronic cranial window implanted above the olfactory bulb (“Methods” section; scale bar: 500 μm). (Inset) Example FOV of cortical bulbar feedback boutons (~300 μm from the bulb surface; scale bar: 30 μm). b In the ‘delay’ task, a variable inter-trial interval (ITI; flat hazard rate, “Methods” section) was followed by the delivery of a brief odor or sound cue (0.35 s) and a fixed 0.5 s interval (delay period) before the reward became available. Mice were trained to report their decision (lick vs. no-lick) within a 1.5 s window from the end of the delay period. c Within-session behavioral performance in early (Top) and expert (Bottom) sessions. Performance was quantified using a moving average window (bin size = 10 trials, “Methods” section). d Behavioral performance across sessions in the delay version of the task. (Top) Average session performance. Zero marks the first session in which mice experienced a rule reversal of the stimulus-reward contingency within a session. The red dashed line marks the behavioral threshold for expert performance (80%, “Methods” section). (Bottom) Average number of trials to reach 70% performance after each rule-reversal event (“Methods” section). (N = 9 mice; error bars: ±SEM). e (Top) Example licks (dots) from odor and sound trials (Top vs. Bottom rows) parsed by trial instruction (Go: Left; No-Go: Right) from one delay session.
(Bottom) Distributions of report latency (first lick relative to cue onset) for all delay sessions (N = 3 mice): sound trials (yellow; 930.8 ± 3.7 ms) and odor trials (blue; 986.4 ± 7.6 ms). Inset: detail of the time period marked by the black bar.
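The moving-average performance curves described for Fig. 1c can be sketched as follows. This is a minimal illustration, assuming a simple non-centered 10-trial window and a hypothetical session-level reading of the 80% expert criterion; the exact windowing and criteria used in the study are described in Methods, and the function names are illustrative.

```python
from typing import List


def moving_average_performance(correct: List[bool], window: int = 10) -> List[float]:
    """Fraction of correct trials in each run of `window` consecutive trials.

    Value j summarizes trials j .. j + window - 1 (a non-centered window,
    assumed here for simplicity).
    """
    if window <= 0:
        raise ValueError("window must be a positive number of trials")
    if len(correct) < window:
        return []
    running = sum(correct[:window])  # number correct in the first window
    values = [running / window]
    for i in range(window, len(correct)):
        running += correct[i] - correct[i - window]  # slide the window by one trial
        values.append(running / window)
    return values


def is_expert_session(correct: List[bool], threshold: float = 0.80) -> bool:
    """Hypothetical session-level criterion: overall accuracy >= threshold."""
    return sum(correct) / len(correct) >= threshold


# Toy session: two errors right after a (hypothetical) rule reversal at trial 8.
session = [True] * 8 + [False] * 2 + [True] * 10
print(moving_average_performance(session))
print(is_expert_session(session))
```

The running-sum update keeps the computation linear in the number of trials; a centered window would shift the curve by half a window but would not change its shape.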

Early in training, mice displayed unstable performance across blocks and were slow to update their lick-reporting strategy upon rule reversal (Fig. 1c top; Supplementary Fig. 1c). As training progressed (~20 sessions), animals learned to switch reliably between reward contingencies and maintained a high level of behavioral performance across the session (>80%, Fig. 1c bottom, d; Supplementary Fig. 1c), with drops in performance occurring only immediately after rule reversals (see “Methods” section for details on training). On average, expert mice performed well on both odor and sound trials, independent of the block type (‘Odor Go’ or ‘Sound Go’), with slight biases depending on cue and mouse identity (Supplementary Figs. 1d, e; 10d, e). This rules out a strategy of solving the task (obtaining rewards) by relying on only one of the sensory cues and responding randomly to the other. Expert mice switched reporting strategies across blocks in fewer than seven trials on average (6.74 ± 0.82, Fig. 1d, Supplementary Fig. 1i, j) and completed an average of 5.8 ± 0.87 reversals (blocks of ~45 trials) per session, comparable to other task-switching paradigms106. We chose a block size of 45 trials as a trade-off: short enough to allow multiple switches per session (~5), yet long enough for performance to stabilize after each switch. Under these constraints, it was difficult to also keep a flat hazard rate for the number of trials per block. If block length were predictable, mice could in principle learn when block switches occur. However, even when the number of trials per block was constant, the total duration of a given block varied because of the flat hazard rate ITIs and the differences in each trial’s duration as a function of its outcome (Supplementary Fig. 1f, g).
Further, if mice relied on a ‘noisy’ estimate of the block size, error trials would be expected to be distributed around the block boundary, both before and after it. Across sessions and mice, however, we observed a sudden dip in behavioral performance at the boundary between blocks only following the rule switch, and not preceding it. This dip was followed by a gradual increase in performance, presumably informed by the mismatch between expected and actual trial outcomes (Fig. 1c, d, bottom; Supplementary Fig. 1c, i, j). In a subset of experiments, we also randomly varied the number of trials per block (Supplementary Fig. 1c, h; see “Methods” section). In these experiments, mice also achieved >80% expert performance for both odor and sound trials (Supplementary Fig. 10d, e), independent of the block type, within a similar training time. Altogether, we conclude that expert mice do not use a simple time-keeping strategy to predict when the rule switches during the session.
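The argument about where errors fall relative to block boundaries can be illustrated with a toy simulation. The sketch below compares two hypothetical agents, neither of which is a model used in the study: a 'feedback' agent that switches rules only after an outcome contradicts its current rule, and a 'counter' agent that ignores outcomes and switches at a noisy estimate of the 45-trial block boundary. The lapse rate, the jitter of the counter's estimate, and the number of simulated blocks are arbitrary illustrative values.

```python
import random
from bisect import bisect_right
from typing import Dict, List, Tuple

BLOCK_LEN = 45   # trials per block in the fixed-length version of the task
N_BLOCKS = 200   # hypothetical number of simulated blocks
LAPSE = 0.05     # hypothetical rate of random errors
JITTER_SD = 4.0  # hypothetical s.d. (in trials) of the counter's boundary estimate


def simulate(agent: str, seed: int = 0) -> Tuple[List[bool], List[int]]:
    """Simulate trial-by-trial correctness for a toy agent.

    'feedback': keeps its current rule until an outcome contradicts it.
    'counter':  ignores outcomes and switches at each true block boundary
                shifted by Gaussian jitter (a noisy internal trial count).
    Returns (correct, boundaries); boundaries are first trials of new blocks.
    """
    if agent not in ("feedback", "counter"):
        raise ValueError(f"unknown agent: {agent}")
    rng = random.Random(seed)
    n_trials = BLOCK_LEN * N_BLOCKS
    boundaries = list(range(BLOCK_LEN, n_trials, BLOCK_LEN))
    true_rule = [(i // BLOCK_LEN) % 2 for i in range(n_trials)]
    switch_points = sorted(b + round(rng.gauss(0.0, JITTER_SD)) for b in boundaries)

    correct, rule = [], 0
    for i in range(n_trials):
        if agent == "counter":
            # Rule = parity of the number of (noisy) switches already made.
            rule = bisect_right(switch_points, i) % 2
        ok = rule == true_rule[i]
        if agent == "feedback" and not ok:
            rule = 1 - rule  # outcome mismatch: adopt the other rule
        if rng.random() < LAPSE:
            ok = False  # random lapse, independent of the rule
        correct.append(ok)
    return correct, boundaries


def peri_boundary_accuracy(correct: List[bool], boundaries: List[int],
                           before: int = 5, after: int = 5) -> Dict[int, float]:
    """Mean accuracy at each trial offset from the block boundaries
    (offset 0 = first trial of the new block)."""
    return {
        k: sum(correct[b + k] for b in boundaries) / len(boundaries)
        for k in range(-before, after)
    }


for name in ("feedback", "counter"):
    acc = peri_boundary_accuracy(*simulate(name))
    print(name, [round(acc[k], 2) for k in sorted(acc)])
```

In this toy setting, the feedback agent's errors concentrate on the first trial of each new block, whereas the counter agent also errs on trials preceding the boundary, the signature that the behavioral data argue against.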

We used a stable 80% session performance as the criterion for ‘expert’ behavior and as the starting point for monitoring cortical bulbar feedback activity (N = 7 mice, “Methods” section, Fig. 1c–e). Report latencies were comparable across modalities (930.8 ± 3.7 ms for the tone and 986.4 ± 7.6 ms for the odorant from cue onset in the ‘delay’ version, N = 3 mice; 445.0 ± 2.4 ms vs. 443.0 ± 3.2 ms in the ‘no-delay’ version, N = 4 mice; Fig. 1e, Supplementary Fig. 1b, “Methods” section). The learned association was robust across days, and the same animal could learn multiple odor/sound pair associations (N = 2 mice, >80% accuracy, Supplementary Fig. 1k). Thus, head-fixed mice mastered a rule-reversal task that allowed us to investigate whether cortical bulbar feedback supports behavioral flexibility.

Diverse cortical bulbar feedback representations update within seconds following reward-rule switching in task-engaged mice

To monitor cortical bulbar feedback activity in task-engaged expert mice, we expressed a genetically encoded calcium indicator in the anterior part of the piriform cortex (aPCx, EF1-FLEX-GCaMP5-AAV + AAV-Cre, Supplementary Fig. 2; “Methods” section). We imaged fluorescence changes in synaptic boutons of feedback axons within the olfactory bulb through chronically implanted cranial windows (“Methods” section). To determine whether this activity relates to the animals’ performance in the task, we investigated cortical feedback dynamics specifically during the cue and delay periods (i.e., before licking; “Methods” section; N = 2475 boutons, 20 FOVs, 3 mice, delay version; N = 1315 boutons, 23 FOVs, 4 mice, no-delay version). Across sessions, we probed cortical feedback activity at different depths from the surface (200–300 µm), sampling boutons mostly just below the mitral cell layer (Fig. 1a; Supplementary Figs. 2–4; “Methods” section).

Fig. 2: Fast update of cortical bulbar feedback representations following reward-rule switching in task-engaged mice.

a Example average responses (z-scored) of cortical bulbar feedback axon boutons to odor and sound cues during Go (blue) and No-Go (red) trials. Shaded areas mark different trial periods: cue (gray); delay (green); report (pink). b, c Example boutons that displayed stable (Left) or unstable (Right) average responses to odor (b) and sound (c) across conditions (Go vs. No-Go; “Methods” section). d Bouton responses (z-scored) averaged over the delay period and shown across trials in an example field of view from an expert mouse. Each row shows the response of one bouton across trials and blocks throughout the behavior session. Boutons are sorted from top to bottom by the strength of their response during the odor trials of the Odor Go blocks. Color-coded bars on top mark the block structure, cue identity, and trial outcome. e (Top) Same session as (d), re-sorted by cue identity: odor trials (Left) and sound trials (Right). Boutons were classified as enhanced, unresponsive, suppressed, or complex (enhanced + suppressed) according to the strength and polarity of their response to the odor cue; (Right) the same ordering of boutons was kept for the sound trials. (Bottom) Inter-block correlation analysis (Odor Go vs. Sound Go; “Methods” section). f Average z-scored response values during the delay period parsed by cue (Odor or Sound) and instruction (Go or No-Go). Each pair of connected colored dots represents average z-scored responses across conditions (Go vs. No-Go) from individual sessions. Black dots represent the average ensemble bouton response across sessions (N = 20 sessions, 3 mice). Two-sided Student’s t-test: *** = p < 0.0001; n.s.: non-significant. g Average inter-block correlation coefficients obtained as described in e (bottom) for all fields of view (delay; “Methods” section). Each pair of gray connected dots represents the inter-block correlation values computed for one session.
Black dots represent the average correlation across sessions (N = 20 sessions, 3 mice). Two-sided Student’s t-test: *** = p < 0.0001. h Same analysis as in (g) including comparisons across modalities. All panels error bars: ±SEM. See Source Data and Supplementary Table 1 for exact data points and p-values.
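The z-scoring and polarity-based classification used in Fig. 2d, e can be sketched as below. This is an illustrative reading, not the study's exact procedure: it assumes each trace is z-scored against its own pre-cue baseline frames and uses a hypothetical threshold of 2 baseline SDs to call a response enhanced or suppressed; a bouton crossing both thresholds is labeled 'complex'. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_to_baseline(trace: Sequence[float], n_baseline: int) -> List[float]:
    """z-score a fluorescence trace against its first n_baseline (pre-cue) frames."""
    base = trace[:n_baseline]
    mu, sd = mean(base), pstdev(base)
    if sd == 0:
        raise ValueError("baseline has zero variance")
    return [(x - mu) / sd for x in trace]


def period_mean(z_trace: Sequence[float], start: int, stop: int) -> float:
    """Average z-score over frames [start, stop), e.g. the delay period."""
    return mean(z_trace[start:stop])


def classify_bouton(z_trace: Sequence[float], start: int, stop: int,
                    thr: float = 2.0) -> str:
    """Label a trial-averaged z-scored trace by response polarity within
    frames [start, stop). thr is a hypothetical threshold in baseline SDs."""
    window = z_trace[start:stop]
    up = any(z > thr for z in window)
    down = any(z < -thr for z in window)
    if up and down:
        return "complex"  # enhanced + suppressed
    if up:
        return "enhanced"
    if down:
        return "suppressed"
    return "unresponsive"


z = zscore_to_baseline([1.0, 2.0, 1.0, 2.0, 5.0, 0.0], n_baseline=4)
print(z)                         # [-1.0, 1.0, -1.0, 1.0, 7.0, -3.0]
print(classify_bouton(z, 4, 6))  # complex
```

Z-scoring each bouton against its own baseline puts boutons with different GCaMP expression levels on a common scale before they are sorted or compared.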

The feedback responses to both olfactory and auditory cues, and their apparent alignment to different trial epochs, raise the possibility that responses change flexibly depending on the reward rule and/or trial outcome. To test these hypotheses, we compared bouton responses to the same sensory cue across different stimulus-reward contingencies. We observed diverse responses, ranging from stimulus-tuned (odor- vs. sound-responsive irrespective of reward contingency; e.g., bouton #38, Fig. 2a) to instruction-tuned (‘Go’ vs. ‘No-Go’) across sensory modalities (e.g., boutons #2, #133, Fig. 2a). Responses of individual boutons to the same sensory stimulus often varied in shape, kinetics, and amplitude depending on the instruction signal across blocks within the same behavior session (Fig. 2b, c; unstable). In contrast, other boutons were not affected by changes in reward contingency (Fig. 2b, c; stable; Supplementary Fig. 8e–g). To further investigate differences in cortical bulbar feedback activity across stimulus-reward contingencies, we used correlation analysis of bouton ensemble responses, as well as of individual bouton responses, and compared the dynamics of the feedback responses to the same sensory cue across blocks with different reward contingencies. In example fields of view (Fig. 2d, e; Supplementary Figs. 7e; 9a–d), and generally across the data (Fig. 2f; Supplementary Fig. 5c), odor responses appeared stronger in amplitude, and more boutons were responsive, in blocks in which the odor was rewarded than in blocks in which it was not. In the same session, a subset of feedback boutons responded to the tone specifically in the sound-rewarded blocks (Fig. 2e, sound vs. odor trials). In comparison, in blocks in which the tone was not rewarded, sound responses were less frequent and generally smaller in amplitude (Fig. 2d, e; Supplementary Figs. 7c; 9a–d; also across the data, especially for suppressed responses, Fig. 2f).
Complementary to these flexible cortical feedback representations, we also observed many boutons that responded in a stable manner to a given cue across conditions (Fig. 2b, c, Supplementary Fig. 8e–g). These boutons may enable decoding of stimulus identity, independent of its contingency. In addition, we found in control experiments that the instruction signals (Go vs. No-Go trials) modulate the sniff rate irregularly and only mildly. This modulation varied across mice and cue types (odor vs. sound) and was not correlated with the behavioral performance (Supplementary Fig. 10), suggesting that the reward-contingency dependence of cortical bulbar feedback activity cannot be simply explained by changes in stimulus sampling behavior.

Many boutons closely mirrored the block structure of the task and changed their response (shape and/or amplitude) to the same cue within seconds of each rule-reversal event (for illustration, in Fig. 2e we re-sorted the trials from the session shown in Fig. 2d by odor and sound cues, respectively).

We parsed and averaged the activity of individual boutons during the cue (Supplementary Fig. 9a), delay (Fig. 2d, e), or reporting (Supplementary Fig. 9b) periods. In each of these intervals, the correlation analysis indicated that ensemble feedback bouton responses are more similar across trials of the same block type and differ across blocks of opposite reward contingency (in both versions of the task; Fig. 2e bottom, g, h; Supplementary Fig. 7e bottom, f; “Methods” section). Similar results were obtained using self-organizing map analysis (Supplementary Fig. 11; “Methods” section). Overall, within a given field of view, ensemble cortical bulbar feedback axon responses appeared more similar for Go trials across modalities (Odor Go vs. Sound Go) than for Go vs. No-Go trials of the same modality (Odor Go vs. No-Go; Sound Go vs. No-Go; Fig. 2h).
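The within-block versus across-block comparison can be sketched as follows, assuming (as a simplification of the procedure in Methods) that each trial is summarized as an ensemble vector of per-bouton delay-period responses and that similarity is measured by the Pearson correlation between trial vectors. The helper names and the synthetic data are made up for illustration.

```python
import random
from itertools import combinations
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy))


def within_vs_between(vectors: List[Sequence[float]],
                      block_type: List[str]) -> Dict[str, float]:
    """Mean trial-pair correlation of ensemble vectors (one value per bouton,
    e.g. delay-period mean z-score) for pairs from the same block type
    ('within') versus opposite block types ('between')."""
    within, between = [], []
    for i, j in combinations(range(len(vectors)), 2):
        r = pearson(vectors[i], vectors[j])
        (within if block_type[i] == block_type[j] else between).append(r)
    return {"within": mean(within), "between": mean(between)}


# Synthetic odor trials from 30 boutons: the ensemble pattern evoked by the
# same odor is assumed to differ between Odor Go and Sound Go blocks.
rng = random.Random(1)
pattern = {"Odor Go": [rng.gauss(0, 1) for _ in range(30)],
           "Sound Go": [rng.gauss(0, 1) for _ in range(30)]}
labels = ["Odor Go"] * 20 + ["Sound Go"] * 20
trials = [[p + rng.gauss(0, 0.5) for p in pattern[lab]] for lab in labels]
print(within_vs_between(trials, labels))
```

If the ensemble response to a cue were independent of contingency, the 'within' and 'between' averages would be similar; a contingency-dependent reformatting shows up as a gap between them.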

We further analyzed whether the time-varying fluorescence signals of individual boutons before reporting (throughout the cue and delay periods, or during the cue period in the no-delay version) changed as a function of trial behavioral contingency. To this end, we compared the activity of individual boutons across hit (H), correct rejection (CR), and false alarm (FA) trials (Fig. 3a). Miss (M) trials occurred rarely (<3%) and were thus excluded from the analysis. Overall, responses of a given bouton were similar within the same condition and diverged across different conditions (Fig. 3a). Boutons that differentially modulated their responses across conditions (beyond the 90th percentile of the distribution of trial-to-trial variations for within-condition comparisons, “Methods” section) represented a substantial fraction of the responsive population in both versions of the task (Odor trials: 44.8 ± 10.6% H vs. CR, 35.0 ± 7.5% H vs. FA, 46.1 ± 10.6% CR vs. FA; Sound trials: 44.5 ± 13.1% H vs. CR, 27.7 ± 8.4% H vs. FA, 50.0 ± 13.5% CR vs. FA; Fig. 3a). Across fields of view, cortical feedback axon response amplitudes during hit and false alarm trials were generally higher than during correct rejection trials (Supplementary Fig. 9e, f). However, we also observed differences in the responses of individual boutons in pairwise comparisons between trials of different contingencies in which mice licked the reward port (Fig. 3b; Odor hits vs. false alarms, 65.0 ± 7.5%; Odor vs. Sound hits, 60.1 ± 11.9%; Odor vs. Sound false alarms, 42.7 ± 12.1%). Since mice subsequently licked the reward port in all these cases, changes in the response of individual boutons to the same stimulus across contingencies cannot be attributed solely to motion artifacts and/or preparatory motor activity. Overall, in task-engaged mice, cortical feedback axon activity updated rapidly within the same session depending on changes in reward contingency.
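One plausible reading of the percentile criterion is sketched below; the exact bootstrap procedure is described in Methods and may differ. For a single bouton, trial-to-trial dissimilarity (1 − r between single-trial response traces) is bootstrapped within a reference condition (e.g., hits), and the bouton is flagged as differentially modulated if its mean dissimilarity to trials of another condition exceeds the 90th percentile of that within-condition distribution. The function names, response shapes, and noise level are hypothetical.

```python
import math
import random
from statistics import mean, quantiles
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length traces."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy))


def within_threshold(ref_trials: List[Sequence[float]], pct: int = 90,
                     n_boot: int = 2000, seed: int = 0) -> float:
    """pct-th percentile of trial-to-trial dissimilarity (1 - r) over randomly
    drawn pairs of distinct trials from the reference condition (e.g., hits)."""
    rng = random.Random(seed)
    d = []
    for _ in range(n_boot):
        i, j = rng.sample(range(len(ref_trials)), 2)
        d.append(1.0 - pearson(ref_trials[i], ref_trials[j]))
    return quantiles(d, n=100)[pct - 1]


def differentially_modulated(ref_trials: List[Sequence[float]],
                             other_trials: List[Sequence[float]], **kwargs) -> bool:
    """Flag a bouton whose mean across-condition dissimilarity exceeds the
    within-condition percentile threshold."""
    across = mean(1.0 - pearson(x, y) for x in ref_trials for y in other_trials)
    return across > within_threshold(ref_trials, **kwargs)


# Synthetic single-bouton traces (30 frames): an enhanced response on hit
# trials and an inverted (suppressed) response on correct-rejection trials.
rng = random.Random(2)
bump = [math.sin(math.pi * t / 29) for t in range(30)]


def noisy(template: List[float], n: int) -> List[List[float]]:
    """n noisy copies of a response template (made-up noise level)."""
    return [[v + rng.gauss(0, 0.2) for v in template] for _ in range(n)]


hits = noisy(bump, 20)
more_hits = noisy(bump, 20)
correct_rejections = noisy([-v for v in bump], 20)
print(differentially_modulated(hits, correct_rejections))
print(differentially_modulated(hits, more_hits))
```

Using the within-condition spread as the reference makes the criterion adapt to each bouton's own trial-to-trial variability rather than relying on a fixed correlation cutoff.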

Fig. 3: Cortical bulbar feedback represents stimulus identity, contingency, and behavioral outcome.

a Histogram of individual bouton response correlation values across trials as a function of behavioral contingency (hits, H; false alarms, FA; correct rejections, CR) in Odor (Left) and Sound (Right) trials. Bouton responses were sampled between cue onset and the end of the delay period (before licking, “Methods” section). Inset: bouton response stability across conditions (H/H, H/CR, H/FA, CR/FA), reported relative to the 90th percentile of the Hit/Hit bouton response correlation distribution (bootstrap analysis, “Methods” section). b Individual bouton response stability analysis for trials in which mice subsequently licked the reward spout (hits and false alarms). Note the differences in trial-to-trial response correlation distributions when comparing Odor H/H vs. Odor H/FA vs. Odor FA/Sound FA trials. a and b: N = 20 sessions, 3 mice. Two-sided one-way ANOVA with multiple comparisons of means: * = p < 0.05 compared to ‘odor H/H’ bouton stability. c Principal component analysis (PCA) for one example session: feedback bouton ensemble response trajectories plotted in a space defined by the first three principal components (74.5% and 73.7% of variance explained for odor and sound trials, respectively); population response trajectories rapidly diverge as a function of trial contingency for both Odor (Left) and Sound (Right) trials. Miss (M) trials were excluded from the analysis due to their low frequency (<3%). Different task periods in each trajectory are represented by distinct traces (baseline: thin continuous; cue: thick continuous; delay: thick dashed; report: thick dotted line). d Multi-layer perceptron classifiers were trained to decode stimulus identity (odor or sound), behavioral contingency (H, FA, CR), trial instruction (Go or No-Go), and behavior (lick or no-lick) in the delay version of the task. Top: Average classifier performance across all sessions normalized relative to baseline performance.
When shuffling trial labels on the training data, the average classifier performance was 0. Bottom: Distribution of the number of licks per second across all sessions. See Source Data and Supplementary Table 1 for exact data points and p-values.

To determine how changes in individual bouton responses across blocks relate to whether boutons belong to the same or different cortical feedback axons, we systematically compared the responses of boutons within a small neighborhood (3–15 µm apart) as a function of stimulus identity (odor, sound) and reward contingency. Specifically, we identified boutons that visibly belonged to the same axon (yellow arrows, Supplementary Fig. 12a), as well as equidistant boutons in the vicinity that appeared to lie on other axonal branches (white arrows, Supplementary Fig. 12a). We found that boutons belonging to the same axon responded more similarly to a given stimulus (black traces, Supplementary Fig. 12b, c) than boutons on different axons within the same neighborhood (<15 µm, color traces; Odor response correlation, Avg ± SEM: 0.62 ± 0.12 vs. 0.26 ± 0.15; Sound response correlation, Avg ± SEM: 0.50 ± 0.11 vs. 0.22 ± 0.15; p < 0.0001, Wilcoxon rank-sum test; Supplementary Fig. 12c, d). Indeed, example feedback boutons less than 5 µm apart, but putatively belonging to different axons, showed widely different responses to the same stimulus compared to equally spaced boutons on the same axon (multicolor traces, Supplementary Fig. 12b–d). This is consistent with previous reports16,25,26,111 that boutons on the same axon are more similar in their stimulus responses, as well as in their spontaneous activity, than boutons on different axons. The difference was even more apparent when considering the pairwise correlation of bouton responses for trials of the same outcome (Odor Hits correlation, Avg ± SEM: 0.91 ± 0.04 vs. 0.50 ± 0.15, same vs. different axons; Sound Hits correlation, Avg ± SEM: 0.74 ± 0.08 vs. 0.40 ± 0.15; p < 0.0001, Wilcoxon rank-sum test; Supplementary Fig. 12d). The responses of bouton pairs on the same axon varied in a coordinated manner across different contingencies for the same stimulus (rewarded vs. non-rewarded, across blocks; Supplementary Fig. 12e) and across different stimuli for the same reward contingency (odor hits vs. sound hits, Supplementary Fig. 12f). In contrast, adjacent boutons on different axons changed their responses in an uncorrelated manner (Supplementary Fig. 12e, f). In summary, changes in bouton responses across stimuli and reward contingencies appear to be coordinated for boutons on the same cortical feedback axon and to vary widely for nearby boutons on different axons.
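For readers who want to reproduce this kind of same- versus different-axon comparison, a dependency-free sketch of the two-sided Wilcoxon rank-sum test with a normal approximation is given below; in practice a statistics library such as SciPy (`scipy.stats.ranksums`) would typically be used. The correlation values in the example are invented for illustration and are not data from this study, and `rank_sum_test` is a hypothetical helper name.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses average ranks for ties and the large-sample normal approximation
    without tie or continuity correction. Returns (U for x, p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, erfc(abs(z) / sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))


# Made-up pairwise response correlations for bouton pairs within 15 µm.
same_axon = [0.91, 0.85, 0.78, 0.95, 0.70, 0.88, 0.64, 0.81]
different_axons = [0.22, 0.41, 0.05, 0.33, 0.58, 0.12, 0.29, 0.47]
u, p = rank_sum_test(same_axon, different_axons)
print(u, round(p, 4))
```

A rank-based test suits correlation coefficients, which are bounded and often skewed, because it does not assume normally distributed values.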

To visualize potential differences in feedback response trajectories as a function of behavioral contingency (H vs. CR vs. FA trials), we used principal component analysis (PCA) in individual fields of view. For systematic quantification, we further used cross-validated decoding approaches. In many arbitrarily chosen fields of view, the population trajectories for both odor and sound trials (shown in a space defined by the first three principal components, Fig. 3c) diverged early in the trial, typically during the cue period. We further trained and cross-validated classifiers (multi-layer perceptrons, MLP; “Methods” section; Fig. 3d; Supplementary Figs. 13a–f, 14a–f) to decode different task features, including stimulus identity (odor vs. sound), instruction (Go vs. No-Go), behavior (lick vs. no-lick), and trial behavioral contingency (hits, H; correct rejections, CR; false alarms, FA). Of note, given our task design, many of these features are interrelated and thus cannot be fully assessed separately. Both in arbitrary example fields of view (Supplementary Fig. 13a) and when averaging classification performance across FOVs (“Methods” section), the classifiers’ performance for decoding each of these variables rapidly increased during the cue period and peaked during the delay period. Performance remained high throughout lick-reporting, as well as for several seconds after water collection (Fig. 3d). In the no-delay version of the task, decoding performance returned to baseline more rapidly than in the delay version, reflecting faster offset kinetics of the cortical bulbar feedback, consistent with the differences in experimental design (Fig. 3d, Supplementary Fig. 14a, b). The performance of decoders trained to discriminate between cues (odor vs. sound) decayed faster than that of decoders reporting other features (Go/No-Go instructions, stimulus contingency, etc.), consistent with the transient nature of the sensory input compared to the other trial variables analyzed (Fig. 3d, Supplementary Fig. 13a). The representation of stimulus identity appears to occur at the level of specific ensembles of cortical bulbar neurons. Indeed, shuffling bouton labels substantially decreased classifier performance (Supplementary Figs. 13c, 14e). As expected, classifier performance did not rise above baseline in shuffled trial-label controls and was substantially higher in GCaMP-expressing mice than in EGFP control experiments (Fig. 3d; Supplementary Fig. 13e, “Methods” section). Our results indicate that cortical bulbar feedback carries stimulus identity, contingency, and behavioral outcome signals, which are readily reformatted in different behavioral contexts.
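The cross-validation and shuffled-label logic of the decoding analysis can be sketched without external libraries. Note that this sketch swaps the multi-layer perceptrons used in the study for a simpler nearest-centroid classifier; the fold count, number of shuffles, function names, and synthetic data are illustrative assumptions. Reporting accuracy minus the shuffled-label accuracy is one way to express performance relative to baseline, so that chance maps to roughly 0.

```python
import random
from math import dist
from statistics import mean
from typing import Hashable, List, Sequence


def nearest_centroid_cv(X: List[Sequence[float]], y: List[Hashable],
                        k: int = 5, seed: int = 0) -> float:
    """k-fold cross-validated accuracy of a nearest-centroid decoder.

    X[i] is the ensemble response on trial i (one value per bouton);
    y[i] is the task variable to decode (e.g., 'odor' or 'sound').
    """
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    folds = [order[f::k] for f in range(k)]
    n_correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        # Class centroids are estimated from training trials only.
        centroids = {}
        for label in {y[i] for i in train}:
            members = [X[i] for i in train if y[i] == label]
            centroids[label] = [mean(col) for col in zip(*members)]
        for i in test:
            pred = min(centroids, key=lambda lab: dist(X[i], centroids[lab]))
            n_correct += pred == y[i]
    return n_correct / len(X)


def shuffled_label_accuracy(X: List[Sequence[float]], y: List[Hashable],
                            n_shuffles: int = 20, seed: int = 0) -> float:
    """Chance-level control: mean accuracy after permuting trial labels."""
    rng = random.Random(seed)
    scores = []
    for s in range(n_shuffles):
        y_perm = list(y)
        rng.shuffle(y_perm)
        scores.append(nearest_centroid_cv(X, y_perm, seed=s))
    return mean(scores)


# Synthetic data: 100 trials x 25 boutons with cue-specific ensemble patterns.
rng = random.Random(3)
patterns = {c: [rng.gauss(0, 1) for _ in range(25)] for c in ("odor", "sound")}
y = [rng.choice(("odor", "sound")) for _ in range(100)]
X = [[p + rng.gauss(0, 1) for p in patterns[c]] for c in y]
acc = nearest_centroid_cv(X, y)
chance = shuffled_label_accuracy(X, y)
print(round(acc, 2), round(chance, 2), round(acc - chance, 2))
```

Repeating the same computation on responses from successive time bins would yield a decoding time course like the one summarized in Fig. 3d, although the study's classifier and normalization details are given in Methods.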

Does the emergence of sound-driven cortical bulbar feedback activity require that the odor and sound cues occur in close temporal proximity? Or does it instead reflect changes in the contingency of behaviorally relevant stimuli across sensory modalities? To begin answering these questions, we monitored the activity of cortical bulbar feedback boutons expressing GCaMP7b112 in mice trained in an auditory-only Go/No-Go task (Sound A vs. Sound B, no odors, “Methods” section). During training, care was taken to ensure that no odor cues were present. Mice were exposed equally to both sound cues across training days but were rewarded for licking in response to only one of them. With training, mice quickly learned the cue-reward association and refrained from licking in response to the unrewarded sound (“Methods” section). As in the previous analyses, we analyzed changes in cortical feedback bouton fluorescence from cue onset to the end of the delay period (prior to licking). We observed sound-triggered responses to both sound cues in naïve mice (first session on the training and imaging rig, “Methods” section), as well as during learning and in expert animals (Fig. 4a, Supplementary Fig. 15a, b). Across fields of view, over the six days analyzed (N = 3 mice per day), the amplitude and dynamics of the sound-triggered responses changed substantially, revealing unexpected complexity (Naïve: 4.6 ± 4.6%; Day 4: 16.4 ± 2.4%; Day 6: 8.3 ± 7.7% responsive boutons). As learning of the sound-reward associations progressed, cortical feedback bouton responses became more selectively tuned to the rewarded (Go) sound cue and displayed both enhancement and suppression relative to baseline (Fig. 4b, day 6; Supplementary Fig. 15b). In expert mice, responses to the rewarded (Go) sound were, on average, higher in amplitude than responses to the non-rewarded (No-Go) sound cue (Wilcoxon rank-sum test, enhanced and suppressed: p < 0.001; Supplementary Fig. 15a, b, h).
In contrast, in naïve sessions, responses to the Go and No-Go sound cues were more similar (Wilcoxon rank-sum, enhanced: p = 0.77; suppressed: p = 0.13; Fig. 4b, Supplementary Fig. 15a, b, h). Across the population, the magnitude of sound responses in the cortical feedback boutons was similar in expert mice engaged in the odor/sound rule-reversal task and in the auditory-only Go/No-Go task, despite different training durations (several weeks vs. 6 days, respectively; Wilcoxon rank-sum test, enhanced and suppressed: p = 0.99; Supplementary Fig. 15b, g, i; Supplementary Table 2). Across sessions, the performance of classifiers decoding the instruction signals (Go vs. No-Go) improved with training and did not plateau within the six-day training window (Fig. 4c, d). As training progressed, instruction signals could be decoded progressively earlier within a trial (during the cue and delay periods, Fig. 4c). Similar to the rule-reversal task (Fig. 3a, b), in expert mice engaged in the auditory-only Go/No-Go task, cortical feedback bouton ensemble responses during both the cue and delay periods differed between hit and false alarm trials, even though the animal licked in both cases (Supplementary Fig. 15c). Therefore, the observed responses cannot be explained simply as motor preparatory activity. In additional habituation experiments, we analyzed the activity of cortical feedback boutons across 6 consecutive days as head-fixed mice (N = 3) passively experienced the two sound cues (A vs. B; no water rewards were provided, “Methods” section, Supplementary Fig. 15d, e). Compared to task-engaged mice, feedback bouton responses in these habituation experiments were, on average, sparser and lower in amplitude (Wilcoxon rank-sum test, enhanced and suppressed: p < 0.0001; Supplementary Fig. 15j). We did not observe a systematic differential modulation of the responses to the two cues across days (interestingly, responses to one of the cues were stronger; Supplementary Fig. 15d, e, j).
Accordingly, decoding performance was above chance. However, decoding performance was stable across imaging sessions (days 1–6), in contrast to the steady increase in decoding performance during learning in mice engaged in the auditory-only Go/No-Go task (Supplementary Fig. 15f). We conclude that sound cues trigger sparse responses in cortical bulbar feedback axons in naïve mice, and that learning of the stimulus-reward associations further augments their strength and specificity. Overall, these observations are consistent with the hypothesis that, in addition to odor-specific information, cortical bulbar feedback represents reward contingency signals across different modalities.
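The Go vs. No-Go amplitude comparisons above rely on the Wilcoxon rank-sum test. The following minimal standard-library sketch is a hedged illustration of what that test computes. It uses the normal approximation, assigns average ranks to ties, and applies no tie or continuity correction to the variance. In practice an established implementation such as SciPy's `scipy.stats.ranksums` would normally be used. The amplitude values are made up for illustration, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                       # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2          # mean of r1 under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))       # equals 2 * (1 - Phi(|z|))
    return z, p

# Made-up response amplitudes (one value per bouton).
go_amp = [0.9, 1.2, 1.4, 1.1, 1.6, 1.3, 1.0, 1.5]
nogo_amp = [0.2, 0.5, 0.4, 0.7, 0.3, 0.6, 0.1, 0.8]
z, p = rank_sum_test(go_amp, nogo_amp)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In these made-up samples every Go amplitude exceeds every No-Go amplitude, which gives z ≈ 3.36 and a two-sided p < 0.001.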

Fig. 4: Cortical bulbar feedback responses represent stimulus contingency in an auditory-only Go/No-Go task.

a Example average responses of individual cortical bulbar boutons to Go (Sound A, Left) and No-Go (Sound B, Right) cues in an auditory-only Go/No-Go task. Shaded areas mark different task periods: cue (gray); delay (green); report (pink). Blue (Sound A) and red (Sound B) traces represent the average change in fluorescence across trials (z-scored). b Average cortical feedback bouton responses in example fields of view, parsed by instruction (Go or No-Go) and across days of training (naïve, day 4, and day 6). c Average multi-layer perceptron performance for decoding the instruction signals (Go vs. No-Go) across training sessions (N = 3 mice per day) for the task in (a, b). d Peak classifier performance sampled from cue onset to the end of the delay period in the auditory-only Go/No-Go task. Error bars: ±SEM.

Because rule reversals in our task occur without an overt cue at the boundary between blocks, expert mice usually take a few trials (≤7, Supplementary Fig. 1i, j) to infer that reward contingencies have flipped and to update their reward collection strategy accordingly. This raises the question of whether cortical bulbar feedback activity mirrors the perceived current reward rule, and thus lags in updating in a manner similar to the animal’s behavioral performance. Indeed, we found signatures of such representational leakage in the cortical feedback bouton ensemble activity: after rule-reversal events, ensemble feedback activity characteristic of the previous block persisted for several trials (Fig. 5a).

Fig. 5: Cortical bulbar feedback activity mirrors the perceived reward rule.

a Example bouton responses during block transitions sampled throughout the ‘delay period’ from one field of view in an expert mouse. (Top) Odor trials. (Bottom) Sound trials. Gray bars mark the trial outcome (correct – light gray; incorrect – dark gray). b Example individual bouton response traces from (a) (asterisks) to odor (Top) and sound (Bottom) before and after the contingency switch (0; vertical line). Interpolated responses are shown (“Methods” section). c Block transition neuronal distance analysis: Pearson correlation (ρ) was calculated between the bouton ensemble response (delay period) of a given trial of the current block and the average bouton ensemble response over the last five trials of the preceding block. The average neuronal distance is defined as 1 − ρ and is shown for the first twelve trials of a given block (N = 20 sessions, 3 mice). d Average behavioral performance following block transitions across sessions (N = 20 sessions, 3 mice; “Methods” section). e Correlation between the neuronal distance and the block behavioral performance. Color bar: trial index of each correlation value (N = 20 sessions, 3 mice). Pearson’s correlation: R² = 0.85 (p < 0.0001). f Optogenetic perturbation of aPCx-originating feedback locally within the olfactory bulb (“Methods” section). In expert mice, cortical feedback was suppressed (2.4 mW, 595 nm) from 500 ms before the start of the cue period until the end of the reporting period in 25% of the trials of a behavior session. g Behavioral performance quantified for odor (Left) and sound (Right) trials independently in Jaws-aPCx and EGFP-aPCx expressing mice: (Jaws – no-light) vs. (Jaws – 2.4 mW) light-on trials. Two-sided one-way ANOVA and multiple comparisons of means: *** = p < 0.0001; n.s.: non-significant. All panels error bars: ±SEM. See Source Data and Supplementary Table 1 for exact data points and p-values.

This lag was also reflected at the level of individual bouton responses across trials (Fig. 5b). We calculated a neuronal distance (1 − Pearson correlation) between the bouton ensemble response trajectory (delay period) of the preceding block and the ensemble trajectory of each trial of the current block to the same stimulus, for both odor and sound trials (“Methods” section). This metric increased systematically across trials and matched the increase in behavioral performance in the new block after the rule reversal (Fig. 5c, d). Plotting the mean neuronal distance in the ensemble trajectory against the average behavioral performance (across blocks and sessions) revealed a robust correlation (R² = 0.85, p < 0.0001) between cortical feedback activity and the update in behavioral reporting strategy (Fig. 5e).
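The block-transition distance analysis (Fig. 5c–e) can be sketched under a few stated assumptions. Each trial is summarized as one delay-period ensemble vector (one value per bouton). The reference is the mean over the last five trials of the preceding block, and the first twelve trials of the new block are scored, as in the Fig. 5 legend. The ensemble data are synthetic. The linear ~7-trial switch used to generate them, which also stands in for behavioral performance when computing R², is a hypothetical toy and not recorded behavior. Function names are hypothetical.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def block_transition_distance(prev_block, curr_block, n_ref=5, n_trials=12):
    """Neuronal distance 1 - rho between each of the first n_trials ensemble
    vectors of the current block and the mean of the last n_ref trials of
    the preceding block."""
    reference = mean_vector(prev_block[-n_ref:])
    return [1 - pearson(trial, reference) for trial in curr_block[:n_trials]]

# Synthetic toy data: 50 boutons; the ensemble pattern morphs from the
# old-block pattern A to the new-block pattern B over ~7 trials.
rng = random.Random(0)
n_boutons = 50
pattern_a = [rng.gauss(0, 1) for _ in range(n_boutons)]
pattern_b = [rng.gauss(0, 1) for _ in range(n_boutons)]

def make_trial(w):
    """Mixture of the two patterns (w = weight on B) plus Gaussian noise."""
    return [(1 - w) * a + w * b + rng.gauss(0, 0.3) for a, b in zip(pattern_a, pattern_b)]

prev_block = [make_trial(0.0) for _ in range(10)]
switch_weights = [min(1.0, t / 7) for t in range(12)]
curr_block = [make_trial(w) for w in switch_weights]

dist = block_transition_distance(prev_block, curr_block)
# Toy stand-in for behavioral performance: the same switch weights.
r2 = pearson(dist, switch_weights) ** 2
print([round(d, 2) for d in dist], f"R^2 = {r2:.2f}")
```

The distance starts near zero, because the first trials still resemble the old block, and grows as the ensemble pattern approaches the new-block pattern.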

To determine whether cortical bulbar feedback is necessary for expert mice to perform our task, we applied local optogenetic perturbations to the cortical bulbar feedback axons within the olfactory bulb. Using a viral strategy, we expressed Jaws, an inhibitory opsin113, in piriform cortex neurons (via AAV-Jaws-EGFP injections in the anterior piriform cortex, aPCx; “Methods” section; Fig. 5f). We monitored changes in behavioral performance in both odor and sound blocks in catch trials with light stimulation (25% of trials, “Methods” section). Suppressing cortical feedback activity by local optogenetic stimulation within the olfactory bulb impaired behavioral performance compared to control trials (Odor trials – Jaws-2.4 mW: 66 ± 3% vs. 81 ± 1% Jaws-no light; Sound trials – Jaws-2.4 mW: 60 ± 4% vs. 81 ± 1%; p < 0.001, N = 3 mice; Fig. 5g; “Methods” section) and to sessions in mice that expressed only EGFP in the cortical bulbar feedback axons (under the same light stimulation conditions, Fig. 5g, “Methods” section). These deficits reflected increases in the rates of both false alarms and misses, for odor as well as sound trials. Our results are consistent with a scenario in which cortical bulbar feedback contributes to assessing the behavioral (reward) contingency of stimuli across multiple sensory modalities (e.g., odor and sound cues) and relays signals to the olfactory bulb that extend beyond processing olfactory input.

Discussion

Taking advantage of a novel Go/No-Go rule-reversal task engaging olfactory and auditory cues (Fig. 1), we found that the feedback axons from piriform cortex to the olfactory bulb relay identity and reward contingency information across multiple sensory modalities. The cortical bulbar feedback axon responses are reformatted upon changes in stimulus-reward contingency rules and mirror the behavior of expert mice across rule reversals (Figs. 2, 3, 5). Furthermore, optogenetic suppression experiments (Fig. 5g) suggest that the cortical bulbar feedback is part of a larger processing network that enables mice to adapt to sudden changes in stimulus-reward contingencies.

To investigate whether cortical bulbar feedback represents reward contingency and supports behavioral flexibility104,105,114, we focused our analysis on the cue-evoked responses of feedback boutons preceding the behavioral readout (lick/no-lick assay). As in previous work16,107,108,109,110, we observed both enhanced and suppressed feedback responses that were roughly balanced in frequency. The presence of both enhanced and suppressed responses may increase the dynamic range over which cortical feedback controls the activity of bulbar outputs. Decoding analysis suggested that both enhanced and suppressed feedback responses participate in representing various stimulus features (e.g., identity, contingency; Supplementary Figs. 13b, 14c). Further analysis is, however, necessary to determine whether these signals, presumably arising from distinct populations of piriform outputs107,108,109,110, are involved in different computations16,23.

To better separate potential motion artifacts from rule-related signals, we imposed a short delay between the offset of the sensory cue and the behavioral report. Many feedback boutons modulated their responses in tight correlation with changes in the reward contingency rules (Figs. 2, 3). The reorganization of cortical feedback activity included reshaping of the kinetics, amplitude, and response polarity of individual boutons (Figs. 2, 3; Supplementary Fig. 8). It generally lagged the rule-switching events by a few trials (~7; Fig. 5a–c; Supplementary Fig. 1i, j) and was correlated with changes in behavioral performance (Figs. 2, 5a–e). Responses in the piriform-to-bulb feedback were triggered by both the odor and the sound cues (Figs. 2–4, Supplementary Figs. 5–7). Modulation of feedback bouton activity across conditions (Go vs. No-Go blocks) could not be explained simply by motion artifacts, as indicated by EGFP control experiments (Supplementary Fig. 1m–o; 5d, e; 13e), nor by changes in sniffing (Supplementary Fig. 10), consistent with recent reports in related tasks115,116. At the same time, many bouton responses were robust to the rule reversals and may enable stable representations of sensory input identity despite changes in reward contingency (Fig. 2; Supplementary Fig. 8). Interestingly, consistent with previous reports during odor-triggered behaviors13, cortical feedback responses in expert mice engaged in the rule-reversal task were substantially denser than in naïve mice16,33, revealing an increased contribution of top-down input to shaping bulb activity as a function of rule learning. Overall, the population-based decoding analysis indicated that cortical bulbar feedback carries signals related to stimulus identity, reward contingency, and trial outcome (Fig. 3). On average, feedback responses triggered by the same cue had higher amplitude during hits and false alarms than during correct rejection trials (Supplementary Fig. 9e, f).
However, in expert mice engaged in both the rule-reversal and the auditory-only Go/No-Go tasks, we also observed differences in the responses of individual boutons across trials of different contingency in which mice subsequently licked the reward port (Odor Hit vs. Odor FA vs. Sound Hit, Sound A Hit vs. Sound B FA, etc., Fig. 3b; Supplementary Fig. 15c). Thus, cortical feedback responses cannot be explained solely as motor preparatory activity. The decoding analyses were successful even when restricted to arbitrarily chosen individual (~50 × 50 µm) fields of view, revealing robust representations of task features in the cortical feedback activity. Across multiple rule reversals within the same session, the neural ensembles transitioned rapidly back and forth between rule-associated representations, akin to reports in other brain regions106,117,118,119,120. A given stimulus triggered similar cortical feedback activity in blocks of trials with the same contingency rule and dissimilar representations in blocks with the opposite reward contingency, revealing attractor-like behavior in the piriform-to-bulb neural dynamics. Furthermore, Go feedback ensemble responses across modalities (Odor Go vs. Sound Go) appeared more similar than responses to the same cue across instruction signals (e.g., Odor Go vs. Odor No-Go; Sound Go vs. Sound No-Go; Fig. 2h).

The fast updating of piriform cortex-to-bulb feedback responses upon rule reversal contrasts with previous reports that anterior piriform cortex representations are stable, largely sensory, and only mildly modulated by learning, context, and rule reversal99,100 (but see refs. 98,102). Differences in the behavioral tasks employed across studies, and potential specificity in the activity of distinct piriform output cell types defined by their long-range projections, may account for this discrepancy. In particular, our task differs from previous work in that it specifically requires expert animals to switch rapidly and repeatedly between different rules of engagement within the same behavioral session. Furthermore, different groups of piriform output neurons target functionally distinct brain regions (olfactory bulb vs. orbitofrontal cortex vs. cortical amygdala vs. lateral entorhinal cortex, etc.) and are enriched at distinct locations along the anterior-posterior axis121,122,123. To date, however, most studies have monitored activity in the anterior piriform cortex agnostic of the projection targets of the recorded neurons89,100,110,124,125,126,127. In this view, bulb-projecting piriform cells appear to flexibly update their representations in conjunction with changes in stimulus-reward contingency. In contrast, other piriform output neurons that target, for example, the orbitofrontal cortex or other brain regions may be less affected by stimulus-reward associations and primarily represent sensory features of stimuli100.

While feedback axonal activity is required for accurate task performance (Fig. 5f, g), the reward information that we observe in the feedback axons may not necessarily reflect changes in the activity of the piriform cortex. Rather, it may reflect diverse neuromodulatory inputs acting locally on these cortical bulbar feedback axons within the olfactory bulb. However, the spatial statistics of the data appear to be at odds with reports on the diffuse nature of neuromodulatory action38,128,129,130,131,132. Response changes across stimuli, reward contingency conditions, and blocks occurred in a coordinated manner for boutons on the same axon and varied widely for equally spaced nearby boutons belonging to different axons (3 to 15 µm apart). Specifically, the responses of bouton pairs on the same axon varied in a coordinated manner across different contingencies for the same odor stimulus (rewarded vs. non-rewarded, across blocks; Supplementary Fig. 12e) and across different stimuli for the same reward contingency (odor hits vs. sound hits; Supplementary Fig. 12f). In contrast, adjacent boutons on different axons changed their responses in an uncorrelated manner. A parsimonious explanation for this exquisite coordination, restricted to boutons belonging to the same axon, is that it reflects the activity of the parent piriform neuron. However, one alternative explanation that our analyses cannot rule out is that neuromodulation acts within the bulb in a cortical feedback axon-specific manner (i.e., boutons on the same axon are modulated in the same way) owing to unique combinations of receptors and downstream signaling cascades in individual feedback axons. In this scenario, the feedback responses may not reflect the spiking activity of the cortical neuron per se.
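The same-axon vs. different-axon comparison above can be illustrated by correlating, for every bouton pair, the profile of response changes across conditions (e.g., rewarded vs. non-rewarded blocks, odor vs. sound hits). This minimal sketch is not the authors' analysis. It assumes that bouton-to-axon assignments are already known and that each bouton is summarized by a hypothetical modulation vector, and it ignores the 3–15 µm spacing constraint. The data are synthetic and the function names are hypothetical.

```python
import itertools
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def pairwise_coordination(modulation, axon_of):
    """Mean Pearson correlation of modulation profiles for bouton pairs on
    the same axon and for bouton pairs on different axons."""
    same, diff = [], []
    for i, j in itertools.combinations(range(len(modulation)), 2):
        r = pearson(modulation[i], modulation[j])
        (same if axon_of[i] == axon_of[j] else diff).append(r)
    return sum(same) / len(same), sum(diff) / len(diff)

# Synthetic toy data: 6 axons x 4 boutons; each bouton inherits its axon's
# modulation profile across 10 hypothetical conditions, plus small noise.
rng = random.Random(2)
n_axons, boutons_per_axon, n_conditions = 6, 4, 10
modulation, axon_of = [], []
for axon in range(n_axons):
    profile = [rng.gauss(0, 1) for _ in range(n_conditions)]
    for _ in range(boutons_per_axon):
        modulation.append([m + rng.gauss(0, 0.3) for m in profile])
        axon_of.append(axon)

same_r, diff_r = pairwise_coordination(modulation, axon_of)
print(f"same-axon r = {same_r:.2f}, different-axon r = {diff_r:.2f}")
```

In this toy, same-axon pairs are strongly correlated while different-axon pairs hover near zero. That pattern is what a parent-neuron explanation predicts, though the sketch cannot by itself distinguish it from axon-specific neuromodulation.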

Our results open avenues for investigating the mechanisms that support the flexible gating of some piriform-to-bulb feedback signals and the stability of others despite changes in contingency. Because the changes in response amplitude and kinetics occur within seconds (a few trials after a rule reversal), they may rely on fast gating signals rather than on slower, synaptic plasticity-based changes. Further investigation is necessary to determine whether these signals originate in the piriform cortex, emerge through interactions with other association cortical areas (e.g., OFC, mPFC, lENT)115,133,134, and/or reflect neuromodulatory action135,136,137,138 on specific piriform circuits139,140,141. While calcium dynamics in axon terminals have been shown to reflect changes in firing rates at the soma16,26,142, an alternative possibility is that the gating of calcium signals in the cortical feedback axons occurs via interneuron input within the bulb. However, the presence of sound-evoked responses in the feedback boutons, modulated by reward contingency, suggests that a local (bulbar) mechanism is not a parsimonious explanation.

The activity of mitral cells is modulated by context, learning, and stimulus contingency, and cortical bulbar feedback has been singled out as a potential signal responsible for shaping these bulb outputs13,45,56,62,63,64,65,66,67,79,80,143. A recent study reported reward-related signals in mitral cells that are modulated by piriform cortex-to-bulb feedback (assessed via pharmacological silencing of the piriform cortex)116. Our data are consistent with this body of work and provide a framework to further analyze the dynamics of mitral (and tufted) cells in mice engaged in rule-reversal tasks, and under more naturalistic conditions144, in the presence and absence of cortical feedback. Parallel feedback loops engaging the mitral vs. tufted cells and their dominant cortical targets, the piriform cortex and the AON, respectively, have been reported to perform different computations17. For example, odor identity and concentration are more easily read out from tufted cell ensemble representations, whereas mitral cells may represent subtler features of odorants17,56,64,79,80. We expect mitral cell activity to change within a few trials after rule reversals, matching the reorganization of cortical feedback responses and changes in behavioral performance (i.e., lower mitral cell response amplitude in Go vs. No-Go trial blocks). Furthermore, we expect tufted cell and AON-to-bulb feedback representations to be more sensory in nature, robust to changes in stimulus-reward associations, and less affected by perturbations of the piriform-to-bulb feedback16,17.

We observed sparse but diverse sound-evoked responses, both enhanced and suppressed, in the piriform-to-bulb feedback axons. While sound-triggered activity has been reported in the piriform cortex145, its origin and computational roles remain unclear. In auditory and visual processing, inputs from the somatosensory146,147 and auditory cortex7 are thought to shape cortical neural representations as a function of experience. Specifically, these signals may relay stimulus association memories that allow V1 cortical circuits to compare predicted and actual sensory inputs7. In contrast, a recent report148 in awake, passive mice identified auditory cortex-independent, stereotyped, low-dimensional sound-triggered signals in the visual cortex. These responses could be predicted from small body movements and may reflect changes in internal brain states. In our experiments, sound-triggered piriform feedback axon responses were apparent in expert mice engaged in the rule-reversal task (Fig. 2), in naïve mice, and in mice performing a two-tone Go/No-Go task (Fig. 4). Stimulus-reward associations modulated the response kinetics and amplitude of the sound-triggered feedback bouton responses across different time scales. In expert mice, within a given session, we observed fast updating of the responses of individual boutons within seconds of rule-reversal events. In addition, in the auditory-only Go/No-Go task, bouton responses changed across sessions during learning of the stimulus-reward associations, becoming more specifically tuned to the rewarded sound cue. In contrast, in sound habituation experiments, feedback bouton responses were, on average, sparser and lower in amplitude, and did not change systematically across days (Supplementary Fig. 15d–f).
While sound responses were diverse and aligned well to different epochs of the rule-reversal task (cue, delay, report), we did not investigate here whether varying sound stimulus features (frequency, amplitude, etc.) affects cortical bulbar feedback activity. Many sound- and odor-evoked feedback responses in expert mice lingered for seconds even after the reporting period, raising the possibility that they serve as lasting memory traces associated with the stimulus-reward contingency16 (Figs. 2–4; Supplementary Figs. 5–7, 13, 14).

Why might the olfactory bulb need to “know” when a stimulus (an odorant or, especially, a non-olfactory stimulus such as a tone) is rewarded? One possibility is that cortical bulbar feedback serves as a binding motif in cases where odors and sounds are required together to obtain a reward. While it may be more intuitive for such binding to occur further downstream, other reports have shown that information about one sensory modality is distributed across the sensory and motor pathways of another7,8,9,149,150,151. An alternative possibility is that the feedback axons help perform credit assignment. If subtle aspects of odorants drive the reward, these features would be more readily accessible to the cortex once cortical feedback input has reshaped mitral cell activity. Thus, a two-part representation, “what is it?” (via the TC↔AON pathway) and “what is new/different/important about it?” (via the MC↔PCx pathway), may enable better representation of which aspects of stimuli in the environment lead to rewards. This process can be viewed as a representation learning algorithm that learns a mapping from sensory inputs to a (latent) feature space; the mitral cells may pass on a residual representation (i.e., reconstruction errors13,15,16,17,19) that highlights, in addition to identity, aspects of stimuli related to their associated reward contingency, context, and/or level of engagement. The impaired behavioral performance observed in both odor and sound trials upon optogenetic suppression of the cortical feedback locally within the bulb (Fig. 5) is consistent with this scenario.
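A purely conceptual toy can make the idea of a residual (reconstruction-error) representation concrete. It is not a model of bulbar circuitry. An input is projected onto a subspace of already-learned features, and only what that projection fails to explain is passed on. Linear projection onto an orthonormal basis is an assumption chosen for simplicity, and the feature vectors, input, and function names below are hypothetical.

```python
import math

def gram_schmidt(vectors):
    """Orthonormal basis spanning the given vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = [float(a) for a in v]
        for u in basis:
            proj = sum(a * b for a, b in zip(w, u))
            w = [a - proj * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > 1e-12:  # skip vectors already in the span
            basis.append([a / norm for a in w])
    return basis

def residual(x, basis):
    """Input minus its reconstruction from the learned (orthonormal) features."""
    recon = [0.0] * len(x)
    for u in basis:
        coef = sum(a * b for a, b in zip(x, u))
        recon = [r + coef * b for r, b in zip(recon, u)]
    return [a - r for a, r in zip(x, recon)]

# Hypothetical learned features span only the first two input dimensions.
learned = gram_schmidt([[1, 1, 0, 0], [1, -1, 0, 0]])
x = [3.0, 1.0, 0.0, 2.0]  # hypothetical input with an extra, "new" component
res = residual(x, learned)
print([round(v, 6) for v in res])
```

Because the learned features span only the first two dimensions, the residual is, up to floating-point error, [0, 0, 0, 2]. Only the component that the learned features do not capture survives.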

Previous work indicated that medial prefrontal cortex, orbitofrontal cortex, and basolateral amygdala circuits support behavioral flexibility in olfactory processing99,100,103,152,153. Dense bidirectional orbitofrontal-piriform cortex interactions133,154, top-down inputs from the mPFC to the AON, piriform cortex, and olfactory striatum (tubercle), and neuromodulatory signals may shape the representations of sensory stimuli as a function of learned odor-reward associations155,156,157 and attentional state115,134. Our results suggest that piriform-to-bulb feedback acts as part of a larger processing network that links stimulus-reward associations to the implementation of decisions that drive behavior in dynamic environments.

Methods

Mice

Overall, 20 B6129SF2/J mice were used (JAX Laboratories®): eleven for the rule-reversal task experiments (4 for the no-delay and 3 for the delay versions of the task, and 4 for the EGFP controls), 3 for the auditory-only Go/No-Go task, 3 for the auditory habituation sessions, and 6 for the optogenetic suppression experiments (3 for Jaws and 3 for EGFP controls). All animal procedures conformed to NIH guidelines and were approved by the Animal Care and Use Committee of Cold Spring Harbor Laboratory. Mice were maintained at room temperature and 40–60% humidity on a 12 h light/12 h dark cycle.

Surgical procedures

Mice were injected with the NSAID meloxicam (0.5 mg/kg; Metacam®, Boehringer Ingelheim, Ingelheim, Germany) 24 h before the surgical procedure, at the onset of surgery, and for 2 days after each surgical procedure. Depending on recovery progression, the NSAID treatment was maintained until the mice showed alert and responsive behavior. Before each stereotaxic surgery, mice were anesthetized with 10% v/v isoflurane (Cat# 029405, Covetrus, Portland, ME, USA). For the chronic window and head bar implantation procedures, mice were anesthetized with a ketamine/xylazine cocktail (125 mg/kg–12.5 mg/kg). During surgery, the animal’s eyes were protected with an ophthalmic ointment (Puralube®, Dechra, Northwich, England, UK). Body temperature was maintained at 37 °C using a heating pad (FST TR-200, Fine Science Tools, Foster City, CA, USA). Respiratory rate and the absence of pain reflexes were monitored throughout the procedure. Chronic window implant surgeries were supplemented with dexamethasone (4 mg/kg) to prevent swelling, enrofloxacin (5 mg/kg) to prevent bacterial infection, and carprofen (5 mg/kg) to reduce inflammation.

Viral infections

To target the piriform cortex-to-bulb feedback for the imaging and optogenetic perturbation experiments, we used the following viruses: AAV2/9-EF1a-Flex-GCaMP5 and AAV9-Cre (Penn Vector Core, Philadelphia, PA, USA); and AAV1-syn-jGCaMP7b-WPRE (Cat# 104489-AAV1), AAV1-hSyn-EGFP (Cat# 504650-AAV1), and AAV5-hSyn-Jaws-KGC-GFP-ER2 (Cat# 65014-AAV5) from Addgene (Watertown, MA, USA).

AAV stereotaxic injections

Adult mice (males >60 days old, 25–40 g) were used for stereotaxic surgery. After induction (isoflurane), mice were positioned in a stereotaxic device fitted with an isoflurane delivery mask (Cat# 942, Kopf®, Tujunga, CA, USA), and their eyes were covered with ophthalmic ointment. The head surface was cleared using hair removal cream (Nair™) and cleaned with betadine and saline. The skin was removed and the surface of the skull was scraped of connective tissue to identify the cranial sutures, which were then used to align the anterior-posterior head angle by leveling bregma and lambda with respect to the horizontal. Mice were injected bilaterally with AAV (~280 nL per site) using a calibrated borosilicate glass micropipette (tip diameter ~10 μm) through small craniotomies (~1 mm) spanning 1 mm along the anterior-posterior axis of the aPCx in both hemispheres. Coordinate 1: anterior-posterior (AP) +2.5 mm, medial-lateral (ML) ±2.2 mm, dorsal-ventral (DV) 3.00 mm; Coordinate 2: AP +2.0 mm, ML ±2.2 mm, DV 3.5 mm; Coordinate 3: AP +1.5 mm, ML ±2.8 mm, DV 3.75 mm. AP and ML coordinates were measured relative to bregma, and all DV coordinates were measured from the pial surface. AAV injections were delivered by pressure application (5–20 psi, 5–20 ms pulses at 0.5 Hz) using a Picospritzer III (General Valve) and a pulse generator (Agilent). For imaging, mice were injected with a 1:1 mixture of AAV2/9-EF1a-Flex-GCaMP5 and AAV9-Cre; for optogenetic suppression, with AAV5-hSyn-Jaws-KGC-GFP-ER2; and for the EGFP control experiments, with AAV1-hSyn-EGFP. Mice trained in the auditory-only (two-sound) Go/No-Go task were injected with AAV1-syn-jGCaMP7b-WPRE.

Chronic implantations

After recovery from the stereotaxic surgery, mice were implanted with a custom titanium head bar attached with C&B Metabond Quick adhesive luting cement (Cat# S380, Parkell, Edgewood, NY, USA), followed by black Ortho-Jet™ dental acrylic (Cat# 1520BLK, Lang, Chicago, IL, USA), and with a cranial window over the olfactory bulb as previously described16,17. Special care was taken to remove the bone under the inter-frontal suture and any small bone pieces at the edges. During surgery, the exposed olfactory bulb was continuously protected and cleaned of excess blood using artificial cerebrospinal fluid (aCSF) and aCSF-soaked gelfoam. Once both hemibulbs were exposed and clean, a fresh drop of aCSF was placed on top, followed by a 3 mm round cover glass, which was gently pushed onto the OB surface to minimize motion artifacts in subsequent experiments. Once in place, the coverslip was sealed along the edges with a combination of Vitrebond™, Crazy Glue™, and dental acrylic, which also covered the exposed skull. Mice recovered for ~7 days before imaging experiments. Adult males were used for chronic window implantation because their larger skull size facilitates the surgical procedures.

Behavioral training

Mice were water-deprived until reaching 85% of their original weight. Once the target weight was reached, head-fixed mice were trained to discriminate between two brief (350 ms) sensory cues: a pure tone (6.221 kHz, 70 dB) target (Go) stimulus and a monomolecular odorant (1% ethyl valerate) distractor (No-Go). The tone frequency was chosen as a prime number (6221 Hz) to minimize harmonics. Care was taken to choose an odorant cue that produced high signal-to-noise ratio (SNR) readings on a photoionization detector (PID, Aurora Scientific). Olfactory cues were presented from an odor port placed in front of the mouse’s snout, and auditory cues were delivered from a speaker placed to the side. Each training session consisted of ~270 consecutive trials separated by variable inter-trial intervals (ITI, Supplementary Fig. 1f), defined by drawing from an exponential decay function (flat hazard rate) within a 0.3–1.2 s range on top of a bias value (9 s; Eq. 1):

$$\begin{aligned} f(x \mid \mu) &= \frac{1}{\mu}\, e^{-x/\mu} \\ \mu &= 0.2\ \mathrm{s} \\ 0.3\ \mathrm{s} &\le x \le 1.2\ \mathrm{s} \end{aligned}$$
(1)
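Eq. 1 can be turned into a sampler. The sketch below assumes that truncation to 0.3–1.2 s is done by renormalizing the exponential density (inverse-CDF sampling) rather than by clipping, and that the 9 s bias is simply added to each draw. Both are our reading of the text, not the authors' task code, and `sample_iti` is a hypothetical name. The 10 s error time-out is not modeled.

```python
import math
import random

def sample_iti(rng, mu=0.2, lo=0.3, hi=1.2, bias=9.0):
    """Return one inter-trial interval in seconds: bias + an exponential(mu)
    draw truncated to [lo, hi], sampled by inverting the truncated CDF."""
    u = rng.random()                                # uniform in [0, 1)
    e_lo, e_hi = math.exp(-lo / mu), math.exp(-hi / mu)
    x = -mu * math.log(e_lo - u * (e_lo - e_hi))    # u=0 -> lo, u->1 -> hi
    return bias + x

rng = random.Random(0)
itis = [sample_iti(rng) for _ in range(270)]  # one ~270-trial session
print(f"min {min(itis):.2f} s, max {max(itis):.2f} s, mean {sum(itis) / len(itis):.2f} s")
```

With μ = 0.2 s, most draws fall close to the 0.3 s lower bound, so the expected ITI under these assumptions is about 9.49 s.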

Trials were randomly assigned (p = 0.5) as odor or sound trials. In Go trials, mice were trained to report the presence of the Go stimulus by licking a spout in front of their mouth, from which they received a small water reward (3.3 μL; Hit trials). Mice were also trained to refrain from licking in No-Go trials (Correct Rejections), which allowed them to move on to the next trial faster. Incorrect trials (False Alarms and Misses) were punished by withholding the reward, adding a time-out period (10 s) to the regular ITI, and playing a 1 s, 70 dB white-noise sound. During the days preceding the first in-session rule reversal (‘Day 0’), mice were trained so as not to exceed an average per-session performance of 70% correct trials. We implemented this protocol to ensure that mice did not become fixated on one of the two rules. In a given session, only one rule was used. Before ‘Day 0’, mice experienced rule reversals across sessions and were generally substantially better at performing the task under one of the rules and close to chance level under the other, as shown in an example early training session in Supplementary Fig. 1c.
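The trial outcome scheme described above (Hit, Miss, False Alarm, Correct Rejection) can be sketched as follows. The sketch also applies the exclusion of free-water trials from the performance measure, as described in the training protocol. The tuple layout, the rule labels "Odor Go"/"Sound Go", and the function names are hypothetical. Percent correct is assumed to be (Hits + Correct Rejections) divided by all scored trials.

```python
def is_go_trial(cue, rule):
    """Under 'Sound Go' the tone is rewarded; under 'Odor Go' the odor is."""
    return (rule == "Sound Go") == (cue == "sound")

def classify_outcome(go, licked):
    """Map (Go trial?, licked?) onto the four standard Go/No-Go outcomes."""
    if go:
        return "Hit" if licked else "Miss"
    return "FA" if licked else "CR"

def percent_correct(trials, rule):
    """trials: (cue, licked, free_water) tuples; free-water trials are excluded."""
    outcomes = [classify_outcome(is_go_trial(cue, rule), licked)
                for cue, licked, free_water in trials if not free_water]
    n_correct = sum(o in ("Hit", "CR") for o in outcomes)
    return 100.0 * n_correct / len(outcomes)

session = [("sound", True, False), ("odor", False, False),
           ("odor", True, False), ("sound", False, False),
           ("sound", True, True)]  # last trial: free water, not scored
print(percent_correct(session, "Sound Go"))  # 50.0 (Hit, CR, FA, Miss)
```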

Once mice learned the stimulus-reward association to higher than 70% accuracy (typically about two weeks after the start of training), we switched the reward contingency in blocks of contiguous trials between Odor Go blocks (odor rewarded) and Sound Go blocks (tone rewarded) within the same session. After within-session rule reversals were introduced on ‘Day 0’, mice retained higher performance for the rule they had been more comfortable with before ‘Day 0’. Across subsequent training sessions, mice steadily improved their behavioral performance for both rules.

The initial stages of rule-reversal training consisted of classical conditioning to the sound cue, in which the sound cue was repeatedly paired with free water delivery. In this phase, animals were allowed to lick at will, without punishment. Automatic (free) water delivery was used throughout training to remind mice of the availability of water rewards; in each training session, such trials were not counted as correct trials and were excluded when calculating behavioral performance. Once the animals reliably licked in response to the cue, we introduced the odor cue as the unrewarded (No-Go) stimulus. We delivered water rewards when the mouse licked for the correct cue within the designated response period. Once mice reached >70% correct performance, we started reinforcing the timing parameters by turning on punishment (air puff), which was delivered for both incorrect choices and early licks, but not for Miss trials (failing to lick for a rewarded cue). Once the animals maintained >70% correct performance, we reversed the reward contingencies and started rewarding the odor cue. Care was taken not to overtrain the animals before this point. The reversal was not cued, but during the initial stage of reversal training we switched off the time-out punishment and instead relied on a white-noise error signal. The best behavioral performance was obtained in this configuration, potentially because mice used the white noise as a general error signal. During this early training period, the error rate was high: mice predominantly licked for the sound cue and ignored the odor cue. We let mice learn at their own rate, allowing as many sessions as necessary to reach the same 70% performance criterion as before the reversal. We then introduced within-session rule reversals on a regular basis, while reducing the number of trials between reversals from ~130 per block on Day 0 to a steady state of ~45 trials per block (block design).
Mice were trained in one or two sessions per day, with each session lasting from 45 min to 1.5 h.

We trained mice in two versions of the task. In the no-delay version, a trial started with a baseline period of variable length, followed by the delivery of a brief sensory cue (0.35 s) and a reporting period (1.5 s) that started at cue onset. Odor delivery was not triggered by respiration. After mice had been successfully trained in this version, we implemented a delay version, in which a 500 ms delay period separated the cue offset from the reporting period, to disambiguate the neural signatures of reward contingency from licking-related motion artifacts and other signals. To receive a water reward in Go trials, mice had to lick the water spout during the reporting period. Any trial with one or more licks before the start of the reporting period was classified as an early-lick trial and excluded from further analysis. Upon reaching a steady 80% average session performance, mice were considered experts at the task and were used for chronic multiphoton imaging sessions and optogenetic suppression experiments. At the expert level, mice performed well in both odor and sound trials across both block types (Odor Go and Sound Go), while still displaying slight biases for one of the two (Supplementary Figs. 1d, e; 10e).

Trial duration varied as follows. Cue presentation (350 ms) was followed by a fixed 500 ms delay period (in the delay version) and by the reporting period (up to 1.5 s, during which the mouse could lick the reward spout). The animal's behavior added further variability to the effective inter-cue duration: (a) in a Hit trial, a fixed water reward was delivered, and the behavior control software moved to the next state (ITI) once the animal had stopped licking for 100 ms; (b) in a False Alarm trial, no water reward was delivered when the animal licked, and an additional 10 s time-out was imposed before moving to the ITI state; (c) in a Correct Rejection or Miss trial, the control software waited until the end of the 1.5 s reporting period before moving to the ITI state. Because the inter-cue period and the ITI varied from trial to trial (Supplementary Fig. 1f), mice could not lock their licking (or lack thereof) to a fixed time window following cue onset (as observed in our behavioral data, Fig. 1e, Supplementary Fig. 1b) unless they actually detected and acted on both types of sensory cues.

Expert mice performed ~270 trials per session. We chose a block size of 45 trials as a trade-off between being short enough to allow multiple switches per session (~5) and long enough to allow performance to stabilize after each switch. Keeping a flat hazard rate for the number of trials per block was difficult under these constraints. In a subset of experiments (those used for the sniffing analysis, see below), block length was randomly varied between 42 and 48 trials (Supplementary Fig. 1h). In these experiments, mice also achieved >80% expert performance (during both Odor Go and Sound Go blocks) within a training time similar to that of the other experiments. Performance on both odor and sound trials was >80% (Supplementary Fig. 10d).

Monitoring sniffing

We monitored sniffing in expert control mice performing the rule-reversal task, using a mass airflow sensor (Honeywell AWM3300V) mounted in a 3D-printed nose mask coupled to the odor delivery system126. Signals were amplified, digitized (1 kHz), and low-pass filtered (10 Hz cutoff). Expert mice used for the sniff-monitoring controls experienced a variable number of trials per block (41–49 trials; Supplementary Fig. 1h).
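A zero-phase low-pass filter with the stated parameters (1 kHz sampling, 10 Hz cutoff) could look like the sketch below. The filter family and order (a 4th-order Butterworth filter applied forward and backward with SciPy's `sosfiltfilt`) are assumptions; the text does not specify them.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1000.0     # sampling rate (Hz), as stated in the text
CUTOFF = 10.0   # low-pass cutoff (Hz), as stated in the text

def lowpass_sniff(signal, fs=FS, cutoff=CUTOFF, order=4):
    """Zero-phase Butterworth low-pass filter (filter type and order are assumptions)."""
    sos = butter(order, cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Synthetic check: a 3 Hz "sniff" rhythm contaminated by 60 Hz noise
t = np.arange(0.0, 5.0, 1.0 / FS)
sniff = np.sin(2 * np.pi * 3 * t)
raw = sniff + 0.5 * np.sin(2 * np.pi * 60 * t)
clean = lowpass_sniff(raw)
```

Forward-backward filtering avoids shifting sniff onsets in time, which matters when aligning respiration to cue onset.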

Auditory-only Go/No-Go task

Mice expressing GCaMP7b in the anterior piriform cortex-to-olfactory bulb feedback were trained in an auditory-only Go/No-Go task (6.2 and 15.6 kHz tone cues; 70 dB; 350 ms), and feedback axons were imaged during training through a chronic window placed over the olfactory bulb. The trial structure of the auditory-only Go/No-Go task was similar to that of the Odor/Sound Go/No-Go task, but without reward contingency changes within or across sessions. Two mice were trained with the 6.2 kHz-Go/15.6 kHz-No-Go rule and one with the opposite rule (15.6 kHz-Go/6.2 kHz-No-Go). Naïve response sessions were acquired before the mice were water-deprived for behavioral training. Trials followed the same structure as described above for the delay version of the rule-reversal task.

Auditory habituation sessions

Mice expressing GCaMP7b in the anterior piriform cortex-to-olfactory bulb feedback were habituated to the same sound stimuli used for the auditory-only Go/No-Go task and feedback axon activity was imaged across six days of sound habituation experiments.

Multiphoton imaging

A Chameleon Ultra II Ti:Sapphire femtosecond pulsed laser (Coherent) was coupled to a custom-built multiphoton microscope. The laser was steered along the shortest optical path to a scanning system based on galvanometric mirrors (6215HB, Cambridge Technology). The scanning head projected the incident laser beam (930 nm) through a scan lens (50 mm FL) and a tube lens (300 mm FL) so as to fill the back aperture of the objective (Olympus 25×, 1.05 NA). A modified Hamamatsu H7422-40 photomultiplier tube (PMT) was used as the photodetector and a Pockels cell (350-80 BK, 302RM driver, ConOptics) as the laser power modulator. The current output of the PMT was converted to voltage, amplified (SR570, Stanford Research Systems), and digitized using a data acquisition board that also controlled the scanning system (PCI 6115, National Instruments). Image acquisition and scanning were controlled using custom-written LabVIEW software (National Instruments). Using sub-micrometer beads (0.5 µm) and the 1.05 NA, 25× Olympus objective, the point spread function (PSF) was measured to be 1.0 µm FWHM laterally (x–y) and 2.0 µm FWHM axially (z). Cortical bulbar feedback activity was sampled at 16 Hz (160 × 128 pixels, FOV size 48 × 48 μm, 0.30–0.38 µm pixel size) for ~6 s per trial. The sound produced by the galvanometric mirrors at the scan frequencies employed fell below the mouse audible range (<1 kHz) and was therefore unlikely to provide an extra cue for solving the task. Mice were imaged 4 to 6 weeks after the AAV injections. We sampled multiple fields of view distributed rostrocaudally and mediolaterally across the dorsal aspect of the olfactory bulb. Imaged fields of view were located 200–300 µm below the bulb surface, just below the mitral cell layer, spanning the internal plexiform layer and the dorsal edge of the granule cell layer (Supplementary Fig. 2e, f).

Optogenetic suppression of the cortical bulbar feedback

Mice were bilaterally injected with AAV to express the inhibitory opsin Jaws, at the same aPCx coordinates used for GCaMP expression. On the same day, mice were bilaterally implanted with cannulas loaded with 200 μm-diameter optic fibers (Doric: MFC_200/230-0.48_2.0mm_ZF1.25(G)_FLT) above the olfactory bulb (coordinates: AP: +1.2 mm from the inferior cerebral vein; ML: ±1.2 mm from the inter-frontal suture; DV: 0.0 mm from the OB surface). The space between the optic fiber and the edges of the craniotomy was filled with white petrolatum (Dynarex), and the optic fiber cannulas and a metallic head bar were attached to the skull using a combination of dental cement (Metabond®), black dental acrylic resin (Ortho-Jet™), and cyanoacrylate glue (Krazy-Glue®). After training to >80% performance, mice underwent optogenetic suppression experiments, in which each trial had a 0.25 probability of including light stimulation. Optic fiber cannulas were connected to a branched dual patch cable (Doric, Cat # SBP(2)_200/230/900-0.48_1m_SMA-2xZF1.25) using ceramic sleeves (ThorLabs, Cat # ADAL1). Light stimulation was delivered with a 590 nm LED (ThorLabs Cat # M590F3) calibrated to deliver 2.4 mW at the tip of the patch cable. Stimulation (5 ms light pulses at 30 Hz) started 500 ms before cue onset and continued until the end of the reporting period (2.85 s in total). Equivalent experiments were performed in control animals expressing EGFP instead of Jaws.
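The stimulation window follows from the task epochs: 0.5 s before the cue + 0.35 s cue + 0.5 s delay + 1.5 s reporting period = 2.85 s. The sketch below computes pulse onsets relative to cue onset and draws which trials receive light (p = 0.25). It assumes the delay version of the task, and the function names are hypothetical; in practice, pulse timing would be generated by the stimulation hardware.

```python
import numpy as np

PRE_CUE, CUE, DELAY, REPORT = 0.5, 0.35, 0.5, 1.5   # s (delay version of the task)
PULSE_WIDTH, RATE = 0.005, 30.0                     # 5 ms pulses at 30 Hz
P_LIGHT = 0.25                                      # probability of a light trial

def pulse_onsets(rate=RATE):
    """Hypothetical helper: onset times (s, relative to cue onset) of all light
    pulses from 0.5 s before the cue until the end of the reporting period."""
    total = PRE_CUE + CUE + DELAY + REPORT          # 2.85 s of stimulation
    return np.arange(0.0, total, 1.0 / rate) - PRE_CUE

def light_trials(n_trials, seed=None):
    """Hypothetical helper: boolean mask of trials that receive light stimulation."""
    return np.random.default_rng(seed).random(n_trials) < P_LIGHT

onsets = pulse_onsets()            # 86 pulses, the first at -0.5 s
duty_cycle = PULSE_WIDTH * RATE    # fraction of time the LED is on (0.15)
```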

Data analyses

Movement correction and ROI selection

We applied rigid registration (in MATLAB) to the acquired fluorescence time-lapse stacks. Images were visually inspected to select a motion-free sequence of frames, from which we created a median reference image; each image stack corresponding to a given trial was registered to this reference. ROIs were selected manually (ImageJ): we used both the median and standard deviation projections of the registered images to draw ROIs around the cortical feedback axonal boutons in the FOV.
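The text specifies rigid registration to a median reference image but not the algorithm. As one possible approach, the sketch below estimates integer-pixel translations by phase correlation using NumPy FFTs. Phase correlation, the helper names, and the use of `np.roll` (which wraps pixels around the image edges instead of padding them) are our assumptions, not the MATLAB implementation used in the study.

```python
import numpy as np

def phase_correlation_shift(reference, image):
    """Integer (dy, dx) shift to apply to `image` (via np.roll) to align it to `reference`."""
    cross_power = np.fft.fft2(reference) * np.conj(np.fft.fft2(image))
    cross_power /= np.abs(cross_power) + 1e-12          # keep phase information only
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = reference.shape
    # Peaks past the midpoint of an axis correspond to negative shifts
    dy = dy - h if dy > h // 2 else dy
    dx = dx - w if dx > w // 2 else dx
    return int(dy), int(dx)

def register_trial(stack, reference):
    """Rigidly align every frame of a (T, H, W) stack to `reference`."""
    aligned = np.empty_like(stack)
    for i, frame in enumerate(stack):
        dy, dx = phase_correlation_shift(reference, frame)
        aligned[i] = np.roll(frame, shift=(dy, dx), axis=(0, 1))
    return aligned

# Reference: median of a (visually selected) motion-free frame sequence
rng = np.random.default_rng(0)
ref_frames = rng.random((20, 128, 160))
reference = np.median(ref_frames, axis=0)
moved = np.stack([np.roll(reference, (5, -7), axis=(0, 1)) for _ in range(3)])
aligned = register_trial(moved, reference)
```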

Bouton and frame rejection, and interpolation

We iteratively searched for an optimal interpolation method that would maximize the number of trials and ROIs included in subsequent analysis steps without introducing erroneous data. Based on our testing (see below), we identified thresholds for discarding trials and specific ROIs. Multiphoton imaging in awake head-fixed mice frequently suffers from two types of signal loss due to brain motion: field of view (FOV) loss, where the signal in all boutons is compromised at the same time, and bouton loss, where a single ROI moves out of the imaging plane or field of view due to a more complex movement (e.g., rotation). To find an appropriate interpolation method, we used the behavioral sessions with the fewest skipped frames. Specifically, we used ground-truth trials with at least 90 contiguous frames without NaNs to evaluate the performance of several interpolation methods. We introduced NaN values across sequences of images of varying length (1–45 frames) at all possible time points and analyzed the performance of three standard methods (linear, spline158, Akima159) and of one custom method, which we call step interpolation, where missing data are replaced with the value immediately preceding them. By evaluating the mean squared error between the ground-truth signal and its interpolated version, we identified the best of the analyzed methods as a function of the number of missing frames: Akima interpolation if the window of missing frames was shorter than 7 frames; step interpolation if the fluorescence value just before the missing window was smaller than the value immediately after it and the difference exceeded a set threshold; and linear interpolation in all other cases. For our data set, this algorithm could not reliably interpolate windows of missing data longer than 15 contiguous frames (the error exceeded 1 standard deviation of the signal for windows longer than 20 frames).
Using a conservative threshold, any trial with a loss window longer than 15 frames in any of its ROIs was excluded from further analysis. In addition, we required at least 70% valid data points per trial: if more than 30% of frames were NaNs, the trial was discarded. After trial and ROI rejection, the interpolation strategy described above was applied to fill in any remaining missing data.
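The gap-filling and rejection rules above can be sketched as follows for a single trial. This is an illustrative reading of the text, not the authors' code: `step_threshold` stands in for the unspecified "set threshold", the Akima interpolant is fitted to all valid frames with SciPy's `Akima1DInterpolator`, and gaps touching the start or end of a trial (which the text does not address) are filled with the nearest valid value.

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator

MAX_GAP = 15      # longest tolerated NaN run (frames)
MIN_VALID = 0.70  # minimum fraction of valid data points per trial
AKIMA_BELOW = 7   # gaps shorter than this many frames use Akima

def nan_runs(trace):
    """(start, stop) index pairs of contiguous NaN runs; stop is exclusive."""
    edges = np.diff(np.concatenate(([0], np.isnan(trace).astype(int), [0])))
    return list(zip(np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)))

def keep_trial(traces):
    """traces: (n_rois, n_frames). Apply the two trial-rejection criteria."""
    if np.mean(~np.isnan(traces)) < MIN_VALID:
        return False
    return all(stop - start <= MAX_GAP for roi in traces for start, stop in nan_runs(roi))

def fill_gaps(trace, step_threshold):
    """Fill NaN runs in one ROI trace using the Akima / step / linear rules."""
    out = trace.copy()
    t = np.arange(trace.size)
    valid = ~np.isnan(trace)
    akima = Akima1DInterpolator(t[valid], trace[valid])
    for start, stop in nan_runs(trace):
        gap = t[start:stop]
        if start == 0 or stop == trace.size:
            # Assumption: edge gaps take the nearest valid value
            out[gap] = trace[stop] if start == 0 else trace[start - 1]
        elif stop - start < AKIMA_BELOW:
            out[gap] = akima(gap)
        elif trace[start - 1] < trace[stop] and trace[stop] - trace[start - 1] > step_threshold:
            # Rising jump larger than the threshold: step interpolation
            out[gap] = trace[start - 1]
        else:
            out[gap] = np.interp(gap, t[valid], trace[valid])
    return out
```

In a full pipeline, `keep_trial` would be applied first and `fill_gaps` only to the ROI traces of retained trials.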

Bleaching correction

After extracting individual ROIs, we applied a bleaching correction, assuming that (1) bleaching follows a similar trend across trials for each individual ROI and (2) the activity measured during the baseline period is random. Under these assumptions, when the fluorescence of an ROI is averaged across trials, the only robust trend potentially present in the baseline is the characteristic bleaching of that ROI. We checked that the baseline fluorescence of a given bouton was comparable across trials within a session. For each ROI, we (1) averaged the fluorescence traces across all trials (aligned to stimulus presentation); (2) fitted an exponential decay function to the baseline period of this average trace; (3) extended the fitted function to the length of each trial; and (4) subtracted the fitted function from the ROI trace.
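The four-step bleaching correction can be sketched with SciPy's `curve_fit`. We assume a single exponential with an offset, a · exp(−t/τ) + c, and a baseline of `n_baseline` frames at the 16 Hz imaging rate; the exact functional form and fitting routine used in the study are not specified, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_decay(t, a, tau, c):
    """Assumed bleaching model: single exponential with an offset."""
    return a * np.exp(-t / tau) + c

def bleach_correct(traces, n_baseline, fs=16.0):
    """traces: (n_trials, n_frames) for one ROI, aligned to stimulus onset."""
    t = np.arange(traces.shape[1]) / fs
    mean_trace = np.nanmean(traces, axis=0)                       # (1) average across trials
    tb, yb = t[:n_baseline], mean_trace[:n_baseline]
    p0 = (yb[0] - yb[-1], max(tb[-1], 1.0 / fs), yb[-1])           # rough initial guess
    params, _ = curve_fit(exp_decay, tb, yb, p0=p0, maxfev=10000)  # (2) fit the baseline
    fitted = exp_decay(t, *params)                                # (3) extend to full trial
    return traces - fitted                                        # (4) subtract from each trial

# Synthetic check: 20 identical trials containing only bleaching, 6 s at 16 Hz
t_demo = np.arange(96) / 16.0
traces_demo = np.tile(2.0 * np.exp(-t_demo / 1.0) + 10.0, (20, 1))
corrected = bleach_correct(traces_demo, n_baseline=32)   # 2 s baseline (assumed)
```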

Normalization

For the decoding analysis, we normalized the data so as to use the same network across all datasets and ROIs, since multi-layer perceptrons (MLP) are sensitive to the range of the input values. For each ROI and trial, we z-scored the traces individually, using the signal baseline period to compute the necessary statistics (mean and standard deviation).
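Per-trace baseline z-scoring amounts to the following; the baseline length `n_baseline` and the use of the sample standard deviation (`ddof=1`) are assumptions, and `zscore_to_baseline` is a hypothetical name.

```python
import numpy as np

def zscore_to_baseline(traces, n_baseline):
    """Z-score each trace (last axis = time) with its own baseline mean and SD."""
    baseline = traces[..., :n_baseline]
    mu = baseline.mean(axis=-1, keepdims=True)
    sd = baseline.std(axis=-1, ddof=1, keepdims=True)
    return (traces - mu) / sd

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=(10, 30, 96))   # trials x ROIs x frames
z = zscore_to_baseline(x, n_baseline=32)
```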

Bouton responsiveness and classification

To evaluate the responsiveness of each bouton, we obtained the distribution of average z-score values from the baseline periods and used its 99th percentile as the criterion for deciding whether a bouton was responsive (Supplementary Fig. 4). The baseline reference distribution consisted of average z-score values, each computed over a six-frame interval extracted from the baseline period, accumulated across all trials. For each bouton and trial type, we also computed average z-score values over equivalent six-frame periods extracted from the start of cue delivery to the end of the response period (parsed into cue, delay, and response periods). For each bouton and trial type (Go/No-Go), we compared the trial-averaged response (parsed into cue, delay, and response periods) with the baseline reference distribution. If at least one of these values crossed the 99th or 1st percentile of the baseline distribution, the bouton was classified as responsive (enhanced or suppressed, respectively). If some of these values were below the 1st percentile and others above the 99th percentile of the baseline distribution, the response was classified as complex.
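A compact sketch of the responsiveness test for one bouton is given below. The frame indices of the baseline, cue, delay, and response periods are hypothetical placeholders (at 16 Hz, 0.35 s, 0.5 s, and 1.5 s correspond to roughly 6, 8, and 24 frames), and pooling the baseline windows of both trial types into one reference distribution is our reading of the text.

```python
import numpy as np

WIN = 6  # frames per averaging window

def window_means(z, start, stop, win=WIN):
    """Means of consecutive non-overlapping `win`-frame windows within z[..., start:stop]."""
    seg = z[..., start:stop]
    n = seg.shape[-1] // win
    return seg[..., : n * win].reshape(*seg.shape[:-1], n, win).mean(axis=-1)

def classify_bouton(z_go, z_nogo, n_baseline, epochs):
    """z_go, z_nogo: (n_trials, n_frames) z-scored traces of one bouton.
    epochs: (start, stop) frame ranges for the cue, delay, and response periods."""
    baseline = np.concatenate([window_means(z, 0, n_baseline).ravel() for z in (z_go, z_nogo)])
    lo, hi = np.percentile(baseline, [1, 99])
    # Trial-averaged responses, parsed by period, for each trial type
    resp = np.concatenate([window_means(z.mean(axis=0), s, e)
                           for z in (z_go, z_nogo) for s, e in epochs])
    up, down = bool(np.any(resp > hi)), bool(np.any(resp < lo))
    if up and down:
        return "complex"
    return "enhanced" if up else "suppressed" if down else "non-responsive"

EPOCHS = [(32, 38), (38, 46), (46, 70)]  # hypothetical cue, delay, response frames
```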

We analyzed how systematically changing the signal threshold alters the number and diversity of responsive boutons in both GCaMP and EGFP experiments. Across different signal thresholds, substantially more responsive boutons were identified in the GCaMP vs. EGFP imaging data (Supplementary Fig. 4d, e).

Clustering analysis

Each waveform was normalized by its maximum absolute value. We used the k-means clustering function in MATLAB (Euclidean distance). Cluster quality was assessed by calculating the average distance between the average waveforms assigned to each cluster (d). To determine the total number of clusters, we calculated the average d while varying the number of clusters from 2 to 100. The number of clusters was set by a cutoff threshold at which the average decrease in d plateaued.
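The normalization and cluster-number sweep can be sketched in NumPy as below. MATLAB's `kmeans` is replaced by a plain Lloyd's algorithm with random restarts, and d is taken here to be the mean Euclidean distance between each waveform and the average waveform of its cluster, which is one possible reading of the text; all names are hypothetical.

```python
import numpy as np

def normalize_waveforms(W):
    """Divide each waveform (row) by its largest absolute value."""
    return W / np.max(np.abs(W), axis=1, keepdims=True)

def _assign(X, centers):
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)

def mean_within_distance(X, labels, centers):
    """Assumed definition of d: mean distance of waveforms to their cluster average."""
    return float(np.mean(np.linalg.norm(X - centers[labels], axis=1)))

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means (Euclidean); returns (d, labels, centers) of the best restart."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = _assign(X, centers)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = _assign(X, centers)
        d = mean_within_distance(X, labels, centers)
        if best is None or d < best[0]:
            best = (d, labels, centers)
    return best

def sweep(X, k_values, seed=0):
    """d as a function of the number of clusters (the text sweeps k = 2 to 100)."""
    return np.array([kmeans(X, k, seed=seed)[0] for k in k_values])
```

Plotting `sweep(X, range(2, 101))` and picking the k at which d stops decreasing appreciably mirrors the plateau criterion described above.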

Inter-block correlation analysis

To determine whether bouton response patterns changed with each rule reversal, we used, for each field of view, the average z-scored fluorescence values aggregated across the delay period for both odor and sound cue trials (Fig. 2e). For each cue, we assembled an average response vector for the 1st or 2nd block of trials within a session by randomly picking half of the trials of that block (reference vectors). We then calculated the Pearson correlation between these reference vectors and each column (trial) of the delay-period response matrix for the odor and sound trials, respectively. For each session, we repeated the random selection of trials used to generate the reference vectors one hundred times. An example correlation trace (mean ± SEM) is shown at the bottom of Fig. 2e. We further calculated an inter-block correlation by obtaining the mean ± SEM of the Pearson coefficient values for each of the Odor Go, Odor No-Go, Sound Go, and Sound No-Go blocks for each session and mouse (Fig. 2g, h; Supplementary Fig. 7f).
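The bootstrap for one cue and one field of view can be sketched as follows. Computing the SEM across the 100 bootstrap repeats is one possible way to obtain the error band and is our assumption, as is the function name.

```python
import numpy as np

def interblock_correlation(R, ref_trials, n_boot=100, seed=0):
    """R: (n_boutons, n_trials) delay-period mean z-scores for one cue (one FOV).
    ref_trials: indices of that cue's trials in the reference block (1st or 2nd).
    Returns mean and SEM, across bootstrap repeats, of the Pearson r between a
    reference vector (mean over a random half of ref_trials) and every trial."""
    rng = np.random.default_rng(seed)
    ref_trials = np.asarray(ref_trials)
    Rc = R - R.mean(axis=0)                      # center each trial (column)
    r = np.empty((n_boot, R.shape[1]))
    for b in range(n_boot):
        half = rng.choice(ref_trials, size=ref_trials.size // 2, replace=False)
        ref = R[:, half].mean(axis=1)
        refc = ref - ref.mean()
        # Pearson r between the reference vector and each column of R
        r[b] = refc @ Rc / (np.linalg.norm(refc) * np.linalg.norm(Rc, axis=0))
    return r.mean(axis=0), r.std(axis=0, ddof=1) / np.sqrt(n_boot)

# Synthetic example: two 45-trial blocks with different population patterns
rng = np.random.default_rng(0)
p1, p2 = rng.normal(size=200), rng.normal(size=200)
R = np.column_stack([p1 + 0.3 * rng.normal(size=200) for _ in range(45)]
                    + [p2 + 0.3 * rng.normal(size=200) for _ in range(45)])
mean_r, sem_r = interblock_correlation(R, np.arange(45))
```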

Bouton stability

To evaluate how stable individual bouton responses were across trials with different stimulus–reward contingencies, we calculated the distributions of pairwise Pearson correlation values within (e.g., Hits–Hits) and across (e.g., Hits–False Alarms, Hits–Correct Rejections) reward contingency conditions (Fig. 3a, b; Supplementary Fig. 15c). Using bootstrapping, in each iteration we picked two random subsets of trials, extracting for each trial the z-scored fluorescence trace between cue onset and the end of the delay period. We averaged the traces across the trials of each subset and calculated the Pearson correlation between the two resulting average time traces. We repeated this procedure 100 times, picking a different set of random trials each time. We used one-third of the total number of False Alarm trials (the least frequent outcome evaluated) as the subset size, to enable comparison with the Hit and Correct Rejection trials, which occurred more frequently, as expected given the high behavioral performance of the expert mice (>80% correct trials). Miss trials were excluded from this analysis due to their infrequency (<3%). Bouton responses were classified as stable if the correlation across conditions was within 90% of the inter-trial variability (correlation) across the Hit trials used as reference.
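A sketch of the subset-correlation bootstrap for one bouton follows. Drawing the two within-condition subsets without overlap, and implementing the stability rule as "median across-condition r ≥ 0.9 × median Hits–Hits r", are our assumptions about details the text leaves open; the names are hypothetical.

```python
import numpy as np

def subset_corr(a, b, n_sub, n_boot=100, seed=0):
    """a, b: (n_trials, n_frames) z-scored traces (cue onset to end of delay) of one
    bouton in two conditions; pass the same array twice for a within-condition test.
    Returns n_boot Pearson r values between averages of random trial subsets."""
    rng = np.random.default_rng(seed)
    r = np.empty(n_boot)
    for i in range(n_boot):
        if a is b:
            idx = rng.permutation(len(a))         # two disjoint subsets from one pool
            ia, ib = idx[:n_sub], idx[n_sub:2 * n_sub]
        else:
            ia = rng.choice(len(a), size=n_sub, replace=False)
            ib = rng.choice(len(b), size=n_sub, replace=False)
        r[i] = np.corrcoef(a[ia].mean(axis=0), b[ib].mean(axis=0))[0, 1]
    return r

def is_stable(r_within, r_across, frac=0.9):
    """Assumed stability rule relative to the Hits-Hits reference distribution."""
    return bool(np.median(r_across) >= frac * np.median(r_within))

# Synthetic bouton: same response shape in Hit and False Alarm trials
rng = np.random.default_rng(0)
template = 2.0 * np.sin(np.linspace(0, np.pi, 14))
hits = template + 0.2 * rng.normal(size=(60, 14))
fas = template + 0.2 * rng.normal(size=(12, 14))
n_sub = len(fas) // 3                             # one-third of the False Alarm trials
r_hh = subset_corr(hits, hits, n_sub)
```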

PCA visualization

Extracted ROI time courses were assembled into a data cube (N × S × T) of trial-averaged ΔF/F0 responses, where N is the total number of cortical feedback boutons included, S is the total number of stimuli (reward-contingency trial types: H, CR, FA), and T is the total number of time bins. To reduce the dimensionality of the neuronal population activity, this data cube was reshaped into a data matrix (N × ST) and normalized (z-scored), such that each stimulus at each time point represents a point in an N-dimensional neural state space. To find a set of orthogonal directions that maximize the variance captured from the data, we performed principal component analysis (PCA) and identified the eigenvectors of the associated covariance matrix. PCA was performed using the built-in ‘princomp’ function in MATLAB. Data projected onto the first three principal components (PCs) are plotted in Fig. 3c. The variance explained by each PC is given by the ratio of its eigenvalue to the sum of all eigenvalues. Miss trials were excluded from this analysis due to their low frequency (<3%). On average across fields of view, the first three PCs explained 66.1 ± 3.3% of the signal variance for odor trials and 67.1 ± 2.8% for sound trials.
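The reshaping, z-scoring, and projection steps can be written with NumPy's SVD, which yields the same principal axes as an eigendecomposition of the covariance matrix. Z-scoring each bouton across all ST columns is our assumption about the normalization axis, and `pca_trajectories` is a hypothetical name.

```python
import numpy as np

def pca_trajectories(cube, n_pcs=3):
    """cube: (N boutons, S trial types, T time bins) of trial-averaged dF/F0.
    Returns projections with shape (S, T, n_pcs) and the fraction of variance
    explained by each of the first n_pcs components."""
    N, S, T = cube.shape
    X = cube.reshape(N, S * T)                                  # N x ST data matrix
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    pts = X.T                                                   # ST points in N-dim space
    pts = pts - pts.mean(axis=0)                                # center each dimension
    _, s, Vt = np.linalg.svd(pts, full_matrices=False)
    eigvals = s ** 2 / (pts.shape[0] - 1)                       # covariance eigenvalues
    explained = eigvals / eigvals.sum()
    proj = pts @ Vt[:n_pcs].T
    return proj.reshape(S, T, n_pcs), explained[:n_pcs]

# Synthetic data with two latent dimensions
rng = np.random.default_rng(0)
N, S, T = 40, 3, 50
latent = rng.normal(size=(2, S * T))
cube = (rng.normal(size=(N, 2)) @ latent + 1e-3 * rng.normal(size=(N, S * T))).reshape(N, S, T)
proj, expl = pca_trajectories(cube)
```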

Machine learning-based classifying task parameters and animal behavior

We used multi-layer perceptrons (MLPs)160 to predict various stimulus and behavioral features associated with the task across trials. We normalized the datasets and sliced them into non-overlapping windows of 4 time samples. Each window was then reshaped into an input vector of length 4 × (number of ROIs), which was given as input to the MLP. The data were split into disjoint test (33.3%) and training (66.7%) subsets. For each classification problem (stimulus, reward contingency, instruction, and behavioral outcome), we ran 10 repetitions, shuffling and re-splitting the data into training and test sets each time, to control for variability in trial quality.
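The windowing and train/test split can be sketched as follows. Splitting at the level of trials (so that windows from one trial never appear in both sets) is our assumption, as are the array layout and the helper names.

```python
import numpy as np

def make_windows(data, win=4):
    """data: (n_trials, n_rois, n_frames) normalized traces. Returns
    (n_trials, n_windows, win * n_rois): one flattened input vector per window."""
    n_trials, n_rois, n_frames = data.shape
    n_win = n_frames // win
    x = data[:, :, : n_win * win].reshape(n_trials, n_rois, n_win, win)
    return x.transpose(0, 2, 3, 1).reshape(n_trials, n_win, win * n_rois)

def split_trials(n_trials, test_frac=1 / 3, seed=0):
    """Random disjoint train/test trial indices (one of the 10 repetitions)."""
    idx = np.random.default_rng(seed).permutation(n_trials)
    n_test = int(round(test_frac * n_trials))
    return idx[n_test:], idx[:n_test]

data = np.arange(2 * 3 * 9, dtype=float).reshape(2, 3, 9)
X = make_windows(data)            # shape (2, 2, 12); the 9th frame is dropped
train, test = split_trials(30)    # 20 training and 10 test trials
```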

The MLP consisted of an input layer of the same size as the input vectors, two hidden layers of 10,000 and 1000 units, respectively, and an output layer whose size depended on the feature to be classified. The network weights were initialized using the Xavier uniform distribution161 (Eq. 2), in which the weights of a layer (Wij) are drawn from a uniform distribution (U) centered on 0, with a range that depends on the number of nodes in the previous layer (n).

$${W}_{{ij}}\sim U\left[-\frac{1}{\sqrt{n}},\frac{1}{\sqrt{n}}\right]$$
(2)
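Eq. 2 can be implemented directly as below. Note that the uniform range in Eq. 2 depends only on the number of input nodes n, whereas the "Xavier uniform" initializers in common deep-learning libraries (for example PyTorch's `torch.nn.init.xavier_uniform_`) use the bound √(6/(fan_in + fan_out)). The sketch follows Eq. 2 as written; zero biases and the helper names are assumptions.

```python
import numpy as np

def init_layer(n_in, n_out, rng):
    """Weights ~ U[-1/sqrt(n_in), 1/sqrt(n_in)] as in Eq. 2; zero biases (assumed)."""
    bound = 1.0 / np.sqrt(n_in)
    return rng.uniform(-bound, bound, size=(n_in, n_out)), np.zeros(n_out)

def init_mlp(n_input, n_classes, hidden=(10_000, 1000), seed=0):
    """Parameters for input -> hidden layers -> output, sized as in the text."""
    rng = np.random.default_rng(seed)
    sizes = (n_input, *hidden, n_classes)
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

params = init_mlp(4 * 30, 2, hidden=(50, 20))   # small hidden layers for illustration
```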

Each layer used a Soft++ activation function (k = 1, c = 2; Eq. 3), except for the last layer, where Softmax normalization (Eq. 4) was used to return a probability distribution over the N classes. For the two hidden layers, we also applied a dropout rate of 50% to limit overfitting.

$${Soft}++ \left(x\right)={{{\mathrm{ln}}}}\left(1+{e}^{k{{\cdot }}x}\right)+\frac{x}{c}-{{\mathrm{ln}}}\left(2\right)$$
(3)
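Eq. 3 can be evaluated in a numerically stable way with NumPy's `logaddexp`, which avoids overflowing e^(kx) for large inputs. With k = 1 and c = 2, Soft++ passes through the origin with unit slope there (k/2 + 1/c = 1), behaves like (k + 1/c)x for large positive x, and like x/c for large negative x (up to the −ln 2 offset). The function name is hypothetical.

```python
import numpy as np

def soft_pp(x, k=1.0, c=2.0):
    """Soft++(x) = ln(1 + e^(k*x)) + x/c - ln 2 (Eq. 3), with k = 1 and c = 2."""
    # np.logaddexp(0, k*x) = ln(e^0 + e^(k*x)), computed without overflow
    return np.logaddexp(0.0, k * x) + x / c - np.log(2.0)

x = np.array([-100.0, 0.0, 100.0])
y = soft_pp(x)   # approximately [-50.693, 0.0, 149.307]
```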

Softmax (z: vector to be normalized; N: number of classes):

$${softmax}{(\vec{z})}_{i}=\,\frac{{e}^{\,{z}_{i}}}{{\sum }_{j=1}^{N}{e}^{\,{z}_{j}}}$$
(4)

The MLP was optimized using the Adam algorithm162, with the cross entropy (Eq. 5) between the Softmax output (p) and the one-hot encoding of the label (y) as the loss function. Cross-entropy loss (N: number of classes; y: binary indicator (0 or 1) of whether class label c is the correct classification for observation o; p: predicted probability that observation o belongs to class c):

$$-{\sum}_{c=1}^{N}{y}_{o,c}{{{\mathrm{log}}}}({p}_{o,c})$$
(5)
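Eqs. 4 and 5 translate into a few lines of NumPy. Subtracting each row's maximum in `softmax` leaves the output unchanged (Eq. 4 is invariant to adding a constant to all z_i) while avoiding overflow; the small `eps` guarding log(0) and averaging the loss over a batch are our additions.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax (Eq. 4); subtracting each row's max prevents overflow."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, y_onehot, eps=1e-12):
    """Eq. 5 per observation, averaged over the batch (averaging is assumed)."""
    return float(np.mean(-np.sum(y_onehot * np.log(p + eps), axis=-1)))

logits = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])
labels = np.eye(3)[[0, 2]]          # one-hot encodings of classes 0 and 2
loss = cross_entropy(softmax(logits), labels)
```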

The MLP was trained for 50 epochs (iterations through the dataset) with a batch size (number of samples per step) of 25 and a fixed learning rate (weight modification rate) of 0.001, rather than a variable learning rate that changes across epochs according to a schedule.

We ran three control analyses: shuffle label, shuffle channel, and EGFP control. For the shuffle-label control, we shuffled the trial indices (labels), destroying correlations between the data and the associated reward contingency. For the shuffle-channel control, we shuffled the ROI fluorescence values at each time point before feeding the data into the MLP. This destroys ROI identity information while leaving global activity over time intact. Finally, we ran the same analyses on EGFP data to measure how much of the information in the data comes from movement-related artifacts. Because EGFP fluorescence should not vary as a function of neuronal activity, any measured changes in green fluorescence are likely due to motion-related artifacts, blood-vessel occlusion, intrinsic signal changes, etc.
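The two shuffle controls can be sketched as below. Drawing an independent ROI permutation for every trial and time point (rather than one permutation per time point shared across trials) is our assumption; `Generator.permuted` requires NumPy 1.20 or later, and the helper names are hypothetical.

```python
import numpy as np

def shuffle_labels(labels, rng):
    """Shuffle-label control: permute labels across trials."""
    return rng.permutation(labels)

def shuffle_channels(data, rng):
    """Shuffle-channel control. data: (n_trials, n_rois, n_frames). Values are
    permuted across ROIs independently for every trial and time point, so ROI
    identity is lost while the set of values at each time point is preserved."""
    return rng.permuted(data, axis=1)

rng = np.random.default_rng(0)
data = rng.normal(size=(5, 30, 96))
labels = np.repeat([0, 1], [3, 2])
shuffled = shuffle_channels(data, rng)
```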

The same algorithms were used to decode stimulus identity in the auditory-only Go/No-Go task and the auditory-only habituation sessions.

Machine learning-based stimulus classification within and across blocks

Using the same procedure outlined above, we also ran a series of tests to measure how stimulus representations change across blocks. For the same-block analysis, we selected only trials from one block type at a time (e.g., Sound Go blocks) and split them in the same proportions as above. For the across-block analyses, we used all trials, but instead of splitting them randomly, we placed all trials from one block type in the test set and all trials from the other block type in the training set.
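The same-block and across-block splits reduce to index selection on per-trial block labels; the label strings and function names below are hypothetical.

```python
import numpy as np

def same_block_split(block_type, block, test_frac=1 / 3, seed=0):
    """Random train/test split restricted to trials from one block type."""
    idx = np.flatnonzero(np.asarray(block_type) == block)
    idx = np.random.default_rng(seed).permutation(idx)
    n_test = int(round(test_frac * idx.size))
    return idx[n_test:], idx[:n_test]

def across_block_split(block_type, train_block, test_block):
    """All trials of `train_block` for training, all of `test_block` for testing."""
    block_type = np.asarray(block_type)
    return np.flatnonzero(block_type == train_block), np.flatnonzero(block_type == test_block)

blocks = ["OdorGo"] * 45 + ["SoundGo"] * 45 + ["OdorGo"] * 45
train, test = across_block_split(blocks, "OdorGo", "SoundGo")   # 90 train, 45 test
```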

Trajectory-based Kohonen mapping

Self-organizing maps are a useful tool for visualizing patterns in multidimensional data, as they reduce patterns across features to a color representation. Effectively, the Kohonen mapping algorithm creates a translation key (through the model space) between a fixed color space and the data space163,164. The algorithm passes sequentially through the data samples multiple times, finds the closest match in the model space for each sample, and adjusts that model, as well as models close to it in color space, so that they more closely resemble the sample. The color space is fixed and three-dimensional (red, green, blue), and each model is assigned a location in this color space that does not change. Distances between models are computed in the color space, whereas distances between samples and models are computed in the model space. The algorithm has two parameters: the learning rate (which controls how large the change to the models is at each step) and the standard deviation of a three-dimensional Gaussian kernel (which determines how many models are modified at each step and how large the modification applied to them is). We changed these parameters at each step such that, at the beginning of the algorithm, many models are altered at each step (large neighborhood radius, R; Eq. 7) with a large learning rate (L; Eq. 6). This allows the algorithm to quickly make a rough estimate of the data space. Through further iterations, the radius shrinks and the changes become smaller, allowing the algorithm to fine-tune the models. We seeded the algorithm by selecting the 10% of samples with the lowest total activation and setting them to black. This biases the algorithm such that samples with lower activations have darker colors and higher activations are associated with brighter colors.

The change rate and the radius of the neighborhood were given by two monotonically decreasing functions, L(k) and R(k), respectively:

$$L(k)={L}_{0}\cdot {e}^{-k\frac{{{\mathrm{ln}}}\frac{{L}_{0}}{{L}_{{{{\rm{M}}}}}}}{M}}$$
(6)

where L(k) is the learning rate, modulating the degree to which model vectors were changed at each training step, k. L0 and LM are the initial and final learning rates. We used L0 = 1 and LM = 0.01. The total number of training steps is denoted by M.

$$R(k)={{\mbox{round}}}({R}_{0}\cdot {e}^{-k\cdot \frac{{{\mathrm{ln}}}({R}_{0}/0.5)}{(g/100)\cdot M}})$$
(7)

where round denotes rounding to the nearest integer and R(k) specifies the neighborhood size. R0 is the initial radius of the neighborhood, and g is the percentage of M after which R becomes 0. We used R0 = N/2 and g = 66 (66% of the steps were used to establish the topology of the map and the remaining 34% to fine-tune the representation of activity vectors in the map, during which only the best matching unit, BMU, was changed). Within the neighborhood defined above, model vectors farther from the BMU were changed less than those closer to it, by multiplying the learning rate by a 3D Gaussian envelope with an SD of R(k)/3 (Eq. 8):

$$\begin{aligned}\mathrm{MV}_{k}[x,y,z]={}&\mathrm{MV}_{k-1}[x,y,z]\\&+\left(\mathrm{AV}_{k}-\mathrm{MV}_{k-1}[x,y,z]\right)\cdot L(k)\cdot e^{-\frac{(x_{\mathrm{BMU}}-x)^{2}+(y_{\mathrm{BMU}}-y)^{2}+(z_{\mathrm{BMU}}-z)^{2}}{2\cdot[R(k)/3]^{2}}}\end{aligned}$$
(8)

where MVk[x,y,z] is a model vector, at step k of the training, located within the neighborhood of the BMU [distance from BMU ≤ R(k)] at position (x,y,z) in the 3D lattice. (xBMU, yBMU, zBMU) denotes the position of the BMU in the 3D lattice. AVk is the activity vector that is learned at step k, and L(k) and R(k) are the learning rate and the size of the neighborhood at step k.
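A compact NumPy sketch of the training loop defined by Eqs. 6–8 is shown below. Several details are assumptions: N in R0 = N/2 is taken as the number of nodes per side of the cubic lattice, model vectors are initialized uniformly at random within the data range, the dark-color seeding step is omitted, the Gaussian envelope is restricted to lattice nodes within R(k) of the BMU, and a sample's RGB color is taken as its BMU's lattice coordinates scaled to [0, 1]. Rounding is done half-up (`np.floor(x + 0.5)`), because NumPy's `np.round` rounds halves to the nearest even integer. All names are hypothetical.

```python
import numpy as np

def learning_rate(k, M, L0=1.0, LM=0.01):
    return L0 * np.exp(-k * np.log(L0 / LM) / M)                                   # Eq. 6

def radius(k, M, R0, g=66):
    return int(np.floor(R0 * np.exp(-k * np.log(R0 / 0.5) / ((g / 100) * M)) + 0.5))  # Eq. 7

def train_som(data, n_side=4, n_epochs=5, g=66, seed=0):
    """data: (n_samples, n_features). Returns models of shape (n_side,)*3 + (n_features,)."""
    rng = np.random.default_rng(seed)
    grid = np.stack(np.meshgrid(*[np.arange(n_side)] * 3, indexing="ij"), axis=-1)
    models = rng.uniform(data.min(), data.max(), size=(n_side,) * 3 + (data.shape[1],))
    M, R0 = n_epochs * len(data), n_side / 2
    k = 0
    for _ in range(n_epochs):
        for av in data[rng.permutation(len(data))]:
            L, R = learning_rate(k, M), radius(k, M, R0, g)
            # Best-matching unit: closest model vector in model (data) space
            dist = np.linalg.norm(models - av, axis=-1)
            bmu = np.array(np.unravel_index(np.argmin(dist), dist.shape))
            d2 = np.sum((grid - bmu) ** 2, axis=-1)          # squared lattice distance
            if R == 0:
                h = (d2 == 0).astype(float)                  # update the BMU only
            else:
                h = np.exp(-d2 / (2 * (R / 3) ** 2)) * (d2 <= R ** 2)
            models += (av - models) * (L * h)[..., None]     # Eq. 8
            k += 1
    return models

def sample_colors(data, models):
    """RGB color of each sample = lattice coordinates of its BMU, scaled to [0, 1]."""
    flat = models.reshape(-1, models.shape[-1])
    bmu = np.argmin(np.linalg.norm(data[:, None, :] - flat[None], axis=-1), axis=1)
    coords = np.stack(np.unravel_index(bmu, models.shape[:3]), axis=-1)
    return coords / (models.shape[0] - 1)
```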

Neuronal distance across blocks

We calculated a neural distance (1 − Pearson correlation) between the bouton ensemble response trajectory of the preceding block (averaged over the responses to the last five trials of that block) and the ensemble trajectory of each trial of the current block for the same stimulus, for both odor and sound trials. This metric increased systematically across trials and matched the increase in behavioral performance in the new block after rule reversal (Fig. 5c–e).
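The distance metric can be sketched as below; flattening each trial's bouton × time-bin trajectory into one vector before correlating is our assumption, and the function name is hypothetical.

```python
import numpy as np

def neural_distance(prev_block, current_block, n_last=5):
    """prev_block, current_block: (n_trials, n_features) ensemble trajectories
    (boutons x time bins, flattened) for the same stimulus. Returns 1 - Pearson r
    between the mean of the last n_last trials of the preceding block and each
    trial of the current block."""
    ref = prev_block[-n_last:].mean(axis=0)
    return np.array([1.0 - np.corrcoef(ref, trial)[0, 1] for trial in current_block])

rng = np.random.default_rng(0)
prev = np.tile(rng.normal(size=100), (10, 1))
d = neural_distance(prev, np.vstack([prev[0], -prev[0]]))   # identical -> 0, inverted -> 2
```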

Statistical tests

Depending on the properties of the analyzed data, different statistical tests were used to evaluate the differences between data groups. The information for each statistical comparison is detailed in the legends of each figure panel and in Supplementary Tables 1 and 2.

‘Two-sample unpaired Student’s t-test’ was used to evaluate significant differences in the means of two normally distributed data sets. ‘One-way analysis of variance’ (one-way ANOVA) was used to evaluate significant differences in the means of three or more normally distributed data sets. The normality of each data group was checked using a ‘One-sample Kolmogorov–Smirnov test’ before testing for differences with the ‘Student’s t-test’ or ‘one-way ANOVA.’ A two-sided 95% confidence interval was used for both Student’s t-test and one-way ANOVA comparisons. For ‘one-way ANOVA’ comparisons, differences between individual data sets were identified post hoc using a ‘Multiple comparisons of means test.’

Significant differences between two non-normally distributed data sets were evaluated with a nonparametric ‘Wilcoxon rank-sum test’ with a one-tailed 95% confidence interval.

The linear correlation between two data sets was assessed by computing the Pearson correlation coefficient. When needed, the coefficient of determination (R²) was used to evaluate how well the compared data sets fit a linear regression model.

All statistical comparisons were performed in MATLAB® using the following functions: ‘Two-sample unpaired Student’s t-test’: ttest2; ‘one-way ANOVA’: anova1; ‘Multiple comparisons of means test’: multcompare; ‘Pearson correlation coefficient’: corrcoef; ‘Wilcoxon rank-sum test’: ranksum.
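For readers working in Python, the decision rule above maps onto SciPy roughly as follows. This is an approximate translation, not the analysis code: the direction of the one-tailed rank-sum test (`alternative="greater"`) is an assumption; standardizing before the Kolmogorov–Smirnov test (MATLAB's `kstest` compares against a standard normal) makes the test conservative because the parameters are estimated from the data; and `scipy.stats.tukey_hsd`, available in recent SciPy releases, stands in for MATLAB's `multcompare`, whose default correction is Tukey–Kramer. The helper names are hypothetical.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Normality check, then a two-sided unpaired t-test or a one-tailed rank-sum test."""
    # Standardize before testing against a standard normal distribution
    normal = all(stats.kstest(stats.zscore(x, ddof=1), "norm").pvalue > alpha for x in (a, b))
    if normal:
        return "t-test", stats.ttest_ind(a, b).pvalue                  # cf. MATLAB ttest2
    return "rank-sum", stats.ranksums(a, b, alternative="greater").pvalue  # cf. ranksum

def compare_many_groups(*groups):
    """One-way ANOVA (cf. anova1) followed by Tukey's HSD post hoc comparisons."""
    return stats.f_oneway(*groups).pvalue, stats.tukey_hsd(*groups).pvalue

rng = np.random.default_rng(0)
a, b, c = rng.normal(2, 1, 60), rng.normal(0, 1, 60), rng.normal(0, 1, 60)
method, p = compare_two_groups(a, b)
p_anova, p_pairs = compare_many_groups(a, b, c)
r, p_r = stats.pearsonr(a, b)                                           # cf. corrcoef
```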

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.