Abstract
How brain circuits are organized to skillfully produce learned sequences of behaviours is still poorly understood. Here we functionally examined how the cortical song premotor region HVC, which is necessary for zebra finch song1, controls the sequential production of learned song syllables. We found that HVC could generate the complete sequence of learned song syllables independently of its main synaptic input pathways. Thalamic input to HVC was needed for song initiation, but it was not required for transitions between syllables or for song completion. We showed that excitation of HVC neurons during song reliably caused vocalizations to skip back to the beginning of the song, in a manner reminiscent of a skipping record. This restarting of syllable sequences could be induced at any moment of the song and relied on local circuits within HVC. We identified and computationally modelled a synaptic network, including intratelencephalic premotor and corticostriatal neurons within HVC that are essential for completing song syllable sequences. Together, our results show that the learned zebra finch song is controlled by a cortical sequence-generating network in HVC that, once started, can sustain production of all song syllables independent of major extrinsic input pathways. Thus, sequential neuronal activity can be organized to fuse well-learned vocal motor sequences, ultimately achieving holistic control of this naturally learned behaviour.
Main
Motor behaviours are considered to be learned by splitting and chunking smaller behavioural units into sequences of neural activity and then concatenating the sequences into a unified premotor plan that supports the fluid production of the entire behaviour2,3,4,5,6,7. Although there is evidence for chunking3,6, identifying the neural origin of unified premotor programs has remained challenging. The control of learned birdsong provides a tractable model for searching for such premotor programs. Songbirds are among the few groups of animals, apart from humans, that learn their vocalizations through imitation. Moreover, birdsong is controlled by dedicated forebrain circuits8.
Zebra finches learn a single courtship song motif. They engage in extensive daily practice to maintain expert performance of this song. Sparse sequential neuronal activity in the pallial song nucleus HVC probably underlies the production of zebra finch song9,10,11,12,13. However, how neural sequences in HVC contribute to the progression of the song motif is still not well understood. Several lines of evidence support the idea that song control may involve reciprocal loops spanning the brainstem, thalamus and pallium14,15,16,17 (Extended Data Fig. 1a), whereas other studies suggest that HVC may be capable of generating neural sequences for song production more autonomously10,11,18,19.
There are various models of how adult songs are controlled: (1) sequential activity in HVC can sustain progression through all song syllables independently of instructive afferent inputs18,19,20; (2) input pathways link shorter neural sequences at syllable or other vocal parameter boundaries14,17,21,22; and (3) HVC sequences are continuously updated by instructive afferent input15,16. Research in this area has relied on correlations between song and electrophysiological recordings or on non-selective circuit manipulations, including electrical stimulation, cooling of brain regions and electrolytic lesioning. Here we combine a series of cell-type, circuit and pathway manipulations with synaptic mapping and computational modelling to causally examine how neural sequences contribute to completing the song motif. This study reveals that, barring a permissive thalamic input important for song initiation, HVC can independently propagate activity for production of all song syllables in the motif, and that this network relies on two synaptically interconnected classes of HVC projection neurons.
Optogenetic restarting of song
Electrical stimulation of HVC has varied effects on song production, including distortion of syllable acoustic features, truncation of song and occasional restarting of song soon after song truncation23,24. However, these studies are difficult to interpret because stimulation cannot be restricted to specific cell types or to cells within a small spatial volume, the stimulated population of neurons is highly dependent on electrode placement and there is inevitable antidromic and orthodromic activation of neurons and passing axons25.
Instead, we selectively controlled HVC activity using viral expression of the excitatory opsin ChRmine (n = 6 birds; Fig. 1a and Extended Data Fig. 1b). This provided experimental control over a population of HVC neurons composed of approximately 20% inhibitory neurons and 80% principal neurons, with a bias towards HVCX projection neurons26. Birds were implanted with fibre optics over HVC, and syllable detection software was used to perform closed-loop optogenetic manipulations while the birds were freely singing. Light stimulation reliably caused song truncation, seen as a rapid decrease in sound amplitude and disruption in syllable acoustic features (stimulation outcome probability: 86.8 ± 3.6% truncation and 10.2 ± 3.7% pause + continuation; latency to silence from onset of stimulation: 66.6 ± 4.1 ms; average ± s.e.m. in six birds; Fig. 1b–g and Extended Data Fig. 1c–g). Truncation was followed by the rapid restarting of the song motif (median, 135.8 ± 25.8 ms; lowest quartile, 87.6 ± 15.5 ms; Fig. 1b–d,h–j and Extended Data Fig. 1c). The birds restarted their song from the beginning, with one or two introductory notes followed by the motif or directly back to the first syllable of the motif, and this resetting behaviour occurred with high probability, independent of when in the song the optogenetic stimulation was triggered (Fig. 1h; all post-truncation trials reported in Extended Data Fig. 1h). When normalized by the likelihood of the birds to chain multiple motifs in series, the probability that a stimulated motif was immediately followed by another motif was 108.6 ± 4.9%, suggesting that the optogenetic perturbation caused the song to restart from the beginning of the motif without prematurely ending the song bout (Fig. 1i and Extended Data Fig. 1i).
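The normalization described above can be illustrated with a minimal sketch. It assumes that the normalized value is the ratio of two per-bird probabilities: the fraction of stimulated motifs immediately followed by another motif, divided by the fraction of unstimulated motifs followed by another motif. The exact definition is given in the Methods and may differ. The function name and the trial counts are hypothetical and are not data from the study.

```python
def normalized_restart_probability(stim_followed, stim_total,
                                   ctrl_followed, ctrl_total):
    """Return P(next motif | stimulated) / P(next motif | unstimulated), in percent.

    Assumed definition for illustration only; see the Methods for the real one.
    """
    if stim_total <= 0 or ctrl_total <= 0:
        raise ValueError("trial counts must be positive")
    p_stim = stim_followed / stim_total   # restart probability after stimulation
    p_ctrl = ctrl_followed / ctrl_total   # baseline motif-chaining probability
    if p_ctrl == 0:
        raise ValueError("baseline chaining probability is zero; ratio undefined")
    return 100.0 * p_stim / p_ctrl


# Hypothetical counts for a single bird (not taken from the paper).
example = normalized_restart_probability(stim_followed=78, stim_total=100,
                                         ctrl_followed=144, ctrl_total=200)
print(f"{example:.1f}%")  # 108.3%
```

Under this assumed definition, a value near or above 100% means that a stimulated motif is followed by another motif about as often as an unstimulated one, which is how the reported 108.6 ± 4.9% is interpreted in the text.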
a, Schematic of closed-loop song-contingent light stimulation of HVC; sample image of HVC ChRmine-expressing neurons. b, Spectrograms (0–11 kHz) of normal song (top) and stimulated song (bottom). Horizontal lines identify song element boundaries, introductory notes ‘i’ followed by syllables (A, B and C) composing the motif. Light stimulation (red bars: 10-ms light) causes motif truncation (blue dashed lines overlaying letters; dashed contour represents the missing portion of the truncated syllable; motif truncation represented by the line being truncated at an angle). Orange dots indicate restart. c, Stacked control (top) and stimulated (bottom) song amplitude plots ordered by latency of stimulation onset (red line; arrow). d, Latency to stimulation (red), motif truncation (blue) and identity of resumed vocalization within 1 s following stimulation. e, Box plots (5th–95th percentile; 25th, 50th and 75th percentiles) showing the outcome of optogenetic stimulation (average probability, n = 6 birds). f, Average latency ± s.e.m. to motif truncation in response to stimulation delivered across the motif (bins, 10%; motif advancement, n = 6 birds). g, Box plots showing truncation latencies computed across all trials, per bird (n = 6). h, Probability (average ± s.e.m.) of post-truncation vocalization resumption by category upon stimulation delivered throughout the motif (bins: 10% motif advancement). i, Normalized probability of post-truncation motif restart, per bird (Methods). j, As in g for motif restart latency, per bird (n = 6). k, Subsyringeal pressure recordings (dotted line indicates ambient pressure; deviations above indicate expiration and deviations below indicate inspiration) aligned at the onset of stimulation (red bar, 50 ms; top, unstimulated trace; bottom, 34 motif traces; grey bar highlights the corresponding point in the unstimulated motif waveform). 
l, As in k, stimulation during quiet respiration or calls (top, sample traces; bottom, 56 traces aligned at the stimulation onset). m, Schematic of the two proposed possible scenarios; song progression is either controlled through extrinsic updates (top, red arrows) to HVC activity or controlled more autonomously by HVC (bottom, red arrow). Scale bars, 200 µm (a), 200 ms (b,k,l), 0.5 a.u. (k,l). Brain outline in a adapted with permission from ref. 60, Wiley.
To better understand how our circuit manipulations affect the motor control of the song, we recorded subsyringeal air sac pressure during optogenetic stimulations. We found that optogenetic stimulation applied during quiet respiration neither induced vocalization nor altered respiratory patterns in the birds. By contrast, stimulation during singing caused rapid cessation of expiration during ongoing syllables (Fig. 1k,l and Extended Data Fig. 1j). Syllable truncations resulted from significant respiratory pressure deviations within 36.4 ± 4.0 ms of light onset, approximately 30 ms before vocalizations were acoustically truncated, consistent with previous studies15,27 (Extended Data Fig. 1k). Finally, we found that optogenetic stimulation trials in which birds did not quickly restart singing could be the result of apnoea. Thus, in some cases, optogenetic activation suppressed involuntary respiration, which effectively blocked the reinitiation of song (apnoea duration: 588.2 ± 216.8 ms; Extended Data Fig. 1l).
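One way to estimate when a stimulated pressure trace departs from the normal motif waveform is sketched below. It assumes a simple criterion: the first sample at which the stimulated trace leaves a band of the control mean ± a multiple of the control standard deviation. This criterion, the function name, the threshold and the synthetic traces are all illustrative assumptions and not the statistical procedure used in the study. The final line only reproduces the arithmetic in the text (acoustic truncation latency minus respiratory deviation latency).

```python
from statistics import mean, stdev


def deviation_latency_ms(control_trials, stim_trace, dt_ms, n_sd=3.0):
    """Time (ms after light onset) of the first sample at which stim_trace
    leaves the control mean +/- n_sd * SD band; None if it never does.

    All traces are assumed to be aligned to light onset and sampled every dt_ms.
    """
    if len(control_trials) < 2:
        raise ValueError("need at least two control trials to estimate an SD")
    for i, value in enumerate(stim_trace):
        samples = [trial[i] for trial in control_trials]
        if abs(value - mean(samples)) > n_sd * stdev(samples):
            return i * dt_ms
    return None


# Synthetic pressure traces in arbitrary units (not recorded data).
control = [[1.0] * 5, [1.1] * 5, [0.9] * 5]
stimulated = [1.0, 1.05, 0.9, 0.5, 0.1]
latency = deviation_latency_ms(control, stimulated, dt_ms=10.0)
print(latency)  # 30.0

# Reported means from the text: acoustic truncation minus respiratory deviation.
acoustic_minus_respiratory_ms = 66.6 - 36.4  # about 30 ms
```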
Together, these data indicate that HVC can control downstream steady-state respiration circuitry in a state-dependent manner, and that once HVC is engaged, stimulation interrupts the chain of activity in HVC, resulting in abrupt song truncation and resetting of the motif back to its initial state. These attributes are reminiscent of the responses to perturbation of central pattern-generating (CPG) networks described in the invertebrate and vertebrate nervous systems28,29. Another defining feature of CPG networks is that once initiated, they can produce patterned activity in the absence of instructive patterned input. The seemingly automatic and rapid restarting of song hints that extrinsic inputs to HVC may function permissively, rather than instructively, in song motif production (Fig. 1m), raising the possibility that HVC produces the neuronal sequences for song in the absence of instructive patterned input and may function as a pattern-generating network for song syllable sequences.
Song initiation needs thalamic input
Input to HVC from the thalamic nucleus Uvaeformis (Uva) is one probable source of instructive signals for producing the song motif18,22,30,31. Electrical stimulation of Uva was reported to cause motif truncation at syllable boundaries17, suggesting that the inputs of Uva to HVC are instructive for motor programs to transition from one syllable to the next. To test this idea, we first used optogenetic excitation of the axon terminals of Uva in HVC through viral expression of eGtACR1, an opsin that potently drives excitation of axon terminals in zebra finches31 (Extended Data Fig. 2a,b). Light stimulation of eGtACR1-expressing Uva–HVC terminals drives strong transient increases in HVC activity (Fig. 2a and Supplementary Table 1). In contrast to thalamic electrical stimulation17, the optogenetic excitation of Uva terminals during singing did not cause motif truncation and left song syntax and spectral characteristics unaffected (Fig. 2b,c, Extended Data Fig. 3a,b and Supplementary Table 1).
a, Schematic, sample trace, raster plot and normalized peri-stimulus time histogram (PSTH) of HVC multi-unit activity recording in anaesthetized birds expressing eGtACR1 in Uva; light stimulation of Uva afferents (1 s, red bar); inset, magnified PSTH and scatter plot comparing baseline and stimulation (200 ms, dashed rectangles; n = 30 hemispheres, 17 birds). b, Song-contingent light stimulation (red bar, 200 ms) of Uva terminals in HVC; sample spectrogram (0–11 kHz; horizontal lines identify song elements). c, Violin plots reporting accuracy of song segments with (grey) and without (white) stimulation, per bird (n = 4). d, UvaHVC neurons (labelled by retrograde tracer, green) expressing ChRmine (red). Dashed white lines, fibre-optic tip. e, As in b for UvaHVC stimulation. f, Box plots (5th–95th percentile; 25th, 50th and 75th percentiles) reporting optogenetic stimulation outcome (average probability, n = 3 birds; filled circles, empty box plots from Fig. 1e reported for comparison). g, As in c for UvaHVC stimulation (n = 3 birds). Scale bars, 200 ms (b), 200 µm (d), 20 µm (d (inset)). Brain outlines in a and d adapted with permission from ref. 60, Wiley.
The lack of effects on song, even with prolonged stimulation, prompted us to test whether direct optogenetic excitation in Uva disrupts song. We expressed the excitatory opsin ChRmine in Uva neurons projecting to HVC (UvaHVC) using an intersectional viral strategy (Fig. 2d). We found that even directly stimulating UvaHVC neurons failed to cause song truncation and restarting (1.2 ± 1.2%, motif stop; 98.9 ± 1.2%, no effect; Fig. 2e,f and Supplementary Table 1). Moreover, this manipulation had no detectable impact on the spectral characteristics of song syllables (Fig. 2g, Extended Data Fig. 3c and Supplementary Table 1).
One possibility is that manipulations such as electrical stimulation may drive truncation at syllable boundaries through off-target effects, such as recruiting nearby thalamic regions or fibres of passage. Uva is located within the posterior commissure, which connects midbrain regions critical for vocalizations, audition and vision. It is immediately adjacent to the robust nucleus of the arcopallium (RA) fibre tract, which transmits descending motor commands for song (Supplementary Video 1). Neurons in and surrounding Uva relay visual information to the forebrain32, and sudden visual stimulation with a stroboscope elicits orienting responses in zebra finches27,33 that result in motif truncations at syllable boundaries, similar to those observed with electrical stimulation of Uva17.
To assess whether off-target effects could be involved in truncating motifs at syllable boundaries, we attempted to mimic the effects of electrical stimulation by non-selectively expressing ChRmine in Uva and the surrounding thalamus (Extended Data Fig. 3d). Broader thalamic optogenetic stimulation resulted in reliable motif truncation at syllable boundaries (91.5 ± 3.6%, motif stop; 0.4 ± 0.4%, pause + continuation; Extended Data Fig. 3d–k). In contrast to optogenetic stimulation in HVC or along the Uva–HVC pathway, optogenetic stimulation of the broader thalamus caused birds to momentarily stop movement and blink, both during singing and non-singing states. This suggests that the manipulation causes a visually evoked orienting response, perhaps mimicking responses to strobe-light visual stimulation27,33. Consistent with this, broader thalamic stimulation resulted in significantly longer truncation latencies than direct HVC stimulation (Extended Data Fig. 3g–k) and predominantly led to cessation of singing rather than resetting of song (Extended Data Fig. 3l,m). In the few instances when birds resumed singing, the motif reset latency was significantly longer than what we observed when stimulating HVC (Extended Data Fig. 3n,o). These findings suggest that electrical stimulation-triggered song truncations are the result of off-target stimulation of the peri-Uva thalamus, and that Uva is not instructive for HVC syllable sequence progression.
We next examined whether Uva could play a permissive role in song production. Electrolytic lesions of Uva or peri-Uva regions can abolish courtship song production22,30. However, electrolytic lesions non-selectively ablate neurons and damage axonal fibres. To minimize damage to axonal tracts, we performed bilateral excitotoxic lesions of Uva using a cocktail of ibotenic and quisqualic acid (n = 13 birds). This strategy yielded three outcomes: (1) complete lesions of Uva (99.6 ± 0.4% Uva lesioned) that also included the peri-Uva thalamus and resulted in birds that could no longer sing their motif; (2) large peri-Uva thalamic lesions that mostly spared Uva (10.8 ± 3.8% Uva lesioned) and resulted in birds that also could no longer sing their motif; and (3) almost complete Uva lesions (87.5 ± 7.0% Uva lesioned) that spared the broader peri-Uva thalamus and resulted in birds that could sing their motif within approximately 1 week following lesion (Fig. 3a–c and Extended Data Fig. 4a). This last group of birds demonstrates that HVC can drive production of the entire song motif, even when Uva is extensively lesioned. Nonetheless, we found that these birds chained significantly fewer motifs together in each song bout (Fig. 3d and Supplementary Table 2), and they would often fail to produce their song motif after singing introductory notes (Fig. 3e and Supplementary Table 2). These findings are consistent with Uva lesions disrupting the ability of birds to initiate courtship song performances to female birds30, and they suggest that Uva may be needed for the initiation of the song motif.
a, Schematic and sample image (NeuN immunofluorescence, grey; HVC retrograde tracer, green) and spectrograms (0–11 kHz; horizontal lines identify song elements) reporting the effect of excitotoxic bilateral lesion of Uva and peri-Uva thalamus. b, Motif self-similarity before (circles) and 1–2 weeks after (triangles) lesions (peri-thalamus + Uva (brown; n = 8); peri-thalamus excluding Uva (grey; n = 2); Uva excluding perithalamic areas (blue; n = 3)). c, Percentage of Uva lesion in the three experimental groups. d, Cumulative probability of motifs per bout sung by the birds before (black) and 30 d after (blue) excitotoxic lesion of Uva that spared peri-Uva thalamic regions (n = 3 birds). e, The rate of motif start failures before (grey circles) and after (blue triangles) the excitotoxic Uva lesions. f, Schematic, sample image and spectrograms (as in a) reporting the effects of TeNT expression in UvaHVC neurons. g, Box plots (5th–95th percentile and 25th, 50th and 75th percentiles) reporting self-similarity between motifs sung before viral injection (grey) and the last motifs produced before complete cessation of singing upon expression of TeNT in UvaHVC neurons for 1–2 weeks (purple triangles; n = 6 birds). h, Cumulative probability of motifs per bout before (black) and 1–2 weeks after expression of TeNT in UvaHVC neurons (purple; n = 6 birds). i, Rate of motif start failures before (grey circles) and after (purple triangles; n = 6 birds) expression of TeNT in UvaHVC neurons. NS, nonsignificant. Scale bars, 1 mm (a), 100 µm (a (insets)), 1 s (a (spectrograms)), 200 µm (f). Brain outlines in a and f adapted with permission from ref. 60, Wiley.
To test the role of Uva in song initiation, we first blocked glutamate release from UvaHVC neurons using viral expression of tetanus neurotoxin (TeNT) (Fig. 3f and Extended Data Fig. 4b). These birds had progressive difficulty initiating their song on a timeline consistent with viral expression (approximately 10–14 days). They had increasing failures in motif initiation following singing of introductory notes and decreased number of motifs per song bout. However, in instances when the motif was initiated, the birds consistently produced all song syllables in the motif with high accuracy (Fig. 3f–i, Extended Data Fig. 4c and Supplementary Table 2). These data support the idea that the Uva–HVC pathway is permissive for initiating learned song motifs rather than instructing song syllable transitions17.
To test this, we expressed eGtACR1 in Uva and optogenetically silenced Uva neurons during singing (Extended Data Fig. 4d). We found that silencing Uva during an ongoing song motif did not disrupt the completion or acoustic structure of that motif, but it reduced the probability of initiating and concatenating a subsequent motif (Extended Data Fig. 4e,f). By contrast, using the same birds and placing fibre optics over HVC to excite Uva axon terminals across motif transitions did not suppress initiation of a subsequent song motif (Extended Data Fig. 4g,h). Thus, if Uva input to HVC is excited, birds can continue the ongoing motif and string other motifs together. If instead it is inhibited, birds still complete the ongoing motif but exhibit difficulty starting the next song motif. Together, these results indicate that the Uva–HVC pathway is critical for initiating song motifs, potentially coordinating the two hemispheres, but is not needed for birds to string together syllables within the song motif.
Pallial afferents are not needed for song
HVC receives excitatory input from three auditory and premotor pallial regions that play important roles in song learning: nucleus interfacialis (NIf), nucleus avalanche and medial magnocellular nucleus of the anterior nidopallium (mMAN)30,34,35,36,37. We examined the role of each pathway in adult song performance. Stimulation of eGtACR1-expressing axon terminals in HVC from any of these regions significantly increased HVC multi-unit firing activity (Extended Data Fig. 5a,g,m). However, 200-ms-long or 1-s-long song-contingent light stimulation of any of these input pathways failed to affect spectrotemporal motif characteristics (Extended Data Fig. 5a–r). We therefore tested whether these afferents are necessary for adult song performance. Previous studies indicate that bilateral lesions of either NIf, mMAN or nucleus avalanche in adults do not cause any long-lasting disruptions in song34,36,38. However, compensation by other pathways could account for the lack of sustained effects on song. Therefore, we consecutively lesioned mMAN, NIf and nucleus avalanche in the same birds using ibotenic and quisqualic acid. Bilateral lesions of these nuclei (mMAN, 100.0 ± 0.0%; NIf, 92.9 ± 4.0%; nucleus avalanche, 100.0 ± 0.0%; lateral magnocellular nucleus of the anterior nidopallium (lMAN), 82.5 ± 7.7%; Extended Data Fig. 6a–i) caused only a temporary decrease in motif quality. The song motif quickly recovered to its pre-lesioned state (Extended Data Fig. 6j), and, unlike Uva lesions, these lesions did not impact the number of motifs per bout or cause disruptions in motif initiation (Extended Data Fig. 6k,l). This demonstrates that HVC can generate the sequential activity necessary for completing song independent of its known main excitatory synaptic afferents, further supporting the idea that HVC is the origin of a unified premotor program for the zebra finch song motif.
Song pattern-generating network in HVC
To further define the circuit boundaries of the song pattern-generating network, we examined whether downstream target regions of HVC are critical to pattern generation. We reasoned that the kinetics of the post-truncation restarting of song provides a sensitive behavioural read-out of pattern resetting and could clarify whether those neural circuits are involved in song pattern generation or simply relay the patterned output. Disruption of a pattern generation node would produce truncation and reset latencies similar to or faster than those observed upon HVC stimulation, whereas disruption of a relay node would result in low-latency truncations followed by low-probability and longer-latency motif resetting. HVC has two major output pathways: the descending song motor pathway through the pallial song region RA and the palliostriatal pathway through area X, emerging from HVCRA and HVCX neurons, respectively8.
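The reasoning above amounts to a simple decision rule, sketched here under stated assumptions. The class and function names, the tolerance factors and the restart probability used for the HVC reference are hypothetical and chosen only for illustration; the reference truncation latency (66.6 ms) and median restart latency (135.8 ms) are the pan-HVC values reported in the text. This is not the statistical comparison used in the study.

```python
from dataclasses import dataclass


@dataclass
class StimOutcome:
    truncation_latency_ms: float
    restart_probability: float  # between 0 and 1
    restart_latency_ms: float


def classify_node(test, hvc_ref, latency_tol=1.25, prob_tol=0.8):
    """Label a stimulated region relative to the HVC reference.

    'pattern-generator-like': truncation and restart about as fast, and restart
        about as likely, as with HVC stimulation.
    'relay-like': truncation is fast, but restart is rarer or slower.
    'unclear': truncation itself is slow, so the rule does not apply.
    The tolerance factors are arbitrary illustrative values.
    """
    fast_truncation = (test.truncation_latency_ms
                       <= latency_tol * hvc_ref.truncation_latency_ms)
    fast_restart = (test.restart_latency_ms
                    <= latency_tol * hvc_ref.restart_latency_ms)
    likely_restart = (test.restart_probability
                      >= prob_tol * hvc_ref.restart_probability)
    if fast_truncation and fast_restart and likely_restart:
        return "pattern-generator-like"
    if fast_truncation:
        return "relay-like"
    return "unclear"


# Latencies are reported pan-HVC values; the probability 0.9 is a placeholder.
hvc = StimOutcome(truncation_latency_ms=66.6, restart_probability=0.9,
                  restart_latency_ms=135.8)
# Hypothetical downstream region: fast truncation, rare and slow restart.
candidate = StimOutcome(truncation_latency_ms=65.0, restart_probability=0.4,
                        restart_latency_ms=400.0)
print(classify_node(candidate, hvc))  # relay-like
```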
We bilaterally expressed ChRmine in either area X or RA and light stimulated each region in freely singing birds. Driving area X neurons rarely caused motif truncation (truncation probability, 2.9 ± 1.6%; no effect, 97.2 ± 1.6%; Extended Data Fig. 7a,b). The truncations we observed occurred at syllable boundaries and were significantly delayed (latency, 146.7 ± 36.7 ms) compared with the uniform song truncations observed with HVC optogenetic stimulation. Nonetheless, stimulation of area X neurons consistently caused a modest increase in the noisiness of stimulated syllables (Extended Data Fig. 7c,f), consistent with the known role of the basal ganglia pathway38.
By contrast, optogenetic stimulation in RA caused rapid motif truncations with high reliability (92.2 ± 2.7%, motif stop; 1.5 ± 1.8%, pause + continuation; Extended Data Fig. 8a–c). These truncations exhibited uniform latency across song, similar to stimulation in HVC (Extended Data Fig. 8d–f). Because RA is downstream of HVC in the song motor pathway, we might expect birds to restart their song as fast as, or faster than, when stimulating in HVC and with equal probability. However, we found that RA stimulation is less likely to be followed by restarting of the motif. When the motif does restart, it takes significantly longer than following HVC stimulation (Extended Data Fig. 8g–k). This argues that the song pattern-generating network is localized to HVC and that RA functions downstream of this network to relay motor commands for song.
To test this prediction, we moved upstream by one synapse and examined whether optogenetic stimulation of HVCRA neurons would produce the truncation and song restarts with the same timing and reliability as our pan-HVC optogenetic manipulations, as shown in Fig. 1. We used an intersectional viral strategy to achieve ChRmine expression only in HVCRA neurons (Extended Data Figs. 9 and 10). As anticipated, song-contingent HVCRA stimulation reliably caused rapid truncations throughout song (86.2 ± 5.6%, motif stop; 2.5 ± 1.8%, pause + continuation; Extended Data Fig. 9a–c). Unexpectedly, the latency to truncation was significantly longer than what we observed with pan-HVC stimulation. In some instances, it seemed to occur closer to syllable boundaries (panHVC, 66.6 ± 4.1 ms; HVCRA, 79.1 ± 4.8 ms; Extended Data Figs. 9b,d and 10c,d). Although post-truncation motif reset probability is comparable to that observed with pan-HVC stimulation (Extended Data Figs. 9e,f and 10e,f), restart latency was intermediate to the timing of pan-HVC and RA stimulation (Extended Data Figs. 9g and 10g–i). Although different latencies to first spike among HVC projection classes may influence truncation and restart dynamics39, this intermediate timing is still surprising because chains of excitatory synaptic connections among HVCRA neurons are considered to be a central component of the network controlling song9,10,11. This prompted us to investigate whether the other main class of HVC projection neurons (HVCX neurons) may contribute to the rapid restarting of the song motif.
HVCX neurons in song pattern generation
Similar to HVCRA neurons, HVCX neurons exhibit temporally precise and sparse activity during production of the song motif12,40,41, but their role in song generation is not known. They are considered to relay timing activity to the basal ganglia rather than directly contributing to song pattern generation42,43,44. However, paired recordings from HVC neurons in sleeping birds suggest that HVCX neurons can reliably lead the activity of interneurons and HVCRA neurons45, suggesting that they could contribute to song pattern generation.
We expressed ChRmine in HVCX neurons (Extended Data Fig. 11a,b) and found that stimulation reliably triggered song truncations (88.4 ± 3.0%, motif stop; 5.0 ± 2.5%, pause + continuation; Fig. 4a–c and Supplementary Table 3). The latency of truncation matched truncations elicited by pan-HVC stimulation (panHVC, 66.6 ± 4.1 ms; HVCX, 71.1 ± 4.1 ms; Fig. 4d, Extended Data Fig. 11c,d and Supplementary Table 3). Truncations were followed by rapid song restarts with probability and latency comparable to pan-HVC stimulation (Fig. 4e–g, Extended Data Fig. 11e–g and Supplementary Table 3). Song restart following HVCX stimulation was faster than that from HVCRA stimulation (Extended Data Fig. 11h,i). Although HVCRA neurons display longer spiking latencies than HVCX neurons39, which could contribute to the observed truncation dynamics, these rapid truncation and restart kinetics suggest that HVCX neurons could be part of the core song pattern-generating network, rather than only relaying timing signals to the basal ganglia.
a, Schematic and spectrograms (0–11 kHz; horizontal lines identify song elements) of song-contingent HVCX ChRmine-expressing neuron stimulation (10 ms, red bars; orange dots highlight motif restarts). b, As in Fig. 1c,d for HVCX stimulation. c, Box plots (5th–95th percentile and 25th, 50th and 75th percentiles) reporting optogenetic stimulation outcome (average probability, n = 4 birds; filled circles, empty box plots from Fig. 1e reported for comparison). d, Cumulative probability curves reporting the latency to song truncation in response to the light stimulation (average ± s.e.m. of the curve of each bird; n = 4 HVCX-stimulated birds (blue); 10-ms time bins). The dataset from HVC-stimulated birds (black) in Fig. 1 was compared with all experimental groups across the paper. Inset, violin plots reporting the latency of motif truncation computed across all the birds (HVCX-stimulated birds (blue); the dataset from HVC-stimulated birds (white) in Fig. 1 was compared with all experimental groups across the paper). e, Probability (average ± s.e.m.) of post-truncation vocalization resumption by category upon stimulation delivered throughout the motif (bins, 10% motif advancement; n = 4). f, Normalized probability of post-truncation motif restart (n = 4 HVCX-stimulated birds, filled circles; empty box plots from Fig. 1i reported for comparison). g, Same as d, but for latency to motif restart (HVCX-stimulated birds, orange; HVC-stimulated birds, black). Scale bar, 200 ms (a). Brain outline in a adapted with permission from ref. 60, Wiley.
Given that direct optogenetic stimulation of area X does not result in song truncations (Extended Data Fig. 7a,b), local synaptic transmission from HVCX neurons within HVC is probably the source of truncation and rapid restarting of song. We tested this by antidromically recruiting indirect and delayed excitation of axonal collaterals within HVC23 using optogenetic stimulation of HVC axon terminals in area X (Extended Data Fig. 12a,b). This stimulation should cause delayed song truncation, yet the motif restart kinetics should be similar to those upon direct HVC stimulation. We found that stimulation resulted in delayed motif truncation, although with lower probability compared with pan-HVC or HVCX direct stimulation, consistent with the limitations of antidromic propagation23 (Extended Data Fig. 12c–e). Nevertheless, we found that post-truncation restarting of the song motif had the same probability and latency as stimulating HVCX neuronal somata (Extended Data Fig. 12f–h).
Previous studies have identified the main synaptic connectivity motif in HVC to be disynaptic reciprocal inhibition between HVCRA and HVCX neurons through local interneurons46,47. This implies a surge of inhibition onto HVCRA neurons following stimulation of HVCX neurons. However, this does not explain the more rapid truncation and reset of the motif upon HVCX stimulation. To examine how HVCX neurons may contribute to song sequence propagation, we mapped the local connectivity of HVCX neurons using opsin-assisted synaptic circuit mapping. Stimulating neurotransmitter release from HVCX neurons in brain slices evoked excitatory and inhibitory postsynaptic currents in HVCX and HVCRA neurons (Fig. 5a and Extended Data Fig. 13a). Isolating monosynaptic connections using bath application of tetrodotoxin (TTX) followed by 4-aminopyridine (4-AP) revealed that HVCX neurons make monosynaptic connections with HVCRA and HVCX neurons with high and low probability, respectively (Fig. 5a). Previous studies using paired recordings and electron microscopy suggest that HVCRA neurons have only sparse connectivity with other HVCRA neurons. However, they are more reliably synaptically connected with HVCX neurons46,47,48. These data support a model in which the two HVC projection neuron classes form a heterosynaptic network, along with local interneurons, that can holistically sustain song pattern generation.
a, Polysynaptic (HVCX, cyan; HVCRA, magenta) and monosynaptic (grey) optogenetically evoked excitatory post-synaptic current (oEPSC) amplitudes evoked by HVCX optogenetic stimulation (red lines, 1 ms). Following TTX application, 4-AP restores monosynaptic oEPSCs (grey; red outline indicates absence of post-4-AP oEPSC; n = cells; animals). Bar charts show the prevalence of polysynaptic and monosynaptic oEPSCs. b, Schematic of the HVC motif pattern-generating circuit. Neurons (circles) connected by structured local synaptic projections (cones) and unstructured global projections (lines; arrows, excitatory; circles, inhibitory). c, Compensated connection matrix between HVC projection neuron classes (HVCPN) and interneurons. d, Offset of excitatory (black) and inhibitory (grey) bumps grants unidirectional propagation of excitation. e, Simulated HVCRA/HVCX sequence showing truncation and restart of the neuronal firing sequence following simulated optogenetic HVCRA excitation (magenta arrows). Peri-song neuronal firing activity (green) surges to start the motif. f, Modelled dynamics of HVC subpopulations in response to excitatory perturbation (red), aligned and overlayed with experimental average ± s.e.m. data relative to truncation (blue dashed) and restart (orange dashed) latency upon pan-HVC stimulation, shifted by 36.35 ms (semitransparent filled waves) to reflect the delay measured from respiratory data. Inset magnifies the rising peri-song neuronal activity. g, Predicted decrease in mean sequence length upon simulated degradation of HVCX neuron connectivity. h, Simulated HVCRA sequence showing stochastic truncation upon HVCX TeNT-like degradation. i, Schematic and image showing conditional TeNT expression in HVCX neurons. j, Spectrograms (0–11 kHz; horizontal lines identify song elements) show progressive motif shortening. Inset, tiled spectrograms ordered by motif length. 
k, Complete motifs by each bird averaged per week (birds colour-coded per normalized TeNT optical density). l, Scatter plot correlating the relative change in number of motifs per day sung at week 4 as a function of the average optical density of TeNT expression (left, Spearman r = −0.8545; r2 = 0.6127; P = 0.0029) or number of TeNT+ somata per 50-µm-thick slice (right, Spearman r = −0.8061; r2 = 0.3533; P = 0.0072). Scale bars, 100 ms (a (x axis)), 100 pA (a (y axis)), 500 µm (i), 200 ms (j). Brain outline in i adapted with permission from ref. 60, Wiley.
HVC pattern-generating network model
To test if a circuit consistent with this synaptic organization could sustain song progression and restarting, we modelled the network by arranging HVCRA, HVCX and local inhibitory neurons uniformly on a ‘chain’ (Fig. 5b,c). Synaptic weights in our model were symmetric, and the weights of excitatory and inhibitory connections decayed with distance along the chain. A pool of inhibitory neurons was driven by excitatory neurons and provided global inhibitory feedback. Song initiation was mediated by excitatory input from Uva onto a recently identified class of ‘peri-song’ HVCRA neurons, which became active just before song onset and were inactive during singing49 (Fig. 5b). Local inhibitory activity spatially lagged excitatory activity in the direction of sequence propagation, thereby providing stronger inhibition to excitatory neurons at earlier positions in the chain and effectively pushing excitatory activity forward along the chain (Fig. 5c,d and Supplementary Videos 2 and 3). Weakened excitatory synapses at the end of the chain stopped sequence propagation and released peri-song neurons from inhibition, leading to spontaneous restarting of the song motif if excitatory drive from Uva remained intact (Fig. 5e).
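The forward "push" produced by lagging local inhibition can be illustrated with a minimal sketch. It is not the published model: the Gaussian bump shape, the widths, the weights and the function names below are illustrative assumptions. It only shows that when an inhibitory bump trails a symmetric excitatory bump, the net drive is larger ahead of the bump than behind it.

```python
import math


def gaussian(x, centre, width):
    """Unnormalized Gaussian profile along the chain (assumed bump shape)."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def net_drive(position, bump_centre, lag, width=5.0, w_exc=1.0, w_inh=0.8):
    """Excitation minus lagging local inhibition at a chain position.

    The excitatory bump is centred at bump_centre; the inhibitory bump is
    centred lag units behind it. All parameter values are hypothetical.
    """
    excitation = w_exc * gaussian(position, bump_centre, width)
    inhibition = w_inh * gaussian(position, bump_centre - lag, width)
    return excitation - inhibition


centre, lag = 50.0, 3.0
ahead = net_drive(centre + 4.0, centre, lag)   # position in front of the bump
behind = net_drive(centre - 4.0, centre, lag)  # position behind the bump
print(ahead > behind)
```

With zero lag the two positions receive identical drive, so in this sketch the asymmetry comes entirely from the offset between the excitatory and inhibitory bumps.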
We modelled how this circuit responded to strong synchronous excitation, mimicking our optogenetic manipulations (Fig. 5e,f). The activation function of global inhibitory neurons was steeper than excitatory neurons, and the excitatory and global inhibitory neurons were strongly connected in a parameter regime called inhibition stabilization50. Thus, synchronous excitatory activity caused widespread and strong feedback from inhibitory neurons that blocked sequence propagation. Once the sequence was truncated, it could be spontaneously restarted through peri-song neurons, akin to restarting the song after its natural ending (Fig. 5e,f and Supplementary Videos 2 and 3). Unexpectedly, we found that the parameter settings that allowed the model to generate a moving neural sequence, optogenetic truncation and spontaneous restarting of sequence generation resulted in dynamics that qualitatively matched the timing of behaviour we measured following optogenetic manipulations of HVC (Fig. 5f). Although this simple rate-based model does not capture the precise spike-timing activity of all HVC neurons, it demonstrates that an inhibition-stabilized pattern-generating circuit matches our behavioural results following circuit perturbations, and an emergent property of the model is that its dynamics also match the timing of song truncation and restart.
Although previous studies have suggested that HVCX neuronal lesions leave song intact43,44, our model simulations predict that weakening the contribution of HVCX neurons to the chain results in premature truncation of neural sequence propagation, followed by spontaneous restarting of the motif (Fig. 5g,h and Supplementary Videos 2 and 3). To test these predictions, we suppressed excitatory synaptic transmission from HVCX neurons using selective viral expression of TeNT (Fig. 5i). In contrast to the effect of expressing TeNT in Uva, which resulted in birds producing their song motif in an all-or-none fashion (Fig. 3f–i), we found that TeNT in HVCX neurons caused birds to progressively increase the likelihood of prematurely truncating their song motifs (Fig. 5j–l and Extended Data Fig. 13b–d). Birds exhibited song truncations both within and between syllables (Fig. 5j and Extended Data Fig. 13e). The truncations occurred progressively earlier in the motifs, with timelines consistent with viral expression dynamics and with effects directly proportional to the amount of TeNT expression (Fig. 5k and Extended Data Fig. 13b,f). Consistent with model predictions, we also found that birds would frequently restart their songs following premature motif truncations, and that the latency of these restarts matched those observed upon direct optogenetic excitation of HVCX neurons (Fig. 5j and Extended Data Fig. 13e,g). Together, these results demonstrate that interruption of HVC activity, by either optogenetic perturbations or silencing of HVCX neurons, drives song truncation and probably releases a common circuit mechanism driving rapid restarting of the neural sequence for song.
Discussion
Neuronal sequences in HVC have been proposed to function as a clock, which controls the timing and progression of the song motif9,11,12,13,15. Considerable debate has centred on whether these patterns of activity require instructive patterned input for motif completion. Here we provide key observations indicating that adult HVC functions as a sequence-generating network that does not require patterned input, at least from its best described afferent pathways, to complete the song motif.
Chunking of motor sequences, followed by concatenation of commonly repeated sequences, is a proposed mechanism for optimizing learning and performance2,4,5,7,51,52,53,54,55. Motor chunking is thought to function in learning and production of the movement sequences needed for fluent speech production and other well-practiced behaviours. Early stages of juvenile bird song development involve splitting and growth of neural sequences in HVC as new syllables are being learned6,56. This process probably reflects chunking of respiratory and vocal patterns needed for accurate and rapid learning. Juvenile songbirds progressively shape their song, practicing thousands of times per day, and pallial input pathways are necessary to direct such developmental song learning35,37,57,58.
Here we show that these main pallial input pathways to HVC are dispensable for adult song production. Moreover, our cell-type selective manipulations in HVC demonstrate that the ordered syllable sequence of the song motif spontaneously restarts if it is prematurely truncated. This skipping of song back to the beginning is reminiscent of CPG rhythm resetting and consistent with the holistic control of the motif by a sequence-generating circuit in HVC20,29. These findings suggest that the phase at the end of song development, referred to as ‘crystallization’, involves consolidation of neural programs for motor control within HVC. We propose that as song developmentally becomes more stereotyped and precise, consistent daily practice concatenates these sparse neural sequences into a stable chain that autonomously sustains song motif completion.
Birds lack a corpus callosum; therefore, our findings raise questions about how the interhemispheric timing of pattern-generating circuits is coordinated. Uva receives bilateral ascending input from the respiratory medulla and midbrain vocal circuits and is therefore considered to play a prominent role in interhemispheric coordination of HVC14,15,22. Accordingly, we found that thalamic input from Uva is needed for initiating the song motif. However, our evidence indicates that it is not needed for transitioning from syllable to syllable within the motif. Uva may therefore send synchronized onset cues for song to coordinate initiation of each motif in the song bout, which then continues autonomously in each HVC. Nevertheless, Uva remains active throughout the motif and could function as a metronome, providing timing signals that support interhemispheric coordination without being required to instruct transitions within ongoing song motifs.
Our synaptic connectivity mapping finds HVCX neurons consistently making monosynaptic connections with HVCRA neurons but only sparsely with other HVCX neurons. This supports the idea that the main synaptic connectivity within HVC involves disynaptic inhibition and monosynaptic excitation between HVCRA and HVCX neurons, rather than homotypic connections within either class of excitatory neurons46,47. More research will be needed to fully describe cell-type connectivity in the network and understand how song sequence progression is fully controlled. Nonetheless, we propose a straightforward computational model of HVC that can sustain sequence generation and song restarting following circuit perturbations. Moreover, our model simulations indicate that a substantial reduction in HVCX neuronal transmission leads to stochastic song truncations and restarting of the motif, a prediction matched by our selective expression of TeNT in these neurons. However, the HVC song circuit seems robust to moderate perturbation, as shown by the lack of effect in previous HVCX ablation studies43,44, focal lesion studies59 and our own model predictions (Fig. 5g,h), as well as by our data showing that birds with lower levels of TeNT expression did not exhibit significant disruptions in song (Fig. 5k,l and Extended Data Fig. 13b–d).
In summary, this study reveals that a premotor circuit, facilitated by thalamic input, can holistically control strings of vocal syllables and more precisely defines the synaptic circuit architecture critical to this pattern-generating circuit. In future experiments, it will be important to examine how ‘fused’ sequence elements are integrated for the control of other types of natural behaviours and learned vocalizations with greater syntax complexity. Zebra finches produce only a single stereotyped sequence of song syllables, making them an ideal model for first testing how the brain controls strings of vocal gestures. We propose that chunking, followed by concatenation of reliably reproduced neuronal sequences, underlies these behaviours, and that the approaches applied here can help identify the boundaries of the motor sequences used by the brain to support production of complex behaviours.
Methods
Animals
The experiments described in this study were conducted using adult male zebra finches (Taeniopygia guttata; 120–500 days post-hatch). All procedures were performed in accordance with protocols approved by the Animal Care and Use Committee at UT Southwestern Medical Center.
Viral vectors
The following adeno-associated viral vectors were used in the experiments: rAAV2/9/fDIO–CBh–eGTACR1–mScarlet, rAAV2/9/CBh–Flippase, rAAV2/9/CBh–ChRmine–mScarlet, rAAV2/9/DIO–CAG–ChRmine–mScarlet, rAAV2/9/DIO–CAG–TeNT–mScarlet (Intellectual and Developmental Disabilities Research Center Neuroconnectivity Core at Baylor College of Medicine) and rAAV2/9/CMV–CRE–eGFP (Addgene). All viral vectors were aliquoted and stored at −80 °C until use.
Stereotaxic surgery
Aseptic stereotaxic surgeries were performed after birds were anaesthetized (isoflurane inhalation; 0.8%–1.5%).
Viral injections were performed using previously described procedures26,37,58. Briefly, a cocktail of adeno-associated viral vectors (rAAV/CBh–ChRmine in HVC, RA, area X or thalamus (2 µl per hemisphere); 1:2 of rAAV/CBh–FLP and rAAV/DIO–CBh–eGtACR1, respectively (1–2 µl total per hemisphere); rAAV/DIO–CAG–ChRmine in HVC or Uva (2 µl); rAAV/CMV–Cre in RA, area X or HVC (0.5–1 µl and 2 µl, respectively); rAAV/DIO–TeNT in HVC or Uva (2 µl); and rAAV/CMV–CRE in area X or HVC (2 µl), respectively) was injected (1 nl s−1) into target areas with a Nanoject III (Drummond Scientific) and glass capillaries. Experiments were conducted starting a minimum of 3 weeks after viral injections. Fluorophore-conjugated retrograde tracers (Dextran 10,000 MW, AlexaFluor 488, 568 and 647, Invitrogen; Fast Blue, Polysciences) were injected bilaterally into area X, RA or HVC (160 nl; 5 × 32 nl, 32 nl s−1 every 30 s) (refs. 26,37,58). Electrophysiological mapping was used to determine the centres of HVC, NIf, mMAN, LMAN and RA; area X, nucleus avalanche and Uva were identified using stereotaxic coordinates (coordinates relative to interaural zero: head angle, rostral–caudal, medial–lateral, dorsal–ventral (in mm)). The stereotaxic coordinates were as follows: HVC (45°; anterior–posterior, 0; medial–lateral, ±2.4; dorsal–ventral, −0.2 to −0.6), NIf (45°; anterior–posterior, 1.75; medial–lateral, ±1.75; dorsal–ventral, −2.4 to −1.8), mMAN (20°; anterior–posterior, 5.1; medial–lateral, ±0.6; dorsal–ventral, −2.1 to −1.6), LMAN (20°; anterior–posterior, 5.1; medial–lateral, ±1.7; dorsal–ventral, −2.2 to −1.6), RA (80°; anterior–posterior, −1.5; medial–lateral, ±2.5; dorsal–ventral, −2.4 to −1.8), area X (45°; anterior–posterior, 4.8; medial–lateral, ±1.6; dorsal–ventral, −3.3 to −2.7), nucleus avalanche (45°; anterior–posterior, 1.65; medial–lateral, ±2.0; dorsal–ventral, −0.9) and Uva (20°; anterior–posterior, 2.5; medial–lateral, ±1.6; dorsal–ventral, −4.8 to −4.2).
Optogenetic manipulations
For optogenetic stimulation, optic fibres (multimode 400 µm; 0.39 numerical aperture; ThorLabs) were implanted bilaterally dorsal to HVC, RA, area X or Uva using acrylic glue and dental cement. Although the 400-µm-diameter fibres may not completely cover the entirety of the areas, we estimated that the cone of light could stimulate the vast majority of the targeted neurons. After recovery, the implanted fibres were connected to optic fibres through ceramic sleeves. The fibres were connected to a rotary joint and interfaced with a 1.5-mm multimode fibre connected to a light-emitting diode box (Prizmatix). Light intensity was regulated to achieve a final output of approximately 10 mW. We used custom software (pcaf; LabVIEW) to deliver optogenetic stimulation during song (200 ms or 1 s for HVC afferent stimulation, 10–50 ms for direct ChRmine somatic stimulation and 50–200 ms for antidromic HVCX stimulation). In many instances, our goal was to target as many moments as possible within a bird song motif. To achieve this, we targeted most of the motifs birds were producing using quasi-random light onset delays introduced through a transistor–transistor logic (TTL) signal. This targeting strategy allows for a detailed analysis of motif-level effects but limits our ability to conduct meaningful song-bout-level analysis of the behaviour. We note that light delivery over HVC or other brain regions is not in itself sufficient to cause truncations or other disruptions in singing behaviour, because several experiments using light stimulation (light stimulation of afferent pathways into HVC or of area X neurons) had no effect on singing behaviour. Air sac recordings and analysis were performed as previously published15.
Lesion quantification
Excitotoxic lesion was induced by 1% ibotenic acid (50–100 nl per injection site) or a cocktail of 1% ibotenic acid and 100 mM quisqualic acid (Uva and LMAN). Lesion extent was first verified by the absence or sparseness of NeuN immunostaining in the targeted nuclei. To provide an unbiased estimate of the lesion extent, retrograde tracers were injected in HVC and RA to highlight any surviving cells in the afferent nuclei. In control animals, the number of retrograde tracer-filled cells in each nucleus was quantified, and correlations were calculated between cell counts in each nucleus (Extended Data Fig. 6a–f). This analysis provided a statistical validation to extrapolate the number of cells in a target nucleus from the number of cells counted in a reference nucleus. Therefore, an average ratio across nuclei cell counts was calculated. On the basis of these control ratios and the number of cells in a non-lesioned reference nucleus, the expected number of retrogradely filled cells in each nucleus of each hemisphere was estimated.
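The extrapolation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the example counts and the final lesion-fraction step (one minus surviving over expected cells) are hypothetical and are not taken from the analysis code used in the study.

```python
def mean_count_ratio(control_target_counts, control_reference_counts):
    """Average target/reference ratio of tracer-filled cells in control birds."""
    ratios = [
        target / reference
        for target, reference in zip(control_target_counts, control_reference_counts)
    ]
    return sum(ratios) / len(ratios)


def expected_cell_count(ratio, reference_count):
    """Expected cells in the target nucleus given a non-lesioned reference nucleus."""
    return ratio * reference_count


def lesion_fraction(surviving_count, expected_count):
    """Assumed lesion-extent measure: fraction of expected cells that are missing."""
    return 1.0 - surviving_count / expected_count


# Hypothetical counts, for illustration only.
ratio = mean_count_ratio([200, 300], [100, 150])
expected = expected_cell_count(ratio, reference_count=120)
print(ratio, expected, lesion_fraction(24, expected))
```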
In vivo extracellular recordings
To test the functional expression of opsins, we performed extracellular recording of HVC activity in birds under light isoflurane anaesthesia (0.8%) with Carbostar carbon electrodes (impedance: 1,670 µΩ cm; Kation Scientific). A 400-µm multimode optical fibre was placed on the brain surface overlaying HVC and delivered light stimulation (470 nm; approximately 20 mW; 1 s) during neural recordings. To test antidromic excitation of HVCX neurons by axon terminal optical stimulation, optic fibres were implanted over area X (470 nm; approximately 20 mW; 100 ms). Signals were acquired at 10 kHz and band-pass filtered (300 Hz high-pass; 20 kHz low-pass). Spike rate (binned every 10 ms) and PSTHs were calculated to quantify light stimulation responses (one to five sites per hemisphere; Spike2). Birds without optically evoked responses were excluded from experiments. Spike counts and PSTHs were normalized to the pre-stimulus baseline (500 ms). Two-way analyses of variance (ANOVA) were calculated comparing the time course between stimulated and non-stimulated recordings: for testing HVC afferents (1-s stimulation), 0–5 s (light stimulation; 0.5–1.5 s) versus 5–10 s (control; no stimulation); for ChRmine-expressing HVC neurons or HVC→area X stimulation (100-ms stimulation), 0.7–1.4 s (300 ms before and after 100-ms light stimulation) versus 5.7–6.4 s (control; no stimulation). Wilcoxon tests were performed on the average time course (with intervals specified in the figure legends).
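A minimal sketch of the binning and baseline normalization is given below. The recordings were analysed in Spike2, so this is not the code that was used; the function names and spike times are hypothetical, and dividing by the mean baseline count (rather than, for example, subtracting it) is an assumption.

```python
def binned_counts(spike_times_ms, start_ms, stop_ms, bin_ms=10.0):
    """Count spikes in consecutive bins of bin_ms between start_ms and stop_ms."""
    n_bins = int((stop_ms - start_ms) // bin_ms)
    counts = [0] * n_bins
    for t in spike_times_ms:
        index = int((t - start_ms) // bin_ms)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


def normalize_to_baseline(counts, stim_onset_ms, start_ms=0.0,
                          bin_ms=10.0, baseline_ms=500.0):
    """Divide every bin by the mean count in the pre-stimulus baseline window."""
    first = int((stim_onset_ms - baseline_ms - start_ms) // bin_ms)
    last = int((stim_onset_ms - start_ms) // bin_ms)
    if first < 0 or last <= first:
        raise ValueError("baseline window falls outside the recording")
    baseline = counts[first:last]
    mean_baseline = sum(baseline) / len(baseline)
    if mean_baseline == 0:
        raise ValueError("no spikes in baseline; normalization undefined")
    return [c / mean_baseline for c in counts]


# Hypothetical spike times (ms): sparse baseline firing, then an evoked burst.
spikes = [5.0, 105.0, 205.0, 305.0, 405.0, 512.0, 515.0]
counts = binned_counts(spikes, start_ms=0.0, stop_ms=1000.0)
normalized = normalize_to_baseline(counts, stim_onset_ms=500.0)
print(counts[51], normalized[51])
```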
Ex vivo physiology
Slice preparation
Zebra finches were deeply anaesthetized and then decapitated. The brain was removed from the skull and submerged in cold (1–4 °C) oxygenated dissection buffer. Acute sagittal 230-μm brain slices were cut in ice-cold carbogenated (95% O2/5% CO2) solution, containing 110 mM choline chloride, 25 mM glucose, 25 mM NaHCO3, 7 mM MgCl2, 11.6 mM ascorbic acid, 3.1 mM sodium pyruvate, 2.5 mM KCl, 1.25 mM NaH2PO4 and 0.5 mM CaCl2, and adjusted to 320–330 mOsm. Individual slices were incubated in a custom-made holding chamber filled with artificial cerebrospinal fluid, containing 126 mM NaCl, 3 mM KCl, 1.25 mM NaH2PO4, 26 mM NaHCO3, 10 mM d-(+)-glucose, 2 mM MgSO4 and 2 mM CaCl2, adjusted to 310 mOsm, pH 7.3–7.4 and aerated with a 95% O2/5% CO2 gas mixture. Slices were incubated at 36 °C for 20 min and then kept at room temperature for a minimum of 45 min before recordings.
Slice electrophysiological recording
The slices were constantly perfused in a submersion chamber with 32 °C oxygenated normal artificial cerebrospinal fluid. Patch pipettes were pulled to a final resistance of 3–5 MΩ from filamented borosilicate glass on a Sutter P-1000 horizontal puller. HVC projection neuron classes, as identified by retrograde tracers, were visualized by epifluorescence imaging using a water immersion objective (×40; 0.8 numerical aperture) on an upright Olympus BX51 WI microscope, with video-assisted infrared CCD camera (QImaging Rolera). Data were low-pass filtered (10 kHz) and acquired (10 kHz) (Axon MultiClamp 700B amplifier, Axon Digidata 1550B data acquisition and Clampex 10.6; Molecular Devices).
For voltage clamp whole-cell recordings, the internal solution contained 120 mM cesium methanesulfonate, 10 mM CsCl, 10 mM HEPES, 10 mM EGTA, 5 mM creatine phosphate, 4 mM ATP–Mg and 0.4 mM GTP–Na (adjusted to pH 7.3–7.4 with CsOH). For current clamp recordings, the internal solution contained 116 mM K gluconate, 20 mM HEPES, 6 mM KCl, 2 mM NaCl, 0.5 mM EGTA, 4 mM MgATP, 0.3 mM NaGTP and 10 mM Na phosphocreatine (adjusted to pH 7.3–7.4 with KOH; 299 mOsm).
Optically evoked synaptic currents were measured by delivering two light pulses (1 ms, spaced 50 ms, generated by a CoolLED pE-300) focused on the sample through the ×40 immersion objective. Sweeps were delivered every 10 s. Synaptic responses were monitored while holding the membrane voltage at −70 mV (for oEPSCs) and +10 mV (for optogenetically evoked inhibitory postsynaptic currents (oIPSCs)). We monitored different light stimulation intensities before baseline recording to achieve oEPSC responses at approximately 50% of the maximal response. Access resistance (10–30 MΩ) was monitored throughout the experiment, and recordings were discarded from further analysis if resistance changed by more than 20%. The excitation–inhibition (oEPSC/oIPSC) ratio was calculated by dividing the amplitude of the oEPSC at −70 mV by the amplitude of the oIPSC at +10 mV during identical light intensity stimulation. To validate inhibitory and excitatory post-synaptic currents as γ-aminobutyric acid (GABA)ergic and glutamatergic, respectively, in a subset of cells the GABAa receptor antagonist SR 95531 hydrobromide (gabazine; 10 µM) was added to the bath while holding the cell at +10 mV, or the AMPA receptor antagonist 6,7-dinitroquinoxaline-2,3-dione (10 µM) while holding the cell at −70 mV. In another subset of cells, once the baseline measures were established, we tested for monosynaptic connectivity by bath application of 1 µM TTX, followed by 100 µM 4-AP, and measured the amplitude of post-synaptic currents returning following 4-AP application. On the basis of the signal-to-noise ratio of the recordings, currents under 5 pA were considered unreliable and not considered further, as were currents rescued by 4-AP application with an amplitude less than 10 pA (non-monosynaptic; two instances: 1 HVCX→HVCX and 1 HVCX→HVCRA).
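The inclusion criteria and ratio defined above can be written compactly. The sketch below is illustrative only: the function names and example values are hypothetical, and the use of absolute current amplitudes is an assumption.

```python
def access_resistance_stable(initial_mohm, final_mohm, max_change=0.20):
    """True if access resistance changed by no more than 20% during recording."""
    return abs(final_mohm - initial_mohm) / initial_mohm <= max_change


def excitation_inhibition_ratio(oepsc_pa, oipsc_pa):
    """oEPSC amplitude (at -70 mV) divided by oIPSC amplitude (at +10 mV)."""
    return abs(oepsc_pa) / abs(oipsc_pa)


def classify_connection(baseline_pa, post_4ap_pa):
    """Apply the 5 pA reliability and 10 pA post-4-AP amplitude thresholds."""
    if abs(baseline_pa) < 5.0:
        return "unreliable"
    if abs(post_4ap_pa) < 10.0:
        return "non-monosynaptic"
    return "monosynaptic"


# Hypothetical values, for illustration only.
print(access_resistance_stable(20.0, 23.0))
print(excitation_inhibition_ratio(-200.0, 400.0))
print(classify_connection(-50.0, -40.0))
```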
Histology and immunohistochemistry
Birds were anaesthetized with EUTHASOL (Virbac) and transcardially perfused with 4% paraformaldehyde in phosphate-buffered saline (PBS). Free-floating sagittal sections (30 µm) were cut using a cryostat (Leica CM1950). These sections were first washed in PBS, then blocked in 3% bovine serum albumin in 0.3% Triton X-100 in PBS for 1 h at room temperature and incubated with primary antibodies (α-NeuN MAB377, Millipore, 1:500; α-GFP a11122, Invitrogen, 1:1,000) diluted in the blocking buffer at 4 °C for 24 h. The slices were washed with PBS and incubated at room temperature for 2 h with fluorescent secondary antibodies (Jackson 715-605-150 Alexa Fluor 647-conjugated donkey anti-mouse for NeuN and Millipore A21206 Alexa Fluor 488-conjugated goat anti-rabbit for GFP), diluted in blocking buffer. After PBS wash, sections were mounted onto slides with Fluoromount-G (eBioscience). Composite images were acquired and stitched using an LSM 880 or LSM 710 laser scanning confocal microscope (Carl Zeiss) and/or a ZEISS Axio Scan Z1 (University of Texas Southwestern Medical Center Whole Brain Microscopy Facility; RRID: SCR_017949). Image analyses were performed using ImageJ. After electrophysiological recordings, the slices were incubated in 4% paraformaldehyde in PBS. Sections were then washed in PBS, mounted on glass slides with Fluoromount-G (eBioscience) and visualized under an LSM 880 laser scanning confocal microscope (Carl Zeiss). In situ hybridization experiments were conducted as previously reported.
Three-dimensional brain imaging and processing
Imaging and processing of the sample brain with tracers injected in HVC (Alexa 488-conjugated dextran 10,000) and RA (Alexa 568-conjugated dextran 10,000) for three-dimensional (3D) rendering were conducted with the help of Denise Ramirez and Ariana Nawaby (University of Texas Southwestern Medical Center Whole Brain Microscopy Facility; RRID: SCR_017949). After perfusion with 4% paraformaldehyde, the brain was embedded in oxidized agarose in preparation for sectioning. The TissueCyte 1000 instrument (TissueVision) automatically sectioned the entire volume of the brain at 100 µm in the coronal plane and collected mosaic image tiles encompassing each section. For preprocessing, images were downsampled to 1.5-μm xy resolution and colour contrast adjusted to provide high visual contrast between signals of interest and background.
For segmentation, a selected portion of signals of interest in the downsampled contrast-adjusted images of the tissue was visually identified, annotated and used to train a random forest classifier for segmentation in ilastik (v.1.3.3) (refs. 61,62,63,64). This classifier was applied to all section images in the brain to assign a probability score to each pixel in the image, corresponding to its chance of belonging to specific fluorescent signals, autofluorescence or background noise. The autofluorescence, Alexa 488 (green) and Alexa 568 (red) pixelwise probability scores were further processed and used for visualization.
For segmentation post-processing, to create a grey silhouette of the overall shape of the brain, the autofluorescence probability signal was thresholded using the ImageJ default thresholding algorithm. Any holes in the binary mask were then flood-filled, and particles greater than 3,024 px2 were removed. Green and red probabilities were thresholded at 105 and 79 8-bit pixel intensities, respectively, as determined visually to reduce low-probability noise in the image. The GFP signal in the rostral-most portion of the brain (beyond section 135) was dimmed for better visibility of more caudal structures by subtracting 140 pixel intensity units (8-bit range) from the pixel intensities.
For visualization, combined RGB images of the autofluorescence (grey), Alexa 488 (green) and Alexa 568 (red) post-processed probabilities were visualized in 3D using VAA3D software (v.V3.447; https://home.penglab.com/proj/vaa3d/home/index.html).
Song analysis
Birdsongs were recorded and analysed using Sound Analysis Pro (SAP) 2011 (ref. 65), and plots were made with a modified version of Avian Vocalization Network66. We manually measured and categorized the outcomes of optogenetic stimulations. Truncations were defined as stimulation-contingent atypical amplitude decays of 300 ms or less (not present in control motifs), visible as silent gaps in the spectrogram. Truncation latencies were measured from the onset of the light delivery to the onset of the optically contingent silent gap. Stop was defined as truncation not followed by continuation or resumption of the motif. Syllable boundaries and complex syllable elements were delimited by silent pauses or by clear spectral continuity changes. Twenty stimulated song segments were measured for stimulated and non-stimulated conditions for quantification of acoustic properties and sound similarity (SAP). Acoustic properties of the stimulated segment were measured and compared with the corresponding song fragment in unstimulated control motifs. When optical stimulation did not cause truncation, acoustic properties were calculated on the song fragment from the onset of optical stimulation to the end of the last syllable. The entire motif was analysed during 1-s stimulation trials.
In the 1-s time window after song truncation, optical stimulation effects were manually classified as falling into one of four categories: (1) motif reset (restarting with the first song syllable, with introductory notes or with syllables that normally link motifs); (2) calls (typical zebra finch calls); (3) introductory notes (those not followed by motif initiation); or (4) pause and continuation (post-truncation motif resumption at any syllable in the motif other than the first syllable). To calculate the normalized motif reset probability, the number of motifs per bout was calculated over 30–50 bouts (defined as chains of motifs, started with introductory notes and mostly uninterrupted; on rare occasions, we found motifs produced within 1 s from other motifs, and they were considered as part of the previous bout; M, average number of motifs per bout). Each bird’s probability of motif reset was then normalized (normalized motif reset probability = motif reset probability/[1 − (1/M)]), following the logic that 1/M is the likelihood of each motif to be the last in the bout and not be followed by another motif. Therefore, 1 − (1/M) is the probability of a motif to be followed by another motif in the current bout. The probability of reset implies the presence of a motif after the truncated one examined. Therefore, dividing by the likelihood of that motif being followed by another one returns a normalized measure of the reset.
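As a worked illustration of this normalization, the sketch below applies the formula to hypothetical values; the function name and the numbers are assumptions introduced for the example.

```python
def normalized_reset_probability(reset_probability, mean_motifs_per_bout):
    """Motif reset probability divided by 1 - (1/M).

    M is the average number of motifs per bout, so 1 - (1/M) is the
    probability that a given motif is followed by another in the same bout.
    """
    if mean_motifs_per_bout <= 1:
        raise ValueError("M must exceed 1 for the normalization to be defined")
    return reset_probability / (1.0 - 1.0 / mean_motifs_per_bout)


# Hypothetical bird: resets after 60% of truncations, M = 4 motifs per bout.
print(normalized_reset_probability(0.6, 4.0))
```

For a bird averaging four motifs per bout, a raw reset probability of 0.6 is divided by 0.75, giving a normalized value of about 0.8.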
To report cross-motif quantification of truncation or reset latency and resumed vocalization identity probability, events were categorized depending on the time point within the motif at which the onset of the corresponding stimulation occurred. The events were then grouped in 10% bins across the motif duration, per bird, to allow for comparison between birds with different motif lengths. For each bird, 100% was set to the duration of the motif minus 100 ms, because stimulation applied later than 100 ms before the end of the motif would lead to unclear effects on the syllables given the latency to truncation (average truncation latency across groups = 74.36 ± 3.06). Whenever the stimulation happened in the last 100 ms of the motif, the events were classified in the −20% to 0% bins, affecting the transition to the following motif (if any). Stimulations, truncations and post-truncation effects occurring during introductory notes and inter-motif connecting syllables were assigned to these −20% to 0% time bins on the basis of their temporal distance to syllable A (if no syllable A onset was produced, the effects were not considered for further analysis, as we could not categorize the introductory note as produced at a specific distance from the motif for the percentage computation).
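The within-motif binning can be sketched as follows. The function name, its arguments and the choice to return `None` for onsets in the final 100 ms are assumptions made for this illustration; the assignment of those late events to the −20% to 0% bins depends on the onset of the following motif and is not reproduced here.

```python
def motif_percent_bin(onset_ms, motif_duration_ms, margin_ms=100.0, bin_percent=10):
    """Return the lower edge (in %) of the 10% bin containing a stimulation onset.

    100% corresponds to the motif duration minus margin_ms. Onsets within the
    final margin_ms return None, because they are scored relative to the
    following motif (an aspect this sketch does not implement).
    """
    effective_ms = motif_duration_ms - margin_ms
    if effective_ms <= 0:
        raise ValueError("motif must be longer than the margin")
    if onset_ms < 0 or onset_ms > motif_duration_ms:
        raise ValueError("onset must fall within the motif")
    if onset_ms >= effective_ms:
        return None
    percent = 100.0 * onset_ms / effective_ms
    return int(percent // bin_percent) * bin_percent


# Two hypothetical birds with different motif lengths land in the same bin.
print(motif_percent_bin(250.0, 600.0))   # 250 ms into a 600-ms motif
print(motif_percent_bin(450.0, 1000.0))  # 450 ms into a 1,000-ms motif
```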
To evaluate the likelihood of optogenetic inhibition or stimulation across a motif–motif transition to terminate a bout (Extended Data Fig. 4d–h), we delivered light or sham stimulation across the motif and extending beyond its end, and we quantified the probability of the stimulation to be contingent with the termination of the bout for 50 trials in each condition.
In lesion experiments, a minimum of 20 motifs were scored with SAP against pre-surgery motifs. Failed motif starts were defined as a series of introductory notes not leading to a motif. The number of motifs in a bout was counted over 50 bouts; for TeNT experiments, for birds that would ultimately lose their song (UvaHVC TeNT; some HVCX TeNT), the last 50 bouts before song cessation were analysed. Where no motifs were produced post-lesion (Fig. 3b; the birds did not sing at all), the accuracy was assigned a value of 0 for the sake of classification.
Recurrent circuit model of HVC
The computational model used in this study is based on a canonical recurrent circuit model (continuous attractor neural network67,68,69) and was simulated in the BrainPy framework70. In a typical continuous attractor neural network, excitatory neurons are arranged to uniformly cover a linear feature space (for example, the location of the timing chain in the current case71) and have mutual interactions through recurrent connections72. This configuration gives rise to a continuous manifold that sustains a series of activity bumps. A song motif is considered to be controlled by an activity bump traversing from one end of the chain to the other73.
To better reflect the biological characteristics of the songbird HVC, we introduced several specific features.
The model incorporates the following five distinct neuron types to capture the functional diversity in the songbird HVC:
(1) Excitatory neurons (HVCRA, \({{\bf{r}}}_{{\rm{RA}}}\), and HVCX, \({{\bf{r}}}_{{\rm{X}}}\))
The excitatory neurons responsible for encoding the neural sequence are divided into two groups (HVCRA and HVCX), with their firing rates denoted as \({{\bf{r}}}_{{\rm{RA}}}\) and \({{\bf{r}}}_{{\rm{X}}}\), respectively. Consistent with experimental observations, the model only includes intergroup connections and leaves neurons within the same group unconnected. Simulations demonstrated that these intergroup connections are sufficient to self-sustain non-zero responses and moving sequences.
(2) Global inhibitory neurons (\({{\bf{r}}}_{{\rm{g}}}\))
To maintain network stability, the model contains a global inhibitory neuron with firing rate \({{\bf{r}}}_{{\rm{g}}}\). Compared with the excitatory neurons, this neuron has faster dynamics and a steeper activation function, allowing it to provide effective global inhibition.
(3) Local inhibitory neurons (\({{\bf{r}}}_{{\rm{I}}}\))
The circuit model has another group of inhibitory neurons (\({{\bf{r}}}_{{\rm{I}}}\)) providing local, structured inhibitory feedback to the excitatory populations, which is essential to generate spontaneous movement of the population activity bumps of excitatory neurons within the circuit. The \({{\bf{r}}}_{{\rm{I}}}\) bump slightly lags behind the excitatory neuron bumps owing to transmission delay and slow dynamics, so that the excitatory neurons at more distant locations will be suppressed less and build up more activity. As a result, the activity bump of excitatory neurons is ‘pushed’ to move forward.
(4) Peri-song neurons (\({{\bf{r}}}_{{\rm{ps}}}\))
The circuit model contains an HVCRA peri-song neuron group (\({{\bf{r}}}_{{\rm{ps}}}\)) that is modelled to target HVCRA song neurons at the initial end of the manifold. This group plays a critical role in initiating and resetting motif generation.
Circuit dynamics
The neural dynamics underlying these activities are captured by a set of dynamic equations:
In these equations, subscripts denote the neuron types. The parameter \(\tau \) represents the time constant, and \(f(\cdot )\) denotes the activation function for each neuron group. External input currents are denoted as \({I}_{\mathrm{ext}}\), and specific terms such as \({I}_{\mathrm{Uva}}\) correspond to input from Uva. The capital \({W}_{{\rm{A}},{\rm{B}}}\) indicates the connection matrix from group A to B with dimensions \({N}_{{\rm{B}}}\times {N}_{{\rm{A}}}\), where \(N\) is the number of neurons in the respective group, whereas the lowercase \(w\) indicates the scalar connection strength. For convenience, we set \({N}_{\mathrm{RA}}={N}_{{\rm{X}}}={N}_{{\rm{I}}}=N\) and \({N}_{{\rm{g}}}={N}_{\mathrm{ps}}=1\). Specifically, to support a continuous manifold, the entries of connections between excitatory and local inhibitory neurons are determined by the distance between the index of pre-synaptic and post-synaptic neurons:
where \({w}_{{\rm{A}},{\rm{B}}}\) (\({\rm{A}},{\rm{B}}\in \{\mathrm{RA},{\rm{X}},{\rm{I}}\}\)) denotes the peak weight of the connection from neuronal population \({\rm{A}}\) to \({\rm{B}}\).
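As an illustration of distance-dependent connectivity on a chain, the following Python sketch builds a connection matrix whose entries depend only on the difference between the pre-synaptic and post-synaptic neuron indices. The Gaussian form of the kernel, the `width` parameter and the function name `distance_kernel` are assumptions made for this example; they are not taken from the published model code.

```python
import math


def distance_kernel(n, peak, width):
    """Build an n x n connection matrix for a linear chain of neurons.

    Entry [i][j] is the weight from pre-synaptic neuron j to post-synaptic
    neuron i. It equals `peak` when i == j and decays with the index
    distance |i - j| following an (assumed) Gaussian profile of the given width.
    """
    return [
        [peak * math.exp(-((i - j) ** 2) / (2.0 * width ** 2)) for j in range(n)]
        for i in range(n)
    ]


# Small example: 5 neurons, peak weight 2.0, kernel width of 1 index unit
W = distance_kernel(5, 2.0, 1.0)
```

Because each entry depends only on the index difference, every neuron sees the same connectivity profile relative to its own position, which is the property that supports a continuous manifold of bump states away from the ends of the chain.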
To target the peri-song output to the initial location of the manifold, \({W}_{\mathrm{ps},{\rm{E}}}\) is an \(N\times 1\) matrix whose single column follows a Gaussian profile centred at 0:
Sequence initiation
The fundamental property of the network is its ability to spontaneously generate neural sequences. In our model, peri-song neurons initiate the sequential activity. The peri-song neurons receive excitatory input, probably originating from the upstream nucleus Uva, while simultaneously receiving inhibitory input from the global inhibitory neurons. When the network is silenced, whether at rest or following truncation, activity in the global inhibitory neuron decreases, which disinhibits the peri-song neurons. This release from inhibition then triggers the onset of a motif.
Boundaries
Following the activation of excitatory neurons, the activity bump is driven by locally structured inhibitory feedback from \({{\bf{r}}}_{{\rm{I}}}\) to traverse the continuous manifold. For the bump to gain a directional motion tendency, the inhibitory feedback is intentionally enhanced at the initial locations on the chain. Owing to the recurrent nature of the network, the bump would ordinarily ‘bounce’ back upon reaching the end of the chain. However, this behaviour is inconsistent with observed data. To address this, we introduced a fading mechanism for excitatory-to-excitatory connections as the bump approaches the boundary, simulating a ‘boundary effect’. This gradual reduction in connectivity causes the bump to diminish as it reaches the end point, resulting in an automatic cessation of activity that mimics the natural termination of a motif. These two boundary behaviours were implemented by multiplying the connection strength with a compensation factor:
where \(\phi \) is an offset term, for which we take the value \(\phi =0.5\sigma N/2{\rm{\pi }}\). The compensated connection matrices are shown in Fig. 5c.
Truncation
To simulate optogenetic stimulation truncating HVC neuronal sequences observed in experimental studies, we applied an intense, spatially homogeneous pulse input to either HVCRA or HVCX neurons. Following this stimulation, both \({{\bf{r}}}_{{\rm{RA}}}\) and \({{\bf{r}}}_{{\rm{X}}}\) became hyper-activated, leading to rapid suppression by the fast response of \({{\bf{r}}}_{{\rm{g}}}\). These neurons remain suppressed until \({{\bf{r}}}_{{\rm{g}}}\) activity subsides, corresponding to the observed motif truncation (Fig. 5e,f). Subsequently, the peri-song neurons reinitiate the neural sequence, allowing the motif to resume from the beginning. Considering that HVCRA and HVCX are connected symmetrically in the current model, we only simulated optogenetic stimulation on HVCRA as a verification.
HVCX degradation
To simulate the effects of degradation of HVCX neuron neurotransmission, as observed in Fig. 5g,h, we manually modified the output projections of HVCX. Let \(p\) denote the proportion of degradation. Under this condition, the degraded projection from HVCX to HVCRA (\({W}_{{\rm{X}},\mathrm{RA}}{\prime} \)) can be expressed as
where \({W}_{{\rm{X}},\mathrm{RA}}^{({ij})}\) represents the original connection strength, \({\sigma }_{W}\) denotes the variation coefficient, \({\xi }_{{ij}}\) is an independent Gaussian noise term indexed by the pre-synaptic and post-synaptic neuron indices ij, and \({[x]}_{+}=\max (x,0)\) denotes rectification of negative values, ensuring that the weight remains excitatory (non-negative).
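A minimal Python sketch of one way such a degradation could be applied is given below. The exact expression used in the model is not reproduced here; the multiplicative form, in which each weight is scaled by (1 − p) and perturbed by a factor (1 + σ_W ξ) before rectification, is an assumption chosen for illustration, as is the function name `degrade_weights`.

```python
import random


def degrade_weights(weights, p, sigma_w, rng=None):
    """Return a degraded copy of a weight matrix (list of rows).

    Assumed form, per entry:  w' = max((1 - p) * w * (1 + sigma_w * xi), 0)
    where xi is an independent standard Gaussian sample for every entry.

    p       : proportion of degradation, between 0 and 1
    sigma_w : variation coefficient scaling the Gaussian perturbation
    rng     : optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    return [
        [max((1.0 - p) * w * (1.0 + sigma_w * rng.gauss(0.0, 1.0)), 0.0) for w in row]
        for row in weights
    ]


# Without noise, degradation reduces to uniform scaling by (1 - p)
scaled = degrade_weights([[1.0, 2.0], [4.0, 0.0]], p=0.25, sigma_w=0.0)

# With strong noise, rectification keeps every weight non-negative
noisy = degrade_weights([[1.0] * 50] * 50, p=0.5, sigma_w=3.0, rng=random.Random(0))
```

The rectification step matters only when the noise term is large enough to drive a weight below zero; with small σ_W the result is close to a uniform reduction of the HVCX output.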
During synaptic degradation over weeks, experiments revealed that neuronal sequences observed in different trials within the same day could traverse and then disappear at random locations on the chain. We assume that the synaptic weights within the same day are nearly the same, and that the random progression along the chain results from the variability of single neurons. Therefore, to reproduce the random progression along the chain during synaptic degradation, each HVCRA neuron \({{\bf{r}}}_{{\rm{RA}}}(j)\) receives a Poisson-like noise \({I}_{\mathrm{noise}}\), mimicking stochastic spike generation:
where \(F\) is the Fano factor scaling the noise and \(\xi (t)\) is standard Gaussian white noise. The noise received by different neurons is mutually independent. Under these conditions, we observed that the sequences terminated at random positions. As illustrated in Fig. 5g, the average sequence length decreased as the proportion of neuronal degradation increased.
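The idea of Poisson-like noise can be sketched in Python as follows. The sketch assumes that the noise amplitude scales as the square root of the Fano factor times the firing rate, so that the variance grows linearly with the rate, as for a Poisson process; this specific form, the time discretization and the function names are assumptions for illustration and are not taken from the model code.

```python
import math
import random


def poisson_like_noise(rate, fano, dt, rng):
    """Draw one discretized sample of the assumed noise sqrt(F * r) * xi(t).

    Gaussian white noise discretized with time step dt has variance 1/dt,
    so each sample of xi is a standard normal draw divided by sqrt(dt).
    Negative rates are clipped to zero before taking the square root.
    """
    xi = rng.gauss(0.0, 1.0) / math.sqrt(dt)
    return math.sqrt(fano * max(rate, 0.0)) * xi


def sample_variance(values):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


# Independent samples for one neuron at a fixed rate; with dt = 1 the
# variance of the noise should be close to fano * rate
rng = random.Random(0)
samples = [poisson_like_noise(4.0, 2.0, 1.0, rng) for _ in range(20000)]
```

Drawing a separate sample for every neuron at every time step keeps the noise independent across neurons, as stated in the text.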
Statistical analysis
All data were analysed with GraphPad Prism 10. Data were tested for normality using the Shapiro–Wilk test. Parametric and non-parametric statistical tests were used. To compare two groups, t-tests, Mann–Whitney tests and Kolmogorov–Smirnov tests were used. For more than two conditions, one-way or two-way ANOVA or the Kruskal–Wallis test was performed. Cumulative probability curves were calculated for each animal and then tested in groups for statistical significance. Only one comparison among all groups was made to avoid repeatedly comparing the same dataset (HVC) with individual other datasets. Fisher’s or χ² tests, followed by Dunn’s post hoc test, were used to compare the probability of finding optically evoked responses across the HVC projection neuron classes while stimulating the different afferents. Dunn’s, Sidak’s or Holm–Sidak’s post hoc tests were used to correct for multiple comparisons. Statistical significance is denoted as *P < 0.05, **P < 0.01 and ***P < 0.001.
Statistics and reproducibility
Each experimental result was produced independently and/or by combining at least two separate cohorts with similar results (for example, Uva lesions/silencing in Fig. 3 and Extended Data Fig. 4, multi-nuclei lesions in Extended Data Fig. 6 and HVCX–TeNT experiments in Fig. 5 and Extended Data Fig. 13). Figures showing viral expression or lesion extent are broadly representative of each experimental group.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The datasets generated and/or analysed during this study are available in the Texas Data Repository (https://dataverse.tdl.org/dataverse/trusel_Nature_2025_data).
Code availability
The code for model simulations and visualizations is available at GitHub (https://github.com/Zack-zuo/SongbirdHVC#)74. Anatomical atlas drawings are adapted from the ZEBrA database (Oregon Health & Science University; http://www.zebrafinchatlas.org)60. Source data are provided with this paper.
References
Aronov, D., Andalman, A. S. & Fee, M. S. A specialized forebrain circuit for vocal babbling in the juvenile songbird. Science 320, 630–634 (2008).
Doeringer, J. A. & Hogan, N. Serial processing in human movement production. Neural Networks 11, 1345–1356 (1998).
Jin, X., Tecuapetla, F. & Costa, R. M. Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences. Nat. Neurosci. 17, 423–430 (2014).
Lashley, K. S. in Cerebral Mechanisms in Behavior; the Hixon Symposium (ed. Jeffress, L. A.) 112–146 (Wiley, 1951).
Gallistel, C. R. Précis of Gallistel’s The Organization of Action: A New Synthesis. Behav. Brain Sci. 4, 609–650 (1981).
Okubo, T. S., Mackevicius, E. L., Payne, H. L., Lynch, G. F. & Fee, M. S. Growth and splitting of neural sequences in songbird vocal development. Nature 528, 352–357 (2015).
Newell, A. & Rosenbloom, P. S. in Cognitive Skills and Their Acquisition (ed. Anderson, J. R.) 1–55 (Lawrence Erlbaum Associates, 1981).
Brainard, M. S. & Doupe, A. J. What songbirds teach us about learning. Nature 417, 351–358 (2002).
Hahnloser, R. H., Kozhevnikov, A. A. & Fee, M. S. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature 419, 65–70 (2002).
Fee, M. S., Kozhevnikov, A. A. & Hahnloser, R. H. Neural mechanisms of vocal sequence generation in the songbird. Ann. N Y Acad. Sci. 1016, 153–170 (2004).
Long, M. A., Jin, D. Z. & Fee, M. S. Support for a synaptic chain model of neuronal sequence generation. Nature 468, 394–399 (2010).
Lynch, G. F., Okubo, T. S., Hanuschkin, A., Hahnloser, R. H. & Fee, M. S. Rhythmic continuous-time coding in the songbird analog of vocal motor cortex. Neuron 90, 877–892 (2016).
Picardo, M. A. et al. Population-level representation of a temporal sequence underlying song production in the zebra finch. Neuron 90, 866–876 (2016).
Ashmore, R. C., Renk, J. A. & Schmidt, M. F. Bottom-up activation of the vocal motor forebrain by the respiratory brainstem. J. Neurosci. 28, 2613–2623 (2008).
Ashmore, R. C., Wild, J. M. & Schmidt, M. F. Brainstem and forebrain contributions to the generation of learned motor behaviors for song. J. Neurosci. 25, 8543–8554 (2005).
Hamaguchi, K., Tanaka, M. & Mooney, R. A distributed recurrent network contributes to temporally precise vocalizations. Neuron 91, 680–693 (2016).
Moll, F. W. et al. Thalamus drives vocal onsets in the zebra finch courtship song. Nature 616, 132–136 (2023).
Long, M. A. & Fee, M. S. Using temperature to analyse temporal dynamics in the songbird motor pathway. Nature 456, 189–194 (2008).
Elmaleh, M., Kranz, D., Asensio, A. C., Moll, F. W. & Long, M. A. Sleep replay reveals premotor circuit structure for a skilled behavior. Neuron 109, 3851–3861 (2021).
Armstrong, E. & Abarbanel, H. D. Model of the songbird nucleus HVC as a network of central pattern generators. J. Neurophysiol. 116, 2405–2419 (2016).
Amador, A., Perl, Y. S., Mindlin, G. B. & Margoliash, D. Elemental gesture dynamics are encoded by song premotor cortical neurons. Nature 495, 59–64 (2013).
Danish, H. H., Aronov, D. & Fee, M. S. Rhythmic syllable-related activity in a songbird motor thalamic nucleus necessary for learned vocalizations. PLoS ONE 12, e0169568 (2017).
Vu, E. T., Mazurek, M. E. & Kuo, Y. C. Identification of a forebrain motor programming network for the learned song of zebra finches. J. Neurosci. 14, 6924–6934 (1994).
Ashmore, R. C., Bourjaily, M. & Schmidt, M. F. Hemispheric coordination is necessary for song production in adult birds: implications for a dual role for forebrain nuclei in vocal motor control. J. Neurophysiol. 99, 373–385 (2008).
Histed, M. H., Bonin, V. & Reid, R. C. Direct activation of sparse, distributed populations of cortical neurons by electrical microstimulation. Neuron 63, 508–522 (2009).
Garcia-Oscos, F. et al. Autism-linked gene FoxP1 selectively regulates the cultural transmission of learned vocalizations. Sci. Adv. 7, eabd2827 (2021).
Franz, M. & Goller, F. Respiratory units of motor production and song imitation in the zebra finch. J. Neurobiol. 51, 129–141 (2002).
Marder, E. & Bucher, D. Central pattern generators and the control of rhythmic movements. Curr. Biol. 11, R986–996 (2001).
Berkowitz, A. Expanding our horizons: central pattern generation in the context of complex activity sequences. J. Exp. Biol. 222, jeb192054 (2019).
Coleman, M. J. & Vu, E. T. Recovery of impaired songs following unilateral but not bilateral lesions of nucleus uvaeformis of adult zebra finches. J. Neurobiol. 63, 70–89 (2005).
Trusel, M. et al. Synaptic connectivity of sensorimotor circuits for vocal imitation in the songbird. eLife https://doi.org/10.7554/eLife.104609 (2025).
Coleman, M. J., Roy, A., Wild, J. M. & Mooney, R. Thalamic gating of auditory responses in telencephalic song control nuclei. J. Neurosci. 27, 10024–10036 (2007).
Cynx, J. Experimental determination of a unit of song production in the zebra finch (Taeniopygia guttata). J. Comp. Psychol. 104, 3–10 (1990).
Cardin, J. A. & Schmidt, M. F. Auditory responses in multiple sensorimotor song system nuclei are co-modulated by behavioral state. J. Neurophysiol. 91, 2148–2163 (2004).
Foster, E. F. & Bottjer, S. W. Lesions of a telencephalic nucleus in male zebra finches: influences on vocal behavior in juveniles and adults. J. Neurobiol. 46, 142–165 (2001).
Bottjer, S. W., Miesner, E. A. & Arnold, A. P. Forebrain lesions disrupt development but not maintenance of song in passerine birds. Science 224, 901–903 (1984).
Roberts, T. F., Gobes, S. M., Murugan, M., Ölveczky, B. P. & Mooney, R. Motor circuits are required to encode a sensory model for imitative learning. Nat. Neurosci. 15, 1454–1459 (2012).
Ali, F. et al. The basal ganglia is necessary for learning spectral, but not temporal, features of birdsong. Neuron 80, 494–506 (2013).
Mooney, R. Different subthreshold mechanisms underlie song selectivity in identified HVc neurons of the zebra finch. J. Neurosci. 20, 5420–5436 (2000).
Fetterman, G. C. & Margoliash, D. Rhythmically bursting songbird vocomotor neurons are organized into multiple sequences, suggesting a network/intrinsic properties model encoding song and error, not time. Preprint at bioRxiv https://doi.org/10.1101/2023.01.23.525213 (2023).
Kozhevnikov, A. A. & Fee, M. S. Singing-related activity of identified HVC neurons in the zebra finch. J. Neurophysiol. 97, 4271–4283 (2007).
Fee, M. S. & Goldberg, J. H. A hypothesis for basal ganglia-dependent reinforcement learning in the songbird. Neuroscience 198, 152–170 (2011).
Scharff, C., Kirn, J. R., Grossman, M., Macklis, J. D. & Nottebohm, F. Targeted neuronal death affects neuronal replacement and vocal behavior in adult songbirds. Neuron 25, 481–492 (2000).
Sánchez-Valpuesta, M. et al. Corticobasal ganglia projecting neurons are required for juvenile vocal learning but not for adult vocal plasticity in songbirds. Proc. Natl Acad. Sci. USA 116, 22833–22843 (2019).
Hahnloser, R. H., Kozhevnikov, A. A. & Fee, M. S. Sleep-related neural activity in a premotor and a basal-ganglia pathway of the songbird. J. Neurophysiol. 96, 794–812 (2006).
Mooney, R. & Prather, J. F. The HVC microcircuit: the synaptic basis for interactions between song motor and vocal plasticity pathways. J. Neurosci. 25, 1952–1964 (2005).
Kosche, G., Vallentin, D. & Long, M. A. Interplay of inhibition and excitation shapes a premotor neural sequence. J. Neurosci. 35, 1217–1227 (2015).
Kornfeld, J. et al. EM connectomics reveals axonal target variation in a sequence-generating network. eLife https://doi.org/10.7554/eLife.24364 (2017).
Daliparthi, V. K. et al. Transitioning between preparatory and precisely sequenced neuronal activity in production of a skilled behavior. eLife https://doi.org/10.7554/eLife.43732 (2019).
Sadeh, S. & Clopath, C. Inhibitory stabilization and cortical computation. Nat. Rev. Neurosci. 22, 21–37 (2021).
Bera, K., Shukla, A. & Bapi, R. S. Motor chunking in internally guided sequencing. Brain Sci. 11, 292 (2021).
Tosatto, L., Fagot, J., Nemeth, D. & Rey, A. Chunking as a function of sequence length. Anim. Cogn. 28, 2 (2024).
Fonollosa, J., Neftci, E. & Rabinovich, M. Learning of chunking sequences in cognition and behavior. PLoS Comput. Biol. 11, e1004592 (2015).
Ramkumar, P. et al. Chunking as the result of an efficiency computation trade-off. Nat. Commun. 7, 12176 (2016).
Lai, L., Huang, A. Z. X. & Gershman, S. J. Action chunking as conditional policy compression. Cognition 264, 106201 (2025).
Mackevicius, E. L., Happ, M. T. L. & Fee, M. S. An avian cortical circuit for chunking tutor song syllables into simple vocal-motor units. Nat. Commun. 11, 5029 (2020).
Piristine, H. C., Choetso, T. & Gobes, S. M. A sensorimotor area in the songbird brain is required for production of vocalizations in the song learning period of development. Dev. Neurobiol. 76, 1213–1225 (2016).
Roberts, T. F. et al. Identification of a motor-to-auditory pathway important for vocal learning. Nat. Neurosci. 20, 978–986 (2017).
Stauffer, T. R. et al. Axial organization of a brain region that sequences a learned pattern of behavior. J. Neurosci. 32, 9312–9322 (2012).
Karten, H. J. et al. Digital atlas of the zebra finch (Taeniopygia guttata) brain: a high-resolution photo atlas. J. Comp. Neurol. 521, 3702–3715 (2013).
Peng, H., Bria, A., Zhou, Z., Iannello, G. & Long, F. Extensible visualization and analysis for multidimensional images using Vaa3D. Nat. Protoc. 9, 193–208 (2014).
Peng, H., Ruan, Z., Long, F., Simpson, J. H. & Myers, E. W. V3D enables real-time 3D visualization and quantitative analysis of large-scale biological image data sets. Nat. Biotechnol. 28, 348–353 (2010).
Peng, H. et al. Virtual finger boosts three-dimensional imaging and microsurgery as well as terabyte volume image visualization and analysis. Nat. Commun. 5, 4342 (2014).
Berg, S. et al. ilastik: interactive machine learning for (bio)image analysis. Nat. Methods 16, 1226–1232 (2019).
Tchernichovski, O., Nottebohm, F., Ho, C. E., Pesaran, B. & Mitra, P. P. A procedure for an automated measurement of song similarity. Anim. Behav. 59, 1167–1176 (2000).
Koch, T. M. I., Marks, E. & Roberts, T. F. AVN: a deep learning approach for the analysis of birdsong. Preprint at bioRxiv https://doi.org/10.1101/2024.05.10.593561 (2024).
Fung, C. C., Wong, K. Y. & Wu, S. A moving bump in a continuous manifold: a comprehensive study of the tracking dynamics of continuous attractor neural networks. Neural Comput. 22, 752–792 (2010).
Amari, S. Dynamics of pattern formation in lateral-inhibition type neural fields. Biol. Cybern. 27, 77–87 (1977).
Wu, S., Hamaguchi, K. & Amari, S. Dynamics and computation of continuous attractors. Neural Comput. 20, 994–1025 (2008).
Wang, C. et al. BrainPy, a flexible, integrative, efficient, and extensible framework for general-purpose brain dynamics programming. eLife https://doi.org/10.7554/eLife.86365 (2023).
Zuo, J., Liu, X., Wu, Y. N., Wu, S. & Zhang, W.-H. A recurrent neural circuit mechanism of temporal-scaling equivariant representation. In Proc. 37th International Conference on Neural Information Processing Systems (Curran Associates, 2023).
Niell, C. M. Cell types, circuits, and receptive fields in the mouse visual cortex. Annu. Rev. Neurosci. 38, 413–431 (2015).
Zhang, W. & Wu, S. Neural information processing with feedback modulations. Neural Comput. 24, 1695–1721 (2012).
Zuo, J. Recurrent model of Songbird HVC. GitHub https://github.com/Zack-zuo/SongbirdHVC# (2025).
Acknowledgements
We thank T. Gentner, W. Dauer, D. Hattori, S. Choi and members of the Roberts laboratory for comments on an initial version of the paper, and M. Long, M. Schmidt and D. Aronov for valuable discussions of our data as the project unfolded. We are grateful to J. Holdway, L. Garcia and R. Cabuco for laboratory support. We thank D. Ramirez and A. Nawaby (University of Texas Southwestern Medical Center Whole Brain Microscopy Facility, RRID: SCR_017949) for assistance with 3D imaging and rendering. We also thank J. Hilton, R. Hunte and P. Jennings for administrative support. W.-H.Z. is supported by the UT Southwestern Endowed Scholar Program. Finally, we acknowledge the National Institutes of Health for supporting this research through grant nos. UF1NS115821 and R01NS108424 to T.F.R. and F99NS124172 to D.H.A.
Author information
Contributions
M.T. and T.F.R. conceived the project. D.H.A., J.C. and H.P. adapted and produced the viral tools. J.Z. and W.-H.Z. designed and conducted the modelling work. M.T. designed the methodology and performed the optogenetic manipulations, lesions and electrophysiological recordings. M.T. and B.G.C. designed the methodology and performed the air sac pressure recordings with optogenetic stimulations. Z.Z. performed Neuropixels 2.0 recordings for the revision process (see response to reviewers). M.T., J.Z., E.S.M., T.M.I.K. and B.G.C. visualized the project. T.F.R. acquired funding and administered and supervised the project. M.T., W.-H.Z. and T.F.R. wrote the original draft of the paper. All authors contributed to writing, reviewing and editing the paper. All data are available in the main text or the supplementary materials.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Yarden Cohen, Melissa Coleman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Effects of optogenetic stimulation of HVC neurons in singing zebra finches.
a) Schematic of zebra finch song circuits (sagittal view), including HVC’s afferents (grey) from Uva (nucleus Uvaeformis), NIf (nucleus interface of the nidopallium), mMAN (medial magnocellular nucleus of the anterior nidopallium), and Av (nucleus avalanche); HVC projections to RA (robust nucleus of the arcopallium) via HVCRA neurons (magenta), to the striatopallidal region Area X via HVCX neurons (cyan), and to Av via HVCAv neurons (yellow); HVC inhibitory interneurons (black); the cortico-basal ganglia-thalamocortical song pathway (brown dashed lines), and the corticobulbar song motor pathway from RA (green). b) HVC multiunit neuronal activity recording in anesthetized birds expressing ChRmine in HVC. Sample trace (top, scale bar 1 s, 1 V), raster plot (middle, 10 trials) and normalized peri-stimulus time histogram (bottom) reporting the change in multi-unit HVC firing activity in response to light stimulation (100 ms, red bar; two-way ANOVA comparing the curve between the 300 ms before and 300 ms after the 100 ms stimulation versus corresponding 700 ms baseline without stimulation: interaction F(69,621) = 8.137 P < 0.001, stimulation F(1,9) = 11.20 P = 0.0086, Sidak post-hoc P < 0.05 between 20 ms after the light onset and 40 ms after the light offset); inset displays magnified detail of the PSTH and scatter plot highlighting and computing the average (per hemisphere) response to the first 100 ms light stimulation (red dashed rectangle) compared to the last 100 ms baseline (black dashed rectangle, Wilcoxon test P = 0.002; n = 10 hemispheres, 5 birds). c) Additional spectrograms (0–11 kHz, scale bar 200 ms, horizontal lines identify song elements) from the bird in Fig. 1b (same lettering and symbols) displaying multiple events of optogenetically-evoked truncations followed by rapid restart of a motif (orange circle) or continuation of the motif after a pause (green) normally not present in the unstimulated motif within 1 s following stimulation. 
d) Violin plots reporting accuracy of song segments with (gray) and without (white) stimulation for each bird (n = 6; two-way ANOVA testing the effect of optogenetic stimulation for each bird across the group, interaction F(5,114) = 8.178 P < 0.001, CTRL vs. STIM, F(1,114) = 55.22 P < 0.001). e-g) Same as (d) but for Amplitude (two-way ANOVA, interaction F(5,114) = 19.49 P < 0.001, CTRL vs. STIM, F(1,114) = 246.4 P < 0.001), Entropy (two-way ANOVA, interaction F(5,114) = 23.76 P < 0.001, CTRL vs. STIM, F(1,114) = 322.5 P < 0.001) and Goodness of pitch (two-way ANOVA, interaction F(5,114) = 17.66 P < 0.001, CTRL vs. STIM, F(1,114) = 101.4 P < 0.001). h) Average ±SEM probability of post-truncation behavior (within 1 s from truncation: no vocalization resumption (black), motif restart with any introductory note or syllable A (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause normally not present in control motifs (green)) following HVC light stimulation computed based on the time of stimulation through the progression of the motif (bins = 10% motif advancement). i) Box plots (5-95 percentile, 25,50,75 percentile) reporting probability of motif restart for each bird (orange dots). The underlying shaded areas represent the probability, for each of the birds, of producing a motif after any one motif (see Methods; this provides the basis for normalization of motif restart probability; dashed lines show the maximum, median, and minimum). j) Spectrogram (top, as (c)) and subsyringeal air pressure (bottom, black trace) relative to 3 motifs (stimulation: red bars, 10 ms, lettering and symbols as per panel (c), scale bar 200 ms). (bottom) Grey semitransparent horizontal line indicates ambient pressure; supra-atmospheric pressure indicates expiratory air pressure and subatmospheric pressure indicates inspiratory air pressure. The inset (right) shows the average of syllable C (grey shading ±2 SD) in control (black) and stimulated trials (blue line). 
The yellow shading indicates the significant reduction in air pressure caused by optogenetic stimulation of HVC neurons. k) Average ±SEM of cumulative probability distributions calculated for each bird whose pressure was recorded, displaying the latency to truncation as measured for each bird from spectrograms (black line) or from subsyringeal pressure (blue line) (10 ms time bins, two-way ANOVA testing the difference between truncation latency distributions, interaction F(51,153) = 149.2, pressure vs. song F(1,3) = 19.50 P < 0.001, Sidak’s post-hoc identifies significant difference (P < 0.05) between 40 and 60 ms time bins); (inset) violin plots reporting the latency of motif truncation computed across all the birds (latency calculated from spectrograms (white), latency calculated from pressure for the same 4 birds (blue); two-sided Mann-Whitney U = 236 P < 0.001). l) Violin plots reporting the duration of the optogenetically-evoked apnea in events where truncations were not followed by restarting the song motif, computed across all stimulations for each of the 4 birds. Brain outline in a adapted with permission from ref. 60, Wiley.
Extended Data Fig. 2 Dual effect of eGtACR1 at somatic and axonal compartments.
a) Schematics (top), and in-vivo recordings under anesthesia (bottom) showing the dual effect of eGtACR1-mediated stimulation of axonal terminals causing neurotransmitter release (red), and somatic inhibition (blue). Sample traces, raster plots and PSTH (average ±SEM fold change in HVC activity binned each 100 ms, n = 5 hemispheres, 3 birds) reporting in-vivo recordings of HVC neuronal activity change in response to NIf axon terminal excitation (red) or soma inhibition (blue) upon light delivery (region shaded in green; scale bar 200 ms, 1 V). b) Sample traces of evoked EPSPs from an eGtACR1-expressing neuron in the absence or presence of somatic light-mediated inhibition (green bar; current injection 100 ms 50 pA steps, −200 pA to +400 pA, scale bar 100 ms, 20 mV). Brain outlines adapted with permission from ref. 60, Wiley.
Extended Data Fig. 3 Role of Uva and Peri-Uva thalamus in song.
a, b) Data from birds expressing eGtACR1 in Uva and implanted with optic fibers over HVC: a) Violin plots reporting Entropy of song segments with (gray) and without (white) stimulation (n = 4; two-way ANOVA testing the effect of optogenetic stimulation, interaction F(3,76) = 26.18 P < 0.001, CTRL vs. STIM, F(1,76) = 1.099 P = 0.2979). b) Violin plots reporting accuracy of song segments with (gray) and without (white) stimulation (n = 4) when using 1 s long stimulation (two-way ANOVA testing the effect of optogenetic stimulation, interaction F(3,76) = 7.91 P < 0.001, CTRL vs. STIM, F(1,76) = 0.06636 P = 0.7974). c) Data from birds expressing ChRmine in UvaHVC neurons and implanted with optic fibers over Uva. Violin plots reporting Entropy of song segments with (200 ms stimulation, gray) and without (white) stimulation (n = 3; two-way ANOVA testing the effect of optogenetic stimulation, interaction F(2,57) = 0.3931 P = 0.6768, CTRL vs. STIM, F(1,57) = 3.500 P = 0.0665). d) Schematic and sample image showing non-selective expression of AAV-ChRmine in Uva and peri-Uva thalamus, followed by thalamic song-contingent stimulation. Sample spectrogram (scale bar 200 ms, horizontal lines mark song elements) displaying motif truncation at syllable boundaries caused by thalamic light stimulation (50 ms 532 nm light pulses, red bars). Sample image illustrating ChRmine expression (red, scale bar 200 ms) in Uva (UvaHVC neurons labeled by tracer injection in HVC (green)) and peri-Uva thalamus, stimulated by light delivered through the implanted fiber optic (white dashed lines at top of image). e) (left) Plot showing amplitude of all the stimulated (red line) motifs, ordered by time of stimulation in the motif. 
(right) Plot reporting a subset of stimulated motifs’ latency to optogenetic stimulation (red circle), motif truncation (blue “x”), and restart of a motif (orange), intro notes not followed by a motif (purple), calls (grey) or continuation of the motif after a pause (green) normally not present in the unstimulated motif within 1 s following stimulation. f) Box and scatter plot reporting the probability of motif stop (okra), pause and continuation of the motif (green) or absence of syntactic perturbation (gray) after the light stimulation (thalamus stimulated birds, n = 2, filled squares; empty box plots from HVC stimulation in Fig. 1e reported for comparison; two-way ANOVA testing the difference between stimulation outcome probabilities across all experimental groups, interaction F(14,46) = 57.75 P < 0.001, stimulated subpopulation F(7,23) = 1.088 P = 0.4027, Dunnett’s post-hoc pan-HVC vs. thalamus, motif stop P = 0.9883, pause+continuation P = 0.6761, no perturbation P = 0.9728). g) Cumulative probability curves reporting the latency to song truncation in response to the light stimulation (average ±SEM of each bird’s curve; thalamus-stimulated birds (blue), HVC-stimulated birds (black) dataset from Fig. 1 compared against all experimental groups across the manuscript, 10 ms time bins, two-way ANOVA testing the difference between truncation latency distributions, interaction F(255,867) = 2.351 P < 0.001, stimulated subpopulation F(5,17) = 4.142 P = 0.0121, Tukey’s post-hoc pan-HVC vs. thalamus identifies significant (P < 0.05) differences between 60 and 140 ms time bins. (inset) Violin plots reporting the latency of motif truncation computed across all the birds (thalamus-stimulated birds (blue), HVC-stimulated birds (white) dataset from Fig. 1 compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between truncation latencies, Kruskal Wallis test H(5) = 468.9, post-hoc HVC vs. thalamus P < 0.001). 
h) Average ±SEM latency of motif truncation in response to thalamic light stimulation across the motif (bins= 10% motif advancement). i) Violin plots reporting the latency of motif truncation upon light stimulation, per bird (thalamus stimulation (blue), HVC stimulation (white, dataset from Fig. 1); nested one-way ANOVA comparing all datasets across the manuscript, F(5.17) = 4.175 P = 0.0117, Dunnett’s post-hoc pan-HVC vs. thalamus P = 0.0028). j) plot representing the distribution of all motif truncation times in relation to the nearest syllable (or complex syllable segment) end. 0 ms indicates truncation occurring at the natural end of the syllable or complex syllable segment, as indicated in17. Thalamic stimulation results in significant truncation prevalence at syllable end, compared to pan-HVC stimulation (AVG ± SEM, pan-HVC n = 6 birds, thalamus n = 2 birds). k) plot representing the truncation latency distribution for thalamic or pan-HVC stimulation (AVG ± SEM, pan-HVC n = 6 birds, thalamus n = 2 birds). l) Average ±SEM probability of post-truncation behavior (within 1 s from truncation: no vocalization resumption (black), motif restart with any introductory note or syllable A (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause normally not present in control motifs (green)) following thalamic light stimulation computed based on the time of stimulation through the progression of the motif (bins= 10% motif advancement). m) Box plots (5-95 percentile, 25,50,75 percentile) reporting the probability (left) and normalized probability (right) of motif restart (thalamus-stimulated birds (orange); empty box plots representing data from birds receiving HVC stimulation reported from Extended Data Fig. 1i (left) and Fig. 1i (right) respectively; one-way ANOVA comparing restart probability across groups, F(5,17) = 6.099 P = 0.0021, Dunnett’s post-hoc pan-HVC vs. 
thalamus P = 0.0011; one-way ANOVA, testing the difference between groups’ normalized restart probabilities F(5,17) = 9.939 P < 0.0001, Dunnett’s post-hoc HVC vs. thalamus: P < 0.001). The underlying shaded areas represent the probability, for each of the birds, of producing a motif after any one motif (see methods, provides the basis for normalization of motif restart probability; dashed lines show the maximum, median, and minimum). n) Violin plots reporting the latency of motif restart (orange: thalamus stimulation, white: HVC stimulation dataset from Fig. 1 compared against all experimental groups across the manuscript; nested one-way ANOVA comparing latency to restart across groups, F(5,17) = 6.119 P = 0.0020, Dunn’s post-hoc pan-HVC vs. thalamus P < 0.001). o) Same as panel (g) but for latency to motif restart. Two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P = 0.0036, Tukey’s post-hoc pan-HVC vs. thalamus identifies significant difference (p < 0.05) between 70 and 640 ms time bins. (inset) One-Way ANOVA testing the difference between truncation latencies, Kruskal Wallis test H(6) = 244.7, post-hoc HVC vs. thalamus P < 0.001). Brain outline in d adapted with permission from ref. 60, Wiley.
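The cumulative latency curves described in panels g and o (10 ms time bins, average ±SEM of each bird's curve) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the 40 ms window in the example and the latency values are hypothetical, and the legend does not say whether a latency that falls exactly on a bin edge counts toward that bin.

```python
from math import sqrt
from statistics import mean, stdev


def cumulative_curve(latencies_ms, bin_ms=10, max_ms=300):
    """Fraction of events with latency <= each bin edge (10, 20, ... ms).

    Assumption: an event exactly on a bin edge is counted in that bin.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency")
    n = len(latencies_ms)
    edges = range(bin_ms, max_ms + 1, bin_ms)
    return [sum(1 for x in latencies_ms if x <= edge) / n for edge in edges]


def mean_sem_across_birds(curves):
    """Bin-wise mean and SEM of the per-bird cumulative curves."""
    n = len(curves)
    means = [mean(col) for col in zip(*curves)]
    sems = [stdev(col) / sqrt(n) if n > 1 else 0.0 for col in zip(*curves)]
    return means, sems


# Made-up latencies (ms) for two hypothetical birds; not data from the paper.
bird_a = cumulative_curve([5, 15, 15, 35], max_ms=40)
bird_b = cumulative_curve([25, 25, 35, 38], max_ms=40)
avg, sem = mean_sem_across_birds([bird_a, bird_b])
```

Computing one curve per bird and averaging afterwards gives each bird equal weight regardless of how many motifs it contributed, which matches the "average ±SEM of each bird's curve" wording.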
Extended Data Fig. 4 Role of Uva in motif initiation.
a) Map of excitotoxic thalamic lesions. Brain atlas plates adapted from the zebra finch atlas (http://www.zebrafinchatlas.org), showing magnified views of the peri-Uva thalamic area from 1 to 2 mm across the mediolateral axis (Uva highlighted in green). Extent of the lesion as measured from the lack of NeuN staining is schematized for each bird as a semitransparent red area (dashed black for degeneration) and overlaid across birds by subgroup (peri-thalamus+Uva (left), peri-thalamus excluding Uva (middle), Uva excluding the larger perithalamic areas (right)). DLM – medial part of the dorsolateral nucleus of the anterior thalamus, CP – posterior commissure, Ov – nucleus ovoidalis, OM – occipitomesencephalic tract, ICo – intercollicular nucleus, SN – substantia nigra, Uva – nucleus Uvaeformis, DLP – dorsolateral nucleus of the posterior thalamus, DLA – dorsolateral nucleus of the anterior thalamus, IPo – intermedioposterior nucleus, SpM – nucleus spiriformis medialis, Rt – nucleus rotundus, NIf – nucleus interfacialis. b) Box plots (5-95 percentile, 25,50,75 percentile) reporting average number of TeNT-expressing UvaHVC neurons/brain slice (purple, n = 6 birds). c) Syntax raster plots (~100 song bouts/day) from bird in Fig. 3f showing syntax changes due to TeNT expression in UvaHVC neurons. d) Schematic, sample image (scale bar 200 µm, white lines mark Uva and optic fiber boundaries), and spectrograms (horizontal lines identify song element boundaries, scale bar 1 s) reporting effects of eGtACR1-mediated inhibition (blue lines, 1 s) of Uva neurons. e) Optical inhibition of Uva using eGtACR1 increases song bout terminations (n = 5 birds, control (no light, grey), Uva inhibition (blue); two-sided paired t-test P = 0.0238). f) Motif self-similarity during optical inhibition of Uva (two-sided paired t-test P = 0.5902). g,h) Optical excitation of Uva terminals over HVC in the birds from Fig. 
3k does not significantly affect song bout terminations (g) (n = 4 birds; control: black, no light; Uva terminal excitation: red, light delivered over eGtACR1-expressing Uva neurons for 1 s; two-sided paired t-test P = 0.2211) or motif self-similarity (h) (two-sided paired t-test P = 0.9429). Brain outlines in a and d adapted with permission from ref. 60, Wiley.
Extended Data Fig. 5 Optogenetic stimulation of pallial afferents to HVC does not disrupt the song motif.
a) Schematic of in-vivo recording of HVC multiunit neuronal activity in anesthetized birds expressing eGtACR1 in NIf; sample trace (top), raster plot (mid, 10 trials) and normalized peri-stimulus time histogram (bottom) reporting the change in multi-unit HVC firing activity in response to light stimulation of eGtACR1-expressing NIf afferents (1 s from 0.5 s to 1.5 s, red bar; two-way ANOVA comparing the curve between the 0 and 5 s versus corresponding 5 s baseline without stimulation, interaction F(499,10978) = 9.255 P < 0.001, stimulation F(1,22) = 12.07 P = 0.0022, Sidak’s post-hoc identifies difference at 0.92-1.79 s). Inset shows magnified PSTH and scatter plot with the average (per hemisphere) response to the first 200 ms light stimulation (red dashed rectangle) compared to the last 200 ms baseline (black dashed rectangle, Wilcoxon test P = 0.0017; n= hemispheres, birds). b) (top) Schematic of song-contingent light stimulation of NIf axonal terminals in HVC (optic fiber implanted over HVC) and sample spectrogram of unstimulated (top) and stimulated song (bottom, red bars, 200 ms ≈10 mW bilateral 473 nm LED, spectrogram scale, 0-11KHz, scale bar 200 ms, horizontal lines identify the bouts’ (black), introductory notes’ (light gray), motifs’ (dark gray) and linker syllable’s (brown) boundaries). c) Violin plots reporting accuracy of the stimulated song segment (gray), or corresponding control unstimulated segment (white) per each bird (n = 4); two-way ANOVA, interaction F(3,76) = 1.795 P = 0.1553, CTRL vs. STIM, F(1,76) = 0.7208 P = 0.3985). d) Same as (c) but for Entropy (n = 4; two-way ANOVA, interaction F(3,76) = 1.882 P = 0.1397, CTRL vs. STIM, F(1,76) = 0.1807 P = 0.6720). e) same as (c) but for goodness of pitch (n = 4; two-way ANOVA, interaction F(3,76) = 4.553 P = 0.0055, CTRL vs. STIM, F(1,76) = 2.301 P = 0.1334). 
f) same as (c) but for Accuracy of the entire motif for birds receiving 1 s light stimulation (n = 4; two-way ANOVA, interaction F(3,76) = 2.417 P = 0.0728, CTRL vs. STIM, F(1,76) = 3.072 P = 0.0837). g-l) same as (a-f), but for eGtACR1 expression in mMAN: (g) two-way ANOVA comparing the curve between the 0 and 5 s versus corresponding 5 s baseline without stimulation, interaction F(499,14970) = 8.937 P < 0.001, stimulation F(1,30) = 16.68 P < 0.001, Tukey’s post-hoc identifies difference at 0.85–1.76 s; Inset: Wilcoxon test P = 0.00071; n= hemispheres, birds. (i) n = 4 birds, two-way ANOVA, interaction F(3,76) = 4.483 P = 0.006, CTRL vs. STIM, F(1,76) = 0.6008 P = 0.4407. (j) two-way ANOVA, interaction F(3,76) = 6.168 P < 0.001, CTRL vs. STIM, F(1,76) = 0.1036 P = 0.3119. (k) two-way ANOVA, interaction F(3,76) = 2.434 P = 0.0714, CTRL vs. STIM, F(1,76) = 2.828 P = 0.0967. (l) two-way ANOVA, interaction F(3,76) = 1.795 P = 0.1553, CTRL vs. STIM, F(1,76) = 0.2599 P = 0.6117). m-r) same as (a-f), but for eGtACR1 expression in Av. (m) two-way ANOVA comparing the curve between the 0 and 5 s versus corresponding 5 s baseline without stimulation, interaction F(499,9980) = 9.999 P < 0.001, stimulation F(1,20) = 39.56 P < 0.001, Tukey’s post-hoc identifies difference at 0.77 and 0.84-1.9 s; Inset: Wilcoxon test P < 0.001; n= hemispheres, birds. (o) n = 4 birds, two-way ANOVA, interaction F(3,76) = 3.524 P = 0.0189, CTRL vs. STIM, F(1,76) = 0.1304 P = 0.7190. (p) two-way ANOVA, interaction F(3,76) = 0.6458 P = 0.588, CTRL vs. STIM, F(1,76) = 0.7229 P = 0.3979. (q) two-way ANOVA, interaction F(3,76) = 5.05 P = 0.003, CTRL vs. STIM, F(1,76) = 0.1634 P = 0.2050. (r) two-way ANOVA,interaction F(3,76) = 3.936 P = 0.0115, CTRL vs. STIM, F(1,76) = 1.143 P = 0.2883). Brain outlines in a, g and m adapted with permission from ref. 60, Wiley.
Extended Data Fig. 6 Concurrent lesions of pallial afferents to HVC and RA do not disrupt the song motif.
a) Schematic and sample images of retrogradely labeled HVC afferent neurons in NIf, Uva, mMAN and Av (green) and RA afferent neurons in lMAN and HVC (red) (coronal slices for lMAN and mMAN, sagittal slices for HVC, NIf, Av, Uva; scale bar 500 µm, insets 100 µm). b) Box plots (5-95 percentile, 25,50,75 percentile) reporting the number of retrogradely labeled neurons in each brain area per hemisphere (green projecting to HVC, red to RA, n = 12 hemispheres, 7 birds). c) Box plot (top) and correlation (bottom) reporting the ratio of the number of retrogradely labeled neurons in mMAN and Uva (average = 1.46, R² = 0.6124, Spearman r = 0.7972 P = 0.0029). d) as (c) for the number of neurons in NIf and Uva (average = 6.40, R² = 0.4917, Spearman r = 0.7902 P = 0.0033). e) as (c) for the number of neurons in Av and Uva (average = 0.09, R² = 0.1726, Spearman r = 0.7112 P = 0.0093). f) as (c) for the number of neurons in lMAN and HVC (average = 1.23, R² = 0.3486, Spearman r = 0.6713 P = 0.0202). g) Schematic and sample image of the excitotoxic lesion of lMAN, mMAN, NIf and Av, combined with retrograde labeling from tracer injections in HVC (green) and RA (red). Tracer labeling reveals surviving afferent neurons and allows post-hoc unbiased estimation of the lesion extent (insets display magnified detail of retrogradely labeled HVCRA and UvaHVC cells; scale bar 500 µm, insets 100 µm). h) Box and scatter plot reporting the number of surviving retrogradely labeled neurons in each brain area per hemisphere (green projecting to HVC, red to RA; retrograde labeling in Uva and HVC serves as the reference for unbiased lesion quantification; n = 12 hemispheres, 6 birds). Grayed-out box plot outlines from panel b report control data for ease of comparison of the lesion extent. i) Quantification of mMAN, NIf, Av and lMAN lesion, per bird (n = 6 birds). j) Time course of song self-similarity (avg ±SEM, one-way ANOVA, mixed-effects analysis, F(1.129, 5.364) = 1.599 P = 0.2640, n = 6 birds). 
k) Cumulative probability curves reporting the number of motifs/bout sung by the birds before (black) and after (blue) the bilateral excitotoxic lesion of NIf, Av, mMAN and lMAN (n = 6 birds, two-way ANOVA, testing the difference between the distributions of the number of motifs in each bout before and after the lesion, interaction F(8,45) = 0.5989 P = 0.7736, pre- vs. post-lesion F(1,45) = 0.6098 P = 0.4390). l) Scatter plot of motif start failure rate before (black circles) and after (blue triangles) the bilateral excitotoxic lesion of NIf, Av, mMAN and lMAN (two-sided Wilcoxon test P = 0.4375; n = 6 birds). Brain outlines in a and g adapted with permission from ref. 60, Wiley.
Extended Data Fig. 7 Optogenetic excitation of Area X.
a) Experiment design, ChRmine expression in Area X and outline of the optic fiber track (scale bar 500 µm), and song spectrogram showing song-contingent light stimulation of Area X neurons (top: control, scale bar 200 ms; bottom: stimulated, 200 ms 532 nm light pulses, red bars, horizontal lines identify the bouts’ (black), introductory notes’ (light gray), motifs’ (dark gray) and linker syllable’s (brown) boundaries). b) Probability of motif truncation (ochre), pause and continuation of the motif (green) or absence of syntactic perturbation (gray) after the light stimulation of Area X (X stimulated birds, n = 5, filled circles; empty box plots from HVC stimulation in Fig. 1e reported for comparison; two-way ANOVA testing the difference between stimulation outcome probabilities across all experimental groups, interaction F(14,46) = 57.75 P < 0.001, stimulated subpopulation F(7,23) = 1.088 P = 0.4027, Dunnett’s post-hoc pan-HVC vs. Area X, motif stop P < 0.001, pause+continuation P = 0.2842, no perturbation P < 0.001). c-f) Violin plots reporting accuracy, entropy, amplitude and goodness of pitch of song segments with (gray) and without (white) stimulation for each bird (accuracy: n = 5, two-way ANOVA, testing the effect of optogenetic stimulation, interaction F(4,95) = 0.9003 P = 0.4671, CTRL vs. STIM, F(1,95) = 1.108 P = 0.2953; entropy: n = 5, two-way ANOVA, interaction F(4,95) = 6.521 P < 0.001, CTRL vs. STIM, F(1,95) = 16.50 P < 0.001; amplitude: two-way ANOVA testing the effect of optogenetic stimulation, interaction F(4,95) = 8.025 P < 0.001, CTRL vs. STIM, F(1,114) = 0.5747 P = 0.4503; goodness of pitch: two-way ANOVA, interaction F(4,95) = 9.002 P < 0.001, CTRL vs. STIM, F(1,114) = 6.606 P = 0.0117). Brain outline in a adapted with permission from ref. 60, Wiley.
Extended Data Fig. 8 Optogenetic excitation of RA.
a) Schematic of the song-contingent light stimulation of RA neurons expressing ChRmine, ChRmine expression in RA and outline of the optic fiber track (scale bar 500 µm), and spectrograms representing normal (top, scale bar 200 ms) and stimulated song (bottom, 10 ms 532 nm light pulses, red bars; horizontal lines identify the bouts’ (black), introductory notes’ (light gray), motifs’ (dark gray) and linker syllable’s (brown) boundaries) displaying light-evoked motif truncation of syllables followed by motif restart. b) (left) Stacked song amplitude plots showing all the stimulated motifs, ordered by the timing of stimulation from the motif onset (red line). (right) Plot reporting a subset of stimulated motifs’ latency to optogenetic stimulation (red circle), motif truncation (blue “x”), and restart of a motif (orange), intro notes not followed by a motif (purple), calls (grey) or continuation of the motif after a pause (green) normally not present in the unstimulated motif within 1 s following stimulation. c) The probability of motif stop (ochre), pause and continuation of the motif (green) or absence of syntactic perturbation (gray) after the light stimulation (RA stimulated birds, n = 5, filled circles; empty box plots from HVC stimulation in Fig. 1e reported for comparison; two-way ANOVA testing the difference between stimulation outcome probabilities across all experimental groups, interaction F(14,46) = 57.75 P < 0.001, stimulated subpopulation F(7,23) = 1.088 P = 0.4027, Dunnett’s post-hoc pan-HVC vs. RA, motif stop P = 0.8675, pause+continuation P = 0.4594, no perturbation P = 0.9933). d) Average latency ±SEM of motif truncation in response to RA light stimulation (bins = 10% motif advancement). e) Latency to motif truncation following light stimulation of RA or HVC (blue: RA stimulation, white: HVC dataset from Fig. 1; nested one-way ANOVA comparing latency to restart across groups, F(6,19) = 9.678 P < 0.001, Dunnett’s post-hoc pan-HVC vs. RA P = 0.9986). 
f) Cumulative probability curves reporting the latency to song truncation in response to the light stimulation (average ±SEM of each bird’s curve, blue: RA-stimulated birds, black: HVC-stimulated birds dataset from Fig. 1 compared against all experimental groups across the manuscript, 10 ms time bins, two-way ANOVA testing the difference between truncation latency distributions, interaction F(255,867) = 2.351 P < 0.001, stimulated subpopulation F(5,17) = 4.142 P = 0.0121, Tukey’s post-hoc pan-HVC vs. RA P > 0.05). (inset) latency of motif truncation computed across all the birds (blue: RA-stimulated birds, white: HVC-stimulated birds dataset from Fig. 1 compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between truncation latencies, Kruskal Wallis test H(5) = 468.9, post-hoc HVC vs. RA P > 0.9999). g) Average ±SEM probability of post-truncation vocalization resumption by category (motif restart (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause (green)) upon stimulation delivered at different latencies throughout the progression of the motif (bins = 10% motif advancement). h) Average ±SEM probability of post-truncation behavior (no vocalization resumption in the 1 s post-truncation (black), motif restart (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause (green)) in response to the RA light stimulation delivered at different latencies throughout the progression of the motif (bins = 10% motif advancement). i) (left) Box plots (5-95 percentile, 25,50,75 percentile) reporting the probability of motif restart following optogenetic stimulation of RA or HVC (orange dots: n = 5 RA-stimulated birds; empty box plot representing data from birds receiving HVC stimulation reported from Extended Data Fig. 1i. 
The underlying shaded areas represent the probability, for each of the birds, of producing a motif after any one motif (see methods, provides the basis for normalization of motif restart probability; dashed lines show the maximum, median, and minimum); one-way ANOVA testing the difference between groups’ restart probabilities, F(5,17) = 6.099 P = 0.0021, Dunnett’s post-hoc pan-HVC vs. RA P = 0.04996). (right) Normalized probability of post-truncation motif restart for each bird (RA stimulated birds: filled circles; empty box plots from HVC stimulated birds in Fig. 1i reported for comparison; one-way ANOVA testing the difference between groups’ normalized restart probabilities, F(5,17) = 9.939 P < 0.0001, Dunnett’s post-hoc HVC vs. RA: P = 0.0203). j) Violin plots reporting the latency to motif restart (orange: RA stimulation birds, white: HVC stimulation dataset from Fig. 1; nested one-way ANOVA comparing latency to restart across groups, F(5,17) = 6.119 P = 0.0020, Dunn’s post-hoc pan-HVC vs. RA P = 0.2186). k) Cumulative probability curves reporting the latency to post-truncation motif restart (average ±SEM of each bird’s curve, orange: RA, black: HVC dataset from Fig. 1 compared against all experimental groups across the manuscript, 10 ms time bins, two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P = 0.0009, Tukey’s post-hoc pan-HVC vs. RA identifies significant difference (p < 0.05) between 60 and 280 ms time bins). (inset) Latency to motif restart computed across all the birds (orange: RA stimulated birds, white: HVC birds dataset from Fig. 1 compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between restart latencies, Kruskal Wallis test H(6) = 244.7, post-hoc HVC vs. RA P < 0.001). Brain outline in a adapted with permission from ref. 60, Wiley.
Extended Data Fig. 9 Selective optogenetic stimulation of HVCRA neurons restarts the song motif.
a) Schematic of the experiment showing song-contingent in-vivo light stimulation of HVCRA neurons expressing the excitatory opsin ChRmine, and spectrograms representing normal (top, scale bar 200 ms) and stimulated song (bottom, 50 ms 532 nm light pulses, red bars) displaying light-evoked motif truncation of syllables followed by rapid motif restarting (horizontal lines identify the bouts’ (black), introductory notes’ (light gray) and motifs’ (dark gray) boundaries). b) (left) Stacked song amplitude plot showing all the stimulated motifs, ordered by the timing of stimulation from the motif onset (red line). (right) Plot reporting a subset of stimulated motifs’ latency to optogenetic stimulation (red circle), motif truncation (blue “x”), and restart of a motif (orange), intro notes not followed by a motif (purple), calls (grey) or continuation of the motif after a pause (green) normally not present in the unstimulated motif within 1 s following stimulation. c) Probability of motif stop (ochre), pause and continuation of the motif (green) or absence of syntactic perturbation (gray) after the light stimulation (HVCRA stimulated birds, n = 4, filled circles; empty box plots from HVC stimulation in Fig. 1e reported for comparison; two-way ANOVA testing the difference between stimulation outcome probabilities across all experimental groups, interaction F(14,46) = 57.75 P < 0.001, stimulated subpopulation F(7,23) = 1.088 P = 0.4027, Dunnett’s post-hoc pan-HVC vs. HVCRA, motif stop P > 0.9999, pause+continuation P = 0.6537, no perturbation P = 0.5368). d) Cumulative probability curves reporting the latency to song truncation in response to the light stimulation (average ±SEM of each bird’s curve, blue: HVCRA-stimulated birds, black: HVC-stimulated birds dataset from Fig. 
1 compared against all experimental groups across the manuscript, 10 ms time bins, two-way ANOVA testing the difference between truncation latency distributions, interaction F(255,867) = 2.351 P < 0.001, stimulated subpopulation F(5,17) = 4.142 P = 0.0121, Tukey’s post-hoc pan-HVC vs. HVCRA identifies significant difference (p < 0.05) between 60 and 90 ms time bins). (inset) Latency of motif truncation computed across all the birds (blue: HVCRA-stimulated birds, white: HVC-stimulated birds dataset from Fig. 1 compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between truncation latencies, Kruskal Wallis test H(5) = 468.9, post-hoc HVC vs. HVCRA P < 0.001). e) Average ±SEM probability of post-truncation vocalization resumption (motif restart (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause (green)) in response to the HVCRA light stimulation delivered at different latencies throughout the progression of the motif (bins = 10% motif advancement). f) Box plots (5-95 percentile, 25,50,75 percentile) reporting the normalized probability of motif restart (orange dots: n = 4 HVCRA-stimulated birds; empty box plot representing data from birds receiving HVC stimulation reported from Fig. 1i; one-way ANOVA testing the difference between groups’ normalized restart probabilities, F(5,17) = 9.939 P < 0.0001, Dunnett’s post-hoc HVC vs. HVCRA: P = 0.8071). g) Cumulative probability curves reporting the latency to post-truncation motif restart (average ±SEM of each bird’s curve, orange: HVCRA, black: HVC dataset from Fig. 1 compared against all experimental groups across the manuscript, 10 ms time bins, two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P = 0.0009, Tukey’s post-hoc pan-HVC vs. 
HVCRA identifies significant difference (p < 0.05) between 70 and 130 ms time bins). (inset) Latency to motif restart computed across all the birds (orange: HVCRA stimulated birds, white: HVC birds dataset from Fig. 1 compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between restart latencies, Kruskal Wallis test H(6) = 244.7, post-hoc HVC vs. HVCRA P = 0.1157). Brain outline in a adapted with permission from ref. 60, Wiley.
Extended Data Fig. 10 Optogenetic manipulation of HVCRA neurons.
a) Schematic of the viral strategy (AAV_DIO_ChRmine in HVC, high-titer retrograde AAV_Cre in RA) and sample images of retrogradely labeled HVCRA (magenta, arrowheads) but not HVCX neurons (cyan) displaying conditional expression of ChRmine (yellow) (scale bar 200 µm, insets 3x magnification). b) HVC multiunit neuronal activity recording in anesthetized birds expressing ChRmine in HVCRA neurons. Sample trace (top, scale bar 1 V, 1 s), raster plot (mid, 10 trials) and normalized peri-stimulus time histogram (bottom) reporting the change in multi-unit HVC firing activity in response to light stimulation (100 ms, red bar; two-way ANOVA comparing the curve between the 300 ms before and 300 ms after the 100 ms stimulation versus corresponding 700 ms baseline without stimulation, interaction F(69,483) = 3.445 P < 0.001, stimulation F(1,7) = 7.902 P = 0.0261, Sidak post-hoc P < 0.05 between 50 ms after the light onset and 50 ms after the light offset). (inset) PSTH and scatter plot illustrating the average (per hemisphere) response to the first 100 ms light stimulation (red dashed rectangle) compared to the last 100 ms baseline (black dashed rectangle, Wilcoxon test P = 0.0156; n = 8 hemispheres, 4 birds). c) Average latency ±SEM to motif truncation in response to the HVCRA light stimulation (bins = 10% motif advancement). d) Latency to motif truncation following light stimulation (blue: HVCRA stimulation, white: HVC dataset from Fig. 1; nested one-way ANOVA comparing latency to truncation across groups, F(5,17) = 4.175 P = 0.0117, Dunnett’s post-hoc pan-HVC vs. HVCRA P = 0.2873). 
e) Average ±SEM probability of post-truncation behavior (within 1 s from truncation: no vocalization resumption (black), motif restart with any introductory note or syllable A (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause normally not present in control motifs (green)) in response to the HVCRA light stimulation computed based on the time of stimulation through the progression of the motif (bins = 10% motif advancement). f) Box plots (5-95 percentile, 25,50,75 percentile) reporting the probability of motif restart (orange dots: n = 4 HVCRA-stimulated birds; empty box plot representing data from birds receiving HVC stimulation reported from Extended Data Fig. 1i. The underlying shaded areas represent the probability, for each of the birds, of producing a motif after any one motif (see methods, provides the basis for normalization of motif restart probability; dashed lines show the maximum, median, and minimum); one-way ANOVA testing the difference between groups’ restart probabilities, F(5,17) = 6.099 P = 0.0021, Dunnett’s post-hoc pan-HVC vs. HVCRA P = 0.9963). g) Latency of motif restart (orange: n = 4 birds HVCRA stimulation birds, white: HVC stimulation dataset from Fig. 1 compared against all experimental groups across the manuscript; nested one-way ANOVA comparing latency to restart across groups, F(5,17) = 6.119 P = 0.0020, Dunn’s post-hoc pan-HVC vs. HVCRA P = 0.8071). h-i) Cumulative probability curves and violin plots (data reported from Extended Data Fig. 8, 9) illustrating the latency to song truncation (h) and latency to motif restart (i) in response to the light stimulation (average±SEM of each bird’s curve, magenta: n = 4 HVCRA-stimulated birds, yellow: n = 5 RA-stimulated birds, black: n = 6 HVC-stimulated birds dataset from Fig. 1 compared against all experimental groups across the manuscript; 10 ms time bins). 
For truncation (h), two-way ANOVA testing the difference between truncation latency distributions, interaction F(255,867) = 2.351 P < 0.001, stimulated subpopulation F(5,17) = 4.142 P = 0.0121, Tukey’s post-hoc pan-HVC vs. HVCRA identifies significant difference (p < 0.05) at the 60-90 ms time bins; violin plots: one-way ANOVA, Kruskal Wallis test H(5) = 468.9, Dunn’s multiple comparisons test: HVC vs. HVCRA P < 0.001). For restart latency (i) two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P < 0.001, Tukey’s post-hoc pan-HVC vs. HVCRA identifies significant difference (p < 0.05) at the 170-200 ms time bins; violin plots: one-way ANOVA testing the difference between restart latencies, Kruskal Wallis test H(6) = 244.7, Dunn’s multiple comparisons test: HVC vs. HVCRA P = 0.1157, HVCRA vs. RA P < 0.001). Brain outline in a adapted with permission from ref. 60, Wiley.
Extended Data Fig. 11 Optogenetic manipulation of HVCX neurons.
a) Schematic of the viral strategy (AAV_DIO_ChRmine in HVC, high-titer retrograde AAV_Cre in X) and sample images of retrogradely labeled HVCX (cyan, arrowheads) but not HVCRA neurons (magenta) displaying conditional expression of ChRmine (yellow) (scale bar 200 µm, insets 3x magnification). b) HVC multiunit neuronal activity recording in anesthetized birds expressing ChRmine in HVCX neurons. Sample trace (top, scale bar 1 V, 1 s), raster plot (mid, 10 trials) and normalized peri-stimulus time histogram (bottom) reporting the change in multi-unit HVC firing activity in response to light stimulation (100 ms, red bar; two-way ANOVA comparing the curve between the 300 ms before and 300 ms after the 100 ms stimulation versus corresponding 700 ms baseline without stimulation, interaction F(69,483) = 10.35 P < 0.001, stimulation F(1,7) = 23.44 P = 0.0019, Sidak post-hoc P < 0.05 between 30 ms after the light onset and 90 ms after the light offset; inset, paired t-test P < 0.001; n = 8 hemispheres, 4 birds). c) Average latency ±SEM to motif truncation in response to the HVCX light stimulation (bins = 10% motif advancement). d) Latency to motif truncation following light stimulation (blue: HVCX stimulation, white: HVC dataset from Fig. 1; nested one-way ANOVA testing the difference between truncation latencies, F(5,17) = 4.175 P = 0.0117, Dunnett’s post-hoc pan-HVC vs. HVCX P = 0.9599). e) Average ±SEM probability of post-truncation behavior (within 1 s from truncation: no vocalization resumption (black), motif restart with any introductory note or syllable A (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause normally not present in control motifs (green)) in response to the HVCX light stimulation computed based on the time of stimulation through the progression of the motif (bins = 10% motif advancement). 
f) Probability of motif restart (orange dots: HVCX-stimulated birds; empty box plot representing data from birds receiving HVC stimulation reported from Extended Data Fig. 1i. The underlying shaded areas represent the probability, for each of the birds, of producing a motif after any one motif (see methods, provides the basis for normalization of motif restart probability; dashed lines show the maximum, median, and minimum); one-way ANOVA testing the difference between restart probabilities, F(5,17) = 6.099 P = 0.0021, Dunnett’s post-hoc pan-HVC vs. HVCX P = 0.8772). g) Latency of motif restart (orange: n = 4 HVCX-stimulated birds, white: HVC stimulation dataset from Fig. 1 compared against all experimental groups across the manuscript; nested one-way ANOVA comparing latency to restart across groups, F(5,17) = 6.119 P = 0.0020, Dunn’s post-hoc pan-HVC vs. HVCX P > 0.9999). h-i) Cumulative probability curves and violin plots illustrating the latency to song truncation (h) and latency to motif restart (i) in response to the light stimulation (magenta: HVCRA-stimulated birds, cyan: HVCX-stimulated birds, data reported from Extended Data Fig. 9 and Fig. 4 respectively; data compared across all the groups throughout the manuscript, 10 ms time bins; (h) two-way ANOVA testing the difference between truncation latency distributions, interaction F(255,867) = 2.351 P < 0.001, stimulated subpopulation F(5,17) = 4.142 P = 0.0121, Tukey’s post-hoc HVCRA vs. HVCX identifies significant difference (p < 0.05) at the 60-80 ms time bins; violin plots: one-way ANOVA testing the difference between truncation latencies, Kruskal Wallis test H(5) = 468.9, Dunn’s multiple comparisons test: HVCRA vs. HVCX P < 0.001. (i) two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P < 0.001, Tukey’s post-hoc HVCRA vs. HVCX identifies significant difference (p < 0.05) at the 170-200 ms time bins; violin plots: one-way ANOVA testing the difference between restart latencies, Kruskal Wallis test H(6) = 244.7, Dunn’s multiple comparisons test: HVCRA vs. HVCX P < 0.001). Brain outline in a adapted with permission from ref. 60, Wiley.
Extended Data Fig. 12 Antidromic optogenetic excitation of HVCX neurons from Area X.
a) (top) Schematic and sample image of eGtACR1 expression in HVC and fiber optic implant over Area X for antidromic excitation of HVCX neurons (scale bar 1 mm); (bottom) spectrograms (0–11 kHz, scale bar 200 ms, horizontal lines identify song elements) representing normal (top) and stimulated song (bottom, 50 ms 470 nm light pulses, red bars) displaying light-evoked motif truncation of syllables followed by rapid motif restarting. b) HVC multiunit neuronal activity recording in anesthetized birds, sample trace (top, scale bar 1 V, 1 s; inset 1 V, 100 ms), raster plot (mid, 30 trials) and normalized peri-stimulus time histogram (bottom) reporting the change in multi-unit HVC firing activity in response to light stimulation of eGtACR1-expressing HVCX afferents reaching Area X (100 ms, red bar, two-way ANOVA comparing the curve between the 300 ms before and 300 ms after the 100 ms stimulation versus corresponding 700 ms baseline without stimulation, interaction F(69,345) = 2.179 P < 0.001, stimulation F(1,5) = 9.957 P = 0.0252, Sidak post-hoc P < 0.05 between 20 ms and 70 ms after the light onset). (inset) PSTH and scatter plot showing the average (per hemisphere) response to the first 100 ms of light stimulation (red dashed rectangle) compared to the last 100 ms baseline (black dashed rectangle, paired t-test P = 0.0446; n = 6 hemispheres, 3 birds). c) Box plots (5-95 percentile, 25,50,75 percentile) reporting the probability of motif truncation (ochre), pause and continuation of the motif (green) or absence of syntactic perturbation (gray) after the light stimulation (HVC→X stimulated birds, n = 2, filled circles; empty box plots from HVC stimulation in Fig. 1e reported for comparison; two-way ANOVA testing the difference between stimulation outcome probabilities across all experimental groups, interaction F(14,46) = 57.75 P < 0.001, stimulated subpopulation F(7,23) = 1.088 P = 0.4027, Dunnett’s post-hoc, motif stop P = 0.0129, pause+continuation P = 0.6958, no perturbation P < 0.001). d) Cumulative probability curves reporting the latency to song truncation in response to the light stimulation (average ±SEM of each bird’s curve, blue: HVC→X, black: HVC, dataset from Fig. 1 compared against all experimental groups across the manuscript, cyan: HVCX dataset from Fig. 4d for comparison, 10 ms time bins, two-way ANOVA testing the difference between truncation latency distributions across the manuscript, F(5,17) = 4.142 P = 0.0121, Tukey’s post-hoc pan-HVC vs. HVC→X identifies significant difference (p < 0.05) at the 50−80 ms time bins, HVCX vs. HVC→X P > 0.05). (inset) Latency of motif truncation computed across all the birds (blue: HVC→X, cyan: HVCX dataset from Fig. 4, white: HVC dataset from Fig. 1 compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between truncation latencies, Kruskal Wallis test H(6) = 244.7, Dunn’s multiple comparisons test: HVC vs. HVC→X P < 0.001, HVCX vs. HVC→X P < 0.001; Cohen’s d = 0.49). e) Latency of motif truncation upon light stimulation, per bird (blue: HVC→X stimulation, white: HVC dataset from Fig. 1; nested one-way ANOVA comparing all datasets across the manuscript, F(5,17) = 4.175 P = 0.0117, Dunnett’s post-hoc pan-HVC vs. HVC→X P = 0.6512). f) Box plots (5−95 percentile, 25,50,75 percentile) reporting the normalized (left) and not normalized (right) probability of motif restart for each bird (n = 2 HVC→X stimulated birds: filled circles; empty box plots from Fig. 1i and Extended Data Fig. 1i reported for comparison; for the not-normalized probability, the underlying shaded areas represent the baseline probability, for each of the birds, of producing a motif after any one motif (see methods, provides the basis for normalization of motif restart probability; dashed top and bottom lines max and min, mid line represents median); left, one-way ANOVA testing the difference between pan-HVC and HVC→X groups’ normalized restart probabilities, F(5,17) = 9.939 P < 0.0001, Dunnett’s post-hoc HVC vs. HVC→X: P > 0.9999; right, one-way ANOVA testing the difference between pan-HVC and thalamus groups’ restart probabilities, F(5,17) = 6.099 P = 0.0021, Dunnett’s post-hoc P = 0.9996). g) Latency of motif restart (orange: HVC→X stimulation birds, white: HVC stimulation dataset from Fig. 1; nested one-way ANOVA comparing all datasets across the manuscript, F(5,17) = 6.119 P = 0.0020, Dunn’s post-hoc HVC vs. HVC→X P = 0.9990). h) Cumulative probability curves reporting the latency to post-truncation motif restart in response to stimulation of HVCX axon terminals (average ±SEM of each bird’s curve, orange: HVC→X, cyan: HVCX dataset from Fig. 4, black: HVC dataset from Fig. 1 compared against all experimental groups across the manuscript, 10 ms time bins, two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P = 0.0009, Tukey’s post-hoc HVC→X vs. HVC identifies significant difference at the 70-120 ms time bins; HVC→X vs. HVCX P > 0.05). (inset) Latency to motif restart computed across all the birds (orange: HVC→X, cyan: HVCX dataset from Fig. 4, white: HVC birds dataset from Fig. 1 compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between restart latencies, Kruskal Wallis test H(6) = 244.7, Dunn’s multiple comparisons test: pan-HVC vs. HVC→X P > 0.9999, HVCX vs. HVC→X P > 0.9999).
Brain outline in a adapted with permission from ref. 60, Wiley.
Extended Data Fig. 13 HVCX neurons in song pattern generation.
a) Box plots (5−95 percentile, 25,50,75 percentile) reporting oEPSC (left) and oIPSC (center) amplitude by cell class (Mann-Whitney test, oEPSCs U = 77 P = 0.0028, oIPSCs U = 27 P < 0.001; n = cells, animals), and (right) the ratio of oEPSC and oIPSC peak amplitudes (Mann-Whitney test, U = 119 P = 0.9847; n = cells, animals). b) Time course reporting data from Fig. 5k plotted by day, per bird, each bird color coded based on the TeNT average normalized optical density in HVC. c) Scatter plot correlating the motif self-similarity (compared to the baseline motif) with the average number of TeNT+ cells/brain slice, per each bird, color coded based on the TeNT average normalized optical density in HVC (Spearman r = −0.2970, r² = 0.05723 P = 0.4069). d) Same as (c) but for the accuracy of the motif (Spearman r = −0.2848, r² = 0.03963 P = 0.4271). e) Spectrograms (0–11 kHz, scale bar 200 ms, horizontal lines identify song elements) from a bird showing intermediate levels of TeNT expression in HVCX neurons (different from the bird in Fig. 5). Notice the continuous failure to complete motifs, either with truncation within syllables or at syllable boundaries, followed by rapid motif restart. The inset reports all spectrograms from one day in week 3, aligned at syllable A and ordered by motif length. f) Simplified syntax raster plots (100 motifs/day) for the bird in panel e. g) Cumulative probability curves reporting the latency to post-truncation motif restart (average ±SEM of each bird’s curve, blue: HVCX TeNT, cyan: HVCX dataset from Fig. 5, compared against all experimental groups across the manuscript, 10 ms time bins, two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P = 0.0009, Tukey’s post-hoc HVCX vs. HVCX TeNT P > 0.05). (inset) Violin plots reporting the latency of motif truncation computed across all the birds (HVCX dataset from Fig. 4, HVCX TeNT dataset from Fig. 5; one-way ANOVA testing the difference between restart latencies, Kruskal Wallis test H(6) = 244.7, Dunn’s multiple comparisons test: HVCX vs. HVCX TeNT P = 0.5654). Brain outline in a adapted with permission from ref. 60, Wiley.
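Several of the legends above report cumulative probability curves of truncation or restart latency in 10 ms time bins, shown as the average ±SEM of each bird's curve. The following is a minimal illustrative sketch of how such curves could be computed; it is not the authors' analysis code. The function names, the 300 ms analysis window and the example latencies are hypothetical, and only the 10 ms bin width and the per-bird averaging are taken from the legends.

```python
import numpy as np

def cumulative_latency_curve(latencies_ms, bin_ms=10.0, max_ms=300.0):
    """Cumulative probability that the event occurred by the end of each bin.

    Latencies beyond max_ms still count in the denominator, so the curve can
    plateau below 1 (assumed convention, not taken from the paper).
    """
    edges = np.arange(0.0, max_ms + bin_ms, bin_ms)
    counts, _ = np.histogram(latencies_ms, bins=edges)
    return edges[1:], np.cumsum(counts) / len(latencies_ms)

def mean_and_sem(curves):
    """Across-bird mean and standard error of per-bird curves (rows = birds)."""
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    sem = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
    return mean, sem

# Hypothetical latencies (ms) for two birds; NOT data from the paper.
bird_a = [42, 55, 61, 78, 90]
bird_b = [50, 58, 66, 120]

t, curve_a = cumulative_latency_curve(bird_a)
_, curve_b = cumulative_latency_curve(bird_b)
mean_curve, sem_curve = mean_and_sem([curve_a, curve_b])
```

Computing one curve per bird before averaging keeps birds with many trials from dominating the group curve, which matches the "average ±SEM of each bird's curve" wording in the legends.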
Supplementary information
Supplementary Tables
Supplementary Tables 1–3.
Supplementary Video 1
3D rendering of zebra finch brain with tracers injected in HVC (green) and RA (red) to label efferent axons and retrogradely identified afferent neurons. RA axons flow caudally in the posterior commissure around and below Uva.
Supplementary Video 2
Animation displaying the model’s prediction of HVCPN and interneuron activity waves in normal conditions, upon optogenetic stimulation and in network degradation conditions mimicking TeNT expression in a subset of HVCX neurons.
Supplementary Video 3
Animation of the schematic in Fig. 5b representing a proposed simplified description of HVC dynamics in normal conditions, upon optogenetic stimulation and in network degradation conditions mimicking TeNT expression in a subset of HVCX neurons.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Trusel, M., Zuo, J., Alam, D.H. et al. Holistic motor control of zebra finch song syllable sequences. Nature 652, 157–166 (2026). https://doi.org/10.1038/s41586-025-10069-z
DOI: https://doi.org/10.1038/s41586-025-10069-z







