Main

Motor behaviours are considered to be learned by splitting and chunking smaller behavioural units into sequences of neural activity and then concatenating the sequences into a unified premotor plan that supports the fluid production of the entire behaviour2,3,4,5,6,7. Although there is evidence for chunking3,6, identifying the neural origin for unified premotor programs has remained challenging. The control of learned birdsong provides a tractable model for searching for such premotor programs. Songbirds are among the few groups of animals, apart from humans, that learn their vocalizations through imitation. Moreover, birdsong is controlled by dedicated forebrain circuits8.

Zebra finches learn a single courtship song motif. They engage in extensive daily practice to maintain expert performance of this song. Sparse sequential neuronal activity in the pallial song nucleus HVC probably underlies the production of zebra finch song9,10,11,12,13. However, how neural sequences in HVC contribute to the progression of the song motif is still not well understood. Several lines of evidence support the idea that song control may involve reciprocal loops spanning the brainstem, thalamus and pallium14,15,16,17 (Extended Data Fig. 1a), whereas other studies suggest that HVC may be capable of generating neural sequences for song production more autonomously10,11,18,19.

There are various models of how adult songs are controlled: (1) sequential activity in HVC can sustain progression through all song syllables independently of instructive afferent inputs18,19,20; (2) input pathways link shorter neural sequences at syllable or other vocal parameter boundaries14,17,21,22; and (3) HVC sequences are continuously updated by instructive afferent input15,16. Research in this area has relied on correlations between song and electrophysiological recordings or on non-selective circuit manipulations, including electrical stimulation, cooling of brain regions and electrolytic lesioning. Here we combine a series of cell-type, circuit and pathway manipulations with synaptic mapping and computational modelling to causally examine how neural sequences contribute to completing the song motif. This study reveals that, barring a permissive thalamic input important for song initiation, HVC can independently propagate activity for production of all song syllables in the motif, and that this network relies on two synaptically interconnected classes of HVC projection neurons.

Optogenetic restarting of song

Electrical stimulation of HVC has varied effects on song production, including distortion of syllable acoustic features, truncation of song and occasional restarting of song soon after song truncation23,24. However, these studies are difficult to interpret because stimulation cannot be restricted to specific cell types or to cells within a small spatial volume, the stimulated population of neurons is highly dependent on electrode placement and there is inevitable antidromic and orthodromic activation of neurons and passing axons25.

Instead, we selectively controlled HVC activity using viral expression of the excitatory opsin ChRmine (n = 6 birds; Fig. 1a and Extended Data Fig. 1b). This provided experimental control over a population of HVC neurons composed of approximately 20% inhibitory neurons and 80% principal neurons, with a bias towards HVCX projection neurons26. Birds were implanted with fibre optics over HVC, and syllable detection software was used to perform closed-loop optogenetic manipulations while the birds were freely singing. Light stimulation reliably caused song truncation, seen as a rapid decrease in sound amplitude and disruption in syllable acoustic features (stimulation outcome probability: 86.8 ± 3.6% truncation and 10.2 ± 3.7% pause + continuation; latency to silence from onset of stimulation: 66.6 ± 4.1 ms; average ± s.e.m. in six birds; Fig. 1b–g and Extended Data Fig. 1c–g). Truncation was followed by the rapid restarting of the song motif (median, 135.8 ± 25.8 ms; lowest quartile, 87.6 ± 15.5 ms; Fig. 1b–d,h–j and Extended Data Fig. 1c). The birds restarted their song from the beginning, with one or two introductory notes followed by the motif or directly back to the first syllable of the motif, and this resetting behaviour occurred with high probability, independent of when in the song the optogenetic stimulation was triggered (Fig. 1h; all post-truncation trials reported in Extended Data Fig. 1h). When normalized by the likelihood of the birds to chain multiple motifs in series, the probability that a stimulated motif was immediately followed by another motif was 108.6 ± 4.9%, suggesting that the optogenetic perturbation caused the song to restart from the beginning of the motif without prematurely ending the song bout (Fig. 1i and Extended Data Fig. 1i).
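The normalization described above can be illustrated with a minimal sketch. It assumes that the normalized value is the ratio of the restart probability after stimulation to each bird's baseline probability of chaining another motif, expressed as a percentage; the exact procedure is defined in the Methods and may differ. The function names and all per-bird numbers are hypothetical and are not data from the study.

```python
import math

def normalized_restart_probability(p_restart_stim, p_chain_baseline):
    """Restart probability after stimulation as a percentage of the baseline
    probability that a motif is followed by another motif (assumed definition)."""
    if not 0.0 < p_chain_baseline <= 1.0:
        raise ValueError("baseline probability must be in (0, 1]")
    return 100.0 * p_restart_stim / p_chain_baseline

def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

# Hypothetical (stimulated, baseline) probabilities for three birds.
birds = [(0.66, 0.60), (0.70, 0.70), (0.54, 0.50)]
normalized = [normalized_restart_probability(s, b) for s, b in birds]
mean, sem = mean_sem(normalized)
print(f"{mean:.1f} +/- {sem:.1f} %")
```

Under this assumed definition, a value near or above 100% means that a truncated motif was followed by a new motif about as often as an unperturbed motif would have been.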

Fig. 1: Optogenetic excitation of HVC causes truncation and restarting of the song motif.

a, Schematic of closed-loop song-contingent light stimulation of HVC; sample image of HVC ChRmine-expressing neurons. b, Spectrograms (0–11 kHz) of normal song (top) and stimulated song (bottom). Horizontal lines identify song element boundaries, introductory notes ‘i’ followed by syllables (A, B and C) composing the motif. Light stimulation (red bars: 10-ms light) causes motif truncation (blue dashed lines overlaying letters; dashed contour represents the missing portion of the truncated syllable; motif truncation represented by the line being truncated at an angle). Orange dots indicate restart. c, Stacked control (top) and stimulated (bottom) song amplitude plots ordered by latency of stimulation onset (red line; arrow). d, Latency to stimulation (red), motif truncation (blue) and identity of resumed vocalization within 1 s following stimulation. e, Box plots (5th–95th percentile; 25th, 50th and 75th percentiles) showing the outcome of optogenetic stimulation (average probability, n = 6 birds). f, Average latency ± s.e.m. to motif truncation in response to stimulation delivered across the motif (bins, 10%; motif advancement, n = 6 birds). g, Box plots showing truncation latencies computed across all trials, per bird (n = 6). h, Probability (average ± s.e.m.) of post-truncation vocalization resumption by category upon stimulation delivered throughout the motif (bins: 10% motif advancement). i, Normalized probability of post-truncation motif restart, per bird (Methods). j, As in g for motif restart latency, per bird (n = 6). k, Subsyringeal pressure recordings (dotted line indicates ambient pressure; deviations above indicate expiration and deviations below indicate inspiration) aligned at the onset of stimulation (red bar, 50 ms; top, unstimulated trace; bottom, 34 motif traces; grey bar highlights the corresponding point in the unstimulated motif waveform). 
l, As in k, stimulation during quiet respiration or calls (top, sample traces; bottom, 56 traces aligned at the stimulation onset). m, Schematic of the two proposed possible scenarios; song progression is either controlled through extrinsic updates (top, red arrows) to HVC activity or controlled more autonomously by HVC (bottom, red arrow). Scale bars, 200 µm (a), 200 ms (b,k,l), 0.5 a.u. (k,l). Brain outline in a adapted with permission from ref. 60, Wiley.

To better understand how our circuit manipulations affect the motor control of the song, we recorded subsyringeal air sac pressure during optogenetic stimulations. We found that optogenetic stimulation applied during quiet respiration neither induced vocalization nor altered respiratory patterns in the birds. By contrast, stimulation during singing caused rapid cessation of expiration during ongoing syllables (Fig. 1k,l and Extended Data Fig. 1j). Syllable truncations resulted from significant respiratory pressure deviations within 36.4 ± 4.0 ms of light onset, approximately 30 ms before vocalizations were acoustically truncated, consistent with previous studies15,27 (Extended Data Fig. 1k). Finally, we found that optogenetic stimulation trials in which birds did not quickly restart singing could be the result of apnoea. Thus, in some cases, optogenetic activation suppressed involuntary respiration, which effectively blocked the reinitiation of song (apnoea duration: 588.2 ± 216.8 ms; Extended Data Fig. 1l).

Together, these data indicate that HVC can control downstream steady-state respiration circuitry in a state-dependent manner, and that once HVC is engaged, stimulation interrupts the chain of activity in HVC, resulting in abrupt song truncation and resetting of the motif back to its initial state. These attributes are reminiscent of response to perturbation in central pattern-generating (CPG) networks described in the invertebrate and vertebrate nervous systems28,29. Another defining feature of CPG networks is that once initiated, they can produce patterned activity in the absence of instructive patterned input. The seemingly automatic and rapid restarting of song hints that extrinsic inputs to HVC may function permissively, rather than instructively, in song motif production (Fig. 1m), raising the possibility that HVC produces the neuronal sequences for song in the absence of instructive patterned input and may function as a pattern-generating network for song syllable sequences.

Song initiation needs thalamic input

Input to HVC from the thalamic nucleus Uvaeformis (Uva) is one probable source of instructive signals for producing the song motif18,22,30,31. Electrical stimulation of Uva was reported to cause motif truncation at syllable boundaries17, suggesting that the inputs of Uva to HVC are instructive for motor programs to transition from one syllable to the next. To test this idea, we first used optogenetic excitation of the axon terminals of Uva in HVC through viral expression of eGtACR1, an opsin that potently drives excitation of axon terminals in zebra finches31 (Extended Data Fig. 2a,b). Light stimulation of eGtACR1-expressing Uva–HVC terminals drives strong transient increases in HVC activity (Fig. 2a and Supplementary Table 1). In contrast to thalamic electrical stimulation17, the optogenetic excitation of Uva terminals during singing did not cause motif truncation and left song syntax and spectral characteristics unaffected (Fig. 2b,c, Extended Data Fig. 3a,b and Supplementary Table 1).

Fig. 2: Uva does not instruct transitions between syllables in the song motif.

a, Schematic, sample trace, raster plot and normalized peri-stimulus time histogram (PSTH) of HVC multi-unit activity recording in anaesthetized birds expressing eGtACR1 in Uva; light stimulation of Uva afferents (1 s, red bar); inset, magnified PSTH and scatter plot comparing baseline and stimulation (200 ms, dashed rectangles; n = 30 hemispheres, 17 birds). b, Song-contingent light stimulation (red bar, 200 ms) of Uva terminals in HVC; sample spectrogram (0–11 kHz; horizontal lines identify song elements). c, Violin plots reporting accuracy of song segments with (grey) and without (white) stimulation, per bird (n = 4). d, UvaHVC neurons (labelled by retrograde tracer, green) expressing ChRmine (red). Dashed white lines, fibre-optic tip. e, As in b for UvaHVC stimulation. f, Box plots (5th–95th percentile; 25th, 50th and 75th percentiles) reporting optogenetic stimulation outcome (average probability, n = 3 birds; filled circles, empty box plots from Fig. 1e reported for comparison). g, As in c for UvaHVC stimulation (n = 3 birds). Scale bars, 200 ms (b), 200 µm (d), 20 µm (d (inset)). Brain outlines in a and d adapted with permission from ref. 60, Wiley.

The lack of effects on song, even with prolonged stimulation, prompted us to test whether direct optogenetic excitation in Uva disrupts song. We expressed the excitatory opsin ChRmine in Uva neurons projecting to HVC (UvaHVC) using an intersectional viral strategy (Fig. 2d). We found that even directly stimulating UvaHVC neurons failed to cause song truncation and restarting (1.2 ± 1.2%, motif stop; 98.9 ± 1.2%, no effect; Fig. 2e,f and Supplementary Table 1). Moreover, this manipulation had no detectable impact on the spectral characteristics of song syllables (Fig. 2g, Extended Data Fig. 3c and Supplementary Table 1).

One possibility is that manipulations such as electrical stimulation may drive truncation at syllable boundaries through off-target effects, such as recruiting nearby thalamic regions or fibres of passage. Uva is located within the posterior commissure, which connects midbrain regions critical for vocalizations, audition and vision. It is immediately adjacent to the robust nucleus of the arcopallium (RA) fibre tract, which transmits descending motor commands for song (Supplementary Video 1). Neurons in and surrounding Uva relay visual information to the forebrain32, and sudden visual stimulation with a stroboscope elicits orienting responses in zebra finches27,33 that result in motif truncations at syllable boundaries, similar to those observed with electrical stimulation of Uva17.

To assess whether off-target effects could be involved in truncating motifs at syllable boundaries, we attempted to mimic the effects of electrical stimulation by non-selectively expressing ChRmine in Uva and the surrounding thalamus (Extended Data Fig. 3d). Broader thalamic optogenetic stimulation resulted in reliable motif truncation at syllable boundaries (91.5 ± 3.6%, motif stop; 0.4 ± 0.4%, pause + continuation; Extended Data Fig. 3d–k). In contrast to optogenetic stimulation in HVC or along the Uva–HVC pathway, optogenetic stimulation of the broader thalamus caused birds to momentarily stop movement and blink, both during singing and non-singing states. This suggests that the manipulation causes a visually evoked orienting response, perhaps mimicking responses to strobe-light visual stimulation27,33. Consistent with this, broader thalamic stimulation resulted in significantly longer truncation latencies than direct HVC stimulation (Extended Data Fig. 3g–k) and predominantly led to cessation of singing rather than resetting of song (Extended Data Fig. 3l,m). In the few instances when birds resumed singing, the motif reset latency was significantly longer than what we observed when stimulating HVC (Extended Data Fig. 3n,o). These findings suggest that electrical stimulation-triggered song truncations are the result of off-target stimulation of the peri-Uva thalamus, and that Uva is not instructive for HVC syllable sequence progression.

We next examined whether Uva could play a permissive role in song production. Electrolytic lesions of Uva or peri-Uva regions can abolish courtship song production22,30. However, electrolytic lesions non-selectively ablate neurons and damage axonal fibres. To minimize damage to axonal tracts, we performed bilateral excitotoxic lesions of Uva using a cocktail of ibotenic and quisqualic acid (n = 13 birds). This strategy yielded three outcomes: (1) complete lesions of Uva (99.6 ± 0.4% Uva lesioned) that also included the peri-Uva thalamus and resulted in birds that could no longer sing their motif; (2) large peri-Uva thalamic lesions that mostly spared Uva (10.8 ± 3.8% Uva lesioned) and resulted in birds that also could no longer sing their motif; and (3) almost complete Uva lesions (87.5 ± 7.0% Uva lesioned) that spared the broader peri-Uva thalamus and resulted in birds that could sing their motif within approximately 1 week following lesion (Fig. 3a–c and Extended Data Fig. 4a). This last group of birds demonstrates that HVC can drive production of the entire song motif, even when Uva is significantly lesioned. Nonetheless, we found that these birds chained significantly fewer motifs together in each song bout (Fig. 3d and Supplementary Table 2), and they would often fail to produce their song motif after singing introductory notes (Fig. 3e and Supplementary Table 2). These findings are consistent with Uva lesions disrupting the ability of birds to initiate courtship song performances to female birds30, and they suggest that Uva may be needed for the initiation of the song motif.

Fig. 3: Uva is permissive for motif initiation.

a, Schematic and sample image (NeuN immunofluorescence, grey; HVC retrograde tracer, green) and spectrograms (0–11 kHz; horizontal lines identify song elements) reporting the effect of excitotoxic bilateral lesion of Uva and peri-Uva thalamus. b, Motif self-similarity before (circles) and 1–2 weeks after (triangles) lesions (peri-thalamus + Uva (brown; n = 8); peri-thalamus excluding Uva (grey; n = 2); Uva excluding perithalamic areas (blue; n = 3)). c, Percentage of Uva lesion in the three experimental groups. d, Cumulative probability of motifs per bout sung by the birds before (black) and 30 d after (blue) excitotoxic lesion of Uva that spared peri-Uva thalamic regions (n = 3 birds). e, The rate of motif start failures before (grey circles) and after (blue triangles) the excitotoxic Uva lesions. f, Schematic, sample image and spectrograms (as in a) reporting the effects of TeNT expression in UvaHVC neurons. g, Box plots (5th–95th percentile and 25th, 50th and 75th percentiles) reporting self-similarity between motifs sung before viral injection (grey) and the last motifs produced before complete cessation of singing upon expression of TeNT in UvaHVC neurons for 1–2 weeks (purple triangles; n = 6 birds). h, Cumulative probability of motifs per bout before (black) and 1–2 weeks after expression of TeNT in UvaHVC neurons (purple; n = 6 birds). i, Rate of motif start failures before (grey circles) and after (purple triangles; n = 6 birds) expression of TeNT in UvaHVC neurons. NS, nonsignificant. Scale bars, 1 mm (a), 100 µm (a (insets)), 1 s (a (spectrograms)), 200 µm (f). Brain outlines in a and f adapted with permission from ref. 60, Wiley.

To test the role of Uva in song initiation, we first blocked glutamate release from UvaHVC neurons using viral expression of tetanus neurotoxin (TeNT) (Fig. 3f and Extended Data Fig. 4b). These birds had progressive difficulty initiating their song on a timeline consistent with viral expression (approximately 10–14 days). They had increasing failures in motif initiation following singing of introductory notes and decreased number of motifs per song bout. However, in instances when the motif was initiated, the birds consistently produced all song syllables in the motif with high accuracy (Fig. 3f–i, Extended Data Fig. 4c and Supplementary Table 2). These data support the idea that the Uva–HVC pathway is permissive for initiating learned song motifs rather than instructing song syllable transitions17.

To test this, we expressed eGtACR1 in Uva and optogenetically silenced Uva neurons during singing (Extended Data Fig. 4d). We found that silencing Uva during an ongoing song motif did not disrupt the completion or acoustic structure of that motif, but it reduced the probability of initiating and concatenating a subsequent motif (Extended Data Fig. 4e,f). By contrast, using the same birds and placing fibre optics over HVC to excite Uva axon terminals across motif transitions did not suppress initiation of a subsequent song motif (Extended Data Fig. 4g,h). Thus, if Uva input to HVC is excited, birds can continue the ongoing motif and string other motifs together. If instead it is inhibited, birds still complete the ongoing motif but exhibit difficulty starting the next song motif. Together, these results indicate that the Uva–HVC pathway is critical for initiating song motifs, potentially coordinating the two hemispheres, but is not needed for birds to string together syllables within the song motif.

Pallial afferents are not needed for song

HVC receives excitatory input from three auditory and premotor pallial regions that play important roles in song learning: nucleus interfacialis (NIf), nucleus avalanche and medial magnocellular nucleus of the anterior nidopallium (mMAN)30,34,35,36,37. We examined the role of each pathway in adult song performance. Stimulation of eGtACR1-expressing axon terminals in HVC from any of these regions significantly increased HVC multi-unit firing activity (Extended Data Fig. 5a,g,m). However, 200-ms-long or 1-s-long song-contingent light stimulation of any of these input pathways failed to affect spectrotemporal motif characteristics (Extended Data Fig. 5a–r). We therefore tested whether these afferents are necessary for adult song performance. Previous studies indicate that bilateral lesions of either NIf, mMAN or nucleus avalanche in adults do not cause any long-lasting disruptions in song34,36,38. However, it has been shown that compensation by other pathways could account for the lack of sustained effects on song. Therefore, we consecutively lesioned mMAN, NIf and nucleus avalanche in the same birds using ibotenic and quisqualic acid. Bilateral lesions of these nuclei (mMAN, 100.0 ± 0.0%; NIf, 92.9 ± 4.0%; nucleus avalanche, 100.0 ± 0.0%; lateral magnocellular nucleus of the anterior nidopallium (lMAN), 82.5 ± 7.7%; Extended Data Fig. 6a–i) caused only a temporary decrease in motif quality. The song motif quickly recovered to its pre-lesioned state (Extended Data Fig. 6j), and, unlike Uva lesions, these lesions did not impact the number of motifs per bout or cause disruptions in motif initiation (Extended Data Fig. 6k,l). This demonstrates that HVC can generate the sequential activity necessary for completing song independent of its known main excitatory synaptic afferents, further supporting the idea that HVC is the origin of a unified premotor program for the zebra finch song motif.

Song pattern-generating network in HVC

To further define the circuit boundaries of the song pattern-generating network, we examined whether downstream target regions of HVC are critical to pattern generation. We reasoned that the kinetics of the post-truncation restarting of song provides a sensitive behavioural read-out of pattern resetting and could clarify whether those neural circuits are involved in song pattern generation or simply relay the patterned output. Disruption of a pattern generation node would produce truncation and reset latencies similar to or faster than those observed upon HVC stimulation, whereas a relay node would result in low-latency truncations followed by low-probability and longer-latency motif resetting. HVC has two major output pathways: the descending song motor pathway through the pallial song region RA and the palliostriatal pathway through area X, emerging from HVCRA and HVCX neurons, respectively8.

We bilaterally expressed ChRmine in either area X or RA and light stimulated each region in freely singing birds. Driving area X neurons rarely caused motif truncation (truncation probability, 2.9 ± 1.6%; no effect, 97.2 ± 1.6%; Extended Data Fig. 7a,b). The truncations we observed occurred at syllable boundaries and were significantly delayed (latency, 146.7 ± 36.7 ms) compared with the uniform song truncations observed with HVC optogenetic stimulation. Nonetheless, stimulation of area X neurons consistently caused a modest increase in the noisiness of stimulated syllables (Extended Data Fig. 7c,f), consistent with the known role of the basal ganglia pathway38.

By contrast, optogenetic stimulation in RA caused rapid motif truncations with high reliability (92.2 ± 2.7%, motif stop; 1.5 ± 1.8%, pause + continuation; Extended Data Fig. 8a–c). These truncations exhibited uniform latency across song, similar to stimulation in HVC (Extended Data Fig. 8d–f). Because RA is downstream of HVC in the song motor pathway, we might expect birds to restart their song as fast as, or faster than, when stimulating in HVC and with equal probability. However, we found that RA stimulation is less likely to be followed by restarting of the motif. When the motif does restart, it takes significantly longer than following HVC stimulation (Extended Data Fig. 8g–k). This argues that the song pattern-generating network is localized to HVC and that RA functions downstream of this network to relay motor commands for song.

To test this prediction, we moved upstream by one synapse and examined whether optogenetic stimulation of HVCRA neurons would produce the truncation and song restarts with the same timing and reliability as our pan-HVC optogenetic manipulations, as shown in Fig. 1. We used an intersectional viral strategy to achieve ChRmine expression only in HVCRA neurons (Extended Data Figs. 9 and 10). As anticipated, song-contingent HVCRA stimulation reliably caused rapid truncations throughout song (86.2 ± 5.6%, motif stop; 2.5 ± 1.8%, pause + continuation; Extended Data Fig. 9a–c). Unexpectedly, the latency to truncation was significantly longer than what we observed with pan-HVC stimulation. In some instances, it seemed to occur closer to syllable boundaries (panHVC, 66.6 ± 4.1 ms; HVCRA, 79.1 ± 4.8 ms; Extended Data Figs. 9b,d and 10c,d). Although post-truncation motif reset probability is comparable to that observed with pan-HVC stimulation (Extended Data Figs. 9e,f and 10e,f), restart latency was intermediate to the timing of pan-HVC and RA stimulation (Extended Data Figs. 9g and 10g–i). Although different latencies to first spike among HVC projection classes may influence truncation and restart dynamics39, this intermediate timing is still surprising because chains of excitatory synaptic connections among HVCRA neurons are considered to be a central component of the network controlling song9,10,11. This prompted us to investigate whether the other main class of HVC projection neurons (HVCX neurons) may contribute to the rapid restarting of the song motif.

HVCX neurons in song pattern generation

Similar to HVCRA neurons, HVCX neurons exhibit temporally precise and sparse activity during production of the song motif12,40,41, but their role in song generation is not known. They are considered to relay timing activity to the basal ganglia rather than directly contributing to song pattern generation42,43,44. However, paired recordings from HVC neurons in sleeping birds suggest that HVCX neurons can reliably lead the activity of interneurons and HVCRA neurons45, suggesting that they could contribute to song pattern generation.

We expressed ChRmine in HVCX neurons (Extended Data Fig. 11a,b) and found that stimulation reliably triggered song truncations (88.4 ± 3.0%, motif stop; 5.0 ± 2.5%, pause + continuation; Fig. 4a–c and Supplementary Table 3). The latency of truncation matched truncations elicited by pan-HVC stimulation (panHVC, 66.6 ± 4.1 ms; HVCX, 71.1 ± 4.1 ms; Fig. 4d, Extended Data Fig. 11c,d and Supplementary Table 3). Truncations were followed by rapid song restarts with probability and latency comparable to pan-HVC stimulation (Fig. 4e–g, Extended Data Fig. 11e–g and Supplementary Table 3). Song restart following HVCX stimulation was faster than that from HVCRA stimulation (Extended Data Fig. 11h,i). Although HVCRA neurons display longer spiking latencies than HVCX neurons39, which could contribute to the observed truncation dynamics, these rapid truncation and restart kinetics suggest that HVCX neurons could be part of the core song pattern-generating network, rather than only relaying timing signals to the basal ganglia.

Fig. 4: Selective optogenetic stimulation of HVCX neurons restarts the song motif.

a, Schematic and spectrograms (0–11 kHz; horizontal lines identify song elements) of song-contingent HVCX ChRmine-expressing neuron stimulation (10 ms, red bars; orange dots highlight motif restarts). b, As in Fig. 1c,d for HVCX stimulation. c, Box plots (5th–95th percentile and 25th, 50th and 75th percentiles) reporting optogenetic stimulation outcome (average probability, n = 4 birds; filled circles, empty box plots from Fig. 1e reported for comparison). d, Cumulative probability curves reporting the latency to song truncation in response to the light stimulation (average ± s.e.m. of the curve of each bird; n = 4 HVCX-stimulated birds (blue); the dataset from HVC-stimulated birds (black) in Fig. 1 was compared with all experimental groups across the paper; 10-ms time bins). Inset, violin plots reporting the latency of motif truncation computed across all the birds (HVCX-stimulated birds (blue); the dataset from HVC-stimulated birds (white) in Fig. 1 was compared with all experimental groups across the paper). e, Probability (average ± s.e.m.) of post-truncation vocalization resumption by category upon stimulation delivered throughout the motif (bins, 10% motif advancement; n = 4). f, Normalized probability of post-truncation motif restart (n = 4 HVCX-stimulated birds, filled circles; empty box plots from Fig. 1i reported for comparison). g, Same as d, but for latency to motif restart (HVCX-stimulated birds, orange; HVC-stimulated birds, black). Scale bar, 200 ms (a). Brain outline in a adapted with permission from ref. 60, Wiley.

Given that direct optogenetic stimulation of area X does not result in song truncations (Extended Data Fig. 7a,b), local synaptic transmission from HVCX neurons within HVC is probably the source of truncation and rapid restarting of song. We tested this by antidromically recruiting indirect and delayed excitation of axonal collaterals within HVC23 using optogenetic stimulation of HVC axon terminals in area X (Extended Data Fig. 12a,b). This stimulation should cause delayed song truncation, yet the motif restart kinetics should be similar to those upon direct HVC stimulation. We found that stimulation resulted in delayed motif truncation, although with lower probability compared with pan-HVC or HVCX direct stimulation, consistent with the limitations of antidromic propagation23 (Extended Data Fig. 12c–e). Nevertheless, we found that post-truncation restarting of the song motif had the same probability and latency as stimulating HVCX neuronal somata (Extended Data Fig. 12f–h).

Previous studies have identified the main synaptic connectivity motif in HVC to be disynaptic reciprocal inhibition between HVCRA and HVCX neurons through local interneurons46,47. This implicates a surge of inhibition onto HVCRA neurons following stimulation of HVCX neurons. However, this does not explain the more rapid truncation and reset of the motif upon HVCX stimulation. To examine how HVCX neurons may contribute to the song sequence propagation, we mapped the local connectivity of HVCX neurons using opsin-assisted synaptic circuit mapping. Stimulating neurotransmitter release from HVCX neurons in brain slices evoked excitatory and inhibitory postsynaptic currents in HVCX and HVCRA neurons (Fig. 5a and Extended Data Fig. 13a). Isolating monosynaptic connections using bath application of tetrodotoxin (TTX) followed by 4-aminopyridine (4-AP) revealed that HVCX neurons make monosynaptic connections with HVCRA and HVCX neurons with high and low probability, respectively (Fig. 5a). Previous studies using paired recordings and electron microscopy suggest that HVCRA neurons have only sparse connectivity with other HVCRA neurons. However, they are more reliably synaptically connected with HVCX neurons46,47,48. These data support a model in which the two HVC projection neuron classes form a heterosynaptic network, along with local interneurons, that can holistically sustain song pattern generation.

Fig. 5: HVCX neurons are part of song pattern-generating network and model of HVC.

a, Polysynaptic (HVCX, cyan; HVCRA, magenta) and monosynaptic (grey) optogenetically evoked excitatory post-synaptic current (oEPSC) amplitudes evoked by HVCX optogenetic stimulation (red lines, 1 ms). Following TTX application, 4-AP restores monosynaptic oEPSCs (grey; red outline indicates absence of post-4-AP oEPSC; n = cells; animals). Bar charts show the prevalence of polysynaptic and monosynaptic oEPSCs. b, Schematic of the HVC motif pattern-generating circuit. Neurons (circles) connected by structured local synaptic projections (cones) and unstructured global projections (lines; arrows, excitatory; circles, inhibitory). c, Compensated connection matrix between HVC projection neuron classes (HVCPN) and interneurons. d, Offset of excitatory (black) and inhibitory (grey) bumps grants unidirectional propagation of excitation. e, Simulated HVCRA/HVCX sequence showing truncation and restart of the neuronal firing sequence following simulated optogenetic HVCRA excitation (magenta arrows). Peri-song neuronal firing activity (green) surges to start the motif. f, Modelled dynamics of HVC subpopulations in response to excitatory perturbation (red), aligned and overlayed with experimental average ± s.e.m. data relative to truncation (blue dashed) and restart (orange dashed) latency upon pan-HVC stimulation, shifted by 36.35 ms (semitransparent filled waves) to reflect the delay measured from respiratory data. Inset magnifies the rising peri-song neuronal activity. g, Predicted decrease in mean sequence length upon simulated degradation of HVCX neuron connectivity. h, Simulated HVCRA sequence showing stochastic truncation upon HVCX TeNT-like degradation. i, Schematic and image showing conditional TeNT expression in HVCX neurons. j, Spectrograms (0–11 kHz; horizontal lines identify song elements) show progressive motif shortening. Inset, tiled spectrograms ordered by motif length. 
k, Complete motifs by each bird averaged per week (birds colour-coded per normalized TeNT optical density). l, Scatter plot correlating the relative change in number of motifs per day sung at week 4 as a function of the average optical density of TeNT expression (left, Spearman r = −0.8545; r2 = 0.6127; P = 0.0029) or number of TeNT+ somata per 50-µm-thick slice (right, Spearman r = −0.8061; r2 = 0.3533; P = 0.0072). Scale bars, 100 ms (a (x axis)), 100 pA (a (y axis)), 500 µm (i), 200 ms (j). Brain outline in i adapted with permission from ref. 60, Wiley.


HVC pattern-generating network model

To test if a circuit consistent with this synaptic organization could sustain song progression and restarting, we modelled the network by arranging HVCRA, HVCX and local inhibitory neurons uniformly on a ‘chain’ (Fig. 5b,c). Synaptic weights in our model were symmetric, and the weights of excitatory and inhibitory connections decayed with distance along the chain. A pool of inhibitory neurons was driven by excitatory neurons and provided global inhibitory feedback. Song initiation was mediated by excitatory input from Uva onto a recently identified class of ‘peri-song’ HVCRA neurons, which became active just before song onset and were inactive during singing49 (Fig. 5b). Local inhibitory activity spatially lagged excitatory activity in the direction of sequence propagation, thereby providing stronger inhibition to excitatory neurons at earlier positions in the chain and effectively pushing excitatory activity forward along the chain (Fig. 5c,d and Supplementary Videos 2 and 3). Weakened excitatory synapses at the end of the chain stopped sequence propagation and released peri-song neurons from inhibition, leading to spontaneous restarting of the song motif if excitatory drive from Uva remained intact (Fig. 5e).

We modelled how this circuit responded to strong synchronous excitation, mimicking our optogenetic manipulations (Fig. 5e,f). The activation function of global inhibitory neurons was steeper than that of excitatory neurons, and the excitatory and global inhibitory neurons were strongly connected in a parameter regime called inhibition stabilization50. Thus, synchronous excitatory activity caused widespread and strong feedback from inhibitory neurons that blocked sequence propagation. Once the sequence was truncated, it could be spontaneously restarted through peri-song neurons, akin to restarting the song after its natural ending (Fig. 5e,f and Supplementary Videos 2 and 3). Unexpectedly, we found that the parameter settings that allowed the model to generate a moving neural sequence, optogenetic truncation and spontaneous restarting of sequence generation resulted in dynamics that qualitatively matched the timing of behaviour we measured following optogenetic manipulations of HVC (Fig. 5f). Although this simple rate-based model does not capture the precise spike-timing activity of all HVC neurons, it demonstrates that an inhibition-stabilized pattern-generating circuit matches our behavioural results following circuit perturbations, and an emergent property of the model is that its dynamics also match the timing of song truncation and restart.

Although previous studies have suggested that HVCX neuronal lesions leave song intact43,44, our model simulations predict that weakening the contribution of HVCX neurons to the chain results in premature truncation of neural sequence propagation, followed by spontaneous restarting of the motif (Fig. 5g,h and Supplementary Videos 2 and 3). To test these predictions, we suppressed excitatory synaptic transmission from HVCX neurons using selective viral expression of TeNT (Fig. 5i). In contrast to the effect of expressing TeNT in Uva, which resulted in birds producing their song motif in an all-or-none fashion (Fig. 3f–i), we found that TeNT in HVCX neurons caused birds to progressively increase the likelihood of prematurely truncating their song motifs (Fig. 5j–l and Extended Data Fig. 13b–d). Birds exhibited song truncations both within and between syllables (Fig. 5j and Extended Data Fig. 13e). The truncations occurred progressively earlier in the motifs, with timelines consistent with viral expression dynamics and with effects directly proportional to the amount of TeNT expression (Fig. 5k and Extended Data Fig. 13b,f). Consistent with model predictions, we also found that birds would frequently restart their songs following premature motif truncations, and that the latencies of these restarts matched those observed upon direct optogenetic excitation of HVCX neurons (Fig. 5j and Extended Data Fig. 13e,g). Together, these results demonstrate that interruption of HVC activity, by either optogenetic perturbations or silencing of HVCX neurons, drives song truncation and probably releases a common circuit mechanism driving rapid restarting of the neural sequence for song.

Discussion

Neuronal sequences in HVC have been proposed to function as a clock, which controls the timing and progression of the song motif9,11,12,13,15. Considerable debate has centred on whether these patterns of activity require instructive patterned input for motif completion. Here we provide key observations indicating that adult HVC functions as a sequence-generating network that does not require patterned input, at least from its best described afferent pathways, to complete the song motif.

Chunking of motor sequences, followed by concatenation of commonly repeated sequences, is a proposed mechanism for optimizing learning and performance2,4,5,7,51,52,53,54,55. Motor chunking is thought to function in learning and production of the movement sequences needed for fluent speech production and other well-practiced behaviours. Early stages of juvenile bird song development involve splitting and growth of neural sequences in HVC as new syllables are being learned6,56. This process probably reflects chunking of respiratory and vocal patterns needed for accurate and rapid learning. Juvenile songbirds progressively shape their song, practicing thousands of times per day, and pallial input pathways are necessary to direct such developmental song learning35,37,57,58.

Here we show that these main pallial input pathways to HVC are dispensable for adult song production. Moreover, our cell-type selective manipulations in HVC demonstrate that the ordered syllable sequence of the song motif spontaneously restarts if it is prematurely truncated. This skipping of song back to the beginning is reminiscent of CPG rhythm resetting and consistent with the holistic control of the motif by a sequence-generating circuit in HVC20,29. These findings suggest that the phase at the end of song development, referred to as ‘crystallization’, involves consolidation of neural programs for motor control within HVC. We propose that as song developmentally becomes more stereotyped and precise, consistent daily practice concatenates these sparse neural sequences into a stable chain that autonomously sustains song motif completion.

Birds lack a corpus callosum; therefore, our findings raise questions about how the interhemispheric timing of pattern-generating circuits is coordinated. Uva receives bilateral ascending input from the respiratory medulla and midbrain vocal circuits and is therefore considered to play a prominent role in interhemispheric coordination of HVC14,15,22. Accordingly, we found that thalamic input from Uva is needed for initiating the song motif. However, our evidence indicates that it is not needed for transitioning from syllable to syllable within the motif. Uva may therefore send synchronized onset cues for song to coordinate initiation of each motif in the song bout, which then continues autonomously in each HVC. Nevertheless, Uva remains active throughout the motif and could function as a metronome, providing timing signals that support interhemispheric coordination without being required to instruct transitions within ongoing song motifs.

Our synaptic connectivity mapping finds HVCX neurons consistently making monosynaptic connections with HVCRA neurons but only sparsely with other HVCX neurons. This supports the idea that the main synaptic connectivity within HVC involves disynaptic inhibition and monosynaptic excitation between HVCRA and HVCX neurons, rather than homotypic connections within either class of excitatory neurons46,47. More research will be needed to fully describe cell-type connectivity in the network and understand how song sequence progression is fully controlled. Nonetheless, we propose a straightforward computational model of HVC that can sustain sequence generation and song restarting following circuit perturbations. Moreover, our model simulations indicate that a substantial reduction in HVCX neuronal transmission leads to stochastic song truncations and restarting of the motif, a prediction matched by our selective expression of TeNT in these neurons. However, the HVC song circuit seems robust to moderate perturbation, as shown by the lack of effect in previous HVCX ablation studies43,44, focal lesion studies59 and our own model predictions (Fig. 5g,h), as well as by our data showing that birds with lower levels of TeNT expression did not exhibit significant disruptions in song (Fig. 5k,l and Extended Data Fig. 13b–d).

In summary, this study reveals that a premotor circuit, facilitated by thalamic input, can holistically control strings of vocal syllables and more precisely defines the synaptic circuit architecture critical to this pattern-generating circuit. In future experiments, it will be important to examine how ‘fused’ sequence elements are integrated for the control of other types of natural behaviours and learned vocalizations with greater syntax complexity. Zebra finches produce only a single stereotyped sequence of song syllables, making them an ideal model for first testing how the brain controls strings of vocal gestures. We propose that chunking, followed by concatenation of reliably reproduced neuronal sequences, underlies these behaviours, and that the approaches applied here can help identify the boundaries of the motor sequences used by the brain to support production of complex behaviours.

Methods

Animals

The experiments described in this study were conducted using adult male zebra finches (Taeniopygia guttata; 120–500 days post-hatch). All procedures were performed in accordance with protocols approved by the Animal Care and Use Committee at UT Southwestern Medical Center.

Viral vectors

The following adeno-associated viral vectors were used in the experiments: rAAV2/9/fDIO–CBh–eGtACR1–mScarlet, rAAV2/9/CBh–Flippase, rAAV2/9/CBh–ChRmine–mScarlet, rAAV2/9/DIO–CAG–ChRmine–mScarlet, rAAV2/9/DIO–CAG–TeNT–mScarlet (Intellectual and Developmental Disabilities Research Center Neuroconnectivity Core at Baylor College of Medicine) and rAAV2/9/CMV–CRE–eGFP (Addgene). All viral vectors were aliquoted and stored at −80 °C until use.

Stereotaxic surgery

Aseptic stereotaxic surgeries were performed after birds were anaesthetized (isoflurane inhalation; 0.8%–1.5%).

Viral injections were performed using previously described procedures26,37,58. Briefly, a cocktail of adeno-associated viral vectors (rAAV/CBh–ChRmine in HVC, RA, area X or thalamus (2 µl per hemisphere); 1:2 of rAAV/CBh–FLP and rAAV/DIO–CBh–eGtACR1, respectively (1–2 µl total per hemisphere); rAAV/DIO–CAG–ChRmine in HVC or Uva (2 µl); rAAV/CMV–Cre in RA, area X or HVC (0.5–1 µl and 2 µl, respectively); rAAV/DIO–TeNT in HVC or Uva (2 µl); and rAAV/CMV–CRE in area X or HVC (2 µl), respectively) was injected (1 nl s−1) into target areas with a Nanoject III (Drummond Scientific) and glass capillaries. Experiments were conducted starting a minimum of 3 weeks after viral injections. Fluorophore-conjugated retrograde tracers (Dextran 10,000 MW, AlexaFluor 488, 568 and 647, Invitrogen; Fast Blue, Polysciences) were injected bilaterally into area X, RA or HVC (160 nl; 5 × 32 nl, 32 nl s−1 every 30 s) (refs. 26,37,58). Electrophysiological mapping was used to determine the centres of HVC, NIf, mMAN, LMAN and RA, whereas area X, nucleus avalanche and Uva were identified using stereotaxic coordinates (coordinates relative to interaural zero: head angle, rostral–caudal, medial–lateral, dorsal–ventral (in mm)). The stereotaxic coordinates were as follows: HVC (45°; anterior–posterior, 0; medial–lateral, ±2.4; dorsal–ventral, −0.2 to −0.6), NIf (45°; anterior–posterior, 1.75; medial–lateral, ±1.75; dorsal–ventral, −2.4 to −1.8), mMAN (20°; anterior–posterior, 5.1; medial–lateral, ±0.6; dorsal–ventral, −2.1 to −1.6), LMAN (20°; anterior–posterior, 5.1; medial–lateral, ±1.7; dorsal–ventral, −2.2 to −1.6), RA (80°; anterior–posterior, −1.5; medial–lateral, ±2.5; dorsal–ventral, −2.4 to −1.8), area X (45°; anterior–posterior, 4.8; medial–lateral, ±1.6; dorsal–ventral, −3.3 to −2.7), nucleus avalanche (45°; anterior–posterior, 1.65; medial–lateral, ±2.0; dorsal–ventral, −0.9) and Uva (20°; anterior–posterior, 2.5; medial–lateral, ±1.6; dorsal–ventral, −4.8 to −4.2).

Optogenetic manipulations

For optogenetic stimulation, optic fibres (multimode 400 µm; 0.39 numerical aperture; ThorLabs) were implanted bilaterally dorsal to HVC, RA, area X or Uva using acrylic glue and dental cement. Although the 400-µm-diameter fibres may not completely cover the entirety of the areas, we estimated that the cone of light could stimulate the vast majority of the targeted neurons. After recovery, the implanted fibres were connected to optic fibres through ceramic sleeves. The fibres were connected to a rotary joint and interfaced with a 1.5-mm multimode fibre connected to a light-emitting diode box (Prizmatix). Light intensity was regulated to achieve a final output of approximately 10 mW. We used custom software (pcaf; LabVIEW) to deliver optogenetic stimulation during song (200 ms or 1 s for HVC afferent stimulation, 10–50 ms for direct ChRmine somatic stimulation and 50–200 ms for antidromic HVCX stimulation). In many instances, our goal was to target as many moments as possible within a bird song motif. To achieve this, we targeted most of the motifs birds were producing using quasi-random light onset delays introduced through a transistor–transistor logic signal. This targeting strategy allows for a detailed analysis of motif-level effects but limits our ability to conduct meaningful song-bout-level analysis of the behaviour. We note that light delivery over HVC or other brain regions is not by itself sufficient to cause truncations or other disruptions in singing behaviour, because several experiments using light stimulation (light stimulation of afferent pathways into HVC or of area X neurons) had no effect on singing behaviour. Air sac recordings and analysis were performed as previously published15.

Lesion quantification

Excitotoxic lesion was induced by 1% ibotenic acid (50–100 nl per injection site) or a cocktail of 1% ibotenic acid and 100 mM quisqualic acid (Uva and LMAN). Lesion extent was first verified by the absence or sparseness of NeuN immunostaining in the targeted nuclei. To provide an unbiased estimate of the lesion extent, retrograde tracers were injected in HVC and RA to highlight any surviving cells in the afferent nuclei. In control animals, the number of retrograde tracer-filled cells in each nucleus was quantified, and correlations were calculated between cell counts in each nucleus (Extended Data Fig. 6a–f). This analysis provided a statistical validation to extrapolate the number of cells in a target nucleus from the number of cells counted in a reference nucleus. Therefore, an average ratio across nuclei cell counts was calculated. On the basis of these control ratios and the number of cells in a non-lesioned reference nucleus, the expected number of retrogradely filled cells in each nucleus of each hemisphere was estimated.
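The extrapolation step described above can be illustrated with a short Python sketch. This is not the analysis code used in the study: the function names, the use of a simple mean of per-hemisphere ratios, the final "fraction missing" measure and all cell counts are assumptions chosen purely for illustration.

```python
from statistics import mean

def control_ratio(target_counts, reference_counts):
    """Average target/reference cell-count ratio across control hemispheres.

    Assumption: the 'average ratio' is a plain mean of per-hemisphere ratios.
    """
    return mean(t / r for t, r in zip(target_counts, reference_counts))

def expected_cells(ratio, reference_count):
    """Expected retrogradely labelled cells in the target nucleus,
    extrapolated from a non-lesioned reference nucleus."""
    return ratio * reference_count

def lesion_extent(observed, expected):
    """Hypothetical summary measure: fraction of expected cells that are
    missing, clipped to the range 0..1."""
    return min(1.0, max(0.0, 1.0 - observed / expected))

# Made-up counts, for illustration only
ratio = control_ratio([200, 300], [100, 150])   # control hemispheres
expected = expected_cells(ratio, 120)           # reference nucleus in a lesioned bird
extent = lesion_extent(24, expected)            # 24 surviving labelled cells
```

With these invented numbers the control ratio is 2, the expected count is 240 and the estimated lesion extent is 90%.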

In vivo extracellular recordings

To test the functional expression of opsins, we performed extracellular recording of HVC activity in birds under light isoflurane anaesthesia (0.8%) with Carbostar carbon electrodes (impedance: 1,670 µΩ cm; Kation Scientific). A 400-µm multimode optical fibre was placed on the brain surface overlaying HVC and delivered light stimulation (470 nm; approximately 20 mW; 1 s) during neural recordings. To test antidromic excitation of HVCX neurons by axon terminal optical stimulation, optic fibres were implanted over area X (470 nm; approximately 20 mW; 100 ms). Signals were acquired at 10 kHz and band-pass filtered (300 Hz high-pass; 20 kHz low-pass). Spike rate (binned every 10 ms) and PSTHs were calculated to quantify light stimulation responses (one to five sites per hemisphere; Spike2). Birds without optically evoked responses were excluded from experiments. Spike counts and PSTHs were normalized to the pre-stimulus baseline (500 ms). Two-way analyses of variance (ANOVA) were calculated comparing the time course between stimulated and non-stimulated recordings: for testing HVC afferents (1-s stimulation), 0–5 s (light stimulation; 0.5–1.5 s) versus 5–10 s (control; no stimulation); for ChRmine-expressing HVC neurons or HVC→area X stimulation (100-ms stimulation), 0.7–1.4 s (300 ms before and after 100-ms light stimulation) versus 5.7–6.4 s (control; no stimulation). Wilcoxon tests were performed on the average time course (with intervals specified in the figure legends).
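As a minimal sketch of the spike-rate binning and baseline normalization described above (the original analysis was done in Spike2, so this Python/NumPy version is only an illustration), the following assumes that "normalized to the pre-stimulus baseline" means division by the mean rate in the 500 ms preceding light onset. The function names and the synthetic spike train are hypothetical.

```python
import numpy as np

BIN_S = 0.010       # 10-ms bins, as stated in the text
BASELINE_S = 0.5    # 500-ms pre-stimulus baseline, as stated in the text
EPS = 1e-9          # tolerance for floating-point bin edges

def binned_rate(spike_times_s, t_stop_s, bin_s=BIN_S):
    """Return bin start times (s) and firing rate (Hz) in fixed-width bins."""
    n_bins = int(round(t_stop_s / bin_s))
    edges = bin_s * np.arange(n_bins + 1)
    counts, _ = np.histogram(spike_times_s, bins=edges)
    return edges[:-1], counts / bin_s

def normalize_to_baseline(rate_hz, bin_starts_s, stim_onset_s,
                          baseline_s=BASELINE_S):
    """Divide the rate by its mean over the pre-stimulus window
    (assumed definition of the normalization)."""
    in_baseline = ((bin_starts_s >= stim_onset_s - baseline_s - EPS)
                   & (bin_starts_s < stim_onset_s - EPS))
    baseline = rate_hz[in_baseline].mean()
    if baseline == 0:
        raise ValueError("no baseline spikes; normalization undefined")
    return rate_hz / baseline

# Synthetic example: 20 Hz before light onset, 100 Hz during a 1-s stimulus
spikes = np.concatenate([np.arange(0.025, 0.5, 0.05),
                         np.arange(0.505, 1.5, 0.01)])
starts, rate = binned_rate(spikes, t_stop_s=2.0)
norm = normalize_to_baseline(rate, starts, stim_onset_s=0.5)
during = (starts >= 0.5 - EPS) & (starts < 1.5 - EPS)
fold_change = norm[during].mean()   # mean normalized rate during stimulation
```

In this synthetic case the stimulated rate is five times the baseline rate, so `fold_change` is approximately 5.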

Ex vivo physiology

Slice preparation

Zebra finches were deeply anaesthetized and then decapitated. The brain was removed from the skull and submerged in cold (1–4 °C) oxygenated dissection buffer. Acute sagittal 230-μm brain slices were cut in ice-cold carbogenated (95% O2/5% CO2) solution, containing 110 mM choline chloride, 25 mM glucose, 25 mM NaHCO3, 7 mM MgCl2, 11.6 mM ascorbic acid, 3.1 mM sodium pyruvate, 2.5 mM KCl, 1.25 mM NaH2PO4 and 0.5 mM CaCl2, and adjusted to 320–330 mOsm. Individual slices were incubated in a custom-made holding chamber filled with artificial cerebrospinal fluid, containing 126 mM NaCl, 3 mM KCl, 1.25 mM NaH2PO4, 26 mM NaHCO3, 10 mM d-(+)-glucose, 2 mM MgSO4 and 2 mM CaCl2, adjusted to 310 mOsm, pH 7.3–7.4 and aerated with a 95% O2/5% CO2 gas mixture. Slices were incubated at 36 °C for 20 min and then kept at room temperature for a minimum of 45 min before recordings.

Slice electrophysiological recording

The slices were constantly perfused in a submersion chamber with 32 °C oxygenated normal artificial cerebrospinal fluid. Patch pipettes were pulled to a final resistance of 3–5 MΩ from filamented borosilicate glass on a Sutter P-1000 horizontal puller. HVC projection neuron classes, as identified by retrograde tracers, were visualized by epifluorescence imaging using a water immersion objective (×40; 0.8 numerical aperture) on an upright Olympus BX51 WI microscope, with video-assisted infrared CCD camera (QImaging Rolera). Data were low-pass filtered (10 kHz) and acquired (10 kHz) (Axon MultiClamp 700B amplifier, Axon Digidata 1550B data acquisition and Clampex 10.6; Molecular Devices).

For voltage clamp whole-cell recordings, the internal solution contained 120 mM cesium methanesulfonate, 10 mM CsCl, 10 mM HEPES, 10 mM EGTA, 5 mM creatine phosphate, 4 mM ATP–Mg and 0.4 mM GTP–Na (adjusted to pH 7.3–7.4 with CsOH). For current clamp recordings, the internal solution contained 116 mM K gluconate, 20 mM HEPES, 6 mM KCl, 2 mM NaCl, 0.5 mM EGTA, 4 mM MgATP, 0.3 mM NaGTP and 10 mM Na phosphocreatine (adjusted to pH 7.3–7.4 with KOH; 299 mOsm).

Optically evoked synaptic currents were measured by delivering two light pulses (1 ms, spaced 50 ms, generated by a CoolLED pE-300) focused on the sample through the ×40 immersion objective. Sweeps were delivered every 10 s. Synaptic responses were monitored while holding the membrane voltage at −70 mV (for oEPSCs) and +10 mV (for optogenetically evoked inhibitory postsynaptic currents (oIPSCs)). We monitored different light stimulation intensities before baseline recording to achieve oEPSC responses at approximately 50% of the maximal response. Access resistance (10–30 MΩ) was monitored throughout the experiment, and recordings were discarded from further analysis if resistance changed by more than 20%. The excitation–inhibition (oEPSC/oIPSC) ratio was calculated by dividing the amplitude of the oEPSC at −70 mV by the amplitude of the oIPSC at +10 mV during identical light intensity stimulation. To validate inhibitory and excitatory post-synaptic currents as γ-aminobutyric acid (GABA)ergic and glutamatergic, respectively, in a subset of cells the GABAa receptor antagonist SR 95531 hydrobromide (gabazine; 10 µM) was added to the bath while holding the cell at +10 mV, or the AMPA receptor antagonist 6,7-dinitroquinoxaline-2,3-dione (10 µM) while holding the cell at −70 mV. In another subset of cells, once the baseline measures were established, we tested for monosynaptic connectivity by bath application of 1 µM TTX, followed by 100 µM 4-AP, and measured the amplitude of post-synaptic currents returning following 4-AP application. On the basis of the signal-to-noise ratio of the recordings, currents under 5 pA were considered unreliable and not considered further, as were currents rescued by 4-AP application with an amplitude less than 10 pA (non-monosynaptic; two instances: 1 HVCX→HVCX and 1 HVCX→HVCRA).
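The ratio and the amplitude criteria above can be summarized in a short Python sketch. It is an illustration rather than the analysis pipeline used here: the function names are hypothetical, amplitudes are taken as absolute values (an assumption, because inward and outward currents differ in sign), and a rescued current of exactly 10 pA is treated as monosynaptic.

```python
def ei_ratio(oepsc_pa, oipsc_pa):
    """Excitation-inhibition ratio: oEPSC amplitude at -70 mV divided by
    oIPSC amplitude at +10 mV, at identical light intensity."""
    return abs(oepsc_pa) / abs(oipsc_pa)

def classify_connection(baseline_pa, post_4ap_pa,
                        min_reliable_pa=5.0, min_rescued_pa=10.0):
    """Apply the amplitude criteria stated in the text.

    baseline_pa : oEPSC amplitude before TTX
    post_4ap_pa : amplitude of the current returning after TTX + 4-AP
    """
    if abs(baseline_pa) < min_reliable_pa:
        return "excluded"            # below the 5-pA reliability threshold
    if abs(post_4ap_pa) >= min_rescued_pa:
        return "monosynaptic"        # current rescued by 4-AP
    return "non-monosynaptic"        # rescued current under 10 pA

# Made-up amplitudes (pA), for illustration only
examples = {
    "cell_1": classify_connection(-80.0, -45.0),
    "cell_2": classify_connection(-30.0, -6.0),
    "cell_3": classify_connection(-3.0, 0.0),
}
ratio = ei_ratio(-60.0, 120.0)
```

Here `cell_1` is classed as monosynaptic, `cell_2` as non-monosynaptic and `cell_3` is excluded, and the example ratio is 0.5.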

Histology and immunohistochemistry

Birds were anaesthetized with EUTHASOL (Virbac) and transcardially perfused with 4% paraformaldehyde in phosphate-buffered saline (PBS). Free-floating sagittal sections (30 µm) were cut using a cryostat (Leica CM1950). These sections were first washed in PBS, then blocked in 3% bovine serum albumin in 0.3% Triton X-100 in PBS for 1 h at room temperature and incubated with primary antibodies (α-NeuN MAB377, Millipore, 1:500; α-GFP a11122, Invitrogen, 1:1,000) diluted in the blocking buffer at 4 °C for 24 h. The slices were washed with PBS and incubated at room temperature for 2 h with fluorescent secondary antibodies (Jackson 715-605-150 Alexa Fluor 647-conjugated donkey anti-mouse for NeuN and Millipore A21206 Alexa Fluor 488-conjugated goat anti-rabbit for GFP) diluted in blocking buffer. After PBS wash, sections were mounted onto slides with Fluoromount-G (eBioscience). Composite images were acquired and stitched using an LSM 880 or LSM 710 laser scanning confocal microscope (Carl Zeiss) and/or a ZEISS Axio Scan Z1 (University of Texas Southwestern Medical Center Whole Brain Microscopy Facility; RRID: SCR_017949). Image analyses were performed using ImageJ. After electrophysiological recordings, the slices were incubated in 4% paraformaldehyde in PBS. Sections were then washed in PBS, mounted on glass slides with Fluoromount-G (eBioscience) and visualized under an LSM 880 laser scanning confocal microscope (Carl Zeiss). In situ hybridization experiments were conducted as previously reported.

Three-dimensional brain imaging and processing

Imaging and processing of the sample brain with tracers injected in HVC (Alexa 488-conjugated dextran 10,000) and RA (Alexa 568-conjugated dextran 10,000) for three-dimensional (3D) rendering were conducted with the help of Denise Ramirez and Ariana Nawaby (University of Texas Southwestern Medical Center Whole Brain Microscopy Facility; RRID: SCR_017949). After perfusion with 4% paraformaldehyde, the brain was embedded in oxidized agarose in preparation for sectioning. The TissueCyte 1000 instrument (TissueVision) automatically sectioned the entire volume of the brain at 100 µm in the coronal plane and collected mosaic image tiles encompassing each section. For preprocessing, images were downsampled to 1.5-μm xy resolution and colour contrast adjusted to provide high visual contrast between signals of interest and background.

For segmentation, a selected portion of signals of interest in the downsampled contrast adjusted images of the tissue was visually identified, annotated and used to train a random forest classifier for segmentation in ilastik (v.1.3.3) (refs. 61,62,63,64). This classifier was applied to all section images in the brain to assign a probability score to each pixel in the image, corresponding to its chance of belonging to specific fluorescent signals, autofluorescence or background noise. The total autofluorescence (Alexa 488 (green) and Alexa 568 (red)) pixelwise probability scores were further processed and used for visualization.

For segmentation post-processing, to create a grey silhouette of the overall shape of the brain, the autofluorescence probability signal was thresholded using the ImageJ default thresholding algorithm. Any holes in the binary mask were then flood-filled, and particles greater than 3,024 px2 were removed. Green and red probabilities were thresholded at 105 and 79 8-bit pixel intensities, respectively, as determined visually to reduce low-probability noise in the image. The GFP signal in the rostral-most portion of the brain (beyond section 135) was dimmed for better visibility of more caudal structures by subtracting 140 intensity units (8-bit range) from the pixel intensities.

For visualization, combined RGB images of the autofluorescence (grey), Alexa 488 (green) and Alexa 568 (red) post-processed probabilities were visualized in 3D using Vaa3D software (v.V3.447; https://home.penglab.com/proj/vaa3d/home/index.html).

Song analysis

Birdsongs were recorded and analysed using Sound Analysis Pro (SAP) 2011 (ref. 65), and plots were made with a modified version of Avian Vocalization Network66. We manually measured and categorized the outcomes of optogenetic stimulations. Truncations were defined as stimulation-contingent atypical amplitude decays of 300 ms or less (not present in control motifs), visible as silent gaps in the spectrogram. Truncation latencies were measured from the onset of the light delivery to the onset of the optically contingent silent gap. Stop was defined as truncation not followed by continuation or resumption of the motif. Syllable boundaries and complex syllable elements were delimited by silent pauses or by clear spectral continuity changes. Twenty stimulated song segments were measured for stimulated and non-stimulated conditions for quantification of acoustic properties and sound similarity (SAP). Acoustic properties of the stimulated segment were measured and compared with the corresponding song fragment in unstimulated control motifs. When optical stimulation did not cause truncation, acoustic properties were calculated on the song fragment from the onset of optical stimulation to the end of the last syllable. The entire motif was analysed during 1-s stimulation trials.

In the 1-s time window after song truncation, optical stimulation effects were manually classified as falling into one of four categories: (1) motif reset (restarting with the first song syllable, with introductory notes or with syllables that normally link motifs); (2) calls (typical zebra finch calls); (3) introductory notes (those not followed by motif initiation); or (4) pause and continuation (post-truncation motif resumption at any syllable in the motif other than the first syllable). To calculate the normalized motif reset probability, the number of motifs per bout was calculated over 30–50 bouts (defined as chains of motifs, started with introductory notes and mostly uninterrupted; on rare occasions, we found motifs produced within 1 s of other motifs, and they were considered part of the previous bout; M, average number of motifs per bout). Each bird’s probability of motif reset was then normalized (normalized motif reset probability = motif reset probability/[1 − (1/M)]), following the logic that 1/M is the likelihood of each motif being the last in the bout and not being followed by another motif. Therefore, 1 − (1/M) is the probability of a motif being followed by another motif in the current bout. The probability of reset implies the presence of a motif after the truncated one examined. Therefore, dividing by the likelihood of that motif being followed by another one returns a normalized measure of the reset.
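The normalization can be written out as a brief Python sketch. The formula follows the text; the function name, the assumption that the raw reset probability is the fraction of truncations followed by a reset, and the example counts are illustrative only.

```python
from statistics import mean

def normalized_reset_probability(n_resets, n_truncations, motifs_per_bout):
    """Normalized motif reset probability = P(reset) / [1 - (1/M)],
    where M is the average number of motifs per bout."""
    m = mean(motifs_per_bout)
    if m <= 1:
        raise ValueError("M must exceed 1; otherwise no motif is ever followed by another")
    # Assumption: raw reset probability is resets per truncation event
    p_reset = n_resets / n_truncations
    # Probability that a given motif is followed by another in the same bout
    p_followed = 1.0 - 1.0 / m
    return p_reset / p_followed

# Made-up example: 30 resets in 50 truncations, M = 3 motifs per bout
p_norm = normalized_reset_probability(30, 50, [2, 4, 3, 3])
```

With these invented counts the raw reset probability is 0.6, a motif is followed by another with probability 2/3, and the normalized value is therefore approximately 0.9.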

To report cross-motif quantification of truncation or reset latency and resumed vocalization identity probability, events were categorized depending on the time point within the motif at which the onset of the corresponding stimulation occurred. The events were then grouped in 10% bins across the motif duration, per bird, to allow for comparison between birds with different motif lengths. For each bird, 100% was set to the motif duration minus 100 ms, because, given the latency to truncation, stimulation applied later than 100 ms before the end of the motif would lead to unclear effects on the syllables (average truncation latency across groups = 74.36 ± 3.06 ms). Whenever the stimulation happened in the last 100 ms of the motif, the events were classified in the −20% to 0% bins, affecting the transition to the following motif (if any). Stimulations, truncations and post-truncation effects occurring during introductory notes and inter-motif connecting syllables were assigned to these −20% to 0% time bins on the basis of their temporal distance to syllable A (if no syllable A onset was produced, the effects were not considered for further analysis, as we could not categorize the introductory note as produced at a specific distance from the motif for the percentage computation).
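A minimal Python sketch of this binning scheme is given below. It is an illustration under stated assumptions, not the code used for the analysis: the function name is hypothetical, times are expressed in ms relative to the onset of syllable A, pre-motif events are scaled by the same usable duration as within-motif events, and events in the final 100 ms are flagged (returned as `None`) so that the caller can re-express them relative to the following motif.

```python
import math

def motif_bin(onset_ms, motif_ms, tail_ms=100.0, bin_pct=10.0, min_pct=-20.0):
    """Lower edge (in %) of the 10% bin containing a stimulation onset.

    onset_ms : onset time relative to syllable A onset (negative = before motif)
    motif_ms : motif duration; 100% corresponds to motif_ms - tail_ms
    Returns None for events in the last `tail_ms` of the motif or earlier
    than `min_pct`, which need to be handled separately.
    """
    usable_ms = motif_ms - tail_ms
    if usable_ms <= 0:
        raise ValueError("motif must be longer than the excluded tail")
    if onset_ms >= usable_ms:
        return None  # last 100 ms: reassign relative to the next motif
    pct = 100.0 * onset_ms / usable_ms
    if pct < min_pct:
        return None  # too far before the motif to be binned
    return bin_pct * math.floor(pct / bin_pct)

# Made-up example for a 600-ms motif (usable duration 500 ms)
bins = [motif_bin(t, 600.0) for t in (0.0, 125.0, 499.0, -30.0, -80.0, 550.0)]
```

For this invented motif, an onset at 125 ms falls in the 20–30% bin, an onset 30 ms before syllable A falls in the −10% to 0% bin, and an onset at 550 ms is flagged for reassignment.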

To evaluate the likelihood of optogenetic inhibition or stimulation across a motif–motif transition to terminate a bout (Extended Data Fig. 4d–h), we delivered light or sham stimulation across the motif and extending beyond its end, and we quantified the probability of the stimulation to be contingent with the termination of the bout for 50 trials in each condition.

In lesion experiments, a minimum of 20 motifs were scored with SAP against pre-surgery motifs. Failed motif starts were defined as a series of introductory notes not leading to a motif. The number of motifs in a bout was counted over 50 bouts; for TeNT experiments, for birds that would ultimately lose their song (UVAHVC TeNT; some HVCX TeNT), the last 50 bouts before song cessation were analysed. When no motifs were produced post-lesion (Fig. 3b; the birds did not sing at all), the accuracy was assigned a value of 0 for the sake of classification.

Recurrent circuit model of HVC

The computational model used in this study is based on a canonical recurrent circuit model (continuous attractor neural network67,68,69) and was simulated in the BrainPy framework70. In a typical continuous attractor neural network, excitatory neurons are arranged to uniformly cover a linear feature space (for example, the location of the timing chain in the current case71) and have mutual interactions through recurrent connections72. This configuration gives rise to a continuous manifold that sustains a series of activity bumps. A song motif is considered to be controlled by an activity bump traversing from one end of the chain to another73.

To better reflect the biological characteristics of the songbird HVC, we introduced several specific features.

The model incorporates the following five distinct neuron types to capture the functional diversity in the songbird HVC:

  1. (1)

    Excitatory neurons (HVCRA, \({{\bf{r}}}_{{\rm{RA}}}\), and HVCX, \({{\bf{r}}}_{{\rm{X}}}\))

    The excitatory neurons responsible for encoding the neural sequence are divided into two groups (HVCRA and HVCX), with their firing rates denoted as \({{\bf{r}}}_{{\rm{RA}}}\) and \({{\bf{r}}}_{{\rm{X}}}\), respectively. Consistent with experimental observations, the model only includes intergroup connections and leaves neurons within the same group unconnected. Simulations demonstrated that these intergroup connections are sufficient to self-sustain non-zero responses and moving sequences.

  2. (2)

    Global inhibitory neurons (\({{\bf{r}}}_{{\rm{g}}}\))

    To keep the stability of the network, the network model contains a global inhibitory neuron with the firing rate \({{\bf{r}}}_{{\rm{g}}}\). Compared with excitatory neurons, in the model, this neuron has more rapid dynamics and a steeper activation function to provide effective global inhibition.

  3. Local inhibitory neurons (\({{\bf{r}}}_{{\rm{I}}}\))

    The circuit model has another group of inhibitory neurons (\({{\bf{r}}}_{{\rm{I}}}\)) providing local, structured inhibitory feedback to the excitatory populations, which is essential to generate spontaneous movement of the population activity bumps of excitatory neurons within the circuit. The \({{\bf{r}}}_{{\rm{I}}}\) bump slightly lags behind the excitatory neuron bumps owing to transmission delay and slow dynamics, so that the excitatory neurons at more distant locations will be suppressed less and build up more activity. As a result, the activity bump of excitatory neurons is ‘pushed’ to move forward.

  4. Peri-song neurons (\({{\bf{r}}}_{{\rm{ps}}}\))

The circuit model contains an HVCRA peri-song neuron group (\({{\bf{r}}}_{{\rm{ps}}}\)) that is modelled to target HVCRA song neurons at the initial end of the manifold. This group plays a critical role in initiating and resetting motif generation.

Circuit dynamics

The neural dynamics underlying these activities are captured by a set of dynamic equations:

$${\tau }_{{\rm{E}}}{\dot{{\bf{r}}}}_{{\rm{R}}{\rm{A}}}=-{{\bf{r}}}_{{\rm{R}}{\rm{A}}}+{W}_{{\rm{X}},{\rm{R}}{\rm{A}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{X}}})+{W}_{{\rm{I}},{\rm{R}}{\rm{A}}}\cdot {f}_{{\rm{I}}}({{\bf{r}}}_{{\rm{I}}})+{w}_{{\rm{g}},{\rm{R}}{\rm{A}}}\,{f}_{{\rm{g}}}({{\bf{r}}}_{{\rm{g}}})+{W}_{{\rm{p}}{\rm{s}},{\rm{R}}{\rm{A}}}\,{f}_{{\rm{p}}{\rm{s}}}({{\bf{r}}}_{{\rm{p}}{\rm{s}}})+{I}_{{\rm{e}}{\rm{x}}{\rm{t}},1}$$
(1.1)
$${\tau }_{{\rm{E}}}{\dot{{\bf{r}}}}_{{\rm{X}}}=-{{\bf{r}}}_{{\rm{X}}}+{W}_{{\rm{R}}{\rm{A}},{\rm{X}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{R}}{\rm{A}}})+{W}_{{\rm{I}},{\rm{X}}}\cdot {f}_{{\rm{I}}}({{\bf{r}}}_{{\rm{I}}})+{w}_{{\rm{g}},{\rm{X}}}\,{f}_{{\rm{g}}}({{\bf{r}}}_{{\rm{g}}})+{I}_{{\rm{e}}{\rm{x}}{\rm{t}},2}$$
(1.2)
$${\dot{{\bf{r}}}}_{{\rm{g}}}=-{{\bf{r}}}_{{\rm{g}}}+[{W}_{{\rm{R}}{\rm{A}},{\rm{g}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{R}}{\rm{A}}})+{W}_{{\rm{X}},{\rm{g}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{X}}})]$$
(1.3)
$${\tau }_{{\rm{I}}}{\dot{{\bf{r}}}}_{{\rm{I}}}=-{{\bf{r}}}_{{\rm{I}}}+[{W}_{{\rm{R}}{\rm{A}},{\rm{I}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{R}}{\rm{A}}})+{W}_{{\rm{X}},{\rm{I}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{X}}})]$$
(1.4)
$${\tau }_{{\rm{p}}{\rm{s}}}{\dot{{\bf{r}}}}_{{\rm{p}}{\rm{s}}}=-{{\bf{r}}}_{{\rm{p}}{\rm{s}}}+{I}_{{\rm{U}}{\rm{v}}{\rm{a}}}+{w}_{{\rm{g}},{\rm{p}}{\rm{s}}}\,{f}_{{\rm{g}}}({{\bf{r}}}_{{\rm{g}}})$$
(1.5)

In these equations, subscripts denote the neuron types. The parameter \(\tau \) represents the time constant, and \(f(\cdot )\) denotes the activation function for each neuron group. External input currents are denoted as \({I}_{\mathrm{ext}}\), and specific terms such as \({I}_{\mathrm{Uva}}\) correspond to input from Uva. The capital \({W}_{{\rm{A}},{\rm{B}}}\) indicates the connection matrix from group A to B with dimensions \({N}_{{\rm{B}}}\times {N}_{{\rm{A}}}\), where \(N\) is the number of neurons in the respective group, whereas the lowercase \(w\) indicates a scalar connection strength. For convenience, we set \({N}_{\mathrm{RA}}={N}_{{\rm{X}}}={N}_{{\rm{I}}}=N\) and \({N}_{{\rm{g}}}={N}_{\mathrm{ps}}=1\). To support a continuous manifold, the entries of the connection matrices between excitatory and local inhibitory neurons are determined by the distance between the indices of the pre-synaptic and post-synaptic neurons:

$${W}_{{\rm{A}},{\rm{B}}}^{(ij)}={w}_{{\rm{A}},{\rm{B}}}\,\exp \left[-{\left(\frac{2{\rm{\pi }}}{N}\right)}^{2}\frac{{(i-j)}^{2}}{2{\sigma }^{2}}\right]$$
(2)

where \({w}_{{\rm{A}},{\rm{B}}}\) (\({\rm{A}},{\rm{B}}\in \{\mathrm{RA},{\rm{X}},{\rm{I}}\}\)) denotes the peak weight of the connection from neuronal population \({\rm{A}}\) to \({\rm{B}}\).

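To target the peri-song output to the initial location of the manifold, \({W}_{\mathrm{ps},{\rm{E}}}\) (the projection written as \({W}_{\mathrm{ps},\mathrm{RA}}\) in the \({{\bf{r}}}_{{\rm{RA}}}\) dynamics above) is an \(N\times 1\) matrix whose single column follows a Gaussian profile centred at index 0: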

$${W}_{\mathrm{ps},{\rm{E}}}^{({\rm{k}})}={w}_{\mathrm{ps},{\rm{E}}}\,\exp \left[-{\left(\frac{2{\rm{\pi }}}{N}\right)}^{2}\frac{{(k-0)}^{2}}{2{\sigma }^{2}}\right]$$
(3)
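
As an illustration of how the Gaussian profiles in equations (2) and (3) could be built, the following is a minimal sketch in plain Python. It is not the BrainPy implementation used in the study; the function names and the values of \(N\), \(\sigma \) and the peak weights are assumptions chosen only for demonstration.

```python
import math

def gaussian_profile(distance, n, sigma):
    """Gaussian fall-off shared by equations (2) and (3) for an index distance."""
    scaled = (2.0 * math.pi / n) * distance  # map index distance onto a 2*pi-long axis
    return math.exp(-(scaled ** 2) / (2.0 * sigma ** 2))

def connection_matrix(n, w_peak, sigma):
    """Equation (2): W[i][j] = w_peak * exp(-(2*pi/N)^2 * (i - j)^2 / (2 * sigma^2))."""
    return [[w_peak * gaussian_profile(i - j, n, sigma) for j in range(n)]
            for i in range(n)]

def peri_song_projection(n, w_peak, sigma):
    """Equation (3): N x 1 projection with a Gaussian profile centred on index 0."""
    return [w_peak * gaussian_profile(k, n, sigma) for k in range(n)]

# Illustrative values only (assumptions, not the parameters of the study).
N, SIGMA = 64, 0.4
W_X_RA = connection_matrix(N, w_peak=1.0, sigma=SIGMA)
W_PS_RA = peri_song_projection(N, w_peak=0.5, sigma=SIGMA)
```

In this sketch the matrix is symmetric in \(i\) and \(j\) and peaks on the diagonal, whereas the peri-song projection is strongest at the first neuron of the chain and decays with distance from it.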

Sequence initiation

The fundamental property of the network is its ability to spontaneously generate neural sequences. In our model, peri-song neurons initiate the sequential activity. The peri-song neurons receive excitatory input, probably originating from the upstream nucleus Uva, while simultaneously receiving inhibitory input from the global inhibitory neurons. When the network is silenced, whether at rest or following truncation, activity in the global inhibitory neuron decreases, which disinhibits the peri-song neurons. This release from inhibition then triggers the onset of a motif.
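
The disinhibition mechanism described above can be sketched with a single peri-song unit integrated by the forward Euler method. This is only a toy illustration of the peri-song equation: the rectified-linear activation functions, the parameter values and the prescribed time course of \({{\bf{r}}}_{{\rm{g}}}\) are all assumptions, because the text does not specify them.

```python
def relu(x):
    """Assumed rectified-linear activation (the form of f is not given in the text)."""
    return max(x, 0.0)

def simulate_peri_song(r_g_trace, i_uva=1.0, w_g_ps=-2.0, tau_ps=10.0, dt=1.0):
    """Forward Euler integration of
    tau_ps * d(r_ps)/dt = -r_ps + I_Uva + w_g_ps * f_g(r_g).
    All parameter values are hypothetical."""
    r_ps = 0.0
    outputs = []
    for r_g in r_g_trace:
        drive = i_uva + w_g_ps * relu(r_g)   # w_g_ps < 0 models global inhibition
        r_ps += dt / tau_ps * (-r_ps + drive)
        outputs.append(relu(r_ps))           # rectified peri-song output
    return outputs

# Global inhibition is high while the network is active (first 200 steps),
# then drops to zero once the network falls silent (last 200 steps).
trace = [1.0] * 200 + [0.0] * 200
out = simulate_peri_song(trace)
```

With these assumed values, the peri-song output stays at zero while global inhibition is present and recovers towards the level set by \({I}_{\mathrm{Uva}}\) once inhibition is withdrawn, which is the release that would trigger a new motif.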

Boundaries

Following the activation of excitatory neurons, the activity bump is driven by locally structured inhibitory feedback from \({{\bf{r}}}_{{\rm{I}}}\) to traverse the continuous manifold. For the bump to gain a directional motion tendency, the inhibitory feedback is intentionally enhanced at the initial locations on the chain. Owing to the recurrent nature of the network, the bump would ordinarily ‘bounce’ back upon reaching the end of the chain. However, this behaviour is inconsistent with observed data. To address this, we introduced a fading mechanism for excitatory-to-excitatory connections as the bump approaches the boundary, simulating a ‘boundary effect’. This gradual reduction in connectivity causes the bump to diminish as it reaches the end point, resulting in an automatic cessation of activity that mimics the natural termination of a motif. These two boundary behaviours were implemented by multiplying the connection strength with a compensation factor:

$${{W}_{{\rm{I}},{\rm{R}}{\rm{A}}/{\rm{X}}}^{(ij)}}^{{\prime} }={W}_{{\rm{I}},{\rm{R}}{\rm{A}}/{\rm{X}}}^{(ij)}\,\left(1+{c}_{0}\,\exp \left[-{\left(\frac{2{\rm{\pi }}}{N}\right)}^{2}\frac{{i}^{2}}{4{\sigma }^{2}}\right]\right)$$
(4.1)
$${{W}_{{\rm{R}}{\rm{A}}/{\rm{X}},{\rm{X}}/{\rm{R}}{\rm{A}}}^{(ij)}}^{{\prime} }={W}_{{\rm{R}}{\rm{A}}/{\rm{X}},{\rm{X}}/{\rm{R}}{\rm{A}}}^{(ij)}\,\left(1-{c}_{1}\,\exp \left[-{\left(\frac{2{\rm{\pi }}}{N}\right)}^{2}\frac{{(i-N-\phi )}^{2}}{4{\sigma }^{2}}\right]\right)$$
(4.2)

where \(\phi \) is an offset term, set to \(\phi =0.5\sigma N/2{\rm{\pi }}\). The compensated connection matrices are shown in Fig. 5c.
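
The two compensation factors can be sketched as follows. The sketch makes two assumptions that the text does not settle: the offset is read as \(\phi =0.5\,\sigma N/(2{\rm{\pi }})\), and the factor, which depends only on the index \(i\), is applied row by row. The parameter values and function names are hypothetical.

```python
import math

def boundary_factors(n, sigma, c0, c1):
    """Return (start_gain, end_fade) lists indexed by i.

    start_gain[i] = 1 + c0 * exp(-(2*pi/N)^2 * i^2 / (4 * sigma^2))
    end_fade[i]   = 1 - c1 * exp(-(2*pi/N)^2 * (i - N - phi)^2 / (4 * sigma^2))
    """
    scale = (2.0 * math.pi / n) ** 2
    phi = 0.5 * sigma * n / (2.0 * math.pi)  # assumed reading of the offset term
    start_gain = [1.0 + c0 * math.exp(-scale * i ** 2 / (4.0 * sigma ** 2))
                  for i in range(n)]
    end_fade = [1.0 - c1 * math.exp(-scale * (i - n - phi) ** 2 / (4.0 * sigma ** 2))
                for i in range(n)]
    return start_gain, end_fade

def scale_rows(matrix, factors):
    """Multiply row i of the matrix by factors[i] (assumes i indexes rows)."""
    return [[factors[i] * w for w in row] for i, row in enumerate(matrix)]

# Illustrative values only (assumptions, not the parameters of the study).
start_gain, end_fade = boundary_factors(n=64, sigma=0.4, c0=0.5, c1=1.0)
```

Under these assumptions, inhibition is boosted by a factor of \(1+{c}_{0}\) at the first index and is unchanged mid-chain, whereas excitatory coupling is left intact at the start and fades progressively towards the final index.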

Truncation

To simulate the truncation of HVC neuronal sequences by optogenetic stimulation observed in experimental studies, we applied an intense, spatially homogeneous pulse input to either HVCRA or HVCX neurons. Following this stimulation, both \({{\bf{r}}}_{{\rm{RA}}}\) and \({{\bf{r}}}_{{\rm{X}}}\) became hyper-activated, leading to rapid suppression by the fast response of \({{\bf{r}}}_{{\rm{g}}}\). The excitatory neurons remained suppressed until \({{\bf{r}}}_{{\rm{g}}}\) activity subsided, corresponding to the observed motif truncation (Fig. 5e,f). Subsequently, the peri-song neurons reinitiated the neural sequence, allowing the motif to resume from the beginning. Because HVCRA and HVCX are connected symmetrically in the current model, we simulated optogenetic stimulation only on HVCRA as a verification.

HVCX degradation

To simulate the effects of degradation of HVCX neuron neurotransmission, as observed in Fig. 5g,h, we manually modified the output projections of HVCX. Let \(p\) denote the proportion of degradation. Under this condition, the degraded projection from HVCX to HVCRA (\({W}_{{\rm{X}},\mathrm{RA}}{\prime} \)) can be expressed as

$${{W}_{{\rm{X}},{\rm{R}}{\rm{A}}}^{(ij)}}^{{\prime} }={[(1-p){W}_{{\rm{X}},{\rm{R}}{\rm{A}}}^{(ij)}+\sqrt{(1-p){W}_{{\rm{X}},{\rm{R}}{\rm{A}}}^{(ij)}}{\sigma }_{{\rm{W}}}{\xi }_{ij}]}_{+}$$
(5)

where \({W}_{{\rm{X}},\mathrm{RA}}^{({ij})}\) represents the original connection strength, \({\sigma }_{W}\) denotes the variation coefficient, \({\xi }_{{ij}}\) is an independent Gaussian noise term indexed by the pre-synaptic and post-synaptic neuron indices \(ij\), and \({[x]}_{+}=\max (x,0)\) denotes rectification, which clips negative values to zero so that the weight always remains excitatory (non-negative).
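
A minimal sketch of the degradation rule in equation (5) is given below, using only the Python standard library. The function name, the seed and the example values are assumptions for illustration; the sketch also assumes that the original weights are non-negative and that \(0\le p\le 1\), so that the square root is defined.

```python
import math
import random

def degrade_weights(weights, p, sigma_w, rng):
    """Equation (5): W' = [(1 - p) * W + sqrt((1 - p) * W) * sigma_w * xi]_+ ,
    with an independent standard Gaussian xi drawn for every entry."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    keep = 1.0 - p
    degraded = []
    for row in weights:
        new_row = []
        for w in row:
            mean = keep * w                       # scaled-down weight
            noisy = mean + math.sqrt(mean) * sigma_w * rng.gauss(0.0, 1.0)
            new_row.append(max(noisy, 0.0))       # rectify: keep weights excitatory
        degraded.append(new_row)
    return degraded

# Hypothetical 2 x 2 example with a fixed seed for reproducibility.
rng = random.Random(0)
W = [[1.0, 0.5], [0.5, 1.0]]
W_half = degrade_weights(W, p=0.5, sigma_w=0.2, rng=rng)
```

Because the noise amplitude scales with the square root of the surviving weight, the perturbation vanishes together with the weight as \(p\) approaches 1.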

During synaptic degradation over weeks, experiments revealed that neuronal sequences observed in different trials within the same day could traverse and then disappear at random locations on the chain. We assume that the synaptic weights within the same day are nearly the same, and that the random progression along the chain results from the variability of single neurons. Therefore, to reproduce the random progression along the chain during synaptic degradation, each HVCRA neuron \({{\bf{r}}}_{{\rm{RA}}}(j)\) receives a Poisson-like noise \({I}_{\mathrm{noise}}\), mimicking stochastic spike generation:

$${I}_{{\rm{n}}{\rm{o}}{\rm{i}}{\rm{s}}{\rm{e}}}(j)=\sqrt{F\,{{\bf{r}}}_{{\rm{R}}{\rm{A}}}(j)}\,\xi (t)$$
(6)

where \(F\) is the Fano factor scaling the noise and \(\xi (t)\) is a standard Gaussian white noise. Moreover, the noises received by different neurons are independent of each other. Under these conditions, we observed that the sequences terminated at random positions. As illustrated in Fig. 5g, the average sequence length decreased as the proportion of neuronal degradation increased.
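
The noise term in equation (6) can be sketched as one independent Gaussian draw per neuron, scaled by the square root of the Fano factor times the firing rate. The function name and values below are assumptions; negative rates are clipped before the square root as a safeguard that the text does not mention, and the scaling of white noise with the integration time step is deliberately left out.

```python
import math
import random

def spiking_noise(rates, fano, rng):
    """Equation (6): I_noise(j) = sqrt(F * r_RA(j)) * xi, with xi ~ N(0, 1)
    drawn independently for every neuron j."""
    return [math.sqrt(fano * max(r, 0.0)) * rng.gauss(0.0, 1.0) for r in rates]

# Hypothetical firing rates for three HVCRA neurons; fixed seed for reproducibility.
rng = random.Random(0)
noise = spiking_noise([0.0, 2.0, 8.0], fano=2.0, rng=rng)
```

In this form the noise variance for neuron \(j\) equals \(F\,{{\bf{r}}}_{{\rm{RA}}}(j)\), so silent neurons receive no noise and strongly active neurons receive the largest fluctuations.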

Statistical analysis

All data were analysed with GraphPad Prism 10. Data were tested for normality using the Shapiro–Wilk test, and parametric or non-parametric statistical tests were used as appropriate. To compare two groups, t-tests, Mann–Whitney tests and Kolmogorov–Smirnov tests were used. For more than two conditions, one-way or two-way ANOVA or the Kruskal–Wallis test was performed. Cumulative probability curves were calculated for each animal and then tested in groups for statistical significance. Only one comparison among all groups was made to avoid repeatedly comparing the same dataset (HVC) with individual other datasets. Fisher’s or χ² tests, followed by Dunn’s post hoc test, were used to compare the probability of finding optically evoked responses across the HVC projection neuron classes while stimulating the different afferents. Dunn’s, Sidak’s or Holm–Sidak’s post hoc tests were used to correct for multiple comparisons. Statistical significance is denoted as *P < 0.05, **P < 0.01 and ***P < 0.001.

Statistics and reproducibility

Each experimental result was produced independently and/or by combining at least two separate cohorts with similar results (for example, Uva lesions/silencing in Fig. 3 and Extended Data Fig. 4, multi-nuclei lesions in Extended Data Fig. 6 and HVCX–TeNT experiments in Fig. 5 and Extended Data Fig. 13). Figures showing viral expression or lesion extent are broadly representative of each experimental group.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.