Holistic motor control of zebra finch song syllable sequences

Trusel, Massimo; Zuo, Junfeng; Alam, Danyal H.; Marks, Ethan S.; Koch, Therese M. I.; Cao, Jie; Pancholi, Harshida; Zhao, Ziran; Cooper, Brenton G.; Zhang, Wen-Hao; Roberts, Todd F.

doi:10.1038/s41586-025-10069-z

Open access
Published: 28 January 2026

Nature volume 652, pages 157–166 (2026)

Abstract

How brain circuits are organized to skillfully produce learned sequences of behaviours is still poorly understood. Here we functionally examined how the cortical song premotor region HVC, which is necessary for zebra finch song¹, controls the sequential production of learned song syllables. We found that HVC could generate the complete sequence of learned song syllables independently of its main synaptic input pathways. Thalamic input to HVC was needed for song initiation, but it was not required for transitions between syllables or for song completion. We showed that excitation of HVC neurons during song reliably caused vocalizations to skip back to the beginning of the song, in a manner reminiscent of a skipping record. This restarting of syllable sequences could be induced at any moment of the song and relied on local circuits within HVC. We identified and computationally modelled a synaptic network, including intratelencephalic premotor and corticostriatal neurons within HVC that are essential for completing song syllable sequences. Together, our results show that the learned zebra finch song is controlled by a cortical sequence-generating network in HVC that, once started, can sustain production of all song syllables independent of major extrinsic input pathways. Thus, sequential neuronal activity can be organized to fuse well-learned vocal motor sequences, ultimately achieving holistic control of this naturally learned behaviour.

Main

Motor behaviours are considered to be learned by splitting and chunking smaller behavioural units into sequences of neural activity and then concatenating the sequences into a unified premotor plan that supports the fluid production of the entire behaviour^2,3,4,5,6,7. Although there is evidence for chunking^3,6, identifying the neural origin of unified premotor programs has remained challenging. The control of learned birdsong provides a tractable model for searching for such premotor programs. Songbirds are among the few groups of animals, apart from humans, that learn their vocalizations through imitation. Moreover, birdsong is controlled by dedicated forebrain circuits⁸.

Zebra finches learn a single courtship song motif. They engage in extensive daily practice to maintain expert performance of this song. Sparse sequential neuronal activity in the pallial song nucleus HVC probably underlies the production of zebra finch song^{9,10,11,12,13}. However, how neural sequences in HVC contribute to the progression of the song motif is still not well understood. Several lines of evidence support the idea that song control may involve reciprocal loops spanning the brainstem, thalamus and pallium^14,15,16,17 (Extended Data Fig. 1a), whereas other studies suggest that HVC may be capable of generating neural sequences for song production more autonomously^10,11,18,19.

Several models have been proposed for how adult song is controlled: (1) sequential activity in HVC can sustain progression through all song syllables independently of instructive afferent inputs^18,19,20; (2) input pathways link shorter neural sequences at syllable or other vocal parameter boundaries^14,17,21,22; and (3) HVC sequences are continuously updated by instructive afferent input^15,16. Research in this area has relied on correlations between song and electrophysiological recordings or on non-selective circuit manipulations, including electrical stimulation, cooling of brain regions and electrolytic lesioning. Here we combine a series of cell-type, circuit and pathway manipulations with synaptic mapping and computational modelling to causally examine how neural sequences contribute to completing the song motif. This study reveals that, barring a permissive thalamic input important for song initiation, HVC can independently propagate activity for production of all song syllables in the motif, and that this network relies on two synaptically interconnected classes of HVC projection neurons.

Optogenetic restarting of song

Electrical stimulation of HVC has varied effects on song production, including distortion of syllable acoustic features, truncation of song and occasional restarting of song soon after song truncation^23,24. However, these studies are difficult to interpret because stimulation cannot be restricted to specific cell types or to cells within a small spatial volume, the stimulated population of neurons is highly dependent on electrode placement and there is inevitable antidromic and orthodromic activation of neurons and passing axons²⁵.

Instead, we selectively controlled HVC activity using viral expression of the excitatory opsin ChRmine (n = 6 birds; Fig. 1a and Extended Data Fig. 1b). This provided experimental control over a population of HVC neurons composed of approximately 20% inhibitory neurons and 80% principal neurons, with a bias towards HVC_X projection neurons²⁶. Birds were implanted with fibre optics over HVC, and syllable detection software was used to perform closed-loop optogenetic manipulations while the birds were freely singing. Light stimulation reliably caused song truncation, seen as a rapid decrease in sound amplitude and disruption of syllable acoustic features (stimulation outcome probability: 86.8 ± 3.6% truncation and 10.2 ± 3.7% pause + continuation; latency to silence from stimulation onset: 66.6 ± 4.1 ms; mean ± s.e.m. in six birds; Fig. 1b–g and Extended Data Fig. 1c–g). Truncation was followed by rapid restarting of the song motif (median latency, 135.8 ± 25.8 ms; lowest quartile, 87.6 ± 15.5 ms; Fig. 1b–d,h–j and Extended Data Fig. 1c). The birds restarted their song from the beginning, either with one or two introductory notes followed by the motif or directly from the first syllable of the motif. This resetting behaviour occurred with high probability, independent of when in the song the optogenetic stimulation was triggered (Fig. 1h; all post-truncation trials reported in Extended Data Fig. 1h). When normalized by the likelihood of the birds to chain multiple motifs in series, the probability of a stimulated motif being immediately followed by another motif was 108.6 ± 4.9%. This suggests that the optogenetic perturbation caused the song to restart from the beginning of the motif without prematurely ending the song bout (Fig. 1i and Extended Data Fig. 1i).
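The normalized continuation probability can be read as a ratio of two conditional probabilities, averaged across birds. The sketch below assumes per-bird counts of stimulated and unstimulated motifs that were immediately followed by another motif. The function names and counts are hypothetical, and this is not the authors' analysis code.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_continuation(stim_followed, stim_total, base_followed, base_total):
    """Percentage of stimulated motifs immediately followed by another motif,
    divided by the same fraction for unstimulated (baseline) motifs.
    A value of 100% means stimulation did not change the chance of chaining motifs."""
    p_stim = stim_followed / stim_total
    p_base = base_followed / base_total
    return 100.0 * p_stim / p_base


def mean_sem(values):
    """Mean and standard error of the mean across birds (one value per bird)."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical per-bird counts: (stim followed, stim total, baseline followed, baseline total)
birds = [(45, 50, 80, 100), (38, 40, 90, 100), (27, 30, 85, 100)]
per_bird = [normalized_continuation(*counts) for counts in birds]
m, s = mean_sem(per_bird)
print(f"{m:.1f} ± {s:.1f}%")
```

Under this assumed reading, a value near or above 100% means that stimulated motifs were followed by another motif at least as often as unstimulated ones. That pattern is what one would expect if stimulation restarts the motif rather than ending the bout.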

Fig. 1: Optogenetic excitation of HVC causes truncation and restarting of the song motif.

To better understand how our circuit manipulations affect the motor control of the song, we recorded subsyringeal air sac pressure during optogenetic stimulations. Optogenetic stimulation applied during quiet respiration neither induced vocalization nor altered respiratory patterns in the birds. By contrast, stimulation during singing caused rapid cessation of expiration during ongoing syllables (Fig. 1k,l and Extended Data Fig. 1j). Syllable truncations resulted from significant respiratory pressure deviations within 36.4 ± 4.0 ms of light onset, approximately 30 ms before vocalizations were acoustically truncated, consistent with previous studies^15,27 (Extended Data Fig. 1k). Finally, trials in which birds did not quickly restart singing after optogenetic stimulation could be explained by apnoea. In some cases, optogenetic activation suppressed involuntary respiration, which effectively blocked the reinitiation of song (apnoea duration: 588.2 ± 216.8 ms; Extended Data Fig. 1l).
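The timing relationship between respiration and sound follows from the two averages. Acoustic silence at 66.6 ms minus the pressure deviation at 36.4 ms leaves a lead of roughly 30 ms. One simple way to define the onset of a "significant" pressure deviation is sketched below. It assumes a trial-averaged unstimulated pressure template with a per-sample standard deviation. The threshold `k` and run length `min_run` are illustrative choices, not values reported in the study.

```python
def deviation_latency_ms(trace, template, sd, onset_idx, fs_hz, k=3.0, min_run=3):
    """Latency (ms) from light onset to the first sample of the earliest run of
    `min_run` consecutive samples in which the stimulated pressure trace departs
    from the unstimulated template by more than k standard deviations.
    Returns None if no such run occurs."""
    run = 0
    for i in range(onset_idx, len(trace)):
        if abs(trace[i] - template[i]) > k * sd[i]:
            run += 1
            if run == min_run:
                first = i - min_run + 1
                return 1000.0 * (first - onset_idx) / fs_hz
        else:
            run = 0
    return None


# Synthetic example sampled at 1 kHz: light at sample 10, pressure drops from sample 40.
template = [0.0] * 100
sd = [1.0] * 100
trace = [0.0] * 40 + [-5.0] * 60
print(deviation_latency_ms(trace, template, sd, onset_idx=10, fs_hz=1000))  # 30.0
```

Requiring a short run of supra-threshold samples, rather than a single one, is one common way to avoid triggering on isolated noise. The run length trades sensitivity for robustness.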

Together, these data indicate that HVC can control downstream steady-state respiration circuitry in a state-dependent manner. They also indicate that, once HVC is engaged, stimulation interrupts the chain of activity in HVC, resulting in abrupt song truncation and resetting of the motif back to its initial state. These attributes are reminiscent of the responses to perturbation in central pattern-generating (CPG) networks described in the invertebrate and vertebrate nervous systems^28,29. Another defining feature of CPG networks is that, once initiated, they can produce patterned activity in the absence of instructive patterned input. The seemingly automatic and rapid restarting of song hints that extrinsic inputs to HVC may function permissively, rather than instructively, in song motif production (Fig. 1m). This raises the possibility that HVC produces the neuronal sequences for song in the absence of instructive patterned input and may function as a pattern-generating network for song syllable sequences.

Song initiation needs thalamic input

Input to HVC from the thalamic nucleus Uvaeformis (Uva) is one probable source of instructive signals for producing the song motif^18,22,30,31. Electrical stimulation of Uva was reported to cause motif truncation at syllable boundaries¹⁷, suggesting that the inputs of Uva to HVC are instructive for motor programs to transition from one syllable to the next. To test this idea, we first used optogenetic excitation of the axon terminals of Uva in HVC through viral expression of eGtACR1, an opsin that potently drives excitation of axon terminals in zebra finches³¹ (Extended Data Fig. 2a,b). Light stimulation of eGtACR1-expressing Uva–HVC terminals drives strong transient increases in HVC activity (Fig. 2a and Supplementary Table 1). In contrast to thalamic electrical stimulation¹⁷, the optogenetic excitation of Uva terminals during singing did not cause motif truncation and left song syntax and spectral characteristics unaffected (Fig. 2b,c, Extended Data Fig. 3a,b and Supplementary Table 1).

Fig. 2: Uva does not instruct transitions between syllables in the song motif.

The lack of effects on song, even with prolonged stimulation, prompted us to test whether direct optogenetic excitation in Uva disrupts song. We expressed the excitatory opsin ChRmine in Uva neurons projecting to HVC (Uva_HVC) using an intersectional viral strategy (Fig. 2d). We found that even directly stimulating Uva_HVC neurons failed to cause song truncation and restarting (1.2 ± 1.2%, motif stop; 98.9 ± 1.2%, no effect; Fig. 2e,f and Supplementary Table 1). Moreover, this manipulation had no detectable impact on the spectral characteristics of song syllables (Fig. 2g, Extended Data Fig. 3c and Supplementary Table 1).

One possibility is that manipulations such as electrical stimulation may drive truncation at syllable boundaries through off-target effects, such as recruiting nearby thalamic regions or fibres of passage. Uva is located within the posterior commissure, which connects midbrain regions critical for vocalizations, audition and vision. It is immediately adjacent to the robust nucleus of the arcopallium (RA) fibre tract, which transmits descending motor commands for song (Supplementary Video 1). Neurons in and surrounding Uva relay visual information to the forebrain³², and sudden visual stimulation with a stroboscope elicits orienting responses in zebra finches^27,33 that result in motif truncations at syllable boundaries, similar to those observed with electrical stimulation of Uva¹⁷.

To assess whether off-target effects could be involved in truncating motifs at syllable boundaries, we attempted to mimic the effects of electrical stimulation by non-selectively expressing ChRmine in Uva and the surrounding thalamus (Extended Data Fig. 3d). Broader thalamic optogenetic stimulation resulted in reliable motif truncation at syllable boundaries (91.5 ± 3.6%, motif stop; 0.4 ± 0.4%, pause + continuation; Extended Data Fig. 3d–k). In contrast to optogenetic stimulation in HVC or along the Uva–HVC pathway, optogenetic stimulation of the broader thalamus caused birds to momentarily stop movement and blink, both during singing and non-singing states. This suggests that the manipulation causes a visually evoked orienting response, perhaps mimicking responses to strobe-light visual stimulation^27,33. Consistent with this, broader thalamic stimulation resulted in significantly longer truncation latencies than direct HVC stimulation (Extended Data Fig. 3g–k) and predominantly led to cessation of singing rather than resetting of song (Extended Data Fig. 3l,m). In the few instances when birds returned to singing, the motif reset latency was significantly longer than what we observed when stimulating HVC (Extended Data Fig. 3n,o). These findings suggest that song truncations triggered by electrical stimulation are the result of off-target stimulation of the peri-Uva thalamus, and that Uva is not instructive for HVC syllable sequence progression.

We next examined whether Uva could play a permissive role in song production. Electrolytic lesions of Uva or peri-Uva regions can abolish courtship song production^22,30. However, electrolytic lesions non-selectively ablate neurons and damage axonal fibres. To minimize damage to axonal tracts, we performed bilateral excitotoxic lesions of Uva using a cocktail of ibotenic and quisqualic acid (n = 13 birds). This strategy yielded three outcomes:

(1) Complete lesions of Uva (99.6 ± 0.4% Uva lesioned) that also included the peri-Uva thalamus. These birds could no longer sing their motif.

(2) Large peri-Uva thalamic lesions that mostly spared Uva (10.8 ± 3.8% Uva lesioned). These birds also could no longer sing their motif.

(3) Almost complete Uva lesions (87.5 ± 7.0% Uva lesioned) that spared the broader peri-Uva thalamus. These birds could sing their motif within approximately 1 week following the lesion (Fig. 3a–c and Extended Data Fig. 4a).

This last group of birds demonstrates that HVC can drive production of the entire song motif, even when Uva is significantly lesioned. Nonetheless, these birds chained significantly fewer motifs together in each song bout (Fig. 3d and Supplementary Table 2), and they would often fail to produce their song motif after singing introductory notes (Fig. 3e and Supplementary Table 2). These findings are consistent with Uva lesions disrupting the ability of birds to initiate courtship song performances to female birds³⁰, and they suggest that Uva may be needed for the initiation of the song motif.

Fig. 3: Uva is permissive for motif initiation.

To test the role of Uva in song initiation, we first blocked glutamate release from Uva_HVC neurons using viral expression of tetanus neurotoxin (TeNT) (Fig. 3f and Extended Data Fig. 4b). These birds had progressive difficulty initiating their song on a timeline consistent with viral expression (approximately 10–14 days). They showed increasing failures in motif initiation following singing of introductory notes and a decreased number of motifs per song bout. However, in instances when the motif was initiated, the birds consistently produced all song syllables in the motif with high accuracy (Fig. 3f–i, Extended Data Fig. 4c and Supplementary Table 2). These data support the idea that the Uva–HVC pathway is permissive for initiating learned song motifs rather than instructing song syllable transitions¹⁷.

To test this idea further, we expressed eGtACR1 in Uva and optogenetically silenced Uva neurons during singing (Extended Data Fig. 4d). Silencing Uva during an ongoing song motif did not disrupt the completion or acoustic structure of that motif, but it reduced the probability of initiating and concatenating a subsequent motif (Extended Data Fig. 4e,f). By contrast, in the same birds, placing fibre optics over HVC to excite Uva axon terminals across motif transitions did not suppress initiation of a subsequent song motif (Extended Data Fig. 4g,h). Thus, when Uva input to HVC is excited, birds can continue the ongoing motif and string further motifs together. When Uva input is inhibited, birds still complete the ongoing motif but have difficulty starting the next one. Together, these results indicate that the Uva–HVC pathway is critical for initiating song motifs, potentially coordinating the two hemispheres. However, this pathway is not needed for birds to string together syllables within the song motif.

Pallial afferents are not needed for song

HVC receives excitatory input from three auditory and premotor pallial regions that play important roles in song learning: nucleus interfacialis (NIf), nucleus avalanche and medial magnocellular nucleus of the anterior nidopallium (mMAN)^{30,34,35,36,37}. We examined the role of each pathway in adult song performance. Stimulation of eGtACR1-expressing axon terminals in HVC from any of these regions significantly increased HVC multi-unit firing activity (Extended Data Fig. 5a,g,m). However, 200-ms-long or 1-s-long song-contingent light stimulation of any of these input pathways failed to affect spectrotemporal motif characteristics (Extended Data Fig. 5a–r). We therefore tested whether these afferents are necessary for adult song performance. Previous studies indicate that bilateral lesions of either NIf, mMAN or nucleus avalanche in adults do not cause any long-lasting disruptions in song^34,36,38. However, it has been shown that compensation by other pathways could account for the lack of sustained effects on song. Therefore, we consecutively lesioned mMAN, NIf and nucleus avalanche in the same birds using ibotenic and quisqualic acid. Bilateral lesions of these nuclei (mMAN, 100.0 ± 0.0%; NIf, 92.9 ± 4.0%; nucleus avalanche, 100.0 ± 0.0%; lateral magnocellular nucleus of the anterior nidopallium (lMAN), 82.5 ± 7.7%; Extended Data Fig. 6a–i) caused only a temporary decrease in motif quality. The song motif quickly recovered to its pre-lesioned state (Extended Data Fig. 6j), and, unlike Uva lesions, these lesions did not impact the number of motifs per bout or cause disruptions in motif initiation (Extended Data Fig. 6k,l). This demonstrates that HVC can generate the sequential activity necessary for completing song independent of its known main excitatory synaptic afferents, further supporting the idea that HVC is the origin of a unified premotor program for the zebra finch song motif.

Song pattern-generating network in HVC

To further define the circuit boundaries of the song pattern-generating network, we examined whether downstream target regions of HVC are critical to pattern generation. We reasoned that the kinetics of post-truncation song restarting provide a sensitive behavioural read-out of pattern resetting. They could therefore clarify whether a given circuit is involved in song pattern generation or simply relays the patterned output. Disruption of a pattern-generating node would produce truncation and reset latencies similar to, or shorter than, those observed upon HVC stimulation. Disruption of a relay node would instead produce low-latency truncations followed by low-probability and longer-latency motif resetting. HVC has two major output pathways: the descending song motor pathway through the pallial song region RA, and the palliostriatal pathway through area X. These pathways emerge from HVC_RA and HVC_X neurons, respectively⁸.
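The prediction above amounts to a qualitative decision rule over three read-outs: truncation latency, reset probability and reset latency. A minimal sketch follows. It assumes a hypothetical tolerance factor `tol` for what counts as "similar" to HVC stimulation. The reset probability used for HVC below is illustrative, not a reported value.

```python
from dataclasses import dataclass


@dataclass
class StimOutcome:
    truncation_latency_ms: float  # light onset to song silence
    reset_probability: float      # fraction of truncations followed by a motif restart
    reset_latency_ms: float       # silence to restart of the motif


def classify_node(test, hvc, tol=1.2):
    """Qualitative decision rule: a pattern-generating node truncates and resets
    the motif about as fast and as reliably as HVC stimulation (or better);
    a relay node truncates quickly but resets less often or more slowly."""
    fast_truncation = test.truncation_latency_ms <= hvc.truncation_latency_ms * tol
    resets_like_hvc = (test.reset_probability >= hvc.reset_probability / tol
                       and test.reset_latency_ms <= hvc.reset_latency_ms * tol)
    if fast_truncation and resets_like_hvc:
        return "pattern generator"
    if fast_truncation:
        return "relay"
    return "unclassified"  # slow truncation, e.g. indirect or off-target effects


# Illustrative numbers only; the HVC reset probability here is made up.
hvc = StimOutcome(66.6, 0.9, 135.8)
print(classify_node(StimOutcome(65.0, 0.4, 300.0), hvc))  # relay
```

With these illustrative numbers, the test site truncates song about as fast as HVC but resets the motif rarely and slowly. This is the pattern the text predicts for a relay node.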

We bilaterally expressed ChRmine in either area X or RA and light stimulated each region in freely singing birds. Driving area X neurons rarely caused motif truncation (truncation probability, 2.9 ± 1.6%; no effect, 97.2 ± 1.6%; Extended Data Fig. 7a,b). The truncations we observed occurred at syllable boundaries and were significantly delayed (latency, 146.7 ± 36.7 ms) compared with the uniform song truncations observed with HVC optogenetic stimulation. Nonetheless, stimulation of area X neurons consistently caused a modest increase in the noisiness of stimulated syllables (Extended Data Fig. 7c,f), consistent with the known role of the basal ganglia pathway³⁸.

By contrast, optogenetic stimulation in RA caused rapid motif truncations with high reliability (92.2 ± 2.7%, motif stop; 1.5 ± 1.8%, pause + continuation; Extended Data Fig. 8a–c). These truncations exhibited uniform latency across song, similar to stimulation in HVC (Extended Data Fig. 8d–f). Because RA is downstream of HVC in the song motor pathway, we might expect birds to restart their song as fast as, or faster than, when stimulating in HVC, and with equal probability. However, RA stimulation was less likely to be followed by restarting of the motif. When restarting did occur, it took significantly longer than following HVC stimulation (Extended Data Fig. 8g–k). This argues that the song pattern-generating network is localized to HVC and that RA functions downstream of this network to relay motor commands for song.

To test this prediction, we moved upstream by one synapse. We examined whether optogenetic stimulation of HVC_RA neurons would produce truncation and song restarts with the same timing and reliability as our pan-HVC optogenetic manipulations (Fig. 1). We used an intersectional viral strategy to achieve ChRmine expression only in HVC_RA neurons (Extended Data Figs. 9 and 10). As anticipated, song-contingent HVC_RA stimulation reliably caused rapid truncations throughout song (86.2 ± 5.6%, motif stop; 2.5 ± 1.8%, pause + continuation; Extended Data Fig. 9a–c). Unexpectedly, the latency to truncation was significantly longer than with pan-HVC stimulation, and in some instances truncation seemed to occur closer to syllable boundaries (panHVC, 66.6 ± 4.1 ms; HVC_RA, 79.1 ± 4.8 ms; Extended Data Figs. 9b,d and 10c,d). Post-truncation motif reset probability was comparable to that observed with pan-HVC stimulation (Extended Data Figs. 9e,f and 10e,f). However, restart latency was intermediate between the latencies observed with pan-HVC and RA stimulation (Extended Data Figs. 9g and 10g–i). Different latencies to first spike among HVC projection classes may influence truncation and restart dynamics³⁹. Even so, this intermediate timing is surprising because chains of excitatory synaptic connections among HVC_RA neurons are considered to be a central component of the network controlling song^9,10,11. This prompted us to investigate whether the other main class of HVC projection neurons (HVC_X neurons) may contribute to the rapid restarting of the song motif.

HVC_X neurons in song pattern generation

Similar to HVC_RA neurons, HVC_X neurons exhibit temporally precise and sparse activity during production of the song motif^12,40,41, but their role in song generation is not known. They are considered to relay timing activity to the basal ganglia rather than directly contributing to song pattern generation^42,43,44. However, paired recordings from HVC neurons in sleeping birds suggest that HVC_X neurons can reliably lead the activity of interneurons and HVC_RA neurons⁴⁵, suggesting that they could contribute to song pattern generation.

We expressed ChRmine in HVC_X neurons (Extended Data Fig. 11a,b) and found that stimulation reliably triggered song truncations (88.4 ± 3.0%, motif stop; 5.0 ± 2.5%, pause + continuation; Fig. 4a–c and Supplementary Table 3). The latency of truncation matched truncations elicited by pan-HVC stimulation (panHVC, 66.6 ± 4.1 ms; HVC_X, 71.1 ± 4.1 ms; Fig. 4d, Extended Data Fig. 11c,d and Supplementary Table 3). Truncations were followed by rapid song restarts with probability and latency comparable to pan-HVC stimulation (Fig. 4e–g, Extended Data Fig. 11e–g and Supplementary Table 3). Song restart following HVC_X stimulation was faster than that from HVC_RA stimulation (Extended Data Fig. 11h,i). Although HVC_RA neurons display longer spiking latencies than HVC_X neurons³⁹, which could contribute to the observed truncation dynamics, these rapid truncation and restart kinetics suggest that HVC_X neurons could be part of the core song pattern-generating network, rather than only relaying timing signals to the basal ganglia.

Fig. 4: Selective optogenetic stimulation of HVC_X neurons restarts the song motif.

Given that direct optogenetic stimulation of area X does not result in song truncations (Extended Data Fig. 7a,b), local synaptic transmission from HVC_X neurons within HVC is probably the source of truncation and rapid restarting of song. We tested this by antidromically recruiting indirect and delayed excitation of axonal collaterals within HVC²³ using optogenetic stimulation of HVC axon terminals in area X (Extended Data Fig. 12a,b). This stimulation should cause delayed song truncation, yet the motif restart kinetics should be similar to those upon direct HVC stimulation. We found that stimulation resulted in delayed motif truncation, although with lower probability compared with pan-HVC or HVC_X direct stimulation, consistent with the limitations of antidromic propagation²³ (Extended Data Fig. 12c–e). Nevertheless, we found that post-truncation restarting of the song motif had the same probability and latency as stimulating HVC_X neuronal somata (Extended Data Fig. 12f–h).

Previous studies have identified the main synaptic connectivity motif in HVC as disynaptic reciprocal inhibition between HVC_RA and HVC_X neurons through local interneurons^46,47. This implies a surge of inhibition onto HVC_RA neurons following stimulation of HVC_X neurons, but it does not explain the more rapid truncation and reset of the motif upon HVC_X stimulation. To examine how HVC_X neurons may contribute to song sequence propagation, we mapped their local connectivity using opsin-assisted synaptic circuit mapping. Stimulating neurotransmitter release from HVC_X neurons in brain slices evoked excitatory and inhibitory postsynaptic currents in HVC_X and HVC_RA neurons (Fig. 5a and Extended Data Fig. 13a). We isolated monosynaptic connections using bath application of tetrodotoxin (TTX) followed by 4-aminopyridine (4-AP). This revealed that HVC_X neurons make monosynaptic connections with HVC_RA neurons with high probability and with HVC_X neurons with low probability (Fig. 5a). Previous studies using paired recordings and electron microscopy suggest that HVC_RA neurons have only sparse connectivity with other HVC_RA neurons but are more reliably synaptically connected with HVC_X neurons^46,47,48. These data support a model in which the two HVC projection neuron classes, along with local interneurons, form a heterosynaptic network that can holistically sustain song pattern generation.
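The synaptic organization described above can be summarized as a small signed directed graph. The sketch below encodes only the qualitative labels given in the text, so no connection strengths are implied. It uses `INT` as a hypothetical label for local interneurons and enumerates short paths between projection neuron classes.

```python
# Qualitative connectivity summarized from the text; "INT" stands for local
# interneurons (a hypothetical label). Values are (sign, qualitative strength).
EDGES = {
    ("HVC_X", "HVC_RA"): ("exc", "high probability"),
    ("HVC_X", "HVC_X"): ("exc", "low probability"),
    ("HVC_RA", "HVC_X"): ("exc", "reliable"),
    ("HVC_RA", "HVC_RA"): ("exc", "sparse"),
    ("HVC_X", "INT"): ("exc", "present"),
    ("HVC_RA", "INT"): ("exc", "present"),
    ("INT", "HVC_X"): ("inh", "present"),
    ("INT", "HVC_RA"): ("inh", "present"),
}


def paths(src, dst, max_synapses=2):
    """List every path from src to dst using at most max_synapses synapses,
    without revisiting a cell class, together with the sign of each synapse."""
    found = []

    def walk(path, signs):
        node = path[-1]
        if len(path) > 1 and node == dst:
            found.append((path, signs))
            return
        if len(signs) == max_synapses:
            return
        for (pre, post), (sign, _strength) in EDGES.items():
            if pre == node and post not in path:
                walk(path + [post], signs + [sign])

    walk([src], [])
    return found


for path, signs in paths("HVC_X", "HVC_RA"):
    print(" -> ".join(path), signs)
```

Enumerating paths from HVC_X to HVC_RA recovers two routes. One is the monosynaptic excitatory connection found by opsin-assisted mapping. The other is the disynaptic inhibitory route through interneurons. In this scheme, HVC_X activity can therefore reach HVC_RA neurons with either sign.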

Fig. 5: HVC_X neurons are part of the song pattern-generating network and model of HVC.

HVC pattern-generating network model

To test if a circuit consistent with this synaptic organization could sustain song progression and restarting, we modelled the network by arranging HVC_RA, HVC_X and local inhibitory neurons uniformly on a ‘chain’ (Fig. 5b,c). Synaptic weights in our model were symmetric, and the weights of excitatory and inhibitory connections decayed with distance along the chain. A pool of inhibitory neurons was driven by excitatory neurons and provided global inhibitory feedback. Song initiation was mediated by excitatory input from Uva onto a recently identified class of ‘peri-song’ HVC_RA neurons, which became active just before song onset and were inactive during singing⁴⁹ (Fig. 5b). Local inhibitory activity spatially lagged excitatory activity in the direction of sequence propagation, thereby providing stronger inhibition to excitatory neurons at earlier positions in the chain and effectively pushing excitatory activity forward along the chain (Fig. 5c,d and Supplementary Videos 2 and 3). Weakened excitatory synapses at the end of the chain stopped sequence propagation and released peri-song neurons from inhibition, leading to spontaneous restarting of the song motif if excitatory drive from Uva remained intact (Fig. 5e).
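The qualitative rules of this chain model can be caricatured as a discrete-time state machine. This is not the authors' rate-based model. It collapses the excitatory, inhibitory and peri-song populations into a single pointer to the active chain position. The parameter names (`chain_len`, `break_at`, `pause_steps`, `inhibition_steps`) are hypothetical. The `stim_at` and `break_at` arguments anticipate the optogenetic and HVC_X-weakening perturbations modelled in the following paragraphs.

```python
def simulate_chain(n_steps, chain_len=8, uva_on=True, stim_at=(), break_at=None,
                   pause_steps=2, inhibition_steps=4):
    """Toy discrete-time caricature of a self-propagating chain.

    Returns the active chain position at each time step (None = silent).
    - Song starts at position 0 only if Uva drive is present (peri-song onset).
    - Activity advances one position per step without patterned input.
    - The sequence stops at chain_len, or earlier at break_at (weakened links).
    - A stimulation step silences the chain (global inhibition) for a while.
    - After any stop, the motif restarts from position 0 if Uva drive persists.
    """
    end = chain_len if break_at is None else min(break_at, chain_len)
    pos = 0 if uva_on else None
    wait = 0
    trace = []
    for t in range(n_steps):
        trace.append(pos)
        if pos is not None and t in stim_at:
            pos, wait = None, inhibition_steps   # synchronous excitation -> global inhibition
        elif pos is not None:
            pos += 1
            if pos >= end:
                pos, wait = None, pause_steps    # weak synapses end propagation
        elif wait > 0:
            wait -= 1                            # peri-song neurons still inhibited
        elif uva_on:
            pos = 0                              # released peri-song neurons restart the motif
    return trace


print(simulate_chain(12, stim_at=(2,), inhibition_steps=3))
# [0, 1, 2, None, None, None, None, 0, 1, 2, 3, 4]
```

In the example, stimulation at step 2 silences the chain. After the inhibition wears off, the sequence restarts at position 0 rather than resuming at position 3, which reproduces the "skipping record" behaviour. Setting `uva_on=False` produces no song at all, mirroring the requirement for thalamic input at initiation.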

We modelled how this circuit responded to strong synchronous excitation, mimicking our optogenetic manipulations (Fig. 5e,f). The activation function of global inhibitory neurons was steeper than that of excitatory neurons. In addition, the excitatory and global inhibitory neurons were strongly connected in a parameter regime called inhibition stabilization⁵⁰. Thus, synchronous excitatory activity caused widespread and strong feedback from inhibitory neurons that blocked sequence propagation. Once the sequence was truncated, it could be spontaneously restarted through peri-song neurons, akin to restarting the song after its natural ending (Fig. 5e,f and Supplementary Videos 2 and 3). Unexpectedly, the parameter settings that allowed the model to generate a moving neural sequence, optogenetic truncation and spontaneous restarting of sequence generation also produced dynamics that qualitatively matched the timing of behaviour we measured following optogenetic manipulations of HVC (Fig. 5f). This simple rate-based model does not capture the precise spike timing of all HVC neurons. Nevertheless, it demonstrates that an inhibition-stabilized pattern-generating circuit can reproduce our behavioural results following circuit perturbations, with the timing of song truncation and restart emerging as a property of the model.
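Inhibition stabilization has a standard linear-stability definition for a two-population excitatory-inhibitory rate model. The excitatory subnetwork on its own would be unstable, but feedback inhibition makes the full network stable. A minimal check is sketched below. It assumes that the weights are effective gains at the operating point (synaptic weight times the slope of the activation function), so a steeper inhibitory activation function scales up `w_ie` and `w_ii`. The check relies on the fact that both eigenvalues of a real 2 × 2 matrix have negative real parts exactly when its trace is negative and its determinant is positive. This is a generic illustration, not the parameterization used in the study.

```python
def is_inhibition_stabilized(w_ee, w_ei, w_ie, w_ii, tau_e=1.0, tau_i=1.0):
    """Linearized two-population E-I rate model:
        tau_e dE/dt = -E + w_ee*E - w_ei*I + input_E
        tau_i dI/dt = -I + w_ie*E - w_ii*I + input_I
    The network is inhibition-stabilized if the excitatory subnetwork alone is
    unstable (w_ee > 1) but feedback inhibition makes the full system stable."""
    # Jacobian entries of the linear system
    j11 = (w_ee - 1.0) / tau_e
    j12 = -w_ei / tau_e
    j21 = w_ie / tau_i
    j22 = -(1.0 + w_ii) / tau_i
    trace = j11 + j22
    det = j11 * j22 - j12 * j21
    stable = trace < 0 and det > 0   # both eigenvalues have negative real parts
    return stable and w_ee > 1.0


print(is_inhibition_stabilized(2.0, 2.0, 2.0, 0.5))  # True
print(is_inhibition_stabilized(0.5, 2.0, 2.0, 0.5))  # False
```

The second call has the same inhibitory feedback but a stable excitatory subnetwork (`w_ee` < 1), so it is stable without being inhibition-stabilized.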

Although previous studies have suggested that HVC_X neuronal lesions leave song intact^43,44, our model simulations predict that weakening the contribution of HVC_X neurons to the chain results in premature truncation of neural sequence propagation, followed by spontaneous restarting of the motif (Fig. 5g,h and Supplementary Videos 2 and 3). To test these predictions, we suppressed excitatory synaptic transmission from HVC_X neurons using selective viral expression of TeNT (Fig. 5i). In contrast to the effect of expressing TeNT in Uva, which resulted in birds producing their song motif in an all-or-none fashion (Fig. 3f–i), we found that TeNT in HVC_X neurons caused birds to progressively increase the likelihood of prematurely truncating their song motifs (Fig. 5j–l and Extended Data Fig. 13b–d). Birds exhibited song truncations both within and between syllables (Fig. 5j and Extended Data Fig. 13e). The truncations occurred progressively earlier in the motifs, with timelines consistent with viral expression dynamics and with effects directly proportional to the amount of TeNT expression (Fig. 5k and Extended Data Fig. 13b,f). Consistent with model predictions, we also found that birds would frequently restart their songs following premature motif truncations, and that the latency of these restarts matched those observed upon direct optogenetic excitation of HVC_X neurons (Fig. 5j and Extended Data Fig. 13e,g). Together, these results demonstrate that interruption of HVC activity, by either optogenetic perturbations or silencing of HVC_X neurons, drives song truncation and probably releases a common circuit mechanism driving rapid restarting of the neural sequence for song.

Discussion

Neuronal sequences in HVC have been proposed to function as a clock, which controls the timing and progression of the song motif^{9,11,12,13,15}. Considerable debate has centred on whether these patterns of activity require instructive patterned input for motif completion. Here we provide key observations indicating that adult HVC functions as a sequence-generating network that does not require patterned input, at least from its best described afferent pathways, to complete the song motif.

Chunking of motor sequences, followed by concatenation of commonly repeated sequences, is a proposed mechanism for optimizing learning and performance^2,4,5,7,51,52,53,54,55. Motor chunking is thought to function in learning and production of the movement sequences needed for fluent speech production and other well-practiced behaviours. Early stages of juvenile bird song development involve splitting and growth of neural sequences in HVC as new syllables are being learned^6,56. This process probably reflects chunking of respiratory and vocal patterns needed for accurate and rapid learning. Juvenile songbirds progressively shape their song, practicing thousands of times per day, and pallial input pathways are necessary to direct such developmental song learning^35,37,57,58.

Here we show that these main pallial input pathways to HVC are dispensable for adult song production. Moreover, our cell-type selective manipulations in HVC demonstrate that the ordered syllable sequence of the song motif spontaneously restarts if it is prematurely truncated. This skipping of song back to the beginning is reminiscent of CPG rhythm resetting and consistent with the holistic control of the motif by a sequence-generating circuit in HVC^20,29. These findings suggest that the phase at the end of song development, referred to as ‘crystallization’, involves consolidation of neural programs for motor control within HVC. We propose that as song developmentally becomes more stereotyped and precise, consistent daily practice concatenates these sparse neural sequences into a stable chain that autonomously sustains song motif completion.

Birds lack a corpus callosum; therefore, our findings raise questions about how the interhemispheric timing of pattern-generating circuits is coordinated. Uva receives bilateral ascending input from the respiratory medulla and midbrain vocal circuits and is therefore considered to play a prominent role in interhemispheric coordination of HVC^14,15,22. Accordingly, we found that thalamic input from Uva is needed for initiating the song motif. However, our evidence indicates that it is not needed for transitioning from syllable to syllable within the motif. Uva may therefore send synchronized onset cues for song to coordinate initiation of each motif in the song bout, which then continues autonomously in each HVC. Nevertheless, Uva remains active throughout the motif and could function as a metronome, providing timing signals that support interhemispheric coordination without being required to instruct transitions within ongoing song motifs.

Our synaptic connectivity mapping finds HVC_X neurons consistently making monosynaptic connections with HVC_RA neurons but only sparsely with other HVC_X neurons. This supports the idea that the main synaptic connectivity within HVC involves disynaptic inhibition and monosynaptic excitation between HVC_RA and HVC_X neurons, rather than homotypic connections within either class of excitatory neurons^46,47. More research will be needed to fully describe cell-type connectivity in the network and to understand how song sequence progression is controlled. Nonetheless, we propose a straightforward computational model of HVC that can sustain sequence generation and song restarting following circuit perturbations. Moreover, our model simulations indicate that a substantial reduction in HVC_X neuronal transmission leads to stochastic song truncations and restarting of the motif, a prediction matched by our selective expression of TeNT in these neurons. However, the HVC song circuit seems robust to moderate perturbation, as shown by the lack of effect in previous HVC_X ablation studies^43,44, focal lesion studies⁵⁹ and our own model predictions (Fig. 5g,h), as well as by our data showing that birds with lower levels of TeNT expression did not exhibit significant disruptions in song (Fig. 5k,l and Extended Data Fig. 13b–d).

In summary, this study reveals that a premotor circuit, facilitated by thalamic input, can holistically control strings of vocal syllables, and it more precisely defines the synaptic circuit architecture critical to this pattern-generating circuit. In future experiments, it will be important to examine how ‘fused’ sequence elements are integrated for the control of other types of natural behaviours and learned vocalizations with greater syntax complexity. Zebra finches produce only a single stereotyped sequence of song syllables, making them an ideal model for first testing how the brain controls strings of vocal gestures. We propose that chunking, followed by concatenation of reliably reproduced neuronal sequences, underlies these behaviours, and that the approaches applied here can help identify the boundaries of the motor sequences used by the brain to support production of complex behaviours.

Methods

Animals

The experiments described in this study were conducted using adult male zebra finches (Taeniopygia guttata; 120–500 days post-hatch). All procedures were performed in accordance with protocols approved by the Animal Care and Use Committee at UT Southwestern Medical Center.

Viral vectors

The following adeno-associated viral vectors were used in the experiments: rAAV2/9/fDIO–CBh–eGtACR1–mScarlet, rAAV2/9/CBh–Flippase, rAAV2/9/CBh–ChRmine–mScarlet, rAAV2/9/DIO–CAG–ChRmine–mScarlet, rAAV2/9/DIO–CAG–TeNT–mScarlet (Intellectual and Developmental Disabilities Research Center Neuroconnectivity Core at Baylor College of Medicine) and rAAV2/9/CMV–CRE–eGFP (Addgene). All viral vectors were aliquoted and stored at −80 °C until use.

Stereotaxic surgery

Aseptic stereotaxic surgeries were performed after birds were anaesthetized (isoflurane inhalation; 0.8%–1.5%).

Viral injections were performed using previously described procedures^26,37,58. Briefly, a cocktail of adeno-associated viral vectors (rAAV/CBh–ChRmine in HVC, RA, area X or thalamus (2 µl per hemisphere); 1:2 of rAAV/CBh–FLP and rAAV/DIO–CBh–eGtACR1, respectively (1–2 µl total per hemisphere); rAAV/DIO–CAG–ChRmine in HVC or Uva (2 µl); rAAV/CMV–Cre in RA, area X or HVC (0.5–1 µl and 2 µl, respectively); rAAV/DIO–TeNT in HVC or Uva (2 µl); and rAAV/CMV–CRE in area X or HVC (2 µl)) were injected (1 nl s⁻¹) into target areas with a Nanoject III (Drummond Scientific) and glass capillaries. Experiments were conducted starting a minimum of 3 weeks after viral injections. Fluorophore-conjugated retrograde tracers (Dextran 10,000 MW, AlexaFluor 488, 568 and 647, Invitrogen; Fast Blue, Polysciences) were injected bilaterally into area X, RA or HVC (160 nl; 5 × 32 nl, 32 nl s⁻¹ every 30 s) (refs. ^26,37,58). Electrophysiological mapping was used to determine the centres of HVC, NIf, mMAN, LMAN and RA; area X, nucleus avalanche and Uva were identified using stereotaxic coordinates (head angle; rostral–caudal, medial–lateral and dorsal–ventral positions in mm relative to interaural zero). The stereotaxic coordinates were as follows: HVC (45°; anterior–posterior, 0; medial–lateral, ±2.4; dorsal–ventral, −0.2 to −0.6), NIf (45°; anterior–posterior, 1.75; medial–lateral, ±1.75; dorsal–ventral, −2.4 to −1.8), mMAN (20°; anterior–posterior, 5.1; medial–lateral, ±0.6; dorsal–ventral, −2.1 to −1.6), LMAN (20°; anterior–posterior, 5.1; medial–lateral, ±1.7; dorsal–ventral, −2.2 to −1.6), RA (80°; anterior–posterior, −1.5; medial–lateral, ±2.5; dorsal–ventral, −2.4 to −1.8), area X (45°; anterior–posterior, 4.8; medial–lateral, ±1.6; dorsal–ventral, −3.3 to −2.7), nucleus avalanche (45°; anterior–posterior, 1.65; medial–lateral, ±2.0; dorsal–ventral, −0.9) and Uva (20°; anterior–posterior, 2.5; medial–lateral, ±1.6; dorsal–ventral, −4.8 to −4.2).

Optogenetic manipulations

For optogenetic stimulation, optic fibres (multimode 400 µm; 0.39 numerical aperture; ThorLabs) were implanted bilaterally dorsal to HVC, RA, area X or Uva using acrylic glue and dental cement. Although the 400-µm-diameter fibres may not cover the entire extent of each targeted area, we estimated that the cone of light could stimulate the vast majority of the targeted neurons. After recovery, the implanted fibres were connected to optic fibres through ceramic sleeves. The fibres were connected to a rotary joint and interfaced with a 1.5-mm multimode fibre connected to a light-emitting diode box (Prizmatix). Light intensity was regulated to achieve a final output of approximately 10 mW. We used custom software (pcaf; LabVIEW) to deliver optogenetic stimulation during song (200 ms or 1 s for HVC afferent stimulation, 10–50 ms for direct ChRmine somatic stimulation and 50–200 ms for antidromic HVC_X stimulation). In many instances, our goal was to target as many moments as possible within a bird's song motif. To achieve this, we targeted most of the motifs that birds were producing, using quasi-random light onset delays introduced through a transistor–transistor logic (TTL) signal. This targeting strategy allows a detailed analysis of motif-level effects but limits our ability to conduct meaningful song-bout-level analysis of the behaviour. We note that light delivery over HVC or other brain regions is not by itself sufficient to cause truncations or other disruptions in singing behaviour, because in several experiments light stimulation (of afferent pathways into HVC or of area X neurons) had no effect on singing behaviour. Air sac recordings and analysis were performed as previously published¹⁵.
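The quasi-random targeting of light onsets can be illustrated with stratified jitter: the motif is split into equal windows and one onset delay is drawn uniformly within each, so that a set of trials covers the whole motif. This is a minimal sketch under our own assumptions; the helper name, the stratification scheme and the use of Python's `random` module are illustrative and do not describe the pcaf/LabVIEW implementation.

```python
import random
from typing import Optional


def quasi_random_delays(motif_ms: float, n_trials: int,
                        seed: Optional[int] = None) -> list[float]:
    """Hypothetical helper: one light-onset delay (ms after motif onset) per trial.

    The motif is split into n_trials equal windows and one delay is drawn
    uniformly within each window (stratified jitter); the delays are then
    shuffled so that consecutive trials do not sweep the motif in order.
    """
    if motif_ms <= 0 or n_trials <= 0:
        raise ValueError("motif_ms and n_trials must be positive")
    rng = random.Random(seed)
    width = motif_ms / n_trials
    delays = [k * width + rng.uniform(0.0, width) for k in range(n_trials)]
    rng.shuffle(delays)
    return delays
```

For example, `quasi_random_delays(600.0, 10, seed=1)` returns ten delays, exactly one in each 60-ms window of a 600-ms motif, in shuffled order.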

Lesion quantification

Excitotoxic lesions were induced by 1% ibotenic acid (50–100 nl per injection site) or a cocktail of 1% ibotenic acid and 100 mM quisqualic acid (Uva and LMAN). Lesion extent was first verified by the absence or sparseness of NeuN immunostaining in the targeted nuclei. To provide an unbiased estimate of lesion extent, retrograde tracers were injected into HVC and RA to label any surviving cells in the afferent nuclei. In control animals, the number of retrogradely filled cells in each nucleus was quantified, and correlations between cell counts in different nuclei were calculated (Extended Data Fig. 6a–f). This analysis provided statistical justification for extrapolating the number of cells in a target nucleus from the number of cells counted in a reference nucleus, and an average ratio between nucleus cell counts was therefore calculated. On the basis of these control ratios and the number of cells in a non-lesioned reference nucleus, the expected number of retrogradely filled cells in each nucleus of each hemisphere was estimated.
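The ratio-based estimate can be illustrated with a short sketch. We assume, for illustration, that the control ratio is the mean of per-hemisphere target/reference counts and that lesion extent is expressed as the fraction of expected cells that are missing; the function names and example counts are hypothetical, not data from this study.

```python
from statistics import mean


def control_ratio(target_counts: list[int], reference_counts: list[int]) -> float:
    """Mean ratio of tracer-filled cells in the target nucleus to those in the
    reference nucleus, computed across control hemispheres."""
    if len(target_counts) != len(reference_counts) or not target_counts:
        raise ValueError("need paired, non-empty control counts")
    return mean(t / r for t, r in zip(target_counts, reference_counts))


def estimated_lesion_fraction(observed_target: int, reference_count: int,
                              ratio: float) -> float:
    """Fraction of expected target cells that are missing (0 = intact, 1 = complete lesion)."""
    expected = ratio * reference_count
    if expected <= 0:
        raise ValueError("expected count must be positive")
    # Clamp at 0: more surviving cells than expected means no detectable lesion.
    return max(0.0, 1.0 - observed_target / expected)
```

With a control ratio of 0.5 and 200 cells in the intact reference nucleus, 100 cells are expected in the target; finding only 30 corresponds to an estimated 70% lesion.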

In vivo extracellular recordings

To test the functional expression of opsins, we performed extracellular recordings of HVC activity in birds under light isoflurane anaesthesia (0.8%) with Carbostar carbon electrodes (resistivity: 1,670 µΩ cm; Kation Scientific). A 400-µm multimode optical fibre was placed on the brain surface overlying HVC and delivered light stimulation (470 nm; approximately 20 mW; 1 s) during neural recordings. To test antidromic excitation of HVC_X neurons by optical stimulation of axon terminals, optic fibres were implanted over area X (470 nm; approximately 20 mW; 100 ms). Signals were acquired at 10 kHz and band-pass filtered (300 Hz high-pass; 20 kHz low-pass). Spike rates (10-ms bins) and peri-stimulus time histograms (PSTHs) were calculated to quantify responses to light stimulation (one to five sites per hemisphere; Spike2). Birds without optically evoked responses were excluded from experiments. Spike counts and PSTHs were normalized to the pre-stimulus baseline (500 ms). Two-way analyses of variance (ANOVA) were used to compare the time course between stimulated and unstimulated recordings: for HVC afferents (1-s stimulation), 0–5 s (light stimulation, 0.5–1.5 s) versus 5–10 s (control, no stimulation); for ChRmine-expressing HVC neurons or HVC→area X stimulation (100-ms stimulation), 0.7–1.4 s (300 ms before and after the 100-ms light stimulation) versus 5.7–6.4 s (control, no stimulation). Wilcoxon tests were performed on the average time course (with intervals specified in the figure legends).
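Normalizing binned spike counts to the 500-ms pre-stimulus baseline can be sketched as follows. We assume, for illustration, that spike times are given in seconds relative to light onset and that normalization means dividing each 10-ms bin by the mean baseline count per bin; the function name and these conventions are ours and are not taken from Spike2.

```python
import math


def normalized_psth(trials: list[list[float]], t_start: float, t_stop: float,
                    bin_s: float = 0.010, baseline_s: float = 0.5) -> list[float]:
    """Trial-averaged spike counts in bins of bin_s seconds over [t_start, t_stop),
    divided by the mean count per bin in the baseline window [-baseline_s, 0).

    Spike times are in seconds relative to light onset (t = 0)."""
    if not trials:
        raise ValueError("need at least one trial")
    n_bins = round((t_stop - t_start) / bin_s)
    counts = [0.0] * n_bins
    baseline_spikes = 0
    for spikes in trials:
        for t in spikes:
            k = math.floor((t - t_start) / bin_s)
            if 0 <= k < n_bins:
                counts[k] += 1
            if -baseline_s <= t < 0:
                baseline_spikes += 1
    n_trials = len(trials)
    baseline_per_bin = baseline_spikes / (n_trials * round(baseline_s / bin_s))
    if baseline_per_bin == 0:
        raise ValueError("no baseline spikes; cannot normalize")
    return [c / n_trials / baseline_per_bin for c in counts]
```

A value of 1 means a bin fires at the baseline rate; values well above 1 after light onset indicate an optically evoked response.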

Ex vivo physiology

Slice preparation

Zebra finches were deeply anaesthetized and then decapitated. The brain was removed from the skull and submerged in cold (1–4 °C) oxygenated dissection buffer. Acute sagittal 230-μm brain slices were cut in ice-cold carbogenated (95% O₂/5% CO₂) solution, containing 110 mM choline chloride, 25 mM glucose, 25 mM NaHCO₃, 7 mM MgCl₂, 11.6 mM ascorbic acid, 3.1 mM sodium pyruvate, 2.5 mM KCl, 1.25 mM NaH₂PO₄ and 0.5 mM CaCl₂, and adjusted to 320–330 mOsm. Individual slices were incubated in a custom-made holding chamber filled with artificial cerebrospinal fluid, containing 126 mM NaCl, 3 mM KCl, 1.25 mM NaH₂PO₄, 26 mM NaHCO₃, 10 mM d-(+)-glucose, 2 mM MgSO₄ and 2 mM CaCl₂, adjusted to 310 mOsm, pH 7.3–7.4 and aerated with a 95% O₂/5% CO₂ gas mixture. Slices were incubated at 36 °C for 20 min and then kept at room temperature for a minimum of 45 min before recordings.

Slice electrophysiological recording

The slices were constantly perfused in a submersion chamber with 32 °C oxygenated normal artificial cerebrospinal fluid. Patch pipettes were pulled to a final resistance of 3–5 MΩ from filamented borosilicate glass on a Sutter P-1000 horizontal puller. HVC projection neuron classes, as identified by retrograde tracers, were visualized by epifluorescence imaging using a water immersion objective (×40; 0.8 numerical aperture) on an upright Olympus BX51 WI microscope, with video-assisted infrared CCD camera (QImaging Rolera). Data were low-pass filtered (10 kHz) and acquired (10 kHz) (Axon MultiClamp 700B amplifier, Axon Digidata 1550B data acquisition and Clampex 10.6; Molecular Devices).

For voltage clamp whole-cell recordings, the internal solution contained 120 mM cesium methanesulfonate, 10 mM CsCl, 10 mM HEPES, 10 mM EGTA, 5 mM creatine phosphate, 4 mM ATP–Mg and 0.4 mM GTP–Na (adjusted to pH 7.3–7.4 with CsOH). For current clamp recordings, the internal solution contained 116 mM K gluconate, 20 mM HEPES, 6 mM KCl, 2 mM NaCl, 0.5 mM EGTA, 4 mM MgATP, 0.3 mM NaGTP and 10 mM Na phosphocreatine (adjusted to pH 7.3–7.4 with KOH; 299 mOsm).

Optically evoked synaptic currents were measured by delivering two light pulses (1 ms, 50 ms apart, generated by a CoolLED pE-300) focused on the sample through the ×40 immersion objective. Sweeps were delivered every 10 s. Synaptic responses were monitored while holding the membrane voltage at −70 mV (for optogenetically evoked excitatory postsynaptic currents (oEPSCs)) and +10 mV (for optogenetically evoked inhibitory postsynaptic currents (oIPSCs)). Before baseline recording, we tested a range of light intensities to evoke oEPSCs of approximately 50% of the maximal response. Access resistance (10–30 MΩ) was monitored throughout the experiment, and recordings were discarded from further analysis if it changed by more than 20%. The excitation–inhibition (oEPSC/oIPSC) ratio was calculated by dividing the amplitude of the oEPSC at −70 mV by the amplitude of the oIPSC at +10 mV evoked by the same light intensity. To validate the inhibitory and excitatory postsynaptic currents as γ-aminobutyric acid (GABA)ergic and glutamatergic, respectively, in a subset of cells the GABA_A receptor antagonist SR 95531 hydrobromide (gabazine; 10 µM) was added to the bath while holding the cell at +10 mV, or the AMPA receptor antagonist 6,7-dinitroquinoxaline-2,3-dione (DNQX; 10 µM) while holding the cell at −70 mV. In another subset of cells, once baseline measures were established, we tested for monosynaptic connectivity by bath application of 1 µM TTX followed by 100 µM 4-AP, and measured the amplitude of the postsynaptic currents that returned after 4-AP application. On the basis of the signal-to-noise ratio of the recordings, currents under 5 pA were considered unreliable and excluded from further analysis, as were currents rescued by 4-AP application with an amplitude of less than 10 pA (non-monosynaptic; two instances: 1 HVC_X→HVC_X and 1 HVC_X→HVC_RA).
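The ratio and inclusion criteria above can be expressed as a small sketch. The thresholds (20% access-resistance change, 5 pA reliability floor, 10 pA floor for 4-AP-rescued currents) come from the text; the function names, the relative-change formula and the way the criteria are combined are our assumptions.

```python
from typing import Optional


def access_stable(ra_start_mohm: float, ra_end_mohm: float, max_change: float = 0.20) -> bool:
    """True if access resistance changed by no more than max_change (fraction of the start value)."""
    return abs(ra_end_mohm - ra_start_mohm) / ra_start_mohm <= max_change


def ei_ratio(oepsc_pa: float, oipsc_pa: float) -> float:
    """oEPSC amplitude (-70 mV) divided by oIPSC amplitude (+10 mV) at the same light intensity.

    Absolute values are used because the two currents have opposite signs."""
    return abs(oepsc_pa) / abs(oipsc_pa)


def classify_connection(baseline_pa: float, rescued_pa: Optional[float] = None) -> str:
    """Hypothetical label for a recorded connection from current amplitudes in pA.

    baseline_pa: amplitude before drug application.
    rescued_pa: amplitude after TTX + 4-AP, or None if the cell was not tested."""
    if abs(baseline_pa) < 5.0:
        return "unreliable"  # below the 5 pA signal-to-noise floor
    if rescued_pa is None:
        return "connected"  # monosynapticity not tested pharmacologically
    if abs(rescued_pa) < 10.0:
        return "non-monosynaptic"  # 4-AP-rescued current below 10 pA
    return "monosynaptic"
```

For instance, an inward oEPSC of −120 pA and an outward oIPSC of 60 pA give an excitation–inhibition ratio of 2.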

Histology and immunohistochemistry

Birds were anaesthetized with EUTHASOL (Virbac) and transcardially perfused with 4% paraformaldehyde in phosphate-buffered saline (PBS). Free-floating sagittal sections (30 µm) were cut using a cryostat (Leica CM1950). These sections were first washed in PBS, then blocked in 3% bovine serum albumin in 0.3% Triton X-100 in PBS for 1 h at room temperature and incubated with primary antibodies (α-NeuN MAB377, Millipore, 1:500; α-GFP a11122, Invitrogen, 1:1,000) diluted in the blocking buffer at 4 °C for 24 h. The slices were washed with PBS and incubated at room temperature for 2 h with fluorescent secondary antibodies (Jackson 715-605-150 Alexa Fluor 647-conjugated donkey anti-mouse for NeuN and Millipore A21206 Alexa Fluor 488-conjugated goat anti-rabbit for GFP) diluted in blocking buffer. After PBS wash, sections were mounted onto slides with Fluoromount-G (eBioscience). Composite images were acquired and stitched using an LSM 880 or LSM 710 laser scanning confocal microscope (Carl Zeiss) and/or a ZEISS Axio Scan Z1 (University of Texas Southwestern Medical Center Whole Brain Microscopy Facility; RRID: SCR_017949). Image analyses were performed using ImageJ. After electrophysiological recordings, the slices were incubated in 4% paraformaldehyde in PBS. Sections were then washed in PBS, mounted on glass slides with Fluoromount-G (eBioscience) and visualized under an LSM 880 laser scanning confocal microscope (Carl Zeiss). In situ hybridization experiments were conducted as previously reported.

Three-dimensional brain imaging and processing

Imaging and processing of a sample brain with tracers injected in HVC (Alexa 488-conjugated dextran 10,000) and RA (Alexa 568-conjugated dextran 10,000) for three-dimensional (3D) rendering were conducted with the help of Denise Ramirez and Ariana Nawaby (University of Texas Southwestern Medical Center Whole Brain Microscopy Facility; RRID: SCR_017949). After perfusion with 4% paraformaldehyde, the brain was embedded in oxidized agarose in preparation for sectioning. The TissueCyte 1000 instrument (TissueVision) automatically sectioned the entire brain at 100 µm in the coronal plane and collected mosaic image tiles encompassing each section. For preprocessing, images were downsampled to 1.5-μm xy resolution, and colour contrast was adjusted to provide high visual contrast between signals of interest and background.

For segmentation, a selected portion of the signals of interest in the downsampled, contrast-adjusted images was visually identified, annotated and used to train a random forest classifier for segmentation in ilastik (v.1.3.3) (refs. ^61,62,63,64). This classifier was applied to all section images to assign each pixel a probability score of belonging to a specific fluorescent signal, to autofluorescence or to background noise. The autofluorescence, Alexa 488 (green) and Alexa 568 (red) pixelwise probability scores were further processed and used for visualization.

For segmentation post-processing, to create a grey silhouette of the overall shape of the brain, the autofluorescence probability signal was thresholded using the ImageJ default thresholding algorithm. Any holes in the binary mask were then flood-filled, and particles greater than 3,024 px² were removed. Green and red probabilities were thresholded at 8-bit pixel intensities of 105 and 79, respectively, as determined visually to reduce low-probability noise in the image. The GFP signal in the rostral-most portion of the brain (beyond section 135) was dimmed for better visibility of more caudal structures by subtracting 140 intensity units (8-bit range) from each pixel.
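The thresholding and dimming steps operate pixelwise on 8-bit probability maps. A minimal sketch, assuming that thresholding at a level sets values below it to 0, that the subtraction saturates at 0, and that sections with index above 135 are the rostral ones; images are nested lists purely for illustration, and this is not the ImageJ code that was used.

```python
def threshold(img: list[list[int]], level: int) -> list[list[int]]:
    """Zero out pixels whose 8-bit probability value is below `level`; keep the rest."""
    return [[v if v >= level else 0 for v in row] for row in img]


def dim(img: list[list[int]], amount: int) -> list[list[int]]:
    """Subtract `amount` from every pixel, saturating at 0 (8-bit range 0-255)."""
    return [[max(0, v - amount) for v in row] for row in img]


def postprocess_green(sections: list[list[list[int]]],
                      rostral_start: int = 135) -> list[list[list[int]]]:
    """Threshold green probabilities at 105, then dim sections beyond rostral_start by 140."""
    out = []
    for idx, sec in enumerate(sections):
        sec = threshold(sec, 105)
        if idx > rostral_start:
            sec = dim(sec, 140)
        out.append(sec)
    return out
```

The red channel would use the same `threshold` call with a level of 79 and no dimming.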

For visualization, combined RGB images of the autofluorescence (grey), Alexa 488 (green) and Alexa 568 (red) post-processed probabilities were visualized in 3D using VAA3D software (v.V3.447; https://home.penglab.com/proj/vaa3d/home/index.html).

Song analysis

Songs were recorded and analysed using Sound Analysis Pro (SAP) 2011 (ref. ⁶⁵), and plots were made with a modified version of Avian Vocalization Network⁶⁶. We manually measured and categorized the outcomes of optogenetic stimulations. Truncations were defined as stimulation-contingent, atypical amplitude decays of 300 ms or less (not present in control motifs), visible as silent gaps in the spectrogram. Truncation latencies were measured from the onset of light delivery to the onset of the optically contingent silent gap. A stop was defined as a truncation not followed by continuation or resumption of the motif. Syllable boundaries and complex syllable elements were delimited by silent pauses or by clear changes in spectral continuity. Twenty song segments per condition (stimulated and non-stimulated) were measured to quantify acoustic properties and sound similarity (SAP). Acoustic properties of each stimulated segment were compared with those of the corresponding song fragment in unstimulated control motifs. When optical stimulation did not cause truncation, acoustic properties were calculated on the song fragment from the onset of optical stimulation to the end of the last syllable. The entire motif was analysed during 1-s stimulation trials.

In the 1-s time window after song truncation, optical stimulation effects were manually classified into one of four categories: (1) motif reset (restarting with the first song syllable, with introductory notes or with syllables that normally link motifs); (2) calls (typical zebra finch calls); (3) introductory notes (those not followed by motif initiation); or (4) pause and continuation (post-truncation motif resumption at any syllable in the motif other than the first). To calculate the normalized motif reset probability, the average number of motifs per bout (M) was calculated over 30–50 bouts (bouts were defined as chains of motifs, started with introductory notes and mostly uninterrupted; on rare occasions, motifs produced within 1 s of other motifs were considered part of the previous bout). Each bird’s motif reset probability was then normalized (normalized motif reset probability = motif reset probability/[1 − (1/M)]), following the logic that 1/M is the probability that a motif is the last in the bout and is not followed by another motif. Therefore, 1 − (1/M) is the probability that a motif is followed by another motif in the current bout. Because a reset requires a motif to follow the truncated one, dividing by this probability returns a normalized measure of reset.
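The normalization can be written as a short function. Here `motifs_per_bout` holds the motif count of each analysed bout, so its mean is M, and `p_reset` is the raw proportion of truncations followed by a motif reset; the function name and example numbers are illustrative only.

```python
from statistics import mean


def normalized_reset_probability(p_reset: float, motifs_per_bout: list[int]) -> float:
    """Divide the raw reset probability by 1 - 1/M, the probability that a motif
    is followed by another motif within the same bout (M = mean motifs per bout)."""
    m = mean(motifs_per_bout)
    if m <= 1:
        raise ValueError("M must exceed 1 for the correction to be defined")
    return p_reset / (1.0 - 1.0 / m)
```

For bouts of 3, 4 and 5 motifs, M = 4 and 1 − 1/M = 0.75, so a raw reset probability of 0.6 becomes 0.8 after normalization.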

To report cross-motif quantification of truncation or reset latency and of the probability of each resumed vocalization type, events were categorized according to the time point within the motif at which the corresponding stimulation began. The events were then grouped in 10% bins of motif duration, per bird, to allow comparison between birds with different motif lengths. For each bird, 100% was set to the motif duration minus 100 ms, because stimulation applied later than 100 ms before the end of the motif would produce truncations with unclear effects on the syllables (average truncation latency across groups = 74.36 ± 3.06 ms). Whenever stimulation occurred in the last 100 ms of the motif, the event was classified in the −20% to 0% bins, as affecting the transition to the following motif (if any). Stimulations, truncations and post-truncation effects occurring during introductory notes and inter-motif connecting syllables were assigned to these −20% to 0% bins on the basis of their temporal distance to syllable A; if no syllable A onset was produced, the effects were excluded from further analysis, because the introductory note could not be assigned a distance from the motif for the percentage computation.
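The binning rule can be sketched as a function that maps a stimulation onset to a 10% bin. Assumptions (ours, for illustration): onsets are given in ms relative to syllable A of the motif, with negative values for introductory notes or connecting syllables; onsets in the last 100 ms are re-expressed as a negative offset from the motif end as a stand-in for the distance to the next motif; and pre-motif events earlier than −20% are discarded.

```python
import math
from typing import Optional


def motif_bin(onset_ms: float, motif_ms: float, tail_ms: float = 100.0) -> Optional[int]:
    """Lower edge (in %) of the 10% bin for a stimulation onset, or None if excluded.

    onset_ms is measured from the onset of syllable A; negative values denote
    introductory notes or connecting syllables preceding the motif. 100% is
    defined as motif_ms - tail_ms, so bins 0..90 cover the motif up to tail_ms
    before its end.
    """
    span = motif_ms - tail_ms
    if span <= 0:
        raise ValueError("motif must be longer than tail_ms")
    if onset_ms >= motif_ms:
        return None  # stimulation after the motif ended
    in_tail = onset_ms >= span
    if in_tail:
        # Hypothetical convention: express a late onset as its (negative) distance
        # to the motif end, standing in for the distance to the next motif.
        onset_ms -= motif_ms
    pct_bin = math.floor(10 * onset_ms / span) * 10
    if in_tail:
        return max(pct_bin, -20)  # last-100-ms events always fall in -20% to 0%
    return pct_bin if pct_bin >= -20 else None
```

For a 600-ms motif (so 100% corresponds to 500 ms), an onset at 250 ms falls in the 50% bin, an onset at 550 ms is assigned to the −10% bin and an introductory note 80 ms before syllable A falls in the −20% bin.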

To evaluate the likelihood that optogenetic inhibition or stimulation across a motif–motif transition terminates a bout (Extended Data Fig. 4d–h), we delivered light or sham stimulation across the motif and beyond its end, and quantified, over 50 trials per condition, the probability that stimulation coincided with termination of the bout.

In lesion experiments, a minimum of 20 motifs were scored with SAP against pre-surgery motifs. Failed motif starts were defined as a series of introductory notes not leading to a motif. The number of motifs per bout was counted over 50 bouts; in TeNT experiments, for birds that ultimately lost their song (Uva_HVC TeNT; some HVC_X TeNT), the last 50 bouts before song cessation were analysed. Birds that produced no motifs after the lesion in Fig. 3b (that is, did not sing at all) were assigned an accuracy of 0 for classification purposes.

Recurrent circuit model of HVC

The computational model used in this study is based on a canonical recurrent circuit model (continuous attractor neural network^67,68,69) and was simulated in the BrainPy framework⁷⁰. In a typical continuous attractor neural network, excitatory neurons are arranged to uniformly cover a linear feature space (in the current case, the location along the timing chain⁷¹) and interact through recurrent connections⁷². This configuration gives rise to a continuous manifold that sustains a series of activity bumps. A song motif is considered to be controlled by an activity bump traversing from one end of the chain to the other⁷³.

To better reflect the biological characteristics of the songbird HVC, we introduced several specific features.

The model incorporates the following five distinct neuron types to capture the functional diversity in the songbird HVC:

(1) Excitatory neurons (HVC_RA, ${{\bf{r}}}_{{\rm{RA}}}$, and HVC_X, ${{\bf{r}}}_{{\rm{X}}}$)

The excitatory neurons responsible for encoding the neural sequence are divided into two groups (HVC_RA and HVC_X), with firing rates denoted ${{\bf{r}}}_{{\rm{RA}}}$ and ${{\bf{r}}}_{{\rm{X}}}$, respectively. Consistent with experimental observations, the model includes only intergroup connections and leaves neurons within the same group unconnected. Simulations showed that these intergroup connections are sufficient to self-sustain non-zero responses and moving sequences.
(2) Global inhibitory neurons (${{\bf{r}}}_{{\rm{g}}}$)

To stabilize the network, the model contains a global inhibitory neuron with firing rate ${{\bf{r}}}_{{\rm{g}}}$. In the model, this neuron has faster dynamics and a steeper activation function than the excitatory neurons, so that it provides effective global inhibition.
(3) Local inhibitory neurons (${{\bf{r}}}_{{\rm{I}}}$)

The circuit model has another group of inhibitory neurons (${{\bf{r}}}_{{\rm{I}}}$) that provides local, structured inhibitory feedback to the excitatory populations, which is essential for generating spontaneous movement of the excitatory population activity bumps within the circuit. The ${{\bf{r}}}_{{\rm{I}}}$ bump lags slightly behind the excitatory bumps owing to transmission delay and slower dynamics, so excitatory neurons ahead of the bump are suppressed less and build up more activity. As a result, the excitatory activity bump is ‘pushed’ forward.
(4) Peri-song neurons (${{\bf{r}}}_{{\rm{ps}}}$)

The circuit model contains an HVC_RA peri-song neuron group (${{\bf{r}}}_{{\rm{ps}}}$) that is modelled to target HVC_RA song neurons at the initial end of the manifold. This group plays a critical role in initiating and resetting motif generation.

Circuit dynamics

The neural dynamics underlying these activities are captured by a set of dynamic equations:

$${\tau }_{{\rm{E}}}{\dot{{\bf{r}}}}_{{\rm{R}}{\rm{A}}}=-{{\bf{r}}}_{{\rm{R}}{\rm{A}}}+{W}_{{\rm{X}},{\rm{R}}{\rm{A}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{X}}})+{W}_{{\rm{I}},{\rm{R}}{\rm{A}}}\cdot {f}_{{\rm{I}}}({{\bf{r}}}_{{\rm{I}}})+{w}_{{\rm{g}},{\rm{R}}{\rm{A}}}\,{f}_{{\rm{g}}}({{\bf{r}}}_{{\rm{g}}})+{W}_{{\rm{p}}{\rm{s}},{\rm{R}}{\rm{A}}}\,{f}_{{\rm{p}}{\rm{s}}}({{\bf{r}}}_{{\rm{p}}{\rm{s}}})+{I}_{{\rm{e}}{\rm{x}}{\rm{t}},1}$$

(1.1)

$${\tau }_{{\rm{E}}}{\dot{{\bf{r}}}}_{{\rm{X}}}=-{{\bf{r}}}_{{\rm{X}}}+{W}_{{\rm{R}}{\rm{A}},{\rm{X}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{R}}{\rm{A}}})+{W}_{{\rm{I}},{\rm{X}}}\cdot {f}_{{\rm{I}}}({{\bf{r}}}_{{\rm{I}}})+{w}_{{\rm{g}},{\rm{X}}}\,{f}_{{\rm{g}}}({{\bf{r}}}_{{\rm{g}}})+{I}_{{\rm{e}}{\rm{x}}{\rm{t}},2}$$

(1.2)

$${\dot{{\bf{r}}}}_{{\rm{g}}}=-{{\bf{r}}}_{{\rm{g}}}+\left[{W}_{{\rm{R}}{\rm{A}},{\rm{g}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{R}}{\rm{A}}})+{W}_{{\rm{X}},{\rm{g}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{X}}})\right]$$

(1.3)

$${\tau }_{{\rm{I}}}{\dot{{\bf{r}}}}_{{\rm{I}}}=-{{\bf{r}}}_{{\rm{I}}}+\left[{W}_{{\rm{R}}{\rm{A}},{\rm{I}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{R}}{\rm{A}}})+{W}_{{\rm{X}},{\rm{I}}}\cdot {f}_{{\rm{E}}}({{\bf{r}}}_{{\rm{X}}})\right]$$

(1.4)

$${\tau }_{{\rm{p}}{\rm{s}}}{\dot{{\bf{r}}}}_{{\rm{p}}{\rm{s}}}=-{{\bf{r}}}_{{\rm{p}}{\rm{s}}}+{I}_{{\rm{U}}{\rm{v}}{\rm{a}}}+{w}_{{\rm{g}},{\rm{p}}{\rm{s}}}\,{f}_{{\rm{g}}}({{\bf{r}}}_{{\rm{g}}})$$

(1.5)

In these equations, subscripts denote the neuron types. Each $\tau $ is the time constant of the corresponding group, and $f(\cdot )$ denotes the activation function of each neuron group. External input currents are denoted ${I}_{\mathrm{ext}}$, and ${I}_{\mathrm{Uva}}$ denotes input from Uva. The capital ${W}_{{\rm{A}},{\rm{B}}}$ denotes the connection matrix from group A to group B, with dimensions ${N}_{{\rm{B}}}\times {N}_{{\rm{A}}}$, where $N$ is the number of neurons in the respective group, whereas the lowercase $w$ denotes a scalar connection strength. For convenience, we set ${N}_{\mathrm{RA}}={N}_{{\rm{X}}}={N}_{{\rm{I}}}=N$ and ${N}_{{\rm{g}}}={N}_{\mathrm{ps}}=1$. To support a continuous manifold, the connections among the excitatory and local inhibitory populations are Gaussian functions of the distance between the indices of the pre-synaptic and post-synaptic neurons:

$${W}_{{\rm{A}},{\rm{B}}}^{(ij)}={w}_{{\rm{A}},{\rm{B}}}\,\exp \left[-{\left(\frac{2{\rm{\pi }}}{N}\right)}^{2}\frac{{(i-j)}^{2}}{2{\sigma }^{2}}\right]$$

(2)

where ${w}_{{\rm{A}},{\rm{B}}}$ (${\rm{A}},{\rm{B}}\in \{\mathrm{RA},{\rm{X}},{\rm{I}}\}$) denotes the peak connection weight from neuronal population ${\rm{A}}$ to ${\rm{B}}$.
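The Gaussian connectivity rule can be evaluated directly. The sketch below builds the N × N matrix for one pair of populations in plain Python; N, σ and the peak weight are free parameters whose values in the usage line are placeholders, not those used in the study, and the sign convention for inhibitory weights is left to the caller.

```python
import math


def gaussian_weights(n: int, w_peak: float, sigma: float) -> list[list[float]]:
    """W[i][j] = w_peak * exp(-(2*pi/n)**2 * (i - j)**2 / (2 * sigma**2)).

    Row i is the post-synaptic neuron and column j the pre-synaptic neuron;
    the index distance is rescaled to an angle by 2*pi/n. The distance is
    |i - j| without wrap-around, so the chain has two ends."""
    scale = (2 * math.pi / n) ** 2
    return [[w_peak * math.exp(-scale * (i - j) ** 2 / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]
```

For example, `gaussian_weights(100, 2.0, 0.5)` has 2.0 on its diagonal, is symmetric, and falls off with index distance, so each neuron is most strongly coupled to its neighbours on the chain.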

To target the peri-song output to the initial location of the manifold, ${W}_{\mathrm{ps},{\rm{E}}}$ is an $N\times 1$ matrix whose single column has a Gaussian profile centred at index 0:

$${W}_{\mathrm{ps},{\rm{E}}}^{({\rm{k}})}={w}_{\mathrm{ps},{\rm{E}}}\,\exp \left[-{\left(\frac{2{\rm{\pi }}}{N}\right)}^{2}\frac{{(k-0)}^{2}}{2{\sigma }^{2}}\right]$$

(3)

Sequence initiation

The fundamental property of the network is its ability to spontaneously generate neural sequences. In our model, peri-song neurons initiate the sequential activity. The peri-song neurons receive excitatory input, probably originating from the upstream nucleus Uva, while simultaneously receiving inhibitory input from the global inhibitory neurons. When the network is silenced, whether at rest or following truncation, activity in the global inhibitory neuron decreases, which disinhibits the peri-song neurons. This release from inhibition then triggers the onset of a motif.
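The disinhibition logic can be illustrated by integrating the peri-song equation alone, with the global inhibitory rate imposed from outside as an exponential decay. This is a toy sketch: the rectified-linear $f_{\rm{g}}$, the negative sign of $w_{\rm{g,ps}}$, the decay profile of $r_{\rm{g}}$ and all parameter values are our assumptions, not the parameters of the published model.

```python
import math


def simulate_perisong(t_max: float = 200.0, dt: float = 0.1, tau_ps: float = 5.0,
                      i_uva: float = 1.0, w_g_ps: float = -3.0,
                      g0: float = 1.0, tau_g_decay: float = 20.0) -> list[float]:
    """Toy forward-Euler integration of tau_ps * dr_ps/dt = -r_ps + I_Uva + w_g_ps * f_g(r_g).

    Global inhibition is imposed as r_g(t) = g0 * exp(-t / tau_g_decay), mimicking its
    decay once the network falls silent. f_g is taken as rectified-linear and the
    returned trace is the rectified peri-song output max(0, r_ps); all parameter
    values are placeholders.
    """
    r_ps = 0.0
    trace = []
    for step in range(int(round(t_max / dt))):
        r_g = g0 * math.exp(-step * dt / tau_g_decay)
        drive = i_uva + w_g_ps * max(0.0, r_g)  # excitation from Uva minus global inhibition
        r_ps += dt / tau_ps * (-r_ps + drive)
        trace.append(max(0.0, r_ps))
    return trace
```

With these placeholder values, the output stays at zero while inhibition dominates and rises towards the Uva drive only after $r_{\rm{g}}$ has decayed, which is the disinhibition-triggered onset described above.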

Boundaries

Following the activation of excitatory neurons, the activity bump is driven along the continuous manifold by locally structured inhibitory feedback from ${{\bf{r}}}_{{\rm{I}}}$. To give the bump a consistent direction of motion, the inhibitory feedback is intentionally enhanced at the initial locations on the chain. Owing to the recurrent nature of the network, the bump would ordinarily ‘bounce’ back upon reaching the end of the chain, which is inconsistent with the observed data. To address this, we introduced a fading of excitatory-to-excitatory connections as the bump approaches the boundary, simulating a ‘boundary effect’. This gradual reduction in connectivity causes the bump to diminish as it reaches the end point, resulting in an automatic cessation of activity that mimics the natural termination of a motif. These two boundary behaviours were implemented by multiplying the connection strengths by compensation factors:

$${{W}_{{\rm{I}},{\rm{R}}{\rm{A}}/{\rm{X}}}^{(ij)}}^{{\prime} }={W}_{{\rm{I}},{\rm{R}}{\rm{A}}/{\rm{X}}}^{(ij)}\,\left(1+{c}_{0}\,\exp \left[-{\left(\frac{2{\rm{\pi }}}{N}\right)}^{2}\frac{{i}^{2}}{4{\sigma }^{2}}\right]\right)$$

(4.1)

$${{W}_{{\rm{R}}{\rm{A}}/{\rm{X}},{\rm{X}}/{\rm{R}}{\rm{A}}}^{(ij)}}^{{\prime} }={W}_{{\rm{R}}{\rm{A}}/{\rm{X}},{\rm{X}}/{\rm{R}}{\rm{A}}}^{(ij)}\,\left(1-{c}_{1}\,\exp \left[-{\left(\frac{2{\rm{\pi }}}{N}\right)}^{2}\frac{{(i-N-\phi )}^{2}}{4{\sigma }^{2}}\right]\right)$$

(4.2)

where $\phi$ is an offset term, which we set to $\phi =0.5\sigma N/(2{\rm{\pi }})$. The compensated connection matrices are shown in Fig. 5c.
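The two compensation factors above can be evaluated with NumPy as follows. This is a minimal sketch that assumes the index $i$ runs over chain positions $0, \dots, N-1$ and that each factor scales row $i$ of the corresponding weight matrix. The helper name, the values of $c_0$, $c_1$ and $\sigma$, and the placeholder all-ones matrix are hypothetical.

```python
import numpy as np


def boundary_factors(N, sigma, c0, c1):
    """Hypothetical helper returning the two boundary compensation factors.

    Assumes i = 0..N-1 indexes positions along the chain and that each
    factor scales row i of the corresponding weight matrix.
    """
    i = np.arange(N)
    scale = (2 * np.pi / N) ** 2
    phi = 0.5 * sigma * N / (2 * np.pi)  # offset term, in index units
    # Inhibitory feedback strengthened near the start of the chain.
    inh = 1 + c0 * np.exp(-scale * i ** 2 / (4 * sigma ** 2))
    # Excitatory-to-excitatory connections faded near the end of the chain.
    exc = 1 - c1 * np.exp(-scale * (i - N - phi) ** 2 / (4 * sigma ** 2))
    return inh, exc


N, sigma = 100, 0.5
inh, exc = boundary_factors(N, sigma, c0=0.8, c1=0.9)
W_I_RA = np.ones((N, N))                   # placeholder weights, for illustration only
W_I_RA_comp = W_I_RA * inh[:, np.newaxis]  # scale row i by its factor
```

The inhibitory factor falls from $1 + c_0$ at the start of the chain towards 1. The excitatory factor falls from about 1 towards $1 - c_1$ near the end, and the offset $\phi$ places its minimum just beyond the last chain position.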

Truncation

To simulate the optogenetically induced truncation of HVC neuronal sequences observed experimentally, we applied an intense, spatially homogeneous pulse input to either HVC_RA or HVC_X neurons. After this stimulation, both ${{\bf{r}}}_{{\rm{RA}}}$ and ${{\bf{r}}}_{{\rm{X}}}$ became hyperactive, which led to their rapid suppression through the fast response of ${{\bf{r}}}_{{\rm{g}}}$. These neurons remain suppressed until ${{\bf{r}}}_{{\rm{g}}}$ activity subsides, corresponding to the observed motif truncation (Fig. 5e,f). The peri-song neurons then reinitiate the neural sequence, so that the motif restarts from the beginning. Because HVC_RA and HVC_X are connected symmetrically in the current model, we simulated optogenetic stimulation of HVC_RA only, as a verification.

HVC_X degradation

To simulate the degradation of neurotransmission from HVC_X neurons observed in Fig. 5g,h, we manually modified the output projections of HVC_X neurons. Let $p$ denote the proportion of degradation. The degraded projection from HVC_X to HVC_RA, ${W}_{{\rm{X}},\mathrm{RA}}^{\prime}$, can then be expressed as

$${{W}_{{\rm{X}},{\rm{R}}{\rm{A}}}^{(ij)}}^{{\prime} }={[(1-p){W}_{{\rm{X}},{\rm{R}}{\rm{A}}}^{(ij)}+\sqrt{(1-p){W}_{{\rm{X}},{\rm{R}}{\rm{A}}}^{(ij)}}{\sigma }_{{\rm{W}}}{\xi }_{ij}]}_{+}$$

(5)

where ${W}_{{\rm{X}},\mathrm{RA}}^{({ij})}$ is the original connection strength, ${\sigma }_{W}$ is the coefficient of variation, ${\xi }_{{ij}}$ is an independent Gaussian noise term indexed by the presynaptic and postsynaptic neuron indices $i$ and $j$, and ${[x]}_{+}=\max (x,0)$ denotes rectification, which keeps every weight non-negative (excitatory).
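Eq. (5) can be applied element-wise to a weight matrix with NumPy. This is a minimal sketch that assumes $\xi_{ij}$ is standard normal and that the original weights are non-negative, as required for the square root. The helper name `degrade_weights` and the random placeholder matrix are hypothetical.

```python
import numpy as np


def degrade_weights(W, p, sigma_W, rng):
    """Degrade non-negative weights W by a proportion p following Eq. (5)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    mean = (1.0 - p) * W                      # scaled-down connection strength
    xi = rng.standard_normal(W.shape)         # independent noise per (i, j) pair
    noisy = mean + np.sqrt(mean) * sigma_W * xi
    return np.maximum(noisy, 0.0)             # rectification [x]_+


rng = np.random.default_rng(0)
W_X_RA = rng.uniform(0.0, 1.0, size=(50, 50))  # placeholder excitatory weights
W_deg = degrade_weights(W_X_RA, p=0.3, sigma_W=0.2, rng=rng)
```

The noise standard deviation scales with the square root of the remaining weight. Complete degradation ($p = 1$) therefore removes the projection entirely, and with $\sigma_W = 0$ the weights are simply scaled by $1 - p$.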

Experiments showed that, as synaptic transmission degraded over weeks, neuronal sequences recorded on different trials within the same day could propagate along the chain and then disappear at random locations. We assume that synaptic weights are nearly constant within a day and that the random progression along the chain arises from the variability of single neurons. Therefore, to reproduce this random progression during synaptic degradation, each HVC_RA neuron ${{\bf{r}}}_{{\rm{RA}}}(j)$ receives Poisson-like noise ${I}_{\mathrm{noise}}$, mimicking stochastic spike generation:

$${I}_{\mathrm{noise}}(j)=\sqrt{F\,{{\bf{r}}}_{{\rm{R}}{\rm{A}}}(j)}\,\xi (t)$$

(6)

where $F$ is the Fano factor scaling the noise and $\xi (t)$ is standard Gaussian white noise; the noise terms received by different neurons are independent of each other. Under these conditions, we observed that the sequences terminated at random positions. As illustrated in Fig. 5g, the average sequence length decreased as the proportion of degradation $p$ increased.
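In a time-stepped simulation, the white noise in Eq. (6) is typically integrated with an Euler–Maruyama step. The sketch below assumes this discretization, which the text does not specify. Each neuron's increment over a step $dt$ then has variance $F\,r\,dt$, proportional to its rate, as expected for Poisson-like variability. The helper name and parameter values are hypothetical.

```python
import numpy as np


def noise_increment(r_RA, F, dt, rng):
    """Euler-Maruyama increment of the Poisson-like noise in Eq. (6).

    Assumption: the white noise xi(t) is integrated over a step dt, so each
    neuron's increment has variance F * r_RA * dt (Fano-factor scaling).
    """
    r = np.maximum(r_RA, 0.0)                      # rates must be non-negative
    return np.sqrt(F * r * dt) * rng.standard_normal(r.shape)


rng = np.random.default_rng(1)
rates = np.full(200_000, 4.0)   # many neurons at the same rate, to check the variance
dI = noise_increment(rates, F=0.5, dt=0.01, rng=rng)
```

Because independent draws are taken for each neuron, the sketch also respects the assumption that noise terms are independent across neurons. Silent neurons receive no noise.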

Statistical analysis

All data were analysed with GraphPad Prism 10. Normality was assessed with the Shapiro–Wilk test, and parametric or non-parametric statistical tests were used as appropriate. Two groups were compared with t-tests, Mann–Whitney tests or Kolmogorov–Smirnov tests. For more than two conditions, one-way or two-way ANOVA or the Kruskal–Wallis test was performed. Cumulative probability curves were calculated for each animal and then tested at the group level for statistical significance. A single comparison across all groups was made, rather than repeatedly comparing the same dataset (HVC) with each of the other datasets. Fisher's exact or χ² tests, followed by Dunn’s post hoc test, were used to compare the probability of finding optically evoked responses across the HVC projection neuron classes while stimulating the different afferents. Dunn’s, Sidak’s or Holm–Sidak’s post hoc tests were used to correct for multiple comparisons. Significance levels are denoted *P < 0.05, **P < 0.01 and ***P < 0.001.
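The normality-dependent choice between a parametric and a non-parametric two-group test can be sketched in Python with SciPy. This is an illustration only: the authors used GraphPad Prism 10. The helper name `compare_two_groups`, the α = 0.05 gate on the Shapiro–Wilk P value and the idealized example data (exact normal and exponential quantiles) are assumptions.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Pick a parametric or non-parametric two-group test from Shapiro-Wilk.

    Sketch only: returns the test name and its two-sided P value.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    both_normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if both_normal:
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "Mann-Whitney", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue


q = np.linspace(0.02, 0.98, 50)
normal_a = stats.norm.ppf(q)     # idealized normal sample (exact quantiles)
skewed_a = stats.expon.ppf(q)    # idealized right-skewed sample
result_normal = compare_two_groups(normal_a, normal_a + 1.0)
result_skewed = compare_two_groups(skewed_a, skewed_a + 1.0)
```

Requiring both groups to pass the normality check before using a t-test is one common convention. Prism's own decision rules and options may differ.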

Statistics and reproducibility

Each experimental result was produced independently and/or by combining at least two separate cohorts with similar results (for example, Uva lesions/silencing in Fig. 3 and Extended Data Fig. 4, multi-nuclei lesions in Extended Data Fig. 6 and HVC_X–TeNT experiments in Fig. 5 and Extended Data Fig. 13). Figures showing viral expression or lesion extent are broadly representative of each experimental group.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The datasets generated and/or analysed during this study are available in the Texas Data Repository (https://dataverse.tdl.org/dataverse/trusel_Nature_2025_data).

Code availability

The code for model simulations and visualizations is available at GitHub (https://github.com/Zack-zuo/SongbirdHVC#)⁷⁴. Anatomical atlas drawings are adapted from the ZEBrA database (Oregon Health & Science University; http://www.zebrafinchatlas.org)⁶⁰. Source data are provided with this paper.

References

Aronov, D., Andalman, A. S. & Fee, M. S. A specialized forebrain circuit for vocal babbling in the juvenile songbird. Science 320, 630–634 (2008).
Doeringer, J. A. & Hogan, N. Serial processing in human movement production. Neural Networks 11, 1345–1356 (1998).
Jin, X., Tecuapetla, F. & Costa, R. M. Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences. Nat. Neurosci. 17, 423–430 (2014).
Lashley, K. S. in Cerebral Mechanisms in Behavior; the Hixon Symposium (ed. Jeffress, L. A.) 112–146 (Wiley, 1951).
Gallistel, C. R. Précis of Gallistel’s The Organization of Action: A New Synthesis. Behav. Brain Sci. 4, 609–650 (1981).
Okubo, T. S., Mackevicius, E. L., Payne, H. L., Lynch, G. F. & Fee, M. S. Growth and splitting of neural sequences in songbird vocal development. Nature 528, 352–357 (2015).
Newell, A. & Rosenbloom, P. S. in Cognitive Skills and Their Acquisition (ed. Anderson, J. R.) 1–55 (Lawrence Erlbaum Associates, 1981).
Brainard, M. S. & Doupe, A. J. What songbirds teach us about learning. Nature 417, 351–358 (2002).
Hahnloser, R. H., Kozhevnikov, A. A. & Fee, M. S. An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature 419, 65–70 (2002).
Fee, M. S., Kozhevnikov, A. A. & Hahnloser, R. H. Neural mechanisms of vocal sequence generation in the songbird. Ann. N Y Acad. Sci. 1016, 153–170 (2004).
Long, M. A., Jin, D. Z. & Fee, M. S. Support for a synaptic chain model of neuronal sequence generation. Nature 468, 394–399 (2010).
Lynch, G. F., Okubo, T. S., Hanuschkin, A., Hahnloser, R. H. & Fee, M. S. Rhythmic continuous-time coding in the songbird analog of vocal motor cortex. Neuron 90, 877–892 (2016).
Picardo, M. A. et al. Population-level representation of a temporal sequence underlying song production in the zebra finch. Neuron 90, 866–876 (2016).
Ashmore, R. C., Renk, J. A. & Schmidt, M. F. Bottom-up activation of the vocal motor forebrain by the respiratory brainstem. J. Neurosci. 28, 2613–2623 (2008).
Ashmore, R. C., Wild, J. M. & Schmidt, M. F. Brainstem and forebrain contributions to the generation of learned motor behaviors for song. J. Neurosci. 25, 8543–8554 (2005).
Hamaguchi, K., Tanaka, M. & Mooney, R. A distributed recurrent network contributes to temporally precise vocalizations. Neuron 91, 680–693 (2016).
Moll, F. W. et al. Thalamus drives vocal onsets in the zebra finch courtship song. Nature 616, 132–136 (2023).
Long, M. A. & Fee, M. S. Using temperature to analyse temporal dynamics in the songbird motor pathway. Nature 456, 189–194 (2008).
Elmaleh, M., Kranz, D., Asensio, A. C., Moll, F. W. & Long, M. A. Sleep replay reveals premotor circuit structure for a skilled behavior. Neuron 109, 3851–3861 (2021).
Armstrong, E. & Abarbanel, H. D. Model of the songbird nucleus HVC as a network of central pattern generators. J. Neurophysiol. 116, 2405–2419 (2016).
Amador, A., Perl, Y. S., Mindlin, G. B. & Margoliash, D. Elemental gesture dynamics are encoded by song premotor cortical neurons. Nature 495, 59–64 (2013).
Danish, H. H., Aronov, D. & Fee, M. S. Rhythmic syllable-related activity in a songbird motor thalamic nucleus necessary for learned vocalizations. PLoS ONE 12, e0169568 (2017).
Vu, E. T., Mazurek, M. E. & Kuo, Y. C. Identification of a forebrain motor programming network for the learned song of zebra finches. J. Neurosci. 14, 6924–6934 (1994).
Ashmore, R. C., Bourjaily, M. & Schmidt, M. F. Hemispheric coordination is necessary for song production in adult birds: implications for a dual role for forebrain nuclei in vocal motor control. J. Neurophysiol. 99, 373–385 (2008).
Histed, M. H., Bonin, V. & Reid, R. C. Direct activation of sparse, distributed populations of cortical neurons by electrical microstimulation. Neuron 63, 508–522 (2009).
Garcia-Oscos, F. et al. Autism-linked gene FoxP1 selectively regulates the cultural transmission of learned vocalizations. Sci. Adv. 7, eabd2827 (2021).
Franz, M. & Goller, F. Respiratory units of motor production and song imitation in the zebra finch. J. Neurobiol. 51, 129–141 (2002).
Marder, E. & Bucher, D. Central pattern generators and the control of rhythmic movements. Curr. Biol. 11, R986–996 (2001).
Berkowitz, A. Expanding our horizons: central pattern generation in the context of complex activity sequences. J. Exp. Biol. 222, jeb192054 (2019).
Coleman, M. J. & Vu, E. T. Recovery of impaired songs following unilateral but not bilateral lesions of nucleus uvaeformis of adult zebra finches. J. Neurobiol. 63, 70–89 (2005).
Trusel, M. et al. Synaptic connectivity of sensorimotor circuits for vocal imitation in the songbird. eLife https://doi.org/10.7554/eLife.104609 (2025).
Coleman, M. J., Roy, A., Wild, J. M. & Mooney, R. Thalamic gating of auditory responses in telencephalic song control nuclei. J. Neurosci. 27, 10024–10036 (2007).
Cynx, J. Experimental determination of a unit of song production in the zebra finch (Taeniopygia guttata). J. Comp. Psychol. 104, 3–10 (1990).
Cardin, J. A. & Schmidt, M. F. Auditory responses in multiple sensorimotor song system nuclei are co-modulated by behavioral state. J. Neurophysiol. 91, 2148–2163 (2004).
Foster, E. F. & Bottjer, S. W. Lesions of a telencephalic nucleus in male zebra finches: influences on vocal behavior in juveniles and adults. J. Neurobiol. 46, 142–165 (2001).
Bottjer, S. W., Miesner, E. A. & Arnold, A. P. Forebrain lesions disrupt development but not maintenance of song in passerine birds. Science 224, 901–903 (1984).
Roberts, T. F., Gobes, S. M., Murugan, M., Ölveczky, B. P. & Mooney, R. Motor circuits are required to encode a sensory model for imitative learning. Nat. Neurosci. 15, 1454–1459 (2012).
Ali, F. et al. The basal ganglia is necessary for learning spectral, but not temporal, features of birdsong. Neuron 80, 494–506 (2013).
Mooney, R. Different subthreshold mechanisms underlie song selectivity in identified HVc neurons of the zebra finch. J. Neurosci. 20, 5420–5436 (2000).
Fetterman, G. C. & Margoliash, D. Rhythmically bursting songbird vocomotor neurons are organized into multiple sequences, suggesting a network/intrinsic properties model encoding song and error, not time. Preprint at bioRxiv https://doi.org/10.1101/2023.01.23.525213 (2023).
Kozhevnikov, A. A. & Fee, M. S. Singing-related activity of identified HVC neurons in the zebra finch. J. Neurophysiol. 97, 4271–4283 (2007).
Fee, M. S. & Goldberg, J. H. A hypothesis for basal ganglia-dependent reinforcement learning in the songbird. Neuroscience 198, 152–170 (2011).
Scharff, C., Kirn, J. R., Grossman, M., Macklis, J. D. & Nottebohm, F. Targeted neuronal death affects neuronal replacement and vocal behavior in adult songbirds. Neuron 25, 481–492 (2000).
Sánchez-Valpuesta, M. et al. Corticobasal ganglia projecting neurons are required for juvenile vocal learning but not for adult vocal plasticity in songbirds. Proc. Natl Acad. Sci. USA 116, 22833–22843 (2019).
Hahnloser, R. H., Kozhevnikov, A. A. & Fee, M. S. Sleep-related neural activity in a premotor and a basal-ganglia pathway of the songbird. J. Neurophysiol. 96, 794–812 (2006).
Mooney, R. & Prather, J. F. The HVC microcircuit: the synaptic basis for interactions between song motor and vocal plasticity pathways. J. Neurosci. 25, 1952–1964 (2005).
Kosche, G., Vallentin, D. & Long, M. A. Interplay of inhibition and excitation shapes a premotor neural sequence. J. Neurosci. 35, 1217–1227 (2015).
Kornfeld, J. et al. EM connectomics reveals axonal target variation in a sequence-generating network. eLife https://doi.org/10.7554/eLife.24364 (2017).
Daliparthi, V. K. et al. Transitioning between preparatory and precisely sequenced neuronal activity in production of a skilled behavior. eLife https://doi.org/10.7554/eLife.43732 (2019).
Sadeh, S. & Clopath, C. Inhibitory stabilization and cortical computation. Nat. Rev. Neurosci. 22, 21–37 (2021).
Bera, K., Shukla, A. & Bapi, R. S. Motor chunking in internally guided sequencing. Brain Sci. 11, 292 (2021).
Tosatto, L., Fagot, J., Nemeth, D. & Rey, A. Chunking as a function of sequence length. Anim. Cogn. 28, 2 (2024).
Fonollosa, J., Neftci, E. & Rabinovich, M. Learning of chunking sequences in cognition and behavior. PLoS Comput. Biol. 11, e1004592 (2015).
Ramkumar, P. et al. Chunking as the result of an efficiency computation trade-off. Nat. Commun. 7, 12176 (2016).
Lai, L., Huang, A. Z. X. & Gershman, S. J. Action chunking as conditional policy compression. Cognition 264, 106201 (2025).
Mackevicius, E. L., Happ, M. T. L. & Fee, M. S. An avian cortical circuit for chunking tutor song syllables into simple vocal-motor units. Nat. Commun. 11, 5029 (2020).
Piristine, H. C., Choetso, T. & Gobes, S. M. A sensorimotor area in the songbird brain is required for production of vocalizations in the song learning period of development. Dev. Neurobiol. 76, 1213–1225 (2016).
Roberts, T. F. et al. Identification of a motor-to-auditory pathway important for vocal learning. Nat. Neurosci. 20, 978–986 (2017).
Stauffer, T. R. et al. Axial organization of a brain region that sequences a learned pattern of behavior. J. Neurosci. 32, 9312–9322 (2012).
Karten, H. J. et al. Digital atlas of the zebra finch (Taeniopygia guttata) brain: a high-resolution photo atlas. J. Comp. Neurol. 521, 3702–3715 (2013).
Peng, H., Bria, A., Zhou, Z., Iannello, G. & Long, F. Extensible visualization and analysis for multidimensional images using Vaa3D. Nat. Protoc. 9, 193–208 (2014).
Peng, H., Ruan, Z., Long, F., Simpson, J. H. & Myers, E. W. V3D enables real-time 3D visualization and quantitative analysis of large-scale biological image data sets. Nat. Biotechnol. 28, 348–353 (2010).
Peng, H. et al. Virtual finger boosts three-dimensional imaging and microsurgery as well as terabyte volume image visualization and analysis. Nat. Commun. 5, 4342 (2014).
Berg, S. et al. ilastik: interactive machine learning for (bio)image analysis. Nat. Methods 16, 1226–1232 (2019).
Tchernichovski, O., Nottebohm, F., Ho, C. E., Pesaran, B. & Mitra, P. P. A procedure for an automated measurement of song similarity. Anim. Behav. 59, 1167–1176 (2000).
Koch, T. M. I., Marks, E. & Roberts, T. F. AVN: a deep learning approach for the analysis of birdsong. Preprint at bioRxiv https://doi.org/10.1101/2024.05.10.593561 (2024).
Fung, C. C., Wong, K. Y. & Wu, S. A moving bump in a continuous manifold: a comprehensive study of the tracking dynamics of continuous attractor neural networks. Neural Comput. 22, 752–792 (2010).
Amari, S. Dynamics of pattern formation in lateral-inhibition type neural fields. Biol. Cybern. 27, 77–87 (1977).
Wu, S., Hamaguchi, K. & Amari, S. Dynamics and computation of continuous attractors. Neural Comput. 20, 994–1025 (2008).
Wang, C. et al. BrainPy, a flexible, integrative, efficient, and extensible framework for general-purpose brain dynamics programming. eLife https://doi.org/10.7554/eLife.86365 (2023).
Zuo, J., Liu, X., Wu, Y. N., Wu, S. & Zhang, W.-H. A recurrent neural circuit mechanism of temporal-scaling equivariant representation. In Proc. 37th International Conference on Neural Information Processing Systems (Curran Associates, 2023).
Niell, C. M. Cell types, circuits, and receptive fields in the mouse visual cortex. Annu. Rev. Neurosci. 38, 413–431 (2015).
Zhang, W. & Wu, S. Neural information processing with feedback modulations. Neural Comput. 24, 1695–1721 (2012).
Zuo, J. Recurrent model of Songbird HVC. GitHub https://github.com/Zack-zuo/SongbirdHVC# (2025).


Acknowledgements

We thank T. Gentner, W. Dauer, D. Hattori, S. Choi and members of the Roberts laboratory for comments on an initial version of the paper, and M. Long, M. Schmidt and D. Aronov for valuable discussions of our data as the project unfolded. We are grateful to J. Holdway, L. Garcia and R. Cabuco for laboratory support. We thank D. Ramirez and A. Nawaby (University of Texas Southwestern Medical Center Whole Brain Microscopy Facility, RRID: SCR_017949) for assistance with 3D imaging and rendering. We also thank J. Hilton, R. Hunte and P. Jennings for administrative support. W.-H.Z. is supported by the UT Southwestern Endowed Scholar Program. Finally, we acknowledge the National Institutes of Health for supporting this research through grant nos. UF1NS115821 and R01NS108424 to T.F.R. and F99NS124172 to D.H.A.

Author information

Authors and Affiliations

Department of Neuroscience, UT Southwestern Medical Center, Dallas, TX, USA
Massimo Trusel, Danyal H. Alam, Ethan S. Marks, Therese M. I. Koch, Jie Cao, Harshida Pancholi, Ziran Zhao & Todd F. Roberts
Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
Junfeng Zuo
Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, Dallas, TX, USA
Junfeng Zuo & Wen-Hao Zhang
Department of Psychology, Texas Christian University, Fort Worth, TX, USA
Brenton G. Cooper
Peter O’Donnell Jr. Brain Institute, UT Southwestern Medical Center, Dallas, TX, USA
Wen-Hao Zhang & Todd F. Roberts


Contributions

M.T. and T.F.R. conceived the project. D.H.A., J.C. and H.P. adapted and produced the viral tools. J.Z. and W.-H.Z. designed and conducted the modelling work. M.T. designed the methodology and performed the optogenetic manipulations, lesions and electrophysiological recordings. M.T. and B.G.C. designed the methodology and performed the air sac pressure recordings with optogenetic stimulations. Z.Z. performed Neuropixels 2.0 recordings for the revision process (see response to reviewers). M.T., J.Z., E.S.M., T.M.I.K. and B.G.C. visualized the project. T.F.R. acquired funding and administered and supervised the project. M.T., W.-H.Z. and T.F.R. wrote the original draft of the paper. All authors contributed to writing, reviewing and editing the paper. All data are available in the main text or the supplementary materials.

Corresponding author

Correspondence to Todd F. Roberts.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Yarden Cohen, Melissa Coleman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Effects of optogenetic stimulation of HVC neurons in singing zebra finches.

a) Schematic of zebra finch song circuits (sagittal view), including HVC’s afferents (grey) from Uva (nucleus Uvaeformis), NIf (nucleus interface of the nidopallium), mMAN (medial magnocellular nucleus of the anterior nidopallium), and Av (nucleus avalanche); HVC projections to RA (robust nucleus of the arcopallium) via HVC_RA neurons (magenta), to the striatopallidal region Area X via HVC_X neurons (cyan), and to Av via HVC_Av neurons (yellow); HVC inhibitory interneurons (black); the cortico-basal ganglia-thalamocortical song pathway (brown dashed lines), and the corticobulbar song motor pathway from RA (green). b) HVC multiunit neuronal activity recording in anesthetized birds expressing ChRmine in HVC. Sample trace (top, scale bar 1 s, 1 V), raster plot (middle, 10 trials) and normalized peri-stimulus time histogram (bottom) reporting the change in multi-unit HVC firing activity in response to light stimulation (100 ms, red bar; two-way ANOVA comparing the curve between the 300 ms before and 300 ms after the 100 ms stimulation versus corresponding 700 ms baseline without stimulation: interaction F(69,621) = 8.137 P < 0.001, stimulation F(1,9) = 11.20 P = 0.0086, Sidak post-hoc P < 0.05 between 20 ms after the light onset and 40 ms after the light offset); the inset shows a magnified view of the PSTH and a scatter plot of the average (per hemisphere) response to the first 100 ms of light stimulation (red dashed rectangle) compared to the last 100 ms of baseline (black dashed rectangle, Wilcoxon test P = 0.002; n = 10 hemispheres, 5 birds). c) Additional spectrograms (0–11 kHz, scale bar 200 ms, horizontal lines identify song elements) from the bird in Fig. 1b (same lettering and symbols) displaying multiple events of optogenetically evoked truncations followed by a rapid restart of a motif (orange circle) or continuation of the motif after a pause (green) normally not present in the unstimulated motif within 1 s following stimulation.
d) Violin plots reporting accuracy of song segments with (gray) and without (white) stimulation for each bird (n = 6; two-way ANOVA testing the effect of optogenetic stimulation per each bird across the group, interaction F(5,114) = 8.178 P < 0.001, CTRL vs. STIM, F(1,114) = 55.22 P < 0.001). e-g) Same as (d) but for Amplitude (two-way ANOVA, interaction F(5,114) = 19.49 P < 0.001, CTRL vs. STIM, F(1,114) = 246.4 P < 0.001), Entropy (two-way ANOVA, interaction F(5,114) = 23.76 P < 0.001, CTRL vs. STIM, F(1,114) = 322.5 P < 0.001) and Goodness of pitch (two-way ANOVA, interaction F(5,114) = 17.66 P < 0.001, CTRL vs. STIM, F(1,114) = 101.4 P < 0.001). h) Average ±SEM probability of post-truncation behavior (within 1 s from truncation: no vocalization resumption (black), motif restart with any introductory note or syllable A (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause normally not present in control motifs (green)) following HVC light stimulation computed based on the time of stimulation through the progression of the motif (bins = 10% motif advancement). i) Box plots (5-95 percentile, 25,50,75 percentile) reporting probability of motif restart for each bird (orange dots). The underlying shaded areas represent the probability, for each of the birds, of producing a motif after any one motif (see Methods; this provides the basis for normalizing the motif restart probability; dashed lines show the maximum, median, and minimum). j) Spectrogram (top, as (c)) and subsyringeal air pressure (bottom, black trace) during 3 motifs (stimulation: red bars, 10 ms, lettering and symbols as per panel (c), scale bar 200 ms). (bottom) Grey semitransparent horizontal line indicates ambient pressure; supra-atmospheric pressure indicates expiration and subatmospheric pressure indicates inspiration. The inset (right) shows the average of syllable C (grey shading ±2 SD) in control (black) and stimulated trials (blue line).
The yellow shading indicates the significant reduction in air pressure caused by optogenetic stimulation of HVC neurons. k) Average ±SEM of cumulative probability distributions calculated for each bird whose pressure was recorded, displaying the latency to truncation as measured for each bird from spectrograms (black line) or from subsyringeal pressure (blue line) (10 ms time bins, two-way ANOVA testing the difference between truncation latency distributions, interaction F(51,153) = 149.2, pressure vs. song F(1,3) = 19.50 P < 0.001, Sidak’s post-hoc identifies significant difference (p < 0.05) between 40 and 60 ms time bins); (inset) violin plots reporting the latency of motif truncation computed across all the birds (latency calculated from spectrograms (white), latency calculated from pressure for the same 4 birds (blue); two-sided Mann–Whitney U = 236 P < 0.001). l) Violin plots reporting the duration of the optogenetically evoked apnea in events where truncations were not followed by restarting the song motif, computed across all stimulations for each of the 4 birds. Brain outline in a adapted with permission from ref. ⁶⁰, Wiley.

Source Data
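Panel k above summarizes truncation latency as cumulative probability distributions computed per bird in 10 ms bins and then averaged (±SEM) across birds. A minimal NumPy sketch of that summary follows. It assumes latencies are given in milliseconds and uses a hypothetical analysis window (`max_ms`), which the caption does not state. The helper names and the two-bird toy input are illustrative.

```python
import numpy as np


def per_bird_cdf(latencies_ms, bin_ms=10, max_ms=500):
    """Cumulative probability of truncation latency for one bird, in fixed bins."""
    edges = np.arange(0, max_ms + bin_ms, bin_ms)
    counts, _ = np.histogram(latencies_ms, bins=edges)
    # Divide by all events so latencies beyond max_ms keep the curve below 1.
    return edges[1:], np.cumsum(counts) / len(latencies_ms)


def group_cdf(latencies_by_bird, bin_ms=10, max_ms=500):
    """Average +/- SEM across birds of the per-bird cumulative curves."""
    curves = []
    for lat in latencies_by_bird:
        x, curve = per_bird_cdf(lat, bin_ms, max_ms)
        curves.append(curve)
    curves = np.vstack(curves)
    mean = curves.mean(axis=0)
    sem = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
    return x, mean, sem


x, mean, sem = group_cdf([[5, 15, 25, 35], [5, 5, 5, 5]], bin_ms=10, max_ms=40)
```

Computing one curve per bird before averaging keeps each animal as the unit of replication, so birds with many stimulations do not dominate the group curve.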

Extended Data Fig. 2 Dual effect of eGtACR1 at somatic and axonal compartments.

a) Schematics (top) and in vivo recordings under anesthesia (bottom) showing the dual effect of eGtACR1-mediated stimulation: neurotransmitter release from stimulated axonal terminals (red) and somatic inhibition (blue). Sample traces, raster plots and PSTH (average ±SEM fold change in HVC activity in 100 ms bins, n = 5 hemispheres, 3 birds) reporting in vivo recordings of HVC neuronal activity change in response to NIf axon terminal excitation (red) or soma inhibition (blue) upon light delivery (region shaded in green; scale bar 200 ms, 1 V). b) Sample traces of evoked EPSPs from an eGtACR1-expressing neuron in the absence or presence of somatic light-mediated inhibition (green bar; 100 ms current injections in 50 pA steps, −200 pA to +400 pA, scale bar 100 ms, 20 mV). Brain outlines adapted with permission from ref. ⁶⁰, Wiley.

Source Data

Extended Data Fig. 3 Role of Uva and Peri-Uva thalamus in song.

a, b) Data from birds expressing eGtACR1 in Uva and implanted with optic fibers over HVC: a) Violin plots reporting Entropy of song segments with (gray) and without (white) stimulation (n = 4; two-way ANOVA testing the effect of optogenetic stimulation, interaction F(3,76) = 26.18 P < 0.001, CTRL vs. STIM, F(1,76) = 1.099 P = 0.2979). b) Violin plots reporting accuracy of song segments with (gray) and without (white) stimulation (n = 4) when using 1 s long stimulation (two-way ANOVA testing the effect of optogenetic stimulation, interaction F(3,76) = 7.91 P < 0.001, CTRL vs. STIM, F(1,76) = 0.06636 P = 0.7974). c) Data from birds expressing ChRmine in Uva_HVC neurons and implanted with optic fibers over Uva. Violin plots reporting Entropy of song segments with (200 ms stimulation, gray) and without (white) stimulation (n = 3; two-way ANOVA testing the effect of optogenetic stimulation, interaction F(2,57) = 0.3931 P = 0.6768, CTRL vs. STIM, F(1,57) = 3.500 P = 0.0665). d) Schematic and sample image showing non-selective expression of AAV-ChRmine in Uva and peri-Uva thalamus, followed by thalamic song-contingent stimulation. Sample spectrogram (scale bar 200 ms, horizontal lines mark song elements) displaying motif truncation at syllable boundaries caused by thalamic light stimulation (50 ms 532 nm light pulses, red bars). Sample image illustrating ChRmine expression (red, scale bar 200 ms) in Uva (Uva_HVC neurons labeled by tracer injection in HVC (green)) and peri-Uva thalamus, stimulated by light delivered through the implanted fiber optic (white dashed lines, top of image). e) (left) plot showing amplitude of all the stimulated (red line) motifs, ordered by time of stimulation in the motif.
(right) Plot reporting a subset of stimulated motifs’ latency to optogenetic stimulation (red circle), motif truncation (blue “x”), and restart of a motif (orange), intro notes not followed by a motif (purple), calls (grey) or continuation of the motif after a pause (green) normally not present in the unstimulated motif within 1 s following stimulation. f) Box and scatter plot reporting the probability of motif stop (ochre), pause and continuation of the motif (green) or absence of syntactic perturbation (gray) after the light stimulation (thalamus-stimulated birds, n = 2, filled squares; empty box plots from HVC stimulation in Fig. 1e reported for comparison; two-way ANOVA testing the difference between stimulation outcome probabilities across all experimental groups, interaction F(14,46) = 57.75 P < 0.001, stimulated subpopulation F(7,23) = 1.088 P = 0.4027, Dunnett’s post-hoc pan-HVC vs. thalamus, motif stop P = 0.9883, pause+continuation P = 0.6761, no perturbation P = 0.9728). g) Cumulative probability curves reporting the latency to song truncation in response to the light stimulation (average ±SEM of each bird’s curve; thalamus-stimulated birds (blue), HVC-stimulated birds (black) dataset from Fig. 1 compared against all experimental groups across the manuscript, 10 ms time bins, two-way ANOVA testing the difference between truncation latency distributions, interaction F(255,867) = 2.351 P < 0.001, stimulated subpopulation F(5,17) = 4.142 P = 0.0121, Tukey’s post-hoc pan-HVC vs. thalamus identifies significant (P < 0.05) differences between 60 and 140 ms time bins). (inset) Violin plots reporting the latency of motif truncation computed across all the birds (thalamus-stimulated birds (blue), HVC-stimulated birds (white) dataset from Fig. 1 compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between truncation latencies, Kruskal–Wallis test H(5) = 468.9, post-hoc HVC vs. thalamus P < 0.001).
h) Average ±SEM latency of motif truncation in response to thalamic light stimulation across the motif (bins = 10% motif advancement). i) Violin plots reporting the latency of motif truncation upon light stimulation, per bird (thalamus stimulation (blue), HVC stimulation (white, dataset from Fig. 1); nested one-way ANOVA comparing all datasets across the manuscript, F(5,17) = 4.175 P = 0.0117, Dunnett’s post-hoc pan-HVC vs. thalamus P = 0.0028). j) Plot showing the distribution of all motif truncation times relative to the nearest syllable (or complex syllable segment) end. 0 ms indicates truncation occurring at the natural end of the syllable or complex syllable segment, as described in ref. ¹⁷. Thalamic stimulation results in a significantly higher prevalence of truncation at syllable ends than pan-HVC stimulation (AVG ± SEM, pan-HVC n = 6 birds, thalamus n = 2 birds). k) Plot showing the truncation latency distribution for thalamic or pan-HVC stimulation (AVG ± SEM, pan-HVC n = 6 birds, thalamus n = 2 birds). l) Average ±SEM probability of post-truncation behavior (within 1 s from truncation: no vocalization resumption (black), motif restart with any introductory note or syllable A (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause normally not present in control motifs (green)) following thalamic light stimulation, computed based on the time of stimulation through the progression of the motif (bins = 10% motif advancement). m) Box plots (5-95 percentile, 25,50,75 percentile) reporting the probability (left) and normalized probability (right) of motif restart (thalamus-stimulated birds (orange); empty box plots representing data from birds receiving HVC stimulation reported from Extended Data Fig. 1i (left) and Fig. 1i (right), respectively; one-way ANOVA comparing restart probability across groups, F(5,17) = 6.099 P = 0.0021, Dunnett’s post-hoc pan-HVC vs. thalamus P = 0.0011; one-way ANOVA testing the difference between groups’ normalized restart probabilities, F(5,17) = 9.939 P < 0.0001, Dunnett’s post-hoc HVC vs. thalamus: P < 0.001). The underlying shaded areas represent, for each bird, the probability of producing a motif after any one motif (see Methods; this provides the basis for normalizing the motif restart probability; dashed lines show the maximum, median, and minimum). n) Violin plots reporting the latency of motif restart (orange: thalamus stimulation; white: HVC stimulation dataset from Fig. 1, compared against all experimental groups across the manuscript; nested one-way ANOVA comparing latency to restart across groups, F(5,17) = 6.119 P = 0.0020, Dunn’s post-hoc pan-HVC vs. thalamus P < 0.001). o) Same as (g) but for latency to motif restart: two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P = 0.0036; Tukey’s post-hoc pan-HVC vs. thalamus identifies significant differences (p < 0.05) between the 70 and 640 ms time bins. (inset) One-way ANOVA testing the difference between restart latencies, Kruskal Wallis test H(6) = 244.7, post-hoc HVC vs. thalamus P < 0.001. Brain outline in d adapted with permission from ref. ⁶⁰, Wiley.

Source Data
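For readers reimplementing the cumulative latency curves in panels g and o above, the Python sketch below shows one way to build a per-bird cumulative probability curve in 10 ms bins and then average the curves across birds (mean ± SEM). It is an illustration under stated assumptions, not the authors' analysis code: the function names, the inclusive `<=` bin edge, the 500 ms default range and the example latencies are all hypothetical.

```python
import math


def cumulative_latency_curve(latencies_ms, bin_ms=10, max_ms=500):
    """Per-bird cumulative probability: fraction of truncations whose
    latency is <= each bin edge (bin_ms, 2*bin_ms, ..., max_ms)."""
    if not latencies_ms:
        raise ValueError("need at least one truncation latency")
    edges = list(range(bin_ms, max_ms + 1, bin_ms))
    n = len(latencies_ms)
    probs = [sum(1 for lat in latencies_ms if lat <= edge) / n for edge in edges]
    return edges, probs


def mean_sem_across_birds(curves):
    """Average per-bird curves bin by bin; SEM = sample SD / sqrt(n_birds)."""
    n_birds = len(curves)
    means, sems = [], []
    for values in zip(*curves):
        mean = sum(values) / n_birds
        if n_birds > 1:
            var = sum((v - mean) ** 2 for v in values) / (n_birds - 1)
            sem = math.sqrt(var / n_birds)
        else:
            sem = 0.0  # a single bird has no between-bird spread
        means.append(mean)
        sems.append(sem)
    return means, sems


# Hypothetical truncation latencies (ms) for two birds, for illustration only.
bird_a = [45, 62, 70, 88, 120]
bird_b = [55, 58, 95, 110]
edges, curve_a = cumulative_latency_curve(bird_a, max_ms=200)
_, curve_b = cumulative_latency_curve(bird_b, max_ms=200)
mean_curve, sem_curve = mean_sem_across_birds([curve_a, curve_b])
```

Averaging the per-bird curves, rather than pooling all truncations, keeps birds with many stimulated motifs from dominating the group curve, which matches the "average ±SEM of each bird’s curve" wording in the legend.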

Extended Data Fig. 4 Role of Uva in motif initiation.

a) Map of excitotoxic thalamic lesions. Brain atlas plates adapted from the zebra finch atlas (http://www.zebrafinchatlas.org) show magnified views of the peri-Uva thalamic area from 1 to 2 mm along the mediolateral axis (Uva highlighted in green). The extent of the lesion, as measured from the lack of NeuN staining, is schematized for each bird as a semitransparent red area (dashed black for degeneration) and overlaid across birds by subgroup (peri-thalamus+Uva (left), peri-thalamus excluding Uva (middle), Uva excluding the larger perithalamic areas (right)). DLM – medial part of the dorsolateral nucleus of the anterior thalamus, CP – posterior commissure, Ov – nucleus ovoidalis, OM – occipitomesencephalic tract, ICo – intercollicular nucleus, SN – substantia nigra, Uva – nucleus Uvaeformis, DLP – dorsolateral nucleus of the posterior thalamus, DLA – dorsolateral nucleus of the anterior thalamus, IPo – intermedioposterior nucleus, SpM – nucleus spiriformis medialis, Rt – nucleus rotundus, NIf – nucleus interfacialis. b) Box plots (5-95 percentile, 25,50,75 percentile) reporting the average number of TeNT-expressing Uva_HVC neurons per brain slice (purple, n = 6 birds). c) Syntax raster plots (~100 song bouts/day) from the bird in Fig. 3f showing syntax changes due to TeNT expression in Uva_HVC neurons. d) Schematic, sample image (scale bar 200 µm, white lines mark Uva and optic fiber boundaries), and spectrograms (horizontal lines identify song element boundaries, scale bar 1 s) reporting the effects of eGtACR1-mediated inhibition (blue lines, 1 s) of Uva neurons. e) Optical inhibition of Uva using eGtACR1 increases song bout terminations (n = 5 birds, control (no light, grey), Uva inhibition (blue); two-sided paired t-test P = 0.0238). f) Motif self-similarity during optical inhibition of Uva (two-sided paired t-test P = 0.5902). g,h) Optical excitation of Uva terminals over HVC in the birds from Fig. 3k does not significantly affect song bout terminations (g) (n = 4 birds; control: black, no light; Uva terminal excitation: red, light delivered over eGtACR1-expressing Uva neurons for 1 s; two-sided paired t-test P = 0.2211) or motif self-similarity (h) (two-sided paired t-test P = 0.9429). Brain outlines in a and d adapted with permission from ref. ⁶⁰, Wiley.

Source Data

Extended Data Fig. 5 Optogenetic stimulation of pallial afferents to HVC does not disrupt the song motif.

a) Schematic of in vivo recording of HVC multiunit neuronal activity in anesthetized birds expressing eGtACR1 in NIf; sample trace (top), raster plot (middle, 10 trials) and normalized peri-stimulus time histogram (bottom) reporting the change in multiunit HVC firing activity in response to light stimulation of eGtACR1-expressing NIf afferents (1 s, from 0.5 s to 1.5 s, red bar; two-way ANOVA comparing the 0–5 s curve with the corresponding 5 s baseline without stimulation, interaction F(499,10978) = 9.255 P < 0.001, stimulation F(1,22) = 12.07 P = 0.0022, Sidak’s post-hoc identifies a difference at 0.92-1.79 s). The inset shows a magnified PSTH and a scatter plot of the average (per hemisphere) response during the first 200 ms of light stimulation (red dashed rectangle) compared with the last 200 ms of baseline (black dashed rectangle; Wilcoxon test P = 0.0017; n= hemispheres, birds). b) (top) Schematic of song-contingent light stimulation of NIf axonal terminals in HVC (optic fiber implanted over HVC) and sample spectrograms of unstimulated (top) and stimulated song (bottom, red bars; 200 ms, ≈10 mW bilateral 473 nm LED; spectrogram scale 0-11 kHz, scale bar 200 ms; horizontal lines identify the boundaries of bouts (black), introductory notes (light gray), motifs (dark gray) and linker syllables (brown)). c) Violin plots reporting the accuracy of the stimulated song segment (gray) or the corresponding unstimulated control segment (white) for each bird (n = 4; two-way ANOVA, interaction F(3,76) = 1.795 P = 0.1553, CTRL vs. STIM, F(1,76) = 0.7208 P = 0.3985). d) Same as (c) but for Entropy (n = 4; two-way ANOVA, interaction F(3,76) = 1.882 P = 0.1397, CTRL vs. STIM, F(1,76) = 0.1807 P = 0.6720). e) Same as (c) but for goodness of pitch (n = 4; two-way ANOVA, interaction F(3,76) = 4.553 P = 0.0055, CTRL vs. STIM, F(1,76) = 2.301 P = 0.1334). 
f) Same as (c) but for the accuracy of the entire motif in birds receiving 1 s light stimulation (n = 4; two-way ANOVA, interaction F(3,76) = 2.417 P = 0.0728, CTRL vs. STIM, F(1,76) = 3.072 P = 0.0837). g-l) Same as (a-f), but for eGtACR1 expression in mMAN. (g) Two-way ANOVA comparing the 0–5 s curve with the corresponding 5 s baseline without stimulation, interaction F(499,14970) = 8.937 P < 0.001, stimulation F(1,30) = 16.68 P < 0.001, Tukey’s post-hoc identifies a difference at 0.85–1.76 s; inset: Wilcoxon test P = 0.00071; n= hemispheres, birds. (i) n = 4 birds, two-way ANOVA, interaction F(3,76) = 4.483 P = 0.006, CTRL vs. STIM, F(1,76) = 0.6008 P = 0.4407. (j) Two-way ANOVA, interaction F(3,76) = 6.168 P < 0.001, CTRL vs. STIM, F(1,76) = 0.1036 P = 0.3119. (k) Two-way ANOVA, interaction F(3,76) = 2.434 P = 0.0714, CTRL vs. STIM, F(1,76) = 2.828 P = 0.0967. (l) Two-way ANOVA, interaction F(3,76) = 1.795 P = 0.1553, CTRL vs. STIM, F(1,76) = 0.2599 P = 0.6117. m-r) Same as (a-f), but for eGtACR1 expression in Av. (m) Two-way ANOVA comparing the 0–5 s curve with the corresponding 5 s baseline without stimulation, interaction F(499,9980) = 9.999 P < 0.001, stimulation F(1,20) = 39.56 P < 0.001, Tukey’s post-hoc identifies differences at 0.77 and 0.84-1.9 s; inset: Wilcoxon test P < 0.001; n= hemispheres, birds. (o) n = 4 birds, two-way ANOVA, interaction F(3,76) = 3.524 P = 0.0189, CTRL vs. STIM, F(1,76) = 0.1304 P = 0.7190. (p) Two-way ANOVA, interaction F(3,76) = 0.6458 P = 0.588, CTRL vs. STIM, F(1,76) = 0.7229 P = 0.3979. (q) Two-way ANOVA, interaction F(3,76) = 5.05 P = 0.003, CTRL vs. STIM, F(1,76) = 0.1634 P = 0.2050. (r) Two-way ANOVA, interaction F(3,76) = 3.936 P = 0.0115, CTRL vs. STIM, F(1,76) = 1.143 P = 0.2883. Brain outlines in a, g and m adapted with permission from ref. ⁶⁰, Wiley.

Source Data
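The normalized PSTHs in panels a, g and m above, and the inset comparison of the first 200 ms of stimulation with the last 200 ms of baseline, can be sketched as below. The legend does not state how the PSTH was normalized; dividing each bin by the mean baseline rate is an assumption made here, and the function names are hypothetical.

```python
def psth(spike_times_ms, t_start_ms, t_stop_ms, bin_ms=10):
    """Trial-averaged firing rate (spikes/s) in consecutive bins of bin_ms.

    spike_times_ms: one list of spike times (ms) per trial, aligned so that
    light onset is at 0 ms.
    """
    n_bins = int((t_stop_ms - t_start_ms) // bin_ms)
    counts = [0] * n_bins
    for trial in spike_times_ms:
        for t in trial:
            if t_start_ms <= t < t_stop_ms:
                idx = int((t - t_start_ms) // bin_ms)
                if idx < n_bins:  # guard against a partial last bin
                    counts[idx] += 1
    bin_s = bin_ms / 1000.0
    return [c / (len(spike_times_ms) * bin_s) for c in counts]


def normalize_to_baseline(rates, baseline_rates):
    """Divide every bin by the mean baseline rate, so 1.0 means no change."""
    base = sum(baseline_rates) / len(baseline_rates)
    if base == 0:
        raise ValueError("baseline rate is zero; cannot normalize")
    return [r / base for r in rates]


def window_mean(rates, t_start_ms, bin_ms, w_start_ms, w_stop_ms):
    """Mean of the bins that fall inside [w_start_ms, w_stop_ms)."""
    i0 = int((w_start_ms - t_start_ms) // bin_ms)
    i1 = int((w_stop_ms - t_start_ms) // bin_ms)
    window = rates[i0:i1]
    return sum(window) / len(window)
```

As a usage example under the same assumptions, `rates = psth(trials, -500, 1500)` would cover 500 ms before and 1500 ms after light onset; `normalize_to_baseline(rates, rates[:50])` uses the pre-onset bins as baseline, and `window_mean(norm, -500, 10, 0, 200)` versus `window_mean(norm, -500, 10, -200, 0)` gives the per-hemisphere values that a paired test such as the Wilcoxon test would compare.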

Extended Data Fig. 6 Concurrent lesions of pallial afferents to HVC and RA do not disrupt the song motif.

a) Schematic and sample images of retrogradely labeled HVC afferent neurons in NIf, Uva, mMAN and Av (green) and RA afferent neurons in lMAN and HVC (red) (coronal slices for lMAN and mMAN, sagittal slices for HVC, NIf, Av and Uva; scale bar 500 µm, insets 100 µm). b) Box plots (5-95 percentile, 25,50,75 percentile) reporting the number of retrogradely labeled neurons in each brain area per hemisphere (green projecting to HVC, red to RA, n = 12 hemispheres, 7 birds). c) Box plot (top) and correlation (bottom) reporting the ratio of the number of retrogradely labeled neurons in mMAN and Uva (average = 1.46, R² = 0.6124, Spearman r = 0.7972 P = 0.0029). d) As (c) for the number of neurons in NIf and Uva (average = 6.40, R² = 0.4917, Spearman r = 0.7902 P = 0.0033). e) As (c) for the number of neurons in Av and Uva (average = 0.09, R² = 0.1726, Spearman r = 0.7112 P = 0.0093). f) As (c) for the number of neurons in lMAN and HVC (average = 1.23, R² = 0.3486, Spearman r = 0.6713 P = 0.0202). g) Schematic and sample image of the excitotoxic lesion of lMAN, mMAN, NIf and Av, combined with retrograde labeling from tracer injections in HVC (green) and RA (red). Tracer labeling reveals surviving afferent neurons and allows post-hoc unbiased estimation of the lesion extent (insets display magnified detail of retrogradely labeled HVC_RA and Uva_HVC cells; scale bar 500 µm, insets 100 µm). h) Box and scatter plot reporting the number of surviving retrogradely labeled neurons in each brain area per hemisphere (green projecting to HVC, red to RA; retrogradely labeled neurons in Uva and HVC serve as reference areas for unbiased lesion quantification; n = 12 hemispheres, 6 birds). Grayed-out box plot outlines from panel b report control data for ease of comparison of the lesion extent. i) Quantification of mMAN, NIf, Av and lMAN lesions, per bird (n = 6 birds). j) Time course of song self-similarity (avg ±SEM; one-way ANOVA, mixed-effects analysis, F(1.129, 5.364) = 1.599 P = 0.2640, n = 6 birds). 
k) Cumulative probability curves reporting the number of motifs/bout sung by the birds before (black) and after (blue) the bilateral excitotoxic lesion of NIf, Av, mMAN and lMAN (n = 6 birds, two-way ANOVA, testing the difference between the distributions of the number of motifs in each bout before and after the lesion, interaction F(8,45) = 0.5989 P = 0.7736, pre- vs. post-lesion F(1,45) = 0.6098 P = 0.4390). l) Scatter plot of motif start failure rate before (black circles) and after (blue triangles) the bilateral excitotoxic lesion of NIf, Av, mMAN and lMAN (two-sided Wilcoxon test P = 0.4375; n = 6 birds). Brain outlines in a and g adapted with permission from ref. ⁶⁰, Wiley.

Source Data
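Panels c-f above give control ratios of labeled neurons in each lesioned area to a spared reference area (Uva for mMAN, NIf and Av; HVC for lMAN), and panel h states that Uva and HVC labeling serve as references for lesion quantification. One plausible way to turn those ratios into a per-hemisphere lesion estimate is sketched below. It assumes that the reported averages are lesioned-area/reference-area count ratios and that lesion extent equals one minus the observed ratio divided by the control ratio; the formula, the `lesion_extent` name and the example counts are illustrative assumptions, not the procedure described in Methods.

```python
# Control ratios (lesioned-area count / reference-area count) as reported in
# panels c-f; reading them as numerator/denominator is an assumption.
CONTROL_RATIOS = {
    "mMAN": ("Uva", 1.46),
    "NIf": ("Uva", 6.40),
    "Av": ("Uva", 0.09),
    "lMAN": ("HVC", 1.23),
}


def lesion_extent(counts, control_ratios=CONTROL_RATIOS):
    """Estimate the lesioned fraction of each area in one hemisphere.

    counts maps area name -> number of surviving retrogradely labeled
    neurons. Expected count = control ratio x reference-area count;
    extent = 1 - observed / expected, clipped to [0, 1].
    """
    extent = {}
    for area, (reference, ratio) in control_ratios.items():
        expected = ratio * counts[reference]
        if expected <= 0:
            raise ValueError(f"no labeled neurons in reference area {reference}")
        spared_fraction = counts[area] / expected
        extent[area] = max(0.0, min(1.0, 1.0 - spared_fraction))
    return extent


# Hypothetical counts for one lesioned hemisphere, for illustration only.
example = lesion_extent({"Uva": 100, "HVC": 200, "mMAN": 73, "NIf": 0, "Av": 9, "lMAN": 246})
```

Normalizing to a spared reference area, rather than to raw counts, is what makes the estimate robust to hemisphere-to-hemisphere differences in tracer uptake, which is the rationale given in panel g.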

Extended Data Fig. 7 Optogenetic excitation of Area X.

a) Experiment design, ChRmine expression in Area X and outline of the optic fiber track (scale bar 500 µm), and song spectrograms showing song-contingent light stimulation of Area X neurons (top: control, scale bar 200 ms; bottom: stimulated, 200 ms 532 nm light pulses, red bars; horizontal lines identify the boundaries of bouts (black), introductory notes (light gray), motifs (dark gray) and linker syllables (brown)). b) Probability of motif truncation (okra), pause and continuation of the motif (green), or absence of syntactic perturbation (gray) after light stimulation of Area X (X-stimulated birds, n = 5, filled circles; empty box plots from HVC stimulation in Fig. 1e reported for comparison; two-way ANOVA testing the difference between stimulation outcome probabilities across all experimental groups, interaction F(14,46) = 57.75 P < 0.001, stimulated subpopulation F(7,23) = 1.088 P = 0.4027, Dunnett’s post-hoc pan-HVC vs. Area X, motif stop P < 0.001, pause+continuation P = 0.2842, no perturbation P < 0.001). c-f) Violin plots reporting the accuracy, entropy, amplitude and goodness of pitch of song segments with (gray) and without (white) stimulation for each bird (accuracy: n = 5, two-way ANOVA testing the effect of optogenetic stimulation, interaction F(4,95) = 0.9003 P = 0.4671, CTRL vs. STIM, F(1,95) = 1.108 P = 0.2953; entropy: n = 5, two-way ANOVA, interaction F(4,95) = 6.521 P < 0.001, CTRL vs. STIM, F(1,95) = 16.50 P < 0.001; amplitude: two-way ANOVA testing the effect of optogenetic stimulation, interaction F(4,95) = 8.025 P < 0.001, CTRL vs. STIM, F(1,114) = 0.5747 P = 0.4503; goodness of pitch: two-way ANOVA, interaction F(4,95) = 9.002 P < 0.001, CTRL vs. STIM, F(1,114) = 6.606 P = 0.0117). Brain outline in a adapted with permission from ref. ⁶⁰, Wiley.

Source Data

Extended Data Fig. 8 Optogenetic excitation of RA.

a) Schematic of song-contingent light stimulation of RA neurons expressing ChRmine, ChRmine expression in RA and outline of the optic fiber track (scale bar 500 µm), and spectrograms of normal (top, scale bar 200 ms) and stimulated song (bottom, 10 ms 532 nm light pulses, red bars; horizontal lines identify the boundaries of bouts (black), introductory notes (light gray), motifs (dark gray) and linker syllables (brown)) displaying light-evoked truncation of syllables followed by motif restart. b) (left) Stacked song amplitude plots showing all stimulated motifs, ordered by the timing of stimulation from motif onset (red line). (right) Plot reporting, for a subset of stimulated motifs, the latency to optogenetic stimulation (red circle), motif truncation (blue “x”), motif restart (orange), intro notes not followed by a motif (purple), calls (grey), or continuation of the motif after a pause normally not present in the unstimulated motif (green), within 1 s following stimulation. c) Probability of motif stop (okra), pause and continuation of the motif (green), or absence of syntactic perturbation (gray) after light stimulation (RA-stimulated birds, n = 5, filled circles; empty box plots from HVC stimulation in Fig. 1e reported for comparison; two-way ANOVA testing the difference between stimulation outcome probabilities across all experimental groups, interaction F(14,46) = 57.75 P < 0.001, stimulated subpopulation F(7,23) = 1.088 P = 0.4027, Dunnett’s post-hoc pan-HVC vs. RA, motif stop P = 0.8675, pause+continuation P = 0.4594, no perturbation P = 0.9933). d) Average latency ±SEM of motif truncation in response to RA light stimulation (bins = 10% motif advancement). e) Latency to motif truncation following light stimulation of RA or HVC (blue: RA stimulation, white: HVC dataset from Fig. 1; nested one-way ANOVA comparing latency to restart across groups, F(6,19) = 9.678 P < 0.001, Dunnett’s post-hoc pan-HVC vs. RA P = 0.9986). 
f) Cumulative probability curves reporting the latency to song truncation in response to light stimulation (average ±SEM of each bird’s curve, blue: RA-stimulated birds, black: HVC-stimulated birds dataset from Fig. 1 compared against all experimental groups across the manuscript, 10 ms time bins, two-way ANOVA testing the difference between truncation latency distributions, interaction F(255,867) = 2.351 P < 0.001, stimulated subpopulation F(5,17) = 4.142 P = 0.0121, Tukey’s post-hoc pan-HVC vs. RA P > 0.05). (inset) Latency of motif truncation computed across all birds (blue: RA-stimulated birds, white: HVC-stimulated birds dataset from Fig. 1 compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between truncation latencies, Kruskal Wallis test H(5) = 468.9, post-hoc HVC vs. RA P > 0.9999). g) Average ±SEM probability of post-truncation vocalization resumption by category (motif restart (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause (green)) upon stimulation delivered at different latencies throughout the progression of the motif (bins = 10% motif advancement). h) Average ±SEM probability of post-truncation behavior (no vocalization resumption in the 1 s post-truncation (black), motif restart (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause (green)) in response to RA light stimulation delivered at different latencies throughout the progression of the motif (bins = 10% motif advancement). i) (left) Box plots (5-95 percentile, 25,50,75 percentile) reporting the probability of motif restart following optogenetic stimulation of RA or HVC (orange dots: n = 5 RA-stimulated birds; empty box plot representing data from birds receiving HVC stimulation reported from Extended Data Fig. 1i. 
The underlying shaded areas represent, for each bird, the probability of producing a motif after any one motif (see Methods; this provides the basis for normalizing the motif restart probability; dashed lines show the maximum, median, and minimum); one-way ANOVA testing the difference between groups’ restart probabilities, F(5,17) = 6.099 P = 0.0021, Dunnett’s post-hoc pan-HVC vs. RA P = 0.04996). (right) Normalized probability of post-truncation motif restart for each bird (RA-stimulated birds: filled circles; empty box plots from HVC-stimulated birds in Fig. 1i reported for comparison; one-way ANOVA testing the difference between groups’ normalized restart probabilities, F(5,17) = 9.939 P < 0.0001, Dunnett’s post-hoc HVC vs. RA: P = 0.0203). j) Violin plots reporting the latency to motif restart (orange: RA-stimulated birds, white: HVC stimulation dataset from Fig. 1; nested one-way ANOVA comparing latency to restart across groups, F(5,17) = 6.119 P = 0.0020, Dunn’s post-hoc pan-HVC vs. RA P = 0.2186). k) Cumulative probability curves reporting the latency to post-truncation motif restart (average ±SEM of each bird’s curve, orange: RA, black: HVC dataset from Fig. 1 compared against all experimental groups across the manuscript, 10 ms time bins, two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P = 0.0009, Tukey’s post-hoc pan-HVC vs. RA identifies significant differences (p < 0.05) between the 60 and 280 ms time bins). (inset) Latency to motif restart computed across all birds (orange: RA-stimulated birds, white: HVC birds dataset from Fig. 1 compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between restart latencies, Kruskal Wallis test H(6) = 244.7, post-hoc HVC vs. RA P < 0.001). Brain outline in a adapted with permission from ref. ⁶⁰, Wiley.

Source Data
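The legend above normalizes motif-restart probability against each bird's baseline probability of producing a motif after any one motif (the shaded areas). A minimal sketch follows, assuming the normalization is a simple ratio and that the baseline is estimated from the number of motifs per bout in unstimulated song; both assumptions and all names are hypothetical, and the Methods describe the actual procedure.

```python
def restart_probability(outcomes):
    """Fraction of truncated motifs whose outcome label is "restart"."""
    if not outcomes:
        raise ValueError("no stimulated motifs")
    return sum(1 for o in outcomes if o == "restart") / len(outcomes)


def motif_follow_probability(motifs_per_bout):
    """Baseline P(another motif follows a motif) from unstimulated bouts.

    A bout with n motifs contributes n - 1 motifs that were followed by
    another motif and one final motif that was not.
    """
    bouts = [n for n in motifs_per_bout if n > 0]
    total = sum(bouts)
    if total == 0:
        raise ValueError("no motifs in baseline bouts")
    return sum(n - 1 for n in bouts) / total


def normalized_restart_probability(outcomes, motifs_per_bout):
    """Restart probability relative to the bird's baseline (assumed ratio)."""
    baseline = motif_follow_probability(motifs_per_bout)
    if baseline == 0:
        raise ValueError("baseline probability is zero")
    return restart_probability(outcomes) / baseline
```

Under this reading, a normalized value near 1 would mean the bird restarts after a truncation about as often as it naturally sings another motif after finishing one.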

Extended Data Fig. 9 Selective optogenetic stimulation of HVC_RA neurons restarts the song motif.

a) Schematic of the experiment showing song-contingent in vivo light stimulation of HVC_RA neurons expressing the excitatory opsin ChRmine, and spectrograms of normal (top, scale bar 200 ms) and stimulated song (bottom, 50 ms 532 nm light pulses, red bars) displaying light-evoked truncation of syllables followed by rapid motif restarting (horizontal lines identify the boundaries of bouts (black), introductory notes (light gray) and motifs (dark gray)). b) (left) Stacked song amplitude plot showing all stimulated motifs, ordered by the timing of stimulation from motif onset (red line). (right) Plot reporting, for a subset of stimulated motifs, the latency to optogenetic stimulation (red circle), motif truncation (blue “x”), motif restart (orange), intro notes not followed by a motif (purple), calls (grey), or continuation of the motif after a pause normally not present in the unstimulated motif (green), within 1 s following stimulation. c) Probability of motif stop (okra), pause and continuation of the motif (green), or absence of syntactic perturbation (gray) after light stimulation (HVC_RA-stimulated birds, n = 4, filled circles; empty box plots from HVC stimulation in Fig. 1e reported for comparison; two-way ANOVA testing the difference between stimulation outcome probabilities across all experimental groups, interaction F(14,46) = 57.75 P < 0.001, stimulated subpopulation F(7,23) = 1.088 P = 0.4027, Dunnett’s post-hoc pan-HVC vs. HVC_RA, motif stop P > 0.9999, pause+continuation P = 0.6537, no perturbation P = 0.5368). d) Cumulative probability curves reporting the latency to song truncation in response to light stimulation (average ±SEM of each bird’s curve; blue: HVC_RA-stimulated birds, black: HVC-stimulated birds dataset from Fig. 1, compared against all experimental groups across the manuscript; 10 ms time bins; two-way ANOVA testing the difference between truncation latency distributions, interaction F(255,867) = 2.351 P < 0.001, stimulated subpopulation F(5,17) = 4.142 P = 0.0121, Tukey’s post-hoc pan-HVC vs. HVC_RA identifies significant differences (p < 0.05) between the 60 and 90 ms time bins). (inset) Latency of motif truncation computed across all birds (blue: HVC_RA-stimulated birds, white: HVC-stimulated birds dataset from Fig. 1, compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between truncation latencies, Kruskal Wallis test H(5) = 468.9, post-hoc HVC vs. HVC_RA P < 0.001). e) Average ±SEM probability of post-truncation vocalization resumption (motif restart (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause (green)) in response to HVC_RA light stimulation delivered at different latencies throughout the progression of the motif (bins = 10% motif advancement). f) Box plots (5-95 percentile, 25,50,75 percentile) reporting the normalized probability of motif restart (orange dots: n = 4 HVC_RA-stimulated birds; empty box plot representing data from birds receiving HVC stimulation reported from Fig. 1i; one-way ANOVA testing the difference between groups’ normalized restart probabilities, F(5,17) = 9.939 P < 0.0001, Dunnett’s post-hoc HVC vs. HVC_RA: P = 0.8071). g) Cumulative probability curves reporting the latency to post-truncation motif restart (average ±SEM of each bird’s curve; orange: HVC_RA, black: HVC dataset from Fig. 1, compared against all experimental groups across the manuscript; 10 ms time bins; two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P = 0.0009, Tukey’s post-hoc pan-HVC vs. HVC_RA identifies significant differences (p < 0.05) between the 70 and 130 ms time bins). (inset) Latency to motif restart computed across all birds (orange: HVC_RA-stimulated birds, white: HVC birds dataset from Fig. 1, compared against all experimental groups across the manuscript; one-way ANOVA testing the difference between restart latencies, Kruskal Wallis test H(6) = 244.7, post-hoc HVC vs. HVC_RA P = 0.1157). Brain outline in a adapted with permission from ref. ⁶⁰, Wiley.

Source Data
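Each stimulated motif in the legend above is assigned to one post-truncation category within 1 s (motif restart, intro notes not followed by a motif, calls, continuation after a pause, or, in other panels, no vocalization). The sketch below encodes one possible set of decision rules for that assignment. The rule order, the label strings and the `resume_syllable` parameter are assumptions for illustration, not the authors' annotation scheme.

```python
def classify_post_truncation(events, truncation_ms, resume_syllable,
                             first_syllable="A", window_ms=1000):
    """Assign one outcome category to a truncated motif.

    events: list of (onset_ms, label) tuples; label is "intro", "call",
    or a syllable letter such as "A". resume_syllable: the syllable that
    would have followed the truncated one in the normal motif.
    """
    # Keep only vocal events inside the post-truncation window.
    in_window = sorted(
        (onset, label) for onset, label in events
        if truncation_ms < onset <= truncation_ms + window_ms
    )
    if not in_window:
        return "no_resumption"
    # Introductory notes can precede either a restart or nothing at all,
    # so the decision is made on the first non-intro element.
    non_intro = [label for _, label in in_window if label != "intro"]
    if not non_intro:
        return "intro_only"
    first = non_intro[0]
    if first == first_syllable:
        return "restart"
    if first == "call":
        return "call"
    if first == resume_syllable:
        return "pause_continuation"
    return "other"
```

Checking for a restart before checking for continuation means that a truncation during the last syllable, followed by syllable A, is counted as a restart; a real annotation pipeline might need a different tie-break.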

Extended Data Fig. 10 Optogenetic manipulation of HVC_RA neurons.

a) Schematic of the viral strategy (AAV_DIO_ChRmine in HVC, high-titer retrograde AAV_Cre in RA) and sample images of retrogradely labeled HVC_RA neurons (magenta, arrowheads), but not HVC_X neurons (cyan), displaying conditional expression of ChRmine (yellow) (scale bar 200 µm, insets 3x magnification). b) HVC multiunit neuronal activity recording in anesthetized birds expressing ChRmine in HVC_RA neurons. Sample trace (top, scale bar 1 V, 1 s), raster plot (middle, 10 trials) and normalized peri-stimulus time histogram (bottom) reporting the change in multiunit HVC firing activity in response to light stimulation (100 ms, red bar; two-way ANOVA comparing the curve over the 300 ms before and 300 ms after the 100 ms stimulation with the corresponding 700 ms baseline without stimulation, interaction F(69,483) = 3.445 P < 0.001, stimulation F(1,7) = 7.902 P = 0.0261, Sidak’s post-hoc P < 0.05 between 50 ms after light onset and 50 ms after light offset). (inset) PSTH and scatter plot illustrating the average (per hemisphere) response to the first 100 ms of light stimulation (red dashed rectangle) compared with the last 100 ms of baseline (black dashed rectangle; Wilcoxon test P = 0.0156; n = 8 hemispheres, 4 birds). c) Average latency ±SEM to motif truncation in response to HVC_RA light stimulation (bins = 10% motif advancement). d) Latency to motif truncation following light stimulation (blue: HVC_RA stimulation, white: HVC dataset from Fig. 1; nested one-way ANOVA comparing latency to truncation across groups, F(5,17) = 4.175 P = 0.0117, Dunnett’s post-hoc pan-HVC vs. HVC_RA P = 0.2873). 
e) Average ±SEM probability of post-truncation behavior (within 1 s from truncation: no vocalization resumption (black), motif restart with any introductory note or syllable A (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause normally not present in control motifs (green)) in response to HVC_RA light stimulation, computed based on the time of stimulation through the progression of the motif (bins = 10% motif advancement). f) Box plots (5-95 percentile, 25,50,75 percentile) reporting the probability of motif restart (orange dots: n = 4 HVC_RA-stimulated birds; empty box plot representing data from birds receiving HVC stimulation reported from Extended Data Fig. 1i. The underlying shaded areas represent, for each bird, the probability of producing a motif after any one motif (see Methods; this provides the basis for normalizing the motif restart probability; dashed lines show the maximum, median, and minimum); one-way ANOVA testing the difference between groups’ restart probabilities, F(5,17) = 6.099 P = 0.0021, Dunnett’s post-hoc pan-HVC vs. HVC_RA P = 0.9963). g) Latency of motif restart (orange: n = 4 HVC_RA-stimulated birds, white: HVC stimulation dataset from Fig. 1, compared against all experimental groups across the manuscript; nested one-way ANOVA comparing latency to restart across groups, F(5,17) = 6.119 P = 0.0020, Dunn’s post-hoc pan-HVC vs. HVC_RA P = 0.8071). h-i) Cumulative probability curves and violin plots (data reported from Extended Data Figs. 8 and 9) illustrating the latency to song truncation (h) and the latency to motif restart (i) in response to light stimulation (average ±SEM of each bird’s curve; magenta: n = 4 HVC_RA-stimulated birds, yellow: n = 5 RA-stimulated birds, black: n = 6 HVC-stimulated birds dataset from Fig. 1, compared against all experimental groups across the manuscript; 10 ms time bins). 
For truncation (h), two-way ANOVA testing the difference between truncation latency distributions, interaction F(255,867) = 2.351 P < 0.001, stimulated subpopulation F(5,17) = 4.142 P = 0.0121, Tukey’s post-hoc pan-HVC vs. HVC_RA identifies significant difference (p < 0.05) at the 60-90 ms time bins; violin plots: one-way ANOVA, Kruskal Wallis test H(5) = 468.9, Dunn’s multiple comparisons test: HVC vs. HVC_RA P < 0.001). For restart latency (i) two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P < 0.001, Tukey’s post-hoc pan-HVC vs. HVC_RA identifies significant difference (p < 0.05) at the 170-200 ms time bins; violin plots: one-way ANOVA testing the difference between restart latencies, Kruskal Wallis test H(6) = 244.7, Dunn’s multiple comparisons test: HVC vs. HVC_RA P = 0.1157, HVC_RA vs. RA P < 0.001). Brain outline in a adapted with permission from ref. ⁶⁰, Wiley.

Source Data
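Several panels (for example, panel c above) report truncation latency averaged in bins of 10% motif advancement. A minimal sketch of that binning for one bird is shown below, assuming stimulation time and motif duration are both measured in integer milliseconds from motif onset. The function name is hypothetical, and averaging the per-bird bin means across birds (as the ±SEM in the legends implies) would be a further step not shown here.

```python
import math


def latency_by_motif_progress(stim_ms, motif_dur_ms, latencies_ms, n_bins=10):
    """Mean, SEM and count of latency per bin of motif progress.

    Inputs are parallel lists of integers in ms; progress = stim / duration,
    split into n_bins equal bins (10 bins = 10% motif advancement each).
    """
    bins = [[] for _ in range(n_bins)]
    for stim, dur, lat in zip(stim_ms, motif_dur_ms, latencies_ms):
        if dur <= 0 or not 0 <= stim <= dur:
            continue  # stimulation fell outside the motif
        # Integer arithmetic avoids floating-point edge effects at bin borders.
        idx = min(stim * n_bins // dur, n_bins - 1)
        bins[idx].append(lat)
    summary = []
    for values in bins:
        n = len(values)
        if n == 0:
            summary.append((math.nan, math.nan, 0))
            continue
        mean = sum(values) / n
        if n > 1:
            var = sum((v - mean) ** 2 for v in values) / (n - 1)
            sem = math.sqrt(var / n)
        else:
            sem = math.nan  # SEM is undefined for a single value
        summary.append((mean, sem, n))
    return summary
```

Binning by relative rather than absolute time lets motifs of slightly different durations be pooled, which is presumably why the legends express bins as a percentage of motif advancement.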

Extended Data Fig. 11 Optogenetic manipulation of HVC_X neurons.

a) Schematic of the viral strategy (AAV_DIO_ChRmine in HVC, high-titer retrograde AAV_Cre in Area X) and sample images of retrogradely labeled HVC_X neurons (cyan, arrowheads), but not HVC_RA neurons (magenta), displaying conditional expression of ChRmine (yellow) (scale bar 200 µm, insets 3x magnification). b) HVC multiunit neuronal activity recording in anesthetized birds expressing ChRmine in HVC_X neurons. Sample trace (top, scale bar 1 V, 1 s), raster plot (middle, 10 trials) and normalized peri-stimulus time histogram (bottom) reporting the change in multiunit HVC firing activity in response to light stimulation (100 ms, red bar; two-way ANOVA comparing the curve over the 300 ms before and 300 ms after the 100 ms stimulation with the corresponding 700 ms baseline without stimulation, interaction F(69,483) = 10.35 P < 0.001, stimulation F(1,7) = 23.44 P = 0.0019, Sidak’s post-hoc P < 0.05 between 30 ms after light onset and 90 ms after light offset). (inset) Paired t-test P < 0.001; n = 8 hemispheres, 4 birds. c) Average latency ±SEM to motif truncation in response to HVC_X light stimulation (bins = 10% motif advancement). d) Latency to motif truncation following light stimulation (blue: HVC_X stimulation, white: HVC dataset from Fig. 1; nested one-way ANOVA testing the difference between truncation latencies, F(5,17) = 4.175 P = 0.0117, Dunnett’s post-hoc pan-HVC vs. HVC_X P = 0.9599). e) Average ±SEM probability of post-truncation behavior (within 1 s from truncation: no vocalization resumption (black), motif restart with any introductory note or syllable A (orange), intro notes not followed by a motif (purple), calls (grey), resumption of the motif after a pause normally not present in control motifs (green)) in response to HVC_X light stimulation, computed based on the time of stimulation through the progression of the motif (bins = 10% motif advancement). 
f) Probability of motif restart (orange dots: HVC_X-stimulated birds; empty box plot: birds receiving HVC stimulation, reported from Extended Data Fig. 1i. The underlying shaded areas represent, for each bird, the probability of producing a motif after any one motif (see Methods; this provides the basis for normalizing the motif restart probability; dashed lines show the maximum, median and minimum); one-way ANOVA testing the difference between restart probabilities, F(5,17) = 6.099 P = 0.0021, Dunnett's post-hoc pan-HVC vs. HVC_X P = 0.8772). g) Latency of motif restart (orange: n = 4 HVC_X-stimulated birds; white: HVC stimulation dataset from Fig. 1, compared against all experimental groups across the manuscript; nested one-way ANOVA comparing latency to restart across groups, F(5,17) = 6.119 P = 0.0020, Dunn's post-hoc pan-HVC vs. HVC_X P > 0.9999). h,i) Cumulative probability curves and violin plots illustrating the latency to song truncation (h) and the latency to motif restart (i) in response to light stimulation (magenta: HVC_RA-stimulated birds; cyan: HVC_X-stimulated birds; data reported from Extended Data Fig. 9 and Fig. 4, respectively; data compared across all groups throughout the manuscript; 10 ms time bins). (h) Two-way ANOVA testing the difference between truncation latency distributions, interaction F(255,867) = 2.351 P < 0.001, stimulated subpopulation F(5,17) = 4.142 P = 0.0121, Tukey's post-hoc HVC_RA vs. HVC_X identifies a significant difference (P < 0.05) at the 60–80 ms time bins; violin plots: nonparametric one-way ANOVA (Kruskal–Wallis test) testing the difference between truncation latencies, H(5) = 468.9, Dunn's multiple comparisons test HVC_RA vs. HVC_X P < 0.001. (i) Two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P < 0.001, Tukey's post-hoc HVC_RA vs. HVC_X identifies a significant difference (P < 0.05) at the 170–200 ms time bins; violin plots: nonparametric one-way ANOVA (Kruskal–Wallis test) testing the difference between restart latencies, H(6) = 244.7, Dunn's multiple comparisons test HVC_RA vs. HVC_X P < 0.001. Brain outline in a adapted with permission from ref. ⁶⁰, Wiley.
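Panels h and i summarize latencies as cumulative probability curves in 10 ms time bins. The sketch below shows one way such curves could be built. It assumes each bird contributes a list of latencies in milliseconds, that each bird's curve is the fraction of its events at or below each bin edge, and that the group curve is the across-bird mean ±SEM, as the "average ±SEM of each bird's curve" wording in Extended Data Fig. 12 suggests. The function names, the 500 ms cut-off and the example latencies are hypothetical; this is not the authors' analysis code.

```python
import numpy as np


def cumulative_curve(latencies_ms, bin_ms=10.0, max_ms=500.0):
    """Return bin right edges and the fraction of events at or below each edge (one bird)."""
    lat = np.sort(np.asarray(latencies_ms, dtype=float))
    # Right edges of the time bins: bin_ms, 2*bin_ms, ..., max_ms (hypothetical cut-off).
    edges = np.arange(bin_ms, max_ms + bin_ms / 2, bin_ms)
    # Count how many sorted latencies are <= each edge.
    counts = np.searchsorted(lat, edges, side="right")
    return edges, counts / lat.size


def group_curve(per_bird_latencies, bin_ms=10.0, max_ms=500.0):
    """Average per-bird cumulative curves; return edges, mean and SEM across birds."""
    curves = []
    edges = None
    for latencies in per_bird_latencies:
        edges, cdf = cumulative_curve(latencies, bin_ms, max_ms)
        curves.append(cdf)
    curves = np.vstack(curves)
    mean = curves.mean(axis=0)
    # SEM across birds: sample standard deviation (ddof=1) over sqrt(number of birds).
    sem = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
    return edges, mean, sem


# Hypothetical truncation latencies (ms) for two birds.
bird_a = [35, 48, 52, 61, 90]
bird_b = [40, 45, 70, 75]
edges, mean, sem = group_curve([bird_a, bird_b], max_ms=100.0)
print(edges[4], mean[4])
```

Averaging per-bird curves weights every bird equally, so a bird with many stimulation trials does not dominate the group curve; whether this matches the weighting used in the paper is an assumption.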


Extended Data Fig. 12 Antidromic optogenetic excitation of HVC_X neurons from Area X.

a) (Top) Schematic and sample image of eGtACR1 expression in HVC and of the optical fiber implant over Area X used for antidromic excitation of HVC_X neurons (scale bar 1 mm); (bottom) spectrograms (0–11 kHz; scale bar 200 ms; horizontal lines identify song elements) of normal song (top) and stimulated song (bottom; 50 ms, 470 nm light pulses, red bars) showing light-evoked motif truncation followed by rapid motif restarting. b) HVC multiunit activity recorded in anesthetized birds: sample trace (top; scale bar 1 V, 1 s; inset 1 V, 100 ms), raster plot (middle, 30 trials) and normalized peri-stimulus time histogram (bottom) reporting the change in multiunit HVC firing in response to light stimulation of eGtACR1-expressing HVC_X axons in Area X (100 ms, red bar; two-way ANOVA comparing the curve spanning 300 ms before to 300 ms after the 100 ms stimulation with the corresponding 700 ms baseline without stimulation, interaction F(69,345) = 2.179 P < 0.001, stimulation F(1,5) = 9.957 P = 0.0252, Sidak post-hoc P < 0.05 from 20 ms to 70 ms after light onset). (Inset) PSTH and scatter plot showing the average (per hemisphere) response during the first 100 ms of light stimulation (red dashed rectangle) compared with the last 100 ms of baseline (black dashed rectangle; paired t-test P = 0.0446; n = 6 hemispheres, 3 birds). c) Box plots (5th–95th percentile; 25th, 50th and 75th percentiles) reporting the probability of motif truncation (okra), pause followed by continuation of the motif (green) or absence of syntactic perturbation (gray) after light stimulation (HVC→X-stimulated birds, n = 2, filled circles; empty box plots: HVC stimulation from Fig. 1e, reported for comparison; two-way ANOVA testing the difference between stimulation outcome probabilities across all experimental groups, interaction F(14,46) = 57.75 P < 0.001, stimulated subpopulation F(7,23) = 1.088 P = 0.4027; Dunnett's post-hoc: motif stop P = 0.0129, pause + continuation P = 0.6958, no perturbation P < 0.001). d) Cumulative probability curves reporting the latency to song truncation in response to light stimulation (average ±SEM of each bird's curve; blue: HVC→X; black: HVC dataset from Fig. 1, compared against all experimental groups across the manuscript; cyan: HVC_X dataset from Fig. 4d, for comparison; 10 ms time bins; two-way ANOVA testing the difference between truncation latency distributions across the manuscript, F(5,17) = 4.142 P = 0.0121; Tukey's post-hoc pan-HVC vs. HVC→X identifies a significant difference (P < 0.05) at the 50–80 ms time bins, HVC_X vs. HVC→X P > 0.05). (Inset) Latency of motif truncation computed across all birds (blue: HVC→X; cyan: HVC_X dataset from Fig. 4; white: HVC dataset from Fig. 1, compared against all experimental groups across the manuscript; nonparametric one-way ANOVA (Kruskal–Wallis test) testing the difference between truncation latencies, H(6) = 244.7; Dunn's multiple comparisons test: HVC vs. HVC→X P < 0.001, HVC_X vs. HVC→X P < 0.001; Cohen's d = 0.49). e) Latency of motif truncation upon light stimulation, per bird (blue: HVC→X stimulation; white: HVC dataset from Fig. 1; nested one-way ANOVA comparing all datasets across the manuscript, F(5,17) = 4.175 P = 0.0117; Dunnett's post-hoc pan-HVC vs. HVC→X P = 0.6512). f) Box plots (5th–95th percentile; 25th, 50th and 75th percentiles) reporting the normalized (left) and non-normalized (right) probability of motif restart for each bird (n = 2 HVC→X-stimulated birds, filled circles; empty box plots from Fig. 1i and Extended Data Fig. 1i, reported for comparison; for the non-normalized probability, the underlying shaded areas represent, for each bird, the baseline probability of producing a motif after any one motif (see Methods; this provides the basis for normalizing the motif restart probability; dashed top and bottom lines show the maximum and minimum, and the middle line the median); left, one-way ANOVA testing the difference between pan-HVC and HVC→X groups' normalized restart probabilities, F(5,17) = 9.939 P < 0.0001, Dunnett's post-hoc HVC vs. HVC→X P > 0.9999; right, one-way ANOVA testing the difference between pan-HVC and HVC→X groups' restart probabilities, F(5,17) = 6.099 P = 0.0021, Dunnett's post-hoc P = 0.9996). g) Latency of motif restart (orange: HVC→X-stimulated birds; white: HVC stimulation dataset from Fig. 1; nested one-way ANOVA comparing all datasets across the manuscript, F(5,17) = 6.119 P = 0.0020; Dunn's post-hoc HVC vs. HVC→X P = 0.9990). h) Cumulative probability curves reporting the latency to post-truncation motif restart in response to stimulation of HVC_X axon terminals (average ±SEM of each bird's curve; orange: HVC→X; cyan: HVC_X dataset from Fig. 4; black: HVC dataset from Fig. 1, compared against all experimental groups across the manuscript; 10 ms time bins; two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P = 0.0009; Tukey's post-hoc HVC→X vs. HVC identifies a significant difference at the 70–120 ms time bins; HVC→X vs. HVC_X P > 0.05). (Inset) Latency to motif restart computed across all birds (orange: HVC→X; cyan: HVC_X dataset from Fig. 4; white: HVC dataset from Fig. 1, compared against all experimental groups across the manuscript; nonparametric one-way ANOVA (Kruskal–Wallis test) testing the difference between restart latencies, H(6) = 244.7; Dunn's multiple comparisons test: pan-HVC vs. HVC→X P > 0.9999, HVC_X vs. HVC→X P > 0.9999). Brain outline in a adapted with permission from ref. ⁶⁰, Wiley.
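The inset of panel d reports an effect size (Cohen's d = 0.49) alongside the Kruskal–Wallis comparison. The legend does not state which variant of Cohen's d was used; the sketch below assumes the common two-sample form, the difference of means divided by the pooled sample standard deviation. The function name and the example values are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Two-sample Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance uses the n - 1 (sample) denominator.
    var_a, var_b = variance(group_a), variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical latencies (ms); a positive d means group_a has the longer mean latency.
print(round(cohens_d([62, 70, 75, 81], [55, 60, 66, 72]), 3))
```

The sign depends on argument order, so effect sizes like the one in the legend are often reported as magnitudes.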


Extended Data Fig. 13 HVC_X neurons in song pattern generation.

a) Box plots (5th–95th percentile; 25th, 50th and 75th percentiles) reporting oEPSC (left) and oIPSC (center) amplitudes by cell class (Mann–Whitney test, oEPSCs U = 77 P = 0.0028, oIPSCs U = 27 P < 0.001; n = cells, animals) and (right) the ratio of oEPSC to oIPSC peak amplitudes (Mann–Whitney test, U = 119 P = 0.9847; n = cells, animals). b) Time course of the data from Fig. 5k plotted by day for each bird, with each bird color-coded by the average normalized optical density of TeNT in HVC. c) Scatter plot correlating motif self-similarity (relative to the baseline motif) with the average number of TeNT+ cells per brain slice for each bird, color-coded by the average normalized optical density of TeNT in HVC (Spearman r = −0.2970, r² = 0.05723, P = 0.4069). d) Same as (c) but for motif accuracy (Spearman r = −0.2848, r² = 0.03963, P = 0.4271). e) Spectrograms (0–11 kHz; scale bar 200 ms; horizontal lines identify song elements) from a bird with intermediate levels of TeNT expression in HVC_X neurons (a different bird from that in Fig. 5). Note the repeated failure to complete motifs, with truncation either within syllables or at syllable boundaries, followed by rapid motif restart. The inset shows all spectrograms from one day in week 3, aligned at syllable A and ordered by motif length. f) Simplified syntax raster plots (100 motifs per day) for the bird in panel e. g) Cumulative probability curves reporting the latency to post-truncation motif restart (average ±SEM of each bird's curve; blue: HVC_X TeNT; cyan: HVC_X dataset from Fig. 5, compared against all experimental groups across the manuscript; 10 ms time bins; two-way ANOVA testing the difference between restart latency distributions, interaction F(594,2178) = 3.212 P < 0.001, stimulated subpopulation F(6,22) = 5.966 P = 0.0009; Tukey's post-hoc HVC_X vs. HVC_X TeNT P > 0.05). (Inset) Violin plots reporting the latency of motif restart computed across all birds (HVC_X dataset from Fig. 4; HVC_X TeNT dataset from Fig. 5; nonparametric one-way ANOVA (Kruskal–Wallis test) testing the difference between restart latencies, H(6) = 244.7; Dunn's multiple comparisons test: HVC_X vs. HVC_X TeNT P = 0.5654). Brain outline in a adapted with permission from ref. ⁶⁰, Wiley.
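Panels c and d report Spearman rank correlations between per-bird motif measures and TeNT+ cell counts. Spearman's r is the Pearson correlation computed on ranks, with tied values assigned the average of the ranks they span. The NumPy sketch below illustrates that definition; the function names and example values are hypothetical, and it does not compute a P value (`scipy.stats.spearmanr` returns both the coefficient and a P value).

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of the ranks they occupy."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(x.size, dtype=float)
    i = 0
    while i < x.size:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < x.size and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        # Positions i..j (0-based) correspond to ranks i+1..j+1; assign their mean.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])


# Hypothetical per-bird values: TeNT+ cells per slice vs. motif self-similarity.
cells_per_slice = [12.0, 30.5, 18.2, 25.0, 9.8]
self_similarity = [0.91, 0.72, 0.88, 0.80, 0.86]
print(round(spearman_r(cells_per_slice, self_similarity), 3))
```

Working on ranks makes the coefficient insensitive to monotonic rescaling of either measure, which suits per-bird summaries such as cell counts and similarity scores.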


Supplementary information

Supplementary Tables (PDF)

Supplementary Tables 1–3.

Reporting Summary (PDF)

Peer Review File (PDF)

Supplementary Video 1 (MOV)

3D rendering of a zebra finch brain with tracers injected into HVC (green) and RA (red) to label efferent axons and retrogradely identify afferent neurons. RA axons course caudally in the posterior commissure, around and below Uva.

Supplementary Video 2 (MOV)

Animation displaying the model's prediction of HVC_PN and interneuron activity waves under normal conditions, upon optogenetic stimulation, and under network degradation conditions mimicking TeNT expression in a subset of HVC_X neurons.

Supplementary Video 3 (MOV)

Animation of the schematic in Fig. 5b, representing a proposed simplified description of HVC dynamics under normal conditions, upon optogenetic stimulation, and under network degradation conditions mimicking TeNT expression in a subset of HVC_X neurons.

Source data

Source Data Figs. 1–5 and Extended Data Figs. 1–13 (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Trusel, M., Zuo, J., Alam, D.H. et al. Holistic motor control of zebra finch song syllable sequences. Nature 652, 157–166 (2026). https://doi.org/10.1038/s41586-025-10069-z


Received: 04 March 2025
Accepted: 16 December 2025
Published: 28 January 2026
Version of record: 28 January 2026
Issue date: 02 April 2026
DOI: https://doi.org/10.1038/s41586-025-10069-z
