Abstract
Brain–computer interfaces (BCIs) have the potential to restore communication for people who have lost the ability to speak owing to a neurological disease or injury. BCIs have been used to translate the neural correlates of attempted speech into text (refs. 1–3). However, text communication fails to capture the nuances of human speech, such as prosody and immediately hearing one’s own voice. Here we demonstrate a brain-to-voice neuroprosthesis that instantaneously synthesizes voice with closed-loop audio feedback by decoding neural activity from 256 microelectrodes implanted into the ventral precentral gyrus of a man with amyotrophic lateral sclerosis and severe dysarthria. We overcame the challenge of lacking ground-truth speech for training the neural decoder and were able to accurately synthesize his voice. Along with phonemic content, we were also able to decode paralinguistic features from intracortical activity, enabling the participant to modulate his BCI-synthesized voice in real time to change intonation and sing short melodies. These results demonstrate the feasibility of enabling people with paralysis to speak intelligibly and expressively through a BCI.
Data availability
Neural data and brain-to-voice models related to this study are publicly available on Dryad (https://doi.org/10.5061/dryad.2280gb64f) (ref. 45).
Code availability
Code to implement brain-to-voice synthesis described in this study is publicly available on GitHub (https://github.com/Neuroprosthetics-Lab/brain-to-voice-2025).
References
1. Card, N. S. et al. An accurate and rapidly calibrating speech neuroprosthesis. N. Engl. J. Med. 391, 609–618 (2024).
2. Willett, F. R. et al. A high-performance speech neuroprosthesis. Nature 620, 1031–1036 (2023).
3. Metzger, S. L. et al. A high-performance neuroprosthesis for speech decoding and avatar control. Nature 620, 1037–1046 (2023).
4. Silva, A. B., Littlejohn, K. T., Liu, J. R., Moses, D. A. & Chang, E. F. The speech neuroprosthesis. Nat. Rev. Neurosci. 25, 473–492 (2024).
5. Herff, C. et al. Generating natural, intelligible speech from brain activity in motor, premotor, and inferior frontal cortices. Front. Neurosci. 13, 1267 (2019).
6. Angrick, M. et al. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J. Neural Eng. 16, 036019 (2019).
7. Anumanchipalli, G. K., Chartier, J. & Chang, E. F. Speech synthesis from neural decoding of spoken sentences. Nature 568, 493–498 (2019).
8. Meng, K. et al. Continuous synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. J. Neural Eng. 20, 046019 (2023).
9. Le Godais, G. et al. Overt speech decoding from cortical activity: a comparison of different linear methods. Front. Hum. Neurosci. 17, 1124065 (2023).
10. Liu, Y. et al. Decoding and synthesizing tonal language speech from brain activity. Sci. Adv. 9, eadh0478 (2023).
11. Berezutskaya, J. et al. Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models. J. Neural Eng. 20, 056010 (2023).
12. Shigemi, K. et al. Synthesizing speech from ECoG with a combination of transformer-based encoder and neural vocoder. In ICASSP 2023 – 2023 IEEE Int. Conf. on Acoust. Speech Signal Process. 1–5 (IEEE, 2023).
13. Chen, X. et al. A neural speech decoding framework leveraging deep learning and speech synthesis. Nat. Mach. Intell. 6, 467–480 (2024).
14. Wilson, G. H. et al. Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus. J. Neural Eng. 17, 066007 (2020).
15. Wairagkar, M., Hochberg, L. R., Brandman, D. M. & Stavisky, S. D. Synthesizing speech by decoding intracortical neural activity from dorsal motor cortex. In 2023 11th Int. IEEE/EMBS Conf. on Neural Eng. (NER) 1–4 (IEEE, 2023).
16. Angrick, M. et al. Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity. Commun. Biol. 4, 1055 (2021).
17. Wu, X., Wellington, S., Fu, Z. & Zhang, D. Speech decoding from stereo-electroencephalography (sEEG) signals using advanced deep learning methods. J. Neural Eng. 21, 036055 (2024).
18. Angrick, M. et al. Online speech synthesis using a chronically implanted brain–computer interface in an individual with ALS. Sci. Rep. 14, 9617 (2024).
19. Glasser, M. F. et al. A multi-modal parcellation of human cerebral cortex. Nature 536, 171–178 (2016).
20. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NIPS, 2017).
21. Downey, J. E., Schwed, N., Chase, S. M., Schwartz, A. B. & Collinger, J. L. Intracortical recording stability in human brain–computer interface users. J. Neural Eng. 15, 046016 (2018).
22. Valin, J.-M. & Skoglund, J. LPCNET: improving neural speech synthesis through linear prediction. In ICASSP 2019 – 2019 IEEE Int. Conf. on Acoust. Speech Signal Process. 5891–5895 (IEEE, 2019).
23. Li, Y. A., Han, C., Raghavan, V. S., Mischler, G. & Mesgarani, N. StyleTTS 2: towards human-level text-to-speech through style diffusion and adversarial training with large speech language models. Adv. Neural Inf. Process. Syst. 36, 19594–19621 (2023).
24. Dichter, B. K., Breshears, J. D., Leonard, M. K. & Chang, E. F. The control of vocal pitch in human laryngeal motor cortex. Cell 174, 21–31 (2018).
25. Kaufman, M. T., Churchland, M. M., Ryu, S. I. & Shenoy, K. V. Cortical activity in the null space: permitting preparation without movement. Nat. Neurosci. 17, 440–448 (2014).
26. Stavisky, S. D., Kao, J. C., Ryu, S. I. & Shenoy, K. V. Motor cortical visuomotor feedback activity is initially isolated from downstream targets in output-null neural state space dimensions. Neuron 95, 195–208 (2017).
27. Churchland, M. M. & Shenoy, K. V. Preparatory activity and the expansive null-space. Nat. Rev. Neurosci. 25, 213–236 (2024).
28. Moses, D. A. et al. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. N. Engl. J. Med. 385, 217–227 (2021).
29. Kunz, E. M. et al. Representation of verbal thought in motor cortex and implications for speech neuroprostheses. Preprint at bioRxiv https://doi.org/10.1101/2024.10.04.616375 (2024).
30. Bouchard, K. E., Mesgarani, N., Johnson, K. & Chang, E. F. Functional organization of human sensorimotor cortex for speech articulation. Nature 495, 327–332 (2013).
31. Chartier, J., Anumanchipalli, G. K., Johnson, K. & Chang, E. F. Encoding of articulatory kinematic trajectories in human speech sensorimotor cortex. Neuron 98, 1042–1054 (2018).
32. Lu, J. et al. Neural control of lexical tone production in human laryngeal motor cortex. Nat. Commun. 14, 6917 (2023).
33. Breshears, J. D., Molinaro, A. M. & Chang, E. F. A probabilistic map of the human ventral sensorimotor cortex using electrical stimulation. J. Neurosurg. 123, 340–349 (2015).
34. Ammanuel, S. G. et al. Intraoperative cortical stimulation mapping with laryngeal electromyography for the localization of human laryngeal motor cortex. J. Neurosurg. 141, 268–277 (2024).
35. Pandarinath, C. et al. Neural population dynamics in human motor cortex during movements in people with ALS. eLife 4, e07436 (2015).
36. Stavisky, S. D. et al. Neural ensemble dynamics in dorsal motor cortex during speech in people with paralysis. eLife 8, e46015 (2019).
37. Willett, F. R. et al. Hand knob area of premotor cortex represents the whole body in a compositional way. Cell 181, 396–409 (2020).
38. Ali, Y. H. et al. BRAND: a platform for closed-loop experiments with deep network models. J. Neural Eng. 21, 026046 (2024).
39. Young, D. et al. Signal processing methods for reducing artifacts in microelectrode brain recordings caused by functional electrical stimulation. J. Neural Eng. 15, 026014 (2018).
40. Levelt, W. J., Roelofs, A. & Meyer, A. S. A theory of lexical access in speech production. Behav. Brain Sci. 22, 1–38 (1999).
41. Räsänen, O., Doyle, G. & Frank, M. C. Unsupervised word discovery from speech using automatic segmentation into syllable-like units. Proc. Interspeech 2015, 3204–3208 (2015).
42. Williams, A. H. et al. Discovering precise temporal patterns in large-scale neural recordings through robust and interpretable time warping. Neuron 105, 246–259 (2020).
43. Roussel, P. et al. Observation and assessment of acoustic contamination of electrophysiological brain signals during speech production and sound perception. J. Neural Eng. 17, 056028 (2020).
44. Shah, N., Sahipjohn, N., Tambrahalli, V., Subramanian, R. & Gandhi, V. StethoSpeech: speech generation through a clinical stethoscope attached to the skin. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8, 123 (2024).
45. Wairagkar, M. et al. Data for an instantaneous voice synthesis neuroprosthesis. Dryad https://doi.org/10.5061/dryad.2280gb64f (2025).
Acknowledgements
We thank participant T15 and his family and care partners for their contributions to this research. Support was provided by the Office of the Assistant Secretary of Defense for Health Affairs through the Amyotrophic Lateral Sclerosis Research Program under award number AL220043; a New Innovator Award (DP2) from the NIH Office of the Director and managed by NIDCD (1DP2DC021055); a Seed Grant from the ALS Association (23-SGP-652); A. P. Giannini Postdoctoral Fellowship (N.S.C.); Searle Scholars Program; a Pilot Award from the Simons Collaboration for the Global Brain (AN-NC-GB-Pilot Extension-00002343-01); NIH-NIDCD (U01DC017844) and VA RR&D (A2295-R). S.D.S. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund, and a Cultivating Team Science Award from the University of California Davis School of Medicine.
Author information
Contributions
M.W., S.D.S. and D.M.B. conceived the study and experiment design. M.W. led the experiments and developed and implemented the target speech generation, the decoder training algorithms and the end-to-end pipeline for instantaneous voice synthesis: feature extraction, noise removal, preprocessing, real-time brain-to-voice decoders, pitch decoders, vocoder and output audio playback, experimental tasks and post-processing. M.W. also performed human listener evaluations, analysed all of the data and created figures. M.W. and N.S.C. developed and implemented the real-time neural signal processing, noise removal and feature-extraction pipelines. M.W., N.S.C., T.S.-C. and X.H. coded the real-time data-collection system and built the neuroprosthetic cart system. N.S.C. generated cloned voice samples for T15. M.W., N.S.C. and C.I. collected the primary data for this study. N.S.C. and C.I. interfaced with the participant and scheduled research sessions. L.M.M. contributed to the human listener evaluations. D.M.B. led planning and performed the surgical-implant-placement procedure. L.R.H. was the sponsor–investigator of the multisite clinical trial. D.M.B. was responsible for all clinical-trial-related activities at the University of California Davis. S.D.S. and D.M.B. supervised all aspects of the project. M.W. and S.D.S. wrote the paper. All authors reviewed and edited the paper.
Ethics declarations
Competing interests
S.D.S. is an inventor on intellectual property related to speech decoding submitted and owned by Stanford University (US patent no. 12008987) that has been licensed to Blackrock Neurotech and Neuralink. M.W., S.D.S. and D.M.B. have patent applications related to speech BCI submitted and owned by the Regents of the University of California (US patent application no. 63/461,507 and 63/450,317), including intellectual property licensed by Paradromics. D.M.B. was a surgical consultant with Paradromics, completing his consultation during the revision period of the paper. He is a consultant for Globus Medical. S.D.S. is a scientific adviser to Sonera. The MGH Translational Research Center has a clinical research support agreement with Ability Neuro, Axoft, Neuralink, Neurobionics, Paradromics, Precision Neuro, Synchron and Reach Neuro, for which L.R.H. provides consultative input. Mass General Brigham is convening the Implantable Brain-Computer Interface Collaborative Community (iBCI-CC); charitable gift agreements to Mass General Brigham, including those received to date from Paradromics, Synchron, Precision Neuro, Neuralink and Blackrock Neurotech, support the iBCI-CC, for which L.R.H. provides effort. The other authors declare no competing interests.
Peer review
Peer review information
Nature thanks Nai Ding, Nick Ramsey and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Microelectrode array placement and brain-to-voice synthesis latencies.
a. The estimated resting-state language network from Human Connectome Project data overlaid on T15’s brain anatomy. b. Intraoperative photograph showing the four microelectrode arrays placed on T15’s precentral gyrus. Images in a and b are adapted from ref. 1 (Copyright © 2024 Massachusetts Medical Society, reprinted with permission from Massachusetts Medical Society). c. Closed-loop cumulative latencies across different stages of the voice synthesis and audio playback pipeline. Voice samples were synthesized from raw neural activity measurements within 10 ms and the resulting audio was played out loud continuously to provide closed-loop feedback. Note that the linear horizontal axis is split to expand the visual dynamic range. We focused our engineering primarily on reducing the brain-to-voice inference latency, which fundamentally bounds the speech synthesis latency. As a result, the largest remaining contribution to the latency occurred after voice-synthesis decoding, during the (comparably more mundane) step of audio playback through a sound driver. The cumulative latencies with the audio-driver settings used for T15 closed-loop synthesis in earlier experiments are shown in dark grey. Audio playback latencies were subsequently substantially lowered through software optimizations (light grey) in later sessions, and we predict that further reductions will be possible with additional computer engineering.
Extended Data Fig. 2 Additional BCI speech synthesis performance metrics.
a. Mel-cepstral distortion (MCD) is computed across 25 Mel-frequency bands between the closed-loop synthesized speech and the target speech after removing silences between words. The four subpanels show MCDs (mean ± s.d.) between the synthesized and target speech for different speech tasks in evaluation research sessions. b. Performance of the brain-to-voice decoder measured over time by evaluating neural trials from different sessions offline with a fixed decoder. The decoder trained on post-implant day 165 was frozen and used to synthesize voice offline from neural trials collected in sessions over the following month. Performance was measured by computing the Pearson correlation coefficient between the target speech and the synthesized speech across 40 Mel-frequency bands after removing silences between words (mean ± s.d., n = 956 sentences). A noticeable decline in brain-to-voice performance was observed after approximately 15 days.
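To make the two evaluation metrics concrete, below is a minimal Python/NumPy sketch (not the study’s released code; see the GitHub repository above for the actual implementation) of MCD over mel-cepstral coefficients and mean Pearson correlation over mel bands. It assumes the synthesized and target features are already time-aligned with inter-word silences removed; array shapes and function names are illustrative.

```python
import numpy as np

def mel_cepstral_distortion(mcep_synth, mcep_target):
    """Mean frame-wise MCD (dB) between aligned mel-cepstral sequences.

    mcep_synth, mcep_target: arrays of shape (frames, n_coeffs),
    e.g. n_coeffs = 25, with silences already removed.
    (Some MCD variants exclude the 0th, energy-like coefficient.)
    """
    diff = mcep_synth - mcep_target
    k = 10.0 / np.log(10.0) * np.sqrt(2.0)  # standard MCD scaling constant
    return float(np.mean(k * np.sqrt(np.sum(diff ** 2, axis=1))))

def mean_mel_correlation(mel_synth, mel_target):
    """Mean Pearson r across mel bands for (frames, n_bands) arrays."""
    rs = [np.corrcoef(mel_synth[:, b], mel_target[:, b])[0, 1]
          for b in range(mel_synth.shape[1])]
    return float(np.nanmean(rs))
```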
Extended Data Fig. 3 Electrodes show variability in speech tuning.
a. Example closed-loop speech synthesis trial. Spike-band power and threshold crossing spikes from each electrode are shown for one example sentence. These neural features were binned and causally normalized and smoothed on a rolling basis before being decoded to synthesize speech. The mean spike-band power and threshold crossing activity for each individual array are also shown. Speech-related modulation was observed on all arrays, with the highest modulation recorded in v6v and 55b. The synthesized speech is shown in the bottom-most row. The grey trace above it shows the participant’s attempted (unintelligible) speech as recorded with a microphone. b, d. Pearson correlation coefficients of spike-band power and threshold crossings, respectively, between each electrode and the speech envelope (the first LPCNet feature predicted by the brain-to-voice decoder). Electrodes are grouped by array and sorted in ascending order to show that different electrodes have different tuning (both positive and negative) with speech. Arrays v6v and 55b have higher correlations with speech, and the majority of electrodes show positive tuning. Arrays M1 and d6v have lower correlations, with more electrodes tuned negatively. Electrodes with non-significant correlation (P > 0.05) are shown in grey. Insets show the same correlations for each electrode arranged spatially in the array. c, e. The time course of average spike-band power and threshold crossings across trials for two example electrodes with positive (solid line) and negative (dashed line) correlations from each array (example electrodes are marked by the ‘+’ symbol in (b, d)). Different electrodes show complex neural dynamics with respect to speech onset and a rich variety of speech tuning. For example, some electrodes have higher activity before speech onset (possibly contributing to speech preparation) while others have higher activity during or after speech onset.
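As a rough illustration of the causal feature preprocessing described in panel a (binned features that are normalized and smoothed on a rolling basis, using only past samples), here is a hedged Python sketch; the window length and smoothing constant are placeholders rather than the study’s parameters.

```python
import numpy as np

def causal_normalize(features, window=100):
    """Z-score binned neural features (time x channels) causally,
    using statistics from a trailing window of past bins only."""
    out = np.zeros_like(features, dtype=float)
    for t in range(features.shape[0]):
        past = features[max(0, t - window + 1):t + 1]
        mu = past.mean(axis=0)
        sigma = past.std(axis=0) + 1e-6  # avoid division by zero
        out[t] = (features[t] - mu) / sigma
    return out

def causal_smooth(features, alpha=0.3):
    """Exponential smoothing that looks only at past samples."""
    out = np.empty_like(features, dtype=float)
    out[0] = features[0]
    for t in range(1, features.shape[0]):
        out[t] = alpha * features[t] + (1 - alpha) * out[t - 1]
    return out
```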
Extended Data Fig. 4 Speech is not synthesized during non-speech vocalizations or orofacial movements.
a. Three example trials show microphone recording (grey) of T15 during attempted speech trials with coughing, throat clearing, non-speech vocalizations or people speaking in the background and the corresponding brain-to-voice synthesis output (blue). The brain-to-voice decoder did not synthesize audible speech and instead output silence during these non-speech vocalizations or when other people were speaking simultaneously (note that T15 starts speaking via the neuroprosthesis midway through the background conversation in example 3). In contrast, it did synthesize voice when T15 voluntarily attempted to speak. Speech was synthesized instantaneously at the exact pace with which T15 attempted to speak and with very low latency. b. Samples of non-speech vocalization events (grey) and the corresponding speech synthesis output, which was zero (silence) throughout all these events (blue). c. Examples of speech synthesis during attempted speech of each word. Here, the speech is synthesized (blue) appropriately as expected during attempted speech (grey).
Extended Data Fig. 5 Neural activity is not contaminated by acoustic artifacts and residual vocalization and movement cannot synthesize intelligible speech.
a. Three example trials’ audio recordings, audio spectrograms, and the spectrograms of the two most acoustic-correlated neural electrodes. Examples are shown for the three types of speech tasks. The prominent spectral structures in the audio spectrogram cannot be observed even in the top two most correlated neural electrodes. An increase in neural activity can be observed before speech onset for each word, reflecting speech preparatory activity and further arguing against acoustic contamination. Note that in the word-emphasis example, the last word ‘going’ is not vocalized fully (there is minimal activity in its audio spectrum), yet an increase in neural activity can be observed that is similar to other words. Contamination matrices and statistical criteria are shown in the bottom row, where the P-value indicates whether the trial is significantly acoustically contaminated. b. An example trial of attempted speech with simultaneous recording of intracortical neural signals and various biosignals measured using a microphone, a stethoscopic microphone and IMU sensors (accelerometer and gyroscope). Separate independent decoders were trained to synthesize speech using each of the biosignals (or all three together). c. Intelligible speech could not be synthesized from biosignals measuring sound, movement and vibrations during attempted speech. (Left) Cross-validated Pearson correlation coefficients (mean ± s.d.) with the target speech for speech synthesized using neural signals, each of the biosignals, and all biosignals together. Reconstruction accuracy is significantly lower when decoding speech from biosignals than from neural activity (two-sided Wilcoxon rank-sum, P = 10^−59, n = 240 sentences). (Right) Distributions of Pearson correlation coefficients for speech decoding from biosignals and from neural signals are mostly non-overlapping, indicating that synthesis quality from biosignals is much lower than that from neural signals. d. To assess the intelligibility of voice synthesis from neural activity and biosignals (stethoscopic-microphone decoder), naive human listeners performed open transcription of the same 30 synthesized trials from both decoders. Median phoneme and word error rates for neural decoding were significantly lower (word error rate: 43.60%) than for stethoscope decoding (word error rate: 100%). This indicates that intelligible speech cannot be decoded from these non-neural biosignals.
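The contamination matrices mentioned above follow the criterion of Roussel et al. (ref. 43). A simplified sketch of the core computation — band-by-band correlation between time-aligned audio and neural spectrograms, omitting the permutation-based significance test — could look like this (array names are illustrative):

```python
import numpy as np

def contamination_matrix(spec_audio, spec_neural):
    """Pearson correlation between every audio band and every neural band.

    spec_audio: (n_audio_bands, frames); spec_neural: (n_neural_bands, frames),
    time-aligned. A pronounced diagonal ridge at matching frequencies would
    suggest acoustic contamination of the neural recording.
    """
    za = (spec_audio - spec_audio.mean(axis=1, keepdims=True)) / \
         (spec_audio.std(axis=1, keepdims=True) + 1e-9)
    zn = (spec_neural - spec_neural.mean(axis=1, keepdims=True)) / \
         (spec_neural.std(axis=1, keepdims=True) + 1e-9)
    return (za @ zn.T) / spec_audio.shape[1]
```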
Extended Data Fig. 6 Encoding of paralinguistic features in neural activity.
a. Neural modulation during question intonation. Trial-averaged normalized spike-band power (each row in a group is one electrode) during trials in which the participant modulated his intonation to say the cued sentence as a question. Trials with the same cue sentence (n = 16) were aligned using dynamic time warping, and the mean activity across trials spoken as statements was subtracted to better show the increased neural activity around the intonation modulation at the end of the sentence. The onset of the word that was pitch-modulated in closed loop is indicated by the arrowhead at the bottom of each example. b. Encoding of paralinguistic features recorded on individual arrays. Trial-averaged spike-band power (mean ± s.e.m.), averaged across all electrodes within each array, for words spoken as statements and as questions. At every time point, the spike-band power for statement words and question words was compared using the Wilcoxon rank-sum test. The blue line at the bottom indicates the time points where the spike-band power in statement words and question words was significantly different (P < 0.001, n1 = 970 words, n2 = 184 words). c. Trial-averaged spike-band power across each array for non-emphasized and emphasized words. The spike-band power was significantly different between non-emphasized and emphasized words at the time points shown in blue (P < 0.001, n1 = 1,269 words, n2 = 333 words). d. Trial-averaged spike-band power across each array for words without pitch modulation and words with pitch modulation (from the three-pitch melody-singing task). Words with low and high pitch targets are grouped together as the ‘pitch modulation’ category (we excluded medium-pitch-target words, for which the participant used his normal pitch). The spike-band power was significantly different between no pitch modulation and pitch modulation at the time points shown in blue (P < 0.001, n1 = 486 words, n2 = 916 words). e. Confusion matrix showing offline accuracies for decoding the question-intonation and word-emphasis paralinguistic features together using a single combined 3-class classifier.
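The time-pointwise comparisons in panels b–d can be sketched as follows; this is an illustrative Python/SciPy snippet, not the analysis code itself, and it assumes trials have already been aligned into (trials × time) arrays.

```python
import numpy as np
from scipy.stats import ranksums

def significant_timepoints(power_a, power_b, alpha=0.001):
    """Compare trial-aligned spike-band power between two conditions at
    every time point with a two-sided Wilcoxon rank-sum test.

    power_a: (n_trials_a, T); power_b: (n_trials_b, T).
    Returns a boolean mask of time points with P < alpha (the blue lines
    in the figure mark such points).
    """
    pvals = np.array([ranksums(power_a[:, t], power_b[:, t]).pvalue
                      for t in range(power_a.shape[1])])
    return pvals < alpha
```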
Extended Data Fig. 7 Closed-loop paralinguistic feature modulation.
a. An overview of the paralinguistic feature decoder and pitch-modulation pipeline. An independent paralinguistic feature decoder ran in parallel to the regular brain-to-voice decoder. Its output causally modulated the pitch feature predicted by the brain-to-voice decoder, resulting in a pitch-modulated voice. b. An example trial of closed-loop intonation modulation for speaking a sentence as a question. A separate binary decoder identified the change in intonation and sent a trigger (downward arrow) to modulate the pitch feature output of the regular brain-to-voice decoder according to a predefined pitch profile for asking a question (low pitch to high pitch). Neural activity of an example trial with its synthesized voice output is shown along with the intonation decoder output, the time of the modulation trigger (downward arrow), the originally predicted pitch feature and the modulated pitch feature used for voice synthesis. c. An example trial of closed-loop word emphasis in which the word “YOU” from “What are YOU doing” was emphasized. To emphasize a word, we applied a predefined pitch profile (high pitch to low pitch) along with a 20% increase in the loudness of the predicted speech samples. d. An example trial of closed-loop pitch modulation for singing a melody with three pitch levels. The three-pitch classifier output was used to continuously modulate the predicted pitch feature output from the brain-to-voice decoder.
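As a sketch of the mechanism in panel b, the snippet below shows how a trigger from a parallel paralinguistic decoder could overwrite the decoded pitch feature with a predefined rising (low-to-high) profile for question intonation. The profile length and the low/high multipliers are invented for illustration; they are not the study’s values.

```python
import numpy as np

def apply_question_profile(pitch, trigger_idx, profile_len=40,
                           low=0.8, high=1.4):
    """Replace the decoded pitch feature with a rising profile.

    pitch: 1-D array of per-frame pitch values from the brain-to-voice
    decoder; trigger_idx: frame at which the intonation decoder fired.
    """
    out = pitch.astype(float).copy()
    end = min(trigger_idx + profile_len, len(out))
    base = out[trigger_idx]
    # Ramp from below to above the current pitch to mimic a question
    out[trigger_idx:end] = base * np.linspace(low, high, end - trigger_idx)
    return out
```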
Extended Data Fig. 8 Pearson correlation coefficients over the course of a sentence.
Pearson correlation coefficient (r) of individual words in sentences of different lengths (mean ± s.d.). The correlation between target and synthesized speech remained consistent throughout the length of the sentence, indicating that the quality of the synthesized voice was consistent throughout the sentence. Note that there were fewer longer evaluation sentences.
Extended Data Fig. 9 Output-null and output-potent neural dynamics during speech production in individual arrays.
a-d. Average approximated output-null (orange) and output-potent (blue) components of neural activity during attempted speech of cued sentences of different lengths. Here the neural components are computed for each array independently by training separate linear decoders (that is, repeating the analyses of Fig. 4 for individual arrays independently). A subset of sentence lengths is shown in the interest of space. Note that the d6v array had much less speech-related modulation. Bar plots within each panel summarize all of the data (including the not-shown sentence lengths) by taking the average null/potent activity ratios for words in the first, second, third and fourth quarters of each sentence (mean ± s.e.m., nQ1 = 3,600, nQ2 = 4,181, nQ3 = 3,456, nQ4 = 3,134 words). e-h. Average output-null and output-potent activity during intonation modulation (question-asking or word emphasis) computed separately for each array. Output-null activity increases during the intonation-modulated word in all arrays. Null/potent activity ratios are summarized in bar plots for the intonation-modulated word (red) and the words preceding or following it (grey) (mean ± s.e.m.). The null/potent ratios of modulated words were significantly different from those of non-modulated words for the v6v, M1 and d6v arrays (two-sided Wilcoxon rank-sum, v6v: P = 10^−11, M1: P = 10^−16, 55b: P = 0.3, d6v: P = 10^−26, n1 = 460 modulated words, n2 = 922 non-modulated words).
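For readers unfamiliar with the output-null/output-potent decomposition (refs. 25–27), here is a minimal sketch under standard assumptions: given a linear decoder with weight matrix W, the potent component of neural activity is its projection onto the row space of W (the part that changes the decoder output) and the null component is the residual. This is a generic illustration, not the paper’s analysis code.

```python
import numpy as np

def null_potent_split(neural, W, tol=1e-10):
    """Split neural activity into output-null and output-potent parts.

    neural: (time, channels); W: decoder weights (outputs, channels).
    """
    # Orthonormal basis for the row space of W via SVD
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    rows = Vt[s > tol]
    P_potent = rows.T @ rows            # projector onto the row space
    potent = neural @ P_potent          # output-potent component
    null = neural - potent              # output-null component (residual)
    # Per-bin null/potent activity ratio, as summarized in the bar plots
    ratio = (np.linalg.norm(null, axis=1) /
             (np.linalg.norm(potent, axis=1) + 1e-9))
    return null, potent, ratio
```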
Extended Data Fig. 10 Head motion during speech and its relationship with the neural dynamics.
a. Head motion was tracked from videos by mapping the x and y positions of the NeuroPort pedestal in each frame (yellow points) using OpenCV. Overall head motion was summarized by the first principal component (pink axis) of x-y motion. The inset shows a single frame of head-motion tracking. b. Small head motions were observed while uttering each word. Head motion remained consistent throughout the attempted-speech sentence, as measured by the motion from baseline during the utterance of words in each of the four word-quartiles, regardless of the length of the sentence. c. The ratio of the output-null and output-potent components of simultaneously recorded neural activity decayed over the course of the sentence, in contrast to the head motion in (b). d. Time course and amplitude of the output-null and output-potent components of the neural activity and simultaneous head motion in different quartiles of the sentence. The head motion (purple) follows the output-null activity (orange) but precedes the output-potent activity (blue). The output-null activity decayed over the course of the sentence, whereas the head motion during each word in a sentence remained constant. Taken together, this shows that the neural dynamics do not closely match the head-motion time course.
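A small sketch of the head-motion summary in panel a, assuming the per-frame x-y positions of the pedestal have already been extracted from the video with OpenCV; only the principal-component projection is shown here.

```python
import numpy as np

def first_pc_motion(xy):
    """Summarize tracked 2-D positions (frames x 2) by projecting them
    onto the first principal component of x-y motion."""
    centered = xy - xy.mean(axis=0)
    cov = np.cov(centered.T)                    # 2 x 2 covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    pc1 = eigvecs[:, np.argmax(eigvals)]        # dominant motion axis
    return centered @ pc1                       # 1-D motion trace
```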
Supplementary information
Supplementary Table 1
Data used for training the brain-to-voice decoders in evaluation sessions.
Supplementary Video 1
Dysarthric speech of the participant. This video shows the participant, who has severe dysarthria due to ALS, attempting to speak the sentences cued on the screen. The speech of the participant is unintelligible to naive listeners. Taken on day 25 after implant.
Supplementary Video 2
Closed-loop voice synthesis during attempted vocalized speech. This video shows 13 consecutive closed-loop trials of instantaneous voice synthesis as the participant attempts to speak cued sentences. The synthesized voice was played back continuously in real time through a speaker. Taken on day 179 after implant.
Supplementary Video 3
Closed-loop voice synthesis with simultaneous brain-to-text decoding. This video shows 15 consecutive closed-loop trials of instantaneous voice synthesis with simultaneous brain-to-text decoding that acted as closed captioning when the participant attempted to speak cued sentences. Taken on day 110 after implant.
Supplementary Video 4
Closed-loop voice synthesis during attempted mimed speech. This video shows ten consecutive closed-loop trials of instantaneous voice synthesis with audio feedback as the participant mimed the cued sentences without vocalizing. The decoder was not trained on any mimed-speech neural data. Taken on day 195 after implant.
Supplementary Video 5
Closed-loop voice synthesis during self-initiated free responses. This video shows nine closed-loop trials of instantaneous voice synthesis with audio feedback as the participant responds to open-ended questions or says whatever he wants. We used this opportunity to ask the participant for his feedback on this brain-to-voice neuroprosthesis. A brain-to-text decoder was used simultaneously to help with understanding what the participant was saying. Taken on days 172, 179, 186, 188, 193 and 195 after implant.
Supplementary Video 6
Closed-loop own-voice synthesis during attempted speech. This video shows nine consecutive closed-loop trials of instantaneous speech synthesis in a voice that sounds like the voice of the participant before ALS as the participant attempts to speak cued sentences. Taken on day 286 after implant.
Supplementary Video 7
Closed-loop voice synthesis of pseudo-words. This video shows five consecutive trials of closed-loop synthesis of made-up pseudo-words using the brain-to-voice decoder. The decoder was not trained on any pseudo-words. Taken on day 179 after implant.
Supplementary Video 8
Closed-loop voice synthesis of interjections. This video shows five trials of closed-loop synthesis of interjections using the brain-to-voice decoder. The decoder was not trained on these words. Taken on day 186 after implant.
Supplementary Video 9
Closed-loop voice synthesis for spelling words. This video shows seven trials of closed-loop synthesis in which the participant was spelling cued words one letter at a time using the brain-to-voice decoder. The decoder was not trained on this task. Taken on day 186 after implant.
Supplementary Video 10
Closed-loop question intonation. This video shows ten selected trials in which the participant modulated his intonation to say a sentence as a question (indicated by ‘?’ in the cue) or as a statement by using an intonation decoder that modulated the brain-to-voice synthesis in a closed loop. Taken on day 286 after implant.
Supplementary Video 11
Closed-loop word emphasis. This video shows eight selected trials in which certain (capitalized) words in the cued sentences were emphasized by the participant by using an emphasis decoder that modulated the brain-to-voice synthesis in a closed loop. Taken on day 286 after implant.
Supplementary Video 12
Singing three-pitch melodies in a closed loop. This video shows three consecutive trials in which the participant sang short melodies with three pitch targets by using a pitch decoder that modulated the brain-to-voice synthesis. At the start of each trial, an audio cue plays the target melody. The on-screen targets then turn from red to green to indicate that the participant should begin. The vertical bar on the left shows the instantaneous decoded pitch (low, mid and high). Interactive visual cues for each pitch target are shown on the screen. Visual feedback cues show the note in the melody that the participant is singing. Taken on day 342 after implant.
Supplementary Video 13
Singing three-pitch melodies using a unified brain-to-voice decoder. This video shows three trials in which the participant sang short melodies with three pitch targets by using a single unified brain-to-voice decoder that inherently synthesizes the intended pitch in a closed loop. At the start of each trial, an audio cue plays the target melody. The vertical bar on the left shows the instantaneous decoded pitch (low, mid and high) for visual feedback only (that is, this separately decoded pitch, which is the same as in Supplementary Video 12, is not used in the unified brain-to-voice model). Interactive cues show the note in the melody that the participant is singing, providing visual feedback. Taken on day 342 after implant.
Supplementary Video 14
Closed-loop voice synthesis in session 1. This video shows three closed-loop trials of instantaneous voice synthesis from the first day of neural recording (day 25 after implant). The brain-to-voice decoder was trained during this session using 190 sentence trials with a limited 50-word vocabulary recorded earlier on the same day. The second part of the video shows the same three trials reconstructed offline using an optimized brain-to-voice decoder (that is, the algorithm used throughout the rest of this paper), which improved intelligibility.
Supplementary Video 15
Intelligible voice cannot be decoded from biosignal recordings of residual speech of T15. This video shows examples of speech synthesized from simultaneous recordings from microphone, stethoscopic microphone, IMU sensor and intracortical neural activity as T15 attempts to speak. Speech synthesized from microphone, stethoscope and IMU biosignals was not intelligible (word error rate for stethoscope decoding: 100%), whereas the voice synthesized from neural activity was more intelligible (word error rate: 43.60%). From day 482 after implant.
Supplementary Video 16
Comparison of the voice of T15 before ALS with the own-voice synthesis by the brain-to-voice BCI. This video shows examples of (1) the voice of T15 before ALS; (2) the voice cloned by the StyleTTS 2 model, which was trained using T15’s voice before ALS; (3) target audio generated using this cloned-voice, time-aligned with neural signals during attempted speech used as training data for the personalized own-voice BCI speech-synthesis decoder; and (4) the own voice synthesized by the personalized brain-to-voice decoder in a closed loop from neural activity.
Supplementary Audio 1
Acausal speech synthesis by predicting discrete speech units. Audio recording of three example trials of speech reconstructed offline using the approach of predicting discrete speech units acausally at the end of the sentence using connectionist temporal classification (CTC) loss. From day 25 after implant.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wairagkar, M., Card, N.S., Singer-Clark, T. et al. An instantaneous voice-synthesis neuroprosthesis. Nature 644, 145–152 (2025). https://doi.org/10.1038/s41586-025-09127-3