Abstract
Unraveling how humans understand speech despite distortions has long intrigued researchers. A prominent hypothesis highlights the role of multiple endogenous brain rhythms in forming the computational context to predict speech structure and content. Yet how neural processes may implement rhythm-based context formation remains unclear. Here we propose the brain rhythm-based inference model (BRyBI) as a possible neural implementation of speech processing in the auditory cortex, based on the interaction of endogenous brain rhythms in a predictive coding framework. BRyBI encodes key rhythmic processes for parsing spectro-temporal representations of the speech signal into phoneme sequences and for governing the formation of the phrasal context. BRyBI matches patterns of human performance in speech recognition tasks and explains contradictory experimental observations of rhythms during speech listening and their dependence on the informational aspects of speech (uncertainty and surprise). This work highlights the computational role of multiscale brain rhythms in predictive speech processing.
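The abstract refers to two informational aspects of speech, uncertainty and surprise. A common operationalization, assumed here because this excerpt does not give BRyBI's exact definitions, is the Shannon entropy of a predictive distribution over the next unit (uncertainty) and the surprisal -log2 p of the unit actually heard (surprise). The sketch below computes both for a hypothetical phoneme prediction; the function names and probability values are illustrative only and are not taken from the BRyBI code.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum p*log2(p) of a predictive distribution (uncertainty)."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def surprisal_bits(probs, observed):
    """Surprisal -log2 p(observed) of the unit that actually occurred (surprise)."""
    return -math.log2(probs[observed])

# Hypothetical predictive distribution over the next phoneme (illustrative values only).
prediction = {"t": 0.5, "d": 0.25, "k": 0.125, "p": 0.125}

print(entropy_bits(prediction))          # 1.75
print(surprisal_bits(prediction, "t"))   # 1.0
print(surprisal_bits(prediction, "k"))   # 3.0
```

The two quantities can dissociate: a fairly confident prediction (1.75 bits, versus 2 bits for a uniform guess over four phonemes) still yields high surprisal when a low-probability phoneme such as "k" arrives.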
Data availability
Source data for Figs. 2d–g, 4a(ii–v), b(ii–v), c(ii–v) and 5 are available with this manuscript. The TIMIT dataset (refs. 67,68) analyzed during the current study is available via Figshare at https://doi.org/10.6084/m9.figshare.29877332.v1 (ref. 68). Source data are provided with the paper.
Code availability
Code is available via Zenodo at https://doi.org/10.5281/zenodo.15727884 under the MIT License (ref. 69).
References
Viemeister, N. F. & Wakefield, G. H. Temporal integration and multiple looks. J. Acoust. Soc. Am. 90, 858–865 (1991).
Saberi, K. & Perrott, D. R. Cognitive restoration of reversed speech. Nature 398, 760 (1999).
Poeppel, D. The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Commun. 41, 245–255 (2003).
Giraud, A.-L. et al. Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron 56, 1127–1134 (2007).
Ding, N., Melloni, L., Zhang, H., Tian, X. & Poeppel, D. Cortical tracking of hierarchical linguistic structures in connected speech. Nat. Neurosci. 19, 158–164 (2016).
Ghitza, O. Acoustic-driven delta rhythms as prosodic markers. Lang. Cogn. Neurosci. 32, 545–561 (2017).
Giraud, A.-L. & Poeppel, D. Cortical oscillations and speech processing: emerging computational principles and operations. Nat. Neurosci. 15, 511–517 (2012).
Ronconi, L., Oosterhof, N. N., Bonmassar, C. & Melcher, D. Multiple oscillatory rhythms determine the temporal organization of perception. Proc. Natl Acad. Sci. USA 114, 13435–13440 (2017).
Hyafil, A., Giraud, A.-L., Fontolan, L. & Gutkin, B. Neural cross-frequency coupling: connecting architectures, mechanisms, and functions. Trends Neurosci. 38, 725–740 (2015).
Gross, J. et al. Speech rhythms and multiplexed oscillatory sensory coding in the human brain. PLoS Biol. 11, e1001752 (2013).
Teng, X. & Poeppel, D. Theta and gamma bands encode acoustic dynamics over wide-ranging timescales. Cerebral Cortex 30, 2600–2614 (2020).
Luo, H. & Poeppel, D. Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54, 1001–1010 (2007).
Ghitza, O. & Greenberg, S. On the possible role of brain rhythms in speech perception: intelligibility of time-compressed speech with periodic and aperiodic insertions of silence. Phonetica 66, 113–126 (2009).
Miller, G. A. & Licklider, J. C. The intelligibility of interrupted speech. J. Acoust. Soc. Am. 22, 167–173 (1950).
Huggins, A. Temporally segmented speech. Percept. Psychophys. 18, 149–157 (1975).
Garvey, W. D. The intelligibility of speeded speech. J. Exp. Psychol. 45, 102 (1953).
Ghitza, O. Behavioral evidence for the role of cortical θ oscillations in determining auditory channel capacity for speech. Front. Psychol. 5, 652 (2014).
Gransier, R., Peeters, S. & Wouters, J. The importance of temporal-fine structure to perceive time-compressed speech with and without the restoration of the syllabic rhythm. Sci. Rep. 13, 2874 (2023).
Ding, N., Chatterjee, M. & Simon, J. Z. Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure. Neuroimage 88, 41–46 (2014).
Molinaro, N. & Lizarazu, M. Delta (but not theta)-band cortical entrainment involves speech-specific processing. Eur. J. Neurosci. 48, 2642–2650 (2018).
Rimmele, J. M., Poeppel, D. & Ghitza, O. Acoustically driven cortical δ oscillations underpin prosodic chunking. eNeuro 8, 4 (2021).
Roehm, D., Schlesewsky, M., Bornkessel, I., Frisch, S. & Haider, H. Fractionating language comprehension via frequency characteristics of the human EEG. Neuroreport 15, 409–412 (2004).
Donhauser, P. W. & Baillet, S. Two distinct neural timescales for predictive speech processing. Neuron 105, 385–393 (2020).
Bai, F., Meyer, A. S. & Martin, A. E. Neural dynamics differentially encode phrases and sentences during spoken language comprehension. PLoS Biol. 20, e3001713 (2022).
Park, H., Thut, G. & Gross, J. Predictive entrainment of natural speech through two fronto-motor top-down channels. Lang. Cogn. Neurosci. 35, 739–751 (2020).
Ten Oever, S., Carta, S., Kaufeld, G. & Martin, A. E. Neural tracking of phrases in spoken language comprehension is automatic and task-dependent. eLife 11, e77468 (2022).
Meyer, L., Henry, M. J., Gaston, P., Schmuck, N. & Friederici, A. D. Linguistic bias modulates interpretation of speech via neural delta-band oscillations. Cerebral Cortex 27, 4293–4302 (2017).
Arnal, L. H., Doelling, K. B. & Poeppel, D. Delta–beta coupled oscillations underlie temporal prediction accuracy. Cerebral Cortex 25, 3077–3085 (2015).
Doelling, K. B., Arnal, L. H. & Assaneo, M. F. Adaptive oscillators support Bayesian prediction in temporal processing. PLoS Comput. Biol. 19, e1011669 (2023).
Molinaro, N. et al. Speech-brain phase coupling is enhanced in low contextual semantic predictability conditions. Neuropsychologia 156, 107830 (2021).
Herbst, S. K. & Obleser, J. Implicit temporal predictability enhances pitch discrimination sensitivity and biases the phase of delta oscillations in auditory cortex. Neuroimage 203, 116198 (2019).
Kaufeld, G. et al. Linguistic structure and meaning organize neural oscillations into a content-specific hierarchy. J. Neurosci. 40, 9467–9475 (2020).
Hannemann, R., Obleser, J. & Eulitz, C. Top-down knowledge supports the retrieval of lexical information from degraded speech. Brain Res. 1153, 134–143 (2007).
Hyafil, A., Fontolan, L., Kabdebon, C., Gutkin, B. & Giraud, A.-L. Speech encoding by coupled cortical theta and gamma oscillations. eLife 4, e06213 (2015).
Hovsepyan, S., Olasagasti, I. & Giraud, A.-L. Combining predictive coding and neural oscillations enables online syllable recognition in natural speech. Nat. Commun. 11, 3117 (2020).
Rao, R. P. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).
Bastos, A. M. et al. Canonical microcircuits for predictive coding. Neuron 76, 695–711 (2012).
Zhao, B., Dang, J., Zhang, G. & Unoki, M. Cortical oscillatory hierarchy for natural sentence processing. In Proc. Interspeech 2020 125–129 (2020).
Poeppel, D., Idsardi, W. J. & Van Wassenhove, V. Speech perception at the interface of neurobiology and linguistics. Philos. Trans. R. Soc. B 363, 1071–1086 (2008).
Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A. & Ghazanfar, A. A. The natural statistics of audiovisual speech. PLoS Comput. Biol. 5, e1000436 (2009).
McClelland, J. L. & Rumelhart, D. E. An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychol. Rev. 88, 375 (1981).
Marchesotti, S. et al. Selective enhancement of low-gamma activity by tACS improves phonemic processing and reading accuracy in dyslexia. PLoS Biol. 18, e3000833 (2020).
Friston, K. & Kiebel, S. Predictive coding under the free-energy principle. Philos. Trans. R. Soc. B 364, 1211–1221 (2009).
Baevski, A., Zhou, Y., Mohamed, A. & Auli, M. wav2vec 2.0: a framework for self-supervised learning of speech representations. Adv. Neural Inf. Process. Syst. 33, 12449–12460 (2020).
Duecker, K. et al. Challenges and approaches in the study of neural entrainment. J. Neurosci. 44, 40 (2024).
Canales-Johnson, A. et al. Broadband dynamics rather than frequency-specific rhythms underlie prediction error in the primate auditory cortex. J. Neurosci. 41, 9374–9391 (2021).
Giraud, A.-L. Oscillations for all? A commentary on Meyer, Sun & Martin (2020). Lang. Cogn. Neurosci. 35, 1106–1113 (2020).
Schrimpf, M. et al. The neural architecture of language: integrative modeling converges on predictive processing. Proc. Natl Acad. Sci. USA 118, e2105646118 (2021).
Taft, M. & Hambly, G. Exploring the cohort model of spoken word recognition. Cognition 22, 259–282 (1986).
Norris, D. & McQueen, J. M. Shortlist B: a Bayesian model of continuous speech recognition. Psychol. Rev. 115, 357 (2008).
McClelland, J. L. & Elman, J. L. The TRACE model of speech perception. Cogn. Psychol. 18, 1–86 (1986).
Martin, A. E. A compositional neural architecture for language. J. Cogn. Neurosci. 32, 1407–1427 (2020).
Su, Y., MacGregor, L. J., Olasagasti, I. & Giraud, A.-L. A deep hierarchy of predictions enables online meaning extraction in a computational model of human speech comprehension. PLoS Biol. 21, e3002046 (2023).
Wang, X.-J. Neurophysiological and computational principles of cortical rhythms in cognition. Physiol. Rev. 90, 1195–1268 (2010).
Dumont, G. & Gutkin, B. Macroscopic phase resetting-curves determine oscillatory coherence and signal transfer in inter-coupled neural circuits. PLoS Comput. Biol. 15, e1007019 (2019).
Ten Oever, S. & Martin, A. E. An oscillating computational model can track pseudo-rhythmic speech by using linguistic predictions. eLife 10, e68066 (2021).
Hovsepyan, S., Olasagasti, I. & Giraud, A.-L. Rhythmic modulation of prediction errors: a top-down gating role for the beta-range in speech processing. PLoS Comput. Biol. 19, e1011595 (2023).
Giraud, A.-L. & Ramus, F. Neurogenetics and auditory processing in developmental dyslexia. Curr. Opin. Neurobiol. 23, 37–42 (2013).
Lehongre, K., Morillon, B., Giraud, A.-L. & Ramus, F. Impaired auditory sampling in dyslexia: further evidence from combined fMRI and EEG. Front. Hum. Neurosci. 7, 454 (2013).
Lehongre, K., Ramus, F., Villiermet, N., Schwartz, D. & Giraud, A.-L. Altered low-gamma sampling in auditory cortex accounts for the three main facets of dyslexia. Neuron 72, 1080–1090 (2011).
Lallier, M. et al. in Reading and Dyslexia: from Basic Functions to Higher Order Cognition (eds Lachmann, T. & Weis, T.) 147–163 (Springer, 2018).
Power, A. J., Colling, L. J., Mead, N., Barnes, L. & Goswami, U. Neural encoding of the speech envelope by children with developmental dyslexia. Brain Lang. 160, 1–10 (2016).
Goswami, U. Sensory theories of developmental dyslexia: three challenges for research. Nat. Rev. Neurosci. 16, 43–54 (2015).
Lizarazu, M. et al. Neural entrainment to speech and nonspeech in dyslexia: conceptual replication and extension of previous investigations. Cortex 137, 160–178 (2021).
Elsner, B., Kugler, J., Pohl, M. & Mehrholz, J. Transcranial direct current stimulation (tDCS) for improving aphasia in adults with aphasia after stroke. Cochrane Database Syst. Rev. 5, 5 (2019).
Xie, X., Hu, P., Tian, Y., Wang, K. & Bai, T. Transcranial alternating current stimulation enhances speech comprehension in chronic post-stroke aphasia patients: a single-blind sham-controlled study. Brain Stimul. 15, 1538–1540 (2022).
Garofolo, J. S. TIMIT Acoustic Phonetic Continuous Speech Corpus (Linguistic Data Consortium, 1993).
Dogonasheva, O. TIMIT dataset used for testing the BRyBI model. Figshare https://doi.org/10.6084/m9.figshare.29877332.v1 (2025).
Dogonasheva, O. xomaiya/BRyBI: brain rhythm based inference (BRyBI) model of speech processing in auditory cortex. Zenodo https://doi.org/10.5281/zenodo.15727884 (2025).
Acknowledgements
We thank H. Stein for valuable comments on the manuscript. O.D. was supported by the Brain Program of the IDEAS Research Center and the Vernadski scholarship. The research was supported in part through computational resources of HPC facilities at HSE University. This article is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University) (D.Z.). B.G. was funded by the Agence Nationale de la Recherche (nos. ANR-17-EURE-0017 and ANR-10-IDEX-0001-02), CNRS and INSERM. A.-L.G. was supported by the Fondation pour l’Audition (FPA IDA11), NCCR SNF agreement #51NF40_180888 and ANR–France 2030–IHU reConnect. This work has benefited from a French government grant managed by the Agence Nationale de la Recherche under the France 2030 programme, reference ANR-23-IAHU-0003. K.B.D. is also supported by the Institut Pasteur (G5, Human and Artificial Perception).
Author information
Contributions
O.D., A.-L.G. and B.G. conceived the research idea. O.D. developed the theoretical formalism with support from B.G. and D.Z., and performed the analytic calculations and numerical simulations. B.G., D.Z. and A.-L.G. verified the theoretical formalism. K.B.D. encouraged O.D. to investigate LLMs and discussed the findings of this work. All authors discussed the results and contributed to the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Peer review
Peer review information
Nature Computational Science thanks Sophie Slaats and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Discussion, Figs. 1–7 and Tables 1–3.
Source data
Source Data Fig. 2
Numerical source data for Fig. 2a–g.
Source Data Fig. 4
Numerical source data for Fig. 4a(ii–v), b(ii–v), c(ii–v).
Source Data Fig. 5
Numerical source data for Fig. 5.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dogonasheva, O., Doelling, K.B., Zakharov, D. et al. Rhythm-based hierarchical predictive computations support acoustic–semantic transformation in speech processing. Nat Comput Sci 5, 915–926 (2025). https://doi.org/10.1038/s43588-025-00876-9
DOI: https://doi.org/10.1038/s43588-025-00876-9
This article is cited by
- How neural rhythms can guide word recognition. Nature Computational Science (2025)


