Rhythm-based hierarchical predictive computations support acoustic−semantic transformation in speech processing

A preprint version of the article is available at bioRxiv.

Abstract

Unraveling how humans understand speech despite distortions has long intrigued researchers. A prominent hypothesis highlights the role of multiple endogenous brain rhythms in forming the computational context to predict speech structure and content. Yet how neural processes may implement rhythm-based context formation remains unclear. Here we propose the brain rhythm-based inference model (BRyBI) as a possible neural implementation of speech processing in the auditory cortex based on the interaction of endogenous brain rhythms in a predictive coding framework. BRyBI encodes key rhythmic processes for parsing spectro-temporal representations of the speech signal into phoneme sequences and for governing the formation of the phrasal context. BRyBI matches patterns of human performance in speech recognition tasks and explains contradictory experimental observations of rhythms during speech listening and their dependence on the informational aspect of speech (uncertainty and surprise). This work highlights the computational role of multiscale brain rhythms in predictive speech processing.

Fig. 1: Predictive Bayesian inference for rhythm-based dynamical speech formation in BRyBI.
Fig. 2: General performance of the BRyBI model for natural speech.
Fig. 3: Design of the verification: alternatives for the BRyBI model and experiments with temporary manipulation of speech.
Fig. 4: Human behavior and performance of models in experiments with speech under temporal manipulations.
Fig. 5: Prediction of the model for compressed speech spaced by phrases.

Data availability

Source data for Figs. 2d–g, 4a(ii−v), b(ii−v), c(ii−v) and 5 are available with this manuscript. The TIMIT dataset (refs. 67 and 68) analyzed during the current study is available via Figshare at https://doi.org/10.6084/m9.figshare.29877332.v1 (ref. 68).

Code availability

Code is available via Zenodo at https://doi.org/10.5281/zenodo.15727884 under the MIT License (ref. 69).

References

  1. Viemeister, N. F. & Wakefield, G. H. Temporal integration and multiple looks. J. Acoust. Soc. Am. 90, 858–865 (1991).

  2. Saberi, K. & Perrott, D. R. Cognitive restoration of reversed speech. Nature 398, 760 (1999).

  3. Poeppel, D. The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Commun. 41, 245–255 (2003).

  4. Giraud, A.-L. et al. Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron 56, 1127–1134 (2007).

  5. Ding, N., Melloni, L., Zhang, H., Tian, X. & Poeppel, D. Cortical tracking of hierarchical linguistic structures in connected speech. Nat. Neurosci. 19, 158–164 (2016).

  6. Ghitza, O. Acoustic-driven delta rhythms as prosodic markers. Lang. Cogn. Neurosci. 32, 545–561 (2017).

  7. Giraud, A.-L. & Poeppel, D. Cortical oscillations and speech processing: emerging computational principles and operations. Nat. Neurosci. 15, 511–517 (2012).

  8. Ronconi, L., Oosterhof, N. N., Bonmassar, C. & Melcher, D. Multiple oscillatory rhythms determine the temporal organization of perception. Proc. Natl Acad. Sci. USA 114, 13435–13440 (2017).

  9. Hyafil, A., Giraud, A.-L., Fontolan, L. & Gutkin, B. Neural cross-frequency coupling: connecting architectures, mechanisms, and functions. Trends Neurosci. 38, 725–740 (2015).

  10. Gross, J. et al. Speech rhythms and multiplexed oscillatory sensory coding in the human brain. PLoS Biol. 11, 1001752 (2013).

  11. Teng, X. & Poeppel, D. Theta and gamma bands encode acoustic dynamics over wide-ranging timescales. Cerebral Cortex 30, 2600–2614 (2020).

  12. Luo, H. & Poeppel, D. Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54, 1001–1010 (2007).

  13. Ghitza, O. & Greenberg, S. On the possible role of brain rhythms in speech perception: intelligibility of time-compressed speech with periodic and aperiodic insertions of silence. Phonetica 66, 113–126 (2009).

  14. Miller, G. A. & Licklider, J. C. The intelligibility of interrupted speech. J. Acoust. Soc. Am. 22, 167–173 (1950).

  15. Huggins, A. Temporally segmented speech. Percept. Psychophys. 18, 149–157 (1975).

  16. Garvey, W. D. The intelligibility of speeded speech. J. Exp. Psychol. 45, 102 (1953).

  17. Ghitza, O. Behavioral evidence for the role of cortical θ oscillations in determining auditory channel capacity for speech. Front. Psychol. 5, 652 (2014).

  18. Gransier, R., Peeters, S. & Wouters, J. The importance of temporal-fine structure to perceive time-compressed speech with and without the restoration of the syllabic rhythm. Sci. Rep. 13, 2874 (2023).

  19. Ding, N., Chatterjee, M. & Simon, J. Z. Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure. Neuroimage 88, 41–46 (2014).

  20. Molinaro, N. & Lizarazu, M. Delta (but not theta)-band cortical entrainment involves speech-specific processing. Eur. J. Neurosci. 48, 2642–2650 (2018).

  21. Rimmele, J. M., Poeppel, D. & Ghitza, O. Acoustically driven cortical δ oscillations underpin prosodic chunking. eNeuro 8, 4 (2021).

  22. Roehm, D., Schlesewsky, M., Bornkessel, I., Frisch, S. & Haider, H. Fractionating language comprehension via frequency characteristics of the human EEG. Neuroreport 15, 409–412 (2004).

  23. Donhauser, P. W. & Baillet, S. Two distinct neural timescales for predictive speech processing. Neuron 105, 385–393 (2020).

  24. Bai, F., Meyer, A. S. & Martin, A. E. Neural dynamics differentially encode phrases and sentences during spoken language comprehension. PLoS Biol. 20, 3001713 (2022).

  25. Park, H., Thut, G. & Gross, J. Predictive entrainment of natural speech through two fronto-motor top-down channels. Lang. Cogn. Neurosci. 35, 739–751 (2020).

  26. Ten Oever, S., Carta, S., Kaufeld, G. & Martin, A. E. Neural tracking of phrases in spoken language comprehension is automatic and task-dependent. eLife 11, 77468 (2022).

  27. Meyer, L., Henry, M. J., Gaston, P., Schmuck, N. & Friederici, A. D. Linguistic bias modulates interpretation of speech via neural delta-band oscillations. Cerebral Cortex 27, 4293–4302 (2017).

  28. Arnal, L. H., Doelling, K. B. & Poeppel, D. Delta–beta coupled oscillations underlie temporal prediction accuracy. Cerebral Cortex 25, 3077–3085 (2015).

  29. Doelling, K. B., Arnal, L. H. & Assaneo, M. F. Adaptive oscillators support Bayesian prediction in temporal processing. PLoS Comput. Biol. 19, 1011669 (2023).

  30. Molinaro, N. et al. Speech-brain phase coupling is enhanced in low contextual semantic predictability conditions. Neuropsychologia 156, 107830 (2021).

  31. Herbst, S. K. & Obleser, J. Implicit temporal predictability enhances pitch discrimination sensitivity and biases the phase of delta oscillations in auditory cortex. Neuroimage 203, 116198 (2019).

  32. Kaufeld, G. et al. Linguistic structure and meaning organize neural oscillations into a content-specific hierarchy. J. Neurosci. 40, 9467–9475 (2020).

  33. Hannemann, R., Obleser, J. & Eulitz, C. Top-down knowledge supports the retrieval of lexical information from degraded speech. Brain Res. 1153, 134–143 (2007).

  34. Hyafil, A., Fontolan, L., Kabdebon, C., Gutkin, B. & Giraud, A.-L. Speech encoding by coupled cortical theta and gamma oscillations. eLife 4, 06213 (2015).

  35. Hovsepyan, S., Olasagasti, I. & Giraud, A.-L. Combining predictive coding and neural oscillations enables online syllable recognition in natural speech. Nat. Commun. 11, 3117 (2020).

  36. Rao, R. P. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).

  37. Bastos, A. M. et al. Canonical microcircuits for predictive coding. Neuron 76, 695–711 (2012).

  38. Zhao, B., Dang, J., Zhang, G. & Unoki, M. Cortical oscillatory hierarchy for natural sentence processing. In Proc. Interspeech 2020 125–129 (2020).

  39. Poeppel, D., Idsardi, W. J. & Van Wassenhove, V. Speech perception at the interface of neurobiology and linguistics. Philos. Trans. R. Soc. B 363, 1071–1086 (2008).

  40. Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A. & Ghazanfar, A. A. The natural statistics of audiovisual speech. PLoS Comput. Biol. 5, 1000436 (2009).

  41. McClelland, J. L. & Rumelhart, D. E. An interactive activation model of context effects in letter perception: I. an account of basic findings. Psychol. Rev. 88, 375 (1981).

  42. Marchesotti, S. et al. Selective enhancement of low-gamma activity by tACS improves phonemic processing and reading accuracy in dyslexia. PLoS Biol. 18, 3000833 (2020).

  43. Friston, K. & Kiebel, S. Predictive coding under the free-energy principle. Philos. Trans. R. Soc. B 364, 1211–1221 (2009).

  44. Baevski, A., Zhou, Y., Mohamed, A. & Auli, M. wav2vec 2.0: a framework for self-supervised learning of speech representations. Adv. Neural Inf. Process. Syst. 33, 12449–12460 (2020).

  45. Duecker, K. et al. Challenges and approaches in the study of neural entrainment. J. Neurosci. 44, 40 (2024).

  46. Canales-Johnson, A. et al. Broadband dynamics rather than frequency-specific rhythms underlie prediction error in the primate auditory cortex. J. Neurosci. 41, 9374–9391 (2021).

  47. Giraud, A.-L. Oscillations for all? A commentary on Meyer, Sun & Martin (2020). Lang. Cogn. Neurosci. 35, 1106–1113 (2020).

  48. Schrimpf, M. et al. The neural architecture of language: integrative modeling converges on predictive processing. Proc. Natl Acad. Sci. USA 118, 2105646118 (2021).

  49. Taft, M. & Hambly, G. Exploring the cohort model of spoken word recognition. Cognition 22, 259–282 (1986).

  50. Norris, D. & McQueen, J. M. Shortlist B: a Bayesian model of continuous speech recognition. Psychol. Rev. 115, 357 (2008).

  51. McClelland, J. L. & Elman, J. L. The TRACE model of speech perception. Cogn. Psychol. 18, 1–86 (1986).

  52. Martin, A. E. A compositional neural architecture for language. J. Cogn. Neurosci. 32, 1407–1427 (2020).

  53. Su, Y., MacGregor, L. J., Olasagasti, I. & Giraud, A.-L. A deep hierarchy of predictions enables online meaning extraction in a computational model of human speech comprehension. PLoS Biol. 21, 3002046 (2023).

  54. Wang, X.-J. Neurophysiological and computational principles of cortical rhythms in cognition. Physiol. Rev. 90, 1195–1268 (2010).

  55. Dumont, G. & Gutkin, B. Macroscopic phase resetting-curves determine oscillatory coherence and signal transfer in inter-coupled neural circuits. PLoS Comput. Biol. 15, 1007019 (2019).

  56. Ten Oever, S. & Martin, A. E. An oscillating computational model can track pseudo-rhythmic speech by using linguistic predictions. eLife 10, e68066 (2021).

  57. Hovsepyan, S., Olasagasti, I. & Giraud, A.-L. Rhythmic modulation of prediction errors: a top-down gating role for the beta-range in speech processing. PLoS Comput. Biol. 19, 1011595 (2023).

  58. Giraud, A.-L. & Ramus, F. Neurogenetics and auditory processing in developmental dyslexia. Curr. Opin. Neurobiol. 23, 37–42 (2013).

  59. Lehongre, K., Morillon, B., Giraud, A.-L. & Ramus, F. Impaired auditory sampling in dyslexia: further evidence from combined fMRI and EEG. Front. Hum. Neurosci. 7, 454 (2013).

  60. Lehongre, K., Ramus, F., Villiermet, N., Schwartz, D. & Giraud, A.-L. Altered low-gamma sampling in auditory cortex accounts for the three main facets of dyslexia. Neuron 72, 1080–1090 (2011).

  61. Lallier, M. et al. in Reading and Dyslexia: from Basic Functions to Higher Order Cognition (eds Lachmann, T. & Weis, T.) 147–163 (Springer, 2018).

  62. Power, A. J., Colling, L. J., Mead, N., Barnes, L. & Goswami, U. Neural encoding of the speech envelope by children with developmental dyslexia. Brain Lang. 160, 1–10 (2016).

  63. Goswami, U. Sensory theories of developmental dyslexia: three challenges for research. Nat. Rev. Neurosci. 16, 43–54 (2015).

  64. Lizarazu, M. et al. Neural entrainment to speech and nonspeech in dyslexia: conceptual replication and extension of previous investigations. Cortex 137, 160–178 (2021).

  65. Elsner, B., Kugler, J., Pohl, M. & Mehrholz, J. Transcranial direct current stimulation (tDCS) for improving aphasia in adults with aphasia after stroke. Cochrane Database Syst. Rev. 5, 5 (2019).

  66. Xie, X., Hu, P., Tian, Y., Wang, K. & Bai, T. Transcranial alternating current stimulation enhances speech comprehension in chronic post-stroke aphasia patients: A single-blind sham-controlled study. Brain Stimul. 15, 1538–1540 (2022).

  67. Garofolo, J. S. TIMIT Acoustic Phonetic Continuous Speech Corpus (Linguistic Data Consortium, 1993).

  68. Dogonasheva, O. TIMIT dataset used for testing the BRyBI model. Figshare https://doi.org/10.6084/m9.figshare.29877332.v1 (2025).

  69. Dogonasheva, O. xomaiya/BRyBI: brain rhythm based inference (BRyBI) model of speech processing in auditory cortex. Zenodo https://doi.org/10.5281/zenodo.15727884 (2025).

Acknowledgements

We thank H. Stein for valuable comments on the manuscript. O.D. was supported by the Brain Program of the IDEAS Research Center and the Vernadski scholarship. This research was supported in part through computational resources of HPC facilities at HSE University. This work is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University) (D.Z.). B.G. was funded by Agence Nationale pour la Recherche (nos. ANR-17-EURE-0017 and ANR-10IDEX-0001-02), CNRS and INSERM. A.-L.G. was supported by Fondation pour l’Audition (FPA IDA11), NCCR snf agreement #51NF40_180888, ANR−France 2030−IHU reConnect. This work has benefited from a French government grant managed by the Agence Nationale de la Recherche under the France 2030 programme, reference ANR-23-IAHU-0003. K.B.D. is also supported by the Institut Pasteur (G5, Human and Artificial Perception).

Author information

Contributions

O.D., A.-L.G. and B.G. conceived the research idea. O.D. developed the theoretical formalism with support from B.G. and D.Z., and performed the analytic calculations and the numerical simulations. B.G., D.Z. and A.-L.G. verified the theoretical formalism. K.B.D. encouraged O.D. to investigate LLMs and discussed the findings of this work. All authors discussed the results and contributed to the final manuscript.

Corresponding author

Correspondence to Olesia Dogonasheva.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Peer review

Peer review information

Nature Computational Science thanks Sophie Slaats and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Discussion, Figs. 1–7 and Tables 1–3.

Reporting Summary

Peer Review file

Source data

Source Data Fig. 2

Numerical source data for Fig. 2a–g.

Source Data Fig. 4

Numerical source data for Fig. 4a(ii–v),b(ii–v),c(ii–v).

Source Data Fig. 5

Numerical source data for Fig. 5.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dogonasheva, O., Doelling, K.B., Zakharov, D. et al. Rhythm-based hierarchical predictive computations support acoustic−semantic transformation in speech processing. Nat Comput Sci 5, 915–926 (2025). https://doi.org/10.1038/s43588-025-00876-9

  • DOI: https://doi.org/10.1038/s43588-025-00876-9
