Replying to: T. Rathcke Communications Biology https://doi.org/10.1038/s42003-025-07544-8 (2025).
In her comment on our study, the author suggests that the bimodal outcome observed in the speech-to-speech synchronization test (SSS-test) may be a byproduct of the “P-center effect” rather than a reflection of participants’ auditory-motor synchronization abilities. I acknowledge that our previous studies have not sufficiently explored the role of perceptual P-centers in this bimodal outcome, and I agree that this aspect warrants thorough investigation through specifically designed experiments. However, based on the existing evidence, I believe that differences in participants’ auditory-motor abilities provide the most straightforward interpretation of the bimodal distribution. Below, I present several reasons supporting my contention.
The author relies on the analysis of discrete perceptual events to argue that our stimulus lacks rhythmicity compared to the timing of syllables produced with a metronome (Fig. 1 in the comment). However, our approach is fundamentally different: we focus on a continuous physical property of the perceived and produced signals, the envelope. It is well established that the envelope is one of the primary acoustic properties processed by the auditory cortex when listening to speech1,2 or music3,4. Thus, regardless of higher-order perceptual processing of the auditory stimulus, the rhythmic properties of the envelope are recovered by auditory areas. Additionally, brain activity in motor regions during speech production closely approximates the speech envelope5. The main goal of the SSS-test is to assess how well participants can synchronize their speech-motor activity to the rhythmic features of the envelope of the sound they are listening to. To this end, we designed a stimulus with a clear rhythmic envelope (Fig. 1a) and estimated the phase-locking value (PLV) between the envelope of the produced speech and the envelope of the stimulus, a measure of the stability of the phase lag between the two continuous signals6,7 (Fig. 1b). It is this measurement that yields the bimodal outcome: high PLVs indicate a stable alignment between the produced and perceived envelopes, while low PLVs indicate its absence. Since auditory activity while listening to speech tracks its envelope, and activity in motor areas has been shown to be proportional to the produced speech envelope, the bimodal outcome suggests a synchronization between these two regions in high synchronizers but not in low synchronizers. This is further supported by the observation that high and low synchronizers exhibit differences in the microstructural properties and total volume of the arcuate fasciculus, the principal white matter pathway connecting temporal (auditory) and frontal (motor/pre-motor) brain regions6.
Fig. 1: a Stimulus design for the implicit fixed rate version7. The upper inset shows the audio signal of the stimulus (gray) with its corresponding envelope (pink). The lower panel displays the spectrum of the stimulus envelope. b Sketch of the algorithm used to calculate speech-to-speech synchrony. The upper panel depicts the envelope of the perceived speech signal (pink line), while the lower panel shows the envelope of the produced speech signal (red line), both filtered around the presented syllabic rate (i.e., between 3.5 and 5.5 Hz). The phase-locking value estimates the stability across time of the phase difference (Δθ) between these two signals. c Average spectrogram of the envelope of the produced speech from 33 high synchronizers, evaluated using a task version with a stimulus syllabic rate increasing by 0.1 Hz every 10 seconds (red line). d After completing the task where the presented rate increased from 4.3 to 4.7 Hz in 0.1 Hz steps every 10 seconds, high (n = 33) and low (n = 22) synchronizers reported whether they perceived an increment, decrement, or no change in stimulus rhythm. The panel shows participants’ responses. Panels c and d were adapted from ref. 6.
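To make the synchrony measure sketched in Fig. 1b concrete, the following is a minimal sketch of one way to compute such a phase-locking value. It assumes the two amplitude envelopes have already been extracted and share a sampling rate (here, a hypothetical 100 Hz); the filter order and parameter names are illustrative and do not reproduce the published implementation7.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def phase_locking_value(env_perceived, env_produced, fs=100.0, band=(3.5, 5.5)):
    """Stability of the phase lag between two speech envelopes.

    Both inputs are amplitude envelopes sampled at `fs` Hz. They are
    band-pass filtered around the presented syllabic rate (3.5-5.5 Hz),
    their instantaneous phases are extracted via the Hilbert transform,
    and the PLV is the length of the mean resultant vector of the phase
    difference: 1 = perfectly stable lag, 0 = no consistent relation.
    """
    # Zero-phase band-pass filter around the presented syllabic rate.
    nyq = fs / 2.0
    b, a = butter(2, [band[0] / nyq, band[1] / nyq], btype="band")
    x = filtfilt(b, a, env_perceived)
    y = filtfilt(b, a, env_produced)

    # Instantaneous phase of each filtered envelope.
    theta_x = np.angle(hilbert(x))
    theta_y = np.angle(hilbert(y))

    # PLV = |mean over time of exp(i * delta_theta)|.
    return np.abs(np.mean(np.exp(1j * (theta_x - theta_y))))
```

Computed per participant over the recording, a bimodal distribution of such values across participants is what separates high from low synchronizers.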
One interpretation presented by the author is that high synchronizers ignore precise acoustic timing and simply produce syllables at a rate broadly commensurate with the auditory prompt. However, when we introduced an imperceptible increment in the syllabic rate, high synchronizers rapidly adapted their own rate (Fig. 1c). It is difficult to reconcile this fast, involuntary adjustment with the interpretation that high synchronizers merely establish the distal (i.e., overall) rate of the spoken input to synchronize their speech. Additionally, we found no difference between high and low synchronizers in the reported perception of changes in the presentation rate (Fig. 1d), arguing against a difference in perceived rhythmicity between the groups6.
Regarding low synchronizers, the author proposes that they conscientiously followed the prompt, trying to synchronize with the jittered P-centers of the concatenated syllables. However, the implicit version of the test, in which participants are not explicitly instructed to synchronize but instead perform an orthogonal syllable recall task, also recovers the bimodal outcome6. In this design, participants are instructed to pay attention to the external stream of syllables because they will be tested afterward on which syllables were presented. The production of “tahs” is framed as a way to increase the difficulty of the task, rather than as a synchronization exercise. Crucially, even in this version, where participants are not asked to synchronize, the results still show a clear distinction between high and low synchronizers. Moreover, if all low synchronizers employed a uniform synchronization strategy, then presenting the same auditory stimulus to every participant, as the SSS-test does, should reveal a consistent temporal pattern among them. Instead, some low synchronizers maintain a quicker pace than the stimulus while others adopt a slower one, and their rates also differ in stability, with some participants more consistent than others. These findings argue against a unified strategy among low synchronizers.
The comment reads: ‘In a sense, then, low synchronizers were actually better at synchronizing with the external auditory prompt than high synchronizers.’ Contrary to this argument, high synchronizers were found to have more years of musical experience than low synchronizers6,8, and it is widely recognized that musicians generally exhibit superior auditory-motor synchronization abilities compared to non-musicians9,10. Additionally, when exposed to a sequence comprising repetitions of the same acoustic unit (i.e., a tone or the syllable “go”), which undoubtedly yields a rhythmic pattern of P-centers, low synchronizers still performed less effectively than high synchronizers, even though the bimodal effect disappeared11. This supports the hypothesis that individuals classified as low synchronizers indeed possess inferior auditory-motor synchronization skills compared to high synchronizers.
Finally, the author proposes that the observed grouping might stem from individual differences in the interaction between feedback and feedforward control mechanisms during speech production, as described in the neurocomputational DIVA model. However, the bimodal distribution persists when participants are instructed to clap in sync with the stimulus instead of whispering ‘tah’11. Since clapping does not engage the speech production system, the grouping can hardly be attributed to speech-specific control mechanisms.
Based on the reasons outlined above, I favor the hypothesis of qualitatively different auditory-motor integration abilities between groups over the alternative hypothesis proposed by the author. Nevertheless, I recognize the significance of future studies investigating the role of the P-center, as they will ultimately disentangle these two interpretations.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
References
Luo, H. & Poeppel, D. Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54, 1001–1010 (2007).
Ahissar, E. et al. Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proc. Natl Acad. Sci. USA 98, 13367–13372 (2001).
Di Liberto, G. M., Pelofi, C., Shamma, S. & de Cheveigné, A. Musical expertise enhances the cortical tracking of the acoustic envelope during naturalistic music listening. Acoust. Sci. Technol. 41, 361–364 (2020).
Doelling, K. B., Assaneo, M. F., Bevilacqua, D., Pesaran, B. & Poeppel, D. An oscillator model better predicts cortical entrainment to music. Proc. Natl Acad. Sci. USA 116, 10113–10121 (2019).
Ruspantini, I. et al. Corticomuscular coherence is tuned to the spontaneous rhythmicity of speech at 2–3 Hz. J. Neurosci. 32, 3786–3790 (2012).
Assaneo, M. F. et al. Spontaneous synchronization to speech reveals neural mechanisms facilitating language learning. Nat. Neurosci. 22, 627–632 (2019).
Lizcano-Cortés, F. et al. Speech-to-Speech Synchronization protocol to classify human participants as high or low auditory-motor synchronizers. STAR Protoc. 3, 101248 (2022).
Rimmele, J. M. et al. Musical sophistication and speech auditory-motor coupling: easy tests for quick answers. Front. Neurosci. 15, 764342 (2022).
Franěk, M., Mates, J., Radil, T., Beck, K. & Pöppel, E. Finger tapping in musicians and nonmusicians. Int. J. Psychophysiol. 11, 277–279 (1991).
Repp, B. H. Sensorimotor synchronization and perception of timing: Effects of music training and task experience. Hum. Mov. Sci. 29, 200–213 (2010).
Mares, C., Echavarría Solana, R. & Assaneo, M. F. Auditory-motor synchronization varies among individuals and is critically shaped by acoustic features. Commun. Biol. 6, 1–10 (2023).
Acknowledgements
This work was supported by DGAPA-UNAM through the PAPIIT grant IN206825.
Author information
Contributions
M.F.A. conceptualization and writing.
Ethics declarations
Competing interests
The author declares no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Cite this article
Assaneo, M.F. Reply to: The timing of speech-to-speech synchronization is governed by the P-center. Commun Biol 8, 231 (2025). https://doi.org/10.1038/s42003-025-07546-6