Replying to: T. Rathcke Communications Biology https://doi.org/10.1038/s42003-025-07544-8 (2025).
In her comment on our study, the author suggests that the bimodal outcome observed in the speech-to-speech synchronization test (SSS-test) may be a byproduct of the “P-center effect” rather than a reflection of participants’ auditory-motor synchronization abilities. I acknowledge that our previous studies have not sufficiently explored the role of perceptual P-centers in this bimodal outcome, and I agree that this aspect warrants thorough investigation through specifically designed experiments. However, based on the existing evidence, I believe that differences in participants’ auditory-motor abilities provide the most straightforward interpretation of the bimodal distribution. Below, I present several reasons supporting my contention.
The author relies on the analysis of discrete perceptual events to argue that our stimulus lacks rhythmicity compared to the timing of syllables produced with a metronome (Fig. 1 in the comment). However, our approach is fundamentally different: we focus on a continuous physical property of the perceived and produced signals, the envelope. It is well established that the envelope is one of the primary acoustic properties processed by the auditory cortex when listening to speech1,2 or music3,4. Thus, regardless of higher-order perceptual processing of the auditory stimulus, the rhythmic properties of the envelope are recovered by auditory areas. Additionally, brain activity in motor regions during speech production closely approximates the speech envelope5. The main goal of the SSS-test is to assess how well participants can synchronize their speech-motor activity to the rhythmic features of the envelope of the sound they are listening to. To this end, we designed a stimulus with a clear rhythmic envelope (Fig. 1a) and estimated the phase-locking value (PLV) between the envelope of the produced speech and the envelope of the stimulus, a measure of the stability of the phase lag between the two continuous signals6,7 (Fig. 1b). It is this measurement that yields the bimodal outcome: high PLVs indicate a stable alignment between the produced and perceived envelopes, while low PLVs indicate its absence. Since auditory activity while listening to speech tracks its envelope, and activity in motor areas has been shown to be proportional to the produced speech envelope, the bimodal outcome suggests a synchronization between these two regions in high synchronizers but not in low synchronizers. This is further supported by the observation that high and low synchronizers exhibit differences in the microstructural properties and total volume of the arcuate fasciculus, the principal white matter pathway connecting temporal (auditory) and frontal (motor/pre-motor) brain regions6.
Fig. 1: a Stimulus design for the implicit fixed rate version7. The upper inset shows the audio signal of the stimulus (gray) with its corresponding envelope (pink). The lower panel displays the spectrum of the stimulus envelope. b Sketch of the algorithm used to calculate speech-to-speech synchrony. The upper panel depicts the envelope of the perceived speech signal (pink line), while the lower panel shows the envelope of the produced speech signal (red line), both filtered around the presented syllabic rate (i.e., between 3.5 and 5.5 Hz). The phase-locking value estimates the stability across time of the phase difference (Δθ) between these two signals. c Average spectrogram of the envelope of the produced speech from 33 high synchronizers, evaluated using a task version with a stimulus syllabic rate increasing by 0.1 Hz every 10 seconds (red line). d After completing the task where the presented rate increased from 4.3 to 4.7 Hz in 0.1 Hz steps every 10 seconds, high (n = 33) and low (n = 22) synchronizers reported whether they perceived an increment, decrement, or no change in stimulus rhythm. The panel shows participants’ responses. Panels c and d were adapted from ref. 6.
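To make the synchrony measure sketched in Fig. 1b concrete, the following is a minimal sketch of one way to compute such a phase-locking value. It assumes the two amplitude envelopes have already been extracted and share a sampling rate (here, a hypothetical 100 Hz); the filter order and parameter names are illustrative and do not reproduce the published implementation7.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def phase_locking_value(env_perceived, env_produced, fs=100.0, band=(3.5, 5.5)):
    """Stability of the phase lag between two speech envelopes.

    Both inputs are amplitude envelopes sampled at `fs` Hz. They are
    band-pass filtered around the presented syllabic rate (3.5-5.5 Hz),
    their instantaneous phases are extracted via the Hilbert transform,
    and the PLV is the length of the mean resultant vector of the phase
    difference: 1 = perfectly stable lag, 0 = no consistent relation.
    """
    # Zero-phase band-pass filter around the presented syllabic rate.
    nyq = fs / 2.0
    b, a = butter(2, [band[0] / nyq, band[1] / nyq], btype="band")
    x = filtfilt(b, a, env_perceived)
    y = filtfilt(b, a, env_produced)

    # Instantaneous phase of each filtered envelope.
    theta_x = np.angle(hilbert(x))
    theta_y = np.angle(hilbert(y))

    # PLV = |mean over time of exp(i * delta_theta)|.
    return np.abs(np.mean(np.exp(1j * (theta_x - theta_y))))
```

Computed per participant over the recording, a bimodal distribution of such values across participants is what separates high from low synchronizers.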
One interpretation presented by the author is that high synchronizers ignore precise acoustic timing and simply produce syllables at a rate broadly commensurate with the auditory prompt. However, when we introduced an imperceptible increment in the syllabic rate, high synchronizers rapidly adapted their own rate (Fig. 1c). It is difficult to reconcile this fast, involuntary adjustment with the interpretation that high synchronizers merely establish the distal (i.e., overall) rate of the spoken input to synchronize their speech. Additionally, we found no difference between high and low synchronizers in the reported perception of changes in the presentation rate (Fig. 1d), arguing against a difference in perceived rhythmicity between the groups6.
Regarding low synchronizers, the author proposes that they conscientiously followed the prompt, trying to synchronize with the jittered P-centers of the concatenated syllables. However, the implicit version of the test, in which participants are not explicitly instructed to synchronize but instead perform an orthogonal syllable recall task, also recovers the bimodal outcome6. In this design, participants are instructed to pay attention to the external stream of syllables because they will be tested afterward on which syllables were presented. The production of “tahs” is framed as a way to increase the difficulty of the task, rather than as a synchronization exercise. Crucially, even in this version, where participants are not asked to synchronize, the results still show a clear distinction between high and low synchronizers. Moreover, if all low synchronizers employed a uniform synchronization strategy, then presenting the same auditory stimulus to every participant, as the SSS-test does, should reveal a consistent temporal pattern among them. Instead, some low synchronizers maintain a quicker pace than the stimulus while others adopt a slower one, and their rates also differ in stability, with some participants more consistent than others. These findings argue against a unified strategy among low synchronizers.
The comment reads: ‘In a sense, then, low synchronizers were actually better at synchronizing with the external auditory prompt than high synchronizers.’ Contrary to this argument, high synchronizers were found to have more years of musical experience than low synchronizers6,8, and it is widely recognized that musicians generally exhibit superior auditory-motor synchronization abilities compared to non-musicians9,10. Additionally, when exposed to a sequence comprising repetitions of the same acoustic unit (i.e., a tone or the syllable “go”), which undoubtedly yields a rhythmic pattern of P-centers, low synchronizers still performed less effectively than high synchronizers, even though the bimodal effect disappeared11. This supports the hypothesis that individuals classified as low synchronizers indeed possess inferior auditory-motor synchronization skills compared to high synchronizers.
Finally, the author proposes that the observed grouping might stem from individual differences in the interaction between feedback and feedforward control mechanisms during speech production, as described in the neurocomputational DIVA model. However, the bimodal distribution persists when participants are instructed to clap in sync with the stimulus instead of whispering ‘tah’11. Since clapping does not engage the speech production system, the grouping can hardly be attributed to speech-specific control mechanisms.
Based on the reasons outlined above, I favor the hypothesis of qualitatively different auditory-motor integration abilities between groups over the alternative hypothesis proposed by the author. Nevertheless, I recognize the significance of future studies investigating the role of the P-center, as they will ultimately disentangle these two interpretations.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
References
Luo, H. & Poeppel, D. Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54, 1001–1010 (2007).
Ahissar, E. et al. Speech comprehension is correlated with temporal response patterns recorded from auditory cortex. Proc. Natl Acad. Sci. USA 98, 13367–13372 (2001).
Di Liberto, G. M., Pelofi, C., Shamma, S. & de Cheveigné, A. Musical expertise enhances the cortical tracking of the acoustic envelope during naturalistic music listening. Acoust. Sci. Technol. 41, 361–364 (2020).
Doelling, K. B., Assaneo, M. F., Bevilacqua, D., Pesaran, B. & Poeppel, D. An oscillator model better predicts cortical entrainment to music. Proc. Natl Acad. Sci. USA 116, 10113–10121 (2019).
Ruspantini, I. et al. Corticomuscular coherence is tuned to the spontaneous rhythmicity of speech at 2–3 Hz. J. Neurosci. 32, 3786–3790 (2012).
Assaneo, M. F. et al. Spontaneous synchronization to speech reveals neural mechanisms facilitating language learning. Nat. Neurosci. 22, 627–632 (2019).
Lizcano-Cortés, F. et al. Speech-to-Speech Synchronization protocol to classify human participants as high or low auditory-motor synchronizers. STAR Protoc. 3, 101248 (2022).
Rimmele, J. M. et al. Musical sophistication and speech auditory-motor coupling: easy tests for quick answers. Front. Neurosci. 15, 764342 (2022).
Franěk, M., Mates, J., Radil, T., Beck, K. & Pöppel, E. Finger tapping in musicians and nonmusicians. Int. J. Psychophysiol. 11, 277–279 (1991).
Repp, B. H. Sensorimotor synchronization and perception of timing: Effects of music training and task experience. Hum. Mov. Sci. 29, 200–213 (2010).
Mares, C., Echavarría Solana, R. & Assaneo, M. F. Auditory-motor synchronization varies among individuals and is critically shaped by acoustic features. Commun. Biol. 6, 1–10 (2023).
Acknowledgements
This work was supported by DGAPA-UNAM through the PAPIIT grant IN206825.
Author information
Contributions
M.F.A. conceptualization and writing.
Ethics declarations
Competing interests
The author declares no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Cite this article
Assaneo, M.F. Reply to: The timing of speech-to-speech synchronization is governed by the P-center. Commun Biol 8, 231 (2025). https://doi.org/10.1038/s42003-025-07546-6