replying to: R. Anselme et al.; Scientific Reports https://doi.org/10.1038/s41598-025-94850-0 (2022).

In Winter et al.1, we reported evidence for a cross-modal iconic association between the trilled r sound and the sensory dimension of roughness in spoken vocabularies. One of our studies showed that the word ‘rough’ is more likely to contain an r than the word ‘smooth’ across 332 languages, but only when the language has a trilled r. Anselme, Pellegrino & Dediu2 (APD) present a reanalysis of our data based on a more rigorous coding of trilled versus non-trilled r. They confirm the link between trilled r and roughness but find an equally strong effect for languages with a non-trilled r, suggesting that the unique properties of trills ‘cannot be the main cause to this tactile-sound association.’ We question this conclusion. In making our case, we point out an important methodological dilemma: while pre-existing cross-linguistic data sets are often noisy, using manual coding to reduce this noise may introduce biases that distort the results. Indeed, a simulation shows that the effect size for languages with non-trilled r in APD’s analysis is larger than expected under an unbiased approach to recoding. While APD convincingly demonstrate the existence of an effect for non-trilled r, it is not clear that the effect is as strong as that for trilled r. It remains possible that trills play a distinctive role in carrying an iconic association with roughness across vocabularies.

Study 3 in our paper divided languages into those with trilled r and those with non-trilled r, requiring coding decisions about the type of r in each language. Realising that these decisions could be biased by our trilled-r-for-rough hypothesis, we (i) made them before checking whether the words for ‘rough’/‘smooth’ fit our prediction; and (ii) used automated and shallow heuristic strategies for coding, relying on pre-existing phoneme inventory data from PHOIBLE3 where available.

APD revised the coding for trilled r based on a review of available grammars. This painstaking manual recoding is welcome. However, APD’s coding documentation4 makes it clear that (i) the words for ‘rough’ and ‘smooth’ were consulted (and are sometimes referenced in the notes) and (ii) the decisions were often uncertain. The recoding is clearly based on careful phonetic judgment, but unconscious biases can nonetheless influence the results. For instance, one may apply additional scrutiny to languages originally coded with a trill: indeed, based on Fig. 1 of APD, 41% of the decisions were revised for languages originally coded with a trill, compared to 32% for other languages. Although this could be due to our original trill codes being less accurate, bias is also possible.

Fig. 1

Simulated estimated effect sizes for the reshuffled results. The estimates shown here are all posterior means of the difference between the probability of r in words with the meanings ‘rough’ versus ‘smooth’. The solid line shows our original estimate, while the dashed line shows APD’s revised estimate.

It is difficult to measure coding bias, but we find the discrepancy between our original results and APD’s revised results striking. To quantify this intuition, we ran a simulation in which we reshuffled the revised codes separately within the trilled r and non-trilled r groups, and then refitted the original model to languages with non-trilled r (see osf.io/6nma2/). This was repeated 2,000 times. In each iteration of the simulation, the numbers of reassignments are the same as in APD: for instance, the same number of languages with trilled r are recoded as non-trilled. The simulation is summarised in Fig. 1, which shows how a neutral set of revisions would typically change the results. In our original study, we found an estimated difference of 0.02 between the probability of r in ‘rough’ versus ‘smooth’ words. Most of the simulated samples (84%) show a higher estimate, likely because of the distribution of reassignments (e.g. trills were more likely to be revised than non-trills). APD’s revised effect size is 0.19, which is extreme even compared to this shifted baseline, sitting well above the 99th percentile.
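The reshuffling logic described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the code used for the reported simulation (that is available at osf.io/6nma2/). The function names and array layout are hypothetical, the codes are treated as binary (trilled versus non-trilled), and a raw difference in proportions stands in for the posterior mean of the original Bayesian model.

```python
import numpy as np


def reshuffle_within_groups(original_trill, revised_trill, rng):
    """Permute the revised codes separately within each ORIGINAL group.

    This keeps the number of reassignments in each direction fixed,
    while making it random which languages are reassigned.
    """
    shuffled = revised_trill.copy()
    for group in (True, False):
        mask = original_trill == group
        shuffled[mask] = rng.permutation(revised_trill[mask])
    return shuffled


def effect_size(has_r_rough, has_r_smooth):
    """Simplified stand-in for the model estimate (an assumption):
    proportion of 'rough' words with r minus that of 'smooth' words."""
    return has_r_rough.mean() - has_r_smooth.mean()


def simulate(original_trill, revised_trill, has_r_rough, has_r_smooth,
             n_iter=2000, seed=0):
    """Return one effect-size estimate per reshuffled coding,
    computed only on languages coded as non-trilled in that iteration."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_iter)
    for i in range(n_iter):
        shuffled = reshuffle_within_groups(original_trill, revised_trill, rng)
        non_trilled = ~shuffled
        estimates[i] = effect_size(has_r_rough[non_trilled],
                                   has_r_smooth[non_trilled])
    return estimates


def share_below(estimates, observed):
    """Fraction of simulated estimates that fall below an observed value."""
    return float((estimates < observed).mean())
```

Given boolean arrays with one entry per language, `share_below(simulate(...), 0.19)` would give the share of neutral reshufflings that yield a smaller effect than the revised estimate, which is the kind of comparison summarised in Fig. 1.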

This discrepancy may reflect (i) biases in our original coding, (ii) biases in APD’s revised coding, or (iii) systematic differences between the simulated versus the actual process of revision. As for (i), we acknowledge that our coding could have been subject to bias. In particular, there were 15 languages where we overrode the phonemic codes from PHOIBLE. When we use the PHOIBLE codes instead, the results for languages with non-trilled r move slightly towards those reported by APD, with the probability of r estimated at 0.22 [0.11, 0.36] for ‘smooth’ and 0.32 [0.17, 0.5] for ‘rough’. APD’s results are more extreme, suggesting that (ii) may also have played a role. Regarding (iii), simulations always carry this risk, and therefore these findings are illustrative only.

Our goal is not to adjudicate between our original and APD’s revised data, but to point out that manual coding is always susceptible to biases, especially when coding decisions are uncertain. This potential for bias must be carefully balanced against the noise introduced by using pre-existing data or simple coding heuristics. It is not self-evident that manual coding is always preferable.

In summary, we agree with APD’s finding that languages with non-trilled r also show an r-for-rough effect. However, in our view, it remains likely that trilled r carries a particularly strong iconic association with roughness, while other r’s show a real but less consistent effect. Currently, only the association of trills and roughness is rooted in a mechanistic explanation. Moreover, even APD’s own results show a numerically stronger effect for trills (0.26 on the same scale as Fig. 1) versus non-trills (0.19), with no strong evidence to rule out a difference.

There are several ways to reconcile the distinctive role of trills in the r-for-rough pattern with its presence in languages with non-trilled r. First, the pattern may persist through historical inertia even after the trilled r realisation is lost. This is what seems to have happened in English, where trilled r is virtually extinct. There are likely other languages in our sample with a similar historical trajectory, given that trills often give rise to other rhotics5. Second, language variation and contact mean that a trill realisation is often accessible to speakers even when it is not typically used in a variety. Thus, trills can be used in an iconic function in American English, as in the ‘R-r-ruffles have r-r-ridges’ example from our original paper; or in Japanese, where, for example, singer Ringo Sheena uses trilled r’s in Tsumi to Batsu to portray an edgy persona. Trilled r is not the conventional realisation for either language, yet it is available for expressive purposes. Third, speakers may regard a trill as a ‘prototypical r’ even if it rarely surfaces in their variety (e.g., apical trills form a convention for Bühnendeutsch ‘stage German’, even for actors who speak varieties without this realisation). This fits into a broader pattern whereby hyperarticulated realisations of phonemes are identified as prototypical6. It is, of course, precisely such hyperarticulation that lends itself well to the expressive and depictive functions of iconicity7. Finally, it is possible that other rhotics also show phonetic properties that lend themselves to an association with roughness, though not to the same extent as trills.

We concur with APD that getting to the root of the r-for-rough pattern ‘may require detailed historical investigations and experimental cross-linguistic work’. Experimental work is already underway for trills in particular, showing that listeners from diverse languages perceive trills as rough8. Based on the arguments above, we suggest that such work should also explicitly compare different rhotics with respect to their iconic potential; and explore phonetic manipulations along different dimensions (e.g. number of trill cycles, amount of fricative noise) to home in on those aspects of rhotics that associate them with roughness.