Abstract
Are the sound systems of languages ecologically adaptive like other aspects of human behavior? In previous substantive explorations of the climate–language nexus, the hypothesis that desiccation affects the tone systems of languages was not well supported. The lack of analysis of voice quality data from natural speech undermines the credibility of the following two key premises: the compromised voice quality caused by desiccated ambient air and constrained use of phonemic tone due to a desiccated larynx. Here, the full chain of causation, humidity → voice quality → number of tones, is for the first time strongly supported by direct experimental tests based on a large speech database (China’s Language Resources Protection Project). Voice quality data is sampled from a recording set that includes 997 language varieties in China. Each language is represented by about 1200 sound files, amounting to a total of 1,174,686 recordings. Tonally rich languages are distributed throughout China and vary in their number of tones and in the climatic conditions of their speakers. The results show that, first, the effect of humidity is large enough to influence the voice quality of common speakers in a naturalistic environment; secondly, poorer voice quality is more likely to be observed in speakers of non-tonal languages and languages with fewer tones. Objective measures of phonatory capabilities help to disentangle the humidity effect from the contribution of phylogenetic and areal relatedness to the tone system. The prediction of ecological adaptation of speech is first verified through voice quality analysis. Humidity is observed to be related to synchronic variation in tonality. Concurrently, the findings offer a potential trigger for diachronic changes in tone systems.
Introduction
Human behavior, such as phenotypes and survival strategies, is well adapted to most features of contemporary environments (Chagnon and Irons, 2002). For example, the internal nasal fossa and mid-facial morphology show an ecogeographical distribution consistent with climate adaptation, partly because the nasal cavity plays a major role in adapting to extremely cold and dry climates (Maddux et al., 2017; Evteev et al., 2014). Ancient farmers developed adaptive strategies to cope with changes in crop yields under environmental and climate changes such as cooling events (d’Alpoim Guedes and Bocinsky, 2018). Among all human behaviors, using language to communicate is one of the most distinctive traits that distinguish humans from animals. In view of the universal adaptability of human biology and human behavior, a growing number of studies have attempted to trace the relationship between ecological factors and possible adaptive language elements.
Previous studies have provided evidence for the hypothesis that changes in the sound system are ecologically adaptive. These studies have explored various correlations, such as the correlation between the reduction of ambient air pressure and the use of ejectives, and the correlation between climate and sonority classes (Maddieson and Coupé, 2015; Munroe et al., 2009; Ember and Ember, 2007; Everett, 2013; Everett et al., 2015; Everett, 2017). One such exploration is presented by Everett et al. (2015), who demonstrated a statistical association between ambient desiccation and the absence of lexical tone. The authors submit that complex tone should be more difficult to achieve in arid climates than in warmer and more humid climates, given that inhalation of dry air impacts vocal fold physiology and that production of tones requires relatively precise manipulation of the vocal folds. Most commentaries have agreed in general that language is ecologically adaptive (Boer, 2016; Donohue, 2016; Ladd, 2016; Everett et al., 2016b; Hammarström, 2016; Collins, 2016; Winter and Wedel, 2016). However, there has been much debate about the more specific hypothesis vis-à-vis desiccation and tonality. On the one hand, the suggestion is supported, at least indirectly, by extensive experimental evidence from laryngology, and Everett et al. (2015) offered global, continental, and linguistic family-level data consistent with the geography–tone association. On the other hand, the discovered patterns of humidity–tonality are not buttressed by natural speech analysis, which leaves the two premises of the hypothesis untested: (1) that desiccated ambient air results in compromised voice quality and (2) that a desiccated larynx constrains the use of phonemic tone (Everett, 2017; Ladd, 2016; Everett, 2021).
In the absence of support for these two key premises, the discovered patterns of humidity–tonality could potentially be interpreted as an epiphenomenon of the geographical distribution of languages (Collins, 2016; Winter and Wedel, 2016). Subsequent works presented analyses with a continuous measure of tone and found a positive correlation between humidity and tone, but the significance disappears when controlling for relative genealogical distance (Hammarström, 2016; Roberts, 2018). Thus, whether the absence of ambient humidity negatively correlates with the presence of tone remains unresolved.
Regarding the first premise, namely that desiccated air affects voice quality, the debate revolves around the question of whether the desiccation effect is large enough to impact laryngeal pitch control (Donohue, 2016). Everett et al. offered a brief meta-analysis of relevant studies from laryngology, showing that laryngeal desiccation impacts the viscoelasticity of the vocal folds and that the desiccation of vocal cords leads to greater perceived phonatory effort on the part of speakers (Everett et al., 2015; Everett, 2017). However, some have argued that the impact of desiccated air on the vocal cords is minor (Boer, 2016). Everett et al. countered this view with reference to several lines of evidence relating to special environments or special populations such as winter athletes, singers, and patients with respiratory diseases (Everett et al., 2016b; Sue-Chu, 2012; Koskela, 2007). Although previous studies on the effect of hydration on voice quality were conducted with normal subjects (Leydon et al., 2009; Alves et al., 2019), they were conducted under laboratory conditions. None of the studies has provided direct evidence that the effect of desiccation, after long-term provocation periods, is large enough to increase the jitter of common speakers, in a naturalistic environment in which speech is used.
Concerning the second premise, that a desiccated larynx constrains the use of phonemic tone, it has been debated whether languages with complex tones really do rely more on precise laryngeal pitch control (Everett, 2017; Ladd, 2016). Everett et al. (2015) categorized languages as having or not having ‘complex tonality’. Here ‘complex tone’ is defined as a tone system with three or more tonemic contrasts according to Maddieson (2013). Yet languages vary non-discretely in tone, and the characterization of complex tonality by Everett et al. (2015) is somewhat simplistic. Tone is not simply pitch or fundamental frequency but can involve several other kinds of cues, and it is often not possible to identify a single cue that is responsible for all contrasts (Donohue, 2016). For instance, many Hmongic languages are known to express different tonal categories through a mesh of cues that includes breathiness and creakiness in addition to modal phonation. Thus, exclusively investigating pitch or fundamental frequency when contrasting tone systems cross-linguistically is insufficient as a basis for judging whether tonal languages require more precise laryngeal control (Ladd, 2016; Gussenhoven, 2016). Whether tonal complexity involves phonemic pitch or length and specific phonations, regular vocal fold vibration is required. Measurements of perturbation in voice frequency (jitter) and amplitude (shimmer) can quantify the regularity and hence the stability of vocal fold vibration (Brockmann et al., 2011). It is not clear whether minor effects of humidity on jitter rates can impact tone production in normal speech (Everett, 2017; Ladd, 2016), since studies have not yet addressed this issue.
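The two perturbation measures named above can be illustrated with a minimal sketch. It assumes the common "local" definitions (mean absolute difference between consecutive cycles, divided by the mean value, expressed as a percentage); the function name and the period and amplitude values are hypothetical, and the variant actually used in the study is the one specified in its Supplementary Text.

```python
from statistics import fmean


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycle values,
    relative to the mean value, in percent.

    Applied to glottal periods this gives a local jitter estimate;
    applied to per-cycle peak amplitudes it gives a local shimmer estimate.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * fmean(diffs) / fmean(values)


# Hypothetical measurements for five consecutive glottal cycles.
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]   # seconds
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]          # arbitrary linear units

jitter = local_perturbation_percent(periods)
shimmer = local_perturbation_percent(amplitudes)
print(f"jitter = {jitter:.3f}%, shimmer = {shimmer:.3f}%")
```

A perfectly periodic signal yields 0% on both measures, so higher values indicate less regular vocal fold vibration.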
Based on the two premises, the suggestion was made to use voice quality measurements as an intermediate process to help establish a full causal chain fleshing out the hypothesis of the relationship between humidity and tonality (Everett et al., 2015). Existing research (Everett et al., 2015; Hammarström, 2016; Roberts, 2018) has simply tested a correlation between the variables at either end of the chain. That is, previous work analyzed correlations between humidity and tones in databases only, without analyzing actual speech data, the key middle link of the causal chain. When the distribution of tone is predicted directly from humidity, multicollinearity between environmental and sociocultural predictors (such as population, language families, and language contact) makes it hard to derive causal mechanisms from correlation patterns. A direct experimental test of the full chain of causation, humidity → voice quality → tonality, requires voice quality measurements extracted from natural speech and is therefore difficult without a cross-linguistic speech database. All audio in the database should be produced according to the same standard because acoustic voice features are sensitive to environmental noise and signal amplitude, which depends on the location of equipment and the quality of the microphone (Fahed et al., 2022; Uloza et al., 2021). Additionally, acoustic voice features of different genders and age groups also show significant differences (Brockmann et al., 2011; Schultz et al., 2021).
Here, we establish two more fine-grained causal links, humidity → voice quality and voice quality → number of tones, aiming to strengthen the evidence linking humidity with tonality. Currently, China’s Language Resources Protection Project (Zhongguo Yuyan Ziyuan Baohu Gongcheng, abbreviated as YuBao) provides a large, standardized, high-quality audio and video database that can be used to extract voice quality data from natural speech. The database covers language varieties from 1718 locations in China. Its creators set strict and uniform standards for the recording process, including recording environment, equipment, parameters, and speaker selection. Other speech databases around the world struggle to meet the requirements of large scale, high recording quality, and uniform recording standards at the same time (Heggarty et al., 2019). In addition to meeting the above conditions, the YuBao database offers two advantages in studying humidity–tonality patterns: (1) the database covers a large number of tonally rich languages, and (2) the locations of the sampled languages cover diverse climates. Owing to tremendous differences in latitude, longitude, and altitude, the climate of China is extremely diverse, ranging from tropical in the far south to subarctic in the far north and alpine in the higher elevations of the Tibetan Plateau (Lü and Li, 2012). China is a natural testing ground for the effects of desiccated air on the larynx, given the great variation in the number of tones across different languages and differences in the climatic conditions experienced by the speakers of these languages (Collins, 2016) (see Fig. 1a, c).
We will examine the distribution of tonal patterns in China and address the issue of humidity’s effect by drawing upon phonetic and phonological data for a large set of languages. We are pursuing three goals. First, we will analyze whether lower humidity leads to poorer voice quality. Second, we will investigate the hypothesis that poor voice quality has an effect on the number of tones. Third, we re-test the correlation between humidity and the number of tones, comparing our results with previous ones (Hammarström, 2016; Roberts, 2018).
Methods
Rather than employing simplistic binning strategies in the categorization of linguistic and geographic variables, we use continuous variables, specifically humidity, jitter, shimmer, and number of tones. We relied on recordings and phonotactics from the YuBao database. We were authorized to download recordings pertaining to a list of 1200 lexical items in 997 language varieties, all of which were used in the current study. We analyzed a total of 1,174,686 recordings, which included samples from a few locations where not all 1200 lexical items were present. All the recordings in the case study were digitized at a sampling rate of 44,100 Hz, 16 bits per sample. The recordings for every language variety were made according to strict standards that regulated the speakers, recordings, videos, and transcriptions. To control for noise, it was recommended to record in a professional recording studio or a quiet room with doors and windows closed and with electrical appliances such as fans, air conditioners, fluorescent lights, and mobile phones turned off. These standards also require background noise to be kept below -60 dB and in any case no louder than -48 dB, and speech volume to peak at -18 dB or at least remain below -6 dB, with Audacity being used as an example. For each language variety, the metadata consist of a location (province, city/district, and town/county) and the data consist of recordings of the 1200-item list from one native male speaker aged 55–65.
Forty-eight (4.81%) varieties in the dataset are classified as non-tonal, and the remainder are tonal. For the latter, we extracted the number of tones for each language variety (Chinese Academy of Social Sciences, 2017a, 2017b). The number of tones, or rather the number of tone oppositions, includes all the contrasts that produce meaning differentiation at the word level. For example, the Jianchuan variety of Bai (ISO 639-3 code: bca) has eight tones, divided between those with modal (e.g., [tɕi33] ‘pull’) and non-modal phonation (e.g., [tɕi42] ‘chase’). The Chao tone numerals following the string of IPA characters indicate pitches relative to the natural pitch range of a particular speaker’s voice. The pitch level of words with non-modal phonation is higher than that of words with modal phonation (Editorial Board of Chinese Minority Languages, 2009). For the 997 language varieties used in the analysis, the number of tones ranged from 0 to 14, with a mean of 5. The concept of ‘complex tone system’ (Everett et al., 2015) is useful for broad comparisons on a worldwide scale, but at the more local level of Sino–Tibetan, Hmong–Mien, and Kam–Tai languages, a more fine-grained categorization is needed, since most of the languages would fit a general definition of complex tone systems. For instance, the Mandarin tone system, which has four tonemic contrasts, is evidently less complex than that of the Jianchuan variety of Bai, even if both languages are plausibly classified as having complex tone systems. Thus, the concept of ‘complex tone’ is avoided here.
Commonly used acoustic measures for voice quality analysis include jitter, shimmer, HNR (harmonics-to-noise ratio), fundamental frequency, PTP (phonation threshold pressure), and PPE (perceived phonatory effort) (Leydon et al., 2009; Alves et al., 2019; Gussenhoven, 2016). Dehydration, water ingestion, and steam inhalation (rehydration) can significantly affect jitter and shimmer (Alves et al., 2019; Mahalingam and Boominathan, 2016). Jitter and shimmer are more sensitive to modest increases or decreases in humidity than other measures. Jitter and shimmer values were extracted using Praat (see Supplementary Text) and averaged over the 1200-item recordings of each location, with both measures expressed as percentages. Specific humidity, the mass of water vapor per unit mass of air, was chosen as the main ecological variable. We obtained specific humidity data from the WheatA database for locations associated with languages across China from 1982 to 2021. Humidity is expressed in g/kg. For each location, the mean specific humidity was calculated across all years and months. The humidity ranged from 2.24 to 16.24 g/kg, with an average of 9.158 g/kg. The highest humidity was recorded in Yazhou District, Sanya City, Hainan Province (18.363°, 109.178°), at 16.24 g/kg, while the lowest was in Ritu County, Ngari Prefecture, Tibet Autonomous Region (33.383°, 79.739°), at 2.24 g/kg. In total, there are 997 locations (speakers). Each location has a humidity datum, a jitter datum, a shimmer datum, and a tone total (number of tones).
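The per-location aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the location labels, item-level jitter values, monthly humidity values, and tone counts are hypothetical toy data, and `mean_by_location` is a helper name introduced here, not part of the study's pipeline.

```python
from collections import defaultdict
from statistics import fmean


def mean_by_location(records):
    """Average (location, value) pairs into one mean value per location."""
    groups = defaultdict(list)
    for location, value in records:
        groups[location].append(value)
    return {location: fmean(values) for location, values in groups.items()}


# Hypothetical item-level jitter values (%), one per lexical recording.
item_jitter = [("A", 1.8), ("A", 2.2), ("B", 2.6), ("B", 3.0)]
# Hypothetical monthly specific humidity values (g/kg) across years.
monthly_humidity = [("A", 14.0), ("A", 16.0), ("B", 2.0), ("B", 3.0)]
# Hypothetical number of tones per location.
tone_counts = {"A": 6, "B": 0}

jitter_mean = mean_by_location(item_jitter)
humidity_mean = mean_by_location(monthly_humidity)

# One row per location: a humidity datum, a jitter datum, and a tone total.
table = {
    location: {
        "humidity": humidity_mean[location],
        "jitter": jitter_mean[location],
        "tones": tone_counts[location],
    }
    for location in tone_counts
}
print(table)
```

Shimmer would be aggregated the same way as jitter, giving the four values per location that enter the regression models.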
We used base R (R Core Team, 2018) and the lme4 package (Bates et al., 2015) to perform linear mixed-effects analysis for voice quality and generalized linear mixed-effects analysis for the number of tones. When investigating a linguistic phenomenon across multiple languages, neglecting the possibility that languages with a shared ancestor may also share similar features can lead to incorrect conclusions. In this case, it is appropriate to include linguistic family as a random effect in regression analysis (Coupé, 2018). We expect that data pertaining to one and the same language family are not independent and therefore model language family, or linguistic groups, as random effects in all analyses. As shown in Table 1 and Fig. 1d, the family factor has six levels (Altaic, Austroasiatic, Sinitic, Hmong–Mien, Tibeto–Burman, Kam–Tai). Generalized linear models allow for the dependent variable to follow a non-normal distribution, such as a Poisson distribution, which is suitable for data on phoneme inventory sizes (Coupé, 2018).
We first fitted two linear mixed models to examine the effect of humidity on the voice quality measures (jitter and shimmer). Humidity was entered into the models as a fixed effect. As random effects, we had intercepts for language families and a by-family random slope for the effect of humidity. We also log-transformed jitter and shimmer to achieve a more normal distribution of the data. Next, we used a generalized linear mixed-effects model to predict the number of tones, using a Poisson distribution to capture the discrete and skewed nature of the data. We entered jitter and shimmer as fixed effects and did not include interactions between independent variables to avoid making the model more complex. As random effects, we included intercepts for language families and by-family random slopes for the effects of jitter and shimmer. Finally, we performed a generalized linear mixed-effects analysis to examine the effect of humidity on the number of tones, with only humidity included in the model.
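The three model specifications described above can be written out as a sketch. The notation is introduced here for illustration and is not the authors': i indexes locations, j indexes language families, H is mean specific humidity, J is jitter, S is shimmer, and T is the number of tones. The log link for the Poisson models (the canonical choice) and zero-mean normally distributed random effects are assumptions, as is entering J and S untransformed in the second model.

```latex
% Model 1 (fitted separately for y = J and y = S): linear mixed model on the log scale
\begin{align}
\log y_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,H_{ij} + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) \\
% Model 2: Poisson mixed model, voice quality -> number of tones
T_{ij} &\sim \mathrm{Poisson}(\lambda_{ij}), \qquad
  \log \lambda_{ij} = (\gamma_0 + v_{0j}) + (\gamma_1 + v_{1j})\,J_{ij} + (\gamma_2 + v_{2j})\,S_{ij} \\
% Model 3: Poisson mixed model, humidity -> number of tones
T_{ij} &\sim \mathrm{Poisson}(\mu_{ij}), \qquad
  \log \mu_{ij} = (\delta_0 + w_{0j}) + (\delta_1 + w_{1j})\,H_{ij}
\end{align}
```

In lme4-style formula notation, the first model would plausibly correspond to something like `log(jitter) ~ humidity + (1 + humidity | family)`, where the variable names are again hypothetical.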
Sinitic accounts for 76.62% of the total number of data points. The remaining linguistic groups have fewer representatives. We also carried out parallel statistical analyses of these linguistic groups. We used a linear model fitting voice quality and a generalized linear model fitting the number of tones within each linguistic group (see Supplementary Materials for details).
Results and discussion
Humidity effect on voice quality
The linear mixed models for jitter/shimmer and humidity reveal a negative association, as evidenced by the negative slope in Fig. 2. The graph illustrates that locations with lower humidity tend to have a higher percentage of jitter and shimmer. The analysis of factorial experiments (Bolker et al., 2022) shows that the fixed effect of humidity is significant (jitter: χ² = 160.68, df = 1, p < 0.0001; shimmer: χ² = 42.58, df = 1, p < 0.0001). Speakers living in more humid regions are more likely to have better voice quality, and speakers living in drier regions are more likely to have poorer voice quality (see Fig. 2). Two groups, Sinitic and Tibeto–Burman, comprise 89.17% of the varieties in the sample, with each having more than 100 representatives. Although within-family regressions suggest that the humidity effect is only significant in Sino–Tibetan, speakers of linguistic groups located in humid regions (Kam–Tai, Hmong–Mien, Austroasiatic) have lower jitter and shimmer. Speakers of the linguistic group located in drier regions (Altaic) have higher jitter and shimmer (see Fig. 3).
This first experiment constitutes direct evidence in support of the humidity–voice quality causal link. Why would less humidity lead to higher jitter and shimmer in speakers? There is a possible pathway linking humidity and voice quality. The effect of desiccated air is the evaporation of the airway surface liquid coating the vocal folds. Hypohydration alters the vibratory characteristics of the vocal folds. Muscular function varies according to hydration status, with increased fatigue and decreased rapidity of movement resulting from water deficit (Judelson et al., 2007). In naturalistic environments, prolonged inhalation of relatively dry air results in decreased efficiency of vocal fold vibration and compromised voice quality. A speaker’s laryngeal control is not as precise as that of a speaker living in a humid environment. This is a result of long-term accumulative effects of climate rather than short-term air provocation in experimental settings or extreme environments.
In the within-family regressions for Kam–Tai, Hmong–Mien, Austroasiatic, and Altaic, the effect of humidity on voice quality was not significant, potentially due to the small sample sizes or the small variance of humidity. The results pertaining to Kam–Tai, Hmong–Mien, and Austroasiatic can be accounted for by the fact that humidified air does not affect perturbations as systematically as dry air does (Hemler et al., 1997). Everett et al. also used the effect of humidified air to explain why humidity does not broadly correlate with tonality (Everett et al., 2016a). Previous work has reported that humidified air did not systematically influence perturbation, at least in the short-term provocation period (Hemler et al., 1997). Only two of the four previous studies found a significant positive effect of higher humidity levels on PTP. Limited significant effects were found for moderate humidity conditions (Alves et al., 2019). In contrast, low-humidity environments revealed more significant negative effects. One explanation is that further humidification of inhaled air has no effect; another is that a further decrease in perturbation may not be possible, especially since the control perturbation measurements of all subjects were low and well within the normal range (Hemler et al., 1997). Humidity at Kam–Tai, Hmong–Mien, and Austroasiatic locations not only varies little but also has high average and minimum values (see Supplementary Table S11). This suggests that limited variation in high humidity does not cause significant changes in voice quality, unlike low humidity.
Jitter and shimmer effects on number of tones
A generalized linear mixed model with the number of tones as the dependent variable, and jitter and shimmer as the independent variables, shows no significant association between jitter/shimmer and tone. Comparison of the full model with fixed effects and the model without fixed effects reveals that including jitter or shimmer as fixed effects in the model does not significantly improve the model fit (χ² = 2.1275, df = 5, p = 0.8312).
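The model comparison above is a likelihood-ratio test: twice the difference in log-likelihood between the nested models is referred to a χ² distribution. As a minimal sketch, the p-value for a reported statistic can be recomputed with the closed-form χ² survival function for integer degrees of freedom. The function names are introduced here for illustration, and this is not the authors' code; only the χ² and df values are taken from the text.

```python
import math


def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood-ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable
    with a positive integer number of degrees of freedom."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    if x == 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite sum of chi^(2r-1) / (2r-1)!!
    term = math.sqrt(x)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# Statistic and degrees of freedom as reported for the jitter/shimmer comparison.
p_value = chi2_sf(2.1275, 5)
print(f"p = {p_value:.4f}")
```

With the reported χ² = 2.1275 and df = 5, the sketch gives a p-value that agrees with the reported 0.8312 to four decimal places.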
However, within-family regressions suggest that the jitter effect is significant in Sino–Tibetan and Austroasiatic languages (see Fig. 4 and Supplementary Table S5). Including shimmer as a fixed effect in the models does not significantly improve the model. Altaic languages, as well as some Austroasiatic and Tibeto–Burman languages, are non-tonal; the mean humidity of these is 4.972, which is only greater than the humidity in 10.63% of locations. Their mean jitter and shimmer are 2.478 (greater than the jitter in 78.94% of locations) and 12.10 (greater than the shimmer in 83.85% of the locations), respectively.
In the previous section on the effect of humidity on voice quality, we observed that humidity affects both jitter and shimmer, but the impact on shimmer is not as prominent as the impact on jitter. The adjusted R2 results based on the linear model of linguistic groups demonstrate that humidity has a better predictive effect on jitter than shimmer (refer to Supplementary Tables S2 and S4). In studies that have investigated the effects of systemic hydration on shimmer, shimmer values are less accurate in speech signals compared to jitter values (Alves et al., 2019). Although both jitter and shimmer are time-based, jitter is more dependent on fundamental frequency (Shu et al., 2022). Tone is more associated with variations in fundamental frequency than with jitter. However, no statistical difference was found for the effect of systemic or surface hydration on fundamental frequency (Alves et al., 2019). Therefore, the acoustic measure of voice quality, jitter, is not only well-predicted by humidity but is also an effective predictor of the number of tones.
A poorer voice quality is more likely to be observed in the speakers of non-tonal languages and languages with fewer tones, which occurs in Altaic and Sino–Tibetan. Speakers of Sino–Tibetan languages with more tones are more likely to have better voice quality. Although the jitter effect was not significant across all within-family regressions, linguistic groups with the lowest jitter (Kam–Tai, Hmong–Mien) have a larger number of tones, while the linguistic group with the highest jitter (Altaic) represents non-tonal languages. These results suggest that synchronic variation in tonality is related to minor differences in jitter rates.
Why might higher jitter be associated with tone reduction? Maintaining adequate tone usage in communication is challenging when the efficiency of vocal fold vibration decreases, resulting in weaker lexical tone distinctions. Irregular and aperiodic vocal fold vibrations impede the effortless production of both vowels and tones. Vowels require high-amplitude vocal fold vibration in nearly all cases. However, as phonatory effort increases, the maintenance of adequate tone use becomes more challenging than maintaining vowel use. Vowel differences contribute more significantly than tone differences in dialect perception (Liu et al., 2020). During language acquisition, children demonstrated reduced sensitivity to tone mispronunciations relative to vowel mispronunciations (Wewalaarachchi and Singh, 2015; Singh et al., 2015). Deficient use of tonemes is permissible in communication, whereas vowel mispronunciation is easily perceived by listeners.
A causal effect between voice quality and the number of tones exists but is not robust. Beyond a certain point, an improvement in voice quality may not result in a significant increase in the number of tones produced, just as further humidification of inhaled air beyond a certain point may not have a noticeable effect on vocal fold vibration. The mean values of jitter of Kam–Tai and Hmong–Mien are low (see Supplementary Table S9). Hmong–Mien and Kam–Tai generally have the largest number of tone categories due to tonogenesis and tone change. Although Hmong–Mien and Kam–Tai maintain their tonal complexity as they are less prone to influence from poor voice quality, better voice quality may not increase the number of tones any further, as the number of tones is unlikely to break the tone inventory cap. The extreme tone inventories are limited by their physiological basis (Ran, 2016). In humid areas or areas categorized as not dry, the vocal fold control is not hampered by humid air, and the normal voice quality does not impede the use or development of phonemic tone.
The higher jitter or shimmer in Altaic languages may be caused by complex syllable structures. Altaic languages are non-tonal. As Maddieson notes, non-tonal languages are considerably more likely to have complex syllable structures (Maddieson, 2013), and this is true of Altaic languages, which permit freer combinations of two or more consonants in the position after a vowel. The first consonants of clusters are mainly sonorants including liquids, nasals, and glides. Meanwhile, languages with complex syllables have fewer vowels (Everett, 2021). Including more consonant clusters or fewer vowels in a syllable may increase the jitter of the syllable, because consonants within normal speech will appear to have little periodicity, whereas sustained vowels will appear to be strongly periodic (Farideh et al., 2021). Vowels are richer in glottal vibrations compared to consonants (Saggio and Costantini, 2020). Higher jitter or shimmer may result from complex syllable structures, which are more likely to occur in non-tonal languages, and lower jitter or shimmer may result from moderately complex syllable structures, which are associated with the occurrence of complex tone systems. Thus, the association between voice quality and tone systems may be due to a potentially confounding factor, namely syllable structures. However, this assumption does not hold for three main reasons. The first reason is that the tone category does not show any consistent relationship to the occurrence of simple syllable structure (Maddieson, 2013). Most morphemes in Sinitic consist of one syllable, and most syllables are identifiable as morphemes (Thurgood and LaPolla, 2008). Although Sinitic languages all have simple syllable structures, the number of distinctive tones varies across varieties of Sinitic, and so does speakers’ jitter. The second reason is that although Hmong–Mien and Kam–Tai have moderately complex syllable structures, their jitter and shimmer are lower than those of Sinitic.
Hmong–Mien and Kam–Tai languages that permit liquids or glides in the second position of consonant clusters are counted as having moderately complex syllable structures. Third, polysyllabic (especially disyllabic) words, most of which are transparently compounded of monosyllabic morphemes, occur frequently in the lexicon of most Sino–Tibetan languages (Thurgood and LaPolla, 2008). Sampling voice quality data from lexical recordings mitigates the effects of differences in syllable structure between Altaic and Sino–Tibetan. Taken together, these three reasons make it unlikely that the correlation between voice quality and tone system observed in this study is merely an artifact of syllable structure.
Humidity effect on number of tones
A generalized linear mixed model with the number of tones as the dependent variable and humidity as the independent variable revealed a significant nonlinear relationship between humidity and tone, as also found in a previous study (Roberts, 2018). Our final model is expressed by a polynomial of humidity and has a random intercept for family and a by-family random slope for the humidity effect. The results show that the fixed effect of humidity is significant (I(Humidity²): χ² = 10.59, df = 1, p = 0.001; I(Humidity³): χ² = 8.06, df = 1, p = 0.005) (see Fig. 2). Within-family regressions suggest that the humidity effect is significant in Sino–Tibetan (see Fig. 5). Thus, we conclude that locations with higher humidity tend to have a higher number of tones.
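As a sketch of what such a polynomial specification might look like, the linear predictor can be written with quadratic and cubic humidity terms. The notation is introduced here for illustration (i indexes locations, j language families, T is the number of tones, H is mean specific humidity). The log link, the inclusion of a linear term, and placing the random slope on the linear term only are assumptions, since the text reports only the squared and cubed terms.

```latex
\begin{align}
T_{ij} &\sim \mathrm{Poisson}(\mu_{ij}), \\
\log \mu_{ij} &= (\delta_0 + w_{0j}) + (\delta_1 + w_{1j})\,H_{ij}
  + \delta_2\,H_{ij}^{2} + \delta_3\,H_{ij}^{3}
\end{align}
```

Under this form, a nonzero \(\delta_2\) or \(\delta_3\) means the expected number of tones does not change at a constant multiplicative rate per unit of humidity, which is what a nonlinear relationship amounts to on the log scale.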
The correlation between climate and tonality is not particularly surprising in China, because tonal ethnic languages and Sinitic varieties with larger numbers of tones are found across southern China and are less concentrated in the northern and northwestern parts of the country. The Qinling–Huaihe Line, corresponding roughly to the 33rd parallel, is often used as the geographical dividing line between northern and southern China. This line approximates the 0 °C January isotherm and the 800-millimeter isohyet in China. It divides eastern China into northern and southern regions with different climates, namely, semi-humid and humid. Moreover, because of higher altitudes, there are arid and semi-arid regions in northwest China and alpine regions on the Qinghai–Tibet Plateau in western China. The main dialect groups of Sinitic (Wu, Xiang, Gan, Hakka, Min, and Yue) cover the east and southeast of China, falling neatly into almost complementary geographical distribution with Mandarin (LaPolla, 2001). Populations that speak Hmong–Mien, Kam–Tai, and Austroasiatic languages are in southern China (Sun et al., 2013).
Historical contingencies playing out in geographical space are an ever-present factor in the evolution of languages, and a possible confound for correlational studies and attempts to establish causal chains. In China, successive waves of migration from the north have, over many centuries, superimposed layers of different northern Chinese dialects onto evolving southern dialects. The southward expansion of Sinitic resulted in the emergence of new varieties in southern China with substrate influence from the indigenous languages spoken there (Chappell, 2001). Sino–Tibetan languages also spread into Burma and throughout the Himalayas, from an origin which is commonly assumed to have been the Yellow River. The Kam–Tai and Hmong–Mien families expanded southward during historic times (Diamond and Bellwood, 2003). The ‘Altaicization of Northern Chinese’ hypothesis (Hashimoto, 1986) implies that Northern Sinitic varieties are more likely to be stress-based and to have fewer tone categories when they are spoken near the generally non-tonal Altaic languages. Southern Sinitic varieties maintain or develop their tonal complexity, as they are less exposed to influence from the Altaic languages and more exposed to influence from the highly tonal Hmong–Mien languages (Collins, 2016; LaPolla, 2001; Szeto and Yurayong, 2010). Language contact must be taken into account because of its important contributions to the tone system. Still, its direct and indirect effects on tone are mediated by humidity effects, which, unlike the effects of language contact, are systematic (Collins, 2016; Everett et al., 2016a). There is nothing to suggest that the correlation between humidity and tone could be an artifact of the history of language families and language contact. The objective characteristics of phonatory capabilities captured from speech samples are immune to contact-based effects and to the history of language families.
Our findings certainly do not deny or contradict the great contributions of history and language contact to the distribution of tone systems, but the fact that climate still emerges as a correlate of variation in sound systems in the face of historical contingencies underscores the importance of climate as a factor.
Conclusion
The full chain of causation, humidity → voice quality → number of tones, is for the first time strongly supported by direct experimental tests on the basis of a large speech database, China’s Language Resources Protection Project. Previous studies only examined a correlation between the variables at either end of the chain. Owing to the lack of large, standard, and high-quality cross-linguistic speech databases for extracting voice quality measurements from natural speech, direct experimental testing of the entire causal chain has been hindered. In the absence of intermediate links, the hypothesis about tone and humidity cannot be verified.
Here, the prediction that climate affects tone systems via voice quality is verified. The chain of causal effects becomes complete when we observe that relatively dry ambient air results in decreased efficiency of vocal fold vibration. The effect of humidity on the vocal folds is sufficient to surface in natural speech. Adequate tone usage in communication is hard to maintain when the efficiency of vocal fold vibration decreases. This leads to fewer distinctions in lexical tone, and then to a change in the whole sound system. The pattern holds with respect to Sino–Tibetan languages. Meanwhile, the impact of humidified air on the vocal folds is not as substantial as that of dry air. Humid air does not impede laryngeal control. Languages in humid regions can therefore maintain or develop tonal complexity, as their speakers are less subject to increased phonation effort. This pattern is observed in Hmong–Mien and Kam–Tai. Objective measures of phonatory capabilities disentangle the humidity effect from the effects of language diversity, history, and contact. The prediction of ecological adaptation of speech, then, is supported by the distribution of tone systems in China. Our results suggest that humidity is related to synchronic variation in tonality and offer a potential trigger for diachronic changes in tone systems.
Armed with the ‘acoustic adaptation hypothesis’ of Maddieson and Coupé (2015), which was originally used to account for the relationship between characteristics of syllable structure and ecological factors, Ladd (2016) went on to explore the humidity–tonality correlation. Areas with high annual precipitation and greater tree cover are observed to contain languages with a lower dependence on consonants in their sound patterns. This is because filtering effects of the environment are more likely to degrade higher-frequency sounds. Consonants, in general, rely on higher-frequency acoustic characteristics for identification, and hence their predominance in languages spoken in these areas is relatively lower (Maddieson and Coupé, 2015). Similarly, humidity significantly affects sound transmission, leading to sounds being quieter and duller, because atmospheric absorption, which depends on humidity, reduces the level of high frequencies in the sound (Harris, 1966; Howard and Angus, 2009). Ladd believes that although both production and transmission factors play a role in explaining the correlation between humidity and tone, the effect of humidity on the signal constitutes a more plausible explanation for the uneven distribution of tone languages than effects on the organs (Ladd, 2016). Ladd pointed out that, in all situations, higher frequencies fade more quickly than lower frequencies. However, frequencies within the range of the fundamental frequency tend to fade more quickly in dry air than in humid air. According to Ladd, the humidity–tonality correlation can thus be explained with reference to sound transmission. The attenuation of low-frequency sounds in dry air results in a weakened distinction of the signal, which can further influence the perception of listeners. This can lead to miscommunication, and eventually to a selection pressure against fine tonal distinctions (Roberts, 2018).
However, according to the ISO 9613-1 standard for calculating the attenuation of sound as a result of atmospheric absorption (ISO 9613-1, 1993), low-frequency pure-tone atmospheric attenuation coefficients in dry air are only slightly higher than in humid air. Additionally, the attenuation of low frequencies is much lower than that of high frequencies, regardless of the humidity level. This means that any changes in low-frequency attenuation may not be noticeable to listeners. Thus, while we are not strongly opposed to this idea, it seems to us unlikely that the causal chain linking tone and humidity resides in the transmission of sound in air and the effects of humidity on the signal that reaches the hearer. In contrast, we believe that a causal pattern of the effects of humidity on vocal organs has been demonstrated in the present paper.
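The size of the transmission effect can be illustrated with a minimal sketch of the pure-tone absorption coefficient. The formula below is written from our recollection of the ISO 9613-1 expressions and should be checked against the standard before any serious use. The function name `absorption_db_per_m`, the air temperature of 20 °C, and the relative humidity values of 10% and 80% are illustrative assumptions, not values taken from the present study. Note also that the standard works with relative humidity, whereas the humidity values in our dataset are specific humidity in g/kg.

```python
import math

T0 = 293.15      # reference air temperature, K
T01 = 273.16     # triple-point temperature of water, K
P_REF = 101.325  # reference ambient pressure, kPa


def absorption_db_per_m(freq_hz, temp_c=20.0, rel_humidity_pct=50.0,
                        pressure_kpa=101.325):
    """Pure-tone atmospheric absorption coefficient in dB per metre."""
    t = temp_c + 273.15
    p_rel = pressure_kpa / P_REF
    t_rel = t / T0
    # Molar concentration of water vapour (%), derived from relative humidity.
    p_sat_rel = 10.0 ** (-6.8346 * (T01 / t) ** 1.261 + 4.6151)
    h = rel_humidity_pct * p_sat_rel / p_rel
    # Relaxation frequencies of oxygen and nitrogen (Hz).
    fr_o = p_rel * (24.0 + 4.04e4 * h * (0.02 + h) / (0.391 + h))
    fr_n = p_rel * t_rel ** -0.5 * (
        9.0 + 280.0 * h * math.exp(-4.170 * (t_rel ** (-1.0 / 3.0) - 1.0)))
    # Classical absorption plus the two molecular relaxation terms.
    classical = 1.84e-11 / p_rel * math.sqrt(t_rel)
    oxygen = 0.01275 * math.exp(-2239.1 / t) * fr_o / (fr_o ** 2 + freq_hz ** 2)
    nitrogen = 0.1068 * math.exp(-3352.0 / t) * fr_n / (fr_n ** 2 + freq_hz ** 2)
    return 8.686 * freq_hz ** 2 * (classical + t_rel ** -2.5 * (oxygen + nitrogen))


# Compare a frequency in the F0 range with a high frequency, dry vs. humid air.
for freq in (200.0, 4000.0):
    for rh in (10.0, 80.0):
        alpha_db_per_km = 1000.0 * absorption_db_per_m(freq, 20.0, rh)
        print(f"{freq:6.0f} Hz, RH {rh:3.0f}%: {alpha_db_per_km:8.2f} dB/km")
```

Under these assumed conditions the coefficient at 200 Hz is on the order of 1 dB per kilometre in both dry and humid air, which is negligible over conversational distances, while the coefficient at 4000 Hz is more than ten times larger in both cases.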
In research focusing on the adaptation of human behavior, it has only rarely been possible to demonstrate the complete mechanism by which ecology drives behavior. Natural selection theory can directly support the connection between ecology and physiology. For example, brain expansion in Homo was mainly driven by ecological challenges such as finding, caching, or processing food (González-Forero and Gardner, 2018). However, it is doubtful that other human behaviors are directly subject to the forces of natural selection. Climate, voice quality, and tone are linked in a causal chain, which provides a case study of a possible complete mechanism through which ecology triggers human behavior. The relationship between ecology and human behavior is mediated by physiological mechanisms rather than being direct.
Despite the limited number of extremely dry regions in our dataset, we were still able to identify a trend whereby relatively dry ambient air results in decreased efficiency of vocal fold vibration. Moving forward, we will consider ways to expand our dataset to include more regions with extreme aridity and conduct further research to better understand the impact of humidity on voice quality. In addition, the approach demonstrated here could be extended to the exploration of global geo-phonetic correlations, which calls for the further establishment of a global, large-scale, high-quality, and standardized speech database.
Data availability
All data generated or analyzed during this study are included in the supplementary information file and the submitted dataset. The dataset is also available in a GitHub repository (https://github.com/EL-CL/SI_Data). It includes the locations of the 997 language varieties along with their corresponding humidity values (g/kg), jitter rates, shimmer rates, number of tones, language family membership, and raw acoustic data (jitter and shimmer for each audio file). Zhongguo Yuyan Ziyuan Baohu Gongcheng, abbreviated as YuBao, can be accessed at https://zhongguoyuyan.cn/index. The WheatA database can be accessed at http://www.wheata.cn/. The Python package pyecharts can be accessed at https://github.com/pyecharts/pyecharts.
References
Alves M, Krüger E, Pillay B, van Lierde K, van der Linde J (2019) The effect of hydration on voice quality in adults: a systematic review. J Voice 33(1). https://doi.org/10.1016/j.jvoice.2017.10.001
Bates D, Mächler M, Bolker BM, Walker SC (2015) Fitting linear mixed effects models using lme4. J Stat Softw 67(1):1–48. https://www.jstatsoft.org/article/view/v067i01
Boer BD (2016) Commentary: is the effect of desiccation large enough? J Lang Evol 1(1):55–57. https://doi.org/10.1093/jole/lzv008
Bolker B, Westfall J, Aust F, Ben-Shachar MS (2022) Analysis of factorial experiments. https://github.com/singmann/afex
Brockmann M, Drinnan JM, Storck C, Carding NP (2011) Reliable jitter and shimmer measurements in voice clinics: the relevance of vowel, gender, vocal intensity, and fundamental frequency effects in a typical clinical task. J Voice 25(1):44–53. https://doi.org/10.1016/j.jvoice.2009.07.002
Coupé C (2018) Modeling linguistic variables with regression models: addressing non-gaussian distributions, non-independent observations, and non-linear predictors with random effects and generalized additive models for location, scale, and shape. Front Psychol 9:513. https://doi.org/10.3389/fpsyg.2018.00513
Chagnon N, Irons W (2002) Adaptation and human behavior: an anthropological perspective, 1st edn. Aldine De Gruyter, New York
Chappell HM (2001) Synchrony and diachrony of Sinitic languages: A brief history of Chinese dialects. In: Chappell HM (ed) Sinitic grammar: synchronic and diachronic perspectives. Oxford University Press, Oxford, pp. 3–28
Chinese Academy of Social Sciences (2017a) Language atlas of China: Chinese dialect volume, 2nd edn. The Commercial Press, Beijing
Chinese Academy of Social Sciences (2017b) Language atlas of China: minority languages volume, 2nd edn. The Commercial Press, Beijing
Collins J (2016) Commentary: the role of language contact in creating correlations between humidity and tone. J Lang Evol 1(1):46–52. https://doi.org/10.1093/jole/lzv012
d’Alpoim Guedes J, Bocinsky RK (2018) Climate change stimulated agricultural innovation and exchange across Asia. Sci Adv 4(10). https://www.science.org/doi/10.1126/sciadv.aar4491
Diamond J, Bellwood P (2003) Farmers and their languages: the first expansions. Science 300(5619):597–603. https://www.science.org/doi/10.1126/science.1078208
Donohue M (2016) Commentary: culture mediates the effects of humidity on language. J Lang Evol 1(1):57–60. https://doi.org/10.1093/jole/lzv009
Editorial Board of Chinese Minority Languages (2009) Brief chronicles of Chinese minority languages series. The Ethnic Publishing House, Beijing
Ember C, Ember M (2007) Climate, econiche, and sexuality: influences of sonority in language. Am Anthropol 109(1):180–185. https://www.jstor.org/stable/4496596
Everett C (2013) Evidence for direct geographic influences on linguistic sounds: the case of ejectives. PLoS ONE 8(6):e65275. https://doi.org/10.1371/journal.pone.0065275
Everett C (2017) Languages in drier climates use fewer vowels. Front Psychol 8:1285. https://doi.org/10.3389/fpsyg.2017.01285
Everett C (2021) The sound systems of languages adapt, but to what extent? Considerations of typological, diachronic and mercurial data. Cadernos de Linguística 2(1):1–23. https://cadernos.abralin.org/index.php/cadernos/article/view/342
Everett C, Blasi DE, Roberts SG (2015) Climate, vocal folds, and tonal languages: connecting the physiological and geographic dots. Proc Natl Acad Sci USA 112(5):1322–1327. https://doi.org/10.1073/pnas.1417413112
Everett C, Blasi DE, Roberts SG (2016a) Language evolution and climate: the case of desiccation and tone. J Lang Evol 1(1):33–46. https://doi.org/10.1093/jole/lzv004
Everett C, Blasi DE, Roberts SG (2016b) Response: climate and language: has the discourse shifted? J Lang Evol 1(1):83–87. https://doi.org/10.1093/jole/lzv013
Evteev AA, Cardini AL, Morozova IY, O’Higgins P (2014) Extreme climate, rather than population history, explains mid-facial morphology of northern Asians. Am J Phys Anthropol 153:449–462. https://doi.org/10.1002/ajpa.22444
Fahed VS, Doheny EP, Busse M, Hoblyn J, Lowery MM (2022) Comparison of acoustic voice features derived from mobile devices and studio microphone recordings. J Voice. https://doi.org/10.1016/j.jvoice.2022.10.006
Farideh J, Gadepalli C, Jarchi D, Cheetham B (2021) Acoustic analysis and digital signal processing for the assessment of voice quality. Biomed Signal Process Cont 70(4):103018. https://doi.org/10.1016/j.bspc.2021.103018
González-Forero M, Gardner A (2018) Inference of ecological and social drivers of human brain-size evolution. Nature 557:554–557. https://www.nature.com/articles/s41586-018-0127-x
Gussenhoven C (2016) Commentary: tonal complexity in non-tonal languages. J Lang Evol 1(1):62–64. https://doi.org/10.1093/jole/lzv016
Hammarström H (2016) Commentary: there is no demonstrable effect of desiccation. J Lang Evol 1(1):65–69. https://doi.org/10.1093/jole/lzv015
Harris CM (1966) Absorption of sound in air versus humidity and temperature. J Acoust Soc Am 40(1):148–159. https://doi.org/10.1121/1.1910031
Hashimoto M (1986) The Altaicization of Northern Chinese. In: McCoy J, Light T (eds) Contributions to Sino-Tibetan studies. Brill EJ, Leiden, pp. 76–97
Heggarty P, Shimelman A, Abete G, Anderson C, Sadowsky S (2019) Sound comparisons: a new online database and resource for research in phonetic diversity. In: Calhoun S, Escudero P, Tabain M, Warren P (eds.) Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS), Melbourne, Australia 2019. Australasian Speech Science and Technology Association, Canberra, Australia, pp. 280–284. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2019/papers/ICPhS_329.pdf
Hemler R, Wieneke GH, Jonckere PD (1997) The effect of relative humidity of inhaled air on acoustic parameters of voice in normal subjects. J Voice 11(3):295–300. https://doi.org/10.1016/S0892-1997(97)80007-0
Howard D, Angus J (2009) Acoustics and Psychoacoustics, 4th edn. Focal Press, Oxford
ISO 9613-1 (1993) Acoustics–attenuation of sound during propagation outdoors–part 1: calculation of the absorption of sound by the atmosphere. International Organization for Standardization. https://www.iso.org/standard/17426.html
Judelson DA, Maresh CM, Anderson JM, Armstrong LE, Casa DJ, Kraemer WJ, Volek JS (2007) Hydration and muscular performance: does fluid balance affect strength, power and high-intensity endurance? Sports Med 37(10):907–921. https://doi.org/10.1111/j.1467-3010.2009.01790.x
Koskela HO (2007) Cold air-provoked respiratory symptoms: the mechanisms and management. Int J Circumpolar Health 66(2):91–100. https://doi.org/10.3402/ijch.v66i2.18237
Ladd DR (2016) Commentary: tone languages and laryngeal precision. J Lang Evol 1(1):70–72. https://doi.org/10.1093/jole/lzv014
LaPolla RJ (2001) The role of migration and language contact in the development of the Sino-Tibetan language family. In: Aikhenvald AY, Dixon RMW (eds) Areal diffusion and genetic inheritance: case studies in language change. Oxford University Press, Oxford, pp. 225–254
Leydon C, Sivasankar M, Falciglia DL, Atkins C, Fisher KV (2009) Vocal fold surface hydration: a review. J Voice 23(6):658–665. https://doi.org/10.1016/j.jvoice.2008.03.010
Liu HM, Liang J, van Heuven VJ, Heeringa W (2020) Vowels and tones as acoustic cues in Chinese subregional dialect identification. Speech Commun 123(3):59–69. https://doi.org/10.1016/j.specom.2020.06.006
Lü LC, Li WL (2012) Zhongguo dili [China Geography], 1st edn. Science Press, Beijing
Maddieson I (2013) Tone. In: Dryer M, Haspelmath M (eds) The world atlas of language structures online. Max Planck Institute for Evolutionary Anthropology, Leipzig. https://wals.info/chapter/13
Maddieson I, Coupé C (2015) Human spoken language diversity and the acoustic adaptation hypothesis. J Acoust Soc Am 138(3):1838. https://doi.org/10.1121/2.0000198
Maddux SD, Butaric LN, Yokley TR, Franciscus RG (2017) Ecogeographic variation across morphofunctional units of the human nose. Am J Phys Anthropol 162(1):103–119. https://doi.org/10.1002/ajpa.23100
Mahalingam S, Boominathan P (2016) Effects of steam inhalation on voice quality-related acoustic measures. Laryngoscope 126(10):2305–2309. https://doi.org/10.1002/lary.25933
Munroe RL, Fought JG, Macaulay R (2009) Warm climates and sonority classes: not simply more vowels and fewer consonants. Cross Cult Res 43(2):123–133. https://doi.org/10.1177/106939710933148
R Core Team (2018) R: A language and environment for statistical computing. Vienna. http://www.R-project.org/
Ran Q (2016) Hanyu fangyan jixian shengdiao qingdan yanjiu [Studies on extreme tone inventories across Chinese dialects]. Nankai Univ Press, Tianjin
Roberts SG (2018) Robust, causal, and incremental approaches to investigating linguistic adaptation. Front Psychol 9:166. https://doi.org/10.3389/fpsyg.2018.00166
Saggio G, Costantini G (2020) Worldwide healthy adult voice baseline parameters: a comprehensive review. J Voice. https://doi.org/10.1016/j.jvoice.2020.08.028
Schultz BG, Rojas S, John MS, Kefalianos E, Vogel AP (2021) A cross-sectional study of perceptual and acoustic voice characteristics in healthy aging. J Voice. https://doi.org/10.1016/j.jvoice.2021.06.007
Shu M, Zhang Y, Jiang JJ (2022) The effect of Mandarin vowels on acoustic analysis: a prospective observational study. J Voice 138(3):1–6
Singh L, Goh HH, Wewalaarachchi TD (2015) Spoken word recognition in early childhood: comparative effects of vowel, consonant and lexical tone variation. Cognition 142:1–11. https://doi.org/10.1016/j.jvoice.2022.03.028
Sue-Chu M (2012) Winter sports athletes: long-term effects of cold air exposure. Br J of Sports Med 46(6):397–401. https://doi.org/10.1136/bjsports-2011-090822
Sun H, Zhou C, Huang X, Liu S, Lin K, Yu L, Huang K, Chu J, Yang Z (2013) Correlation between the linguistic affinity and genetic diversity of Chinese ethnic groups. J Hum Genet 58(10). https://www.nature.com/articles/jhg201379
Szeto PY, Yurayong C (2010) Sinitic as a typological sandwich: revisiting the notions of Altaicization and Taicization. Linguistic Typol 25(3):6858–6868. https://doi.org/10.1515/lingty-2021-2074
Thurgood G, LaPolla RJ (2008) The Sino-Tibetan Languages. Routledge, London and New York
Uloza V, Ulozaite-Staniene N, Petrauskas T, Kregždyte R, Lithuania K (2021) Accuracy of acoustic voice quality index captured with a smartphone–measurements with added ambient noise. J Voice. 465.e19–465.e26 https://doi.org/10.1016/j.jvoice.2021.01.025
Wewalaarachchi TD, Singh L (2015) Vowel, consonant, and tone variation exert asymmetrical effects on spoken word recognition: evidence from 6 year-old monolingual and bilingual learners of Mandarin. J Exp Child Psychol 189(3):1838–1838. https://doi.org/10.1016/j.jecp.2019.104698
Winter B, Wedel A (2016) Commentary: desiccation and tone within linguistic theory and language contact research. J Lang Evol 1(1):80–82. https://doi.org/10.1093/jole/lzv010
Acknowledgements
This research was funded by the major project from the National Social Science Fund of China (Grant No. 19ZDA300) and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy (EXC 2150–390870439).
Author information
Contributions
QR and QX designed the research, analyzed data, and reviewed and edited the paper. YL and S Wichmann wrote, reviewed, and edited the paper. LW contributed the language recording resources, supervised the research, and reviewed the paper. S Wang, JD, and YL performed research and analyzed data. TW improved the visualization. YL and QX contributed equally to this work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Informed consent
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Liang, Y., Wang, L., Wichmann, S. et al. Languages in China link climate, voice quality, and tone in a causal chain. Humanit Soc Sci Commun 10, 453 (2023). https://doi.org/10.1057/s41599-023-01969-4
DOI: https://doi.org/10.1057/s41599-023-01969-4