Abstract
Recognizing individual variability is essential for developing targeted, personalized medical interventions. Vocal fatigue is a prevalent symptom and complaint among occupational voice users, but its identification has yielded mixed results. Vocal fatigue is a complex issue with heterogeneous biophysiological responses to vocal demands among individuals. This research aims to classify individuals as vocal demand responders to measure changes in vocal performance consistent with state vocal fatigue. A total of 37 participants (19F, 18M) completed a 30-minute vocal loading task (VLT) which consisted of loud speaking with background noise. Participants provided speech samples pre- and post-VLT and rated their vocal effort levels before, every 5 minutes during, and after the VLT. Perceived effort ratings and measured vocal performance from the speech samples were used to classify participants into distinct subgroups of vocal demand responders. Prior to classification there were few detectable changes associated with the VLT. However, the subgroup with both vocal effort and voice production demand responses displayed significant changes consistent with vocal fatigue while the other subgroups did not. These findings support the need for an individual-based approach to subtyping and measuring vocal fatigue and highlight its heterogeneous nature.
Introduction
Vocal fatigue is a common symptom in individuals seeking vocal health treatment. It is also a prevalent complaint in populations without voice disorders and can be an early sign of vocal health risks seen in occupational voice users, particularly teachers. One systematic review summarized vocal fatigue prevalence in teachers to be between 42% and 92%1. This wide range of prevalence has been reported as being caused by inherent differences in measurements of fatigue based on either state or trait fatigue. Trait fatigue has been defined as the “average amount of perceived fatigue over a period of time” and state fatigue as the “change in perception of fatigue during an ongoing activity”2. The Vocal Fatigue Index (VFI) is a validated instrument in measuring trait fatigue3. However, the quantification of state fatigue has been hindered by both the wide range of research protocols, which are not comparable, and an array of metrics that have resulted in both mixed and sometimes contradictory results. These differences may mask underlying individual variations, which could potentially be used to identify subgroups with distinct response patterns or risk profiles.
The research protocols range from in situ voice observations to laboratory-induced vocal fatigue. One approach for quantifying state vocal fatigue has been to monitor occupational voice users within their actual work environment and attempt to detect change over time. This has been attempted in schoolteachers4,5, call-center workers6, singers7,8, and radio broadcasters9. These studies illustrate the variety of devices and techniques used to track vocal use in ecologically valid (although less controlled) environments. On the other end of the spectrum are laboratory environments with prescriptive tasks designed to induce vocal fatigue, i.e., a vocal loading task (VLT). While the specifics of a VLT can vary widely across studies, a VLT typically includes a prolonged speaking task or elevated vocal effort. Fujiki and Sivasankar10 reported that for VLTs the most common duration was two hours, with the shortest and longest durations being 15 min and 3.75 h, respectively. The most common type of loading task was prolonged, loud reading. Previous work has typically used two approaches to elicit elevated loudness: either background noise11,12 or a loudness target13,14. While a variety of VLT-influencing metrics have been reported (e.g., direct sensation of fatigue or discomfort, auditory perception of vocal effort, voice acoustic parameters), the results have been inconclusive, with the only consistent measure related to assumed state vocal fatigue being perceived vocal effort15. Throughout VLT studies, acoustic measures have been used to track changes in voice production associated with prolonged speaking or other vocal demands. Common measures used to assess vocal fatigue in this manner include fundamental frequency16, speech level6, and cepstral peak prominence17. Unfortunately, these results vary and are inconclusive, as illustrated by reports of increases, decreases, or no change associated with vocal fatigue18 and considerable inter- and intra-subject variability5.
These inconsistent results could be related to three problems with the implementation of VLTs. In general, (1) there is not a consistent definition and framework for studying state vocal fatigue, (2) the VLT studies vary widely in design and are not comparable, and finally, (3) the assumption that fatigue is induced by the VLT may not be appropriate due to potential individual differences in the biophysiological response to vocal demands and fatigue.
To address the first concern, a proposed consensus definition and framework for vocal fatigue and its related terms was introduced by Hunter and colleagues2. Here vocal fatigue is defined as “the perceived measurable symptom that influences vocal task performance and is individual specific; it is a multifaceted concept integrating self-perceived vocal symptoms and/or physiologic deficit,” which may be a result of high “vocal demand response,” high “vocal effort,” or “neuromuscular deficit.” This definition supports previous work in the use of measurements of vocal effort and vocal performance. Importantly, it also states that vocal fatigue is “individual specific”–a concept that has not been commonly considered.
In addition to a consistent framework, a consistent protocol for VLTs is needed. Direct comparisons of VLTs across different studies are essential for a comprehensive understanding. Additionally, a VLT designed with scalability would allow for a much larger sample of participants, which is critical for the detection of individualistic vocal demand responses. Previous work by Hunter et al.19 has discussed this need and proposed a VLT protocol. This protocol allows for broad adoption and comparability through a design philosophy of modularity for flexibility and scalability that leverages computation tools for data acquisition, segmentation, and signal processing.
While a proposed framework and protocol exist for measuring state vocal fatigue, addressing the inherent heterogeneous response from participants remains an unresolved challenge. Nanjundeswaran and Shembel20 have proposed a conceptual framework that highlights the need for a better understanding of individual differences in vocal demand responses related to vocal fatigue. Recent studies by Shembel and colleagues provide direct evidence for this heterogeneity in vocal demand responses. Their work examining the effects of vocal loading on various voice parameters demonstrated significant variability in how individuals with and without voice disorders respond to similar vocal demands21,22.
Given this documented variability, a critical next step is to categorize these different response patterns. One potential approach to measuring the heterogeneous response to vocal fatigue is to detect and classify individuals into subtypes of vocal demand responses. Based on previous VLT research and the framework from Hunter et al.2, changes in vocal performance and/or perceived vocal effort may demonstrate vocal demand responses, which implicate vocal fatigue. For the purposes of this study, vocal fatigue is operationalized as measurable changes in either or both of two key dimensions: (1) self-perceived vocal effort as quantified by the Borg CR-100 scale ratings before, during, and after the VLT; and (2) objective changes in vocal performance parameters encompassing the subjective qualities of pitch, loudness, and voice quality, quantified by speaking fundamental frequency (F0), speech level (SL), and smoothed cepstral peak prominence (CPPS), respectively. Since these two types of demand responses (vocal performance and perceived vocal effort) are not necessarily related, each demand response will be independently classified. The combination of these classifications provides a basis for subgroups of individuals with homogeneous responses to the VLT for the study of vocal fatigue.
The purpose of this paper is to use a highly structured VLT protocol to classify individuals who respond with changes in either vocal effort or vocal performance as a result of prolonged loud speaking with background noise. We hypothesize that a combination of measured changes in vocal performance and perceived vocal effort will classify VLT participants with and without vocal demand responses. Additionally, individuals who exhibit vocal demand responses may be implicated in having vocal fatigue. Successful confirmation of this hypothesis would pave the way for personalized medical interventions that could enhance patient health outcomes.
Methods
Participants
A total of 37 participants qualified and consented to participate. A target sample size of 40 participants was initially devised based on previous VLT studies10, which would accommodate potential four-class subtyping (yielding approximately 10 participants per subgroup) while also providing sufficient statistical power for aggregate analysis. The participant group consisted of 19 participants who identified their gender as women and 18 participants who identified their gender as men. The average age of the participants was 20.1 years (SD: 1.4), with 29 participants identifying as White, 5 as Black or African American, 2 as Hispanic or Latino, and 1 as Asian. The participants were enrolled at Michigan State University and received course credit as compensation for their participation. Michigan State University’s Human Research Protection Programs Human Subject Review Board provided human research participation oversight and approval for all experimental protocols (STUDY00004125, LEGACY16-689). Written, informed consent was obtained from all participants, and all methods followed relevant guidelines and regulations. To be included in the study, participants had to be between the ages of 18 and 49 and be native speakers of American English. Participants were excluded prior to participation if they self-reported current or past speech, voice, or hearing problems, were current smokers, or had a significant self-reported vocal handicap as determined by a VHI-1023 score exceeding 20. The participant group had a mean VHI-10 score of 8.4 (range 0–16; SD: 4.4), indicating mild to moderate self-perceived vocal handicap. It should be noted that while self-report measures and questionnaires were used for screening, direct laryngeal visualization was not performed as part of the study protocol. Additionally, the participants were tested for normal hearing through pure-tone stimulation (air conduction) of at least 20 dB HL in both ears at 500 Hz, 1 kHz, 2 kHz, and 4 kHz.
Instrumentation
Participants’ speech was recorded using a head-mounted omnidirectional microphone (B3, Countryman Associates, Menlo Park, CA) placed 5 cm from their mouth. The microphone signal was pre-amplified (HV-3D, Millennia Media, Diamond Springs, CA) and digitized (ADI-8 DS, RME Audio, Haimhausen, Germany) before being recorded using a digital audio workstation (REAPER, Cockos, Rosendale, NY) at a sampling rate of 44.1 kHz with 16-bit resolution. A reference sound level meter (IEC 60651 Type 2) was positioned 50 cm from the speaker’s mouth, and its reference microphone was calibrated to 94 dB SPL (relative to 20 µPa) using the two-step calibration procedure for head-mounted microphones found in Švec et al.24. PsychoPy (v3.0.225) was used to present the stimuli and collect the user’s vocal effort ratings. The schematic for instrumentation is shown in Fig. 1.
Procedure
After informed consent and the hearing screening, participants completed a series of tutorials to be introduced to the rating scales and speech stimuli used during the experiment. The tutorials provided instruction on how to use the computer interface and gave examples and practice of the map description task (see below). These tutorials were pre-scripted with simultaneous text and audio instructions to ensure that all participants received identical instructions and to optimize participant economy2,15. Then the participants completed a vocal loading task (VLT), which consisted of describing routes on maps in background noise for up to 30 min. The participants were asked to describe the routes accurately and in a manner that would be understood by someone needing to recreate the route. This context was included to provide communicative intent in the task. The background noise was multi-talker speech babble (six female and six male North American speakers26) that gradually increased in intensity from 45 dBA to 75 dBA over 30 s at a rate of 10 dB every 10 s. The maximal level of noise persisted throughout the task until voluntary termination or completion of six 5-min intervals (30 min). Before and after the VLT, participants read aloud the first paragraph of the Rainbow Passage27 and were instructed to read with comfortable pitch and loudness. During the VLT, the participants rated their perceived vocal effort using the Borg CR-100 (see Fig. 2)28,29. These ratings were collected before, after, and every five minutes during the VLT for a total of eight measurements throughout the task. The Borg CR-100 scale was both anecdotally and experientially anchored following the procedure in Hunter et al.29.
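The noise ramp and rating schedule described above can be illustrated with a short sketch. It assumes a linear ramp (the text gives only the start level, end level, and rate, so a stepped ramp is equally possible), and the function names are hypothetical rather than taken from the study's PsychoPy scripts. The time-point labels follow the PRE, VL05 to VL30, and POST naming used in the Results.

```python
def babble_level_dba(t_s, start=45.0, end=75.0, ramp_s=30.0):
    """Babble level in dBA at t_s seconds after noise onset.

    Assumes a linear ramp from `start` to `end` over `ramp_s` seconds,
    i.e. 10 dB every 10 s for the default values.
    """
    if t_s <= 0:
        return start
    if t_s >= ramp_s:
        return end
    return start + (end - start) * t_s / ramp_s


def rating_labels(interval_min=5, total_min=30):
    """Labels for the vocal effort ratings: before, every 5 min, and after."""
    during = [f"VL{m:02d}" for m in range(interval_min, total_min + 1, interval_min)]
    return ["PRE"] + during + ["POST"]


labels = rating_labels()
```

With the default values, the ramp reaches 55 dBA after 10 s and holds at 75 dBA from 30 s onward, and `rating_labels()` yields the eight rating points described in the procedure.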
Acoustic measurement
Speech samples were processed to remove non-voicing segments30,31. From each voice-only-concatenated speech segment, five acoustic parameters were computed: mean speaking fundamental frequency (F0), standard deviation of speaking fundamental frequency (F0sd), speech level (SL), standard deviation of speech level (SLsd), and smoothed cepstral peak prominence (CPPS). These parameters were selected to reflect basic vocal performance parameters including pitch, pitch variability, loudness, loudness variability, and voice quality.
F0, F0sd, and CPPS were computed using Praat (v.6.1.0932). Settings for F0 computation in Praat were: F0 range for male-pitched voices, 65 Hz to 350 Hz; F0 range for female-pitched voices, 150 to 800 Hz. Additionally, F0 and F0sd were converted from Hertz to semitones (ST) with the average F0 of the pre-VLT Rainbow Passage as the reference. The mean and standard deviations of speech level were computed from a distribution of speech level measurements from a moving window of 20 ms with 50% overlap.
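As a minimal sketch of the two computations described above, the following shows a Hertz-to-semitone conversion relative to a reference F0 and a framed level computation using 20 ms windows with 50% overlap. This is not the study's Praat or analysis code. The function names, the choice to compute F0sd from the semitone-converted track, and the `ref` calibration constant are assumptions made for illustration; obtaining levels in dB SPL would require the calibration procedure described under Instrumentation.

```python
import math
import statistics


def hz_to_semitones(f0_hz, ref_hz):
    """Convert a frequency in Hz to semitones (ST) relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def f0_summary_st(f0_track_hz, ref_hz):
    """Mean and sample SD, in ST, of a voiced F0 track relative to ref_hz.

    Assumption: variability is computed after conversion to semitones.
    """
    st = [hz_to_semitones(f, ref_hz) for f in f0_track_hz]
    return statistics.mean(st), statistics.stdev(st)


def frame_levels_db(samples, fs, win_s=0.020, overlap=0.5, ref=1.0):
    """Level in dB re `ref` for each frame (default 20 ms, 50% overlap).

    `ref` is a hypothetical calibration constant mapping sample amplitude
    to the reference pressure.
    """
    win = int(round(win_s * fs))
    hop = max(1, int(round(win * (1.0 - overlap))))
    levels = []
    for start in range(0, len(samples) - win + 1, hop):
        frame = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in frame) / win)
        if rms > 0:  # skip digitally silent frames to avoid log(0)
            levels.append(20.0 * math.log10(rms / ref))
    return levels


def level_summary_db(samples, fs):
    """Mean and sample SD of the framed level distribution."""
    levels = frame_levels_db(samples, fs)
    return statistics.mean(levels), statistics.stdev(levels)
```

For example, `hz_to_semitones(220.0, 110.0)` returns 12.0, one octave above the reference. Using the pre-VLT Rainbow Passage mean F0 as `ref_hz` makes each participant's baseline 0 ST, so later values read directly as change from baseline.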
Statistical analysis
SPSS (v. 26.0, IBM, Armonk, NY) was used for statistical analysis. Normality, independence, and equal variance assumptions were checked. If these assumptions were met, one-way analysis of variance (ANOVA) tests with an alpha level of 0.05 with Bonferroni multiple comparison adjustments were used to compare the sample means for self-reported vocal effort level ratings (VER) and the five acoustic parameters (F0, F0sd, SL, SLsd, CPPS) across each time point of the VLT. Pair-wise comparisons of each time point (pre, post, and the six 5-min increments during the loading task) were done using post hoc Tukey HSD tests. Welch’s ANOVA and Tamhane’s T2 post hoc tests were used if equal variance could not be assumed.
Two dimensions of participant grouping related to vocal demand responses were used, one for vocal effort response and the other for vocal performance response. To classify participants based on vocal effort response, ten proposed features were derived from the vocal effort ratings (see Table 1). Two groups with minimal and significant features were clustered using iterative k-means. The feature set was reduced based on feature importance and statistical significance within the models. The two resulting groups were labeled as “high vocal demand response” and “low vocal demand response” based on assumptions about the relationship between changes in vocal effort and vocal fatigue during vocal loading.
Participants’ vocal performance responses were categorized into groups using a general linear model (GLM) with an alpha level of 0.05 fitted for each participant, with time (pre- and post-vocal loading task) as the dependent variable and the five acoustic parameters as covariates. Participants were then grouped based on whether they exhibited a significant model and at least one significant change in acoustic feature within the model, with the “voice change” group indicating significant change(s) in vocal performance and the “no voice change” group indicating no significant changes.
After clustering based on both the vocal effort response and vocal performance response, four groups were created by intersecting these groups. The vocal performance of the groups was compared before and after vocal loading using the same statistical procedure as for the aggregate group.
Results
The extracted parameters VER, F0, SLsd, and CPPS met the assumptions for normality and independence; however, equal variance could not be assumed. F0sd and SL met all three assumptions. Table 2 summarizes the mean and standard deviation estimates for each measure across the time points (pre-VLT, 5-min increments during the VLT, and post-VLT) for all participants.
The only significant differences between PRE and POST were in VER and F0. There was a significant increase in VER of 22.7 from pre-VLT to post-VLT (p = 0.001). There was a significant but small increase in F0 of 0.83 ST from pre-VLT to post-VLT (p < 0.0001). For VER, F0, F0sd, SL, and SLsd there were significant increases between PRE and the measurements during the VLT (p < 0.05 for each test; see Table 2 for magnitude of change). For F0, F0sd, SL, and SLsd there were significant decreases between 30 min of VLT and POST (p < 0.05 for each test; see Table 2 for magnitude of change). VER did not decrease to pre-VLT values. There were no significant changes in CPPS. Additionally, there were no significant differences for any measure between the time increments of the VLT.
Vocal demand response clustering
The data were clustered based on two significant features, the noise demand response (NDR) and the temporal demand response (TDR). NDR is the difference between the vocal effort rating after five minutes of vocal loading \((VER_5)\) and the vocal effort rating prior to the loading task \((VER_0)\), while TDR is the difference between the vocal effort rating after thirty minutes of vocal loading \((VER_{30})\) and the vocal effort rating after 5 min of vocal loading \((VER_5)\). NDR quantifies the effect of noise on vocal effort as expected by the Lombard effect14, while TDR quantifies the effect of time speaking within noise across the VLT. Both features were found to be statistically significant (NDR: p = 0.003; TDR: p < 0.001) in the k-means clustering analysis. It is important to note that while statistical significance was achieved for these features, they were chosen to maximize the differences across the cases in the clusters and are used only for descriptive purposes. Independent samples t-tests showed that the means of the two clusters for NDR (p < 0.001) and TDR (p = 0.001) were significantly different. Additionally, these two features were found to be uncorrelated (r = 0.08). Cluster 1 consisted of 14 participants and had a center of NDR of 26.4 and TDR of 32.9, while Cluster 2 consisted of 23 participants and had a center of NDR of 12.2 and TDR of 0.8. Cluster 1 was relabeled as high vocal effort response (HVER) and Cluster 2 was relabeled as low vocal effort response (LVER) based on their respective features. Table 3 summarizes the count and centers of the two clusters, while Fig. 3 shows the data separated by cluster, including the cluster centers.
The analyses for VER over the duration of the VLT were repeated with the two groups (see Fig. 4). HVER showed a significant (p < 0.001) main effect of VER across the VLT, whereas LVER did not exhibit any significant effect of VER. For HVER, there was a considerable increase in VER from PRE to VL05 (26.4, p < 0.001), from VL05 to VL30 (32.9, p < 0.001), and between PRE and POST (46.4, p = 0.001). There were no significant differences in VER between HVER and LVER at PRE, VL05, or VL10. However, for the other time points, HVER had significantly higher VER than LVER (see Table 4).
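The two features and the two-cluster split can be sketched as follows, with NDR = VER_5 − VER_0 and TDR = VER_30 − VER_5. This is an illustrative implementation of plain Lloyd's k-means with k = 2, not the SPSS procedure used in the study. The function names, the initialization rule, and the four example rating sets are all invented for the sketch and are not study data.

```python
def demand_features(ver):
    """Return (NDR, TDR) from ratings keyed by minutes into the VLT."""
    ndr = ver[5] - ver[0]    # noise demand response: VER_5 - VER_0
    tdr = ver[30] - ver[5]   # temporal demand response: VER_30 - VER_5
    return ndr, tdr


def two_means(points, max_iter=100):
    """Plain Lloyd's algorithm with k = 2 on (NDR, TDR) pairs.

    Assumption: centers start at the points with the smallest and
    largest TDR, a deterministic choice made only for this sketch.
    """
    centers = [min(points, key=lambda p: p[1]), max(points, key=lambda p: p[1])]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min((0, 1), key=lambda k: (p[0] - centers[k][0]) ** 2
                                      + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels, centers


# Invented Borg CR-100 ratings (not study data), keyed by minutes.
ratings = [
    {0: 10, 5: 35, 30: 70},
    {0: 20, 5: 45, 30: 75},
    {0: 15, 5: 25, 30: 27},
    {0: 20, 5: 35, 30: 33},
]
features = [demand_features(r) for r in ratings]
labels, centers = two_means(features)
```

In this toy data the first two cases keep rising over the task and fall into one cluster, while the last two plateau after the initial response to noise and fall into the other, mirroring the HVER and LVER pattern described above.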
Acoustic voice change clustering
Out of the total number of participants, 16 were found to have significant general linear models (GLM) with a p-value less than 0.05 for the model and at least one acoustic covariate when comparing the PRE-POST differences of the five vocal performance measures. The remaining 21 participants who did not have significant models were grouped as the no voice change group (NC), while the 16 participants with significant models were placed in the voice change group (VC). The VC group had an average goodness-of-fit coefficient of 0.93 (SD = 0.27). As the GLMs for the NC group were not significant, no goodness-of-fit coefficients are reported.
Vocal demand response and acoustic voice change subtyping
Four subgroups were formed by intersecting the clusters based on vocal effort response and acoustic voice change, namely low vocal effort response and no voice change (LVER-NC), low vocal effort response and voice changes (LVER-VC), high vocal effort response and no voice change (HVER-NC), and high vocal effort response and voice changes (HVER-VC). LVER-NC had 15 participants (10 males and 5 females), LVER-VC had 8 participants (2 males and 6 females), HVER-NC had 6 participants (3 males and 3 females), and HVER-VC had 8 participants (3 males and 5 females). A summary of the groups is presented in Table 5.
Following the combined clustering, the pre-post comparisons of vocal performance measures were repeated for each subgroup. These differences, with additional statistical details, are summarized in Table 6. Mean fundamental frequency (F0) significantly increased for all groups except LVER-VC. For the HVER-VC group (n = 8), all five acoustic parameters of vocal performance changed significantly from pre- to post-VLT. Specifically, F0 increased by 0.78 ST (p < 0.001), F0sd increased by 0.42 ST (p = 0.022), SL increased by 1.48 dB (p = 0.022), SLsd increased by 0.28 dB (p = 0.004), and CPPS decreased by 0.54 dB (p = 0.045). No other statistically significant relationships were observed.
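As a quick arithmetic check, the four subgroup sizes reported above can be cross-tabulated back into the marginal totals of the two classifications (14 HVER and 23 LVER; 16 VC and 21 NC). The sketch below rebuilds one pair of labels per participant from the reported counts only. Variable names are illustrative and no individual-level data are implied.

```python
from collections import Counter

# Subgroup sizes as reported in the text.
reported = {"LVER-NC": 15, "LVER-VC": 8, "HVER-NC": 6, "HVER-VC": 8}

# Expand to one (effort label, voice label) pair per participant.
participants = [
    tuple(name.split("-"))
    for name, n in reported.items()
    for _ in range(n)
]

# Intersect the two classifications and recover each marginal total.
subgroups = Counter(f"{effort}-{voice}" for effort, voice in participants)
effort_totals = Counter(effort for effort, _ in participants)
voice_totals = Counter(voice for _, voice in participants)
total = len(participants)
```

Here `effort_totals` gives 14 HVER and 23 LVER, `voice_totals` gives 16 VC and 21 NC, and `total` is 37, matching the cluster sizes and participant count reported earlier.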
Discussion
This study aims to identify vocal fatigue through the classification of individuals based on their response to the vocal demands of prolonged speaking with elevated background noise. This step is important in better understanding individual differences and group classification which can lead to better interventions for vocal fatigue. For this research, vocal fatigue was operationally defined through two dimensions: (1) self-perceived vocal effort (Borg CR-100 scale) and (2) objective changes in vocal performance measured through speaking fundamental frequency (F0), speech level (SL), and smoothed cepstral peak prominence (CPPS). A combination of unsupervised machine learning and null-hypothesis testing was used to subtype participants. The main hypothesis is that this approach will help identify participants who exhibit changes in vocal effort or vocal performance, as well as those who experience state vocal fatigue related to the tested communication demands (prolonged speaking and elevated loudness). This hypothesis is supported by the distinct differences in vocal demand responses across the four classified responder subgroups.
Prior to classification, there were few detectable changes because of the VLT. The most notable changes in vocal effort and performance occur between the time before the vocal loading task (PRE) and after 5 min of the task (VL05), with significant increases in VER, F0, F0sd, SL, and SLsd. These changes are consistent with the Lombard effect where an increase in background noise results in a change in voicing to accommodate the noise33,34. Interestingly, no significant changes were observed throughout the duration of the VLT, suggesting that the voicing pattern remained constant until the background noise was removed. While VER trended upward throughout the VLT, it was not significant until clustering was performed. The PRE-VLT VER levels (17.1; between slight and moderate vocal effort) were higher than previously reported vocal effort ratings using the Borg CR-10 scale in conversational speech (1.4; between very slight and slight vocal effort35), but still fell within the same range as baseline vocal effort ratings measured with the Borg CR-100 scale in a laboratory setting (24; between slight and moderate vocal effort14). The PRE-POST increase in VER was expected and consistent with previous VLT studies, but the increase in F0 was not consistently observed in previous studies and may be due to a vocal warm-up effect. Studies have demonstrated a similar warm-up effect in college students, where there was a change in voice quality throughout the day6, and in schoolteachers throughout their workday36. More changes in vocal production were expected between PRE and POST, but before subtyping, the changes in VER and F0 did not implicate vocal fatigue.
Vocal demand response subgroups
The clustering analysis of VER revealed two distinct groups with significantly different responses to the vocal demand. These groups were characterized by their noise demand response and temporal demand response, which relate to individual responses to the background noise demand and prolonged speaking demand presented during the study.
While the VER clustering provided valuable information, the second stage of acoustic clustering offered additional insight. Three of the four groups exhibited changes in F0 similar to those of the aggregate subject pool. Notably, the LVER-VC group, whose members had low vocal effort responses but significant individual voice changes, did not show any significant acoustic voice changes as a group. Upon closer inspection, it was discovered that individual variation was high in this group and the direction of voice change was inconsistent, resulting in an aggregate of no change. Conversely, the HVER-VC group, whose members had high vocal effort responses and significant voice changes, exhibited similar changes in all acoustic measures between PRE and POST, resulting in statistically significant results. These findings indicate a measurable individual component in voice change and vocal fatigue resulting from vocal demand26. These results also shed light on the conflicting findings from previous attempts to measure vocal fatigue associated with vocal loading. Moreover, they offer additional empirical support for the conceptual framework proposed by Nanjundeswaran and Shembel20, which highlights the heterogeneous nature of vocal fatigue.
This heterogeneity is empirically supported by findings from related vocal loading task studies, which demonstrated variable responses across different clinical populations and voice parameters21,37. These investigations observed that self-perceptual measures of vocal effort and discomfort consistently showed significant changes after vocal loading in both typical voice users and those with primary muscle tension dysphonia (pMTD), while objective measures such as supraglottic compression, acoustic parameters, and extrinsic laryngeal muscle tension varied considerably between and within groups. Specifically, quantitative measures of laryngeal configuration and acoustic measures like cepstral peak prominence (CPP) showed complex relationships with perceived effort rather than consistent group-level changes after vocal loading38. These findings parallel the current study’s observation that only when individuals are classified by their specific responses to vocal demands do meaningful patterns of vocal fatigue emerge, supporting the need for subtyping approaches rather than relying solely on group averages39. This has implications for tailoring vocal health interventions to individual profiles rather than general trends.
Building on these insights, although the present study did not specifically investigate explanatory factors for individual classification, the framework suggests that differences in individuals’ baseline vocal fitness and their perception of vocal demands could explain the formation of vocal demand response groups20,39. Future studies should directly assess baseline vocal fitness potentially through physiological, aerodynamic, and self-assessment measures to determine if pre-existing vocal capabilities predict an individual’s vocal demand response pattern and susceptibility to vocal fatigue.
Vocal fatigue symptoms
Secondary analyses were performed to investigate potential associations between vocal fatigue symptoms and the vocal demand response subtypes. Prior to the experiment, participants completed the Vocal Fatigue Index (VFI)3, allowing for comparison between self-reported vocal fatigue symptoms and the identified response subtypes. The VFI subscales (tiredness of voice, physical discomfort, improvement with rest) were compared across the clusters and response dimensions using one-way ANOVA tests (alpha = 0.05) with Bonferroni multiple comparison corrections. Notably, participants in the voice change group (LVER-VC and HVER-VC combined) demonstrated significantly higher scores (p = 0.01) on the second VFI component (physical discomfort; VFI-2) compared to those in the no voice change group (LVER-NC and HVER-NC combined). The voice change group had a mean VFI-2 score of 3.64 (SD = 2.56), while the no voice change group had a mean score of 1.52 (SD = 2.04). No other relationships were found to be statistically significant. These findings suggest the potential utility of the VFI-2 subscale as a screening tool for identifying individuals at risk for vocal misuse under vocal demands such as background noise or prolonged speaking. Prior research supports this application, showing meaningful correlations between the physical discomfort subscale and both physiological measures (pulmonary function)40 and environmental factors (classroom size)41 in occupational voice users. The relationship between self-reported physical discomfort and objective vocal changes may offer clinicians an efficient means to identify patients who would benefit most from targeted intervention strategies.
Potential clinical implications
The LVER-VC group is of potential clinical interest because these individuals may experience vocal fatigue without perceiving changes in vocal effort. Consequently, they might not recognize the need for proper vocal rest during periods of fatigue, unlike the HVER-VC group. This aligns with the theory proposed by Whitling et al.26, who observed a subset of participants exhibiting remarkable endurance in VLTs. The authors suggested that this group of individuals with heightened endurance may share characteristics with patients seen in voice clinics, implying that repetitive overuse of the voice without adequate regulation could pose a risk factor for voice disorders. Moreover, this group was composed primarily of female participants (female-to-male ratio of 6:2), a demographic with a higher risk of voice problems42,43.
Building upon these observations about the LVER-VC group, vocal loading research on patients with primary muscle tension dysphonia (pMTD) contributes to a framework for understanding the disconnect between objective vocal changes and subjective perception21,22,37. The poor correlations between physical measures (extrinsic laryngeal muscle tension, supraglottic compression) and perceived vocal effort suggest that afferent (sensory) mechanisms may be more critical in symptom manifestation than motor function. For the LVER-VC group, this sensory processing difference may delay appropriate compensatory behaviors. This extends beyond simple endurance to suggest that sensory awareness training could be a valuable therapeutic approach, potentially preventing progression from subclinical voice changes to voice disorders through improved proprioceptive monitoring.
Another potential clinical interest is applying the VLT classification as an objective marker for how vocal responses change over time. Goals of intervention relating to reducing excessive vocal effort or adverse vocal demand responses could be evaluated by the classification of the responses to the VLT. Additionally, this classification approach focuses on individual performance, which could help create a personalized way to detect the positive impacts of therapeutic intervention.
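As a rough illustration of how the classification could serve as a marker of change over time, the sketch below maps the two response dimensions onto the four subgroup labels used in this study and reports the shift between two VLT sessions. It is a hypothetical sketch only: the names `VltResponse`, `subtype`, and `describe_shift` are invented for illustration, the reading of LVER/HVER as lower/higher vocal effort response is an assumption, and the way each dimension is determined from ratings and acoustic measures is deliberately not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VltResponse:
    """Outcome of one VLT session along the two response dimensions."""
    high_effort_response: bool  # assumed meaning: perceived vocal effort rose
    voice_change: bool          # vocal performance changed pre- to post-VLT


def subtype(response):
    """Return the subgroup label: LVER-NC, LVER-VC, HVER-NC, or HVER-VC."""
    effort = "HVER" if response.high_effort_response else "LVER"
    change = "VC" if response.voice_change else "NC"
    return f"{effort}-{change}"


def describe_shift(before, after):
    """Summarize how a participant's subtype changed between two sessions."""
    return f"{subtype(before)} -> {subtype(after)}"


if __name__ == "__main__":
    # Hypothetical participant: sessions before and after an intervention.
    pre_therapy = VltResponse(high_effort_response=True, voice_change=True)
    post_therapy = VltResponse(high_effort_response=False, voice_change=False)
    print(describe_shift(pre_therapy, post_therapy))
```

Keeping the two dimensions as separate fields, rather than storing only the combined label, lets a change in perceived effort be tracked independently of a change in voice production.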
Limitations and opportunities
This study has limitations that also point to opportunities for future work. One limitation is the restricted sample of college-age adults, which may limit generalization of the findings to other age groups or populations that may respond differently to the study parameters. Additionally, segmenting the sample into four distinct subgroups reduces the statistical power of the study, despite its having more participants than many other vocal loading studies10. Nonetheless, the ability to observe significant differences within these smaller groups is noteworthy.
Expanding the subject pool in both size and diversity would enhance the study’s validity. To facilitate this, the study was designed and executed using the free PsychoPy platform, which allows identical instructions and protocols to be employed at any location with the necessary hardware (e.g., microphones and speakers). The presentation program incorporated automated segmentation protocols, allowing for rapid data processing and substantially lowering computation costs. Deploying similar VLT designs will enable comparable research and a practical increase in sample size.
Additionally, psychological and physical measurements of the participants should be collected to investigate possible correlations between vocal demand responses and individual attributes, such as personality and vocal experience. Discovering these traits could unveil potential risk factors for vocal fatigue, leading to a better comprehension of vocal fatigue and laying the groundwork for reducing its prevalence and impact. Given previous research which has shown connections between voice and psychophysical measurement, future studies should better incorporate a battery of measures to assist in vocal health research.
Conclusions
This study shows that inconsistencies in vocal loading task (VLT) studies on state vocal fatigue can be reduced using a multi-faceted approach. This includes a consistent framework and definition for vocal fatigue, a modular and comparable VLT protocol, and a computational method for classifying vocal demand responders. While the first two suggestions have been proposed in previous work, it bears repeating that a consistent framework and definition, together with modular and comparable protocols, are essential for advancing the field. Novel to this report is the approach of vocal demand responder subtyping. This approach quantifies state vocal fatigue through measurable changes in both perceived effort and vocal acoustic parameters, enabling individual classification of fatigue responses. Identifying these responders is important for developing personalized therapeutic approaches and understanding the underlying mechanisms of vocal fatigue, and it also provides an example of how a precision medicine approach can be implemented in other voice assessment settings.
Data availability
The post-processed, de-identified data that support these findings are available upon request from the corresponding author.
References
Moreno, M., Calvache, C. & Cantor-Cutiva, C. Systematic review of literature on prevalence of vocal fatigue among teachers. J. Voice https://doi.org/10.1016/j.jvoice.2022.07.029 (2022).
Hunter, E. J. et al. Toward a consensus description of vocal effort, vocal load, vocal loading, and vocal fatigue. J. Speech Lang. Hear. Res. JSLHR 63, 509–532. https://doi.org/10.1044/2019_JSLHR-19-00057 (2020).
Nanjundeswaran, C., Jacobson, B. H., Gartner-Schmidt, J. & Verdolini Abbott, K. Vocal fatigue index (VFI): Development and validation. J. Voice 29, 433–440. https://doi.org/10.1016/j.jvoice.2014.09.012 (2015).
Halpern, A. E., Spielman, J. L., Hunter, E. J. & Titze, I. R. The inability to produce soft voice (IPSV): a tool to detect vocal change in school-teachers. Logoped. Phoniatr. Vocol. 34, 117–127. https://doi.org/10.1080/14015430903062712 (2009).
Remacle, A., Garnier, M., Gerber, S., David, C. & Petillon, C. Vocal change patterns during a teaching day: Inter- and intra-subject variability. J. Voice 32, 57–63. https://doi.org/10.1016/j.jvoice.2017.03.008 (2018).
Ben-David, B. M. & Icht, M. Voice changes in real speaking situations during a day, with and without vocal loading: Assessing call center operators. J. Voice 30, 247.e1–247.e11. https://doi.org/10.1016/j.jvoice.2015.04.002 (2016).
Carroll, T. et al. Objective measurement of vocal fatigue in classical singers: a vocal dosimetry pilot study. Otolaryngology 135, 595–602. https://doi.org/10.1016/j.otohns.2006.06.1268 (2006).
Schloneger, M. J. & Hunter, E. J. Assessments of voice use and voice quality among college/university singing students ages 18–24 through ambulatory monitoring with a full accelerometer signal. J. Voice 31, 124.e21–124.e30. https://doi.org/10.1016/j.jvoice.2015.12.018 (2017).
Cantor-Cutiva, C., Bottalico, P. & Hunter, E. Work-related communicative profile of radio broadcasters: a case study. Logoped. Phoniatr. Vocol. 44, 178–191. https://doi.org/10.1080/14015439.2018.1504983 (2019).
Fujiki, R. B. & Sivasankar, M. P. A review of vocal loading tasks in the voice literature. J. Voice 31, 388.e33–388.e39. https://doi.org/10.1016/j.jvoice.2016.09.019 (2017).
Erickson-Levendoski, E. & Sivasankar, M. Investigating the effects of caffeine on phonation. J. Voice 25, e215–e219. https://doi.org/10.1016/j.jvoice.2011.02.009 (2011).
Whitling, S., Lyberg-Åhlander, V. & Rydell, R. Long-time voice accumulation during work, leisure, and a vocal loading task in groups with different levels of functional voice problems. J. Voice 31, 246.e1–246.e10. https://doi.org/10.1016/j.jvoice.2016.08.008 (2017).
Anand, S., Bottalico, P. & Gray, C. Vocal fatigue in prospective vocal professionals. J. Voice 35, 247–258. https://doi.org/10.1016/j.jvoice.2019.08.015 (2021).
Berardi, M. L. & Hunter, E. J. Self-perception of vocal effort in response to modeled communication demands. J. Voice https://doi.org/10.1016/j.jvoice.2022.05.020 (2022).
Berardi, M. L. Validation and Application of Experimental Framework for the Study of Vocal Fatigue (Michigan State University, 2020).
D’haeseleer, E. et al. Factors involved in vocal fatigue: A pilot study. Folia Phoniatr. Logop. 68, 112–118. https://doi.org/10.1159/000452127 (2016).
Fujiki, R. B., Chapleau, A., Sundarrajan, A., McKenna, V. & Sivasankar, M. P. The interaction of surface hydration and vocal loading on voice measures. J. Voice 31, 211–217. https://doi.org/10.1016/j.jvoice.2016.07.005 (2017).
Kitch, J. A., Oates, J. & Greenwood, K. Performance effects on the voices of 10 choral tenors: acoustic and perceptual findings. J. Voice 10, 217–227. https://doi.org/10.1016/S0892-1997(96)80002-6 (1996).
Hunter, E. J., Berardi, M. L. & Whitling, S. A semiautomated protocol towards quantifying vocal effort in relation to vocal performance during a vocal loading task. J. Voice https://doi.org/10.1016/j.jvoice.2022.01.003 (2022).
Nanjundeswaran, C. & Shembel, A. C. Laying the groundwork to study the heterogeneous nature of vocal fatigue. J. Voice (2022).
McDowell, S., Morrison, R., Mau, T. & Shembel, A. C. Clinical characteristics and effects of vocal demands in occupational voice users with and without primary muscle tension dysphonia. J. Voice 39, 448–456. https://doi.org/10.1016/j.jvoice.2022.10.005 (2025).
Shembel, A. C. et al. Relationships between laryngoscopic analysis metrics of supraglottic compression and vocal effort in primary muscle tension dysphonia. J. Voice https://doi.org/10.1016/j.jvoice.2023.06.011 (2023).
Rosen, C. A., Lee, A. S., Osborne, J., Zullo, T. & Murry, T. Development and validation of the voice handicap index-10. Laryngoscope 114, 1549–1556. https://doi.org/10.1097/00005537-200409000-00009 (2004).
Svec, J. G., Popolo, P. S. & Titze, I. R. Measurement of vocal doses in speech: experimental procedure and signal processing. Logoped. Phoniatr. Vocol. 28, 181–192. https://doi.org/10.1080/14015430310018892 (2003).
Peirce, J. et al. PsychoPy2: Experiments in behavior made easy. Behav. Res. Methods 51, 195–203. https://doi.org/10.3758/s13428-018-01193-y (2019).
Whitling, S., Rydell, R. & Åhlander, V. L. Design of a clinical vocal loading test with long-time measurement of voice. J. Voice 29, 261.e13–261.e27. https://doi.org/10.1016/j.jvoice.2014.07.012 (2015).
Fairbanks, G. Voice and Articulation Drillbook (Harper & Bros, 1960).
Borg, G. Borg’s Perceived Exertion and Pain Scales (Human Kinetics, 1998).
Hunter, E. J., Berardi, M. L. & van Mersbergen, M. Relationship between tasked vocal effort levels and measures of vocal intensity. J. Speech Lang. Hear. Res. JSLHR 64, 1829–1840. https://doi.org/10.1044/2021_JSLHR-20-00465 (2021).
Maryn, Y., Bodt, M. & Roy, N. The acoustic voice quality index: toward improved treatment outcomes assessment in voice disorders. J. Commun. Disord. 43, 161–174. https://doi.org/10.1016/j.jcomdis.2009.12.004 (2010).
Rubin, A. D. et al. Comparison of pitch strength with perceptual and other acoustic metric outcome measures following medialization laryngoplasty. J. Voice 33, 795–800. https://doi.org/10.1016/j.jvoice.2018.03.019 (2019).
Boersma, P. Praat, a system for doing phonetics by computer. Glot. Int. 5, 341–345 (2001).
Berardi, M. & Hunter, E. J. Voice acoustics and effort of three different communication scenarios presented in an anechoic baseline. J. Acoust. Soc. Am. 153, A171–A171 (2023).
Bottalico, P., Passione, I. I., Graetzer, S. & Hunter, E. J. Evaluation of the starting point of the Lombard effect. Acta Acustica 103, 169–172. https://doi.org/10.3813/AAA.919043 (2017).
van Leer, E. & van Mersbergen, M. Using the Borg CR10 physical exertion scale to measure patient-perceived vocal effort pre and post treatment. J. Voice 31, 389.e19–389.e25. https://doi.org/10.1016/j.jvoice.2016.09.023 (2017).
Rantala, L., Vilkman, E. & Bloigu, R. Voice changes during work: subjective complaints and objective measurements for female primary and secondary schoolteachers. J. Voice 16, 344–355 (2002).
Shembel, A. C. et al. Extrinsic laryngeal muscle tension in primary muscle tension dysphonia with shear wave elastography. Laryngoscope 133, 3482–3491. https://doi.org/10.1002/lary.30830 (2023).
Toles, L. E. & Shembel, A. C. Acoustic and physiologic correlates of vocal effort in individuals with and without primary muscle tension dysphonia. Am. J. Speech Lang. Pathol. 33, 237–247. https://doi.org/10.1044/2023_ajslp-23-00159 (2024).
Shembel, A. C. & Nanjundeswaran, C. Potential biophysiological mechanisms underlying vocal demands and vocal fatigue. J. Voice https://doi.org/10.1016/j.jvoice.2022.07.017 (2022).
Hunter, E. J., Maxfield, L. & Graetzer, S. The effect of pulmonary function on the incidence of vocal fatigue among teachers. J. Voice 34, 539–546. https://doi.org/10.1016/j.jvoice.2018.12.011 (2020).
Banks, R., Bottalico, P. & Hunter, E. The effect of classroom capacity on vocal fatigue as quantified by the vocal fatigue index. Folia Phoniatr. Logop. 69, 85–93. https://doi.org/10.1159/000484558 (2017).
Hunter, E. J., Tanner, K. & Smith, M. E. Gender differences affecting vocal health of women in vocally demanding careers. Logoped. Phoniatr. Vocol. 36, 128–136. https://doi.org/10.3109/14015439.2011.587447 (2011).
Roy, N., Merrill, R. M., Gray, S. D. & Smith, E. M. Voice disorders in the general population: prevalence, risk factors, and occupational impact. Laryngoscope 115, 1988–1995. https://doi.org/10.1097/01.mlg.0000179174.32345.41 (2005).
Acknowledgements
The authors would like to acknowledge all participants who consented to join us in this research endeavor. The analysis protocols and techniques reported in this publication were supported by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under Award Number R01DC012315 (P.I. Eric Hunter). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author information
Contributions
M.B., S.W., and E.H. conceived the experiment, M.B. conducted the experiment, M.B. and E.H. analyzed the results. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Berardi, M.L., Whitling, S. & Hunter, E.J. Voice fatigue subtyping through individual modeling of vocal demand responses. Sci Rep 15, 25718 (2025). https://doi.org/10.1038/s41598-025-10565-2
DOI: https://doi.org/10.1038/s41598-025-10565-2