Introduction

Although verbal content is the main component of spoken human communication for hearing individuals, successful communication involves much more than the verbal message alone. To interpret words in context and gain full insight into the speaker’s communicative intent, nonverbal auditory information must also be comprehended1,2,3. Prosody is a complex nonverbal speech attribute associated with the acoustic properties of pitch, duration and intensity, and with their combination and modulation in intonation and stress patterns, over temporal scales ranging from individual phonemes to phrases. It carries multidimensional information and serves diverse functions, including disambiguating meaning, highlighting or emphasising particular elements of a spoken message, and signalling emotion4. Broadly, the functions of prosody can be considered linguistic (for example, indicating whether a statement is declarative or interrogative) or affective (conveying the speaker’s emotional state)5. Modulations in vocal pitch (fundamental frequency (F0)), durational properties (such as syllable length), intensity, and voice quality that listeners perceive as conveying emotional states are collectively known as ‘emotional prosody’6,7. Different emotions tend to create different prosodic ‘profiles’ in speech. For instance, joy is typically characterised by a faster speech rate, higher intensity, and increases in F0 mean and variability, resulting in more melodic and energetic speech, whereas sadness is typically characterised by slower speech, at lower intensity, with decreases in F0 mean and variability8,9. Impaired prosodic processing has been shown to have implications for social interactions and interpersonal relationships6.

Consideration of how prosodic processing is affected in neurodegenerative diseases is important, and may be particularly pertinent to the language-led dementia (primary progressive aphasia, PPA) syndromes characterised by profound communication difficulties10,11. Syndromes across the Alzheimer’s disease (AD) and frontotemporal lobar degeneration spectrum are characterised by impairments in speech processing12,13,14,15,16 and/or social signal processing17,18,19,20,21. Previous research has identified impairments in nonverbal auditory perception in typical AD and, most notably, in non-fluent/agrammatic variant PPA (nfvPPA) and semantic variant PPA (svPPA)22,23,24,25,26,27. Impaired perception of emotional prosody has been documented in typical AD7,28,29,30,31, svPPA, nfvPPA and the language-led variant of AD, logopenic variant PPA (lvPPA)5,32,33,34. Patients with lvPPA also show impaired tracking of an acoustic parameter (the speech envelope) relevant to prosody perception35.

However, all these studies have assessed emotional prosodic comprehension using ‘clear’ speech stimuli under ideal laboratory conditions, which are unlikely to reflect the reality of communication in daily life, where speech is often degraded in quality or masked by noise and other competing signals. The comprehension of ‘degraded’ emotional prosody has been largely unexplored. One widely used technique for altering speech signals experimentally is noise-vocoding, whereby the speech signal is divided digitally into discrete frequency bands (‘channels’), each filled with white noise and modulated by the amplitude envelope of the original signal within that band36. Noise-vocoding approximately simulates the common everyday experience of interpreting vocal signals of reduced acoustic quality (such as over a low-quality telephone or video connection, or when whispered)37. Among various alternative methods37,38, noise-vocoding is an attractive paradigm for studying the effects of neurodegenerative diseases on the comprehension of degraded speech, and in particular of vocal emotional signals, for three main reasons. Firstly, it allows the amount of acoustic degradation of a speech signal to be parameterised. Secondly, many emotional prosodic cues depend on spectral detail, and are therefore likely to be sensitive to noise-vocoding, a technique that specifically degrades spectral detail in speech signals. Thirdly, noise-vocoding has previously been used successfully to derive thresholds for comprehension of degraded verbal messages in the diagnostic groups targeted in the present study39, suggesting that this paradigm might be well suited to deriving analogous measures of degraded prosodic comprehension in the same clinical populations.

Here, we explored perception of emotional prosody in major PPA syndromes and AD, both in ‘clear’ and ‘degraded’ speech forms. In line with previous research5,7,28,29,30,31,32,33,34,35,40 and our own cumulative clinical experience, we hypothesised that people with AD and PPA would perform worse than healthy age-matched individuals at identifying ‘clear’ emotional prosody, with an additional performance cost from degrading emotional prosodic cues; and that these deficits would be most marked in nfvPPA and lvPPA39. We further predicted that emotional prosody comprehension performance would correlate with measures of daily life socio-emotional functioning.

Materials and methods

Participants

Eighteen patients with typical amnestic AD, nine patients with lvPPA, 11 patients with nfvPPA, and 11 patients with svPPA were recruited via a specialist cognitive clinic. All patients fulfilled consensus clinical diagnostic criteria14,41. Twenty-four healthy age-matched individuals with no history of neurological or psychiatric disorders were recruited from the Dementia Research Centre volunteer database.

All participants had a comprehensive general neuropsychological assessment (Table 1). None had a history of otological disease, other than presbycusis; participants assessed in person at the research centre had their peripheral hearing assessed using pure-tone audiometry, following a previously described procedure (details in the Supplementary Material, available online). No participants were excluded based on their general neuropsychological testing performance.

Table 1 General demographic, clinical and neuropsychological characteristics of all participant groups.

Due to the COVID-19 pandemic, some data for this study were collected remotely (see Supplementary Materials). We have described the design and implementation of our remote neuropsychological assessment protocol elsewhere42. Participants assessed remotely completed the T-MMSE43, while those assessed in person completed the standard MMSE; for all participants, we applied a scalar conversion to generate a combined MMSE score for incorporation into analyses43 (Table 1). Performance profiles of seven healthy control participants who performed the experiment both in person and subsequently remotely were very similar, justifying combining participants tested in person and remotely in the main analyses (Fig. S3)39.

All participants gave written informed consent to take part in the study. Ethical approval was granted by the UCL-NHNN Joint Research Ethics Committees, in accordance with Declaration of Helsinki guidelines.

Creation of experimental stimuli

Forty-five three-digit numbers (of the form: ‘three-hundred-and-seventy-three’), spoken by two adult male and two adult female speakers (all with Standard Southern British English accents), were taken from a previously normed set of vocal emotional stimuli44. The full stimulus set comprises six emotions (anger, surprise, sadness, fear, disgust and happiness). Here, to reduce response demands for patients, we selected numbers portraying three emotions: anger, surprise and sadness, chosen because previous work has shown that healthy older individuals are consistently able to identify these5.

Speech recordings were noise-vocoded using Praat (https://www.fon.hum.uva.nl/praat/) to generate acoustically altered stimuli at six, 12 or 18 channels (see Supplementary Fig. 1 for spectrograms). Details concerning the synthesis of noise-vocoded stimuli are provided in Supplementary Material online. Three levels of noise-vocoding were used in consideration of task length and participant fatigue, in contrast to a previous thresholding procedure which included 24 vocoding channels39. The three levels chosen were designated hard (six channels), medium (12 channels) and easy (18 channels) comprehension conditions. This choice was informed by previous work showing that vocoded speech with 10 channels is readily intelligible to healthy listeners, whereas speech vocoded with four channels becomes intelligible only after hours of training45. At each noise-vocoding level, 15 three-digit number stimuli were presented, with five numbers for each of the three emotions. Twenty-one three-digit number stimuli were kept in ‘clear’ form: six (two per emotion) were used as practice items to familiarise participants with the stimuli before the experimental test, and 15 (five per emotion) were used as a clear speech control condition. Thus, a total of 60 stimuli (20 for each emotion) were presented during the experimental test session, which typically lasted around 20 min.
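
To illustrate the manipulation, the sketch below implements a basic multi-channel noise-vocoder in R (the language used for the statistical analyses described below). It is a minimal sketch only: it assumes the tuneR and signal packages, and the filter orders, logarithmic band spacing and 30 Hz envelope cut-off are illustrative choices rather than the parameters used to generate the study stimuli, which were synthesised in Praat as detailed in the Supplementary Material.

```r
# Minimal noise-vocoder sketch (illustrative parameters only; the study stimuli
# were generated in Praat, not with this code).
library(tuneR)    # readWave(), Wave()
library(signal)   # butter(), filtfilt()

noise_vocode <- function(wave, n_channels = 6,
                         f_lo = 70, f_hi = 5000, env_cutoff = 30) {
  x   <- wave@left / max(abs(wave@left))   # assume mono input; normalise
  fs  <- wave@samp.rate
  nyq <- fs / 2

  # Channel edges spaced logarithmically between f_lo and f_hi
  edges <- exp(seq(log(f_lo), log(f_hi), length.out = n_channels + 1))

  out <- numeric(length(x))
  for (i in seq_len(n_channels)) {
    # Band-pass the speech into this channel
    band_filt <- butter(4, c(edges[i], edges[i + 1]) / nyq, type = "pass")
    band      <- filtfilt(band_filt, x)

    # Amplitude envelope of this band: rectify, then low-pass filter
    env_filt <- butter(2, env_cutoff / nyq, type = "low")
    env      <- pmax(filtfilt(env_filt, abs(band)), 0)

    # White-noise carrier restricted to the same band, modulated by the envelope
    carrier <- filtfilt(band_filt, rnorm(length(x)))
    out     <- out + env * carrier
  }

  Wave(left = round(out / max(abs(out)) * 32000), samp.rate = fs, bit = 16)
}

# Example (hypothetical file name):
# vocoded_6 <- noise_vocode(readWave("angry_373.wav"), n_channels = 6)
```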

Experimental procedure

The stimuli were administered either in-person in a quiet room via Audio-Technica ATH-M50x headphones at a comfortable fixed listening level (at least 70 dB), or remotely via Labvanced and shared through a video link (see Supplementary materials online).

To familiarise them with the experimental procedure, participants first heard the six practice clear three-digit number stimuli and were asked to identify which emotion each number was spoken with, using a cue card as a guide and response aid if necessary (Fig. S2). Feedback was given on each practice trial, with participants told whether they had answered correctly.

Participants then continued to the experimental task proper, in which the 45 noise-vocoded speech trials were first presented in randomised order, followed by the 15 clear speech trials, again in randomised order. On each trial, participants were asked to indicate, either verbally or by pointing at the cue card (Fig. S2), which emotion was portrayed (sadness, anger or surprise). Responses were noted by the examiner for offline analysis; no feedback about performance was given, and no time limits were imposed.

Assessment of social cognition

To assess patients’ social cognition, the modified Interpersonal Reactivity Index (mIRI) and the Revised Self-Monitoring Scale (RSMS) questionnaires were completed by the primary caregiver or another close informant on each patient’s behalf.

The mIRI, frequently used with people with dementia46,47, is based on the Interpersonal Reactivity Index48. It includes two seven-item subscales: the first measures cognitive empathy in the form of perspective taking, and the second assesses emotional empathy in the form of empathic concern. The questionnaire comprises a series of statements, and respondents rate how well each statement describes the participant on a Likert scale.

The RSMS (Lennox & Wolfe, 1984), also frequently used with people with dementia49,50, is a 13-item questionnaire based on the Self-Monitoring Scale51. It comprises two subscales: the first measures sensitivity to expressive behaviour, and the second measures the tendency to monitor self-presentation. As for the mIRI, respondents rate how well each statement describes the participant on a Likert scale.

Data analysis

Data were analysed in R (version 4). For continuous demographic and background neuropsychological data, participant groups were compared using ANOVA or Kruskal-Wallis tests, depending on whether the assumptions of the general linear model were met. Categorical group data were compared using Fisher’s exact test.

For the main experimental and control (emotional prosody) tests, a binary score was given for each trial: one if the emotion was correctly identified, zero if incorrect. To reduce bias in subsequent analyses, participants who scored at or below chance on the clear emotion recognition control task (i.e., a score that could have been achieved by random guessing) were excluded. The threshold for chance performance was calculated from the cumulative binomial probability function: with 15 trials and a per-trial success probability of 0.33, a hit rate (k) of nine or more (out of 15) was unlikely to be achieved by chance (p = 0.029; for k = 8, p = 0.084)52.
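
As a worked check of this threshold, the binomial upper tail in R reproduces the values reported above:

```r
# Chance threshold for the 15-trial, three-alternative clear-speech control task
p_guess <- 0.33                              # probability of a correct guess per trial
1 - pbinom(8, size = 15, prob = p_guess)     # P(9 or more correct) ~ 0.029
1 - pbinom(7, size = 15, prob = p_guess)     # P(8 or more correct) ~ 0.084
# Hence nine or more correct was taken as evidence of above-chance performance.
```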

For the control task (clear emotion comprehension), data were analysed using the Kruskal-Wallis test, owing to ceiling effects. For the experimental task, data were analysed using a mixed ANCOVA, with vocoding channels as the repeated measure (three levels: six [hard], 12 [medium], 18 [easy]) and diagnosis as the between-subjects factor, adjusting for performance on the clear emotion control task and for Wechsler Abbreviated Scale of Intelligence (WASI) Matrix Reasoning (as a proxy for disease severity). The model also included an interaction term between diagnosis and vocoding channel number. Where the omnibus test was significant, post hoc analyses were conducted (pairwise t-tests for the ANCOVA; Dunn’s test for the Kruskal-Wallis). To facilitate comparison with control task performance, we also ran unadjusted Kruskal-Wallis tests on vocoded task performance at each channel level separately.
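
A minimal sketch of these models in base R is shown below, assuming a long-format data frame (long; one row per participant per vocoding level) and a one-row-per-participant data frame (wide); the variable names are hypothetical rather than those in our actual analysis scripts.

```r
# Mixed ANCOVA: diagnosis (between-subjects) x vocoding level (within-subjects),
# adjusting for clear-speech performance and WASI Matrix Reasoning
long$channels    <- factor(long$channels, levels = c(6, 12, 18),
                           labels = c("hard", "medium", "easy"))
long$diagnosis   <- factor(long$diagnosis)
long$participant <- factor(long$participant)

vocoded_model <- aov(score ~ diagnosis * channels + clear_score + wasi_matrix +
                       Error(participant / channels), data = long)
summary(vocoded_model)

# Control task (clear emotion comprehension): Kruskal-Wallis across groups,
# with Dunn's test (e.g., FSA::dunnTest) as the post hoc procedure
kruskal.test(clear_score ~ diagnosis, data = wide)
```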

Spearman’s rank correlation was used to assess the relationship between accurate comprehension of degraded emotional prosody and disease severity (WASI Matrix Reasoning), auditory perception (Psycholinguistic Assessment of Language Processing in Aphasia (PALPA)-3, pure-tone audiometry), working memory (digit span), and measures of socio-emotional awareness (mIRI and RSMS). To assess whether any significant correlations between noise-vocoded emotional prosody comprehension and socio-emotional awareness could be explained by generic noise-vocoded speech perception ability rather than specific degraded emotional prosodic comprehension skills, we additionally conducted correlation analyses with performance on a neutral-prosody noise-vocoded number repetition task (from39), which the majority of participants (all except three patients with nfvPPA) had also completed.
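
A sketch of these correlation analyses is given below, again with hypothetical variable names; missing data are handled pairwise, reflecting the differing numbers of participants who completed each measure.

```r
# Spearman correlations between the combined noise-vocoded prosody score and
# each covariate of interest (hypothetical column names in 'wide')
covariates <- c("wasi_matrix", "digit_span_fwd", "digit_span_rev",
                "palpa3", "pta_threshold", "miri_total", "rsms_total")
for (v in covariates) {
  res <- cor.test(wide$vocoded_total, wide[[v]],
                  method = "spearman", exact = FALSE)
  cat(sprintf("%-15s rho = %5.2f, p = %.3f\n", v, res$estimate, res$p.value))
}
```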

Considering the small cohort sizes and the exploratory nature of the study, no corrections for multiple comparisons were applied, to avoid inflating type II error. Effect sizes (epsilon squared (ε2) for Kruskal-Wallis models; partial eta squared (η2p) for ANCOVA models; Spearman’s rank correlation coefficient (rs)) were generated in addition to p-values, and an alpha of 0.05 was adopted as the threshold for statistical significance on all tests.

Results

General neuropsychology profiles were in keeping with the syndromic diagnosis for each patient group (Table 1).

Participants who scored at or below chance on the clear emotion comprehension control task were excluded from subsequent analyses (two patients with AD, one with nfvPPA, and one with svPPA). An additional AD patient was excluded as they were unable to correctly identify any of the ‘angry’ stimuli in the clear speech condition. Sixty-eight participants were included in the final analyses.

General participant group characteristics

Participant groups did not differ significantly in age, sex, handedness, years of formal education or pure-tone audiometry (all p > 0.05). Patient groups did not differ in mean symptom duration (p = 0.136) or combined MMSE score (p = 0.069). Basic speech discrimination (assessed using the PALPA-353) performance also did not differ significantly across participant groups (p = 0.366).

Experimental behavioural data

Data for comprehension of clear and vocoded emotional prosody in all participant groups are summarised in Table 2 and Fig. 1.

Table 2 Mean correct raw scores for comprehension of emotional prosody in clear and noise-vocoded speech, in each participant group.
Fig. 1

Boxplots of noise-vocoded emotional comprehension performance for each diagnostic group. Panel A shows data across participant groups; here, score refers to the percentage correct of the combined noise-vocoded prosody comprehension score, across all three vocoding channel levels. Significant between-group differences after adjusting for performance on identifying clear emotions and WASI matrix reasoning (as an index of disease severity) are coded with *** as p < 0.001, ** as p < 0.01 and * as p < 0.05. Panels B - F show performance profiles across vocoding channels (and in clear speech) for each participant group separately; score refers to the percentage correct of the prosody comprehension performance at each vocoding level. In all panels, the horizontal line within each box indicates the median score, with the boxes indicating the interquartile range; individual participant data points are superimposed. AD, patient group with typical Alzheimer’s disease; Control, healthy age-matched individuals; lvPPA, patient group with logopenic variant primary progressive aphasia; nfvPPA, patient group with nonfluent/agrammatic variant primary progressive aphasia; svPPA, patient group with semantic variant primary progressive aphasia.

There was a significant difference between diagnostic groups on comprehension of the clear emotional stimuli (Table 2) (χ2(4) = 20.22, p = 0.001, ε2 = 0.24, 95% CI [0.13, 0.46]), with all patient groups performing worse than healthy individuals (AD: t=-3.23, p = 0.001; lvPPA: t=-2.86, p = 0.004; nfvPPA: t=-3.40, p < 0.001; svPPA t=-2.93, p = 0.003) (Fig. S4). There were no differences between patient groups (p > 0.05).

After adjusting for performance on identifying clear emotions and WASI matrix reasoning (as an index of disease severity), there was a significant main effect of diagnosis on vocoded emotional prosody comprehension (F(4,58) = 5.47, p < 0.001, η2p = 0.27, 95% CI [0.09, 1.00]). There were also significant effects of clear emotion comprehension performance (F(1,58) = 8.45, p = 0.005, η2p = 0.13, 95% CI [0.02, 1.00]), vocoding channels (F(2,120) = 18.03, p < 0.001, η2p = 0.23, 95% CI [0.12, 1.00]), and WASI matrix reasoning (F(1,58) = 4.46, p = 0.039, η2p = 0.07, 95% CI [0.00, 1.00]). Post-hoc analyses showed that all patient groups performed worse than the healthy individuals (all p < 0.03), and the lvPPA patient group performed significantly worse than the svPPA group (t=-2.35, p = 0.020); but no other between-group comparisons were significant (all p > 0.05) (Table 2; Fig. 1).

Across groups, participants performed significantly worse at six channels compared with 12 (t = 2.63, p = 0.009) and 18 (t = 3.72, p < 0.001) channels (Fig. 1). There was no significant difference between performance at 12 and 18 channels (t = 1.08, p = 0.280). The interaction between diagnosis and vocoding channel number was also not significant (F(8,120) = 0.60, p = 0.774).

Correlational analyses

In the combined patient cohort, performance on noise-vocoded emotional prosody comprehension was not significantly correlated with peripheral hearing (as measured with pure-tone audiometry; rs(21)=-0.09, p = 0.669) or speech discrimination (as measured with the PALPA-3; rs(26) = 0.23, p = 0.240). Noise-vocoded emotional prosody comprehension was significantly correlated with WASI matrix reasoning score (rs(40) = 0.34, p = 0.029), forward digit span (rs(42) = 0.44, p = 0.003) and reverse digit span (rs(42) = 0.36, p = 0.018).

Total scores on the clear and noise-vocoded emotional prosody tasks were also correlated with scores on the mIRI and RSMS across the combined patient cohort (Figs. 2 and 3, respectively). Accurate comprehension of clear emotional prosody was significantly correlated with the mIRI cognitive empathy (perspective taking) subscale (p = 0.038) and at threshold for significance with total mIRI score (p = 0.050) (Fig. 2). Accurate comprehension of noise-vocoded emotional prosody was significantly correlated with the full mIRI (p = 0.016), the mIRI emotional empathy subscale (p = 0.020), the mIRI cognitive empathy subscale (p = 0.031), the full RSMS (p = 0.011), and the RSMS sensitivity to expressive behaviour subscale (p = 0.001; Fig. 3).

Fig. 2

Correlation plots of clear emotional prosody comprehension with measures of social cognition across the patient cohort. This Figure shows how different standard measures of social cognition were correlated with total score on recognition of clear (natural) emotional prosody (see text) across syndromic groups, as follows: (A) correlation with the full modified Interpersonal Reactivity Index (mIRI); (B) correlation with the emotional empathy (empathic concern) subscale of the mIRI; (C) correlation with the cognitive empathy (perspective taking) subscale of the mIRI; (D) correlation with the full revised self-monitoring scale (RSMS); (E) correlation with the sensitivity to socio-emotional expressiveness RSMS subscale; (F) correlation with the monitoring self-presentation RSMS subscale. Spearman’s rank correlation coefficient and p-value are shown alongside each correlation line; bold green font indicates a significant correlation (p < 0.05). The percentage correct here is the percentage correct for clear emotional prosody (control task). Dots represent individual participants’ performance, with colours representing each syndromic diagnosis, as coded in the key (right); shading represents 95% confidence intervals. AD, Alzheimer’s disease; lvPPA, logopenic variant primary progressive aphasia; mIRI, modified Interpersonal Reactivity Index; nfvPPA, nonfluent variant primary progressive aphasia; RSMS, revised self-monitoring scale; svPPA, semantic variant primary progressive aphasia.

Fig. 3

Correlation plots of noise-vocoded emotional prosody comprehension with measures of social cognition across the patient cohort. This Figure shows how different standard measures of social cognition were correlated with total score on recognition of noise-vocoded emotional prosody (see text) across syndromic groups, as follows: (A) correlation with the full modified Interpersonal Reactivity Index (mIRI); (B) correlation with the emotional empathy (empathic concern) subscale of the mIRI; (C) correlation with the cognitive empathy (perspective taking) subscale of the mIRI; (D) correlation with the full revised self-monitoring scale (RSMS); (E) correlation with the sensitivity to socio-emotional expressiveness RSMS subscale; (F) correlation with the monitoring self-presentation RSMS subscale. Spearman’s rank correlation coefficient and p-value are shown alongside each correlation line; bold green font indicates a significant correlation (p < 0.05). The percentage correct here is the combined noise-vocoded score (combined across all three levels of vocoding channels: see text). Dots represent individual participants’ performance, with different colours representing each syndromic diagnosis, as coded in the key (right); shading represents 95% confidence intervals. AD, Alzheimer’s disease; lvPPA, logopenic variant primary progressive aphasia; mIRI, modified Interpersonal Reactivity Index; nfvPPA, nonfluent variant primary progressive aphasia; RSMS, revised self-monitoring scale; svPPA, semantic variant primary progressive aphasia.

No significant correlations were observed between noise-vocoded number repetition (i.e. a non-prosodic degraded speech control) and any of the social cognition (mIRI or RSMS) measures (all p > 0.05) (Fig. S5).

Discussion

Here, we have shown that comprehension of acoustically degraded emotional prosody is impaired in patients with AD and PPA syndromes relative to healthy age-matched individuals. In line with previous research and our hypotheses, we showed consistent deficits in clear prosody perception5,28,34. Furthermore, the deficit for emotional prosody comprehension in noise-vocoded speech remained after adjusting for performance on the ‘clear’ (i.e. natural, undistorted) speech task, suggesting that deficits in emotional prosody comprehension are exacerbated in non-ideal listening conditions in patients with PPA and AD. Based on previous research suggesting that patients with lvPPA and nfvPPA may be particularly susceptible to the effects of noise-vocoding39, we had hypothesised that this additional ‘cost’ of noise-vocoding would be most apparent in these groups. However, all patient groups showed significantly greater costs of noise-vocoding than healthy controls, and the only significant between-patient-group contrast was between lvPPA and svPPA. These findings could reflect several mechanisms, including previously identified problems with perceiving noise-vocoded auditory stimuli39, a core apperceptive processing deficit in AD and lvPPA (e.g., in the representation and decoding of auditory objects), and impaired disambiguation of degraded speech signals in svPPA54. Stored neural ‘templates’ corresponding to the perceptual characteristics of the prosodic signatures of particular emotions are likely to exist, and acoustic degradation may stress the matching of incoming signals to these templates, analogous to previous work in AD and lvPPA suggesting impaired ‘template activation’ for phonemes55,56,57.

We also identified significant associations between emotional prosody comprehension performance and scores on two social cognition questionnaires (mIRI and RSMS) and their subscales, more consistently for vocoded than clear stimuli. In clear emotional prosody, there was a significant correlation with the mIRI cognitive empathy subscale and a borderline significant correlation with the full mIRI (Fig. 2). For noise-vocoded emotional prosody, we identified the same significant correlations as with clear emotional prosody comprehension performance and found additional significant correlations with the mIRI emotional empathy subscale, the full RSMS and the RSMS sensitivity to expressive behaviour subscale (Fig. 3). Patient groups did not differ significantly in mean scores on any of the social cognition measures assessed here, although patients with svPPA did tend to have lower scores than those with other diagnoses, consistent with previous research suggesting that social cognition is impaired in this population10,58. There was also considerable individual heterogeneity within groups on the mIRI and RSMS (and subscales) (see Table 1), which should be taken into consideration when interpreting the results of the correlation analyses; previous studies incorporating these measures have recorded similar degrees of variability47,50. Importantly, no significant correlations were identified between social cognition questionnaire scores and a noise-vocoded number repetition control task (Fig. S5), nor between noise-vocoded emotional prosody comprehension and pure-tone audiometry or speech discrimination task performance, implying that dissociable, central auditory mechanisms process verbal and nonverbal dimensions of speech signals. We did observe significant correlations between noise-vocoded nonverbal emotional prosody comprehension and measures of executive functioning and working memory (i.e. WASI Matrix Reasoning and digit span tests), consistent with previous research implicating the involvement of a fronto-parietal brain network59,60,61 in these processes.

We chose the noise-vocoding paradigm in the current study as a model for the challenging listening conditions encountered in daily life (such as a poor-quality telephone or internet connection). Our findings have potential clinical significance: noise-vocoding may represent a ‘stress test’ of vocal emotion comprehension by patients with dementia in suboptimal everyday listening environments. Further, as comprehension of noise-vocoded vocal emotions correlates with measures of social cognition, this paradigm might be developed to generate clinical markers of social cognitive impairment. The present results corroborate previous evidence that noise-vocoding is a clinically relevant procedure for assessing the impact of acoustic degradation on the extraction of different kinds of information from verbal messages by listeners with dementia39. One possible explanation is that patients with neurodegenerative diseases targeting core auditory processing networks have difficulty comprehending emotional cues in everyday hearing environments (i.e. an intrinsic auditory deficit), and are therefore less likely to engage with emotional cues directed at them in speech; this may manifest as perceived socio-emotional difficulties that are actually secondary to the more basic auditory deficit39,54,62. An alternative explanation is that patients with these conditions suffer a ‘double hit’: a top-down impairment in social cognition58,63,64 coupled with a bottom-up auditory processing deficit. Either way, the significant correlations observed here imply that our noise-vocoded emotional prosody comprehension task could hold promise as a tool for tracking such impairments.

This experiment has several limitations, and further work is needed not only to refine the paradigm used here but also to understand the underlying mechanisms, how they are affected in different dementias, and the implications for patients’ daily-life function. Firstly, across groups, participants in the present study performed significantly worse at six than at 12 channels, and at six than at 18 channels, but there was no significant difference between 12 and 18 channels. This ‘flattening’ between 12 and 18 channels is perhaps unsurprising, as it is consistent with the nonlinear noise-vocoding scale (e.g., the perceptual difference between 10 and 11 channels is negligible, whereas the difference between one and two channels is drastic), but future work should aim to refine channel selection. Secondly, the group sizes reported here were relatively small; considering the rarity of PPA, the collection of substantially larger datasets would require multi-centre collaboration. Thirdly, while we deliberately selected spoken numbers as the prosodic ‘carrier’ in order to reduce potential top-down effects from semantic content (a strategy previously used successfully in studies of both cognitively impaired and normal listeners5,44), in daily life it is unusual for numbers to be spoken with the emotions studied here, which may affect the generalisability of our findings. Fourthly, and relatedly, noise-vocoding was employed because it allowed us to control tightly the degree of degradation of the stimuli; while there are qualitative similarities between noise-vocoding and certain daily-life communication scenarios in which the acoustic quality of spoken messages is reduced, other paradigms that more closely simulate everyday listening and other kinds of daily-life acoustic challenge should be explored in future work. Development of digital virtual ‘soundscapes’ would be one such exciting avenue.

Future studies should investigate prosody processing in non-English-speaking patients with dementia, and determine which emotional nonverbal vocal signals transfer most readily across cultures and languages. It will also be important to study patients with other dementia syndromes, including behavioural variant and right temporal variant frontotemporal dementia, in whom impaired emotional prosody perception has previously been identified65. It would additionally be interesting to establish whether comprehension of noise-vocoded emotional prosody can be modulated pharmacologically, by dopaminergic or cholinergic stimulation59,66,67, and/or behaviourally, through perceptual learning54. This paradigm should also be extended to establish the brain basis and neural mechanisms of comprehending degraded prosodic and other more complex socio-emotional signals in dementia, such as sarcasm21. Further, it would be of interest to investigate the extent to which different diagnostic groups rely on specific prosodic cues when perceiving and comprehending emotional prosody. Finally, conversation analysis methods could be employed to explore how these deficits of emotional prosody comprehension under degraded listening conditions affect real-world interactions between people with dementia and their communication partners68.

Conclusions

The findings presented here open a window on a dimension of real-world emotional communication that has often been overlooked in dementia but is particularly pertinent to social cognitive functioning and communication. We currently lack brief, easy-to-administer, ecologically relevant and quantifiable measures of social cognitive and communication function for major dementias, suitable for use in clinical settings. The present work suggests that comprehension of noise-vocoded emotional prosody may be a candidate paradigm for generating measures of this kind.