Introduction

The steady rise in life expectancy and in ageing-related disorders has created unprecedented public health challenges1,2,3. One in six people will be over 60 years old by 2030, and the number of individuals over 80 years old will triple between 2020 and 2050, reaching 426 million4. Compared with younger adults, older adults experience distinct physical, physiological and neuropsychological changes, including progressive reductions in cognitive and functional skills3. Some of these declines are exacerbated in people with Alzheimer’s dementia, an ageing-related disorder typified by temporo-parieto-hippocampal brain atrophy, memory deficits and loss of independence5. Alzheimer’s dementia accounts for 60–70% of all dementia cases worldwide, with its case load of roughly 35 million in 20216 projected to triple by 20505. (For the purposes of this Review, ‘Alzheimer’s dementia’ refers to the typical, amnestic clinical variant of the disorder, rather than other variants associated with Alzheimer’s pathology, such as logopenic variant primary progressive aphasia, posterior cortical atrophy, or frontal variant Alzheimer’s disease.) Alzheimer’s dementia affects approximately 10% of people above the age of 65 years7, with varying cognitive, behavioural, functional, psychiatric and motoric manifestations as patients progress through preclinical, mild, moderate and severe stages8. The impact of Alzheimer’s dementia is multifarious: it is disabling and deadly for patients, burdensome for caregivers and costly for families and health systems at large9. Indeed, together with less prevalent forms of dementia, Alzheimer’s dementia incurs global economic costs exceeding US$1.3 trillion9.

Such consequences can be mitigated via timely assessments. For healthy older adults, these assessments promote peace of mind and prevent individuals or their relatives from misreading normal changes as dementia symptoms, which can avert affective and financial challenges10. For people who receive a diagnosis of Alzheimer’s dementia, these early assessments increase planning time to adopt neuroprotective habits11, optimize pathology-targeted therapies12, and reduce costs by prioritizing routine over emergency care13. However, standard diagnostic tools for Alzheimer’s dementia present numerous limitations. Cognitive evaluations, brain scans and biofluid markers are often expensive, stressful, subject to scheduling delays, and unavailable in many locations (which undermines equitable global access)14. Furthermore, these standard diagnostic tools can lead to false positive diagnoses of cognitive impairment in healthy older adults (which creates unnecessary harm and distress)10, and many tests are insensitive to cognitive impairment when administered in primary care settings15. Thus, there is a pressing need for reliable markers of normal and pathological ageing-related effects that can be delivered in an affordable, patient-friendly, immediate and scalable way.

Speech and language assessments — formal measures of productive and receptive verbal skills — broadly meet these goals14,16. ‘Speech’ refers to the motoric mechanisms that sustain vocal production with active (including the tongue and lips) and passive (including the palate and teeth) articulators. By contrast, ‘language’ encompasses diverse cognitive domains — including phonology, lexico-semantics and morphosyntax — used for comprehending and producing oral or written verbal units. The complex, interacting functions of speech and language have been variously categorized and defined; here, I distinguish five major domains. Motor speech processes are neuromuscular operations for articulating speech sounds. Phonology refers to cognitive mechanisms that support processing of phonemes (abstract categories of speech sounds), syllables and other abstract sound phenomena. Lexico-semantics includes receptive and productive processes that link word forms with context-sensitive conceptual (meaning) information. Morphosyntax refers to processes that govern how words are formed, combined and arranged in sentences. Finally, pragmatic processing and discursive processing are treated together and include context-sensitive mechanisms beyond literal meanings and sentential units, including skills for situation-dependent language use (pragmatics) and textual organization (discourse)16,17. Each of these domains involves multiple subsystems and relations to broader physiological and neurocognitive mechanisms, which increases the complexity of studying them18. Moreover, they variously intersect with multiple aspects of ageing, broadly shaping functionality, interpersonal relations and quality of life19,20. Given the ubiquity of verbal communication in daily life, identifying affected and spared domains of speech and language is core to understanding and distinguishing healthy ageing and Alzheimer’s dementia14,16.

In this Review, I survey speech and language research in ageing and Alzheimer’s dementia, covering receptive and productive measures. First, I outline key approaches to measuring speech and language that are commonly used with older adults. Then, I describe key changes observed in healthy older adults, domains that are typically spared and affected, and underlying neurocognitive phenomena. Next, I survey key speech and language research in Alzheimer’s dementia. Then, I introduce automated speech and language analysis and its applications in distinguishing Alzheimer’s dementia from typical age-related language change. Next, I discuss outstanding challenges for speech and language assessments across methodological, equity-related, translational and therapeutic dimensions. Finally, I outline an actionable agenda for future research and clinical work in this area.

Throughout this Review, I use the term Alzheimer’s ‘dementia’ rather than Alzheimer’s ‘disease’. Whereas patients in the studies reviewed here show clinical dementia symptoms, not all have pathological or biofluid confirmation of Alzheimer’s disease (however, some of these papers use the term ‘Alzheimer’s disease’ because they do not observe the distinction I draw here). In addition, as noted earlier, the term ‘Alzheimer’s disease’ can include different syndromes other than typical Alzheimer’s dementia (for instance, logopenic variant primary progressive aphasia), which fall outside the scope of this paper.

Measures of speech and language

Speech and language evaluations rest on four major approaches. First, neuropsychological tests offer coarse-grained measures of basic language skills, such as sentence comprehension, object naming and word repetition. These tools remain broadly used in their original paper-and-pencil versions and require highly trained clinicians for their administration, scoring and interpretation, with scoring often completed with reference to normative data. A well-known neuropsychological test is the Boston Naming Test, which requires viewing and naming of object drawings (thereby tapping into lexico-semantics) that increase in retrieval difficulty as the test progresses (from item 1, ‘bed’, to item 60, ‘abacus’)21. This approach yields fast individual-level insights into core speech and language domains and can be administered with minimal resources in office or bedside settings. However, it fails to capture important dimensions (such as processing speed), and it lacks fine-grained insights into daily communicative skills.
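Scoring against normative data usually means converting a raw score into a z-score relative to a reference group matched for age or education. The minimal sketch below shows that arithmetic. The `Norm` values are invented for illustration and are not real Boston Naming Test norms. The −1.5 SD cutoff is an assumed convention, because published cutoffs vary.

```python
from dataclasses import dataclass


@dataclass
class Norm:
    """Hypothetical normative mean and SD for one age/education stratum."""
    mean: float
    sd: float


def z_score(raw: float, norm: Norm) -> float:
    """Standardize a raw test score against normative data."""
    return (raw - norm.mean) / norm.sd


def is_below_cutoff(raw: float, norm: Norm, cutoff: float = -1.5) -> bool:
    """Flag a score whose z-score falls below an assumed cutoff (default -1.5 SD)."""
    return z_score(raw, norm) < cutoff


# Illustrative numbers only; not real Boston Naming Test norms.
norm = Norm(mean=54.0, sd=4.0)
print(z_score(48, norm))          # -1.5
print(is_below_cutoff(47, norm))  # True
```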

Second, in psycholinguistic paradigms, individuals respond to visual or auditory verbal stimuli by pressing predefined keys on a computer keyboard or controller. Responses are logged to establish accuracy (whether the response was correct or incorrect) and latency (how fast the response was made). Used mainly for research purposes, this approach often involves strategic stimulus sets that are custom-designed to test fine-grained hypotheses. For example, to examine whether words with weak perceptual associations are harder to retrieve than those with strong perceptual associations in Alzheimer’s dementia, researchers might ask participants to press a key only when displayed letter strings are real words, as opposed to pseudowords, without disclosing that real words will be a mix of weak-association and strong-association words. Data from patients with Alzheimer’s dementia and healthy individuals of the same age can be compared in accuracy and latency across conditions, to establish whether weak-association words are distinctly impaired in Alzheimer’s dementia22. Psycholinguistic paradigms are unique in that they offer objective measures of performance on highly specific linguistic categories. However, they can prove lengthy and tiring, and can lack ecological validity.
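The accuracy and latency comparison described above can be sketched on a toy lexical decision data set. Assume each logged trial records group, condition, correctness and reaction time. The trial tuples, the group labels (‘AD’, ‘HC’) and the `weak_cost` helper are all hypothetical. Latency is averaged over correct responses only, which is a common but not universal convention.

```python
from collections import defaultdict
from statistics import mean

# Each trial: (group, condition, correct, rt_ms). Toy data, purely illustrative.
trials = [
    ("AD", "weak", True, 910), ("AD", "weak", False, 1200),
    ("AD", "strong", True, 780), ("AD", "strong", True, 800),
    ("HC", "weak", True, 700), ("HC", "weak", True, 720),
    ("HC", "strong", True, 650), ("HC", "strong", True, 670),
]


def summarize(trials):
    """Return {(group, condition): (accuracy, mean RT of correct trials)}."""
    cells = defaultdict(list)
    for group, condition, correct, rt in trials:
        cells[(group, condition)].append((correct, rt))
    summary = {}
    for key, rows in cells.items():
        accuracy = sum(c for c, _ in rows) / len(rows)
        correct_rts = [rt for c, rt in rows if c]
        # Latency computed on correct responses only (an assumed convention).
        latency = mean(correct_rts) if correct_rts else float("nan")
        summary[key] = (accuracy, latency)
    return summary


def weak_cost(summary, group):
    """Hypothetical helper: RT cost (ms) of weak- vs strong-association words."""
    return summary[(group, "weak")][1] - summary[(group, "strong")][1]


s = summarize(trials)
for (group, condition), (acc, rt) in sorted(s.items()):
    print(f"{group} {condition}: accuracy={acc:.2f}, mean RT={rt:.0f} ms")
print("AD cost:", weak_cost(s, "AD"), "HC cost:", weak_cost(s, "HC"))
```

In a real study, a larger weak-word cost in patients would be tested statistically, for example with a group-by-condition interaction, rather than read off two means.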

Third, clinical linguistics assessments analyse disordered speech and language through systematic examination of spoken or written language samples23. Rooted in linguistic theory, this approach investigates motoric, phonological, lexico-semantic, morphosyntactic, pragmatic and discursive aspects of communication, often using transcribed naturalistic speech or structured elicitation tasks. Examiners, including speech-language pathologists, use specialized profiling frameworks to rate and describe aspects of voice (such as dysphonia and vocal fold paralysis), speech (such as dysarthria and apraxia of speech), and language (including scoring utterance length, grammatical complexity or error types). For instance, if a patient seems to have difficulty finding words in a conversation (as signalled by long pauses, vague terms such as ‘thingy’ or descriptions of the referent), analyses can be used to determine whether the difficulties stem from phonological or lexico-semantic problems23. Compared with standardized neuropsychological tests, clinical linguistic assessments afford a richer depiction of how language functions in context, offering insights into both deficits and compensatory strategies of individuals. However, their application requires linguistic expertise and considerable time for transcription, annotation and interpretation. Thus, they have limited clinical use despite their diagnostic precision and explanatory power.

Last, automated speech and language analysis (also known as ASLA) offers rich insights into natural oral production via digital tools. This approach quantifies acoustic features in speech recordings and linguistic features in transcripts14,24, offering complementary measures of language tasks and phenomena that are often targeted by clinical linguistics. First, the voice of an individual is recorded as they describe a picture or perform a fluency or other verbal task, be it at the office of a clinician, at a laboratory or at home. The audio is next preprocessed through computerized tools (including denoising and volume normalization) and transcribed either automatically (via speech-to-text software) or manually (by trained transcribers). Then, the audio and/or the transcripts can be analysed to extract features that are subsequently analysed in statistical models or machine learning pipelines to discriminate individuals on the basis of age or clinical status. Whereas most neuropsychological tests, psycholinguistic paradigms and clinical linguistic assessments mainly reveal ‘deficits’ (errors or inefficiencies based on predefined correct responses), automated speech and language analysis tools capture atypicalities (deviant patterns relative to other individuals, which are not errors per se). However, unlike the other three approaches, this framework does not assess receptive skills, thereby providing only partial coverage of speech and language.
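The feature-extraction step of this pipeline can be sketched on an already transcribed sample. The function below computes a few simple linguistic features: word count, type-token ratio, filler rate and speech rate. These features are illustrative choices, not the feature set of any particular study. The filler list, the function name and the example transcript are hypothetical, and the recording duration is assumed to be known from the audio.

```python
import re

FILLERS = {"um", "uh", "er", "thingy"}  # hypothetical filler/vague-term list


def transcript_features(transcript: str, duration_s: float) -> dict:
    """Compute simple linguistic features from a transcript.

    duration_s is the recording length in seconds (assumed known).
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n = len(tokens)
    if n == 0:
        return {"n_words": 0, "ttr": 0.0, "filler_rate": 0.0, "words_per_min": 0.0}
    return {
        "n_words": n,
        # Type-token ratio: distinct words / total words (a lexical diversity index).
        "ttr": len(set(tokens)) / n,
        # Proportion of tokens that are fillers or vague terms.
        "filler_rate": sum(t in FILLERS for t in tokens) / n,
        # Speech rate in words per minute.
        "words_per_min": n * 60 / duration_s,
    }


sample = "The boy is um taking the the cookie from the jar"  # invented transcript
print(transcript_features(sample, duration_s=6.0))
```

Feature vectors like these would then feed the statistical models or machine learning classifiers mentioned above. Production pipelines typically use far richer acoustic and linguistic features.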

These four approaches have partly distinct aims, but they also overlap. Additional settings exist for each approach beyond the canonical ones described here. For instance, they might be administered remotely or in-person at the homes of participants rather than in laboratories or clinics. In addition, tasks and measures are often shared among approaches. For instance, picture description tasks are used in all four approaches (although less commonly in psycholinguistic paradigms), and accuracy measures are used in the first three. Importantly, the tasks and outcome measures listed here are merely examples of a vast repertoire found within each approach.

In the next two sections, I review data primarily from the first three approaches, which form the bulk of the evidence for language changes in ageing and Alzheimer’s dementia (Table 1). Then, I turn to automated speech and language analysis and how results from this newer approach advance the study of language change.

Table 1 Approaches to studying speech and language in older adults

Healthy ageing

Speech and language skills change continually throughout the lifespan. Typical studies use cross-sectional designs, whereby younger adults (mostly in their midlife, from age 40 to 60 years) and older adults (approximately age 65 years and above) are separated into otherwise demographically matched groups to compare their speech and language outcomes. Although the results of these studies vary owing to participant-level factors (including level of education and bilingual experience)25, specific patterns of decline, preservation and even improvement have been established across speech and language domains (Table 2).
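Cross-sectional group comparisons like these are often summarized as a standardized effect size. As a minimal sketch, the function below computes Cohen's d with a pooled standard deviation. The score lists are invented for illustration and do not come from any cited study.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    na, nb = len(group_a), len(group_b)
    sa, sb = stdev(group_a), stdev(group_b)
    pooled = sqrt(((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2))
    return (mean(group_a) - mean(group_b)) / pooled


# Invented task scores for demographically matched younger and older groups.
younger = [22, 24, 20, 23, 21]
older = [18, 20, 17, 19, 16]
print(round(cohens_d(younger, older), 2))  # 2.53
```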

Table 2 Speech and language domains in healthy ageing and Alzheimer’s dementia

Most motor speech processes are similar between younger and older adults. This claim seems true of palatal26 and nasal airflow27 dynamics, including intraoral air pressure28, which are required for proper articulation. Phonation (the production of oral sounds through vocal fold vibration) is also unimpaired in most older adults, despite varying levels of physical deterioration of the larynx and vocal folds29. Speech breathing and voice pitch are often also preserved30,31. However, both can be subject to sex-specific changes: women sometimes have reductions in the fundamental frequency of their voice (which leads to a deeper voice)32, and men can have higher lung volumes with age (which enables greater loudness and utterance length)31. In general, phonatory and articulatory changes in older adults do not hinder conversational speech33.
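Fundamental frequency (F0) is the acoustic correlate of perceived pitch. It can be estimated from a voiced stretch of speech by finding the lag at which the signal best correlates with a shifted copy of itself. The sketch below applies this basic autocorrelation method to a synthetic tone. The 60–400 Hz search range is an assumption about adult voices. Dedicated tools such as Praat use more robust refinements of this idea.

```python
import math


def estimate_f0(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak.

    Only lags corresponding to f_min..f_max are searched (an assumed voice range).
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_r = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


sr = 8000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(800)]  # 0.1 s at 200 Hz
print(round(estimate_f0(tone, sr)))  # 200
```

Tracking an F0 estimate like this across recordings is one way a lowering of voice pitch could be quantified. Real speech, however, additionally requires voicing detection and framing.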

Studies of phonology yield a more complex picture. Ageing does not markedly increase phonemic substitution errors (such as saying ‘cat’ as ‘gat’) during word production34, nor does it compromise discrimination of well-articulated phonemes (such as discriminating whether a word began with /m/ or /n/)35. However, age is a key predictor of language comprehension when phonological input is degraded, predicting it even more strongly than verbal memory skills36. In addition, phoneme discrimination is more affected by attentional load35 and inhibitory difficulties37 in older adults than in younger adults. The interdependence of phonology and other cognitive domains in older adults is underscored by repetition tasks, which often reveal ageing-related difficulties for single words but not for sentences (which probably benefit from grammatical and semantic priming)38. Older adults further exhibit delayed phonological access (slow access to the sounds of a word)39, in some cases partly driven by mild hearing loss40.

Ageing-related changes are also seen in the lexico-semantic domain. Processing speed is systematically reduced by ageing, in reception tasks (including accessing written words41,42 and finding terms based on their definition43) and in production tasks (including naming pictures44,45 and verbal fluency tasks — listing words that pertain to a given category or that begin with a certain sound)46,47. Specific lexico-semantic slowdowns are linked to distinct neurophysiological alterations (such as reduced N400 modulations when processing contextually unexpected words, such as ‘cat’ after being primed with ‘a type of fruit’)48. Others, such as decreases in verbal fluency, hold upon adjusting for cognitive decline49,50 and even in the second languages of participants25, which attests to their generalizability. Ageing also involves more frequent tip-of-the-tongue states, whereby intended meanings are accessible during production but their corresponding words are not51,52. Tip-of-the-tongue states occur roughly once a week in younger adults but manifest nearly daily for older individuals52 — an increase that has been related to abnormalities in phonology-relevant left-hemisphere brain regions (such as the anterior insula) and tracts (such as the arcuate fasciculus)51,53. Moreover, ageing attenuates embodied language processes (namely, the reactivation of sensorimotor mechanisms during word comprehension). For example, unlike younger adults, older adults show no motor response facilitation when processing action-related words54,55. Finally, among older adults, the tendency to produce high-frequency, early-acquired words predicts future episodic memory decline, even controlling for other cognitive skills56. Overall, reduced speed and efficiency in lexico-semantic processing is a hallmark of ageing.

However, the scenario is markedly different for word knowledge. Until 65 years of age, ageing involves higher comprehension of increasingly rare terms57 and higher lexical diversity during dialogue58. This effect, observed across monolingual and bilingual individuals25,59, has been attributed to greater language exposure and an associated increase in vocabulary (consistent with Heaps’ law60). The dissociation between lexico-semantic efficiency and knowledge might be influenced by the ageing trajectories of broader cognitive skills. In particular, processing efficiency hinges on fluid abilities (including attention, inhibition and overall processing speed), which tend to worsen with age25. Conversely, knowledge is associated with crystallized intelligence, which typically improves across the lifespan25,61. Biologically speaking, lexico-semantic changes in knowledge and efficiency might be associated with ageing-related reductions in the thickness of, activation in and connectivity between temporal and prefrontal cortices, as well as decreased left lateralization of language responses62,63.
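Heaps’ law states that the number of distinct words V encountered grows sublinearly with the number of running words N, as V = K·N^β with 0 < β < 1. Vocabulary therefore keeps expanding with exposure, but more slowly over time. The sketch below evaluates the law and recovers K and β by least squares on the log-log form. The parameter values are arbitrary. The code illustrates only the law itself, not its application to ageing data.

```python
import math


def heaps_vocabulary(n_tokens: float, k: float, beta: float) -> float:
    """Heaps' law: expected number of distinct words V = k * N**beta."""
    return k * n_tokens**beta


def fit_heaps(ns, vs):
    """Fit k and beta by ordinary least squares on log V = log k + beta * log N."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in vs]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    k = math.exp(y_bar - beta * x_bar)
    return k, beta


ns = [1_000, 10_000, 100_000]
vs = [heaps_vocabulary(n, k=30.0, beta=0.5) for n in ns]  # arbitrary parameters
print(tuple(round(x, 3) for x in fit_heaps(ns, vs)))  # (30.0, 0.5)
```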

By contrast, ageing seems to have little effect on morphosyntactic skills. Substantial evidence for this conclusion comes from receptive paradigms. Older adults match their younger counterparts in judging whether sentences are grammatically correct and whether they require singular or plural verbs (although subject–verb agreement judgements become slower with age in the first languages of bilingual individuals, which suggests that efficiency can be reduced even when accuracy is preserved)25. Moreover, older adults outperform younger adults in both tasks when tested in their second language64,65. Morphological priming mechanisms are also unaffected in late life. For example, in German-speaking older adults, processing of specific verbs (such as ‘warnen’ (‘to warn’)) benefits from prior processing of derived nouns (such as ‘Warnung’ (‘warning’)), just as seen in younger adults41. Preserved priming has also been observed for sentences with congruent structures (such as “the clock and the drum move up” preceded by the prime “the pencil and the orange move together”)66. In the same vein, ageing does not affect syntactic processing during comprehension62, nor does it influence the capacity to detect specific words amidst syntactic and semantic sentential manipulations, even in individuals with left frontotemporal atrophy67.

However, age-related declines have been observed in productive tasks that involve aspects of lexico-syntactic integration or derivational morphology skills. For instance, Spanish-speaking older adults are less accurate than younger adults in generating the adjectival form of a given noun (such as generating ‘claro’ (‘clear’) from ‘claridad’ (‘clarity’))68. Moreover, older adults produce fewer relative and subordinate clauses per sentence in spontaneous conversation than younger adults, among other differences69. The preservation of receptive morphosyntactic skills described in the previous paragraph, despite these productive deficits and age-related executive function difficulties, might reflect ageing-related changes in connectivity between bilateral frontotemporal brain networks67.

Signatures of ageing are also found at the pragmatic and discursive level. Relative to younger adults, older adults find it harder to comprehend spoken utterances that are fast-paced or produced amidst noise70. These difficulties can lead others to adopt ‘elderspeak’ in communicating with them71 (Box 1). Furthermore, the language output of older adults is less informative72, less connected73 and less coherent74, as well as syntactically and propositionally simpler, than that of younger adults75. These and other discourse-level changes might be linked to domain-general deficits74. Indeed, among community-dwelling older adults, reduced vocabulary richness and overall speech output are top predictors of lower cognitive skills76. By contrast, decreased use of specific features (such as articles, prepositions, numbers, long words or swear words) in daily conversations correlates with reduced executive skills77. Finally, older adults routinely rely on pragmatic and other cognitive skills to compensate for difficulties in other linguistic domains, often mitigating their communicative impact (for instance, they might resort to metaphors to denote objects whose names they cannot retrieve)78.

In summary, ageing involves broad preservation of motor speech, phonology and morphosyntax, with minor declines that rarely hinder communication. In the lexico-semantic domain, there is a trade-off between increased knowledge and reduced efficiency and speed. The broad preservation of morphosyntactic skills and decay of lexico-semantic skills aligns with the notion of retrogenesis, which posits that domains consolidated later during development (including efficient vocabulary access) decline before those consolidated earlier (including grammatical ones)79,80. Finally, ageing involves a range of pragmatic and discursive particularities that are associated with broader cognitive challenges. Knowledge of these patterns illuminates the later-life trajectories of these core human faculties and, thereby, helps to prevent normal age-related changes from being misread as warning signs of pathological decline. Moreover, this knowledge enables typical language changes to be distinguished from changes observed in Alzheimer’s dementia, as described next.

Alzheimer’s dementia

Early cognitive symptoms of Alzheimer’s dementia include episodic and semantic memory deficits, executive function (including planning) difficulties and other challenges (such as personality changes)81,82,83. Understandably, these domains are prioritized in diagnostic criteria and clinical assessments81. However, speech and language impairments are present in over 35% of mild-stage cases and in most moderate and severe cases84,85, sometimes before executive or memory symptoms emerge86. Thus, exploring these changes in language is critical to enabling early detection. Studies typically use between-participant designs, whereby individuals with Alzheimer’s dementia are compared with healthy older adults or individuals with other disorders (Table 2). Unless otherwise specified, the findings below pertain mainly to mild-stage cases of Alzheimer’s dementia relative to sociodemographically matched healthy older adults.

Motor speech skills are unevenly compromised in Alzheimer’s dementia. Speech timing measures reveal systematic anomalies, including abnormal rhythm and excessive false starts87, as well as longer pauses and syllables, which probably reflect the need for more time to retrieve words as discourse unfolds88,89. Acoustic irregularities have also been reported, pointing to reduced vocal tract resonance90. Conversely, articulatory issues are rare in early stages, but they can worsen as dementia progresses91. However, motor speech deficits in Alzheimer’s dementia are generally less marked than in primary dysarthric disorders, such as Parkinson’s disease92.

Phonological difficulties are also not prominent in Alzheimer’s dementia. Mispronunciations or distortions are infrequent during word production93. Similarly, phonemic paraphasias (incorrect pronunciations resulting from adding, removing or substituting phonemes) can be sporadic or absent depending on the study87,94,95. However, patients are often impaired in fluency tasks that require them to produce words that begin with a given letter, pointing to reduced phonological access43,44,96. When present, phonological manifestations are linked to temporal and inferior frontal brain anomalies97.

By contrast, lexico-semantic disruptions are pervasive in Alzheimer’s dementia. The ability to produce semantically constrained words in a fluency task (such as words pertaining to the category ‘animals’) is consistently compromised96 and declines over time91. When performing this task, patients produce more frequent, less specific terms (for instance, favouring words such as ‘bird’ over ‘eagle’), which suggests a propensity to access highly consolidated, easily retrievable words in the lexicon98 (Fig. 1). These deficits emerge earlier and are more pronounced than those observed in phonemic fluency tasks (for instance, producing words that begin with a given letter)42,43,96,99. Systematic deficits are also observed when patients name pictures of objects, people or places100,101. Indeed, patients are often slow in102,103, or incapable of104, finding certain words that they intend to utter. Moreover, they exhibit exceedingly frequent semantic paraphasias, in which the intended word is replaced by a conceptually related but incorrect term (for instance, saying ‘table’ instead of ‘chair’)45,46. Some patients exhibit distinct difficulties with nouns relative to verbs. This pattern is predictable because nouns differentially recruit temporal brain regions that typically begin to undergo atrophy in early disease stages105.
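A word-frequency feature like the one described above can be derived from a semantic fluency list. The sketch scores the list, excluding repetitions, and averages the log frequency of the produced words. The frequency lexicon and its values are invented. Real analyses would draw frequencies from a large reference corpus, and they would also handle lemmatization and intrusions from other categories.

```python
import math

# Hypothetical frequency lexicon (occurrences per million words); values invented.
FREQ_PER_MILLION = {"dog": 180.0, "cat": 120.0, "bird": 90.0, "eagle": 12.0,
                    "lion": 30.0, "aardvark": 0.3}


def fluency_features(responses):
    """Score a semantic fluency list and compute its mean log10 word frequency.

    Repetitions are excluded from the score; words missing from the lexicon
    are skipped when averaging frequency.
    """
    seen, valid = set(), []
    for word in (w.strip().lower() for w in responses):
        if word and word not in seen:
            seen.add(word)
            valid.append(word)
    freqs = [math.log10(FREQ_PER_MILLION[w]) for w in valid if w in FREQ_PER_MILLION]
    mean_log_freq = sum(freqs) / len(freqs) if freqs else float("nan")
    return {"score": len(valid), "mean_log_freq": mean_log_freq}


# Same score, but the first list relies on more frequent words.
print(fluency_features(["dog", "cat", "dog", "bird"]))
print(fluency_features(["eagle", "lion", "aardvark"]))
```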

Fig. 1: Performance of individuals with Alzheimer’s dementia compared with healthy control individuals.

a, Between-group differences in word frequency (left) and word granularity (right) features derived from fluency tasks via automated speech and language analysis. b, Machine learning classifiers show that persons with Alzheimer’s dementia are best discriminated from healthy control individuals upon combining six features from verbal fluency responses (frequency, granularity, imageability, familiarity, length and phonological neighbourhood). *P < 0.05; **P < 0.01. Parts a and b are reprinted from ref. 98, CC BY 4.0.

More generally, lexico-semantic disruptions have been linked to atrophy106,107, cortical thinning108, reduced blood flow109,110 and hypometabolism111,112 across mesial temporal lobe regions that subserve declarative memory, including hippocampal, entorhinal, perirhinal, middle or anterior temporal and/or temporo-parietal cortices. Thus, these disruptions can be seen as linguistic manifestations of broader cognitive deficits17,98. Lexico-semantic deficits can emerge up to 12 years before dementia onset100,113, and even in asymptomatic individuals with confirmed tauopathy104 or PSEN1 mutations86 (common causes of Alzheimer’s disease), which underscores their relevance for preclinical detection. In particular, semantic fluency tasks are a useful target for early screening and longitudinal monitoring.

Most morphosyntactic abilities seem unaffected in individuals with Alzheimer’s dementia. Normal performance has been reported in the use of definite and indefinite articles, verb tense markers and subject–verb agreement, as well as in the formulation of ‘wh’ questions (involving ‘who’, ‘what’, ‘which’, ‘when’, ‘where’, ‘why’ and ‘how’), and in the proportion of inflected words114,115. The preserved capacity to conjugate regular verbs (such as ‘play’–‘played’) contrasts with deficits for conjugating irregular verbs (such as ‘buy’–‘bought’) and for retrieving compound words (which are learned as units despite integrating two free morphemes, such as ‘fretboard’)116. This pattern suggests that procedural (rule-based) mechanisms remain functional although declarative (memory-based) systems do not117. Patients also respond normally to subject–verb agreement and tense violations in receptive tasks118,119. Furthermore, they speak102 and write120 with grammatical accuracy. However, as patients progress beyond the mild stage, specific morphosyntactic skills, such as verb inflection, can yield deficits relative to healthy older adults121,122.

Studies of syntactic complexity in the speech of individuals with Alzheimer’s dementia yield heterogeneous results. Consider studies on sentence length and the proportion of dependent and independent clauses per sentence. In comparisons with sociodemographically matched healthy control individuals, some studies have reported significant group-level deficits in patients90,95, other studies have found impairments in only one-third of patients123, and still other studies have found these domains to be spared115. Nevertheless, when reductions in syntactic complexity are present, they become more marked as the disease progresses120,124. Patients also show impaired performance in highly demanding syntactic tasks, such as producing sentences in the passive voice (which requires them to convert direct objects into subjects while preserving their role as action recipients)125 or detecting grammatical roles (which involves ascertaining who is doing what to whom across diverse sentential patterns)126. Overall, the broad sparing of morphosyntactic skills in Alzheimer’s dementia might reflect their dependence on frontostriatal circuits, which are usually unimpaired in early and moderate disease stages17.
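Sentence length and clause-level measures like those above are usually computed with a syntactic parser. As a rough sketch under stated assumptions, the function below approximates them from plain text. It splits sentences on terminal punctuation and treats a small, hypothetical list of subordinators as a proxy for dependent clauses. Words such as ‘that’ can also be determiners, so this proxy overcounts and is illustrative only.

```python
import re

# Hypothetical, deliberately small list of English subordinators and relativizers.
SUBORDINATORS = {"because", "although", "when", "while", "if", "that", "which", "who"}


def syntactic_profile(text: str) -> dict:
    """Crude syntactic-complexity proxies from plain text.

    Sentences are split on . ! ?, and dependent clauses are approximated by
    counting subordinators; a real analysis would use a syntactic parser.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words_per_sentence, markers = [], 0
    for s in sentences:
        tokens = re.findall(r"[a-z']+", s.lower())
        words_per_sentence.append(len(tokens))
        markers += sum(t in SUBORDINATORS for t in tokens)
    n = len(sentences)
    return {
        "mean_sentence_length": sum(words_per_sentence) / n if n else 0.0,
        "subordinators_per_sentence": markers / n if n else 0.0,
    }


print(syntactic_profile("The boy fell because the stool tipped. The mother dries dishes."))
# {'mean_sentence_length': 5.5, 'subordinators_per_sentence': 0.5}
```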

Alzheimer’s dementia clearly affects pragmatic and discursive processes, with a typical reduction in skills that are vital for daily communication. In receptive tasks, patients with early-stage Alzheimer’s dementia and individuals with mild cognitive impairment (who are at increased risk of developing Alzheimer’s dementia) exhibit difficulties in inferring lessons from proverbs and fables127, in grasping the gist of a story128 and in understanding ironic, idiomatic or metaphorical phrases129,130. In spontaneous speech tasks, compared with healthy older adults, both groups exhibit lower idea density and lexical diversity72,73, abnormal cohesion131, more formulaic language132 and increased semantic variability133. These alterations have been linked to degradation of declarative memory. Discourse production in Alzheimer’s dementia is also typified by reduced informativeness and coherence90,134, and both patterns worsen in proportion to executive and memory dysfunction134,135 and with disease progression136. Measures of discursive and pragmatic skills, therefore, could prove useful for longitudinal tracking of cognitive decline.
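Idea density and lexical diversity can both be approximated from transcribed speech. One common approximation of idea density counts proposition-bearing parts of speech (verbs, adjectives, adverbs, prepositions and conjunctions) per word. Lexical diversity can be measured with a moving-average type-token ratio (MATTR), which is less sensitive to sample length than a plain type-token ratio. The sketch below assumes that part-of-speech tags come from an external tagger using the Universal Dependencies tag set. The tag mapping and the window size are assumptions, and the tagged example is hand-made.

```python
# Universal POS tags treated as proposition-bearing (an assumed approximation).
PROPOSITION_TAGS = {"VERB", "ADJ", "ADV", "ADP", "CCONJ", "SCONJ"}


def idea_density(tagged):
    """Propositions per word, from (word, UPOS tag) pairs."""
    if not tagged:
        return 0.0
    return sum(tag in PROPOSITION_TAGS for _, tag in tagged) / len(tagged)


def mattr(tokens, window=50):
    """Moving-average type-token ratio over fixed-size sliding windows."""
    if len(tokens) <= window:
        # Short samples fall back to the plain type-token ratio.
        return len(set(tokens)) / len(tokens) if tokens else 0.0
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)


tagged = [("the", "DET"), ("old", "ADJ"), ("man", "NOUN"), ("walked", "VERB"),
          ("slowly", "ADV"), ("to", "ADP"), ("the", "DET"), ("shop", "NOUN")]
print(idea_density(tagged))  # 0.5 (4 proposition-bearing tags / 8 words)
print(mattr(["a", "b", "a", "c", "a", "b"], window=3))  # 5/6, about 0.83
```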

In summary, language change in Alzheimer’s dementia resembles healthy ageing in its partial sparing of specific subdomains within motor speech, phonology and morphosyntax (including pronunciation and regular verb conjugation), but it involves impairments in other subdomains (including speech timing and irregular verb conjugation) (Fig. 2). Also, compared with healthy older adults, individuals with Alzheimer’s dementia exhibit reduced lexico-semantic knowledge and processing efficiency, as well as diverse pragmatic and discursive deficits. Accordingly, the language changes observed in Alzheimer’s dementia are not simply an exacerbation of those seen in healthy ageing.

Fig. 2: Speech and language changes in healthy and pathological ageing.

Healthy younger and older adults differ in age, but both are disease-free. Healthy older adults and people with Alzheimer’s dementia are matched for age, but one group comprises diagnosed patients. The scope of decline for each speech and language domain is represented by line width. Healthy ageing is associated with minor decay in motor speech, phonology and morphosyntactic processing, and major reductions in lexico-semantic processing speed, discourse and pragmatic skills. Alzheimer’s dementia exacerbates these patterns. The only domain that shows the opposite pattern is lexico-semantic knowledge, which increases in healthy ageing but decreases in Alzheimer’s dementia.

Importantly, language changes in Alzheimer’s dementia have profound functional consequences. They can disrupt communication skills, rendering patients restless and agitated19, which in turn can undermine relationships with caregivers19. In fact, in a large cohort of older adults (including healthy individuals and people with diverse pathologies), the presence of functional expressive and receptive communication deficits, as established with the Minimum Data Set battery, predicted quality of life more strongly than diseases such as cancer, even after adjusting for age, sex and comorbidities20. Some language difficulties, especially lexico-semantic ones, differ significantly from those of healthy control individuals before the onset of memory and executive symptoms. Thus, language measures could be used to anticipate which older adults are at elevated risk for Alzheimer’s dementia86,100,104,109,113.

Automated speech and language analysis

Most findings above come from neuropsychological tests, psycholinguistic paradigms and clinical linguistic assessments. In the past decade, these have been increasingly complemented with evidence from automated speech and language analysis. This approach, which is becoming prominent in leading journals137, research challenges138 and clinical trials139, represents a powerful development in the digital health agenda140.

Automated speech and language analysis typically uses machine learning methods24. Participants’ recordings are usually divided into a training set and a testing set, each comprising two groups (such as patients and healthy control individuals). Speech and language features from the training set are used to train a classifier to discriminate between the two groups. The model is then run on the testing set to establish how well the learned features distinguish the two groups in unseen data. Classifier performance can be quantified through different metrics, including the area under the receiver operating characteristic curve (AUC), which typically ranges from 0.50 (chance-level discrimination) to 1.00 (perfect discrimination)24. Importantly, robust testing of classifiers usually requires several hundreds or thousands of participants141. Otherwise, classifiers can rely on accidental details of the training data that do not generalize to new data141.
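The training/testing logic and the AUC metric described above can be illustrated with a minimal, dependency-free Python sketch. It is not the pipeline of any cited study: the nearest-centroid classifier, the function names (`stratified_split`, `fit_centroids`, `patient_score`, `auc`) and the toy feature values are hypothetical. The AUC is computed through its probabilistic interpretation, namely the probability that a randomly chosen patient receives a higher score than a randomly chosen control individual, with ties counting as one half.

```python
import random
from math import dist
from statistics import fmean

# Labels: 0 = healthy control individual, 1 = patient (hypothetical coding).

def stratified_split(samples, test_fraction=0.3, seed=0):
    """Split (features, label) pairs so that both groups appear in the training and testing sets."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test += group[:n_test]
        train += group[n_test:]
    return train, test

def fit_centroids(train):
    """Learn one mean feature vector per group, using the training set only."""
    return {
        label: [fmean(col) for col in zip(*[x for x, y in train if y == label])]
        for label in (0, 1)
    }

def patient_score(centroids, x):
    """Positive when x lies closer to the patient centroid than to the control centroid."""
    return dist(x, centroids[0]) - dist(x, centroids[1])

def auc(scores, labels):
    """Probability that a random patient outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Made-up features: (mean pause duration in s, semantic granularity score).
    # Real pipelines would standardize features, since Euclidean distances are scale-sensitive.
    data = [((0.30, 7.8), 0), ((0.40, 7.5), 0), ((0.35, 8.0), 0), ((0.50, 7.2), 0), ((0.45, 7.6), 0),
            ((1.10, 4.9), 1), ((1.30, 4.5), 1), ((1.00, 5.2), 1), ((1.40, 4.3), 1), ((1.20, 4.8), 1)]
    train, test = stratified_split(data)
    model = fit_centroids(train)
    scores = [patient_score(model, x) for x, _ in test]
    print(auc(scores, [y for _, y in test]))
```

With these deliberately well-separated toy values the script prints 1.0, which says nothing about performance on real recordings.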

Certain features identified through automated speech and language analysis are ‘interpretable’ and attributable to specific neurocognitive disruptions. These include speech timing patterns (for instance, pause duration)142, the ratios of specific word classes (such as nouns and verbs, via part-of-speech tagging)105 and the semantic granularity of nouns (the tendency to use more or less precise words for concepts)98. As a specific example of interpretation, semantic memory abnormalities can be captured via audio-derived pause duration metrics (with longer silences reflecting increased word retrieval effort) or text-derived semantic granularity metrics (with lower values revealing a preference for less demanding, conceptually unspecific words)98. Conversely, other features are ‘uninterpretable’, emerging from black-box deep-learning architectures. These features include high-dimensional embeddings of transformer models such as Wav2Vec or BERT, which are potentially sensitive to neurocognitive changes but do not correspond to well-defined acoustic or linguistic constructs143,144.
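To make the notion of interpretable features concrete, the Python sketch below computes three of the feature types mentioned above from pre-processed inputs. It rests on several assumptions that are not taken from the cited studies: word-level timestamps are assumed to come from a speech recognizer or forced aligner, silences of at least 250 ms are counted as pauses, part-of-speech tags are assumed to follow Universal Dependencies labels (`NOUN`, `VERB`), and granularity is operationalized as the number of steps from a word up to the root of a small, hypothetical taxonomy (a simplified stand-in for lexical hierarchies such as WordNet).

```python
from statistics import fmean

def pauses(word_times, min_pause=0.25):
    """Silent gaps (in s) between consecutive words; gaps below min_pause are ignored.

    word_times: (start_s, end_s) tuples in temporal order, assumed to come from
    a speech recognizer or forced aligner.
    """
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(word_times, word_times[1:])]
    return [g for g in gaps if g >= min_pause]

def mean_pause_duration(word_times, min_pause=0.25):
    """Average pause length in s (0.0 if no gap reaches min_pause)."""
    found = pauses(word_times, min_pause)
    return fmean(found) if found else 0.0

def noun_verb_ratio(tagged_tokens):
    """Nouns per verb in (token, tag) pairs; tags assumed to follow UD labels."""
    nouns = sum(1 for _, tag in tagged_tokens if tag == "NOUN")
    verbs = sum(1 for _, tag in tagged_tokens if tag == "VERB")
    return nouns / verbs if verbs else float("inf")

# Hypothetical toy taxonomy mapping each concept to its more general parent.
TOY_PARENTS = {"poodle": "dog", "dog": "animal", "animal": "organism",
               "organism": "entity", "thing": "entity"}

def granularity(word, parents=TOY_PARENTS, root="entity"):
    """Steps from word to root: higher values mean more specific (fine-grained) words.

    Raises KeyError for words missing from the taxonomy.
    """
    steps = 0
    while word != root:
        word = parents[word]
        steps += 1
    return steps
```

Here `granularity("poodle")` returns 4 and `granularity("thing")` returns 1, mirroring the idea that a speaker who says ‘thing’ rather than ‘poodle’ is choosing a less specific word.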

People with mild cognitive impairment, who are at elevated risk for Alzheimer’s dementia relative to healthy older adults, can be robustly distinguished from cognitively unimpaired individuals via automated speech timing145,146 and morphological and semantic146 features. Models trained with such features match neuropsychological tests in discriminating people with mild cognitive impairment from healthy control individuals147, and acoustic measures surpass neuropsychological tests for the same purpose148. Furthermore, text-level embeddings from automated speech and language analysis can distinguish cognitively unimpaired individuals with amyloid pathology (a core biological marker of Alzheimer’s disease, which most often leads to dementia) from individuals without amyloid pathology149. Likewise, longer between-utterance pauses and slower speech rate correlate with higher levels of tau (another biomarker found in most people with Alzheimer’s disease) in cerebrospinal fluid150 and in brain regions including the entorhinal cortex151. More directly, analysis of lexico-semantic, morphosyntactic and orthographic features in older adults can successfully predict who will develop Alzheimer’s dementia 7 years later152. Moreover, this language-based prediction was more accurate than that afforded by a battery of cognitive tests (measuring memory, learning, executive functions and crystallized intelligence) and by measurements of APOE (a genetic risk factor for Alzheimer’s dementia)152. Thus, verbal behaviour can hold clues about future Alzheimer’s dementia onset.

Automated speech and language analysis can also robustly discriminate between patients with early-stage Alzheimer’s dementia and healthy control individuals. For instance, pause duration and semantic granularity metrics from brief verbal fluency tasks yield AUC values of up to 0.84 (ref. 142) and 0.89 (ref. 98), respectively, whereas black-box models have reached AUC values of 0.93 (ref. 153) — approximating the AUC range of 0.80–0.97 obtained through widely used cognitive tests154,155.

Automated speech and language analysis features can also track the severity of Alzheimer’s dementia. Semantic granularity correlates positively with executive test scores (which capture cognitive control processes such as planning, working memory, inhibition and cognitive flexibility)98, and combined morphosyntactic and lexico-semantic features correlate positively with episodic memory outcomes (from tests measuring recollection of personally experienced events)147. Also, analysis of verbal fluency responses has shown that fewer and less organized words are associated with lower scores on the Mini-Mental State Examination, a standard cognitive screener in assessments of Alzheimer’s dementia156. More crucially, greater decline over 18 months in a composite measure of automated speech and language features correlated with greater worsening on the Clinical Dementia Rating, the global gold-standard tool to capture dementia severity for diagnostic, prognostic and monitoring purposes157. In addition, single-timepoint analyses show that altered speech timing correlates with lower hippocampal volume158 and reduced right inferior frontal activation142,145,146,159, two biological markers of disease progression in Alzheimer’s dementia. Other single-timepoint analyses show that higher word frequency and lower semantic granularity are linked to lower connectivity along the default-mode and salience networks98, which reinforces the utility of automated language features to capture the degree of neurofunctional anomalies across patients. Thus, features detected by automated speech and language measures seem to capture the magnitude of core deficits and neurobiological disruptions in Alzheimer’s dementia.
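Longitudinal findings such as the 18-month result above rest on relating change in a speech-derived composite to change in a clinical scale. The Python sketch below shows one simple way to do this; it assumes, hypothetically and without following the cited study’s exact method, that the composite is the mean of a participant’s z-scored speech and language features and that the association is quantified with a Pearson correlation between change scores. Function names are illustrative.

```python
from math import sqrt
from statistics import fmean

def composite(z_features):
    """Hypothetical composite: mean of one participant's z-scored features."""
    return fmean(z_features)

def change_scores(baseline, follow_up):
    """Follow-up minus baseline, per participant (negative = decline in the composite)."""
    return [f - b for b, f in zip(baseline, follow_up)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Because a drop in the composite and a rise in Clinical Dementia Rating scores both indicate worsening, the association described in the text would appear here as a negative r between composite change and rating change.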

In summary, automated speech and language analysis fares well in comparison with current clinical benchmarks for detecting and tracking the severity of Alzheimer’s dementia. This approach also has some operational advantages over cognitive (such as memory) and biological (such as blood biomarker) measures used for Alzheimer’s diagnosis. Automated speech and language analysis offers greater ecological validity than other digital tests (such as gamified memory tests or eye-tracking devices, which require behaviours and devices that are not common in daily life)140. In addition, it can be repeated monthly, weekly or daily, non-invasively and remotely, which is not true of other approaches, such as fluid-based biomarkers140. Moreover, gold-standard tests used for diagnosis, such as dementia rating scales or memory tasks, require clinicians to manually record responses or scores during examination, resulting in considerable error rates160,161. Conversely, automated speech and language analysis involves automated data recording and processing, which avoids manual intervention and the associated clerical errors during scoring or registering of results. Importantly, automated speech and language analysis tasks typically take 2 to 4 min to administer162. Thus, considering US data showing that primary care consultations with older adults typically last about 19 min163, such tasks represent a time-efficient addition or alternative to the lengthier tests typically used for screening purposes. Moreover, speech recordings can be obtained remotely by phone164,165 or videoconferencing166. Considering that, in the USA, almost 6 million individuals experience delayed medical care each year owing to mobility challenges and transportation barriers167, remotely deployable tools could promote more equitable access to screenings for Alzheimer’s dementia.
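As a rough worked check of the time-efficiency argument, assuming the speech task is administered within a single consultation of the typical length cited above, a 2-to-4-min task would occupy about one-tenth to one-fifth of the visit:

```latex
\frac{2\ \mathrm{min}}{19\ \mathrm{min}} \approx 0.11,
\qquad
\frac{4\ \mathrm{min}}{19\ \mathrm{min}} \approx 0.21
```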

Altogether, automated speech and language analysis stands out as a powerful innovation to discriminate language change owing to Alzheimer’s dementia from healthy ageing. However, its widespread clinical implementation remains elusive (Box 2). Furthermore, use and implementation of this approach, much like neuropsychological, psycholinguistic and clinical linguistic measures, involves numerous challenges.

Challenges of assessing speech and language

Speech and language measures illuminate core processes of healthy and pathological ageing. However, current work brings methodological, equity-related and interventional challenges. These challenges exist for neuropsychological tests, psycholinguistic paradigms, clinical linguistic assessments, and automated speech and language analysis. Identifying such challenges is central for delineating a strategic agenda for the field.

Samples, data and tasks

Most studies have small sample sizes. For example, in two reviews considering neuropsychological tests, psycholinguistic paradigms and clinical linguistic assessments, the mean sample sizes of studies involving patients with Alzheimer’s dementia and healthy control individuals were approximately 47 (ref. 168) and 73 (ref. 16). Most such studies are underpowered, as their analytical designs would require larger groups to yield reliable results — especially if they adjusted for relevant confounds, which very few studies do. As regards automated speech and language analysis, combined samples of patients and healthy control individuals totalled <200 participants in about 70% of studies137 and in the most commonly used automated speech and language analysis datasets169. Multiple reports based on this approach use the ADReSS dataset, which includes only 104 participants for training and 52 participants for testing138. Although exciting results are often reported, these might be inflated, as classifier performance often decreases with larger samples141. Thus, results from patient–control classifiers trained with small samples should be taken with caution unless they are replicated in large, independent cohorts.
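One reason small test sets warrant caution is statistical: the uncertainty around a reported accuracy shrinks only with the square root of the test-set size. The Python sketch below uses the normal-approximation (Wald) interval, a simple but rough choice (Wilson or exact intervals behave better for small samples); the 80% accuracy is a hypothetical value, whereas 52 matches the size of the ADReSS testing set mentioned above.

```python
from math import sqrt

def accuracy_ci(accuracy, n_test, z=1.96):
    """Approximate 95% Wald confidence interval for a test-set accuracy, clipped to [0, 1]."""
    half_width = z * sqrt(accuracy * (1 - accuracy) / n_test)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)

for n in (52, 1000):
    low, high = accuracy_ci(0.80, n)
    print(f"n = {n:4d}: 95% CI {low:.3f} to {high:.3f}")
```

With 52 test participants the interval spans roughly 0.69 to 0.91, whereas with 1,000 it narrows to roughly 0.78 to 0.82, so two classifiers reported on a small test set can differ by several percentage points purely by chance.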

Furthermore, recordings used for automated speech and language analysis vary in properties such as length, acquisition device, background noise and microphone-to-mouth distance, all of which can affect results24. Some studies use high-end equipment in standardized settings, which provides very ‘clean’ data, but these conditions are hard to replicate across clinical institutions. Thus, automated speech and language analysis research has yet to strike a balance between data quality and operational feasibility. More generally, across all approaches, reported findings must be appraised against their sample size and data quality, two factors that call for broad improvement across the field.

Across all approaches, important confounds and design limitations have not been systematically tackled. For instance, cross-sectional studies often test younger adults and older adults in the same year, which results in a comparison across generations. However, as different cognitive outcomes can increase170 or decrease171 across decades, ageing-related enhancements (such as vocabulary expansion) or declines (such as reduced processing speed) might be partly explained by sociocultural differences between generations57. Furthermore, studies on individuals with Alzheimer’s dementia vary in inclusion criteria, with some requiring diagnoses based on current diagnostic guidelines and others relying solely on cut-offs from individual cognitive tests (which represent only one of many components of the diagnostic process)169. In addition, across all approaches to speech and language research, most studies target patients with mild-stage Alzheimer’s dementia, and few studies monitor progression to moderate or severe stages (although data collection can be infeasible in highly advanced cases, owing to mutism). This sampling precludes establishing rates of faster or slower decline, limiting clinical utility. Some studies do not even establish the disease stage of their participants, which undermines comparability among reports and integrative conclusions. Furthermore, studies across all speech and language approaches on healthy ageing and Alzheimer’s dementia have been unevenly successful at accounting for sociodemographic variables (such as education) and clinical variables (such as cognitive severity) that are known to modulate speech and language outcomes. Thus, how findings generalize across these factors remains poorly understood.
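A common way to account for a sociodemographic variable such as education is covariate adjustment. The minimal Python sketch below, with hypothetical function names and made-up numbers, estimates the linear effect of a single covariate in a reference group of healthy control individuals and removes it from every participant’s score, so that scores are compared as if all participants had the reference group’s mean education. Real analyses would typically model several covariates at once (for instance, via multiple regression, analysis of covariance or mixed-effects models).

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def adjust_for_covariate(scores, covariate, ref_scores, ref_covariate):
    """Remove the covariate's linear effect, estimated in the reference group only."""
    slope, _ = fit_line(ref_covariate, ref_scores)
    ref_mean = fmean(ref_covariate)
    return [s - slope * (c - ref_mean) for s, c in zip(scores, covariate)]

# Made-up example: verbal fluency scores and years of education in control individuals.
control_education = [10, 12, 14, 16]
control_scores = [20, 22, 24, 26]
print(adjust_for_covariate([25, 18], [18, 8], control_scores, control_education))
# prints [20.0, 23.0]
```

In this toy case, the participant with 18 years of education drops from 25 to 20 and the one with 8 years rises from 18 to 23: once education is accounted for, their relative standing reverses.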

Another shortcoming lies in the lack of normative data: robust benchmarks of what constitutes normal and abnormal speech and language behaviour in specific countries, age groups and languages. Such data are available for some (but not all) neuropsychological tests but are mostly absent for psycholinguistic tasks, clinical linguistic measures, and automated speech and language analysis metrics. This gap weakens the definition of inclusion and exclusion criteria for research studies (for instance, to determine minimal levels of performance to be considered a healthy control), of severity levels (to stratify groups on the basis of their linguistic skills), and of treatment efficacy (to establish whether an intervention leads to healthy-like behaviour). Across all four approaches, normative data would be useful (a) for single-case studies, so that individual performance can be gauged against population-wide patterns; (b) for between-subject designs, to establish whether patients and control individuals fall within the typical range of variation for their assigned group; and (c) for correlational analyses, so that associations between speech or language outcomes and clinical or neurobiological measures can be assessed relative to robust benchmarks. In summary, although experimental work can certainly be pursued without normative data, such data would fruitfully expand analytical possibilities.
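Normative data make single-case interpretation straightforward: a raw score is converted into a z-score against the mean and standard deviation of a matched reference stratum. The Python sketch below is illustrative only; the table keys (language, age band, education band), the values and the -1.5 SD cut-off are hypothetical placeholders rather than published norms.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Norm:
    mean: float
    sd: float

# Hypothetical normative table for some speech or language measure; values are made up.
NORMS = {
    ("English", "60-69", "12+ years"): Norm(mean=20.0, sd=4.0),
    ("English", "70-79", "12+ years"): Norm(mean=18.0, sd=4.5),
    ("Spanish", "60-69", "12+ years"): Norm(mean=19.0, sd=5.0),
}

def z_score(raw, stratum, norms=NORMS):
    """Distance of a raw score from its stratum mean, in standard deviation units."""
    norm = norms[stratum]  # KeyError signals that no norms exist for this stratum
    return (raw - norm.mean) / norm.sd

def is_atypical(raw, stratum, cutoff=-1.5, norms=NORMS):
    """Flag scores below a hypothetical cut-off (assumes lower scores mean worse performance)."""
    return z_score(raw, stratum, norms) < cutoff
```

Under these made-up norms, a raw score of 12 yields z = -2.0 for an English speaker aged 60 to 69 and would be flagged, whereas the same score yields z = -1.4 for a Spanish speaker of the same age and education and would not, which illustrates why norms must be specific to each population.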

Task selection criteria also warrant consideration. Within each approach, the instructions and stimuli used to elicit verbal behaviour must be carefully chosen, as they directly affect results. Instruments that were developed for US or UK populations, such as the Boston Naming Test21, are often used in other countries decades after their creation without considering their contextual adequacy. Performance on psycholinguistic tasks that require rapid visual scanning or keyboard use can be influenced by computer familiarity, a factor that is rarely accounted for in analytical designs172. Furthermore, the prompts used to collect data for automated speech and language analysis studies are often chosen on the basis of tradition rather than optimality. For example, the popular Cookie Theft picture is used in modern-day studies despite depicting American stereotypes from 1972, which can influence fluency and precision in contemporary research participants173. In summary, because speech and language outcomes are task-dependent174,175,176, findings attributed to a given language domain or phenomenon might instead reflect task-specific properties.

Given that studies differ markedly in their sample sizes, confound controls and chosen tasks, there is considerable variability in their capacity to discriminate between groups or predict participants’ scores on other tasks. Across automated speech and language analysis studies, discrimination accuracy between patients with Alzheimer’s dementia and healthy older adults ranges from 50% to >90%24. Predictions of Mini-Mental State Examination scores are also inconsistent. In Alzheimer’s dementia, for instance, root mean squared errors (indicating how far speech-based predictions are from actual Mini-Mental State Examination values) range from 1.8 to >6.5 (ref. 177). Beyond the above-mentioned issues, such inconsistencies probably reflect the limitations of purely quantitative measures (such as those predominant in psycholinguistics and automated speech and language analysis), which overlook multiple cultural, social and individual phenomena that are graspable through qualitative frameworks (such as discourse and conversation analysis, which are sometimes incorporated into neuropsychological and clinical linguistic assessments178,179). However, the combined factors that account for such discrepancies remain to be identified.
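The root mean squared error quoted above is the square root of the average squared difference between predicted and observed scores. A minimal Python version follows, with made-up scores. Because scores on this screener range from 0 to 30, an RMSE of 6.5 corresponds to roughly a fifth of the scale, whereas 1.8 corresponds to about 6%.

```python
from math import sqrt

def rmse(predicted, observed):
    """Root mean squared error between predicted and observed scores."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("expected two non-empty sequences of equal length")
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))

# Made-up speech-based predictions versus observed screener scores (0-30 scale).
print(round(rmse([24, 20, 28], [26, 18, 28]), 2))  # prints 1.63
```

Squaring before averaging means that a few large prediction errors raise the RMSE more than many small ones, which is one reason values can differ widely across studies with different patient samples.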

Language diversity

Another major set of challenges for assessing speech and language abilities is language diversity14. Although there are roughly 7,000 living languages worldwide, the literature on speech and language in ageing and Alzheimer’s dementia, across all four approaches, has produced evidence for fewer than 50 languages, most covered in only a few studies. Moreover, highly studied languages are mainly spoken in high-income countries, which is where most dementia research takes place. In particular, roughly 70% of word retrieval studies and over 40% of automated speech and language analysis studies on Alzheimer’s dementia involve English-speaking countries14, despite English being spoken by roughly 17% of the world population180. Furthermore, many findings from English are not replicated in other language groups. For instance, in comparisons between healthy individuals and persons with Alzheimer’s dementia, the use of pronouns is abnormally high in English-speaking patients but abnormally low in their Bengali-speaking counterparts14. Additional evidence concerns subject omission, which is permissible in Italian but not in English. For instance, given the sentences ‘Lucas ha avviato l’auto’ and ‘Lucas started the car’, the omission of ‘Lucas’ would render the former grammatical (‘Ha avviato l’auto’) and the latter ungrammatical (‘Started the car’). A cross-linguistic study found that Italian speakers with Alzheimer’s dementia omitted subjects more often than healthy users of the same language, whereas English speakers with Alzheimer’s dementia did not181. Likewise, a machine learning model trained with linguistic features (such as semantic granularity) from English-speaking patients with Alzheimer’s dementia and control individuals offered good classification in an English-speaking testing set but not in a Spanish-speaking testing set182. However, there are a few examples of successful cross-linguistic generalization. For instance, models trained with picture description data from English speakers with and without Alzheimer’s dementia and tested on data from Spanish-speaking Latinos revealed good generalizability for speech timing features, both for detecting the disease and for inferring cognitive severity182. Overall, linguistic features are unevenly sensitive to clinical severity depending on the language of a patient182. The grammatical and neurocognitive differences among languages motivate a need for innovative cross-linguistic research14. Indeed, the language communities with the highest burden of Alzheimer’s dementia and the fewest resources to counter it currently have the least research14.
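The train-on-one-language, test-on-another design discussed above can be sketched as follows. This is a deliberately simplified, hypothetical illustration: a single-feature model learns only whether patients show higher or lower values than control individuals in the training language, and AUC is then computed separately for each testing language. The feature values are invented to show how a pattern learned in English can drop to chance level when the feature behaves differently in another language.

```python
from statistics import fmean

def auc(scores, labels):
    """Probability that a random patient (label 1) outscores a random control (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_direction(train):
    """+1 if patients have higher mean feature values than controls in the training data, else -1."""
    patients = fmean(x for x, y in train if y == 1)
    controls = fmean(x for x, y in train if y == 0)
    return 1.0 if patients > controls else -1.0

def transfer_auc(train, test_sets):
    """AUC of the training-language model on each testing language (AUC needs no threshold)."""
    direction = fit_direction(train)
    return {
        language: auc([direction * x for x, _ in data], [y for _, y in data])
        for language, data in test_sets.items()
    }

# Invented single-feature data: (feature value, label), label 1 = patient.
english_train = [(0.40, 0), (0.50, 0), (0.90, 1), (1.00, 1)]
test_sets = {
    "English": [(0.45, 0), (0.55, 0), (0.95, 1), (1.05, 1)],
    "Spanish": [(0.60, 0), (0.90, 0), (0.70, 1), (0.80, 1)],
}
print(transfer_auc(english_train, test_sets))
```

With these invented values the script prints an AUC of 1.0 for English and 0.5 (chance level) for Spanish. Real cross-linguistic studies use many features and larger samples, but the logic of fitting on one language and scoring each other language separately is the same.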

In summary, major challenges across all four approaches include small and heterogeneous samples, insufficient control of confounds, limited normative data, strong task dependence and biased coverage of the languages of the world. These issues reduce the reliability, comparability and cross-linguistic generalizability of speech and language research on healthy ageing and Alzheimer’s dementia. Addressing them is critical for strengthening both theoretical and clinical conclusions.

Summary and future directions

Overall, speech and language skills decline to some degree in both normal and pathological ageing. Word-processing speed and discourse-level skills show substantial declines in healthy older adults, and both patterns are exacerbated in people with Alzheimer’s dementia. Word knowledge increases in healthy older adults relative to younger adults but decreases in individuals with Alzheimer’s dementia relative to healthy older adults. Conversely, motor speech, phonology and morphosyntax do not exhibit systematic declines in older compared with younger adults, nor in patients with Alzheimer’s dementia compared with healthy older adults. These speech and language changes are probably influenced by degradation of broader cognitive domains, including semantic memory and executive functions.

Such patterns shed light on key changes that shape functionality in healthy older adults and undermine it in those with Alzheimer’s dementia. Indeed, as acknowledged in widely used functionality measures, which capture the capacity to perform daily tasks, verbal communication is central to social interaction, problem solving and independence183. More crucially, knowledge of typical differences between the groups can prevent unwarranted alerts and support screening and diagnostic procedures. In addition, speech and language measures capture overall cognitive severity, brain abnormalities and preclinical risk, three dimensions that amplify their clinical utility.

However, speech and language research on healthy ageing and Alzheimer’s dementia faces major challenges. These involve methodological shortcomings, insufficient attention to diversity, barriers to clinical adoption of automated speech and language analysis, and limited information on treatment efficacy (Box 3). Each gap motivates actionable recommendations for basic and translational progress, all of which call for multicentric, transdisciplinary, intersectional efforts.

Methodological soundness and harmonization could be enhanced through an expert advisory panel. Leaders in gerontology, dementia, language and data science, among other disciplines, should come together to advance best practices for researchers studying speech and language in healthy ageing and Alzheimer’s dementia. These could include diagnostic criteria (including the roles of biofluid and clinical data184), task and measure selection (to optimize performance and interpretability while minimizing participant burden166), data acquisition and (pre)processing practices (as done for movement disorders185), and strategies to tackle participant heterogeneity (such as covariance analysis, structural equation modelling and domain-invariant feature selection). Furthermore, the need for larger sample sizes calls for multicentric research initiatives. Successful models can be found in dementia consortia such as ADNI186 and ReDLat187, which are collecting speech and language data across several countries for stringent generalizability tests. Eventually, statistical power could be maximized via cross-consortium projects, which enable separate multicentric initiatives to pool their data188. Such concerted efforts would be vital to establish normative benchmarks across tasks and measures, ideally stratified by sex, age, education and dialect in each language. The most and least robust methods could also be identified through initiatives in which different research teams analyse the same dataset with their own pipelines (such as the ADReSS challenge138) or through the reverse approach, in which the same pipeline is applied to different datasets (a strategy not yet implemented). In particular, longitudinal studies are needed to establish differential linguistic trajectories in older adults with and without dementia, including links to disease and symptom progression.

The inequitable representation of the world’s languages also calls for international collaboration. There are several approaches to testing cross-linguistic generalizability. These include conducting language-specific analyses within comparable domains, comparing control-normalized scores between patients who speak different languages, and training machine learning models on one language for testing on another189. Actionable inspiration can be found in two projects. The need for cross-linguistically valid, harmonized measures is being tackled by the Mini Linguistic State Examination (MLSE) team190. First developed in English, the MLSE offers brief tests of motor speech, phonology, lexico-semantic, morphosyntactic and verbal working memory skills, all adapted by native-speaking experts. Covered languages include Farsi, French, Greek, Japanese, Moroccan Arabic and Spanish. Expanding this work to other languages could be vital to foster fairness in speech and language assessments while enabling systematic comparisons to develop culture-sensitive evaluation (and even diagnostic) criteria. Another relevant effort is the Include Network, which leverages pre-existing data from roughly 5,000 participants across 30 countries to identify language-specific or typology-specific and generalizable markers of healthy ageing and Alzheimer’s dementia. Covered languages include Arabic, Cantonese, Catalan, Dutch, English, French, German, Hebrew, Italian, Korean, Mandarin, Portuguese, Russian, Spanish, Tagalog and Turkish. Researchers can join one of these existing initiatives or replicate these collaborative models in a new effort suited to their local context and research aims.

In sum, speech and language changes are a central feature of healthy ageing and Alzheimer’s dementia. Speech and language assessments (via neuropsychological tests, psycholinguistic paradigms, clinical linguistic assessments, and automated speech and language analysis) can inform theoretical and translational agendas to understand and support these populations. Although great progress has been made, there are enduring and emerging challenges that can be overcome with input from neurolinguists, geriatricians, neurologists, data scientists, tech companies, pharmaceutical companies, regulatory agencies and policymakers, not to mention patients, families and caregivers.