Abstract
Problems understanding speech-in-noise (SIN) are commonly associated with peripheral hearing loss. However, pure tone audiometry (PTA) alone fails to fully explain SIN ability. This is because SIN perception is based on complex interactions between peripheral hearing, central auditory processing (CAP), and other cognitive abilities. We assessed the interactions between these factors and age using a multivariate approach that allows the modelling of directional effects on theoretical constructs: structural equation modelling. We created a model to explain SIN using latent constructs for sound segregation, auditory (working) memory, and SIN perception, as well as PTA, age, and measures of non-verbal reasoning. In a sample of 207 participants aged 18–81 years, age was the biggest determinant of SIN ability, followed by auditory memory. PTA did not contribute to SIN directly, although it modified sound segregation ability, which covaried with auditory memory. A second model, using a CAP latent structure formed by measures of sound segregation, auditory memory, and temporal processing, revealed CAP to be the largest determinant of SIN, ahead of age. Furthermore, we demonstrated that the impacts of PTA and non-verbal reasoning on SIN are mediated by their influence on CAP. Our results highlight the importance of central auditory processing in speech-in-noise perception.
Introduction
Approximately 1 in 4 people, over 1.3 billion worldwide, have hearing loss, and those of advanced age are disproportionately affected1,2. With an ageing population, hearing loss poses a growing societal problem. Older adults suffering from hearing loss not only face a world without sound, but also experience impaired communication, social isolation, and increased risk of depression3,4 and dementia5,6,7,8,9.
Older adults who struggle to hear in everyday life situations often complain about their inability to understand speech, especially in adverse conditions, such as with competing speakers in the background, environmental noise, or quiet or degraded speech10. Speech-in-noise (SIN) perception, colloquially termed the “cocktail party problem”, has been a target of hearing research for several decades. Although a person’s overall hearing ability, measured by pure tone audiometry (PTA), is a major factor determining speech-in-noise perception, it fails to fully explain it11,12,13. Additionally, hearing aids cannot fully resolve SIN difficulties, as demonstrated by the dissatisfaction of some hearing aid users in noisy situations and in large group settings14,15.
One reason why PTA cannot fully explain speech-in-noise ability, and why interventions that only target sound levels (i.e., hearing aids) cannot fully restore it, is that SIN perception is inherently more complex than a process solely reliant on peripheral auditory mechanisms; it also requires central sound processing, involving the brainstem and cortex16,17. When speech embedded in noise reaches a person’s cochlea, the target speech still needs to be isolated from the background and remembered until meaning can be extracted. This complex process demands both sound segregation ability and working memory capacity.
Sound segregation is primarily a bottom-up auditory process driven by acoustic cues such as pitch, temporal structure, and spatial location, allowing the brain to separate target sounds from competing background noise18. To resemble speech in noise without linguistic information, we developed the Stochastic Figure-Ground (SFG) paradigm, which emulates the need to extract a meaningful signal from a perceptually similar background19,20. SFG comprises pure-tone components that repeat over time to form a figure (target), while random tones varying over time form the ground (masker). Previous research has linked SFG performance to speech-in-noise ability and demonstrated the involvement of higher-level brain structures beyond the auditory cortex in its processing19,21,22,23,24. Further, SFG can elicit neural entrainment similar to that of speech25,26.
Auditory working memory (AWM) capacity has been repeatedly linked to speech-in-noise perception [for reviews, see 27,28]. From first principles, understanding speech in adverse conditions, when the perceptual signal is not clear, such as in speech-in-noise paradigms, requires a level of “post-processing” of the input sound to decode it, for example, to support postdiction: the retrospective reconstruction of misheard words. Thus, it is not surprising that SIN relies partly on our ability to maintain and manipulate sounds in mind. For example, working memory is associated with noise-vocoded speech perception29 and with the processing of speech from competing sources30 – both adverse listening conditions similar to SIN. Working memory may even be the mechanism behind better SIN in musicians12,31. It has been proposed32 that working memory is especially important when the periphery starts to degrade (e.g., due to age) and the sound input lacks fidelity. In this view, working memory acts to compensate for inaccurate perception. Nevertheless, short-term auditory memory must be, axiomatically, involved in sentence-level speech comprehension; a sentence being processed must be retained in mind until it is completed (or can be accurately predicted) before it can be understood.
Speech-in-noise perception relies on other cognitive abilities beyond central auditory processing, for example, processing speed, inhibitory control, and crystallized intelligence27,28. It is generally understood that as hearing deteriorates, speech perception requires greater input from cognition33. This may extend to speech-in-noise perception, where the sensory input is also “deteriorated”. For example, nonverbal reasoning is associated with the recognition of degraded speech in both cochlear implant users and normal-hearing adults34. Additionally, word recognition in noise is based on less sensory evidence and relies more on preparatory (cognitive) processes35.
Age is undoubtedly linked to SIN ability, as peripheral hearing, central auditory processing, and cognition all decline with age [for a review, see 36]. Age is also known to degrade temporal processing37, another central mechanism implicated in speech-in-noise perception11. In older adults with and without hearing loss, cognitive function appears highly influential in SIN perception, although cognitive function is tightly related to age itself38. The inter-relationships amongst age, hearing, central auditory processing, and cognition complicate the interpretation of these associations. Specifically, if age leads to greater hearing loss and greater cognitive decline, as well as worse auditory processing, how can we disentangle the relationships between these variables and speech-in-noise perception? Additionally, the discrete contributions of working memory and sound segregation to speech-in-noise perception remain to be elucidated.
Here, we have taken a multivariate approach that allows us to study theoretical constructs and their complex causal relationships: structural equation modelling (SEM). SEM is a statistical method in which latent constructs can be defined using measured variables or indicators, and directional links modelled based on empirical hypotheses. By creating separate latent variables for central auditory processes, as well as for general intelligence as measured by a matrices test, their discrete contributions to speech-in-noise perception for both words and sentences can be assessed. Furthermore, by adding age, PTA, and the causal link between them, a more accurate and detailed view of the effects of age and hearing on SIN can be obtained. We first test a structural equation model where sound segregation and auditory (working) memory are separated into two latent variables. The sound segregation construct is measured using SFG tasks, while the auditory memory construct is measured using the backwards digit span test and tests of precision for delay-matching sounds based on frequency and amplitude modulation rate31. Considering that sound segregation and auditory memory are part of an overarching theoretical construct, i.e., central auditory processing (CAP), and both rely on similar brain architecture, including primary and non-primary auditory cortex22,39,40, a second structural equation model is created. In this model, sound segregation and auditory memory, in addition to temporal processing, measured with a between-channels gap detection task, are joined in a latent construct representing CAP.
Materials and methods
Participants
Data from 222 participants (148 females) aged between 18 and 81 years were collected. Inclusion criteria were native English-speaker status, the absence of any hearing complaints (such as self-perceived or diagnosed hearing loss, the use of hearing aids, or tinnitus), no history of neuropsychological disorders, and no current use of neurotropic medication. A total of 15 datasets were removed from all analyses due to dyslexia diagnosis (2), inability to perform the sentence-in-noise task (2), tinnitus (1), non-native English-speaker status (1), an incomplete word-in-noise test (1), and duplicated data (8); for participants who were tested twice (i.e., duplicated), only the earliest session was used for analysis. Thus, data from 207 participants (138 female) were used for analysis. Participants’ ages ranged from 18 to 81 years (mean: 49.13; median: 51.08; standard deviation [SD]: 16.00). Data from two participants were incomplete, and these participants were therefore not included in the SEM analysis. The study was approved by Newcastle University’s Ethics Committee (Reference numbers: 10356/2018 and 46225/2023), and written informed consent was obtained from all participants before the start of the study. The study was performed in accordance with the Declaration of Helsinki (World Medical Association, 2024).
Materials
Speech-in-noise
Speech-in-noise tasks for both words and sentences were included. The word-in-noise (WiN) test consisted of the Iowa Test of Consonant Perception – British version (ITCP-B)41. The ITCP-B is a phonetically balanced, single-word, closed-set computer task. The test consists of 120 consonant-vowel-consonant words spoken by either a male or female speaker amongst 8-talker babble noise. Participants hear the target word (e.g., “moon”), which begins one second after babble onset, and are then presented with a self-paced 4-alternative forced-choice screen with phonetically similar words (or minimal pairs) (e.g., “moon-boon-dune-noon”). Participants make their selection using the number keys 1–4 and are then presented with feedback (“Correct-Incorrect”) for 0.6 s. A new trial starts one second afterwards while a fixation cross is shown on the screen. Words are always presented at a −2 dB signal-to-noise ratio (SNR). The babble was formed by 4 female and 4 male British speakers, and the recording lasted 15 s. A list of pre-defined starting points for the babble was created spanning every 0.1 s starting at 0, and permuted without replacement for each participant. Half of the words were spoken by a male speaker and the other half by a female speaker, with the assignment randomised per participant. Participants were given a break after every 40 trials. Performance was calculated as the proportion of correct answers.
The sentence-in-noise (SiN) test consisted of the British version of the Oldenburg sentences21. This is a closed-set test where sentences follow the structure <name-verb-number-adjective-noun> (e.g., “William sees four white houses”) and are spoken by a male speaker masked by 16-talker babble noise. A 5 × 10 matrix of all possible word combinations was presented to participants, who responded using a mouse. Answers were considered correct only if all selected words were correct. This was an adaptive paradigm with a 1-up 1-down staircase. The starting SNR was 10 dB, which changed in steps of 3 dB, reducing to 2 dB after the first reversal, and further reducing to 1 and 0.5 dB after 4 and 6 reversals, respectively. The babble was presented for 3.3 s, and sentence onset was 0.25 s post-babble. The masker consisted of a recording of 21.49 s, and its starting point was fully randomised per trial from 0 to 18 s. Sentence presentation was also fully randomised per trial. The task ended after 12 reversals, and the performance threshold was calculated as the median SNR of the last 6 reversals.
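As an illustration, the staircase rule and threshold computation described above can be sketched in R (our reconstruction, not the authors’ task code; function and variable names are hypothetical):

```r
# Sketch of the SiN 1-up 1-down staircase: step size is 3 dB, then 2 dB
# after the first reversal, then 1 dB and 0.5 dB after 4 and 6 reversals.
next_snr <- function(snr, correct, n_reversals) {
  step <- if (n_reversals >= 6) 0.5
          else if (n_reversals >= 4) 1
          else if (n_reversals >= 1) 2
          else 3
  if (correct) snr - step else snr + step  # correct answer -> harder (lower SNR)
}

# Threshold after 12 reversals: median SNR over the last 6 reversals
sin_threshold <- function(reversal_snrs) median(tail(reversal_snrs, 6))
```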
SFG
Two SFG tasks were included. The SFG-Gap discrimination task was adapted from Holmes and Griffiths21. SFG stimuli are formed by tone chords lasting 50 ms and can be divided into two components: the “Figure” (target) and the “Ground” (masker). The ground was composed of between 5 and 15 tone elements per chord, for a total of 70 chords (3.5 s). Tones were selected from a frequency space between 179.73 and 7246.29 Hz on a logarithmic scale. The figure was composed of 3 tone elements per chord, which repeated over time for a total of 42 chords (2.1 s), and it started between chords 16–20. A set of 144 figures was created in advance and presented to each participant in randomised order; when necessary, a new iteration of this set was presented. Two SFG stimuli were presented per trial with an inter-stimulus interval (ISI) of 400 ms. One of the stimuli had a 6-chord-long gap in the figure, constrained to start between chords 11–32 of the figure. Participants were required to indicate which stimulus had a gap in the figure. The task followed a 1-up 1-down adaptive procedure in which the target-to-masker ratio (TMR) was varied. The starting TMR was 10 dB, and TMR changed in steps of 4 dB, which were reduced to 2 dB after the first reversal, and further reduced to 1 dB after 4 reversals. The task ended after 10 reversals. Participants were familiarised with the stimuli at the beginning of each task by introducing the concepts of “figure” and “ground”, and by allowing a practice run of 6 trials at the starting TMR.
The SFG-Figure discrimination task followed the same trial structure and adaptive procedure as the SFG-Gap task, but consisted of stimuli of 2 s duration. One of the stimuli was ground only, while the other included a figure spanning 6 chords, which again could start at chords 16–20, and the participants’ task was to select which stimulus contained the figure. Due to the adaptive nature of the paradigm, and to avoid changes in overall power between the two stimuli, the ground-only stimulus included a “dummy” figure of the same duration, composed of random elements not already contained within the ground, which changed in TMR in the same fashion. More specifically, three tone elements were added to the 6 adjacent chords representing the figure, with the frequencies of these tone elements selected randomly for each of the 6 chords; the current trial TMR was then applied to this “dummy” figure. This prevented successful task completion based on differences in overall power between stimuli with and without a figure. Thresholds for both tasks were calculated as the median TMR of the last 6 reversals.
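To make the stimulus structure concrete, a minimal reconstruction of a single SFG stimulus with the SFG-Gap parameters might look as follows (a sketch, not the authors’ JavaScript implementation; the 44.1 kHz sampling rate and the 129-step frequency grid are assumptions, and TMR scaling and onset ramps are omitted):

```r
# Illustrative SFG stimulus: 70 chords of 50 ms, ground of 5-15 random tones
# per chord, and a 3-tone figure repeating for 42 chords from chord 16-20.
fs    <- 44100                          # assumed sampling rate
n     <- round(0.05 * fs)               # samples per 50 ms chord
freqs <- exp(seq(log(179.73), log(7246.29), length.out = 129))  # assumed grid
t     <- (0:(n - 1)) / fs

make_chord <- function(f) rowSums(sapply(f, function(x) sin(2 * pi * x * t)))

fig_freqs <- sample(freqs, 3)           # figure tones, fixed across chords
fig_start <- sample(16:20, 1)           # figure onset chord

stim <- unlist(lapply(1:70, function(k) {                # 70 chords = 3.5 s
  chord <- make_chord(sample(freqs, sample(5:15, 1)))    # random ground tones
  if (k >= fig_start && k < fig_start + 42)              # 42-chord figure
    chord <- chord + make_chord(fig_freqs)
  chord
}))
stim <- stim / max(abs(stim))           # normalise amplitude
```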
Auditory working memory
An auditory memory task was used to calculate memory precision for frequency (Freq) and amplitude modulation (AM) rate31,42. Participants heard either a one-second pure tone or AM-modulated white noise. After a delay of 2 s, the target sound had to be matched by clicking on an unlabelled, continuous, horizontal visual scale representing the frequency (440–880 Hz) or AM rate (5–20 Hz) space. Frequencies were selected from a uniform distribution, and a sinusoidal function was used to apply the amplitude modulation. Each mouse click played the currently selected sound; this could be repeated without time limit, after which participants pressed the ‘Enter’ key to confirm. The stimulus type alternated trial-by-trial for a total of 32 trials. A break was given after 16 trials. Four practice trials, 2 for each stimulus type, were presented at the beginning of the task. A precision score was obtained for frequency and AM performance as the inverse of the standard deviation of the errors, calculated by fitting a Gaussian function.
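One plausible reading of this precision score is sketched below (assuming errors are normalised to the response-scale range; the Gaussian fit uses `MASS::fitdistr`, and all names are hypothetical):

```r
library(MASS)

# Precision as the inverse SD of a Gaussian fitted to the matching errors.
precision <- function(response, target, lo, hi) {
  err <- (response - target) / (hi - lo)  # normalise errors to the scale range
  fit <- fitdistr(err, "normal")          # Gaussian fit to the error distribution
  unname(1 / fit$estimate["sd"])
}

# e.g., frequency trials: precision(resp_hz, target_hz, 440, 880)
```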
A measure of phonological working memory was also included: the digit span (DS) test from the WMS-III (Wechsler Memory Scale – Third Edition; The Psychological Corporation). Participants were required to repeat sequences of digits of increasing length, either in the order heard (DS Forward) or in reverse order (DS Backward). The total score represents the number of sequences repeated accurately.
Between channels gap discrimination
A between-channels gap (B-C Gap) discrimination task was adapted from Phillips, Taylor, Hall, Carr and Mossop43. This test was designed to be a ‘central’ gap-detection task, requiring the recognition of a gap between frequencies that are represented separately in the ascending auditory pathway. Two narrow-band noises with a bandwidth of 0.25 octaves and a 0.5 ms ramp were separated by a silent interval. The first sound was centred at 4 kHz and lasted 10 ms, while the second sound was centred at 1 kHz and lasted 300 ms. The gap duration started at 200 ms and changed following a 1-up 2-down staircase, starting with a step size of 20 ms, followed by 15 (after 3 reversals), 10 (after 6 reversals), 5 (after 8 reversals), 2 (after 10 reversals), and 1 (after 12 reversals) ms. The task ended after a total of 19 reversals or after reaching 125 trials, whichever happened first. Participants were presented with two stimulus pairs separated by 600 ms, one containing a gap as described above and one without (1 ms gap). Participants pressed (self-paced) the number keys ‘1’ or ‘2’ depending on whether the gap was in the first or second position. Feedback was shown on the screen (‘Correct!’ or ‘Wrong!’) for 500 ms. A new trial started after a 1 s inter-trial interval. During the task, if the gap duration reached 1 ms, any answer was considered wrong and the gap duration increased. Before the beginning of the task, and after a familiarisation run introducing the target stimuli, 12 practice trials were presented in which the gap duration started at 230 ms and changed adaptively in a 1-up 1-down pattern by 5 ms. The performance threshold was calculated as the median gap duration over the last 6 reversals.
General (fluid) intelligence
A matrix test to measure general or fluid intelligence was created using the matrix reasoning item bank [MaRs-IB; 44]. Matrices were all taken from set 1. Participants familiarised themselves with the task in 4 practice trials using the first matrices from the set (numbers 1–4). The test included a total of 26 matrices: 25 in sequential order starting with item 6 (numbers 6–30), plus matrix number 47, which is of greater difficulty, to avoid ceiling effects. Participants had 30 s to respond to each matrix, and a countdown timer appeared for the last 5 s. A total score (0–26) representing the number of correct matrices was computed per participant.
Other measures
Measures for musicality (Goldsmith Musical Sophistication Index; Gold-MSI), premorbid intelligence and literacy (Wechsler Test of Adult Reading; WTAR), and self-reported SiN ability (Spatial Speech Questionnaire, SSQ) were also taken. These measures are not included in the current analyses and thus are not described further.
Procedure
After participants arrived at the lab and provided informed consent, PTA thresholds were measured for frequencies 0.25–8 kHz in a soundproof room using air conduction only, with an Interacoustics AD226 diagnostic audiometer. The computer tasks were then performed in the same soundproof room in the following order: SIN tests (words, then sentences), SFG-Figure discrimination, auditory memory (Freq + AM), SFG-Gap discrimination, Matrices, Gold-MSI, and Gap detection. Paper tests were then completed in another room in the following order: DS Forward, DS Backward, WTAR, and SSQ. The testing session usually lasted 2 h, and participants received compensation for their time. Most computer tasks were coded in JavaScript and run using Chrome, except WiN and Gap detection, which were coded in Matlab R2017a. All stimuli were presented between 65 and 73 dB SPL (sound pressure level), depending on the task, but at the same level across participants.
Analysis
Before the SEM analyses, data were linearly transformed to reduce differences in variance amongst variables and so that higher values reflected better performance. Thus, the scores of SiN, both SFG tasks, and B-C Gap were inverted (sign-flipped). SiN was further multiplied by 10, while B-C Gap was multiplied by 0.1. WiN scores were converted from proportions to percentages. Lastly, the precision scores for AM and Freq were multiplied by 50 and 10, respectively.
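In code, these transformations amount to the following (a sketch assuming a data frame `d` with hypothetical column names):

```r
# Pre-SEM rescaling as described above; column names are hypothetical.
d$SiN  <- -d$SiN  * 10    # invert (lower SNR threshold = better) and rescale
d$SFGg <- -d$SFGg         # invert SFG-Gap TMR threshold
d$SFGf <- -d$SFGf         # invert SFG-Figure TMR threshold
d$Gap  <- -d$Gap  * 0.1   # invert and rescale between-channels gap threshold
d$WiN  <-  d$WiN  * 100   # proportion correct -> percentage
d$AM   <-  d$AM   * 50    # rescale AM precision
d$Freq <-  d$Freq * 10    # rescale frequency precision
```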
Structural equation models (SEM) were built using the lavaan package (version 0.6-17) in R (version 4.2.1). Models were estimated using maximum likelihood with a nonnormality correction based on the Satorra-Bentler scaled test statistic45,46. The α level used for significance testing of path coefficients was set at 5%. The models were evaluated against a set of criteria using several goodness-of-fit measures: the Bentler comparative fit index (CFI), the Tucker-Lewis index (TLI), the root-mean-square error of approximation (RMSEA), and the standardised root mean squared residual (SRMR)47,48. Only robust versions of these indices are reported in this study45,46. Bootstrapped 95% confidence intervals were created for each model fit measure using 1000 repetitions with the ‘Bollen-Stine’ method49 as implemented in lavaan.
The choice of scaling variables for each latent construct was based on theory alone. Because previous research showed good predictability of SIN from SFG-Gap21, it was selected as the scaling variable of the SFG latent construct. For auditory working memory, recent findings implicate memory for amplitude modulation rates as one of the greatest factors determining speech-in-noise perception42, so AM was used as the scaling variable for the AWM construct. Because word-in-noise perception forms the basis of sentence-in-noise perception, WiN was used as the scaling variable for SIN. For the Central Auditory Processing (CAP) latent variable, the working memory component (AM) was used as the scaling variable, as the contribution of working memory is the most replicated finding in speech-in-noise research28. Because lavaan uses the fixed-marker technique, scaling variables are neither estimated nor tested for significance; their paths are displayed as significant for visualisation purposes only.
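For concreteness, Model 1’s specification and estimation might be written as follows in lavaan (a sketch using the hypothetical column names above, with the path set inferred from the coefficients reported in the Results; this is not the authors’ published script):

```r
library(lavaan)

model1 <- '
  # measurement model; the first indicator of each factor is the fixed marker
  SFG =~ SFGg + SFGf            # sound segregation
  AWM =~ AM + Freq + DSb        # auditory (working) memory
  SIN =~ WiN + SiN              # speech-in-noise

  # structural model (paths inferred from the reported results)
  PTA ~ Age
  MTX ~ Age
  SFG ~ Age + PTA + MTX
  AWM ~ Age + PTA + MTX
  SIN ~ Age + PTA + MTX + SFG + AWM
  SFG ~~ AWM                    # residual covariance
'

fit <- sem(model1, data = d, estimator = "MLM")  # Satorra-Bentler scaled statistic
fitMeasures(fit, c("cfi.robust", "tli.robust", "rmsea.robust", "srmr"))

# Bollen-Stine bootstrap of the fit measures (1000 repetitions)
boot <- bootstrapLavaan(fit, R = 1000, type = "bollen.stine", FUN = fitMeasures)
```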
Results
Hearing thresholds
Participants’ average thresholds over all frequencies (0.25–8 kHz) and both ears were 11.969 (± 9.245) dB hearing level (HL). No participant had more than mild hearing loss (i.e., thresholds averaged over all frequencies across both ears were < 40 dB HL). Individual and average thresholds are plotted in Fig. 1.
Demographic data and performance
Demographic data and the average performance on all tasks are shown in Table 1. A Spearman correlation matrix was computed using Holm-Bonferroni correction for multiple comparisons. Most variables were correlated with one another, although age did not correlate with DS Backward. DS Backward also showed no correlation with PTA or with either SFG task. Lastly, between-channels gap detection did not correlate with SFG-Figure discrimination.
Sound segregation and auditory (working) memory as separate contributors to speech-in-noise
A structural equation model was constructed with separate latent variables representing sound segregation (‘SFG’) and auditory working memory (‘AWM’). The contributions of age, PTA, and overall intelligence (‘MTX’) were also included in the model. This model (Model 1), including path coefficients and model fit indices, can be seen in Fig. 2.
Model 1 (Fig. 2) had a good model fit, as demonstrated by the CFI (0.997, 95% CI: 0.978–1) and RMSEA (0.021, 95% CI: 0–0.060). The model explained 79.8% of the variance in speech-in-noise (R2: 0.830, Adjusted R2: 0.798). Based on path coefficients (β), the factor that had the greatest effect on speech-in-noise was age (β: −0.49, p < 0.001), followed by auditory working memory (β: 0.29, p < 0.01). Further, age significantly affected hearing (PTA; β: 0.74, p < 0.001), sound segregation (SFG; β: −0.32, p < 0.001), and general intelligence (MTX; β: −0.38, p < 0.001). Although intelligence contributed to SFG (β: 0.2, p < 0.01) and AWM (β: 0.51, p < 0.001), it did not have a direct effect on SIN (β: −0.03, p > 0.05). Similarly, PTA and SFG did not significantly predict SIN (PTA: β: −0.15, p > 0.05; SFG: β: 0.19, p > 0.05). The contribution of hearing (PTA) to SIN ability was also not mediated by its effect on working memory (β: −0.15, p > 0.05); hearing (PTA) affected only sound segregation (SFG; β: −0.32, p < 0.001). The shared variance between working memory and sound segregation was large and statistically significant (β: 0.43, p < 0.001).
Model 1. Path coefficients are shown within the arrows representing the paths. Latent variables are represented with elliptic shapes, while indicators and observed variables are denoted with rectangles. The exogenous variable is plotted in a diamond shape. Model fit indices are shown in the bottom-left corner with bootstrapped 95% confidence intervals in square brackets. * = p < 0.05, ** = p < 0.01, *** = p < 0.001.
The effects of central auditory processing on speech-in-noise
An alternative model (Model 2; Fig. 3) was created in which sound segregation and auditory memory were joined under a general process of CAP, with one indicator each. SFG-Gap was used to index sound segregation ability, while precision for amplitude modulation (AM) rate was used to index auditory memory. A third indicator was used to represent temporal processing: between-channels gap discrimination.
By combining the three types of indicators into one latent construct – CAP – this factor made a significant contribution to SIN, with a path coefficient of 0.50 (p < 0.001), greater than that of age (β: −0.44, p < 0.001). The relationships between PTA and SIN (direct path β: −0.09, p > 0.05) and between MTX and SIN (direct path β: −0.02, p > 0.05) were mediated through their effects on CAP (PTA→CAP, β: −0.34, p < 0.001; MTX→CAP, β: 0.37, p < 0.001). Age had a causal effect on CAP (β: −0.26, p < 0.05), PTA (β: 0.74, p < 0.001), and MTX (β: −0.38, p < 0.001).
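In lavaan, this kind of mediation claim can be made explicit by labelling paths and defining indirect effects as products of labelled coefficients (a sketch with the Model 2 structure inferred from the text and Fig. 3; labels and column names are hypothetical):

```r
model2 <- '
  CAP =~ AM + SFGg + Gap        # AM is the fixed-marker scaling variable
  SIN =~ WiN + SiN

  PTA ~ Age
  MTX ~ Age
  CAP ~ a1*PTA + a2*MTX + Age
  SIN ~ b*CAP + PTA + MTX + Age

  # indirect effects of hearing and reasoning on SIN via CAP
  ind_PTA := a1 * b
  ind_MTX := a2 * b
'
fit2 <- sem(model2, data = d, estimator = "MLM")
summary(fit2, fit.measures = TRUE, standardized = TRUE)
```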
Model fit, although slightly poorer than that of Model 1, was good, as exemplified by the CFI (0.991, 95% CI: 0.980–1) and RMSEA (0.046, 95% CI: 0–0.069). This model explained 81.1% of the variance in speech-in-noise (R2: 0.830, Adjusted R2: 0.811), slightly more than the previous model despite using fewer variables.
Model 2. A central auditory processing (CAP) latent variable was created to include measures of sound segregation (SFG-Gap), auditory working memory (AM), and temporal processing (between-channels gap detection). Path coefficients are shown within the arrows representing each path. Latent variables are represented with elliptic shapes, while indicators and observed variables are denoted with rectangles. The exogenous variable is plotted in a diamond shape. Model fit indices are shown in the bottom-left corner with bootstrapped 95% confidence intervals in square brackets. * = p < 0.05, ** = p < 0.01, *** = p < 0.001.
All path coefficients for Models 1 and 2 between latent variables, including observed variables, can be seen in Table 2.
To assess multicollinearity, which could weaken confidence in the coefficient estimates, we built a linear regression model. First, the sentence and word scores were standardized and summed to create the dependent variable, and all other measured variables used to construct both SEMs (Model 1 and Model 2) were defined as predictors. Tolerance and variance inflation factors (VIF) were then calculated using the package “olsrr” in R. VIF values > 10 are considered to indicate potential collinearity, which could undermine the interpretation of the models50. VIF values were greatest for Age and PTA, but both were far below the multicollinearity threshold (2.50 and 2.39, respectively).
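This check amounts to the following (a sketch with the hypothetical column names used above; `ols_vif_tol()` is the olsrr function reporting tolerance and VIF per predictor):

```r
library(olsrr)

# Composite outcome: standardized word + sentence scores
d$SINcomp <- as.numeric(scale(d$WiN) + scale(d$SiN))

# All measured variables entering either SEM are used as predictors
lm_fit <- lm(SINcomp ~ Age + PTA + MTX + SFGg + SFGf + AM + Freq + DSb + Gap,
             data = d)

ols_vif_tol(lm_fit)  # VIF > 10 would flag problematic collinearity
```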
Discussion
In the current study, we constructed a structural equation model (SEM) of speech-in-noise (SIN) perception. SEM is a complex multivariate approach that allows for the exploration of causal relationships between theoretical constructs. We included the observed/exogenous variables age, hearing thresholds (PTA), and general intelligence (MTX), and created latent constructs for sound segregation (SFG) and auditory working memory (AWM). By modelling the effects of age on hearing thresholds and the influence of PTA on central auditory processing, we found that hearing did not have a direct impact on speech in noise. This contrasts with previous research that highlighted the importance of hearing thresholds over age in speech-in-noise perception42,51. One possible reason for this discrepancy is that the majority of people in our sample had normal or near-normal hearing, with average hearing thresholds not exceeding 40 dB. Additionally, when modelling complex relationships between auditory-cognitive factors and hearing in normal-hearing people, other research has demonstrated little or no effect of audiometric thresholds12. Nevertheless, the inclusion of young adults with no hearing loss, and the fact that our sample has only mild hearing loss that is highly age-related, prevent us from drawing strong conclusions about the absence of a direct relationship between PTA and SIN. We are unfortunately unable to assess whether extended high-frequency (> 8 kHz) hearing level would be a better predictor than standard PTA, as suggested by others52,53,54, although this finding is not always replicated55. Similarly, hearing was only measured using air conduction, so we cannot isolate sensorineural from conductive or mixed hearing loss.
We found that age was the biggest contributor to speech-in-noise perception. The unique variance explained by age was not captured by any of our auditory-cognitive variables. Previous research has highlighted the importance of age [e.g., 56], yet the non-specific impacts of ageing preclude a single interpretation. Although the influence of age on some level of cognitive decline was seen in its significant relationship to nonverbal intelligence (as measured by MTX), MTX did not show any direct impact on speech-in-noise thresholds. Given the significant direct path between age and SIN, part of the effect of age on speech-in-noise difficulties must rely on mechanisms not explained by our model. One possible factor may be age-related reduced inhibition, as research has demonstrated that, for people with normal hearing, deficits in speech-in-noise perception are mediated by impaired inhibitory processes57. Research has found that older adults over-represent sound signals in the cortex, and this extends to the neural processing of “unattended” speech58; thus, it is plausible that impaired inhibition of “noise” signals hinders speech-in-noise perception. Another factor is likely the age-related slowing of processing speed, which has been associated with SIN perception28,59 and has previously been identified as a contributor to SIN decline over time60. Nevertheless, age has a multifactorial effect that includes deterioration of overall cognitive abilities61, temporal processing37, frequency selectivity62, and binaural processing63, all of which could impact speech-in-noise perception.
Auditory working memory had a discrete effect on SIN not explained by age or hearing thresholds. As mentioned earlier, the relationship between auditory memory and speech-in-noise is the most replicated finding in SIN research27,28. Although previous research has found weak or no relationships between working memory and SIN in young adults with normal hearing32, our model demonstrated an association between auditory memory and SIN independent of age and hearing levels, as neither contributed significantly to the constructed working memory latent variable.
Sound segregation did not contribute to SIN directly, but shared a significant amount of variance with auditory working memory. Sound segregation partly relies on peripheral mechanisms, as PTA significantly affected SFG performance. According to our model, the variance explained by SFG is shared with, and captured by, auditory memory. Besides this shared variance, the addition of a new SFG stimulus (figure discrimination) may explain the lack of a discrete contribution to SIN by the sound segregation construct. In addition to the previously developed SFG task, in which participants had to discriminate the presence of a gap within the figure21, and which requires tracking the figure over time, we used a newly developed task that combined this design with the original version, in which the presence of a figure had to be detected. However, the original detection paradigm used a fixed SNR of 0 dB, such that the target and ground are differentiated only by temporal coherence20. In the current, newly developed task, which involved figure discrimination in an adaptive paradigm with varying SNR, the figure could be detected within the background by sound-level differences alone and did not require tracking over time.
Nonverbal reasoning greatly modulated auditory working memory. It also impacted sound segregation mechanisms, albeit less strongly. General intelligence is thought to influence all cognition-related processes; however, our model stresses that the effect of general intelligence on SIN is mediated by auditory working memory. Working memory capacity and general intelligence are highly related64, but our model considered a directional mechanism in which intelligence determines memory ability. Associations between nonverbal intelligence and degraded-speech recognition have been found before34,65, but differences in processing speed may underlie this relationship59. The matrix test used in the current study imposed a response time limit and is thus confounded by processing speed, which is also undoubtedly linked to general intelligence and working memory66. The relationship between intelligence and working memory may also be partly driven by the association between intelligence and sensory performance67, as some of the tasks used to measure working memory were based on frequency and temporal precision.
We further created a simplified SEM where sound segregation (SFG-Gap), auditory memory (AM rate), and temporal processing (between-channels gap detection) all act under one latent structure: central auditory processing (CAP). By joining sound segregation and auditory memory mechanisms, and adding temporal precision, this latent construct surpassed age as the most important predictor of speech-in-noise perception. We further revealed that the impacts of hearing and intelligence on SIN act through their influence on central auditory processing. In other words, the effects of hearing and intelligence on speech-in-noise are mediated through central auditory mechanisms: greater abstract reasoning and better hearing support improved central processing, which in turn enhances comprehension of speech in noise. Previous research using SEM has emphasised the importance of central auditory processing to speech-in-noise perception. In a sample of 120 older adults, Anderson et al.12 found that central processing was the biggest determinant of speech-in-noise perception, followed by cognitive function, which included auditory (working) memory.
Overall, our research demonstrates a critical role of central auditory processing, encompassing auditory (working) memory, sound segregation, and temporal precision, in speech-in-noise perception. Furthermore, our models emphasize the importance of non-specific age effects that go beyond declines in hearing and reasoning abilities, and show how central auditory processing mechanisms mediate the effects of hearing and cognition on speech-in-noise perception. Our results suggest that CAP should be assessed in clinical settings to obtain a comprehensive view of the reasons for SIN difficulties.
Data availability
The data and analysis script used in the current manuscript can be publicly accessed through OSF (https://osf.io/nz9v7/).
References
Vos, T. et al. Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: A systematic analysis for the Global Burden of Disease Study 2015. Lancet 388 (10053), 1545–1602. https://doi.org/10.1016/S0140-6736(16)31678-6 (2016).
Akeroyd, M. A. & Munro, K. J. Population estimates of the number of adults in the UK with a hearing loss updated using 2021 and 2022 census data. Int. J. Audiol. 63 (9), 659–660. https://doi.org/10.1080/14992027.2024.2341956 (2024).
Mener, D. J., Betz, J., Genther, D. J., Chen, D. & Lin, F. R. Hearing loss and depression in older adults. J. Am. Geriatr. Soc. 61 (9), 1627–1629. https://doi.org/10.1111/jgs.12429 (2013).
Li, C. M. et al. Hearing impairment associated with depression in US adults, National Health and Nutrition Examination Survey 2005–2010. JAMA Otolaryngol. Head Neck Surg. 140 (4), 293–302. https://doi.org/10.1001/jamaoto.2014.42 (2014).
Lin, F. R. et al. Hearing loss and incident dementia. Arch. Neurol. 68 (2), 214–220. https://doi.org/10.1001/archneurol.2010.362 (2011).
Gallacher, J. et al. Auditory threshold, phonologic demand, and incident dementia. Neurology 79 (15), 1583–1590. https://doi.org/10.1212/WNL.0b013e31826e263d (2012).
Griffiths, T. D. et al. How can hearing loss cause dementia? Neuron 108 (3), 401–412. https://doi.org/10.1016/j.neuron.2020.08.003 (2020).
Lin, F. R. et al. Hearing loss and cognitive decline in older adults. JAMA Intern. Med. 173 (4), 293–299. https://doi.org/10.1001/jamainternmed.2013.1868 (2013).
Livingston, G. et al. Dementia prevention, intervention, and care: 2024 report of the Lancet standing Commission. Lancet 404 (10452), 572–628. https://doi.org/10.1016/s0140-6736(24)01296-0 (2024).
Kochkin, S. MarkeTrak VIII: Consumer satisfaction with hearing aids is slowly increasing. Hear. J. 63 (1). https://doi.org/10.1097/01.Hj.0000366912.40173.76 (2010).
Füllgrabe, C., Moore, B. C. J. & Stone, M. A. Age-group differences in speech identification despite matched audiometrically normal hearing: Contributions from auditory temporal processing and cognition. Front. Aging Neurosci. 6. https://doi.org/10.3389/fnagi.2014.00347 (2015).
Anderson, S., White-Schwoch, T., Parbery-Clark, A. & Kraus, N. A dynamic auditory-cognitive system supports speech-in-noise perception in older adults. Hear. Res. 300, 18–32. https://doi.org/10.1016/j.heares.2013.03.006 (2013).
Griffiths, T. D. Predicting speech-in-noise ability in normal and impaired hearing based on auditory cognitive measures. Front. Neurosci. 17, 1077344. https://doi.org/10.3389/fnins.2023.1077344 (2023).
Kochkin, S. MarkeTrak V: Consumer satisfaction revisited. Hear. J. 53 (1), 45–46 (2000).
Kochkin, S. MarkeTrak VII: Customer satisfaction with hearing instruments in the digital age. Hear. J. 58 (9). https://doi.org/10.1097/01.HJ.0000286545.33961.e7 (2005).
Chandrasekaran, B. & Kraus, N. The scalp-recorded brainstem response to speech: Neural origins and plasticity. Psychophysiology 47 (2), 236–246. https://doi.org/10.1111/j.1469-8986.2009.00928.x (2010).
Ding, N. & Simon, J. Z. Emergence of neural encoding of auditory objects while listening to competing speakers. Proc. Natl. Acad. Sci. USA 109 (29), 11854–11859. https://doi.org/10.1073/pnas.1205381109 (2012).
Carlyon, R. P. How the brain separates sounds. Trends Cogn. Sci. 8 (10), 465–471. https://doi.org/10.1016/j.tics.2004.08.008 (2004).
Teki, S., Chait, M., Kumar, S., von Kriegstein, K. & Griffiths, T. D. Brain bases for auditory stimulus-driven figure-ground segregation. J. Neurosci. 31 (1), 164–171. https://doi.org/10.1523/JNEUROSCI.3788-10.2011 (2011).
Teki, S., Chait, M., Kumar, S., Shamma, S. & Griffiths, T. D. Segregation of complex acoustic scenes based on temporal coherence. Elife 2, e00699. https://doi.org/10.7554/eLife.00699 (2013).
Holmes, E. & Griffiths, T. D. Normal’ hearing thresholds and fundamental auditory grouping processes predict difficulties with speech-in-noise perception. Sci. Rep. 9 (1), 16771. https://doi.org/10.1038/s41598-019-53353-5 (2019).
Teki, S. et al. Neural correlates of auditory figure-ground segregation based on temporal coherence. Cereb. Cortex. 26 (9), 3669–3680. https://doi.org/10.1093/cercor/bhw173 (2016).
Holmes, E., Zeidman, P., Friston, K. J. & Griffiths, T. D. Difficulties with speech-in-noise perception related to fundamental grouping processes in auditory cortex. Cereb. Cortex. 31 (3), 1582–1596. https://doi.org/10.1093/cercor/bhaa311 (2021).
Guo, X. et al. Predicting speech-in-noise ability with static and dynamic auditory figure-ground analysis using structural equation modelling. Proc. Biol. Sci. 292 (2042), 20242503. https://doi.org/10.1098/rspb.2024.2503 (2025).
O’Sullivan, J. A., Shamma, S. A. & Lalor, E. C. Evidence for neural computations of temporal coherence in an auditory scene and their enhancement during active listening. J. Neurosci. 35 (18), 7256–7263. https://doi.org/10.1523/JNEUROSCI.4973-14.2015 (2015).
Guo, X. et al. Neural entrainment to pitch changes of auditory targets in noise. NeuroImage. 314, 121270. https://doi.org/10.1016/j.neuroimage.2025.121270 (2025).
Akeroyd, M. A. Are individual differences in speech reception related to individual differences in cognitive ability? A survey of twenty experimental studies with normal and hearing-impaired adults. Int. J. Audiol. 47 (Suppl 2), S53–71. https://doi.org/10.1080/14992020802301142 (2008).
Dryden, A., Allen, H. A., Henshaw, H. & Heinrich, A. The association between cognitive performance and speech-in-noise perception for adult listeners: A systematic literature review and meta-analysis. Trends Hear. 21, 2331216517744675. https://doi.org/10.1177/2331216517744675 (2017).
Rosemann, S. et al. The contribution of cognitive factors to individual differences in understanding noise-vocoded speech in young and older adults. Front. Hum. Neurosci. 11, 294. https://doi.org/10.3389/fnhum.2017.00294 (2017).
James, P. J., Krishnan, S. & Aydelott, J. Working memory predicts semantic comprehension in dichotic listening in older adults. Cognition 133 (1), 32–42. https://doi.org/10.1016/j.cognition.2014.05.014 (2014).
Lad, M., Billig, A. J., Kumar, S. & Griffiths, T. D. A specific relationship between musical sophistication and auditory working memory. Sci. Rep. 12 (1), 3517. https://doi.org/10.1038/s41598-022-07568-8 (2022).
Füllgrabe, C. & Rosen, S. On the (Un)importance of working memory in speech-in-noise processing for listeners with normal hearing thresholds. Front. Psychol. 7, 1268. https://doi.org/10.3389/fpsyg.2016.01268 (2016).
Wayne, R. V. & Johnsrude, I. S. A review of causal mechanisms underlying the link between age-related hearing loss and cognitive decline. Ageing Res. Rev. 23, 154–166. https://doi.org/10.1016/j.arr.2015.06.002 (2015).
Mattingly, J. K., Castellanos, I. & Moberly, A. C. Nonverbal reasoning as a contributor to sentence recognition outcomes in adults with cochlear implants. Otol. Neurotol. 39 (10), e956–e963. https://doi.org/10.1097/MAO.0000000000001998 (2018).
Vaden, K. I. Jr., Teubner-Rhodes, S., Ahlstrom, J. B., Dubno, J. R. & Eckert, M. A. Evidence for cortical adjustments to perceptual decision criteria during word recognition in noise. Neuroimage 253, 119042. https://doi.org/10.1016/j.neuroimage.2022.119042 (2022).
Windle, R., Dillon, H. & Heinrich, A. A review of auditory processing and cognitive change during normal ageing, and the implications for setting hearing aids for older adults. Front. Neurol. 14, 1122420. https://doi.org/10.3389/fneur.2023.1122420 (2023).
Anderson, S. & Karawani, H. Objective evidence of temporal processing deficits in older adults. Hear. Res. 397, 108053. https://doi.org/10.1016/j.heares.2020.108053 (2020).
Marsja, E., Stenbäck, V., Moradi, S., Danielsson, H. & Rönnberg, J. Is having hearing loss fundamentally different? Multigroup structural equation modeling of the effect of cognitive functioning on speech identification. Ear Hear. 43 (5), 1437–1446. https://doi.org/10.1097/aud.0000000000001196 (2022).
Kumar, S. et al. Oscillatory correlates of auditory working memory examined with human electrocorticography. Neuropsychologia 150, 107691. https://doi.org/10.1016/j.neuropsychologia.2020.107691 (2021).
Kumar, S. et al. A brain system for auditory working memory. J. Neurosci. 36 (16), 4492. https://doi.org/10.1523/JNEUROSCI.4341-14.2016 (2016).
Guo, X. et al. British version of the Iowa test of consonant perception. JASA Express Lett. 4 (12). https://doi.org/10.1121/10.0034738 (2024).
Lad, M., Taylor, J. P. & Griffiths, T. D. The contribution of short-term memory for sound features to speech-in-noise perception and cognition. Hear. Res. 451, 109081. https://doi.org/10.1016/j.heares.2024.109081 (2024).
Phillips, D. P., Taylor, T. L., Hall, S. E., Carr, M. M. & Mossop, J. E. Detection of silent intervals between noises activating different perceptual channels: Some properties of central auditory gap detection. J. Acoust. Soc. Am. 101 (6), 3694–3705. https://doi.org/10.1121/1.419376 (1997).
Chierchia, G. et al. The matrix reasoning item bank (MaRs-IB): Novel, open-access abstract reasoning items for adolescents and adults. R Soc. Open. Sci. 6 (10), 190232. https://doi.org/10.1098/rsos.190232 (2019).
Brosseau-Liard, P. E. & Savalei, V. Adjusting incremental fit indices for nonnormality. Multivar. Behav. Res. 49 (5), 460–470. https://doi.org/10.1080/00273171.2014.933697 (2014).
Brosseau-Liard, P. E., Savalei, V. & Li, L. An investigation of the sample performance of two nonnormality corrections for RMSEA. Multivar. Behav. Res. 47 (6), 904–930. https://doi.org/10.1080/00273171.2012.715252 (2012).
Hu, L. & Bentler, P. M. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct. Eq. Model. Multidis. J. 6 (1), 1–55. https://doi.org/10.1080/10705519909540118 (1999).
Kline, R. B. Principles and Practice of Structural Equation Modeling 3rd. edn (Guilford Press, 2011).
Bollen, K. A. & Stine, R. A. Bootstrapping Goodness-of-Fit measures in structural equation models. Sociol. Methods Res. 21 (2), 205–229. https://doi.org/10.1177/0049124192021002004 (1992).
Chatterjee, S. & Hadi, A. S. Analysis of collinear data. In Regression Analysis by Example (eds Chatterjee, S. & Hadi, A. S.) 221–258. https://doi.org/10.1002/0470055464.ch9 (2006).
Billings, C. J. & Madsen, B. M. A perspective on brain-behavior relationships and effects of age and hearing using speech-in-noise stimuli. Hear. Res. 369, 90–102. https://doi.org/10.1016/j.heares.2018.03.024 (2018).
Yeend, I., Beach, E. F. & Sharma, M. Working memory and extended high-frequency hearing in adults: Diagnostic predictors of speech-in-noise perception. Ear Hear. 40 (3), 458–467. https://doi.org/10.1097/aud.0000000000000640 (2019).
Çolak, H. et al. Subcortical auditory processing and speech perception in noise among individuals with and without extended high-frequency hearing loss. J. Speech Lang. Hear. Res. 67 (1), 221–231. https://doi.org/10.1044/2023_JSLHR-23-00023 (2024).
Motlagh Zadeh, L. et al. Extended high-frequency hearing enhances speech perception in noise. Proc. Natl. Acad. Sci. USA 116 (47), 23753–23759. https://doi.org/10.1073/pnas.1903315116 (2019).
Smith, S. B. et al. Investigating peripheral sources of speech-in-noise variability in listeners with normal audiograms. Hear. Res. 371, 66–74. https://doi.org/10.1016/j.heares.2018.11.008 (2019).
Besser, J., Festen, J. M., Goverts, S. T., Kramer, S. E. & Pichora-Fuller, M. K. Speech-in-speech listening on the LiSN-S test by older adults with good audiograms depends on cognition and hearing acuity at high frequencies. Ear Hear. 36 (1), 24–41. https://doi.org/10.1097/aud.0000000000000096 (2015).
Gomez-Alvarez, M., Johannesen, P. T., Coelho-de-Sousa, S. L., Klump, G. M. & Lopez-Poveda, E. A. The relative contribution of cochlear synaptopathy and reduced inhibition to age-related hearing impairment for people with normal audiograms. Trends Hear. 27. https://doi.org/10.1177/23312165231213191 (2023).
Presacco, A., Simon, J. Z. & Anderson, S. Evidence of degraded representation of speech in noise, in the aging midbrain and cortex. J. Neurophysiol. 116 (5), 2346–2355. https://doi.org/10.1152/jn.00372.2016 (2016).
Moberly, A. C., Mattingly, J. K. & Castellanos, I. How does nonverbal reasoning affect sentence recognition in adults with cochlear implants and normal-hearing peers? Audiol. Neurootol. 24 (3), 127–138. https://doi.org/10.1159/000500699 (2019).
Pronk, M. et al. Decline in older persons’ ability to recognize speech in noise: The influence of demographic, health-related, environmental, and cognitive factors. Ear Hear. 34 (6). https://doi.org/10.1097/AUD.0b013e3182994eee (2013).
Harada, C. N., Love, M. C. N., & Triebel, K. L. Normal cognitive aging. Clin. Geriatr. Med. 29 (4), 737–752. https://doi.org/10.1016/j.cger.2013.07.002 (2013).
Regev, J., Zaar, J., Relaño-Iborra, H. & Dau, T. Age-related reduction of amplitude modulation frequency selectivity. J. Acoust. Soc. Am. 153 (4), 2298. https://doi.org/10.1121/10.0017835 (2023).
Moore, B. C. J. Effects of age and hearing loss on the processing of auditory temporal fine structure. In Physiology, Psychoacoustics and Cognition in Normal and Impaired Hearing. https://doi.org/10.1007/978-3-319-25474-6_1 (Springer International Publishing, 2016).
Conway, A. R. A., Kane, M. J. & Engle, R. W. Working memory capacity and its relation to general intelligence. Trends Cogn. Sci. 7 (12), 547–552. https://doi.org/10.1016/j.tics.2003.10.005 (2003).
Moberly, A. C. et al. How does aging affect recognition of spectrally degraded speech? Laryngoscope 128 (Suppl. 5). https://doi.org/10.1002/lary.27457 (2018).
Fry, A. F. & Hale, S. Relationships among processing speed, working memory, and fluid intelligence in children. Biol. Psychol. 54 (1), 1–34. https://doi.org/10.1016/S0301-0511(00)00051-X (2000).
Troche, S. J. & Rammsayer, T. H. Temporal and non-temporal sensory discrimination and their predictions of capacity- and speed-related aspects of psychometric intelligence. Pers. Indiv. Differ. 47 (1), 52–57. https://doi.org/10.1016/j.paid.2009.02.001 (2009).
Author information
Contributions
E.B., X.G. and T.G. conceptualized and designed the study. T.G. received the funding. E.B. and M.L. programmed the computer tasks. E.B., H.Ç., and X.G. collected the data. E.B., H.Ç., X.G. and S.R. analysed the data. E.B. prepared all figures. E.B. wrote the original draft. T.G., H.Ç., and X.G. revised the manuscript. All authors reviewed and approved the submitted manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.