Abstract
Auditory perception requires categorizing sound sequences, such as speech or music, into classes, such as syllables or notes. Auditory categorization depends not only on the acoustic waveform, but also on variability and uncertainty in how the listener perceives the sound – including sensory and stimulus uncertainty, the listener’s estimated relevance of the particular sound to the task, and their ability to learn the past statistics of the acoustic environment. Whereas these factors have been studied in isolation, whether and how they interact to shape categorization remains unknown. Here, we measured human participants’ performance on a multi-tone categorization task and modeled each participant’s behavior using a Bayesian framework. Task-relevant tones contributed more to category choice than task-irrelevant tones, confirming that participants combined information about sensory features with task relevance. Conversely, participants’ poor estimates of task-relevant tones or high sensory uncertainty adversely impacted category choice. Learning the statistics of the sound categories over both short and long timescales also affected decisions, biasing them toward the overrepresented category. The magnitude of this effect correlated inversely with participants’ relevance estimates. Our results demonstrate that individual participants idiosyncratically weigh sensory uncertainty, task relevance, and statistics over both short and long timescales, providing a novel understanding of, and a computational framework for, how sensory decisions are made under several simultaneous behavioral demands.
Introduction
Making sensory decisions in everyday settings is a complex task due to the presence of multiple forms of uncertainty1,2,3,4,5,6,7,8. One of many such decisions is identifying what a friend said when the conversation takes place in a crowded restaurant versus in a quiet room. Despite the apparent ease with which we can accomplish this task, it is a complicated computational process involving many factors. First, the sensory transduction process and neural-processing stages are noisy, generating a form of uncertainty called sensory uncertainty9,10,11,12. A second form of uncertainty is stimulus relevance: of all stimuli within a sensory scene, only some are relevant to the current decision13,14,15,16,17,18. For example, when trying to identify a particular sound uttered by our friend, we have to disregard sounds uttered by other speakers, which can be considered distractors. A third form of uncertainty is stimulus uncertainty, which is a listener’s uncertainty in estimating the variability in a sound source’s generation of the nominally same stimulus19. Even once our sensory system isolates the signals that were produced by our friend from background noise or irrelevant signals, we may be uncertain as to what specific sound or word was produced. Furthermore, when presented with any form of uncertainty, observers may rely on the learned past statistics of stimuli20,21,22,23,24,25,26 to make their decisions, such as their long-term knowledge of how their friend typically talks or their short-term knowledge of their friend’s recent speech sounds27. Indeed, observers can learn both short-28,29 and long-term30,31,32 stimulus statistics, and this learning is considered crucial for efficient decision making33,34, with or without a conscious effort from the listener.
Whereas the effects of the factors – sensory uncertainty, stimulus relevance, stimulus uncertainty, and short- and long-term statistics – on sensory decision-making, such as categorization, have previously been studied separately19,35, in everyday situations we often confront them simultaneously. Thus, these factors may interact with each other, differentially affecting category decisions. For example, sensory uncertainty may not only decrease our ability to categorize a sound uttered by our friend but may also affect our estimate of the relevant versus the distractor sounds. Our prior expectations of what our friend is saying may play a greater role when our sensory or stimulus uncertainty is high or when there are more distracting sounds from other speakers in the environment. These expectations may be reduced if we are talking to a person we just met.
To test whether and how the interplay of these factors shapes sensory decisions, we measured the performance of human participants on an auditory categorization task36,37,38,39. More specifically, our goals were to test how observers’ sensory uncertainty and associated estimates of the likelihood of the irrelevant stimuli mediated their sensitivity to stimulus relevance, and how their task performance was affected by learning. Participants categorized trial sequences of three tones as either high frequency or low frequency in a two-alternative forced choice task, as we varied the number of task-irrelevant (distractor) tones and task-relevant (category-specific) tones across the trials. Additionally, to test learning, in a subset of experimental sessions, we modulated the proportion of trials biased toward the high or the low category.
Because it has been suggested that perceptual systems optimally integrate information from relevant inputs while ignoring the irrelevant ones40, we characterized participants’ behavior using normative Bayesian models of sensory decision-making41,42,43,44,45,46. The Bayesian formalism is a well-established framework to understand complex interactions among many factors governing decision-making13,14,30,31,47. On average, when categorizing auditory stimuli, participants integrated information across all three tones but weighed them by their task relevance. Stimulus uncertainty was similar across participants. However, individual participants’ sensitivity to task relevance depended on their internal sensory uncertainty and subjective estimates of the likelihood of irrelevant stimuli. We also found population-wide evidence of both learning short-term statistics (i.e., stimulus category of the previous trial) and learning long-term statistics (i.e., session-level bias toward high versus low category). However, individual participants’ knowledge of short- and long-term statistics was inversely correlated with their estimates of the likelihood of irrelevant stimuli. Together, our results demonstrate that humans differentially combine information about sensory uncertainty, stimulus relevance, stimulus uncertainty, and short-term and long-term learning, providing an integrated framework for understanding how the interplay of different mechanisms shapes auditory processing.
Results
Participants weigh tones according to stimulus relevance
Auditory categorization can depend on multiple factors, such as sensory uncertainty, stimulus relevance, stimulus uncertainty, and stimulus statistics that can vary over both short and long timescales. Because these factors often co-occur, they may interact with each other to affect decision-making; indeed, given how they are related, they should interact computationally. Here, we designed a categorization task to measure how these factors interacted in decision-making. In a two-alternative forced-choice auditory task, participants were asked to report the category of a three-tone sequence (Fig. 1A). Each trial was randomly selected to be from the low or the high category, and each tone of the trial could probabilistically be either a signal or a distractor (Fig. 1B-D).
Schematic diagram of the auditory categorization task. (A) A participant categorizes a given three-tone trial sequence as high or low. (B) Example low (top) and high (bottom) category trials from an unbiased session. The underlying probability distribution for signal tones is Gaussian (low-frequency Gaussian: light blue; high-frequency Gaussian: red) and for distractor tones is uniform (purple). The stimuli in each trial are three tones denoted by notes: signal tones from the low-frequency distribution (light blue), signal tones from the high-frequency distribution (red), and distractor tones from the uniform distribution (purple). (C) The 4 types of trial combinations in the task and their corresponding probabilities. (D) Example trial sequences from one participant’s data in the unbiased session.
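The generative process described above can be sketched in code. The 30% distractor probability is stated in the model description later in the paper; the frequency means, standard deviation, and range used here are illustrative placeholders, not the study's actual stimulus values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generative parameters (illustrative; the paper's exact
# frequency values are not reproduced here).
LOW_MEAN, HIGH_MEAN = 2.0, 3.0   # centers of the two signal Gaussians
SIGMA = 0.3                      # shared SD of the signal Gaussians
FREQ_RANGE = (1.0, 4.0)          # full range for uniform distractors
P_DISTRACTOR = 0.3               # per-tone distractor probability (from the task)

def generate_trial(category, n_tones=3):
    """Draw one three-tone trial: signal tones from the category Gaussian,
    each independently replaced by a uniform distractor with P_DISTRACTOR."""
    mean = HIGH_MEAN if category == "high" else LOW_MEAN
    tones = rng.normal(mean, SIGMA, size=n_tones)
    is_distractor = rng.random(n_tones) < P_DISTRACTOR
    tones[is_distractor] = rng.uniform(*FREQ_RANGE, size=is_distractor.sum())
    return tones, is_distractor

tones, flags = generate_trial("low")
```

Because each tone is replaced independently, trials with zero, one, two, or three distractors all occur, matching the four trial combinations in Fig. 1C.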
We first tested whether participants accounted for the relevance of individual tones when categorizing the tone sequences. Because the total number of tones was the same during each trial, replacing a signal tone with a distractor tone both decreased the amount of information available to the listener regarding stimulus category and added irrelevant information. To account for this, we analyzed trials with the same number of distractors separately (Fig. 2A,B). Compared to signal tones, distractors should have less of an impact on participants’ decisions, because they have no relevance for the task. As expected, participants’ category choice probability was more strongly correlated with the frequency of the presented signal tones than with the frequency of the distractor tones (Fig. 2A; two-sided paired t-test, trials with one distractor: p = 1.32e-22; trials with two distractors: p = 5.36e-18; Figs S1C-H and Table 1). This confirms that participants used tone relevance when making decisions.
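The model-free influence measure used above can be approximated as follows. This is a simplified sketch: the paper's Methods correlate tone frequencies with average category choice probability, whereas here we correlate raw frequencies with binary choices across trials.

```python
import numpy as np

def tone_choice_correlation(tone_freqs, chose_high):
    """Approximate a tone's influence on category choice as the Pearson
    correlation between its frequency across trials and the binary
    high-category response (a simplification of the paper's binned analysis)."""
    tone_freqs = np.asarray(tone_freqs, dtype=float)
    chose_high = np.asarray(chose_high, dtype=float)
    return np.corrcoef(tone_freqs, chose_high)[0, 1]
```

A signal tone that drives the decision yields a strong positive correlation; a tone whose frequency is unrelated to the response (e.g., an ignored distractor) yields a correlation near zero.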
On average, participants weighted tones according to their relevance when making category decisions. (A) For both signal (grey) and distractor (purple) tones, their influence on category choice is computed using correlation of the respective tone frequencies with their associated average category choice probability (Methods). Error bars: standard error of the mean. Statistics: two-sided paired t-test between each subject’s normalized correlation coefficient of signal tone frequencies with choice probability and correlation coefficient of distractor tone frequencies with choice probability. (B) Average accuracy in unbiased trials decreased with the number of distractor tones. Error bars: standard error of the mean. Statistics: one-way repeated measures ANOVA; ptone_position = 2.82e-58, Ftone_position(3, 165) = 226.61. (C) For trials with exactly one distractor tone, accuracy was higher when the distractor tone frequency was similar to the trial category (Blue: low-category trials, red: high-category trials). Binned frequencies of the distractor are shown on the x-axis. Error bars: standard error of the mean. Statistics: two-tailed Wilcoxon signed-rank test between accuracy in high- and low- category trials computed at each distractor frequency. N = 56 participants. †p < 0.05, *p < 0.01, **p < 0.001, ***p < 0.0001.
Conversely, because distractor tones may still be integrated in the decision with some probability48,49, they may impair participants’ performance by providing information that should have been ignored. Indeed, as the number of distractors increased, the correlation between distractor tones and category choice, which we interpret as the influence of distractor tones on category choice, also increased whereas that of signal tones decreased (Fig. 2A; Figs S1B-D; one-way rmANOVA, significant main effect for number of distractors on correlations with signal tone frequencies: F(2,110) = 34.55, p = 2.28e-12, and on correlations with distractor tone frequencies: F(1,55) = 24.36, p = 8e-6). Additionally, accuracy decreased as the number of distractors increased (Fig. 2B; N = 56, correlation of accuracy with the number of distractors: Spearman, ρ = -1, p = 0). These findings collectively demonstrate that because participants were uncertain about the relevance of the different tones, the distractors adversely impacted their decisions.
Next, we tested whether the effect of distractors depended on their tone frequency. We hypothesized that a distractor tone whose frequency was in the same frequency range as the trial’s generative category center would impair participants’ decisions less than a distractor tone whose frequency was not in the same frequency range. We evaluated this prediction by computing the average participant accuracy for trials with one distractor tone. Category accuracy for these trials was significantly higher when the frequency of the distractor was similar to the trial category (Fig. 2C; Table 1 – Wilcoxon signed-rank test). For trials in which the distractor frequency was different from the trial category, we plotted separate psychometric curves corresponding to the signal or distractor tone frequencies (Fig S1J). A qualitative comparison of these curves to the psychometric curve for trials with only signal tones (Fig S1B) indicates that distractors have a substantial effect on participants’ category choice. Combined, these data suggest that participants’ ability to differentiate between signal and distractor tones depends on the frequency value of the distractors.
Participants exhibit high individual variability in performance
Whereas the previous analysis confirmed the expected effects of tone relevance on the mean performance of the participants, the extent to which tone relevance impacts choice varies idiosyncratically across individuals. This diversity may be due to participants’ distinct internal priors, estimates of relevance, measures of stimulus uncertainty, or sensory uncertainty. Indeed, there was significant heterogeneity in the correlation of tone frequency with category choice probability (for both signal and distractor tones, Figs. 3A, B) and with accuracy (across trials with one and two distractors, Fig. 3C). Accuracy ranged from 57% to 89% for trials with one distractor tone and from 53% to 75% for trials with two distractor tones. Thus, in subsequent analyses, we quantified the effects of relevance, stimulus uncertainty, sensory uncertainty, and learning of stimulus statistics not only on the average performance across the population but also on inter-participant variability. This allowed us to probe how these different real-life factors might mediate individual participants’ decision-making processes.
Individual participants differed in their estimate of relevance (A, B) For each participant, influence of tone type on category choice is computed using the correlation of respective tone frequencies with category choice probability. (A) Influence of signal versus distractor tones for trials with one distractor. Statistics: two-tailed Wilcoxon signed-rank test for the 56 participants. (B) Influence of signal versus distractor tones for trials with two distractors. Statistics: two-tailed Wilcoxon signed-rank test for the 56 participants. (C) Categorization accuracy for the tone sequences with one versus two distractor tones. Statistics: two-tailed Wilcoxon signed-rank test for the 56 participants. Error bars: standard error of the mean obtained by assuming that a participant’s accuracy follows a binomial distribution; not large enough to be distinguishable. †p < 0.05, *p < 0.01, **p < 0.001, ***p < 0.0001.
Bayesian model can account for participant performance
So far, we have demonstrated in a model-free way that when making decisions, participants across the population accounted for the relevance of the different tone frequencies as they integrated information from the three tones. Next, to investigate how these factors shaped individual participant’s category decisions, we modeled each participant’s behavior using a Bayesian approach. We first discuss how tone frequency should affect individual category choice probability and then lay out three models for understanding decisions and their predictions for this relation.
In a standard two-alternative forced-choice (2AFC) frequency categorization task with no distractor tones, tone frequencies at the intersection of the two Gaussian distributions do not provide evidence for either category. As the tone frequencies move further away from the decision boundary, tones become more informative about the trial’s category (Figs. 4A, B), and consequently a participant’s accuracy increases the further the tone frequencies are from the decision boundary. Thus, in the absence of distractors, if a trial consisted of a single tone, a participant’s choice probability would be modeled by a standard psychometric curve, which is a monotonically increasing saturating sigmoid. In contrast, if the trial consisted of multiple tones, choice probability could be modeled as the product of the relevant sigmoidal single-tone psychometric curves.
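The no-distractor case above can be made concrete: in log-odds form, per-tone likelihood ratios add, which is the Bayesian counterpart of multiplying single-tone evidence. The parameter values below are illustrative assumptions, not the study's fitted values.

```python
import numpy as np

# Illustrative generative parameters (assumed; not the paper's exact values)
LOW_MEAN, HIGH_MEAN, SIGMA = 2.0, 3.0, 0.3

def gauss_logpdf(x, mu, sigma):
    """Gaussian log density, written out to keep the sketch dependency-free."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def p_high_no_distractors(tones, prior_high=0.5):
    """Posterior probability of the 'high' category when every tone is treated
    as signal: per-tone log-likelihood ratios add, so a single tone yields a
    sigmoidal psychometric curve and multiple tones compound the evidence."""
    tones = np.asarray(tones, dtype=float)
    log_odds = (gauss_logpdf(tones, HIGH_MEAN, SIGMA)
                - gauss_logpdf(tones, LOW_MEAN, SIGMA)).sum()
    log_odds += np.log(prior_high / (1 - prior_high))
    return 1.0 / (1.0 + np.exp(-log_odds))
```

A tone at the intersection of the two Gaussians (here 2.5) contributes zero log-odds, so a single such tone yields a choice probability of exactly 0.5, while tones deeper inside either distribution quickly saturate the curve.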
Graph of the Bayesian Model (A) The (top level) category identity C of the tone-burst sequence constrains the values of the (middle level) three individually generated tone frequencies vi. Each tone frequency has a probability of 30% of being replaced by a distractor frequency uniformly drawn from the full frequency range. The (bottom level) auditory sensory signal mi represents a noisy measurement of the true tone frequency vi. The black arrows define the generative conditional probability densities p(v|C) and p(m|v). The task of the observer is to infer the category membership of the full tone sequence v from the noisy sensory measurement m = {m1, m2, m3} while considering the possibility of distractor tone bursts (green arrows). (B) The Full Bayesian Model considers participants to be veridical about the parameters of the generative distribution and the distractor probability. Top: Given a particular category, the perceived probability of a certain tone frequency is governed by the respective conditional distribution p(v|C) and the probability of that tone being replaced by a distractor tone pdistractor. Middle: The sensory process of the Bayesian observer is modeled as a Gaussian process centered at the true stimulus frequency. Bottom: If the tones can be judged to be either signal or distractor, the psychometric curve becomes ‘S’-shaped. Further, this curve plateaus at 0.5 because frequency values at the tails of the Gaussian distributions are more likely to be distractors than signal tones. (C) The No Distractors Model considers those participants that believe that there are no distractors, and therefore all perceived stimuli are drawn from either the “high" or “low” distributions. Top: Given a particular category, the perceived probability of a certain tone frequency is governed by the respective conditional distribution p(v|C). Middle: The sensory process of the Bayesian observer is modeled as a Gaussian process centered at the true stimulus frequency. 
Bottom: If a trial is judged to contain only signal tones that were drawn from the low- or high- frequency signal Gaussian distributions, the psychometric curve becomes sigmoidal, as in a traditional 2AFC task. (D) The Random Guess Model considers those participants that assign category identities at random, as if all tone bursts were drawn from the distractor distribution. Top: Given a particular category, the perceived probability of a certain tone frequency is flat across the stimulus range. Middle: The sensory process of the Bayesian observer is modeled as a Gaussian process centered at the true stimulus frequency. Bottom: If a trial is judged to contain only distractor tones, the psychometric curve becomes flat, as each tone is not informative to the category decision. Blue: p(v|C = L), red: p(v|C = H), purple: p(v|D).
However, in the presence of distractors, we expect participants’ performance to be substantially different. Because tones whose frequencies are far from the category-specific Gaussians are more likely to be distractors than signal tones (Figs. 4C, D), participants may assign lower relevance to those tones. This effect would result in poorer accuracy for tones lying at the tails of the Gaussian distributions (Fig. 4D).
We developed a general framework to capture potential strategies that a participant could use when categorizing the three-tone sequences 41,50,51,52. These strategies depend on a participant’s internal model of stimulus relevance of the three tone frequencies and can be formalized using the Bayesian approach (Fig. 5A). The full Bayesian model reflects a decision-making strategy for a participant (Fig. 5A table, top). This model estimates the participant’s belief about the relevance of each of the three tones and integrates information across the tones accordingly. It also captures the participant’s prior beliefs of the trial categories. As alternative models, we considered two reduced variants of this model. In the no-distractor model, the participant assumed that all three tones were ‘signal’ (Fig. 5A table, center). Conversely, in the random-guess model, the participant assumed that all three tones were irrelevant ‘distractors’ and made their choice randomly akin to a coin flip based on only their learned prior beliefs (Fig. 5A table, bottom). This model controls for the possibility that participants ignored all the tones when they made their decision. Henceforth, we interpret our experimental findings using this three-model Bayesian framework.
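A minimal sketch of the full Bayesian model's likelihood follows: each tone's likelihood under a category is a mixture of the signal Gaussian (with stimulus and sensory uncertainties combined in quadrature) and the uniform distractor density, which is what makes extreme frequencies uninformative and the psychometric curve plateau near 0.5. The 30% distractor probability comes from the task description; the remaining parameter values are illustrative assumptions.

```python
import numpy as np

# Illustrative parameter values (assumed; the paper fits these per participant)
LOW_MEAN, HIGH_MEAN = 2.0, 3.0
SIGMA = 0.3            # stimulus uncertainty: SD of the signal Gaussians
SIGMA_SENSORY = 0.1    # sensory noise on each measured tone
FREQ_LO, FREQ_HI = 1.0, 4.0
P_DISTRACTOR = 0.3     # generative distractor probability (stated in the task)

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def tone_likelihood(m, mean):
    """p(m | C): mixture of a signal Gaussian (stimulus and sensory
    uncertainties added in quadrature) and a uniform distractor density."""
    sigma_eff = np.hypot(SIGMA, SIGMA_SENSORY)
    p_signal = gauss_pdf(m, mean, sigma_eff)
    p_dist = 1.0 / (FREQ_HI - FREQ_LO)
    return (1 - P_DISTRACTOR) * p_signal + P_DISTRACTOR * p_dist

def p_high_full_bayes(measurements, prior_high=0.5):
    """Full Bayesian posterior of the high category: each tone is marginalized
    over its signal-vs-distractor identity, so extreme frequencies (likely
    distractors) carry little evidence and the curve plateaus near 0.5."""
    m = np.asarray(measurements, dtype=float)
    log_odds = (np.log(tone_likelihood(m, HIGH_MEAN))
                - np.log(tone_likelihood(m, LOW_MEAN))).sum()
    log_odds += np.log(prior_high / (1 - prior_high))
    return 1.0 / (1.0 + np.exp(-log_odds))
```

Setting P_DISTRACTOR to 0 recovers the no-distractor model's sigmoid, and treating every tone as a distractor reduces the posterior to the prior, i.e., the random-guess model; the three models in the framework differ only in this assumed distractor probability.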
Schematic diagram of the 3 decision-making models; psychometric curves and model fits shown for representative participants as well as averaged across all participants. (A) A schematic of all 3 decision-making models. (top) The full Bayesian model: the participant considers that any tone can probabilistically either be ‘signal’ or ‘distractor’ (see main text for more information). Using a hypothetical trial, we illustrate that deciding that a tone is ‘signal’ versus ‘distractor’ shifts participants’ category choice probabilities. (center) The no-distractor model: the participant considers all tones to be ‘signal’ tones, and thus a tone that the full Bayesian model interprets as a distractor might instead be interpreted as a high signal tone. (bottom) The random-guess model: the participant considers all tones to be ‘distractor’ tones and thus makes their decision randomly. \(\widehat{L}\): \(\widehat{Low}\)-, \(\widehat{H}\): \(\widehat{High}\)—category decision. (B, C, D) Psychometric curves (black lines) and respective model fits (full Bayesian model: orange lines, no-distractor model: teal lines, random-guess model: magenta lines) for 3 example participants (for posteriors of the full Bayesian model, and for GLM analysis, see Fig S3; for more participants see Fig S4). The x-axes of the psychometric curves span the frequency range of the stimulus tones, and the y-axes denote the mean probability of a \(\widehat{High}\) category choice response. Error bars: standard error of the mean. (B) Example participant with high task accuracy on trials with either one or two distractor tones. (C) Example participant with high task accuracy on trials with one distractor, and poor accuracy on trials with two distractors. (D) Example participant with poor accuracy on trials with both one distractor and two distractor trials. (E) Psychometric curve and model fits averaged across all participants.
We first examined the performance of a representative participant with high accuracy on trials with distractors (Fig. 5B, one-distractor accuracy 80.59% and two-distractor accuracy 74.14%). Tones that were far from the two category distributions and thus more likely to be distractors were correlated less strongly with behavioral performance than tones closer to the middle of the distributions (Fig. 5B, black). Additionally, the full Bayesian model (Fig. 5B, orange) fit the participant’s nonmonotonic behavior considerably better than the no-distractor or random-guess models (Fig. 5B, teal and magenta, respectively; Table 1 – two-sided Wilcoxon signed-rank test). These analyses collectively demonstrated that this participant used the relevance of the different tones when categorizing trials.
We next studied the performance of two participants with intermediate and low accuracy. Their choice probability for tones drawn from the flanks (highest or lowest frequencies) of the signal distributions changed less in comparison to the participant with higher accuracy, resulting in psychometric curves that were more sigmoidal in nature (Figs. 5C, D). Furthermore, the difference in accuracy on trials with one and two distractors was lower for the low-accuracy than high-accuracy participants (Fig. 5C: one-distractor accuracy 78.24% and two-distractor accuracy 59.48%, Fig. 5D: one-distractor accuracy 59% and two-distractor accuracy 61.21%). Even for these participants, although the effect was smaller, the full Bayesian model provided a better fit to the psychometric curves compared to the other two Bayesian models (orange curve versus the teal and magenta curves; Table 1 – two-sided Wilcoxon signed-rank test).
These three examples suggest that potentially all participants, irrespective of their accuracy, are impacted by stimulus relevance when choosing the stimulus category, albeit to varying degrees. To test this hypothesis across the entire population, we plotted the average psychometric curve and the average model fits (Fig. 5E). We found that the data were well fit by the full Bayesian model, which qualitatively illustrates a population-wide effect of stimulus relevance on category choice decisions.
Next, we quantified this effect by comparing the statistics for the fits of the Bayesian models. Because the number of fitting parameters differs across the three models (6 in the full Bayesian, 5 in the no-distractor, and 2 in the random-guess model (see Methods and Fig S5)), we used Bayesian Information Criterion (BIC) scores to compare them. A lower BIC score indicates a better model fit. For all participants, the full Bayesian model was better than the random-guess model (Fig. 6A; population statistics: two-sided Wilcoxon signed-rank test on BIC, Z = -6.51, p = 7.55e-11) at describing categorization behavior. This implies that participants do not simply rely on their prior beliefs when classifying the tone sequences.
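The model comparison above uses the standard BIC formula, k ln n − 2 ln L̂, with the parameter counts given in the text (6, 5, and 2 for the full Bayesian, no-distractor, and random-guess models). A minimal helper:

```python
import numpy as np

def bic(log_likelihood, n_params, n_trials):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L_hat). The ln(n)
    penalty means the 6-parameter full Bayesian model must fit meaningfully
    better than the 5-parameter no-distractor model to earn a lower score."""
    return n_params * np.log(n_trials) - 2.0 * log_likelihood
```

For example, with identical log-likelihoods the no-distractor model (5 parameters) scores lower (better) than the full Bayesian model (6 parameters), so the full model wins in Fig. 6 only because its fit improvement outweighs the complexity penalty.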
Nearly all participants were sensitive to stimulus relevance. On unbiased trials, BIC scores of the full Bayesian model were lower than that of the (A) random-guess model for all participants and (B) the no-distractor model for all except two participants. The two outliers are shown in light pink. Statistics: two-tailed Wilcoxon signed-rank test comparing model fits across the population. (C) Participants’ accuracy was inversely correlated with difference in BIC scores between the full Bayesian and no-distractor models. Statistics: Spearman correlation. In all panels, data point: one participant; †p < 0.05; *p < 0.01, **p < 0.001, ***p < 0.0001. Error bars in (A, B) and horizontal error bars in (C): 95% confidence intervals derived from respective model fits to 100 bootstrapped samples. Vertical error bars in (C): standard error of the mean, obtained by assuming that accuracy follows a binomial distribution; vertical error bars not large enough to be distinguishable.
For most participants, except for two, the full Bayesian model was also better than the no-distractor model (Fig. 6B; same test, Z = -6.49, p = 8.41e-11). For the two outlier participants (Fig. 6B, C; light pink), the full Bayesian and no-distractor model fits were similar (Table 1 – two-sided Wilcoxon signed-rank tests), indicating that they did not necessarily account for relevance when making decisions. However, for the rest, the full Bayesian model reproduced the main patterns in the data in an absolute sense because it accurately captured participants’ responses across tone frequencies (Figs. 5B-E, Figs S4A, D, G). These results confirm that nearly all participants took stimulus relevance into account when making category decisions.
Finally, we correlated model fits to participants’ task accuracy. Across the population, the difference in BICs of the full Bayesian and the no-distractor models was inversely correlated with accuracy (Fig. 6C, Spearman, ρ = -0.85, p = 8.53e-17). For highly accurate participants, the full Bayesian model performed particularly well. These data, once again, suggest that participants’ behavioral performance was determined by their ability to discriminate tones based on their decision relevance.
We also explored the space of non-Bayesian models that had fewer assumptions. For the first model, we implemented a simple boundary model that had three parameters: (1) the frequency dividing the low distractor range from the low-signal range, (2) the frequency dividing the low and high signal ranges, and (3) the frequency dividing the high signal range from the high-distractor range. We found that this model was informative for the high-accuracy participants and consistent with the full Bayesian model (Figs S2C, D), but was not informative for the other participants.
For the high-accuracy participants, this boundary model provided insight into the extent to which they discounted extreme stimuli. We found that these participants exaggerate the signal-frequency ranges by underestimating the lower signal boundary (separating distractor from low signal) and overestimating the upper signal boundary (separating high signal from distractor) (Fig S2A). This matches the full Bayesian fitting results, which also show a tendency to exaggerate the distance of the perceived generative signal distributions from the center of the stimulus range (Figs S5B, C).
For the second model, we implemented a generalized linear model (GLM) (Figs S3A, E). We found that this model was consistent with the full Bayesian model for high-accuracy participants. The observation that high- and intermediate-accuracy participants are impacted by stimulus relevance in their decision-making process was substantiated by the corresponding boundary models and GLMs (Figs S2D, E, S3F, G). The GLM analysis was also able to substantiate the finding that the difference in accuracy on trials with one and two distractors was lower for the low-accuracy than high-accuracy participants (Fig S4).
However, the boundary model and GLM differed in their ability to capture the behavior of low-accuracy participants. For these participants, the boundary model assigned internal boundaries that caused most of the stimulus range to be labeled as distractor tones (Fig S2B). In contrast, the GLM results indicate that even these participants understood the generative distributions (Fig S3F), which is backed by the full Bayesian results showing that these participants were still best fit by the full Bayesian model (Figs. 5D, S3C). Although the GLM was able to describe the behavior of low-accuracy participants, it was not as informative as the Bayesian models regarding the interpretation of the parameters (Figs S3E-G). Together, these results indicate that individual variability in performance is not captured by the boundary and GLM models, which are primarily driven by representations of the generative distributions. Therefore, other factors likely account for individual variability in performance.
Sensory uncertainty and estimated task relevance underlie inter-participant variability
Individual participants’ ability to discriminate between generative distributions of the signal and distractor tones may depend not only on their subjective estimates of the tones’ task relevance (or more specifically, their estimate of the probability of distractors) but also on their measure of stimulus uncertainty (which is the standard deviation of the two Gaussian distributions) and their sensory uncertainty. For instance, some participants may have different prior convictions about the number of distractors (e.g., some may spend more time in noisy environments than others), which could bias how they integrate information over the three tones. Additionally, we expect that participants with low stimulus and sensory uncertainty would distinguish better between signal and distractor tone frequencies than participants with high uncertainty. A participant’s sensory uncertainty determines the variability in perception of the same tone frequency over multiple presentations, whereas the stimulus uncertainty captures the participant’s estimate of the variability in the underlying source generating the same stimulus. Thus, a participant with low stimulus and sensory uncertainties would perceive repeated presentations of the same tone more consistently and would estimate the distribution of the low (high) tone source to be narrower than a participant with high stimulus and sensory uncertainties. Consequently, we conjectured that estimated priors for the probability of distractors, stimulus uncertainty, and sensory uncertainty would drive individual participants’ category choice and task accuracy contributing to inter-individual variability.
To test the effect of these three factors on category choice, we first analyzed the inter-participant variability in these factors. We found that across the 56 participants, stimulus uncertainty (denoted by model parameter \(\sigma\)) was nearly constant (Fig S5D), whereas sensory uncertainty (denoted by parameter \({\sigma }_{sensory}\)) and the estimate of the probability of distractors (denoted by model parameter \({p}_{distractor}\)) were participant-specific (Figs S5A,E,F). To quantify a relationship between \({p}_{distractor}\) and \({\sigma }_{sensory}\), we defined a metric, ‘sigmoidicity’ (Fig. 7A; see Methods), which quantitatively captured each participant’s category decisions across all trials. We found that this metric was significantly influenced by \({p}_{distractor}\) and \({\sigma }_{sensory}\) as well as by the interaction between these two parameters (Fig. 7B; Table 1 – two-factor regression model). Specifically, sigmoidicity increased with \({\sigma }_{sensory }\) and decreased with \({p}_{distractor}\). In other words, sigmoidicity could be used to quantify the extent to which participants considered the presence of distractors when categorizing.
For an individual participant, category choice was driven by both sensory uncertainty and the subjective estimate of the probability of distractors, whereas accuracy was driven mostly by sensory uncertainty. (A) Cartoon models of different psychometric curves illustrate the qualitative range of sigmoidicity. Left: high sigmoidicity; right: low sigmoidicity. (B) Each data point corresponds to a single participant; color denotes \({\text{p}}_{\text{distractor}}\), the participant’s estimated probability of distractors (see color bar); black: \({\text{p}}_{\text{distractor}}\) \(\sim\) 0; green: \({\text{p}}_{\text{distractor}}\) \(\sim\) 1. \({\text{p}}_{\text{distractor}}\) is a fitting parameter in the full Bayesian model and has the value 0 in the no-distractor model. The model parameter \({\upsigma }_{\text{sensory}}\) captures a participant’s sensory uncertainty. Statistics: effect of \({\text{p}}_{\text{distractor}}\) and \({\upsigma }_{\text{sensory}}\) on sigmoidicity using a two-factor regression model. (C) Participants’ accuracy did not significantly depend on \({\text{p}}_{\text{distractor}}\) but was inversely correlated with their sensory uncertainty. \({\text{p}}_{\text{distractor}}\) is color coded as in (B). Statistics: effect of \({\text{p}}_{\text{distractor}}\) and \({\upsigma }_{\text{sensory}}\) on accuracy using a two-factor regression model. In all panels, data point: one participant; †p < 0.05; *p < 0.01, **p < 0.001, ***p < 0.0001. Error bars in (B) and horizontal error bars in (C): 95% confidence intervals derived from respective model fits to 100 bootstrapped samples. Vertical error bars in (C): standard error of the mean, obtained by assuming that accuracy follows a binomial distribution; these error bars are too small to be distinguishable.
We then tested the effect of \({p}_{distractor}\) and \({\sigma }_{sensory}\) on participants’ accuracy. Although we did not find a relationship between accuracy and \({p}_{distractor}\) (Fig. 7C; Table 1 – two-factor regression model), we found a strong, approximately linear relationship between accuracy and \({\sigma }_{sensory}\) (Spearman, ρ = -0.96, p = 3.15e-31). Taken together, these data demonstrate that participants had a similar measure of stimulus uncertainty but differed in their sensory uncertainty and estimate of relevance. Further, although participants’ category decisions were shaped by both their estimate of the probability of distractors and their sensory uncertainty, much of the difference in their accuracy may be driven by differences in their ability to distinguish between tone frequencies.
Next, we tested whether listeners differentially weighed tones based on their position in the sequence when making their category decision. To test this idea, we developed an alternative Bayesian model with a nonuniform psychophysical kernel, in which we replaced \({p}_{distractor}\) with three tone-position-dependent versions of \({p}_{distractor}\); note that a higher \({p}_{distractor}\) at a position implies that tones at that position are more often treated as distractors, i.e., weighed less. We found that participants weighed the first two tone positions (\({p}_{distracto{r}_{position1}}\): Mdn = 0.7, IQR = 0.24; \({p}_{distracto{r}_{position2}}\): Mdn = 0.7, IQR = 0.28) equally but slightly more than the third tone position (\({p}_{distracto{r}_{position3}}\): Mdn = 0.85, IQR = 0.17; Fig S6G; Table 1 – one-way rmANOVA, post hoc two-sided Wilcoxon signed-rank test) when making their category decision. Additionally, values for \({p}_{distracto{r}_{position3}}\), but not \({p}_{distracto{r}_{position1}}\) and \({p}_{distracto{r}_{position2}}\), were statistically different from \({p}_{distractor}\) (Fig S6G; Mdn = 0.59, IQR = 0.43; Table 1 – two-sided Wilcoxon signed-rank test). The influence of all three position-dependent weights on sigmoidicity and accuracy was consistent with results (Figs. S6A-F; Table 1 – Spearman) from the full Bayesian model (Figs. 7B, C), in that both \({p}_{distractor}\) and \({\sigma }_{sensory}\) were correlated with sigmoidicity, whereas \({\sigma }_{sensory}\), but not \({p}_{distractor}\), was correlated with accuracy. The finding that the third tone position is discounted relative to the first two is also supported by results from the simpler GLM (Fig. S6H; Table 1 – one-way rmANOVA, post hoc two-sided Wilcoxon signed-rank test). We found that this alternative tone-position-dependent Bayesian model outperformed the position-independent model, likely because it captured this discounting of the tone in the third position (two-sided Wilcoxon signed-rank test on BIC, T = 0, p = 7.55e-11).
Overall, these data show that participants may have used information from all three tone positions but weighed the third tone somewhat less than the first two.
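As an illustration of how position-dependent relevance estimates could enter the decision, the sketch below scores a three-tone sequence under each category with a per-tone mixture of a uniform distractor density and the category Gaussian. The decision rule, the example parameter values, and the use of continuous densities (instead of the task’s 30-value frequency grid) are simplifying assumptions, not the fitted model:

```python
import numpy as np

# Generative parameters from the task (log10 Hz):
MU_L, MU_H, SIGMA = 2.55, 2.85, 0.1
LOG_RANGE = np.log10(3000) - np.log10(90)   # support of the distractor uniform

def gauss_pdf(x, mu, sigma):
    """Gaussian density, used here as the signal-tone distribution."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def category_likelihood(tones, mu, p_distractor):
    """Likelihood of a three-tone sequence under one category: each tone
    is a mixture of a uniform distractor (with a position-dependent
    probability) and the category's signal Gaussian."""
    tones, p_d = np.asarray(tones), np.asarray(p_distractor)
    per_tone = p_d / LOG_RANGE + (1 - p_d) * gauss_pdf(tones, mu, SIGMA)
    return float(per_tone.prod())

# Position-dependent relevance estimates (illustrative values near the
# reported medians): the third tone is discounted the most.
p_d = [0.7, 0.7, 0.85]

tones = [2.82, 2.88, 2.40]   # two tones near the high mean, one low outlier
L_high = category_likelihood(tones, MU_H, p_d)
L_low = category_likelihood(tones, MU_L, p_d)
print("choose High" if L_high > L_low else "choose Low")   # -> choose High
```

Because the outlier lands at the heavily discounted third position, the two tones near the high mean dominate and the sketch chooses the High category.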
Using this differential weighting of the tone positions, we reconstructed the psychometric curves (Fig. 8) for the representative participants shown in Fig. 5. This reconstruction accounts for a participant’s accumulation of evidence across the three tone positions. Interestingly, although the full Bayesian model was simpler, both models adequately fit the experimental psychometric curves, and for both models the underlying \({p}_{distractor}\) values were higher than the true experimental value of 0.3. Participants thus tended to discount tones more than the statistics of the task would require.
Psychometric curves for representative participants fitted using the tone-position-dependent Bayesian model. (A-C) Psychometric curves (black lines) and model fits of the tone-position-dependent Bayesian model (orange lines) for the same three example participants as in Fig. 5. The x-axes of the psychometric curves span the frequency range of the stimulus tones, and the y-axes denote the mean probability of a \(\widehat{\text{High}}\) category choice response. Error bars: standard error of the mean. (A) Example participant with high task accuracy on trials with either one or two distractor tones. (B) Example participant with high task accuracy on trials with one distractor and poor accuracy on trials with two distractors. (C) Example participant with poor accuracy on trials with either one or two distractors.
Participants rely on past statistics of category probabilities
Having established how stimulus uncertainty, sensory uncertainty and stimulus relevance shape auditory categorization, we next asked whether and how participants’ learning of the past statistics of the category probabilities informed their current decision. We first analyzed the influence of short-term statistics by computing, for unbiased sessions, the mean probability of a \(\widehat{\text{High}}\) category choice response on a given trial (\(t\)) conditioned on the category of the previous trial (\(t-1\)): \(\text{p}(\widehat{{\text{H}}_{t}}|{\text{H}}_{t-1})\) and \(\text{p}\left(\widehat{{\text{H}}_{t}}|{\text{L}}_{t-1}\right)\) (Fig. 9A). We found that the difference between the two curves was quite variable across participants, suggesting that although some participants’ decisions were substantially affected by the short-term history of category probabilities (Figs. 9C, D), others’ decisions were not (Fig. 9B). Overall, across the population, the category of the previous trial (i.e., short-term stimulus statistics) significantly influenced participants’ choices on the current trial (Fig. 9E; \({\Delta }_{\text{short}-\text{term}}:\) difference between \(\text{p}(\widehat{{\text{H}}_{t}}|{\text{H}}_{t-1})\) and \(\text{p}\left(\widehat{{\text{H}}_{t}}|{\text{L}}_{t-1}\right)\) at the central frequency of 511 Hz; central frequency: the mid-point of the experimental frequency range in log space; two-sided Wilcoxon signed-rank test on \({\Delta }_{\text{short}-\text{term}}\), Z = 6.04, p = 1.58e-9). Moreover, this influence was restricted to the category of the previous trial and did not substantially depend on whether the participant had responded accurately to that trial (Fig S7; Table 1 – two-sided Wilcoxon signed-rank test on \({\Delta }_{\text{CorrectVsIncorrect}}\)). Combined, these data confirm the presence of category-driven short-term learning in decision-making.
To a varying degree, participants incorporated short-term statistics of category probabilities in their decisions. (A) Schematic showing the effect of learning short-term statistics. (B-D) Average psychometric curves (solid lines; light blue: previous trial low, red: previous trial high) and the adapted full Bayesian model fits (same colors, dashed lines) conditioned on the previous trial’s category type for the same three participants as in Fig. 5 in the unbiased session. Error bars: standard error of the mean. \({\Delta }_{short-term}\), the difference between \(p(\widehat{High})\) for the two curves computed at the central tone frequency of 511 Hz: (B) \(\sim\) 0.1, (C) \(\sim\) 0.22 and (D) \(\sim\) 0.32. (E) Corresponding psychometric curves averaged across all participants in the unbiased session; \({\Delta }_{short-term}\): \(\sim\) 0.12. Statistics: two-sided Wilcoxon signed-rank test on \({\Delta }_{short-term}\). In panel (E), †p < 0.05; *p < 0.01, **p < 0.001, ***p < 0.0001.
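The conditioning analysis above can be sketched on synthetic data (the trial-generating probabilities below are assumptions chosen only to mimic a mild short-term bias, not a participant’s actual data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic session (an assumption standing in for one participant):
# the response leans toward the current trial's category but is nudged
# toward the PREVIOUS trial's category, mimicking short-term learning.
n = 2000
category = rng.integers(0, 2, size=n)       # 0 = Low, 1 = High
response = np.zeros(n, dtype=int)
response[0] = category[0]
for t in range(1, n):
    p_high = 0.1 + 0.7 * category[t] + 0.1 * category[t - 1]
    response[t] = int(rng.random() < p_high)

# Condition trial t's choice on trial t-1's category:
prev, resp = category[:-1], response[1:]
p_given_high = resp[prev == 1].mean()       # p(H_hat_t | H_{t-1})
p_given_low = resp[prev == 0].mean()        # p(H_hat_t | L_{t-1})
delta_short_term = p_given_high - p_given_low
print(f"delta_short_term = {delta_short_term:.3f}")
```

With these generating probabilities the expected \({\Delta }_{short-term}\) is 0.1; repeating the split at each tone frequency, rather than pooled over trials, yields the pair of conditioned psychometric curves shown in the figure.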
In conjunction with short-term learning, participants may also use long-term learning when the trial categories are biased toward low or high over a longer timescale. To test whether participants were affected by short- and/or long-term learning, in separate behavioral sessions we changed the probability of the low-category trials. Specifically, \({p}_{L}=0.7\) in biased low sessions and \({p}_{L}=0.3\) in biased high sessions (Fig. 10A; participants: N = 53 in biased low and N = 48 jointly in biased low and biased high). Anecdotally, individual participants with high values of \({p}_{distractor}\) did not use either type of learning when making decisions (Fig. 10B; Fig S8A). However, participants with lower values of \({p}_{distractor}\) appeared to use previous trial statistics to varying degrees (Figs. 10C, D). Some participants used both short- and long-term learning (Fig. 10C; Fig S8B), whereas others primarily used long-term learning (Fig. 10D; Fig S8C). Overall, in the population, there was a clear effect of both the session type and the last trial on category choice (Fig. 10E, Figs S7-9; \({\Delta }_{\text{long}-\text{term}}\): difference between psychometric curves from biased low and biased high sessions at the central frequency; population statistics: two-sided Wilcoxon signed-rank test on \({\Delta }_{\text{long}-\text{term}}\), Z = 8.63, p = 6.34e-18; two-sided Wilcoxon signed-rank test on \({\Delta }_{\text{short}-\text{term}}\), biased low: Z = 5.94, p = 2.91e-9; biased high: Z = 5.94, p = 2.91e-9). Thus, data across the three sessions – unbiased, biased low and biased high – suggest that both short- and long-term learning influenced participants’ decision-making, presumably depending on their relevance estimates.
To a varying degree, participants’ decisions were affected by long-term statistics of category probabilities. (A) Example stimuli from unbiased, biased low, and biased high trials. Individual tones are color-coded according to their underlying distribution as shown in Fig. 1B. (B-D) Average psychometric curves for three participants in the unbiased (black), biased low (light blue) and biased high (red) sessions. Respective model fits are shown using dashed lines; (B, C) were fit using the full Bayesian model, and (D) was fit using the random-guess model for the biased sessions and the full Bayesian model for the unbiased session. Error bars: standard error of the mean. Curves for the biased sessions are evaluated using balanced datasets (see Methods). Curves for the unbiased trials are similar to the black lines in Figs. 5B-D. \({\Delta }_{long-term}\), the difference between \(p(\widehat{High})\) for biased low and biased high trials computed at the central tone frequency of 511 Hz: (B) \(\sim 0.06\), (C) \(\sim 0.35\) and (D) \(\sim 0.51\). (E) Psychometric curves averaged across all participants; \({\Delta }_{long-term}\): \(\sim\) 0.2. Statistics: two-sided Wilcoxon signed-rank test on \({\Delta }_{long-term}\). In panel (E), †p < 0.05; *p < 0.01, **p < 0.001, ***p < 0.0001.
To further quantify how learning modulates category decision-making, we adapted the full Bayesian model to incorporate both a short-term (exponentially weighted: \({W}_{1}{e}^{-t/\tau }\)) and a long-term (constant: \({W}_{constant}\)) component (values of \({W}_{1}\) and \({W}_{constant}\) near 0 imply no learning, whereas values near 1 imply strong learning; \(\tau\) is a time constant that captures the number of preceding trials that a participant may have used to learn the short-term stimulus statistics). This adapted model thus combined the effects of sensory uncertainty, stimulus relevance, stimulus uncertainty, and learning of long- and short-term stimulus statistics.
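A minimal sketch of how such a prior could combine the two components (the mixing rule and parameter values are illustrative assumptions, not the paper’s exact update equation):

```python
import numpy as np

def prior_high(prev_categories, w1, tau, w_constant, session_p_high):
    """Illustrative category prior combining long- and short-term learning
    (an assumption, not the fitted model's exact equation): a constant term
    with weight W_constant pulls the prior toward the session's bias, and
    the trial k steps back contributes with weight W1 * exp(-k / tau)."""
    p = 0.5 + w_constant * (session_p_high - 0.5)
    # prev_categories is ordered oldest -> newest; k = 1 is the last trial
    for k, c in enumerate(reversed(prev_categories), start=1):
        p += w1 * np.exp(-k / tau) * (0.5 if c == 1 else -0.5)
    return float(np.clip(p, 0.0, 1.0))

# With tau ~ 0.63 (the median fit), the immediately preceding trial
# dominates the short-term component.
w1, tau, w_const = 0.4, 0.63, 0.2
after_high = prior_high([0, 0, 1], w1, tau, w_const, session_p_high=0.5)
after_low = prior_high([0, 0, 0], w1, tau, w_const, session_p_high=0.5)
print(f"prior after High: {after_high:.3f}, after Low: {after_low:.3f}")
```

With \(\tau \sim\) 0.63, the weight of the last trial (\({W}_{1}{e}^{-1/\tau }\)) exceeds the combined weight of all earlier trials, reproducing the observed one-trial-back effect.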
Using this framework, we asked how participants’ measure of stimulus relevance and their sensory uncertainty modulated their learning of stimulus statistics. We first tested whether participants accounted for stimulus relevance when categorizing trials during biased low and high sessions. We found that the performance of all but two participants was best fit by the full Bayesian model (Fig S10). These two outlier participants primarily used their learned prior, and their performance was thus best fit by the random-guess model (Figs S10A, B, D, E). These data reinforce our previous finding that stimulus relevance is central to categorization.
Next, we assessed the degree to which participants learned the short-term statistics of trials by fitting the adapted version of the full Bayesian model to data from the unbiased session. We found that nearly all participants demonstrated significant short-term learning (Fig. 11A; \({W}_{1}>0\) for N = 55). However, there was substantial variability across participants: for some, \({W}_{1}\sim 0\), whereas for others it was \(\sim 1\). We observed similar heterogeneity in the learning time constant, which was inversely correlated with participants’ accuracy (Fig. 11B; Spearman, ρ = -0.27, p = 0.042). However, across the population, the average time constant (\(\tau\)) in the unbiased session was \(\sim\) 1 trial (Mdn: 0.63 trial, IQR: 0.44; Figs. 11A, B). The pattern of \({W}_{1}\) (Fig. 11C) and \(\tau\) (Fig S11; Table 1 – Spearman correlation with accuracy) was similar in the biased sessions. Jointly, these results indicate that although most participants showed signs of short-term learning when making category decisions, this learning was largely restricted to the immediately preceding trial.
Participants’ subjective estimate of distractor probability varies inversely with their learning of stimulus statistics when making category decisions. (A) On unbiased trials, nearly all participants (black, N = 55) exhibited short-term learning (\({W}_{1}\) > 0). (B) Their accuracy was inversely correlated with the time constant (\(\tau\)) for such learning. Further, the median number of previous trials over which participants retained information was 0.63, indicating that participants mostly retained information from the immediately preceding trial. (C) On biased trials (biased low: light blue, N = 53; biased high: red, N = 48), the effect of short- and long-term learning varied substantially across participants. Across all three sessions, \({p}_{distractor}\) was inversely correlated with (D, E) short-term learning given by \({W}_{1}{e}^{-1/\tau }\) and in the biased sessions, it was also inversely related to (F) long-term learning captured using \({W}_{constant}\). All panels, data point: one participant; †p < 0.05; *p < 0.01, **p < 0.001, ***p < 0.0001; Statistics: Spearman correlation; error bars: standard error of the mean.
Similar to short-term learning, we also expected some participants to exhibit effects of long-term learning. Indeed, most participants learned the long-term trial statistics (Fig. 11C), but this learning was not correlated with their short-term learning (Spearman; Table 1). Thus, participants appear to use both the bias in the session and their knowledge of previous trials, albeit to different degrees, when making decisions.
Next, we probed how these two learning mechanisms relate to participants’ estimated probability of distractors. We found that the weight of the preceding trial (\({W}_{1}{e}^{-1/\tau }\)) was inversely correlated with \({p}_{distractor}\) for all three sessions (Figs. 11D, E; Spearman, unbiased: ρ = -0.36, p = 0.0069, biased low and high: ρ = -0.47, p = 1e-6). Similarly, long-term bias (\({W}_{constant}\)) was also inversely correlated with \({p}_{distractor}\) (Fig. 11F; Spearman, biased low and high: ρ = -0.25, p = 0.013). This indicates that participants’ measure of relevance uncertainty was inversely correlated with their stimulus-category knowledge derived over both short and long timescales. This suggests that participants who are poor at differentiating between signal and distractor tones likely rely on other sources of information, such as past knowledge, when making category choices.
Finally, we asked how the various parameters that drive performance (sensory uncertainty, stimulus relevance, stimulus uncertainty, and learning) relate to task accuracy. We found that sensory uncertainty was a strong predictor of accuracy, accounting for much of the variance (Figs. 7C, S12; Spearman, biased low and high: ρ = -0.75, p = 2.38e-19). Thus, the main factor determining how proficiently people categorized sounds could be interpreted as how well they perceived the different tone frequencies. These results support the basic assumption of Bayesian models that sensory uncertainty is a fundamental factor in decision-making.
Discussion
It has long been known that humans and other animals can isolate signals of interest from the background in noisy, crowded environments. When presented with two competing auditory streams, participants’ neural responses to the task-relevant stimuli are enhanced53,54, but their processing of the irrelevant distractor stimuli is mediated by attention55,56. Visual spatial and feature attention affects the neural representation of task-irrelevant sensory information57,58. Furthermore, because such information is often incompletely suppressed, the presence of distractors impairs human decision-making behavior15. However, much of this work has studied the effects of relevance in isolation. When making decisions in noisy, uncertain environments, humans often incorporate multiple sources of information. One such source is the long-term sensory statistics of the stimuli of interest30,59,60,61 – studies have found that sensitivity to these statistics introduces observable biases in perception. However, in addition to tuning to long-term regularities, our sensory representations often need to rapidly adapt to short-term changes62. Indeed, such adaptation underlies context-dependent speech perception and categorization27. In our work, we investigated this complexity in perceptual decision-making by testing how participants’ sensory uncertainty, measure of stimulus relevance, stimulus uncertainty, and learning interacted to shape their auditory categorization behavior.
We found that participants differentially combined these factors when making category decisions. Specifically, when categorizing tone sequences consisting of both category-irrelevant (distractors) and relevant tones, participants weighed individual tones by their task relevance (Figs. 2,3; Figs S1-4,10). Additionally, nearly all participants had similar stimulus uncertainty but diverged in their sensory uncertainty and their estimates of stimulus relevance (Fig S5). In fact, participants’ psychometric curves and category choices were jointly determined by both their sensory uncertainty and their subjective estimate of stimulus relevance, whereas their accuracy was mostly driven by their sensory uncertainty (Fig. 7). We also observed that participants’ prior expectations of stimulus category were affected by their knowledge of stimulus statistics over both short and long timescales, and this effect was inversely correlated with participants’ estimated probability of distractors (Figs. 9,10,11; Fig S7-10).
Flexibility of experimental paradigm
Compared to previous work30,63, our experimental paradigm is unique in that we could modulate the multiple factors of sensory uncertainty, stimulus relevance, stimulus uncertainty and learning all within the same stimulus sequence. For example, because our stimulus was a three-tone sequence and because we could manipulate the number of task-irrelevant distractors (Fig. 1), we could examine not only how participants accounted for relevance during categorization but also how their sensory uncertainty might have affected their associated measures of relevance (Figs. 2,5,6,7; Fig S1). Additionally, although humans use both short- and long-term knowledge to efficiently make decisions26,33, most studies probing the influence of learned expectations on categorization have largely examined either short- or long-term effects without testing for their interaction64,65,66. Because in part of our experiment we biased the tone sequences toward either the low or the high category, we could investigate how participants’ reliance on long-term stimulus statistics interacted with their learning of short-term statistics, i.e., the effect of the previous trial’s category (Fig. 11). Thus, our experimental setup allowed us to simultaneously probe multiple factors that often underlie category decision-making. However, a potential shortcoming of our experimental design is that we did not vary stimulus uncertainty. Future studies could examine how this parameter interacts with all the other behavioral factors when making category decisions. In addition, we treat stimulus uncertainty as an estimate that the participants make about the stimulus distribution. However, it is possible that some participants do not explicitly separate the estimates of the underlying stimulus distributions and internal noise.
Bayesian approach to study variability and interplay of relevance and uncertainty
To quantify the interplay between all these factors, we leveraged a Bayesian model (Figs. 1,5,6,7,8,9,10,11). Bayesian models are powerful tools that have been successfully used to study human behavior in sensorimotor learning12,67, sensory perception25,68, and categorization30,42. Our Bayesian analysis allowed us to characterize how individual participants may vary in their categorization behaviors. Across all three sessions, performance of nearly all participants was best captured by the full Bayesian model (Figs. 6,11; Figs S4,10). However, in the unbiased session, a few participants were equally well fit by both the full Bayesian and the no-distractor models, suggesting that they did not take relevance into account when making category decisions. In the biased sessions, two participants who relied only on long-term stimulus statistics to make their decisions were best captured by the random-guess model (Fig S10B, E).
In addition to this heterogeneity across strategies, we found that even among the participants best captured by the full Bayesian model, there was substantial variability (Figs. 7,11; Figs S2-6,8) resulting from participants’ sensory uncertainty, their subjective measure of relevance, and their degree of short- and long-term learning. For instance, by modeling sensory uncertainty using the parameter \({\sigma }_{sensory}\), we found that although, on average, the value of participants’ sensory uncertainty is consistent with previous literature30, some participants have low sensory uncertainty and high task accuracy, whereas others have substantially higher sensory uncertainty and poorer task accuracy (Fig. 7). This variability also extended to participants’ measure of stimulus relevance and their learning of stimulus statistics; in the latter, we observed that not all individuals used their prior information. Overall, these results illustrate the diversity of the human perceptual decision-making process and are consistent with prior studies that have elaborated upon individual differences in performance across several speech and non-speech tasks69,70. Additionally, because our fitting procedures for the Bayesian models suggest that the participants are likely veridical about the stimulus features, their variability in categorization behavior mostly results from the cumulative effects of internal processes such as determination of stimulus relevance, sensory uncertainty and reliance on short- and long-term learning. This finding is consistent with previous work that attributes individual differences in the cocktail-party effect to stimulus-independent, internal stochastic processes71,72.
Future work could use similar Bayesian approaches to probe whether, in such auditory categorization tasks, participants use the same strategy throughout the experiment or switch between different strategies73 based on the complexity of the trial. Further, given the interdependence of stimulus relevance and strategy, our analysis is likely a simplification of human behavior, and relevance may be a time-varying property of participants. This contrasts with sensory uncertainty, which is likely a static parameter intrinsic to a given participant.
Learning
The Bayesian analysis also allowed us to explore how the different forms of uncertainty – sensory uncertainty, stimulus relevance and stimulus uncertainty – interact with learning of stimulus statistics during categorization. We found that both short-term and long-term learning biased participants’ behavior in a manner consistent with previous studies27,30. Moreover, these effects were independent of each other (Fig. 11)60, but both were inversely correlated with the relevance parameter \({p}_{distractor}\). In other words, when making category decisions, participants combined their uncertainty about stimulus relevance with other sources of information74. Additionally, because the influence of long-term learning in the experiment grew with time (Fig S9), participants may have adapted to the underlying stimulus statistics. Future work should investigate the neural mechanisms that drive the dynamics of this adaptation, over both short and long timescales, and how such adaptation may be altered by the presence of distractors.
Future directions
The interplay between multiple factors might also have implications for the broader understanding of how humans navigate real-life crowded noisy environments. For example, spatial cues75,76 and attention77,78,79 are integral to parsing multiple sound sources, but it is not yet well understood how humans can effortlessly identify relevant sounds in these noisy conditions. In our study, participants categorized three-tone sequences into high or low categories such that their category decisions were not greatly affected by tone positions (Fig S6). However, real-life stimuli such as speech80 and musical chords have rich information both within and across sequences. Whether such structure can aid relevance discrimination remains untested. Furthermore, our results indicate that participants’ estimates of stimulus relevance varied inversely with their learning of stimulus statistics over different time scales. Future work should explore the role of relevance in real-life scenarios and whether prior knowledge of one’s environment may detract from identification of relevant sounds. By covering many aspects of the complex process of auditory category learning, our approach may also open new ways of studying the neural basis of audition. Short- and long-term learning81,82,83,84,85, uncertainty in relevance of stimuli57, uncertainty in category of trials86 and sensory uncertainty may all be captured by distinct circuit elements and brain regions87,88. As such, experiments that vary these factors in a model-driven framework can enable new approaches towards understanding neural computation in the auditory system.
Methods
Ethics statement
The experimental protocol was approved by the Institutional Review Board of the University of Pennsylvania. All procedures were carried out in accordance with the IRB guidelines. All participants took part voluntarily and provided written informed consent before participating.
Human psychophysics apparatus
We used the crowdsourcing online platform Prolific to recruit participants. We collected participants’ consent and demographic data using Qualtrics and conducted all experiments using Pavlovia. All subjects reported having normal or corrected-to-normal hearing. Subjects were required to use a laptop or desktop computer to complete the experiment. Before continuing with the experiment, each participant had to pass a ‘headphone-check’ task that helped ensure that they were wearing headphones or earphones as instructed89. This headphone check required participants to judge which of three pure tones was the quietest and is designed to screen out non-headphone users by exploiting phase cancellation, which alters the tones’ relative loudness over loudspeakers.
Stimulus creation and task design
Human participants performed an auditory categorization task in which they reported the frequency category (\(\widehat{\text{High}}\) or \(\widehat{\text{Low}}\)) of a three-tone sequence (each tone is a sinusoid; tone duration: 300 ms during training, 280 ms during testing; inter-tone interval: 175 ms during training, 165 ms during testing; 44,100 Hz sampling rate; 20 ms on- and off-ramps). Longer durations and intervals were used during training to give the listener more information about the task; once participants had learned the task structure, these parameters were shortened to limit overall experiment duration. Each of the three tones \(v\) could probabilistically be either signal or distractor; that is, a signal tone was relevant to the category membership of a sequence, whereas a distractor tone was irrelevant. The signal frequency probability distributions were Gaussian (\({p}_{G}\); Eqs. 1a, b) in log(Hz) space: the mean of the low-frequency distribution (\({\mu }_{L}\)) was 2.55, the mean of the high-frequency distribution (\({\mu }_{H}\)) was 2.85, and the standard deviation of both (\({\sigma }_{expt}\)) was 0.1. In Hz, the mean frequencies for the low and high distributions correspond to 355 Hz and 708 Hz, respectively. The distractor frequency probability distribution was uniform (\({p}_{U}\); Eq. 1c) and overlapped with both Gaussian distributions. For the distractor distribution, frequencies ranged from a = 90 Hz to b = 3000 Hz.
Overall, a low (high) category sequence was a combination of low-frequency (high-frequency) signal tones and distractor tones. The complete frequency range of the tones was 90–3000 Hz and was uniformly sampled in log10 space such that the frequency of each tone was one of 30 possible values. Because of the discrete nature of the frequencies, the equations presented here are probabilities across the 30 possible frequency values rather than continuous probability density functions.
Participants completed three sessions of the experiment: unbiased, biased low, and biased high. Tone sequences in the unbiased sessions were equally likely to be drawn from either the high or low categories, i.e., \({p}_{L}=0.5\). Conversely, in the biased low (biased high) sessions, we overrepresented the corresponding distribution such that \({p}_{L}({p}_{H})=0.7\). Across all three sessions, the probability that each tone was signal was \({p}_{S}=0.7\) and of it being a distractor was \({p}_{D}=0.3\). This meant that, of the set of three tones, the probability that the set included exactly one distractor was 3 × 0.3 × (1 − 0.3) × (1 − 0.3) ≈ 0.44. The signal and distractor tone probabilities were chosen based on internally collected pilot data. Specifically, for \({p}_{D}\), we tested values of 0.2, 0.3 and 0.4 and chose \({p}_{D}=0.3\) based on a tradeoff between participants’ accuracy and their self-reported difficulty of the experimental task.
The generative process of the experimental stimuli can be summarized as,
where \({\varvec{v}}\) is the set of the three experimental tone frequencies \({v}_{1},{v}_{2},{v}_{3}\) (in units of log(Hz)). \({\varvec{r}}\) is the set of tone relevance indicators \({r}_{1},{r}_{2},{r}_{3}\in \{R, I\}\) such that \({r}_{i}=R\) implies that the tone \({v}_{i}\) is signal (relevant) and \({r}_{i}=I\) implies that it is a distractor (irrelevant). \(C\) is the category of the sequence (high (\(H\)) or low (\(L\))). Because the tones and their relevance are conditionally independent given a trial’s category, we can rewrite Eq. 2 as:
For example, in a high-category sequence, for a given tone \({v}_{i}\),
The priors in the generative process are \(p\left(C=L\right)={p}_{L}\) and \(p\left(C=H\right)={p}_{H}=1-{p}_{L}\).
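The generative process above can be sketched in code. The sketch below uses the stated experimental parameters (\({\mu }_{L}=2.55\), \({\mu }_{H}=2.85\), \({\sigma }_{expt}=0.1\), \({p}_{S}=0.7\), 30 frequencies log-spaced over 90–3000 Hz); snapping Gaussian draws to the nearest grid value is our assumption about how the discretization was implemented.

```python
import math
import random

# Experimental parameters from the Methods (unbiased session).
MU_L, MU_H, SIGMA = 2.55, 2.85, 0.1      # means and SD in log10(Hz)
P_LOW, P_SIGNAL = 0.5, 0.7               # category prior and p(signal)

# 30 possible tone frequencies, uniformly spaced in log10 over 90-3000 Hz.
GRID = [10 ** (math.log10(90) + i * (math.log10(3000) - math.log10(90)) / 29)
        for i in range(30)]

def sample_trial(rng: random.Random):
    """Sample (category, relevance indicators, tone frequencies in Hz)
    for one three-tone trial, following Eqs. 1-5."""
    category = 'L' if rng.random() < P_LOW else 'H'
    mu = MU_L if category == 'L' else MU_H
    relevance, tones = [], []
    for _ in range(3):
        if rng.random() < P_SIGNAL:
            # Signal tone: Gaussian in log10(Hz), snapped to the grid
            # (snapping is our assumption about the discretization).
            relevance.append('R')
            f = rng.gauss(mu, SIGMA)
            tones.append(min(GRID, key=lambda g: abs(math.log10(g) - f)))
        else:
            # Distractor tone: uniform over the 30 grid frequencies.
            relevance.append('I')
            tones.append(rng.choice(GRID))
    return category, relevance, tones
```

A biased session would simply change `P_LOW` to 0.7 (biased low) or 0.3 (biased high).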
Participants
In total, 48 healthy adults (22 female; mean ± SD age: 26.32 ± 7.17) successfully completed the full study. Of the 169 adults who signed up for the study, 32 declined to participate after signing the consent forms, 72 failed the ‘headphone check’, and 9 others either completed only one of the biased sessions or did not exceed 70% accuracy on the trials without distractors. We tested for this accuracy threshold during a participant’s first session, which ensured that participants understood the basic task of categorizing three-tone sequences into two categories. All participants self-reported normal hearing or the use of hearing aids.
We had 3 study variations to randomly counterbalance the order of the session type across participants: (1) unbiased, biased low, and biased high (N = 16); (2) biased high, biased low, and unbiased (N = 17); and (3) biased low, unbiased, and biased high (N = 15). An additional 8 adults (3 female; mean ± SD age: 32 ± 13.95) only partially completed the study; their data supplemented the unbiased experiment. Data from 5 of these participants also supplemented the biased low experiments.
We collected pilot data for 9 participants and fit the full Bayesian and no-distractor models to these data. We computed the effect size based on the negative log-likelihood of the model fits and, using a power value of 0.2, calculated the minimum required sample size for our experiment with the function pingouin.power_ttest (Python). The minimum required sample size was n = 16.04, which indicated that our experimental sample size of 56 participants in the unbiased session and 48 participants across both of the biased sessions was sufficient to capture the effect of distractors on participant behavior. A larger cohort also helped ensure that we could study variability in the underlying factors across real-world participants. Additionally, the cohort size is similar to that of other comparable studies30,41,90,91,92.
Experimental setup
During the training part of the task, participants were presented with 35 total trials. First, they were presented with 16 easy trials in which all three tones were signal and their frequencies were drawn from the mean value \(\pm\) 1 \(\sigma\) of either the low- or high-frequency Gaussian (signal) distributions. They were then presented with 10 more difficult trials in which two of the three tones were signal: two tones were drawn from the mean \(\pm\) 1 \(\sigma\) of either the low- or high-frequency Gaussians, whereas the third was a distractor drawn from the extremes of the uniform distractor distribution (\(<{\mu }_{L}-4\sigma\) or \(>{\mu }_{H}+4\sigma\)). Finally, participants performed 9 very difficult trials in which only one of the three tones was signal and the other two were distractors. Before the ‘more difficult’ and ‘very difficult’ training sets, we informed the participants that the trials in those sets would include one or two distractor tones, respectively. Thus, the progression of the ‘easy’, ‘more difficult’ and ‘very difficult’ trials allowed us to systematically introduce the distractor tones to the participants. Participants were instructed to ignore the distractor tones and to base their category decision only on the signal tones. We used the same training paradigm for all three sessions – unbiased, biased low, and biased high. However, we did not inform participants about the potentially biased nature of a session. Therefore, any information about the long-term bias was gained through exposure to the stimuli during the testing phase of the experiment.
In the testing phase, there were 600 trials in the unbiased session and 800 trials in each of the biased sessions. We randomized the trials and divided them into 4 blocks of 150 trials each for the unbiased session and 5 blocks of 160 trials each for the biased sessions. Participants could optionally take a break (maximum: 3 min) between blocks. Participants had to respond within 1.8 s after the offset of the last tone and were given visual feedback about their report after each trial. The feedback duration was 0.8 s, and participants were shown either ‘Correct’, ‘Not quite’, or ‘No response. Please respond faster’. Additionally, to ensure data quality, participants were not allowed to skip or incorrectly answer 10 trials in a row; we gave them an ‘attention warning’ if 5 consecutive trials were skipped or answered incorrectly. In the unbiased session, the median number of no-response trials was 1, and the 90th percentile was 3.8. The number of no-response trials was also low during the biased low (median: 1; 90th percentile: 7) and biased high (median: 1; 90th percentile: 11.4) sessions. We excluded the no-response trials when computing participant accuracy.
At the end of each experimental session, participants were asked for their consent to be recalled for the next session. The dates of data collection were staggered, and we conducted each subsequent session after a variable interval of 1–30 days. We could not identify any relationship between the number of days between sessions and a participant’s behavior.
Raw data analysis
Figure 2A-B: To compute the influence of each tone frequency on category choice probability \(\text{p}\left(\widehat{\text{High}}\right)\), we considered the tones in the region where \(\text{p}\left(\text{signal}\right)>\text{p}\left(\text{distractor}\right)\). For each of these tones, we calculated the average category choice when that tone was presented as a signal tone. Then, we computed the influence as the normalized correlation coefficient between these tone frequencies and the corresponding category choice probabilities. We repeated this analysis using only trials in which the tones were presented as distractor tones. We additionally tested whether the position of a tone within each trial (e.g., first, second or third) affected participants’ decisions and found that these correlations were independent of tone position (Fig S1A; Table 1, one-way repeated measures ANOVA (rmANOVA)). This analysis suggests that participants weighed all three tone positions equally within a trial and integrated information across the tones when making their category choice; therefore, our analyses could treat tones in all three positions equivalently.
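The influence measure can be illustrated with a minimal sketch: a Pearson correlation between tone frequencies and the corresponding average category-choice probabilities. The data in the usage example are hypothetical, and the exact normalization used in the paper may differ.

```python
import math

def influence(freqs, p_high):
    """Pearson correlation between tone frequencies (log Hz) and the
    average probability of choosing High when each tone was presented.
    Values near +1 indicate a strong positive influence of frequency
    on the High-category choice."""
    n = len(freqs)
    mx, my = sum(freqs) / n, sum(p_high) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(freqs, p_high))
    sx = math.sqrt(sum((x - mx) ** 2 for x in freqs))
    sy = math.sqrt(sum((y - my) ** 2 for y in p_high))
    return cov / (sx * sy)

# Hypothetical example: choice probability rises with tone frequency,
# as expected for signal tones in the high-signal region.
r_signal = influence([2.4, 2.6, 2.8, 3.0], [0.1, 0.4, 0.6, 0.9])
```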
Accuracy
We computed each participant’s accuracy by comparing their response to a given trial’s ‘true’ category, i.e., the category used to generate the trial via Eq. 2.
Computational models
Bayesian models
We assumed that participants performed Bayesian inference over a generative model when solving the categorization task: given their sensory evidence of the tones \({m}_{i}\), where \(i\in \{1,2,3\}\), participants computed the posterior probability,
where \(C\) is the category choice (\(\widehat{\text{High}}\) (\(\widehat{\text{H}}\)) or \(\widehat{\text{Low}}\) (\(\widehat{\text{L}}\))). In this equation, the likelihood \(p\left({\varvec{m}}|C\right)\) that the sequence of sensory evidence \({\varvec{m}}\) belonged to a particular category was calculated by marginalizing over a broad range of hypothesized tone frequencies \({\varvec{f}}=\{{f}_{1},{f}_{2},{f}_{3}\}\). To account for the entire frequency range of human hearing93,94, both \({\varvec{f}}\) and \({\varvec{m}}\) span \(\left[3.98, 39{,}810\right]\text{ Hz}\). In addition, we introduced a dummy variable \({\varvec{r}}=\left\{{r}_{1},{r}_{2},{r}_{3}\right\}\) that explicitly characterized whether a participant deemed each tone’s sensory evidence (\({m}_{i}\)) as ‘relevant’ (denoted by R) or ‘irrelevant’ (denoted by I) to their category decision. Thus, the likelihood of a high-category three-tone trial is,
Each hypothesized tone \({f}_{i}\) generated sensory evidence \({m}_{i}\) according to the probability density \(p\left({m}_{i}|{f}_{i}\right)\), a Gaussian characterized by the participant’s sensory uncertainty and noise in the auditory pathway (denoted by the parameter \({\sigma }_{sensory}\)). Thus,
The details of how we computed the value of \({\sigma }_{sensory}\) are given below in Methods—model fit and predictions.
Next, because the hypothesized tones and participant’s estimated relevance are conditionally independent given trial category, we can rewrite:
Additionally, because we observed that frequencies in each tone position are almost equally correlated with behavior (Fig S1A, S5), the relevance of a tone (denoted by parameter \({p}_{distractor}\)) is agnostic of its order in the three-tone sequence. Using the generative model (see Eqs. 4 and 5), we can then expand Eq. 10 as follows,
When computing the posterior in Eq. 6, \(p\left(C\right)={p}_{low}\) for category choice \(\widehat{\text{Low}}\) and \({p}_{high}\) for category choice \(\widehat{\text{High}}\). Here, \({\mu }_{low}, {\mu }_{high}, \sigma , {p}_{distractor}, \text{ and } {p}_{low}\) are the participants’ estimates of the true experimental parameters \({\mu }_{L}, {\mu }_{H}, {\sigma }_{expt}, {p}_{D}, \text{ and } {p}_{L}\) and are fitting parameters in the Bayesian models. When we fit these parameters to the participants’ data, the cost function we minimized was the negative log of the posterior, \(-\text{log}(\text{posterior})\).
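Because marginalizing each hypothesized frequency \({f}_{i}\) out of the Gaussian sensory likelihood (cf. Eq. 9) yields another Gaussian with variance \({\sigma }^{2}+{\sigma }_{sensory}^{2}\), the posterior in Eq. 6 admits a closed-form sketch. Below is a minimal version; treating the distractor term as uniform over the full hypothesized log-frequency range, and the parameter values themselves (e.g., \({\sigma }_{sensory}=0.19\), the population median), are our illustrative assumptions.

```python
import math

def gauss_pdf(x, mu, sd):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_high(m, mu_low=2.55, mu_high=2.85, sigma=0.1,
                   sigma_sensory=0.19, p_distractor=0.3, p_low=0.5):
    """p(High | sensory evidence m) for a three-tone trial, m in log10(Hz).

    Per tone, the likelihood mixes a signal term (Gaussian with variance
    sigma^2 + sigma_sensory^2, from marginalizing the hypothesized
    frequency) and a distractor term (uniform density over the assumed
    log-frequency range [log10(3.98), log10(39810)]).
    """
    sd = math.sqrt(sigma ** 2 + sigma_sensory ** 2)
    u = 1.0 / (math.log10(39810) - math.log10(3.98))  # uniform density

    def lik(mu):
        l = 1.0
        for mi in m:
            l *= p_distractor * u + (1 - p_distractor) * gauss_pdf(mi, mu, sd)
        return l

    num = (1 - p_low) * lik(mu_high)
    return num / (num + p_low * lik(mu_low))
```

Evidence at the category means drives the posterior toward the corresponding category, while evidence at the boundary (2.7 log Hz, with \({p}_{low}=0.5\)) leaves it at 0.5.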
We compared three main decision-making models in the analysis: (1) A ‘full Bayesian’ model, which assumed that participants probabilistically determined if a tone was ‘signal’ or ‘distractor’; (2) a ‘no-distractor’ model, which assumed that participants considered all of the tones to be ‘signal’; and (3) a ‘random-guess’ model, which assumed that participants considered all of the tones to be ‘distractors’ and thus responded randomly. In the ‘no-distractor’ model, \({p}_{distractor}\) = 0, while in the ‘random-guess’ model, \({p}_{distractor}\) = 1. In these three models, stimulus relevance is agnostic of tone position (Fig. 5). We also compared results from the full Bayesian model to those from an alternative model with position-dependent weights for each tone in the sequence (Fig S6).
The full Bayesian model has 6 fitting parameters – \({\mu }_{low}, {\mu }_{high}, \sigma ,{\sigma }_{sensory},{p}_{distractor},\) and \({p}_{low}\) – such that \(0<{p}_{distractor}<1\). On the other hand, in the no-distractor model, \({r}_{i}=R\) for \(i\in \{1,2,3\}\) in Eq. 11. Because this model assumes that all of the tones are ‘signal’, we can simplify the likelihood \(p\left({\varvec{f}}|H\right)\) into a product of three Gaussians. As a consequence, the no-distractor model has 5 fitting parameters – \({\mu }_{high},{\mu }_{low},\sigma ,{p}_{low}, \text{ and } {\sigma }_{sensory}\). In the random-guess model, participants categorized trials using random guesses proportional to the prior probability, and the 2 fitting parameters are \({p}_{low}\) and \({\sigma }_{sensory}\). More specifically, in this model the posterior in Eq. 6 simplifies to,
To capture change in expectations resulting from short- and long-term learning (Fig. 10), we fit an adapted full Bayesian model with a time-varying prior, such that,
Here, \(p{\left(C=L\right)}_{n}\) denotes a participant’s prior for the low-category choice decision in the \({n}^{th}\) trial. The indicator function \({\mathbb{l}}_{{t}_{C}}\) takes the value 1 if the ground truth of \({\left(n-t\right)}^{th}\) trial was low (L) and it takes the value -1 if it was high (H). \({W}_{constant}=\left|{W}_{0}-0.5\right|*2\) captures the long-term effect on participant’s expectations, \({W}_{1}\) indicates the short-term effect and \(\tau\) is the time constant governing the short-term effect.
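One form consistent with this description can be sketched as follows: a constant long-term component plus an exponentially decaying, \({W}_{1}\)-weighted sum over recent trial outcomes. The exact combination of \({W}_{0}\), \({W}_{1}\), and \(\tau\), and the clipping to a valid probability, are our assumptions; the published equation may differ in detail.

```python
import math

def prior_low(history, w0=0.55, w1=0.2, tau=1.0):
    """Sketch of a time-varying prior p(C = L)_n.

    history[-t] is +1 if trial n-t was Low and -1 if it was High
    (the indicator function in the text). w0 carries the long-term
    offset, and w1 scales an exponentially decaying sum over recent
    outcomes with time constant tau.
    """
    short = sum(math.exp(-t / tau) * history[-t]
                for t in range(1, len(history) + 1))
    p = w0 + w1 * short
    return min(max(p, 0.0), 1.0)   # clip to a valid probability
```

Under this form, a recent run of Low trials raises the Low prior on the next trial, while \(w0 \ne 0.5\) captures a session-long bias.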
Last, the model with position-dependent weights has 8 fitting parameters – \({\mu }_{low},{\mu }_{high},\sigma ,{\sigma }_{sensory},{p}_{distracto{r}_{positio{n}_{1}}}, {p}_{distracto{r}_{positio{n}_{2}}}, {p}_{distracto{r}_{positio{n}_{3}}}\) and \({p}_{low}\). \({p}_{distracto{r}_{positio{n}_{1}}}, {p}_{distracto{r}_{positio{n}_{2}}}\) and \({p}_{distracto{r}_{positio{n}_{3}}}\) are participants’ estimates of the probabilities that the first, second and third tone respectively in the sequence are distractors.
To fit each of the above models to the participants’ psychometric curves (see Figs. 6,8,9; Figs S4,5), we applied a decision threshold to the Bayesian model posterior such that,
Similar to Eq. 9, we considered that each tone frequency \({v}_{i}\) generated sensory evidence \({m}_{i}\) according to the probability density \(p\left({m}_{i}|{v}_{i}\right)\), which is a Gaussian characterized by the participant’s sensory uncertainty (denoted by the parameter \({\upsigma }_{sensory}\)).
Boundary model
We constructed a simple boundary-decision model for each participant with three decision criteria along the frequency continuum: (1) \({x}_{L}\), the boundary between low-distractor tones and low-signal tones; (2) \({x}_{C}\), the category boundary between low-signal tones and high-signal tones; and (3) \({x}_{H}\), the boundary between high-signal tones and high-distractor tones. Each tone was independently classified as a distractor (D), low signal (L) or high signal (H). The model for individual tones can be represented as,
Here, \({\widehat{c}}_{i}|{v}_{i}\) is the participant’s inferred generative category for tone i. The participant’s category report is based on the mode of the tone classifications over the three inferred tone categories \(\widehat{{\varvec{c}}}\) (a voting model),
Here, c is the participant’s inferred category for the full tone sequence. The category report probability is,
Here, y is the category report, which can be either 0 (“Low”) or 1 (“High”). If the mode of the tone classifications is “Distractor” or there is no single mode, the category report probability is 0.5.
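The classification-and-voting rule above can be sketched deterministically as follows; the boundary values in the usage example are illustrative, not fitted.

```python
from collections import Counter

def category_report_prob(tones, x_l, x_c, x_h):
    """p(report High) under the boundary/voting model.

    Each tone (in log10 Hz) is classified against the three criteria as
    distractor (D), low signal (L), or high signal (H); the report follows
    the modal class. If the mode is D, or there is no single mode, the
    model responds High with probability 0.5.
    """
    def classify(v):
        if v < x_l or v >= x_h:
            return 'D'
        return 'L' if v < x_c else 'H'

    counts = Counter(classify(v) for v in tones)
    top = counts.most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:   # tie: no single mode
        return 0.5
    return {'H': 1.0, 'L': 0.0, 'D': 0.5}[top[0][0]]

# Illustrative criteria: x_l = 2.3, x_c = 2.7, x_h = 3.2 (log10 Hz).
p = category_report_prob([2.8, 2.9, 2.75], 2.3, 2.7, 3.2)
```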
Generalized linear model
We constructed an elastic net (Lasso + second-order Tikhonov) regularized binary logistic regression model95 for each participant such that the model predictors are the indicator functions for the three tones in a trial. An indicator function \({\mathbb{l}}_{{t}_{i}}\) takes the value of 1 if tone \({t}_{i}\) is present in the trial and 0 otherwise. There are 30 predictors because a tone frequency can have 30 possible values. The model can be represented as,
Here, \(y\) is the participant’s category report. The weights \(w\) capture the influence of the different experimental tone frequencies on a participant’s decision-making process. The function \(\text{expit}(x)\) is computed as \(\frac{1}{1+\text{exp}\left(-x\right)}\).
To compute the individual influence of the three tone positions in the sequence (Fig S6H), we expand the model to have 90 predictors – 30 for each tone position.
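As a minimal sketch of the prediction step (fitting and regularization aside), assuming one fitted weight per possible tone frequency, a trial's report probability can be computed from its active indicators. Treating an indicator as "present" (rather than a count) for repeated frequencies within a trial is our reading of the indicator-function description.

```python
import math

def expit(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_p_high(trial_tones, weights, bias=0.0):
    """p(report High) from the indicator-based logistic model.

    `weights` maps each of the 30 possible frequencies to its fitted
    weight; a trial activates the indicators of the frequencies it
    contains. Frequencies absent from `weights` contribute 0.
    """
    z = bias + sum(weights.get(f, 0.0) for f in set(trial_tones))
    return expit(z)

# Hypothetical fitted weights: low frequencies pull toward Low (negative),
# high frequencies toward High (positive).
w = {355.0: -1.5, 708.0: 1.5}
p = predict_p_high([708.0, 708.0, 200.0], w)
```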
Model Fit and Predictions
Bayesian models: full Bayesian and no-distractor models
We used two approaches to fit the full Bayesian and no-distractor models to the unbiased data. First, we fit all the parameters to the participant data. However, we suspected that we may be overfitting \({\mu }_{low}, {\mu }_{high}, \sigma\) and \({p}_{distractor}\), because the parameters are redundant, and there are multiple configurations that lead to the same likelihood function (Eq. 12) for the Bayesian model. Thus, in the second approach, we considered participants to be veridical about the parameters of the Gaussian distributions (\({\mu }_{low}=2.55,\) \({\mu }_{high}=2.85\) and \(\sigma =0.1\)) and only fit the remaining parameters – \({\sigma }_{sensory}, { p}_{low}\) and for the full Bayesian model, \({p}_{distractor}\). Because both approaches had similar fits (see Figs S5G, H) and to prevent overfitting, we followed the second approach.
In both approaches, we initially fit these models to a participant’s performance using a coarse parameter grid search, which allows for systematic multi-start optimization. When applicable, we used the following grid values: \({\mu }_{low}\in \left\{2.1, 2.25, 2.4, 2.55, 2.7\right\},{ \mu }_{high}\in \left\{2.7, 2.85, 3, 3.15, 3.3\right\},\) \(\sigma \in \left\{0.05, 0.25, 0.45, 0.65, 0.85\right\}\), \({p}_{distractor}\in \left\{0.05, 0.25, 0.45, 0.65, 0.85\right\}\), and \({p}_{low}\) was constrained by each participant’s correct performance. We also assumed that \({\sigma }_{sensory}\) is participant and task dependent but not model dependent. Moreover, because the essential experimental goal was the same in both the unbiased and the biased sessions, we assumed that \({\sigma }_{sensory}\) was constant across the three session types. Thus, instead of running a separate psychophysics experiment to compute \({\sigma }_{sensory}\), we used data from the unbiased session for trials in which all three tones were drawn from the signal distributions and there were no distractor tones. We fit these data using a smaller number of model parameters – \({\mu }_{low}, {\mu }_{high}, \sigma , {\sigma }_{sensory}, \text{ and } {p}_{low}\). When the remaining 4 parameters were systematically varied, we found that each participant’s \({\sigma }_{sensory}\) was essentially constant; across the population, the median was 0.19 (10th percentile: 0.14; 90th percentile: 0.29). These values are consistent with previous findings30.
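A generic version of this coarse grid search can be sketched as follows; the quadratic cost in the usage example is a toy stand-in for the Bayesian model's negative log-likelihood.

```python
import math
from itertools import product

def grid_search(nll_fn, grids):
    """Coarse multi-start grid search: evaluate the cost function on the
    Cartesian product of per-parameter grids and return the best point,
    which can then seed a finer optimizer (the paper uses Nelder-Mead
    via scipy.optimize.minimize)."""
    best, best_nll = None, math.inf
    for point in product(*grids):
        v = nll_fn(point)
        if v < best_nll:
            best, best_nll = point, v
    return best, best_nll

# Usage: toy cost with minimum at (0.45, 0.25), searched on the same
# 5-point grids as sigma and p_distractor above.
grids = [[0.05, 0.25, 0.45, 0.65, 0.85]] * 2
best, val = grid_search(lambda p: (p[0] - 0.45) ** 2 + (p[1] - 0.25) ** 2,
                        grids)
```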
Next, as for the unbiased session, we fit data from the two biased sessions using the second fitting approach. We tested our fitting procedure by simulating 4 virtual participants (Table 2), modeled on 4 actual participants from our dataset as a consistency check on performance accuracy. Additionally, for the virtual participants, we computed the confidence intervals on their fitted parameters by simulating behavior 10 times and following the bootstrapping procedure detailed below.
Specifically, we generated confidence intervals by creating either 100 bootstrapped (with replacement) datasets comprising 600 trials each for the unbiased session (Fig. 6, Fig S5) or 100 balanced datasets for the biased tasks (Fig. 10, Fig S10). A balanced dataset was constructed using all trials from the underrepresented category and subsampling an equal number of trials from the overrepresented category. This balanced dataset separated the external bias in the experiment (captured by \({p}_{L}=0.7\) or \({p}_{H}=0.7\)) from the internalized expectation of the participants, as the latter can vary from 0 to 1. For example, the participant in Fig. 9A has a negligible internalized expectation of bias, whereas the participants in Figs S10C, D have high internalized expectations. These data were then fit with a finer-resolution “post-grid search” optimization routine using the Nelder-Mead method from scipy.optimize.minimize (Python). During this optimization, we allowed \({\sigma }_{sensory}\) to vary within \(\pm\) 0.02.
When fitting the adapted full Bayesian model to participant performance, we reduced the number of fitting parameters from 5 \(({p}_{distractor}, {\sigma }_{sensory}, {W}_{0}, {W}_{1}\) and \(\tau\)) to 3 by using the median values of \({p}_{distractor}\) and \({\sigma }_{sensory}\). The parameter median values were calculated from the previous full Bayesian model fits to the bootstrapped (unbiased) or the balanced (biased low and biased high) datasets. To generate confidence intervals for \({W}_{0}, {W}_{1}\) and \(\tau\), for the unbiased data, we used 10 subsets of 500 contiguous trials and for the biased data, 10 subsets of 600 contiguous trials. Additionally, we fit these parameters using the following grid values: \({W}_{0}\in \left\{\text{0.35,0.38}\dots 0.66\right\}, {W}_{1}\in \{0, 0.1\dots 1\}\) and \(\tau \in {10}^{\{-1,-0.85 \dots 0.7\}}\).
Random-guess model
Because this model has only two free parameters, \({\sigma }_{sensory}{ \text{ and }p}_{low}\), we simplified the fitting procedure to directly fit either the 100 bootstrapped datasets (unbiased session) or the 100 balanced datasets (biased sessions). We fit \({p}_{low}\) with a grid search over {0, 0.1, 0.2, … ,1}.
Full Bayesian model with position-dependent weights
We reduced the 8 parameters in this model to 4: participants were assumed to be veridical about the Gaussian parameters, and for \({\sigma }_{sensory}\) we used the corresponding median value. Thus, we only fit \({p}_{distracto{r}_{positio{n}_{1}}}\), \({p}_{distracto{r}_{positio{n}_{2}}}\), \({p}_{distracto{r}_{positio{n}_{3}}}\) and \({p}_{low}\). The fitting procedure is similar to that of the full Bayesian model.
Metrics
Across the three experimental sessions, for nearly all participants, we used the final model parameter values of the full Bayesian model to compute their psychometric-curve fits and their Bayesian model posteriors (Figs. 5,7,8,9; Figs S3,4,10). Because each trial had 3 tones, we plotted the average psychometric curve across all 3 tone positions.
Average curve = \((p\left(High|ton{e}_{positio{n}_{1}}\right)+p\left(High|ton{e}_{positio{n}_{2}}\right)+p\left(High|ton{e}_{positio{n}_{3}}\right))/3\).
We tested the goodness of fit of the different models by comparing their Bayesian Information Criterion (BIC) values; the BIC penalizes model complexity and thus favors simpler models (Fig. 6; Fig S10). We also tested the results using the Akaike Information Criterion and the corrected Akaike Information Criterion96; both were consistent with the BIC analysis.
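For reference, the two criteria are simple functions of the number of parameters \(k\), the number of trials \(n\), and the maximized log-likelihood \(\ln \widehat{L}\); lower values indicate better fits.

```python
import math

def bic(n_params, n_trials, log_lik):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L-hat).
    The k*ln(n) term penalizes model complexity, and the penalty
    grows with the number of trials."""
    return n_params * math.log(n_trials) - 2.0 * log_lik

def aic(n_params, log_lik):
    """Akaike Information Criterion: 2k - 2*ln(L-hat)."""
    return 2.0 * n_params - 2.0 * log_lik

# Illustrative comparison: at equal log-likelihood, the 2-parameter
# random-guess model is preferred over the 6-parameter full Bayesian
# model; in practice the full model's higher likelihood must outweigh
# its complexity penalty.
delta = bic(6, 600, -300.0) - bic(2, 600, -300.0)
```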
From the full Bayesian model fits, we derived a ‘sigmoidicity’ metric for each participant. This metric captures the shape of the psychometric curve and thus, in turn, the category choices of the participant. It is ~ 0 for participants who can accurately identify distractor tones as irrelevant to their category decisions (example participant in Fig. 5B) and is ~ 1 for participants whose psychometric curve resembles a step function. Specifically, for each participant, sigmoidicity is computed from their respective posterior curve (examples in Figs S3B-D, S4B, E, H) as follows:
where a distractor sensory percept is more likely to be generated by the distractor tones than by the signal tones. Thus, in this formula, the distractor sensory percepts span the ranges \(\left[{\text{log}}_{10}(3.98),\ {\mu }_{low}-2.1\sigma -\text{median}({\sigma }_{sensory})\right]\) and \(\left[{\mu }_{high}+2.1\sigma +\text{median}({\sigma }_{sensory}),\ {\text{log}}_{10}(39{,}810.72)\right]\) in log(Hz), i.e., 3.98–39,810.72 Hz overall.
Generalized Linear model (GLM)
We fit a GLM to each participant’s psychophysical performance by choosing a weighting parameter \(\alpha\) using ten-fold cross-validation. \(\alpha\) controls a convex combination of Lasso and second-order Tikhonov regularization: \(\alpha = 0\) implies only Lasso regularization, while \(\alpha = 1\) implies only Tikhonov regularization. For example, \(\alpha\) for the curve fit in Fig S3E is 0.3, whereas it is 0 for those in Figs S3F and S3G. The Tikhonov matrix (\(\tau\)) is designed such that \({\tau }_{ii}=-1, {\tau }_{i,i-1}=2\) and \({\tau }_{i,i+1}= -1\).
The individual influence of the three tone positions in the sequence (Fig S5H) is computed by taking the sum of the absolute values of the corresponding GLM weights for each of the 30 tone frequencies. The weights for the first tone position are \({w}_{1}\dots {w}_{30}\), for the second tone position are \({w}_{31}\dots {w}_{60}\) and for the third tone position are \({w}_{61}\dots {w}_{90}\). We tested our fitting procedure by simulating 6 virtual participants (Table 3). The virtual participants were based on real experimental participants.
Boundary model
We fit a boundary model to each participant’s decision behavior by first setting the constraint \({x}_{L}<{x}_{c}<{x}_{H}\). Because the stimulus set consisted of 30 unique tones, there were only 31 meaningful boundary locations. Once the order constraint was included, 4494 possible combinations of the 3 criteria remained, and the minimum negative log-likelihood was found by optimizing the model over all possible combinations.
Data and Code Availability
The de-identified data and code associated with this paper are publicly available and can be accessed at https://github.com/geffenlab/UncertaintyRelevanceLearningInterplay. We have also used Zenodo to assign a DOI to the repository: https://doi.org/10.5281/zenodo.7439086.
References
Heekeren, H. R., Marrett, S. & Ungerleider, L. G. The neural systems that mediate human perceptual decision making. Nat. Rev. Neurosci. 9, 467–479. https://doi.org/10.1038/nrn2374 (2008).
Chen, S. Y., Ross, B. H. & Murphy, G. L. Decision making under uncertain categorization. Front. Psychol. https://doi.org/10.3389/fpsyg.2014.00991 (2014).
Niwa, M. & Ditterich, J. Perceptual decisions between multiple directions of visual motion. J. Neurosci. 28, 4435–4445. https://doi.org/10.1523/JNEUROSCI.5564-07.2008 (2008).
Bushdid, C., Magnasco, M. O., Vosshall, L. B. & Keller, A. Humans can discriminate more than 1 trillion olfactory stimuli. Science https://doi.org/10.1126/science.1249168 (2014).
Garcia, S. E., Jones, P. R., Rubin, G. S. & Nardini, M. Auditory localisation biases increase with sensory uncertainty. Sci. Rep. 7, 40567. https://doi.org/10.1038/srep40567 (2017).
Heron, J., Whitaker, D. & McGraw, P. V. Sensory uncertainty governs the extent of audio-visual interaction. Vis. Res. 44, 2875–2884. https://doi.org/10.1016/j.visres.2004.07.001 (2004).
Pouget, A., Beck, J. M., Ma, W. J. & Latham, P. E. Probabilistic brains: knowns and unknowns. Nat. Neurosci. 16, 1170–1178. https://doi.org/10.1038/nn.3495 (2013).
Qamar, A. T. et al. Trial-to-trial, uncertainty-based adjustment of decision boundaries in visual categorization. Proc. Natl. Acad. Sci. 110, 20332–20337. https://doi.org/10.1073/pnas.1219756110 (2013).
Beierholm, U., Rohe, T., Ferrari, A., Stegle, O. & Noppeney, U. Using the past to estimate sensory uncertainty. eLife 9, e54172. https://doi.org/10.7554/eLife.54172 (2020).
Barthelmé, S. & Mamassian, P. Evaluation of objective uncertainty in the visual system. PLOS Comput. Biol. 5, e1000504. https://doi.org/10.1371/journal.pcbi.1000504 (2009).
Zhou, Y., Acerbi, L. & Ma, W. J. The role of sensory uncertainty in simple contour integration. PLOS Comput. Biol. 16, e1006308. https://doi.org/10.1371/journal.pcbi.1006308 (2020).
Körding, K. P. & Wolpert, D. M. Bayesian integration in sensorimotor learning. Nature 427, 244–247. https://doi.org/10.1038/nature02169 (2004).
Wei, K. & Körding, K. Relevance of error: What drives motor adaptation?. J. Neurophysiol. 101, 655–664. https://doi.org/10.1152/jn.90545.2008 (2009).
Daliri, A. & Dittman, J. Successful auditory motor adaptation requires task-relevant auditory errors. J. Neurophysiol. 122, 552–562. https://doi.org/10.1152/jn.00662.2018 (2019).
Anders, U. M., McLean, C. S., Ouyang, B. & Ditterich, J. Perceptual Decisions in the Presence of Relevant and Irrelevant Sensory Evidence. Front. Neurosci. https://doi.org/10.3389/fnins.2017.00618 (2017).
Mirza, M. B., Adams, R. A., Friston, K. & Parr, T. Introducing a Bayesian model of selective attention based on active inference. Sci. Rep. 9, 13915. https://doi.org/10.1038/s41598-019-50138-8 (2019).
Lutfi, R. A., Kistler, D. J., Callahan, M. R. & Wightman, F. L. Psychometric functions for informational masking. J. Acoust. Soc. Am. 114, 3273–3282 (2003).
Oh, E. L., Wightman, F. & Lutfi, R. A. Children’s detection of pure-tone signals with random multitone maskers. J. Acoust. Soc Am. 109, 2888–2895 (2001).
Lutfi, R. A. Informational processing of complex sound. III: Interference. J. Acoust. Soc. Am. 91, 3391–3401. https://doi.org/10.1121/1.402829 (1992).
Hansen, K., Hillenbrand, S. & Ungerleider, L. Effects of prior knowledge on decisions made under perceptual vs. categorical uncertainty. Front. Neurosci. 6 (2012).
Kok, P., Brouwer, G. J., van Gerven, M. A. J. & de Lange, F. P. Prior expectations bias sensory representations in visual cortex. J. Neurosci. 33, 16275–16284. https://doi.org/10.1523/JNEUROSCI.0742-13.2013 (2013).
Rahnev, D., Lau, H. & de Lange, F. P. Prior expectation modulates the interaction between sensory and prefrontal regions in the human brain. J. Neurosci. 31, 10741–10748. https://doi.org/10.1523/JNEUROSCI.1478-11.2011 (2011).
Kok, P., Mostert, P. & de Lange, F. P. Prior expectations induce prestimulus sensory templates. Proc. Natl. Acad. Sci. 114, 10473–10478. https://doi.org/10.1073/pnas.1705652114 (2017).
Rohenkohl, G., Cravo, A. M., Wyart, V. & Nobre, A. C. Temporal expectation improves the quality of sensory information. J. Neurosci. 32, 8424–8428. https://doi.org/10.1523/JNEUROSCI.0804-12.2012 (2012).
Stocker, A. A. & Simoncelli, E. P. Noise characteristics and prior expectations in human visual speed perception. Nat. Neurosci. 9, 578–585. https://doi.org/10.1038/nn1669 (2006).
Mendonça, A. G. et al. The impact of learning on perceptual decisions and its implication for speed-accuracy tradeoffs. Nat. Commun. 11, 2757. https://doi.org/10.1038/s41467-020-16196-7 (2020).
Holt, L. L. Speech categorization in context: Joint effects of nonspeech and speech precursors. J. Acoust. Soc. Am. 119, 4016–4026. https://doi.org/10.1121/1.2195119 (2006).
Kluender, K. R., Coady, J. A. & Kiefte, M. Sensitivity to change in perception of speech. Speech Commun. 41, 59–69. https://doi.org/10.1016/S0167-6393(02)00093-6 (2003).
Holt, L. L., Lotto, A. J. & Kluender, K. R. Neighboring spectral content influences vowel identification. J. Acoust. Soc. Am. 108, 710–722. https://doi.org/10.1121/1.429604 (2000).
Gifford, A. M., Cohen, Y. E. & Stocker, A. A. Characterizing the impact of category uncertainty on human auditory categorization behavior. PLOS Comput. Biol. 10, e1003715. https://doi.org/10.1371/journal.pcbi.1003715 (2014).
Berniker, M., Voss, M. & Kording, K. Learning priors for bayesian computations in the nervous system. PLOS ONE 5, e12686. https://doi.org/10.1371/journal.pone.0012686 (2010).
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216. https://doi.org/10.1146/annurev.neuro.24.1.1193 (2001).
Behrens, T. E. J., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221. https://doi.org/10.1038/nn1954 (2007).
Green, D. M. & Swets, J. A. Signal detection theory and psychophysics (John Wiley, 1966).
Watson, C. S. & Kidd, G. R. Studies of tone sequence perception: Effects of uncertainty, familiarity, and selective attention. Front. Biosci. 12, 3355–3366. https://doi.org/10.2741/2318 (2007).
Russ, B. E., Lee, Y.-S. & Cohen, Y. E. Neural and behavioral correlates of auditory categorization. Hear Res. 229, 204–212. https://doi.org/10.1016/j.heares.2006.10.010 (2007).
Banno, T., Lestang, J.-H. & Cohen, Y. E. Computational and neurophysiological principles underlying auditory perceptual decisions. Curr. Opin. Physiol. 18, 20–24. https://doi.org/10.1016/j.cophys.2020.07.001 (2020).
Ley, A. et al. Learning of new sound categories shapes neural response patterns in human auditory cortex. J. Neurosci. 32, 13273–13280. https://doi.org/10.1523/JNEUROSCI.0584-12.2012 (2012).
Tsunada, J. & Cohen, Y. E. Neural mechanisms of auditory categorization: From across brain areas to within local microcircuits. Front. Neurosci. https://doi.org/10.3389/fnins.2014.00161 (2014).
Cao, Y., Summerfield, C., Park, H., Giordano, B. L. & Kayser, C. Causal inference in the multisensory brain. Neuron 102, 1076-1087.e8. https://doi.org/10.1016/j.neuron.2019.03.043 (2019).
Acerbi, L., Dokka, K., Angelaki, D. E. & Ma, W. J. Bayesian comparison of explicit and implicit causal inference strategies in multisensory heading perception. PLOS Comput. Biol. 14, e1006110. https://doi.org/10.1371/journal.pcbi.1006110 (2018).
Rigoli, F., Pezzulo, G., Dolan, R. & Friston, K. A goal-directed bayesian framework for categorization. Front. Psychol. https://doi.org/10.3389/fpsyg.2017.00408 (2017).
Adler, W. T. & Ma, W. J. Comparing bayesian and non-bayesian accounts of human confidence reports. PLOS Comput. Biol. 14, e1006572. https://doi.org/10.1371/journal.pcbi.1006572 (2018).
Knill, D. C. & Pouget, A. The Bayesian brain: The role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719. https://doi.org/10.1016/j.tins.2004.10.007 (2004).
Kording, K. P. Bayesian statistics: Relevant for the brain? Curr. Opin. Neurobiol. 25, 130–133. https://doi.org/10.1016/j.conb.2014.01.003 (2014).
Piasini, E., Balasubramanian, V. & Gold, J. I. Effect of geometric complexity on intuitive model selection. In Machine Learning, Optimization, and Data Science, Lecture Notes in Computer Science (eds Nicosia, G. et al.) (Springer International Publishing, 2022).
Weiss, Y., Simoncelli, E. P. & Adelson, E. H. Motion illusions as optimal percepts. Nat. Neurosci. 5, 598–604. https://doi.org/10.1038/nn0602-858 (2002).
Rangelov, D. & Mattingley, J. B. Evidence accumulation during perceptual decision-making is sensitive to the dynamics of attentional selection. Neuroimage 220, 117093. https://doi.org/10.1016/j.neuroimage.2020.117093 (2020).
Nett, N., Bröder, A. & Frings, C. When irrelevance matters: Stimulus-response binding in decision making under uncertainty. J. Exp. Psychol. Learn. Mem. Cogn. 41, 1831–1848. https://doi.org/10.1037/xlm0000109 (2015).
de Winkel, K. N., Katliar, M. & Bülthoff, H. H. Forced fusion in multisensory heading estimation. PLOS ONE 10, e0127104. https://doi.org/10.1371/journal.pone.0127104 (2015).
Wozny, D. R., Beierholm, U. R. & Shams, L. Probability matching as a computational strategy used in perception. PLOS Comput. Biol. 6, e1000871. https://doi.org/10.1371/journal.pcbi.1000871 (2010).
Rahnev, D. & Denison, R. N. Suboptimality in perceptual decision making. Behav. Brain Sci. 41, e223. https://doi.org/10.1017/S0140525X18000936 (2018).
Ding, N. & Simon, J. Z. Emergence of neural encoding of auditory objects while listening to competing speakers. Proc. Natl. Acad. Sci. 109, 11854–11859. https://doi.org/10.1073/pnas.1205381109 (2012).
Mesgarani, N. & Chang, E. F. Selective cortical representation of attended speaker in multi-talker speech perception. Nature 485, 233–236. https://doi.org/10.1038/nature11020 (2012).
Schwartz, Z. P. & David, S. V. Focal suppression of distractor sounds by selective attention in auditory cortex. Cereb. Cortex 28, 323–339. https://doi.org/10.1093/cercor/bhx288 (2018).
Jensen, A., Merz, S., Spence, C. & Frings, C. Interference of irrelevant information in multisensory selection depends on attentional set. Atten. Percept. Psychophys. 82, 1176–1195. https://doi.org/10.3758/s13414-019-01848-8 (2020).
Chen, J., Scotti, P. S., Dowd, E. W. & Golomb, J. D. Neural representations of task-relevant and task-irrelevant features of attended objects. Prepr. bioRxiv https://doi.org/10.1101/2021.05.21.445168 (2021).
Xu, Y. The neural fate of task-irrelevant features in object-based processing. J. Neurosci. 30, 14020–14028. https://doi.org/10.1523/JNEUROSCI.3011-10.2010 (2010).
Hansen, K., Hillenbrand, S. & Ungerleider, L. Persistency of priors-induced bias in decision behavior and the fMRI signal. Front. Neurosci. https://doi.org/10.3389/fnins.2011.00029 (2011).
Roark, C. L. & Holt, L. L. Long-term priors constrain category learning in the context of short-term statistical regularities. Psychon. Bull. Rev. https://doi.org/10.3758/s13423-022-02114-z (2022).
Hansen, K. A., Hillenbrand, S. F. & Ungerleider, L. G. Human brain activity predicts individual differences in prior knowledge use during decisions. J. Cogn. Neurosci. 24, 1462–1475. https://doi.org/10.1162/jocn_a_00224 (2012).
Gekas, N., McDermott, K. C. & Mamassian, P. Disambiguating serial effects of multiple timescales. J. Vis. 19, 24. https://doi.org/10.1167/19.6.24 (2019).
Tardiff, N., Suriya-Arunroj, L., Cohen, Y. E. & Gold, J. I. Rule-based and stimulus-based cues bias auditory decisions via different computational and physiological mechanisms. PLOS Comput. Biol. 18, e1010601. https://doi.org/10.1371/journal.pcbi.1010601 (2022).
Ming, V. L. & Holt, L. L. Efficient coding in human auditory perception. J. Acoust. Soc. Am. 126, 1312–1320. https://doi.org/10.1121/1.3158939 (2009).
Smith, E. C. & Lewicki, M. S. Efficient auditory coding. Nature 439, 978–982. https://doi.org/10.1038/nature04485 (2006).
Stilp, C. E. & Kluender, K. R. Stimulus statistics change sounds from near-indiscriminable to hyperdiscriminable. PLOS ONE 11, e0161001. https://doi.org/10.1371/journal.pone.0161001 (2016).
Heald, J. B., Lengyel, M. & Wolpert, D. M. Contextual inference underlies the learning of sensorimotor repertoires. Nature 600, 489–493. https://doi.org/10.1038/s41586-021-04129-3 (2021).
Knill, D. C. & Richards, W. Perception as Bayesian Inference (Cambridge University Press, 1996).
Kidd, G. R., Watson, C. S. & Gygi, B. Individual differences in auditory abilities. J. Acoust. Soc. Am. 122, 418–435. https://doi.org/10.1121/1.2743154 (2007).
Oberfeld, D. & Klöckner-Nowotny, F. Individual differences in selective attention predict speech identification at a cocktail party. Elife 5, e16747. https://doi.org/10.7554/eLife.16747 (2016).
Lutfi, R. A., Pastore, T., Rodriguez, B., Yost, W. A. & Lee, J. Molecular analysis of individual differences in talker search at the cocktail-party. J. Acoust. Soc. Am. 152, 1804. https://doi.org/10.1121/10.0014116 (2022).
Lutfi, R. A., Rodriguez, B., Lee, J. & Pastore, T. A test of model classes accounting for individual differences in the cocktail-party effect. J. Acoust. Soc. Am. 148, 4014. https://doi.org/10.1121/10.0002961 (2020).
Ashwood, Z. C. et al. Mice alternate between discrete strategies during perceptual decision-making. Nat. Neurosci. https://doi.org/10.1038/s41593-021-01007-z (2022).
Honig, M., Ma, W. J. & Fougnie, D. Humans incorporate trial-to-trial working memory uncertainty into rewarded decisions. Proc. Natl. Acad. Sci. 117, 8391–8397. https://doi.org/10.1073/pnas.1918143117 (2020).
Ihlefeld, A. & Shinn-Cunningham, B. Spatial release from energetic and informational masking in a selective speech identification task. J. Acoust. Soc. Am. 123, 4369–4379. https://doi.org/10.1121/1.2904826 (2008).
Middlebrooks, J. C. & Waters, M. F. Spatial mechanisms for segregation of competing sounds, and a breakdown in spatial hearing. Front. Neurosci. https://doi.org/10.3389/fnins.2020.571095 (2020).
Bronkhorst, A. W. The cocktail-party problem revisited: Early processing and selection of multi-talker speech. Atten. Percept. Psychophys. 77, 1465–1487. https://doi.org/10.3758/s13414-015-0882-9 (2015).
Shinn-Cunningham, B. G. Object-based auditory and visual attention. Trends Cogn. Sci. 12, 182–186. https://doi.org/10.1016/j.tics.2008.02.003 (2008).
Wöstmann, M., Herrmann, B., Maess, B. & Obleser, J. Spatiotemporal dynamics of auditory attention synchronize with speech. Proc. Natl. Acad. Sci. 113, 3873–3878. https://doi.org/10.1073/pnas.1523357113 (2016).
Holt, L. L. & Lotto, A. J. Speech perception as categorization. Atten. Percept. Psychophys. 72, 1218–1227. https://doi.org/10.3758/APP.72.5.1218 (2010).
Freedman, D. J., Riesenhuber, M., Poggio, T. & Miller, E. K. A comparison of primate prefrontal and inferior temporal cortices during visual categorization. J. Neurosci. 23, 5235–5246. https://doi.org/10.1523/JNEUROSCI.23-12-05235.2003 (2003).
Zhong, L., Zhang, Y., Duan, C. A., Pan, J. & Xu, N. Dynamic and causal contribution of parietal circuits to perceptual decisions during category learning. Prepr. bioRxiv https://doi.org/10.1101/304071 (2018).
Zhong, L., Zhang, Y., Duan, C. A., Deng, J., Pan, J. & Xu, N. L. Causal contributions of parietal cortex to perceptual decision making during stimulus categorization. Nat. Neurosci. 22, 963–973. https://doi.org/10.1038/s41593-019-0383-6 (2019).
Akrami, A., Kopec, C. D., Diamond, M. E. & Brody, C. D. Posterior parietal cortex represents sensory history and mediates its effects on behaviour. Nature 554, 368–372. https://doi.org/10.1038/nature25510 (2018).
Bianco, R. et al. Long-term implicit memory for sequential auditory patterns in humans. eLife 9, e56073. https://doi.org/10.7554/eLife.56073 (2020).
Grinband, J., Hirsch, J. & Ferrera, V. P. A neural representation of categorization uncertainty in the human brain. Neuron 49, 757–763. https://doi.org/10.1016/j.neuron.2006.01.032 (2006).
van Bergen, R. S., Ji Ma, W., Pratte, M. S. & Jehee, J. F. M. Sensory uncertainty decoded from visual cortex predicts behavior. Nat. Neurosci. 18, 1728–1730. https://doi.org/10.1038/nn.4150 (2015).
Fetsch, C. R., Pouget, A., DeAngelis, G. C. & Angelaki, D. E. Neural correlates of reliability-based cue weighting during multisensory integration. Nat. Neurosci. 15, 146–154. https://doi.org/10.1038/nn.2983 (2012).
Woods, K. J. P., Siegel, M. H., Traer, J. & McDermott, J. H. Headphone screening to facilitate web-based auditory experiments. Atten. Percept. Psychophys. 79, 2064–2072. https://doi.org/10.3758/s13414-017-1361-2 (2017).
Bankieris, K. R., Bejjanki, V. R. & Aslin, R. N. Sensory cue-combination in the context of newly learned categories. Sci. Rep. 7, 10890. https://doi.org/10.1038/s41598-017-11341-7 (2017).
Bejjanki, V. R., Clayards, M., Knill, D. C. & Aslin, R. N. Cue integration in categorical tasks: insights from audio-visual speech perception. PLOS ONE 6, e19812. https://doi.org/10.1371/journal.pone.0019812 (2011).
McPherson, M. J. & McDermott, J. H. Diversity in pitch perception revealed by task dependence. Nat. Hum. Behav. 2, 52–66. https://doi.org/10.1038/s41562-017-0261-8 (2018).
Moller, H. & Pedersen, C. S. Hearing at low and infrasonic frequencies. Noise Health 6, 37 (2004).
Ashihara, K. Hearing thresholds for pure tones above 16 kHz. J. Acoust. Soc. Am. 122, EL52–EL57. https://doi.org/10.1121/1.2761883 (2007).
Jas, M. et al. Pyglmnet: Python implementation of elastic-net regularized generalized linear models. J. Open Sour. Softw. 5, 1959. https://doi.org/10.21105/joss.01959 (2020).
Cavanaugh, J. E. Unifying the derivations for the Akaike and corrected Akaike information criteria. Stat. Prob. Lett. 33, 201–208. https://doi.org/10.1016/S0167-7152(96)00128-9 (1997).
Acknowledgements
We thank the members of the Geffen Lab for advice and feedback on the project. We also thank Doris Dijksterhuis and Sandra Reinert for their ideas regarding data analysis. This work was supported by the NIH grant (R01NS113241) to MNG, YC, and KK. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
J.S., J.S.C., K.P.K., E.P., Y.E.C. and M.N.G. designed the study. J.S. collected the data. J.S. and J.S.C. analyzed the data. J.S., J.S.C., K.P.K., Y.E.C. and M.N.G. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sheth, J., Collina, J.S., Piasini, E. et al. The interplay of uncertainty, relevance and learning influences auditory categorization. Sci Rep 15, 3348 (2025). https://doi.org/10.1038/s41598-025-86856-5