Abstract
Neural variability, i.e., random fluctuations in neural activity, is a ubiquitous and sizable brain feature that impacts behavior. Its functional role, however, remains unclear, and neural variability is commonly viewed as a nuisance factor degrading behavioral efficiency. Using functional magnetic resonance imaging in humans and computational modeling, we show here that neural variability provides a solution to the open issue of how the brain produces efficient adaptive behavior in uncertain and changing environments without facing computational complexity problems. We found that neural variability in the medial prefrontal cortex (mPFC) enables decision-making processes in the mPFC to produce near-optimal behavior in uncertain and ever-changing environments without involving complex computations that, in such environments, rapidly become intractable. The results thus suggest that, in the same way as genetic variability contributes to adaptive evolution, neural variability contributes to efficient adaptive behavior in real-life environments.
Introduction
Variability, i.e., random fluctuations in signal processing, is a ubiquitous and sizable feature of biological systems, ranging from genetic to neural variability1,2. In the brain, neural variability has a direct impact on behavior3. Because neural variability is both substantial and preserved throughout evolution, it may be presumed to play an important functional role. Its function(s), however, remain(s) unclear, and neural variability is more commonly viewed as a nuisance factor degrading behavioral efficiency2,3. We report here behavioral and neuroimaging findings suggesting that, in contrast, neural variability plays a major role in producing efficient adaptive behavior in real-life environments featuring uncertain and changing situations.
In such environments, learning and adjusting to new situations optimally requires discounting past information in proportion to the environment's volatility, i.e., the frequency of situation changes4,5,6. Rodent, monkey, and human adaptive behavior was found to be consistent with this optimal adaptive principle, which was further associated with the dorsomedial prefrontal cortex (dmPFC), including the dorsal anterior cingulate cortex (dACC), along with the noradrenergic system4,7,8,9,10,11,12,13,14,15,16,17,18. The dmPFC has a well-documented role in guiding behavior based on internal beliefs about action-outcome contingencies (see, e.g., refs. 19,20,21,22,23). However, the neural mechanisms implementing this optimal adaptive principle remain poorly understood, notably because estimating the environment volatility involves complex inferential computations that rapidly become intractable4,6,24,25.
To clarify this issue, we hypothesized that neural variability might induce a behavioral flexibility consistent with this optimal adaptive principle without relying on complex computations. Indeed, previous studies first show that exploratory choices in uncertain and changing environments reflect computational imprecisions corrupting the learning of action-outcome contingencies26. Second, neural network models27 suggest that the variability in neuron spiking induces internal representations encoded in populations of neurons to undergo a stochastic variability obeying Weber’s law, i.e., scaling with the magnitude of changes in encoded representations28,29. When this Weber variability corrupts the formation of internal beliefs about action-outcome contingencies, computer simulations further show that the corrupted beliefs elicited nearly optimal adaptive behavior in uncertain and changing environments without relying on complex volatility inferences30. These results thus lead to the intriguing hypothesis that neural variability alone might induce the dmPFC to elicit efficient adaptive behavior by merely encoding beliefs, assuming stable environments but undergoing stochastic Weber variability.
We tested this hypothesis (named the Weber-variability model) by using human functional magnetic resonance imaging (fMRI) along with computational modeling to measure the aggregated impact of neural variability onto internal representations the brain encodes to guide behavioral choices.
Results
We scanned 22 participants while they performed a standard two-armed bandit task with binary outcomes (reward vs. no reward) within a varying-volatility environment (Fig. 1B, C; “Methods” section). In every trial, participants had to choose one of the two visually presented arms by pressing one of two response buttons. One arm led to rewards more frequently (probability η = 85% vs. 1−η = 15%), but these reward contingencies reversed unpredictably with a probability (named volatility) that, unbeknownst to participants, varied episodically and pseudo-randomly along experimental sessions (volatility levels: 5%, 7%, and 10%). Participants chose the currently more frequently rewarded arm in 72% of trials, well above chance level (50%; T-test, T21 = 20.4, P < 10−10). Participants were also sensitive to volatility: they switched their responses after non-rewarded trials more often in high- than low-volatility trials (+5% of trials; paired T-test: T21 = 2.89, P = 0.0089), indicating that, consistent with the optimal adaptive principle, they discounted past information more in high- than low-volatility episodes.
A Left: model comprising three hierarchically organized probabilistic inference levels. First level: inferences about environment latent states zt characterizing current external contingencies (stimulus-action-outcome contingencies: st-at-rt). Second level: inferences about change probabilities τt (i.e., volatility) in current latent states. Third level: inferences about the change rate ν of volatility4. Rate ν is assumed to be constant across time, but its estimate may vary across time. Middle: model comprising only the two lower inference levels: volatility τ is assumed to be constant, but its estimates may vary across time5. Right: models comprising only the lowest inference level: the environment latent state z is assumed to remain unchanged over time, but beliefs about its identity may vary over time. The Weber-variability model comprises only the first inference level, but these first-order inferences undergo computational imprecisions in agreement with Weber’s law, providing the necessary flexibility to adapt to changing environments. B Trial structure of the two-armed bandit task. In every trial, participants chose one of the two visually presented arms (square vs. circle with randomized left-right positions) by pressing the corresponding hand-held response button. 0.4–3.8 s later (jittered) and contingent upon participants’ choices, they received a visually presented reward (1 euro shown) or not (red cross) (duration: 700 ms). Inter-trial intervals were jittered from 0.5 to 3.9 s. C Reward probabilities associated with arms (15% vs. 85%) reversed with probability 0.05 (low-volatility episodes), 0.07 (middle-volatility episodes), and 0.1 (high-volatility episodes). Episode order was pseudo-randomized.
Modeling adaptive behavior
Optimal models
In this task, the optimal adaptive agent learns reward probabilities (η, 1−η) and forms state beliefs regarding how reward probabilities (η, 1−η) map onto bandits’ arms across successive trials. If the task contingencies were stationary (no reversals), these state beliefs would derive from merely registering online the outcomes (reward vs. no reward) associated with chosen arms over trials, which we name first-order inferences (“Methods” section) (Fig. 1A).
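As an illustration, such a first-order inference reduces to a plain Bayes update of the state belief that arm A is the more rewarded arm after each outcome, assuming stationary contingencies. A minimal Python sketch (function and variable names are ours; only η = 85% comes from the task):

```python
# Minimal sketch of a first-order inference: a Bayes update of the state
# belief b = P(arm A is the 85% arm) after each outcome, assuming
# stationary contingencies. Names are illustrative, not the paper's code.
ETA = 0.85  # reward probability of the better arm (from the task)

def update_belief(belief, chose_a, rewarded):
    """P(state | outcome) via Bayes rule over the two latent states."""
    # Outcome likelihood if arm A is the better arm
    p_rew_a = ETA if chose_a else 1.0 - ETA
    lik_a = p_rew_a if rewarded else 1.0 - p_rew_a
    # Outcome likelihood if arm B is the better arm (probabilities swap)
    p_rew_b = 1.0 - ETA if chose_a else ETA
    lik_b = p_rew_b if rewarded else 1.0 - p_rew_b
    return lik_a * belief / (lik_a * belief + lik_b * (1.0 - belief))

update_belief(0.5, chose_a=True, rewarded=True)  # -> 0.85
```

Starting from a flat prior, a reward on arm A immediately shifts the belief to the outcome likelihood ratio, as expected from Bayes rule.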
Because of reversals, the optimal agent needs to further discount the weight of its prior state beliefs in forming its subsequent state beliefs according to the probability of reversals, i.e., the environment volatility. The more volatile the environment is, the less the agent should rely on past information. The optimal agent infers the environment volatility from the history of action outcomes. For simplicity, the agent may assume that volatility remains constant across trials as in ref. 5 (although its successive estimates may vary). We refer to this optimal agent as the second-order volatility inference model (Fig. 1A). In the present task, however, volatility varied across trials, so that, as described in ref. 4, the truly optimal agent needs to further infer the change rate of volatility across trials to properly infer volatility trial-by-trial. We refer to this optimal agent as the third-order volatility inference model (Fig. 1A) (“Methods” section).
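The discounting principle itself can be conveyed by a one-line sketch (our illustration of the principle, not the full inference machinery): given a reversal probability τ, the prior state belief is leaked toward the uninformative value 0.5 before each trial, so higher volatility discounts past information more.

```python
# Minimal sketch of volatility-based discounting (illustrative): the state
# belief is leaked toward 0.5 in proportion to the reversal probability tau,
# so that higher volatility discounts past information more.
def discount_belief(belief, tau):
    """P(arm A is the better arm now), given the reversal probability tau."""
    return belief * (1.0 - tau) + (1.0 - belief) * tau

discount_belief(0.9, 0.1)  # -> 0.82: a confident belief is pulled toward 0.5
```

Note that a flat belief (0.5) is a fixed point of this leak, whatever the volatility, which is why only confident past beliefs are effectively discounted.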
All these inference models are deterministic: the choice and outcome history fully determines current state beliefs. Yet, the second- and third-order volatility inference models are computationally intractable. We emulated these models using a recently developed sequential Monte Carlo method based on particle filtering31,32,33,34 (“Methods” section).
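To convey the idea, here is a minimal bootstrap particle-filter sketch of such an emulation (the paper's actual method is more elaborate; names, the volatility prior range, and the resampling scheme are illustrative). Each particle carries a latent state z (whether arm A is the 85% arm) and a volatility sample tau; particles are propagated through possible reversals and reweighted by the outcome likelihood.

```python
# Illustrative bootstrap particle filter for the second-order model: each
# particle holds a latent state z and a volatility sample tau; resampling
# weights come from the outcome likelihood. All settings are assumptions.
import random

ETA = 0.85   # reward probability of the better arm (from the task)
N = 1000     # number of particles (illustrative)

def filter_step(particles, chose_a, rewarded):
    weights = []
    for p in particles:
        # Latent state may reverse with the particle's own volatility
        if random.random() < p["tau"]:
            p["z"] = not p["z"]
        # Outcome likelihood given the particle's latent state
        p_rew = ETA if (chose_a == p["z"]) else 1.0 - ETA
        weights.append(p_rew if rewarded else 1.0 - p_rew)
    # Multinomial resampling proportional to the likelihood weights
    resampled = random.choices(particles, weights=weights, k=N)
    particles = [dict(p) for p in resampled]
    belief = sum(p["z"] for p in particles) / N  # P(arm A is the better arm)
    return particles, belief

random.seed(0)
particles = [{"z": random.random() < 0.5, "tau": random.uniform(0.01, 0.2)}
             for _ in range(N)]
particles, belief = filter_step(particles, chose_a=True, rewarded=True)
```

After a single rewarded choice of arm A from a flat prior, the particle approximation of the state belief lands near the exact Bayesian posterior (0.85), up to Monte Carlo noise.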
We also investigated a varying-volatility inference model tailored to the present task, whose generative model is based on the exact but hidden step-wise structure of volatility across the protocol. Unsurprisingly, the two generic second-/third-order volatility inference models equally outperformed this tailored model in accounting for participants’ choices (Supplementary Fig. 1), and it was therefore no longer considered in subsequent analyses.
The Weber-variability model
The first-order inference model assumes stationary action-outcome contingencies and evidently leads to poor adaptive performances in changing environments. The Weber-variability model remedies this limitation by (counter-intuitively) merely assuming that belief updating in the first-order inference model undergoes computational imprecisions stemming from neural variability (Fig. 1A, “Methods” section). Indeed, in accordance with Weber’s law28,29,35, these imprecisions presumably scale with the magnitude of belief updating and, consequently, increase whenever reversals occur. The imprecisions therefore induce current state beliefs to depend less upon prior state beliefs when reversals occur more frequently or, equivalently, when the environment volatility increases, in accordance with the optimal adaptive principle.
Neural variability and, consequently, computational imprecisions are further inherently stochastic. We therefore assumed that these imprecisions induce belief updating in the first-order inference model to undergo increased variance/entropy in every trial t, modeled as a random variable \({\epsilon }_{t}\). Conforming to Weber’s law, variance increases \({\epsilon }_{t}\) are assumed to scale with the magnitude dt of belief updating:

\({\epsilon }_{t} \sim \left(\mu+\lambda {d}_{t}\right)\cdot {u}_{t}\)
where μ, λ are non-negative free parameters quantifying the constant and Weber component of computational imprecision, respectively. \({u}_{t}\) is a random variable uniformly distributed over the range [0;1] accounting for imprecision stochasticity. Thus, variance increases \({\epsilon }_{t}\) in belief updating (named Weber-variability for clarity) vary randomly between 0 and \(\mu+\lambda {d}_{t}\) in every trial t. Previous computer simulations confirmed that this model approaches optimal adaptive behavior even in varying-volatility environments30.
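For illustration, a minimal sketch of this corruption, assuming state beliefs are summarized by a Beta distribution whose variance is inflated by \({\epsilon }_{t}\) while its mean is kept fixed. This moment-matching bookkeeping is our illustrative choice, not necessarily the paper's implementation; μ and λ are set to the best-fitting group means reported in the Results.

```python
# Illustrative sketch of the Weber-variability corruption: beliefs are
# summarized by Beta(a, b); each trial adds variance
# eps_t = (mu + lam * d_t) * u_t with u_t ~ U[0, 1] and d_t the magnitude
# of belief updating. The Beta moment-matching is our assumption.
import random

MU, LAM = 0.010, 0.053  # constant and Weber components (best-fitting means)

def add_variance(a, b, eps):
    """Increase the Beta(a, b) variance by eps, keeping the mean fixed."""
    m = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    new_var = min(var + eps, m * (1.0 - m) * 0.999)  # stay a valid Beta
    s = m * (1.0 - m) / new_var - 1.0                # new concentration a+b
    return m * s, (1.0 - m) * s

def weber_corrupt(prior_mean, a, b):
    """Apply trial-t Weber variability to the freshly updated belief."""
    d_t = abs(a / (a + b) - prior_mean)         # belief-update magnitude
    eps_t = (MU + LAM * d_t) * random.random()  # stochastic component u_t
    return add_variance(a, b, eps_t)
```

Because the added variance scales with d_t, large updates (typically triggered by reversals) are corrupted more, which is precisely what makes the corrupted beliefs discount the past more in volatile episodes.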
Weber-variability \({\epsilon }_{t}\) presumably stems from neural variability. Random variable \({\epsilon }_{t}\) only quantifies the resulting computational imprecision that first-order inferences undergo. Accordingly, no additional computations (such as the magnitude dt of belief updating) are assumed to occur beyond the basic first-order inference model. Thus, the Weber-variability model is a stochastic adaptive process: stochastic imprecisions corrupt the online formation of state beliefs and carry over trials, so that the choice and outcome history does not fully determine current state beliefs.
To assess the actual stochasticity of Weber-variability, we also considered the deterministic nested variant of the Weber-variability model that discarded its stochastic component \({u}_{t}\): belief updating undergoes deterministic adjustments of variance scaling with quantity \(\left(\mu+\lambda {d}_{t}\right)\), which corresponds to the previously proposed Bayesian-surprise heuristics36. The choice and outcome history then fully determines current state beliefs. In contrast to the Weber-variability model, this Bayesian-surprise model conceptually implies that deterministic quantity \(\left(\mu+\lambda {d}_{t}\right)\) is explicitly computed to algorithmically adjust state beliefs from first-order inferences to approximate the optimal adaptive principle.
Reinforcement learning models
We also considered reinforcement learning (RL) processes as potential alternative accounts of participants’ adaptive performances. We investigated the standard Rescorla & Wagner’s (RW) and Pearce-Hall’s (PH) RL processes37,38,39 comprising constant and adaptive learning rates as free parameters, respectively (“Methods” section). We also investigated their noisy variants undergoing stochastic computational imprecisions. Similar to Weber-variability, these imprecisions were assumed to scale with the magnitude of action-value updating (i.e., with reward prediction errors reflecting the discrepancy between expected and actual rewards) and to stochastically corrupt the learning of action values in every trial. We found that among all these RL models and across all the analyses described below, the noisy RW-RL systematically best fitted participants’ performances (Supplementary Fig. 2). We therefore report only this model in the following.
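As an illustration, the noisy RW-RL update may be sketched as follows, with the stochastic imprecision scaling with the absolute reward prediction error, mirroring the Weber-variability assumption (all parameter values below are illustrative, not fitted):

```python
# Illustrative sketch of the noisy Rescorla-Wagner update: the learned
# action value is corrupted by stochastic noise whose spread scales with
# the absolute reward prediction error. Parameter values are assumptions.
import random

def noisy_rw_update(q, reward, alpha=0.3, mu=0.01, lam=0.05):
    rpe = reward - q                     # reward prediction error
    noise_sd = mu + lam * abs(rpe)       # imprecision scales with |RPE|
    noise = random.gauss(0.0, noise_sd)  # stochastic corruption of learning
    return q + alpha * rpe + noise
```

As in the Weber-variability model, the noise corrupts the internal representation itself (here, the action value), so its effect carries over successive trials rather than washing out at choice time.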
The Weber-variability model best accounts for human performance
To capture potential noise in participants’ action selection, all the models described above further included a softmax decision policy (free parameter: inverse temperature β40). Unlike the computational imprecisions postulated in the Weber-variability and noisy RL models, selection noise corrupts no internal representations (state beliefs or action values) and consequently, does not carry over successive trials. Computational imprecisions in the models were therefore estimated with no potential selection noise confounds.
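For two options, the softmax policy reduces to a logistic function of the value difference; a minimal sketch (the β value in the usage line is the best-fitting group mean reported in the Results):

```python
# Minimal sketch of the softmax decision policy shared by all models:
# for two options it is a logistic function of the decision-variable
# difference, with inverse temperature beta as the free parameter.
import math

def softmax_choice_prob(value_a, value_b, beta):
    """P(choose A) under a two-option softmax policy."""
    return 1.0 / (1.0 + math.exp(-beta * (value_a - value_b)))

p_choose_a = softmax_choice_prob(0.85, 0.15, beta=6.67)
```

With the fitted β, a fully resolved belief (0.85 vs. 0.15 reward expectations) yields a near-deterministic choice, while ties yield chance-level responding.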
Model fits were compared using Bayesian model comparisons with uniform priors (BMC), which optimally balance model complexity and adequacy to data and prevent overfitting issues. Accordingly, we computed exact model likelihoods given participants’ choice data by marginalizing over model parameter spaces using particle filtering Monte Carlo methods31,33,41, in order to derive Model Posterior Probabilities (MPPs) and Exceedance Probabilities over participants42,43 (“Methods” section).
BMC revealed that compared to the second-/third-order volatility inference and noisy RW-RL model, the Weber-variability model decisively best fitted participants’ choices (Pexceedance = 0.963; Fig. 2A). Its best-fitting free parameters (posterior Bayesian estimates) confirmed that Weber component \(\lambda\) dominantly contributed to Weber-variability \({\epsilon }_{t}\) in belief updating (mean[s.e.m]: μfit = 0.010[0.001]; λfit = 0.053[0.007]; λfit〈dt〉 = 0.023[0.0007]; βfit = 6.67[0.99]).
A Bayesian model comparison given participants’ choices across the third-/second-order volatility inference model, Weber-variability model, and best-fitting RL model (noisy RW-RL, Supplementary Fig. 2). Bars show exact model posterior probabilities over the n = 22 participants (“Methods” section). Model exceedance probabilities are shown in brackets. Error bars: Bayesian estimates of model posterior probability standard deviations. The Weber-variability model fitted decisively better than the other models. B Confusion matrices from the model recovery procedure across the same models. Large matrix: exact model posterior probability given models’ simulated performances; small matrix: model exceedance probability. Each model fitted its own simulated performance in the task decisively better than the other models. C Simulations of fitted model performances compared to participants’ performances around reversals in high-volatility episodes. Shaded areas: s.e.m. across participants. Statistically significant differences (two-sided T-tests over three consecutive trials, d.f. = 21) are shown. Left: *p = 0.029, ****p = 0.000051; middle left: **p = 0.0098, ****p = 0.000952; middle right: all ps > 0.09. Right: all ps < 0.000015. Only the Weber-variability model reproduced participants’ choices with no significant deviations. Note that the best RL model (noisy RW-RL) dramatically failed (see Supplementary Note 1 for explanation). Source data are provided as a Source Data file.
Weber-variability and noisy RW-RL model fitting relied on inferring the trial-by-trial realizations of their stochastic components. As the impact of these realizations carried over trials, their estimates in every trial depended upon all trials, making these estimates strongly related to each other (as when fitting Kalman filters to empirical data40, “Methods” section). Bayesian estimation and comparison constrained these realizations to best comply with their generative distribution and protected them against potential overfitting issues. Yet, the possibility remained that such realizations might spuriously compensate for potential systematic model misspecifications regarding participants’ behavior rather than express genuine stochastic fluctuations as postulated. To address this issue, we first simulated every model in the present protocol using its best-fitting free parameters, along with Weber-variability and noisy RW-RL model stochastic components randomly drawn from their respective generative distributions, irrespective of participants’ choices and their actual outcomes.
BMC first showed that every model fitted its own simulated performance decisively better than the other models (all Pexceedance > 0.999, Fig. 2B), confirming that our protocol and fitting procedure properly disentangled the models and prevented potential overfitting issues44. Second, only the Weber-variability model reproduced participants’ performances preceding and following reversals with no significant deviations (Fig. 2C), indicating that in model fitting, the estimated realizations of the Weber-variability stochastic component \({u}_{t}\) (posterior Bayesian estimates denoted \({U}_{{fit}}^{t}\)) were unlikely to reflect model misspecifications regarding participants’ responses to reversals (Supplementary Note 1).
Consistent with the optimal adaptive principle, the Weber-variability model, like participants, was also sensitive to the environment volatility: the model switched its responses after non-rewarded trials more often in high- than low-volatility trials (+9% of trials; paired T-test: T21 = 6.64, P < 0.0001). Weber component λfit indeed induced past information to be discounted more in high- than low-volatility episodes, as explained in the model description.
In this analysis, model performances were averaged across simulations and episodes, which factors out the impact of stochastic fluctuations. The result, therefore, implies that with or without computational noise, volatility inference and RL models could not account for participants’ performances, while the Weber-variability model performance reproducing participants’ behavior resembled the Bayesian-surprise model performance. Yet, the Weber-variability model fitted participants’ performances decisively better than the Bayesian-surprise model (BMC: \({{MPP}}_{{Weber}-{variability}}=0.93,{{MPP}}_{{Bayesian}-{surprise}}=0.07,{P}_{{exceedance}} > 0.9999\)), indicating that the Weber-variability stochastic component \({u}_{t}\) accounted for additional variance in participants’ performances. We then examined the possibility that the fitted realizations of the Weber-variability component \({u}_{t}\) might reflect other model misspecifications unrelated to reversals rather than expressing genuine stochastic fluctuations as postulated.
One possibility is that these realizations might capture some choice history effects in participants’ responses, including previously documented response repetition or choice trace biases45, which none of the models we investigated so far take into account. We examined this possibility by inserting either a response repetition or a choice trace bias in the softmax decision policy within every model (“Methods” section). While the insertion indeed improved model fits in both cases, Bayesian model comparison confirmed the results above: the Weber-variability model again fitted participants’ performances decisively better than the noisy RW-RL, second- and third-order volatility inference models (in both cases: Pexceedance > 0.96, Fig. 3A). Also, the Weber-variability model again fitted decisively better than the Bayesian-surprise model (in both cases: \({{MPP}}_{{Weber}-{variability}} > 0.72,{{MPP}}_{{Bayesian}-{surprise}} < 0.28,{P}_{{exceedance}} > 0.987\)). Thus, the fitted realizations of the stochastic component \({u}_{t}\) were unlikely to simply reflect model misspecifications regarding response repetition and choice trace biases.
A Bayesian model comparison given participants’ choices (computed over the n = 22 participants) when models comprise no choice history accounts (left, same data as in Fig. 2A), comprise repetition biases (middle), and comprise choice trace biases (right). Bars show exact model posterior probabilities. Model exceedance probabilities are shown in brackets. Error bars: Bayesian estimates of model posterior probability standard deviations. In every case, the Weber-variability model fitted decisively better than the other models, indicating that fitted Weber variability was unrelated to such choice history effects. B One-way ANOVA of best-fitting realizations \({U}_{{fit}}^{t}\) of the Weber-variability stochastic component \({u}_{t}\) over the n = 22 participants, factoring choice history from trial t−3 to t−1 (Switch vs. Repeat responses relative to preceding trials) into an 8-level fixed-effect factor. Bars show the means over participants. Error bars are s.d. across trials within participants. The choice history factor accounted for only η2 = 17% of \({U}_{{fit}}^{t}\) total variance, indicating that 83% of \({U}_{{fit}}^{t}\) total variance was unrelated to any three-fold choice history. C Autocorrelations of best-fitting realizations \({U}_{{fit}}^{t}\) of the Weber-variability stochastic component \({u}_{t}\) across successive trials, averaged over participants (n = 22). These realizations \({U}_{{fit}}^{t}\) showed virtually no autocorrelations (all R2 < 0.005). Bars and error bars are mean ± s.e.m. over the n = 22 participants. D Empirical distribution of best-fitting realizations \({U}_{{fit}}^{t}\) across trials and participants. The prior generative distribution of \({u}_{t}\) is uninformative, i.e., uniform over [0;1]. The posterior distribution is obtained from marginalizing over parameter spaces and particle trajectories from particle filters (see “Methods” section).
Note that this empirical posterior distribution is approximately Gaussian, centered on its mean 0.5, as expected from averaging over a series of independent random variables. E Mean ± s.d. (over the n = 22 participants) of empirical best-fitting realizations \({U}_{{fit}}^{t}\) distributions along experimental blocks (scanning runs) and fMRI sessions. Note the lack of any temporal order effects (F(5,105) = 1.737, p = 0.1544). Source data are provided as a Source Data file.
To more generally assess whether these fitted realizations could spuriously reflect any potential, known or unknown, choice history effects, we computed best-fitting realizations \({U}_{{fit}}^{t}\) of the Weber-variability stochastic component \({u}_{t}\) (posterior Bayesian estimates). We entered these best-fitting realizations \({U}_{{fit}}^{t}\) into a one-way ANOVA factoring all possible choice histories over consecutive trials t−3, t−2, and t−1 into an 8-level fixed-effect factor (Fig. 3B). In this ANOVA, the choice history factor captured the extent to which any three-fold sequence of past choices could explain best-fitting realizations \({U}_{{fit}}^{t}\). We found that, consistent with the actual presence of response repetition and/or choice trace biases, this choice history factor indeed exhibited a significant effect (F(7,167) = 4.91, p < 0.001). However, the choice history factor accounted for only η2 = 17% of \({U}_{{fit}}^{t}\) total variance, indicating that, conversely, 83% of \({U}_{{fit}}^{t}\) variability was unrelated to any potential choice history effects. Thus, best-fitting realizations \({U}_{{fit}}^{t}\) were dominantly unrelated to any model misspecifications regarding choice history.
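The η2 statistic underlying the 17% figure is the between-level sum of squares over the total sum of squares; a minimal sketch (the example data are synthetic placeholders, not the actual data):

```python
# Minimal sketch of the eta-squared (variance-explained) computation for
# a one-way fixed-effect factor, as used to quantify how much of the
# fitted realizations' variance the choice-history factor explains.
def eta_squared(groups):
    """eta^2 = SS_between / SS_total across the factor levels."""
    all_vals = [v for g in groups for v in g]
    grand_mean = sum(all_vals) / len(all_vals)
    ss_total = sum((v - grand_mean) ** 2 for v in all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total

# Identical values within levels -> all variance is between levels
eta_squared([[1.0, 1.0], [3.0, 3.0]])  # -> 1.0
```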
Another possibility is that the fitted realizations of the Weber-variability component \({u}_{t}\) might spuriously reflect some variations in participants’ attention, arousal, fatigue or global internal states across trials that no models investigated here take into account46. Such potential variations predict best-fitting realizations \({U}_{{fit}}^{t}\) to exhibit autocorrelations rather than to fluctuate independently across successive trials as postulated. We found the best-fitting realizations \({U}_{{fit}}^{t}\) to exhibit virtually no autocorrelations across successive trials (all R2 < 0.005) (Fig. 3C), indicating that more than 99.99% of \({U}_{{fit}}^{t}\) variability was indeed uncorrelated across successive trials. Thus, the fitted realizations of the Weber-variability component \({u}_{t}\) were unlikely to spuriously reflect potential variations in participants’ attention, arousal, fatigue, or global internal states across trials.
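The autocorrelation check can be sketched with a textbook lag-wise estimator (the exact estimator used in the analysis may differ): slow variations in attention or arousal would produce non-zero autocorrelations, whereas independent trial-by-trial fluctuations yield values near zero.

```python
# Textbook sample-autocorrelation estimator, as a sketch of the check
# described above: values near zero at all positive lags indicate
# independent trial-by-trial fluctuations.
def autocorr(x, lag):
    """Sample autocorrelation of sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    den = sum((v - mean) ** 2 for v in x)
    return num / den
```

Squaring the lag-1 value gives the R2 quantity reported above; the fitted realizations yielded R2 < 0.005 at all lags.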
More generally, we reasoned that if the fitted realizations of the Weber-variability component \({u}_{t}\) reflected any model misspecifications rather than stochastic Weber variability, the Weber-variability model and the same model with Weber component \(\lambda\) set to zero (so that variance increases in belief updating are modeled as \({\epsilon }_{t} \sim \mu \cdot {u}_{t}\) rather than \({\epsilon }_{t} \sim \left(\mu+\lambda {d}_{t}\right)\cdot {u}_{t}\)) should fit participants’ performances equally well. The Weber-variability model, however, fitted participants’ performances decisively better (Bayesian model comparison: \({{MPP}}_{\lambda > 0}=0.89;{{MPP}}_{\lambda=0}=0.11;{P}_{{exceedance}} > 0.9999\)), indicating that the fitted realizations of the Weber-variability component \({u}_{t}\) were unlikely to reflect model misspecifications rather than Weber variability.
Moreover, neural network models predict Weber-variability stochastic fluctuations to be Gaussian-distributed27. We consistently found that while the prior generative distribution of \({u}_{t}\) was uninformative (i.e., uniform), the empirical posterior distribution of best-fitting realizations \({U}_{{fit}}^{t}\) was approximately Gaussian and centered on its mean 0.5 (Fig. 3D). Of note, the observed Gaussian-like empirical distribution exhibited a slight skewness, consistent with the finding above that a small fraction of \({U}_{{fit}}^{t}\) variability reflected choice history effects. Finally, the mean and variance of the empirical posterior distribution exhibited no significant variations over time (F(5,105) = 1.737, p = 0.1544) (Fig. 3E).
Altogether, these results showed that the fitted realizations of the Weber-variability component \({u}_{t}\) were essentially Gaussian-distributed, uncorrelated across successive trials, stable over time, and unlikely to spuriously reflect known or unknown potential model misspecifications. Accordingly, the results revealed that in the best-fitting Weber-variability model, belief updating guiding participants’ choices underwent a variability consistent with Weber’s law and empirically featuring stochastic fluctuations. We denote this estimated Weber-variability \({\epsilon }_{t}^{{fit}}=\left({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t}\right)\cdot {U}_{{fit}}^{t}\) (posterior Bayesian estimates, “Methods” section).
dmPFC computes choices from beliefs undergoing Weber variability
We next examined the hypothesis that the dmPFC guides adaptive behavior and computes choices from corrupted beliefs, i.e., state beliefs from the best-fitting Weber-variability model. We entered fMRI activity in full variance regression analyses (significance voxel-wise threshold: P = 0.001; cluster-wise threshold: P = 0.05, FWE-corrected for multiple comparisons). All post-hoc analyses of activations removed selection biases through standard leave-one-out procedures47 (“Methods” section).
According to previous studies21,48, the fMRI signature of choice computations is that besides increasing with reaction times (RTs), activations should exhibit two concomitant effects at choice time: (1) an effect reflecting the neural demand in reaching actual choices, i.e., activations decrease when the decision variable increasingly favors the chosen relative to unchosen option; (2) an effect reflecting the neural demand in encoding the decision variable irrespective of chosen options, i.e., besides the preceding choice-reaching effect, activations increase when the decision variable conveys more information or equivalently, increasingly differentiates choice options. In the present protocol featuring binary outcomes and constant reward probabilities (85% vs. 15%), the decision variable—i.e., the difference in reward expectations between bandit’s arms—scaled with the difference in state beliefs (“Methods” section). Our hypothesis thus predicts that besides varying with RTs, dmPFC activations should exhibit the two following effects concomitantly: (1) decreasing activations when the corrupted belief supporting the chosen relative to unchosen arm increases; (2) in addition to the preceding choice-reaching effect, increasing activations when the corrupted beliefs increasingly differ between the two bandit arms. We therefore entered brain activations at choice time in the regression analysis, which, along with RTs, includes the two regressors capturing the two effects: namely, the signed and unsigned difference between corrupted beliefs, i.e., the linear and quadratic expansion of belief differences between the chosen and unchosen arm21,48 (“Methods” section).
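The two regressors may be sketched as follows, from the corrupted beliefs supporting the chosen and unchosen arms (function and variable names are ours; the sign convention follows the predicted negative linear and positive quadratic effects):

```python
# Illustrative construction of the two choice-computation regressors from
# corrupted beliefs: the signed chosen-minus-unchosen difference (predicted
# negative effect, so entered with a minus sign) and its square (predicted
# positive effect, capturing how much the decision variable differentiates
# the two options).
def choice_regressors(b_chosen, b_unchosen):
    signed = b_chosen - b_unchosen   # choice-reaching effect (linear)
    unsigned = signed ** 2           # decision-variable information (quadratic)
    return -signed, unsigned
```

A confident, easy choice (e.g., beliefs 0.8 vs. 0.2) thus yields a strongly negative linear regressor and a large quadratic regressor, while a close call (0.55 vs. 0.45) yields values near zero on both.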
The whole-brain analysis revealed that only activations in the dmPFC extending from the dACC to the pre-Supplementary Motor Area (pre-SMA) exhibited both choice computation effects (Fig. 4A). These dmPFC activations increased with RTs (T21 = 7.91, P < 0.00001) and, independently, exhibited both the predicted linear and quadratic effects (both |T21| > 3.97, Ps < 0.0007) (Fig. 4B, C) (Supplementary Note 2, Supplementary Fig. 4), confirming previous evidence that the dmPFC plays a central role in computing choices from state beliefs (see, e.g., ref. 21). Moreover, activation residuals from this regression analysis predicted participants’ choices no better than chance (logistic model with switch/stay responses and regression residuals as the dependent and independent variable, respectively: model likelihood relative to chance: T21 = 1.58, p = 0.13), indicating that choice computations from corrupted beliefs accounted for the dmPFC involvement in participants’ choices. These residuals were further independent of the best-fitting Weber-variability realizations \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\), even when removing the RT regressor (both T21 < 1.5, Ps > 0.148, Fig. 4C), indicating that dmPFC activations reflecting choice computations from corrupted beliefs accounted for the Weber variability observed in participants’ choices. We refer to these residuals as non-corrupting residuals.
A Unique fMRI BOLD activation cluster in the whole-brain analysis associated with choice computations at choice time, i.e., exhibiting jointly a negative linear effect \(\left(-({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})\right)\) and a positive quadratic effect \({({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})}^{2}\) of chosen-relative-to-unchosen beliefs undergoing Weber variability (dark blue, conjunction analysis, voxel-wise threshold p < 0.001, cluster-wise FWE-corrected p < 0.05). Linear and quadratic effects capture signed and unsigned differences, respectively, between chosen and unchosen beliefs (see text). Activations are superimposed on the MNI template sagittal and axial slices centered on the unsigned difference activation peak. B fMRI activity at choice time averaged over the activation cluster shown in (A), plotted against chosen-relative-to-unchosen beliefs undergoing Weber variability and factoring out RTs and either the quadratic effect (left) or the linear effect (right). Note both the predicted negative linear effect and positive quadratic effect. C Full variance analyses at choice time over the activation cluster shown in (A), comprising the signed \({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}}\) and unsigned \({({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})}^{2}\) differences between chosen and unchosen beliefs undergoing Weber variability, along with Weber variability (and with or without Reaction Times) as regressors. Data points show individual subjects’ data. Note that the Weber variability regressor captured no residual variances. Left graph: ***from left to right bars: p < 0.00001, p = 0.000038, p = 0.00007, p = 0.148; right graph: p < 0.00001, p = 0.000012, p = 0.138 (one-sample two-sided T-test, d.f. = 21).
D Best-fitting realizations \({U}_{{fit}}^{t}\) of Weber variability stochastic components \({u}_{t}\) plotted against activation residuals averaged over the cluster shown in (A) and normalized by the Weber variability deterministic component \({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t}\). Activation residuals were computed from the regression analysis comprising Reaction Times and both \(\left(-({B}_{{ch}}-{B}_{{unch}})\right)\) and \({({B}_{{ch}}-{B}_{{unch}})}^{2}\) as regressors, with Weber variability \({\epsilon }_{t}^{{fit}}\) removed from current beliefs when forming these regressors. The graph indicates that the best-fitting realizations \({U}_{{fit}}^{t}\) that corrupt beliefs driving choices reflected neural fluctuations corrupting belief updating in the pre-SMA and dACC. All error bars are s.e.m. across participants. ***p < 0.001 (one-sample two-sided T-test, d.f. = 21). Source data are provided as a Source Data file.
Critically, this choice computation regression model based on corrupted beliefs accounted for dmPFC activity decisively better than the same regression model based on beliefs from the best-fitting Bayesian-surprise model (BMC removing selection biases: \({{{MPP}}_{{Weber}-{variability}}=0.958,{{MPP}}_{{Bayesian}-{surprise}}=0.042,P}_{{exceedance}} > 0.9999\)). This result suggests that the Weber variability corrupting state beliefs encoded in the dmPFC to guide choices originates from neural variability across trials. We assessed this interpretation by considering the activation residuals from the choice computation regression model based on beliefs updated from the history of corrupted beliefs but, in the current trial, updated without undergoing Weber variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\). We refer to these residuals in our dmPFC region as corrupting residuals, as they correspond to neural fluctuations presumably inducing in every trial the Weber variability that corrupts the state beliefs guiding participants’ choices.
Confirming the hypothesis, we found that unlike non-corrupting residuals, corrupting residuals correlated with Weber variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\): corrupting residuals normalized by the deterministic component \({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t}\) linearly scaled with best-fitting realizations \({U}_{{fit}}^{t}\) of the Weber variability stochastic component (R = 0.56, p < 0.001, Fig. 4D). Consistent with the hypothesis, furthermore, corrupting residuals predicted participants’ choices better than chance (logistic model with switch/stay responses and corrupting residuals as dependent and independent variables, respectively; model likelihood relative to chance: T21 = 2.38, p = 0.027) and better than non-corrupting residuals (BMC between logistic models removing selection biases: \({{{MPP}}_{{non}-{corruptingresiduals}}=0.28,{{MPP}}_{{corruptingresiduals}}=0.72,P}_{{exceedance}}=0.987\)) in both stay and switch trials (logistic model likelihood differences in stay trials: T21 = 2.39, p = 0.026; in switch trials: T21 = 2.12, p = 0.046). Altogether, these results provide evidence that the dmPFC computed choices from state beliefs undergoing a Weber variability originating from neural variability across trials.
Weber variability stems from neural variability corrupting belief updating
We next tested whether Weber variability arises from neural variability corrupting the belief updating that unfolds between two successive trials, from action outcome to next choice onset. At choice time, belief updating is completed and, as reported above, dmPFC activations reflected choice computations from corrupted beliefs. As Weber variability increases belief entropy, which augments the neural demand of computing choices, we reasoned that at choice time, dmPFC activations should overall increase with Weber variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\). At outcome time, in contrast, belief updating starts, and dmPFC activations reflect the neural demand of belief updating. This neural demand increases with belief updating magnitude \({d}_{t}\) but decreases when belief updating becomes more imprecise, i.e., features larger computational imprecision. Weber variability measures these imprecisions and assumes that they further scale with belief updating magnitude \({d}_{t}\), thereby combining two opposite effects on neural demands. The Weber-variability model thus predicts that at outcome time, dmPFC activations should neither significantly increase nor decrease with Weber variability.
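The multiplicative structure of Weber variability, \({\epsilon }_{t}=(\mu+\lambda {d}_{t})\cdot {u}_{t}\), can be illustrated with a short simulation. The coefficient values below are arbitrary placeholders, not the fitted \({\mu }_{{fit}}\) and \({\lambda }_{{fit}}\); the point is the Weber's-law signature, i.e., that the spread of the corrupting noise scales with the belief-update magnitude \({d}_{t}\).

```python
import numpy as np

rng = np.random.default_rng(1)
mu, lam = 0.05, 0.6                 # placeholder Weber coefficients (not fitted values)
n = 10_000
d = rng.uniform(0.0, 1.0, n)        # belief-update magnitudes d_t
u = rng.normal(0.0, 1.0, n)         # zero-mean stochastic component u_t
eps = (mu + lam * d) * u            # Weber variability: noise amplitude scales with d_t

# Weber's law signature: the noise spread grows with the update magnitude
sd_small = eps[d < 0.2].std()       # trials with small updates
sd_large = eps[d > 0.8].std()       # trials with large updates
```

On trials with large updates the noise standard deviation is several times larger than on trials with small updates, which is precisely why large updates are the ones most corrupted.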
The whole-brain regression analysis including Weber variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\) as the unique regressor-of-interest at outcome and choice time confirmed that at choice time, dmPFC activations increased with Weber variability \({\epsilon }_{t}^{{fit}}\) (Fig. 5A, B) (post-hoc statistics removing selection biases: T21 = 10.34, P < 0.000001). These activations involved both the dACC and pre-SMA and formed a cluster virtually identical to the one we identified above as subserving choice computations. In contrast, and as predicted, activations at outcome time within this region were only weakly associated with Weber variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\) (post-hoc statistics: T21 = −1.99, P = 0.059) (Fig. 5B).
A MRI BOLD activation cluster in the dmPFC associated with Weber variability \({\epsilon }_{t}^{{fit}}=\left({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t}\right)\cdot {U}_{{fit}}^{t}\) at choice time (light blue, voxel-wise p < 0.001, cluster-wise FWE-corrected p < 0.05), superimposed on the MNI template sagittal slice centered on the voxel exhibiting the maximal correlation. B Weber-variability-related activations averaged over the activation cluster shown in (A) at outcome and choice time (mean ± s.e.m. over the n = 22 participants from a leave-one-out procedure removing selection biases). Note the expected marginal correlation at outcome time (~p = 0.059; ****p < 0.0000001; two-sided one-sample T-tests). C Bayesian model comparison over the n = 22 participants between Weber variability \({\epsilon }_{t}^{{fit}}\) and its sole deterministic component \({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t}\) as concurrent models of these activations at both outcome and choice time. Bars are model posterior probabilities; error bars are Bayesian estimates of model posterior probability standard deviations. Model exceedance probability Pexc is shown. To remove selection biases, the Bayesian model comparison was performed only on voxels correlating with both \({\epsilon }_{t}^{{fit}}\) and \({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t}\) (voxel-wise threshold p < 0.001) and through a leave-one-out procedure. D, left: model with \({U}_{{fit}}^{t}\) as the unique regressor (**p = 0.0045; ****p = 0.0000051; one-sample two-sided T-tests, d.f. = 21); right: full variance analyses over these voxels comprising Weber variability \({\epsilon }_{t}^{{fit}}\) as regressor of interest and factoring out the deterministic component \({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t}\) (shown on the plot) along with other variables of no interest, including RTs, response switches, and RL variables (**p = 0.0021, ***p = 0.00025; one-sample two-sided T-tests, d.f. = 21).
Error bars are s.e.m. across participants. dmPFC activity correlated negatively at outcome time and positively at choice time with best-fitting realizations \({U}_{{fit}}^{t}\) of Weber variability stochastic component \({u}_{t}\). Data points show individual subjects’ data. See supplementary Fig. 5 for additional analyses regarding individual data. Source data are provided as a Source Data file.
The latter result problematically relies on accepting a null hypothesis. To circumvent this problem, we considered the best-fitting realizations \({U}_{{fit}}^{t}\) of the Weber-variability stochastic component \({u}_{t}\), which capture stochastic variations in computational imprecision irrespective of belief updating magnitude \({d}_{t}\). We thus predicted dmPFC activations at outcome time to decrease with best-fitting realizations \({U}_{{fit}}^{t}\), reflecting the decreased neural demand when beliefs are updated less precisely, irrespective of updating magnitudes. At choice time, by contrast, dmPFC activations should again increase with best-fitting realizations \({U}_{{fit}}^{t}\), reflecting the increased neural demand of computing choices from more imprecise beliefs.
BMC first confirmed that Weber-variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t}).{U}_{{fit}}^{t}\) explained these dmPFC activations at both choice and outcome time decisively better than its sole deterministic component \(({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\) (Pexceedance > 0.953), even when considering only the dmPFC voxels correlating significantly at choice time with this deterministic component (Fig. 5C). Thus, dmPFC activations were associated with best-fitting realizations \({U}_{{fit}}^{t}\) independently of updating magnitudes \({d}_{t}\). Second, the regression analysis including \({U}_{{fit}}^{t}\) as the unique regressor confirmed the predicted signs of this association: dmPFC activations were negatively and positively associated with \({U}_{{fit}}^{t}\) at outcome and choice time, respectively (Fig. 5D, left). Third, to control for potential confounding factors, we entered these dmPFC activations in an additional regression analysis including Weber-variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\) as one regressor and deterministic component \(({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\) as a second regressor factoring out updating magnitudes \({d}_{t}\) (Fig. 5D, right). The Weber-variability regressor thus captured only activations associated with best-fitting realizations \({U}_{{fit}}^{t}\), while the deterministic component regressor \(({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\) could capture no effects as the full variance regression placed the shared variance between the two regressors in residuals. We also included the following potentially confounding variables as regressors of no interest: at choice time, RTs, response repetition/switch, chosen arms, and arm reward values computed from the best-fitting RW-RL model; and at outcome time, chosen arms, signed and unsigned reward prediction errors computed from the best-fitting RW-RL model. 
The results confirmed that even when including these potentially confounding factors, activations remained negatively and positively associated with Weber-variability regressor (i.e., \({U}_{{fit}}^{t}\)) at outcome and choice time, respectively (all \(\left|{T}_{21}\right| > 3.52\), Ps < 0.0021, Fig. 5D, right).
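The full-variance regression logic used here, in which correlated regressors are entered simultaneously so that their shared variance is assigned to neither regressor and falls into the residuals, can be sketched with toy data. In this sketch the simulated "activation" is built, by assumption, to load on the stochastic component only; the regression then correctly attributes the effect to the Weber-variability regressor and not to the deterministic component.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
d = rng.uniform(0.0, 1.0, n)              # belief-update magnitudes
u = rng.normal(0.0, 1.0, n)               # stochastic component
det = 0.05 + 0.6 * d                      # deterministic component (toy mu, lambda)
eps = det * u                             # full Weber-variability regressor
y = 2.0 * u + rng.normal(0.0, 0.5, n)     # toy "activation" driven by the stochastic part only

# Both regressors entered simultaneously (no orthogonalization): each captures
# only its unique variance; their shared variance ends up in the residuals.
X = np.column_stack([np.ones(n), eps, det])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The fitted weight on `eps` is reliably positive while the weight on `det` stays near zero, mirroring the dissociation reported for the dmPFC voxels above.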
These findings ruled out the interpretation that the observed Weber variability and associated dmPFC activations reflect auxiliary (attentional) control processes adjusting belief updating in relation to the occurrence of surprising action outcomes49,50, with the stochastic component \({u}_{t}\) merely reflecting trial-by-trial control fluctuations. Contrary to what we observed, this interpretation indeed predicts that dmPFC activations should increase with Weber variability and its stochastic component \({u}_{t}\) especially at outcome time, since in this interpretation both variables measure the intensity of auxiliary control and related adjustment processes, which increases with the magnitude of belief updating. Instead, our findings provide evidence that the Weber variability corrupting the state beliefs guiding choices originated in neural variability in belief updating processes that unfold between two successive trials, from action outcome to next choice onset.
Weber-variability explains dmPFC activity correlating with volatility estimates
According to the Weber-variability model, no inferences about environment volatility are required to produce efficient adaptive behavior. Previous studies4,14, however, report that following action outcomes, dmPFC activations correlate with volatility estimates from volatility inference models. We therefore examined whether the Weber-variability model might explain such volatility-related activations.
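Volatility estimates of this kind are commonly obtained by discretizing the volatility \(\tau\) on a grid and maintaining a joint posterior over the latent state and \(\tau\). The sketch below is such a grid approximation of a second-order observer; it is our own minimal implementation, not the code of the cited studies, and the grid bounds, \(\eta = 0.85\), and the toy outcome sequences are assumptions for illustration. Outcome histories implying frequent reversals raise the inferred volatility.

```python
import numpy as np

def volatility_observer(arms, rewards, eta=0.85, n_tau=50):
    """Grid-based sketch of second-order inference: joint posterior over the
    latent state z (which arm holds reward probability eta) and a constant
    volatility tau, discretized on a grid."""
    taus = np.linspace(0.01, 0.5, n_tau)            # assumed volatility grid
    post = np.full((2, n_tau), 1.0 / (2 * n_tau))   # uniform prior over (z, tau)
    for arm, reward in zip(arms, rewards):
        # transition: the latent state flips with probability tau between trials
        post = np.stack([(1 - taus) * post[0] + taus * post[1],
                         taus * post[0] + (1 - taus) * post[1]])
        # outcome likelihood under each latent state
        like1 = eta if (arm == 1) == bool(reward) else 1.0 - eta
        post[0] *= like1
        post[1] *= 1.0 - like1
        post /= post.sum()
    return float((taus * post.sum(axis=0)).sum())   # posterior mean volatility

# toy (arm, reward) sequences: consistent outcomes vs. outcomes implying reversals
stable = [(1, 1)] * 40
switchy = [(1, 1), (1, 0), (1, 0), (2, 1), (2, 0), (2, 0), (1, 1), (1, 0)] * 5
tau_stable = volatility_observer(*zip(*stable))
tau_switchy = volatility_observer(*zip(*switchy))
```

Even this minimal observer already requires maintaining a full joint posterior, which illustrates why volatility inference scales poorly compared with the first-order-plus-noise account.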
Replicating previous results4, a whole-brain regression analysis first confirmed that activations in the dmPFC correlated at outcome time with volatility estimates from the third-order volatility inference model (Fig. 6A, B). These activations involved a region slightly above the previously reported one4 (Supplementary Fig. 3A), but we observed post-hoc that the latter also exhibited significant volatility-related activations at outcome time (Supplementary Fig. 3B). Both regions lay within the dmPFC region we identified above as computing choices from corrupted beliefs. BMC then revealed that in each region, Weber variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\) accounted decisively better than volatility estimates for dmPFC activity from outcome to choice time (both Pexceedance > 0.98, Fig. 6D, Supplementary Fig. 3D).
A MRI BOLD activation cluster correlating with volatility estimates from the third-order volatility inference model at outcome time (dark green, voxel-wise p < 0.001, cluster-wise FWE-corrected p < 0.05) superimposed on the MNI template sagittal slice centered on the activation peak (MNI coordinates: x,y,z = 6,17,50 mm). Light-blue area: same data as in Fig. 5A. B Volatility-related activations averaged over the dark-green cluster shown in (A) at outcome and choice time (leave-one-out procedure removing selection biases). Mean and s.e.m. across the n = 22 participants (**p = 0.0023; two-sided one-sample T-tests). C Full variance analyses over the dark-green activation cluster shown in (A), comprising volatility estimates and Weber variability \({\epsilon }_{t}^{{fit}}\) as regressors of interest and factoring out variables of no interest, including RTs, response switches, and RL variables. Error bars are s.e.m. across participants (n = 22) (**p = 0.006, ***p = 0.0004, otherwise Ps > 0.47; two-sided one-sample T-tests, d.f. = 21). D Bayesian model comparison over the n = 22 participants between volatility estimates and Weber variability \({\epsilon }_{t}^{{fit}}\) as concurrent models of both outcome- and choice-related activations in the dark-green cluster shown in (A). Bars are model posterior probabilities. Error bars are Bayesian estimates of s.d. of the model posterior probability. Model exceedance probability Pexc is indicated. All data points show individual subjects’ data. Source data are provided as a Source Data file.
To examine whether volatility estimates might account for some dmPFC activations unrelated to Weber variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\) and to potentially confounding factors, we next entered the neural activity within these two volatility-related regions in a regression analysis including volatility estimates and Weber variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\) as regressors of interest, along with the potentially confounding outcome-related and choice-related factors mentioned above as regressors of no interest (chosen arms and reward prediction errors at outcome time; RTs, response repetition/switch, chosen arms, and arm reward values at choice time). We adopted a conservative approach favoring the volatility regressor by purposely omitting the leave-one-out procedure that removes selection biases. We found that in our volatility-related region, activations at both outcome and choice time remained associated with the Weber variability regressor (both T21 = 3.05, Ps = 0.0060) but not with the volatility regressor (T21 = 0.73, Ps > 0.473) (Fig. 6C). We observed virtually the same results in the previously reported volatility-related region (Supplementary Fig. 3C).
This regression analysis placed in residuals the shared variance between volatility estimates and Weber variability. Because best-fitting realizations \({U}_{{fit}}^{t}\) varied independently of volatility estimates (unlike volatility estimates, realizations \({U}_{{fit}}^{t}\) exhibited virtually no autocorrelation across successive trials, as mentioned above), this shared variance reduced to the deterministic component \(({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\). Accordingly, the result showed that this deterministic component accounted for volatility-related activations, while the remaining activations associated with the Weber variability regressor were mostly unrelated to the deterministic component and consequently reflected best-fitting realizations \({U}_{{fit}}^{t}\). Consistent with the results from the preceding section, these remaining activations were negatively and positively associated with the Weber variability regressor at outcome and choice time, respectively (Fig. 6C).
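The autocorrelation argument invoked here can be illustrated with simulated series. The AR(1) trace below is a stand-in for actual volatility estimates (an assumption for illustration, since such estimates evolve slowly over trials): i.i.d. realizations show near-zero lag-1 autocorrelation, whereas a slowly evolving estimate is strongly autocorrelated, which is why the two kinds of regressors share little trial-by-trial variance.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

rng = np.random.default_rng(3)
n = 5000
u = rng.normal(size=n)            # i.i.d. stochastic realizations: no memory
vol = np.zeros(n)                 # slowly drifting estimate (AR(1) stand-in)
for t in range(1, n):
    vol[t] = 0.95 * vol[t - 1] + rng.normal(0.0, 0.1)

ac_u = lag1_autocorr(u)           # close to 0
ac_vol = lag1_autocorr(vol)       # close to 0.95
```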
Discussion
The results confirmed that in uncertain and changing environments, the dACC and pre-SMA guide behavioral choices by encoding state beliefs deriving from first-order inferences about external contingencies. The encoded beliefs were found to further undergo a stochastic variability consistent with Weber’s law and associated with trial-by-trial neural variability in the dACC and pre-SMA corrupting belief updating processes across successive trials. This neural variability also accounted for dmPFC activations previously reported to correlate with volatility estimates from volatility inference models4,14. As previously observed4,5,14,30, we further found that human adaptive performance in uncertain and changing environments is consistent with optimal adaptive behavior. Critically, our results reveal that this efficient performance derives from the stochastic neural fluctuations corrupting the beliefs about action-outcome contingencies that the pre-SMA and dACC update to guide behavior, rather than from additional complex neural computations/mechanisms dedicated to volatility estimation as previously proposed4,14. Our findings thus support perhaps the most parsimonious neural account of efficient adaptive behavior in uncertain and changing environments, with no need to assume additional mechanisms and complex computations.
fMRI provides little indication about the neuronal origin of this corrupting variability. A likely hypothesis is that this variability originates in noisy neuron spiking activity resulting from noisy synaptic transmission or spike generation. In neural network models, this noisy activity elicits a stochastic Weber variability bearing upon information coding in populations of neurons, i.e., a variability increasing with the magnitude of information updates27. Neuronal recordings in rodents consistently confirm that in populations of dACC neurons, the variability of neuron activity increases when external contingencies change12. Weber variability may also reflect inherent fluctuations in the neuronal sampling processes previously proposed to encode state beliefs, whose sampling noise increases with the magnitude of belief updates51,52,53.
Our results do not imply that the neuronal sources of corrupting fluctuations are confined to the dACC and pre-SMA. The stochastic corrupting fluctuations we observed in these regions are likely to reflect the cumulated effects of neural variability along the brain network, including notably the ventromedial prefrontal cortex, known to be involved in forming internal beliefs from action outcomes23,54,55,56. As we identified these fluctuations from participants’ choice data, the fluctuations were consistently found in the fMRI neural signature of choice computations from internal beliefs (Fig. 4). Our results thus provide evidence that the dACC and pre-SMA form the downstream stage of behavioral control deriving behavioral choices from internal beliefs about current action-outcome contingencies (e.g., refs. 21,48,57,58,59,60,61,62,63).
Conceptually, our results suggest that the brain produces efficient adaptive behavior by rapidly forming simplified, locally accurate but globally inaccurate stationary world models, while neural variability enables it to rapidly disengage from an obsolete local world model and form new ones to guide behavior. We found indeed that the dmPFC guides behavior by encoding beliefs that assume stable action-outcome contingencies. This assumption is accurate at short time-scales and advantageously maximizes the speed of learning ongoing environment contingencies. However, the assumption is evidently inaccurate at longer time-scales and considerably decreases the speed of adapting to contingency changes. The stochastic neural fluctuations we observed as corrupting the beliefs encoded in the dmPFC to guide behavior precisely suppress this downside. Following Weber’s law, indeed, this variability decreases the influence of prior beliefs on belief updating when external contingencies change and, consequently, enables these beliefs to adapt rapidly to new contingencies. In this respect, our findings do not describe a neural approximation of optimal adaptive processes4,5, which, in contrast, presume non-stationary environments in order to form globally more accurate world models, favoring asymptotic accuracy to the detriment of short-term efficacy.
In particular, when action outcomes are repeatedly inconsistent with the beliefs encoded in the dmPFC and prompt substantial belief updates, the observed stochastic neural fluctuations make such beliefs increasingly variable, to the point that they essentially induce random switching across distinct courses of action. This may occur before beliefs possibly stabilize in favor of one course of action and, following Weber’s law, undergo weaker stochastic neural fluctuations. Stochastic behavior may thus arise from structured beliefs encoded in the dmPFC but undergoing large stochastic neural fluctuations. Our results may thus account for the emergence of stochastic behavior in rodents facing changes in reward contingencies when noradrenergic locus coeruleus inputs into the dACC are optogenetically manipulated to favor behavioral switches15. More generally, our results indicate that switching away from an ongoing course of action to explore alternative ones—the process of switching from exploitation to exploration behavior, which involves the dmPFC12,64—is neither fully deterministic nor predictable but virtuously exhibits considerable variability stemming from neural noise.
The present study has some limitations. fMRI measures the aggregated impact of neural variability on information processing. Accordingly, fMRI reveals here that fluctuations in dmPFC activations induce the Weber variability corrupting the representations encoded in the dmPFC to guide behavior. As noted above, however, fMRI precludes identifying the precise neural origins of such fluctuations and, consequently, of Weber variability. More invasive recording methods, like intracranial electroencephalography in patients or neurophysiological recordings in rodents or monkeys, should provide more insight into these neural origins. A second potential limitation is that the present study is based on a single, simple behavioral protocol, which might raise questions about generalizability. The protocol consists of the canonical probabilistic reversal learning task, namely a two-alternative forced-choice task with binary outcomes featuring constant probabilities with episodic, unpredictable reversals. We chose this task first because it is commonly used in decision neuroscience and especially in previous studies investigating volatility-related brain activations (e.g., ref. 4), which we sought to replicate. A second reason is that the task is simple enough to allow closed-form mathematical derivations in the Weber-variability model through inversion equations Eqs. E5, E6 (see “Methods” section), which enables building related regressors for fMRI analyses. A third reason is that using a simple task presumably corresponds to a conservative approach: the simpler the task, the simpler the inferential processes and, furthermore, the less noisy/variable the neural computations. Thus, our results are likely to generalize to more complex tasks, which presumably involve more neural variability. Future research should determine how to test the neural underpinnings of Weber variability in more complex behavioral tasks.
A third potential limitation is that we cannot formally rule out an alternative, though unlikely, interpretation of the present results. One could indeed interpret Weber variability \({\epsilon }_{t} \sim \left(\mu+\lambda {d}_{t}\right)\cdot {u}_{t}\) and related activations as alternatively reflecting model misspecifications in belief updating processes that vary randomly across successive trials. In other words, using Marr’s level terminology, the observed Weber variability might reflect, within individuals, an algorithmic variability in belief updating processes rather than a mere neural variability corrupting a steady belief updating process. This interpretation appears compatible with our results. We note, however, that it lacks parsimony and that its explanatory power remains questionable. The interpretation also appears less plausible in simple behavioral protocols (as in the present study) than in more complex ones.
To conclude, our findings point to a general neural adaptive principle based on forming world models that presume stable environments but undergo a variability scaling with internal changes inconsistent with this stability premise. This principle has two key evolutionary advantages: (1) computational frugality, as forming such stationary world models relies on elementary neural computations reducing to registering event occurrences, while internal variability stems from neural processing imprecision incurring no computational costs; (2) uncertainty robustness, as its adaptive efficiency relies on no assumptions regarding the true temporal structure of the environment. Computer simulations30 indeed show that the present adaptive principle handles this structural uncertainty5,25 more efficiently than adaptive reinforcement learning and neurally tractable approximations of optimal adaptive processes. Thus, our results support a neural adaptive principle that may explain the prevalence of neural variability across the brain3 and the apparent ubiquity of Weber’s law in neural information coding29,35. We finally note that a similar adaptive principle seems to regulate genetic adaptation: the high fidelity of DNA replication relies on presuming stable environments but combines with a mutational variability that increases when environmental changes induce internal changes contravening this stability premise (i.e., stressors)65,66,67. Accordingly, the neural adaptive principle we describe here might reflect a more general principle of biological adaptation, ranging from adaptive evolution to behavior.
Methods
Participants
22 right-handed volunteers participated in the present study (12 females; mean age: 24.6 years, range: 20-30). One additional participant was excluded because their performance did not exceed chance level. Participants had no history of neurological or psychiatric disease, took no psychiatric medication, and had normal or corrected-to-normal vision, as assessed by medical examinations. Every participant provided written informed consent, and the study was approved by the French National Ethics Committee (CPP, Inserm protocol #C15-98). Participants were paid for their participation: they received a flat payoff plus an extra, performance-dependent amount not exceeding 10 euros.
Behavioral Protocol
Participants were tested in two experimental sessions administered on two distinct days. Each session included three scanning runs. In every run, participants carried out a two-armed bandit task comprising 180 trials. Overall, participants performed a grand total of 1080 task trials.
In every trial, participants chose one of two visually presented arms by pressing one of two response buttons and received binary feedback (reward vs. no reward). The bandit’s arms corresponded to two distinct shape stimuli presented horizontally, with their left/right positions randomized (Fig. 1B). Distinct pairs of shapes were used across runs.
One arm led to rewards more frequently (85% vs. 15%), but these reward contingencies reversed unpredictably with a probability (i.e., volatility) varying episodically and pseudo-randomly along runs (Fig. 1C). Unbeknownst to the participants, each run was thus composed of a succession of three distinct 60-trial episodes: a low-volatility episode (volatility = 0.05), a mild-volatility episode (volatility = 0.07), and a high-volatility episode (volatility = 0.1). Accordingly, reward probabilities associated with arms were swapped between 85% and 15% on average every 20, 15, or 10 trials in low-, mild-, and high-volatility episodes, respectively. The order of episodes was carefully counterbalanced across runs, sessions, and participants using the 12 Latin squares of order 3 across subjects.
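As a concrete illustration, the reversal schedule of one run can be generated as follows. This is a minimal sketch: the function name is ours, and the episode order is fixed here for simplicity, whereas the study counterbalanced it with Latin squares.

```python
import numpy as np

def make_run(rng, episode_vols=(0.05, 0.07, 0.1), trials_per_episode=60):
    """Return, for each of the 180 trials of one run, which arm holds the 85%
    reward probability, with per-trial reversal probability set by the episode."""
    good_arm, schedule = 0, []
    for vol in episode_vols:
        for _ in range(trials_per_episode):
            if rng.random() < vol:        # unpredictable reversal
                good_arm = 1 - good_arm   # swap the 85%/15% reward probabilities
            schedule.append(good_arm)
    return np.array(schedule)

rng = np.random.default_rng(42)
run = make_run(rng)
n_reversals = int(np.sum(run[1:] != run[:-1]))
# expected number of reversals per run: 60*(0.05 + 0.07 + 0.1) ≈ 13 on average
```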
Trials started with stimulus onsets and ended with feedback offsets (feedback duration: 700 ms). Response-feedback onset asynchrony was jittered (150, 1800, or 3450 ms, uniform distribution), as were inter-trial intervals (500, 2200, or 3900 ms, uniform distribution). Trials were aborted whenever no response was recorded within 1300 ms of stimulus onset. Overall, each experimental session lasted around 80 min, including breaks between scanning runs.
Finally, participants were familiarized with the task before the first session by performing a training version consisting of two 60-trial runs, each comprising two reward-contingency reversals.
fMRI data acquisition
A Siemens Verio 3 T scanner (CENIR, ICM, Paris, France) with a 32-channel head coil was used to acquire both a high-resolution T1-weighted anatomical MRI (3D MPRAGE; voxel resolution: 1 mm3) and T2*-weighted multiband echo-planar images (mb-EPI; multiband factor: 3; acceleration factor: 2; GRAPPA). fMRI time-series acquisition parameters were as follows: 54 slices (ascending order, tilted plane acquisition); voxel size: 2.5 mm isometric; repetition time: 1.1 s; echo time: 25 ms.
Image pre-processing included the co-registration of anatomical T1 images with mean EPI, segmentation, and normalization to a standard T1 template to allow group-level anatomical localization. Pre-processing of mb-EPI images consisted of standard spatial realignment, movement correction, reconstruction and distortion correction, and normalization using the same transformation as applied to anatomical T1 images. Normalized images were spatially smoothed using a Gaussian kernel (FWHM = 8 mm). We used the SPM12 software (Wellcome Trust Center for NeuroImaging, London, UK; www.fil.ion.ucl.ac.uk) for all pre-processing steps except for distortion correction. Distortion correction consisted of image unwarping and reconstruction using FSL software68.
Computational models
First-order inference model
The first-order inference model assumes that external contingencies remain stable over time, that is, volatility is zero and the environment remains in the same latent state z. The model therefore comprises only one level of inference, bearing upon current external contingencies (Fig. 1A). Using standard notations, the model more precisely assumes in the present protocol environment that:
-
(i)
Given chosen arm at, feedbacks rt are delivered according to a Bernoulli distribution with parameters \(\eta\) or \(1-\eta\):
\({r}_{t}|{a}_{t}\sim {\mbox{Bernoulli}}(\eta )\) if \({a}_{t}\) is the arm with reward probability \(\eta\)
\({r}_{t}|{a}_{t}\sim {\mbox{Bernoulli}}(1-\eta )\) if \({a}_{t}\) is the arm with reward probability \(1-\eta\);
(ii)
reward probability \(\eta > 0.5\) follows a Beta distribution:
\(2\eta -1\sim {\mbox{Beta}}({a}_{\eta },{b}_{\eta })\);
(iii) Priors about hyper-parameters \({a}_{\eta },{b}_{\eta }\) are uninformative:
\(({a}_{\eta }=1;{b}_{\eta }=1)\)
The corresponding inference yields, in every trial t, first-order beliefs B(t) as probability distributions over the two potential latent states (reward probability \(\eta > 0.5\) associated with arm 1 or arm 2). By marginalizing over these first-order beliefs, the model infers, given past action outcomes, the highest rewarding action in every trial t in response to stimuli.
This first-order inference model yields the theoretically optimal adaptive behavior (ideal Bayesian observer model) in stable environments, but it adapts dramatically slowly to changes in external contingencies. The counterpart is that in this model, the inferential process reduces to a very simple counting of action outcomes and is computable in closed form. Denoting \({B}_{1}(t)\) the belief that arm 1 is associated with reward probability \(\eta > 0.5\) (and arm 2 with reward probability \(1-\eta\)), and \({B}_{2}(t)=1-{B}_{1}(t)\) the converse belief, we can indeed write:
with
if in trial \(t-1,\) arm 1 is chosen and rewarded or arm 2 is chosen and unrewarded,
if in trial \(t-1,\) arm 1 is chosen and unrewarded or arm 2 is chosen and rewarded.
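Since the update equations above are not reproduced verbatim here, a minimal sketch of a first-order Bayesian belief update of this kind may help; it makes the simplifying assumption that the reward probability η is known (set to the protocol value 0.85) rather than inferred under a Beta prior, so the update reduces to a single Bayes step:

```python
# Hypothetical sketch: first-order belief update over the two latent states,
# ASSUMING reward probability eta is known. The model in the text additionally
# infers eta under a Beta prior; that layer is omitted here for brevity.

def update_belief(b1, arm, rewarded, eta=0.85):
    """Bayesian update of B1(t): belief that arm 1 carries reward probability eta.

    b1       : prior belief B1(t-1)
    arm      : chosen arm (1 or 2)
    rewarded : whether feedback r_t was 1
    """
    # Likelihood of the observed feedback under each latent state.
    if arm == 1:
        lik_state1 = eta if rewarded else 1.0 - eta       # arm 1 is the good arm
        lik_state2 = (1.0 - eta) if rewarded else eta     # arm 2 is the good arm
    else:
        lik_state1 = (1.0 - eta) if rewarded else eta
        lik_state2 = eta if rewarded else 1.0 - eta
    post = b1 * lik_state1
    return post / (post + (1.0 - b1) * lik_state2)

b = update_belief(0.5, arm=1, rewarded=True)  # evidence for latent state 1
```

A rewarded choice of arm 1 moves the belief from 0.5 toward arm 1 being the high-reward arm; an unrewarded choice moves it the other way.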
Second-order volatility inference model5
This volatility inference model assumes that environment latent states \({z}_{t}\), or external contingencies, may change over time with a volatility/probability \(\tau\) that remains constant across trials (although its estimate may vary across time). The model therefore comprises two hierarchically organized levels of inference bearing upon: (1) the volatility value \(\tau\) and (2) the successive occurrences of latent states \({z}_{t}\) determining stimulus-action-outcome contingencies, i.e., in the present protocol, the mapping between the two arms and reward probabilities (\(\eta\), \(1-\eta\)) (Fig. 1A). Using standard notations, the model thus assumes in the present protocol environment that:
(i) volatility \(\tau \in [0;0.5]\) follows a Beta distribution:
\(2\tau \sim {\mbox{Beta}}\left({a}_{\tau },{b}_{\tau }\right)\);
(ii) current latent state \({z}_{t}\) changes between trials t−1 and t with probability \(\tau\);
(iii) Given the current latent state zt and chosen arm at, feedbacks rt (=1 or 0) are delivered according to a Bernoulli distribution with parameters \(\eta\) or \(1-\eta\):
\({r}_{t}|{a}_{t},{z}_{t}\sim\) \({\mbox{Bernoulli}}\) \((\eta )\) if \({a}_{t}\) is the arm with reward probability \(\eta\)
\({r}_{t}|{a}_{t},{z}_{t}\sim {\mbox{Bernoulli}}(1-\eta )\) if \({a}_{t}\) is the arm with reward probability \(1-\eta\);
(iv) reward probability \(\eta > 0.5\) follows a Beta distribution:
\(2\eta -1\sim {\mbox{Beta}}({a}_{\eta },{b}_{\eta })\);
(v) Priors about hyper-parameters \({a}_{\tau },{b}_{\tau }\), \({a}_{\eta },{b}_{\eta }\) are uninformative:
\(({a}_{\tau }=1;{b}_{\tau }=1;{a}_{\eta }=1;{b}_{\eta }=1)\)
The corresponding nested inferences yield, in every trial t, (1) second-order beliefs about volatility \(\tau\) and (2) first-order beliefs B(t) about current latent states or external contingencies. By marginalizing over these first-order beliefs, the model infers, given past action outcomes, the highest rewarding action in every trial t in response to stimuli.
The model yields the theoretically optimal adaptive behavior (ideal Bayesian observer model) in constant-volatility environments. In this model, however, computing posterior beliefs about latent states \({z}_{t}\) and latent parameters \(\tau\), \(\eta\) is an intractable problem. We addressed this issue by using a sequential Monte Carlo (SMC) algorithm recently developed in machine learning to solve this class of inferential models comprising both latent states and parameters31. The algorithm is based on particle filtering methods and converges to the exact solution as the number of sampling particles increases to infinity33. The algorithm comprises two intermixed SMC procedures: (1) a particle filter34 implementing iterated Importance Sampling in the space of latent states \({z}_{t}\); (2) an iterated Importance Sampling combined with a particle Markov Chain Monte Carlo method34,69 in the parameter space \(\eta,\tau\). We implemented the algorithm using a total number of \(10^{6}\) particles, corresponding to \(10^{3}\) samples in the parameter space, each associated with \(10^{3}\) particles in the space of latent states. We verified that this number allows approaching the asymptotic convergence: we implemented the algorithm using \(4\times 10^{6}\) particles and obtained virtually identical posterior beliefs.
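The state-level particle filter at the core of this procedure can be sketched in toy form, with parameters \(\tau\) and \(\eta\) held fixed (the full SMC algorithm of ref. 31 additionally samples the parameter space; all names and values below are illustrative):

```python
import random

# Toy bootstrap particle filter over the latent state z_t in {1, 2}, with
# volatility tau and reward probability eta held FIXED. This is only an
# illustration of the state-level filter, not the study's full SMC^2 algorithm.

def particle_filter(actions, rewards, n_particles=2000, tau=0.1, eta=0.85, seed=0):
    rng = random.Random(seed)
    particles = [rng.choice((1, 2)) for _ in range(n_particles)]
    beliefs = []  # posterior P(z_t = 1 | data up to trial t)
    for a, r in zip(actions, rewards):
        # Propagate: each particle's state switches with probability tau.
        particles = [3 - z if rng.random() < tau else z for z in particles]

        # Weight by feedback likelihood P(r_t | a_t, z_t).
        def lik(z):
            p = eta if a == z else 1.0 - eta  # chosen arm is the good arm?
            return p if r == 1 else 1.0 - p

        weights = [lik(z) for z in particles]
        total = sum(weights)
        beliefs.append(sum(w for z, w in zip(particles, weights) if z == 1) / total)
        # Multinomial resampling.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return beliefs

# Four consecutive rewarded choices of arm 1 drive P(z = 1) close to 1.
beliefs = particle_filter([1, 1, 1, 1], [1, 1, 1, 1])
```

With volatility τ > 0, the propagation step continually leaks probability toward the alternative state, which is what lets the filter re-adapt quickly after a contingency reversal.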
Third-order volatility inference model4
This volatility inference model assumes that, consistent with varying-volatility environments, volatility \({\tau }_{t}\) varies across trials t as a bounded Gaussian random walk with (unknown) variance ν. The model comprises three hierarchically organized levels of inference bearing upon: (1) the volatility change rate ν; (2) the successive volatility values \({\tau }_{t}\); and (3) the successive occurrences of latent states \({z}_{t}\) determining stimulus-action-outcome contingencies, i.e., in the present protocol, the mapping between the two arms and reward probabilities (\(\eta\), \(1-\eta\)) (Fig. 1A). Using standard notations, the model more precisely assumes in the present protocol environment that:
(i) volatility \({\tau }_{t}\) varies as a bounded Gaussian random walk within the range [0;0.5], with variance \(\nu\) following an Inverse-Gamma distribution:
\(\nu \sim {\mbox{Inverse-Gamma}}\left({a}_{v},{b}_{v}\right)\);
(ii) current latent state \({z}_{t}\) changes between trials t−1 and t with probability \({\tau }_{t}\);
(iii) Given the current latent state \({z}_{t}\) and chosen arm at, feedbacks rt (=1 or 0) are delivered according to a Bernoulli distribution with parameters \(\eta\) or \(1-\eta\):
\({r}_{t}|{a}_{t},{z}_{t}\sim\) \({\mbox{Bernoulli}}\) \((\eta )\) if \({a}_{t}\) is the arm with reward probability \(\eta\)
\({r}_{t}|{a}_{t},{z}_{t}\sim\) \({\mbox{Bernoulli}}\) \((1-\eta )\) if \({a}_{t}\) is the arm with reward probability \(1-\eta\);
(iv) reward probability \(\eta > 0.5\) follows a Beta distribution:
\(2\eta -1\sim\) \({\mbox{Beta}}\) \(({a}_{\eta },{b}_{\eta })\).
(v) Priors about hyper-parameters \({a}_{v},{b}_{v}\), \({a}_{\eta },{b}_{\eta }\) are uninformative:
\(({a}_{v}=3;{b}_{v}=0.001;{a}_{\eta }=1;{b}_{\eta }=1)\)
The corresponding nested inferences yield, in every trial t, (1) third-order beliefs about volatility change rate \(\nu\), (2) second-order beliefs about volatility \({\tau }_{t}\), and finally (3) first-order beliefs B(t) about current latent states or external contingencies. By marginalizing over these first-order beliefs, the model infers, given past action outcomes, the highest rewarding action in every trial t in response to stimuli.
The model yields the theoretically optimal adaptive behavior (ideal Bayesian observer model) in varying-volatility environments with no prior knowledge about how volatility varies across time. In this model, however, computing posterior beliefs about latent states \({z}_{t},{\tau }_{t}\), and latent parameters \(\nu\), \(\eta\) is again an intractable problem. We addressed this issue by using the same particle-filtering procedure as described above for the second-order volatility inference model.
Weber-variability model
The Weber-variability model is based on the first-order inference model: it involves the same first-order inferential process but further postulates that neural variability induces computational imprecisions increasing the entropy/variance in belief updating (Fig. 1A). In agreement with Weber's law28,29,35, computational imprecisions \({\epsilon }_{t}\) at time point t are modeled as a random variable scaling with the magnitude \({d}_{t}\) of changes in first-order beliefs:
\({\epsilon }_{t}=(\mu+\lambda {d}_{t})\cdot {u}_{t}\)
where \({u}_{t}\) is a random variable uniformly distributed over the range [0;1], and μ and λ are non-negative free parameters quantifying the constant and Weber components of computational imprecisions, respectively. The Weber component makes the increase of belief entropy consistent with volatility inference models: the more volatile the environment, the more belief entropy increases. We denote \({\widetilde{B}}_{1}(t)\) and \({\widetilde{B}}_{2}\left(t\right)=1-{\widetilde{B}}_{1}(t)\) the corrupted first-order beliefs undergoing computational imprecisions \({\epsilon }_{t}\). To evaluate \({\widetilde{B}}_{1}(t)\), we first need to estimate computational imprecisions \({\epsilon }_{t}\). For that purpose, we quantified the magnitude of changes in first-order beliefs \({d}_{t}\) as the Jensen-Shannon distance between \({\widetilde{B}}_{1}\left(t-1\right)\) and the beliefs \({B}_{1}^{{\prime} }\left(t\right)\) that would be inferred from \({\widetilde{B}}_{1}\left(t-1\right)\) through the exact first-order inference process described above: \({d}_{t}={JS}[{\widetilde{B}}_{1}(t-1),{B}_{1}^{{\prime} }(t)\,]\), a measure corresponding to the notion of Bayesian surprise36. Computational imprecisions \({\epsilon }_{t}\) corrupt this exact inference process so that, compared to uncorrupted beliefs \({B}_{1}^{{\prime} }\left(t\right)\), the resulting corrupted beliefs \({\widetilde{B}}_{1}(t)\) exhibit an increased entropy or, equivalently, an increased variance varying with \({\epsilon }_{t}\), while the mean remains unchanged:
Modeling corrupted beliefs \({\widetilde{B}}_{1}\left(t\right)\) as following a Beta distribution \({\mbox{Beta}}\left[\widetilde{a}(t),\widetilde{b}(t)\right]\) with \(\widetilde{a}\left(t\right) > 1\) and \(\widetilde{b}\left(t\right) > 1\), we computed its parameters \(\widetilde{a}(t),\widetilde{b}(t)\) from its mean and variance given by Eqs. (3), (4) by inverting the two following mathematical identities: mean \(=\widetilde{a}(t)/[\widetilde{a}(t)+\widetilde{b}(t)]\) and variance \(=\widetilde{a}(t)\widetilde{b}(t)/\{{[\widetilde{a}(t)+\widetilde{b}(t)]}^{2}\,[\widetilde{a}(t)+\widetilde{b}(t)+1]\}\).
Thus, we fully modeled the corrupted first-order inference process undergoing computational imprecisions \({\epsilon }_{t}\) consistent with Weber’s law. We refer to this model as the Weber-variability model. Note that this inference process remains the same and as simple as the first-order inference model but undergoes computational imprecisions \({\epsilon }_{t}\), which are stochastic through the random component \({u}_{t}\). Importantly, these computational imprecisions we quantified for the sake of modeling are presumably not computed but endured by the inference process. Given free parameters μ and λ, simulating the model is straightforward through Eqs. (1–6) and by drawing a random value from \({u}_{t}\) at each time step. Priors about parameters μ and λ are uninformative (i.e., uniform) and vary over the range [0;1].
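The ingredients of this model, namely the Jensen-Shannon update magnitude \(d_t\), the imprecision \(\epsilon_t=(\mu+\lambda d_t)\,u_t\), and the Beta moment-matching step, can be sketched as follows (values and names are illustrative assumptions; the variance-inflation rule of Eqs. (3), (4) is not reproduced):

```python
import math
import random

# Illustrative sketch of the Weber-variability ingredients. Beliefs over the
# two latent states are summarized by a single probability p = B1(t).

def js_distance(p, q):
    """Jensen-Shannon distance between Bernoulli(p) and Bernoulli(q)."""
    def kl(x, y):
        return sum(a * math.log(a / b)
                   for a, b in zip((x, 1 - x), (y, 1 - y)) if a > 0)
    m = 0.5 * (p + q)
    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

def beta_from_moments(mean, var):
    """Invert the Beta identities mean = a/(a+b) and
    var = a*b / ((a+b)**2 * (a+b+1)) to recover (a, b)."""
    s = mean * (1 - mean) / var - 1  # s = a + b
    return mean * s, (1 - mean) * s

rng = random.Random(1)
mu, lam = 0.05, 0.5                      # constant and Weber components (illustrative)
p_prev, p_new = 0.60, 0.85               # beliefs before/after an exact update
d_t = js_distance(p_prev, p_new)         # Bayesian-surprise-like update magnitude
eps_t = (mu + lam * d_t) * rng.random()  # epsilon_t = (mu + lambda*d_t) * u_t

# Moment-matching: recover Beta parameters from a target mean and variance.
a, b = beta_from_moments(0.85, 0.01)
```

The moment-matching step is exact: plugging the recovered (a, b) back into the Beta mean/variance identities returns the target moments.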
Reinforcement learning models
To assess the presence of first-order inferential processes in adaptive behavior, we also considered the standard, non-inferential adaptive model, namely the Pearce-Hall Reinforcement Learning (PH-RL) model with computational imprecision. The model combines Pearce-Hall's learning rule37,38 with, similarly to the Weber-variability model, noisy updates scaling with the unsigned reward prediction error. In each trial, following feedback \({r}_{t}\), the model updates action values \({\mbox{Q}}(a)\) according to the following noisy updating rule:
where \({a}_{t}\) is the chosen action (\({{\mbox{Q}}}_{t}\left(a\right)\) remains unchanged for \(a\ne {a}_{t}\)). \(N(0,\zeta )\) denotes the zero-centered Gaussian distribution with standard deviation \(\zeta\) modeling the stochasticity of computational imprecision. Free parameters included: α and \({\alpha }_{{PH}}\), quantifying the constant and adjustable components of the learning rate, respectively, and ζ, quantifying the stochasticity of computational imprecision. We also fitted its reduced, nested models: the exact PH-RL model \((\zeta=0)\); the noisy standard Rescorla-Wagner RL (RW-RL) model \(({\alpha }_{{PH}}=0)\); and the exact standard RW-RL model \((\zeta=0;{\alpha }_{{PH}}=0)\). Bayesian model comparison showed that among these four RL models, the noisy standard RW-RL model best fit participants' behavior in the present protocol (Supplementary Fig. 2). Priors about parameters α, \({\alpha }_{{PH}}\), and \(\zeta\) are uninformative (i.e., uniform) over the range [0;1].
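A hedged sketch of a noisy Pearce-Hall update of the kind described above (the exact parameterization used in the study is not reproduced; this assumes one common hybrid form in which the learning rate has a constant part plus a part scaling with the unsigned prediction error):

```python
import random

# Hypothetical sketch of a noisy Pearce-Hall update: the learning rate combines
# a constant component (alpha) with a component scaling with the unsigned
# reward prediction error (alpha_ph), and the update of the CHOSEN action's
# value is corrupted by zero-mean Gaussian noise N(0, zeta).

def noisy_ph_update(q, r, alpha, alpha_ph, zeta, rng):
    delta = r - q                                 # reward prediction error
    lr = alpha + alpha_ph * abs(delta)            # Pearce-Hall learning rate
    return q + lr * delta + rng.gauss(0.0, zeta)  # noisy value update

rng = random.Random(0)
q = noisy_ph_update(0.5, r=1.0, alpha=0.2, alpha_ph=0.3, zeta=0.0, rng=rng)
```

With ζ = 0 the update is deterministic, which recovers the exact PH-RL nested model mentioned above; setting α_PH = 0 as well recovers a plain Rescorla-Wagner step.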
Decision variable and policy
Decision variable
The decision variable DV driving choices in every trial is the difference between reward expectations ER1 and ER2 associated with each option. For RL models, the decision variable is simply the difference in action values \({{\mbox{Q}}}_{t}\left(a\right)\) between actions. For the inference models, reward expectations ER can be computed by marginalizing over first-order beliefs and written as follows (using the notations from the Weber variability model):
Decision variable DV then writes as follows:
In the present protocol, \(\eta=0.85\) and its estimates remained close to this value. Accordingly, the decision variable scales with the difference in first-order beliefs between options.
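The marginalization yielding reward expectations can be sketched as follows, treating the estimate of η as fixed at the protocol value 0.85 (a simplifying assumption for illustration):

```python
# Sketch of reward expectations obtained by marginalizing over first-order
# beliefs, with eta treated as fixed at 0.85 (illustrative simplification).

def reward_expectations(b1, eta=0.85):
    """b1: belief that arm 1 is the high-reward arm."""
    er1 = b1 * eta + (1 - b1) * (1 - eta)   # expected reward of arm 1
    er2 = (1 - b1) * eta + b1 * (1 - eta)   # expected reward of arm 2
    return er1, er2

er1, er2 = reward_expectations(0.8)
dv = er1 - er2  # decision variable
```

The difference factorizes as DV = (2B1 − 1)(2η − 1), which is why the decision variable scales with the difference in first-order beliefs between options.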
Alternatively, Rouault et al.21 have proposed that reward expectations ER are the weighted sum of reward magnitudes normalized across choice options (i.e., 1/2 here for each option) and state beliefs \(\widetilde{B}\left(t\right)\) about which option is the more frequently rewarded. In the present protocol, reward expectations then write as follows:
where \(\omega\) is the weighting parameter. Decision variable DV then writes as follows:
Again, the decision variable scales with the difference in first-order beliefs between options. The present protocol is thus agnostic about these two assumptions regarding the decision variable.
Decision policy
For all models, we modeled action selection with a softmax decision policy over the decision variable (free parameter: temperature \(1/\beta\)) with a uniform prior over the range [0;1]40. We also investigated choice history biases in action selection that were previously documented45 and that might have potentially altered model comparison results. For that purpose, we modeled two variants of choice history biases: (1) a repetition effect biasing the softmax rule toward the last selected option with one additional free parameter (repetition bias); (2) a choice trace effect biasing the softmax rule toward the most frequently chosen option over recent trials (two additional free parameters: \(\gamma\) scaling the exponential decay over time and \(\theta\) scaling the bias weight in the softmax rule): bias toward option i in trial t \(={{{\rm{\theta }}}}{f}_{t}^{i}\) with \({f}_{t}^{i}=1-\gamma+\gamma {f}_{t-1}^{i}\) if option i was chosen in trial t−1, otherwise \({f}_{t}^{i}=\gamma {f}_{t-1}^{i}\). None of these biases was found to alter the results (Fig. 3A).
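The decision policy above can be sketched as a softmax rule with an optional bias term, together with the choice-trace update \(f_t^i=1-\gamma+\gamma f_{t-1}^i\) (if chosen, else \(f_t^i=\gamma f_{t-1}^i\)); the parameterization details are illustrative assumptions:

```python
import math

# Sketch of the softmax decision policy with optional additive biases (e.g.,
# theta * f_t^i for the choice-trace variant). Names are illustrative.

def softmax_policy(dv, beta, bias1=0.0, bias2=0.0):
    """P(choose option 1) given decision variable dv = ER1 - ER2."""
    x1 = beta * dv / 2 + bias1
    x2 = -beta * dv / 2 + bias2
    return 1.0 / (1.0 + math.exp(x2 - x1))

def update_trace(f, chosen, gamma):
    """Choice trace: f_t = gamma * f_{t-1} (+ 1 - gamma if the option was chosen)."""
    return gamma * f + (1.0 - gamma if chosen else 0.0)

p = softmax_policy(dv=0.4, beta=5.0)
```

A positive bias on one option shifts choice probabilities toward it without changing the underlying decision variable, which is how the repetition and choice-trace effects enter the policy.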
Bayesian model comparisons
Model fits to human data were compared based on exact Model Posterior Probabilities (MPPs), i.e., the model marginal likelihoods given the data with uniform priors over models. MPPs based on marginal likelihoods are the optimal Bayesian quantification for comparing models, balancing model degrees of freedom (or complexity) against adequacy to the data. Model marginal likelihoods are obtained by marginalizing model likelihoods over the whole free-parameter space. This marginalization was carried out through standard Importance Sampling and Quasi-Monte Carlo methods41.
Marginalizing model likelihoods over the whole free-parameter space requires first computing model likelihoods given free-parameter values. For the second- and third-order volatility inference models that we emulated with particle filtering (see above), computing model likelihoods given human data is straightforward: for each model, the particle filter emulates the corresponding exact inferential process, so that model posterior beliefs, and consequently the model softmax probability of selecting participants' chosen actions, do not depend on the stochasticity of particle filtering. Thus, given inverse temperature \(\beta\), each model directly provides its likelihood given human data.
For the Weber-variability model, Weber variability \({\epsilon }_{t}\) makes the model diverge from the exact first-order inference process. As a result, model posterior beliefs, and consequently the model softmax probability of selecting participants' chosen actions, depend upon the actual realization of computational imprecisions \({\epsilon }_{t}\) (through its stochastic component \({u}_{t}\)). Given free parameters, computing the model likelihood given human data then requires marginalizing over realizations of computational imprecision \({\epsilon }_{t}\). We solved this marginalization problem by noting that the model actually defines a hidden Markov chain generating actions. We therefore computed the marginalization using established sequential Monte Carlo methods for hidden Markov models33. Note that this marginalization procedure again optimally balances the adequacy to data and the additional degrees of freedom resulting from the various possible realizations of computational imprecision \({\epsilon }_{t}\). The same method was used for reinforcement learning models comprising stochastic noise \(N(0,\zeta )\) (see above).
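The marginalization over noise realizations can be illustrated with a toy plain Monte Carlo estimator (the study uses sequential Monte Carlo methods, which are far more efficient; the function and toy model below are purely illustrative):

```python
import math
import random

# Toy illustration of marginalizing a likelihood over noise realizations:
# p(a_1:T) ~= (1/M) * sum_m prod_t p(a_t | u_1:t^(m)),
# where each m indexes one random draw of the noise trajectory u_1:T.

def mc_marginal_likelihood(choice_probs, n_samples=500, seed=0):
    """choice_probs(rng) -> list of per-trial probabilities of the observed
    actions under ONE random realization of the noise u_t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += math.prod(choice_probs(rng))
    return total / n_samples

# Toy 3-trial 'model' whose per-trial choice probability is perturbed by
# uniform noise u_t (purely illustrative, not the study's model).
def toy_model(rng):
    return [min(0.95, 0.7 + 0.1 * rng.random()) for _ in range(3)]

lik = mc_marginal_likelihood(toy_model)
```

Plain Monte Carlo of this kind degrades rapidly as the number of trials grows, which is why the study relies on sequential Monte Carlo over the hidden Markov chain instead.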
Model recovery procedure
We implemented a model recovery procedure to assess the ability of our experimental protocol to properly discriminate the models44. The recovery procedure consists of generating synthetic choice data by simulating the performance of every model of interest in the task, then applying our model comparison procedure (see above) to these data. To be conclusive, the procedure should lead to selecting the simulated model. For every model, we generated 22 sets of synthetic data, each corresponding to the best-fitting free parameters computed for one participant. The results (Fig. 2) show that the recovery procedure was conclusive, thereby validating the protocol's ability to properly discriminate the models.
Deriving best-fitting realizations \({U}_{{fit}}^{t}\) of stochastic component \({u}_{t}\) and Weber variability \({\epsilon }_{t}^{{fit}}\)
To analyze Weber variability (Fig. 3) and to conduct the model-based fMRI analyses (Figs. 4–6), we estimated the most likely, trial-by-trial realizations \({U}_{{fit}}^{t}\) of its stochastic component \({u}_{t}\), given the actions performed by human participants and the feedback they received. To generate these realizations (usually referred to as smoothing realizations), we used the particle Markov Chain Monte Carlo method known as the particle independent Metropolis-Hastings sampler33. The method allows obtaining smoothed realizations of stochastic distributions given fixed parameter values. We applied the method using the best-fitting model parameters to obtain N = 1000 samples of the posterior distributions \(p({u}_{t}|{a}_{1:T},{r}_{1:T},{\mu }_{{fit}},{\lambda }_{{fit}},{\beta }_{{fit}})\) for each participant, where T is the total number of trials. We thus generated for each participant: N smoothing realizations \({u}_{t}^{i}\) of stochastic component \({u}_{t}\), N estimates \({d}_{t}^{i}\) of belief-update magnitudes, and N smoothing realizations of Weber variability \({\epsilon }_{t}^{i}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t}^{i})\cdot {u}_{t}^{i}\), with \(i=1:1000\). We then averaged/marginalized over these N samples to obtain, for each participant, the most likely trial-by-trial stochastic component \({U}_{{fit}}^{t}\), Weber variability \({\epsilon }_{t}^{{fit}}=({\mu }_{{fit}}+{\lambda }_{{fit}}{d}_{t})\cdot {U}_{{fit}}^{t}\), and the resulting corrupted first-order beliefs, which were used to analyze Weber variability and to form regressors in model-based fMRI analyses.
fMRI data analyses
All fMRI data analyses were conducted using the SPM12 software (Wellcome Trust Center for NeuroImaging, London, UK; www.fil.ion.ucl.ac.uk). Statistical parametric maps of local brain activations were computed in every participant using the standard general linear model (GLM). The model included separate event-related regressors estimating BOLD responses at stimulus and feedback onsets (i.e., at choice and outcome time), obtained by convolving series of delta functions with the canonical hemodynamic response function (HRF). Nuisance regressors included trials with no responses, the six motion parameters from the realignment procedure, and regressors modeling each scanning run. Event-related regressors of interest were parametrically modulated by variables derived from computational models fitted to behavioral data; these regressors were generated for each participant using individual best-fitting parameters. All parametric modulations were z-scored to ensure between-subject and between-regressor comparability of regression coefficients. All GLMs were performed in full variance with no orthogonalization procedures, so that all shared variance across regressors was placed in residuals and observed activations were specific to each parametric modulation.
Following the standard SPM method, second-level parametric maps were then obtained for each contrast over the group of participants, with a voxel-wise significance threshold set at p < 0.001 (uncorrected) and a cluster-wise significance threshold set at p < 0.05, corrected for family-wise errors in multiple comparisons. We removed selection biases from all post-hoc analyses performed on activation clusters using the standard leave-one-out procedure70: for every GLM, the partial correlation coefficients (betas) of each participant were averaged over the activation clusters identified in the N−1 remaining participants (using the significance thresholds indicated above); these coefficients were then entered in post-hoc analyses across the sample of N participants.
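A schematic sketch of the leave-one-out averaging step (cluster identification from the N−1 remaining participants is abstracted into a user-supplied function; all names are illustrative):

```python
# Sketch of the leave-one-out de-biasing step: each participant's betas are
# averaged over a cluster defined from the N-1 remaining participants.
# Cluster identification itself (thresholding, contiguity) is abstracted away.

def leave_one_out_betas(betas_per_subject, cluster_from_others):
    """betas_per_subject: per-subject dicts mapping voxel -> beta.
    cluster_from_others: function(list of other subjects' dicts) -> voxel set."""
    out = []
    for i, betas in enumerate(betas_per_subject):
        others = betas_per_subject[:i] + betas_per_subject[i + 1:]
        cluster = cluster_from_others(others)          # defined WITHOUT subject i
        out.append(sum(betas[v] for v in cluster) / len(cluster))
    return out

# Toy data: three subjects, two voxels; trivially take all voxels as 'cluster'.
subjects = [{0: 1.0, 1: 2.0}, {0: 3.0, 1: 4.0}, {0: 5.0, 1: 6.0}]
loo = leave_one_out_betas(subjects, lambda others: {0, 1})  # [1.5, 3.5, 5.5]
```

Because each participant's betas never contribute to defining their own cluster, the resulting averages are free of circular selection bias.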
To compare competitive models of fMRI activity, we further performed Bayesian Model Selection (BMS) based on computing model posterior probabilities given the data (i.e., mean activity over activation clusters identified as indicated above) to derive model Exceedance Probabilities42,43. This BMS is the optimal method for model selection given potential inter-individual differences. For post-hoc Bayesian model selection, the same leave-one-out procedure as described above was used for removing any selection biases.
fMRI signature of choice computations
To investigate brain regions involved in choice computations, we exactly followed the approach described in ref. 21. In the present protocol, the decision variable driving choices (differences in reward expectations between options) from inference models scales with the relative difference in first-order beliefs between choice options (see “Decision variable and policy” section). We referred to the beliefs derived from the best-fitting Weber-variability model as corrupted beliefs. Accordingly, we built the General Linear Model (GLM), including, along with response times (RTs), the two following regressors at choice time:
The chosen-relative-to-unchosen corrupted belief \(({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})\). This regressor aims at capturing the neural demand to make actual choices in each trial. This demand decreases when the belief associated with the chosen-relative-to-the-unchosen option increases. A region involved in choice computations is therefore expected to correlate negatively with \(({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})\).
The squared difference between corrupted beliefs \({({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})}^{2}\). This regressor is independent of actual choices and aims at capturing the neural demand associated with encoding the decision variable driving choices. This neural demand increases with the information conveyed by the decision variable, i.e., when the beliefs increasingly differ between the two options, or equivalently, when \({({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})}^{2}\) increases. The quadratic expansion is used here as more closely capturing belief entropy than the absolute modulus (using the absolute modulus actually led to virtually the same results). A region encoding the decision variable is therefore expected to correlate positively with \({({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})}^{2}\).
Thus, activations reflecting choice computations from corrupted beliefs are expected to vary with RTs and independently, to jointly exhibit both a negative linear effect of \(({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})\) and, when this effect of actual choice computations is factored out, a positive quadratic effect of \({({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})}^{2}\) reflecting the encoding of the decision variable driving choice computations. We therefore performed the full variance GLM regression including \(({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})\) and \({({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})}^{2}\) along with RTs as regressors in order to identify brain regions involved in computing choices from first-order beliefs undergoing Weber variability.
Note that mathematically, the linear and quadratic regressors are statistically independent (no shared variance), provided that choices randomly sample the belief space. This is, of course, not the case, as participants' choices were evidently strongly biased toward positive values of \(({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})\). The linear and quadratic regressors therefore share only the variance resulting from such sampling biases. A useful feature of full variance regression analyses, however, is to assign the shared variance between regressors to regression residuals. The regression coefficients associated with the linear and quadratic regressors are therefore uncontaminated by any choice sampling biases. Yet, activations varying with \(a({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})+b{({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}})}^{2}\) might be interpreted as varying with \({({\widetilde{B}}_{{ch}}-{\widetilde{B}}_{{unch}}-k)}^{2}\), with k denoting a non-zero constant (indeed, \({(X-k)}^{2}=-2{kX}+{X}^{2}+{k}^{2}\)). Such a non-zero constant k, however, has no meaning here. Consequently, the regression coefficients associated with the linear and quadratic regressors reflect the presence of separate linear and quadratic effects associated with the computation of actual choice and the encoding of the decision variable, respectively (see also ref. 48 showing dissociations between the linear and quadratic effects within the same region using distinct decision situations).
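The algebraic identity invoked above can be checked numerically in a few lines:

```python
# Numeric check of the identity (X - k)^2 = -2kX + X^2 + k^2, i.e., a joint
# linear + quadratic effect is equivalent to a quadratic effect around a
# shifted center k.

k = 0.3
for x in (-1.0, -0.2, 0.0, 0.5, 1.0):
    lhs = (x - k) ** 2
    rhs = -2 * k * x + x ** 2 + k ** 2
    assert abs(lhs - rhs) < 1e-12
```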
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Source data are provided with this paper. The source fMRI statistical maps used in this study are available in the Neurovault.org database under accession code https://neurovault.org/collections/GEGBYGXJ/.
Code availability
Computer codes are freely available at https://github.com/csmfindling/Volnoise.
References
Paulsson, J. Summing up the noise in gene networks. Nature 427, 415–418 (2004).
Faisal, A. A., Selen, L. P. & Wolpert, D. M. Noise in the nervous system. Nat. Rev. Neurosci. 9, 292–303 (2008).
Waschke, L., Kloosterman, N. A., Obleser, J. & Garrett, D. D. Behavior needs neural variability. Neuron 109, 751–766 (2021).
Behrens, T. E., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).
Payzan-LeNestour, E. & Bossaerts, P. Risk, unexpected uncertainty, and estimation uncertainty: Bayesian learning in unstable settings. PLoS Comput. Biol. 7, e1001048 (2011).
Soltani, A. & Izquierdo, A. Adaptive learning under expected and unexpected uncertainty. Nat. Rev. Neurosci. 20, 635–644 (2019).
Gallistel, C. R., Mark, T. A., King, A. P. & Latham, P. E. The rat approximates an ideal detector of changes in rates of reward: implications for the law of effect. J. Exp. Psychol. Anim. Behav. Process 27, 354–372 (2001).
Aston-Jones, G. & Cohen, J. D. Adaptive gain and the role of the locus coeruleus-norepinephrine system in optimal performance. J. Comp. Neurol. 493, 99–110 (2005).
Yu, A. & Dayan, P. Uncertainty, neuromodulation, and attention. Neuron 46, 681–692 (2005).
Kennerley, S. W., Walton, M. E., Behrens, T. E., Buckley, M. J. & Rushworth, M. F. Optimal decision making and the anterior cingulate cortex. Nat. Neurosci. 9, 940–947 (2006).
Jepma, M. & Nieuwenhuis, S. Pupil diameter predicts changes in the exploration-exploitation trade-off: evidence for the adaptive gain theory. J. Cogn. Neurosci. 23, 1587–1596 (2011).
Karlsson, M. P., Tervo, D. G. & Karpova, A. Y. Network resets in medial prefrontal cortex mark the onset of behavioral uncertainty. Science 338, 135–139 (2012).
Nassar, M. R. et al. Rational regulation of learning dynamics by pupil-linked arousal systems. Nat. Neurosci. 15, 1040–1046 (2012).
Payzan-LeNestour, E., Dunne, S., Bossaerts, P. & O’Doherty, J. P. The neural representation of unexpected uncertainty during value-based decision making. Neuron 79, 191–201 (2013).
Tervo, D. G. et al. Behavioral variability through stochastic choice and its gating by anterior cingulate cortex. Cell 159, 21–32 (2014).
McGuire, J. T., Nassar, M. R., Gold, J. I. & Kable, J. W. Functionally dissociable influences on learning rate in a dynamic environment. Neuron 84, 870–881 (2014).
Nassar, M. R., McGuire, J. T., Ritz, H. & Kable, J. W. Dissociable forms of uncertainty-driven representational change across the human brain. J. Neurosci. 39, 1688–1698 (2019).
Tervo, D. G. R. et al. The anterior cingulate cortex directs exploration of alternative strategies. Neuron 109, 1876–1887.e1876 (2021).
Shenhav, A., Botvinick, M. M. & Cohen, J. D. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron 79, 217–240 (2013).
Fouragnan, E. F. et al. The macaque anterior cingulate cortex translates counterfactual choice value into actual behavioral change. Nat. Neurosci. 22, 797–808 (2019).
Rouault, M., Drugowitsch, J. & Koechlin, E. Prefrontal mechanisms combining rewards and beliefs in human decision-making. Nat. Commun. 10, 301 (2019).
Monosov, I. E., Haber, S. N., Leuthardt, E. C. & Jezzini, A. Anterior cingulate cortex and the control of dynamic behavior in primates. Curr. Biol. 30, R1442–R1454 (2020).
Domenech, P., Rheims, S. & Koechlin, E. Neural mechanisms resolving exploitation-exploration dilemmas in the medial prefrontal cortex. Science 369, eabb0184 (2020).
Wilson, R. C., Nassar, M. R. & Gold, J. I. A mixture of delta-rules approximation to bayesian inference in change-point problems. PLoS Comput. Biol. 9, e1003150 (2013).
Bossaerts, P., Yadav, N. & Murawski, C. Uncertainty and computational complexity. Philos. Trans. R. Soc. Lond. B Biol. Sci. 374, 20180138 (2019).
Findling, C., Skvortsova, V., Dromnelle, R., Palminteri, S. & Wyart, V. Computational noise in reward-guided learning drives behavioral variability in volatile environments. Nat. Neurosci. 22, 2066–2077 (2019).
Deco, G. & Rolls, E. T. Decision-making and Weber’s law: a neurophysiological model. Eur. J. Neurosci. 24, 901–916 (2006).
Fechner, G. T. Elemente der Psychophysik (Breitkopf and Härtel, 1860).
Laming, D. Weber’s law. In: Inside Psychology: A Science Over 50 Years (ed. Rabbitt, P.) (Oxford Univ. Press, 2009).
Findling, C., Chopin, N. & Koechlin, E. Imprecise neural computations as a source of adaptive behaviour in volatile environments. Nat. Hum. Behav. 5, 99–112 (2021).
Chopin, N., Jacob, P. E. & Papaspiliopoulos, O. SMC²: an efficient algorithm for sequential analysis of state space models. J. R. Stat. Soc. B 75, 397–426 (2013).
Doucet, A., Godsill, S. & Andrieu, C. On sequential Monte Carlo sampling methods for Bayesian filtering. Statist. Comput. 10, 197–208 (2000).
Andrieu, C., Doucet, A. & Holenstein, R. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. B 72, 269–342 (2010).
Chopin, N. A sequential particle filter for static models. Biometrika 89, 539–552 (2002).
Deco, G., Scarano, L. & Soto-Faraco, S. Weber’s law in decision making: integrating behavioral data in humans with a neurophysiological model. J. Neurosci. 27, 11192–11200 (2007).
Itti, L. & Baldi, P. F. Bayesian surprise attracts human attention. Vis. Res. 49, 1295–1306 (2009).
Rescorla, R. A. & Wagner, A. R. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In Classical Conditioning II (eds Black, A. H. & Prokasy, W. F.) (Appleton-Century-Crofts, 1972).
Pearce, J. M. & Hall, G. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol. Rev. 87, 532–552 (1980).
Roesch, M., Esber, G. R., Li, J., Daw, N. & Schoenbaum, G. Surprise! Neural correlates of Pearce-Hall and Rescorla-Wagner coexist within the brain. Eur. J. Neurosci. 35, 1190–1200 (2012).
Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B. & Dolan, R. J. Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006).
Niederreiter, H. Random Number Generation and Quasi-Monte Carlo Methods (SIAM, 1992).
Rigoux, L., Stephan, K. E., Friston, K. J. & Daunizeau, J. Bayesian model selection for group studies - revisited. Neuroimage 84, 971–985 (2014).
Stephan, K. E., Penny, W. D., Daunizeau, J., Moran, R. J. & Friston, K. J. Bayesian model selection for group studies. Neuroimage 46, 1004–1017 (2009).
Palminteri, S., Wyart, V. & Koechlin, E. The importance of falsification in computational cognitive modeling. Trends Cogn. Sci. 21, 425–433 (2017).
Urai, A. E., de Gee, J. W., Tsetsos, K. & Donner, T. H. Choice history biases subsequent evidence accumulation. eLife 8, e46331 (2019).
Esterman, M. & Rothlein, D. Models of sustained attention. Curr. Opin. Psychol. 29, 174–180 (2019).
Esterman, M., Tamber-Rosenau, B. J., Chiu, Y. C. & Yantis, S. Avoiding non-independence in fMRI data analysis: leave one subject out. Neuroimage 50, 572–576 (2010).
Duverne, S. & Koechlin, E. Rewards and cognitive control in the human prefrontal cortex. Cereb. Cortex 27, 5024–5039 (2017).
Vassena, E., Deraeve, J. & Alexander, W. H. Surprise value and control in anterior cingulate cortex during speeded decision-making. Nat. Hum. Behav. 4, 412–422 (2020).
Brockett, A. T. & Roesch, M. R. Anterior cingulate cortex and adaptive control of brain and behavior. Int. Rev. Neurobiol. 158, 283–309 (2021).
Shi, L. & Griffiths, T. L. Neural implementation of hierarchical Bayesian inference by importance sampling. Adv. Neural Inf. Process. Syst. 22, 1669–1677 (2009).
Huang, Y. & Rao, R. P. Neurons as Monte Carlo samplers: Bayesian inference and learning in spiking networks. Adv. Neural Inf. Process. Syst. 27, 1943–1951 (2014).
Legenstein, R. & Maass, W. Ensembles of spiking neurons with noise support optimal probabilistic inference in a dynamically changing environment. PLoS Comput Biol. 10, e1003859 (2014).
Stalnaker, T. A., Cooch, N. K. & Schoenbaum, G. What the orbitofrontal cortex does not do. Nat. Neurosci. 18, 620–627 (2015).
Chan, S. C., Niv, Y. & Norman, K. A. A probability distribution over latent causes, in the orbitofrontal cortex. J. Neurosci. 36, 7817–7828 (2016).
Schuck, N. W., Cai, M. B., Wilson, R. C. & Niv, Y. Human orbitofrontal cortex represents a cognitive map of state space. Neuron 91, 1402–1412 (2016).
Walton, M. E., Devlin, J. T. & Rushworth, M. F. Interactions between decision making and performance monitoring within prefrontal cortex. Nat. Neurosci. 7, 1259–1265 (2004).
Nachev, P., Kennard, C. & Husain, M. Functional role of the supplementary and pre-supplementary motor areas. Nat. Rev. Neurosci. 9, 856–869 (2008).
Wunderlich, K., Rangel, A. & O’Doherty, J. P. Neural computations underlying action-based decision making in the human brain. Proc. Natl. Acad. Sci. USA 106, 17199–17204 (2009).
Rushworth, M. F., Noonan, M. P., Boorman, E. D., Walton, M. E. & Behrens, T. E. Frontal cortex and reward-guided learning and decision-making. Neuron 70, 1054–1069 (2011).
Hare, T. A., Schultz, W., Camerer, C. F., O’Doherty, J. P. & Rangel, A. Transformation of stimulus value signals into motor commands during simple choice. Proc. Natl. Acad. Sci. USA 108, 18120–18125 (2011).
Pisauro, M. A., Fouragnan, E., Retzler, C. & Philiastides, M. G. Neural correlates of evidence accumulation during value-based decisions revealed via simultaneous EEG-fMRI. Nat. Commun. 8, 15808 (2017).
Aquino, T. G., Cockburn, J., Mamelak, A. N., Rutishauser, U. & O’Doherty, J. P. Neurons in human pre-supplementary motor area encode key computations for value-based choice. Nat. Hum. Behav. 7, 970–985 (2023).
Donoso, M., Collins, A. G. & Koechlin, E. Human cognition. Foundations of human reasoning in the prefrontal cortex. Science 344, 1481–1486 (2014).
MacLean, R. C., Torres-Barcelo, C. & Moxon, R. Evaluating evolutionary models of stress-induced mutagenesis in bacteria. Nat. Rev. Genet. 14, 221–227 (2013).
Charlesworth, D., Barton, N. H. & Charlesworth, B. The sources of adaptive variation. Proc. Biol. Sci. 284, 20162864 (2017).
Riederer, J. M., Tiso, S., van Eldijk, T. J. B. & Weissing, F. J. Capturing the facets of evolvability in a mechanistic framework. Trends Ecol. Evol. 37, 430–439 (2022).
Smith, S. M. et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23 (Suppl. 1), S208–S219 (2004).
Beaumont, M. A. Estimation of population growth or decline in genetically monitored populations. Genetics 164, 1139–1160 (2003).
Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. & Baker, C. I. Circular analysis in systems neuroscience: the dangers of double dipping. Nat. Neurosci. 12, 535–540 (2009).
Acknowledgements
We thank Julie Drevet for her help in data acquisition. Supported by a European Research Council Grant (ERC-AdG #250106) to E.K., a Marie Skłodowska-Curie grant (#895213) to V.S. and a DGA grant (#2017-60-0039) to M.R.-M. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Authors and Affiliations
Contributions
E.K., M.R.-M. and C.F. conceived the study and designed the experimental protocol. C.F. programmed the models. M.R.-M. collected the data. E.K., M.R.-M., C.F. and V.S. analyzed the data. E.K., M.R.-M., V.S. and C.F. wrote the paper.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Findling, C., Romand-Monnier, M., Skvortsova, V. et al. Neural variability in the medial prefrontal cortex contributes to efficient adaptive behavior. Nat Commun 16, 11356 (2025). https://doi.org/10.1038/s41467-025-66444-x