Contribution of amygdala to dynamic model arbitration under uncertainty

Woo, Jae Hyung; Costa, Vincent D.; Taswell, Craig A.; Rothenhoefer, Kathryn M.; Averbeck, Bruno B.; Soltani, Alireza

doi:10.1038/s41467-025-66745-1

Download PDF

Article
Open access
Published: 28 November 2025

Contribution of amygdala to dynamic model arbitration under uncertainty

Nature Communications volume 16, Article number: 11704 (2025) Cite this article

2393 Accesses
1 Citations
Metrics details

Subjects

Abstract

Intrinsic uncertainty in the reward environment requires the brain to run multiple models simultaneously to predict outcomes from preceding cues or actions. For example, reward outcomes may be linked to specific stimuli and actions, corresponding to stimulus- and action-based learning. But how does the brain arbitrate between such models? Here, we combined multiple computational approaches to quantify concurrent learning in male monkeys performing tasks with different levels of uncertainty about the model of the environment. By comparing behavior in control monkeys and monkeys with bilateral lesions to the amygdala or ventral striatum, we found evidence for a dynamic, competitive interaction between stimulus-based and action-based learning, and for a distinct role of the amygdala in model arbitration. We demonstrated that the amygdala adjusts the initial balance between the two learning systems and is essential for updating arbitration according to the correct model, which in turn alters the interaction between arbitration and learning that governs the time course of learning and choice behavior. In contrast, VS lesions lead to an overall reduction in stimulus-value signals. This role of the amygdala reconciles existing contradictory observations and provides testable predictions for future studies into circuit-level mechanisms of flexible learning and choice under uncertainty.

Rate and noise in human amygdala drive increased exploration in aversive learning

Article 27 August 2025

Translation of monosynaptic circuits underlying amygdala fMRI neurofeedback training

Article Open access 05 August 2024

Neural inhibition as implemented by an actor-critic model involves the human dorsal striatum and ventral tegmental area

Article Open access 16 March 2024

Introduction

One of the challenging aspects of learning in naturalistic settings is that it is inherently unclear what features or attributes of choices are predictive of subsequent reward outcomes. Imagine successfully operating a new coffee machine after switching it on and pressing a flashing button on the left side of its screen. What should you press the next time you want to get coffee: the same button or any button that is flashing? In this specific example, reward outcomes can be equally attributed to either the identity of a choice option (e.g., flashing button) or the action needed to obtain that option (pressing the left button), corresponding to uncertainty about the correct model of the environment (stimulus-based vs. action-based). More generally and during most naturalistic settings, reward outcomes could be linked to any combination of features or attributes of a selected option or chosen action. It has been suggested that the brain tackles such uncertainty by running multiple predictive models of the environment, with each model predicting outcomes based on different attributes of choice options, and using the reliability of these predictions to select the appropriate model to inform choice behavior^1,2,3.

Although many conceptual and algorithmic solutions to model arbitration exist^2,3,4, confirming implementation level details in terms of the operation of neural circuits has remained a challenge due to several factors. First, most experimental paradigms involve manipulating uncertainty in one of two ways. In some paradigms, there is uncertainty about which of multiple choice options is more rewarding through probabilistic reward contingencies and contingency reversals^5,6,7. In other paradigms, the correct option is deterministically linked to reward outcomes, but there is uncertainty about the correct model of the environment and when to choose that option^8,9,10,11,12. Few, if any, studies have manipulated expected and unexpected uncertainty¹³ related to stimulus-action-outcome relationships together with uncertainty about which model of the reward environment is relevant at the time. Critically, in reward environments where the correct model of the environment does not change frequently, the reliability of different models can reach their asymptotes very quickly, concealing the contributions of circuits involved in dynamic arbitration and model selection. Second, it is intrinsically difficult to measure and track the contributions of multiple learning systems and their interactions because different learning systems can drive choice behavior at any given moment. Third, because computations required for learning and arbitration under uncertainty must interact with each other, many cortical and subcortical areas may appear to be similarly involved in different processes (lack of specialization). For example, the amygdala has been shown to contribute to reward learning under uncertainty by both improving and impairing learning performance^{14,15,16,17,18,19,20} and by its contribution being associated with different types of uncertainty^13,21.

To overcome these challenges and reveal the circuit and neural mechanisms underlying arbitration, we applied multiple computational approaches to examine choice behavior in three groups of monkeys (a control group and two groups with bilateral lesions of either the amygdala or ventral striatum) performing a probabilistic learning task that involved multiple forms of uncertainty. These included uncertainty about the better option on a given trial (expected uncertainty), uncertainty about the correct model of the environment, and uncertainty about when reward associations change (unexpected uncertainty), thus creating a challenging task that could reveal the role of the amygdala and ventral striatum (VS) in all three of these processes. To track the simultaneous contributions of multiple learning systems and their interaction, we extended metrics based on information theory^22,23 to quantify consistency in choice and learning based on stimulus- and action-based learning over time. Additionally, we developed several reinforcement learning (RL) models that, along with previous models, were used to fit choice behavior on a trial-by-trial basis. These models extended the previous ones by incorporating static or dynamic arbitration among alternative learning systems, based on different signals. We then examined the best models and their estimated parameters, particularly those related to the arbitration process, to pinpoint the roles of the amygdala and VS in reward learning and arbitration. Moreover, by modulating the key parameters of the model, we simulated and qualitatively replicated the distinct behavioral signatures of amygdala or VS lesions. Together, by utilizing the above methods, we provide evidence for interactions between stimulus-based and action-based learning under uncertainty, uncover mechanisms underlying arbitration between the two systems, explore how arbitration and learning processes interact, and determine the amygdala’s contributions to arbitration and overall behavior.

Results

Behavioral paradigm with multiple forms of uncertainty

We examined monkeys’ choice behavior when performing a variant of a probabilistic learning paradigm that involves multiple forms of uncertainty. In this paradigm, during each block of 80 trials, monkeys selected between two novel visual stimuli that were randomly presented on the opposite sides of the screen (Fig. 1a; see Experimental paradigm in “Methods”). Selection of each option was rewarded with a certain probability (80:20, 70:30, or 60:40), but the probabilities for better and worse options reversed at a random point within the block without any signal to the monkey. Critically, at the start of a block of trials the monkeys were unaware if the assigned reward probabilities for a particular block were linked to the selection of specific stimuli or locations (Fig. 1b). This created uncertainty about the correct model of the environment (stimulus-based vs. action-based). In one task, rewards were exclusively based on stimulus-outcome associations (What-only task; Fig. 1c). In the second task, rewards in each block of trials were determined by either stimulus-outcome or action-outcome associations (What/Where task; Fig. 1d). Together, these components resulted in three types of uncertainty: uncertainty about the correct option on a given trial (expected uncertainty), uncertainty about the correct model of the environment, and uncertainty regarding when reward associations reverse (unexpected uncertainty).

**Fig. 1: Experimental paradigm, block types, and time course of performance.**

Evidence for multiple learning systems and their interaction

To examine the presence of multiple learning systems and reveal the effects of different forms of uncertainty, we first compared the performance of control monkeys across different tasks and reward schedules. Overall performance was best during the What-only task, in which only the stimulus identity was predictive of reward, and there was no inherent uncertainty about the model of the environment in terms of objective task structure. Using mixed-effects analysis which accounts for subject variability, we observed that the probability of choosing the more rewarding option, P(Better), during the What-only task was significantly higher than in either What blocks (main effects of block type; β_What = −0.074, p = 2.74 × 10⁻²⁸) or Where blocks (β_Where = −0.087, p = 2.39 × 10⁻³⁷) of the What/Where task, which involved additional uncertainty about the correct model of the environment. We also found that expected uncertainty affects performance, measured by P(Better): across all block types and tasks, performance improved as it became easier to discriminate between the reward probabilities of the two options (main effect of reward variance; β_var(P) = −1.950, p = 4.60 × 10⁻³¹).

To capture the effect of reward feedback and how it was used to perform the task and adjust the behavior, we utilized information-theoretic metrics to quantify consistency in reward-dependent choice strategy on two attribute dimensions, stimulus identity and stimulus location^22,23. Specifically, we examined the conditional entropy of reward-dependent strategy (ERDS), defined as the Shannon entropy of stay/switch strategy conditioned on the previous reward feedback, separately for stay/switch based on action or stimulus identity (see Eq. 1 in “Methods”). Lower values of ERDS_Stim suggest that the animals stayed or switched after reward feedback based on stimulus identity (stimulus-based learning), whereas lower values of ERDS_Action indicate that the animals’ stay/switch strategy was based on assigning reward to the chosen action (action-based learning).

To quantify the interaction between the two learning systems, we computed the correlation between ERDS_Stim and ERDS_Action computed from each 80-trial block (Supplementary Fig. 5). We found that even in the What-only task for which action-based learning was irrelevant and minimally used, there was a negative correlation between ERDS_Stim and ERDS_Action (Spearman’s correlation, r = −0.123, p = 4.29 × 10⁻⁷), suggesting that more consistency in using one model resulted in less consistency using the other model. For the What/Where task, we observed stronger negative correlations between ERDS_Stim and action ERDS_Action for both block types (What: r = −0.602, p = 2.43 × 10⁻²⁹⁶; Where: r = −0.578, p = 1.32 × 10⁻²⁶¹). Overall, these results reveal significant interactions between the stimulus- and action-based learning systems.

Considering previous findings on the influence of learning strategy on response time^24,25, we hypothesized that reaction time (RT) is influenced by the dominant learning system at any given time. To test this hypothesis, we categorized trials as either stimulus- or action-dominant by directly comparing ERDS_Stim and ERDS_Action (see Data analysis and statistical tests in “Methods” for details). Using this approach, we found that during the What/Where task, responses in stimulus-dominant trials were significantly slower than action-dominant trials in both block types (Supplementary Note 1). This contrast in stimulus vs. action-driven RTs was harder to identify in the What-only task, in which choices were dominated by the stimulus-based system. Nonetheless, rare action-dominant trials happened when reward value estimates based on the two systems were close to each other, resulting in slower and more erroneous responses. Overall, these results show that entropy-based metrics could be used to identify the adopted model on a given trial and that RT reflected the adopted strategy, with the stimulus-based strategy resulting in longer RT than the action-based strategy.

Influences of amygdala and ventral striatum lesions on learning and choice behavior

Next, we compared the effects of amygdala and VS lesions on choice behavior to elucidate their contributions to decision-making, learning, and arbitration. During the What-only task, amygdala-lesioned monkeys exhibited the largest impairment in performance (P(Better) across all three reward schedules: M ± SD = 0.581 ± 0.11; main effect of group in mixed-effects analysis; β_amyg = −0.172, p = 7.72 × 10⁻²²; contrast between lesion groups: β_amyg−β_VS = −0.076, p = 1.11 × 10⁻⁴; Fig. 1e; Supplementary Table 1). VS-lesioned monkeys also showed impairment compared to the control monkeys (main effect of group in mixed-effects analysis; β_VS = −0.096, p = 1.19 × 10⁻⁶; Fig. 1e; Supplementary Table 1). Although previous study¹⁷ have reported additional differences in the behavior of amygdala- and VS-lesioned monkeys, the better performance of VS- compared to amygdala-lesioned monkeys (Fig. 1e inset) is surprising, given the established role of VS for stimulus-based learning^26,27, which is required for the What-only task.

During What blocks of the What/Where task, however, amygdala-lesioned monkeys performed significantly better than VS-lesioned monkeys (mixed-effects analysis, contrast between lesion groups: β_amyg−β_VS = 0.104, p = 0.0152; Fig. 1f inset; Supplementary Table 2), while both lesioned groups were impaired relative to control monkeys (amygdala: β_amyg = −0.105, p = 0.00334; VS: β_VS = −0.209, p = 1.05 × 10⁻⁷). In Where blocks that did not require stimulus-based learning, only amygdala-lesioned monkeys showed impairments in performance relative to controls (contrast from controls = −0.070, p = 0.0173; Fig. 1g inset; Supplementary Table 2). VS-lesioned performance was comparable to that of controls (contrast from controls = −0.037, p = 0.242) yet not significantly better than that of amygdala-lesioned monkeys (contrast between lesion groups = −0.033, p = 0.351).

Overall, these results demonstrate that in the absence of intrinsic uncertainty about the correct model of the environment and when this model was stimulus-based, VS-lesioned monkeys (with intact amygdala) were able to partially overcome the deficit in stimulus-based learning to a significantly larger degree than amygdala-lesioned monkeys. Under additional uncertainty about the correct model of the environment (the What/Where task), however, VS lesions caused significant impairment in What blocks only, consistent with the role of VS in stimulus-based learning. In contrast, amygdala-lesioned monkeys exhibited impaired performance in both What and Where blocks (but more strongly in What blocks) despite no clear evidence for the significant contribution of the amygdala to action-based learning (but see ref. ²⁰), whereas there is action encoding in the amygdala and VS^28,29. As noted in the previous work¹⁸, the similar impairments during What and Where blocks observed in amygdala-lesioned monkeys cannot be explained by the amygdala’s currently assumed role in stimulus-based learning.

These results are also puzzling because the higher performance of VS-lesioned compared to amygdala-lesioned monkeys in the What-only task suggests a stronger contribution of the amygdala to stimulus-based learning. However, the higher performance of amygdala-lesioned compared to VS-lesioned monkeys in What blocks of the What/Where task contradicts this idea. These findings hint at a potential role of the amygdala in arbitration between stimulus- and action-based learning, in addition to its known role in stimulus-based learning.

To study the relative adoption of the two learning strategies according to uncertainty of the reward environment, we examined the difference between ERDS_Stim and ERDS_Action (ΔERDS) by block types and reward schedules (Supplementary Fig. 6). We found that the reward uncertainty (measured as variance¹³) is predictive of the relative degree of adoption between stimulus- and action-based strategies. More specifically, in the What-only and What blocks of the What/Where task, animals’ strategies became relatively more biased toward action-based (increasing ΔERDS) as the uncertainty of the reward schedule increased (Supplementary Fig. 6d, e). Consistently, in Where blocks, they tended to become relatively more stimulus-based under more uncertainty (decreasing ΔERDS; Supplementary Fig. 6f). These observations demonstrate that both control and lesioned monkeys adjust to reward uncertainty by exploring the incorrect model of the environment, even though they start from different baselines. Critically, amygdala-lesioned monkeys exhibited the smallest distinction between the two types of learning strategies (Supplementary Fig. 6d–f).

Finally, we also examined RT in the two lesioned groups and found consistent results to those of the control animals (Supplementary Note 1). Together, our findings suggest that VS lesions biased behavior toward action-based learning by impairing stimulus-based learning. In contrast, amygdala lesions resulted in more nuanced impairment of both stimulus- and action-based learning, as well as their coordination. To reveal the underlying mechanisms, we developed multiple computational models to fit the choice behavior of both control and lesioned monkeys.

Mechanisms of arbitration between stimulus- and action-based learning systems

To uncover mechanisms underlying the interaction between the two systems, we developed several hybrid RL models to fit the choice behavior of control monkeys on a trial-by-trial basis (see Supplementary Table 15 for the list of all models). In the simplest model, signals from distinct action-based and stimulus-based learning systems were combined linearly using a fixed weight to control choice behavior. We also tested models with dynamic arbitration in which the relative weighting of the two systems, ω, was updated on each trial based on the reliability of the two systems. Drawing on previous literature, we compared multiple methods for computing reliability: (1) the magnitude of the reward prediction error (|RPE|), (2) the value of the chosen option (V_cho), (3) discernibility between two competing options (|ΔV|), and (4) the sum of value estimates within each system (ΣV) (see Eqs. 13–16 in “Methods”). Additionally, we considered a more general model in which the baseline (time-independent) ratio of value signals from the two learning systems (quantified by parameter, ρ) could be adjusted independently of ω (Fig. 2a; see Eq. 10 in “Methods”). As a result, this (Dynamic ω-ρ) model relaxes the assumption that an increase in signal strength from one system (or equivalently, the sensitivity of decision-making to those signals) is matched by an equal decrease in signal strength from the other system, and vice versa. To determine the best model, we computed the goodness-of-fit using five-fold cross-validation (see K-fold cross-validation of model performance in “Methods”).

**Fig. 2: Fit of choice behavior of control monkeys using various RL models.**

Comparing the single-system models with the simplest two-system model, which assumes a fixed relative weighting for the two systems (RL_Stim+Action + Static ω), we found that the latter provided a better fit. Interestingly, this model improved the goodness-of-fit even in the What-only task, in which action learning was not predictive of reward. Overall, however, all the dynamic models provided a better fit than the model with fixed weighting. Ultimately, the Dynamic ω-ρ model, which uses the value of the chosen option (V_cho) to estimate reliability and incorporates a baseline weighting of the two systems quantified by ρ, provided the best fit across all tasks and for each monkey (Fig. 2b; Supplementary Table 16).

To gain more insight into how dynamic arbitration improves the fit of choice behavior, we next examined the behavior of the Dynamic ω-ρ model and its arbitration weights over time. To that end, we computed the effective arbitration weight (“effective” ω denoted by Ω) to measure the overall relative weighting between two systems considering the parameter ρ (see Eq. 12 in “Methods”). Both the example block and the averaged trajectories of trial-by-trial Ω from the best model (Fig. 2c, d) showed dependence on the block type and uncertainty in the reward schedule, especially during the What/Where task that required arbitration between competing models of the environment. These results demonstrate that Ω can capture behavioral adjustments to uncertainty over time.

As shown above, stimulus-based choices lead to slower RTs (Supplementary Note 1). Motivated by this finding, we tested whether the Dynamic ω-ρ model could capture the differences in RT according to the dominant learning system. To that end, we computed the correlation between the median RTs of the block and the average estimated values of Ω, which measures the overall relative weighting of the stimulus-based to action-based system. For the What-only task, we found a small yet significant correlation between the effective arbitration weight and RT (Spearman’s r = 0.094, p = 1.24 × 10⁻⁴). In comparison, Ω and RT were highly correlated in the What/Where task (r = 0.414, p = 3.78 × 10⁻²⁴⁵; Fig. 2e). However, because these simple correlations do not control for other confounding factors such as choice confidence (measured by value difference), choice accuracy (choosing the better or worse option), and long-term drift in RT, we conducted further regression analyses. These analyses included these factors along with other model-derived measures to predict trial-by-trial RT (Supplementary Note 2). Using these analyses, we found that across all groups and conditions, higher Ω predicted longer RT (Supplementary Note 2). Together, these results indicate that slower RTs occurred when larger weights were assigned to the stimulus-based system, and faster RTs occurred with larger weights on the action-based system. This is consistent with the previous analysis, which showed that action-dominant trials (determined using ERDS) were accompanied by faster RTs.

As part of our exploration of arbitration mechanisms, we also compared multiple algorithms for estimating the reliability of the two systems, including V_cho, |RPE|, discernibility between two competing options (|ΔV|), and the sum of value estimates within each system (ΣV) (see Eqs. 13–16 in “Methods”). We found that among the four reliability measures considered, V_cho best explained the control monkeys’ choice behavior across all block types (Supplementary Fig. 7a). Specifically, the Dynamic ω model based on V_cho improved the fit over the Static ω model in all tasks, whereas the Dynamic ω model based on |RPE| improved the fit over the Static ω model only in the What/Where task (Fig. 2b). Through model recovery, we confirmed that our model fitting procedure could effectively discriminate between the alternative one-system and two-system models (Supplementary Fig. 8a–d).

Importantly, V_cho and |RPE| are conceptually related as both signals measure the predictiveness of reward values in each system. However, the two signals differ in their sensitivity to negative feedback (i.e., no reward). For example, when the chosen value of the more reliable system (e.g., stimulus-based system in What blocks) is (correctly) estimated to be high, negative RPE (and consequently |RPE|) will also be high, undesirably facilitating the update toward the incorrect system. In comparison, V_cho by itself is less sensitive to negative feedback, as the updated value of V_cho after omission of reward will still reflect the high value estimates for the more reliable system. As a result, the V_cho signal distinguishes the more reliable system better than the |RPE| signal, especially for more uncertain reward schedules (Supplementary Fig. 7b–d). Ultimately, the difference in the V_cho drives the arbitration process (Eqs. 8–9), and this difference is equal to the difference in signed RPE. This suggests that the reliability signal could be more connected to signed rather than unsigned RPE.

Finally, to further validate our models, we compared the predictions of different models regarding the observed negative interaction between ERDS_Stim and ERDS_Action during the What-only task. This is to ensure that this relationship was due to competition between the two learning systems and not due to task structure, as the animals could not stay/switch on the stimuli and location dimension at the same time, while positions of stimuli were pseudo-randomly assigned to either side. To that end, we simulated choice behavior using single-system or two-system models and computed regression weights between ERDS_Stim and ERDS_Action (see Model fitting and simulation in “Methods” for more details). Competition between the two learning systems during the What-only task would suggest that weaker stimulus-based learning corresponds to stronger action-based learning and vice versa. We found that in the single-system model that learned the stimulus-outcome contingencies only (RL_Stim-only), ERDS_Stim was only weakly predictive of ERDS_Action (Supplementary Fig. 9b; β = −0.007, p = 1.07 × 10⁻⁴⁴). In contrast, in both the static and dynamic two-system models, ERDS_Stim was negatively predictive of ERDS_Action, thus reproducing the competitive interaction between the two systems (Supplementary Fig. 9c; β = −0.0648, p = 1.09 × 10⁻⁵⁶; Supplementary Fig. 9d; β = −0.0893, p = 6.73 × 10⁻⁵⁷). These simulation results further support the presence of multiple learning systems and their dynamic interaction, even in an environment where one of the two systems was not beneficial for performing the task.

Deficits in arbitration due to amygdala but not VS lesions

Fit of choice behavior of lesioned monkeys revealed that the Static ω model explained the choice behavior of both lesioned groups better than models with a single learning system (Fig. 3a). Moreover, incorporating dynamic arbitration as in the Dynamic ω model further improved the fit beyond what the Static ω model achieved in both groups. Furthermore, including baseline relative weighting, as in the Dynamic ω-ρ model, resulted in the best overall fit (Fig. 3a). Finally, consistent with the results in controls, for both lesioned groups, the dynamic model that used V_cho to estimate reliability accounted for choice behavior better than the model using |RPE| for estimating reliability (Supplementary Fig. 7a).

**Fig. 3: Fit of choice behavior for amygdala- and VS-lesioned monkeys.**

To determine the mechanisms by which different lesions impact the arbitration process, we examined the estimated parameters in the best model. We first confirmed that the parameters of this model were recovered well (Supplementary Fig. 8e, f). The estimated trajectory of Ω revealed that, similar to controls, arbitration was modulated by reward uncertainty during the What/Where task in both amygdala-lesioned (Fig. 3b) and VS-lesioned monkeys (Fig. 3c). Nonetheless, Ω values were overall smaller than in controls, corresponding to a more action-based strategy in lesioned animals (compare Fig. 3b, c and Fig. 2d). Importantly, a key difference between the two lesioned groups demonstrates deficits in the arbitration process due to amygdala lesions. During the What/Where task, the effective arbitration weight (Ω) for amygdala-lesioned monkeys increased over time in both What and Where blocks (dashed and dotted curves in Fig. 3b). In contrast, VS-lesioned monkeys showed an increase in Ω during What blocks and a decrease in during Where blocks (dashed and dotted curves in Fig. 3c), mirroring the pattern observed in control animals (dashed and dotted curves in Fig. 2d). Meanwhile, during the What-only task, Ω remained stable but at lower values for amygdala-lesioned (solid curves in Fig. 3b) compared to VS-lesioned monkeys (solid curves in Fig. 3c), and significantly lower than in controls (solid curves in Fig. 2d).

These results demonstrate that VS lesions biased behavior toward action-based learning while keeping the arbitration processes relatively intact, whereas amygdala lesions impaired arbitration in addition to biasing behavior toward action-based learning. These suggest that the deficits observed in amygdala-lesioned monkeys cannot solely be attributed to impairments in stimulus-based learning; instead, they involve a more complex interaction between stimulus- and action-based signals. Consistent with this interpretation, we found that the winning model with dynamic arbitration (Dynamic ω-ρ) more accurately captures the key aspects of behavioral strategy in the two lesioned groups compared to the single-system models (Supplementary Fig. 10).

To further investigate the dynamics of the arbitration weight, we next examined the rate of change in Ω across the three groups. To that end, we calculated the “effective” arbitration rates by calculating the ratio of the overall change in Ω toward 1 (favoring stimulus-based system) or 0 (favoring action-based system) relative to its original value (Fig. 3d; see Eqs. 17–19 in “Methods”). This quantity measures the rate of arbitration, analogous to the learning rate for updating value estimates. We found that in control monkeys, the effective arbitration rates toward the stimulus-based (ψ₊) or action-based (ψ₋) system diverged toward the end of a block, reflecting the adoption of the correct model of the environment. That is, when the stimulus-based system was more reliable, the effective arbitration rates toward the stimulus-based system were larger than those toward the action-based system (mixed-effects model with a single fixed intercept, representing the mean Δψ; What-only task: β₀ = 0.0930, p = 1.24 × 10⁻¹⁰²; What blocks of What/Where task: β₀ = 0.0444, p = 1.59 × 10⁻⁹; Fig. 3e, f; Supplementary Tables 3 and 4). Similarly, in Where blocks, where the action-based system was more reliable, the effective arbitration rate toward the action-based system was significantly larger than that toward the stimulus-based system (β₀ = −0.0301, p = 0.00968; Fig. 3g).

In contrast, amygdala-lesioned monkeys showed the minimum differentiation between adjustments toward the more and less reliable (correct and incorrect) learning systems. Notably, during the What/Where task, amygdala-lesioned monkeys exhibited no significant difference between the two arbitration rates during either block type (mixed-effects analysis on Δψ with a single intercept; What: β₀ = 0.0114, p = 0.218; Where: β₀ = −0.00347, p = 0.824; Fig. 3f, g; Supplementary Table 4). In contrast, VS-lesioned monkeys exhibited an overall large bias in arbitration rates toward the action-based system (i.e., higher ψ₋) during both block types (mixed-effects analysis on Δψ with a single intercept; What: β₀ = −0.0536, p = 3.95 × 10⁻⁷; Where: β₀ = −0.0739, p = 2.03 × 10⁻⁵; Fig. 3f, g; Supplementary Table 4). Overall, amygdala-lesioned monkeys were characterized by the least amount of differentiation between the two arbitration rates (mixed-effects analysis on |Δψ|; contrast from controls = −0.0254, p = 0.0114; contrast from VS = −0.0282, p = 0.0177; Supplementary Table 5).

In the What-only task, with reduced uncertainty about the model of the environment, the difference between the two arbitration rates in amygdala-lesioned monkeys was positive (contrast on group means = 0.0142, p = 0.00116; Fig. 3e; Supplementary Table 3) but much smaller than that of control monkeys (main effect of group in mixed-effects analysis; β_amyg = −0.0786, p = 1.15 × 10⁻³⁷). The VS-lesioned group also exhibited higher arbitration rates toward the stimulus-based system (contrast on group means = 0.0226, p = 1.20 × 10⁻⁴), which aligns with the recovered performance observed in these monkeys.

Together, these results suggest that amygdala lesions impair arbitration between the two learning systems by eliminating differential updates for the correct and incorrect (more and less reliable) systems. This indicates that the amygdala is critical for identifying and/or retaining the correct model of the environment, or biasing arbitration toward it. In contrast, VS lesions mainly impair stimulus-based learning and increase the overall arbitration bias toward action-based learning.

Dynamic interaction between learning and arbitration processes and the impact of the initial state

Considering the observed effects of amygdala and VS lesions on arbitration dynamics, we next examined the estimated parameters from the best-fit model (Dynamic ω-ρ). In this model, ρ captures whether there is an overall reduction in baseline value signals from the stimulus-based system relative to the action-based system. Consistent with the hypothesized role of VS in stimulus learning, the estimated values of ρ were on average smaller in VS-lesioned monkeys compared to controls (permutation test for difference in group mean; p = 0.0044), indicating a larger baseline reduction in stimulus-value signals relative to action-value signals in VS-lesioned monkeys (Fig. 4a). In contrast, we found no such evidence for reduction in ρ in amygdala-lesioned monkeys relative to controls (permutation test for difference in group mean; p = 0.869). As a result, VS-lesioned monkeys exhibited a smaller difference in choice sensitivity to stimulus- and action-value signals (Δβ = β_stim−β_action), with a bias toward action-value signals, compared to controls in both the What-only task (mixed-effects analysis on Δβ, main effect of group; β_VS = −7.48, p = 0.00658; Fig. 4b inset; Supplementary Table 6) and the What/Where task (β_VS = −3.53, p = 0.0333; Fig. 4c inset). This was not the case for amygdala-lesioned monkeys, which exhibited no significant difference compared to controls in either task (What-only: β_amyg = −2.69, p = 0.280; Fig. 4b inset; What/Where: β_amyg = 0.542, p = 0.719; Fig. 4c inset; Supplementary Tables 6 and 7). These results suggest that, unlike VS lesions, amygdala lesions did not significantly alter the relative baseline strength of stimulus-value vs. action-value signals. Instead, amygdala lesions reduced sensitivity to both systems. Therefore, consistent with previous observations, the deficit observed in amygdala-lesioned monkeys cannot be solely attributed to impairments in stimulus-based learning. Instead, they suggest deficits in arbitration processes that subsequently affect learning and decision making.

**Fig. 4: Comparison of relative weighting and sensitivity of choice to stimulus- and action-based signals across control and lesion groups, and their simulations.**

To confirm this point, we examined the initial arbitration weights that determine the weights of the two systems on choice at the beginning of each block, when the monkeys were unaware of thecorrect model of the environment during the What/Where task. We note that amygdala- and VS-lesioned monkeys did not significantly differ in the initial Ω (effective ω) values (planned contrast for group difference in Ω₀= 0.0242, p = 0.657; Supplementary Table 8; compare Ω of the first trial in Fig. 3b, c). However, by examining the initial arbitration weights (ω₀) before scaling by ρ, we found that amygdala-lesioned monkeys had significantly smaller ω₀ values compared to controls (mixed-effects analysis on Ω₀; β_amyg = −0.194, p = 0.00419) while the VS-lesioned group did not (β_VS = −0.119, p = 0.108; Supplementary Table 9). More directly, the amygdala group showed larger changes in ω₀ after scaling by ρ compared to the VS group (group contrast in mixed-effects analysis on Ω₀−ω₀ = 0.111, p = 0.0204; Supplementary Table 10). This means that larger values of ρ in amygdala-lesioned monkeys were offset by lower ω₀ values to yield Ω₀ comparable to VS-lesioned monkeys. Therefore, deficits due to amygdala lesions can be mainly attributed to the reduction in the initial weight (ω₀) and subsequent interaction between arbitration and learning processes. In contrast, deficits due to VS lesions are largely caused by a reduction in the relative baseline strength of stimulus-value to action-value signals, measured by ρ.

To further validate this idea through model simulations, we generated the choice behavior of the Dynamic ω-ρ model by adjusting two key parameters to mimic the effects of brain lesions: baseline ratio of the weights of the stimulus- to action-value signals (ρ) and the initial arbitration weight (ω₀). These two parameters reflected the most consistent effect of lesions across the two tasks, with reduced ρ in VS-lesioned monkeys (Fig.4a–c; β_VS = −0.195, p = 0.00279; mixed-effects analysis on compiled ρ across all groups/tasks) and reduced ω₀ in amygdala-lesioned monkeys (β_amyg = −0.191, p = 0.0171; mixed-effects analysis on compiled ω₀ across groups/tasks). We kept all other parameters constant except for the common inverse temperature, β₁.

Trajectories of simulated ω during What blocks revealed that different values of ρ and ω₀ can create different dynamics with respect to arbitration rates (Fig. 4d–f). More specifically, the simulated trajectory of ω based on mean ω₀ in control monkeys during the What/Where task (Fig. 4d, black arrow) resulted in larger transitions toward the stimulus-based system (ψ₊ > ψ₋), whereas mean ω₀ in amygdala-lesioned monkeys (Fig. 4e, red arrow) reduced the distinction between the two arbitration rates (ψ₊ ≈ ψ₋). In comparison, simulations using mean ω₀ in VS-lesioned monkeys (Fig. 4f, gray arrow) resulted in an overall update bias toward the action-based system (ψ₊ < ψ₋). These results qualitatively mimic the pattern of effective arbitration rates in the three groups (Fig. 3e–g).

Finally, we also tested the causal contribution of initial arbitration weight on performance using simulated choice behavior (Fig. 4g–i). Crucially, we found that lower values of ω₀, as observed in amygdala-lesioned monkeys (ω₀ = 0.18; red arrows in Fig. 4h), lead to reduced performance when compared to higher ω₀ values (e.g., ω₀ = 0.40). This effect was reflected in a significant main effect of ω₀ on the simulated performance (F_(20,1659) = 40.6, p = 8.38 × 10⁻¹²⁸). These simulation results demonstrate that reduced flexibility in the arbitration process––reflected by a lower ω₀––could be the main cause of impaired performance, rather than just a secondary consequence.

Although control monkeys were also biased toward the action-based system at the start of the What/Where task (mean ± s.e.m; ω₀ = 0.374 ± 0.058), lesions to the amygdala resulted in an even larger bias toward the action-based system (ω₀ = 0.179 ± 0.053), and this consequently led to a lack of differential updates for the two systems. Therefore, our simulations indicate that amygdala-lesioned monkeys operate within a parameter space that produces smaller differences in arbitration rates favoring the correct system for a given environment, which ultimately reduces performance. Overall, these results suggest that the initial state of the system (ω₀) is crucial for determining the later trajectory and rates of transition in the arbitration process.

In contrast, lesions to VS mainly decreased ρ to bias signals toward the action-based system, while affecting the initial state of arbitration to a lesser degree (ω₀ = 0.276 ± 0.042). It is worth noting that the simulations using ρ = 0.4 (Fig.4f, i), which mimics the reduction in the relative baseline strength of stimulus-value signals due to VS lesions, result in the biased update rates toward the action-based system for many of the ω₀ values (blue lines in Fig.4f), including ω₀ ~ 0.37, which matches the initial values for control monkeys. Therefore, the consistent adoption of an action-based strategy in the What/Where task can be sufficiently accounted for by a reduced ρ value, without the need for additional constraints on ω₀. These results support the notion that the impairments observed in VS-lesioned monkeys during this task can be fully explained by a reduction in stimulus-value signals, with minimal direct impact on arbitration processes.

Diversity of behavior driven by the dynamic interaction between learning and arbitration processes

To demonstrate the impact of dynamic interaction between the learning and arbitration processes on behavior, we simulated the model within the task by adjusting parameters such as the learning and forgetting rates. These simulations revealed a wide range of dynamics in performance and arbitration weights, highlighting complex interactions between learning, arbitration, and decision-making processes (Fig. 5). Interestingly, we observed that higher initial arbitration weights, which would allow the animals to correctly bias their behavior toward the stimulus-based system during a stimulus-learning task, can both facilitate and impede learning after reversals depending on other parameters of the model.

**Fig. 5: Complex interaction between arbitration and learning gives rise to diverse behavioral patterns.**

However, in most cases, a larger initial bias toward the stimulus-based system helps both initial learning of stimuli and their reversals (Fig. 5a, c–f).

However, in scenarios where the positive learning rate significantly exceeds the negative learning rate, a smaller initial arbitration weight—though it may incorrectly bias behavior toward an action-based strategy—can actually facilitate adjustments to reversals in stimulus values (Fig. 5b). This happens because lower values of ω₀, as in the case of amygdala lesions, result in a dependence of choice on both stimulus- and action-based signals and thus, more explorations that greatly benefit response to reversals. These results, based on our best dynamic arbitration model, can thus explain the paradoxical improvements in performance observed following amygdala lesions or inactivation.

Contribution of the amygdala to long-term adjustments of behavior

Lesions to certain brain areas are often accompanied by adjustments or compensation by other brain areas that result in reducing initial behavioral impairments over the long term. Considering the observed effects of amygdala and VS lesions on learning and decision-making behavior, we investigated long-term adjustments in these behaviors in the absence of task-imposed, objective uncertainty about the correct model of the environment. To that end, we examined ERDS, median RT, and initial arbitration weight across all sessions of the What-only task using the proportion of sessions completed as an independent variable (Methods).

For consistency in stimulus-based strategy, we observed a long-term decrease in ERDS_Stim in VS-lesioned monkeys (planned contrast for the slope of VS group = −0.626, p = 4.94 × 10⁻³²⁴; Supplementary Fig. 11a), to a significantly greater extent than control monkeys (β_VS:sess% = −0.549, p = 7.78 × 10⁻¹², controls as a reference group; Supplementary Table 11). There was no evidence for such adjustment in control (β_cont:sess% = −0.077, p = 0.0881) or amygdala-lesioned monkeys across time (planned contrast for the slope of amygdala group = −0.033, p = 0.478). Specifically, despite their impaired stimulus-based learning, monkeys with VS lesions were able to increase their adoption of stimulus-based strategy over time. Consistently, these monkeys also decreased their adoption of action-based strategy as reflected in the positive slope of ERDS_Action over time (planned contrast for the slope of VS group = 0.192, p = 3.49 × 10⁻⁴; Supplementary Fig. 11b), which was significantly greater compared to controls (β_VS:sess% = 0.211, p = 0.00208; Supplementary Table 12). There was no evidence of such an effect in control monkeys (β_cont:sess% = −0.019, p = 0.652) or in monkeys with amygdala lesions (planned contrast for the slope of amygdala group = 0.014, p = 0.748). Interestingly, consistent with previous results, the complementary changes in model adoption in VS-lesioned monkeys were also reflected in increased median RT over time in these monkeys (planned contrast for the slope of VS group = 30.7, p = 1.61 × 10⁻⁶; Supplementary Fig. 11c; Supplementary Table 13) to greater extent than controls (β_VS:sess% = 24.3, p = 0.00236) or amygdala-lesioned monkeys (planned contrast for group difference in slopes = −29.6, p = 0.00224). This was accompanied by a long-term increase in the initial effective arbitration weights Ω₀ (toward stimulus-based system) only in the VS-lesioned monkeys (planned contrast for the slope of VS group = 0.455, p = 2.88 × 10⁻⁴; Supplementary Fig. 11d; Supplementary Table 14), which was significantly greater compared to both controls (β_VS:sess% = 0.568, p = 5.42 × 10⁻⁵) and amygdala group (planned contrast for group difference in slopes = −0.431, p = 0.00183).

These results provide evidence for adjustments on a long timescale in VS-lesioned but not amygdala-lesioned monkeys. They suggest that, in the absence of additional uncertainty about the model of the environment, intact amygdala in VS-lesioned monkeys (and not intact VS in amygdala-lesioned monkeys) enabled these animals to slowly improve their performance over time. This amygdala-driven mechanism enabled VS-lesioned monkeys to gradually suppress action-based strategy, resulting in an increase in overall RT and initial effective arbitration weight (Ω₀) over time.

Discussion

Here, we applied a combination of computational approaches to reanalyze data from control monkeys and those with amygdala and VS lesions^17,18,24 to explore the interaction between stimulus- and action-based learning and to uncover computational and neural mechanisms underlying arbitration processes. Our main goal was to investigate the interaction between stimulus- and action-based learning systems, instead of examining them in isolation as in the original studies. Using multiple behavioral metrics, we found evidence for competitive interaction between the two learning systems. Moreover, by developing various models with arbitration and fitting choice data to these and competing models, we tested the plausibility of various mechanisms for estimating reliability signals that guide arbitration processes. Using this approach, we mapped the distinct effects of two brain lesions onto two key parameters of the model: the initial state of the arbitration (ω₀) for amygdala lesions and the relative baseline strength of stimulus-value to action-value signals (ρ) for VS lesions.

For amygdala-lesioned monkeys, the reduced initial arbitration weight was implicated in undifferentiated arbitration rates toward and away from the correct learning system for a given environment. This suggests that the amygdala may have a role in identifying and retaining the correct model of the environment, or in biases model arbitration toward the correct model of the environment. Our model simulations also illustrated that the interaction between learning and arbitration processes generates diverse behaviors with strong dependency on the initial state.

Previous studies using the same dataset have identified deficits in both stimulus-based and action-based learning due to amygdala lesions^17,18, but they considered these deficits independently, as they did not examine the interaction between stimulus- and action-based learning. Using a single-system RL model, they found that amygdala lesions reduce choice consistency (sensitivity to value signals) for stimulus-based learning¹⁷ and increase sensitivity to negative feedback (α₋) for action-based learning¹⁸. We found results using our two-system model with dynamic arbitration (Supplementary Figs. 12b, 13d and 14–16) Moreover, we provided a unified account of the monkeys' choice behavior based on the dynamic interaction between learning and arbitration under uncertainty. Specifically, our simulation results mimicking amygdala lesions (Fig. 4) suggest that a biased initial state strongly favoring action-based learning is the key feature of the deficits observed in amygdala-lesioned monkeys. This strong initial bias altered the interaction between decision-making, learning, and arbitration processes, making the arbitration update rates between the two systems to be less distinguishable from each other compared to the controls. Importantly, VS-lesioned monkeys exhibited a smaller sensitivity to the stimulus-based compared to the action-based signals. As a result, VS lesions led to an overall bias in arbitration update rates toward action-based learning in both What and Where blocks.

More specifically, we found that the difference between the effective weighting of the two systems (β_stim and β_action) in amygdala-lesioned monkeys was not significantly different from that of controls (Fig. 4b, c inset). This indicates that amygdala lesions reduced sensitivity to stimulus-based and action-based signals to a similar degree, unlike the pattern observed in VS-lesioned monkeys. Instead, the main deficits due to amygdala lesions were captured by a biased initial state of arbitration that favors action-based signals. When coupled with the overall reduced sensitivity to value signals, this effect diminishes differential effective arbitration rates for correct and incorrect (more reliable and less reliable) models (Fig. 4e). This suggests that in addition to its contribution to stimulus-based learning, amygdala also plays a crucial role in identifying and/or retaining the more reliable model of the environment¹⁶, or mediating the influence of such identification on arbitration processes. This suggests that the amygdala, like the prefrontal cortex, is involved in learning to learn³⁰ and can explain why amygdala lesions weaken the amount of evidence needed before the animals reverse their choice preference¹⁶.

Interestingly, VS-lesioned monkeys (with intact amygdala) were able to gradually overcome their impaired stimulus-based learning while showing a significantly larger arbitration rate for the stimulus-based system during the What-only task (Fig. 3e). This suggests that a signal to or from the amygdala, but not in the amygdala-to-VS pathway, could bias arbitration toward the more reliable model and lead to slow long-term behavioral adjustments. We found that arbitration was still present in amygdala-lesioned monkeys, suggesting that the amygdala is not required for arbitration per se but has a more nuanced role by setting and/or retaining the initial balance between models and improving the overall sensitivity to value signals. These two effects result in larger arbitration rates for the more reliable model of the environment, thus altering the trajectory of learning and choice behavior.

Moreover, we found that while VS-lesioned monkeys exhibit a bias toward the action-based strategy during the What/Where task, their response times are also shorter than those of control and amygdala-lesioned monkeys. Given the significant involvement of VS in effort exertion^31,32,33,34, the shorter RT in VS-lesioned monkeys could be linked to the fact that the stimulus-based strategy requires more cognitive effort. This is reflected in the slightly longer RT in What blocks compared to Where blocks, as well as the positive correlation between RT and arbitration weight. As a result, VS-lesioned monkeys may default to the action-based strategy, allowing them to perform the task faster. The stronger reliance on the action-based strategy in VS-lesioned monkeys can be adequately explained by the reduction in the ρ parameter of our model, without necessarily suggesting an impaired arbitration process.

Arbitration between alternative models has garnered significant interest in cognitive, behavioral, and systems neuroscience. This includes arbitration between model-free vs. model-based RL^35,36,37,38, Pavlovian vs. instrumental control³⁹, habitual vs. goal-directed system^40,41, competing sets of strategies for solving complex stimulus-response mappings^42,43, and during social decision-making^44,45,46. Here, we explored more basic arbitration required for any type of decision-making, as any choice option has to be selected by taking an action. Unlike arbitration between different types of learning systems––which often requires distinct reliability signals (e.g., model-free vs. model-based relying on unsigned reward prediction error and unsigned state prediction error³⁶)––we found that the same reliability signal, based on the value of the chosen option or chosen action (V_cho), can be used for arbitration between stimulus- and action-based learning. Critically, we found that in both controls and lesioned monkeys, the reliability signal based on V_cho captured arbitration better than the reliability signal based on unsigned RPE. Because the difference in chosen values is equal to the difference in signed RPEs (Eq. 14), our results suggest that the reliability of alternative models may be more linked to signed RPE than to unsigned RPE.

Our proposal that the amygdala contributes to model arbitration to identify and reinforce the correct model of the environment is consistent with its postulated role in signaling attentional shifts for relevant control of behavior^47,48. There are several pathways through which the amygdala could affect model arbitration. One major candidate is prefrontal-amygdala circuits^49,50. In particular, the orbitofrontal cortex (OFC) receives substantial projections from the amygdala^51,52 and could serve a central role in encoding and monitoring the reliability of multiple actor predictive models³. Given that amygdala-to-OFC input has been reported to be significantly involved in value coding by OFC neurons^53,54, it is possible that this input also carries information for selective arbitration to appropriately bias behavior toward the relevant learning system in a given environment. Conversely, the PFC-to-amygdala pathway could signal an internal state variable (arbitration weight in our model), serving as the necessary input to the amygdala for computing a differential adjustment in model arbitration that is relayed back to the PFC. This may explain why basolateral amygdala lesions could reduce OFC-induced impairment in reversal learning¹⁴. Thus, strong reciprocal connections between amygdala and PFC, in particular vlPFC, OFC, or ACC, could be crucial for proper arbitration between alternative models of the environment.

While earlier lesion studies have attributed varying degrees of behavioral deficits to amygdala^55,56, its role in instrumental learning has since been a matter of debate due to mixed evidence both in favor of^{17,18,19,57,58} and against^{14,15,59,60,61} its involvement in reward learning. Critically, our framework can account for the amygdala’s seemingly inconsistent role. Through simulations of our best-fitting model with dynamic interaction between two systems, we found that in certain situations, the lower initial arbitration weight that biases behavior toward the action-based system can actually facilitate adjustments to reversals during stimulus-based learning, especially if the performance has saturated before reversal (Fig. 5b). This happens because more reliance on the less reliable action-based system allows faster exploration of alternative stimuli and thus faster reversal. This could account for puzzling improvement in performance due to basolateral amygdala lesions in rats¹⁵ or monkeys^16,59, which has been attributed to increased benefits from negative feedback. In these examples, the stimulus-based system would still prefer the previously better option, which is no longer rewarding, but a less reliable action-based system would cause more switching from that option. Other studies that reported null results from reversible amygdala inactivations^60,61 have also utilized object reversals after initial learning of stimulus-reward associations over a long period of time (referred to as discrimination learning).

What these studies have in common is that they all utilize visual discrimination learning over a long period of time before a reversal, which would suppress learning from unrewarded trials (i.e., small α₋) and allow the reliability of the stimulus-based system to reach its asymptote, thus slowing down reversal. This is very different from our experimental paradigm in which reversals happened on a short timescale before the reliability of the stimulus-based system could stabilize. Overall, our study suggests that observed discrepancies in the effects of amygdala lesions are due to a dynamic interaction between arbitration, learning, and decision-making processes.

The dynamic interaction between arbitration and learning processes is particularly relevant in interpreting results using behavioral paradigms that were intended to parse the contributions of one learning system, but where competing learning systems could have strong unintended effects on behavior. Our results indicate that interpreting behaviors shaped by various learning systems should be approached with caution. This is particularly important when the manipulations in use might influence the arbitration process and thereby change the interplay among the learning systems. In principle, a multitude of simple learning strategies could underlie the heterogeneity in the so-called decision variables^62,63,64, and careful examination of neural signals^65,66,67 is needed to properly identify the neural substrates of corresponding learning systems.

Methods

Experimental paradigm

We examined two variants of a probabilistic reversal learning task in which monkeys selected between two visual stimuli to obtain a juice reward. During each block of the What-only task, reward was assigned stochastically according to stimulus identity only while reward probabilities on the two stimuli (selected afresh on each block) switched between trial 30 and 50 of the block without any signal to the monkeys (Fig. 1a). In the What/Where task, reward was assigned based on either stimulus identity (What blocks) or stimulus location (Where blocks) with reversal similar to the What-only task (more details below). Data were collected from a total of 20 unique monkeys, some of whom received bilateral excitotoxic lesions to either the amygdala or the VS (Supplementary Fig. 17). All experimental procedures for all monkeys were performed in accordance with the Guide for the Care and Use of Laboratory Animals and were approved by the National Institute of Mental Health Animal Care and Use Committee. We describe each experimental setup in more detail below.

What-only task

Data from this task¹⁷ were collected in eleven male rhesus macaques weighing 6.5–10.5 kg (controls: n = 4; amygdala-lesioned: n = 4; VS-lesioned: n = 3). The monkeys completed an average of 26.73 sessions (SD = 5.98) and an average of 16.81 (SD = 6.72) blocks per session. In total, monkeys completed on average 372.9 blocks (SD = 89.6). Each block consisted of 80 trials and involved a single reversal of stimulus-outcome contingencies on a randomly selected trial between trials 30 and 50 from a uniform distribution (Fig. 1a). On each trial, monkeys were trained to fixate on a central point on a screen (500–750 ms) to initiate the trial. After fixation, two stimuli, a square and a circle of random colors, were assigned pseudo-randomly to the left and right of the fixation point (6° visual angle). Monkeys indicated their choice by making a saccade to the target stimulus and fixating for 500 ms. Reward (0.085 ml juice) was delivered according to the assigned reward schedule for a given block. Each trial was followed by a fixed 1.5 s inter-trial interval. Trials in which monkeys failed to fixate within 5 s or make a choice within 1 s were aborted and then repeated.

The reward schedule was determined by the probabilities of reward on two choice options selected from four possible values: 100/0, 80/20, 70/30, and 60/40. The reward schedule was randomly selected at the start of each block and remained constant within that block. Monkeys performed the deterministic task (100/0 reward schedule) after the data collection for the stochastic task had been completed. Here, we focus on our analyses of the task’s stochastic variant to match the reward schedules used in the What/Where task (which only contained stochastic schedules, as described below), and therefore, we have excluded the deterministic portion of the data from our analyses. All monkeys that received lesions were trained and tested following their recovery from surgery. For more detailed surgical information, see the Supplemental Experimental Procedures in the original study¹⁷. This experimental setup and some analyses of the data have been previously reported¹⁷.

What/Where task

Unlike the What-only task, the What/Where task involved both stimulus-based and action-based learning, and this feature introduced additional uncertainty about the correct model of the environment. The effects of lesions to the VS and amygdala during the What/Where task were examined in two different studies with separate sets of controls for each. The first study investigating the effect of VS lesions²⁴ had a total of eight subjects weighing 6.5–11 kg (controls: n = 5; VS-lesioned: n = 3). One of the five control monkeys and all three of the VS-lesioned monkeys were the same monkeys used in the What-only task¹⁷. The second study investigating the effect of amygdala lesions¹⁸ had a total of 10 subjects weighing 6–11 kg (controls: n = 6; amygdala-lesioned: n = 4). One of the six unoperated controls was the same monkey used in the What-only task¹⁷ and the What/Where task involving VS lesion²⁴. One additional control monkey was used as an unoperated control for the earlier What/Where task only²⁴. The remaining control and the four amygdala-lesioned monkeys were additionally trained for the subsequent study using the What/Where task¹⁸. See Supplementary Fig. 17a for a summary diagram. Any monkeys that have participated in both the What-only and What/Where tasks (i.e., one control and three VS-lesioned monkeys) first completed the What-only task and then later completed the What/Where task. Notably, all newly trained monkeys that performed the What/Where task were first trained on a simple two-armed bandit stimulus-based reward associations (“What” condition). After learning about this task, they were then trained with a deterministic version (100/0) of the What/Where task and gradually transitioned into the probabilistic outcomes used for this experiment (80/20, 70/30, 60/40). As such, all monkeys in the study shared the same prior training experience, specifically learning the “What” task first.

The monkeys in this task completed an average of 29.56 sessions (SD = 5.36), with an average of 18.41 (SD = 4.26) blocks per session. In total, monkeys completed on average 559.1 blocks (SD = 141.6). Each block consisted of 80 trials and involved a single reversal of the stimulus-based or action-based contingencies between trials 30 and 50. A given block was randomly assigned as a What or a Where block and remained constant within that block (Fig. 1b). In What blocks, reward probabilities were assigned based on stimulus identity, with a particular object having a higher reward probability. In Where blocks, reward probabilities were assigned based on location, with a particular side having a higher reward probability regardless of stimulus identity. What and Where blocks were randomly interleaved throughout the session, and the block type was not indicated to the monkey. The reward schedule was randomly selected from three schedules (80/20, 70/30, 60/40) at the start of each block and remained constant within that block.

On each trial, monkeys were trained to fixate on a central point on a screen (400–600 ms) to initiate the trial. After fixation, two visual objects were assigned pseudo-randomly to the left and right of the fixation point (6° visual angle). Each block used two novel images that the animal had never seen before. Monkeys indicated their choice by making a saccade to the target stimulus and fixating for 500 ms. Reward was delivered probabilistically according to the assigned reward schedule for a given block. Each trial was followed by a fixed 1.5 s inter-trial interval. Trials in which monkeys failed to fixate within 5 s or make a choice within 1 s were aborted and then repeated. This experimental setup, surgical information, and some analyses of the data have been previously reported^18,24.

Quantification and statistical analysis

Entropy-based metrics

Here, we utilized information-theoretic metrics to quantify learning and choice behavior^22,23. Specifically, we focused on the conditional entropy of reward-dependent strategy (ERDS) to measure how monkeys associated reward feedback with stimulus identity or action.

Generally, ERDS is calculated as follows:

$${{ERDS}}=\,H\left({{str}}|{{rew}}\right)= -\left\{P({{stay}},{{win}})\times {\log }_{2}\frac{P\left({{stay}},{{win}}\right)}{P\left({{win}}\right)}+P({{switch}},{{win}})\right.\\ \left. \times {\log }_{2}\frac{P\left({{switch}},{{win}}\right)}{P\left({{win}}\right)}+P\left({{stay}},{{lose}}\right)\times {\log }_{2}\frac{P\left({{stay}},{{lose}}\right)}{P\left({{lose}}\right)} \right.\\ \left.+P\left({{switch}},{{los}}e\right)\times {\log }_{2}\frac{P\left({{switch}},{{lose}}\right)}{P\left({{lose}}\right)}\right\}$$

(1)

where str is the adopted strategy coded as stay (1) or switch (0), rew is the previous reward outcome coded as reward (1) or no reward (0). Noting that $\frac{P({{\mathrm{stay}}},{{\mathrm{win}}})}{P({{\mathrm{win}}})}$ and $\frac{P({{\mathrm{switch}}},{{\mathrm{lose}}})}{P({{\mathrm{lose}}})}$ measure tendencies for win-stay and lose-switch strategies, one can see the ERDS combines these tendencies into a single quantity:

$$ERDS=\, -\{P({stay,win})\times {\log }_{2} (WinStay)+P({switch,win}) \\ \times {\log }_{2}(1- WinStay)+P({stay,lose})\times {\log }_{2} (1-LoseSwitch)\\ +P({switch,lose})\times {\log }_{2}(LoseSwitch)\left)\right.\}.$$

(2)

As the equations above suggest, ERDS measures the consistency in response to reward feedback (win or lose). Lower ERDS values correspond to decreased randomness in the variable and thus more consistency in the utilized strategy, which could be stimulus-based and/or action-based. To detect these strategies, we defined two types of ERDS by considering choice and reward feedback in terms of stimulus identity or action, corresponding to ERDS_Stim and ERDS_Action, respectively.

Therefore, lower values of ERDS_Stim suggest that the animals stay or switch consistently based on stimulus identity according to reward feedback, indicating the stronger adoption of the stimulus-based strategy. Conversely, lower values of ERDS_Action indicate that the animals adopted the action-based strategy more strongly. Overall, comparison of ERDS_Stim and ERDS_Action enables us to quantify the adopted strategy on a trial-by-trial basis, either by computing the average values across a block of trials or by aligning all trials relative to the beginning, reversal point, and the end of each block.

Computational models

Single-system reinforcement learning (RL) models

We first used two standard RL models that learn about one type of reward contingencies to fit monkeys’ choice data. Specifically, the RL_Stim-only and RL_Action-only models associate reward outcomes to choice options either in terms of stimulus identity or chosen action in order to estimate stimulus and action values, respectively. These values were used to determine choice on each trial and were updated based on reward outcome at the end of trial, as described below.

More specifically, the value of the chosen option (V_C) is updated using reward prediction error (RPE) and two separate learning rates for rewarded and unrewarded trials (α₊ and α₋, respectively) while the value of the unchosen option (V_U) decays to zero:

$${V}_{C}(t+1)=\left\{\begin{array}{l}{V}_{C}(t)+{\alpha }_{+}\left(R\left(t\right)-{V}_{C}\left(t\right)\right)\,{{{\mathrm{if}}}}\,R(t)=1,\\ {V}_{C}(t)+{\alpha }_{-}\left(R\left(t\right)-{V}_{C}\left(t\right)\right)\,{{{\mathrm{if}}}}\,R(t)=0,\end{array}\right.$$

(3)

$${V}_{U}\left(t+1\right)={\left(1-\zeta \right)V}_{U}\left(t\right),$$

(4)

where R(t) is the reward feedback on trial t and ζ is the decay or forgetting rate for the unchosen option. Chosen and unchosen options are coded as {stimulus A, stimulus B} in the RL_Stim-only model and as {Left, Right} in the RL_Action-only model. In these and other models, the probability of choosing the option on the right, P_Right(t), was computed using a softmax function:

$${P}_{{{\mathrm{Right}}}}(t)=\frac{1}{1+\exp (-{\beta }_{1}\left({{OV}}_{{Right}}(t)-{{OV}}_{{Left}}(t)\right){-\beta }_{0})},$$

(5)

where OV_Left and OV_Right denote the overall reward value of options on the left and right, β₁ controls the steepness of the sigmoid function (inverse temperature) measuring the baseline sensitivity of choice to difference in value signals, and β₀ is the side bias with positive values corresponding to a bias toward choosing right. For the RL_Stim-only model, OV_Left and OV_Right were assigned based on the stimulus identity appearing on the respective side for a given trial. For example, if stimulus A appeared on the left of fixation, then OV_Left = V_StimA and OV_Right = V_StimB. For the RL_Action-only model, the overall reward values correspond to action values; i.e., OV_Left = V_Left and OV_Right = V_Right.

Two-system model with static weighting of stimulus- and action-based learning

As an extension of the above RL models, we considered hybrid RL models that constituted two value functions, V_Stim and V_Action, to simultaneously track the reward value for alternative stimuli and actions and made choices based on a weighted sum of value signals from the two systems with a weight that was fixed on each block of the experiment (RL_Stim+Action+Static ω or Static ω model for short). Specifically, the value functions were updated in parallel using Eqs. 3 and 4.

Therefore, the overall values in this model are computed as follows:

$${{OV}}_{i}\,={V}_{{Stim}\left(i\right)}\omega+{V}_{{Action}(i)}(1-\omega ),$$

(6)

where ω represents the relative weight of the stimulus-based system compared to the action-based system, i ∈ {Left, Right}, and V_Stim(i) indicates the stimulus value for the option appearing on the side i. For example, if the stimulus A appeared on the left side, then OV_Left = V_StimAω + V_Left(1−ω) and OV_Right = V_StimBω + V_Right(1−ω). Similar to the learning rates and other parameters, a single value of ω was estimated for each block of trials. In the special case where ω = 0.5, the stimulus-value and action-value exert equal influence on choice. We note that the overall reward value in our model is used mainly as a convenience for presenting the model and does not require stimulus and action values to be integrated. Rather, each system can first compare its own values (stimulus values against other stimulus values, and action values against other action values), and the results of these within-system comparisons are then combined, with different weights, to determine the choice (see Eq. 7).

Using the above OVs (Eq. 6), the decision rule in Eq. 5 can be rewritten as follows (omitting the trial index t for simplicity):

$${{{\mathrm{logit}}}}\left({P}_{{Right}}\right)=\, {\beta }_{0}+{\beta }_{1}\left({{OV}}_{{Right}}-{{OV}}_{{Left}}\right)\\=\, {\beta }_{0}+{\beta }_{1}\left\{{V}_{{Stim}\left({Right}\right)}\omega+{V}_{{Right}}\left(1-\omega \right)\right.\\ \left.- \, \left({V}_{{Stim}\left({Left}\right)}\omega+{V}_{{Left}}\left(1-\omega \right)\right)\right\}\\=\, {\beta }_{0}+{\beta }_{1}\left\{\left({V}_{{{Stim}}\left({Right}\right)}-{V}_{{Stim}\left({Left}\right)}\right)\omega \right. \\ \left.+\left({V}_{{Right}}-{V}_{{Left}}\right)\left(1-\omega \right)\right\}\\=\, {\beta }_{0}+{\beta }_{1}\left\{\Delta {V}_{{Stim}}\omega+{\Delta V}_{{Action}}\left(1-\omega \right)\right\}\\=\, {\beta }_{0}+{\beta }_{1}\omega \left(\Delta {V}_{{Stim}}\right)+{\beta }_{1}\left(1-\omega \right)\left({\Delta V}_{{Action}}\right),$$

(7)

where ΔV_Stim = V_Stim(Right) – V_Stim(Left) and ΔV_Action = V_Right – V_Left. That is, β₁ω and β₁(1–ω) represent the sensitivity of choice to value signals from the stimulus- and action-based systems, respectively. Therefore, ω controls the relative sensitivity of choice to the two competing systems, with stronger ω corresponding to a stronger influence of the stimulus-based system, and vice versa.

Two-system models with dynamic weighting of stimulus- and action-based learning

To allow for dynamic arbitration, we constructed hybrid models in which ω was updated on a trial-by-trial basis using the relative reliability of the two systems. In this model (Dynamic ω), the difference in reliability of the two systems is computed at the end of each trial to update the value of ω toward the more reliable system. More specifically, the relative reliability, ΔRel, between two systems at the end of trial t is computed as follows:

$${\Delta Rel}(t)=\Delta {V}_{{cho}}(t)={V}_{C,{Stim}}(t)-{V}_{C,{Action}}(t),$$

(8)

where V_C,Stim and V_C,Action correspond to the value of the chosen option in the stimulus- and action-based system, respectively, and ΔV_cho denotes the (signed) difference between the two. ΔRel ranges [−1, 1], with positive values indicating a more reliable stimulus-based system. Intuitively, ΔV_cho signals the system that gives an overall larger value for the given choice and thus, is more reliable in predicting reward.

Subsequently, the relative sensitivity of choice to the two systems, ω, is updated as follows:

$$\begin{array}{l}\omega (1)={\omega }_{0},\\ \omega (t+1)=\left\{\begin{array}{l}\omega (t)+{\alpha }_{\omega }{\varDelta Rel}(t)\left(1-\omega (t)\right)+{\zeta }_{\omega }({\omega }_{0}-\omega (t))\,{{{\mathrm{if}}}}\,{\varDelta Rel}(t) > 0,\\ \omega (t)+{\alpha }_{\omega }\left|{\varDelta Rel}(t)\right|\left(0-\omega (t)\right)+{\zeta }_{\omega }({\omega }_{0}-\omega (t))\,{{{\mathrm{if}}}}\,{\varDelta Rel}(t) < 0,\end{array}\right.\end{array}$$

(9)

where ω₀ is the initial ω on the first trial (onset of each block), α_ω is the baseline model arbitration rate (distinct from the learning rates in in Eq. 3), and ζ_ω is the passive decay rate that pulls ω toward its initial value ω₀ (distinct from the decay rate in Eq. 4). This additional decay mechanism assumes that ω defaults back to its initial bias in the absence of exogenous input signaling the reliability difference (ΔRel). We focus on this model with passive decay for all analyses, as it fits better than the variants without the passive decay term (Supplementary Fig. 18a). Importantly, the arbitration rates α_ω tend to be larger than the decay rates ζ_ω across groups and tasks (Supplementary Fig. 18b, c).

Two-system models with dynamic weighting and separate baseline signals for stimulus- and action-based learning

To rule out the possibility that the observed effects in amygdala-lesioned monkeys are solely due to impairment in learning stimulus values (e.g., by reducing the baseline strength of stimulus-value signals) and without any changes to arbitration processes, we included an additional parameter to separate these two types of changes. In the Dynamic ω model, an increase in the sensitivity to the stimulus-based system, β₁ω(t), is strictly tied to a decrease in the sensitivity to the action-based system, β₁(1–ω(t)), and vice versa. This constraint can be removed by introducing a constant factor that further scales the value of a given model before combining with the value from the other model. In this new model referred to as Dynamic ω-ρ, the overall values are equal to:

$${{OV}}_{i}={V}_{{Stim}\left(i\right)}\rho \omega (t)+{V}_{{Action}(i)}(1-\rho )(1-\omega (t)),$$

(10)

where ρ is a constant that measures the baseline ratio of signal strength from the stimulus-based system relative to the action-based system (estimated for each monkey), independent of the time-dependent arbitration weight (ω). The update for ω is the same as in the Dynamic ω model (Eq. 9). Therefore, similar to Eq. 7, the decision rule for the Dynamic ω-ρ model can be simplified as follows:

$${{{\mathrm{logit}}}}\left({P}_{{{\mathrm{Right}}}}(t)\right)=\, {\beta }_{0}+{\beta }_{1}\left\{{{OV}}_{{Right}}\left(t\right)-{{OV}}_{{Left}}\left(t\right)\right\} \\=\, {\beta }_{0}+{\beta }_{1}\rho \omega \left(t\right)\Delta {V}_{{Stim}}\left(t\right)+{\beta }_{1}\left(1-\rho \right)\left(1-\omega \left(t\right)\right){\Delta V}_{{Action}}\left(t\right).$$

(11)

This shows that β₁ρ and β₁(1–ρ) can be interpreted as the baseline strength of the stimulus- and action-based signals on choice, respectively, prior to modulation by the time-dependent arbitration parameter ω. Therefore, including ρ allows us to capture baseline (time-independent) differences in the strength of signals from the two learning systems. Therefore, ρ < 0.5 (ρ > 0.5) corresponds to lower baseline activity or impairment in the stimulus-based (respectively, action-based) system. When ρ = 0.5, the Dynamic ω-ρ model reduces to the Dynamic ω model. Because ρ is assumed to capture the baseline activity, we estimated a single value of ρ per each monkey for the entire duration of the experiment (see Model fitting and simulation for more details).

Because in this model, the arbitration weight is further weighted by ρ (or 1–ρ), we defined an “effective” arbitration weight to measure the overall relative weighting between the stimulus-based (β_Stim = β₁ρω) and action-based (β_Action = β₁(1−ρ)(1−ω)) systems as follows:

$$\Omega (t)=\frac{{\beta }_{{Stim}}(t)}{{\beta }_{{Stim}}(t)+{\beta }_{{Action}}(t)}=\frac{\rho \omega (t)}{\rho \omega (t)+(1-\rho )(1-\omega (t))},$$

(12)

where Ω is the relative weight of the stimulus-based system with respect to the total weight of the two systems on choice. It is worth noting that the common baseline inverse temperature β₁ is independent of Ω, which is determined only by ρ and ω, and that Ω reduces to ω when ρ = 0.5.

Alternative signals for estimating the reliability of learning systems

In addition to V_cho as the reliability signal for updating ω in the dynamic models, we also considered several other quantities to estimate reliability. As the first alternative to ΔV_cho for the relative reliability used to update the relative weight (Eq. 8), we considered the difference in magnitudes of RPE (|RPE|) of the action- and stimulus-based systems:

$${\varDelta Rel}(t)={|}{{RPE}}_{{Action}}(t)|-{|}{{RPE}}_{{Stim}}(t)|.$$

(13)

Conceptually, a system that yields better prediction of reward on a given trial has a lower magnitude of RPE and thus is more reliable. Note that the difference in chosen stimulus and action values, ΔV_cho, can be also rewritten as:

$$\Delta {V}_{{cho}}\left(t\right)=\, {V}_{C,{Stim}}\left(t\right)-{V}_{C,{Action}}\left(t\right) \\=\, \left(R\left(t\right)-{V}_{C,{Action}}\left(t\right)\right)-\left(R\left(t\right)-{V}_{C,{Stim}}\left(t\right)\right) \\=\, {{RPE}}_{{Action}}\left(t\right)-{{RPE}}_{{Stim}}\left(t\right).$$

(14)

This demonstrates that using ΔV_cho for the relative reliability corresponds to the signed RPE instead of the unsigned RPE.

We also considered the difference between the value of chosen and unchosen options within each system (|ΔV|) as a measure of the reliability of that system. Intuitively, this reliability signal is larger for a system that yields a better discernibility between the two competing options on a given trial. In this case, the relative reliability can be written as:

$${\varDelta Rel}\left(t\right)=\, \left|\Delta {V}_{{Stim}}\left(t\right)\right|-\left|\Delta {V}_{{Action}}\left(t\right)\right| \\=\, \left|{V}_{{StimA}}\left(t\right)-{V}_{{StimB}}\left(t\right)\right|-\left|{V}_{{Left}}\left(t\right)-{V}_{{Right}}\left(t\right)\right|.$$

(15)

Finally, we also considered the total sum of the two value estimates (|ΣV|) as a signal for estimating the reliability of a given system. For this measure, reliability is larger for a system that gives overall larger combined values from the two options. In this case, the relative reliability is equal to:

$${\Delta Rel}\left(t\right)=\, \Sigma {V}_{{Stim}}\left(t\right)-\Sigma {V}_{{Action}}\left(t\right)=\left({V}_{{StimA}}\left(t\right)+{V}_{{StimB}}\left(t\right)\right)\\ -\left({V}_{{Left}}\left(t\right)+{V}_{{Right}}\left(t\right)\right).$$

(16)

For all these different versions of ΔRel, we used the same equation for updating ω (Eq. 9).

Effective arbitration rates

In the above formulation of arbitration mechanism (Eqs. 8–12), the rate of update for ω depends on several factors including the baseline ratio parameter ρ, baseline model arbitration rate α_ω, the trial-by-trial difference in reliability (ΔRel), and the passive decay mechanism (ζ_ω). To capture the overall update rates in arbitration weight Ω, we defined the “effective” arbitration rates that quantify the overall rate of change toward either the stimulus- or action-based system (Fig. 3d). Analogous to the update rule in the valuation system (Eq. 3), an update rule for the effective arbitration weight can be written as follows:

$$\Omega (t+1)=\left\{\begin{array}{l}\Omega (t)+{\psi }_{+}(t)\left(1-\Omega \left(t\right)\right)\,{{\mathrm{if}}}\,\Delta \varOmega (t) > 0,\\ \Omega \left(t\right)+{\psi }_{-}\left(t\right)\left(0-\Omega \left(t\right)\right)\,{{\mathrm{if}}}\,\Delta \varOmega \left(t\right) < 0.\end{array}\right.$$

(17)

where ψ₊ and ψ₋ represent the effective arbitration rate toward the stimulus-based and action-based systems, respectively (ΔΩ = 0 corresponds to no change in arbitration). Unlike the learning rates (α₊ and α₋) that are fitted to each session, the effective arbitration rates (ψ) are estimated for each trial (Fig. 3e–g). More specifically, the effective arbitration rate on trial t when Ω increases, shifting choice toward the stimulus-based system, is defined as follows:

$${\psi }_{+}\left(t\right)=\frac{{[\Delta \Omega \left(t\right)]}^{+}}{1-\Omega \left(t\right)},$$

(18)

whereas the effective arbitration rate when Ω decreases, biasing choice toward the action-based system, is defined as follows:

$${\psi }_{-}\left(t\right)=\frac{{[\Delta \Omega \left(t\right)]}^{-}}{0-\Omega \left(t\right)},$$

(19)

where []⁺ and []^- indicate positive and negative changes in Ω, respectively. Similar to the learning rates for rewarded and unrewarded trials (α₊ and α₋), which capture differential updates based on different reward outcomes, the effective arbitration rates capture differential update rates toward the correct and incorrect learning systems based on the difference in reliability. Specifically, we tested whether the effective arbitration rate of the correct system in a given block type (i.e., ψ₊ in What blocks and ψ₋ in Where blocks) is distinct from that of the incorrect system, as larger effective rates for the correct system would reduce noise in incorporating feedback, enabling more efficient arbitration and, ultimately, leading to improved performance.

Model fitting and simulation

We used the standard maximum likelihood estimation method to fit choice data and estimate the best-fit parameters for the described models. One set of model parameters was fitted to each session (consisting of ~20 blocks) of monkeys’ choice data. Fitting was performed using the MATLAB optimization function fmincon, repeating the search for 100 sets of random initial parameter values to ensure global minima. See Supplementary Table 15 for the list of models and the ranges of parameters used. We report the mean Akaike Information Criterion (AIC) values of each model in Supplementary Fig. 14.

For the Dynamic ω-ρ model (Eqs. 10–12), we assumed that the value of ρ is fixed for the entirety of the experiment and does not vary across sessions, as it aims to capture the relative strength of one learning system relative to the other. Accordingly, to estimate a single value of ρ for each monkey, we fitted the entire dataset of a given monkey and obtained a single set of best-fit parameters. From this set, we only kept the value of ρ and fit the choice data again for the remaining parameters by allowing different values across sessions. The distributions of the other fitted parameters of this model are reported in Supplementary Figs. 12–13.

To test whether the observed relationship between stimulus- and action-based learning indeed required competition between the two learning systems and was not due to task structure, we simulated choice behavior using single-system or two-system models and computed model-simulated ERDS_Stim and ERDS_Action (Eq. 1) (Supplementary Fig. 9). To measure the association between the two measures and isolate the within-subject effect, we mean-centered the predictor (ERDS_Stim) and fitted the mixed-effects model as ERDS_Action ~ ERDS_Stim + (1|subject) to account for subject variability. To that end, we used the fitted parameters for each session and simulated the choice behavior 100 times for each block, using the same random reversal position and reward schedule as the behavioral data. We obtained the final averaged ERDS values for stimulus and action.

K-fold cross-validation of model performance

To compare the goodness-of-fit and determine the winning model, we used five-fold cross-validation, where each set of training/testing blocks was tested repeatedly with 50 unique instances. For each task, we created the training and testing sets as follows: for each subject, we randomly partitioned all the block data experienced by the monkey into five equal subsamples or “folds,” and selected each fold as the test data (20%) while using the remaining four folds as the training data (80%). For each instance, we obtained best-fit parameters from the training set that minimized the negative log-likelihood across all training blocks and used these to calculate the negative log-likelihood from the test blocks. Model performance was tested for each fold, thereby exhaustively testing all blocks data for a given subject. We repeated this procedure 50 times for each subject, each time using a unique combination of partitioning the data into five folds. Final mean negative log-likelihoods (-LL) were obtained by averaging across all tested blocks, reflecting the total sum of -LL during each block on average. We report these values in Figs. 2b and 3a, Supplementary Fig. 7a, and Supplementary Fig. 18a. Model performance for individual monkeys is also reported in Supplementary Table 16.

In these results, we note that even a small improvement in -LL implies a significant improvement in the predictability of the model for the tested blocks (Supplementary Note 3). Along with the -LL results, we also provide McFadden’s R² values⁶⁸ as an absolute measure for goodness-of-fit for each model in comparison to a null model with chance-level prediction. This quantity is calculated as follows:

$${McFadden}\,{R}^{2}=1-\frac{{\sum }_{t}{{LL}}_{{model}}\left(t\right)}{{{\sum }_{t}{LL}}_{{null}}}=1-\frac{{\sum }_{t}{{LL}}_{{model}}\left(t\right)}{80\times {ln}\left(0.5\right)},$$

(20)

where t indicates trial number within each block of 80 trials.

Models and parameters recovery

To perform model recovery (Supplementary Fig. 8a–d), we simulated the choice behavior of each model during a randomly created block environment similar to the experiment. Specifically, each session consisted of 30 blocks, each block having 80 trials with a reversal and with a randomly assigned reward schedule (80/20, 70/30, 60/40) and block type (What-only or What/Where). To ensure that choice behavior is simulated within a plausible range of parameters, we randomly sampled each parameter value from a kernel distribution fitted to all observed parameter values for a given model. We then fit all models and determined the best-fit model for each simulated session based on the AIC. We repeated this procedure 1000 times and report the proportion of sessions that a given fitted model best accounts for each simulated model.

For parameter recovery (Supplementary Fig. 8e, f), we simulated the choice behavior of the best model (Dynamic ω-ρ) using the estimated parameters from the experimental data and then refit the simulated data with the same model. Each block environment was set up using the same reward schedule and reversal position as the actual experiment. We simulated each block one time to ensure that the total number of simulated trials is the same as that of the experiment, from which the true parameters were estimated. We recovered the parameters from the simulated data using the same fitting procedure based on AIC, repeating the search for 100 initial random parameter values to ensure global minima. For recovering the ρ parameter, we used the entire simulated data for each monkey and obtained a single set of best-fit parameters, as was done for the actual data.

Data analysis and statistical tests

All analyses were carried out using MATLAB (MathWorks, version 2021b). All comparisons were performed using appropriate statistical tests reported throughout the text. For each test, we report the exact p-values and effect sizes when appropriate. Unless otherwise noted, all statistical tests used in this study were two-sided. In cases where highly significant results caused the software to return p-values of zero due to numerical precision limits, we report the smallest representable floating-point number at machine precision (i.e., 4.94 × 10⁻³²⁴).

Because the data were collected across different animals, we primarily utilized linear mixed-effects regression analyses (using MATLAB fitlme function) for between-group comparisons, with the group assignment as a fixed effect and subjects as a random effect, to appropriately account for between-subject variance. To maximize the accuracy of models and reduce Type-I error, we considered subject-level intercepts and additional random slopes for long-term adjustment effects, both across the experiment (proportion of session completed) and within each session day (block within a session). When performing a within-group significance test to compare a paired set of samples (e.g., Δβ = β_stim−β_action, Δψ = ψ₊−ψ₋), we fitted mixed-effects models with a fixed intercept and subject-level random effects (data ~ 1 + (1 + sess_perc + block_in_sess|subject)), where the main intercept represents the mean value of the paired difference. We then tested whether this intercept significantly differed from zero, as indicated by the coefficient β₀. For comparing block-wise performance within each group, we included random effects of subjects and fixed effects of block types (What-only, What, Where) and reward uncertainty (measured as the variance of the outcome¹³) with the following formula: P(Better) ~ variance + BlockType + (1 + variance+sess_perc+block_in_sess|subject). Specifically, the variance was calculated as p_B*(1−p_B), where p_B represents the reward probability of the better option (i.e., p_B = {0.8, 0.7, 0.6}). Variance was further mean-centered by subject for better interpretability of other coefficients.

To categorize each trial as either stimulus- or action-dominant, we directly compared ERDS_Stim and ERDS_Action (computed from a moving window of 10 trials; Eq. 1). Trials with ERDS_Stim <ERDS_Action and ERDS_Action <ERDS_Stim were categorized as stimulus-dominant and action-dominant, respectively. We dropped trials with ERDS_Stim = ERDS_Action, amounting to 11.1% of total trials in the What-only and 11.7% in the What/Where tasks. We used a mixed-effects regression model with subjects as random intercepts to test whether the dominant strategy significantly modulates RT. Fixed effects included the following predictors: dominant strategy, coded as stim-dominant (0) or action-dominant (1), whether the monkey had chosen the better option (1) or not (0), reward schedule or uncertainty (measured as variance), trial number within a block, block number within a session, and session number within the subject. We also included interaction between the dominant strategy and the choice of a better option, as the latter could depend on the adopted strategy. Variables were normalized by each subject to yield comparable standardized regression coefficients.

To study the long-term adjustment in behavior across the time course of the experiment (Supplementary Fig. 11), we calculated ERDS_Stim, ERDS_Action (Eq. 1), and median RT for each block and regressed them on the proportion of the sessions (total block number) completed as the predictor variable. This regressor was further mean-centered by subject for better interpretability of other coefficients. Initial effective arbitration weight (Ω₀), which was estimated for each session, was analyzed at the session level. To further account for the variability in adjustment at the subject level, we considered random slopes and intercepts for the effect of sessions within each subject using mixed-effects models. We included all group data from the What-only task and used planned contrasts to infer slopes for each group and the group differences in the slopes. Full results are reported in Supplementary Tables 11–14. For visualization purposes only, we plot the simple least-squares lines and the estimated slopes for each group (β_sess) in Supplementary Fig. 11.

To estimate the effective arbitration rate across time (Fig. 3e–g; Eqs. 18 and 19), we computed the mean trajectory for each of two transition rates (toward stimulus- or action-based system) across blocks by aligning all trials relative to the beginning, reversal point, and the end of each block. For the bar plots in the insets, we computed the difference between the mean of the two transition rates within each block. In particular, we focused on the trials after the reversal to avoid potential confounds with initial bias (ω₀) and observed the transition behavior after adjusting to the initial uncertainty of the block.

To plot performance (Fig. 1e–g) and effective arbitration rates over time (Fig. 3e–g; Eqs. 18 and 19), we obtained the trajectories by concatenating 20 trials relative to the beginning, reversal point, and the end of each block, to account for the random reversal positions. These curves were then smoothed with a moving window of five trials, separately within the acquisition and reversal phases.

To identify the model parameters that are significantly modulated in the lesioned group with the consistent direction of effects across task conditions (What-only and What/Where), we adopted the following mixed-effects model: parameter ~ group + (group|task) + (1|subject:task). Namely, the model included a fixed effect of group (control, amygdala, VS) and random effects of subjects and tasks with a random slope of each group in each task. This formulation assumes a single, uniform effect of the group regardless of task condition, thereby identifying parameters that are consistently modulated by brain lesions. Note that subjects were nested within tasks to allow for a separate baseline for each task, as some of the monkeys participated in both the What-only and What/Where tasks. We tested this mixed-effects model on each of the parameters compiled across groups and tasks, with control monkeys as the reference group. For testing the group difference in the ρ parameter (Eqs. 10 and 11) across task conditions, which was fitted for each subject and therefore lacks adequate sample size, we utilized two-sided permutation tests. Specifically, we conducted permutation tests on the ρ values of each subject from a given pair of tested groups across both tasks and generated a null distribution of the test statistic (i.e., mean group difference in ρ) by randomly shuffling the group assignment 10,000 times. The p-value was calculated as the proportion of permuted test statistics that were as extreme or more extreme than the empirically observed test statistic.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data analyzed in this study are available at https://github.com/DartmouthCCNL/woo_etal_amygdala⁶⁹. The data generated in this study are provided in the Source data file. Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Dr. Alireza Soltani (alireza.soltani@dartmouth.edu). Source data are provided with this paper.

Code availability

Custom analysis codes are available at https://github.com/DartmouthCCNL/woo_etal_amygdala⁶⁹.

References

Averbeck, B. & O’Doherty, J. P. Reinforcement-learning in fronto-striatal circuits. Neuropsychopharmacology 47, 147–162 (2022).
Article PubMed Google Scholar
O’Doherty, J. P. et al. Why and how the brain weights contributions from a mixture of experts. Neurosci. Biobehav. Rev. 123, 14–23 (2021).
Article PubMed PubMed Central Google Scholar
Soltani, A. & Koechlin, E. Computational models of adaptive behavior and prefrontal cortex. Neuropsychopharmacology 47, 58–71 (2022).
Article PubMed Google Scholar
Koechlin, E. An evolutionary computational theory of prefrontal executive function in decision-making. Philos. Trans. R. Soc. B Biol. Sci. 369, 20130474 (2014).
Article Google Scholar
Clark, L., Cools, R. & Robbins, T. W. The neuropsychology of ventral prefrontal cortex: decision-making and reversal learning. Brain Cogn. 55, 41–53 (2004).
Article PubMed Google Scholar
Yaple, Z. A. & Yu, R. Fractionating adaptive learning: a meta-analysis of the reversal learning paradigm. Neurosci. Biobehav. Rev. 102, 85–94 (2019).
Article PubMed Google Scholar
Izquierdo, A., Brigman, J. L., Radke, A. K., Rudebeck, P. H. & Holmes, A. The neural basis of reversal learning: an updated perspective. Neuroscience 345, 12–26 (2017).
Article PubMed Google Scholar
Ragozzino, M. E., Wilcox, C., Raso, M. & Kesner, R. P. Involvement of rodent prefrontal cortex subregions in strategy switching. Behav. Neurosci. 113, 32–41 (1999).
Article PubMed Google Scholar
Birrell, J. M. & Brown, V. J. Medial frontal cortex mediates perceptual attentional set shifting in the rat. J. Neurosci. 20, 4320–4324 (2000).
Article PubMed PubMed Central Google Scholar
Floresco, S. B., Ghods-Sharifi, S., Vexelman, C. & Magyar, O. Dissociable roles for the nucleus accumbens core and shell in regulating set shifting. J. Neurosci. 26, 2449–2457 (2006).
Article PubMed PubMed Central Google Scholar
Mansouri, F. A., Matsumoto, K. & Tanaka, K. Prefrontal cell activities related to monkeys’ success and failure in adapting to rule changes in a Wisconsin Card Sorting Test analog. J. Neurosci. 26, 2745–2756 (2006).
Article PubMed PubMed Central Google Scholar
Buckley, M. J. et al. Dissociable Components Of Rule-guided Behavior Depend On Distinct Medial And Prefrontal Regions. Science 325, 52–58 (2009).
Article ADS PubMed Google Scholar
Soltani, A. & Izquierdo, A. Adaptive learning under expected and unexpected uncertainty. Nat. Rev. Neurosci. 20, 635–644 (2019).
Article PubMed PubMed Central Google Scholar
Stalnaker, T. A., Franz, T. M., Singh, T. & Schoenbaum, G. Basolateral amygdala lesions abolish orbitofrontal-dependent reversal impairments. Neuron 54, 51–58 (2007).
Article PubMed Google Scholar
Izquierdo, A. et al. Basolateral amygdala lesions facilitate reward choices after negative feedback in rats. J. Neurosci. 33, 4105–4109 (2013).
Article PubMed PubMed Central Google Scholar
Jang, A. I. et al. The role of frontal cortical and medial-temporal lobe brain areas in learning a Bayesian prior belief on reversals. J. Neurosci. 35, 11751–11760 (2015).
Article PubMed PubMed Central Google Scholar
Costa, V. D., Dal Monte, O., Lucas, D. R., Murray, E. A. & Averbeck, B. B. Amygdala and ventral striatum make distinct contributions to reinforcement learning. Neuron 92, 505–517 (2016).
Article PubMed PubMed Central Google Scholar
Taswell, C. A. et al. Effects of amygdala lesions on object-based versus action-based learning in macaques. Cereb. Cortex 31, 529–546 (2021).
Article PubMed Google Scholar
Groman, S. M. et al. Orbitofrontal circuits control multiple reinforcement-learning processes. Neuron 103, 734–746.e3 (2019).
Article PubMed PubMed Central Google Scholar
Aguirre, C. G. et al. Dissociable contributions of basolateral amygdala and ventrolateral orbitofrontal cortex to flexible learning under uncertainty. J. Neurosci. 44, e0622232023 (2024).
Piray, P. & Daw, N. D. A model for learning based on the joint estimation of stochasticity and volatility. Nat. Commun. 12, 6587 (2021).
Article ADS PubMed PubMed Central Google Scholar
Trepka, E. et al. Entropy-based metrics for predicting choice behavior based on local response to reward. Nat. Commun. 12, 6567 (2021).
Article ADS PubMed PubMed Central Google Scholar
Woo, J. H. et al. Mechanisms of adjustments to different types of uncertainty in the reward environment across mice and monkeys. Cogn. Affect. Behav. Neurosci. 23, 600–619 (2023).
Article PubMed PubMed Central Google Scholar
Rothenhoefer, K. M. et al. Effects of ventral striatum lesions on stimulus-based versus action-based reinforcement learning. J. Neurosci. 37, 6902–6914 (2017).
Article PubMed PubMed Central Google Scholar
Farashahi, S., Rowe, K., Aslami, Z., Gobbini, M. I. & Soltani, A. Influence of learning strategy on response time during complex value-based learning and choice. https://doi.org/10.1371/journal.pone.0197263.
O’Doherty, J. et al. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452–454 (2004).
Article ADS PubMed Google Scholar
Clarke, H. F., Robbins, T. W. & Roberts, A. C. Lesions of the medial striatum in monkeys produce perseverative impairments during reversal learning similar to those produced by lesions of the orbitofrontal cortex. J. Neurosci. 28, 10972–10982 (2008).
Article PubMed PubMed Central Google Scholar
Costa, V. D., Mitz, A. R. & Averbeck, B. B. Subcortical substrates of explore-exploit decisions in primates. Neuron 103, 533–545.e5 (2019).
Article PubMed PubMed Central Google Scholar
Tang, H., Costa, V. D., Bartolo, R. & Averbeck, B. B. Differential coding of goals and actions in ventral and dorsal corticostriatal circuits during goal-directed behavior. Cell Rep. 38, 110198 (2022).
Article PubMed PubMed Central Google Scholar
Wang, J. X. et al. Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21, 860–868 (2018).
Article PubMed Google Scholar
Schouppe, N., Demanet, J., Boehler, C. N., Ridderinkhof, K. R. & Notebaert, W. The role of the striatum in effort-based decision-making in the absence of reward. J. Neurosci. 34, 2148–2154 (2014).
Article PubMed PubMed Central Google Scholar
Dobryakova, E., Jessup, R. K. & Tricomi, E. Modulation of ventral striatal activity by cognitive effort. NeuroImage 147, 330–338 (2017).
Article PubMed Google Scholar
Silveira, M. M., Tremblay, M. & Winstanley, C. A. Dissociable contributions of dorsal and ventral striatal regions on a rodent cost/benefit decision-making task requiring cognitive effort. Neuropharmacology 137, 322–331 (2018).
Article PubMed Google Scholar
Kim, A. & Chib, V. S. Neural substrates underlying the expectation of rewards resulting from effortful exertion. https://doi.org/10.1101/2024.02.01.578411 (2024).
Daw, N. D., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–1711 (2005).
Article PubMed Google Scholar
Lee, S. W., Shimojo, S. & O’Doherty, J. P. Neural computations underlying arbitration between model-based and model-free learning. Neuron 81, 687–699 (2014).
Article PubMed PubMed Central Google Scholar
Kool, W., Gershman, S. J. & Cushman, F. A. Cost-benefit arbitration between multiple reinforcement-learning systems. Psychol. Sci. 28, 1321–1333 (2017).
Article PubMed Google Scholar
Weissengruber, S., Lee, S. W., O’Doherty, J. P. & Ruff, C. C. Neurostimulation reveals context-dependent arbitration between model-based and model-free reinforcement learning. Cereb. Cortex 29, 4850–4862 (2019).
Article PubMed Google Scholar
Dorfman, H. M. & Gershman, S. J. Controllability governs the balance between Pavlovian and instrumental action selection. Nat. Commun. 10, 5826 (2019).
Article ADS PubMed PubMed Central Google Scholar
Miller, K. J., Shenhav, A. & Ludvig, E. A. Habits without values. Psychol. Rev. 126, 292–311 (2019).
Article PubMed PubMed Central Google Scholar
Seok, D. et al. Neurocircuit dynamics of arbitration between decision-making strategies across obsessive-compulsive and related disorders. NeuroImage Clin. 35, 103073 (2022).
Article PubMed PubMed Central Google Scholar
Collins, A. & Koechlin, E. Reasoning, learning, and creativity: frontal lobe function and human decision-making. PLOS Biol. 10, e1001293 (2012).
Article PubMed PubMed Central Google Scholar
Wurm, F., Ernst, B. & Steinhauser, M. Surprise-minimization as a solution to the structural credit assignment problem. PLOS Comput. Biol. 20, e1012175 (2024).
Article ADS PubMed PubMed Central Google Scholar
Charpentier, C. J., Iigaya, K. & O’Doherty, J. P. A neuro-computational account of arbitration between choice imitation and goal emulation during human observational learning. Neuron 106, 687–699.e7 (2020).
Article PubMed PubMed Central Google Scholar
Philippe, R. et al. Neurocomputational mechanisms involved in adaptation to fluctuating intentions of others. Nat. Commun. 15, 3189 (2024).
Article ADS PubMed PubMed Central Google Scholar
Nong, T.-T. A. et al. Computational mechanisms underlying the emergence of theory of mind in children. https://doi.org/10.31234/osf.io/y876r (2023).
Li, J., Schiller, D., Schoenbaum, G., Phelps, E. A. & Daw, N. D. Differential roles of human striatum and amygdala in associative learning. Nat. Neurosci. 14, 1250–1252 (2011).
Article PubMed PubMed Central Google Scholar
Roesch, M. R., Esber, G. R., Li, J., Daw, N. D. & Schoenbaum, G. Surprise! Neural correlates of Pearce–Hall and Rescorla–Wagner coexist within the brain. Eur. J. Neurosci. 35, 1190–1200 (2012).
Article PubMed PubMed Central Google Scholar
Murray, E. A. & Fellows, L. K. Prefrontal cortex interactions with the amygdala in primates. Neuropsychopharmacology 47, 163–179 (2022).
Article PubMed Google Scholar
Gangopadhyay, P., Chawla, M., Dal Monte, O. & Chang, S. W. C. Prefrontal–amygdala circuits in social decision-making. Nat. Neurosci. 24, 5–18 (2021).
Article PubMed Google Scholar
Carmichael, S. T. & Price, J. L. Limbic connections of the orbital and medial prefrontal cortex in macaque monkeys. J. Comp. Neurol. 363, 615–641 (1995).
Article PubMed Google Scholar
Ghashghaei, H. T., Hilgetag, C. C. & Barbas, H. Sequence of information processing for emotions based on the anatomic dialogue between prefrontal cortex and amygdala. NeuroImage 34, 905–923 (2007).
Article PubMed Google Scholar
Rudebeck, P. H., Mitz, A. R., Chacko, R. V. & Murray, E. A. Effects of amygdala lesions on reward-value coding in orbital and medial prefrontal cortex. Neuron 80, 1519–1531 (2013).
Article PubMed Google Scholar
Rudebeck, P. H., Ripple, J. A., Mitz, A. R., Averbeck, B. B. & Murray, E. A. Amygdala contributions to stimulus–reward encoding in the macaque medial and orbital frontal cortex during learning. J. Neurosci. 37, 2186–2202 (2017).
Article PubMed PubMed Central Google Scholar
Mahut, H. Spatial and object reversal learning in monkeys with partial temporal lobe ablations. Neuropsychologia 9, 409–424 (1971).
Article PubMed Google Scholar
Sarter, M. & Markowitsch, H. J. Involvement of the amygdala in learning and memory: a critical review, with emphasis on anatomical relations. Behav. Neurosci. 99, 342–380 (1985).
Article PubMed Google Scholar
Hampton, A. N., Adolphs, R., Tyszka, J. M. & O’Doherty, J. P. Contributions of the amygdala to reward expectancy and choice signals in human prefrontal cortex. Neuron 55, 545–555 (2007).
Article PubMed Google Scholar
Rygula, R. et al. Role of central serotonin in anticipation of rewarding and punishing outcomes: effects of selective amygdala or orbitofrontal 5-HT depletion. Cereb. Cortex 25, 3064–3076 (2015).
Article PubMed Google Scholar
Rudebeck, P. H. & Murray, E. A. Amygdala and orbitofrontal cortex lesions differentially influence choices during object reversal learning. J. Neurosci. 28, 8338–8343 (2008).
Article PubMed PubMed Central Google Scholar
Izquierdo, A. & Murray, E. A. Selective bilateral amygdala lesions in rhesus monkeys fail to disrupt object reversal learning. J. Neurosci. 27, 1054–1062 (2007).
Article PubMed PubMed Central Google Scholar
Ochoa, J. G. et al. Post-training depletions of basolateral amygdala serotonin fail to disrupt discrimination, retention, or reversal learning. Front. Neurosci. 9, 155 (2015).
Genovesio, A., Tsujimoto, S., Navarra, G., Falcone, R. & Wise, S. P. Autonomous encoding of irrelevant goals and outcomes by prefrontal cortex neurons. J. Neurosci. 34, 1970–1978 (2014).
Article PubMed PubMed Central Google Scholar
Cazettes, F. et al. A reservoir of foraging decision variables in the mouse brain. Nat. Neurosci. 26, 840–849 (2023).
Article PubMed PubMed Central Google Scholar
Gupta, D., DePasquale, B., Kopec, C. D. & Brody, C. D. Trial-history biases in evidence accumulation can give rise to apparent lapses in decision-making. Nat. Commun. 15, 662 (2024).
Article ADS PubMed PubMed Central Google Scholar
Bartolo, R., Saunders, R. C., Mitz, A. R. & Averbeck, B. B. Dimensionality, information and learning in prefrontal cortex. PLOS Comput. Biol. 16, e1007514 (2020).
Article ADS PubMed PubMed Central Google Scholar
Bartolo, R. & Averbeck, B. B. Prefrontal cortex predicts state switches during reversal learning. Neuron 106, 1044–1054.e4 (2020).
Article PubMed PubMed Central Google Scholar
Tang, H., Bartolo, R. & Averbeck, B. B. Reward-related choices determine information timing and flow across macaque lateral prefrontal cortex. Nat. Commun. 12, 894 (2021).
Article ADS PubMed PubMed Central Google Scholar
McFadden, D. Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics (ed. Zarembka, P.) 105–142 (New York: Academic Press, 1973).
Woo et al. Contribution of amygdala to dynamic model arbitration. Zenodo, https://doi.org/10.5281/zenodo.17480631 (2025).

Download references

Acknowledgements

The authors thank Chanc VanWinkle Orzell and Dmitriy Lisitsyn for their helpful comments on the manuscript. This work is supported by the National Institutes of Health (R01 DA047870 to A.S.) and by the Intramural Research Program of the NIMH (ZIA MH002928). The contributions of the NIH authors were made as part of their official duties as NIH federal employees, are in compliance with agency policy requirements, and are considered Works of the United States Government. However, the findings and conclusions presented in this paper are those of the authors and do not necessarily reflect the views of the NIH or the U.S. Department of Health and Human Services.

Author information

Authors and Affiliations

Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
Jae Hyung Woo & Alireza Soltani
Division of Developmental and Cognitive Neuroscience, Emory National Primate Research Center, Atlanta, GA, USA
Vincent D. Costa & Kathryn M. Rothenhoefer
Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
Craig A. Taswell & Bruno B. Averbeck

Authors

Jae Hyung Woo
View author publications
Search author on:PubMed Google Scholar
Vincent D. Costa
View author publications
Search author on:PubMed Google Scholar
Craig A. Taswell
View author publications
Search author on:PubMed Google Scholar
Kathryn M. Rothenhoefer
View author publications
Search author on:PubMed Google Scholar
Bruno B. Averbeck
View author publications
Search author on:PubMed Google Scholar
Alireza Soltani
View author publications
Search author on:PubMed Google Scholar

Contributions

J.H.W. and A.S. designed the study; V.C., C.T., K.R., and B.A. designed the experiments; J.H.W., V.C., C.T., K.R., B.A., and A.S. performed research; J.H.W. and A.S. analyzed data; J.H.W. and A.S. wrote the first draft paper. All authors contributed to the revision of the paper.

Corresponding author

Correspondence to Alireza Soltani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Massimo Silvetti and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Woo, J.H., Costa, V.D., Taswell, C.A. et al. Contribution of amygdala to dynamic model arbitration under uncertainty. Nat Commun 16, 11704 (2025). https://doi.org/10.1038/s41467-025-66745-1

Download citation

Received: 20 August 2024
Accepted: 12 November 2025
Published: 28 November 2025
Version of record: 30 December 2025
DOI: https://doi.org/10.1038/s41467-025-66745-1

Subjects

Abstract

Similar content being viewed by others

Rate and noise in human amygdala drive increased exploration in aversive learning

Translation of monosynaptic circuits underlying amygdala fMRI neurofeedback training

Neural inhibition as implemented by an actor-critic model involves the human dorsal striatum and ventral tegmental area

Introduction

Results

Behavioral paradigm with multiple forms of uncertainty

Evidence for multiple learning systems and their interaction

Influences of amygdala and ventral striatum lesions on learning and choice behavior

Mechanisms of arbitration between stimulus- and action-based learning systems

Deficits in arbitration due to amygdala but not VS lesions

Dynamic interaction between learning and arbitration processes and the impact of the initial state

Diversity of behavior driven by the dynamic interaction between learning and arbitration processes

Contribution of the amygdala to long-term adjustments of behavior

Discussion

Methods

Experimental paradigm

What-only task

What/Where task

Quantification and statistical analysis

Entropy-based metrics

Computational models

Single-system reinforcement learning (RL) models

Two-system model with static weighting of stimulus- and action-based learning

Two-system models with dynamic weighting of stimulus- and action-based learning

Two-system models with dynamic weighting and separate baseline signals for stimulus- and action-based learning

Alternative signals for estimating the reliability of learning systems

Effective arbitration rates

Model fitting and simulation

K-fold cross-validation of model performance

Models and parameters recovery

Data analysis and statistical tests

Reporting summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Source data

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links