Abstract
Hierarchical decisions in natural environments require processing uncertainty across multiple levels, but existing models struggle to explain how animals perform flexible, goal-directed behaviors under such conditions. Here we introduce CogLinks, biologically grounded neural architectures that combine corticostriatal circuits for reinforcement learning with frontal thalamocortical networks for executive control. Through mathematical analysis and targeted lesion experiments, we show that these systems specialize in different forms of uncertainty and that their interaction supports hierarchical decisions by regulating efficient exploration and strategy switching. We apply CogLinks to a computational psychiatry problem, linking neural dysfunction in schizophrenia to atypical reasoning patterns in decision making. Overall, CogLink fills an important gap in the computational landscape, providing a bridge from neural substrates to higher cognition.
Introduction
Environments and behaviors are often hierarchically organized, requiring animals to make decisions that integrate information from multiple levels. In complex tasks, animals must evaluate both immediate and broader causes to adapt their actions, especially when an unexpected outcome arises. Such outcomes can be ambiguous, as the brain must determine whether they result from random variability, a suboptimal strategy, or a fundamental change in the environment, a critical process for selecting the most effective course of action. This challenge often involves hierarchical reasoning, where the brain not only processes uncertainties at different levels but also integrates them to form coherent strategies.
For example, if a conversation with a new colleague feels awkward, one might question whether the topic is poorly chosen or whether the colleague is simply having a bad day. Disambiguating these possibilities is crucial for selecting the appropriate response. If the discomfort stems from a poor topic choice, switching to a different subject might improve the interaction. However, if the colleague’s disengagement reflects an underlying mood, a better approach might be to pause and revisit the conversation another day. This decision-making process relies on hierarchical inference, where lower-level variables (such as topic preferences) are interpreted within the broader context of higher-level states (such as mood or personal circumstances). This distinction becomes easier with a familiar colleague, as prior experience reduces uncertainty about their preferences, making it more likely that disengagement is attributed to mood rather than topic choice.
Since both conversational preferences and emotional states are latent variables that cannot be directly observed, the brain must infer their values while also estimating the uncertainty associated with each. For instance, one must assess both the intrinsic appeal of a topic, such as the Super Bowl, and the likelihood that a friend has emotionally recovered from a breakup after four months.
A fundamental challenge in neuroscience is understanding how the brain processes and integrates uncertainty across multiple hierarchical levels to drive flexible decision-making. Animal studies have demonstrated that perceptual confidence can influence higher-level contextual inference and revealed neural substrates associated with both sensory and contextual uncertainty in hierarchical decision-making tasks1,2. However, how contextual uncertainty interacts with other types of uncertainty, such as associative or outcome uncertainty (illustrated by the example of a conversation with a new colleague), remains unclear.
Machine learning approaches have contributed to progress in addressing this question. Traditionally, normative models based on Bayesian inference have been used to solve hierarchical tasks and model the strategies animals employ3,4,5. These models estimate uncertainty at multiple levels and use it for effective credit assignment6,7,8. However, they have significant limitations as tools for neuroscientific discovery. First, their explanatory power is constrained when the generative model of the environment is misspecified9, leaving the challenge of specifying an accurate model unresolved. Second, their components are non-neural, making it unclear whether and how they correspond to neural circuits and computations.
To address these limitations, the field has increasingly turned to neural networks trained through deep learning, which have been proposed as models of neural computation and have demonstrated exceptional performance on a range of tasks, sometimes exceeding human capabilities10,11. However, these architectures operate as frequentist prediction models that do not explicitly account for uncertainty. As a result, they cannot estimate confidence in different task components in the way humans and animals do8,12,13,14. These shortcomings highlight the need for a new approach to understand how brains make decisions in hierarchical environments and how uncertainty processing enables this cognitive ability.
In this study, we introduce CogLink, a neural architecture designed to bridge this gap. Fundamentally, CogLink networks are dynamical systems composed of rate neurons and share structural similarities with artificial feedforward and recurrent neural networks (Fig. 1a). However, they differ from conventional machine learning neural networks in three important ways.
a A task-optimized neural network updates its parameters using gradient descent to minimize a predefined loss function. b The ideal observer model employs a generative model of the environment and chooses actions that minimize the loss over the posterior distribution of the state. This approach uses Bayesian inference to evaluate the posterior and inform optimal decision-making. c The CogLink model integrates algorithmic approximations with neural dynamics, enabling interpretable computation. By constraining neural activity to a low-dimensional subspace, CogLink uses dimensionality reduction and separation of timescales to approximate neural trajectories as structured algorithms. These approximations allow the optimization of algorithmic parameters to inform the corresponding neural parameters in the CogLink model. Unlike exact solutions, which are computationally intractable, CogLink achieves asymptotic optimality through mathematical analysis, making the optimization process feasible. d CogLink incorporates biological realism by modeling specific brain systems, including connectivity patterns, cell types, and learning rules. This biologically grounded design enables the identification of computational roles for each mechanism in hierarchical decision-making.
First, CogLink networks are optimized using a multi-step procedure. Instead of using backpropagation to minimize network error, we employ an approach that leverages scale separation principles to extract a structured computational algorithm from neural dynamics, followed by mathematical analysis to determine near-optimal network connectivity parameters (Fig. 1b, c).
Second, CogLink networks incorporate biological realism by modeling specific brain systems, including known connectivity patterns, cell types, and learning rules for dynamic control (Fig. 1d).
Third, CogLinks offer greater interpretability. Because they are explicitly structured to approximate an algorithm, they allow us to directly map neural mechanisms to their functional roles, unlike traditional deep neural networks, which are often considered black boxes (Fig. 1d).
Through iterative development, we construct progressively complex CogLinks, mirroring the increasing computational complexity observed in biological evolution. Specifically, the basic network models a premotor cortico-thalamic-basal ganglia (BG) loop, emphasizing BG circuitry’s role in reinforcement learning and efficient environmental exploration, which addresses lower-level uncertainty in hierarchical environments. The augmented network incorporates an associative cortico-thalamic-BG loop, highlighting the mediodorsal thalamus (MD) and its interactions with the prefrontal cortex (PFC) to process higher-level uncertainty related to contextual inference and strategy switching.
In addition to demonstrating how partitioning uncertainty types in hierarchical environments is critical for the networks to reproduce animal behavior, CogLinks provide insights into neural mechanisms underlying complex decision-making. Specifically, our model explains findings from an accompanying study15 on human behavior and fMRI readouts and offers insights into perturbed dynamics in a mouse model relevant to schizophrenia16. To our knowledge, few existing neural frameworks simultaneously solve complex cognitive tasks while providing computational insight into neural mechanisms. We propose that CogLinks constitute an important step toward bridging this gap.
Results
Building a basic CogLink network for handling lower-level uncertainty
To illustrate lower-level uncertainty, let us revisit the example of conversing with a new colleague. Suppose you know nothing about the person and naively attribute each sigh of boredom to a suboptimal choice of topic, disregarding higher-level factors such as mood or personal circumstances. This scenario highlights two key types of lower-level uncertainty: outcome uncertainty, which may arise from factors such as variability in the person’s focus (e.g., low focus might prevent them from following certain sentences), and associative uncertainty, which reflects our lack of knowledge about the person’s preferences (greater unfamiliarity corresponds to higher associative uncertainty). Successfully navigating this interaction requires balancing exploration and exploitation. Persisting with the Super Bowl (exploitation) tests its suitability as a topic but risks disengagement if it proves uninteresting. Conversely, switching to a new topic, such as a shared hobby or current news (exploration), sacrifices immediate feedback on the Super Bowl but creates an opportunity to reduce associative uncertainty by learning more about the colleague’s preferences.
To investigate how the brain handles uncertainty, we use an A-alternative forced choice task (A-AFC task) (Fig. 2a). In this task, the reward probabilities for each action at trial t are represented as a vector \(\boldsymbol{\theta}_t \in \mathbb{R}^A\), where \((\boldsymbol{\theta}_t)_a\) denotes the probability of receiving a reward when choosing action a. We consider both stationary and dynamic environments. In the stationary environment, the reward probabilities remain constant across trials, such that θt = θ1 for all t ∈ [T]. In contrast, the dynamic environment features reward probabilities θt that vary across trials to reflect changing conditions. We first study the stationary environment, as it isolates lower-level uncertainty and provides a foundational framework for the basic CogLink. Subsequently, we extend CogLink to the dynamic environment to explore how hierarchical uncertainties interact.
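For concreteness, the task environment can be summarized in a few lines of code. The sketch below is our own illustration (the class name and the Bernoulli-reward assumption are ours, not taken from the Methods); it implements a stationary A-AFC environment, and the dynamic variant follows by reassigning the reward probabilities between trials.

```python
import numpy as np

class AAFCTask:
    """Minimal A-AFC bandit environment (illustrative sketch).

    theta[a] is the reward probability of action a. Stationary: theta is
    fixed across trials. Dynamic: reassign theta between trials, e.g.,
    reverse it every 200 trials as in the probabilistic reversal task
    studied later.
    """

    def __init__(self, theta, seed=0):
        self.theta = np.asarray(theta, dtype=float)
        self.rng = np.random.default_rng(seed)

    def step(self, action):
        """Return a Bernoulli reward for the chosen action."""
        return float(self.rng.random() < self.theta[action])

task = AAFCTask(theta=[0.7, 0.5, 0.3])  # A = 3, Delta = 0.4
reward = task.step(action=0)
```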
a Schematic of the A-AFC task in a stationary environment. The task is parameterized by the number of alternatives A and the expected difference Δ between the most and least rewarding options. b Schematic of the basic CogLink network architecture. Yellow denotes the anterolateral motor cortex (ALM), green denotes the basal ganglia (BG), and orange denotes the primary motor cortex (M1). c Premotor corticostriatal-like circuit implementing sampling from a distribution. In the BG-like circuit, a quantile population code encodes associative uncertainty as a distribution of action-value beliefs. Coupled with premotor random sparsification dynamics, the circuit samples from the distribution to extract uncertainty information for downstream motor processing. PV denotes parvalbumin-positive neurons, and PN denotes pyramidal neurons. d Comparison between the probability density function (p.d.f.) of theoretical (black) and empirical distributions (gray) encoded in corticostriatal synapses. The empirical distribution is derived from circuit simulations in b with n = 100 independent samples. e Schematic of the motor cortex-like downstream action selection circuit. The circuit receives sampled inputs from the BG and selects the action with the highest sampled action value. f P.d.f. plots of action-value beliefs and corresponding choice probabilities (n = 100 action choices) at trials 10 and 50. At trial 10, large overlaps between distributions promote exploration, while at trial 50, minimal overlaps lead to exploitation. g Summary plot of accurate choice probability over trials (mean ± s.e.m., n = 50 sessions). h Corticostriatal synaptic strengths summarized as a heatmap over trials (mean, n = 50 sessions). Each row represents a synapse, with 100 rows per block. i Accumulated regret for the full model (orange), KO-sparseness (yellow), and KO-distributional RPE (green) variants (mean ± s.e.m., n = 50 sessions, **P = 2.90 × 10−3, *P = 1.75 × 10−2; two-sided permutation test on the mean difference). j Accurate choice probability for the full model, KO-sparseness, and KO-distributional RPE variants (mean ± s.e.m., n = 50 sessions).
The basal ganglia (BG) are a natural candidate for handling lower-level uncertainties. A substantial body of research implicates the BG in learning action-outcome associations by integrating sensory inputs, motor actions, and reward feedback17,18. Dopaminergic signals encoding reward prediction errors (RPEs) facilitate synaptic plasticity within the BG, enabling the adaptive adjustment of action values over time. This iterative refinement process makes the BG well-suited for encoding associative uncertainty and guiding the trade-off between exploration and exploitation. Accordingly, our basic CogLink network incorporates BG-like circuits, along with dopamine-dependent plasticity mechanisms for online learning and premotor/motor cortical areas for action selection (Fig. 2b). Neuronal activity in these areas is modeled as rate neurons governed by:
\(\tau \frac{d\mathbf{x}}{dt} = -\mathbf{x} + f(\mathbf{W}\mathbf{x} + \mathbf{I}),\)

where x represents the neurons’ firing rates, τ is the membrane time constant, f is a nonlinearity function, I is the input, and W denotes synaptic weights. As a convention, we denote synaptic weights from area A to area B using the variables \(\mathbf{W}^{A/B}\) (matrix form) and \(\mathbf{V}^{A/B}\) (vector form). The variable \(\mathbf{x}^{A}\) represents the neural activity in vector form at area A. In its most basic form, the CogLink network handles lower-level uncertainty through two core mechanisms: exploration and learning. The exploration mechanism represents uncertainty as a distribution in BG and uses premotor recurrent dynamics to implement a probability matching strategy, promoting exploration when uncertainty is high. The learning mechanism, inspired by distributional reinforcement learning and Bayesian inference, updates action-value beliefs based on trial outcomes via dopamine-dependent plasticity. We detail how these mechanisms are implemented in different neural areas below.
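As an illustration of this dynamical form, a simple Euler integration of the rate equation might look as follows (a minimal sketch with arbitrary parameters and a tanh nonlinearity standing in for f; the model's actual nonlinearities, time constants, and connectivity are specified in the Methods):

```python
import numpy as np

def simulate_rate_network(W, I, tau=0.02, dt=0.001, T=0.5, f=np.tanh):
    """Euler integration of tau * dx/dt = -x + f(W @ x + I).

    W: (N, N) synaptic weight matrix; I: (N,) constant external input.
    Returns the firing-rate vector after T seconds of simulated time.
    """
    x = np.zeros(W.shape[0])
    for _ in range(int(T / dt)):
        x += (dt / tau) * (-x + f(W @ x + I))
    return x

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((8, 8))   # arbitrary illustrative weights
x_ss = simulate_rate_network(W, I=rng.standard_normal(8))
```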
A defining feature of the basic CogLink network is its incorporation of a quantile population code in the BG-like area, which encodes associative uncertainty as a distribution over action-value beliefs (Fig. 2c). In this scheme, each neuron is associated with a fixed quantile of the probability distribution, meaning that selecting a subset of neurons corresponds to sampling specific probabilities from the encoded distribution. Random sparsification dynamics in the premotor cortex (anterolateral motor cortex (ALM))-like area leverage this property to extract uncertainty through sampling (Fig. 2c, d, “Methods” section). This use of a quantile code builds upon the broader concept of population encoding, where neuronal ensembles represent probability distributions. Established approaches to population encoding include probabilistic population codes19, sampling codes20, explicit probability codes21, and quantile codes22. In our model, we adopt the quantile coding approach to represent action-value beliefs as a probability distribution.
Specifically, we implement this by organizing neurons into A choice-specific ensembles of M premotor cortex-BG neuron pairs. In this framework, each ensemble encodes the distribution of action-value beliefs associated with a specific choice. The premotor corticostriatal synapses represent the distribution of action-value beliefs using a quantile code:
\(\Pr\big(v_a = \mathbf{V}^{\mathrm{alm/bg}}_{a,m}\big) = \frac{1}{M}, \quad m \in [M],\)

where \(v_a\) is a random variable representing action-value beliefs, and \(\mathbf{V}^{\mathrm{alm/bg}}_{a,m}\) denotes the synaptic weight of the m-th premotor neuron-BG neuron pair. Here, A is the number of alternatives and M is the size of a neuronal ensemble. This representation allows the network to efficiently extract uncertainty, as random sparsification in the premotor cortex-like area samples directly from the quantile-coded distribution formed by corticostriatal-like synapses (Fig. 2d). These sampled values are then relayed to the motor cortex-like area, where they inform action selection and balance exploration and exploitation during decision-making.
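To make the quantile code concrete, the following sketch stores M values per action as corticostriatal weights and reads out a sample by randomly keeping a subset of pairs, mimicking random sparsification. Variable names are ours, and the explicit indexing stands in for the circuit's sparsification dynamics, which achieve the same effect without any central controller.

```python
import numpy as np

rng = np.random.default_rng(1)
A, M = 3, 100   # number of alternatives, ensemble size per action

# V[a, m]: weight of the m-th premotor-BG pair tuned to action a. Each row
# is a quantile code: the belief distribution for action a is the uniform
# mixture over its M stored values.
V = rng.uniform(0.0, 1.0, size=(A, M))

def sample_value(V, a, K=1):
    """Read out a sample by randomly keeping K of the M pairs and averaging.

    K = 1 returns a single stored quantile, i.e., a draw from the encoded
    belief distribution; larger K narrows the effective sampling spread.
    """
    idx = rng.choice(V.shape[1], size=K, replace=False)
    return V[a, idx].mean()

# Over many repetitions, the samples trace out the encoded distribution.
samples = np.array([sample_value(V, a=0) for _ in range(1000)])
```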
The sampling mechanism thus provides a way to translate associative uncertainty into inputs for the motor-like area, supporting efficient exploration during decision-making. While biological basal ganglia (BG) circuits project to the motor cortex via the thalamus and involve intricate circuitry beyond the corticostriatal loops modeled here, we abstract these additional components as relay functions to simplify the model. In this abstraction, the sampled values are directly projected to the motor cortex-like area to focus on the computations critical for exploration.
To convert the sampled action values, which encode associative uncertainty, into motor signals for action selection, we employ a model of the action selection mechanism inspired by ramp-to-threshold circuits observed in motor-related decision-making cortical circuits23. Specifically, the recurrent connections in the motor cortex-like area are configured to implement mutual inhibition, enabling a Winner-Take-All (WTA) procedure (Fig. 2e, “Methods” section). In this setup, when the activity of a motor neuron ramps up to the threshold, the corresponding action \(a_t\) is chosen. Thus, the motor circuit effectively chooses the action corresponding to the highest action-value sample emitted from striatal circuits (Fig. 2e).
Importantly, because the probabilistic nature of the sampled values carries information about the uncertainty (e.g., a flat distribution with a low mean can still yield a high sample that drives exploration), the circuit supports associative uncertainty-based exploration: when the value belief distributions of different actions overlap substantially (high uncertainty about which action is optimal), the CogLink explores more; when they overlap little (low uncertainty about which action is optimal), the CogLink exploits more (Fig. 2f).
Finally, after the model chooses action \(a_t\) and receives reward \(r_t\) at trial t, the dopamine (DA) activities form a distributional RPE, \(\boldsymbol{\delta} \in \mathbb{R}^M\), given by:

\(\delta_m = r_t - \mathbf{V}^{\mathrm{alm/bg}}_{a_t,m},\)

where \(\mathbf{V}^{\mathrm{alm/bg}}_{a_t,m}\) represents the predicted value of action \(a_t\) for the m-th quantile. The distributional RPE is then used to update the premotor-BG synapses according to the following rule:

\(\mathbf{V}^{\mathrm{alm/bg}}_{a_t,m} \leftarrow \mathbf{V}^{\mathrm{alm/bg}}_{a_t,m} + \boldsymbol{\eta}_{a_t}\,\delta_m,\)

where \(\boldsymbol{\eta}_{a_t}\) denotes the learning rate of the corticostriatal synapses for the selected action \(a_t\).
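In code, this update reduces to a per-quantile delta rule. The sketch below assumes the RPE is simply the difference between the observed reward and each stored quantile; the exact rule and learning-rate schedule are given in the Methods.

```python
import numpy as np

def rpe_update(V, a_t, r_t, eta):
    """Distributional RPE update for the chosen action's quantile ensemble.

    V: (A, M) quantile-coded weights. delta[m] = r_t - V[a_t, m] is the
    per-quantile prediction error; every weight of the chosen action moves
    toward the observed reward at rate eta.
    """
    delta = r_t - V[a_t]    # distributional RPE, shape (M,)
    V[a_t] += eta * delta
    return delta
```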
Unlike prior work on distributional reinforcement learning22, which focuses on learning reward distributions, our approach learns action-value belief distributions. This distinction is critical for representing uncertainty in action values, which is essential for reasoning about lower-level uncertainty. We elaborate on this distinction and its implications in the “Discussion” section.
To assess the model’s performance, we use the standard regret metric from bandit literature24, defined as:

\(\mathrm{Regret}(T) = \sum_{t=1}^{T} \big[(\boldsymbol{\theta}_t)_{a_t^*} - (\boldsymbol{\theta}_t)_{a_t}\big],\)

where \(a_t^* = \arg\max_a (\boldsymbol{\theta}_t)_a\) is the retrospectively optimal action, and \(a_t\) is the action chosen by the model. Regret measures the cumulative difference in expected rewards between the model’s chosen actions and the optimal actions, providing a benchmark for evaluating the model’s ability to adapt and balance exploration and exploitation.
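Computing regret from a simulated session is straightforward; a minimal sketch:

```python
import numpy as np

def cumulative_regret(theta_seq, actions):
    """Cumulative regret: expected-reward gap between the retrospectively
    optimal action and the chosen action, summed over trials.

    theta_seq: (T, A) reward probabilities per trial; actions: length-T
    sequence of chosen action indices.
    """
    theta_seq = np.asarray(theta_seq, dtype=float)
    best = theta_seq.max(axis=1)
    chosen = theta_seq[np.arange(len(actions)), np.asarray(actions)]
    return np.cumsum(best - chosen)
```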
The CogLink network successfully minimized regret by balancing exploration and exploitation (Fig. 2g). To explore the neural underpinnings of this performance, we examined the synaptic strength of corticostriatal connections. Intriguingly, these synaptic profiles exhibited distinct signatures of efficient exploration. The ensemble of synapses tuned to the accurate choice (action 1) rapidly narrowed its distribution and converged to the correct value estimates, while the synapses tuned to less preferred choices retained a gradient of smaller synaptic strengths distinct from those of the preferred choice. This gradient indicates that the less preferred choices carried high associative uncertainty (their belief distributions remained wide, so the corresponding ensembles exhibited a spread of synaptic strengths in Fig. 2h), yet the uncertainty was low enough for the model to confidently exploit the correct choice (the belief distributions were well separated from that of the optimal action, so their synaptic strengths remained uniformly smaller than those tuned to the optimal action in Fig. 2e).
To test the necessity of specific mechanisms in balancing exploration and exploitation, we performed two lesion experiments: one with reduced sparsification in the premotor cortex (KO-sparseness) and another replacing the distributional RPE with a scalar RPE (KO-distributional RPE). Both lesion variants resulted in significantly higher regret (Fig. 2i), driven by premature exploitation that led to persistent suboptimal choices (Fig. 2j). These results provide mechanistic insights into both random sparsification and distributional RPEs in balancing exploration and exploitation.
Our basic CogLink model approximates an algorithm with nearly optimal regret
A mechanistic model often proves too complex to clearly illustrate the underlying computational mechanisms or to admit mathematical analysis. To address this challenge, we approximate the basic CogLink network with an algorithm by leveraging the separation of scales, assuming that neural dynamics occur instantaneously (see “Methods” section). This simplification enables the premotor corticostriatal-like ensemble tuned to action a to act as a sampling mechanism for the action-value distribution. Specifically, the K-WTA dynamics in the premotor cortex-like circuit randomly select K neurons, enabling efficient sampling of the action-value distribution:

\(\hat{v}_a \sim \mathcal{V}_a,\)

where \(\mathcal{V}_a\) represents the distribution of value beliefs for action a and \(\hat{v}_a\) is the sampled value. The WTA mechanism of the motor cortex-like circuit selects the action with the highest sampled value:

\(a_t = \arg\max_{a \in [A]} \hat{v}_a.\)

Finally, the dopamine-gated plasticity adjusts the corticostriatal synapses to refine action-value estimates over time:

\(\mathbf{V}^{\mathrm{alm/bg}}_{a_t,m} \leftarrow \mathbf{V}^{\mathrm{alm/bg}}_{a_t,m} + \boldsymbol{\eta}_{a_t}\,\delta_m,\)

where \(\delta_m\) is the distributional RPE and \(\boldsymbol{\eta}_{a_t}\) is the learning rate for the \(a_t\)-tuned synapses.
The algorithm provides an intuitive framework for understanding the functionality of our corticostriatal network model. In this framework, the A posterior-like distributions representing action-value beliefs are sampled through random sparsification in the premotor cortex. The motor cortex then selects the action corresponding to the largest sampled value through recurrent competitive dynamics. Following action selection, the model refines its action-value distributions based on distributional reward prediction error (RPE) signals from dopamine (DA) neurons. High associative uncertainty, indicating a lack of confidence in the value estimates, results in significant overlap between posterior-like distributions, promoting exploration (Fig. 2e, left). Conversely, low associative uncertainty leads to well-separated posterior-like distributions, enabling exploitation as the model confidently selects the optimal action (Fig. 2e, right).
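Putting the three approximated steps together, the extracted algorithm can be sketched as a short simulation loop. This is our own paraphrase of the sample-argmax-update cycle; the averaging readout, the uniform initialization, and the exact learning-rate constants are illustrative choices (the η ∝ 1/t schedule anticipates the analysis below).

```python
import numpy as np

def coglink_bandit(theta, T=500, M=100, K=1, seed=0):
    """Sketch of the algorithm extracted from the basic CogLink:
    sample each action's value from its quantile code (random
    sparsification), pick the argmax (motor-cortex WTA), then apply the
    distributional RPE update with a learning rate decaying roughly as 1/t.
    """
    rng = np.random.default_rng(seed)
    A = len(theta)
    V = rng.uniform(0.0, 1.0, size=(A, M))   # quantile-coded value beliefs
    counts = np.ones(A)                      # per-action update counts
    actions, rewards = [], []
    for _ in range(T):
        idx = rng.integers(0, M, size=(A, K))        # sparsify per action
        v_hat = V[np.arange(A)[:, None], idx].mean(axis=1)
        a = int(np.argmax(v_hat))                    # WTA action selection
        r = float(rng.random() < theta[a])           # Bernoulli outcome
        eta = 1.0 / (1.0 + counts[a])                # eta ~ 1/t schedule
        V[a] += eta * (r - V[a])                     # distributional RPE
        counts[a] += 1
        actions.append(a)
        rewards.append(r)
    return np.array(actions), np.array(rewards), V

actions, rewards, V = coglink_bandit(theta=[0.7, 0.5, 0.3])
```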
To assess our network’s performance against the theoretical regret limit, we conducted a mathematical analysis of the algorithm. Our analysis demonstrates that by appropriately configuring the parameters in the synaptic update rule, the regret of the algorithm will be on the order of \(O(\sqrt{AT\log (AT)})\), where A represents the number of actions and T denotes the number of trials (see Theorem 2 in the “Methods” section for a formal theorem).
Theorem 1
(Informal). If we select the sparsity K, the learning rates \(\{\eta_{a,t}\}_{a \in [A], t \in [T]}\), and the initial synaptic weights \(\{\bar{V}^{\mathrm{alm/bg}}_{a,m}\}_{a \in [A], m \in [M]}\) appropriately, then the regret of the algorithm after T trials in a static A-AFC task is at most \(C\sqrt{AT\log(AT)}\), where C is a constant.
It has been demonstrated that no algorithm can achieve regret smaller than \(\Theta(\sqrt{AT})\)25. Our algorithm, which differs by only a logarithmic factor, is therefore close to optimal in terms of regret. This result provides a theoretical foundation for the model’s ability to perform efficient exploration under lower-level uncertainty, demonstrating its near-optimal balance of exploration and exploitation.
Relationship between basic CogLink and Bayesian inference with probability matching
We evaluated the performance of our basic CogLink model in static A-AFC tasks and compared it to Thompson Sampling (TS), a widely used algorithm that combines optimal Bayesian inference with probability matching and provides asymptotically optimal theoretical guarantees26. TS was chosen as the baseline because it represents a principled approach to balancing exploration and exploitation under uncertainty. Task difficulty was manipulated by varying the expected reward difference between the most and least rewarding actions (Δ) and the number of alternatives (A) (see “Methods” section) (Fig. 2a). Across all tested environments, CogLink consistently outperformed TS, achieving a better balance between exploration and exploitation, as evidenced by faster convergence to optimal actions and improved regret performance (Fig. 3a–c, Fig. S1a–c).
We evaluate our model across various A-AFC tasks for 50 sessions of 500 trials. a Summarized plot (mean ± s.e.m., n = 50 sessions) for the accurate choice probability of Thompson Sampling (TS) (blue) and basic CogLink (orange) in the AFC task with Δ = 0.4, A = 3. CogLink achieves faster convergence and higher long-term accuracy compared to TS. b Summarized plot (mean ± s.e.m., n = 50 sessions) for regret across different Δ. CogLink consistently outperforms TS across all tested Δ values (****P = 3.47 × 10−5, ***P = 1.25 × 10−4, ****P = 3.37 × 10−7, ***P = 1.32 × 10−4; two-sided rank sum test). c Summarized plot (mean ± s.e.m., n = 50 sessions) for regret across different numbers of alternatives A. CogLink consistently outperforms TS across all tested A values (**P = 7.25 × 10−3, ***P = 1.04 × 10−4, ****P = 5.49 × 10−7, **P = 3.5 × 10−3; two-sided rank sum test). d Summarized plot (mean ± s.e.m., n = 50 sessions) for the expectation of the distribution of value beliefs over trials. Both CogLink and TS converge to similar expectations, and their trajectories are closely aligned throughout the trials, demonstrating comparable accuracy and adaptation in value estimation. e Summarized plot (mean ± s.e.m., n = 50 sessions) for the variance of the distribution of value beliefs, showing that both methods exhibit similar rates of uncertainty reduction over time. f Empirical p.d.f. plot (n = 500 samples) for the distribution of value beliefs under varying premotor cortex sparsity K. Higher K values produce narrower distributions, emphasizing exploitation, while lower K values promote exploration.
To further evaluate CogLink’s versatility in handling more complex decision-making scenarios, we extended its application to two generalizations: a cued A-AFC task, which incorporates state (cue) information, and a binary tree maze task, which introduces state transitions (see “Methods” section). These tasks represent a progression from stateless bandit problems to scenarios where decisions depend on environmental states. In these tasks, we compared CogLink against TS and a neural network-based method, the Deep Q-Network (DQN)27. CogLink demonstrated robust performance across varying difficulty settings, maintaining competitive regret compared to both TS and DQN (Fig. S2a–d). These results underscore CogLink’s ability to adapt from simple, stateless environments to more complex tasks involving state information and transitions, demonstrating its versatility in managing lower-level uncertainty during decision-making.
The robust performance of CogLink across these tasks raises questions about the underlying principles that enable its effective decision-making. To better understand these mechanisms, we examined the algorithm obtained from CogLink’s approximation and observed that it shares key similarities with Thompson Sampling (TS), particularly in its use of action-value distributions and probability matching-like action selection, but differs in the update rule. To investigate this relationship further, we analyzed the correspondence between the distributional RPE update in Equation (2.4) and Bayesian updates.
First, we initialized the corticostriatal weights to approximate a uniform prior, analogous to Bayesian inference with a uniform prior: each weight \(\bar{V}^{\mathrm{alm/bg}}_{a,m}\) was set to the m-th of M evenly spaced quantiles of the uniform distribution on [0, 1].
Next, we examined how the expectation and variance of the action-value distribution evolved under the distributional RPE update. By selecting learning rates ηt ∝ 1/t (see “Methods” section), we found that our updates closely approximated the evolution of both the expectation and variance under optimal Bayesian inference (Fig. 3d, e). This choice of learning rate satisfies two critical conditions: \(\sum_{t=0}^{\infty}\boldsymbol{\eta}_t = \infty\) and \(\lim_{t\to\infty}\boldsymbol{\eta}_t = 0\), ensuring that the variance diminishes over time while the expectations converge to the true action values.
For action selection, CogLink provides flexibility beyond TS in balancing exploration and exploitation by modulating the parameter K, the sparsity in the premotor cortex-like area. Larger K values result in narrower sampling distributions, favoring exploitation (Fig. 3f). Specifically, when K = 1, the model performs probability matching, and when K = M, it deterministically samples the expected value. When 1 < K < M, the model employs generalized probability matching28,29,30,31,32, where higher K values increase the emphasis on exploitation while still allowing some degree of exploration. This framework provides a continuum of strategies for balancing exploration and exploitation.
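A quick numerical check illustrates this continuum: averaging K randomly chosen quantiles narrows the effective sampling distribution. The sketch below uses an arbitrary Gaussian quantile code of our own choosing; K = 1 reproduces the full belief spread, and K = M collapses it to the mean.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 100
quantiles = rng.normal(0.6, 0.1, size=M)  # one action's quantile code

def sampling_distribution(K, n=500):
    """Empirical distribution of the readout when K of M quantiles are
    averaged (sampling without replacement)."""
    return np.array([
        quantiles[rng.choice(M, size=K, replace=False)].mean()
        for _ in range(n)
    ])

for K in (1, 10, M):
    s = sampling_distribution(K)
    print(f"K={K:>3}: mean={s.mean():.3f}, std={s.std():.3f}")
# The spread narrows as K grows and collapses to the expected value at
# K = M, interpolating between probability matching and pure exploitation.
```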
We hypothesize that K could be dynamically modulated in biological systems through neural mechanisms, such as altering the excitability of premotor cortical neurons via neuromodulation. This hypothesis aligns with prior studies demonstrating that neuromodulatory systems, including dopamine and norepinephrine signaling, play a role in adjusting the exploration-exploitation trade-off33,34. This dynamic modulation offers a plausible pathway for organisms to adapt their decision-making strategies to changing environmental demands.
Building an augmented CogLink network for handling higher-level uncertainty
Returning to our example of conversing with a new colleague, the assumption that each sigh of boredom is solely due to a suboptimal topic choice or random outcome variability (e.g., low focus) is overly simplistic. Higher-level factors, such as the colleague’s mood or workload, can also play a role, and these conditions are often dynamic and unobservable. In naturalistic environments, animals must contend not only with lower-level uncertainties, such as outcome and associative uncertainty, but also with higher-level uncertainties, including contextual uncertainty: ambiguity about the underlying context governing the environment. To address this challenge, we designed a probabilistic reversal task in a dynamic environment (Fig. 4a). While the basic CogLink network performed well in static environments, it struggled to adapt quickly to changing contexts in this dynamic setting (Fig. S3e, f).
a Schematic of the probabilistic reversal task, where the context alternates every 200 trials. b Schematic of the network architecture in the augmented CogLink, where the PFC-MD circuit infers the current context and activates downstream premotor circuits accordingly. c Illustration of the PFC-MD circuit encoding contextual likelihood. PFC-MD connectivity constrains MD activity to reside on a low-dimensional manifold, representing the likelihood of contexts. d Summarized plot (mean ± s.e.m., n = 50 sessions) of MD activity across trials in the probabilistic reversal task, with dashed lines indicating context switches. Pink denotes context 1 and blue denotes context 2. e Illustration of nonlinear Hebbian learning in PFC-MD synapses, which allows learning only when contextual likelihood is high. f Summarized plot (mean ± s.e.m., n = 50 sessions) of PFC-MD synaptic strengths encoding the contextual generative model for rewards. The left panel shows results from the full model, while the right panel represents the KO-nonlinear Hebb variant. The inset displays a box plot (n = 50 sessions, ****P = 2.56 × 10−11) of the L2 distance between the PFC-MD synaptic strengths and the true generative model. Solid line denotes action right and dashed line denotes action left. g Summarized plot (mean ± s.e.m., n = 50 sessions) of estimated contextual uncertainty across trials. Uncertainty peaks immediately after context switches, reflecting increased ambiguity about the context following a switch. h Summarized plot (mean ± s.e.m., n = 50 sessions) of learning rates for PFC-MD plasticity, showing modulation by contextual uncertainty. i Diagram of interneuron-mediated thalamocortical projections. PV-mediated pathways suppress cortical activity and plasticity, while VIP-mediated pathways amplify them. j Summarized plot (mean ± s.e.m., n = 50 sessions) of exploration probability as a function of contextual uncertainty, where 0.5 indicates chance level (maximum exploration). k Summarized plot (mean ± s.e.m., n = 50 sessions) of corticostriatal synaptic strengths encoding contextual action values. The left panel shows results from the full model, and the right panel depicts the KO-interneuron gating variant. The inset provides a box plot (n = 50 sessions, ****P = 2.94 × 10−13) of the L2 distance between corticostriatal synaptic strengths and the true action values. MD: mediodorsal thalamus, PFC: prefrontal cortex, PN: pyramidal neurons, TC: thalamocortical neurons, TRN: thalamic reticular nucleus, PV: parvalbumin-positive neurons, VIP: vasoactive intestinal peptide neurons, SST: somatostatin neurons. In the box plots, the orange line indicates the median; the box spans from the first to the third quartile, and the whiskers extend to 1.5 times the interquartile range.
As an initial step, we introduced explicit external contextual cues to the model, activating separate instances of the basic CogLink network depending on the provided cues (Fig. S3b). This modification allowed the model to achieve instantaneous behavioral switching (Fig. S3g, h). However, animals in natural environments rarely have access to explicit contextual cues and instead must infer the underlying context from ambiguous and incomplete observations.
The prefrontal cortex (PFC)-mediodorsal thalamus (MD) circuit is a natural candidate for enabling such contextual inference. The PFC is well-established as a key region for flexible, context-dependent behavior35,36, generating complex activity patterns to support such capacities. Recent studies suggest that these patterns are regulated by interactions with the MD37,38,39,40,41,42,43, which encodes task context explicitly in a range of decision-making paradigms1,44,45,46. Inspired by these findings, we augmented CogLink by incorporating a PFC-MD-like circuit to infer and provide contextual information to the basic CogLink networks (Fig. 4b). This augmentation enables the model to adapt to dynamic environments without relying on explicit external cues. One important assumption in our model is that disjoint basic CogLink networks are activated based on the inferred context. We discuss the biological plausibility of this mechanism further in the “Discussion” section.
A prominent feature of the augmented CogLink model is its low-dimensional representation of contextual likelihood in the MD-like area, consistent with previous literature1,44,45,46,47,48,49. Specifically, we propose that the MD encodes the conditional likelihood p(c∣a≤t, r≤t) of a context c, given the history of action-outcome pairs {a≤t, r≤t}. To achieve this, we hypothesize that MD activity lies on a low-dimensional simplex attractor, enabling a stable representation of contextual likelihood. Since the thalamus lacks intrinsic excitatory recurrence42, we propose that PFC and MD form an excitatory loop, with the thalamic reticular nucleus50,51,52,53 providing local inhibition to stabilize the attractor (Fig. 4, see “Methods” section). In this framework, contextual likelihood is represented as an explicit probability code, where higher neural activity corresponds to a higher likelihood of the associated context. The simplex attractor structure allows MD activity to dynamically integrate inputs, enabling it to traverse the manifold in response to changing environmental conditions (Fig. 4c, d). However, contextual inference requires that these inputs be appropriately encoded to reflect the conditional likelihood.
Bayes’ rule provides a framework for determining the required input encoding by defining how contextual likelihoods are computed:

\(p(c \mid a_{\le t}, r_{\le t}) \propto p(c)\prod_{s=1}^{t} p(a_s, r_s \mid c).\)
This formalism suggests that the inputs should correspond to the single-trial contextual generative model p(at, rt∣c), which is accumulated across trials to compute the overall likelihood. We hypothesize that PFC-MD synapses learn this single-trial generative model, while the MD’s low-dimensional attractor dynamics perform the accumulation. To enable this process, we implemented a Hebbian learning rule for PFC-MD connections (Fig. 4e, see “Methods” section):

\(\Delta \mathbf{V}^{\mathrm{pfc/md}}_{c,a,r} \propto f_{\mathrm{hebb}}(\mathbf{x}^{\mathrm{md}}_{c})\,\mathbf{x}^{\mathrm{md}}_{c}\,\mathbf{x}^{\mathrm{pfc}}_{a,r}.\)
Here, \(\mathbf{x}^{\mathrm{md}}_{c}\) and \(\mathbf{x}^{\mathrm{pfc}}_{a,r}\) denote the activities of MD and PFC neurons tuned to context c and action-outcome pair (a, r), respectively, \(\mathbf{V}^{\mathrm{pfc/md}}_{c,a,r}\) represents the PFC-MD synaptic weight between these MD and PFC neurons, and fhebb (Fig. S4b) is a sigmoidal gating function that modulates synaptic plasticity.
Naive Hebbian plasticity may incorrectly associate action-outcome pairs with the wrong context when contextual uncertainty is high, resulting in inaccurate estimates of the contextual generative model (Fig. 4f). To mitigate this issue, we incorporate a gating mechanism fhebb that modulates plasticity based on MD activity. This gating enhances learning when the MD confidently infers the context (high MD activity) and suppresses plasticity when contextual uncertainty is high (low MD activity). By doing so, the mechanism achieves two key objectives: it accelerates the learning of contextual statistics when confidence is high and prevents the misattribution of associations under high contextual uncertainty (Fig. 4g, h).
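A minimal sketch of this gated Hebbian rule follows, assuming a logistic gate on MD activity and an outer-product Hebbian term; the gains, thresholds, and function names are our illustrative choices, not the Methods parameters.

```python
import numpy as np

def f_hebb(x_md, threshold=0.7, gain=20.0):
    """Sigmoidal gate: near 1 when an MD unit's contextual likelihood
    (its activity) is high, near 0 otherwise."""
    return 1.0 / (1.0 + np.exp(-gain * (x_md - threshold)))

def pfc_md_update(V, x_md, x_pfc, eta=0.05):
    """Gated Hebbian update of PFC-MD synapses.

    V: (C, P) weights from P PFC action-outcome units to C MD context
    units. The Hebbian outer product is scaled per context by the gate,
    so contextual statistics are learned only under confidently inferred
    contexts.
    """
    gate = f_hebb(x_md)                              # (C,) per-context gate
    V += eta * gate[:, None] * np.outer(x_md, x_pfc)
    return V

C, P = 2, 4
V = np.zeros((C, P))
x_md = np.array([0.9, 0.1])             # confident in context 0
x_pfc = np.array([1.0, 0.0, 0.0, 0.0])  # observed action-outcome pair
pfc_md_update(V, x_md, x_pfc)           # only row 0 changes appreciably
```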
To causally test the necessity of this mechanism, we evaluated a variant of the model with naive Hebbian plasticity (KO-nonlinear Hebb). The results show that the generative model learned by the full CogLink closely approximates the true generative model of the environment, whereas the KO-nonlinear Hebb variant deviates significantly (Fig. 4f). This supports the critical role of fhebb in enabling accurate contextual learning under uncertainty.
Another key component of the augmented CogLink is an interneuron-mediated thalamocortical projection pathway that modulates cortical activity to drive exploration under high contextual uncertainty (Fig. 4i). When contextual uncertainty is high, animals need to explore more to gather information about the current context. To implement this pathway, we drew inspiration from experimental findings showing that the MD thalamus modulates PFC functional connectivity through distinct interneuron-mediated mechanisms. Specifically, Mukherjee et al. identified two thalamocortical pathways: one amplifies cortical connections via local disinhibition by vasoactive intestinal peptide (VIP) interneurons, and the other suppresses cortical activity through fast inhibition mediated by parvalbumin (PV) interneurons45.
Building on these findings, we assumed that such modulation enables contextually relevant PFC populations to differentially influence downstream premotor circuits, thereby facilitating context-dependent behavior. To model this mechanism, we included both thalamic projection pathways: one amplifies effective cortical connectivity for the preferred context, while the other inhibits cortical activity related to the opposing context. This modulation adjusts the activity in the PFC-like area and influences downstream premotor corticostriatal connections (Fig. 4i). Specifically, the projections modulate the effective strength of corticostriatal connections according to the following dynamics:

\(\tau_{\mathrm{bg}}\,\dot{\mathbf{x}}^{\mathrm{bg}}_{c,a,m} = -\mathbf{x}^{\mathrm{bg}}_{c,a,m} + f_{\mathrm{in}}\big(\mathbf{x}^{\mathrm{vip}}_{c} - \mathbf{x}^{\mathrm{pv}}_{c}\big)\,\mathbf{V}^{\mathrm{alm/bg}}_{c,a,m}\,\mathbf{x}^{\mathrm{alm}}_{c,a,m}.\)
Here, τbg is the membrane time constant of striatal neurons. \(\mathbf{x}^{\mathrm{bg}}_{c,a,m}\) and \(\mathbf{x}^{\mathrm{alm}}_{c,a,m}\) represent the activities of the m-th BG and premotor neurons tuned to context c and action a, respectively, while \(\mathbf{V}^{\mathrm{alm/bg}}_{c,a,m}\) is the strength of the corticostriatal synapse connecting these neurons. fin is a sigmoidal nonlinearity, and \(\mathbf{x}^{\mathrm{vip}}_{c}\) and \(\mathbf{x}^{\mathrm{pv}}_{c}\) denote the activities of VIP and PV interneurons receiving MD inputs tuned to the preferred and opposing contexts, respectively (see “Methods” section). Since these interneurons receive contextual inputs from MD, the term \(f_{\mathrm{in}}(\mathbf{x}^{\mathrm{vip}}_{c} - \mathbf{x}^{\mathrm{pv}}_{c})\) encodes contextual certainty. This mechanism ensures that corticostriatal connections are weakened under high contextual uncertainty, promoting exploratory behavior.
To validate this mechanism, we systematically varied MD activity to manipulate contextual uncertainty and measured its effect on exploratory behavior. Consistent with our predictions, higher contextual uncertainty corresponded to increased exploration, confirming the role of thalamocortical projections in dynamically regulating contextual uncertainty-based exploration (Fig. 4j).
In addition to modulating exploratory behaviors, contextual uncertainty should also regulate learning. Under high contextual uncertainty, naive dopamine-dependent plasticity risks misattributing associations to the wrong context, resulting in inaccurate action-value estimates (Fig. 4k). To address this, we implemented a mechanism in which interneuron-mediated inputs gate the plasticity of corticostriatal synapses (Fig. 4i). This design is inspired by experimental findings that interneuron-mediated pathways can modulate cortical plasticity54,55. Specifically, our model incorporates the following update rule:

\(\mathbf{V}^{\mathrm{alm/bg}}_{c,a_t,m} \leftarrow \mathbf{V}^{\mathrm{alm/bg}}_{c,a_t,m} + \boldsymbol{\eta}\,f_{\mathrm{in}}\big(\mathbf{x}^{\mathrm{vip}}_{c} - \mathbf{x}^{\mathrm{pv}}_{c}\big)\,(\boldsymbol{\delta}_{c})_{m}.\)
Here, \(\boldsymbol{\delta}_{c} \in \mathbb{R}^{M}\) represents the distributional dopamine activities tuned to context c, and \(\mathbf{V}^{\mathrm{alm/bg}}_{c,a_t,m}\) denotes the corticostriatal synaptic weights. The gating term \(f_{\mathrm{in}}(\mathbf{x}^{\mathrm{vip}}_{c} - \mathbf{x}^{\mathrm{pv}}_{c})\), a sigmoidal nonlinearity, reflects the relative activities of VIP and PV interneurons, encoding contextual certainty. When contextual uncertainty is high (low fin), the mechanism suppresses learning to avoid associating incorrect contexts with observed outcomes. Conversely, under low uncertainty, plasticity is enhanced, promoting accurate learning.
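The gated update can be sketched analogously to the corticostriatal rule above, with contextual certainty fin(x_vip − x_pv) scaling the learning step; parameter values and function names are again our illustrative choices.

```python
import numpy as np

def f_in(x_vip, x_pv, gain=10.0):
    """Sigmoidal readout of contextual certainty from VIP vs. PV drive."""
    return 1.0 / (1.0 + np.exp(-gain * (x_vip - x_pv)))

def gated_rpe_update(V, c, a_t, delta_c, x_vip, x_pv, eta=0.1):
    """Contextually gated corticostriatal plasticity.

    V: (C, A, M) context-specific quantile weights; delta_c: (M,)
    distributional RPE tuned to context c. When certainty is low
    (VIP ~ PV or PV dominant), the gate shrinks toward 0 and learning is
    suppressed, preventing misattribution of outcomes to the wrong context.
    """
    V[c, a_t] += eta * f_in(x_vip, x_pv) * delta_c
    return V
```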
To test the necessity of this gating mechanism, we developed a variant of the model that bypasses interneuron-mediated gating and uses direct thalamocortical modulation (KO-interneuron gating). We compared the action-value estimates learned by the full CogLink model to those of the KO-interneuron gating variant. The full model closely approximated the true action values of the environment, whereas the KO-interneuron gating variant deviated significantly (Fig. 4k). These results underscore the critical role of interneuron-mediated gating in enabling accurate and continual learning across contextual switches.
The MD circuit approximates an algorithm that detects environmental changes optimally
To computationally understand how CogLink achieves flexible switching, we next describe the effective MD circuit and approximate its dynamics with an algorithm. The MD circuit is structured to accumulate contextual likelihoods, enabling robust context inference. Mathematically, by letting the dynamics of the thalamic reticular nucleus (TRN) and frontal neurons occur instantaneously, the MD circuit can be effectively described by the following equations (see “Methods” section):
where \(\tau_{\mathrm{md}} = \tau_{\mathrm{eff}} D/2\) represents the membrane time constant of MD, \(\tau_{\mathrm{eff}}\) represents the effective time constant for accumulation dynamics, \(w = \frac{1}{D}\), and \(\mathbf{I}^{\mathrm{pfc/md}}_{1}, \mathbf{I}^{\mathrm{pfc/md}}_{2}\) represent the PFC inputs to MD. The nonlinearity function is defined as:
Defining \(X = \mathbf{x}^{\mathrm{md}}_{1} - \mathbf{x}^{\mathrm{md}}_{2}\), the dynamics simplify to:
and
Equation (2.16) corresponds to a drift-diffusion process, while Equation (2.17) describes the dynamics above the threshold ±D. When \(|\mathbf{I}^{\mathrm{pfc/md}}_{1} - \mathbf{I}^{\mathrm{pfc/md}}_{2}| \ll 1\), X stabilizes at approximately ±D, resulting in thresholded drift-diffusion behavior with inputs \(\mathbf{I}^{\mathrm{pfc/md}}_{1} - \mathbf{I}^{\mathrm{pfc/md}}_{2}\) (Fig. 5a).
a Summarized plot (mean ± s.e.m., n = 50 sessions) of the firing rate difference between two MD populations tuned to distinct contexts during a probability reversal task. b Example trajectory comparing evidence accumulation in CogLink’s MD circuit and the CUSUM algorithm during a context switch. Blue denotes the CUSUM algorithm, and orange denotes CogLink. c Summarized plot (mean ± s.e.m., n = 50 sessions) of accumulated regret over 1000 trials in probability reversal tasks, comparing CogLink (orange) to HMM-TS (green). d Box plot (n = 50 sessions) of accumulated regret over 1000 trials in probability reversal tasks (****P = 1.89 × 10−7; two-sided rank sum test). e Summarized plot (mean ± s.e.m., n = 200) showing recovery dynamics of accurate choice probability following a context switch in the probability reversal task. f Box plot (n = 50 sessions) of overall accuracy over 1000 trials in probability reversal tasks (****P = 1.89 × 10−7; two-sided rank sum test). g Box plot (n = 200 switches) of context-switching time (number of trials to reach 80% accuracy) in probability reversal tasks (****P = 5.94 × 10−6; two-sided rank sum test). h Box plot comparing switching time under high (n = 50 switches) and low (n = 150 switches) associative uncertainty conditions in probability reversal tasks (****P = 1.21 × 10−5; two-sided rank sum test). i Box plot comparing switching time under high (n = 200 switches) and low (n = 200 switches) outcome uncertainty conditions in probability reversal tasks (****P = 3.54 × 10−60; two-sided rank sum test). In the box plots, the black line indicates the median; the box spans from the first to the third quartile, and the whiskers extend to 1.5 times the interquartile range.
If the PFC inputs learn the accurate generative model from Equation (2.11), these dynamics align with the CUSUM algorithm, a theoretically optimal method for detecting distributional changes56,57. Specifically, this occurs when:

\(\mathbf{I}^{\mathrm{pfc/md}}_{c} = \log p(a_t, r_t \mid c) + \alpha.\)
Here, α denotes the baseline excitation. To illustrate, if we set \(X_0 = -D\) and \(S_t = X_t + D\), the evolution of \(X_t\) corresponds to

\(S_t = \max\!\Big(0,\; S_{t-1} + \log \frac{p(a_t, r_t \mid c_1)}{p(a_t, r_t \mid c_2)}\Big).\)
When St < 2D, the CogLink model functions as a CUSUM algorithm with a threshold at D for detecting distributional changes (see “Methods” section). This alignment underscores the efficiency of the thalamocortical model in identifying environmental changes and facilitating transitions between different instances of the basic CogLink for decision-making. Consistent with our theoretical predictions, the model closely approximates the behavior of the CUSUM algorithm during the first contextual switch (Fig. 5b).
Recognizing that real-world environments often involve multiple sequential changes, the CogLink model incorporates a capping mechanism to address the limitations of the CUSUM algorithm, which is designed for single change point detection. By capping the accumulation of evidence for each context (Equation (2.17)), this mechanism prevents overcommitment to a single context and enables the model to reset quickly and prepare for subsequent environmental shifts (Fig. 5a). This explains the observed deviations from the CUSUM algorithm’s behavior after the first detected change and highlights the importance of this feature in maintaining adaptability.
Furthermore, the evidence-capping mechanism supports the model’s independence from prior knowledge of the generative model. As long as there is sufficient time between context changes for \(\mathbf{I}^{\mathrm{pfc/md}}\) to accurately learn the contextual generative model, the CogLink model operates effectively without requiring specific environmental assumptions. This model-agnostic property not only distinguishes it from ideal observer models, which depend on precise access to the generative model, but also underscores its versatility and robustness across diverse and dynamic environments.
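The algorithmic consequence of capping can be illustrated in a few lines of code: a clipped evidence accumulator that flags a switch whenever the evidence saturates at the opposite bound. This is a sketch under our own threshold convention, not the exact Methods equations.

```python
import numpy as np

def capped_evidence_detector(log_lr, D=5.0):
    """Thresholded drift with evidence capping, approximating the MD
    circuit's context-switch detection.

    log_lr[t] = log p(a_t, r_t | c2) - log p(a_t, r_t | c1) is the
    trial-wise evidence for context 2 over context 1. Evidence X is clipped
    to [-D, D], so the detector never overcommits to one context and can
    flag repeated switches, unlike single-change-point CUSUM.
    """
    X, context, switches = -D, 1, []
    for t, l in enumerate(log_lr):
        X = float(np.clip(X + l, -D, D))
        if context == 1 and X >= D:
            context = 2
            switches.append(t)
        elif context == 2 and X <= -D:
            context = 1
            switches.append(t)
    return switches
```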
The augmented CogLink achieves flexible decision-making and continual learning by managing hierarchical uncertainty
To empirically evaluate CogLink’s performance in dynamic environments, we compared it to a Hidden Markov Model (HMM) that has prior knowledge of the hidden generative model of the environment and uses Thompson sampling for action selection (see “Methods” section). Despite the HMM’s advantage of full prior knowledge, CogLink achieves comparable levels of regret and accuracy while learning the generative model from scratch (Fig. 5c-f). This comparison underscores CogLink’s ability to perform effectively without relying on predefined assumptions about the environment.
Analyzing the models’ behaviors after a context switch reveals differences in their adaptation strategies. While both models transition rapidly to the new context, the HMM switches slightly faster but requires more trials to fully stabilize its decisions (Fig. 5e). To quantify this, we define trials to switch as the number of trials needed for a model to achieve 80% accuracy over the past 10 trials following a context change. As expected, the HMM exhibits faster switching times due to its prior knowledge, though CogLink’s switching performance remains competitive (Fig. 5g).
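This metric is easy to compute from trial-wise accuracy aligned to a context change; a minimal sketch:

```python
import numpy as np

def trials_to_switch(correct, window=10, criterion=0.8):
    """Trials until a model reaches `criterion` accuracy over the previous
    `window` trials following a context change.

    correct: boolean array of trial-wise accuracy, aligned to the switch.
    Returns the trial count, or None if the criterion is never reached.
    """
    correct = np.asarray(correct, dtype=float)
    for t in range(window, len(correct) + 1):
        if correct[t - window:t].mean() >= criterion:
            return t
    return None
```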
To understand the mechanisms underlying CogLink’s performance, we analyzed the evidence accumulation dynamics in MD, as predicted by the theoretical framework in the previous section. The model rapidly and accurately detects context switches after each block, leveraging these dynamics to adapt effectively (Fig. 5a). Furthermore, CogLink demonstrates robust continual learning by accurately updating action values and the contextual generative model, even as environmental statistics shift across blocks. These learned estimates remain stable across switches, retaining prior block information while enabling adaptation to new contexts (Fig. 4f, k).
To further explore this adaptability, we examined how CogLink leverages contextual uncertainty encoded in MD populations to support continual learning. Contextual uncertainty peaks immediately after context switches, reflecting the model’s need to gather information during transitions (Fig. 4g). This uncertainty modulation directly influences Hebbian learning rates of PFC-MD synapses (Equation (2.11)), which rapidly decrease for the previous context after a switch. This reduction prevents the model from incorrectly learning generative models in the wrong context (Fig. 4h). Similarly, VIP- and PV-mediated learning rates (Equation (2.13)) are modulated to ensure that action-outcome associations are appropriately attributed to the current context (Fig. S4c, d).
Interestingly, uncertainty modulation operates bidirectionally between hierarchical levels. High associative uncertainty, arising from insufficient knowledge of action-outcome associations, slows the model’s updates to contextual uncertainty, reflecting the difficulty in attributing evidence to the correct hierarchical process. This behavior manifests in longer switching times when CogLink encounters a novel block (Fig. 5h). Conversely, in dynamic environments with low outcome uncertainty (e.g., reward probabilities of 90%/10%), CogLink switches contexts much more rapidly (Fig. 5i). This suggests that reduced variability in outcomes enables the model to more readily attribute failure to context changes, thereby facilitating faster contextual updates. Together, these findings indicate that both associative and outcome uncertainty shape the dynamics of contextual uncertainty. By orchestrating these interactions across hierarchical uncertainty levels, CogLink achieves flexible decision-making and robust continual learning, even in complex and dynamic environments.
The model explains experimental findings showing causal MD engagement in decision-making in changing but not stationary environments
A number of studies have shown that MD lesions or inactivation perturb behavioral adjustment when the environment changes but do not necessarily impact behavior when conditions are stable44,45,58,59,60. To test whether our model exhibits these features, we performed perturbation studies by suppressing model MD neural activity (see “Methods” section). In agreement with this corpus of experimental findings, we found that the MD-suppressed model took significantly longer to switch than the normal model (Fig. 6a–f)44. Specifically, following a block switch, the MD-suppressed model exhibits a gradual increase in exploration of the alternate action until commitment (Fig. 6c). Moreover, the model provided a unique perspective on why this happens, together with an experimentally testable prediction: in the M1-BG component of the model, analysis of corticostriatal connection strength revealed fluctuating value estimates across blocks, indicating unlearning of value estimates from the previous context to adapt to the current one (Fig. 6g, h). This is consistent with the idea that without the MD, animals may default to lower-level or model-free strategies to solve tasks that they would otherwise solve with frontal control.
We evaluate the CogLink model (yellow) and MD inhibition model (purple) in the probabilistic reversal task for 50 sessions. a Summarized plot (mean ± s.e.m., n = 50 sessions) for average accumulated regret. b Box plot (n = 50 sessions) for the final regret (****P = 7.01 × 10−18; two-sided rank sum test). c Summarized plot (mean ± s.e.m., n = 200 switches, 4 switches × 50 sessions) for accurate choice probability after a switch. d Box plot (n = 50 sessions) for the accuracy (****P = 7.01 × 10−18; two-sided rank sum test). e Summarized plot (mean ± s.e.m., n = 50 sessions) for numbers of trials to switch (*P = 2.74 × 10−2, ****P = 4.00 × 10−5; two-sided permutation test on Spearman’s rank correlation coefficient ρ). f Box plot (n = 200 switches, 4 switches × 50 sessions) for the switching time (****P = 6.50 × 10−60; two-sided rank sum test). g Summarized plot (mean ± s.e.m., n = 50 sessions) for average estimated action values encoded in corticostriatal synapses. h Summarized plot (mean) for average corticostriatal synaptic strengths. i Box plot (n = 50 sessions) for the regret in a stationary AFC task with Δ = 0.3, A = 2 (****P = 8.00 × 10−6, ***P = 9.26 × 10−4, NS P = 0.84; Bonferroni-corrected Kruskal–Wallis test with post hoc Dunn’s test). In the box plots, the black line indicates the median; the box spans from the first to the third quartile, and the whiskers extend to 1.5 times the interquartile range.
A natural question then arises: Is MD necessary for efficient exploration in a stationary environment? To answer this question, we evaluated our models in various stationary 2-AFC tasks. In contrast to the results above, the MD inhibition model still performs comparably to Thompson sampling across various environments and only slightly underperforms the full model (Fig. 6i, Fig. S6). This further indicates that MD is not directly involved in simple associative learning, but rather serves as a central hub that orchestrates the learning of contextual models and modulates downstream associative learning through those learned models (see “Discussion” section).
Hyperactivation of striatal D2 receptors induces schizophrenia-like behaviors, and MD stimulation can rescue these deficits
There is increasing evidence that schizophrenia patients exhibit impaired belief updating processes61,62,63,64, which may be related to susceptibility to delusional thinking61,65,66. Separately, resting-state functional connectivity between the MD thalamus and PFC is altered in schizophrenia patients67,68,69,70,71,72,73,74. A recent study using mouse models carrying schizophrenia-relevant mutations showed both perturbed MD function and perturbed belief updating, and optogenetic MD stimulation led to a normalization of the belief updating process16. While striking, these findings leave open the question of the mechanistic link between MD perturbation and deficits in belief updating during decision-making.
Inspired by the facts that most antipsychotics targeting D2 receptors (D2Rs) are dopamine antagonists75,76,77 and that most schizophrenia patients show elevated levels of striatal D2Rs78,79, we consider a model with hyperactivation of striatal D2Rs. Since a hyperactive striatum is expected to inhibit the MD thalamus80, we implement the impaired model with decreased MD excitability (see “Methods” section, Fig. 7a).
In this model, βd2 < 1 scales MD excitability to capture the effect of D2R hyperactivation (see “Methods” section). The impaired model exhibits higher regret and lower accuracy and never fully commits to accurate choices after a block switch (Fig. 7b–e). Moreover, the model exhibits longer exploration after a switch (Fig. 7f, g), showing impaired cognitive flexibility16,81. On the other hand, the impaired model also shows an elevated win-switch rate (Fig. 7h), suggesting a perceived environmental instability that drives this erratic behavior. These two seemingly contradictory behaviors, slow switching and a high win-switch rate, are consistent with experimental findings in both patients and animal models16,62,81,82.
a A schematic of our impaired model and the MD activation rescue model. We posit that hyperactivation of striatal D2Rs leads to stronger inhibitory BG output, which, in turn, reduces MD excitability through shunting inhibition. We inject a current into MD to rescue the impaired model. b Summarized plot (mean ± s.e.m., n = 50 sessions) for average accumulated regret. Yellow denotes the full CogLink model, red denotes the impaired model, and blue denotes the MD activation rescue model. c Box plot (n = 50 sessions) for the final regret (****P = 8.30 × 10⁻⁸, **P = 1.29 × 10⁻³, NS P = 0.13; Bonferroni-corrected Kruskal–Wallis test with post hoc Dunn’s test). d Summarized plot (mean ± s.e.m., n = 200 switches, 4 switches across 50 sessions) for average accurate choice probability after a switch. e Box plot (n = 50 sessions) for the accuracy (****P = 8.30 × 10⁻⁸, **P = 1.29 × 10⁻³, NS P = 0.13; Bonferroni-corrected Kruskal–Wallis test with post hoc Dunn’s test). f Summarized plot (mean ± s.e.m., n = 50 sessions) for the number of trials to switch. g Box plot (n = 200 switches, 4 switches across 50 sessions) for the switching time (***P = 3.98 × 10⁻³, ****P = 9.47 × 10⁻⁷, ****P = 2.54 × 10⁻¹⁶; Bonferroni-corrected Kruskal–Wallis test with post hoc Dunn’s test). h Box plot (n = 50 sessions) for the win-switch rate (****P = 5.54 × 10⁻¹⁸, ****P = 8.11 × 10⁻¹⁰, *P = 4.27 × 10⁻²; Bonferroni-corrected Kruskal–Wallis test with post hoc Dunn’s test). i Summarized plot (mean ± s.e.m., n = 50 sessions) for the average activity difference in MD populations. j Summarized plot (mean) for average corticostriatal synaptic strength for the impaired model. k–m The top row represents data from the impaired model, and the bottom row represents data from the rescue model. k Summarized plot (mean ± s.e.m., n = 50 sessions) for the average estimated probability of receiving a reward. l Summarized plot (mean ± s.e.m., n = 50 sessions) for the average learning rate of PFC-MD plasticity. m Summarized plot (mean ± s.e.m., n = 50 sessions) for the average learning rate of interneuron-gated plasticity. In the box plots, the black line indicates the median; the box spans from the first to the third quartile, and the whiskers extend to 1.5 times the interquartile range.
To investigate the neural mechanisms behind these two behaviors, we first examine the drift process formed by the difference in activities of the two contextual MD populations. Compared to the normal drift process, which is well separated across contexts, the impaired drift process saturates its evidence at a much lower threshold, inducing a strong prior for the volatility of the environment (Fig. 7i). To understand the underlying dysfunction, we leverage CogLink’s capacity to approximate an algorithm and show that the threshold of the accumulation dynamics becomes smaller. Moreover, the impaired normative model exhibits leaky evidence integration, further reinforcing its prior belief in the environment’s volatility (see “Methods” section). By examining the contextual uncertainty decoded from the impaired model, we can also observe its strong belief in environmental volatility (Fig. S7b).
On the other hand, the corticostriatal strengths exhibit more homogeneous profiles (Fig. S7c), indicating low associative uncertainty. Moreover, the learning rate modulated by VIP/PV interneurons is much lower than that of the normal model (Fig. 7m, Fig. S4d). This suggests that although the impaired model has a strong prior on the volatility of the environment, it also updates its belief at a much slower rate within a single context, potentially contributing to slow switching.
Numerous studies have demonstrated alterations in PFC-MD coupling in schizophrenia patients67,68,69,70,71,72,73,74. Given that our model suggests PFC-MD connections are involved in learning contextual generative models, we aim to investigate whether the impaired model also exhibits deficits in model learning. Compared to the normal model, the impaired model struggles to learn the correct contextual generative model of the environments (Fig. 7k). To probe the mechanism, we examine the learning rate of PFC-MD connections. Indeed, lower excitability results in neuronal activities insufficient to induce Hebbian plasticity (Fig. 7l).
To restore the model’s learning capacity, we introduce a small excitatory current into the MD neurons (Fig. 7a). This intervention reduces both regret and exploratory behaviors after a switch (Fig. 7b–g). Additionally, although the rescue model does not reduce the win-switch rate (Fig. 7h), its drift process exhibits a higher threshold for evidence accumulation, indicating a weaker prior on environmental volatility (Fig. 7i). Moreover, the rescue model learns a more accurate generative model of the world (Fig. 7k) and reinstates proper learning in PFC-MD connections (Fig. 7l). These findings are consistent with the recent MD activation experiments in schizophrenia-relevant mouse models16.
Discussion
Biological plausibility of the CogLink
CogLink incorporates diverse, biologically inspired mechanisms to model associative and contextual uncertainty processing in frontal networks. To ensure computational tractability while maintaining biological plausibility, the model includes certain assumptions and simplifications, which we discuss below.
To address hierarchical uncertainty, CogLink models hierarchical cortico-thalamic-basal ganglia (BG) loops. In animals, these loops process different types of information, such as motor, limbic, and associative signals, through parallel streams83,84. In CogLink, we specifically model the motor and associative components of the cortico-thalamic-BG loop to process low- and high-level uncertainty, respectively.
The basic CogLink focuses on the premotor cortico-thalamic-basal ganglia (BG) loop, emphasizing BG’s role in associative learning under uncertainty. Instead of modeling the full complexity of BG circuitry, including direct and indirect pathways85, we adopt a simplified actor-critic structure, commonly used to capture BG’s associative learning capacity17. Importantly, we assume that striatal neurons encode a distribution of action-value beliefs, updated by dopamine neurons through distributional reward prediction errors (RPEs). This assumption is suggested by evidence that dopamine neurons exhibit inhomogeneous responses forming distributional RPEs22. To implement sampling of these distributions, we hypothesize random input sparsification in the premotor cortex, inspired by evidence of variable sparse firing in supragranular cortical layers (layers 2/3)86,87,88, which are parts of the cortical circuit known to generate striatal inputs.
The augmented CogLink extends this framework to an associative cortico-thalamic-BG loop to address contextual uncertainty. Although thalamostriatal connections have also been implicated in flexible behavior89,90, we focus on the well-established role of the prefrontal cortex (PFC) and mediodorsal thalamus (MD) in cognitive flexibility1,16,43,44,45,52 and choose not to model those connections. Another key assumption in this model is the use of disjoint cortical representations that are contextually activated. This modular strategy representation is supported theoretically91,92 and experimentally, as Kim et al. demonstrated context-dependent stable representations of the same action in mice93.
Extensive studies have shown that the MD thalamus explicitly encodes task contexts across various decision-making paradigms44,45,46, with recent findings highlighting its role in encoding contextual uncertainty1. Based on this evidence, we propose that MD acts as a central hub for encoding contextual uncertainty. In CogLink, thalamocortical projections serve dual roles: driver-like projections maintain stable contextual representations via recurrent PFC-MD loops (Equation (2.14)), while modulatory projections influence cortical connectivity and plasticity through interneurons (Equation (2.12), Equation (2.13)). These dual roles align with the classical distinction between core (driver) and matrix (modulatory) thalamic projections94,95, as well as recent findings on the diverse functions of the thalamus in modulating cortical dynamics43,44,52,96,97,98,99,100,101,102.
Additionally, we hypothesize that PFC populations modulated by these interneuron-mediated thalamocortical projections activate downstream premotor circuits in a context-dependent manner. Supporting this hypothesis, Wang and Sun103 demonstrated that PFC sends context-encoding inputs to the premotor cortex to initiate movement.
Together, these features make CogLink a biologically plausible framework for understanding how hierarchical uncertainty shapes decision-making and cognitive flexibility.
Neural representation, computation, and usage of uncertainty
Even though it is well-established that uncertainty profoundly impacts behavior104,105,106, an overarching computational framework for how different forms of uncertainty are encoded, represented, and decoded to drive behavioral adjustment is lacking107,108. On the encoding front, uncertainty may be represented at the single-neuron level, as empirical studies have found in the basal ganglia109 and the frontal cortex110,111. Another encoding strategy is in the form of a distribution at the neural population level19,112. A distributional code is more computationally demanding but may offer flexibility through differential decoding (e.g., different parts of the distribution can be selectively weighted based on other state variables). There are different frameworks to represent distributions in a neural population, such as probabilistic population codes19,113,114, sampling-based codes20,112,115, explicit probabilistic codes21,116, and quantile codes22. Our model uses two distinct ways to encode uncertainty as a distribution, motivated by empirical findings that we explain below.
Associative uncertainty, which is computed in the BG component of our model, is encoded in a quantile code similar to that of distributional reinforcement learning (RL)22. This is consistent with the fact that the striatum is a major output target of dopaminergic neurons, and indeed, our model shows that it is quite straightforward for dopamine-gated plasticity to update this form of BG distribution. In addition, our simulations show that it is easy to sample from a quantile distribution through recurrent competitive dynamics, because sampling neurons corresponds to sampling the corresponding probabilistic quantity in such a code. We should note, however, that while our model uses a quantile code similar to those in the distributional RL literature, our implementation differs in two important ways. First, rather than varying the optimism of each synapse, we vary the initial synaptic strength. This approach enables our model to learn the posterior over action-value beliefs rather than the reward distribution, allowing for representation of associative uncertainty. Second, we introduce a mechanism to couple this representation to behavioral adjustment through sampling. This conceptually motivated deviation from previous distributional RL models is designed to link the representation of uncertainty to decision-making, rather than merely using distribution learning for improved generalization. In our model, we posit that sampling, which can be done efficiently on a quantile code, is a mechanism to couple uncertainty to efficient exploration.
Contextual uncertainty, which is computed in the MD thalamus, is encoded as an explicit probabilistic code, inspired by past experimental work showing that MD encodes context44. This representation has the distinct advantage of contextually modulating local learning (Fig. 4g, h). However, the detailed mechanism of how this representation arises is poorly understood. To investigate how to compute such a representation, we include two mechanisms in PFC-MD circuits. First, PFC-MD connections learn the contextual model of the environment at a single-trial level via Hebbian learning. Second, recurrent dynamics in the PFC-MD circuit accumulate the single-trial likelihoods from corticothalamic inputs to calculate the current likelihood of contexts conditioned on previous experiences. Based on recent evidence showing that the thalamus modulates both the activities and the plasticity of downstream cortical networks1,45,54,55, we include interneuron-mediated pathways to allow the contextual MD representation to accomplish these functions and explain how contextual uncertainty can impact exploration and learning through these mechanisms.
Thalamocortical interaction as a system-level solution for flexible behaviors and model-based learning
Both animals and humans rely on a delicate coordination between model-free and model-based learning processes to adapt flexibly to their environments117,118,119,120,121. BG has traditionally been associated with model-free learning, while PFC has emerged as a locus for model-based learning and the mediation between the two systems17,122,123,124,125,126. However, the intricate mechanisms underlying the coordination of these learning types remain poorly understood. In our study, we propose the thalamus as a potential communication hub orchestrating this coordination, hypothesizing a detailed circuit mechanism to achieve this integration.
The thalamus is well-known for its topographic and reciprocal connections with the neocortex, as well as its projections to the BG95,127. While traditionally viewed as a relay station for sensory information, recent research has revealed its involvement in diverse functions across sensory52,53,98,128, cognitive1,43,44,45,97, and motor domains96,100. The convergence of inputs onto the thalamus and its diverse modulation of cortical and BG circuits position it ideally as a locus of plasticity for learning contextual states and for coordinating model-free and model-based systems.
In our model, PFC-MD circuits learn the contextual model of the environment and represent contexts in MD. This model-based learning component then modulates both the plasticity and the activities of the downstream model-free learning component, the corticostriatal circuits, based on the estimation and uncertainty of current contexts in MD. Lesioning MD disrupts this coordination, impairing the model’s ability to flexibly switch behaviors in dynamic environments. However, the lesioned model can still perform in a stationary environment, indicating that MD is not directly involved in pure model-free learning. These observations underscore the pivotal role of PFC-MD circuits as the locus of model-based learning, utilizing the learned model of the world to modulate corticostriatal model-free learning and achieve flexible behaviors.
Brains provide different levels of specialized mechanisms for credit assignment
The role of dopamine innervation in the basal ganglia is well established in carrying reward prediction error (RPE) signals that reinforce behaviors associated with unexpected rewards through synaptic plasticity mechanisms122,129,130,131. However, decision-making in animals involves navigating multiple cues, actions, and contexts, posing the challenge of appropriately assigning credit to the synaptic connections responsible for the unexpected rewards, a problem termed credit assignment132,133,134,135.
Traditional machine learning approaches, such as backpropagation, attempt to reinforce internal activity states leading to unexpected rewards134. However, backpropagation relies on symmetric feedback weights and a separation of errors and activities, which are not observed in biological brains135. Additionally, traditional artificial neural networks often struggle with crediting sensorimotor associations to the correct context across different contexts, leading to catastrophic forgetting136,137,138,139,140.
To address these challenges, researchers have proposed a plethora of cellular, circuit, and system-level mechanisms for proper credit assignment141,142,143,144,145,146,147,148,149,150,151. In our work, we integrate mechanisms at multiple levels to facilitate credit assignment.
At the cellular level, Hebbian-like learning in thalamocortical connections enables credit assignment by crediting associations to specific contexts only when the model is confident in its context inference. Circuit-level credit assignment is exemplified by dopamine-gated plasticity in the basal ganglia, where only corticostriatal connections corresponding to the chosen action undergo plasticity changes. This can be implemented by maintaining an eligibility trace from a motor action’s efference copy back to corticostriatal synapses.
Moreover, thalamocortical interactions via interneurons offer a system-level solution for credit assignment. In our model, the thalamus modulates cortical learning through cortical interneurons to correctly attribute sensorimotor associations to the appropriate context. PV neurons inhibit context-irrelevant cortical ensembles to prevent learning in the wrong context, while VIP neurons facilitate downstream learning when the model is confident in its inferred context.
These examples illustrate the brain’s use of diverse mechanisms operating at different levels to perform credit assignment effectively in complex natural environments.
CogLink network as a way to link molecular and behavioral changes in schizophrenia
Genetic factors are recognized as significant contributors to schizophrenia risk152, and computational modeling has highlighted deficits in belief updating as a key aspect of the disorder61,62,63,64. However, the intricate mechanisms bridging these genetic risk factors and belief updating deficits remain poorly understood. Our CogLink network, capable of linking mechanisms with normative behavior, creates a foundation to study these connections.
In constructing our schizophrenia model, we focus on a striatal D2R overexpression model because most antipsychotics targeting D2 receptors (D2Rs) are dopamine antagonists75,76,77 and most schizophrenia patients show elevated levels of striatal D2Rs78,79. We focus on the effects of striatal D2R overexpression on the PFC-MD circuit, given mounting evidence implicating alterations in these regions in schizophrenia pathology67,68,69,70,71,72,73,74. Since the abundance of D2Rs increases the inhibition from BG to the thalamus, we model schizophrenia by reducing the excitability of MD neurons to mimic a high level of BG inhibition.
Our schizophrenia model replicates experimental findings in both patients and animal models, such as prolonged exploratory behavior following contextual switches and an elevated win-switch rate16,62,81,82. The CogLink network further explains how circuit-level perturbation connects to these specific cognitive impairments. In particular, by examining the corresponding normative model, we can show that the impaired model exhibits a much lower threshold for evidence accumulation and that the accumulation dynamics become leaky, indicating a strong bias toward environmental volatility. Additionally, decreased excitability in MD compromised the ability of PFC-MD connections to accurately learn the environmental model. To address this impairment, we applied current injections to MD to restore activity levels to a range conducive to Hebbian plasticity. Remarkably, the rescue model demonstrated reduced exploratory behavior following switches and exhibited a higher threshold for MD activity switching, indicative of a diminished bias toward environmental volatility. Moreover, the rescue model exhibited improved learning of the environmental model within its PFC-MD connections. These findings recapitulate recent experiments in schizophrenia-related animal models16 and demonstrate the utility of the CogLink network in computational psychiatry.
CogLink network vertically integrates and describes neural phenomena from different perspectives
Different modeling approaches offer distinct perspectives on understanding brain computation153,154. Normative theories have traditionally elucidated animal behaviors and neural coding but often lack direct connections to lower-level neural correlates. In contrast, mechanistic models provide such links, allowing for testable predictions through matched perturbations of models and animals. However, understanding mechanistic models at the computational level can be challenging due to their complexity.
In this paper, our CogLink network aims to bridge this gap by constructing a mechanistic model capable of approximating normative models. By incorporating observed neural mechanisms into our model, we establish a direct connection to neural circuits. Simultaneously, approximating normative theories enables mathematical analysis, offering both quantitative and qualitative computational insights. Furthermore, our CogLink network offers distinct advantages in generating model hypotheses: on the one hand, neural mechanisms provide a strong biological prior for the normative model; on the other hand, connections to a normative model provide a guide for adjusting mechanistic parameters to achieve complex cognitive behaviors. We view this modeling approach as an initial step toward integrating Marr’s three levels of analysis: the computational, algorithmic, and implementation levels. Finally, many neurological diseases have genetic origins along with cognitive symptoms. Since our modeling approach contains both mechanistic details and computational insights into behaviors, it can serve as a lens to study these diseases.
Recently, population dynamics approaches have proven potent in uncovering underlying computations from electrophysiological data155. However, these approaches often lack integration with connectivity and functional data, limiting their ability to provide insights at the circuit level. In the future, we aim to develop a CogLink network that incorporates electrophysiological data as well as connectivity and functional data.
Methods
Model overview
Our model is specified by a differential equation governing the evolution of the neural activities (Equation (2.1)), a set of synaptic weights, and synaptic update rules (Equations (2.4), (2.11), (2.13)). In the subsequent section, we provide a more detailed specification of our model.
Basic CogLink model
This section presents the details of the basic CogLink model. Let A denote the number of alternatives, M represent the size of a premotor cortex ensemble, and K indicate the sparsity of cortical activities. In the basic CogLink model, there are A ensembles of premotor neurons. Within the ath ensemble, the premotor cortex activities \({{{{\bf{x}}}}}_{a}^{\,{\mbox{alm}}\,}\in {{\mathbb{R}}}^{M}\) evolve according to the following equation:
Here, the membrane time constant τalm = 1/6, excitatory inputs I = K − 0.25, and recurrent synaptic weights \({{{{\bf{W}}}}}_{a}^{\,{\mbox{alm}}\,}\in {{\mathbb{R}}}^{M\times M}\) are defined as:
The nonlinearity function \(g:{\mathbb{R}}\to {\mathbb{R}}\) is defined as:
Bt represents a standard Brownian motion with identity covariance. The selection of the recurrent weights \({{{{\bf{W}}}}}_{a}^{\,{\mbox{alm}}\,}\) and inputs I is designed to implement K-WTA dynamics156.
The premotor cortex then projects to the BG. The activities of the BG at the ath ensemble, \({{{{\bf{x}}}}}_{a}^{\,{\mbox{bg}}\,}\in {{\mathbb{R}}}^{M}\), evolve according to the following equation:
Here, the membrane time constant τbg = 0.1, and the premotor cortex-BG synapses, \({{{{\bf{V}}}}}_{a}^{{{{\rm{alm}}}}/{{{\rm{bg}}}}}\in {{\mathbb{R}}}^{M}\), are initialized with:
The BG then projects to the motor cortex, and the recurrent competitive dynamics of the motor cortex determine the action at at trial t. Specifically, the activities of the motor cortex, denoted as \({{{{\bf{x}}}}}^{{{{\rm{mct}}}}}\in {{\mathbb{R}}}^{A}\), evolve according to the following equations:
and
Here, the membrane time constant τmct = 1, and the BG-motor cortex synapses tuned to action a, \({{{{\bf{W}}}}}_{a}^{\,{\mbox{bg/mct}}\,}\in {{\mathbb{R}}}^{M}\), satisfy \({{{{\bf{W}}}}}_{a,m}^{\,{\mbox{bg/mct}}\,}=1/K\) for all m ∈ [M], a ∈ [A]. The recurrent synaptic weights, \({{{{\bf{W}}}}}^{{{{\rm{mct}}}}}\in {{\mathbb{R}}}^{A\times A}\), are defined as:
Action a is chosen as at if \({{{{\bf{x}}}}}_{a}^{\,{\mbox{mct}}\,}\) reaches the threshold of 1 within 5 s after the trial starts; otherwise, the action is chosen stochastically from a softmax distribution, at ~ softmax(30xmct).
Once the model receives the reward rt, it forms a distributional reward prediction error (RPE), denoted as \({{{\boldsymbol{\delta }}}}\in {{\mathbb{R}}}^{M}\):
This RPE is then used to update the premotor cortex-BG synapses, \({{{{\bf{V}}}}}_{{a}_{t}}^{{{{\rm{alm}}}}/{{{\rm{bg}}}}}\), according to the equation:
Here, the learning rate, ηa, is defined as \({{{{\boldsymbol{\eta }}}}}_{a}=\frac{1}{7+{N}_{a}}\), where Na counts the number of times action a has been chosen up to trial t.
For the KO-sparseness model, we let K = 80, and for the KO-distributional-RPE model, we let M = 1.
The simulation is conducted by discretizing the differential equation using dt = 0.005.
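To make the numerical scheme concrete, the following minimal Python sketch integrates rate dynamics of the generic form \(\tau \dot{{{{\bf{x}}}}}=-{{{\bf{x}}}}+g({{{\bf{W}}}}{{{\bf{x}}}}+I)\) plus Brownian noise using the Euler–Maruyama method at dt = 0.005. The weight matrix, input, nonlinearity, and noise amplitude below are illustrative placeholders, not the deposited implementation.

```python
import numpy as np

def euler_maruyama_step(x, W, I, g, tau, dt=0.005, sigma=0.1, rng=None):
    """One Euler-Maruyama step of tau * dx = (-x + g(W @ x + I)) dt + sigma dB.

    The generic rate-equation form and the time step dt = 0.005 follow the
    text; W, I, g, and sigma are placeholder choices for illustration.
    """
    rng = rng or np.random.default_rng()
    drift = (-x + g(W @ x + I)) / tau
    diffusion = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x + drift * dt + diffusion

# Example: M = 10 neurons, rectified-linear nonlinearity, uniform inhibition.
M = 10
x = np.zeros(M)
W = -np.ones((M, M)) / M  # placeholder recurrent weights
for _ in range(1000):
    x = euler_maruyama_step(x, W, I=0.5, g=lambda u: np.maximum(u, 0.0), tau=1 / 6)
```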
Approximation of the basic CogLink model to an algorithm with an analysis of the algorithm
In this section, we approximate the basic CogLink as an algorithm and conduct a mathematical analysis of its performance.
The stable fixed points \({S}_{a}^{\,{\mbox{alm}}\,}\) of the premotor cortex dynamics at the ath ensemble (see Equation (4.1)) are defined as:
Here, supp(x) = {xi∣xi ≠ 0}. Assuming the K-WTA sampling dynamic at the premotor cortex occurs instantaneously (i.e., τalm is small), the premotor cortex dynamic \({{{{\bf{x}}}}}_{a}^{\,{\mbox{alm}}\,}\) converges to one of the fixed points above for each ensemble. As the network is symmetric, it uniformly converges to one of the fixed points \({\hat{{{{\bf{x}}}}}}_{a}^{\,{\mbox{alm}}\,}\), i.e., \({\hat{{{{\bf{x}}}}}}_{a}^{\,{\mbox{alm}}} \sim {\mbox{unif}}({S}_{a}^{{\mbox{alm}}\,})\).
Similarly, assuming the BG dynamic (see Equation (4.4)) occurs instantaneously (i.e., τbg is small), we obtain:
From Equation (4.6), we also have
\({{{{\bf{I}}}}}_{a}^{\,{\mbox{mct}}\,}\) is then the sample value \({\hat{v}}_{a}\) in our algorithm (Equation (2.6)).
Finally, assuming the WTA motor dynamic (see Equation (4.7)) occurs instantaneously (i.e., τmct is small), the motor cortex dynamic outputs action At:
This corresponds to Equation (2.7) in the algorithm.
The regret of the algorithm is nearly optimal
In this section, we analyze the performance of the algorithm. To simplify the notation for analysis, we present the pseudo-code of the algorithm and introduce a few notation changes (see Algorithm 1).
Algorithm 1
Algorithmic form of the CogLink model
1: Input parameters \(A,K,M,{\left\{{{{{\boldsymbol{\eta }}}}}_{(t)}\right\}}_{t\in [T]},{\left\{{\bar{{{{\bf{V}}}}}}_{a,m}^{{{{\rm{alm}}}}/{{{\rm{bg}}}}}\right\}}_{a\in [A],m\in [M]}\)
2: For all a ∈ [A], for m ∈ [M], initialize N1,a = 1 and \({{{{\bf{v}}}}}_{1,a,m}={\bar{{{{\bf{V}}}}}}_{a,m}^{{{{\rm{alm}}}}/{{{\rm{bg}}}}}\)
3: for trial t = 1, …, T do
4: for action a = 1, …, A do
5: Let \({{{{\mathcal{V}}}}}_{t,a}\) be the uniform distribution over \({\left\{\frac{1}{K}{\sum }_{j=1}^{K}{{{{\bf{v}}}}}_{t,a,{i}_{j}}\right\}}_{1\le {i}_{1} < \ldots < {i}_{K}\le M}\)
6: Sample \({\hat{{{{\bf{v}}}}}}_{t,a} \sim {{{{\mathcal{V}}}}}_{t,a}\)
7: Output action \({a}_{t}\leftarrow \arg {\max }_{a}{\hat{{{{\bf{v}}}}}}_{t,a}\)
8: Receive reward rt
9: \({{{{\boldsymbol{\delta }}}}}_{t}\leftarrow {r}_{t}-{{{{\bf{v}}}}}_{t,{a}_{t}}\)
10: if a = at then
11: N(t+1),a ← Nt,a + 1
12: \({{{{\bf{v}}}}}_{(t+1),a}\leftarrow {{{{\bf{v}}}}}_{t,a}+{\eta }_{({N}_{t,a})}{{{{\boldsymbol{\delta }}}}}_{t}\)
13: else
14: N(t+1),a ← Nt,a
15: v(t+1),a ← vt,a
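For concreteness, the following short Python sketch implements Algorithm 1 on a Bernoulli bandit, using the Theorem 2 initialization \({\bar{{{{\bf{V}}}}}}_{a,m}^{\,{\mbox{alm/bg}}\,}=Cm/M\) and learning rate \({\eta }_{(t)}=\frac{1}{1+t}\); the particular values of C and M here are illustrative.

```python
import numpy as np

def coglink_bandit(reward_probs, T, M=20, K=1, C=2.0, seed=0):
    """Minimal sketch of Algorithm 1 on a Bernoulli bandit.

    Each action keeps M synapses forming a quantile-coded value
    distribution; a sample is the average of K synapses drawn at random
    (with replacement, a simplification that is exact for K = 1).
    """
    rng = np.random.default_rng(seed)
    A = len(reward_probs)
    v = np.tile(C * np.arange(1, M + 1) / M, (A, 1))  # v[a, m] = C * m / M
    n = np.ones(A)                                    # choice counts N_{t,a}
    for t in range(T):
        idx = rng.integers(0, M, size=(A, K))         # K random synapses per action
        v_hat = v[np.arange(A)[:, None], idx].mean(axis=1)
        a = int(np.argmax(v_hat))                     # mutual competition (WTA)
        r = float(rng.random() < reward_probs[a])     # Bernoulli reward
        delta = r - v[a]                              # distributional RPE (line 9)
        v[a] += delta / (1.0 + n[a])                  # eta_(n) = 1 / (1 + n)
        n[a] += 1
    return v
```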
Let μi represent the probability of receiving rewards for choosing action i. Without loss of generality, let action 1 denote the optimal action. Define Δi = μ1 − μi and \(D=\frac{{{{{\mathbf{\Delta }}}}}_{A}}{{{{{\mathbf{\Delta }}}}}_{2}}\), the ratio of the largest to the smallest nonzero gap (note that Δ1 = 0 by definition). The primary objective of this section is to establish the following theorem:
Theorem 2
Let K = 1, and for all a ∈ [A] and m ∈ [M], \({\bar{{{{\bf{V}}}}}}_{a,m}^{\,{\mbox{alm/bg}}\,}=\frac{Cm}{M}\), where \(C=\frac{16\log (2ATD{{{{\mathbf{\Delta }}}}}_{2}^{2}\log T)}{{{{{\mathbf{\Delta }}}}}_{2}}\). Additionally, let \({\eta }_{(t)}=\frac{1}{1+t}\) for all t ∈ [T]. Under these conditions, the regret of this algorithm is bounded by \(\sqrt{324ATD\log (2ATD{{{{\mathbf{\Delta }}}}}_{2}^{2}\log T)}\).
Let M denote the number of neurons in each ensemble, and let \({{{{\bf{v}}}}}_{t,a,m}\) represent the synaptic strength of the mth neuron in the ath ensemble at trial t. The learning rate after an action has been chosen t times is denoted by η(t). Additionally, Nt,a denotes the number of times action a has been chosen by the end of trial t, and Tn,a represents the trial on which action a is chosen for the nth time. If action \(\hat{a}\) is chosen at trial t, we employ the standard reward prediction error update:
And for \(a\ne \hat{a}\), v(t+1),a,m = vt,a,m. One can conceive of this ensemble of synapses as a quantile distribution \({{{{\mathcal{V}}}}}_{t,a}\) representing the values for each action. Each ensemble randomly samples \({\hat{{{{\bf{v}}}}}}_{t,a}\) from this quantile distribution \({{{{\mathcal{V}}}}}_{t,a}\) by selecting K synaptic strengths uniformly at random and averaging them:
The action is then chosen based on the values of the samples through a mutual competition process:
By recursively expanding Equation (4.15), we obtain:
For the theoretical analysis of this circuit, we consider the following simple setting: let K = 1, and for all a ∈ [A], \({{{{\bf{v}}}}}_{0,a,m}=\frac{Cm}{M}\), where C > 0 is a constant we will define later. For all m ∈ [M] and t ∈ [T], let \({\eta }_{(t)}=\frac{1}{1+t}\). By substituting these conditions into the equation, we obtain:
Now, we aim to bound the expectation of Nt,a for a ≠ 1. Demonstrating that the model selects suboptimal actions infrequently implies small regret. Given any \(\epsilon \in {\mathbb{R}}\), we define \({E}_{a}(t)=\{{\hat{{{{\bf{v}}}}}}_{ta}\le {{{{\boldsymbol{\mu }}}}}_{1}-\epsilon \}\). We establish the following stopping time to capture the event when rewards are concentrated around the mean:
Applying the maximal Hoeffding inequality yields:
By union bounding over all intervals and actions:
Setting \({\delta }^{{\prime} }=\frac{\delta }{A\log T}\), we obtain:
Now, let’s bound the expectation of Nt,a using this stopping time. We have:
Now, let’s decompose the first term as follows:
Let \({a}_{t}^{{\prime} }=\arg {\max }_{a\ne 1}{\hat{{{{\bf{v}}}}}}_{t,a}\), and let Ft,a be the cumulative distribution function of \({{{{\mathcal{V}}}}}_{t,a}\) conditioning on τ ≥ t. To bound the first term, let’s examine:
Now, we have:
Note that from Equation (4.19) and Equation (4.20), we have, conditioning on τ ≥ t:
Notice that if \({N}_{t,1}\ge \frac{4\log \frac{2A\log T}{\delta }}{{\epsilon }^{2}}\), then we have
If \({N}_{t,1} < \frac{4\log \frac{2A\log T}{\delta }}{{\epsilon }^{2}}\) and \(C\ge \frac{8\log \frac{2A\log T}{\delta }}{\epsilon }\), then
This implies that \({F}_{{T}_{t,1}1}({{{{\boldsymbol{\mu }}}}}_{1}-\epsilon )\le \frac{1}{2}\), hence
Consequently,
Similarly, we can bound the second term:
By Equation (4.27), if \({N}_{t,a} > \frac{16D\log \frac{2A\log T}{\delta }}{{({{{{\mathbf{\Delta }}}}}_{a}-\epsilon )}^{2}}\) and \(C\le \frac{8D\log \frac{2A\log T}{\delta }}{{\Delta }_{a}-\epsilon }\), then \({F}_{{T}_{t,a},a}(\,{{{{\boldsymbol{\mu }}}}}_{1}-\epsilon )=1\). Hence,
Now, let’s set \(\epsilon=\frac{{{{{\mathbf{\Delta }}}}}_{a}}{2}\), \(\delta=\frac{1}{TD{{{{\mathbf{\Delta }}}}}_{2}^{2}}\), and \(C=\frac{16\log \frac{2A\log T}{\delta }}{{{{{\mathbf{\Delta }}}}}_{2}}\). This satisfies the condition for C:
Combining Equation (4.23), Equation (4.31), and Equation (4.33), we find:
Now, let’s bound the regret:
For any Δ > 0, we can divide the sum as follows.
By the inequality of arithmetic and geometric means, we have
as desired.
Specifically, when A = 2, we can present the following simplified theorem.
Theorem 3
Let K = 1 and Δ = ∣μ1 − μ2∣, and for all a ∈ [2], let \({{{{\bf{v}}}}}_{0,a,m}=\frac{Cm}{M}\), where \(C=\frac{16\log (4T{\Delta }^{2}\log T)}{\Delta }\). For all m ∈ [M], t ∈ [T], let \({\eta }_{(t)}=\frac{1}{1+t}\). Then the regret of this algorithm is bounded by \(36\sqrt{T\log (4T\Delta \log T)}\).
Details on the augmented CogLink
This section provides details of the augmented CogLink model. At its core, the model comprises the PFC-MD-like circuit for contextual inferences and copies of basic CogLink models for dynamically switching behavioral strategies based on the inferred context.
At trial t, the prefrontal cortex activities \({{{{\bf{x}}}}}^{{{{\rm{pfc}}}}}\in {{\mathbb{R}}}^{A\times 2}\) jointly encode actions and rewards at the last trial, with the following formulation:
To form a line attractor in MD, we consider the following thalamocortical loop:
The MD activities, denoted as \({{{{\bf{x}}}}}^{{{{\rm{md}}}}}\in {{\mathbb{R}}}^{2}\), evolve according to the equation:
the frontal cortex activities, denoted as \({{{{\bf{x}}}}}^{{{{\rm{fc}}}}}\in {{\mathbb{R}}}^{2}\), evolve according to the equation:
and the TRN activity, denoted as \({x}^{{{{\rm{trn}}}}}\in {\mathbb{R}}\), evolves according to the equation:
Here, τeff = 5 represents the effective time constant for accumulation dynamics, τfc = τtrn = 0.1 represents the membrane time constant of frontal neurons and TRN neurons, and D = 4 signifies the threshold for accumulation dynamics.
The nonlinearity function, \({f}_{{{{\rm{md}}}}}:{\mathbb{R}}\to {\mathbb{R}},\) is defined as
and the PFC-MD inputs, denoted as \({{{{\bf{I}}}}}^{{{{\rm{pfc/md}}}}}\in {{\mathbb{R}}}^{2}\), are given by
Here, \({{{{\bf{W}}}}}_{c}^{\,{\mbox{pfc/md}}\,}\in {{\mathbb{R}}}^{A\times 2}\) represents PFC-MD connections projecting to MD neurons tuned to context c, and ⋅ signifies the matrix inner product. Additionally, \({f}_{{{{\rm{pfc}}}}}(x)={[2.7+\log (x)]}_{+}\), βd2 = 0.85 in the D2R hyperactivation model and the rescue model, βd2 = 1 otherwise, and Irescue = 0.45 in the rescue model and Irescue = 0 otherwise. We set xmd = 0 for the MD inhibition model throughout the experiment.
We update the PFC-MD connections through Hebbian learning as follows:
Here, \({{{{\bf{W}}}}}_{c,a,r}^{\,{\mbox{pfc/md}}\,}\) represents the synapse from the PFC neuron jointly tuned to action a and reward r onto the MD neurons tuned to context c, and \({f}_{{{{\rm{hebb}}}}}:{\mathbb{R}}\to {\mathbb{R}}\) denotes the sigmoidal nonlinearity function,
The learning rate is determined by \(\forall c\in [2],a\in [A],{{{{\boldsymbol{\eta }}}}}_{c}\in {{\mathbb{R}}}^{2}\), given by \({{{{\boldsymbol{\eta }}}}}_{c}=\frac{{f}_{{{{\rm{hebb}}}}}({{{{\bf{x}}}}}_{c}^{\,{\mbox{md}}\,})}{4+{N}_{c}}\), where Nc represents a rolling sum of \({f}_{{{{\rm{hebb}}}}}({{{{\bf{x}}}}}_{c}^{\,{\mbox{md}}\,})\) and is updated as \({N}_{c}\leftarrow {N}_{c}+{f}_{{{{\rm{hebb}}}}}({{{{\bf{x}}}}}_{c}^{\,{\mbox{md}}\,})\). For the KO-nonlinear-Hebb model, we replace fhebb with a linear function.
MD neurons then modulate the downstream d-CS models via interneuron-mediated pathways. Specifically, the interneuron activities are defined as
and
Here, the interneuron membrane time constant is τvip = τpv = 0.1, and \(\bar{c}\) represents a context different from c. These activities modulate the downstream d-CS models as follows:
where
For the KO-interneuron-gating model, we replace fin with direct MD modulation
where
These interneuron-mediated pathways also modulate plasticity. Specifically,
where \({{{{\boldsymbol{\eta }}}}}_{c,a}=\frac{1}{4+{N}_{c,a}}\) and Nc,a is the rolling sum of \(0.5\,{{{{\bf{1}}}}}_{a={a}_{t}}\,{f}_{{{{\rm{in}}}}}({{{{\bf{x}}}}}_{c}^{\,{\mbox{vip}}\,}-{{{{\bf{x}}}}}_{c}^{\,{\mbox{pv}}\,})\), updated as \({N}_{c,a}\leftarrow {N}_{c,a}+0.5\,{{{{\bf{1}}}}}_{a={a}_{t}}\,{f}_{{{{\rm{in}}}}}({{{{\bf{x}}}}}_{c}^{\,{\mbox{vip}}\,}-{{{{\bf{x}}}}}_{c}^{\,{\mbox{pv}}\,})\). The remainder of the model consists of two copies of the basic CogLink model.
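As an illustration of the interneuron-gated learning-rate update above, the following Python sketch performs one trial of the rolling-sum bookkeeping; the rectifying gate fin is an assumed placeholder, since its exact form is specified by the model equations.

```python
import numpy as np

def gated_learning_rate(x_vip, x_pv, chosen, N, f_in=lambda u: np.maximum(u, 0.0)):
    """One trial of the interneuron-gated learning-rate update (a sketch).

    chosen is the indicator 1_{a = a_t}; f_in is an assumed rectifying gate.
    """
    gate = 0.5 * chosen * f_in(x_vip - x_pv)  # 0.5 * 1_{a=a_t} * f_in(x_vip - x_pv)
    eta = 1.0 / (4.0 + N)                     # eta_{c,a} = 1 / (4 + N_{c,a})
    N = N + gate                              # rolling-sum update of N_{c,a}
    return eta, gate, N
```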
Approximation of the thalamocortical model to an algorithm
In this section, we approximate the thalamocortical model as an algorithm and demonstrate its connection to the CUSUM algorithm56,57. Additionally, we illustrate that the D2R hyperactive impaired model corresponds to a leaky evidence integrator.
We recall that the MD circuit can be described by (Equation (4.40), Equation (4.41), Equation (4.42)). Notice that by letting the dynamics of xfc and xtrn be instantaneous, we can describe the effective MD circuit dynamics as follows:
Here, D = 4, and we will show that D represents the threshold of the accumulation dynamics. τeff = 5 represents the effective time constant for the accumulation dynamics, \({\tau }_{{{{\rm{md}}}}}={\tau }_{{{{\rm{eff}}}}}D/2\) represents the membrane time constant, \(w=\frac{1}{D}\), and \({{{{\bf{I}}}}}_{1}^{\,{\mbox{pfc/md}}\,},{{{{\bf{I}}}}}_{2}^{\,{\mbox{pfc/md}}\,}\) represent the PFC inputs to MD. The nonlinearity function is defined as:
Let \(X={{{{\bf{x}}}}}_{1}^{\,{\mbox{md}}\,}-{{{{\bf{x}}}}}_{2}^{\,{\mbox{md}}\,}\). Then, we have:
and
If \(| {{{{\bf{I}}}}}_{1}^{\,{\mbox{pfc/md}}\,}-{{{{\bf{I}}}}}_{2}^{\,{\mbox{pfc/md}}\,}| \ll 1\), then at the stationary point, X remains approximately ±D. This corresponds to a drift-diffusion process with a threshold at ±D and inputs of \({{{{\bf{I}}}}}_{1}^{\,{\mbox{pfc/md}}\,}-{{{{\bf{I}}}}}_{2}^{\,{\mbox{pfc/md}}\,}\). To be precise, we discretize the differential equation and threshold X at ±4 to derive the following algorithm:
We prove the following theorem:
Theorem 4
Let dt = τeff. If we set X0 = −D, St = Xt + D and assume \(| {{{{\bf{I}}}}}_{1}^{\,{\mbox{pfc/md}}\,}-{{{{\bf{I}}}}}_{2}^{\,{\mbox{pfc/md}}\,}| \ll 1\), the evolution of St approximates to
We can prove the theorem by substituting the variable into Equation (4.58) and adding D to both sides:
as desired. Notably, when \({{{{\bf{I}}}}}_{1}^{\,{\mbox{pfc/md}}\,}(t)=\log P({a}_{t},{r}_{t}| c=1)+\alpha,{{{{\bf{I}}}}}_{2}^{\,{\mbox{pfc/md}}\,}(t)=\log P({a}_{t},{r}_{t}| c=2)+\alpha\) for any α > 0 and Sn < 2D, this corresponds exactly to the CUSUM algorithm:
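In its standard form, the CUSUM recursion reads \({S}_{n+1}={[{S}_{n}+\log P({a}_{t},{r}_{t}| c=1)-\log P({a}_{t},{r}_{t}| c=2)]}_{+}\), with a change declared once Sn crosses a threshold (2D in our setting). A minimal Python sketch:

```python
def cusum_change_detector(llr_stream, threshold):
    """Standard CUSUM: S <- max(0, S + llr), declare a change when S >= threshold.

    llr_stream yields per-trial log-likelihood ratios
    log P(a_t, r_t | c=1) - log P(a_t, r_t | c=2); threshold plays the
    role of 2D in the text.
    """
    S = 0.0
    for n, llr in enumerate(llr_stream):
        S = max(0.0, S + llr)
        if S >= threshold:
            return n  # change point declared at trial n
    return None       # no change detected
```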
To analyze the impaired model, we recall the equations:
Let \(X={{{{\bf{x}}}}}_{1}^{\,{\mbox{md}}\,}-{{{{\bf{x}}}}}_{2}^{\,{\mbox{md}}\,}\). Then, we have:
This indicates that the evidence accumulation dynamic is a leaky integrator. At the stationary point, we have
By plugging in the model learned by the impaired model in Fig. 7k, we have
So we have the threshold \(X\approx \frac{0.055{\beta }_{{{{\rm{d2}}}}}D}{2(1-{\beta }_{{{{\rm{d2}}}}})}=0.62\ll 4=D\), consistent with Fig. 7i. This demonstrates that the impaired model has a much lower evidence accumulation threshold compared to the normal model, thereby inducing a strong prior on environmental volatility.
Details of other models
This section contains details on the other models used in the paper. For Thompson sampling, let γ be the discount factor. Initialize αa = βa = 0 for all a ∈ [A]. Then, we sample from the posterior
We output action
and receive reward rt. We then update the parameter
In all simulations in the paper, we use γ = 1 for Thompson sampling and γ = 0.93 for discounted Thompson sampling.
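For reference, a compact Python sketch of (discounted) Thompson sampling with Beta posteriors; the Beta(α + 1, β + 1) posterior form and the discounted update α ← γα + rt are the standard construction, stated here as an assumption since the exact update equations appear in the displayed formulas.

```python
import numpy as np

def discounted_thompson_sampling(reward_probs, T, gamma=0.93, seed=0):
    """Sketch of (discounted) Thompson sampling on a Bernoulli bandit.

    Initialization alpha = beta = 0 and the discount gamma follow the text;
    the Beta(alpha + 1, beta + 1) posterior and the discounted update are
    the standard form, assumed here for illustration.
    """
    rng = np.random.default_rng(seed)
    A = len(reward_probs)
    alpha, beta = np.zeros(A), np.zeros(A)
    for t in range(T):
        theta = rng.beta(alpha + 1.0, beta + 1.0)  # one posterior sample per arm
        a = int(np.argmax(theta))
        r = float(rng.random() < reward_probs[a])
        alpha, beta = gamma * alpha, gamma * beta  # gamma = 1 recovers plain TS
        alpha[a] += r
        beta[a] += 1.0 - r
    return alpha, beta
```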
For the hidden Markov model with Thompson sampling, we initialize αa,c = βa,c = 0 for all a ∈ [A], c ∈ [2]. We use an HMM with known environmental parameters to infer the current contextual likelihood, \({{{{\bf{p}}}}}_{c}\in {{\mathbb{R}}}^{2}\)
We then sample from the posterior,
We output action
and receive reward rt. We then update the parameter
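A sketch of the forward context-inference step for the two-context HMM is given below. The hazard rate (one switch per 200 trials, i.e., 0.005) and the Bernoulli emission model are assumptions made for illustration; the paper's version uses the known environmental parameters.

```python
import numpy as np

def hmm_context_update(p, a, r, alpha, beta, hazard=0.005):
    """One forward-filtering step over the two contexts (a sketch).

    p: current context probabilities, shape (2,); alpha, beta: per-(action,
    context) Beta counts, shape (A, 2); a, r: last action and reward.
    """
    T = np.array([[1 - hazard, hazard],
                  [hazard, 1 - hazard]])                   # assumed transition matrix
    prior = T.T @ p                                        # predict the next context
    theta = (alpha[a] + 1.0) / (alpha[a] + beta[a] + 2.0)  # mean reward prob.
    lik = theta if r else 1.0 - theta                      # Bernoulli emission
    post = prior * lik
    return post / post.sum()
```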
For the Deep Q-Network (DQN)27, we use a multilayer perceptron with a hidden layer size of 10 and an ϵ-greedy exploration strategy. To balance exploration and exploitation, at trial t, given state st, we define \({{{\bf{N}}}}\in {{\mathbb{R}}}^{S}\), where S is the total number of states. The visit count for each state is updated as: \({{{{\bf{N}}}}}_{s}\leftarrow {{{{\bf{N}}}}}_{s}+{{{{\bf{1}}}}}_{s={s}_{t}}0.2\). The model explores uniformly at random with probability \(\epsilon=\frac{1}{{{{{\bf{N}}}}}_{{s}_{t}}}\) and otherwise selects the action with the highest Q-value.
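The count-based ε schedule described above can be written in a few lines; this is a sketch of the exploration rule only, not the full DQN.

```python
import numpy as np

def epsilon_greedy_action(q_values, state, visit_counts, rng):
    """Count-based epsilon-greedy rule used with the DQN baseline (a sketch).

    visit_counts is an array indexed by state; counts grow by 0.2 per visit
    and epsilon = 1 / N[s], so early visits (N < 1) force uniform exploration.
    """
    visit_counts[state] += 0.2
    epsilon = 1.0 / visit_counts[state]
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore uniformly at random
    return int(np.argmax(q_values))              # exploit the highest Q-value
```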
A-AFC task
This section contains details for the stationary A-AFC task. The task contains two parameters: the expected difference in reward probability between the most and the least rewarding actions (Δ) and the number of alternatives (A). The reward probability θa of action a ∈ [A] is specified by
Each session contains 500 trials, and for each simulation, we run the task for 50 sessions.
Cued 2-AFC task
This section contains details for the cued 2-AFC task (Fig. S2a). The task contains one parameter, the expected difference in reward probability between the most and the least rewarding actions (Δ). In each trial, with uniform probability, the model is presented with cue 1 or cue 2. The reward probability of action 1 after seeing cue 1 is 70%, while that of action 2 is (70 − Δ)%. Conversely, the reward probability of action 1 after seeing cue 2 is (70 − Δ)%, while that of action 2 is 70%. Each session contains 500 trials, and for each simulation, we run the task for 50 sessions.
Binary tree maze task
This section contains details for the binary tree maze task (Fig. S2c). The task consists of a depth-2 binary tree maze with 4 end locations. Upon reaching each end location, the model will receive a reward with probability \(1,\frac{2}{3},\frac{1}{3},0\) respectively. The task contains one parameter a; at the start, if the model chooses left, it receives a reward with probability a, and if the model chooses right, it receives a reward with probability (1 − a). Each session contains 500 trials, and for each simulation, we run the task for 50 sessions.
Probabilistic reversal task
This section contains details for the probabilistic reversal task. There are two alternatives in the task, left or right; the reward probabilities in context 1 are θR = 0.3, θL = 0.7, and the reward probabilities in context 2 are θR = 0.7, θL = 0.3. The task starts with context 1 and switches to the alternative context every 200 trials. The task consists of 1000 trials, and for each simulation, we run the task for 50 sessions. For the low outcome uncertainty environment in Fig. 5i, we replace the 70%/30% reward probabilities with 90%/10%.
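The task is simple enough to state as a short generator; this sketch reproduces the schedule described above (block length, reward probabilities, and number of trials are taken from the text).

```python
import numpy as np

def probabilistic_reversal_task(T=1000, block=200, theta=(0.7, 0.3)):
    """Yield the (left, right) reward probabilities for each trial.

    Context 1 uses (theta_L, theta_R) = (0.7, 0.3); the mapping reverses
    every `block` trials, as in the task description.
    """
    for t in range(T):
        context = (t // block) % 2
        yield np.array(theta if context == 0 else theta[::-1])

# Usage: reward for choosing action a (0 = left, 1 = right) at each trial.
rng = np.random.default_rng(0)
for probs in probabilistic_reversal_task():
    a = rng.integers(2)                 # stand-in policy
    r = float(rng.random() < probs[a])  # Bernoulli reward
```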
Regret and contextual uncertainty
Let \({\theta }_{t}\in {{\mathbb{R}}}^{A}\) be the probability of getting a reward for each action at trial t. We define the regret at trial T, RT, as the expected difference in rewards between the retrospectively optimal action and the action taken, \({R}_{T}={\mathbb{E}}[{\sum }_{t=1}^{T}({{{{\boldsymbol{\theta }}}}}_{t,{a}_{t}^{*}}-{{{{\boldsymbol{\theta }}}}}_{t,{a}_{t}})]\), where \({a}_{t}^{*}=\arg {\max }_{a}{{{{\boldsymbol{\theta }}}}}_{t,a}\) is the retrospectively optimal action.
To decode the contextual uncertainty, U, in Fig. 4g and Fig. S7b, we consider the following nonlinear transformation of the MD activities xmd:
Notice that when two MD populations have the same activity, the uncertainty is 1, and when they have a large difference in activity, the uncertainty is close to 0.
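For intuition, one transformation with exactly these properties (an illustrative stand-in, not necessarily the deposited definition) is \(U=\exp (-\beta | {{{{\bf{x}}}}}_{1}^{\,{\mbox{md}}\,}-{{{{\bf{x}}}}}_{2}^{\,{\mbox{md}}\,}| )\) for some β > 0: U equals 1 when the two MD populations are equally active and decays toward 0 as their activities separate.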
Statistical tests
Data were first tested for normality using the Shapiro–Wilk test. All data presented in this paper are non-normally distributed; therefore, all statistical tests were conducted using nonparametric statistics. For all comparisons of two groups, we used a two-sided rank sum test. For comparisons of more than two groups, we used the Bonferroni-corrected Kruskal–Wallis test with post hoc Dunn’s test. All permutation tests were performed using 10⁶ resamples.
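A minimal Python sketch of this pipeline, using SciPy for the nonparametric tests and the scikit_posthocs package (an assumed helper library) for Dunn's test:

```python
from scipy import stats
import scikit_posthocs as sp  # assumed helper library for Dunn's test

def compare_groups(groups, alpha=0.05):
    """Sketch of the statistical pipeline described above.

    Shapiro-Wilk for normality, a two-sided rank sum test for two groups,
    and Kruskal-Wallis with Bonferroni-corrected post hoc Dunn's test for
    more than two groups.
    """
    for g in groups:
        w, p_norm = stats.shapiro(g)
        if p_norm > alpha:
            print("Warning: a group does not clearly deviate from normality.")
    if len(groups) == 2:
        return stats.ranksums(*groups)  # two-sided by default
    h, p = stats.kruskal(*groups)
    dunn = sp.posthoc_dunn(list(groups), p_adjust="bonferroni")
    return h, p, dunn
```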
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The behavioral and neural activity data of the models in this study have been deposited at FigShare and are publicly available at https://doi.org/10.6084/m9.figshare.26065372.
Code availability
All original code has been deposited at Zenodo and is publicly available as of the date of publication at https://doi.org/10.5281/zenodo.13152289. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
References
Lam, N. H. et al. Prefrontal transthalamic uncertainty processing drives flexible switching. Nature 637, 127–136 (2024).
Sarafyazd, M. & Jazayeri, M. Hierarchical reasoning by neural circuits in the frontal cortex. Science 364, eaav8911 (2019).
Bill, J., Gershman, S. J. & Drugowitsch, J. Visual motion perception as online hierarchical inference. Nat. Commun. 13, 7403 (2022).
Rohe, T., Ehlis, A. C. & Noppeney, U. The neural dynamics of hierarchical Bayesian causal inference in multisensory perception. Nat. Commun. 10, 1907 (2019).
Tenenbaum, J. B., Kemp, C., Griffiths, T. L. & Goodman, N. D. How to grow a mind: statistics, structure, and abstraction. Science 331, 1279–1285 (2011).
Mathys, C. D. et al. Uncertainty in perception and the hierarchical Gaussian filter. Front. Hum. Neurosci. 8, 825 (2014).
Knill, D. C. & Pouget, A. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719 (2004).
Kording, K. P. & Wolpert, D. M. Bayesian integration in sensorimotor learning. Nature 427, 244–247 (2004).
Nott, D. J., Drovandi, C. & Frazier, D. T. Bayesian inference for misspecified generative models. Annu. Rev. Stat. Appl. 11, 179–202 (2024).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Yamins, D. L. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl. Acad. Sci. USA 111, 8619–8624 (2014).
Monosov, I. E. How outcome uncertainty mediates attention, learning, and decision-making. Trends Neurosci. 43, 795–809 (2020).
Knill, D. C. & Whitman, R. (eds) Perception as Bayesian Inference (Cambridge Univ. Press, 1996).
Stocker, A. A. & Simoncelli, E. P. Noise characteristics and prior expectations in human visual speed perception. Nat. Neurosci. 9, 578–585 (2006).
Wang, B. A. et al. Thalamic regulation of reinforcement learning strategies across prefrontal-striatal networks. Nat. Commun. https://doi.org/10.1038/s41467-025-63995-x (2025).
Zhou, T. et al. Enhancement of mediodorsal thalamus rescues aberrant belief dynamics in a mouse model with schizophrenia-associated mutation. Preprint at bioRxiv https://doi.org/10.1101/2024.01.08.574745 (2024).
Niv, Y. Reinforcement learning in the brain. J. Math. Psychol. 53, 139–154 (2009).
Soltani, A. & Wang, X. J. A biophysically based neural model of matching law behavior: melioration by stochastic synapses. J. Neurosci. 26, 3731–3744 (2006).
Ma, W. J., Beck, J. M., Latham, P. E. & Pouget, A. Bayesian inference with probabilistic population codes. Nat. Neurosci. 9, 1432–1438 (2006).
Hoyer, P. & Hyvärinen, A. Interpreting neural response variability as Monte Carlo sampling of the posterior. In Advances in Neural Information Processing Systems Vol. 15 (eds Becker, S. et al.) (MIT Press, 2002).
Rao, R. P. Bayesian computation in recurrent neural circuits. Neural Comput. 16, 1–38 (2004).
Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).
Roitman, J. D. & Shadlen, M. N. Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J. Neurosci. 22, 9475–9489 (2002).
Lattimore, T. & Szepesvári, C. Bandit Algorithms (Cambridge Univ. Press, 2020).
Auer, P., Cesa-Bianchi, N., Freund, Y. & Schapire, R. Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proc. IEEE 36th Annual Foundations of Computer Science 322–331 (1995).
Korda, N., Kaufmann, E. & Munos, R. Thompson sampling for 1-dimensional exponential family bandits. In Advances in Neural Information Processing Systems Vol. 26 (eds Burges, C. et al.) (Curran Associates, Inc., 2013).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Vul, E. Sampling in Human Cognition. Ph.D. dissertation, Massachusetts Institute of Technology https://dspace.mit.edu/handle/1721.1/62097 (2010).
Battaglia, P. W., Kersten, D. & Schrater, P. R. How haptic size sensations improve distance perception. PLoS Comput. Biol. 7, e1002080 (2011).
Acerbi, L., Vijayakumar, S. & Wolpert, D. M. On the origins of suboptimality in human probabilistic inference. PLoS Comput. Biol. 10, e1003661 (2014).
Prat-Carrabin, A., Wilson, R. C., Cohen, J. D. & Azeredo da Silveira, R. Human inference in changing environments with temporal structure. Psychol. Rev. 128, 879–912 (2021).
Prat-Carrabin, A., Meyniel, F. & Azeredo da Silveira, R. Resource-rational account of sequential effects in human prediction. eLife 13, e81256 (2024).
Chen, C. S., Mueller, D., Knep, E., Ebitz, R. B. & Grissom, N. M. Dopamine and norepinephrine differentially mediate the exploration-exploitation tradeoff. J. Neurosci. 44, e1194232024 (2024).
Doya, K. Metalearning and neuromodulation. Neural Netw. 15, 495–506 (2002).
Sakai, K. & Passingham, R. E. Prefrontal interactions reflect future task operations. Nat. Neurosci. 6, 75–81 (2003).
Miller, E. K. & Cohen, J. D. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202 (2001).
Wolff, M. & Halassa, M. M. The mediodorsal thalamus in executive control. Neuron 112, 893–908 (2024).
Wang, M. B. & Halassa, M. M. Thalamocortical contribution to flexible learning in neural systems. Netw. Neurosci. 6, 980–997 (2022).
Scott, D. N., Mukherjee, A., Nassar, M. R. & Halassa, M. M. Thalamocortical architectures for flexible cognition and efficient learning. Trends Cogn. Sci. 28, 739–756 (2024).
Nakajima, M. & Halassa, M. M. Thalamic control of functional cortical connectivity. Curr. Opin. Neurobiol. 44, 127–131 (2017).
Halassa, M. M. & Kastner, S. Thalamic functions in distributed cognitive control. Nat. Neurosci. 20, 1669–1679 (2017).
Halassa, M. M. & Sherman, S. M. Thalamocortical circuit motifs: a general framework. Neuron 103, 762–770 (2019).
Schmitt, L. I. et al. Thalamic amplification of cortical connectivity sustains attentional control. Nature 545, 219–223 (2017).
Rikhye, R. V., Gilra, A. & Halassa, M. M. Thalamic regulation of switching between cortical representations enables cognitive flexibility. Nat. Neurosci. 21, 1753–1763 (2018).
Mukherjee, A., Lam, N. H., Wimmer, R. D. & Halassa, M. M. Thalamic circuits for independent control of prefrontal signal and noise. Nature 600, 100–104 (2021).
Chen, X., Sorenson, E. & Hwang, K. Thalamocortical contributions to working memory processes during the n-back task. Neurobiol. Learn. Mem. 197, 107701 (2023).
Zheng, W. L., Wu, Z., Hummos, A., Yang, G. R. & Halassa, M. M. Rapid context inference in a thalamocortical model using recurrent neural networks. Nat. Commun. 15, 8275 (2024).
Hummos, A., Wang, B. A., Drammis, S., Halassa, M. M. & Pleger, B. Thalamic regulation of frontal interactions in human cognitive flexibility. PLoS Comput. Biol. 18, e1010500 (2022).
Zhang, X., Mukherjee, A., Halassa, M. M. & Chen, Z. S. Mediodorsal thalamus regulates task uncertainty to enable cognitive flexibility. Nat. Commun. 16, 2640 (2025).
Halassa, M. M. & Acsády, L. Thalamic inhibition: diverse sources, diverse scales. Trends Neurosci. 39, 680–693 (2016).
Halassa, M. M. et al. State-dependent architecture of thalamic reticular subnetworks. Cell 158, 808–821 (2014).
Wimmer, R. D. et al. Thalamic control of sensory selection in divided attention. Nature 526, 705–709 (2015).
Nakajima, M., Schmitt, L. I. & Halassa, M. M. Prefrontal cortex regulates sensory filtering through a basal ganglia-to-thalamus pathway. Neuron 103, 445–458 (2019).
Canto-Bustos, M., Friason, F. K., Bassi, C. & Oswald, A. M. Disinhibitory circuitry gates associative synaptic plasticity in olfactory cortex. J. Neurosci. 42, 2942–2950 (2022).
Williams, L. E. & Holtmaat, A. Higher-order thalamocortical inputs gate synaptic long-term potentiation via disinhibition. Neuron 101, 91–102 (2019).
Moustakides, G. V. Optimal stopping times for detecting changes in distributions. Ann. Stat. 14, 1379–1387 (1986).
Lorden, G. Procedures for reacting to a change in distribution. Ann. Math. Stat. 42, 1897–1908 (1971).
Chakraborty, S., Kolling, N., Walton, M. E. & Mitchell, A. S. Critical role for the mediodorsal thalamus in permitting rapid reward-guided updating in stochastic reward environments. eLife 5, e13588 (2016).
Alcaraz, F. et al. Dissociable effects of anterior and mediodorsal thalamic lesions on spatial goal-directed behavior. Brain Struct. Funct. 221, 79–89 (2016).
Hwang, K., Bruss, J., Tranel, D. & Boes, A. D. Network localization of executive function deficits in patients with focal thalamic lesions. J. Cogn. Neurosci. 32, 2303–2319 (2020).
Baker, S. C., Konova, A. B., Daw, N. D. & Horga, G. A distinct inferential mechanism for delusions in schizophrenia. Brain 142, 1797–1812 (2019).
Sheffield, J. M., Suthaharan, P., Leptourgos, P. & Corlett, P. R. Belief updating and paranoia in individuals with schizophrenia. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 7, 1149–1157 (2022).
Adams, R. A., Napier, G., Roiser, J. P., Mathys, C. & Gilleen, J. Attractor-like dynamics in belief updating in schizophrenia. J. Neurosci. 38, 9471–9485 (2018).
Nassar, M. R., Waltz, J. A., Albrecht, M. A., Gold, J. M. & Frank, M. J. All or nothing belief updating in patients with schizophrenia reduces precision and flexibility of beliefs. Brain 144, 1013–1029 (2021).
Corlett, P. R. & Fletcher, P. Modelling delusions as temporally-evolving beliefs. Cogn. Neuropsychiatry 26, 231–241 (2021).
Corlett, P., Taylor, J., Wang, X.-J., Fletcher, P. & Krystal, J. Toward a neurobiology of delusions. Prog. Neurobiol. 92, 345–369 (2010).
Huang, A. S. et al. A prefrontal thalamocortical readout for conflict-related executive dysfunction in schizophrenia. Cell Rep. Med. 5, 101802 (2024).
Anticevic, A. & Halassa, M. M. The thalamus in psychosis spectrum disorder. Front. Neurosci. 17, 1163600 (2023).
Mukherjee, A. & Halassa, M. M. The associative thalamus: a switchboard for cortical operations and a promising target for schizophrenia. Neuroscientist 30, 132–147 (2022).
Alelú-Paz, R. & Giménez-Amaya, J. M. The mediodorsal thalamic nucleus and schizophrenia. J. Psychiatry Neurosci. 33, 489–498 (2008).
Anticevic, A. et al. Mediodorsal and visual thalamic connectivity differ in schizophrenia and bipolar disorder with and without psychosis history. Schizophr. Bull. 40, 1227–1243 (2014).
Byne, W. et al. Magnetic resonance imaging of the thalamic mediodorsal nucleus and pulvinar in schizophrenia and schizotypal personality disorder. Arch. Gen. Psychiatry 58, 133–140 (2001).
Woodward, N. D., Karbasforoushan, H. & Heckers, S. Thalamocortical dysconnectivity in schizophrenia. Am. J. Psychiatry 169, 1092–1099 (2012).
Pomarol-Clotet, E. et al. Medial prefrontal cortex pathology in schizophrenia as revealed by convergent findings from multimodal imaging. Mol. Psychiatry 15, 823–830 (2010).
Seeman, P. & Lee, T. Antipsychotic drugs: direct correlation between clinical potency and presynaptic action on dopamine neurons. Science 188, 1217–1219 (1975).
Creese, I., Burt, D. R. & Snyder, S. H. Dopamine receptor binding predicts clinical and pharmacological potencies of antischizophrenic drugs. Science 192, 481–483 (1976).
Meltzer, H. Y., Matsubara, S. & Lee, J. C. Classification of typical and atypical antipsychotic drugs on the basis of dopamine D-1, D-2 and serotonin2 pKi values. J. Pharmacol. Exp. Ther. 251, 238–246 (1989).
Wong, D. F. et al. Positron emission tomography reveals elevated D2 dopamine receptors in drug-naive schizophrenics. Science 234, 1558–1563 (1986).
Abi-Dargham, A. et al. Increased baseline occupancy of D2 receptors by dopamine in schizophrenia. Proc. Natl. Acad. Sci. USA 97, 8104–8109 (2000).
Cazorla, M., Shegda, M., Ramesh, B., Harrison, N. L. & Kellendonk, C. Striatal D2 receptors regulate dendritic morphology of medium spiny neurons via Kir2 channels. J. Neurosci. 32, 2398–2409 (2012).
Waltz, J. A. The neural underpinnings of cognitive flexibility and their disruption in psychotic illness. Neuroscience 345, 203–217 (2017).
Deserno, L. et al. Volatility estimates increase choice switching and relate to prefrontal activity in schizophrenia. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 5, 173–183 (2020).
Foster, N. N. et al. The mouse cortico-basal ganglia-thalamic network. Nature 598, 188–194 (2021).
Alexander, G. E., DeLong, M. R. & Strick, P. L. Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu. Rev. Neurosci. 9, 357–381 (1986).
Cox, J. & Witten, I. B. Striatal circuits for reward learning and decision-making. Nat. Rev. Neurosci. 20, 482–494 (2019).
Petersen, C. C. & Crochet, S. Synaptic computation and sensory processing in neocortical layer 2/3. Neuron 78, 28–48 (2013).
Barth, A. L. & Poulet, J. F. Experimental evidence for sparse firing in the neocortex. Trends Neurosci. 35, 345–355 (2012).
Kerr, J. N. et al. Spatial organization of neuronal population responses in layer 2/3 of rat barrel cortex. J. Neurosci. 27, 13316–13328 (2007).
Kato, S. et al. Action selection and flexible switching controlled by the intralaminar thalamic neurons. Cell Rep. 22, 2370–2382 (2018).
Minamimoto, T., Hori, Y. & Kimura, M. Roles of the thalamic cm-pf complex-basal ganglia circuit in externally driven rebias of action. Brain Res. Bull. 78, 75–79 (2009).
Wolpert, D. M. & Kawato, M. Multiple paired forward and inverse models for motor control. Neural Netw. 11, 1317–1329 (1998).
Heald, J. B., Lengyel, M. & Wolpert, D. M. Contextual inference underlies the learning of sensorimotor repertoires. Nature 600, 489–493 (2021).
Kim, J.-H., Daie, K. & Li, N. A combinatorial neural code for long-term motor memory. Nature 637, 663–672 (2024).
Jones, E. G. The thalamic matrix and thalamocortical synchrony. Trends Neurosci. 24, 595–601 (2001).
Sherman, S. M. & Guillery, R. W. Exploring the Thalamus and Its Role in Cortical Function 2nd edn (MIT Press, 2005), hardcover edn.
Tanaka, M. Cognitive signals in the primate motor thalamus predict saccade timing. J. Neurosci. 27, 12109–12118 (2007).
Saalmann, Y. B. & Kastner, S. The cognitive thalamus. Front. Syst. Neurosci. 9, 39 (2015).
Zhou, H., Schafer, R. J. & Desimone, R. Pulvinar-cortex interactions in vision and attention. Neuron 89, 209–220 (2016).
Bolkan, S. S. et al. Thalamic projections sustain prefrontal activity during working memory maintenance. Nat. Neurosci. 20, 987–996 (2017).
Guo, Z. V. et al. Maintenance of persistent activity in a frontal thalamocortical loop. Nature 545, 181–186 (2017).
Guo, W., Clause, A. R., Barth-Maron, A. & Polley, D. B. A corticothalamic circuit for dynamic switching between feature detection and discrimination. Neuron 95, 180–194 (2017).
Mukherjee, A. et al. Variation of connectivity across exemplar sensory and associative thalamocortical loops in the mouse. eLife 9, e62554 (2020).
Wang, Y. & Sun, Q.-Q. A prefrontal motor circuit initiates persistent movement. Nat. Commun. 15, 5264 (2024).
Gershman, S. J. Deconstructing the human algorithms for exploration. Cognition 173, 34–42 (2018).
Cohen, J. D., McClure, S. M. & Yu, A. J. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Philos. Trans. R. Soc. Lond. B Biol. Sci. 362, 933–942 (2007).
Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A. & Cohen, J. D. Humans use directed and random exploration to solve the explore-exploit dilemma. J. Exp. Psychol. Gen. 143, 2074–2081 (2014).
Ma, W. J. & Jazayeri, M. Neural coding of uncertainty and probability. Annu. Rev. Neurosci. 37, 205–220 (2014).
Walker, E. Y. et al. Studying the neural representations of uncertainty. Nat. Neurosci. 26, 1857–1867 (2023).
Akiti, K. et al. Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction. Neuron 110, 3789–3804 (2022).
O’Neill, M. & Schultz, W. Coding of reward risk by orbitofrontal neurons is mostly distinct from coding of reward value. Neuron 68, 789–800 (2010).
Masset, P., Ott, T., Lak, A., Hirokawa, J. & Kepecs, A. Behavior- and modality-general representation of confidence in orbitofrontal cortex. Cell 182, 112–126 (2020).
Orban, G., Berkes, P., Fiser, J. & Lengyel, M. Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron 92, 530–543 (2016).
Walker, E. Y., Cotton, R. J., Ma, W. J. & Tolias, A. S. A neural basis of probabilistic computation in visual cortex. Nat. Neurosci. 23, 122–129 (2020).
Geurts, L. S., Cooke, J. R. H., van Bergen, R. S. & Jehee, J. F. M. Subjective confidence reflects representation of Bayesian probability in cortex. Nat. Hum. Behav. 6, 294–305 (2022).
Echeveste, R., Aitchison, L., Hennequin, G. & Lengyel, M. Cortical-like dynamics in recurrent circuits optimized for sampling-based probabilistic inference. Nat. Neurosci. 23, 1138–1149 (2020).
Deneve, S. Bayesian spiking neurons I: inference. Neural Comput. 20, 91–117 (2008).
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P. & Dolan, R. J. Model-based influences on humans’ choices and striatal prediction errors. Neuron 69, 1204–1215 (2011).
van der Meer, M. A., Johnson, A., Schmitzer-Torbert, N. C. & Redish, A. D. Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron 67, 25–32 (2010).
Doll, B. B., Duncan, K. D., Simon, D. A., Shohamy, D. & Daw, N. D. Model-based choices involve prospective neural activity. Nat. Neurosci. 18, 767–772 (2015).
Scher, J., Daw, N., Dayan, P. & O’Doherty, J. P. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66, 585–595 (2010).
Lee, S. W., Shimojo, S. & O’Doherty, J. P. Neural computations underlying arbitration between model-based and model-free learning. Neuron 81, 687–699 (2014).
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Akam, T. et al. The anterior cingulate cortex predicts future states to mediate model-based action selection. Neuron 109, 149–163 (2021).
Witkowski, P. P., Park, S. A. & Boorman, E. D. Neural mechanisms of credit assignment for inferred relationships in a structured world. Neuron 110, 2680–2690 (2022).
Schuck, N. W., Cai, M. B., Wilson, R. C. & Niv, Y. Human orbitofrontal cortex represents a cognitive map of state space. Neuron 91, 1402–1412 (2016).
Soltani, A. & Koechlin, E. Computational models of adaptive behavior and prefrontal cortex. Neuropsychopharmacology 47, 58–71 (2022).
Jones, E. G. (ed.) The Thalamus (Springer US, 1985).
Phillips, J. M., Kambi, N. A. & Saalmann, Y. B. A subcortical pathway for rapid, goal-driven, attentional filtering. Trends Neurosci. 39, 49–51 (2016).
Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).
Bayer, H. M. & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129–141 (2005).
Bamford, N. S., Wightman, R. M. & Sulzer, D. Dopamine’s effects on corticostriatal synapses during reward-based behaviors. Neuron 97, 494–510 (2018).
Whittington, J. C. R. & Bogacz, R. Theories of error back-propagation in the brain. Trends Cogn. Sci. 23, 235–250 (2019).
Minsky, M. Steps toward artificial intelligence. Proc. IRE 49, 8–30 (1961).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J. & Hinton, G. Backpropagation and the brain. Nat. Rev. Neurosci. 21, 335–346 (2020).
McCloskey, M. & Cohen, N. J. Catastrophic interference in connectionist networks: the sequential learning problem. In Psychology of Learning and Motivation Vol. 24 109–165 (Academic Press, 1989).
French, R. M. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 128–135 (1999).
Kumaran, D., Hassabis, D. & McClelland, J. L. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends Cogn. Sci. 20, 512–534 (2016).
Kemker, R., McClure, M., Abitino, A., Hayes, T. & Kanan, C. Measuring catastrophic forgetting in neural networks. In AAAI Conference on Artificial Intelligence https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16410 (2018).
Parisi, G. I., Kemker, R., Part, J. L., Kanan, C. & Wermter, S. Continual lifelong learning with neural networks: A review. Neural Netw. 113, 54–71 (2019).
Fiete, I. R. & Seung, H. S. Gradient learning in spiking neural networks by dynamic perturbation of conductances. Phys. Rev. Lett. 97, 048104 (2006).
Schiess, M., Urbanczik, R. & Senn, W. Somato-dendritic synaptic plasticity and error-backpropagation in active dendrites. PLoS Comput. Biol. 12, 1–18 (2016).
Kusmierz, L., Isomura, T. & Toyoizumi, T. Learning with three factors: modulating Hebbian plasticity with errors. Curr. Opin. Neurobiol. 46, 170–177 (2017).
Richards, B. A. & Lillicrap, T. P. Dendritic solutions to the credit assignment problem. Curr. Opin. Neurobiol. 54, 28–36 (2019).
Sacramento, J., Ponte Costa, R., Bengio, Y. & Senn, W. Dendritic cortical microcircuits approximate the backpropagation algorithm. In Advances in Neural Information Processing Systems Vol. 31 8735–8746 (Curran Associates, Inc., 2018).
Kornfeld, J. et al. An anatomical substrate of credit assignment in reinforcement learning. Preprint at bioRxiv https://doi.org/10.1101/2020.02.18.954354 (2020).
Liu, Y. H., Smith, S., Mihalas, S., Shea-Brown, E. & Sümbül, U. Cell-type–specific neuromodulation guides synaptic credit assignment in a spiking neural network. Proc. Natl. Acad. Sci. USA 118, e2111821118 (2021).
O’Reilly, R. C. Biologically plausible error-driven learning using local activation differences: the generalized recirculation algorithm. Neural Comput. 8, 895–938 (1996).
Roelfsema, P. R. & van Ooyen, A. Attention-gated reinforcement learning of internal representations for classification. Neural Comput. 17, 2176–2214 (2005).
Lillicrap, T. P., Cownden, D., Tweed, D. B. & Akerman, C. J. Random synaptic feedback weights support error backpropagation for deep learning. Nat. Commun. 7, 13276 (2016).
Roelfsema, P. R. & Holtmaat, A. Control of synaptic plasticity in deep cortical networks. Nat. Rev. Neurosci. 19, 166–180 (2018).
Gejman, P. V., Sanders, A. R. & Duan, J. The role of genetics in the etiology of schizophrenia. Psychiatr. Clin. North Am. 33, 35–66 (2010).
Levenstein, D. et al. On the role of theory and modeling in neuroscience. J. Neurosci. 43, 1074–1088 (2023).
Dayan, P. & Abbott, L. F. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems (MIT Press, 2001).
Mante, V., Sussillo, D., Shenoy, K. V. & Newsome, W. T. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503, 78–84 (2013).
Majani, E., Erlanson, R. & Abu-Mostafa, Y. On the k-winners-take-all network. In Advances in Neural Information Processing Systems Vol. 1 (ed. Touretzky, D.) (Morgan-Kaufmann, 1988).
Acknowledgements
This work was supported by NIMH grants P50MH132642, R01MH134466, and R01MH120118 (M.B.W. and M.M.H.) and NSF grants CCR-2139936, CCR-2003830, and CCF-1810758 (M.B.W. and N.L.).
Author information
Contributions
M.B.W. and M.M.H. conceived the project. M.B.W. developed the main models with input from M.M.H., and developed the mathematical derivations and analyses with input from N.L. M.B.W. conducted the computational experiments and analyzed the data. M.B.W. wrote the manuscript with edits and feedback from M.M.H. and N.L. All authors read the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, M.B., Lynch, N. & Halassa, M.M. The neural basis for uncertainty processing in hierarchical decision making. Nat Commun 16, 9096 (2025). https://doi.org/10.1038/s41467-025-63994-y