Introduction

Biological agents must rapidly adapt learned behavior, an ability known as behavioral flexibility. As a core feature of adaptive behavior, goal-directed learning requires representing action-outcome contingencies and updating them to attain a goal. Under uncertainty, however, it becomes challenging to distinguish goal-relevant feedback from environmental noise. Being overly sensitive to noisy outcomes can undermine stability, whereas failing to revise outdated representations hinders adaptation to changing contexts. To illustrate the stability-flexibility dilemma, consider a rescue operation under highly uncertain conditions. A rescue team must remain flexible enough to adjust when new information pinpoints the survivors’ location or the building’s accessibility. However, overreacting to every rumor wastes resources on false leads. In such a scenario, filtering out noisy events (stability) while responding appropriately to significant signals (flexibility) is crucial for success under shifting circumstances (stably flexible decision-making).

The prefrontal cortex (PFC) plays a vital role in cognitive control, which is necessary for behavioral flexibility1,2. It specializes in representing and integrating various variables such as context, decisions, and sensory information3,4,5,6. The PFC guides context-dependent decision-making by flexibly altering its information representation based on the current goals3,7,8,9,10. Neural representations in the PFC reflect task demands, facilitating successful goal achievement and directly impacting performance6,11.

The PFC also encodes environmental uncertainty. The lateral prefrontal cortex (LPFC) and orbitofrontal cortex (OFC) are particularly important in representing and tracking environmental uncertainty to adjust behavior12,13,14,15. The LPFC is involved in arbitrating between different learning strategies according to uncertainty16,17,18,19. It is also known to perform an important role in model-based control necessary for goal pursuit20. The OFC is involved in making confidence judgments under uncertainty21,22, representing state-space according to task demands23,24,25, and calculating values26,27,28,29.

While we must flexibly adjust our behavior to pursue goals, we also need to behave stably amid the noise of an uncertain environment. However, cognitive stability and flexibility are generally considered to be in a trade-off relationship, making it challenging to achieve both30,31,32,33,34,35. Similarly, neural separability for flexibility and neural robustness for stability appear to be conflicting properties. In situations where multiple pieces of information are mixed, high-dimensional neural representations that easily distinguish stimuli or contexts are advantageous for flexible responses11,36,37,38,39. Conversely, stable task performance requires neural representations that abstract information into a low-dimensional manifold by removing task-irrelevant noise or distractors7,40,41,42. This creates a known trade-off between neural separability and robustness, depending on the representational dimensionality37,42,43,44. While these studies focused on perceptual decision-making to explore neural representations for cognitive flexibility or stability, it remains unclear how the brain achieves stable, flexible goal pursuit during sequential decision-making with various contextual changes.

Our study employs model simulations, behavioral data analysis, and model-based fMRI analysis to examine whether and how the brain resolves the behavioral flexibility-stability trade-off during goal-directed learning. We compare humans’ behavioral flexibility and stability with conventional value learning theory, including model-free and model-based reinforcement learning. We then analyze how the neural representations of goals and uncertainty in related brain areas, including the dlPFC, vlPFC, and OFC, facilitate stable and flexible goal pursuit.

Results

Behavioral stability-flexibility dilemma during goal-directed learning

To explore the neural representations involved in pursuing goals within uncertainty-changing environments, we used behavior and fMRI data published in a previous study17. Twenty participants performed a context-dependent two-stage Markov decision task (Fig. 1a, b; Task). Each state in the task was represented by a fractal image, and participants made left or right choices at each state across two stages. After the second choice, an outcome state and a corresponding coin were displayed, with the task’s objective being to maximize the accumulated coin score.

Fig. 1: Flexible and stable human behavior during goal-directed learning.
figure 1

a Task structure as a binary decision tree. \(S_k\) and \(A_k\) denote the state and action at stage k, respectively (k = 1, 2, 3). Each trial begins with the same initial state. b Block conditions that determine the task context. At the start of each trial, the specific goal condition is represented by one of the boxes (red, blue, yellow) alongside a fractal image indicating the state. In the non-specific goal condition, a white box is shown with the fractal image. Participants were informed that state-transition probabilities could change, but specific probabilities were not provided, and no direct cues about the uncertainty condition were given during the experiment. c Predicted impact of uncertainty on action value difference between left and right choices. The left side of the panel shows the value difference calculated for a specific context and state. The right side shows the average value difference across all contexts and states. d Effect of goal and uncertainty on the performance of human participants (n = 20). Points next to each boxplot represent individual human participants. Asterisks denote statistical significance (paired t-test, ****: p < 0.0001). For the box plots, the center lines, box limits, and whiskers represent medians, upper/lower quartiles, and 1.5× interquartile ranges, respectively. All statistical tests were two-sided. See Supplementary Table 1 for full statistical information. e The behavioral measures defined within the specific goal condition. Each action in a trial is assigned a value of 0 or 1 based on the criteria. f Scatter plot of the behavioral measures of human participants and virtual RL agents. For human results, each point represents a single participant. For simulation results, each point represents an agent, and MB and MF agents were simulated with 20,000 sets of random parameters, respectively (Simulation). See Supplementary Table 1 for the statistical details of the Pearson’s correlation test (two-sided). Source data are provided as a Source Data file.

The experiment featured four types of block conditions, corresponding to each combination of two binary context conditions: goal condition and uncertainty condition (Fig. 1b). By systematically varying goal specificity and state-transition uncertainty, this design captures two important dimensions of goal pursuit: the clarity of the goal to pursue and the predictability of the environment. First, the goal condition includes specific goal and non-specific goal conditions. In the specific goal condition, only coins matching the color (red, blue, or yellow) of a box presented at the start of the trial were considered valid rewards, and coins of other colors scored zero. For example, when a red box appears, participants understand that the task for that trial is to reach the state yielding a red coin; the goal is therefore the red-coin state. This setup encourages participants to align their actions with a clear goal, thereby promoting goal-directed behavior. Conceptually, it resembles having a specific goal (e.g., rescuing survivors in a single building). In contrast, the non-specific goal condition served as a control, where a white box was displayed, and coins of any color were recognized as valid rewards. While this condition still involves reward maximization, the absence of a clearly defined goal can lead to more habitual behaviors that are less sensitive to environmental changes. It is comparable to having a broad, general objective (e.g., “help anyone, anywhere”).

The uncertainty condition modulated the distribution of state-action-state transition probabilities. Under the low uncertainty condition, the probabilities of the two possible subsequent states following a choice were assigned as (0.9, 0.1), whereas in the high uncertainty condition, these probabilities were (0.5, 0.5). This difference captures how predictable (or unpredictable) an environment is. In the low uncertainty condition, state transitions follow relatively predictable probabilities, whereas in the high uncertainty condition, transitions are dominated by random variability. Such uncertainty changes influence the efficacy of goal-directed learning, since increased uncertainty reduces the prediction accuracy of state-action-state transitions, a key factor in model-based learning.

We investigated the impact of goal and uncertainty contexts on state-action values and behavior. Using an oracle agent, we calculated the true state-action values for each context using the Bellman equation45. The results showed that in the low uncertainty condition, where the state transition probabilities were biased, the value difference between left and right choices was larger than in the high uncertainty condition (Fig. 1c). From a value-based decision-making perspective, in the high uncertainty condition, participants are less likely to make optimal choices due to a smaller value difference46,47,48. However, by measuring choice optimality and choice consistency for each trial (Behavioral measures), we found that uncertainty did not affect task performance in the specific-goal condition (Fig. 1d). In the non-specific goal condition, high uncertainty led to fewer optimal choices (Fig. 1d, average choice optimality; low unc.: 0.850 (mean) ± 0.035 (s.e.m.); high unc.: 0.619 ± 0.021; low unc. vs high unc.: t(19) = 5.975, p < 0.0001) and more frequent choice changes (Fig. 1d, average choice consistency; low unc.: 0.897 ± 0.023; high unc.: 0.847 ± 0.021; low unc. vs high unc.: t(19) = 5.294, p < 0.0001). Surprisingly, in the specific goal condition, choice optimality and consistency were not significantly affected by high uncertainty (Fig. 1d; low unc. vs high unc. (optimality): t(19) = 0.887, p = 0.384; low unc. vs high unc. (consistency): t(19) = 0.891, p = 0.386), suggesting that humans achieve robust performance when pursuing a specific goal in uncertain conditions. Relatedly, participants’ response times were longer in the specific-goal than in the non-specific-goal condition, and within the specific-goal condition, they increased further under high uncertainty (Supplementary Fig. 1). These prolonged latencies likely reflect the additional deliberation required both to reach the specified goal and to handle unexpected state transitions in an uncertain environment.

This finding led us to focus on the specific goal condition to understand humans’ stably flexible goal pursuit. Optimal performance requires flexible behavior aligned with the goal, unhindered by noisy state transitions. We quantified these attributes using choice versatility and choice consistency (Fig. 1e, Behavioral measures). Each measure, calculated on a trial-by-trial basis, was designed to quantify choice sensitivity to goal changes and choice consistency despite environmental noise. The session-wide average of these trial-level values provided an individual measure of each participant’s behavioral flexibility or stability. In the specific goal condition, participants must select different trajectories to reach the outcome state matching the given goal (red, blue, or yellow). Hence, when the goal of the current trial is different from the previous trial, a different choice at the same state indicates choice versatility = 1, whereas the same choice indicates choice versatility = 0. Conversely, given a goal, participants should consistently make the optimal choice despite noisy state transitions. Therefore, if the goal of the current trial matches the goal of the previous trial, making the same choice at the same state indicates choice consistency = 1, while a different choice indicates choice consistency = 0.

To understand how human participants maintain the behavioral flexibility-stability balance, we compared the relationships between flexibility, stability, and performance of human participants with those of standard value learning models, including model-based (MB) and model-free (MF) reinforcement learning algorithms (Fig. 1f, Simulation). The overall performance of each participant was calculated by averaging trial-wise choice optimality (Behavioral measures), which evaluated the degree to which their actions aligned with those of an oracle agent possessing complete knowledge of reward contingencies and state-transition probabilities (1 = optimal, 0 = non-optimal). Because the MF algorithm learns a reward function through repeated sampling of action-outcome events, whereas the MB algorithm learns the dynamics of action-state-outcome transitions, the behavior of the two models is characteristically stable and flexible, respectively. We found effects of flexibility and stability on performance in MB agents and humans, but not in MF agents (the left and the middle plot of Fig. 1f). For the flexibility–performance comparison, humans showed a strong positive correlation (r = 0.827, t(18) = 6.244, p < 0.0001), MB agents showed a comparable effect (r = 0.799; 95% subsampling CI [0.649, 0.916], 10,000 resamples with subsample size m = 20), whereas MF agents showed no reliable relationship (r = −0.035; CI [−0.475, 0.411]). For the stability–performance comparison, humans again displayed a robust association (r = 0.779, t(18) = 5.269, p < 0.0001), MB agents exhibited an even stronger link (r = 0.963; CI [0.933, 0.986]), while MF agents remained near zero (r = 0.042; CI [−0.399, 0.485]). Furthermore, MF agents suffer from a flexibility-stability trade-off (as indicated by a strong negative correlation in the right plot of Fig. 1f), whereas humans and the MB agents did not. Humans showed a moderate positive flexibility–stability relation (r = 0.541, t(18) = 2.726, p = 0.014), MB agents showed a strong positive coupling (r = 0.829; CI [0.681, 0.930]), and MF agents showed a pronounced negative coupling (r = −0.964; CI [−0.986, −0.929]). Notably, humans exhibited the most flexible and stable behavior.
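As a rough illustration of the agent-side statistics, a subsampling confidence interval of this kind can be obtained by repeatedly drawing human-sample-sized subsets (m = 20) from the pool of 20,000 simulated agents and taking empirical quantiles of the resulting correlations. The sketch below is a minimal reconstruction under these assumptions; the function and variable names are illustrative, not the published implementation.

```python
import numpy as np

def subsample_corr_ci(x, y, m=20, n_resamples=10_000, alpha=0.05, seed=0):
    """Empirical CI for a Pearson correlation via subsampling without replacement.

    x, y: per-agent behavioral measures (e.g., flexibility and performance)
    across the full population of simulated agents.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    rs = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.choice(len(x), size=m, replace=False)  # human-sample-sized subset
        rs[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    return np.quantile(rs, [alpha / 2, 1 - alpha / 2])
```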

Evidence of goal and uncertainty representation in PFC during goal-directed learning

As a first step toward identifying the neural underpinnings of flexible and stable goal pursuit, we examined whether the brain encodes goal and uncertainty information. We used ROI-based multivoxel pattern analysis, focusing on eight brain regions known to be engaged in context-dependent behavior: ventrolateral prefrontal cortex (vlPFC), dorsolateral prefrontal cortex (dlPFC), orbitofrontal cortex (OFC), anterior cingulate cortex (ACC), pre-supplementary motor area (preSMA), primary visual cortex (V1), hippocampus (HPC), and ventral striatum (vStr). The vlPFC is known to be involved in the arbitration of MB/MF reinforcement learning17,18,49,50, whereas the dlPFC is known to represent context information and guide task-switching15,51,52. The OFC is implicated in state-space representation23,24,25. The ACC plays a crucial role in monitoring conflicts and guiding adaptive adjustments in cognitive control53,54. The preSMA is involved in the flexible control of voluntary actions55. The HPC is involved in the formation of cognitive maps56,57,58. Lastly, the ventral striatum plays a key role in computing reward prediction errors59,60,61. The AAL3 atlas62 was used to define each ROI (Supplementary Fig. 2).

For flexible goal pursuit, the brain must effectively encode information about goals and associated states. We focused on the specific goal condition and conducted a decoding analysis on multivoxel patterns to quantify goal-related information. This was assessed by the classification accuracy of three distinct goals (red, blue, and yellow) using linear support vector machines (SVM) (fMRI decoding analyses). All classifications on the fMRI data were performed with leave-one-run-out cross-validation. To prevent potentially biased predictions from class imbalance, we performed 100 rounds of undersampling to balance the classes, using the average accuracy from these iterations as the final result. After this calibration, we confirmed that the decoding accuracy did not significantly exceed the chance level when the class labels were shuffled (Supplementary Fig. 3). We trained and tested separate SVM classifiers for each trial event (fix: fixation, S1: state 1, A1: action 1, S2: state 2, A2: action 2, S3: state 3; Fig. 1a) to obtain decoding accuracy specific to that event. The decoding accuracy was calculated as a single value per participant based on multiple trials. Group-level statistical tests were performed, treating each participant as a random sample.
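To make this pipeline concrete, a minimal sketch of the event-specific decoding procedure is shown below, assuming scikit-learn’s LinearSVC. The array names and the placement of the undersampling step (before the cross-validation loop, with each balanced subset assumed to span all runs) are illustrative assumptions rather than the exact published implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def decode_goal(X, y, runs, n_rounds=100, seed=0):
    """Leave-one-run-out goal decoding with repeated class-balancing undersampling.

    X: (n_trials, n_voxels) ROI patterns at one trial event;
    y: (n_trials,) goal labels; runs: (n_trials,) scanning-run indices.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    n_min = min(np.sum(y == c) for c in classes)
    accs = []
    for _ in range(n_rounds):
        # undersample every class to the size of the rarest class
        keep = np.concatenate([rng.choice(np.where(y == c)[0], n_min, replace=False)
                               for c in classes])
        Xb, yb, rb = X[keep], y[keep], runs[keep]
        fold_acc = []
        for r in np.unique(rb):
            train, test = rb != r, rb == r
            clf = LinearSVC().fit(Xb[train], yb[train])
            fold_acc.append(clf.score(Xb[test], yb[test]))
        accs.append(np.mean(fold_acc))
    return float(np.mean(accs))  # averaged over undersampling rounds and folds
```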

The vlPFC, dlPFC, OFC, ACC, preSMA, and V1 demonstrated the highest goal decoding accuracy during the second stage (Fig. 2a; event-specific decoding accuracies; see Supplementary Table 2 for the statistical details). The average decoding accuracy across all events corroborates that these six regions significantly represented goal information (Fig. 2b; t-test of event-averaged decoding accuracy against the chance level (33.3%); vlPFC: 37.2% ± 0.7%, t(19) = 5.712, p < 0.0001; dlPFC: 38.8% ± 0.9%, t(19) = 5.999, p < 0.0001; OFC: 36.6% ± 0.7%, t(19) = 4.443, p = 0.0003; ACC: 35.4% ± 0.6%, t(19) = 3.459, p = 0.0026; preSMA: 35.4% ± 0.6%, t(19) = 3.701, p = 0.0015; V1: 36.9% ± 0.5%, t(19) = 7.401, p < 0.0001; uncorrected per predefined ROI; Supplementary Table 2). Furthermore, regarding state information, the vlPFC, dlPFC, OFC, ACC, preSMA, V1, and HPC demonstrated significant decodability of both intermediate and outcome states (Supplementary Fig. 4a, b). Notably, decoding accuracy for the intermediate state peaked during stage 2, while outcome state decoding peaked during stage 3.

Fig. 2: Evidence of goal and uncertainty representation.
figure 2

All data are presented as mean ± SEM from n = 20 participants. Asterisks denote statistical significance (paired t-test against the chance level, *: p < 0.05, **: p < 0.01, ***: p < 0.001). All statistical tests were two-sided. See Supplementary Table 2 for full statistical information. a Decoding accuracy of specific goals as a function of trial events. On the x-axis, “fix” denotes fixation, and “S1-S3” and “A1-A2” correspond to the states and actions illustrated in Fig. 1a. The apostrophe (') denotes events in the subsequent trial. The event-specific neural measures are derived from fMRI data scanned in the corresponding time bin. The chance level is \(\frac{1}{3}\), indicated by the dashed line. b Average goal decoding accuracy across the trial events (S1-fix'). c Decoding accuracy of uncertainty as a function of trial events. The chance level is 0.5, as indicated by the dashed line. Red asterisks denote statistical significance of uncertainty decoding in the specific goal condition, while blue asterisks indicate significance in the non-specific goal condition. d Average uncertainty decoding accuracy across the trial events (S1-fix'). Source data are provided as a Source Data file.

For stable and flexible goal pursuit, the neural representation of goals must be separable from uncertainty. This demands uncertainty encoding during goal-directed learning. When measured as the test accuracy of classifying uncertainty conditions using a linear SVM on multivoxel patterns in the specific and non-specific goal conditions, the vlPFC, dlPFC, and OFC significantly represented uncertainty information exclusively within the specific goal condition (Fig. 2d; t-test of event-averaged decoding accuracy against the chance level (50%); vlPFC: 51.8% ± 0.7%, t(19) = 2.549, p = 0.020; dlPFC: 52.1% ± 0.9%, t(19) = 2.412, p = 0.026; OFC: 53.4% ± 0.9%, t(19) = 3.749, p = 0.0014; uncorrected per predefined ROI; Supplementary Table 2). Additionally, our principal component analysis (PCA) showed that the neural dimensionality in vlPFC, dlPFC, OFC, and ACC is higher in the specific goal condition compared to the non-specific goal condition (Supplementary Fig. 5). These results imply that the LPFC and OFC encode uncertainty while engaging in complex neural computations to guide goal-directed behavior.

Factorized embedding of goal and uncertainty in the LPFC

Building on our findings that goal and uncertainty are represented in the LPFC and OFC, we sought to investigate how these two variables are represented in a single neural space to facilitate flexible yet stable goal pursuit. Following previous studies on representational geometry11,41,43,63, we evaluated three hypotheses on mixed representations of goals and uncertainty (Fig. 3a). The types of possible linear separations vary depending on the complexity of the neural embedding structure64,65.

Fig. 3: Shattering analysis for neural goal and uncertainty embeddings.
figure 3

a Hypothetical neural embeddings for goal and uncertainty and corresponding linear separabilities. While there are three specific goals (red, blue, yellow), only two are depicted for simplicity. The full version of the class labeling and classification types for all dichotomies is in Supplementary Fig. 6. b SD for the different types of dichotomies. We trained separate SVM classifiers for each event (S1-fix') and averaged their decoding accuracies to obtain a single SD value. Since SD represents average binary classification accuracy, the chance level is 0.5. Statistically significant SDs were represented as color bars (Supplementary Fig. 7). Data are presented as mean ± SEM from n = 20 participants. Asterisks denote statistical significance (paired t-test, *: p < 0.05, **: p < 0.01, ***: p < 0.001, ****: p < 0.0001). All statistical tests were two-sided. c Correlations between the neural goal SD and the behavioral measures. Each point represents an individual participant. Solid lines show the fitted linear regression, and dotted lines show the 95% confidence bounds of the fit where correlations are statistically significant. d Correlation coefficients between the neural SD and the behavioral measures. Only statistically significant correlations are represented with filled bars (Pearson’s correlation, *: p < 0.05, **: p < 0.01, ***: p < 0.001; two-sided test). For all the exploratory correlation analyses, multiple comparison corrections were applied with a false-discovery rate (Benjamini-Hochberg procedure) for the number of ROIs with q = 0.05. See Supplementary Table 3 for full statistical information. Source data are provided as a Source Data file.

If only one of the two variables is represented, one can linearly separate the neural representations of the classes of that variable, but not of the other variable (compression hypothesis; the first column of Fig. 3a). The single represented variable remains stable and invariant across the changes in the other variable. However, because the represented variable has no information about the other variable, it is impossible to distinguish situations in which that other variable changes. For instance, if the goal is represented without representing uncertainty, the distinct goal information remains accessible but cannot detect or adapt to fluctuations in uncertainty.

Conversely, when both variables are independently represented along their respective coding axes, binary classifications involving the two variables can be linearly separated (factorized mixing hypothesis; the second column of Fig. 3a). In this case, both variables maintain distinct representations, ensuring that changes in one do not alter the embedding structure of the other. As a result, a downstream neural readout that decodes one variable can generalize across variations in the other variable. In such a factorized embedding structure, it is possible to detect uncertainty changes, while goal information remains consistently represented regardless of uncertainty levels.

Lastly, if there exists an interaction between the two variables, where the coding axis of one variable changes contingent on the other, dichotomies involving the nonlinear interaction can also be linearly separated (nonlinear mixing hypothesis; the third column of Fig. 3a). This high-dimensional neural embedding structure allows distinction across a wide range of situations arising from variable combinations. However, such dependency among variables reduces generalizability. As illustrated in the figure, uncertainty changes lead to changes in the goal embedding structure, making the representation of goal information highly sensitive and vulnerable to uncertainty shifts.

To understand the representational geometry, we performed a shattering analysis (Shattering analysis) to identify the types of possible linear separations among all dichotomies. We categorized all dichotomies into four types: goal, uncertainty, linear, and nonlinear (Supplementary Fig. 6). The average test accuracy of dichotomies within each category was defined as the shattering dimensionality (SD) for that category. Fig. 3a illustrates plausible neural embeddings under different combinations of these four types of separability. If the neural embedding follows the compression hypothesis, one of the SDs for goal or uncertainty will be distinctly high. Under the factorized mixing hypothesis, the SDs for goal, uncertainty, and linear will be notably high, whereas the SD for nonlinear will be significantly lower. Conversely, if goal and uncertainty form a nonlinearly mixed embedding, all four types of SD will be substantially high.

To evaluate which dichotomies, averaged into the four SD categories, were linearly separable within each ROI, we applied multivoxel-pattern linear decoding. Aside from incorporating multiple binary label sets, the procedure followed our standard decoding pipeline (fMRI decoding analyses). For each ROI, we trained separate linear SVM classifiers at every task-informative event (S1, A1, S2, A2, S3, fix’) and averaged their decoding accuracies to obtain the SD score. The initial fixation period (fix) was excluded because it lacks task-relevant information for the current trial. However, as the fixation period preceding a new trial retains residual information from the previous trial, neural activity for the current trial was decoded using the fixation epoch of the subsequent trial (fix’).

The vlPFC, dlPFC, and OFC showed a separability profile corresponding to factorized mixing (Fig. 3b). These brain regions showed significant SD for goal, uncertainty, linear, and nonlinear types (Supplementary Fig. 7; vlPFC goal: 0.530 ± 0.005 (t(19) = 5.634, p < 0.0001); vlPFC uncertainty: 0.518 ± 0.007 (t(19) = 2.582, p = 0.0183); vlPFC linear: 0.519 ± 0.003 (t(19) = 5.527, p < 0.0001); vlPFC nonlinear: 0.508 ± 0.002 (t(19) = 4.018, p = 0.000735). dlPFC goal: 0.541 ± 0.007 (t(19) = 6.259, p < 0.0001); dlPFC uncertainty: 0.521 ± 0.009 (t(19) = 2.397, p = 0.0270); dlPFC linear: 0.526 ± 0.004 (t(19) = 6.158, p < 0.0001); dlPFC nonlinear: 0.511 ± 0.002 (t(19) = 4.432, p = 0.000286). OFC goal: 0.527 ± 0.006 (t(19) = 4.663, p = 0.00017); OFC uncertainty: 0.534 ± 0.009 (t(19) = 3.724, p = 0.00144); OFC linear: 0.521 ± 0.004 (t(19) = 4.807, p = 0.000122); OFC nonlinear: 0.507 ± 0.003 (t(19) = 2.904, p = 0.00909); paired t-tests against the chance level (0.5) within each predefined ROI, uncorrected). Notably, the vlPFC and dlPFC showed significantly lower nonlinear SD compared to goal and linear SDs (pairwise comparison of event-averaged SDs between four classification types by paired t-test; see Supplementary Table 3 for the statistical details). Similarly, the OFC showed significantly lower nonlinear SD than goal, uncertainty, and linear SDs. According to our three hypotheses (Fig. 3a), these results demonstrated a factorized embedding of goal and uncertainty. It also suggests a neural mechanism that maintains stable goal representations across varying levels of uncertainty.

The next step was to clarify how factorized neural embeddings contribute to maintaining stable and flexible goal-pursuit behavior. To this end, we conducted correlation analyses quantifying the relationship between different patterns of neural separability and task behavior. The neural metric was restricted to signals that directly inform the choice behavior. Accordingly, outcome-related events occurring after the choice (S3 and fix’) were excluded, and each SD was recalculated as the mean accuracy across the remaining pre-outcome events (S1-A2). Multiple comparison corrections were applied with a false-discovery rate (Benjamini-Hochberg procedure) for the number of ROIs with q = 0.05. Adjusted p-values were reported for all exploratory correlation analyses (Statistical analysis).

We observed significant correlations between each type of neural separability and the behavioral measures, indicating that goal separability in the vlPFC, dlPFC, and OFC is associated with goal-pursuit performance (Fig. 3c; FDR-corrected, q < 0.05; Statistical analysis). Notably, higher goal separability in the LPFC correlates with greater behavioral flexibility, stability, and performance. These results carry two critical implications. First, a neural embedding in the LPFC, distinguishing goals from uncertainty, is associated with goal-dependent behavioral adaptation. Second, a clear goal representation is essential for consistently pursuing desired outcomes across multiple stages, even in noisy environments. On the other hand, uncertainty separability was not significantly related to goal-directed behavior, while linear and nonlinear interaction separabilities showed relatively weaker but generally consistent results with goal separability (Fig. 3d). To summarize, a neural embedding capable of clearly distinguishing goals is important for effective goal pursuit.

Neurally stable goal embedding in LPFC guides stably flexible learning

We further hypothesized that goal embeddings remain unaffected by uncertainty to maintain stable behavior. To test this hypothesis, we employed the neural metric called cross-condition generalization performance (CCGP; Cross-condition generalization performance)41, defined as the generalized accuracy of a linear decoder across different conditions. Specifically, we trained a linear classifier (SVM) to decode a target variable (i.e., the goal) in one context condition (e.g., low uncertainty) and then tested it in a different condition (e.g., high uncertainty). The CCGP score is calculated as the average test accuracy across these conditions. If the target embedding is influenced by context, the decision boundary for decoding would shift, leading to a lower CCGP. Thus, CCGP reflects how stably the neural representation can be decoded by downstream neural readouts across varying context conditions. To assess the robustness of goal representations, we measured the goal CCGP across uncertainty levels. Consistent with the preceding decoding analysis, we trained linear SVM classifiers for each ROI at every task-informative event (S1-fix’) and averaged their decoding accuracies to obtain the final CCGP score.
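A minimal sketch of the CCGP computation is shown below, assuming NumPy arrays and string-valued uncertainty labels; the names (goal_ccgp, X, goal, uncertainty) are illustrative, and the published analysis additionally applied class-balancing undersampling and event-wise averaging.

```python
import numpy as np
from sklearn.svm import LinearSVC

def goal_ccgp(X, goal, uncertainty):
    """Goal CCGP: train a goal decoder in one uncertainty condition,
    test it in the other, and average the two train/test directions."""
    accs = []
    for train_cond, test_cond in [("low", "high"), ("high", "low")]:
        train = uncertainty == train_cond
        test = uncertainty == test_cond
        clf = LinearSVC().fit(X[train], goal[train])
        accs.append(clf.score(X[test], goal[test]))
    return float(np.mean(accs))
```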

Supporting our hypothesis, the CCGP and SD values in each uncertainty condition were comparable (Fig. 4a; pairwise comparison of event-averaged neural measures by paired t-tests within each predefined ROI, uncorrected; see Supplementary Table 4 for the statistical details). Although the vlPFC, dlPFC, and OFC significantly represented uncertainty information (Fig. 2b), goal embeddings in these regions remained robust and unaffected by uncertainty. Similarly, the ACC, preSMA, and V1, which did not significantly represent uncertainty, maintained uncertainty-robust goal embeddings. The HPC and vStr, the regions that did not significantly represent goals or uncertainty, showed CCGP values near the chance level.

Fig. 4: Neural goal robustness and correlations with behavior.
figure 4

a Goal CCGP across different uncertainty conditions and goal SD in different uncertainty conditions. The SD values were calculated similarly to Fig. 3b, with the only difference being the division into low and high uncertainty conditions. Each CCGP or SD value represents the average value across trial events (S1-fix'). Data are presented as mean ± SEM from n = 20 participants. Pairwise comparisons of CCGP, SD in low uncertainty, and SD in high uncertainty across all regions (paired t-test) showed no significant differences. All statistical tests were two-sided. b Correlation coefficients between the goal CCGP and the behavioral measures. Only statistically significant correlations are represented with filled bars (Pearson’s correlation, *: p < 0.05, **: p < 0.01, ***: p < 0.001; two-sided test), corrected for the number of ROIs (Benjamini-Hochberg procedure, q = 0.05). '+' indicates weak statistical significance (significant before multiple comparison correction, +: p < 0.05). See Supplementary Table 4 for full statistical information. For the complete scatter plots, see Supplementary Fig. 8. Source data are provided as a Source Data file.

Furthermore, we found that neural robustness in the vlPFC and dlPFC is crucial for goal pursuit behavior (Fig. 4b; FDR-corrected, q < 0.05). The goal CCGPs across uncertainty in the vlPFC and dlPFC were significantly correlated with behavioral flexibility, stability, and performance. Additionally, higher CCGP in the OFC was associated with greater behavioral stability. As in the SD-behavior correlation analysis, we used the mean CCGP averaged across the pre-outcome events (S1-A2) for this correlation.

Additionally, we quantified whether the neural goal representation remained stable across uncertainty levels by computing the parallelism score41. For each ROI, we averaged the multidimensional BOLD pattern within each class and derived vectors representing the direction from one specific goal representation to another under each uncertainty condition (Supplementary Fig. 9a). The cosine similarity between the goal-encoding vectors obtained under the two uncertainty conditions was then measured. Parallelism scores were averaged across five fMRI runs. As with the other classification-based analyses, class balancing was performed through undersampling.
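A minimal sketch of the parallelism computation, assuming one multivoxel pattern per trial and string-valued goal and uncertainty labels; the run-wise averaging and class balancing described above are omitted here for brevity.

```python
import numpy as np
from itertools import combinations

def parallelism_score(X, goal, uncertainty, goals=("red", "blue", "yellow")):
    """Average cosine similarity between goal-coding directions measured
    under the low and high uncertainty conditions."""
    def centroid(g, u):
        # mean BOLD pattern for one (goal, uncertainty) class
        return X[(goal == g) & (uncertainty == u)].mean(axis=0)

    sims = []
    for g1, g2 in combinations(goals, 2):
        v_low = centroid(g1, "low") - centroid(g2, "low")
        v_high = centroid(g1, "high") - centroid(g2, "high")
        sims.append(v_low @ v_high
                    / (np.linalg.norm(v_low) * np.linalg.norm(v_high)))
    return float(np.mean(sims))
```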

Consequently, we found that the vlPFC, dlPFC, OFC, and preSMA exhibited significantly positive parallelism scores (Supplementary Fig. 9b), indicating minimal reorientation of neural goal embeddings in response to uncertainty. This suggests that the underlying representational geometry remained largely parallel. Combined with the CCGP results demonstrating robust trial-by-trial decodability based on decision boundaries, these findings confirmed that goal embeddings remain relatively stable under changing uncertainty.

Discussion

To investigate how humans achieve flexible and stable goal pursuit, we studied human goal-directed learning in uncertainty-changing environments using a two-stage Markov decision task. Our findings revealed that human behavior during goal-directed learning remains robust to environmental uncertainty. Furthermore, higher behavioral flexibility in response to goal changes correlates with more stable behavior under uncertainty. We measured the neural representational separability and robustness of brain regions, which indicated that goal and uncertainty representations form factorized embeddings in the vlPFC, dlPFC, and OFC. Neural goal separability and robustness in these regions are associated with stably flexible goal-directed behavior in humans.

Our study presents a significant conceptual advance by offering a representational solution to the stability-flexibility dilemma in goal-directed learning. While prior works have primarily focused on identifying ‘what’ variables are encoded in the PFC (e.g., value, uncertainty), our work elucidates ‘how’ the brain navigates this trade-off through the specific geometrical structure of its neural representations. Earlier studies in decision-making have attempted to probe behavioral flexibility in response to changes in goals or context29,60,66,67,68,69, while placing less emphasis on the concurrent challenge of maintaining stability against environmental noise. Additionally, studies on uncertainty representation have often centered on perceptual judgments70,71,72,73,74,75 rather than on the complex dynamics of sequential action toward delayed goals. Our findings bridge this gap by showing, through the lens of representational geometry, that the PFC employs a joint, factorized coding scheme for goals and uncertainty. This neural architecture offers a mechanistic account of how the brain reconciles competing cognitive demands, enabling robust and generalizable goal pursuit while remaining attuned to environmental statistics, thus supporting both adaptive behavior and resilience to noise.

Across the eight ROIs, our MVPA results reveal complementary contributions to stable yet flexible goal pursuit. V1 encodes task-relevant sensory features, whereas vlPFC, dlPFC, and OFC jointly represent factorized goal and uncertainty information; the strength of these codes correlates with individual differences in behavioral flexibility under uncertainty. By contrast, ACC and preSMA mainly encode goal information, and the magnitude of these signals tracks behavioral performance, suggesting that medial frontal areas relay resolved goals to downstream control systems once uncertainty has been represented in the lateral PFC and OFC circuits55,76. Uncertainty was not decodable in ACC, indicating that our noisy state transitions did not evoke the internal value conflict typically associated with this region53,77, which is consistent with our findings that factorized goal and uncertainty codes help resolve the stability-flexibility dilemma. Hippocampus and ventral striatum showed no reliable pattern-level goal or uncertainty coding within the sensitivity limits of our analysis. Together, these findings support a model in which lateral PFC and OFC furnish a flexible state representation, while medial PFC contributes goal-driven control, collectively balancing behavioral flexibility and stability.

In our study, we confirmed that humans can maintain successful goal-pursuit behavior even in uncertain environments. Previous studies have reported that an inaccurate prediction by the model-based learning system16 or low outcome controllability78,79 can reduce goal-directed behavior. High state-transition uncertainty, for example, increases state prediction error, making it more challenging to achieve desirable outcomes. However, our two-stage Markov decision task, which features multiple possible trajectories leading to the goal state across 16 different branches, allows greater flexibility in reaching the desired outcome. This design contrasts with simpler decision-making tasks, allowing the observation of robust goal-directed behavior under uncertainty. Our findings suggest that this robustness is supported by the factorized representation of goal and uncertainty in the prefrontal cortex, particularly in the vlPFC, dlPFC, and OFC.

The finding in our study that goal representation remains robust to uncertainty is consistent with previous findings suggesting that having a factorized representation facilitates generalization across various contexts37,41,63,80,81,82. Moreover, the perspective that a similar orthogonal representation structure enables humans to avoid catastrophic forgetting and learn various tasks7,83,84 aligns with the results of our study, indicating that goal embedding independent of the uncertainty allows for stable goal pursuit in varying levels of environmental noise.

While many studies have explored how PFC state representations change depending on the context3,7,8, research on strategic processes involving multiple contexts and stages has been lacking. We aimed to fill this gap, and our findings demonstrated that uncertainty, which can influence behavioral strategy selection13,16,17,19, and goal, which affects action selection, are independently represented in the LPFC. Thus, the LPFC is capable of guiding behavioral strategies while setting and pursuing goals independently of the strategy. This suggests that the LPFC possesses the ability to establish a stable hierarchy of strategy selection and action selection.

In our task, we observed that when specific goals were not provided, the decrease in value difference led to a decrease in the optimality and consistency of behavior (Fig. 1d), consistent with previous findings in value-based decision-making research46,47,48. Interestingly, during goal-directed learning, we observed that participants maintained optimal choices despite the influence of uncertainty on action value. According to our neural data analysis, this stable goal-pursuit behavior is supported by the neural goal robustness to uncertainty in the LPFC and OFC. Thus, our study contributes to understanding the role of the PFC in guiding multistage decision-making involving multiple contexts at the neural representational level, which was previously difficult to explain through value learning alone85,86,87,88.

We confirmed that participants exhibiting flexible behavior also show high behavioral stability. This contrasts with previous studies widely discussing the stability-flexibility trade-off30,31,32,33,34,35. However, recent task-switch-related research suggests that these two characteristics can be controlled by independent mechanisms and are not necessarily conflicting89,90. The neural evidence we present could serve as a key to explaining the brain’s ability to be flexible yet stable, along with computational modeling research.

Generally, the OFC is known to compress external information and support judgments on task-relevant state space, confidence, emotion, and value estimation21,22,23,24,26,91. Our results showing that the OFC is most sensitive to uncertainty are consistent with the existing literature. Furthermore, treating specific goals as explicit task-specific information, the OFC appears to extract latent information about environmental uncertainty while minimizing interference between the two. Thus, by factorizing abstract latent information and task-specific information, the OFC facilitates the generalization of important information as tasks change.

On the other hand, the LPFC extracts important context information related to meta-control, task switching, and planning17,18,51,92,93, guiding strategy selection and action selection. The observation that the LPFC is most sensitive to goal information is consistent with the existing literature. Furthermore, the ability of the LPFC to independently represent task-relevant goals in uncertain environments enables consistent pursuit of task goals across various environments. That is, the factorized representation in the LPFC enables the transfer of goals across environments.

A promising direction for future studies is to develop neural network models that integrate factorized representations of goal and uncertainty, extending approaches such as β-VAE from sensory perception94,95 to context-dependent reinforcement learning. Testing whether this architecture enables artificial agents to balance flexibility and stability, analogous to human performance, would clarify the representational mechanisms underlying adaptive, goal-directed behavior. Further investigation should probe how goal and uncertainty signals evolve across extended timescales or in more complex tasks, assessing whether factorized geometry in PFC persists in diverse scenarios.

In conclusion, our findings suggest a representational solution for achieving flexible yet stable goal pursuit under uncertainty. By maintaining separate codes for goal and uncertainty, the PFC preserves goal-directed action plans while selectively adjusting behavior in response to environmental variability. This perspective extends conventional computational accounts by emphasizing how the geometry of neural representations can link specialized computations with more generalizable cognitive control. This representational account provides a mechanistic framework for understanding how the brain maintains goals across varying contexts, offering testable hypotheses for future computational and empirical research on robust context-dependent learning.

Methods

Participants

We used the same participant dataset as the previous study17. Twenty-two subjects (all right-handed, six females, mean age: 28 years, age range: 19 to 40 years) participated in the experiment, and none of them had a history of neurological or psychiatric diseases. All subjects gave informed consent, and the study was approved by the Institutional Review Board of the California Institute of Technology.

One subject was excluded from the analysis since the subject consistently chose only one of the two choices in stage 1 and never experienced one of the four goal states. Another subject was excluded from the analysis because of the exceptionally low behavioral performance; the average choice optimality of that subject was less than 0.5 in stage 2, which translated into worse performance than random choice.

Task

We used behavior and fMRI data published in a previous study17. Twenty participants performed a sequential two-choice Markov decision task. In each trial, they began in a common start state and made two sequential choices (by pressing left or right within 4 s) to obtain a monetary reward in the form of a colored coin (red, yellow, or blue) at the end of the sequence. If participants did not respond within 4 s, the computer selected a random choice for them, and that trial was designated as a penalizing trial. The reward values (USD 0.40, 0.20, and 0.10) were randomly assigned to each coin color for each subject at the beginning of the experiment.

Before the main experiment, each participant completed a pretraining session consisting of 100 trials in which the state-transition probability was fixed at (0.5, 0.5). During these trials, a white “collection box” was presented, indicating that any colored coin would yield its assigned monetary reward. This pretraining was intended to allow participants ample opportunity (based on pilot data indicating that 80 trials are sufficient) to become familiar with both the sequential choice structure and the general reward contingency of a two-choice Markov task.

Following pretraining, the experiment proceeded in five separate scanning sessions of approximately 80 trials each, for a total of 400 trials in the main task. Each scanning session featured two conditions that manipulated the goal or collection box presented at the start of each trial. In the specific goal condition, the collection box was rendered in a single color (red, yellow, blue, or gray), indicating that only one particular coin color would be valuable on that trial (i.e., yield money if obtained). In the non-specific goal condition, the collection box was white, indicating that any of the three colored end states would provide its associated monetary outcome.

Throughout the main task, participants were not informed of the numeric state-transition probabilities, only that these contingencies could change. Specifically, the transitions alternated between (0.9, 0.1) and (0.5, 0.5) across short blocks to induce shifts in task predictability. Each block contained three to five trials under the (0.9, 0.1) condition and five to seven trials under the (0.5, 0.5) condition. This design ensured that participants experienced periods of relatively deterministic transitions and more uncertain transitions, encouraging the engagement of both model-based and model-free learning strategies. The time between states (and between trials) was sampled from a uniform distribution (1–4 s), and the reward outcome was displayed for 2 s at the end of each trial.

Participants were instructed that they would receive the cumulative monetary earnings from the task and that they should learn, through experience, which choices were more likely to lead to each colored coin. They were also aware that goal states and transition probabilities could vary. No further explicit information regarding probabilities or block lengths was provided, ensuring that they relied on ongoing experience to guide their choices.

Behavioral measures

We employed three behavioral measures to characterize goal-directed learning behavior. First, to assess behavioral flexibility between goals, we used choice versatility, defined as the switch in choice upon a goal change. In trials where the goal changed, the choice versatility was assigned a value of one if the current choice differed from the previous one at the same state, and zero if the choice remained consistent. Second, to evaluate behavioral stability within a goal, we used the choice consistency measure. For trials with the same goal, the choice consistency was one if the current choice was identical to the previous one at the same state, and zero if the choice changed. Lastly, choice optimality was used to assess the behavioral performance of goal-directed learning. For each trial, the choice optimality was assigned a value of one if the agent made an optimal value-based decision, and zero if the choice was not optimal. If two choices had identical action values, resulting in the same expected reward regardless of the choice, those trials were excluded from the analysis and choice optimality was not calculated for them.
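One plausible reading of these definitions, sketched below, compares each choice with the most recent choice made at the same state; the event format (a flattened, chronological list of per-state choice records) and the function name are illustrative assumptions.

```python
def trialwise_measures(events):
    """Choice versatility and consistency from a sequence of state-choice events.

    `events` is a chronological list of dicts with keys 'goal', 'state', and
    'choice'; each trial contributes one event per visited state.
    """
    last_choice = {}      # most recent choice made at each state
    last_goal = {}        # goal that was active at that visit
    versatility, consistency = [], []
    for e in events:
        s = e["state"]
        if s in last_choice:
            if e["goal"] != last_goal[s]:    # goal changed since the last visit
                versatility.append(int(e["choice"] != last_choice[s]))
            else:                            # same goal as the last visit
                consistency.append(int(e["choice"] == last_choice[s]))
        last_choice[s], last_goal[s] = e["choice"], e["goal"]
    return versatility, consistency
```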

Simulation

To simulate the learning processes, we generated virtual episodes using both the MB and MF learning agents. The MB agent employed both FORWARD learning96 and BACKWARD planning17. The FORWARD learning mechanism enables the agent to learn the model of the environment by computing the state prediction error to update state-action-state transition probabilities and corresponding state-action values. BACKWARD planning allows instantaneous updates to the state-action value signal in response to changes in the goal, which defines the rewards of the outcome states. Whenever a goal is given, the model-based agent calculates the action value (\(Q_{\mathrm{MB}}\)) using BACKWARD planning as follows. In doing so, it uses the estimated state-transition matrix \(T(s,a,s')\) for the given environment to compute the action value:

$$r(s)=\begin{cases}R, & \text{for a goal state},\\ 0, & \text{otherwise}.\end{cases}$$
$$\begin{array}{l}\textbf{for }i=3,2\\ \quad \textbf{for }s\in S_{i-1}\\ \qquad Q_{\mathrm{MB}}(s,a)={\sum }_{s'}T(s,a,s')\left[r(s')+\gamma \max_{a'}Q_{\mathrm{MB}}(s',a')\right],\quad \forall \,a\\ \quad \textbf{end for}\\ \textbf{end for}\end{array}$$

Here, R is the reward value corresponding to the goal state, and \(S_i\) refers to the set of states in the i-th stage. \(s\) and \(s'\) denote the current and next states, and \(a\) and \(a'\) the actions taken in them, respectively. Since γ is the temporal discount factor and, in our task, the actual reward is only given at the final stage, we set γ = 1.

In addition, at each state transition, it calculates a state prediction error (SPE) to update \(T(s,a,s')\):

$$\begin{aligned}\delta_{\mathrm{SPE}} &= 1-T(s,a,s'),\\ \Delta T(s,a,s') &= \eta \,\delta_{\mathrm{SPE}},\\ Q_{\mathrm{MB}}(s,a) &= {\sum }_{s'}T(s,a,s')\left[r(s')+\gamma \max_{a'}Q_{\mathrm{MB}}(s',a')\right],\end{aligned}$$

where η is the learning rate of the state-transition probability estimation.
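Putting the FORWARD and BACKWARD components together, a minimal sketch of the MB agent might look as follows. The class layout, integer state indexing, `stages` mapping (stage index k to the list of states in \(S_k\)), and the row renormalization of T are illustrative assumptions rather than the exact published implementation.

```python
import numpy as np

class MBAgent:
    """Minimal sketch of the MB agent: FORWARD learning of T plus
    BACKWARD planning of Q_MB over the two-stage decision tree."""

    def __init__(self, n_states, n_actions, eta):
        self.T = np.full((n_states, n_actions, n_states), 1.0 / n_states)
        self.eta = eta  # learning rate of the transition estimate

    def forward_update(self, s, a, s_next):
        """SPE-driven update of T(s, a, s'); renormalization is an assumption."""
        spe = 1.0 - self.T[s, a, s_next]
        self.T[s, a, s_next] += self.eta * spe
        self.T[s, a] /= self.T[s, a].sum()

    def backward_plan(self, stages, reward, gamma=1.0):
        """Backward induction: stages[k] lists the states in S_k, and
        reward[s] = R for the current goal state (0 elsewhere)."""
        n_states, n_actions, _ = self.T.shape
        Q = np.zeros((n_states, n_actions))
        # v(s') = r(s') + gamma * max_a' Q(s', a'); at the leaves, v = r
        v = np.array([reward.get(s, 0.0) for s in range(n_states)])
        for i in (3, 2):                      # from the last stage backward
            for s in stages[i - 1]:
                for a in range(n_actions):
                    Q[s, a] = self.T[s, a] @ v
                v[s] = reward.get(s, 0.0) + gamma * Q[s].max()
        return Q
```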

The MF learning agent was implemented using SARSA, which utilized conventional temporal-difference (TD) updates to compute the reward prediction error for state-action value (\(Q_{\mathrm{MF}}\)) updates45:

$$\begin{aligned}\delta_{\mathrm{RPE}} &= r(s')+\gamma \,Q_{\mathrm{MF}}(s',a')-Q_{\mathrm{MF}}(s,a),\\ \Delta Q_{\mathrm{MF}}(s,a) &= \alpha \,\delta_{\mathrm{RPE}},\end{aligned}$$

where α is the learning rate of \(Q_{\mathrm{MF}}\).
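In code, this TD update is a one-liner; the sketch below assumes Q is a NumPy array of shape (n_states, n_actions), with the function name being illustrative.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma=1.0):
    """One SARSA temporal-difference update of the MF action values.

    Q is updated in place; the reward prediction error is returned.
    """
    rpe = r + gamma * Q[s_next, a_next] - Q[s, a]
    Q[s, a] += alpha * rpe
    return rpe
```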

Both the MB and MF models then compute the choice probabilities from action values using the softmax function, and their stochastic choices constitute the simulation behavior:

$$P(a\mid s)=\frac{\exp \left(\tau Q(s,a)\right)}{{\sum }_{a'}\exp \left(\tau Q(s,a')\right)}.$$

Here, τ serves as the inverse temperature controlling how strongly the model exploits value differences.
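A minimal sketch of this choice rule is given below; the max-shift is a standard numerical-stability trick, not part of the model.

```python
import numpy as np

def softmax_policy(q_row, tau):
    """Choice probabilities for one state from its action values."""
    z = tau * (q_row - np.max(q_row))   # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Example: sample a left/right choice from the values at state s.
# rng = np.random.default_rng()
# a = rng.choice(len(Q[s]), p=softmax_policy(Q[s], tau))
```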

To maintain consistency with human behavioral data, we utilized the total number of trials, block condition sequences, and specific goal sequences directly from the data of 20 human participants. The RL agents made their own choices, leading to state transitions that differed from those in the human data. For each human experimental sequence, we performed simulations of both MB and MF agents using 1000 different random parameter sets, resulting in a total of 20,000 samples for analysis. Both the MB and MF models include two free parameters: a learning rate and a softmax inverse temperature. Specifically, the MB model uses the SPE learning rate η, whereas the MF model uses the RPE learning rate α.

fMRI data collection and pre-processing

We used the fMRI dataset provided by the previous study17. MRI images were obtained from the Caltech Brain Imaging Center, which uses a 3T Siemens (Erlangen) Trio scanner with a 32-channel radiofrequency coil. Structural images were collected using a standard MPRAGE pulse sequence (repetition time (TR): 1,500 ms, echo time (TE): 2.63 ms, flip angle: 10°, voxel size: 1 mm × 1 mm × 1 mm). For the functional images, 45 slices were collected at an angle of 30° from the anterior commissure-posterior commissure axis using a one-shot echo-planar imaging pulse sequence (TR: 2,800 ms, TE: 30 ms, flip angle: 80°, field of view: 100 mm, voxel size: 3 mm × 3 mm × 3 mm).

fMRI data were preprocessed using the SPM8 software package. Preprocessing steps were conducted for each participant individually. Slice-timing correction was applied to adjust for acquisition time differences across slices within each image, using the first slice as the reference. To correct for participant motion, realignment was performed with the mean of the images as the reference. Each participant’s structural image was coregistered to the mean functional realigned image and normalized to the Montreal Neurological Institute (MNI) 152 template. The functional images were subsequently spatially transformed based on these normalization parameters, aligning them to the MNI152 template brain to account for anatomical variability across participants.

For additional preprocessing steps required for multivoxel pattern analysis (MVPA), we utilized the Princeton MVPA toolbox (http://code.google.com/p/princeton-mvpa-toolbox) and custom code. Within each scanning session, the fMRI data were detrended, and the BOLD time series of each voxel was z-scored. The resulting multi-voxel time series were then used as trial-by-trial brain activity patterns.

We performed ROI analysis based on eight brain regions: the ventrolateral prefrontal cortex (vlPFC), dorsolateral prefrontal cortex (dlPFC), orbitofrontal cortex (OFC), anterior cingulate cortex (ACC), pre-supplementary motor area (preSMA), primary visual cortex (V1), hippocampus (HPC), and ventral striatum (vStr) (Supplementary Fig. 2). All fMRI data used in the analyses were extracted from brain regions defined by the automated anatomical labeling (AAL3) atlas62, except for the preSMA. The preSMA was defined as the preSMA region from the JuBrain Anatomy toolbox97 (the SPM Anatomy Toolbox; https://www.fz-juelich.de/en/inm/inm-7/resources/tools/jubrain-anatomy-toolbox). The vlPFC was defined as the triangular part of the inferior frontal gyrus17. The dlPFC was defined as the middle frontal gyrus. The OFC was defined as bilateral inferior, middle, and superior orbital gyri and bilateral rectal gyri23. The ACC was defined as the pregenual and supracallosal anterior cingulate cortex. The V1 was defined as calcarine fissures and the surrounding cortex. The hippocampus and ventral striatum were defined as the AAL3 ROIs with the same name, respectively. In our preliminary analyses, we used under-sampling to match the number of voxels between ROIs and confirmed that differences in voxel number do not significantly affect our results. All results were bilaterally averaged for each region since there were no significant differences between hemispheres.

fMRI decoding analyses

For all our analyses (simple decoding, shattering, and CCGP), we trained linear SVMs on multivoxel patterns from a participant’s ROI to separate task-variable classes (simple decoding analysis; Fig. 2) or particular dichotomies (shattering analysis, Fig. 3; CCGP analysis, Fig. 4). We treated the pre-processed voxel-wise BOLD time courses within each ROI as trial-by-trial neural activity patterns. To avoid session-related dependencies between the training and test data, we used leave-one-session-out cross-validation for all analyses. For each participant, the BOLD patterns from one of the five fMRI scanning sessions served as the test set, whereas the data from the remaining sessions formed the training set used to fit the classifier. All reported classification results correspond to the mean test accuracy across the five cross-validation folds. Additionally, effects of label imbalance were ruled out by undersampling the larger class. To minimize random effects of the undersampling, we repeated the sampling 100 times with different random seeds and averaged the test accuracies across all 100 repetitions for the main analysis.
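The full cross-validation scheme can be summarized in the following sketch (illustrative Python using scikit-learn's LinearSVC; array names and shapes are our assumptions):

```python
import numpy as np
from sklearn.svm import LinearSVC

def undersample(idx, y, rng):
    """Balance classes by undersampling the larger label among trials idx."""
    classes, counts = np.unique(y[idx], return_counts=True)
    n_min = counts.min()
    return np.concatenate(
        [rng.choice(idx[y[idx] == c], n_min, replace=False) for c in classes])

def loso_accuracy(X, y, sessions, n_repeats=100, seed=0):
    """X: (n_trials, n_voxels) patterns; y: binary labels; sessions: 1..5."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):                    # average out the random undersampling
        fold_accs = []
        for s in np.unique(sessions):             # leave-one-session-out folds
            train = np.where(sessions != s)[0]
            test = np.where(sessions == s)[0]
            train = undersample(train, y, rng)
            clf = LinearSVC().fit(X[train], y[train])
            fold_accs.append(clf.score(X[test], y[test]))
        accs.append(np.mean(fold_accs))           # mean over the five folds
    return np.mean(accs)                          # mean over the 100 repetitions
```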

To measure trial-event-specific neural responses, we used fMRI data scanned in the corresponding time bin. We recorded the timings of the following trial events: Fixation 1, Stimulus 1 (representing the first state S1), Choice 1 (A1), Fixation 2, Stimulus 2 (S2), Choice 2 (A2), Fixation 3, and Stimulus 3 (S3). The timings of these eight events were recorded for each trial. To account for the sluggish hemodynamic response, we used the first fMRI volume recorded after the occurrence of a specific event as the neural response elicited by that event. However, Choices 1 and 2 were always followed immediately by fixation cues (Fixations 2 and 3, respectively), so each post-choice fixation cue always preceded the acquisition of the choice-specific response volume. We therefore labeled the volumes scanned after Fixations 2 and 3 as choice-specific activity (A1 and A2, respectively).
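Under this convention, an event maps to the first volume acquired after its onset, as in the following sketch (illustrative; the TR is taken from the acquisition parameters above):

```python
import math

TR = 2.8  # seconds per functional volume

def first_volume_after(event_onset_s):
    """0-based index of the first volume whose acquisition starts at or
    after the event onset (assuming volume k starts at k * TR)."""
    return math.ceil(event_onset_s / TR)
```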

We performed decoding analyses on voxel-level multivariate patterns from each participant’s ROIs. Decoding accuracy was calculated as a single value per participant across multiple trials. Group-level statistical tests were then performed, treating each participant as a random sample. No multiple-comparisons correction was applied because we tested each pre-defined ROI’s results individually and did not conduct an exploratory search for a specific effect across multiple tests.

The association between a neural measure and a behavioral measure was assessed using Pearson’s correlation coefficient across participants. These correlation analyses were exploratory. We therefore performed them across all eight ROIs and corrected for multiple comparisons using the Benjamini-Hochberg procedure (FDR, q = 0.05). All correlation results (Figs. 3c, 4b) reflect this correction, and the adjusted p-values are reported.
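A minimal sketch of the Benjamini-Hochberg correction across the eight ROI-wise p-values follows (illustrative; equivalent functionality is available, e.g., in statsmodels' multipletests with method='fdr_bh'):

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (q-values) for a vector of p-values."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)            # p_(i) * m / i
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```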

Shattering analysis

To investigate the representational geometry of goal and uncertainty in multi-voxel patterns of brain regions, we computed the shattering dimensionality (SD)11,41 by averaging the test accuracies of all linear support vector machines (SVMs) trained to dichotomize task variables. Accordingly, the SD quantifies the separability of the neural embeddings associated with each class of a task variable. The total number of dichotomies, equivalent to the number of possible binary labelings, is determined by the number of task-variable classes: with C classes there are $2^C$ binary labelings. The number of dichotomies that actually need to be tested reduces to $2^{C-1} - 1$ after excluding the two trivial labelings (all positive or all negative) and removing the duplicates that arise from the symmetry of binary labeling. Since the analysis is based on binary classifiers, the chance level is 0.5.
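The dichotomy count can be made concrete with a short sketch (illustrative Python; the class labels are the goal-uncertainty combinations described in the next paragraph):

```python
from itertools import combinations

def dichotomies(classes):
    """Yield one side of each of the 2^(C-1) - 1 unique, non-trivial
    binary labelings of the given classes."""
    C = len(classes)
    for size in range(1, C // 2 + 1):
        for side in combinations(classes, size):
            # for even splits, keep only one member of each mirrored pair
            if 2 * size == C and classes[0] not in side:
                continue
            yield set(side)

labels = ["red-low", "blue-low", "yellow-low",
          "red-high", "blue-high", "yellow-high"]
assert sum(1 for _ in dichotomies(labels)) == 31  # 2^5 - 1 dichotomies for C = 6
```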

To investigate the mixed embedding structure of specific goals and uncertainty, we performed a shattering analysis on goal-uncertainty combined classes (red-low, blue-low, yellow-low, red-high, blue-high, yellow-high). We categorized all dichotomies based on six classes into four types of classification: goal, uncertainty, linear, and nonlinear (Supplementary Fig. 6). The goal type included three dichotomies that separated each goal class from the others (e.g., red-low & red-high vs. the other classes), and the goal SD was defined as the average test accuracy of these three dichotomies (Fig. 3b). Similarly, the uncertainty type included one dichotomy separating the low vs. high uncertainty conditions (3 classes vs. 3 classes), and its test accuracy defined the uncertainty SD.

To determine linear and nonlinear type dichotomies, we assessed linear separability in a random dataset with a linearly mixed representation. Neural activity due to goal and uncertainty was expressed as:

$$\mathbf{y} = W_g\,\mathbf{g} + W_u\,u + \mathbf{b}$$

where $\mathbf{y}$ is the N-dimensional neural response, $\mathbf{g}$ is a 3 × 1 one-hot vector representing one of the three specific goals, $u$ is a binary scalar variable representing uncertainty, $W_g$ and $W_u$ are the linear weight parameters for the two variables, and $\mathbf{b}$ is the N-dimensional bias parameter independent of goal and uncertainty. Using N = 3, we generated 1,000 random parameter sets and assessed the linear separability of every dichotomy of the six classes; MATLAB’s ‘perceptron’ function was used for linear classification. The nine dichotomies that were not linearly separable under any random parameter set were categorized as the nonlinear type. Dichotomies not assigned to the goal, uncertainty, or nonlinear types were categorized as the linear type.
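The separability test can be sketched as follows (illustrative Python; the study used MATLAB's 'perceptron', for which scikit-learn's Perceptron serves as a stand-in here, and the finite-iteration check below is a heuristic for the non-separable case):

```python
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
N = 3                              # dimensionality of the simulated response
goals = np.eye(3)                  # one-hot goal vectors g
u_vals = np.array([0.0, 1.0])      # binary uncertainty u

# one random parameter set: y = Wg @ g + Wu * u + b
Wg = rng.normal(size=(N, 3))
Wu = rng.normal(size=N)
b = rng.normal(size=N)
Y = np.array([Wg @ g + Wu * u + b for g in goals for u in u_vals])  # 6 class points

def linearly_separable(Y, pos_idx):
    """Test whether the dichotomy pos_idx vs. the rest is linearly separable."""
    labels = np.isin(np.arange(len(Y)), pos_idx).astype(int)
    clf = Perceptron(max_iter=1000, tol=None).fit(Y, labels)
    return clf.score(Y, labels) == 1.0  # a perfect training fit implies separability

# An XOR-like dichotomy (e.g., goal-1-low & goal-2-high vs. the rest) is a
# candidate nonlinear dichotomy under this purely linear mixing model.
print(linearly_separable(Y, pos_idx=[0, 3]))
```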

Cross-condition generalization performance

To examine how the context condition (uncertainty) influences the neural embedding of the goal, we quantified cross-condition generalization performance (CCGP), defined as the generalization accuracy of a linear decoder tested across contexts. Because the uncertainty condition is binary, two cross-decoding directions arise: (i) training on low-uncertainty trials and testing on high-uncertainty trials, and (ii) the reverse. The CCGP is defined as the mean test accuracy obtained from these two directions.

A high CCGP indicates that the decoded variable (specific goals) is disentangled from the contextual condition (uncertainty level). If the goal representation is modulated by uncertainty, the CCGP will be lower than the within-condition decoding accuracy. To quantify this performance decrease by comparing the CCGP against the SD, we made the cross-uncertainty goal decoding analysis consistent with the shattering analysis. Specifically, we implemented three linear dichotomies, each discriminating one specific goal from the other two (red vs. blue & yellow; blue vs. red & yellow; yellow vs. red & blue), and averaged their test accuracies.

Consistent with all other classification analyses, we adopted a leave-one-session-out validation scheme: each fMRI scanning session served as the test set, while the remaining sessions constituted the training set. For CCGP, this procedure was repeated in both cross-condition directions (e.g., training on low-uncertainty data from the remaining sessions and testing on high-uncertainty data from the held-out session, and vice versa). The CCGP value is the mean of the two resulting test accuracies.

All other analysis settings matched those used in the shattering analysis.
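A single CCGP fold can thus be sketched as follows (illustrative Python; goal_y holds the binary labels of one goal dichotomy, and unc codes the uncertainty condition):

```python
import numpy as np
from sklearn.svm import LinearSVC

def ccgp_fold(X, goal_y, unc, sessions, held_out):
    """X: (n_trials, n_voxels); goal_y: one goal dichotomy's binary labels;
    unc: 0 = low / 1 = high uncertainty; sessions: session index per trial."""
    accs = []
    for train_ctx, test_ctx in [(0, 1), (1, 0)]:   # both cross-decoding directions
        train = (sessions != held_out) & (unc == train_ctx)
        test = (sessions == held_out) & (unc == test_ctx)
        clf = LinearSVC().fit(X[train], goal_y[train])
        accs.append(clf.score(X[test], goal_y[test]))
    return np.mean(accs)                           # mean over the two directions
```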

Statistical analysis

Unless otherwise stated, hypothesis-driven tests were evaluated at α = 0.05 (two-tailed). Because each ROI was selected a priori on theoretical grounds, statistical tests were performed independently for every ROI without a family-wise correction across ROIs (Figs. 2, 3b, 4a)73,98,99. Where a single test involved several task events or SD categories within the same ROI, these comparisons were likewise planned and reported without additional correction.

In contrast, the correlations between neural metrics and behavior (Figs. 3c, d, 4b) were exploratory. Here we corrected across the eight ROIs using the Benjamini-Hochberg false-discovery-rate procedure (q = 0.05) and report the adjusted q-values. Exact p- and q-values are provided in Supplementary Tables 2–4.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.