Introduction

Bipolar disorder (BD) is a chronic affective condition characterised by episodes of elation, depression, and mixed states, interspersed with periods of clinical remission [1, 2]. Alterations in reward processing and impaired decision-making performance have been associated with the condition [3, 4], pointing to disrupted functional connectivity between the prefrontal cortex (PFC) and the mesolimbic reward system [5, 6]. Yet findings across studies are variable, even when considering euthymic periods alone [5,6,7]. Some BD studies report heightened sensitivity to negative feedback together with reduced learning from rewards, whereas others report the reverse pattern [8,9,10,11,12]. Recently, Ossola et al. [13] found that in euthymic BD, attenuated belief updating from positive feedback predicts relapse, highlighting the importance of investigating dynamic belief updating during euthymia.

Influential proposals advocate for the application of computational models in BD research to understand fluctuations in mood and reward processing [14,15,16]. Short-lived emotional changes in response to rewards can accumulate to generate longer-lasting mood states, which further bias the processing of outcomes, favouring computations congruent with the valence of the current mood [17,18,19]. In BD, new frameworks, building upon previous neurocomputational work on mood instability [20], suggest that altered affective reactivity to reward and punishment may elevate learning rates, even during euthymia, predisposing individuals to form stronger expectations about rewards or punishments. Recent empirical work supports this, revealing a tendency in BD for reward perception to be biased by fluctuations in the momentum of recent reward prediction errors [21].

An increased learning rate could also reflect a heightened anticipation of environmental changes in BD [14]. Indeed, seminal modelling studies indicate that agents learn faster when they anticipate more frequent transitions in the environment [22, 23]. In BD, persistently biased expectations during mood episodes are likely to deviate from the true hidden state, increasing the likelihood of unsuccessful decisions. This could promote a perception that the environment is more volatile. However, the extent to which individuals with BD overestimate volatility, and how this influences their belief updating and decision-making, remains unexplored. Building on the proposal that moods can specify the prior probabilities of different computations [20, 24], we hypothesise that a history of mood extremes and high mood instability in BD sets a prior on high environmental volatility in this condition. Such inflated volatility estimates could introduce ‘noise’ into the decision-making process [25], leading to incorrect decisions. Alternatively, the reported shifts in decision-making performance during euthymic BD [3, 4] could be explained by slower belief updating, in line with recent empirical observations in valence-dependent learning [13].

To test these hypotheses, we investigate the computational processes underlying altered decision-making in euthymic bipolar patients, compared to healthy participants, as they undertake a probabilistic reward-based learning task in a volatile environment. We employ the Hierarchical Gaussian Filter (HGF), a validated modelling framework based on hierarchical Bayesian inference that describes individual learning dynamics in uncertain and volatile environments [26,27,28,29]. The HGF is based on influential theories of cortical function, which propose that the brain continuously makes and refines predictions about the states of the world using approximate Bayesian inference [30, 31]. We use the HGF to model how input about probabilistic reward outcomes and their change over time is integrated with prior beliefs during learning, resulting in posterior beliefs about the hidden states causing the observed outcomes [26, 27]. Belief updates in the HGF are driven by prediction errors (PEs), the discrepancy between predictions and outcomes, and are modulated by precision weights, where precision is defined as the inverse variance of belief distributions. This computational framework has already proven useful for understanding psychiatric conditions [12, 32, 33], in line with proposals that frame clinical and subclinical conditions as manifestations of aberrant belief updating and predictive processing [34, 35]. Integrating generative models of learning and inference, such as the HGF, with dynamic models of mood in BD could offer insights into how extreme changes in affective states and mood dynamically shape adaptive learning [16, 36,37,38,39].

To gain a more mechanistic understanding of the processes underlying the hypothesised computational alterations in euthymic BD, we additionally investigated the neural correlates of hierarchical belief updating using magnetoencephalography (MEG). Existing research supports the role of cortical oscillations in maintaining predictions and encoding PEs [40,41,42,43]. Specific frequency rhythms such as alpha (8–12 Hz) and beta (13–30 Hz) oscillations have been associated with the transmission of top-down predictions and the encoding of precision, while gamma-band activity (>30 Hz) has been linked to the propagation of PEs and precision-weighted PEs (pwPEs) [41, 43,44,45]. Importantly, disruptions in these rhythms are thought to contribute to learning deficits observed in various psychiatric conditions, including anxiety, schizophrenia, and autism [45,46,47].

On a neural level, we hypothesised that biases in probabilistic reward-based learning in BD in a volatile setting can be reflected in alpha, beta, and gamma activity during the encoding of pwPE and precision. We anticipated between-group differences in gamma activity along with concomitant alpha/beta activity during pwPE processing, and in alpha/beta activity during the encoding of uncertainty. Faster (slower) belief updating in BD would be reflected in increased (decreased) gamma activity, with opposite directional modulation in alpha/beta during pwPE encoding [45, 46]. These alterations are expected to manifest in the orchestrated activity across decision-making brain areas, such as the prefrontal, anterior cingulate, and orbitofrontal cortex (PFC, ACC, OFC). These regions are involved in learning in volatile and uncertain settings [22, 45, 48, 49], and form part of the fronto-striatal reward circuit, which exhibits disturbed connectivity in BD [9, 50]. We therefore additionally hypothesised that changes in frequency-domain connectivity patterns between these regions during belief updating would occur in BD relative to healthy control participants.

Lastly, we aimed to determine whether the computations underlying decision-making deficits in euthymic BD influence the motivational aspects associated with the invigoration of movements. Evidence suggests that reward expectations can speed motor performance [51, 52], and the nigrostriatal dopamine pathway is crucial for invigorating future movements [53]. The ‘dopamine hypothesis’ of BD [54, 55] posits that dopamine dysregulation underlies both manic episodes and the broader episodic features of BD. Moreover, individuals with BD have been shown to exhibit heightened energy and effort following success, indicating enhanced motor vigour effects [4, 56]. Consequently, our final complementary hypothesis posits that the strength of predictions about reward contingencies will speed decision-related movements more in euthymic BD than in healthy individuals [57].

Methods and materials

Ethics declarations

The study was approved by the Institutional Review Board of National Research University Higher School of Economics and the Local Ethical Committee of the First Moscow State Medical University. All procedures contributing to this work comply with the relevant ethical guidelines and regulations for research involving human participants, including those of the approving institutional committees and the Helsinki Declaration of 1975, including its subsequent amendments. All participants provided written informed consent.

Participants

Participants included 22 bipolar patients (mean age: 29.1 years [SEM = 1.67], 17 females; Table 1), and 27 healthy participants (27.5 years [SEM = 1.18], 15 females). Bipolar participants were assessed by a consultant psychiatrist who confirmed the diagnosis of BD (I or II) using the structured clinical interview for the International Statistical Classification of Diseases and Related Health Problems (ICD-11) [58]. Patients included in the study were euthymic for at least 2.5 months before recruitment. Additional inclusion criteria were: most recent episode being depression, aged 18–50 years, absence of symptoms from other mental health conditions beyond BD, and no history of substance abuse. We assessed residual mood symptoms and cognitive performance using validated scales on mania, anxiety, depression and tasks on executive and general cognitive performance. See Table 1 for further details, and Supplementary Material for sample size estimates.

Table 1 Group demographics and variables describing the measures of affective state, cognitive and executive function in euthymic bipolar disorder patients (BD, N = 22) and healthy control participants (HC, N = 27).

Reward-based motor sequence learning task

Participants underwent an initial fine motor control assessment, then completed a validated motor-based decision-making paradigm [57] (Fig. 1A), which combines probabilistic binary reward-based learning within a volatile setting (reminiscent of reversal learning) with the execution of motor sequences to express decisions. Participants learned two sequences of four finger presses (matched in difficulty, Supplementary Materials), followed by a 320-trial test phase. In each trial, they were required to choose and perform one of the sequences to potentially earn a reward (5 points; Fig. 1A). Reward probabilities for the two sequences were reciprocal (p, 1-p) and changed pseudorandomly every 26–38 trials (Fig. 1B). The aim was to infer the reward probability associated with each sequence (‘action-outcome’ contingencies henceforth) and to adjust choices as these contingencies changed. Accumulated points were converted into monetary rewards. See timeline in Fig. 1C. The task, programmed in MATLAB using Psychtoolbox, recorded participants’ keypress timings to evaluate reaction time (RT) and performance tempo (Fig. 1D). See Supplementary Material.
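The reward schedule described above and in Fig. 1B can be illustrated with a short Python sketch. It is not the task's MATLAB/Psychtoolbox code: it assumes that each 160-trial block contains five phases, one per win probability listed for seq2 in Fig. 1B, that phase order is shuffled within each block, and that phase lengths are drawn from 26–38 trials by rejection sampling so that they sum to 160. The helper names are hypothetical.

```python
import random

# Win probabilities for seq2 listed in the Fig. 1B caption; seq1 gets 1 - p.
PHASE_PROBS = [0.9, 0.7, 0.1, 0.3, 0.5]


def sample_phase_lengths(n_trials=160, n_phases=5, lo=26, hi=38, rng=random):
    """Rejection-sample phase lengths in [lo, hi] that sum exactly to n_trials."""
    while True:
        lengths = [rng.randint(lo, hi) for _ in range(n_phases - 1)]
        last = n_trials - sum(lengths)
        if lo <= last <= hi:
            return lengths + [last]


def make_schedule(n_blocks=2, seed=0):
    """Hypothetical helper: per-trial (p_win_seq1, p_win_seq2) pairs for all blocks."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        probs = PHASE_PROBS[:]
        rng.shuffle(probs)  # Assumption: phase order is shuffled within each block
        for p, length in zip(probs, sample_phase_lengths(rng=rng)):
            schedule.extend([(1 - p, p)] * length)  # reciprocal probabilities
    return schedule


schedule = make_schedule()
print(len(schedule))  # 320
```

Because every block uses each probability once, two blocks yield each contingency type twice, matching the Fig. 1B description.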

Fig. 1: Experimental paradigm and motor performance overview.

A The initial phase of the task involved practicing two motor sequences, each linked to a distinct fractal image. The red fractal corresponded to sequence 1 (seq1: 1-3-2-4), and the blue fractal to sequence 2 (seq2: 4-1-3-2), with button presses producing sounds of varying pitches (E5, F5, G5, A5). B The stimulus-outcome mapping varied per participant across each block of 160 trials, with the win probability shifting every 26–38 trials through different phases (blue fractal: p(win|seq2) = 0.9, 0.7, 0.1, 0.3, 0.5) and the red fractal (seq1) having reciprocal probabilities (p(win|seq1) = 1-p(win|seq2)). Across both blocks, participants encountered each contingency type twice. C Each trial presented the fractals on-screen, prompting participants to perform the sequence they believed most likely to win, aiming to maximize rewards. On average, participants performed the sequences within 1561 (SEM 40) ms, displayed as ~1600 ms. Binary feedback on reward acquisition was displayed 1000 [±200] ms after sequence performance, visible for 1900 [±100] ms, indicating either ‘You earned 5 points’ or ‘You earned 0 points’. D Trial-by-trial performance tempo (ms) for the healthy control (HC, green) and bipolar disorder (BD, purple) groups. Tempo, calculated as the mean inter-keypress interval, is shown as trial-wise averages (black dots) with 95% confidence intervals represented by bars.

General task performance

General probabilistic task performance was assessed using the win rate (rate of rewarded trials) and the lose-shift and win-stay rates [45, 59], relating to our first hypothesis (Supplementary Material). Better learning would be associated with higher values on all three variables. Separately, we controlled for between-group differences in error rates (performance errors and timeouts).
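As an illustration of these three measures, the sketch below computes them from trial-wise choices and outcomes. It assumes that error trials and timeouts have already been removed and that win-stay and lose-shift are conditioned on the outcome of the immediately preceding trial; `choice_metrics` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def choice_metrics(choices, rewards):
    """Win rate, win-stay rate and lose-shift rate from aligned trial sequences.

    Assumption: error trials/timeouts were removed beforehand.
    choices: chosen option per trial (e.g. 1 or 2)
    rewards: 0/1 outcome per trial (1 = rewarded)
    """
    choices = np.asarray(choices)
    rewards = np.asarray(rewards, dtype=bool)
    stay = choices[1:] == choices[:-1]   # did trial k+1 repeat trial k's choice?
    prev_win = rewards[:-1]              # outcome on trial k
    win_rate = rewards.mean()
    win_stay = stay[prev_win].mean() if prev_win.any() else np.nan
    lose_shift = (~stay[~prev_win]).mean() if (~prev_win).any() else np.nan
    return win_rate, win_stay, lose_shift
```

For example, `choice_metrics([1, 1, 1, 2, 2], [1, 1, 0, 0, 1])` gives a win rate of 0.6, a win-stay rate of 1.0 and a lose-shift rate of 0.5.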

Modelling decision-making behaviour using hierarchical Gaussian filters

To assess probabilistic learning in our task, we used a validated hierarchical Bayesian model, the 3-level perceptual HGF for binary categorical inputs [26, 27] (Fig. 2A). This model describes how participants infer hidden states about the tendency of the action-outcome contingencies on trial k, x2(k) (level 2), and about the rate of change in that tendency (log-volatility), x3(k) (level 3). Level 1 represents the binary reward input. Gaussian belief distributions on levels 2 and 3 are represented by their posterior mean (μ2(k), μ3(k)) and posterior variance (uncertainty: σ2, σ3); precision is the inverse of the variance, πi = 1/σi (i = 2, 3). The first-, second- and third-level variances (σ1, σ2, σ3) represent irreducible, estimation and volatility uncertainty, respectively [26, 60, 61]. Estimation and volatility uncertainty arise from imperfect information about the true states x2 and x3, respectively, and can be reduced as learning progresses. By contrast, σ1 cannot be reduced through learning, as it embodies the probabilistic nature of response-outcome relationships. See further details in Supplementary Materials.

Fig. 2: Computational model and changes in learning behaviour in euthymic bipolar patients.

A Overview of the winning model: 3-level binary categorical HGF perceptual model and coupled response model. In this model, agents infer the true states of the current tendency of the action-outcome probabilistic mapping on trial k, x2(k), and of its rate of change or log-volatility, x3(k). Beliefs about these true states are Gaussian distributions parametrised by their mean (μ2(k), μ3(k)) and variance (σ2(k), σ3(k)), representing uncertainty or the inverse of precision. These mean and precision variables are updated using one-step equations, with updates modulated by parameters such as κ, ω2, ω3. The response model maps these beliefs to decisions based on the expectation of log-volatility from the previous trial (μ3(k-1)), equivalent to the prediction for the current trial (denoted by "^", Supplementary Materials). B Trajectories used in further analyses include the strength of predictions about action-outcome contingencies, \(\left|{\hat{\mu }}_{2}^{\left(k\right)}\right|\) (top), for assessing motor vigour effects; the trajectory of unsigned precision-weighted prediction errors updating beliefs at level 2, labelled |ε2| here (centre), serving as a parametric regressor of source-reconstructed MEG activity, alongside uncertainty regressors σ2, σ3; and log-volatility estimates, μ3 (bottom), averaged to test the hypothesis that BD participants overestimate volatility in this setting. See expanded Supplementary Figs. S1 and S2. C Comparative win rates show BD participants (purple) were significantly less successful in achieving rewarding outcomes than their healthy counterparts (green; lower win rate, PFDR = 0.0014, permutation test). D BD patients exhibited a significantly higher tendency to switch after a win compared to the HC group (reduced win-stay behaviour, PFDR = 0.0194). Nonetheless, lose-shift behaviour was similar across groups (P = 0.0966, non-significant; BF10 = 0.8905; anecdotal evidence against group differences).
Mean and SEM rates shown as black dots in panels C and D represent the performance of ideal Bayesian observers receiving the same input as our participants (detailed in Supplementary Material). Our participants deviated from these ideal patterns; however, these deviations did not account for the observed between-group differences. E–G Between-group comparisons of HGF computational variables revealed that BD patients consistently overestimated environmental log-volatility (E: initially, μ3(0), PFDR = 0.0142; F: throughout the task, mean μ3, PFDR = 0.0428), while showing attenuated tonic volatility, ω2 (G: significant reduction compared to HC, PFDR = 0.0174).

Belief updating on each level i and trial k is driven by prediction errors, and modulated by precision ratios, weighting the influence of precision or uncertainty in the current level and the level below. This is termed precision-weighted PE, pwPE. For level 2, belief updating takes the simple form:

$$\Delta \mu_{2}^{(k)}=\mu_{2}^{(k)}-\mu_{2}^{(k-1)}=\sigma_{2}^{(k)}\delta_{1}^{(k)}$$
(1)

Thus, updating beliefs about the tendency of the action-outcome contingencies is proportional to the PE about action outcomes, δ1(k), weighted with the estimation uncertainty on that level, σ2(k). Here, pwPE is equal to σ2(k)δ1(k). See general equation, representing updates on level 3, in Supplementary Material, and ref. [27].
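Eq. (1) can be made concrete with a single-trial update. The sketch follows the standard binary HGF update equations for level 2 (logistic prediction of the outcome, predicted variance inflated by the volatility term exp(κμ3 + ω2), then a precision update), taking the current log-volatility estimate as given. It is an illustrative assumption, not the TAPAS code: it omits the level-3 update, may differ in ordering details from the eHGF variant used here, and its parameter defaults (κ = 1, ω2 = −3) are arbitrary.

```python
import math


def sigmoid(x):
    """Logistic sigmoid mapping level-2 beliefs to outcome probabilities."""
    return 1.0 / (1.0 + math.exp(-x))


def level2_update(mu2_prev, sigma2_prev, mu3_prev, u, kappa=1.0, omega2=-3.0):
    """One binary-HGF update at level 2, holding the log-volatility estimate fixed.

    Assumption: illustrative parameter defaults; level 3 is not updated here.
    Returns (mu2, sigma2, delta1, pwpe).
    """
    mu1_hat = sigmoid(mu2_prev)                                      # predicted P(reward)
    delta1 = u - mu1_hat                                             # outcome PE, delta_1
    sigma2_hat = sigma2_prev + math.exp(kappa * mu3_prev + omega2)   # predicted variance
    pi2 = 1.0 / sigma2_hat + mu1_hat * (1.0 - mu1_hat)               # posterior precision
    sigma2 = 1.0 / pi2                                               # posterior variance
    pwpe = sigma2 * delta1                                           # Eq. (1): sigma2 * delta1
    return mu2_prev + pwpe, sigma2, delta1, pwpe
```

Because σ2 shrinks as precision accumulates, the same δ1 moves μ2 less once beliefs are precise, whereas a larger volatility term keeps σ2, and hence the step size, larger.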

States x2 and x3 evolve as Gaussian random walks, with the volatility state x3 directly influencing the time evolution of x2 through its variance (conditional on past values):

$$x_{2}^{(k)}\sim \mathcal{N}\left(x_{2}^{(k-1)},\, f_{2}\left(x_{3}\right)\right)$$
(2)

with (dropping k for simplicity)

$$f_{2}\left(x_{3}\right)\overset{\mathrm{def}}{=}\exp\left(\kappa x_{3}+\omega_{2}\right)$$
(3)

In Eq. (3), ω2 is the tonic portion of the log conditional variance of x2, and κ is a coupling constant that regulates how phasic volatility, x3, alters the magnitude of belief updates about action outcomes. The step size at level 3 is modulated by ω3, representing high-level tonic volatility. Larger values of ω2 and ω3 are associated with larger updates in beliefs about the probabilistic mapping at level 2 and about volatility, respectively, as demonstrated in previous simulations [59]. See also Results. Higher κ values increase the influence of log-volatility changes on belief updates at level 2. See further details in Supplementary Material.
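To see how ω2, ω3 and κ shape the random walks in Eqs. (2) and (3), one can sample from the generative model directly. The sketch below is an illustrative assumption rather than the authors' simulation code: it draws x3 as a Gaussian random walk with step variance exp(ω3), draws x2 with step variance f2(x3) = exp(κx3 + ω2), and samples binary outcomes through a logistic sigmoid of x2. With κ = 0, the empirical step variance of x2 should approach exp(ω2).

```python
import numpy as np


def f2(x3, kappa, omega2):
    """Eq. (3): step variance of x2 as a function of log-volatility x3."""
    return np.exp(kappa * x3 + omega2)


def simulate_generative(n_trials, kappa, omega2, omega3, x2_0=0.0, x3_0=0.0, seed=0):
    """Sample x2, x3 trajectories and binary outcomes (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    x2 = np.empty(n_trials)
    x3 = np.empty(n_trials)
    prev2, prev3 = x2_0, x3_0
    for k in range(n_trials):
        prev3 = rng.normal(prev3, np.sqrt(np.exp(omega3)))            # x3 random walk
        prev2 = rng.normal(prev2, np.sqrt(f2(prev3, kappa, omega2)))  # Eq. (2)
        x3[k], x2[k] = prev3, prev2
    u = rng.random(n_trials) < 1.0 / (1.0 + np.exp(-x2))  # outcomes ~ Bernoulli(s(x2))
    return x2, x3, u.astype(int)
```

Raising ω2 produces visibly faster drift in x2; setting κ > 0 additionally makes that drift speed up whenever x3 is high.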

To assess how beliefs mapped to decisions, we coupled this perceptual model to response models previously used in similar tasks [27, 45, 46]. First, we considered a unit-square sigmoid response model where choice probability is shaped by a free fixed (time-invariant) parameter ζ, interpreted as inverse decision noise: the sigmoid approaches a step function as ζ tends to infinity. This constituted our model M1. Model M2 was similar but employed a two-level HGF with constant volatility. M3 combined the 3-level HGF with a response model where the sigmoid function depends on the trial-wise prediction of log-volatility, \(\zeta ={e}^{-{\mu }_{3}^{\left(k-1\right)}}\) [25](Fig. 2A). In this model, higher estimates of volatility lead to a more stochastic mapping from beliefs to decisions. As a result, there is an increased likelihood of choosing responses that deviate from predictions, consistent with increased exploration (exploring whether the contingency has changed). In models M1 and M3, parameters ω2 and ω3 were free; ω2 was also free in M2. Additionally, ζ was free in M1 and M2, while initial values μ3(0) and σ3(0) were free in M3. Higher initial values in μ3(0) indicate that an agent expects rapid changes in the probabilistic mapping initially, while σ3(0) represents the initial uncertainty an agent has about μ3(0). A fourth model, M4, was constructed similarly to M3 but replaced the free parameter ω2 with κ [32].
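The two response mappings can be written in a few lines. In this sketch, the belief entering the sigmoid is assumed to be the predicted probability μ̂1 that a given option is rewarded, as in the standard unit-square sigmoid model; M3 simply replaces the fixed ζ with exp(−μ3(k−1)). Function names are hypothetical.

```python
import numpy as np


def unit_square_sigmoid(mu1_hat, zeta):
    """P(choose option) = m^zeta / (m^zeta + (1 - m)^zeta), zeta = inverse decision noise."""
    m = np.asarray(mu1_hat, dtype=float)
    return m**zeta / (m**zeta + (1.0 - m) ** zeta)


def p_choice_m3(mu1_hat, mu3_prev):
    """M3-style mapping (assumption): zeta = exp(-mu3^(k-1)), so higher volatility -> noisier choices."""
    return unit_square_sigmoid(mu1_hat, np.exp(-mu3_prev))
```

With ζ = 1 the choice probability equals the belief itself; as ζ grows the mapping approaches a step function, and under M3 a higher μ3 pulls choice probabilities towards 0.5.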

Models were fitted to individual behavioural data (series of responses and observed outcomes) using the priors described in Supplementary Table S1. Model comparison used random-effects Bayesian model selection based on log model evidence (Supplementary Materials). Simulations, similar to previous work, quantified the estimability of free model parameters [32, 45]. Relevant belief and uncertainty trajectories were subsequently used in our MEG analysis (Fig. 2B; and expanded figures in Supplementary Figs. S1 and S2; see Results). The models were implemented as part of the TAPAS toolbox [62]. We used the HGF release v7.1 in MATLAB R2020b, and functions ‘tapas_ehgf_binary’.
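For readers unfamiliar with random-effects Bayesian model selection, the variational scheme of Stephan et al. (2009), which treats model identity as a random variable with a Dirichlet prior over model frequencies, can be sketched as below. This is an illustrative NumPy reimplementation, not the TAPAS/SPM routine the authors used. It returns expected model frequencies only (exceedance probabilities would additionally require sampling from the resulting Dirichlet), and the small `digamma` helper exists only to avoid a SciPy dependency.

```python
import math
import numpy as np


def digamma(x):
    """Digamma function via upward recurrence plus the asymptotic series (x > 0)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def rfx_bms(lme, alpha0=1.0, tol=1e-8, max_iter=1000):
    """Variational random-effects BMS on an (n_subjects, n_models) log-evidence matrix.

    Returns the Dirichlet parameters and the expected model frequencies.
    """
    lme = np.asarray(lme, dtype=float)
    alpha = np.full(lme.shape[1], alpha0)
    for _ in range(max_iter):
        psi = np.array([digamma(a) for a in alpha]) - digamma(alpha.sum())
        log_u = lme + psi                          # per-subject log posterior (unnormalised)
        log_u -= log_u.max(axis=1, keepdims=True)  # numerical stability
        g = np.exp(log_u)
        g /= g.sum(axis=1, keepdims=True)          # P(model k generated subject n's data)
        new_alpha = alpha0 + g.sum(axis=0)
        converged = np.max(np.abs(new_alpha - alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha, alpha / alpha.sum()
```

If one model has much higher log evidence in every participant, its expected frequency approaches (N + 1)/(N + K) for N participants and K models with a flat prior.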

Assessing motor invigoration

Details on assessing motor vigour are included in Supplementary Materials. Using Bayesian multilevel regression modelling, we investigated whether trial-by-trial predictions about the action-outcome contingencies, \({\hat{\mu }}_{2}^{\left(k\right)}\), differentially influenced the timing of motor performance in the groups, related to our motor vigour hypothesis. As in ref. [57], we hypothesised a negative association between the strength of predictions, \(|{\hat{\mu }}_{2}^{\left(k\right)}|\), and performance timing, suggesting that stronger expectations about reward contingencies speed performance. We also hypothesised a greater sensitivity to these predictions (steeper slope) in BD compared to HC. See Supplementary Tables S2, S3 and the Supplementary Material, which also includes control analyses on baseline motor performance.
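The predictor can be sketched as follows, under stated simplifications: the prediction μ̂2(k) is taken as the previous posterior mean μ2(k−1), as in the standard HGF random-walk prediction, and an ordinary least-squares slope per participant stands in for the Bayesian multilevel regression actually used. A negative slope would correspond to faster tempo (shorter inter-keypress intervals) with stronger predictions. Function names are hypothetical.

```python
import numpy as np


def prediction_strength(mu2):
    """|mu2_hat(k)|, assuming the prediction for trial k is the posterior mean from k-1."""
    mu2 = np.asarray(mu2, dtype=float)
    return np.abs(mu2[:-1])  # aligned with trials 2..K


def tempo_slope(mu2, tempo):
    """OLS slope of tempo on |mu2_hat| for one participant (simplified, non-Bayesian stand-in)."""
    x = prediction_strength(mu2)
    y = np.asarray(tempo, dtype=float)[1:]  # drop trial 1, which has no prior prediction here
    slope, _intercept = np.polyfit(x, y, 1)
    return slope
```

A multilevel model would instead estimate these slopes jointly across participants, with a group term testing whether BD slopes are steeper than HC slopes.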

MEG recording and preprocessing

MEG was performed using a 306-channel system (Elekta Neuromag VectorView), with head movements tracked by a head position indicator with four coils. Concurrently, ECG and EOG were recorded for MEG artefact rejection. Recordings were sampled at 1000 Hz and filtered between 0.1 and 330 Hz. MEG preprocessing involved head movement correction, noise reduction, and channel selection using standard methods ([63]; Elekta MaxFilter software; Supplementary Material). MEG data were further processed using MNE-Python [64] (Python version 3.11.5) and custom Python scripts: they were low-pass filtered at 125 Hz, downsampled to 250 Hz, and notch-filtered at 50 and 100 Hz. Independent component analysis (FastICA) removed eye and heart artefacts (3.3 ICs on average per participant).

Source reconstruction of MEG signals

We reconstructed MEG signals using linearly constrained minimum variance beamforming (LCMV [65]) in MNE-Python, with individual T1-weighted MRI images segmented into cortical divisions using FreeSurfer 6.0 ([66, 67]; http://surfer.nmr.mgh.harvard.edu/). We aligned MRI and MEG coordinate systems, selected the Desikan-Killiany atlas (DK [68]) for cortical parcellation, and performed forward modelling with boundary element models [45].

We focused on the alpha and beta frequency bands, band-pass filtering signals between 1 and 40 Hz before LCMV beamforming. Theta-band activity was also examined given its robust association with feedback processing [42, 69], relevant for win/lose outcomes in our task. Gamma-band analysis followed a similar process (30–124 Hz band-pass filter). Time courses were extracted for regions of interest (ROIs) associated with decision-making under uncertainty and reward processing [43, 70,71,72,73,74,75] and linked with impairments in fronto-striatal reward circuitry in BD [5, 9, 76, 77]. These included the (1) ACC, (2) OFC, including the ventromedial PFC, (3) dorsomedial PFC (dmPFC), and (4) dorsolateral PFC (dlPFC). We also included the (5) primary motor cortex (M1) and (6) premotor cortex (PMC) to assess motor activity during decision-making [78].

Our study’s ROIs comprised 16 bilateral labels in eight areas from the DK atlas: (1) rostral and caudal ACC, (2) lateral and medial OFC (including vmPFC), (3) superior frontal gyrus (dmPFC, and supplementary motor area, SMA), (4) rostral middle frontal gyrus (rMFG), (5) precentral gyrus (M1), and (6) caudal MFG. Time series extraction utilised the PCA flip method in MNE-Python. Although the ‘flip’ operator was not relevant for our time-frequency analysis, it was essential for preparing the source-reconstructed time series for subsequent connectivity analysis. See anatomical label references in Supplementary Material.
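The PCA flip summary can be sketched in NumPy to show what it does to a label's vertex time courses. This reimplementation reflects our reading of MNE-Python's `mode='pca_flip'` and should be treated as an approximation: take the first singular vector, choose its sign so that it agrees with a per-vertex orientation sign vector, and rescale by the singular-value norm divided by the square root of the number of vertices. In practice, `mne.extract_label_time_course` would be called directly, and a comparable sign vector can be obtained with `mne.label_sign_flip`.

```python
import numpy as np


def pca_flip(label_data, flip):
    """Summarise one label by a sign-aligned first principal component (approximation).

    label_data: (n_vertices, n_times) source time courses within the label
    flip: (n_vertices,) +1/-1 signs aligning source orientations
    """
    U, s, Vt = np.linalg.svd(label_data, full_matrices=False)
    sign = np.sign(U[:, 0] @ flip)                            # resolve SVD sign ambiguity
    scale = np.linalg.norm(s) / np.sqrt(label_data.shape[0])  # average power per vertex
    return sign * scale * Vt[0]
```

Fixing the sign matters little for amplitude spectra, but it keeps the polarity of each ROI time course consistent across participants, which phase-sensitive measures such as Granger causality rely on.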

Convolution modelling of time-frequency responses during outcome processing

We used a validated convolution-modelling approach to analyse frequency-domain amplitude changes related to belief updating and uncertainty following outcome presentation [44, 46, 79]. Building on previous work [45], this frequency-domain general linear model (GLM) included as parametric regressors the unsigned pwPE updating beliefs on level 2 (representing precision-weighted Bayesian surprise; the absolute value is preferred for the binary HGF where sign on level 2 is arbitrary [59, 80]), and uncertainty measures (σ2, σ3). It also included discrete regressors for win/lose outcomes and error trials. To avoid regressor collinearity and potential GLM misspecification, we excluded the level 3 pwPE [45, 46], due to its high linear correlation with the unsigned pwPE on level 2 (Supplementary Materials).

The GLM was applied to concatenated epochs of source-reconstructed data in our ROIs, using Morlet wavelets for time-frequency (TF) analysis over 4–100 Hz and from −0.5 to 1.8 s (Supplementary Fig. S3). We conducted this analysis using SPM12 software (http://www.fil.ion.ucl.ac.uk/spm/), adapting original code by ref. [81], as used in ref. [45], with additional details available in the Supplementary Materials.
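The logic of the convolution GLM can be illustrated on a single amplitude time series (one ROI, one frequency bin). The sketch below builds a finite-impulse-response (FIR) design matrix from stick functions at outcome onsets, weighted by each regressor's trial-wise values, and recovers one response kernel per regressor by least squares. This is a simplified stand-in for the SPM12-based implementation, which can employ smoother temporal basis functions; regressor names such as `abs_pwpe` are hypothetical, and further regressors (σ2, σ3, win/lose, errors) would be added in the same way.

```python
import numpy as np


def build_design(onsets, values, n_times, lags):
    """FIR convolution design: one column per (regressor, lag).

    onsets: sample indices of outcome events
    values: dict name -> per-event values (e.g. ones for outcome, |pwPE| for parametric)
    lags: sample offsets relative to each event (negative = pre-outcome)
    """
    cols, names = [], []
    for name, vals in values.items():
        stick = np.zeros(n_times)
        np.add.at(stick, onsets, vals)  # weighted stick function at event onsets
        for lag in lags:
            col = np.roll(stick, lag)   # column value at t is stick[t - lag]
            if lag > 0:
                col[:lag] = 0.0         # remove wrap-around introduced by np.roll
            elif lag < 0:
                col[lag:] = 0.0
            cols.append(col)
            names.append((name, lag))
    return np.column_stack(cols), names


def fit_kernels(y, X, names):
    """Least-squares fit; returns the estimated response kernel for each regressor."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    kernels = {}
    for (name, _lag), b in zip(names, beta):
        kernels.setdefault(name, []).append(b)
    return {k: np.array(v) for k, v in kernels.items()}
```

Because overlapping responses from neighbouring trials are modelled jointly, the kernel for `abs_pwpe` isolates the amplitude change that scales with the pwPE, separately from the average outcome response.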

Frequency-resolved functional connectivity

To analyse directed functional connectivity between frequency-resolved activity in our ROIs, we employed time-reversed Granger causality (TRGC [82]; Supplementary Materials) as a robust metric for directed information flow [83]. Following Pellegrini et al. [83], we applied TRGC in the frequency domain to LCMV-based source-reconstructed time series from our 16 ROIs after the PCA flip transformation.

Our analysis focused on between-group differences in the directionality of information flow within the 8–30 Hz range during the 0.5–1 s interval of outcome processing for trials with large unsigned pwPEs updating beliefs at level 2. We employed a median split of unsigned pwPE values, yielding approximately 160 high-|pwPE| trials per participant. This frequency range was selected based on evidence that beta-band functional connectivity from the PFC effectively differentiates levels of predictability, exhibiting reduced values during unpredictable trials [42]. By examining TRGC in trials with high unsigned pwPE values, we anticipated a general decrease in beta-band TRGC in HC, in parallel with alpha/beta amplitude suppression during belief updating. We hypothesised that this pattern would be disrupted in BD. The TRGC analysis was conducted using the ROIconnect plugin for EEGLAB [83], adapted for our MNE-python LCMV outputs. See Supplementary Materials.
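A simplified, broadband illustration of the two steps (median split, then net TRGC) is given below. The authors computed TRGC in the frequency domain with ROIconnect; this sketch instead uses bivariate time-domain Granger causality from least-squares autoregressive fits, with an assumed model order of 5, on a single continuous segment. Net TRGC is defined here as the net Granger score (x→y minus y→x) on the original data minus the same score on time-reversed data, so positive values indicate dominant information flow from x to y. All function names are hypothetical.

```python
import numpy as np


def _residual_var(target, predictors, order):
    """Residual variance of a least-squares AR(order) fit of target on lagged predictors."""
    n = len(target)
    lagged = [np.column_stack([p[order - lag:n - lag] for p in predictors])
              for lag in range(1, order + 1)]
    X = np.column_stack(lagged + [np.ones(n - order)])  # lags 1..order plus intercept
    y = target[order:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ beta)


def granger(x, y, order=5):
    """Time-domain Granger causality x -> y (log ratio of residual variances)."""
    return np.log(_residual_var(y, [y], order) / _residual_var(y, [y, x], order))


def net_trgc(x, y, order=5):
    """Net TRGC x -> y: net GC on original data minus net GC on time-reversed data."""
    net = granger(x, y, order) - granger(y, x, order)
    xr, yr = x[::-1], y[::-1]
    net_rev = granger(xr, yr, order) - granger(yr, xr, order)
    return net - net_rev


def high_pwpe_trials(abs_pwpe):
    """Indices of trials whose unsigned pwPE lies above the participant's median."""
    abs_pwpe = np.asarray(abs_pwpe)
    return np.flatnonzero(abs_pwpe > np.median(abs_pwpe))
```

Subtracting the time-reversed score is what makes TRGC robust: influences that survive time reversal (e.g. instantaneous mixing from source leakage) cancel out, while genuinely lagged interactions change sign and are amplified.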

Statistical analysis

Between-group analyses of behavioural, computational, and TRGC-derived variables used independent-sample permutation tests (5000 permutations, two-sided) in MATLAB®. Within-subject analyses used paired permutation tests (two-sided). We set the significance level at α = 0.05 and controlled the false discovery rate (FDR) at q = 0.05 for multiple tests. Non-parametric effect sizes are reported as the probability of superiority (Δ) [84, 85]. Non-significant effects were further evaluated using Bayes factors (BF10), interpreted following Wetzels and Wagenmakers [86].
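The between-group tests, effect size and FDR control described above can be sketched as follows. The permutation statistic (difference in group means), the add-one p-value correction and the Benjamini–Hochberg step-up procedure are illustrative assumptions about details the text does not specify; Δ is computed as P(A > B) + 0.5·P(A = B) over all cross-group pairs.

```python
import numpy as np


def perm_test_independent(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in group means (assumed statistic)."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled = np.concatenate([a, b])
    observed = a.mean() - b.mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # random relabelling of participants
        diff = perm[:len(a)].mean() - perm[len(a):].mean()
        count += abs(diff) >= abs(observed)
    return (count + 1) / (n_perm + 1)   # add-one correction keeps p > 0


def prob_superiority(a, b):
    """Delta = P(A > B) + 0.5 * P(A = B) over all cross-group pairs."""
    a = np.asarray(a, float)[:, None]
    b = np.asarray(b, float)[None, :]
    return np.mean(a > b) + 0.5 * np.mean(a == b)


def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: boolean mask of discoveries at level q."""
    p = np.asarray(pvals, float)
    order = np.argsort(p)
    thresh = q * np.arange(1, len(p) + 1) / len(p)
    below = np.nonzero(p[order] <= thresh)[0]
    reject = np.zeros(len(p), dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True
    return reject
```

Δ = 0.5 indicates no stochastic dominance between groups, whereas values approaching 1 (or 0) indicate that a randomly drawn member of one group almost always exceeds (or falls below) a randomly drawn member of the other.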

Statistical analysis of source-level time-frequency images used cluster-based permutation testing in the FieldTrip Toolbox [87, 88] (1000 permutations). We averaged TF activity across frequency bins within each band (theta, alpha, beta; 60–100 Hz for gamma [45]). Temporal intervals of interest for statistical analyses were selected based on previous research [45, 46, 69]: 0.5–1.8 s for parametric regressors, 0.2–1 s for win/lose regressors. We controlled the family-wise error rate (FWER) at 0.05 (two-sided tests, effects considered if PFWER < 0.025). See Supplementary Materials.
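A one-dimensional analogue of the cluster-based permutation test can help clarify the procedure. The sketch assumes a pooled-variance t statistic, a cluster-forming threshold of |t| > 2, cluster mass as the sum of t values within a contiguous supra-threshold run, and a null distribution of the maximum absolute cluster mass under random relabelling of participants. FieldTrip's implementation handles multidimensional data (e.g. time by ROI after band averaging) and offers further options. Under the two-sided FWER control described above, a cluster would be reported if its p-value were below 0.025.

```python
import numpy as np


def tstat_2group(a, b):
    """Pooled-variance two-sample t at each time point; a, b are (n_subjects, n_times)."""
    na, nb = len(a), len(b)
    va, vb = a.var(axis=0, ddof=1), b.var(axis=0, ddof=1)
    sp = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (a.mean(0) - b.mean(0)) / np.sqrt(sp * (1 / na + 1 / nb))


def clusters(t, thresh):
    """Contiguous same-sign supra-threshold runs as (start, end, summed t) tuples."""
    out = []
    for sign in (1, -1):
        above = sign * t > thresh
        start = None
        for i, flag in enumerate(np.append(above, False)):  # sentinel closes final run
            if flag and start is None:
                start = i
            elif not flag and start is not None:
                out.append((start, i, t[start:i].sum()))
                start = None
    return out


def cluster_perm_test(a, b, thresh=2.0, n_perm=1000, seed=0):
    """Between-group cluster test with a max-|mass| null (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    observed = clusters(tstat_2group(a, b), thresh)
    data = np.vstack([a, b])
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(data))  # shuffle group labels across participants
        pa, pb = data[idx[:len(a)]], data[idx[len(a):]]
        masses = [abs(m) for *_, m in clusters(tstat_2group(pa, pb), thresh)]
        null[i] = max(masses, default=0.0)
    return [(s, e, m, (np.sum(null >= abs(m)) + 1) / (n_perm + 1)) for s, e, m in observed]
```

Comparing each observed cluster with the maximum cluster mass from every permutation is what controls the family-wise error rate across all time points at once.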

Results

Demographics

BF analysis provided anecdotal evidence for a balanced distribution of age and sex across the groups. Furthermore, substantial to anecdotal evidence indicated similar scores in mania, anxiety, depression, and general cognitive functioning between groups. Significant differences were observed exclusively in executive functioning, with the BD group demonstrating lower performance. See Table 1.

Altered reward-based decision dynamics in bipolar disorder during euthymia

Euthymic BD participants exhibited lower win rates than HC individuals (PFDR = 0.0014; Δ = 0.79, CI = [0.60, 0.90]; Fig. 2C). They also showed lower win-stay rates (PFDR = 0.0194; Δ = 0.71, CI = [0.55, 0.85]; Fig. 2D), indicating that, after a rewarded trial, BD individuals were less likely than HC to repeat the same sequence. Lose-shift rates did not differ significantly between groups, with anecdotal Bayesian evidence for similarity (P = 0.0966; BF10 = 0.8905; Fig. 2D), despite an overall higher total switch rate in BD relative to HC (see Supplementary Materials, which also report evidence for similar performance error rates).

To test our computational hypotheses, we used the HGF framework [27]. Bayesian model selection identified M3 as the best model overall and for each group separately: a three-level HGF with a response model in which decisions depend on dynamic trial-by-trial expectations of log-volatility, μ3(k-1), and with ω2, ω3, μ3(0), and σ3(0) as free model parameters (Supplementary Table S4). Simulation analyses confirmed good parameter recovery (Supplementary Fig. S4).

Using this model, we found that BD participants had higher expectations of log-volatility initially and on average (μ3(0): PFDR = 0.0142; Δ = 0.70, CI = [0.55, 0.85]; trial-average μ3: PFDR = 0.0428; Δ = 0.66, CI = [0.52, 0.82]; Fig. 2E, F). This suggests increased stochasticity in their responses, as also indicated by a positive correlation between log-volatility μ3 and the response switch rate in the total sample and within each group separately (Supplementary Fig. S5a). Parameter μ3 also exhibited a negative correlation with the win-stay rate (Supplementary Fig. S5b; Supplementary Materials), consistent with BD participants showing both an overestimation of μ3 and a lower win-stay rate than HC (Fig. 2D, F). Additionally, BD participants exhibited lower tonic volatility, ω2, compared to HCs (PFDR = 0.0174; Δ = 0.67, CI = [0.52, 0.82]; Fig. 2G), suggesting a slower adjustment of beliefs about action-outcome contingencies (see simulation analysis in Supplementary Fig. S6). No significant between-group differences were found in ω3.

We additionally assessed the association between residual symptoms in BD participants and relevant HGF variables. Prior work suggests a positive correlation between volatility and trait anxiety [45]. Accordingly, we analysed the relationship between trait anxiety levels in BD and μ3, confirming a significant positive correlation (Spearman’s rank correlation ρ = 0.46, 95% confidence interval, CI, [0.04, 0.75], PFDR = 0.030). For mania scores, we hypothesised a correlation with the precision weights term, σ2 (estimation uncertainty), which scales the influence of PEs on belief updates about action-outcome contingencies (Eq. (1)). We posited that higher mania levels in BD might be associated with an enhanced reactivity to PEs [21], speeding belief updating via σ2. Non-parametric regression analyses revealed a negative association between mania and σ2 (ρ = −0.46 [−0.75, −0.02], PFDR = 0.037). Conversely, we considered that depression scores might be associated with attenuated reward-based belief updating (lower σ2) yet found a lack of association between these variables (ρ = 0.04 [−0.34, 0.41], P = 0.836; BF10 = 0.464, anecdotal evidence). See Supplementary Fig. S7 and Supplementary Materials.

Control analyses revealed no medication effects (antipsychotics and dopamine-blocking/modulating drugs) on these associations or the main between-group computational results. See Supplementary Table S5.

Expectation about the tendency of the reward probability invigorates motor performance similarly in both groups

Bayesian multilevel modelling demonstrated that greater expectations about the tendency of the action-outcome probability speeded performance tempo, but similarly in BD and HC groups (Supplementary Table S6, Supplementary Figs. S8, S9). RT was not modulated by trial-wise predictions, as in ref. [57].

Attenuated neural representation of precision-weighted prediction errors updating beliefs about the action-outcome contingencies in bipolar disorder

During the processing of unsigned pwPEs about the tendency of action-outcome contingencies, HC and BD participants exhibited suppression of 8–30 Hz activity across prefrontal, orbitofrontal, cingulate, and motor regions (negative cluster within 0.5–0.9 s, post- relative to pre-outcome baseline; PFWER = 0.001 and 0.024 for HC and BD, respectively; Fig. 3A, B). This suppression effect was less widespread in BD, and the between-group difference was significant across the caudal and rostral ACC, MFG, and OFC, as well as in the SFG and M1 (BD − HC positive cluster at 0.6–0.9 s, PFWER = 0.0130; Fig. 3C, D; Supplementary Fig. S12). Alongside these 8–30 Hz effects, the BD group exhibited significantly attenuated high gamma activity (60–100 Hz) compared with HC (negative cluster, PFWER = 0.0090; Fig. 3E). The gamma effect coincided in time with the alpha-beta modulations, spanning 0.5–0.82 s, and overlapped spatially within the aforementioned ROIs. No significant within-subject changes in gamma activity to the unsigned pwPE regressor were observed in either group (Supplementary Material).
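The between-group effects above rest on cluster-based permutation tests. The sketch below illustrates only the core logic in one dimension (time points, positive clusters), assuming pooled-variance two-sample t statistics, a cluster-forming threshold of t > 2, and the summed t within a cluster ("cluster mass") as the test statistic. These choices and the function names are assumptions for illustration; the study's tests operated on time-frequency maps across ROIs and may differ in detail.

```python
import numpy as np


def t_stat_independent(a, b):
    """Pooled-variance two-sample t statistic at each time point (a, b: subjects x time)."""
    na, nb = a.shape[0], b.shape[0]
    pooled = ((na - 1) * a.var(axis=0, ddof=1) + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1.0 / na + 1.0 / nb))


def positive_cluster_masses(t, threshold):
    """Sum of t within each run of contiguous time points where t exceeds the threshold."""
    masses, current = [], 0.0
    for value in t:
        if value > threshold:
            current += value
        elif current > 0:
            masses.append(current)
            current = 0.0
    if current > 0:
        masses.append(current)
    return masses


def cluster_permutation_test(a, b, threshold=2.0, n_perm=1000, seed=0):
    """Largest positive cluster mass (a > b) and its permutation p-value."""
    rng = np.random.default_rng(seed)
    observed = max(positive_cluster_masses(t_stat_independent(a, b), threshold), default=0.0)
    data, na = np.concatenate([a, b]), a.shape[0]
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(data.shape[0])  # randomly reassign group labels
        t_perm = t_stat_independent(data[perm[:na]], data[perm[na:]])
        # Keeping only the largest cluster per permutation controls the FWER across time points
        null_max[i] = max(positive_cluster_masses(t_perm, threshold), default=0.0)
    p_value = (np.sum(null_max >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```

Negative clusters (for example, the BD < HC gamma effect) can be tested with the same function by swapping the two groups.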

Fig. 3: Attenuated gamma increase and alpha-beta suppression during the encoding of unsigned precision-weighted prediction errors about stimulus outcomes in bipolar disorder.

A Source reconstruction of MEG signals was carried out with linearly constrained minimum variance (LCMV) beamforming. The statistical analysis of convolution GLM results targeted brain regions implicated in decision-making under uncertainty and reward processing [43, 71,72,73,74,75] and associated with impairments in the fronto-striatal reward circuitry in BD [5, 9, 76, 77]: caudal and rostral ACC, OFC (lateral and medial portions: lOFC, mOFC), SFG, caudal and rostral MFG, and M1. Panel A illustrates these regions using anatomical labels from the Desikan-Killiany (DK) atlas, which was used to parcellate the cerebral cortex of each participant based on their individual T1-weighted MRI. B Left and centre panels display within-subject effects in time-frequency (TF) images representing oscillatory amplitude responses to unsigned precision-weighted PEs about stimulus outcomes. TF images cover the 4–100 Hz range, including theta (4–6 Hz), alpha (8–12 Hz), beta (14–30 Hz), and gamma (32–100 Hz) activity. The TF images were normalised by subtracting the mean and dividing by the standard deviation (SD) of the activity in the [−300, −50] ms pre-outcome interval, and are therefore presented in SD units. Significant within-subject effects are outlined in black for the HC (left) and BD (centre) groups (cluster-based permutation tests; negative cluster within 0.5–0.9 s post- relative to pre-outcome baseline; PFWER = 0.001 and 0.024, respectively). Although no within-subject effects in BD were observed in the illustrated SFG label, effects were present in other ROIs. The right panel shows the between-group differences, significant in a cluster-based permutation test (positive cluster within 8–30 Hz, PFWER = 0.0130; negative cluster within 60–100 Hz, PFWER = 0.0090; N = 21 BD and 27 HC independent samples). Time 0 s marks the onset of outcome presentation.
C, D Panels illustrate between-group effects in the alpha (C) and beta (D) ranges, attributed to more pronounced alpha and beta suppression in HC than in BD participants during encoding of unsigned pwPEs on level 2. Effects are depicted in ROIs including the cACC, lOFC, SFG, and M1. E Similar to panels C and D but in the gamma range, showing that unsigned pwPEs were associated with increases in gamma-range TF amplitude in HC participants, but with gamma attenuation in BD participants, across a similar range of ROIs. Labels denote the rostral anterior cingulate cortex, rACC; caudal ACC, cACC; superior frontal gyrus, SFG; lateral and medial orbitofrontal cortex, lOFC and mOFC; primary motor cortex, M1; caudal and rostral middle frontal gyrus, cMFG and rMFG.
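The caption above specifies a per-frequency baseline normalisation into SD units. A minimal NumPy sketch, assuming a TF array shaped frequencies × time points, a time axis in seconds relative to outcome onset, and the sample (n − 1) SD; `baseline_zscore` is a hypothetical name and the exact SD convention used in the study is not stated here.

```python
import numpy as np


def baseline_zscore(tf, times, window=(-0.300, -0.050)):
    """Express a TF map (freqs x times) in SD units of a pre-outcome baseline window.

    For each frequency separately, subtract the baseline mean and divide by the
    baseline SD, using the samples whose time lies inside `window` (inclusive).
    """
    in_baseline = (times >= window[0]) & (times <= window[1])
    base = tf[:, in_baseline]
    mu = base.mean(axis=1, keepdims=True)
    sd = base.std(axis=1, ddof=1, keepdims=True)
    return (tf - mu) / sd
```

Normalising each frequency separately removes the 1/f scaling of raw amplitude, so that effects at theta, alpha, beta, and gamma frequencies can be displayed on a common scale.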

For the uncertainty regressors σ2 and σ3, no significant between-group differences were observed, although HC participants showed a significant, widespread increase in 8–30 Hz activity to estimation uncertainty σ2 (Supplementary Fig. S13). Theta modulation by win and lose events likewise did not differ between groups. However, as expected, both groups showed significant increases in theta activity from baseline in the ACC, extending to prefrontal and orbitofrontal ROIs (Supplementary Fig. S14).

In a post-hoc analysis, we examined raw alpha and beta power during inter-trial intervals to determine whether the reduced 8–30 Hz suppression to the pwPE regressor in BD reflected a limited dynamic range of activity at these frequencies. Power was significantly lower in BD than in HC, but only at 13–20 Hz. This effect emerged in most of the ROIs where the pwPE effect was expressed (Supplementary Fig. S15; Supplementary Material).
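The post-hoc comparison relies on raw power within fixed bands such as 13–20 Hz. As a minimal sketch, assuming a single inter-trial-interval segment and a Hann-tapered periodogram (the study's spectral estimator is not described in this section), band-limited power could be obtained as follows; `band_power` is a hypothetical helper and returns power in relative units.

```python
import numpy as np


def band_power(signal, fs, band):
    """Mean periodogram power (relative units) of `signal` between band[0] and band[1] Hz."""
    window = np.hanning(signal.size)
    tapered = (signal - signal.mean()) * window  # remove DC offset, then taper the edges
    power = np.abs(np.fft.rfft(tapered)) ** 2 / (fs * np.sum(window ** 2))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return power[in_band].mean()
```

Tapering limits spectral leakage, so that power at one frequency (for example, a 16 Hz rhythm) does not spill noticeably into neighbouring bands such as alpha.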

Frequency-domain functional connectivity patterns during unsigned pwPE processing

We next assessed group differences in the directionality of information flow during outcome processing for trials with large unsigned pwPEs updating beliefs at level 2. The BD group exhibited significantly larger TRGC coefficients than HC participants from the cACC to the rMFG and rACC, as well as from the SFG to the cMFG, in the beta frequency range (PFDR = 0.0032, 0.0064, 0.0064, respectively; Fig. 4). The effect from the cACC to the rMFG extended to the alpha range (Fig. 4A, C). These findings indicate stronger evidence for directed statistical dependencies between sources in the identified directions in BD than in HC, in the beta range and, for the cACC to rMFG connection, also in the alpha range. Importantly, these between-group effects were not attributable to differences in signal-to-noise ratio (Supplementary Fig. S16).
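TRGC contrasts net Granger causality (x → y minus y → x) on the original signals with the same quantity on time-reversed signals, which is intended to make directed estimates more robust to instantaneous mixing such as source leakage. The study computed TRGC in the frequency domain; as a simpler stand-in, the sketch below uses time-domain, bivariate, least-squares autoregressive models of fixed order p. The model order, the bivariate setting, and the function names are assumptions for illustration only.

```python
import numpy as np


def _residual_var(target, predictors, p):
    """Residual variance of regressing target[t] on lags 1..p of each predictor series."""
    n = target.size
    cols = [np.ones(n - p)]  # intercept
    for series in predictors:
        cols += [series[p - lag:n - lag] for lag in range(1, p + 1)]
    X = np.column_stack(cols)
    y = target[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ beta)


def granger(x, y, p=2):
    """Time-domain GC x -> y: log ratio of restricted (y past only) to full residual variance."""
    return np.log(_residual_var(y, [y], p) / _residual_var(y, [y, x], p))


def trgc(x, y, p=2):
    """Time-reversed GC: net GC on the data minus net GC on the time-reversed data."""
    net = granger(x, y, p) - granger(y, x, p)
    net_reversed = granger(x[::-1], y[::-1], p) - granger(y[::-1], x[::-1], p)
    return net - net_reversed
```

A positive value indicates that the x → y asymmetry is present in the original data and reverses under time reversal, as expected for a genuinely lagged influence; the metric is antisymmetric, so trgc(y, x) = −trgc(x, y).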

Fig. 4: Time-reversed Granger causality during outcome processing for trials with large unsigned pwPEs updating beliefs at level 2.

A Comparison of TRGC estimates in the alpha band for healthy control participants (HC, left column), bipolar disorder patients (BD, centre), and their difference (BD − HC, right column). The direction of information flow goes from rows to columns: positive coefficients denote increased predictability in that direction, and negative coefficients denote the reverse (increased predictability from column to row). Between-group statistical analysis was conducted on the above-diagonal values. Anatomical labels represent our regions of interest, bilaterally, and are displayed for one hemisphere. The coloured pixel in the right panel indicates a significant between-group difference in the TRGC metric after FDR control, due to increased evidence for TRGC from the caudal ACC to the rostral MFG in BD (PFDR = 0.0032). B Same as A but for the beta band, illustrating a significantly larger TRGC metric in BD than HC from the cACC to the rMFG and rACC, as well as from the SFG to the cMFG (PFDR = 0.0032, 0.0064, 0.0064, respectively). C Left: TRGC metric from cACC to rMFG between 8–30 Hz for HC (green line: mean, with SEM as shaded area) and BD (purple line: mean and SEM). The horizontal black line denotes the frequency bins with significant differences after FDR control, as shown in A and B. Middle: Same as the left panel but for the TRGC metric from cACC to rACC, showing beta effects. Right: Same as the left and middle panels, showing larger TRGC values in BD than HC from SFG to cMFG in the beta range. Labels: rACC, rostral anterior cingulate cortex; cACC, caudal ACC; cMFG, caudal middle frontal gyrus; rMFG, rostral MFG; lOFC, lateral orbitofrontal cortex; mOFC, medial OFC; SFG, superior frontal gyrus; M1, primary motor cortex.
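The TRGC comparisons were evaluated on the above-diagonal entries of the ROI-by-ROI matrices after FDR control. A minimal sketch of the Benjamini-Hochberg step-up adjustment follows, assuming that is the FDR procedure used (this section does not name it); `benjamini_hochberg` is a hypothetical helper that takes one p-value per tested connection and frequency bin.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        # Scale by m / rank, then enforce monotonicity so adjusted values never decrease with rank
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is declared significant when its adjusted value falls below the chosen FDR level (for example, 0.05), which controls the expected proportion of false discoveries among the significant connections rather than the probability of any single false positive.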

Discussion

In our reward-based motor decision-making task, euthymic BD participants showed lower win rates and a decreased tendency to repeat rewarded actions compared with HC, despite similar post-loss decision-making behaviour. Furthermore, employing the HGF to probe the computational processes underpinning decision-making, we found that BD participants expected more environmental volatility than HC, leading to a more stochastic mapping from beliefs to actions and higher switch rates, particularly after wins. These findings align with previous reports of heightened risk-taking and inconsistent behaviour in BD [89, 90], mirroring elevated win-switch tendencies in BD adolescents [91] and deficits in response reversal during remission [3, 4]. This suggests that decisions in euthymic BD are misaligned with participants' beliefs about recent successes, favouring suboptimal actions due to an overestimation of environmental changes, potentially overriding the influence of their beliefs about action-outcome contingencies on decisions.
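The behavioural indices discussed here (win-stay, lose-shift, and overall switch rates) are conditional frequencies over consecutive trials. As a sketch, assuming choices recorded trial by trial together with a binary win/lose outcome, they could be computed as follows; the function name and input format are hypothetical.

```python
def choice_rates(choices, wins):
    """Win-stay, lose-shift and overall switch rates from trial-wise choices and outcomes.

    choices: list of chosen options; wins: list of booleans (True if that trial was rewarded).
    Each rate is conditioned on the outcome of the previous trial.
    """
    stay_after_win = shift_after_loss = n_win = n_loss = switches = 0
    for prev_choice, prev_win, choice in zip(choices, wins, choices[1:]):
        stayed = choice == prev_choice
        switches += not stayed
        if prev_win:
            n_win += 1
            stay_after_win += stayed
        else:
            n_loss += 1
            shift_after_loss += not stayed
    n_transitions = len(choices) - 1
    return {
        "win_stay": stay_after_win / n_win if n_win else float("nan"),
        "lose_shift": shift_after_loss / n_loss if n_loss else float("nan"),
        "switch": switches / n_transitions if n_transitions > 0 else float("nan"),
    }
```

For example, choices A, A, B, B, A with outcomes win, loss, win, win, loss give a win-stay rate of 2/3, a lose-shift rate of 1, and a switch rate of 0.5.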

Despite expecting increased volatility, BD participants were slower to adjust their expectations of action-outcome contingencies compared to HC, with a lower tonic volatility parameter ω2 indicating slower adaptation. Similar results in HGF modelling for paranoia [32] suggest that this propensity to anticipate change without learning from it appropriately may be a common feature across paranoia and BD. Additionally, despite similar residual symptom levels in BD and HC, trait anxiety in BD correlated with volatility estimates, aligning with findings that high trait anxiety exacerbates difficulties in adapting to environmental changes [45, 92].

Of relevance in BD, we observed that residual mania symptoms correlated negatively with estimation uncertainty, σ2, which scales the influence of PEs about action-outcome contingencies on level-2 belief updating. This suggests that participants with higher mania scores updated these predictions less readily. Given that early relapse in BD has been associated with reduced belief updating in response to positive feedback, as measured empirically [13], future work could investigate whether computational metrics of belief updating such as σ2 improve prediction of clinical progression beyond behavioural indicators. Further investigations should also explore the effect of comorbid anxiety on volatility responses and relapse.

Despite deficits in decision-making and baseline executive function in our BD sample, motor performance invigoration was comparable to HC, indicating preserved motivational drive in euthymic BD. This contrasts with previous findings that rewards and success amplify energy and effort in BD [4, 56]. Our Bayesian analyses revealed a similar sensitivity of performance tempo to expectations about reward contingencies in both groups, highlighting that the alterations in euthymic BD were confined to decision-making processes.

On a neural level, convolution modelling of source-reconstructed time-frequency activity revealed that BD individuals had attenuated neural representations of unsigned pwPEs updating beliefs about action-outcome contingencies, compared with HC. This was marked by attenuated gamma responses and weaker alpha-beta suppression 0.5–0.9 s post-outcome across multiple PFC, OFC, ACC, and motor regions. Spatial effects in anatomical PFC labels corresponded with functional vmPFC, dmPFC, and dlPFC, aligning with the neural correlates of decision-making under uncertainty [43, 72,73,74] and with BD-specific neural alterations during reward processing [5, 9, 76, 77].

Recent rhythm-based formulations of predictive coding suggest distinct roles of oscillatory activity at different frequencies in conveying predictions and PE during perception [42, 93]. Alpha and beta oscillations in deep cortical layers are implicated in conveying top-down predictions, while gamma oscillations in superficial layers are associated with the representation of PE, particularly in sensory cortices and related areas [42, 93]. This division has received empirical validation in both human and animal studies, supporting generative models like predictive coding [94,95,96] and hierarchical Bayesian inference [44, 97], extending across perceptual and cognitive domains [43, 98]. In models of hierarchical Bayesian inference, like the HGF, these oscillatory activities may underpin pwPE encoding, demonstrating an antithetical modulation of alpha/beta and gamma activity [44]. The observed dysregulation of these rhythms in conditions like anxiety [45, 46] suggests a neurophysiological basis for symptoms resulting from imbalances in belief updating.

Our findings indicate that in euthymic BD, exacerbated alpha and beta activity may inhibit gamma activity during unsigned pwPE encoding, potentially accounting for maladaptive belief updating. This may reflect an under-reliance on predictions about action-outcome contingencies to optimise behaviour, in line with the computational results. Such rhythmic changes match electrophysiological evidence of heightened beta and reduced gamma activity in BD during oddball processing [99, 100]. Moreover, using TRGC to assess directional influences in frequency-domain activity, we observed stronger evidence for beta-band directional flow in BD than in HC, from cACC to rACC and rMFG, and from SFG to cMFG, during trials with larger unsigned pwPEs. TRGC values increased in BD but decreased in HC, in line with primate research showing that beta-band Granger causality in the PFC decreases during unpredictable trials, which suggests that the decrease observed in HC reflects a normative response [42]. Thus, euthymic BD was associated with altered frequency-domain amplitude changes and functional connectivity during belief updating.

Insufficient GABAergic neurotransmission and excessive glutamatergic activity have been linked to electrophysiological alterations in BD [101]. Considering that mood stabilisers for BD, such as valproate and lithium, may have opposing effects on beta activity and potentially on beta/gamma connectivity [101, 102], a promising avenue for future research is to assess the modulation of alpha-beta and gamma amplitude and connectivity during pwPE encoding in BD as potential markers for tracking treatment response and for diagnostic purposes.

A key limitation of our study is the inclusion of patients on diverse psychiatric medications, including mood stabilisers, antipsychotics, and antidepressants. These treatments affect various neurotransmitter systems, such as dopamine and serotonin, influencing neural and behavioural aspects of decision-making [50, 103]. The varied effects of these medications may have influenced the magnitude of the reported effects, a factor future research should consider. However, control analyses showed that medication types did not account for differences in the main or exploratory computational analyses. Additionally, the study was not preregistered, yet all analyses followed established pipelines from our recent work involving similar tasks [45, 57], except for the TRGC analysis, which was specifically designed based on similar Granger-causality analyses that assess rhythm-based hypotheses of predictive processing [42]. Lastly, our study did not contrast the HGF framework with alternatives such as Bayesian change-point models [104] or models that jointly estimate volatility and stochasticity [23]. Future research would benefit from such comparisons to validate the computational processes underlying belief-updating alterations in BD.
Integrating dynamic models of mood in BD [16, 38, 39] with the HGF framework in longitudinal studies will also be crucial for determining how markers identified in euthymic BD—overestimation of volatility and lower ω2—relate to changes across BD episodes, including depression, mania/hypomania, and mixed states. We tentatively propose that overestimation of volatility may be a trait marker of BD, with a stronger effect on reward or punishment learning depending on the episode.

In sum, our findings highlight significant alterations in belief updating among BD individuals during euthymia, when learning reward-based probabilistic mappings in volatile environments, without affecting the motivational aspects of motor execution. Importantly, the identification of frequency-domain amplitude and functional connectivity alterations underpinning these computational maladaptations provides crucial insights for enhancing relapse prediction and monitoring treatment response in future research.