Abstract
Goal-directed learning arises from distributed neural circuits including the prefrontal, posterior parietal and temporal cortices. However, the role of cortico-cortical functional interactions remains unclear. To address this question, we integrated information dynamics analysis with magnetoencephalography to investigate the encoding of learning signals through neural interactions. Our findings revealed that information gain (the reduction in uncertainty about the causal relationship between actions and outcomes) is represented over the visual, parietal, lateral prefrontal and ventromedial/orbital prefrontal cortices. Cortico-cortical interactions encoded information gain synergistically at the level of pairwise and higher-order relations, such as triplets and quadruplets. Higher-order synergistic interactions were characterized by long-range relationships centered in the ventromedial and orbitofrontal cortices, which served as key receivers in the broadcast of information gain across cortical circuits. Overall, this study provides evidence that information gain is encoded through synergistic and higher-order functional interactions and is broadcast to prefrontal reward circuits.
Introduction
A key factor of human agency is the ability to form beliefs about the consequences of our actions. This ability provides the basis for rational decision-making and, more generally, allows people to engage in meaningful life and social interactions throughout their lives1. Learning the causal relation between actions and outcomes is thought to be supported by the goal-directed system2,3,4. Neurally, goal-directed behaviours emerge from the coordinated activity of neural populations distributed over the associative fronto-striatal circuit5 and the limbic “reward” system5. More specifically, human neuroimaging and primate neurophysiology have shown that goal-directed learning recruits large-scale cortical circuits including the lateral and medial prefrontal cortex (lPFC and mPFC), posterior parietal cortex, orbito-frontal cortex (OFC), temporal gyrus, visual cortex, and temporal parietal junction6,7,8,9,10,11,12,13,14,15.
Goal-directed learning is rooted in the balance between reward maximisation and information seeking processes16,17,18,19,20,21,22. Reward maximisation is formalised using associative23 and reinforcement learning models24,25, where agents learn by maximising cumulative rewards. The update of action values is driven by reward prediction error (RPE) signals, which indicate whether an outcome is better or worse than expected. At the neural level, RPE signals are encoded by neurons in the midbrain, ventral striatum and ventromedial prefrontal cortex (vmPFC)26,27,28,29,30,31,32. In parallel, information seeking processes support learning by reducing the uncertainty in the causal relation between actions and outcomes, through signals quantifying the degree of surprise and information gain17,33. Information gain (IG) is formalised in model-based RL algorithms or Bayesian models17,33,34,35,36,37,38,39 by Bayesian surprise40,41, which quantifies how much the agent’s belief changes given a new observation. Classical associative learning models formalise this process using the concepts of “surprisingness” or “associability” of events42,43,44. At the neural level, IG has been mapped to the activity of distributed brain networks including the middle frontal gyrus, the insula and the intraparietal sulcus36,39,45,46,47. Neural correlates of subjective value of information in instrumental settings (i.e., information that can be used to guide future actions and future outcomes) have been observed in a distributed network including the ventral striatum, the vmPFC, the middle and superior frontal gyrus (i.e., the dorsolateral prefrontal cortex dlPFC), and posterior cingulate cortex48,49,50.
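Bayesian surprise can be made concrete with a short sketch. The function below is a minimal illustration (not the models cited above): it computes the Kullback-Leibler divergence, in bits, between an agent's posterior and prior beliefs, here represented as discrete distributions over five candidate actions.

```python
import numpy as np

def bayesian_surprise(prior, posterior, eps=1e-12):
    """KL divergence D(posterior || prior) in bits: how much the
    agent's belief changed after observing a new outcome."""
    prior = np.asarray(prior, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    return float(np.sum(posterior * np.log2((posterior + eps) / (prior + eps))))

# Example: a uniform belief over 5 actions sharpens after an informative outcome.
prior = np.full(5, 0.2)
posterior = np.array([0.6, 0.1, 0.1, 0.1, 0.1])
ig = bayesian_surprise(prior, posterior)   # positive: the belief changed
```

An unchanged belief yields zero surprise, while any belief update yields a strictly positive value, matching the intuition that IG indexes the magnitude (not the sign) of the update.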
Accumulating evidence suggests that the brain encodes update signals supporting reward maximisation and information gain during goal-directed learning. However, the role of cortico-cortical functional interactions underlying goal-directed learning signals remains elusive. To address this issue, we used information decomposition techniques to investigate whether cortico-cortical functional interactions, defined as statistical relationships between the activity of different cortical regions51, encode learning signals for reward maximisation and information gain. In particular, we tested two hypotheses about the nature of functional interactions. First, we tested whether brain circuits encode information about cognitive processes by means of redundant and synergistic functional interactions52,53. Redundant encoding enhances robustness to perturbations by ensuring that critical information is consistently duplicated, and also facilitates the downstream readout of information54,55. However, it may lead to inefficiencies by consuming excessive resources and limiting overall encoding capacity. Synergistic encoding enhances flexibility by supporting the emergence of novel information and richer representations by means of complementary encoding. Yet, it is less robust, because it relies on the precise integration of multiple inputs to generate new, emergent information. How the brain trades off redundant and synergistic encoding remains unaddressed.
The second hypothesis is that cognitive functions emerge from higher-order brain interactions, beyond pairwise relations51,56,57,58,59,60. Generally, higher-order functional interactions have the potential to link multiple neurons or brain regions in complex ways, and they can generate emergent phenomena such as information integration, cognitive flexibility, and information broadcasting51,60,61,62,63. The two hypotheses (trade-offs of synergy/redundancy and higher-order interactions) are not mutually exclusive and may allow an efficient encoding of information revelation for learning. More specifically, we expected to identify a distributed synergistic encoding and broadcasting of learning signals over multiple brain areas of the goal-directed and reward circuits in the prefrontal cortex.
Here, we used an experimental task design that manipulates learning and induces highly reproducible explorative strategies across participants and sessions. We combined information decomposition techniques based on partial information decomposition64,65,66 and source-level high-gamma activity (HGA, from 60 to 120 Hz) from magnetoencephalography (MEG) to test whether cortico-cortical interactions over learning circuits encode and broadcast reward prediction error and information gain. We found that information gain is encoded in a distributed network including the visual, parietal, lateral prefrontal and ventromedial and orbital prefrontal cortices. Interestingly, information gain was encoded in both redundant and synergistic cortico-cortical functional interactions. These functional interactions displayed complementary local-versus-distributed properties, with synergistic interactions being more prone to long-range coupling. Synergistic interactions displayed higher-order behaviours with a key role played by the prefrontal reward circuit, which acted as a receiver in the network. Overall, we suggest that higher-order synergistic interactions play a role in encoding and propagating learning signals in the brain, with a pivotal role played by the limbic cortical regions.
Results
Behavioural scores, exploration strategy and learning signals
The experimental setup was structured to control the exploratory phase and guarantee consistent performance across sessions and individuals67. The correct stimulus-response associations were not set a priori. Instead, they were assigned as learning proceeded in the task (Fig. 1b). Participants were asked to discover the associations existing among stimuli (coloured circles) and actions (finger movements). The task was controlled in such a way that the first presentation of each stimulus was always followed by an incorrect outcome, irrespective of the subject’s choice (Fig. 1b, trials 1 to 3). On the second presentation of stimulus S1, any new untried finger movement was considered a correct response (trial 4 in Fig. 1b). For the second stimulus S2, the response was defined as correct only once the subject had performed 3 incorrect finger movements (trial 9 in Fig. 1b). For stimulus S3, the subject had to try 4 different finger movements before the correct response was found (trial 14 in Fig. 1b). In other words, the correct response was the second finger movement (different from the first tried response) for stimulus S1, the fourth finger movement for stimulus S2, and the fifth for stimulus S3. This task design assured a minimum of eight incorrect trials during acquisition (1 for S1, 3 for S2 and 4 for S3).
A On each trial, subjects were presented a coloured circle to which they had to respond within 1.5 s. The outcome image was presented after a variable delay ranging from 1.5 to 2.5 s (randomly drawn from a log-normal distribution). The outcome image informed the subject whether the response was correct, incorrect, or too late (if the reaction time exceeded 1.5 s). B Matrix of all the possible stimulus–response combinations, updated according to the exemplar learning session in Fig. 1A for an ideal participant. A red cross and a green tick-mark refer to incorrect and correct stimulus–response sequences, respectively. The first presentation of each stimulus was always followed by an incorrect outcome, irrespective of the motor response (from trials 1 to 3). On the second presentation of the stimulus S1 (the blue circle), any untried finger movement was always followed by a correct outcome (trial 4). The correct response for S2 and S3 (red and green circles, respectively) was found after 3 and 4 incorrect finger movements (at trials 9 and 14, respectively).
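The contingency rules described above can be sketched as a simple outcome function. This is an illustrative reconstruction of the acquisition phase only (stimulus and response labels are hypothetical, and the post-acquisition phase, where the discovered association stays fixed, is omitted).

```python
def outcome(stimulus, response, history):
    """Return 'correct'/'incorrect' under the acquisition-phase rules.

    history: list of (stimulus, response) tuples from previous trials,
    assumed to all be errors (acquisition phase only).
    Rules from the task design: the first presentation of a stimulus is
    always incorrect; thereafter, an untried response is correct once the
    required number of errors has been made (1 for S1, 3 for S2, 4 for S3).
    """
    required_errors = {'S1': 1, 'S2': 3, 'S3': 4}
    past = [r for s, r in history if s == stimulus]   # previous (error) responses
    tried_before = response in past
    if len(past) >= required_errors[stimulus] and not tried_before:
        return 'correct'
    return 'incorrect'
```

For instance, any untried response to S1 on its second presentation is rewarded, whereas S2 requires three prior errors before a new response counts as correct.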
Each participant performed four sessions of 60 trials each (240 trials in total). Each session included three colours randomly presented in blocks of three trials. The average reaction time was 0.504 s ± 0.04 s (mean ± s.e.m.). The total number of error trials prior to the first correct outcome was 9.775 ± 0.38 (mean ± s.e.m.), indicating low variability across participants and sessions and showing that the task produces reproducible behavioural performance across sessions and subjects, as observed in a previous study67. The total number of errors after the first correct outcome was 4.15 ± 0.34, representing ~9% of the trials during the exploitative phase.
Regarding behavioural strategies, participants employed a “tree-search” heuristic, characterised by a tendency to repeat the same finger movement across trials until the correct response was found. This strategy is referred to as “lose-stay”: agents maintain their response when it is not rewarded. Given that stimuli were pseudo-randomised in blocks of three trials, we computed the probability of repeating a given choice within a block of trials. This probability quantified the likelihood of directed, as opposed to random, exploration. For the first block (trials 1 to 3), the lose-stay strategy was adopted in 67.5% ± 6% of cases, whereas in the second block of three trials (trials 4 to 6), it was adopted in 37.5% ± 4.5% of cases.
If the participants had employed a random exploration strategy, the proportion of consecutive trials with identical actions would be equivalent to the ratio of the total number of three-trial patterns with matching actions to the number of possible ordered triplets derived from all combinations of 3 distinct actions chosen from 5 available options (m = 60 combinations). Therefore, the likelihood of encountering a sequence of trials with the same action due to chance alone would be 8.3%. Our results demonstrate that participants used a directed exploration strategy during learning and rapidly attained learning and proficiency during each learning session.
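The chance-level figure quoted above can be reproduced directly from the combinatorics stated in the text:

```python
from math import comb, factorial

n_actions = 5
# Ordered triplets of 3 distinct actions drawn from the 5 options
ordered_triplets = comb(n_actions, 3) * factorial(3)   # m = 60 combinations
# Three-trial patterns in which the same action is repeated on every trial
matching_patterns = n_actions                           # 5
chance_level = matching_patterns / ordered_triplets     # ~0.083, i.e., 8.3%
```

The observed 67.5% repetition rate in the first block is thus roughly eight times what random exploration would predict.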
A Q-learning model24,68 was fitted to the behavioural data to estimate changes in stimulus-action-outcome probabilities and learning signals over trials. The fitted model provided accurate predictions of learning curves, defined as the probability of correct responses over trials (Supplementary Fig. 1). Reward prediction error (RPE) and information gain (IG) signals were extracted from the model. While RPE represents the discrepancy between the received and the anticipated outcome, IG measures the distance between the probability distributions of actions after and before the outcome. RPE embodies a scalar learning signal, taking positive or negative values depending on whether reality surpasses or falls short of expectations, respectively. IG captures the magnitude of information conveyed by an outcome, and the amount of update in the associative model. We should note that RPE and IG do not covary monotonically, but display a U-shaped relationship: errors (i.e., negative RPE) and successes (i.e., positive RPE) are both associated with positive IG. As learning advances, both RPE and IG tend to zero (Supplementary Fig. 2).
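A minimal sketch of how RPE and IG can be extracted from one Q-learning step is given below. The parameter values (alpha, beta) and the softmax-policy form of IG are illustrative assumptions, chosen to match the definition above of IG as a distance between action distributions after and before the outcome; the fitted model in the paper may be parameterised differently.

```python
import numpy as np

def softmax(q, beta):
    """Softmax policy over action values; beta is the inverse temperature."""
    z = np.exp(beta * (q - q.max()))
    return z / z.sum()

def q_update(q, action, reward, alpha=0.3, beta=5.0):
    """One Q-learning step for a single stimulus, returning the updated
    action values together with the RPE and IG learning signals."""
    p_before = softmax(q, beta)
    rpe = reward - q[action]              # reward prediction error
    q = q.copy()
    q[action] += alpha * rpe              # value update
    p_after = softmax(q, beta)
    eps = 1e-12
    # IG as the KL divergence between post- and pre-outcome action policies
    ig = float(np.sum(p_after * np.log2((p_after + eps) / (p_before + eps))))
    return q, rpe, ig
```

Starting from uniform values, a rewarded and a punished first outcome produce RPEs of opposite sign but positive IG in both cases, reproducing the U-shaped relation noted above.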
Brain areas encoding information gain signals during learning
To characterise both the spatial organisation and temporal dynamics of brain areas encoding learning signals, we performed time-resolved mutual information analyses between the across-trials high-gamma activity (HGA) and outcome-related learning signals, such as reward prediction errors (RPE) and information gain (IG). Group-level random-effect analyses, combining permutation tests and cluster-based statistics (see Methods), were used to identify temporal clusters displaying significant encoding of learning signals in HGA. We did not find any cortical area significantly encoding \((p_{cluster} < 0.05)\) RPEs. On the other hand, a large-scale network comprising visual, temporal, parietal and frontal areas displayed significant encoding of IG (Fig. 2). Forty brain regions, approximately 49% of the brain regions in the atlas, were found to significantly encode IG. The strongest and earliest response was observed in the bilateral occipital regions, with a peak centred on the primary visual areas (VCcm and VCrm) of MarsAtlas (see top middle inset of Fig. 2), corresponding to Brodmann areas (BA) 17, 18, 19 and 39. Significant encoding was also observed in the left medial and superior parietal cortices (SPCm and PCm), corresponding to BA 7 and 31, respectively. In the temporal lobe, the right temporal cortices displayed significant encoding of IG. In the right frontal lobe, significant encoding was observed in the right premotor areas (PMdl and PMdm; BA 6 and 8) and the right dorsolateral prefrontal cortices (PFcdm, PFcdl, Pfrdli and Pfrdls), roughly corresponding to BA 45, 46 and 47 and the most posterior parts of BA 9 and 10. Finally, bilateral ventromedial prefrontal cortex (PFCvm; BA 32/10/11), orbitofrontal areas (OFCv, OFCvl and OFCvm; roughly BA 10/11/47) and the insula also displayed significant encoding of IG. Supplementary Fig. 3 shows the same results as Fig. 2, depicted on an inflated brain plot using the MarsAtlas parcellation69.
Regions-of-interest (ROIs) of the MarsAtlas are shown in the inset in the top-central part of the figure. ROIs are grouped in four panels according to lobe (occipital, parietal, temporal and frontal) and hemisphere (left and right panels). In each panel, coloured areas indicate the temporal clusters showing significant encoding of information gain. The p-values for each cluster accounted for multiple comparisons and were thresholded at 0.05. The colormap shows the value of the mutual information (in bits) between the HGA and IG. Time in seconds is on the x-axis with respect to outcome onset (time 0). A large-scale network encoding IG recruits the occipital, parietal, right temporal and right dorsolateral prefrontal areas, the ventromedial prefrontal cortex and orbitofrontal areas.
Overall, IG was encoded in a set of eight cortical subnetworks (or clusters) including bilateral visual areas (the “L Vis” cluster included the left VCcm, VCl, Cu and VCrm; cluster “R Vis” included the right VCcm, VCl, Cu, VCrm and VCs), the right temporal regions (cluster “R Temp” included the right ITCm, ITCr, MTCc, STCc and MTCr), the left superior parietal area (cluster “L Par” included the left PCm and SPCm), the right motor and premotor areas (cluster “R MotPM” included the right Mdl, PMdl and PMdm), the right dorsolateral prefrontal cortices (cluster “R dlPFC” included the right Pfrdli, Pfrdls, PFcdm and PFcdl) and the bilateral ventromedial prefrontal and orbitofrontal areas (cluster “R vmPFCOFC” included the right OFCvl, OFCv, OFCvm, PFCvm, whereas the cluster “L vmPFCOFC” included the left OFCvl, OFCv, OFCvm and PFCvm). Such grouping was based on anatomical constraints, i.e., areas belonging to the same lobe or anatomical region.
To characterise the time course of learning-related signals encoding IG, we averaged the mutual information across areas within each of the eight clusters of cortical regions displaying significant effects, thereby extracting an average time course for each anatomical cluster (Fig. 3). The visual (Fig. 3A, B), temporal (Fig. 3C) and parietal (Fig. 3D) clusters displayed a time course characterised by a fast and transient increase in MI, peaking around 0.2-0.3 s after outcome onset.
The time axis indicates the time from outcome presentation in seconds. A, B Left and right visual areas, respectively; (C) Right temporal areas; (D) Left parietal areas; (E) Right motor and premotor areas; (F) Right dorsolateral prefrontal cortex; (G, H) Right and left ventromedial prefrontal and orbitofrontal cortices, respectively. Shaded areas are standard errors of the mean computed across ROIs within each cluster.
The encoding in temporal regions peaked around 0.3 s after outcome onset (Fig. 3C) and returned to baseline around 0.8 s after outcome onset. The premotor and dorsolateral regions of the right hemisphere peaked approximately from 0.3 to 0.4 s after outcome and were characterised by a slower return to baseline (Fig. 3E). Finally, the vmPFC and OFC regions were characterised by an onset at ~0.2 s and an elevated encoding lasting until almost 1 s (Fig. 3F–H).
To differentiate complementary spatio-temporal patterns of encoding of IG, we performed non-negative matrix factorisation (NMF) of the MI values across clusters. The rationale was to decompose the MI time courses into interpretable temporal components and corresponding spatial weights. Figure 4A shows the variance explained by the first ten components of the non-negative matrix factorisation. Based on the so-called elbow method, a standard heuristic for determining the number of components in a dataset, we studied the first four components, whose loadings are depicted in Fig. 4B. Figure 4C shows the variance explained by the top four components, normalised by their total variance (in colour) for each brain area (rows). The first component (in blue) modelled the fastest and most transient encoding of IG and was mainly observed in the visual and parietal areas (blue bars in Fig. 4C). The orange and green components occurred later in the trial, peaking around 0.35 and 0.45 s after outcome onset, and recruited primarily the temporal, premotor and dorsolateral areas. Finally, the fourth component (in red) captured the slowest dynamics, which emerged around 0.5 s and lasted ~0.4 s. This latest component mainly recruited the ventromedial PFC and OFC cluster. Overall, these results show that the encoding of IG recruits a large-scale network of cortical clusters with overlapping temporal and spatial dynamics.
A Variance explained by each component of the non-negative matrix factorisation. B First four loadings of the non-negative matrix factorisation. The colour of each loading indicates the component number (see legend). The time axis indicates the time from outcome presentation in seconds. C Variance explained by the top four components, normalised by their total variance. Each row shows the percentage of variance explained by each component across all brain regions. The colour of each bar indicates the component number (see legend in panel B).
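The NMF decomposition of an MI matrix (regions x time points) can be sketched with the classic multiplicative-update rules; this is a self-contained illustration, not the implementation used for Fig. 4, and the matrix shapes are hypothetical.

```python
import numpy as np

def nmf(X, k, n_iter=500, seed=0):
    """Non-negative matrix factorisation by multiplicative updates
    (Lee & Seung). X (regions x times) ~= W (regions x k) @ H (k x times);
    rows of H are temporal components, columns of W their spatial weights."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    eps = 1e-12
    for _ in range(n_iter):
        # Alternating updates that monotonically reduce the Frobenius error
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

On data generated from known non-negative factors, the reconstruction W @ H recovers the input closely, which is the property the variance-explained curve in Fig. 4A summarises component by component.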
Cortico-cortical interactions mediating information gain
We then tested the hypothesis that information gain is encoded at the level of cortico-cortical interactions, in addition to local activations. To do so, we exploited recent advances in information theory that allow isolating the contribution of network-level interactions from those carried at the node level. We decomposed the information carried by pairs of brain regions about IG into redundant (i.e., shared or common between regions) and synergistic (i.e., requiring their joint combination) terms, as formalised by the Partial Information Decomposition (PID) framework64,65. PID breaks down the multivariate mutual information between a set of predictor variables and a “target” variable into distinct, non-negative components quantifying the unique, redundant, and synergistic contributions of the predictors in conveying information about the target. In our analysis, we measured the redundant and synergistic information provided by pairs of brain regions (predictors) about IG (the target variable). The predictor variables consisted of the high-gamma activity (HGA) recorded across trials from two brain regions, while the target variable represented the trial-by-trial evolution of IG. Redundant information captures the contribution to IG that the two brain regions share. In contrast, synergistic information reflects the additional information available only when both regions are observed together, which cannot be obtained from either one individually.
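The decomposition can be illustrated on discrete toy data using the minimum-mutual-information (MMI) redundancy, one common choice of redundancy function (Barrett's MMI PID). This is a simplified sketch: the paper's estimators operate on continuous HGA, but the lattice identity relating redundancy, unique information and synergy is the same.

```python
import numpy as np

def mi(xs, ys):
    """Exact discrete mutual information (bits) from samples."""
    xs, ys = np.asarray(xs), np.asarray(ys)
    total = 0.0
    for x in np.unique(xs):
        for y in np.unique(ys):
            pxy = np.mean((xs == x) & (ys == y))
            if pxy > 0:
                total += pxy * np.log2(pxy / (np.mean(xs == x) * np.mean(ys == y)))
    return total

def pid_mmi(x1, x2, y):
    """Two-source PID with the MMI redundancy: redundancy is the minimum
    of the single-source informations; synergy follows from the joint
    information via the PID lattice identity."""
    i1, i2 = mi(x1, y), mi(x2, y)
    # Encode the source pair as one discrete variable (assumes values < 10)
    joint = [a * 10 + b for a, b in zip(x1, x2)]
    i12 = mi(joint, y)
    red = min(i1, i2)
    unq1, unq2 = i1 - red, i2 - red
    syn = i12 - unq1 - unq2 - red
    return {'red': red, 'unq1': unq1, 'unq2': unq2, 'syn': syn}
```

The canonical examples behave as expected: a target equal to the XOR of two sources is purely synergistic (neither source alone is informative), whereas a target copied into both sources is purely redundant.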
Significant redundant information about IG was observed across multiple brain regions (Fig. 5A). The strongest redundant connectivity was observed earliest over bilateral visual areas during the first 0.2 s after outcome presentation (left panel of Fig. 5A). Redundant connectivity then spread over the right temporal lobe, right motor and premotor cortices, and vmPFC and OFC in the time interval between ~0.2 and 0.8 s. Redundant connectivity over the vmPFC-OFC cluster was long-lasting and faded away ~0.8 to 1 s after outcome (rightmost panel of Fig. 5A). Interestingly, we observed strong synergistic interactions between specific pairs of brain regions (Fig. 5B), indicating that these regions jointly contribute information about the target variable in a way that neither region provides independently. Synergistic connectivity appeared to link distant areas of the visual system, the right temporal areas, the right dlPFC and the left vmPFC and OFC. The peak of interaction strength occurred from 0.2 to 0.5 s after outcome presentation.
Redundant (A) and synergistic (B) pairwise interactions significantly carrying information about information gain signals. Each panel shows the average value in 0.2 s time windows aligned on outcome presentation. The labels of the regions of interest plotted in a circle are identical to those of Fig. 2. The position of each brain region over the cortical sheet and lobe can be seen in the inset of Fig. 2. The p-values for each cluster accounted for multiple comparisons and were thresholded at 0.05. The colour of each node indicates its association with a particular lobe (visual in purple, temporal in red, parietal in blue, and frontal in green).
Redundant connectivity appeared to predominantly recruit pairs of brain regions within the same region or cluster, whereas synergistic connectivity appeared to be stronger across systems. To verify this effect, we computed the average redundant and synergistic connectivity between pairs of brain regions within and across anatomical clusters in the visual, temporal, motor and premotor, dlPFC and vmPFCOFC clusters. The visual cluster included the left and right VCcm, VCl, Cu and VCrm areas; the temporal cluster included the right ITCm, ITCr, MTCc, STCc and MTCr; the motor and premotor cluster included the right Mdl, PMdl and PMdm; the dorsolateral PFC included the right Pfrdli, Pfrdls, PFcdm and PFcdl; and finally the vmPFCOFC region included the bilateral OFCvl, OFCv, OFCvm and PFCvm. Indeed, we found that redundant connectivity showed a stronger average value across pairs of brain regions within the same module (Fig. 6A), whereas synergistic coupling showed a stronger effect across modules (Fig. 6B).
Average redundant (A) and synergistic (B) connectivity computed over pairs of brain regions belonging to the same (within) or different (across) clusters. The “within-cluster” connectivity is represented by a solid line, while the “across-cluster” connectivity is depicted using a dotted line. Mean and standard error of the mean are plotted as solid lines and shaded areas, respectively. The time axis indicates the time from outcome presentation in seconds. Please note that the scales differ across panels (A, B).
By definition (Eq. 9), redundant functional interactions represent the smallest amount of information about learning-related variables that is already present in each individual brain area. In other words, they quantify the overlap in information shared by brain regions, reflecting the common contribution that each region independently provides about the learning process. On the other hand, synergistic connectivity reflects functional interactions that arise only through the combined activity of multiple brain areas, rather than being explained by the individual contributions of each region alone. Significant synergistic functional interactions therefore reveal the presence of collective behaviour encoding IG that emerge from the coordination of distant brain areas and circuits. This highlights the important role of collective and synergistic processes beyond the contribution of individual brain areas and redundant information encoding.
Higher-order cortico-cortical interactions and information gain
Next, we investigated whether information relevant to cognitive functions, such as IG, is encoded in a distributed manner across multiple brain regions, extending beyond simple pairwise interactions to higher-order dynamics51,56,57,58,59,60. Building on recent studies that emphasize higher-order correlations beyond pairwise interactions as a potential foundation for emergent phenomena59,61,70, we examined whether cortico-cortical interactions encode IG through higher-order co-modulations. We anticipated a central role for brain regions involved in reward and goal-directed circuits, such as the vmPFC, OFC, and dlPFC, potentially carrying higher-order synergistic information through interactions among multiple regions.
To do so, we studied higher-order relations between the eight cortical clusters encoding IG (Figs. 3 and 4). Since each cluster independently encodes information gain, they are also expected to contribute redundant information at higher-order relational levels. However, higher-order synergistic connectivity, representing information that emerges from collective patterns beyond pairwise relationships, requires further investigation. Cluster-based statistical analyses with correction for multiple comparisons (across all multiplets at all orders) were performed to identify significant higher-order encoding of IG. Statistical analyses revealed a total of 15 pairs, 7 triplets and 4 quadruplets that significantly encoded IG (Fig. 7A). Higher-order synergistic co-modulations carrying information about IG were thus found up to order 4. Given that we analysed 8 clusters, the percentage of significant pairwise links, triplets and quadruplets was 53%, 12.5% and 6%, respectively. We next asked whether higher-order functional interactions were associated with hypergraph or simplicial-complex representations. As a reminder, a hypergraph is a description of higher-order interactions in which a hyperedge links k nodes, representing a k-way interaction. A simplicial complex is a special case of a hypergraph in which all possible lower-order interactions among the same set of nodes also occur71,72. In other words, hyperedges without their lower-level interactions may reflect a stronger “marker” of emergent collective behaviour, because they lack lower-level components. We found that 6 out of 7 triplets were composed of significant pairwise links, and therefore reflected simplicial complexes (Fig. 7A). Only one triplet (L Vis - R MotPM - L vmPFCOFC) was composed of nodes that did not display all pairwise interactions, therefore reflecting a stronger emergent form of collective behaviour. The four quadruplets were not composed of all lower-order triplets.
This can be observed in Supplementary Fig. 4, which depicts the same results but using hypergraph representations. Overall, these results indicate that IG is encoded in a distributed fashion through synergistic interactions that extend beyond individual brain regions, revealing genuine higher-order synergy.
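The simplicial-versus-hyperedge distinction used above can be checked mechanically. The helper below is illustrative (node labels are hypothetical) and tests only the immediate (k-1)-faces of a multiplet, matching the criterion applied to the triplets and quadruplets in Fig. 7A.

```python
from itertools import combinations

def is_simplicial(multiplet, significant_sets):
    """True if every (k-1)-subset of a k-multiplet is itself a significant
    interaction; a multiplet failing this check is a 'purer' marker of
    emergent higher-order structure, lacking lower-level components."""
    sig = {frozenset(s) for s in significant_sets}
    return all(frozenset(sub) in sig
               for sub in combinations(multiplet, len(multiplet) - 1))
```

For example, a triplet whose three pairwise links are all significant forms a simplicial complex, while a triplet with a missing pairwise link does not.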
A Time course of t-values showing significant encoding of IG at the level of pairs of clusters (black labels), triplets (blue labels) and quadruplets (red labels). We then computed the synergistic weight, defined as the sum of the synergistic information for each cluster; this value reflects the amount of contribution to synergistic encoding across scales. B Synergistic weight of each cluster at the level of pairwise interactions. C, D Cluster weights for triplets and quadruplets, respectively. The time axis indicates the time from outcome presentation in seconds. In panels (B−D), clusters are ranked in ascending order based on their maximum synergistic weight value.
Significant temporal clusters were mainly observed over the interval between 0.1 s and 0.6 s, with no clear temporal differentiation between pairwise and 3rd- or 4th-order interactions. To better characterise the participation of clusters in each multiplet, we computed the synergistic weight, defined as the summed synergistic connectivity. Figure 7B shows the nodal synergistic weight at the pairwise level, highlighting that the bilateral visual clusters exhibit the highest values. At higher orders, however, the visual clusters were accompanied by the left vmPFC-OFC, which dominated both triplet (Fig. 7C) and quadruplet (Fig. 7D) interactions. This indicates that the left vmPFC-OFC, a key region in the reward circuits, gained prominence at higher orders of synergistic interactions.
Cortico-cortical information transfer encoding learning signals
Previous results highlighted a time-lagged encoding of IG from visual to prefrontal areas, suggesting a feedforward broadcasting of IG signals to prefrontal regions (Figs. 3 and 4). To test this hypothesis, we exploited a recently developed information-theoretic measure termed Feature-specific Information Transfer (FIT), which quantifies how much information about specific features flows between brain areas73. FIT merges the Wiener-Granger causality principle74,75,76 with content specificity based on the PID framework64,65. It isolates the information about a specific task variable Y (here, information gain) that is encoded in the current activity of a receiving neural population, that was not encoded in its past activity, and that was instead encoded by the past activity of the sending neural population. We used the FIT measure to quantify the broadcasting of IG between the eight clusters of brain regions. Four directional relations significantly encoded IG and broadcast information to the vmPFC and OFC (Fig. 8). Temporal information is aligned to the point of view of the receiving brain area. The IG-specific information transfer peaked around 0.35 to 0.4 s after outcome onset, roughly corresponding to the first peak in HGA observed locally. Two patterns of interactions converged towards the left and right vmPFC-OFC, respectively. These “colliding” patterns included two higher-order synergistic multiplets, namely a triplet including the right visual and temporal regions with the left vmPFC-OFC, and a quadruplet including right visual and temporal regions with the bilateral vmPFC-OFC (Fig. 7).
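FIT builds on the Wiener-Granger principle; its simpler ancestor, transfer entropy, conveys the directional logic and can be sketched for discrete data. Note that this is not FIT itself, which additionally isolates feature-specific (here, IG-related) content via the PID; the sketch below quantifies only the overall directed information flow.

```python
import numpy as np

def cond_mi(x, y, z):
    """Discrete conditional mutual information I(X;Y|Z) in bits."""
    x, y, z = map(np.asarray, (x, y, z))
    total = 0.0
    for zv in np.unique(z):
        mask = z == zv
        pz = mask.mean()
        xs, ys = x[mask], y[mask]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                pxy = np.mean((xs == xv) & (ys == yv))
                if pxy > 0:
                    total += pz * pxy * np.log2(
                        pxy / (np.mean(xs == xv) * np.mean(ys == yv)))
    return total

def transfer_entropy(sender, receiver):
    """Wiener-Granger-style directed measure with lag 1:
    TE(sender -> receiver) = I(R_t ; S_{t-1} | R_{t-1})."""
    return cond_mi(receiver[1:], sender[:-1], receiver[:-1])
```

When the receiver simply copies the sender with a one-sample lag, the forward transfer is close to one bit while the reverse transfer stays near zero, recovering the expected directionality.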
Discussion
Information seeking and directed exploration in goal-directed learning
Goal-directed learning is defined as the ability to learn the causal relationship between our behaviours and their outcomes; this supports the selection of actions according to expected outcomes, as well as current goals and motivational state2,3,4. Goal-directed learning relies on the balance between exploitation and exploration strategies16,17,18,19,20,21. A trade-off between choosing the most valuable option and sampling informative options is central to successful goal-directed learning and is one of the main foci of reinforcement learning and Bayesian learning theories25,33,77. Exploration in humans has been shown to rely on a mixture of random and directed strategies78,79. Random exploration is a deviation from the most rewarding option, and it is normally formalised using the softmax choice rule, which models the degree of randomness (decision noise). Directed exploration selectively samples informative options (information seeking), i.e., those associated with the highest uncertainty.
Here, we have addressed the neural computations supporting information seeking during goal-directed learning in terms of information gain and directed exploration. To investigate directed exploration and information gain, we used a learning task previously developed for fMRI67 that manipulates learning and induces reproducible explorative strategies across subjects and sessions. Learning was characterised by a directed exploration strategy based on a "tree-search" or "lose-stay" pattern, in which participants tended to repeat the same action after incorrect outcomes and to span all options in an ordered manner. Given that the correct action-outcome association was deterministic, this directed exploration strategy pertained only to the initial phase of the task, up to the first correct outcome. Subsequently, participants made relatively few maintenance errors and engaged in an exploitative strategy. From the computational point of view, information seeking and exploration are formalised using model-based RL algorithms or Bayesian models33,35,36,37,38,77,80, and they have been proposed to play a general role in curiosity-driven learning17,34. The value of actions is not solely determined by the expected utility (or extrinsic value), but also by the informative (intrinsic, epistemic, or non-instrumental) value that is expected to be gained, which helps reduce uncertainty about environmental contingencies16,33,81,82. Regarding the update of action values during learning, we propose that directed exploration involves at least two processes: (i) updating the expected reward (extrinsic) value of an action via reward prediction errors (RPE), and (ii) updating the (intrinsic or informative) value of an action using an information gain (IG) signal. IG signals track the trial-by-trial evolution of information gain during directed exploration.
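The two update processes can be illustrated with a minimal Q-learning sketch (softmax policy with inverse temperature beta; parameter values are illustrative, not the fitted ones, and IG is operationalised here as the KL divergence between the post- and pre-update policies):

```python
import numpy as np

def softmax(q_row, beta):
    """Map the Q-values of one stimulus to an action-selection policy."""
    z = beta * (q_row - q_row.max())        # subtract max for stability
    p = np.exp(z)
    return p / p.sum()

def update(q, s, a, r, alpha, beta):
    """(i) RPE-based value update; (ii) IG as the change in the policy."""
    prior = softmax(q[s], beta)             # policy before the outcome
    rpe = r - q[s, a]                       # reward prediction error
    q[s, a] += alpha * rpe                  # Rescorla-Wagner update
    post = softmax(q[s], beta)              # policy after the outcome
    ig = np.sum(post * np.log2(post / prior))  # KL(post || prior), in bits
    return rpe, ig

# Toy run: 3 stimuli x 5 finger movements, as in the task
q = np.zeros((3, 5))
alpha, beta = 0.3, 4.0
rpe, ig = update(q, s=0, a=2, r=1.0, alpha=alpha, beta=beta)
```

After a single correct outcome the RPE is maximal and the policy shifts towards the rewarded finger, yielding a positive IG; as learning converges, both quantities shrink towards zero.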
This information is used to reduce uncertainty about the causal relationship between actions and outcomes, thereby decreasing the agent’s uncertainty about the world. Within the taxonomy of surprise definitions, our study focuses on the subcategory known as “belief-mismatch surprise”80, which we refer to as information gain. A limitation of our study is the inability to link the neural results to alternative measures of information gain within the belief-mismatch surprise category, such as the minimization of free energy83.
Distributed spatio-temporal cortical activations encoding information gain
To study how IG is encoded in the brain, we performed information theoretical analyses of single-trial and cortical-level high-gamma activity (HGA) from MEG. A network comprising visual, temporal, parietal and frontal areas was found to encode IG during goal-directed learning (Fig. 2). At the circuit level, the results suggest that information gain is encoded in the dorsal and ventral circuits of the goal-directed system14,84. The dorsal circuit, comprising the dorso-lateral prefrontal cortex (dlPFC) and the inferior parietal cortices, is thought to primarily encode information relevant for actions and control behaviour to achieve goals. The ventral circuit, comprising the ventro-medial prefrontal cortex (vmPFC) and orbito-frontal cortex (OFC), is suggested to primarily learn to assign and update subjective values to potential outcomes and motivational value of objects in the environment, owing to visual input from the anterior temporal cortex and connections with the hypothalamus13. Our results show that IG is encoded over both the ventral and dorsal circuits without a clear dissociation between circuits. Our findings confirm that IG and Bayesian surprise are represented in distributed brain networks including the middle frontal gyrus, the insula and the intraparietal sulcus36,39,45,46,47. Indeed, fMRI activity encoding belief updates (Bayesian surprise) has been observed in the midbrain and ventral striatum, in addition to the presupplementary motor area (pre-SMA), dorsal anterior cingulate cortex, posterior parietal cortex and lateral prefrontal cortex85,86. Our results complement fMRI studies showing correlates of information value in instrumental settings (i.e., information that can be used to guide future actions and future outcomes) in the ventral striatum, vmPFC, the middle and superior frontal gyrus (i.e., the dorsolateral prefrontal cortex dlPFC), and posterior cingulate cortex48,49,50. 
Further work is needed to fully appreciate the relation between the processing of information about outcomes and action selection during information seeking.
At the hemispheric level, a right-lateralized activation was observed (Fig. 2 and Supplementary Fig. 3). This result may relate to established literature showing a right-hemisphere dominance for stimulus-driven shifts of spatial attention and target detection87,88 and activation patterns observed in the ventral attentional network89. This result also resonates with associative learning models formalising attentional processes during learning using concepts such as “surprisingness” or “associability” of events44. We propose that attentional processes, along with the neural circuits involved, contribute to information-seeking behaviour, although the exact relationship between them remains unclear.
At the regional level, a key finding was the encoding of IG in the vmPFC and OFC. This suggests a link to the reward circuits in limbic regions classically associated with the encoding of reward prediction error (RPE) signals26,27,28,29,30,31,32. Our results are in line with the notion of a "common currency" for information and reward values in the limbic system50,90. These areas are main targets of dopaminergic projections, which, in non-instrumental settings, encode expected advance information about future outcomes and errors in information prediction (information prediction error, reflecting the difference between obtained and expected information gain)81,91,92, as observed in the OFC in human fMRI93 and at frontal EEG electrodes94. The anterior cingulate cortex (ACC) and two subregions of the basal ganglia (the internal-capsule-bordering portion of the dorsal striatum and the anterior pallidum), which signal reward uncertainty and information-anticipatory activity95, may additionally provide a substrate for the common encoding of reward and information value. The underlying neural mechanisms are currently unknown. However, converging evidence supports that subpopulations of midbrain dopamine neurons encoding motivational values96 may contribute to the observed activations in limbic regions. This may explain the assignment of value to uncertainty reduction, integrating the value of information with the value of physical rewards, and may provide a link with the neural circuits for information seeking and curiosity-driven behaviours97.
We suggest that the current results contribute to a comprehensive understanding of the role of the ventromedial PFC and OFC in decision-making and information seeking. It is widely accepted that the vmPFC/OFC plays a key role in learning and decision-making by encoding outcome information and value98,99,100,101. Moreover, during goal-directed exploration, fMRI activity in the rostral frontopolar cortex (RFPC)102 and frontal theta components in EEG have indicated a direct role in directed exploration103,104, rather than random exploration105. Stimulation and inhibition of the RFPC with transcranial direct current stimulation (tDCS) have been shown to respectively increase and decrease the frequency of exploratory choices106,107. These studies put forward the idea that the RFPC is crucial for integrating information about past, present, and future payoffs to arbitrate between exploration and exploitation in human decision-making. Our results suggest that a key computation supporting directed exploration and information seeking is the processing of information gain. Overall, the results provide strong support for the idea that information gain signals are encoded in the human brain. This mediates the brain's capacity to integrate uncertainty-reducing information, as predicted by theories of information-seeking behaviour33,35,36,38,39,46,108 and curiosity17,34. This also aligns with computational models that aim to combine reward maximization with information gain mechanisms17,33.
Our results provide insights into the temporal dynamics of information gain during directed exploration, an aspect that was lacking in previous neuroimaging and primate neurophysiological studies. The strongest and earliest encoding of information gain occurred in occipital regions, followed by the parietal and temporal cortices and the right frontal lobe (including premotor and dorsolateral prefrontal cortices), with a final activation of the ventromedial prefrontal cortex, orbitofrontal areas and insula (Fig. 3). We statistically dissociated four complementary spatio-temporal patterns of HGA encoding IG (Fig. 4). The first component modelled the fastest and most transient encoding of IG, recruiting the visual and parietal areas. The second and third components occurred later in the trial, peaking around 0.35 and 0.45 s after outcome onset, and primarily recruited the temporal, premotor and dorsolateral areas. Finally, the fourth component captured the slowest dynamics, which emerged around 0.5 s and lasted ~0.4 s, involving the ventromedial PFC and OFC cluster. Overall, our results demonstrate that the encoding of IG recruits a large-scale network of cortical clusters with overlapping temporal and spatial dynamics, spanning visual to limbic frontal areas.
Redundant and synergistic functional interactions encode information gain
We next investigated whether and how information relevant for learning, such as IG, is encoded in cortico-cortical interactions. We observed that IG is encoded at the level of cortico-cortical interactions (Fig. 5), in addition to local activations (Fig. 2). We performed information decomposition analyses to quantify the information carried by pairs of brain regions, dissociating the encoding of IG into redundant (i.e., shared or common between regions) and synergistic (i.e., requiring their joint combination) interactions. We observed that IG was encoded both by redundant and synergistic cortico-cortical co-modulations (Fig. 5). The strongest redundant connectivity was observed early, over bilateral visual areas during the first 0.2 s after outcome presentation, and then spread over the right temporal lobe, the right motor and premotor cortices, and the vmPFC-OFC cluster. By definition (Eq. 9), redundant functional interactions equal the minimum of the information carried locally. Redundant functional connectivity therefore reflects the shared information carried by co-modulations in HGA across areas (Fig. 5A). On the other hand, synergistic connectivity analysis reveals functional interactions that cannot be explained by individual brain areas, but that emerge from the collective coordination between areas. Synergistic functional interactions therefore reveal the presence of collective behaviour encoding IG that emerges from the coordination of distant brain areas and circuits (Fig. 5B). Contrary to redundant connectivity, synergistic coupling showed a stronger effect across clusters than within clusters (Fig. 6), suggesting a more distributed and long-range nature of synergistic functional interactions.
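As a toy illustration of this decomposition, the discrete sketch below uses the minimum-mutual-information (MMI) definition of redundancy, consistent with the statement that redundancy equals the minimum of the information carried locally; an XOR relation produces a purely synergistic code. The paper's estimator operates on continuous HGA and is not reproduced here:

```python
import numpy as np
from itertools import product

def mi(x, y):
    """Plug-in mutual information (bits) between discrete 1-D arrays."""
    m = 0.0
    for a, b in product(np.unique(x), np.unique(y)):
        pxy = np.mean((x == a) & (y == b))
        if pxy > 0:
            m += pxy * np.log2(pxy / (np.mean(x == a) * np.mean(y == b)))
    return m

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 4000)    # binary feature (e.g., high vs low IG trials)
x1 = rng.integers(0, 2, 4000)   # "region 1" activity state
x2 = y ^ x1                     # "region 2": XOR -> purely synergistic code

joint = 2 * x1 + x2             # encode the pair (x1, x2) as one variable
redundancy = min(mi(x1, y), mi(x2, y))              # MMI redundancy
synergy = mi(joint, y) - max(mi(x1, y), mi(x2, y))  # MMI synergy
```

Neither region alone carries information about the feature, yet the pair jointly carries ~1 bit: the encoding is almost entirely synergistic.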
Our results align with a body of literature supporting the idea that cognitive functions, including learning, emerge from the coordinated activity of neural populations distributed across large-scale brain networks109,110 and emerge from network-wide and self-organised information routing patterns111,112. This distributed nature of neural processing, as observed in task-relevant representations and sensory-to-action transformations, is in line with the growing evidence of brain-wide dynamics during learning and decision-making13,14,113,114. In mice, recent evidence suggests that transformations linking sensation to action during learning and decision-making are highly distributed and parallelized in a brain-wide manner115.
Within this framework, we suggest that redundant and synergistic functional interactions could be interpreted as a proxy of functional segregation and integration processes, although a direct mapping may be lacking. Redundant functional interactions may appear in collective states dominated by oscillatory synchronisation109,116,117,118 or resonance phenomena119. Synergistic functional interactions may be associated with functionally-complementary interactions (i.e., functional integration). Indeed, synergistic interactions have been reported between distant transmodal regions during high-level cognition120,121 and, at the microscale, in populations of neurons within a cortical column of the visual cortex and across areas of the visuomotor network59,122. During learning, a recent study has shown that redundant functional interactions encoding either reward or punishment prediction errors are associated with segregated networks in prefrontal regions, whereas the integration between reward and punishment learning is mediated by synergistic interactions between these networks123. Synergistic interactions may provide a functional advantage, in contrast to redundancy and unique information, because they enable the full exploitation of possible combinations of neurons or brain areas, making their informational capacity exponential in the system size124. Redundancy, instead, would provide robustness, because over-representation ensures that information remains available even if a neuron or brain region is perturbed. A drawback of redundancy would be the cost of over-representation, whereas a limitation of synergistic encoding would be its vulnerability: the disruption of individual nodes destroys information synergistically held together with other sources125.
Indeed, information decomposition analysis of artificial neural networks performing cognitive tasks has shown that, while redundant information is required for robustness to perturbations in the learning process, synergistic information is used to combine information from multiple modalities and, more generally, for flexible learning126. Along the same lines, artificial networks exhibit higher levels of synergy at comparatively lower orders, rather than higher orders, probably to limit the loss of synergistic information when several neurons fail127. In brain networks, redundant functional interactions may be important for robust sensory and motor functions, whereas synergistic interactions may support flexible higher cognition52. The trade-off between redundant and synergistic encoding may support robustness and resilience at the large-scale level, as suggested for microscopic neural networks51. Within this interpretative framework, our results support the hypothesis that brain interactions regulate segregation and integration processes to support cognitive functions111,128,129,130,131,132,133. Such balanced states may give rise to selective patterns of information flow112,134,135,136, as observed in the feature-specific information transfer (FIT) results (Fig. 8). Taken together, our results suggest general principles for network interactions, whereby synergistic relations encode information relevant for goal-directed learning. The causal effect of synergistic interactions on cognition remains, however, an open question that could be tackled with perturbation or pharmacological manipulations.
Higher-order synergistic encoding and broadcasting of information gain
In order to explore the nature of synergistic cortico-cortical interactions in directed exploration, we tested whether neuronal interactions supporting cognitive functions require collective behaviours among multiple brain areas, the so-called higher-order correlations51,56,57,62,122,137,138,139,140,141. Higher-order interactions are at the core of complex systems71,72,142,143 and are thought to support collective behaviours in social systems144, ecology145,146 and biology147. We generalised information-theoretic decomposition methods to higher-order functional interactions, and we showed that information gain is encoded in higher-order synergistic functional interactions up to quadruplets. By definition (Eq. 12), significant multiplets at a given order encode synergistic information that is not present at lower orders. For example, significant triplets contain information beyond what is carried by the underlying pairwise connections. Similarly, the two largest quadruplets (comprising the visual-temporal-vmPFC-OFC and visual-parietal-vmPFC-OFC clusters) encode additional information that cannot be explained by lower-order interactions (Fig. 7A). Our results demonstrate that IG is encoded by functional interactions beyond the pairwise regime. Higher-order synergistic co-modulations encoding IG were found up to order 4. The analysis of the centrality of individual brain areas in higher-order behaviours showed that visual areas dominated pairwise functional interactions, whereas the vmPFC-OFC dominated both triplet and quadruplet interactions (Fig. 7C and D). In addition, feature-specific information transfer (FIT) analyses73 showed that IG signals are broadcast from multiple brain regions and converge on the ventromedial PFC and OFC (Fig. 8). These results further support a central role of the vmPFC-OFC cluster in the processing of information gain.
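The idea that a triplet can carry information absent from every one of its pairs can be shown with a 3-way parity toy example (hypothetical binary variables, not MEG data):

```python
import numpy as np

def mi_bits(xs, y):
    """I(X; y) in bits, where X is the joint of the binary columns in xs."""
    x = np.zeros(len(y), dtype=int)
    for col in xs:
        x = 2 * x + col                  # encode the tuple as one integer
    m = 0.0
    for a in np.unique(x):
        pa = np.mean(x == a)
        for b in np.unique(y):
            pab = np.mean((x == a) & (y == b))
            if pab > 0:
                m += pab * np.log2(pab / (pa * np.mean(y == b)))
    return m

rng = np.random.default_rng(3)
x1, x2, x3 = (rng.integers(0, 2, 6000) for _ in range(3))
y = x1 ^ x2 ^ x3                 # 3-way parity: only the full triplet codes y

pair_info = max(mi_bits([x1, x2], y), mi_bits([x1, x3], y), mi_bits([x2, x3], y))
triplet_info = mi_bits([x1, x2, x3], y)
```

Every single variable and every pair carries essentially zero information about the feature, while the triplet carries ~1 bit: an order-3 synergistic code in the sense of Eq. 12.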
Limitations and open questions
Information gain leads to uncertainty reduction in action-outcome associations. People are equipped with cognitive systems that estimate and handle uncertainty148 to drive further learning105,149,150,151,152. Uncertainty can be formalised as the entropy of the probability distribution over possible actions, representing the degree of unpredictability about the most rewarding action. From the mathematical point of view, however, uncertainty and information gain are related. Information gain can be decomposed as the difference between the cross entropy of the posterior \(P({a}_{t})\) relative to the prior \(P({a}_{t-1})\), which measures the mismatch between the prior and the updated posterior policy, and the entropy of the posterior distribution \(P({a}_{t})\), which measures the remaining uncertainty in the action-selection policy after the update. The difference between these terms measures the information gain, i.e., the reduction in uncertainty achieved by updating the prior to the posterior after each observation (see the Methods section for more details). IG and uncertainty reduction are thus closely linked, particularly in the deterministic task used in this study. Further research is needed to explore whether IG and uncertainty signals are differently encoded in synergistic interactions in the brain.
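A minimal numerical check of this decomposition, with hypothetical action probabilities: the cross entropy minus the posterior entropy equals the KL divergence between posterior and prior.

```python
import numpy as np

def information_gain(prior, posterior, eps=1e-12):
    """IG = cross-entropy H(posterior, prior) - entropy H(posterior)
    = KL(posterior || prior), in bits."""
    prior, posterior = np.asarray(prior, float), np.asarray(posterior, float)
    cross_entropy = -np.sum(posterior * np.log2(prior + eps))
    entropy = -np.sum(posterior * np.log2(posterior + eps))
    return cross_entropy - entropy

# Hypothetical example: 5 actions, belief sharpens after an informative outcome
prior = np.full(5, 0.2)                              # P(a_{t-1}): uniform
posterior = np.array([0.05, 0.05, 0.8, 0.05, 0.05])  # P(a_t): concentrated

ig = information_gain(prior, posterior)
kl = np.sum(posterior * np.log2(posterior / prior))  # same quantity, directly
```

With a uniform prior the cross entropy equals log2(5) ≈ 2.32 bits and the posterior entropy is ≈ 1.12 bits, so ≈ 1.2 bits are gained; if the policy does not change, IG is zero.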
A second limitation is the difficulty in determining whether information gained from correct outcomes differs from that gained from errors. In this task, IG and RPEs show a U-shaped relationship, with both errors (negative RPE) and successes (positive RPE) linked to positive IG. As learning progresses, both RPE and IG tend to decrease. Therefore, we cannot determine if IG is more associated with confirmation (linked to positive RPEs) or conflict monitoring (linked to negative RPEs). The IG signals here reflect a combined contribution from both errors and successes. Future research could explore the distinct circuits involved in IG from errors and successes during goal-directed learning.
A third limitation is the focus on high-gamma activity, which may overlook the role of phase synchronization in neural interactions. Phase synchronization, particularly in lower frequency bands (such as the alpha and beta ranges), facilitates long-range connectivity by aligning neural excitability across regions, enabling efficient information transfer116,118,119,153. Ignoring these lower-frequency phase interactions may lead to an incomplete understanding of the large-scale network dynamics and neural coordination supporting cognitive processes. Further studies are needed to bridge the explanatory gap between neural interactions supported by high-gamma activity and phase-related synchronization in brain networks.
An open question concerns the role of the basal ganglia and cortico-striatal interactions in the encoding of information gain. The inherent limitations of MEG in resolving deep brain structures prevent a detailed study of these interactions. Intracranial EEG recordings in epileptic patients rarely include electrodes in the striatum, the main input structure of the basal ganglia. As a result, human neurophysiology provides limited access to fast neural dynamics and cortico-striatal interactions. The absence of measurements in the basal ganglia could explain why we did not observe correlates of reward prediction error (RPE), as this region plays a crucial role in encoding reward signals. Future studies investigating the functional role of cortico-striatal interactions in information seeking and goal-directed learning in primates will be crucial for advancing our understanding of adaptive behaviour and decision-making processes.
Although the number of participants (N = 11) is limited, our previous study indicated that it provides appropriate group-level statistical power154. In that study, the Matthews Correlation Coefficient (MCC) was used to determine the minimum number of trials and subjects for group-level inference with cluster-wise corrections. An MCC of 0.8 or higher indicates excellent model performance. With an average of 225 trials per participant, the MCC was ~0.9. Nevertheless, we cannot exclude that significant correlates of RPE could be identified if a larger number of participants were studied.
Finally, the Q-learning model was used to estimate the trial-by-trial variability in RPE and IG signals, rather than to serve as a comprehensive model of choice behaviour in our task. Indeed, the model does not account for the win-stay and lose-switch strategies observed in our task. As standard RL models rarely capture across-trial behavioural strategies, accounting for them would require new computational theories whose development is beyond the scope of this work.
To conclude, the current study provides evidence that information gain is encoded in synergistic and higher-order functional interactions and is broadcast towards the prefrontal reward circuitry. More generally, it shows how information relevant for cognition is encoded and broadcast in distributed cortical networks and brain-wide dynamics.
Methods
Experimental conditions and behavioural learning task
Eleven healthy subjects participated in the study (all right-handed; 7 females; average age 26 years). All participants gave written informed consent according to established institutional guidelines and received monetary compensation (50 euros) for their participation. The project was sponsored by the INSB of the CNRS (CNRS No. 17020; ANSM No. 2017-A03614-49) and approved by the ethical committee CPP Sud-Méditerranée.
Goal-directed learning was studied using an arbitrary visuomotor learning task, where the relation between the visual stimulus, the action and its outcome is arbitrary and causal. Participants were instructed to learn by trial and error the correct associations between 3 coloured circles and 5 finger movements (Fig. 1A). Participants were informed that correct actions were not exclusive across stimuli, meaning that, when seeking the correct association for one stimulus, they could not rule out fingers already assigned to another stimulus. On each trial, participants were presented with a coloured circle to which they had to respond within 1.5 s. Reaction times were computed as the time difference between stimulus presentation and motor response (finger movement). After a fixed delay of 1 s following the disappearance of the coloured image, an outcome image was presented for 1 s and informed the subject whether the response was correct, incorrect, or too late (if the reaction time exceeded 1.5 s). "Late" trials were excluded from the analysis, because they were either absent or very rare (i.e., a maximum of 2 late trials per session). The next trial started after a variable delay ranging from 2 to 3 s (randomly drawn from a uniform distribution) with the presentation of another visual stimulus. Visual stimuli (coloured circles) were randomised in blocks of three trials. Each learning session was composed of 60 trials, 3 stimulus types (i.e., different colours, S1, S2, and S3) and 5 possible finger movements. Each subject performed 4 learning sessions, each containing different sets of coloured stimuli. To ensure reproducible performances across sessions and subjects, the task design manipulated learning and produced reproducible phases of acquisition and consolidation across sessions and individuals67. More precisely, the correct stimulus–response associations were not set a priori.
Instead, the correct stimulus–response associations were assigned as the subject proceeded in the task. Figure 1B shows the matrix of all the possible stimulus–response combinations, updated according to the exemplar learning session (Fig. 1A). The first presentation of each stimulus was always followed by an incorrect outcome, irrespective of the subject's motor response (Fig. 1B, trials 1 to 3). Then, on the second presentation of stimulus S1, any untried finger movement was considered a correct response. Because the stimuli were presented randomly within blocks of 3 trials, stimulus S1 could occur for the second time between trials 4 and 6 (trial 4 in Fig. 1). For the second stimulus S2, the response was correct only once the subject had performed 3 incorrect finger movements (trial 9 in Fig. 1B). For stimulus S3, the subject had to try 4 different finger movements before the correct response was found (trial 12 in Fig. 1B). In other words, the correct response was the second finger movement (different from the first tried response) for stimulus S1, the fourth finger movement for stimulus S2, and the fifth for stimulus S3.
MarsAtlas cortical parcellation and source model
Anatomical T1-weighted MRI images were acquired for all participants using a 3-T whole-body imager equipped with a circular polarised head coil (Bruker). Magnetoencephalographic (MEG) recordings were performed using a 248-magnetometer system (4D Neuroimaging Magnes 3600). Visual stimuli were delivered with a video projector, and motor responses were acquired using a LUMItouch® optical response keypad with five keys. Presentation® software was used for stimulus delivery and experimental control during MEG acquisition.
Single-subject cortical parcellation was performed using the MarsAtlas approach69. After denoising using non-local means155, T1-weighted MR images were segmented using the FreeSurfer "recon-all" pipeline (v5.3.0). Grey- and white-matter segmentations were imported into the BrainVisa software (v4.5) and processed using the Morphologist pipeline procedure (http://brainvisa.info). White-matter and pial surfaces were reconstructed and triangulated, and all sulci were detected and labelled automatically. A parameterization of each hemisphere's white-matter mesh was performed using the Cortical Surface toolbox (http://www.meca-brain.org/softwares/). This resulted in a 2D orthogonal system defined on the white-matter mesh, constrained by a set of primary and secondary sulci69. The MarsAtlas parcellation contained a total of 82 cortical parcels (41 per hemisphere). Each individual white-matter mesh was then resampled onto a template defined by the average geometry across a population of 138 healthy subjects (https://meca-brain.org/software/hiphop138/) in two steps: (1) the parameterization induced by the 2D orthogonal system is used to map the white-matter mesh onto a canonical spherical domain; (2) the location of each vertex of the template mesh is interpolated onto the individual mesh geometry using classical barycentric mapping. This procedure defines a one-to-one spatial correspondence across the white-matter meshes of all individuals at the vertex level. By adapting the spatial resolution of the template mesh, we can fix the number of points that define the correspondence across subjects. The sources are then defined at the location of these points and are thus distributed at consistent anatomical locations across subjects.
Finally, we created source-space and forward models readable by MNE-Python156 using the cortical meshes generated with BrainVISA (MarsAtlas) and the subcortical structures generated by FreeSurfer. These elements are needed for the power estimation at the source level, described in the next section. These steps were performed using the BV2MNE toolbox (https://github.com/brainets/bv2mne), a Python library developed in our team on top of MNE-Python. The anatomical meshes were coregistered with the MEG data using the 'mne coreg' interface.
Single-trial high-gamma activity (HGA) in MarsAtlas
We focused on broadband gamma from 60 to 120 Hz for two main reasons. First, it has been shown that the gamma band activity correlates with both spiking activity and the BOLD fMRI signals157,158,159,160,161, and it is commonly used in MEG and iEEG studies to map task-related brain regions162,163,164. Therefore, focusing on the gamma band facilitates linking our results with the fMRI and spiking literature on probabilistic learning. Second, single-trial and time-resolved high-gamma activity can be exploited for the analysis of cortico-cortical interactions in humans using MEG and iEEG techniques55,154,165,166.
MEG signals were high-pass filtered at 1 Hz, low-pass filtered at 250 Hz, notch filtered at multiples of 50 Hz and segmented into epochs aligned on stimulus and outcome presentation. Independent Component Analysis (ICA) was performed to detect and reject cardiac, eye-blink and oculomotor artefacts. Artefact rejection was performed semi-automatically, first by visual inspection of the epochs' time series and then by means of the autoreject Python library167, to detect and reject bad epochs from further analysis.
Spectral density estimation was performed using a multi-taper method based on discrete prolate spheroidal (Slepian) sequences168. To extract high-gamma activity from 60 to 120 Hz, MEG time series were multiplied by k = 11 orthogonal tapers (0.2 s in duration with 60 Hz of frequency resolution, stepped every 0.005 s), centred at 90 Hz and Fourier-transformed. Complex-valued estimates of spectral measures \({X}_{{sensor}}^{n}\left(t,k\right)\), including cross-spectral density matrices, were computed at the sensor level for each trial n, time t and taper k. Source analysis requires a physical forward model or leadfield, which describes the electromagnetic relation between sources and MEG sensors. The leadfield combines the geometrical relation of sources (dipoles) and sensors with a model of the conductive medium (i.e., the head model). For each participant, we generated a head model using a single-shell model constructed from the segmentation of the cortical tissue obtained from individual MRI scans. Leadfields were not normalised. The head model, source locations and the information about MEG sensor positions were combined to derive single-participant leadfields. The orientation of cortical sources was set perpendicular to the cortical surface.
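A simplified, sensor-level sketch of such a sliding-window multitaper power estimate, using SciPy's DPSS windows (window length, step, taper count and centre frequency follow the text; the actual pipeline operates on multichannel MEG data and feeds the DICS beamformer described below):

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_power(signal, sfreq, f0=90.0, win_s=0.2, step_s=0.005, k=11):
    """Sliding-window multitaper power estimate at centre frequency f0 (Hz)."""
    nwin, step = int(win_s * sfreq), int(step_s * sfreq)
    nw = (k + 1) / 2                        # half time-bandwidth: k = 2*NW - 1
    tapers = dpss(nwin, NW=nw, Kmax=k)      # (k, nwin) Slepian tapers
    t = np.arange(nwin) / sfreq
    carrier = np.exp(-2j * np.pi * f0 * t)  # demodulation at f0
    starts = list(range(0, len(signal) - nwin + 1, step))
    power = np.empty(len(starts))
    for i, s0 in enumerate(starts):
        seg = signal[s0:s0 + nwin]
        coefs = (tapers * (seg * carrier)).sum(axis=1)  # tapered Fourier coefs
        power[i] = np.mean(np.abs(coefs) ** 2)          # average across tapers
    return power

# Toy check: a 90 Hz oscillation embedded in noise
sfreq = 1000.0
t = np.arange(0, 1.0, 1 / sfreq)
rng = np.random.default_rng(1)
sig = np.sin(2 * np.pi * 90 * t) + 0.1 * rng.standard_normal(t.size)
p = multitaper_power(sig, sfreq)
```

Averaging the squared magnitude of the tapered Fourier coefficients across the k tapers yields a low-variance, band-limited single-trial power estimate, which is the quantity projected to source space in the next step.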
We used adaptive linear spatial filtering to estimate power at the source level. In particular, we employed the Dynamical Imaging of Coherent Sources (DICS) method, a beamforming algorithm for tomographic mapping in the frequency domain169, which is well suited for the study of neural oscillatory responses based on single-trial source estimates of band-limited MEG signals. At each source location, DICS employs a spatial filter that passes activity from this location with unit gain while maximally suppressing any other activity. The spatial filters were computed on all trials for each time point and session, and then applied to single-trial MEG data.
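As an illustration, the core DICS operation (a unit-gain, minimum-variance spatial filter computed from the sensor cross-spectral density) can be sketched in a few lines of NumPy. This is a minimal sketch, not the study's implementation; the function names and the regularisation fraction are ours:

```python
import numpy as np

def dics_spatial_filter(csd, leadfield, reg=0.05):
    """Frequency-domain beamformer filter for one source location.

    csd       : (n_sensors, n_sensors) cross-spectral density matrix
    leadfield : (n_sensors,) leadfield vector of the source (fixed orientation)
    reg       : Tikhonov regularisation, as a fraction of mean sensor power
    """
    n = csd.shape[0]
    # regularised inverse of the sensor-level CSD
    c_inv = np.linalg.inv(csd + reg * np.trace(csd).real / n * np.eye(n))
    num = c_inv @ leadfield
    # unit gain at the source location, maximal suppression elsewhere
    return num / (leadfield.conj() @ num)

def dics_source_power(csd, w):
    # power passed by the spatial filter at this source
    return np.real(w.conj() @ csd @ w)
```

Applying the same filter to single-trial tapered Fourier coefficients then yields single-trial source-level power estimates.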
Single-trial power estimates aligned to outcome and stimulus onset were log-transformed to make the data approximately Gaussian, and low-pass filtered at 50 Hz to reduce noise. Single-trial mean power and standard deviation in a time window from −0.5 to −0.1 s prior to stimulus onset were computed for each source and trial, and used to z-transform single-trial movement-locked power time courses. Similarly, single-trial outcome-locked power time courses were log-transformed and z-scored with respect to the baseline period, to produce HGAs for the prestimulus period from −1.6 to −0.1 s with respect to stimulation for subsequent functional connectivity analysis. Finally, single-trial HGA for each brain region of MarsAtlas was computed as the mean of the z-transformed power values averaged across all sources within the same region. Single-trial HGA estimates were computed using custom scripts based on MNE-Python156.
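The baseline z-scoring step can be sketched as follows. This is a minimal NumPy illustration for a single source; the window bounds follow the text, while the function name and array layout are ours:

```python
import numpy as np

def baseline_zscore(power, times, tmin=-0.5, tmax=-0.1):
    """Z-score single-trial power against a pre-stimulus baseline window.

    power : (n_trials, n_times) single-trial power of one source
    times : (n_times,) time axis in seconds, 0 = stimulus onset
    """
    logp = np.log(power)                            # log-transform power
    mask = (times >= tmin) & (times <= tmax)        # baseline window
    mu = logp[:, mask].mean(axis=1, keepdims=True)  # per-trial baseline mean
    sd = logp[:, mask].std(axis=1, keepdims=True)   # per-trial baseline SD
    return (logp - mu) / sd
```

Averaging the z-scored values across all sources of a MarsAtlas parcel then gives the region-level single-trial HGA.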
Model-based analysis of behavioural data and estimate of learning signals
We estimated the evolution of stimulus-action-outcome probabilities and the associated updating (learning) signals using a Q-learning model24 from reinforcement learning theory25. Briefly, the Q-learning model updates action values through the Rescorla-Wagner learning rule (1972), expressed by the following equation:

$$Q({s}_{t},{a}_{t}) \leftarrow Q({s}_{t},{a}_{t})+\alpha \left({r}_{t}-Q({s}_{t},{a}_{t})\right)$$
The left-hand side of the equation corresponds to the updated Q-value for the observed stimulus \({s}_{t}\) and chosen action \({a}_{t}\) after the observation of the outcome \({r}_{t}\), at trial \(t\). In our case, \({s}_{t}\) corresponds to one of the three coloured stimuli, \({a}_{t}\) represents one of the five possible finger movements, and \({r}_{t}\) is either one or zero for correct and incorrect outcomes, respectively. Q-values were then transformed into probabilities according to the softmax equation:

$$P({a}_{t}|{s}_{t})=\frac{\exp \left(\beta \,Q({s}_{t},{a}_{t})\right)}{{\sum }_{a}\exp \left(\beta \,Q({s}_{t},a)\right)}$$
\(P({a}_{t}|{s}_{t})\) is the conditional probability of selecting action \(a\) given stimulus \(s\), and it sums to one over possible actions, \({\sum }_{a}P({a}_{t}|{s}_{t})=1\). For simplicity, we drop the conditioning over stimulus \(s\) and refer to the action policies as \(P({a}_{t})\). The coefficient β is termed the inverse ‘temperature’: low β (<1) makes all actions (nearly) equiprobable, whereas high β (>1) amplifies the differences in association values. We identified the set of parameters that best fitted the behavioural data using a maximum likelihood approach, as in a previous study67. The model was fitted separately for each block of trials and learning session. For each learning session, we used a grid-search algorithm varying the learning rate α from 0.1 to 1 (in steps of 0.01) and the inverse temperature β from 1 to 10 (in steps of 0.2). Grid-search algorithms are often preferred over gradient-based optimisation for parameter inference because they systematically explore the entire parameter space: unlike optimisation techniques, which may get stuck in local minima, grid search evaluates all parameter combinations. This exhaustive search is particularly useful for low-dimensional problems or when prior knowledge about the relevant region of parameter space is limited. In addition, a previous fMRI study used the same task and Q-learning model67, and showed that the grid-search approach provided accurate estimates of action-outcome conditional probabilities that fitted behavioural data (Figs. 2 and 3 in ref. 67). The optimal set of parameters was chosen so as to maximise the log-likelihood of the probability of making the action performed by the participant, \({P}({c}_{t})=P({a}_{t}={chosen})\), defined as \({LL}={\sum }_{t}\log {P}({c}_{t})\).
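The grid-search maximum-likelihood fit can be sketched as follows. This is an illustration under the task structure described above (three stimuli coded 0–2, five actions, binary reward), not the authors' code; the function name and initialisation of Q-values at zero are our assumptions:

```python
import numpy as np

def fit_q_learning(stimuli, actions, rewards, n_actions=5,
                   alphas=np.arange(0.1, 1.01, 0.01),
                   betas=np.arange(1.0, 10.2, 0.2)):
    """Grid-search ML fit of a Q-learning model (illustrative sketch)."""
    best = (-np.inf, None, None)
    for alpha in alphas:
        for beta in betas:
            ll = 0.0
            q = np.zeros((3, n_actions))          # Q-values per stimulus/action
            for s, a, r in zip(stimuli, actions, rewards):
                p = np.exp(beta * q[s])
                p /= p.sum()                      # softmax action policy
                ll += np.log(p[a])                # log-likelihood of the choice
                q[s, a] += alpha * (r - q[s, a])  # Rescorla-Wagner update
            if ll > best[0]:
                best = (ll, alpha, beta)
    return best  # (log-likelihood, alpha, beta)
```

The pair (α, β) maximising the summed log-likelihood across the trials of a block is retained, separately for each block and session.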
Model fitting was performed per learning block, so as to accommodate potential changes in α and β across sessions. The Q-learning model thus provides estimates of the evolution of reward prediction errors during learning as:

$${\delta }_{t}={r}_{t}-Q({s}_{t},{a}_{t})$$
Bayesian inference is a principled statistical framework based on Bayesian updating of beliefs about the current state of the world. The amount of update after each observation is related to the notion of information gain. In Bayesian statistics, information gain can be defined as the amount of information required to “revise” one’s beliefs from the prior probability distribution \(P({a}_{t-1})\) to the posterior probability distribution \(P({a}_{t})\). This measure can be formalised via the Kullback–Leibler (KL) divergence between the posterior and the prior. In neuroscience, this measure is also referred to as the Bayesian surprise39,41:

$${{IG}}_{t}={D}_{{KL}}\left[P({a}_{t})\,\|\,P({a}_{t-1})\right]={\sum }_{a}P({a}_{t})\log \frac{P({a}_{t})}{P({a}_{t-1})}$$
Note that the information gain depends on the stimulus presented at each trial; the index associated with stimulus type is dropped for simplicity. The equation measuring information gain can be expanded and rewritten as

$${{IG}}_{t}=-{\sum }_{a}P({a}_{t})\log P({a}_{t-1})-\left(-{\sum }_{a}P({a}_{t})\log P({a}_{t})\right)=H\left(P({a}_{t}),P({a}_{t-1})\right)-H\left(P({a}_{t})\right)$$
The first term on the right-hand side is the cross entropy of \(P({a}_{t-1})\) and \(P({a}_{t})\), and it measures the mismatch between the prior and the updated posterior policy. The second term is the entropy of the posterior distribution \(P({a}_{t})\), and it measures the uncertainty remaining in the action selection policy after the update. Their difference is the information gain, that is, the reduction in uncertainty achieved by updating the prior to the posterior after each observation. Information gain therefore relates to epistemic uncertainty, as it captures the reduction of model uncertainty due to observed outcomes. In the current task, information gain and uncertainty are correlated, because the correct action policies are deterministic: during the initial exploratory phase, both are high while the participant’s policy is far from optimal and uncertain; at the end of learning, the policy becomes approximately deterministic and both approach zero. No dissociation can therefore be drawn between IG and uncertainty reduction in the current task. Nevertheless, information gain and reward prediction errors are not linearly related: both errors (negative RPE) and successes (positive RPE) are associated with positive IG, so that IG and RPE display an approximately U-shaped relationship. IG instead scales with the absolute value of RPE, because both reflect the impact of surprise on learning: the absolute value of RPE captures how unexpected an outcome is, whereas IG quantifies how much uncertainty is reduced as a result of that unexpected outcome.
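The information-gain computation reduces to a KL divergence between successive action policies; a minimal sketch (base-2 logarithm assumed, so IG is expressed in bits, and the function name is ours):

```python
import numpy as np

def information_gain(prior, posterior):
    """KL divergence D_KL(posterior || prior) in bits (Bayesian surprise)."""
    prior = np.asarray(prior, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    return float(np.sum(posterior * np.log2(posterior / prior)))
```

For example, a policy that sharpens from a uniform prior over five actions towards the correct action yields a positive IG, whereas an unchanged policy yields zero, regardless of whether the outcome was a success or an error.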
Cortical areas encoding learning signals
We used information-theoretic metrics to quantify the statistical dependency between single-trial HGA and the outcome-related learning signals. Information-based measures quantify how much the neural activity of a single brain region explains a variable of the task. To this end, we computed the mutual information (MI), defined as:

$$I(X;Y)=H(X)-H(X|Y)$$
In this equation, the variables \(X\) and \(Y\) represent the HGA power and the behavioural variables, respectively: \(X\) is the HGA power over time, trials and regions, whereas \(Y\) is a behavioural variable across trials. \(H(X)\) is the entropy of \(X\), and \(H(X|Y)\) is the conditional entropy of \(X\) given \(Y\). We used Gaussian-Copula Mutual Information (GCMI)170, a semi-parametric, binning-free technique to calculate MI. The GCMI approach includes a rank-based normalisation of the data, after which entropy values are computed using the differential entropy formula for a Gaussian distribution, based on the determinant of the covariance matrix and the assumption of Gaussian marginals. The GCMI is a robust, rank-based, lower-bound estimator of mutual information for continuous signals that detects any monotonic relation between variables, and it is of practical importance for the analysis of short and potentially multivariate neural signals. Note, however, that the GCMI does not detect non-monotonic (e.g., parabolic) relations. In the current work, the GCMI was computed across trials between time-resolved HGA and outcome-related learning signals. The parametric GCMI estimation was bias-corrected using an analytic correction to compensate for the bias due to the estimation of the covariance matrix from limited data (i.e., a limited number of trials)170. Since this parametric correction depends only on the number of trials, the same value is used for both permuted and non-permuted data; the correction therefore only impacts the estimated effect size and has no effect on statistical results. The outcome-related learning signals considered were reward prediction errors (RPE) and Information Gain (IG).
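The GCMI estimate for two continuous one-dimensional variables can be sketched as follows. This is a simplified illustration (the analytic bias correction is omitted, and `copnorm`/`gcmi_cc` are our names, not necessarily those of the GCMI reference implementation):

```python
import numpy as np
from scipy.stats import rankdata, norm

def copnorm(x):
    # rank-based inverse-normal (copula) transform: Gaussian marginals,
    # monotonic dependence structure preserved
    return norm.ppf(rankdata(x) / (len(x) + 1))

def gcmi_cc(x, y):
    """Gaussian-copula MI (bits) between two 1-D continuous variables."""
    cx, cy = copnorm(x), copnorm(y)
    c = np.corrcoef(cx, cy)[0, 1]
    # Gaussian MI from the correlation of the copula-transformed data
    return -0.5 * np.log2(1.0 - c ** 2)
```

Because only the ranks of the data enter the estimate, any monotonic relation between HGA and the learning signal is captured, while non-monotonic (e.g., parabolic) relations are not.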
Cortico-cortical functional interactions encoding learning signals
The goal of network-level analyses was to identify dynamic functional networks encoding outcome-related computations, such as reward prediction errors (RPE) and information gain (IG). We used information-theoretic metrics to quantify the statistical dependency between single-trial event-related co-modulations in HGA across trials and learning computations. In general terms, the aim of our analyses was to assess the interdependence between pairs of brain signals (X1 and X2) and a third learning variable (Y), either RPE or IG. More precisely, we wished to quantify whether the interaction in activity co-modulation between brain regions supported functional segregation and integration phenomena underlying learning. We reasoned that functional segregation could be quantified as the amount of redundant information shared in the co-modulation of activity across brain regions. Similarly, we propose that functional integration could be quantified as the amount of synergistic information carried by two brain regions about the target learning variable. Such research questions can be formalised within the Partial Information Decomposition (PID) framework64,66. Indeed, the PID framework allows the decomposition of the multivariate mutual information between a system of predictors and a target variable, quantifying the information that several source (or predictor) variables provide uniquely, redundantly or synergistically about the target. In the current study, we considered the case where the source variables are the across-trials HGA of two brain regions (X1 and X2) and the target variable is the trial-by-trial evolution of the learning variable (Y). If the two brain regions X1 and X2 carry the same information about the learning variable Y, we refer to it as redundant information. Information about Y carried by only one of the two areas is referred to as the unique information of X1 (or X2).
Finally, if neither of the variables alone provides information about Y, but they need to be observed together, we refer to it as synergistic information: knowledge of any of the predictors separately does not provide information about the target variable Y. Analytically, the PID decomposes the total mutual information \(I({X}_{1},{X}_{2};Y)\) between a pair of source variables \({X}_{1}\) and \({X}_{2}\) and a target variable \(Y\) into four non-negative components:

$$I({X}_{1},{X}_{2};Y)=U({X}_{1};Y)+U({X}_{2};Y)+R({X}_{1},{X}_{2};Y)+S({X}_{1},{X}_{2};Y)$$
\(U({X}_{1};Y)\) and \(U({X}_{2};Y)\) are the unique information carried by the two areas, respectively. \(R({X}_{1},{X}_{2};Y)\) and \(S({X}_{1},{X}_{2};Y)\) are the redundancy and synergy terms, respectively. In addition, the PID formulation links to classical Shannon measures of mutual information as

$$I({X}_{1};Y)=U({X}_{1};Y)+R({X}_{1},{X}_{2};Y),\qquad I({X}_{2};Y)=U({X}_{2};Y)+R({X}_{1},{X}_{2};Y)$$
The problem with this approach is that its governing equations form an under-determined system: only three quantities can be computed from the data (the mutual information quantities \(I({X}_{1},{X}_{2};Y)\), \(I({X}_{1};Y)\) and \(I({X}_{2};Y)\)) for the four unknown terms of the PID (two unique information terms, redundancy and synergy). To actually calculate the decomposition, an additional assumption must therefore be made. Here, we exploit the so-called minimum mutual information (MMI) PID, which has been shown to provide correct estimations for a broad class of systems following a multivariate Gaussian distribution120,171. According to the MMI PID, the redundant information carried by a pair of brain regions is given by the minimum of the information provided by each individual source to the target,

$$R({X}_{1},{X}_{2};Y)=\min \left[I({X}_{1};Y),\,I({X}_{2};Y)\right]$$
Then, synergistic information can be computed by substituting Eqs. 8, 9 and 10 in Eq. 7 and rearranging the terms

$$S({X}_{1},{X}_{2};Y)=I({X}_{1},{X}_{2};Y)-I({X}_{1};Y)-I({X}_{2};Y)+R({X}_{1},{X}_{2};Y)$$
Equations 9 and 10 represent the redundant and synergistic information carried by the co-modulations in HGA of pairs of brain regions X1 and X2 about the learning variable Y, respectively. By definition, redundant functional interactions carry the same information about learning variables as that carried by the individual brain areas. Synergistic connectivity, on the other hand, reveals functional interactions that cannot be explained by the individual contributions of brain regions, but only by their combined and collective coordination.
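Given the three mutual-information quantities, the MMI-PID atoms follow by simple arithmetic. A minimal sketch, assuming the MI values (in bits) have already been estimated, e.g. with GCMI; the function and key names are ours:

```python
def mmi_pid(i_x1y, i_x2y, i_x12y):
    """MMI-PID decomposition from three precomputed MI estimates (bits).

    i_x1y  : I(X1;Y), MI between region 1 and the learning variable
    i_x2y  : I(X2;Y), MI between region 2 and the learning variable
    i_x12y : I(X1,X2;Y), joint MI of both regions with the learning variable
    """
    red = min(i_x1y, i_x2y)                    # MMI redundancy
    u1 = i_x1y - red                           # unique information of X1
    u2 = i_x2y - red                           # unique information of X2
    syn = i_x12y - i_x1y - i_x2y + red         # synergy, by rearrangement
    return {"redundancy": red, "unique1": u1, "unique2": u2, "synergy": syn}
```

By construction the four atoms sum back to the joint mutual information, mirroring the decomposition above.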
Higher-order cortico-cortical interactions encoding learning signals
We then studied whether outcome-related learning signals would be encoded in interactions beyond pairwise relations, the so-called higher-order functional interactions, or correlations. The definitions of redundancy and synergy can be generalised to higher orders by iterating over all regions of a multiplet. Considering a multiplet as a set of \(n\) variables \({X}^{n}=\{{X}_{1},...,{X}_{n}\}\), the redundant information carried by each higher-order multiplet about the target variable is given by:

$$R({X}^{n};Y)={\min }_{i\in \{1,\ldots,n\}}I({X}_{i};Y)$$
Similarly, the synergistic information carried by the multiplet is obtained by comparing the joint information with the largest information carried by its sub-multiplets:

$$S({X}^{n};Y)=I({X}^{n};Y)-{\max }_{i\in \{1,\ldots,n\}}I({X}_{i}^{-};Y)$$
where \({{X}_{i}}^{-}\) is the set of variables of \({X}^{n}\) excluding brain area \(i\). One of the main issues in the study of the higher-order behaviour of complex systems is the computational cost required to investigate all multiplets of every order. In order to perform HOI analyses over all orders, we thus identified 9 sets of brain regions that displayed significant encoding of learning signals and averaged the HGA within each set of regions of interest. This allowed us to perform an exhaustive analysis of HOIs at all orders. Equations 11 and 12 represent the redundant and synergistic information carried by multiplets of brain regions about the learning variable Y, respectively.
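Exhaustive enumeration of multiplets becomes tractable once the analysis is restricted to a small number of regions of interest; a sketch using the standard library (function names are ours):

```python
from itertools import combinations

def iter_multiplets(n_regions, min_order=3, max_order=None):
    """Enumerate all multiplets (sets of region indices) from triplets upward."""
    max_order = max_order or n_regions
    for order in range(min_order, max_order + 1):
        yield from combinations(range(n_regions), order)

def multiplet_redundancy(mi_single, multiplet):
    # MMI redundancy of a multiplet: minimum single-region MI with the target
    return min(mi_single[i] for i in multiplet)
```

For 9 regions of interest, orders 3 to 9 yield 466 multiplets in total, a number small enough for an exhaustive sweep.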
Cortico-cortical information transfer encoding learning signals
In order to investigate whether learning signals are broadcast across different cortical regions, we used a recently developed information-theoretic measure termed Feature-specific Information Transfer (FIT), which quantifies how much information about specific features flows between brain areas73. FIT merges the Wiener-Granger causality principle74,75,76 with content specificity based on the PID framework64,66. FIT isolates the information about a specific task variable Y encoded in the current activity of a receiving neural population, which was not encoded in its past activity, and which was instead encoded by the past activity of the sending neural population. More precisely, the FIT measure is based on two four-variable PIDs and is defined as the minimum of two “atoms”. The first atom is defined in a four-variable PID that considers the learning variable \(Y\) as the target and three source variables, namely the past of the sending signal \({X}_{1}(t-\tau )\), the past of the receiver \({X}_{2}(t-\tau )\) and the present of the receiver \({X}_{2}(t)\). This atom formalises the Wiener-Granger causality principle within the PID framework and adds the encoding of task variables (e.g., learning) to the time-lagged predictability between signals. More formally, the first atom is defined as the redundant information about \(Y\) carried by the past of \({X}_{1}(t-\tau )\) and the present of \({X}_{2}(t)\), which is unique with respect to the past of the receiver \({X}_{2}(t-\tau )\). As for previous analyses, we used the MMI PID to compute this atom, which was computed as

$$\min \left[I({X}_{1}(t-\tau );Y),\,I({X}_{2}(t);Y)\right]-\min \left[I({X}_{1}(t-\tau );Y),\,I({X}_{2}(t);Y),\,I({X}_{2}(t-\tau );Y)\right]$$
Note that the second term on the right-hand side assures uniqueness with respect to the past of the receiver \({X}_{2}(t-\tau )\). The second atom of the FIT is defined for a four-variable PID considering the present of the receiver \({X}_{2}(t)\) as the target variable, and the learning variable \(Y\), \({X}_{1}(t-\tau )\) and \({X}_{2}(t-\tau )\) as source variables. This second term assures that the FIT measure does not exceed: (i) the overall propagation of information between signals, referred to as the Directed Information or Transfer Entropy172, and quantified by means of the conditional mutual information \(I({X}_{1}(t-\tau );\,{X}_{2}(t)|{X}_{2}(t-\tau ))\); (ii) the mutual information between the learning variable and the past of the sender, \(I({Y;X}_{1}(t-\tau ))\); and (iii) the mutual information between the learning variable and the present of the receiver, \(I({Y;X}_{2}(t))\). This term is defined as the redundant information about the present of the receiver \({X}_{2}(t)\) carried by \({X}_{1}(t-\tau )\) and \(Y\), which is unique with respect to the past of the receiver \({X}_{2}(t-\tau )\):

$$\min \left[I({X}_{1}(t-\tau );{X}_{2}(t)),\,I(Y;{X}_{2}(t))\right]-\min \left[I({X}_{1}(t-\tau );{X}_{2}(t)),\,I(Y;{X}_{2}(t)),\,I({X}_{2}(t-\tau );{X}_{2}(t))\right]$$
Therefore, the FIT measure is defined as the minimum between these two atoms (Eqs. 14 and 15), selecting the smallest non-negative piece of information:

$${FIT}=\min \left[{{\rm{Atom}}}_{1},\,{{\rm{Atom}}}_{2}\right]$$
We computed the FIT measure between all pairs of 8 sets of brain regions at delays τ ranging from −0.005 to −0.2 s, and averaged over delays. Given that the scope of the study was not a detailed characterisation of time delays between clusters, we used this averaging procedure, as in previous papers73,173. Statistical analysis was applied as for all other measures.
Statistical analysis
For statistical inference, we used a group-level approach based on non-parametric permutations for non-negative measures of information, developed in a previous study154. We used a random-effects (RFX) approach to take into account inter-subject variability. In this approach, the information-theoretic metric (e.g., mutual information, MI) between the neurophysiological signal and the behavioural regressor is computed across trials for each participant separately, at each time point and brain region. To sample the distribution of MI attainable by chance, we computed the MI between the brain data and a randomly shuffled version of the behavioural variable; this procedure was repeated 1000 times. To assess whether the estimated MI effect sizes significantly differed from chance and to correct for multiple comparisons, we implemented a cluster-based statistics approach174. We took the mean of the MI values computed on the permutations and used it to perform a one-sample t-test across participants’ MI values, obtained from both original and permuted data. The cluster-forming threshold was defined as the 95th percentile of the distribution of t-values obtained from the permutations, and was used to identify clusters of t-values on both original and permuted data. As a reminder, the cluster-based approach174 operates on the mass of the clusters (i.e., the summed t-values above threshold) rather than on individual temporal samples.
Thus, the p-value of a cluster is not representative of the individual samples within it and cannot be interpreted as a measure of the onset and offset of significant effects. Finally, to correct for multiple comparisons across both time and space, we built a null distribution from the largest cluster mass of each of the 1000 permutations. The final corrected p-values were computed as the proportion of permutations whose maximal cluster mass exceeded the observed cluster mass. For functional connectivity analyses, the information-theoretic metrics of interest (e.g., redundant connectivity) across pairs of brain regions were analysed in the same way as the local MI time courses.
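The cluster-mass permutation scheme can be sketched for a single one-dimensional time course. This is a simplified illustration (one region, one-sided thresholding, cluster-forming threshold passed in precomputed); the function names are ours:

```python
import numpy as np

def cluster_mass_test(t_obs, t_perm, threshold):
    """Cluster-mass permutation statistics over a 1-D time axis.

    t_obs     : (n_times,) observed t-values
    t_perm    : (n_perm, n_times) t-values obtained from permuted data
    threshold : cluster-forming threshold (e.g. 95th percentile of t_perm)
    """
    def cluster_masses(t):
        # sum contiguous runs of supra-threshold t-values
        masses, mass = [], 0.0
        for v in t:
            if v > threshold:
                mass += v
            elif mass:
                masses.append(mass)
                mass = 0.0
        if mass:
            masses.append(mass)
        return masses

    # null distribution: largest cluster mass of each permutation
    null = np.array([max(cluster_masses(p), default=0.0) for p in t_perm])
    clusters = cluster_masses(t_obs)
    # corrected p-value: fraction of permutations exceeding the observed mass
    pvals = [(null >= m).mean() for m in clusters]
    return clusters, pvals
```

Pooling the maximal cluster mass across permutations (and, in the full analysis, across regions) is what provides the correction for multiple comparisons over time and space.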
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The group-level data and results, together with Jupyter notebooks to reproduce the figures of the paper, have been deposited in the GitHub repository (https://github.com/brainets/hosi_infogain), (https://doi.org/10.5281/zenodo.15674302). The single-subject MRI, MEG and behavioural data are protected and are not available due to data privacy laws. The processed data may be requested from the corresponding author.
Code availability
Higher-order redundancy and synergy measures were computed using the HOI toolbox (https://github.com/brainets/hoi). All pairwise information-theoretic measures, including the FIT, and statistical analyses were performed using the Frites Python package (https://github.com/brainets/frites)175. Hypergraph visualisation was performed using the XGI package (https://xgi.readthedocs.io/en/stable/).
References
Bandura, A. Self-Efficacy: The Exercise of Control, Vol. 604 (New York: W.H. Freeman, 1997).
Tolman, E. C. Cognitive maps in rats and men. Psychol. Rev. 55, 189–208 (1948).
Dickinson, A. Chapter 3 - Instrumental conditioning. In Animal Learning and Cognition (ed. Mackintosh, N. J.) 45–79 (Academic Press, San Diego, 1994).
Dolan, R. J. & Dayan, P. Goals and habits in the brain. Neuron 80, 312–325 (2013).
Haber, S. N. & Knutson, B. The reward circuit: linking primate anatomy and human imaging. Neuropsychopharmacology 35, 4–26 (2009).
Liljeholm, M., Tricomi, E., O’Doherty, J. P. & Balleine, B. W. Neural correlates of instrumental contingency learning: differential effects of action-reward conjunction and disjunction. J. Neurosci. 31, 2474–2480 (2011).
Tanaka, S. C., Balleine, B. W. & O’Doherty, J. P. Calculating consequences: brain systems that encode the causal effects of actions. J. Neurosci. 28, 6750–6755 (2008).
Liljeholm, M., Wang, S., Zhang, J. & O’Doherty, J. P. Neural correlates of the divergence of instrumental probability distributions. J. Neurosci. 33, 12519–12527 (2013).
Norton, K. G. & Liljeholm, M. The rostrolateral prefrontal cortex mediates a preference for high-agency environments. J. Neurosci. 40, 4401–4409 (2020).
Jocham, G. et al. Reward-guided learning with and without causal attribution. Neuron 90, 177–190 (2016).
Walton, M. E., Behrens, T. E. J., Buckley, M. J., Rudebeck, P. H. & Rushworth, M. F. S. Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning. Neuron 65, 927–939 (2010).
Morris, R. W., Dezfouli, A., Griffiths, K. R., Le Pelley, M. E. & Balleine, B. W. The neural bases of action-outcome learning in humans. J. Neurosci. 42, 3636−3647 (2022)
Averbeck, B. B. & Costa, V. D. Motivational neural circuits underlying reinforcement learning. Nat. Neurosci. 20, 505–512 (2017).
Averbeck, B. & O’Doherty, J. P. Reinforcement-learning in fronto-striatal circuits. Neuropsychopharmacology 47, 147–162 (2021).
Bartolo, R. & Averbeck, B. B. Prefrontal cortex predicts state switches during reversal learning. Neuron 106, 1044–1054.e4 (2020).
Cohen, J. D., McClure, S. M. & Yu, A. J. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Philos. Trans. R. Soc. Lond. B Biol. Sci. 362, 933–942 (2007).
Schwartenbeck, P. et al. Computational mechanisms of curiosity and goal-directed exploration. Elife 8, e41703 (2019).
Gottlieb, J., Oudeyer, P.-Y., Lopes, M. & Baranes, A. Information-seeking, curiosity, and attention: computational and neural mechanisms. Trends Cogn. Sci. 17, 585–593 (2013).
Cockburn, J., Man, V., Cunningham, W. A. & O’Doherty, J. P. Novelty and uncertainty regulate the balance between exploration and exploitation through distinct mechanisms in the human brain. Neuron 110, 2691–2702.e8 (2022).
Badre, D., Doll, B. B., Long, N. M. & Frank, M. J. Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron 73, 595–607 (2012).
Frank, M. J., Doll, B. B., Oas-Terpstra, J. & Moreno, F. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat. Neurosci. 12, 1062–1068 (2009).
Mehlhorn, K. et al. Unpacking the exploration–exploitation tradeoff: A synthesis of human and animal literatures. Decisions 2, 191–215 (2015).
Rescorla, R. A. Associative relations in instrumental learning: the eighteenth bartlett memorial lecture. Q. J. Exp. Psychol. Sect. B 43, 1–23 (1991).
Watkins, C. J. C. H. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992).
Sutton, R. S. & Barto, A. G. Reinforcement Learning, Second Edition: An Introduction. (MIT Press, 2018).
Schultz, W. Behavioral theories and the neurophysiology of reward. Annu. Rev. Psychol. 57, 87–115 (2006).
O’Doherty, J. et al. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452–454 (2004).
Frank, M. J., Seeberger, L. C. & O’reilly, R. C. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science 306, 1940–1943 (2004).
Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J. & Frith, C. D. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 442, 1042–1045 (2006).
D’Ardenne, K., McClure, S. M., Nystrom, L. E. & Cohen, J. D. BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319, 1264–1267 (2008).
Palminteri, S., Khamassi, M., Joffily, M. & Coricelli, G. Contextual modulation of value signals in reward and punishment learning. Nat. Commun. 6, 8096 (2015).
Gueguen, M. C. M. et al. Anatomical dissociation of intracerebral signals for reward and punishment prediction errors in humans. Nat. Commun. 12, 3344 (2021).
Friston, K. et al. Active inference and epistemic value. Cogn. Neurosci. 6, 187–214 (2015).
Gottlieb, J. & Oudeyer, P.-Y. Towards a neuroscience of active sampling and curiosity. Nat. Rev. Neurosci. 19, 758–770 (2018).
Faraji, M., Preuschoff, K. & Gerstner, W. Balancing new against old information: the role of puzzlement surprise in learning. Neural Comput 30, 34–83 (2018).
Gläscher, J., Daw, N., Dayan, P. & O’Doherty, J. P. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66, 585–595 (2010).
Liakoni, V., Modirshanechi, A., Gerstner, W. & Brea, J. Learning in volatile environments with the bayes factor surprise. Neural Comput 33, 269–340 (2021).
Yu, A. J. & Dayan, P. Uncertainty, neuromodulation, and attention. Neuron 46, 681–692 (2005).
Modirshanechi, A., Becker, S., Brea, J. & Gerstner, W. Surprise and novelty in the brain. Curr. Opin. Neurobiol. 82, 102758 (2023).
Baldi, P. & Itti, L. Of bits and wows: a Bayesian theory of surprise with applications to attention. Neural Netw. 23, 649–666 (2010).
Itti, L. & Baldi, P. Bayesian surprise attracts human attention. Vis. Res 49, 1295–1306 (2009).
Mackintosh, N. J. A theory of attention: variations in the associability of stimuli with reinforcement. Psychol. Rev. 82, 276–298 (1975).
Pearce, J. M. & Hall, G. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychol. Rev. 87, 532–552 (1980).
Courville, A. C., Daw, N. D. & Touretzky, D. S. Bayesian theories of conditioning in a changing world. Trends Cogn. Sci. 10, 294–300 (2006).
Fouragnan, E., Retzler, C. & Philiastides, M. G. Separate neural representations of prediction error valence and surprise: evidence from an fMRI meta-analysis. Hum. Brain Mapp. 39, 2887–2906 (2018).
Liakoni, V. et al. Brain signals of a surprise-actor-critic model: evidence for multiple learning modules in human decision making. Neuroimage 246, 118780 (2022).
Lee, S. W., Shimojo, S. & O’Doherty, J. P. Neural computations underlying arbitration between model-based and model-free learning. Neuron 81, 687–699 (2014).
Kobayashi, K. & Hsu, M. Common neural code for reward and information value. Proc. Natl. Acad. Sci. USA. 116, 13061–13066 (2019).
Kobayashi, K. et al. Dynamic representation of the subjective value of information. J. Neurosci. 41, 8220–8232 (2021).
Kobayashi, K. & Kable, J. W. Neural mechanisms of information seeking. Neuron 112, 1741–1756 (2024).
Panzeri, S., Moroni, M., Safaai, H. & Harvey, C. D. The structures and functions of correlations in neural population codes. Nat. Rev. Neurosci. 23, 551–567 (2022).
Luppi, A. I., Rosas, F. E., Mediano, P. A. M., Menon, D. K. & Stamatakis, E. A. Information decomposition and the informational architecture of the brain. Trends Cogn. Sci. 28, 352–368 (2024).
Varley, T. F., Sporns, O., Schaffelhofer, S., Scherberger, H. & Dann, B. Information-processing dynamics in neural networks of macaque cerebral cortex reflect cognitive state and behavior. Proc. Natl Acad. Sci. USA. 120, e2207677120 (2023).
Valente, M. et al. Correlations enhance the behavioral readout of neural population activity in association cortex. Nat. Neurosci. 24, 975–986 (2021).
Combrisson, E. et al. Neural interactions in the human frontal cortex dissociate reward and punishment learning. eLife 12, RP92938 (2024).
Martignon, L. et al. Neural coding: higher-order temporal patterns in the neurostatistics of cell assemblies. Neural Comput 12, 2621–2653 (2000).
Yu, S. et al. Higher-order interactions characterized in cortical activity. J. Neurosci. 31, 17514–17526 (2011).
Shahidi, N., Andrei, A. R., Hu, M. & Dragoi, V. High-order coordination of cortical spiking activity modulates perceptual accuracy. Nat. Neurosci. 22, 1148–1158 (2019).
Varley, T. F., Pope, M., Puxeddu, M. G., Faskowitz, J. & Sporns, O. Partial entropy decomposition reveals higher-order information structures in human brain activity. Proc. Natl. Acad. Sci. USA. 120, e2300888120 (2023).
Chelaru, M. I. et al. High-order interactions explain the collective behavior of cortical populations in executive but not sensory areas. Neuron 109, 3954–3961.e5 (2021).
Santoro, A., Battiston, F., Lucas, M., Petri, G. & Amico, E. Higher-order connectomics of human brain function reveals local topological signatures of task decoding, individual identification, and behavior. Nat. Commun. 15, 1–12 (2024).
Schneidman, E., Berry, M. J. 2nd, Segev, R. & Bialek, W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440, 1007–1012 (2006).
Köster, U., Sohl-Dickstein, J., Gray, C. M. & Olshausen, B. A. Modeling higher-order correlations within cortical microcolumns. PLoS Comput Biol. 10, e1003684 (2014).
Williams, P. L. & Beer, R. D. Nonnegative decomposition of multivariate information. arXiv https://doi.org/10.48550/arXiv.1004.2515 (2010).
Wibral, M., Priesemann, V., Kay, J. W., Lizier, J. T. & Phillips, W. A. Partial information decomposition as a unified approach to the specification of neural goal functions. Brain Cogn. 112, 25–38 (2017).
Lizier, J. T., Bertschinger, N., Jost, J. & Wibral, M. Information decomposition of target effects from multi-source interactions: perspectives on previous, current and future work. Entropy 20, 307 (2018).
Brovelli, A., Laksiri, N., Nazarian, B., Meunier, M. & Boussaoud, D. Understanding the neural computations of arbitrary visuomotor learning through fMRI and associative learning theory. Cereb. Cortex 18, 1485–1495 (2008).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction. (MIT Press, 1998).
Auzias, G., Coulon, O. & Brovelli, A. MarsAtlas: A cortical parcellation atlas for functional mapping. Hum. Brain Mapp. 37, 1573–1592 (2016).
Gatica, M. et al. High-order interdependencies in the aging brain. Brain Connect. 11, 734–744 (2021).
Battiston, F. et al. Networks beyond pairwise interactions: Structure and dynamics. Phys. Rep. 874, 1–92 (2020).
Battiston, F. et al. The physics of higher-order interactions in complex systems. Nat. Phys. 17, 1093–1098 (2021).
Celotto, M. et al. An information-theoretic quantification of the content of communication between brain regions. Adv. Neural Inf. Process. Syst. 36, 64213–64265 (2024).
Granger, C. W. J. Testing for causality: a personal viewpoint. J. Econ. Dyn. Control 2, 329–352 (1980).
Brovelli, A. et al. Beta oscillations in a large-scale sensorimotor cortical network: directional influences revealed by Granger causality. Proc. Natl. Acad. Sci. USA. 101, 9849–9854 (2004).
Bressler, S. L. & Seth, A. K. Wiener–Granger Causality: A well established methodology. Neuroimage 58, 323–329 (2011).
Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P. & Pezzulo, G. Active inference: a process theory. Neural Comput. 29, 1–49 (2017).
Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A. & Cohen, J. D. Humans use directed and random exploration to solve the explore-exploit dilemma. J. Exp. Psychol. Gen. 143, 2074–2081 (2014).
Gershman, S. J. Deconstructing the human algorithms for exploration. Cognition 173, 34–42 (2018).
Modirshanechi, A., Brea, J. & Gerstner, W. A taxonomy of surprise definitions. J. Math. Psychol. 110, 102712 (2022).
Bromberg-Martin, E. S. & Hikosaka, O. Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron 63, 119–126 (2009).
Tishby, N. & Polani, D. Information Theory of Decisions and Actions. In Perception-Action Cycle: Models, Architectures, and Hardware (eds. Cutsuridis, V., Hussain, A. & Taylor, J. G.) 601–636 (Springer New York, New York, NY, 2011).
Schwartenbeck, P., FitzGerald, T., Dolan, R. & Friston, K. Exploration, novelty, surprise, and free energy minimization. Front. Psychol. 4, 63551 (2013).
Averbeck, B. B. & Murray, E. A. Hypothalamic interactions with large-scale neural circuits underlying reinforcement learning and motivated behavior. Trends Neurosci. 43, 681–694 (2020).
Nour, M. M. et al. Dopaminergic basis for signaling belief updates, but not surprise, and the link to paranoia. Proc. Natl. Acad. Sci. USA. 115, E10167–E10176 (2018).
O’Reilly, J. X. et al. Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proc. Natl. Acad. Sci. USA. 110, E3660–E3669 (2013).
Shulman, G. L. et al. Right hemisphere dominance during spatial selective attention and target detection occurs outside the dorsal frontoparietal network. J. Neurosci. 30, 3640–3651 (2010).
Corbetta, M. & Shulman, G. L. Spatial neglect and attention networks. Annu. Rev. Neurosci. 34, 569–599 (2011).
Vossel, S., Geng, J. J. & Fink, G. R. Dorsal and ventral attention systems: distinct neural circuits but collaborative roles. Neuroscientist 20, 150–159 (2014).
Levy, D. J. & Glimcher, P. W. The root of all value: a neural common currency for choice. Curr. Opin. Neurobiol. 22, 1027–1038 (2012).
Bromberg-Martin, E. S. et al. A neural mechanism for conserved value computations integrating information and rewards. Nat. Neurosci. 27, 159–175 (2024).
Bromberg-Martin, E. S. & Hikosaka, O. Lateral habenula neurons signal errors in the prediction of reward information. Nat. Neurosci. 14, 1209–1216 (2011).
Charpentier, C. J., Bromberg-Martin, E. S. & Sharot, T. Valuation of knowledge and ignorance in mesolimbic reward circuitry. Proc. Natl. Acad. Sci. USA. 115, E7255–E7264 (2018).
Brydevall, M., Bennett, D., Murawski, C. & Bode, S. The neural encoding of information prediction errors during non-instrumental information seeking. Sci. Rep. 8, 6134 (2018).
White, J. K. et al. A neural network for information seeking. Nat. Commun. 10, 5168 (2019).
Bromberg-Martin, E. S., Matsumoto, M. & Hikosaka, O. Dopamine in motivational control: rewarding, aversive, and alerting. Neuron 68, 815–834 (2010).
Monosov, I. E. Curiosity: primate neural circuits for novelty and information seeking. Nat. Rev. Neurosci. 25, 195–208 (2024).
Padoa-Schioppa, C. & Assad, J. A. Neurons in the orbitofrontal cortex encode economic value. Nature 441, 223–226 (2006).
Ballesta, S., Shi, W., Conen, K. E. & Padoa-Schioppa, C. Values encoded in orbitofrontal cortex are causally related to economic choices. Nature 588, 450–453 (2020).
Gardner, M. P. H. et al. Processing in lateral orbitofrontal cortex is required to estimate subjective preference during initial, but not established, economic choice. Neuron 108, 526–537.e4 (2020).
Gore, F. et al. Orbitofrontal cortex control of striatum leads economic decision-making. Nat. Neurosci. 26, 1566–1574 (2023).
Badre, D. & D’Esposito, M. Is the rostro-caudal axis of the frontal lobe hierarchical? Nat. Rev. Neurosci. 10, 659–669 (2009).
Cavanagh, J. F. & Frank, M. J. Frontal theta as a mechanism for cognitive control. Trends Cogn. Sci. 18, 414–421 (2014).
Cavanagh, J. F., Figueroa, C. M., Cohen, M. X. & Frank, M. J. Frontal theta reflects uncertainty and unexpectedness during exploration and exploitation. Cereb. Cortex 22, 2575–2586 (2012).
Daw, N. D., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–1711 (2005).
Raja Beharelle, A., Polanía, R., Hare, T. A. & Ruff, C. C. Transcranial stimulation over frontopolar cortex elucidates the choice attributes and neural mechanisms used to resolve exploration-exploitation trade-offs. J. Neurosci. 35, 14544–14556 (2015).
Zajkowski, W. K., Kossut, M. & Wilson, R. C. A causal role for right frontopolar cortex in directed, but not random, exploration. eLife 6, e27430 (2017).
Parr, T. & Friston, K. J. Uncertainty, epistemics and active inference. J. R. Soc. Interface 14, 136 (2017).
Varela, F., Lachaux, J. P., Rodriguez, E. & Martinerie, J. The brainweb: phase synchronization and large-scale integration. Nat. Rev. Neurosci. 2, 229–239 (2001).
Bressler, S. L. & Menon, V. Large-scale brain networks in cognition: emerging methods and principles. Trends Cogn. Sci. 14, 277–290 (2010).
Deco, G., Tononi, G., Boly, M. & Kringelbach, M. L. Rethinking segregation and integration: contributions of whole-brain modelling. Nat. Rev. Neurosci. 16, 430–439 (2015).
Battaglia, D. & Brovelli, A. Functional connectivity and neuronal dynamics: insights from computational methods. In The Cognitive Neurosciences (Sixth Edition) (eds. Poeppel, D., Mangun, G. R. & Gazzaniga, M. S.) (The MIT Press, 2020).
Bassett, D. S. & Mattar, M. G. A network neuroscience of human learning: potential to inform quantitative theories of brain and behavior. Trends Cogn. Sci. 21, 250–264 (2017).
Hunt, L. T. & Hayden, B. Y. A distributed, hierarchical and recurrent framework for reward-based choice. Nat. Rev. Neurosci. 18, 172–182 (2017).
Khilkevich, A. et al. Brain-wide dynamics linking sensation to action during decision-making. Nature 634, 890–900 (2024).
Engel, A. K., Fries, P. & Singer, W. Dynamic predictions: Oscillations and synchrony in top–down processing. Nat. Rev. Neurosci. 2, 704–716 (2001).
Buzsáki, G. & Draguhn, A. Neuronal oscillations in cortical networks. Science 304, 1926–1929 (2004).
Fries, P. Rhythms for cognition: communication through coherence. Neuron 88, 220–235 (2015).
Vinck, M. et al. Principles of large-scale neural interactions. Neuron 111, 987–1002 (2023).
Luppi, A. I. et al. A synergistic core for human brain evolution and cognition. Nat. Neurosci. 25, 771–782 (2022).
Varley, T. F., Pope, M., Faskowitz, J. & Sporns, O. Multivariate information theory uncovers synergistic subsystems of the human cerebral cortex. Commun. Biol. 6, 451 (2023).
Nigam, S., Pojoga, S. & Dragoi, V. Synergistic coding of visual information in columnar networks. Neuron 104, 402–411.e4 (2019).
Combrisson, E. et al. Neural interactions in the human frontal cortex dissociate reward and punishment learning. eLife 12, RP92938 (2023).
Rosas, F. E., Mediano, P. A. M., Rassouli, B. & Barrett, A. B. An operational information decomposition via synergistic disclosure. J. Phys. A: Math. Theor. 53, 485001 (2020).
Mediano, P. A. M. et al. Integrated information as a common signature of dynamical and information-processing complexity. Chaos 32, 013115 (2022).
Proca, A. M. et al. Synergistic information supports modality integration and flexible learning in neural networks solving multiple tasks. PLoS Comput. Biol. 20, e1012178 (2024).
Tax, T. M. S., Mediano, P. A. M. & Shanahan, M. The partial information decomposition of generative neural network models. Entropy 19, 474 (2017).
Wang, R. et al. Segregation, integration, and balance of large-scale resting brain networks configure different cognitive abilities. Proc. Natl. Acad. Sci. USA. 118, e2022288118 (2021).
Sporns, O. Network attributes for segregation and integration in the human brain. Curr. Opin. Neurobiol. 23, 162–171 (2013).
Finc, K. et al. Dynamic reconfiguration of functional brain networks during working memory training. Nat. Commun. 11, 2435 (2020).
Cohen, J. R. & D’Esposito, M. The segregation and integration of distinct brain networks and their relationship to cognition. J. Neurosci. 36, 12083–12094 (2016).
Braun, U. et al. Dynamic reconfiguration of frontal brain networks during executive cognition in humans. Proc. Natl. Acad. Sci. USA. 112, 11678–11683 (2015).
Shine, J. M. et al. The dynamics of functional brain networks: integrated network states during cognitive task performance. Neuron 92, 544–554 (2016).
Buehlmann, A. & Deco, G. Optimal information transfer in the cortex through synchronization. PLoS Comput. Biol. https://doi.org/10.1371/journal.pcbi.1000934 (2010).
Kirst, C., Timme, M. & Battaglia, D. Dynamic information routing in complex networks. Nat. Commun. 7, 11061 (2016).
Palmigiano, A., Geisel, T., Wolf, F. & Battaglia, D. Flexible information routing by transient synchrony. Nat. Neurosci. 20, 1014–1022 (2017).
Montani, F. et al. The impact of high-order interactions on the rate of synchronous discharge and information transmission in somatosensory cortex. Philos. Trans. A Math. Phys. Eng. Sci. 367, 3297–3310 (2009).
Ohiorhenuan, I. E. et al. Sparse coding and high-order correlations in fine-scale cortical networks. Nature 466, 617–621 (2010).
Ganmor, E., Segev, R. & Schneidman, E. Sparse low-order interaction network underlies a highly correlated and learnable neural population code. Proc. Natl. Acad. Sci. USA. 108, 9679–9684 (2011).
Shimazaki, H., Amari, S.-I., Brown, E. N. & Grün, S. State-space analysis of time-varying higher-order spike correlation for multiple neural spike train data. PLoS Comput. Biol. 8, e1002385 (2012).
Sizemore, A. E., Phillips-Cremins, J. E., Ghrist, R. & Bassett, D. S. The importance of the whole: topological data analysis for the network neuroscientist. Netw. Neurosci. 3, 656–673 (2019).
Crutchfield, J. P. The calculi of emergence: computation, dynamics and induction. Phys. D. 75, 11–54 (1994).
Santoro, A., Battiston, F., Petri, G. & Amico, E. Higher-order organization of multivariate time series. Nat. Phys. 19, 221–229 (2023).
Benson, A. R., Gleich, D. F. & Leskovec, J. Higher-order organization of complex networks. Science 353, 163–166 (2016).
Grilli, J., Barabás, G., Michalska-Smith, M. J. & Allesina, S. Higher-order interactions stabilize dynamics in competitive network models. Nature 548, 210–213 (2017).
Levine, J. M., Bascompte, J., Adler, P. B. & Allesina, S. Beyond pairwise mechanisms of species coexistence in complex communities. Nature 546, 56–64 (2017).
Sanchez-Gorostiaga, A., Bajić, D., Osborne, M. L., Poyatos, J. F. & Sanchez, A. High-order interactions distort the functional landscape of microbial consortia. PLoS Biol. 17, e3000550 (2019).
Bach, D. R. & Dolan, R. J. Knowing how much you don’t know: a neural organization of uncertainty estimates. Nat. Rev. Neurosci. 13, 572–586 (2012).
Yoshida, W. & Ishii, S. Resolution of uncertainty in prefrontal cortex. Neuron 50, 781–789 (2006).
Behrens, T. E. J., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).
Payzan-LeNestour, E. & Bossaerts, P. Risk, unexpected uncertainty, and estimation uncertainty: Bayesian learning in unstable settings. PLoS Comput. Biol. 7, e1001048 (2011).
Payzan-LeNestour, E., Dunne, S., Bossaerts, P. & O’Doherty, J. P. The neural representation of unexpected uncertainty during value-based decision making. Neuron 79, 191–201 (2013).
Vinck, M., Uran, C., Dowdall, J. R., Rummell, B. & Canales-Johnson, A. Large-scale interactions in predictive processing: oscillatory versus transient dynamics. Trends Cogn. Sci. 29, 133–148 (2025).
Combrisson, E. et al. Group-level inference of information-based measures for the analyses of cognitive brain networks from neurophysiological data. Neuroimage 258, 119347 (2022).
Coupe, P. et al. An optimized blockwise nonlocal means denoising filter for 3-D magnetic resonance images. IEEE Trans. Med. Imaging 27, 425–441 (2008).
Gramfort, A. et al. MEG and EEG data analysis with MNE-Python. Front. Neurosci. 7, 267 (2013).
Mukamel, R. et al. Coupling between neuronal firing, field potentials, and FMRI in human auditory cortex. Science 309, 951–954 (2005).
Niessing, J. et al. Hemodynamic signals correlate tightly with synchronized gamma oscillations. Science 309, 948–951 (2005).
Lachaux, J.-P. et al. Relationship between task-related gamma oscillations and BOLD signal: new insights from combined fMRI and intracranial EEG. Hum. Brain Mapp. 28, 1368–1375 (2007).
Nir, Y. et al. Coupling between neuronal firing rate, gamma LFP, and BOLD fMRI is related to interneuronal correlations. Curr. Biol. 17, 1275–1285 (2007).
Ray, S. & Maunsell, J. H. R. Different origins of gamma rhythm and high-gamma activity in macaque visual cortex. PLoS Biol. 9, e1000610 (2011).
Brovelli, A., Lachaux, J.-P., Kahane, P. & Boussaoud, D. High gamma frequency oscillatory activity dissociates attention from intention in the human premotor cortex. Neuroimage 28, 154–164 (2005).
Crone, N. E., Sinai, A. & Korzeniewska, A. High-frequency gamma oscillations and human brain mapping with electrocorticography. Prog. Brain Res. 159, 275–295 (2006).
Jerbi, K. et al. Task-related gamma-band dynamics from an intracerebral perspective: review and implications for surface EEG and MEG. Hum. Brain Mapp. 30, 1758–1771 (2009).
Brovelli, A., Chicharro, D., Badier, J.-M., Wang, H. & Jirsa, V. Characterization of cortical networks and corticocortical functional connectivity mediating arbitrary visuomotor mapping. J. Neurosci. 35, 12643–12658 (2015).
Brovelli, A. et al. Dynamic reconfiguration of visuomotor-related functional connectivity networks. J. Neurosci. 37, 839–853 (2017).
Jas, M., Engemann, D. A., Bekhti, Y., Raimondo, F. & Gramfort, A. Autoreject: Automated artifact rejection for MEG and EEG data. Neuroimage 159, 417–429 (2017).
Percival, D. B. & Walden, A. T. Spectral Analysis for Physical Applications (Cambridge University Press, 1993).
Gross, J. et al. Dynamic imaging of coherent sources: studying neural interactions in the human brain. Proc. Natl. Acad. Sci. USA. 98, 694–699 (2001).
Ince, R. A. A. et al. A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula. Hum. Brain Mapp. 38, 1541–1573 (2017).
Barrett, A. B. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 91, 052802 (2015).
Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 85, 461–464 (2000).
Lemke, S. M. et al. Information flow between motor cortex and striatum reverses during skill learning. Curr. Biol. 34, 1831–1843.e7 (2024).
Maris, E. & Oostenveld, R. Nonparametric statistical testing of EEG- and MEG-data. J. Neurosci. Methods 164, 177–190 (2007).
Combrisson, E., Basanisi, R., Cordeiro, V. L., Ince, R. A. A. & Brovelli, A. Frites: A python package for functional connectivity analysis and group-level statistics of neurophysiological data. J. Open Source Softw. 7, 3842 (2022).
Acknowledgements
A.B. and E.C. were supported by the PRC project “CausaL” (ANR-18-CE28-0016) and received funding from the European Union’s Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 945539 (Human Brain Project SGA3). A.B. was supported by the A*Midex Foundation of Aix-Marseille University project “Hinteract” (AMX-22-RE-AB-071). R.B. and M.N. received funding from the French government under the “France 2030” investment plan managed by the French National Research Agency (reference: ANR-16-CONV000X / ANR-17-EURE-0029) and from the Excellence Initiative of Aix-Marseille University - A*MIDEX (AMX-19-IET-004). The Centre de Calcul Intensif of Aix-Marseille University (CCIAM) is acknowledged for high-performance computing resources. A.B., E.C. and D.M. were supported by the EU’s Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 101147319 (EBRAINS 2.0 Project). G.A. was supported by the ANR SulcalGRIDS Project (ANR-19-CE45-0014). G.P. is supported by the European Research Council (ERC) Consolidator Grant under the European Union’s Horizon Europe programme (grant agreement No. 101171380, project RUNES). We thank Pedro Mediano and Luca Faes for fruitful suggestions and discussions.
Author information
Authors and Affiliations
Contributions
Conceptualization: A.B. Data curation: A.B., E.C., R.B., M.N., G.A. Formal analysis: A.B., R.B. Funding acquisition: A.B., R.B., G.P., D.M., S.P. Investigation: A.B., R.B. Methodology: All authors. Project administration: A.B. Resources: A.B., S.P. Software: E.C., R.B., G.A., M.N. Supervision: A.B., D.M., G.P. Visualization: A.B., E.C., R.B. Writing (original draft): A.B. Writing (review & editing): All authors.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Andrea Luppi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Combrisson, E., Basanisi, R., Neri, M. et al. Higher-order and distributed synergistic functional interactions encode information gain in goal-directed learning. Nat Commun 16, 7179 (2025). https://doi.org/10.1038/s41467-025-62507-1
This article is cited by
- Towards an informational account of interpersonal coordination. Nature Reviews Neuroscience (2026).